Links
The author details their process of building a domain-specific LLM, training a 1-billion-parameter Llama 3-style model on 8 H100 GPUs. They cover infrastructure setup, memory management, token budgeting, and optimizations such as torch.compile to improve training efficiency.
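As a rough illustration of the token-budgeting question, the Chinchilla-style rule of thumb of about 20 training tokens per model parameter (an assumption here, not the author's figure) gives a quick estimate for a 1B-parameter model:

```python
# Back-of-the-envelope token budget using the Chinchilla-style heuristic
# of ~20 training tokens per parameter (a rule of thumb, not the
# article's exact figure).

def token_budget(params: int, tokens_per_param: int = 20) -> int:
    """Return an approximate compute-optimal training token count."""
    return params * tokens_per_param

params = 1_000_000_000          # 1B-parameter Llama 3-style model
budget = token_budget(params)
print(f"{budget:,} tokens")     # → 20,000,000,000 tokens
```

In practice the budget is then checked against the available dataset and GPU-hours rather than taken as a hard target.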
This article details a project where the author trains a smaller LLM to understand and generate diagrams in the Pintora language. The process includes dataset creation, two training phases, and evaluation of the model's accuracy in producing valid diagram syntax.
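The evaluation step — measuring how often the model emits syntactically valid Pintora — can be sketched as a simple validity-rate loop. The `is_valid_pintora` check below is a hypothetical placeholder; a real harness would invoke the actual Pintora parser:

```python
# Sketch of a syntax-validity evaluation loop. `is_valid_pintora` is a
# hypothetical stand-in: a real evaluator would parse each sample with
# the Pintora toolchain instead of this keyword check.

def is_valid_pintora(text: str) -> bool:
    """Placeholder validity check based on known diagram keywords."""
    return text.strip().startswith(("sequenceDiagram", "erDiagram", "mindmap"))

def validity_rate(samples: list[str]) -> float:
    """Fraction of generated samples that pass the syntax check."""
    if not samples:
        return 0.0
    return sum(is_valid_pintora(s) for s in samples) / len(samples)

samples = [
    "sequenceDiagram\n  A->>B: hello",
    "this is not a diagram",
    "erDiagram\n  USER ||--o{ ORDER : places",
]
print(f"valid: {validity_rate(samples):.0%}")  # 2 of 3 samples pass
```

Tracking this rate across the two training phases is one way to see whether the second phase actually improves syntactic accuracy.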
nanochat is a full-stack implementation of a ChatGPT-like language model that can be trained on an 8×H100 GPU node for about $800. It features a simple UI for interaction and is designed to be highly configurable and hackable by users, allowing them to train and customize their own models. While it currently outperforms GPT-2, it still has limitations compared to more advanced models like GPT-5.
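The ~$800 figure can be sanity-checked with simple arithmetic; the per-GPU hourly rate below is an assumed cloud price for an H100, not a number from nanochat itself:

```python
# Back-of-the-envelope training-cost check. The $3/GPU-hour rate is an
# assumed market price for an H100, not a figure from nanochat.

def training_hours(budget_usd: float, gpus: int, usd_per_gpu_hour: float) -> float:
    """Hours of training a fixed budget buys on a multi-GPU node."""
    return budget_usd / (gpus * usd_per_gpu_hour)

hours = training_hours(budget_usd=800, gpus=8, usd_per_gpu_hour=3.0)
print(f"{hours:.1f} hours")  # ≈ 33.3 hours at the assumed rate
```

At a different hourly rate the same budget buys proportionally more or less wall-clock time, which is why such projects usually quote cost rather than hours.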
The article discusses strategies for leveraging Wikipedia to enhance the performance and training of large language models (LLMs). It emphasizes the importance of utilizing high-quality, well-sourced information from Wikipedia to improve the accuracy and reliability of LLM outputs. Key techniques include effective summarization and the integration of Wikipedia content into training datasets.
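Folding Wikipedia content into a training dataset typically means splitting cleaned article prose into bounded chunks. A minimal sketch — the sample text and the fixed-window word-splitting scheme are illustrative assumptions, not the article's pipeline:

```python
# Minimal sketch of turning cleaned Wikipedia prose into fixed-size
# training chunks. The fixed word window is an assumption for
# illustration; real pipelines usually chunk by tokenizer tokens.

def chunk_text(text: str, chunk_words: int = 8) -> list[str]:
    """Split text into word-count-bounded chunks for a training dataset."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

article = ("Wikipedia is a free online encyclopedia maintained by a "
           "community of volunteer editors through open collaboration.")
for chunk in chunk_text(article):
    print(chunk)
```

A production pipeline would add the quality filtering the article emphasizes (keeping well-sourced passages) before chunking.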
The article introduces "create-llm," a CLI tool designed to quickly scaffold production-ready PyTorch training projects for language models, similar to create-next-app. It offers various templates for different use cases, enabling users to set up training with minimal effort, complete with essential features like data preprocessing, checkpoint management, and integration options for popular tools.