dataset-processing


1 link tagged with all of: fine-tuning + vintage-llm + llama + dataset-processing


Links

Making a vintage LLM from scratch - Cr;Lf;

The author describes building a 340M-parameter Llama-based model trained exclusively on English texts published before 1900. They wrote custom data pipelines and tokenization, base-training, and fine-tuning scripts, and deduplicated and filtered the historical sources. Training ran locally and on cloud GPUs for about $80. The result is a toy “Victorian” chatbot that can hallucinate and is not aligned to modern safety standards.
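As a rough illustration of the deduplication and date filtering mentioned in the summary, here is a minimal Python sketch. It is not the author's pipeline: the `Document` record, its `year` field, and the `filter_and_dedupe` helper are hypothetical names introduced for this example, and exact-match hashing on normalized text stands in for whatever deduplication method the linked post actually uses.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record; the real corpus metadata may look different.
    title: str
    year: int  # publication year
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_and_dedupe(docs, cutoff_year=1900):
    """Keep documents published before cutoff_year, dropping exact duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        # Date filter: only texts published strictly before the cutoff.
        if doc.year >= cutoff_year:
            continue
        # Exact-duplicate check on a hash of the normalized text.
        digest = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept
```

Exact hashing only catches copies that are identical after normalization. Scanned historical texts often differ through OCR noise, so a real pipeline would likely need a fuzzy near-duplicate method as well.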

Last saved Jun 18, 2026 · 7 min read

llama vintage-llm dataset-processing fine-tuning + historical-nlp + tldr-a-byte-sized-daily-tech-newsletter