1 link tagged with all of: fine-tuning + vintage-llm + llama + dataset-processing
Links
The author describes creating a 340M-parameter Llama-based model trained exclusively on English texts published before 1900. They built custom data pipelines, tokenization, and base-training and fine-tuning scripts; deduplicated and filtered the historical sources; and trained both locally and on cloud GPUs for about $80. The result is a toy “Victorian” chatbot that can hallucinate and is not aligned to modern safety standards.
llama
vintage-llm
dataset-processing
fine-tuning
+ historical-nlp
+ tldr-a-byte-sized-daily-tech-newsletter