Quit Emailing Yourself

Making Very Small LLMs Smarter With RAG | Docker

6 min read | Saved February 14, 2026

llms 🤖 coding 🤖 golang 🤖 retrieval-augmented-generation 🤖 docker 🤖

Do you care about this?

Philippe discusses using very small LLMs for coding tasks, focusing on a Golang project called Nova. He outlines techniques for improving model performance through tailored prompts and Retrieval Augmented Generation (RAG).

If you do, here's more

Philippe, a Principal Solutions Architect, explores how to use small LLMs for coding assistance. He emphasizes that while you can't replicate powerful models like Claude AI or ChatGPT on a local machine, smaller models can still be useful with some creativity and effort. He focuses on a specific use case: developing a Golang library called Nova. He notes that existing LLMs often struggle with projects they have not seen before, which leads to subpar code suggestions. Online tools like Claude or Gemini may also be off limits because of confidentiality or internet restrictions, making local LLMs a viable alternative.

Choosing the right language model is key. For his purposes, he selects Qwen 2.5 Coder, a 3-billion-parameter model optimized for code generation. Small models have limitations, however, particularly in context size. Philippe outlines two rules that follow from this: giving the model too much information reduces its effectiveness, and keeping a long conversation history can overwhelm it. To work within these limits, he uses Retrieval Augmented Generation (RAG). Instead of feeding the entire file of code snippets to the model, he stores the snippets in a vector database and retrieves only those relevant to the user's query.
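The two rules above can be sketched in Go as a prompt builder that passes along only the retrieved snippets and a bounded slice of the conversation history. This is a minimal illustration, not code from Philippe's project: the `Message` type, `TrimHistory`, `BuildMessages`, and the sample strings are all hypothetical names invented for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is one turn of a conversation (hypothetical type for this sketch).
type Message struct {
	Role    string // "system", "user", or "assistant"
	Content string
}

// TrimHistory keeps only the last maxTurns messages so the conversation
// does not crowd out a small model's limited context window.
func TrimHistory(history []Message, maxTurns int) []Message {
	if maxTurns <= 0 {
		return nil
	}
	if len(history) <= maxTurns {
		return history
	}
	return history[len(history)-maxTurns:]
}

// BuildMessages assembles a request from a system instruction, the retrieved
// snippets only (not the whole snippets file), the trimmed history, and the
// new question, in that order.
func BuildMessages(system string, snippets []string, history []Message, question string, maxTurns int) []Message {
	msgs := []Message{{Role: "system", Content: system}}
	if len(snippets) > 0 {
		ctx := "Relevant snippets:\n" + strings.Join(snippets, "\n---\n")
		msgs = append(msgs, Message{Role: "system", Content: ctx})
	}
	msgs = append(msgs, TrimHistory(history, maxTurns)...)
	msgs = append(msgs, Message{Role: "user", Content: question})
	return msgs
}

func main() {
	// Placeholder data: in practice the snippets would come from the retrieval step.
	history := []Message{
		{Role: "user", Content: "first question"},
		{Role: "assistant", Content: "first answer"},
		{Role: "user", Content: "second question"},
		{Role: "assistant", Content: "second answer"},
	}
	snippets := []string{"// placeholder snippet A", "// placeholder snippet B"}

	msgs := BuildMessages("You are a Go coding assistant.", snippets, history, "third question", 2)
	for _, m := range msgs {
		fmt.Printf("[%s] %s\n", m.Role, m.Content)
	}
}
```

Capping the history at a fixed number of turns is the simplest possible policy; a budget measured in tokens would be more precise but needs a tokenizer, which is outside the scope of this sketch.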

The process involves embedding user requests and comparing them against stored embeddings to find the most relevant code snippets. This similarity search allows him to construct targeted prompts that guide the model more effectively. He details the technical steps, including creating embeddings using a specific model and implementing a similarity search algorithm. Philippe shares that all source code and detailed implementation steps are available in his GitHub project, providing a roadmap for others interested in similar applications.
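The similarity search described above can be sketched in Go as follows. The summary does not name the similarity metric or the vector store, so this sketch assumes cosine similarity over an in-memory slice; the `Chunk` type, the `TopK` function, the `minScore` threshold, and the hand-written three-dimensional vectors are all illustrative assumptions. In a real setup the vectors would come from an embedding model.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk pairs a stored code snippet with its embedding vector.
// The vectors used in main are made up for illustration only.
type Chunk struct {
	Text      string
	Embedding []float64
}

// CosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector has zero norm.
func CosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// TopK returns up to k chunks whose similarity to the query embedding
// is at least minScore, ordered from most to least similar.
func TopK(query []float64, store []Chunk, k int, minScore float64) []Chunk {
	type scored struct {
		chunk Chunk
		score float64
	}
	var hits []scored
	for _, c := range store {
		if s := CosineSimilarity(query, c.Embedding); s >= minScore {
			hits = append(hits, scored{c, s})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	if len(hits) > k {
		hits = hits[:k]
	}
	out := make([]Chunk, 0, len(hits))
	for _, h := range hits {
		out = append(out, h.chunk)
	}
	return out
}

func main() {
	// Placeholder store: snippet texts and vectors are invented for this sketch.
	store := []Chunk{
		{Text: "placeholder snippet A", Embedding: []float64{0.9, 0.1, 0.0}},
		{Text: "placeholder snippet B", Embedding: []float64{0.1, 0.9, 0.0}},
		{Text: "placeholder snippet C", Embedding: []float64{0.0, 0.2, 0.9}},
	}
	// Stand-in for the embedding of the user's request.
	query := []float64{0.8, 0.2, 0.0}

	for _, c := range TopK(query, store, 2, 0.3) {
		fmt.Println(c.Text)
	}
}
```

A linear scan like this is adequate for a snippets file with a few dozen entries; the threshold keeps weakly related snippets out of the prompt, which matters when the model's context is small.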
