6 min read | Saved February 14, 2026
Do you care about this?
This article details a project where the author trains a smaller LLM to understand and generate diagrams in the Pintora language. The process includes dataset creation, two training phases, and evaluation of the model's accuracy in producing valid diagram syntax.
If you do, here's more
Text-to-diagram capabilities are mostly developed for popular languages like Mermaid and PlantUML, leaving less common languages like Pintora underexplored. The author sets out to train a large language model (LLM) to generate and edit diagrams in Pintora's syntax, restricting the search to models under 30 billion parameters due to resource constraints. After weighing several options, Qwen2.5-Coder-7B was selected for its coding capabilities, despite being an older model.
The training involves two phases. First, the model undergoes Continued Pretraining (CPT) to learn Pintora's syntax from a dataset of roughly 1,000 to 1,500 entries with a balanced mix of diagram types. The second phase, Instruction Finetuning (IFT), teaches the model to respond to specific diagramming tasks. Writing training data by hand proved too slow, so the author turned to AI to generate it; of the roughly 2,000 generated entries, much of the code was invalid, and a cleaning pass left about 1,000 usable rows for training.
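The cleaning pass can be sketched as a filter over AI-generated entries. The validator below is a hypothetical, much-simplified stand-in (a real pipeline would actually render each snippet, e.g. with the Pintora CLI); the diagram-type list and the `code` field name are assumptions for illustration:

```python
# Diagram keywords Pintora accepts at the start of a snippet
# (assumed subset, for illustration only).
PINTORA_TYPES = ("sequenceDiagram", "erDiagram", "componentDiagram",
                 "activityDiagram", "mindmap", "gantt")

def looks_valid(code: str) -> bool:
    """Hypothetical, simplified check; real cleaning should render
    the snippet with Pintora instead of pattern-matching."""
    code = code.strip()
    if not code.startswith(PINTORA_TYPES):
        return False
    # Reject obviously unbalanced braces, a common generation error.
    return code.count("{") == code.count("}")

def clean(entries: list[dict]) -> list[dict]:
    """Keep only rows whose 'code' field passes the check."""
    return [e for e in entries if looks_valid(e.get("code", ""))]

raw = [
    {"code": "sequenceDiagram\n  A->>B: hi"},   # valid
    {"code": "graph TD\n  A-->B"},              # Mermaid, not Pintora
    {"code": "erDiagram\n  USER {\n"},          # unbalanced braces
]
print(len(clean(raw)))  # 1
```

Even a crude filter like this explains how 2,000 generated entries could shrink to about 1,000 usable rows.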
Training began on Google Colab but quickly hit memory limits, forcing a switch to a more powerful GPU. Even after the model was slimmed down by removing unnecessary components, training still demanded significant resources. After the CPT phase the model still produced incorrect syntax, indicating that further instruction was needed; the IFT phase improved its performance, allowing it to generate more accurate Pintora diagrams. Evaluating the model on randomized prompts, the author found it had learned to produce valid Pintora code rather than defaulting to other diagramming languages.
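The randomized-prompt evaluation can be sketched as below. This is not the author's harness: `generate` is a hypothetical stub for the finetuned model's completion call, and the keyword lists are assumptions; the idea is simply to measure how often completions start with a Pintora diagram type rather than a Mermaid-only construct:

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the finetuned model's generate() call."""
    return "sequenceDiagram\n  Client->>Server: request"

# Assumed keyword lists for illustration.
PINTORA_TYPES = ("sequenceDiagram", "erDiagram", "componentDiagram",
                 "activityDiagram", "mindmap", "gantt")
MERMAID_ONLY = ("graph ", "flowchart ", "stateDiagram")

def score(prompts: list[str]) -> float:
    """Fraction of completions that look like valid Pintora:
    start with a Pintora keyword and avoid Mermaid-only openers."""
    ok = 0
    for p in prompts:
        out = generate(p).strip()
        if out.startswith(PINTORA_TYPES) and not out.startswith(MERMAID_ONLY):
            ok += 1
    return ok / len(prompts)

subjects = ["a login flow", "an order database", "a CI pipeline"]
prompts = [f"Draw {random.choice(subjects)} as a Pintora diagram."
           for _ in range(20)]
print(score(prompts))
```

Swapping the stub for calls to the base model versus the finetuned model would make the "defaults to Mermaid" failure mode directly measurable.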