6 min read | Saved February 14, 2026
Do you care about this?
The article discusses how the effectiveness of large language models (LLMs) in coding tasks often hinges on the harness used rather than the model itself. By experimenting with different editing tools, the author demonstrates significant improvements in performance, highlighting the importance of optimizing harnesses for better results.
If you do, here's more
The article highlights the importance of the "harness" in AI coding models, arguing that it often affects performance more than the models themselves. The author critiques the current focus on comparing models like GPT-5.3 and Opus, pointing out that failures often stem from how input and output are managed. The author's own project, oh-my-pi, illustrates this point: small changes to the harness improved the efficiency of several language models, showing that the interface between the model and the code it edits is often overlooked.
The piece explains how different models handle editing tasks. Codex, for instance, uses a structured patch format, which produces high failure rates for models unfamiliar with it. Simpler methods like exact string replacement also fail frequently, because the model must reproduce the original text verbatim before replacing it. The author proposes a solution called "hashline," in which each line of the file is tagged with a unique identifier. A model can then reference specific lines by tag without reproducing their original content, potentially reducing error rates and improving edit success.
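The hashline idea can be sketched in a few lines. This is a minimal illustration, not the actual oh-my-pi implementation: the real tag scheme is not specified in the summary, so here each line is tagged with a short hash of its index and content (the `tag_lines` and `apply_edit` names are hypothetical). The key property is that an edit names a line by its tag and supplies only the replacement text, so the model never has to reproduce the original line.

```python
import hashlib

def _tag(index: int, line: str) -> str:
    # Hypothetical tag: short hash of the line's position and content.
    return hashlib.sha1(f"{index}:{line}".encode()).hexdigest()[:4]

def tag_lines(source: str) -> str:
    """Render the file with a per-line tag prefix for the model to read."""
    return "\n".join(
        f"{_tag(i, line)}| {line}"
        for i, line in enumerate(source.splitlines())
    )

def apply_edit(source: str, tag: str, new_line: str) -> str:
    """Replace the line whose tag matches; the old content is never echoed."""
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if _tag(i, line) == tag:
            lines[i] = new_line
            return "\n".join(lines)
    raise KeyError(f"no line with tag {tag!r}")
```

A failed tag lookup raises instead of guessing, which mirrors why the format is robust: a stale or mistyped tag is rejected outright rather than silently patching the wrong location, the failure mode that plagues exact-string replacement.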
Benchmark results indicate that this hashline method significantly enhances performance across various models. In extensive tests, Grok Code Fast 1 saw its success rate jump from 6.7% to 68.3% simply by switching edit formats, demonstrating the critical role of the harness in the coding process. The author argues that harness optimization can lead to substantial gains without any additional training costs, making the case that improving how these models interact with code is just as important, if not more so, than the models themselves.