Scaling long-running autonomous coding

4 min read | Saved February 14, 2026

Tags: agents, coding, coordination, software-development, automation

Do you care about this?

This article details experiments with multiple autonomous coding agents working together on complex software projects. It discusses the challenges of coordination, the evolution from a flat structure to a role-based system, and the successes achieved, including building a web browser from scratch. The authors emphasize the importance of model choice and simplicity in design.

If you do, here's more

The article describes experiments in which hundreds of autonomous coding agents work together on a project of the kind that could take a human team months. Across the tests, the agents collectively wrote over a million lines of code and processed trillions of tokens. Initial attempts at dynamic coordination ran into trouble: locking mechanisms left agents stuck, and the lack of hierarchy led to risk-averse behavior and inefficiency.

Switching to a more structured approach helped. The team established distinct roles: planners create tasks and can spawn sub-planners, while workers focus exclusively on completing their assigned tasks and do not handle coordination. This streamlined the process and let the agents take on ambitious projects such as building a web browser from scratch, an effort that produced over a million lines of code in less than a week. In other experiments, agents successfully migrated a codebase from Solid to React and significantly improved video rendering speed.
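
The planner/worker split described above can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the names `Task`, `Planner`, `worker`, and `run`, the example goals, and the single in-process queue are all assumptions made for the example, since the summary does not describe the actual interfaces.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """A unit of work handed to a worker (hypothetical structure)."""
    description: str
    done: bool = False


@dataclass
class Planner:
    """Hypothetical planner: owns a goal and may delegate to sub-planners."""
    goal: str
    sub_planners: list[Planner] = field(default_factory=list)

    def plan(self) -> list[Task]:
        # A planner with no sub-planners emits a task for its own goal.
        if not self.sub_planners:
            return [Task(f"implement {self.goal}")]
        # Otherwise it collects the tasks produced by its sub-planners.
        tasks: list[Task] = []
        for sub in self.sub_planners:
            tasks.extend(sub.plan())
        return tasks


def worker(task: Task) -> Task:
    """Hypothetical worker: completes one task and does no coordination."""
    task.done = True  # stand-in for the real coding work
    return task


def run(root: Planner) -> list[Task]:
    """Drain the planner's task queue, one worker call per task."""
    queue = deque(root.plan())
    finished: list[Task] = []
    while queue:
        finished.append(worker(queue.popleft()))
    return finished


# Example goals are invented for illustration only.
root = Planner("browser", [
    Planner("HTML parser"),
    Planner("layout", [Planner("block layout"), Planner("inline layout")]),
])
results = run(root)
```

The point of the sketch is the separation of concerns: only planners decide what work exists, and a worker sees nothing beyond the single task it was given. A real system would run workers concurrently rather than in a loop.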

The findings indicate that model choice is critical for long-running tasks: the GPT-5.2 model outperformed others at maintaining focus and implementing tasks accurately. The team also learned that simplifying a system often leads to better outcomes, because added complexity can create bottlenecks. Multi-agent coordination remains a challenge, but the results suggest that deploying many agents can yield substantial progress on complex coding tasks.
