Background Coding Agents: Predictable Results Through Strong Feedback Loops (Part 3) | Spotify Engineering

5 min read | Saved February 14, 2026

automation 🤖 software 🤖 verification 🤖 coding-agents 🤖 ci-cd 🤖

Do you care about this?

This article discusses Spotify’s approach to using background coding agents for software maintenance. It outlines the failure modes of these agents, the design of verification loops to ensure reliable outputs, and future plans for expanding the system's capabilities.

If you do, here's more

Spotify is working on background coding agents to automate software maintenance, focusing on reliable systems that operate with minimal human supervision. The agents fail in three main ways: they produce no pull request (PR), they produce a PR that fails continuous integration (CI), or they produce a PR that passes CI but is functionally incorrect. The last two are the most problematic: a PR that fails CI costs engineers time, and a PR that passes CI but is wrong can cause significant issues in production, eroding trust in the automation.
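The three failure modes can be made concrete as a small classification over a session's result. This is only an illustrative sketch: `SessionResult`, its fields, and `classify` are hypothetical names, not part of Spotify's system.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_PR = "no pull request produced"
    CI_FAILED = "pull request fails CI"
    PASSED_BUT_INCORRECT = "pull request passes CI but is functionally incorrect"
    OK = "pull request passes CI and is correct"


@dataclass
class SessionResult:
    # Hypothetical fields describing one agent session.
    pr_opened: bool
    ci_passed: bool = False
    functionally_correct: bool = False


def classify(result: SessionResult) -> Outcome:
    """Map a session result to one of the failure modes, checked in order."""
    if not result.pr_opened:
        return Outcome.NO_PR
    if not result.ci_passed:
        return Outcome.CI_FAILED
    if not result.functionally_correct:
        return Outcome.PASSED_BUT_INCORRECT
    return Outcome.OK
```

In practice the last check is the hard one, since functional correctness is not something CI reports directly; here it is simply assumed to be known.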

To address these challenges, Spotify has implemented strong verification loops. Independent verifiers activate automatically based on the software components being modified and give the agent incremental feedback, so the agent does not need to understand the details of the verification process. In addition, a large language model (LLM) acts as a judge, evaluating the proposed changes against the original prompt to keep the agent focused on its task. Internal metrics show that this judge vetoes about 25% of agent sessions, helping the agent correct course when it strays from its instructions.
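A verification loop of this kind might be sketched as follows. Everything here is an assumption made for illustration: the `Verifier` and `Feedback` types, the file-based activation rule, and the choice to consult the judge only after the other verifiers pass are not taken from Spotify's implementation, and the judge is a plain callable standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    ok: bool
    message: str


@dataclass
class Verifier:
    # Hypothetical shape: a verifier decides for itself whether it applies.
    name: str
    applies_to: Callable[[List[str]], bool]
    run: Callable[[List[str]], Feedback]


def active_verifiers(verifiers: List[Verifier], changed_files: List[str]) -> List[Verifier]:
    """Select the verifiers relevant to the files the agent touched."""
    return [v for v in verifiers if v.applies_to(changed_files)]


def verify(
    verifiers: List[Verifier],
    changed_files: List[str],
    judge: Callable[[str, str], Feedback],
    prompt: str,
    diff: str,
) -> List[str]:
    """Return short failure messages for the agent; an empty list means accepted."""
    problems = []
    for v in active_verifiers(verifiers, changed_files):
        feedback = v.run(changed_files)
        if not feedback.ok:
            problems.append(f"{v.name}: {feedback.message}")
    # Assumption: the judge compares the diff with the original prompt
    # only once the deterministic verifiers have passed.
    if not problems:
        verdict = judge(prompt, diff)
        if not verdict.ok:
            problems.append(f"judge: {verdict.message}")
    return problems
```

The agent only ever sees the returned messages, which is what lets it act on feedback without knowing how each verifier works internally.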

Looking ahead, Spotify plans to expand its verifier infrastructure to support more hardware and operating systems, because the current setup runs only on Linux x86. It also aims to integrate the background agent more closely with existing CI/CD pipelines, allowing it to react to CI checks in GitHub PRs. Finally, it sees a need for structured evaluations of system prompts and agent architectures, which should improve the reliability and efficiency of its coding agents.
