7 min read | Saved February 14, 2026
Do you care about this?
The article discusses the limitations of AI agents in software development, highlighting that humans still write most of the code. Despite experimenting with various coding agents, the author found that the productivity gains were minimal and that the agents' output often missed critical details and context. Key issues include the developer's loss of a mental model of the codebase and the AI's inability to accurately assess its own work.
If you do, here's more
At Octomind, the team is heavily invested in AI agents, yet human developers still write most of their code. They have experimented with tools like Cursor and Claude Code but have not seen significant productivity gains, often falling short of the 20% improvement others boast about. A recent project aimed to add branch-specific test cases to their end-to-end testing platform, a feature essential for maintaining test integrity across code branches: without it, editing a shared test either blocks unrelated pull requests or breaks other developers' pipelines when changes are made.
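The core idea of branch-specific test cases can be illustrated with a minimal sketch. The article does not describe Octomind's actual data model, so every name here (resolve_test_case, the dict layout, DEFAULT_BRANCH) is a hypothetical assumption for illustration only:

```python
# Hypothetical sketch: a branch-specific override shadows the default test
# definition, so edits on a feature branch do not affect other branches.
DEFAULT_BRANCH = "main"

def resolve_test_case(test_id, branch, cases):
    """Return the branch's override of a test case if one exists,
    otherwise fall back to the default-branch version."""
    return cases.get((test_id, branch)) or cases.get((test_id, DEFAULT_BRANCH))

# Example: a login test edited only on a feature branch.
cases = {
    ("login", "main"): {"steps": ["open /", "click Login"]},
    ("login", "feat/sso"): {"steps": ["open /", "click SSO Login"]},
}

# The feature branch sees its own version; every other branch sees main's.
assert resolve_test_case("login", "feat/sso", cases)["steps"][1] == "click SSO Login"
assert resolve_test_case("login", "other-branch", cases)["steps"][1] == "click Login"
```

The fallback-to-default lookup is what keeps unrelated pipelines green: only the branch that edited the test runs the edited version.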
During their first attempt at building this feature with AI agents, the results were disappointing. The agent produced a lengthy pull request riddled with basic errors a human developer would rarely make, such as failing to wire up new components and ignoring established coding conventions. The team ended up with a 2,000-line PR that required extensive manual review and rework. A second attempt focused on smaller, incremental changes, yet still produced another large pull request with elementary mistakes such as transaction handling errors. Both attempts highlighted the agents' tendency to deliver half-finished work while presenting it with unwarranted confidence.
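The article does not say which transaction mistakes the agent made. A common instance of the pitfall, sketched here with sqlite3 under assumed table and function names (not Octomind's schema), is that related writes must share one transaction so a mid-sequence failure cannot leave partial state behind:

```python
import sqlite3

# Hypothetical schema for illustration: a test case and its ordered steps.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_case (id TEXT, branch TEXT)")
con.execute("CREATE TABLE test_step (case_id TEXT, step TEXT)")

def save_case_atomically(con, case_id, branch, steps):
    # "with con" scopes a transaction: commit on success, rollback on error,
    # so the case row never lands without all of its steps.
    with con:
        con.execute("INSERT INTO test_case VALUES (?, ?)", (case_id, branch))
        for step in steps:
            con.execute("INSERT INTO test_step VALUES (?, ?)", (case_id, step))

def failing_steps():
    yield "open /"
    raise RuntimeError("simulated failure mid-write")

try:
    save_case_atomically(con, "login", "feat/sso", failing_steps())
except RuntimeError:
    pass

# The rollback removed the case row and the first step together.
assert con.execute("SELECT COUNT(*) FROM test_case").fetchone()[0] == 0
assert con.execute("SELECT COUNT(*) FROM test_step").fetchone()[0] == 0
```

Omitting the transaction scope (committing each insert individually) is exactly the kind of error that type-checks and passes a happy-path test, which is why it survives into a large agent-generated PR.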
A significant concern is the loss of the developer's mental model of the codebase. As the AI generates large PRs, developers lose the intuitive understanding of how various components interact. This gap in knowledge means that when complex issues arise, developers feel like they’re encountering the code for the first time, lacking the context that comes from working through changes themselves. This reality check underscores that while AI can assist in coding, it cannot yet replace the nuanced understanding and judgment that human developers bring to the table.