Traversing the Frontier of Superintelligence

6 min read | Saved February 14, 2026

Tags: poetiq, ai, benchmarks, reasoning, models

Do you care about this?

Poetiq announced that it has set new performance records on the ARC-AGI benchmarks by building on the latest AI models, Gemini 3 and GPT-5.1. Its systems improve accuracy while reducing cost per problem, which the company presents as a significant advance in AI reasoning.

If you do, here's more

Poetiq reports new records on the ARC-AGI-1 and ARC-AGI-2 benchmarks. Its systems, which build on recent models such as Gemini 3 and GPT-5.1, establish new Pareto frontiers for cost versus performance: across a range of cost levels, they reach higher accuracy than previously reported results at the same or lower cost. This holds on the ARC-AGI-2 Public Eval Set, whose tasks are more complex than those in ARC-AGI-1. The configurations, including Poetiq (Mix) and Poetiq (Grok-4-Fast), pair high accuracy with low cost, and some solutions cost less than 1 cent per problem.
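To make the Pareto frontier idea concrete, here is a minimal Python sketch. The configuration names and numbers are hypothetical placeholders, not Poetiq's reported results; the only assumption is that each configuration is summarized by a cost per problem and an accuracy. A configuration is on the frontier if no other configuration is at least as cheap and at least as accurate while being strictly better on one of the two.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    cost_per_problem: float  # USD; illustrative value only
    accuracy: float          # fraction in [0, 1]; illustrative value only


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Return the results that no other result dominates.

    Sort by ascending cost (ties broken by descending accuracy), then keep
    a result only if it is strictly more accurate than everything cheaper.
    """
    ordered = sorted(results, key=lambda r: (r.cost_per_problem, -r.accuracy))
    frontier: List[Result] = []
    best_accuracy = float("-inf")
    for r in ordered:
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Hypothetical data, invented for illustration.
SAMPLE = [
    Result("A", 0.01, 0.30),
    Result("B", 0.10, 0.45),
    Result("C", 0.20, 0.40),  # dominated by B: costs more, less accurate
    Result("D", 0.50, 0.60),
    Result("E", 0.50, 0.55),  # dominated by D: same cost, less accurate
]

if __name__ == "__main__":
    for r in pareto_frontier(SAMPLE):
        print(r.name, r.cost_per_problem, r.accuracy)
```

Setting a "new Pareto frontier" in this sense means adding points that dominate at least some of the points on the previous frontier.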

Poetiq's meta-system can dynamically select and combine models for the best trade-off between accuracy and cost. It adapts autonomously to different underlying LLMs and improves both accuracy and cost for popular models from Google DeepMind, OpenAI, and others. The approach uses a multi-step problem-solving process that refines solutions iteratively, together with self-auditing that helps the system stop early and avoid unnecessary computation. Poetiq's results also surpass the average human test-taker, which suggests the approach can handle complex reasoning tasks.
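The iterative refinement and self-auditing described above can be sketched as a simple loop. This is an illustrative outline only, not Poetiq's implementation: the names `refine`, `propose`, and `audit` are hypothetical, and in a real system `propose` would wrap a call to whichever LLM is in use. Passing the model in as a plain callable is one way to keep such a loop model-agnostic.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical signatures, assumed for this sketch:
#   propose(task, previous_candidate, feedback) -> new candidate
#   audit(task, candidate) -> (is_good_enough, feedback_for_next_round)
Propose = Callable[[Any, Optional[Any], Optional[str]], Any]
Audit = Callable[[Any, Any], Tuple[bool, Optional[str]]]


def refine(task: Any, propose: Propose, audit: Audit,
           max_steps: int = 5) -> Tuple[Any, int]:
    """Refine a candidate until the audit accepts it or the budget runs out.

    Returns the last candidate and the number of propose calls made, so the
    caller can see how much work the early stop saved.
    """
    candidate: Optional[Any] = None
    feedback: Optional[str] = None
    calls = 0
    for _ in range(max_steps):
        candidate = propose(task, candidate, feedback)
        calls += 1
        ok, feedback = audit(task, candidate)
        if ok:  # self-audit passed: stop instead of spending more calls
            break
    return candidate, calls


# Toy stand-ins so the sketch runs without any model: the "task" is a
# target integer, and each proposal moves one step closer to it.
def toy_propose(task: int, previous: Optional[int], feedback: Optional[str]) -> int:
    return 0 if previous is None else previous + 1


def toy_audit(task: int, candidate: int) -> Tuple[bool, Optional[str]]:
    return (candidate == task, None if candidate == task else "too low")


if __name__ == "__main__":
    print(refine(3, toy_propose, toy_audit, max_steps=10))
```

In the toy run, the loop stops after four proposals even though the budget allows ten, which is the cost-saving effect the self-audit is meant to provide.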

Architecturally, Poetiq's system is recursive and LLM-agnostic, which enables rapid deployment on new models. The research emphasizes that its reasoning strategies are not predetermined but discovered through interaction with the models. This adaptability matters because LLM behavior is inherently unpredictable: it lets the system efficiently extract and assemble the knowledge a reasoning task requires. Poetiq has released the code for its configurations as open source on GitHub, with the stated aim of inspiring further advances in AI.
