5 min read | Saved February 14, 2026
Do you care about this?
This article presents Tongyi DeepResearch, an open-source web agent that matches OpenAI's DeepResearch on several reasoning and information-seeking benchmarks. It outlines the training methodology behind the agent, including fully automated synthetic data generation and new reasoning frameworks, with a focus on strengthening the agent's decision-making and planning capabilities.
If you do, here's more
Tongyi DeepResearch is an open-source web agent that matches the performance of OpenAI’s DeepResearch on several benchmarks. It scored 32.9 on Humanity’s Last Exam (HLE), 43.4 on BrowseComp, and 46.7 on BrowseComp-ZH, excelling in complex information-seeking tasks. The model also achieved a score of 75 on the xbench-DeepSearch benchmark, outperforming existing deep research agents. The creators provide a comprehensive methodology for developing such advanced agents, including a novel data synthesis solution that covers the entire training process.
The training involves Agentic Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT), culminating in a Reinforcement Learning (RL) phase. The data collection process is extensive, pulling from various sources like documents and knowledge graphs. Multi-style question-answer pairs are generated from this data, with first-order and higher-order action synthesis data enhancing decision-making capabilities. A fully automated process produces high-quality synthetic datasets without human intervention, allowing for the creation of complex, multi-source questions and answers through a structured pipeline.
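The multi-style question-answer synthesis described above can be sketched as a small pipeline. This is an illustrative toy, not the paper's actual implementation: the facts, style templates, and sampling logic are all assumptions standing in for the documents and knowledge graphs the authors draw from.

```python
import random

# Hypothetical knowledge snippets standing in for extracted documents
# and knowledge-graph triples (illustrative only).
FACTS = [
    {"entity": "Qwen", "relation": "developed_by", "value": "Alibaba"},
    {"entity": "ReAct", "relation": "combines", "value": "reasoning and acting"},
]

# Assumed question styles; the real pipeline generates far richer,
# multi-source and higher-order questions.
TEMPLATES = {
    "direct": "What is the {relation} of {entity}?",
    "reasoning": "Considering {entity}, what {relation} applies, and why?",
}

def synthesize_qa(facts, style, rng):
    """Turn one sampled fact into a QA pair in the requested style."""
    fact = rng.choice(facts)
    question = TEMPLATES[style].format(
        entity=fact["entity"],
        relation=fact["relation"].replace("_", " "),
    )
    return {"style": style, "question": question, "answer": fact["value"]}

def build_dataset(n, seed=0):
    """Produce n synthetic QA pairs with no human in the loop."""
    rng = random.Random(seed)  # seeded for reproducibility
    styles = list(TEMPLATES)
    return [synthesize_qa(FACTS, rng.choice(styles), rng) for _ in range(n)]
```

The key design point mirrored here is that every record is generated programmatically from structured source data, so the dataset can scale without manual annotation.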
The model incorporates two key reasoning frameworks: the ReAct framework for multi-turn reasoning, and the IterResearch paradigm, which reconstructs a streamlined workspace for each decision round. This enhances the agent's ability to plan and use tools effectively, particularly on long-horizon tasks. The agent operates in two modes: the native ReAct Mode, which follows a straightforward Thought-Action-Observation cycle, and the Heavy Mode, designed for more intricate, multi-step research tasks. Both modes rely on scaling test-time computation rather than complex hand-crafted agent designs to draw out the model's full capability.
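The Thought-Action-Observation cycle underlying ReAct Mode can be sketched as a simple control loop. The `model_step` stub and the tool registry below are hypothetical placeholders for the LLM call and tool set, not Tongyi DeepResearch's actual API; the point is the loop structure.

```python
# Minimal sketch of a ReAct-style Thought-Action-Observation loop.

def model_step(history):
    """Stand-in for the LLM: emit a thought and an action given the
    trajectory so far. This stub searches once, then finishes."""
    if not any(kind == "observation" for kind, _ in history):
        return (("thought", "I should search for this topic."),
                ("action", ("search", "Tongyi DeepResearch")))
    return (("thought", "I have enough information to answer."),
            ("action", ("finish", "final answer")))

def run_tool(name, arg):
    """Dispatch an action to a (hypothetical) tool registry."""
    tools = {"search": lambda q: f"results for {q!r}"}
    return tools[name](arg)

def react_loop(max_turns=5):
    history = []
    for _ in range(max_turns):
        thought, action = model_step(history)
        history.extend([thought, action])
        tool_name, tool_arg = action[1]
        if tool_name == "finish":          # terminal action ends the cycle
            return tool_arg, history
        # Feed the tool output back in as the next observation.
        history.append(("observation", run_tool(tool_name, tool_arg)))
    return None, history
```

Heavy Mode differs mainly in what is carried between rounds: instead of appending every step to one ever-growing history as above, IterResearch keeps a condensed report per round, which keeps the context focused on long tasks.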