6 min read | Saved February 14, 2026
Do you care about this?
This article explores the shift toward training AI models with reinforcement learning (RL) as readily available text data diminishes. It discusses the concept of intelligence involution, the rise of custom RL models, and the implications for businesses over the next year, then dives into technical details such as GRPO and LoRA and the challenges and opportunities in building specialized AI models.
If you do, here's more
The article outlines a shift in AI model training toward reinforcement learning (RL) as traditional text-based methods become less viable. Rich Sutton and David Silver's concept of the "Era of Experience" emphasizes models that learn by interacting with environments rather than merely predicting text. With easily scrapable text data drying up, the next 6-12 months present significant opportunities for businesses to develop custom RL models tailored to their specific needs. The article frames this shift with the concept of intelligence involution: intense competition compresses profits toward zero and pushes companies into specialization.
Intelligence involution describes a landscape where the differences between models are narrowing. Open-source models, particularly from China, are gaining traction and threatening the margins of established players like OpenAI and Google. The rapid rise of these models mirrors trends in industries like electric vehicles, where fierce competition drives costs down. For example, DeepSeek significantly reduced token prices from $2.19 to $0.42 in just a few months while enhancing model capabilities. As general-purpose models become commoditized, companies are gravitating towards specialized approaches that offer more defensible positions in the market.
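The scale of the DeepSeek price drop quoted above is easy to sanity-check; the calculation below uses only the two dollar figures from the article, and since only their ratio matters, the pricing unit (whatever quantity of tokens the article's prices refer to) cancels out.

```python
# Token prices quoted in the article: $2.19 down to $0.42
# (illustrative arithmetic only; the unit cancels in the ratio).
old_price = 2.19
new_price = 0.42
reduction = 1 - new_price / old_price
print(f"{reduction:.0%}")  # prints 81%
```

An ~81% price cut in a few months, alongside capability gains, is the kind of margin compression the involution framing refers to.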
The article also critiques Supervised Fine-Tuning (SFT), arguing it has become less effective at both ends of the task spectrum. For simple tasks, modern models can match fine-tuned performance with few-shot prompting, and prompt caching further reduces costs, making SFT economically irrational. For complex workflows that require tool orchestration, models must discover novel strategies through interaction with an environment, which SFT's imitation of fixed demonstrations cannot provide. With open-source models as a foundation for custom RL, companies can now build tailored solutions that prioritize data privacy and cost control, fundamentally changing the competitive landscape in AI.
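The GRPO technique the summary mentions can be sketched at its core: sample a group of completions per prompt, score each with a reward, and normalize each reward against its own group, which removes the need for a learned value network. This is a minimal illustration of group-relative advantage estimation, not code from the article, and the example reward values are invented.

```python
# Minimal sketch of GRPO-style group-relative advantages (assumed shape,
# not the article's implementation). Each completion's advantage is its
# reward standardized against the group it was sampled with.
from statistics import mean, stdev

def group_advantages(rewards):
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Hypothetical scores from a task-specific verifier for four completions.
rewards = [0.2, 0.9, 0.4, 0.9]
print(group_advantages(rewards))
```

Because the baseline is the group mean rather than a critic's estimate, this fits the custom-RL setting described above: all that is needed is a way to score outputs, such as a domain-specific verifier.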