5 min read | Saved February 14, 2026
Do you care about this?
INTELLECT-3 is a Mixture-of-Experts model with over 100 billion parameters, trained using a custom reinforcement learning framework. It outperforms larger models across various benchmarks in math, code, and reasoning. The training infrastructure and datasets are open-sourced for public use and research.
If you do, here's more
INTELLECT-3 is a new Mixture-of-Experts model with over 100 billion parameters, trained with a large-scale reinforcement learning (RL) framework. It delivers strong performance across math, coding, and scientific reasoning benchmarks. The complete recipe, including the model's architecture, training methods, and datasets, has been open-sourced to support further RL research. The model and the underlying infrastructure are accessible through Prime Intellect's platform, putting these tools in reach of a wider audience.
Training INTELLECT-3 involved two main phases: supervised fine-tuning starting from the GLM-4.5-Air base model, followed by a large-scale reinforcement learning stage run on a 512-GPU cluster. The team built on PRIME-RL, their custom RL framework, and defined a set of modular training environments with their verifiers library, which standardizes how tasks are specified, trained on, and evaluated. These environments are publicly available, so others can replicate or build on the work.
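The modular-environment idea can be sketched in a few lines: a task bundles prompts with a verifiable reward function that scores model completions. This is a hypothetical illustration of the pattern, not the actual verifiers API; the class and field names here are invented for the example.

```python
# Toy sketch of a verifiable RL environment: a prompt in, a scalar
# reward out. Names (MathEnv, reward) are illustrative only.
from dataclasses import dataclass


@dataclass
class MathEnv:
    """A single-turn environment pairing prompts with known answers."""
    prompts: list[str]
    answers: list[str]

    def reward(self, idx: int, completion: str) -> float:
        # Verifiable reward: 1.0 if the ground-truth answer appears
        # in the model's completion, else 0.0.
        return 1.0 if self.answers[idx] in completion else 0.0


env = MathEnv(prompts=["What is 7 * 8?"], answers=["56"])
print(env.reward(0, "7 * 8 = 56"))       # 1.0
print(env.reward(0, "I think it's 54"))  # 0.0
```

Because the reward is computed from the completion alone, environments like this can be trained on and evaluated with the same interface, which is what makes a library of interchangeable environments practical.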
To meet the demands of RL training, Prime Intellect developed Prime Sandboxes, a system for executing untrusted model-generated code with low latency, along with a compute orchestration layer managing 512 NVIDIA H200 GPUs. Together these keep training stable and efficient over long runs. Looking ahead, the team plans to strengthen agentic capabilities, broaden the variety of RL environments, and improve the model's handling of long-horizon tasks. Model weights, code, and technical documentation are available online for further exploration.
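The core problem a sandbox solves during RL is running model-written code safely: a completion may loop forever or crash, and the reward signal must survive that. A minimal conceptual sketch, using a fresh subprocess with a hard timeout, looks like this; it is not the Prime Sandboxes implementation, which the post describes as a purpose-built low-latency system.

```python
# Minimal illustration of sandboxed execution for RL reward checking:
# run untrusted code in a separate interpreter with a hard timeout.
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 2.0) -> tuple[bool, str]:
    """Execute `code` in a fresh Python process; return (ok, stdout)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode == 0, result.stdout
    except subprocess.TimeoutExpired:
        # Infinite loops are cut off and scored as failures.
        return False, ""


ok, out = run_untrusted("print(2 + 2)")
print(ok, out.strip())  # True 4
ok, _ = run_untrusted("while True: pass", timeout_s=0.5)
print(ok)  # False
```

A real system needs much more (filesystem and network isolation, memory limits, pooled warm sandboxes to keep latency low), but the shape is the same: isolate, bound, and return a verdict the trainer can turn into a reward.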