M1 introduces a hybrid linear RNN reasoning model based on the Mamba architecture, designed for scalable test-time computation in solving complex mathematical problems. By leveraging distillation from existing models and reinforcement learning, M1 achieves significant speed and accuracy improvements over traditional transformer models, matching the performance of state-of-the-art distilled reasoning models while utilizing memory-efficient inference techniques.
machine-learning ✓
reasoning ✓
inference ✓
scalability ✓
benchmarks ✓