SmolVLA is a compact and open-source Vision-Language-Action model designed for robotics, capable of running on consumer hardware and trained on community-shared datasets. It significantly outperforms larger models in both simulation and real-world tasks, while offering faster response times through asynchronous inference. The model's lightweight architecture and efficient training methods aim to democratize access to advanced robotics capabilities.