MolmoAct is an innovative Action Reasoning Model (ARM) developed to enhance spatial reasoning in robotics, allowing machines to understand and execute tasks in three-dimensional space. Built on the open-source Molmo framework, MolmoAct utilizes depth-aware perception tokens for improved action planning and execution, demonstrating superior performance and generalization capabilities in real-world scenarios. The model is fully open-source, promoting transparency and accessibility for further research and development in the field.
SmolVLA is a compact and open-source Vision-Language-Action model designed for robotics, capable of running on consumer hardware and trained on community-shared datasets. It significantly outperforms larger models in both simulation and real-world tasks, while offering faster response times through asynchronous inference. The model's lightweight architecture and efficient training methods aim to democratize access to advanced robotics capabilities.