MolmoAct is an Action Reasoning Model (ARM) designed to improve spatial reasoning in robotics, so that machines can understand and execute tasks in three-dimensional space. Built on the open-source Molmo framework, it uses depth-aware perception tokens for action planning and execution, and is reported to show strong performance and generalization in real-world scenarios. The model is fully open-source, which supports transparency and makes it accessible for further research and development in the field.
action-reasoning
robotics
spatial-reasoning
open-source
vision-language