Vision-language-action (VLA) models enhance robotic manipulation by integrating action generation with vision-language capabilities. This paper reviews post-training strategies for VLA models, drawing parallels with human motor learning to frame how these models refine their interaction with environments. It introduces a taxonomy organized around environmental perception, embodiment awareness, task comprehension, and multi-component integration, and identifies key challenges and trends for future research.
Keywords: robotics, machine-learning, human-learning, action-generation, post-training