Researchers have developed the Video Joint Embedding Predictive Architecture (V-JEPA), an AI model that learns about the world from videos and registers "surprise" when what it sees contradicts what it has learned to expect. Unlike traditional models that operate in pixel space, V-JEPA makes its predictions on higher-level abstractions, which lets it ignore irrelevant detail and focus on what matters. As a result, it recognizes violations of concepts such as object permanence with high accuracy. The model has potential applications in robotics, and researchers are refining it further.
ai
machine-learning
intuitive-physics
robotics
v-jepa