VaViM and VaVAM apply large-scale generative video models to autonomous driving. VaViM predicts future video frames through autoregressive modeling, while VaVAM generates driving trajectories via imitation learning and shows emergent behaviors in complex driving scenarios. The paper analyzes both models' performance, including their strengths and limitations across driving situations.
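To make "autoregressive modeling" concrete, here is a minimal sketch of an autoregressive rollout: each predicted element is appended to the context and fed back in to predict the next one. It is an illustration only, not the VaViM implementation. Representing frames as integer tokens is an assumption made for brevity, and `autoregressive_rollout`, `predict_next`, and `toy_predictor` are hypothetical names, with the toy predictor standing in for a learned model.

```python
from typing import Callable, List, Sequence


def autoregressive_rollout(
    context: Sequence[int],
    predict_next: Callable[[Sequence[int]], int],
    num_steps: int,
) -> List[int]:
    """Extend `context` by `num_steps` tokens, feeding each prediction back in."""
    tokens = list(context)
    for _ in range(num_steps):
        # The predictor sees everything generated so far, including its own outputs.
        tokens.append(predict_next(tokens))
    return tokens


def toy_predictor(tokens: Sequence[int]) -> int:
    """Hypothetical stand-in for a learned model: next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


rollout = autoregressive_rollout([7, 8], toy_predictor, num_steps=4)
print(rollout)  # [7, 8, 9, 0, 1, 2]
```

In a real video model the predictor would be a trained network and the tokens would encode image content, but the feedback loop has the same shape.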
Self-play proves highly effective for training autonomous driving systems, achieving state-of-the-art performance without using human data. Using the Gigaflow simulator, the study generated 1.6 billion kilometers of simulated driving, yielding a robust and realistic policy that averages 17.5 years of continuous driving between incidents in simulation.