Vision Language Models (VLMs) have evolved significantly over the past year, with advances in any-to-any architectures and reasoning capabilities, and the emergence of multimodal agents. Other trends include smaller yet capable models, new alignment techniques, and Vision-Language-Action models that extend VLMs to robotic control. The article highlights key developments and model recommendations in the rapidly growing field of VLMs.