A new study reveals that state-of-the-art Vision Language Models (VLMs) exhibit severe confirmation bias: they achieve 100% accuracy on familiar, unmodified images but drop to roughly 17% accuracy on counterfactual versions of the same images. Rather than analyzing the visual evidence, the models fall back on memorized knowledge, which produces the large gap between original and modified inputs. Strikingly, 75.70% of the errors are bias-aligned, meaning the wrong answer matches the model's prior knowledge of the original image rather than what the edited image actually shows, pointing to a fundamental flaw in how VLMs integrate multimodal information.
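To make the evaluation protocol concrete, here is a minimal sketch of how such a counterfactual benchmark could be scored. Everything in it is an assumption for illustration: the `query_vlm` stub, the field names, and the example records are hypothetical, not the study's actual code or data. The stub deliberately simulates the reported failure mode by always returning the memorized answer.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """One benchmark item: an original image paired with a counterfactual edit."""
    image_id: str
    question: str
    original_answer: str        # ground truth for the unmodified image
    counterfactual_answer: str  # ground truth after the edit

def query_vlm(image_id: str, question: str, counterfactual: bool) -> str:
    """Hypothetical stand-in for a real VLM call; swap in an actual model API.

    Simulates the failure mode described in the study: the model returns its
    memorized answer regardless of whether the image was modified.
    """
    memorized = {"flag_01": "50 stars", "clock_02": "12 numerals"}
    return memorized[image_id]

def score(examples: list[Example]) -> None:
    orig_correct = cf_correct = cf_errors = bias_aligned = 0
    for ex in examples:
        # Accuracy on the unmodified image.
        if query_vlm(ex.image_id, ex.question, counterfactual=False) == ex.original_answer:
            orig_correct += 1
        # Accuracy on the counterfactual image.
        pred = query_vlm(ex.image_id, ex.question, counterfactual=True)
        if pred == ex.counterfactual_answer:
            cf_correct += 1
        else:
            cf_errors += 1
            # A "bias-aligned" error: the wrong answer is exactly the
            # memorized answer for the original, unmodified image.
            if pred == ex.original_answer:
                bias_aligned += 1
    n = len(examples)
    print(f"original accuracy:       {orig_correct / n:.0%}")
    print(f"counterfactual accuracy: {cf_correct / n:.0%}")
    if cf_errors:
        print(f"bias-aligned errors:     {bias_aligned / cf_errors:.0%}")

if __name__ == "__main__":
    score([
        Example("flag_01", "How many stars are on this flag?",
                "50 stars", "61 stars"),
        Example("clock_02", "How many numerals are on this clock face?",
                "12 numerals", "11 numerals"),
    ])
```

Under this framing, the 75.70% figure would correspond to the bias-aligned share of counterfactual errors: of all the answers the model gets wrong on edited images, how many simply restate the memorized fact.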