1 link tagged with all of: ai-research + visual-reasoning + evaluation + gpt-4 + zerobench
Links
The author reviews ZeroBench and finds its visual reasoning tasks too simplistic, mainly involving basic counting of objects. They argue that improvements in evaluation scores do not equate to advancements in visual reasoning capabilities.
Tags: visual-reasoning, zerobench, evaluation, gpt-4, ai-research