LLM-SRBench is a new benchmark for evaluating scientific equation discovery with large language models, featuring comprehensive evaluation methods and an open-source implementation. It includes a structured setup guide for running the benchmark and contributing new search methods, along with the necessary configurations for its datasets. The benchmark was selected for oral presentation at ICML 2025.
SpatialScore introduces a comprehensive benchmark for evaluating spatial understanding in multimodal large language models (MLLMs), comprising the VGBench dataset and an extensive collection of 28K samples. It also presents SpatialAgent, a multi-agent system designed for enhanced spatial reasoning, and its quantitative and qualitative evaluations reveal both persistent challenges and improvements on spatial tasks.