Quit Emailing Yourself

GitHub - deep-symbolic-mathematics/llm-srbench: [ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

LLM-SRBench is a new benchmark aimed at enhancing scientific equation discovery using large language models, featuring comprehensive evaluation methods and open-source implementation. It includes a structured setup guide for running and contributing new search methods, as well as the necessary configurations for various datasets. The benchmark has been recognized for its significance, being selected for oral presentation at ICML 2025.

Saved by tldr-importer · Last saved October 29, 2025 · 4 min read

+ llm benchmark ✓ + scientific-discovery + open-source evaluation ✓

Abstract

SpatialScore introduces a comprehensive benchmark for evaluating multimodal large language models (MLLMs) in spatial understanding, consisting of the VGBench dataset and an extensive collection of 28K samples. It features the SpatialAgent, a multi-agent system designed for enhanced spatial reasoning, and reveals persistent challenges and improvements in spatial tasks through quantitative and qualitative evaluations.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

+ spatial-understanding + multimodal evaluation ✓ benchmark ✓ + artificial-intelligence

Links

GitHub - deep-symbolic-mathematics/llm-srbench: [ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Abstract