Quit Emailing Yourself

# llm → benchmark

3 links tagged with all of: llm + benchmark

Links

GitHub - deep-symbolic-mathematics/llm-srbench: [ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

LLM-SRBench is a benchmark for evaluating scientific equation discovery with large language models, released with an open-source implementation. The repository includes a setup guide for running existing search methods and contributing new ones, along with the configurations needed for its datasets. The work was selected for an oral presentation at ICML 2025.

Saved by tldr-importer · Last saved October 29, 2025 · 4 min read

Tags: llm, benchmark, scientific-discovery, open-source, evaluation

GitHub - 514-labs/LLM-query-test: Performance comparison of ClickHouse vs PostgreSQL using LLM-style query patterns on realistic aircraft tracking data

This benchmark evaluates how database performance affects user experience in LLM chat interactions, comparing an OLAP database (ClickHouse) with an OLTP database (PostgreSQL) across several query patterns. Results show ClickHouse significantly outperforming PostgreSQL on larger datasets; the repository includes performance tests ranging from 10K to 10M records. Provided scripts let users run the tests and simulations to explore database performance and interaction latency further.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: database, benchmark, llm, clickhouse, postgresql

Do LLMs identify fonts? • Max Halford

LLMs struggle with font identification, as shown by a benchmark that compares their predictions to community responses on dafont.com. Even when given context such as the image, thread title, and description, the models produced disappointing results, which highlights the limits of current LLMs on this specific classification task. The evaluation is a reminder that LLMs are not infallible and still have significant room for improvement.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: llm, font-identification, benchmark, dafont, machine-learning