3 links
tagged with all of: llm + benchmark
Links
LLM-SRBench is a new benchmark for scientific equation discovery with large language models, with comprehensive evaluation methods and an open-source implementation. The repository includes a structured setup guide for running the benchmark and contributing new search methods, along with the configurations needed for its datasets. The work was selected for an oral presentation at ICML 2025.
This benchmark evaluates how database performance affects user experience in LLM chat interactions, comparing an OLAP database (ClickHouse) with an OLTP database (PostgreSQL) across various query patterns. Results show ClickHouse significantly outperforms PostgreSQL on larger datasets; the repository includes performance tests on datasets ranging from 10,000 to 10 million records. Users can run the tests and simulations with the provided scripts to explore database performance and interaction latency further.
LLMs struggle with font identification, as shown by a benchmark comparing their predictions with community responses on dafont.com. Even when given context such as the image, thread title, and description, the models produced disappointing results, which highlights the limits of current LLMs on this specific classification task. The evaluation is a reminder that LLMs still have significant room for improvement.