LLM-SRBench is a new benchmark for scientific equation discovery with large language models, featuring comprehensive evaluation methods and an open-source implementation. The repository includes a structured setup guide for running and contributing new search methods, along with the configurations required for its datasets; a sketch of what such a contribution could look like follows below. The benchmark was selected for oral presentation at ICML 2025.
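
The repository's contribution guide defines the real interface, but as a rough illustration, plugging a new search method into a benchmark of this kind might look like the sketch below. Every name here (`Problem`, `EquationSearcher`, `evaluate`) is a hypothetical stand-in, not LLM-SRBench's actual API.

```python
# Hypothetical sketch of contributing a search method to an
# equation-discovery benchmark; names are illustrative, not LLM-SRBench's API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Problem:
    name: str                      # dataset/problem identifier
    inputs: List[List[float]]      # observed input variables
    targets: List[float]           # observed outputs the equation should fit

class EquationSearcher:
    """Base class a new search method would subclass (assumed interface)."""
    def search(self, problem: Problem) -> Callable[[List[float]], float]:
        raise NotImplementedError

class ConstantBaseline(EquationSearcher):
    """Trivial method: always predict the mean of the training targets."""
    def search(self, problem: Problem) -> Callable[[List[float]], float]:
        mean = sum(problem.targets) / len(problem.targets)
        return lambda x: mean

def evaluate(method: EquationSearcher, problems: List[Problem]) -> float:
    """Mean squared error of discovered equations, averaged over problems."""
    total = 0.0
    for p in problems:
        f = method.search(p)
        mse = sum((f(x) - y) ** 2
                  for x, y in zip(p.inputs, p.targets)) / len(p.targets)
        total += mse
    return total / len(problems)

if __name__ == "__main__":
    toy = Problem("toy", inputs=[[0.0], [1.0], [2.0]], targets=[1.0, 3.0, 5.0])
    print(evaluate(ConstantBaseline(), [toy]))  # high error, as expected
```

The key design point this illustrates is the plugin pattern the benchmark's contribution setup implies: a new method only has to implement one search entry point, and the harness handles data loading and scoring.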
LRAGE is an open-source toolkit for evaluating large language models in retrieval-augmented generation (RAG) settings, with a focus on legal applications. It integrates retrieval tools and legal datasets to streamline the evaluation process, letting researchers assess model performance with minimal engineering effort. Key features include a modular architecture for retrievers and rerankers (sketched below), a user-friendly GUI, and support for LLM-as-a-Judge evaluation.
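
To make the modular retriever/reranker idea concrete, here is a minimal sketch of how swappable retrieval and reranking components can compose with an LLM-as-a-Judge scorer. All class and function names are hypothetical illustrations, not LRAGE's actual API.

```python
# Hypothetical sketch of a modular retriever/reranker pipeline with an
# LLM-as-a-Judge hook; names are illustrative, not LRAGE's API.
from typing import List, Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> List[str]: ...

class Reranker(Protocol):
    def rerank(self, query: str, docs: List[str]) -> List[str]: ...

class KeywordRetriever:
    """Toy retriever: rank documents by word overlap with the query."""
    def __init__(self, corpus: List[str]):
        self.corpus = corpus
    def retrieve(self, query: str, k: int) -> List[str]:
        q = set(query.lower().split())
        ranked = sorted(self.corpus,
                        key=lambda d: -len(q & set(d.lower().split())))
        return ranked[:k]

class LengthReranker:
    """Toy reranker: prefer shorter, more focused passages."""
    def rerank(self, query: str, docs: List[str]) -> List[str]:
        return sorted(docs, key=len)

def judge_answer(question: str, answer: str) -> float:
    """Placeholder for an LLM-as-a-Judge call; a real toolkit would prompt
    a judge model with a rubric and parse its numeric score."""
    return 1.0 if answer else 0.0

def evaluate(retriever: Retriever, reranker: Reranker, question: str) -> float:
    docs = reranker.rerank(question, retriever.retrieve(question, k=3))
    answer = docs[0] if docs else ""  # stand-in for the RAG generation step
    return judge_answer(question, answer)

if __name__ == "__main__":
    corpus = ["The statute of limitations is three years.",
              "Contract law governs agreements between parties."]
    print(evaluate(KeywordRetriever(corpus), LengthReranker(),
                   "What is the statute of limitations?"))
```

Because retriever, reranker, and judge are separate interfaces, any one of them can be swapped without touching the rest of the pipeline, which is the property the toolkit's modular architecture is aiming for.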