Quit Emailing Yourself

# llm → evaluation → open-source

2 links tagged with all of: llm + evaluation + open-source

Links

Fine-tuning open LLM judges to outperform GPT-5.2

This article describes how fine-tuning open-source LLM judges with Direct Preference Optimization (DPO) can match or exceed GPT-5.2 at evaluating model outputs. The authors trained models such as GPT-OSS 120B and Qwen 3 235B on human preference data and report better accuracy and efficiency at lower cost.

Saved by tldr-importer · Last saved February 14, 2026 · 6 min read

Tags: llm, fine-tuning, dpo, evaluation, open-source

GitHub - deep-symbolic-mathematics/llm-srbench: [ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

LLM-SRBench is a benchmark for scientific equation discovery with large language models, with comprehensive evaluation methods and an open-source implementation. The repository includes a structured setup guide for running and contributing new search methods, along with the configurations needed for various datasets. The paper was selected for an oral presentation at ICML 2025.

Saved by tldr-importer · Last saved October 29, 2025 · 4 min read

Tags: llm, benchmark, scientific-discovery, open-source, evaluation