The article explores how large language models (LLMs) act as judges when evaluating other LLMs. It examines potential biases, the impact of model identity on outcomes, and performance differences between "fast" and "thinking" model tiers across a range of tasks. The experiments surface insights into self-preference among judge models and show how hinting can sway their verdicts.