The article explores how large language models (LLMs) act as judges when evaluating other LLMs. It examines potential biases, the impact of model identity on outcomes, and performance differences between "fast" and "thinking" model tiers across a range of tasks. The experiments surface insights into self-preference among judge models and show how hinting can sway their verdicts.