A recent study claims that LM Arena has been assisting leading AI laboratories in manipulating their benchmark results. This raises concerns about the integrity of performance evaluations in the AI research community, potentially undermining trust in AI advancements. The implications of these findings could affect funding and research priorities across the industry.
ai ✓
benchmarks ✓
research ✓
integrity ✓
+ lm-arena