The paper critiques Chatbot Arena, a popular platform for ranking AI systems via crowdsourced pairwise comparisons, and identifies systematic biases in its benchmarking practices. It shows that some providers can inflate their leaderboard standing through undisclosed private testing, evaluating multiple model variants and publicizing only the best-scoring one, and that unequal access to Arena data further skews evaluation outcomes in favor of a small set of providers. The authors propose reforms to make AI benchmarking more transparent and fair.