Frontier language models can often recognize when they are being evaluated, showing substantial though not superhuman evaluation awareness. This capability raises concerns about the reliability of assessments and benchmarks, since models may behave differently when they know they are being tested. Using a benchmark of 1,000 prompts drawn from a range of datasets, the study finds that models identify evaluations well above chance but still fall short of human performance.
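To make the measurement concrete, evaluation awareness can be framed as a binary classification problem: the model judges whether a transcript comes from an evaluation, and its judgments are scored against ground-truth labels. The sketch below is illustrative only, not the study's code; `model_probability` is a hypothetical stand-in for however a real study elicits the model's probability, and the transcripts and scoring rule are toy assumptions.

```python
# Illustrative sketch: scoring evaluation awareness as binary classification,
# summarized with AUC (0.5 = chance, 1.0 = perfect discrimination).
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (fine for small sets)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Need both evaluation and deployment examples.")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def model_probability(transcript: str) -> float:
    # Hypothetical placeholder: a real study would prompt the model with the
    # transcript and ask for P(this interaction is part of an evaluation).
    return 0.9 if "benchmark" in transcript.lower() else 0.3


# Toy data: label 1 = drawn from an evaluation dataset, 0 = organic usage.
transcripts = [
    ("Answer the following benchmark question about arithmetic.", 1),
    ("Hey, can you help me draft an email to my landlord?", 0),
    ("You are being tested on multi-step reasoning. Question 3:", 1),
    ("What's a good recipe for tonight's dinner?", 0),
]

scores = [model_probability(t) for t, _ in transcripts]
labels = [y for _, y in transcripts]
print(f"Evaluation-awareness AUC: {auc(labels, scores):.2f}")
```

Scoring this way allows a direct comparison of models against both a random-chance baseline (AUC 0.5) and a human baseline on the same transcripts.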
A team of Microsoft researchers developed ADeLe, an evaluation framework that predicts how AI models will perform on unfamiliar tasks and explains why they succeed or fail. By rating the cognitive and knowledge demands each task places on a model, ADeLe produces detailed ability profiles and accurate performance predictions, addressing limitations of current AI benchmarks. The approach is intended to make AI evaluation more explanatory and reliable before real-world deployment.
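The core idea of demand-versus-ability prediction can be sketched in a few lines. This is not Microsoft's implementation: the dimension names, the 0-5 levels, and the per-dimension logistic link combined by product are all illustrative assumptions used only to show how an ability profile plus task demand annotations could yield a success prediction.

```python
# Illustrative sketch of demand-vs-ability prediction (not ADeLe's actual code):
# tasks carry demand levels per cognitive/knowledge dimension, a model carries
# an estimated ability level per dimension, and predicted success falls as
# demands exceed abilities.
import math


def success_probability(abilities: dict, demands: dict, slope: float = 1.5) -> float:
    """Combine per-dimension logistic curves:
    P(success) = prod over dimensions of sigmoid(slope * (ability - demand))."""
    p = 1.0
    for dim, demand in demands.items():
        margin = abilities.get(dim, 0.0) - demand
        p *= 1.0 / (1.0 + math.exp(-slope * margin))
    return p


# Hypothetical ability profile for one model (levels on a 0-5 scale).
abilities = {"reasoning": 3.5, "knowledge": 4.0, "metacognition": 2.0}

# Hypothetical demand annotations for two unseen tasks.
easy_task = {"reasoning": 2.0, "knowledge": 3.0, "metacognition": 1.0}
hard_task = {"reasoning": 4.5, "knowledge": 3.5, "metacognition": 3.0}

print(f"Easy task: P(success) ~ {success_probability(abilities, easy_task):.2f}")
print(f"Hard task: P(success) ~ {success_probability(abilities, hard_task):.2f}")
```

Because the prediction is decomposed by dimension, the same machinery that forecasts performance also points to which demands a model fails to meet, which is the explanatory benefit the framework emphasizes.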