Links
Google DeepMind is expanding its Kaggle Game Arena with benchmarks for social deduction games such as Werewolf and risk management games such as Poker. These additions aim to evaluate AI models on communication, negotiation, and decision-making under uncertainty, extending the platform's coverage of AI behavior in complex environments.
ARC-AGI-3 is an evaluation framework that aims to measure human-like intelligence in AI through skill-acquisition efficiency in diverse, interactive game environments. The project, currently in development, proposes a new benchmark paradigm that tests AI capabilities such as planning, memory, and goal acquisition, and it invites community contributions for game design. Results from this competition, which seeks to bridge the gap between human and artificial intelligence, will be announced in August 2025.