3 min read | Saved October 29, 2025
ZeroSumEval is a framework for evaluating large language models (LLMs) through competitive games whose difficulty scales dynamically as the models improve. It runs multi-agent simulations with clear win conditions to assess capabilities such as knowledge, reasoning, and planning, and it is designed for easy extension with new games and integration with optimization tools. The framework supports multiple games, including chess, poker, and math quizzes, and provides logging and analysis tools for evaluating performance.
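To make the core idea concrete, here is a minimal sketch of a game-based, zero-sum evaluation loop: two "players" compete under a clear win condition, and wins are tallied across many matches. None of the names below come from the ZeroSumEval API; `play_match`, `evaluate`, and the toy guessing game are hypothetical illustrations of the pattern, with stub functions standing in for real model agents.

```python
import random

def play_match(player_a, player_b, target_range=100):
    """One round of a toy 'closest guess wins' game: both players
    guess a hidden number; the closer guess wins (equal distances draw)."""
    secret = random.randint(1, target_range)
    guess_a, guess_b = player_a(target_range), player_b(target_range)
    dist_a, dist_b = abs(secret - guess_a), abs(secret - guess_b)
    if dist_a == dist_b:
        return None  # draw
    return "A" if dist_a < dist_b else "B"

def evaluate(player_a, player_b, n_matches=1000, seed=0):
    """Run many matches and tally the results -- the kind of aggregate
    win/loss record a framework like this would log for later analysis."""
    random.seed(seed)
    record = {"A": 0, "B": 0, "draw": 0}
    for _ in range(n_matches):
        winner = play_match(player_a, player_b)
        record[winner or "draw"] += 1
    return record

# Two toy 'models': one always guesses the midpoint, one guesses randomly.
midpoint_player = lambda n: n // 2
random_player = lambda n: random.randint(1, n)

record = evaluate(midpoint_player, random_player)
print(record)  # the midpoint strategy should win more matches than random
```

In a real framework the players would be LLM agents and the game something like chess or poker, but the structure is the same: a match with a win condition, repeated play, and a logged record for comparing models.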