ZeroSumEval is a framework for evaluating large language models (LLMs) through competitive games whose difficulty scales dynamically as the competing models improve. It runs multi-agent simulations with clear win conditions to assess capabilities such as knowledge, reasoning, and planning, and it is designed to be easy to extend with new games and to integrate with optimization tools. The framework ships with multiple games, including chess, poker, and math quizzes, along with logging and analysis tools for evaluating performance.
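
To make the "clear win condition" and "easy extension" ideas concrete, below is a minimal sketch of what a turn-based game definition can look like. The `NimState` class, the `play` helper, and the toy Nim rules are illustrative assumptions for this sketch, not ZeroSumEval's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class NimState:
    """Toy two-player game: players alternate taking 1-3 stones; whoever takes the last stone wins."""
    stones: int = 21
    to_move: int = 0          # index of the player whose turn it is
    history: list = field(default_factory=list)

    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def apply_move(self, n: int) -> "NimState":
        if n not in self.legal_moves():
            raise ValueError(f"illegal move: {n}")
        return NimState(
            stones=self.stones - n,
            to_move=1 - self.to_move,
            history=self.history + [(self.to_move, n)],
        )

    def winner(self):
        # The player who took the last stone (the one *not* to move) wins.
        return None if self.stones > 0 else 1 - self.to_move


def play(agents):
    """Run one match between two move-selecting callables and return the winner's index."""
    state = NimState()
    while state.winner() is None:
        move = agents[state.to_move](state)
        state = state.apply_move(move)
    return state.winner()


if __name__ == "__main__":
    # Two trivial baseline agents stand in for LLM players.
    def greedy(state):
        return max(state.legal_moves())

    def cautious(state):
        return min(state.legal_moves())

    print("winner:", play([greedy, cautious]))
```

In a competitive evaluation, the two callables would wrap LLM players rather than fixed heuristics, and the match outcome (win/loss) becomes the evaluation signal instead of a static benchmark score.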