Links
OpenAI launched GPT-5.2, an advanced model that enhances productivity in professional tasks such as coding, document analysis, and visual interpretation. It outperforms previous versions and industry professionals on various benchmarks, making it suitable for complex workflows. Improvements include stronger long-context reasoning and better handling of visual data.
HELMET (How to Evaluate Long-Context Models Effectively and Thoroughly) is a comprehensive benchmark for evaluating long-context language models (LCLMs) that addresses limitations in existing evaluation methods. The blog post outlines HELMET's design, presents key findings from evaluations of 59 recent LCLMs, and offers a quickstart guide for practitioners who want to use HELMET in their research and applications.