Quit Emailing Yourself

# benchmarks → long-context

2 links tagged with all of: benchmarks + long-context

Links

Introducing GPT-5.2 | OpenAI

OpenAI launched GPT-5.2, a model aimed at professional tasks such as coding, document analysis, and visual interpretation. According to OpenAI, it outperforms previous versions and industry professionals on various benchmarks, which makes it suitable for complex workflows. Improvements include stronger long-context reasoning and better handling of visual data.

Saved by tldr-importer · Last saved February 14, 2026 · 9 min read

Tags: gpt-5.2, productivity, benchmarks, long-context, visual-data

Introducing HELMET: Holistically Evaluating Long-context Language Models

HELMET (How to Evaluate Long-Context Language Models Effectively and Thoroughly) is a comprehensive benchmark for evaluating long-context language models (LCLMs), designed to address limitations in existing evaluation methods. The blog post outlines HELMET's design, presents key findings from evaluations of 59 recent LCLMs, and offers a quickstart guide for practitioners who want to use HELMET in their research and applications.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: long-context, language-models, evaluation, benchmarks, nlp