Links
OpenAI introduced GPT-5.2 and GPT-5.3 Codex, both trained on NVIDIA infrastructure and both showing significant performance gains in coding and reasoning tasks. The models achieve top scores on several industry benchmarks, reflecting advances in AI training techniques. NVIDIA's systems enable faster development cycles for AI applications.
The article reviews GPT-5.2, highlighting that while it shows notable improvements in instruction-following and complex task handling, it runs slower than expected. The author compares it to other models such as Claude Opus 4.5 and Gemini 3, noting that it may not be the best choice for every use case, especially for coding or when a more engaging personality is desired.
The ARC Prize Foundation evaluates OpenAI's latest models, o3 and o4-mini, on its ARC-AGI benchmarks and finds that their performance on reasoning tasks varies. While o3 shows significant improvements in accuracy on ARC-AGI-1, both models struggle with the harder ARC-AGI-2, indicating persistent limits in AI reasoning capabilities. The article emphasizes the importance of model efficiency and the role of public benchmarks in understanding AI advancements.