5 min read | Saved February 14, 2026
Do you care about this?
OpenAI has launched GPT-5.3-Codex-Spark, a new model focused on real-time coding with ultra-low latency. It features a 128k context window and is designed for interactive tasks, letting developers make edits and see responses with minimal delay. The model targets interactive coding workflows and is available to ChatGPT Pro users for experimentation.
If you do, here's more
OpenAI has introduced GPT-5.3-Codex-Spark, a new model designed for real-time coding. It's smaller than its predecessor, GPT-5.3-Codex, and optimized for speed, delivering over 1,000 tokens per second on low-latency hardware. The release marks a collaboration with Cerebras focused on improving responsiveness for developers. Codex-Spark enables real-time collaboration, letting users make immediate adjustments and see results without delay.
The model features a 128k context window and is currently text-only. During the research preview, it has its own usage limits, separate from standard models, to manage demand. Codex-Spark enhances the user experience by minimizing latency across the request-response process, achieving an 80% reduction in overhead for client-server interactions and cutting time-to-first-token by 50%. These improvements are expected to benefit all models in the future.
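Time-to-first-token is a latency figure a client can measure directly. As a rough sketch (this is not OpenAI's tooling; the stream and helper names here are hypothetical stand-ins for any streaming API response), timing the gap between issuing a request and receiving the first streamed chunk looks like this:

```python
import time
from typing import Iterable, Tuple

def measure_ttft(stream: Iterable[str]) -> Tuple[float, str]:
    """Measure time-to-first-token: the delay between starting to
    consume a stream and receiving its first chunk. Works with any
    iterable of text chunks (e.g. a streaming API response)."""
    start = time.perf_counter()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first chunk arrived
        chunks.append(chunk)
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, "".join(chunks)

# Simulated stream standing in for a real model response.
def fake_stream():
    time.sleep(0.05)   # pretend network + model latency
    yield "def "
    yield "hello():"

ttft, text = measure_ttft(fake_stream())
print(f"time to first token: {ttft * 1000:.0f} ms")
```

At the quoted 1,000+ tokens per second, inter-token gaps would average under a millisecond, so a halved time-to-first-token like the one described above dominates how responsive an interactive session feels.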
Codex-Spark operates on Cerebras' Wafer Scale Engine 3, designed for high-speed inference. While GPUs remain essential for training and broader usage, this new infrastructure aims to tighten the feedback loop for coding tasks. The model is available today for ChatGPT Pro users and in API form for select partners, with plans for broader access as OpenAI refines its performance based on developer feedback. This launch sets the stage for future models that blend real-time collaboration with longer-term project execution.