4 min read | Saved February 14, 2026
Do you care about this?
Microsoft has unveiled Maia 200, an AI inference accelerator built on TSMC’s 3nm process and designed to make AI token generation more efficient. It pairs a high-bandwidth memory system with native low-precision compute, making it markedly more efficient than previous generations of Microsoft’s AI hardware. Maia 200 will support multiple models, including OpenAI's GPT-5.2, and aims to streamline AI development across Microsoft's cloud services.
If you do, here's more
Maia 200 is Microsoft's new AI inference accelerator, built on TSMC’s 3nm process, with native FP8/FP4 tensor cores and a memory system that couples 216GB of HBM3e with 7 TB/s of bandwidth. Microsoft positions it as the top-performing inference silicon among the major hyperscalers, claiming three times the FP4 performance of Amazon's Trainium and higher FP8 performance than Google’s TPU. It is also cost-effective, delivering 30% better performance per dollar than the latest hardware in Microsoft’s current fleet. Maia 200 will serve a range of models, including OpenAI’s GPT-5.2, across Microsoft Foundry and Microsoft 365 Copilot.
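To see why native FP4/FP8 support matters for inference, consider that token generation is typically bound by how fast weights stream out of HBM, so halving bytes-per-parameter roughly doubles the decode-rate ceiling. The sketch below uses the 216 GB capacity and 7 TB/s bandwidth figures from the article; the 70B-parameter model is a hypothetical example, not a claim about any specific workload.

```python
# Illustrative decode-rate ceiling for a dense model whose weights are
# fully re-read from HBM once per generated token. Capacity/bandwidth
# figures are from the article; the model size is hypothetical.

HBM_CAPACITY_GB = 216      # Maia 200 HBM3e capacity
HBM_BW_GB_PER_S = 7_000    # 7 TB/s expressed in GB/s

def decode_ceiling(params_billions: float, bytes_per_param: float):
    """Return (weight footprint in GB, upper-bound tokens/s)."""
    weights_gb = params_billions * bytes_per_param   # 1e9 params -> GB
    seconds_per_token = weights_gb / HBM_BW_GB_PER_S
    return weights_gb, 1.0 / seconds_per_token

for label, bytes_pp in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    gb, tok_s = decode_ceiling(70, bytes_pp)         # hypothetical 70B model
    fits = "fits in" if gb <= HBM_CAPACITY_GB else "exceeds"
    print(f"{label}: {gb:.0f} GB ({fits} 216 GB HBM), <= {tok_s:.0f} tok/s")
```

At FP16 the hypothetical 70B model needs 140 GB and tops out near 50 tokens/s per chip on this simple model; at FP4 it shrinks to 35 GB and the ceiling rises to about 200 tokens/s, which is the efficiency argument behind low-precision tensor cores.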
The chip packs over 140 billion transistors and delivers more than 10 petaFLOPS of FP4 and over 5 petaFLOPS of FP8 compute within a 750W power envelope. Its memory subsystem tackles data bottlenecks with a specialized DMA engine and on-chip SRAM, raising throughput for large AI models. For scale-up, a two-tier network built on standard Ethernet provides 2.8 TB/s of bidirectional bandwidth and supports clusters of up to 6,144 accelerators, improving dense-inference performance while keeping power consumption and overall cost down.
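A quick back-of-envelope calculation puts these figures in context. On a simple roofline model, the "ridge point" is the arithmetic intensity (FLOPs per byte moved from HBM) at which a kernel shifts from bandwidth-bound to compute-bound; the sketch below derives it from the quoted peak FP4 rate and memory bandwidth, and also totals cluster-wide HBM. All inputs come from the article; the roofline framing itself is an illustration, not Microsoft's analysis.

```python
# Back-of-envelope roofline and cluster aggregates from the quoted specs.

PEAK_FP4_FLOPS = 10e15        # >10 petaFLOPS FP4 (article figure)
HBM_BW_BYTES_PER_S = 7e12     # 7 TB/s (article figure)
CHIPS_PER_CLUSTER = 6_144     # scale-up limit (article figure)
HBM_PER_CHIP_GB = 216         # HBM3e per chip (article figure)

# FLOPs a kernel must perform per byte of HBM traffic to saturate compute.
ridge_fp4 = PEAK_FP4_FLOPS / HBM_BW_BYTES_PER_S

# Aggregate HBM across a maximum-size cluster, in terabytes.
cluster_hbm_tb = CHIPS_PER_CLUSTER * HBM_PER_CHIP_GB / 1_000

print(f"FP4 ridge point: ~{ridge_fp4:.0f} FLOPs/byte")
print(f"Cluster HBM: ~{cluster_hbm_tb:.0f} TB across {CHIPS_PER_CLUSTER} chips")
```

The ridge point lands around 1,429 FLOPs/byte, meaning small-batch decode (intensity near 1-2 FLOPs/byte for weight-streaming matmuls) is deeply bandwidth-bound, which is why the 7 TB/s memory system and DMA engine matter as much as raw compute.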
Microsoft's development process for Maia 200 prioritized validating the full system before final silicon was available, using a pre-silicon environment to model the computation and communication patterns of large language models. That groundwork enabled rapid deployment in data centers, shortening the time from silicon delivery to live operation. Integration with Azure brings robust security and management capabilities for critical AI workloads. Developers and researchers are invited to explore the Maia SDK, which includes tools for optimizing models and writing efficient code for the accelerator.
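Pre-silicon modeling of this kind often starts with analytical performance models: estimate each layer's latency as the slower of compute and weight streaming, plus communication cost. The toy sketch below illustrates the idea for a tensor-parallel decode step; the layer shape, the simplified weight/FLOP counts, and the ring all-reduce cost model are all hypothetical illustrations, not Microsoft's actual simulator or anything in the Maia SDK.

```python
# Toy analytical model in the spirit of pre-silicon LLM modeling:
# per-layer decode latency = max(compute time, weight-streaming time)
# plus a ring all-reduce term for tensor parallelism. All shapes and
# cost formulas are simplified illustrations.

from dataclasses import dataclass

@dataclass
class Chip:
    flops: float = 10e15         # peak FP4 FLOPs (article figure)
    hbm_bw: float = 7e12         # HBM bytes/s (article figure)
    link_bw: float = 2.8e12 / 2  # one direction of 2.8 TB/s bidirectional

def layer_latency_s(chip: Chip, hidden: int, tp: int, batch: int) -> float:
    """Rough decode latency for one simplified layer (attention
    projections only) sharded across `tp` chips."""
    weight_bytes = 4 * hidden * hidden * 0.5 / tp    # FP4 weights per chip
    flop = 2 * 4 * hidden * hidden * batch / tp      # matmul FLOPs per chip
    # Ring all-reduce moves ~2*(tp-1)/tp of the activation bytes per chip.
    allreduce_s = 2 * (tp - 1) / tp * hidden * batch / chip.link_bw
    return max(flop / chip.flops, weight_bytes / chip.hbm_bw) + allreduce_s

chip = Chip()
t = layer_latency_s(chip, hidden=8192, tp=8, batch=1)
print(f"per-layer decode latency: {t * 1e6:.2f} us")
```

Even this crude model reproduces the key pre-silicon insight: at batch 1 the weight-streaming term dominates the compute term by orders of magnitude, so architects can size memory bandwidth and interconnect before any silicon exists.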