8 min read · Saved February 14, 2026
Do you care about this?
This article discusses the rapid evolution of AI infrastructure, focusing on the demand for advanced memory solutions like 16-Hi HBM and the implications for programming and robotics. It highlights how the increasing capabilities of AI models are outpacing current hardware, leading to a potential shift in how we leverage AI in various fields.
If you do, here's more
Ben Pouladian highlights a critical shift in AI architecture driven by advancements in memory technology, particularly the push for 16-Hi High Bandwidth Memory (HBM). Andrej Karpathy, known for his work on Tesla's Autopilot, expressed feeling behind as a programmer due to the rapid evolution of AI models. This sentiment coincides with NVIDIA's order for 16-Hi HBM from major suppliers like Samsung and SK Hynix, indicating a transition from research to mass production. Pouladian argues that this new infrastructure will enable significant improvements in AI processing capabilities, potentially making programmers ten times more effective.
The article outlines the challenges posed by the scaling of AI models, noting that demand for memory far outstrips current supply. For example, a 70-billion-parameter model like Llama 3 requires 140GB just for weights at 16-bit precision, with significantly more needed for user context and KV caching. The inefficiencies in AI inference underscore the urgent need for better memory solutions: because each generated token requires streaming the full model weights from memory, powerful GPUs like the H100 run at less than 1% compute utilization during token generation. Pouladian examines the trade-offs between HBM and SRAM, explaining how each type of memory plays a role in training and inference tasks.
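The figures above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes fp16 weights (2 bytes per parameter) and approximate published H100 SXM specs; it is illustrative, not the article's own calculation:

```python
# Back-of-the-envelope sizing for a 70B-parameter model.
PARAMS = 70e9
BYTES_PER_PARAM = 2            # fp16/bf16 weights
weight_bytes = PARAMS * BYTES_PER_PARAM  # -> 140 GB, matching the article

# Autoregressive decode reads every weight once per generated token,
# so single-stream throughput is bounded by memory bandwidth, not compute.
H100_BW = 3.35e12              # bytes/s, approx. H100 SXM HBM3 bandwidth
tokens_per_sec = H100_BW / weight_bytes  # ceiling at batch size 1

H100_FLOPS = 989e12            # approx. peak dense bf16 FLOP/s
flops_per_token = 2 * PARAMS   # ~2 FLOPs per parameter per token
utilization = (tokens_per_sec * flops_per_token) / H100_FLOPS
print(f"{weight_bytes/1e9:.0f} GB weights, "
      f"~{tokens_per_sec:.0f} tok/s ceiling, "
      f"{utilization:.1%} compute utilization")
```

The result (roughly 0.3% compute utilization) is consistent with the article's claim that H100s run at under 1% utilization during token generation: the chip spends almost all of its time waiting on memory.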
Key developments include NVIDIA's strategy to integrate 3D-stacked SRAM with HBM, which aims to overcome the limitations of traditional architectures. This approach allows for high capacity and bandwidth while maintaining low latency for inference. Pouladian describes how Groq's recent $20 billion deal with NVIDIA signifies a broader recognition of the benefits of SRAM-centric architectures. The focus is on creating a hybrid system that leverages both types of memory to optimize performance across various workloads, setting the stage for a new era in AI processing.