4 min read | Saved February 14, 2026
Do you care about this?
Nvidia introduced its Vera Rubin architecture, promising significant efficiency gains in AI workloads by reducing inference costs and GPU requirements. The system features six new chips, including advanced networking components, designed to enhance performance through improved GPU connectivity and in-network computing.
If you do, here's more
Nvidia launched its Vera Rubin architecture at CES, aiming to deliver significant cost and efficiency gains in AI workloads. The platform promises a tenfold decrease in inference costs and a fourfold drop in the number of GPUs needed to train certain models, compared with the previous Blackwell architecture. The Rubin GPU itself reaches 50 petaFLOPS of 4-bit computation for transformer-based tasks, a substantial upgrade from the 10 petaFLOPS offered by Blackwell.
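Taking the article's figures at face value, the per-GPU compute jump alone accounts for a 5x gain, so the quoted 10x inference-cost reduction presumably also reflects the networking and software changes described below. A quick sanity check on the stated numbers:

```python
# Back-of-the-envelope check on the figures quoted in the article.
rubin_fp4_pflops = 50      # Rubin GPU, 4-bit compute (per the article)
blackwell_fp4_pflops = 10  # Blackwell figure quoted in the article

compute_ratio = rubin_fp4_pflops / blackwell_fp4_pflops
print(f"Per-GPU 4-bit compute: {compute_ratio:.0f}x Blackwell")  # 5x

# The claimed 10x system-level inference-cost gain exceeds the raw
# compute ratio, implying the gains are not from the GPU alone.
inference_cost_reduction = 10
print(inference_cost_reduction > compute_ratio)  # True
```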
The architecture consists of six new chips: the Vera CPU, the Rubin GPU, and four networking chips designed to make those compute units work together more efficiently. Gilad Shainer, Nvidia’s senior VP of networking, emphasizes that how these units connect dramatically affects performance. The NVLink6 switch, which doubles GPU-to-GPU bandwidth to 3,600 GB/s, is key to the “scale-up network” that handles communication within a rack. Shainer explains that offloading certain computations onto the network itself, such as the all-reduce operation during training, saves both time and processing power.
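The all-reduce Shainer mentions sums each gradient element across every worker and hands the result back to all of them; in-network computing moves that summation into the switch rather than running it on the GPUs. A minimal sketch of what the operation computes (plain Python, not Nvidia's implementation):

```python
def all_reduce_sum(worker_grads):
    """All-reduce: every worker ends up with the elementwise sum of
    all workers' gradients. With in-network computing, the switch
    performs this summation as the data passes through it."""
    summed = [sum(vals) for vals in zip(*worker_grads)]
    return [list(summed) for _ in worker_grads]

# Three workers, each holding a local gradient vector
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = all_reduce_sum(grads)
print(result)  # every worker now holds [9.0, 12.0]
```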
For broader data center operations, the “scale-out network” connects multiple racks. It features the ConnectX-9 networking interface and the BlueField-4 data processing unit, which offloads networking and security tasks. The Spectrum-6 Ethernet switch raises bandwidth while reducing jitter, a source of inefficiency in distributed computing. Shainer notes that while the current architecture focuses on connections within a data center, the next challenge will be linking multiple data centers as workloads demand more GPUs, potentially exceeding 100,000 in a single facility.
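Jitter hurts distributed training because a synchronous step only finishes when the slowest rack's traffic arrives: the tail latency, not the average, sets the pace. A toy illustration with hypothetical latencies:

```python
# Synchronous distributed steps are gated by the slowest participant,
# so network jitter (latency variance) directly inflates step time.
steady_ms  = [1.00, 1.02, 0.99, 1.01]  # low-jitter network
jittery_ms = [1.00, 1.02, 0.99, 1.35]  # same mean region, one delayed flow

print(f"low jitter step:  {max(steady_ms)} ms")   # close to the average
print(f"high jitter step: {max(jittery_ms)} ms")  # one outlier gates everyone
```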