Alibaba Cloud has developed a GPU pooling system called Aegaeon that cuts the number of Nvidia GPUs required for large language model inference by 82%, allowing 213 GPUs to do the work of 1,192. Instead of dedicating hardware to individual models, Aegaeon virtualizes GPU access at the token level, sharing capacity across models so that utilization stays high even when demand fluctuates. The findings, published in a peer-reviewed paper, highlight how cloud providers can maximize GPU utilization in supply-constrained markets such as China.
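The token-level idea can be sketched in miniature: rather than holding a GPU for one model's request until it finishes, a scheduler hands out the device one token at a time, rotating among active requests. The Python below is an illustrative toy under that assumption only; the class and method names (`TokenScheduler`, `Request`, `step`) are invented here and are not taken from the Aegaeon paper.

```python
# Toy sketch of token-level multiplexing of one shared accelerator "slot"
# across requests for different models. Purely illustrative; not Aegaeon's
# actual design.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    model: str                  # which model this request targets
    remaining: int              # tokens still to generate
    output: list = field(default_factory=list)

class TokenScheduler:
    """Round-robin one shared GPU slot across requests, one token at a time.

    Switching between requests at token granularity keeps a single device
    busy even when any one model's demand is low, which is the intuition
    behind pooling GPUs instead of pinning each model to its own hardware.
    """
    def __init__(self):
        self.queue = deque()

    def submit(self, req: Request):
        self.queue.append(req)

    def step(self):
        # Generate exactly one token for the request at the head of the
        # queue, then rotate it to the back (token-level preemption).
        req = self.queue.popleft()
        req.output.append(f"{req.model}-tok{len(req.output)}")
        req.remaining -= 1
        if req.remaining > 0:
            self.queue.append(req)
        return req

    def run(self):
        while self.queue:
            self.step()

sched = TokenScheduler()
a = Request(model="model-A", remaining=3)
b = Request(model="model-B", remaining=2)
sched.submit(a)
sched.submit(b)
sched.run()
print(a.output)  # model-A's tokens, interleaved at token granularity with model-B's
print(b.output)
```

In this toy, the two requests take turns token by token on one slot; a real system would also have to manage weight loading and KV-cache state when switching between models, which is where the engineering difficulty lies.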