Quit Emailing Yourself

Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system— up to 9x increase in output lets 213 GPUs perform like 1,192 | Tom's Hardware

3 min read | Saved October 28, 2025 | Copied!

Do you care about this?

Alibaba Cloud has developed a new pooling system called Aegaeon that significantly reduces the number of Nvidia GPUs required for large language model inference by 82%, allowing 213 GPUs to perform like 1,192. This innovative approach virtualizes GPU access at the token level, enhancing overall output and efficiency during periods of fluctuating demand. The findings, which were published in a peer-reviewed paper, highlight the potential for cloud providers to maximize GPU utilization in constrained markets like China.

If you do, here's more

Click "Generate Summary" to create a detailed 2-4 paragraph summary of this article.

Questions about this article

No questions yet.