6 min read | Saved February 14, 2026
Do you care about this?
The article argues that current AI systems are underutilized and have significant room for improvement in both software and hardware efficiency. It critiques the belief that we are hitting computational limits and outlines paths forward, including better training efficiencies and new model designs.
If you do, here's more
Dan Fu argues that the path to Artificial General Intelligence (AGI) is still open, largely because current AI systems are inefficient and leave substantial computational resources untapped. He challenges Tim Dettmers' view that hardware limitations are significantly bottlenecking AGI progress. Fu points out that existing models, particularly Transformers, are far from maximally efficient: models like DeepSeek-V3 and Llama-4 achieve only around 20% model FLOP utilization (MFU) during training, whereas the earlier BLOOM reached 50%. This underutilization suggests there is ample room for improvement.
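The utilization figures above can be made concrete with the standard back-of-the-envelope MFU calculation: achieved training FLOP/s (roughly 6 FLOPs per parameter per trained token for a dense Transformer, forward plus backward) divided by the cluster's peak FLOP/s. A minimal sketch with illustrative numbers (the model size, token rate, and per-GPU peak here are assumptions, not figures from the article):

```python
def mfu(params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    """Model FLOP utilization: achieved training FLOP/s over hardware peak.

    Uses the ~6N FLOPs-per-token rule of thumb for dense Transformers
    (2N forward, 4N backward).
    """
    achieved = 6 * params * tokens_per_sec
    return achieved / (n_gpus * peak_flops_per_gpu)

# Illustrative: a 70B-parameter model training at 500k tokens/s
# on 1,000 GPUs, each with a nominal 1e15 FLOP/s peak.
print(f"MFU: {mfu(70e9, 5e5, 1000, 1e15):.0%}")  # -> MFU: 21%
```

Numbers in this range are why ~20% MFU is plausible for large runs; the same formula shows what closing the gap to 50% would buy.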
Fu highlights two critical areas: training and inference. In training, the current generation of models struggles with efficiency because of its architecture; mixture-of-experts (MoE) models in particular demand more communication and have lower arithmetic intensity. He proposes that newer hardware, such as Blackwell chips, could significantly boost performance if algorithms can be adapted to exploit the added FLOPs. On the inference side, the situation is even worse: some optimized implementations achieve less than 5% FLOP utilization. This gap arises largely from bottlenecks in moving data between memory tiers rather than from inherent hardware limits.
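The memory-bottleneck claim follows from a simple roofline argument: during autoregressive decoding, every step must stream all the weights from memory, while the arithmetic performed per step is only a couple of FLOPs per weight per sequence in the batch. A sketch under assumed hardware numbers (the 70B model size, 2-byte weights, 3.35 TB/s bandwidth, and 1e15 FLOP/s peak are illustrative, not from the article):

```python
def decode_utilization(params, bytes_per_param, mem_bw, peak_flops, batch):
    """Upper bound on FLOP utilization for weight-streaming decode.

    One decode step reads every weight once (memory time) and performs
    ~2 FLOPs per weight per sequence in the batch (compute work).
    """
    step_time = params * bytes_per_param / mem_bw  # seconds, memory-bound
    flops = 2 * params * batch                     # one matvec per sequence
    return (flops / step_time) / peak_flops

# Illustrative: 70B FP16 weights, 3.35 TB/s HBM, 1e15 FLOP/s peak.
for batch in (1, 8, 64):
    print(f"batch {batch:3d}: {decode_utilization(70e9, 2, 3.35e12, 1e15, batch):.1%}")
```

At small batch sizes the bound lands well under 5%, matching the figure Fu cites; only large batches (or weight reuse schemes) push utilization up, which is why the bottleneck is data movement, not raw FLOPs.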
To address these issues, Fu outlines several paths forward, emphasizing the co-design of machine learning architectures that exploit hardware more effectively. He points to ongoing work by researchers like Simran Arora and Songlin Yang on hardware-aware architectures and efficient attention mechanisms. He also advocates a shift to FP4 training, which could double the available FLOPs, and calls for inference-efficient model designs that raise hardware utilization. Overall, Fu's view is that while current models may not meet every definition of AGI, there is significant room to enhance their capabilities through better use of existing and emerging hardware.
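The FP4 claim rests on a general scaling pattern: on accelerators that support it, peak matmul throughput roughly doubles each time the element width halves, so moving from FP8 to FP4 doubles available FLOPs. A trivial sketch of that relationship (the ratios are an assumption about idealized tensor-core scaling, not any specific chip's spec sheet):

```python
def relative_peak(bits, base_bits=16):
    """Idealized peak-throughput multiplier when halving precision:
    throughput scales inversely with element width."""
    return base_bits / bits

for bits in (16, 8, 4):
    print(f"FP{bits}: {relative_peak(bits):.0f}x FP16 peak throughput")
```

Under this idealization, FP4 offers 2x the FLOPs of FP8; the open question Fu raises is whether training algorithms can remain stable at that precision.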