2 min read | Saved February 14, 2026
Do you care about this?
This article summarizes a study of how efficiently local AI models perform compared to centralized cloud infrastructure. The study introduces a metric called intelligence per watt (IPW) to jointly evaluate local models' accuracy and energy use. The findings indicate that local models can now accurately handle a large share of everyday queries, though cloud accelerators still achieve higher IPW than local ones, leaving substantial room for local optimization.
If you do, here's more
Large language model (LLM) queries currently rely heavily on centralized cloud infrastructure, which is struggling to keep up with increasing demand. However, recent advancements suggest a shift might be possible. Smaller language models, those with 20 billion or fewer active parameters, are now showing competitive performance compared to larger models. Furthermore, local accelerators, like the Apple M4 Max, can run these models with low latency, prompting a reevaluation of how LLMs are deployed and accessed.
The study introduces intelligence per watt (IPW), a metric that measures the capability and efficiency of local inference by dividing task accuracy by power consumption. Researchers profiled more than 20 state-of-the-art local LMs on 8 different accelerators across 1 million real-world chat and reasoning queries. Local LMs could accurately respond to about 88.7% of these queries, with performance varying by domain. From 2023 to 2025, IPW improved 5.3-fold, and coverage, the share of queries local models could handle accurately, rose from 23.2% to 71.3%.
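The IPW calculation described above can be sketched in a few lines. This is a minimal illustration of the ratio, not the study's actual profiling protocol; the function name and the wattage figure are assumptions for the example.

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Compute IPW: task accuracy divided by average power draw.

    `accuracy` is the fraction of queries answered correctly (0.0-1.0);
    `avg_power_watts` is the mean power consumed during inference.
    The exact measurement protocol is defined by the study's profiling
    tools; this function only illustrates the ratio itself.
    """
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return accuracy / avg_power_watts

# Hypothetical numbers: a local accelerator answering 88.7% of queries
# correctly while drawing 40 W on average (the 40 W figure is invented
# for illustration).
local_ipw = intelligence_per_watt(0.887, 40.0)

# A cloud accelerator running the same model with 1.4x higher IPW,
# matching the gap the study reports.
cloud_ipw = 1.4 * local_ipw
assert cloud_ipw > local_ipw
```

Because IPW is a simple ratio, it rises either when a model answers more queries correctly at the same power draw or when the same accuracy is delivered at lower power, which is why both model and hardware improvements contribute to the reported 5.3-fold gain.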
Notably, local accelerators achieved at least 1.4 times lower IPW than cloud accelerators running the same models, indicating substantial headroom for optimizing local systems. The findings suggest that routing suitable queries to local inference could significantly ease the burden on centralized infrastructure. The researchers have released their IPW profiling tools to support standardized benchmarking and further exploration in this area.