5 min read | Saved February 14, 2026
Do you care about this?
This article discusses Autocomp, a framework that uses large language models to optimize code for tensor accelerators. It highlights how Autocomp-generated kernels outperform expert-written code, particularly on AWS Trainium, and examines both the challenges of programming tensor accelerators and the optimizations required for good performance.
If you do, here's more
Autocomp is a new framework for optimizing code for low-resource tensor accelerators, developed by a team at UC Berkeley's SLICE Lab. The framework leverages large language models (LLMs) to write kernel code that outperforms human experts' implementations, achieving speedups of up to 17 times on AWS Trainium. This matters because tensor accelerators, despite their promise, often struggle to gain traction due to immature software ecosystems. The difficulty of writing software for these accelerators stems from their unique programming models, which require custom kernels and compilers that are slow and error-prone to develop.
Programming these accelerators is not straightforward. Unlike general-purpose CPUs, tensor accelerators are built around fixed-size matrix multiplication units. Optimization strategies aim to minimize data movement between memory levels and maximize computational efficiency. Techniques include scheduling operations so that computation overlaps with data movement, and applying code transformations such as loop tiling and software pipelining. Autocomp addresses these challenges by prompting an LLM to select optimizations from a predefined menu, using detailed hardware performance feedback to guide its decisions.
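To make the loop-tiling transformation mentioned above concrete, here is a minimal sketch of a tiled matrix multiply in plain Python. The tile size and pure-Python loops are illustrative assumptions only; on a real tensor accelerator the tiles would be sized to the hardware's fixed matrix unit and local memory, and the inner loops would be replaced by accelerator instructions.

```python
def tiled_matmul(A, B, n, tile=4):
    """Multiply two n x n matrices (lists of lists) using loop tiling.

    Tiling reorders the loops so each block of A and B is reused many
    times while it is "resident" - the same locality idea that, on an
    accelerator, keeps operands in fast local memory.
    """
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):              # loop over output row tiles
        for j0 in range(0, n, tile):          # loop over output column tiles
            for k0 in range(0, n, tile):      # loop over reduction tiles
                # Compute the partial product for this tile triple.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

The result is identical to a naive triple loop; only the iteration order (and hence data reuse) changes, which is why tiling is a pure performance transformation.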
The framework’s portability is another notable feature: it has been applied effectively across platforms ranging from academic accelerators like Gemmini to industry hardware like AWS Trainium. The implementation uses a two-phase process: first planning which optimization to apply, then executing it. By measuring local memory utilization, Autocomp can select optimizations that improve performance while avoiding unnecessary complexity when memory is already fully utilized. The focus on practical targets, especially AWS Trainium, highlights Autocomp's role in bridging the software gap that has hindered the widespread adoption of tensor accelerators.
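The two-phase loop described above can be sketched as follows. This is a hedged illustration, not Autocomp's actual API: the `query_llm` callback, the optimization menu, and the 90% utilization threshold are all hypothetical stand-ins for the article's description of planning, executing, and memory-aware optimization selection.

```python
# Illustrative optimization menu; the real framework's menu and
# prompts are not specified in the article.
OPT_MENU = ["loop tiling", "software pipelining", "loop reordering"]

def plan(kernel_src, mem_utilization, query_llm):
    """Phase 1: ask the model to pick one optimization from a menu.

    When local memory is already nearly full, drop memory-hungry
    transforms from the menu (threshold is an assumption).
    """
    menu = OPT_MENU if mem_utilization < 0.9 else ["loop reordering"]
    return query_llm(f"Pick one optimization from {menu} for:\n{kernel_src}")

def execute(kernel_src, chosen_opt, query_llm):
    """Phase 2: ask the model to rewrite the kernel with that optimization."""
    return query_llm(f"Apply {chosen_opt} to this kernel:\n{kernel_src}")

def optimize_step(kernel_src, mem_utilization, query_llm):
    """One plan-then-execute iteration; in practice the rewritten kernel
    would be benchmarked and the feedback fed into the next iteration."""
    choice = plan(kernel_src, mem_utilization, query_llm)
    return execute(kernel_src, choice, query_llm)
```

Separating planning from execution keeps each prompt small and focused, and lets hardware feedback (here, the memory-utilization number) constrain the choice before any code is rewritten.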