Power Attention is an open-source implementation of the core operation of symmetric power transformers, enabling efficient training and inference on long-context sequences. It serves as a drop-in replacement for standard and linear attention, improving loss-per-FLOP relative to both. Its adjustable degree hyperparameter tunes the balance between weight FLOPs and state FLOPs, improving scalability and learning efficiency.
Topics: power-attention, transformers, machine-learning, open-source, long-context
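The quadratic-time form of the operation is straightforward to sketch: attention scores are raised to an even degree p instead of being exponentiated, then normalized row-wise. The PyTorch snippet below is only an illustrative reference for that computation, not the package's fused CUDA kernels or its actual API; the function name `power_attention_reference` and its argument layout are assumptions made for the example.

```python
# Minimal sketch of degree-p power attention (quadratic-time reference).
# Not the library's API or kernels; names and shapes are illustrative.
import torch

def power_attention_reference(q, k, v, p=2):
    """Causal power attention: weights proportional to (q . k)^p.

    q, k, v: (batch, seq_len, heads, head_dim) tensors; p: even degree.
    """
    b, t, h, d = q.shape
    # Raw similarities, scaled as in standard dot-product attention.
    scores = torch.einsum("bthd,bshd->bhts", q, k) / d**0.5
    # Raise similarities to the degree p instead of exponentiating them.
    scores = scores**p
    # Causal mask: position t may only attend to positions s <= t.
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~mask, 0.0)
    # Normalize rows so the weights sum to one (epsilon avoids 0/0).
    weights = scores / scores.sum(dim=-1, keepdim=True).clamp_min(1e-6)
    return torch.einsum("bhts,bshd->bthd", weights, v)

# Drop-in style call on random inputs.
q, k, v = (torch.randn(2, 128, 4, 64) for _ in range(3))
out = power_attention_reference(q, k, v, p=2)
print(out.shape)  # torch.Size([2, 128, 4, 64])
```

Because (q · k)^p factors through a symmetric tensor-power feature map, the same operation also admits a chunked linear-attention form whose recurrent state grows with p, which is where the weight-FLOP/state-FLOP trade-off mentioned above comes from.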