Power Attention is an open-source implementation of the core operation of symmetric power transformers, built for efficient training and inference on long-context sequences. It works as a drop-in replacement for other attention variants and improves loss-per-FLOP relative to both standard (softmax) and linear attention. An adjustable degree hyperparameter controls the size of the state, letting the balance between weight FLOPs and state FLOPs be tuned for better scalability and learning efficiency.
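
To make the "drop-in replacement" idea concrete, the sketch below shows where such a kernel would slot into an ordinary PyTorch multi-head attention block. The `power_attention` function, its signature, and the `deg` argument are placeholders standing in for whatever interface the library actually exposes; they are assumptions for illustration, not the real API.

```python
# Minimal usage sketch. `power_attention` and its `deg` keyword are hypothetical
# placeholders, NOT the library's actual API; the body below falls back to plain
# softmax attention so the example runs end to end.
import torch
import torch.nn as nn


def power_attention(q, k, v, deg=2):
    """Placeholder for the fused kernel; real code would use degree `deg`."""
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    return torch.einsum("bhqk,bhkd->bhqd", scores.softmax(dim=-1), v)


class SelfAttention(nn.Module):
    """Standard multi-head self-attention with the core op swapped out."""

    def __init__(self, dim, heads=8, deg=2):
        super().__init__()
        self.heads, self.deg = heads, deg
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, head_dim).
        q, k, v = (z.view(b, t, self.heads, -1).transpose(1, 2) for z in (q, k, v))
        # This call is the drop-in replacement point: swap softmax attention
        # for the power attention kernel without touching the rest of the block.
        out = power_attention(q, k, v, deg=self.deg)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))


if __name__ == "__main__":
    attn = SelfAttention(dim=64, heads=4, deg=2)
    y = attn(torch.randn(2, 128, 64))
    print(y.shape)  # torch.Size([2, 128, 64])
```

The point of the sketch is the single call site: because the rest of the module (projections, reshapes, output projection) is unchanged, only the core attention operation needs to be replaced, and the degree hyperparameter becomes one extra knob on that call.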