Quit Emailing Yourself

# optimization → ptxas

1 link tagged with all of: optimization + ptxas


Links

Maybe consider putting "cutlass" in your CUDA/Triton kernels

This article explores an unusual optimization: adding "cutlass" to a CUDA kernel's name can significantly increase throughput, sometimes by over 100 TFLOPS. It discusses the underlying mechanics of the effect and how it varies across architectures and projects, and emphasizes the importance of benchmarking.

Saved by tldr-importer · Last saved February 14, 2026 · 6 min read

Tags: cuda, cutlass, optimization, performance, ptxas