The article recounts a PyTorch bug that caused the training loss to plateau. The author first attributed the plateau to user error but ultimately traced it to a GPU kernel bug in the MPS backend for Apple Silicon. The investigation deepened the author's understanding of PyTorch internals and illustrates how much debugging and exploration contribute to mastering the framework. A minimal reproduction script is provided for readers interested in the issue.
pytorch
debugging
gpu