Links
This gist provides a single-file, dependency-free implementation of a GPT-style transformer, complete with a custom autograd engine, an Adam training loop, and an inference routine. It trains on a list of names to demonstrate the core algorithm, and also includes a brief benchmark discussion of a GPU-based microgpt.cu variant.
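To give a feel for what a "custom autograd engine" in plain Python looks like, here is a minimal, illustrative sketch of a scalar value that records its operations and backpropagates gradients. The class and method names (`Value`, `backward`) are assumptions for illustration, not the gist's actual API, and it covers only a few ops rather than everything a GPT needs.

```python
# Minimal sketch of a scalar autograd value (illustrative only; not the gist's code).
import math

class Value:
    """A scalar that records the ops applied to it so gradients can flow back."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: loss = (x * w + b).tanh(); loss.backward() fills x.grad, w.grad, b.grad.
```

An optimizer such as Adam then only needs each parameter's `data` and `grad`, which is why a scalar engine this small is enough to train a tiny transformer end to end.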
This article breaks down Andrej Karpathy’s zero-dependency, 243-line GPT implementation in plain Python. It explains how each part (tokenizer, autograd engine, embeddings, attention mechanism, residual connections, and MLP) mirrors its counterpart in a full-scale transformer, all trained on a tiny dataset of baby names.
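As a companion to the attention part of that breakdown, the following is a small, dependency-free sketch of single-head causal self-attention over plain Python lists. The helper names (`softmax`, `attend`) are hypothetical and chosen for readability; they are not taken from Karpathy's file.

```python
# Illustrative single-head causal attention over lists of vectors (not the article's code).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(queries, keys, values):
    """For each position t, mix values[0..t] weighted by query-key similarity."""
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        # Causal mask: position t may only look at positions <= t.
        scores = [sum(qi * ki for qi, ki in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(t + 1)]
        weights = softmax(scores)
        mixed = [sum(w * values[j][i] for j, w in enumerate(weights))
                 for i in range(d)]
        out.append(mixed)
    return out

# Usage: with projected token embeddings as queries, keys, and values,
# attend(q, k, v) returns one context-mixed vector per position, which is the
# role the attention block plays inside the GPT.
```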