Links
This article breaks down Andrej Karpathy’s zero-dependency, 243-line GPT implementation in plain Python. It explains how each part—tokenizer, autograd engine, embeddings, attention mechanism, residual connections, and MLP—mirrors a full-scale transformer on a tiny dataset of baby names.
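To give a flavor of the autograd engine the article covers, here is a minimal scalar autograd sketch in the spirit of Karpathy's micrograd. This is an illustrative reconstruction, not the article's actual 243-line code; all names here (`Value`, `backward`) are assumptions for the sketch.

```python
# Minimal scalar autograd sketch (illustrative; not the article's actual code).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Example: z = x*y + x, so dz/dx = y + 1 and dz/dy = x
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Each operation records a closure that propagates gradients to its inputs; calling `backward()` on the output replays those closures in reverse topological order, which is the same mechanism that trains the full transformer in the article.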