1 link tagged with all of: python + transformers + micro-gpt
Links
This article breaks down Andrej Karpathy’s zero-dependency, 243-line GPT implementation in plain Python. It explains how each component (tokenizer, autograd engine, embeddings, attention mechanism, residual connections, and MLP) mirrors its counterpart in a full-scale transformer, trained on a tiny dataset of baby names.