Links
This article breaks down Andrej Karpathy’s zero-dependency, 243-line GPT implementation in plain Python. It explains how each part—tokenizer, autograd engine, embeddings, attention mechanism, residual connections, and MLP—mirrors a full-scale transformer on a tiny dataset of baby names.
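As a flavor of the tokenizer component described above, here is a minimal character-level tokenizer sketch for a tiny names dataset. The sample names and helper names are illustrative assumptions, not code from Karpathy's implementation:

```python
# Minimal character-level tokenizer sketch for a baby-names dataset.
# Illustrative only; not the actual code from the 243-line GPT.
names = ["emma", "olivia", "ava"]  # hypothetical sample data

# Build the vocabulary from every character seen, plus an end-of-name token.
chars = sorted(set("".join(names)))
vocab = chars + ["<eos>"]
stoi = {ch: i for i, ch in enumerate(vocab)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

def encode(name):
    """Map a name to a list of token ids, ending with <eos>."""
    return [stoi[ch] for ch in name] + [stoi["<eos>"]]

def decode(ids):
    """Map token ids back to a string, dropping <eos>."""
    return "".join(itos[i] for i in ids if itos[i] != "<eos>")

print(decode(encode("ava")))  # round-trips back to "ava"
```

A tokenizer like this turns each name into a sequence of integers the embedding table can index, which is all a character-level GPT needs.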
Pingkit is a toolkit for training reproducible, capacity-aware models on transformer activations. It provides utilities for extracting embeddings, training neural architectures, and building custom probes tailored to specific research needs, and it integrates with Hugging Face models for data processing and model training.