
Educational PyTorch re-implementation of GPT training and inference (fork of Karpathy's minGPT)
A clean, readable PyTorch re-implementation of GPT (training and inference) based on Andrej Karpathy's minGPT. The core model is ~300 lines of code: a standard Transformer decoder (masked self-attention, feed-forward, layer norm) with a BPE tokeniser matching OpenAI's GPT-2 encoding. Includes demo notebooks for a sorting task and GPT-2 text generation, a character-level language model project, and an addition task trained from scratch.
~300-line educational PyTorch GPT implementation for learning Transformer internals.
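The masked self-attention at the heart of the model can be sketched in plain Python. This is an illustration only, not the code in mingpt/model.py (which implements a batched, multi-head PyTorch version with learned query/key/value projections); here a single head attends over the raw input vectors:

```python
import math

def causal_self_attention(x, d_k):
    """Single-head scaled dot-product attention with a causal mask.

    x: list of T vectors (each a list of d_k floats), used directly as
    queries, keys, and values (no learned projections, for clarity).
    Position t may only attend to positions <= t.
    """
    T = len(x)
    out = []
    for t in range(T):
        # scores against all positions, masking out the future
        scores = []
        for s in range(T):
            if s <= t:
                dot = sum(a * b for a, b in zip(x[t], x[s]))
                scores.append(dot / math.sqrt(d_k))
            else:
                scores.append(float("-inf"))  # masked: softmax weight -> 0
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output is the attention-weighted sum of the value vectors
        out.append([sum(w * x[s][i] for s, w in enumerate(weights))
                    for i in range(d_k)])
    return out
```

The causal mask is what makes the decoder autoregressive: the first position can only attend to itself, so its output equals its input vector.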
• Model (mingpt/model.py): decoder-only Transformer with masked self-attention heads, feed-forward layers, layer norm, and configurable depth/width. Supports gpt2, gpt2-medium, gpt2-large, gpt2-xl presets.
• BPE tokeniser (mingpt/bpe.py): byte-pair encoding matching OpenAI's GPT-2 vocabulary (50,257 tokens: 50,000 learned merges plus 256 byte tokens and one end-of-text token).
• Trainer (mingpt/trainer.py): generic PyTorch training loop; AdamW optimiser; configurable learning rate, batch size, and max_iters.
• Projects: adder (trains GPT to add numbers), chargpt (character-level LM on arbitrary text), demo.ipynb (sorting example), generate.ipynb (GPT-2 text generation from prompt).
• Installation: pip install -e . for use as an importable mingpt library.
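The byte-pair-encoding idea behind the tokeniser can be illustrated with a toy merge loop. This is a simplification: mingpt/bpe.py applies GPT-2's fixed, pre-learned merge table (downloaded from OpenAI) rather than the hypothetical rule list used here:

```python
def bpe_merge(tokens, merges):
    """Greedily apply a ranked list of merge rules to a token sequence.

    tokens: list of string symbols, e.g. ["h", "e", "l", "l", "o"]
    merges: list of symbol pairs in priority order (lower index = applied first)
    """
    merged = list(tokens)
    while True:
        # find the highest-priority merge rule present in the sequence
        best = None
        for rank, pair in enumerate(merges):
            for i in range(len(merged) - 1):
                if (merged[i], merged[i + 1]) == pair:
                    if best is None or rank < best[0]:
                        best = (rank, i, pair)
                    break  # leftmost occurrence of this pair is enough
        if best is None:
            return merged  # no rule applies: tokenisation is final
        _, i, (a, b) = best
        merged[i:i + 2] = [a + b]  # fuse the pair into one symbol
```

With merges [("l", "l"), ("h", "e"), ("he", "ll")], the word "hello" collapses step by step into ["hell", "o"]; GPT-2's real table drives the same loop with 50,000 learned rules over raw bytes.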
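The AdamW update used by the trainer can likewise be sketched for a single scalar parameter (a teaching sketch; the real trainer uses torch.optim.AdamW, and the hyperparameter defaults below are illustrative assumptions):

```python
import math

def adamw_step(w, g, m, v, t, lr=3e-4, beta1=0.9, beta2=0.95,
               eps=1e-8, weight_decay=0.1):
    """One AdamW update for a single scalar parameter.

    w: parameter, g: gradient, (m, v): running first/second moment
    estimates, t: 1-based step count. Returns updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * g          # momentum on the gradient
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared grads
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    # decoupled weight decay: applied to w directly, not via the gradient
    w -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

Decoupling the weight decay from the adaptive gradient scaling is what distinguishes AdamW from plain Adam with L2 regularisation.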