Like many, I was inspired by Andrej Karpathy’s excellent "Let's build GPT" walkthrough. It made the idea of training an LLM from scratch feel approachable and concrete. So I set myself a challenge: recreate the classic GPT-2 Small (124M parameters), pretrain it on a carefully curated dataset, and then instruction-tune it into something that feels like a real assistant.
In this post, I’ll walk through what I built, the reasoning behind my choices, and the pitfalls I ran into.
Table of contents
- Goals & constraints
- Pretraining
- Instruction Finetuning
- Things that went wrong
- Response Gallery: Hits & Misses
1) Goals & constraints
My main goal was to follow Andrej's lead but build something of my own, guided by a few key principles:
- Build it from scratch: I wanted to implement most of the core components myself for a deeper understanding.
- Keep it practical: The project had to be feasible on a modest hardware budget, with a hard time constraint: training had to finish in days, not weeks.
- Make it genuinely useful: The final model couldn't just be a toy. For me, "useful" meant it could:
  - Construct English sentences without major grammatical flaws.
  - Generate a coherent, sensible response to a query.
  - Factual accuracy, however, was not a primary concern.
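
Concretely, "GPT-2 Small" means the standard published hyperparameters: 12 transformer blocks, 12 attention heads, a 768-dimensional hidden state, a 1024-token context window, and a 50,257-token BPE vocabulary. The sketch below is my own back-of-the-envelope check (the `GPTConfig` name is mine, not from any particular library) showing how those numbers add up to roughly 124M parameters once the output head is tied to the token embedding, as in the original GPT-2.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Standard GPT-2 Small hyperparameters
    vocab_size: int = 50257   # BPE vocabulary size
    block_size: int = 1024    # maximum context length
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension

def approx_param_count(cfg: GPTConfig) -> int:
    """Rough parameter count, assuming the output head is
    weight-tied to the token embedding (as in GPT-2)."""
    d = cfg.n_embd
    embeddings = cfg.vocab_size * d + cfg.block_size * d
    per_block = (
        (d * 3 * d + 3 * d)    # fused QKV projection + bias
        + (d * d + d)          # attention output projection
        + (d * 4 * d + 4 * d)  # MLP up-projection
        + (4 * d * d + d)      # MLP down-projection
        + 2 * 2 * d            # two LayerNorms (scale + shift)
    )
    final_ln = 2 * d
    return embeddings + cfg.n_layer * per_block + final_ln

print(f"{approx_param_count(GPTConfig()) / 1e6:.1f}M parameters")  # ~124.5M
```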
2) Pretraining