Like many, I was inspired by Andrej Karpathy’s excellent "Let's build GPT" walkthrough. It made the idea of training an LLM from scratch feel approachable and concrete. Spurred on by that, I set myself a challenge: recreate the classic GPT-2 Small (124M parameters), pretrain it on a carefully curated dataset, and then instruction-tune it into something that feels like a real assistant.
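
To make the target concrete, this is roughly what "GPT-2 Small" means architecturally. The sketch below is for reference only (the `GPT2Config` name is my own, not from any particular library):

```python
from dataclasses import dataclass

@dataclass
class GPT2Config:
    # Standard GPT-2 Small hyperparameters (~124M parameters total)
    vocab_size: int = 50257   # GPT-2's BPE vocabulary size
    block_size: int = 1024    # maximum context length in tokens
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension
```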

In this post, I’ll walk through what I built, the reasoning behind my choices, and the pitfalls I ran into.


Table of contents

  1. Goals & constraints
  2. Pretraining
  3. Instruction Finetuning
  4. Things that went wrong
  5. Response Gallery: Hits & Misses

1) Goals & constraints

My main goal was to follow Andrej's lead while building something of my own, guided by a few key principles:


2) Pretraining