Like many, I was inspired by Andrej Karpathy’s excellent "Let's build GPT" walkthrough. It made the idea of training an LLM from scratch feel approachable and concrete. So I set myself a challenge: recreate the classic GPT-2 Small (124M parameters), pretrain it on a carefully curated dataset, and then instruction-tune it into something that feels like a real assistant.
In this post, I’ll walk through what I built, the reasoning behind my choices, and the pitfalls I ran into.
Table of contents
- Goals & constraints
- Pretraining
- Instruction Finetuning
- Things that went wrong
- Response Gallery: Hits & Misses
1) Goals & constraints
My main goal was to follow Andrej's lead but build something of my own, guided by a few key principles:
- Build it from scratch: I wanted to implement most of the core components myself for a deeper understanding.
- Keep it practical: The project had to be feasible on a modest hardware budget, with a hard time constraint: training had to finish in days, not weeks.
- Make it genuinely useful: The final model couldn't just be a toy. For me, "useful" meant it could:
  - Construct English sentences without major grammatical flaws.
  - Generate a coherent, sensible response to a query.
  - Factual accuracy, however, was not a primary concern.
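
Concretely, "GPT-2 Small" means the standard published hyperparameters: 12 transformer blocks, 12 attention heads, a 768-dimensional hidden state, a 1024-token context window, and a 50,257-token BPE vocabulary. The sketch below is my own back-of-the-envelope check (the `GPTConfig` name is mine, not from any particular library) showing how those numbers add up to roughly 124M parameters once the output head is tied to the token embedding, as in the original GPT-2.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Standard GPT-2 Small hyperparameters
    vocab_size: int = 50257   # BPE vocabulary size
    block_size: int = 1024    # maximum context length
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension

def approx_param_count(cfg: GPTConfig) -> int:
    """Rough parameter count, assuming the output head is
    weight-tied to the token embedding (as in GPT-2)."""
    d = cfg.n_embd
    embeddings = cfg.vocab_size * d + cfg.block_size * d
    per_block = (
        (d * 3 * d + 3 * d)    # fused QKV projection + bias
        + (d * d + d)          # attention output projection
        + (d * 4 * d + 4 * d)  # MLP up-projection
        + (4 * d * d + d)      # MLP down-projection
        + 2 * 2 * d            # two LayerNorms (scale + shift)
    )
    final_ln = 2 * d
    return embeddings + cfg.n_layer * per_block + final_ln

print(f"{approx_param_count(GPTConfig()) / 1e6:.1f}M parameters")  # ~124.5M
```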
2) Pretraining