Architecture Fundamentals
Explore transformer layers, attention mechanisms, residual streams, and MLPs to understand how language models are built
A hands-on exploration of LLM internals - with Claude as tutor, prioritizing understanding over answers
This is an active learning project, building understanding through code and experimentation. The focus is on smaller models (GPT-2, Pythia, Llama-2-7B) that fit on a 12GB GPU, allowing for local development and hands-on exploration.
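A back-of-envelope sketch of why these particular models were chosen: weight memory alone largely determines what fits on a 12GB card. The parameter counts below are the published model sizes; the bytes-per-parameter figures are assumptions about precision (activations and KV cache add further overhead not counted here).

```python
# Rough GPU memory estimate for the models used in this project.
# Weight memory = parameter count * bytes per parameter; this ignores
# activations, optimizer state, and KV cache (a deliberate simplification).

GPU_GB = 12

models = {
    # name: (parameters, bytes per parameter)
    "gpt2-small (fp16)":  (124e6, 2),
    "pythia-1.4b (fp16)": (1.4e9, 2),
    "llama-2-7b (fp16)":  (7e9, 2),
    "llama-2-7b (int8)":  (7e9, 1),
}

def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GB."""
    return params * bytes_per_param / 1e9

for name, (params, bpp) in models.items():
    gb = weight_gb(params, bpp)
    verdict = "fits" if gb < GPU_GB else "needs quantization or offload"
    print(f"{name:22s} {gb:6.2f} GB -> {verdict}")
```

Note that Llama-2-7B in half precision is already about 14 GB of weights, so on a 12GB GPU it only fits with 8-bit (or lower) quantization.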
This project uses Claude not just as a coding assistant, but as a Socratic tutor - prioritizing understanding over answers.
Rather than leading with lectures and explanations, Claude opens with questions that probe your existing mental model. For example:
Instead of:
"An autoencoder has an encoder that compresses input to a latent space and a decoder that reconstructs it..."
Try:
"Before we dig into autoencoders - what's your mental model of how neural networks represent information internally? What do you think happens to the input as it passes through layers?"
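To make the "compress to a latent space, then reconstruct" idea from the exchange above concrete, here is a minimal linear autoencoder forward pass in NumPy. The weights are random for illustration only; in practice they would be trained to minimize reconstruction error, and the dimensions here are arbitrary choices.

```python
# A minimal linear autoencoder forward pass: the encoder projects the input
# down to a smaller latent space, the decoder projects it back up.
import numpy as np

rng = np.random.default_rng(0)

d_input, d_latent = 16, 4  # latent space is 4x smaller than the input

W_enc = rng.normal(size=(d_input, d_latent)) / np.sqrt(d_input)
W_dec = rng.normal(size=(d_latent, d_input)) / np.sqrt(d_latent)

def encode(x):
    """Compress the input into the lower-dimensional latent space."""
    return x @ W_enc

def decode(z):
    """Reconstruct an input-sized vector from the latent code."""
    return z @ W_dec

x = rng.normal(size=(1, d_input))
z = encode(x)        # shape (1, 4)  -- the bottleneck
x_hat = decode(z)    # shape (1, 16) -- same shape as the input

print(z.shape, x_hat.shape)
```

The bottleneck is the whole point: anything the decoder can reconstruct must have survived the compression, which is the first hint at how networks represent information internally.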
Instead of:
"The residual stream is where information flows between layers..."
Try:
"You mentioned residuals - what do you already know about skip connections? What problem do you think they solve?"
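The question above has a one-line answer in code. The sketch below uses a toy stand-in for an attention or MLP block (the `layer` function is hypothetical, not any real model's) to show what a skip connection preserves: with `x + f(x)`, the input survives even when the layer contributes nothing.

```python
# Why skip connections matter: each block *adds* its output to a running
# residual stream (x + f(x)), so earlier information is preserved even if
# a layer's contribution is zero.
import numpy as np

def layer(x, scale):
    """Toy stand-in for an attention or MLP block (hypothetical)."""
    return scale * np.tanh(x)

x = np.array([1.0, -2.0, 0.5])  # the residual stream entering the block

# Without a skip connection, a "dead" layer (scale=0) destroys the input:
no_skip = layer(x, scale=0.0)

# With a skip connection, the input flows through unchanged:
with_skip = x + layer(x, scale=0.0)

print(no_skip)    # all zeros
print(with_skip)  # identical to x
```

This is the intuition behind calling it a residual *stream*: layers read from it and write increments back into it, rather than replacing it wholesale.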
The goal isn't just working code - it's understanding deep enough to ask the next question yourself. This approach builds intuition and mental models through experiment-driven learning.
Read the full tutoring guidelines →