Building Neural Networks from Scratch: An Overview of Andrej Karpathy's Course
Andrej Karpathy's free video series walks learners through the fundamentals of neural networks, starting with back-propagation and culminating in modern transformer architectures such as GPT. The curriculum is tightly focused on language modeling, offering hands-on code in PyTorch while conveying best practices in training, debugging, and scaling deep nets.
Andrej Karpathy's curriculum is a rare blend of theoretical depth and practical implementation. Designed for self-study, the series builds a toy neural-network framework from first principles before layering on the advanced concepts that underlie state-of-the-art language models like GPT-3.
**Why Language Modeling?**
Language models embody a rich set of learning problems (sequence modeling, discrete data representation, and probabilistic inference) that serve as an excellent microcosm for any deep-learning endeavour. The techniques Karpathy teaches are immediately transferable to other domains such as computer vision or reinforcement learning.
**Prerequisites**
* Solid Python programming (proficiency in NumPy, data structures, and scripting)
* Introductory calculus (derivatives, Gaussian distribution basics)
Everything else is introduced on the fly.
## Syllabus Overview
- **2h25m – Foundations of Back-propagation**
* A meticulous, step-by-step walkthrough of gradient descent, loss surfaces, and the mechanics of back-propagation (see the chain-rule sketch after this syllabus).
* No prior deep-learning knowledge required; relies only on basic Python and high-school calculus.
- **1h57m – Bigram Character-Level Language Model**
* Introduces the core `torch.Tensor` API and explains how efficient tensor operations underpin neural-network training and inference.
* Lays out the end-to-end language-model pipeline: training loop, stochastic optimisation, loss calculation (negative log-likelihood), sampling, and evaluation (see the bigram sketch below).
- **1h15m – Multilayer Perceptron (MLP) Language Model**
* Adds a hidden layer to the bigram model, highlighting key ML concepts such as learning-rate tuning, hyperparameter selection, train/dev/test splits, over- and under-fitting, and model diagnostics (see the MLP sketch below).
- **1h55m – Internals of Deep MLPs & Batch Normalization**
* Deconstructs forward activations, backward gradients, and scaling issues in multi-layer networks.
* Demonstrates common diagnostic visualisations and introduces Batch Normalization as a technique for stabilising the training of deep networks (see the batch-norm sketch below).
- **1h55m – Manual Back-propagation of a 2-Layer MLP**
* Performs gradient calculations by hand through every node (loss, linear layers, activation, batch norm, embedding table).
* Builds an intuitive understanding of gradient flow and computational-graph efficiency without relying on autograd (see the manual-backprop sketch below).
- **56m – Deepening the MLP into a WaveNet-Inspired CNN**
* Deepens the flat MLP into a hierarchical, tree-like architecture resembling DeepMind's WaveNet, a dilated causal 1-D convolutional network.
* Illustrates the practical workflow of building `torch.nn` modules, managing tensor shapes, and iterating between notebooks and production-ready code (see the WaveNet-style sketch below).
- **1h56m – Generative Pre-trained Transformer (GPT)**
* Derives the architecture from the seminal "Attention Is All You Need" paper and maps it to OpenAI's GPT-2/3 lineage (see the self-attention sketch below).
* Contextualises the design choices behind autoregressive language modeling and ties them to the hype around ChatGPT and GitHub Copilot.
- **2h13m – Tokenizer Design from Scratch**
* Dissects the Byte Pair Encoding (BPE) tokenisation pipeline used in the GPT series.
* Shows how tokenisation is a modular, independently trained component that can profoundly influence LLM behaviour (see the BPE sketch below).
- **Ongoing Updates**
* The course is continually expanded to cover new techniques and refinements, ensuring that students stay abreast of the latest in transformer research.
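**Illustrative Code Sketches**
The snippets below are not taken from the course itself; they are minimal sketches written for this overview to make each module's central idea concrete. All names, sizes, and data are illustrative assumptions, and the videos develop each topic in far greater depth.

The first sketch mirrors the back-propagation module: the gradient of a tiny one-neuron expression is worked out by hand with the chain rule and checked against PyTorch's autograd.
```python
# Back-propagation on a scalar expression: compute d(loss)/dw and d(loss)/db
# by hand via the chain rule, then verify against autograd.
# The values of x, w, b and the target are illustrative.
import torch

x = torch.tensor(2.0)
w = torch.tensor(-3.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

# forward pass: one neuron with a tanh non-linearity and squared-error loss
z = w * x + b
a = torch.tanh(z)
loss = (a - 1.0) ** 2

# backward pass by hand, one chain-rule factor per node
dloss_da = 2 * (a - 1.0)
da_dz = 1 - a ** 2            # derivative of tanh
dloss_dw = dloss_da * da_dz * x
dloss_db = dloss_da * da_dz * 1.0

loss.backward()               # autograd computes the same gradients
print(dloss_dw.item(), w.grad.item())   # the two numbers should match
print(dloss_db.item(), b.grad.item())
```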
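For the bigram module, a count-based character model fits in a few lines: count character pairs, normalise the counts into probabilities, score the data with the average negative log-likelihood, and sample new strings. The three training words are arbitrary placeholders.
```python
# Count-based bigram character model with NLL evaluation and sampling.
import torch

words = ["emma", "olivia", "ava"]          # stand-in training data
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                              # '.' marks start and end of a word
itos = {i: c for c, i in stoi.items()}
V = len(stoi)

# count every (previous char -> next char) transition
N = torch.zeros((V, V))
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)   # add-one smoothing

# average negative log-likelihood of the training set
nll, count = 0.0, 0
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        nll -= torch.log(P[stoi[c1], stoi[c2]])
        count += 1
print("average NLL:", (nll / count).item())

# sample one new "name", character by character
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("sample:", "".join(out))
```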
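A sketch of the MLP language model follows, assuming a small vocabulary, a three-character context window, and random stand-in data: an embedding table feeds one tanh hidden layer, trained with cross-entropy and evaluated on a held-out dev split.
```python
# Minimal MLP language model: embedding table -> hidden layer -> logits,
# trained by mini-batch SGD on random placeholder data.
import torch
import torch.nn.functional as F

V, block_size, emb_dim, hidden = 27, 3, 10, 64   # assumed sizes
g = torch.Generator().manual_seed(1337)

C  = torch.randn((V, emb_dim), generator=g, requires_grad=True)   # embeddings
W1 = torch.randn((block_size * emb_dim, hidden), generator=g, requires_grad=True)
b1 = torch.zeros(hidden, requires_grad=True)
W2 = torch.randn((hidden, V), generator=g, requires_grad=True)
b2 = torch.zeros(V, requires_grad=True)
params = [C, W1, b1, W2, b2]

# random stand-in dataset, split into train/dev
X = torch.randint(0, V, (1000, block_size), generator=g)
Y = torch.randint(0, V, (1000,), generator=g)
Xtr, Ytr, Xdev, Ydev = X[:800], Y[:800], X[800:], Y[800:]

for step in range(200):
    ix = torch.randint(0, Xtr.shape[0], (32,), generator=g)   # mini-batch
    emb = C[Xtr[ix]].view(32, -1)            # (32, block_size * emb_dim)
    h = torch.tanh(emb @ W1 + b1)            # hidden layer
    logits = h @ W2 + b2
    loss = F.cross_entropy(logits, Ytr[ix])
    for p in params:
        p.grad = None
    loss.backward()
    lr = 0.1 if step < 100 else 0.01         # simple learning-rate decay
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad

# evaluate on the held-out dev split
with torch.no_grad():
    emb = C[Xdev].view(Xdev.shape[0], -1)
    h = torch.tanh(emb @ W1 + b1)
    dev_loss = F.cross_entropy(h @ W2 + b2, Ydev)
print("dev loss:", dev_loss.item())
```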
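Batch Normalization, central to the fourth module, normalises each hidden unit over the batch to zero mean and unit variance and then rescales it with a learnable gain and bias. The manual forward pass below is checked against `torch.nn.BatchNorm1d`.
```python
# Batch norm forward pass written out by hand and compared to BatchNorm1d.
import torch

torch.manual_seed(0)
x = torch.randn(32, 100)        # a batch of 32 pre-activations, 100 units

gamma = torch.ones(100)         # learnable gain
beta = torch.zeros(100)         # learnable bias
eps = 1e-5

mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, keepdim=True, unbiased=False)
xhat = (x - mean) / torch.sqrt(var + eps)   # normalise over the batch
out_manual = gamma * xhat + beta

bn = torch.nn.BatchNorm1d(100, eps=eps, affine=True)
out_ref = bn(x)                 # training-mode forward pass uses batch stats
print(torch.allclose(out_manual, out_ref, atol=1e-5))  # True
```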
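In the spirit of the manual back-propagation module, this sketch derives the gradient of softmax cross-entropy with respect to the logits, pushes it through one linear layer, and verifies the result against autograd. Sizes are arbitrary.
```python
# Manual gradient of cross-entropy + linear layer, checked against autograd.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D, V = 8, 16, 27                      # batch, hidden size, vocab (assumed)
h = torch.randn(B, D)
W = torch.randn(D, V, requires_grad=True)
y = torch.randint(0, V, (B,))

logits = h @ W
loss = F.cross_entropy(logits, y)        # mean reduction over the batch
loss.backward()

# manual gradient: d(loss)/d(logits) = (softmax - one_hot) / batch_size
probs = F.softmax(logits.detach(), dim=1)
dlogits = probs.clone()
dlogits[torch.arange(B), y] -= 1.0
dlogits /= B
dW = h.T @ dlogits                       # chain rule through the matmul

print(torch.allclose(dW, W.grad, atol=1e-5))  # True
```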
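For the WaveNet-style module, the key structural idea is to fuse consecutive positions level by level instead of flattening the whole context at once, so the receptive field grows progressively. The `FlattenConsecutive` helper and all layer sizes below are illustrative.
```python
# Hierarchical, WaveNet-style fusion of a character context, expressed with
# plain torch.nn modules; the point is the tensor-shape bookkeeping.
import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Group every `n` consecutive positions into one wider feature vector."""
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):                  # x: (batch, time, channels)
        B, T, C = x.shape
        return x.view(B, T // self.n, C * self.n)

V, block_size, emb_dim, hidden = 27, 8, 10, 64   # assumed sizes
model = nn.Sequential(
    nn.Embedding(V, emb_dim),
    FlattenConsecutive(2), nn.Linear(2 * emb_dim, hidden), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * hidden, hidden), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * hidden, hidden), nn.Tanh(),
    nn.Flatten(),
    nn.Linear(hidden, V),                  # logits over the next character
)

x = torch.randint(0, V, (4, block_size))   # a batch of 4 contexts of 8 chars
print(model(x).shape)                      # torch.Size([4, 27])
```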
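At the heart of the GPT module is a single head of masked (causal) self-attention: every position emits a query, key, and value, and attention weights are computed only over earlier positions. Dimensions below are arbitrary.
```python
# One head of scaled dot-product self-attention with a causal mask.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C, head_size = 4, 8, 32, 16        # batch, time, channels, head size

x = torch.randn(B, T, C)
key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)      # each (B, T, head_size)

# scaled dot-product scores, then mask out future positions
wei = q @ k.transpose(-2, -1) / head_size ** 0.5        # (B, T, T)
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)

out = wei @ v                             # (B, T, head_size)
print(out.shape)                          # torch.Size([4, 8, 16])
```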
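Finally, a miniature Byte Pair Encoding loop in the spirit of the tokenizer module: start from raw UTF-8 bytes, repeatedly merge the most frequent adjacent pair, and record each merge as a new token id. The training string and the number of merges are arbitrary.
```python
# Tiny BPE trainer: greedy merges of the most frequent adjacent byte pair.

def get_pair_counts(ids):
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)            # replace the pair with its new id
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "low lower lowest low low"
ids = list(text.encode("utf-8"))          # start from raw bytes (0..255)

merges = {}                               # (id, id) -> new token id
num_merges = 5
for i in range(num_merges):
    counts = get_pair_counts(ids)
    top_pair = max(counts, key=counts.get)
    new_id = 256 + i                      # new tokens extend the byte vocabulary
    ids = merge(ids, top_pair, new_id)
    merges[top_pair] = new_id
    print(f"merged {top_pair} -> {new_id}")

print("tokenised length:", len(ids), "vs raw bytes:", len(text.encode("utf-8")))
```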
**Conclusion**
Karpathy's series demonstrates that deep-learning mastery is best acquired by building models from the ground up, paying close attention to the mathematics, implementation details, and engineering practices that make modern large-scale systems work. Whether you aim to develop novel vision models or engineer the next generation of conversational agents, the foundational skills honed in this curriculum are universally applicable.