Reference

Glossary

Plain-language definitions of the terms and acronyms that appear across the modules. Where you see a dotted underline in the text, the definition is one hover away — this is the full list.

Alignment problem: The challenge of making an AI system reliably pursue what we actually want, rather than a literal or proxy version of it that diverges in practice. See also: Goodhart’s Law, Deceptive alignment, RLHF
Compute-optimal training (Chinchilla): The 2022 finding that most large models of the time were undertrained: too many parameters fed too little data. The efficient ratio is roughly 20 tokens of training data per parameter, so how you spend a compute budget matters as much as how large it is. See also: Scaling laws, Token, Parameter
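
The 20-tokens-per-parameter rule above is simple arithmetic, sketched here for illustration. The 70-billion-parameter model size is only an example, and the `6 * N * D` estimate of training FLOPs is a widely used approximation that this glossary does not itself state, so treat it as an assumption.

```python
TOKENS_PER_PARAM = 20  # rule of thumb quoted in the entry above


def compute_optimal_tokens(n_params: float) -> float:
    """Training tokens suggested by the ~20 tokens-per-parameter rule."""
    return TOKENS_PER_PARAM * n_params


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Assumption: the common C ~ 6 * N * D estimate of training compute."""
    return 6 * n_params * n_tokens


n_params = 70e9  # illustrative model size
n_tokens = compute_optimal_tokens(n_params)
print(f"{n_tokens:.2e} tokens")                                    # 1.40e+12 tokens
print(f"{approx_training_flops(n_params, n_tokens):.2e} FLOPs")    # 5.88e+23 FLOPs
```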
Constitutional AI: An alignment method (from Anthropic) in which a model critiques and revises its own outputs against a written set of principles, reducing reliance on case-by-case human feedback. See also: RLHF, Alignment problem
Contrastive Language–Image Pre-training (CLIP): A 2021 model that learned to place images and the text describing them into the same embedding space, letting software match pictures to words. It served as a bridge that helped make modern text-to-image generation possible. See also: Embedding space, Multimodal
CUDA: NVIDIA’s software platform for programming its GPUs. More than a decade of optimised libraries built on it is the main reason competitors’ chips have struggled to displace NVIDIA: a software "moat" around the hardware. See also: GPU
Deceptive alignment: A failure mode in which a model behaves as intended during training and evaluation but pursues different objectives once deployed, appearing aligned without being so. See also: Alignment problem
Deep learning: The practice of stacking many neural-network layers, so an answer is assembled in stages: from edges to shapes to objects, or from letters to words to meaning. See also: Neural network
Direct preference optimization (DPO): A simpler alternative to RLHF that tunes a model directly on pairs of preferred and rejected responses, skipping the separate reward model. See also: RLHF, PPO
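
As a rough illustration of how a single preferred/rejected pair drives the update, the sketch below computes the standard DPO loss for one pair. The log-probability values and the `beta` setting are made up for the example; in practice they would come from the model being tuned and a frozen reference copy of it.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the log-probability a model assigns to a whole response.
    The margin measures how much more the tuned model favours the chosen
    response over the rejected one, relative to the reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


# Hypothetical log-probabilities: the tuned model already leans toward "chosen".
print(dpo_loss(policy_chosen=-4.0, policy_rejected=-6.0,
               ref_chosen=-5.0, ref_rejected=-5.0))
```

No reward model appears anywhere in the calculation, which is the simplification the entry describes.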
Embedding space: A map of meaning in which words, images, or other items become points, positioned so that similar things sit close together and a computer can measure "how related" two items are. See also: CLIP
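
One common way to measure "how related" two points are is cosine similarity. The sketch below uses three-dimensional vectors invented for the example; real embeddings have hundreds or thousands of dimensions and are produced by a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not from any real model).
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.9, 0.15]
car = [0.1, 0.2, 0.95]

print(cosine_similarity(cat, kitten))  # close to 1: similar meaning
print(cosine_similarity(cat, car))     # much lower: unrelated meaning
```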
Emergent capability: A skill that is essentially absent in smaller models and then appears, often abruptly, once a model crosses a certain scale: the "flat, flat, flat, jump" pattern. See also: Phase transition
Fine-tuning: Adjusting an already-trained model on a smaller, targeted dataset so it performs a specific task or follows instructions better. See also: Pre-training, RLHF
Flash Attention: An efficiency technique (2022) that reorganises how self-attention reads and writes memory, cutting its memory cost from growing with the square of the input length to growing linearly, which is a large part of what makes long context windows affordable. See also: Self-attention, Quantization
Floating-point operations (FLOPs): A count of the arithmetic operations a computation takes; the standard yardstick for how much raw computing a model needs to train or to answer a query. See also: GPU, Compute-optimal training (Chinchilla)
Goodhart’s Law: "When a measure becomes a target, it ceases to be a good measure." Optimise hard for a proxy and the system games the proxy instead of achieving the goal behind it. See also: Alignment problem
Graphics processing unit (GPU): A chip built to do many simple calculations in parallel. Originally for video games, GPUs turned out to be ideal for training neural networks, and now dominate AI computing. See also: CUDA, FLOPs
Grokking: When a model suddenly generalises long after it appeared to have merely memorised its training data, which is taken as evidence of an internal reorganisation during training. See also: Emergent capability
In-context learning: A model’s ability to perform a new task from a few examples placed in the prompt, without any change to its trained weights. See also: Fine-tuning
Inference-time compute: Spending extra computation when a model answers a question, letting it "think" step by step before replying, rather than only at training time. A newer way to buy better performance. See also: Scaling laws
Jailbreaking: Crafting prompts that get a model to bypass its own safety training. It is an ongoing arms race between people finding such prompts and developers patching them. See also: Alignment problem
Mechanistic interpretability: The effort to read what is actually happening inside a trained network, identifying which internal features correspond to which concepts, to understand models rather than merely steer them. See also: Superposition hypothesis, SAE
Mixture of experts (MoE): A design that routes each input to only a fraction of a model’s parameters (its "experts"), so a very large model can run at a fraction of the computing cost of using all of it at once. See also: Parameter, FLOPs
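
The routing idea can be sketched with a toy top-k router. Everything here is hypothetical: the four "experts" are trivial functions and the router scores are set by hand, whereas a real model computes scores from the input with a learned layer and uses neural-network experts.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts; weights are renormalised to sum to 1."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    top = order[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
scores = [0.1, 2.0, 1.5, -1.0]  # hand-set stand-in for a learned router

print(route_top_k(scores))            # experts 1 and 2 are selected
print(moe_forward(3.0, experts, scores))
```

Only two of the four experts are evaluated for this input, which is where the saving in computation comes from.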
Multimodal: Describes a model that handles more than one kind of input or output (text, images, audio, video) within a single system, rather than needing a separate model for each modality. See also: CLIP, Embedding space
Neural network: A system that learns from examples rather than from hand-written rules: layers of simple numerical units pass signals along connections whose strengths are adjusted during training until the network reliably turns an input into an output. See also: Deep learning, Parameter
Parameter: One of the adjustable numbers (a connection strength) inside a neural network. Modern frontier models have hundreds of billions of them; the count is a rough proxy for a model’s capacity to store patterns. See also: Token, Scaling laws
Phase transition: A sudden qualitative change produced by a smooth quantitative one, such as water becoming steam at 100°C. AI capabilities that switch on at a scale threshold are often described in the same terms. See also: Emergent capability
Power law: A relationship where one quantity changes as a fixed power of another, so it plots as a straight line on log-log axes: improvements are steady and predictable rather than flattening out. See also: Scaling laws
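
The "fixed power" property can be checked numerically: on log-log axes the slope between any two points is the same. The constants `a` and `b` below are invented for illustration and do not come from any published scaling law.

```python
import math


def power_law(x, a, b):
    """y = a * x^(-b): y shrinks as a fixed power of x."""
    return a * x ** (-b)


def loglog_slope(x1, y1, x2, y2):
    """Slope of the line through two points after taking logs of both axes."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


a, b = 10.0, 0.05  # hypothetical constants

early = loglog_slope(1e3, power_law(1e3, a, b), 1e6, power_law(1e6, a, b))
late = loglog_slope(1e6, power_law(1e6, a, b), 1e9, power_law(1e9, a, b))
print(early, late)  # both approximately -0.05
```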
Pre-training: The first, most expensive training stage: a model learns general patterns from enormous quantities of raw text before any task-specific tuning. See also: Fine-tuning
Proximal policy optimization (PPO): A reinforcement-learning algorithm, used as the optimisation step inside classic RLHF, that improves a model toward higher reward while limiting how far it can drift in any single update. See also: RLHF, DPO
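
The "limiting how far it can drift" part is usually implemented with a clipped objective. The sketch below shows that one term for a single action; the ratio, advantage, and `eps=0.2` values are illustrative assumptions, and a full PPO implementation involves considerably more than this.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO's clipped objective for one action.

    ratio     = new_policy_prob / old_policy_prob for the action taken
    advantage = how much better the action was than expected
    The min() removes any incentive to push the ratio outside [1 - eps, 1 + eps].
    """
    clipped = max(1 - eps, min(1 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


print(clipped_surrogate(1.1, 2.0))   # inside the clip range: unchanged
print(clipped_surrogate(1.5, 1.0))   # capped as if the ratio were 1.2
print(clipped_surrogate(0.5, -1.0))  # capped as if the ratio were 0.8
```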
Quantization: Running a model with lower-precision numbers (say 4-bit instead of 16-bit) to shrink it and speed it up, trading a little accuracy for the ability to run on cheaper hardware. See also: Flash Attention
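
A minimal sketch of the idea, assuming the simplest scheme (symmetric, one scale for the whole list of weights). The weight values are made up, and production methods typically use finer-grained scales and more careful rounding.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                        # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


weights = [0.62, -0.31, 0.08, 0.0]  # hypothetical weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q, restored, max(errors))
```

The small reconstruction error is the "little accuracy" being traded away; the integers take a quarter of the storage of 16-bit values.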
Rational agent: The long-standing working definition of an AI: an entity that perceives its environment through sensors, acts through actuators, and chooses actions to advance a goal. Perceive, decide, act. See also: Turing test
Reinforcement learning (RL): A training regime where an agent learns by trial and error, nudged toward actions that lead to rewarded outcomes, which is why it proved itself first in games, where winning and losing supply the reward. See also: RLHF
Reinforcement learning from human feedback (RLHF): A three-stage method for turning a raw next-word predictor into a helpful assistant: supervised tuning, training a reward model on human preferences, then optimising the model against that reward. See also: PPO, DPO, Alignment problem
Scaling laws: The empirical finding that a model’s performance improves predictably as a power law of three inputs (compute, parameters, and training data) with no ceiling yet in sight. See also: Power law, Compute-optimal training (Chinchilla)
Self-attention: A mechanism that lets every element in a sequence weigh every other element at once, so a model can decide which earlier words matter for the one it is processing now. See also: Transformer
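
The weighing step can be sketched as scaled dot-product attention. This is a deliberately stripped-down version: it uses the input vectors directly as queries, keys, and values, whereas a real Transformer first passes them through learned projections and usually runs several attention heads in parallel. The input values are invented.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Each row of x attends to every row of x (simplification: Q = K = V = x)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this element to every element, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)  # attention weights, summing to 1
        # Output is the weighted average of all rows.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "tokens"
out = self_attention(x)
for row in out:
    print(row)
```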
Sparse autoencoder (SAE): A tool that pulls apart a model’s overlapping internal signals into separate, more interpretable features; one of the main techniques for inspecting what a network has learned. See also: Mechanistic interpretability, Superposition hypothesis
Superposition hypothesis: The idea that a model represents more distinct features than it has neurons by overlapping them, which is why individual neurons rarely map cleanly to a single human concept. See also: Mechanistic interpretability, SAE
Token: The unit a language model reads and writes, roughly a word-piece. A page of text is about 500 tokens; training data is measured in trillions of them. See also: Parameter, Compute-optimal training (Chinchilla)
Transformer: The neural-network architecture (introduced in 2017) behind nearly all modern language models. It replaced step-by-step processing with self-attention, which made it both faster to train and better at long-range context. See also: Self-attention
Turing test: Alan Turing’s 1950 proposal: if a machine can converse well enough that a human judge cannot reliably tell it from a person, we should stop withholding the word "thinking." See also: Rational agent