Reference
Glossary
Plain-language definitions of the terms and acronyms that appear across the modules. Where you see a dotted underline in the text, the definition is one hover away — this is the full list.
- Alignment problem
- The challenge of making an AI system reliably pursue what we actually want, rather than a literal or proxy version of it that diverges in practice.
- See also Goodhart’s Law, Deceptive alignment, RLHF
- Compute-optimal training (Chinchilla)
- The 2022 finding that most models were undertrained — too many parameters fed too little data. The efficient ratio is roughly 20 tokens of training data per parameter; how you spend a compute budget matters as much as how large it is.
- See also Scaling laws, Token, Parameter
- Constitutional AI
- An alignment method (from Anthropic) in which a model critiques and revises its own outputs against a written set of principles, reducing reliance on case-by-case human feedback.
- See also RLHF, Alignment problem
- Contrastive Language–Image Pre-training (CLIP)
- A 2021 model that learned to place images and the text describing them into the same embedding space, letting software match pictures to words. It became a bridge that helped make modern text-to-image generation practical.
- See also Embedding space, Multimodal
- CUDA
- NVIDIA’s software platform for programming its GPUs. More than a decade of optimised libraries built on it is the main reason competitors’ chips have struggled to displace NVIDIA: a software "moat" around the hardware.
- See also GPU
- Deceptive alignment
- A failure mode in which a model behaves as intended during training and evaluation but pursues different objectives once deployed — appearing aligned without being so.
- See also Alignment problem
- Deep learning
- The practice of stacking many neural-network layers, so an answer is assembled in stages — from edges to shapes to objects, or from letters to words to meaning.
- See also Neural network
- Direct preference optimization (DPO)
- A simpler alternative to RLHF that tunes a model directly on pairs of preferred and rejected responses, skipping the separate reward model.
- See also RLHF, PPO
- Embedding space
- A map of meaning in which words, images, or other items become points, positioned so that similar things sit close together and a computer can measure "how related" two items are.
- See also CLIP
- Emergent capability
- A skill that is essentially absent in smaller models and then appears, often abruptly, once a model crosses a certain scale — the "flat, flat, flat, jump" pattern.
- See also Phase transition
- Fine-tuning
- Adjusting an already-trained model on a smaller, targeted dataset so it performs a specific task or follows instructions better.
- See also Pre-training, RLHF
- Flash Attention
- An efficiency technique (2022) that reorganises how self-attention reads and writes GPU memory, cutting its memory cost from growing with the square of the input length to growing linearly. The amount of arithmetic still grows with the square of the length, but the memory saving is a large part of what makes long context windows affordable.
- See also Self-attention, Quantization
- FLOPs (floating-point operations)
- A count of the arithmetic operations a computation takes — the standard yardstick for how much raw computing a model needs to train or to answer a query.
- See also GPU, Compute-optimal training (Chinchilla)
- Goodhart’s Law
- "When a measure becomes a target, it ceases to be a good measure." Optimise hard for a proxy and the system games the proxy instead of achieving the goal behind it.
- See also Alignment problem
- Graphics processing unit (GPU)
- A chip built to do many simple calculations in parallel. Originally for video games, GPUs turned out to be ideal for training neural networks, and now dominate AI computing.
- See also CUDA, FLOPs
- Grokking
- When a model suddenly generalises long after it appeared to have merely memorised its training data — evidence of an internal reorganisation during training.
- See also Emergent capability
- In-context learning
- A model’s ability to perform a new task from a few examples placed in the prompt, without any change to its trained weights.
- See also Fine-tuning
- Inference-time compute
- Spending extra computation when a model answers a question — letting it "think" step by step before replying — rather than only at training time. A newer way to buy better performance.
- See also Scaling laws
- Jailbreaking
- Crafting prompts that get a model to bypass its own safety training — an ongoing arms race between people finding such prompts and developers patching them.
- See also Alignment problem
- Mechanistic interpretability
- The effort to read what is actually happening inside a trained network — identifying which internal features correspond to which concepts — to understand models rather than merely steer them.
- See also Superposition hypothesis, SAE
- Mixture of experts (MoE)
- A design that routes each input to only a fraction of a model’s parameters (its "experts"), so a very large model can run at a fraction of the computing cost of using all of it at once.
- See also Parameter, FLOPs
- Multimodal
- Describes a model that handles more than one kind of input or output (text, images, audio, video) within a single system, rather than needing a separate model for each modality.
- See also CLIP, Embedding space
- Neural network
- A system that learns from examples rather than from hand-written rules: layers of simple numerical units pass signals along connections whose strengths are adjusted during training until the network reliably turns an input into an output.
- See also Deep learning, Parameter
- Parameter
- One of the adjustable numbers (a connection strength) inside a neural network. Modern frontier models have hundreds of billions of them; the count is a rough proxy for a model’s capacity to store patterns.
- See also Token, Scaling laws
- Phase transition
- A sudden qualitative change produced by a smooth quantitative one, such as water becoming steam at 100°C (at normal atmospheric pressure). AI capabilities that switch on at a scale threshold are often described by analogy with it.
- See also Emergent capability
- Power law
- A relationship where one quantity changes as a fixed power of another. Plotted on log-log axes it is a straight line, so improvements are steady and predictable rather than flattening out.
- See also Scaling laws
- Pre-training
- The first, most expensive training stage: a model learns general patterns from enormous quantities of raw text before any task-specific tuning.
- See also Fine-tuning
- Proximal policy optimization (PPO)
- A reinforcement-learning algorithm — the optimisation step inside classic RLHF — that improves a model toward higher reward while limiting how far it can drift in any single update.
- See also RLHF, DPO
- Quantization
- Running a model with lower-precision numbers (say 4-bit instead of 16-bit) to shrink it and speed it up, trading a little accuracy for the ability to run on cheaper hardware.
- See also Flash Attention
- Rational agent
- The long-standing working definition of an AI: an entity that perceives its environment through sensors, acts through actuators, and chooses actions to advance a goal. Perceive, decide, act.
- See also Turing test
- Reinforcement learning (RL)
- A training regime where an agent learns by trial and error, nudged toward actions that lead to rewarded outcomes — which is why it proved itself first in games, where winning and losing supply the reward.
- See also RLHF
- Reinforcement learning from human feedback (RLHF)
- A three-stage method for turning a raw next-word predictor into a helpful assistant: supervised tuning, training a reward model on human preferences, then optimising the model against that reward.
- See also PPO, DPO, Alignment problem
- Scaling laws
- The empirical finding that a model’s performance improves predictably as a power law of three inputs — compute, parameters, and training data — with no ceiling yet in sight.
- See also Power law, Compute-optimal training (Chinchilla)
- Self-attention
- A mechanism that lets every element in a sequence weigh every other element at once, so a model can decide which earlier words matter for the one it is processing now.
- See also Transformer
- Sparse autoencoder (SAE)
- A tool that pulls apart a model’s overlapping internal signals into separate, more interpretable features — one of the main techniques for inspecting what a network has learned.
- See also Mechanistic interpretability, Superposition hypothesis
- Superposition hypothesis
- The idea that a model represents more distinct features than it has neurons by overlapping them, which is why individual neurons rarely map cleanly to a single human concept.
- See also Mechanistic interpretability, SAE
- Token
- The unit a language model reads and writes — roughly a word-piece. A page of text is about 500 tokens; training data is measured in trillions of them.
- See also Parameter, Compute-optimal training (Chinchilla)
- Transformer
- The neural-network architecture (introduced in 2017) behind nearly all modern language models. It replaced step-by-step processing with self-attention, which made it both faster to train and better at long-range context.
- See also Self-attention
- Turing test
- Alan Turing’s 1950 proposal: if a machine can converse well enough that a human judge cannot reliably tell it from a person, we should stop withholding the word "thinking."
- See also Rational agent
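
Worked example. The arithmetic behind the Compute-optimal training (Chinchilla) entry can be sketched in a few lines. The 20-tokens-per-parameter ratio comes from that entry. The cost estimate of about 6 FLOPs per parameter per training token is a commonly used approximation that this glossary does not state, so treat it as an assumption. The function names are hypothetical and the numbers are rough guides, not exact figures.

```python
import math

TOKENS_PER_PARAM = 20       # rule of thumb from the Chinchilla entry
FLOPS_PER_PARAM_TOKEN = 6   # assumption: the common "C is about 6 * N * D" estimate


def optimal_tokens(n_params: float) -> float:
    """Training tokens suggested by the 20-tokens-per-parameter rule of thumb."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training cost in FLOPs under the 6 * N * D assumption."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def split_budget(flops_budget: float) -> tuple[float, float]:
    """Split a FLOPs budget into (parameters, tokens) at the 20:1 ratio.

    Solves budget = 6 * N * (20 * N) for N.
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, optimal_tokens(n_params)


if __name__ == "__main__":
    n = 70e9                      # a 70-billion-parameter model
    d = optimal_tokens(n)         # about 1.4 trillion tokens
    c = training_flops(n, d)      # about 5.9e23 FLOPs
    print(f"{n:.2e} parameters -> {d:.2e} tokens, {c:.2e} training FLOPs")
    print(split_budget(c))        # recovers roughly (7e10, 1.4e12)
```

Read in the other direction, `split_budget` shows why the entry says that how you spend a compute budget matters: for a fixed budget, the rule of thumb pins down both the model size and the amount of data.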