The Complex Perspective
Reference

Glossary

Plain-language definitions of the terms and acronyms that appear across the modules. Where you see a dotted underline in the text, the definition is one hover away — this is the full list.

Alignment problem
The challenge of making an AI system reliably pursue what we actually want, rather than a literal or proxy version of it that diverges in practice.
See also Goodhart’s Law, Deceptive alignment, RLHF
Compute-optimal training (Chinchilla)
The 2022 finding that most models were undertrained — too many parameters fed too little data. The efficient ratio is roughly 20 tokens of training data per parameter; how you spend a compute budget matters as much as how large it is.
See also Scaling laws, Token, Parameter
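As a rough illustration of the 20-tokens-per-parameter ratio, the arithmetic can be sketched as below. The function name and the 70-billion-parameter model size are hypothetical, chosen only to make the numbers concrete.

```python
def compute_optimal_tokens(num_parameters: int, tokens_per_parameter: int = 20) -> int:
    """Rough training-token target for a model of the given size.

    The default of 20 is the approximate ratio quoted in the definition
    above; treat it as a rule of thumb, not an exact constant.
    """
    return num_parameters * tokens_per_parameter


# Hypothetical example: a 70-billion-parameter model.
params = 70_000_000_000
tokens = compute_optimal_tokens(params)
print(f"{tokens / 1e12:.1f} trillion tokens")  # 1.4 trillion tokens
```

Under this rule of thumb, doubling the parameter count also doubles the amount of data needed to train the model efficiently.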
Constitutional AI
An alignment method (from Anthropic) in which a model critiques and revises its own outputs against a written set of principles, reducing reliance on case-by-case human feedback.
See also RLHF, Alignment problem
Contrastive Language–Image Pre-training (CLIP)
A 2021 model that learned to place images and the text describing them into the same embedding space, letting software match pictures to words — the bridge that made text-to-image generation possible.
See also Embedding space, Multimodal
CUDA
NVIDIA’s software platform for programming its GPUs. The optimised libraries built on it since its 2007 release are the main reason competitors’ chips have struggled to displace NVIDIA: a software "moat" around the hardware.
See also GPU
Deceptive alignment
A failure mode in which a model behaves as intended during training and evaluation but pursues different objectives once deployed — appearing aligned without being so.
See also Alignment problem
Deep learning
The practice of stacking many neural-network layers, so an answer is assembled in stages — from edges to shapes to objects, or from letters to words to meaning.
See also Neural network
Direct preference optimization (DPO)
A simpler alternative to RLHF that tunes a model directly on pairs of preferred and rejected responses, skipping the separate reward model.
See also RLHF, PPO
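A minimal sketch of the per-pair DPO loss, assuming the log-probabilities of each response under the model being tuned and under a frozen reference model are already available. The argument names and the `beta` default are illustrative choices, not values taken from this glossary.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (preferred, rejected) response pair.

    Each argument is the total log-probability a model assigns to a response.
    beta (an illustrative default) scales how strongly preferences are enforced.
    """
    # How much more likely each response became relative to the reference model.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# The loss is lower when the model favours the preferred response.
good = dpo_loss(-4.0, -6.0, -5.0, -5.0)
bad = dpo_loss(-6.0, -4.0, -5.0, -5.0)
print(good < bad)  # True
```

No reward model appears anywhere in the calculation; the preference pair itself supplies the training signal.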
Embedding space
A map of meaning in which words, images, or other items become points, positioned so that similar things sit close together and a computer can measure "how related" two items are.
See also CLIP
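One common way to measure "how related" two points are is cosine similarity. The sketch below uses made-up three-dimensional vectors; real embeddings have hundreds or thousands of dimensions, and the words and numbers here are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-written embeddings (hypothetical values).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(cat_dog > cat_car)  # True: "cat" sits closer to "dog" than to "car"
```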
Emergent capability
A skill that is essentially absent in smaller models and then appears, often abruptly, once a model crosses a certain scale — the "flat, flat, flat, jump" pattern.
See also Phase transition
Fine-tuning
Adjusting an already-trained model on a smaller, targeted dataset so it performs a specific task or follows instructions better.
See also Pre-training, RLHF
Flash Attention
An efficiency technique (2022) that reorganises how self-attention reads memory, cutting its memory cost from growing with the square of the input length to growing linearly — which is what makes long context windows affordable.
See also Self-attention, Quantization
FLOPs (floating-point operations)
A count of the arithmetic operations a computation takes — the standard yardstick for how much raw computing a model needs to train or to answer a query.
See also GPU, Compute-optimal training (Chinchilla)
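For a sense of scale, a widely used rule of thumb (an assumption brought in here, not something stated in this glossary) estimates the training cost of a dense transformer as about 6 FLOPs per parameter per training token. The model and dataset sizes below are hypothetical.

```python
def approx_training_flops(num_parameters: int, num_tokens: int) -> int:
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token.

    This is an approximation for dense transformer models, not an exact count.
    """
    return 6 * num_parameters * num_tokens


# Hypothetical example: 70 billion parameters trained on 1.4 trillion tokens.
flops = approx_training_flops(70_000_000_000, 1_400_000_000_000)
print(f"{flops:.2e} FLOPs")  # 5.88e+23 FLOPs
```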
Goodhart’s Law
"When a measure becomes a target, it ceases to be a good measure." Optimise hard for a proxy and the system games the proxy instead of achieving the goal behind it.
See also Alignment problem
Graphics processing unit (GPU)
A chip built to do many simple calculations in parallel. Originally for video games, GPUs turned out to be ideal for training neural networks, and now dominate AI computing.
See also CUDA, FLOPs
Grokking
When a model suddenly generalises long after it appeared to have merely memorised its training data — evidence of an internal reorganisation during training.
See also Emergent capability
In-context learning
A model’s ability to perform a new task from a few examples placed in the prompt, without any change to its trained weights.
See also Fine-tuning
Inference-time compute
Spending extra computation when a model answers a question — letting it "think" step by step before replying — rather than only at training time. A newer way to buy better performance.
See also Scaling laws
Jailbreaking
Crafting prompts that get a model to bypass its own safety training — an ongoing arms race between people finding such prompts and developers patching them.
See also Alignment problem
Mechanistic interpretability
The effort to read what is actually happening inside a trained network — identifying which internal features correspond to which concepts — to understand models rather than merely steer them.
See also Superposition hypothesis, SAE
Mixture of experts (MoE)
A design that routes each input to only a fraction of a model’s parameters (its "experts"), so a very large model can run at a fraction of the computing cost of using all of it at once.
See also Parameter, FLOPs
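The routing idea can be sketched with a toy top-k router. In a real model the gate scores come from a learned router network and the experts are neural-network layers; here both are hard-coded stand-ins (hypothetical names and values) so the selection step is easy to follow.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))


# Four toy "experts"; only two of them run for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # stand-in for a learned router's output

routes = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
print([index for index, _ in routes], output)  # [1, 3] 7.5
```

Experts 0 and 2 are never evaluated for this input, which is where the computing saving comes from.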
Multimodal
A model that handles more than one kind of input or output (text, images, audio, video) within a single system, rather than relying on a separate model for each modality.
See also CLIP, Embedding space
Neural network
A system that learns from examples rather than from hand-written rules: layers of simple numerical units pass signals along connections whose strengths are adjusted during training until the network reliably turns an input into an output.
See also Deep learning, Parameter
Parameter
One of the adjustable numbers (a connection strength) inside a neural network. Modern frontier models have hundreds of billions of them; the count is a rough proxy for a model’s capacity to store patterns.
See also Token, Scaling laws
Phase transition
A sudden qualitative change produced by a smooth quantitative one, such as water becoming steam at 100°C (at sea-level pressure). AI capabilities that switch on at a scale threshold are often described by analogy with this.
See also Emergent capability
Power law
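A relationship where one quantity changes as a fixed power of another, so it plots as a straight line when both axes are logarithmic. Each tenfold increase in the input changes the output by the same constant factor, which makes improvements steady and predictable rather than abruptly flattening out.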
See also Scaling laws
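The straight-line behaviour on logarithmic axes can be checked numerically. The coefficient and exponent below are arbitrary illustrative values, not measured scaling-law constants.

```python
import math


def power_law(x, coefficient, exponent):
    """y = coefficient * x ** exponent."""
    return coefficient * x ** exponent


def log_log_slope(x1, y1, x2, y2):
    """Slope of the line through two points when both axes are logarithmic."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


# Arbitrary example: the output falls as the inverse square root of the input.
xs = [10, 100, 1000, 10000]
ys = [power_law(x, coefficient=5.0, exponent=-0.5) for x in xs]

slopes = [log_log_slope(xs[i], ys[i], xs[i + 1], ys[i + 1])
          for i in range(len(xs) - 1)]
print([round(s, 6) for s in slopes])  # [-0.5, -0.5, -0.5]
```

Every segment has the same slope, and that slope is the exponent, which is why a power law looks like a straight line on a log-log plot.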
Pre-training
The first, most expensive training stage: a model learns general patterns from enormous quantities of raw text before any task-specific tuning.
See also Fine-tuning
Proximal policy optimization (PPO)
A reinforcement-learning algorithm — the optimisation step inside classic RLHF — that improves a model toward higher reward while limiting how far it can drift in any single update.
See also RLHF, DPO
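The "limiting how far it can drift" part is usually implemented as a clipped objective. The sketch below shows that objective for a single action; the function name and the 0.2 clipping range are illustrative assumptions.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO's clipped objective for one action.

    ratio:     new-policy probability divided by old-policy probability.
    advantage: how much better than expected the action turned out.
    clip_eps:  how far the ratio may move from 1 before gains stop counting
               (0.2 is an illustrative value).
    """
    clipped_ratio = max(1 - clip_eps, min(1 + clip_eps, ratio))
    # Taking the minimum removes any incentive to push the ratio past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage): credit is capped once the ratio passes 1.2.
print(clipped_surrogate(1.5, 1.0))  # 1.2
# Inside the clip range the objective is just ratio * advantage.
print(clipped_surrogate(1.0, 2.0))  # 2.0
```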
Quantization
Running a model with lower-precision numbers (say 4-bit instead of 16-bit) to shrink it and speed it up, trading a little accuracy for the ability to run on cheaper hardware.
See also Flash Attention
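A minimal sketch of the idea, assuming the simplest scheme (symmetric rounding with one shared scale factor). Production quantizers are more elaborate, and the weight values here are made up.

```python
def quantize(values, bits=4):
    """Map floats to small signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.33, 0.12, 0.02]  # made-up example weights
codes, scale = quantize(weights)
restored = dequantize(codes, scale)

print(codes)  # [7, -3, 1, 0]
print([round(r, 2) for r in restored])  # [0.7, -0.3, 0.1, 0.0]
```

The restored values are close to the originals but not identical; that rounding error is the "little accuracy" traded away.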
Rational agent
The long-standing working definition of an AI: an entity that perceives its environment through sensors, acts through actuators, and chooses actions to advance a goal. Perceive, decide, act.
See also Turing test
Reinforcement learning (RL)
A training regime where an agent learns by trial and error, nudged toward actions that lead to rewarded outcomes — which is why it proved itself first in games, where winning and losing supply the reward.
See also RLHF
Reinforcement learning from human feedback (RLHF)
A three-stage method for turning a raw next-word predictor into a helpful assistant: supervised tuning, training a reward model on human preferences, then optimising the model against that reward.
See also PPO, DPO, Alignment problem
Scaling laws
The empirical finding that a model’s performance improves predictably as a power law of three inputs — compute, parameters, and training data — with no ceiling yet in sight.
See also Power law, Compute-optimal training (Chinchilla)
Self-attention
A mechanism that lets every element in a sequence weigh every other element at once, so a model can decide which earlier words matter for the one it is processing now.
See also Transformer
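The weighing step can be sketched as scaled dot-product attention in plain Python. This is a simplified illustration: it omits the learned query, key, and value projections and the masking that restricts a language model to earlier words, and the input vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Each output is a weighted average of all value vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score this element against every element in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those weights.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Three toy token vectors, used directly as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
print(len(attended), len(attended[0]))  # 3 2
```

Every element is scored against every other element, so the number of scores grows with the square of the sequence length.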
Sparse autoencoder (SAE)
A tool that pulls apart a model’s overlapping internal signals into separate, more interpretable features — one of the main techniques for inspecting what a network has learned.
See also Mechanistic interpretability, Superposition hypothesis
Superposition hypothesis
The idea that a model represents more distinct features than it has neurons by overlapping them, which is why individual neurons rarely map cleanly to a single human concept.
See also Mechanistic interpretability, SAE
Token
The unit a language model reads and writes — roughly a word-piece. A page of text is about 500 tokens; training data is measured in trillions of them.
See also Parameter, Compute-optimal training (Chinchilla)
Transformer
The neural-network architecture (introduced in 2017) behind nearly all modern language models. It replaced step-by-step processing with self-attention, which made it both faster to train and better at long-range context.
See also Self-attention
Turing test
Alan Turing’s 1950 proposal: if a machine can converse well enough that a human judge cannot reliably tell it from a person, we should stop withholding the word "thinking."
See also Rational agent