Module 11

Agent-Based Modeling in the AI Age

How simple agent rules produce complex worlds — and how AI is transforming agent-based modeling from a specialist tool into a mainstream methodology for understanding complex systems.

~21 min read · Advanced · Builds on M2, M10

Emergence from Simple Rules

Agent-based modeling (ABM) begins with a deceptively simple premise: instead of writing equations that describe a system from above, create agents with rules and let the system dynamics emerge from their interactions. No central planner designs the outcome. No equation predicts it. The macro-level pattern — segregation, wealth inequality, market crashes, epidemics — arises from the micro-level behavior of agents who cannot see the whole system and act only on local information.

This is the methodological counterpart to the complexity science foundations in Module 2. Where Module 2 explored how network structure shapes dynamics, ABM asks a different question: given agents with heterogeneous characteristics, adaptive behavior, and local interactions, what system-level phenomena emerge?

The canonical demonstration is Thomas Schelling’s segregation model (1971). Place two types of agents on a grid. Each agent has a mild preference: it wants at least 30% of its neighbors — 2 or 3 of its up-to-eight adjacent cells — to be its own type. An agent that falls short moves to a random empty cell; everyone else stays put. Notice what the rule does not ask for: no agent demands a majority of its own kind. Every agent is content living as a minority. Before you press Run, commit to a guess: once the grid settles, will it look well-mixed, mildly clumped, or starkly divided? And will settling take hundreds of steps, or fewer than ten?

Schelling Segregation Simulator

Two populations (blue and red) on a grid. Each agent wants at least a threshold fraction of its neighbors to be the same type; unsatisfied agents move to random empty cells. Press Run and watch the grid and the segregation index. Moving a slider resets the grid.

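[Interactive simulator. Controls: Threshold (default 30%), Density (default 80%), Speed (default 8). Readouts: a Segregation Index chart, Segregation (0.49 at the start), Satisfied (80% at the start), and Steps. Initial status: agents are randomly distributed; press Run to see how mild individual preferences produce macro-level segregation.]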

You should have seen the segregation index climb from about 0.5 to roughly 0.73 within half a dozen steps — a well-mixed grid sorting itself into nearly homogeneous neighborhoods almost immediately. No agent wanted this outcome. No agent even preferred a majority of same-type neighbors. Yet the aggregate result is stark segregation, produced by a positive feedback loop: each move makes nearby agents of the other type slightly less satisfied, triggering further moves.

The natural next guess is that more intolerance means more segregation. Test it before reading on. The Strong preset (75%) does drive the index to near-total separation — about 0.99 — but watch the step counter: it takes well over a hundred steps, twenty times longer than the mild case. Now drag the threshold to 90% (the grid resets) and run again, committing to a prediction first. The result: segregation collapses. At 90%, almost no agent can ever be satisfied, so nearly everyone moves every step, and the churning grid stays mixed near 0.5 until the simulation gives up. More preference does not simply mean more separation — the relationship bends back on itself at the extreme.

This is the ABM insight in its purest form: the whole is not just more than the sum of its parts — it is qualitatively different. The behavior at the macro level (segregation) cannot be deduced from the behavior at the micro level (a mild preference) without running the simulation. The planets of Module 2 had a closed-form solution that made simulation a mere convenience; for this far simpler movement rule, no closed-form solution has ever been found. Only simulation reveals it.
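The movement rule is short enough to write out in full. The sketch below is a minimal, illustrative Python version and not the code behind the simulator above. The grid size, the hard (non-wrapping) edges, the choice to treat an agent with no occupied neighbors as satisfied, and the index itself (the mean share of same-type neighbors) are all assumptions made for brevity.

```python
import random

EMPTY = None

def make_grid(size, density, rng):
    """Occupy each cell with probability `density`; occupied cells are 'B' or 'R'."""
    return [[rng.choice("BR") if rng.random() < density else EMPTY
             for _ in range(size)] for _ in range(size)]

def same_type_share(grid, r, c):
    """Share of occupied neighbors (up to eight) matching the agent at (r, c).
    Returns None when the agent has no occupied neighbors."""
    size = len(grid)
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size \
                    and grid[rr][cc] is not EMPTY:
                total += 1
                same += grid[rr][cc] == grid[r][c]
    return same / total if total else None

def segregation_index(grid):
    """Assumed index: mean same-type share over agents with at least one neighbor."""
    size = len(grid)
    shares = [same_type_share(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] is not EMPTY]
    shares = [s for s in shares if s is not None]
    return sum(shares) / len(shares) if shares else 0.0

def step(grid, threshold, rng):
    """Move each agent that is unsatisfied at the start of the step to a
    random empty cell. Returns the number of agents moved."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    empties = [(r, c) for r, c in cells if grid[r][c] is EMPTY]
    unhappy = []
    for r, c in cells:
        if grid[r][c] is EMPTY:
            continue
        share = same_type_share(grid, r, c)
        if share is not None and share < threshold:
            unhappy.append((r, c))
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], EMPTY
        empties[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved

def run(grid, threshold, rng, max_steps=500):
    """Step until nobody moves or max_steps is reached; return steps with movement."""
    for n in range(max_steps):
        if step(grid, threshold, rng) == 0:
            return n
    return max_steps

if __name__ == "__main__":
    rng = random.Random(42)
    grid = make_grid(30, 0.8, rng)
    before = segregation_index(grid)
    steps = run(grid, 0.30, rng)
    print(f"index {before:.2f} -> {segregation_index(grid):.2f} in {steps} steps")
```

Calling `run` with thresholds of 0.30, 0.75, and 0.90 on fresh grids is one way to repeat the three experiments above outside the browser. Exact numbers will differ from the simulator's because the details assumed here differ.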

ABM’s power lies in what it reveals: emergent phenomena that no individual agent intends and no equation captures. Schelling showed that mild individual preferences can produce extreme collective outcomes. This is not a curiosity: the same kind of feedback-driven emergence is at work in segregation, market bubbles, bank runs, and the cascading failures explored throughout this project.

One sign of how far the field has matured since Schelling computed his grids by hand: there are now documentation standards — the ODD protocol — that make models like this reproducible and comparable; the depth panel in the tool-landscape section below covers them.

Growing Artificial Societies

If Schelling demonstrates emergence from preference, Sugarscape demonstrates emergence from competition. Created by Joshua Epstein and Robert Axtell in their landmark 1996 book Growing Artificial Societies, Sugarscape places agents on a landscape of renewable resources (“sugar”) and gives them simple rules: look around, move to the cell with the most sugar within your vision range, harvest it, consume what your metabolism requires, and die if you run out.
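The foraging rule can be sketched for a single agent as follows. This is an illustrative simplification, not the simulator's code: it assumes the agent looks only along the four lattice directions, ignores other agents occupying cells, breaks ties in favor of the first cell found, and treats wealth of zero or less as death. The function names and the `capacity` grid are hypothetical.

```python
def agent_step(sugar, pos, wealth, vision, metabolism):
    """One move for one agent on a square grid of sugar levels.

    Returns (new_position, new_wealth, alive). Mutates `sugar` by
    harvesting the chosen cell.
    """
    size = len(sugar)
    r, c = pos
    best = pos
    # Look up to `vision` cells along each of the four lattice directions.
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for dist in range(1, vision + 1):
            rr, cc = r + dr * dist, c + dc * dist
            if not (0 <= rr < size and 0 <= cc < size):
                break  # reached the edge of the landscape
            if sugar[rr][cc] > sugar[best[0]][best[1]]:
                best = (rr, cc)
    br, bc = best
    wealth += sugar[br][bc]   # harvest everything on the chosen cell
    sugar[br][bc] = 0
    wealth -= metabolism      # pay the per-step metabolic cost
    return best, wealth, wealth > 0

def grow_back(sugar, capacity, rate=1):
    """Regrow each cell by `rate` units per step, capped at its capacity."""
    for r, row in enumerate(sugar):
        for c, level in enumerate(row):
            row[c] = min(capacity[r][c], level + rate)
```

A full model would loop over all agents in random order each step, remove the dead, and then call `grow_back`.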

The agents are heterogeneous — they differ in metabolism (1 to 4 units of sugar consumed per step) and vision (1 to 6 cells of sight). The landscape is uneven — two “sugar mountains” provide concentrated resources — and every agent starts with a random purse of sugar on a random cell. The simulator below tracks a single inequality number as the agents forage; for now, read it simply as higher means more unequal. Two experiments, and a committed guess before each. First, run the default heterogeneous world: does the number rise, fall, or wander? Second, click Equal Start — every agent now has identical metabolism and vision, keeping only its random birthplace and random starting purse. Does inequality still emerge?

Sugarscape Simulator

Agents (colored dots) forage on a sugar landscape (gold = high sugar). Presets change the agents and the resource growback. Run Heterogeneous first, then Equal Start, and compare the two traces.

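[Interactive simulator. Controls: Agents (default 150), Growback (default 1), Speed (default 8). Readouts: a Gini Coefficient / Population chart, Gini (0.22 at the start), Population (150 at the start), and Median Wealth. Initial status: agents are scattered across a resource landscape with two sugar peaks; press Run to watch wealth inequality emerge from simple foraging rules.]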

In the heterogeneous world, the inequality number climbs from about 0.22 to roughly 0.38 by step fifty — and then locks in, plateauing rather than rising forever. Agents with wide vision and low metabolism accumulate reserves near the sugar peaks; others deplete theirs and die, the population falling from 150 to about 100. In the Equal Start world, the number falls — settling around 0.16, below where it began. Geography and starting luck wash out; it is the differences between agents that compound, step by step, into durable inequality. The number now deserves its name: it is the Gini coefficient, the standard measure of inequality, running from 0 (everyone owns the same) to 1 (one agent owns everything).
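For readers who want to see what the number measures, here is a minimal sketch of one standard way to compute a Gini coefficient from a list of wealth values (the sorted-rank formula). Whether the simulator uses exactly this formula is an assumption; the function name is illustrative.

```python
def gini(wealth):
    """Gini coefficient of non-negative values, using the sorted-rank formula
    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n."""
    values = sorted(wealth)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here: no wealth means no inequality
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With this formula a finite population of n agents tops out at (n - 1) / n rather than exactly 1, so "one agent owns everything" approaches 1 only as the population grows.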

One caution about a word. “Distribution” here comes from statistics: it describes how frequently values occur — how wealth distributes itself across the population. It does not mean there is a distributor somewhere. No one carries out this distribution; no agent does anything to any other agent in this model. Each simply goes about its own foraging. To call the outcome an “unjust distribution” would miss the point — no agent is unjust. Yet the inequality is real, durable, and nobody’s decision.

That is Sugarscape’s importance for ABM methodology: it demonstrates how structural inequality can emerge from fair rules — a result that connects directly to Module 8’s complexity economics and the limitations of models that assume representative agents.

Sugarscape demonstrates that wealth inequality does not require exploitation, corruption, or unfair rules — it can emerge purely from agents competing for spatially concentrated resources. The Equal Start experiment shows which ingredient is load-bearing: with identical agents, geography and starting luck wash out and inequality falls; it is individual variation, compounded through competition, that produces systemic inequality — through a process no single agent controls.

The ABM Tool Landscape

The tools have undergone a generational shift since 2015. NetLogo, the dominant platform for two decades, remains the entry point for education and rapid prototyping. Research and production have moved on: Mesa brought ABM into Python’s data-science and AI ecosystem, Agents.jl (Julia) serves models needing one to two orders of magnitude more speed, and FLAMEGPU 2 runs millions of agents on a single GPU workstation, a scale that previously required a computing cluster. Around them sit spatial platforms (GAMA), commercial tools (AnyLogic), and a shared model repository (CoMSES.net).

That one-paragraph map is enough to follow the rest of this module — you may skip the panel below without losing the thread. If you want more, its Overview organizes the frameworks into four tiers by use case, and the Detailed view adds architectures, benchmarks, the ODD documentation protocol, and the field’s venues and key figures.

The ABM tool ecosystem has matured from education-focused standalone platforms to production-grade frameworks integrated with the Python and Julia scientific computing ecosystems. The GPU acceleration frontier (FLAMEGPU 2) has removed the computational ceiling that limited ABM to small-scale models: million-agent simulations now run on single workstations.

Adjustable Depth

The ABM tool landscape: frameworks, performance, and ecosystems.

The ABM tool landscape can be organized into four tiers by use case:

Education and prototyping: NetLogo (v6.x series with Python integration), Mesa (Python, most accessible modern framework), AgentPy (Jupyter-optimized scientific workflows).
Research performance: Agents.jl (Julia, 10-100x Python speed, native RL integration), Mesa 4 (improved scheduling and visualization).
GPU-accelerated scale: FLAMEGPU 2 (>1000x CPU on CUDA, millions of agents, real-time viz, Python bindings via pyflamegpu).
Domain-specific spatial: GAMA (GIS integration, complete IDE, urban/environmental applications), MATSim (transport/mobility), AnyLogic (commercial, multimodal, drag-and-drop).

The Python ecosystem integration is the most significant shift: Mesa, AgentPy, and pyflamegpu all connect directly to scikit-learn (surrogates), TensorFlow/PyTorch (neural networks), Optuna/Ray Tune (optimization), Pandas/NetworkX (analysis), and Docker/MLflow (reproducibility).

The framework comparison reveals deep architectural trade-offs:

Mesa (Python, Apache 2 Licensed): Mesa’s architecture follows a Model-Agent-Schedule pattern. The Model class owns the schedule (which controls agent activation order) and optional grid/network spaces. Mesa 4 adds improved data collection, modular visualization, and NumPy-backed grid operations. The JOSS publication and active GSoC participation signal long-term viability. Weakness: Python’s GIL limits true parallelism for large models.
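The Model-Agent-Schedule pattern described here can be illustrated in plain Python. The sketch below deliberately does not use Mesa: every class and method name is hypothetical and chosen only to show who owns what. The toy rule (each agent hands one unit of wealth to a randomly chosen agent) is likewise an assumption picked for brevity.

```python
import random

class Agent:
    """Holds its own state and a reference back to the model."""
    def __init__(self, unique_id, model):
        self.unique_id = unique_id
        self.model = model
        self.wealth = 1

    def step(self):
        # Toy rule: give one unit to a randomly chosen agent (possibly itself).
        if self.wealth > 0:
            other = self.model.rng.choice(self.model.schedule.agents)
            other.wealth += 1
            self.wealth -= 1

class RandomActivationSchedule:
    """Controls activation order: every agent once per step, shuffled."""
    def __init__(self, rng):
        self.rng = rng
        self.agents = []
        self.steps = 0

    def add(self, agent):
        self.agents.append(agent)

    def step(self):
        order = self.agents[:]
        self.rng.shuffle(order)
        for agent in order:
            agent.step()
        self.steps += 1

class Model:
    """Owns the random number generator, the schedule, and the agents."""
    def __init__(self, n_agents, seed=None):
        self.rng = random.Random(seed)
        self.schedule = RandomActivationSchedule(self.rng)
        for i in range(n_agents):
            self.schedule.add(Agent(i, self))

    def step(self):
        self.schedule.step()
```

Keeping activation order in its own object is the design point: swapping the shuffled schedule for a fixed-order or staged one changes the model's dynamics without touching agent code.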

Agents.jl (Julia): Julia’s multiple dispatch enables clean composition of agent behaviors without the class hierarchy overhead of OOP approaches. The benchmark comparison (github.com/JuliaDynamics/ABMFrameworksComparison) shows 10-100x speedup over Mesa/Repast on equivalent models. Event-driven scheduling (continuous-time) is native, enabling mixed discrete/continuous models. The Datseris et al. (2022) SIMULATION paper provides the formal description.

FLAMEGPU 2: The key innovation is mapping agent operations to GPU kernels. Each agent type has a set of “agent functions” executed in parallel across GPU threads. Communication between agents uses message boards (broadcast, spatial, bucket) rather than direct references — a design forced by GPU memory architecture but well-suited to ABM’s local-interaction patterns. The >1000x Boids benchmark (NVIDIA developer blog) reflects embarrassingly parallel updates; models with complex agent-agent dependencies see smaller but still significant speedups.

The ODD Protocol: The 2020 second update (Grimm et al., JASSS 23(2)) refined the original 2006/2010 versions. The seven elements: Purpose and Patterns, Entities/State Variables/Scales, Process Overview and Scheduling, Design Concepts (11 sub-elements including Emergence, Adaptation, Sensing, Interaction, Stochasticity), Initialization, Input Data, Submodels. Extensions: ODD+D (decision-making), ODD+2D (decisions + data). CoMSES.net hosts the protocol and maintains the Computational Model Library for ODD-documented models.
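One lightweight way to keep the seven elements in view while documenting a model is a checklist structure. The sketch below is a hypothetical helper, not part of the ODD protocol or of any tool mentioned here; the field names simply mirror the seven elements listed above.

```python
from dataclasses import dataclass, fields

@dataclass
class ODDDescription:
    """One free-text field per ODD element, in the protocol's order."""
    purpose_and_patterns: str = ""
    entities_state_variables_scales: str = ""
    process_overview_and_scheduling: str = ""
    design_concepts: str = ""
    initialization: str = ""
    input_data: str = ""
    submodels: str = ""

    def missing(self):
        """Names of elements that are still empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

draft = ODDDescription(
    purpose_and_patterns="Show how mild preferences produce segregation.",
    initialization="Random placement at 80% density.",
)
```

Here `draft.missing()` lists the five elements still to be written.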

Python bridge infrastructure: pyNetLogo, NL4Py, and Netlogopy enable controlling NetLogo from Python — preserving investments in existing NetLogo models while adding ML/analysis capabilities. This hybrid approach is common in transitioning research groups.

Venues and people: The research community centers on the MABS workshop (Multi-Agent-Based Simulation, since 1998, now at AAMAS 2026) and the Social Simulation Conference (coordinated by ESSA, the European Social Simulation Association); the field’s primary journal is JASSS (Journal of Artificial Societies and Social Simulation, founded 1998). Key figures include Volker Grimm (Helmholtz Centre, pioneer of pattern-oriented modeling, 2023 Whittaker Award), Steven Railsback (co-author of the standard textbook), and Robert Axtell (George Mason, whose 2023 review in the Journal of Economic Literature mapped the field’s impact on economics).

AI-Powered ABM

The convergence of AI and ABM is transforming the field in three distinct ways: machine learning for calibration, reinforcement learning for agent behavior, and differentiable programming for end-to-end optimization.

ML surrogates for calibration address ABM’s computational bottleneck. Calibrating an ABM (finding parameter values that reproduce observed data) traditionally requires running the model thousands of times across the parameter space. Surrogate models replace this brute force with a learned approximation: train a neural network on a subset of ABM runs, then use the surrogate for rapid parameter exploration. Published comparisons report deep neural networks outperforming Gaussian processes and gradient-boosted trees for ABM emulation, with 100–1,000x speedups in parameter exploration once the surrogate is trained. Combined with Bayesian optimization, surrogates enable efficient multi-objective calibration, that is, fitting to multiple empirical patterns simultaneously.
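The surrogate workflow can be sketched in a few lines. To stay dependency-free, the example below swaps the neural network for a piecewise-linear interpolant, and it swaps the ABM for a cheap stand-in function, `slow_model`, whose formula is invented purely for illustration. What carries over is the shape of the workflow: a handful of expensive runs, a cheap fitted approximation, and a dense search on the approximation only.

```python
def slow_model(theta):
    """Stand-in for an expensive ABM run (hypothetical smooth response)."""
    return 0.5 + 0.45 * theta * (2.0 - theta)

def fit_surrogate(thetas, outputs):
    """Return a piecewise-linear interpolant through the training runs."""
    pairs = sorted(zip(thetas, outputs))

    def surrogate(theta):
        if theta <= pairs[0][0]:
            return pairs[0][1]
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if theta <= x1:
                return y0 + (y1 - y0) * (theta - x0) / (x1 - x0)
        return pairs[-1][1]

    return surrogate

def calibrate(surrogate, target, lo=0.0, hi=1.0, n=1001):
    """Dense search on the cheap surrogate for the parameter closest to target."""
    candidates = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(candidates, key=lambda t: (surrogate(t) - target) ** 2)

# Nine "expensive" runs train the surrogate; the 1001-point search never
# touches the real model.
design = [i / 8 for i in range(9)]
surrogate = fit_surrogate(design, [slow_model(t) for t in design])
theta_hat = calibrate(surrogate, target=0.75)
```

The estimate is only as good as the surrogate between its training points, which is why practical workflows re-run the real model at `theta_hat` to confirm the fit.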

Reinforcement learning replaces hand-coded agent rules with learned policies. Instead of specifying how agents should behave, the modeler specifies a reward signal (maximize wealth, minimize distance, maintain cooperation) and RL agents discover strategies through trial and error. The Abmarl framework (Lawrence Livermore National Laboratory) bridges ABM simulation and multi-agent RL training. This connects to Game Theory and Cooperation: multi-agent RL (MARL) systems have been reported to exhibit emergent cooperation without explicit communication, to develop implicit (“telepathic”) coordination, and to fall into distinct regimes (coordinated, fragile, and jammed or disordered) separated by phase transitions that depend on synchronization dynamics.
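The difference between a coded rule and a learned one shows up even in the smallest possible case. The sketch below is a single, stateless agent choosing between foraging patches with an epsilon-greedy rule. It is far simpler than multi-agent RL and has nothing to do with Abmarl's API. The patch rewards, noise level, and function name are assumptions for illustration.

```python
import random

def learn_patch_choice(mean_reward, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which patch pays best.

    mean_reward: true average payoff of each patch (unknown to the agent).
    Returns (estimated value per patch, visit count per patch).
    """
    rng = random.Random(seed)
    value = [0.0] * len(mean_reward)
    visits = [0] * len(mean_reward)
    for _ in range(steps):
        if rng.random() < epsilon:
            patch = rng.randrange(len(mean_reward))   # explore
        else:
            patch = max(range(len(value)), key=value.__getitem__)  # exploit
        reward = rng.gauss(mean_reward[patch], 0.5)   # noisy payoff
        visits[patch] += 1
        # Incremental running mean of the rewards seen at this patch.
        value[patch] += (reward - value[patch]) / visits[patch]
    return value, visits

value, visits = learn_patch_choice([1.0, 2.0])
```

Nothing in the code says "prefer the second patch". The preference appears in `visits` only because the reward signal favors it, which is the sense in which the behavior is learned rather than specified.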

The most transformative development is differentiable ABM. If a simulation is differentiable, meaning gradients can flow backward through agent interactions, then parameter optimization becomes a gradient descent problem rather than a search problem. AgentTorch (NeurIPS 2023) tensorizes agent operations on a PyTorch backend, enabling end-to-end gradient flow, one-shot sensitivity analysis, and millions of agents on a single GPU. FLAME (AAMAS 2024, MIT Media Lab; unrelated to FLAMEGPU despite the name) provides a domain-specific language for stochastic ABMs with Autograd compatibility. Foragax uses JAX’s functional, differentiable Python for multi-agent foraging simulations.

The implication is profound: calibrating a model with millions of agents, which previously required days of random or grid search, can converge in minutes using gradient descent when the simulation is differentiable end to end. Sensitivity analysis (understanding how each parameter affects outputs) comes out of the same gradient computation rather than requiring thousands of separate runs. The “game changer” potential is not incremental improvement but a qualitative shift in which models are computationally feasible.

AI is not replacing ABM — it is amplifying it. ML surrogates make calibration tractable. RL makes agent behavior adaptive. Differentiable programming makes optimization automatic. Together, they are transforming ABM from a tool that requires expert hand-tuning into a methodology that can be systematically optimized at scale.

The demo below turns this from a claim into a count. The task: find the Schelling threshold that produces a chosen target segregation level. Both strategies may only run the model and observe the result — no formula, no shortcut. Random search will spend exactly 40 runs. Commit to a guess before clicking: how many runs will gradient descent need to land near the target — more than 40, about the same, or far fewer?

Differentiable ABM: Parameter Optimization

Find the Schelling threshold that produces the target segregation level. Run both search strategies and compare the evaluation counters under each button.

[Interactive demo. Control: Target Segregation (default 0.75). Each search button shows its own evaluation counter.]

The grey curve shows how far each threshold value is from producing the target segregation level. Random search (🔴) evaluates many points blindly. Gradient descent (🟢) follows the slope downhill, typically reaching the target in far fewer evaluations. Differentiable ABM frameworks like AgentTorch make this gradient computation automatic — enabling calibration of models with millions of agents.

Count the evaluations: at the default target, gradient descent typically lands within tolerance after 7 to 16 model runs, against random search’s fixed 40 — the counter under each button is the checkable number. Now the vocabulary for what you just watched. The grey curve — how far each threshold lands from the target — is a loss function. Walking downhill along its slope is gradient descent. Random search ignores the slope and pays for it in evaluations; gradient descent reads the slope and spends its budget only where the curve points. Differentiable ABM frameworks exist to make that slope available automatically, even for models with millions of agents and parameters.
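The comparison can be reproduced in miniature. The sketch below is not the demo's code. It replaces the Schelling model with a smooth, noise-free stand-in (`toy_segregation`, an invented formula), and it estimates the slope with a finite difference, so both strategies only ever run the model and observe the result. The starting point, step size, and tolerance are assumptions.

```python
import random

class CountedModel:
    """Wraps a model function and counts how many times it is run."""
    def __init__(self, fn):
        self.fn = fn
        self.evaluations = 0

    def __call__(self, theta):
        self.evaluations += 1
        return self.fn(theta)

def toy_segregation(threshold):
    """Hypothetical stand-in for 'segregation index as a function of threshold'."""
    return 0.5 + 0.45 * threshold * (2.0 - threshold)

def random_search(model, target, budget=40, seed=0):
    """Spend the whole budget on uniformly random thresholds; keep the best."""
    rng = random.Random(seed)
    best_theta, best_loss = None, float("inf")
    for _ in range(budget):
        theta = rng.random()
        loss = (model(theta) - target) ** 2
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta

def gradient_descent(model, target, start=0.6, rate=1.0, tol=0.01,
                     h=1e-3, max_iters=50):
    """Walk downhill on the squared-error loss using a forward-difference slope.
    Each iteration costs two model runs: one at theta, one at theta + h."""
    theta = start
    for _ in range(max_iters):
        out = model(theta)
        if abs(out - target) < tol:
            break                      # close enough to the target: stop
        loss = (out - target) ** 2
        loss_h = (model(theta + h) - target) ** 2
        slope = (loss_h - loss) / h    # finite-difference estimate of dLoss/dtheta
        theta = min(1.0, max(0.0, theta - rate * slope))
    return theta

gd_model = CountedModel(toy_segregation)
theta_gd = gradient_descent(gd_model, target=0.75)

rs_model = CountedModel(toy_segregation)
theta_rs = random_search(rs_model, target=0.75)
```

Comparing `gd_model.evaluations` with `rs_model.evaluations` gives the same kind of count as the demo's counters. A real ABM's response is noisy and can be flat or jagged, which is where finite differences struggle and automatic gradients from a differentiable framework become valuable.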

That is the practical picture, and it is enough for the rest of the module — the panel below is optional depth. Its Overview explains how frameworks keep randomness differentiable (the reparameterization trick); the Detailed view covers AgentTorch’s tensor representation and hybrid learning modes.

Adjustable Depth

Differentiable programming, surrogate models, and RL for agent behavior.

Differentiable ABM works by making every operation in the simulation differentiable — meaning gradients can flow backward from outputs (e.g., final segregation index) through every agent interaction to the input parameters (e.g., threshold). This is the same principle that makes neural network training possible (backpropagation), applied to agent-based simulations.

The key challenge is stochasticity: ABMs rely on random number generation (for agent movement, interaction outcomes, etc.), which is not differentiable. Frameworks like AgentTorch handle this using the “reparameterization trick” — expressing random samples as deterministic functions of parameters plus noise from a fixed distribution. This preserves differentiability while maintaining stochastic behavior.
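A small numerical example shows the trick at work. The sketch below is generic and does not reflect AgentTorch's internals. It estimates how the expected value of z squared changes with the mean of a normal distribution, by writing each sample as the mean plus scaled noise. The function name and sample count are assumptions. For this case the exact answer is known, since E[z²] = mu² + sigma², so the derivative with respect to mu is 2·mu.

```python
import random

def pathwise_gradient(mu, sigma, n_samples=20000, seed=0):
    """Estimate d/dmu of E[z**2] for z ~ Normal(mu, sigma) by writing
    z = mu + sigma * eps with eps ~ Normal(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise from a fixed distribution, independent of mu
        z = mu + sigma * eps       # deterministic in mu once eps is drawn, so dz/dmu = 1
        total += 2.0 * z * 1.0     # chain rule: d(z**2)/dz times dz/dmu
    return total / n_samples

estimate = pathwise_gradient(mu=1.5, sigma=1.0)
```

The estimate can be checked against the exact value 2 * 1.5 = 3.0. The same move does not apply directly to discrete choices such as "move or stay", which need an additional continuous relaxation.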

Surrogate models take a complementary approach: instead of making the ABM itself differentiable, train a neural network to approximate the ABM’s input-output mapping. The neural network is already differentiable, so standard optimization applies. The trade-off is accuracy — the surrogate is an approximation, not the exact model.

AgentTorch’s architecture (Chopra et al., NeurIPS 2023) represents agents as tensors rather than objects. Agent states are stored in multi-dimensional arrays where each row is an agent and each column is a state variable. Agent interactions are expressed as tensor operations — matrix multiplications, reductions, and element-wise functions — all of which have well-defined gradients in PyTorch’s autograd system. This “tensorization” serves dual purposes: it enables differentiability and it maps naturally onto GPU parallel execution.

The FLAME framework (AAMAS 2024, MIT Media Lab) takes a different approach: it provides a domain-specific language for expressing stochastic ABMs that compiles to either PyTorch or JAX backends. FLAME supports three learning modes: supervised learning (calibrate parameters to match data), reinforcement learning (optimize agent policies), and hybrid learning (embed differentiable neural network modules within mechanistic agent rules). The hybrid mode is particularly powerful — agents can have hand-coded domain knowledge for well-understood behaviors and learned neural components for complex decision-making.

The BiLSTM inverse mapping approach (arXiv 2509.03303) trains a bidirectional LSTM to map directly from observed time-series data to ABM parameters — bypassing both surrogate models and differentiable simulation. The network is trained on synthetic data generated by running the ABM with known parameters. At inference time, it provides parameter estimates in a single forward pass, enabling real-time calibration.

Bayesian optimization with emulators (Nature Communications 2021) combines GP or neural network surrogates with active learning: the optimizer chooses the next parameter point to evaluate by balancing exploitation (regions near the current best) and exploration (regions of high uncertainty). For multi-objective calibration — fitting multiple empirical patterns simultaneously — this approach is dramatically more sample-efficient than grid search or random sampling.

LLM Agents and the Validation Challenge

The most dramatic recent development is using large language models as agent cognition. Instead of hand-coded rules or learned RL policies, LLM-powered agents perceive their environment through text descriptions, reason using the LLM’s capabilities, and produce actions in natural language. Stanford’s “Generative Agents” paper (2023) demonstrated LLM agents planning daily routines, forming relationships, and organizing social events in a simulated town — behavior far more human-like than any rule-based agent could produce.

LLM-assisted ABM creation is equally transformative. Studies show that LLMs can generate working ABM code from ODD protocol descriptions (CHI 2024), enable multi-stage workflows where the LLM translates natural language specifications into Mesa or NetLogo code, and accelerate the model development cycle from weeks to hours. The Tsinghua FIB-Lab’s survey (Humanities and Social Sciences Communications, a Nature Portfolio journal, 2024) mapped the full landscape of LLM-ABM integration.

But this power comes with a fundamental challenge: validation. As Scherrers et al. (2024) argue in Artificial Intelligence Review, LLM integration may exacerbate rather than alleviate ABM’s validation crisis. The core problems are interconnected:

Black-box behavior: LLM agents make decisions through mechanisms that are opaque to the modeler. Traditional ABM rules are transparent — you can trace exactly why an agent moved. LLM agent decisions pass through billions of parameters with no interpretable causal chain.

Reproducibility: LLMs are stochastic. The same prompt can produce different outputs from run to run and across API versions, and how much the outputs vary depends on sampling settings such as temperature. This undermines the reproducibility that the ODD protocol was designed to ensure.

Bias inheritance: LLM agents inherit the biases of their training data. A simulated population of LLM agents may exhibit WEIRD (Western, Educated, Industrialized, Rich, Democratic) biases rather than the behaviors of the population being modeled.

Scalability: Running an LLM query per agent per step is computationally expensive. Simulations with thousands of LLM agents face significant GPU and API cost constraints that traditional ABMs do not.

This creates a methodological fork: mechanistic ABMs offer transparency and reproducibility but limited behavioral realism; LLM-powered ABMs offer rich, human-like behavior but at the cost of interpretability and validation rigor. The field has not yet resolved this tension.

Meanwhile, evolutionary game theory, building on Game Theory and Cooperation, has been applied widely in ABM since 2019. One review analyzed 539 such publications across healthcare (tumor heterogeneity as evolutionary games, vaccine logistics as tripartite games), sustainability (fisheries and water management as N-player commons dilemmas, smart grid economics as evolutionary games against dynamic pricing), and governance (stakeholder strategy co-evolution). These applications demonstrate ABM’s value precisely where validation is most critical: in domains where policy decisions affect millions.

The validation challenge is not a bug in ABM methodology — it is the central question. LLM agents make the trade-off explicit: we can have behavioral realism or mechanistic transparency, but not both simultaneously. The field’s maturity depends on developing validation frameworks that can handle this trade-off honestly, rather than pretending either side has been resolved. Module 15 develops the validation-tier taxonomy — face, behavioral, and predictive validity — in full.

If the fork’s practical consequences are all you need, skip the panel below and move on to the applications — nothing later depends on it. For readers who want the machinery: the Overview recaps how validation was done before LLM agents, and the Detailed view adds the formal verification–validation–accreditation framework and the economics of LLM-agent scale.

Adjustable Depth

LLM agents, validation frameworks, and the transparency-realism trade-off.

ABM validation has always been challenging — models are inherently underdetermined (many parameter combinations can produce the same output patterns). Traditional approaches include: pattern-oriented modeling (fitting to multiple empirical patterns at different scales), sensitivity analysis (testing how robust results are to parameter changes), and cross-validation against held-out data.

LLM agents add three new validation challenges. First, the agent’s decision function is a neural network with billions of parameters that was trained on internet text — there is no way to audit its “rules.” Second, LLM outputs are stochastic and change across model versions, making exact replication impossible. Third, the training data introduces systematic biases that may not represent the population being modeled.

The practical response is emerging as a hybrid approach: use LLM agents for exploratory analysis and hypothesis generation (where behavioral richness matters most), then validate findings with mechanistic ABMs (where transparency and reproducibility matter most). The two approaches complement rather than replace each other.

The [In]Credible Models framework (JASSS Vol. 27, Issue 4) formalizes the validation challenge as a three-stage process: Verification (does the code implement the model correctly?), Validation (does the model represent the real system adequately?), and Accreditation (is the model suitable for its intended purpose?). Each stage has distinct requirements and failure modes.

For LLM-ABM specifically, verification faces the problem that the LLM’s behavior cannot be specified declaratively — it emerges from prompt engineering and model weights. The ODD protocol’s “Design Concepts” section, which documents emergence, adaptation, sensing, interaction, and stochasticity, becomes nearly impossible to complete for LLM agents because these properties are not designed but inherited from pre-training.

The bias inheritance problem is well-documented in the NLP literature. LLM agents in social simulation inherit distributional biases from their training corpora: they over-represent English-language, Western, educated perspectives. Attempts to “prompt away” these biases have limited effectiveness because the biases are deeply embedded in the model’s representations. This is particularly problematic for simulations of non-Western societies or historical populations.

Evolutionary game theory applications demonstrate the value of validated mechanistic ABMs. In cancer research, tumors are modeled as multi-population evolutionary games where different cell phenotypes correspond to strategies. The evolutionary dynamics — mutation, selection, drift — are well-understood mechanistically and can be validated against clinical data. ABMs of healthcare supply chains (vaccine logistics as tripartite games between manufacturers, distributors, and consumers) similarly benefit from mechanistic transparency: policymakers need to understand why the model recommends a particular intervention.

The scalability constraint is economic as well as computational. Running GPT-4 for 1,000 agents × 100 steps × multiple runs generates significant API costs. FLAMEGPU’s >1000x speedup is irrelevant for LLM agents because the bottleneck is LLM inference, not agent computation. Some researchers explore distilling LLM behavior into smaller, faster models — training a lightweight neural network to approximate the LLM’s decisions — but this introduces yet another layer of approximation and validation challenge.
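The multiplication behind that cost is worth making explicit. In the sketch below, only the agent and step counts come from the text. The number of runs, the tokens per call, and the price per million tokens are hypothetical placeholders, not quotes for any real model or provider.

```python
def llm_calls(agents, steps, runs):
    """One LLM query per agent per step per run."""
    return agents * steps * runs

def estimated_cost_usd(calls, tokens_per_call, usd_per_million_tokens):
    """Rough cost: total tokens times a flat per-token price."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000

# 1,000 agents and 100 steps are from the text; everything else is a placeholder.
calls = llm_calls(agents=1_000, steps=100, runs=10)
cost = estimated_cost_usd(calls, tokens_per_call=500, usd_per_million_tokens=10.0)
```

Under these placeholder values the experiment needs one million LLM calls. The point is the scaling: every extra run, step, or agent multiplies the bill, whereas in a rule-based ABM the same decision costs a few arithmetic operations.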

Applications and the Road Ahead

ABM has moved from academic exploration to operational deployment across multiple domains. COVID-19 was a watershed: individual-based models like Covasim captured heterogeneous contact patterns, superspreading events, and the effects of targeted interventions (school closures, workplace policies, vaccination prioritization) that aggregate compartmental models like SIR could not resolve. The pandemic demonstrated ABM’s value for policy under uncertainty — and its limitations when calibration data was sparse and fast-changing, as explored in Module 7’s complexity lens on COVID.

Smart cities and urban planning use ABM to simulate traffic, pedestrian flow, energy systems, and land use change. GAMA and MATSim power large-scale urban simulations integrating GIS data with agent behavior. Climate applications model farmers’ adaptation to changing conditions, migration patterns driven by environmental stress, and the cascading effects of extreme weather through interconnected infrastructure — connecting to Module 2’s resilience analysis.

Financial markets — the domain explored in Module 8’s market simulator — use ABM to model heterogeneous trader strategies, flash crashes, and systemic risk. Central banks including the Bank of England, ECB, and Federal Reserve now use ABM alongside traditional DSGE models for policy simulation.

The computational frontier continues to advance. A 2023 economic simulation ran 331 million agents in 108 seconds using 128 CPU cores — roughly 70% parallel scaling efficiency. Generative social simulations have exceeded 10,000 LLM-powered agents with over 5 million interactions per run. Digital twins — live ABMs continuously calibrated against real-time data — represent the convergence of ABM, IoT, and AI infrastructure; Module 15 develops them in full.

The field’s trajectory points toward a synthesis: differentiable ABM cores providing gradient-based optimization, GPU acceleration enabling million-agent scale, surrogate models making calibration tractable, and LLM agents providing behavioral richness where needed — all within reproducible, ODD-documented frameworks. The gap between this vision and current practice remains significant. But the tools, the community, and the demonstrated applications make agent-based modeling the most natural computational methodology for the complex systems that define our world.

Agent-based modeling has matured from an academic novelty into the computational methodology of complexity science. The core insight remains unchanged from Schelling in 1971: the dynamics that matter — segregation, inequality, market crashes, epidemics, cooperation, conflict — emerge from agents and interactions, not from equations. What has changed is our ability to build, calibrate, validate, and scale these models. The AI age has not replaced ABM — it has made it indispensable.