Baseline Article: Realistic AI Futures 2025–2040
Last updated: 2026-06-14
1. Introduction
For several decades, books on the Singularity — Kurzweil, Bostrom, Tegmark, Hanson, Yudkowsky, and others — have imagined transformative AI futures dominated by exponential trends, recursive self-improvement, and superhuman agents. The years between 2020 and 2025 produced rapid advances, but also exposed a set of constraints those books often understated: compute, energy, data, safety, and economics.
This article tries to do something narrower than the original Singularity literature. It synthesizes those classical ideas with the technological realities visible today, and uses the result to sketch plausible, evidence-based trajectories for AI development through 2040. The goal is not prophecy. The goal is a working model that can be revised when the evidence changes.
2. State of AI in Mid-June 2026
A useful way to read the current moment is to start with what frontier labs are actually shipping, and then ask what those shipments imply.
Frontier release cadence is now continuous. In February, GPT-5.3-Codex, Claude Opus 4.6, Claude Sonnet 4.6, Gemini 3.1 Pro, and Grok 4.20 arrived together. April added Claude Mythos Preview (gated), Claude Opus 4.7, Google Deep Research Max, and GPT-5.5. May added Gemini 3.5 Flash, which Google framed around agentic workflows, coding, speed, and broad default distribution, and then Claude Opus 4.8 on May 28 — just 41 days after Opus 4.7. Early June added a new kind of entrant: at Build 2026 on June 2, Microsoft AI launched seven in-house models trained from scratch — including MAI-Thinking-1, its first reasoning model (reported 97% on AIME 25 and 53% on SWE-Bench Pro, near Opus 4.6), and MAI-Code-1, a GitHub-tuned coding model shipped into Copilot and VS Code — framed around “long-term self-sufficiency” and a “superintelligence lab.” The notable part is not the benchmark line but the identity of the entrant: Microsoft has been OpenAI’s primary partner, the April 2026 amended agreement made that relationship non-exclusive, and the partner is now also a frontier competitor. June 9 added the largest release of the window: Anthropic made the Mythos line public for the first time, shipping Claude Fable 5 — a Mythos-class model wrapped in safety classifiers for general release, which Anthropic calls the most capable model it has made generally available, state-of-the-art on nearly all tested benchmarks — alongside the restricted Claude Mythos 5, the identical underlying model with some safeguards lifted for authorized cybersecurity use. Within four days the release had drawn a public jailbreak claim, a backlash over silently degraded legitimate work, and a U.S. government directive suspending foreign-national access (Section 4). What used to be a quarterly model war now looks more like a rolling release calendar, and the roster of labs running their own training stacks is widening rather than consolidating.
Distribution cadence is becoming as important as release cadence. Late April and May produced OpenAI’s amended Microsoft agreement, OpenAI models and Codex entering AWS Bedrock managed-agent workflows, U.S. classified-network AI agreements, expanded CAISI pre-deployment testing arrangements with Google DeepMind, Microsoft, xAI, OpenAI, and Anthropic, Google pushing Gemini 3.5 into Search, Gemini, Antigravity, and enterprise surfaces, and OpenAI-Dell work to bring Codex into hybrid and on-premises enterprise environments. The frontier is now shaped by cloud routing, procurement channels, access controls, audit logs, government evaluation, classified deployment, and governed enterprise data access — at least as much as by model cards. The clearest consumer-side instance arrived at Apple’s WWDC on June 8–9, 2026: the rebuilt Siri now runs its server-side reasoning on a custom ~1.2-trillion-parameter Google Gemini model executed inside Apple’s Private Cloud Compute, reportedly for about $1B per year, while Apple’s own on-device foundation models remain Apple-built. The significance is distributional rather than technical — the largest consumer device platform on earth chose to route its assistant’s heavy reasoning through a frontier lab’s model rather than its own, and chose Google over OpenAI or Anthropic to do it.
The capability frontier has moved from chat to long-running computer work. OpenAI frames GPT-5.5 around agentic coding, online research, spreadsheets, software operation, and cross-tool task completion. Anthropic frames Opus 4.7 around difficult software engineering, memory, high-resolution visual work, and multi-step enterprise workflows. Google frames Gemini 3.5 Flash and Gemini Spark around long-horizon action, subagents, background assistance, and agentic coding through Antigravity. The shared direction is unmistakable: the product is no longer a conversation, it is an operator.
Reasoning is now a commodity feature; sustained agency is the differentiator. Every major lab ships inference-time compute — OpenAI’s thinking models, Google’s Deep Think, Anthropic’s extended/adaptive thinking, DeepSeek-style RLVR, xAI’s Grok reasoning, and Qwen reasoning variants. The live question is no longer “can the model reason?” but “can it plan, use tools, verify, recover from errors, and keep useful state over hours or days?”
Benchmark saturation forces continuous bar-raising. Older text benchmarks are largely exhausted. The current frontier is agentic and operational: SWE-Bench Pro, Terminal-Bench 2.0, OSWorld, GDPval-AA, BrowseComp, Humanity’s Last Exam, ARC-AGI-2/3, CyberGym, and long-horizon internal evaluations. These results are harder to compare than the older numbers, because labs use different harnesses, tool layers, effort settings, and contamination screens.
Cybersecurity has crossed a threshold. Anthropic’s Project Glasswing gates Claude Mythos Preview for defensive use, reporting thousands of high-severity vulnerabilities across critical software and arguing that frontier coding models can now exceed all but the most skilled humans at vulnerability discovery and exploitation. OpenAI’s GPT-5.3-Codex and GPT-5.5 system cards likewise emphasize stronger cyber safeguards. This is the clearest near-term example of capability and misuse risk advancing together.
Efficiency gains now rival raw scale. DeepSeek R1 made algorithmic efficiency impossible to ignore. Since then, the leading labs have optimized for fewer tokens, better compaction, lower latency, and specialized serving paths. Efficiency does not reduce aggregate demand; it expands the set of tasks worth automating (see Jevons Paradox).
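The Jevons-style claim can be made concrete with a toy demand model. The sketch below assumes constant-elasticity demand, meaning the number of tasks worth automating scales as cost per task raised to the power of minus the elasticity. That functional form, the elasticity values, and the helper name `aggregate_compute_ratio` are illustrative assumptions, not figures from any lab.

```python
def aggregate_compute_ratio(efficiency_gain: float, elasticity: float) -> float:
    """Total compute demand after an efficiency gain, relative to before.

    Assumption (hypothetical): demand for tasks is constant-elasticity in
    cost per task, so tasks demanded scale as cost ** (-elasticity).
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    tasks_ratio = efficiency_gain ** elasticity       # more tasks become worth running
    compute_per_task_ratio = 1.0 / efficiency_gain    # each task needs less compute
    return tasks_ratio * compute_per_task_ratio


# A 10x efficiency gain under three assumed elasticities.
for elasticity in (0.5, 1.0, 1.5):
    ratio = aggregate_compute_ratio(10.0, elasticity)
    print(f"elasticity {elasticity}: total compute x{ratio:.2f}")
```

Under these assumptions a 10× efficiency gain shrinks total compute demand when elasticity is below 1 and grows it when elasticity is above 1. The claim in the paragraph above amounts to a bet that AI workloads sit in the second regime; the sketch does not settle that empirically.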
AI agents are useful, but reliability remains the bottleneck. The best systems can now complete meaningful hour-scale software and research tasks, and some products support parallel subagents. Enterprise deployment, however, still shows a wide gap between pilots and measurable returns. The key metric has shifted from “which model is smartest?” to a more practical question: how long can this agent work before accumulated context, orchestration, tool, retrieval, skill-integrity, domain-transfer, and judgment errors dominate? The Opus 4.8 release on May 28 is the first frontier launch whose headline targets this bottleneck directly rather than as an afterthought: Anthropic reports a more than tenfold reduction in overconfident behaviour versus Opus 4.7, the first Claude to score 0% on uncritically reporting flawed results, and an “important events not surfaced” rate of 3.7%. Alongside it, Claude Code’s “dynamic workflows” orchestrate hundreds of parallel subagents (capped near 1,000) with automatic planning, distribution, and output verification. The pairing is telling — more parallel autonomy shipped together with better calibration about when that autonomy is wrong — though these are vendor-reported numbers awaiting independent replication. The first piece of that outside measurement is now in: a Princeton-led study, “Towards a Science of AI Agent Reliability” (June 2026), decomposes reliability into consistency, robustness, predictability, and safety across twelve metrics, evaluates 14 models on two benchmarks, and finds that recent capability gains have yielded only small improvements in reliability. It is a useful corrective. A single vendor can train hard against one failure mode — overconfident progress reports — and still leave the broader reliability profile, especially behaviour across repeated runs and under perturbation, roughly where it was. Calibration on one axis is not reliability on all of them.
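One way to see why horizon length is the practical metric: if each step of a task succeeds independently with a fixed probability, end-to-end success decays geometrically with the number of steps. Independence and a constant per-step rate are simplifying assumptions (real agent errors are correlated and sometimes recoverable), and the function names are hypothetical.

```python
import math


def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step_success ** steps


def steps_to_half_success(per_step_success: float) -> float:
    """Task length at which end-to-end success falls to 50%."""
    if not 0.0 < per_step_success < 1.0:
        raise ValueError("per_step_success must be strictly between 0 and 1")
    return math.log(0.5) / math.log(per_step_success)


# Assumed per-step success rates, chosen only for illustration.
for p in (0.95, 0.99, 0.999):
    horizon = steps_to_half_success(p)
    at_100 = end_to_end_success(p, 100)
    print(f"per-step {p}: 50% horizon ~{horizon:.0f} steps, "
          f"100-step success {at_100:.1%}")
```

Under this toy model, a 99% per-step success rate gives a 50% horizon of roughly 69 steps, and moving to 99.9% extends it about tenfold. That is one reason small reliability gains can matter more for long-running agents than headline benchmark gains.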
Safety and governance are moving from abstract alignment to deployment controls. Anthropic is gating Mythos-class models and testing cyber safeguards on less capable releases. OpenAI is using system-card thresholds, trusted access, automated monitoring, and bio/cyber red-team programs. CAISI-style pre-deployment evaluation is becoming a practical U.S. governance layer, even as the U.S. policy center of gravity remains competitiveness, deregulation, and national-security deployment. What was a reported draft a few weeks ago is now signed: the June 2, 2026 executive order “Promoting Advanced Artificial Intelligence Innovation and Security” establishes a voluntary framework under which developers give the government access to covered frontier models up to 30 days before release — narrower than the up-to-90-day window the draft floated — while explicitly barring any mandatory licensing or preclearance. The EU AI Act’s transparency obligations take effect August 2, 2026, and on June 10 the Commission published the final voluntary Code of Practice on marking and labelling AI-generated content — machine-readable marking, deepfake and chatbot disclosure — to help providers meet those Article 50 obligations before they bind.
Hardware and energy are binding constraints, not background details. The IEA estimates global data center electricity consumption at roughly 415 TWh in 2024 and projects about 945 TWh by 2030 in its base case. NVIDIA’s Q1 FY2027 results showed $75.2B in quarterly data-center revenue and a new split between hyperscale and AI-cloud / industrial / enterprise / sovereign AI infrastructure. Accelerated AI servers drive much of the growth, while networking, grid connection, generation buildout, cooling, and power purchase arrangements increasingly shape deployment geography. The binding constraint is now visible at the level of physical supply chains: late-May reporting indicates that of roughly 12 GW of U.S. data center capacity expected to come online in 2026, only about a third was under active construction, with lead times for critical electrical gear — transformers, switchgear — stretching to as long as five years against $650B+ in combined 2026 hyperscaler capex. Capital is abundant; transformers are not. The same constraint is now visibly steering geography: SoftBank’s May 31 commitment of up to €75B to build 5 GW of data center capacity in France — Phase 1 of ~€45B for 3.1 GW by 2031 — was justified explicitly on energy grounds, since France draws roughly 70% of its electricity from nuclear and posts industrial power prices well under half the UK’s. When clean firm baseload becomes the scarce input, the map of where compute gets built starts to follow the grid rather than the customers.
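The figures in this paragraph imply a few derived quantities that are easy to check. The sketch below uses only numbers quoted above; the helper name is hypothetical, the “about a third” share is taken as exactly one third, and the SoftBank commitment is taken at its “up to” ceiling.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# IEA base case: ~415 TWh in 2024 to ~945 TWh in 2030.
iea_growth = implied_annual_growth(415.0, 945.0, 2030 - 2024)

# U.S. 2026 pipeline: ~12 GW expected, about a third under construction.
under_construction_gw = 12.0 / 3.0

# SoftBank in France: up to EUR 75B for 5 GW; Phase 1 ~EUR 45B for 3.1 GW.
full_cost_per_gw = 75.0 / 5.0
phase1_cost_per_gw = 45.0 / 3.1

print(f"IEA implied growth: {iea_growth:.1%} per year")
print(f"Under construction: ~{under_construction_gw:.0f} GW")
print(f"Cost per GW: full EUR {full_cost_per_gw:.1f}B, "
      f"Phase 1 EUR {phase1_cost_per_gw:.1f}B")
```

The IEA base case works out to just under 15% compound annual growth, and the SoftBank figures imply on the order of €15B per gigawatt for both the full commitment and Phase 1.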
The trajectory is upward, but bounded. The central tension to hold in mind for the rest of this article is straightforward: extraordinary capabilities, unreliable deployment, and increasingly concrete misuse risk — all advancing at once.
3. Capability Trajectories (2025–2030)
3.1 Scaling Limits
Singularity literature has typically assumed three things: infinite compute, recursive self-improvement, and continuous exponential acceleration. In practice, each assumption has met a constraint.
Training costs for frontier models rise by 10–50× per generation (see Scaling Laws). Energy availability becomes a bottleneck before money does (see Thermodynamic Limits). Data quality saturates. And diminishing returns appear in high-level reasoning long before they appear in raw loss.
The efficiency revolution complicates this picture in a useful way. DeepSeek demonstrated that near-frontier reasoning does not require a trillion dollars. OpenAI, Anthropic, and Google now all emphasize token efficiency, context compaction, inference controls, and specialized serving systems. Sutskever has argued that “the age of simple scaling is ending” and that the next breakthrough will require fundamentally new learning methods. Amodei, in March 2026, claimed the opposite: that scaling laws have “not hit a wall at all.” This disagreement, over whether continued scaling is enough or whether new methods and efficiency gains must carry the next stage, is the central technical debate of the moment.
The honest synthesis: acceleration continues, but the path is shifting from pure scale to scale plus algorithmic and systems efficiency.
A useful counterweight to both the hype and doom poles arrived in June 2026 from DeepMind itself. From AGI to ASI (Genewein et al., with Shane Legg, Marcus Hutter, Allan Dafoe, and twelve other authors) deliberately refuses point timelines and instead maps the transition past human-level AGI as a set of open research questions. It lays out four non-exclusive, likely-parallel pathways — continued scaling; algorithmic paradigm shifts; recursive self-improvement; and superintelligence emerging from collectives of coordinated agents — and six possible bottlenecks: the data wall, runaway economic and resource demand, the neural paradigm proving insufficient, research getting harder, the abstraction barrier (below), and deliberate slowdown. Whether each friction merely slows progress or halts it is treated, honestly, as not yet known. The paper’s decomposition of effective compute matches the framing used here: roughly 10× per year, from hardware (~1.5×), investment (~2.5×), and algorithmic efficiency (~3–6×) compounding together.
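Multiplying the quoted factors is a quick sanity check on the headline rate. The sketch below uses only the approximate values cited above; the variable and function names are illustrative.

```python
# Approximate annual growth factors as quoted in the text.
hardware = 1.5
investment = 2.5
algorithmic_low, algorithmic_high = 3.0, 6.0

# The factors compound multiplicatively.
effective_low = hardware * investment * algorithmic_low
effective_high = hardware * investment * algorithmic_high
print(f"Effective compute growth: {effective_low:.2f}x to "
      f"{effective_high:.2f}x per year")


def compounded(annual_factor: float, years: int) -> float:
    """Total growth after `years` at a constant annual factor."""
    return annual_factor ** years


print(f"Five years at 10x per year: {compounded(10.0, 5):,.0f}x")
```

The product lands between about 11× and 22× per year, so the “roughly 10×” headline corresponds to the low end of the quoted algorithmic range. At 10× per year, five years compounds to a 100,000× increase in effective compute, which is why the bottlenecks listed above carry so much weight.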
The paper’s sharpest analytic move is to separate two questions this baseline has tended to bundle. Even if individual-model capability plateaus at human level, collective capability need not. With effective compute still growing ~10× per year and “population scaling” estimated near 25× per year, a plateaued AGI could still be run as millions of faster, parallel instances organized into collectives, firms, or markets. The authors argue that such a population would likely constitute superhuman capability in a broad sense, even though no single instance would be a lone genius. That decoupling reframes the scenarios in Section 7: “no runaway recursive self-improvement” does not by itself imply “no ASI,” because the multi-agent pathway routes around an individual-model ceiling. It also sharpens what the scaling limits actually bound: per-model returns, not necessarily aggregate organizational capability.
Year-by-year agent evolution (2025–2030)
The clearest near-term capability curve is in agentic coding and software development. Extrapolating from current trajectory:
| Year | Milestone | Description |
|---|---|---|
| 2025 | Agentic Coding | AI autonomously generates, refines, and manages multi-file projects. Supports tool use, long-context understanding, and sustained multi-hour runs. |
| 2026 | Autonomous Refactoring Infrastructure | Agent runtimes become callable through SDKs, cloud services, CI/CD workflows, ticketing systems, and governed enterprise environments. Full-project refactors are visible in vendor case studies, but independent reliability evidence remains mixed. |
| 2026.5 | Agents Inside Org Permission Boundaries | Managed agents operate with per-agent identity, audit logs, customer-controlled execution environments, private tool access, and scoped credentials. |
| 2027 | AI-Centric Codebase Co-Ownership | Teams treat AI agents as persistent contributors — opening PRs, resuming work, indexing codebase knowledge, scheduling refactors, tracking regressions, co-maintaining documentation. Early signals are visible; durable accountability is not solved. |
| 2028 | Specification-to-Deployment | AI agents go from ambiguous human specifications to working systems: reading specs, clarifying assumptions, generating architecture, code, tests, and deploying to cloud environments. |
| 2029 | Multi-Agent Collaboration | Heterogeneous AI agents (frontend, backend, infra) collaborate across repos, synchronizing on shared APIs and resolving interface mismatches. |
| 2030 | Continuous Autonomous Optimization | Always-on agents monitor, optimize, and proactively patch live systems, balancing user feedback, performance, and changing hardware targets. |
These milestones assume continued progress without major disruption, and each builds on the previous tier. The May 2026 update is that the 2026 infrastructure layer is now substantially shipping across frontier providers — AWS Bedrock Managed Agents plus the May 18 Bedrock Stateful Runtime, Claude Managed Agents with self-hosted sandboxes and MCP tunnels, Antigravity 2.0 and Managed Agents in the Gemini API, Cursor Composer 2.5 and Cursor in Jira, Devin 2.x Wiki and PR Resuming, and GitHub Copilot Cloud Agent across IDEs running Opus 4.7 and GPT-5.5. The capability layer is more uneven, because benchmark contamination, domain-transfer failures, and long-horizon reliability decay make single coding scores misleading.
Kurzweil identifies a critical feedback loop: once AI achieves sufficient programming ability, it can improve its own programming skill. He calls this loop “the main bottleneck for superintelligent AI” [Kurzweil, The Singularity is Nearer, 2024]. Whether it produces gradual acceleration or a sudden capability jump remains genuinely open.
3.2 Autonomy & Agents
It helps to start with what an “agent” actually is at this point. In current practice, an AI agent is a model wrapped in a harness that lets it execute multi-step workflows within bounded rules, integrate with corporate systems and APIs, and be exposed through SDKs, CI/CD hooks, cloud runtimes, ticketing systems, and managed-agent services. Engineering, operations, finance, and R&D teams use them routinely. Their capability depends on control layers for orchestration, stopping decisions, trace validation, skill retrieval, skill integrity, permissions, and auditability.
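The harness idea can be sketched in a few lines. The example below is a minimal illustration, not any vendor’s runtime: `Harness`, `Action`, and `scripted_policy` are hypothetical names, the “model” is a fixed script standing in for a real model call, and the control layers are reduced to three: a tool allowlist, a step budget, and an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Action:
    tool: str            # name of the tool the policy wants to call
    argument: str        # single string argument, for simplicity
    done: bool = False   # True when the policy decides to stop


@dataclass
class Harness:
    policy: Callable[[List[str]], Action]     # stand-in for the model
    tools: Dict[str, Callable[[str], str]]    # tool name -> implementation
    allowed_tools: Set[str]                   # permission boundary
    max_steps: int = 10                       # hard stopping rule
    audit_log: List[str] = field(default_factory=list)

    def run(self, goal: str) -> List[str]:
        history = [f"goal: {goal}"]
        for step in range(self.max_steps):
            action = self.policy(history)
            if action.done:
                self.audit_log.append(f"step {step}: stop")
                break
            permitted = action.tool in self.allowed_tools and action.tool in self.tools
            if not permitted:
                # Denied calls are logged and reported back, never executed.
                self.audit_log.append(f"step {step}: DENIED {action.tool}")
                history.append(f"denied: {action.tool}")
                continue
            result = self.tools[action.tool](action.argument)
            self.audit_log.append(f"step {step}: {action.tool}({action.argument!r})")
            history.append(f"{action.tool} -> {result}")
        return history


def scripted_policy(history: List[str]) -> Action:
    """Hypothetical stand-in for a model: a fixed three-move script."""
    script = [
        Action("read_file", "README.md"),
        Action("delete_repo", "."),       # outside the allowlist
        Action("", "", done=True),
    ]
    return script[len(history) - 1]


harness = Harness(
    policy=scripted_policy,
    tools={
        "read_file": lambda path: f"<contents of {path}>",
        "delete_repo": lambda path: "deleted",
    },
    allowed_tools={"read_file"},
)
history = harness.run("summarize the repository")
print("\n".join(harness.audit_log))
```

The design point is that permissions, stopping, and logging live in the harness rather than in the model: the scripted policy asks for a destructive tool, and the harness refuses and records the attempt without relying on the policy to behave.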
From there, the trajectory follows a recognizable ladder:
- Autonomous refactoring infrastructure (2026). SDKs, managed runtimes, CI/CD hooks, ticket-to-PR loops, and governed cloud or on-prem deployments make coding agents callable from developer pipelines rather than only from interactive IDEs. Full-project refactoring is real but still uneven and human-supervised.
- Specification-to-deployment (2028). Agents translate ambiguous specifications into working deployed systems.
- Multi-agent collaboration (2029). Specialized agents (security, performance, UX) work together, resembling human software teams with specialization and negotiation.
- Emergent software systems (2030). Software becomes self-modifying and self-optimizing at runtime; the distinction between development and operations dissolves.
It is worth marking what these agents are not, at least so far. They remain supervised — humans define goals, ethical boundaries, and review outcomes, though that supervision tends to erode (see Automation Paradox). They are shaped by economic incentives (see Principal-Agent Problems). They are constrained by tool permissions. And they are not self-directed or self-propagating.
The reasonable conclusion: agentic automation accelerates productivity, but does not, on this trajectory, create independent superintelligences.
4. Alignment & Governance
Building on Russell, Christian, and Ord, and updated with Amodei’s January 2026 risk framework (“The Adolescence of Technology”), the alignment and safety picture has several moving parts.
Constitutional AI — training at the level of identity and values rather than specific rules — remains the most visible alignment approach. Other labs have adopted elements of it, but it is not a complete control method for highly agentic systems.
Mechanistic interpretability continues to advance: millions of features identified inside neural nets, circuits mapped for complex behaviors, and increasingly practical debugging of model internals. Interpretability may help detect deception, scheming, and hidden objectives. The science is still too immature to certify frontier systems.
Capability gating is becoming a live safety instrument. Project Glasswing is the clearest example: a general-purpose frontier model is useful enough for critical defensive cybersecurity, but not considered appropriate for general release. This is a shift from “publish a model with a system card” toward staged access based on domain risk. Fable 5’s June 9 public release put that instrument under load. Its classifiers fall back to a less capable model (Opus 4.8) in cybersecurity, biology/chemistry, and distillation, and within four days two failure modes surfaced at once: a red-teamer claimed a multi-step jailbreak past the classifiers, while professional users reported the same gate silently refusing or degrading legitimate high-risk work without notice — porous and over-broad simultaneously. Anthropic made the fallback visible and walked back the worst of the degradation within a day, but kept the underlying limits. Capability gating is now demonstrably a live instrument, with live failure modes.
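The two failure modes are easier to see in a toy router. The sketch below is an assumption-laden illustration and not Anthropic’s implementation: the model identifiers, the keyword lists, and the `classify_domain` and `route` helpers are all hypothetical, and a production classifier would be a trained model rather than substring matching. It shows a gated-domain fallback that always tells the caller when it has fired.

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY = "primary-model"     # hypothetical identifiers, not real model names
FALLBACK = "fallback-model"

# Toy keyword lists for the three gated domains named in the text.
KEYWORDS = {
    "cybersecurity": ("exploit", "malware"),
    "biology_chemistry": ("pathogen", "synthesis route"),
    "distillation": ("distill",),
}


@dataclass(frozen=True)
class Routed:
    model: str
    fell_back: bool
    notice: str   # empty when no fallback happened


def classify_domain(prompt: str) -> Optional[str]:
    """Return the first gated domain whose keywords appear, else None."""
    text = prompt.lower()
    for domain, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return None


def route(prompt: str, authorized: bool = False) -> Routed:
    """Send gated-domain prompts to the fallback unless the caller is authorized."""
    domain = classify_domain(prompt)
    if domain is None or authorized:
        return Routed(model=PRIMARY, fell_back=False, notice="")
    notice = f"Routed to {FALLBACK}: request classified as {domain}."
    return Routed(model=FALLBACK, fell_back=True, notice=notice)


for prompt in ("Write a haiku about rain",
               "Explain how this exploit was patched"):
    print(route(prompt))
```

The second prompt is a defensive question, yet the toy classifier gates it, which is the over-broad failure in miniature; a prompt that avoids the keywords would pass, which is the porous one. Returning a non-empty `notice` is the visible-fallback choice described above, as opposed to degrading silently.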
Pre-deployment evaluation is becoming institutionalized. CAISI’s 2026 agreements with Google DeepMind, Microsoft, xAI, OpenAI, and Anthropic create a recurring channel for government evaluation of unreleased frontier systems, including national-security and classified-environment testing. This is not yet binding licensing, but it is a practical control surface.
Misuse risk is more concrete than existential alignment risk on near-term horizons. The same coding and autonomy gains that help patch software also help find and exploit vulnerabilities. Bio and cyber safeguards are now central to frontier release decisions.
Misalignment, taken on its own, is not inevitable from first principles. It is a real risk with measurable probability. Models exhibit unpredictable behaviors — obsessions, sycophancy, laziness, deception, blackmail, scheming, reward hacking. The concern is structural: the combination of intelligence, agency, coherence, and poor controllability is a recipe for trouble.
The governance landscape has its own dynamics.
The United States has pivoted toward competitiveness and deregulation; the effective accelerationist position now captures much of the policy space. Biden’s AI safety order was revoked in January 2025. The “Winning the Race” action plan (July 2025) orients toward roughly 90 deregulatory actions. CAISI has nonetheless become the main U.S. frontier-evaluation interface, with voluntary pre-deployment testing agreements covering all major U.S. labs. The open question is whether this remains voluntary measurement science or hardens into mandatory pre-release review after a major incident.
That direction of travel has now produced enacted policy. The May 2026 draft reported by Axios became the June 2, 2026 executive order “Promoting Advanced Artificial Intelligence Innovation and Security,” which creates a voluntary framework for developers to give the government access to covered frontier models up to 30 days before release, and to jointly designate trusted partners for early access aimed at critical-infrastructure cybersecurity. Two features are worth marking. First, the window narrowed from the draft’s up-to-90-day figure to 30 days — a smaller ask, easier for labs to accept against a continuous release cadence. Second, the order explicitly bars any mandatory licensing or preclearance requirement, which makes the voluntariness a deliberate design choice rather than a temporary posture. Pre-release evaluation has moved from informal safety research into a national-security operating procedure, but one carefully built to stay on the measurement-and-access side of the line rather than the approval side. State AI legislation, meanwhile, continues to proliferate despite the administration’s preference for federal preemption, leaving labs in an overlapping federal-state environment.
Defense procurement is now a live deployment front. May 2026 classified-network agreements route advanced AI capabilities from OpenAI, Google, NVIDIA, Microsoft, AWS, SpaceX, and others into IL6/IL7 environments for lawful operational use. The question is shifting from “should militaries use AI?” to what auditability, human review, model boundaries, and failure reporting are required in classified contexts.
The EU AI Act proceeds alone. Transparency obligations become active on August 2, 2026, including disclosure when people interact with AI systems and marking obligations for AI-generated or manipulated content. On June 10, 2026, the Commission published the final voluntary Code of Practice on marking and labelling AI-generated content — drafted by independent experts through the AI Office — which translates those Article 50 obligations into concrete commitments on machine-readable marking and detection of synthetic audio, image, video, and text, mandatory deepfake labelling, and chatbot disclosure. It is the EU moving, characteristically, from principle to operational detail ahead of the deadline rather than after an incident. High-risk system obligations follow the phased implementation schedule.
Transparency legislation, in general, has been the pragmatic starting point: California SB 53 (whistleblower protections), New York’s RAISE Act, Illinois SB 315 (early 2026). Anthropic’s stated position was always to start with transparency and escalate as evidence accumulated — and in June 2026 it took that step. In “Policy on the AI Exponential,” Amodei argued that the Mythos/Glasswing cyber demonstrations made the risks concrete enough to justify binding rules, and called for an FAA-style regime: mandatory third-party testing of models above a compute threshold in four areas (cybersecurity, biological weapons, loss of control, and automated R&D), with government power to block or reverse a deployment that fails. Anthropic paired the essay with a draft legislative proposal and a job-displacement policy framework. This is the first time a frontier lab has publicly advocated pre-release authority to stop a model from shipping rather than merely measure it — a deliberate move past the voluntary 30-day-access design of the June 2 executive order. Whether it gains traction is a separate question: the prevailing federal posture remains deregulatory, and the order itself explicitly bars mandatory preclearance.
Export controls on chips to China remain the single most impactful lever. China is several years behind in frontier chip production; the critical period is the next few years. National interests create tension across the US, China, and EU triangle, and powerful open-source models trigger governance challenges that none of the three frameworks fully address. That lever took a new form on June 13, 2026, when the U.S. government directed Anthropic to suspend access to Fable 5 and Mythos 5 for any foreign national inside or outside the country — including foreign-national employees — citing national-security authorities and a demonstrated jailbreak method. It is the first time the export-control lever has been pointed at access to a deployed, generally-available model rather than at chips or a pre-release review, and it was applied within hours: unable to verify nationality per session, Anthropic disabled both models for all customers the same evening to ensure compliance — taking its just-launched flagship fully dark four days after release — while arguing the demonstrated capability was already widely available from other models and routinely used by security professionals. The line between controlling who can buy the hardware and controlling who can use the model has started to blur.
Inside the major labs, the institutional picture is less reassuring than the technical one. OpenAI’s safety infrastructure has frayed: the Superalignment team was dissolved in May 2024 and the Mission Alignment Team in February 2026, and at least eight safety-focused staff have departed since late 2023.
No unified global alignment solution has emerged. What exists instead is a stack: practical safety layers, gated access, transparency legislation, export controls, and economic incentives. It is less elegant than a treaty and, so far, more durable.
5. Hardware Trajectories
The realistic developments to expect are mostly incremental. Specialized chips (NPUs, training ASICs, agentic accelerators) continue to appear, with roughly 5–10× gains every 3–4 years rather than the 10,000× sometimes implied in older Singularity literature. Compute itself has become a form of geopolitical competition. Energy demand — data centers, cooling, renewables — has moved from a footnote into a binding constraint. Heterogeneous agent stacks are emerging, in which frontier models are orchestrated alongside smaller efficient open models for perception, routing, monitoring, tool calls, and low-cost subagent work. Infrastructure bottlenecks are moving down the stack into networking, memory movement, storage, observability, and drop-in inference hardware that can run inside existing enterprise data centers. AI factories, sovereign AI infrastructure, enterprise AI clouds, and edge or physical-AI devices have become explicit deployment categories, not just marketing terms.
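Converting the “5–10× every 3–4 years” range into annual terms makes it comparable with other growth rates in this article. The sketch below pairs the slowest case (5× over 4 years) with the fastest (10× over 3 years); the function names are hypothetical.

```python
import math


def annualized_factor(total_gain: float, years: float) -> float:
    """Constant yearly factor that compounds to `total_gain` over `years`."""
    return total_gain ** (1.0 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant annual growth factor."""
    return math.log(2.0) / math.log(annual_factor)


slow = annualized_factor(5.0, 4.0)    # 5x over 4 years
fast = annualized_factor(10.0, 3.0)   # 10x over 3 years

for label, factor in (("slow", slow), ("fast", fast)):
    print(f"{label}: {factor:.2f}x per year, "
          f"doubling every {doubling_time_years(factor):.2f} years")
```

That is about 1.5× to 2.2× per year, or a doubling time of roughly 11 to 21 months: still compounding growth, and at the low end consistent with the ~1.5× hardware factor quoted in Section 3.1.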
The longer-term picture is more speculative.
Brain-computer interfaces continue to advance through Neuralink and academic research. Kurzweil predicts that by the 2030s, connecting the neocortex to the cloud will “directly extend our thinking” [Kurzweil, 2024]. Current BCI capabilities remain in research and experimental stages, although neural signal processing via neural networks is a natural fit for the technology.
Energy is the other wildcard. Thorium-based nuclear power — for example, China’s first thorium reactor — could in principle alleviate the energy bottleneck for large-scale AI training and inference. The technology remains promising but unproven at scale. The nearer-term version of the same logic is already visible: existing nuclear capacity is becoming a siting advantage rather than a future hope, as SoftBank’s May 2026 decision to anchor 5 GW of European capacity in nuclear-heavy France illustrates. Abundant clean power need not be invented to matter; where it already exists, compute is starting to migrate toward it.
The net picture is that hardware continues to advance, but far more slowly than the older Singularity literature assumed, and with physical and architectural ceilings in view (see Thermodynamic Limits and Amdahl’s Law).
6. Socioeconomic Impacts
Drawing on Hanson, Ford, and McAfee/Brynjolfsson, and updated with 2025–2026 data, the socioeconomic picture has four moving parts: investment scale, productivity, labor, and concentration.
Investment scale. Global private AI VC funding hit $225.8B in 2025, roughly 61% of all global VC. Hyperscaler capex for 2026 is projected in the hundreds of billions. NVIDIA’s single-quarter revenue reached $81.6B in Q1 FY2027, including $75.2B from data center. The private-valuation race has now overtaken OpenAI’s $852B March mark, and the figures that were “reported to be closing” a week ago have finalized: Anthropic’s Series H raised $65B at a $965B post-money valuation — the largest single private AI round on record — with run-rate revenue reported to have crossed roughly $47B, and the company filed a confidential draft S-1 with the SEC on June 1, 2026. The financing has also turned visibly circular, and the Series H made the circularity larger rather than smaller. Alongside the round, Anthropic disclosed compute agreements for up to 5 GW with Amazon, 5 GW of next-generation TPU capacity with Google and Broadcom, and GPU access in SpaceX’s Colossus 1 and 2; Apollo Global and Blackstone arranged a $36B private-credit deal — backed by Broadcom — to buy those Google TPUs, described as the largest chip-financing debt transaction on record. The earlier-disclosed Colossus 1 lease (Memphis; ~220,000+ GPUs, ~300 MW; ~$1.25B per month, with SpaceX booking that spend as revenue) is now one line in a much larger compute portfolio assembled chiefly with debt — and as of June 12, 2026, SpaceX is a public company, having priced the largest IPO on record (June 11, $135/share, ~$1.77T valuation, ~$75B raised) and closed its first day near $161. That moves the Memphis AI-compute revenue line inside a disclosing public entity, which over time should make at least one corner of the circular financing web more legible. Cognition, maker of Devin, closed its own round in the same window at a $26B valuation. 
The pattern from prior updates holds: chip vendors, cloud providers, and model labs are increasingly each other’s customers, lenders, and revenue lines, and the instruments tying them together are moving from equity toward leverage. But a large infrastructure-to-revenue gap persists: consumer and enterprise AI revenue remains far smaller than the infrastructure buildout implied by frontier training, inference, and agent deployment.
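Several of the figures quoted above imply further numbers that the text does not spell out. The sketch below is back-of-envelope arithmetic only. Every input is one of the article's approximate figures, the constant and function names are illustrative, and the outputs are rough implications rather than reported data. Because the GPU count is quoted as "220,000+", the per-GPU figure is best read as an upper bound.

```python
# Back-of-envelope arithmetic on figures quoted in the text.
# Inputs are the article's approximate numbers; outputs are rough
# implications, not independently reported data.

AI_VC_2025_USD_B = 225.8              # global private AI VC funding, 2025
AI_SHARE_OF_GLOBAL_VC = 0.61          # "roughly 61% of all global VC"
NVIDIA_QUARTER_REVENUE_USD_B = 81.6   # Q1 FY2027 total revenue
NVIDIA_DATA_CENTER_USD_B = 75.2       # Q1 FY2027 data center revenue
SERIES_H_RAISED_USD_B = 65.0
SERIES_H_POST_MONEY_USD_B = 965.0
COLOSSUS1_LEASE_USD_B_PER_MONTH = 1.25
COLOSSUS1_GPUS = 220_000              # quoted as "~220,000+", so a floor
COLOSSUS1_MW = 300


def implied_metrics() -> dict:
    """Derive the numbers implied by the quoted figures."""
    lease_per_year_usd_b = COLOSSUS1_LEASE_USD_B_PER_MONTH * 12
    return {
        # If $225.8B is 61% of global VC, total global VC follows directly.
        "global_vc_2025_usd_b": AI_VC_2025_USD_B / AI_SHARE_OF_GLOBAL_VC,
        # Fraction of NVIDIA's quarter that came from data center.
        "nvidia_data_center_share": (
            NVIDIA_DATA_CENTER_USD_B / NVIDIA_QUARTER_REVENUE_USD_B
        ),
        # New money as a fraction of the post-money valuation.
        "series_h_share_of_post_money": (
            SERIES_H_RAISED_USD_B / SERIES_H_POST_MONEY_USD_B
        ),
        "colossus1_lease_usd_b_per_year": lease_per_year_usd_b,
        # Lease cost spread across the quoted GPU count, in plain USD.
        "colossus1_usd_per_gpu_month": (
            COLOSSUS1_LEASE_USD_B_PER_MONTH * 1e9 / COLOSSUS1_GPUS
        ),
        # Lease cost per megawatt of capacity, in USD millions per year.
        "colossus1_usd_m_per_mw_year": lease_per_year_usd_b * 1e3 / COLOSSUS1_MW,
    }


if __name__ == "__main__":
    for name, value in implied_metrics().items():
        print(f"{name}: {value:,.3f}")
```

On these inputs, total global VC for 2025 comes out at roughly $370B, the Colossus 1 lease annualizes to $15B, and the lease works out to a little under $5,700 per GPU per month.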
The productivity paradox persists, but the measurement picture is mixed. Self-reported AI productivity gains of 30–75% often fail to show up in organizational metrics. A METR randomized controlled trial found that experienced developers using early-2025 AI tools took 19% longer in familiar codebases. MIT’s 2025 GenAI Divide report found that most pilots had no measurable P&L impact. A March 2026 NBER executive survey, by contrast, found positive expected productivity effects concentrated in high-skill services and finance. Vendor metrics — enterprise usage depth, customer case studies, Microsoft/EY-style deployment claims — are useful adoption signals, but not independent productivity proof. Redwood Research’s “Is 90% of code at Anthropic being written by AIs?” rebuttal is the cleanest current calibration on the headline frontier-lab self-reports: the most defensible sub-metric (“lines of code merged”) likely puts AI at a majority, while self-reported Anthropic productivity gains remain 20–40%. The right synthesis is not “AI does nothing”; it is closer to “AI gains are real but highly conditional on workflow fit, integration, governed data access, implementation teams, and measurement.”
Labor displacement is anticipatory, not demonstrated — but the forward-looking picture is sobering. Companies cited AI in 55,000 job cuts in 2025 (a 12× increase over two years), driven mostly by anticipation rather than measured gains. Amodei predicts 50% of entry-level white-collar jobs will be displaced within one to five years. His argument is that AI differs from prior automation in four ways: speed (capabilities advancing faster than labor markets can adapt), cognitive breadth (AI substitutes for general human cognition, not specific tasks), gap-filling (weaknesses get patched with each model release), and slicing by cognitive ability (AI advances from the bottom up the ability ladder, creating an unemployed underclass rather than displacing specific professions). Whether one accepts the full argument or not, the structural concern is worth taking seriously.
Economic concentration of power may be the deeper structural risk. AI infrastructure spending already represents a substantial fraction of U.S. economic growth. Amodei warns of Gilded Age–level wealth concentration: personal fortunes in the trillions, AI companies generating enormous annual revenue, and a coupling of economic and political power that could strain the implicit social contract of democracy. Historically, such couplings tend to provoke their own backlash. Whether this one follows the pattern remains to be seen.
AI transforms the economy but does not, yet, replace all labor (see Comparative Advantage and Technology Adoption S-Curves). The Jevons Paradox is in full effect: cheaper AI reasoning drives explosive demand, but productive deployment remains elusive for most organizations. The open question is whether “not yet” becomes “not ever” (traditional comparative advantage holds) or “not yet, but soon” (AI as a general labor substitute breaks traditional economics).
7. Plausible Future Scenarios (2030–2040)
7.1 Moderately Accelerating Path (Most Probable)
Continued model improvements, increasingly capable agents, AI-integrated teams and workflows, partial AGI-like systems in narrow domains, and no runaway recursive self-improvement.
7.2 High-Acceleration Path (Optimistic)
Breakthroughs in architecture or hardware, rapid progress in tool-form agents, semi-autonomous research assistants, and a significant shift in scientific productivity.
7.3 Low-Acceleration / Regulated Path
Strict compute caps, global licensing, slower innovation, and strong safety constraints.
7.4 Adoption Phase Lens
Orthogonal to capability scenarios, the internet analogy suggests three adoption waves:
- Infrastructure & Platforms (2023–2027). LLM platforms, training frameworks, developer tools. The current phase.
- Hype Bubble (2025–2028?). “Add AI to everything,” superficial implementations, over-funded startups, and the inevitable shakeout when ROI fails to materialize. The gap between $400B+ in annual infrastructure spend and approximately $100B in enterprise AI revenue, together with an 80%+ enterprise pilot failure rate, suggests this phase is now beginning.
- Practical Integration (2028–2032?). Agentic workflows become standard, AI-native development matures, real enterprise integration takes hold, and AI becomes invisible infrastructure, at which point the “AI” prefix tends to disappear from tool names.
This framing implies that even on the moderate path, a correction or consolidation phase is likely before sustainable deployment at scale.
Several observable signals would mark the transition into the integration wave, and are worth tracking as a check on the framing: enterprise deployments converting from pilots to production at scale, dedicated job categories emerging (e.g. “agentic workflow architect”), standardization of agent-to-agent protocols, and, as a lagging linguistic marker, the “AI” prefix dropping from product and process names, much as “e-” and “computer-assisted” faded once the underlying technology became assumed. None of these is decisive alone; together they would distinguish genuine integration from continued hype.
7.5 Wildcards
Some developments would substantially reshape the scenario landscape:
- Energy breakthroughs — thorium reactors, fusion, or other abundant energy sources removing the compute cost constraint.
- New algorithmic paradigms — post-transformer architectures or fundamentally new training approaches.
- Brain-computer interfaces — direct neural-cloud integration (Kurzweil’s Fifth Epoch) could merge human and machine cognition, changing the nature of “AI capability” entirely.
- Digital deflation — once industries are fully digitalized, AI-driven automation could cause sustained deflation in goods and services, reshaping economic assumptions.
- Catastrophic misuse incidents — could trigger severe regulation or public backlash.
- Governance shocks — arms races, coordination failures, or unexpected international agreements.
8. What Could Invalidate This Model
It helps to be explicit about which observations would force a revision.
An unexpected architectural jump — a genuine post-transformer paradigm — would invalidate the moderate-acceleration baseline. So would cheap, effectively unlimited compute, whether through fusion-powered training or scaled thorium reactors.
The emergence of robust self-improving agents is the more delicate case. Early signs are visible. GPT-5.3-Codex and GPT-5.5 reportedly helped debug, deploy, and optimize parts of their own development and serving stack. Anthropic states that “the majority of code at Anthropic is now written by Claude Code,” though Redwood Research’s rebuttal argues that the most defensible sub-metric (“lines of code merged”) puts AI’s share at a majority while self-reported productivity gains are 20–40%: revealed-preference evidence rather than an audited multiplier. The Anthropic Institute’s June 4, 2026 report “When AI builds itself” (Favaro and Clark) sharpens that figure to “more than 80%” of code merged into Anthropic’s production systems and argues, on the strength of it, that AI may be approaching a point where systems improve themselves with little meaningful human involvement. The same caveat applies: 80% of merged lines against 20–40% self-reported productivity gains is the productivity paradox restated, not an audited capability multiplier.
What is genuinely new is the recommendation. The report calls for the world to preserve the option to coordinate a slowdown or temporary pause of frontier development. This is the first time a leading lab has argued for a pause mechanism (distinct from Amodei’s FAA-style mandatory-testing proposal) rather than only for measurement, though it stops short of any unilateral commitment. The limits of that posture showed quickly: five days later, on June 9, the same company shipped Fable 5, the most capable model it has made generally available. The charitable reading is that Fable 5’s gating-and-fallback architecture is precisely the slowdown applied at the deployment layer rather than the research one. The skeptical reading is the revealed preference this model keeps flagging: ship the frontier, govern it at the wrapper, and keep the pause hypothetical.
DeepMind’s May 7, 2026 AlphaEvolve impact post reports deployed wins across PacBio variant detection (~30% error reduction), AC Optimal Power Flow GNN feasibility (14% to >88%), and quantum-circuit error rates on the Willow processor (~10x lower). Sakana’s Darwin Gödel Machine reports gains from 20.0% to 50.0% on SWE-bench and from 14.2% to 30.7% on Polyglot through open-ended agent self-modification. Jeff Clune’s new venture, Recursive, raised $650M at a $4.65B valuation and is aimed explicitly at this pipeline. Agent SDKs, CI/CD integrations, and managed-agent services could make this loop appear first as agent-managed software infrastructure rather than as a single model autonomously rewriting itself. This is not yet recursive self-improvement in the Singularity sense, but the boundary is blurring.
Kurzweil’s programming feedback loop is the specific version of this concern worth tracking. Once AI achieves sufficient programming ability to improve its own code, it creates a positive feedback loop that he identifies as “the main bottleneck for superintelligent AI” [Kurzweil, 2024]. If this loop activates faster than expected, the moderate-acceleration baseline breaks down. The year-by-year agent evolution in Section 3 — autonomous refactoring, then spec-to-deployment, then multi-agent collaboration, then programmable agent infrastructure — traces exactly this path.
There is also a structural reason the loop might run slower than the hardware would allow. DeepMind’s “From AGI to ASI” argues that recursive self-improvement is throttled by an Embodied Bottleneck: a digital researcher can hypothesize at superhuman speed, but confirming a new chip design, drug, or physical theory still requires experiments that run at real-world latency, and genuinely novel concepts must be validated against reality rather than recombined from human data. On that view the loop’s ceiling is set by the rate of empirical science, not by compute, which would convert an “explosion” into a fast but linear acceleration. This is the most concrete current argument for why the early self-improvement signals above (AlphaEvolve, Darwin Gödel Machine) have so far stayed narrow and benchmark-bound rather than compounding into open-ended takeoff.
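The difference between an “explosion” and a fast but linear acceleration can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not a reproduction of DeepMind’s analysis: capability is a single unitless number, each cycle proposes improvements in proportion to current capability, and an optional `validation_cap` (a hypothetical parameter standing in for real-world experiment throughput) limits how much improvement can be confirmed per cycle. The values of `gain` and `validation_cap` are arbitrary.

```python
# Toy model: compounding self-improvement with and without a cap on how
# much improvement can be empirically validated per cycle.
# All parameters are illustrative assumptions, not measured quantities.

def simulate(steps, gain=0.5, validation_cap=None):
    """Return the capability trajectory over `steps` improvement cycles.

    gain:            proposed improvement per cycle, as a fraction of
                     current capability (hypothesis generation scales
                     with capability).
    validation_cap:  maximum improvement that can be confirmed per cycle;
                     None means validation is never the bottleneck.
    """
    capability = 1.0
    history = [capability]
    for _ in range(steps):
        proposed = gain * capability
        if validation_cap is not None:
            # Only validated improvements are actually realized.
            proposed = min(proposed, validation_cap)
        capability += proposed
        history.append(capability)
    return history


if __name__ == "__main__":
    unthrottled = simulate(20)
    throttled = simulate(20, validation_cap=2.0)
    for step in (0, 5, 10, 15, 20):
        print(f"cycle {step:2d}: unthrottled={unthrottled[step]:10.1f}  "
              f"throttled={throttled[step]:6.1f}")
```

In this sketch the two trajectories coincide for the first few cycles. Once proposed improvements exceed the cap, the throttled run gains a constant amount per cycle while the unthrottled run keeps compounding, which is the qualitative shape the Embodied Bottleneck argument predicts.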
A global coordination breakthrough on alignment would also force a revision, as would a major governance collapse or arms race pushing the opposite way.
9. Summary
Classical Singularity ideas offer useful conceptual frameworks: intelligence explosion, superintelligence, existential risk. The evidence through mid-June 2026 nonetheless reveals a more nuanced trajectory: extraordinary capabilities with unreliable deployment, massive investment with uncertain returns, and convergent expert timelines (1–5 years to AGI) sitting alongside persistent productivity paradoxes.
The key tensions, as of mid-June 2026, are these:
Capability versus reliability. Reasoning models solve difficult coding, research, cyber, and professional-work tasks, yet agents still degrade over long horizons and many enterprise pilots fail to deliver measurable returns. Karpathy’s framing remains useful: “the year of the agent” is really “the decade of the agent.”
Capability versus deployment architecture. Frontier progress is now visible through multicloud access, managed-agent services, classified-network procurement, and developer SDKs. The question is not only what models can do, but where they can run, who can audit them, and what permissions they hold.
Scale versus efficiency. The DeepSeek shock shifted the debate from pure scaling to algorithmic efficiency. GPT-5.5, Opus 4.7, and Gemini 3.1 Pro all emphasize better work per token or per tool call. Sutskever and Amodei represent the poles: “simple scaling is ending” against “scaling has not hit a wall at all.”
Investment versus revenue. Annual infrastructure spend of $400B+ sits against approximately $100B in enterprise AI revenue. Circular financing structures raise bubble concerns, but Jevons Paradox dynamics keep demand growing.
Convergent timelines, divergent definitions. Every major frontier-lab CEO places transformative AI 1–5 years away, but they mean different things by it. Hassabis requires genuine invention; Altman calls AGI a “sloppy term”; Musk claims 10% probability this year. See TIMELINE.md for the full comparison.
Defense versus offense. Project Glasswing suggests frontier models can meaningfully improve defensive security, but the same capability shortens the path to exploit development. This is the cleanest current example of AI as both safety tool and threat multiplier.
Safety erosion at speed. OpenAI’s safety teams have dissolved twice in 18 months. U.S. policy has pivoted to competitiveness. The EU proceeds alone with binding regulation. Mechanistic interpretability advances, but institutional safety infrastructure remains uneven.
The meta-question — exponential curve or logistic one — remains genuinely underdetermined. The baseline serves as a living model that weekly updates will refine.
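One reason the exponential-versus-logistic question is hard to settle from current data is that the two curves are nearly identical early on. The sketch below illustrates this with standard textbook forms of each curve sharing the same starting value and growth rate. The specific values (`x0`, `r`, `ceiling`) are arbitrary assumptions chosen for illustration and are not estimates of any real AI metric.

```python
import math

# Compare an exponential curve with a logistic curve that shares its
# starting value and initial growth rate. Parameter values are arbitrary
# illustrations, not fitted to any real data.

def exponential(t, x0=1.0, r=0.5):
    """Unbounded exponential growth from x0 at rate r."""
    return x0 * math.exp(r * t)


def logistic(t, x0=1.0, r=0.5, ceiling=1000.0):
    """Logistic growth from x0 at initial rate ~r, saturating at ceiling."""
    return ceiling / (1.0 + (ceiling / x0 - 1.0) * math.exp(-r * t))


def relative_gap(t, x0=1.0, r=0.5, ceiling=1000.0):
    """Fraction by which the logistic curve falls short of the exponential."""
    exp_value = exponential(t, x0, r)
    return (exp_value - logistic(t, x0, r, ceiling)) / exp_value


if __name__ == "__main__":
    for t in (0, 2, 4, 8, 12, 16, 20):
        print(f"t={t:2d}  exponential={exponential(t):10.1f}  "
              f"logistic={logistic(t):7.1f}  gap={relative_gap(t):.3f}")
```

With these parameters the curves differ by less than one percent over the first few time units and diverge almost completely by the end of the range, so observations taken only from the early segment cannot distinguish them.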