Week of 2026-04-26
Summary
This week updates the baseline around a sharper April 2026 pattern. Frontier models are becoming more capable at long-running computer work, but release decisions are increasingly shaped by cyber and bio risk, trusted access, and infrastructure constraints.
The main change is not a single benchmark jump. It is the convergence of GPT-5.5, Claude Opus 4.7, Gemini Deep Research Max, and Anthropic’s Project Glasswing around the same frontier: autonomous, tool-using work over real software, research, documents, and security tasks.
Key Developments
GPT-5.5 Extends OpenAI’s Agentic Work Thesis
OpenAI released GPT-5.5 on April 23, describing it as a model for “real work” across coding, online research, data analysis, documents, spreadsheets, and software operation. The relevant signal is less the name than the product framing. OpenAI is pushing from chat and coding assistance toward computer-operating agents that plan, use tools, check work, and continue through ambiguity.
Reported evaluations include 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro. OpenAI also emphasizes token efficiency relative to GPT-5.4, reinforcing the pattern that frontier progress now comes from systems optimization as much as from model scale.
Source: openai-gpt-5-5-2026
Claude Opus 4.7 Broadens Long-Running Enterprise Work
Anthropic released Claude Opus 4.7 on April 16. It is positioned as the broadly available model for advanced software engineering, high-resolution vision, memory, and multi-step enterprise workflows. Anthropic also frames it as part of a cyber-safety deployment strategy: Opus 4.7 is less capable than Mythos Preview in cybersecurity, but useful for testing safeguards on a model suitable for broader release.
The release supports the baseline’s “capability versus deployability” distinction. The frontier is not simply “best model available to everyone.” It is becoming a tiered access system where general capability, domain-specific risk, and monitoring all matter.
Source: anthropic-claude-opus-4-7-2026
Project Glasswing Makes Cyber Risk Concrete
Anthropic’s Project Glasswing is the most important safety signal in this update. Claude Mythos Preview is gated to launch partners for defensive security work, with Anthropic reporting thousands of high-severity vulnerabilities found across critical software. The accompanying red-team writeup argues that frontier coding models can now exceed all but the most skilled humans at vulnerability discovery and exploitation.
This shifts the safety discussion from abstract future misalignment toward present-day dual-use capability. The same models that can patch and harden software can also shorten exploit development loops. The baseline now treats gated access and domain-specific safeguards as central governance mechanisms, not side details.
Sources: anthropic-project-glasswing-2026, anthropic-mythos-red-team-2026
Gemini Deep Research Max Pushes Research Agents
Google introduced Deep Research and Deep Research Max, built with Gemini 3.1 Pro, as autonomous research agents with MCP support and native visualizations. The release matters because research agents are moving from “summarize webpages” toward enterprise workflows over web and private sources.
This strengthens the 2026–2028 agentic workflow forecast, especially for domains where value comes from synthesizing evidence rather than executing irreversible actions.
Sources: google-gemini-3-1-pro-2026, google-deep-research-max-2026
Energy Remains a Hard Deployment Constraint
The IEA’s Energy and AI report remains the cleanest baseline source for infrastructure pressure. Global data center electricity consumption was about 415 TWh in 2024 and reaches about 945 TWh by 2030 in the base case. AI accelerated servers are a major growth driver.
This does not imply an immediate energy wall. It does imply that geography, grid interconnection, power contracts, cooling, and hardware supply increasingly determine where frontier AI can be deployed.
Source: iea-datacenter-energy-forecast-2025
Baseline Impact
Updated:
- The “State of AI” snapshot now reflects late April 2026 rather than early March.
- The core tension is now phrased as capability, reliability, and misuse risk.
- Cybersecurity is elevated as the clearest near-term dual-use domain.
- The productivity discussion now distinguishes failed pilots from heterogeneous measured gains.
- The timeline document adds April 2026 release evidence without moving forecast dates mechanically.
No change:
- The baseline remains moderate acceleration.
- Broad AGI remains more uncertain than narrow agentic capability.
- The Metaculus 2033 median remains a more defensible center than either 2026–2027 industry optimism or 2047 survey conservatism.
Watch Next
- Whether Mythos-class models remain gated or competitors release similar capabilities broadly.
- Whether GPT-5.5 / Opus 4.7-style agents show measurable productivity gains in real organizations rather than on benchmarks.
- Whether enterprise research agents become trusted for decision support over private data.
- Whether energy constraints shift from forecasts to explicit model availability or pricing constraints.