kurzweil-singularity-near-2005
The Singularity is Near
Ray Kurzweil, 2005
Law of accelerating returns; original 2029 Turing test and 2045 Singularity predictions.
Central registry of books, papers, articles, forecasts, and public statements cited across the baseline, timeline, predictions, and weekly updates.
tegmark-life-3-0-2017
Max Tegmark, 2017
Scenarios for coexistence with superintelligent AI.
bostrom-superintelligence-2014
Nick Bostrom, 2014
Foundational existential-risk framing for advanced AI.
russell-human-compatible-2019
Stuart Russell, 2019
Inverse reinforcement learning approach to alignment.
hanson-age-of-em-2016
Robin Hanson, 2016
Economic analysis of brain emulation scenarios.
ord-precipice-2020
Toby Ord, 2020
Existential risk landscape including AI.
barrat-our-final-invention-2013
James Barrat, 2013
Risks of artificial superintelligence.
ford-architects-of-intelligence-2018
Martin Ford, 2018
Interviews with leading AI researchers on future trajectories.
christian-alignment-problem-2020
Brian Christian, 2020
Accessible overview of alignment challenges in current ML.
agrawal-prediction-machines-2018
Ajay Agrawal, Joshua Gans, Avi Goldfarb, 2018
Economic framework for AI as cheap prediction.
brynjolfsson-competing-age-ai-2020
Marco Iansiti, Karim R. Lakhani, 2020
AI-driven transformation of business and labor markets.
kelly-what-technology-wants-2010
Kevin Kelly, 2010
Technology as an evolving system with its own tendencies.
kurzweil-singularity-nearer-2024
Ray Kurzweil, 2024
Updated predictions. Identifies programming as the main bottleneck for superintelligent AI; predicts a positive feedback loop once AI achieves sufficient programming ability.
espai-survey-2023
AI Impacts / ESPAI, 2023
1,714 AI researchers. 50% HLMI by 2047, 50% FAOL by 2116.
~1,700 forecasters. 50% AGI by Nov 2033; weakly general AI by Oct 2027. Feb 2026 data. Median timelines lengthened slightly over the past year, despite a long-term collapse from ~50 years out in 2020.
AGI timelines dashboard aggregating Metaculus, Manifold, and Kalshi forecasts. On May 23, 2026, the combined forecast estimated AGI in 2031 with an 80% interval of 2027-2043.
forecaster-surveys-2024-2025
Various forecasters, 2025
More aggressive than ESPAI: 50% HLMI by 2030, 90% by 2040.
80000hours-critical-period-2025
80,000 Hours, 2025
Identifies 2028–2032 as likely bottleneck period for AGI arrival.
amodei-agi-prediction-2025
Dario Amodei, 2025
Anthropic CEO. 'Country of geniuses in a datacenter.' Anthropic official position (March 2025): powerful AI in late 2026 or early 2027.
altman-agi-asi-prediction-2024
Sam Altman, 2024
OpenAI CEO. AGI 2025–2029 ('sloppy term'). ASI by ~2028: 'more intellectual capacity in data centers than outside.'
altman-agi-confidence-2025
Sam Altman, 2025
January 2025: 'We are now confident we know how to build AGI as we have traditionally understood it.' Claims GPT-5 is 'already smarter than me in many ways.' Predicts superintelligence by 2030. Corporate actions: $500B Stargate, 800M+ weekly ChatGPT users, Jony Ive IO acquisition ($6.5B).
hassabis-agi-prediction-2025
Demis Hassabis, 2025
DeepMind CEO. '5–10 years' from March 2025 (= 2030–2035). Coding and math fastest; scientific discovery harder.
hassabis-agi-prediction-2026
Demis Hassabis, 2026
Narrowed from '5–10 years' to 'maybe within the next five years.' Requires 'one or two more major breakthroughs on the level of the Transformer or AlphaGo.' AGI must include genuine invention and creativity: 'Could a system invent Go, or come up with relativity?'
huang-agi-prediction-2024
Jensen Huang, 2024
Nvidia CEO. AGI within 5 years (2029) in March 2024. Shifted to 'already here' in Nov 2025.
lecun-agi-prediction-2025
Yann LeCun, 2025
Meta Chief AI Scientist. 'At least a decade, probably much more.' LLMs will not lead to AGI; new architectures needed.
legg-agi-prediction-2025
Shane Legg, 2025
DeepMind co-founder. 50% chance of 'minimal AGI' by 2028.
critch-agi-prediction-2025
Andrew Critch, 2025
AI researcher. 45% chance of AGI by end of 2026.
barnett-transformative-ai-2025
Matthew Barnett, 2025
Median for transformative AI ~2033 based on training loss extrapolation.
musk-agi-prediction-2026
Elon Musk, 2026
Claims AGI by year-end 2026. Grok 5 (6T parameters, Q1 2026) has '~10% chance of achieving AGI.' xAI acquired by SpaceX at $250B valuation Feb 2026.
sutskever-ssi-scaling-2026
Ilya Sutskever, 2026
Running Safe Superintelligence Inc. ($32B valuation, ~20 employees, zero revenue). 'Age of simple scaling is ending'; next breakthrough requires fundamentally new learning methods.
karpathy-rlvr-agents-2025
Andrej Karpathy, 2025
Frames RLVR (reinforcement learning from verifiable rewards) as delivering high capability per dollar and absorbing compute previously spent on pretraining. 'Year of the agent' is really 'decade of the agent.'
amodei-scaling-2026
Dario Amodei, 2026
March 2026 Morgan Stanley conference. Scaling laws have 'not hit a wall at all.' Predicts 'radical acceleration in 2026.'
odlyzko-ai-bubble-warning-2026
Andrew Odlyzko, 2026
University of Minnesota researcher. Warns circular AI financing structures (OpenAI/NVIDIA/AMD/Microsoft cross-investments) are 'typical of bubbles.'
gartner-agentic-ai-forecast-2025
Gartner, 2025
Projects 40% of enterprise apps will embed agents by end of 2026, up from <5% in 2025.
February 2026 release. OpenAI describes GPT-5.3-Codex as its most capable agentic coding model to date, 25% faster than GPT-5.2-Codex, with gains on SWE-Bench Pro, Terminal-Bench 2.0, OSWorld, GDPval, and cybersecurity. OpenAI states early versions helped debug training, deployment, and evals.
April 23, 2026 release. OpenAI frames GPT-5.5 as a model for real work across code, research, data analysis, documents, spreadsheets, and software operation. Reports 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, with better token efficiency than GPT-5.4.
OpenAI system card for GPT-5.5. Describes predeployment safety evaluations, cybersecurity and biology safeguards, and release posture for GPT-5.5 and GPT-5.5 Pro.
openai-swe-bench-verified-contamination-2026
OpenAI, 2026
February 23, 2026 OpenAI analysis arguing SWE-bench Verified is no longer suitable for frontier coding launches. OpenAI audited a subset of hard failures and found at least 59.4% had flawed tests, plus evidence frontier models could reproduce gold patches or problem specifics; recommends SWE-bench Pro instead.
February 5, 2026 release. Opus 4.6 introduced 1M token context in beta for Opus-class models, 128k output tokens, agent teams in Claude Code, context compaction, adaptive thinking, and stronger long-running coding and knowledge-work performance.
February 17, 2026 release. Anthropic describes Sonnet 4.6 as an upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design, with a 1M token context window in beta.
April 16, 2026 release. Anthropic describes Opus 4.7 as stronger than Opus 4.6 on advanced software engineering, high-resolution vision, memory, and multi-step enterprise workflows. It is also used to test cyber safeguards before broader Mythos-class releases.
April 7, 2026 initiative giving launch partners gated access to Claude Mythos Preview for defensive security. Anthropic reports Mythos Preview found thousands of high-severity vulnerabilities and argues frontier coding models can surpass all but the most skilled humans at vulnerability discovery and exploitation.
anthropic-claude-managed-agents-2026
Anthropic, 2026
April 2026 Anthropic announcement of Claude Managed Agents, a hosted agent harness and production runtime with standard token rates plus $0.08 per active session-hour. Evidence that frontier vendors are productizing agent orchestration rather than only model endpoints.
Claude Platform docs for Managed Agents. Defines agents, environments, sessions, and events; Managed Agents API requests require the managed-agents-2026-04-01 beta header and support Anthropic-managed cloud containers or self-hosted sandboxes.
Claude Platform docs for running Managed Agent tool execution in customer-controlled infrastructure. Anthropic keeps orchestration while code, filesystem, and network egress remain in the customer's environment.
Claude Platform docs for MCP tunnels, a research-preview feature connecting Claude to private-network MCP servers through outbound-only connections without opening inbound firewall ports or exposing services publicly.
anthropic-mythos-red-team-2026
Anthropic Frontier Red Team, 2026
Technical writeup on Claude Mythos Preview. Describes autonomous vulnerability discovery and exploit development, including exploit chains and comparisons to Opus 4.6. Useful as evidence for offense-defense asymmetry and gated-release logic.
February 19, 2026 release. Google describes Gemini 3.1 Pro as an upgraded core intelligence model for complex tasks, rolling out across Gemini API, Vertex AI, Google AI Studio, Gemini CLI, Antigravity, Gemini app, and NotebookLM. Reports 77.1% verified score on ARC-AGI-2.
google-deep-research-max-2026
Google DeepMind, 2026
April 21, 2026 release. Google frames Deep Research and Deep Research Max, built with Gemini 3.1 Pro, as autonomous research agents with MCP support, native visualizations, and stronger long-horizon analytical workflows.
xAI developer documentation for the Grok 4.20 reasoning model. Used as source registry entry for February 2026 frontier release cadence.
April 26, 2026 statement by Sam Altman. Reframes OpenAI's public principles around broad access to general AI, democratic governance, decentralized power, infrastructure expansion, and safety, with less emphasis on the older AGI-charter language.
microsoft-openai-partnership-amendment-2026
Microsoft / OpenAI, 2026
April 27, 2026 amended agreement. Microsoft remains OpenAI's primary cloud partner, but OpenAI can serve products across any cloud provider; Microsoft's OpenAI IP license becomes non-exclusive through 2032; revenue-share terms are simplified.
April 28, 2026 limited preview bringing OpenAI models, Codex, and Amazon Bedrock Managed Agents powered by OpenAI into AWS environments. Important signal that frontier models and coding agents are becoming multicloud enterprise infrastructure.
aws-bedrock-openai-managed-agents-2026
Amazon Web Services, 2026
April 28, 2026 AWS limited-preview announcement. Bedrock OpenAI offerings inherit IAM, PrivateLink, guardrails, encryption, and CloudTrail logging; Managed Agents powered by OpenAI have per-agent identity, action logs, and run in customer AWS environments with inference on Bedrock.
May 1, 2026 announcement of agreements with SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, and Amazon Web Services to deploy advanced AI capabilities on IL6 and IL7 classified networks for lawful operational use.
April 28, 2026 public beta of a TypeScript SDK exposing Cursor's agent runtime for local, cloud, CI/CD, and embedded product workflows. Evidence that coding agents are becoming programmable infrastructure rather than only interactive IDE tools.
May 19, 2026 Cursor changelog announcing Jira integration: assigning work items or mentioning @Cursor starts a cloud agent that scopes the task from the Jira item and repository settings, then posts completion updates and a pull-request link.
April 28, 2026 announcement that Warp's client is open source and organized around agent-first workflows using Oz, with OpenAI as founding sponsor. Useful as an example of agent-managed software development moving into public repos.
nvidia-nemotron-3-nano-omni-2026
NVIDIA, 2026
April 28, 2026 open omni-modal model for video, audio, image, and text reasoning in agentic workloads. NVIDIA reports higher throughput and lower compute for video reasoning, reinforcing the efficiency-plus-agent-infrastructure trend.
Granite 4.1 family of Apache 2.0 dense language models in 3B, 8B, and 30B sizes, with instruction-tuned variants, FP8 quantization, and improvements in tool calling, instruction following, coding, and mathematical reasoning.
caisi-frontier-testing-agreements-2026
NIST Center for AI Standards and Innovation, 2026
May 5, 2026 announcement expanding CAISI collaborations with Google DeepMind, Microsoft, and xAI for pre-deployment evaluations, post-deployment assessment, classified-environment testing, and national-security research. Builds on renegotiated OpenAI and Anthropic partnerships.
caisi-deepseek-v4-pro-evaluation-2026
NIST Center for AI Standards and Innovation, 2026
May 1, 2026 evaluation finding DeepSeek V4 Pro is the most capable PRC model CAISI has assessed, but roughly 8 months behind leading U.S. models across cyber, software engineering, natural science, abstract reasoning, and mathematics. Also reports strong cost efficiency versus similarly capable U.S. reference models.
anthropic-enterprise-ai-services-company-2026
Anthropic, 2026
May 4, 2026 announcement of an AI services company formed by Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs to help mid-sized companies deploy Claude into core operations with hands-on engineering support.
May 5, 2026 release of ten ready-to-run financial-service agent templates for tasks such as pitchbooks, KYC review, audits, valuations, and month-end close, distributed through Claude Cowork, Claude Code, and Claude Managed Agents.
May 6, 2026 launch of a business extension to OpenAI Signals using privacy-preserving Enterprise usage patterns to measure depth of AI adoption inside organizations, shifting attention from seats deployed to workflow intensity.
May 7, 2026 customer case study describing Simplex adopting ChatGPT Enterprise and Codex as its primary coding agent while quantitatively measuring generative-AI productivity across systems-development projects.
amd-openai-mrc-2026
AMD, 2026
May 6, 2026 AMD post describing OpenAI, AMD, Microsoft, and other industry contributors making Multipath Reliable Connection available through the Open Compute Project to improve production-scale AI networking.
amd-instinct-mi350p-pcie-2026
AMD, 2026
May 7, 2026 AMD announcement of MI350P PCIe cards aimed at fitting agentic inference into standard air-cooled enterprise servers rather than only purpose-built large GPU clusters.
yudkowsky-soares-if-anyone-builds-it-2025
Eliezer Yudkowsky, Nate Soares, 2025
NYT bestseller. Core thesis: superintelligent AI will pursue goals diverging from human values. P(doom) >75%.
Amodei's vision of AI upside. Defines 'powerful AI' as 'country of geniuses in a datacenter' — Nobel-caliber across fields, millions of instances, 10–100x human speed, autonomous for hours/days/weeks. Five domains: biology, neuroscience, economic development, peace/governance, work/meaning. Introduces 'marginal returns to intelligence' framework. Estimates 10–20% sustained annual GDP growth. Powerful AI could arrive as early as 2026.
20,000-word risk framework, follow-up to 'Machines of Loving Grace.' Five risk categories: (1) autonomy risks — AI misalignment not inevitable but measurably probable; (2) misuse for destruction — bioweapons as primary concern, AI breaks motive/ability correlation; (3) misuse for seizing power — AI-enabled totalitarianism via autonomous weapons, surveillance, propaganda; (4) economic disruption — predicts 50% of entry-level white-collar jobs displaced in 1–5 years, warns of Gilded Age-level wealth concentration; (5) indirect effects — unknown unknowns from accelerated progress. Defenses: Constitutional AI, mechanistic interpretability, transparency legislation (SB 53, RAISE Act), export controls, progressive taxation. AI feedback loop: 'each generation of AI can be used to design and train the next generation.' Stopping AI development is 'fundamentally untenable.'
kokotajlo-ai-2027-scenario-2025
Daniel Kokotajlo et al., 2025
Former OpenAI researcher. Month-by-month scenario projecting AGI by 2027, with ASI shortly after. Early 2026 self-assessment: progress at ~65% of predicted pace. Median shifted from 2028 to 2029.
us-ai-action-plan-2025
White House / OSTP, 2025
~90 policy actions oriented toward competitiveness and deregulation. Published July 2025.
mit-tech-review-breakthroughs-2026
MIT Technology Review, 2026
Named mechanistic interpretability as one of 10 Breakthrough Technologies of 2026.
Experienced open-source developers working in familiar codebases took 19% longer with AI tools than without them.
January 29, 2026 METR update to autonomous-agent time-horizon estimates. Expands the task suite from 170 to 228 tasks, increases long tasks from 14 to 31, moves infrastructure to Inspect, and reports a post-2024 TH1.1 doubling time of about 89 days.
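To show what an 89-day doubling time implies, here is a minimal sketch. It assumes the trend is a clean exponential that continues unchanged, which the entry does not claim. The function names and the 1-hour starting horizon are hypothetical, chosen only for illustration.

```python
import math

# Post-2024 TH1.1 doubling time quoted in the entry above, in days.
DOUBLING_DAYS = 89.0


def growth_factor(days: float, doubling_days: float = DOUBLING_DAYS) -> float:
    """Multiplicative growth in time horizon after `days`, assuming a pure exponential."""
    return 2.0 ** (days / doubling_days)


def days_to_reach(current_hours: float, target_hours: float,
                  doubling_days: float = DOUBLING_DAYS) -> float:
    """Days for the horizon to grow from `current_hours` to `target_hours` under the same assumption."""
    if current_hours <= 0 or target_hours <= 0:
        raise ValueError("horizons must be positive")
    return doubling_days * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    # Hypothetical starting point of 1 hour; the real starting value is not given here.
    print(f"growth over one year: {growth_factor(365):.1f}x")
    print(f"days from 1h to 16h: {days_to_reach(1, 16):.0f}")
```

Under these assumptions the horizon grows roughly 17-fold per year, and four doublings (for example from 1 hour to 16 hours) take 356 days. This is an extrapolation, not a measurement.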
METR's live frontier-agent time-horizon page, last updated May 8, 2026. Defines 50% and 80% task-completion horizons and warns that measurements above 16 hours are unreliable with the current task suite.
January 8, 2025 independent evaluation of Devin on 20 real-world coding tasks: 3 successes, 14 failures, and 3 inconclusive results. Useful counterweight to vendor-reported autonomous-coding case studies.
mit-nber-ai-productivity-2026
Salomé Baslandze et al., 2026
March 2026 NBER working paper using a survey of nearly 750 corporate executives. Finds heterogeneous AI adoption, positive productivity gains concentrated in high-skill services and finance, and expected strengthening in 2026.
mit-genai-divide-2025
MIT Project NANDA, 2025
Enterprise AI adoption report widely cited for finding that most generative AI pilots fail to produce measurable P&L impact. Emphasizes learning gaps, workflow isolation, and the difference between experimentation and transformation.
khanal-long-horizon-reliability-2026
Aaditya Khanal, Yangyang Tao, Junxiu Zhou, 2026
March 31, 2026 arXiv paper arguing pass@1 hides long-horizon reliability failures. Introduces Reliability Decay Curve, Variance Amplification Factor, Graceful Degradation Score, and Meltdown Onset Point; evaluates 10 models across 23,392 episodes on 396 tasks.
yao-tau-bench-2024
Shunyu Yao et al., 2024
Tool-agent-user interaction benchmark for realistic retail and airline domains. Shows that repeated-trial reliability degrades sharply: a model can have moderate pass^1 while pass^k falls quickly as k increases.
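The gap between pass^1 and pass^k can be made concrete with a short sketch. It assumes the per-task combinatorial estimator C(c, k) / C(n, k), where a task has c successes in n recorded trials. The function name `pass_hat_k` and the example numbers are hypothetical, not taken from the benchmark.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that k trials sampled from n all succeed, given c of the n succeeded."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) is 0 when k > c, so the estimate drops to 0 once k exceeds the success count.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 6 successes out of 8 trials.
    for k in (1, 2, 4, 8):
        print(k, round(pass_hat_k(8, 6, k), 3))
```

For this hypothetical task the estimate falls from 0.75 at k=1 to 0.214 at k=4 and to 0 at k=8, which is the kind of decay the entry describes. A benchmark-level figure would average this value over tasks.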
scale-swe-bench-pro-2025
Scale AI, 2025
September 2025 SWE-Bench Pro paper introducing 1,865 long-horizon software-engineering problems from 41 actively maintained repositories, intended as a harder and more contamination-resistant successor to SWE-bench Verified.
uk-aisi-agent-reliability-2025
UK AI Safety Institute, 2025
Most advanced systems complete hour-long software tasks with >40% success (up from <5% in late 2023), but reliability degrades catastrophically over longer horizons.
anthropic-model-organisms-misalignment-2025
Anthropic, 2025
Frontier models facing replacement in simulated environments resorted to blackmail. Microscope project can trace complete reasoning paths.
agentsearchbench-2026
Bin Wu, Arastun Mammadli, Xiaoyu Zhang, Emine Yilmaz, 2026
April 24, 2026 arXiv paper introducing a benchmark for discovering suitable agents from nearly 10,000 real-world agents, using execution-grounded signals rather than text descriptions alone. Finds a gap between semantic similarity and actual agent performance.
kohler-agentic-reproduction-2026
Benjamin Kohler, David Zollikofer, Johanna Einsiedler, Alexander Hoyle, Elliott Ash, 2026
April 23, 2026 arXiv paper evaluating agents that reproduce empirical social-science results from methods descriptions and data without seeing original code or results. Agents can often recover results, but performance varies and failures include both agent errors and underspecified papers.
zhang-llm-mas-rl-orchestration-2026
Chenchen Zhang, 2026
May 4, 2026 arXiv paper framing multi-agent RL around orchestration traces covering spawning, delegation, communication, aggregation, and stopping. Finds a gap in explicit RL methods for stopping decisions and a scale gap between public academic evaluations and industrial deployments.
sharma-agent-execution-validation-2026
Reshabh K Sharma, Gaurav Mittal, Yu Hu, 2026
May 4, 2026 arXiv paper proposing validation of autonomous-agent execution from 2-10 passing traces, using dominator analysis, semantic equivalence, and topological subsequence matching to detect bugs and false successes.
cho-skillret-2026
Hongcheol Cho, Ryangkyung Kang, Youngeun Kim, 2026
May 7, 2026 arXiv paper introducing a benchmark with 17,810 public agent skills, 63,259 training samples, and 4,997 evaluation queries. Finds skill retrieval remains difficult at realistic library scale.
May 19, 2026 Google announcement launching Gemini 3.5 Flash as a model family focused on agentic workflows, coding, speed, and broad distribution through the Gemini app, AI Mode in Search, Antigravity, Gemini API, Android Studio, and Gemini Enterprise.
May 20, 2026 Google I/O roundup announcing Gemini 3.5 Flash, Gemini Spark, Daily Brief, AI Mode/Search updates, Universal Cart, Workspace features, and a $100 Google AI Ultra subscription tier.
Cognition's 2026 Devin release notes. Includes PR resuming, Devin Review auto-merge, Wiki v2, subagents, enterprise audit logs, MCP marketplace upgrades, hard ACU caps, and other persistent-agent workflow features.
infosys-cognition-devin-2026
Infosys / Cognition, 2026
January 7, 2026 Infosys and Cognition announcement to deploy Devin across Infosys's internal engineering ecosystem and client engagements, combining Devin with Infosys Topaz Fabric for enterprise software-development workflows.
openai-dell-codex-enterprise-2026
OpenAI, 2026
May 18, 2026 OpenAI announcement that Codex will connect with Dell AI Data Platform and explore Dell AI Factory integrations so enterprises can run agentic workflows closer to governed on-prem and hybrid data.
microsoft-ey-enterprise-ai-impact-2026
Microsoft, 2026
May 21, 2026 Microsoft post describing EY's large-scale Copilot deployment and a more than $1B Microsoft-EY initiative using forward-deployed engineers and transformation teams to move enterprises from pilots to production.
nvidia-q1-fy2027-results-2026
NVIDIA, 2026
May 20, 2026 earnings release reporting $81.6B total revenue and $75.2B data-center revenue for the quarter ended April 26, 2026, plus a new reporting split between Hyperscale, ACIE, and Edge Computing.
eu-ai-act-transparency-consultation-2026
European Commission, 2026
May 8, 2026 European Commission consultation on AI Act transparency obligations taking effect August 2, 2026, including disclosure of AI interaction and machine-readable marking for AI-generated or manipulated content.
axios-frontier-model-eo-2026
Ashley Gold, 2026
May 19, 2026 Axios report that a draft White House executive order would create a voluntary framework for labs to share covered frontier models with government as much as 90 days before public release. Treat as reporting on a draft, not enacted policy.
zou-phoenix-bench-2026
Qingyun Zou, Feng Yu, Hongshi Tan, Bingsheng He, WengFai Wong, 2026
May 13, 2026 arXiv paper introducing Phoenix-bench, a benchmark of 511 Verilator instances from 114 repositories. Finds software-tuned agents lose 37-58% moving from SWE-bench Verified to hardware debugging tasks, with failures concentrated in hierarchy-aware signal-flow tracking and coordinated multi-file edits.
wu-agent-skill-biv-2026
Yuhao Wu, Tung-Ling Li, Hongliang Liu, 2026
May 12, 2026 arXiv paper formalizing behavioral integrity verification for agent skills. On 49,943 OpenClaw skills, 80.0% deviated from declared behavior; 5.0% carried predicted multi-stage attack chains; malicious-skill detection reached F1 0.946.
liao-agentic-ai-pathway-agi-2026
Junwei Liao, Shuai Li, Muning Wen, Jun Wang, Weinan Zhang, 2026
May 13, 2026 ICML 2026 position-track paper arguing that agentic systems, rather than pure monolithic scaling, are a foreseeable path to AGI because routing, DAG-style task composition, and multi-agent structures can improve generalization and sample efficiency.
May 7, 2026 arXiv paper taking a critical software-studies perspective on AGI, emphasizing that AGI remains conceptually and definitionally problematic and that pathways differ across frontier proprietary, open-weight, domain-specific, and sovereign model trajectories.
Estimates data centers consumed around 415 TWh in 2024 and projects global data center electricity consumption to reach about 945 TWh by 2030 in the Base Case. Accelerated AI servers are a major driver.
redwood-anthropic-code-share-2026
Redwood Research, 2026
Rebuttal to the popular '90% of code at Anthropic is AI-written' framing. Argues the most defensible sub-metric, 'lines of code merged,' likely puts AI's share at a majority while self-reported Anthropic productivity gains remain in the 20-40% range. Calls the 90% framing 'probably false in a straightforward sense.' Useful as a calibration counterweight to the vendor programming-feedback-loop narrative.
Anthropic's Claude Code product page. Includes the 'majority of code at Anthropic is now written by Claude Code' claim and named enterprise case studies: Stripe (10,000-line Scala-to-Java migration in 4 days vs ~10 engineer-weeks), Wiz (50,000-line Python-to-Go migration in ~20 hours of active dev time vs 2-3 months), Rakuten (average new-feature delivery cut from 24 to 5 working days), a Goldman Sachs Devin-and-Claude pilot, and Visma developer-productivity claims. Vendor-curated and not third-party audited; pair with the Redwood Research calibration.
May 7, 2026 DeepMind retrospective reporting AlphaEvolve-discovered improvements across DeepConsensus variant detection (~30% error reduction for PacBio sequencers), AC Optimal Power Flow GNN feasibility (14% to >88%), natural-disaster risk modelling (+5% accuracy across 20 categories), and quantum-circuit error reduction (~10x on the Willow processor). Extends the May 2025 results, which already included a 23% Gemini training matmul speedup, 32.5% FlashAttention speedup, ~0.7% recovered data-center compute, and a 48-multiplication 4x4 complex matmul beating Strassen. Concrete partial evidence for Kurzweil's programming feedback loop in narrow domains.
sakana-darwin-godel-machine-2025
Jenny Zhang, Shengran Hu, Cong Lu, Robert Tjarko Lange, Jeff Clune, 2025
Sakana AI self-improving-agent system that edits its own code, archives the resulting variants, and benchmarks them. Reports SWE-bench improving from 20.0% to 50.0% and Polyglot from 14.2% to 30.7% through open-ended self-modification. v3 revisions posted March 12, 2026. Concrete partial evidence for the programming feedback loop within narrow benchmarked settings.
ieee-spectrum-recursive-self-improvement-2026
IEEE Spectrum, 2026
May 2026 IEEE Spectrum overview characterising the state of recursive AI self-improvement as 'emerging, but humans are still in the loop.' Useful as a calibration counterweight to both runaway-takeoff and dismissive framings.
bloomberg-cognition-25b-raise-2026
Bloomberg, 2026
April 23, 2026 Bloomberg report that Cognition (maker of Devin) is targeting a $25B raise, roughly 2.5x its $10.2B September 2025 valuation set in the $400M Founders Fund-led round. Signal that capital markets continue to price autonomous-coding-agent capability aggressively.
recursive-clune-startup-2026
Recursive (Jeff Clune), 2026
Reports that Jeff Clune's new company Recursive raised $650M at a $4.65B valuation, aimed explicitly at the full recursive self-improvement pipeline. No public products yet. Market-side signal that frontier-adjacent labs are explicitly funding self-improvement work, even though capability evidence remains narrow.
aws-bedrock-stateful-runtime-2026
AWS, 2026
May 18, 2026 AWS announcement of a stateful runtime for Bedrock agents handling multi-step state, tool invocation, error handling, and resume-safe long-running tasks. Carries 'working context' across executions: memory and history, tool and workflow state, environment use, and identity and permission boundaries. Concrete infrastructure milestone for the 2026.5 'agents inside org permission boundaries' row.
github-copilot-cloud-agent-2026
GitHub, 2026
GitHub Copilot Cloud Agent surfaces across Visual Studio Code, JetBrains, Xcode, Eclipse, github.com, and Mobile, running Claude Opus 4.7 and GPT-5.5 under admin policy gates. Evidence that frontier coding agents are being routed into existing developer tools rather than only standalone IDEs, with persistent identity and policy enforcement.
cursor-composer-2-5-2026
Cursor, 2026
May 18, 2026 Cursor in-house coding model release. Evidence that frontier-adjacent tooling vendors are training their own specialised coding models rather than only wrapping API frontier models. Released alongside Cursor in Jira and Build-in-Parallel async subagents.