What Stays Scarce When Generation Gets Cheap

In early 2025, a group of experienced open‑source developers sat down to do something they had done for years — fix bugs and add features to codebases they knew intimately — this time with modern AI assistants at their side. Beforehand, they expected a speed‑up of about a quarter. Afterward, they were sure the tools had sped them up, by roughly a fifth.

The measurements told a different story: across 246 real tasks, they took about 19% longer when AI tools were allowed than when they were not. The trial, run by METR, had found something more interesting than a productivity number. The developers’ intuition about their own work was wrong in the optimistic direction, and it stayed wrong even after the fact.

This is a useful place to begin, because it punctures the assumption underneath most discussion of AI at work — that the scarce thing is the ability to produce. For most of industrial history that was correct. The constraint was generation: writing the code, drafting the contract, drawing the schematic. Capable AI removes that constraint almost entirely. And once generation stops being the bottleneck, the bottleneck moves somewhere else. What follows is a first sketch of where it went, and why the operator who understands the boundary will outlast the one who merely generates faster.

§ 01

Four limits a machine inherits

Stated plainly, AI is a universal engine for generating and transforming anything that can be communicated. That is an enormous claim, and mostly true. But “anything that can be communicated” runs into limits that are properties of the physical and logical world, not engineering problems waiting for next year’s model; any machine — like any human — inherits them rather than escapes them.

FIG. 1 · Constraints inherited by any generation engine

The logical limit

Domain : what is verifiable in principle · Gödel · Turing · Rice

Gödel showed that any consistent formal system rich enough for arithmetic holds truths it cannot prove, and cannot prove its own consistency from inside. Turing showed no general procedure decides whether a program halts; Rice generalised the gloom to every non‑trivial property of a program’s behaviour. An AI model is, in the end, a program running on a computer, so it inherits these results. No model can, in general, certify that its own output meets a spec, is safe, or is even correct, which is exactly why “let the AI check its own work” fails where it would matter most.

The complexity limit

Domain : what is solvable in feasible time · NP‑hardness

Scheduling, routing, allocation, search — many problems a founder actually cares about are NP‑hard, their solution spaces exploding combinatorially. An algorithm running in 2ⁿ steps is, at n = 100, already asking for more operations than the universe has had seconds. More compute shifts the wall; it does not remove it. Ask AI for the optimal anything in a hard domain and you get a good heuristic dressed as an answer — often enough, but not the same thing.
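The arithmetic behind that comparison is easy to check. The sketch below assumes a universe age of about 13.8 billion years and a hypothetical machine that performs 10^18 operations per second; both figures are illustrative assumptions rather than numbers taken from the sources of this note.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIVERSE_AGE_SECONDS = 13.8e9 * SECONDS_PER_YEAR  # assumed age: ~13.8 billion years


def brute_force_steps(n: int) -> int:
    """Steps taken by an algorithm that examines all 2**n candidates."""
    return 2 ** n


def largest_feasible_n(ops_per_second: float, budget_seconds: float) -> int:
    """Largest n whose 2**n steps fit inside the given compute budget."""
    return math.floor(math.log2(ops_per_second * budget_seconds))


steps = brute_force_steps(100)
print(f"2^100 steps:             {steps:.3e}")
print(f"universe age in seconds: {UNIVERSE_AGE_SECONDS:.3e}")

# Hypothetical machine: 1e18 operations per second, run for one year.
base = largest_feasible_n(1e18, SECONDS_PER_YEAR)
# The same year on a machine a thousand times faster.
faster = largest_feasible_n(1e18 * 1000, SECONDS_PER_YEAR)
print(f"feasible n: {base} -> {faster}")
```

Under these assumptions a thousandfold increase in compute raises the feasible problem size by only about ten, since log2(1000) is just under 10. That is the sense in which more compute shifts the wall without removing it.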

The physical limit

Domain : what is computable at what cost · energy · latency

Computation costs energy irreducibly, and in practice moving data dominates. In 2025–26 the chokepoint stopped being chips and became power: the IEA projects data‑centre electricity demand will more than double by 2030, to roughly 945 TWh a year, about what Japan consumes today. For the operator the lesson is smaller and immediate. Inference is metered. “Throw more agent at it” is a sentence with a price tag, and the most capable reasoning, image and video models carry the largest. Whether the meter is a per‑token bill or a flat subscription’s monthly quota, generation feels free only until the cap, then stops you dead.

The economic & organisational limit

Domain : what is knowable & trustable in a human system · Goldratt · Snowden

Every system has one binding constraint (Goldratt); in an AI‑augmented company it is no longer generation but your own capacity to specify and verify — or, when generation runs on costly reasoning or video models, the compute budget you can afford. Which of the two binds shifts with your capital and the price of the task. Some problems are clear or complicated, with knowable cause and effect, and AI handles them well; others are complex or chaotic, where the right move is a small, safe‑to‑fail experiment, not an oracle’s verdict (Snowden’s Cynefin). Mistaking which you are in is the classic, costly error.
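Goldratt’s point reduces to one line of arithmetic: a chain of stages delivers only what its slowest stage allows. A minimal sketch follows; the stage names and daily rates are hypothetical and chosen only for illustration.

```python
def pipeline_throughput(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the binding stage and the throughput it allows (items per day)."""
    binding = min(stage_rates, key=stage_rates.get)
    return binding, stage_rates[binding]


# Hypothetical daily capacities for a solo operator.
before = {"specify": 6.0, "generate": 40.0, "verify": 5.0}
# The same operator after a tenfold gain in generation speed.
after = {**before, "generate": 400.0}

print(pipeline_throughput(before))
print(pipeline_throughput(after))
```

In both cases the binding stage is verification and the throughput is unchanged, which is why speeding up generation alone buys nothing here. A compute budget could be modelled as one more stage in the same dictionary.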

The families interlock. Complexity (the second limit) and the knowledge problem under the fourth are one insight in two languages: comprehensive planning in advance is impossible, so you must iterate. Logic (the first) and the agency problem under the fourth make trust costly and impossible to fully automate. The physical limit (the third) puts a price on the very iteration the others demand.

§ 02

The knowledge problem did not disappear; it moved upstream

In 1945 Friedrich Hayek made an argument against central planning that has worn extraordinarily well. The knowledge a society needs to coordinate itself does not sit in any one mind or office. It is dispersed, local, often tacit — “the knowledge of the particular circumstances of time and place” — and much of it cannot be written down at all. Michael Polanyi put it sharply: we know more than we can tell. The planner fails not from stupidity but because the relevant knowledge is structurally unavailable to him.

Pause here, because this is exactly the kind of foundational idea a new technology can reopen. Hayek argued against twentieth‑century planners with pencils and paper forms. Large‑scale machine learning on behavioural data is a genuinely different instrument, and it has revived the old socialist‑calculation debate in earnest. The honest reading: AI can codify some knowledge that used to be tacit — which is a real change — but the change is mostly in the applications, not the principle.

The residual does not vanish. It reappears as a new and very concrete cost — the cost of context engineering: making your model of the business legible to the machine, and keeping it legible as the business changes. The knowledge problem, in other words, migrated. It moved out of the market and into the founder’s job description.

Coase’s question travels the same road. Firms exist, he argued, because internal coordination is sometimes cheaper than transacting in the market. As AI drives coordination costs toward zero, the boundary of the firm loosens and the one‑person company becomes economically thinkable in a way it never was. But the costs do not simply vanish: as coordination costs fall, new costs rise in their place, around trust, verification, and specification, which are, conveniently, the subject of this playbook.

The winners will not be those who rent the best model. Everyone you compete with rents the same frontier.

§ 03

The bottleneck moved, and the data shows it

Return to the METR developers, because their experience is now visible at scale. Surveys describe a workforce that has adopted these tools enthusiastically and trusts them less than it used to. The single most common frustration is not outright failure but the answer that is “almost right, but not quite,” the kind that costs more to find and fix than it would have cost to write from scratch.

The telemetry shows the pattern a careful operator should fear: real gains in raw throughput arriving alongside sharp rises in incidents and in the time it takes to review a change. The generation got faster. The checking got harder — and the checking is now where the work lives.

This is the verification bottleneck — not a temporary growing pain but the logical limit from § 01, made economic. Because no system can fully certify its own output, verification has to come from somewhere else — a separate model with different context, a test suite, a type system, your own eyes — and that somewhere‑else costs attention, the one input a solo founder cannot mint more of. Reviewing AI’s work has changed character, too: it is no longer about catching typos but about asking whether the thing should have been built at all. That is a judgment, and judgment does not parallelise the way generation does.
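Verification by other means can be as simple as comparing the generated code against an independent reference on inputs the generator never saw. In the sketch below, `generated_median` is a hypothetical stand‑in for AI‑written code that is “almost right, but not quite”, and the standard library’s `statistics.median` plays the independent reference. The names, trial count and input ranges are assumptions made for illustration.

```python
import random
import statistics


def generated_median(values):
    """Stand-in for AI-written code: plausible, but wrong for even-length input."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def find_counterexample(candidate, reference, trials=200, seed=0):
    """Compare candidate against an independent reference on random inputs.

    Returns the first input on which the two disagree, or None if none is found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        sample = [rng.randint(0, 100) for _ in range(rng.randint(1, 8))]
        if candidate(sample) != reference(sample):
            return sample
    return None


counterexample = find_counterexample(generated_median, statistics.median)
print("disagreement found on:", counterexample)
```

A check like this is evidence rather than proof: finding no counterexample shows only that none turned up in the inputs tried, which is consistent with the logical limit of § 01.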

§ 04

What the operator actually does

So what is left for the human once the machine does the producing? A specific and unglamorous list, which is the real subject of this playbook. All but the first item answer one of the four limits.

Initiator

Nothing happens without your intent; the machine has none of its own. You choose the problem.

Model‑builder

You hold the conceptual map of the business — its value stream, its constraint, what “good” looks like — that the AI lacks and depends on you to supply. This answers the knowledge problem.

Specifier

You turn that mostly‑tacit model into descriptions precise enough to act on. Spec‑driven development is just the discipline of writing requirements and acceptance criteria before code, so review becomes a checking task, not an archaeological one. This answers combinatorial hardness, by decomposition.

Verifier

You own the definition of correct and confirm it by means other than the thing that generated the work. This answers the impossibility of self‑certification.

Orchestrator

You decompose problems into pieces small enough to check, delegate them, budget the compute, and run the feedback loops that compensate for everything that could not be planned. This answers the physical limit, and the rest.

The human’s job is not to do what the machine does, only better, but to do the things the machine cannot — which turn out to be the ones that were never going to get cheaper.

§ 05

An operating discipline, held lightly

The practical core is a handful of habits rather than a methodology, and they are worth stating without ceremony.

01
Describe before you delegate. If you cannot specify it, the failure when the AI gets it wrong is yours, not the model’s.
02
You are the constraint — protect it. Optimise your specification and verification throughput, never generation speed, which stopped being the bottleneck some time ago.
03
Never let generation outrun verification. Limit work to what you can actually check; over‑generation is waste — and, once generation is metered, a rejected output bills you twice, for the making and the checking.
04
Diagnose the domain first. Trust the machine in the ordered problems; keep your hands on the wheel in the genuinely uncertain ones.
05
Verify by different means than you generated. Separate reviewer, tests, types, your own eyes — never self‑grading.
06
Measure, don’t feel. The METR developers are a standing reminder that the sensation of speed is not evidence of it.
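Habit 03 can be made concrete with a small expected‑cost calculation. The sketch below assumes that every attempt is both generated and checked, and that each attempt passes review independently with a fixed probability. The function name and the cost figures are hypothetical, in arbitrary units.

```python
def cost_per_accepted_output(generation_cost: float,
                             verification_cost: float,
                             acceptance_rate: float) -> float:
    """Expected spend to obtain one output that passes review.

    Assumes every attempt is both generated and checked, and that each
    attempt passes independently with probability `acceptance_rate`.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    expected_attempts = 1 / acceptance_rate  # mean of a geometric distribution
    return (generation_cost + verification_cost) * expected_attempts


# Hypothetical figures: generation is cheap, checking is not.
careful = cost_per_accepted_output(0.5, 6.0, acceptance_rate=0.8)
hasty = cost_per_accepted_output(0.5, 6.0, acceptance_rate=0.4)
print(f"careful spec: {careful:.2f} per accepted output")
print(f"hasty spec:   {hasty:.2f} per accepted output")
```

Under these assumptions, halving the acceptance rate doubles the cost of each accepted output, and most of that cost is checking rather than generation. Better specification pays for itself in exactly that term.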

None of this is settled. The capability frontier is moving quickly and unevenly: the length of task an AI completes reliably has been roughly doubling every few months, which argues for redrawing the line between human, machine, and market on a schedule rather than once. If contamination‑resistant benchmarks start showing AI handling the messy, high‑context, judgment‑laden work at human quality, the argument here weakens and the operator’s effort should move further upstream still — into pure problem‑selection and taste. That has not happened yet.

Two caveats cut the other way, and both are worth holding as forecasts rather than findings. The first is that you may not be able to see the frontier improve even where it does. A more capable model only shows its edge on a problem hard enough to stretch it; bring nothing that hard and two models look identical — not because they are, but because the task never tested them. Call it the demand horizon: the point past which you cannot tell models apart because your own problems are not difficult enough to separate them. It sits beneath the verification limit of § 01 — one bounds the hardest answer you can judge, the other the hardest question you can pose. For most operators, most of the time, the question of whether AI has caught up on the messy, judgment‑laden work is not answered. It is never asked.

The second cuts against a claim made twice in these notes — that everyone rents the same frontier. That holds today. It need not hold tomorrow. Should the most capable models come to be gated — by cost, or by the access controls now being floated for the most powerful systems — then “the same frontier” would mean only the same tier you can actually buy. For the solo operator that tier is the commodity one. Which, far from undoing the argument, is the precise condition under which it bites hardest: denied the model that might have obsoleted the craft, you are left to practise it.

None of these futures has arrived; the safest bet is still the dull one. The winners will not be those who rent the best model — everyone you compete with rents the same frontier — but those who have built the human and organisational machinery to feed it and to check its work.

About this note

A field note from The Bounded Operator’s Playbook — on the human capabilities that stay scarce when AI makes generation cheap. Written in an analytical, non‑ideological register: observation, interpretation, and speculation kept distinct.

Sources

METR (2025; time‑horizon v1.1, 2026) · IEA, Energy and AI (2025) · Stack Overflow Developer Survey (2025) · Faros AI, Acceleration Whiplash (2026) · Hayek (1945) · Polanyi (1966) · Coase (1937) · Brynjolfsson & Hitzig (2025) · Gödel, Turing, Rice; Goldratt; Snowden.

Status

Forecasts (the one‑person billion‑dollar company; capability doubling) are marked as forecasts, not facts. Quantitative figures are indicative; energy estimates in particular vary by methodology. Revised July 2026.