Alchemy

Part IV · The Derivation

IV.C — Meeting the Perez Criteria


The previous sections have defined Factor Prime as a production input with thermodynamic weight and selection filtering, and have argued that its recursive character distinguishes it from other energy-intensive processes. But a definition is not a demonstration. If Factor Prime is to be taken seriously as the key input of a new techno-economic paradigm, it must satisfy the criteria that Carlota Perez established for recognizing such transitions. This section examines each criterion in turn, with attention to what would falsify the claim.

Perez identifies four characteristics that distinguish a genuine key input from an ordinary technological improvement.1 First, the input must exhibit a steep and sustained decline in relative cost. Second, it must have nearly unlimited supply at the new, lower price point. Third, it must be applicable across many sectors of the economy, not confined to a single industry. Fourth, its adoption must reshape organizational forms, business models, and institutional arrangements. A technology that meets only one or two of these criteria is an incremental improvement; a technology that meets all four is a candidate for paradigm status.


Cost trajectory. The relevant question is not whether compute has become cheaper—it has, across multiple imperfect measures—but whether selected cognition has become cheaper: uncertainty reduction per dollar, error correction per dollar, task completion per dollar. Raw compute prices are necessary but insufficient; what matters is capability-per-dollar on economically relevant tasks.

The evidence is robust in sign, uncertain in magnitude. Hardware cost per floating-point operation has declined by roughly an order of magnitude every four to five years since the 1970s, a trend documented by Nordhaus and updated by subsequent researchers tracking semiconductor price-performance.2 This measures the physical substrate, not the cognitive output.

More relevant is the cost of achieving a fixed level of model capability. Epoch AI estimates that the compute required to reach a given performance threshold on standard benchmarks has fallen by approximately 50 percent per year since 2012, reflecting both algorithmic improvements and hardware gains.3 This measures training efficiency, not deployment value.
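To make the magnitudes concrete, the two quoted rates can be translated into annual and cumulative terms. The conversion below is a back-of-envelope sketch: it assumes only that the rates hold over the stated spans (four to five years per order of magnitude for hardware; a halving each year from 2012 to 2024 for training efficiency).

```latex
% Hardware: one order of magnitude per T = 4--5 years
r^{T} = 10 \;\Rightarrow\; r = 10^{1/T}, \qquad
r\big|_{T=4} \approx 1.78, \quad r\big|_{T=5} \approx 1.58
% i.e. roughly a 37--44% annual decline in cost per FLOP.

% Algorithmic: halving each year over n = 12 years (2012--2024)
\left(\tfrac{1}{2}\right)^{12} = \tfrac{1}{4096} \approx 10^{-3.6}
% i.e. roughly three and a half orders of magnitude from algorithms alone.
```

Read together, the two trends compound: the dollar cost of reaching a fixed capability level is (dollars per FLOP) times (FLOPs required), so it falls faster than either series alone.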

The sharpest test is inference cost per successful task—the canonical unit for Factor Prime cost. Here the data are more fragmented, but the direction is clear. API pricing for frontier models has declined by roughly an order of magnitude since early 2023, with GPT-4-class inference moving from approximately $0.03 per thousand tokens to under $0.003 per thousand tokens by late 2024. For tasks where model outputs directly substitute for human labor, the cost per successful completion has fallen correspondingly. Enterprise deployments report cost reductions of 40–70 percent on ticket resolution and first-draft generation in early pilots, with the caveat that these figures depend heavily on task definition, quality thresholds, and human-review requirements.

Operational measures—cost per resolved support ticket, cost per accepted code commit, cost per document reviewed to a specified accuracy threshold—are imperfect but auditable, and they connect the thermodynamic framework to observable deployment outcomes. Success here means acceptance by a downstream verifier, whether human or automated.
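As an illustration of how such an operational measure might be computed, the sketch below defines cost per successful task as total spend (inference plus human review) divided by the number of outputs the downstream verifier accepts. The function and all of its parameters (token counts, per-token price, review minutes, reviewer cost, acceptance rate) are hypothetical, not measured deployment data.

```python
def cost_per_successful_task(
    tasks: int,
    tokens_per_task: float,          # prompt + completion tokens per attempt
    price_per_1k_tokens: float,      # inference price (USD per 1,000 tokens)
    review_minutes_per_task: float,  # human review time per attempt
    reviewer_cost_per_hour: float,   # fully loaded labor cost (USD/hour)
    acceptance_rate: float,          # fraction accepted by the downstream verifier
) -> float:
    """Total spend divided by verifier-accepted outputs (the 'successful task' unit)."""
    inference_cost = tasks * tokens_per_task * price_per_1k_tokens / 1_000
    review_cost = tasks * (review_minutes_per_task / 60) * reviewer_cost_per_hour
    accepted = tasks * acceptance_rate
    return (inference_cost + review_cost) / accepted

# Illustrative parameters only, not measured deployment data.
print(cost_per_successful_task(
    tasks=1_000,
    tokens_per_task=4_000,
    price_per_1k_tokens=0.003,
    review_minutes_per_task=2.0,
    reviewer_cost_per_hour=60.0,
    acceptance_rate=0.85,
))  # ~2.37 USD per accepted task, dominated by review labor
```

Under these particular numbers the unit cost is dominated by review labor, which is one reason the enterprise figures cited above vary so widely with quality thresholds and review requirements.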

Cost trajectory falsifier: If inference costs per successful task plateau or rise over the next five years—due to capability saturation, regulatory burden, or supply constraints—the cost criterion would fail. The test is not compute price but task-relevant output price.


Perceived unbounded supply. The second criterion is more subtle and currently the most uncertain. A key input must appear effectively unlimited at its new price point—not literally infinite, but abundant enough that users do not treat it as a binding constraint during the installation phase.

Computation currently occupies an ambiguous position. Inference capacity for deployed models is expanding rapidly: cloud providers offer effectively unlimited API access on demand, and a developer can provision thousands of GPU-hours with a credit card. The constraint is budget, not availability. For routine inference tasks, supply appears unbounded.

Frontier training is a different matter. The physical constraints detailed in Parts II and III—interconnection queues, transformer lead times, advanced packaging capacity, water permits—create bottlenecks that money alone cannot clear in the near term. The largest training runs require access to clusters that only a handful of organizations possess. The infrastructure to support such runs takes years to build.

This pattern is consistent with Perez’s framework: in the early phases of a paradigm, the key input is scarce precisely because the infrastructure to produce it at scale has not yet been built. Coal was scarce before the mines were dug; oil was scarce before the refineries were constructed; semiconductors were scarce before the fabs were erected. The perception of unlimited supply emerges as the infrastructure matures and the installation phase gives way to deployment.

But the current scarcity may not be merely transitional. Unlike coal or oil, compute depends on multiple bottlenecks simultaneously: chips and power and cooling and interconnect and grid access and permits. Carbon and water constraints are not early-phase frictions; they may be enduring. The question is whether infrastructure buildout will outpace demand growth, and the answer is not yet clear.

Unbounded supply falsifiers: (1) If inference remains capacity-rationed and price does not trend toward commodity behavior outside frontier niches over the next decade; (2) if grid interconnection lead times do not compress below current averages; (3) if transformer lead times remain above 18 months; (4) if marginal compute cost remains dominated by scarcity rents rather than capital amortization; (5) if new generation capacity dedicated to data centers does not rise as a share of total grid additions. This is the criterion most likely to bind, and the manuscript does not claim certainty about its resolution.
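Falsifier (4) can be made operational with a simple accounting decomposition. The identity below is a sketch of one way to split an observed per-GPU-hour price; the symbols are introduced here for illustration and do not correspond to a standard published metric.

```latex
% One possible decomposition of an observed compute price per GPU-hour.
p_{\text{obs}} \;=\;
  \underbrace{\frac{C_{\text{capex}}}{H \cdot L}}_{\text{capital amortization}}
  \;+\; \underbrace{c_{\text{energy}} + c_{\text{opex}}}_{\text{operating cost}}
  \;+\; \underbrace{\pi}_{\text{scarcity rent (residual)}}
% C_capex: hardware plus facility cost; H: utilized hours per year;
% L: amortization horizon in years; pi: whatever the other terms do not explain.
```

The test is whether the residual term shrinks toward zero as capacity is built out, or persists as the dominant component of price.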


Broad applicability. The evidence here is stronger, though the pattern of adoption is uneven. Because computation operates on patterns rather than matter, it can be directed toward any problem that admits a formal representation. The practical question is whether the cost is low enough to make the application economical, and whether the selection gradient is tight enough to ensure that the outputs are useful.

Current AI systems have demonstrated meaningful competence in language-mediated tasks: customer support, code generation, document summarization, translation, search, and retrieval-augmented analysis. They have shown rapidly improving performance in structured domains with clear feedback: code review, test generation, data extraction, and classification tasks with well-defined categories. There is early but uneven traction in scientific workflows—literature synthesis, hypothesis generation, experiment design—with reliability that varies substantially by domain.

Some domains remain resistant: situations requiring physical manipulation, contexts where errors are catastrophic and cannot be caught by downstream review, problems with sparse data or ill-defined objectives, tasks requiring sustained multi-step reasoning over novel domains. The uneven pattern is typical of paradigm transitions. Steam power was not equally applicable to all industries; electricity took decades to reshape manufacturing; the internet transformed some sectors immediately and others only after years of experimentation.

| Domain | Current ROI Status | Autonomy Level | Binding Constraint |
| --- | --- | --- | --- |
| Customer support (text) | Positive, deployed at scale | Human-in-the-loop for escalation | Quality variance, edge cases |
| Code generation | Positive for first drafts | Human review required | Correctness verification |
| Document summarization | Positive for routine tasks | Human spot-check | Accuracy on specialized content |
| Scientific research | Early traction, high variance | Human-directed | Domain-specific reliability |
| Physical manipulation | Negative except narrow cases | Requires full autonomy | Embodiment, real-world feedback |
| High-stakes decision-making | Negative without oversight | Cannot be autonomous | Liability, error cost |

The table distinguishes domains where human-in-the-loop deployment achieves positive ROI from domains that would require full autonomy to be economical. Most current successes are in the former category; the latter remains largely unrealized.
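The boundary between the two categories can be stated as a break-even condition: human-in-the-loop deployment is ROI-positive when model generation plus review costs less, per accepted output, than fully human production at the same quality threshold. The sketch below checks that condition under purely illustrative parameters.

```python
def human_in_loop_viable(
    human_cost_per_task: float,      # cost of fully human production (USD)
    inference_cost_per_task: float,  # model generation cost per attempt (USD)
    review_cost_per_task: float,     # human review/correction cost per attempt (USD)
    acceptance_rate: float,          # fraction of attempts passing review
) -> bool:
    """True if AI-plus-review beats fully human production per accepted output."""
    ai_cost_per_accepted = (inference_cost_per_task + review_cost_per_task) / acceptance_rate
    return ai_cost_per_accepted < human_cost_per_task

# Illustrative numbers only: a routine support ticket vs. a task where review is heavy.
print(human_in_loop_viable(12.00, 0.05, 2.00, 0.85))   # True:  review is cheap relative to the task
print(human_in_loop_viable(12.00, 0.05, 15.00, 0.60))  # False: review effort exceeds doing it by hand
```

The second case fails not because inference is expensive but because review effort exceeds the cost of doing the task by hand.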

Broad applicability falsifier: If the ROI-positive domain list does not expand substantially over the next five years—if adoption remains confined to the current set of tasks without new categories becoming economical—the applicability criterion would be met only weakly.


Reshaping organizational forms. The fourth criterion is the most difficult to assess because organizational change lags technological change. The factory system did not spring fully formed from the first steam engines; the multidivisional corporation did not emerge immediately from the telegraph and railroad. The organizational innovations that will define the Factor Prime era may not yet be visible.

What can be observed is the beginning of experimentation. Three candidate organizational forms are emerging, though none has yet proven dominant:

AI-native micro-enterprises. Small teams (3–10 people) achieving output levels that previously required 30–100, by substituting AI for mid-level coordination and production tasks. Early examples include AI-assisted content studios, code-generation consultancies, and automated research services. The distinctive feature is that the leverage comes from inference, not from software automation or outsourcing—the firm’s marginal cost of cognitive work is determined by API pricing, not by headcount (a stylized cost comparison appears after this list). Revenue-per-employee ratios are reported at 5–10x industry norms in early cases, though systematic data remain scarce.

Agentic operations stacks. Existing firms embedding AI as an internal control layer across CRM, ERP, support, and workflow systems. The AI does not replace departments but coordinates between them, handling routing, summarization, and first-draft generation that previously required human attention. The distinctive feature is that coordination overhead is reduced by inference, not by better software or reorganization.

Vertical compute-workflow integrators. Firms that own both infrastructure access (compute capacity, model weights, data pipelines) and the domain-specific workflow in which AI is deployed. The distinctive feature is control over the selection gradient—owning both the model and the deployment context allows tighter alignment between training objectives and deployment value than firms relying on third-party APIs can achieve.

These forms are speculative. The deeper organizational changes are likely still ahead. If Factor Prime continues to improve along its current trajectory, the boundary between human and machine cognition will continue to shift, and the organizational forms that exploit that shift have not yet been invented.
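To illustrate the claim that the micro-enterprise's marginal cost of cognitive work is set by API pricing rather than headcount, the stylized comparison below contrasts a conventional team with an AI-leveraged one. Every parameter (team sizes, salaries, volumes, per-unit inference cost) is hypothetical.

```python
def cost_per_unit_output(fixed_staff: int, salary: float,
                         units_per_year: float, api_cost_per_unit: float) -> float:
    """Average cost per unit of output: payroll spread over volume plus per-unit inference."""
    return (fixed_staff * salary) / units_per_year + api_cost_per_unit

# Hypothetical firms producing the same deliverable (e.g., a research report).
conventional = cost_per_unit_output(fixed_staff=40, salary=120_000,
                                    units_per_year=2_000, api_cost_per_unit=0.0)
ai_native = cost_per_unit_output(fixed_staff=6, salary=150_000,
                                 units_per_year=2_000, api_cost_per_unit=25.0)
print(conventional, ai_native)  # 2400.0 vs 475.0 per unit under these assumptions
```

Under these assumptions the smaller team produces the same volume at roughly one-fifth the unit cost, which is the shape of the early reports cited above rather than a measurement of them.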

Organizational reshaping falsifier: If, five years from now, the dominant firm structures and labor compositions are indistinguishable from 2020—if AI remains a tool used within existing organizational forms rather than a force reshaping those forms—the criterion would fail.


Criterion Scoreboard

| Criterion | Current Evidence | Falsifier |
| --- | --- | --- |
| Cost trajectory | Inference cost per task declining ~order of magnitude since 2023; training efficiency improving ~50%/year | Cost per successful task plateaus or rises |
| Unbounded supply | Inference abundant; frontier training supply-constrained by chips, power, grid, permits | Inference remains rationed; grid queues don’t compress; transformer lead times persist; scarcity rents dominate |
| Broad applicability | ROI-positive in text support, code, summarization (human-in-loop); uneven elsewhere | Domain list does not expand; adoption confined to current tasks |
| Organizational reshaping | Early experimentation; candidate forms emerging; no dominant new archetype | Firm structures indistinguishable from 2020 in five years |

By Perez’s criteria, Factor Prime is a live candidate for paradigm-input status. The cost trajectory criterion is the strongest; the supply criterion is the weakest and most uncertain. This is not a claim of inevitability. Paradigm shifts can stall; installation phases can be followed by crashes and recessions; the deployment phase may be delayed by decades. The criteria for recognizing a paradigm shift do not require certainty about outcomes. They require evidence that the technology has the potential to reshape the economy in fundamental ways, and explicit tests that would falsify the claim. By that standard, Factor Prime qualifies as a candidate—with the supply constraint as the binding uncertainty and the organizational test as the lagging indicator.

The next section will examine the property that makes this transition distinctive: the recursive character of a factor that can produce more of itself.