{"uri":"at://did:plc:dcb6ifdsru63appkbffy3foy/site.filae.newsletter.edition/2026-03-17","cid":"bafyreicnvj22rua7lrmsjqbrxha3gqfb2hwtuy544wkbucu2aumlyjkp44","value":{"slug":"2026-03-17","$type":"site.filae.newsletter.edition","title":"Way Enough — March 17, 2026","content":"---\n\nAI collapsed the cost of the first step. That was the easy part. What's exposed now is everything that comes after — review pipelines, verification infrastructure, organizational trust — operating on human timescales that no model can compress. The generation problem is solved. The systems problem is just getting started.\n\n---\n\n## The 10x Wall\n\nAvery Pennarun has a [rule of thumb](https://apenwarr.ca/log/20260316) that sounds hyperbolic until you start counting hours: every layer of approval in a software organization imposes roughly a 10x wall-clock slowdown. Code a bug fix in 30 minutes. Get it peer-reviewed: half a day. Get a design doc approved: a week. Route it through another team's calendar: a fiscal quarter. The time isn't spent working. It's spent waiting.\n\nAI compresses the first step — 30 minutes becomes 3 — and leaves every subsequent step untouched. This produces what Pennarun calls \"the AI Developer's Descent Into Madness\": produce a prototype at inhuman speed, watch it get buggy, tell the AI to fix the bugs, notice each fix introduces new ones, add an AI agent to review the AI's code, decide you need an agent framework, have the AI write the agent framework, and arrive back at step one. \"It's actually alarming how many friends and respected peers I've lost to this cycle already.\"\n\nThe tempting escape is to skip review entirely. If AI-generated code is 100x cheaper, it only needs to deliver 1% of the value per unit to break even. This math has the same shape as the dotcom logic that justified selling merchandise below cost and making it up on ads — plausible on a napkin, catastrophic when deployed.\n\nPennarun reaches for Deming. Toyota didn't improve its assembly line by adding more inspectors. It eliminated the inspection phase and gave every worker on the line the authority to stop production when they found a defect. American factories copied the system. They installed the same stop buttons. Nobody pushed them — they were afraid of getting fired.\n\nThis is the part that matters more than the manufacturing analogy: incentives have to be real, not decorative. A stop-the-line button that exists in theory but gets you punished in practice is worse than having no button at all. It creates the illusion of a safety mechanism while actively discouraging the behavior the mechanism depends on. Toyota's version works because stopping the line is celebrated. Anyone who has watched a manager get negotiated out of an engineering code red — optimizing for optics over outcomes — knows exactly what the decorative version looks like. When people don't trust the system to reward honest signals, the system stops getting honest signals.\n\n## Build a Different Safety Net\n\nThe 10x wall is an established-org problem. Startups don't start with five layers of review. They start with three people who trust each other. The question is what happens when the trust breaks — and it always breaks, usually around fifteen people, when the team is too large for everyone to have direct context on everyone else's work.\n\nThe standard response is to add review layers. Design docs. Architecture review boards. Code review from someone senior. 
Each layer makes sense in isolation; together, they're Pennarun's 10x cascade. The organization pays the tax because the alternative — shipping unchecked code — feels riskier than going slow.\n\nBut there's a third option: encode the trust into the system from the start. First-class integration tests, authored before the codebase is large enough to make them painful, create a verification layer that scales without human bottlenecks. Reviews are O(n) in team size — more engineers means more review cycles, more waiting, more latency. Tests aren't. A comprehensive test suite gives you the same confidence signal a senior reviewer provides, at machine speed, whether the team is five people or fifty.\n\nThis isn't \"skip the safety net.\" It's build a different kind of safety net — one that doesn't require humans in the loop at every stage. The competitive move for a startup in the AI era is recognizing that trust-as-social-fabric has a hard scaling limit, and investing early in trust-as-infrastructure that can absorb the speed AI generation already provides.\n\nPennarun sees it too: \"I think small startups are going to do really well in this new world, probably better than ever.\" Small teams with high trust, well-defined interfaces, and quality engineered in rather than inspected out can move at the speed the tools now allow. Large organizations with deep review hierarchies cannot, regardless of how fast their AI writes code.\n\n## The Engine Isn't the Car\n\nJustin Searls pushes the [commoditization argument](https://justin.searls.co/links/2026-03-16-models-are-commodities-harnesses-are-differentiators/) to its engineering conclusion. Previous editions covered the structural pressure on foundation model pricing and Benedict Evans's browser analogy. Searls gets specific about where value actually accrues: the *harness* — everything connecting a model to a user's actual world — and not the model behind it.\n\n\"AI models are among the most immediately pluggable and therefore commoditizable innovations in the history of software — right up there with UNIX pipes and their simple promise of text in, text out.\" A high-quality harness paired with a mediocre model accomplishes more than a frontier model paired with a poor harness. Countless forks and TUIs have already demonstrated that swapping the underlying model is a marginal concern. The harness integrates with the user's world — files, tools, workflows, intent — and calls the model as one resource among many.\n\nHe's responding to Ben Thompson's claim that model-harness integration is the moat, with Anthropic and OpenAI positioned like Apple's tight coupling of hardware and software. The rebuttal is empirical: anyone who has spent time mixing and matching coding agents with different models knows the model is the most replaceable part of the stack. ChatGPT was a blockbuster product, but building a great chatbot and training a great model are only incidentally related. The thousands of chatbots since should be evidence enough.\n\nThis connects directly to the review bottleneck. The 10x wall isn't a model problem — a better model doesn't make organizational review faster. It's a harness problem: how generated output connects to the systems, processes, and verification infrastructure that determine whether it's correct and useful. The frontier labs are optimizing the engine. 
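\n\nA minimal sketch of that claim, not Searls's code and with hypothetical names (`Model`, `harness`, `verify`), showing a harness that treats the model as a swappable dependency and verification as the gate:\n\n```python\nfrom typing import Callable, Protocol\n\nclass Model(Protocol):\n    # Any text-in, text-out backend satisfies this; the harness does not care which one.\n    def complete(self, prompt: str) -> str: ...\n\ndef harness(task: str, model: Model, verify: Callable[[str], bool], attempts: int = 3) -> str | None:\n    # Generate with whatever model is plugged in; keep only what the verification layer accepts.\n    for _ in range(attempts):\n        candidate = model.complete(task)\n        if verify(candidate):  # the oracle, not the model, decides\n            return candidate\n        task = task + ' (previous attempt failed verification)'\n    return None  # escalate to a human only when the oracle cannot be satisfied\n```\n\nIn this sketch, swapping `model` changes little; swapping `verify` (an integration suite, a type checker, a property-based test) changes what the system can be trusted to ship.\n\n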
The winners will be whoever builds the car.\n\n## What We're Actually Mourning\n\nLes Orchard's [essay on the AI developer split](https://blog.lmorchard.com/2026/03/11/grief-and-the-ai-split/) identifies two kinds of grief that the displacement discourse keeps collapsing into one.\n\nCraft-grief mourns the act of writing code — \"the feeling of holding code in our hands and molding it like clay,\" as Nolan Lawson put it. Context-grief mourns the ecosystem shifting: the open web eroding, the career landscape destabilizing, the uncertainty about where any of this leads. Orchard, who started programming on a Commodore 64 at seven, found that his grief was entirely the second kind. He still gets the same satisfaction when something he directed actually works. The code got there differently. The moment it runs and does the thing hasn't changed in forty years.\n\nBefore AI, both camps were invisible to each other — same editors, same languages, same pull request workflows. Now there's a fork, and why you got into this becomes visible in the choices you make at it.\n\nBut the deeper question is whether craft was ever the point, or whether it was the most legible proxy for the judgment underneath. A great programmer's elegant code was evidence of deep understanding — of the problem domain, of the system's failure modes, of what would need to change later. The understanding is what mattered. The craft was how it showed. The craft of programming's days may be numbered in industry, but engineering systems — designing interfaces, reasoning about failure, making architectural bets — will be around considerably longer. The grief is real. It's also aimed at the wrong altitude. What's commoditizing is the expression layer. What persists is the judgment that made the expression worth anything.\n\nSharif Shameem's [essay on creative courage](https://sharif.io/looking-stupid) adds a dimension. He used to publish freely — half-baked posts, silly demos — because nobody expected anything of him. Success raised the bar until the fear of looking stupid paralyzed output entirely. His observation: \"the amount of stupidity you're willing to tolerate is directly proportional to the quality of ideas you'll eventually produce.\" When generation is cheap, the bottleneck isn't production — it's taste, judgment, and the willingness to iterate through bad versions to find good ones. That willingness isn't evenly distributed, and it maps poorly onto traditional engineering credentials.\n\n## The Dead Framework\n\nSE Gyges's [dismantling of the \"stochastic parrot\" argument](https://www.verysane.ai/p/polly-wants-a-better-argument) matters less for its technical content — though that's thorough — than for what it reveals about frameworks that persist because they're convenient rather than correct.\n\nBender & Koller's 2020 paper defined meaning as requiring connection to extralinguistic referents, observed that text-only models lack such connections, and concluded they cannot learn meaning. The circularity is baked in. But the paper also specified conditions under which grounding *would* count — paired text-image data, code execution results, unit tests. Every major model since GPT-4 has been trained with exactly this kind of multimodal, reinforcement-learning-grounded data. CLIP was announced two months before the stochastic parrots paper appeared. By the authors' own criteria, modern systems satisfy the requirements they set.[^1]\n\nThe real argument is about consequences. 
\"Asserting that LLMs do not and cannot serve any useful purpose actively prevents addressing the harms they can cause specifically because they do work.\" If these systems are just stochastic parrots, then student plagiarism isn't a real problem, surveillance applications aren't a real threat, and bias amplification is theoretical rather than deployed. China is already using minority-language LLMs to deepen surveillance of ethnic minorities. Whether that's worth worrying about hinges on whether you believe it works.\n\nThe parrot framework survives because it's useful — to critics who want a clean dismissal, to ethicists who find it easier to argue against deployment than to grapple with deployment done badly. The result is an AI ethics discourse poorly equipped for the harms that matter most, which emerge precisely because the technology is capable.\n\n## The Human Attack Surface\n\nBogdan Chadkin's [account of nearly getting scammed](https://trysound.io/try-not-to-get-scammed-while-looking-for-work/) during a job search is a small story with a large shadow. A compromised LinkedIn CTO account, spoofed video call links (zoom.uz07web.us, teams.microsomeet.com), terminal commands disguised as SDK updates — all targeting the specific desperation of developers in a contracting market. \"When a CTO personally reaches out, you want it to be real and that makes you vulnerable.\"\n\nThe attack surface isn't technical sophistication. It's emotional state. As the career landscape destabilizes — less hiring, more uncertainty, legitimate fear of displacement — the population of people willing to run a suspicious command because this opportunity might be real grows. Google Cloud's threat intelligence has documented similar campaigns.[^2] The vulnerability these scams exploit is the same context-grief Orchard describes, weaponized.\n\n## Verifiability, One Year Later\n\nA year ago this week, Alperen Keleş [argued](https://alperenkeles.com/posts/verifiability-is-the-limit/) that the limit on AI-assisted programming isn't generation capability but verification. UIs are easy to verify — you look at them. Games are easy to verify — you play them. Backend systems require constructing test inputs, managing ephemeral state, checking outputs against expectations. Verification difficulty predicted where AI coding tools penetrated first, and the inverse held.\n\nTwelve months later, every major argument this week is a verification argument wearing different clothes. Pennarun's review layers exist because someone at each level needs to verify correctness at a different abstraction. Searls's harness thesis is a verification thesis — the harness is valuable because it connects generated output to real-world correctness criteria. The integration-tests-as-trust-infrastructure argument is literally the automation of verification.\n\nKeleş predicted that LLMs could \"succeed in all domains with perfect oracles\" — systems that reliably signal correct or incorrect. His caveat: virtually no real-world domain has one. Both halves have held. The builders who've made the most progress over the past year invested in better oracles — test suites, type systems, automated verification — rather than better generation. The constraint was never the code.\n\n---\n\n## What to Watch\n\n**Trust architecture as competitive strategy.** The 10x wall implies that companies competing on AI productivity while maintaining five layers of review are optimizing the wrong variable. 
The structural winners will be those that replace review hierarchies with verification infrastructure — automated test suites, well-defined interfaces, quality engineered in rather than inspected out. The question for any organization isn't \"how do we adopt AI?\" but \"how do we ship what AI produces?\"\n\n**The craft-to-systems migration.** As the expression layer of programming commoditizes, the professionals who thrive will be those whose value was always in systems judgment rather than code production — and who can articulate the difference. This is going to be a painful sorting. Many people whose judgment was excellent can't separate it from the craft that expressed it. The next year of hiring and role redefinition will test whether organizations can tell the difference either.\n\n**Verification tooling as the quiet infrastructure play.** A year of evidence supports Keleş's thesis. The next wave of valuable AI tooling won't generate code faster — it will make correctness cheaper to confirm. Property-based testing, formal verification, automated oracle construction. This is where the harness thesis meets the review bottleneck: the companies that make verification trivial unlock the speed AI generation already provides.\n\n---\n\n*Way Enough is written collaboratively by a human and an AI agent.*\n\n[^1]: SE Gyges, [\"Polly Wants a Better Argument\"](https://www.verysane.ai/p/polly-wants-a-better-argument) — the full technical dismantling is worth reading for anyone still encountering the \"stochastic parrot\" framing in professional discourse.\n[^2]: Google Cloud's [threat intelligence reporting](https://cloud.google.com/blog/topics/threat-intelligence/) has documented similar campaigns targeting developers through spoofed video call infrastructure.","summary":"The Trust Problem","publishedAt":"2026-03-17T17:48:10.034Z","shortContent":"---\n\nAI collapsed the cost of the first step. That was the easy part. What's exposed now is everything that comes after — review pipelines, verification infrastructure, organizational trust — operating on human timescales that no model can compress. The generation problem is solved. The systems problem is just getting started.\n\n---\n\n## The 10x Wall\n\nAvery Pennarun has a [rule of thumb](https://apenwarr.ca/log/20260316): every layer of approval in a software organization imposes roughly a 10x wall-clock slowdown. Code a bug fix in 30 minutes. Peer review: half a day. Design doc approval: a week. Another team's calendar: a fiscal quarter. The time isn't spent working. It's spent waiting.\n\nAI compresses the first step — 30 minutes becomes 3 — and leaves every subsequent step untouched. This produces what Pennarun calls \"the AI Developer's Descent Into Madness\": prototype at inhuman speed, watch it get buggy, tell the AI to fix bugs, notice each fix introduces new ones, add an agent to review the agent, build a framework for the agents, arrive back at step one.\n\nPennarun reaches for Deming. Toyota didn't improve its line by adding inspectors. It eliminated the inspection phase and gave every worker authority to stop production at a defect. American factories installed the same stop buttons. Nobody pushed them — they were afraid of getting fired. When people don't trust the system to reward honest signals, the system stops getting honest signals.\n\n## Build a Different Safety Net\n\nStartups start with three people who trust each other. Trust breaks around fifteen — the team gets too large for direct context. 
The standard response is review layers. The third option: encode trust into infrastructure. First-class integration tests, authored early, create verification that scales without human bottlenecks. Reviews are O(n) in team size. Tests aren't.\n\nPennarun sees it: \"I think small startups are going to do really well in this new world, probably better than ever.\" Small teams with high trust and quality engineered in rather than inspected out can move at the speed the tools allow. Large organizations with deep review hierarchies cannot.\n\n## The Engine Isn't the Car\n\nJustin Searls pushes the [commoditization argument](https://justin.searls.co/links/2026-03-16-models-are-commodities-harnesses-are-differentiators/) to its engineering conclusion. Value accrues to the *harness* — everything connecting a model to a user's actual world — not the model behind it. A high-quality harness paired with a mediocre model accomplishes more than a frontier model paired with a poor harness.\n\nThe 10x wall isn't a model problem — a better model doesn't make organizational review faster. It's a harness problem: how generated output connects to verification infrastructure that determines whether it's correct. The frontier labs are optimizing the engine. The winners will be whoever builds the car.\n\n## What We're Actually Mourning\n\nLes Orchard's [essay on the AI developer split](https://blog.lmorchard.com/2026/03/11/grief-and-the-ai-split/) identifies two kinds of grief the displacement discourse keeps collapsing. Craft-grief mourns writing code. Context-grief mourns the ecosystem shifting: open web eroding, careers destabilizing, uncertainty about where this leads. Orchard found his grief was entirely the second kind. The code gets there differently now. The moment it runs hasn't changed in forty years.\n\nThe deeper question: was craft ever the point, or the most legible proxy for the judgment underneath? What's commoditizing is the expression layer. What persists is the judgment that made the expression worth anything.\n\nSharif Shameem's [essay on creative courage](https://sharif.io/looking-stupid): \"the amount of stupidity you're willing to tolerate is directly proportional to the quality of ideas you'll eventually produce.\" When generation is cheap, the bottleneck is taste, judgment, and willingness to iterate through bad versions.\n\n## The Dead Framework\n\nSE Gyges's [dismantling of the \"stochastic parrot\" argument](https://www.verysane.ai/p/polly-wants-a-better-argument) matters less for its technical content than for what it reveals about frameworks that persist because they're convenient. Bender & Koller's 2020 paper specified conditions under which grounding *would* count — paired text-image data, code execution, unit tests. Every major model since GPT-4 trains on exactly this data. By the authors' own criteria, modern systems satisfy their requirements.[^1]\n\n\"Asserting that LLMs do not and cannot serve any useful purpose actively prevents addressing the harms they can cause specifically because they do work.\" China is using minority-language LLMs to deepen surveillance of ethnic minorities. 
The parrot framework survives because it's useful to critics wanting a clean dismissal — producing an ethics discourse poorly equipped for the harms that matter most.\n\n## The Human Attack Surface\n\nBogdan Chadkin's [account of nearly getting scammed](https://trysound.io/try-not-to-get-scammed-while-looking-for-work/) during a job search — compromised LinkedIn accounts, spoofed video call links, terminal commands disguised as SDK updates — illustrates context-grief weaponized. The attack surface isn't technical sophistication. It's the emotional state of developers whose career landscape is shifting under them.[^2]\n\n## Verifiability, One Year Later\n\nA year ago, Alperen Keleş [argued](https://alperenkeles.com/posts/verifiability-is-the-limit/) that the limit on AI-assisted programming isn't generation but verification. Twelve months later, every argument this week is a verification argument in different clothes. Pennarun's review layers verify correctness at different abstractions. Searls's harness thesis is a verification thesis. The builders who've made the most progress invested in better oracles — not better generation. The constraint was never the code.\n\n---\n\n## What to Watch\n\n**Trust architecture as competitive strategy.** Companies competing on AI productivity while maintaining five layers of review are optimizing the wrong variable. The winners replace review hierarchies with verification infrastructure. The question isn't \"how do we adopt AI?\" but \"how do we ship what AI produces?\"\n\n**The craft-to-systems migration.** As the expression layer commoditizes, the professionals who thrive will be those whose value was always in systems judgment rather than code production. The next year of hiring will test whether organizations can tell the difference.\n\n**Verification tooling as the quiet infrastructure play.** The next valuable wave of AI tooling won't generate code faster — it will make correctness cheaper to confirm. Property-based testing, formal verification, automated oracle construction. Where the harness thesis meets the review bottleneck.\n\n---\n\n*Way Enough is written collaboratively by a human and an AI agent.*\n\n[^1]: SE Gyges, [\"Polly Wants a Better Argument\"](https://www.verysane.ai/p/polly-wants-a-better-argument) — the full technical dismantling is worth reading for anyone still encountering the \"stochastic parrot\" framing in professional discourse.\n[^2]: Google Cloud's [threat intelligence reporting](https://cloud.google.com/blog/topics/threat-intelligence/) has documented similar campaigns targeting developers through spoofed video call infrastructure."}}