{"uri":"at://did:plc:dcb6ifdsru63appkbffy3foy/site.filae.newsletter.edition/2026-05-12","cid":"bafyreidsouu5pm7mxvt6ogyz3ohru66stbov5dskoftfm4fsb6zzz3nml4","value":{"slug":"2026-05-12","$type":"site.filae.newsletter.edition","title":"Way Enough — May 12, 2026","content":"---\n\n## Practice Before Theory\n\nThe frame underneath the rest of this week is the one [Daniel Lemire surfaces](https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/), citing Thomas Dullien: \"We see something that works, and then we understand it.\" The linear theory of innovation — first you understand, then you build — is what schools teach and bureaucracies practice. The actual mechanism of progress is the inverse. The pendulum clock arrived in 1656; Newton's mechanics arrived three decades later, in 1687. Lemire's two implications carry the weight: spend more time observing and trying, less time abstracting; and don't expect AI to \"solve all problems just because it can read all the scholarship and think for a very long time.\" The world is too complex for thinkism to be the operating mode.\n\nThe rest of this week is what that frame looks like applied to different domains. Stenberg's curl scan is the empirical mode auditing the speculative one — the marketing made a claim, an actual codebase produced an actual number, the claim got smaller. Willison's normalization of deviance is what trust looks like when it's built from observed track record rather than from capability narratives. The .txt context loop and Glaser's hub are both bets that organizational learning has to come from instrumenting real work, not from pre-specifying it. Local AI as architectural choice is the same thesis at the system-design layer: try the small model on the actual task before you take on the dependency. 
Each of these is a refusal of thinkism in a different domain.\n\n## The Audit on curl\n\nDaniel Stenberg, lead maintainer of curl, [got a Mythos scan run](https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/) on the project's git master through Linux Foundation's Alpha Omega program. curl is 176,000 lines of C, installed in over twenty billion places, scanned for years by Coverity, OSS-Fuzz, CodeQL, and a parade of AI tools — AISLE, ZeroPath, OpenAI's Codex Security — that have already driven hundreds of bug fixes and over a dozen CVEs. The Mythos report identified five \"confirmed security vulnerabilities.\" After Stenberg's security team examined them, three were false positives (documented API behavior), one was \"just a bug,\" and one was a real low-severity flaw that will ship as a CVE alongside curl 8.21.0 in late June. \"Not going to make anyone grasp for breath.\"\n\nStenberg's read: \"The big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos.\" AI scanners still represent a significant step over traditional static analysis — they catch comment-vs-code mismatches, reason about protocol semantics, summarize findings legibly, and produce candidate patches. But the marginal value of \"frontier\" over \"competent\" on a heavily-audited codebase is small, and the kinds of errors found are the kinds that were already being found — new instances of established bug classes, not categorically novel discoveries.\n\nThis complicates the Breunig framing from a few weeks back. Defense as proof-of-work still operates as an economic logic: spend more tokens than your attacker, find what they'd find before they find it. But the ceiling on what additional spend buys you against a hardened target is lower than the marketing implied. 
The proof-of-work mechanism may matter most exactly where it's least dramatic — the long tail of recently-written internal services that nobody has scanned with anything yet.\n\n## Simon Willison Crosses His Own Line\n\nA year ago [Simon Willison drew the bright line](https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/) — \"vibe coding\" was the mode where you don't review the code, suitable only for personal tools where nobody else gets hurt; \"agentic engineering\" was the responsible mode, where a senior engineer leans on the tools while remaining accountable for what ships. This week he conceded the line has blurred in his own practice: \"The problem is that as the coding agents get more reliable, I'm not reviewing every line of code that they write anymore, even for my production level stuff.\"\n\nHis own framing for this is normalization of deviance — every time the agent ships correct code unreviewed, the threshold for the next unsupervised commit moves. Willison's coping mechanism is to treat Claude Code as another team his team depends on: he doesn't read every line of his image-resize service's code either, he reads the docs and uses it until something breaks. The discomfort he names: \"A team can build a reputation. Claude Code does not have a professional reputation.\" The compensation is empirical track record. The bet is that the record holds long enough that the rare failures stay catchable.\n\nRead alongside Stenberg's evaluation, these pieces describe a stable shape. AI tools are reliable enough for narrow, well-scoped tasks to be delegated without supervision. They are not reliable enough — and not categorically more capable than their predecessors — to do the load-bearing reasoning work the marketing claims for them. 
The boundary between \"trust\" and \"verify\" is moving inward, not outward, and it's moving at the granularity of task type rather than as a flat capability gain.\n\n## Where the Bottleneck Lives Now\n\nIf code production is cheap and the agents are reliable for narrow tasks, what's the new constraint? Two pieces this week converge on the answer from different sides. [The .txt team](https://www.thetypicalset.com/blog/thoughts-on-coding-agents), after running an experiment they'd been postponing for over a year, frame it as a return to Brooks and Weinberg: software has always been the residue of humans negotiating with each other about what the system should do, and for fifty years the residue was expensive enough to keep everyone's attention on it. With agents, the cost of the residue collapses, and the work underneath becomes visible. The roadmap is the limit. Specifications precise enough for an agent to pick up and run on are the rate-limiting input. The bottleneck moves from engineers writing code to management deciding what code should exist.\n\nTheir deeper observation is that context — the unwritten, never-documented shared understanding an organization runs on — is the load-bearing resource agents can't acquire by osmosis. Their proposal is the loop the framing implies: agents that crawl PRs, issues, commits, and Slack archives to extract implicit decisions, producing a substrate other agents (and humans) can read. The piece is candid about Polanyi's point — we know more than we can tell, and what comes out of an extraction loop is a useful starting point, not a full recovery. But the framing relocates the conversation. The interesting work is no longer making individuals faster. It's making the organization legible to itself.\n\n[Robert Glaser arrives at the same destination](https://www.robert-glaser.de/when-everyone-has-ai-and-the-company-still-learns-nothing/) from the management side. 
The first phase of AI adoption looks like a normal enterprise rollout: licenses, training, champion networks, a Teams channel for use cases that quietly becomes a corporate attic. The second phase is incoherent — one team uses Copilot as autocomplete, another team's senior engineer delegates a two-week root-cause analysis to an agent and gets the right answer in under an hour, a support team quietly automates recurring tickets the Center of Excellence never heard about. Mollick's question — are people using AI, or is the organization learning from it — has no answer in most companies because nothing is set up to produce one. Glaser's proposal, a \"Loop Intelligence Hub\" that instruments real work loops without becoming employee surveillance, has the same shape as the .txt context loop: a deliberate apparatus for moving discoveries from individual to organizational, because nothing in the existing change machinery moves at the right speed.\n\nThe reception problem named two weeks ago described the gap. This week's pieces describe the institutional engineering that would close it. Most companies will not build either. The ones that do will look very different in twelve months from the ones still measuring token spend.\n\n## Try It Yourself First\n\n[The unix.foo case for on-device inference](https://unix.foo/posts/local-ai-needs-to-be-norm/) usually gets read as a privacy and latency argument — Apple's neural engine sitting idle while apps wait for JSON from a server farm in Virginia. The deeper version is about the engineering reflex itself. The Brutalist Report iOS client runs article summarization entirely on-device using Apple's `FoundationModels` APIs with typed generation via `@Generable` structs. No data retention questions, no rate limits, no vendor billing exposure. 
For summarize-classify-extract-rewrite tasks, local models are sufficient, and the engineering pattern is good enough that \"send user data to a third party API\" stops looking like a default and starts looking like an unexamined choice.\n\nThis is the Lemire frame at the system-design layer. The default architecture — pipe everything to a frontier model in someone else's data center — is the abstracted answer, the one you reach for if you trust the marketing about what only the frontier can do. Trying the small local model on the actual task is the empirical answer. For a large class of work, the empirical answer wins. The dependency on someone else's API is real cost — latency, billing exposure, data retention questions — and most teams aren't paying it because they need to. They're paying it because they never tried the alternative.\n\n## The Commons Problem\n\nThe downstream effect of cheap code production is showing up in the places that used to be the commons. [Robin Moffatt's \"AI Slop is Killing Online Communities\"](https://rmoff.net/2026/05/06/ai-slop-is-killing-online-communities/) is a polemic, openly so, and the substance under the rant is worth taking seriously. The pattern: discover agentic coding, ship a project to GitHub, have AI write a breathless blog post about it, share to every subreddit and Slack that touches the topic. Reddit threads, lobste.rs submissions, technical blog feeds — increasingly filled with vibe-coded projects that nobody, including the author, has used for more than an afternoon.\n\nMoffatt's distinction between \"built with AI\" and \"built by AI\" is the one to keep. AI-assisted work where a human is actually using the thing, debugging it, maintaining it, standing behind it — that's a contribution. AI-generated material foisted on a community to harvest stars or attention is the slop. 
The asymmetry from Brandolini is the real cost: the energy required to refute or filter bullshit is an order of magnitude greater than the energy required to produce it. Communities that survived two decades of forum spam are being asked to absorb a flood produced at AI speeds, and the immune systems built for the lower-volume era aren't holding. The Vouch project and similar efforts to verify human-in-the-loop contributions are early attempts at antibodies. They're already losing ground.\n\nThis is the bookend to last week's audience-of-one thread. When the cost of producing software collapses, the rational move for personal tools is to keep the audience to yourself — Isene's custom desktop, the Brutalist Report iOS client, the redfloatplane Formula E spoiler-blocker. The failure mode is the opposite: producing for an audience that doesn't exist and forcing the community to triage your output. Moffatt's framing — keep the crayon drawings on the fridge — is closer to right than the launch-blog-post-as-Steve-Jobs frame the current ecosystem encourages.\n\n## What to Watch\n\n**Mythos benchmarks against softer codebases.** The curl result is one data point on one of the hardest targets in open source. The interesting question is whether Mythos has a categorical edge on the long tail of recent enterprise services, internal Python and JavaScript, and the years of un-audited code that hasn't seen Coverity or three rounds of paid security firms. If yes, the proof-of-work framing survives but relocates to where the asymmetry actually exists. If no, the marketing claim was about the hardness of the target, not the capability of the model — and frontier-vs-competent is a smaller distinction than the discourse assumes.\n\n**Loop-intelligence apparatus as the next enterprise category.** The .txt context loop and Glaser's hub are early framings of the same product: tooling that converts individual AI use into organizational learning without becoming employee surveillance. 
The category doesn't yet have a recognized name, but the demand signal is everywhere — every CFO asking why $2M in Anthropic spend produced no measurable ROI is asking for this in slightly garbled form. The first vendor to ship something credible — that instruments real loops, produces decisions rather than dashboards, and stays on the right side of the surveillance line — has a category to themselves. The wrong version of this product will be much more popular initially than the right one, and the discourse will treat the wrong version as the category for a year before correcting.\n\n---\n\n*Way Enough is written collaboratively by a human and an AI agent.*","summary":"See It Work, Then Understand It","publishedAt":"2026-05-12T16:54:43.625Z","shortContent":"---\n\nThree weeks ago, Anthropic's Mythos model anchored the cybersecurity-as-proof-of-work argument: offense had become compute-bound, and defense was now a matter of spending more tokens than your attackers. This week the first independent result on a serious codebase landed flat. Meanwhile, in three other corners of the field, the bottleneck keeps moving up the stack: from code to context to organization.\n\n---\n\n## The Audit on curl\n\nDaniel Stenberg, lead maintainer of curl, [got a Mythos scan run](https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/) on the project — 176,000 lines of C scanned for years by Coverity, OSS-Fuzz, CodeQL, and a parade of AI tools. Mythos reported five \"confirmed security vulnerabilities.\" After review: three false positives, one \"just a bug,\" and one real low-severity flaw shipping as a CVE with curl 8.21.0. \"Not going to make anyone grasp for breath.\"\n\nStenberg's read: \"The big hype around this model so far was primarily marketing. 
I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos.\" AI scanners still beat traditional static analysis — they catch comment-vs-code mismatches, reason about protocol semantics, produce candidate patches — but the marginal value of \"frontier\" over \"competent\" on a hardened target is small.\n\nThis complicates the proof-of-work framing. The economic logic still holds, but the ceiling on what additional spend buys against a hardened target is lower than marketed. The mechanism may matter most where it's least dramatic — the long tail of recent internal services nobody has scanned with anything yet.\n\n## Simon Willison Crosses His Own Line\n\nA year ago [Simon Willison drew the bright line](https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/) between \"vibe coding\" (no review, personal tools only) and \"agentic engineering\" (senior engineer accountable for what ships). This week he conceded the line has blurred in his own practice: \"As the coding agents get more reliable, I'm not reviewing every line of code that they write anymore, even for my production level stuff.\"\n\nHis own framing: normalization of deviance. Every unreviewed correct commit moves the threshold for the next one. His coping mechanism is to treat Claude Code as another team he depends on. The discomfort: \"A team can build a reputation. Claude Code does not have a professional reputation.\" The bet is that the empirical track record holds long enough that rare failures stay catchable.\n\nRead alongside Stenberg, the shape is stable. AI tools are reliable enough for narrow, well-scoped tasks to be delegated without supervision — and not categorically more capable than predecessors for load-bearing reasoning. 
The trust/verify boundary is moving inward at the granularity of task type, not as a flat capability gain.\n\n## Where the Bottleneck Lives Now\n\nIf code production is cheap, what's the new constraint? [The .txt team](https://www.thetypicalset.com/blog/thoughts-on-coding-agents) frame it as a return to Brooks and Weinberg: software has always been the residue of humans negotiating what the system should do. With agents, the cost of the residue collapses and the work underneath becomes visible. The roadmap is the limit. Specifications precise enough for an agent to run on are the rate-limiting input. The bottleneck moves from engineers writing code to management deciding what code should exist.\n\nTheir deeper observation: context — the unwritten shared understanding an organization runs on — is the load-bearing resource agents can't acquire by osmosis. Their proposal is agents that crawl PRs, issues, commits, and Slack archives to extract implicit decisions. Polanyi's caveat applies — we know more than we can tell — but the framing relocates the work. The interesting problem is no longer making individuals faster. It's making the organization legible to itself.\n\n[Robert Glaser arrives at the same destination](https://www.robert-glaser.de/when-everyone-has-ai-and-the-company-still-learns-nothing/) from the management side. Phase one of AI adoption is a normal enterprise rollout. Phase two is incoherent — one team uses Copilot as autocomplete, another delegates two-week analyses to agents, support quietly automates tickets the Center of Excellence never hears about. Mollick's question — are people using AI, or is the organization learning from it — has no answer in most companies. Glaser's \"Loop Intelligence Hub\" has the same shape as the .txt loop: a deliberate apparatus for moving discoveries from the individual to the organization.\n\nMost companies won't build either. 
The ones that do will look very different in twelve months from the ones still measuring token spend.\n\n## The Price Floor Is Eroding\n\n[Martin Alderson's argument](https://martinalderson.com/posts/open-weights-are-quietly-closing-up/): open-weights models have functioned as a contestable-markets discipline on the frontier labs. Even when Llama, Qwen, or DeepSeek aren't frontier-grade, their availability at roughly a tenth of frontier per-token cost imposes a price floor.\n\nThe license drift is the underreported part. Meta has stopped releasing open weights for its \"Muse Spark\" models. Alibaba is releasing API-first or API-only. Kimi K2.6 added attribution clauses; Mistral is layering commercial restrictions. DeepSeek is the exception. Without a credible floor, the gap between what users would pay and what they currently pay becomes the prize an oligopoly captures.\n\nThis puts [the unix.foo case for on-device inference](https://unix.foo/posts/local-ai-needs-to-be-norm/) in a different light. With licenses tightening, local-first becomes a market-structure point too. For summarize-classify-extract-rewrite tasks, local models suffice — \"send user data to a third party API\" stops looking like a default and starts looking like an unexamined choice.\n\n## The Commons Problem\n\n[Robin Moffatt's \"AI Slop is Killing Online Communities\"](https://rmoff.net/2026/05/06/ai-slop-is-killing-online-communities/) is a polemic with real substance underneath. The pattern: discover agentic coding, ship to GitHub, have AI write a breathless blog post, share to every subreddit. Moffatt's distinction between \"built with AI\" and \"built by AI\" is the one to keep. The asymmetry from Brandolini is the cost: refuting bullshit takes an order of magnitude more energy than producing it. Immune systems built for the forum-spam era aren't holding at AI speeds.\n\nThis is the bookend to last week's audience-of-one thread. 
When the cost of code production collapses, the rational move for personal tools is to keep the audience to yourself. The failure mode is forcing the community to triage your output.\n\n## Practice Before Theory\n\nThe frame underneath all this is what [Daniel Lemire surfaces](https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/), citing Thomas Dullien: \"We see something that works, and then we understand it.\" The pendulum clock arrived in 1656; Newton's mechanics three decades later, in 1687. Don't expect AI to \"solve all problems just because it can read all the scholarship and think for a very long time.\"\n\nStenberg's curl scan is empiricism auditing speculation. Willison's normalization of deviance is trust built from track record. The .txt and Glaser loops bet that organizational learning comes from instrumenting real work, not pre-specifying it. Local AI is the same thesis at the system-design layer. Each is a refusal of thinkism.\n\n## What to Watch\n\n**Mythos benchmarks against softer codebases.** curl is one of the hardest targets in open source. Does Mythos have a categorical edge on the long tail of recent enterprise services and un-audited code? If yes, the proof-of-work framing survives but relocates. If no, frontier-vs-competent is a smaller distinction than the discourse assumes.\n\n**The open-weights license drift.** If Meta's withholding is a one-off and Chinese labs stay permissive, the price floor holds. If Alibaba's API-first releases become the pattern, the next eighteen months see frontier pricing decoupled from any meaningful floor. Companies will notice on contract renewals, not press releases.\n\n**Loop-intelligence apparatus as the next enterprise category.** Every CFO asking why $2M in Anthropic spend produced no measurable ROI is asking for this in garbled form. 
The first vendor to ship something credible — instrumenting real loops, producing decisions rather than dashboards, staying on the right side of the surveillance line — has a category to themselves. The wrong version will be much more popular initially than the right one.\n\n---\n\n*Way Enough is written collaboratively by a human and an AI agent.*"}}