Friday, 24 April 2026

Be Claude's PM, Not Its Proofreader

There's a strain of AI discourse that treats "vibe coding" as synonymous with letting a model write your code. It isn't. Eric, a researcher at Anthropic and co-author of Building Effective Agents, draws the line where Andrej Karpathy drew it: you're only vibe coding when you forget the code even exists. Working in Cursor or Copilot while reading every suggestion doesn't qualify. Most of what senior engineers currently do with AI doesn't qualify. That's the whole problem.

The reason it's a problem is arithmetic. The length of task AI can complete end-to-end is doubling roughly every seven months. Today that's about an hour. At that rate it reaches a full workday in under two years and a full workweek in roughly three. If your workflow assumes you will personally review every line of code the model produces, you are building a career on the losing side of an exponential. Something will have to give, and it won't be the exponential.
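To see why the timeline matters, here is a minimal Python sketch of the doubling arithmetic. It takes the paragraph's two premises at face value (a one-hour horizon today, doubling every seven months) and assumes an 8-hour workday and a 40-hour workweek. The function names are illustrative, not from the source.

```python
import math

# Premises from the paragraph above: roughly a 1-hour task horizon today,
# doubling roughly every seven months. Both are trend figures, not laws.
START_HOURS = 1.0
DOUBLING_MONTHS = 7.0


def months_until(target_hours: float,
                 start_hours: float = START_HOURS,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the horizon reaches target_hours, if the trend holds."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months


def horizon_after(months: float,
                  start_hours: float = START_HOURS,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task horizon in hours after `months`, if the trend holds."""
    return start_hours * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Assumes an 8-hour workday and a 40-hour workweek.
    for label, hours in [("workday", 8), ("workweek", 40)]:
        print(f"{label}: ~{months_until(hours):.0f} months")
    print(f"after 12 months: ~{horizon_after(12):.1f} hours")
```

Under those assumptions it reports about 21 months to a workday, about 37 to a workweek, and a horizon of roughly 3.3 hours after twelve months. The exact dates move with the premises; the shape of the curve does not.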

So the question is not whether to vibe code in prod. The question is how to do it without shipping garbage.

Eric's answer borrows from every manager who has ever existed. A CTO green-lights code they can't read. A PM accepts features they couldn't have built. A CEO signs off on financial models they couldn't reconstruct. These people are not incompetent; they've found abstraction layers they can verify without reading the implementation. Acceptance tests. User flows. Spot-checks on load-bearing numbers. Engineers are the last white-collar profession that still prides itself on understanding the full stack down to the metal. That pride is about to become expensive.

The compiler analogy is the one to sit with. In the early days of compilers, developers read the generated assembly to make sure it looked right. At some point the systems got big enough that nobody bothered. The code didn't become less important; the abstraction just became trustworthy enough that reading underneath it stopped being a good use of time. Application code is heading to the same place.

Three rules make the transition survivable.

Rule one: vibe code the leaves, not the trunk. Every codebase has leaf nodes: features nothing else depends on, the bells and whistles that won't be extended or composed. Tech debt in a leaf node is contained. Tech debt in your core architecture compounds forever. Human review stays mandatory on the trunk. Leaves can be trusted to Claude. The one class of problem today's models genuinely can't validate (is this extensible? is this clean?) doesn't matter when nothing depends on the code.
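Rule one assumes you can tell leaves from trunk. Here is a minimal Python sketch of one way to do that, assuming a hypothetical import graph (the module names and the plugin setup are made up): count how many modules depend on each one, directly or transitively, and treat only zero-dependent modules as safe to vibe code. An import graph misses external consumers such as public APIs or other services, so a module can look like a leaf when it isn't; treat the output as a starting list, not a verdict.

```python
from collections import defaultdict

# Hypothetical graph: each module maps to the modules it imports.
# Plugins are assumed to be discovered at runtime, so nothing imports them.
DEPENDS_ON: dict[str, list[str]] = {
    "core.storage": [],
    "core.scheduler": ["core.storage"],
    "core.api": ["core.scheduler", "core.storage"],
    "plugins.export_csv": ["core.api"],
    "plugins.dark_mode": ["core.api"],
}


def reverse_edges(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each module to the modules that import it directly."""
    rev: dict[str, set[str]] = defaultdict(set)
    for module, deps in graph.items():
        for dep in deps:
            rev[dep].add(module)
    return rev


def blast_radius(graph: dict[str, list[str]], module: str) -> int:
    """Count modules that depend on `module`, directly or transitively."""
    rev = reverse_edges(graph)
    seen: set[str] = set()
    stack = [module]
    while stack:
        for parent in rev.get(stack.pop(), set()):
            if parent not in seen:  # `seen` also guards against import cycles
                seen.add(parent)
                stack.append(parent)
    return len(seen)


def review_policy(graph: dict[str, list[str]]) -> dict[str, str]:
    """Leaves (radius 0) may be vibe coded; everything else gets human review."""
    return {
        m: "vibe-code ok" if blast_radius(graph, m) == 0 else "human review"
        for m in sorted(graph)
    }


if __name__ == "__main__":
    for module, policy in review_policy(DEPENDS_ON).items():
        radius = blast_radius(DEPENDS_ON, module)
        print(f"{module:20} radius={radius}  {policy}")
```

Here `core.storage` has four dependents and `core.api` has two, so a change to either ripples outward, while the two plugins have none. The count is a rough proxy for how far tech debt in a module can compound.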

Rule two: be Claude's PM. Ask not what Claude can do for you; ask what you can do for Claude. When Eric ships features with Claude he spends fifteen to twenty minutes collecting context into a single prompt, often through a separate planning conversation where Claude explores the codebase, surfaces the relevant files, and agrees on a plan. Only then does he hand the artifact to a fresh session and let it cook. The quick back-and-forth "fix this bug" loop is how you get mediocre code. A junior engineer on day one would fail the same prompt. Treat the model the way you'd treat that new hire: give it the tour, the constraints, the examples, the "here's how we do things."
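One way to make the "single prompt" concrete is a small checklist object. This is a minimal Python sketch under stated assumptions: `TaskBrief` and its field names are hypothetical, chosen to mirror the tour, constraints, examples, and plan described above, and the example values are made up. The source describes the practice, not a format, and nothing here calls a model API. `missing()` flags empty sections before handoff; `render()` produces the text you would paste into a fresh session.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for everything a day-one hire would need."""

    goal: str
    relevant_files: dict[str, str] = field(default_factory=dict)  # path -> why it matters
    constraints: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)  # "here's how we do things"
    plan: list[str] = field(default_factory=list)  # agreed in the planning session
    done_when: list[str] = field(default_factory=list)  # checkable from the outside

    def missing(self) -> list[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "relevant_files": self.relevant_files,
            "constraints": self.constraints,
            "examples": self.examples,
            "plan": self.plan,
            "done_when": self.done_when,
        }
        return [name for name, value in sections.items() if not value]

    def render(self) -> str:
        """Flatten the brief into one prompt for a fresh session."""
        parts = [f"## Goal\n{self.goal}"]
        if self.relevant_files:
            files = "\n".join(f"- {p}: {why}" for p, why in self.relevant_files.items())
            parts.append(f"## Relevant files\n{files}")
        for title, items in [
            ("Constraints", self.constraints),
            ("House style examples", self.examples),
            ("Plan", self.plan),
            ("Done when", self.done_when),
        ]:
            if items:
                bullets = "\n".join(f"- {item}" for item in items)
                parts.append(f"## {title}\n{bullets}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    brief = TaskBrief(
        goal="Add CSV export to the reports page.",
        relevant_files={"reports/views.py": "renders the page"},
        constraints=["No new third-party dependencies."],
        plan=["Add an export button.", "Write rows with the csv module."],
    )
    print("still missing:", brief.missing())  # examples and done_when
    print(brief.render())
```

The check matters more than the formatting: an empty "done when" section is the same gap that would trip up a new hire.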

Rule three: design for verifiability before you write the code. Anthropic recently merged a 22,000-line change to their production reinforcement learning codebase, written heavily by Claude. This was not a prompt-and-pray operation. Days of human work went into requirements and guidance. The change concentrated in leaf nodes. The extensible pieces got full human review. The team designed stress tests for stability and built the system with human-verifiable inputs and outputs, checkpoints that prove correctness without needing to read every line. That's the template. If you can't describe what "correct" looks like from the outside, you can't vibe code the inside.
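Here is a minimal sketch of "correct from the outside", assuming a hypothetical model-written helper `merge_intervals` over half-open integer intervals. The checker never inspects the implementation: it states three properties of a correct answer (same coverage, no empty intervals, sorted and strictly separated) and runs the function against seeded random inputs. This is ordinary randomized property testing, swapped in for illustration; the source does not say what Anthropic's stress tests looked like.

```python
import random

Interval = tuple[int, int]  # half-open [start, end) with integer endpoints


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Stand-in for model-written code we choose not to read line by line."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if start >= end:
            continue  # an empty interval covers nothing
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals: list[Interval]) -> set[int]:
    """Every integer point the intervals cover."""
    return {x for s, e in intervals for x in range(s, e)}


def violations(inp: list[Interval], out: list[Interval]) -> list[str]:
    """Check the output from the outside; never looks at the implementation."""
    problems = []
    if covered(out) != covered(inp):
        problems.append("coverage changed")
    if any(s >= e for s, e in out):
        problems.append("empty interval in output")
    if any(a[1] >= b[0] for a, b in zip(out, out[1:])):
        problems.append("output not sorted and strictly separated")
    return problems


def stress(impl, trials: int = 2000, seed: int = 0):
    """Run seeded random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = []
        for _ in range(rng.randint(0, 8)):
            start = rng.randint(0, 30)
            inp.append((start, start + rng.randint(0, 6)))
        problems = violations(inp, impl(inp))
        if problems:
            return inp, problems
    return None


if __name__ == "__main__":
    print("merge_intervals:", stress(merge_intervals))  # None means no counterexample
    print("plain sort:", stress(sorted))  # a deliberately wrong stand-in
```

For integer endpoints the three properties pin down exactly one valid output for any input, so a clean run is real evidence, and a failure comes back as a concrete counterexample a human can read in seconds.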

The payoff isn't just saved hours. It's a lower marginal cost of software. When a feature costs one day instead of two weeks, you start shipping features you would never have started. You attempt system rewrites you would have dismissed as "not worth it." The cost curve reshapes what's worth doing at all. And that is where the real leverage lives.

Two caveats worth holding.

First, vibe coding in prod is not for the fully non-technical. Being Claude's PM means knowing enough about the system to ask the right questions and catch the wrong answer. The press coverage of leaked API keys and exposed databases describes a real failure mode — people who had no business running production systems were running production systems. The answer is not to ban vibe coding. The answer is to know what you're doing.

Second, today's caveat about tech debt will keep shrinking. Claude 4 models, even in their first weeks inside Anthropic, earned trust that Claude 3.7 didn't. More of the stack will move inside the "safe to vibe code" bubble every quarter. The leaves will spread.

The uncomfortable framing: in a year or two, if your process still requires you to personally read every line of code, you are going to become the bottleneck on your own team. The models will happily produce a week's worth of work in an afternoon. The question is whether you've built the muscle — context, leaf-node discipline, verifiable design — to absorb that output, or whether you're still proofreading assembly while the rest of the industry ships.

Source: Master Coding Agents Like a Pro (Anthropic's Ultimate Playbook), Eric, Anthropic

 
