Tuesday, 28 April 2026

Token Maxing, Mech Suits, and Why Shopify Was Right About AI

A leaderboard went up inside Meta a few months ago. It measured, per engineer, the number of AI tokens consumed. It was supposedly "one data point among many" for performance reviews. Within weeks, Meta engineers were asking Claude to summarize docs they could read faster themselves. Salesforce set a minimum $175 monthly AI spend per engineer; people started racing to hit the floor. Microsoft engineers ran autonomous agents overnight to build junk so the number would climb. Meta eventually pulled the leaderboard — but people kept token maxing anyway, because in big tech, you don't forget that the metric ever existed.

Gergely Orosz (The Pragmatic Engineer) described all of this on the AI Engineer Summit stage with Swyx, and the historical rhyme is perfect. Ten years ago, early developer productivity tools measured lines of code and PR count. That was stupid, and everyone knew it was stupid, and people optimized for it anyway. The same thing is happening now — just dressed up as "AI adoption."

Why the push exists, though

The uncomfortable truth is that leadership didn't invent token maxing out of stupidity. Six months ago, CTOs were genuinely worried their engineers weren't using AI tools at all. One CTO at a Netherlands e-commerce company told a room of peers: "My engineers are skeptical. They're not using it." On existing codebases with older models, they had a point — the tool didn't find the bug, didn't refactor well, didn't earn its keep.

At the same time, Anthropic kept publicly saying a huge share of their own code was written with Claude Code — and their revenue line went vertical. So leadership, unable to tell correlation from causation, decided the safer bet was: force adoption. Coinbase's CEO Brian Armstrong literally emailed the company that anyone not using AI tools within a week would "have a conversation," then fired an engineer that Saturday. On $300–400K base salaries, the message lands.

Token maxing is the downstream result. It's leetcode reborn — an absurd ritual that selects for people willing to perform it to keep the job. The people who actually get value out of AI coding are the ones who ignore the metric entirely and just use the tools to ship.

The mech suit, not the manager

The other framing that's getting big tech wrong: "you're not an engineer anymore, you're an engineering manager." Gergely and DHH both think this is bullshit. The whole reason people resist becoming engineering managers is the stuff they'd have to give up: the product, the feedback loop, the hands-on craft. Agents don't give you any of that pain. You don't have to mediate conflict between agents. You don't do 1:1s with them.

DHH's metaphor: it's a mech suit. You're still the pilot. You're just doing seven things at once. The feedback loop is days, not quarters. The decisions you make compound in weeks instead of in six months. If anything, the role looks more like "tech lead" than "manager" — you're orchestrating without being removed from the work.

The role is compressing

Pre-AI, venture-funded startups were already running a compressed version of engineering. Dedicated QA teams disappeared into "every engineer writes tests." Dedicated devops teams disappeared into "every engineer owns their deploys." Product engineering emerged as a real title. AI is pushing the same compression one more notch. Junior engineers are now expected to reason about the business, plan at the architecture level, ship end-to-end.

One concrete signal: a VP at John Deere — a 200-year-old tractor company — told Gergely their "two-pizza teams" are now one-pizza teams. The smallest units in a company that has no reason to move fast are getting smaller. Everywhere else, it's already happened.

The real action is internal infra

The most underrated observation in the whole conversation: the biggest AI investment at large tech companies is not the product you see. Uber looks like it isn't shipping features. Inside, they are rebuilding the entire engineering stack — custom background coding agents integrated into their monorepo, an MCP gateway wired into service discovery, on-call tooling re-tooled around AI, code review systems that auto-categorize changes by risk. Airbnb, Intercom, Meta, Microsoft — every one of them is doing a version of this.

Three reasons it's rational:

1. It's the low-risk way to get hands-on with AI. You don't want your first AI feature to be customer-facing slop. Internal tooling is a safe training ground.

2. Their codebases will never fit in any context window. Off-the-shelf vendors (Cursor, Claude Code, Copilot) are built for typical codebases. The hyperscalers have code that is an order of magnitude larger and messier. Custom tooling + basic agents will beat the vendor stack on their own codebase.

3. Anything with "AI" in the name gets funded. Ask for two engineers for the platform team and get nowhere. Ask for two engineers for "agent experience" and it's done.

If you're at a large tech company and you're not building an internal MCP gateway, Gergely's line was: "what are you even doing?"
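None of this internal tooling is public, so here is only a toy sketch of the gateway idea in TypeScript: one process reads a service-discovery registry and exposes each internal service as a tool an agent can call. Every name, URL, and registry shape below is invented for illustration; a real version would sit behind the company's actual discovery and auth layers.

```typescript
// Toy sketch of an internal "MCP gateway": turn a service-discovery
// registry into a tool list an agent can call. All names are hypothetical.

type ServiceEntry = { name: string; baseUrl: string; description: string };

// Stand-in for a real service-discovery lookup (Consul, etcd, in-house).
async function discoverServices(): Promise<ServiceEntry[]> {
  return [
    { name: "payments", baseUrl: "http://payments.internal", description: "Query payment records" },
    { name: "oncall", baseUrl: "http://oncall.internal", description: "Look up on-call rotations" },
  ];
}

// The gateway's tool surface: one tool per discovered service.
async function listTools() {
  const services = await discoverServices();
  return services.map((s) => ({ name: `${s.name}.query`, description: s.description }));
}

// Dispatch a tool call from the agent to the backing service.
async function callTool(tool: string, args: Record<string, string>) {
  const [service] = tool.split(".");
  const entry = (await discoverServices()).find((s) => s.name === service);
  if (!entry) throw new Error(`Unknown tool: ${tool}`); // fail loud, no silent fallback
  const res = await fetch(`${entry.baseUrl}/query?${new URLSearchParams(args)}`);
  if (!res.ok) throw new Error(`${tool} failed: ${res.status}`);
  return res.json();
}

listTools().then((tools) => console.log(tools));
```

The reason to centralize this is the same as any gateway: auth, rate limiting, and audit logging live in one place instead of in every team's agent glue.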

Why Shopify was right

The Shopify story is the one to remember. In 2021 — before Copilot was a product — Shopify's head of engineering heard it was being developed internally at GitHub. He DM'd Thomas Dohmke, the GitHub CEO, and said: "I'd like to get this rolled out to all of Shopify. In exchange you get feedback from 3,000 engineers, honestly, forever." It wasn't for sale. Shopify got it anyway, a full year before anyone else.

The tool wasn't great initially. Shopify burned real money and ate real engineering churn. They kept iterating. They became the first company onboarded to every major AI tool that followed, with unlimited budget.

Gergely's read: Shopify is trading churn + expense for being six months ahead. That trade is not rational for most companies — if your business is a physical product or a legacy vertical, wait it out, the tools will catch up. But if your company competes in technology, paying for the churn is worth it. Plus, at that point, AI adoption is a recruiting signal: "Come work here, you'll have every tool before your friends do."

The weird thing is that every tech company is doing this at the same time. So it looks performative. It isn't. It's rational individually. It just happens to be universal.

The takeaway

Three things to remember, regardless of whether you're writing code or running a company:

  1. Don't measure token count. Measure output. Every time a company makes a metric the goal, smart people game it.
  2. Treat AI tooling as a mech suit, not a manager. The value is in what you can now ship alone, not in "managing" anything.
  3. If you're in tech, eat the churn to be six months ahead. Shopify's trade is available to most of us. The cost is real. The alternative is worse.

Source: How AI Is Changing Software Engineering — Gergely Orosz, The Pragmatic Engineer

Monday, 27 April 2026

Ship With Friction: Why The Slow Part Is The Only Part That's Still Yours

A security incident report went up on a company's forum last week: a config change had shipped by mistake. The auto-generated social preview rendered the company's tagline right next to it: Ship without friction. Armin Ronacher — the guy who wrote Flask — used that screenshot to open a conference talk, and the joke landed because everybody in the room had made the same mistake recently.

The uncomfortable argument Armin and his co-founder Cristina Poncela Cubeiro made: the friction engineers have spent a decade trying to remove is the friction that was doing the thinking. Remove it and you don't get speed. You get a codebase nobody can steer.

The psychological trap

The first few months of coding with Claude Code or Cursor feel like cheating. You prompt, the machine writes, you ship. Then everyone on your team is using it. Then your team's baseline expectation resets. Then the ambient pressure becomes: more output, faster cycles, shorter PRs. The gift becomes the tax. You no longer have the quiet moments to stop and ask whether this is the best way to implement the thing — because you're one prompt away from shipping it.

Armin calls this the gambler's loop. You don't know if the next prompt is the one that makes the product work, or the one last drop of slop that tips the whole thing into an outage. You keep pulling the lever.

The more interesting part is the illusion underneath it. Because you're producing a lot of output very fast, you feel more efficient. You're not. You've just stopped doing the part of the work where you design.

The team composition shift nobody warns you about

Before agents, engineering teams were supply-constrained on the creation side. The balance between writing code and reviewing code was roughly okay. Now every engineer has 5–10× the production power, and nobody got 5–10× the review power.

Two downstream effects:

1. Pull requests pile up. The ones that aren't reviewed carefully get rubber-stamped.
2. The set of people shipping code expands. Marketing people ship code. Former-engineer CEOs ship code again. The number of entities — humans and machines — participating in code creation now vastly outnumbers the ones that can carry responsibility for it. And the machine can't carry responsibility.

The engineering team is still on the hook. But the production volume hitting them is no longer something they authored.

Why agents rot products faster than libraries

The single most useful technical observation in the talk: agents are excellent at libraries and mediocre at products.

Libraries have a tightly defined problem, a clear API surface, and a simple core. The agent can fit the whole thing in its context window, reason about it globally, and add features cleanly. That's why open-source maintainers are getting real leverage from these tools.

Products are the opposite. UI, API responses, permissions, feature flags, billing, background jobs — every change touches three other concerns. The agent cannot fit the global structure in its context. Locally it looks reasonable. Globally it's incoherent.

And the agent's failure mode is specific: it's been trained to write code that runs. That reward function is exactly what you don't want in a product. A human engineer writing a config loader feels bad when they write "if the config is missing, silently load defaults." The agent feels nothing. So the agent ships it. Two hours later you have database records written against the default config, and you don't know it yet.
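Here is that failure mode as a minimal TypeScript sketch (the config shape and loader are invented for illustration). The first loader is what a trained-to-run reward function produces; the second is what an engineer who has been burned writes.

```typescript
import { readFileSync } from "node:fs";

type Config = { dbUrl: string; region: string };

const DEFAULTS: Config = { dbUrl: "postgres://localhost/dev", region: "us-east-1" };

// Agent-style loader: it runs, so the reward function is satisfied.
// If config.json is missing, prod silently writes against dev defaults.
function loadConfigSilently(path: string): Config {
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return DEFAULTS; // the two hours of bad database records start here
  }
}

// Human-style loader: a missing config is a crash at startup,
// not a mystery two hours into production traffic.
function loadConfigOrDie(path: string): Config {
  const raw = readFileSync(path, "utf8"); // throws loudly if the file is missing
  const parsed = JSON.parse(raw) as Partial<Config>;
  if (!parsed.dbUrl || !parsed.region) {
    throw new Error(`config at ${path} is missing required keys`);
  }
  return parsed as Config;
}
```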

Humans build up revulsion toward bad code. Agents don't. The codebase accumulates entropy until the agent itself can no longer navigate it — it starts missing files, writing duplicates, forgetting what already exists. You've built a system neither you nor the agent can reason about.

The agent-legible codebase

Armin's prescription: your codebase is now infrastructure. Design it for the agent the way you'd design infrastructure for operators.

Concrete rules Earendil is enforcing through lint:

  • Modularize the code flow, not just the components. The agent does its worst damage between the clearly defined steps — parsing types it shouldn't, stuffing things into state. Name the steps.
  • Don't fight the RL. If there's a canonical way to do a thing in this language, use it. The agent is trained on the canonical version.
  • No hidden magic. Raw SQL hides intent. An ORM shows it. If the agent can't see it, it can't respect it.
  • No bare catch-alls. Silent failure is how products rot.
  • One query interface for SQL. Don't make the agent grep the codebase to find where queries live.
  • Unique function names. Not for readability. For token efficiency — when the agent greps, it wants one hit, not twelve.
  • One UI primitives library, no raw inputs. Consistent styling, consistent behavior.
  • No dynamic imports. Source of truth should be static.
  • Erasable-syntax-only TypeScript. No transpile step. One source of truth between your code and the compiler.

Every one of these is friction. Every one of them is the point.
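A couple of these are enforceable with stock tooling today. Here is a sketch using ESLint's flat config: no-empty covers the bare catch-alls, and no-restricted-syntax can ban dynamic import(). The unique-names and one-query-interface rules would need a custom plugin, which I won't pretend to sketch here.

```typescript
// eslint.config.js — flat config enforcing two of the rules above.
export default [
  {
    rules: {
      // "No bare catch-alls": flag empty blocks, including empty catch clauses.
      "no-empty": ["error", { allowEmptyCatch: false }],
      // "No dynamic imports": keep the import graph static.
      "no-restricted-syntax": [
        "error",
        { selector: "ImportExpression", message: "Use static imports." },
      ],
    },
  },
];
```

The erasable-syntax-only rule is a single compiler flag: "erasableSyntaxOnly": true in tsconfig.json, available since TypeScript 5.8.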

The part where your judgment gets woken up

The piece that made the whole talk click for me: Earendil built a PR extension that separates the review inputs. Mechanical bugs and style violations go straight back to the agent — those don't need a human. But a database migration, a permissioning change, a new dependency — those explicitly route to a human call-out that says "your brain should be on now."

Because if you miss them, you will regret them. And you will miss them. The machine's job, in this model, is to notice the moments your judgment is actually required, and to make sure you don't sleep through them.
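The extension itself isn't public, so this is only a toy version of the routing idea; the patterns and categories below are invented. Mechanical findings go back to the agent, judgment-heavy files land in the human queue.

```typescript
type Route = "agent" | "human";

// Invented heuristics: judgment-heavy changes route to a human.
const HUMAN_PATTERNS: RegExp[] = [
  /migrations\//, // database migrations
  /permissions|auth|acl/i, // permissioning changes
  /package\.json$|Cargo\.toml$|go\.mod$/, // new dependencies
];

function routeFile(path: string): Route {
  return HUMAN_PATTERNS.some((p) => p.test(path)) ? "human" : "agent";
}

// Group a PR's changed files into the two review queues.
function routePullRequest(changedFiles: string[]): Record<Route, string[]> {
  const queues: Record<Route, string[]> = { agent: [], human: [] };
  for (const f of changedFiles) queues[routeFile(f)].push(f);
  return queues;
}

console.log(
  routePullRequest([
    "src/components/Button.tsx",
    "db/migrations/0042_add_billing_table.sql",
    "package.json",
  ]),
);
// → { agent: ["src/components/Button.tsx"],
//     human: ["db/migrations/0042_add_billing_table.sql", "package.json"] }
```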

Why "friction is bad" is the wrong slogan

Large engineering organizations have long used SLOs — service-level objectives — as deliberately inserted friction. The point of an SLO is to force the team to stop and ask: Do I actually need this reliability? Do I have the headcount to run this? Should I ship this service at all?

The AI-coding era has encouraged us to treat all friction as waste. In physical systems, friction is what lets you steer. Without it, you don't go faster. You just stop being the one driving.

The single line to take from Armin and Christina: the friction is where your judgment lives. The shift isn't to stop using agents — it's to stop pretending the remaining ten percent of the work, the slow part, is the disposable part. It's the only part that's still yours.

Source: The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

Sunday, 26 April 2026

One-Chart Businesses: The Boring Way to Pick What to Build Next

Most people pick a business the wrong way. They start from what's trending on X, or what their friends are building, or what an AI demo made them feel. The better starting point is almost never that interesting: pull up a single chart, squint, and ask whether the line on it is going to bend in the next ten years. Sam Parr and Steph Smith on My First Million call these "one-chart businesses." The framing is simple — if a demographic, behavioral, or physical trend is already locked in and the chart makes it obvious, you've found a tailwind you don't have to fight.

Here's the chart that matters most right now: the global population curve split by age. The under-15 line is flat. The working-age line is flat. The 65-plus line goes from under 1 billion today to 2.5 billion. That's the tailwind. Everything that touches elder care rides it.

What the silver tsunami actually unlocks

The US Bureau of Labor Statistics already calls nursing the fastest-growing occupation between 2020 and 2030 — 275,000 new jobs. Assisted-living prices in the US have grown 31% faster than inflation and hit $54,000 a year on average. There are 31,000 facilities; four out of five are for-profit; half of the operators clear 20%+ annual returns on operating cost. That's not a tech margin, but on a real-estate-backed operating business, it's staggering.

Japan ran the experiment ten years earlier. Their silver tsunami produced akiya — over 8 million abandoned houses the government now hands out nearly free. It also produced nursing-home construction up 50% in a decade. Every country is running the same play on a delay.

The gap worth noticing: most assisted-living options are terrible. People already pay $20,000–$30,000 a month for the "good ones." Imagine the premium version — the place you'd actually feel good about sending your parent. That product doesn't really exist at scale. Build it and you own a category that's growing faster than anything AI is disrupting.

The physical-world businesses that don't fit in a pitch deck

A few more one-chart candidates from Steph Smith's database worth stealing:

Air quality. About half the world is exposed to roughly 5x the safe limit for PM2.5 particles. Delhi routinely hits an AQI of 450 — the equivalent of smoking 25 cigarettes a day. People notice water quality because someone showed them a glass of filtered-vs-unfiltered water. Nobody has done that for air yet. The company that turns "invisible threat" into "visible dashboard with a clear product answer" owns the category. Amazon data already shows AC furnace filters + air-quality monitors clearing $40M+ a month in revenue — and that's before anyone markets it seriously.

Sports that aren't pickleball. Pickleball is #1 on the fastest-growing-sports list. The more interesting names underneath are alpine touring, winter fat biking, off-course golf, and trail running. All of them have one thing in common: they bend a traditional sport toward something you can do socially, outdoors, in a smaller time window, without elite fitness. There's a whole "suburban triathlon" waiting to be branded — walk half a mile to a bar, drink two beers, play nine holes of golf. Out-of-shape middle-aged guys will buy anything with a finisher medal. The brand is already funny; the product design is the easy part.

Nerd neck. An entire generation is hunched over laptops and phones. Bryan Johnson made a video about it, Tim Ferriss keeps talking about Egoscue, Roger Frampton's "why sitting destroys you" TED talk has millions of views. Right now the product landscape is a few dorky straps (BetterBack) and expensive sports bras (Form). There's a lot of room for a posture product that doesn't look like a medical device.

The less-obvious lens: Ask Nature

Shaan Puri's favorite new rabbit hole from the episode is asknature.org — a database of how animals solve engineering problems. African darter feathers are radically water-resistant. Camel fur cools during the day and insulates at night. Otter fur is the blueprint half the wetsuit industry quietly stole. Biomimicry isn't a product category — it's a cheat code for brand stories. If you're building a clothing company and your marketing doesn't punch, the origin story is sitting on Ask Nature for free.

The breakup economy

A random stat from The Hustle: the average person spends $15,000 after a breakup. Divorce parties, breakup cakes, and "revenge body" kits are already getting organic search volume. If you already run a consumer meme account — F*ckJerry, LadBible, anything with 5M+ followers — you have free distribution for a viral physical product. Breakup vodka. A "send us your ex's stuff in this box and we'll burn it on camera" service. Products like this usually top out at $2–10M a year, but they run themselves on the meme tailwind.

The rule the whole conversation rests on

Every idea in that episode sits on top of a chart that is already committed. Demographics don't reverse. Pollution doesn't un-compound. Posture doesn't fix itself while screens get more engaging. The only real question is whether a marketer shows up to translate the chart into a product the average person can buy.

If you're choosing what to build in 2026, don't start from the newest model or the sharpest framework. Start from the dullest possible chart. The more inevitable the line, the less competition you'll fight for the next ten years.

Source: 9 Killer Business Ideas the Internet Hasn't Caught Up To in 2026 — My First Million with Steph Smith

Saturday, 25 April 2026

Purpose vs Task: The Lens That Tells You Which Jobs AI Actually Eats

Ten years ago, radiology was the consensus “first job to go.” Computer vision had just become superhuman, and the core task of a radiologist — looking at scans — was the most obvious target. A decade later, AI has completely permeated radiology. Every department uses it. Every scan gets processed faster. And the number of radiologists has gone up.

Jensen Huang offered this as a throwaway during a Davos conversation with BlackRock’s Larry Fink, but it is the single most useful frame I’ve heard for thinking about AI and labor. The lens is simple: distinguish the task of a job from the purpose of it.

A radiologist’s task is to study scans. Their purpose is to diagnose disease. When AI compresses the task from minutes to seconds, the purpose doesn’t vanish — it gets more of the person’s attention. More time with patients, more time with clinicians, more scans processed per day. The hospital sees more patients, earns more revenue, and hires more radiologists.

Same story with nurses. The US is short five million nurses. Nurses currently spend half their time charting and transcribing. Companies like Abridge are eating that task. The nurses don’t disappear — the bottleneck moves. More patients get seen, hospitals do better, more nurses get hired.

If all you can see is the task, every knowledge job looks extinct. If you look at the purpose, you notice that the purpose usually gets bigger, not smaller.

The industrial view of AI

Most people think AI is the model. Huang insists AI is actually a five-layer cake. Energy sits at the bottom. Chips and compute sit on top of energy. Cloud services sit on top of the chips. Models sit on top of the cloud. And applications — healthcare, manufacturing, financial services, the places where economic value actually shows up — sit on top of the models.

The reason this matters: every layer needs to be built before the one above it works. Last year, the models finally got good enough to support a real application layer. That’s why 2025 was the largest VC year in history, and why most of that money went to “AI native” companies in healthcare, manufacturing, robotics, and financial services. The model layer is subsidizing the application layer.

And the infrastructure beneath the models is enormous. A few hundred billion dollars has already gone in. TSMC is building 20 new chip plants. Foxconn, Wistron and Quanta are building 30 new computer plants. Micron has committed $200 billion in the US. Trillions more to go. Huang calls it the single largest infrastructure buildout in human history. Not hyperbolically. Literally.

Why it isn’t a bubble

The word “bubble” gets used whenever a lot of capital moves at once. Huang’s test is simple: try to rent a GPU. Spot prices on Nvidia GPUs in every cloud are going up — not just the latest generation, but two-generations-old hardware. If the infrastructure were overbuilt relative to demand, spot prices would be collapsing. They aren’t.

The more interesting read: the bubble question is the wrong question. The right question is whether we’re investing enough to broaden the benefit. Right now AI usage is dominated by educated users in developed economies. That’s how every platform shift starts. The difference with AI is that it’s the easiest software to use in human history — a billion users in three years. If a country has electricity and roads, it can have AI. The open-model wave (DeepSeek, and everything that followed) means any country with local linguistic and cultural expertise can build AI that actually serves its own population.

For Europe specifically, Huang’s pitch was: your industrial base and your deep sciences are your moat. The US led the software era. AI is “software that doesn’t need to write software” — you teach it instead of coding it. That collapses the American advantage. Fuse Europe’s manufacturing strength with AI and the next layer — physical AI, robotics — plays to European strengths.

Three things to steal from this conversation

  1. Audit your role by purpose, not task. If most of what you do is the purpose (diagnosis, judgment, client relationship), AI makes you faster. If most of what you do is the task (charting, retrieval, prediction), your seat gets compressed. Know which one you are.
  2. Pick your layer. Energy, chips, cloud, models, applications — each has different economics, a different moat, a different timeline. Don’t build at the model layer unless you have a real reason to.
  3. Infrastructure is the bet. The buildout is measured in trillions and in decades. Pension funds, sovereigns, and retail investors who sit it out will feel left out. The ones who fund the energy, chips, and factories will own the compounding.

The line that stuck: “You don’t write AI. You teach AI.” That sentence alone rewrites a lot of assumptions about who gets to build.

Source: Jensen Huang and Larry Fink at the World Economic Forum

Friday, 24 April 2026

Be Claude's PM, Not Its Proofreader

There's a strain of AI discourse that treats "vibe coding" as synonymous with letting a model write your code. It isn't. Eric, a researcher at Anthropic and co-author of Building Effective Agents, draws the line where Andrej Karpathy drew it: you're only vibe coding when you forget the code even exists. Cursor and Copilot don't qualify. Most of what senior engineers currently do with AI doesn't qualify. That's the whole problem.

The reason it's a problem is arithmetic. The length of task that AI can complete end-to-end is doubling roughly every seven months. Today that's about an hour. At that rate it's a full workday in about two years, and a workweek in three. If your workflow assumes you will personally review every line of code the model produces, you are building a career on the losing side of an exponential. Something will have to give, and it isn't the exponential.
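The arithmetic, spelled out (one hour today, a seven-month doubling time):

```typescript
// Task length an AI can complete end-to-end, assuming a 7-month doubling time.
const doublingMonths = 7;
const hoursToday = 1;

for (const monthsOut of [0, 12, 24, 36]) {
  const hours = hoursToday * 2 ** (monthsOut / doublingMonths);
  console.log(`${monthsOut} months out: ~${hours.toFixed(1)} hours`);
}
// 0 months:  ~1.0 hours
// 12 months: ~3.3 hours
// 24 months: ~10.8 hours  (a full workday)
// 36 months: ~35.3 hours  (a workweek)
```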

So the question is not whether to vibe code in prod. The question is how to do it without shipping garbage.

Eric's answer borrows from every manager who has ever existed. A CTO green-lights code they can't read. A PM accepts features they couldn't have built. A CEO signs off on financial models they couldn't reconstruct. These people are not incompetent; they've just found abstraction layers they can verify without reading the implementation. Acceptance tests. User flows. Spot-checks on load-bearing numbers. Engineers are the last white-collar profession that still prides itself on understanding the full stack down to the metal. That pride is about to become expensive.

The compiler analogy is the one to sit with. In the early days of compilers, developers read the generated assembly to make sure it looked right. At some point the systems got big enough that nobody bothered. The code didn't become less important, the abstraction just became trustworthy enough that reading underneath it stopped being a good use of time. Application code is heading to the same place.

Three rules make the transition survivable.

Rule one: vibe code the leaves, not the trunk. Every codebase has leaf nodes: features nothing else depends on, bells and whistles that aren't going to be extended or composed. Tech debt in a leaf node is contained. Tech debt in your core architecture compounds forever. Human review stays mandatory on the trunk. Leaves can be trusted to Claude. The one class of problem today's models genuinely can't validate — is this extensible, is this clean — doesn't matter when nothing depends on the code.
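"Leaf" has a mechanical definition you can actually check: nothing else imports the module. A sketch of that check over an import graph (building the graph itself, from your bundler or a grep pass, is left out):

```typescript
// A module is a leaf if no other module imports it: tech debt there is contained.
type ImportGraph = Map<string, string[]>; // module -> modules it imports

function findLeaves(graph: ImportGraph): string[] {
  const imported = new Set<string>();
  for (const deps of graph.values()) deps.forEach((d) => imported.add(d));
  // Leaves: present in the graph but never imported by anyone.
  // Caveat: entry points (nothing imports main.ts either) show up too; filter them.
  return [...graph.keys()].filter((m) => !imported.has(m));
}

const graph: ImportGraph = new Map([
  ["core/db.ts", []],
  ["core/api.ts", ["core/db.ts"]],
  ["features/export-csv.ts", ["core/api.ts"]], // nothing depends on this
]);

console.log(findLeaves(graph)); // ["features/export-csv.ts"]
```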

Rule two: be Claude's PM. Ask not what Claude can do for you; ask what you can do for Claude. When Eric ships features with Claude he spends fifteen to twenty minutes collecting context into a single prompt, often through a separate planning conversation where Claude explores the codebase, surfaces the relevant files, and agrees on a plan. Only then does he hand the artifact to a fresh session and let it cook. The quick back-and-forth "fix this bug" loop is how you get mediocre code. A junior engineer on day one would fail the same prompt. Treat the model the way you'd treat that new hire: give it the tour, the constraints, the examples, the "here's how we do things."

Rule three: design for verifiability before you write the code. Anthropic recently merged a 22,000-line change to their production reinforcement learning codebase, written heavily by Claude. This was not a prompt-and-pray operation. Days of human work went into requirements and guidance. The change concentrated in leaf nodes. The extensible pieces got full human review. The team designed stress tests for stability and built the system with human-verifiable inputs and outputs, checkpoints that prove correctness without needing to read every line. That's the template. If you can't describe what "correct" looks like from the outside, you can't vibe code the inside.

The payoff isn't just saved hours. It's a lower marginal cost of software. When a feature costs one day instead of two weeks, you start shipping features you would never have started. You attempt system rewrites you would have dismissed as "not worth it." The cost curve reshapes what's worth doing at all. And that is where the real leverage lives.

Two caveats worth holding.

First, vibe coding in prod is not for the fully non-technical. Being Claude's PM means knowing enough about the system to ask the right questions and catch the wrong answer. The press coverage of leaked API keys and exposed databases describes a real failure mode — people who had no business running production systems were running production systems. The answer is not to ban vibe coding. The answer is to know what you're doing.

Second, today's caveat about tech debt will keep shrinking. Claude 4 models, even in their first weeks inside Anthropic, earned trust that 3.7 didn't. More of the stack will move inside the "safe to vibe code" bubble every quarter. The leaves will spread.

The uncomfortable framing: in a year or two, if your process still requires you to personally read every line of code, you are going to become the bottleneck on your own team. The models will happily produce a week's worth of work in an afternoon. The question is whether you've built the muscle — context, leaf-node discipline, verifiable design — to absorb that output, or whether you're still proofreading assembly while the rest of the industry ships.

Source: Master Coding Agents Like a Pro (Anthropic's Ultimate Playbook), Eric, Anthropic


Thursday, 23 April 2026

AI Is Getting Cheaper — Fast. Here's the Data.

Every few months I get asked the same question: "Is AI actually getting cheaper, or is that just hype?" The answer is: yes, dramatically, and faster than almost any technology in modern history. Here is the data, plotted two ways.

Chart 1 — How many tokens you get for $1

In 2020, one US dollar bought you about 17,000 tokens (roughly 12,000 words) of the best AI model available (GPT-3). Today, one dollar buys you 800,000 tokens on GPT-5. That is ~48× more AI per dollar in six years.

Green bars get taller each year because you are getting more output for the same money. Growth charts feel intuitive in a way that price-declining charts do not — so this is usually the version I lead with.

Chart 2 — How much 1 million tokens costs

The flip side of the same coin. A million tokens of the best AI cost $60 in 2020. Today the same workload costs $1.25 — roughly 50× cheaper.

For context, that rate of price decline is faster than:

  • Computers (Moore's Law doubling cost-performance every ~2 years → ~8× per 6 years)
  • Solar panels (~10× cheaper per decade)
  • Electricity, cars, steel, aluminum — pick any major industrial technology, AI is beating it.

The data

| Year | Best AI model | Price / 1M tokens | Tokens per $1 | vs. 2020 |
|------|---------------|-------------------|---------------|----------|
| 2020 | GPT-3 | $60.00 | ~17,000 | 1× |
| 2023 | GPT-4 | $30.00 | ~33,000 | 2× |
| 2024 | GPT-4o | $5.00 | 200,000 | 12× |
| 2025 | GPT-5 | $1.25 | 800,000 | 48× |
| 2026 (today) | GPT-5 / Claude Opus 4.7 | $1.25 – $15 | 67,000 – 800,000 | 4× – 48× |
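The two derived columns are pure arithmetic on the price column. A quick sketch that reproduces them:

```typescript
// Reproduce the table's derived columns from price per 1M tokens.
const prices: Record<string, number> = {
  "2020 GPT-3": 60,
  "2023 GPT-4": 30,
  "2024 GPT-4o": 5,
  "2025 GPT-5": 1.25,
};

const baseline = 1_000_000 / prices["2020 GPT-3"]; // ~16,667 tokens per $1

for (const [model, price] of Object.entries(prices)) {
  const tokensPerDollar = 1_000_000 / price;
  console.log(
    `${model}: ${Math.round(tokensPerDollar).toLocaleString()} tokens/$1, ` +
      `${(tokensPerDollar / baseline).toFixed(0)}× vs. 2020`,
  );
}
```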

Why this matters for builders

The "cost per token" line does not feel revolutionary until you realise what it unlocks. At $60/1M you think twice about letting a user ask a question. At $1.25/1M you stop thinking about cost entirely and start asking "what if I ran 100 AI calls per user action?" — which is exactly how modern AI agents work.

The product patterns of 2026 (autonomous agents, document-heavy pipelines, real-time reasoning loops) were simply uneconomic in 2023. They are routine now because the floor fell out from under the price. And it is still falling.