45 Comments
Mikko:

Lack of proper capitalization makes the text unreadable for me and other people. If you don't care to trivially make your text readable, then we for sure don't care to spend time to struggle through your text to see if there is any useful substance there.

eeeee:

I like it, this person doesn't speak for all of us

Nico Appel:

Yeah, gee, that’s the kind of content where comments should really focus on capitalizing letters.

Come on!

Great material.

Personally, I’m too OCD to type and publish like this. But substance over form any day.

Shjs:

I didn’t even notice. I had to go back and check to see that there were no capitals.

Samuel Hammond:

then take 2 seconds and burn some tokens having claude capitalize it for you.

Selemon Brahanu:

You are not wrong but you didn't have to say it 💀.

Vincent:

its a good article, mikko

Tobes:

dont capitalize a thing i say

intermediation:

I don’t use caps for anything and I have never had a complaint. I have dysgraphia, so in theory I should champion proper punctuation for readability; in practice, read-aloud is how I consumed this article. I also get annoyed at grammar police who want me to burn more time on writing. These days, with so much AI slop being written, genuine writing, with flaws, should be at a premium. Shakespeare made up words out of thin air, yet if he posted on the net today he would be kicked off the platforms for not following the grammar rules… 😆 and now autocorrect is annoying me by capitalising my “i”s grrrrr

Ffffffff:

Ok boomer

Nathan:

Lots of good points, but I think you’re overly cynical about the utility of older, cheaper models. For a lot of things they’ll be really worthwhile; as you noted, the Haiku trick is making Claude Code a lot more cost-efficient… not every task will need a GPT-5-level “worker”, just a GPT-5-level planner…

Devin was doing usage based pricing last time I checked. Those guys don’t shy away from swiping your card.

I think you’re right that burning cash to try to grow will blow up in most companies’ faces, but that’s the same as it ever was. Many in-betweeners will eventually raise their rates and, while the Claude Code 24/7 background-looper types who ruin the party for everyone will hem and haw, most business users will shrug and move on. I saw this happen with Docker. They were, and maybe still are, troubled, but they eventually clamped down hard on all the free stuff they were giving away, and everyone moved on.

Ethan Ding:

fair, altho for the consumer market of flat subscriptions, there seems to be no way through

Nathan:

Yes, I believe we're currently in a relatively egalitarian landscape for access to the tools. Once the belt has to tighten, I think we'll see a lot of people and businesses caught having to pay ever-increasing rates for AI to stay on top. It might create a weird stratification at a societal level as "I can afford to get my kids $20K-a-year GPT-7 subscriptions" becomes the new "I can afford to pay for Harvard".

Tommy:

The last point is really interesting. Let's hope access to intelligence is not limited, and that the stretch of capabilities from open source to closed source isn't a mile wide.

intermediation:

I prefer consistent results over “glimpses of genius” for many tasks. I did some promptfoo tuning for a medical video conference “write-up assistant” on AWS Bedrock running Claude. It would freak out the doctors if Sonnet suddenly spat out a PhD-level text from its training set 🤣 My recommendation was only to use Haiku 2.5; it was better at not being a professor of medicine at random.

So now I try to use the weaker models and improve my prompts/workflows instead. In reality, time-and-motion studies show that LLMs feel great when they trick you with one brilliant answer, only for you to find that the slot machine doesn't pay out 80% of the time. It is better to optimise for what actually saves your time. The older models are probably good enough for 90% of work, as long as that work isn't just “stealing IP” that the models memorised.
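A toy sketch of the kind of consistency check I mean (call_fn, call_haiku and call_sonnet are hypothetical stand-ins for whatever client you actually use, not the real Bedrock/promptfoo setup):

```python
# Toy version of the "consistent beats occasionally brilliant" check: run the same
# prompt several times per model and prefer the one whose outputs agree with each
# other, rather than the one with the single most impressive answer.
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable

def consistency_score(call_fn: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    outputs = [call_fn(prompt) for _ in range(runs)]
    pairs = list(combinations(outputs, 2))
    # Mean pairwise similarity: 1.0 means every run said essentially the same thing.
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# e.g. pick whichever model scores higher on your own prompts
# (call_haiku / call_sonnet are stand-ins for your real clients):
# haiku_score  = consistency_score(lambda p: call_haiku(p),  my_prompt)
# sonnet_score = consistency_score(lambda p: call_sonnet(p), my_prompt)
```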

Ben:

IMO he's not necessarily making a point based on the utility of those older models, but on folks' reluctance to use them when there's a (relatively) smarter one available. Intelligent down-switching to "dumber" models seems workable to me, but it's high-risk in a commoditized product environment (especially when some of the players have wallets that'd make King Midas blush).

Nathan:

Perplexity seems to down-switch in some form now and it's painful, so I have some sympathy for the argument.

Earl Lee:

This problem only exists if the tool allows the user to select the model they're using. But who actually wants to go through the hassle of picking the right model for the task in the first place? I can see most tools moving more towards how Claude Code works where you don't select the model directly. You can still pay up to have the best model in the repertoire but it doesn't have to mean you can always force the tool to use that best model.

Another observation: I'm a ChatGPT Pro subscriber who used to spam Deep Research and o1 Pro, but honestly, these days I find myself valuing speed of response more often and preferring cheaper but faster models like 4o.

Claude L. Johnson Jr.:

I agree that people will use weaker models for simpler use cases. Cost will become a multivariate equation based on which models went into generating the output. Still, the ultimate end game will be the foundation model providers partnering with the neoclouds (e.g. being bought out by or merging with them). We saw this with all of the dark fiber during the dot-com bust. Vertical integration is the only way out other than the token-cost short squeeze. If the model providers decide to raise their prices, they're betting that their portfolio of models can make them a one-stop shop for developers. If they merge with the infrastructure providers, model-use discounts can be built into the Pro, Max and Enterprise contracts.

Brian Balfour:

Just wanted to say that you are putting out some of the best writing on these topics I've seen. 10 out of 10.

Mert Deveci:

very insightful - neocloud sounds interesting.

I've been through this: users definitely favour fixed flat pricing - otherwise it is way too complicated to deal with.

Some solutions:

1. Charge a flat subscription with metered billing (rough sketch with made-up numbers below). I believe this is what Cursor pivoted to. Why do you think this does not work? Power users will not like it, so you can be unprofitable with them, but overall you would be profitable. However, I do agree this is not a good mantra for building a business model.

2. Charge on output. Users dislike this as well, but I found that as long as it is very clear, they don't care which model you use, which lets you control your own costs. Example: outreach. You charge $1 per lead, including research, writing and sending emails, etc. As long as the leads are good and you can forecast how many tokens each lead consumes, you can still use the best models occasionally.

Wdyt?
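A rough sketch of option 1 with made-up numbers (the flat fee, included allowance and overage rate are hypothetical, not anyone's real pricing):

```python
# Option 1 sketched: a flat subscription that includes a token allowance, plus
# metered overage billed per million tokens once the allowance is used up.
def monthly_bill(tokens_used: int,
                 flat_fee: float = 20.0,            # hypothetical $/month
                 included_tokens: int = 5_000_000,  # allowance covered by the flat fee
                 overage_per_million: float = 6.0) -> float:
    overage_tokens = max(0, tokens_used - included_tokens)
    return flat_fee + (overage_tokens / 1_000_000) * overage_per_million

# A median user stays at the flat fee; the 24/7 background looper pays for what they burn.
print(monthly_bill(3_000_000))   # 20.0
print(monthly_bill(50_000_000))  # 20 + 45 * 6 = 290.0
```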

Peter Tanham:

Great piece!

When I worked in telecoms back in the day we used to just say "Unlimited!! *" and then, in really small print, "*fair usage applies"; then we'd kick off the <1% of people who abused the limits. It worked reasonably well.

I'd also wonder how strong the prisoner's dilemma is here, when there aren't very strong usage/network effects. If Anthropic is confident that their model is the best, they'll get people paying to use it up to the limit, then going elsewhere until the quota resets. I don't know how much that hurts them. How much token-generation share do they need to capture?

Oliver Angélil, PhD:

I don't get it. Just use the models that are a few months old?

Dan (Aug 4, edited):

After reading this, I think the one thing you've sold me on is that self-hosted/on-prem models are probably the future, especially for low-spend prosumers/independent users. With the first real wave of AI-focused hardware finally hitting shelves, and as more and more of these companies struggle to keep their subscriptions reasonably priced, it looks like there will be huge opportunities for those of us willing to drop $2,000+ on a dedicated desktop to host agents for ourselves.

How this interacts with the "people only want the bleeding-edge models" dynamic is probably going to be the most interesting thing to watch. If powerful consumer hardware with ample shared memory can run models close enough to the frontier, this could be a major shift in the near future.

stochastic parrot:

Your math is way off, it would be $1 x 3 x 24 = $72 per day at the same cost as current deep research, not $4320

Ethan Ding:

oh yea major brainfart moment LOL

I think I wrote $250 somewhere else, my bad good catch

Harsh Chaudhary:

hah, was just thinking about this. how much of this feels like a race to the bottom - economically, creatively, and strategically.

1. "Outcome-based" pricing feels like rebadged professional services (this is where margins are being promised)

2. AI isn’t lifting humanity, it’s extracting margin (SDR spam, AI waifus, AI rebrands)

3. infra and FM providers are extracting on the promise of democratizing upstream, everyone downstream gets squeezed

just a few disjointed thoughts (planning to frame up a post later).

Stayu Kasabov:

High-quality content. Is pressing Shift really so hard for the "non-boomer" audience? Or has Sam created a trend for years to come? :)

Sanath:

What if we could give flat rates but do model routing based on the difficulty level of the query (or several other factors), like we do for query routing in RAG, only for models (rough sketch below)? With this, businesses could commit to a flat rate and not dig their own grave either.
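A rough sketch of the idea (the difficulty heuristic, thresholds and model tier names are placeholders; a real router would likely use a small classifier instead of keyword rules):

```python
# Toy difficulty-based router: a cheap heuristic scores the query, then the
# flat-rate product silently picks a model tier instead of exposing a model picker.
def difficulty(query: str) -> float:
    score = min(len(query) / 2000, 1.0)  # longer prompts skew harder
    if any(k in query.lower() for k in ("prove", "refactor", "architecture", "debug")):
        score = max(score, 0.8)
    return score

def route(query: str) -> str:
    d = difficulty(query)
    if d < 0.3:
        return "small-cheap-model"   # placeholder tier names
    if d < 0.7:
        return "mid-tier-model"
    return "frontier-model"

print(route("what's the capital of france"))                                       # small-cheap-model
print(route("refactor this service into microservices and prove it's race-free"))  # frontier-model
```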

Hugo Duprez:

Can't wait for the models to run on-device to blow up these unit economics :)

D Little:

Your assumption that power users are consumer-grade feels very strange to me.

If we assume power users are business/enterprise grade, metered pricing is moderately normalized (see: AWS, cloud, observability platforms, etc.).

Anthropic is aggressively targeting Enterprise and their API is metered from the drop.

Vipul Devluk:

If there were surplus compute available in the market (i.e., data center overbuild), token costs would end up much lower, mimicking the fiber/telecom boom/bust of the '90s/'00s. Flattening scaling laws would lead to an overbuild. How likely is that?

The odds get higher if a winning frontier model (or models) emerges.

Avenging Angels:

Great article, Ethan! The economics you describe resemble the economics of broadband connections.

The solution there was some combination of a flat-price plan with capped usage ("$50/month for 500 MB of data"), transitioning to unlimited with a fair-use policy ("unlimited data, but we kick you off if you're using 10x the median user").

Exciting times!

Josue A. Bogran:

Very, very solid article!
