tokens are getting more expensive

Lack of proper capitalization makes the text unreadable for me and other people. If you don't care to trivially make your text readable, then we for sure don't care to spend time to struggle through your text to see if there is any useful substance there.

Expand full comment

Reply (8)

eeeee

I like it, this person doesn't speak for all of us

Expand full comment

Nico Appel

Aug 13

Yeah, gee, that’s the kind of content where comments should really focus on capitalizing letters.

Come on!

Great material.

Personally, I’m too OCD to type and publish like this. But substance over form any day.

Expand full comment

Shjs

I didn’t even notice. I had to go back and check to see that there were no capitals.

Expand full comment

Samuel Hammond

Aug 9

then take 2 seconds and burn some tokens having claude capitalize it for you.

Expand full comment

Selemon Brahanu

You are not wrong but you didn't have to say it 💀.

Expand full comment

eeeee

I like it, this person doesn't speak for all of us

Expand full comment

Vincent

its a good article, mikko

Expand full comment

Tobes

Aug 8

dont capitalize a thing i say

Expand full comment

intermediation

I don’t use caps for anything and I have never had a complaint. I have dysgraphia so in theory, I should champion proper punctuation for readability, in practice read-aloud is how I viewed this article. I also get annoyed at grammar police who want me to burn more time on writing. these days with so much AI slop being written genuine writing, with flaws, should be at a premium. Shakespeare made up words out of fresh air yet if he posted on the net today he would be kicked off the platforms for not following the grammar rules… 😆and now autocorrect is annoying by capitalising my “i”s grrrrr

Expand full comment

Ffffffff

Ok boomer

Expand full comment

Nathan

Jul 31

Lots of good points, but I think you’re overly cynical on the utility of older, cheaper models. For a lot of things, they’ll be really worthwhile, as you noted, the Haiku trick is making Claude Code a lot more cost efficient… not every task will need a GPT-5 level “worker”, just a GPT-5 level planner…

Devin was doing usage based pricing last time I checked. Those guys don’t shy away from swiping your card.

I think you’re right that burning cash to try and grow will blow up in most company’s faces, but that’s the same as it ever was. Many in betweeners will eventually raise their rates and, while the Claude Code 24/7 background looper types who ruin the party for everyone will hem and haw, most business users especially will shrug and move on. I saw this happen with Docker. They were and maybe still are troubled, but they clamped down hard eventually on all the free stuff they were giving away, and everyone moved on.

Expand full comment

Reply (4)

Ethan Ding

Jul 31

fair, altho for the consumer market of flat subscriptions, there seems to be no way through

Expand full comment

Nathan

Jul 31Edited

Yes- I believe we're currently in a relatively egalitarian landscape for access to the tools. Once the belt has to tighten, I think we will see a landscape where a lot of people and businesses will get caught in having to pay ever-increasing rates for AIs to stay on top. It might create a weird stratification at a societal level as "I can afford to get my kids $20K a year GPT-7 subscriptions" becomes the new "I can afford to pay for Harvard".

Expand full comment

Tommy

The last point is really interesting. Lets hope access to intelligence is not limited (the stretch of capabilitites, opoen source to closed source) by a mile.

Expand full comment

intermediation

3dEdited

I prefer consistent results over “glimpses of genius” for many tasks. I did some promptfoo tuning for a medical video conference “write-up assistant” on AWS Bedrock running Claude. It would freak out the doctors if Sonnet suddenly spat out a PhD-level text from its training set 🤣 My recommendation was only to use Haiku 2.5; it was better at not being a professor of medicine at random.

So now i try to use the weaker models and to improve my prompts/workflows. In reality, time and motion studies show that LLMs feel good when they trick you with a great answer, only to find that that slot machine doesn't pay out 80% of the time. It is better to optimise for what saves your time. The older models are probably good enough for 90% of work, as long as that work isn't just “stealing IP” that the models memorised.

Expand full comment

Ben

IMO he's not necessarily making a point base on the utility of those older models, but on folks' reluctance to use them when there's a (relatively) smarter one available. Intelligent down-switching to "dumber" models seems workable to me, but is high-risk in a commoditized product environment (especially when some of the players have wallets that'd make King Midas blush)

Expand full comment

Reply (2)

Nathan

Aug 6

Perplexity seems to downswitch sometimes now in some form and it's painful, so I have some inclination for the argument.

Expand full comment

Earl Lee

Aug 6

This problem only exists if the tool allows the user to select the model they're using. But who actually wants to go through the hassle of picking the right model for the task in the first place? I can see most tools moving more towards how Claude Code works where you don't select the model directly. You can still pay up to have the best model in the repertoire but it doesn't have to mean you can always force the tool to use that best model.

Another observation: I'm a ChatGPT Pro subscriber who used to spam Deep Research and o1 Pro but honestly, recently, I've found myself valuing speed to response more often these days and preferring cheaper but faster models like 4o.

Expand full comment

Claude L. Johnson Jr.

Aug 7

I agree that people will use weaker models for simpler use cases. Cost will become a multi-variate equation based on which models went into generating the output. Still, the ultimate end game will be the foundation model providers partnering (e.g. being bought out by OR merging with) the neoclouds. We saw this with all of the dark fiber during the Dot Com Bust. The vertical integration is the only way other than the token cost short squeeze. If the model providers decide to raise their prices, they're betting that their portfolio of models can make them a one-stop shop for developers. If they merge with the infrastructure providers, model use discounts can be built into the Pro, Max and Enterprise contracts.

Expand full comment

Brian Balfour

Just wanted to say that you are putting out some of the best writing on these topics I've seen. 10 out of 10.

Expand full comment

Mert Deveci

very insightful - neocloud sounds interesting.

Been through this where users definitely favour fixed flat pricing - otherwise it is way too complicated to deal with.

Some solutions:

1. Charge a flat subscription with metered billing. I believe this is what Cursor pivoted to. Why do you think this does not work? Power users will not like it so you can be unprofitable with them but overall you would be profitable. However I do agree, this is not a good mantra to have on building a business model.

2. Charge on output. Users dislike this as well but I found that as long as it is very clear, they don't care which model you use, letting you control your own costs. Example: Outreach. You charge $1 per lead, including research and writing and sending emails etc. As long as they are good and you can forecast how much token usage would be consumed per lead, you can still use best models occasionally.

Wdyt?

Expand full comment

Peter Tanham

Aug 6

Great piece!

When I worked in Telcoms back in the day we used to just say "Unlimited!! *" and then in really small print "*fair usage applies", then we'd kick off the <1% people who abused the limits. It worked reasonably well.

I'd also wonder how strong the prisoner's dilemma is here, when there aren't very strong usage/network effects. If Anthropic is confident that their model is the best, they'll get people paying to use up-to-the-limit, then going elsewhere until the quota resets. I don't know how much that hurts them? How much do they need to capture token-generation-share?

Expand full comment

Oliver Angélil, PhD

I don't get it. Just use the models that a few a months old?

Expand full comment

Dan

Aug 4Edited

After reading this, I think the one thing you've sold me on is self-hosted/on-prem models are probably the future, especially for the low spend category prosumers/independent users. With the first real wave of AI-focused hardware finally hitting shelves, and as more and more of these companies struggle to keep their subscriptions reasonably priced, it looks like there will be huge opportunities for those of us willing to drop $2,000+ for a dedicated desktop to host agents for ourselves.

How this interacts with the "people only want the bleeding edge models" is probably going to be the most interesting thing to watch. If powerful consumer hardware with ample shared memory can run models close enough to the frontier, this could be a major shift in the near future.

Expand full comment

stochastic parrot

Your math is way off, it would be $1 x 3 x 24 = $72 per day at the same cost as current deep research, not $4320

Expand full comment

Ethan Ding