The Supply and Demand of AI Tokens | Dylan Patel Interview
What used to matter a lot was execution:
it was very, very [ __ ] difficult, and
ideas were cheap. Now ideas are cheap
and plentiful but execution is very
easy. So really only the good ideas are
the ones that can justify the spend on
super cheap implementation.
>> You told me this incredible story about
how your own team's use of tokens has
changed dramatically this year. Yeah.
Retell that story and what it is
teaching you about what's going on in
the world.
>> Last year we thought we were heavy users
of AI. Everyone's using ChatGPT.
Everyone's using Claude. Everyone's got,
you know I'm providing whatever
subscriptions anyone wants on the order
of spend of like tens of thousands of
dollars for our firm. This year the
spend has just skyrocketed, and it
really started in late December with
Opus. That included Doug, who's president,
uh, Doug O'Laughlin. He's very much like
leading the charge in the sense of like
non-technical people using, uh, AI for
coding. Um, and so he's basically pilled
the whole firm slowly over time. I think
he's been the leader in doing that.
Obviously the engineers were using it
anyways but spend in January just
started to inflect and rocket and rocket
and rocket and rocket. Um, we signed,
you know, an enterprise contract with
Anthropic, and it's gone to the point
where now, um, I think when I last
talked to you it was a $5 million spend
rate. It's actually a $7 million spend
rate right now.
>> That was last week, by the way.
>> A lot of that is just the usage, right?
You know, people who have never coded
before are using Claude Code and spending
thousands of dollars, sometimes a day,
but across the firm, we're spending $7
million a year now on Claude Code at the
current rate, um, versus our salary
expense being in the neighborhood of $25
million. So, you know, we're north of
25% of spend on Claude Code as a
percentage of salary. And if this
trajectory continues, then, you know,
we'll spend more than 100% by the end of
the year. Uh, which is a bit terrifying.
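A quick check of the arithmetic in this answer, as a minimal sketch. The two dollar figures are the ones quoted in the conversation ($7 million against roughly $25 million is 28%, consistent with "north of 25%"). The eight-month horizon and the constant monthly growth rate are assumptions added purely for illustration; they are not from the interview.

```python
# Figures quoted in the conversation (annualized run rates, USD).
claude_code_spend = 7_000_000
salary_expense = 25_000_000

# Claude Code spend as a share of salary expense.
spend_share = claude_code_spend / salary_expense
print(f"Claude Code spend as a share of salary: {spend_share:.0%}")

# Assumption (not from the interview): eight months remain in the year
# and spend compounds at a constant monthly rate.
months_remaining = 8
required_monthly_growth = (
    (salary_expense / claude_code_spend) ** (1 / months_remaining) - 1
)
print(f"Monthly growth needed to reach 100% of salary: {required_monthly_growth:.1%}")
```

Under those assumptions, the run rate would need to compound at a high-teens percentage every month to pass salary expense by year end.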
Thankfully, I don't have to decide
between people and AI because our
company's growing so fast. It's, you
know, more so like, okay, well, I don't
have to hire nearly as fast and I can
spend a lot more on AI and it works and
we just grow faster. But I think other
folks will start to reckon with the fact
that, huh, if this person can do the
work of five to 10 to 15 people, uh, using
Claude Code, then all of a sudden I
should probably cut people. But right
now, I think the use cases are so broad.
For example, one thing is we have a
reverse engineering lab in Oregon that
we've been building for a year and a
half. We have a bunch of, you know,
fancy microscopes, scanning electron
microscopes. The whole purpose of this
is you reverse engineer chips. You get
uh the architecture out of it. you get
the materials that they're using to
manufacture and this is some of the data
we sell. This is a very slow process of
analyzing that data. Instead, um, one
person on the team, with a couple
thousand dollars of Claude tokens, has
been able to create this application
that is GPU-accelerated, runs on a server
that we have at CoreWeave, and anytime we
send it an image, it's able to take the
picture of the chip and overlay where
every single material is. Oh, this part is
copper. Oh, this part of the gate is, uh,
tantalum. This part of the gate is
germanium. This part of the gate is
cobalt. And so you can do a finite
element analysis of the entire stackup
of the chip very, very quickly. It's
visual, with a dashboard, a GUI,
everything. A few thousand dollars is
what it took Claude. The person
previously worked at Intel, and he said
that was an entire team's job to build
that and maintain that. Now rack that up
across, you know, the entire firm. It's
insane. Another example that I think is
super fun is Malcolm, who was an
economist at a major bank before. Um,
their economist department was like 100
or 200 people. What he built was
the most incredible thing ever. He piped
all of this different data, you know,
FRED data and all these other data,
right? Employment reports and all these
other things from various APIs. We
signed a couple contracts with folks to
get API access to data. Pulled it all
in, started running regressions, started
looking at the impact of various
economic revolutions on the economy, um,
from a deflationary or inflationary
perspective. The BLS, um, the
Bureau of Labor Statistics, has this
entire set of like 2,000 tasks. And
so he did that with AI: which ones can
be done by AI, which ones cannot,
grading them across a rubric. You know,
about 3% are doable now with AI. Um, and
so he's created this metric so that
you can measure the things that can be
done by AI, what the cost is
of being able to do those with AI, and
therefore the deflationary aspect of it.
You know, output can go up. Phantom GDP
is what he called it.
Phantom GDP. Output can go up, but
because cost falls so much, actually GDP
theoretically shrinks. So he created
this whole analysis and a brand new
benchmark of, uh, language models, um, a
set of evals across 2,000 different tasks.
Right.
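The "phantom GDP" idea described above can be illustrated with a toy calculation. This is only a sketch: the quantities and prices are invented for illustration and are not taken from Malcolm's analysis, and `nominal_value` is a hypothetical helper name. It shows how measured nominal output (quantity times price) can shrink even while real output rises, if unit cost falls far enough.

```python
def nominal_value(quantity: float, unit_price: float) -> float:
    """Nominal value of output: units produced times price per unit."""
    return quantity * unit_price

# Hypothetical numbers: before AI, 100 units of a task at $10 each.
before = nominal_value(quantity=100, unit_price=10.0)

# Hypothetical numbers: with AI, output rises 50% but unit price falls 80%.
after = nominal_value(quantity=150, unit_price=2.0)

print(f"Nominal value before: ${before:,.0f}")
print(f"Nominal value after:  ${after:,.0f}")
print(f"Change in nominal value: {after / before - 1:.0%}")
```

In this made-up case, more work gets done while the dollar value recorded for it falls, which is the effect the speaker is pointing at.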
>> This all by himself.
>> This is all by himself. Yeah. And he's
like, dude, this would have taken the team
of 200 economists a year. He's just like,
he's like completely cracked out on
Claude. He's like, everything has
changed.
>> How do you think about as a business
owner going from close to zero to 25%
accelerating towards whatever percent of
total spend? Like at what point are you
like, whoa, I need to put the brakes on
this and be careful how much we're
spending. Maybe we don't need to spend
on the most cutting edge, on Opus 4.7,
which came out today. Maybe I can
throttle it back to something that's a
little bit cheaper.
>> Ultimately, like, I'm in the information
business, right? That is, you know,
we sell analysis, we do
consulting, we create data sets. I don't
see why this wouldn't be completely
commoditized on a pretty rapid basis if
I'm not constantly improving. My first
product that I was selling as a data set,
actually, you know, there's
more people trying to do it now. We've
made it constantly better and better and
better and more detailed, and so
it still sells in the market. But the way
we were doing it in 2023 is
basically what everyone else is doing
now. If I don't move up the bar, then I
will be commoditized. If I don't move
fast enough, I will also lose my edge. So
the question is, yes, AI commoditizes
things, just like it commoditizes
software. Those who can move fast and
keep control of their customers and keep
providing them an awesome service and
keep improving the service won't shrink.
They'll grow. They'll grow faster. Those
who are incumbent and not doing
anything, they're going to lose. And so
it's a bit of an existential thing: if I
don't adopt AI, someone else will, and
they will beat me. Uh, another easy
example is the energy space. So, we've
had a few energy analysts
for like a year now. We've been trying
to build out this energy model. It's
very complex. The energy data services
market is something like $900 million.
So obviously a huge market for me to try
and break into, but, you know, we
really hadn't broken into the energy
data services business despite a year of
having multiple people on the team. Um,
then Claude Code psychosis hits one of
the people who leads the data center,
energy, and industrial sort of business
at SemiAnalysis, uh, Jeremy. It hits him,
and now all of a sudden, in 3 weeks, um,
he spent a lot. He was spending like
$6,000 a day. It was an insane amount, but he
scraped every single power plant in the
US every single transmission line above
a certain voltage, um, and created this
entire mapping of the entire US grid, as
well as a lot of demand sources, all from
various public sources of data. Um, and
it's got like this dashboard where you
can see all the
micro regions of the US where there's
power deficits and surpluses. Um, all of
these details, built in a handful of
weeks. We started showing some of our
customers who buy our data center data
set but are energy traders. We
showed some of them and they're like, wow,
how long did this take you? This is
really good. This is better than XYZ
company. And then we dig deeper:
XYZ company has 100 people and has been
working on this for a decade. Now,
obviously our thing is not
as robust, but in some ways it is better.
I'm going to commoditize these energy
data services
companies. Who's going to come commoditize
me if I don't move faster? And so the
question from a business owner's
perspective is, yeah, I'm spending a lot,
but what is that spend getting me? Is
it getting more revenue? Yeah.
>> Most software companies try to maximize
your time on their app to juice
engagement. Ramp does the exact
opposite. Ramp understands that no one
wants to spend hours filing expense
reports, reviewing expense reports, and
checking for policy violations. So, they
built their tools to give that time
back, using AI to automate 85% of
expense reviews with 99% accuracy. And
since Ramp saves companies 5%, it's no
wonder that Shopify runs on Ramp, Stripe
runs on Ramp, and my business does, too.
To see what happens when you eliminate
the busy work, check out
ramp.com/invest.
OpenAI, Cursor, Anthropic, Perplexity,
and Vercel all have something in
common. They all use WorkOS. And here's
why. To achieve enterprise adoption at
scale, you have to deliver on core
capabilities like SSO, SCIM, RBAC,
and audit logs. That's where WorkOS
comes in. Instead of spending months
building these mission-critical
capabilities yourself, you can just use
WorkOS APIs to gain all of them on day
zero. That's why so many of the top AI
teams you hear about already run on
WorkOS. WorkOS is the fastest way to become
enterprise ready and stay focused on
what matters most, your product. Visit
works.com to get started. Felix by Rogo
is a personal finance agent that turns a
single prompt into finished client-ready
work using your firm's own templates,
context, and standards. Send Felix an
email like, "Take these comments and
turn them for me, or update my tracker
with the context of these emails, or run
the ability to pay math on this buyer."
And Felix sends back finished PowerPoint
decks, Excel models, and sourced
research. Felix works the way your team
already does, delivering work quickly
and accurately around the clock. Learn
more at rogo.ai/felix.
>> Are you worried that in the limit the
people that control capital and are
investing capital, who are often hiring
you for what you do, will just say,
"Well, we have analysts too who are
really smart about this. Like, we'll
just build this ourselves." Like if it's
getting that easy, at what point does it
just all pull into the investment firms
that stand to gain the most because they
have the most leverage on top of the
data or the insights that they
glean?
>> First of all, any information services
business, obviously I don't generate as
much value as my customer does from such
information. Uh because if I sell you
information for a dollar, you're only
buying it for a dollar because you know
that information helps you make a
decision that lets you make more than
$1. And so therefore, you have
arbitraged, you have made more money off
of me than I did from the information
myself. An investment fund, these
investment funds all have their own
information services, you know,
especially like the Jane
Streets of the world and the Citadels.
They're really detailed on their
data. And yet, um, these sort of folks
also purchase data from us and continue
to do so and continue to grow with us,
because I think there's just some
it factor, right? We move faster, we're
more nimble, we're a smaller team that's
focused on just one specific thing, uh,
AI infrastructure and the huge
revolution that causes in AI, um, and
tokenomics and all these things, and
we sort of really see where it's headed,
and so we're moving faster and building
faster. Um, I think investment
professionals, you know, yes,
they'll try and build some of the stuff
we do, and, um, more likely they'll just
buy the data from us, and it's cheaper
for them to buy the data from us and
then build on top of
it than it is to build it themselves.
But ultimately some may try.
>> I feel like
every conversation I have with you, what
I'm always getting at is just supply and
demand of tokens like that's the thing
that's interesting to me in the world
right now. What has this experience
taught you about the demand? Has it
changed your view on the demand side of
that equation? Just feeling it
viscerally yourself.
>> If we take a step back and look through
the macro lens, right? Anthropic has gone
from 9 billion revenue to, what, they're
at 35, 40 billion now. Probably by the
time this airs, 40, 45 billion, who knows,
of ARR. Their compute has not grown to the
same degree. Um, and if you do the
calculations and you assume they didn't
decrease their research and development
compute, and they clearly didn't: their
releases, they have Mythos, they have Opus
4.7. So they clearly didn't decrease
their research compute spend. Um, so
ultimately what they've done, even if
you assume all incremental compute
they've gotten has gone towards
inference, their margins are at a floor
of 72%. In reality, some of that
incremental compute they've got probably
went to research and development. It may
be higher than 72% gross margins. To be
clear, at the start of the year,
there was a leak
from some
of their funding round docs. Someone
leaked it: 30-something percent gross margins.
Where on earth does a business like this
grow margins like that? And it's in
principle, right? Their demand is so
high. They're able to cut back on usage
limits, rate limits, all these things.
Um, what really matters is having an
Anthropic rep and having an enterprise
contract with them and getting the rate
limit increases that you need, because
otherwise tokens are ultimately super,
super in demand. Whoever can pay
for them... Anthropic has the same problem,
right? Like, I mean, not problem, it's
just the reality of how capitalism
works. Yes, people are sending
them $40 billion ARR for tokens, but
those tokens are generating way more
than $40 billion in value. Various
businesses will have different value
generation per token. But as we get more
and more intelligent, what really
matters is access to these most
intelligent tokens and leveraging them
at things. You as a person are deciding
what is the best way to leverage these
tokens to grow business and generate value,
because a lot of folks will want tokens
and generate tokens. Uh, but the shitty
SaaS startup in SF that is
using Claude to generate, you know,
their software product is not
necessarily actually creating a ton of
value, and therefore they're going to get
priced out of tokens, uh, soon enough.
>> Are you at all surprised... I had
this experience just today where on the
flight here I got rate limited out on
something. I saw 4.7 came out and what I
immediately wanted was to be on 4.7
that second, and I just couldn't
think about using 4.6 anymore, not when
4.7 is out. I was perfectly happy
with 4.6 for the last many weeks. It's
amazing. Are you surprised that people
are so insistent on going to the most
expensive leading edge thing to the
degree they are?
>> Without a doubt. One of my funniest
memories in the past month and a half is
myself and a buddy of mine, Leopold,
being on our knees in front of an
Anthropic co-founder begging him for
access to Mythos, and him pretending it
doesn't exist, cuz we knew it existed.
We were like, "Please give us access." And
he's like, "I don't know what you're
talking about."
>> What was your reaction to that rate card
or that eval card coming out?
>> It was rumored in the Bay Area.
Everyone, you know, we sort of like knew
it was supposed to be really good, but
um if you just look at the benchmarks
and obviously benchmarks change over
time, Mythos is potentially the biggest
step up in model capabilities in like 2
years. I think that's a really
important detail, that, you know,
it's so good that they don't
want to release it, even though
they already announced the price to
the people that they did a selective
release to for cyber, and it's like five
or 10x the token cost. They just don't
want to release it, um, because they're
worried about the impact on the
world, and they're releasing a shitty,
worse version, Opus 4.7, to us, and they
explicitly said in the model card, hey, we
actually preferentially made it worse at
cyber. I don't know if you read that.
Whoever you are, if you have enough
capital, you should get a freaking
enterprise Claude, uh, enterprise Anthropic
subscription where you pay per token,
not with these, like, subscriptions,
because then you won't get rate limited
much. And then you need to
figure out how to leverage those tokens
on the highest value tasks, um, and make
money off of it, because ultimately what
you're doing, maybe like a year
from now or two years from now, the
business is actually just arbitraging
tokens, right? The tokens are amazing,
but let's figure out what direction to
point them in and then three or four
years from now the model will know, you
know, what to do with the tokens and how
to make the most value. You know, you
can look at this retroactively. Pick any
benchmark. Hitting a certain
capability tier used to cost X and now
it costs 1/100th or 1/1,000th of that.
DeepSeek, for example, on GPT-4 was
1/600th the cost. And since then, the
costs have fallen further for GPT-4 class
models. Of course, no one gives a crap
about GPT-4 class models. They want the
frontier because the frontier lets them
create the economically valuable things.
But GPT-4 class models can still be used
in, like, stuff, and so people are using
them in some tiny use cases. It's
just the costs have fallen so fast. It's
not really what's driving the
demand. What's driving the demand is
all these new use cases. Yeah. Current
4.6 Opus or 4.7 Opus tier models: a year
from now, my spend for the same exact
quality of model would probably be
like 70k. I bet you it'll be 100 times
cheaper. Irrelevant, because I'm going to
be using a way, way, way better model
which can do way, way better things.
Anthropic's Mythos is more expensive as a
model, but it spends a lot fewer tokens to
do the thing, and therefore it is
actually cheaper in most tasks than 4.6
Opus, because it's just way more
efficient, even though each individual
token is smarter.
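The point that a pricier model can still be cheaper per task comes down to price per token times tokens used. Here is a minimal sketch of that comparison. The model names, per-token prices, and token counts are all hypothetical placeholders, not published pricing; the only detail borrowed from the conversation is a roughly 5x per-token price gap.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    price_per_million_tokens: float  # USD, hypothetical
    tokens_per_task: int             # hypothetical average for one task

    def cost_per_task(self) -> float:
        """Effective cost of one task: price per token times tokens used."""
        return self.price_per_million_tokens * self.tokens_per_task / 1_000_000


# Hypothetical: the newer model costs 5x more per token
# but needs far fewer tokens to finish the same task.
older = ModelProfile("older-tier", price_per_million_tokens=25.0, tokens_per_task=2_000_000)
newer = ModelProfile("newer-tier", price_per_million_tokens=125.0, tokens_per_task=300_000)

for model in (older, newer):
    print(f"{model.name}: ${model.cost_per_task():.2f} per task")
```

With these placeholder numbers the newer model comes out cheaper per task despite the higher token price. Whether that holds in practice depends entirely on how many fewer tokens it actually needs.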
>> When I last saw you, Mythos had just come
out maybe the day before or something, or
the card had just come out, and you
said something like, uh, it actually made
you feel like a little scared, it was so
good. What did you mean by that?
>> Anthropic's whole like goal in 2025 was,
and even in a lot of 2024, they're like,
hey, by the end of 2025 we need an L4
software engineer, uh, in our model, and
they by and large achieved that with
4.6 Opus. What they didn't say is that,
you know, if you look at Mythos and
you compare the benchmarks, it's
like an L6 engineer. So L4 is like
pretty new. L6 is like quite well
experienced. I think Anthropic said that
the model internally was available in
February. So in two months they've gone
from L4 engineer to L6 engineer. Uh,
what's next? Um, you know, when you
think about the model progress, it's only
accelerated. Anthropic's release cadence
has compressed. OpenAI's release cadence
has compressed. Why? Because these
models generally to make a better model
you need a few things, right? You need
amazing compute. Compute is very
expensive, and it has a time scale that,
you know, we track, and it's
growing, but it's
sort of set in stone for the
short term. It's
kind of set in stone what you've already
signed. Um, there will be delays and
shifts, and somehow you can find a
little more, but it's generally pretty
set in stone. There's amazing
researchers that people are paying tens
of millions of dollars for. And then
lastly there's implementation, which
historically has been very difficult. If
I have an idea, now I have to implement
it. Implementing is hard. Now the ideas
are there. Implementation is very easy.
It's expensive, but it's very easy. So how
does one decide what ideas to
implement? And it turns out if your
implementation is just so much easier
now you can just implement more ideas
and move on the treadmill faster and
faster and faster. Whether that is AI
model research, and so now your model
release cadence has shrunk down to 2
months from where it was 6 months before,
or, hey, I want to take every
power plant in the US and every
transmission line and model it and run
regressions and see the micro supply and
demand. I can also do that. The idea is
cheap. You know, which idea makes sense?
Which idea is worth the capital that you
have to spend on the tokens? Because the
implementation is there.
That's, I think, the key learning. And
if implementation costs continue to tank,
which they are, um, we don't even have
Mythos yet. It's only been, you know, a
handful of hours since Opus 4.7 launched,
but, you know, my team is pretty excited
about it internally. What now comes to
the world, uh, is a complete reordering
of how, like, economies work. What used to
matter a lot was execution: it was very,
very [ __ ] difficult, and ideas were cheap.
Now ideas are cheap and plentiful, but
execution is very easy. So really only
the good ideas are the ones
that can justify the spend on super
cheap implementation.
>> So are you actually scared, or
does it just introduce
an uncertainty that's hard to grapple
with?
>> Uncertainty is there. Um, but I do
think that causes some fear in terms of
how does society reform itself? How does
one exist in a world where actually, you
know, your ability to implement something
is not actually that important? Your
ability to choose the correct idea for
AI to implement and then your ability to
sell that idea or sell what the AI has
implemented is what matters. Your
ability to garner capital towards that
is what matters. And going back to the
point of like it's very important to
have the newest model always. Who's
going to have access to the newest
model? Anthropic's project, I know it's
not called Earwig, but I troll Anthropic
people by calling it Earwig. Um,
Glasswing, Anthropic Earwig, you know,
where they only release Mythos to
certain companies for cyber. That's just
going to be something that continues.
Models will have less and less
broad deployment. I know OpenAI
and Anthropic and all these people are
like, we want to have great AI for
everyone. AI is very [ __ ] expensive.
Who's going to pay for the trillion
dollars of infrastructure? People who
have money and can build useful
things with AI. And then you don't want
people to distill your models. So you
don't release them broadly. Uh, you
release them to a smaller and smaller set
of customers. Those customers are also now
wrestling over the tokens unless
Anthropic jacks up prices. You know, they
could double their pricing on Opus and I
would continue to pay, and I bet most
users would continue to pay.
>> I bet that wouldn't solve their
humongous capacity problem that they
have. So then the question becomes where
does this cycle end where you know token
usage and therefore the benefits of
those tokens the additional value
generated on top of those tokens
aggregates among fewer and fewer and
fewer companies. I don't have mythos.
You know who has mythos? Top freaking
banks. Um now they're only using it for
cyber security. But at some point I can
envision a world where, hey, maybe,
because I have an enterprise Anthropic
contract and because Anthropic people
kind of like me, they're willing to give
us like slightly earlier access or
slightly higher rate limits or something
for a model. I hope that's what happens.
And then my competitor, whoever that is,
doesn't have that, and I'm able to
[ __ ] crush them. There are people,
like, Ken Griffin of Citadel is
super well-connected and super rich, and
he just signs, you know,
who knows? He goes and signs a deal with
OpenAI or Anthropic that's like, "Yeah, I'm
going to get access to your models. Um,
and I'll buy the first $10 billion worth
of tokens each year. So, whenever you
release the model, you know, I'll spend
the first 10 billion tokens and then
everyone else can get the model after
that." And it's like, okay, well, now
what does that do? Well, now he's going
to crush everyone in the market. That's
just an example. Could be cyber, like
Anthropic is worried about: oh, now I
can hack people. Could be an information
services business like myself, where I
crush someone else. I think, you know,
it's such a broad base. We don't
know what these models can do. Anthropic
doesn't know what these models can do.
No one knows what these models can do.
It's up to the end user to figure out
where they can leverage the tokens to
see what they can build and imagine
which is tremendously productive and
uplifting for humanity. But then what
happens to the concentration of
resources and usage of it?
>> Presumably right now robotics or robots
consume relatively zero tokens versus
everything else. What's your
view of that? Is that like a second
demand curve that could start to
ratchet? There's a new startup every
single day, you know, within a mile of
here trying to build something
interesting in robotics.
>> So there's this concept of a software-only
singularity, which is that the world
has, you know, an AI singularity, but only
in software. And now what about the rest
of the world? The vast majority of the
world is physical. You can see the world
orient around hardware, not software.
That's actually why I think the
software-only singularity is just a blip
and not, like, you know, we do get
everything else, because once software is
super easy... what makes robots really hard,
it's like programming microcontrollers
and actuators, and controlling all this
stuff is very difficult right now. The
interesting thing about AI models
is they're actually really inefficient
in learning. It's just we're able to give
them so much data that they're able to
learn and surpass us in certain ways.
Robots, currently, the robot models, um,
VLAs, uh, vision-language-action models,
which are very popular right now, are
probably not going to be the thing that
ultimately scales. They are
inefficient in data, um, and we can't
scale the data for them fast enough.
There is going to be some way to
pre-train robot models at large scale, just
like humans see all this data throughout
their lives. And what's interesting is
the reason why we humans are so good is
we're sample efficient. One example, two
examples, we're good. And so, applying
that to robotics: once you
have this software-only singularity,
implementation is super cheap, and people
can start to build models such that robots
are actually useful. And so I think in
the next six to 18 months we'll start
seeing real breakthroughs in robotics
that enable few-shot learning, i.e.,
there's a pre-trained robot model and
now there's a robot that you have hired
or bought or whatever. You show it a few
examples and it's able to do it. You
tell it to stack these two things or you
tell it, hey, this can can actually like
balance perfectly, you know, and and it
starts doing these things.
>> Nicely done.
>> One shot.
>> No, trust me, I've spilled plenty of
times.
>> So, I think robots will get few-shot
learning. Right now, you know,
there's a lot of companies doing robots
for like, you know, advertisement or
robots for like simple stuff like that,
but it'll be like, oh, folding clothes,
but it's going to get really niche like
robots just for cleaning chalkboards.
Um, and it's a rental service or, you
know, it'll be it'll be a model package
that you download onto your standard
robot that then does that, right? And
you pay for that. And anyways, there
will be a huge explosion in physical
goods acceleration and deflationary
effects there. And so that's
ultimately going to keep token
demand going crazy. I don't think
token demand slows down, personally.
>> Did you learn anything else about the
world based on Mythos's results and how
it was built? My way of asking, like,
you know, if you break down the
components of the scaling laws, like the
>> So Mythos is a materially larger model
than prior models, and so yes, it is a
much larger model. Now, what chip
it's trained on is
not really relevant. It's the scale, and
obviously, you know, 100,000
Blackwells is equivalent to hundreds of
thousands of prior generation chips.
TPUs and Trainium have their different
release cadence, so it's not exactly
mirrored one to one. Um, but
ultimately, yes, Mythos is a significantly
larger model. It's proof that the
scaling laws still work. Um, everything
about it shows the trend line of models
continues. More compute into model makes
model better. And along the whole way,
it's not just more compute into model
makes model better. Along the whole way
we're also getting these compute
efficiency wins, which are, you know,
all this research compute that the labs
are spending is actually turning into: if
I want an X capability tier model, every 6
months, or every two months,
that cost is dramatically decreasing. But
then if I scale it up massively, I get a
humongous capability jump as well. And so
yes, it's proof that this is still
happening. Google and Anthropic are not
heavy users of GPUs on the
training side, but OpenAI, they'll
start having their new class of models. I
think they're taking a more sensible,
principled approach to scaling, uh, in
small steps. Anthropic really went for a
huge jump. We'll see better and better
models throughout the year and the
release cadence is only going to get
faster.
>> We've gone a long way in the
conversation with saying almost nothing
about OpenAI which would have been so
strange.
>> So, this is the interesting
thing. Everyone's like, okay, so
Anthropic's just won, right? You know,
they had Mythos in February. They never
even released it cuz they didn't feel
the need to. They're already sold out.
Their revenue is already adding $10
billion a month. Um, and then you've got
Opus 4.7 today, all before OpenAI's, you
know, um, alleged Spud release, which, you
know, media such as The Information and
others have posted about. So
clearly Anthropic is in the lead, right,
and OpenAI is cooked. What's interesting
is, because Anthropic has such bounds on
compute and they can only grow it so
fast, and sort of to the point of,
you know, Dario used to gloat
about how OpenAI was being too
aggressive on compute and Anthropic was
more sensible in their scaling, and now
Anthropic is like, [ __ ], I
wish we had a lot more compute. OpenAI
is able to pay the bills perfectly fine.
In fact, they've raised a ton of money
to get incremental compute in addition
to the irresponsible levels of compute
that they were buying from Oracle and
CoreWeave and SoftBank and all these
people, and Microsoft, uh, you know.
Now they're getting Trainium as
well from Amazon. Um, so they've done
this, like, insane thing on compute, and
they also know they need
more. But what's interesting is if you
were to say Opus 4.6, you know, let's
ignore models getting better over time.
Let's just take diffusion of this
technology. You and I may jump on
the model immediately, day one, but other
businesses take time, and it takes time
for people to learn, and the spark of, oh
[ __ ], the Claude psychosis moment doesn't
hit everyone at the same time. And so by
the end of the year, let's say the economy
would spend $100 billion on a 4.6 Opus
tier model. I don't think that's
unreasonable. It's spending $40 billion
right now.
>> That's like a linear extrapolation.
>> It's a linear extrapolation, not a not
an exponential. To get the exponential,
you need the better models. Anthropic
won't have enough compute to do that.
And so and and presumably OpenAI and
Google will hit that tier soon enough.
Whoever hits that tier next, sure,
Anthropic may get to charge 70 plus%
gross margins, but if OpenAI hits it
next, they charge 50% gross margins.
They still get all of this incremental
demand. And probably they also won't
have enough compute to serve all the
users. And so, sure, maybe Mythos is a
model where if the world had enough
compute, it'd be $500 billion of revenue
or something crazy. There is such demand
for these tokens and such limitations on
compute, you know, and we see this with
H100 prices skyrocketing and the useful
life of these GPUs continuing to extend.
It's pretty clear even the tier 2 lab is
going to be sold out of tokens, let
alone the tier one lab. The tier one lab
will have better margins, but the tier
two lab will be sold out and probably
the tier three lab will also be close to
sold out. Economic value that the best
model can deliver is growing faster than
our ability to actually serve those
tokens to people via the infrastructure.
And so this gap will continue to grow
and the model labs will continue to have
expanding margins until people in the
hardware supply chain infrastructure
supply chain are like wait no why don't
I just jack up my margins.
>> So suffice to
say I think the assessment today or your
assessment of the demand side is
completely explosive in your own
particular example here at SemiAnalysis
but just more broadly that as people
fall in, you call it AI psychosis, as
people fall into this experience of what
they can do, the implementation
difficulty going completely away, I I've
certainly felt that, you know, my own
token spend is just through the absolute
roof just in the matter of weeks, so that
that feels like a pretty good assessment.
Anything we're missing on the demand
side?
>> If you don't use more tokens, you'll
never escape the permanent underclass.
>> Just expand on that.
>> So either either you use more tokens and
you generate economic value outsized
economic value for the use of those
tokens. Um a lot of people are doing it
the boring lazy way. Oh, I guess I'll
just work one hour a day instead of
eight hours a day and I'll have AI do
most of my job. That's the boring way.
The cool way is I'll still work eight
hours a day and I'll I'll do 8x the work
and maybe I'll make 5x the money. Um
maybe not you can't do this with a job
obviously. There's people who have
multiple jobs. Um there's people who
like start companies and start selling
stuff. Get that economic value on on
this AI before everyone is using it and
it's table stakes. Uh because it's still
not table stakes. If you don't use more
tokens and generate the value from them
and capture that value... There's
three different problems here. Using
more tokens, generating value from those
tokens and capturing
the value that you created
from the tokens. Uh if you don't do
these three things, you'll never escape
the permanent underclass, i.e. as models
continue to skyrocket in capability and
the concentration of resources
potentially happens.
>> Okay, let's talk about supply. what is
going on like how would you describe the
frontier of what's changing or what is
changing at the frontier of supplying
the the entire stack that's required to
serve all these tokens as the demand
curve explodes
>> As demand skyrockets prices are going up
for everything on the supply side um
whether it be the GPUs,
uh their prices are going up. In addition
their useful life is extending.
>> H100 prices look like this.
>> Yeah exactly. There's people who have
argued GPUs' useful lives are less than 5
years. Complete nonsense.
Um there are clusters now re-signing,
three- or four-year-old Hopper clusters
re-signing for 3 or 4 more years. Um
there's A100 clusters that are re-signing
for another couple years. So the useful
life is clearly not 5 years. It's maybe
even seven or eight years. Um arguably
we we don't know yet. We'll see. We'll
see when Hopper gets there, but it it's
clearly not 5 years. So the useful life
is extending and the prices are going up
on that renewal. So in effect the gross
margin was not 35% on a cluster, it's
beyond that. Um so margins are expanding
in the in the cloud layer. Margins are
um extremely healthy on the hardware
layer with you know Nvidia still
charging 75 or whatever percent gross
margin as we move down the stack. Memory
obviously margins have skyrocketed
there. Places like optics and logic
there are large prepayments um and
margins are growing slowly um more so
the companies that are making chips like
Nvidia are paying huge prepayments. So
in effect the cost of capital or timing
of cash flow return on invested capital
is going up even if the gross margin
isn't. And you see this across the whole
supply chain. You see ASML is completely
sold out and they need Carl Zeiss to
expand faster. Everywhere along the
chain
everyone's either sold out and margins
are going up or they're getting
prepayments, which increases the return on
invested capital because the invested
capital is lower. And so this is like a
consistent trend across any part. It's
it's even like you know a PCB to make a
PCB requires copper foil and that copper
foil is sold out and people are making
prepayments for it. It's like anything
and everything that like has a pulse and
is like sold out. People are like
jumping to get more incremental supply
and fighting over the supply for the
years after.
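The useful-life point in this answer can be made concrete with a small
calculation. This is only a rough sketch: the cost, revenue, and opex
figures are hypothetical placeholders, not numbers from the interview, and
they are picked so that a 5-year life gives the 35% margin mentioned above.
It assumes straight-line depreciation and holds rental revenue flat, which
understates the effect if renewal prices are also rising.

```python
# Illustrative only: every number below is a hypothetical placeholder,
# chosen so the 5-year case lands on the 35% margin mentioned above.

def gross_margin(capex, annual_revenue, annual_opex, useful_life_years):
    """Gross margin with straight-line depreciation over the useful life."""
    annual_depreciation = capex / useful_life_years
    annual_cost = annual_depreciation + annual_opex
    return (annual_revenue - annual_cost) / annual_revenue

CAPEX = 100.0          # hypothetical cluster cost
ANNUAL_REVENUE = 40.0  # hypothetical rental revenue per year, held flat
ANNUAL_OPEX = 6.0      # hypothetical power, hosting, and staffing per year

for life in (5, 8):
    margin = gross_margin(CAPEX, ANNUAL_REVENUE, ANNUAL_OPEX, life)
    print(f"{life}-year useful life: {margin:.1%} gross margin")
```

Spreading the same purchase price over more years lowers the yearly
depreciation charge, so the margin on the same revenue comes out higher.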
>> What do you think are the most important
bottlenecks? Like typically in economic
history when there's this kind of
demand, supply reorients and rises very
very quickly to meet the demand. It
seems like it's almost impossible for
supply right now in this moment to keep
up. You know, famous last words, every
every shortage is followed by a glut
historically. But what are the most
interesting bottlenecks to you across
the supply side?
>> Supply chains are usually very fast to
react. Um, one unique thing is that our
supply chains now are more complex than
ever. and the things we're building are
more complex than ever and therefore the
lead times are longer. Um, and it's not
like we haven't seen 18-month-long lead
times in other industries. It's just
building incremental supply didn't take
years. Um, and this is the case with
memory, right? Memory can only grow
capacity, you know, low double digit
percentages a year, right? 20s 30% a
year. Um, even less for NAND, a little
bit higher for DRAM. Even though the
demand signal was very strong at the end
of 2025, the memory companies
immediately sort of started reacting.
None of that incremental capacity really
gets here until the second that they've
decided to do in addition to the typical
20 to 30%. You know, they can stretch a
little bit, but really the true
incremental supply doesn't come till '28,
which is a very unique thing. Even if
they wanted to build as fast as
possible, it doesn't come till '28, uh,
late '27 at best. And so the result
is memory prices have, you know, gone
through the roof. And guess what?
they're going to double and triple
again. Um, at least on DRAM especially.
People are like, "Oh, the memory story
is overplayed. Everyone gets it." And
it's like, "No, no, no. You don't get
it." DRAM will double or triple from here
still because that's that's how much
capacity is required and they have to
steal capacity from somewhere else. And
the only way to steal capacity from
somewhere else in a in a capitalist
economy is demand destruction via higher
pricing. We're not like rationing stuff
here. And so ultimately, that's what's
going to happen. And so margins continue
to go up. Um, I think Logic also has
humongous uh capacity problems. TSMC
just had their earnings. Uh, they keep
upping capex. Ultimately, you know, it
takes them quite some time to build
fabs. Um, they're trying to do
everything they can to squeeze every
little output out of every fab that they
have. But ultimately, they're not
raising prices fast because they're good
people, it seems like. Um, you know,
single-digit price increases instead of,
you know, triple-digit price increases
like the memory guys have had. And so
you ultimately have like this like
market where yeah TSMC is a great
company but are they are they actually
going to extract all the value? I
mentioned things like copper foil, glass
fibers for PCBs, lasers. These are
things that are like well understood and
niche supply chains but they're very
very tight. Um and ultimately upstream
the semiconductor wafer fabrication
equipment supply chain is one that like
I still think is it's gone up a lot but
it's still very underappreciated. TSMC
capex this year they say 56. Uh we've
had 57.4 billion since January. Um,
and we may up it slightly more just
because we see some some ways that they
can get incremental capex. But what
people aren't focusing on is what does
that mean next year and what does that
mean the year after? And it turns out 3
years from now TSMC is going to spend
$100 billion on capex. Uh, maybe two
years from now, right? Maybe '28.
Sincerely, they may spend $100 billion
on capex in 2028. And people like just
can't fathom that. But what does that
mean for their downstream supply chains?
um you know companies like Lam Research
or Applied Materials or ASML or their
further downstream supply chains like
MKSI and and all these other companies
the tail whip just gets whipped harder
and harder and harder and ultimately
that's a shortage if you know TSMC wants
to spend $100 billion in 2028 which is a
real possibility I think people would
think that's insane but that's a real
real possibility
>> What about other parts of the chip
ecosystem where GPUs have been
completely dominant? What about like CPUs
or ASICs or things that start to pop out
as both opportunities and bottlenecks
beyond just like Nvidia's GPU dominance?
>> Yeah, I mean ASICs are obviously taking
off, but I'll sort of pivot away from AI
chips to talk about these other things.
There's a project we did on FPGAs and
it turns out there's 120 FPGAs per
per um next generation rack, um AI rack,
and then like what about all the FPGA
names? CPU-wise, all these reinforcement
learning environments plus all the slop
code you and I are generating that is
now running on some, you know, Vercel
instance or whatever it is um or some
AWS instance or some bucket that we've
spun up, all of that requires CPU and so
CPUs are completely sold out and demand
is skyrocketing there
>> Help people understand the role that CPU
plays in everything.
>> Yeah. So there's two there's two main
reasons why you need tons of CPUs. One
is when you're doing reinforcement
learning um the CPU is very critical to
that. So so before you would throw all
the internet's data into the model,
train it, and it spits
some stuff out. Now you
put all the
internet data into the model. Then you
put it in this environment. This
environment is like hey model try this
out and it tries stuff out. It tries a
bunch of different things and in the end
there is an environment which scores
whether or not what it tried out is
successful and it grades it. And these
environments can be anything. It can be,
hey, check if the text was outputted in
the right way, structured outputs. It
can be very simple stuff. It can be very
complex stuff. Um, and people are
starting to get into very complex
things, right? Like, hey, I want you to
open this file, change it, edit it,
update it, submit it to this website. I
want you to open up this physics
simulation from Siemens and edit this
CAD model. So the environments can get
more and more complex and those
environments run on CPUs. They don't run
on GPUs. They don't run on ASICs. The ASICs
run the model that takes the input data
from the environment, runs it through
the model. The model creates outputs of
various different trajectories, right?
Ways that it thinks it could solve it um
in different instances. Those
trajectories are graded/scored and
the ones that are successful you train
on and you update and you reiterate and
you iterate iterate iterate and so CPUs
are very useful for that, one. And then
once you have these great models and
you're deploying them those models are
generating code they're generating
useful output that useful output it
doesn't go from a GPU straight to the
human brain um it goes from a GPU or an
ASIC through to you know a deployed app
that you're deploying somewhere that
actually just runs on CPUs so that's
another area where there's a lot of
demand and and things are sold out um in
a large large way.
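The reinforcement learning loop described in this answer can be sketched in
a few lines. This is a toy under stated assumptions, not how any lab
actually does it: the "model" is a hypothetical stand-in that picks from
canned outputs, the environment is the simple structured-output check
mentioned above, and the weight update itself is left out. All function
names here are made up for illustration.

```python
import json
import random

# Hypothetical stand-in for a model: it just picks one of a few canned outputs.
CANDIDATE_OUTPUTS = [
    '{"answer": 4}',      # valid JSON with the expected key
    '{"answer": 4',       # malformed JSON
    'the answer is 4',    # not JSON at all
    '{"result": 4}',      # valid JSON, wrong key
]

def sample_trajectories(prompt, n, rng):
    """Accelerator side in the description above: the model proposes n attempts."""
    return [rng.choice(CANDIDATE_OUTPUTS) for _ in range(n)]

def grade(output):
    """CPU side: the environment scores one attempt (1.0 pass, 0.0 fail)."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(parsed, dict) and "answer" in parsed else 0.0

def collect_successes(prompt, n, rng):
    """Keep only the trajectories the environment graded as successful."""
    trajectories = sample_trajectories(prompt, n, rng)
    return [t for t in trajectories if grade(t) == 1.0]

rng = random.Random(0)
kept = collect_successes("What is 2 + 2? Reply as JSON.", n=8, rng=rng)
print(f"kept {len(kept)} of 8 trajectories for the next training update")
```

In a real setup the grading step would be the heavy part, for example
running a file edit, a website submission, or a simulation, which is why
the environments soak up so much CPU.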
>> As you continue to assess and try to be
the world's best informed person on both
the trajectory of supply and demand,
what are things that you wish you knew
to make that understanding that you
don't know?
>> I think the hardest area for us um and
for everyone is understanding
tokenomics, economics of tokens. Um, I
think we have a really tremendously like
good insight into how much it costs to
run infrastructure, what the cost of
tokens are, what the cost of models are,
what the margins of these labs are, but
the usage and adoption is what's really
difficult to model, you know,
continuously, right? We we have these,
like, in January, we
had crazy estimates for February,
Anthropic smashed them. How do we
calibrate this model? What are the data
sources for this? February, uh, we had
crazy assumptions for March and then
they smashed them. And everyone sees the
number of 10 billion and they're like,
what the, how do they add 10 billion in
revenue? Who is using all these tokens?
Why are they using them? What are they
building with them? And then more
importantly with what they're building
with these tokens, how is that actually
diffusing into the economy? And what
value is that generating? Because it's
not really something that you can
capture in any any GDP statistic, right?
all of the value of the tokens that I
use get transformed into better
information which I then sell at a
discount to what people used to sell
information for relatively because um
and therefore that information is now
making its way throughout the economy
and and people are making better
investment decisions or better
competitive decisions if they're a semi
company or data center company or
hyperscaler and now how how much what
what is the value of this and what has
that what has that done to the economy
it's clearly by every subjective metric
amazing Amazing. But where is the
phantom GDP? What is the phantom GDP?
How do we track the real economic?
Because because the GDP metrics are not,
you know, accurate if you were to say
what is the GDP that Dylan Patel is
making. It's tiny compared to the
value that I think is being created. And
so ultimately, what is the value being
created by these tokens? Not on a basis
of, you know, just simple, you know,
what is the knock-on effect, right? What
is the knock-on effect of all the things
that these things are doing? I think
that's the real uh question and
challenge uh that's hard to measure. I
think we've got a tremendous, you know,
reading on the supply side of things. I
think we've got a tremendous reading on
even a lot of the demand side signals,
but it's it's what is the value these
tokens are generating. That's hard to
quantify and measure.
>> I hope we get a chance to do this like
every 3 months because this changes so
quickly. What do you think is going to
happen next? Like when I when I come
back 3 months from now and we're in San
Francisco together again, what do you
expect?
>> Large scale protests.
>> Really?
>> Yeah. Yeah, I think there will be a
large scale protest against Anthropic
>> and OpenAI.
>> Expand on that a little more.
>> Um, people hate AI. Um, AI is less
popular than ICE, less popular than
politicians. Confused how Pew surveyed
this, but apparently AI is less popular
than politicians. You know, with
Anthropic adding so much revenue, that's
going to start causing business changes
downstream. People are going to get more
and more scared of AI. they'll start
blaming more and more of their own
problems and things that are, you know,
global, you know, have been deep-seated
problems for a long time. Those will
bubble up and be blamed on AI. Um,
probably some politician or some social
media influencer will be able to start
taking and weaponizing AI against
people. You look at the comments of news
articles where Sam Altman had a Molotov
cocktail thrown at his house twice in
like two weeks. They're like, people are
cheering it on. Uh, and this is just the
beginning. So, I think I think we'll see
large scale protests against AI in three
months.
>> What is the counterweight to that? Like, how
should the AI industry head that off?
>> First of all, Sam Altman and Dario have
to stop getting on interviews. They're
so uncharismatic.
I don't know what they're doing. Every
interview they do is like, wow, normal
people are going to hate you even more.
Like, Sam being on Tucker Carlson
probably made all Republicans hate
OpenAI. And same with Dario. They just
have no charisma. I think that's first.
Two, they need to start showing
uplifting things that can be done with
AI. Um, three, they need to stop talking
about how the capabilities are going to
change the whole world constantly
because then people are going to get
fear of that capability because they
have no connection.
>> They don't know how to use it. Yeah.
>> There's no connection to it either. It's
like the average person doesn't know an
Anthropic employee. The average person
doesn't know an OpenAI employee. The
average person doesn't know who these
people are, what their goals are, and
they just view them as like this like
sneaky cabal of like 5,000 people at
this company that are going to change
the world and automate all the jobs and
and destroy society. That's what they
view it as. And and as people who are
funding the building of all these data
centers and and power plants that are
going to pollute the world, right? They
don't quite understand what's happening.
You know, they have to stop talking
about the future thing that's going to
happen and only talk about present, how
uplifting AI is. I think it's a huge
reorg and rebranding that needs to be
done.
>> I love doing this with you. Thanks for
your time.
>> Awesome. Thanks.