Dylan Patel — The single biggest bottleneck to scaling AI compute

0:00

All right, this is the episode where  my roommate teaches me semiconductors. 

0:04

It's also the send off for this current set. It is. After you use it, I'm like,  

0:09

"I can't use this again. I gotta get out of here." 

0:11

No sloppy seconds for Dwarkesh. Dylan is the CEO of SemiAnalysis. 

0:18

Dylan, here’s the burning question I have for you. If you add up the big four—Amazon, Meta, Google,  

0:23

Microsoft—their combined forecasted CapEx this  year that you published recently is $600 billion. 

0:31

Given yearly prices of renting that compute,  that would be close to 50 gigawatts. 

0:38

Obviously, we're not putting  on 50 gigawatts this year,  

0:40

so presumably that's paying for compute that is  going to be coming online over the coming years. 

0:46

How should we think about the timeline around  when that CapEx comes online? Similar question  

0:51

for the labs. OpenAI just announced they  raised $110 billion, and Anthropic just  

0:57

announced they raised $30 billion. If you look at the compute they  

1:01

have coming online this year—you should  tell me how much it is, but is it on the  

1:08

order of another four gigawatts total? The cost to rent the compute that OpenAI  

1:10

and Anthropic will have this year to sustain their  compute spend is $10 to $13 billion a gigawatt. 

1:18

Those individual raises alone are enough  to cover their compute spend for the year. 

1:23

And this is not even including the revenue  that they're going to earn this year. 

1:26

So help me understand: first,  what is the timescale at which  

1:30

the Big Tech CapEx actually comes online? And second, what are the labs raising all  

1:34

this money for if the yearly price of a  one-gigawatt data center is $13 billion? 
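
A quick back-of-envelope version of the arithmetic in the question (a sketch using only the round numbers quoted; the $12 billion-per-gigawatt figure is a midpoint assumption within the $10-13 billion range):

```python
# Back-of-envelope check of the figures in the question above.
# All inputs are the round numbers quoted in the conversation.

big_four_capex = 600e9             # combined forecasted CapEx: Amazon, Meta, Google, Microsoft
rent_per_gw_per_year = 12e9        # assumed midpoint of the $10-13B/GW yearly rental cost

print(big_four_capex / rent_per_gw_per_year)    # ~50 gigawatt-years of rented compute

openai_raise, anthropic_raise = 110e9, 30e9     # announced raises
print(openai_raise / rent_per_gw_per_year)      # ~9 GW-years of rent covered
print(anthropic_raise / rent_per_gw_per_year)   # ~2.5 GW-years of rent covered
```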

1:41

So when you talk about the CapEx of these  hyperscalers being on the order of $600 billion,  

1:46

and you look across the rest of the supply chain,  it gets you to the order of a trillion dollars. 

1:51

A portion of this is immediately for compute  going online this year: the chips and the  

2:00

other parts of CapEx that get paid this year. But there's a lot of setup CapEx as well. 

2:05

When we're talking about 20 gigawatts of  incremental added capacity this year in America,  

2:11

a portion of this is not spent this year. A portion of that CapEx was actually  

2:16

spent the prior year. When you look at Google  

2:19

having $180 billion of CapEx, a big chunk of that is spent on turbine deposits for '28 and '29. 

2:25

A chunk of that is spent on data  center construction for '27. 

2:28

A chunk of that is spent on power purchasing  agreements, down payments, and all these other  

2:33

things they're doing further out into the future  so they can set up this super fast scaling. 

2:40

This applies to all the hyperscalers  and other people in the supply chain. 

2:45

So with roughly 20 gigawatts deployed this year,  a big chunk is hyperscalers, and a chunk is not. 

2:51

For all of these companies, their biggest  customers are Anthropic and OpenAI. 

2:55

Anthropic and OpenAI are at roughly  two to two-and-a-half gigawatts right  

3:02

now, and they're trying to scale much larger. If you look at what Anthropic has done over the  

3:08

last few months, with $4 billion  or $6 billion in revenue added,  

3:11

we can just draw a straight line and say they'll  add another $6 billion of revenue a month. 

3:16

People would argue that’s bearish,  and that they should go faster. 

3:18

What that implies is they're going to add $60  billion of revenue across the next ten months. 

3:26

At the current gross margins Anthropic had,  as last reported by media, that would imply  

3:33

they have roughly $40 billion of compute spend for  that inference, for that $60 billion of revenue. 

3:39

That $40 billion of compute, at roughly  $10 billion a gigawatt in rental costs,  

3:44

means they need to add four gigawatts of  inference capacity just to grow revenue. 

3:49

That’s assuming their research and  development training fleet stays flat. 

3:55

In a sense, Anthropic needs to get to well  above five gigawatts by the end of this year. 

3:59

It's going to be really tough for  them to get there, but it's possible. 
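
Written out, that revenue-to-gigawatts arithmetic looks roughly like this (a sketch using the round numbers above; the straight-line extrapolation and implied margin are the ones quoted, not precise figures):

```python
# Anthropic back-of-envelope: added revenue -> implied added inference capacity.

added_revenue = 6e9 * 10                  # ~$6B of new revenue a month, over ten months
compute_share_of_revenue = 40 / 60        # ~$40B of compute behind ~$60B of revenue,
                                          # per the last-reported gross margins
compute_spend = added_revenue * compute_share_of_revenue
rent_per_gw_per_year = 10e9               # ~$10B/yr to rent a gigawatt for inference

print(compute_spend / rent_per_gw_per_year)   # ~4 GW of added inference capacity,
                                              # on top of a flat training/R&D fleet
```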

4:01

Can I ask a question about that? If Anthropic was not on track to have  

4:06

five gigawatts by the end of this year, but it  needs that to serve both the revenue that's gone  

4:12

crazier than expected—and maybe it's going to be  even more than that—plus the research and training  

4:16

to make sure its models are good enough for next  year: Where is that capacity going to come from? 

4:21

Dario, when he was on your  podcast, was very conservative. 

4:24

He said, "I'm not going to go crazy on  compute because if my revenue inflects  

4:28

at a different rate, at a different  point… I don't want to go bankrupt. 

4:31

I want to make sure that we're being  responsible with this scaling." 

4:35

But in reality, he's screwed the pooch  compared to OpenAI, whose approach was,  

4:40

"Let's just sign these crazy fucking deals." OpenAI has got way more access to compute  

4:46

than Anthropic by the end of the year. What does Anthropic have to do to get the compute? 

4:50

They have to go to lower-quality providers  that they would not have gone to before. 

4:56

Anthropic historically had the best  quality providers, like Google and  

5:00

Amazon, the biggest companies in the world. Now, with Microsoft, they're expanding across the supply 

5:07

chain, and they're going to other newer players. OpenAI has been a bit more  

5:12

aggressive on going to many players. Yes, they have tons of capacity from Microsoft,  

5:16

Google, and Amazon, but they also  have tons with CoreWeave and Oracle. 

5:20

They've gone to random companies, or companies  one would think are random, like SoftBank Energy,  

5:25

who has never built a data center in their life  but is building data centers now for OpenAI. 

5:29

They've gone to many others,  like NScale, to get capacity. 

5:35

There's this conundrum for Anthropic because  they were so conservative on compute,  

5:42

because they didn't want to go crazy. In some sense, a lot of the financial  

5:46

freakouts in the second half of last year  were because, "OpenAI signed all these  

5:50

deals but they didn't have the money to pay  for them…" Okay, Oracle's stock is going to  

5:55

tank, CoreWeave's stock is going to tank. All these companies' stocks tanked,  

5:58

and credit markets went crazy because people  thought the end buyer couldn't pay for this. 

6:02

Now it's like, "Oh wait,  they raised a ton of money. 

6:04

Okay, fine, they can pay for it." Anthropic was a lot more conservative. 

6:07

They were like, "We'll sign  contracts, but we'll be principled. 

6:11

We'll purposely undershoot what we think we  can possibly do and be conservative because  

6:16

we don't want to potentially go bankrupt." The thing I want to understand is, what does  

6:20

it mean to have to acquire compute in a pinch? Is it that you have to go with neoclouds? Do they  

6:26

have worse compute? In what way is it worse? Did you have to pay gross margins to a cloud  

6:31

provider that you wouldn't have otherwise had to  pay because they're coming in at the last minute? 

6:35

Who built the spare capacity such  that it's available for Anthropic  

6:39

and OpenAI to get last minute? What is the concrete advantage  

6:42

that OpenAI has gotten if they end up  at similar compute numbers by 2027? 

6:48

Are they just going to end this  year with different gigawatts? 

6:50

If so, how many gigawatts are Anthropic and  OpenAI going to have by the end of this year? 

6:56

To acquire excess compute, yes,  there is capacity at hyperscalers. 

7:01

Not all contracts for compute  are long-term, five-year deals. 

7:04

There's compute from 2023 or 2024, or H100s  from 2025, that were signed at shorter terms. 

7:11

The vast majority of OpenAI's compute is signed  on five-year deals, but there were many other  

7:16

customers that had one-year, two-year,  three-year, or six-month deals, on demand. 

7:20

As these contracts roll off,  who is the participant in the  

7:24

market most willing to pay up? In this sense, we've seen H100 prices 

7:30

inflect a lot and go up. People are willing to  

7:34

sign long-term deals for above $2/hour, even. I've seen deals where certain AI labs—I'm being a 

7:42

little bit vague here for a reason—have signed at  as high as $2.40 for two to three years for H100s. 

7:49

If you think about the margin, it costs $1.40/hour to run a Hopper, amortized across five years. 

7:57

Now, two years in, you're signing deals for two  to three years at $2.40? Those margins are way  

8:03

higher. Now you can crowd out all of these other  suppliers, whether Amazon had these, or CoreWeave,  

8:09

or Together AI, or Nebius, or whoever it is. These neoclouds are the firms that had a  

8:19

higher percentage of Hopper in general  because they were more aggressive on it. 

8:23

They also tended to sign shorter-term  deals, not CoreWeave but the others. 

8:30

So if I want Hopper, there  is some capacity out there. 

8:33

Also, while most of the capacity at an Oracle  or a CoreWeave is signed for a long-term deal  

8:39

in terms of Blackwell, anything that's  going online this quarter is already sold. 

8:44

In some cases, they're not even hitting all the  numbers they promised they would sell because  

8:48

there are some data center delays, not just those  two, but Nebius, Microsoft, Amazon, and Google. 

8:53

But there are a lot of neoclouds, as well as some  of the hyperscalers, who have capacity they're  

8:57

building that they haven't sold yet, or capacity  they were going to allocate to some internal use  

9:02

that is not necessarily super AGI-focused,  that they may now turn around and sell. 

9:06

Or in the case of Anthropic, they don't  have to have all the compute directly. 

9:10

Amazon can have the compute and serve Bedrock,  or Google can have the compute and serve Vertex,  

9:15

or Microsoft can have the compute  and serve Foundry, and then do a  

9:18

revenue share with Anthropic, or vice versa. Basically, you're saying Anthropic is having to  

9:22

pay either this 50% markup in the form of the revenue share, or in the form of last-minute 

9:28

spot compute that they wouldn't have otherwise  had to pay had they bought the compute early. 

9:32

Right, there's a trade-off  there. But at the same time,  

9:38

for a solid four months, everyone was saying to  OpenAI, "We're not going to sign deals with you." 

9:43

That sounds crazy, but it was  because, "you don’t have the money." 

9:45

Now everyone's saying, "OpenAI,  we believed you the whole time. 

9:48

We can sign any deal because  you've raised all this money." 

9:51

Anthropic is constrained in that sense. There are not that many incremental buyers of  

9:58

compute yet, because Anthropic hit the capability  tier first where their revenue is mooning. 

10:03

That's interesting. Otherwise you  might think having the best model is an  

10:08

rapidly depreciating asset, because three months later you don't have the best model. 

10:12

But the reason it's important is that you  can sign these deals, lock in the compute  

10:16

in advance, and get better prices. Maybe this is an obvious point. 

10:22

But at least until recently, people had made this  huge point about the depreciation cycle of a GPU. 

10:30

The bears, the Michael Burrys or  whoever, have said, "Look, people  

10:33

are saying four or five years for these GPUs. But maybe, because the technology is improving 

10:41

so fast, it in fact makes sense to have two-year depreciation cycles for these GPUs," 

10:46

which increases the reported amortized CapEx  in a given year and makes it financially  

10:53

less lucrative to build all these clouds. But in fact you’re pointing out that maybe the  

10:58

depreciation cycle is even longer than five years. If we're using Hoppers—especially if AI really  

11:03

takes off and in 2030 we’re saying, "We have  to get the seven-nanometer fabs up, we have  

11:08

to go back and turn on the A100s again"—then the  depreciation cycle is actually incredibly long. 

11:18

I feel like that's an interesting financial  implication of what you're saying. 

11:21

There's a few strings to pull on there. One is, what happens to depreciation of GPUs? 

11:30

I guess I didn't answer your prior question,  which is that I think Anthropic will be able to  

11:34

get to five gigawatts-ish, maybe a little  bit more by the end of the year through  

11:38

themselves as well as their product being  served through Bedrock, Vertex, or Foundry. 

11:45

I think they'll be able to get to five or six  gigawatts, which is way above their initial plans. 

11:53

OpenAI will be roughly the same, actually  a little bit higher based on our numbers. 

11:59

But anyway, the depreciation cycle of a GPU. Michael Burry was saying it's three years  

12:04

or less. That’s sort of his argument.  There are two lenses to look at this. 

12:09

Mechanically, there's a TCO model, total cost of  ownership of a GPU, where we project pricing out  

12:17

for GPUs and build up the total cost of a cluster. There are a number of costs: your data center  

12:23

cost, your networking cost, your smart hands and  people in the data center swapping stuff out. 

12:29

There's your spare parts, your  actual chip cost, your server cost. 

12:32

All these various costs get lumped together. There's some depreciation cycles on it,  

12:37

certain credit costs on it. You build up to, "Hey, an H100 costs  

12:42

$1.40/hour to deploy at volume across five  years if your depreciation is five years." 

12:48

If you sign a deal at $2/hour for those five  years, your gross margin is roughly 35%. 

12:53

It's a little bit above that. If you sign it for $1.90, it's 35% roughly. 

12:58

Then you assume at that fifth year,  the GPU falls off a bus and is dead. 
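
A stripped-down sketch of that TCO framing (not the actual SemiAnalysis model; the $1.40/hour cost basis and the contract prices are the round numbers quoted, and the exact margin depends on the full cost build-up):

```python
# Toy GPU rental-margin calculation in the spirit of the TCO model described.
# $1.40/hr is the quoted all-in H100 cost over a five-year depreciation window.

def gross_margin(all_in_cost_per_hr: float, rental_price_per_hr: float) -> float:
    """Gross margin on renting out a GPU at a given hourly price."""
    return (rental_price_per_hr - all_in_cost_per_hr) / rental_price_per_hr

h100_cost_per_hr = 1.40
for price in (1.90, 2.00, 2.40):
    print(f"${price:.2f}/hr -> {gross_margin(h100_cost_per_hr, price):.0%} gross margin")

# Deals signed late, at $2.40/hr two years into the chip's life, carry noticeably
# fatter margins than the original roughly-$2/hr contracts.
```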

13:03

In some cases, the argument people are making is  if you didn't sign a long-term deal, because every  

13:09

two years NVIDIA is tripling or quadrupling the performance while only increasing the price 

13:15

by 50% or 2x… then the price of an H100 falls. Sure, maybe the value in the market was $2 at 35% 

13:20

gross margins in 2024, but in 2026, when Blackwell  is in super high volume and deploying millions  

13:28

a year, you’re actually now worth $1/hour. And when Rubin in '27 is in super high volume—even  

13:33

though it starts shipping this year, it’s super  high volume next year—doing millions of chips a  

13:38

year deployed into clouds, you've got another  3X in performance, another 50% or 2X in price,  

13:44

then the Hopper is only worth $0.70/hour. So the price of a GPU would continue to  

13:49

fall. That's one lens. The other lens is,  what is the utility you get out of the chip? 

13:54

If you could build infinite Rubin  or infinite of the newest chip,  

13:59

then yes, that's exactly what would happen. The price of a Hopper would fall at a spot  

14:04

or short-term contract rate as the new chips come out and performance per dollar goes up. 

14:10

But because you are so limited on semiconductors  and deployment timelines, what actually prices  

14:18

these chips is not the comparative thing I  can buy today, but rather what is the value  

14:24

I can derive out of this chip today. In that sense, let's take GPT-5.4. 

14:31

GPT-5.4 is both way cheaper to run than  GPT-4 and has fewer active parameters. 

14:38

It's much smaller, in the sense of active parameters, because it's a 

14:42

sparser MoE versus GPT-4 being a coarser MoE. There's also been so many other advancements  

14:47

in training, RL, model architecture, and data quality that have made GPT-5.4 way better than 

14:54

GPT-4. And it's cheaper to serve. When you look  at an H100, it can serve more tokens per GPU of  

15:02

5.4 than if you had run GPT-4 on it. So it's producing more tokens of a 

15:07

model that is of higher quality. What is the maximum TAM for GPT-4 tokens? 

15:16

Maybe it was a few billion dollars, maybe  it was tens of billions of dollars. Adoption  

15:19

takes time. For GPT-5.4, that number  is probably north of a hundred billion. 

15:23

But there's an adoption lag, there's  competition, and there's the constant  

15:27

improvements that everyone else is having. If improvements stopped here, the value of  

15:32

an H100 is now predicated on the value  that GPT-5.4 can get out of it instead  

15:36

of the value that GPT-4 can get out of it. These labs are in a competitive environment,  

15:42

so their margins can't go to infinity. You sort of have this dynamic that is  

15:47

quite interesting in that an H100 is worth  more today than it was three years ago. 

15:51

That's crazy. It's also interesting from  the perspective of just taking that forward. 

15:56

If we had actual AGI models developed, if  we had a genuine human on a server… These  

16:06

are such hand-wavy numbers about how many flops the brain can do. 

16:08

But on a flop basis, an H100 is estimated to  do 1e15, which is how much some people estimate  

16:15

the human brain does in flops. Obviously, in terms of memory,  

16:19

the human brain has way more. An H100 is 80 gigabytes,  

16:22

and the brain might have petabytes. Oh, yeah, you've got petabytes? Name a petabyte  

16:28

of ones and zeros, bro. Name me a string. Well, this is actually the point. 

16:33

No, we’ve just got the best  sparse attention techniques ever. 

16:36

Genuinely though. In the amount of information  that is compressed, it might be petabytes. 

16:42

The brain is an extremely sparse MoE. But anyways, imagine a human knowledge  

16:48

worker can produce six figures a year of value. If an H100 can produce something close to that,  

16:54

if we had actual humans on a server, the  value of an H100 is such that it can repay  

16:58

itself in the course of a couple of months. So when I interviewed Dario, the point I was  

18:02

trying to make is not that I think the singularity  is two years away and therefore Dario desperately  

18:08

needs to buy more compute, although the revenue is  certainly there that he needs to buy more compute. 

18:12

The point I was trying to make is that given what  Dario seems to be saying—given his statements that  

18:17

we're two years away from a data center of  geniuses, and certainly not more than five  

18:21

years away, and a data center of geniuses should  be earning trillions upon trillions of dollars  

18:25

of revenue—it just does not make sense why  he keeps making these statements about being  

18:30

more conservative on compute or, to your point,  being less aggressive than OpenAI on compute. 

18:35

I guess that point got lost because then people  were roasting me, saying, "Oh, this podcaster  

18:39

is trying to convince this multi-hundred  billion dollar company CEO to YOLO it, bro." 

18:44

I was just trying to say that internally,  his statements are inconsistent. 

18:50

Anyway, it's good to iron it out. I think going back to the earlier  

18:55

view that if the models are so powerful, the  value of a GPU goes up over time, right now  

19:06

only OpenAI and Anthropic have that viewpoint. But as we go further out, everyone is going 

19:11

to be able to see that value skyrocket per GPU. So in that sense,  

19:19

you should commit now to compute. Interestingly, in Anthropic fashion,  

19:28

there's a bit of a meme that they have  commitment issues and are sort of polyamorous. 

19:35

Not Dario, but this is a bit of a meme. Explains everything. By the way, there's  

19:42

this interesting economic effect called  Alchian-Allen, which is the idea that if  

19:48

you increase the fixed cost of different goods,  one of which is higher quality and one which is  

19:54

lower quality, that will make people choose  the higher quality good, on the margin. 

19:59

To give a specific example, suppose the  better-tasting apple costs two dollars and  

20:04

the shittier apple costs one dollar. Now suppose you put an import tariff on them. 

20:10

Now it's $3 versus $2 for a great  apple versus a medium apple. 

20:15

Is that because they both increased by a  dollar, or should it be a 50% increase? 

20:18

No, because they both increased by $1. The whole effect is that there's 

20:22

a fixed cost that is applied to both. Then the price difference between them,  

20:28

the ratio, changes. Previously, the more  

20:31

expensive one was 2X more expensive. Now it's just 1.5X more expensive. 
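
The apple example as bare arithmetic (a minimal sketch of the ratio effect being described):

```python
# Alchian-Allen in one line of arithmetic: a fixed cost added to both goods
# leaves the absolute gap unchanged but shrinks the price *ratio*.

good_apple, mediocre_apple = 2.00, 1.00   # prices before the tariff
fixed_cost = 1.00                         # per-unit tariff applied to both

print(good_apple / mediocre_apple)                                # 2.0x before
print((good_apple + fixed_cost) / (mediocre_apple + fixed_cost))  # 1.5x after
```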

20:34

So I wonder if, applied to AI, that would mean that if GPUs are going to get more expensive, 

20:39

there will be a fixed cost  increase in the price of compute. 

20:43

As a result, that will push people to be willing  to pay higher margins for slightly better models. 

20:51

Because the calculus is, I'm going to be  paying all this money for the compute anyway. 

20:55

I might as well just pay slightly more to  make sure it's the very best model rather  

21:00

than a model that's slightly worse. So the Hopper went from $2 to $3. 

21:03

If a Hopper can make a million tokens of Opus  and it can make two million tokens of Sonnet,  

21:11

the price differential between Opus and  Sonnet has decreased because the price of  

21:15

the GPU has increased by a dollar from $2 to $3.  Interesting. I think that makes a ton of sense. 

21:22

We just see all of the volumes  are on the best models today,  

21:25

all the revenue is on the best models today. In a compute-limited world, two things happen. 

21:34

One, companies that don't have commitment issues  and have these five-year contracts for compute  

21:41

have locked in a humongous margin advantage. They've locked in compute for five years  

21:47

at the price it transacted at  two, three, or five years ago. 

21:51

Whereas if you're three years into that  five-year contract and someone else's  

21:55

two-year or three-year contract rolled off, and  now they're trying to buy that at modern pricing,  

22:00

when it's priced to the value of models,  the price is going to be up a lot more. 

22:05

So the person who committed early  has better margins in general. 

22:11

The percentage of the market that is in long-term  contracts is much larger than the percentage of  

22:15

the market in short-term contracts that can be  this flex capacity you add at the last second. 

22:21

At the same time, where does the margin go? Because models get more valuable,  

22:28

how much can the cloud players flex their pricing? 

22:33

If you look at CoreWeave, their average  term duration is over three years right now. 

22:39

For ninety-eight percent plus of  their compute, it's over three years. 

22:43

They end up with this conundrum  where they can't actually flex price. 

22:46

But every year they're adding incrementally  way more capacity than they had previously. 

22:52

This year alone, Meta's adding as much capacity as  they had in their entire fleet of compute and data  

22:58

centers for all purposes in 2022: serving WhatsApp, Instagram, and Facebook, and doing AI. 

23:03

They're adding that alone this year. In the same sense, you talk about Meta doing that,  

23:07

CoreWeave, Google, and Amazon, all these companies  are adding insane amounts of compute year on year. 

23:13

That new compute gets transacted at the new price. In a sense, yes, you've locked in, as long as  

23:19

we're in a takeoff. "Oh, OpenAI went from six  hundred megawatts to two gigawatts last year,  

23:24

and from two gigawatts to six plus this  year, and six to twelve next year." 

23:29

The incremental added compute is where all the  cost is, not the prior long-term contracts. 

23:34

Then it's the infra providers who hold the cards when it comes to charging margin. 

23:38

Now the cloud players, the neoclouds, or  the hyperscalers can charge the margin. 

23:43

They can to some extent, but then as you go  upstream to who has access to all the memory and  

23:48

logic capacity, it's Nvidia for the most part. They've signed a lot of long-term contracts. 

23:53

They've got ninety billion dollars of long-term  contracts today, and they're negotiating  

23:56

three-year deals today with the memory vendors. You've got Amazon and Google: Google through Broadcom, 

24:04

Amazon directly, and AMD. These companies hold all the  

24:07

cards because they've secured the capacity. TSMC is not raising prices, but memory vendors  

24:13

are, to some extent, raising prices a lot. They're going to double or triple prices again, but 

24:18

then they're also signing these long-term deals. Who is able to accrue all the margin dollars is  

24:23

potentially the cloud, potentially the  chip vendors, and the memory vendors,  

24:28

until TSMC or ASML break out and say,  "No, we're going to charge a lot more." 

24:33

But at the same time, do the model  vendors get to charge crazy margins? 

24:38

At least this year, we're going to see  margins for the model vendors go up a lot. 

24:41

Because they're so capacity constrained,  they have to destroy demand. 

24:46

There's no way Anthropic can continue at  the current pace without destroying demand. 

24:52

Let's get into logic and memory. How specifically has Nvidia been  

24:58

able to lock up so much of both? I think according to your numbers,  

25:02

by '27, Nvidia is going to have 70%+ of N3 wafer capacity, or around that area. 

25:12

I forget what the numbers were for memory  at SK Hynix and Samsung and so forth. 

25:19

Think about how the neocloud business  works and how Nvidia works with that,  

25:22

or how the RL environment business  works and how Anthropic works with that. 

25:26

In both those cases, Nvidia is purposely trying to  fracture the complementary industry to make sure  

25:33

that they have as much leverage as possible. They're giving allocation to random neoclouds  

25:37

to make sure that there's not one  person that has all the compute. 

25:39

Similarly, Anthropic or OpenAI, when they're  working with the data providers, they say, "No,  

25:44

we're going to just seed a huge industry  of these things so that we're not locked  

25:48

into any one supplier for data environments." And I wonder why on the 3 nm process—that's  

25:56

going to be Trainium 3, that's going to be  TPU v7, other accelerators potentially—why  

26:03

is TSMC just giving it all up to Nvidia  rather than trying to fracture the market? 

26:09

There are a couple points here. On 3 nm, if we go back to last year,  

26:15

the vast majority of 3 nm was Apple. Apple is being moved to 2 nm. 

26:20

Memory prices are going up, so  Apple's volumes may go down. 

26:24

As memory prices go up, either  they cut margin or they move on. 

26:29

There's some time lag because they have  long-term contracts, but Apple likely  

26:33

reduces demand or moves to 2 nm faster, where  2 nm is only capable of mobile chips today. 

26:39

In the future, AI chips will move there.  So Apple has that. Apple is also talking to  

26:44

third-party vendors because they're  getting squeezed out of TSMC a little bit. 

26:48

TSMC's margins on high-performance computing—HPC,  AI chips, et cetera—are higher than they are  

26:54

for mobile, because they have a bigger  advantage in HPC than they do in mobile. 

27:00

When you look at TSMC’s running calculus  here, they're actually providing really good  

27:06

allocations to companies that are doing CPUs. When you think about Amazon having Trainium and  

27:14

Graviton, both of those are on 3 nm, Graviton  being their CPU, Trainium being their AI chip. 

27:20

TSMC is much more excited to give  allocation to Graviton than they  

27:23

are to Trainium because they view the CPU  business as more stable, long-term growth. 

27:30

As a company that is conservative and doesn't  want to ride cycles of growth too hard,  

27:35

you actually want to allocate to the market that  is more stable with a lower growth rate first  

27:42

before you allocate all the incremental capacity  to the fast growth rate market. That is the case  

27:48

generally. Same for AMD. The allocations they  get on their CPUs, TSMC is much more excited  

27:57

about those than they are for GPUs. Likewise  for Amazon. Nvidia is a bit unique because yes,  

28:03

they have CPUs, they make switches, they make  networking, NVLink, InfiniBand, Ethernet, NICs. 

28:11

By and large, most of these things will  be on 3 nm by the end of this year with  

28:14

the Rubin launch and all the chips in that  family, the GPU being the most important one. 

28:20

Yet Nvidia is getting the majority of supply. Part of this is because you look at the market  

28:27

and TSMC and others forecast market demand in  many ways, but it's also the market signal. 

28:36

The market signaled, "Hey, we need this much  capacity next year. We need this much. We'll  

28:42

sign non-cancelable, non-returnable. We  may even pay deposits." Nvidia just did  

28:46

it way earlier than Google or Amazon. In some cases, Google and Amazon had  

28:53

stumbling blocks. One of the chips  

28:56

got delayed slightly by a couple quarters. Trainium and all these sorts of things happened. 

29:01

In that case, there was a huge sort  of, "Well, these guys are delaying,  

29:05

but Nvidia is wanting more, more, more, more. And we are checking with the rest of the supply  

29:10

chain, is there enough capacity?" They're going to all the PCB  

29:13

vendors and saying, "Is there enough PCB?" Victory Giant is one of the largest suppliers  

29:18

of PCBs to Nvidia, and they're a Chinese company. All the PCBs come from China, or many of them. 

29:25

They're like, "Do you have enough PCB capacity?  Great. Hey memory vendors, who has all the memory  

29:28

capacity? Okay, Nvidia does. Great." When you  look at who is AGI-pilled enough to buy compute  

29:36

on long timelines at levels that seem ridiculous  to people who aren't AGI-pilled—but nonetheless,  

29:42

they're willing to pay a pretty good margin and  sign it now because they view in the future that  

29:49

ratio is screwed up—the same thing happens  with the supply chain for semiconductors. 

29:54

I don't think Nvidia is quite AGI-pilled. Jensen doesn't believe software is going  

29:58

to be fully automated and all these things. Accelerated computing, not AI chips, right? 

30:03

It's AI chips. But that's what he calls it, right? 

30:05

Yeah. I think it's a broader term, AI is within  that, but also physics modeling and simulations. 

30:11

But it's like he's not  embracing the main use case. 

30:14

I think he's embracing it, but I just don't  think he's AGI-pilled like Dario or Sam. 

30:19

But he's still way, way more AGI-pilled than  Google was in Q3 of last year, or Amazon was  

30:30

in Q3 of last year, and he saw way more demand.  The reason is pretty simple. You can see all the  

30:33

data center construction. He's like, "Okay,  

30:34

I want to have this market share." We have all the data centers tracked,  

30:38

and there's a lot of data centers  that could be one or the other. 

30:44

To some extent, Google and Amazon, Google  especially, even though their TPU is just  

30:49

better for them to deploy, they have to  deploy a crap load of GPUs because they  

30:52

don't have enough TPUs to fill up their  data centers. They can't get them fabbed. 

30:56

I have a question about that. Google sold a million, was it  

31:00

the v7s? Yes. 

31:01

—the Ironwoods to Anthropic, and you're saying the  big bottleneck right now, this year or next year,  

31:07

I guess going forward forever now,  is going to be the logic and memory,  

31:13

the stuff it takes to build these chips. Google has DeepMind, the third prominent AI lab. 

31:19

If this is the big bottleneck, why would they  sell it rather than just giving it to DeepMind? 

31:24

This is again a problem of… DeepMind people  were like, "This is insane. Why did we do  

31:29

this?" But Google Cloud people and Google  executives saw a different thought process. 

31:37

You and I know the compute team at Anthropic. Both of the main people came from Google. 

31:45

They saw this dislocation, they negotiated  a deal, and they were able to get access  

31:49

to this compute before Google realized. The chain of events, at least from our  

31:54

data that we found, was in early Q3, over  the course of six weeks, we saw capacity  

32:06

on TPUs go up by a significant amount. It went up multiple times in those six  

32:12

weeks. There were multiple requests. Google  even had to go to TSMC and explain to them  

32:18

why they needed this increase in  capacity because it was so sudden. 

32:21

A lot of that capacity increase  was for selling to Anthropic. 

32:25

Because Anthropic saw it before Google. And then Google had Nano Banana and Gemini  

32:29

3 which caused their user metrics to skyrocket. Then leadership at Google was like, "Oh." 

32:34

Then they started making the statement that  we have to double compute every six months,  

32:37

or whatever the exact number was. They really woke up a lot more, and then  

32:42

they went to TSMC and said, "We want more. We want  more." TSMC replied, "Sorry guys, we're sold out. 

32:50

We can maybe get 5-10% more for 2026,  but really we're going to work on 2027." 

32:54

There was this information asymmetry among  the labs, in my mind. I don't know exactly.  

32:59

It's the narrative I've spun myself from  seeing all the data in the supply chain on  

33:02

wafer orders and what's going on with the data  centers that Anthropic and Fluidstack signed. 

33:09

It's pretty clear to me that Google screwed up. You can see this from Google's Gemini ARR. 

33:16

They had next to nothing in Q1 to Q3—in Q3  a little bit once they started inflecting. 

33:21

But in Q4 they reached $5 billion  in revenue on an ARR basis. 

33:30

It's clear Google didn't see  revenue skyrocket initially. 

33:34

In a sense, Anthropic had a little bit of  commitment issues before their ARR exploded,  

33:40

even though they had far more information  asymmetry and saw what was coming down the pipe. 

33:44

Google was going to be more conservative than Anthropic, and Google had even less ARR. 

33:52

So they were just not willing to do it,  and then they realized they should do it. 

33:58

Since then, Google has gotten absurdly  AGI-pilled in terms of what they're doing.  

34:05

They bought an energy company. They're  putting deposits down for turbines. 

34:09

They're buying a ridiculous  percentage of powered land. 

34:13

They're going to utilities and  negotiating long-term agreements. 

34:15

They're doing this on the data center  and power side very aggressively. 

34:22

I think Google woke up towards the end  of last year, but it took them some time. 

34:26

How many gigawatts do you think Google  will have by the end of next year? 

34:28

Buy my data. You charge for that kind of information. 

34:32

Yes, yes. I feel like every year the  bottleneck for what is preventing us  

34:37

from scaling AI compute keeps changing. A couple years ago it was CoWoS. Last  

34:41

year it was power. You'll tell me  what the bottleneck is this year. 

34:45

But I want to understand five years  out, what will be the thing that is  

34:48

constraining us from deploying the singularity? The biggest bottleneck is compute. For that,  

34:55

the longest lead time supply chains  are not power or data centers. 

34:59

They're actually the semiconductor  supply chains themselves. 

35:01

It switches back from power and data  centers as a major bottleneck to chips. 

35:08

In the chip supply chain, there's  a number of different bottlenecks. 

35:11

There's memory, logic wafers from  TSMC, and the fabs themselves. 

35:17

Construction of the fabs takes two to three years,  versus a data center which takes less than a year. 

35:25

We've seen Amazon build data  centers in as fast as eight months. 

35:28

There's a big difference in lead  times because of the complexity  

35:31

of building the fab that actually makes the chips. 

35:33

The tools also have really long lead times. The bottlenecks, as we've scaled,  

35:39

have shifted based on what the supply  chain is currently not able to do. 

35:44

It was CoWoS, power, and data centers, but  those were all shorter lead time items. 

35:50

CoWoS is a much simpler process  of packaging chips together. 

35:54

Power and data centers are ultimately way simpler  than the actual manufacturing of the chips. 

35:59

There's been some sliding of capacity from mobile or PC over to data center chips, 

36:08

which has been somewhat fungible. Whereas CoWoS, power, and data  

36:12

centers have had to start anew as supply chains. But now there's no more capacity for the mobile  

36:19

and PC industries—which used to be the majority  of the semiconductor industry—to shift over to AI. 

36:26

Nvidia is now the largest customer at TSMC  and SK Hynix, the largest memory manufacturer. 

36:33

It's sort of impossible to slide resources away from 

36:39

the common person's PCs and smartphones any further towards the AI chips. 

36:45

So now the question is how do  we scale AI chip production? 

36:48

That's the biggest bottleneck as we go to 2030. It would be very interesting if there's an  

36:53

absolute gigawatt ceiling that you can project  out to 2030 based just on "We can't produce more  

37:01

than this many EUV machines." To scale compute further,  

37:06

there are different bottlenecks this year and  next year, but ultimately by 2028 or 2029,  

37:11

the bottleneck falls to the lowest rung  on the supply chain, which is ASML. 

37:16

ASML makes the world's most  complicated machine: an EUV tool. 

37:21

The selling price for those is $300-400 million. Currently, they can make about 70 a year. 

37:27

Next year, they'll get to 80. Even under very aggressive supply  

37:31

chain expansion, they only get to a little  bit over 100 by the end of the decade. What  

37:35

does that mean? They can make a hundred of these  tools by the end of the decade, and 70 right now. 

37:40

How does that actually translate to AI compute? We see all these numbers from Sam Altman and  

37:46

many others across the supply chain:  gigawatts, gigawatts, gigawatts. 

37:50

How many gigawatts are we adding? We see Elon saying a hundred gigawatts in space. 

37:55

A year. A year. The problem with any of  

37:59

these numbers, or the challenge to these numbers,  is actually not the power or the data center. 

38:04

We can dive into that, but  it's manufacturing the chips. 

38:07

Take a gigawatt of Nvidia's Rubin chips. Rubin is announced at GTC,  

38:14

I believe the week this podcast goes live. To make a gigawatt worth of data center  

38:19

capacity of Nvidia's latest chip that they're  releasing towards the end of this year,  

38:24

you need a few different wafer technologies. You need about 55,000 wafers of 3 nm. 

38:32

You need about 6,000 wafers of 5 nm, and then  you need about 170,000 wafers of DRAM memory. 

38:41

Across these three different buckets,  each requires different amounts of EUV. 

38:46

When you manufacture a wafer, there are thousands  and thousands of process steps where you're  

38:50

depositing material and removing them. But the key critical step—which at least  

38:55

in advanced logic is 30% of the  cost of the chip—is something that  

39:00

doesn't actually put anything on the wafer. You take the wafer, you deposit photoresist,  

39:04

which is a chemical that chemically  changes when you expose it to light. 

39:07

Then you stick it into the EUV tool, which  shines light at it in a certain way. It  

39:11

patterns it. There's what's called a mask,  which is effectively a stencil for the design. 

39:16

When you look at a leading-edge 3 nm wafer, it has  70 or so masks, 70 or so layers of lithography,  

39:23

but 20 of them are the most advanced EUV. If you need 55,000 wafers for a gigawatt, and you  

39:33

do 20 EUV passes per wafer, you can do the math. That's 1.1 million passes of EUV for a single  

39:43

gigawatt. It's pretty simple. Once you add the  rest of the stuff, it ends up being 2 million,  

39:47

across 5 nm and all the memory. You're at roughly 2 million EUV  

39:52

passes for a single gigawatt. These tools  are very complicated. When you think about  

39:57

what it's doing across a wafer, it's taking  the wafer and scanning and stepping across. 

40:03

It does this dozens of times  across the whole wafer. 

40:09

When you're talking about how  many EUV passes, that’s the  

40:11

entire wafer being exposed at a certain rate. An EUV tool can do roughly 75 wafers per hour,  

40:19

and the tool is up roughly 90% of the time. In the end, you need about three and a half  

40:26

EUV tools to do the 2 million EUV  wafer passes for the gigawatt. 

40:32

So three and a half EUV  tools satisfies a gigawatt. 
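
Writing that calculation out (a sketch using the wafer counts, layer counts, throughput, and uptime quoted above; the 8,760 hours in a year is the only added assumption):

```python
# EUV arithmetic for one gigawatt of Rubin-class capacity, per the figures above.

wafers_3nm = 55_000
euv_layers_3nm = 20
logic_passes = wafers_3nm * euv_layers_3nm        # ~1.1M EUV exposures on 3 nm alone
total_passes = 2_000_000                          # ~2M once 5 nm and DRAM are included

wafers_per_hour = 75                              # EUV tool throughput
uptime = 0.90
passes_per_tool_per_year = wafers_per_hour * uptime * 8_760   # ~590k per tool-year

print(logic_passes)                               # 1,100,000
print(total_passes / passes_per_tool_per_year)    # ~3.4 -> roughly 3.5 tools per gigawatt
```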

40:35

It's funny to think about the numbers. What does  a gigawatt cost? It costs roughly $50 billion.  

40:40

Whereas what do three and a half EUV tools cost? That's $1.2 billion. It's actually a much lower 

40:46

number, which is interesting to think about. Fifty gigawatts of economic CapEx in the data  

40:53

center, and what gets built on top of  that in terms of tokens is even larger. 

40:56

It might be $100 billion worth of AI value into the supply chain. Over 

41:10

three years, TSMC has done $100 billion of CapEx. So it's roughly $30, $30, and $40 billion a year. A small fraction of 

41:19

that is being used by Nvidia for the 3 nm, or  previously 4 nm, that it's using for its chips. 

41:30

What were its earnings last  quarter? It was $40 billion.  

41:34

So $40 billion times four is $160 billion. Nvidia alone is turning some small fraction  

41:41

of $100 billion in CapEx, which is going to  be depreciated over many years and not just  

41:45

this one year, into $160 billion in a single year. That gets even more intense when you go down the  

41:50

supply chain to ASML, which is taking a billion  dollars' worth of machines to produce a gigawatt. 

41:54

Of course, those machines last for more  than a year so it’s doing more than that. 
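
The ratios being gestured at, in rough numbers (a sketch using the quoted figures; the ~$350 million-per-tool midpoint is an assumption within the $300-400 million range):

```python
# CapEx-to-revenue contrast sketched above, using the quoted round numbers.

tsmc_capex_three_years = 100e9        # ~$30B + $30B + $40B over three years
nvidia_last_quarter = 40e9            # the quarterly figure quoted
nvidia_annualized = nvidia_last_quarter * 4

print(nvidia_annualized)                           # $160B in a single year
print(nvidia_annualized / tsmc_capex_three_years)  # ~1.6x three years of TSMC CapEx

# One step further down the supply chain: ~3.5 EUV tools at an assumed ~$350M each,
# versus the ~$50B a gigawatt of data center costs.
print(3.5 * 350e6)                                 # ~$1.2B of ASML tools per gigawatt
```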

41:58

Now I want to understand, how many  such machines will there be by 2030,  

42:02

if you include not just the ones that are sold that year, but the ones that have accumulated over the 

42:06

previous years? What does that imply? Sam Altman  says he wants to do a gigawatt a week in 2030. 

42:14

When you add up those numbers,  is it compatible with that? 

42:17

That's completely compatible,  if you think about it. 

42:19

TSMC and the entire ecosystem have  something like 250 to 300 EUV tools already. 

42:26

Then you stack on 70 this year, 80  next year, growing to 100 by 2030. 

42:30

You're at 700 EUV tools by the end of the decade.  700 EUV tools, at three and a half tools per  

42:35

gigawatt—assuming it's all allocated to AI, which  it's not—gets you to 200 gigawatts worth of AI  

42:43

chips for the data centers to deploy. Sam wants 52 gigawatts a year. 

42:49

He's only taking 25% share then. Obviously, there's some share given to mobile and  

42:54

PC, assuming we're even allowed to have consumer  goods still and we don't get priced out of them. 

43:04

But roughly, he's saying 25% market  share of the total chips fabbed. 

43:09

That's very reasonable given that this  year alone, I think he's going to have  

43:14

access to 25% of the Blackwell GPUs  that are deployed. It's not that crazy. 
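
The same tool arithmetic at the fleet level (a sketch; the 700-tool installed base is the rough trajectory described above, and it assumes every tool could be pointed at AI chips, which it can't):

```python
# Fleet-level EUV arithmetic: installed tools by 2030 vs. a gigawatt-a-week ambition.

installed_tools_2030 = 700          # ~250-300 today, plus ~70-100 shipped per year
tools_per_gigawatt = 3.5
ai_gw_per_year_ceiling = installed_tools_2030 / tools_per_gigawatt
print(ai_gw_per_year_ceiling)       # ~200 GW/yr if every EUV tool made AI chips

openai_target_gw_per_year = 52      # "a gigawatt a week"
print(openai_target_gw_per_year / ai_gw_per_year_ceiling)   # ~0.26 -> ~25% of output
```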

43:23

When did ASML start shipping  EUV tools, when 7 nm started? 

43:27

I don't know when that was exactly. You're saying in 2030, they're going to be using  

43:31

machines that initially were shipped in 2020. So for ten years, you're using the same most  

43:36

important machine in this  most technologically advanced  

43:39

industry in the world? I find that surprising. ASML's been shipping EUV tools now for roughly a  

43:45

decade, but it only entered mass volume production  around 2020. The tool's not the same. Back then,  

43:52

the tools were even lower throughput. There are various specifications around  

43:57

them called overlay. I was mentioning you're  

43:59

stacking layers on top of each other. You'll do some EUV, you'll do a bunch  

44:02

of different process steps—depositing stuff,  etching stuff, cleaning the wafer—dozens of  

44:07

those steps before you do another EUV layer. There's a spec called overlay, which is:  

44:11

you did all this work, you drew these lines  on the wafer, now I want to draw these dots. 

44:17

Let's say I want to draw these dots to  connect these lines of metal to holes,  

44:21

and then the next layer up is another set of lines  going perpendicular, so now you're connecting  

44:25

wires going perpendicular to each other. You have to be able to land them on top of  

44:30

each other. It's called overlay. Overlay is  a spec that's been improved rapidly by ASML. 

44:36

Wafer throughput has been  improved rapidly by ASML. 

44:38

The price of the tool has gone up, but not  as much as the capabilities of the tool. 

44:42

Initially, the EUV tools were $150 million. Over time, they're now $400 million  

44:49

as I look out to 2028. But the capabilities of the  

44:51

tools have more than doubled as well, especially  on throughput and overlay accuracy, which is  

44:56

the ability to accurately align the subsequent  passes on top of each other even though you do  

45:03

tons of steps between. ASML is improving super  rapidly. It's also noteworthy to say that ASML  

45:13

is maybe one of the most generous companies in the world. They have this linchpin position. 

45:19

No one has anything competitive. Maybe China will  have some EUV by the end of the decade, but no one  

45:24

else has anything even close to EUV, and yet they  haven't taken price and margins up like crazy. 

45:31

You go ask some other folks that we  talk to all the time, like Leopold,  

45:37

and they're like, "Let's have the price go  up." Because they can. The margin is there.  

45:42

You can take the margin. Nvidia takes the  margin. Memory players are taking the margin. 

45:45

But ASML has never raised the price more than  they've increased the capability of the tool. 

45:51

In a sense, they've always provided  net benefit to their customers. 

45:54

It's not that the tool is stagnant,  it's just that these tools are old. 

45:58

Yes, you can upgrade them some,  and the new tools are coming. 

46:01

For simplicity's sake, we're  ignoring the advances in overlay  

46:06

or throughput per tool for this podcast. You say we're producing 60 of these machines  

46:10

this year and then 70, 80 over subsequent years. 

46:15

What would happen if ASML just decided  to double its CapEx or triple its CapEx? 

46:20

What is preventing them from  producing more than 100 in 2030? 

46:23

Why are you so confident that  even five years out, you can be  

46:27

relatively sure what their production will be? I think there are a couple factors here. 

46:31

ASML has not decided to just go YOLO,  let's expand capacity as fast as possible. 

46:37

In general, the semiconductor  supply chain has not. 

46:39

It's lived through the booms and busts,  and we can talk a bit more about it. 

46:43

Basically some players have recently woken  up, but in general no one really sees demand  

46:52

for 200 gigawatts a year of AI chips, or  trillions of dollars of spend a year in  

46:58

the semiconductor supply chain. They're  not AI-pilled. They're not AGI-pilled. 

47:02

We're going to get to a  trillion dollars this year. 

47:05

Yeah, I feel you, but I'm saying no one  really understands this in the supply chain. 

47:11

Constantly, we're told our numbers are  way too high, and then when they're right,  

47:14

they're like, "Oh, yeah, but your next  year's numbers are still too high." 

47:18

ASML's tool has four major components. It has the source,  

47:25

which is made by Cymer in San Diego. It has the reticle stage, which is made  

47:31

in Wilton, Connecticut. It has the wafer stage. It has the optics, the lenses and such. 

47:39

Those last two are made in Europe. When you look at each of these four,  

47:42

they're tremendously complex supply chains that,  (A) they have not tried to expand massively,  

47:48

and (B) when they try to expand  them, the time lag is quite long. 

47:55

Again, this is the most complicated machine  that humans make, period, at any sort of volume. 

48:02

Let's talk about the source specifically. What  does the source do? It drops these tin droplets.  

48:08

It hits it three subsequent  times with a laser perfectly. 

48:11

The first one hits this tin  droplet, it expands out. 

48:13

It hits it again, so it expands  out to this perfect shape,  

48:16

and then it blasts it at super high power. The tin droplets get excited enough that they  

48:21

release EUV light, 13.5 nanometer, and then  it's in this thing that is collecting all  

48:26

the light and directing it into the lens stack. Then you have the lens stack, which is Carl Zeiss,  

48:31

as you mentioned, and some other folks, but  Zeiss being the most important part of it. 

48:36

They also have not tried to expand  production capacity because they don't see... 

48:40

They're like, "We're growing a lot because of AI. We're growing from 60 to 100." It's like, "No, no,  

48:46

no. We need to go to a couple hundred, but it's  fine. Whatever." Each of these tools has, I think,  

48:51

18 of these lenses, effectively. They are multilayer mirrors,  

48:57

which are perfect layers of molybdenum  and ruthenium, if I recall correctly,  

49:03

stacked on top of each other in many layers,  and then the light bounces off of it perfectly. 

49:08

When we think about a lens, it's in  a shape, and it focuses the light. 

49:12

This is like a mirror that's also  a lens, so it's pretty complicated. 

49:16

Any defect in these super thinly  deposited stacks will mess it up. 

49:23

Any curvature issues will mess it up. There are a lot of challenges  

49:26

with scaling the production. It's quite artisanal in this sense  

49:29

because you're not making tens of thousands  of these a year, you're making hundreds,  

49:34

you're making thousands. 60 tools a year, 18 of  these per tool, you’re still in the hundreds,  

49:43

of tools, or you're at the thousand number  roughly for these lenses and projection optics. 

49:51

Then you step forward to the reticle stage,  which is also something really crazy. 

49:57

This thing moves at, I want to say, nine Gs. It will shift nine Gs because as you step  

50:03

across a wafer, the tool will go... The  wafer stage is complementary. It's the  

50:07

wafer part. You line these two things up. You're taking all the light through the  

50:11

lenses that's focused, and here's  the reticle, here's the wafer. 

50:16

The reticle's moving one direction, the  wafer's moving the other direction as it  

50:20

scans a 26x33 millimeter section  of the wafer, and then it stops. 

50:25

It shifts over to another part  of the wafer and does it again. 

50:28

It does that in just seconds. Each of them is moving  

50:32

at nine Gs in opposite directions. Each of these things is a wonder and marvel  

50:37

of chemistry, fabrication, mechanical engineering,  and optical engineering, because you have to align  

50:47

all these things and make sure they're perfect. All of these things have crazy amounts  

50:50

of metrology because you have  to perfectly test everything. 

50:53

If anything is messed up, the yield goes to  zero, because this is such a finely tuned system. 

50:58

By the way, it's so large that you're building  it in the factory in Eindhoven, Netherlands,  

51:05

and they're deconstructing it and shipping it on  many planes to the customer site, and then you're  

51:10

reassembling it there and testing it again. That process takes many, many months. 

51:15

There are so many steps in the supply  chain, whether it's Zeiss making their  

51:19

lenses and projection optics or Cymer, which is  an ASML-owned company, making the EUV source. 

51:25

Each of these has its own complex supply chain. ASML has commented that their supply chain has  

51:29

over ten thousand companies in it. Like individual suppliers? 

51:32

Yes. It might not be directly. It might  be through Zeiss having so many suppliers  

51:37

and XYZ company having so many suppliers. If you just think about it, you're talking  

51:44

about two physically moving objects that are the size of a wafer, and they have to be accurate 

51:51

to the level of single-digit nanometers or even  smaller because the entire system, the overlay,  

51:58

the layer-to-layer overlay variation,  has to be on the order of 3 nanometers. 

52:04

If the overlay is 3 nm, that means for each individual part, the accuracy of its 

52:09

physical movement has to be even less than that. It has to be sub-one nanometer in most cases,  

52:14

because the errors of these things stack up. There's no way to just snap your fingers and 

52:23

increase production. Things as simple as power.  The US going from zero percent power growth to  

52:27

two percent power growth, even though China's  already at thirty, was so hard for America to do. 

52:34

And that's a really simple supply chain with  very few people in it who make difficult things. 

52:41

There are probably 100,000  electricians and people who work in  

52:45

the electricity supply chain, or more, in the US? When you look at ASML, they employ so few people. 

52:53

Carl Zeiss probably employs less than a  thousand people working on this, and all of  

52:58

those people are super, super specialized. You can't just train random people up  

53:02

for this in the snap of a finger. You can't just get your entire supply  

53:06

chain to get galvanized. Nvidia's had to do a lot  

53:11

to get the entire supply chain to even deliver  the capacity they're going to make this year. 

53:15

When you go talk to Anthropic, they're  like, "We're short of TPUs, we're short  

53:18

of training, and we're short of GPUs." When you go talk to OpenAI, they're like,  

53:21

"We're short of these things." OpenAI and Anthropic know they need X. 

53:25

Nvidia is not quite as AGI-pilled.  They're building X - 1. You go down the  

53:31

supply chain, everyone's doing X - 1. In some cases, they're doing X ÷ 2,  

53:36

because they're not AGI-pilled. You end up with this time lag  

53:42

for the whip to react. The AI-pilledness and the  

53:48

desire to increase production takes so long. Once they finally understand that they need  

53:53

to increase production rapidly…  They think they understand. 

53:57

They think AI means we have to go from 60  to 100, in addition to the tools getting  

54:01

better and faster, the source getting  higher power from 500 watts to 1,000,  

54:05

and all these other aspects of the supply chain  advancing technically and increasing production. 

54:09

They think they're actually  increasing production a lot. 

54:13

But if you flow through the  numbers… What does Elon want? 

54:15

He wants 100 gigawatts a year  in space by 2028 or 2029. 

54:23

Sam Altman wants 52 gigawatts a  year by the end of the decade. 

54:28

Anthropic probably needs the  same, and Google needs that. 

54:32

You go across the supply chain, and it's  like, wait, no, the supply chain can't  

54:35

possibly build enough capacity for everyone  to get what they want on the side of compute. 

55:44

I feel like in the data center  supply chain for the last few years,  

55:50

people have been making arguments like, "We are  bottlenecked by this specific thing, therefore  

55:55

AI compute can't scale more than X." But as you've written about, if the  

56:00

grid is a bottleneck, then we just do behind the  meter on the site, we do gas turbines, et cetera. 

56:06

If that doesn't work, there are all these  other alternatives that people fall back on. 

56:11

I want to ask whether we can imagine a similar  thing happening in the semiconductor supply chain. 

56:17

If EUV becomes a bottleneck, what if we  just went back to 7 nm and did what China  

56:24

is doing currently, producing 7 nm chips  with multi-patterning with DUV machines? 

56:31

If you look at a 7 nm chip like the  A100, there's been a lot of progress  

56:36

obviously from the A100 to the B100 or B200. How much of that progress is just numerics? 

56:45

If you just hold FP16 constant from A100 to B100: the B100 is a little over one petaflop, and the  

56:54

A100 is like 300 teraflops. Yeah, 312. 

57:02

Holding numerics constant, you have  a 3x improvement from A100 to B100. 
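Spelled out, using the round dense FP16 figures quoted here:

```python
# Dense FP16 throughput, same numeric format on both chips, round figures from the conversation.
a100_fp16_tflops = 312
b100_fp16_tflops = 1000
print(f"B100 / A100 at FP16: ~{b100_fp16_tflops / a100_fp16_tflops:.1f}x")  # ~3.2x
```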

57:07

Some of that is the process improvement, some of  that is just the accelerator design improving,  

57:11

which we could replicate again in the future. It seems there's actually a very small effect  

57:16

from the process improving from 7 nm to 4 nm. I don't know the numbers offhand, but let's  

57:24

say there's 150k wafers per month of 3 nm  and eventually similar amounts for 2 nm. 

57:31

But then there's a similar amount for 7 nm. If you have all those old wafers and there's  

57:36

maybe a 50% haircut because the bits per  wafer area are 50% less or something,  

57:45

it doesn't seem that bad to just bring on 7  nm wafers if that gives you another fifty or  

57:50

hundred gigawatts. Tell me why that's naive. We potentially do go crazy enough that this  

58:01

happens because we just need incremental  compute, and the compute is worth the  

58:04

higher cost and power of these chips. But it's also unlikely to a large extent  

58:13

because some of these are not fair comparisons. For example, from A100, which is 312 teraflops,  

58:22

to Blackwell, which is 1,000 or 2,000 FP16,  and then Rubin is 5,000 or so FP16… It's not  

58:31

a fair comparison because these chips  have vastly different design targets. 

58:38

With A100, Nvidia optimized  for FP16 and BF16 numerics. 

58:45

When you look at Hopper, they didn't care  as much about that; they cared about FP8. 

58:49

When you look at Rubin, they don’t  care about FP16 and BF16 so much,  

58:53

they care mostly about FP4 and FP6. Numerics are what they've designed their chip for. 

59:06

Let's say we make a new chip design on 7 nm,  optimized for the numerics of the modern day. 

59:14

The performance difference is  still going to be much larger  

59:16

than the FLOPS difference you mentioned. Often it's easy to boil things down to FLOPS  

59:23

per watt or FLOPS per dollar,  but that's not a fair comparison. 

59:32

Let's look at Kimi K2.5 and DeepSeek. When you look at those two models and  

59:40

their performance on Hopper versus  Blackwell on very optimized software,  

59:45

you get vastly different performance. Most of this is not attributed to  

59:50

FLOPS or numerics, because those  models are actually eight-bit. 

59:55

So Blackwell and Hopper are effectively on equal footing at eight-bit, and Blackwell is not  

59:59

really taking advantage of its four-bit there. The performance gulf is actually much larger. 

60:09

Sure it's one thing to shrink process  technology and make the transistor smaller  

60:14

so each chip has X number of FLOPS,  but you forget the big gating factor. 

60:18

These models don't run on a single chip. They run on hundreds of chips at a time. 

60:22

If you look at DeepSeek's production  deployment, which is well over a  

60:25

year old now, they were running on 160 GPUs. That's what they serve production traffic on. 

60:31

They split the model across 160 GPUs. Every time you cross the barrier from one  

60:35

chip to another, there is an efficiency loss. You have to transmit over high-speed  

60:40

electrical SerDes, which brings  a latency cost and a power cost. 

60:44

There are all these dynamics that hurt. As you shrink and shrink the process node, you've  

60:51

increased the amount of compute in a single chip. Now in-chip movement of data is at least tens  

61:01

of terabytes a second, if not  hundreds of terabytes a second. 

61:04

Whereas between chips, you're on  the order of a terabyte a second. 

61:09

Then you have this movement of data between chips  that are super close to each other physically. 

61:13

You can only put so many chips  close to each other physically,  

61:15

so you have to put chips in different racks. The movement of data between racks is on the order  

61:20

of hundreds of gigabits a second, 400 gig or 800  gig a second, so roughly 100 gigabytes a second. 

61:27

So you have this huge ladder: on-chip  communication is super fast, within the  

61:32

rack is an order of magnitude slower, and outside the rack is an order of magnitude slower than that. 
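The ladder, in the round numbers used in this discussion; each hop off the package costs roughly an order of magnitude in bandwidth:

```python
# Bandwidth ladder in GB/s, round numbers from the conversation.
on_chip_GB_s = 10_000        # on-die data movement: tens of TB/s (take ~10 TB/s as a floor)
chip_to_chip_GB_s = 1_000    # package-to-package over electrical SerDes: ~1 TB/s
rack_to_rack_GB_s = 800 / 8  # a 400 or 800 Gbit/s network link: ~50-100 GB/s

for name, bw in [("on-chip", on_chip_GB_s),
                 ("chip-to-chip", chip_to_chip_GB_s),
                 ("rack-to-rack", rack_to_rack_GB_s)]:
    print(f"{name:>12}: ~{bw:,.0f} GB/s")
```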

61:39

As you break the bounds of chips,  you end up with a performance loss. 

61:43

The reason I explain this is because  when you look at Hopper versus Blackwell,  

61:47

even if both are using a rack's worth of  chips, Hopper is significantly slower. 

61:52

The amount of bandwidth you can bring to bear on the task within each domain—tens of terabytes a  

62:00

second of communication between processing elements on Blackwell, versus terabytes a second between  

62:06

them on Hopper—is much, much higher, and therefore the performance is much higher. 

62:11

When you look at inference at 100 tokens  a second for DeepSeek and Kimi K2.5,  

62:19

the performance difference between Hopper  and Blackwell is on the order of 20x. 

62:21

It's not 2x or 3x like the FLOPS  performance difference indicates,  

62:24

even though those are on the same process node. There are just differences in networking  

62:28

technologies and what they've worked on. You can translate some of these back,  

62:32

but when you look at what they're doing  on 3 nm with Rubin, some of those things  

62:36

are simply not possible to do all the way back  on A100, even if you make a new chip for 7 nm. 

62:42

There are certain architectural improvements  you can port and certain ones you cannot. 

62:47

The performance difference is not just  going to be the difference in FLOPS. 

62:50

It's in some senses cumulative between  the difference in FLOPS per chip,  

62:56

networking speed between chips, how many FLOPS  are on a chip versus a system, and memory  

63:00

bandwidth on a single chip versus an entire  system. All of these things compound. 

63:03

Can I ask you a very naive question? The B200 now has two dies on a single chip,  

63:10

so you can get that bandwidth without  having to go through NVLink or InfiniBand. 

63:16

Next year, Rubin Ultra will  have four dies on one chip. 

63:19

What is preventing us from just doing  that with an older… How many dies could  

63:24

you have on a single chip and still  get these tens of terabytes a second? 

63:28

Even within Blackwell, there are differences in  

63:32

performance when you're communicating  on the chip versus across the chips. 

63:36

Those bounds are obviously much smaller than  when you're going out of the entire chip. 

63:45

When you scale the number of chips  up, there is some performance loss. 

63:50

It's not perfect, but it is way  better than different entire packages. 

63:54

How large can advanced packaging scale? The way Nvidia is doing it is CoWoS. 

64:01

Google, Broadcom, MediaTek, and  Amazon's Trainium are all doing CoWoS. 

64:07

But actually you can go look back at what Tesla  did with Dojo, which they cancelled and restarted. 

64:16

Dojo was a chip that was  the size of an entire wafer. 

64:19

They had 25 chips on it. There were some  tradeoffs. They couldn't put HBM on it. 

64:26

But the positive side was  that they had 25 chips on it. 

64:30

To date, it is still probably the best chip  for running convolutional neural networks. 

64:35

It's just not great at transformers because the  shape of the chip, the memory, the arithmetic,  

64:41

and all these various specifications are just  not well-suited for transformers. They're  

64:45

well-suited for CNNs. Dojo chips were optimized  around that, and they made a bigger package. 

64:52

But as you make packages bigger and bigger,  you have other constraints: networking speed,  

64:59

memory bandwidth, and cooling capabilities. All of these things start to rear their heads.  

65:03

It's not simple. But yes, you will see a trend  line of more chips on the package, and yes,  

65:08

you're going to be able to do that on 7 nm. In fact, that's what Huawei did  

65:11

with their Ascend 910C or D. They initially put one, and then they did two. 

65:20

They're focusing on scaling the  packaging up because that is an  

65:23

area where they can advance faster than  process technology where they can't shrink. 

65:28

But at the end of the day, that’s something  you can do on the leading-edge chips too. 

65:32

Anything you do on 7 nm, you can also  probably do on 3 nm in terms of packaging. 

65:36

If we end up in this world in 2030 where the  West has the most advanced process technology  

65:42

but has not ramped it up as much, whereas  China… I don't know if you think by 2030  

65:48

they would have EUV and 2 nm or whatever. But they are semiconductor-pilled and they  

65:53

are producing in mass quantity. Basically, I'm wondering what  

65:57

the year is where there's a crossover, where our  advantage in process technology has faded enough,  

66:03

and their advantage in scale has increased enough. And also, if their advantage in having one country  

66:09

with the entire supply chain indigenized—rather  than having random suppliers in Germany  

66:13

and the Netherlands—would mean that China would be ahead in its ability to produce FLOPS at scale. 

66:22

To date, China still does not have an entirely  indigenized semiconductor supply chain. 

66:28

But would they in 2030? By 2030, it's possible that they do. 

66:33

But to date, all of China's 7 nm and  14 nm capacity uses ASML DUV tools. 

66:42

The amount that they can  import from ASML is large. 

66:47

But the vast majority of ASML's revenue—and all of its EUV revenue—is outside of China. 

66:54

The scale advantage is still in  the favor of the West plus Taiwan,  

66:58

Japan, and Korea, et cetera. But they're trying to make  

66:59

their own DUV and EUV tools, right? They're trying to do all these things. 

67:03

The question is how fast can they advance  and scale up production as well as quality. 

67:08

To date, we haven't seen that. Now I'm quite bullish that they're  

67:12

going to be able to do these things  over the next five to ten years. 

67:16

They will really scale up production  and kick it into high gear. 

67:20

They have more engineers working on it and  more desire to throw capital at the problem. 

67:24

So by 2030, will they have fully indigenized DUV? I think for sure. DUV, yes. 

67:28

And fully indigenized EUV by 2030? I think they'll have working tools. 

67:32

I don't think that they'll be  able to manufacture a bunch yet. 

67:36

There's having it work, and  then there's production hell. 

67:42

ASML had EUV working in the  early 2010s at some capacity. 

67:49

The tools were not accurate enough. They were not scaled for high-volume  

67:54

manufacturing or reliable enough. They had to ramp production,  

67:57

and that all took time. Production hell takes  time. That's why it took another five to seven  

68:01

years to get EUV into mass production at  a fab rather than just working in the lab. 

68:07

How many DUV tools do you think  they'll be able to manufacture in 2030? 

68:11

ASML? No, China. 

68:14

That's a great question. It's a bit of a  challenge to look into this supply chain  

68:23

especially. We try really hard. In some instances,  they're buying stuff from Japanese vendors. 

68:31

If they want a fully indigenized supply chain,  they need to not buy these lenses, projection  

68:36

optics, or stages from Japanese vendors. They need to build it internally. 

68:40

It's really tough to say where  they'll be able to get to. 

68:42

I honestly think it's a shot in the dark. But it's probably not unlikely that they'll  

68:46

be able to do on the order of 100 DUV tools  a year, whereas ASML is currently doing  

68:51

hundreds of DUV tools a year. No company has a process node  

69:00

where they make a million wafers a month. Elon says he wants to do it and China is  

69:05

obviously going to do it. TSMC is trying to do that. 

69:12

The memory makers may get to a million wafers  a month as well, but not in a single fab. 

69:16

It's mind-boggling to think of  that scale, and challenging to  

69:22

see the supply chain galvanized for that. I don't want to doubt China's capability to scale. 

69:29

I guess this is an interesting question. I think at some point SemiAnalysis  

69:34

will do the deep dive on this. By when would indigenized Chinese production  

69:44

be bigger than the rest of the West combined? And put in, as an input to your model, when they'll  

69:52

have DUV machines and EUV machines at scale. Because there's this question around if you  

69:56

have long timelines on AI—by long meaning  2035, which is not that long in the grand  

70:00

scheme of things—should you expect a world  where China is dominating in semiconductors? 

70:06

It doesn't get asked enough  because if you're in San Francisco,  

70:09

we're thinking on timescales of weeks. If you're outside of San Francisco,  

70:14

you're not thinking about AGI at all. What if we  have AGI? What if you have this transformational  

70:19

thing that is commanding tens or hundreds  of trillions of dollars of economic growth  

70:23

and token output, but it happens in 2035? What does that imply for the West versus China? 

70:33

SemiAnalysis has got to write  the definitive model on this. 

70:39

It's really challenging when you  move timescales out that far. 

70:43

What we tend to focus on is tracking every  data center, every fab, and all the tools. 

70:48

We track where they're going, but the time  lags for these things are relatively short. 

70:54

We can only make reasonably accurate estimates  for data center capacity based on land purchasing,  

71:01

permits, and turbine purchasing. We know where all these things  

71:04

are going, that's the data we sell. As you go out to 2035, things are just  

71:10

so radically different. Your error bars get so  

71:13

large it's hard to make an estimate. But at the end of the day, if takeoff  

71:19

or timelines are slow enough, I don't see why  China wouldn't be able to catch up drastically. 

71:28

In some sense, we've got this valley where, three  to six months ago, or maybe even now, Chinese  

71:36

models are as competitive as they've ever been. I think Opus 4.6 and GPT 5.4 have really pulled  

71:41

away and made the gap a little bit bigger, but  I'm sure some new Chinese models will come out. 

71:45

As we move from selling tokens where they  provide the entire reasoning chain, to  

71:53

selling automated white-collar work—an automated  software engineer, you send them the request,  

71:59

they give you the result back, and there's a bunch  of thinking on the back end that they don't show  

72:02

you—the ability to distill out of American  models into Chinese models will be harder. 

72:05

Second, look at the scale of  the compute the labs have. 

72:10

OpenAI exited last year with roughly two gigawatts. 

72:13

Anthropic will get to  two-plus gigawatts this year. 

72:17

By the end of next year, they'll  both be at ten gigawatts of capacity. 

72:21

China is not scaling their AI  lab compute nearly as fast. 

72:25

At some point, when you can't distill the  learnings from these labs into the Chinese  

72:30

models, plus with this compute race that OpenAI,  Anthropic, Google, and Meta are all racing on,  

72:37

they end up getting to a point where the model  performance should start to diverge more. 

72:44

Then look at all this CapEx  being spent on data centers. 

72:49

Amazon is spending $200  billion, Google $180 billion. 

72:53

All these companies are spending  hundreds of billions of dollars on CapEx. 

72:57

There's nearly a trillion dollars  of CapEx being invested in data  

73:02

centers in America this year, roughly. What's the return on invested capital here? 

73:08

You and I would think the return on invested  capital for data center CapEx is very high. 

73:14

If we look at Anthropic's revenues,  in January they added $4 billion. 

73:18

In February, which was a shorter  month, they added $6 billion. 

73:21

We'll see what they can do in March and April,  

73:24

given that compute constraints are  what's bottlenecking their growth. 

73:27

The reliability of Claude is quite low  because they're so compute constrained. 

73:31

But if this continues, then the ROIC  on these data centers is super high. 

73:36

At some point, the US economy starts growing  faster and faster over this year and next year  

73:42

because of all this CapEx, all the revenue these  models are generating, and the downstream supply  

73:47

chain. China doesn't have that yet. They  have not built the scale of infrastructure  

73:54

to invest in models, get to the capabilities,  and then deploy these models at such scale. 

74:00

When you look at Anthropic,  they're at $20 billion ARR. 

74:05

The margins are sub-50 percent, at least  as last reported by The Information. 

74:09

So that's $13 or $14 billion of compute that it's  running on rental cost-wise, which is actually $50  

74:16

billion worth of CapEx that someone laid out  for Anthropic to generate their current revenue. 
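A back-of-envelope version of that arithmetic, using the figures cited here; the gross margin is an assumption chosen so the rental cost matches the quoted $13-14 billion, not a reported number:

```python
# Rough sketch of the Anthropic compute arithmetic from the conversation.
arr = 20e9                  # ~$20B annualized revenue
gross_margin = 0.33         # "sub-50 percent" (assumed value to match the quoted rental cost)
compute_rental = arr * (1 - gross_margin)
print(f"Implied annual compute rental: ~${compute_rental / 1e9:.0f}B")   # ~$13B

# Someone had to lay out CapEx to build what Anthropic rents; the quoted figure is ~$50B.
capex_behind_fleet = 50e9
print(f"CapEx behind the rented fleet: ~{capex_behind_fleet / compute_rental:.1f}x annual rent")
```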

74:22

China has just not done this. If and when Anthropic 10Xs revenue again—and  

74:28

I think our answer would be when, not if—China  doesn't have the compute to deploy at that scale. 

74:34

So there is some sense that  we're in a fast takeoff. 

74:39

It's not like we're talking  about a Dyson sphere by X date,  

74:42

it's more like the revenue is compounding at  such a rate that it does affect economic growth. 

74:47

The resources these labs are  gathering are growing so fast. 

74:51

China hasn't done that yet, so in that case,  the US and the West are actually diverging. 

74:56

The flip side is that these infrastructure  investments have middling returns. 

75:01

Maybe they're not as good as hoped. Maybe Google is wrong for wanting  

75:05

to take free cash flow to zero and  spend $300 billion on CapEx next year. 

75:09

Maybe they’re just wrong and people on  Wall Street who are bearish and people  

75:13

who don't understand AI are correct. In that case, the US is building all  

75:19

this capacity but doesn't get great returns. Meanwhile, China is able to build a fully  

75:23

vertical, indigenized supply chain, instead of  the US/Japan/Korea/Taiwan/SE Asia/Europe countries  

75:33

together building this less vertical supply chain. In a sense, at some point China is able to scale  

75:40

past us if AI takes longer to get to certain  capability levels than the vast majority of  

75:47

your guests on this podcast believe. It's fast timelines, the US wins;  

75:50

long timelines, China wins. Yeah but I don't know what fast timelines means. 

75:54

I don't think you have to believe in AGI  to have the timelines where the US wins. 

76:01

Let's go back to memory. I think people on  Wall Street and people in the industry are  

76:06

understanding how big this is, but maybe generally  people don't understand what a big deal it is. 

76:10

So we've got this memory crunch,  as you were talking about. 

76:12

And earlier I was asking about,  oh, could we solve for the EUV  

76:16

tool shortage by going back to seven nanometers? So let me ask a similar question about memory. 

76:21

HBM is made of DRAM, but has three  to four times fewer bits per wafer  

76:26

area than the DRAM it's made out of. Is it possible that accelerators in the  

76:30

future could just use commodity  DRAM and not HBM, so we can get  

76:35

much more capacity out of the DRAM we have? The reason I think this might be possible is,  

76:43

if we're going to have agents that are  just going off and doing work, and it's  

76:48

not a synchronous chatbot application, then you  don't necessarily need extremely fast latency. 

76:57

Maybe you can have lower bandwidth,  because the reason you stack DRAM into  

77:04

HBM is for higher bandwidth. Is it possible to go to non-HBM  

77:09

accelerators and basically have the opposite  of Claude Code Fast, like have Claude Slow? 

77:17

At the end of the day, the incremental  purchaser who's willing to pay the highest  

77:20

price for tokens also ends up being  the one that's less price-sensitive. 

77:26

Compute should be allocated, in a capitalistic  society, towards the goods that have the  

77:31

highest value, and the private market  determines this by willingness to pay. 

77:35

To some extent, Anthropic could  actually release a slow mode. 

77:39

They could release Claude Slow Mode and increase  tokens per dollar by a significant amount. 

77:46

They could probably reduce the price of Opus 4.6  by 4-5x and reduce the speed by maybe just 2x. 

77:54

The curve on inference throughput versus  speed is already there just on HBM. 

77:59

And yet they don't, because no one  actually wants to use a slow model. 

78:04

Furthermore, on these agentic tasks, it's great  that the model can run at a time horizon of hours. 

78:11

But if the model was running slower,  those hours would become a day. 

78:16

Vice versa, if the model is running  faster, those hours become an hour. 

78:21

No one really wants to move to a day-long wait  period, because the highest-value tasks also have  

78:27

some time sensitivity to them. I struggle to see… Yes,  

78:34

you could use regular DRAM. There are a couple of challenges with this. 

78:44

One of the core constraints of chips is  that a chip is a certain size, and all  

78:52

of the I/O escapes on the edges. Often, the left and right of the  

78:58

chip are HBM—so the I/O from the chip  to the HBM is on the sides—and then the  

79:02

top and bottom are I/O to other chips. If you were to change from HBM to DDR,  

79:11

all of a sudden this I/O on the edge  would have significantly less bandwidth,  

79:17

but significantly more capacity per chip. But the metric you actually care about  

79:28

is bandwidth per wafer, not bits per wafer. Because the thing that is constraining the FLOPS  

79:34

is just getting the next matrix in and out, and for that you just need more bandwidth. 

79:39

Yeah, getting out the weights and  getting in and out the KV cache. 

79:44

In many cases, these GPUs are not  running at full memory capacity. 

79:47

It's obviously a system design thing:  model, hardware, and software co-design. 

79:52

You have to figure out how much KV cache  you need, how much you keep on the chip,  

79:55

how much you offload to other chips and  call when you need it for tool calling,  

80:00

and how many chips you parallelize this on. Obviously, the search space for this is very  

80:05

broad, which is why we have InferenceX,  an open-source model that searches all  

80:09

the optimal points on inference for a  variety of different chips and models. 

80:16

The point is, you're not always  necessarily constrained by memory capacity. 

80:22

You can be constrained by FLOPS, network  bandwidth, memory bandwidth, or memory capacity. 

80:30

If you really simplify it down,  there are four constraints,  

80:33

and each of these can break out into more. If you switch to DDR, yes, you produce  

80:39

four times the bits per DRAM wafer, but all of  a sudden the constraints shift a lot and your  

80:44

system design shifts. You go slower.  Is the market smaller? Maybe. But also,  

80:50

all these FLOPS are wasted because they're  just sitting there waiting for memory. 

80:53

You don't need all that capacity because you can't  really increase batch size because then the KV  

80:58

cache would take even longer to read. Makes sense. What is the bandwidth  

81:04

difference between HBM and normal DRAM? An HBM4 stack—let's talk about the stuff  

81:11

that's in Rubin, because that's what we've been  indexing on—is 2048 bits across, connected in an  

81:16

area that's 13 millimeters wide. It transfers memory at around  

81:22

10 giga-transfers a second. So a stack of HBM4 is 2048 bits on  

81:27

an area that's roughly 11 to 13 millimeters wide. That's the shoreline you're taking on the chip. 

81:33

In that shoreline, you have 2048 bits  transferring at 10 giga-transfers per second. 

81:39

You multiply those together and divide by eight,  

81:41

bits to a byte, and you're at roughly  2.5 terabytes a second per HBM stack. 

81:46

When you look at DDR, in that same  area, it's maybe 64 or 128 bits wide. 

81:53

That DDR5 is transferring at anywhere from 6.4 to maybe 8 giga-transfers a second. 

82:01

So your bandwidth is significantly lower. It's 64 bits times 8 giga-transfers divided by  

82:07

eight, which puts you at 64 gigabytes a second. Even if you take a generous interpretation of  

82:14

128 times 8 giga-transfers, you're at 128  gigabytes a second for the same shoreline,  

82:18

versus 2.5 terabytes a second. There's an order of magnitude  

82:21

difference in bandwidth per edge area. If your chip is a square, or 26 by 33  

82:27

millimeters—which is the maximum size for an  individual die—you only have so much edge area. 
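The shoreline arithmetic spelled out, using the round interface widths and transfer rates just quoted rather than datasheet values:

```python
# Bandwidth per ~13 mm of die edge: bus width (bits) x transfer rate (GT/s) / 8 bits per byte.
def bandwidth_GB_s(bus_width_bits, giga_transfers_per_s):
    return bus_width_bits * giga_transfers_per_s / 8

hbm4_stack = bandwidth_GB_s(2048, 10)   # one HBM4 stack: 2048 bits at ~10 GT/s
ddr5_64 = bandwidth_GB_s(64, 8)         # DDR5 in the same shoreline: 64 bits at ~8 GT/s
ddr5_128 = bandwidth_GB_s(128, 8)       # the "generous interpretation": 128 bits wide

print(f"HBM4 stack:     ~{hbm4_stack / 1000:.1f} TB/s")  # ~2.5 TB/s
print(f"DDR5 (64-bit):  ~{ddr5_64:.0f} GB/s")            # ~64 GB/s
print(f"DDR5 (128-bit): ~{ddr5_128:.0f} GB/s")           # ~128 GB/s
print(f"Gap: ~{hbm4_stack / ddr5_128:.0f}x per unit of die edge")
```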

82:32

On the inside of that chip,  you put all your compute. 

82:34

There are things you can do to try and  change that, like more SRAM or more caching. 

82:38

But at the end of the day, you're  very constrained by bandwidth. 

82:42

Then there's the question of where you can  destroy demand to free up enough for AI. 

82:48

I guess the picture is especially bad because,  as you're saying, if it takes four times more  

82:52

wafer area to get the same byte, for HBM you have  to destroy four times as much consumer demand for  

82:58

laptops and phones to free up one byte for AI. What does this imply for the next year or two? 

83:08

Sorry for the run-on question, in your newsletter  you said 30% of Big Tech's CapEx in 2026 is going  

83:14

towards memory? Yes. 

83:16

That's insane, right? Of the $600 billion  or whatever, 30% is going just to memory. 

83:23

Yes. Obviously, there's some level  of margin stacking that Nvidia does,  

83:26

so you have to separate that out and apply  their margin to the memory and the logic. 

83:30

But at the end of the day, a third  of their CapEx is going to memory. 

83:33

That's crazy. What should we expect over the  next year or two as this memory crunch hits? 

83:41

The memory crunch will continue to get  harder, and prices will continue to go up. 

83:48

This affects different parts  of the market differently. 

83:52

Are people going to hate AI more and more? Yes, because smartphones and PCs are not  

83:56

going to get incrementally better year on year. In fact, they're going to get incrementally worse. 

84:00

If you look at the bill of materials for an  iPhone, what fraction of it is the memory? 

84:04

How much more expensive does an iPhone get  if the memory is two times more expensive? 

84:09

I believe an iPhone has 12 gigabytes of memory. Each gig used to cost roughly $3-4, so that's $50. 

84:17

But now the price of memory has tripled. Let's say it's $12 per gig for DDR. 

84:23

Now you're talking about $150 versus $50. That's a $100 increase in cost for Apple. 

84:30

Apple has some margin, they're  not just going to eat the margin. 

84:32

NAND also has the same market dynamics. So that's roughly a $100 cost increase just on the DRAM, and with NAND it's probably more like a $150 increase on the iPhone. 

84:41

Apple either has to pass that  on to the consumer or eat it. 

84:46

I don't see Apple reducing their margin  too much, maybe they eat a little bit. 

84:49

But at the end of the day, that means the end  consumer is paying $250 more for an iPhone. 

84:54

Now that’s just on last  year’s pricing versus today’s. 
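A back-of-envelope version of that bill-of-materials math; the NAND increment and the retail pass-through multiplier are illustrative assumptions, not Apple figures:

```python
# iPhone memory-cost swing, using the round numbers from the conversation.
dram_gb = 12
old_price_per_gb = 4        # "roughly $3-4" per GB
new_price_per_gb = 12       # after DRAM prices roughly triple

dram_delta = dram_gb * (new_price_per_gb - old_price_per_gb)   # ~$96, call it ~$100
nand_delta = 50             # assumption: NAND adds roughly the rest of the ~$150 hit
bom_delta = dram_delta + nand_delta                            # ~$150 BOM increase

retail_markup = 1.7         # assumed pass-through multiplier if Apple protects its margin
print(f"BOM increase: ~${bom_delta}, retail increase: ~${bom_delta * retail_markup:.0f}")  # ~$250
```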

84:59

There is some lag before Apple feels the heat  because they tend to have long-term contracts  

85:06

for memory that last three months to a year. But at the end of the day, Apple gets hit  

85:09

pretty hard by this. They won't really  

85:13

adjust until the next iPhone release. But that's the high end of the market,  

85:17

which is only a few hundred million phones a year. Apple sells two or three hundred million  

85:20

phones annually. The bulk of the market is mid-range and low-end. 

85:25

It used to be that 1.4 billion  smartphones were sold a year. 

85:28

Now we're at about 1.1 billion. Our projections are that we might  

85:31

drop to 800 million this year, and  down to 500 or 600 million next year. 

85:37

We look at data points out of China  from some of our analysts in Asia,  

85:42

Singapore, Hong Kong, and Taiwan. They've been tracking this,  

85:45

and they see Xiaomi and Oppo cutting low-end  and mid-range smartphone volumes by half. 

85:52

Yes, it’s only a $150 BOM increase on a $1,000  iPhone where Apple has some larger margin. 

86:02

But for smaller phones, the percentage of the BOM  that goes to memory and storage is much larger. 

86:08

And the margins are lower, so there's  less capacity to even eat the margins. 

86:13

And they have also generally tended not  to do long-term agreements on memory. 

86:20

Why this is a big deal is that if smartphone  volumes halve, that drop will happen in  

86:26

the low and mid-range, not the high end. So it’s not like the bits released are halving. 

86:32

Currently, consumer devices account  for more than half of memory demand. 

86:35

Even if you halve smartphone volumes,  because of the shape of the halving,  

86:38

the low end gets cut by more than half, while  the high end gets cut by less than half,  

86:42

because you and I will still buy the high-end  phones that cost north of a thousand dollars. 

86:46

We'll buy them even if they get  a little bit more expensive. 

86:48

And Apple's volumes will not go down as  much as a low-end smartphone provider.  

86:52

The same applies to PCs. What this  does to the market is quite drastic. 

86:59

DRAM gets released and goes to AI chips, whose makers are willing to do longer-term contracts and pay higher  

87:06

margins, because at the end of the day the margin  they extract from the end user is much larger. 

87:14

This probably leads to people hating AI even more. Today, you already see all the memes on PC  

87:22

subreddits and gaming PC Twitter. It's cat dancing videos saying,  

87:27

"This is why memory prices have doubled and  you can't get a new gaming GPU or desktop." 

87:33

It's going to be even worse when memory  prices double again, especially DRAM. 

87:37

Another interesting dynamic is that  it's not just DRAM, it's also NAND. 

87:42

NAND is also going up in price. Both of these markets have expanded capacity very  

87:46

slowly over the last few years, NAND almost zero. The percentage of NAND that goes to phones and  

87:54

PCs is larger than the percentage  of DRAM that goes to phones and PCs. 

87:58

As you destroy demand, mostly for  DRAM purposes, you unlock more NAND  

88:03

that gets allocated and can go to other markets. The price increases of DRAM will be larger than  

88:09

those of NAND, because you've released more NAND from the consumer side, and in effect,  

88:13

freed up more memory for AI. Sorry, maybe you just explained  

88:18

it and I missed it. Is it because SSDs are  

88:21

being used in large quantities for data centers? They are, but not in as large quantities as DRAM. 

88:27

Okay, so they will also increase because  they'll be using some quantity, but there's  

88:32

not as much of a need as there is for HBM. Makes  sense. One thing I didn't appreciate until I was  

88:37

reading some of your newsletters is that the  same constraints preventing logic scaling over  

88:43

the next few years are quite similar to what's  preventing us from producing more memory wafers. 

88:49

In fact, literally the same exact machine,  this EUV tool, is needed for memory. 

88:55

So I guess the question someone could ask right  now is, why can't we just make more memory? 

89:05

The constraints, as I was mentioning earlier,  are not necessarily EUV tools today or next year. 

89:11

They become that as we get to  the latter part of the decade. 

89:15

Currently, the constraints are more that  they physically just haven't built fabs. 

89:20

Over the last three to four years,  these vendors have not built new fabs  

89:25

because memory prices were really low. Their margins were low, and in fact,  

89:29

they were losing money in 2023 on memory. So they decided they weren't building new fabs. 

89:34

The market slowly recovered over time but  never really got amazing until last year. 

89:40

In 2024, we were banging on the drums  that reasoning means long context,  

89:44

which means a large KV cache, which  means you need a lot of memory demand. 

89:48

We've been talking about that  for a year and a half, two years. 

89:51

People who understand AI went  really long on memory then. 

89:57

So you’ve seen that dynamic, but now  it has finally played out in pricing. 

90:01

It took so long for what was  obvious: long context means the KV  

90:05

cache gets bigger, you need more memory. Half the cost of accelerators is memory. 

90:09

Of course they're going to  start going crazy on it. 

90:13

It took a year for that to  actually reflect in memory prices. 

90:16

Once memory prices reflected that, it  took another three to six months for the  

90:20

memory vendors to start building fabs. Those fabs take two years to build. 

90:24

So we won't have really meaningful fabs to even  put these tools in until late 2027 or 2028. 

90:34

Instead, you've seen some really  crazy stuff to get capacity. 

90:39

Micron bought a fab from a company in  Taiwan that makes lagging-edge chips. 

90:47

Hynix and Samsung are doing some pretty  crazy things to try and expand capacity  

90:51

at their existing fabs, which also have  large knock-on effects in the economy. 

90:56

So why can't we build more capacity? There's nowhere to put the tools. 

91:02

It's not just EUV; there are other  tools involved in DRAM and logic. 

91:06

In logic, for N3, about 28%  

91:11

of the cost of the final wafer is EUV. When you look at DRAM, it's in the teens. 

91:19

It's going up, but it's a much  smaller percentage of the cost. 

91:24

These other tools are also bottlenecks, although  their supply chains are not as complex as ASML's. 

91:30

You see Applied Materials, Lam  Research, and all these other  

91:32

companies expanding capacity a lot as well. But you don't have anywhere to put the tool,  

91:37

because the most complex buildings people make  are fabs, and fabs take two years to build. 

92:40

I interviewed Elon recently, and his whole plan  is that they're going to build this TeraFab  

92:47

and they're going to build the clean rooms. I won't even ask you about the dirty rooms thing,  

92:53

but let's say they build the clean rooms. I have a couple of questions. 

92:58

One, do you think this is the kind of  thing that Elon Co. could build much  

93:04

faster than people conventionally build it? This is not about building the end tools. 

93:07

This is just about building the facility itself. How complicated is it to just build  

93:11

the clean room extremely fast? Is this something that Elon, with his "move  

93:15

fast" approach, could do much faster if that's  what we're bottlenecked on this year or next year? 

93:19

Two, does that even matter if, in two years,  your view is that we're not bottlenecked on  

93:24

clean room space, but on the tooling? As with any complex supply chain,  

93:29

it takes time, and constraints shift over time. Even if something is no longer a constraint, that  

93:33

doesn't mean that market no longer has margin. For example, energy will not be a big bottleneck  

93:40

a couple of years from now, but that  doesn't mean energy isn't growing super  

93:43

fast and there's no margin there. It's just not the key bottleneck. 

93:47

In the space of fabs, clean rooms are the  biggest bottleneck this year and next year. 

93:52

As we get to 2028, 2029, 2030, there  will still be constraints there. 

93:57

The thing about Elon is he has a tremendous  capability to garner physical resources and  

94:04

really smart people to build things. The way he recruits amazing people  

94:08

is by trying to build the craziest stuff. In the case of AI, that hasn't really worked  

94:12

because everyone's trying to build AGI. Everyone  is very ambitious. But in the case of going to  

94:17

Mars, making rockets that land themselves, fully  autonomous electric cars, or humanoid robots,  

94:25

these are methods of recruiting the people who  think that's the most important problem in the  

94:28

world to work on that problem, because  he's the only one trying really hard. 

94:31

In the case of semiconductors, he stated he wants  to make a fab that's a million wafers per month. 

94:35

No one has a fab that big. It's possible that he's able to recruit a  

94:41

lot of really awesome people and get them on this  crazy task of building a million wafers a month. 

94:47

Step one is to build the clean room,  and that I think he probably can do. 

94:53

His mindset around deleting things, that it  can be dirty, it's fine, is probably not right. 

94:58

Actually I think it’s 100% not right. You need the fab to be very clean. 

95:06

All of the air in the fab gets replaced  every three seconds, it’s that fast. 

95:11

There have to be so few particles. But I think he can build the clean room. 

95:14

It'll take a year or two. Initially, it won't be super fast,  

95:17

but over time, he'll get faster at it. The really complex part is actually developing  

95:21

a process technology and building wafers. I don't think he can develop that quickly. 

95:26

That has a lot of built-up knowledge. The most complicated integration of  

95:32

very expensive tools and supply chains  is done by TSMC, Intel, or Samsung. 

95:39

The latter two aren't even that great at it, and the integration is tremendously complex. 

95:43

How surprised would you be if in 2030  there just happened to be some total  

95:48

disruption where we're not using EUV? What if we're using something that has  

95:52

much better effects, is much simpler to produce,  and can be produced in much bigger quantities? 

95:56

I'm sure as an industry insider that  sounds like a totally naive question,  

95:58

but do you see what I'm asking? What probability should we put on  

96:03

something coming totally out of left  field to make all of this irrelevant? 

96:07

Something that's very simple and easy to  scale, I assign a very, very low probability. 

96:12

There are a number of companies  working on effectively particle  

96:16

accelerators or synchrotrons that generate  light that's either 13.5 nanometer, like EUV,  

96:21

or an even narrower wavelength, like X-ray at  7 nanometers, to then use in lithography tools. 

96:29

But those things are massive particle  accelerators generating this light. 

96:32

It's a very complicated thing to build. There are a couple of companies and I think  

96:35

that could be a big disruption  to the industry beyond EUV. 

96:38

But I don't think we're going to  magically build something new that  

96:43

is direct write and super simple, and can  be manufactured at huge volumes, although  

96:49

there are some attempts to do things like this. I ask because if you think about Elon's companies  

96:54

in the past, rocketry was this thing that was  thought to be—and is—incredibly complicated. 

96:59

Look, I'm just a naive yapper compared to Elon.  What have I built? So maybe it's possible. 

97:05

In order to build more memory in the  future, could we build 3D DRAM the way  

97:12

we do 3D NAND and then go back to DUV? That is the hope currently. Everyone's  

97:17

roadmap for 3D DRAM is that you'll still use EUV  because you want to have that tighter overlay. 

97:24

When you're doing these subsequent processing  steps, everything is vertically stacked and you  

97:28

have more layers on top of each other. You want the pitches to be tighter. 

97:33

So generally, people are still  trying to do it with EUV. 

97:35

But what 3D would do is change the calculation  of how many bits a single EUV pass can make. 

97:42

That number would go up drastically if you  go to 3D DRAM. That is the hope. Right now,  

97:47

everyone's roadmap goes from the current 6F² cell, to a 4F² cell, and then finally 3D DRAM by the end  

97:56

of the decade or early next decade. There's still a lot of R&D,  

98:00

manufacturing, and integration to be done. I wouldn't call that out of the cards. 

98:04

I think it's very likely going to happen. It's also going to require a huge  

98:08

retooling of fabs. The breakdown of  

98:11

tools in a fab will be very different. The lithography tool is actually the  

98:14

only thing that isn't that different. But the number of them relative to different  

98:18

types of chemical vapor deposition, atomic layer  deposition, dry etch, or different kinds of etch  

98:25

chambers with different chemistries… You have all  these different tools for different process nodes. 

98:31

You can't just convert a logic fab to a  DRAM fab, or vice versa, or a NAND fab  

98:35

to a DRAM fab, in a short amount of time. In the same way, existing DRAM fabs require a  

98:41

lot of retooling just to go from 1-alpha to 1-beta  to 1-gamma process nodes, because they have to  

98:46

add EUV and change the deposition and etch chemistry stacks for when you're using EUV. 

98:51

And the EUV tool has to be there. Furthermore, when you change to 3D DRAM,  

98:55

there's going to be an even larger shift, so a  lot of retooling of these fabs needs to happen. 

99:01

That would be a big disruption. That would make EUV demand generally lower. 

99:06

But as we've seen across time, lithography demand  as a percentage of wafer cost has trended up. 

99:12

Around the 2014 era, it was 17% of the wafer cost,  and it's gone to 30% over the last fifteen years. 

99:24

For DRAM, it was in the low to mid-teens,  and now it's trended toward the high teens. 

99:30

Before we get to 3D DRAM, it'll  likely cross into the 20% range. 

99:33

But then, if we get to 3D DRAM, EUV as a percentage of the total end wafer cost tanks again. 

99:39

I guess you care less about the percent of cost  and more about how much it bottlenecks production. 

99:43

Right, but the percentage of cost— It’s a proxy, yeah. If you're Jensen  

99:50

or Sam Altman, or whoever stands to  gain a lot from scaling up AI compute,  

99:56

there are these stories that they'd go to  TSMC and say, "Why can't we access Y and Z?" 

100:01

But I think the point you're  making is that it doesn't really  

100:06

matter what TSMC does in some sense. In fact, even if you have Intel and  

100:09

Samsung building more foundries, in the  long run, you're going to be bottlenecked  

100:13

by ASML and other tool and material makers. First, is that a correct interpretation? 

100:18

Second, should Silicon Valley people be  going to the Netherlands right now to try  

100:23

to pitch ASML to make more tools so that  in 2030 they can have more AI compute? 

100:30

It's a funny dynamic we saw  in 2023, 2024, and 2025. 

100:35

People who saw the energy bottleneck  before others asymmetrically went to  

100:40

Siemens, Mitsubishi, and of course GE  Vernova, and bought up turbine capacity. 

100:45

Now they're able to charge a premium for deploying  

100:47

these turbines, because energy is so scarce. In the same sense, this could be done for EUV,  

100:52

except ASML is not just going to trust any  random bozo who wants to buy EUV tools. 

101:00

These turbines are much cheaper than EUV  tools, and there's many more of them produced. 

101:04

Especially once you get to industrial gas  turbines, not just combined-cycle but the cheaper,  

101:10

smaller, less efficient ones, people put  down deposits for these. Someone could  

101:15

do this. Someone should go to the Netherlands  and be like, "I'll pay you a billion dollars. 

101:21

You give me the right to purchase ten EUV tools  two years from now, and I'm first in line." 

101:30

Then over those two years, you go around  and wait for everyone to realize, "Oh crap,  

101:34

I don't have enough EUV tools," and you  try to sell your option at some premium. 

101:38

All you're effectively doing  is saying, "ASML, you're dumb. 

101:41

You weren't making enough margin on these. I'm going to make a margin." 

101:44

The question is, will ASML even  agree to this? I don't think so. 

101:49

There's a world where they at least get the  demand signal from that to increase production. 

101:53

Potentially. I agree. But it sounds like you're  

101:56

saying they couldn't even increase production  if they wanted to, given the supply chain. 

101:59

Right. But that's exactly the market in  which… If they can't increase production,  

102:02

just like TSMC cannot increase production  that fast, and yet demand is mooning,  

102:06

then the obvious solution is to arbitrage this. You and I know demand is way higher than they're  

102:12

projecting and their capability to build. You arbitrage this by locking up the capacity,  

102:17

doing a forward contract, and then  trying to sell it at a later date  

102:21

once other people realize everything is  fucked and we don't have enough capacity. 

102:26

Then you'll have this insane margin that  ASML and TSMC should have been charging. 

102:30

But the thing is, I don't know if  ASML and TSMC will ever agree to this. 

102:34

Let me ask you about power now. It sounds like you think power  

102:37

can be arbitrarily scaled. Not arbitrarily, but yes. 

102:41

But beyond these numbers. If I'm  remembering correctly, your blog post on  

102:47

how AI labs are increasing power implied that  GE Vernova, Mitsubishi, and Siemens could  

102:54

produce 60 gigawatts a year in gas turbines. Then there are these other sources,  

102:59

but they're less significant than the turbines. Only a fraction of that goes to AI, I assume. 

103:10

If in 2030 we have enough logic and memory to  do 200 gigawatts a year, do you just think that  

103:15

these things are on a path to ramp up to more  than 200 gigawatts a year, or what do you see? 

103:20

Right now we're at 20 or 30. This is critical IT capacity, by the way,  

103:26

which is an important thing to mention. When I'm talking about these gigawatts,  

103:29

I'm talking about critical IT capacity. Server plugged in, that's how much power it pulls. 

103:32

But there are losses along the chain. There is loss on transmission,  

103:37

conversion, cooling, et cetera. So you should gross this  

103:43

up from 20 gigawatts for this year, or 200  gigawatts by the end of the decade, to some  

103:49

number 20-30% higher. Then you have capacity  factors. Turbines don't run at 100 percent. 

103:54

If you look at PJM, which I think is the largest  grid in America—covering the Midwest and some of  

103:59

the Northeast area—in their models they want  to have roughly 20 percent excess capacity. 

104:12

Within that 20 percent excess capacity,  they're running all the turbines at 90%  

104:16

because they are derated some for  reliability, maintenance, and so on. 
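A rough sketch of how a critical IT load grosses up to nameplate generation under the factors just described; the exact overhead, derating, and reserve margin vary by site and grid:

```python
# Critical IT watts -> nameplate generation, round factors from the conversation.
critical_it_gw = 20        # the servers' plug load, the figure quoted for this year

overhead = 1.25            # +20-30% for transmission, conversion, and cooling losses
facility_gw = critical_it_gw * overhead

derate = 0.90              # turbines held at ~90% for reliability and maintenance
reserve_margin = 1.20      # PJM-style ~20% excess capacity

nameplate_gw = facility_gw / derate * reserve_margin
print(f"~{critical_it_gw} GW critical IT implies ~{nameplate_gw:.0f} GW of nameplate generation")
```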

104:22

In reality, the nameplate capacity for energy is  always way higher than the actual end critical IT  

104:26

capacity because of all these factors. But it's  not just turbines. If you were just making power  

104:32

from turbines, that's simple, boring, and easy. Humans and capitalism are far more effective. 

104:41

The whole point of that blog was that, yes, there  are only three people making combined-cycle gas  

104:45

turbines, but there's so much more we can  do. We can do aeroderivatives. We can take  

104:49

airplane engines and turn them into turbines. There are even new entrants in the market,  

104:55

like Boom Supersonic trying to  do that and working with Crusoe. 

104:58

Also, there are all the other options like this that already exist in the market. 

105:00

There are also medium-speed  reciprocating engines: engines  

105:04

that spin in circles, like a diesel engine. There are ten people who make engines that way. 

105:13

I'm from Georgia, and people used  to be like, "Oh man, you got a  

105:15

Cummins engine in there," regarding RAM trucks. Automobile manufacturing is going down, so these  

105:22

companies all have capacity and could scale  and convert that for data center power. 

105:26

You stick all these reciprocating engines in. It's not as clean as combined-cycle, but maybe you  

105:31

can convert them from diesel to gas if you want.  What about ship engines? All of these engines for  

105:38

massive cargo ships are great. Nebius is doing that for  

105:41

a Microsoft data center in New Jersey. They're running ship engines to generate power. 

105:49

Bloom Energy is doing fuel cells. We've been very positive on them for  

105:52

a year and a half now because they have such  a capability to increase their production. 

105:57

Their payback period for a production  increase is very fast, even if the cost  

106:01

is a little bit higher than combined-cycle,  which is the best for cost and efficiency. 

106:06

Then there's solar plus battery, which can come  online as those cost curves continue to come down. 

106:11

There's wind, where you might only expect 15  percent of the maximum power because things  

106:18

oscillate, but you add batteries. There are  all these things. The other thing is that the  

106:23

grid is scaled so we don't cut off power at  peak usage on the hottest day of the summer. 

106:32

But in reality, that's a load spike  that is 10-20% higher than the average. 

106:37

If you just put enough utility-scale  batteries, or peaker plants that only  

106:41

run a small portion of the year—and those could  be gas, industrial gas turbines, combined-cycle,  

106:49

batteries, or any of the other sources  I mentioned—then all of a sudden you've  

106:54

unlocked 20% of the US grid for data centers. Most of the time that capacity is sitting idle. 

107:00

It's really only there for that peak, which is  just a few hours over a few days of the year. 
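In round numbers, the unlock being described:

```python
# Grid peak-headroom unlock, round numbers from this discussion.
us_grid_gw = 1000          # the US grid is terawatt-scale
peak_headroom = 0.20       # capacity held for a peak ~10-20% above average load

print(f"Covering the peak with batteries/peakers frees ~{us_grid_gw * peak_headroom:.0f} GW")
```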

107:07

If you have enough capacity  to absorb that peak load,  

107:11

then all of a sudden you've unlocked all of that capacity. Today, data centers are only 3-4% of the power of  

107:15

the US grid, and by 2028 they'll be 10%. But if you can unlock 20% of the US grid  

107:20

like this, it's not that crazy. The US grid is terawatt-level,  

107:25

not hundreds-of-gigawatts-level. So we can add a lot more energy. I'm not saying  

107:33

it's easy. These things are going to be hard. There's a lot of hard engineering,  

107:36

risks people have to take, and new  technologies people have to use. 

107:40

But Elon was the first to do this behind-the-meter  gas, and since then we've seen an explosion of  

107:45

different things people are doing to get power. They're not easy,  

107:50

but people are gonna be able to do them. The supply chains are just way simpler than chips. 

107:56

Interesting. He made the point during the  interview that for the specific blade for  

108:00

the specific turbine he was looking at, the lead  times go out beyond 2030. Your point is that— 

108:06

That's great. There are so many other ways to  make energy. Just be inefficient. It's fine. 

108:10

Right now, combined-cycle gas turbines  have CapEx of $1,500 per kilowatt. 

108:17

Are you saying it would make sense to use technologies that are much  

108:20

more expensive than that, or that other things are getting cheap enough to be competitive? 

108:24

Exactly. It can be as high as $3,500 per kilowatt. It could be twice as much as the cost of  

108:31

combined-cycle, and the total cost of the GPU on  a TCO basis has only gone up a few cents per hour. 

108:40

Because we've been talking about Hopper pricing,  $1.40, let's say the power price doubles. 

108:46

The Hopper that was $1.40 is now $1.50 in cost. I don't care, because the models are improving so  

108:54

fast that the marginal utility of them is worth  way more than that ten-cent increase in energy. 
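Spelled out; the per-GPU power draw and baseline electricity price are illustrative assumptions picked to match the ~$0.10/hr energy component implied by the $1.40 to $1.50 example:

```python
# Why doubling the power price barely moves the hourly GPU cost.
hopper_cost_per_hr = 1.40
gpu_power_kw = 1.0              # assumed all-in draw per GPU including cooling overhead
power_price_per_kwh = 0.10      # assumed baseline electricity price

energy_per_hr = gpu_power_kw * power_price_per_kwh          # ~$0.10/hr
with_doubled_power = hopper_cost_per_hr + energy_per_hr     # the extra ~$0.10 if power doubles
print(f"${hopper_cost_per_hr:.2f}/hr becomes ~${with_doubled_power:.2f}/hr")  # ~$1.50
```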

109:00

So you're saying 20 percent of the grid—the grid  is about one terawatt—can just come online from  

109:06

utility-scale batteries, increasing what  you'd be comfortable putting on the grid. 

109:11

The regulatory mechanism  there is not easy, by the way. 

109:13

But that's 200 gigawatts, if  that hypothetically happens. 

109:18

Just from the different sources of gas generation  you mentioned—the different kinds of engines  

109:22

and turbines—combined, how many gigawatts  could they unlock by the end of the decade? 

109:28

We're tracking this in our data. There are over 16 different manufacturers  

109:33

of power-generating things just from gas alone. Yes, there are only three turbine manufacturers  

109:39

for combined-cycle, but we're  tracking 16 different vendors,  

109:43

and we have all of their orders. It turns out there are hundreds of  

109:47

gigawatts of orders to various data centers. As we get to the end of the decade,  

109:51

we think something like half of the capacity  that's being added will be behind the meter. 

109:59

Behind the meter is almost always more expensive  than grid-connected, but there are just a lot of  

110:03

problems with getting grid-connected: permits and  interconnection queues and all this sort of stuff. 

110:08

So even though it's more expensive,  people are doing behind the meter. 

110:12

What they're doing behind the meter ranges widely. It could be reciprocating engines, ship engines,  

110:17

or aeroderivatives. It could be combined-cycle,  

110:19

although combined-cycle is not  that great for behind the meter. 

110:22

It could be Bloom Energy fuel  cells, or solar plus battery. 

110:26

It could be any of these things. And you're saying any of these  

110:29

individually could do tens of gigawatts? Any of these individually will do tens of  

110:34

gigawatts, and as a whole, they  will do hundreds of gigawatts. 

110:36

Okay. So that alone should more than— Electrician wages will probably  

110:42

double or triple again. There are going to be a lot of new people entering  

110:45

that field, and a ton of people who make money,  but I don't see that as the main bottleneck. 

110:51

Right now in Abilene, at the 1.2-gigawatt data  center that Crusoe is building for OpenAI,  

110:59

I think they have 5,000 people  working there, or at peak they did. 

111:04

If you turn that into 100 gigawatts—and  I'm sure things will get more efficient  

111:10

over time—that would be 400,000 people  it would take to build 100 gigawatts. 
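That scaling, spelled out with the figures cited:

```python
# Labor scaling from the Abilene example.
workers_at_abilene = 5_000
abilene_gw = 1.2
target_gw = 100

workers_needed = workers_at_abilene / abilene_gw * target_gw
print(f"~{workers_needed:,.0f} workers for {target_gw} GW")  # ~417,000, the ~400,000 figure
```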

111:16

If you think about the US labor force, and  how many electricians there are and how  

111:20

many construction workers there are… I  guess there are 800,000 electricians. 

111:24

I don't know if they're all  substitutable in this way. 

111:26

There are millions of construction workers. But if we're in a world where we're adding  

111:30

200 gigawatts a year, are we going to be  crunched on labor eventually, or do you  

111:35

think that is actually not a real constraint? Labor is a big constraint. It's a humongous  

111:38

constraint in this. People have to  be trained. Likewise, we'll probably  

111:43

start importing the highest-skilled labor. It makes sense that a really high-skilled  

111:50

electrician in Europe who was working on decommissioning power plants now comes  

111:55

to America and builds the high-voltage systems that move electricity across a data center. 

112:03

Humanoid robots or robotics at least might start  to help, but the main factor for reducing the  

112:09

number of people is going to be modularizing  things and making them in factories in Asia. 

112:13

Unfortunately for America, places like Korea,  Southeast Asia, and in many ways China as well  

112:24

are going to build more and more complete sections of the data center, and those will be shipped in. 

112:34

Today you ship servers or a rack in, and then you plug that into different pieces that 

112:40

you're shipping from different places. But now the entire thing will be 

112:43

integrated at the factory and shipped as one unit. Maybe this is a two-megawatt block, 

112:48

and this block goes from high-voltage AC  power to the DC voltage that you deliver  

112:56

to the rack, or something like this. Or with cooling, you ship a fully  

113:03

integrated unit that has a lot of the  cooling subsystems already put together,  

113:08

because plumbers are also a big constraint here. Furthermore, instead of just a single rack where  

113:13

you have people wiring up all these racks with  electricity, you take a skid and put an entire  

113:19

row of servers on it that is  shipped directly from the factories. 

113:25

Today, a single rack may be 120 or 140 kilowatts,  but as we get to next-generation Nvidia Kyber and  

113:32

things like that, it's almost a megawatt. In addition, if you do an entire  

113:36

row, it'll have the rack, the networking, the  cooling, and the power all integrated together. 

113:42

Now when you come in, you have much less to cable. There's less networking fiber, fewer power  

113:53

connections, and fewer plumbing things. This can drastically reduce the number  

113:58

of people working in data centers, so our  capability to build them will be much larger. 

114:03

Along the way, some people will move faster  to new things, and some will move slower. 

114:08

Crusoe and Google have been talking  a lot about this modularization,  

114:12

as have companies like Meta and many others. The people who move faster to new things may  

114:24

face delays, while the people who  are slower will face labor problems. 

114:27

There will always be dislocations in the market  because this is a very complex supply chain. 

114:30

At the end of the day, it's still  simple enough that we will be able  

114:33

to solve it through capitalism and human  ingenuity on the timescales required. 

114:39

Speaking of big problems to solve, Elon  Musk is very bullish on space GPUs. 

114:46

If you're right that power is not a constraint  on Earth… I guess the other reason they would  

114:50

make sense is that even if there will be  enough gas turbines or whatever on Earth,  

114:55

Elon's next argument is that you can't get the  permitting to build hundreds of gigawatts on  

115:00

Earth. Do you buy that argument? Land-wise, America is big. Data  

115:05

centers don't actually take up that  much space, so you can solve that. 

115:09

Permitting-wise, air pollution permits are  a challenge, but the Trump administration  

115:12

made it much easier. You go to Texas,  

115:15

and you can skip a lot of this red tape. Elon had to deal with a lot of this complex  

115:22

stuff in Memphis, and then building a power  plant across the border for Colossus 1 and 2. 

115:28

But at the end of the day, there's a lot more  you can get away with in the middle of Texas. 

115:32

Given that Elon lives in Texas,  why didn't he just go to Texas? 

115:34

I think it was partially that they over-indexed  on grid power for a temporary period of time. 

115:40

That's just what they thought they needed more of. Because they had an aluminum refinery  

115:43

connected to the grid there. It was actually an idled appliance factory. 

115:50

But I think they may have indexed more to  grid power, water access, and gas access. 

115:56

I think they bought that knowing the gas  line was right there and they were going  

115:59

to tap it. Same with water. It was a  whole host of different constraints. 

116:03

It was probably an area where  electricians were easier to find. 

116:07

At the end of the day, I'm not  exactly sure why they chose that site. 

116:10

I bet Elon would've chosen somewhere in  Texas if he could've gone back because  

116:16

of the regulatory challenges he faced. Ultimately, permitting is a challenge,  

116:23

but America is a big place with 50  states, and things will get done. 

116:27

There are a lot of small jurisdictions where  you can just transport in all the workers  

116:32

you need for a temporary period of three to  twelve months, depending on the contractor. 

116:37

You can put them in temporary housing and pay out  the butt, because labor is very cheap relative  

116:44

to the GPUs and the networking, and the end  value of the tokens it's going to produce. 

116:52

So there is plenty of room to  pay for all of these things. 

116:59

Also, people are diversifying now. Australia, Malaysia, Indonesia, and India 

117:06

are all places where data centers  are going up at a much faster pace. 

117:09

But currently, over 70% of AI  data centers are still in America,  

117:12

and that continues to be the trend. People are figuring out how to build these things. 

117:19

Ultimately, dealing with permitting and  red tape in middle-of-nowhere Texas,  

117:23

Wyoming, or New Mexico is probably a hell of  a lot easier than sending stuff into space. 

117:30

Other than the economic argument making less sense  once you consider that energy is a small fraction  

117:36

of the total cost of ownership of a data center,  what are the other reasons you're skeptical? 

117:41

Obviously, power is basically free in space. That's the reason to do it. 

117:45

Yeah, that's the reason to do it. But there are all the other counterarguments. 

117:50

Even if power costs double on Earth, it's  still a fraction of the total cost of the GPU. 
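To make that concrete, here is a minimal sketch of GPU total cost of ownership per hour. All of the inputs (the $40,000 all-in capital cost, 1.4 kW of power including overhead, $0.08/kWh, a five-year life) are illustrative assumptions, not figures from the conversation.

```python
# Back-of-envelope: why electricity is a small slice of a GPU's hourly cost.
gpu_capex = 40_000                      # assumed all-in capital cost per accelerator, USD
useful_life_hours = 5 * 365 * 24        # five-year useful life -> 43,800 hours
capex_per_hour = gpu_capex / useful_life_hours   # ~$0.91 per hour

power_kw = 1.4                          # assumed GPU plus its share of facility overhead
price_per_kwh = 0.08                    # assumed industrial power price, USD
energy_per_hour = power_kw * price_per_kwh       # ~$0.11 per hour

share = energy_per_hour / (capex_per_hour + energy_per_hour)
print(f"energy is ~{share:.0%} of hourly cost; doubling it adds only ~${energy_per_hour:.2f}/hr")
```

Under these assumptions energy is roughly a tenth of the hourly cost, which is why even a doubling of power prices barely moves the economics relative to the chips themselves.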

117:54

The main challenge is… We have  ClusterMAX, which rates all the neoclouds. 

118:03

We test over 40 cloud companies,  including the hyperscalers and neoclouds. 

118:06

Outside of software, what differentiates these  clouds the most is their ability to deploy and  

118:11

manage failure. GPUs are horrendously unreliable.  Even today, around 15% of Blackwells that get  

118:19

deployed have to be RMA'd. You have to take them out. 

118:21

Sometimes you just have to plug them  back in, but sometimes you have to take  

118:23

them out and ship them back to Nvidia or  their partners who do the RMAs and such. 

118:28

What do you make of Elon's argument that after an  initial phase, they actually don't fail that much? 

118:34

Sure, but now you've done this, tested them all,  deconstructed them, put them on a spaceship,  

118:39

launched them into space, and then put  them online again. That takes months. If  

118:44

your argument is that a GPU has a useful life of  five years, and this takes six additional months,  

118:57

that is 10% of your cluster's useful life. Because we're so capacity-constrained,  

119:04

that compute is theoretically most valuable  in the first six months you have it. 
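A minimal sketch of that accounting: the five-year life and six-month delay are the figures above, while the 2%-per-month decay in the value of compute is purely an assumed illustration of the point that early compute is worth more.

```python
# Fraction of a GPU's useful life lost to a six-month space-deployment delay.
useful_life_months = 60        # five-year useful life, as stated
delay_months = 6               # extra test/launch/re-commission time, as stated
print(f"{delay_months / useful_life_months:.0%} of the useful life is lost outright")

# If compute is worth more now than later, the effective loss is a bit worse.
monthly_decay = 0.98           # assumption: each month, a unit of compute is ~2% less valuable
value_on_time = sum(monthly_decay ** m for m in range(useful_life_months))
value_delayed = sum(monthly_decay ** m for m in range(delay_months, delay_months + useful_life_months))
print(f"with front-loaded value, the delay costs ~{1 - value_delayed / value_on_time:.0%}")
```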

119:08

We're more constrained now  than we will be in the future. 

119:11

That compute can contribute to a better  model in the future, or generate revenue  

119:15

today that you can use to raise more money. All these things make now the most important  

119:20

moment, but you've potentially delayed  your compute deployment by six months. 

119:25

What separates these cloud providers is… We  see some clouds taking six months to deploy  

119:28

GPUs right here on Earth. We see clouds that take  

119:31

a lot less than six months. So the question is, where does space get in there? 

119:36

I don't see how you could test them all on Earth,  deconstruct them, and ship them to space without  

119:41

it taking significantly longer than just leaving  them in the facility where you tested them. 

119:45

The question I wanted to ask is about  the topology of space communication. 

119:50

Right now, Starlink satellites talk to  each other at 100 gigabits per second. 

119:56

You could imagine that being much  higher with optical intersatellite  

120:00

laser links optimized for this. That actually ends up being quite  

120:04

close to InfiniBand bandwidth, which is 400 gigabits per second. 

120:09

But that's per GPU, not per rack. So multiply  that by 72. Also, that was Hopper. When you go  

120:16

to Blackwell and Rubin, that 2x's and 2x's again. But how much compute is happening per… During  

120:24

inference, are the different scale-ups  still working together, or is inference just  

120:27

happening as a batch within a single scale-up? A lot of models fit within one scale-up domain,  

120:33

but many times you split them  across multiple scale-up domains. 

120:42

As models become more and more sparse,  which is the general trend, you want to  

120:48

ping just a couple of experts per GPU. If leading models today have hundreds,  

120:53

if not a thousand, of experts, then you'd want to  run this across hundreds or thousands of chips,  

120:59

even as we advance into the future. So then you end up with the problem of  

121:05

needing to connect all these satellites  together for communications as well. 

121:09

That would be tough. If there's a world where  you could do inference for a batch on a single  

121:17

scale-up, then maybe it's more plausible. But if not, it's a different story. 

121:21

Networking these chips together  is a problem, and you can't just  

121:24

make the satellite infinitely large. There are a lot of physics challenges to  

121:29

making a satellite really big. That's why you need these  

121:34

interconnects between the satellites. Those  interconnects are more expensive. In a cluster,  

121:38

15-20% of the cost is networking. All of a sudden, you're using space lasers  

121:43

instead of simple lasers that are manufactured in  volumes of millions with pluggable transceivers. 

121:50

And those things are very unreliable as well,  more unreliable than the GPUs by the way. 

121:54

Across the life of a cluster, you have  to unplug and clean them all the time. 

121:57

You have to unplug and replug  them just for random reasons. 

121:59

These things are just not as reliable. So you've got that problem as well. 

122:03

You've got a more expensive, complicated  space laser to communicate instead of this  

122:08

pluggable optical transceiver that's  been produced in super high volume. 

122:11

So all in all, what does that  imply for space data centers? 

122:13

Space data centers' energy advantage effectively doesn't matter. 

122:19

They are limited by the same contended resource. We can only make two hundred gigawatts  

122:24

of chips a year by the end of the decade. What are we going to do to get that capacity? 

122:29

It doesn't matter if it's on land or in space. It doesn’t really matter,  

122:36

because you can build that power. Human capabilities and capacity could get  

122:41

to the point where we're adding a terawatt a year globally of various types of power. 

122:47

At some point, we do cross the chasm where space  data centers make sense, but it's not this decade. 

122:52

It is much further out, once energy  constraints actually become a big bottleneck  

122:59

and land permitting becomes a much bigger  bottleneck as it subsumes more of the economy. 

123:04

And crucially, once chips  are no longer the bottleneck. 

123:07

Right now, chips are the biggest bottleneck. You want them deployed and working on  

123:11

AI the moment they're manufactured. There are a lot of things people are  

123:15

doing to increase that speed faster and faster. They’re modularizing data centers, or even  

123:20

modularizing racks so that you put only the chip in at the data center, and everything 

123:26

else is already wired up and ready to go. There are things like this people are doing to  

123:31

decrease that time that you cannot do in space. At the end of the day, all that matters in a  

123:36

chip-constrained world is getting  these chips producing tokens ASAP. 

123:43

Maybe by 2035, the semiconductor industry,  ASML, Zeiss, and suppliers like Lam  

123:45

Research and Applied Materials and other fab equipment manufacturers will catch up once the pendulum 

123:53

swings and we are able to make enough chips. Then we will be optimizing every dial and it makes  

123:58

sense to optimize the 10-15% of cost that is energy. As we move to ASICs potentially, 

124:03

and if Nvidia's margins aren't 70%-plus, maybe that energy cost becomes 30% of the cluster cost. 

124:11

These are the things to optimize. But Elon doesn't win by doing 20% gains. He  

124:18

never wins that way. Elon wins when he swings for  the fences and does 10X gains. That's what SpaceX  

124:24

is about. That's what Tesla is about. All of his  success has been about that, not chasing the 20%. 

124:31

I think space data centers will eventually  be a 10X gain as Earth's resources get more  

124:37

and more contentious, but that's not this decade. Just to drive some intuition about how much land  

124:42

there is on Earth… Obviously, for the chips  themselves, especially if you move to a world  

124:46

where you have racks that have megawatts— That's the other thing. If manufacturing is  

124:55

the constraint, right now it's roughly one  watt per square millimeter for AI chips. 

125:01

One easy way to improve that is to pump  it to two watts per square millimeter. 

125:05

You may not get 2x the performance,  you may only get 20% more performance,  

125:09

and that requires much more exotic cooling. It requires more complicated cold plates  

125:13

and complex liquid cooling, or maybe  even things like immersion cooling. 

125:18

In space, higher watts per square millimeter is very difficult, 

125:20

whereas on Earth, these are solved problems. One of these things enables you to get a lot  

125:25

more tokens, maybe 20% more tokens per wafer  that's manufactured, and that's a humongous win. 

125:31

Square millimeter, you mean of die area? Yeah, of die area. 

125:36

It would be better for space because more watts per square millimeter means the chip runs hotter. 

125:42

I guess this is a question of computer chip engineering, but radiative cooling scales with temperature to the 

125:46

fourth power by the Stefan-Boltzmann law. If you can run a very hot  

125:49

chip, it allows a lot of— No, you can't run it hotter. 

125:51

You can only run it denser. The problem is that getting  

125:54

the heat out of that dense area means you have to  move away from standard air and liquid cooling to  

126:00

more exotic forms of liquid cooling, or even  immersion, to get to higher power densities. 

126:05

That's more difficult in  space than it is on Earth. 
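One way to see why density is the problem in space: radiative cooling follows the Stefan-Boltzmann law, and at chip-safe temperatures a surface sheds far less heat per unit area than a modern AI die produces, so the heat has to be spread into large radiators. In the sketch below, the ~85 °C die temperature and the 1 W/mm² density are illustrative; the constant is standard physics.

```python
# Radiated heat flux at a chip-safe temperature vs. the power density of an AI die.
SIGMA = 5.67e-8                      # Stefan-Boltzmann constant, W / (m^2 * K^4)

die_temp_k = 273 + 85                # ~85 C, near the hot end of safe silicon operation
radiated_w_per_m2 = SIGMA * die_temp_k ** 4        # ~930 W/m^2 from an ideal black surface

die_density_w_per_mm2 = 1.0                         # ~1 W/mm^2, as quoted earlier
die_density_w_per_m2 = die_density_w_per_mm2 * 1e6  # = 1,000,000 W/m^2

print(f"radiates ~{radiated_w_per_m2:,.0f} W/m^2 vs dissipates ~{die_density_w_per_m2:,.0f} W/m^2")
```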

126:08

Maybe it's worth explaining at this point what  exactly a scale-up is and what it looks like for  

126:13

Nvidia versus Trainium versus TPUs. Earlier I was mentioning how  

126:22

communication within a chip is super fast. Communication within chips that are in the  

126:26

same rack is fast, but not as fast. It's on the order of terabytes per second. 

126:30

Communication very far away is on the order of hundreds of gigabytes per second. 

126:36

As you get further distance, maybe  across the country, the order  

126:39

of magnitude is on the order of gigabytes per second. A scale-up domain is this tight domain 

126:44

where the chips are communicating  on the order of terabytes a second. 

126:50

For Nvidia, previously this meant  an H100 server had eight GPUs,  

126:55

and those eight GPUs could talk to  each other at terabytes a second. 

126:58

With Blackwell NVL72, they  implemented rack-scale scale-up. 

127:03

That meant all seventy-two GPUs in the rack could  connect to each other at terabytes a second. 

127:09

The speed doubled generation on generation, but  the most important innovation was going from eight  

127:13

to seventy-two in the domain. When we look at Google,  

127:16

their scale-up domain is completely different. It has always been on the order of thousands. 

127:20

With TPU v4, they had pods the  size of four thousand chips. 

127:23

With v8 or v7, they have pods in  the eight or nine thousand range. 

127:31

What's relevant here is that it's not the  same as Nvidia. It's not like for like.  

127:35

Google has a topology that's a torus. Every chip connects to six neighbors.  

127:40

Nvidia's 72 GPUs connect all-to-all.  They can send terabytes a second to  

127:46

any arbitrary other chip in that scale-up pod. Whereas with Google, you have to bounce through chips. 

127:52

If TPU 1 needs to talk to TPU 76, it has to bounce  through various chips, and there is always some  

127:59

blocking of resources when you do that because  that one TPU is only connected to six other TPUs. 
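A small sketch of what "bouncing through chips" means in a torus versus an all-to-all switch. The 8x8x8 dimensions are illustrative, not an actual TPU pod shape; the point is just that the TPU 1 to TPU 76 example above takes several hops, while a switched NVL72 rack is effectively one hop.

```python
# Minimum hop count between two chips in a wrap-around 3D torus (Google-style),
# versus effectively one switch hop in an all-to-all NVL72-style rack.
def torus_hops(a, b, dims=(8, 8, 8)):
    def coords(i):
        x = i % dims[0]
        y = (i // dims[0]) % dims[1]
        z = (i // (dims[0] * dims[1])) % dims[2]
        return x, y, z
    hops = 0
    for ca, cb, d in zip(coords(a), coords(b), dims):
        delta = abs(ca - cb)
        hops += min(delta, d - delta)    # wrap-around links shorten long paths
    return hops

print(torus_hops(1, 76))   # -> 5 hops through neighbors' links in this illustrative torus
```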

128:04

So there is a difference  in topology and bandwidth,  

128:07

and there are trade-offs and advantages to both. Google gets to have a massive scale-up domain,  

128:11

but they have the trade-off of bouncing  across chips to get from one to another. 

128:15

You can only talk to six direct neighbors. Amazon has mutated their scale-up domain. 

128:23

They're somewhere in between Nvidia and Google. They're trying to make larger scale-up domains. 

128:28

They try to do all-to-all to some extent with  switches, which is what Nvidia does, but they also  

128:33

use torus topologies like Google to some extent. As we advance forward to next generations,  

128:40

all three of them are moving more  towards a dragonfly topology. 

128:44

That means there are some fully connected elements  and some elements that are not fully connected. 

128:49

You can get the scale-up to be hundreds or  thousands of chips, but also have it not contend  

128:54

for resources when bouncing through chips. Related question: I heard somebody make the  

129:00

claim that the reason parameter scaling has been  slow—and only now are we getting bigger models  

129:08

from OpenAI and Anthropic—is that… The original  GPT-4 is over a trillion parameters, and only now  

129:18

are models starting to approach that again. I heard a theory that the reason is that  

129:24

Nvidia's scale-ups have just not  had that much memory capacity. 

129:37

Let's say you have a 5T model running at FP8, so that's five terabytes. 

129:43

And then you have the KV cache, let's say it's— Just call it the same size. 

129:47

Okay, let's say it's the same size for one batch. So you need ten terabytes to be able to run… 

129:54

A single forward pass, yeah. And then only with the GB200 and NVL72  

129:59

do you have an Nvidia scale-up that has twenty  terabytes, and before that they were much smaller. 
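The arithmetic behind that, as a minimal sketch; the 5T-parameter size and the "KV cache about the same size as the weights" simplification come from the example above, and the 0.64 TB figure for an eight-GPU H100 box is just 8 × 80 GB of HBM.

```python
# Memory needed to hold the example model plus its KV cache in one scale-up domain.
params = 5e12                 # 5T-parameter model, as in the example
bytes_per_param = 1           # FP8 -> one byte per parameter

weights_tb = params * bytes_per_param / 1e12   # 5 TB of weights
kv_cache_tb = weights_tb                       # simplification used above: roughly the same size
total_tb = weights_tb + kv_cache_tb            # ~10 TB for a single serving instance

h100_node_tb = 8 * 80 / 1000                   # 0.64 TB of HBM in an eight-GPU H100 server
print(f"needs ~{total_tb:.0f} TB; an 8x H100 node has {h100_node_tb:.2f} TB, "
      f"an NVL72-class rack ~20 TB")
```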

130:03

Whereas Google, on the other hand, has had  these huge TPU pods that are not all-to-all,  

130:09

but still have hundreds of terabytes  of capacity in a single scale-up. 

130:13

Does that explain why parameter  scaling has been slow? 

130:16

I think it's partially the capacity and  bandwidth, but also as you build a larger  

130:22

model, the ability to deploy it is slower. In terms of what the inference speed is for  

130:28

the end user, that's kind of irrelevant. What's  really relevant is RL. What we've seen with these  

130:33

models and allocation of compute at a lab… There  are a few main ways you can allocate compute. 

130:38

You can allocate it to inference, i.e. revenue. You can allocate it to development,  

130:42

i.e. making the next model. You can allocate it to research. 

130:46

In development specifically, you  split it between pre-training and RL. 

130:52

When you think about what is happening, the  compute efficiency gains you get from research  

130:58

are so large that you actually want most of your  compute to go to research, not to development. 

131:04

All these researchers are generating new  ideas, trying them out, testing them,  

131:08

and continuing to push the Pareto optimal  curve of scaling laws further and further. 

131:14

Empirically, what we’ve seen is that  model costs get ten times cheaper  

131:17

every year, or even more than that. For the same capability, it gets ten times cheaper, 

131:23

and to reach new frontiers it  costs the same amount or more. 

131:27

So you don't want to allocate too  many resources to pre-training and RL. 

131:33

You actually want to allocate most  of your resources to research. 

131:36

In the middle is this development period. If you pre-train a five-trillion-parameter model,  

131:45

how many rollouts do you have to do in RL? Rollouts for a five-trillion-parameter model  

131:51

cost roughly five times more than for a one-trillion-parameter model. 

131:54

If you wanted to do as many rollouts—maybe  the larger model is two times more sample  

131:57

efficient—now you need 2.5x as much  time of RL to get the model smarter. 

132:05

Or you could RL the smaller model for 2x the time. The big 

132:12

model, which is 2x as sample efficient and doing X number of rollouts, would still take 25% longer. 

132:16

But the smaller model, which is a trillion parameters, although it's 

132:19

less sample efficient, is doing twice as  many rollouts and is still done faster. 
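The tradeoff as stated, in a short sketch. The 5x rollout cost comes from the parameter ratio; the 2x sample efficiency of the bigger model is the assumption flagged above with "maybe."

```python
# Wall-clock comparison for the RL tradeoff described above.
rollout_cost_ratio = 5.0    # a 5T-parameter rollout costs ~5x a 1T-parameter rollout
sample_efficiency = 2.0     # assumed: the big model needs half as many rollouts

big_model_time = rollout_cost_ratio / sample_efficiency   # 2.5x a baseline RL run
small_model_time = 2.0                                     # RL the small model for 2x the time instead

print(f"big model: {big_model_time}x, small model: {small_model_time}x -> "
      f"the big model still takes {big_model_time / small_model_time - 1:.0%} longer")
```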

132:23

You get the model sooner, you've done more RL,  and then you can take that model to help you  

132:28

build the next models, help your engineers  train, and do all these research ideas. 

132:33

This feedback loop is actually weighted towards smaller models in every case, 

132:39

no matter what your hardware is. As you look to Google, they do  

132:42

deploy the largest production model of  any of the major labs with Gemini Pro. 

132:49

It's a larger model than GPT-5.4. It's a larger model than Opus. 

132:55

Google does this because they have a unipolar set  of compute. It's almost all TPU. Whereas Anthropic  

133:04

is dealing with H100s, H200s, Blackwell,  Trainiums, and TPUs of various generations. 

133:12

OpenAI is dealing with mostly Nvidia right now,  but going towards having AMD and Trainium as well. 

133:18

The fleets of compute like Google's can  just optimize around a larger model. 

133:23

They can leverage a thousand chips in a scale-up domain to make RL much faster 

133:30

so that this feedback loop can be fast. But at the end of the day, in isolation,  

133:36

you almost always want to go with a smaller  model that gets RL'd faster and gets deployed  

133:41

into research and development earlier. You can build the next thing and  

133:44

get more efficiency wins. You have this compounding  

133:47

effect of making a smaller model that can be  deployed into research and development earlier. 

133:53

I spend less compute on the training because I  was able to allocate more compute to the research. 

133:58

This compounding effect of being able  to do research faster and faster is  

134:01

potentially a faster takeoff. That's all these companies want:  

134:03

the fastest takeoff possible. Okay, a spicy question. You've explained  

134:10

that SemiAnalysis sells these spreadsheets. You're always pointing out how six  

134:14

months or a year ago, you warned  people about the memory crunch. 

134:17

Now you're telling people about the cleanroom  crunch, and in the future, the tool crunch. 

134:22

Why is Leopold the only person using your  spreadsheets to make outrageous money? What  

134:27

is everybody else doing? I think there are a lot  

134:30

of people making money in many ways. Leopold jokes that he's the only client  

134:38

of mine who tells me our numbers are too low. Everyone else tells me our numbers are too high,  

134:42

almost ad nauseam. Whether it's a hyperscaler saying,  

134:46

"Hey, that other hyperscaler, their numbers are  too high," and we're like, "Nah, that's it." 

134:50

They're like, "No, no, no, it's  impossible," blah, blah, blah. 

134:52

You finally have to convince them through all  these facts and data when we're working with  

134:55

hyperscalers or AI labs that in fact, no,  that number isn't too high, that's correct. 

135:00

Eventually they realize, though sometimes it takes them six months or a year. 

135:05

Other clients, on the trading  side, also use our data. 

135:12

Roughly 60% of my business is industry. So AI labs, data center companies,  

135:17

hyperscalers, semiconductor companies, the  whole supply chain across AI infrastructure. 

135:23

But 40% of our revenue is hedge funds. I'm not going to comment on who our customers are,  

135:28

but a lot of people use the data. It's just how do you interpret it,  

135:33

and then what do you view as beyond it? I will say Leopold is pretty much the only person  

135:39

who tells me my numbers are too low, always. Sometimes he's too high, sometimes I'm too low. 

135:44

But in general, I think  other people are doing that. 

135:50

You can look across the space at hedge funds and look at their 13Fs and see they own similar things, maybe not 

135:56

exactly what Leopold does, because it's always a  question of what is the most constrained thing. 

136:00

What's the thing that's going to  be most outside of expectations? 

136:03

That's what you're really trying to  exploit: inefficiencies in the market. 

136:06

In a sense, our data is making the market  more efficient by making the base data  

136:12

of what's happening more accurate. Many funds do trade on information  

136:22

that is out there… I don't  think Leopold's the only person. 

136:26

I think he has the most conviction  about the AGI takeoff, though. 

136:32

Right, but the bets are not  about what happens in 2035. 

136:37

The bets that you're making—that are at least  exemplified by public returns we can see for  

136:41

different funds including Leopold's—are  about what has happened in the last year. 

136:45

The last year stuff could be  predicted using your spreadsheets. 

136:50

It's about buying the next year's spreadsheets. They're not just spreadsheets. There are  

136:53

reports. There's API access to  the data. There's a lot of data. 

136:56

But do you see what I mean? It's not about some crazy singularity thing. 

137:00

It's about, do you buy the memory crunch? You only buy the memory crunch if you  

137:05

believe AI is going to take off in a huge way. The memory crunch, a lot of it was predicated  

137:12

on… At least for people in the Bay Area who  think about infrastructure, it's obvious. 

137:17

KV cache explodes as context lengths get longer,  so you need more memory. Then you do the math.  
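"Doing the math" on the KV cache looks roughly like the sketch below. The layer count, grouped-query head count, head dimension, and FP8 cache are assumed, illustrative hyperparameters, not any specific production model.

```python
# KV cache size per sequence, to show how it grows linearly with context length.
layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_value = 1   # FP8 cache

def kv_cache_gb(context_len, batch=1):
    # 2x for keys and values, accumulated across every layer for every cached token
    return 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_value / 1e9

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):7.1f} GB per sequence")
```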

137:22

You also have to have a lot of supply chain  understanding of what fabs are being built,  

137:25

what data centers are being built,  how many chips, and all these things. 

137:28

We track all these different datasets  very tightly, but at the end of the day,  

137:32

it takes someone to fully believe  that this is going to happen. 

137:38

A year ago, if you told someone memory  prices would quadruple and smartphone  

137:42

volumes are going to go down 40% over the  year or two after that, people were like,  

137:48

"You're crazy. That'd never happen." Except a few  people do believe that, and those people did trade  

137:52

memory. And people did. I don't think Leopold  was the only person buying memory companies. 

138:00

He, of course, sized and positioned and did  things in better ways than some, maybe most. 

138:06

I don't want to comment on whose returns  are what, but he certainly did well. 

138:12

Other people also did really well. Wow, you've made me diplomatic for  

138:18

the first time ever. No, no, you're fine.  I think this is hilarious. I'm being a  

138:22

diplomat, whereas usually I'm spicy. Okay, some rapid-fire questions to close out. 

138:31

If you're saying that with memory, logic, et cetera, N3 is mostly 

138:38

going to be AI accelerators, but then there's  N2, which is mostly Apple now… In the future,  

138:44

I guess AI would also want to go on N2. Can TSMC kick out Apple if Nvidia and  

138:53

Amazon and Google say, "Hey, we're willing  to pay a lot of money for N2 capacity?" 

138:59

I think the challenge with this is chip design  timelines take a long while, so that's more  

139:04

than a year out, and the designs that are  on two nanometer are more than a year out. 

139:08

What would really happen is Nvidia and  all these others will be like, "Hey,  

139:12

we're going to prepay for the capacity  and you're going to expand it for us." 

139:17

Maybe TSMC takes a little  bit of margin, but not a ton. 

139:21

They're not going to kick Apple out entirely. What they're going to do is when Apple orders X,  

139:25

they might say, "Hey, we project you only need  X minus one, and so that's what we're going to  

139:29

give you, X minus one." Then that flex capacity,  

139:31

Apple's kind of screwed on. Traditionally, Apple has always  

139:35

over-ordered by 10% and cut back  by 10% over the course of the year. 

139:38

Some years they hit the entire 10%. Volumes vary based on the season and macro. 

139:47

I don't think TSMC would kick out Apple. I think Apple will become a smaller and smaller  

139:52

percentage of TSMC's revenue, and therefore be  less relevant for TSMC to cater to their demands. 

139:57

TSMC could eventually start saying, "Hey, you've  got to pre-book your capacity for next year,  

140:01

for two years out, and you have to prepay for  the CapEx," because that's what Nvidia and  

140:05

Amazon and Google are doing. I wonder if it's worth  

140:08

going into specific numbers. I don't have any of them on hand. 

140:15

What percentage of N2 does Apple have its  hands on over the coming years versus AI? 

140:22

This year Apple has the majority of  N2 that's going to get fabricated. 

140:26

There's a little bit from AMD. They are trying to make some AI  

140:28

chips and CPU chips early. There's a little bit,  

140:30

but for the most part, it's Apple. As we go forward to the year after that, Apple  

140:36

still gets closer to half of it as other people  start ramping, but then it falls drastically,  

140:43

just like for N3, where they were half. When I say N2, that includes A16,  

140:49

which is a variant of N2. Over time, those nodes will be the majority. 

140:56

What's also interesting is traditionally,  Apple has been the first to a process node.  

141:00

2 nm is actually the first time they're  not. Well, that’s besides Huawei. Huawei,  

141:04

back in 2020 and before, was first alongside Apple, but they were both making smartphones. 

141:08

Now, with 2 nm, you've got AMD trying  to make a CPU and a GPU chiplet that  

141:14

they use advanced packaging to package  together, in the same timeframe as Apple. 

141:21

This is a big risk for AMD that causes  potential delays because it's a brand-new  

141:26

process technology. It's hard. But at the end of the day, this is a bet they want to make 

141:30

to scale faster than Nvidia and try and beat them. As we move forward, when we move to the A16 node,  

141:36

the first customer there is not even  Apple. It's AI. As we move forward,  

141:41

that will become more and more prevalent. Not only will Apple not be the first to a node,  

141:46

they will also not be the majority  of the volume to the new node. 

141:49

They'll then just be like any old customer. Because the scale of TSMC's CapEx keeps  

141:53

ballooning, but Apple's business  is not growing at the same pace,  

141:56

they become a less and less relevant customer. They also will just cut their orders because  

142:02

things in the supply chain are  kicking them out, whether it be  

142:04

packaging or materials or DRAM or NAND. These things are increasing in cost. 

142:10

They likely can't pass on all the cost to customers because the consumer is not that strong. 

142:14

You end up with this conundrum  where they are just not TSMC's  

142:18

best bud like they have been historically. Do you think if Huawei had access to 3 nm,  

142:23

they would have a better accelerator than Rubin? Potentially, yeah. Huawei was the  

142:29

first with a 7 nm AI chip as well. They were the first with a 5 nm mobile chip,  

142:33

but they were the first with a 7 nm AI chip. The Huawei Ascend was two months before the TPU  

142:41

and four months before Nvidia's A100, I think. That's just moving to a process node. 

142:49

That doesn't imply software or hardware  design or all these other things. 

142:55

But Huawei is arguably the only company in the  world that has all the legs. Huawei has cracked  

143:02

software engineers. Huawei has cracked  networking technologies. That's, in fact,  

143:06

their biggest business historically. They have  cracked AI talent. Furthermore, beyond Nvidia,  

143:13

they actually have better AI researchers. Beyond Nvidia, they have their own fabs. 

143:18

And beyond Nvidia, they have their own end  market of selling tokens and things like that. 

143:23

Huawei is able to get the top, top talent. Nvidia is as well, but not with as much  

143:30

concentration, and Huawei  has a bigger pool in China. 

143:33

It's very arguable that Huawei, if they  had TSMC, would be better than Nvidia. 

143:38

There are areas where China has advantages that Nvidia can't access as easily. 

143:46

Not just scale, but certain optical  technologies China's actually really good at. 

143:54

I think it's very reasonable that if in  2019 Huawei was not banned from using TSMC,  

144:02

Huawei would have already eclipsed  Apple as the biggest TSMC customer. 

144:06

Huawei has huge share in networking,  compute, CPUs, and all these things. 

144:10

They would have kept gaining share, and  they'd likely be TSMC's biggest customer. 

144:14

Wow. That's crazy. I've got a  random final question for you. 

144:18

The other part of the Elon interview was robots. If humanoids take off faster than people expect,  

144:24

if by 2030 there are millions of humanoids running around, each of which needs local compute, 

144:33

any thoughts on what that implies? What would be required for that? 

144:37

There are a lot of difficulties with the VLMs and VLAs that people are deploying on robots. 

144:46

But to some extent, you don't need to  have all the intelligence in the robot. 

144:49

It would be much more efficient to not do that. Because in the cloud, you can batch  

144:54

process and all these things. What you may want to do is have a  

144:58

lot of the planning and longer-horizon tasks  determined by a much more capable model in  

145:04

the cloud that runs at very high batch sizes. Then it pushes those directions to the robots,  

145:08

which interpolate between each subsequent action. Or it is given a command like, "Hey, 

145:13

pick up that cup," and then the model  on the robot can pick up the cup. 

145:17

As it's picking up, things like weight and  force may have to be determined by the model  

145:27

on the robot, but not everything needs to be. It can say, "Hey, that's a headphone," and the 

145:34

super model in the cloud can say, "I  know these headphones are Sony XM6s,"  

145:38

which is not a Dwarkesh ad spot, but... I'm like, why is this guy plugging this 

145:42

thing so hard. It's on the table. It's on his  neck when we're interviewing Satya together. 

145:48

Is he getting paid by Sony? Unfortunately not. But anyways,  

145:53

it might say, "Hey, the headband is soft, and  this is the weight of it," and all these things. 

145:58

Then the model on the robot  can be less intelligent,  

146:00

take these inputs, and do the actions. It may get told by the model in the cloud  

146:05

every second, or maybe ten times a second,  depending on the hertz of the action. 

146:09

But a lot of that can be offloaded to the cloud. Otherwise, if you do all of the processing on the  

146:15

device, I believe it would be more  expensive because you can't batch. 

146:17

Two, you couldn't have as much intelligence  as you do in the cloud because the  

146:20

models will just be bigger in the cloud. Three, we're in a semiconductor shortage world,  

146:25

and any robot you deploy needs leading-edge chips because power is a really tight constraint for robots. 

146:31

You need it to be low power and efficient,  and all of a sudden you're taking power  

146:36

and chips that would've been for AI data  centers, and you're putting them in robots. 

146:39

So now that 200 gigawatts gets lower if  you're deploying millions of humanoids. 

146:43

I think this is very interesting because  something people might not appreciate  

146:47

about the future is how centralized, in  a physical sense, intelligence will be. 

146:52

Right now, there are eight billion humans, and  their compute is in their heads, on their person. 

147:00

In the future, even with robots that are  out physically in the world—obviously,  

147:04

knowledge work will be done in a centralized  way from data centers with hundreds of thousands  

147:09

or maybe millions of instances—the future  you're suggesting is one where there's more  

147:17

centralized thinking and centralized computation  driving millions of robots out in the world. 

147:25

That's an interesting fact about the future  that I think people might not appreciate. 

147:28

I think Elon recognizes this, which is why  he's going to different places for his chips. 

147:35

He signed this massive deal with Samsung to make  his robot chips in Texas because I personally  

147:41

think he thinks Taiwan risk is huge. Because of that and the centralization  

147:46

of resources in Taiwan, having his robot  chips in Texas means having a separate  

147:51

supply chain that is not as constrained. No one's really making AI chips on Samsung  

147:56

besides Nvidia's new LPU that they launched. They’re launching it next week, but we're  

148:01

recording this the week before. This episode's coming out Friday. 

148:04

Oh, this episode's coming out before.  Sick. They're launching this new AI chip  

148:09

next week which is built on Samsung, but  that's a recent development from Nvidia. 

148:15

That's the only other AI demand there,  whereas on TSMC, everything is competing. 

148:19

He gets both geopolitical diversification  and supply chain diversity for his robots,  

148:25

and he's not competing as much with the near-infinite willingness to pay of the data center geniuses. 

148:34

Final question, on Taiwan. If we believe  that tools are the ultimate bottleneck,  

148:41

how much of Taiwan's place in the AI semiconductor  supply chain could we de-risk simply by having a  

148:50

plan to airlift every single process engineer  at TSMC out if they get blockaded or something? 

148:56

Or do you still need to ship out the EUV  tools, which would be multiple plane loads  

149:02

per single tool and would not be practical? If you ship out all the process engineers and  

149:06

assuming it's hot enough that the fabs get destroyed, no one has the capacity that sits in Taiwan now, 

149:11

which is a big risk. These tools actually use a lot of  

149:16

semiconductors which are manufactured in Taiwan. It's a snake eating its own tail meme because  

149:22

you can't make the tools without the chips from  Taiwan, which you can't use without the tools in  

149:26

Taiwan. There's obviously some diversification  there. They don't use super advanced chips in  

149:32

lithography tools, but at the end of the  day, there is some dragon eating its tail. 

149:36

Just shipping out all the engineers and  blowing up the fabs means China has a  

149:40

stronger semiconductor supply chain than the  rest of the world in terms of verticalization,  

149:44

now that you've removed Taiwan. You've got all the know-how,  

149:49

but you've got to replicate it in,  let's say, Arizona or wherever for TSMC. 

149:56

It's going to take a long time to build all the  capacity that TSMC has built over the years. 

150:01

And so you've drastically  slowed US and global GDP. 

150:06

Not just growth, you've shrunk the GDP  massively, and you've got a lot bigger problems. 

150:12

Your incremental ability to add  compute goes to almost zero. 

150:16

Instead of hundreds of gigawatts  a year by the end of the decade,  

150:18

let's say something happens to Taiwan, now you're  at maybe 10 gigawatts across Intel and Samsung,  

150:24

or 20 gigawatts. It's nothing. Now all of a sudden  you've really caused some crazy dynamics in AI. 

150:31

Of course, you have all the existing capacity,  but that existing capacity pales in comparison  

150:35

to the capacity that's being expanded. Okay. Dylan, that was excellent. Thank  

150:39

you so much for coming on the podcast. Thank you for having me. And see you tonight.

Interactive Summary

Dylan Patel, CEO of SemiAnalysis, joins Dwarkesh Patel to discuss the semiconductor landscape and the physical constraints on scaling AI. Patel explains that the massive CapEx from Big Tech is being funneled into long-term infrastructure like turbines, data center construction, and power agreements. He identifies chip manufacturing—specifically ASML's EUV lithography tools—as the ultimate bottleneck for the decade, while also addressing the current memory crunch that is driving up costs for consumer electronics. The conversation covers the diverging strategies of OpenAI and Anthropic, the potential for behind-the-meter power generation, and why terrestrial data centers remain more economically viable than space-based alternatives due to maintenance and deployment speed.
