Dylan Patel — The single biggest bottleneck to scaling AI compute

0:00

All right, this is the episode where  my roommate teaches me semiconductors. 

0:04

It's also the send off for this current set. It is. After you use it, I'm like,  

0:09

"I can't use this again. I gotta get out of here." 

0:11

No sloppy seconds for Dwarkesh. Dylan is the CEO of SemiAnalysis. 

0:18

Dylan, here’s the burning question I have for you. If you add up the big four—Amazon, Meta, Google,  

0:23

Microsoft—their combined forecasted CapEx this  year that you published recently is $600 billion. 

0:31

Given yearly prices of renting that compute,  that would be close to 50 gigawatts. 

0:38

Obviously, we're not putting  on 50 gigawatts this year,  

0:40

so presumably that's paying for compute that is  going to be coming online over the coming years. 

0:46

How should we think about the timeline around  when that CapEx comes online? Similar question  

0:51

for the labs. OpenAI just announced they  raised $110 billion, and Anthropic just  

0:57

announced they raised $30 billion. If you look at the compute they  

1:01

have coming online this year—you should  tell me how much it is, but is it on the  

1:08

order of another four gigawatts total? The cost to rent the compute that OpenAI  

1:10

and Anthropic will have this year to sustain their  compute spend is $10 to $13 billion a gigawatt. 

1:18

Those individual raises alone are enough  to cover their compute spend for the year. 

1:23

And this is not even including the revenue  that they're going to earn this year. 

1:26

So help me understand: first,  what is the timescale at which  

1:30

the Big Tech CapEx actually comes online? And second, what are the labs raising all  

1:34

this money for if the yearly price of a  one-gigawatt data center is $13 billion? 
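
A quick back-of-envelope version of the arithmetic in the question (a sketch using only the round numbers quoted; the $12 billion-per-gigawatt figure is a midpoint assumption within the $10-13 billion range):

```python
# Back-of-envelope check of the figures in the question above.
# All inputs are the round numbers quoted in the conversation.

big_four_capex = 600e9             # combined forecasted CapEx: Amazon, Meta, Google, Microsoft
rent_per_gw_per_year = 12e9        # assumed midpoint of the $10-13B/GW yearly rental cost

print(big_four_capex / rent_per_gw_per_year)    # ~50 gigawatt-years of rented compute

openai_raise, anthropic_raise = 110e9, 30e9     # announced raises
print(openai_raise / rent_per_gw_per_year)      # ~9 GW-years of rent covered
print(anthropic_raise / rent_per_gw_per_year)   # ~2.5 GW-years of rent covered
```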

1:41

So when you talk about the CapEx of these  hyperscalers being on the order of $600 billion,  

1:46

and you look across the rest of the supply chain,  it gets you to the order of a trillion dollars. 

1:51

A portion of this is immediately for compute  going online this year: the chips and the  

2:00

other parts of CapEx that get paid this year. But there's a lot of setup CapEx as well. 

2:05

When we're talking about 20 gigawatts of  incremental added capacity this year in America,  

2:11

a portion of this is not spent this year. A portion of that CapEx was actually  

2:16

spent the prior year. When you look at Google  

2:19

having $180 billion of CapEx, a big chunk of that is spent on turbine deposits for '28 and '29. 

2:25

A chunk of that is spent on data  center construction for '27. 

2:28

A chunk of that is spent on power purchasing  agreements, down payments, and all these other  

2:33

things they're doing further out into the future  so they can set up this super fast scaling. 

2:40

This applies to all the hyperscalers  and other people in the supply chain. 

2:45

So with roughly 20 gigawatts deployed this year,  a big chunk is hyperscalers, and a chunk is not. 

2:51

For all of these companies, their biggest  customers are Anthropic and OpenAI. 

2:55

Anthropic and OpenAI are at roughly  two to two-and-a-half gigawatts right  

3:02

now, and they're trying to scale much larger. If you look at what Anthropic has done over the  

3:08

last few months, with $4 billion  or $6 billion in revenue added,  

3:11

we can just draw a straight line and say they'll  add another $6 billion of revenue a month. 

3:16

People would argue that’s bearish,  and that they should go faster. 

3:18

What that implies is they're going to add $60  billion of revenue across the next ten months. 

3:26

At the current gross margins Anthropic had,  as last reported by media, that would imply  

3:33

they have roughly $40 billion of compute spend for  that inference, for that $60 billion of revenue. 

3:39

That $40 billion of compute, at roughly  $10 billion a gigawatt in rental costs,  

3:44

means they need to add four gigawatts of  inference capacity just to grow revenue. 

3:49

That’s assuming their research and  development training fleet stays flat. 

3:55

In a sense, Anthropic needs to get to well  above five gigawatts by the end of this year. 

3:59

It's going to be really tough for  them to get there, but it's possible. 
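
Written out, that revenue-to-gigawatts arithmetic looks roughly like this (a sketch using the round numbers above; the straight-line extrapolation and implied margin are the ones quoted, not precise figures):

```python
# Anthropic back-of-envelope: added revenue -> implied added inference capacity.

added_revenue = 6e9 * 10                  # ~$6B of new revenue a month, over ten months
compute_share_of_revenue = 40 / 60        # ~$40B of compute behind ~$60B of revenue,
                                          # per the last-reported gross margins
compute_spend = added_revenue * compute_share_of_revenue
rent_per_gw_per_year = 10e9               # ~$10B/yr to rent a gigawatt for inference

print(compute_spend / rent_per_gw_per_year)   # ~4 GW of added inference capacity,
                                              # on top of a flat training/R&D fleet
```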

4:01

Can I ask a question about that? If Anthropic was not on track to have  

4:06

five gigawatts by the end of this year, but it  needs that to serve both the revenue that's gone  

4:12

crazier than expected—and maybe it's going to be  even more than that—plus the research and training  

4:16

to make sure its models are good enough for next  year: Where is that capacity going to come from? 

4:21

Dario, when he was on your  podcast, was very conservative. 

4:24

He said, "I'm not going to go crazy on  compute because if my revenue inflects  

4:28

at a different rate, at a different  point… I don't want to go bankrupt. 

4:31

I want to make sure that we're being  responsible with this scaling." 

4:35

But in reality, he's screwed the pooch  compared to OpenAI, whose approach was,  

4:40

"Let's just sign these crazy fucking deals." OpenAI has got way more access to compute  

4:46

than Anthropic by the end of the year. What does Anthropic have to do to get the compute? 

4:50

They have to go to lower-quality providers  that they would not have gone to before. 

4:56

Anthropic historically had the best  quality providers, like Google and  

5:00

Amazon, the biggest companies in the world. Now, with Microsoft, they're expanding across the supply 

5:07

chain, and they're going to other newer players. OpenAI has been a bit more  

5:12

aggressive on going to many players. Yes, they have tons of capacity from Microsoft,  

5:16

Google, and Amazon, but they also  have tons with CoreWeave and Oracle. 

5:20

They've gone to random companies, or companies  one would think are random, like SoftBank Energy,  

5:25

who has never built a data center in their life  but is building data centers now for OpenAI. 

5:29

They've gone to many others,  like NScale, to get capacity. 

5:35

There's this conundrum for Anthropic because  they were so conservative on compute,  

5:42

because they didn't want to go crazy. In some sense, a lot of the financial  

5:46

freakouts in the second half of last year  were because, "OpenAI signed all these  

5:50

deals but they didn't have the money to pay  for them…" Okay, Oracle's stock is going to  

5:55

tank, CoreWeave's stock is going to tank. All these companies' stocks tanked,  

5:58

and credit markets went crazy because people  thought the end buyer couldn't pay for this. 

6:02

Now it's like, "Oh wait,  they raised a ton of money. 

6:04

Okay, fine, they can pay for it." Anthropic was a lot more conservative. 

6:07

They were like, "We'll sign  contracts, but we'll be principled. 

6:11

We'll purposely undershoot what we think we  can possibly do and be conservative because  

6:16

we don't want to potentially go bankrupt." The thing I want to understand is, what does  

6:20

it mean to have to acquire compute in a pinch? Is it that you have to go with neoclouds? Do they  

6:26

have worse compute? In what way is it worse? Did you have to pay gross margins to a cloud  

6:31

provider that you wouldn't have otherwise had to  pay because they're coming in at the last minute? 

6:35

Who built the spare capacity such  that it's available for Anthropic  

6:39

and OpenAI to get last minute? What is the concrete advantage  

6:42

that OpenAI has gotten if they end up  at similar compute numbers by 2027? 

6:48

Are they just going to end this  year with different gigawatts? 

6:50

If so, how many gigawatts are Anthropic and  OpenAI going to have by the end of this year? 

6:56

To acquire excess compute, yes,  there is capacity at hyperscalers. 

7:01

Not all contracts for compute  are long-term, five-year deals. 

7:04

There's compute from 2023 or 2024, or H100s  from 2025, that were signed at shorter terms. 

7:11

The vast majority of OpenAI's compute is signed  on five-year deals, but there were many other  

7:16

customers that had one-year, two-year,  three-year, or six-month deals, on demand. 

7:20

As these contracts roll off,  who is the participant in the  

7:24

market most willing to pay up? In this sense, we've seen H100 prices 

7:30

inflect a lot and go up. People are willing to  

7:34

sign long-term deals for above $2/hour, even. I've seen deals where certain AI labs—I'm being a 

7:42

little bit vague here for a reason—have signed at  as high as $2.40 for two to three years for H100s. 

7:49

If you think about the margin, it costs $1.40/hour to run a Hopper, amortized across five years. 

7:57

Now, two years in, you're signing deals for two  to three years at $2.40? Those margins are way  

8:03

higher. Now you can crowd out all of these other  suppliers, whether Amazon had these, or CoreWeave,  

8:09

or Together AI, or Nebius, or whoever it is. These neoclouds are the firms that had a  

8:19

higher percentage of Hopper in general  because they were more aggressive on it. 

8:23

They also tended to sign shorter-term  deals, not CoreWeave but the others. 

8:30

So if I want Hopper, there  is some capacity out there. 

8:33

Also, while most of the capacity at an Oracle  or a CoreWeave is signed for a long-term deal  

8:39

in terms of Blackwell, anything that's  going online this quarter is already sold. 

8:44

In some cases, they're not even hitting all the  numbers they promised they would sell because  

8:48

there are some data center delays, not just those  two, but Nebius, Microsoft, Amazon, and Google. 

8:53

But there are a lot of neoclouds, as well as some  of the hyperscalers, who have capacity they're  

8:57

building that they haven't sold yet, or capacity  they were going to allocate to some internal use  

9:02

that is not necessarily super AGI-focused,  that they may now turn around and sell. 

9:06

Or in the case of Anthropic, they don't  have to have all the compute directly. 

9:10

Amazon can have the compute and serve Bedrock,  or Google can have the compute and serve Vertex,  

9:15

or Microsoft can have the compute  and serve Foundry, and then do a  

9:18

revenue share with Anthropic, or vice versa. Basically, you're saying Anthropic is having to  

9:22

pay either this 50% markup in the form of the revenue share, or in the form of last-minute 

9:28

spot compute that they wouldn't have otherwise  had to pay had they bought the compute early. 

9:32

Right, there's a trade-off  there. But at the same time,  

9:38

for a solid four months, everyone was saying to  OpenAI, "We're not going to sign deals with you." 

9:43

That sounds crazy, but it was  because, "you don’t have the money." 

9:45

Now everyone's saying, "OpenAI,  we believed you the whole time. 

9:48

We can sign any deal because  you've raised all this money." 

9:51

Anthropic is constrained in that sense. There are not that many incremental buyers of  

9:58

compute yet, because Anthropic hit the capability  tier first where their revenue is mooning. 

10:03

That's interesting. Otherwise you  might think having the best model is an  

10:08

rapidly depreciating asset, because three months later you don't have the best model. 

10:12

But the reason it's important is that you  can sign these deals, lock in the compute  

10:16

in advance, and get better prices. Maybe this is an obvious point. 

10:22

But at least until recently, people had made this  huge point about the depreciation cycle of a GPU. 

10:30

The bears, the Michael Burrys or  whoever, have said, "Look, people  

10:33

are saying four or five years for these GPUs. But maybe, because the technology is improving 

10:41

so fast, it in fact makes sense to have two-year depreciation cycles for these GPUs," 

10:46

which increases the reported amortized CapEx  in a given year and makes it financially  

10:53

less lucrative to build all these clouds. But in fact you’re pointing out that maybe the  

10:58

depreciation cycle is even longer than five years. If we're using Hoppers—especially if AI really  

11:03

takes off and in 2030 we’re saying, "We have  to get the seven-nanometer fabs up, we have  

11:08

to go back and turn on the A100s again"—then the  depreciation cycle is actually incredibly long. 

11:18

I feel like that's an interesting financial  implication of what you're saying. 

11:21

There's a few strings to pull on there. One is, what happens to depreciation of GPUs? 

11:30

I guess I didn't answer your prior question,  which is that I think Anthropic will be able to  

11:34

get to five gigawatts-ish, maybe a little  bit more by the end of the year through  

11:38

themselves as well as their product being  served through Bedrock, Vertex, or Foundry. 

11:45

I think they'll be able to get to five or six  gigawatts, which is way above their initial plans. 

11:53

OpenAI will be roughly the same, actually  a little bit higher based on our numbers. 

11:59

But anyway, the depreciation cycle of a GPU. Michael Burry was saying it's three years  

12:04

or less. That’s sort of his argument.  There are two lenses to look at this. 

12:09

Mechanically, there's a TCO model, total cost of  ownership of a GPU, where we project pricing out  

12:17

for GPUs and build up the total cost of a cluster. There are a number of costs: your data center  

12:23

cost, your networking cost, your smart hands and  people in the data center swapping stuff out. 

12:29

There's your spare parts, your  actual chip cost, your server cost. 

12:32

All these various costs get lumped together. There's some depreciation cycles on it,  

12:37

certain credit costs on it. You build up to, "Hey, an H100 costs  

12:42

$1.40/hour to deploy at volume across five  years if your depreciation is five years." 

12:48

If you sign a deal at $2/hour for those five  years, your gross margin is roughly 35%. 

12:53

It's a little bit above that. If you sign it for $1.90, it's 35% roughly. 

12:58

Then you assume at that fifth year,  the GPU falls off a bus and is dead. 
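
A stripped-down sketch of that TCO framing (not the actual SemiAnalysis model; the $1.40/hour cost basis and the contract prices are the round numbers quoted, and the exact margin depends on the full cost build-up):

```python
# Toy GPU rental-margin calculation in the spirit of the TCO model described.
# $1.40/hr is the quoted all-in H100 cost over a five-year depreciation window.

def gross_margin(all_in_cost_per_hr: float, rental_price_per_hr: float) -> float:
    """Gross margin on renting out a GPU at a given hourly price."""
    return (rental_price_per_hr - all_in_cost_per_hr) / rental_price_per_hr

h100_cost_per_hr = 1.40
for price in (1.90, 2.00, 2.40):
    print(f"${price:.2f}/hr -> {gross_margin(h100_cost_per_hr, price):.0%} gross margin")

# Deals signed late, at $2.40/hr two years into the chip's life, carry noticeably
# fatter margins than the original roughly-$2/hr contracts.
```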

13:03

In some cases, the argument people are making is  if you didn't sign a long-term deal, because every  

13:09

two years NVIDIA is tripling or quadrupling the performance while only increasing the price 

13:15

by 50% or 2x… then the price of an H100 falls. Sure, maybe the value in the market was $2 at 35% 

13:20

gross margins in 2024, but in 2026, when Blackwell  is in super high volume and deploying millions  

13:28

a year, you’re actually now worth $1/hour. And when Rubin in '27 is in super high volume—even  

13:33

though it starts shipping this year, it’s super  high volume next year—doing millions of chips a  

13:38

year deployed into clouds, you've got another  3X in performance, another 50% or 2X in price,  

13:44

then the Hopper is only worth $0.70/hour. So the price of a GPU would continue to  

13:49

fall. That's one lens. The other lens is,  what is the utility you get out of the chip? 

13:54

If you could build infinite Rubin  or infinite of the newest chip,  

13:59

then yes, that's exactly what would happen. The price of a Hopper would fall at a spot  

14:04

or short-term contract rate as the new chips come out and performance per dollar goes up. 

14:10

But because you are so limited on semiconductors  and deployment timelines, what actually prices  

14:18

these chips is not the comparative thing I  can buy today, but rather what is the value  

14:24

I can derive out of this chip today. In that sense, let's take GPT-5.4. 

14:31

GPT-5.4 is both way cheaper to run than  GPT-4 and has fewer active parameters. 

14:38

It's much smaller, in the sense of active parameters, because it's a 

14:42

sparser MoE versus GPT-4 being a coarser MoE. There's also been so many other advancements  

14:47

in training, RL, model architecture, and data quality that have made GPT-5.4 way better than 

14:54

GPT-4. And it's cheaper to serve. When you look  at an H100, it can serve more tokens per GPU of  

15:02

5.4 than if you had run GPT-4 on it. So it's producing more tokens of a 

15:07

model that is of higher quality. What is the maximum TAM for GPT-4 tokens? 

15:16

Maybe it was a few billion dollars, maybe  it was tens of billions of dollars. Adoption  

15:19

takes time. For GPT-5.4, that number  is probably north of a hundred billion. 

15:23

But there's an adoption lag, there's  competition, and there's the constant  

15:27

improvements that everyone else is having. If improvements stopped here, the value of  

15:32

an H100 is now predicated on the value  that GPT-5.4 can get out of it instead  

15:36

of the value that GPT-4 can get out of it. These labs are in a competitive environment,  

15:42

so their margins can't go to infinity. You sort of have this dynamic that is  

15:47

quite interesting in that an H100 is worth  more today than it was three years ago. 

15:51

That's crazy. It's also interesting from  the perspective of just taking that forward. 

15:56

If we had actual AGI models developed, if  we had a genuine human on a server… These  

16:06

are such hand-wavy numbers about how many flops the brain can do. 

16:08

But on a flop basis, an H100 is estimated to  do 1e15, which is how much some people estimate  

16:15

the human brain does in flops. Obviously, in terms of memory,  

16:19

the human brain has way more. An H100 is 80 gigabytes,  

16:22

and the brain might have petabytes. Oh, yeah, you've got petabytes? Name a petabyte  

16:28

of ones and zeros, bro. Name me a string. Well, this is actually the point. 

16:33

No, we’ve just got the best  sparse attention techniques ever. 

16:36

Genuinely though. In the amount of information  that is compressed, it might be petabytes. 

16:42

The brain is an extremely sparse MoE. But anyways, imagine a human knowledge  

16:48

worker can produce six figures a year of value. If an H100 can produce something close to that,  

16:54

if we had actual humans on a server, the  value of an H100 is such that it can repay  

16:58

itself in the course of a couple of months. So when I interviewed Dario, the point I was  

18:02

trying to make is not that I think the singularity  is two years away and therefore Dario desperately  

18:08

needs to buy more compute, although the revenue is  certainly there that he needs to buy more compute. 

18:12

The point I was trying to make is that given what  Dario seems to be saying—given his statements that  

18:17

we're two years away from a data center of  geniuses, and certainly not more than five  

18:21

years away, and a data center of geniuses should  be earning trillions upon trillions of dollars  

18:25

of revenue—it just does not make sense why  he keeps making these statements about being  

18:30

more conservative on compute or, to your point,  being less aggressive than OpenAI on compute. 

18:35

I guess that point got lost because then people  were roasting me, saying, "Oh, this podcaster  

18:39

is trying to convince this multi-hundred  billion dollar company CEO to YOLO it, bro." 

18:44

I was just trying to say that internally,  his statements are inconsistent. 

18:50

Anyway, it's good to iron it out. I think going back to the earlier  

18:55

view that if the models are so powerful, the  value of a GPU goes up over time, right now  

19:06

only OpenAI and Anthropic have that viewpoint. But as we go further out, everyone is going 

19:11

to be able to see that value skyrocket per GPU. So in that sense,  

19:19

you should commit now to compute. Interestingly, in Anthropic fashion,  

19:28

there's a bit of a meme that they have  commitment issues and are sort of polyamorous. 

19:35

Not Dario, but this is a bit of a meme. Explains everything. By the way, there's  

19:42

this interesting economic effect called  Alchian-Allen, which is the idea that if  

19:48

you increase the fixed cost of different goods,  one of which is higher quality and one which is  

19:54

lower quality, that will make people choose  the higher quality good, on the margin. 

19:59

To give a specific example, suppose the  better-tasting apple costs two dollars and  

20:04

the shittier apple costs one dollar. Now suppose you put an import tariff on them. 

20:10

Now it's $3 versus $2 for a great  apple versus a medium apple. 

20:15

Is that because they both increased by a  dollar, or should it be a 50% increase? 

20:18

No, because they both increased by $1. The whole effect is that there's 

20:22

a fixed cost that is applied to both. Then the price difference between them,  

20:28

the ratio, changes. Previously, the more  

20:31

expensive one was 2X more expensive. Now it's just 1.5X more expensive. 
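
The apple example as bare arithmetic (a minimal sketch of the ratio effect being described):

```python
# Alchian-Allen in one line of arithmetic: a fixed cost added to both goods
# leaves the absolute gap unchanged but shrinks the price *ratio*.

good_apple, mediocre_apple = 2.00, 1.00   # prices before the tariff
fixed_cost = 1.00                         # per-unit tariff applied to both

print(good_apple / mediocre_apple)                                # 2.0x before
print((good_apple + fixed_cost) / (mediocre_apple + fixed_cost))  # 1.5x after
```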

20:34

So I wonder if, applied to AI, that would mean that if GPUs are going to get more expensive, 

20:39

there will be a fixed cost  increase in the price of compute. 

20:43

As a result, that will push people to be willing  to pay higher margins for slightly better models. 

20:51

Because the calculus is, I'm going to be  paying all this money for the compute anyway. 

20:55

I might as well just pay slightly more to  make sure it's the very best model rather  

21:00

than a model that's slightly worse. So the Hopper went from $2 to $3. 

21:03

If a Hopper can make a million tokens of Opus  and it can make two million tokens of Sonnet,  

21:11

the price differential between Opus and  Sonnet has decreased because the price of  

21:15

the GPU has increased by a dollar from $2 to $3.  Interesting. I think that makes a ton of sense. 

21:22

We just see all of the volumes  are on the best models today,  

21:25

all the revenue is on the best models today. In a compute-limited world, two things happen. 

21:34

One, companies that don't have commitment issues  and have these five-year contracts for compute  

21:41

have locked in a humongous margin advantage. They've locked in compute for five years  

21:47

at the price it transacted at  two, three, or five years ago. 

21:51

Whereas if you're three years into that  five-year contract and someone else's  

21:55

two-year or three-year contract rolled off, and  now they're trying to buy that at modern pricing,  

22:00

when it's priced to the value of models,  the price is going to be up a lot more. 

22:05

So the person who committed early  has better margins in general. 

22:11

The percentage of the market that is in long-term  contracts is much larger than the percentage of  

22:15

the market in short-term contracts that can be  this flex capacity you add at the last second. 

22:21

At the same time, where does the margin go? Because models get more valuable,  

22:28

how much can the cloud players flex their pricing? 

22:33

If you look at CoreWeave, their average  term duration is over three years right now. 

22:39

For ninety-eight percent plus of  their compute, it's over three years. 

22:43

They end up with this conundrum  where they can't actually flex price. 

22:46

But every year they're adding incrementally  way more capacity than they had previously. 

22:52

This year alone, Meta's adding as much capacity as  they had in their entire fleet of compute and data  

22:58

centers for all purposes in 2022: serving WhatsApp, Instagram, and Facebook, and doing AI. 

23:03

They're adding that alone this year. In the same sense, you talk about Meta doing that,  

23:07

CoreWeave, Google, and Amazon, all these companies  are adding insane amounts of compute year on year. 

23:13

That new compute gets transacted at the new price. In a sense, yes, you've locked in, as long as  

23:19

we're in a takeoff. "Oh, OpenAI went from six  hundred megawatts to two gigawatts last year,  

23:24

and from two gigawatts to six plus this  year, and six to twelve next year." 

23:29

The incremental added compute is where all the  cost is, not the prior long-term contracts. 

23:34

Then it's the infra providers who hold the cards when it comes to charging margin. 

23:38

Now the cloud players, the neoclouds, or  the hyperscalers can charge the margin. 

23:43

They can to some extent, but then as you go  upstream to who has access to all the memory and  

23:48

logic capacity, it's Nvidia for the most part. They've signed a lot of long-term contracts. 

23:53

They've got ninety billion dollars of long-term  contracts today, and they're negotiating  

23:56

three-year deals today with the memory vendors. You've got Amazon and Google: Google through Broadcom, 

24:04

Amazon directly, and AMD. These companies hold all the  

24:07

cards because they've secured the capacity. TSMC is not raising prices, but memory vendors  

24:13

are, to some extent, raising prices a lot. They're going to double or triple prices again, but 

24:18

then they're also signing these long-term deals. Who is able to accrue all the margin dollars is  

24:23

potentially the cloud, potentially the  chip vendors, and the memory vendors,  

24:28

until TSMC or ASML break out and say,  "No, we're going to charge a lot more." 

24:33

But at the same time, do the model  vendors get to charge crazy margins? 

24:38

At least this year, we're going to see  margins for the model vendors go up a lot. 

24:41

Because they're so capacity constrained,  they have to destroy demand. 

24:46

There's no way Anthropic can continue at  the current pace without destroying demand. 

24:52

Let's get into logic and memory. How specifically has Nvidia been  

24:58

able to lock up so much of both? I think according to your numbers,  

25:02

by '27, Nvidia is going to have 70%+ of N3 wafer capacity, or around that area. 

25:12

I forget what the numbers were for memory  at SK Hynix and Samsung and so forth. 

25:19

Think about how the neocloud business  works and how Nvidia works with that,  

25:22

or how the RL environment business  works and how Anthropic works with that. 

25:26

In both those cases, Nvidia is purposely trying to  fracture the complementary industry to make sure  

25:33

that they have as much leverage as possible. They're giving allocation to random neoclouds  

25:37

to make sure that there's not one  person that has all the compute. 

25:39

Similarly, Anthropic or OpenAI, when they're  working with the data providers, they say, "No,  

25:44

we're going to just seed a huge industry  of these things so that we're not locked  

25:48

into any one supplier for data environments." And I wonder why on the 3 nm process—that's  

25:56

going to be Trainium 3, that's going to be  TPU v7, other accelerators potentially—why  

26:03

is TSMC just giving it all up to Nvidia  rather than trying to fracture the market? 

26:09

There are a couple points here. On 3 nm, if we go back to last year,  

26:15

the vast majority of 3 nm was Apple. Apple is being moved to 2 nm. 

26:20

Memory prices are going up, so  Apple's volumes may go down. 

26:24

As memory prices go up, either  they cut margin or they move on. 

26:29

There's some time lag because they have  long-term contracts, but Apple likely  

26:33

reduces demand or moves to 2 nm faster, where  2 nm is only capable of mobile chips today. 

26:39

In the future, AI chips will move there.  So Apple has that. Apple is also talking to  

26:44

third-party vendors because they're  getting squeezed out of TSMC a little bit. 

26:48

TSMC's margins on high-performance computing—HPC,  AI chips, et cetera—are higher than they are  

26:54

for mobile, because they have a bigger  advantage in HPC than they do in mobile. 

27:00

When you look at TSMC’s running calculus  here, they're actually providing really good  

27:06

allocations to companies that are doing CPUs. When you think about Amazon having Trainium and  

27:14

Graviton, both of those are on 3 nm, Graviton  being their CPU, Trainium being their AI chip. 

27:20

TSMC is much more excited to give  allocation to Graviton than they  

27:23

are to Trainium because they view the CPU  business as more stable, long-term growth. 

27:30

As a company that is conservative and doesn't  want to ride cycles of growth too hard,  

27:35

you actually want to allocate to the market that  is more stable with a lower growth rate first  

27:42

before you allocate all the incremental capacity  to the fast growth rate market. That is the case  

27:48

generally. Same for AMD. The allocations they  get on their CPUs, TSMC is much more excited  

27:57

about those than they are for GPUs. Likewise  for Amazon. Nvidia is a bit unique because yes,  

28:03

they have CPUs, they make switches, they make  networking, NVLink, InfiniBand, Ethernet, NICs. 

28:11

By and large, most of these things will  be on 3 nm by the end of this year with  

28:14

the Rubin launch and all the chips in that  family, the GPU being the most important one. 

28:20

Yet Nvidia is getting the majority of supply. Part of this is because you look at the market  

28:27

and TSMC and others forecast market demand in  many ways, but it's also the market signal. 

28:36

The market signaled, "Hey, we need this much  capacity next year. We need this much. We'll  

28:42

sign non-cancelable, non-returnable. We  may even pay deposits." Nvidia just did  

28:46

it way earlier than Google or Amazon. In some cases, Google and Amazon had  

28:53

stumbling blocks. One of the chips  

28:56

got delayed slightly by a couple quarters. Trainium and all these sorts of things happened. 

29:01

In that case, there was a huge sort  of, "Well, these guys are delaying,  

29:05

but Nvidia is wanting more, more, more, more. And we are checking with the rest of the supply  

29:10

chain, is there enough capacity?" They're going to all the PCB  

29:13

vendors and saying, "Is there enough PCB?" Victory Giant is one of the largest suppliers  

29:18

of PCBs to Nvidia, and they're a Chinese company. All the PCBs come from China, or many of them. 

29:25

They're like, "Do you have enough PCB capacity?  Great. Hey memory vendors, who has all the memory  

29:28

capacity? Okay, Nvidia does. Great." When you  look at who is AGI-pilled enough to buy compute  

29:36

on long timelines at levels that seem ridiculous  to people who aren't AGI-pilled—but nonetheless,  

29:42

they're willing to pay a pretty good margin and  sign it now because they view in the future that  

29:49

ratio is screwed up—the same thing happens  with the supply chain for semiconductors. 

29:54

I don't think Nvidia is quite AGI-pilled. Jensen doesn't believe software is going  

29:58

to be fully automated and all these things. Accelerated computing, not AI chips, right? 

30:03

It's AI chips. But that's what he calls it, right? 

30:05

Yeah. I think it's a broader term, AI is within  that, but also physics modeling and simulations. 

30:11

But it's like he's not  embracing the main use case. 

30:14

I think he's embracing it, but I just don't  think he's AGI-pilled like Dario or Sam. 

30:19

But he's still way, way more AGI-pilled than  Google was in Q3 of last year, or Amazon was  

30:30

in Q3 of last year, and he saw way more demand.  The reason is pretty simple. You can see all the  

30:33

data center construction. He's like, "Okay,  

30:34

I want to have this market share." We have all the data centers tracked,  

30:38

and there's a lot of data centers  that could be one or the other. 

30:44

To some extent, Google and Amazon, Google  especially, even though their TPU is just  

30:49

better for them to deploy, they have to  deploy a crap load of GPUs because they  

30:52

don't have enough TPUs to fill up their  data centers. They can't get them fabbed. 

30:56

I have a question about that. Google sold a million, was it  

31:00

the v7s? Yes. 

31:01

—the Ironwoods to Anthropic, and you're saying the  big bottleneck right now, this year or next year,  

31:07

I guess going forward forever now,  is going to be the logic and memory,  

31:13

the stuff it takes to build these chips. Google has DeepMind, the third prominent AI lab. 

31:19

If this is the big bottleneck, why would they  sell it rather than just giving it to DeepMind? 

31:24

This is again a problem of… DeepMind people  were like, "This is insane. Why did we do  

31:29

this?" But Google Cloud people and Google  executives saw a different thought process. 

31:37

You and I know the compute team at Anthropic. Both of the main people came from Google. 

31:45

They saw this dislocation, they negotiated  a deal, and they were able to get access  

31:49

to this compute before Google realized. The chain of events, at least from our  

31:54

data that we found, was in early Q3, over  the course of six weeks, we saw capacity  

32:06

on TPUs go up by a significant amount. It went up multiple times in those six  

32:12

weeks. There were multiple requests. Google  even had to go to TSMC and explain to them  

32:18

why they needed this increase in  capacity because it was so sudden. 

32:21

A lot of that capacity increase  was for selling to Anthropic. 

32:25

Because Anthropic saw it before Google. And then Google had Nano Banana and Gemini  

32:29

3 which caused their user metrics to skyrocket. Then leadership at Google was like, "Oh." 

32:34

Then they started making the statement that  we have to double compute every six months,  

32:37

or whatever the exact number was. They really woke up a lot more, and then  

32:42

they went to TSMC and said, "We want more. We want  more." TSMC replied, "Sorry guys, we're sold out. 

32:50

We can maybe get 5-10% more for 2026,  but really we're going to work on 2027." 

32:54

There was this information asymmetry among  the labs, in my mind. I don't know exactly.  

32:59

It's the narrative I've spun myself from  seeing all the data in the supply chain on  

33:02

wafer orders and what's going on with the data  centers that Anthropic and Fluidstack signed. 

33:09

It's pretty clear to me that Google screwed up. You can see this from Google's Gemini ARR. 

33:16

They had next to nothing in Q1 to Q3—in Q3  a little bit once they started inflecting. 

33:21

But in Q4 they reached $5 billion  in revenue on an ARR basis. 

33:30

It's clear Google didn't see  revenue skyrocket initially. 

33:34

In a sense, Anthropic had a little bit of  commitment issues before their ARR exploded,  

33:40

even though they had far more information  asymmetry and saw what was coming down the pipe. 

33:44

Google was going to be more conservative than Anthropic, and Google had even less ARR. 

33:52

So they were just not willing to do it,  and then they realized they should do it. 

33:58

Since then, Google has gotten absurdly  AGI-pilled in terms of what they're doing.  

34:05

They bought an energy company. They're  putting deposits down for turbines. 

34:09

They're buying a ridiculous  percentage of powered land. 

34:13

They're going to utilities and  negotiating long-term agreements. 

34:15

They're doing this on the data center  and power side very aggressively. 

34:22

I think Google woke up towards the end  of last year, but it took them some time. 

34:26

How many gigawatts do you think Google  will have by the end of next year? 

34:28

Buy my data. You charge for that kind of information. 

34:32

Yes, yes. I feel like every year the  bottleneck for what is preventing us  

34:37

from scaling AI compute keeps changing. A couple years ago it was CoWoS. Last  

34:41

year it was power. You'll tell me  what the bottleneck is this year. 

34:45

But I want to understand five years  out, what will be the thing that is  

34:48

constraining us from deploying the singularity? The biggest bottleneck is compute. For that,  

34:55

the longest lead time supply chains  are not power or data centers. 

34:59

They're actually the semiconductor  supply chains themselves. 

35:01

It switches back from power and data  centers as a major bottleneck to chips. 

35:08

In the chip supply chain, there's  a number of different bottlenecks. 

35:11

There's memory, logic wafers from  TSMC, and the fabs themselves. 

35:17

Construction of the fabs takes two to three years,  versus a data center which takes less than a year. 

35:25

We've seen Amazon build data  centers in as fast as eight months. 

35:28

There's a big difference in lead  times because of the complexity  

35:31

of building the fab that actually makes the chips. 

35:33

The tools also have really long lead times. The bottlenecks, as we've scaled,  

35:39

have shifted based on what the supply  chain is currently not able to do. 

35:44

It was CoWoS, power, and data centers, but  those were all shorter lead time items. 

35:50

CoWoS is a much simpler process  of packaging chips together. 

35:54

Power and data centers are ultimately way simpler  than the actual manufacturing of the chips. 

35:59

There's been some sliding of capacity from mobile or PC over to data center chips, 

36:08

which has been somewhat fungible. Whereas CoWoS, power, and data  

36:12

centers have had to start anew as supply chains. But now there's no more capacity for the mobile  

36:19

and PC industries—which used to be the majority  of the semiconductor industry—to shift over to AI. 

36:26

Nvidia is now the largest customer at TSMC  and SK Hynix, the largest memory manufacturer. 

36:33

It's sort of impossible to slide resources away from 

36:39

the common person's PCs and smartphones any further towards the AI chips. 

36:45

So now the question is how do  we scale AI chip production? 

36:48

That's the biggest bottleneck as we go to 2030. It would be very interesting if there's an  

36:53

absolute gigawatt ceiling that you can project  out to 2030 based just on "We can't produce more  

37:01

than this many EUV machines." To scale compute further,  

37:06

there are different bottlenecks this year and  next year, but ultimately by 2028 or 2029,  

37:11

the bottleneck falls to the lowest rung  on the supply chain, which is ASML. 

37:16

ASML makes the world's most  complicated machine: an EUV tool. 

37:21

The selling price for those is $300-400 million. Currently, they can make about 70 a year. 

37:27

Next year, they'll get to 80. Even under very aggressive supply  

37:31

chain expansion, they only get to a little  bit over 100 by the end of the decade. What  

37:35

does that mean? They can make a hundred of these  tools by the end of the decade, and 70 right now. 

37:40

How does that actually translate to AI compute? We see all these numbers from Sam Altman and  

37:46

many others across the supply chain:  gigawatts, gigawatts, gigawatts. 

37:50

How many gigawatts are we adding? We see Elon saying a hundred gigawatts in space. 

37:55

A year. A year. The problem with any of  

37:59

these numbers, or the challenge to these numbers,  is actually not the power or the data center. 

38:04

We can dive into that, but  it's manufacturing the chips. 

38:07

Take a gigawatt of Nvidia's Rubin chips. Rubin is announced at GTC,  

38:14

I believe the week this podcast goes live. To make a gigawatt worth of data center  

38:19

capacity of Nvidia's latest chip that they're  releasing towards the end of this year,  

38:24

you need a few different wafer technologies. You need about 55,000 wafers of 3 nm. 

38:32

You need about 6,000 wafers of 5 nm, and then  you need about 170,000 wafers of DRAM memory. 

38:41

Across these three different buckets,  each requires different amounts of EUV. 

38:46

When you manufacture a wafer, there are thousands  and thousands of process steps where you're  

38:50

depositing material and removing them. But the key critical step—which at least  

38:55

in advanced logic is 30% of the  cost of the chip—is something that  

39:00

doesn't actually put anything on the wafer. You take the wafer, you deposit photoresist,  

39:04

which is a chemical that chemically  changes when you expose it to light. 

39:07

Then you stick it into the EUV tool, which  shines light at it in a certain way. It  

39:11

patterns it. There's what's called a mask,  which is effectively a stencil for the design. 

39:16

When you look at a leading-edge 3 nm wafer, it has  70 or so masks, 70 or so layers of lithography,  

39:23

but 20 of them are the most advanced EUV. If you need 55,000 wafers for a gigawatt, and you  

39:33

do 20 EUV passes per wafer, you can do the math. That's 1.1 million passes of EUV for a single  

39:43

gigawatt. It's pretty simple. Once you add the  rest of the stuff, it ends up being 2 million,  

39:47

across 5 nm and all the memory. You're at roughly 2 million EUV  

39:52

passes for a single gigawatt. These tools  are very complicated. When you think about  

39:57

what it's doing across a wafer, it's taking  the wafer and scanning and stepping across. 

40:03

It does this dozens of times  across the whole wafer. 

40:09

When you're talking about how  many EUV passes, that’s the  

40:11

entire wafer being exposed at a certain rate. An EUV tool can do roughly 75 wafers per hour,  

40:19

and the tool is up roughly 90% of the time. In the end, you need about three and a half  

40:26

EUV tools to do the 2 million EUV  wafer passes for the gigawatt. 

40:32

So three and a half EUV  tools satisfies a gigawatt. 
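
Writing that calculation out (a sketch using the wafer counts, layer counts, throughput, and uptime quoted above; the 8,760 hours in a year is the only added assumption):

```python
# EUV arithmetic for one gigawatt of Rubin-class capacity, per the figures above.

wafers_3nm = 55_000
euv_layers_3nm = 20
logic_passes = wafers_3nm * euv_layers_3nm        # ~1.1M EUV exposures on 3 nm alone
total_passes = 2_000_000                          # ~2M once 5 nm and DRAM are included

wafers_per_hour = 75                              # EUV tool throughput
uptime = 0.90
passes_per_tool_per_year = wafers_per_hour * uptime * 8_760   # ~590k per tool-year

print(logic_passes)                               # 1,100,000
print(total_passes / passes_per_tool_per_year)    # ~3.4 -> roughly 3.5 tools per gigawatt
```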

40:35

It's funny to think about the numbers. What does  a gigawatt cost? It costs roughly $50 billion.  

40:40

Whereas what do three and a half EUV tools cost? That's $1.2 billion. It's actually a much lower 

40:46

number, which is interesting to think about. Fifty gigawatts of economic CapEx in the data  

40:53

center, and what gets built on top of  that in terms of tokens is even larger. 

40:56

It might be $100 billion worth of AI value into the supply chain. Over 

41:10

three years, TSMC has done $100 billion of CapEx. So it's roughly $30, $30, and $40 billion a year. A small fraction of 

41:19

that is being used by Nvidia for the 3 nm, or  previously 4 nm, that it's using for its chips. 

41:30

What were its earnings last  quarter? It was $40 billion.  

41:34

So $40 billion times four is $160 billion. Nvidia alone is turning some small fraction  

41:41

of $100 billion in CapEx, which is going to  be depreciated over many years and not just  

41:45

this one year, into $160 billion in a single year. That gets even more intense when you go down the  

41:50

supply chain to ASML, which is taking a billion  dollars' worth of machines to produce a gigawatt. 

41:54

Of course, those machines last for more  than a year so it’s doing more than that. 
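
The ratios being gestured at, in rough numbers (a sketch using the quoted figures; the ~$350 million-per-tool midpoint is an assumption within the $300-400 million range):

```python
# CapEx-to-revenue contrast sketched above, using the quoted round numbers.

tsmc_capex_three_years = 100e9        # ~$30B + $30B + $40B over three years
nvidia_last_quarter = 40e9            # the quarterly figure quoted
nvidia_annualized = nvidia_last_quarter * 4

print(nvidia_annualized)                           # $160B in a single year
print(nvidia_annualized / tsmc_capex_three_years)  # ~1.6x three years of TSMC CapEx

# One step further down the supply chain: ~3.5 EUV tools at an assumed ~$350M each,
# versus the ~$50B a gigawatt of data center costs.
print(3.5 * 350e6)                                 # ~$1.2B of ASML tools per gigawatt
```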

41:58

Now I want to understand, how many  such machines will there be by 2030,  

42:02

if you include not just the ones that are sold that year, but the ones that have accumulated over the 

42:06

previous years? What does that imply? Sam Altman  says he wants to do a gigawatt a week in 2030. 

42:14

When you add up those numbers,  is it compatible with that? 

42:17

That's completely compatible,  if you think about it. 

42:19

TSMC and the entire ecosystem have  something like 250 to 300 EUV tools already. 

42:26

Then you stack on 70 this year, 80  next year, growing to 100 by 2030. 

42:30

You're at 700 EUV tools by the end of the decade.  700 EUV tools, at three and a half tools per  

42:35

gigawatt—assuming it's all allocated to AI, which  it's not—gets you to 200 gigawatts worth of AI  

42:43

chips for the data centers to deploy. Sam wants 52 gigawatts a year. 

42:49

He's only taking 25% share then. Obviously, there's some share given to mobile and  

42:54

PC, assuming we're even allowed to have consumer  goods still and we don't get priced out of them. 

43:04

But roughly, he's saying 25% market  share of the total chips fabbed. 

43:09

That's very reasonable given that this  year alone, I think he's going to have  

43:14

access to 25% of the Blackwell GPUs  that are deployed. It's not that crazy. 
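
The same tool arithmetic at the fleet level (a sketch; the 700-tool installed base is the rough trajectory described above, and it assumes every tool could be pointed at AI chips, which it can't):

```python
# Fleet-level EUV arithmetic: installed tools by 2030 vs. a gigawatt-a-week ambition.

installed_tools_2030 = 700          # ~250-300 today, plus ~70-100 shipped per year
tools_per_gigawatt = 3.5
ai_gw_per_year_ceiling = installed_tools_2030 / tools_per_gigawatt
print(ai_gw_per_year_ceiling)       # ~200 GW/yr if every EUV tool made AI chips

openai_target_gw_per_year = 52      # "a gigawatt a week"
print(openai_target_gw_per_year / ai_gw_per_year_ceiling)   # ~0.26 -> ~25% of output
```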

43:23

When did ASML start shipping  EUV tools, when 7 nm started? 

43:27

I don't know when that was exactly. You're saying in 2030, they're going to be using  

43:31

machines that initially were shipped in 2020. So for ten years, you're using the same most  

43:36

important machine in this  most technologically advanced  

43:39

industry in the world? I find that surprising. ASML's been shipping EUV tools now for roughly a  

43:45

decade, but it only entered mass volume production  around 2020. The tool's not the same. Back then,  

43:52

the tools were even lower throughput. There are various specifications around  

43:57

them called overlay. I was mentioning you're  

43:59

stacking layers on top of each other. You'll do some EUV, you'll do a bunch  

44:02

of different process steps—depositing stuff,  etching stuff, cleaning the wafer—dozens of  

44:07

those steps before you do another EUV layer. There's a spec called overlay, which is:  

44:11

you did all this work, you drew these lines  on the wafer, now I want to draw these dots. 

44:17

Let's say I want to draw these dots to  connect these lines of metal to holes,  

44:21

and then the next layer up is another set of lines  going perpendicular, so now you're connecting  

44:25

wires going perpendicular to each other. You have to be able to land them on top of  

44:30

each other. It's called overlay. Overlay is  a spec that's been improved rapidly by ASML. 

44:36

Wafer throughput has been  improved rapidly by ASML. 

44:38

The price of the tool has gone up, but not  as much as the capabilities of the tool. 

44:42

Initially, the EUV tools were $150 million. Over time, they're now $400 million  

44:49

as I look out to 2028. But the capabilities of the  

44:51

tools have more than doubled as well, especially  on throughput and overlay accuracy, which is  

44:56

the ability to accurately align the subsequent  passes on top of each other even though you do  

45:03

tons of steps between. ASML is improving super  rapidly. It's also noteworthy to say that ASML  

45:13

is maybe one of the most generous companies in the world. They have this linchpin position. 

45:19

No one has anything competitive. Maybe China will  have some EUV by the end of the decade, but no one  

45:24

else has anything even close to EUV, and yet they  haven't taken price and margins up like crazy. 

45:31

You go ask some other folks that we  talk to all the time, like Leopold,  

45:37

and they're like, "Let's have the price go  up." Because they can. The margin is there.  

45:42

You can take the margin. Nvidia takes the  margin. Memory players are taking the margin. 

45:45

But ASML has never raised the price more than  they've increased the capability of the tool. 

45:51

In a sense, they've always provided  net benefit to their customers. 

45:54

It's not that the tool is stagnant,  it's just that these tools are old. 

45:58

Yes, you can upgrade them some,  and the new tools are coming. 

46:01

For simplicity's sake, we're  ignoring the advances in overlay  

46:06

or throughput per tool for this podcast. You say we're producing 60 of these machines  

46:10

this year and then 70, 80 over subsequent years. 

46:15

What would happen if ASML just decided  to double its CapEx or triple its CapEx? 

46:20

What is preventing them from  producing more than 100 in 2030? 

46:23

Why are you so confident that  even five years out, you can be  

46:27

relatively sure what their production will be? I think there are a couple factors here. 

46:31

ASML has not decided to just go YOLO,  let's expand capacity as fast as possible. 

46:37

In general, the semiconductor  supply chain has not. 

46:39

It's lived through the booms and busts,  and we can talk a bit more about it. 

46:43

Basically some players have recently woken  up, but in general no one really sees demand  

46:52

for 200 gigawatts a year of AI chips, or  trillions of dollars of spend a year in  

46:58

the semiconductor supply chain. They're  not AI-pilled. They're not AGI-pilled. 

47:02

We're going to get to a  trillion dollars this year. 

47:05

Yeah, I feel you, but I'm saying no one  really understands this in the supply chain. 

47:11

Constantly, we're told our numbers are  way too high, and then when they're right,  

47:14

they're like, "Oh, yeah, but your next  year's numbers are still too high." 

47:18

ASML's tool has four major components. It has the source,  

47:25

which is made by Cymer in San Diego. It has the reticle stage, which is made  

47:31

in Wilton, Connecticut. It has the wafer stage. It has the optics, the lenses and such. 

47:39

Those last two are made in Europe. When you look at each of these four,  

47:42

they're tremendously complex supply chains that,  (A) they have not tried to expand massively,  

47:48

and (B) when they try to expand  them, the time lag is quite long. 

47:55

Again, this is the most complicated machine  that humans make, period, at any sort of volume. 

48:02

Let's talk about the source specifically. What  does the source do? It drops these tin droplets.  

48:08

It hits it three subsequent  times with a laser perfectly. 

48:11

The first one hits this tin  droplet, it expands out. 

48:13

It hits it again, so it expands  out to this perfect shape,  

48:16

and then it blasts it at super high power. The tin droplets get excited enough that they  

48:21

release EUV light, 13.5 nanometer, and then  it's in this thing that is collecting all  

48:26

the light and directing it into the lens stack. Then you have the lens stack, which is Carl Zeiss,  

48:31

as you mentioned, and some other folks, but  Zeiss being the most important part of it. 

48:36

They also have not tried to expand  production capacity because they don't see... 

48:40

They're like, "We're growing a lot because of AI. We're growing from 60 to 100." It's like, "No, no,  

48:46

no. We need to go to a couple hundred, but it's  fine. Whatever." Each of these tools has, I think,  

48:51

18 of these lenses, effectively. They are multilayer mirrors,  

48:57

which are perfect layers of molybdenum  and ruthenium, if I recall correctly,  

49:03

stacked on top of each other in many layers,  and then the light bounces off of it perfectly. 

49:08

When we think about a lens, it's in  a shape, and it focuses the light. 

49:12

This is like a mirror that's also  a lens, so it's pretty complicated. 

49:16

Any defect in these super thinly  deposited stacks will mess it up. 

49:23

Any curvature issues will mess it up. There are a lot of challenges  

49:26

with scaling the production. It's quite artisanal in this sense  

49:29

because you're not making tens of thousands  of these a year, you're making hundreds,  

49:34

you're making thousands. 60 tools a year, 18 of  these per tool, you’re still in the hundreds,  

49:43

of tools, or you're at the thousand number  roughly for these lenses and projection optics. 

49:51

Then you step forward to the reticle stage,  which is also something really crazy. 

49:57

This thing moves at, I want to say, nine Gs. It will shift nine Gs because as you step  

50:03

across a wafer, the tool will go... The  wafer stage is complementary. It's the  

50:07

wafer part. You line these two things up. You're taking all the light through the  

50:11

lenses that's focused, and here's  the reticle, here's the wafer. 

50:16

The reticle's moving one direction, the  wafer's moving the other direction as it  

50:20

scans a 26x33 millimeter section  of the wafer, and then it stops. 

50:25

It shifts over to another part  of the wafer and does it again. 

50:28

It does that in just seconds. Each of them is moving  

50:32

at nine Gs in opposite directions. Each of these things is a wonder and marvel  

50:37

of chemistry, fabrication, mechanical engineering,  and optical engineering, because you have to align  

50:47

all these things and make sure they're perfect. All of these things have crazy amounts  

50:50

of metrology because you have  to perfectly test everything. 

50:53

If anything is messed up, the yield goes to  zero, because this is such a finely tuned system. 

50:58

By the way, it's so large that you're building  it in the factory in Eindhoven, Netherlands,  

51:05

and they're deconstructing it and shipping it on  many planes to the customer site, and then you're  

51:10

reassembling it there and testing it again. That process takes many, many months. 

51:15

There are so many steps in the supply  chain, whether it's Zeiss making their  

51:19

lenses and projection optics or Cymer, which is  an ASML-owned company, making the EUV source. 

51:25

Each of these has its own complex supply chain. ASML has commented that their supply chain has  

51:29

over ten thousand companies in it. Like individual suppliers? 

51:32

Yes. It might not be directly. It might  be through Zeiss having so many suppliers  

51:37

and XYZ company having so many suppliers. If you just think about it, you're talking  

51:44

about two physically moving objects that are the size of a wafer, and they have to be accurate 

51:51

to the level of single-digit nanometers or even  smaller because the entire system, the overlay,  

51:58

the layer-to-layer overlay variation,  has to be on the order of 3 nanometers. 

52:04

If the overlay is 3 nm, that means for each individual part, the accuracy of its 

52:09

physical movement has to be even less than that. It has to be sub-one nanometer in most cases,  

52:14

because the errors of these things stack up. There's no way to just snap your fingers and 

52:23

increase production. Things as simple as power.  The US going from zero percent power growth to  

52:27

two percent power growth, even though China's  already at thirty, was so hard for America to do. 

52:34

And that's a really simple supply chain with  very few people in it who make difficult things. 

52:41

There are probably 100,000  electricians and people who work in  

52:45

the electricity supply chain, or more, in the US? When you look at ASML, they employ so few people. 

52:53

Carl Zeiss probably employs less than a  thousand people working on this, and all of  

52:58

those people are super, super specialized. You can't just train random people up  

53:02

for this in the snap of a finger. You can't just get your entire supply  

53:06

chain to get galvanized. Nvidia's had to do a lot  

53:11

to get the entire supply chain to even deliver  the capacity they're going to make this year. 

53:15

When you go talk to Anthropic, they're  like, "We're short of TPUs, we're short  

53:18

of training, and we're short of GPUs." When you go talk to OpenAI, they're like,  

53:21

"We're short of these things." OpenAI and Anthropic know they need X. 

53:25

Nvidia is not quite as AGI-pilled.  They're building X - 1. You go down the  

53:31

supply chain, everyone's doing X - 1. In some cases, they're doing X ÷ 2,  

53:36

because they're not AGI-pilled. You end up with this time lag  

53:42

for the whip to react. The AI-pilledness and the  

53:48

desire to increase production takes so long. Once they finally understand that they need  

53:53

to increase production rapidly…  They think they understand. 

53:57

They think AI means we have to go from 60  to 100, in addition to the tools getting  

54:01

better and faster, the source getting  higher power from 500 watts to 1,000,  

54:05

and all these other aspects of the supply chain  advancing technically and increasing production. 

54:09

They think they're actually  increasing production a lot. 

54:13

But if you flow through the  numbers… What does Elon want? 

54:15

He wants 100 gigawatts a year  in space by 2028 or 2029. 

54:23

Sam Altman wants 52 gigawatts a  year by the end of the decade. 

54:28

Anthropic probably needs the  same, and Google needs that. 

54:32

You go across the supply chain, and it's  like, wait, no, the supply chain can't  

54:35

possibly build enough capacity for everyone  to get what they want on the side of compute. 

55:44

I feel like in the data center  supply chain for the last few years,  

55:50

people have been making arguments like, "We are  bottlenecked by this specific thing, therefore  

55:55

AI compute can't scale more than X." But as you've written about, if the  

56:00

grid is a bottleneck, then we just do behind the  meter on the site, we do gas turbines, et cetera. 

56:06

If that doesn't work, there are all these  other alternatives that people fall back on. 

56:11

I want to ask whether we can imagine a similar  thing happening in the semiconductor supply chain. 

56:17

If EUV becomes a bottleneck, what if we  just went back to 7 nm and did what China  

56:24

is doing currently, producing 7 nm chips  with multi-patterning with DUV machines? 

56:31

If you look at a 7 nm chip like the  A100, there's been a lot of progress  

56:36

obviously from the A100 to the B100 or B200. How much of that progress is just numerics? 

56:45

If you just hold FP16 constant from A100 to B100: the B100 is a little over one petaflop, and the  

56:54

A100 is like 300 teraflops. Yeah, 312. 

57:02

Holding numerics constant, you have  a 3x improvement from A100 to B100. 
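Spelled out, using the round dense FP16 figures quoted here:

```python
# Dense FP16 throughput, same numeric format on both chips, round figures from the conversation.
a100_fp16_tflops = 312
b100_fp16_tflops = 1000
print(f"B100 / A100 at FP16: ~{b100_fp16_tflops / a100_fp16_tflops:.1f}x")  # ~3.2x
```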

57:07

Some of that is the process improvement, some of  that is just the accelerator design improving,  

57:11

which we could replicate again in the future. It seems there's actually a very small effect  

57:16

from the process improving from 7 nm to 4 nm. I don't know the numbers offhand, but let's  

57:24

say there's 150k wafers per month of 3 nm  and eventually similar amounts for 2 nm. 

57:31

But then there's a similar amount for 7 nm. If you have all those old wafers and there's  

57:36

maybe a 50% haircut because the bits per  wafer area are 50% less or something,  

57:45

it doesn't seem that bad to just bring on 7  nm wafers if that gives you another fifty or  

57:50

hundred gigawatts. Tell me why that's naive. We potentially do go crazy enough that this  

58:01

happens because we just need incremental  compute, and the compute is worth the  

58:04

higher cost and power of these chips. But it's also unlikely to a large extent  

58:13

because some of these are not fair comparisons. For example, from A100, which is 312 teraflops,  

58:22

to Blackwell, which is 1,000 or 2,000 FP16,  and then Rubin is 5,000 or so FP16… It's not  

58:31

a fair comparison because these chips  have vastly different design targets. 

58:38

With A100, Nvidia optimized  for FP16 and BF16 numerics. 

58:45

When you look at Hopper, they didn't care  as much about that; they cared about FP8. 

58:49

When you look at Rubin, they don’t  care about FP16 and BF16 so much,  

58:53

they care mostly about FP4 and FP6. Numerics are what they've designed their chip for. 

59:06

Let's say we make a new chip design on 7 nm,  optimized for the numerics of the modern day. 

59:14

The performance difference is  still going to be much larger  

59:16

than the FLOPS difference you mentioned. Often it's easy to boil things down to FLOPS  

59:23

per watt or FLOPS per dollar,  but that's not a fair comparison. 

59:32

Let's look at Kimi K2.5 and DeepSeek. When you look at those two models and  

59:40

their performance on Hopper versus  Blackwell on very optimized software,  

59:45

you get vastly different performance. Most of this is not attributed to  

59:50

FLOPS or numerics, because those  models are actually eight-bit. 

59:55

So Blackwell and Hopper are effectively on equal footing at eight-bit, and Blackwell is not  

59:59

really taking advantage of its four-bit there. The performance gulf is actually much larger. 

60:09

Sure it's one thing to shrink process  technology and make the transistor smaller  

60:14

so each chip has X number of FLOPS,  but you forget the big gating factor. 

60:18

These models don't run on a single chip. They run on hundreds of chips at a time. 

60:22

If you look at DeepSeek's production  deployment, which is well over a  

60:25

year old now, they were running on 160 GPUs. That's what they serve production traffic on. 

60:31

They split the model across 160 GPUs. Every time you cross the barrier from one  

60:35

chip to another, there is an efficiency loss. You have to transmit over high-speed  

60:40

electrical SerDes, which brings  a latency cost and a power cost. 

60:44

There are all these dynamics that hurt. As you shrink and shrink the process node, you've  

60:51

increased the amount of compute in a single chip. Now in-chip movement of data is at least tens  

61:01

of terabytes a second, if not  hundreds of terabytes a second. 

61:04

Whereas between chips, you're on  the order of a terabyte a second. 

61:09

Then you have this movement of data between chips  that are super close to each other physically. 

61:13

You can only put so many chips  close to each other physically,  

61:15

so you have to put chips in different racks. The movement of data between racks is on the order  

61:20

of hundreds of gigabits a second, 400 gig or 800  gig a second, so roughly 100 gigabytes a second. 

61:27

So you have this huge ladder: on-chip  communication is super fast, within the  

61:32

rack is an order of magnitude slower, and outside the rack is an order of magnitude slower than that. 
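The ladder, in the round numbers used in this discussion; each hop off the package costs roughly an order of magnitude in bandwidth:

```python
# Bandwidth ladder in GB/s, round numbers from the conversation.
on_chip_GB_s = 10_000        # on-die data movement: tens of TB/s (take ~10 TB/s as a floor)
chip_to_chip_GB_s = 1_000    # package-to-package over electrical SerDes: ~1 TB/s
rack_to_rack_GB_s = 800 / 8  # a 400 or 800 Gbit/s network link: ~50-100 GB/s

for name, bw in [("on-chip", on_chip_GB_s),
                 ("chip-to-chip", chip_to_chip_GB_s),
                 ("rack-to-rack", rack_to_rack_GB_s)]:
    print(f"{name:>12}: ~{bw:,.0f} GB/s")
```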

61:39

As you break the bounds of chips,  you end up with a performance loss. 

61:43

The reason I explain this is because  when you look at Hopper versus Blackwell,  

61:47

even if both are using a rack's worth of  chips, Hopper is significantly slower. 

61:52

The amount of bandwidth you can bring to bear on the task within each domain—tens of terabytes a  

62:00

second of communication between processing elements on Blackwell, versus terabytes a second between  

62:06

them on Hopper—is much, much higher, and therefore the performance is much higher. 

62:11

When you look at inference at 100 tokens  a second for DeepSeek and Kimi K2.5,  

62:19

the performance difference between Hopper  and Blackwell is on the order of 20x. 

62:21

It's not 2x or 3x like the FLOPS  performance difference indicates,  

62:24

even though those are on the same process node. There are just differences in networking  

62:28

technologies and what they've worked on. You can translate some of these back,  

62:32

but when you look at what they're doing  on 3 nm with Rubin, some of those things  

62:36

are simply not possible to do all the way back  on A100, even if you make a new chip for 7 nm. 

62:42

There are certain architectural improvements  you can port and certain ones you cannot. 

62:47

The performance difference is not just  going to be the difference in FLOPS. 

62:50

It's in some senses cumulative between  the difference in FLOPS per chip,  

62:56

networking speed between chips, how many FLOPS  are on a chip versus a system, and memory  

63:00

bandwidth on a single chip versus an entire  system. All of these things compound. 

63:03

Can I ask you a very naive question? The B200 now has two dies on a single chip,  

63:10

so you can get that bandwidth without  having to go through NVLink or InfiniBand. 

63:16

Next year, Rubin Ultra will  have four dies on one chip. 

63:19

What is preventing us from just doing  that with an older… How many dies could  

63:24

you have on a single chip and still  get these tens of terabytes a second? 

63:28

Even within Blackwell, there are differences in  

63:32

performance when you're communicating  on the chip versus across the chips. 

63:36

Those bounds are obviously much smaller than  when you're going out of the entire chip. 

63:45

When you scale the number of chips  up, there is some performance loss. 

63:50

It's not perfect, but it is way  better than different entire packages. 

63:54

How large can advanced packaging scale? The way Nvidia is doing it is CoWoS. 

64:01

Google, Broadcom, MediaTek, and  Amazon's Trainium are all doing CoWoS. 

64:07

But actually you can go look back at what Tesla  did with Dojo, which they cancelled and restarted. 

64:16

Dojo was a chip that was  the size of an entire wafer. 

64:19

They had 25 chips on it. There were some  tradeoffs. They couldn't put HBM on it. 

64:26

But the positive side was  that they had 25 chips on it. 

64:30

To date, it is still probably the best chip  for running convolutional neural networks. 

64:35

It's just not great at transformers because the  shape of the chip, the memory, the arithmetic,  

64:41

and all these various specifications are just  not well-suited for transformers. They're  

64:45

well-suited for CNNs. Dojo chips were optimized  around that, and they made a bigger package. 

64:52

But as you make packages bigger and bigger,  you have other constraints: networking speed,  

64:59

memory bandwidth, and cooling capabilities. All of these things start to rear their heads.  

65:03

It's not simple. But yes, you will see a trend  line of more chips on the package, and yes,  

65:08

you're going to be able to do that on 7 nm. In fact, that's what Huawei did  

65:11

with their Ascend 910C or D. They initially put one, and then they did two. 

65:20

They're focusing on scaling the  packaging up because that is an  

65:23

area where they can advance faster than  process technology where they can't shrink. 

65:28

But at the end of the day, that’s something  you can do on the leading-edge chips too. 

65:32

Anything you do on 7 nm, you can also  probably do on 3 nm in terms of packaging. 

65:36

If we end up in this world in 2030 where the  West has the most advanced process technology  

65:42

but has not ramped it up as much, whereas  China… I don't know if you think by 2030  

65:48

they would have EUV and 2 nm or whatever. But they are semiconductor-pilled and they  

65:53

are producing in mass quantity. Basically, I'm wondering what  

65:57

the year is where there's a crossover, where our  advantage in process technology has faded enough,  

66:03

and their advantage in scale has increased enough. And also, if their advantage in having one country  

66:09

with the entire supply chain indigenized—rather  than having random suppliers in Germany  

66:13

and the Netherlands—would mean that China would be ahead in its ability to produce FLOPS at scale. 

66:22

To date, China still does not have an entirely  indigenized semiconductor supply chain. 

66:28

But would they in 2030? By 2030, it's possible that they do. 

66:33

But to date, all of China's 7 nm and  14 nm capacity uses ASML DUV tools. 

66:42

The amount that they can  import from ASML is large. 

66:47

But the vast majority of ASML's revenue—and all of its EUV revenue—is outside of China. 

66:54

The scale advantage is still in  the favor of the West plus Taiwan,  

66:58

Japan, and Korea, et cetera. But they're trying to make  

66:59

their own DUV and EUV tools, right? They're trying to do all these things. 

67:03

The question is how fast can they advance  and scale up production as well as quality. 

67:08

To date, we haven't seen that. Now I'm quite bullish that they're  

67:12

going to be able to do these things  over the next five to ten years. 

67:16

They will really scale up production  and kick it into high gear. 

67:20

They have more engineers working on it and  more desire to throw capital at the problem. 

67:24

So by 2030, will they have fully indigenized DUV? I think for sure. DUV, yes. 

67:28

And fully indigenized EUV by 2030? I think they'll have working tools. 

67:32

I don't think that they'll be  able to manufacture a bunch yet. 

67:36

There's having it work, and  then there's production hell. 

67:42

ASML had EUV working in the  early 2010s at some capacity. 

67:49

The tools were not accurate enough. They were not scaled for high-volume  

67:54

manufacturing or reliable enough. They had to ramp production,  

67:57

and that all took time. Production hell takes  time. That's why it took another five to seven  

68:01

years to get EUV into mass production at  a fab rather than just working in the lab. 

68:07

How many DUV tools do you think  they'll be able to manufacture in 2030? 

68:11

ASML? No, China. 

68:14

That's a great question. It's a bit of a  challenge to look into this supply chain  

68:23

especially. We try really hard. In some instances,  they're buying stuff from Japanese vendors. 

68:31

If they want a fully indigenized supply chain,  they need to not buy these lenses, projection  

68:36

optics, or stages from Japanese vendors. They need to build it internally. 

68:40

It's really tough to say where  they'll be able to get to. 

68:42

I honestly think it's a shot in the dark. But it's probably not unlikely that they'll  

68:46

be able to do on the order of 100 DUV tools  a year, whereas ASML is currently doing  

68:51

hundreds of DUV tools a year. No company has a process node  

69:00

where they make a million wafers a month. Elon says he wants to do it and China is  

69:05

obviously going to do it. TSMC is trying to do that. 

69:12

The memory makers may get to a million wafers  a month as well, but not in a single fab. 

69:16

It's mind-boggling to think of  that scale, and challenging to  

69:22

see the supply chain galvanized for that. I don't want to doubt China's capability to scale. 

69:29

I guess this is an interesting question. I think at some point SemiAnalysis  

69:34

will do the deep dive on this. By when would indigenized Chinese production  

69:44

be bigger than the rest of the West combined? And put in, as an input to your model, when they'll  

69:52

have DUV machines and EUV machines at scale. Because there's this question around if you  

69:56

have long timelines on AI—by long meaning  2035, which is not that long in the grand  

70:00

scheme of things—should you expect a world  where China is dominating in semiconductors? 

70:06

It doesn't get asked enough  because if you're in San Francisco,  

70:09

we're thinking on timescales of weeks. If you're outside of San Francisco,  

70:14

you're not thinking about AGI at all. What if we  have AGI? What if you have this transformational  

70:19

thing that is commanding tens or hundreds  of trillions of dollars of economic growth  

70:23

and token output, but it happens in 2035? What does that imply for the West versus China? 

70:33

SemiAnalysis has got to write  the definitive model on this. 

70:39

It's really challenging when you  move timescales out that far. 

70:43

What we tend to focus on is tracking every  data center, every fab, and all the tools. 

70:48

We track where they're going, but the time  lags for these things are relatively short. 

70:54

We can only make reasonably accurate estimates  for data center capacity based on land purchasing,  

71:01

permits, and turbine purchasing. We know where all these things  

71:04

are going, that's the data we sell. As you go out to 2035, things are just  

71:10

so radically different. Your error bars get so  

71:13

large it's hard to make an estimate. But at the end of the day, if takeoff  

71:19

or timelines are slow enough, I don't see why  China wouldn't be able to catch up drastically. 

71:28

In some sense, we've got this valley where, three  to six months ago, or maybe even now, Chinese  

71:36

models are as competitive as they've ever been. I think Opus 4.6 and GPT 5.4 have really pulled  

71:41

away and made the gap a little bit bigger, but  I'm sure some new Chinese models will come out. 

71:45

As we move from selling tokens where they  provide the entire reasoning chain, to  

71:53

selling automated white-collar work—an automated  software engineer, you send them the request,  

71:59

they give you the result back, and there's a bunch  of thinking on the back end that they don't show  

72:02

you—the ability to distill out of American  models into Chinese models will be harder. 

72:05

Second, look at the scale of  the compute the labs have. 

72:10

OpenAI exited last year with roughly two gigawatts. 

72:13

Anthropic will get to  two-plus gigawatts this year. 

72:17

By the end of next year, they'll  both be at ten gigawatts of capacity. 

72:21

China is not scaling their AI  lab compute nearly as fast. 

72:25

At some point, when you can't distill the  learnings from these labs into the Chinese  

72:30

models, plus with this compute race that OpenAI,  Anthropic, Google, and Meta are all racing on,  

72:37

they end up getting to a point where the model  performance should start to diverge more. 

72:44

Then look at all this CapEx  being spent on data centers. 

72:49

Amazon is spending $200  billion, Google $180 billion. 

72:53

All these companies are spending  hundreds of billions of dollars on CapEx. 

72:57

There's nearly a trillion dollars  of CapEx being invested in data  

73:02

centers in America this year, roughly. What's the return on invested capital here? 

73:08

You and I would think the return on invested  capital for data center CapEx is very high. 

73:14

If we look at Anthropic's revenues,  in January they added $4 billion. 

73:18

In February, which was a shorter  month, they added $6 billion. 

73:21

We'll see what they can do in March and April,  

73:24

given that compute constraints are  what's bottlenecking their growth. 

73:27

The reliability of Claude is quite low  because they're so compute constrained. 

73:31

But if this continues, then the ROIC  on these data centers is super high. 

73:36

At some point, the US economy starts growing  faster and faster over this year and next year  

73:42

because of all this CapEx, all the revenue these  models are generating, and the downstream supply  

73:47

chain. China doesn't have that yet. They  have not built the scale of infrastructure  

73:54

to invest in models, get to the capabilities,  and then deploy these models at such scale. 

74:00

When you look at Anthropic,  they're at $20 billion ARR. 

74:05

The margins are sub-50 percent, at least  as last reported by The Information. 

74:09

So that's $13 or $14 billion of compute that it's  running on rental cost-wise, which is actually $50  

74:16

billion worth of CapEx that someone laid out  for Anthropic to generate their current revenue. 
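A back-of-envelope version of that arithmetic, using the figures cited here; the gross margin is an assumption chosen so the rental cost matches the quoted $13-14 billion, not a reported number:

```python
# Rough sketch of the Anthropic compute arithmetic from the conversation.
arr = 20e9                  # ~$20B annualized revenue
gross_margin = 0.33         # "sub-50 percent" (assumed value to match the quoted rental cost)
compute_rental = arr * (1 - gross_margin)
print(f"Implied annual compute rental: ~${compute_rental / 1e9:.0f}B")   # ~$13B

# Someone had to lay out CapEx to build what Anthropic rents; the quoted figure is ~$50B.
capex_behind_fleet = 50e9
print(f"CapEx behind the rented fleet: ~{capex_behind_fleet / compute_rental:.1f}x annual rent")
```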

74:22

China has just not done this. If and when Anthropic 10Xs revenue again—and  

74:28

I think our answer would be when, not if—China  doesn't have the compute to deploy at that scale. 

74:34

So there is some sense that  we're in a fast takeoff. 

74:39

It's not like we're talking  about a Dyson sphere by X date,  

74:42

it's more like the revenue is compounding at  such a rate that it does affect economic growth. 

74:47

The resources these labs are  gathering are growing so fast. 

74:51

China hasn't done that yet, so in that case,  the US and the West are actually diverging. 

74:56

The flip side is that these infrastructure  investments have middling returns. 

75:01

Maybe they're not as good as hoped. Maybe Google is wrong for wanting  

75:05

to take free cash flow to zero and  spend $300 billion on CapEx next year. 

75:09

Maybe they’re just wrong and people on  Wall Street who are bearish and people  

75:13

who don't understand AI are correct. In that case, the US is building all  

75:19

this capacity but doesn't get great returns. Meanwhile, China is able to build a fully  

75:23

vertical, indigenized supply chain, instead of  the US/Japan/Korea/Taiwan/SE Asia/Europe countries  

75:33

together building this less vertical supply chain. In a sense, at some point China is able to scale  

75:40

past us if AI takes longer to get to certain  capability levels than the vast majority of  

75:47

your guests on this podcast believe. It's fast timelines, the US wins;  

75:50

long timelines, China wins. Yeah but I don't know what fast timelines means. 

75:54

I don't think you have to believe in AGI  to have the timelines where the US wins. 

76:01

Let's go back to memory. I think people on  Wall Street and people in the industry are  

76:06

understanding how big this is, but maybe generally  people don't understand what a big deal it is. 

76:10

So we've got this memory crunch,  as you were talking about. 

76:12

And earlier I was asking about,  oh, could we solve for the EUV  

76:16

tool shortage by going back to seven nanometers? So let me ask a similar question about memory. 

76:21

HBM is made of DRAM, but has three  to four times fewer bits per wafer  

76:26

area than the DRAM it's made out of. Is it possible that accelerators in the  

76:30

future could just use commodity  DRAM and not HBM, so we can get  

76:35

much more capacity out of the DRAM we have? The reason I think this might be possible is,  

76:43

if we're going to have agents that are  just going off and doing work, and it's  

76:48

not a synchronous chatbot application, then you  don't necessarily need extremely fast latency. 

76:57

Maybe you can have lower bandwidth,  because the reason you stack DRAM into  

77:04

HBM is for higher bandwidth. Is it possible to go to non-HBM  

77:09

accelerators and basically have the opposite  of Claude Code Fast, like have Claude Slow? 

77:17

At the end of the day, the incremental  purchaser who's willing to pay the highest  

77:20

price for tokens also ends up being  the one that's less price-sensitive. 

77:26

Compute should be allocated, in a capitalistic  society, towards the goods that have the  

77:31

highest value, and the private market  determines this by willingness to pay. 

77:35

To some extent, Anthropic could  actually release a slow mode. 

77:39

They could release Claude Slow Mode and increase  tokens per dollar by a significant amount. 

77:46

They could probably reduce the price of Opus 4.6  by 4-5x and reduce the speed by maybe just 2x. 

77:54

The curve on inference throughput versus  speed is already there just on HBM. 

77:59

And yet they don't, because no one  actually wants to use a slow model. 

78:04

Furthermore, on these agentic tasks, it's great  that the model can run at a time horizon of hours. 

78:11

But if the model was running slower,  those hours would become a day. 

78:16

Vice versa, if the model is running  faster, those hours become an hour. 

78:21

No one really wants to move to a day-long wait  period, because the highest-value tasks also have  

78:27

some time sensitivity to them. I struggle to see… Yes,  

78:34

you could use regular DRAM. There are a couple of challenges with this. 

78:44

One of the core constraints of chips is  that a chip is a certain size, and all  

78:52

of the I/O escapes on the edges. Often, the left and right of the  

78:58

chip are HBM—so the I/O from the chip  to the HBM is on the sides—and then the  

79:02

top and bottom are I/O to other chips. If you were to change from HBM to DDR,  

79:11

all of a sudden this I/O on the edge  would have significantly less bandwidth,  

79:17

but significantly more capacity per chip. But the metric you actually care about  

79:28

is bandwidth per wafer, not bits per wafer. Because the thing that is constraining the FLOPS  

79:34

is just getting the next matrix in and out, and for that you just need more bandwidth. 

79:39

Yeah, getting out the weights and  getting in and out the KV cache. 

79:44

In many cases, these GPUs are not  running at full memory capacity. 

79:47

It's obviously a system design thing:  model, hardware, and software co-design. 

79:52

You have to figure out how much KV cache  you need, how much you keep on the chip,  

79:55

how much you offload to other chips and  call when you need it for tool calling,  

80:00

and how many chips you parallelize this on. Obviously, the search space for this is very  

80:05

broad, which is why we have InferenceX,  an open-source model that searches all  

80:09

the optimal points on inference for a  variety of different chips and models. 

80:16

The point is, you're not always  necessarily constrained by memory capacity. 

80:22

You can be constrained by FLOPS, network  bandwidth, memory bandwidth, or memory capacity. 

80:30

If you really simplify it down,  there are four constraints,  

80:33

and each of these can break out into more. If you switch to DDR, yes, you produce  

80:39

four times the bits per DRAM wafer, but all of  a sudden the constraints shift a lot and your  

80:44

system design shifts. You go slower.  Is the market smaller? Maybe. But also,  

80:50

all these FLOPS are wasted because they're  just sitting there waiting for memory. 

80:53

You don't need all that capacity because you can't  really increase batch size because then the KV  

80:58

cache would take even longer to read. Makes sense. What is the bandwidth  

81:04

difference between HBM and normal DRAM? An HBM4 stack—let's talk about the stuff  

81:11

that's in Rubin, because that's what we've been  indexing on—is 2048 bits across, connected in an  

81:16

area that's 13 millimeters wide. It transfers memory at around  

81:22

10 giga-transfers a second. So a stack of HBM4 is 2048 bits on  

81:27

an area that's roughly 11 to 13 millimeters wide. That's the shoreline you're taking on the chip. 

81:33

In that shoreline, you have 2048 bits  transferring at 10 giga-transfers per second. 

81:39

You multiply those together and divide by eight,  

81:41

bits to a byte, and you're at roughly  2.5 terabytes a second per HBM stack. 

81:46

When you look at DDR, in that same  area, it's maybe 64 or 128 bits wide. 

81:53

That DDR5 is transferring at anywhere from 6.4 to maybe 8 giga-transfers a second. 

82:01

So your bandwidth is significantly lower. It's 64 bits times 8 giga-transfers divided by  

82:07

eight, which puts you at 64 gigabytes a second. Even if you take a generous interpretation of  

82:14

128 times 8 giga-transfers, you're at 128  gigabytes a second for the same shoreline,  

82:18

versus 2.5 terabytes a second. There's an order of magnitude  

82:21

difference in bandwidth per edge area. If your chip is a square, or 26 by 33  

82:27

millimeters—which is the maximum size for an  individual die—you only have so much edge area. 
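The shoreline arithmetic spelled out, using the round interface widths and transfer rates just quoted rather than datasheet values:

```python
# Bandwidth per ~13 mm of die edge: bus width (bits) x transfer rate (GT/s) / 8 bits per byte.
def bandwidth_GB_s(bus_width_bits, giga_transfers_per_s):
    return bus_width_bits * giga_transfers_per_s / 8

hbm4_stack = bandwidth_GB_s(2048, 10)   # one HBM4 stack: 2048 bits at ~10 GT/s
ddr5_64 = bandwidth_GB_s(64, 8)         # DDR5 in the same shoreline: 64 bits at ~8 GT/s
ddr5_128 = bandwidth_GB_s(128, 8)       # the "generous interpretation": 128 bits wide

print(f"HBM4 stack:     ~{hbm4_stack / 1000:.1f} TB/s")  # ~2.5 TB/s
print(f"DDR5 (64-bit):  ~{ddr5_64:.0f} GB/s")            # ~64 GB/s
print(f"DDR5 (128-bit): ~{ddr5_128:.0f} GB/s")           # ~128 GB/s
print(f"Gap: ~{hbm4_stack / ddr5_128:.0f}x per unit of die edge")
```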

82:32

On the inside of that chip,  you put all your compute. 

82:34

There are things you can do to try and  change that, like more SRAM or more caching. 

82:38

But at the end of the day, you're  very constrained by bandwidth. 

82:42

Then there's the question of where you can  destroy demand to free up enough for AI. 

82:48

I guess the picture is especially bad because,  as you're saying, if it takes four times more  

82:52

wafer area to get the same byte, for HBM you have  to destroy four times as much consumer demand for  

82:58

laptops and phones to free up one byte for AI. What does this imply for the next year or two? 

83:08

Sorry for the run-on question, in your newsletter  you said 30% of Big Tech's CapEx in 2026 is going  

83:14

towards memory? Yes. 

83:16

That's insane, right? Of the $600 billion  or whatever, 30% is going just to memory. 

83:23

Yes. Obviously, there's some level  of margin stacking that Nvidia does,  

83:26

so you have to separate that out and apply  their margin to the memory and the logic. 

83:30

But at the end of the day, a third  of their CapEx is going to memory. 

83:33

That's crazy. What should we expect over the  next year or two as this memory crunch hits? 

83:41

The memory crunch will continue to get  harder, and prices will continue to go up. 

83:48

This affects different parts  of the market differently. 

83:52

Are people going to hate AI more and more? Yes, because smartphones and PCs are not  

83:56

going to get incrementally better year on year. In fact, they're going to get incrementally worse. 

84:00

If you look at the bill of materials for an  iPhone, what fraction of it is the memory? 

84:04

How much more expensive does an iPhone get  if the memory is two times more expensive? 

84:09

I believe an iPhone has 12 gigabytes of memory. Each gig used to cost roughly $3-4, so that's $50. 

84:17

But now the price of memory has tripled. Let's say it's $12 per gig for DDR. 

84:23

Now you're talking about $150 versus $50. That's a $100 increase in cost for Apple. 

84:30

Apple has some margin, they're  not just going to eat the margin. 

84:32

NAND also has the same market dynamics. So that's roughly a $100 cost increase just on the DRAM, and with NAND it's probably more like a $150 increase on the iPhone. 

84:41

Apple either has to pass that  on to the consumer or eat it. 

84:46

I don't see Apple reducing their margin  too much, maybe they eat a little bit. 

84:49

But at the end of the day, that means the end  consumer is paying $250 more for an iPhone. 

84:54

Now that’s just on last  year’s pricing versus today’s. 
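A back-of-envelope version of that bill-of-materials math; the NAND increment and the retail pass-through multiplier are illustrative assumptions, not Apple figures:

```python
# iPhone memory-cost swing, using the round numbers from the conversation.
dram_gb = 12
old_price_per_gb = 4        # "roughly $3-4" per GB
new_price_per_gb = 12       # after DRAM prices roughly triple

dram_delta = dram_gb * (new_price_per_gb - old_price_per_gb)   # ~$96, call it ~$100
nand_delta = 50             # assumption: NAND adds roughly the rest of the ~$150 hit
bom_delta = dram_delta + nand_delta                            # ~$150 BOM increase

retail_markup = 1.7         # assumed pass-through multiplier if Apple protects its margin
print(f"BOM increase: ~${bom_delta}, retail increase: ~${bom_delta * retail_markup:.0f}")  # ~$250
```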

84:59

There is some lag before Apple feels the heat  because they tend to have long-term contracts  

85:06

for memory that last three months to a year. But at the end of the day, Apple gets hit  

85:09

pretty hard by this. They won't really  

85:13

adjust until the next iPhone release. But that's the high end of the market,  

85:17

which is only a few hundred million phones a year. Apple sells two or three hundred million  

85:20

phones annually. The bulk of the market is mid-range and low-end. 

85:25

It used to be that 1.4 billion  smartphones were sold a year. 

85:28

Now we're at about 1.1 billion. Our projections are that we might  

85:31

drop to 800 million this year, and  down to 500 or 600 million next year. 

85:37

We look at data points out of China  from some of our analysts in Asia,  

85:42

Singapore, Hong Kong, and Taiwan. They've been tracking this,  

85:45

and they see Xiaomi and Oppo cutting low-end  and mid-range smartphone volumes by half. 

85:52

Yes, it’s only a $150 BOM increase on a $1,000  iPhone where Apple has some larger margin. 

86:02

But for smaller phones, the percentage of the BOM  that goes to memory and storage is much larger. 

86:08

And the margins are lower, so there's  less capacity to even eat the margins. 

86:13

And they have also generally tended not  to do long-term agreements on memory. 

86:20

Why this is a big deal is that if smartphone  volumes halve, that drop will happen in  

86:26

the low and mid-range, not the high end. So it’s not like the bits released are halving. 

86:32

Currently, consumer devices account  for more than half of memory demand. 

86:35

Even if you halve smartphone volumes,  because of the shape of the halving,  

86:38

the low end gets cut by more than half, while  the high end gets cut by less than half,  

86:42

because you and I will still buy the high-end  phones that cost north of a thousand dollars. 

86:46

We'll buy them even if they get  a little bit more expensive. 

86:48

And Apple's volumes will not go down as  much as a low-end smartphone provider.  

86:52

The same applies to PCs. What this  does to the market is quite drastic. 

86:59

DRAM gets released and goes to AI chips, whose makers are willing to do longer-term contracts and pay higher  

87:06

margins, because at the end of the day the margin  they extract from the end user is much larger. 

87:14

This probably leads to people hating AI even more. Today, you already see all the memes on PC  

87:22

subreddits and gaming PC Twitter. It's cat dancing videos saying,  

87:27

"This is why memory prices have doubled and  you can't get a new gaming GPU or desktop." 

87:33

It's going to be even worse when memory  prices double again, especially DRAM. 

87:37

Another interesting dynamic is that  it's not just DRAM, it's also NAND. 

87:42

NAND is also going up in price. Both of these markets have expanded capacity very  

87:46

slowly over the last few years, NAND almost zero. The percentage of NAND that goes to phones and  

87:54

PCs is larger than the percentage  of DRAM that goes to phones and PCs. 

87:58

As you destroy demand, mostly for  DRAM purposes, you unlock more NAND  

88:03

that gets allocated and can go to other markets. The price increases of DRAM will be larger than  

88:09

those of NAND, because you've released more NAND from the consumer side, and in effect,  

88:13

freed up more memory for AI. Sorry, maybe you just explained  

88:18

it and I missed it. Is it because SSDs are  

88:21

being used in large quantities for data centers? They are, but not in as large quantities as DRAM. 

88:27

Okay, so they will also increase because  they'll be using some quantity, but there's  

88:32

not as much of a need as there is for HBM. Makes  sense. One thing I didn't appreciate until I was  

88:37

reading some of your newsletters is that the  same constraints preventing logic scaling over  

88:43

the next few years are quite similar to what's  preventing us from producing more memory wafers. 

88:49

In fact, literally the same exact machine,  this EUV tool, is needed for memory. 

88:55

So I guess the question someone could ask right  now is, why can't we just make more memory? 

89:05

The constraints, as I was mentioning earlier,  are not necessarily EUV tools today or next year. 

89:11

They become that as we get to  the latter part of the decade. 

89:15

Currently, the constraints are more that  they physically just haven't built fabs. 

89:20

Over the last three to four years,  these vendors have not built new fabs  

89:25

because memory prices were really low. Their margins were low, and in fact,  

89:29

they were losing money in 2023 on memory. So they decided they weren't building new fabs. 

89:34

The market slowly recovered over time but  never really got amazing until last year. 

89:40

In 2024, we were banging on the drums  that reasoning means long context,  

89:44

which means a large KV cache, which  means you need a lot of memory demand. 

89:48

We've been talking about that  for a year and a half, two years. 

89:51

People who understand AI went  really long on memory then. 

89:57

So you’ve seen that dynamic, but now  it has finally played out in pricing. 

90:01

It took so long for what was  obvious: long context means the KV  

90:05

cache gets bigger, you need more memory. Half the cost of accelerators is memory. 

90:09

Of course they're going to  start going crazy on it. 

90:13

It took a year for that to  actually reflect in memory prices. 

90:16

Once memory prices reflected that, it  took another three to six months for the  

90:20

memory vendors to start building fabs. Those fabs take two years to build. 

90:24

So we won't have really meaningful fabs to even  put these tools in until late 2027 or 2028. 

90:34

Instead, you've seen some really  crazy stuff to get capacity. 

90:39

Micron bought a fab from a company in  Taiwan that makes lagging-edge chips. 

90:47

Hynix and Samsung are doing some pretty  crazy things to try and expand capacity  

90:51

at their existing fabs, which also have  large knock-on effects in the economy. 

90:56

So why can't we build more capacity? There's nowhere to put the tools. 

91:02

It's not just EUV; there are other  tools involved in DRAM and logic. 

91:06

In logic, for N3, about 28%  

91:11

of the cost of the final wafer is EUV. When you look at DRAM, it's in the teens. 

91:19

It's going up, but it's a much  smaller percentage of the cost. 

91:24

These other tools are also bottlenecks, although  their supply chains are not as complex as ASML's. 

91:30

You see Applied Materials, Lam  Research, and all these other  

91:32

companies expanding capacity a lot as well. But you don't have anywhere to put the tool,  

91:37

because the most complex buildings people make  are fabs, and fabs take two years to build. 

92:40

I interviewed Elon recently, and his whole plan  is that they're going to build this TeraFab  

92:47

and they're going to build the clean rooms. I won't even ask you about the dirty rooms thing,  

92:53

but let's say they build the clean rooms. I have a couple of questions. 

92:58

One, do you think this is the kind of  thing that Elon Co. could build much  

93:04

faster than people conventionally build it? This is not about building the end tools. 

93:07

This is just about building the facility itself. How complicated is it to just build  

93:11

the clean room extremely fast? Is this something that Elon, with his "move  

93:15

fast" approach, could do much faster if that's  what we're bottlenecked on this year or next year? 

93:19

Two, does that even matter if, in two years,  your view is that we're not bottlenecked on  

93:24

clean room space, but on the tooling? As with any complex supply chain,  

93:29

it takes time, and constraints shift over time. Even if something is no longer a constraint, that  

93:33

doesn't mean that market no longer has margin. For example, energy will not be a big bottleneck  

93:40

a couple of years from now, but that  doesn't mean energy isn't growing super  

93:43

fast and there's no margin there. It's just not the key bottleneck. 

93:47

In the space of fabs, clean rooms are the  biggest bottleneck this year and next year. 

93:52

As we get to 2028, 2029, 2030, there  will still be constraints there. 

93:57

The thing about Elon is he has a tremendous  capability to garner physical resources and  

94:04

really smart people to build things. The way he recruits amazing people  

94:08

is by trying to build the craziest stuff. In the case of AI, that hasn't really worked  

94:12

because everyone's trying to build AGI. Everyone  is very ambitious. But in the case of going to  

94:17

Mars, making rockets that land themselves, fully  autonomous electric cars, or humanoid robots,  

94:25

these are methods of recruiting the people who  think that's the most important problem in the  

94:28

world to work on that problem, because  he's the only one trying really hard. 

94:31

In the case of semiconductors, he stated he wants  to make a fab that's a million wafers per month. 

94:35

No one has a fab that big. It's possible that he's able to recruit a  

94:41

lot of really awesome people and get them on this  crazy task of building a million wafers a month. 

94:47

Step one is to build the clean room,  and that I think he probably can do. 

94:53

His mindset around deleting things, that it  can be dirty, it's fine, is probably not right. 

94:58

Actually I think it’s 100% not right. You need the fab to be very clean. 

95:06

All of the air in the fab gets replaced  every three seconds, it’s that fast. 

95:11

There have to be so few particles. But I think he can build the clean room. 

95:14

It'll take a year or two. Initially, it won't be super fast,  

95:17

but over time, he'll get faster at it. The really complex part is actually developing  

95:21

a process technology and building wafers. I don't think he can develop that quickly. 

95:26

That has a lot of built-up knowledge. The most complicated integration of  

95:32

very expensive tools and supply chains  is done by TSMC, Intel, or Samsung. 

95:39

The latter two aren't even that great at it, and the integration is tremendously complex. 

95:43

How surprised would you be if in 2030  there just happened to be some total  

95:48

disruption where we're not using EUV? What if we're using something that has  

95:52

much better effects, is much simpler to produce,  and can be produced in much bigger quantities? 

95:56

I'm sure as an industry insider that  sounds like a totally naive question,  

95:58

but do you see what I'm asking? What probability should we put on  

96:03

something coming totally out of left  field to make all of this irrelevant? 

96:07

Something that's very simple and easy to  scale, I assign a very, very low probability. 

96:12

There are a number of companies  working on effectively particle  

96:16

accelerators or synchrotrons that generate  light that's either 13.5 nanometer, like EUV,  

96:21

or an even narrower wavelength, like X-ray at  7 nanometers, to then use in lithography tools. 

96:29

But those things are massive particle  accelerators generating this light. 

96:32

It's a very complicated thing to build. There are a couple of companies and I think  

96:35

that could be a big disruption  to the industry beyond EUV. 

96:38

But I don't think we're going to  magically build something new that  

96:43

is direct write and super simple, and can  be manufactured at huge volumes, although  

96:49

there are some attempts to do things like this. I ask because if you think about Elon's companies  

96:54

in the past, rocketry was this thing that was  thought to be—and is—incredibly complicated. 

96:59

Look, I'm just a naive yapper compared to Elon.  What have I built? So maybe it's possible. 

97:05

In order to build more memory in the  future, could we build 3D DRAM the way  

97:12

we do 3D NAND and then go back to DUV? That is the hope currently. Everyone's  

97:17

roadmap for 3D DRAM is that you'll still use EUV  because you want to have that tighter overlay. 

97:24

When you're doing these subsequent processing  steps, everything is vertically stacked and you  

97:28

have more layers on top of each other. You want the pitches to be tighter. 

97:33

So generally, people are still  trying to do it with EUV. 

97:35

But what 3D would do is change the calculation  of how many bits a single EUV pass can make. 

97:42

That number would go up drastically if you  go to 3D DRAM. That is the hope. Right now,  

97:47

everyone's roadmap goes from the current 6F² cell, to a 4F² cell, and then finally 3D DRAM by the end  

97:56

of the decade or early next decade. There's still a lot of R&D,  

98:00

manufacturing, and integration to be done. I wouldn't call that out of the cards. 

98:04

I think it's very likely going to happen. It's also going to require a huge  

98:08

retooling of fabs. The breakdown of  

98:11

tools in a fab will be very different. The lithography tool is actually the  

98:14

only thing that isn't that different. But the number of them relative to different  

98:18

types of chemical vapor deposition, atomic layer  deposition, dry etch, or different kinds of etch  

98:25

chambers with different chemistries… You have all  these different tools for different process nodes. 

98:31

You can't just convert a logic fab to a  DRAM fab, or vice versa, or a NAND fab  

98:35

to a DRAM fab, in a short amount of time. In the same way, existing DRAM fabs require a  

98:41

lot of retooling just to go from 1-alpha to 1-beta  to 1-gamma process nodes, because they have to  

98:46

add EUV and change the deposition and etch chemistry stacks for when you're using EUV. 

98:51

And the EUV tool has to be there. Furthermore, when you change to 3D DRAM,  

98:55

there's going to be an even larger shift, so a  lot of retooling of these fabs needs to happen. 

99:01

That would be a big disruption. That would make EUV demand generally lower. 

99:06

But as we've seen across time, lithography demand  as a percentage of wafer cost has trended up. 

99:12

Around the 2014 era, it was 17% of the wafer cost,  and it's gone to 30% over the last fifteen years. 

99:24

For DRAM, it was in the low to mid-teens,  and now it's trended toward the high teens. 

99:30

Before we get to 3D DRAM, it'll  likely cross into the 20% range. 

99:33

But then, if we get to 3D DRAM, EUV as a percentage of the total end wafer cost tanks again. 

99:39

I guess you care less about the percent of cost  and more about how much it bottlenecks production. 

99:43

Right, but the percentage of cost— It’s a proxy, yeah. If you're Jensen  

99:50

or Sam Altman, or whoever stands to  gain a lot from scaling up AI compute,  

99:56

there are these stories that they'd go to  TSMC and say, "Why can't we access Y and Z?" 

100:01

But I think the point you're  making is that it doesn't really  

100:06

matter what TSMC does in some sense. In fact, even if you have Intel and  

100:09

Samsung building more foundries, in the  long run, you're going to be bottlenecked  

100:13

by ASML and other tool and material makers. First, is that a correct interpretation? 

100:18

Second, should Silicon Valley people be  going to the Netherlands right now to try  

100:23

to pitch ASML to make more tools so that  in 2030 they can have more AI compute? 

100:30

It's a funny dynamic we saw  in 2023, 2024, and 2025. 

100:35

People who saw the energy bottleneck  before others asymmetrically went to  

100:40

Siemens, Mitsubishi, and of course GE  Vernova, and bought up turbine capacity. 

100:45

Now they're able to charge a premium for deploying  

100:47

these turbines, because energy is so scarce. In the same sense, this could be done for EUV,  

100:52

except ASML is not just going to trust any  random bozo who wants to buy EUV tools. 

101:00

These turbines are much cheaper than EUV  tools, and there's many more of them produced. 

101:04

Especially once you get to industrial gas  turbines, not just combined-cycle but the cheaper,  

101:10

smaller, less efficient ones, people put  down deposits for these. Someone could  

101:15

do this. Someone should go to the Netherlands  and be like, "I'll pay you a billion dollars. 

101:21

You give me the right to purchase ten EUV tools  two years from now, and I'm first in line." 

101:30

Then over those two years, you go around  and wait for everyone to realize, "Oh crap,  

101:34

I don't have enough EUV tools," and you  try to sell your option at some premium. 

101:38

All you're effectively doing  is saying, "ASML, you're dumb. 

101:41

You weren't making enough margin on these. I'm going to make a margin." 

101:44

The question is, will ASML even  agree to this? I don't think so. 

101:49

There's a world where they at least get the  demand signal from that to increase production. 

101:53

Potentially. I agree. But it sounds like you're  

101:56

saying they couldn't even increase production  if they wanted to, given the supply chain. 

101:59

Right. But that's exactly the market in  which… If they can't increase production,  

102:02

just like TSMC cannot increase production  that fast, and yet demand is mooning,  

102:06

then the obvious solution is to arbitrage this. You and I know demand is way higher than they're  

102:12

projecting and their capability to build. You arbitrage this by locking up the capacity,  

102:17

doing a forward contract, and then  trying to sell it at a later date  

102:21

once other people realize everything is  fucked and we don't have enough capacity. 

102:26

Then you'll have this insane margin that  ASML and TSMC should have been charging. 

102:30

But the thing is, I don't know if  ASML and TSMC will ever agree to this. 

102:34

Let me ask you about power now. It sounds like you think power  

102:37

can be arbitrarily scaled. Not arbitrarily, but yes. 

102:41

But beyond these numbers. If I'm  remembering correctly, your blog post on  

102:47

how AI labs are increasing power implied that  GE Vernova, Mitsubishi, and Siemens could  

102:54

produce 60 gigawatts a year in gas turbines. Then there are these other sources,  

102:59

but they're less significant than the turbines. Only a fraction of that goes to AI, I assume. 

103:10

If in 2030 we have enough logic and memory to  do 200 gigawatts a year, do you just think that  

103:15

these things are on a path to ramp up to more  than 200 gigawatts a year, or what do you see? 

103:20

Right now we're at 20 or 30. This is critical IT capacity, by the way,  

103:26

which is an important thing to mention. When I'm talking about these gigawatts,  

103:29

I'm talking about critical IT capacity. Server plugged in, that's how much power it pulls. 

103:32

But there are losses along the chain. There is loss on transmission,  

103:37

conversion, cooling, et cetera. So you should gross this  

103:43

up from 20 gigawatts for this year, or 200  gigawatts by the end of the decade, to some  

103:49

number 20-30% higher. Then you have capacity  factors. Turbines don't run at 100 percent. 

103:54

If you look at PJM, which I think is the largest  grid in America—covering the Midwest and some of  

103:59

the Northeast area—in their models they want  to have roughly 20 percent excess capacity. 

104:12

Within that 20 percent excess capacity,  they're running all the turbines at 90%  

104:16

because they are derated some for  reliability, maintenance, and so on. 
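A rough sketch of how a critical IT load grosses up to nameplate generation under the factors just described; the exact overhead, derating, and reserve margin vary by site and grid:

```python
# Critical IT watts -> nameplate generation, round factors from the conversation.
critical_it_gw = 20        # the servers' plug load, the figure quoted for this year

overhead = 1.25            # +20-30% for transmission, conversion, and cooling losses
facility_gw = critical_it_gw * overhead

derate = 0.90              # turbines held at ~90% for reliability and maintenance
reserve_margin = 1.20      # PJM-style ~20% excess capacity

nameplate_gw = facility_gw / derate * reserve_margin
print(f"~{critical_it_gw} GW critical IT implies ~{nameplate_gw:.0f} GW of nameplate generation")
```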

104:22

In reality, the nameplate capacity for energy is  always way higher than the actual end critical IT  

104:26

capacity because of all these factors. But it's  not just turbines. If you were just making power  

104:32

from turbines, that's simple, boring, and easy. Humans and capitalism are far more effective. 

104:41

The whole point of that blog was that, yes, there  are only three people making combined-cycle gas  

104:45

turbines, but there's so much more we can  do. We can do aeroderivatives. We can take  

104:49

airplane engines and turn them into turbines. There are even new entrants in the market,  

104:55

like Boom Supersonic trying to  do that and working with Crusoe. 

104:58

Also, there are all the other options like this that already exist in the market. 

105:00

There are also medium-speed  reciprocating engines: engines  

105:04

that spin in circles, like a diesel engine. There are ten people who make engines that way. 

105:13

I'm from Georgia, and people used  to be like, "Oh man, you got a  

105:15

Cummins engine in there," regarding RAM trucks. Automobile manufacturing is going down, so these  

105:22

companies all have capacity and could scale  and convert that for data center power. 

105:26

You stick all these reciprocating engines in. It's not as clean as combined-cycle, but maybe you  

105:31

can convert them from diesel to gas if you want.  What about ship engines? All of these engines for  

105:38

massive cargo ships are great. Nebius is doing that for  

105:41

a Microsoft data center in New Jersey. They're running ship engines to generate power. 

105:49

Bloom Energy is doing fuel cells. We've been very positive on them for  

105:52

a year and a half now because they have such  a capability to increase their production. 

105:57

Their payback period for a production  increase is very fast, even if the cost  

106:01

is a little bit higher than combined-cycle,  which is the best for cost and efficiency. 

106:06

Then there's solar plus battery, which can come  online as those cost curves continue to come down. 

106:11

There's wind, where you might only expect 15  percent of the maximum power because things  

106:18

oscillate, but you add batteries. There are  all these things. The other thing is that the  

106:23

grid is scaled so we don't cut off power at  peak usage on the hottest day of the summer. 

106:32

But in reality, that's a load spike  that is 10-20% higher than the average. 

106:37

If you just put enough utility-scale  batteries, or peaker plants that only  

106:41

run a small portion of the year—and those could  be gas, industrial gas turbines, combined-cycle,  

106:49

batteries, or any of the other sources  I mentioned—then all of a sudden you've  

106:54

unlocked 20% of the US grid for data centers. Most of the time that capacity is sitting idle. 

107:00

It's really only there for that peak, which is  just a few hours over a few days of the year. 
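In round numbers, the unlock being described:

```python
# Grid peak-headroom unlock, round numbers from this discussion.
us_grid_gw = 1000          # the US grid is terawatt-scale
peak_headroom = 0.20       # capacity held for a peak ~10-20% above average load

print(f"Covering the peak with batteries/peakers frees ~{us_grid_gw * peak_headroom:.0f} GW")
```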

107:07

If you have enough capacity  to absorb that peak load,  

107:11

then all of a sudden you've unlocked all of that capacity. Today, data centers are only 3-4% of the power of  

107:15

the US grid, and by 2028 they'll be 10%. But if you can unlock 20% of the US grid  

107:20

like this, it's not that crazy. The US grid is terawatt-level,  

107:25

not hundreds-of-gigawatts-level. So we can add a lot more energy. I'm not saying  

107:33

it's easy. These things are going to be hard. There's a lot of hard engineering,  

107:36

risks people have to take, and new  technologies people have to use. 

107:40

But Elon was the first to do this behind-the-meter  gas, and since then we've seen an explosion of  

107:45

different things people are doing to get power. They're not easy,  

107:50

but people are gonna be able to do them. The supply chains are just way simpler than chips. 

107:56

Interesting. He made the point during the  interview that for the specific blade for  

108:00

the specific turbine he was looking at, the lead  times go out beyond 2030. Your point is that— 

108:06

That's great. There are so many other ways to  make energy. Just be inefficient. It's fine. 

108:10

Right now, combined-cycle gas turbines  have CapEx of $1,500 per kilowatt. 

108:17

Are you saying it would make sense to use technologies that are much  

108:20

more expensive than that, or that other things are getting cheap enough to be competitive? 

108:24

Exactly. It can be as high as $3,500 per kilowatt. It could be twice as much as the cost of  

108:31

combined-cycle, and the total cost of the GPU on  a TCO basis has only gone up a few cents per hour. 

108:40

Because we've been talking about Hopper pricing,  $1.40, let's say the power price doubles. 

108:46

The Hopper that was $1.40 is now $1.50 in cost. I don't care, because the models are improving so  

108:54

fast that the marginal utility of them is worth  way more than that ten-cent increase in energy. 
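Spelled out; the per-GPU power draw and baseline electricity price are illustrative assumptions picked to match the ~$0.10/hr energy component implied by the $1.40 to $1.50 example:

```python
# Why doubling the power price barely moves the hourly GPU cost.
hopper_cost_per_hr = 1.40
gpu_power_kw = 1.0              # assumed all-in draw per GPU including cooling overhead
power_price_per_kwh = 0.10      # assumed baseline electricity price

energy_per_hr = gpu_power_kw * power_price_per_kwh          # ~$0.10/hr
with_doubled_power = hopper_cost_per_hr + energy_per_hr     # the extra ~$0.10 if power doubles
print(f"${hopper_cost_per_hr:.2f}/hr becomes ~${with_doubled_power:.2f}/hr")  # ~$1.50
```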

109:00

So you're saying 20 percent of the grid—the grid  is about one terawatt—can just come online from  

109:06

utility-scale batteries, increasing what  you'd be comfortable putting on the grid. 

109:11

The regulatory mechanism  there is not easy, by the way. 

109:13

But that's 200 gigawatts, if  that hypothetically happens. 

109:18

Just from the different sources of gas generation  you mentioned—the different kinds of engines  

109:22

and turbines—combined, how many gigawatts  could they unlock by the end of the decade? 

109:28

We're tracking this in our data. There are over 16 different manufacturers  

109:33

of power-generating things just from gas alone. Yes, there are only three turbine manufacturers  

109:39

for combined-cycle, but we're  tracking 16 different vendors,  

109:43

and we have all of their orders. It turns out there are hundreds of  

109:47

gigawatts of orders to various data centers. As we get to the end of the decade,  

109:51

we think something like half of the capacity  that's being added will be behind the meter. 

109:59

Behind the meter is almost always more expensive  than grid-connected, but there are just a lot of  

110:03

problems with getting grid-connected: permits and  interconnection queues and all this sort of stuff. 

110:08

So even though it's more expensive,  people are doing behind the meter. 

110:12

What they're doing behind the meter ranges widely. It could be reciprocating engines, ship engines,  

110:17

or aeroderivatives. It could be combined-cycle,  

110:19

although combined-cycle is not  that great for behind the meter. 

110:22

It could be Bloom Energy fuel  cells, or solar plus battery. 

110:26

It could be any of these things. And you're saying any of these  

110:29

individually could do tens of gigawatts? Any of these individually will do tens of  

110:34

gigawatts, and as a whole, they  will do hundreds of gigawatts. 

110:36

Okay. So that alone should more than— Electrician wages will probably  

110:42

double or triple again. There are going to be a lot of new people entering  

110:45

that field, and a ton of people who make money,  but I don't see that as the main bottleneck. 

110:51

Right now in Abilene, at the 1.2-gigawatt data  center that Crusoe is building for OpenAI,  

110:59

I think they have 5,000 people  working there, or at peak they did. 

111:04

If you turn that into 100 gigawatts—and  I'm sure things will get more efficient  

111:10

over time—that would be 400,000 people  it would take to build 100 gigawatts. 
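That scaling, spelled out with the figures cited:

```python
# Labor scaling from the Abilene example.
workers_at_abilene = 5_000
abilene_gw = 1.2
target_gw = 100

workers_needed = workers_at_abilene / abilene_gw * target_gw
print(f"~{workers_needed:,.0f} workers for {target_gw} GW")  # ~417,000, the ~400,000 figure
```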

111:16

If you think about the US labor force, and  how many electricians there are and how  

111:20

many construction workers there are… I  guess there are 800,000 electricians. 

111:24

I don't know if they're all  substitutable in this way. 

111:26

There are millions of construction workers. But if we're in a world where we're adding  

111:30

200 gigawatts a year, are we going to be  crunched on labor eventually, or do you  

111:35

think that is actually not a real constraint? Labor is a big constraint. It's a humongous  

111:38

constraint in this. People have to  be trained. Likewise, we'll probably  

111:43

start importing the highest-skilled labor. It makes sense that a really high-skilled  

111:50

electrician in Europe who was working on decommissioning power plants now comes  

111:55

to America and builds the high-voltage systems that move electricity across a data center. 

112:03

Humanoid robots or robotics at least might start  to help, but the main factor for reducing the  

112:09

number of people is going to be modularizing  things and making them in factories in Asia. 

112:13

Unfortunately for America, places like Korea,  Southeast Asia, and in many ways China as well  

112:24

are going to build more and more complete sections of the data center, and those will be shipped in. 

112:34

Today you ship servers or a rack in, and then you plug that into different pieces that 

112:40

you're shipping from different places. But now the entire thing will be 

112:43

integrated at the factory and shipped as one unit. Maybe this is a two-megawatt block, 

112:48

and this block goes from high-voltage AC  power to the DC voltage that you deliver  

112:56

to the rack, or something like this. Or with cooling, you ship a fully  

113:03

integrated unit that has a lot of the  cooling subsystems already put together,  

113:08

because plumbers are also a big constraint here. Furthermore, instead of just a single rack where  

113:13

you have people wiring up all these racks with  electricity, you take a skid and put an entire  

113:19

row of servers on it that is  shipped directly from the factories. 

113:25

Today, a single rack may be 120 or 140 kilowatts,  but as we get to next-generation Nvidia Kyber and  

113:32

things like that, it's almost a megawatt. In addition, if you do an entire  

113:36

row, it'll have the rack, the networking, the  cooling, and the power all integrated together. 

113:42

Now when you come in, you have much less to cable. There's less networking fiber, fewer power  

113:53

connections, and fewer plumbing things. This can drastically reduce the number  

113:58

of people working in data centers, so our  capability to build them will be much larger. 

114:03

Along the way, some people will move faster  to new things, and some will move slower. 

114:08

Crusoe and Google have been talking  a lot about this modularization,  

114:12

as have companies like Meta and many others. The people who move faster to new things may  

114:24

face delays, while the people who  are slower will face labor problems. 

114:27

There will always be dislocations in the market  because this is a very complex supply chain. 

114:30

At the end of the day, it's still  simple enough that we will be able  

114:33

to solve it through capitalism and human  ingenuity on the timescales required. 

114:39

Speaking of big problems to solve, Elon  Musk is very bullish on space GPUs. 

114:46

If you're right that power is not a constraint  on Earth… I guess the other reason they would  

114:50

make sense is that even if there will be  enough gas turbines or whatever on Earth,  

114:55

Elon's next argument is that you can't get the  permitting to build hundreds of gigawatts on  

115:00

Earth. Do you buy that argument? Land-wise, America is big. Data  

115:05

centers don't actually take up that  much space, so you can solve that. 

115:09

Permitting-wise, air pollution permits are  a challenge, but the Trump administration  

115:12

made it much easier. You go to Texas,  

115:15

and you can skip a lot of this red tape. Elon had to deal with a lot of this complex  

115:22

stuff in Memphis, and then building a power  plant across the border for Colossus 1 and 2. 

115:28

But at the end of the day, there's a lot more  you can get away with in the middle of Texas. 

115:32

Given that Elon lives in Texas,  why didn't he just go to Texas? 

115:34

I think it was partially that they over-indexed  on grid power for a temporary period of time. 

115:40

That's just what they thought they needed more of. Because they had an aluminum refinery  

115:43

connected to the grid there. It was actually an idled appliance factory. 

115:50

But I think they may have indexed more to  grid power, water access, and gas access. 

115:56

I think they bought that knowing the gas  line was right there and they were going  

115:59

to tap it. Same with water. It was a  whole host of different constraints. 

116:03

It was probably an area where  electricians were easier to find. 

116:07

At the end of the day, I'm not  exactly sure why they chose that site. 

116:10

I bet Elon would've chosen somewhere in  Texas if he could've gone back because  

116:16

of the regulatory challenges he faced. Ultimately, permitting is a challenge,  

116:23

but America is a big place with 50  states, and things will get done. 

116:27

There are a lot of small jurisdictions where  you can just transport in all the workers  

116:32

you need for a temporary period of three to  twelve months, depending on the contractor. 

116:37

You can put them in temporary housing and pay out  the butt, because labor is very cheap relative  

116:44

to the GPUs and the networking, and the end  value of the tokens it's going to produce. 

116:52

So there is plenty of room to  pay for all of these things. 

116:59

Also, people are diversifying now. Australia, Malaysia, Indonesia, and India 

117:06

are all places where data centers  are going up at a much faster pace. 

117:09

But currently, over 70% of AI  data centers are still in America,  

117:12

and that continues to be the trend. People are figuring out how to build these things. 

117:19

Ultimately, dealing with permitting and  red tape in middle-of-nowhere Texas,  

117:23

Wyoming, or New Mexico is probably a hell of  a lot easier than sending stuff into space. 

117:30

Other than the economic argument making less sense  once you consider that energy is a small fraction  

117:36

of the total cost of ownership of a data center,  what are the other reasons you're skeptical? 

117:41

Obviously, power is basically free in space. That's the reason to do it. 

117:45

Yeah, that's the reason to do it. But there are all the other counterarguments. 

117:50

Even if power costs double on Earth, it's  still a fraction of the total cost of the GPU. 
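To make that concrete, here is a minimal sketch of GPU total cost of ownership per hour. All of the inputs (the $40,000 all-in capital cost, 1.4 kW of power including overhead, $0.08/kWh, a five-year life) are illustrative assumptions, not figures from the conversation.

```python
# Back-of-envelope: why electricity is a small slice of a GPU's hourly cost.
gpu_capex = 40_000                      # assumed all-in capital cost per accelerator, USD
useful_life_hours = 5 * 365 * 24        # five-year useful life -> 43,800 hours
capex_per_hour = gpu_capex / useful_life_hours   # ~$0.91 per hour

power_kw = 1.4                          # assumed GPU plus its share of facility overhead
price_per_kwh = 0.08                    # assumed industrial power price, USD
energy_per_hour = power_kw * price_per_kwh       # ~$0.11 per hour

share = energy_per_hour / (capex_per_hour + energy_per_hour)
print(f"energy is ~{share:.0%} of hourly cost; doubling it adds only ~${energy_per_hour:.2f}/hr")
```

Under these assumptions energy is roughly a tenth of the hourly cost, which is why even a doubling of power prices barely moves the economics relative to the chips themselves.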

117:54

The main challenge is… We have  ClusterMAX, which rates all the neoclouds. 

118:03

We test over 40 cloud companies,  including the hyperscalers and neoclouds. 

118:06

Outside of software, what differentiates these  clouds the most is their ability to deploy and  

118:11

manage failure. GPUs are horrendously unreliable.  Even today, around 15% of Blackwells that get  

118:19

deployed have to be RMA'd. You have to take them out. 

118:21

Sometimes you just have to plug them  back in, but sometimes you have to take  

118:23

them out and ship them back to Nvidia or  their partners who do the RMAs and such. 

118:28

What do you make of Elon's argument that after an  initial phase, they actually don't fail that much? 

118:34

Sure, but now you've done this, tested them all,  deconstructed them, put them on a spaceship,  

118:39

launched them into space, and then put  them online again. That takes months. If  

118:44

your argument is that a GPU has a useful life of  five years, and this takes six additional months,  

118:57

that is 10% of your cluster's useful life. Because we're so capacity-constrained,  

119:04

that compute is theoretically most valuable  in the first six months you have it. 
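A minimal sketch of that accounting: the five-year life and six-month delay are the figures above, while the 2%-per-month decay in the value of compute is purely an assumed illustration of the point that early compute is worth more.

```python
# Fraction of a GPU's useful life lost to a six-month space-deployment delay.
useful_life_months = 60        # five-year useful life, as stated
delay_months = 6               # extra test/launch/re-commission time, as stated
print(f"{delay_months / useful_life_months:.0%} of the useful life is lost outright")

# If compute is worth more now than later, the effective loss is a bit worse.
monthly_decay = 0.98           # assumption: each month, a unit of compute is ~2% less valuable
value_on_time = sum(monthly_decay ** m for m in range(useful_life_months))
value_delayed = sum(monthly_decay ** m for m in range(delay_months, delay_months + useful_life_months))
print(f"with front-loaded value, the delay costs ~{1 - value_delayed / value_on_time:.0%}")
```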

119:08

We're more constrained now  than we will be in the future. 

119:11

That compute can contribute to a better  model in the future, or generate revenue  

119:15

today that you can use to raise more money. All these things make now the most important  

119:20

moment, but you've potentially delayed  your compute deployment by six months. 

119:25

What separates these cloud providers is… We  see some clouds taking six months to deploy  

119:28

GPUs right here on Earth. We see clouds that take  

119:31

a lot less than six months. So the question is, where does space get in there? 

119:36

I don't see how you could test them all on Earth,  deconstruct them, and ship them to space without  

119:41

it taking significantly longer than just leaving  them in the facility where you tested them. 

119:45

The question I wanted to ask is about  the topology of space communication. 

119:50

Right now, Starlink satellites talk to  each other at 100 gigabits per second. 

119:56

You could imagine that being much  higher with optical intersatellite  

120:00

laser links optimized for this. That actually ends up being quite  

120:04

close to InfiniBand bandwidth, which is 400 gigabits per second. 

120:09

But that's per GPU, not per rack. So multiply  that by 72. Also, that was Hopper. When you go  

120:16

to Blackwell and Rubin, that 2x's and 2x's again. But how much compute is happening per… During  

120:24

inference, are the different scale-ups  still working together, or is inference just  

120:27

happening as a batch within a single scale-up? A lot of models fit within one scale-up domain,  

120:33

but many times you split them  across multiple scale-up domains. 

120:42

As models become more and more sparse,  which is the general trend, you want to  

120:48

ping just a couple of experts per GPU. If leading models today have hundreds,  

120:53

if not a thousand, of experts, then you'd want to  run this across hundreds or thousands of chips,  

120:59

even as we advance into the future. So then you end up with the problem of  

121:05

needing to connect all these satellites  together for communications as well. 

121:09

That would be tough. If there's a world where  you could do inference for a batch on a single  

121:17

scale-up, then maybe it's more plausible. But if not, it's a different story. 

121:21

Networking these chips together  is a problem, and you can't just  

121:24

make the satellite infinitely large. There are a lot of physics challenges to  

121:29

making a satellite really big. That's why you need these  

121:34

interconnects between the satellites. Those  interconnects are more expensive. In a cluster,  

121:38

15-20% of the cost is networking. All of a sudden, you're using space lasers  

121:43

instead of simple lasers that are manufactured in  volumes of millions with pluggable transceivers. 

121:50

And those things are very unreliable as well,  more unreliable than the GPUs by the way. 

121:54

Across the life of a cluster, you have  to unplug and clean them all the time. 

121:57

You have to unplug and replug  them just for random reasons. 

121:59

These things are just not as reliable. So you've got that problem as well. 

122:03

You've got a more expensive, complicated  space laser to communicate instead of this  

122:08

pluggable optical transceiver that's  been produced in super high volume. 

122:11

So all in all, what does that  imply for space data centers? 

122:13

Space data centers' energy advantage effectively doesn't matter. 

122:19

They are limited by the same contended resource. We can only make two hundred gigawatts  

122:24

of chips a year by the end of the decade. What are we going to do to get that capacity? 

122:29

It doesn't matter if it's on land or in space. It doesn’t really matter,  

122:36

because you can build that power. Human capabilities and capacity could get  

122:41

to the point where we're adding a terawatt a year globally of various types of power. 

122:47

At some point, we do cross the chasm where space  data centers make sense, but it's not this decade. 

122:52

It is much further out, once energy  constraints actually become a big bottleneck  

122:59

and land permitting becomes a much bigger  bottleneck as it subsumes more of the economy. 

123:04

And crucially, once chips  are no longer the bottleneck. 

123:07

Right now, chips are the biggest bottleneck. You want them deployed and working on  

123:11

AI the moment they're manufactured. There are a lot of things people are  

123:15

doing to increase that speed faster and faster. They’re modularizing data centers, or even  

123:20

modularizing racks so that you put only the chip in at the data center, and everything 

123:26

else is already wired up and ready to go. There are things like this people are doing to  

123:31

decrease that time that you cannot do in space. At the end of the day, all that matters in a  

123:36

chip-constrained world is getting  these chips producing tokens ASAP. 

123:43

Maybe by 2035, the semiconductor industry,  ASML, Zeiss, and suppliers like Lam  

123:45

Research and Applied Materials and other fab equipment manufacturers will catch up once the pendulum 

123:53

swings and we are able to make enough chips. Then we will be optimizing every dial and it makes  

123:58

sense to optimize the 10-15% of cost that is energy. As we move to ASICs potentially, 

124:03

and if Nvidia's margins aren't 70%-plus, maybe that energy cost becomes 30% of the cluster cost. 

124:11

These are the things to optimize. But Elon doesn't win by doing 20% gains. He  

124:18

never wins that way. Elon wins when he swings for  the fences and does 10X gains. That's what SpaceX  

124:24

is about. That's what Tesla is about. All of his  success has been about that, not chasing the 20%. 

124:31

I think space data centers will eventually  be a 10X gain as Earth's resources get more  

124:37

and more contentious, but that's not this decade. Just to drive some intuition about how much land  

124:42

there is on Earth… Obviously, for the chips  themselves, especially if you move to a world  

124:46

where you have racks that have megawatts— That's the other thing. If manufacturing is  

124:55

the constraint, right now it's roughly one  watt per square millimeter for AI chips. 

125:01

One easy way to improve that is to pump  it to two watts per square millimeter. 

125:05

You may not get 2x the performance,  you may only get 20% more performance,  

125:09

and that requires much more exotic cooling. It requires more complicated cold plates  

125:13

and complex liquid cooling, or maybe  even things like immersion cooling. 

125:18

In space, higher watts per square millimeter is very difficult, 

125:20

whereas on Earth, these are solved problems. One of these things enables you to get a lot  

125:25

more tokens, maybe 20% more tokens per wafer  that's manufactured, and that's a humongous win. 

125:31

Square millimeter, you mean of die area? Yeah, of die area. 

125:36

It would be better for space because more watts per square millimeter means the chip runs hotter. 

125:42

I guess this is a question of computer chip engineering, but radiative cooling scales with temperature to the 

125:46

fourth power by the Stefan-Boltzmann law. If you can run a very hot  

125:49

chip, it allows a lot of— No, you can't run it hotter. 

125:51

You can only run it denser. The problem is that getting  

125:54

the heat out of that dense area means you have to  move away from standard air and liquid cooling to  

126:00

more exotic forms of liquid cooling, or even  immersion, to get to higher power densities. 

126:05

That's more difficult in  space than it is on Earth. 
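One way to see why density is the problem in space: radiative cooling follows the Stefan-Boltzmann law, and at chip-safe temperatures a surface sheds far less heat per unit area than a modern AI die produces, so the heat has to be spread into large radiators. In the sketch below, the ~85 °C die temperature and the 1 W/mm² density are illustrative; the constant is standard physics.

```python
# Radiated heat flux at a chip-safe temperature vs. the power density of an AI die.
SIGMA = 5.67e-8                      # Stefan-Boltzmann constant, W / (m^2 * K^4)

die_temp_k = 273 + 85                # ~85 C, near the hot end of safe silicon operation
radiated_w_per_m2 = SIGMA * die_temp_k ** 4        # ~930 W/m^2 from an ideal black surface

die_density_w_per_mm2 = 1.0                         # ~1 W/mm^2, as quoted earlier
die_density_w_per_m2 = die_density_w_per_mm2 * 1e6  # = 1,000,000 W/m^2

print(f"radiates ~{radiated_w_per_m2:,.0f} W/m^2 vs dissipates ~{die_density_w_per_m2:,.0f} W/m^2")
```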

126:08

Maybe it's worth explaining at this point what  exactly a scale-up is and what it looks like for  

126:13

Nvidia versus Trainium versus TPUs. Earlier I was mentioning how  

126:22

communication within a chip is super fast. Communication within chips that are in the  

126:26

same rack is fast, but not as fast. It's on the order of terabytes per second. 

126:30

Communication very far away is on the order of hundreds of gigabytes per second. 

126:36

As you get further distance, maybe  across the country, the order  

126:39

of magnitude is on the order of gigabytes per second. A scale-up domain is this tight domain 

126:44

where the chips are communicating  on the order of terabytes a second. 

126:50

For Nvidia, previously this meant  an H100 server had eight GPUs,  

126:55

and those eight GPUs could talk to  each other at terabytes a second. 

126:58

With Blackwell NVL72, they  implemented rack-scale scale-up. 

127:03

That meant all seventy-two GPUs in the rack could  connect to each other at terabytes a second. 

127:09

The speed doubled generation on generation, but  the most important innovation was going from eight  

127:13

to seventy-two in the domain. When we look at Google,  

127:16

their scale-up domain is completely different. It has always been on the order of thousands. 

127:20

With TPU v4, they had pods the  size of four thousand chips. 

127:23

With v8 or v7, they have pods in  the eight or nine thousand range. 

127:31

What's relevant here is that it's not the  same as Nvidia. It's not like for like.  

127:35

Google has a topology that's a torus. Every chip connects to six neighbors.  

127:40

Nvidia's 72 GPUs connect all-to-all.  They can send terabytes a second to  

127:46

any arbitrary other chip in that scale-up pod. Whereas with Google, you have to bounce through chips. 

127:52

If TPU 1 needs to talk to TPU 76, it has to bounce  through various chips, and there is always some  

127:59

blocking of resources when you do that because  that one TPU is only connected to six other TPUs. 
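A small sketch of what "bouncing through chips" means in a torus versus an all-to-all switch. The 8x8x8 dimensions are illustrative, not an actual TPU pod shape; the point is just that the TPU 1 to TPU 76 example above takes several hops, while a switched NVL72 rack is effectively one hop.

```python
# Minimum hop count between two chips in a wrap-around 3D torus (Google-style),
# versus effectively one switch hop in an all-to-all NVL72-style rack.
def torus_hops(a, b, dims=(8, 8, 8)):
    def coords(i):
        x = i % dims[0]
        y = (i // dims[0]) % dims[1]
        z = (i // (dims[0] * dims[1])) % dims[2]
        return x, y, z
    hops = 0
    for ca, cb, d in zip(coords(a), coords(b), dims):
        delta = abs(ca - cb)
        hops += min(delta, d - delta)    # wrap-around links shorten long paths
    return hops

print(torus_hops(1, 76))   # -> 5 hops through neighbors' links in this illustrative torus
```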

128:04

So there is a difference  in topology and bandwidth,  

128:07

and there are trade-offs and advantages to both. Google gets to have a massive scale-up domain,  

128:11

but they have the trade-off of bouncing  across chips to get from one to another. 

128:15

You can only talk to six direct neighbors. Amazon has mutated their scale-up domain. 

128:23

They're somewhere in between Nvidia and Google. They're trying to make larger scale-up domains. 

128:28

They try to do all-to-all to some extent with  switches, which is what Nvidia does, but they also  

128:33

use torus topologies like Google to some extent. As we advance forward to next generations,  

128:40

all three of them are moving more  towards a dragonfly topology. 

128:44

That means there are some fully connected elements  and some elements that are not fully connected. 

128:49

You can get the scale-up to be hundreds or  thousands of chips, but also have it not contend  

128:54

for resources when bouncing through chips. Related question: I heard somebody make the  

129:00

claim that the reason parameter scaling has been  slow—and only now are we getting bigger models  

129:08

from OpenAI and Anthropic—is that… The original  GPT-4 is over a trillion parameters, and only now  

129:18

are models starting to approach that again. I heard a theory that the reason is that  

129:24

Nvidia's scale-ups have just not  had that much memory capacity. 

129:37

Let's say you have a 5T model running at FP8, so that's five terabytes. 

129:43

And then you have the KV cache, let's say it's— Just call it the same size. 

129:47

Okay, let's say it's the same size for one batch. So you need ten terabytes to be able to run… 

129:54

A single forward pass, yeah. And then only with the GB200 and NVL72  

129:59

do you have an Nvidia scale-up that has twenty  terabytes, and before that they were much smaller. 
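The arithmetic behind that, as a minimal sketch; the 5T-parameter size and the "KV cache about the same size as the weights" simplification come from the example above, and the 0.64 TB figure for an eight-GPU H100 box is just 8 × 80 GB of HBM.

```python
# Memory needed to hold the example model plus its KV cache in one scale-up domain.
params = 5e12                 # 5T-parameter model, as in the example
bytes_per_param = 1           # FP8 -> one byte per parameter

weights_tb = params * bytes_per_param / 1e12   # 5 TB of weights
kv_cache_tb = weights_tb                       # simplification used above: roughly the same size
total_tb = weights_tb + kv_cache_tb            # ~10 TB for a single serving instance

h100_node_tb = 8 * 80 / 1000                   # 0.64 TB of HBM in an eight-GPU H100 server
print(f"needs ~{total_tb:.0f} TB; an 8x H100 node has {h100_node_tb:.2f} TB, "
      f"an NVL72-class rack ~20 TB")
```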

130:03

Whereas Google, on the other hand, has had  these huge TPU pods that are not all-to-all,  

130:09

but still have hundreds of terabytes  of capacity in a single scale-up. 

130:13

Does that explain why parameter  scaling has been slow? 

130:16

I think it's partially the capacity and  bandwidth, but also as you build a larger  

130:22

model, the ability to deploy it is slower. In terms of what the inference speed is for  

130:28

the end user, that's kind of irrelevant. What's  really relevant is RL. What we've seen with these  

130:33

models and allocation of compute at a lab… There  are a few main ways you can allocate compute. 

130:38

You can allocate it to inference, i.e. revenue. You can allocate it to development,  

130:42

i.e. making the next model. You can allocate it to research. 

130:46

In development specifically, you  split it between pre-training and RL. 

130:52

When you think about what is happening, the  compute efficiency gains you get from research  

130:58

are so large that you actually want most of your  compute to go to research, not to development. 

131:04

All these researchers are generating new  ideas, trying them out, testing them,  

131:08

and continuing to push the Pareto optimal  curve of scaling laws further and further. 

131:14

Empirically, what we’ve seen is that  model costs get ten times cheaper  

131:17

every year, or even more than that. For the same capability, it gets ten times cheaper, 

131:23

and to reach new frontiers it  costs the same amount or more. 

131:27

So you don't want to allocate too  many resources to pre-training and RL. 

131:33

You actually want to allocate most  of your resources to research. 

131:36

In the middle is this development period. If you pre-train a five-trillion-parameter model,  

131:45

how many rollouts do you have to do in RL? Rollouts for a five-trillion-parameter model  

131:51

cost roughly five times more than for a one-trillion-parameter model. 

131:54

If you wanted to do as many rollouts—maybe  the larger model is two times more sample  

131:57

efficient—now you need 2.5x as much  time of RL to get the model smarter. 

132:05

Or you could RL the smaller model for 2x the time. The big 

132:12

model, which is 2x as sample efficient and doing X number of rollouts, would still take 25% longer. 

132:16

But the smaller model, which is a trillion parameters, although it's 

132:19

less sample efficient, is doing twice as  many rollouts and is still done faster. 
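The tradeoff as stated, in a short sketch. The 5x rollout cost comes from the parameter ratio; the 2x sample efficiency of the bigger model is the assumption flagged above with "maybe."

```python
# Wall-clock comparison for the RL tradeoff described above.
rollout_cost_ratio = 5.0    # a 5T-parameter rollout costs ~5x a 1T-parameter rollout
sample_efficiency = 2.0     # assumed: the big model needs half as many rollouts

big_model_time = rollout_cost_ratio / sample_efficiency   # 2.5x a baseline RL run
small_model_time = 2.0                                     # RL the small model for 2x the time instead

print(f"big model: {big_model_time}x, small model: {small_model_time}x -> "
      f"the big model still takes {big_model_time / small_model_time - 1:.0%} longer")
```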

132:23

You get the model sooner, you've done more RL,  and then you can take that model to help you  

132:28

build the next models, help your engineers  train, and do all these research ideas. 

132:33

This feedback loop is actually weighted towards smaller models in every case, 

132:39

no matter what your hardware is. As you look to Google, they do  

132:42

deploy the largest production model of  any of the major labs with Gemini Pro. 

132:49

It's a larger model than GPT-5.4. It's a larger model than Opus. 

132:55

Google does this because they have a unipolar set  of compute. It's almost all TPU. Whereas Anthropic  

133:04

is dealing with H100s, H200s, Blackwell,  Trainiums, and TPUs of various generations. 

133:12

OpenAI is dealing with mostly Nvidia right now,  but going towards having AMD and Trainium as well. 

133:18

The fleets of compute like Google's can  just optimize around a larger model. 

133:23

They can leverage a thousand chips in a scale-up domain to make RL much faster 

133:30

so that this feedback loop can be fast. But at the end of the day, in isolation,  

133:36

you almost always want to go with a smaller  model that gets RL'd faster and gets deployed  

133:41

into research and development earlier. You can build the next thing and  

133:44

get more efficiency wins. You have this compounding  

133:47

effect of making a smaller model that can be  deployed into research and development earlier. 

133:53

I spend less compute on the training because I  was able to allocate more compute to the research. 

133:58

This compounding effect of being able  to do research faster and faster is  

134:01

potentially a faster takeoff. That's all these companies want:  

134:03

the fastest takeoff possible. Okay, a spicy question. You've explained  

134:10

that SemiAnalysis sells these spreadsheets. You're always pointing out how six  

134:14

months or a year ago, you warned  people about the memory crunch. 

134:17

Now you're telling people about the cleanroom  crunch, and in the future, the tool crunch. 

134:22

Why is Leopold the only person using your  spreadsheets to make outrageous money? What  

134:27

is everybody else doing? I think there are a lot  

134:30

of people making money in many ways. Leopold jokes that he's the only client  

134:38

of mine who tells me our numbers are too low. Everyone else tells me our numbers are too high,  

134:42

almost ad nauseam. Whether it's a hyperscaler saying,  

134:46

"Hey, that other hyperscaler, their numbers are  too high," and we're like, "Nah, that's it." 

134:50

They're like, "No, no, no, it's  impossible," blah, blah, blah. 

134:52

You finally have to convince them through all  these facts and data when we're working with  

134:55

hyperscalers or AI labs that in fact, no,  that number isn't too high, that's correct. 

135:00

Eventually they realize, though sometimes it takes them six months or a year. 

135:05

Other clients, on the trading  side, also use our data. 

135:12

Roughly 60% of my business is industry. So AI labs, data center companies,  

135:17

hyperscalers, semiconductor companies, the  whole supply chain across AI infrastructure. 

135:23

But 40% of our revenue is hedge funds. I'm not going to comment on who our customers are,  

135:28

but a lot of people use the data. It's just how do you interpret it,  

135:33

and then what do you view as beyond it? I will say Leopold is pretty much the only person  

135:39

who tells me my numbers are too low, always. Sometimes he's too high, sometimes I'm too low. 

135:44

But in general, I think  other people are doing that. 

135:50

You can look across the space at hedge funds and look at their 13Fs and see they own similar things, maybe not 

135:56

exactly what Leopold does, because it's always a  question of what is the most constrained thing. 

136:00

What's the thing that's going to  be most outside of expectations? 

136:03

That's what you're really trying to  exploit: inefficiencies in the market. 

136:06

In a sense, our data is making the market  more efficient by making the base data  

136:12

of what's happening more accurate. Many funds do trade on information  

136:22

that is out there… I don't  think Leopold's the only person. 

136:26

I think he has the most conviction  about the AGI takeoff, though. 

136:32

Right, but the bets are not  about what happens in 2035. 

136:37

The bets that you're making—that are at least  exemplified by public returns we can see for  

136:41

different funds including Leopold's—are  about what has happened in the last year. 

136:45

The last year stuff could be  predicted using your spreadsheets. 

136:50

It's about buying the next year's spreadsheets. They're not just spreadsheets. There are  

136:53

reports. There's API access to  the data. There's a lot of data. 

136:56

But do you see what I mean? It's not about some crazy singularity thing. 

137:00

It's about, do you buy the memory crunch? You only buy the memory crunch if you  

137:05

believe AI is going to take off in a huge way. The memory crunch, a lot of it was predicated  

137:12

on… At least for people in the Bay Area who  think about infrastructure, it's obvious. 

137:17

KV cache explodes as context lengths get longer,  so you need more memory. Then you do the math.  
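"Doing the math" on the KV cache looks roughly like the sketch below. The layer count, grouped-query head count, head dimension, and FP8 cache are assumed, illustrative hyperparameters, not any specific production model.

```python
# KV cache size per sequence, to show how it grows linearly with context length.
layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_value = 1   # FP8 cache

def kv_cache_gb(context_len, batch=1):
    # 2x for keys and values, accumulated across every layer for every cached token
    return 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_value / 1e9

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):7.1f} GB per sequence")
```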

137:22

You also have to have a lot of supply chain  understanding of what fabs are being built,  

137:25

what data centers are being built,  how many chips, and all these things. 

137:28

We track all these different datasets  very tightly, but at the end of the day,  

137:32

it takes someone to fully believe  that this is going to happen. 

137:38

A year ago, if you told someone memory  prices would quadruple and smartphone  

137:42

volumes are going to go down 40% over the  year or two after that, people were like,  

137:48

"You're crazy. That'd never happen." Except a few  people do believe that, and those people did trade  

137:52

memory. And people did. I don't think Leopold  was the only person buying memory companies. 

138:00

He, of course, sized and positioned and did  things in better ways than some, maybe most. 

138:06

I don't want to comment on whose returns  are what, but he certainly did well. 

138:12

Other people also did really well. Wow, you've made me diplomatic for  

138:18

the first time ever. No, no, you're fine.  I think this is hilarious. I'm being a  

138:22

diplomat, whereas usually I'm spicy. Okay, some rapid-fire questions to close out. 

138:31

If you're saying that with memory, logic, et cetera, N3 is mostly 

138:38

going to be AI accelerators, but then there's  N2, which is mostly Apple now… In the future,  

138:44

I guess AI would also want to go on N2. Can TSMC kick out Apple if Nvidia and  

138:53

Amazon and Google say, "Hey, we're willing  to pay a lot of money for N2 capacity?" 

138:59

I think the challenge with this is chip design  timelines take a long while, so that's more  

139:04

than a year out, and the designs that are  on two nanometer are more than a year out. 

139:08

What would really happen is Nvidia and  all these others will be like, "Hey,  

139:12

we're going to prepay for the capacity  and you're going to expand it for us." 

139:17

Maybe TSMC takes a little  bit of margin, but not a ton. 

139:21

They're not going to kick Apple out entirely. What they're going to do is when Apple orders X,  

139:25

they might say, "Hey, we project you only need  X minus one, and so that's what we're going to  

139:29

give you, X minus one." Then that flex capacity,  

139:31

Apple's kind of screwed on. Traditionally, Apple has always  

139:35

over-ordered by 10% and cut back  by 10% over the course of the year. 

139:38

Some years they hit the entire 10%. Volumes vary based on the season and macro. 

139:47

I don't think TSMC would kick out Apple. I think Apple will become a smaller and smaller  

139:52

percentage of TSMC's revenue, and therefore be  less relevant for TSMC to cater to their demands. 

139:57

TSMC could eventually start saying, "Hey, you've  got to pre-book your capacity for next year,  

140:01

for two years out, and you have to prepay for  the CapEx," because that's what Nvidia and  

140:05

Amazon and Google are doing. I wonder if it's worth  

140:08

going into specific numbers. I don't have any of them on hand. 

140:15

What percentage of N2 does Apple have its  hands on over the coming years versus AI? 

140:22

This year Apple has the majority of  N2 that's going to get fabricated. 

140:26

There's a little bit from AMD. They are trying to make some AI  

140:28

chips and CPU chips early. There's a little bit,  

140:30

but for the most part, it's Apple. As we go forward to the year after that, Apple  

140:36

still gets closer to half of it as other people  start ramping, but then it falls drastically,  

140:43

just like for N3, where they were half. When I say N2, that includes A16,  

140:49

which is a variant of N2. Over time, those nodes will be the majority. 

140:56

What's also interesting is traditionally,  Apple has been the first to a process node.  

141:00

2 nm is actually the first time they're  not. Well, that’s besides Huawei. Huawei,  

141:04

back in 2020 and before, was first alongside Apple, but they were both making smartphones. 

141:08

Now, with 2 nm, you've got AMD trying  to make a CPU and a GPU chiplet that  

141:14

they use advanced packaging to package  together, in the same timeframe as Apple. 

141:21

This is a big risk for AMD that causes  potential delays because it's a brand-new  

141:26

process technology. It's hard. But at the end of the day, this is a bet they want to make 

141:30

to scale faster than Nvidia and try and beat them. As we move forward, when we move to the A16 node,  

141:36

the first customer there is not even  Apple. It's AI. As we move forward,  

141:41

that will become more and more prevalent. Not only will Apple not be the first to a node,  

141:46

they will also not be the majority  of the volume to the new node. 

141:49

They'll then just be like any old customer. Because the scale of TSMC's CapEx keeps  

141:53

ballooning, but Apple's business  is not growing at the same pace,  

141:56

they become a less and less relevant customer. They also will just cut their orders because  

142:02

things in the supply chain are  kicking them out, whether it be  

142:04

packaging or materials or DRAM or NAND. These things are increasing in cost. 

142:10

They likely can't pass on all the cost to customers because the consumer is not that strong. 

142:14

You end up with this conundrum  where they are just not TSMC's  

142:18

best bud like they have been historically. Do you think if Huawei had access to 3 nm,  

142:23

they would have a better accelerator than Rubin? Potentially, yeah. Huawei was the  

142:29

first with a 7 nm AI chip as well. They were the first with a 5 nm mobile chip,  

142:33

but they were the first with a 7 nm AI chip. The Huawei Ascend was two months before the TPU  

142:41

and four months before Nvidia's A100, I think. That's just moving to a process node. 

142:49

That doesn't imply software or hardware  design or all these other things. 

142:55

But Huawei is arguably the only company in the  world that has all the legs. Huawei has cracked  

143:02

software engineers. Huawei has cracked  networking technologies. That's, in fact,  

143:06

their biggest business historically. They have  cracked AI talent. Furthermore, beyond Nvidia,  

143:13

they actually have better AI researchers. Beyond Nvidia, they have their own fabs. 

143:18

And beyond Nvidia, they have their own end  market of selling tokens and things like that. 

143:23

Huawei is able to get the top, top talent. Nvidia is as well, but not with as much  

143:30

concentration, and Huawei  has a bigger pool in China. 

143:33

It's very arguable that Huawei, if they  had TSMC, would be better than Nvidia. 

143:38

There are areas where China has advantages that Nvidia can't access as easily. 

143:46

Not just scale, but certain optical  technologies China's actually really good at. 

143:54

I think it's very reasonable that if in  2019 Huawei was not banned from using TSMC,  

144:02

Huawei would have already eclipsed  Apple as the biggest TSMC customer. 

144:06

Huawei has huge share in networking,  compute, CPUs, and all these things. 

144:10

They would have kept gaining share, and  they'd likely be TSMC's biggest customer. 

144:14

Wow. That's crazy. I've got a  random final question for you. 

144:18

The other part of the Elon interview was robots. If humanoids take off faster than people expect,  

144:24

if by 2030 there are millions of humanoids running around, each of which needs local compute, 

144:33

any thoughts on what that implies? What would be required for that? 

144:37

There are a lot of difficulties with the VLMs and VLAs that people are deploying on robots. 

144:46

But to some extent, you don't need to  have all the intelligence in the robot. 

144:49

It would be much more efficient to not do that. Because in the cloud, you can batch  

144:54

process and all these things. What you may want to do is have a  

144:58

lot of the planning and longer-horizon tasks  determined by a much more capable model in  

145:04

the cloud that runs at very high batch sizes. Then it pushes those directions to the robots,  

145:08

which interpolate between each subsequent action. Or it is given a command like, "Hey, 

145:13

pick up that cup," and then the model  on the robot can pick up the cup. 

145:17

As it's picking up, things like weight and  force may have to be determined by the model  

145:27

on the robot, but not everything needs to be. It can say, "Hey, that's a headphone," and the 

145:34

super model in the cloud can say, "I  know these headphones are Sony XM6s,"  

145:38

which is not a Dwarkesh ad spot, but... I'm like, why is this guy plugging this 

145:42

thing so hard. It's on the table. It's on his  neck when we're interviewing Satya together. 

145:48

Is he getting paid by Sony? Unfortunately not. But anyways,  

145:53

it might say, "Hey, the headband is soft, and  this is the weight of it," and all these things. 

145:58

Then the model on the robot  can be less intelligent,  

146:00

take these inputs, and do the actions. It may get told by the model in the cloud  

146:05

every second, or maybe ten times a second,  depending on the hertz of the action. 

146:09

But a lot of that can be offloaded to the cloud. Otherwise, if you do all of the processing on the  

146:15

device, I believe it would be more  expensive because you can't batch. 

146:17

Two, you couldn't have as much intelligence  as you do in the cloud because the  

146:20

models will just be bigger in the cloud. Three, we're in a semiconductor shortage world,  

146:25

and any robot you deploy needs leading-edge chips because power is a really tight constraint for robots. 

146:31

You need it to be low power and efficient,  and all of a sudden you're taking power  

146:36

and chips that would've been for AI data  centers, and you're putting them in robots. 

146:39

So now that 200 gigawatts gets lower if  you're deploying millions of humanoids. 

146:43

I think this is very interesting because  something people might not appreciate  

146:47

about the future is how centralized, in  a physical sense, intelligence will be. 

146:52

Right now, there are eight billion humans, and  their compute is in their heads, on their person. 

147:00

In the future, even with robots that are  out physically in the world—obviously,  

147:04

knowledge work will be done in a centralized  way from data centers with hundreds of thousands  

147:09

or maybe millions of instances—the future  you're suggesting is one where there's more  

147:17

centralized thinking and centralized computation  driving millions of robots out in the world. 

147:25

That's an interesting fact about the future  that I think people might not appreciate. 

147:28

I think Elon recognizes this, which is why  he's going to different places for his chips. 

147:35

He signed this massive deal with Samsung to make  his robot chips in Texas because I personally  

147:41

think he thinks Taiwan risk is huge. Because of that and the centralization  

147:46

of resources in Taiwan, having his robot  chips in Texas means having a separate  

147:51

supply chain that is not as constrained. No one's really making AI chips on Samsung  

147:56

besides Nvidia's new LPU that they launched. They’re launching it next week, but we're  

148:01

recording this the week before. This episode's coming out Friday. 

148:04

Oh, this episode's coming out before.  Sick. They're launching this new AI chip  

148:09

next week which is built on Samsung, but  that's a recent development from Nvidia. 

148:15

That's the only other AI demand there,  whereas on TSMC, everything is competing. 

148:19

He gets both geopolitical diversification  and supply chain diversity for his robots,  

148:25

and he's not competing as much with the near-infinite willingness to pay of the data center geniuses. 

148:34

Final question, on Taiwan. If we believe  that tools are the ultimate bottleneck,  

148:41

how much of Taiwan's place in the AI semiconductor  supply chain could we de-risk simply by having a  

148:50

plan to airlift every single process engineer  at TSMC out if they get blockaded or something? 

148:56

Or do you still need to ship out the EUV  tools, which would be multiple plane loads  

149:02

per single tool and would not be practical? If you ship out all the process engineers and  

149:06

assuming it's hot enough that the fabs get destroyed, no one has the capacity that sits in Taiwan now, 

149:11

which is a big risk. These tools actually use a lot of  

149:16

semiconductors which are manufactured in Taiwan. It's a snake eating its own tail meme because  

149:22

you can't make the tools without the chips from  Taiwan, which you can't use without the tools in  

149:26

Taiwan. There's obviously some diversification  there. They don't use super advanced chips in  

149:32

lithography tools, but at the end of the  day, there is some dragon eating its tail. 

149:36

Just shipping out all the engineers and  blowing up the fabs means China has a  

149:40

stronger semiconductor supply chain than the  rest of the world in terms of verticalization,  

149:44

now that you've removed Taiwan. You've got all the know-how,  

149:49

but you've got to replicate it in,  let's say, Arizona or wherever for TSMC. 

149:56

It's going to take a long time to build all the  capacity that TSMC has built over the years. 

150:01

And so you've drastically  slowed US and global GDP. 

150:06

Not just growth, you've shrunk the GDP  massively, and you've got a lot bigger problems. 

150:12

Your incremental ability to add  compute goes to almost zero. 

150:16

Instead of hundreds of gigawatts  a year by the end of the decade,  

150:18

let's say something happens to Taiwan, now you're  at maybe 10 gigawatts across Intel and Samsung,  

150:24

or 20 gigawatts. It's nothing. Now all of a sudden  you've really caused some crazy dynamics in AI. 

150:31

Of course, you have all the existing capacity,  but that existing capacity pales in comparison  

150:35

to the capacity that's being expanded. Okay. Dylan, that was excellent. Thank  

150:39

you so much for coming on the podcast. Thank you for having me. And see you tonight.

Interactive Summary

Dylan Patel, CEO of SemiAnalysis, joins Dwarkesh Patel to discuss the semiconductor landscape and the physical constraints on scaling AI. Patel explains that the massive CapEx from Big Tech is being funneled into long-term infrastructure like turbines, data center construction, and power agreements. He identifies chip manufacturing—specifically ASML's EUV lithography tools—as the ultimate bottleneck for the decade, while also addressing the current memory crunch that is driving up costs for consumer electronics. The conversation covers the diverging strategies of OpenAI and Anthropic, the potential for behind-the-meter power generation, and why terrestrial data centers remain more economically viable than space-based alternatives due to maintenance and deployment speed.
