NVIDIA's $660 Billion Problem Just Got Worse

Transcript

OpenAI just launched a new model that, get this, does not run on Nvidia hardware. [music] Finance bros and PR dudes covering this are going to miss it, because you need to understand the underlying technical architecture to understand why this is massively significant for Nvidia's valuation, especially with their earnings coming on February 25th. If you own Nvidia stock, or if you're on the sidelines waiting to buy, you need to pay attention to this. You need to know this. I'm going to break it down for you in this video. By the way, you like the hat? It's from Abandoned Wearer. Link down there in the description.

So, here's what went down. OpenAI, in a continued effort to get worse SEO for their model names, releases GPT 5.3 Codex Spark. It is a lighter version of their existing coding model, called Codex. What's different about this one is that it's designed for real-time coding. If you have ChatGPT Pro, you can use it right now through their app on the Codex tab. You can use it in the Codex CLI, and you'll also have access in the VS Code plugin.

The headline number is 1,000 tokens per second, which might not mean anything to you, but tokens per second is effectively how fast a model can produce a response. To put that in relative terms for you: any other frontier model, whether that be any of the other Codex models, ChatGPT, Claude, etc., usually does about 30 to 100 tokens per second. So, compared to even the fastest frontier models on the market that you have access to right now, this is 10 times as fast.
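
To make that concrete, here's a rough back-of-the-envelope sketch of what tokens per second means for how long you sit waiting on a response. The response length is a made-up illustration; the speeds are the ones quoted above.

```python
# Back-of-the-envelope: wait time for a response at different generation speeds.
# The 500-token response length is an illustrative assumption.
response_tokens = 500

for name, tokens_per_second in [
    ("typical frontier model, low end", 30),
    ("typical frontier model, high end", 100),
    ("Codex Spark, as claimed", 1000),
]:
    wait_seconds = response_tokens / tokens_per_second
    print(f"{name}: ~{wait_seconds:.1f}s for {response_tokens} tokens")
```

A response that takes 16-plus seconds at 30 tokens per second comes back in about half a second at 1,000 tokens per second, which is the whole pitch behind calling it real-time coding.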

And here's the part that should make Jensen really nervous: this does not run on Nvidia silicon. It runs on the Cerebras Wafer Scale Engine 3. So, not H100s, but a different manufacturer entirely. And the WSE-3 is a massive feat of engineering. It's truly impressive. It is a thick chip. It is big. It's actually the size of a typical silicon wafer: 46,000 square mm, to be precise. For comparison, Nvidia's H100 is about 814 square mm. The WSE-3, by the numbers, has four trillion transistors, 900,000 AI cores, and 44 GB of on-chip SRAM. That is an insane amount crammed onto one chip, even if it is a big one. The memory bandwidth out of those numbers is 21 petabytes per second, which is extraordinary. Again, if that doesn't make any sense to you, let's compare it to an H100, which only does about 3 terabytes per second. So, we're talking 7,000 times more bandwidth than Nvidia's best option.
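
If you want to sanity-check that ratio yourself, the arithmetic is quick, using the two bandwidth figures just quoted:

```python
# Sanity-checking the bandwidth comparison quoted above.
wse3_bandwidth = 21e15  # bytes/s: 21 petabytes per second (Cerebras WSE-3, claimed)
h100_bandwidth = 3e12   # bytes/s: ~3 terabytes per second (Nvidia H100)

print(f"ratio: ~{wse3_bandwidth / h100_bandwidth:,.0f}x")  # -> ratio: ~7,000x
```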

This partnership between OpenAI and Cerebras was announced in January. This is the first real product to emerge out of that partnership. And OpenAI's Sammy Boy was real clear about the strategy. They said, quote, "diversifying compute infrastructure beyond GPUs for latency-sensitive workloads." It's a direct quote. You can read between the lines there, but if you need it spelled out, it's basically like Sam and Jensen are in a romantic relationship. Jensen goes out to the grocery store, comes back, hears a weird knocking in the bedroom. And he walks in, and Sam is in bed with somebody else. It's Cerebras. And he sits up, and instead of saying, "It's not what it looks like," he says, "It's exactly what it looks like. We're diversifying."

Now, if this was just one chip deal, one company doing this, I'd say it's probably some PR or publicity stunt, probably inflated numbers, doesn't really matter. But all of the hyperscalers are doing this, or they're at least attempting it. Google has been running inference on their own TPUs for years now; they're on version six. Amazon built the regrettably named Trainium; they're on Trainium 3. I'm told the first two movies weren't very interesting. Meta built MTIA for their own internal workloads, which nobody has ever heard of. Microsoft is building Maia. Nobody's ever heard of it. But my point is, every hyperscaler is at least trying to get off of Nvidia's monopoly.

And you might say, "Dr. J, it's a publicity stunt. It's a publicity stunt. You don't know what you're talking about." And that's not true. The best numbers we have are that these hyperscalers are running about 10 to 15% of their current workloads on custom silicon, whether that's developed in-house, in the case of Google, or with somebody like Cerebras, for OpenAI. And it doesn't sound like a lot. It's just 10%; you can kind of shrug it off. But remember, in these videos on AI infrastructure, we're talking about a proposed spend this year from the hyperscalers alone of about $660 to $690 billion. So yes, 10% of that is a significant amount of money. That's empire-building money.
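
To put a dollar figure on it, here's the quick math, using the capex range and the 10 to 15% share quoted above. These are the video's rough estimates, not audited numbers.

```python
# Rough sizing of hyperscaler AI spend already going to non-Nvidia silicon,
# using the estimates quoted above (not audited figures).
capex_low, capex_high = 660e9, 690e9  # proposed hyperscaler AI spend, USD
share_low, share_high = 0.10, 0.15    # share running on custom silicon

print(f"~${capex_low * share_low / 1e9:.0f}B to ~${capex_high * share_high / 1e9:.0f}B per year")
# -> ~$66B to ~$104B per year
```

Even the low end of that range is tens of billions of dollars a year that is not automatically flowing to Nvidia.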

So, Nvidia still dominates about 90% of the AI accelerator market. But the dominance itself isn't what matters; it's whether that number is moving up or moving down. That's what Wall Street cares about. Are you signaling strength, or are you beginning to lose? And they're beginning to lose out. And part of that is model dynamics. Again, you're not going to understand this if you are not deep into the technicals of it, but model development and training has slowed down a little bit. The frontier models are getting better every year, but we're not seeing the massive gains like we saw with this last generation compared to the previous generation. And so when you're training a model, when you're building a model like Opus 4.6, you need these big, heavy-lifting GPU clusters, and Nvidia has a lock on that market. Specifically, their CUDA infrastructure is the best in class for this. It cannot be rivaled currently.

What we are beginning to see, though, is a shift from an emphasis on training these models to doing more inference, at least as a percentage breakdown. And I can prove this very easily. You watching: how many times have you trained an AI model? Probably never. I'm pretty technical and I'm pretty into this stuff, and I have never trained a full-on AI model from the ground up. I've never done it, and I'm very, very interested in this stuff. I spend a lot of time on it. But I'll tell you what I do use, and what you probably use: you probably use Claude or ChatGPT. You probably Google search, and sometimes you use that little AI summary. All of that is what's called inference: actually using the model to get a result out of it.

And so we've seen, percentage-wise, a big shift. Yes, the labs are still training models, but there's a settled approach for training; people are doing things largely the same way. There are discoveries being made, but a huge share of the effort is now going towards inference, because everybody is using these models and very few people are training them. So Nvidia's moat, which was training, especially with those CUDA drivers, is starting to diminish, and tech like the WSE-3 is starting to dominate in the inference space.

So it begs the question: why does Nvidia have so much market share if most of the use of AI chips is now inference, and they don't do inference as well as other companies? They do training very well. The numbers are bold here, and we'll talk about that in a minute, but Cerebras claims 20 times faster inference than GPU clusters and seven times better performance per watt. Let's break that down. First of all, let's assume those are PR numbers, right? Those are PR numbers, not real numbers, so you've got to discount them. Let's be generous and cut them in half: we'll say 10x faster and 3.5x better performance per watt.

The "faster" part is right on the nose. I mean, yes, it would be great if these models responded faster. I think everyone could agree the biggest use-case issue with a lot of these models is that they take forever to respond. If you're going to use an Opus 4.6, it's correct every time, but it takes a long time to get back to you. And then the performance per watt: you environmentalists might be getting stoked about this, and I am too. I know that AI is pretty bad for the environment. But even from a business perspective, there's an incentive to build more energy-efficient chips, because if you are in the data center building and purveying business, you're paying a boatload in electricity costs to power your data center. And so if you can get those gains by using other chips, and the chips can do it faster, man, that's an obvious choice.
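
Here's a toy model of why performance per watt turns directly into money. Every input here is an illustrative assumption for the sake of the comparison, not a figure from Nvidia or Cerebras.

```python
# Toy model: annual electricity cost to serve a fixed inference workload.
# All inputs are illustrative assumptions, not vendor figures.
fleet_power_kw = 10_000       # hypothetical 10 MW inference fleet
hours_per_year = 24 * 365
usd_per_kwh = 0.10            # assumed industrial electricity rate

baseline = fleet_power_kw * hours_per_year * usd_per_kwh
# At 3.5x better performance per watt, the same workload needs ~1/3.5 the power.
efficient = baseline / 3.5

print(f"baseline fleet:      ${baseline / 1e6:.1f}M per year")
print(f"3.5x perf-per-watt:  ${efficient / 1e6:.1f}M per year")
print(f"annual savings:      ${(baseline - efficient) / 1e6:.1f}M")
```

Scale that hypothetical 10 MW fleet up to the gigawatt-class data centers the hyperscalers are actually building, and that power-bill gap is exactly the kind of line item that drives chip purchasing decisions.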

It's kind of like when the Tesla first came out and had that 0-to-60 time: if you were parked at a light next to a Corvette, some kind of boomer hot rod car, it could just smoke it off the line, and it probably cost half the price of the Corvette. It's that kind of big paradigm shift that we're looking at. The bull case for Nvidia has always been their moat, and right now, the moat is starting to show some cracks.

And it's not like Nvidia was doing really, really well before this. They had something kind of nefarious in their latest quarterly earnings that they tried to cover up. And if you read the earnings, it's really clever how they couch it and try to push it aside, like, no, it's totally not a problem. And that's the China problem. It's an $8 billion problem, to be precise. The issue is that in April 2025, the Trump administration banned the export of H20 chips to China. They partially reversed that at some point over the summer; there's some inside baseball going on there, but it seems like they are going to be able to ship those H20 chips to China only at a much reduced capacity. So, that's taken a huge chunk out of their sales.

And that's not all. It's not just sales on the nose, as if they could turn around and sell those H20s to someone else. The H20s were developed specifically to be neutered and underpowered enough to meet the export regulation guidance. So, if you're an American data center looking to buy some Nvidia chips, it's like, "No, I don't want the H20 chips. They're super underpowered. They're deliberately neutered because they were going to go to China." So, they actually have a lot of these sitting in inventory. Specifically, they're looking at a $4.5 billion inventory charge. They had buyers lined up from Alibaba, Tencent, ByteDance. They had placed orders worth over $16 billion, and all of that, or at least most of it, went up in smoke. So the total revenue impact across 2025 and 2026, like I mentioned, is about $8 billion. That's not a small number for a company, even if they are doing $50 billion quarters. So that future revenue outlook for Nvidia's Chinese business: mm, not looking good.

So, on the inference side, you got people starting to develop their own chips. On the China side, you got people starting to develop their own chips. So going into a Feb 25 earnings, man, I would hate to be Jensen right now. Those earnings will be on Feb 25. The conference call is going to start at 2:00 p.m. Pacific time.

Let me tell you the four signals I'm watching out for going into that call. First: the headline numbers versus what Wall Street is guiding. Literally, if they do not beat, if they miss, nothing else in the call matters or can save them. The stock is going to tank in a bad way. Nvidia always beats. This is one of the most reliable things; it's so reliable you can set your watch to it. They beat. That's what they do. They're Nvidia. And so if they don't beat, right off the bat, everybody's going to tune out. Don't care what you're saying. All I hear is: you're losing, you're losing, and I don't want to invest money in you, and I want my money back.

Second is the guidance. Jensen is in a horrifically bad position here, because he has two options, right? He can lower the guidance compared to expectations, and the stock is going to dip, because it reads as: you're not confident that your company is going to keep growing, at least at the same rate it is now. Or he can guide that it's going to grow in line with historical performance and in line with what the analysts are projecting. And even that might not be enough, given where the stock is trading.

Third, we've got to hear about inference diversification. And I'll let you in on a little secret on this. When I say inference diversification, I mean basically: what are you doing to address the fact that a bunch of these other companies are building their own custom silicon to compete with you, and they're not going to need you for as much inference work anymore? If Jensen gets hit with a question on that, or proactively mentions it, and he gives some answer about how their proprietary CUDA is the best and it's unassailable, they're sunk. They are so sunk. If he mentions CUDA in that response in a favorable way for inference, man, wrecked. Wrecked. He's just trying to block and tackle, and he's banking on the fact that the analysts are too dumb (which might work out, actually) to see past what he's saying.

The only correct answer he can give here is about Blackwell-specific performance. And when I say an answer about Blackwell-specific performance: if he says anything like "might be," "could be," "should," "we expect," any of those weasel words, he's also in trouble. The only correct answer he can give is concrete data on Blackwell performance. Concrete, real data on how Blackwell is going to save them on the inference front. And if that doesn't manifest, you know that their development isn't far enough along to mount a serious threat to these new contenders.

Fourth, watch the data center revenue breakdown. Nvidia is classically shady about how much of their product is going towards inference versus training. But if they mention that inference revenue is growing slower than the overall market for inference, which is my hypothesis, it would all but confirm that the alternatives are winning. The stock has been trading around $188 for six months. Technicals point to maybe $195 to $200 on a beat. But if any of the risk factors I just mentioned hit, the downside is a lot more than $12. I can tell you that right now.

Look, I'm not saying to sell your Nvidia. What I am saying is that the narrative is changing, and they haven't addressed it properly. And I don't think they can address it. The monopoly narrative that got them from $15 to $188 is starting to show some real cracks. OpenAI running their new model on Cerebras is a signal. The $8 billion China hole is a signal. And the hyperscalers building out their own custom silicon across the board is another signal.

I'm going to do a full Nvidia earnings analysis after that earnings call on Feb 25. It's going to pop up on the channel one to two hours after those earnings, as fast as I can edit it and post it. And we're going to have the best analysis on this channel. I know what's going on at the technical level: I've worked in software, I've worked in hardware, I've been an engineering director. And I know what's going on at the financial level as well. I'm going to be able to call out any BS, I'm going to be able to explain what a lot of those technical decisions and guidance mean, and I'm also going to walk you through the financial ramifications of those things. If you want the analysis before anyone else, join us on the newsletter. Link is down there in the description. Thank you for watching.

Summary

OpenAI has launched GPT 5.3 Codex Spark, a real-time coding model that runs on Cerebras's Wafer Scale Engine 3 (WSE-3) instead of Nvidia hardware. This is significant for Nvidia's valuation, especially given its upcoming earnings. Codex Spark generates about 1,000 tokens per second, roughly 10 times faster than other frontier models, and the WSE-3 offers far higher memory bandwidth than Nvidia's H100. The move signals a broader industry trend: hyperscalers are diversifying their compute infrastructure beyond GPUs, particularly for inference workloads, which are growing faster than training workloads. Nvidia's dominance in AI accelerators, built on CUDA and training, is being challenged as the market shifts towards inference, where alternative chips like Cerebras's excel in speed and energy efficiency. Additionally, Nvidia faces an $8 billion "China problem" due to export bans on its H20 chips, leading to inventory charges and lost sales. The speaker advises watching four signals during Nvidia's February 25th earnings call: the headline numbers, the guidance, the response on inference diversification, and the data center revenue breakdown.
