E22: NVIDIA'S HUGE AI Announcements Will Change Everything

Transcript

0:00

I'm excited to share this exclusive

0:01

interview with the investing community.

0:03

Most people think of Nvidia as a

0:05

hardware company that builds chips to

0:07

train massive AI models, but you're

0:09

about to get an inside look at a very

0:11

different side of the story. I'm joined

0:13

by Joe Dalaire, product lead of AI

0:15

infrastructure at Nvidia. Joe spent the

0:18

last 4 years deploying the hardware and

0:20

software behind some of the most

0:22

powerful AI models on the planet, and he

0:24

shared a few surprising insights about

0:26

where AI is headed next. But, that's

0:28

just one of the many technologies that

0:30

I'll be covering live at GTC in a few

0:33

weeks. GTC is Nvidia's massive AI

0:36

conference, showcasing the biggest

0:38

breakthroughs in robotics and

0:39

self-driving cars, AI agents and the

0:41

chips that power them, and a whole lot

0:43

more. And anyone who signs up for a free

0:46

online session at GTC with my link can

0:48

win an Nvidia RTX 5090 graphics card.

0:52

Just attend any session, take a

0:53

screenshot as proof, and send it to me

0:56

after the conference using the links

0:57

below. GTC should be on every investor's

1:00

radar, and so should Nvidia's ecosystem

1:03

for AI inference. Your time is valuable,

1:05

so let's get right into it.

1:08

I'm so happy to be here with you. Thanks

1:09

for taking the time, by the way.

1:10

>> Okay. Jensen talked about a lot of

1:12

awesome things at the keynote, and one

1:14

of the things that he talked about in

1:16

detail is that Nvidia actually

1:18

co-designed six different chips for the

1:21

Vera Rubin generation.

1:23

That's a lot to go through, so I'd love

1:24

to go through all of it with you,

1:26

starting from the GPU itself and working

1:28

all the way up to the rack-scale system

1:31

level, if that's okay. So, let's

1:33

just start with Rubin itself. What's

1:35

the difference between Blackwell and

1:37

Rubin?

1:38

Oh, so there's several different things

1:40

about Rubin that are different than

1:42

Blackwell. So, we have the six chips

1:44

that you talked about. Uh all of it

1:46

co-designed together. So, what we did is

1:48

we looked at the data center

1:49

requirements,

1:50

and we worked our way backwards and

1:51

said, "What do we need in all these six

1:53

different chips to make sure that we get

1:54

the best performance, the best energy

1:57

efficiency, the lowest cost?" Yeah. So,

1:59

the fundamental thing about

2:02

Rubin is this extreme co-design. All

2:05

these chips

2:06

manufactured together, designed

2:08

together, working in concert for the

2:11

best performance.

2:11

>> And when you say you looked at the data

2:13

center requirements, are those being

2:15

driven by AI models today? Or what like

2:17

what's driving those requirements?

2:18

>> Absolutely. Models are definitely the

2:19

thing that are driving this compute

2:21

demand. And MOE models in particular,

2:24

mixture of experts, where they're

2:26

generating many, many tokens, factors

2:29

more tokens because of the reasoning

2:31

that they do.

2:33

Also, the model sizes are growing as

2:35

well. So, they're getting more

2:36

intelligence from model size, from

2:38

reasoning.

2:40

So, that is just generating a tremendous

2:42

amount of

2:43

compute demand. And Rubin is designed to

2:46

address that. Got it. So, talk to me

2:48

about the difference between Blackwell

2:50

and Rubin, the GPUs specifically in

2:52

terms of power and performance.

2:55

So, in terms of power and performance,

2:57

for inference workloads, we will see up

3:00

to 10x better performance on Rubin

3:03

versus Blackwell.

3:04

Wow. 10x performance per watt. So, that

3:07

means that, at a given

3:09

fixed latency, you can see with those

3:12

Pareto charts that we've shown in

3:13

Jensen's keynotes, at a particular

3:16

latency, a very, you know, [low] latency,

3:18

that's very

3:19

good for users of the model. So, yeah,

3:22

the 10x performance is across the rack

3:24

scale. Is it at the rack scale?
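The Pareto charts referred to here trade per-user responsiveness against throughput per unit of power. As a rough illustration only, the sketch below picks the frontier from a set of made-up operating points and compares two generations at a fixed latency budget. Every number, and the names `pareto_frontier` and `best_at_latency`, are hypothetical; nothing here is taken from Nvidia's published charts.

```python
from typing import List, Optional, Tuple

# Each point is (latency_ms_per_token, tokens_per_second_per_megawatt).
# All values below are invented for illustration.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not beaten on both axes by any other point
    (lower latency is better, higher throughput per megawatt is better)."""
    frontier: List[Point] = []
    best_throughput = float("-inf")
    # Walk from lowest to highest latency; on ties, best throughput first.
    for latency, throughput in sorted(points, key=lambda p: (p[0], -p[1])):
        if throughput > best_throughput:
            frontier.append((latency, throughput))
            best_throughput = throughput
    return frontier


def best_at_latency(points: List[Point], max_latency_ms: float) -> Optional[float]:
    """Best throughput per megawatt among points that meet the latency budget."""
    eligible = [throughput for latency, throughput in points if latency <= max_latency_ms]
    return max(eligible) if eligible else None


older_gen = [(10.0, 1.0), (20.0, 2.5), (30.0, 2.0), (40.0, 4.0), (40.0, 3.0)]
newer_gen = [(10.0, 10.0), (20.0, 25.0), (40.0, 36.0)]

print(pareto_frontier(older_gen))  # dominated points are dropped
gain = best_at_latency(newer_gen, 20.0) / best_at_latency(older_gen, 20.0)
print(f"gain at a 20 ms budget: {gain:.1f}x")
```

Comparing at a fixed latency budget matters because a system can always trade responsiveness for throughput, so a gain quoted without the latency point is hard to interpret.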

3:25

>> Rack scale architecture. So, here we

3:27

have the Blackwell Ultra generation

3:29

compute tray. And I can show you what

3:32

what we have here in terms of the

3:33

components and their breakdown.

3:35

So, we have two superchips. Two

3:38

superchips, okay.

3:38

>> Superchips have

3:40

two Blackwell Ultra GPUs,

3:43

and then one Grace CPU on one

3:46

superchip, and then there's two of them

3:47

together, so four GPUs, two CPUs.

3:51

We also have ConnectX-8 SuperNICs

3:54

that are also part of this superchip. Uh

3:57

and that's going to be an important

3:58

distinction when we talk about Vera

3:59

Rubin later and how those have been

4:01

moved. Um but yeah, you can see that

4:04

this is hybrid cooled. Okay.

4:06

>> So, these are cold plates doing the

4:07

liquid cooling on the superchips and all

4:09

their components. And then on the bottom

4:11

half of the tray, or the front half,

4:14

uh I should say, this is air cooled. So,

4:16

these are all What I'm actually looking

4:18

at is the tops of all fans, right?

4:20

>> Eight fans here.

4:21

>> Got it. So, eight fans and then we have

4:23

a uh BlueField DPU uh that is part of

4:26

this tray as well. That is for the

4:29

north-south traffic uh connecting the

4:31

storage, getting the data into the

4:35

compute rack so that it feeds the

4:38

GPUs. Got it. So, yeah, the DPU

4:40

brings data in and out. Yes. And then

4:42

all the magic happens in

4:44

the superchips themselves. Got it. So,

4:46

there's two kinds of network traffic.

4:48

North-south is inside the same rack.

4:51

East-west is connecting multiple racks.

4:53

Is that how we should think about it?

4:55

>> That's the proper way to think of

4:56

it, yes. Yes.

4:58

I thought NVIDIA was just a GPU

5:00

designer, but Grace is a CPU, right? So,

5:03

what does the CPU do?

5:05

So, the CPU handles a lot of the

5:06

management. So, for example, like when

5:09

you're doing

5:10

you're trying to use inference and

5:12

you want your model to make some

5:15

code for you. And you want it to make

5:17

maybe it makes a little application, a

5:18

Python application. It needs to run

5:20

that. Grace CPU can actually run that

5:22

application. The GPU wouldn't run an

5:24

application that's generated by by a

5:26

model.
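The hand-off described here, where a model writes a small Python program and a CPU runs it, can be sketched as follows. This is a minimal illustration under stated assumptions, not Nvidia's software stack: `run_generated_code` is a hypothetical helper, and the "generated" source is a hard-coded stand-in for model output.

```python
import os
import subprocess
import sys
import tempfile


def run_generated_code(source: str, timeout_s: float = 30.0) -> str:
    """Write model-generated Python to a temp file, run it in a separate
    CPU process, and return whatever it printed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,  # raise if the generated program exits with an error
        )
        return completed.stdout
    finally:
        os.unlink(path)


# Stand-in for text a model might have produced.
generated_source = "print(sum(range(5)))"
print(run_generated_code(generated_source).strip())
```

A production system would presumably run such code inside a sandbox with resource limits; the sketch only shows that this step is ordinary general-purpose execution, which is CPU work rather than GPU work.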

5:27

Um but it is also doing uh other kinds

5:30

of things like database analytics and

5:33

those types of functions that are more

5:34

CPU-friendly.

5:36

Uh it's able to accelerate those types

5:37

of Oh, so really the whole idea is kind

5:40

of like you have the GPUs do what

5:41

they're the best at, then you have the

5:43

CPU to do things obviously that CPUs are

5:46

much better than GPUs at, so that you can

5:48

sort of spread out the work over the

5:50

right chip for the job, right?

5:52

>> You also mentioned something called a

5:53

DPU. Can you walk us through what a DPU

5:56

>> So, DPU, BlueField DPU, data processing

5:58

unit. That's going to handle some of the

6:00

north-south traffic. North-south

6:02

traffic, yep. And when you're connected

6:04

to storage that's on a different rack,

6:07

there's going to be compression,

6:09

encryption,

6:10

that's all going to be managed by the

6:12

DPU that we have, BlueField

6:14

3. And the goal for that is just to make

6:16

sure the CPU and the GPU aren't doing

6:18

those things. That's correct.

6:20

Offloading all those functions from the

6:22

CPU and the GPU, accelerating those

6:24

functions in hardware,

6:26

so that you get the fastest data access

6:28

to feed the GPUs. That makes a lot of

6:31

sense. Okay, so those are three of the

6:33

six chips so far, right? The CPU, the

6:36

GPU, and the DPU. And the

6:38

ConnectX.

6:39

>> Yeah, talk to me a little more about

6:40

that.

6:40

>> ConnectX-8, this is your east-west

6:42

connectivity. So, this is your SuperNIC

6:45

for connecting east-west. It also has

6:48

in-line encryption, those types of

6:50

functions for the east-west traffic

6:52

that's going to be connecting between

6:54

rack to rack of GPU racks. Got it. So,

6:56

we have the GPU, the CPU, the DPU, and

7:01

the ConnectX on this board. That's

7:02

correct. Where are the other two chips?

7:04

So, the NVLink switch is the other

7:07

chip. And there are two here on

7:11

this switch tray.

7:12

This is NVLink 5 or the fifth generation

7:15

of NVLink.

7:16

And these are communicating to

7:18

the NVLink network at 1,800 GB per

7:21

second. 1,800 GB

7:22

>> 1.8 TB per second. So,

7:25

very high speed,

7:28

and that's really going to be the

7:30

the central nervous system of a

7:33

Blackwell GB200 NVL72. Got it. So,

7:37

these are two completely different

7:38

trays, right? So, this compute

7:40

tray, that's where the magic happens in

7:42

terms of crunching the numbers. And then

7:44

this is the switch tray, which I think

7:46

you mentioned earlier is all about just

7:48

connecting all the GPUs together. So, it

7:50

connects all the GPUs together. Uh

7:53

there's several of these trays within a

7:54

rack. Yeah. Uh all the GPUs are 72 GPUs.

7:58

They have 72 GPUs in a rack. And it's

8:01

all-to-all connectivity. So, every GPU

8:03

has to be able to talk to every other

8:04

GPU at full bandwidth. And that's what

8:08

the switches achieve. So, 1.8 terabytes

8:11

per second, any GPU talking to any other

8:12

GPU. Is that why it's called a compute

8:15

fabric? Like when I think when I draw a

8:16

network diagram of That's Okay, got it.

8:19

So. So, yeah. They call it a compute

8:21

fabric not just because it's connecting

8:23

all the GPUs to each other. There's also

8:26

some compute functions in our NVLink

8:29

switch chips. So, we call that all

8:31

reduce or collective operations where in

8:34

training when certain operations need to

8:36

be shared across the network, instead of

8:39

sending it to all the GPUs, it will do

8:41

some of those operations within the

8:42

switch. Oh, wow. Okay, so the switch

8:44

isn't just connecting things, it's

8:46

actually also doing some Some

8:48

computation as well. That's awesome.
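Why doing part of a collective operation inside the switch helps can be seen by counting transfers. The sketch below is a toy model, not the NVLink switch's actual protocol: it compares a deliberately naive exchange, in which every GPU sends its values to every other GPU, against a switch that sums once and returns the result. The function names are hypothetical, and real collective libraries use ring or tree schemes that are far better than this naive baseline.

```python
from typing import List, Tuple

Vector = List[float]


def allreduce_via_gpus(gpu_values: List[Vector]) -> Tuple[List[Vector], int]:
    """Naive baseline: each GPU sends its vector to every other GPU,
    then each GPU computes the element-wise sum locally."""
    n = len(gpu_values)
    transfers = n * (n - 1)
    summed = [sum(column) for column in zip(*gpu_values)]
    return [list(summed) for _ in range(n)], transfers


def allreduce_via_switch(gpu_values: List[Vector]) -> Tuple[List[Vector], int]:
    """In-switch reduction: each GPU sends its vector to the switch once,
    the switch sums, and one result goes back to each GPU."""
    n = len(gpu_values)
    summed = [sum(column) for column in zip(*gpu_values)]
    transfers = 2 * n  # n uploads plus n downloads
    return [list(summed) for _ in range(n)], transfers


# 72 GPUs, each holding a two-element vector (values chosen arbitrarily).
values = [[float(i), 1.0] for i in range(72)]
naive_result, naive_transfers = allreduce_via_gpus(values)
switch_result, switch_transfers = allreduce_via_switch(values)

print(naive_result[0], naive_transfers)
print(switch_result[0], switch_transfers)
```

Both paths produce the same sums on every GPU; only the number of vector transfers differs, which is the point being made about the switch doing some of the computation.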

8:51

Okay, so I think we've covered five of

8:53

the chips now, right? Is that correct?

8:55

That's correct. What's the sixth

8:57

chip? Six is the Spectrum-X.

9:00

Can we try to take a look at

9:01

those racks? Yeah, let's go take a look.

9:06

There's 10 trays up top. Those are the

9:08

compute trays. Nine networking trays.

9:11

Nine NVLink switch trays, I should say.

9:13

And their job is to connect all the GPUs

9:16

in the 10 above and the eight below

9:19

compute trays together, right? That's

9:21

correct. So, what's up there then?

9:24

So, that is the top-of-rack 1

9:27

gigabit switch for telemetry. That's

9:29

telemetry? That's just telemetry. It's

9:31

just system management, managing

9:33

functions. It's low-speed Ethernet. It's

9:35

just a management

9:37

system for the rack itself.

9:39

It's not processing the compute data for

9:41

AI.

9:42

>> It's managing if a GPU goes down. It's

9:44

like Help me understand what telemetry

9:46

means and what that

9:47

>> Telemetry means like I'm just looking at

9:48

the the functions of the rack itself.

9:50

I'm looking at its uptime. I'm looking

9:53

at

9:53

>> Health and status, I guess.

9:54

>> Health and status checking, yes.

9:56

Diagnostics would also

9:58

>> And you mentioned that there's another

9:59

kind of rack that would sit next to

10:01

this.

10:02

So yeah, you will have your group

10:05

of compute racks, GB200 compute racks,

10:08

and then you would also have racks

10:09

dedicated to Spectrum-X east-west

10:12

network switches.

10:14

We don't have that here, but

10:16

that's how it would function.

10:18

We call it a pod. You have

10:21

maybe eight GB200 racks, and then you'll

10:24

have a few

10:26

switch racks with Spectrum-X.

10:27

>> Yeah. So that's a great overview of the

10:30

Blackwell system, right?

10:31

>> That's right. Now, I want to understand

10:33

what things changed from Blackwell

10:36

to Rubin. Okay. Can we

10:39

go over there and look at that?

10:39

>> At the trays.

10:42

So this is Looking at the components up

10:44

here on the wall,

10:46

we talked about in the compute tray, the

10:47

BlueField DPU, BlueField 4. So that

10:50

There you can see it on the wall. The

10:52

board is part of the module

10:55

system that slides in and out of the

10:57

compute tray for serviceability.

10:59

And then, likewise, the ConnectX-9 is

11:02

there in the middle.

11:03

And there's two ConnectX-9s that are on

11:06

that board

11:07

for a total of eight in every compute

11:10

tray. So every GPU is fed 1.6 terabits

11:14

per second for the ConnectX-9s.
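The figures just quoted fit together with a little arithmetic. The sketch below assumes four GPUs per compute tray (two superchips with two GPUs each, as described for the earlier tray); that count and the derived per-NIC rate are inferences from the numbers in the conversation, not stated specifications. Note that these network figures are in bits, while the NVLink figures elsewhere in the conversation are in bytes.

```python
NICS_PER_TRAY = 8          # "a total of eight in every compute tray"
GPUS_PER_TRAY = 4          # assumption: two superchips x two GPUs each
PER_GPU_GBIT_PER_S = 1600  # "1.6 terabits per second" per GPU


def network_budget(nics_per_tray: int, gpus_per_tray: int, per_gpu_gbit_per_s: int) -> dict:
    """Derive per-NIC and per-tray figures from the quoted per-GPU rate."""
    nics_per_gpu = nics_per_tray // gpus_per_tray
    return {
        "nics_per_gpu": nics_per_gpu,
        # Implied by the quoted numbers, not a stated spec:
        "implied_gbit_per_s_per_nic": per_gpu_gbit_per_s // nics_per_gpu,
        "tray_total_gbit_per_s": per_gpu_gbit_per_s * gpus_per_tray,
        # 8 bits per byte, for comparison with byte-based NVLink figures:
        "per_gpu_gbyte_per_s": per_gpu_gbit_per_s // 8,
    }


budget = network_budget(NICS_PER_TRAY, GPUS_PER_TRAY, PER_GPU_GBIT_PER_S)
for key, value in budget.items():
    print(f"{key}: {value}")
```

Under these assumptions each GPU is served by two SuperNICs, and the east-west rate per GPU works out to 200 gigabytes per second once converted from bits.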

11:16

And then we have the

11:18

the Spectrum-X photonics co-packaged

11:22

optics. This is really, really cool.

11:24

Yeah, what is that? So instead of having

11:27

SFP pluggable modules for the

11:30

optics, they're actually built onto the

11:33

chip itself. Okay.

11:34

with it. So, this has a huge gain in

11:37

energy efficiency and reliability,

11:40

factors more in terms of

11:42

those two factors.

11:43

>> So, before we would have fiber optic

11:45

transceivers. The fiber optic optical

11:47

transceivers.

11:48

>> Yeah. So, the fiber optic cables on

11:50

either end, and those transceivers have

11:52

lasers in them. That's correct.

11:54

>> That need power, right? Like and that's

11:56

what you're getting rid of. And we're

11:58

packaging them with the

12:00

chip.

12:01

>> What does that actually mean in terms of

12:02

like performance or power gains?

12:04

So, in terms of performance, the

12:06

performance would be the same. But, it's

12:08

going to be the power reduction

12:11

and the reliability improvement. Cuz

12:14

those pluggable lasers can be very,

12:17

you know, sometimes very unreliable.

12:19

They have to be swapped out very

12:20

frequently. But, if it's co-packaged

12:22

here on the chip, the reliability

12:25

goes up, like, I think 10x better

12:28

reliability.

12:29

>> So, it's a huge difference. And where in

12:30

the rack does that live?

12:32

So, that would be in its own switch tray

12:35

or a switch server. And that's a

12:37

separate rack.

12:38

>> That's the side rack, right?

12:39

>> That's the separate rack that's separate

12:41

from the NVL72. So, that's the

12:43

east-west traffic switch rack.

12:46

Awesome. So, Quantum-X, there's also

12:50

for InfiniBand, which is

12:52

an alternative to Ethernet. There's also

12:54

a co-packaged optics for Quantum

12:56

InfiniBand as well.

12:57

>> So, those two chips are equivalent. One

12:59

is for Spectrum-X Ethernet, one is for

13:01

Quantum InfiniBand.

13:02

>> That's correct. And then you also have a

13:04

Spectrum-X Ethernet photonics switch.

13:06

So, the co-packaged

13:08

optics chip is in there in the Ethernet

13:11

photonics switch. So, that's where the

13:13

photonics part is, the co-packaged

13:15

optics. Got it. But these go in

13:17

the sidecar. These go into switch

13:20

racks. Yeah. Got it.

13:22

As well as that one, right? If you're

13:24

doing Quantum InfiniBand as your

13:26

east-west traffic protocol, then you

13:28

would use the InfiniBand as a side rack.

13:31

>> So, these are Sorry. These are

13:33

equivalents. One for InfiniBand, one for

13:35

Ethernet, right? Correct. Got it. Yeah,

13:38

that's right. So, what we kind of just

13:39

talked about is what I would say is the

13:41

current state of the art for data

13:43

centers, right? Blackwell Ultra is the

13:46

one that's sort of the best in class in

13:47

data centers right now. And then Jensen

13:50

announced Vera Rubin, the six chips we

13:52

just talked about. We talked about the

13:54

Blackwell versions. This is a

13:55

substantially different compute tray

13:57

than the one we just saw. Can you walk

13:59

us through all the differences? Oh,

14:01

yeah, there's plenty. So,

14:03

uh what we've done is overall, it's a

14:05

modular design. Okay. So, that means

14:07

that there are bays here and these

14:10

can just slide out and slide in and just

14:13

lock and latch. So, there's not a bunch

14:15

of wires and cabling to do all the

14:17

connectivity between all the components

14:20

that are on the tray.

14:22

Also, the hosing as well, that's been

14:24

streamlined.

14:25

>> Yeah. So, there's a manifold in

14:26

the middle,

14:28

and it manages a lot of the

14:30

distribution of liquid. So, overall on

14:33

the GB300, there were 43 hoses. There was

14:36

a bay of fans here cuz it was a hybrid

14:39

cooled. The bottom half of GB300

14:41

was fan cooled.

14:43

We have eliminated that

14:45

because we're 100% liquid cooled now.

14:48

So, eight fans goes to zero fans, zero

14:51

hoses.

14:52

And then there's a bunch of cables that

14:53

have been removed as well. Uh so, it's

14:56

cable free. So,

14:58

this

14:59

I'm trying to even piece together what

15:01

I'm looking at. So, these would be where

15:02

the two superchips were in the last

15:04

generation.

15:04

>> The superchips. They slide in and out.

15:06

They latch in.

15:07

So, you have the two Rubins, one

15:10

Vera on them. So, one other important

15:13

point is because it's modular now and we

15:16

have all these bays that slide in and

15:18

out and it's all connectivity with

15:20

connectors instead of cabling,

15:22

putting this together and doing assembly

15:24

on it is like 20 times faster.

15:26

>> Sure. So, something that would take 2

15:28

hours to assemble the GB300 rack, now

15:30

you can do in 5 minutes on this

15:33

particular rack.

15:34

>> And that's And that's just assembly,

15:35

right? Like if I have a maintenance

15:36

issue and I need to

15:37

>> for maintenance, right? The

15:39

The amount of speed that you can do

15:41

serviceability increases that manyfold

15:44

as well.

15:45

No, it makes a ton of sense, right? If I

15:46

don't have all these wires and hoses, I

15:48

can just snap things out, fix fix

15:51

whatever the issue is, snap it back in.

15:53

And it's modular like so we'll talk

15:55

about some of the other pieces down

15:56

here. So, two superchips, Rubin, Vera.

16:00

We also have the CX-9s, ConnectX-9, the

16:03

next generation of that SuperNIC,

16:05

over on these boards, in modules. So,

16:09

before they were connected to the bottom

16:10

of the superchip on GB300, but now

16:13

they're their own module and cards slide

16:15

in and out. So, you can service

16:17

different components now separately.

16:19

Yeah.

16:20

>> And then BlueField 4, the new generation

16:23

of the DPU, is also a module here that

16:25

slides in and out. Got it. So, this is

16:28

not just about performance, it's also

16:31

about more uptime, right? So, that's

16:33

another multiplier on the overall output

16:36

of an AI factory is how much uptime you

16:38

We call that goodput. Like, you want the

16:41

amount of time that you're

16:43

actually producing tokens, you want to

16:44

maximize that.
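Goodput, as described here, is throughput discounted by the time the system is not producing tokens. The sketch below is a back-of-the-envelope model with hypothetical inputs: the repair rate and peak throughput are invented, and the 2-hour versus 5-minute durations are the assembly times quoted earlier, reused here as stand-ins for service time on the speaker's suggestion that serviceability speeds up by a similar factor.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def uptime_fraction(repairs_per_month: int, minutes_per_repair: float) -> float:
    """Share of the month the tray is available, given its repair downtime."""
    downtime = repairs_per_month * minutes_per_repair
    return 1.0 - downtime / MINUTES_PER_MONTH


def goodput(peak_tokens_per_s: float, uptime: float) -> float:
    """Tokens per second actually delivered, averaged over the month."""
    return peak_tokens_per_s * uptime


REPAIRS_PER_MONTH = 10         # hypothetical
PEAK_TOKENS_PER_S = 1_000_000  # hypothetical

cabled_uptime = uptime_fraction(REPAIRS_PER_MONTH, 120)  # 2 hours per intervention
modular_uptime = uptime_fraction(REPAIRS_PER_MONTH, 5)   # 5 minutes per intervention

print(f"service speedup: {120 / 5:.0f}x")
print(f"cabled uptime:  {cabled_uptime:.4f} -> {goodput(PEAK_TOKENS_PER_S, cabled_uptime):,.0f} tokens/s")
print(f"modular uptime: {modular_uptime:.4f} -> {goodput(PEAK_TOKENS_PER_S, modular_uptime):,.0f} tokens/s")
```

The ratio of the two quoted durations is 24, consistent with the "like 20 times faster" remark; the uptime effect depends entirely on how often interventions happen, which the conversation does not say.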

16:45

>> Yeah.

16:46

That makes sense. So, okay, this is the

16:48

equivalent compute tray. That's right.

16:51

And then there's also an equivalent

16:52

switch tray, right? That's correct. And

16:54

this looks a lot more streamlined, too.

16:56

So, walk me through the changes here.

16:58

So, in terms of the changes here,

17:00

you know, we have the switches at

17:02

the top, 100% liquid cooled. There's

17:04

four switch chips. This is NVLink 6.

17:07

Okay.

17:07

>> Sixth generation NVLink, twice the speed

17:10

of what we had in Blackwell.
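Doubling the per-GPU NVLink figure quoted earlier (1,800 GB per second) is simple to check, along with the unit conversion. The per-rack totals below are nothing more than the per-GPU figure multiplied by 72 GPUs; they are an illustrative sum, not a measured or officially quoted number.

```python
NVLINK5_GB_PER_S = 1800  # per GPU, as quoted for the Blackwell generation
GPUS_PER_RACK = 72       # the "72" in NVL72


def gb_to_tb(gigabytes: float) -> float:
    """Convert gigabytes to terabytes (decimal units)."""
    return gigabytes / 1000


nvlink6_gb_per_s = 2 * NVLINK5_GB_PER_S  # "twice the speed"

for name, per_gpu in (("NVLink 5", NVLINK5_GB_PER_S), ("NVLink 6", nvlink6_gb_per_s)):
    rack_total = per_gpu * GPUS_PER_RACK
    print(
        f"{name}: {gb_to_tb(per_gpu):.1f} TB/s per GPU, "
        f"{gb_to_tb(rack_total):.1f} TB/s summed over {GPUS_PER_RACK} GPUs"
    )
```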

17:11

>> Twice the speed. Wow. So, now it's 3.6

17:14

terabytes per second. And that's just

17:16

going to help us with that

17:18

performance I talked about, 10x

17:20

performance per watt or per megawatt per

17:22

gigawatt, whatever value you want.

17:25

The increase in NVLink speed is

17:29

part of what contributes to that, along

17:31

with some other GPU features that we can

17:33

talk about as well. And so, is

17:36

it the same number of total GPUs in a

17:39

Blackwell rack versus a Rubin rack?

17:41

>> It is. So, they're NVL72s. The 72

17:44

signifies the GPU count. So, GB200 NVL72

17:49

and now we have Vera Rubin NVL72.

17:51

Same GPU count. Um, and it also makes it

17:53

so it's very compatible for our

17:55

customers to move from one to the

17:56

other.

17:57

Uh, and that's part of the goal of

17:59

having the same GPU count, same kind of

18:02

MGX rack architecture.

18:04

So, that just makes it easier

18:06

for our customers. The ecosystem has, you

18:09

know, been working with these racks for

18:12

two generations now. Now we have a third

18:13

generation. They're just going to be

18:15

able to work very fast and deploy uh, at

18:17

a very high rate with our end customers.

18:19

>> No, it makes total sense. Okay, can we

18:21

go look at a Vera Rubin rack now?

18:23

>> Yes.

18:25

So, this is the Vera Rubin. Uh, this is

18:27

the Vera Rubin NVL72 rack. You can see

18:30

that there's, you know, it's very

18:31

similar in form and in look to the

18:35

GB200. The biggest

18:38

difference is on the compute trays

18:40

you'll see there's no vents. So, there

18:43

were vents on the GB200 cuz the bottom

18:45

half of the compute tray still had fans.

18:48

Okay, yeah. And then we got rid of those

18:50

fans. It's all 100% liquid cooled on the

18:52

compute trays. That's why you see in the

18:54

face plate you don't see those vents

18:55

anymore. Got it. Uh, but overall still,

18:58

you know, still the nine uh, switch

19:00

trays,

19:01

still 10 compute trays on top and the

19:04

eight on the bottom. Same kind of still

19:06

telemetry on top. Still top of rack

19:08

telemetry with the one gig switch on

19:10

top.
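The tray counts given for both racks add up to the 72 in the NVL72 name. A small sketch of that bookkeeping follows; the class and field names are hypothetical, and the per-tray figures are the ones mentioned in the conversation (two superchips per compute tray, two GPUs and one CPU per superchip, two NVLink switch chips per Blackwell switch tray, four per Rubin switch tray).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLayout:
    """Tray-level description of an NVL72-style rack (illustrative only)."""
    name: str
    switch_chips_per_tray: int
    compute_trays_top: int = 10
    compute_trays_bottom: int = 8
    switch_trays: int = 9
    superchips_per_tray: int = 2
    gpus_per_superchip: int = 2
    cpus_per_superchip: int = 1

    @property
    def compute_trays(self) -> int:
        return self.compute_trays_top + self.compute_trays_bottom

    @property
    def gpus(self) -> int:
        return self.compute_trays * self.superchips_per_tray * self.gpus_per_superchip

    @property
    def cpus(self) -> int:
        return self.compute_trays * self.superchips_per_tray * self.cpus_per_superchip

    @property
    def switch_chips(self) -> int:
        return self.switch_trays * self.switch_chips_per_tray


blackwell = RackLayout("GB200 NVL72", switch_chips_per_tray=2)
rubin = RackLayout("Vera Rubin NVL72", switch_chips_per_tray=4)

for rack in (blackwell, rubin):
    print(f"{rack.name}: {rack.compute_trays} compute trays, {rack.gpus} GPUs, "
          f"{rack.cpus} CPUs, {rack.switch_chips} NVLink switch chips")
```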

19:11

>> Now here's the big question, right? From

19:13

Blackwell to Rubin

19:15

at the rack level, talk to me about the

19:17

performance gains at the rack level.

19:20

>> Performance gain at rack level is the

19:22

10x. 10x?

19:24

>> The 10x more tokens per second per

19:27

megawatt or per watt,

19:29

that's going to be a rack level

19:31

kind of performance metric. And

19:33

that's with a mixture of expert model,

19:35

something like Kimi K2 Thinking,

19:37

which is a very large model, over a

19:39

trillion parameters. And that is

19:42

going to fit and be optimized in a

19:44

single rack and, thanks to the

19:48

NVLink switch, the experts in a mixture

19:51

of experts model are distributed across

19:54

the 72 GPUs. And that can give factors

19:58

more performance in tokens per second.
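Spreading the experts of a mixture-of-experts model across 72 GPUs can be pictured with a toy placement and routing loop. This is a sketch under loud assumptions: the expert count, the top-k value, and the round-robin placement are hypothetical, and a seeded random choice stands in for the learned router that a real model uses. It is not a description of any specific model or of Nvidia's software.

```python
import random
from collections import Counter
from typing import Dict


def place_experts(num_experts: int, num_gpus: int) -> Dict[int, int]:
    """Round-robin placement: expert e lives on GPU e mod num_gpus."""
    return {expert: expert % num_gpus for expert in range(num_experts)}


def route_tokens(num_tokens: int, num_experts: int, top_k: int,
                 placement: Dict[int, int], seed: int = 0) -> Counter:
    """Send each token to top_k distinct experts and count work per GPU.
    A seeded random pick stands in for a learned gating network."""
    rng = random.Random(seed)
    load: Counter = Counter()
    for _ in range(num_tokens):
        for expert in rng.sample(range(num_experts), top_k):
            load[placement[expert]] += 1
    return load


NUM_GPUS = 72      # GPUs in one NVL72 rack
NUM_EXPERTS = 288  # hypothetical: four experts per GPU
TOP_K = 8          # hypothetical number of experts consulted per token

placement = place_experts(NUM_EXPERTS, NUM_GPUS)
load = route_tokens(1000, NUM_EXPERTS, TOP_K, placement)

print("experts per GPU:", Counter(placement.values())[0])
print("busiest GPU handled", max(load.values()), "expert calls")
print("total expert calls:", sum(load.values()))
```

Because every token may need experts that live on different GPUs, each token's activations cross the GPU-to-GPU links several times per layer, which is why the all-to-all NVLink bandwidth inside the rack is emphasized for this kind of model.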

20:00

So, here we have the Kyber rack. So,

20:02

this would be for the Rubin Ultra

20:04

generation. Subsequent to Rubin, which

20:06

is a 2026 product, in 2027 we'll have

20:10

Rubin Ultra.

20:11

So, that's going to be a different rack

20:13

architecture than we've had for the

20:14

previous three generations. Uh we're

20:17

putting much more compute. Yeah, I'm

20:19

noticing a lot more trays in this one.

20:23

So, we have

20:24

18 compute trays in each of these

20:27

canisters. So, there's four canisters,

20:29

up to 72 GPUs in each of the canisters.

20:32

So, you would have 288. 288 So, moving

20:35

from 144 to 288 or is that 72?

20:39

>> 72

20:40

>> to 288. Okay, so it's a 4x increase in

20:42

GPUs.
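The Kyber numbers mentioned here can be cross-checked the same way. The sketch uses only the counts stated in the conversation (four canisters, 18 compute trays per canister, up to 72 GPUs per canister); the GPUs-per-tray value is derived from those, not quoted, and the function name is hypothetical.

```python
CANISTERS_PER_RACK = 4
TRAYS_PER_CANISTER = 18
GPUS_PER_CANISTER = 72  # "up to 72 GPUs in each of the canisters"
NVL72_GPUS = 72         # GPU count of the current rack, for comparison


def kyber_counts(canisters: int, trays_per_canister: int, gpus_per_canister: int) -> dict:
    """Derive rack-level totals from the per-canister figures."""
    total_gpus = canisters * gpus_per_canister
    return {
        "gpus_per_tray": gpus_per_canister // trays_per_canister,  # derived
        "total_compute_trays": canisters * trays_per_canister,
        "total_gpus": total_gpus,
        "ratio_vs_nvl72": total_gpus // NVL72_GPUS,
    }


counts = kyber_counts(CANISTERS_PER_RACK, TRAYS_PER_CANISTER, GPUS_PER_CANISTER)
for key, value in counts.items():
    print(f"{key}: {value}")
```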

20:42

>> So, each of these canisters, the four I

20:44

talked about, is equivalent to the whole

20:46

rack over here.

20:48

>> So, there's four racks worth of GPUs

20:50

>> racks of NVL72s worth of compute in

20:53

here. So, very high compute density.

20:56

>> Yeah. Um and that's why the architecture

20:58

is different. It's a blade type of

21:00

architecture rather than a tray

21:01

architecture. So, we have 18

21:05

compute blades in each of the

21:08

canisters. Excuse me, sorry. These are

21:09

all compute then? This is all compute on

21:11

the front.

21:12

>> Yeah. On the back is where the switch

21:15

blades are. For the NVLink

21:17

connectivity. Got it. And so, that's

21:20

What is the performance leap that you

21:23

guys are expecting from Rubin to Rubin

21:26

Ultra in the Kyber rack?

21:29

So, we haven't given any of the

21:31

performance yet on Rubin Ultra, but it's

21:32

going to be

21:34

factors more performance as usual

21:36

between our generations. Just because

21:38

you're going to have performance

21:40

increases at the chip level, at the

21:41

superchip level, at the rack level, and

21:44

you're going to have four times as many

21:45

>> Extreme co-design all over again, right?

21:47

Extreme co-design, all the chips being

21:50

designed for greater performance,

21:52

working in concert,

21:53

being designed from scratch together.

21:55

Are we expecting extreme co-design of

21:58

all six chips for every generation from

22:00

now on? We should expect to see six new

22:03

chips? So, for every generation, there's

22:05

going to be a new generation of GPU for

22:07

every year.

22:08

Now, whether all six are going to be

22:10

co-designed every year, that's

22:13

probably not going to be the case, but

22:15

you're going to see, for the

22:17

flagship start of each generation

22:18

like Rubin, six new chips,

22:21

some other new chips that go with Rubin

22:23

Ultra, but not the entirety of all six.

22:26

>> we might see the Vera CPU, but the Rubin

22:29

Ultra GPU.

22:30

>> Exactly.

22:30

>> Got it. Exactly. Got it. Yeah. I'm super

22:33

excited for this. I can't wait to see

22:34

what this looks like. When can we

22:36

expect to learn a little more about

22:38

this? Is this something that we'll learn

22:39

about this year, next year?

22:42

So, yeah, it'll be something that Jensen

22:44

talks about, you know, in the

22:45

coming year. Uh,

22:47

I don't have a specific date, but yeah.

22:49

I'm super excited for it, man. What are

22:51

you looking forward to the most? Like,

22:53

what excites you the most as you see

22:54

like this rapid evolution year over year

22:57

and generation over generation? So, the

23:00

amount of innovation with

23:02

the extreme co-design, that's what's

23:04

most impressive. Yeah. Right. So,

23:06

there's only so much, and Jensen talked

23:08

about this, that you can do moving from

23:10

one GPU generation to a next. Process

23:13

technology can only improve so much.

23:16

You know, it's not factors more

23:18

improvement in the number of

23:20

transistors that you can go from one

23:21

generation to the next. So, for example,

23:24

between Vera Rubin and Blackwell, it's

23:27

about 70% more

23:30

transistors. Yeah. In terms of all the

23:32

different chips that we co-design.

23:35

But, we're getting the 10x more

23:36

performance per watt. So, if it were

23:38

just Moore's law, it would only be a 70%

23:41

jump, not a 1,000% jump. From Yeah, not

23:44

a 10x, yeah. So, this kind of

23:46

all these different chips being designed

23:48

together, working together to maximize

23:50

that performance, that's the most

23:52

amazing thing about this generation and

23:54

the future generations. Yeah. That's

23:56

really exciting.
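The gap between the two figures quoted in this answer, about 70% more transistors versus 10x performance per watt, can be made explicit by converting both to multipliers. This is plain arithmetic on the numbers as stated; attributing the remainder to co-design follows the speaker's framing and is not an independent measurement. Strictly speaking, a 10x result is a 900% increase, so the "1,000%" in the conversation is loose shorthand.

```python
def percent_to_multiplier(percent_increase: float) -> float:
    """A 70% increase means the new value is 1.7 times the old one."""
    return 1 + percent_increase / 100


def multiplier_to_percent(multiplier: float) -> float:
    """A 10x result is a 900% increase over the starting value."""
    return (multiplier - 1) * 100


transistor_multiplier = percent_to_multiplier(70)  # about 70% more transistors
perf_per_watt_multiplier = 10.0                    # quoted 10x performance per watt

# Portion of the gain not explained by transistor count alone.
beyond_transistors = perf_per_watt_multiplier / transistor_multiplier

print(f"transistor multiplier: {transistor_multiplier:.1f}x")
print(f"10x expressed as a percent increase: {multiplier_to_percent(perf_per_watt_multiplier):.0f}%")
print(f"gain beyond transistor count: about {beyond_transistors:.1f}x")
```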

23:57

Thanks so much for your time.

23:58

A huge thank you to Joe Dalaire for

24:00

breaking down Nvidia's Blackwell

24:02

ecosystem, giving us an inside look at

24:04

Rubin, and explaining how it will all

24:06

make AI models faster, smarter, and more

24:09

efficient. Not just language models, but

24:11

everything from image and video models

24:13

to medicine, robotics, and so much more.

24:16

And if you want to really understand the

24:18

science behind this stock, join me at

24:20

Nvidia GTC. You can register for free

24:23

with my links below, and jump into as

24:25

many online sessions as you like. I'll

24:27

announce the winner of that RTX 5090

24:29

giveaway a few days after the

24:31

conference, so make sure to enter.

24:33

Another huge thank you to Nvidia for

24:35

sponsoring my travel and my media access

24:37

to cover GTC live, and to you for

24:40

supporting the channel. Thanks for

24:42

watching, and until next time, this is

24:44

Ticker Symbol: You. My name is Alex,

24:46

reminding you that the best investment

24:48

you can make

24:49

is in you.

Interactive Summary

This video features an exclusive look at Nvidia's AI infrastructure with Joe Dalaire, who walks through the transition from the Blackwell generation to the Vera Rubin generation. The discussion highlights Nvidia's philosophy of 'extreme co-design,' in which six distinct chips (GPU, CPU, BlueField DPU, ConnectX SuperNIC, NVLink switch, and Spectrum-X Ethernet switch) are developed together to improve performance, energy efficiency, and data center scalability for modern AI models. The interview also covers technical improvements such as 100% liquid cooling and a modular design for faster serviceability, as well as roadmap developments like the Rubin Ultra generation and its Kyber rack.
