E26: NVIDIA Just Changed The Course of AI Forever

Transcript

0:00

Today, you're joining me for a really

0:01

special interview that marks 10 years of

0:04

NVIDIA DGX and 20 years of CUDA, the

0:08

software that makes modern AI possible.

0:10

I'm joined by Charlie Boyle, vice

0:13

president of DGX systems at NVIDIA and

0:15

frequent guest of the channel. Charlie

0:17

helped shape the evolution of AI

0:19

infrastructure as we know it, and you're

0:22

about to get an in-depth look at the 35

0:24

to 50x power, efficiency, and

0:27

performance gains from Blackwell to

0:29

Rubin and the DGX features that helped

0:32

make that possible. I asked Charlie

0:34

every question I could think of, and he

0:36

had some surprising things to say about

0:39

the future of AI factories and data

0:41

centers. Your time is valuable, so let's

0:44

get right into it.

0:45

Charlie, I'm so happy to be able to

0:47

speak with you again, and I'm super

0:49

excited to talk to you about all things

0:51

DGX today. So, I know it's the 10th

0:53

anniversary of DGX and the 20th

0:56

anniversary of CUDA. So, let me just

0:58

start by asking the absolute basics.

1:00

What is a DGX system to begin with?

1:03

Yeah, so it is the 10th year of DGX. We

1:06

started 10 years ago today in this

1:07

building. Oh, wow. A little bit

1:10

different setup there. The DGX-1 was

1:13

behind velvet ropes. It was our first AI

1:15

supercomputer, and the mission back then

1:19

is still the mission today, which is to

1:21

take the best technology that NVIDIA

1:23

has,

1:25

build a system with it. Back then, it

1:27

was one box. Now, it's a giant data

1:29

center. But, to build the system, to

1:31

make a vertically integrated software

1:34

stack that makes AI easy to use for

1:37

customers. 10 years ago, it was all

1:39

about researchers. You know, can I get

1:42

the first-generation AI models working?

1:44

But, now it's all about how do we make

1:46

AI easy to use, cost-effective, and

1:49

really deliver business value to

1:51

customers around the world. And DGX is

1:54

just one implementation of that. We are

1:56

the reference architecture that all of

1:58

our partners around this show floor and

2:00

around the world use to build their AI

2:02

systems today. And what separates DGX

2:04

from the other ways that NVIDIA offers

2:07

their systems, right? So, this big form

2:09

factor versus the bladed systems that we

2:11

often see Jensen reference. Walk me

2:13

through what makes DGX DGX specifically,

2:15

you know? We build DGX, it's our

2:17

reference architecture, because we have

2:19

to take all those NVIDIA components, new

2:22

GPUs, new networking, new power, build

2:25

that into a system, and then share that

2:27

design with all of our ecosystem, so

2:29

they can build systems for all of our

2:31

customers. Now, the system we're

2:32

standing in front of is our current

2:34

generation, our latest DGX B300, so the

2:37

300 series of the Blackwell generation.

2:40

On the show floor, there's plenty of

2:41

Vera Rubins. What you saw on stage today

2:43

with Jensen, which you referenced as the

2:45

blade systems, that's our NVL72 system.

2:48

>> Yeah.

2:49

>> that last year. You and I saw it

2:50

together, our Blackwell NVL72. Now,

2:53

we've got our Vera Rubin NVL72. The

2:56

great thing is, from generation to

2:57

generation, for all of our customers out

2:59

there, it's the same chassis. It's just

3:02

that compute blade that changed. Now, it

3:04

got a lot faster. It's 35x faster. You

3:08

know, it got a lot more memory, but not

3:10

without a lot more power in that. So,

3:13

we're delivering tremendous new

3:14

performance

3:15

all in that same footprint. And the

3:17

reason that we're building those, it's

3:20

not just so we can sell this to

3:22

customers, that we're the reference

3:23

design, it's to help all of our

3:25

partners. I'm looking in the background

3:27

at the Dell booth right now. You know, I

3:28

see our other partners all around here.

3:31

The systems that we build as reference

3:32

architectures, they take out to their

3:34

customers. So, there's a Dell Vera

3:36

Rubin, there's a Supermicro Vera Rubin,

3:38

there's an HPE Vera Rubin. All of those

3:40

things started on our reference design

3:42

that we built internally. I'm super

3:44

proud of these. They're beautiful,

3:45

they're gold. We help thousands of

3:47

customers around the world, but they're

3:49

helping tens of thousands, hundreds of

3:50

thousands of customers around the world

3:52

with AI.

3:53

>> Yeah. And beautiful and powerful, right?

3:55

So, uh, Vera Rubin NVL72, you know, uh

3:59

how many GPUs are in this one? So, in

4:01

each of these there's eight. Yeah. And

4:03

so, in these four there's four in the

4:05

rack here. So, in this rack there's 32

4:08

GPUs.

4:09

>> Yeah.

4:09

In the Vera Rubin, in that same space,

4:12

there's 72.

4:13

>> Yeah. And they're all connected with

4:15

NVLink networking. So, all of those 72

4:19

GPUs, there's actually 18 different

4:21

compute trays in there. All act as one

4:23

big GPU. And the reason you need that is

4:26

for that massive agentic workflow that

4:27

Jensen was talking about.
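
The GPU counts quoted in this exchange can be checked with a few lines of Python. This is only an arithmetic sketch: the constant and function names are made up for illustration, and the numbers are the ones spoken in the interview (eight GPUs per system, four systems in the rack, 72 GPUs across 18 compute trays).

```python
# Figures as quoted in the interview; names are illustrative only.
GPUS_PER_SYSTEM = 8        # "in each of these there's eight"
SYSTEMS_PER_RACK = 4       # "there's four in the rack here"
NVL72_GPUS = 72            # Vera Rubin NVL72
NVL72_COMPUTE_TRAYS = 18   # "18 different compute trays"


def gpus_in_rack(systems=SYSTEMS_PER_RACK, gpus_per_system=GPUS_PER_SYSTEM):
    """Total GPUs in a rack of identical eight-way systems."""
    return systems * gpus_per_system


def gpus_per_tray(total_gpus=NVL72_GPUS, trays=NVL72_COMPUTE_TRAYS):
    """GPUs per compute tray, assuming the GPUs are spread evenly."""
    if total_gpus % trays != 0:
        raise ValueError("GPUs do not divide evenly across trays")
    return total_gpus // trays


print(gpus_in_rack())                # 32
print(gpus_per_tray())               # 4
print(NVL72_GPUS / gpus_in_rack())   # 2.25 times the GPUs in the same space
```

The even split across trays is an assumption; the interview gives only the totals.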

4:29

>> Yeah. You know, it's not just a chatbot

4:30

anymore. I'm asking you a question. It

4:33

reads a PDF. It's a whole workflow.

4:35

Like, go build me a compiler. Yeah. You

4:38

know, I need a system that's a rack

4:40

level to go do that work and to come

4:42

back in a reasonable amount of time

4:43

that's cost-effective for me. Yeah. And

4:45

that's the generation on generation

4:47

efficiency that we have is every year

4:50

that efficiency, you know, Jensen talked

4:52

about 35x. Well, that's just not 35x

4:54

faster. That means for the same job, for

4:56

the same thing that was impossible or

4:58

too expensive for you to do last year,

5:00

it's now 35 times less expensive to do

5:02

it.
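
The "35 times less expensive" reading follows if the hourly cost of running the system stays the same while it completes 35 times as many jobs. A minimal sketch of that arithmetic, with a hypothetical hourly cost and job rate (neither figure comes from the interview):

```python
def cost_per_job(system_cost_per_hour, jobs_per_hour):
    """Cost of one job when the system is billed by the hour."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return system_cost_per_hour / jobs_per_hour


# Hypothetical numbers, chosen only to make the ratio visible.
HOURLY_COST = 100.0   # assumed cost of running the system for an hour
BASE_RATE = 10.0      # assumed jobs per hour on the older generation
SPEEDUP = 35          # the 35x figure quoted from the keynote

baseline = cost_per_job(HOURLY_COST, BASE_RATE)
faster = cost_per_job(HOURLY_COST, BASE_RATE * SPEEDUP)

print(f"{baseline:.2f}")            # 10.00 per job
print(f"{faster:.2f}")              # 0.29 per job
print(f"{baseline / faster:.1f}")   # 35.0
```

If the newer system costs more per hour to buy or power, the saving per job is smaller than the raw speedup.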

5:03

>> Yeah. When somebody chooses between a

5:04

system with 72 GPUs and 32 GPUs, what

5:08

what's what's the reason to go with this

5:10

system? So, it it's all about the

5:12

specific workload and and where you are.

5:14

>> Yeah. You know, we started our AI

5:15

systems, you know, I don't build them as

5:17

DGX, but our partners build them, you

5:19

know, Yeah. put PCI cards into a

5:21

standard x86 server. Uh-huh. Some of our

5:23

customers, their AI workload works great

5:25

on that. This eight-way form factor is

5:28

something that we introduced 10 years

5:30

ago. The original DGX-1 was the very

5:32

first eight-way system. And the funny

5:34

thing was, 10 years ago, when I would

5:36

talk to customers, their number one

5:37

question is, "What am I possibly going

5:39

to do with eight GPUs? Can I virtualize

5:41

it?" Now, people have one application

5:43

that takes thousands of GPUs.

5:45

>> Yeah. But the reason you choose one of

5:47

these systems is where your application

5:48

is. So this is a very standard form

5:51

factor that, you know, not only

5:53

every OEM system but every cloud has. The

5:56

eight-way NVIDIA GPU server is, up until

5:59

the Blackwell generation, the gold

6:01

standard that everyone had. And then for

6:03

new really large memory workloads cuz

6:05

the difference is these are four

6:07

different computers but the applications

6:10

that you run on that would just use the

6:12

memory in one of these computers. With

6:14

the NVL72, you have the memory of all

6:17

those 72 GPUs connected with one NVLink

6:20

so your application can see that as one

6:22

giant GPU. So whereas this your

6:25

application would see eight GPUs

6:27

together in a memory context, with the NVL72

6:30

it's

6:31

72 GPUs. So I can do a much bigger

6:33

agentic workload. I can do trillion

6:36

parameter context. I can do applications

6:38

that weren't possible before that

6:40

technology. That's amazing. Actually

6:42

let's let's double click on that for a

6:44

second. So one of the things that got

6:46

announced during the keynote was the

6:48

BlueField 4 STX reference memory

6:50

architecture right? So explain what that

6:52

is at a high level and what that means

6:54

for these systems going forward. Right.

6:56

So you know kind of going back in

6:58

history when we introduced our A100 that

7:00

was the first time and Jensen showed it

7:02

in that history video that was fabulous

7:04

in the keynote in A100 we started

7:06

something new that we called the

7:08

SuperPod. And so that was our first DGX

7:11

SuperPod which was a reference

7:13

architecture of a number of these nodes

7:15

connected together with at the time

7:17

InfiniBand plus storage. And so

7:20

customers would buy an AI factory in

7:22

that pod format. And that was 32 of

7:25

these systems together and you kind of

7:27

put those together you know to build

7:29

your AI factory. Well as AI has gotten a

7:32

lot more powerful it's not just enough

7:36

to have just GPUs anymore. So what we

7:38

talked about in the keynote was a brand

7:40

new pod. So it's the NVL72 systems.

7:44

It's our Vera rack as well because

7:47

agentic AI needs a lot of CPU processing

7:51

power for all the sandboxing, for all

7:53

the testing that it does. So, I need

7:55

NVL72 Vera Rubins. I need racks of Vera

7:58

systems for all the compute work. And

8:01

there's a new class of storage that's

8:03

needed and that's what we did with STX.

8:06

Very similar to what we did 10 years ago

8:07

with DGX, we came out with a reference

8:10

architecture for the industry to

8:11

accelerate a new form of application.

8:14

And all this agentic workflow needs

8:16

high-speed storage context that can

8:20

either store the context of the

8:22

workflow that you want it to do, or it

8:24

could offload certain things because AI,

8:27

the power of AI, it needs data. You need

8:29

to be close to the data. And so, we're

8:31

working with all of our storage partners

8:33

so that they can take their storage

8:34

stack, all of our great partners like

8:37

NetApp and VAST and DDN and HPE, their

8:40

storage stacks, which they've got decades

8:43

of investment in, will run on top of that

8:44

STX reference architecture all in that

8:47

same AI pod. And so, as enterprise

8:50

customers looking to deploy AI,

8:53

they're not going to buy STX from

8:54

NVIDIA. They're going to buy that STX

8:56

design from the storage partners they're

8:58

already working with today. NVIDIA's

9:00

innovating on the STX platform to help

9:03

all of our storage partners to bring

9:06

better speed, better efficiency, better

9:07

token economics to that entire pod with

9:10

the STX design.

9:12

>> Yeah. Help me understand what the STX

9:14

design even enables. So, like from a

9:15

workload perspective, if I'm thinking

9:17

about running an AI agent before, I was

9:20

storing a lot of that context in like

9:21

high-bandwidth memory close to the GPU,

9:23

right? Now, what does that let me do?

9:26

Bigger workloads faster? Like help me

9:27

understand. Do all of the above. When

9:29

you think about the new agentic

9:31

workloads, it's more beyond, you know,

9:33

the things that we were doing just even

9:35

a year or so ago where a job would run

9:37

for a minute, maybe 5 minutes. You know,

9:41

one example that I think we all saw in

9:42

the news was they had an agentic

9:45

workload from scratch build a C

9:46

compiler. That took a week.

9:49

Now, in that I couldn't possibly store

9:51

all that context in GPU memory all at

9:54

once. So, I needed something that was

9:56

very close to the GPU, something that

9:58

was very accelerated to store all that

10:00

context to move things back and forth.
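
One way to picture the back-and-forth described here is a small fast tier, standing in for GPU memory, that spills into a larger tier close by. This is a toy sketch and not the STX design: the class name TieredContextStore, its methods, and the least-recently-used spill policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredContextStore:
    """Toy two-tier store: a small fast tier that spills its least
    recently used entries into a larger slow tier."""

    def __init__(self, fast_capacity):
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # stands in for GPU memory
        self.slow = {}              # stands in for nearby accelerated storage

    def put(self, key, value):
        """Store an entry in the fast tier, spilling older entries if needed."""
        self.slow.pop(key, None)    # drop any stale copy in the slow tier
        self.fast[key] = value
        self.fast.move_to_end(key)  # mark as most recently used
        self._spill()

    def get(self, key):
        """Return an entry, promoting it to the fast tier if it had spilled."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)  # raises KeyError for unknown keys
        self.fast[key] = value
        self._spill()
        return value

    def _spill(self):
        """Move least recently used entries out until the fast tier fits."""
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TieredContextStore(fast_capacity=2)
for step in ("a", "b", "c"):
    store.put(step, f"context for step {step}")

print(list(store.fast))   # ['b', 'c']
print(list(store.slow))   # ['a']
store.get("a")            # promotes 'a', spills 'b'
print(list(store.fast))   # ['c', 'a']
```

A real system would move data by size and bandwidth rather than by entry count; the count-based capacity just keeps the example short.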

10:02

You know, so that's one very

10:03

long-running use case of it. But, the

10:06

other part of accelerating, you know,

10:07

your token economics on that workload,

10:10

especially in today's, you know, storage

10:11

world with everything that's going on in

10:13

the market, if I can make your tokens

10:15

process 5x faster because I'm putting

10:17

that storage optimized closer to the

10:19

GPUs, well, I can do 5x more work on the

10:24

same amount of storage that I just

10:25

bought. And so, it's it's not only great

10:27

for our storage partners, but it's great

10:29

for our customers who are trying to put

10:31

all these things in their data center.

10:33

Less physical infrastructure means it's

10:34

more power efficient, means I can use

10:36

more power for processing. It's lower

10:38

cost cuz I can get more work done with

10:40

the same physical footprint. So, all in

10:42

all it's a win-win, but it all builds in

10:44

that same pod architecture.

10:46

>> Yeah. Speaking of which, so power

10:48

efficiency I think is something I'd love

10:49

to talk to you about. Um,

10:52

during the keynote, you know, there was

10:53

a lot of talk about Vera, the CPU, and

10:55

Rubin, the GPU. Help me understand how

10:58

those two um new architectures, like,

11:01

you know, the new Vera-Rubin

11:02

architecture uh affects the DGX systems

11:05

going forward. What is the performance

11:06

jump from the Blackwell version of DGX

11:09

to the Vera-Rubin version of DGX? So,

11:12

you know, as as Jensen put it up in the

11:13

keynote, you know,

11:15

35x on

11:16

>> 35x agentic workloads. Now, the

11:19

funny thing is cuz

11:20

last year we had a 35x as well, and even

11:23

talked about it in the keynote, the you

11:25

know, the SemiAnalysis, when they ran

11:27

it, it was 50x. And they they What did

11:29

they say? They said, "Jensen, you're

11:30

sandbagging." Well, it's funny is most

11:32

people think when we put out those

11:33

numbers like that's the most

11:35

cherry-picked number possible out there,

11:37

but I see real numbers like that from

11:39

customers even in the Hopper to

11:40

Blackwell generation. I had a customer

11:43

that was seeing a 50 to 100x speedup.

11:47

And for them,

11:48

that meant for the same system they had,

11:51

they could get 50 more clients on that

11:54

same infrastructure. Like So, they could

11:57

serve more customers, bring more people

11:59

on board at the same cost, the same

12:01

power efficiency. And so, when you see

12:03

that 35x in Vera Rubin, you can take

12:06

that in two ways. Like I can do more

12:08

work faster, or I can save a lot of

12:10

money. And most of our customers do

12:12

both.

12:13

And a big thing that Jensen talked about

12:15

towards the end in the new DSX gigascale

12:18

AI factory, uh he talked about dynamic

12:21

power and Max-Q. And now, most of most

12:24

of the people watching this today aren't

12:26

going out and building a gigascale AI

12:28

factory tomorrow, right? But you know,

12:30

I've been in the data center industry

12:31

for more years than I'd care to remember

12:34

at this point, but in many decades. But

12:36

what does everyone do in a data center

12:38

when you're building it? You provision

12:40

for the power that's on the nameplate on

12:42

the back of the server. And what that

12:44

does is you're over-provisioning the

12:46

power because your entire racks and

12:48

racks of systems are never running at

12:50

100% all at the same time. But for

12:53

safety reasons, everyone says, "Well,

12:55

no, you know, like it could happen." And

12:57

humans can't turn the knobs fast enough

13:00

if

13:01

everything does hit at the same

13:03

time. So, that's what we talked about in

13:05

the new DSX design for gigascale, but

13:08

that translates all the way down to a

13:10

customer buying two racks of NVL72.

13:13

It's that dynamic power management that

13:15

you just tell it how much power you

13:18

have available to you, and if one rack

13:21

is using you know, both racks using 100%

13:24

of that, great.

13:25

If one of the racks isn't using all of

13:27

that, it can speed up the other rack.
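
The borrowing described here, where a rack that is not using its share frees up power for the other, can be sketched as a max-min fair split of one site budget. The function name allocate_power, the kilowatt figures, and the fairness policy are illustrative assumptions; the interview does not say how the Vera Rubin power management actually divides power.

```python
def allocate_power(budget_kw, demands_kw):
    """Split a shared power budget across racks (max-min fairness).

    Each rack gets what it asks for when the budget allows. Whatever a
    lightly loaded rack leaves unused is offered to the racks that
    still want more.
    """
    if budget_kw < 0 or any(d < 0 for d in demands_kw):
        raise ValueError("budget and demands must be non-negative")

    allocation = [0.0] * len(demands_kw)
    remaining = float(budget_kw)
    unsatisfied = [i for i, d in enumerate(demands_kw) if d > 0]

    while unsatisfied and remaining > 1e-9:
        fair_share = remaining / len(unsatisfied)
        still_unsatisfied = []
        for i in unsatisfied:
            grant = min(fair_share, demands_kw[i] - allocation[i])
            allocation[i] += grant
            remaining -= grant
            if demands_kw[i] - allocation[i] > 1e-9:
                still_unsatisfied.append(i)
        if len(still_unsatisfied) == len(unsatisfied):
            break  # every rack took a full share, so the budget is spent
        unsatisfied = still_unsatisfied

    return allocation


# Hypothetical 200 kW budget shared by two racks.
print(allocate_power(200, [120, 120]))   # [100.0, 100.0] both racks busy
print(allocate_power(200, [60, 150]))    # [60.0, 140.0] idle headroom is lent
```

In the second call the first rack needs only 60 kW, so the second rack runs at 140 kW instead of being held to an even 100 kW split.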

13:29

And because that's AI built into the

13:31

chip, built into the power management,

13:33

brand new in Vera Rubin, cuz it's both

13:35

the CPU and the GPU working together,

13:38

that power slashing, I can make every

13:41

watt I pay for turn into real tokens.

13:44

Whereas today,

13:46

anyone would tell you with

13:47

over-provisioning,

13:49

the average is

13:51

60% of the energy coming into the data

13:54

center is actually doing useful work.

13:56

That other 40% is over-provisioned, it's

13:59

heat loss, it's all those other things,

14:01

because nobody ever felt safe pushing

14:03

that limit because there weren't

14:05

automatic controls. And that's brand new

14:07

in the Vera Rubin architecture. It

14:08

starts with the chip, it goes all the

14:10

way through the software and the

14:11

telemetry, so that as a customer, you

14:13

set that number. We put things in the

14:15

power systems, capacitors, everything

14:17

needed so that you can feel safe for

14:18

that. So, you're getting the value out

14:20

of every watt you're spending. So, it's

14:22

not just a 35x improvement in terms of

14:24

performance, but it sounds like there's

14:25

also like a 67% improvement in the

14:28

amount of power you can use that you had

14:30

provisioned, right? From that 60% all

14:32

the way to the 100%.

14:34

>> Yeah. And because any data center

14:35

operator would tell you like, "Oh my

14:36

god, I you know, like, if you hit 100%,

14:39

bad things happen today."

14:41

>> Sure, yeah. But,

14:42

if I can have those automatic controls

14:44

and I can believe in it, and that's why

14:46

we're investing. So, it's not only

14:47

that pretty picture that you saw of DSX

14:50

in the render, in the simulation, that's

14:52

being built in Northern Virginia. So,

14:54

we're going to build that and run that

14:55

for our own use, but that same design,

14:58

we can show customers not only that it

15:00

works on paper, but that we're running

15:02

that 24/7 at 100%. And when the public

15:05

utility says, "Hey, I need you to not be

15:07

at 100%," they can send us a signal and

15:10

the system automatically reacts to that.

15:13

So, it's not only what we do in the data

15:14

center, everyone's talking about

15:15

worldwide power, that the interfaces,

15:17

the things that we're pioneering in

15:19

there aren't just for our own things

15:21

like, "Hey, it's hot. People need more

15:23

air conditioning in their house. Hey,

15:25

data center, can you turn it down a

15:26

little bit?" They send a signal and it

15:28

automatically works and we're still

15:29

optimizing the work coming out of this.

15:31

Wow, I feel like that's really slept on.

15:32

I didn't hear I certainly didn't hear

15:34

enough about that in the keynote, so I'm

15:35

really happy you highlighted that. I

15:37

think that's a really huge benefit to

15:40

especially since most data centers are

15:41

power constrained today, right? Yeah, no

15:43

matter no matter what size you are, you

15:45

only have so much power. Whether it's

15:46

your home, you know, whether it's your

15:48

data center, like you've only got so

15:49

much power, but at a data center level,

15:51

however much power you pay for, whether

15:54

you use it all or not, you're still

15:56

paying for it. And so, that's the

15:57

tremendous advantage in the Vera Rubin

15:59

generation. We had to put a lot into the

16:01

hardware itself. We had some of it in

16:03

the Blackwell generation. We could

16:05

smooth things out a little bit. Yeah, we

16:07

talked about it last year. That was a

16:08

new innovation in the power shelves, but

16:10

now it's all the way from the chip to

16:12

the power shelves to the rack to the

16:13

data center. That's huge. That's huge. I

16:16

think that's a feature that I'd love to

16:17

talk more about, but one of the things I

16:19

want to ask just because I know we're

16:20

short on time. Um so, 10th year of DGX,

16:24

20th year of CUDA. You've seen the

16:26

system evolve so much over generation

16:28

after generation. Is there another

16:30

feature that you're like really proud

16:31

of, really pumped to talk about, you

16:33

know, that's you've seen evolve sort of

16:35

from the ground up? I I don't It It's

16:37

less of a feature in the system. It's

16:39

how our customers use these. Because the

16:41

you know, the the biggest thing and you

16:43

know, one of the things, you know, I've

16:44

talked to people about this in

16:46

the past, but it's still true today.

16:48

Every system we put out, within a year

16:50

of putting that system out, just in

16:52

software, the system usually gets up to

16:54

2x faster, which is like completely

16:56

opposite of consumer electronics. Like

16:58

your phone gets slower every year.

17:01

But because the optimizations that we do

17:02

in CUDA and because of Tensor Core, that

17:04

20 years of CUDA, it's application

17:06

compatible. That very first DGX-1 that

17:09

was running on the show floor here 10

17:10

years ago, the application that was

17:12

running on that would run on this thing

17:14

today. So, like when NVIDIA releases a

17:16

TensorRT-LLM update that makes it twice

17:19

as fast at inference. Yeah. All these

17:21

Regardless of the generation, yeah.

17:22

Everyone gets it in that generation, and

17:25

you know, that's something, you know,

17:26

it's it's a little bit of the unsung

17:27

hero. Our customers talk about it, but

17:30

it's one of those things that like you

17:31

can't see on day one. The numbers that

17:33

we put up in the keynote, fantastic

17:34

numbers today. When we revisit that 6

17:37

months from now, 9 months from now,

17:38

they're just going to get better. And,

17:40

you know, I I guess the, you know, from

17:42

a feature perspective, it's not a

17:43

feature that I'm looking forward to.

17:45

It's all the new things that our

17:46

customers are doing with this agentic

17:47

workload. We talked about

17:49

OpenClaw and doing it safely. Like that is

17:52

the most exciting thing as just a

17:54

general like technology user. Like I'm

17:56

sure you've had this idea. I'm, you

17:58

know, I've definitely had the idea to

17:59

like, oh, I wish I had a program that

18:01

could do X. And it's just, you know, an

18:03

average everyday user I'm like, I could

18:05

probably code that or I could call a

18:07

friend, but it's like, nah, I I never do

18:09

that. But now that we can safely take

18:13

OpenClaw,

18:14

build a software application, and

18:15

sandbox it. We're doing that actually in

18:17

the park. We've got an OpenClaw install fest

18:19

with safe software to help people build

18:22

their own applications. That's the thing

18:24

that's exciting me the most is that

18:25

everyone at this show, everyone at home,

18:27

every business user that ever had an

18:29

idea that used to say like, hey, I wish

18:31

I just had a little software application

18:33

that did X.

18:35

Well, if you can think about that now,

18:37

with the technology that's available

18:38

today, you can make that happen. So, you

18:41

know, that's super exciting now, and

18:43

what I can't wait for is like next year

18:45

everyone showing the examples of like

18:47

what they did on their systems that they

18:49

got from us this year. Like what was new

18:51

and unexpected that like nobody thought

18:53

of that like changed the way they did

18:55

their day-to-day work or their

18:56

day-to-day life. I'm super excited for

18:58

that. Charlie, thank you so much for

19:00

your time. A huge thank you to Charlie

19:02

for walking us through NVIDIA's DGX

19:04

systems, their role in the AI

19:06

revolution, and the huge gains from

19:08

Blackwell to Rubin. 35 to 50x

19:11

performance in a single generation

19:13

redefines what's possible across

19:15

training, inference, and opens the doors

19:18

for entirely new kinds of AI workloads.

19:21

And to me, that's a future worth

19:23

investing in. Thank you to the NVIDIA

19:25

team for flying us out to California,

19:27

for supplying us with press passes for

19:29

GTC, and for making this interview

19:32

possible. And of course, thank you for

19:34

watching and supporting the channel.

19:36

Without you, I would never get these

19:38

kinds of opportunities in the first

19:39

place. And if you want to see what else

19:41

I'm investing in, check out this video

19:44

next. Either way, thanks for watching,

19:46

and until next time, this is Ticker

19:48

Symbol You. My name is Alex, reminding

19:50

you that the best investment you can

19:52

make is in you.

Interactive Summary

This interview with Charlie Boyle, VP of DGX systems at NVIDIA, celebrates the 10th anniversary of DGX and 20th of CUDA. The discussion covers the evolution of AI infrastructure, highlighting the significant performance gains of the new Vera Rubin NVL72 systems, the introduction of the STX reference architecture for storage, and the critical importance of dynamic power management in optimizing data center efficiency. Furthermore, Charlie emphasizes how continuous software optimizations and the shift toward agentic AI workflows are enabling entirely new capabilities for users.
