I'm Investing In This Breakthrough AI Chip (Here's Why)

Transcript

0:00

I just got back from GTC and I'm

0:02

convinced that Wall Street does not

0:04

understand Nvidia. That's because the

0:06

best long-term investments come from

0:08

understanding a company's products, not

0:11

just their profits. And after everything

0:13

I saw, I believe that Nvidia will be the

0:16

first company on Earth to hit $10

0:18

trillion in market cap. Let me show you

0:20

why. Your time is valuable, so let's get

0:23

right into it. Look, I'm not here to

0:25

recap Jensen's keynote. Instead, I want

0:28

to share what I learned by actually

0:30

going to GTC myself, interviewing

0:32

Nvidia's executives, trying out

0:34

prototype robots, riding in self-driving

0:36

cars, touching a quantum computer, and

0:39

even talking to Jensen Huang himself.

0:41

After all the big announcements, the

0:43

mainstream media and Wall Street

0:44

analysts are focused on Nvidia's new

0:47

Rubin GPUs. But I think they're missing

0:49

the bigger picture. Vera Rubin isn't

0:51

just about faster chips. It's a

0:53

blueprint for the entire AI revolution

0:56

with huge implications for data center

0:58

spending, how AI systems will be

1:00

designed going forward, and of course,

1:02

what kind of stocks will win big as a

1:04

result. So, let me break down the

1:06

biggest things I learned at GTC and what

1:09

surprised me the most. Nvidia's Vera

1:11

Rubin platform and the new Groq 3

1:13

inference chips. How that hardware ties

1:16

into Nvidia's AI strategy for agentic

1:18

systems like OpenClaw. The surprising

1:21

things I saw in self-driving cars and

1:23

humanoid robots and what I think this

1:25

all means for Nvidia stock going

1:27

forward. One thing that surprised me is

1:29

just how different Nvidia's Vera Rubin

1:31

platform is from Blackwell. It's not

1:33

just a faster system like a lot of

1:35

headlines are suggesting. Rubin comes

1:37

with fundamentally different approaches

1:39

to networking, memory, and even compute.

1:42

And that needed to happen for two very

1:44

important reasons. First, AI models

1:47

don't just get trained one time anymore.

1:49

They continuously get fine-tuned via

1:51

reinforcement learning. And second, AI

1:53

workloads are shifting from short chat

1:55

prompts written by humans to autonomous

1:58

agents like OpenClaw, Perplexity

2:00

Computer, and Claude. These agents are

2:02

calling tools. They're browsing

2:04

websites, writing code, and running for

2:06

millions of tokens at a time. And that

2:09

costs thousands of times more tokens

2:11

than regular chat prompts, which makes

2:13

power-efficient, low-latency inference

2:15

the new main cost driver for AI. This is

2:18

why I expect data center spending to

2:20

actually accelerate, not slow down like

2:23

most analysts predict. And this is why

2:25

Vera Rubin is a fundamentally different

2:27

system from Blackwell. It's designed to

2:30

produce as many useful tokens as

2:32

possible per rack, per watt, and per

2:35

dollar, so these OpenClaw-style agents

2:37

are actually affordable to deploy at

2:39

scale. Nvidia announced seven new chips

2:42

as part of the Rubin platform. I want

2:44

to respect your time, so I'll list them

2:46

all out for you, but then I'll focus on

2:48

the two that really matter for

2:49

investors. The Rubin GPU is the main AI

2:52

chip. It has a new transformer engine

2:54

that gives it a much higher token

2:56

throughput versus Blackwell, about five

2:58

times higher inference performance, 3.5

3:01

times higher training performance, and

3:03

it cuts token costs by over 90%. The

3:06

headlines are right to call out these

3:07

insane improvements, but I'll show you

3:09

what they mean for the bigger picture in

3:11

a minute. The Vera Rubin CPU is an

3:13

Arm-based processor with 88 custom cores

3:16

designed to handle all the messy tasks

3:18

that GPUs are bad at like orchestration

3:21

and control, branching logic, and

3:23

preparing data. The Vera CPU schedules

3:26

and coordinates multi-agent workloads.

3:28

It handles API and tool calls, and it

3:31

runs any additional software and

3:32

services that are installed on that same

3:34

rack. Think about things like data

3:36

logging, monitoring, security services,

3:38

and so on. Vera has roughly three times

3:41

the memory capacity, double the memory

3:43

bandwidth per core, and double the

3:45

connection speed to the GPUs compared to

3:47

Grace. And it can also do full

3:49

confidential computing, which wasn't

3:51

available in the Grace CPU. So yeah,

3:54

CPUs are still very important to the AI

3:56

story. They just look very different

3:58

from the traditional CPUs we're used to.

4:01

The NVLink 6 switch chips connect all 72

4:04

GPUs together at the rack level. NVLink

4:07

6 has double the bandwidth from the

4:09

previous generation, around 3.6

4:11

terabytes per second. That's fast enough

4:14

to move around 250 full-length 4K movies

4:17

between chips every single second. The

4:20

ConnectX-9 SuperNIC is a network

4:22

interface card that sits in each compute

4:24

tray to move data between the network

4:26

and GPU memory as well as encrypt

4:29

traffic so that the network stays fast,

4:31

predictable, and secure as more racks

4:34

get added to it. The Spectrum-6 Ethernet

4:36

switch provides the backbone that

4:38

connects Rubin racks and storage pods

4:40

together along with co-packaged optics.

4:43

This is the part of the system where

4:44

Nvidia's $2 billion investments in

4:46

Coherent, ticker symbol COHR, and Lumentum,

4:50

ticker symbol LITE, come into play. Leave

4:53

me a comment if you want me to make a

4:54

full video about optical networking

4:56

because this technology is all about

4:58

making networks more resilient, more

5:01

error-free, and more power-efficient.
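
As a rough sanity check on the NVLink 6 comparison quoted above, the sketch below works backwards from the two numbers in the transcript (about 3.6 terabytes per second and about 250 full-length 4K movies per second) to the movie size that comparison implies. The decimal unit convention (1 TB = 1,000 GB) and the function name are assumptions made for illustration; neither comes from Nvidia.

```python
# Back-of-the-envelope check of the "250 movies per second" comparison.
# Assumption: decimal units, so 1 TB = 1,000 GB.

NVLINK6_TB_PER_S = 3.6  # bandwidth figure quoted in the transcript
MOVIES_PER_S = 250      # comparison quoted in the transcript


def implied_movie_size_gb(bandwidth_tb_per_s: float, movies_per_s: float) -> float:
    """Return the size in GB each movie must have for the comparison to hold."""
    return bandwidth_tb_per_s * 1000 / movies_per_s


if __name__ == "__main__":
    size_gb = implied_movie_size_gb(NVLINK6_TB_PER_S, MOVIES_PER_S)
    print(f"Implied size per movie: {size_gb:.1f} GB")
```

Under these assumptions the comparison implies about 14.4 GB per movie, so the two quoted figures are at least consistent with each other.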

5:03

These next two chips, the Groq 3 LPU and

5:06

the BlueField-4 DPU, are where I think

5:08

Nvidia really innovated the most. While

5:10

the GPUs, CPUs, and networking chips got

5:14

obvious upgrades, Groq 3 rewrites how

5:16

token generation works in general. And

5:18

BlueField-4 adds a whole new context

5:21

memory layer for AI agents. By the way,

5:23

a new study shows that US workers who

5:25

use AI every day earn 40% more than those

5:29

who don't. That means AI is not

5:31

optional. It's an advantage that you

5:33

either have or others have over you.

5:36

That's where Outskill comes in. The

5:38

sponsor of this video, Outskill, is

5:40

running a live 2-day AI mastermind this

5:42

weekend. 16 hours of training that

5:44

brings together the knowledge of over a

5:46

hundred experts from companies like

5:48

Nvidia and Microsoft to make you more

5:51

confident in using AI on your own,

5:53

staying ahead as tools rapidly evolve,

5:55

and turning those skills into higher

5:57

value, better-paid work. And they're

5:59

giving the first 1,000 people who sign

6:01

up with my link a free seat. Whether you

6:03

work in tech or sales management or

6:06

marketing, you'll learn to use AI

6:08

agents, create automated workflows, and

6:10

connect them to the software and

6:12

spreadsheets you already use every day.

6:14

This is a great way to level up your AI

6:16

knowledge, gain a real competitive

6:18

advantage, and understand the science

6:20

behind the stocks. Over 10 million

6:23

people all over the world have already

6:25

attended and slots for this one are

6:27

filling up faster than ever. So, make

6:29

sure to register for your free seat with

6:31

my link below today. All right, let's

6:34

start with Groq since it's one of the

6:36

most important acquisitions for

6:37

investors to understand. The Groq 3 chip

6:40

seems to be replacing the Rubin CPX

6:42

GPU, which Nvidia originally designed

6:45

for inference. But this Groq chip isn't

6:47

a GPU at all. It's an LPU, a language

6:51

processing unit. And it's crazy just how

6:53

fast Nvidia was able to integrate it.

6:56

Nvidia announced a $20 billion deal to

6:58

license Groq's technology and hire most

7:01

of the core engineering team on December

7:03

24th, 2025. Jensen showed off the first

7:06

Groq 3 LPX during his GTC keynote. That's

7:10

roughly 3 months from the acquisition

7:12

announcement to the first public demo

7:14

and 9 months from the start of the deal

7:16

to the first chip launch, which is even

7:18

faster than most startups can move.

7:21

Nvidia moved so fast because each Groq

7:23

LPU is built around 500 MB of on-chip

7:26

SRAM that stores the model weights, the

7:29

activations, and the KV cache instead of

7:31

distributing them all over external

7:33

DRAM. I know that sounds like alphabet

7:35

soup, so let me say it in English.

7:37

Static random-access memory, or SRAM, is

7:40

small but insanely fast and it lives

7:43

right on a chip. It's expensive and

7:45

power-hungry per bit, but it has very

7:47

predictable memory access at low

7:49

latencies. DRAM is much larger but

7:51

slower off-chip memory. It's cheaper per

7:54

bit and it's great for capacity, but

7:57

accessing it costs more time and energy

7:59

and latency can vary a lot under

8:01

different kinds of workloads. And during

8:03

his keynote, Jensen had a whole slide

8:05

dedicated to this difference. The Rubin

8:08

GPU has 288 GB of high-bandwidth memory,

8:12

which is DRAM, while one Groq LPU has

8:15

500 megabytes of SRAM, almost 600 times

8:19

less capacity. My point is, these are

8:22

fundamentally different kinds of chips

8:24

for fundamentally different parts of the

8:26

AI chain. They even sit in separate

8:28

racks. Nvidia's LPX racks connect 256

8:32

Groq LPUs to create a dedicated

8:35

ultra-low-latency path for the decode phase of

8:37

inference, while the Rubin GPUs focus

8:40

on training, prefill, and attention. If

8:43

you look back at Nvidia's original

8:44

roadmap, it used to have a Rubin CPX GPU

8:48

specifically designed for large context

8:50

inference. I even made a whole video

8:52

about it. That chip is now missing from

8:54

the latest slides, with these Groq

8:56

systems effectively taking its place. So,

8:59

said another way, Nvidia spent $20

9:01

billion, they integrated Groq's LPU

9:04

architecture into their systems in under

9:05

a year, and they quietly replaced their

9:08

own Rubin CPX accelerator with

9:10

something that delivers up to 35 times

9:12

higher inference throughput per watt and

9:15

up to 10 times more revenue per rack

9:17

when serving large models. I honestly

9:19

think we're going to look back at

9:21

Nvidia's Groq deal as their most

9:23

important acquisition since Mellanox.
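
To put the memory figures quoted earlier in perspective, here is a small sketch that recomputes the "almost 600 times" capacity gap and the pooled SRAM of one LPX rack from the transcript's own numbers (288 GB of HBM per Rubin GPU, 500 MB of SRAM per Groq LPU, 256 LPUs per rack). It assumes decimal units (1 GB = 1,000 MB), and the function names are illustrative only.

```python
# Recompute the GPU-vs-LPU memory gap from the figures in the transcript.
# Assumption: decimal units, so 1 GB = 1,000 MB.

HBM_PER_GPU_GB = 288    # high-bandwidth memory per Rubin GPU
SRAM_PER_LPU_MB = 500   # on-chip SRAM per Groq LPU
LPUS_PER_RACK = 256     # LPUs connected in one LPX rack


def capacity_ratio(hbm_gb: float, sram_mb: float) -> float:
    """How many times more memory one GPU holds than one LPU."""
    return hbm_gb * 1000 / sram_mb


def rack_sram_gb(lpus: int, sram_mb: float) -> float:
    """Total SRAM across all LPUs in one rack, in GB."""
    return lpus * sram_mb / 1000


if __name__ == "__main__":
    print(f"GPU/LPU capacity ratio: {capacity_ratio(HBM_PER_GPU_GB, SRAM_PER_LPU_MB):.0f}x")
    print(f"SRAM per LPX rack: {rack_sram_gb(LPUS_PER_RACK, SRAM_PER_LPU_MB):.0f} GB")
```

Under these assumptions one GPU holds 576 times the memory of one LPU, and a full 256-LPU rack pools about 128 GB of SRAM, less than half of a single GPU's HBM. That fits the point made above that the two chips serve different parts of the inference pipeline.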

9:25

Mellanox is why Nvidia now owns the

9:27

networking technologies around their

9:29

GPUs: Spectrum-X Ethernet, Quantum

9:32

InfiniBand, and their BlueField DPUs. So

9:36

far, we've talked about the Rubin GPUs,

9:38

the Vera CPUs, and the Groq LPUs. But

9:41

BlueField-4 is the piece that literally

9:43

ties them all together. BlueField-4 is a

9:45

data processing unit or DPU. It sits

9:48

inside the Vera Rubin compute trays, the

9:51

Groq LPX trays, and the separate

9:53

context memory and storage trays. The

9:55

DPU in each tray handles the networking,

9:58

the memory access, and the data controls

10:00

so that the GPUs and the LPUs can focus

10:03

on generating tokens. And on the storage

10:06

side, BlueField is the processor inside

10:08

Nvidia's new STX context memory racks.

10:11

These racks keep long-term agent context

10:14

on separate drives instead of on

10:16

expensive GPU memory. Then it pulls the

10:19

right data back into the GPUs right

10:21

before it's needed. That's how Rubin

10:23

keeps token speeds high while cutting

10:25

power costs for agents with long context

10:27

windows by around 5x. Here's what that

10:30

means in terms of performance at the

10:31

rack level. Pairing a Vera Rubin rack

10:34

with a Groq 3 LPX rack can generate up to

10:37

35 times the inference tokens per watt

10:40

and one STX context memory rack gives up

10:43

to five times more tokens per second and

10:45

five times better power efficiency for

10:47

long context workloads. So when you add

10:50

it all up, the Rubin GPUs, the Vera

10:53

CPUs, the Groq 3 LPUs, and the

10:56

BlueField-4 DPUs, as well as the context memory

10:58

and networking stack, you're looking at

11:00

a complete overhaul of Nvidia's hardware

11:03

portfolio that data centers can mix and

11:05

match. For example, data centers focused

11:08

on training and big batch inference will

11:10

mostly deploy Vera Rubin NVL72 racks.

11:13

But for real-time agentic workloads

11:15

where latency really matters, Jensen

11:18

suggested that about 25% of a data

11:20

center could shift to the new Groq LPX

11:22

racks. Here's another insight for

11:24

investors that I don't see any Wall

11:26

Street analysts talking about. We should

11:28

be watching Nvidia's data center

11:30

revenues for two key reasons. First,

11:32

Rubin gives Nvidia new ways to scale

11:35

beyond selling more GPU racks, like

11:38

layering on high-value components and

11:40

services across more specialized racks

11:42

like Groq and these memory racks. And

11:45

second, if they break out revenue from

11:47

things like memory, DPUs, and LPUs, like

11:50

they did for networking, the mix will

11:52

tell us a lot about which workloads

11:54

their customers are leaning into, all

11:56

the way from classic model training to

11:58

supporting AI agents, which helps us find

12:01

more winning stocks across the supply

12:03

chain. All right, now let's talk about

12:05

who actually uses all these tokens.

12:07

Jensen called OpenClaw the operating

12:10

system for personal AI. OpenClaw is an

12:13

open-source agent that can browse the

12:15

internet, code, call tools, and run for

12:18

millions of tokens at a time. This is

12:20

why I think token demand will be much

12:22

higher than most analysts expect. Data

12:24

centers won't just be serving a few

12:26

billion people, but potentially tens of

12:29

billions of always-on AI agents, burning

12:32

tokens to do everything that people

12:34

already do, except much faster and for

12:37

much longer, including spinning up even

12:39

more agents of their own. The problem

12:41

with OpenClaw is that it's an

12:43

open-source AI agent with root access to

12:46

everything on a computer. That's a

12:47

security and compliance nightmare for

12:50

enterprises. That's where NemoClaw

12:52

comes in: Nvidia's open-source stack

12:54

that wraps OpenClaw with a policy

12:56

engine, privacy routing, and a secure

12:59

runtime environment so that companies

13:01

can build in guardrails to decide which

13:03

tools the agent can use, what data it

13:05

can touch, and where everything runs:

13:08

locally, in the cloud, or on their own

13:10

Rubin pods. And now we've come full

13:12

circle. OpenClaw is what drives token

13:15

demand through the roof. NemoClaw is the

13:17

control layer that makes agents safe and

13:19

deployable in the real world. And

13:21

Nvidia's Rubin architecture is the

13:23

hardware stack built to serve that flood

13:25

of tokens as efficiently as possible. As

13:28

more enterprises plug into OpenClaw and

13:30

NemoClaw, more agents will use more

13:32

tokens. And that's how this software

13:34

story eventually shows up in Nvidia's

13:37

data center revenues. This is why it's

13:39

so important to understand the science

13:41

behind the stocks. We can see these

13:43

demand signals long before they show up

13:46

in the earnings numbers. But GTC also

13:48

made it clear that Nvidia isn't stopping

13:50

at software agents. They're going after

13:52

robots and self-driving cars to bring AI

13:55

to the physical world. So, let's talk

13:57

about that next. And if you feel I've

13:59

earned it, consider hitting the like

14:01

button and subscribing to the channel

14:03

and even sharing this video. That really

14:05

helps me out and it lets me know to make

14:07

Interactive Summary

The video provides an in-depth analysis of Nvidia's recent announcements at GTC, arguing that Wall Street underestimates the company's long-term potential. The host explains that Nvidia's 'Vera Rubin' platform is not just a hardware upgrade but a fundamental redesign to support the AI revolution—specifically focusing on agentic systems, data center efficiency, and physical AI through robots and autonomous vehicles. By integrating specialized hardware like GPUs, CPUs, LPUs, and advanced networking, Nvidia is positioning itself as the backbone of an economy driven by AI agents and token-heavy workloads, likely leading to a $10 trillion market cap.