E24: I Tested NVIDIA's Self Driving Car... Is Tesla In Trouble?

Transcript

0:00

Today, you're joining me for something

0:01

really special. A real-world, unedited

0:04

1-hour drive through downtown Los

0:06

Angeles in a Mercedes equipped with

0:09

Nvidia's L2++ autonomous driving

0:11

platform. This isn't a highlight reel,

0:13

and it's not a simulation. It's

0:15

continuous footage of the system

0:17

navigating the everyday chaos of LA

0:20

traffic: lane merges, sudden cut-ins,

0:22

construction zones, and unpredictable

0:25

pedestrians. Joining me for the ride is

0:27

Armen Connie, senior product manager for

0:29

autonomous vehicles user experience at

0:32

NVIDIA. I asked him every technical

0:34

question I could think of. So, you'll

0:36

hear him explaining what's really

0:38

happening under the hood, from sensor

0:40

fusion and decision logic to how the

0:42

user experience is designed to keep

0:44

drivers informed. For investors, I hope

0:46

this footage helps you assess how far

0:48

Nvidia's automotive platform has really

0:51

come and how it compares to others in

0:53

the autonomous mobility market. Every

0:55

few minutes, something unexpected

0:57

happens, and how the system handles it

0:59

might just challenge your assumptions

1:01

about who's ahead in full self-driving,

1:03

or at least by how much. Your time is

1:06

valuable. So, let's get right into it.

1:08

>> So, the system's engaged, right? So, you

1:10

can see we'll try to make this uh right

1:12

on red here after uh this car passes by.

1:15

>> Yeah.

1:15

>> Uh so, what you're experiencing here,

1:16

this is what we call our level two plus

1:18

experience. So, this is built on our

1:20

Hyperion architecture, right? So, this

1:22

car is using 10 cameras, five radar, and

1:25

then 12 ultrasonics uh for parking.

1:27

>> No lidar.

1:28

>> No lidar on this car.

1:29

>> Okay.

1:29

>> Right. So, because this

1:31

is still a level two product, right? Uh

1:34

we still have the driver here. Uh and

1:36

it's designed where he can collaborate

1:37

with the car, right? So, uh the car will

1:39

follow all the speed limits. Uh if he

1:41

wanted to increase the speed, he can do

1:43

that from pressing the steering wheel

1:44

button. There's a speed adjustment

1:45

there. He can request lane changes by

1:47

using the turn signal stalks. Uh, or if

1:50

he wants, let's say there's a big

1:51

pothole in the road or something like

1:52

that, he would be able to collaborate

1:54

with the car and actually adjust the

1:56

steering and then release the steering

1:57

and then the car resumes uh you know the

1:59

driving. So uh but for now we'll let the

2:01

car do everything. So we're approaching

2:02

the stop sign. The car can see the stop

2:04

sign there. It'll know to stop. It knows

2:06

to follow the right of way order right

2:08

before proceeding. So we're going to

2:09

start with a kind of a little section

2:11

here where we're going to get to like a

2:12

shopping area with some uh restaurants

2:15

and stores. So hopefully we'll get some

2:16

people crossing the street and some

2:17

delivery vans and things like that just

2:19

so we can show that we can handle a

2:20

variety of these scenarios.

2:21

>> Sure.

2:23

Uh one of my first questions is going to

2:25

be how did you guys come to the design

2:27

decision not to incorporate lidar?

2:29

>> So we work with uh our partners here to

2:32

determine what sensors right we want to

2:35

use. And for a level two plus product uh

2:38

we felt that we can achieve that with

2:39

just the 10 cameras and the five uh

2:42

radar with ultrasonics as well. Uh but

2:44

for our level three and level four

2:45

initiatives, that's when we'll add the

2:46

additional lidar to it. And then we'll

2:49

also change the actual driving model

2:51

that we're using to a bigger model. So

2:53

it all just scales to what the, you

2:55

know, design intent is for the given

2:56

product.

2:57

>> Yeah. And I assume that's like going to

2:59

be like a more extended version of the

3:01

same stack. So like the people who just

3:03

want L2++, you know, the people who want

3:06

three, four, or eventually five. It's

3:08

just an evolution of that same stack,

3:10

not a different stack completely.

3:11

>> So it's the same principles, right? So

3:13

what we're experiencing here today, this

3:15

is running all on a single Orin. Uh, and

3:17

then this uh experience is about 95% of

3:21

the driving will be done by Alpamayo,

3:22

right? And then we still have that

3:24

classical stack sitting kind of you

3:26

remember those old drivers education

3:28

cars where you have like two sets of

3:29

brake pedals, two gas pedals, two

3:30

steering wheels.

3:31

>> So the way I like to think about it is

3:33

uh, Alpamayo, the end-to-end model, in the

3:34

driver's seat, right? It's doing the

3:35

driving. Yeah,

3:36

>> but we have this classical stack that is

3:38

sitting in that passenger seat with the

3:40

extra set of brake pedal, gas pedal, and

3:42

steering wheel to take over to help

3:44

enforce certain rules if needed to. So,

3:46

that's how we're able to have both the

3:48

safety of that kind of classical stack,

3:50

right? But also the human driving

3:51

behavior of the end-to-end model where

3:53

you get that smooth, comfortable driving

3:55

behavior.

3:56

>> Got it. Um, one thing I'd love to

3:58

understand is, you know, you mentioned

4:01

there's a lot of cameras, there's a lot

4:02

of radars, and there's a lot of

4:04

ultrasonic sensors, right?

4:06

Um, what does each of those sensors do and

4:08

how do they get combined into this

4:10

larger like 360° view of what's going on

4:13

around the car?

4:14

>> Yeah. So it can take input from all

4:16

those sensors and it creates what we

4:17

call the world model, right? So the car

4:19

can see that all these cars are parked,

4:21

none of them are moving. So we can

4:22

detect the velocities for example,

4:24

right? Based off what the car can see

4:26

with the cameras, we can see the lane

4:27

lines and the lane markings. So we can

4:28

tell that we're in a drivable lane here,

4:30

but to our right is a bike lane, right?

4:32

So the car can label and understand what

4:34

all those are. So then it creates this

4:36

reconstruction of the world, right? And

4:38

it uses that to understand behaviors,

4:40

right? So for example, at an all-way

4:42

stop sign, we can use that to understand

4:44

right-of-way order, right? So we can see

4:46

when the other cars arrive and determine

4:48

when it is our turn to move in terms of

4:50

precedence, right? But then we also have

4:52

to consider there's these people

4:53

walking, right? So they will also impact

4:55

our ability to start. So you can see

4:57

this guy's here stopped, right? You have

4:59

these guys in the crosswalk. It's safe

5:00

for us to proceed, right? We also can

5:01

see that there's this guy on the scooter

5:02

in the bike lane. We don't need to freak

5:04

out, right? We can just keep driving in

5:05

our own lane. No problem, right? We can

5:07

just drive next to this guy,

5:08

>> right?

5:09

>> So the end-to-end model is using kind of

5:12

the front camera where it can see,

5:13

right? And then it's receiving inputs

5:15

from that world model to see what's also

5:16

going on behind it as well. And then the

5:19

classical stack is using all of them to

5:20

determine where it should go.
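[Editor's sketch] The all-way stop reasoning described above (track when each car arrives, wait for our turn, and hold for pedestrians) can be illustrated with a small rule-based check. This is only an illustration under stated assumptions, not NVIDIA's implementation: the `Arrival` type, the `may_proceed` function, and the first-come, first-served rule with a conservative tie-break are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    """A vehicle waiting at an all-way stop (hypothetical type)."""
    vehicle_id: str
    arrival_time_s: float  # time at which the vehicle stopped at its line


def may_proceed(ego: Arrival, others: list[Arrival], pedestrians_in_path: int) -> bool:
    """Return True when ego has precedence and its path is clear.

    Assumptions: precedence is first-come, first-served, `others` holds
    only vehicles that are still waiting, and a tie is resolved by waiting.
    """
    if pedestrians_in_path > 0:
        # People in our path always block the start.
        return False
    # Ego goes only if it stopped strictly before every other waiting vehicle.
    return all(ego.arrival_time_s < other.arrival_time_s for other in others)


ego = Arrival("ego", 2.0)
print(may_proceed(ego, [Arrival("a", 3.5)], pedestrians_in_path=0))
print(may_proceed(ego, [Arrival("b", 1.0)], pedestrians_in_path=0))
```

Pedestrians who are in the crosswalk but outside the ego path would not be counted in `pedestrians_in_path`, which matches the case described above where the car proceeds while people are still crossing elsewhere.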

5:22

>> And sorry. So it sounds like everything

5:23

you've described so far is coming from

5:25

the cameras, right? That's the

5:27

>> cameras and it's also using the radar as

5:28

well. So

5:28

>> and the radar.

5:29

>> Yep. So it can use both basically to

5:31

understand the velocity of an object

5:33

right or a person that's walking right

5:35

it can tell right this is a car in front

5:37

of us. So it combines actually both

5:39

inputs in order to understand what it's

5:41

looking at and what it can detect with

5:42

the radar.

5:42

>> And then the third type of sensor you

5:44

mentioned I believe was ultrasonic. Is

5:45

that right?

5:46

>> So those are used for parking. Right. So

5:47

when we get really close to curbs and

5:49

things like that that's where we're

5:50

using those ultrasonics. But for driving

5:52

most of it's done with the uh camera and

5:54

the the radars.

5:55

>> Yeah. And and the radars are there

5:56

primarily to do range and speed.

5:58

>> Correct. You've got it.
>> That's very

6:00

cool. How come you can't use um uh

6:03

information from like multiple cameras

6:05

like in stereo vision to determine range

6:08

and speed? Why radar or not?

6:10

>> We do a bit of both, right? So, you

6:11

know, that's a bit of the secret

6:13

sauce, right? But, uh you know, we're

6:14

using both to confirm and understand the

6:16

world around us, right? So, that's where

6:17

you have that redundancy that's helpful

6:19

to understand what is drivable, what

6:22

these other things are doing, right? Are

6:23

these people, right? You know, a radar

6:25

can see these people are crossing the

6:26

street, right? without the camera to

6:28

tell you that hey this is a person right

6:30

I know a person shape right you're able

6:32

to combine those together and understand

6:33

that these are people walking versus

6:34

that's a car or a small scooter right
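[Editor's sketch] The idea above, that the camera says what an object is while the radar says how far away it is and how fast it is moving, can be illustrated by pairing detections that share a bearing. The speaker calls the real fusion "secret sauce", so this is purely a hypothetical example: every type and name here (`CameraDetection`, `RadarReturn`, `FusedObject`, `fuse`, the 2 degree gate) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraDetection:
    label: str          # e.g. "pedestrian", "car", "scooter"
    bearing_deg: float  # direction of the object relative to the car


@dataclass
class RadarReturn:
    bearing_deg: float
    range_m: float
    radial_speed_mps: float  # negative means approaching


@dataclass
class FusedObject:
    label: str
    range_m: Optional[float]
    radial_speed_mps: Optional[float]


def fuse(cams: list[CameraDetection], radars: list[RadarReturn],
         max_gap_deg: float = 2.0) -> list[FusedObject]:
    """Attach the nearest-bearing radar return to each camera detection."""
    fused = []
    for cam in cams:
        best = min(radars,
                   key=lambda r: abs(r.bearing_deg - cam.bearing_deg),
                   default=None)
        if best is not None and abs(best.bearing_deg - cam.bearing_deg) <= max_gap_deg:
            fused.append(FusedObject(cam.label, best.range_m, best.radial_speed_mps))
        else:
            # Camera-only object: class is known, range and speed are not.
            fused.append(FusedObject(cam.label, None, None))
    return fused
```

A production system would associate objects in a shared 3D frame and track them over time; nearest-bearing matching is used here only to keep the example short.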

6:37

>> so certain solutions on the road try to

6:39

approach this from a vision only

6:41

perspective

6:42

>> um how did you guys like can you walk me

6:44

through a little bit of the bigger

6:45

thought process that made the

6:47

determination you know what vision only

6:49

may not be enough especially for like an

6:51

L3 or L4 solution and these other

6:53

sensors need to come into play is that

6:55

from a safety perspective, a regulation

6:58

perspective, a capabilities in general

7:00

perspective. How how did you guys decide

7:02

to use more than vision only in the

7:03

first place?

7:03

>> For our Hyperion architecture, right, we

7:05

wanted the redundancy, right? So for

7:07

like level three and level four, uh

7:09

we'll use, uh, Thor, right? And

7:11

we'll also have the lidar there, right? So

7:13

we get more information, right?

7:15

>> And sorry, can you just briefly explain

7:17

what Thor is and Orin is and

7:19

>> Yep. So these are the different uh

7:21

onboard computing chips that we provide

7:22

to our partners, right? So the Thor has

7:26

more computing power than the Orin. So

7:28

with additional computing power, you can

7:29

use bigger end-to-end models, right? We

7:32

can take inputs from more uh signals,

7:34

right? So we can use more cameras to

7:36

power that model to get more

7:37

information. So like here's a great

7:38

example. We just had that light turn

7:40

yellow right as we're pretty close,

7:42

right? The car has to understand, should

7:44

I, you know, step on the brakes and stop

7:45

before the uh light or should I proceed

7:48

through kind of like a human would,

7:49

right? So it's calculating the distance

7:50

between us and the stop line, how fast

7:53

we're moving, right? to make some of

7:54

those decisions as well. So, sorry, just

7:55

an interesting scenario that we had.
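[Editor's sketch] The yellow-light trade-off described above (distance to the stop line versus current speed) can be written as a simple kinematic check. The speaker later says the car learns this behavior from training data, so this formula is an illustration only, not the system's logic; the function name and the 3.0 m/s² comfortable deceleration are assumptions.

```python
def should_stop_for_yellow(distance_to_stop_line_m: float,
                           speed_mps: float,
                           comfortable_decel_mps2: float = 3.0) -> bool:
    """Stop only if a comfortable braking distance fits before the stop line.

    Braking distance from constant deceleration: d = v^2 / (2 * a).
    """
    if speed_mps <= 0:
        return True  # already stopped, so stay stopped
    braking_distance_m = speed_mps ** 2 / (2 * comfortable_decel_mps2)
    return braking_distance_m <= distance_to_stop_line_m


# Far away and slow: stopping is comfortable.
print(should_stop_for_yellow(distance_to_stop_line_m=50.0, speed_mps=10.0))
# Close and fast: stopping would mean slamming on the brakes, so proceed.
print(should_stop_for_yellow(distance_to_stop_line_m=10.0, speed_mps=15.0))
```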

7:58

>> Uh, so yes, like coming up ahead, I

8:00

think we're gonna have an unprotected

8:01

left turn, which will also be interesting to

8:03

see, right? You know, we have to

8:04

consider the oncoming cars, right? We

8:06

have to consider if anybody's in the

8:07

crosswalk. So, it should be

8:10

interesting uh coming up here. But this

8:11

is that shopping area I was mentioning where we

8:12

might have some double parked vehicles

8:14

and some pedestrians crossing the

8:16

street. So, uh earlier today, we've been

8:18

lucky. We've got some fun interesting

8:20

scenarios. I'm hoping we get some to

8:21

share with you guys as well.

8:22

>> Yeah. No, I'm looking forward to that.

8:24

I'm I'm noticing this is probably the

8:26

most boring driving job ever because

8:28

it's like you're not touching the

8:29

steering wheel. You're never touching

8:30

either pedal. Like it seems so smooth

8:32

that it's like it's interesting just

8:34

being here almost like a safety backup

8:36

instead of um the primary, right? Like

8:39

you're really being driven instead of

8:40

you driving the car is what I'm just

8:42

noticing for really the first time.

8:44

Yeah.

8:44

>> Yeah. So, for example, I did a test

8:46

where I drove from San Francisco to San

8:47

Diego in one day. So that was about 14

8:49

hours. It's about 1,000 miles, right? So

8:51

it was another driver and I and uh we

8:54

went down and you'd imagine being in the

8:56

driver's seat for that long right you'd

8:57

arrive really tired and you know

9:00

irritated, right? You're sitting in traffic

9:01

throughout the day and things like that

9:03

>> but honestly by using the system and

9:04

letting the car handle a lot of the kind

9:06

of mundane task of you know sitting in

9:08

bumper-to-bumper traffic in LA, right, the

9:10

car handled a lot of that right so there

9:12

I actually arrived even after a long day

9:14

quite refreshed and actually not that

9:16

tired because you release so much of

9:18

that you know processing that you're

9:20

doing right while you're driving that

9:21

you arrive a little bit more refreshed.

9:23

>> Yeah. You you uh eliminate a lot of that

9:25

decision fatigue, right? That decision

9:27

fatigue.

9:32

>> What uh what's going on with the

9:33

displays here? So, like this seems to be

9:35

a pretty static display. Um walk me

9:37

through like what the driver what kind

9:38

of information you're presenting to the

9:40

driver in a situation where they're

9:42

being driven versus them driving

9:43

themselves.

9:43

>> So, here on the center display, right,

9:45

you can see just the navigation, right?

9:47

So again, we just set a route out to

9:49

show you a variety of scenarios, but

9:50

we're just using that purely for

9:51

navigation, right? So we're not getting

9:53

any there's no HD map here. So we're not

9:55

getting any hints about the lane to our

9:57

right is a parking lane, then there's a

9:59

bike lane and there's this lane. So we

10:00

don't get any of that information. Uh

10:02

that's based purely off what the car can see.

10:03

And then George, when you have a second,

10:05

do you want to switch over to the

10:07

confidence view? Right. So we also then

10:09

can provide these inputs to our partners

10:11

where they can choose how they want to

10:12

visualize what

10:14

>> I'm sorry, I'm looking there now, right?

10:15

>> Yes. In front of the driver. So the

10:16

instrument cluster there, right? So we

10:18

can show that, hey, we see we have this

10:19

lead vehicle, right? So the car is able

10:22

to present that information to the

10:24

driver to help communicate a little bit

10:25

more about what it can see, what it's

10:27

doing, right? And we have all those

10:28

inputs that we can share, right? So we

10:29

can share things like traffic lights,

10:31

other vehicles, lane detections, and

10:33

then the partners can choose what they

10:35

want to uh use to display.

10:37

>> And sorry, just just so for clarity for

10:39

me, everything I'm seeing on the screens

10:41

now is purely for humans. None none of

10:44

this is also like information like the

10:46

information that the car itself is using

10:49

to make decisions is completely separate

10:51

from these like

10:52

>> what you're saying. Yeah. The only thing

10:53

that car is taking here is basically the

10:54

route, right? That's the only thing that

10:55

it receives that you can see. So as we

10:57

adjust around and again as we said

10:59

towards the end here, we can update the

11:00

route and set new points that are not on

11:02

the route. No problem, right? Uh at that

11:04

point the car is just getting the

11:05

navigation to turn on this street,

11:07

right? Proceed on that street. That's

11:08

all it's getting from there.

11:09

>> Um how come? Why like why why not give

11:12

it as much information like is is the

11:14

idea that's the sufficient amount of

11:16

information to do the job and you don't

11:18

need to give it more or is there like

11:20

why not give it for example all the

11:21

speed limits uh to these roads is it

11:23

because it's enough to have it determine

11:25

things, uh, with cameras, looking at

11:28

street signs even in low visibility or

11:30

>> So there's always a data quality

11:32

question right so sometimes we can get

11:34

incorrect information from the map or it

11:36

doesn't have information right so we'll

11:39

use what we have if it's there, but if

11:40

it's not there, no problem. We'll go off

11:42

what the car can see from perception,

11:44

right? So, as we see speed limit signs

11:46

change, right, the car then can see that

11:48

and adjust its speed accordingly, right?

11:50

So, you know, if maps don't get updated,

11:52

things like that, we want to make sure

11:54

that the car is always driving based off

11:55

what is most relevant. Like we can see

11:57

we have some construction coming up

11:58

ahead, right? That may not be

12:00

represented in any mapping, right? So,

12:02

the car just needs to be able to handle,

12:03

okay, there's a lane closure coming up,

12:06

right? So, you can see the car stopped.

12:07

This guy cuts in front, right? No

12:09

problem. Uh, it's able to stop. And then

12:12

you can see up ahead, right? We'll have

12:14

this lane closure with that big LED

12:16

board, right? So, we can see, you know,

12:18

the the signs there, right? And we know

12:21

we want to make a left turn, so it's

12:23

going to want to get over to the left,

12:24

but it's actually going to change its

12:26

mind and go here, right? No problem. It

12:27

sees there's a guy standing on the road,

12:29

right? So, we can come here and we'll

12:31

drive here with this center lane that's

12:32

closed, right? How often do you see a

12:34

center lane is closed?

12:35

>> Yeah. Right.
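[Editor's sketch] The speed-limit discussion above (map data may be wrong, stale, or missing, so the car can fall back on the signs it sees) can be illustrated as a choice between two optional sources. The priority order shown, a freshly perceived sign over the map value, is an assumption made for the example; the transcript does not spell out the exact rule, and `effective_speed_limit` is a hypothetical name.

```python
from typing import Optional


def effective_speed_limit(map_limit_mph: Optional[float],
                          perceived_limit_mph: Optional[float]) -> Optional[float]:
    """Pick the speed limit to follow from two possibly missing sources.

    Assumed priority: a sign the cameras have read wins, because map data
    may not have been updated; otherwise fall back to the map value.
    Returns None when neither source has a value.
    """
    if perceived_limit_mph is not None:
        return perceived_limit_mph
    return map_limit_mph


print(effective_speed_limit(map_limit_mph=35.0, perceived_limit_mph=25.0))
print(effective_speed_limit(map_limit_mph=35.0, perceived_limit_mph=None))
```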

12:37

What's that like for you as the driver,

12:39

if you don't mind me asking? Like, you

12:40

know, someone cuts in front of you, your

12:41

feet are close to the pedals,

12:43

your hands are close to the wheel,

12:45

and you do nothing and it just works.

12:47

12:48

>> Um, well, I've been testing the software

12:50

enough, so I know the car can handle

12:52

most of the situations.

12:53

>> Was the first time it was just like,

12:55

"Oh, this is crazy." And then,

12:56

>> No, I've seen crazier stuff.

12:58

>> Okay. Yeah. So, so this it sounds like

13:01

it can really handle truly like outlier

13:03

situations is what I'm getting at.

13:05

>> Yeah. So the, you know, the model's been

13:07

trained enough, right, that we can do a

13:08

lot of the general driving, right? And

13:10

what we're excited about is, you know,

13:11

we'll have, you know, kind of a beta

13:13

release of this, uh, in Q2 of this year,

13:16

right? But we're looking for a, so we

13:17

can see we're trying to get through with

13:18

these two guys here. No problem. Uh,

13:21

we're trying to do a nationwide rollout

13:23

by the end of this year, right? So,

13:25

>> And sorry, when you say nationwide

13:26

rollout, what do you mean? Is that like

13:28

nationwide in Mercedes? Like what what

13:30

>> for customers? Yeah. That are buying

13:32

this car, right? they'd be able to, you

13:34

know, purchase this software, right, and

13:35

use it to drive, you know, from here to,

13:38

you know, Miami or, you know, everything

13:40

in between, right? So, that's really

13:42

exciting. And then from that, right, as

13:44

the, you know, we have this rolled out

13:46

in more and more customer cars, right,

13:47

we start to get, you know, interesting

13:49

data events that we get from those cars

13:50

as well. So, that data will be sanitized

13:52

and sent back to us. So, we can always

13:53

evaluate all the new events that we're

13:55

seeing that maybe our fleet didn't

13:56

catch, right? But someone living in, you

13:58

know, a state where we don't have a test

14:00

car, right? we can then start getting

14:01

that information that way as well and

14:03

use that to enhance the models for

14:04

future releases.

14:06

>> And I so I assume a bunch of things are

14:08

happening at once, right? Like for

14:09

example, the software is being offered

14:11

in more and more automobiles and with

14:13

more and more auto uh manufacturers and

14:16

then separately there's also growing

14:18

more and more capabilities in each one

14:20

of those. So for example, we're in an L2

14:22

plus vehicle, right? Yep.

14:24

>> And then you know eventually level three

14:26

and beyond. Can you speak a little bit

14:28

to All right, you just said a little bit

14:30

about the nationwide rollout, right?

14:32

What's on the roadmap, uh, the

14:34

other way, right? Like when do you guys

14:36

expect to sort of reach a level three, a

14:38

level four?

14:38

>> Yeah. So, as you may have seen a lot of

14:40

the news with GTC this year, we

14:42

announced that with Uber, we'll do a

14:43

level four robotaxi in Los Angeles and

14:45

San Francisco starting next year. Right.

14:47

So, you can see we're going down

14:48

this nice narrow road, right?
>> And sorry,

14:50

he he actually touched the wheel there

14:52

or was it more about... what's

14:54

the distinction? Like, why, um,

14:58

manually turn there as opposed to let

14:59

the car do it?

15:00

>> Um

15:01

>> I'm genuinely like I'm just trying to

15:03

understand the

15:06

>> I'm just a safety driver. So if I feel

15:08

like sometimes um like uh the car maybe

15:11

is um going to uh get into contact with

15:14

objects then I can collaborate uh I can

15:18

do something called collaborative uh

15:20

steering

15:21

>> but the car still lets me uh handle the

15:23

wheel

15:23

>> and it's like a seamless transition it

15:25

seems like like you you did one tiny

15:27

maneuver hands off and it wasn't like a

15:29

hard intervention like the mode didn't

15:31

change.

15:32

>> Yeah. So the design is such that for

15:34

level two, George can be involved or he

15:36

can let go, right? It can go either way,

15:38

right? So like I've done that turn six

15:39

times today, right? No issue, right? So

15:41

in this case, right, depending on if

15:43

George wants to get closer or not,

15:44

right? He can help the steering out,

15:46

right? To increase his comfort, right?

15:48

Or increase our comfort, right? In that

15:50

case,

15:50

>> And that, like, I think what I'm more

15:52

commenting on is like I've I've been in

15:55

other, you know, level two assisted cars

15:58

in the past. Um, and when you make a

16:01

manual intervention, from then on,

16:03

you're in manual driving until you

16:05

re-engage a lot of those features. But

16:08

in this case, it seemed so smooth. It

16:10

was like touch the wheel, adjust the

16:12

turn a little bit, you're back to hands

16:14

off, feet off, you know.

16:15

>> Exactly. So,

16:16

>> So in this case, George can

16:18

tap the gas, he can, you know, touch the

16:20

steering wheel, right? The system will

16:22

stay engaged. It's only if he hits the

16:23

brakes that it'll disengage completely.
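[Editor's sketch] The engagement rule just described (steering or accelerator input keeps the system engaged, and only the brake disengages it) is effectively a tiny state machine. The code below is a minimal illustration of that rule as stated in the conversation; the `DriverInput` enum and the function name are hypothetical.

```python
from enum import Enum


class DriverInput(Enum):
    STEERING = "steering"
    ACCELERATOR = "accelerator"
    BRAKE = "brake"


def system_stays_engaged(engaged: bool, driver_input: DriverInput) -> bool:
    """Return the engagement state after a driver input.

    Steering and accelerator inputs are treated as collaboration, so the
    system stays engaged. Only braking disengages it completely.
    """
    if not engaged:
        return False  # inputs never re-engage the system in this sketch
    return driver_input is not DriverInput.BRAKE


print(system_stays_engaged(True, DriverInput.STEERING))
print(system_stays_engaged(True, DriverInput.BRAKE))
```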

16:24

>> Got it. Uh, super interesting. And

16:27

sorry, so right before that happened, we

16:29

were talking about uh level three and

16:31

level four. Can you just remind me? So

16:33

with Uber, that's what you're talking

16:34

about.

16:34

>> Yeah. So we'll have uh level four will

16:36

start rolling out in LA and San

16:38

Francisco to start, right? And it'll be

16:39

28 cities by the end of 2028 uh with

16:42

them. So we're excited to see that

16:44

coming as well. So we can see this guy

16:45

stopped, right? No problem. We have that

16:48

>> there. Uh so yeah, so we're excited

16:50

again to see how the architecture can

16:52

scale from a level two plus product all

16:54

the way up to that level four, right?

16:56

Where the car would have to do

16:57

everything, right? Where you you don't

16:58

have George here, right? You don't have

16:59

anyone in that seat, right? The car

17:00

would be able to handle all of those

17:02

scenarios.
>> Are you expecting when that

17:04

happens, will there be certain new kinds

17:07

of cars coming out that maybe don't have

17:08

a steering wheel at all? Like how does

17:11

this impact the future of what

17:13

automobiles will even look like?

17:15

>> Yeah, there's a couple of different

17:16

ways, right? I think we can imagine a

17:18

world where, yeah, for kind of robotaxis,

17:21

right where there never will be a driver

17:24

right you can have a design where you

17:25

know there's a car that doesn't have a

17:27

steering wheel, right, or has

17:28

the seats that maybe face inward right

17:30

that's one concept

17:31

>> uh the other one right it can be you

17:33

know consumer grade where you know you

17:35

can buy a car where you know at least

17:38

living in California right I might want

17:39

to go drive on, you know, Highway 1 and

17:41

PCH and go drive you know the beautiful

17:43

scenic countryside roads along the ocean

17:46

but then when I'm sitting in traffic in

17:47

San Francisco, I want the car to do

17:48

everything, right? So, you can have

17:50

different approaches to, you know, when

17:52

you want to drive versus when you don't.

17:53

So, as long as you have the

17:56

sensor set, right, the stack is flexible

17:58

enough that you can have a steering

17:59

wheel there and we can design it where

18:01

we want the driver to be part of the

18:02

experience or we can do it where we

18:04

don't want the driver to be part of the

18:05

experience at all.

18:06

>> Sure. No, that makes a lot of sense.

18:07

Speaking of the stack, um since you know

18:09

so much of it is camera based, uh what

18:12

is performance like in, you know, like, super

18:14

foggy weather, nighttime, bad rain,

18:17

right?

18:17

>> Yeah, it's a great question. So the

18:19

system has levels of kind of degradation

18:21

that it can accept, right? And we

18:23

also can understand where maybe the

18:25

blockages are, right? So let's say there's

18:27

dirt that gets on some of the cameras,

18:28

right? For example, if we're looking in

18:30

front of us, if it's blocking where we

18:32

can't really see the tops of these

18:33

buildings, right? We don't really care

18:35

as much, right? you don't need to

18:36

prioritize things that are up high,

18:37

right? As long as we still can see

18:38

directly in front of us in that type of

18:40

drivable space. So, it's prioritizing

18:43

different areas for each of the cameras

18:44

of what is most important, right? And

18:46

then until they reach, you know, a

18:47

certain degradation level, right? Then

18:49

it may say, "Hey, okay, actually, we

18:51

want the driver to take over in this

18:52

case." Right? And then with kind of the

18:54

level threes and level fours, right,

18:55

that's why we want the redundant sensor

18:57

sets there so we can have a couple

18:59

different options in order to help aid

19:02

uh what the car can see, right? and make

19:03

sure that you know it's able to see all

19:05

the objects and all the cars on the

19:06

road.
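[Editor's sketch] The degradation logic described above (a blocked view of building tops matters little, a blocked view of the drivable space matters a lot, and past some level the driver is asked to take over) can be illustrated as an importance-weighted blockage score. The region names, weights, and the 0.3 threshold are all assumptions chosen for the example, not values from the system.

```python
def needs_driver_takeover(blocked_fraction: dict[str, float],
                          importance: dict[str, float],
                          threshold: float = 0.3) -> bool:
    """Request a takeover when weighted camera blockage reaches a threshold.

    blocked_fraction: share of each image region that is obscured (0 to 1).
    importance: relative weight of each region; regions missing from
    blocked_fraction are treated as fully visible.
    """
    total_weight = sum(importance.values())
    score = sum(weight * blocked_fraction.get(region, 0.0)
                for region, weight in importance.items()) / total_weight
    return score >= threshold


# Hypothetical weights: the road ahead matters far more than the sky.
weights = {"sky": 0.05, "road_ahead": 0.95}
print(needs_driver_takeover({"sky": 1.0, "road_ahead": 0.0}, weights))
print(needs_driver_takeover({"sky": 0.0, "road_ahead": 0.5}, weights))
```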

19:08

>> That makes a lot of sense. So, we've

19:10

we've seen a lot of interesting use

19:12

cases already, right? For example, um or

19:14

edge cases, sorry, I should say, like um

19:16

construction in the middle of the road

19:18

where you have to make a left right

19:19

decision, but the construction zone is

19:20

between them. What are some of the, like,

19:22

craziest edge cases you've seen that

19:25

turn into real practical examples and

19:27

training data for the model?

19:29

>> Yeah, I would say a lot of construction

19:30

ones have been interesting, right? Uh we

19:33

had one example, uh, it was actually in

19:35

San Francisco where there was a row of

19:38

cones, right? And there's people working

19:39

in the middle. No problem, right? We see

19:42

that, you know, a million times. No

19:43

issues, right? But then one of the

19:44

construction workers decided to throw a

19:46

cone in front of the car, right? To say,

19:48

"Hey, stop. We're going to

19:49

unload some stuff." So he literally just

19:50

throws a cone, right? And the car sees

19:52

this object. Yeah. So then it stops,

19:53

right? But you're like, "Okay, wait,

19:55

what? I've never seen someone do that

19:56

before." And the guy just didn't want to

19:58

wait. So he threw a cone in front of the

19:59

car and then he went and carried, you

20:00

know, a couple boxes across the street

20:02

and then went and picked up the cone and

20:03

then walked out of the way, you know,

20:04

>> and it was like totally fine.

20:05

>> Yeah, it was totally fine, right? But it

20:06

was one of those things you're like,

20:07

"Okay, I've never seen that

20:09

even as a human driver," right? So we can

20:11

see we don't want to block the

20:13

intersection, but then we also have, you

20:16

know, these pedestrians here, right? So

20:18

we want to try to clear the

20:19

intersection as much as we can, right?

20:21

But then we have these guys that are

20:23

walking, right? Cuz the light came to a

20:24

stop. We're in gridlock, right? This is

20:26

a, you know, a nice deep dense traffic

20:28

area in downtown San Francisco.

20:30

>> So, this is a case where it seems to

20:31

have made the decision, uh, even though

20:33

we were already moving at low speeds and

20:35

the light was yellow, uh, then it turned

20:37

red while we were still in the

20:38

intersection. Is is that typical? Like,

20:41

walk me through kind of what just

20:42

happened versus what maybe the average

20:44

person would have expected to happen.

20:46

>> Right. It's, you know, so you have this

20:48

guy that's really close to a spine, so

20:49

that's why we got the rear blind spot

20:51

going off there. Uh, it's interesting,

20:54

right? The, you know, yellow light

20:56

handling is also one of those

20:57

interesting things as a human, right?

20:58

Where you can tell how quickly you're

20:59

moving towards a car or towards the

21:01

intersection and you can kind of gauge,

21:03

okay, I should stop for this one, right?

21:04

I'm really far away. I'm not going that

21:05

fast. I should stop. Uh versus, hey, I'm

21:07

really close to the intersection. I

21:09

should probably proceed to be safer that

21:10

way. You're not slamming on the brakes,

21:11

right? So, it's a very similar approach,

21:13

right? We have enough training data on

21:15

yellow lights, right? Where it's

21:16

learned, okay, in this situation where

21:18

I'm, you know, about this far away at

21:19

this speed, right? I should proceed

21:20

through versus I should not. And then we

21:22

also know in this situation where if we

21:24

end up in a situation like that where we

21:25

get kind of stuck where we've already

21:27

entered the intersection, we're already

21:28

past kind of the wait line, right,

21:30

we should try to clear the intersection,

21:31

right? So that's why we came over to the

21:33

right lane and we're able to kind of

21:34

open up and clear up the intersection so

21:36

that we're not blocking the cross

21:37

traffic.
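
The yellow-light reasoning described above (far away and slow means stop; close to the intersection means proceed rather than slam on the brakes) can be approximated with a simple kinematic rule. This is only an illustrative sketch: the system in the car is described as learning this behavior from training data, so the rule below is a swapped-in stand-in, and the function name, the 3.0 m/s² comfort deceleration, and the 1.0 s reaction time are all assumptions.

```python
def should_proceed_on_yellow(distance_m, speed_mps,
                             max_comfort_decel=3.0, reaction_s=1.0):
    """Return True if stopping before the line would need harsher
    braking than the (assumed) comfort limit, so proceeding is smoother.

    distance_m: distance to the stop line in meters.
    speed_mps: current speed in meters per second.
    """
    if speed_mps <= 0:
        return False  # already stopped: stay stopped
    # Distance covered before braking begins.
    reaction_dist = speed_mps * reaction_s
    # Distance needed to brake to zero at the comfort deceleration.
    braking_dist = speed_mps ** 2 / (2 * max_comfort_decel)
    return distance_m < reaction_dist + braking_dist


# Far away and slow: there is room to stop comfortably.
print(should_proceed_on_yellow(distance_m=60, speed_mps=8))
# Close to the intersection: stopping would mean hard braking.
print(should_proceed_on_yellow(distance_m=10, speed_mps=12))
```

A learned model would weigh far more context (cross traffic, pedestrians, vehicles behind), which is the point made in the conversation; the sketch only captures the distance-versus-speed trade-off.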

21:38

>> Got it. Yeah. As a more selfish driver,

21:40

I would have probably stayed in the

21:41

intersection, not gotten in this lane,

21:43

you know, so it's really interesting

21:44

seeing um the way the car prioritizes

21:48

certain things. Like for example,

21:50

clearing the intersection at the cost of

21:52

our own convenience because now we're in

21:53

sort of a lane people park in versus the

21:56

more human decision of like, oh, I'll

21:57

just wait because I'm about to clear

21:59

this light even though right now I'm

22:00

sticking in the intersection. Right.

22:02

>> Yeah. But then it also saw we had people

22:03

walking. Right. So it stopped. Right. So

22:05

it let those people cross ahead. Right.

22:07

>> Right.

22:08

>> Whereas I probably would have just

22:09

beeped at them.

22:12

>> What are you most looking forward to

22:14

like near-term more on the road map? Is

22:16

there like a specific feature that's

22:18

coming soon? Is it more like the global

22:20

rollout? Like walk me through as

22:21

somebody who lives this sort of

22:22

day-to-day like what's kind of next that

22:24

you're looking forward to?

22:25

>> It's a great question. I think what is

22:27

exciting is it's kind of like watching

22:29

like a 16-year-old learn how to drive

22:30

right as it gets better every day,

22:32

right? So uh as mentioned when I'm not

22:34

uh sharing these experiences with you

22:36

guys, I'm driving the car every day and

22:37

experiencing uh the latest builds.

22:39

>> Uh so it's fun to see the car get better

22:41

at handling those edge cases like the

22:43

construction worker scenario I gave or

22:46

you know the construction in the middle

22:47

lane. Seeing how the car can handle more

22:48

and more complex situations is really

22:51

cool. And then as I mentioned like the

22:52

test down in San Diego, right, taking it

22:54

to different locations, right? Seeing

22:56

how the car can handle, you know, the

22:58

different scenarios, right, is also

23:00

really exciting to see how it can drive

23:02

in different cities, right?

23:04

>> Jeez. Yeah, this is this is tough. I I

23:07

get it. Um and sorry I don't know if you

23:09

said this when we had already started

23:11

recording but can you just please say

23:13

like what your actual role is what you

23:16

actually do at NVIDIA right

23:17

>> so I'm one of the product managers that

23:18

works on our ADAS features right so uh I

23:21

specifically am on our user experience

23:23

team so we try to look after the overall

23:25

driving behavior right so is the car

23:28

comfortable is it being safe right how

23:30

does it feel when the car is driving so

23:32

that's my priority when it comes to uh

23:34

working on the stack

23:35

>> And sorry, when you say, you know, is it

23:37

comfortable? Is it safe? Do you mean

23:38

like individual to individual like this

23:40

experience or do you mean like based on

23:42

the data we're seeing, you know, the

23:44

smoothness of stops, the ease of turns,

23:47

like the more macro level like

23:49

statistics, the overall experience is

23:51

safe, easy, like

23:53

>> both, right? So, we look at both, right?

23:54

So, we have a number of, you know, uh,

23:56

regression tests that we can do where we

23:58

could run the car through, I don't know,

23:59

10,000 left turn events, right? and make

24:01

sure that it always makes sure that it

24:03

clears safely and doesn't collide with

24:05

any traffic or pedestrians in

24:06

simulation. Right? So, we do all of our

24:08

offline testing and then we also do

24:10

on-road testing, right? Because

24:11

ultimately we also want to validate,

24:13

right, the behavior uh in the car as

24:16

well. So, uh I do both, right? And it's

24:18

a lot of fun to actually get in the car

24:20

and actually experience different builds

24:21

and see some models may be more relaxed,

24:24

some models may be more aggressive,

24:25

right? And everything in between. Uh and

24:27

we try to kind of design for you know

24:29

kind of the 80% where you know my mom

24:31

would be happy to get in this car who

24:32

someone who's not technical right

24:34

doesn't you know doesn't want to give up

24:36

control right where we can get to the

24:37

point where someone can feel comfortable

24:39

uh and feel safe using these types of

24:41

software. >> So, simulation. While we're

24:44

on the subject I'm really interested you

24:46

know you take 10,000 right turns in

24:48

simulation for example right and then

24:50

you look at data from 10,000 right turns

24:52

in real life on the same model how big

24:55

of a variance or like difference I guess

24:57

between what you see in simulation and

24:59

what you see using real data is there

25:01

>> yeah with Cosmos and our physical kind

25:04

of AI simulation right you get the real

25:06

world physics right so it's it actually

25:09

behaves differently right if you run a

25:10

simulation for an area with snow and

25:12

rain right braking distances are longer

25:14

right because in real life they would be

25:16

right so uh it's quite accurate actually

25:18

you you'd be surprised where okay this

25:20

model shows hey this one might uh we

25:23

call like understeer where it turns and

25:24

it drifts into the other lane because it

25:25

can't keep its lane as tightly, right?

25:27

We can see that in simulation and if you

25:29

deploy that model to a car, it follows

25:31

like, oh, actually this car doesn't

25:33

follow the turn trajectory as tightly as

25:34

we'd like it to, right? So, it gives you

25:36

a pretty good uh sense of what the

25:38

performance will look like.

25:39

>> It probably also gives you a pretty good

25:41

sense of like how good drivers are,

25:42

right? Like what a car would do versus

25:44

what people decide to do, like how many

25:46

people decide to take control of a turn

25:48

versus let the model go through it, for

25:50

example.

25:50

>> Yep. So, we get that data. We also, it's

25:52

interesting. Obviously, it's trained on

25:54

a lot of human driving data, right? So

25:55

things like uh what we call like the

25:57

California roll where you creep through

25:59

a stop sign and don't come to a complete

26:01

stop, right? So you can see you know

26:03

certain data sets have more of that in

26:04

it. So all of a sudden now we see the

26:06

model thought it stopped but it really

26:07

didn't. It's like okay wait no we need

26:09

to enforce that. So that's where having

26:11

that classical stack underneath is

26:13

really helpful where you can enforce

26:14

certain things like making sure you stop

26:16

completely for stop signs, right? Or

26:18

>> uh for example if you were to go to

26:21

different states where you can't make

26:22

right turns on red, right? you can

26:24

enforce it by location, right? Things

26:26

like that. You're also able to help, you

26:28

know, enforce the behavior.

26:30

>> Can you speak a little bit more to that

26:31

enforcement? Is that like um, you know,

26:33

a rules-based enforcement? Like, hey, in

26:36

this boundary box, which is the state

26:37

line or whatever, the rule is no right

26:39

turn on red, or is it more like a

26:41

training a separate model per state, but I

26:43

know not exactly a separate model, but

26:44

like walk me through a little bit.

26:46

>> So, we we do a bit of both, right? So,

26:47

you can have region specific. So, you

26:50

can see this guy cutting in, right? He's

26:51

here. No problem, right? we can kind of

26:53

get over it's all good but then we see

26:55

these people right so we're just being a

26:57

little cautious here. Uh, so you can do

26:59

different data sets right where you can

27:02

have your you know California data set

27:05

Florida data set if you'd like right you

27:06

can have different uh locations in the

27:08

actual model yeah but then you also to

27:10

your point you can also use uh rules to

27:13

enforce certain behaviors like hey when

27:15

in you know a different state right a

27:17

state that doesn't allow right on red

27:18

right you can do that right so you also

27:21

can see uh the sign there that says no

27:23

right on red. Yeah.

27:24

>> Right. So, we'll come to a stop here and

27:26

we'll wait a little bit further. Right.

27:28

We have a couple areas where you can

27:29

make a right on red where the car will

27:31

creep forward and it'll make sure

27:32

there's nobody coming and it'll make

27:33

that turn here.

27:34

>> And sorry, like um how does it determine

27:36

right now? I mean, this is gonna sound

27:39

like a silly question, but like does it

27:41

know it's in California right now

27:43

because it's in some certain latitude

27:44

longitude box or is it told that it's in

27:48

California or like

27:48

>> the car has GPS, so it knows that it's

27:50

uh at least for this.

27:51

>> So, it's just using GPS data on top of

27:52

that. Okay. I didn't know if it was Got

27:54

it. That makes a lot of sense.
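
The rule-based half of that answer (use GPS to decide which region the car is in, then enforce that region's rule, with a posted sign always respected) might look roughly like the following. This is a sketch under stated assumptions, not the actual stack: the `Region` class, the bounding boxes, the region names, and the conservative default for unknown locations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical latitude/longitude box with a right-on-red rule."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    right_on_red_allowed: bool

    def contains(self, lat, lon):
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


# Made-up boxes for illustration only; they do not match real borders.
REGIONS = [
    Region("region_a", 0.0, 10.0, 0.0, 10.0, right_on_red_allowed=True),
    Region("region_b", 20.0, 30.0, 0.0, 10.0, right_on_red_allowed=False),
]


def may_turn_right_on_red(lat, lon, sign_prohibits, regions=REGIONS):
    """Combine a perceived 'no right on red' sign with the regional rule."""
    if sign_prohibits:
        return False  # a posted sign overrides the regional default
    for region in regions:
        if region.contains(lat, lon):
            return region.right_on_red_allowed
    return False  # unknown location: assume the maneuver is not allowed


print(may_turn_right_on_red(5.0, 5.0, sign_prohibits=False))
print(may_turn_right_on_red(5.0, 5.0, sign_prohibits=True))
print(may_turn_right_on_red(25.0, 5.0, sign_prohibits=False))
```

Keeping the rule outside the learned model is what lets it be enforced deterministically, which matches the role given to the classical stack in the conversation.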

27:56

>> So, so you see we had the green light,

27:58

but this guy decided to go. So, we

27:59

yielded for him, right? So, even though

28:01

humans can be bad actors, right? Now, we

28:03

can go. Nobody else is going. Great. We

28:05

can make this turn.

28:06

>> Yeah.

28:07

I don't know what's worse, me behind the

28:09

driver's seat of a self-driving car or

28:11

me being a pedestrian once self-driving

28:14

cars are on the road. You know,

28:16

>> I'll be a bad actor either way.

28:18

>> That's humans.

28:23

>> Um, what are some typical confusers that

28:27

um sometimes can make the car misbehave?

28:29

So, for here's what I'm specifically

28:31

asking. Right now, we're on an incline

28:33

and there's like low hanging wires

28:35

directly. I know not directly in front

28:37

of the car, but because our

28:39

nose is pointed up, we're seeing things

28:41

that would seem to be unusual to

28:43

sensors, right?

28:44

>> Yeah. So, to kind of the earlier part

28:46

about like camera degradation, right, we

28:48

know to prioritize things that are

28:49

closer to the ground, right? So, what we

28:51

can see where the car actually is going

28:52

to drive, right? So, if we see something

28:54

weird like the wires above, right, we

28:55

can say, "Hey, that probably doesn't

28:57

have anything to do with where we're

28:58

driving, right? We can ignore, you know,

29:00

these cables above us, right? Those

29:02

aren't lane lines, right? Those aren't

29:03

railroad tracks, right? Those are just

29:04

cables, right?

29:05

>> So with enough data, right, you can

29:07

learn to deprioritize certain regions

29:09

of what the car can see and then also

29:11

what it is seeing.

29:12

>> Does the car care like does the car

29:13

understand when it's like how does the

29:16

car understand when it's on an incline

29:18

and it needs to look like further like

29:20

closer to the ground?

29:21

>> So you'll see actually uh on this route

29:24

we also have some of the nice classic

29:25

San Francisco hills, right? So the car

29:27

can tell that there is occlusion there,

29:29

right? So maybe it needs to be a little

29:31

bit more cautious when dealing with

29:32

all-way stop signs in that scenario

29:33

because you might not be able to see

29:34

someone right who's coming. So uh it has

29:37

this understanding of hey I can't see

29:40

right I can tell there's gradient here

29:42

let me be uh a little bit more cautious

29:44

drive a little bit more slowly

29:45

>> right like we also see a section where

29:47

the speed limit will be 25 miles an hour

29:49

right but the car actually will slow

29:51

down because at 25 mph it feels very

29:53

fast on a steep hill in San Francisco

29:55

right so the car will naturally slow

29:56

itself down

29:57

>> and again that also comes with kind of

29:59

the end-to-end model where you get enough

30:01

diverse driving data right it'll learn

30:03

that humans may naturally slow down even

30:06

though the speed limit may be higher,

30:07

right? If it's a narrow road or a steep

30:09

road, we naturally will slow down. The

30:11

car will also learn that behavior.

30:13

>> That makes a lot of sense. And I guess

30:14

one thing I just thought of is some of

30:16

the cameras, I believe, are mounted

30:18

higher than us, right? Like they're

30:20

mounted up here. So things that seem

30:22

invisible to me, the camera might still

30:24

be able to see over it. Right.

30:25

>> Exactly. So Right. And it has multiple

30:27

cameras and it has a radar, right? So it

30:28

can compare the position, what it can

30:30

see from each of those and try to

30:31

determine, okay, is there something

30:33

there? Right. Should I be cautious?

30:34

what's happening here?

30:36

>> What do you do when um there's a

30:37

situation where one sensor says one

30:39

thing and another says another? So, for

30:41

example, a highly reflective surface

30:43

to the radar, but that doesn't

30:45

show up like on the cameras,

30:47

>> you know?

30:47

>> Yeah. So, it'll do a comparison, right?

30:49

And we can assign like confidence

30:50

percentage to different things, right?

30:52

So, we can say, hey, we are not sure

30:54

what this is, right? So, we we have what

30:56

we call multi-sensor fusion, right? And

30:58

then it can choose what to weight and what

31:00

to prioritize based on how confident

31:03

it is on what it's detecting.
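
One simple way to picture "assign a confidence percentage, then choose what to weight" is a reliability-weighted average of per-sensor confidences. The sketch below is an assumption-laden illustration, not the fusion method used in the car: the function names, the weights, and the thresholds are all invented for the example.

```python
def fuse_confidence(detections, weights):
    """Reliability-weighted average of per-sensor detection confidences.

    detections: dict mapping sensor name -> confidence in [0, 1] that
                an object is present at some location.
    weights:    dict mapping sensor name -> assumed reliability weight.
    """
    total_weight = sum(weights[sensor] for sensor in detections)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[sensor] * conf
                   for sensor, conf in detections.items())
    return weighted / total_weight


def classify(fused, present_at=0.6, clear_at=0.2):
    """Map a fused confidence to a coarse decision (thresholds assumed)."""
    if fused >= present_at:
        return "object"
    if fused <= clear_at:
        return "clear"
    return "uncertain"  # sensors disagree: drive more cautiously


# Radar sees a strong return (e.g. a reflective surface); camera does not.
fused = fuse_confidence({"radar": 0.9, "camera": 0.1},
                        {"radar": 0.3, "camera": 0.7})
print(round(fused, 2), classify(fused))
```

The "uncertain" band is the interesting part: a disagreement between sensors does not have to be resolved immediately, it can simply make the planner more conservative until more evidence arrives.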

31:05

>> Got it. And like does that multi-sensor

31:08

fusion think about like the resolution

31:11

of like cameras versus the radar like

31:14

>> Yeah. So it can it knows obviously we

31:15

know the spec uh of what the camera is,

31:17

what it can see, right? So it's able to

31:19

determine, hey, what is the likelihood

31:21

that I think that there's something

31:22

here, right? And then you can see here

31:24

it looks like maybe it's trying to

31:25

consider a lane change for this guy

31:27

who's stopped who decided to stop. So

31:31

we'll wait for these cars to pass.

31:34

So you can see it's creeping forward.

31:36

All right. And it was able to make that

31:37

lane change to go around all these guys

31:38

that are stopped. Uh one thing that we

31:41

found that's really interesting with the

31:43

model is lane changing actually feels a

31:45

lot more natural. Uh with the classical

31:48

stack, right, you have to identify where

31:49

the gap is in traffic, right? So you're

31:51

calculating the velocity of the lead

31:53

car, the rear car, right? And also have

31:55

to determine where to position your own

31:57

vehicle, right? So getting that kind of

31:59

what we call the speed adapt phase where

32:00

the car is accelerating and slowing

32:02

down, right? You have both lateral and

32:04

longitudinal acceleration that you need

32:05

to consider, right, when making those

32:07

lane changes. Uh with the classical

32:09

stack, we found that sometimes it could

32:10

feel

32:11

>> a little bit more robotic or a little

32:12

bit more jerky, right? Whereas once we

32:14

started training the model on lane

32:15

change data, it felt really smooth,

32:18

right? where it's able to, you know,

32:19

gently slot itself into gaps in traffic

32:22

where it feels much more intuitive and

32:24

it feels much more humanlike because

32:25

it's able to, you know, more naturally

32:27

control both lateral and longitudinal uh

32:29

behavior.
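
The classical approach described here (find the gap, using the velocities of the lead and rear cars in the target lane) can be sketched as a gap-acceptance check. Everything specific below is an assumption made for illustration: the function name, the constant-speed projection, the 3-second horizon, and the 8-meter minimum gap.

```python
def gap_is_acceptable(ego_speed, lead_gap_m, lead_speed,
                      rear_gap_m, rear_speed,
                      horizon_s=3.0, min_gap_m=8.0):
    """Check whether a target-lane gap stays open during a lane change.

    Speeds are in m/s, gaps in meters. The gaps are projected forward
    assuming every vehicle holds its current speed (a simplification).
    """
    # Gap to the lead car shrinks if we are faster than it.
    lead_future = lead_gap_m + (lead_speed - ego_speed) * horizon_s
    # Gap to the rear car shrinks if it is faster than us.
    rear_future = rear_gap_m + (ego_speed - rear_speed) * horizon_s
    # Under constant speeds the gaps change linearly, so checking both
    # endpoints covers the whole horizon.
    return (min(lead_gap_m, lead_future) >= min_gap_m
            and min(rear_gap_m, rear_future) >= min_gap_m)


# Everyone at the same speed with room on both sides.
print(gap_is_acceptable(10.0, 20.0, 10.0, 15.0, 10.0))
# A faster car approaching from behind closes the rear gap.
print(gap_is_acceptable(10.0, 20.0, 10.0, 15.0, 14.0))
```

A hard threshold like this is one reason a rule-based lane change can feel robotic: the answer flips abruptly from "no" to "yes", whereas the learned model is described as slotting into gaps more gradually.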

32:31

Are there any surprises? Like, so that's

32:33

a good example of something where adding

32:35

in that next layer of data made the

32:37

driving experience noticeably and

32:39

significantly better, right? Less jerky,

32:41

more smooth. Um are there other examples

32:43

like that you can share where it's like

32:45

adding that extra layer of um training

32:48

really resulted in a clear change in

32:49

behavior?

32:50

>> Yeah, a big one that was a challenge for

32:51

us is handling like double parked

32:53

vehicles, right? So you have a car

32:55

stopped in front of you. You need to go

32:56

into your oncoming lane, right? And then

32:59

come back into your original lane,

33:00

right? So you have to detect the

33:02

distance between you and the double

33:03

parked car and then you have to also

33:05

check for any oncoming vehicles, right?

33:07

That can be really challenging because

33:09

if it's something narrow like a bicycle

33:11

or motorcycle, right? Or something coming

33:13

really quickly, right? That can be a bit

33:14

of a challenge to know when you should

33:16

decide to go versus when not to go,

33:18

right? But giving that uh a big data set

33:21

of human driving, right? The model we

33:23

found was actually uh much more natural

33:26

in its timing, right? So its decision

33:27

to go versus not go felt a lot uh more

33:30

natural and actually the maneuver

33:31

itself, the quality of the maneuver was

33:33

a lot better, right? And so that's

33:34

another one where we're really excited

33:36

to see, okay, here's this really

33:37

challenging scenario, right? But the car

33:40

is able to do it in a way that feels

33:42

natural and humanlike without giving you

33:44

like big jerks or harsh brakes because

33:46

it doesn't want to hit something. So

33:48

that was really nice.

33:49

>> How is the car itself at parallel

33:51

parking? Is that something like

33:52

>> Yeah, we can do parallel parking. We can

33:54

do perpendicular parking, right? Can do

33:56

angled spots, right? So, uh we also have

33:58

parking capabilities. We're looking to

34:00

add kind of ability to park within

34:02

parking structures and things like that.

34:03

So that's other products that we're

34:04

working on. Uh so it's exciting to see

34:08

how quickly it advances, especially with

34:10

uh the end-to-end models, right? We see

34:13

the rate of improvement is pretty quick.

34:15

And because it's sitting on top of that

34:17

classical stack, you still get the

34:19

safety of the classical stack, right?

34:20

But then you get all these improvements

34:21

for things like lane change or double

34:23

parked vehicles, all that sort of stuff,

34:25

you're able to quickly iterate on and

34:27

get the advantage of that end-to-end

34:28

driving behavior and the human-like

34:30

behavior.

34:32

That's really interesting. Oh, yep.

34:36

>> Yeah. So, it sees there's guys there

34:37

with cones, right?

34:39

>> And then it seems to like have

34:40

understood, hey, that cone actually

34:42

isn't in my lane. I'm just going to keep

34:44

going. No problem. Right. That was

34:45

really cool.

34:46

>> Yeah. So, you can see, hey, we'll slow

34:47

down. We see there's a guy. Is it going

34:48

to step in front of us? What's going to

34:49

happen? Right. Okay. No. Okay. I can go

34:52

ahead and proceed.

34:52

>> What is the So, like um how many times a

34:56

second does it make decisions? I know

34:57

it's like continuously, but you only

34:59

get, you know, I'm making this number

35:01

up, 60 frames a second from the cameras,

35:03

let's say.

35:04

>> So, like how often is it processing and

35:06

making those decisions per second?

35:07

>> Uh, I don't know the exact number off

35:09

the top of my head, but I can tell you

35:09

it's generating trajectories basically

35:11

every second, comparing that with the

35:12

classical stack to determine, hey,

35:15

is this rational and is this safe? So,

35:17

we can follow up after and get you the

35:18

exact number if you

35:19

>> Sure. Yeah. Just generally,

35:20

>> just curious. Yeah. Um, when I I went to

35:24

the Q&A with Jensen and I got to ask him

35:26

a question and I asked him, you know,

35:29

with the advent of like OpenClaw and

35:30

NemoClaw, what

35:32

application areas are you most excited

35:34

to see tackled with these new

35:35

technologies? And actually to my

35:36

surprise, he talked a lot about

35:38

self-driving and how agents and agentic AI

35:41

will sort of be infused into cars in the

35:43

future and help with that decision

35:44

stack. Can you speak to as somebody

35:46

who's like a little more on the ground,

35:48

can you speak to are how are you guys

35:50

thinking if you are at this point using

35:52

OpenClaw, like how does that fit into

35:54

this larger picture if at all yet?

35:56

>> Yeah. But for you know my role no not

35:58

not yet right but uh some of our other

36:00

team members are using it to help you

36:01

know search for you know certain data

36:03

sets that we need right so if we're

36:05

looking we can use AIs to search for

36:07

construction workers that throw cones in

36:09

front of you right across all of the you

36:11

know data that we collect from all of

36:12

our fleet plus you know data that we get

36:14

from customers and partners right so

36:17

you're able to use it to train you can

36:19

use AI to find the data you need to

36:21

train your model right so

36:23

>> uh I don't work on the model directly

36:24

but you know my team members do, too,

36:26

right? There's different ways that we

36:27

use, you know, AIs in that way in order

36:29

to find the data that we need for those

36:31

corner cases that we're looking for to

36:33

help improve the performance.

36:35

>> Yeah, that makes a lot of So, it's

36:36

sometimes it's more about um using AI to

36:39

go through all this largely unstructured

36:41

data.

36:41

>> Correct. We use it to label the data,

36:43

right? So, we can say, hey, these are

36:44

cars, these are people, these are dogs,

36:45

right? These are cones, right? And then

36:47

we also can use it then to go capture or

36:50

collect the data that we need to train

36:51

the model in a certain scenario.

36:53

>> That makes a lot of sense. How do you um

36:57

determine when you need to take do

37:00

something new in simulation? Like do you

37:02

look at data first and then say hey we

37:05

don't really have enough of this kind of

37:06

data. Let's go simulate this case many

37:08

many more times or like what's the

37:10

process for deciding when to load

37:12

something up in Omniverse and just like

37:14

>> create a new scenario. Yeah. So we have

37:16

this concept we call like a functional

37:18

scenario tree right. So to give you an

37:20

example right let's take stopping at all

37:22

way stop signs right. So, okay, we know

37:25

for an all-way stop sign scenario, you

37:27

can go straight, you can turn left, you

37:28

can turn right. Now, do we have data

37:31

that covers all three of those

37:33

scenarios, right? Uh, yes, but maybe we

37:36

have a little bit less data, right?

37:37

Let's go mine for more data for

37:38

specifically all-way stop signs, right?

37:40

Uh, and then we say, okay, well, we also

37:42

realize that we need to consider, let's

37:44

say, two-way stops, right? So, add

37:46

another node to the tree, right?

37:47

Okay, now we need two-way stop sign

37:49

scenarios, right? So as you expand upon

37:51

individual scenarios and use cases, you

37:53

then can layer on data on top to support

37:55

and continue to build out all kind of

37:57

the long-tail corner cases, right? So

37:59

>> the driving experience you're seeing

38:00

here, right,

38:01

>> I don't want to speak for you, but it's

38:02

pretty good, right? So the car is able

38:04

to handle construction and pedestrians

38:06

and things like that, right? But again,

38:08

as we go to different geographies,

38:11

different scenarios that maybe we

38:12

haven't encountered, right? We can

38:13

always continue to add more. And if we

38:15

what we do is we look at any new issues

38:17

that we find across the fleet all across

38:19

you know the globe right and we'll say

38:21

hey we've never seen anything like this

38:22

before. We haven't simulated this

38:23

anything before let's go find data that

38:25

supports this now one new issue that we

38:27

found and then we can expand it like

38:29

that. It's like here like we're driving

38:30

with cones in the middle of the road,

38:32

right? We don't know if there's people,

38:33

there's guys reversing, right? Let's

38:35

brake a little bit and confirm, right?

38:37

Okay, now we have this where we can use

38:39

this in the future, right? If you wanted

38:40

to, hey, we need data on driving past

38:42

cones, right? As an example,
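
The "functional scenario tree" idea (scenarios as nodes, data layered on top, new nodes added as new cases are discovered) can be pictured as a tree whose leaves carry a count of available driving clips. The sketch below is purely illustrative: the `ScenarioNode` class, the clip counts, and the coverage threshold are hypothetical, and the real tooling is not described in the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioNode:
    """A node in a hypothetical scenario tree; leaves hold clip counts."""
    name: str
    clip_count: int = 0
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def under_covered(node, min_clips, path=()):
    """Return the paths of leaf scenarios with fewer than min_clips clips."""
    here = path + (node.name,)
    if not node.children:
        return [" / ".join(here)] if node.clip_count < min_clips else []
    found = []
    for child in node.children:
        found.extend(under_covered(child, min_clips, here))
    return found


root = ScenarioNode("stop signs")
all_way = root.add(ScenarioNode("all-way stop"))
all_way.add(ScenarioNode("go straight", clip_count=500))
all_way.add(ScenarioNode("turn left", clip_count=120))
all_way.add(ScenarioNode("turn right", clip_count=40))
# A newly identified case starts with no data behind it.
root.add(ScenarioNode("two-way stop"))

for scenario in under_covered(root, min_clips=100):
    print("need more data:", scenario)
```

The under-covered leaves are the candidates for data mining or for generating new simulated scenarios, which is the workflow outlined above.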

38:44

>> it's it's really interesting, you know,

38:45

like now that I'm really alert and like

38:48

keeping my eyes open for it. Driving's

38:50

hard, man. Like there's a lot of stuff

38:51

going on in the road that you kind of

38:53

take for granted when you're driving in

38:54

the moment because that's all you're

38:55

focusing on. But when you can take a

38:57

step back and just assess and be like,

38:59

"Oh yeah, we've been in a few crazy

39:00

situations already,

39:02

>> you know, it's really funny." And you're

39:04

right, the car has handled it very well.

39:06

Um, good segue into my next question.

39:08

What do you think, you know, is the

39:10

ultimate hard scenario for uh

39:13

self-driving. So like my thought, you

39:16

know, I imagine those videos or images

39:18

uh in India with those giant roundabouts

39:21

where it's just like people are merging

39:22

and weaving through each other. It's

39:23

just like it seems like pure chaos when

39:25

you're observing from above. Um, but

39:28

that's just my thought. What is actually

39:30

the hardest scenario to train for?

39:31

>> I think the way to think about it,

39:32

right, is is very similar, right? What

39:34

do you think would be difficult for a

39:35

human, right? And because a lot of the

39:37

driving behavior is trained off human

39:38

data, right?

39:39

>> What would humans struggle with, right?

39:40

Those are the things that we need to get

39:42

through. So whether it's, you know, very

39:44

dense traffic with scooters and people

39:46

walking in between with no lane lines

39:48

and no road markings, right? uh those

39:50

are the things that I think you know are

39:51

the kind of the longer tail right

39:53

solutions that you know we need to get

39:54

to where I think again with enough data

39:57

right I don't see why we wouldn't be

39:58

able to support you know handling

40:00

driving in different countries different

40:02

geographies right different weather

40:03

conditions right because you have enough

40:05

understanding of what good driving

40:07

behavior is right you can reinforce that

40:09

and make it so that way it can handle uh

40:11

better than a good human driver would

40:13

>> yeah no that makes a lot of sense um and

40:15

then I guess as a follow-up to that you

40:17

know one of the things that I'm noticing

40:18

is super important is lane markings,

40:21

right? Um how is this on dirt roads?

40:24

>> So it'll use the context from other cars

40:26

as well, right? To understand where

40:29

other people are driving, right? So it

40:31

doesn't Yeah. Yes. Obviously having lane

40:33

markings is nice, but uh there are

40:35

certain parts even in San Francisco

40:36

where the road's under construction where

40:37

there are no lane markings, right? So

40:39

the car is able to see, hey, this is

40:40

roughly the width of two lanes and I can

40:42

see the other cars are driving here.

40:44

Okay, contextually, this is where I

40:45

should drive.

40:46

>> Yeah, this must be where the lane would

40:47

be. Right.

40:48

>> Exactly. Right. So the the platform

40:50

itself, right, is an HD mapless

40:52

solution, right? So it's able to

40:54

understand context, right, and try to

40:57

figure out, okay, this is where the lane

40:59

should be, right? I should drive here,

41:01

>> right? Um, when you say fully

41:03

contextless, like one of the things I'm

41:05

also imagining is like no cell signal,

41:08

no online, like this can this is a fully

41:10

self-contained solution that doesn't

41:12

reach back over the internet or the

41:13

cloud to anything, correct?

41:14

>> Yeah. So this is all built uh, you know,

41:16

in the car, right? So this is a

41:18

production car. We've just flashed one

41:20

of our latest kind of software builds to

41:22

the car so that way we can enable all

41:23

these features, right? But the physical

41:25

hardware on the car is the same, right?

41:27

And there's no, you know, there is an

41:28

internet connection that we use to

41:29

upload data, but there's no streaming,

41:32

you know, to this car saying, "Hey,

41:34

here's what the latest map is of

41:35

what's going on in San Francisco."

41:37

>> Um, what about like you, this is maybe a

41:40

silly question, but we just passed a

41:42

handicap parking space. If I put a

41:44

handicap parking um you know thing here,

41:48

would it contextually understand, hey,

41:49

now I'm allowed to park in a handicap

41:51

space?

41:51

>> So, not yet. That's not something we

41:52

have yet. So, we're not looking at like

41:54

curb colors or anything quite yet, but

41:56

you know, those are some of the concepts

41:57

that some of my colleagues are working

41:58

on. That's, you know, exciting to see.

42:00

You know, hopefully, you know, we'll

42:01

roll some of those features out in the

42:02

future. >> Is there a, so like, broader than

42:05

like even that you know if I had

42:08

handicap put like I guess what I'm

42:09

really asking is like is there a button

42:11

I can push to say hey this vehicle is

42:13

allowed to park in certain special

42:15

spaces that would not otherwise be you

42:18

know handicap in this example but you

42:19

can imagine like any wide variety of

42:22

>> yeah it depends on what the the partner

42:24

is looking for right so we can adapt

42:26

the stack to do any number of things

42:28

right so if someone wanted us to look

42:30

for you know identifying curb color,

42:32

right? And understanding that yellow

42:34

means temporary parking, but red means

42:36

no parking, right? Those are the types

42:37

of things that we can do and we can work

42:38

on with the partner in order to provide

42:40

that type of capability. Uh, which we

42:42

haven't done that yet here for this

42:43

case.

42:44

>> What about Sorry, I realize I'm asking a

42:46


This video features a real-world, unedited one-hour test drive in downtown Los Angeles using a Mercedes vehicle equipped with Nvidia's L2++ autonomous driving platform. The narrator rides with Armen Connie, a senior product manager at Nvidia, to discuss the system's technical design, including the use of cameras, radar, and ultrasonics without LiDAR. They explore how the system handles complex urban scenarios like construction zones, pedestrian interactions, and gridlock, while highlighting the 'world model' that synthesizes sensor data to inform decision-making. The discussion also covers the roadmap from L2++ to L4, the role of end-to-end training models, and the collaboration between the new AI stack and a traditional safety-backup stack.