E24: I Tested NVIDIA's Self Driving Car... Is Tesla In Trouble?
Today, you're joining me for something
really special. A real-world, unedited
1-hour drive through downtown Los
Angeles in a Mercedes equipped with
Nvidia's L2++ autonomous driving
platform. This isn't a highlight reel,
and it's not a simulation. It's
continuous footage of the system
navigating the everyday chaos of LA
traffic: lane merges, sudden cut-ins,
construction zones, and unpredictable
pedestrians. Joining me for the ride is
Armen Connie, senior product manager for
autonomous vehicles user experience at
NVIDIA. I asked him every technical
question I could think of. So, you'll
hear him explaining what's really
happening under the hood, from sensor
fusion and decision logic to how the
user experience is designed to keep
drivers informed. For investors, I hope
this footage helps you assess how far
Nvidia's automotive platform has really
come and how it compares to others in
the autonomous mobility market. Every
few minutes, something unexpected
happens, and how the system handles it
might just challenge your assumptions
about who's ahead in full self-driving,
or at least by how much. Your time is
valuable. So, let's get right into it.
>> So, the system's engaged, right? So, you
can see we'll try to make this uh right
on red here after uh this car passes by.
>> Yeah.
>> Uh so, what you're experiencing here,
this is what we call our level two plus
experience. So, this is built on our
Hyperion architecture, right? So, this
car is using 10 cameras, five radar, and
then 12 ultrasonics uh for parking.
>> No lidar.
>> No lidar on this car.
>> Okay.
>> Right. So, because we still have George,
this is still a level two product, right? Uh
we still have the driver here. Uh and
it's designed where he can collaborate
with the car, right? So, uh the car will
follow all the speed limits. Uh if he
wanted to increase the speed, he can do
that from pressing the steering wheel
button. There's a speed adjustment
there. He can request lane changes by
using the turn signal stalks. Uh, or if
he wants, let's say there's a big
pothole in the road or something like
that, he would be able to collaborate
with the car and actually adjust the
steering and then release the steering
and then the car resumes uh you know the
driving. So uh but for now we'll let the
car do everything. So we're approaching
the stop sign. The car can see the stop
sign there. It'll know to stop. It knows
to follow the right of way order right
before proceeding. So we're going to
start with a kind of a little section
here where we're going to get to like a
shopping area with some uh restaurants
and stores. So hopefully we'll get some
people crossing the street and some
delivery vans and things like that just
so we can show that we can handle a
variety of these scenarios.
>> Sure.
Uh one of my first questions is going to
be how did you guys come to the design
decision not to incorporate lidar?
>> So we work with uh our partners here to
determine what sensors right we want to
use. And for a level two plus product uh
we felt that we can achieve that with
just the 10 cameras and the five uh
radar with ultrasonics as well. Uh but
for our level three and level four
initiatives, that's when we'll add the
additional lidar to it. And then we'll
also change the actual driving model
that we're using to a bigger model. So
it all just scales to what the, you
know, design intent is for the given
product.
>> Yeah. And I assume that's like going to
be like a more extended version of the
same stack. So like the people who just
want L2++, you know, the people who want
three, four, or eventually five. It's
just an evolution of that same stack,
not a different stack completely.
>> So it's the same principles, right? So
what we're experiencing here today, this
is running all in a single orin. Uh and
then this uh experience is about 95% of
the driving will be done by Alamo,
right? And then we still have that
classical stack sitting kind of you
remember those old drivers education
cars where you have like two sets of
brake pedals, two gas pedals, two
steering wheels.
>> So the way I like to think about it is
uh Alpha Mayo, the end model in the
driver's seat, right? It's doing the
driving. Yeah,
>> but we have this classical stack that is
sitting in that passenger seat with the
extra set of brake pedal, gas pedal, and
steering wheel to take over to help
enforce certain rules if needed to. So,
that's how we're able to have both the
safety of that kind of classical stack,
right? But also the human driving
behavior of the end-to-end model where
you get that smooth, comfortable driving
behavior.
>> Got it. Um, one thing I'd love to
understand is, you know, you mentioned
there's a lot of cameras, there's a lot
of radars, and there's a lot of
ultrasonic sensors, right?
Um, what does each of those sensors do and
how do they get combined into this
larger like 360° view of what's going on
around the car?
>> Yeah. So it can take input from all
those sensors and it creates what we
call the world model, right? So the car
can see that all these cars are parked,
none of them are moving. So we can
detect the velocities for example,
right? Based off what the car can see
with the cameras, we can see the lane
lines and the lane markings. So we can
tell that we're in a drivable lane here,
but to our right is a bike lane, right?
So the car can label and understand what
all those are. So then it creates this
reconstruction of the world, right? And
it uses that to understand behaviors,
right? So for example, at an all-way
stop sign, we can use that to understand
right-of-way order, right? So we can see
when the other cars arrive and determine
when it is our turn to move in terms of
precedence, right? But then we also have
to consider there's these people
walking, right? So they will also impact
our ability to start. So you can see
this guy's here stopped, right? You have
these guys in the crosswalk. It's safe
for us to proceed, right? We also can
see that there's this guy on the scooter
in the bike lane. We don't need to freak
out, right? We can just keep driving in
our own lane. No problem, right? We can
just drive next to this guy,
>> right?
>> So the end-to-end model is using kind of
the front camera where it can see,
right? And then it's receiving inputs
from that world model to see what's also
going on behind it as well. And then the
classical stack is using all of them to
determine where it should go.
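Editor's sketch: the right-of-way reasoning described above (who arrived at the all-way stop first, plus whether pedestrians are in the way) can be illustrated in a few lines of Python. This is only a toy under stated assumptions, not NVIDIA's implementation: the `StoppedVehicle` type, the `may_proceed` function, and the idea of comparing arrival timestamps are all hypothetical, and ties (where the usual rule is to yield to the vehicle on the right) are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoppedVehicle:
    name: str
    arrival_time_s: float  # hypothetical: time the vehicle stopped at its line


def may_proceed(ego: StoppedVehicle,
                others: List[StoppedVehicle],
                pedestrians_in_path: int) -> bool:
    """Return True when it is ego's turn and nobody is walking in our path.

    `others` holds only vehicles still waiting at the intersection;
    a vehicle is dropped from the list once it has driven through.
    """
    if pedestrians_in_path > 0:
        return False  # people in the crosswalk always block our start
    # Ego goes only if it arrived before every vehicle still waiting.
    return all(ego.arrival_time_s < other.arrival_time_s for other in others)


ego = StoppedVehicle("ego", arrival_time_s=2.0)
first = StoppedVehicle("first", arrival_time_s=1.0)
later = StoppedVehicle("later", arrival_time_s=3.0)
print(may_proceed(ego, [first, later], pedestrians_in_path=0))  # first car goes before us
print(may_proceed(ego, [later], pedestrians_in_path=0))         # now it is our turn
```
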
>> And sorry. So it sounds like everything
you've described so far is coming from
the cameras, right? That's the
>> cameras and it's also using the radar as
well. So
>> and the radar.
>> Yep. So it can use both basically to
understand the velocity of an object
right or a person that's walking right
it can tell right this is a car in front
of us. So it combines actually both
inputs in order to understand what it's
looking at and what it can detect with
the radar.
>> And then the third type of sensor you
mentioned I believe was ultrasonic. Is
that right?
>> So those are used for parking. Right. So
when we get really close to curbs and
things like that that's where we're
using those ultrasonics. But for driving
most of it's done with the uh camera and
the the radars.
>> Yeah. And and the radars are there
primarily to do range and speed.
>> Correct. You've got it.
>> That's very cool. How come you can't
use, um, information from multiple
cameras, like in stereo vision, to
determine range and speed? Why radar at all?
>> We do a bit of both, right? So, you
know, that's a bit of the secret
sauce, right? But, uh, you know, we're
using both to confirm and understand the
world around us, right? So, that's where
you have that redundancy that's helpful
to understand what is drivable, what
these other things are doing, right? Are
these people, right? You know, a radar
can see these people are crossing the
street, right, even without the camera.
Add the camera to tell you that, hey,
this is a person, right, I know a
person's shape, right, and you're able to
combine those together and understand
that these are people walking versus
that's a car or a small scooter, right?
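Editor's sketch: the idea of combining what the camera recognizes (a person's shape) with what the radar measures (range and speed) can be shown with a very small association step. Everything here is an assumption made for illustration, including the `CameraDetection`, `RadarTrack`, and `FusedObject` types, matching by bearing angle, and the 3-degree gate; the speaker describes the actual fusion only as "secret sauce". Bearing wrap-around at 180 degrees is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CameraDetection:
    label: str          # what the camera thinks it is, e.g. "pedestrian"
    bearing_deg: float  # direction of the object relative to the car


@dataclass
class RadarTrack:
    bearing_deg: float
    range_m: float
    radial_speed_mps: float


@dataclass
class FusedObject:
    label: str
    range_m: float
    radial_speed_mps: float


def fuse(cams: List[CameraDetection],
         radars: List[RadarTrack],
         max_bearing_gap_deg: float = 3.0) -> List[FusedObject]:
    """Give each radar track the label of the nearest camera detection.

    Radar supplies range and speed; the camera supplies the class.
    Tracks with no camera detection close enough stay "unknown".
    """
    fused = []
    for radar in radars:
        best = min(cams,
                   key=lambda c: abs(c.bearing_deg - radar.bearing_deg),
                   default=None)
        if best is not None and abs(best.bearing_deg - radar.bearing_deg) <= max_bearing_gap_deg:
            label = best.label
        else:
            label = "unknown"
        fused.append(FusedObject(label, radar.range_m, radar.radial_speed_mps))
    return fused


cams = [CameraDetection("pedestrian", 10.0), CameraDetection("car", -20.0)]
radars = [RadarTrack(11.0, 15.0, 1.2), RadarTrack(40.0, 30.0, 0.0)]
for obj in fuse(cams, radars):
    print(obj)
```
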
>> so certain solutions on the road try to
approach this from a vision only
perspective
>> um how did you guys like can you walk me
through a little bit of the bigger
thought process that made the
determination you know what vision only
may not be enough especially for like an
L3 or L4 solution and these other
sensors need to come into play is that
from a safety perspective, a regulation
perspective, a capabilities in general
perspective. How how did you guys decide
to use more than vision only in the
first place?
>> For our Hyperion architecture, right, we
wanted the redundancy, right? So for
like level three and level four, uh
we'll move to, uh, Thor, right? And
we'll also have the lidar there, right? So
we get more information, right?
>> And sorry, can you just briefly explain
what Thor is and Orin is and
>> Yep. So these are the different, uh,
onboard computing chips that we provide
to our partners, right? So the Thor has
more computing power than the Orin. So
with additional computing power, you can
use bigger end-to-end models, right? We
can take inputs from more uh signals,
right? So we can use more cameras to
power that model to get more
information. So like here's a great
example. We just had that light turn
yellow right as we're pretty close,
right? The car has to understand, should
I, you know, step on the brakes and stop
before the uh light or should I proceed
through kind of like a human would,
right? So it's calculating the distance
between us and the stop line, how fast
we're moving, right? to make some of
those decisions as well. So, sorry, just
an interesting scenario that we had.
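Editor's sketch: the stop-or-proceed choice at a yellow light, based on speed and distance to the stop line, can be illustrated with basic kinematics. Stopping exactly at the line from speed v over distance d needs a constant deceleration of v^2 / (2d). The function names and the 3.0 m/s^2 comfort threshold below are assumptions for illustration; the speaker describes the real behavior as learned by the model from training data, not as a formula like this.

```python
def required_deceleration(speed_mps: float, distance_to_stop_line_m: float) -> float:
    """Constant deceleration needed to stop exactly at the line: v^2 / (2 d)."""
    if distance_to_stop_line_m <= 0:
        return float("inf")  # already at or past the line: stopping there is impossible
    return speed_mps ** 2 / (2.0 * distance_to_stop_line_m)


def yellow_light_action(speed_mps: float,
                        distance_to_stop_line_m: float,
                        comfortable_decel_mps2: float = 3.0) -> str:
    """Stop if it can be done gently; otherwise proceed through."""
    needed = required_deceleration(speed_mps, distance_to_stop_line_m)
    return "stop" if needed <= comfortable_decel_mps2 else "proceed"


# Far away and slow: 10^2 / (2 * 40) = 1.25 m/s^2, a gentle stop.
print(yellow_light_action(speed_mps=10.0, distance_to_stop_line_m=40.0))
# Close and fast: 15^2 / (2 * 15) = 7.5 m/s^2, which would mean slamming the brakes.
print(yellow_light_action(speed_mps=15.0, distance_to_stop_line_m=15.0))
```
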
>> Uh, so yes, like coming up ahead, I
think we're gonna have an unprotected
left turn, which will also be interesting to
see, right? You know, we have to
consider the oncoming cars, right? We
have to consider if anybody's in the
crosswalk. So, it should be
interesting, uh, coming up here. But this
is that shopping area I was mentioning, where we
might have some double parked vehicles
and some pedestrians crossing the
street. So, uh earlier today, we've been
lucky. We've got some fun interesting
scenarios. I'm hoping we get some to
share with you guys as well.
>> Yeah. No, I'm looking forward to that.
I'm I'm noticing this is probably the
most boring driving job ever because
it's like you're not touching the
steering wheel. You're never touching
either pedal. Like it seems so smooth
that it's like it's interesting just
being here almost like a safety backup
instead of um the primary, right? Like
you're really being driven instead of
you driving the car is what I'm just
noticing for really the first time.
Yeah.
>> Yeah. So for for example, I did a test
where I drove from San Francisco to San
Diego in one day. So that was about 14
hours. It's about 1,000 miles, right? So
it was another driver and I, and, uh, we
went down. And you'd imagine, being in the
driver's seat for that long, right, you'd
arrive really tired and, you know,
irritated, right? You're sitting in traffic
throughout the day and things like that.
>> But honestly, by using the system and
letting the car handle a lot of the kind
of mundane tasks of, you know, sitting in
bumper-to-bumper traffic in LA, right, the
car handled a lot of that, right? So
I actually arrived, even after a long day,
quite refreshed and actually not that
tired, because you release so much of
that, you know, processing that you're
doing, right, while you're driving, that
you arrive a little bit more refreshed.
>> Yeah. You you uh eliminate a lot of that
decision fatigue, right? That decision
fatigue.
>> What uh what's going on with the
displays here? So, like this seems to be
a pretty static display. Um walk me
through like what the driver what kind
of information you're presenting to the
driver in a situation where they're
being driven versus them driving
themselves.
>> So, here on the center display, right,
you can see just the navigation, right?
So again, we just set a route out to
show you a variety of scenarios, but
we're just using that purely for
navigation, right? So we're not getting
any there's no HD map here. So we're not
getting any hints about the lane to our
right is a parking lane, then there's a
bike lane and there's this lane. So we
don't get any of that information. Uh
that's based purely off what the car can see.
And then George, when you have a second,
do you want to switch over to the
confidence view? Right. So we also then
can provide these inputs to our partners
where they can choose how they want to
visualize what
>> I'm sorry, I'm looking there now, right?
>> Yes. In front of the driver. So the
instrument cluster there, right? So we
can show that, hey, we see we have this
lead vehicle, right? So the car is able
to present that information to the
driver to help communicate a little bit
more about what it can see, what it's
doing, right? And we have all those
inputs that we can share, right? So we
can share things like traffic lights,
other vehicles, lane detections, and
then the partners can choose what they
want to uh use to display.
>> And sorry, just just so for clarity for
me, everything I'm seeing on the screens
now is purely for humans. None none of
this is also like information like the
information that the car itself is using
to make decisions is completely separate
from these like
>> What you're saying, yeah. The only thing
that the car is taking here is basically the
route, right? That's the only thing that
it receives that you can see. So as we
adjust around and again as we said
towards the end here, we can update the
route and set new points that are not on
the route. No problem, right? Uh at that
point the car is just getting the
navigation to turn on this street,
right? Proceed on that street. That's
all it's getting from there.
>> Um how come? Why like why why not give
it as much information like is is the
idea that's the sufficient amount of
information to do the job and you don't
need to give it more or is there like
why not give it for example all the
speed limits uh to these roads is it
because it's enough to have it determine
things, uh, with cameras looking at
street signs, even in low visibility, or
>> So there's always a data quality
question, right? So sometimes we can get
incorrect information from the map or it
doesn't have information right so we'll
use what we have if it's there, but if
it's not there, no problem. We'll go off
what the car can see from perception,
right? So, as we see speed limit signs
change, right, the car then can see that
and adjust its speed accordingly, right?
So, you know, if maps don't get updated,
things like that, we want to make sure
that the car is always driving based off
what is most relevant. Like we can see
we have some construction coming up
ahead, right? That may not be
represented in any mapping, right? So,
the car just needs to be able to handle,
okay, there's a lane closure coming up,
right? So, you can see the car stopped
as this guy cut in front, right? No
problem. Uh, it's able to stop. And then
you can see up ahead, right? We'll have
this lane closure with that big LED
board, right? So, we can see, you know,
the the signs there, right? And we know
we want to make a left turn, so it's
going to want to get over to the left,
but it's actually going to change its
mind and go here, right? No problem. It
sees there's a guy standing on the road,
right? So, we can come here and we'll
drive here with this center lane that's
closed, right? How often do you see a
center lane is closed?
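Editor's sketch: the speed-limit handling described in this answer (use map data when it exists, but rely on what the car perceives when the map is missing or stale) can be written as a small selection function. The function name, the idea of tracking a sign's age, and the 120-second freshness window are assumptions for illustration; in particular, preferring a recently seen sign over the map value is the editor's reading of "driving based off what is most relevant", not a confirmed design detail.

```python
from typing import Optional


def effective_speed_limit(map_limit_mph: Optional[float],
                          sign_limit_mph: Optional[float],
                          sign_age_s: Optional[float],
                          max_sign_age_s: float = 120.0) -> Optional[float]:
    """Pick which speed limit to follow.

    A recently perceived sign wins (assumed priority), then the map value,
    then the last sign seen, however old. None means no information at all.
    """
    sign_is_fresh = (sign_limit_mph is not None
                     and sign_age_s is not None
                     and sign_age_s <= max_sign_age_s)
    if sign_is_fresh:
        return sign_limit_mph
    if map_limit_mph is not None:
        return map_limit_mph
    return sign_limit_mph


# Map says 35, but a sign seen 10 seconds ago says 25 (say, a work zone).
print(effective_speed_limit(35.0, 25.0, 10.0))
# No sign seen yet: fall back to the map.
print(effective_speed_limit(35.0, None, None))
```
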
>> Yeah. Right.
What's that like for you as the driver,
if you don't mind me asking? Like, you
know, someone cuts in front of you, your
feet are close to the pedals, your hands
are close to the wheel,
and you do nothing and it just works,
like
>> um Well, I've been testing the software
enough, so I know the car can handle
most of the situations.
>> Was the first time it was just like,
"Oh, this is crazy." And then,
>> No, I've seen crazier stuff.
>> Okay. Yeah. So, so this it sounds like
it can really handle truly like outlier
situations is what I'm getting at.
>> Yeah. So the, you know, the model's been
trained enough, right, that we can do a
lot of the general driving, right? And
what we're excited about is, you know,
we'll have, you know, kind of a beta
release of this, uh, in Q2 of this year,
right? But we're looking for a, so we
can see we're trying to get through with
these two guys here. No problem. Uh,
we're trying to do a nationwide rollout
by the end of this year, right? So,
>> And sorry, when you say nationwide
rollout, what do you mean? Is that like
nationwide in Mercedes? Like what what
is
>> for customers? Yeah. That are buying
this car, right? they'd be able to, you
know, purchase this software, right, and
use it to drive, you know, from here to,
you know, Miami or, you know, everything
in between, right? So, that's really
exciting. And then from that, right, as
the, you know, we have this rolled out
in more and more customer cars, right,
we start to get, you know, interesting
data events that we get from those cars
as well. So, that data will be sanitized
and sent back to us. So, we can always
evaluate all the new events that we're
seeing that maybe our fleet didn't
catch, right? But someone living in, you
know, a state where we don't have a test
car, right? we can then start getting
that information that way as well and
use that to enhance the models for
future releases.
>> And I so I assume a bunch of things are
happening at once, right? Like for
example, the software is being offered
in more and more automobiles and with
more and more auto uh manufacturers and
then separately there's also growing
more and more capabilities in each one
of those. So for example, we're in an L2
plus vehicle, right? Yep.
>> And then you know eventually level three
and beyond. Can you speak a little bit
to... All right, you just said a little bit
about the nationwide rollout, right?
What's on the roadmap, uh, the
other way, right? Like when do you guys
expect to sort of reach a level three, a
level four?
>> Yeah. So, as you may have seen a lot of
the news with GTC this year, we
announced that with Uber, we'll do a
level four robotaxi in Los Angeles and
San Francisco starting next year. Right.
So, you can see we're going down
this nice narrow road, right?
>> And sorry, he actually touched the
wheel there. What's the
distinction? Like, why
manually turn there as opposed to let
the car do it?
>> Um
>> I'm genuinely like I'm just trying to
understand the
>> I'm just a safety driver. So if I feel
like sometimes, um, the car maybe
is, um, going to, uh, get into contact with
objects, then I can collaborate. Uh, I can
do something called collaborative, uh,
steering.
>> but the car still lets me uh handle the
wheel
>> and it's like a seamless transition it
seems like like you you did one tiny
maneuver hands off and it wasn't like a
hard intervention like the mode didn't
change.
>> Yeah. So the design is such that, for
level two, George can be involved or he
can let go, right? It can go either way,
right? So like I've done that turn six
times today, right? No issue, right? So
in this case, right, depending on if
George wants to get closer or not,
right? He can help the steering out,
right? To increase his comfort, right?
Or increase our comfort, right? In that
case,
>> and that hand like I think what I'm more
commenting on is like I've I've been in
other, you know, level two assisted cars
in the past. Um, and when you make a
manual intervention, from then on,
you're in manual driving until you
re-engage a lot of those features. But
in this case, it seemed so smooth. It
was like touch the wheel, adjust the
turn a little bit, you're back to hands
off, feet off, you know.
>> Exactly. So,
>> So in this case, George can
tap the gas, he can, you know, touch the
steering wheel, right? The system will
stay engaged. It's only if he hits the
brakes that it'll disengage completely.
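Editor's sketch: the engagement rule just described (steering or accelerator input keeps the system engaged; only braking disengages it) is a tiny state transition. The `DriverInput` enum and the function name are hypothetical names chosen for illustration.

```python
from enum import Enum


class DriverInput(Enum):
    STEERING = "steering"
    ACCELERATOR = "accelerator"
    BRAKE = "brake"


def system_engaged_after(engaged: bool, driver_input: DriverInput) -> bool:
    """Return whether the driving system is still engaged after a driver input.

    Steering and accelerator inputs are treated as collaboration and leave
    the system engaged; pressing the brake disengages it completely.
    """
    if not engaged:
        return False  # driver inputs alone never re-engage the system here
    return driver_input is not DriverInput.BRAKE


state = True
for action in (DriverInput.STEERING, DriverInput.ACCELERATOR, DriverInput.BRAKE):
    state = system_engaged_after(state, action)
    print(action.value, "->", "engaged" if state else "disengaged")
```
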
>> Got it. Uh, super interesting. And
sorry, so right before that happened, we
were talking about uh level three and
level four. Can you just remind me? So
with Uber, that's what you're talking
about.
>> Yeah. So we'll have uh level four will
start rolling out in LA and San
Francisco to start, right? And it'll be
28 cities by the end of 2028 uh with
them. So we're excited to see that
coming as well. So we can see this guy
stopped, right? No problem. We have that
>> there. Uh so yeah, so we're excited
again to see how the architecture can
scale from a level two plus product all
the way up to that level four, right?
Where the car would have to do
everything, right? Where you you don't
have George here, right? You don't have
anyone in that seat, right? The car
would be able to handle all of those
scenarios.
>> Are you expecting, when that
happens, will there be certain new kinds
of cars coming out that maybe don't have
a steering wheel at all? Like how does
this impact the future of what
automobiles will even look like?
>> Yeah, there's a couple of different
ways, right? I think we can imagine a
world where, yeah, for kind of robotaxis,
right, where there never will be a driver,
right, you can have a design where, you
know, there's a car that doesn't have a
steering wheel, right, or has
seats that maybe face inward, right?
That's one concept.
>> uh the other one right it can be you
know consumer grade where you know you
can buy a car where you know at least
living in California right I might want
to go drive on, you know, Highway 1 and
PCH and go drive you know the beautiful
scenic countryside roads along the ocean
but then when I'm sitting in traffic in
San Francisco, I want the car to do
everything, right? So, you can have
different approaches to, you know, when
you want to drive versus when you don't.
So, as long as you have the
sensor set, right, the stack is flexible
enough that you can have a steering
wheel there and we can design it where
we want the driver to be part of the
experience or we can do it where we
don't want the driver to be part of the
experience at all.
>> Sure. No, that makes a lot of sense.
Speaking of the stack, um since you know
so much of it is camera based, uh, what
is performance like in, you know, super
foggy weather, nighttime, bad rain,
right?
>> Yeah, it's a great question. So the
system has levels of kind of degradation
that it can accept, right? And we
also can understand where maybe the
blockages are, right? So let's say there's
dirt that gets on some of the cameras,
right? For example, if we're looking in
front of us, if it's blocking where we
can't really see the tops of these
buildings, right? We don't really care
as much, right? you don't need to
prioritize things that are up high,
right? As long as we still can see
directly in front of us in that type of
drivable space. So, it's prioritizing
different areas for each of the cameras
of what is most important, right? And
then once they reach, you know, a
certain degradation level, right, then
it may say, "Hey, okay, actually, we
want the driver to take over in this
case," right? And then with kind of the
level threes and level fours, right,
that's why we want the redundant sensor
sets there so we can have a couple
different options in order to help aid
uh what the car can see, right? and make
sure that you know it's able to see all
the objects and all the cars on the
road.
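Editor's sketch: the idea that a blocked view of building tops matters far less than a blocked view of the road directly ahead can be expressed as a weighted visibility score per camera. The region names, the weights, and the 0.7 takeover threshold are all invented for illustration; the transcript gives no actual numbers or region layout.

```python
from typing import Dict

# Hypothetical importance of each image region for a front camera.
REGION_WEIGHTS: Dict[str, float] = {"sky": 0.1, "horizon": 0.3, "road": 0.6}


def weighted_visibility(blocked_fraction: Dict[str, float]) -> float:
    """Score from 0 to 1: how much of what matters the camera can still see.

    `blocked_fraction` maps a region name to the share of it that is
    obscured (0.0 = clear, 1.0 = fully blocked). Missing regions count as clear.
    """
    return sum(weight * (1.0 - blocked_fraction.get(region, 0.0))
               for region, weight in REGION_WEIGHTS.items())


def needs_driver_takeover(blocked_fraction: Dict[str, float],
                          min_visibility: float = 0.7) -> bool:
    """Ask the driver to take over once the weighted score drops too low."""
    return weighted_visibility(blocked_fraction) < min_visibility


# Dirt covering only the top of the image: low-priority area, keep driving.
print(needs_driver_takeover({"sky": 1.0}))
# Most of the road region blocked: the score falls below the threshold.
print(needs_driver_takeover({"road": 0.8}))
```
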
>> That makes a lot of sense. So, we've
we've seen a lot of interesting use
cases already, right? For example, um or
edge cases, sorry, I should say, like um
construction in the middle of the road
where you have to make a left right
decision, but the construction zone is
between them. What are some of the
craziest edge cases you've seen that
turn into real practical examples and
training data for the model?
>> Yeah, I would say a lot of construction
ones have been interesting, right? Uh,
I had one example, uh, it was actually in
San Francisco where there was a row of
cones, right? And there's people working
in the middle. No problem, right? We see
that, you know, a million times. No
issues, right? But then one of the
construction workers decided to throw a
cone in front of the car, right? To say,
"Hey, stop. We're going to
unload some stuff." So he literally just
throws a cone, right? And the car sees
this object. Yeah. So then it stops,
right? But you're like, "Okay, wait,
what? I've never seen someone do that
before." And the guy just didn't want to
wait. So he threw a cone in front of the
car and then he went and carried, you
know, a couple boxes across the street
and then went and picked up the cone and
then walked out of the way, you know,
>> and it was like totally fine.
>> Yeah, it was totally fine, right? But it
was one of those things you're like,
"Okay, I've never seen that,
even as a human driver," right? So we can
see we don't want to block the
intersection, but then we also have, you
know, these pedestrians here, right? So
we want to try to clear the
intersection as much as we can, right?
But then we have these guys that are
walking, right? Cuz the light came to a
stop. We're in gridlock, right? This is
a, you know, a nice deep dense traffic
area in downtown San Francisco.
>> So, this is a case where it seems to
have made the decision, uh, even though
we were already moving at low speeds and
the light was yellow, uh, then it turned
red while we were still in the
intersection. Is is that typical? Like,
walk me through kind of what just
happened versus what maybe the average
person would have expected to happen.
>> Right. It's, you know, so you have this
guy that's really close behind us, so
that's why we got the rear blind spot
going off there. Uh, it's interesting,
right? The, you know, yellow light
handling is also one of those
interesting things as a human, right?
Where you can tell how quickly you're
moving towards a car or towards the
intersection and you can kind of gauge,
okay, I should stop for this one, right?
I'm really far away. I'm not going that
fast. I should stop. Uh versus, hey, I'm
really close to the intersection. I
should probably proceed to be safer that
way. You're not slamming on the brakes,
right? So, it's a very similar approach,
right? We have enough training data on
yellow lights, right? Where it's
learned, okay, in this situation where
I'm, you know, about this far away at
this speed, right? I should proceed
through versus I should not. And then we
also know in this situation where if we
end up in a situation like that where we
get kind of stuck where we've already
entered the intersection, we're already
past kind of the wait line, right,
we should try to clear the intersection,
right? So that's why we came over to the
right lane and we're able to kind of
open up and clear up the intersection so
that we're not blocking the the cross
traffic.
>> Got it. Yeah. As a more selfish driver,
I would have probably stayed in the
intersection, not gotten in this lane,
you know, so it's really interesting
seeing um the way the car prioritizes
certain things. Like for example,
clearing the intersection at the cost of
our own convenience because now we're in
sort of a lane people park in versus the
more human decision of like, oh, I'll
just wait because I'm about to clear
this light even though right now I'm
sticking in the intersection. Right.
>> Yeah. But then it also saw we had people
walking. Right. So it stopped. Right. So
it let those people cross ahead. Right.
>> Right.
>> Whereas I probably would have just
beeped at them.
>> What are you most looking forward to
like near-term more on the road map? Is
there like a specific feature that's
coming soon? Is it more like the global
rollout? Like walk me through as
somebody who lives this sort of
day-to-day like what's kind of next that
you're looking forward to?
>> It's a great question. I think what is
exciting is it's kind of like watching
like a 16-year-old learn how to drive
right as it gets better every day,
right? So uh as mentioned when I'm not
uh sharing these experiences with you
guys, I'm driving the car every day and
experiencing uh the latest builds.
>> Uh so it's fun to see the car get better
at handling those edge cases like the
construction worker scenario I gave or
you know the construction in the middle
lane. Seeing how the car can handle more
and more complex situations is really
cool. And then as I mentioned like the
test down in San Diego, right, taking it
to different locations, right? Seeing
how the car can handle, you know, the
different scenarios, right, is also
really exciting to see how it can drive
in different cities, right?
>> Jeez. Yeah, this is this is tough. I I
get it. Um and sorry I don't know if you
said this when we had already started
recording but can you just please say
like what your actual role is what you
actually do at NVIDIA right
>> so I'm one of the product managers that
works on our ADAS features right so uh I
specifically am on our user experience
team so we try to look after the overall
driving behavior right so is the car
comfortable is it being safe right how
does it feel when the car is driving so
that's my priority when it comes to uh
working on the stack
>> And sorry, when you say, you know, is it
comfortable, is it safe, do you mean
like individual to individual like this
experience or do you mean like based on
the data we're seeing, you know, the
smoothness of stops, the ease of turns,
like the more macro level like
statistics, the overall experience is
safe, easy, like
>> both, right? So, we look at both, right?
So, we have a number of, you know, uh,
regression tests that we can do where we
could run the car through, I don't know,
10,000 left turn events, right, and make
sure that it always
clears safely and doesn't collide with
any traffic or pedestrians in
simulation. Right? So, we do all of our
offline testing and then we also do
on-road testing, right? Because
ultimately we also want to validate,
right, the behavior uh in the car as
well. So, uh I do both, right? And it's
a lot of fun to actually get in the car
and actually experience different builds
and see some models may be more relaxed,
some models may be more aggressive,
right? And everything in between. Uh and
we try to kind of design for, you know,
kind of the 80%, where, you know, my mom
would be happy to get in this car,
someone who's not technical, right,
who, you know, doesn't want to give up
control, right? Where we can get to the
point where someone can feel comfortable,
uh, and feel safe using these types of
software.
>> So simulation, while we're
on the subject, I'm really interested. You
know you take 10,000 right turns in
simulation for example right and then
you look at data from 10,000 right turns
in real life on the same model how big
of a variance or like difference I guess
between what you see in simulation and
what you see using real data is there
>> yeah with Cosmos and our physical kind
of AI simulation right you get the real
world physics right so it's it actually
behaves differently right if you run a
simulation for an area with snow and
rain right braking distances are longer
right because in real life they would be
right so uh it's quite accurate actually
you you'd be surprised where okay this
model shows hey this one might uh we
call like understeer, where it turns and
it drifts into the other lane because it
can't keep its lane as tightly, right?
We can see that in simulation and if you
deploy that model to a car, it follows
like, oh, actually this car doesn't
follow the turn trajectory as tightly as
we'd like it to, right? So, it gives you
a pretty good uh sense of what the
performance will look like.
>> It probably also gives you a pretty good
sense of like how good drivers are,
right? Like what a car would do versus
what people decide to do, like how many
people decide to take control of a turn
versus let the model go through it, for
example.
>> Yep. So, we get that data. We also, it's
interesting. Obviously, it's trained on
a lot of human driving data, right? So
things like uh what we call like the
California roll where you creep through
a stop sign and never come to a complete
stop, right? So you can see you know
certain data sets have more of that in
it. So all of a sudden now we see the
model thought it stopped but it really
didn't. It's like okay wait no we need
to enforce that. So that's where having
that classical stack underneath is
really helpful where you can enforce
certain things like making sure you stop
completely for stop signs, right? Or
>> uh for example if you were to go to
different states where you can't make
right turns on red, right? you can
enforce it by location, right? Things
like that. You're also able to help, you
know, enforce the behavior.
>> Can you speak a little bit more to that
enforcement? Is that like um, you know,
a rules-based enforcement? Like, hey, in
this boundary box, which is the state
line or whatever, the rule is no right
turn on red, or is it more like a
training a separate model state, but I
know not exactly a separate model, but
like walk me through a little bit.
>> So, we we do a bit of both, right? So,
you can have region specific. So, you
can see this guy cutting in, right? He's
here. No problem, right? we can kind of
get over it's all good but then we see
these people right so we're just being a
little cautious here. Uh, so you can do
different data sets right where you can
have your you know California data set
Florida data set if you'd like right you
can have different uh locations in the
actual model yeah but then you also to
your point you can also use uh rules to
enforce certain behaviors like hey when
in you know a different state right a
state that doesn't allow right on red
right you can do that right so you also
can see uh the sign there that says no
right on red. Yeah.
>> Right. So, we'll come to a stop here and
we'll wait a little bit further. Right.
We have a couple areas where you can
make a right on red where the car will
creep forward and it'll make sure
there's nobody coming and it'll make
that turn here.
>> And sorry, like um how does it determine
right now? I mean, this is gonna sound
like a silly question, but like does it
know it's in California right now
because it's in some certain latitude
longitude box or is it told that it's in
California or like
>> the car has GPS, so it knows that it's
uh at least for this.
>> So, it's just using GPS data on top of
that. Okay. I didn't know if it was Got
it. That makes a lot of sense.
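The location-based enforcement described in this exchange can be sketched as a small rule layer that sits on top of whatever maneuver the learned model proposes. This is only an illustration under stated assumptions, not NVIDIA's implementation: the `Region` class, the bounding boxes, the region names, and the `enforce` function are all hypothetical, and the boxes are deliberately coarse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A coarse latitude/longitude box with one traffic rule attached (hypothetical)."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    right_on_red_allowed: bool

    def contains(self, lat: float, lon: float) -> bool:
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


# Illustrative boxes only; a real system would use proper jurisdiction boundaries.
REGIONS = [
    Region("hypothetical_no_turn_region", 0.0, 1.0, 0.0, 1.0, False),
    Region("california_coarse_box", 32.5, 42.0, -124.5, -114.1, True),
]


def right_on_red_permitted(lat: float, lon: float, no_turn_sign_seen: bool) -> bool:
    # A posted "no right on red" sign overrides the regional default.
    if no_turn_sign_seen:
        return False
    for region in REGIONS:
        if region.contains(lat, lon):
            return region.right_on_red_allowed
    # Unknown location: fall back to the conservative answer.
    return False


def enforce(proposed: str, lat: float, lon: float,
            light_is_red: bool, no_turn_sign_seen: bool) -> str:
    """Veto a proposed right turn on red when the rule layer forbids it."""
    if (proposed == "turn_right" and light_is_red
            and not right_on_red_permitted(lat, lon, no_turn_sign_seen)):
        return "stop_and_wait"
    return proposed


# Downtown Los Angeles coordinates, red light, no sign: the turn is allowed.
print(enforce("turn_right", 34.05, -118.24, True, False))
# Same place, but a "no right on red" sign was detected: the turn is vetoed.
print(enforce("turn_right", 34.05, -118.24, True, True))
```

The point of the design is that the rule check does not need to be learned from data: the GPS fix selects the rule, and a detected sign can still override it.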
>> So, so you see we had the green light,
but this guy decided to go. So, we
yielded for him, right? So, even though
humans can be bad actors, right? Now, we
can go. Nobody else is going. Great. We
can make this turn.
>> Yeah.
I don't know what's worse, me behind the
driver's seat of a self-driving car or
me being a pedestrian once self-driving
cars are on the road. You know,
>> I'll be a bad actor either way.
>> That's humans.
>> Um, what are some typical confusers that
um sometimes can make the car misbehave?
So, for here's what I'm specifically
asking. Right now, we're on an incline
and there's like low hanging wires
directly. I know not directly in front
of the car, but because our
nose is pointed up, we're seeing things
that would seem to be unusual to
sensors, right?
>> Yeah. So, to kind of the earlier part
about like camera degradation, right, we
know to prioritize things that are
closer to the ground, right? So, what we
can see where the car actually is going
to drive, right? So, if we see something
weird like the wires above, right, we
can say, "Hey, that's probably doesn't
have anything to do with where we're
driving, right? We can ignore, you know,
these cables above us, right? Those
aren't lane lines, right? Those aren't
railroad tracks, right? Those are just
cables, right?
>> So with enough data, right, you can
learn to deprioritize certain regions
of what the car can see and then also
what it is seeing.
>> Does the car care like does the car
understand when it's like how does the
car understand when it's on an incline
and it needs to look like further like
closer to the ground?
>> So you'll see actually uh on this route
we also have some of the nice classic
San Francisco hills, right? So the car
can tell that there is occlusion there,
right? So maybe it needs to be a little
bit more cautious when dealing with
all-way stop signs in that scenario
because you might not be able to see
someone right who's coming. So uh it has
this understanding of hey I can't see
right I can tell there's gradient here
let me be uh a little bit more cautious
drive a little bit more slowly
>> right like we also see a section where
the speed limit will be 25 miles an hour
right but the car actually will slow
down because at 25 mph it feels very
fast on a steep hill in San Francisco
right so the car will naturally slow
itself down
>> and again that also comes with kind of
the end-to-end model where you get enough
diverse driving data right it'll learn
that humans may naturally slow down even
though the speed limit may be higher,
right? If it's a narrow road or a steep
road, we naturally will slow down. The
car will also learn that behavior.
>> That makes a lot of sense. And I guess
one thing I just thought of is some of
the cameras, I believe, are mounted
higher than us, right? Like they're
mounted up here. So things that seem
invisible to me, the camera might still
be able to see over it. Right.
>> Exactly. So Right. And it has multiple
cameras and it has a radar, right? So it
can compare the position, what it can
see from each of those and try to
determine, okay, is there something
there? Right. Should I be cautious?
what's happening here?
>> What do you do when um there's a
situation where one sensor says one
thing and another says another? So, for
example, a highly reflective surface
to the radar, but that doesn't
show up like on the cameras,
>> you know?
>> Yeah. So, it'll do a comparison, right?
And we can assign like confidence
percentage to different things, right?
So, we can say, hey, we are not sure
what this is, right? So, we we have what
we call multi-sensor fusion, right? And
then it can choose what to weight and what
to prioritize based on how confident
it is on what it's detecting.
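One simple way to picture the "confidence percentage" idea is a weighted average of per-sensor confidences followed by a threshold. This is a sketch under assumptions, not the fusion method actually used in the car: the function names, the weights, and the thresholds are all made up for illustration.

```python
def fuse_confidences(detections):
    """Weighted average of per-sensor confidences.

    detections maps a sensor name to (confidence in [0, 1], weight >= 0).
    The weights stand in for how much each sensor is trusted in this situation.
    """
    total_weight = sum(weight for _, weight in detections.values())
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(conf * weight for conf, weight in detections.values())
    return weighted_sum / total_weight


def decide(fused, obstacle_threshold=0.7, caution_threshold=0.3):
    """Map a fused confidence to a coarse reaction (thresholds are hypothetical)."""
    if fused >= obstacle_threshold:
        return "treat_as_obstacle"
    if fused >= caution_threshold:
        return "slow_and_monitor"
    return "ignore"


# Radar is fairly sure something is there, the camera mostly disagrees,
# and both are trusted equally: the result is a cautious middle ground.
fused = fuse_confidences({"radar": (0.8, 1.0), "camera": (0.2, 1.0)})
print(decide(fused))
```

When the sensors disagree, the fused value lands between the two extremes, so the reaction is to slow down and keep watching instead of either braking hard or ignoring the return.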
>> Got it. And like does that multi-sensor
fusion think about like the resolution
of like cameras versus the radar like
>> Yeah. So it can it knows obviously we
know the spec uh of what the camera is,
what it can see, right? So it's able to
determine, hey, what is the likelihood
that I think that there's something
here, right? And then you can see here
it looks like maybe it's trying to
consider a lane change for this guy
who's stopped who decided to stop. So
we'll wait for these cars to pass.
So you can see it's creeping forward.
All right. And it was able to make that
lane change to go around all these guys
that are stopped. Uh one thing that we
found that's really interesting with the
model is lane changing actually feels a
lot more natural. Uh with the classical
stack, right, you have to identify where
the gap is in traffic, right? So you're
calculating the velocity of the lead
car, the rear car, right? And also have
to determine where to position your own
vehicle, right? So getting that kind of
what we call the speed adapt phase where
the car is accelerating and slowing
down, right? You have both lateral and
longitudinal acceleration that you need
to consider, right, when making those
lane changes. Uh with the classical
stack, we found that sometimes it could
feel
>> a little bit more robotic or a little
bit more jerky, right? Whereas once we
started training the model on lane
change data, it felt really smooth,
right? where it's able to, you know,
gently slot itself into gaps in traffic
where it feels much more intuitive and
it feels much more humanlike because
it's able to, you know, more naturally
control both lateral and longitudinal uh
behavior.
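The classical-stack gap search described here can be sketched as a gap-acceptance check: measure the space to the lead and rear cars in the target lane and compare it with a required spacing. This is a simplified illustration, not the production logic; the `Vehicle` class, the time-headway rule, and the default values are assumptions, and vehicle lengths are ignored.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    position_m: float  # longitudinal position along the road, in meters
    speed_mps: float   # speed in meters per second


def gap_is_acceptable(ego: Vehicle, lead: Vehicle, rear: Vehicle,
                      time_headway_s: float = 1.5,
                      min_gap_m: float = 5.0) -> bool:
    """Check whether ego fits between lead and rear in the target lane.

    Required spacing is the follower's speed times a time headway, with a
    fixed minimum. Vehicle lengths are ignored to keep the sketch short.
    """
    front_gap = lead.position_m - ego.position_m
    rear_gap = ego.position_m - rear.position_m
    # Ego follows the lead car, so ego's speed sets the front requirement.
    needed_front = max(min_gap_m, ego.speed_mps * time_headway_s)
    # The rear car follows ego, so its speed sets the rear requirement.
    needed_rear = max(min_gap_m, rear.speed_mps * time_headway_s)
    return front_gap >= needed_front and rear_gap >= needed_rear


ego = Vehicle(position_m=0.0, speed_mps=15.0)
lead = Vehicle(position_m=30.0, speed_mps=15.0)
# With 30 m on both sides at 15 m/s, the 22.5 m requirement is met.
print(gap_is_acceptable(ego, lead, Vehicle(position_m=-30.0, speed_mps=15.0)))
# A rear car only 10 m back makes the gap too tight.
print(gap_is_acceptable(ego, lead, Vehicle(position_m=-10.0, speed_mps=15.0)))
```

A hard yes/no test like this is one plausible reason a rules-only lane change can feel robotic: the car waits until a threshold flips, then acts, instead of easing into the gap the way the trained model is described as doing.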
Are there any surprises? Like, so that's
a good example of something where adding
in that next layer of data made the
driving experience noticeably and
significantly better, right? Less jerky,
more smooth. Um are there other examples
like that you can share where it's like
adding that extra layer of um training
really resulted in a clear change in
behavior?
>> Yeah, a big one that was a challenge for
us is handling like double parked
vehicles, right? So you have a car
stopped in front of you. You need to go
into your oncoming lane, right? And then
come back into your original lane,
right? So you have to detect the
distance between you and the double
parked car and then you have to also
check for any oncoming vehicles, right?
That can be really challenging because
if it's something narrow like a bicycle
or motorcycle, right, or something coming
really quickly, right? That can be a bit
of a challenge to know when you should
decide to go versus when not to go,
right? But giving that uh a big data set
of human driving, right? The model we
found was actually uh much more natural
in its timing, right? So its decision
to go versus not go felt a lot uh more
natural and actually the maneuver
itself, the quality of the maneuver was
a lot better, right? And so that's
another one where we're really excited
to see, okay, here's this really
challenging scenario, right? But the car
is able to do it in a way that feels
natural and humanlike without giving you
like big jerks or harsh brakes because
it doesn't want to hit something. So
that was really nice.
>> How is the car itself at parallel
parking? Is that something like
>> Yeah, we can do parallel parking. We can
do perpendicular parking, right? Can do
angled spots, right? So, uh we also have
parking capabilities. We're looking to
add kind of ability to park within
parking structures and things like that.
So that's other products that we're
working on. Uh so it's exciting to see
how quickly it advances, especially with
uh the end-to-end models, right? We see
the rate of improvement is pretty quick.
And because it's sitting on top of that
classical stack, you still get the
safety of the classical stack, right?
But then you get all these improvements
for things like lane change or double
parked vehicles, all that sort of stuff,
you're able to quickly iterate on and
get the advantage of that end-to-end
driving behavior and the human-like
behavior.
That's really interesting. Oh, yep.
>> Yeah. So, it sees there's guys there
with cones, right?
>> And then it seems to like have
understood, hey, that cone actually
isn't in my lane. I'm just going to keep
going. No problem. Right. That was
really cool.
>> Yeah. So, you can see, hey, we'll slow
down. We see there's a guy. Is it going
to step in front of us? What's going to
happen? Right. Okay. No. Okay. I can go
ahead and proceed.
>> What is the So, like um how many times a
second does it make decisions? I know
it's like continuously, but you only
get, you know, I'm making this number
up, 60 frames a second from the cameras,
let's say.
>> So, like how often is it processing and
making those decisions per second?
>> Uh, I don't know the exact number off
the top of my head, but I can tell you
it's generating trajectories basically
every second, comparing that with the
the classical stack to determine, hey,
is this rational and is this safe? So,
we can follow up after and get you the
exact number if you
>> Sure. Yeah. Just generally,
>> just curious. Yeah. Um, when I I went to
the Q&A with Jensen and I got to ask him
a question and I asked him, you know,
with the advent of like OpenClaw and
Nemoclaw, what are you most what
application areas are you most excited
to see tackled with these new
technologies? And actually to my
surprise, he talked a lot about
self-driving and how agents and agentic AI
will sort of be infused into cars in the
future and help with that decision
stack. Can you speak to as somebody
who's like a little more on the ground,
can you speak to are how are you guys
thinking if you are at this point using
OpenClaw like how does that fit into
this larger picture if at all yet?
>> Yeah. But for you know my role no not
not yet right but uh some of our other
team members are using it to help you
know search for you know certain data
sets that we need right so if we're
looking we can use AIs to search for
construction workers that throw cones in
front of you right across all of the you
know data that we collect from all of
our fleet plus you know data that we get
from customers and partners right so
you're able to use it to train you can
use AI to find the data you need to
train your model right so
>> uh I don't work on the model directly
but, you know, my team members do, too,
right? There's different ways that we
use, you know, AIs in that way in order
to find the data that we need for those
corner cases that we're looking for to
help improve the performance.
>> Yeah, that makes a lot of So, it's
sometimes it's more about um using AI to
go through all this largely unstructured
data.
>> Correct. We use it to label the data,
right? So, we can say, hey, these are
cars, these are people, these are dogs,
right? These are cones, right? And then
we also can use it then to go capture or
collect the data that we need to train
the model in a certain scenario.
>> That makes a lot of sense. How do you um
determine when you need to take do
something new in simulation? Like do you
look at data first and then say hey we
don't really have enough of this kind of
data. Let's go simulate this case many
many more times or like what's the
process for deciding when to load
something up in Omniverse and just like
create a new scenario?
>> Yeah. So we have
this concept we call like a functional
scenario tree right. So to give you an
example, right, let's take stopping at
all-way stop signs, right? So, okay, we know
for an all-way stop sign scenario, you
can go straight, you can turn left, you
can turn right. Now, do we have data
that covers all three of those
scenarios, right? Uh, yes, but maybe we
have a little bit less data, right?
Let's go mine for more data for
specifically all-way stop signs, right?
Uh, and then we say, okay, well, we also
realize that we need to consider, let's
say, two-way stops, right? So, add
another node to the the tree, right?
Okay, now we need two-way stop sign
scenarios, right? So as you expand upon
individual scenarios and use cases, you
then can layer on data on top to support
and continue to build out all kind of
the long-tail corner cases, right? So
>> the driving experience you're seeing
here, right,
>> I don't want to speak for you, but it's
pretty good, right? So the car is able
to handle construction and pedestrians
and things like that, right? But again,
as we go to different geographies,
different scenarios that maybe we
haven't encountered, right? We can
always continue to add more. And if we
what we do is we look at any new issues
that we find across the fleet all across
you know the globe right and we'll say
hey we've never seen anything like this
before. We haven't simulated anything
like this before. Let's go find data that
supports this now one new issue that we
found and then we can expand it like
that. It's like here like we're driving
with cones in the middle of the road,
right? We don't know if there's people,
there's guys reversing, right? Let's
break a little bit and confirm, right?
Okay, now we have this where we can use
this in the future, right? If you wanted
to, hey, we need data on driving past
cones, right? As an example,
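The functional scenario tree described in this answer can be pictured as a plain tree whose leaves carry a count of available clips, so that thin branches show up as places to mine or simulate more data. This is a sketch only: the `ScenarioNode` class, the clip counts, and the coverage threshold are hypothetical and are not taken from NVIDIA's tooling.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScenarioNode:
    name: str
    clip_count: int = 0  # number of driving clips tagged with this leaf scenario
    children: List["ScenarioNode"] = field(default_factory=list)

    def add(self, child: "ScenarioNode") -> "ScenarioNode":
        self.children.append(child)
        return child


def under_covered(node: ScenarioNode, min_clips: int,
                  path: Tuple[str, ...] = ()) -> List[str]:
    """Return the paths of leaf scenarios that have fewer than min_clips clips."""
    path = path + (node.name,)
    if not node.children:
        return [" / ".join(path)] if node.clip_count < min_clips else []
    gaps: List[str] = []
    for child in node.children:
        gaps.extend(under_covered(child, min_clips, path))
    return gaps


# Made-up counts: plenty of straight-through data, little for right turns.
root = ScenarioNode("stop_signs")
all_way = root.add(ScenarioNode("all_way"))
all_way.add(ScenarioNode("go_straight", clip_count=500))
all_way.add(ScenarioNode("turn_left", clip_count=120))
all_way.add(ScenarioNode("turn_right", clip_count=40))
# A newly added node with no data yet, like the two-way stop in the example.
root.add(ScenarioNode("two_way"))

for gap in under_covered(root, min_clips=100):
    print("need more data:", gap)
```

Adding a node for a newly observed issue, such as cones in the middle of the road, immediately surfaces it as a gap until enough clips are attached to it.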
>> it's it's really interesting, you know,
like now that I'm really alert and like
keeping my eyes open for it. Driving's
hard, man. Like there's a lot of stuff
going on in the road that you kind of
take for granted when you're driving in
the moment because that's all you're
focusing on. But when you can take a
step back and just assess and be like,
"Oh yeah, we've been in a few crazy
situations already,
>> you know, it's really funny." And you're
right, the car has handled it very well.
Um, good segue into my next question.
What do you think, you know, is the
ultimate hard scenario for uh
self-driving. So like my thought, you
know, I imagine those videos or images
uh in India with those giant roundabouts
where it's just like people are merging
and weaving through each other. It's
just like it seems like pure chaos when
you're observing from above. Um, but
that's just my thought. What is actually
the hardest scenario to train for?
>> I think the way to think about it,
right, is is very similar, right? What
do you think would be difficult for a
human, right? And because a lot of the
driving behavior is trained off human
data, right?
>> What would humans struggle with, right?
Those are the things that we need to get
through. So whether it's, you know, very
dense traffic with scooters and people
walking in between with no lane lines
and no road markings, right? uh those
are the things that I think you know are
the kind of the longer tail right
solutions that you know we need to get
to where I think again with enough data
right I don't see why we wouldn't be
able to support you know handling
driving in different countries different
geographies right different weather
conditions right because you have enough
understanding of what good driving
behavior is right you can reinforce that
and make it so that way it can handle uh
better than a good human driver would
>> yeah no that makes a lot of sense um and
then I guess as a follow-up to that you
know one of the things that I'm noticing
is super important is lane markings,
right? Um how is this on dirt roads?
>> So it'll use the context from other cars
as well, right? To understand where
other people are driving, right? So it
doesn't Yeah. Yes. Obviously having lane
markings is nice, but uh there are
certain parts even in San Francisco
where the road's under construction, where
there are no lane markings, right? So
the car is able to see, hey, this is
roughly the width of two lanes and I can
see the other cars are driving here.
Okay, contextually, this is where I
should drive.
>> Yeah, this must be where the lane would
be. Right.
>> Exactly. Right. So the the platform
itself, right, is an HD mapless
solution, right? So it's able to
understand context, right, and try to
figure out, okay, this is where the lane
should be, right? I should drive here,
>> right? Um, when you say fully
contextless, like one of the things I'm
also imagining is like no cell signal,
no online, like this can this is a fully
self-contained solution that doesn't
reach back over the internet or the
cloud to anything, correct?
>> Yeah. So this is all built uh, you know,
in the car, right? So this is a
production car. We've just flashed one
of our latest kind of software builds to
the car so that way we can enable all
these features, right? But the physical
hardware on the car is the same, right?
And there's no, you know, there is an
internet connection that we use to
upload data, but there's no streaming,
you know, to this car saying, "Hey,
here's what's the latest map is of
what's going on in San Francisco."
>> Um, what about like you, this is maybe a
silly question, but we just passed a
handicap parking space. If I put a
handicap parking um you know thing here,
would it contextually understand, hey,
now I'm allowed to park in a handicap
space?
>> So, not yet. That's not something we
have yet. So, we're not looking at like
curb colors or anything quite yet, but
you know, those are some of the concepts
that some of my colleagues are working
on. That's, you know, exciting to see.
You know, hopefully, you know, we'll
roll some of those features out in the
future.
>> Is there a, so like broader than
like even that you know if I had
handicap put like I guess what I'm
really asking is like is there a button
I can push to say hey this vehicle is
allowed to park in certain special
spaces that would not otherwise be you
know handicap in this example but you
can imagine like any wide variety of
>> yeah it depends on what the the partner
is looking for right so you we can adapt
the stack to do any number of things
right so if someone wanted us to look
for you know identifying curb color,
right? And understanding that yellow
means temporary parking, but red means
no parking, right? Those are the types
of things that we can do and we can work
on with the partner in order to provide
that type of capability. Uh, which we
haven't done that yet here for this
case.
>> What about Sorry, I realize I'm asking a
similar question a few times. Um, you
know, in the robotaxi case, 15-minute
parking, I want I just want to drop
someone off and leave. Is the car smart
enough to say I can park there for 15
minutes? I really only need like one and
a half minutes, two minutes to drop off
my passenger. Is it
>> we can get to that point. Uh again, I
haven't worked on the robotaxi project
myself yet, but you know, we can read
street signs, right? So, we can
understand things like, you know, like
the no right on red, for example, right?
We can see and read the sign that says,
you know, the arrow through the uh the
sign and the
>> be fun. Sorry, not to interrupt you. I
apologize.
>> No worries. Right. So, we can see if we
can fit through this narrow space,
right? So, we stop. Right. So, that way
we don't get too close. Right. And you
can see we'll just kind of slot through.
>> Wow.
>> But again, so the double-parked vehicle
case I talked about, right? Yes. Like
that case would be really difficult
before we had these end-to-end models,
right? Whereas now we saw that she had
stopped. There's enough space, right? We
can fit. Okay, great. We'll go ahead and
take this gap here.
>> Well, what I'm realizing about myself is
like I'm a much more conservative driver
than even this computer is because in
that situation, I would have just
stayed, cuz I saw that car coming the
other way. I would have just stayed. So, I'm
learning two things. One, it's really
capable of making really fine
estimations of like, hey, I think I can
fit through there. I'm going to try. And
two, I need to be a much more aggressive
driver is what I'm really hearing.
>> The, you know, the cool thing about
this, right, is we've deployed different
models, right? So, we see some that are
very conservative, right? So, we get
feedback not only from my team, but also
our drivers and everybody else at
testing the fleet where we say, "Hey,
this model seems to be getting stuck
more often, right?" Yeah. Sure. So we
try to find the balance of, you know,
getting stuck versus being assertive,
right? Where it gets to this kind of
nice middle ground where sometimes we'll
get stuck, sometimes we'll make the
pass, right? It just depends on, you
know, what the scenario is, right?
>> And is is that um like a global
assertion about the model, hey, it's
globally aggressive, it's globally
conservative, or is it like, hey, this
model is really aggressive when it comes
to overtaking a vehicle, but it could be
conservative in other cases. Like, is
the whole model aggressive, passive,
conservative?
>> You'll find that
different models will do different
things, right? So on average, we have
roughly seven new models that you know
we generate per day, right? And you'll
try different ones and not all of them
end up making the cut.
>> Sorry. Seven new models you generate per
day. Can you just speak a little bit to
the different, like, why seven? Why
>> just about what we can generate with uh
you know the GPU usage that we have and
everything to come up with new models,
right? But you can imagine these are
enormous data sets,
>> right? So we're always trying to improve
the driving behavior, right? So some
models may be more reactive to people
where it's too conservative, where every
time we see someone it might want to
brake, right? Okay, that model, to us, right,
might not be as comfortable right safety
might be the same right but the comfort
is less because it's over sensitive
right so there are different things like
that that we can deploy and we can test
based off different data sets that we
add to the model and then we can weight
and prioritize different data sets to
get the right mix of safety and comfort
that we're looking for. So like here
again we have another for our car right
that car is far away he's pulling over
we're able to easily drive around that
no problem right
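Weighting and prioritizing data sets can be pictured as weighted sampling when a training mix is assembled. This is an assumption about what such weighting might look like, not a description of NVIDIA's pipeline; the data set names, the weights, and the `sample_training_mix` function are made up for illustration.

```python
import random
from collections import Counter


def sample_training_mix(dataset_weights, n, seed=0):
    """Draw n data set names with probability proportional to their weights.

    Each drawn name stands for "take the next clip from this data set".
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    names = list(dataset_weights)
    weights = [dataset_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


# Hypothetical mix: lean on general driving, but boost pedestrian-heavy clips
# relative to their raw share so the model sees more of those cases.
mix = sample_training_mix(
    {"general_driving": 6.0, "pedestrian_dense": 3.0, "double_parked": 1.0},
    n=10_000,
)
print(Counter(mix))
```

Raising one weight makes the model see that kind of scene more often; per the discussion above, the tuning goal is a mix that keeps safety the same while avoiding oversensitive braking.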
>> um so yeah so there's different models
that we deploy and every day we test
different variants across our entire
fleet right where we can see in
different you know geographies right one
model might be really good in California
but struggles in Texas right so that's
also why we want to test in different
geographies uh in addition to the
simulation testing right we also always
want to do on-road testing as well just
to compare and make sure that we don't
see this guy just ran his stop sign,
right? And no problem. The car gently
waited for him and then proceeded.
>> Yeah. No, got it. Okay. So, it's really
about generating variants of the same
model, comparing their outputs. Okay.
That that was going to be my next
question is like what are these seven
models even? But
>> yeah, so typically you'll see it will
start from a very similar base, right?
But over time that base will evolve and
get more capable, right? So, uh back to
my kind of 16-year-old analogy, right?
Uh you know, in the beginning, right,
the car might be a little jerky or a
little, you know, wobbly, right? Now you
can see the drive is quite smooth,
right? So over time you see the general
capability improves as it deals with
more scenarios, right? You can detect
this person standing here with this open
trunk, right? So let me slow down, let
me go around them and let me come back
to, you know, the center of my lane as
an example.
>> Can you speak a little more
to that like 16-year-old comment like
two years ago the the average model was
like driving like an 11-year-old and two
years from now we expect it to drive like
a 25-year-old like you know can you help
me understand like the pace of evolution
of how good driving models are on
average.
>> So this project, you can
say with these types of models it's been
a little bit over a year
>> right and this has been this model was
about you know 2,300 or so models that
we've generated to get to this point
>> right so
>> uh you know fortunately we have the you
know compute in order to generate and
process this data to create new models
but you can see today this driving is
very smooth very capable right it can
understand you know construction and
double-parked cars, right? Uh, so, you know,
that's what I'd say is you know pretty
good, right? It's a pretty good driver,
right?
>> For sure.
>> So, like here, right, we had that yellow
light flash really quick, right? We were
able to make it through no problem,
right? We see this guy coming at us on a
skateboard, right? There's no harsh
braking for that scenario, right?
There's no big swerves, things like that
where the car has learned to be very
smooth and predictable with its outputs
in terms of vehicle motion.
>> Incredible.
>> You see a nice unpredicted left turn,
right? And we have this nice downhill,
right? Where the limit is 25 or 30 here,
right? But we're not going to
immediately try to jump up to the, you
know, the speed limit, right? We can see
there's a red light. We should just
gently come to this light here.
>> Yeah. And we're so we're on a pretty
steep incline. So the car understands
like, hey, even though it's at eye
level, what I'm looking at out there is
actually the horizon. It's not useful
information in the context of driving.
I'm just going to look at what's right
in front of me.
>> Yeah. But it can use that for context,
right? Like it can see the roofs of the
cars are disappearing, right? So, okay,
I know that there's a pretty steep
grade. let me uh you know kind of come
over here gently, right? Let me not
floor it to come over this hill, right?
Because you don't know someone could be
stopped there, right?
>> Um what what do you find are some of the
most um interesting driver behavior
changing features? Is it things like
elevation? Is it things like weather? Is
there another one that's like that that
surprised you like, oh wow, this sort of
scenario is surprisingly difficult based
on what I thought.
>> Uh yeah, I would say
those two were definitely two of the
ones that, you know, surprised me at
first, right? I think uh when you get
into like really dense traffic with a
lot of, you know, bicyclists, people
weaving in and out and, you know, you
can imagine, you know, downtown San
Francisco at rush hour with like
delivery bikes and things like that,
>> right? So understanding when to be
assertive versus when to be
conservative, right, is really
interesting. To your earlier point, it
makes me really reflect on my own
driving, right? Right. And it's like,
okay, yeah, I would have done this. Or,
oh, okay, I didn't even see that guy.
Like, there's been times where I've been
at stop signs where I'm like, hey, come
on, car. Like, let's go ahead. Like,
let's start going. And then out of
nowhere, right, I see a guy cross the
street cuz he was behind a bush and I
couldn't see him, right? But the car was
able to see the motion there and say,
hey, there's someone there. I'm not
going to go.
>> Uh, so yeah, it's it's a lot of fun
seeing those types of things where,
okay, here's where I maybe need to
reconsider how I approach driving.
>> For sure. Yeah, for sure. Um, one of the
things that, um, constantly is a
scenario that I deal with back home is,
you know, we have ambulances or
emergency services and you hear them way
before you see them. And so the culture
in Florida is, you know, when you kind
of get the sense of the flashing lights
around you, you pull over to let them
pass.
>> How does that work when you're
self-driving? Like, does the car
understand, hey, I hear the siren or is
it looking for the lights or what's the
>> No. So, not quite yet. So, that's where
for like an L2++ product, right? that's
where the driver can take over and pull
over in that case,
>> right? But those are kind of the longer
tail things that we're looking to add
for level four, right? Where the car
needs to understand, hey, I need to pull
over for safety vehicles and things like
that. So,
>> uh that's where we see kind of the the
next jump coming in in the future.
>> How does it deal with like um even finer
like
I'm using stranger because I just don't
have a better word to say it, but like
you know there's a police officer in the
middle of the road. He's directing
traffic with his hands, not a sign. very
small, very hard to see, maybe you know
the color blends in with maybe the
background or his uniform or whatever.
If he waves you on like this, can the
car see that and understand to do that
or is that an intervention?
>> So sometimes we can do that, sometimes
you can't, right? So that's one where
you know we still need to refine that
part of the model behavior, right? So in
this case, right, you can still leave
the system engaged but tap the gas,
right? And the car says, "Okay, the
driver's giving me this input, right?
Okay, now I can proceed." Right? So we
can see that there's a person there. So
obviously don't drive towards the
person, right? But because the driver is
given the confidence of hey go ahead and
proceed, right? Then the car then can
resume control.
>> That's that's interesting. So it's not
really a the driver took over in the
sense that he's going to get past that
scenario. It's simply a I'm tapping the
gas to let you know it's okay for you to
move forward and then you're like
putting the decisions of what to
actually do back in the car's hands, so
to speak.
>> Exactly. Right. So it's kind of like a
you know you're working together with
the car, right? So, you know, even let's
say there's a double parked car and if
it gets a little too close, a little
stuck, right? You know, there's times
where I can just give a little bit of
steering input to say, "Hey, I can see
around. It's all good." Right? Then the
car then will control the accelerator to
then pull out, right? So, it's very much
you're working with the model to drive,
right? And the car said, "Okay, you're
giving me this input, right? Okay,
great. Let's go ahead and move forward."
>> So, that's really interesting. I I
didn't expect it to be so cooperative.
One of the things that's uh ahead of us
right now is a school bus. And school
buses have this magic ability to
materialize a stop sign out of nowhere,
right? How does the car react when like
all of the sudden it sees a stop sign
that wasn't there just a few seconds
ago, but it's already made its
trajectory. It's made its plan.
>> So, we can identify different types of
vehicles, right? So, we have like a
school bus classification. We have
regular cars, trucks, right? So, we can
identify different types of vehicles.
>> Oh, so sorry, not to interrupt you, but
not just a bus. You can tell that's a
school bus.
>> And part of the school bus
entity is the stop sign. So then if the
stop sign's out, we can detect there's a
stop sign there, right? And we can see
the flashing lights as well. So using,
okay, we see a stop sign, we see the
flashing lights. Okay, we should stop
and wait here.
>> Got it. Okay. So it's at that level.
It's not like, hey, I know what a bus
is, and then I know some buses are also
school buses. And one of their magic
powers, for lack of a better word, is to
put out a stop sign.
>> Yeah. So we can see Yeah. We'll classify
as a bus, but then we see bus plus stop
sign plus flashing lights. Okay. Yeah.
Stop.
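As a rough illustration of the "bus plus stop sign plus flashing lights" rule from this exchange, here is a minimal Python sketch. It assumes the perception output can be reduced to three fields; `Detection`, `vehicle_class`, `stop_sign_out`, `lights_flashing`, and `must_stop_for` are hypothetical names, and the actual system presumably learns this behavior rather than hard-coding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    vehicle_class: str              # e.g. "car", "truck", "bus"
    stop_sign_out: bool = False     # a stop sign is detected on the vehicle
    lights_flashing: bool = False   # flashing lights are detected


def must_stop_for(d: Detection) -> bool:
    """Stop and wait only when all three cues are seen together."""
    return d.vehicle_class == "bus" and d.stop_sign_out and d.lights_flashing
```

With this framing, a bus with nothing deployed is treated like any other vehicle, and the stop decision comes from the combination of cues, not from the bus class alone.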
>> Got it. Thanks. That's that's very
clear. Um there's an object in So we're
looking at a Waymo, right? Obvious to
the GoPros and stuff, but what some of
the sensors might see is like something
that's spinning in the road. Does it
care about that? Does it not? It
understands the larger context. Walk me
through.
>> So that it would just see this. It would
just see this is a car, right? And
there's things moving on it, right? But
I can tell this is a car. So either wait
for it to clear my path, right? Or, you
know, go ahead and proceed.
>> Beyond that, it doesn't matter. It's
like, oh, whatever strange features it
might have are just strange features.
>> Yeah. doesn't behave differently for a
different take by hand.
>> Okay. Thank you, George. Yeah. Just to
take us back to keep us on our schedule
here. Thank you. Sure.
>> Yeah. But, uh, as mentioned, right? So,
you can see George's braking is a little
bit more firm than, uh,
>> our car is, right? So,
>> what else what else should we know about
like the the car the capabilities like,
you know, I tried to ask as many
questions as I could, but what's what's
one thing maybe I missed that you'd love
to like share with my audience? I think
what's really exciting is that we could
take this architecture, right? And we
can scale it up or scale it down, right?
So this is like our, as I mentioned, our
level two plus experience, right? But
we're scaling this up to that level four
experience, right? So whether that's a
robo taxi or even consumer grade level
four, right? We're flexible enough where
we can adapt to whatever the partner is
looking for, right? So I think what I'm
really excited about is a future where,
you know, I have my own car that I buy.
I want to go drive the beautiful twisty
road. Great. But then when I'm commuting
home from work, I can let the car do
everything, right? And have that be a
level four experience.
>> And um like no, off the record, no one's
holding you to it, you know, I'm just
curious, do you expect to be able to do
that in 3 years, 10 years, like
>> Yeah. I think what we see with, you
know, Uber, right, we're going to launch
in San Francisco and California by next
year, right, for uh our robo taxi
initiatives, right? So we're going to
have to figure out how to make sure we
make it work. And I'm excited to see
that. So I think it's coming much sooner
than we think.
>> That's really, really exciting. That's
awesome. Um, anything else you want to
say
>> for me?
>> Dude, that was really awesome. We saw a
lot of really cool scenarios that I
never really expected or really thought
about how much and how fatiguing driving
can be, right? Like I imagine if I was
driving that,
>> yeah,
>> I'd be pretty tired. You know what I
mean? That was a lot of stuff we went
through.
>> It's interesting. Uh, so again, like I
mentioned, uh, yeah, I work with the
product team here. Uh, you know, you guys
are some of the first to see it that are
non-Nvidia employees, right? What did
you guys think? Right. You know, I'm
curious.
>> I I think it was really, really smooth.
I was really impressed with how smooth
it was. And I was really impressed with
some of the decisions it made
>> versus the decisions I would make. I do
consider myself a very good driver.
>> Everybody does.
>> And and and this really made me and and
then this really made me reconsider
like, oh, there are certain situations,
you know, I made the joke earlier, but I
can and probably should be more
aggressive just because that that
actually is the safer option, like to
clear this lane sooner or whatever, for
example. And there are situations where
maybe I would have misjudged it because,
you know, my two cameras are in a fixed
position in the middle of the car, but
this has multiple cameras, including
blind spots I can't see and above me
where it can see like horizons, right? I
think those things put together really
I'm impressed by the difference between
its behavior and my behavior. Yeah. You
know,
>> it's really fun the first time you drive
it, right, where you're sitting in
George's seat, right? Where you're like,
"Okay, let's see what the car does,
right?" And after a couple minutes,
you're like, "Oh, this is as good, if
not better than I am." Right? And then
you get to those really interesting
scenarios like, "Okay, I would have done
this, right? But the car did this."
You're like, "Huh, okay. Yeah, that
actually makes sense." So, it's it's
kind of fun because it makes you look at
driving in a very different way.
>> Yeah, there's a feedback. I'm curious to
see one one thing I am curious to see is
like, "All right, you've rolled out
self-driving. It's been out for a while.
Some people obviously still choose to
manually drive their cars on occasion.
as you keep collecting that data, are
people on average becoming better
drivers because they see computers that
are doing better things? You know what I
mean?
>> It'll be interesting to see, right? The
next couple years, it'll be really
interesting to see.
>> Yeah.
>> And then I have one question.
>> Yeah, sure.
>> So,
>> is there equal priority or any
difference in weight in goals of safety
versus efficiency on the road of like
what the
>> goals of of automation might be?
>> Yeah.
>> Increased safety, increased efficiency,
all of the above. people waiting.
>> Safety is always the highest weight,
right? We want to make sure we're not
driving into people or driving into other
cars, right? So, that's always
>> Yeah. Always always takes priority,
right? And that's why we're really
excited that we have that classical
stack there always to make sure the
model doesn't do anything we don't want
it to do,
>> right? But then after that, then comes
that comfort and efficiency, right?
Where nobody wants to get stuck in
traffic, right? Doesn't want to get
stuck in the lane. So, that's where
having a big model is really helpful
because then you're able to get that
human decision-making. It's like, "Okay,
let me go into this lane and, you know,
we can get over here where we're not
blocking traffic. We're not stuck here,
right? We can see the lane is closed."
So, it's that's where it's really
powerful.
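The ordering Armen gives (safety first, then comfort and efficiency) can be sketched as a filter followed by a ranking. This is a simplified illustration under stated assumptions, not the actual stack: `Candidate`, `choose`, `min_gap_m`, `safe_gap_m`, and `comfort_weight` are hypothetical, with the gap check standing in for the classical safety stack and the candidates standing in for what the large model proposes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    min_gap_m: float       # closest predicted distance to a person or vehicle
    travel_time_s: float   # efficiency: lower is better
    discomfort: float      # comfort: lower is better (e.g. harsh braking)


def choose(candidates: Sequence[Candidate],
           safe_gap_m: float = 1.0,
           comfort_weight: float = 1.0) -> Optional[Candidate]:
    """Safety acts as a hard filter; comfort and efficiency only rank the rest."""
    safe = [c for c in candidates if c.min_gap_m >= safe_gap_m]
    if not safe:
        return None  # no safe option: the caller should hold or hand back control
    return min(safe, key=lambda c: c.travel_time_s + comfort_weight * c.discomfort)


options = [
    Candidate("squeeze_past", min_gap_m=0.3, travel_time_s=10.0, discomfort=1.0),
    Candidate("change_lane", min_gap_m=2.0, travel_time_s=14.0, discomfort=2.0),
    Candidate("wait_in_lane", min_gap_m=3.0, travel_time_s=40.0, discomfort=0.0),
]
best = choose(options)
```

Here the fastest option is rejected outright because it fails the safety check, and the lane change beats waiting in the closed lane on the combined comfort and efficiency score.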
>> Okay,
>> this was awesome, man. Thank you so much
for your time.
>> Thank you.
>> Great to meet you guys.
>> Yeah, thank you so much. I mean, your
driving was fantastic.
>> A few things stood out to me after
spending an hour with Nvidia's L2++
powered Mercedes. The car had to react
dynamically to LA traffic: lane merges,
sudden cut-ins, construction zones, and
unpredictable pedestrians. And it
handled everything with a smooth
precision that you don't really get with
human drivers. For investors, the bigger
takeaways might be in Armen's
commentary: how Nvidia's Drive OS and
perception stack are working together,
the specific in-car capabilities they
unlock, and how this architecture will
let OEMs like Hyundai and platforms like
Uber tailor the driving experience
across level two, level three, and level
four autonomy over the next couple
years. A huge thank you to the entire
Nvidia team for flying us out to
California, for supplying us with press
passes for GTC, and for making this test
drive possible, and to Armen for
answering my non-stop questions. And of
course, thank you for watching and
supporting the channel. Without you, I
wouldn't get opportunities like this in
the first place. And if you want to see
what else I learned at Nvidia GTC and
what I'm investing in, check out this
video next. Or if you want more science
behind the stocks, then this video is
for you. Either way, thanks for watching
and until next time, this is Ticker
Symbol: YOU. My name is Alex, reminding you
that the best investment you can make is
in you.
This video features a real-world, unedited one-hour test drive in downtown Los Angeles using a Mercedes vehicle equipped with Nvidia's L2++ autonomous driving platform. The narrator rides with Armen Connie, a senior product manager at Nvidia, to discuss the system's technical design, including the use of cameras, radar, and ultrasonics without LiDAR. They explore how the system handles complex urban scenarios like construction zones, pedestrian interactions, and gridlock, while highlighting the 'world model' that synthesizes sensor data to inform decision-making. The discussion also covers the roadmap from L2++ to L4, the role of end-to-end training models, and the collaboration between the new AI stack and a traditional safety-backup stack.