Andrej Karpathy: Software Is Changing (Again)

Transcript

0:01

Please welcome former director of AI

0:04

Tesla, Andrej Karpathy.

0:07

[Music]

0:11

Hello.

0:14

[Music]

0:19

Wow, a lot of people here. Hello.

0:22

Um, okay. Yeah. So I'm excited to be

0:24

here today to talk to you about software

0:27

in the era of AI. And I'm told that many

0:30

of you are students like bachelors,

0:32

masters, PhD and so on. And you're about

0:34

to enter the industry. And I think it's

0:36

actually like an extremely unique and

0:37

very interesting time to enter the

0:38

industry right now. And I think

0:41

fundamentally the reason for that is

0:43

that um software is changing uh again.

0:47

And I say again because I actually gave

0:49

this talk already. Um but the problem is

0:52

that software keeps changing. So I

0:54

actually have a lot of material to

0:55

create new talks and I think it's

0:56

changing quite fundamentally. I think

0:58

roughly speaking software has not

1:00

changed much on such a fundamental level

1:02

for 70 years. And then it's changed I

1:04

think about twice quite rapidly in the

1:06

last few years. And so there's just a

1:08

huge amount of work to do, a huge amount

1:09

of software to write and rewrite. So

1:12

let's take a look at maybe the realm of

1:14

software. So if we kind of think of this

1:16

as like the map of software this is a

1:17

really cool tool called Map of GitHub.

1:20

Um this is kind of like all the software

1:21

that's written. Uh these are

1:23

instructions to the computer for

1:24

carrying out tasks in the digital space.

1:26

So if you zoom in here, these are all

1:28

different kinds of repositories and this

1:30

is all the code that has been written.

1:31

And a few years ago I kind of observed

1:33

that um software was kind of changing

1:35

and there was kind of like a new type of

1:37

software around and I called this

1:39

software 2.0 at the time and the idea

1:42

here was that software 1.0 is the code

1:44

you write for the computer. Software 2.0

1:46

are basically neural networks and

1:48

in particular the weights of a neural

1:50

network and you're not writing this code

1:53

directly you are most you are more kind

1:55

of like tuning the data sets and then

1:56

you're running an optimizer to create to

1:58

create the parameters of this neural net

2:00

and I think like at the time neural nets

2:02

were kind of seen as like just a

2:03

different kind of classifier like a

2:04

decision tree or something like that and

2:06

so I think it was kind of like um I

2:09

think this framing was a lot more

2:10

appropriate and now actually what we

2:12

have is kind of like an equivalent of

2:13

GitHub in the realm of software 2.0. And

2:15

I think Hugging Face is basically the

2:18

equivalent of GitHub in software 2.0.

2:20

And there's also Model Atlas and you can

2:22

visualize all the code written there. In

2:24

case you're curious, by the way, the

2:25

giant circle, the point in the middle,

2:28

uh these are the parameters of Flux, the

2:30

image generator. And so anytime someone

2:32

tunes a LoRA on top of a Flux model, you

2:34

basically create a git commit uh in this

2:37

space and uh you create a different kind

2:39

of a image generator. So basically what

2:41

we have is software 1.0 is the computer

2:43

code that programs a computer. Software

2:45

2.0 are the weights which program neural

2:48

networks. Uh and here's an example of

2:50

AlexNet image recognizer neural network.

2:53

Now so far all of the neural networks

2:55

that we've been familiar with until

2:56

recently were kind of like fixed

2:58

function computers image to categories

3:01

or something like that. And I think

3:03

what's changed and I think is a quite

3:05

fundamental change is that neural

3:06

networks became programmable with large

3:09

language models. And so I I see this as

3:12

quite new, unique. It's a new kind of a

3:14

computer and uh so in my mind it's uh

3:18

worth giving it a new designation of

3:19

software 3.0. And basically your prompts

3:22

are now programs that program the LLM.

3:25

And uh remarkably uh these uh prompts

3:28

are written in English. So it's kind of

3:30

a very interesting programming language.
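
As a rough sketch of what "prompts are programs" can mean, the snippet below assembles a few-shot sentiment prompt as plain English text and sets it beside a hand-written rule (the software 1.0 way). The example reviews, the word lists, and the function names are illustrative assumptions, not taken from the talk's slides, and no model is actually called here.

```python
# Software 1.0: the behavior is spelled out explicitly in code.
POSITIVE_WORDS = {"great", "love", "excellent"}   # hypothetical word lists
NEGATIVE_WORDS = {"terrible", "hate", "awful"}


def classify_with_rules(text: str) -> str:
    """Count positive vs. negative keywords; ties count as negative."""
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if score > 0 else "negative"


# Software 3.0: the behavior is specified by an English prompt.
FEW_SHOT_EXAMPLES = [            # hypothetical labeled examples
    ("I love this phone", "positive"),
    ("The battery is terrible", "negative"),
]


def build_few_shot_prompt(text: str) -> str:
    """Return the text that would be sent to an LLM for completion."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in FEW_SHOT_EXAMPLES:
        lines += [f"Review: {review}", f"Sentiment: {label}", ""]
    lines += [f"Review: {text}", "Sentiment:"]
    return "\n".join(lines)
```

Changing the instruction sentence or the examples reprograms the classifier without touching any other code, which is the sense in which the prompt is the program.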

3:33

Um so maybe uh to summarize the

3:36

difference if you're doing sentiment

3:37

classification for example you can

3:39

imagine writing some uh amount of Python

3:42

to to basically do sentiment

3:44

classification or you can train a neural

3:46

net or you can prompt a large language

3:47

model. Uh so here this is a few-shot

3:50

prompt and you can imagine changing it

3:51

and programming the computer in a

3:52

slightly different way. So basically we

3:54

have software 1.0 software 2.0 and I

3:57

think we're seeing maybe you've seen a

3:59

lot of GitHub code is not just like code

4:01

anymore. there's a bunch of like English

4:03

interspersed with code and so I think

4:05

kind of there's a growing category of

4:07

new kind of code. So not only is it a

4:09

new programming paradigm, it's also

4:10

remarkable to me that it's in our native

4:12

language of English. And so when this

4:14

blew my mind a few uh I guess years ago

4:17

now I tweeted this and um I think it

4:20

captured the attention of a lot of

4:21

people and this is my currently pinned

4:23

tweet uh is that remarkably we're now

4:25

programming computers in English. Now,

4:28

when I was at uh Tesla, um we were

4:31

working on the uh autopilot and uh we

4:34

were trying to get the car to drive and

4:37

I sort of showed this slide at the time

4:39

where you can imagine that the inputs to

4:41

the car are on the bottom and they're

4:43

going through a software stack to

4:44

produce the steering and acceleration

4:47

and I made the observation at the time

4:48

that there was a ton of C++ code around

4:51

in the autopilot which was the software

4:52

1.0 code and then there was some neural

4:54

nets in there doing image recognition

4:56

and uh I kind of observed that over time

4:58

as we made the autopilot better

5:00

basically the neural network grew in

5:02

capability and size and in addition to

5:05

that all the C++ code was being deleted

5:08

and kind of like was um and a lot of the

5:12

kind of capabilities and functionality

5:14

that was originally written in 1.0 was

5:16

migrated to 2.0. So as an example, a lot

5:19

of the stitching up of information

5:20

across images from the different cameras

5:22

and across time was done by a neural

5:24

network and we were able to delete a lot

5:26

of code and so the software 2.0 stack

5:29

quite literally ate through the software

5:32

stack of the autopilot. So I thought

5:34

this was really remarkable at the time

5:35

and I think we're seeing the same thing

5:37

again where uh basically we have a new

5:39

kind of software and it's eating through

5:40

the stack. We have three completely

5:42

different programming paradigms and I

5:44

think if you're entering the industry

5:45

it's a very good idea to be fluent in

5:47

all of them because they all have slight

5:49

pros and cons and you may want to

5:50

program some functionality in 1.0 or 2.0

5:53

or 3.0. Are you going to train

5:54

a neural net? Are you going to just prompt

5:55

an LLM? Should this be a piece of code

5:57

that's explicit etc. So we all have to

5:59

make these decisions and actually

6:00

potentially uh fluidly trans transition

6:03

between these paradigms. So what I

6:06

wanted to get into now is first I want

6:09

to in the first part talk about LLMs and

6:11

how to kind of like think of this new

6:13

paradigm and the ecosystem and what that

6:15

looks like. Uh like what are what is

6:17

this new computer? What does it look

6:18

like and what does the ecosystem look

6:20

like? Um I was struck by this quote from

6:23

Andrew Ng actually uh many years ago now

6:25

I think and I think Andrew is going to

6:27

be speaking right after me. Uh but he

6:29

said at the time AI is the new

6:30

electricity and I do think that it um

6:33

kind of captures something very

6:34

interesting in that LLMs certainly feel

6:36

like they have properties of utilities

6:38

right now. So

6:41

um LLM labs like OpenAI, Gemini,

6:44

Anthropic etc. They spend capex to train

6:47

the LLMs and this is kind of equivalent

6:48

to building out a grid and then there's

6:51

opex to serve that intelligence over

6:53

APIs to all of us and this is done

6:56

through metered access where we pay per

6:58

million tokens or something like that

7:00

and we have a lot of demands that are

7:01

very utility-like demands out of this

7:03

API we demand low latency high uptime

7:06

consistent quality etc. In electricity,

7:08

you would have a transfer switch. So you

7:10

can transfer your electricity source

7:12

from like grid and solar or battery or

7:14

generator. In LLMs, we have maybe OpenRouter

7:16

and easily switch between the

7:18

different types of LLMs that exist.
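
The utility framing above (metered access plus a transfer switch) can be sketched in a few lines. This is a minimal illustration, not how OpenRouter or any lab's API actually works: the `Provider` type, the provider names, and the prices are all hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider interface: takes a prompt, returns a completion.
Provider = Callable[[str], str]


def metered_cost(tokens: int, price_per_million: float) -> float:
    """Metered access: pay in proportion to tokens used."""
    return tokens / 1_000_000 * price_per_million


def complete_with_failover(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Transfer switch: try each provider in order, fall through on failure.

    Returns (provider_name, completion) from the first provider that answers.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any outage counts as "source is down"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because the providers are just software behind an interface, switching between them is a matter of reordering a list rather than rewiring anything physical.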

7:20

Because the LLMs are software, they don't

7:23

compete for physical space. So it's okay

7:25

to have basically like six electricity

7:26

providers and you can switch between

7:28

them, right? Because they don't compete

7:29

in such a direct way. And I think what's

7:31

also a little fascinating and we saw

7:33

this in the last few days actually a lot

7:36

of the LLMs went down and people were

7:38

kind of like stuck and unable to work.

7:41

And uh I think it's kind of fascinating

7:42

to me that when the state-of-the-art

7:43

LLMs go down, it's actually kind of like

7:45

an intelligence brownout in the world.

7:47

It's kind of like when the voltage is

7:49

unreliable in the grid and uh the planet

7:52

just gets dumber the more reliance we

7:55

have on these models, which already is

7:56

like really dramatic and I think will

7:58

continue to grow. But LLMs don't only

8:00

have properties of utilities. I think

8:02

it's also fair to say that they have

8:03

some properties of fabs. And the reason

8:06

for this is that the capex required for

8:09

building LLMs is actually quite large. Uh

8:12

it's not just like building some uh

8:14

power station or something like that,

8:15

right? You're investing a huge amount of

8:17

money and I think the tech tree and uh

8:20

for the technology is growing quite

8:22

rapidly. So we're in a world where we

8:24

have sort of deep tech trees, research

8:26

and development secrets that are

8:28

centralizing inside the LLM labs. Um and

8:32

but I think the analogy muddies a little

8:34

bit also because as I mentioned this is

8:36

software and software is a bit less

8:38

defensible because it is so malleable.

8:40

And so um I think it's just an

8:43

interesting kind of thing to think about

8:44

potentially. There's many analogy

8:46

analogies you can make like a 4

8:48

nanometer process node maybe is

8:49

something like a cluster with certain

8:51

max flops. You can think about when

8:53

you're use when you're using Nvidia GPUs

8:54

and you're only doing the software and

8:56

you're not doing the hardware. That's

8:57

kind of like the fabless model. But if

8:59

you're actually also building your own

9:00

hardware and you're training on TPUs if

9:02

you're Google, that's kind of like the

9:03

Intel model where you own your fab. So I

9:05

think there's some analogies here that

9:06

make sense. But actually I think the

9:08

analogy that makes the most sense

9:09

perhaps is that in my mind LLMs have very

9:12

strong kind of analogies to operating

9:15

systems. Uh in that this is not just

9:17

electricity or water. It's not something

9:19

that comes out of the tap as a

9:20

commodity. uh this is these are now

9:22

increasingly complex software ecosystems

9:25

right so uh they're not just like simple

9:28

commodities like electricity and it's

9:30

kind of interesting to me that the

9:32

ecosystem is shaping in a very similar

9:33

kind of way where you have a few closed

9:36

source providers like Windows or Mac OS

9:38

and then you have an open source

9:39

alternative like Linux and I think for uh

9:42

neural for LLMs as well we have a kind

9:45

of a few competing closed source

9:47

providers and then maybe the Llama

9:49

ecosystem is currently like maybe a

9:51

close approximation to something that

9:53

may grow into something like Linux.

9:55

Again, I think it's still very early

9:56

because these are just simple LLMs, but

9:58

we're starting to see that these are

9:59

going to get a lot more complicated.

10:01

It's not just about the LLM itself. It's

10:02

about all the tool use and the

10:03

multimodalities and how all of that

10:05

works. And so when I sort of had this

10:07

realization a while back, I tried to

10:09

sketch it out and it kind of seemed to

10:11

me like LLMs are kind of like a new

10:12

operating system, right? So the LLM is a

10:15

new kind of a computer. It's sitting

10:17

it's kind of like the CPU equivalent. uh

10:19

the context windows are kind of like the

10:21

memory and then the LLM is orchestrating

10:24

memory and compute uh for problem

10:26

solving um using all of these uh

10:29

capabilities here and so definitely if

10:32

you look at it, it looks very much like an

10:34

operating system from that perspective.
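
To make the "context window as memory" analogy concrete, here is a minimal sketch of a fixed-size working memory that evicts its oldest entries when the budget is exceeded. The class name, the eviction policy, and the whitespace-based token count are assumptions for illustration; real systems use model-specific tokenizers and more careful policies.

```python
from collections import deque


class ContextWindow:
    """A bounded working memory, measured in (approximate) tokens."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: whitespace-separated words stand in for tokens.
        return len(text.split())

    def used(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict the oldest messages until we fit, always keeping the newest.
        while self.used() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()

    def render(self) -> str:
        """The text the model would actually see on its next call."""
        return "\n".join(self.messages)
```

Anything evicted here is simply gone from the model's view, which is why whatever orchestrates the window has to decide explicitly what stays in it.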

10:36

Um, a few more analogies. For example,

10:38

if you want to download an app, say I go

10:41

to VS Code and I go to download, you can

10:43

download VS Code and you can run it on

10:46

Windows, Linux or or Mac in the same way

10:50

as you can take an LLM app like Cursor

10:53

and you can run it on GPT or Claude or

10:55

Gemini series, right? It's just a drop

10:57

down. So, it's kind of like similar in

10:59

that way as well.

11:00

uh more analogies that I think strike me

11:02

is that we're kind of like in this

11:04

1960s-ish

11:05

era where LLM compute is still very

11:09

expensive for this new kind of a

11:10

computer and that forces the LLMs to be

11:13

centralized in the cloud and we're all

11:15

just uh sort of thin clients that

11:18

interact with it over the network and

11:20

none of us have full utilization of

11:22

these computers and therefore it makes

11:24

sense to use time sharing where we're

11:26

all just you know a dimension of the

11:28

batch when they're running the computer

11:30

in the cloud. And this is very much what

11:32

computers used to look like at during

11:33

this time. The operating systems were in

11:35

the cloud. Everything was streamed

11:36

around and there was batching. And so

11:39

the p the personal computing revolution

11:41

hasn't happened yet because it's just

11:42

not economical. It doesn't make sense.

11:44

But I think some people are trying. And

11:46

it turns out that Mac minis, for

11:48

example, are a very good fit for some of

11:50

the LLMs because it's all if you're

11:52

doing batch one inference, this is all

11:53

super memory bound. So this actually

11:55

works.
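
The "memory bound" remark can be turned into a back-of-the-envelope estimate. At batch size one, generating each token of a dense model requires reading essentially all of the weights from memory once, so memory bandwidth divided by model size gives a rough upper bound on tokens per second. The sketch below ignores the KV cache, compute time, and other overheads, and the numbers in the example are hypothetical, not the specs of any particular machine.

```python
def decode_tokens_per_second_bound(
    num_params: float, bytes_per_param: float, bandwidth_bytes_per_s: float
) -> float:
    """Rough upper bound on batch-one decode speed for a dense model."""
    model_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical example: 8 billion parameters stored in 2 bytes each,
# on a machine with 100 GB/s of memory bandwidth.
bound = decode_tokens_per_second_bound(8e9, 2, 100e9)
print(f"at most about {bound:.2f} tokens/s")
```

The bound depends only on bandwidth and model size, which is why a machine with modest compute but fast unified memory can still be a reasonable fit for this workload.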

11:56

And uh I think these are some early

11:58

indications maybe of personal computing.

12:00

Uh but this hasn't really happened yet.

12:02

It's not clear what this looks like.

12:03

Maybe some of you get to invent what

12:05

what this is or how it works or uh what

12:08

this should what this should be. Maybe

12:10

one more analogy that I'll mention is

12:12

whenever I talk to ChatGPT or some LLM

12:14

directly in text, I feel like I'm

12:16

talking to an operating system through

12:18

the terminal. Like it's just it's it's

12:21

text. It's direct access to the

12:22

operating system. And I think a GUI

12:24

hasn't yet really been invented in like

12:26

a general way like should ChatGPT have a

12:29

GUI like different than just text

12:31

bubbles. Uh certainly some of the apps

12:33

that we're going to go into in a bit

12:35

have GUIs but there's no like GUI

12:38

across all the tasks if that makes

12:40

sense. Um there are some ways in which

12:43

LLMs are different from kind of

12:45

operating systems in some fairly unique

12:47

way and from early computing. And I

12:49

wrote about uh this one particular

12:52

property that strikes me as very

12:54

different uh this time around. It's that

12:57

LLMs like flip they flip the direction

12:59

of technology diffusion uh that is

13:02

usually uh present in technology. So for

13:05

example with electricity, cryptography,

13:07

computing, flight, internet, GPS, lots

13:09

of new transformative technologies that

13:10

have not been around. Typically it is

13:12

the government and corporations that are

13:14

the first users because it's new and

13:16

expensive etc. and it only later

13:18

diffuses to consumer. Uh, but I feel

13:20

like LLMs are kind of like flipped

13:22

around. So maybe with early computers,

13:24

it was all about ballistics and military

13:26

use, but with LLMs, it's all about how

13:29

do you boil an egg or something like

13:30

that. This is certainly like a lot of my

13:32

use. And so it's really fascinating to

13:33

me that we have a new magical computer

13:35

and it's like helping me boil an egg.

13:37

It's not helping the government do

13:38

something really crazy like some

13:40

military ballistics or some special

13:42

technology. Indeed, corporations and

13:43

governments are lagging behind the

13:45

adoption of all of us, of all of these

13:47

technologies. So, it's just backwards

13:48

and I think it informs maybe some of the

13:50

uses of how we want to use this

13:52

technology or like where are some of the

13:53

first apps and so on.

13:56

So, in summary so far, LLM labs fab LLMs. I

14:01

think it's accurate language to use, but

14:03

LLMs are complicated operating systems.

14:06

They're circa 1960s in computing and

14:08

we're redoing computing all over again.

14:10

and they're currently available via time

14:11

sharing and distributed like a utility.

14:13

What is new and unprecedented is that

14:16

they're not in the hands of a few

14:17

governments and corporations. They're in

14:18

the hands of all of us because we all

14:20

have a computer and it's all just

14:21

software and ChatGPT was beamed down to

14:24

our computers like billions of people

14:26

like instantly and overnight and this is

14:28

insane. Uh and it's kind of insane to me

14:30

that this is the case and now it is our

14:33

time to enter the industry and program

14:34

these computers. This is crazy. So I

14:37

think this is quite remarkable. Before

14:39

we program LLMs, we have to kind of like

14:42

spend some time to think about what

14:43

these things are. And I especially like

14:45

to kind of talk about their psychology.

14:48

So the way I like to think about LLMs is

14:50

that they're kind of like people

14:51

spirits. Um they are stochastic

14:54

simulations of people. Um and the

14:56

simulator in this case happens to be an

14:58

autoregressive transformer. So

14:59

transformer is a neural net. Uh it's and

15:02

it just kind of like is goes on the

15:04

level of tokens. It goes chunk chunk

15:06

chunk chunk chunk. And there's an almost

15:08

equal amount of compute for every single

15:10

chunk. Um and um this simulator of

15:14

course is is just is basically there's

15:16

some weights involved and we fit it to

15:19

all of text that we have on the internet

15:20

and so on. And you end up with this kind

15:22

of a simulator and because it is trained

15:24

on humans, it's got this emergent

15:26

psychology that is humanlike. So the

15:28

first thing you'll notice is of course

15:30

uh LLMs have encyclopedic knowledge and

15:32

memory. uh and they can remember lots of

15:34

things, a lot more than any single

15:36

individual human can because they read

15:37

so many things. It's it actually kind of

15:39

reminds me of this movie Rain Man, which

15:41

I actually really recommend people

15:43

watch. It's an amazing movie. I love

15:44

this movie. Um and Dustin Hoffman here

15:46

is an autistic savant who has almost

15:49

perfect memory. So, he can read a he can

15:51

read like a phone book and remember all

15:53

of the names and phone numbers. And I

15:55

kind of feel like LLMs are kind of like

15:57

very similar. They can remember SHA

15:58

hashes and lots of different kinds of

16:00

things very very easily. So they

16:02

certainly have superpowers in some set

16:04

in some respects. But they also have a

16:06

bunch of I would say cognitive deficits.

16:08

So they hallucinate quite a bit. Um and

16:11

they kind of make up stuff and don't

16:13

have a very good uh sort of internal

16:15

model of self-knowledge, not sufficient

16:17

at least. And this has gotten better but

16:19

not perfect. They display jagged

16:21

intelligence. So they're going to be

16:22

superhuman in some problem-solving

16:24

domains. And then they're going to make

16:26

mistakes that basically no human will

16:27

make. like you know they will insist

16:29

that 9.11 is greater than 9.9 or that

16:32

there are two Rs in strawberry. These are

16:34

some famous examples but basically there

16:36

are rough edges that you can trip on so

16:38

that's kind of I think also kind of

16:40

unique um they also kind of suffer from

16:43

anterograde amnesia um so uh and I think

16:46

I'm alluding to the fact that if you

16:48

have a coworker who joins your

16:49

organization this coworker will over

16:51

time learn your organization and uh they

16:54

will understand and gain like a huge

16:55

amount of context on the organization

16:57

and they go home and they sleep and they

16:59

consolidate knowledge and they develop

17:01

expertise over time. LLMs don't natively

17:03

do this and this is not something that

17:04

has really been solved in the R&D of

17:06

LLMs. I think um and so context windows

17:09

are really kind of like working memory

17:10

and you have to sort of program the

17:12

working memory quite directly because

17:13

they don't just kind of like get smarter

17:15

by uh by default and I think a lot of

17:17

people get tripped up by the analogies

17:19

uh in this way. Uh in popular culture I

17:22

recommend people watch these two movies

17:23

uh Memento and 50 First Dates. In both of

17:26

these movies, the protagonists, their

17:27

weights are fixed and their context

17:29

windows gets wiped every single morning

17:32

and it's really problematic to go to

17:34

work or have relationships when this

17:35

happens and this happens to LLMs all the

17:37

time. I guess one more thing I would

17:39

point to is security kind of related

17:42

limitations of the use of LLMs. So for

17:44

example, LLMs are quite gullible. Uh

17:46

they are susceptible to prompt injection

17:48

risks. They might leak your data etc.

17:50

And so um and there's many other

17:52

considerations uh security related. So,

17:55

so basically long story short, you have

17:57

to load your you have to load your you

18:00

have to simultaneously think through

18:01

this superhuman thing that has a bunch

18:03

of cognitive deficits and issues. How do

18:05

we and yet they are extremely like

18:07

useful and so how do we program them and

18:10

how do we work around their deficits and

18:12

enjoy their superhuman powers.
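
One simple way to work around the jagged edges is to hand the mechanical parts of a question to explicit code. The two famous failure cases mentioned above are trivial for software 1.0, as this small sketch shows; the helper names are made up for illustration.

```python
from decimal import Decimal


def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter, ignoring case."""
    return word.lower().count(letter.lower())


def is_greater(a: str, b: str) -> bool:
    """Compare two decimal numbers given as strings, exactly."""
    return Decimal(a) > Decimal(b)


print(count_letter("strawberry", "r"))   # 3
print(is_greater("9.11", "9.9"))         # False
```

Checks like these are cheap to run on a model's answer, which is one reason to keep deterministic code in the loop rather than trusting the generation alone.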

18:15

So what I want to switch to now is talk

18:17

about the opportunities of how do we use

18:18

these models and what are some of the

18:20

biggest opportunities. This is not a

18:22

comprehensive list just some of the

18:23

things that I thought were interesting

18:24

for this talk. The first thing I'm kind

18:26

of excited about is what I would call

18:29

partial autonomy apps. So for example,

18:32

let's work with the example of coding.

18:34

You can certainly go to ChatGPT directly

18:36

and you can start copy pasting code

18:38

around and copy-pasting bug reports and

18:40

stuff around and getting code and copy

18:42

pasting everything around. Why would you

18:44

why would you do that? Why would you go

18:45

directly to the operating system? It

18:47

makes a lot more sense to have an app

18:48

dedicated for this. And so I think many

18:50

of you uh use uh Cursor. I do as well.

18:53

And uh Cursor is kind of like the thing

18:56

you want instead. You don't want to just

18:57

directly go to ChatGPT. And I

18:59

think Cursor is a very good example of

19:01

an early LLM app that has a bunch of

19:03

properties that I think are um useful

19:06

across all the LLM apps. So in

19:08

particular, you will notice that we have

19:09

a traditional interface that allows a

19:12

human to go in and do all the work

19:13

manually just as before. But in addition

19:16

to that, we now have this LLM

19:17

integration that allows us to go in

19:19

bigger chunks. And so some of the

19:21

properties of LLM apps that I think are

19:23

shared and useful to point out. Number

19:25

one, the LLMs basically do a ton of the

19:28

context management. Um, number two, they

19:31

orchestrate multiple calls to LLMs,

19:33

right? So in the case of Cursor, there's

19:34

under the hood embedding models for all

19:36

your files, the actual chat models,

19:39

models that apply diffs to the code, and

19:41

this is all orchestrated for you. A

19:43

really big one that uh I think also

19:46

maybe not fully appreciated always is

19:48

application specific uh GUI and the

19:50

importance of it. Um because you don't

19:53

just want to talk to the operating

19:54

system directly in text. Text is very

19:56

hard to read, interpret, understand and

19:59

also like you don't want to take some of

20:00

these actions natively in text. So it's

20:03

much better to just see a diff as like

20:05

red and green change and you can see

20:06

what's being added and subtracted. It's

20:08

much easier to just do command Y to

20:10

accept or command N to reject. I

20:11

shouldn't have to type it in text,

20:13

right? So, a GUI allows a human to

20:15

audit the work of these fallible systems

20:17

and to go faster. I'm going to come back

20:20

to this point a little bit uh later as

20:21

well. And the last kind of feature I

20:23

want to point out is that there's what I

20:25

call the autonomy slider. So, for

20:27

example, in Cursor, you can just do tab

20:29

completion. You're mostly in charge. You

20:31

can select a chunk of code and command K

20:33

to change just that chunk of code. You

20:36

can do command L to change the entire

20:37

file. Or you can do command I which just

20:40

you know let it rip do whatever you want

20:42

in the entire repo and that's the sort

20:44

of full autonomy agent agentic version

20:46

and so you are in charge of the autonomy

20:48

slider and depending on the complexity

20:50

of the task at hand you can uh tune the

20:53

amount of autonomy that you're willing

20:54

to give up uh for that task maybe to

20:57

show one more example of a fairly

20:58

successful LLM app uh Perplexity um it

21:03

also has very similar features to what

21:04

I've just pointed out to in Cursor uh it

21:07

packages up a lot of the information. It

21:08

orchestrates multiple LLMs. It's got a

21:10

GUI that allows you to audit some of its

21:13

work. So, for example, it will cite

21:15

sources and you can imagine inspecting

21:17

them. And it's got an autonomy slider.

21:18

You can either just do a quick search or

21:20

you can do research or you can do deep

21:22

research and come back 10 minutes later.

21:24

So, this is all just varying levels of

21:25

autonomy that you give up to the tool.
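
The autonomy slider can be pictured as an ordered set of levels with the human choosing the ceiling. The sketch below uses the four Cursor levels described above; the enum, the names, and the `is_allowed` check are hypothetical and only illustrate the idea, not how Cursor or Perplexity implement it.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Ordered from least to most autonomy handed to the tool."""
    TAB_COMPLETION = 1   # the human is mostly in charge
    EDIT_SELECTION = 2   # change just the selected chunk of code
    EDIT_FILE = 3        # change the entire file
    AGENT_REPO = 4       # do whatever is needed across the whole repo


def is_allowed(requested: Autonomy, slider: Autonomy) -> bool:
    """An action is permitted only if it stays at or below the slider."""
    return requested <= slider


# The human sets the slider per task; the tool must respect it.
slider = Autonomy.EDIT_SELECTION
print(is_allowed(Autonomy.TAB_COMPLETION, slider))  # True
print(is_allowed(Autonomy.AGENT_REPO, slider))      # False
```

Keeping the levels ordered makes the policy a single comparison, and it leaves the decision about how much to delegate with the person doing the verification.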

21:27

So, I guess my question is I feel like a

21:30

lot of software will become partially

21:32

autonomous. I'm trying to think through

21:33

like what does that look like? And for

21:35

many of you who maintain products and

21:36

services, how are you going to make your

21:38

products and services partially

21:40

autonomous? Can an LLM see everything

21:42

that a human can see? Can an LLM act in

21:45

all the ways that a human could act? And

21:47

can humans supervise and stay in the

21:49

loop of this activity? Because again,

21:50

these are fallible systems that aren't

21:52

yet perfect. And what does a diff look

21:54

like in Photoshop or something like

21:56

that? You know, and also a lot of the

21:58

traditional software right now, it has

22:00

all these switches and all this kind of

22:01

stuff that's all designed for human. All

22:03

of this has to change and become

22:04

accessible to LLMs.

22:07

So, one thing I want to stress with a

22:09

lot of these LLM apps that I'm not sure

22:11

gets as much attention as it should is

22:14

um we we're now kind of like cooperating

22:16

with AIs and usually they are doing the

22:18

generation and we as humans are doing

22:20

the verification. It is in our interest

22:22

to make this loop go as fast as

22:24

possible. So, we're getting a lot of

22:25

work done. There are two major ways that

22:28

I think uh this can be done. Number one,

22:30

you can speed up verification a lot. Um,

22:32

and I think GUIs, for example, are

22:34

extremely important to this because a

22:36

GUI utilizes your computer vision GPU

22:39

in all of our head. Reading text is

22:41

effortful and it's not fun, but looking

22:43

at stuff is fun and it's it's just a

22:45

kind of like a highway to your brain.

22:47

So, I think GUIs are very useful for

22:49

auditing systems and visual

22:51

representations in general. And number

22:53

two, I would say is we have to keep the

22:56

AI on the leash. We I think a lot of

22:58

people are getting way over excited with

23:00

AI agents and uh it's not useful to me

23:03

to get a diff of 10,000 lines of code to

23:05

my repo. Like I have to I'm still the

23:07

bottleneck, right? Even though that

23:09

10,000 lines come out instantly, I have

23:11

to make sure that this thing is not

23:12

introducing bugs. It's just like and

23:15

that it's doing the correct thing,

23:16

right? And that there's no security

23:17

issues and so on. So um I think that um

23:22

yeah basically you we have to sort of

23:25

like it's in our interest to make the

23:28

the flow of these two go very very fast

23:30

and we have to somehow keep the AI on

23:32

the leash because it gets way too

23:33

overreactive. It's uh it's kind of like

23:35

this. This is how I feel when I do AI

23:37

assisted coding. If I'm just vibe coding

23:39

everything is nice and great but if I'm

23:40

actually trying to get work done it's

23:42

not so great to have an overreactive uh

23:44

agent doing all this kind of stuff. So

23:47

this slide is not very good. I'm sorry,

23:48

but I guess I'm trying to develop like

23:51

many of you some ways of utilizing these

23:53

agents in my coding workflow and to do

23:55

AI assisted coding. And in my own work,

23:58

I'm always scared to get way too big

23:59

diffs. I always go in small incremental

24:02

chunks. I want to make sure that

24:04

everything is good. I want to spin this

24:06

loop very very fast and um I sort of

24:09

work on small chunks of single concrete

24:10

thing. Uh and so I think many of you

24:13

probably are developing similar ways of

24:14

working with the with LLMs.
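
One way to make the "small incremental chunks" habit concrete is a check that flags a diff as too large to review in one pass. The following is only a minimal sketch under stated assumptions: the function names and the 200-line threshold are hypothetical and do not come from the talk, and the diff is assumed to be in unified format.

```python
def count_changed_lines(diff_text: str) -> int:
    """Count added and removed lines in a unified diff.

    File header lines ("+++ b/file", "--- a/file") are skipped. This is a
    rough heuristic: a removed line whose content itself starts with "--"
    would also be skipped.
    """
    changed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def is_reviewable(diff_text: str, max_changed_lines: int = 200) -> bool:
    """Return True if the diff is small enough to audit in one pass.

    The default threshold is an arbitrary assumption, not a recommendation.
    """
    return count_changed_lines(diff_text) <= max_changed_lines


sample_diff = """--- a/app.py
+++ b/app.py
@@ -1,2 +1,3 @@
 import os
-print("hi")
+print("hello")
+print("bye")
"""

print(count_changed_lines(sample_diff))  # 3
```

A check like this could run before a human looks at an agent's output, so that oversized changes are sent back to be split up rather than reviewed all at once.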

24:17

Um, I also saw a number of blog posts

24:19

that try to develop these best practices

24:22

for working with LLMs. And here's one

24:24

that I read recently and I thought was

24:25

quite good. And it kind of discussed

24:26

some techniques and some of them have to

24:28

do with how you keep the AI on the

24:29

leash. And so, as an example, if you are

24:32

prompting, if your prompt is vague, then

24:34

uh the AI might not do exactly what you

24:36

wanted and in that case, verification

24:38

will fail. You're going to ask for

24:40

something else. If a verification fails,

24:42

then you're going to start spinning. So

24:43

it makes a lot more sense to spend a bit

24:45

more time to be more concrete in your

24:46

prompts which increases the probability

24:48

of successful verification and you can

24:50

move forward. And so I think a lot of us

24:52

are going to end up finding um kind of

24:54

techniques like this. I think in my own

24:56

work as well I'm currently interested in

24:57

uh what education looks like in um

25:00

together with kind of like now that we

25:01

have AI uh and LLMs what does education

25:04

look like? And I think a a large amount

25:07

of thought for me goes into how we keep

25:09

AI on the leash. I don't think it just

25:11

works to go to chat and be like, "Hey,

25:13

teach me physics." I don't think this

25:14

works because the AI is like gets lost

25:16

in the woods. And so for me, this is

25:18

actually two separate apps. For example,

25:20

there's an app for a teacher that

25:22

creates courses and then there's an app

25:24

that takes courses and serves them to

25:26

students. And in both cases, we now have

25:29

this intermediate artifact of a course

25:31

that is auditable and we can make sure

25:32

it's good. We can make sure it's

25:33

consistent. and the AI is kept on the

25:35

leash with respect to a certain

25:37

syllabus, a certain like um progression

25:40

of projects and so on. And so this is

25:42

one way of keeping the AI on leash and I

25:44

think has a much higher likelihood of

25:45

working and the AI is not getting lost

25:47

in the woods.

25:49

One more kind of analogy I wanted to

25:51

sort of allude to is I'm not I'm no

25:54

stranger to partial autonomy and I kind

25:56

of worked on this I think for five years

25:57

at Tesla and this is also a partial

26:00

autonomy product and shares a lot of the

26:01

features like for example right there in

26:03

the instrument panel is the GUI of the

26:05

autopilot so it's showing me what the

26:07

what the neural network sees and so on

26:09

and we have the autonomy slider where

26:10

over the course of my tenure there we

26:13

did more and more autonomous tasks for

26:15

the user and maybe the story that I

26:18

wanted to tell very briefly is uh

26:21

actually the first time I drove a

26:22

self-driving vehicle was in 2013 and I

26:25

had a friend who worked at Waymo and uh

26:27

he offered to give me a drive around

26:29

Palo Alto. I took this picture using

26:31

Google Glass at the time and many of you

26:33

are so young that you might not even

26:35

know what that is. Uh but uh yeah, this

26:37

was like all the rage at the time. And

26:39

we got into this car and we went for

26:40

about a 30-minute drive around Palo Alto

26:42

highways uh streets and so on. And this

26:45

drive was perfect. There was zero

26:46

interventions and this was 2013 which is

26:49

now 12 years ago. And it kind of struck

26:52

me because at the time when I had this

26:54

perfect drive, this perfect demo, I felt

26:56

like, wow, self-driving is imminent

26:59

because this just worked. This is

27:00

incredible. Um, but here we are 12 years

27:03

later and we are still working on

27:04

autonomy. Um, we are still working on

27:07

driving agents and even now we haven't

27:09

actually like really solved the problem.

27:10

like you may see Waymos going around and

27:12

they look driverless but you know

27:14

there's still a lot of teleoperation and

27:16

a lot of human in the loop of a lot of

27:18

this driving so we still haven't even

27:20

like declared success but I think it's

27:22

definitely like going to succeed at this

27:24

point but it just took a long time and

27:26

so I think like like this is software is

27:29

really tricky I think in the same way

27:31

that driving is tricky and so when I see

27:34

things like oh 2025 is the year of

27:36

agents I get very concerned and I kind

27:38

of feel like you know this is the decade

27:41

of agents and this is going to be quite

27:44

some time. We need humans in the loop.

27:45

We need to do this carefully. This is

27:47

software. Let's be serious here. One

27:51

more kind of analogy that I always think

27:52

through is the Iron Man suit. Uh I think

27:56

this is I always love Iron Man. I think

27:58

it's like so um correct in a bunch of

28:01

ways with respect to technology and how

28:02

it will play out. And what I love about

28:04

the Iron Man suit is that it's both an

28:05

augmentation and Tony Stark can drive it

28:08

and it's also an agent. And in some of

28:10

the movies, the Iron Man suit is quite

28:11

autonomous and can fly around and find

28:13

Tony and all this kind of stuff. And so

28:15

this is the autonomy slider is we can be

28:17

we can build augmentations or we can

28:19

build agents and we kind of want to do a

28:21

bit of both. But at this stage I would

28:23

say working with fallible LLMs and so

28:25

on. I would say you know it's less Iron

28:29

Man robots and more Iron Man suits that

28:31

you want to build. It's less like

28:33

building flashy demos of autonomous

28:35

agents and more building partial

28:36

autonomy products. And these products

28:39

have custom GUIs and UI/UX. And we're

28:41

trying to um and this is done so that

28:43

the generation verification loop of the

28:45

human is very very fast. But we are not

28:48

losing the sight of the fact that it is

28:49

in principle possible to automate this

28:51

work. And there should be an autonomy

28:52

slider in your product. And you should

28:54

be thinking about how you can slide that

28:55

autonomy slider and make your product uh

28:58

sort of um more autonomous over time.

29:01

But this is kind of how I think there's

29:02

lots of opportunities in these kinds of

29:04

products. I want to now switch gears a

29:06

little bit and talk about one other

29:08

dimension that I think is very unique.

29:09

Not only is there a new type of

29:11

programming language that allows for

29:12

autonomy in software but also as I

29:15

mentioned it's programmed in English

29:16

which is this natural interface and

29:19

suddenly everyone is a programmer

29:20

because everyone speaks natural language

29:22

like English. So this is extremely

29:24

bullish and very interesting to me and

29:26

also completely unprecedented. I would

29:28

say it it used to be the case that you

29:29

need to spend five to 10 years studying

29:31

something to be able to do something in

29:32

software. this is not the case anymore.

29:35

So, I don't know if by any chance anyone

29:37

has heard of vibe coding.

29:40

Uh, this this is the tweet that kind of

29:42

like introduced this, but I'm told that

29:44

this is now like a major meme. Um, fun

29:46

story about this is that I've been on

29:49

Twitter for like 15 years or something

29:51

like that at this point and I still have

29:53

no clue which tweet will become viral

29:56

and which tweet like fizzles and no one

29:58

cares. And I thought that this tweet was

30:00

going to be the latter. I don't know. It

30:01

was just like a shower of thoughts. But

30:03

this became like a total meme and I

30:05

really just can't tell. But I guess like

30:06

it struck a chord and it gave a name to

30:08

something that everyone was feeling but

30:10

couldn't quite say in words. So now

30:13

there's a Wikipedia page and everything.

30:17

This is like

30:18

[Applause]

30:25

yeah this is like a major contribution

30:27

now or something like that. So,

30:30

um, so Tom Wolf from HuggingFace shared

30:32

this beautiful video that I really love.

30:34

Um,

30:37

these are kids vibe coding.

30:42

And I find that this is such a wholesome

30:44

video. Like, I love this video. Like,

30:46

how can you look at this video and feel

30:48

bad about the future? The future is

30:49

great.

30:52

I think this will end up being like a

30:53

gateway drug to software development.

30:56

Um, I'm not a doomer about the future of

30:59

the generation and I think yeah, I love

31:02

this video. So, I tried vibe coding a

31:04

little bit uh as well because it's so

31:07

fun. Uh, so vibe coding is so great when

31:09

you want to build something super duper

31:10

custom that doesn't appear to exist and

31:12

you just want to wing it because it's a

31:13

Saturday or something like that. So, I

31:15

built this uh iOS app and I don't I

31:18

can't actually program in Swift, but I

31:20

was really shocked that I was able to

31:21

build like a super basic app and I'm not

31:23

going to explain it. It's really uh

31:24

dumb, but uh I kind of like this was

31:27

just like a day of work and this was

31:28

running on my phone like later that day

31:30

and I was like, "Wow, this is amazing."

31:32

I didn't have to like read through Swift

31:33

for like five days or something like

31:35

that to like get started. I also

31:38

vibe coded this app called MenuGen. And

31:40

this is live. You can try it in

31:41

menugen.app. And I basically had this

31:44

problem where I show up at a restaurant,

31:45

I read through the menu, and I have no

31:46

idea what any of the things are. And I

31:48

need pictures. So this doesn't exist. So

31:51

I was like, "Hey, I'm going to vibe code

31:52

it." So, um, this is what it looks like.

31:55

You go to menugen.app,

31:58

um, and, uh, you take a picture of a of

32:01

a menu and then MenuGen generates the

32:03

images and everyone gets $5 in credits

32:06

for free when you sign up. And

32:08

therefore, this is a major cost center

32:10

in my life. So, this is a negative

32:13

negative uh, revenue app for me right

32:16

now.

32:17

I've lost a huge amount of money on

32:19

MenuGen.

32:21

Okay. But the fascinating thing about

32:23

MenuGen for me is that the code of

32:28

the vibe coding part, the code was

32:30

actually the easy part of vibe coding

32:32

MenuGen, and most of it actually was when I

32:35

tried to make it real so that you can

32:36

actually have authentication and

32:37

payments and the domain name and a Vercel

32:39

deployment. This was really hard and all

32:41

of this was not code. All of this devops

32:44

stuff was me in the browser clicking

32:47

stuff and this was an extreme slog and took

32:49

another week. So it was really

32:51

fascinating that I had the MenuGen um

32:54

basically demo working on my laptop in a

32:57

few hours and then it took me a week

32:59

because I was trying to make it real and

33:01

the reason for this is this was just

33:02

really annoying. Um, so for example, if

33:05

you try to add Google login to your web

33:07

page, I know this is very small, but

33:09

just a huge amount of instructions of

33:11

this Clerk library telling me how to

33:13

integrate this. And this is crazy. Like

33:15

it's telling me go to this URL, click on

33:17

this dropdown, choose this, go to this,

33:19

and click on that. And it's like telling

33:21

me what to do. Like a computer is

33:22

telling me the actions I should be

33:24

taking. Like you do it. Why am I doing

33:26

this?

33:28

What the hell?

33:31

I had to follow all these instructions.

33:33

This was crazy. So I think the last part

33:36

of my talk therefore focuses on can we

33:39

just build for agents? I don't want to

33:41

do this work. Can agents do this? Thank

33:44

you.

33:46

Okay. So roughly speaking, I think

33:48

there's a new category of consumer and

33:50

manipulator of digital information. It

33:53

used to be just humans through GUIs or

33:55

computers through APIs. And now we have

33:57

a completely new thing and agents are

34:00

they're computers but they are humanlike

34:02

kind of right they're people spirits

34:04

there's people spirits on the internet

34:05

and they need to interact with our

34:06

software infrastructure like can we

34:08

build for them it's a new thing so as an

34:10

example you can have robots.txt on your

34:12

domain and you can instruct uh or like

34:15

advise I suppose um uh web crawlers on

34:18

how to behave on your website in the

34:19

same way you can have maybe an llms.txt

34:21

file which is just a simple markdown

34:23

that's telling LLMs what this domain is

34:25

about and this is very readable to a to

34:28

an LLM. If it had to instead get the

34:30

HTML of your web page and try to parse

34:32

it, this is very error-prone and

34:33

difficult and will screw it up and it's

34:35

not going to work. So we can just

34:36

directly speak to the LLM. It's worth

34:38

it. Um a huge amount of documentation is

34:41

currently written for people. So you

34:42

will see things like lists and bold and

34:45

pictures and this is not directly

34:47

accessible by an LLM. So I see some of

34:51

the services now are transitioning a lot

34:52

of the their docs to be specifically for

34:54

LLMs. So Vercel and Stripe as an

34:57

example are early movers here but there

34:59

are a few more that I've seen already

35:01

and they offer their documentation in

35:04

markdown. Markdown is super easy for LLMs

35:06

to understand. This is great. Um maybe

35:10

one simple example from from uh my

35:12

experience as well. Maybe some of you

35:14

know 3Blue1Brown. He makes

35:15

beautiful animation videos on YouTube.

35:19

[Applause]

35:23

Yeah, I love this library. So that he

35:25

wrote uh Manim and I wanted to make my

35:27

own and uh there's extensive

35:30

documentation on how to use Manim and

35:32

so I didn't want to actually read

35:34

through it. So I copy pasted the whole

35:35

thing to an LLM and I described what I

35:37

wanted and it just worked out of the box

35:39

like LLM just vibe coded me an animation

35:41

exactly what I wanted and I was like wow

35:43

this is amazing. So if we can make docs

35:45

legible to LLMs, it's going to unlock a

35:48

huge amount of um kind of use and um I

35:51

think this is wonderful and should

35:52

should happen more. The other thing I

35:55

wanted to point out is that you do

35:56

unfortunately have to it's not just

35:57

about taking your docs and making them

35:58

appear in markdown. That's the easy

36:00

part. We actually have to change the

36:01

docs because anytime your docs say click

36:04

this is bad. An LLM will not be able to

36:06

natively take this action right now. So,

36:09

Vercel, for example, is replacing

36:11

every occurrence of click with an

36:13

equivalent curl command that your LLM

36:15

agent could take on your behalf. Um, and

36:18

so I think this is very interesting. And

36:19

then, of course, there's a model context

36:21

protocol from Anthropic. And this is

36:23

also another way, it's a protocol of

36:24

speaking directly to agents as this new

36:26

consumer and manipulator of digital

36:28

information. So, I'm very bullish on

36:29

these ideas. The other thing I really

36:31

like is a number of little tools here

36:33

and there that are helping ingest data

36:36

in like very LLM-friendly formats.

36:38

So for example, when I go to a GitHub

36:40

repo like my nanoGPT repo, I can't feed

36:42

this to an LLM and ask questions about

36:44

it uh because it's you know this is a

36:46

human interface on GitHub. So when you

36:48

just change the URL from GitHub to

36:50

Gitingest then uh this will actually

36:52

concatenate all the files into a single

36:54

giant text and it will create a

36:55

directory structure etc. And this is

36:57

ready to be copy pasted into your

36:59

favorite LLM and you can do stuff. Maybe

37:01

even more dramatic example of this is

37:03

DeepWiki where it's not just the raw

37:05

content of these files. uh this is from

37:08

Devin but also like they have Devin

37:10

basically do analysis of the GitHub repo

37:12

and Devin basically builds up a whole

37:14

docs uh pages just for your repo and you

37:18

can imagine that this is even more

37:19

helpful to copy paste into your LLM. So

37:22

I love all the little tools that

37:23

basically where you just change the URL

37:24

and it makes something accessible to an

37:26

LLM. So this is all well and great and u

37:29

I think there should be a lot more of

37:30

it. One more note I wanted to make is

37:32

that it is absolutely possible that in

37:35

the future LLMs will be able to this is

37:38

not even future this is today they'll be

37:39

able to go around and they'll be able to

37:40

click stuff and so on but I still think

37:42

it's very worth u basically meeting LLM

37:46

halfway, LLMs halfway, and making it

37:48

easier for them to access all this

37:49

information uh because this is still

37:51

fairly expensive I would say to use and

37:54

uh a lot more difficult and so I do

37:56

think that lots of software there will

37:58

be a long tail where it won't like adapt

38:00

apps because these are not like live

38:02

player sort of repositories or digital

38:04

infrastructure and we will need these

38:06

tools. Uh but I think for everyone else

38:08

I think it's very worth kind of like

38:09

meeting in some middle point. So I'm

38:11

bullish on both if that makes sense.
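
The "just change the URL" trick described above can be sketched as a small hostname swap. This is an illustrative sketch, not either tool's official interface: the function name is hypothetical, and the target hosts (`gitingest.com`, `deepwiki.com`) are assumptions, since the talk only describes replacing "github" in the address.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed host mapping; the talk only says to change "GitHub" in the URL.
LLM_FRIENDLY_HOSTS = {
    "gitingest": "gitingest.com",
    "deepwiki": "deepwiki.com",
}


def to_llm_friendly_url(github_url: str, tool: str = "gitingest") -> str:
    """Rewrite a github.com repo URL to point at an LLM-friendly mirror."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url}")
    # Keep scheme, path, query, and fragment; swap only the host.
    return urlunsplit(parts._replace(netloc=LLM_FRIENDLY_HOSTS[tool]))


print(to_llm_friendly_url("https://github.com/karpathy/nanoGPT"))
# https://gitingest.com/karpathy/nanoGPT
```

Only the host changes, so the repository path carries over unchanged and the resulting page can be copied into an LLM as plain text.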

38:14

So in summary, what an amazing time to

38:17

get into the industry. We need to

38:18

rewrite a ton of code. A ton of code

38:20

will be written by professionals and by

38:23

coders. These LLMs are kind of like

38:25

utilities, kind of like fabs, but

38:27

they're kind of especially like

38:28

operating systems. But it's so early.

38:30

It's like 1960s of operating systems and

38:34

uh and I think a lot of the analogies

38:36

cross over. Um and these LLMs are kind of

38:38

like these fallible uh you know people

38:41

spirits that we have to learn to work

38:43

with. And in order to do that properly,

38:45

we need to adjust our infrastructure

38:47

towards it. So when you're building

38:48

these LLM apps, I describe some of the

38:50

ways of working effectively with these

38:52

LLMs and some of the tools that make

38:54

that uh kind of possible and how you can

38:57

spin this loop very very quickly and

38:59

basically create partial autonomy

39:00

products and then um yeah, a lot of code

39:03

has to also be written for the agents

39:04

more directly. But in any case, going

39:07

back to the Iron Man suit analogy, I

39:09

think what we'll see over the next

39:10

decade roughly is we're going to take

39:12

the slider from left to right. And I'm

39:15

very interesting. It's going to be very

39:17

interesting to see what that looks like.

39:19

And I can't wait to build it with all of

39:21

you. Thank you.

Interactive Summary

The speaker, Andrej Karpathy, discusses the fundamental evolution of software, proposing Software 1.0 (human-written code), Software 2.0 (neural network weights), and the emergent Software 3.0 (Large Language Models programmed by natural language prompts). He posits that LLMs resemble utilities, fabs, and especially operating systems, but currently operate in an environment akin to 1960s computing. A unique aspect of LLMs is their inverted technology diffusion, starting with consumer applications. Despite their superhuman knowledge, LLMs exhibit cognitive deficits like hallucinations, jagged intelligence, and anterograde amnesia. Karpathy highlights opportunities in "partial autonomy apps" that manage context, orchestrate LLM calls, and feature application-specific GUIs and "autonomy sliders." He emphasizes the importance of speeding up the human generation-verification loop and keeping AI on a "leash." The talk introduces "vibe coding" as a new, accessible programming paradigm, but notes challenges in real-world deployment. Finally, he advocates for designing software to be agent-friendly (e.g., markdown documentation, curl commands) and building "Iron Man suits" (augmentations) rather than fully autonomous "Iron Man robots," stressing the need for human involvement in the loop as the industry slowly moves towards greater autonomy.