The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI
code's not even the right verb anymore, right? But I have to, um, express my will to my agents for 16 hours a day. Manifest.
>> How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions. But there, I mean, this is why it gets to the psychosis: this is, like, infinite, and everything is a skill issue.
Hi listeners, welcome back to No Priors. Today I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about code agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next age. Welcome, Andrej. Andrej, thanks for doing this.
>> Yeah, thank you for having me.
>> Uh, so it's been a very exciting couple
of months in AI.
>> Uh, yeah, you could say that.
>> I remember walking into the office at some point, and you were really locked in, and I was asking what you were up to, and you're like, I just have to code for 16 hours a day. Or, code's not even the right verb anymore, right? But, I have to
>> express my will to my agents for 16 hours a day. Manifest.
Um, because, like, there's been a jump in capability.
>> Uh, what's happening? And tell me about your experience.
>> Yeah, I kind of feel like I was, and still often am, in this perpetual state of AI psychosis, just, like, all the time. Because there was a huge unlock in what you can achieve as a person, as an individual, right? Because you were bottlenecked by, you know, your typing speed and so on. But now, with these agents, I would say December is when
something flipped, where I kind of went from 80/20 of, you know, writing code by myself versus just delegating to agents, to, like, 20/80. And I don't even think it's 20/80 by now. I think it's a lot more than that. I don't think
I've typed like a line of code probably
since December basically. Um, which is
like an extremely large change. I was talking about it, for example, to my parents, and I don't think a normal person actually realizes that this happened, or how dramatic it was. Like, literally, if you just find a random software engineer at their desk, their default workflow of, you know, building software is completely different as of basically December. So I'm just in this
state of psychosis of trying to figure
out like what's possible uh trying to
push it to the limit. How can I have not just a single session of, you know, Claude Code or Codex or some of these agent harnesses? How can I have
more of them? How can I do that appropriately? And then, how can I use these claws? What are these claws? And so there's a lot of new things. I want to be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it. And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous. And so I guess I'm just in this psychosis of, what's possible? Because it's unexplored, fundamentally.
>> Well, if you're nervous, the rest of us are nervous. We have a team that we work with at Conviction whose setup is that none of the engineers write code by hand; they're all microphoned, and they just whisper to their agents all the time. It's the strangest work setting ever. And I thought they were crazy, and now I fully accept it. I was like, "Oh, this was the way." Like, you're just ahead of it.
>> Um, how do you think about your own capacity now to explore, or to do projects? What is it limited by?
>> Yeah, what is it limited by? Just, I think, everything. So many things, even if they don't work, to a large extent you feel like it's a skill issue. It's not that the capability is not there; it's that you just haven't found a way to string together what's available. Like, I just didn't give good enough instructions in the AGENTS.md file, or whatever it may be. I don't have a nice enough memory tool that I put in there, or something
like that. So it all kind of feels like a skill issue when it doesn't work, to some extent. You want to see how you can parallelize them, etc. And you want to be Peter Steinberger, basically. So Peter is famous. He has a funny photo where
he's in front of a monitor with lots of Codex agents tiling the monitor (he uses Codex), and they all take about 20 minutes if you prompt them correctly and you use the high effort. He has multiple, you know, 10 repos checked out, and so he's just going between them and giving them work. It's just like you can move in much larger
macro actions. It's not just, here's a line of code, here's a new function. It's, here's a new functionality, delegate it to agent one. Here's a new functionality that's not going to interfere with the other one, give it to agent two. And then try to review their work as best as you can, depending on how much you care about that code. Like, what are the macro actions that I can manipulate my software repository by? And another agent is doing some research, and another agent is writing code, and another one is coming up with a plan for some new implementation. And so everything just happens in these macro actions over your
repository. And you're just trying to become really good at it and develop, like, a muscle memory for it. It's extremely, yeah, it's very rewarding, number one, because it actually works. But it's also kind of the new thing to learn. So that's why, hence, the psychosis.
>> Yeah, I do feel like my instinct is, whenever I am waiting
for an agent to complete something, the
obvious thing to do is like, well, I can
do more work, right? Like if I have
access to more tokens, then like I
should just parallelize and add more tasks. And so that's very stressful, because if you don't feel very bounded by your ability to spend on tokens, then, you know, you are the bottleneck in the system that is at max capability.
>> Yeah, you're not maximizing your subscription, at least, and ideally for multiple agents. Like, if you run out of your quota on Codex, you should switch to Claude, or whatnot. I don't know, that's what I've been trying to do a little bit, and I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput. So I actually kind of
experienced this when I was a PhD student: you would feel nervous when your GPUs were not running, when you have GPU capacity and you're not maxing out the flops available to you. But now it's not about flops, it's about tokens. So what is your token throughput, and what token throughput do you command?
>> I would actually argue that it's very interesting that we had, you know, at least 10 years where in many engineering tasks people just didn't feel compute bound,
>> right, and, like, the entire industry feels that now. They felt resource bound.
>> And now that you have this big capability jump, you're like, oh, actually, it's not my ability to access the compute anymore. Like, I'm the binding constraint.
>> Yeah, it's a skill issue.
>> Which is very empowering, cuz, yeah, you could be getting better. That's why I think it's very addictive: because there are unlocks when you get better.
>> Where do you think it goes? Like if you
just think about like okay you know
Andre is iterating and everybody else is
for 16 hours a day getting better at
using coding agents like what does it
look like in a year of like you've
reached mastery?
>> Yeah, what does mastery look like, right? At the end of the year, or in two, three years, five years, ten years, etc.
>> Well I think everyone is basically
interested in like going up the stack.
So I would say yeah it's not about a
single session with your agent. Um
multiple agents how do they collaborate
and teams and so on. So everyone's
trying to figure out what that looks
like. And then I would say claw is also kind of an interesting direction, because, when I say a claw, I mean this layer that kind of takes persistence to a whole new level. It's something that keeps looping; it's not something that you are interactively in the middle of. It kind of has its own little sandbox; it kind of does stuff on your behalf even if you're not looking.
And it also has maybe more sophisticated memory systems, etc., that are not yet implemented in agents. So OpenClaw has a lot more sophisticated memory, I would say, than what you would get by default, which is just a memory compaction when your context runs out. Right.
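[The default "memory compaction" contrasted here, summarize-and-truncate when the history outgrows the context window, can be sketched roughly as below. The function names and the half-split heuristic are illustrative assumptions; in a real harness the summarizer would be an LLM call, not a cheap function.]

```python
def compact(messages, max_tokens, count_tokens, summarize):
    """Naive memory compaction: once the history exceeds the token
    budget, replace the older half of the messages with a single
    summary message and keep the recent turns verbatim.

    count_tokens and summarize are injected callables; summarize
    stands in for an LLM summarization call."""
    if sum(count_tokens(m) for m in messages) <= max_tokens:
        return list(messages)  # still under budget, nothing to do
    cut = len(messages) // 2
    summary = summarize(messages[:cut])
    head = {"role": "system",
            "content": f"Summary of earlier conversation: {summary}"}
    return [head] + list(messages[cut:])
```

The point of the contrast in the conversation is that this throws information away wholesale, whereas a more deliberate memory system decides what is worth keeping.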
>> You think that's the piece that resonated for more users, versus perhaps broader tool access,
>> for OpenClaw?
>> Yeah. I think there's at least, I think,
>> there's a lot of really good ideas in here. Yeah, good job, Peter.
>> I mean, Peter has done a really amazing job. I saw him recently and talked to him about it, and he's very humble about it, but I think he innovated simultaneously in, like, five different ways and put it all together. So, for example, the SOUL.md document: he actually really crafted a personality that is kind of compelling and interesting, and I feel like a lot of the current agents don't get this right. I actually think Claude has a pretty good personality. It feels like a teammate,
>> and it's excited with you, etc. I would say, for example, Codex is a lot more dry, which is kind of interesting, because in ChatGPT it's a lot more upbeat and highly sycophantic. But Codex, the coding agent, is very dry. It doesn't seem to care about what you're creating. It's kind of like, oh, I implemented it. It's like, okay, but do you understand what we're building?
>> It's true.
>> You know, it doesn't. The other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where, when Claude gives me praise, I do feel like I slightly deserve it,
>> because sometimes I give it not very well-formed thoughts, and I give it an idea that I don't think is fully baked, and it doesn't actually react very strongly. It's like, oh yeah, we can implement that. But when it's a really good idea, by my own account, it does seem to reward it a bit more. And so I kind of feel like I'm trying to earn its praise, which is really weird.
>> And so I do think the personality matters a lot, and I think a lot of the other tools maybe don't appreciate it as much. And in this aspect, also, Peter really cares about this, and so that was correct. And then the memory system, and then, you know, he's just having fun with this, and then the single WhatsApp portal to all of the automation.
>> Yeah. Is there something that you have
done personally with your claws beyond
software engineering that you think is
fun or interesting?
>> Yeah. So in January I went through a period of claw psychosis. So I built, I have a claw, basically, that takes care of my home, and I call him Dobby, the elf claw. And basically I used the agents to find all of the smart home subsystems of my home on the local area network, which I was kind of surprised worked out of the box.
Like I just told it that I think I have
Sonos at home. Like can you try to find
it? And it goes, and it did an IP scan of all the computers on the local area network, and it found the Sonos system, and it turned out there's no password protection or anything like that. It just logged in, and it's like, oh yeah, you have these Sonos systems installed, let me try to reverse engineer how it's working. It does some web searches and it finds, okay, these are the API
endpoints and then it's like do you want
to try it? And I'm like whoa like you
just did that. And I'm like, "Yeah, can
you try to play something in the study?"
And uh it does and music comes out and
I'm like, "I can't believe I just
>> That's crazy. That's like three
prompts." Yeah.
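[The discovery step described here, an IP scan of the LAN looking for Sonos, can be sketched roughly as below. Sonos speakers expose a local, unauthenticated HTTP interface on TCP port 1400; the subnet, timeout, and worker count are illustrative assumptions.]

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def find_sonos(subnet="192.168.1", port=1400, timeout=0.3):
    """Scan a /24 subnet for hosts accepting TCP connections on the
    port Sonos speakers use for their local HTTP API."""
    def probe(host):
        try:
            # A successful connect is enough evidence to report the host.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            return None  # closed, filtered, or timed out

    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with ThreadPoolExecutor(max_workers=64) as pool:
        return [h for h in pool.map(probe, hosts) if h]
```

From each hit, an agent could then fetch something like `http://<host>:1400/xml/device_description.xml` to confirm it really is a Sonos device before trying to drive it.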
>> I can't believe I just typed in, "Can you find my Sonos?" and suddenly it's playing music. And it did the same for the lights. And so basically
like, it kind of hacked in, figured out the whole thing, created APIs, created a dashboard so I could see a kind of command center of all of my lights in the home. And then it was switching lights on and off. And, you know, I can ask it, like, Dobby, sleepy time, and when it's sleepy time, that just means all the lights go off, etc., and so on. So it controls all of
my lights, my HVAC, my shades, the pool and spa, and also my security system. So I have a camera pointed outside of the house, and anytime someone rolls in, I have a Qwen model that looks at the videos. So, first of all, there's change detection, right?
>> And then, based on the change detection, it goes to Qwen, and then it actually tells me, it sends me a text on my
WhatsApp. It shows an image from the
outside and it says, "Hey, a FedEx truck just pulled up, you might want to check it, you got mail," or something like that. And Dobby just texts me. This is really incredible. So Dobby is in charge of the house. I text with it through WhatsApp. And it's been really fun to have these macro actions that maintain my house. I haven't
really pushed it way beyond that, and I think people are doing a lot crazier things with it. But for me, even just as a home automation setup, I used to use six completely different apps, and I don't have to use those apps anymore. Dobby controls everything in natural language. It's amazing. And so I think I haven't even pushed the paradigm fully,
but already that is so helpful and so
inspiring I would say.
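[The change-detection gate described above, only waking the vision model when the camera frame actually changes, can be sketched with plain frame differencing. The thresholds are made-up values, and the Qwen call and WhatsApp send are out of scope; this is only the cheap first stage.]

```python
def changed_fraction(prev, curr, pixel_thresh=25):
    """Fraction of pixels whose grayscale value moved by more than
    pixel_thresh between two frames (frames are equal-length flat
    lists of 0-255 ints)."""
    moved = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_thresh)
    return moved / len(curr)

def motion_events(frames, frame_thresh=0.05):
    """Indices of frames that differ enough from their predecessor to
    be worth escalating to the vision model; still frames are dropped
    by this gate and never cost a model call."""
    return [i for i in range(1, len(frames))
            if changed_fraction(frames[i - 1], frames[i]) > frame_thresh]
```

The design point is the two-tier pipeline: a near-free pixel diff runs on every frame, and the expensive model only sees the handful of frames that pass it.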
>> Do you think that's indicative of like
what people want from a user experience
perspective with software, right?
Because, I think, you know, it's pretty ignored that it takes humans effort to learn new software, new UIs.
>> Yeah, I think to some extent that's right. It's like working
backwards from how people think an AI should be, because what people have in their mind of what an AI is, is not actually what an LLM is, in a raw sense. An LLM is a token generator, you know, more tokens come out. But what they think of is this persona, this identity, that they can tell stuff and it remembers it, you know, and it's just kind of an entity behind WhatsApp. It's a lot more understandable. Mhm.
>> So I think to some extent it's matching the expectations that humans already have for how an AI should behave, but under the hood, a lot of technical details go into that, and LLMs are too raw of a primitive to actually type check as "AI," I think, for most people, if that makes sense.
>> Yeah. I think that's how we understand what the AI is, and the description of it as Dobby, or some personality, obviously resonates with people. I also think that the unification that you did across your six different software systems for your home automation speaks to a different question, of, like,
>> do people really want all the software
that we have today?
>> Yeah.
>> Right. Because I would argue, well, you have the hardware,
>> but you've now thrown away the software,
>> or the UX layer of it. Do you think that's what people want?
>> Yeah. I think there's this sense that these apps that are in the app store for using these smart home devices, etc., shouldn't even exist, in a certain sense. Shouldn't it just be APIs, and shouldn't agents just be using them directly? And I can do all kinds of home automation stuff that any individual app will not be able to do, right? And an LLM can actually drive the tools and call all the right tools and do pretty complicated things.
>> And so, in a certain sense, it does point to this: maybe there's an overproduction of lots of custom bespoke apps that shouldn't exist, because agents kind of crumble them up, and everything should be a lot more just exposed API endpoints, and agents are the glue, the intelligence, that actually tool-calls all the parts. Another example is my
treadmill. There's an app for my treadmill, and I wanted to keep track of how often I do my cardio, but I don't want to log into a web UI and go through a flow, etc. All of this should just be, make APIs available. And this is kind of, you know, going towards the agentic web, or agent-first tools, and all this kind of stuff. So I think the industry just has to reconfigure in so many ways, because the customer is not the human anymore. It's agents, who are acting on behalf of humans, and this refactoring will probably
be substantial, in a certain sense. One way that people sometimes push back on this is, do we expect people to vibe code some of these tools? Do we expect normal people to do this kind of stuff that I described?
>> But I think, to some extent, this is just, you know, technology as it exists today, and right now there is some vibe coding, and I'm actually watching it and working with the system. But I kind of feel like this kind of stuff that I just talked about should be free in a year or two or three. There's no vibe coding involved. This is trivial. This is table stakes. This is, like, any AI, even the open source models, etc., can do this.
>> You should be able to translate from a less technical human's intent very easily to this.
>> Extremely easily, yeah. Today it's vibe coding, it's involved, and not many people are going to do it. But
>> and you still have to make some design
decisions, right? We were talking about
like you take frames for example.
>> Yeah.
>> Yeah. But I kind of feel like the barrier will just come down, and it's just ephemeral software on your behalf, and some kind of claw is handling all the details for you, but you're not involved. The claw has a machine, and it will figure it out, and it's just presenting you UIs, and you're just saying stuff, you know. Mhm.
>> Why haven't you, I guess, pushed the boundaries of what you can do personally with claws? Is it, you know, you're focusing on more important projects, auto research, etc., or you're climbing the hill to mastery, or something else, right?
>> Yeah. I just feel like I'm so distracted by everything. So I spent, like, a week on the claw stuff, and I have more to do, almost. But I will say that
>> like Jensen tools were all just busier
unfortunately.
>> Yeah. I didn't really take advantage of a lot of email and calendar and all this other stuff, and I didn't give it access, because I'm still a little bit suspicious, and it's still very new and rough around the edges. So I didn't want to give it full access to my digital life yet. And part of it is just
the security, the privacy, and just being very cautious in that realm. And
so some of it is held back by that, I would say. Yeah, maybe that's the dominant feature, but some of it is also just that I feel so distracted, because I feel like I had a week of claw and then other stuff is happening. And
>> What was the, I mean, you've talked about being able to train, or at least optimize, a model as a task you want to see agents do, for a long time. What was the motivation behind auto research?
>> Auto research. Yeah. So I think I had a tweet earlier where I said something along the lines of: to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing. You need to take yourself out of it. You have to arrange things such that they're completely autonomous. How can you maximize your token throughput and not be in the loop? This is the goal. And so I kind of mentioned that the name of the game now is to increase your leverage: I put in just very few tokens, just once in a while, and a huge amount of stuff happens on my behalf. And
so, auto research, like, I tweeted that, and I think people liked it and whatnot, but they maybe haven't worked through the implications of that. And for me, auto research is an example of an implication of that, where it's like, I don't want to be the researcher in the loop, looking at results, etc. I'm holding the system back. So the question is, how do I refactor all the abstractions so that I'm not? I have to arrange it once and hit go. The name of the game is: how can you get more agents running for longer periods of time, without your involvement, doing stuff on your behalf? And auto research is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do, and go.
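[That "objective, metric, boundaries, go" recipe can be sketched as a loop like the one below. The knobs and search strategy are hypothetical stand-ins: in the setup described here, each evaluate() would be a real training run, and the proposer would be an agent rather than seeded random search.]

```python
import random

def auto_research(evaluate, config, space, steps=200, seed=0):
    """Objective-driven tuning with no human in the loop: propose a
    single-knob change inside the allowed boundaries (space), run
    the objective (evaluate), and keep the change only if the metric
    improves. Returns the best config found and its score."""
    rng = random.Random(seed)
    best, best_score = dict(config), evaluate(config)
    for _ in range(steps):
        trial = dict(best)
        knob = rng.choice(sorted(space))      # pick one knob to mutate
        trial[knob] = rng.choice(space[knob])  # stay within its boundaries
        score = evaluate(trial)
        if score > best_score:                 # higher metric is better
            best, best_score = trial, score
    return best, best_score
```

The "boundaries" show up as the `space` argument: the loop can only move within what it is explicitly allowed to touch, which is what makes it safe to leave running overnight.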
>> You were surprised at its effectiveness.
>> Yeah, I didn't expect it to work. Because, so, I have the nanochat project, and fundamentally, I think a lot of
obsession for like training GBT2 models
and so on but for me uh training GBT
models and so on is just a little
harness a little playground for training
LLMs and fundamentally what I'm more
interested in is like this idea of
recursive self-improvement and to what
extent you can actually have LLMs
improving LLMs because I think all the
Frontier Labs this is like the thing
>> uh for obvious reasons and they're all
trying to recursively self-improve
roughly speaking and so for me this is
kind of like um a little play pen of
that um and I guess I like tuned Namat
already quite a bit by hand, in a good old-fashioned way that I'm used to. Like, I'm a researcher, I've done this for, you know, two decades. I have some amount of, what is the opposite of
>> uh, yeah,
>> earned confidence.
>> Okay, I have, like, two decades of, oh, I've trained this model thousands of times. So I've done a bunch of experiments, I've done hyperparameter tuning, I've done all the things I'm very used to and have done for two decades, and I've gotten to a certain point, and I thought it was fairly well tuned. And then I let auto research go overnight, and it came back with tunings that I didn't see.
>> And yeah, I did forget the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact, so once you tune one thing, the other things have to potentially change too, you know. I shouldn't be a bottleneck. I shouldn't be running these hyperparameter search optimizations. I shouldn't be looking at the results. There are objective criteria in this case. So you just have to arrange it so that it can just go forever. So that's a single sort of
version of auto research: a single loop trying to improve. And I was surprised that it found these things; you know, the repo is already fairly well tuned, and it still
found something. And that's just a single loop. These frontier labs, they have GPU clusters of tens of thousands. And so it's very easy to imagine how you would basically get a lot of this automation on smaller models, and fundamentally everything around frontier-level intelligence is about extrapolation and scaling laws, and so you basically do a ton of the exploration on the smaller models and then you try to extrapolate out.
>> So you're saying our research efforts
are going to get more efficient like
we're going to have better direction for
when we scale as well if we can do this
experimentation better.
>> Yeah, I would say that the most interesting project, and probably what the frontier labs are working on, is, you know, you experiment on the smaller models. You try to make it as autonomous as possible. Remove researchers from the
loop. They have way too much, what is the opposite? Way too much confidence. They don't know. They shouldn't be touching any of this, really, and so you have to rewrite the whole thing. Because right now, I mean, certainly they can contribute ideas, but they shouldn't actually be enacting those ideas. There's a queue of
ideas, and there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos and funnels ideas in, or researchers can contribute ideas, but
it's a single queue, and there's workers that pull items and try them out, and whatever works just gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes. So yeah, just removing humans from all the processes, automating as much as possible, and getting high tokens-per-second throughput. And it does require rethinking all the abstractions, and everything has to be reshuffled. So yeah, I think it's very exciting.
>> If we take one more recursive step here,
when is the model going to write a better program.md than you?
>> Yeah. So program.md is
>> we're not in the loop.
>> Yeah, exactly.
>> Yeah. So, program.md is my crappy attempt at describing how the auto researcher should work: oh, do this, then do that and that, and then try these kinds of ideas, and here's maybe some ideas, like, look at the architecture, look at the optimizer, etc. I just came up with this in markdown, right?
>> And so, yeah, exactly. You want some kind of an auto research loop, maybe. You can imagine that different program.mds would give you different progress. So basically, every research organization is described by a program.md. Yeah,
>> a research organization is a set of markdown files that describe all the roles and how the whole thing connects.
Um and you can imagine having a better
research organization. So maybe they do
fewer stand-ups in the morning because
they're useless. And this is all just
code, right? And so one organization can have fewer stand-ups, one can have more, one organization can be very risk-taking, one can be less. And so you can definitely imagine that you have multiple research orgs. And they all have code, and once you have code, then you can imagine tuning the code. So 100%, there's the meta layer of it.
>> Did you see my text about my contest idea? My contest idea was to let people write different program.mds, right? And so, for the same hardware, where do you get the most improvement?
>> Oh, I see.
>> And then you can take all that data and give it to the model and say, write a better program.md.
>> Yes. Yes.
>> Yeah. Exactly.
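[The contest could be scored with something this simple, where run() stands in for a fixed-hardware auto-research run launched from one candidate program.md; the names here are hypothetical.]

```python
def contest(programs, run):
    """Score each candidate program.md under the same budget
    (run(program) -> measured improvement), and return the ranked
    leaderboard plus the winner. The (program, score) pairs are
    exactly the data you would feed back to a model when asking it
    to write a better program.md."""
    # Sort by score descending; ties break on the program name.
    scored = sorted(((run(p), p) for p in programs), reverse=True)
    leaderboard = [(p, s) for s, p in scored]
    return leaderboard, leaderboard[0][0]
```

Holding hardware and budget fixed is what makes the scores comparable, so the only free variable is the program.md itself, which is the knob the meta-optimization then turns.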
>> We're going to get something better.
Like there's no way we don't.
>> You can 100% look at where the improvements came from, and, like, can I change the program.md such that more of these kinds of things would be done, or fewer of the things that didn't work?
>> Meta optimization. Yeah,
>> You can 100% imagine doing that. So I think this is a great idea, but, you know, I think you sort of go one step at a time, where you have one process, and then a second process, and then the next process, and these are all layers of an onion. Like, the LLM part is now taken for granted. The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions. And it's just a little too much, you know. But there, I mean, this is why it gets to the psychosis: this is, like, infinite, and everything is a skill issue. And that's why, yeah, coming back to it, this is why it's so insane.
>> Okay. Well, if we're just trying to diagnose the current moment, and what is a relevant skill right now, what do you think is the implication, that this is the loop we should be trying to achieve in different areas, and that it works? Right, like, you know, remove
>> create the metric, or create the ability, for agents to continue working on it without you.
>> Yeah.
>> Do we still have performance engineering, like,
>> Yeah. I mean, so there's a few caveats that I would put on top of the LLM ecosystem. Number one,
>> this is extremely well suited to anything that has objective metrics that are easy to evaluate. So, for example, writing kernels for more efficient CUDA code, for various parts of a model, etc., is the perfect fit.
>> Because you have inefficient code, and then you want efficient code that has the exact same behavior but is much faster. Perfect fit.
>> So a lot of things are a perfect fit for auto research, but many things will not be. If you can't evaluate it, then you can't auto research it, right? So that's caveat number one. And then caveat number two, I would say, is, you know, we're kind of talking about next steps, and we kind of see what the next steps are, but fundamentally the whole thing is still kind of bursting at the seams a little bit, and there are cracks, and it doesn't fully work. And if you try to go too far ahead, the whole thing is actually net not useful, if that makes sense.
>> Because these models, you know, they've improved a lot, but they're still rough around the edges, is maybe the way I would describe it. I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer their entire life, and a 10-year-old. And it's so weird, because humans, I feel, are a lot more coupled.
>> You wouldn't encounter that combination.
>> This jaggedness is really strange, and humans have a lot less of that kind of jaggedness. Although they definitely have some, the agents have a lot
lot more jaggedness where uh sometimes
like you know I ask for functionality
and it like comes back with something
that's just like totally wrong and then
we get into loops that are totally wrong
and then I'm just I get so frustrated
with the agents all the time still
because you feel the power of it but you
also there's still like it does
nonsensical things once in a while for
me still as well
>> I get very annoyed when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem.
>> Yeah, I think what's underneath some of the bigger issues, if I could hypothesize, is that fundamentally these models are trained via reinforcement learning. So they're actually struggling with the exact same thing we just talked about, which is that the labs can improve the models in anything that is verifiable, anything that has rewards. Did you write the program correctly, and do the unit tests check out? Yes or no? But where they're struggling is, for example, the nuance of what I had in mind or what I intended, and when to ask clarifying questions. Anything that feels softer is worse. And so you're either on rails and part of the superintelligence circuits, or you're not on rails, you're outside of the verifiable domains, and suddenly everything just meanders.
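The yes-or-no verifiability described here can be sketched in code. This is a hypothetical illustration, not any lab's actual training setup: a candidate program either passes fixed unit tests (reward 1.0) or it doesn't (reward 0.0), which is exactly the kind of clean signal RL can optimize. All names here (`verifiable_reward`, the toy `add` task) are invented for the example.

```python
import os
import subprocess
import sys
import tempfile

def verifiable_reward(program_source: str, test_source: str) -> float:
    """Binary reward: run the candidate program plus its unit tests in a
    subprocess and return 1.0 if every assertion passes, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.py")
        with open(path, "w") as f:
            f.write(program_source + "\n" + test_source + "\n")
        # A failed assertion (or any crash) yields a nonzero return code.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0

# Two candidate "model outputs" for the same toy task:
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

print(verifiable_reward(good, tests))  # 1.0: tests pass, reward granted
print(verifiable_reward(bad, tests))   # 0.0: an assertion fails
```

There is no such crisp pass/fail check for "was this joke funny" or "should the model have asked a clarifying question," which is the asymmetry being described.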
Maybe another way to put it is: if today you go to a state-of-the-art model like ChatGPT and you ask it to tell you a joke, do you know what joke you're going to get? There's the joke.
>> I can't tell you the standard form of it, but I do feel like ChatGPT has like three jokes.
>> Yeah. So the joke that apparently all the LLMs laugh at the most is: why do scientists not trust atoms?
>> Okay.
>> Because they make everything up.
>> Okay.
>> They make everything up.
So this is
>> How did that emerge?
>> This is the joke you would get three or four years ago, and it's the joke you still get today.
>> Okay.
>> Even though the models have improved tremendously.
>> Yeah.
>> And if you give them an agentic task, they will just go for hours and move mountains for you.
>> And then you ask for a joke, and it has a stupid, crappy joke from five years ago. And it's because it's outside of the RL.
>> It's outside of the reinforcement learning. It's outside of what's being improved. And it's part of the jaggedness: shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them? It's just not being optimized, and it's stuck.
>> Do you think that implies we are not seeing generalization in the sense of broader intelligence, of joke smartness being attached to code smartness?
>> Yeah, I think there's some decoupling, where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs depending on what data went in, and some things are not.
>> But there's a premise from some research groups that if you are smarter at code generation, or in these verifiable fields, you should be better at everything. And the joke situation suggests that's not happening.
>> I don't think that's happening. Yeah, I don't think that's happening. I think maybe we're seeing a little bit of it, but not a satisfying amount.
>> Yeah, that jaggedness exists in humans too.
>> You can be very, very good at math and still tell a really bad joke.
>> Yeah, that's true. But it still means we're not getting what the story promises, which is a lot of the intelligence and capabilities in all the domains of society for free as we get better and better models. That's not exactly what's fundamentally going on. There are blind spots, and some things are not being optimized for. And it's all clustered up in these opaque neural net models, right? So you're either on the rails of what it was trained for and going at the speed of light, or you're not. And so it's jaggedness. That's why I think, even though the progression of what should happen is obvious, you can't let it fully go there yet, because it doesn't fully work. Or it's a skill issue and we just haven't figured out how to use it. It's hard to tell.
>> Can I ask kind of a blasphemous question? If this jaggedness is persisting, and it's all rolled up in a monolithic interface, a single model, does that make sense? Or should it be unbundled into things that can be optimized and improved against different domains of intelligence?
>> Like unbundling the models into multiple experts in different areas, etc.
>> More directly, yeah. Instead of just one thing that we have no exposure into, which can be confusing: why is it so good at this but not at this other thing?
>> Yeah, I think currently my impression is that the labs are trying to have a single monoculture of a model that is arbitrarily intelligent in all these different domains, and they just stuff it into the parameters. I do think we should expect more speciation in the intelligences. The animal kingdom is extremely diverse in the brains that exist; there are lots of different niches of nature, and some animals have an overdeveloped visual cortex or other parts. I think we should see more speciation. You don't need this oracle that knows everything; you speciate it and then you put it on a specific task. And we should be seeing some of that, because you should be able to have much smaller models that still have the cognitive core, that are still competent, but then they specialize, and then they can become more efficient in terms of latency or throughput on the specific tasks you really care about, like if you're a mathematician working in Lean. I saw, for example, a few releases that really target that as a domain. So there are probably going to be a few examples like that where the unbundling makes sense.
>> One question I have is whether the capacity constraint on available compute infrastructure drives more of this, because efficiency actually matters more, right? Financing aside, though financing is involved in all of this: if you have access to full compute for anything you do, fine, even one single model. But if you actually feel pressure, where you're like, I can't serve a model of massive size for every use case, do you think that leads to any speciation? Does that question make sense?
>> The question makes sense, and I guess what I'm struggling with is that I don't think we've seen too much speciation just yet, right?
>> No.
>> We're seeing a monoculture of models.
>> Yeah.
>> And there's clearly pressure to make a good code model and merge it back into the mainline again.
>> Yeah. Yeah.
>> Even though there already is pressure on the models.
>> I guess I feel like there's a lot of very short-term supply crunch, and maybe that causes more speciation now.
>> Yeah. I think fundamentally the labs are serving a model and they don't really know what the end user is going to be asking about. So maybe that's some part of it, because they kind of have to multitask over all the possible things that could be asked. But if you're coming to a business and maybe partnering on some specific problems you care about, then maybe you would see it there. Or there would be some very high-value applications that are more niche. But I think right now they're going after the totality of what's available. I also don't think the science of manipulating the brains is fully developed yet.
>> What do you mean, manipulating?
>> So, fine-tuning without losing capabilities, as an example. We don't have the primitives for working with the intelligences in ways other than just context windows. Context windows just work, they're very cheap to manipulate, and this is how we're getting some of the customization. But it's a bit more of a developing science: how you more deeply adjust the models, how you have continual learning maybe, how you fine-tune in a certain area, how you get better in a certain area, how you actually touch the weights and not just the context windows. It's a lot trickier, I would say, to touch the weights than the context windows, because you're fundamentally changing the full model and potentially its intelligence. So maybe speciation is just not a fully developed science yet, if that makes sense. And it also has to be cheap enough for that speciation to be worthwhile in these given contexts.
>> Can I ask a question about an extension to auto research that you described, in terms of opening it up? You said, okay, we have this thing, and we need more collaboration surface around it, essentially, for people to contribute to research overall. Can you talk about that?
>> Yeah. So we talked about how auto research has a single thread of, I'm going to try stuff in a loop, but fundamentally the parallelization of this is the interesting component. I was trying to play around with a few ideas, but I don't have anything that clicks, something I'm super happy with just yet. It's something I'm working on on the side when I'm not working on my claw. I think one issue is: if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto researchers talking through a common system or something like that. What I was more interested in is how you can have an untrusted pool of workers out there on the internet.
>> So, for example, in auto research you're just trying to find the piece of code that trains a model to a very low validation loss. If anyone gives you a candidate commit, it's very easy to verify that the commit is good. Someone on the internet could claim that this piece of code will optimize much better and give you much better performance, and you can just check; probably a lot of work goes into that checking, but fundamentally they could lie, etc. So you're basically dealing with a similar kind of problem, and actually my designs that incorporate an untrusted pool of workers look a little bit like a blockchain. Instead of blocks you have commits, and these commits can build on each other, and they contain changes to the code as you're improving it. And the proof of work is basically doing tons of experimentation to find the commits that work.
>> And that's hard. And the reward right now is just being on the leaderboard; there's no monetary reward whatsoever. I don't want to push the analogy too far, but it fundamentally has this property where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good. Someone had to try 10,000 ideas, but you just have to check that the thing they produced actually works,
>> because the other 9,999 of them didn't work, you know.
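The search/verify asymmetry just described could be sketched roughly as follows. Everything here is hypothetical (the `Candidate` shape, `run_training` as a stand-in for an actual sandboxed training run): untrusted workers do the expensive search and submit commits with a claimed validation loss, and the trusted side only pays to re-check each submission once.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    worker_id: str       # untrusted submitter from the swarm
    commit: str          # proposed change to the training code
    claimed_loss: float  # validation loss the worker claims to reach

def run_training(commit: str) -> float:
    """Stand-in for the expensive, trusted step: apply the commit in a
    sandbox, train, and measure validation loss. Simulated here."""
    return 0.90 if commit == "better-optimizer" else 1.50

def verify(c: Candidate, best_loss: float, tol: float = 1e-6) -> bool:
    """Cheap relative to the search: one trusted re-run per submission.
    Accept only honest claims that beat the current leader."""
    measured = run_training(c.commit)
    return abs(measured - c.claimed_loss) <= tol and measured < best_loss

# The swarm may have tried thousands of ideas; we only check what's submitted.
best_loss = 1.00
leaderboard = []
for c in [Candidate("worker-a", "tweak-lr", 0.80),           # dishonest claim
          Candidate("worker-b", "better-optimizer", 0.90)]:  # honest improvement
    if verify(c, best_loss):
        best_loss = c.claimed_loss
        leaderboard.append(c.worker_id)

print(leaderboard, best_loss)  # only the verified commit makes the board
```

In a real system the re-run would itself be the expensive part (a full training run), and the submitted code would need strict sandboxing, which is the security concern raised next.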
>> And so, basically, long story short: you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification. The whole thing is asynchronous, and it has to be safe from a security perspective, because if anyone can send you arbitrary code and you're going to run it, that's very sketchy and dodgy. But fundamentally it should be totally possible. You're familiar with projects like SETI@home and Folding@home; all of these problems have a similar kind of setup. In Folding@home you're folding a protein, and it's very hard to find a configuration that is low energy. But if someone finds a configuration that they evaluate to be low energy, that's perfect: you can just use it, you can easily verify it. A lot of things have this property of being very expensive to come up with but very cheap to verify, and in all those cases, things like Folding@home or SETI@home or auto research at home will be good fits. And so, long story short, a swarm of agents on the internet could collaborate to improve LLMs, and could potentially even run circles around the frontier labs. Who knows? Maybe that's even possible. The frontier labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute. If you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions, and people contribute cycles to a thing they care about. And the last thought: lots of companies could maybe have their own things they care about, and if you have compute capacity, you could contribute to different auto research tracks. Maybe you care about a certain type of cancer, say; you don't just donate money to an institution, you could actually purchase compute and join the auto research swarm for that project. So if everything is rebundled into auto researchers, then compute becomes the thing you're contributing to the pool.
>> Yeah, that's very inspiring. And
it's also interesting. I don't know how far this goes, but it is interesting that at least some audience of people, here in Silicon Valley or lining up at retail stores in China, have discovered that having access to personal compute is interesting again.
>> Yeah.
>> Right. So maybe they're really motivated to do that for their claws, and then they can contribute to auto research.
>> It's almost like: dollars are the thing everyone cares about now, but are flops the thing everyone will actually care about in the future? Is there going to be a flipping of what you care about? Right now, for example, it's really hard to get compute even if you have money.
>> Yeah.
>> So it almost seems like the flop is dominant, in a certain sense. So maybe it's like that: how many flops do you control, instead of what wealth do you control? I don't actually think that's true, but it's kind of interesting to think about.
>> The last thing you released was a little bit of jobs data analysis, is that right? It might have touched a nerve, even though you were just visualizing some public data. What were you curious about?
>> Yeah, I guess I was curious because everyone is really thinking about the impacts of AI on the job market and what it's going to look like. So I was just interested to take a look: what does the job market look like? Where are the different roles? How many people are in different professions? And I was really just interested to look through the individual cases and try to think for myself about, with these AIs and how they're likely to evolve, are these going to be tools that people are using, or are they going to be displacing these professions? What are the current professions, and how are they going to change? Are they going to grow, or adjust to a large extent, or what could be new professions? So it was really just a way to fuel my own chain of thought about the industry, I suppose.
>> Mhm.
>> And the jobs data is basically just from the Bureau of Labor Statistics. They actually have a percent outlook for each profession, about how much it's expected to grow over the next, I think, almost a decade.
>> Yeah, I think it's a decade, but it was made in 2024.
>> We need a lot of healthcare workers.
>> Yeah. So they've already made those projections, and I'm actually not 100% sure what methodology they put into the projections.
I guess I was interested to color things by this view: what's primarily being developed now is a kind of digital AI, almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information, and they currently don't really have a physical embodiment or presence. The physical stuff is probably going to go slightly slower, because you're manipulating atoms. Flipping bits, and the ability to copy-paste digital information, makes everything a million times faster than accelerating matter. So energetically, I just think we're going to see a huge amount of activity in digital space, a huge amount of rewriting, a boiling soup of activity. We're going to see something in the digital space that goes at the speed of light compared to what's going to happen in the physical world, to some extent, if that's the extrapolation. And so I think there's currently an overhang, where there can be a lot of unhobbling of digital information processing that used to be done by computers and people, and now AI is a third kind of manipulator of digital information. There's going to be a lot of refactoring in those disciplines. But the physical world is going to be behind that by some amount of time. So what's really fascinating to me, and why I was highlighting the professions that fundamentally manipulate digital information, work you could do from your home, etc., is that those are the things that will change. It doesn't mean there are going to be fewer of those jobs or more of those jobs, because that has to do with demand elasticity and many other factors, but things will change in those professions because of these new tools, and because of this upgrade to the nervous system of the human superorganism, if you want to think about it that way.
>> Given the look you had at
the data, do you have any observations or guidance for people facing the job market, or thinking about what to study now, or what skills to develop? I'm very thankful that I have to meet people for my job right now.
>> More physical. Yeah.
>> Could you do your work from home, though?
>> I could. I think there are relationship parts of it that are hard, but most of it I could.
>> Yeah. I think it's really hard to tell, because again, the job market is extremely diverse and the answers will probably vary. But to a large extent, these tools are extremely new and extremely powerful, so just trying to keep up with them is the first thing. Because I think a lot of people dismiss it,
>> or they're afraid of it,
>> or they're afraid of it, etc., which is totally understandable, of course. I think it's fundamentally an empowering tool at the moment. These jobs are bundles of tasks, and some of those tasks can go a lot faster, so people should think of it primarily as the tool that it is right now. The long-term future of that is uncertain. It's really hard to forecast, to be honest, and I'm not professionally doing that; I think it's the job of economists to do properly.
>> You are an engineer, though. One thing I thought was interesting is that the demand for engineering jobs is continuing to increase.
>> Yeah.
>> I can't tell if that's a temporary phenomenon. I'm not sure how I feel about it yet. Do you know?
>> Yeah. It's almost like software was scarce, right? The reason we don't have more demand for software is just its scarcity; it's too expensive.
>> Too expensive. Yeah.
>> So if the barrier comes down, then you actually have the Jevons paradox: the demand for software actually goes up. It's cheaper and more powerful. The classical example of this is always the ATMs and the bank tellers, because there was a lot of fear that ATMs and computers would displace tellers. But what happened is they made the cost of operating a bank branch much cheaper, so there were more bank branches, and so there were more tellers. That's the canonical example people cite. But basically it's the Jevons paradox: something becomes cheaper, so there's a lot of unlocked demand for it. So I do have a cautiously optimistic view of this in software engineering. It does seem to me that the demand for software will be extremely large, and it's just become a lot cheaper. It's very hard to forecast, but right now, at least locally, there's going to be more demand for software, because software is amazing. It's digital information processing; you're not forced to use arbitrary tools that were given to you and are imperfect in various ways, you're not forced to subscribe to what exists. Code is now ephemeral, and it can change and be modified. So I think there's going to be a lot of activity in the digital space to rewire everything, in a certain sense, and it's going to create a lot of demand for this kind of stuff. I
think long-term, obviously, even with auto research: OpenAI or Anthropic or these other labs are employing what, a thousand-something researchers, right?
>> These researchers are basically, like, glorified auto researchers.
>> They're actively automating themselves away. This is the thing they're all trying to do.
>> Yeah.
>> Some of those researchers also feel the psychosis, right? Because they can see it's working.
>> Right. And so they're like, oh, it's over for me too.
>> I did spend a bunch of time going around OpenAI, and I was like, you guys realize if we're successful, we're all out of a job? We're just building automation for Sam or something like that. Or the board, I'm not sure. We're building this automation for the board or the CEO, and we're all out of our jobs, and maybe contributing on the side. So yeah, it's kind of unnerving from that perspective.
>> Is it okay if I ask you Noam's question? You could be doing that, right? Auto researching, with a lot of compute scale and a bunch of colleagues, at one of the frontier labs. Why not?
>> Well, I was there for a while, right? And I did re-enter, so to some extent I agree, and there are many ways to slice this question. It's a bit of a loaded question. I will say that I feel very good about the impact people can have outside of the frontier labs, obviously in the industry, but also in more ecosystem-level roles. Your role, for example, is more ecosystem-level. My role currently is also more ecosystem-level, and I feel very good about the impact people can have in those kinds of roles. Conversely, there are definite problems in my mind with aligning yourself way too much with the frontier labs, too. Fundamentally, you have a huge amount of financial incentive tied up with these frontier labs, and by your own admission the AIs are going to really change humanity and society in very dramatic ways, and here you are basically building the technology and benefiting from it, being very allied to it through financial means. This was a conundrum at the heart of how OpenAI started in the beginning; this was the conundrum we were trying to solve.
>> And the conundrum is still not fully resolved. So that's number one. Number two, you're not a completely free agent, and you can't actually be part of that conversation in a fully autonomous, free way if you're inside one of the frontier labs. There are certain things you can't say, and conversely there are certain things the organization wants you to say. They're not going to twist your arm, but you feel the pressure of what you should be saying, because otherwise it's really awkward conversations, strange side eyes, like, what are you doing? So you can't really be an independent agent, and I feel a bit more aligned with humanity, in a certain sense, outside of a frontier lab, because I'm not subject to those pressures, and I can say whatever I want. In the frontier labs, you can have impact there as well, of course. There are many researchers, and maybe you're one of them, maybe your ideas are really good, etc. Maybe there's a lot of decision-making to do, and you want to be in the room for those conversations when they come up. I do think that currently the stakes are overall fairly low, so everything is kind of nice. But ultimately, at the end of the day, when the stakes are really high, if you're an employee at an organization, I don't actually know how much sway you're going to have over what the organization does. You're not really in charge; you're in a room and you're contributing ideas, but you're not really in charge of the entity you're a part of. So those are some sources of misalignment, I think, to some extent. I
will say that in one way I do agree a lot with that sentiment. The labs, for better or worse, are opaque; a lot of work is there, and they're at the edge of capability and what's possible, working on what's coming down the line. If you're outside of a frontier lab, your judgment will fundamentally start to drift, because you're not part of what's coming down the line. So I feel like my judgment will inevitably start to drift as well, and I won't actually have an understanding of how these systems work under the hood, because it's an opaque system, and I won't have a good understanding of how it's going to develop. So in that sense I agree, and it's something I'm nervous about. I think it's worth basically being in touch with what's actually happening, actually being in the frontier lab, and if some of the frontier labs would have me come for some amount of time and do really good work for them, and then maybe coming back out...
>> So you're looking for a job. This is super exciting.
>> Yeah.
>> Then I think that's maybe a good setup, because maybe that's one way to actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled
>> Yeah.
>> by those entities. So honestly, in my mind, Noam can probably do extremely good work at OpenAI, but I also think his most impactful work could very well be outside of OpenAI.
>> Now that's a call to be an independent researcher with auto research.
>> Yeah, there are many things to do on the outside, and I think ultimately the ideal solution maybe is going back and forth. Fundamentally, you can have really amazing impact in both places. I don't know, it's a bit of a loaded question, but I joined a frontier lab, and now I'm outside, and maybe in the future I'll want to join again. That's kind of how I look at it.
One question related to what visibility the world or the AI ecosystem has into the frontier: how close is open source to the frontier, and how sustainable is that? I think it is quite surprising, the entire sequence of events, from having a handful of Chinese models and global models, and I think people are going to continue releasing, here in the near term, models that are closer to the frontier than much of the industry anticipated from a capability perspective. I don't know if you're surprised by that, but you're a long-term contributor to open source. What's your prediction here?
>> Yeah. So roughly speaking, the closed models are ahead, but people are monitoring the number of months that the open source models are behind.
>> And it started with there's nothing, and then it went to 18 months, and now it's
>> convergence, right. So maybe they're behind by, what's the latest, maybe six or eight months right now.
>> Yeah, I'm a huge fan
of open source, obviously. For example, in operating systems you have closed ones, Windows and macOS, which are large software projects, kind of like what LLMs are going to become, and there's Linux. Linux is actually an extremely successful project; it runs on the vast majority of computers. Last time I checked, was it 60% or something that run Linux? And that's because there is a need in the industry for a common open platform that everyone feels safe using. The industry has always felt a demand for that kind of project to exist, and I think the same is true now; businesses want, there's demand for, this kind of thing to exist. The big difference is that everything is capital-intensive. There's a lot of capex that goes into this.
>> So I think that's where things fall apart a little bit, and it becomes a bit harder to compete, in a certain sense.
I do think the current models are very good. The other thing I think is really interesting is that for the vast majority of consumer use cases and things like that, even the current open source models are actually quite good, I would say. And if you go forward a few more years, it does seem to me that a huge amount of simple use cases are going to be well covered, and actually even run locally. But there's always going to be some demand for frontier intelligence, and that can be an extremely large piece of the pie. It could be that the need for frontier intelligence is going to be, you know, Nobel Prize kind of work, or let's move Linux from C to Rust. There are going to be bigger projects, scoped in that kind of way, and maybe that's what a lot of the frontier closed intelligences are going to be working on, while open source eats through a lot of the more basic use cases. At some point, what is frontier today, in terms of what I'm using right now from the closed labs, might be open source, probably later this year, and that's going to be doing a lot of work. So I kind of expect this dynamic to continue: we'll have frontier labs with closed AIs that are kind of like these oracles, and then we'll have open source behind by some number of months. I actually think that's a pretty good setup overall, because I'm a little hesitant about the alternative. I think there's some systemic risk attached to just having intelligences that are closed, and that's it.
Mhm.
>> And I think that, you know, centralization has a very poor track record in my view, uh, in the past, and has, uh,
>> You mean like in political or economic systems in general?
>> Yes.
>> Exactly. I think there's like a lot of
>> Like Eastern European. Yeah.
>> A lot of pretty bad precedent. So I want
there to be a thing that is maybe not at the edge of capability, because that's new and unexplored, etc. But I want there to be a thing that's behind, and that is kind of like a common working space for intelligences, that the entire industry has access to. Yeah, that seems to me like a pretty decent power balance for the industry.
>> Yeah, I also think there are many problems to solve, right? Like if you keep advancing intelligence at the frontier, we can do new things, and there are a lot of very big problems for humanity, right? And it seems that that will continue to be a very expensive game. And so I want to root for labs that are doing that, because there are problems we cannot solve without continuing to advance the models in a very expensive way. Yeah. And yet, as you point out,
>> if what we have today as Frontier is
open, that's a lot of capability. Yeah.
Right. And so I think, you know, the power of that, or the democratization of that, seems
>> very useful and also healthy.
>> Yeah. I think basically by accident we're actually in an okay spot
>> and optimal. Yeah.
>> By accident, we happen to be in a good spot in a certain sense. Um,
>> Well, and to some degree, the longer this dynamic endures,
>> um, the healthier of a spot the ecosystem might be in, right? Because you have more and more area under the curve.
>> And I will say that even on the closed side, I almost feel like it's been even further centralizing recently, because I think a lot of the front-runners are not necessarily the top tier, and so, yeah, in that sense I think it's not super ideal. I would love there to be more frontier labs, because I'm by default very suspicious of, um, I want there to be more people in the room. I think in machine learning, ensembles always outperform any individual model, and so I want there to be ensembles of people thinking about all the hardest problems, and I want there to be ensembles of people in the room, all well informed, making all those decisions, you know. So I don't want it to be closed doors with two people or three people. I feel like that's not a good future. I almost wish there were more labs, is the long story short, and I do think that open source has a place to play. I hope it sticks around, and it's currently slightly behind, and that's actually kind of a good thing.
>> Okay. You worked on the precursor to generalized robotics autonomy, um, in cars, right? A lot has happened in the last couple of months with robotics companies as well: acceleration of really impressive generalization across environments and tasks, increasingly long-horizon tasks, lots of money going into the space. Is it going to happen? Has anything in your view changed recently?
>> So my view is kind of informed by what I saw in self-driving, and I do feel like self-driving is the first robotics application. What I saw at the time, like 10 years ago, was a large number of startups, and I kind of feel like most of them basically didn't make it long-term. And what I saw is that a lot of capital expenditure had to go in, and a lot of time. So I think robotics, because it's so difficult and so messy and requires a huge amount of capital investment and a lot of conviction, it's a big problem, and I think atoms are really hard. So I kind of feel like it will lag behind what's going to happen in digital space. And in digital space there's going to be a huge amount of unhobbling, basically things that weren't super efficient becoming a lot more efficient, by like a factor of 100,
>> because bits are so much easier. And so currently, in terms of what's going to change and where the activity is, I kind of feel like digital space is going to change a huge amount, and then the physical space will lag behind. And what I find very interesting is the interface in between them as well, because I think, if we do have more agents acting on
behalf of humans, and more agents talking to each other and doing tasks and participating in the kind of economy of agents, etc., you're going to run out of things that you can do purely in digital space. At some point you have to go to the universe and ask it questions. You have to run an experiment and see what the universe tells you back, to learn something. And so we currently have a huge amount of digital work because there's an overhang: we just didn't have enough thinking cycles among humans to think about all the information that is already digital, already uploaded. And so we're going to start running out of stuff that is already uploaded. At some point you're going to have read all the papers and processed them and have some ideas about what to try. But I don't actually know how far you can get with intelligence that's fully closed off, with just the information that's available to it, you know. And so I
think what's going to happen is: first, there's going to be a huge amount of unhobbling, and I think there's a huge amount of work there. Then it's going to move to the interfaces between physical and digital, and that's sensors for seeing the world and actuators for doing something to the world. So I think a lot of interesting companies will come from that interface: can we feed the superintelligence data, in a certain sense, and can we take data out and manipulate the physical world per its bidding, if you want to anthropomorphize the whole thing, right? And then the physical world, I almost feel like the total addressable market, in terms of the amount of work and so on, is massive, possibly even much larger than what can happen in digital space. So I actually think it's a much bigger opportunity as well, but I do feel like it's a huge amount of work, and in my mind the atoms are just a million times harder. So it
will lag behind, but it's also, I think, a bit of a bigger market. So yeah, I think the opportunities kind of follow that trajectory. Right now the digital is my main interest, then interfaces would be after that, and then maybe some of the physical things; their time will come, and they'll be huge when they do come.
>> Well,
it's an interesting framework for it too, because certain things, not the things I'm working on right now, but certain things are much easier even in the world of atoms, right? Like if you just think about read and write to the physical world: for read, like sensors, cameras, there's a lot of existing hardware, and you can imagine
>> enriching agent capabilities or capturing a lot of new data if you're just clever about it, and you don't necessarily have to invest a lot to get something valuable.
>> Yeah. So examples of this that I saw: you know, a friend of mine, Liam, is the CEO of Periodic Labs. I visited them last week, so it's just top of mind. They're trying to do auto research for materials science,
>> um, and so in that case the sensors to the intelligence are actually pretty expensive lab equipment, and the same is true in biology. I think a lot of people are very interested in engineering biology, and the sensors will be more than just video cameras, if that makes sense. And then the other thing I saw, for example, is companies where you basically pay people for training data. Yeah. As an example, to feed
>> programmatically.
>> Yeah, to feed the Borg. Um, and so these are all examples of sensors in a certain sense. So they take many diverse shapes and forms, if that makes sense.
>> Yeah. So I'm looking forward to the
point where I can ask for a task in the
physical world and I can put a price on
it and just tell the agent like you know
you figure out how to do it. Go get the
data.
>> I'm actually kind of surprised we don't have more, like, information markets. Mhm.
>> Like if, for example, Polymarket or other betting markets, or even stocks, etc., have so much autonomous activity, and a rising amount of activity,
>> um, why, for example, if the Iran thing were happening now, how come there isn't a process where taking a photo or video from somewhere in Tehran costs like 10 bucks? Someone should be able to pay for that, you know. And that's an example of feeding the intelligence. There's not going to be a human looking at it; it's going to be agents who are trying to guess the betting games and stock markets and so on. Mhm.
>> So I kind of feel like the agentic web is still fairly new, and there are no mechanisms for this yet, but this is an example of what I think might happen. There's a good book that maybe is inspiring, called Daemon, you've potentially read it. In Daemon, the intelligence ends up puppeteering humanity a little bit, in a certain sense, you know, and so humans are kind of like its actuators, but humans are also like its sensors.
>> Um, and so I think collectively society will kind of reshape in a certain way to serve that. That will kind of end up happening collectively across the industry, where, yeah, there's just a lot more automation that has certain needs, and humans will be serving those needs of that machine, not necessarily each other's.
>> Well, we were, um, on this very specific point of missing pieces of training data: we needed something like auto research, right? Like, we need the training cycle, or the SFT piece, to be far more mechanized
>> For what part?
>> In order to take the human out of the loop, to ask for a task that is just like, improve my model quality
>> with new data,
>> right?
>> Uh, yes.
>> Does that make sense to you? Like, if you can't have the model do the training runs by itself,
>> then your ability to do this as a closed-loop task, yes, by pricing data, yeah,
>> is more challenged.
>> Yes. Yes, 100%. Yeah. But now the thing is, for LLM training, it actually really fits the paradigm.
>> Um, so you'd actually
>> yeah, a clean metric
>> yeah, LLM training actually fits the paradigm really well, really easily: all the optimization of all the code so it runs faster, and then you also have metrics that you can optimize against. I do think that if you had an autonomous loop over those metrics, there's going to be a lot of Goodharting going on, where the system will overfit to those metrics. But then you can use the system to devise more metrics and get really good coverage. So it's kind of hard to tell, but in a certain sense it's a pretty good fit.
>> I want to talk about a tiny side project you have before we end. Um, tell me about microGPT.
>> Oh yeah, okay. So microGPT. I have this running obsession, of maybe a decade or two, of simplifying and boiling down LLMs to their bare essence. And I've had a number of projects along these lines, like nanoGPT and makemore and micrograd, etc. So I feel like microGPT
is now the state of the art of me trying to boil it down to just the essence. Because the thing is, training neural nets, and LLMs specifically, is a huge amount of code, but all of that code is actually complexity from efficiency.
>> It's just because you need it to go fast. If you don't need it to go fast and you just care about the algorithm, then that algorithm is actually 200 lines of Python, very simple to read, and this includes comments and everything.
Because you just have your dataset, which is text, and you need your neural network architecture, which is like 50 lines. You need to do your forward pass, and then you have to do your backward pass to calculate the gradients. A little autograd engine to calculate the gradients is like 100 lines, and then you need an optimizer, Adam for example, which is a very state-of-the-art optimizer, and is again like 10 lines really. And so putting everything together in a training loop is, yeah, 200 lines. And it was
interesting to me: normally, maybe a year ago or more, if I had come up with microGPT, I would be tempted to explain it to people, like have a video stepping through it or something like that. And I actually tried to make that video a little bit, and I tried to make a little guide to it and so on, but I kind of realized that this is not really adding too much, because it's already so simple, it's 200 lines, and anyone could ask their agent to explain it in various ways. I'm not explaining to people anymore; I'm explaining it to agents. If you can explain it to agents, then agents can be the router, and they can actually target it to the human in their language, with infinite patience, and at their capability, and so on.
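[The components he lists really are that small. As a rough illustration, a bare-bones Adam update fits in about ten lines of plain Python. This sketch is the editor's, not code from microGPT; the function name and flat-list parameter layout are illustrative assumptions.]

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over flat lists of scalar parameters and gradients.

    m and v carry the running first- and second-moment estimates between
    calls; t is the 1-based step count used for bias correction.
    """
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g       # exponential moving average of the gradient
        v[i] = b2 * v[i] + (1 - b2) * g * g   # exponential moving average of its square
        mhat = m[i] / (1 - b1 ** t)           # bias-correct the moments (they start at 0)
        vhat = v[i] / (1 - b2 ** t)
        params[i] -= lr * mhat / (math.sqrt(vhat) + eps)
    return params, m, v
```

[Everything a production framework adds on top of an update like this, fused kernels, mixed precision, sharding, is the "complexity from efficiency" he's describing.]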
>> Right. If I don't understand um this
particular function, I can ask the agent
to explain it to me like three different
ways and I'm not going to get that from
you.
>> Exactly.
>> And so I kind of feel like, you know, what is education? It used to be guides, it used to be lectures, it used to be this thing, but I feel like now I'm more explaining things to agents, and maybe I'm coming up with skills, where, so basically, a skill is just a way to instruct the agent how to teach the thing. So maybe I could have a skill for microGPT: the progression I imagine the agent should take you through if you're interested in understanding the codebase. It's just hints to the model, like, oh, first start off with this and then with that, and so I could just script the curriculum a little bit as a skill. So I feel like there's going to be less of explaining things directly to people, and it's going to be more of just: does the agent get it? And if the agent gets it, they'll do the explanation. And we're not fully there yet, because I still think I can probably explain things a little bit better than the agents, but I feel like the models are improving so rapidly that it's a losing battle to some extent.
And so I think education is going to be kind of reshuffled by this quite substantially, where it's the end of teaching each other things, almost a little bit. Like, if I have a library of code, for example, it used to be that you have documentation for other people who use my library, but you shouldn't do that anymore. Instead of HTML documents for humans, you should have markdown documents for agents, because if agents get it, then they can just explain all the different parts of it. So it's this redirection through agents, you know, and I think we're going to see a lot more of that playing out.
>> Well, we'll see if the great teachers, you know, develop intuition for how to explain things to agents differently.
>> Ultimately, so for example, microGPT: I tried to get an agent to write microGPT. I told it, try to boil down neural network training to the simplest thing, and it can't do it. MicroGPT is like the end of my obsession. It's the 200 lines. I thought about this for a long time. I was obsessed about this for a long time. This is the solution. Trust me, it can't get simpler. And this is my value add. Everything else, the agent gets.
>> It just can't come up with it. But it totally gets it and understands why it's done a certain way, etc. So my contribution is kind of these few bits, but everything else, in terms of the education that goes on after that, is not my domain anymore. So maybe, yeah, education kind of changes in those ways, where you have to infuse the few bits that you feel strongly about: the curriculum, or the better way of explaining it, or something like that. The things that agents can't do are your job now. The things that agents can do, they can probably do better than you, or will very soon. And so you should be strategic about what you're actually spending time on.
>> Well, we appreciate the few bits. Thank you, Andrej.
>> Okay.
>> Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Andrej Karpathy discusses the revolutionary shift in software engineering caused by AI agents, a phenomenon he calls "AI psychosis" due to the massive jump in individual capability. He shares how his workflow shifted from writing code to delegating to agents like Claude and Cursor, even automating his home via a WhatsApp-controlled agent named "Dobby." The conversation delves into "Auto Research," where AI models recursively improve themselves, the concept of "jagged intelligence" where models excel at coding but fail at simple humor, and the future of education, where experts will document for agents rather than humans.