The Dangerous Illusion of AI Coding?

The Dangerous Illusion of AI Coding? - Jeremy Howard

Transcript

0:00

It literally disgusts me. Like, I

0:02

literally think it's inhumane. My

0:05

mission remains the same as it has been

0:07

for like 20 years, which is to stop

0:10

people working like this.

0:12

>> Jeremy Howard, a deep learning pioneer,

0:15

a Kaggle grandmaster. He is a huge

0:18

advocate for actually understanding what

0:21

we are building through an interactive

0:24

loop, a notebook, a REPL, the act of

0:28

poking at a problem until it pushes

0:30

back. He argues this is where the real

0:34

insight happens. And the funny thing is

0:36

they're both right. LLMs

0:40

cosplay understanding things. They

0:42

pretend to understand things. No one's

0:44

actually creating

0:46

50 times more high-quality software than

0:48

they were before. Um, so we've actually

0:52

just done a study of this and there's a

0:55

tiny uptick tiny uptick in what people

0:58

are actually shipping. The thing about

1:01

AI based coding is that it's like a slot

1:05

machine in that you have an illusion

1:08

of control. You know, you get to

1:09

craft your prompt and your

1:12

list of MCPs and your skills and

1:15

whatever, but in the end you

1:17

pull the lever, right? Here's a piece of

1:19

code that no one understands.

1:20

>> Yeah.

1:22

>> And am I going to bet

1:26

my company's product on it?

1:29

And the answer is I don't know, because

1:31

like, I don't know

1:34

what to do now because no one's like

1:37

been in this situation. They're

1:39

really bad at software engineering.

1:42

Uh and then I think that's possibly

1:46

always going to be true. The idea that a

1:48

human can do a lot more with a computer

1:52

when the human can like manipulate the

1:56

objects inside that computer in real

1:59

time and study them and move them around

2:01

and combine them together. Whoever you

2:03

listen to, you know, whether it be

2:05

Feynman or whatever, like, you

2:07

always hear from the great scientists,

2:09

how

2:11

they build deeper intuition by

2:14

building mental models which they get

2:16

over time by interacting with the things

2:19

that they're learning about.

2:22

A machine could kind of build an

2:26

effective hierarchy of abstractions

2:28

about what the world is and how it works

2:30

entirely through

2:33

looking at the statistical correlations

2:34

of a huge corpus of text using a deep

2:37

learning model. That was my premise.

2:40

This video is brought to you by Nvidia

2:42

GTC. It's running March the 16th until

2:45

the 19th in San Jose and streaming free

2:48

online. The key topics this year are

2:50

agentic AI and reasoning, high

2:52

performance inference and training, open

2:54

models, and physical AI and robotics.

2:58

I'm so excited about the DGX Spark. I've

3:00

been on the waiting list for over a year

3:02

now. It's a personal supercomputer

3:05

that is about the size of a Mac Mini.

3:07

It's the perfect adornment to a MacBook

3:09

Pro, by the way. And you can fine-tune a

3:11

70 billion parameter language model with

3:13

one of these things. And I'm giving one

3:15

away for free. All you have to do is

3:18

3:19

of the sessions using the link in the

3:21

description. As for the sessions, I'm

3:23

interested in attending Aman Sanger's

3:25

talk. So, he's the CTO of Cursor and his

3:28

session is code with context. Build an

3:30

agentic IDE that truly understands your

3:33

codebase. Now, obviously, Jensen's

3:35

keynote is on March the 16th. He said

3:37

he's going to unveil a new chip that

3:39

will surprise the world. Their next

3:42

generation architecture, Vera Rubin, is

3:44

already in full production. And there's

3:46

speculation we might even get an early

3:48

glimpse of their new Feynman

3:50

architecture. So don't forget folks, the

3:52

link is in the description. If you're

3:54

attending virtually, it's completely

3:55

free. Don't miss it. Jeremy Howard,

3:58

welcome to MLST.

3:59

>> I mean, welcome to my home. Thanks for

4:02

coming.

4:03

>> Yeah. Well, where are we now?

4:04

>> We are in beautiful Moreton Bay in

4:06

southeast Queensland. We are by the sea

4:10

um in my backyard.

4:12

>> The weather didn't disappoint.

4:13

>> It certainly didn't. It doesn't often,

4:15

but if you were here yesterday, it would

4:16

have been very different.

4:18

>> Well, I don't know where to start. So,

4:19

I've been I've been a huge fan probably

4:21

since about 2017 18. Of course, you had

4:23

the famous ULMFiT paper. And when I

4:26

was at Microsoft, I remember doing a

4:27

presentation about that because it was

4:29

actually I mean now we take it for

4:30

granted that we fine-tune language

4:33

models on a corpus of text and then we

4:37

kind of like continue to train them and

4:39

specialize them. But apparently this was

4:41

not received wisdom.

4:42

>> No, this was the first time it happened.

4:44

Yeah, kind of the first or second. Uh,

4:46

so Quoc Le and Andrew Dai had done

4:49

something a few years ago, but they had

4:50

missed the key point, which is the thing

4:52

you pre-train on has to be a general

4:54

purpose corpus.

4:55

>> So, no one quite realized this key

4:57

thing. And maybe I had a bit of fortune

4:59

here that my background was in

5:01

philosophy and cognitive science. And

5:04

so, I'd spent some decades thinking

5:05

about this.

5:06

>> The technical architecture of

5:09

ULMFiT. Just sketch that out.

5:10

>> I'm a huge fan of regularization. And

5:13

I'm a huge fan of taking a model that's

5:14

incredibly flexible and then making it

5:18

more constrained not by decreasing the

5:20

size of the architecture, but by

5:21

adding regularization. So even that at

5:24

the time was

5:26

extremely controversial. Uh but that was

5:28

by no means a unique insight of ours.

5:32

So what Stephen Merity had done is he'd

5:34

taken the extreme flexibility of an

5:37

LSTM, a kind of very classic stateful

5:42

recurrent neural net towards which

5:44

things are kind of gradually heading

5:45

back towards nowadays, and he added five

5:48

different types of regularization. He

5:49

added every type of regularization you

5:51

can imagine. And then um that was my

5:55

starting point was to say okay I now

5:56

have a massively flexible deep learning

5:58

model that can be as powerful as I want

5:59

it to be and it can also be as

6:01

constrained as I need it to be. And then

6:03

I needed a really big corpus of text.

6:06

Funnily enough, this is also Stephen. He

6:08

had been at Common Crawl and uh I think

6:11

he helped or made the uh Wikipedia data

6:14

set. And then I realized actually the

6:16

Wikipedia data set made lots of

6:18

assumptions. It had all these like unk

6:21

for unknown words because it all assumed

6:24

classic NLP approaches. So I redid the

6:26

whole thing, created a new Wikipedia

6:28

data set and that was my general corpus

6:30

and then I used an AWD-LSTM, trained it. So

6:34

it was actually overnight. So for eight

6:37

hours on a gaming GPU, you know, um

6:41

because I was at the University of San

6:42

Francisco, we didn't have heaps of

6:45

resources. Um probably like a 2080 Ti or

6:49

something, I suspect. Um and then the

6:51

then next morning when I woke up, I then

6:54

it's the same three-stage architecture

6:56

that we do today, you know,

6:57

pre-training, mid-training,

6:59

post-training. So then I figured, okay,

7:01

now that I've trained something to

7:03

predict the next word of Wikipedia, it

7:06

must know a lot about the world. I

7:10

then figured if I then fine-tune it on a

7:12

corpus specific, so what we could now

7:14

call supervised fine-tuning um data set,

7:17

which in this case was a data set of

7:19

movie reviews, it would become

7:21

especially good at predicting the next

7:23

word of those. So it would learn a lot

7:25

about movies. Did that for, like, an

7:28

hour and then like a few minutes of

7:32

fine-tuning

7:33

the downstream classifier which was a

7:35

classic academic uh data set kind of

7:38

considered the hardest one which was to

7:40

take, like, 5,000-word movie reviews and to

7:42

say like was this a positive or negative

7:44

sentiment which today is considered easy

7:46

but at that time you know the only

7:49

things that did it quite well were

7:51

highly specialized models that people

7:53

wrote their whole PhDs on I beat all of

7:57

their results, you know, 5 minutes later

7:59

when fine-tuning that model. It

8:02

was amazing.
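
The first two stages described here (train a next-word predictor on a general corpus, then keep training it on text from the target domain) can be sketched with a toy model. This is only an illustration under loud assumptions: a count-based bigram model stands in for the AWD-LSTM, the two tiny corpora are made up, and the class name `BigramLM` is hypothetical. The third stage, fine-tuning a classifier on top, is not shown.

```python
from collections import Counter, defaultdict


class BigramLM:
    """Toy next-word model: counts which word follows which."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text):
        # "Training" just adds bigram counts. Calling it again on a new
        # corpus continues from the existing counts, like fine-tuning.
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, word):
        # Most frequent follower seen so far, or None for an unseen word.
        followers = self.counts.get(word)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Made-up stand-ins for Wikipedia and for the movie-review corpus.
general_corpus = "the cat sat on the mat . the cat ate the fish ."
movie_reviews = "the movie was great . the movie was long . the movie had heart ."

lm = BigramLM()
lm.train(general_corpus)   # stage 1: general-purpose corpus
before = lm.predict("the")
lm.train(movie_reviews)    # stage 2: continue on the target domain
after = lm.predict("the")
print(before, "->", after)
```

The point of the sketch is only that the second call to `train` starts from what the first one learned rather than from scratch, which is the transfer-learning idea in miniature.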

8:04

>> And the other interesting thing is this

8:06

kind of um methodology around how you do

8:09

the finetuning.

8:10

>> Yeah. So how we do the fine-tuning

8:12

was something we had developed at

8:14

fast.ai. So this is kind of year one of

8:18

fast.ai. So this is still in our very early

8:19

days. And one of the extremely

8:22

controversial things we did was we felt

8:24

that we should focus on fine-tuning

8:27

existing models because we thought

8:28

fine-tuning was important. Uh some other

8:31

folks were doing work contemporaneously

8:33

with that. So Jason Yosinski did some

8:36

really great uh research I think it's

8:38

during his PhD on how to fine-tune

8:43

models and how good they can be and some

8:44

other folks in the computer

8:47

vision world.

8:49

we were, you know, amongst the first.

8:51

There's a bunch of us kind of really

8:53

investing in fine-tuning. And so, yeah,

8:56

we felt that using a single learning

8:59

rate to fine-tune the whole thing all at

9:01

once made no sense because the different

9:04

layers have different behaviors. And

9:06

this is one of the things Jason

9:06

Yosinski's research also showed. We

9:09

developed this idea of like, well, it's

9:11

also way faster if you just train the

9:13

last layer, right? because it only has

9:14

to backprop the last layer and then once

9:17

that's pretty good, backprop the

9:19

last two and then the last three and

9:21

then we use something called

9:22

discriminative learning rates. So

9:23

different layers we would give different

9:25

learning rates to and then another

9:27

critical insight that no one realized

9:30

for years even though we had told

9:32

everybody was that you actually have to

9:35

fine-tune every batch norm. So all the

9:37

normalization layers you do actually

9:39

have to fine-tune

9:41

because that's moving the whole thing up

9:43

and down or changing its scale. So yeah,

9:46

when you do that, you can often just

9:47

fine-tune the last layer or two. And we

9:50

found that actually with ULMFiT,

9:52

although we did end up unfreezing all

9:54

the layers, only the last two were

9:57

really needed to get a close to

9:58

state-of-the-art result. So it like took

10:00

like seconds.
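
The two ideas in this answer, gradual unfreezing and discriminative learning rates, can be sketched without any framework. The function names are hypothetical, and the per-layer divisor of 2.6 is an assumption taken from the ULMFiT paper rather than from this conversation. A real setup would pass these values to an optimizer's per-layer parameter groups.

```python
def discriminative_lrs(n_layers, top_lr, decay=2.6):
    """Learning rate per layer; index 0 is the earliest layer.

    Each layer gets the rate of the layer above it divided by `decay`.
    The default of 2.6 is an assumption borrowed from the ULMFiT paper.
    """
    return [top_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]


def gradual_unfreeze(n_layers):
    """Yield, stage by stage, the indices of the trainable layers.

    Stage 1 trains only the last layer, stage 2 the last two, and so on.
    """
    for n_trainable in range(1, n_layers + 1):
        yield list(range(n_layers - n_trainable, n_layers))


lrs = discriminative_lrs(n_layers=4, top_lr=0.01)
for stage, trainable in enumerate(gradual_unfreeze(4), start=1):
    # Only the unfrozen layers receive updates, each at its own rate.
    stage_lrs = {layer: lrs[layer] for layer in trainable}
    print(f"stage {stage}: {stage_lrs}")
```

Early stages touch only the last layer, so the backward pass is short. That matches the speed argument made above.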

10:01

>> Yeah. Because the discriminative

10:03

learning rate thing is interesting

10:04

because I think the received wisdom at

10:06

the time was when you fine-tune a model,

10:08

if the learning rate is too high, you

10:09

kind of blow out the representations. So

10:12

I guess the wisdom was if you don't

10:14

have a really low learning rate, you'll

10:16

just destroy the representations.

10:17

>> I mean, there was no received wisdom

10:19

because nobody talked about it. No one

10:20

cared, you know. It was just this

10:24

like, nearly no one cared. Transfer

10:26

learning was just not something anybody

10:28

thought about. And Rachel and I felt

10:31

like it matters more than anything, you

10:34

know, because

10:36

only one person has to train a really

10:38

big model once and then the rest of us

10:40

can all fine-tune it. Um, so we thought

10:44

we just should learn how to do that

10:45

really well. So we um

10:49

spent a lot of time just trying lots of

10:51

things, but in the end the intuition was

10:54

pretty straightforward and what

10:55

intuitively seemed like it ought to work

10:57

basically always did work. Which is

11:00

another big difference between how

11:01

people still today tend to do ML

11:05

research is they think it's all about

11:08

ablations and you can't make any

11:11

assumptions or guesses. And it's not at

11:14

all true. I find nearly everything that

11:16

I expect to work almost always works

11:18

first time because I spend a lot of time

11:21

building up those intuitions, that

11:24

kind of understanding of how gradients

11:27

behave.

11:27

>> I think there's a dichotomy though

11:29

between continual learning which is when

11:31

we want to keep training the thing but

11:33

maintain generality versus fine-tuning a

11:36

thing to do something specific. that

11:38

there's always been this idea that yes,

11:40

you can make a model specific, you can

11:42

bend it to your will, but you lose

11:44

generality and you kind of degrade the

11:46

representation. So tell me about that.

11:48

>> Yeah, there's some truth in that,

11:50

although not as much as you might think.

11:52

On the whole, the big problem is that

11:54

people don't actually look at their

11:57

activations and don't actually look at

11:58

their gradients. So something we do in

12:00

our software, in our fastai software, is

12:02

we have built into it this ability to

12:05

see in a glance

12:07

what your entire network looks like. And

12:09

once you've done it a few times it just

12:11

takes a couple of hours to learn you can

12:13

immediately see oh I see this is

12:16

overtrained or undertrained or at this

12:18

layer that something went wrong. It's

12:21

not a mystery you know. So basically

12:22

what happens is for example you end up

12:24

with dead neurons that go to a

12:26

point where they've got zero

12:29

gradient regardless of what you do with

12:30

them. That often happens if they

12:34

head off towards infinity. You can

12:37

always fix that. So yeah it's it's not

12:40

as bad as people think by any means.
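
As a rough sketch of what "looking at your activations" can mean, the snippet below summarises each layer by its mean activation and the fraction of units that never fire. It is not the fastai API. The function name `layer_stats` and the activation values are hypothetical, and a real tool would collect these numbers with hooks during training.

```python
def layer_stats(activations):
    """Summarise one layer's post-ReLU activations.

    `activations` is a list of rows, one per input example, each row
    holding one value per unit. A unit counts as dead if it outputs
    zero for every example.
    """
    n_units = len(activations[0])
    values = [v for row in activations for v in row]
    mean = sum(values) / len(values)
    dead = sum(
        1 for unit in range(n_units)
        if all(row[unit] == 0.0 for row in activations)
    )
    return {"mean": mean, "dead_fraction": dead / n_units}


# Hypothetical activations for two layers, three examples each.
layers = {
    "layer1": [[0.5, 1.2, 0.0, 0.3], [0.1, 0.0, 0.7, 0.9], [0.4, 0.6, 0.2, 0.0]],
    "layer2": [[0.0, 2.0, 0.0, 0.0], [0.0, 1.5, 0.0, 0.0], [0.0, 3.1, 0.0, 0.1]],
}
for name, acts in layers.items():
    print(name, layer_stats(acts))
```

In this made-up data, half the units in `layer2` are zero for every input. That is the kind of thing a per-layer view makes obvious at a glance.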

12:43

something that trains well for

12:45

continuous learning when done properly

12:48

can also be done well to train well for

12:52

a particular task if you're careful in a

12:55

sense you do want the neurons to die and

12:58

I'll explain what I mean by this,

13:00

like we want to bend the behavior of

13:02

models to introduce implicit constraints

13:06

because without constraints there is no

13:07

creativity there is no reasoning and and

13:10

so on and so forth. So in a sense you

13:11

actually want it to say, "Don't do that.

13:14

You want it to do something else."

13:16

>> I don't think of it that way. Like to

13:18

me, it's more like I find thinking about

13:21

humans extremely helpful when it comes

13:24

to thinking about AI. I find they

13:27

behave more similarly than differently.

13:29

Um, and my intuition about each tends to

13:31

work quite well. You know, with a human,

13:35

when you learn something new, it's not

13:38

about unlearning something else. And so

13:40

something I always found is when I got

13:44

models to try to learn to do two

13:46

somewhat similar tasks, they almost

13:48

always got better at both of them than

13:50

one that only learned one of them.

13:52

>> Um I was reminded a little bit of um you

13:54

know, the DINO paper from LeCun. So this

13:56

whole kind of regime of

13:58

self-supervised learning with, I mean,

14:00

that was a vision model,

14:02

but the you know the idea was okay so

14:04

we're doing pre-training and we want to

14:06

maintain as much diversity and fidelity

14:08

as possible so that when we do the

14:10

downstream task we can kind of we've got

14:12

more things that we can latch on.

14:14

>> Yeah. Yeah. And um

14:17

you know semi-supervised and

14:18

self-supervised learning was such an

14:20

unappreciated area, and yeah, Yann LeCun was

14:22

absolutely one of the guys who was also

14:26

working on it.

14:27

>> I actually did a post because I was so

14:30

annoyed at how few people cared about

14:31

semi-supervised learning. I did a whole

14:33

post about it years ago. Yann LeCun

14:35

looked at it for me as well and you know

14:37

suggested a few other pieces of work

14:38

that I had missed, but I was

14:41

kind of surprised at how well you know

14:43

how incredibly useful it is to basically

14:47

say like basically come up with a

14:49

pretext task right so in vision so we

14:51

did this in vision before ULMFiT, so it

14:53

was like in medical imaging, you know,

14:58

take a histology slide and

15:02

predict

15:04

you know, mask out a few squares and

15:06

predict what used to be there. So, some

15:08

of my students at USF I had doing stuff

15:10

with that. It was basically entirely

15:13

taking stuff that we and others had

15:15

already done in vision.
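
The masking pretext task mentioned here can be sketched on a tiny grid of numbers standing in for an image. The function name `mask_patch`, the 4x4 "slide", and the zero fill value are all hypothetical. The sketch only builds the (input, target) pair; the model that would learn to fill in the patch is not shown.

```python
import copy


def mask_patch(image, top, left, size, fill=0):
    """Return (masked_image, target_patch) for one square patch.

    The masked image is the model's input; the original patch is what
    the model would be trained to predict. No labels are needed.
    """
    target = [row[left:left + size] for row in image[top:top + size]]
    masked = copy.deepcopy(image)  # leave the original image untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            masked[r][c] = fill
    return masked, target


# A hypothetical 4x4 "slide" of pixel intensities.
slide = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
masked, target = mask_patch(slide, top=1, left=1, size=2)
print(masked)
print(target)
```

The same construction applies to text by treating a sentence as a one-dimensional row and masking words instead of squares.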

15:16

>> Yeah.

15:17

>> So, like this idea of masking out

15:19

squares, we didn't invent it. Masking

15:22

out words was the obvious thing, you

15:25

know, and this idea of um gradually

15:27

unfreezing layers we had done before in

15:29

computer vision. The whole idea of

15:31

starting with a pre-trained model that

15:33

was general purpose had been in computer

15:35

vision. There was a really classic paper

15:38

actually in computer vision, it might

15:42

have been around 2015 was entirely an

15:44

empirical paper saying look what happens

15:46

when we take a pre-trained ImageNet model

15:48

predicting what sculptor created this

15:50

sculpture or predicting what

15:52

architecture style this is and like in

15:54

every task it got the state-of-the-art

15:55

result. And it really surprised me.

15:58

People didn't look at that and think

15:59

like, I bet that ought to work in every

16:02

other area as well, whether it be genome

16:04

sequences or language or whatever. But

16:07

people have a bit of a lack of

16:09

imagination. I find they tend to assume

16:11

things only work in one particular

16:13

field. Um, that's really true.

16:16

>> Yeah. I mean, I guess there's two things

16:17

there. I mean, first of all, we were

16:18

kind of hinting at this notion of almost

16:20

Goodhart's law or the shortcut rule that

16:22

you get exactly what you optimize for at

16:24

the cost of everything else. But that

16:25

doesn't seem to be the case because we

16:27

can you know optimize for perplexity in

16:29

the case of language models and as you

16:30

say what seems to happen is we're

16:32

getting into the distributional

16:33

hypothesis here a little bit. So you

16:35

know, you know the word by the

16:36

company it keeps. So when we have an

16:38

incredible amount of associative data it

16:40

might be masked auto-prediction or any

16:42

of these things like that. The model

16:44

seems to build something that we might

16:46

call an understanding like

16:49

>> or I have always thought of it as a

16:51

hierarchy of abstractions. You know,

16:53

it needs, if it's going to predict,

16:56

you know, if the document is

17:00

here was the,

17:02

you know, opening that uh,

17:06

you know, that Bobby Fischer used, and

17:09

it has chess notation to predict the

17:11

next thing, it needs to know something

17:12

about chess notation or at least

17:14

openings. Um if it's like uh you know

17:17

and this was vetoed by the 1956 US

17:21

president, you need to know like you

17:24

don't even you don't just need to know

17:25

who the president was but the idea that

17:26

there are presidents and therefore that

17:28

the idea that there are leaders and

17:29

therefore the idea that there are groups

17:31

of people who have hierarchies and

17:32

therefore that there are people and

17:34

therefore that there are objects and

17:35

like you can't predict the next word of

17:37

a sentence

17:39

well without knowing all of these

17:43

things. So that knowing

17:46

my hypothesis for why I created ULMFiT

17:48

is to say that to

17:51

compress that as well as possible to get

17:53

that knowledge, it would have to create

17:56

these abstractions, these hierarchies of

17:58

abstractions somewhere deep inside its

18:00

model. Otherwise, how could it possibly

18:02

do a good job of predicting the next

18:04

word, you know? And because deep learning

18:07

models are universal learning machines,

18:11

you know, and we had a universal way to

18:13

train them, I figured

18:15

if we get the data right and if the

18:18

hardware is good enough, then in theory,

18:21

we ought to be able to build that next

18:24

word predicting machine, which ought to

18:27

implicitly build a hierarchical

18:30

structural understanding of the things

18:32

that are being described by the text

18:34

that it is learning to predict.

18:36

>> I think that they can know in quite a,

18:37

you know, they know in quite a

18:39

superficial way. So there's a myriad of

18:42

surface statistical relationships and

18:44

they generalize extraordinarily well.

18:47

It's it's miraculous.

18:48

>> It is.

18:49

>> But the thing is I want to contrast this

18:51

with other comments you've made about

18:52

creativity. So I think knowledge is

18:55

about constraints and I think creativity

18:57

is the evolution of knowledge,

18:59

respecting those constraints. Therefore,

19:00

AI is not creative. And you've said

19:02

the same thing. You've said AI isn't

19:04

creative. So like on the one hand, how

19:05

can you say that they know and not think

19:09

that they can be creative?

19:10

>> I mean, I don't think I've used that

19:12

exact expression. You know, I know

19:14

actually I remember

19:16

chatting with Peter Norvig on camera and

19:18

both of us said, well, actually, they

19:20

kind of are creative like we just got to

19:22

be a bit careful about our choices of

19:24

words, I guess. So, you know, Piotr

19:26

Woźniak, who's a guy I really, really

19:28

respect, who kind of rediscovered spaced

19:31

repetition learning, built the SuperMemo

19:32

system and is the modern-day guru of

19:36

memory. The entire reason he's based his

19:38

life around remembering things is

19:40

because he believes that creativity

19:43

comes from having a lot of stuff

19:46

remembered which is to say putting

19:48

together stuff you've remembered in

19:51

interesting ways is a great way to be

19:52

creative. LLMs are actually quite good

19:55

at that, but there's a kind of

19:57

creativity they're not at all good at,

20:00

which is, you know, moving outside the

20:03

distribution. So,

20:04

>> uh, which I think is where you're

20:06

heading with your question. Um, but I'm

20:09

just kind of I'm framing it this way to

20:11

say you have to be so nuanced about this

20:13

stuff because if you say like they're

20:15

not creative, it can give

20:17

you the wrong idea because they can do

20:20

very creative seeming things.

20:24

But if it's like well can they really

20:26

extrapolate outside the training

20:28

distribution

20:30

the answer is no they can't

20:33

>> but the training distribution is so big

20:35

and the number of ways to interpolate

20:37

between them is so vast

20:39

we don't really know yet what the

20:42

limitations of that is but I see it

20:45

every day you know because I my my work

20:48

is R&D I'm constantly on the edge of and

20:52

outside the training data. I'm doing

20:53

things that haven't been done before.

20:55

And there's this weird thing, I don't

20:57

know if you've ever seen it before. I

20:58

see it, but I see it multiple times

20:59

every day where the LLM goes from being

21:02

incredibly clever

21:05

to, like, worse than stupid, like not

21:09

understanding the most basic fundamental

21:11

premises about how the world works.

21:14

>> Yeah.

21:14

>> And it's like, oh, whoops. I fell

21:16

outside the training data distribution.

21:18

It's gone dumb. And then like there's no

21:20

point

21:21

having that discussion any further

21:23

because

21:24

>> yes,

21:24

>> you know, you've lost it at that point.

21:26

>> Yes. I mean, I love, you know,

21:28

Margaret Boden, she had this kind of

21:30

hierarchy of creativity. So there's like

21:32

combinatorial, exploratory and um and

21:34

transformative and the models can

21:36

certainly do combinatorial creativity

21:38

but for me it's all about constraints.

21:40

So, I mean, this is what Boden said,

21:42

and even Leonardo da Vinci he said that

21:44

creativity is all about constraints and

21:45

and you've spoken about you know we'll

21:47

talk about this dialogue engineering but

21:49

what happens is when we talk with

21:50

language models it's a specification

21:53

acquisition problem. So we go back and

21:55

forth and actually when we think the

21:57

process of intelligence is about

21:59

building this imaginary Lego block in

22:00

our mind and respecting various

22:02

constraints and when you respect those

22:04

constraints and you just continue to

22:05

evolve then those things are said to be

22:07

creative. So language models when you

22:10

add constraints to them so this could be

22:11

via supervision via critics via

22:13

verifiers, then they are creative, and

22:16

with AlphaEvolve we've seen many examples

22:18

of this but the illusion is on their own

22:21

sans constraints obviously they have

22:22

this behavioral shaping stuff that we're

22:24

talking about they don't have hard

22:26

constraints and that's why they can't go

22:27

outside their distribution. >> I mean, I

22:30

think they can't go outside their

22:31

distribution because it's just something

22:32

that that type of mathematical

22:36

model can't do, you know. It can do it,

22:38

but it won't do it well. You know, when

22:39

you look at the kind of 2D case of

22:42

fitting a curve to data, once you go

22:45

outside the area that the data covers,

22:48

the curves disappear off into space in

22:51

wild directions, you know, and that's

22:53

all we're doing. But we're doing it in

22:55

multiple dimensions. You know, I think

22:58

Boden might be pretty shocked at how

23:02

far compositional creativity can go when

23:04

you can compose

23:07

the entirety of the human knowledge

23:09

corpus.
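
The 2D curve-fitting picture from a moment ago can be made concrete with a toy example. This sketch fits an exact interpolating polynomial (Lagrange form) through five made-up points and then evaluates it outside the range the data covers. It is a stand-in chosen for simplicity, not a neural network, and the data and function name are assumptions for illustration.

```python
def lagrange(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Training data: a zigzag whose values never leave the range [0, 1].
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 0, 1, 0]

# Points 1 and 4 are training points, 2.5 lies between them,
# and 5 and 6 lie outside the region the data covers.
for x in [1, 2.5, 4, 5, 6]:
    print(x, round(lagrange(xs, ys, x), 3))
```

At the training points the curve reproduces the data exactly. One and two steps past the last point it evaluates to about -15 and -64, although every training value lies between 0 and 1.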

23:10

Um, and I think this is where people

23:12

often get confused because it's like, so

23:16

for example, I was talking to Chris

23:18

Lattner yesterday about

23:21

uh how

23:22

Claude, Anthropic, you know, had

23:25

got Claude to write a C compiler and

23:28

they were like, "Oh, this is a clean

23:30

room C compiler. You can tell it's clean

23:32

room because it was created in Rust, you

23:35

know, and um so Chris created the kind

23:38

23:40

you know, I guess it's probably the top

23:42

most widely used C/C++ compiler nowadays,

23:45

Clang, on top of LLVM, which is the

23:47

most widely used kind of foundation for

23:49

compilers. Um, they were like, "Oh,

23:51

well, Chris didn't use Rust. This is,

23:53

you know, and we didn't give it access

23:54

to any

23:56

compiler source code, so it's a clean

23:58

room implementation."

24:01

But that misunderstands

24:03

how LLMs work, right? which is all of

24:07

Chris's work was in the training data

24:09

many, many times. LLVM is used widely and

24:11

lots and lots of things are built on it

24:13

um including lots of C and C++ compilers

24:18

Converting it to Rust is

24:21

an interpolation

24:23

between

24:25

parts of the training data you know it's

24:27

a style transfer problem um so it's

24:31

definitely compositional creativity at

24:33

most if you can call it creative at all

24:34

and you actually see it when you look at

24:37

the repo that it created, it's

24:41

copied

24:43

parts of the LLVM code, which today

24:46

Chris says like, "Oh, I made a mistake.

24:49

I shouldn't have done it that way.

24:50

Nobody else does it that way." You know,

24:53

oh wow, look, they're the only other one

24:55

that did it that way. That doesn't

24:56

happen accidentally. That happens

24:58

because you're not actually being

24:59

creative. you're actually just finding

25:02

the kind of nonlinear average point in

25:04

your training data between like Rust

25:07

things and building compiler things.

25:09

>> All of that is true. I mean first of all

25:11

I think we shouldn't underestimate the

25:13

size of how big this combinatorial

25:16

creativity is. So all of that is true.

25:17

So the code is on the internet but also

25:19

they had a whole bunch of tests which

25:21

were scaffolded which meant that every

25:23

single time some code was was committed

25:25

they could run the tests, and

25:26

they basically had a critic and they

25:28

could then do this autonomous feedback

25:30

loop. So in a sense it's very similar

25:32

to the recent research by OpenAI and

25:34

Gemini, where you're

25:37

trying to solve a problem in math and

25:40

you already have an evaluation function.

25:42

The same on the ARC Prize, right? You

25:43

have an evaluation function and what

25:45

people discount is even knowledge of

25:48

what the evaluation function is is

25:50

partial knowledge of the problem. So you

25:52

can then brute force search. You can use

25:54

the statistical pattern matching, use

25:55

the verifier as a constraint and you can

25:57

actually

25:58

>> and they don't even need to do that,

25:59

right? Like, they literally already

26:01

know how to pass those tests because

26:02

there's lots of software that already

26:04

does it.

26:04

>> So, it just uses that and translates

26:08

them to Rust. Like, that's all it

26:10

did. Um which is impressive.
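
The loop discussed in this exchange, where proposals are generated and an existing test suite acts as the critic, can be sketched in a few lines. Everything here is hypothetical: the candidate functions stand in for successive model outputs, and the three-case test suite stands in for a compiler's tests.

```python
def passes_all(candidate, tests):
    """The critic: True only if the candidate passes every test case."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


def search(candidates, tests):
    """Try candidates in order; return the name of the first that passes."""
    for name, fn in candidates:
        if passes_all(fn, tests):
            return name
    return None


# Hypothetical test suite for an absolute-value function.
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]

# Hypothetical proposals, standing in for successive model outputs.
candidates = [
    ("identity", lambda x: x),
    ("negate", lambda x: -x),
    ("square", lambda x: x * x),
    ("conditional", lambda x: x if x >= 0 else -x),
]
print(search(candidates, tests))
```

The test suite never says how to write the function, yet it narrows the search considerably. That is the sense in which knowing the evaluation function is partial knowledge of the problem.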

26:14

>> Yeah.

26:14

>> Um and if you I'm much less familiar

26:16

with math than I am computer science,

26:18

but from talking to mathematicians,

26:22

they tell me that that's also what's

26:24

happening with, like, Erdős problems and

26:26

stuff. It's some of them are

26:30

newly solved.

26:31

>> Yeah.

26:32

>> Um

26:34

but they are not

26:37

sparks of insight. You know, they're

26:39

solving ones that you can solve by

26:41

meshing up together

26:43

very closely related things that humans

26:46

have already figured out.

26:47

>> So on the subject of Claude code, now I

26:49

know you've spoken extensively about

26:51

vibe coding. Um actually Rachel had some

26:53

interesting work out. I mean she she

26:55

quoted the METR study, which showed

26:57

that productivity actually went down

26:59

when people were vibe coding but I think

27:00

>> and they thought that they went up which

27:02

is the most interesting

27:03

>> and then also there was the Anthropic

27:04

study I mean you know maybe we should

27:05

rewind a little bit I mean Dario had

27:07

this essay out the other day uh I think

27:08

it was called "The Adolescence of

27:10

Technology" or something like that, and

27:11

and he was basically saying look you

27:12

know um we have all of these amazing

27:14

software engineers at Anthropic and they

27:17

are just so productive and he was

27:19

extrapolating to the average software

27:21

engineer so there's going to be mass

27:22

unemployment because soon we're going to

27:24

be able to automate all of this with AI.

27:26

>> I mean, it it doesn't make any sense. Um

27:30

Elon Musk said something a bit similar a

27:32

few days ago saying like, "Oh, LLMs will

27:35

just spit out the machine code directly.

27:37

We won't need libraries, programming

27:38

languages."

27:40

>> Yeah.

27:40

>> Um yeah, look, the thing is none of

27:43

these guys have

27:45

have been software engineers recently.

27:48

I'm not sure Dario's ever been a

27:49

software engineer at all. Software

27:50

engineering is an unusual discipline and

27:53

a lot of people mistake it for being the

27:56

same as

27:57

typing code into an IDE. Coding is

28:01

another one of these style transfer

28:03

problems. You you take a specification

28:06

of the problem to solve and you can use

28:09

your compositional creativity to find

28:11

the parts of the training data which

28:12

interpolated between them solve that

28:15

problem and interpolate that with

28:19

syntax of the target language and you

28:21

get code. There's a very famous essay by

28:25

Fred Brooks written many decades ago um

28:28

"No Silver Bullet," in which it almost

28:31

sounded like he was talking about today

28:33

it it he was specifically saying

28:35

something he was responding to something

28:36

very similar which is in those days it

28:38

was all like oh what about all these new

28:40

fourth generation languages and stuff

28:42

like that you know we're not going to

28:44

need any coders anymore any software

28:46

engineers anymore because software is

28:49

now so easy to write anybody can write

28:52

it Um,

28:55

and he said, well,

28:58

he guessed that you could get at maximum

28:59

a 30% improvement.

29:02

He specifically said a 30% improvement

29:03

in the next decade, but I don't think he

29:05

needed to limit it that much because the

29:08

vast majority of work in software

29:10

engineering isn't typing in the code.

29:12

>> Yeah. Um,

29:15

so in some sense parts of what Dario

29:18

said were right, just like for quite a

29:21

few people now most of their code is

29:26

being typed by a language model. Um,

29:31

that's true for me. Uh, say like maybe

29:34

90%.

29:36

Um, but it hasn't made me that much more

29:39

productive. um because that was never

29:43

the slow bit. It's also helped me with

29:46

kind of the research a lot and figuring

29:48

out, you know, which files are going to

29:50

be touched.

29:52

But anytime I've made any attempt at

29:55

getting an LLM to like design

29:59

a solution to something that hasn't been

30:01

designed lots of times before, it's it's

30:05

horrible because what it actually every

30:07

time gives me is the design of something

30:10

that looks on its surface a bit similar.

30:13

And often that's going to be an absolute

30:15

disaster because things that look on

30:17

their surface a bit similar and like I'm

30:18

literally trying to create something new

30:19

to get away from the similar thing. It's

30:22

very misleading. First of all, I'm I'm

30:24

exasperated by what I see as the tech

30:27

bro predilection to misunderstand

30:30

cognitive science and philosophy and

30:31

what not because we we've spoken to so

30:33

many really interesting people on MLST

30:35

like for example César Hidalgo, he wrote

30:37

this book the laws of knowledge and and

30:39

even Mazviita Chirimuuta, she's a philosopher

30:42

of neuroscience and she was talking all

30:43

about you know like you know basically

30:45

that knowledge is protean. So yeah, I

30:47

think that that knowledge is

30:48

perspectival I don't think that

30:50

knowledge can be this abstract

30:52

perspective free thing that can exist on

30:55

Wikipedia and um I also think that

30:57

knowledge is is embodied and it's alive.

31:00

It's it's something that exists in in us

31:02

and the purpose of an organization is to

31:05

preserve and evolve knowledge. So when

31:08

you start delegating cognitive tasks to

31:10

language models, you actually have this

31:12

weird paradoxical effect that you erode

31:14

the knowledge inside the organization.

31:16

>> Well, that's true and that's terrifying.

31:18

There's often this these arguments

31:20

online

31:22

>> between people who are like, "LLMs don't

31:24

understand anything. They're just

31:26

pretending to understand."

31:28

>> And then other people are like, "Don't

31:30

be ridiculous. Look what this LLM just

31:31

did for me." Right? And the funny thing

31:34

is they're both right. LLMs

31:37

cosplay understanding things. They

31:40

pretend to understand things.

31:42

And this was the interesting thing about

31:44

the early kind of uh work with like uh

31:46

cognitive science work with like Daniel

31:47

Dennett. Um that's basically what the

31:50

Chinese room experiment is, right? Is

31:52

you've got a guy in a room who can't

31:55

speak Chinese at all, but he sure looks

31:58

like he does because you can feed in

32:00

questions and he gives you back answers,

32:01

but all he's actually doing is looking

32:03

up things in a huge array of books or

32:06

machines or whatever. The difference

32:09

between pretending to be intelligent and

32:11

actually being intelligent is entirely

32:13

unimportant as long as you're in the

32:15

region in which the pretense is actually

32:19

effective, you know. So, so it's

32:21

actually fine for a great many tasks

32:24

that LLMs only pretend to be intelligent

32:28

because for all intents and purposes, it

32:31

it it just doesn't matter until you get

32:33

to the point where it can't pretend

32:36

anymore. And then you realize like oh my

32:39

god this thing's so stupid.

32:41

>> I'm a fan of Searle, by the way. So you know

32:42

he said that um you know understanding

32:46

is causally reducible but ontologically

32:48

irreducible and he was saying there was

32:49

a phenomenal component to understanding

32:50

but you don't even need to go there.

32:51

Like the interesting thing about

32:53

knowledge being protean is this idea

32:55

that the you know it's basically this

32:57

Kantian idea: the world is a complex place.

32:59

None of us understand it. It's like the

33:00

blind men and the elephant. We all have

33:02

different perspectives. It's very

33:03

complex thing. And so we we all we all

33:06

do this kind of modeling. But the

33:08

interesting thing is that the language

33:09

models sometimes they seem to understand

33:11

and they understand because the

33:12

supervisor places them in a frame. So

33:15

inside that frame, so when you have that

33:17

perspective of the elephants, they're

33:19

actually surprisingly coherent, but we

33:21

discount the supervisor placing the

33:24

models in that frame.

33:25

>> Yeah. Yeah. So that, so Searle versus

33:28

Dennett, or is it versus Searle and Dennett,

33:30

was what everybody was talking about

33:32

when I back when I was doing my

33:33

undergrad in philosophy you know so I

33:36

think Consciousness Explained came out

33:38

about then probably Chinese room a

33:41

little bit before

33:43

um it's interesting because the

33:45

discussions were the same discussions

33:46

we're having now but they've gone from

33:48

being abstract discussions to being real

33:51

discussions

33:52

it's helpful if people go back to the

33:54

abstract discussions because that it's

33:56

it it helps you get out of your

33:59

you know it's very distracting at the

34:01

moment to look at something that's

34:03

cosplaying intelligence so well and go

34:07

back to the fundamental question. Um, so

34:11

anyway, I just wanted to mention that's

34:12

kind of it's it's this interesting

34:15

situation we're now in where it's very

34:17

easy

34:19

34:21

to really get the wrong idea about what AI

34:24

can do. Um, particularly when you don't

34:28

understand the difference between coding

34:29

and software engineering.

34:30

>> Yeah. Which then takes me to your point

34:32

or your question about the implications

34:36

of that

34:38

for organizations.

34:40

>> Yeah.

34:41

>> You know, a lot of organizations are

34:42

basically betting their futures on a

34:46

speculative premise

34:48

which is that

34:50

AI is going to be able to do everything

34:54

better than humans

34:56

uh or at least everything in coding

34:58

better than humans. I I worry about this

35:01

a lot both for the organizations and for

35:04

the humans. You know, for the humans

35:06

when you're not actively using your

35:09

design and engineering and coding

35:11

muscles, you don't grow. You might even

35:15

wither, but you at least don't grow. And

35:18

you know, speaking of the CEO of an R&D

35:21

startup, you know, if my staff aren't

35:23

growing, then we're going to fail. You

35:26

know, uh that we can't let that happen.

35:28

and getting better at the particular

35:33

prompting skills whatever details of the

35:36

current generation of AI CLI frameworks

35:41

isn't growing you know that's that's

35:43

like that's as helpful as learning about

35:46

the details of some AWS API when you

35:50

don't actually understand how the

35:51

internet works you know it's not it's

35:54

not reusable knowledge it's ephemeral

35:57

knowledge

35:58

So like if you wanted to, you can

36:02

actually use it as a learning

36:03

superpower,

36:05

but also

36:07

it can do the opposite. You know, the

36:09

natural thing it's going to do is

36:12

remove your competence over time.

36:15

>> I agree that that's the natural thing.

36:17

So and this is especially pertinent for

36:19

you because your your career has been

36:20

around basically educating people to

36:23

get, you know, technology and AI

36:24

literacy. So the default behavior is

36:26

very similar to a self-driving car that

36:28

you know there's this tipping point

36:29

where at some point you're not engaged

36:31

anymore. You're not paying attention and

36:33

you get this delegation of competence

36:34

and you get understanding debt. That's

36:36

the default thing. So this study from

36:37

Anthropic a couple of weeks ago, it it

36:40

contradicted Dario completely because it

36:41

even said that yeah, there were a few

36:43

people in the study that were asking

36:45

conceptual questions that are actually

36:46

kind of, you know, keeping on top of

36:47

things and they had a gradient of

36:49

learning, but most people didn't. And my

36:51

hypothesis about that is, you know, the

36:53

ideal situation for Gen AI coding is

36:56

that like us, we've been writing

36:57

software for decades. We already have

36:59

this abstract understanding. We're using

37:00

it in domains that we know well and we

37:02

can specify, we can remove loads of

37:04

ambiguity. we can track and we can go

37:06

back and forth and we can we can stay in

37:08

touch with the process. But what happens

37:10

is that the the default attractor is for

37:13

people to just go into this autopilot

37:15

mode and they've got no idea what's

37:17

happening and it's actually making them

37:18

dumber.

37:19

>> Uh I I created a the first deep learning

37:22

for medicine company called Enlitic

37:24

back in what was that like 2014.

37:27

Um, and our initial focus was on

37:29

radiology and a lot of people were

37:32

worried

37:34

that this would cause radiologists to

37:36

become less effective at radiology.

37:38

>> Yeah.

37:39

>> And I strongly felt the opposite, which

37:42

is, and I did quite a bit of research

37:44

into this of like what happens when

37:46

there's like fly by wire in airplanes or

37:49

anti-lock brakes in cars or whatever.

37:52

If you can successfully automate parts

37:55

of a task that really are automatable,

37:58

you can allow the the expert to focus on

38:02

the things that they need to focus on.

38:04

And we saw this happen. So in radiology,

38:06

we found if we could automate

38:09

identifying the possible nodules in a

38:13

lung CT scan, we were actually good at

38:16

it, which we were. And then we the the

38:19

radiologist then can focus on looking at

38:22

the nodules and trying to decide if

38:24

they're malignant or what to do about

38:26

it. So again, it's one of these subtle

38:28

things. So if there's things which you

38:32

can fully automate effectively in a way

38:36

that you can remove that cognitive

38:38

burden from a human so that they can

38:39

focus on things that they need to focus

38:41

on.

38:42

That can be good. You know, I don't know

38:47

where we sit in software development

38:50

because, you know, I've been coding for

38:54

40ish

38:56

years. So, I've written a lot of code

38:59

and I can glance at a screen of code

39:02

and, you know, unless it's something

39:04

quite weird or sophisticated, I can

39:06

immediately tell you what it does and

39:07

whether it works and whatever. Um,

39:12

I can kind of see intuitively things

39:14

that could be improved, you know,

39:15

possible things to be careful of. I'm

39:17

not sure I could have got to that point

39:20

if I hadn't have written a lot of code.

39:24

So the people I'm finding who can really

39:27

benefit from AI right now are either

39:30

really junior people who can't code at

39:32

all who can now write some apps that

39:37

they have in their head and as long as

39:39

they work reasonably quickly um with the

39:44

current AI capabilities then they're

39:46

happy and then really experienced people

39:49

like me or like Chris Lattner

39:51

because we can basically have it do some

39:53

of our typing for us, you know, and some

39:55

of our research for us. People in the

39:58

middle, which is most people most of the

40:00

time, it really worries me because how

40:02

do you get from point A to point B?

40:04

>> Yeah.

40:05

>> Without typing code, it might be

40:07

possible, but we don't have a we have no

40:10

experience of that. We don't is is it

40:12

possible? How would you do it? Like is

40:14

it kind of like going back to school

40:16

where at primary school we don't let

40:18

kids use calculators so that they

40:20

develop their

40:22

number muscle?

40:24

Do we need to do that for like first

40:26

five years as a developer? You have to

40:28

write all the code yourself.

40:30

I don't know. But if I was an between

40:33

like two and 20 years of experience

40:37

developer, I would be asking that

40:39

question of myself a lot because

40:43

otherwise you might be in the process of

40:46

making yourself obsolete.

40:48

>> Yeah. Well, this is another thing about

40:50

knowledge that um this César Hidalgo guy

40:52

said. So he said that knowledge is

40:54

non-fungible, which means it can't be

40:56

exchanged. So what he means by that is

40:58

the process of learning is in some

41:00

important sense not reducible right so

41:03

you have to have the experience and the

41:05

experience has to have friction and when

41:08

we build models of the world we actually

41:10

learn like you know there's this phrase

41:12

reality pushes back so we make lots of

41:14

mistakes and we update our models and we

41:16

and we're just placing these coherence

41:18

constraints in in our in our model and

41:20

that's how we come to learn. So you use

41:21

Claude Code and there's so little friction

41:23

in the process. That's exactly what this

41:25

study from Anthropic said. It said there

41:26

was so little friction they didn't learn

41:28

anything

41:29

>> right. Yeah. No, exactly. Um desirable

41:32

difficulty is the concept that kind of

41:35

comes up in education. But even going

41:37

back to the work of uh Ebbinghaus who

41:39

was the original repetitive spaced

41:41

learning guy in the 19th century and

41:43

then Piotr Woźniak more recently, we

41:46

find the same like we we we know that

41:49

memories don't get formed

41:52

unless

41:54

it is hard work to form them. Uh so you

41:58

know that's where you kind of get this

41:59

somewhat um surprising result that says

42:03

uh revising too often is a bad idea

42:07

because it comes to mind too quickly.

42:09

And so with repetitive space learning

42:11

with stuff like Anki and SuperMemo, the

42:13

algorithm tries to schedule the flash

42:16

cards at just before the moment you're

42:19

about to forget. So then it's hard work.
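
The scheduling idea described here can be sketched in a few lines of Python. This is only an illustration, not the actual Anki or SuperMemo algorithm: it assumes a simple exponential forgetting curve, and the names `recall_probability`, `next_interval`, `simulate_schedule`, the `growth` factor and the 0.9 recall target are all hypothetical choices made for the example.

```python
import math


def recall_probability(days_since_review: float, stability: float) -> float:
    """Assumed forgetting curve: recall decays exponentially with time."""
    return math.exp(-days_since_review / stability)


def next_interval(stability: float, target_recall: float = 0.9) -> float:
    """Days to wait until predicted recall falls to the target.

    Scheduling at this point means the card is shown just before it
    would be forgotten, which is what makes each review effortful.
    """
    return -stability * math.log(target_recall)


def simulate_schedule(reviews: int, stability: float = 1.0,
                      growth: float = 2.5,
                      target_recall: float = 0.9) -> list[float]:
    """Return the gap (in days) before each review.

    Assumption: every successful review multiplies memory stability by
    `growth`. Real schedulers estimate this from the learner's answers.
    """
    intervals = []
    for _ in range(reviews):
        intervals.append(next_interval(stability, target_recall))
        stability *= growth
    return intervals


if __name__ == "__main__":
    for i, gap in enumerate(simulate_schedule(5), start=1):
        print(f"review {i}: wait {gap:.2f} days")
```

Under these assumptions the gaps between reviews grow geometrically, and every review lands at the same predicted recall level, so no review is ever easy.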

42:22

So I I studied uh Chinese for 10 years

42:26

um in order to try to learn about

42:28

learning myself. Um and I really noticed

42:32

this that I used Anki and because it was

42:35

always scheduling my cards just before I

42:38

was about to forget them. It was always

42:41

incredibly hard work.

42:43

>> Yeah.

42:43

>> You know to do reviews because almost

42:46

all the cards were ones I was on the

42:48

verge of forgetting. It was absolutely

42:50

exhausting. But my god, it worked well.

42:52

Here I am. I don't really haven't done

42:54

any study for 15 plus years and I still

42:58

remember my Chinese.

43:00

>> Well, I know I mean also I mean coming

43:01

back to your radiology example um like

43:04

one example people give is call centers.

43:07

So you know we have this notion that in

43:09

an organization we have high

43:10

intelligence roles and low intelligence

43:12

roles and for me intelligence is just

43:13

the adaptive acquisition and synthesis

43:15

of of knowledge. So we assume that that

43:17

you know the low intelligence roles

43:18

doing the call center stuff um it's it

43:21

doesn't adapt which means we can you

43:22

know there are certain things that an

43:24

organization does that do not change so

43:26

we could automate them and we don't need

43:28

to update our knowledge and I think that

43:30

discounts actually maybe with the

43:31

radiology example that having this

43:33

holistic knowledge like you know in a

43:35

call center there are so many weird edge

43:36

cases that come in so many weird things

43:38

happen and that filters up in the

43:40

organization and we adapt over time so

43:42

when you start to automate things and

43:43

you actually lose the competence to

43:45

create the process which created the

43:46

thing in the in the first place and you

43:48

lose the evolvability of that knowledge

43:50

in the organization, you're actually

43:52

kind of cutting your legs off.

43:54

>> Yeah, absolutely. And um so I you know

43:57

all I know is in in my company

44:01

I just I tell our our staff all the time

44:06

almost the only thing I care about is

44:08

how much your your personal human

44:11

capabilities are growing. you know, I

44:13

don't actually care how many PRs you're

44:15

doing, um, how many features you're

44:20

doing. Like, uh, there's that nice, um,

44:22

you know, John Ousterhout, the Tcl guy,

44:25

recently released some of his st the his

44:28

Stanford Friday

44:30

takeaway lectures and he has this nice

44:32

one called um, a little bit of slope

44:35

makes up for a lot of intercept. Uh just

44:38

basically the idea that that you know in

44:40

your life if you can focus on doing

44:42

things that cause you to grow faster.

44:44

>> Yeah.

44:45

>> It's way better than focusing on

44:48

focusing on the things that you're

44:50

already good at. You know that has that

44:51

high intercept.
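
The slope-versus-intercept point can be made concrete with a small worked example. This is a sketch under a strong simplifying assumption, namely that capability grows linearly over time; the function names and the numbers are hypothetical and chosen only for illustration.

```python
from typing import Optional


def skill_at(intercept: float, slope: float, years: float) -> float:
    """Capability after `years`, assuming linear growth from `intercept`."""
    return intercept + slope * years


def crossover_time(a_intercept: float, a_slope: float,
                   b_intercept: float, b_slope: float) -> Optional[float]:
    """Years until persons A and B have equal capability.

    Returns None when the lines are parallel or the crossing lies in
    the past (negative time).
    """
    if a_slope == b_slope:
        return None
    t = (b_intercept - a_intercept) / (a_slope - b_slope)
    return t if t >= 0 else None


if __name__ == "__main__":
    # A starts well ahead (intercept 10) but grows slowly (slope 1).
    # B starts from zero but grows twice as fast (slope 2).
    t = crossover_time(10.0, 1.0, 0.0, 2.0)
    print("crossover after", t, "years")
    print("A after 20 years:", skill_at(10.0, 1.0, 20))
    print("B after 20 years:", skill_at(0.0, 2.0, 20))
```

With these made-up numbers the faster learner draws level after ten years and is ahead from then on, however large the head start looked at the beginning.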

44:53

>> So the only thing I really care about

44:55

and I think is the only thing that

44:56

matters for for my company is that my

44:59

team I'm focusing on their slope.

45:02

>> Yeah. If you focus on just driving out

45:05

results at the limit of whatever AI can

45:08

do right now, you're only caring about

45:10

the intercept, you know. So, I think

45:13

it's basically a path to obsolescence

45:15

through a company and the people who are

45:17

in it. And so, I'm really surprised how

45:20

many executives of big companies are

45:22

pushing this now because it feels like

45:24

if they're wrong, which they probably

45:27

are, and they have no way to tell if

45:29

they are because this is an area they're

45:30

not at all familiar with. if they never

45:32

learned it in their MBAs. They're

45:34

basically setting up their companies to

45:36

be destroyed.

45:37

>> Yeah.

45:38

>> And really surprised that,

45:41

you know, shareholders would let them do

45:45

that. You know, set up such an

45:46

incredibly speculative action. Yeah.

45:49

Here we are. It feels like a lot of

45:50

companies are are going to fail as a

45:52

result of the uh amassed tech debt that

45:55

causes them to not be able to maintain

45:57

or build their products anymore. There

45:59

are loads of folks out there like François

46:00

Chollet, like he really gets it. He

46:02

he understands this and you know so he's

46:04

always said that it's it's about this

46:06

kind of mimetic sharing of cognitive

46:08

models about the domain and how we

46:09

refine it together on the sharing thing.

46:11

This is another big scaling problem with

46:13

Gen AI coding, right? So the the the the

46:15

ideal case, I've done this. I know a

46:17

domain really well and I can specify it

46:20

with exquisite detail and I tell Claude

46:22

Code, go and do this thing and the

46:24

models in my mind doesn't matter. Um and

46:26

then you go into an organization and now

46:28

I need to share like my knowledge with

46:30

all of the other people, right? And I'm

46:32

sure you have this in your company as

46:33

well. Like you need to that this

46:34

knowledge acquisition bottleneck is a

46:36

real serious problem in in

46:37

organizations. So when it's just me, I I

46:40

I think I'm probably about 50 times more

46:42

productive using Claude Code. It's

46:44

absolutely magic and I can see why

46:46

people are so excited about it. But

46:48

people don't seem to understand the

46:50

bottleneck and and how that doesn't

46:52

really translate to many real world

46:53

organization.

46:54

>> No one's actually creating

46:57

50 times more high-quality software than

47:00

they were before. So, we've actually

47:01

just done a study of this and there's a

47:04

tiny uptick tiny uptick in what people

47:08

are actually shipping.

47:10

That's the facts. Obviously, I'm an

47:12

enthusiast of of AI and what it can do,

47:16

but also my wife Rachel recently pointed

47:19

out in an article,

47:22

all of the pieces that make gambling

47:26

addictive are present in

47:30

>> Yeah. dark flow.

47:31

>> I was going to bring that up. You have

47:32

to tell us about

47:33

>> coding.

47:33

>> Yeah,

47:34

>> it's this really awkward situation where

47:37

it's very almost everybody I know who

47:40

got very enthusiastic about AI powered

47:43

coding in recent months have totally

47:47

changed their mind about it when they

47:48

finally went back and looked at like how

47:50

much stuff that I built during those

47:53

days of great enthusiasm am I using

47:56

today? Are my customers using today? am

47:58

I making money from today?

48:01

Almost all the money is being made by

48:04

influencers, you know, or by the

48:06

companies that produce the tokens. The

48:09

thing about AI based coding is that it's

48:14

like a slot machine in that you you have

48:17

an illusion of control. You know, you

48:18

can get to craft your prompt and your

48:22

list of MCPs and your skills and

48:25

whatever. And then but in the end, you

48:26

pull the lever, right? You put in the

48:28

prompt and something comes back and it's

48:32

like cherry cherry. It's like oh next

48:36

time I'll change my prompt a bit. I'll

48:38

add a bit more context. Pull the lever

48:40

again. Pull the lever again. It's the

48:43

stochastic thing. You get the occasional

48:45

win that's like, "Oh, I won. I got a

48:48

feature."

48:50

So, it's got it's got all these

48:51

hallmarks of like loss disguised as a

48:53

win. um somewhat stochastic uh feeling

48:57

of control, all the stuff that um gaming

49:01

companies try to engineer into their

49:03

gaming rooms. Now, none of that means

49:06

that AI is not useful,

49:09

but gosh, it's hard to tell.

49:12

>> I know. And and Rachel, just just to be

49:14

clear too, she also said that one of the

49:16

hallmarks of gambling is that you kind

49:17

of delude yourself that you have some

49:19

awareness of what's going on, but but

49:20

actually you don't. But let let's do the

49:22

bull case a little bit, though. So

49:24

because I do I do think in restricted

49:26

cases it it is it is very useful and

49:28

these are cases where we understand and

49:30

and we can place constraints and

49:31

specification but um even in those cases

49:34

you could argue on the one hand that

49:36

we're not you know we're not going to be

49:37

unemployed anytime soon because you just

49:39

do more work on the addiction thing I've

49:41

noticed that so I've had 14-hour Claude

49:44

Code marathon sessions and I

49:45

actually feel addicted to it. It's like

49:47

a slot machine, you know. It It really

49:49

is.

49:49

>> Been there, too. Absolutely.

49:51

>> Yeah, I know. It's And just I've never

49:53

felt more drained writing code. I

49:55

actually need to take a rest afterwards,

49:57

like a few days rest because it

49:59

completely

50:00

>> was crap, you know. Yeah, definitely.

50:02

I've had some successes, right? And so

50:04

in fact, we've spent the last couple of

50:07

years building a whole product based

50:09

around where we know the successes are

50:11

going to be, which is when you're

50:13

working on reasonably small pieces that

50:15

you can fully understand and that you

50:18

can design and you can build up your own

50:20

layers of abstraction to create things

50:22

that are bigger than the uh parts that

50:23

you're building out of. had a very

50:25

interesting situation recently where I

50:28

just it was kind of an experiment

50:29

basically which is we uh we rely very

50:32

heavily on something called um

50:34

ipykernel, which is the thing that powers

50:37

Jupyter notebooks

50:38

um and there had been a major version

50:41

release of ipykernel from 6 to 7 and it

50:45

stopped working um and it stopped

50:47

working in both of the products that we

50:49

were trying to use it with. One was

50:51

called today nbclassic, which is the

50:53

original Jupyter Notebook, and then

50:55

our own product called Solveit

50:57

would just randomly crash.

51:00

Um,

51:02

and ipykernel's over 5,000 lines of code. It's

51:06

very complex code, multiple threads,

51:09

events, blocks, interfaces with

51:13

IPython, you know, with ZMQ,

51:17

you know, all kinds of different pieces,

51:19

um, um, debugpy.

51:22

and I I couldn't get my head around it

51:24

and I couldn't see why it was crashing.

51:27

The tests are all passing. I wonder if

51:30

AI can solve this. You know, it's like

51:32

I'm always interested in the question of

51:33

like how big a chunk can AI handle on

51:35

its own right now. The answer turned out

51:37

to be

51:39

yes. I think it can just it was like so

51:44

I spent a couple of weeks I didn't

51:47

develop a lot of understanding about how

51:49

ipykernel really worked in the process

51:51

but I did spend quite a bit of time kind

51:53

of pulling out separate comp like so the

51:55

answer was in two hours Codex 5 point, I

51:59

think it was 5.2 at that time or

52:01

maybe three had just come out. Couldn't

52:03

do it. Then if I got the $200 a month uh

52:08

GPT-5.3

52:11

Pro

52:13

>> to fix the problems, it could. And so by

52:16

rolling back between those two pieces of

52:18

software, those two models, I could get

52:22

things working uh over a couple of weeks

52:26

period. And like you say, it wasn't at

52:28

all fun. It was very tiring and it felt

52:31

stressful because I wasn't really in

52:32

control. But the interesting thing is I

52:34

now am in a situation

52:37

where I have the only implementation of

52:40

of a Python Jupyter kernel that

52:43

actually works correctly as far as I can

52:45

tell with these new version 7 protocol

52:49

improvements. And now I'm like, well,

52:51

this is fascinating because we don't

52:53

have a kind of a a software engineering

52:55

theory of what to do now. like

52:59

here's a piece of code that no one

53:01

understands.

53:02

>> Yeah.

53:03

>> Am I going to bet my company's product

53:06

on it?

53:08

And I the answer is I don't know because

53:10

like I I don't I don't like I don't know

53:14

what to do now because no one's like

53:16

been in this situation. And like will it

53:20

does it have memory leaks? Will it still

53:22

work in a year's time if there's some

53:23

minor change to the protocol?

53:26

Is there some weird edge case that's

53:29

going to destroy everything?

53:32

No one knows because no one understands

53:34

this code. It's a really curious

53:36

situation. I

53:37

>> mean, first of all, we should

53:38

acknowledge the pernicious erosion of

53:41

control. So, at the very beginning, you

53:43

have 10% AI generated code and then you

53:46

can just see how it creeps up and up and

53:47

then at some point 6 months down the

53:49

line a PR comes in and now you know 60%

53:52

of the code is AI generated and do do

53:54

you see what you see what happens? you

53:56

you slowly become disconnected. But the

53:58

bull case for this is, you know, in AI

54:00

there's this idea called functionalism

54:01

that, you know, we don't care what the

54:03

intelligent thing is made out of. As

54:05

long as it does all of the right things,

54:07

then we know, you know, we would say

54:09

it's AI. And it's the same thing with

54:10

software. So the bull case is I I

54:13

understand the domain. I don't need to

54:15

write I don't need to know how to write

54:17

the quicksort algorithm. I just need to

54:19

I I just need to understand it, right?

54:21

and then and then you know so I just

54:23

need to have all of these tests and it

54:26

needs to go into deployment and these

54:27

things need to happen and at that point

54:30

you know what I don't actually care and

54:32

and I could also

54:33

>> I quite and I quite to be clear I quite

54:35

like that um framing but you know what

54:38

that actually does is it says wow

54:40

software engineering sure is important

54:41

then because software engineering is all

54:44

about finding what those pieces are and

54:46

how they should behave and then how you

54:48

can put them together to create a bigger

54:50

piece and then how you can put them

54:51

together to create a bigger piece. And

54:53

if we do that well, then in 10 years

54:56

time, we could have software that is far

54:59

more capable than anything we could even

55:01

imagine today.

55:03

>> Um,

55:05

but you're only going to get that

55:07

with really great software engineering.

55:09

Yeah, you want to be careful. I think in

55:12

the end, like, ipykernel, I'm finding, for

55:14

example, it's just too big a piece,

55:18

right? Because in the end the the team

55:22

that made the original ipykernel were

55:25

not able to create a set of tests that

55:28

correctly exercised it and therefore

55:32

real world downstream projects including

55:35

the original nbclassic, you know, which

55:37

is what ipykernel was extracted from,

55:39

didn't work anymore. So this is this is

55:41

kind of where our focus is on now on the

55:43

development side at, um, at Answer.AI

55:46

is finding the right sized pieces

55:51

and making sure they're the right

55:52

pieces. Knowing how to recognize what

55:56

those pieces are and how to design them

55:58

and how to put them together is actually

55:59

something that normally requires

56:03

some decades of experience before you're

56:06

really good at it. Um certainly it's

56:09

true for me. Um I reckon I got pretty

56:12

good at it after maybe 20 years of

56:13

experience.

56:15

Yeah, it's a big question is like how do

56:17

you build these software engineering

56:20

chops which are now even more important

56:22

than they've ever been before. They're

56:24

the difference between somebody who's

56:26

good at writing computer software and

56:28

somebody who's not. That feels like a

56:31

challenging question.

56:32

>> I know. And there's also this notion

56:34

that there are so many different ways to

56:36

abstract and represent something. You

56:38

know, the world is a very complex place.

56:40

And maybe the way we've been abstracting

56:42

and representing software is mostly a

56:45

reflection of our own cognitive

56:47

limitations, right? And even in the

56:48

sciences in in physics, you tend to have

56:51

a lot of quite reductive methods of

56:53

modeling the world. And then you've got

56:54

complexity science, which is just

56:55

embracing the constructive dissipative,

56:58

you know, gnarly nature of of things.

57:01

And I think a lot of software today we

57:03

don't understand right so for example

57:05

there are many globally distributed

57:07

software applications that use the actor

57:09

pattern, and this is just, it's

57:12

basically like a complex system right

57:14

and the only way we can understand it is

57:16

by doing simulations and tests because

57:18

no one actually knows how all of these

57:20

things um fit together so you could

57:22

argue, I guess, as a bull case that maybe

57:24

we already are doing this at the top of

57:26

software engineering and that is what we

57:28

want to do eventually anyway. Yeah, I'd

57:31

say probably not. You see companies like

57:35

Instagram and WhatsApp

57:38

dominate their sectors whilst

57:40

having

57:42

10 staff and beating companies like

57:47

Google and Microsoft in the process. I

57:50

would argue this way of building

57:53

software in very large companies is

57:56

actually failing. And I think we're

57:59

seeing a lot of these very large

58:00

companies becoming you know increasingly

58:02

desperate and you know for example the

58:06

quality of

58:08

Microsoft Windows and macOS has very

58:12

obviously deteriorated greatly in the

58:15

last 5 to 10 years. You know, back when

58:20

Dave Cutler was looking at every line of

58:23

the NT kernel and making sure it was

58:25

beautiful,

58:27

it was an elegant and marvelous piece

58:31

of software, you know, and there's I

58:32

don't think there's anybody in the world

58:33

who's going to say that Windows 11 is an

58:35

elegant and marvelous piece of software.

58:37

So, I actually think we do need to find

58:39

these smaller components that we do

58:41

fully understand and that we need to

58:43

build them up. And here's the problem.

58:47

Um, AI is no good at that. So, and and

58:50

so I say that empirically. They're

58:52

really bad at software engineering. Uh,

58:54

and then I think that's

58:58

possibly always going to be true

59:01

because,

59:02

you know, we're we're asking them to

59:06

often move outside of their training

59:08

data. you know, if we're trying to build

59:09

something that literally hasn't been

59:11

built before and do it in a better way

59:12

than has been done before, we're saying

59:14

like don't just copy what was in the

59:16

training data.

59:18

So, um, and again, this is a confusing

59:23

point for a lot of people because they

59:25

see AI being very good at coding and

59:28

then you think like, oh, that's software

59:30

engineering, you know, it's like, oh,

59:31

it must be good at software

59:32

engineering. But they're different

59:34

tasks. There's not a huge amount of

59:37

overlap between them and there's no

59:39

current empirical data to suggest that

59:44

LLMs are gaining any competency at

59:46

software engineering. Every time you

59:48

look at a piece of software engineering

59:50

they've done like the browser for

59:53

example, which, um, Cursor created, or the C

59:56

compiler which, um, Anthropic created,

59:59

like I've read the source code of those

60:02

things quite a bit. Um, Chris Lattner is

60:06

much more familiar with the compiler

60:07

example than me. Um but they're they're

60:10

very very obvious copies of things that

60:13

already exist. So

60:16

that's the challenge, you know, is if

60:18

you want to build something that's not

60:20

just a copy, then you can't outsource

60:24

that to an LLM.

60:26

There's no theoretical reason to

60:28

believe that you'll ever be able to and

60:31

there's no empirical data to suggest

60:33

that you'll ever be able to.

60:34

>> Yes. I think the punchline of this

60:37

conversation is and I'm I'm sure you

60:39

would agree with this that we need to

60:40

have the combination of AI and humans

60:42

working together, right? Because the

60:44

humans provide the the understanding and

60:47

all of the stuff we were saying about

60:48

knowledge, but we can still use AI as a

60:51

tool. We need to we need to design

60:53

operating models or ways of working that

60:56

make that you know we we say we we don't

60:59

want to diminish our competence and

61:01

understanding right

61:02

>> so so it's very it's a very fine line

61:04

>> that's that's been our focus and we both

61:05

focus on that for teaching and for our

61:08

own internal development the stuff I've

61:11

been working on for 20 years has turned

61:15

out to be the thing that makes this all

61:17

work

61:19

should get credit for this was the guy

61:20

that created the notebook interface.

61:24

Although also lots of ideas kind of go

61:26

back to Smalltalk and Lisp and APL. But

61:30

basically the idea that a human can do a

61:33

lot more with a computer when the human

61:37

can like manipulate the objects in

61:41

that computer in real time and

61:43

study them and move them around and

61:45

combine them together. Yeah, that's what

61:47

Smalltalk was all about, you know, with

61:49

objects, and APL was the same with arrays.

61:52

Mathematica basically is a superpowered

61:54

Lisp which then also added on this very

61:56

elegant notebook interface that allowed

61:58

you to construct kind of a living

62:00

document out of all this. So I built

62:03

this thing called nbdev a few years ago,

62:05

which is a way of creating production

62:08

software inside these notebook

62:11

interfaces inside these rich dynamic

62:13

environments and I found that made me

62:16

dramatically more um productive as a

62:19

programmer and like today even though

62:21

I've

62:22

never been a full-time programmer as my

62:25

job when you look at my kind of GitHub

62:27

repo output I think GitHub produced some

62:29

statistics about it and I was like just

62:31

about the most productive programmer in

62:32

in Australia, you know, like it it's

62:35

working and a lot of the stuff I build

62:38

has lots and lots of people using it

62:41

because it's such a rich powerful way to

62:44

build things. And so it turns out we've

62:48

now discovered that if you put AI in the

62:50

same environment with the human again in

62:53

a in a rich interactive environment

62:57

AI is much better as well which perhaps

63:00

isn't shocking to hear but the the

63:03

normal, like, if you use Claude Code, which

63:06

I know you do and it's a very good piece

63:07

of software but the environment we give

63:09

Claude Code is very similar to the

63:12

environment that people had 40 years

63:15

ago. You know, it's a

63:17

line-based terminal interface.

63:20

Uh, you know, it can use MCP or

63:22

whatever. Most of the times it just

63:24

nowadays uses bash tools, which again

63:26

very powerful. I love bash tools. I use

63:28

them all the, you know, CLI tools all

63:30

the time, but it's still just it's using

63:33

text files, you know, as its

63:35

interface to the world. It's it's it's

63:38

really meager.

63:40

So um so we put the human and the AI

63:46

inside a Python interpreter.

63:48

Um and now suddenly you've got the full

63:51

power of a very elegant, expressive

63:54

programming language that the human can

63:57

use to talk to the AI. The AI can talk

63:59

to the computer. The human can talk to

64:00

the computer. The computer can talk to

64:02

the AI. Like you have this really rich

64:04

thing. And then we let the human and the

64:08

AI in real time

64:11

build tools that each other can use. And

64:13

that's what it's about to me, right?

64:15

It's about like creating an environment

64:17

where humans can grow and engage and

64:23

share. It's like, for me, when I use

64:27

Solveit, it's the opposite of that experience

64:29

you described with Claude Code. After a

64:31

couple of hours, I feel energized and

64:35

happy and fulfilled.
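
The idea described above, a human and an AI sharing one live Python interpreter and building tools the other can call, can be sketched with a toy shared namespace. This is an illustrative assumption only: `SharedSession` and its methods are hypothetical names, not the API of Solveit or any other product mentioned here, and the "AI" side is represented by plain source strings.

```python
import inspect


class SharedSession:
    """Toy stand-in for one live interpreter shared by two parties (hypothetical)."""

    def __init__(self):
        # A single namespace: whatever one party defines, the other can see.
        self.namespace = {}

    def run(self, source):
        """Execute source in the shared namespace, as either party might."""
        exec(source, self.namespace)

    def tools(self):
        """List the functions currently defined, with their signatures."""
        return {
            name: str(inspect.signature(obj))
            for name, obj in self.namespace.items()
            if inspect.isfunction(obj) and not name.startswith("_")
        }

    def call(self, name, *args, **kwargs):
        """Call a tool by name, the way a tool-using model would."""
        return self.namespace[name](*args, **kwargs)


session = SharedSession()

# The human defines a small tool.
session.run("def word_count(text):\n    return len(text.split())\n")

# The other party builds a new tool on top of it in the same live session.
session.run(
    "def longest_line(text):\n"
    "    return max(text.splitlines(), key=word_count)\n"
)

print(session.tools())
print(session.call("longest_line", "a b\nc d e\nf"))
```

Both parties see the same live objects rather than exchanging text files, which is the contrast drawn with a line-based terminal interface.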

64:38

>> I'll give you I'll give you my take. I

64:40

think that the thing that you're

64:41

pointing to here is there's something

64:43

magic about having an interactive

64:46

stateful environment that gives you

64:48

feedback.

64:49

>> And that is because our brains kind of

64:52

they they can do a a certain you know

64:55

unit of work. So, so we actually think

64:57

through refining and testing with

65:00

reality and that's why I mean I during

65:02

my PhD I used Mathematica and MATLAB, and

65:04

I agree, so we've got this REPL

65:05

environment and you know here's the

65:07

matrix do an image plot you know do a

65:09

change this is what it looks like now

65:11

and it's actually a wonderful way to

65:13

kind of just just refine my mental model

65:16

about something

65:17

>> But Claude Code does a lot of

65:19

this stuff I I think it's mostly a skill

65:21

issue I think the people that use Claude

65:23

Code effectively do this. I've written a

65:26

content management system.

65:27

>> It's possible. It is possible.

65:28

>> It's possible. Yeah. So, you know, I've

65:30

written a content management system

65:31

called Rescript. And when I'm putting

65:32

together a documentary video, it can go

65:34

it can it can pull transcripts and then

65:36

I can verify the claims. And you know,

65:39

part of AI literacy is just

65:40

understanding the the asymmetry of

65:42

language models, right? So, when you

65:44

give them a sort of discriminative task,

65:47

they're actually quite good. So if I

65:48

tell it in a sub agent to go and verify

65:50

every individual claim, it's much more

65:52

accurate than if I was in generation

65:54

mode and I was generating a bunch of

65:56

claims and the stateful feedback thing

65:58

again, you know, I can have some kind of

66:00

like schematized XML dump and I can have

66:02

like an application here on the side

66:04

which is visualizing and it's like a

66:06

feedback loop and for me this is an AI

66:08

literacy thing like the the good people

66:10

at AI are already doing this.

66:12

>> Yeah. So I don't fully agree with you. I

66:15

agree you can do it in Claude Code, and I

66:18

agree it is an AI literacy thing as to

66:21

whether you can, but also Claude Code was

66:25

not designed to do this. It's not very

66:28

good at it and it doesn't make it the

66:30

natural way of working with it. I don't

66:32

want to say it's an AI literacy problem

66:34

because that's like saying like, oh,

66:36

it's a you problem. To me, if a tool is

66:39

not making it the natural way for a

66:42

human to become

66:44

more knowledgeable, more happy, more

66:48

connected

66:49

with a deeper understanding and a deeper

66:52

connection to what they're working on.

66:54

That's a tool problem. That that should

66:56

be how tools are designed to work. So so

67:00

many models and tools expressly are

67:03

being evaluated on can I give it a

67:05

complete piece of work and have it go

67:07

away and do the whole thing which feels

67:09

like a huge mistake to me versus have

67:14

you evaluated whether a human comes out

67:16

the other end with a deep understanding

67:18

of a topic you know so that they can

67:21

really easily build things in the

67:23

future. I agree with all of that, but

67:24

then there's the other interesting angle

67:26

which is that there was a famous talk by

67:28

Joel Grus and we'll talk about this and

67:30

and he said that notebooks are terrible.

67:33

Um they're really bad from a software

67:34

engineering point of view and and at the

67:36

time and maybe still now to a certain

67:37

extent I I agree with him because um you

67:41

know, I've done ML DevOps. I've

67:43

worked in large organizations you know

67:45

like trying to figure out how do we

67:46

bridge like data science and software

67:49

engineering, and Claude Code is already

67:51

more towards the software engineering

67:53

side and what that means is it creates

67:55

idempotent, stateless, repeatable

67:58

artifacts right so as you say from a

68:00

pedagogical point of view it's really

68:02

good having this stateful feedback

68:03

because I can understand what's going on

68:05

but then I need to translate that into

68:07

something which is deployable and can

68:09

you tell us the story of, you

68:12

responded to Joel Grus, didn't you? And

68:14

and it was a bit of a fiasco, wasn't it?

68:17

But what just tell us about that story?

68:19

>> He did a really good video called, uh, "I

68:22

Don't Like Notebooks."

68:24

Um it was uh hilarious. It was really

68:27

well done. Um, and, uh, yeah, it was totally

68:30

wrong. And all the things he said

68:33

notebooks can't do,

68:35

they can. And all the things he said you

68:38

can't do with notebooks, I do with

68:40

notebooks all the time. So it was a very

68:42

good very amusing uh incorrect talk. So

68:45

then I did a kind of a parody of it

68:47

called, um, "I like notebooks," um, in which I

68:51

basically copied with credit most of his

68:54

slides and showed how every one of them

68:57

was totally incorrect. But like I

69:00

actually think your comment about it

69:05

does come down to the heart of it which

69:07

is this difference between, like, how

69:10

software engineering is normally done

69:12

versus how

69:15

scientific research and similar things

69:17

is normally done

69:19

and I think and I agree there is a

69:21

dichotomy there and I think that

69:23

dichotomy is a real shame because I

69:26

think software development is being done

69:28

wrong. uh it's being done in this way

69:30

which is yeah all about reproducibility

69:33

and these like dead these dead pieces

69:37

you know it's it's all dead code dead

69:39

files I I will never be able to express

69:42

this one millionth as clearly as um as

69:45

Bret Victor has in his work, so I'd

69:47

encourage people who haven't watched

69:49

Bret Victor to watch him, but you

69:52

know, he shows again and again

69:55

how a direct connection, you know, a direct

69:59

visceral connection with

70:01

the thing you're doing is is all that

70:05

matters, you know, and that's his

70:07

mission is to make sure people have that

70:09

connection and that's basically my

70:11

mission as well. So for me, traditional

70:14

software engineering is as far from that

70:16

as it is possible to get. I think it's I

70:19

think it's gross. Like I I I find it

70:23

disgusting and I find it sad that people

70:26

are being forced to work like that. It's

70:29

like I think it's inhumane and I just

70:31

don't think it works very well. I mean

70:32

empirically it doesn't work very well.

70:34

Uh and it's much less good for for AI as

70:38

well as it's much less good for humans.

70:40

It hasn't always been that way like you

70:42

know, with Alan Kay and Smalltalk

70:45

and uh Iverson and APL you know Lisp

70:51

Wolfram with Mathematica.

70:54

To me, these were the golden days

70:58

when when

71:00

people were focused on the question of

71:02

how do we get the human into the

71:05

computer to work as closely with it as

71:07

possible. You know, that's where the the

71:10

mouse came from, for example, like to

71:13

like click and drag and

71:17

visualize entities in your computer as

71:20

things you can move around.

71:22

So, I feel like we've lost that. I think

71:25

it's really sad. Yeah. With Claude Code

71:28

and stuff, the the default way of

71:30

working with them is to go super deep

71:34

into it. It's like, okay, there's a

71:36

whole folder full of files. You never

71:39

even look at them. Your entire

71:41

interaction with it is through a prompt.

71:44

>> Yeah.

71:45

>> I it it it literally disgusts me. Like I

71:49

literally think it's it's inhumane and

71:52

it's my mission remains the same as it

71:55

has been for like 20 years, which is to

71:57

stop people working like this.

72:01

>> I know. But so casting my mind back, I

72:04

used to work with data scientists. They

72:05

were using Jupyter notebooks. And what I

72:06

found was typically I mean back then you

72:09

couldn't if you checked them into git it

72:10

wouldn't look very good. Most of these

72:11

data scientists didn't know how to use

72:13

git. They would run the cells out of

72:15

order which means it wouldn't be

72:16

reproducible. There were all sorts of things

72:17

like that. But the thing is I agree with

72:19

you that you you can use them in this in

72:21

this workflow. But it comes back to what

72:23

I was saying before about you know we we

72:24

were talking about the call center and

72:26

it being like a low intelligence job.

72:28

You know the data scientists the reason

72:30

why they they are doing intelligent work

72:32

is they are actually creating something

72:34

that doesn't exist. They are figuring

72:36

out the the the contours of a problem.

72:38

They're actually working in a domain

72:40

that is poorly understood. But you could

72:42

argue now the bull case is when the data

72:45

scientists can succinctly describe the

72:48

contours of the problem. Maybe we could

72:49

go to Claude Code and we could implement

72:51

it properly. But how do we bridge

72:53

between those two worlds?

72:54

>> I think that'd be a terrible terrible

72:56

idea. you know like

72:59

>> you don't want to remove people from

73:01

their exploratory environment you know

73:05

73:07

research and and uh um science is

73:11

developed by people

73:13

building insight you know um whoever you

73:18

listen to you know whether it be

73:20

Feynman or whatever, like, you always

73:22

hear from the great scientists how

73:26

They build deeper intuition by by

73:28

building mental models which they get

73:30

over time by interacting with the things

73:34

that they're learning about. Now like in

73:36

Feynman's case, because it was theoretical

73:39

physics he couldn't actually pick up a

73:42

spinning quark but he did literally

73:44

study spinning plates. You know you got

73:47

to find ways to to deeply interact with

73:50

with what you're working with. Like so

73:53

so many times I've seen data science

73:56

teams because you're right, data science

73:58

teams

73:59

aren't very familiar with Git and aren't

74:02

very familiar with things that they do

74:04

need to understand.

74:06

Um, and so often I've seen a a software

74:09

engineer will become their manager and

74:11

their fix to this will be to tell them

74:13

all to stop using Jupyter notebooks and

74:15

now they have to use all these

74:16

reproducible blah blah virtual, you

74:19

know, virtualenv, blah blah. They

74:21

destroy these teams over and over again.

74:24

I've seen this keep happening

74:26

um because the solution is not create

74:29

more discipline and bureaucracy,

74:32

it's solve the actual problem. So for

74:35

example, we we built a um a thing called

74:39

an NB merge driver

74:41

which a lot of people don't realize this

74:43

but actually uh notebooks are extremely

74:46

git friendly. It's just that Git doesn't

74:48

ship with a merge driver for them. So,

74:51

Git only ships with a merge driver for

74:54

uh, line-based text files, but it's fully

74:56

pluggable. And so, you can easily plug

74:59

in one for JSON files instead. And so,

75:02

we wrote one. So, now when you diff, you

75:07

know, when you do a git diff with our

75:08

merge driver, you see cell level diffs.

75:12

If you get a merge conflict, you get

75:14

cell-level merge conflicts.

75:16

The notebook is always openable in

75:19

Jupyter. Um, nbdime did the same thing.

75:22

So two independent implementations of

75:25

this.
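
What a cell-level diff means, as opposed to a line-level diff of the raw JSON, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the merge driver described above and not nbdime's implementation: `cell_sources` and `cell_level_diff` are hypothetical helper names, and the only thing taken from the notebook format is that a notebook is JSON with a `cells` list whose entries carry `cell_type` and `source`.

```python
import difflib
import json


def cell_sources(nb_json):
    """Reduce a notebook to (cell_type, source) pairs, ignoring outputs and counters."""
    nb = json.loads(nb_json)
    cells = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        if isinstance(src, list):  # source may be stored as a list of lines
            src = "".join(src)
        cells.append((cell.get("cell_type", "code"), src))
    return cells


def cell_level_diff(old_json, new_json):
    """Return the non-equal opcodes between two notebooks, one unit per cell."""
    old, new = cell_sources(old_json), cell_sources(new_json)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes
```

Because the comparison unit is the cell's source, re-running a notebook (which changes only outputs and execution counts) produces no diff here, whereas a line-based diff of the file would report changes.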

75:26

>> So yeah, there were problems to solve,

75:28

you know. Um but the solution to it was

75:30

not

75:32

throw away Brett Victor's ideas and make

75:35

people further away from from their

75:37

exploratory tools, but to fix the

75:40

exploratory tools. And I think all

75:43

software developers

75:44

should be using exploratory based

75:47

programming to deepen their

75:49

understanding of what they're working

75:51

with so that they end up with a really

75:54

strong mental model of the system that

75:57

they're building and they're working

75:58

with and then they can come up with um

76:01

better solutions more incrementally

76:03

better tested. I basically never have to

76:05

use a debugger because I basically never

76:07

have bugs and it's not because I'm a

76:09

particularly good programmer. It's

76:10

because I build things up small little

76:13

steps and each step works and I can see

76:16

it working and I can interact with it.

76:18

So there's no room for bugs. You know,

76:20

>> you know, I'm so torn on this because I

76:21

agree with you and I'm also skeptical of

76:24

people who say that organizations they

76:27

they they converge onto ways of doing

76:30

things and they no longer need to

76:31

evolve. They no longer need to adapt.

76:33

You know, innovation is adaptivity,

76:35

right? And we should increase the

76:38

surface area of adaptivity as much as we

76:40

possibly can. So we need people that are

76:41

constantly testing new ideas, finding

76:43

these constraints. But by the same

76:45

token, we need to use the cloud. We need

76:47

to use CI/CD. We need to get this stuff

76:49

into production.

76:50

>> Yeah. So do you abs do but like there's

76:53

absolutely no, like, so nbdev ships with

76:58

out-of-the-box CI integration and

77:02

the like the tests are literally there

77:05

like because the source is a notebook

77:08

the entire exploration of like how does

77:10

this API work you know what does it look

77:12

like when you call it the implementation

77:14

of the functions the examples of them

77:16

the documentation of them the tests of

77:18

them are in one place So it it's much

77:22

easier to be a good software engineer in

77:26

this environment.
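
The point about keeping the implementation, the examples, the documentation, and the tests in one place can be sketched in plain Python. This is an illustration of the idea only and an assumption on my part: it uses the standard library's `doctest` rather than nbdev's own mechanism, and `chunk` is a hypothetical example function. Each commented section stands in for a notebook cell.

```python
import doctest


# Cell 1: implementation, with documentation and examples in the docstring.
def chunk(items, size):
    """Split `items` into consecutive lists of at most `size` elements.

    >>> chunk([1, 2, 3, 4, 5], 2)
    [[1, 2], [3, 4], [5]]
    >>> chunk([], 3)
    []
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Cell 2: exploration, poking at the function to see how it behaves.
print(chunk(list("abcdef"), 4))

# Cell 3: a test that lives right next to the code it exercises.
assert chunk([1, 2, 3], 1) == [[1], [2], [3]]

# Cell 4: run the documented examples as tests, as a CI job could.
parser = doctest.DocTestParser()
examples = parser.get_doctest(chunk.__doc__, {"chunk": chunk}, "chunk", None, 0)
results = doctest.DocTestRunner(verbose=False).run(examples)
print(results)
```

A CI job that executes the whole file, or the whole notebook, fails as soon as any assertion or documented example stops holding.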

77:28

Um so yeah like do do both you know.

77:33

>> So do you remember there was there was

77:34

that existential risk should be an

77:37

urgent priority and it was signed by

77:39

folks like Hinton and and Demis and you

77:42

responded um basically with a rebuttal

77:45

and that was with, um, Arvind, you know,

77:47

the the snake oil guy. Tell me about

77:49

that. Do you think we should be worried

77:51

about AI existential risk?

77:52

>> I mean, that was a certain time, wasn't

77:54

it? And I feel like things have changed

77:57

a bit. Thank God. I feel like we we not

78:01

just me and Arvind, but, um, broadly

78:04

speaking, the community of which we're a

78:05

part kind of probably won that. Now we

78:09

have other problems to worry about. But

78:10

you know basically at that point um

78:14

the prevailing narrative was

78:18

AI is about to become autonomous. It

78:20

could become autonomous at any moment

78:22

and could destroy the world. Uh so very

78:24

much comes from, um, you know, Eliezer

78:28

Yudkowsky's

78:30

>> work which

78:32

>> I think clearly has been shown to be

78:35

wrong at many levels to this point.

78:38

>> They would refute that obviously. Of

78:39

course they would.

78:40

>> Yeah,

78:41

>> it's one of those things that they can

78:43

always refute just like any doomsday

78:44

cult unless you give it a date uh and

78:47

the date passes.

78:48

>> Well, even I've updated a little bit

78:50

in the sense that I I now think I would

78:53

now say that these models can be said to

78:56

be intelligent in restricted domains.

78:57

The ARC challenge showed that. So if you

79:00

place constraints into the problem, you

79:02

you can you can go faster towards a

79:04

known goal. even agency. You you can put

79:06

a planner on there and you can go if you

79:08

if you know where you're going, you can

79:09

go there faster, but that doesn't help

79:11

you. Like you can have all the

79:12

intelligence and agency in the world,

79:13

but if you don't have the knowledge and

79:15

the constraints, then you're going in

79:16

the wrong direction faster. And I think

79:18

they don't seem to appreciate that these

79:20

models don't actually know the world.

79:22

Like none of that was even relevant to

79:24

Arvind's and my point, which was and is that

79:29

it's misunderstanding

79:31

where the actual danger is

79:35

>> which is that when you have a

79:37

dramatically more powerful technology

79:41

entering the world

79:43

uh that can make some people

79:46

dramatically more powerful.

79:48

People who

79:51

are in love with power will seek to

79:54

monopolize that technology

79:57

and the more powerful it is, the more

80:00

strong that urge from those

80:03

power-hungry people will be. So to ignore

80:05

people, so here's the problem. If you're

80:08

like, I don't care about any of that.

80:09

All I care about is autonomous AI taking

80:13

off, you know, singularity, paperclip,

80:16

nano goo, whatever. The obvious solution

80:19

to that is, oh, let's centralize power.

80:22

And this is what we kept seeing,

80:24

particularly at that time. Let's give um

80:27

uh either very rich technology companies

80:29

or the government or both all of this

80:33

power and make sure nobody else has it.

80:36

In my threat model, that's the worst

80:39

possible thing you can do because you've

80:41

centralized the ability to control in

80:44

one place and therefore these people who

80:46

are desperate for power just have to

80:48

take over that thing.

80:50

>> Could we distinguish though what you

80:51

mean by power? Because we've we've just

80:53

spent some of this conversation talking

80:54

about how it's not actually as powerful

80:56

as people think it is.

80:57

>> But I but I'm but I'm not even that's

80:59

what but mine is an even if thing,

81:01

right? So, like I I I'm just saying even

81:04

if it turns out to be incredibly

81:07

powerful, right? Like I don't I don't

81:09

even want to argue about whether it's

81:11

going to be powerful because that's

81:14

speculative.

81:16

Even if it's going to be incredibly

81:17

powerful, you still shouldn't centralize

81:20

all of that power in the hands of one

81:22

company or the government.

81:25

>> Yeah. Because if you do, all of that

81:28

power is going to be monopolized by

81:30

power-hungry people and used to

81:36

destroy

81:37

civilization. Basically, you'll end up

81:39

with a case where all of that wealth and

81:43

power will be centralized with the kinds

81:46

of people who who want it centralized.

81:48

So like

81:51

society for hundreds of years has faced

81:54

this again and again and again you know

81:56

so when it's like you know writing

82:00

used to be something that only the most

82:05

exclusive people had access to knowing

82:08

about writing.

82:09

And the same arguments were made. If you

82:13

let everybody write, they're going to

82:15

use it to write things that we don't

82:17

want them to write and it's going to be

82:19

really bad. You know, ditto with

82:21

printing, ditto with the vote. Like,

82:27

and again and again, society has to

82:29

fight against this natural predilection

82:31

of the people that have the status quo

82:33

power to be like, no, this is a threat.

82:38

So when we're saying like, okay, what if

82:42

AI turned out to be incredibly powerful,

82:46

would it be better for society for

82:48

that to be kept in the hands of a few or

82:52

spread out across society?

82:55

>> My argument was the latter. Now, there's

82:59

also an argument which is like, don't

83:01

worry about it. It's not going to be

83:02

that powerful anyway.

83:04

I I just didn't want to go there because

83:08

it's not

83:11

an argument that's easy to win because

83:13

you can't really say what's going to

83:15

happen.

83:17

We're all just guessing. But I can very

83:19

clearly say like, well, if it happens,

83:21

would it be a really good idea to only

83:24

let Elon Musk have it or would it be a

83:27

good idea to only let Donald Trump have

83:28

it?

83:29

>> Dan Hendrycks spoke about this offense

83:31

defense asymmetry. So it's actually very

83:34

important for us to have countervailing

83:36

um you know defenses. But let's just

83:38

take that as a given for a minute

83:40

because obviously when we look at

83:41

something like Meta and Facebook, it's

83:43

quite clear what the power imbalance is.

83:45

You know, they they control all of our

83:47

data. They they know what we're doing

83:48

with with something like OpenAI and

83:50

Claude. So it's it's not as good as we

83:53

thought it was because actually humans

83:55

still need to be involved. But for

83:56

example, they have all of our data,

83:58

right? and you might be working on some

84:00

new innovative technology and you're

84:02

using Claude and you're sending all of

84:03

your information up there and they can

84:05

now copy you. I mean what what kind of

84:07

risks are you talking about to be more

84:09

concrete?

84:09

>> Yeah. No, I mean so I was not talking

84:11

about any of those things, right? So at

84:12

the time I was talking about this

84:13

speculative question of what if AI gets

84:16

incredibly powerful? I mean like like

84:18

now for example they they say that this

84:21

is the new means of production and

84:23

that's that seems completely hyperbolic

84:25

to me but like in your best estimation

84:27

now if there are risks what are they

84:30

>> if there are risks with the current

84:31

state of technology I mean I think some

84:33

of them are the ones we've discussed

84:35

which is

84:37

people enfeebling themselves

84:40

by basically losing their ability to be

84:44

to become more competent over time.

84:47

That's that's that's the big risk I

84:49

worry about the most. Um

84:53

the privacy risk, it's there, but I'm

84:57

not sure it's much more there than it

84:59

was for Google and Microsoft before.

85:01

Like you know, you used to work at

85:03

Microsoft, you know, how much data they

85:06

have about the average uh Outlook,

85:09

Office, etc. user uh ditto for Google,

85:13

you know, the average Google Workspace

85:14

or Gmail user.

85:17

um those privacy issues are real

85:20

although I think there are bigger

85:22

privacy issues around these companies

85:25

which the government can outsource data

85:27

collection to so back in the day it used

85:30

to be companies like ChoicePoint and

85:32

Acxiom. Nowadays it's probably more

85:33

companies like Palantir.

85:36

the US government is um actually

85:39

prohibited from building large databases

85:42

about US citizens, for example, but it's

85:46

not prohibited. Companies are not

85:47

prohibited from doing so, and the

85:49

government's not prohibited from

85:51

contracting things to those companies.

85:53

So, I mean, that's a huge worry, but I

85:55

don't think it's one that AI is uniquely

85:59

creating. It certainly you're in the UK,

86:02

as you know, in the UK surveillance has

86:04

been universal for quite a while now. It

86:07

certainly makes it easier to use that

86:10

surveillance, but a sufficiently

86:14

well-resourced organization could just

86:16

throw a thousand bodies at the problem.

86:19

Um, so yeah, I'm not sure these are new

86:22

privacy problems as maybe more

86:26

common ones than they used to be.

86:28

>> Yeah,

86:29

>> Jeremy, I've just noticed the time. I

86:30

need to get to the airport.

86:31

>> All right,

86:32

>> this has been amazing.

86:33

>> Thank you, sir. Thank you for coming.

86:35

>> Yeah.

86:35

>> Hope you had a nice trip. Thank you so

86:36

much.

Interactive Summary

This video discusses the current state and future implications of AI in software development and beyond. It features a conversation with Jeremy Howard, who shares his insights on the ULMFiT model, the importance of fine-tuning, and the limitations of current LLMs. The discussion touches upon the nature of understanding in AI, creativity, the dangers of over-reliance on AI for coding, and the potential for AI to erode human expertise. It also explores the concept of 'vibe coding' and its addictive, yet potentially detrimental, nature. The conversation highlights the difference between coding and software engineering, emphasizing that true software engineering requires a deeper understanding and intuition that AI currently lacks. The importance of human-AI collaboration is stressed, with the need for environments that foster growth and understanding rather than dependence. Finally, the discussion turns to AI existential risks, arguing that the greater danger lies in the monopolization of powerful AI by a few, rather than the AI itself becoming autonomous and destructive. The conversation concludes by reinforcing the idea that AI should be a tool to augment human capabilities and knowledge, not replace them.