The Dangerous Illusion of AI Coding? - Jeremy Howard
It literally disgusts me. Like, I
literally think it's inhumane. My
mission remains the same as it has been
for like 20 years, which is to stop
people working like this.
>> Jeremy Howard, a deep learning pioneer,
a Kaggle grandmaster. He is a huge
advocate for actually understanding what
we are building through an interactive
loop, a notebook, a REPL, the act of
poking at a problem until it pushes
back. He argues this is where the real
insight happens. And the funny thing is
they're both right. LLMs
cosplay understanding things. They
pretend to understand things. No one's
actually creating
50 times more high-quality software than
they were before. Um, so we've actually
just done a study of this and there's a
tiny uptick tiny uptick in what people
are actually shipping. The thing about
AI based coding is that it's like a slot
machine in that you have an illusion
of control. You know, you get to
craft your prompt and your
list of MCPs and your skills and
whatever, but in the end you
pull the lever, right? Here's a piece of
code that no one understands.
>> Yeah.
>> And am I going to bet
my company's product on it?
And the answer is I don't know, because
I don't know
what to do now, because no one's
been in this situation. They're
really bad at software engineering.
And I think that's possibly
always going to be true. The idea that a
human can do a lot more with a computer
when the human can like manipulate the
objects inside that computer in real
time and study them and move them around
and combine them together. Whoever you
listen to, you know, whether it be
Feynman or whatever, you
always hear from the great scientists
how
they build deeper intuition by
building mental models which they get
over time by interacting with the things
that they're learning about.
A machine could kind of build an
effective hierarchy of abstractions
about what the world is and how it works
entirely through
looking at the statistical correlations
of a huge corpus of text using a deep
learning model. That was my premise.
This video is brought to you by Nvidia
GTC. It's running March the 16th until
the 19th in San Jose and streaming free
online. The key topics this year are
agentic AI and reasoning, high
performance inference and training, open
models, and physical AI and robotics.
I'm so excited about the DGX Spark. I've
been on the waiting list for over a year
now. It's a personal supercomputer
that is about the size of a Mac Mini.
It's the perfect adornment to a MacBook
Pro, by the way. And you can fine-tune a
70 billion parameter language model with
one of these things. And I'm giving one
away for free. All you have to do is
sign up to the conference and attend one
of the sessions using the link in the
description. As for the sessions, I'm
interested in attending Aman Sanger's
talk. So, he's the CTO of Cursor and his
session is "Code with Context: Build an
Agentic IDE That Truly Understands Your
Codebase." Now, obviously, Jensen's
keynote is on March the 16th. He said
he's going to unveil a new chip that
will surprise the world. Their next
generation architecture, Vera Rubin, is
already in full production. And there's
speculation we might even get an early
glimpse of their new Feynman
architecture. So don't forget folks, the
link is in the description. If you're
attending virtually, it's completely
free. Don't miss it. Jeremy Howard,
welcome to MLST.
>> I mean, welcome to my home. Thanks for
coming.
>> Yeah. Well, where are we now?
>> We are in beautiful Moreton Bay in
southeast Queensland. We are by the sea
um in my backyard.
>> The weather didn't disappoint.
>> It certainly didn't. It doesn't often,
but if you were here yesterday, it would
have been very different.
>> Well, I don't know where to start. So,
I've been a huge fan probably
since about 2017-18. Of course, you had
the famous ULMFiT paper. And when I
was at Microsoft, I remember doing a
presentation about that because it was
actually I mean now we take it for
granted that we fine-tune language
models on a corpus of text and then we
kind of like continue to train them and
specialize them. But apparently this was
not received wisdom.
>> No, this was the first time it happened.
Yeah, kind of the first or second. Uh,
so Quoc Le and Andrew Dai had done
something a few years ago, but they had
missed the key point, which is the thing
you pre-train on has to be a general
purpose corpus.
>> So, no one quite realized this key
thing. And maybe I had a bit of fortune
here that my background was in
philosophy and cognitive science. And
so, I'd spent some decades thinking
about this.
>> The technical architecture of
ULMFiT. Just sketch that out.
>> I'm a huge fan of regularization. And
I'm a huge fan of taking a model that's
incredibly flexible and then making it
more constrained not by decreasing the
size of the architecture but by
adding regularization. So even that at
the time was
extremely controversial. Uh but that was
by no means a unique insight of of ours.
So what Stephen Merity had done is he'd
taken the extreme flexibility of an
LSTM, a kind of very classic stateful
recurrent neural net, which
things are kind of gradually heading
back towards nowadays, and added five
different types of regularization. He
added every type of regularization you
can imagine. And then um that was my
starting point was to say okay I now
have a massively flexible deep learning
model that can be as powerful as I want
it to be and it can also be as
constrained as I need it to be. And then
I needed a really big corpus of text.
Funnily enough, this is also Stephen. He
had been at Common Crawl and uh I think
he helped or made the uh Wikipedia data
set. And then I realized actually the
Wikipedia data set made lots of
assumptions. It had all these like unk
for unknown words because it all assumed
classic NLP approaches. So I redid the
whole thing, created a new Wikipedia
data set and that was my general corpus
and then I used an AWD-LSTM and trained it. So
it was actually overnight. So for eight
hours on a gaming GPU, you know, um
because I was at the University of San
Francisco, we didn't have heaps of
resources. Um probably like a 2080 Ti or
something, I suspect. And then the
next morning when I woke up,
it's the same three-stage architecture
that we do today, you know,
pre-training, mid-training,
post-training. So then I figured, okay,
now that I've trained something to
predict the next word of Wikipedia, it
must know a lot about the world. I
then figured if I then fine-tune it on a
corpus specific, so what we could now
call supervised fine-tuning um data set,
which in this case was a data set of
movie reviews, it would become
especially good at predicting the next
word of those. So it would learn a lot
about movies. uh did that for like an
hour and then like a few minutes of
fine-tuning
the downstream classifier which was a
classic academic uh data set kind of
considered the hardest one which was to
take like 5,000-word movie reviews and to
say like was this a positive or negative
sentiment which today is considered easy
but at that time you know the only
things that did it quite well were
highly specialized models that people
wrote their whole PhDs on. I beat all of
their results, you know, 5 minutes later
when I fine-tuned that model. It
was uh amazing.
>> And the other interesting thing is this
kind of um methodology around how you do
the fine-tuning.
>> Yeah. So how we do the fine-tuning
was something we had developed at
fast.ai. So this is kind of year one of
fast.ai. So this is still in our very early
days. And one of the extremely
controversial things we did was we felt
that we should focus on fine-tuning
existing models because we thought
fine-tuning was important. Uh some other
folks were doing work contemporaneously
with that. So Jason Yosinski did some
really great uh research I think it's
during his PhD on how to fine-tune
models and how good they can be and some
other folks in the in the computer
vision world.
We were, you know, amongst the first.
There's a bunch of us kind of really
investing in fine-tuning. And so, yeah,
we we felt that using a single learning
rate to fine-tune the whole thing all at
once made no sense because the different
layers have different behaviors. And
this is one of the things Jason
Yosinski's research also showed. We
developed this idea of like, well, it's
also way faster if you just train the
last layer, right? because it only has
to backprop the last layer, and then once
that's pretty good, backprop the
last two, and then the last three, and
then we use something called
discriminative learning rates. So
different layers we would give different
learning rates to and then another
critical insight that no one realized
for years even though we had told
everybody was that you actually have to
fine-tune every batch norm. So all the
normalization layers you do actually
have to fine-tune
because that's moving the whole thing up
and down or changing its scale. So yeah,
when you do that, you can often just
fine-tune the last layer or two. And we
found that actually with ULMFiT,
although we did end up unfreezing all
the layers, only the last two were
really needed to get the close to
state-of-the-art result. So it like took
like seconds.
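The fine-tuning recipe described above (train only the last layer first, unfreeze progressively, give each layer group its own learning rate, and keep normalization layers trainable throughout) can be sketched in a few lines of plain Python. This is a minimal sketch and not the fastai implementation: the function names are hypothetical, and the per-group decay factor of 2.6 is taken from the ULMFiT paper rather than from this conversation.

```python
def discriminative_lrs(n_groups, top_lr, decay=2.6):
    """One learning rate per layer group, ordered input -> output.

    The last group gets top_lr; each earlier group gets the next
    group's rate divided by `decay` (2.6 is an assumed default,
    borrowed from the ULMFiT paper).
    """
    return [top_lr / decay ** (n_groups - 1 - i) for i in range(n_groups)]


def unfrozen_groups(n_groups, stage):
    """Gradual unfreezing: at stage 0 only the last group trains,
    at stage 1 the last two, and so on until everything trains."""
    n_unfrozen = min(stage + 1, n_groups)
    return [i >= n_groups - n_unfrozen for i in range(n_groups)]


def is_trainable(group_is_unfrozen, is_norm_layer):
    """Normalization layers stay trainable even inside frozen groups."""
    return group_is_unfrozen or is_norm_layer


if __name__ == "__main__":
    # Show which groups train at each stage, and the rate each would use.
    for stage in range(3):
        print(stage, unfrozen_groups(3, stage), discriminative_lrs(3, 1e-2))
```

In a real framework the same idea would map onto per-parameter-group optimizer settings; the list-of-floats form here is only meant to make the schedule visible.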
>> Yeah. The discriminative
learning rate thing is interesting
because I think the received wisdom at
the time was when you fine-tune a model,
if the learning rate is too high, you
kind of blow out the representations. So
I guess the wisdom was if you don't
have a really low learning rate, you'll
just destroy the representations.
>> I mean, there was no received wisdom
because nobody talked about it. No one
cared, you know. It was just,
like, nearly no one cared. Transfer
learning was just not something anybody
thought about. And Rachel and I felt
like it matters more than anything, you
know, because
only one person has to train a really
big model once and then the rest of us
can all fine-tune it. Um, so we thought
we just should learn how to do that
really well. So we um
spent a lot of time just trying lots of
things, but in the end the intuition was
pretty straightforward and what
intuitively seemed like it ought to work
basically always did work. Which is
another big difference between how
people still today tend to do ML
research is they think it's all about
ablations and you can't make any
assumptions or guesses. And it's not at
all true. I find nearly everything that
I expect to work almost always works
first time because I spend a lot of time
building up those intuitions, that
kind of understanding of how gradients
behave.
>> I think there's a dichotomy though
between continual learning, which is when
we want to keep training the thing but
maintain generality, versus fine-tuning a
thing to do something specific.
There's always been this idea that yes,
you can make a model specific, you can
bend it to your will, but you lose
generality and you kind of degrade the
representation. So tell me about that.
>> Yeah, there's some truth in that,
although not as much as you might think.
On the whole, the big problem is that
people don't actually look at their
activations and don't actually look at
their gradients. So something we do in
our software, in our fastai software, is
we have built into it this ability to
see at a glance
what your entire network looks like. And
once you've done it a few times (it just
takes a couple of hours to learn), you can
immediately see, oh, I see, this is
overtrained or undertrained, or at this
layer something went wrong. It's
not a mystery you know. So basically
what happens is for example you end up
with dead neurons that go to a
point where they've got zero
gradient regardless of what you do with
them. That often happens if they
head off towards infinity. You can
always fix that. So yeah it's it's not
as bad as people think by any means.
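A minimal sketch of the kind of check being described, assuming ReLU units (a unit whose pre-activation is never positive outputs zero and receives zero gradient for every example). This is not the fastai tooling mentioned above; `dead_unit_fraction` is a hypothetical helper that works on plain lists so the idea is visible without any framework.

```python
def dead_unit_fraction(pre_activations):
    """Fraction of units that are 'dead' on this batch.

    pre_activations: list of rows, one per example; each row holds the
    pre-ReLU value of every unit in one layer. A unit counts as dead
    when its value is <= 0 for every example, so its ReLU output and
    its gradient are zero across the whole batch.
    """
    n_units = len(pre_activations[0])
    dead = sum(
        all(row[j] <= 0.0 for row in pre_activations)
        for j in range(n_units)
    )
    return dead / n_units


if __name__ == "__main__":
    batch = [
        [1.0, -2.0, 0.5, -0.1],
        [0.3, -1.0, -0.4, -3.0],
    ]
    # Units 1 and 3 never go positive on this batch.
    print(dead_unit_fraction(batch))
```

Running a check like this per layer gives a quick view of where a network has stopped learning.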
Something that trains well for
continuous learning, when done properly,
can also be trained well for
a particular task if you're careful.
>> In a sense you do want the neurons to die, and
I'll explain what I mean by this.
Like, we want to bend the behavior of
models to introduce implicit constraints,
because without constraints there is no
creativity, there is no reasoning, and
so on and so forth. So in a sense you
actually want it to say, "Don't do that."
You want it to do something else.
>> I don't think of it that way. Like to
me, it's more like I find thinking about
humans extremely helpful when it comes
to thinking about AI. I find they
behave more similarly than differently.
Um, and my intuition about each tends to
work quite well. You know, with a human,
when you learn something new, it's not
about unlearning something else. And so
something I always found is when I got
models to try to learn to do two
somewhat similar tasks, they almost
always got better at both of them than
one that only learned one of them.
>> I was reminded a little bit of, you
know, the DINO paper from LeCun. So this
whole kind of regime of self-supervised
learning. I mean,
that was a vision model,
but, you know, the idea was, okay, so
we're doing pre-training and we want to
maintain as much diversity and fidelity
as possible so that when we do the
downstream task we've got
more things that we can latch on to.
>> Yeah. Yeah. And um
you know semi-supervised and
self-supervised learning was such an
unappreciated area and yeah, Yann LeCun was
absolutely one of the guys who was also
working on it.
>> I actually did a post because I was so
annoyed at how few people cared about
semi-supervised learning. I did a whole
post about it years ago. Yann LeCun
looked at it for me as well and you know
suggested a few other pieces of work
that I had missed. But I was
kind of surprised at, you know,
how incredibly useful it is to basically
come up with a
pretext task, right? So in vision, we
did this in vision before ULMFiT. So it
was like in medical imaging, you know,
take a histology slide and
predict
you know, mask out a few squares and
predict what used to be there. So, some
of my students at USF I had doing stuff
with that. It was basically entirely
taking stuff that we and others had
already done in vision.
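The masking pretext task carries over directly from image squares to words. As a minimal sketch (the `[MASK]` placeholder, the 15% default masking rate, and the function name are all assumptions made for illustration, not details from the conversation), building one self-supervised training example might look like this:

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Hide some tokens and keep the originals as prediction targets.

    Returns (inputs, targets): inputs has MASK where a token was hidden;
    targets holds the hidden token at those positions and None elsewhere,
    so a model is only scored on what it had to reconstruct.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


if __name__ == "__main__":
    words = "mask out a few squares and predict what used to be there".split()
    print(make_masked_example(words, mask_prob=0.3))
```

No labels are needed: the data supplies its own targets, which is what makes the task usable on any large unlabeled corpus.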
>> Yeah.
>> So, like this idea of masking out
squares, we didn't invent it. Masking
out words was the obvious thing, you
know, and this idea of um gradually
unfreezing layers we had done before in
computer vision. The whole idea of
starting with a pre-trained model that
was general purpose had been in computer
vision. There was a really classic paper
actually in computer vision, it might
have been around 2015, that was entirely an
empirical paper saying look what happens
when we take a pre-trained ImageNet model
and use it for predicting what sculptor created this
sculpture or predicting what
architecture style this is and like in
every task it got the state-of-the-art
result. And it really surprised me.
People didn't look at that and think
like, I bet that ought to work in every
other area as well, whether it be genome
sequences or language or whatever. But
people have a bit of a lack of
imagination. I find they tend to assume
things only work in one particular
field. Um, that's really true.
>> Yeah. I mean, I guess there's two things
there. I mean, first of all, we were
kind of hinting at this notion of almost
Goodhart's law or the shortcut rule that
you get exactly what you optimize for at
the cost of everything else. But that
doesn't seem to be the case because we
can you know optimize for perplexity in
the case of language models and as you
say what seems to happen is we're
getting into the distributional
hypothesis here a little bit. So, you
know, you know the word by the
company it keeps. So when we have an
incredible amount of associative data it
might be masked autoencoder prediction or any
of these things like that. The model
seems to build something that we might
call an understanding like
>> Or I have always thought of it as a
hierarchy of abstractions. You know,
if it's going to predict,
you know, if the document is
"here was the
opening that
Bobby Fischer used" and
it has chess notation, to predict the
next thing, it needs to know something
about chess notation or at least
openings. If it's like, you know,
"and this was vetoed by the 1956 US
president," you need to know, like, you
don't just need to know
who the president was but the idea that
there are presidents and therefore that
the idea that there are leaders and
therefore the idea that there are groups
of people who have hierarchies and
therefore that there are people and
therefore that there are objects and
like you can't predict the next word of
a sentence
well without knowing all of these
things. So
my hypothesis for why I created
ULMFiT was to say: to
compress that as well as possible, to get
that knowledge, it would have to create
these abstractions, these hierarchies of
abstractions somewhere deep inside its
model. Otherwise, how could it possibly
do a good job of predicting the next
word, you know? And because deep learning
models are universal learning machines,
you know, and we had a universal way to
train them, I figured
if we get the data right and if the
hardware is good enough, then in theory,
we ought to be able to build that next
word predicting machine, which ought to
implicitly build a hierarchical
structural understanding of the things
that are being described by the text
that it is learning to predict.
>> I think that they can know in quite a,
you know, they know in quite a
superficial way. So there's a myriad of
surface statistical relationships and
they generalize extraordinarily well.
It's miraculous.
>> It is.
>> But the thing is I want to contrast this
with other comments you've made about
creativity. So I think knowledge is
about constraints and I think creativity
is the evolution of knowledge,
respecting those constraints. Therefore,
AI is not creative. And you've said
the same thing. You've said AI isn't
creative. So on the one hand, how
can you say that they know and not think
that they can be creative?
>> I mean, I don't think I've used that
exact expression. You know,
actually I remember
chatting with Peter Norvig on camera and
both of us said, well, actually, they
kind of are creative like we just got to
be a bit careful about our choices of
words, I guess. So, you know, Piotr
Woźniak, who's a guy I really, really
respect, who kind of rediscovered spaced
repetition learning, built the SuperMemo
system, and is the modern-day guru of
memory. The entire reason he's based his
life around remembering things is
because he believes that creativity
comes from having a lot of stuff
remembered which is to say putting
together stuff you've remembered in
interesting ways is a great way to be
creative. LLMs are actually quite good
at that, but there's a kind of
creativity they're not at all good at,
which is, you know, moving outside the
distribution. So,
>> uh, which I think is where you're
heading with your question. Um, but I'm
just kind of I'm framing it this way to
say you have to be so nuanced about this
stuff because if you say like they're
not creative, it can give
you the wrong idea because they can do
very creative seeming things.
But if it's like well can they really
extrapolate outside the training
distribution
the answer is no they can't
>> but the training distribution is so big
and the number of ways to interpolate
between them is so vast
we don't really know yet what the
limitations of that is but I see it
every day, you know, because my work
is R&D. I'm constantly on the edge of and
outside the training data. I'm doing
things that haven't been done before.
And there's this weird thing, I don't
know if you've ever seen it before, but
I see it multiple times
every day, where the LLM goes from being
incredibly clever
to, like, worse than stupid, like not
understanding the most basic fundamental
premises about how the world works.
>> Yeah.
>> And it's like, oh, whoops. I fell
outside the training data distribution.
It's gone dumb. And then like there's no
point
having that discussion any further
because
>> yes,
>> you know, you've lost it at that point.
>> Yes. I mean, I love, you know,
Margaret Boden. She had this kind of
hierarchy of creativity. So there's like
combinatorial, exploratory, and
transformative, and the models can
certainly do combinatorial creativity,
but for me it's all about constraints.
I mean, this is what Boden said,
and even Leonardo da Vinci, he said that
creativity is all about constraints. And
you've spoken about, you know, we'll
talk about this dialogue engineering but
what happens is when when we talk with
language models it's a specification
acquisition problem. So we go back and
forth and actually when we think the
process of intelligence is about
building this imaginary Lego block in
our mind and respecting various
constraints and when you respect those
constraints and you just continue to
evolve then those things are said to be
creative. So language models when you
add constraints to them so this could be
via supervision via critics via
verifiers, then they are creative. And
with AlphaEvolve we've seen many examples
of this. But the illusion is that on their own,
sans constraints (obviously they have
this behavioral shaping stuff that we're
talking about), they don't have hard
constraints, and that's why they can't go
outside their distribution.
>> I mean, I
think they can't go outside their
distribution because it's just something
that that type of mathematical
model can't do, you know. It can do it,
but it won't do it well. You know, when
you look at the kind of 2D case of
fitting a curve to data, once you go
outside the area that the data covers,
the curves disappear off into space in
wild directions, you know, and that's
all we're doing. But we're doing it in
multiple dimensions. You know, I think
Boden might be pretty shocked at how
far compositional creativity can go when
you can compose
the entirety of the human knowledge
corpus.
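The 2D curve-fitting picture can be made concrete with a small sketch. It fits a single polynomial exactly through six points whose values alternate between 0 and 1, then evaluates that curve inside and outside the range the data covers. Polynomial interpolation is a stand-in chosen for illustration, not a claim about how neural networks behave in detail, and `lagrange_fit` is a hypothetical helper.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial passing through every (x, y) pair,
    built with the Lagrange interpolation formula."""
    def curve(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return curve


# Training data: x in [0, 5], y always between 0 and 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 0, 1, 0, 1]
curve = lagrange_fit(xs, ys)

inside = curve(2.5)   # between data points: stays in a sensible range
outside = curve(10)   # far beyond the data: nothing constrains the curve

if __name__ == "__main__":
    print(inside, outside)
```

Inside the data range the curve stays close to the data (0.5 at x = 2.5), while at x = 10 it has shot up to roughly 2752, even though every training value lies between 0 and 1.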
Um, and I think this is where people
often get confused because it's like, so
for example, I was talking to Chris
Lattner yesterday about
how
Anthropic, you know, had
got Claude to write a C compiler, and
they were like, "Oh, this is a clean
room C compiler. You can tell it's clean
room because it was created in Rust," you
know. And so Chris created the kind
of,
you know, I guess it's probably the
most widely used C/C++ compiler nowadays,
Clang, on top of LLVM, which is the
most widely used kind of foundation for
compilers. They were like, "Oh,
well, Chris didn't use Rust. This is,
you know, and we didn't give it access
to any
compiler source code, so it's a clean
room implementation."
But that misunderstands
how LLMs work, right? which is all of
Chris's work was in the training data
many many times LLVM is used widely and
lots and lots of things are built on it
um including lots of C and C++ compilers
converting it converting it to Rust is
an interpolation
between
parts of the training data you know it's
a style transfer problem um so it's
definitely compositional creativity at
most if you can call it creative at all
and you actually see it when you look at
the the repo that it created, it's
copied
parts of the LLVM code, which today
Chris says like, "Oh, I made a mistake.
I shouldn't have done it that way.
Nobody else does it that way." You know,
oh wow, look, they're the only other one
that did it that way. That doesn't
happen accidentally. That happens
because you're not actually being
creative. you're actually just finding
the kind of nonlinear average point in
your training data between like Rust
things and building compiler things.
>> All of that is true. I mean first of all
I think we shouldn't underestimate the
size of how big this combinatorial
creativity is. So all of that is true.
So the code is on the internet but also
they had a whole bunch of tests which
were scaffolded which meant that every
single time some code was committed
they could run the tests, and
they basically had a critic and they
could then do this autonomous feedback
loop. So in a sense it's very similar
to the recent research by OpenAI and
Gemini where you're
trying to solve a problem in math and
you already have an evaluation function.
The same on the ARC Prize, right? You
have an evaluation function and what
people discount is even knowledge of
what the evaluation function is is
partial knowledge of the problem. So you
can then brute force search. You can use
the statistical pattern matching, use
the verifier as a constraint and you can
actually
>> And they don't even need to do that,
right? Like, you literally already
know how to pass those tests because
there's lots of software that already
does it.
>> So, it just uses that and translates
them to Rust. Like that's that's all it
did. Um which is impressive.
>> Yeah.
>> Um and if you I'm much less familiar
with math than I am computer science,
but from talking to mathematicians,
they tell me that that's also what's
happening with like Erdős problems and
stuff. It's some of them are
newly solved.
>> Yeah.
>> Um
but they are not
sparks of insight. You know, they're
solving ones that you can solve by
meshing up together
very closely related things that humans
have already figured out.
>> So on the subject of Claude code, now I
know you've spoken extensively about
vibe coding. Um actually Rachel had some
interesting work out. I mean, she
quoted the METR study, which showed
that productivity actually went down
when people were vibe coding but I think
>> and they thought that they went up which
is the most interesting
>> And then also there was the Anthropic
study. I mean, you know, maybe we should
rewind a little bit. I mean, Dario had
this essay out the other day, I think
it was called "The Adolescence of
Technology" or something like that, and
he was basically saying, look, you
know, we have all of these amazing
software engineers at Anthropic and they
are just so productive and he was
extrapolating to the average software
engineer so there's going to be mass
unemployment because soon we're going to
be able to automate all of this with AI.
>> I mean, it doesn't make any sense.
Elon Musk said something a bit similar a
few days ago saying like, "Oh, LLMs will
just spit out the machine code directly.
We won't need libraries, programming
languages."
>> Yeah.
>> Um yeah, look, the thing is none of
these guys have
have been software engineers recently.
I'm not sure Dario's ever been a
software engineer at all. Software
engineering is an unusual discipline and
a lot of people mistake it for being the
same as
typing code into an IDE. Coding is
another one of these style transfer
problems. You take a specification
of the problem to solve and you can use
your compositional creativity to find
the parts of the training data which
interpolated between them solve that
problem and interpolate that with
syntax of the target language and you
get code. There's a very famous essay by
Fred Brooks written many decades ago,
"No Silver Bullet," in which it almost
sounded like he was talking about today.
He was specifically
responding to something
very similar which is in those days it
was all like oh what about all these new
fourth generation languages and stuff
like that you know we're not going to
need any coders anymore any software
engineers anymore because software is
now so easy to write anybody can write
it Um,
and he said, well,
he guessed that you could get at maximum
a 30% improvement.
He specifically said a 30% improvement
in the next decade, but I don't think he
needed to limit it that much because the
vast majority of work in software
engineering isn't typing in the code.
>> Yeah. Um,
so in some sense parts of what Dario
said were right, just like for quite a
few people now most of their code is
being typed by a language model. Um,
that's true for me. Uh, say like maybe
90%.
Um, but it hasn't made me that much more
productive. um because that was never
the slow bit. It's also helped me with
kind of the research a lot and figuring
out, you know, which files are going to
be touched.
But anytime I've made any attempt at
getting an LLM to like design
a solution to something that hasn't been
designed lots of times before, it's it's
horrible because what it actually every
time gives me is the design of something
that looks on its surface a bit similar.
And often that's going to be an absolute
disaster because things that look on
their surface a bit similar and like I'm
literally trying to create something new
to get away from the similar thing. It's
very misleading. First of all, I'm I'm
exasperated by what I see as the tech
bro predilction to misunderstand
cognitive science and philosophy and
what not because we we've spoken to so
many really interesting people on MLST
like for example Cesar Hadalgo he wrote
this book the laws of knowledge and and
even Marva Chirama she's a a philosopher
of neuroscience and she was talking all
about you know like you know basically
that knowledge is protean. So yeah, I
think that that knowledge is
perspectival I don't think that
knowledge can be this abstract
perspective free thing that can exist on
Wikipedia and um I also think that
knowledge is is embodied and it's alive.
It's it's something that exists in in us
and the purpose of an organization is to
preserve and evolve knowledge. So when
you start delegating cognitive tasks to
language models, you actually have this
weird paradoxical effect that you erode
the knowledge inside the organization.
>> Well, that's true and that's terrifying.
There are often these arguments
online
>> between people who are like, "LLMs don't
understand anything. They're just
pretending to understand."
>> And then other people are like, "Don't
be ridiculous. Look what this LLM just
did for me." Right? And the funny thing
is they're both right. LLMs
cosplay understanding things. They
pretend to understand things.
And this was the interesting thing about
the early kind of
cognitive science work with, like, Daniel
Dennett. That's basically what the
Chinese room experiment is, right? Is
you've got a guy in a room who can't
speak Chinese at all, but he sure looks
like he does because you can feed in
questions and he gives you back answers,
but all he's actually doing is looking
up things in a huge array of books or
machines or whatever. The difference
between pretending to be intelligent and
actually being intelligent is entirely
unimportant as long as you're in the
region in which the pretense is actually
effective, you know. So, so it's
actually fine for a great many tasks
that LLMs only pretend to be intelligent
because for all intents and purposes, it
just doesn't matter until you get
to the point where it can't pretend
anymore. And then you realize like oh my
god this thing's so stupid.
>> I'm a fan of Searle, by the way. So, you know,
he said that, you know, understanding
is causally reducible but ontologically
irreducible and he was saying there was
a phenomenal component to understanding
but you don't even need to go there.
Like the interesting thing about
knowledge being protean is this idea
that, you know, it's basically this
Kantian idea: the world is a complex place.
None of us understand it. It's like the
blind men and the elephant. We all have
different perspectives. It's very
complex thing. And so we all
do this kind of modeling. But the
interesting thing is that the language
models sometimes they seem to understand
and they understand because the
supervisor places them in a frame. So
inside that frame, so when you have that
perspective of the elephant, they're
actually surprisingly coherent, but we
discount the supervisor placing the
models in that frame.
>> Yeah. Yeah. So Searle versus
Dennett
was what everybody was talking about
back when I was doing my
undergrad in philosophy, you know. So I
think Consciousness Explained came out
about then, probably Chinese room a
little bit before.
um it's interesting because the
discussions were the same discussions
we're having now but they've gone from
being abstract discussions to being real
discussions
It's helpful if people go back to the
abstract discussions because
it helps you get out of your,
you know, it's very distracting at the
moment to look at something that's
cosplaying intelligence so well and go
back to the fundamental question. Um, so
anyway, I just wanted to mention that's
kind of it's it's this interesting
situation we're now in where it's very
easy
to
really get the wrong idea about what AI
can do. Um, particularly when you don't
understand the difference between coding
and software engineering.
>> Yeah. Which then takes me to your point
or your question about the implications
of that
for organizations.
>> Yeah.
>> You know, a lot of organizations are
basically betting their futures on a
speculative premise
which is that
AI is going to be able to do everything
better than humans
uh or at least everything in coding
better than humans. I I worry about this
a lot both for the organizations and for
the humans. You know, for the humans
when you're not actively using your
design and engineering and coding
muscles, you don't grow. You might even
wither, but you at least don't grow. And
you know, speaking as the CEO of an R&D
startup, you know, if my staff aren't
growing, then we're going to fail. You
know, uh that we can't let that happen.
And getting better at the particular
prompting skills, or whatever details of the
current generation of AI CLI frameworks,
isn't growing, you know. That's
as helpful as learning about
the details of some AWS API when you
don't actually understand how the
internet works. You know, it's
not reusable knowledge, it's ephemeral
knowledge.
So like if you wanted to, you can
actually use it as a learning
superpower,
but also
it can do the opposite. You know, the
natural thing it's going to do is
remove your competence over time.
>> I agree that that's the natural thing.
So and this is especially pertinent for
you because your your career has been
around basically educating people to
get, you know, technology and AI
literacy. So the default behavior is
very similar to a self-driving car that
you know there's this tipping point
where at some point you're not engaged
anymore. You're not paying attention and
you get this delegation of competence
and you get understanding debt. That's
the default thing. So this study from
Anthropic a couple of weeks ago, it
contradicted Dario completely because it
even said that yeah, there were a few
people in the study that were asking
conceptual questions that are actually
kind of, you know, keeping on top of
things and they had a gradient of
learning, but most people didn't. And my
hypothesis about that is, you know, the
ideal situation for Gen AI coding is
that like us, we've been writing
software for decades. We already have
this abstract understanding. We're using
it in domains that we know well and we
can specify, we can remove loads of
ambiguity. we can track and we can go
back and forth and we can we can stay in
touch with the process. But what happens
is that the the default attractor is for
people to just go into this autopilot
mode and they've got no idea what's
happening and it's actually making them
dumber.
>> I created the first deep learning
for medicine company, called Enlitic,
back in, what was that, like 2014.
Um, and our initial focus was on
radiology and a lot of people were
worried
that this would cause radiologists to
become less effective at radiology.
>> Yeah.
>> And I strongly felt the opposite, which
is, and I did quite a bit of research
into this of like what happens when
there's like fly-by-wire in airplanes or
anti-lock brakes in cars or whatever.
If you can successfully automate parts
of a task that really are automatable,
you can allow the the expert to focus on
the things that they need to focus on.
And we saw this happen. So in radiology,
we found if we could automate
identifying the possible nodules in a
lung CT scan, and we were actually good at
it, which we were, then the
radiologist can focus on looking at
the nodules and trying to decide if
they're malignant or what to do about
it. So again, it's one of these subtle
things. So if there's things which you
can fully automate effectively in a way
that you can remove that cognitive
burden from a human so that they can
focus on things that they need to focus
on.
That can be good. You know, I don't know
where we sit in software development
because, you know, I've been coding for
40ish
years. So, I've written a lot of code
and I can glance at a screen of code
and, you know, unless it's something
quite weird or sophisticated, I can
immediately tell you what it does and
whether it works and whatever. Um,
I can kind of see intuitively things
that could be improved, you know,
possible things to be careful of. I'm
not sure I could have got to that point
if I hadn't written a lot of code.
So the people I'm finding who can really
benefit from AI right now are either
really junior people who can't code at
all who can now write some apps that
they have in their head and as long as
they work reasonably quickly um with the
current AI capabilities then they're
happy and then really experienced people
like me or like Chris Lattner
because we can basically have it do some
of our typing for us, you know, and some
of our research for us. People in the
middle, which is most people most of the
time, it really worries me because how
do you get from point A to point B?
>> Yeah.
>> Without typing code? It might be
possible, but we have no
experience of that. Is it
possible? How would you do it? Like, is
it kind of like going back to school
where at primary school we don't let
kids use calculators so that they
develop their
number muscle?
Do we need to do that for like first
five years as a developer? You have to
write all the code yourself.
I don't know. But if I was a developer with between
two and 20 years of
experience, I would be asking that
question of myself a lot because
otherwise you might be in the process of
making yourself obsolete.
>> Yeah. Well, this is another thing about
knowledge that this César Hidalgo guy
said. So he said that knowledge is
non-fungible, which means it can't be
exchanged. So what he means by that is
the process of learning is in some
important sense not reducible right so
you have to have the experience and the
experience has to have friction and when
we build models of the world we actually
learn like you know there's this phrase
reality pushes back so we make lots of
mistakes and we update our models and
we're just placing these coherence
constraints in our model and
that's how we come to learn. So you use
Claude Code and there's so little friction
in the process. That's exactly what this
study from Anthropic said. It said there
was so little friction they didn't learn
anything.
>> Right. Yeah. No, exactly. Desirable
difficulty is the concept that kind of
comes up in education. But even going
back to the work of Ebbinghaus, who
was the original repetitive spaced
learning guy in the 19th century, and
then Piotr Wozniak more recently, we
find the same. We know that
memories don't get formed
unless
it is hard work to form them. So you
know that's where you kind of get this
somewhat um surprising result that says
uh revising too often is a bad idea
because it comes to mind too quickly.
And so with repetitive spaced learning,
with stuff like Anki and SuperMemo, the
algorithm tries to schedule the flash
cards at just before the moment you're
about to forget. So then it's hard work.
So I I studied uh Chinese for 10 years
um in order to try to learn about
learning myself. Um and I really noticed
this that I used Anki and because it was
always scheduling my cards just before I
was about to forget them. It was always
incredibly hard work.
>> Yeah.
>> You know to do reviews because almost
all the cards were ones I was on the
verge of forgetting. It was absolutely
exhausting. But my god, it worked well.
Here I am. I really haven't done
any study for 15 plus years and I still
remember my Chinese.
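As a rough illustration of the scheduling idea described above (show a card just before it would be forgotten), here is a minimal sketch. It is not the algorithm Anki or SuperMemo actually use. It assumes an exponential forgetting curve, recall = exp(-t / stability), and a made-up rule where each successful review multiplies stability by a fixed factor. The function names, the 0.9 recall target, and the 2.5 growth factor are all hypothetical.

```python
import math


def next_interval_days(stability_days: float, target_recall: float = 0.9) -> float:
    """Days until predicted recall decays to target_recall.

    Assumes recall(t) = exp(-t / stability_days), an illustrative model only.
    """
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    # Solve exp(-t / s) = target for t.
    return -stability_days * math.log(target_recall)


def review(stability_days: float, remembered: bool, growth: float = 2.5) -> float:
    """Return the updated stability after one review (hypothetical update rule)."""
    if remembered:
        # A successful, effortful recall makes the memory last longer.
        return stability_days * growth
    # A lapse halves stability, with a floor of one day.
    return max(1.0, stability_days * 0.5)


def schedule(n_reviews: int, stability_days: float = 1.0) -> list[float]:
    """Intervals between reviews if every review succeeds."""
    intervals = []
    for _ in range(n_reviews):
        intervals.append(next_interval_days(stability_days))
        stability_days = review(stability_days, remembered=True)
    return intervals
```

Under these assumptions the gaps between reviews grow geometrically, and every review lands at the point where predicted recall has fallen to the target, which is why each one feels like hard work.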
>> Well, I know I mean also I mean coming
back to your radiology example um like
one example people give is call centers.
So you know we have this notion that in
an organization we have high
intelligence roles and low intelligence
roles and for me intelligence is just
the adaptive acquisition and synthesis
of knowledge. So we assume that,
you know, the low intelligence roles
doing the call center stuff, it
doesn't adapt, which means we can, you
know there are certain things that an
organization does that do not change so
we could automate them and we don't need
to update our knowledge and I think that
discounts actually maybe with the
radiology example that having this
holistic knowledge like you know in a
call center there are so many weird edge
cases that come in so many weird things
happen and that filters up in the
organization and we adapt over time so
when you start to automate things and
you actually lose the competence to
create the process which created the
thing in the in the first place and you
lose the evolvability of that knowledge
in the organization, you're actually
kind of cutting your legs off.
>> Yeah, absolutely. And um so I you know
all I know is in in my company
I just I tell our our staff all the time
almost the only thing I care about is
how much your your personal human
capabilities are growing. you know, I
don't actually care how many PRs you're
doing, um, how many features you're
doing. Like, there's that nice,
you know, John Ousterhout, the Tcl guy,
recently released some of his
Stanford Friday
takeaway lectures and he has this nice
one called a little bit of slope
makes up for a lot of intercept. Just
basically the idea that, you know, in
your life if you can focus on doing
things that cause you to grow faster.
>> Yeah.
>> It's way better than
focusing on the things that you're
already good at. You know that has that
high intercept.
>> So the only thing I really care about
and I think is the only thing that
matters for my company is that, with my
team, I'm focusing on their slope.
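The slope and intercept framing above can be made concrete with a small worked example. This is only a sketch: it assumes capability grows linearly over time (capability = intercept + slope * years), and the function names and the numbers in the usage note are hypothetical.

```python
def capability(intercept: float, slope: float, years: float) -> float:
    """Capability after `years`, under an assumed linear growth model."""
    return intercept + slope * years


def crossover_years(high_intercept: float, low_slope: float,
                    low_intercept: float, high_slope: float) -> float:
    """Years until the fast learner catches up with the strong starter."""
    if high_slope <= low_slope:
        raise ValueError("the faster learner needs a strictly larger slope")
    # Solve high_intercept + low_slope * t == low_intercept + high_slope * t.
    return (high_intercept - low_intercept) / (high_slope - low_slope)
```

For instance, with a starting advantage of 10 units against growth of 1 versus 3 units per year, `crossover_years(10, 1, 0, 3)` gives the point after which the faster learner is ahead for good.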
>> Yeah. If you focus on just driving out
results at the limit of whatever AI can
do right now, you're only caring about
the intercept, you know. So, I think
it's basically a path to obsolescence
for both a company and the people who are
in it. And so, I'm really surprised how
many executives of big companies are
pushing this now because it feels like
if they're wrong, which they probably
are, and they have no way to tell if
they are because this is an area they're
not at all familiar with, they never
learned it in their MBAs, they're
basically setting up their companies to
be destroyed.
>> Yeah.
>> And I'm really surprised that,
you know, shareholders would let them do
that. You know, set up such an
incredibly speculative action. Yeah.
Here we are. It feels like a lot of
companies are going to fail as a
result of the amassed tech debt that
causes them to not be able to maintain
or build their products anymore. There
are loads of folks out there like François
Chollet, like, he really gets it. He
understands this and, you know, he's
always said that it's about this
kind of memetic sharing of cognitive
models about the domain and how we
refine it together. On the sharing thing,
this is another big scaling problem with
Gen AI coding, right? So the
ideal case, I've done this: I know a
domain really well and I can specify it
with exquisite detail and I tell Claude
Code, go and do this thing, and the
model's in my mind, doesn't matter. And
then you go into an organization and now
I need to share like my knowledge with
all of the other people, right? And I'm
sure you have this in your company as
well. Like you need to that this
knowledge acquisition bottleneck is a
real serious problem in in
organizations. So when it's just me,
I think I'm probably about 50 times more
productive using Claude Code. It's
absolutely magic and I can see why
people are so excited about it. But
people don't seem to understand the
bottleneck and how that doesn't
really translate to many real-world
organizations.
>> No one's actually creating
50 times more high-quality software than
they were before. So, we've actually
just done a study of this and there's a
tiny uptick tiny uptick in what people
are actually shipping.
That's the facts. Obviously, I'm an
enthusiast of of AI and what it can do,
but also my wife Rachel recently pointed
out in an article,
all of the pieces that make gambling
addictive are present in
>> Yeah. dark flow.
>> I was going to bring that up. You have
to tell us about
>> coding.
>> Yeah,
>> It's this really awkward situation where
almost everybody I know who
got very enthusiastic about AI-powered
coding in recent months has totally
changed their mind about it when they
finally went back and looked at like how
much stuff that I built during those
days of great enthusiasm am I using
today? Are my customers using today? am
I making money from today?
Almost all the money is being made by
influencers, you know, or by the
companies that produce the tokens. The
thing about AI-based coding is that it's
like a slot machine in that you have
an illusion of control. You know, you
can get to craft your prompt and your
list of MCPs and your skills and
whatever. But in the end, you
pull the lever, right? You put in the
prompt and something comes back and it's
like cherry cherry. It's like oh next
time I'll change my prompt a bit. I'll
add a bit more context. Pull the lever
again. Pull the lever again. It's the
stochastic thing. You get the occasional
win that's like, "Oh, I won. I got a
feature."
So, it's got all these
hallmarks of, like, loss disguised as a
win, somewhat stochastic, feeling
of control, all the stuff that gaming
companies try to engineer into their
gaming rooms. Now, none of that means
that AI is not useful,
but gosh, it's hard to tell.
>> I know. And Rachel, just to be
clear, she also said that one of the
hallmarks of gambling is that you kind
of delude yourself that you have some
awareness of what's going on, but but
actually you don't. But let's do the
bull case a little bit, though.
Because I do think in restricted
cases it is very useful, and
these are cases where we understand and
and we can place constraints and
specification but um even in those cases
you could argue on the one hand that
we're not you know we're not going to be
unemployed anytime soon because you just
do more work on the addiction thing I've
noticed that. So I've had 14-hour Claude
Code marathon sessions and I
actually feel addicted to it. It's like
a slot machine, you know. It It really
is.
>> Been there, too. Absolutely.
>> Yeah, I know. It's And just I've never
felt more drained writing code. I
actually need to take a rest afterwards,
like a few days rest because it
completely
>> was crap, you know. Yeah, definitely.
I've had some successes, right? And so
in fact, we've spent the last couple of
years building a whole product based
around where we know the successes are
going to be, which is when you're
working on reasonably small pieces that
you can fully understand and that you
can design and you can build up your own
layers of abstraction to create things
that are bigger than the uh parts that
you're building out of. I had a very
interesting situation recently,
it was kind of an experiment
basically, which is, we rely very
heavily on something called
ipykernel, which is the thing that powers
Jupyter notebooks,
and there had been a major version
release of ipykernel from 6 to 7, and it
stopped working,
and both of the products that we
were trying to use it with, one is what's
called today nbclassic, which is the
original Jupyter Notebook, and then
our own product called Solveit,
would just randomly crash.
Um,
And ipykernel's over 5,000 lines of code. It's
very complex code, multiple threads,
events, locks, interfaces with
IPython, you know, with ZMQ,
you know, all kinds of different pieces,
debugpy.
and I I couldn't get my head around it
and I couldn't see why it was crashing.
The tests are all passing. I wonder if
AI can solve this. You know, it's like
I'm always interested in the question of
like how big a chunk can AI handle on
its own right now. The answer turned out
to be
yes, I think it can, just. So
I spent a couple of weeks. I didn't
develop a lot of understanding about how
ipykernel really worked in the process,
but I did spend quite a bit of time kind
of pulling out separate components. So the
answer was, in two hours, Codex 5 point, I
think it was 5.2 at that time, or
maybe 5.3 had just come out, couldn't
do it. Then if I got the $200 a month
GPT5.3
Pro
>> to fix the problems, it could. And so by
rolling back and forth between those two pieces of
software, those two models, I could get
things working uh over a couple of weeks
period. And like you say, it wasn't at
all fun. It was very tiring and it felt
stressful because I wasn't really in
control. But the interesting thing is I
now am in a situation
where I have the only implementation of
a Python Jupyter kernel that
actually works correctly as far as I can
tell with these new version 7 protocol
improvements. And now I'm like, well,
this is fascinating because we don't
have a kind of a a software engineering
theory of what to do now. like
here's a piece of code that no one
understands.
>> Yeah.
>> Am I going to bet my company's product
on it?
And I the answer is I don't know because
like I I don't I don't like I don't know
what to do now because no one's like
been in this situation. And like,
does it have memory leaks? Will it still
work in a year's time if there's some
minor change to the protocol?
Is there some weird edge case that's
going to destroy everything?
No one knows because no one understands
this code. It's a really curious
situation.
>> I mean, first of all, we should
acknowledge the pernicious erosion of
control. So, at the very beginning, you
have 10% AI generated code and then you
can just see how it creeps up and up and
then at some point 6 months down the
line a PR comes in and now you know 60%
of the code is AI generated, and do
you see what happens? You
slowly become disconnected. But the
bull case for this is, you know, in AI
there's this idea called functionalism
that, you know, we don't care what the
intelligent thing is made out of. As
long as it does all of the right things,
then we know, you know, we would say
it's AI. And it's the same thing with
software. So the bull case is, I
understand the domain. I don't need to
know how to write
the quicksort algorithm. I just need to
understand it, right?
And then, you know, I just
need to have all of these tests and it
needs to go into deployment and these
things need to happen and at that point
you know what I don't actually care and
and I could also
>> And to be clear, I quite
like that um framing but you know what
that actually does is it says wow
software engineering sure is important
then because software engineering is all
about finding what those pieces are and
how they should behave and then how you
can put them together to create a bigger
piece and then how you can put them
together to create a bigger piece. And
if we do that well, then in 10 years
time, we could have software that is far
more capable than anything we could even
imagine today.
>> Um,
but you're already going to get that
with really great software engineering.
Yeah, you want to be careful. I think in
the end, like, ipykernel, I'm finding, for
example, it's just too big a piece,
right? Because in the end, the team
that made the original ipykernel were
not able to create a set of tests that
correctly exercised it, and therefore
real-world downstream projects, including
the original nbclassic, you know, which
is what ipykernel was extracted from,
didn't work anymore. So this is
kind of where our focus is now, on the
development side at Answer.AI,
is finding the right sized pieces
and making sure they're the right
pieces. Knowing how to recognize what
those pieces are and how to design them
and how to put them together is actually
something that normally requires
some decades of experience before you're
really good at it. Um certainly it's
true for me. Um I reckon I got pretty
good at it after maybe 20 years of
experience.
Yeah, it's a big question is like how do
you build these software engineering
chops which are now even more important
than they've ever been before. They're
the difference between somebody who's
good at writing computer software and
somebody who's not. That feels like a
challenging question.
>> I know. And there's also this notion
that there are so many different ways to
abstract and represent something. You
know, the world is a very complex place.
And maybe the way we've been abstracting
and representing software is mostly a
reflection of our own cognitive
limitations, right? And even in the
sciences in in physics, you tend to have
a lot of quite reductive methods of
modeling the world. And then you've got
complexity science, which is just
embracing the constructive dissipative,
you know, gnarly nature of of things.
And I think a lot of software today we
don't understand right so for example
there are many globally distributed
software applications that use the actor
pattern, and this is just, it's
basically like a complex system right
and the only way we can understand it is
by doing simulations and tests because
no one actually knows how all of these
things um fit together so you could
argue, I guess, as a bull case, that maybe
we already are doing this at the top of
software engineering and that is what we
want to do eventually anyway.
>> Yeah, I'd
say probably not. You see companies like
Instagram and WhatsApp
dominate their sectors whilst
having
10 staff and beating companies like
Google and Microsoft in the process. I
would argue this way of building
software in very large companies is
actually failing. And I think we're
seeing a lot of these very large
companies becoming you know increasingly
desperate and you know for example the
quality of
Microsoft Windows and macOS has very
obviously deteriorated greatly in the
last 5 to 10 years. You know, back when
Dave Cutler was looking at every line of
the NT kernel and making sure it was
beautiful,
it was an elegant and marvelous piece
of software, you know, and I
don't think there's anybody in the world
who's going to say that Windows 11 is an
elegant and marvelous piece of software.
So, I actually think we do need to find
these smaller components that we do
fully understand and that we need to
build them up. And here's the problem.
Um, AI is no good at that. And
I say that empirically. They're
really bad at software engineering. Uh,
and then I think that's
possibly always going to be true
because,
you know, we're we're asking them to
often move outside of their training
data. you know, if we're trying to build
something that literally hasn't been
built before and do it in a better way
than has been done before, we're saying
like don't just copy what was in the
training data.
So, um, and again, this is a confusing
point for a lot of people because they
see AI being very good at coding and
then you think like, oh, that's software
engineering, you know, it's like, oh,
it must be good at software
engineering. But they're different
tasks. There's not a huge amount of
overlap between them and there's no
current empirical data to suggest that
LLMs are gaining any competency at
software engineering. Every time you
look at a piece of software engineering
they've done, like the browser, for
example, which Cursor created, or the C
compiler which Anthropic created,
like, I've read the source code of those
things quite a bit. Chris Lattner is
much more familiar with the compiler
example than me. Um but they're they're
very very obvious copies of things that
already exist. So
that's the challenge, you know, is if
you want to build something that's not
just a copy, then you can't outsource
that to an LLM.
There's no theoretical reason to
believe that you'll ever be able to and
there's no empirical data to suggest
that you'll ever be able to.
>> Yes. I think the punchline of this
conversation is and I'm I'm sure you
would agree with this that we need to
have the combination of AI and humans
working together, right? Because the
humans provide the the understanding and
all of the stuff we were saying about
knowledge, but we can still use AI as a
tool. We need to design
operating models or ways of working that
make that, you know, we say we don't
want to diminish our competence and
understanding right
>> so so it's very it's a very fine line
>> that's that's been our focus and we both
focus on that for teaching and for our
own internal development the stuff I've
been working on for 20 years has turned
out to be the thing that makes this all
work
should get credit for this was the guy
that created the notebook interface.
Although also lots of ideas kind of go
back to Smalltalk and Lisp and APL. But
basically the idea that a human can do a
lot more with a computer when the human
can like manipulate the objects in
inside that computer in real time and
study them and move them around and
combine them together. Yeah, that's what
Smalltalk was all about, you know, with
objects, and APL was the same with arrays.
Mathematica basically is a superpowered
Lisp which then also added on this very
elegant notebook interface that allowed
you to construct kind of a living
document out of all this. So I built
this thing called nbdev a few years ago
which is a way of creating production
software inside these notebook
interfaces inside these rich dynamic
environments and I found that made me
dramatically more um productive as a
programmer and like today even though
I've
never been a full-time programmer as my
job when you look at my kind of GitHub
repo output I think GitHub produced some
statistics about it and I was like just
about the most productive programmer in
in Australia, you know, like it it's
working and a lot of the stuff I build
has lots and lots of people use it
because it's such a rich powerful way to
build things. And so it turns out we've
now discovered that if you put AI in the
same environment with the human again in
a in a rich interactive environment
AI is much better as well which perhaps
isn't shocking to hear but the the
normal, like, if you use Claude Code, which
I know you do, and it's a very good piece
of software, but the environment we give
Claude Code is very similar to the
environment that people had 40 years
ago, you know, it's a
line-based terminal interface.
Uh, you know, it can use MCP or
whatever. Most of the times it just
nowadays uses bash tools, which again
very powerful. I love bash tools. I use
them all the, you know, CLI tools all
the time, but it's still just it's using
text files, you know, as its as its
interface to the world. It's it's it's
really meager.
So um so we put the human and the AI
inside a Python interpreter.
Um and now suddenly you've got the full
power of a very elegant expressive
programming language that the human can
use to talk to the AI. The AI can talk
to the computer. The human can talk to
the computer. The computer can talk to
the AI. Like you have this really rich
thing. And then we let the human and the
AI in real time
build tools that each other can use. And
that's what it's about to me, right?
It's about like creating an environment
where humans can grow and engage and
share. It's like, for me, when I use
Solveit, it's the opposite of that experience
you described with Claude Code. After a
couple of hours, I feel energized and
happy and fulfilled.
>> I'll give you I'll give you my take. I
think that the thing that you're
pointing to here is there's something
magic about having an interactive
stateful environment that gives you
feedback.
>> And that is because our brains kind of
they they can do a a certain you know
unit of work. So, so we actually think
through refining and testing with
reality, and that's why, I mean, during
my PhD I used Mathematica and MATLAB, and
I agree, so we've got this REPL
environment and you know here's the
matrix do an image plot you know do a
change this is what it looks like now
and it's actually a wonderful way to
kind of just just refine my mental model
about something
>> But Claude Code does a lot of
this stuff. I think it's mostly a skill
issue. I think the people that use Claude
Code effectively do this. I've written a
content management system.
>> It's possible. It is possible.
>> It's possible. Yeah. So, you know, I've
written a content management system
called Rescript. And when I'm putting
together a documentary video, it can go
it can it can pull transcripts and then
I can verify the claims. And you know,
part of AI literacy is just
understanding the the asymmetry of
language models, right? So, when you
give them a sort of discriminative task,
they're actually quite good. So if I
tell it in a sub agent to go and verify
every individual claim, it's much more
accurate than if I was in generation
mode and I was generating a bunch of
claims and the stateful feedback thing
again, you know, I can have some kind of
like schematized XML dump and I can have
like an application here on the side
which is visualizing and it's like a
feedback loop and for me this is an AI
literacy thing like the the good people
at AI are already doing this.
>> Yeah. So I don't fully agree with you. I
agree you can do it in Claude Code and I
agree it is an AI literacy thing as to
whether you can, but also Claude Code was
not designed to do this. It's not very
good at it and it doesn't make it the
natural way of working with it. I don't
want to say it's an AI literacy problem
because that's like saying like, oh,
it's a you problem. To me, if a tool is
not making it the natural way for a
human to become
more knowledgeable, more happy, more
connected
with a deeper understanding and a deeper
connection to what they're working on.
That's a tool problem. That that should
be how tools are designed to work. So so
many models and tools expressly are
being evaluated on can I give it a
complete piece of work and have it go
away and do the whole thing which feels
like a huge mistake to me versus have
you evaluated whether a human comes out
the other end with a deep understanding
of a topic you know so that they can
really easily build things in the
future.
>> I agree with all of that, but
then there's the other interesting angle
which is that there was a famous talk by
Joel Grus and we'll talk about this and
and he said that notebooks are terrible.
Um they're really bad from a software
engineering point of view and and at the
time and maybe still now to a certain
extent I I agree with him because um you
know, I've done ML DevOps. I've
worked in large organizations you know
like trying to figure out how do we
bridge like data science and software
engineering and Claude Code is already
more towards the software engineering
side and what that means is it creates
idempotent, stateless, repeatable
artifacts right so as you say from a
pedagogical point of view it's really
good having this stateful feedback
because I can understand what's going on
but then I need to translate that into
something which is deployable and can
you tell us the story? You
responded to Joel Grus, didn't you? And
it was a bit of a fiasco, wasn't it?
But just tell us about that story.
>> He did a really good video called uh I
don't like notebooks.
Um it was uh hilarious. It was really
well done. Um and uh yeah, it was totally
wrong. And all the things he said
notebooks can't do,
they can. And all the things he said you
can't do with notebooks, I do with
notebooks all the time. So it was a very
good very amusing uh incorrect talk. So
then I did a kind of a parody of it
called um I like notebooks um in which I
basically copied with credit most of his
slides and showed how every one of them
was totally incorrect. But like I
actually think your comment about it
does come down to the heart of it which
is this difference between like how
software engineering is normally done
versus how
scientific research and similar things
is normally done
and I think and I agree there is a
dichotomy there and I think that
dichotomy is a real shame because I
think software development is being done
wrong. uh it's being done in this way
which is yeah all about reproducibility
and these like dead these dead pieces
you know it's it's all dead code dead
files I I will never be able to express
this one millionth as clearly as um as
Bret Victor has in his work so I'd
encourage people who haven't watched
Bret Victor to watch him but you
know he shows again and again
how a direct connection, a direct
visceral connection with
the thing you're doing is is all that
matters, you know, and that's his
mission is to make sure people have that
connection and that's basically my
mission as well. So for me, traditional
software engineering is as far from that
as it is possible to get. I think it's I
think it's gross. Like I I I find it
disgusting and I find it sad that people
are being forced to work like that. It's
like I think it's inhumane and I just
don't think it works very well. I mean
empirically it doesn't work very well.
Uh and it's much less good for for AI as
well as it's much less good for humans.
It hasn't always been that way like you
know with Alan Kay and Smalltalk
and uh Iverson and APL, you know, Lisp,
Wolfram with Mathematica.
It to me these were the golden days
when when
people were focused on the question of
how do we get the human into the
computer to work as closely with it as
possible. You know, that's where the the
mouse came from, for example, like to
like click and drag and
visualize entities in your computer as
things you can move around.
So, I feel like we've lost that. I think
it's really sad. Yeah. With Claude Code
and stuff, the the default way of
working with them is to go super deep
into it. It's like, okay, there's a
whole folder full of files. You never
even look at them. Your entire
interaction with it is through a prompt.
>> Yeah.
>> I it it it literally disgusts me. Like I
literally think it's it's inhumane and
it's my mission remains the same as it
has been for like 20 years, which is to
stop people working like this.
>> I know. But so casting my mind back, I
used to work with data scientists. They
were using Jupyter notebooks. And what I
found was typically I mean back then you
couldn't if you checked them into git it
wouldn't look very good. Most of these
data scientists didn't know how to use
git. They would run the cells out of
order which means it wouldn't be
reproducible. There are all sorts of things
like that. But the thing is I agree with
you that you you can use them in this in
this workflow. But it comes back to what
I was saying before about you know we we
were talking about the call center and
it being like a low intelligence job.
You know the data scientists the reason
why they they are doing intelligent work
is they are actually creating something
that doesn't exist. They are figuring
out the the the contours of a problem.
They're actually working in a domain
that is poorly understood. But you could
argue now the bull case is when the data
scientists can succinctly describe the
contours of the problem. Maybe we could
go to Claude Code and we could implement
it properly. But how do we bridge
between those two worlds?
>> I think that'd be a terrible terrible
idea. you know like
>> you don't want to remove people from
their exploratory environment you know
um
research and and uh um science is
developed by people
building insight you know um whoever you
listen to you know whether it be
Feynman or whatever like you you always
hear from the great scientists how
They build deeper intuition by by
building mental models which they get
over time by interacting with the things
that they're learning about. Now like in
Feynman's case because it was theoretical
physics he couldn't actually pick up a
spinning quark but he did literally
study spinning plates. You know you got
to find ways to to deeply interact with
with what you're working with. Like so
so many times I've seen data science
teams because you're right, data science
teams
aren't very familiar with Git and aren't
very familiar with things that they do
need to understand.
Um, and so often I've seen a a software
engineer will become their manager and
their fix to this will be to tell them
all to stop using Jupyter notebooks and
now they have to use all these
reproducible blah blah, you
know, virtualenv blah blah. They
destroy these teams over and over again.
I've seen this keep happening
um because the solution is not create
more discipline and bureaucracy,
it's solve the actual problem. So for
example, we we built a um a thing called
an NB merge driver
which a lot of people don't realize this
but actually uh notebooks are extremely
git friendly. It's just that Git doesn't
ship with a merge driver for them. So,
Git only ships with a merge driver for
uh line-based text files, but it's fully
pluggable. And so, you can easily plug
in one for JSON files instead. And so,
we wrote one. So, now when you diff, you
know, when you get a git diff with our
merge driver, you see cell level diffs.
If you get a merge conflict, you get
cell-level merge conflicts.
The notebook is always openable in
Jupyter. Um nbdime did the same thing.
So two independent implementations of
this.
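The cell-level diff described here can be sketched in a few lines of Python. This is not the merge driver mentioned in the conversation, and it is not nbdime; it is a minimal illustration that assumes the usual .ipynb JSON layout (a top-level "cells" list whose entries carry a "source" field stored either as a string or as a list of lines). The function names cell_sources and cell_level_diff are hypothetical.

```python
import difflib


def cell_sources(nb):
    """Return the source of every cell as a single string.

    Assumes the usual .ipynb layout: nb["cells"] is a list of dicts, and
    each "source" is either a string or a list of line strings.
    """
    sources = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def cell_level_diff(old_nb, new_nb):
    """Compare two notebooks cell by cell instead of line by line.

    Returns a list of (tag, old_cells, new_cells) tuples for every changed
    region, where tag is "replace", "delete" or "insert".
    """
    old, new = cell_sources(old_nb), cell_sources(new_nb)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Two tiny in-memory notebooks that differ only in their first cell.
old_nb = {"cells": [
    {"cell_type": "code", "source": ["x = 1\n"]},
    {"cell_type": "code", "source": "print(x)"},
]}
new_nb = {"cells": [
    {"cell_type": "code", "source": ["x = 2\n"]},
    {"cell_type": "code", "source": "print(x)"},
]}

changes = cell_level_diff(old_nb, new_nb)
# changes == [("replace", ["x = 1\n"], ["x = 2\n"])]
```

Treating each cell as one unit is what keeps the result readable: a change is reported as "this cell was replaced" rather than as scattered edits to JSON punctuation. A real merge driver would also have to reconcile outputs and metadata and write back a valid notebook, which this sketch leaves out.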
>> So yeah, there were problems to solve,
you know. Um but the solution to it was
not
throw away Brett Victor's ideas and make
people further away from from their
exploratory tools, but to fix the
exploratory tools. And I think all
software developers
should be using exploratory based
programming to deepen their
understanding of what they're working
with so that they end up with a really
strong mental model of the system that
they're building and they're working
with and then they can come up with um
better solutions more incrementally
better tested. I basically never have to
use a debugger because I basically never
have bugs and it's not because I'm a
particularly good programmer. It's
because I build things up small little
steps and each step works and I can see
it working and I can interact with it.
So there's no room for bugs. You know,
>> you know, I'm so torn on this because I
agree with you and I'm also skeptical of
people who say that organizations they
they they converge onto ways of doing
things and they no longer need to
evolve. They no longer need to adapt.
You know, innovation is adaptivity,
right? And we should increase the
surface area of adaptivity as much as we
possibly can. So we need people that are
constantly testing new ideas, finding
these constraints. But by the same
token, we need to use the cloud. We need
to use CI/CD. We need to get this stuff
into production.
>> Yeah. So, but like, there's
absolutely no, like, so nbdev ships with
out-of-the-box CI integration and
the like the tests are literally there
like because the source is a notebook
the entire exploration of like how does
this API work you know what does it look
like when you call it the implementation
of the functions the examples of them
the documentation of them the tests of
them are in one place So it it's much
easier to be a good software engineer in
this environment.
Um so yeah like do do both you know.
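The point about keeping implementation, examples, documentation and tests in one place can be illustrated with a rough analogy in plain Python. This is not how nbdev works (nbdev uses notebook cells as the source); the sketch below uses the standard library's doctest module instead, and the clamp function is a made-up example.

```python
import doctest


def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi].

    The examples below are documentation and tests at the same time:

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(lo, min(x, hi))


# Run the examples embedded in the docstring as tests.
parser = doctest.DocTestParser()
test = parser.get_doctest(clamp.__doc__, {"clamp": clamp}, "clamp", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
```

Because the examples sit next to the code they exercise, a CI job only has to run them to check that the documentation still matches the behaviour.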
>> So do you remember there was there was
that existential risk should be an
urgent priority and it was signed by
folks like Hinton and and Demis and you
responded um basically with a rebuttal
and that was with um Arvind, you know,
the snake oil guy. Tell me about
that. Do you think we should be worried
about AI existential risk?
>> I mean, that was a certain time, wasn't
it? And I feel like things have changed
a bit. Thank God. I feel like we we not
just me and Arvind, but um broadly
speaking, the community of which we're a
part kind of probably won that. Now we
have other problems to worry about. But
you know basically at that point um
the prevailing narrative was
AI is about to become autonomous. It
could become autonomous at any moment
and could destroy the world. Uh so very
much comes from um you know Eliezer
Yudkowsky's
>> work which
>> I think clearly has been shown to be
wrong at many levels to this point.
>> They would refute that obviously. Of
course they would.
>> Yeah,
>> it's one of those things that they can
always refute just like any doomsday
cult unless you give it a date uh and
the date passes.
>> Well, even I've updated a little bit
in the sense that I I now think I would
now say that these models can be said to
be intelligent in restricted domains.
The ARC challenge showed that. So if you
place constraints into the problem, you
you can you can go faster towards a
known goal. even agency. You you can put
a planner on there and you can go if you
if you know where you're going, you can
go there faster, but that doesn't help
you. Like you can have all the
intelligence and agency in the world,
but if you don't have the knowledge and
the constraints, then you're going in
the wrong direction faster. And I think
they don't seem to appreciate that these
models don't actually know the world.
Like none of that was even relevant to
Arvind's and my point, which was and is that
it's misunderstanding
where the actual danger is
>> which is that when you have a
dramatically more powerful technology
entering the world
uh that can make some people
dramatically more powerful.
People who
are in love with power will seek to
monopolize that technology
and the more powerful it is, the more
strong that urge from those power-hungry
people will be. So to ignore
people, so here's the problem. If you're
like, I don't care about any of that.
All I care about is autonomous AI taking
off, you know, singularity, paperclip,
nano goo, whatever. The obvious solution
to that is, oh, let's centralize power.
And this is which is what we kept seeing
particularly at that time. Let's give um
uh either very rich technology companies
or the government or both all of this
power and make sure nobody else has it.
In in my threat model, that's the worst
possible thing you can do because you've
centralized the ability to control in
one place and therefore these people who
are desperate for power just have to
take over that thing.
>> Could we distinguish though what you
mean by power? Because we've we've just
spent some of this conversation talking
about how it's not actually as powerful
as people think it is.
>> But I but I'm but I'm not even that's
what but mine is an even if thing,
right? So, like I I I'm just saying even
if it turns out to be incredibly
powerful, right? Like I don't I don't
even want to argue about whether it's
going to be powerful because that's
speculative.
Even if it's going to be incredibly
powerful, you still shouldn't centralize
all of that power in the hands of one
company or the government.
>> Yeah. Because if you do, all of that
power is going to be monopolized by
power-hungry people and used to
destroy
civilization. Basically, you'll end up
with a case where all of that wealth and
power will be centralized with the kinds
of people who who want it centralized.
So like
society for hundreds of years have faced
this again and again and again you know
so when it's like you know writing
used to be something that only the most
exclusive people had access to knowing
about writing.
And the same arguments were made. If you
let everybody write, they're going to
use it to write things that we don't
want them to write and it's going to be
really bad. You know, ditto with
printing, ditto with the vote. Like,
and again and again, society has to
fight against this natural predilection
of the people that have the status quo
power to be like, no, this is a threat.
So when we're saying like, okay, what if
AI turned out to be incredibly powerful,
would it be better for society for
that to be kept in the hands of a few or
spread out across society?
>> My argument was the latter. Now, there's
also an argument which is like, don't
worry about it. It's not going to be
that powerful anyway.
I I just didn't want to go there because
it's not
an argument that's easy to win because
you can't really say what's going to
happen.
We're all just guessing. But I can very
clearly say like, well, if it happens,
would it be a really good idea to only
let Elon Musk have it or would it be a
good idea to only let Donald Trump have
it?
>> Dan Hendrycks spoke about this offense-
defense asymmetry. So it's actually very
important for us to have countervailing
um you know defenses. But let's just
take that as a given for a minute
because obviously when we look at
something like Meta and Facebook, it's
quite clear what the power imbalance is.
You know, they they control all of our
data. They they know what we're doing
with with something like OpenAI and
Claude. So it's it's not as good as we
thought it was because actually humans
still need to be involved. But for
example, they have all of our data,
right? and you might be working on some
new innovative technology and you're
using Claude and you're sending all of
your information up there and they can
now copy you. I mean what what kind of
risks are you talking about to be more
concrete?
>> Yeah. No, I mean so I was not talking
about any of those things, right? So at
the time I was talking about this
speculative question of what if AI gets
incredibly powerful? I mean like like
now for example they they say that this
is the new means of production and
that's that seems completely hyperbolic
to me but like in your best estimation
now if there are risks what are they
>> if there are risks with the current
state of technology I mean I think some
of them are the ones we've discussed
which is
people enfeebling themselves
by basically losing their ability to
become more competent over time.
That's the big risk I
worry about the most. Um
the privacy risk, it's there, but I'm
not sure it's much more there than it
was for Google and Microsoft before.
Like you know, you used to work at
Microsoft, you know, how much data they
have about the average uh Outlook,
Office, etc. user uh ditto for Google,
you know, the average Google Workspace
or Gmail user.
um those privacy issues are real
although I think there are bigger
privacy issues around these companies
which the government can outsource data
collection to so back in the day it used
to be companies like ChoicePoint and
Acxiom, nowadays it's probably more
companies like Palantir.
the US government is um actually
prohibited from building large databases
about US citizens, for example, but
companies are not
prohibited from doing so, and the
government's not prohibited from
contracting things to those companies.
So, I mean, that's a huge worry, but I
don't think it's one that AI is uniquely
creating. It certainly you're in the UK,
as you know, in the UK surveillance has
been universal for quite a while now. It
certainly makes it easier to use that
surveillance, but a sufficiently
well-resourced organization could just
throw a thousand bodies at the problem.
Um, so yeah, I'm not sure these are new
privacy problems so much as maybe more
common ones than they used to be.
>> Yeah,
>> Jeremy, I've just noticed the time. I
need to get to the airport.
>> All right,
>> this has been amazing.
>> Thank you, sir. Thank you for coming.
>> Yeah.
>> Hope you had a nice trip. Thank you so
much.
This video discusses the current state and future implications of AI in software development and beyond. It features a conversation with Jeremy Howard, who shares his insights on the ULMFiT model, the importance of fine-tuning, and the limitations of current LLMs. The discussion touches upon the nature of understanding in AI, creativity, the dangers of over-reliance on AI for coding, and the potential for AI to erode human expertise. It also explores the concept of 'vibe coding' and its addictive, yet potentially detrimental, nature. The conversation highlights the difference between coding and software engineering, emphasizing that true software engineering requires a deeper understanding and intuition that AI currently lacks. The importance of human-AI collaboration is stressed, with the need for environments that foster growth and understanding rather than dependence. Finally, the discussion touches on AI existential risks, arguing that the greater danger lies in the monopolization of powerful AI by a few, rather than the AI itself becoming autonomous and destructive. The conversation concludes by reinforcing the idea that AI should be a tool to augment human capabilities and knowledge, not replace them.