The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI
code's not even the right verb anymore, right? But I have to, um, express my will to my agents for 16 hours a day. Manifest.
>> How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions. But there, I mean, this is why it gets to the psychosis: this is, like, infinite, and everything is a skill issue.
Hi listeners, welcome back to No Priors. Today I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about code agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next age. Welcome, Andrej. Andrej, thanks for doing this.
>> Yeah, thank you for having me.
>> Uh, so it's been a very exciting couple
of months in AI.
>> Uh, yeah, you could say that.
>> I remember walking into the office at some point, and you were really locked in, and I was asking what you were up to, and you're like, I just have to code for 16 hours a day. Or, code's not even the right verb anymore, right? But, I have to
>> express my will to my agents for 16 hours a day. Manifest.
Um, because, like, there's been a jump in capability.
>> Uh, what's happening? And tell me about your experience.
>> Yeah, I kind of feel like I was, and still often am, in this perpetual state of AI psychosis, just, like, all the time. Because there was a huge unlock in what you can achieve as a person, as an individual, right? Because you were bottlenecked by, you know, your typing speed and so on. But now, with these agents, I would say December is when
something flipped, where I kind of went from 80/20 of, you know, writing code by myself versus just delegating to agents, to, like, 20/80. And I don't even think it's 20/80 by now. I think it's a lot more than that. I don't think
I've typed like a line of code probably
since December basically. Um, which is
like an extremely large change. I was talking about it, for example, to my parents, and I don't think a normal person actually realizes that this happened, or how dramatic it was. Like, literally, if you just find a random software engineer at their desk, their default workflow of, you know, building software is completely different as of basically December. So I'm just in this
state of psychosis of trying to figure
out like what's possible uh trying to
push it to the limit. How can I have not just a single session of, you know, Claude Code or Codex or some of these agent harnesses? How can I have
more of them? How can I do that appropriately? And then, how can I use these claws? What are these claws? And so there's a lot of new things. I want to be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it. And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous. And so I guess I'm just in this psychosis of, what's possible? Because it's unexplored, fundamentally.
>> Well, if you're nervous, the rest of us are nervous. We have a team that we work with at Conviction whose setup is that none of the engineers write code by hand; they're all microphoned, and they just whisper to their agents all the time. It's the strangest work setting ever. And I thought they were crazy, and now I fully accept it. I was like, "Oh, this was the way." Like, you're just ahead of it.
>> Um, how do you think about your own capacity now to explore, or to do projects? What is it limited by?
>> Yeah, what is it limited by? Just, I think, everything. So many things, even if they don't work, to a large extent you feel like it's a skill issue. It's not that the capability is not there; it's that you just haven't found a way to string together what's available. Like, I just didn't give good enough instructions in the AGENTS.md file, or whatever it may be. I don't have a nice enough memory tool that I put in there, or something
like that. So it all kind of feels like a skill issue when it doesn't work, to some extent. You want to see how you can parallelize them, etc. And you want to be Peter Steinberger, basically. So Peter is famous. He has a funny photo where
he's in front of a monitor with lots of Codex agents tiling the monitor (he uses Codex), and they all take about 20 minutes if you prompt them correctly and you use the high effort. He has multiple, you know, 10 repos checked out, and so he's just going between them and giving them work. It's just like you can move in much larger
macro actions. It's not just, here's a line of code, here's a new function. It's, here's a new functionality, delegate it to agent one. Here's a new functionality that's not going to interfere with the other one, give it to agent two. And then try to review their work as best as you can, depending on how much you care about that code. Like, what are the macro actions that I can manipulate my software repository by? And another agent is doing some research, and another agent is writing code, and another one is coming up with a plan for some new implementation. And so everything just happens in these macro actions over your
repository. And you're just trying to become really good at it and develop, like, a muscle memory for it. It's extremely, yeah, it's very rewarding, number one, because it actually works. But it's also kind of the new thing to learn. So that's why, hence, the psychosis.
>> Yeah, I do feel like my instinct is, whenever I am waiting
for an agent to complete something, the
obvious thing to do is like, well, I can
do more work, right? Like if I have
access to more tokens, then like I
should just parallelize and add more tasks. And so that's very stressful, because if you don't feel very bounded by your ability to spend on tokens, then, you know, you are the bottleneck in the system that is at max capability.
>> Yeah, you're not maximizing your subscription, at least, and ideally for multiple agents. Like, if you run out of your quota on Codex, you should switch to Claude, or whatnot. I don't know, that's what I've been trying to do a little bit, and I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput. So I actually kind of
experienced this when I was a PhD student: you would feel nervous when your GPUs were not running, when you have GPU capacity and you're not maxing out the flops available to you. But now it's not about flops, it's about tokens. So what is your token throughput, and what token throughput do you command?
>> I would actually argue that it's very interesting that we had, you know, at least 10 years where in many engineering tasks people just didn't feel compute bound,
>> right, and, like, the entire industry feels that now. They felt resource bound.
>> And now that you have this big capability jump, you're like, oh, actually, it's not my ability to access the compute anymore. Like, I'm the binding constraint.
>> Yeah, it's a skill issue.
>> Which is very empowering, cuz, yeah, you could be getting better. That's why I think it's very addictive: because there are unlocks when you get better.
>> Where do you think it goes? Like if you
just think about like okay you know
Andre is iterating and everybody else is
for 16 hours a day getting better at
using coding agents like what does it
look like in a year of like you've
reached mastery?
>> Yeah, what does mastery look like, right? At the end of the year, or in two, three years, five years, ten years, etc.
>> Well I think everyone is basically
interested in like going up the stack.
So I would say yeah it's not about a
single session with your agent. Um
multiple agents how do they collaborate
and teams and so on. So everyone's
trying to figure out what that looks
like. And then I would say claw is also kind of an interesting direction, because, when I say a claw, I mean this layer that kind of takes persistence to a whole new level. It's something that keeps looping; it's not something that you are interactively in the middle of. It kind of has its own little sandbox; it kind of does stuff on your behalf even if you're not looking.
And it also has maybe more sophisticated memory systems, etc., that are not yet implemented in agents. So OpenClaw has a lot more sophisticated memory, I would say, than what you would get by default, which is just a memory compaction when your context runs out. Right.
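[The default "memory compaction" contrasted here, summarize-and-truncate when the history outgrows the context window, can be sketched roughly as below. The function names and the half-split heuristic are illustrative assumptions; in a real harness the summarizer would be an LLM call, not a cheap function.]

```python
def compact(messages, max_tokens, count_tokens, summarize):
    """Naive memory compaction: once the history exceeds the token
    budget, replace the older half of the messages with a single
    summary message and keep the recent turns verbatim.

    count_tokens and summarize are injected callables; summarize
    stands in for an LLM summarization call."""
    if sum(count_tokens(m) for m in messages) <= max_tokens:
        return list(messages)  # still under budget, nothing to do
    cut = len(messages) // 2
    summary = summarize(messages[:cut])
    head = {"role": "system",
            "content": f"Summary of earlier conversation: {summary}"}
    return [head] + list(messages[cut:])
```

The point of the contrast in the conversation is that this throws information away wholesale, whereas a more deliberate memory system decides what is worth keeping.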
>> You think that's the piece that resonated for more users, versus perhaps broader tool access,
>> for OpenClaw?
>> Yeah. I think there's at least, I think,
>> there's a lot of really good ideas in here. Yeah, good job, Peter.
>> I mean, Peter has done a really amazing job. I saw him recently and talked to him about it, and he's very humble about it, but I think he innovated simultaneously in, like, five different ways and put it all together. So, for example, the SOUL.md document: he actually really crafted a personality that is kind of compelling and interesting, and I feel like a lot of the current agents don't get this right. I actually think Claude has a pretty good personality. It feels like a teammate,
>> and it's excited with you, etc. I would say, for example, Codex is a lot more dry, which is kind of interesting, because in ChatGPT it's a lot more upbeat and highly sycophantic. But Codex, the coding agent, is very dry. It doesn't seem to care about what you're creating. It's kind of like, oh, I implemented it. It's like, okay, but do you understand what we're building?
>> It's true.
>> You know, it doesn't. The other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where, when Claude gives me praise, I do feel like I slightly deserve it,
>> because sometimes I give it not very well-formed thoughts, and I give it an idea that I don't think is fully baked, and it doesn't actually react very strongly. It's like, oh yeah, we can implement that. But when it's a really good idea, by my own account, it does seem to reward it a bit more. And so I kind of feel like I'm trying to earn its praise, which is really weird.
>> And so I do think the personality matters a lot, and I think a lot of the other tools maybe don't appreciate it as much. And in this aspect, also, Peter really cares about this, and so that was correct. And then the memory system, and then, you know, he's just having fun with this, and then the single WhatsApp portal to all of the automation.
>> Yeah. Is there something that you have
done personally with your claws beyond
software engineering that you think is
fun or interesting?
>> Yeah. So in January I went through a period of claw psychosis. So I built, I have a claw, basically, that takes care of my home, and I call him Dobby, the elf claw. And basically I used the agents to find all of the smart home subsystems of my home on the local area network, which I was kind of surprised worked out of the box.
Like I just told it that I think I have
Sonos at home. Like can you try to find
it? And it goes, and it did an IP scan of all the computers on the local area network, and it found the Sonos system, and it turned out there's no password protection or anything like that. It just logged in, and it's like, oh yeah, you have these Sonos systems installed, let me try to reverse engineer how it's working. It does some web searches and it finds, okay, these are the API
endpoints and then it's like do you want
to try it? And I'm like whoa like you
just did that. And I'm like, "Yeah, can
you try to play something in the study?"
And uh it does and music comes out and
I'm like, "I can't believe I just
>> That's crazy. That's like three
prompts." Yeah.
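[The discovery step described here, an IP scan of the LAN looking for Sonos, can be sketched roughly as below. Sonos speakers expose a local, unauthenticated HTTP interface on TCP port 1400; the subnet, timeout, and worker count are illustrative assumptions.]

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def find_sonos(subnet="192.168.1", port=1400, timeout=0.3):
    """Scan a /24 subnet for hosts accepting TCP connections on the
    port Sonos speakers use for their local HTTP API."""
    def probe(host):
        try:
            # A successful connect is enough evidence to report the host.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            return None  # closed, filtered, or timed out

    hosts = [f"{subnet}.{i}" for i in range(1, 255)]
    with ThreadPoolExecutor(max_workers=64) as pool:
        return [h for h in pool.map(probe, hosts) if h]
```

From each hit, an agent could then fetch something like `http://<host>:1400/xml/device_description.xml` to confirm it really is a Sonos device before trying to drive it.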
>> I can't believe I just typed in, "Can you find my Sonos?" and suddenly it's playing music. And it did the same for the lights. And so basically
like, it kind of hacked in, figured out the whole thing, created APIs, created a dashboard so I could see a kind of command center of all of my lights in the home. And then it was switching lights on and off. And, you know, I can ask it, like, Dobby, sleepy time, and when it's sleepy time, that just means all the lights go off, etc., and so on. So it controls all of
my lights, my HVAC, my shades, the pool and spa, and also my security system. So I have a camera pointed outside of the house, and anytime someone rolls in, I have a Qwen model that looks at the videos. So, first of all, there's change detection, right?
>> And then, based on the change detection, it goes to Qwen, and then it actually tells me, it sends me a text on my
WhatsApp. It shows an image from the
outside and it says, "Hey, a FedEx truck just pulled up, you might want to check it, you got mail," or something like that. And Dobby just texts me. This is really incredible. So Dobby is in charge of the house. I text with it through WhatsApp. And it's been really fun to have these macro actions that maintain my house. I haven't
really pushed it way beyond that, and I think people are doing a lot crazier things with it. But for me, even just as a home automation setup, I used to use six completely different apps, and I don't have to use those apps anymore. Dobby controls everything in natural language. It's amazing. And so I think I haven't even pushed the paradigm fully,
but already that is so helpful and so
inspiring I would say.
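[The change-detection gate described above, only waking the vision model when the camera frame actually changes, can be sketched with plain frame differencing. The thresholds are made-up values, and the Qwen call and WhatsApp send are out of scope; this is only the cheap first stage.]

```python
def changed_fraction(prev, curr, pixel_thresh=25):
    """Fraction of pixels whose grayscale value moved by more than
    pixel_thresh between two frames (frames are equal-length flat
    lists of 0-255 ints)."""
    moved = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_thresh)
    return moved / len(curr)

def motion_events(frames, frame_thresh=0.05):
    """Indices of frames that differ enough from their predecessor to
    be worth escalating to the vision model; still frames are dropped
    by this gate and never cost a model call."""
    return [i for i in range(1, len(frames))
            if changed_fraction(frames[i - 1], frames[i]) > frame_thresh]
```

The design point is the two-tier pipeline: a near-free pixel diff runs on every frame, and the expensive model only sees the handful of frames that pass it.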
>> Do you think that's indicative of like
what people want from a user experience
perspective with software, right?
Because, I think, you know, it's pretty ignored that it takes humans effort to learn new software, new UIs.
>> Yeah, I think to some extent that's right. It's like working
backwards from how people think an AI should be, because what people have in their mind of what an AI is, is not actually what an LLM is, in a raw sense. An LLM is a token generator, you know, more tokens come out. But what they think of is this persona, this identity, that they can tell stuff and it remembers it, you know, and it's just kind of an entity behind WhatsApp. It's a lot more understandable. Mhm.
>> So I think to some extent it's matching the expectations that humans already have for how an AI should behave, but under the hood, a lot of technical details go into that, and LLMs are too raw of a primitive to actually type check as "AI," I think, for most people, if that makes sense.
>> Yeah. I think that's how we understand what the AI is, and the description of it as Dobby, or some personality, obviously resonates with people. I also think that the unification that you did across your six different software systems for your home automation speaks to a different question, of, like,
>> do people really want all the software
that we have today?
>> Yeah.
>> Right. Because I would argue, well, you have the hardware,
>> but you've now thrown away the software,
>> or the UX layer of it. Do you think that's what people want?
>> Yeah. I think there's this sense that these apps that are in the app store for using these smart home devices, etc., shouldn't even exist, in a certain sense. Shouldn't it just be APIs, and shouldn't agents just be using them directly? And I can do all kinds of home automation stuff that any individual app will not be able to do, right? And an LLM can actually drive the tools and call all the right tools and do pretty complicated things.
>> And so, in a certain sense, it does point to this: maybe there's an overproduction of lots of custom bespoke apps that shouldn't exist, because agents kind of crumble them up, and everything should be a lot more just exposed API endpoints, and agents are the glue, the intelligence, that actually tool-calls all the parts. Another example is my
treadmill. There's an app for my treadmill, and I wanted to keep track of how often I do my cardio, but I don't want to log into a web UI and go through a flow, etc. All of this should just be, make APIs available. And this is kind of, you know, going towards the agentic web, or agent-first tools, and all this kind of stuff. So I think the industry just has to reconfigure in so many ways, because the customer is not the human anymore. It's agents, who are acting on behalf of humans, and this refactoring will probably
be substantial, in a certain sense. One way that people sometimes push back on this is, do we expect people to vibe code some of these tools? Do we expect normal people to do this kind of stuff that I described?
>> But I think, to some extent, this is just, you know, technology as it exists today, and right now there is some vibe coding, and I'm actually watching it and working with the system. But I kind of feel like this kind of stuff that I just talked about should be free in a year or two or three. There's no vibe coding involved. This is trivial. This is table stakes. This is, like, any AI, even the open source models, etc., can do this.
>> You should be able to translate from a less technical human's intent very easily to this.
>> Extremely easily, yeah. Today it's vibe coding, it's involved, and not many people are going to do it. But
>> and you still have to make some design
decisions, right? We were talking about
like you take frames for example.
>> Yeah.
>> Yeah. But I kind of feel like the barrier will just come down, and it's just ephemeral software on your behalf, and some kind of claw is handling all the details for you, but you're not involved. The claw has a machine, and it will figure it out, and it's just presenting you UIs, and you're just saying stuff, you know. Mhm.
>> Why haven't you, I guess, pushed the boundaries of what you can do personally with claws? Is it, you know, you're focusing on more important projects, auto research, etc., or you're climbing the hill to mastery, or something else, right?
>> Yeah. I just feel like I'm so distracted by everything. So I spent, like, a week on the claw stuff, and I have more to do, almost. But I will say that
>> like Jensen tools were all just busier
unfortunately.
>> Yeah. I didn't really take advantage of a lot of email and calendar and all this other stuff, and I didn't give it access, because I'm still a little bit suspicious, and it's still very new and rough around the edges. So I didn't want to give it full access to my digital life yet. And part of it is just
the security, the privacy, and just being very cautious in that realm. And
so some of it is held back by that, I would say. Yeah, maybe that's the dominant feature, but some of it is also just that I feel so distracted, because I feel like I had a week of claw and then other stuff is happening. And
>> What was the, I mean, you've talked about being able to train, or at least optimize, a model as a task you want to see agents do, for a long time. What was the motivation behind auto research?
>> Auto research. Yeah. So I think I had a tweet earlier where I said something along the lines of: to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing. You need to take yourself out of it. You have to arrange things such that they're completely autonomous. How can you maximize your token throughput and not be in the loop? This is the goal. And so I kind of mentioned that the name of the game now is to increase your leverage: I put in just very few tokens, just once in a while, and a huge amount of stuff happens on my behalf. And
so, auto research, like, I tweeted that, and I think people liked it and whatnot, but they maybe haven't worked through the implications of that. And for me, auto research is an example of an implication of that, where it's like, I don't want to be the researcher in the loop, looking at results, etc. I'm holding the system back. So the question is, how do I refactor all the abstractions so that I'm not? I have to arrange it once and hit go. The name of the game is: how can you get more agents running for longer periods of time, without your involvement, doing stuff on your behalf? And auto research is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do, and go.
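[That "objective, metric, boundaries, go" recipe can be sketched as a loop like the one below. The knobs and search strategy are hypothetical stand-ins: in the setup described here, each evaluate() would be a real training run, and the proposer would be an agent rather than seeded random search.]

```python
import random

def auto_research(evaluate, config, space, steps=200, seed=0):
    """Objective-driven tuning with no human in the loop: propose a
    single-knob change inside the allowed boundaries (space), run
    the objective (evaluate), and keep the change only if the metric
    improves. Returns the best config found and its score."""
    rng = random.Random(seed)
    best, best_score = dict(config), evaluate(config)
    for _ in range(steps):
        trial = dict(best)
        knob = rng.choice(sorted(space))      # pick one knob to mutate
        trial[knob] = rng.choice(space[knob])  # stay within its boundaries
        score = evaluate(trial)
        if score > best_score:                 # higher metric is better
            best, best_score = trial, score
    return best, best_score
```

The "boundaries" show up as the `space` argument: the loop can only move within what it is explicitly allowed to touch, which is what makes it safe to leave running overnight.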
>> You were surprised at its effectiveness.
>> Yeah, I didn't expect it to work. Because, so, I have the nanochat project, and fundamentally, I think a lot of
obsession for like training GBT2 models
and so on but for me uh training GBT
models and so on is just a little
harness a little playground for training
LLMs and fundamentally what I'm more
interested in is like this idea of
recursive self-improvement and to what
extent you can actually have LLMs
improving LLMs because I think all the
Frontier Labs this is like the thing
>> uh for obvious reasons and they're all
trying to recursively self-improve
roughly speaking and so for me this is
kind of like um a little play pen of
that um and I guess I like tuned Namat
already quite a bit by hand, in a good old-fashioned way that I'm used to. Like, I'm a researcher, I've done this for, you know, two decades. I have some amount of, what is the opposite of
>> uh, yeah,
>> earned confidence.
>> Okay, I have, like, two decades of, oh, I've trained this model thousands of times. So I've done a bunch of experiments, I've done hyperparameter tuning, I've done all the things I'm very used to and have done for two decades, and I've gotten to a certain point, and I thought it was fairly well tuned. And then I let auto research go overnight, and it came back with tunings that I didn't see.
>> And yeah, I did forget the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact, so once you tune one thing, the other things have to potentially change too, you know. I shouldn't be a bottleneck. I shouldn't be running these hyperparameter search optimizations. I shouldn't be looking at the results. There are objective criteria in this case. So you just have to arrange it so that it can just go forever. So that's a single sort of
version of auto research: a single loop trying to improve. And I was surprised that it found these things; you know, the repo is already fairly well tuned, and it still
found something. And that's just a single loop. These frontier labs, they have GPU clusters of tens of thousands. And so it's very easy to imagine how you would basically get a lot of this automation on smaller models, and fundamentally everything around frontier-level intelligence is about extrapolation and scaling laws, and so you basically do a ton of the exploration on the smaller models and then you try to extrapolate out.
>> So you're saying our research efforts
are going to get more efficient like
we're going to have better direction for
when we scale as well if we can do this
experimentation better.
>> Yeah, I would say that the most interesting project, and probably what the frontier labs are working on, is, you know, you experiment on the smaller models. You try to make it as autonomous as possible. Remove researchers from the
loop. They have way too much, what is the opposite? Way too much confidence. They don't know. They shouldn't be touching any of this, really, and so you have to rewrite the whole thing. Because right now, I mean, certainly they can contribute ideas, but they shouldn't actually be enacting those ideas. There's a queue of
ideas, and there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos and funnels ideas in, or researchers can contribute ideas, but
it's a single queue, and there's workers that pull items and try them out, and whatever works just gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes. So yeah, just removing humans from all the processes, automating as much as possible, and getting high tokens-per-second throughput. And it does require rethinking all the abstractions, and everything has to be reshuffled. So yeah, I think it's very exciting.
>> If we take one more recursive step here,
when is the model going to write a better program.md than you?
>> Yeah. So program.md is
>> we're not in the loop.
>> Yeah, exactly.
>> Yeah. So, program.md is my crappy attempt at describing how the auto researcher should work: oh, do this, then do that and that, and then try these kinds of ideas, and here's maybe some ideas, like, look at the architecture, look at the optimizer, etc. I just came up with this in markdown, right?
>> And so, yeah, exactly. You want some kind of an auto research loop, maybe. You can imagine that different program.mds would give you different progress. So basically, every research organization is described by a program.md. Yeah,
>> a research organization is a set of markdown files that describe all the roles and how the whole thing connects.
Um and you can imagine having a better
research organization. So maybe they do
fewer stand-ups in the morning because
they're useless. And this is all just
code, right? And so one organization can have fewer stand-ups, one can have more, one organization can be very risk-taking, one can be less. And so you can definitely imagine that you have multiple research orgs. And they all have code, and once you have code, then you can imagine tuning the code. So 100%, there's the meta layer of it.
>> Did you see my text about my contest idea? My contest idea was to let people write different program.mds, right? And so, for the same hardware, where do you get the most improvement?
>> Oh, I see.
>> And then you can take all that data and give it to the model and say, write a better program.md.
>> Yes. Yes.
>> Yeah. Exactly.
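[The contest could be scored with something this simple, where run() stands in for a fixed-hardware auto-research run launched from one candidate program.md; the names here are hypothetical.]

```python
def contest(programs, run):
    """Score each candidate program.md under the same budget
    (run(program) -> measured improvement), and return the ranked
    leaderboard plus the winner. The (program, score) pairs are
    exactly the data you would feed back to a model when asking it
    to write a better program.md."""
    # Sort by score descending; ties break on the program name.
    scored = sorted(((run(p), p) for p in programs), reverse=True)
    leaderboard = [(p, s) for s, p in scored]
    return leaderboard, leaderboard[0][0]
```

Holding hardware and budget fixed is what makes the scores comparable, so the only free variable is the program.md itself, which is the knob the meta-optimization then turns.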
>> We're going to get something better.
Like there's no way we don't.
>> You can 100% look at where the improvements came from, and, like, can I change the program.md such that more of these kinds of things would be done, or fewer of the things that didn't work?
>> Meta optimization. Yeah,
>> You can 100% imagine doing that. So I think this is a great idea, but, you know, I think you sort of go one step at a time, where you have one process, and then a second process, and then the next process, and these are all layers of an onion. Like, the LLM part is now taken for granted. The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization over the instructions. And it's just a little too much, you know. But there, I mean, this is why it gets to the psychosis: this is, like, infinite, and everything is a skill issue. And that's why, yeah, coming back to it, this is why it's so insane.
>> Okay. Well, if we're just trying to diagnose the current moment, and what is a relevant skill right now, what do you think is the implication, that this is the loop we should be trying to achieve in different areas, and that it works? Right, like, you know, remove
>> create the metric, or create the ability, for agents to continue working on it without you.
>> Yeah.
>> Do we still have performance engineering, like,
>> Yeah. I mean, so there's a few caveats that I would put on top of the LLM ecosystem. Number one,
>> this is extremely well suited to anything that has objective metrics that are easy to evaluate. So, for example, writing kernels for more efficient CUDA code, for various parts of a model, etc., is the perfect fit.
>> Because you have inefficient code, and then you want efficient code that has the exact same behavior but is much faster. Perfect fit.
>> So a lot of things are a perfect fit for auto research, but many things will not be. If you can't evaluate it, then you can't auto research it, right? So that's caveat number one. And then caveat number two, I would say, is, you know, we're kind of talking about next steps, and we kind of see what the next steps are, but fundamentally the whole thing is still kind of bursting at the seams a little bit, and there are cracks, and it doesn't fully work. And if you try to go too far ahead, the whole thing is actually net not useful, if that makes sense.
>> Because these models, you know, they've improved a lot, but they're still rough around the edges, is maybe the way I would describe it. I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer their entire life, and a 10-year-old. And it's so weird, because humans, I feel, are a lot more coupled.
>> You wouldn't encounter that combination.
>> This jaggedness is really strange, and humans have a lot less of that kind of jaggedness. Although they definitely have some, the agents have a lot
lot more jaggedness where uh sometimes
like you know I ask for functionality
and it like comes back with something
that's just like totally wrong and then
we get into loops that are totally wrong
and then I'm just I get so frustrated
with the agents all the time still
because you feel the power of it but you
also there's still like it does
nonsensical things once in a while for
me still as well
>> I get very annoyed when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem.
>> Yeah, I think what's underneath some of the bigger issues, if I could hypothesize, is that fundamentally these models are trained via reinforcement learning. So they're actually struggling with the exact same thing we just talked about, which is that the labs can improve the models in anything that is verifiable, anything that has rewards. Did you write the program correctly, and do the unit tests check out? Yes or no? But where they're struggling is, for example, the nuance of what I had in mind or what I intended, and when to ask clarifying questions. Anything that feels softer is worse. And so you're either on rails and part of the superintelligence circuits, or you're not on rails, you're outside of the verifiable domains, and suddenly everything just meanders.
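The yes-or-no verifiability described here can be sketched in code. This is a hypothetical illustration, not any lab's actual training setup: a candidate program either passes fixed unit tests (reward 1.0) or it doesn't (reward 0.0), which is exactly the kind of clean signal RL can optimize. All names here (`verifiable_reward`, the toy `add` task) are invented for the example.

```python
import os
import subprocess
import sys
import tempfile

def verifiable_reward(program_source: str, test_source: str) -> float:
    """Binary reward: run the candidate program plus its unit tests in a
    subprocess and return 1.0 if every assertion passes, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.py")
        with open(path, "w") as f:
            f.write(program_source + "\n" + test_source + "\n")
        # A failed assertion (or any crash) yields a nonzero return code.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0

# Two candidate "model outputs" for the same toy task:
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

print(verifiable_reward(good, tests))  # 1.0: tests pass, reward granted
print(verifiable_reward(bad, tests))   # 0.0: an assertion fails
```

There is no such crisp pass/fail check for "was this joke funny" or "should the model have asked a clarifying question," which is the asymmetry being described.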
Maybe another way to put it is: if today you go to a state-of-the-art model like ChatGPT and you ask it to tell you a joke, do you know what joke you're going to get? There's the joke.
>> I can't tell you the standard form of it, but I do feel like ChatGPT has like three jokes.
>> Yeah. So the joke that apparently all the LLMs laugh at the most is: why do scientists not trust atoms?
>> Okay.
>> Because they make everything up.
>> Okay.
>> They make everything up.
So this is
>> How did that emerge?
>> This is the joke you would get three or four years ago, and it's the joke you still get today.
>> Okay.
>> Even though the models have improved tremendously.
>> Yeah.
>> And if you give them an agentic task, they will just go for hours and move mountains for you.
>> And then you ask for a joke, and it has a stupid, crappy joke from five years ago. And it's because it's outside of the RL.
>> It's outside of the reinforcement learning. It's outside of what's being improved. And it's part of the jaggedness: shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them? It's just not being optimized, and it's stuck.
>> Do you think that implies we are not seeing generalization in the sense of broader intelligence, of joke smartness being attached to code smartness?
>> Yeah, I think there's some decoupling, where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs depending on what data went in, and some things are not.
>> But there's a premise from some research groups that if you are smarter at code generation, or in these verifiable fields, you should be better at everything. And the joke situation suggests that's not happening.
>> I don't think that's happening. Yeah, I don't think that's happening. I think maybe we're seeing a little bit of it, but not a satisfying amount.
>> Yeah, that jaggedness exists in humans too.
>> You can be very, very good at math and still tell a really bad joke.
>> Yeah, that's true. But it still means we're not getting what the story promises, which is a lot of the intelligence and capabilities in all the domains of society for free as we get better and better models. That's not exactly what's fundamentally going on. There are blind spots, and some things are not being optimized for. And it's all clustered up in these opaque neural net models, right? So you're either on the rails of what it was trained for and going at the speed of light, or you're not. And so it's jaggedness. That's why I think, even though the progression of what should happen is obvious, you can't let it fully go there yet, because it doesn't fully work. Or it's a skill issue and we just haven't figured out how to use it. It's hard to tell.
>> Can I ask kind of a blasphemous question? If this jaggedness is persisting, and it's all rolled up in a monolithic interface, a single model, does that make sense? Or should it be unbundled into things that can be optimized and improved against different domains of intelligence?
>> Like unbundling the models into multiple experts in different areas, etc.
>> More directly, yeah. Instead of just one thing that we have no exposure into, which can be confusing: why is it so good at this but not at this other thing?
>> Yeah, I think currently my impression is that the labs are trying to have a single monoculture of a model that is arbitrarily intelligent in all these different domains, and they just stuff it into the parameters. I do think we should expect more speciation in the intelligences. The animal kingdom is extremely diverse in the brains that exist; there are lots of different niches of nature, and some animals have an overdeveloped visual cortex or other parts. I think we should see more speciation. You don't need this oracle that knows everything; you speciate it and then you put it on a specific task. And we should be seeing some of that, because you should be able to have much smaller models that still have the cognitive core, that are still competent, but then they specialize, and then they can become more efficient in terms of latency or throughput on the specific tasks you really care about, like if you're a mathematician working in Lean. I saw, for example, a few releases that really target that as a domain. So there are probably going to be a few examples like that where the unbundling makes sense.
>> One question I have is whether the capacity constraint on available compute infrastructure drives more of this, because efficiency actually matters more, right? Financing aside, though financing is involved in all of this: if you have access to full compute for anything you do, fine, even one single model. But if you actually feel pressure, where you're like, I can't serve a model of massive size for every use case, do you think that leads to any speciation? Does that question make sense?
>> The question makes sense, and I guess what I'm struggling with is that I don't think we've seen too much speciation just yet, right?
>> No.
>> We're seeing a monoculture of models.
>> Yeah.
>> And there's clearly pressure to make a good code model and merge it back into the mainline again.
>> Yeah. Yeah.
>> Even though there already is pressure on the models.
>> I guess I feel like there's a lot of very short-term supply crunch, and maybe that causes more speciation now.
>> Yeah. I think fundamentally the labs are serving a model and they don't really know what the end user is going to be asking about. So maybe that's some part of it, because they kind of have to multitask over all the possible things that could be asked. But if you're coming to a business and maybe partnering on some specific problems you care about, then maybe you would see it there. Or there would be some very high-value applications that are more niche. But I think right now they're going after the totality of what's available. I also don't think the science of manipulating the brains is fully developed yet.
>> What do you mean, manipulating?
>> So, fine-tuning without losing capabilities, as an example. We don't have the primitives for working with the intelligences in ways other than just context windows. Context windows just work, they're very cheap to manipulate, and this is how we're getting some of the customization. But it's a bit more of a developing science: how you more deeply adjust the models, how you have continual learning maybe, how you fine-tune in a certain area, how you get better in a certain area, how you actually touch the weights and not just the context windows. It's a lot trickier, I would say, to touch the weights than the context windows, because you're fundamentally changing the full model and potentially its intelligence. So maybe speciation is just not a fully developed science yet, if that makes sense. And it also has to be cheap enough for that speciation to be worthwhile in these given contexts.
>> Can I ask a question about an extension to auto research that you described, in terms of opening it up? You said, okay, we have this thing, and we need more collaboration surface around it, essentially, for people to contribute to research overall. Can you talk about that?
>> Yeah. So we talked about how auto research has a single thread of, I'm going to try stuff in a loop, but fundamentally the parallelization of this is the interesting component. I was trying to play around with a few ideas, but I don't have anything that clicks, something I'm super happy with just yet. It's something I'm working on on the side when I'm not working on my claw. I think one issue is: if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto researchers talking through a common system or something like that. What I was more interested in is how you can have an untrusted pool of workers out there on the internet.
>> So, for example, in auto research you're just trying to find the piece of code that trains a model to a very low validation loss. If anyone gives you a candidate commit, it's very easy to verify that the commit is good. Someone on the internet could claim that this piece of code will optimize much better and give you much better performance, and you can just check; probably a lot of work goes into that checking, but fundamentally they could lie, etc. So you're basically dealing with a similar kind of problem, and actually my designs that incorporate an untrusted pool of workers look a little bit like a blockchain. Instead of blocks you have commits, and these commits can build on each other, and they contain changes to the code as you're improving it. And the proof of work is basically doing tons of experimentation to find the commits that work.
>> And that's hard. And the reward right now is just being on the leaderboard; there's no monetary reward whatsoever. I don't want to push the analogy too far, but it fundamentally has this property where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good. Someone had to try 10,000 ideas, but you just have to check that the thing they produced actually works,
>> because the other 9,999 of them didn't work, you know.
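The search/verify asymmetry just described could be sketched roughly as follows. Everything here is hypothetical (the `Candidate` shape, `run_training` as a stand-in for an actual sandboxed training run): untrusted workers do the expensive search and submit commits with a claimed validation loss, and the trusted side only pays to re-check each submission once.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    worker_id: str       # untrusted submitter from the swarm
    commit: str          # proposed change to the training code
    claimed_loss: float  # validation loss the worker claims to reach

def run_training(commit: str) -> float:
    """Stand-in for the expensive, trusted step: apply the commit in a
    sandbox, train, and measure validation loss. Simulated here."""
    return 0.90 if commit == "better-optimizer" else 1.50

def verify(c: Candidate, best_loss: float, tol: float = 1e-6) -> bool:
    """Cheap relative to the search: one trusted re-run per submission.
    Accept only honest claims that beat the current leader."""
    measured = run_training(c.commit)
    return abs(measured - c.claimed_loss) <= tol and measured < best_loss

# The swarm may have tried thousands of ideas; we only check what's submitted.
best_loss = 1.00
leaderboard = []
for c in [Candidate("worker-a", "tweak-lr", 0.80),           # dishonest claim
          Candidate("worker-b", "better-optimizer", 0.90)]:  # honest improvement
    if verify(c, best_loss):
        best_loss = c.claimed_loss
        leaderboard.append(c.worker_id)

print(leaderboard, best_loss)  # only the verified commit makes the board
```

In a real system the re-run would itself be the expensive part (a full training run), and the submitted code would need strict sandboxing, which is the security concern raised next.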
>> And so, basically, long story short: you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification. The whole thing is asynchronous, and it has to be safe from a security perspective, because if anyone can send you arbitrary code and you're going to run it, that's very sketchy and dodgy. But fundamentally it should be totally possible. You're familiar with projects like SETI@home and Folding@home; all of these problems have a similar kind of setup. In Folding@home you're folding a protein, and it's very hard to find a configuration that is low energy. But if someone finds a configuration that they evaluate to be low energy, that's perfect: you can just use it, you can easily verify it. A lot of things have this property of being very expensive to come up with but very cheap to verify, and in all those cases, things like Folding@home or SETI@home or auto research at home will be good fits. And so, long story short, a swarm of agents on the internet could collaborate to improve LLMs, and could potentially even run circles around the frontier labs. Who knows? Maybe that's even possible. The frontier labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute. If you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions, and people contribute cycles to a thing they care about. And the last thought: lots of companies could maybe have their own things they care about, and if you have compute capacity, you could contribute to different auto research tracks. Maybe you care about a certain type of cancer, say; you don't just donate money to an institution, you could actually purchase compute and join the auto research swarm for that project. So if everything is rebundled into auto researchers, then compute becomes the thing you're contributing to the pool.
>> Yeah, that's very inspiring. And
it's also interesting. I don't know how far this goes, but it is interesting that at least some audience of people, here in Silicon Valley or lining up at retail stores in China, have discovered that having access to personal compute is interesting again.
>> Yeah.
>> Right. So maybe they're really motivated to do that for their claws, and then they can contribute to auto research.
>> It's almost like: dollars are the thing everyone cares about now, but are flops the thing everyone will actually care about in the future? Is there going to be a flipping of what you care about? Right now, for example, it's really hard to get compute even if you have money.
>> Yeah.
>> So it almost seems like the flop is dominant, in a certain sense. So maybe it's like that: how many flops do you control, instead of what wealth do you control? I don't actually think that's true, but it's kind of interesting to think about.
>> The last thing you released was a little bit of jobs data analysis, is that right? It might have touched a nerve, even though you were just visualizing some public data. What were you curious about?
>> Yeah, I guess I was curious because everyone is really thinking about the impacts of AI on the job market and what it's going to look like. So I was just interested to take a look: what does the job market look like? Where are the different roles? How many people are in different professions? And I was really just interested to look through the individual cases and try to think for myself about, with these AIs and how they're likely to evolve, are these going to be tools that people are using, or are they going to be displacing these professions? What are the current professions, and how are they going to change? Are they going to grow, or adjust to a large extent, or what could be new professions? So it was really just a way to fuel my own chain of thought about the industry, I suppose.
>> Mhm.
>> And the jobs data is basically just from the Bureau of Labor Statistics. They actually have a percent outlook for each profession, about how much it's expected to grow over the next, I think, almost a decade.
>> Yeah, I think it's a decade, but it was made in 2024.
>> We need a lot of healthcare workers.
>> Yeah. So they've already made those projections, and I'm actually not 100% sure what methodology they put into the projections.
I guess I was interested to color things by this view: what's primarily being developed now is a kind of digital AI, almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information, and they currently don't really have a physical embodiment or presence. The physical stuff is probably going to go slightly slower, because you're manipulating atoms. Flipping bits, and the ability to copy-paste digital information, makes everything a million times faster than accelerating matter. So energetically, I just think we're going to see a huge amount of activity in digital space, a huge amount of rewriting, a boiling soup of activity. We're going to see something in the digital space that goes at the speed of light compared to what's going to happen in the physical world, to some extent, if that's the extrapolation. And so I think there's currently an overhang, where there can be a lot of unhobbling of digital information processing that used to be done by computers and people, and now AI is a third kind of manipulator of digital information. There's going to be a lot of refactoring in those disciplines. But the physical world is going to be behind that by some amount of time. So what's really fascinating to me, and why I was highlighting the professions that fundamentally manipulate digital information, work you could do from your home, etc., is that those are the things that will change. It doesn't mean there are going to be fewer of those jobs or more of those jobs, because that has to do with demand elasticity and many other factors, but things will change in those professions because of these new tools, and because of this upgrade to the nervous system of the human superorganism, if you want to think about it that way.
>> Given the look you had at
the data, do you have any observations or guidance for people facing the job market, or thinking about what to study now, or what skills to develop? I'm very thankful that I have to meet people for my job right now.
>> More physical. Yeah.
>> Could you do your work from home, though?
>> I could. I think there are relationship parts of it that are hard, but most of it I could.
>> Yeah. I think it's really hard to tell, because again, the job market is extremely diverse and the answers will probably vary. But to a large extent, these tools are extremely new and extremely powerful, so just trying to keep up with them is the first thing. Because I think a lot of people dismiss it,
>> or they're afraid of it,
>> or they're afraid of it, etc., which is totally understandable, of course. I think it's fundamentally an empowering tool at the moment. These jobs are bundles of tasks, and some of those tasks can go a lot faster, so people should think of it primarily as the tool that it is right now. The long-term future of that is uncertain. It's really hard to forecast, to be honest, and I'm not professionally doing that; I think it's the job of economists to do properly.
>> You are an engineer, though. One thing I thought was interesting is that the demand for engineering jobs is continuing to increase.
>> Yeah.
>> I can't tell if that's a temporary phenomenon. I'm not sure how I feel about it yet. Do you know?
>> Yeah. It's almost like software was scarce, right? The reason we don't have more demand for software is just its scarcity; it's too expensive.
>> Too expensive. Yeah.
>> So if the barrier comes down, then you actually have the Jevons paradox: the demand for software actually goes up. It's cheaper and more powerful. The classical example of this is always the ATMs and the bank tellers, because there was a lot of fear that ATMs and computers would displace tellers. But what happened is they made the cost of operating a bank branch much cheaper, so there were more bank branches, and so there were more tellers. That's the canonical example people cite. But basically it's the Jevons paradox: something becomes cheaper, so there's a lot of unlocked demand for it. So I do have a cautiously optimistic view of this in software engineering. It does seem to me that the demand for software will be extremely large, and it's just become a lot cheaper. It's very hard to forecast, but right now, at least locally, there's going to be more demand for software, because software is amazing. It's digital information processing; you're not forced to use arbitrary tools that were given to you and are imperfect in various ways, you're not forced to subscribe to what exists. Code is now ephemeral, and it can change and be modified. So I think there's going to be a lot of activity in the digital space to rewire everything, in a certain sense, and it's going to create a lot of demand for this kind of stuff. I
think long-term, obviously, even with auto research: OpenAI or Anthropic or these other labs are employing what, a thousand-something researchers, right?
>> These researchers are basically, like, glorified auto researchers.
>> They're actively automating themselves away. This is the thing they're all trying to do.
>> Yeah.
>> Some of those researchers also feel the psychosis, right? Because they can see it's working.
>> Right. And so they're like, oh, it's over for me too.
>> I did spend a bunch of time going around OpenAI, and I was like, you guys realize if we're successful, we're all out of a job? We're just building automation for Sam or something like that. Or the board, I'm not sure. We're building this automation for the board or the CEO, and we're all out of our jobs, and maybe contributing on the side. So yeah, it's kind of unnerving from that perspective.
>> Is it okay if I ask you Noam's question? You could be doing that, right? Auto researching, with a lot of compute scale and a bunch of colleagues, at one of the frontier labs. Why not?
>> Well, I was there for a while, right? And I did re-enter, so to some extent I agree, and there are many ways to slice this question. It's a bit of a loaded question. I will say that I feel very good about the impact people can have outside of the frontier labs, obviously in the industry, but also in more ecosystem-level roles. Your role, for example, is more ecosystem-level. My role currently is also more ecosystem-level, and I feel very good about the impact people can have in those kinds of roles. Conversely, there are definite problems in my mind with aligning yourself way too much with the frontier labs, too. Fundamentally, you have a huge amount of financial incentive tied up with these frontier labs, and by your own admission the AIs are going to really change humanity and society in very dramatic ways, and here you are basically building the technology and benefiting from it, being very allied to it through financial means. This was a conundrum at the heart of how OpenAI started in the beginning; this was the conundrum we were trying to solve.
>> And the conundrum is still not fully resolved. So that's number one. Number two, you're not a completely free agent, and you can't actually be part of that conversation in a fully autonomous, free way if you're inside one of the frontier labs. There are certain things you can't say, and conversely there are certain things the organization wants you to say. They're not going to twist your arm, but you feel the pressure of what you should be saying, because otherwise it's really awkward conversations, strange side eyes, like, what are you doing? So you can't really be an independent agent, and I feel a bit more aligned with humanity, in a certain sense, outside of a frontier lab, because I'm not subject to those pressures, and I can say whatever I want. In the frontier labs, you can have impact there as well, of course. There are many researchers, and maybe you're one of them, maybe your ideas are really good, etc. Maybe there's a lot of decision-making to do, and you want to be in the room for those conversations when they come up. I do think that currently the stakes are overall fairly low, so everything is kind of nice. But ultimately, at the end of the day, when the stakes are really high, if you're an employee at an organization, I don't actually know how much sway you're going to have over what the organization does. You're not really in charge; you're in a room and you're contributing ideas, but you're not really in charge of the entity you're a part of. So those are some sources of misalignment, I think, to some extent. I
will say that in one way I do agree a lot with that sentiment. The labs, for better or worse, are opaque; a lot of work is there, and they're at the edge of capability and what's possible, working on what's coming down the line. If you're outside of a frontier lab, your judgment will fundamentally start to drift, because you're not part of what's coming down the line. So I feel like my judgment will inevitably start to drift as well, and I won't actually have an understanding of how these systems work under the hood, because it's an opaque system, and I won't have a good understanding of how it's going to develop. So in that sense I agree, and it's something I'm nervous about. I think it's worth basically being in touch with what's actually happening, actually being in the frontier lab, and if some of the frontier labs would have me come for some amount of time and do really good work for them, and then maybe coming back out...
>> So you're looking for a job. This is super exciting.
>> Yeah.
>> Then I think that's maybe a good setup, because maybe that's one way to actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled
>> Yeah.
>> by those entities. So honestly, in my mind, Noam can probably do extremely good work at OpenAI, but I also think his most impactful work could very well be outside of OpenAI.
>> Now that's a call to be an independent researcher with auto research.
>> Yeah, there are many things to do on the outside, and I think ultimately the ideal solution maybe is going back and forth. Fundamentally, you can have really amazing impact in both places. I don't know, it's a bit of a loaded question, but I joined a frontier lab, and now I'm outside, and maybe in the future I'll want to join again. That's kind of how I look at it.
One question related to what visibility the world or the AI ecosystem has into the frontier: how close is open source to the frontier, and how sustainable is that? I think it is quite surprising, the entire sequence of events, from having a handful of Chinese models and global models, and I think people are going to continue releasing, here in the near term, models that are closer to the frontier than much of the industry anticipated from a capability perspective. I don't know if you're surprised by that, but you're a long-term contributor to open source. What's your prediction here?
>> Yeah. So roughly speaking, the closed models are ahead, but people are monitoring the number of months that the open source models are behind.
>> And it started with there's nothing, and then it went to 18 months, and now it's
>> convergence, right. So maybe they're behind by, what's the latest, maybe six or eight months right now.
>> Yeah, I'm a huge fan
of open source, obviously. For example, in operating systems you have closed ones, Windows and macOS, which are large software projects, kind of like what LLMs are going to become, and there's Linux. Linux is actually an extremely successful project; it runs on the vast majority of computers. Last time I checked, was it 60% or something that run Linux? And that's because there is a need in the industry for a common open platform that everyone feels safe using. The industry has always felt a demand for that kind of project to exist, and I think the same is true now; businesses want, there's demand for, this kind of thing to exist. The big difference is that everything is capital-intensive. There's a lot of capex that goes into this.
>> So I think that's where things fall apart a little bit, and it becomes a bit harder to compete, in a certain sense.
I do think the current models are very good. The other thing I think is really interesting is that for the vast majority of consumer use cases and things like that, even the current open source models are actually quite good, I would say. And if you go forward a few more years, it does seem to me that a huge amount of simple use cases are going to be well covered, and actually even run locally. But there's always going to be some demand for frontier intelligence, and that can be an extremely large piece of the pie. It could be that the need for frontier intelligence is going to be, you know, Nobel Prize kind of work, or let's move Linux from C to Rust. There are going to be bigger projects, scoped in that kind of way, and maybe that's what a lot of the frontier closed intelligences are going to be working on, while open source eats through a lot of the more basic use cases. At some point, what is frontier today, in terms of what I'm using right now from the closed labs, might be open source, probably later this year, and that's going to be doing a lot of work. So I kind of expect this dynamic to continue: we'll have frontier labs with closed AIs that are kind of like these oracles, and then we'll have open source behind by some number of months. I actually think that's a pretty good setup overall, because I'm a little hesitant about the alternative. I think there's some systemic risk attached to just having intelligences that are closed, and that's it.
Mhm.
>> And I think that, you know, centralization has a very poor track record in my view, uh, in the past, and has, uh,
>> You mean like in political or economic systems in general?
>> Yes.
>> Exactly. I think there's like a lot of
>> Like Eastern European. Yeah.
>> A lot of pretty bad precedent. So I want
there to be a thing that is maybe not at the edge of capability, because that's new and unexplored, etc. But I want there to be a thing that's behind, and that is kind of like a common working space for intelligences, that the entire industry has access to. Yeah, that seems to me like a pretty decent power balance for the industry.
>> Yeah, I also think there are many problems to solve, right? Like if you keep advancing intelligence at the frontier, we can do new things, and there are a lot of very big problems for humanity, right? And it seems that that will continue to be a very expensive game. And so I want to root for labs that are doing that, because there are problems we cannot solve without continuing to advance the models in a very expensive way. Yeah. And yet, as you point out,
>> if what we have today as Frontier is
open, that's a lot of capability. Yeah.
Right. And so I think, you know, the power of that, or the democratization of that, seems
>> very useful and also healthy.
>> Yeah. I think basically by accident we're actually in an okay spot
>> and optimal. Yeah.
>> By accident, we happen to be in a good spot in a certain sense. Um,
>> Well, and to some degree, the longer this dynamic endures,
>> um, the healthier of a spot the ecosystem might be in, right? Because you have more and more area under the curve.
>> And I will say that even on the closed side, I almost feel like it's been even further centralizing recently, because I think a lot of the front-runners are not necessarily the top tier, and so, yeah, in that sense I think it's not super ideal. I would love there to be more frontier labs, because I'm by default very suspicious of, um, I want there to be more people in the room. I think in machine learning, ensembles always outperform any individual model, and so I want there to be ensembles of people thinking about all the hardest problems, and I want there to be ensembles of people in the room, all well informed, making all those decisions, you know. So I don't want it to be closed doors with two people or three people. I feel like that's not a good future. I almost wish there were more labs, is the long story short, and I do think that open source has a place to play. I hope it sticks around, and it's currently slightly behind, and that's actually kind of a good thing.
>> Okay. You worked on the precursor to generalized robotics autonomy, um, in cars, right? A lot has happened in the last couple of months with robotics companies as well: acceleration of really impressive generalization across environments and tasks, increasingly long-horizon tasks, lots of money going into the space. Is it going to happen? Has anything in your view changed recently?
>> So my view is kind of informed by what I saw in self-driving, and I do feel like self-driving is the first robotics application. What I saw at the time, like 10 years ago, was a large number of startups, and I kind of feel like most of them basically didn't make it long-term. And what I saw is that a lot of capital expenditure had to go in, and a lot of time. So I think robotics, because it's so difficult and so messy and requires a huge amount of capital investment and a lot of conviction, it's a big problem, and I think atoms are really hard. So I kind of feel like it will lag behind what's going to happen in digital space. And in digital space there's going to be a huge amount of unhobbling, basically things that weren't super efficient becoming a lot more efficient, by like a factor of 100,
>> because bits are so much easier. And so currently, in terms of what's going to change and where the activity is, I kind of feel like digital space is going to change a huge amount, and then the physical space will lag behind. And what I find very interesting is the interface in between them as well, because I think, if we do have more agents acting on
behalf of humans, and more agents talking to each other and doing tasks and participating in the kind of economy of agents, etc., you're going to run out of things that you can do purely in digital space. At some point you have to go to the universe and ask it questions. You have to run an experiment and see what the universe tells you back, to learn something. And so we currently have a huge amount of digital work because there's an overhang: we just didn't have enough thinking cycles among humans to think about all the information that is already digital, already uploaded. And so we're going to start running out of stuff that is already uploaded. At some point you're going to have read all the papers and processed them and have some ideas about what to try. But I don't actually know how far you can get with intelligence that's fully closed off, with just the information that's available to it, you know. And so I
think what's going to happen is: first, there's going to be a huge amount of unhobbling, and I think there's a huge amount of work there. Then it's going to move to the interfaces between physical and digital, and that's sensors for seeing the world and actuators for doing something to the world. So I think a lot of interesting companies will come from that interface: can we feed the superintelligence data, in a certain sense, and can we take data out and manipulate the physical world per its bidding, if you want to anthropomorphize the whole thing, right? And then the physical world, I almost feel like the total addressable market, in terms of the amount of work and so on, is massive, possibly even much larger than what can happen in digital space. So I actually think it's a much bigger opportunity as well, but I do feel like it's a huge amount of work, and in my mind the atoms are just a million times harder. So it
will lag behind, but it's also, I think, a bit of a bigger market. So yeah, I think the opportunities kind of follow that trajectory. Right now the digital is my main interest, then interfaces would be after that, and then maybe some of the physical things; their time will come, and they'll be huge when they do come.
>> Well,
it's an interesting framework for it too, because certain things, not the things I'm working on right now, but certain things are much easier even in the world of atoms, right? Like if you just think about read and write to the physical world: for read, like sensors, cameras, there's a lot of existing hardware, and you can imagine
>> enriching agent capabilities or capturing a lot of new data if you're just clever about it, and you don't necessarily have to invest a lot to get something valuable.
>> Yeah. So examples of this that I saw: you know, a friend of mine, Liam, is the CEO of Periodic Labs. I visited them last week, so it's just top of mind. They're trying to do auto research for materials science,
>> um, and so in that case the sensors to the intelligence are actually pretty expensive lab equipment, and the same is true in biology. I think a lot of people are very interested in engineering biology, and the sensors will be more than just video cameras, if that makes sense. And then the other thing I saw, for example, is companies where you basically pay people for training data. Yeah. As an example, to feed
>> programmatically.
>> Yeah, to feed the Borg. Um, and so these are all examples of sensors in a certain sense. So they take many diverse shapes and forms, if that makes sense.
>> Yeah. So I'm looking forward to the
point where I can ask for a task in the
physical world and I can put a price on
it and just tell the agent like you know
you figure out how to do it. Go get the
data.
>> I'm actually kind of surprised we don't have more, like, information markets. Mhm.
>> Like if, for example, Polymarket or other betting markets, or even stocks, etc., have so much autonomous activity, and a rising amount of activity,
>> um, why, for example, if the Iran thing were happening now, how come there isn't a process where taking a photo or video from somewhere in Tehran costs like 10 bucks? Someone should be able to pay for that, you know. And that's an example of feeding the intelligence. There's not going to be a human looking at it; it's going to be agents who are trying to guess the betting games and stock markets and so on. Mhm.
>> So I kind of feel like the agentic web is still fairly new, and there are no mechanisms for this yet, but this is an example of what I think might happen. There's a good book that maybe is inspiring, called Daemon, you've potentially read it. In Daemon, the intelligence ends up puppeteering humanity a little bit, in a certain sense, you know, and so humans are kind of like its actuators, but humans are also like its sensors.
>> Um, and so I think collectively society will kind of reshape in a certain way to serve that. That will kind of end up happening collectively across the industry, where, yeah, there's just a lot more automation that has certain needs, and humans will be serving those needs of that machine, not necessarily each other's.
>> Well, we were, um, on this very specific point of missing pieces of training data: we needed something like auto research, right? Like, we need the training cycle, or the SFT piece, to be far more mechanized
>> For what part?
>> In order to take the human out of the loop, to ask for a task that is just like, improve my model quality
>> with new data,
>> right?
>> Uh, yes.
>> Does that make sense to you? Like, if you can't have the model do the training runs by itself,
>> then your ability to do this as a closed-loop task, yes, by pricing data, yeah,
>> is more challenged.
>> Yes. Yes, 100%. Yeah. But now the thing is, for LLM training, it actually really fits the paradigm.
>> Um, so you'd actually
>> yeah, a clean metric
>> yeah, LLM training actually fits the paradigm really well, really easily: all the optimization of all the code so it runs faster, and then you also have metrics that you can optimize against. I do think that if you had an autonomous loop over those metrics, there's going to be a lot of Goodharting going on, where the system will overfit to those metrics. But then you can use the system to devise more metrics and get really good coverage. So it's kind of hard to tell, but in a certain sense it's a pretty good fit.
>> I want to talk about a tiny side project you have before we end. Um, tell me about microGPT.
>> Oh yeah, okay. So microGPT. I have this running obsession, of maybe a decade or two, of simplifying and boiling down LLMs to their bare essence. And I've had a number of projects along these lines, like nanoGPT and makemore and micrograd, etc. So I feel like microGPT
is now the state of the art of me trying to boil it down to just the essence. Because the thing is, training neural nets, and LLMs specifically, is a huge amount of code, but all of that code is actually complexity from efficiency.
>> It's just because you need it to go fast. If you don't need it to go fast and you just care about the algorithm, then that algorithm is actually 200 lines of Python, very simple to read, and this includes comments and everything.
Because you just have your dataset, which is text, and you need your neural network architecture, which is like 50 lines. You need to do your forward pass, and then you have to do your backward pass to calculate the gradients. A little autograd engine to calculate the gradients is like 100 lines, and then you need an optimizer, Adam for example, which is a very state-of-the-art optimizer, and is again like 10 lines really. And so putting everything together in a training loop is, yeah, 200 lines. And it was
interesting to me: normally, maybe a year ago or more, if I had come up with microGPT, I would be tempted to explain it to people, like have a video stepping through it or something like that. And I actually tried to make that video a little bit, and I tried to make a little guide to it and so on, but I kind of realized that this is not really adding too much, because it's already so simple, it's 200 lines, and anyone could ask their agent to explain it in various ways. I'm not explaining to people anymore; I'm explaining it to agents. If you can explain it to agents, then agents can be the router, and they can actually target it to the human in their language, with infinite patience, and at their capability, and so on.
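[The components he lists really are that small. As a rough illustration, a bare-bones Adam update fits in about ten lines of plain Python. This sketch is the editor's, not code from microGPT; the function name and flat-list parameter layout are illustrative assumptions.]

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over flat lists of scalar parameters and gradients.

    m and v carry the running first- and second-moment estimates between
    calls; t is the 1-based step count used for bias correction.
    """
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g       # exponential moving average of the gradient
        v[i] = b2 * v[i] + (1 - b2) * g * g   # exponential moving average of its square
        mhat = m[i] / (1 - b1 ** t)           # bias-correct the moments (they start at 0)
        vhat = v[i] / (1 - b2 ** t)
        params[i] -= lr * mhat / (math.sqrt(vhat) + eps)
    return params, m, v
```

[Everything a production framework adds on top of an update like this, fused kernels, mixed precision, sharding, is the "complexity from efficiency" he's describing.]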
>> Right. If I don't understand um this
particular function, I can ask the agent
to explain it to me like three different
ways and I'm not going to get that from
you.
>> Exactly.
>> And so I kind of feel like, you know, what is education? It used to be guides, it used to be lectures, it used to be this thing, but I feel like now I'm more explaining things to agents, and maybe I'm coming up with skills, where, so basically, a skill is just a way to instruct the agent how to teach the thing. So maybe I could have a skill for microGPT: the progression I imagine the agent should take you through if you're interested in understanding the codebase. It's just hints to the model, like, oh, first start off with this and then with that, and so I could just script the curriculum a little bit as a skill. So I feel like there's going to be less of explaining things directly to people, and it's going to be more of just: does the agent get it? And if the agent gets it, they'll do the explanation. And we're not fully there yet, because I still think I can probably explain things a little bit better than the agents, but I feel like the models are improving so rapidly that it's a losing battle to some extent.
And so I think education is going to be kind of reshuffled by this quite substantially, where it's the end of teaching each other things, almost a little bit. Like, if I have a library of code, for example, it used to be that you have documentation for other people who use my library, but you shouldn't do that anymore. Instead of HTML documents for humans, you should have markdown documents for agents, because if agents get it, then they can just explain all the different parts of it. So it's this redirection through agents, you know, and I think we're going to see a lot more of that playing out.
>> Well, we'll see if the great teachers, you know, develop intuition for how to explain things to agents differently.
>> Ultimately, so for example, microGPT: I tried to get an agent to write microGPT. I told it, try to boil down neural network training to the simplest thing, and it can't do it. MicroGPT is like the end of my obsession. It's the 200 lines. I thought about this for a long time. I was obsessed about this for a long time. This is the solution. Trust me, it can't get simpler. And this is my value add. Everything else, the agent gets.
>> It just can't come up with it. But it totally gets it and understands why it's done a certain way, etc. So my contribution is kind of these few bits, but everything else, in terms of the education that goes on after that, is not my domain anymore. So maybe, yeah, education kind of changes in those ways, where you have to infuse the few bits that you feel strongly about: the curriculum, or the better way of explaining it, or something like that. The things that agents can't do are your job now. The things that agents can do, they can probably do better than you, or will very soon. And so you should be strategic about what you're actually spending time on.
>> Well, we appreciate the few bits. Thank you, Andrej.
>> Okay.
>> Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Andrej Karpathy discusses the revolutionary shift in software engineering caused by AI agents, a phenomenon he calls "AI psychosis" due to the massive jump in individual capability. He shares how his workflow shifted from writing code to delegating to agents like Claude and Cursor, even automating his home via a WhatsApp-controlled agent named "Dobby." The conversation delves into "Auto Research," where AI models recursively improve themselves, the concept of "jagged intelligence" where models excel at coding but fail at simple humor, and the future of education, where experts will document for agents rather than humans.