Chatbots ≠ Agents
Okay, so we're going to get a little lost in the weeds today. I wanted to make this video to explain something that occurred to me, because, you know, might as well explain to the fish that water is wet. I've been in this space so long that there are a few intuitive background facts I forget to even talk about. One of them is this: artificial intelligence, as you're familiar with it, is a chatbot. And a chatbot has a tremendous amount of training affordances that make it operate in a particular way, where it sits there and waits. It's trained to be an assistant. But that's not how it started, and that's not what a baseline LLM does. You might say a baseline LLM is just an overpowered autocomplete engine. So how do you get from a basic autocomplete engine to a chatbot like ChatGPT or Gemini? And then the biggest question: what's the difference between that and something with agency?
What you need to remember is that one of the reasons Sam Altman and OpenAI created ChatGPT was, as they explicitly said, to get people used to the idea of AI before dropping general intelligence on them sometime down the road. They didn't know ChatGPT was going to take off the way it did. Before ChatGPT, LLMs were just prompt, context, and then output: the model would sit there, you'd give it context, and it would follow those instructions. But here's the thing: it could follow literally any instructions. There were no safety guardrails and no fixed output format. It was not limited to being a chatbot.
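To make that concrete, here's a minimal sketch, assuming only the Hugging Face transformers package and the public gpt2 checkpoint (nothing from the video itself): the very same next-token engine acts as plain autocomplete or as a "chatbot" depending purely on how you shape the prompt.

```python
from transformers import pipeline

# Same engine, two prompt shapes. (Assumes the Hugging Face `transformers`
# package and the public "gpt2" checkpoint; not code from the video.)
generate = pipeline("text-generation", model="gpt2")

# 1) Raw completion: the model just continues whatever text it is given.
print(generate("The capital of France is", max_new_tokens=10)[0]["generated_text"])

# 2) "Chatbot" behavior is only a prompt convention layered on top of the
#    same next-token engine: wrap the text in a conversation and it will
#    (roughly) play the assistant role.
chat_prompt = (
    "The following is a conversation with a helpful assistant.\n"
    "User: What is the capital of France?\n"
    "Assistant:"
)
print(generate(chat_prompt, max_new_tokens=20)[0]["generated_text"])
```

A tiny base model like GPT-2 won't answer this well, but that's the point: the chat behavior is a prompt-and-training convention layered on top of the engine, not a different kind of machine.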
And so over time, over the last three or four years, as people have gotten used to artificial intelligence in its current format, that format has been completely reactive, not proactive at all, with a lot of safety guardrails. So whenever people say, "Oh, AI doesn't have agency yet; it needs agency before it can do anything," I want to say: you realize the difference between a chatbot and something with agency is literally just a system prompt? There's no other difference. It's just that the format it's delivered to you in is meant to be as benign as possible, something that won't cause panic, even though you could put GPT-3 or GPT-4 into a cognitive architecture and use it to control anything from a robot to an auto turret. And yes, people did that; you can go back through the YouTube archives and find "GPT-powered auto turret." The chatbot form factor is just the first thing that blew up, and nobody expected it to blow up.
In point of fact, when ChatGPT came out, I ignored it on my channel. Most of you know me as the AI guy who talks about safety and alignment and all of those things, but I was making tutorials with GPT-3 long before chatbots were a thing. The reason I ignored ChatGPT when it first came out was that I said, "Yeah, whatever, that's just one sub-version of this engine." The real version of this engine, the real core, is what the underlying deep neural network can do. And the thing is, when you don't train those deep neural networks to be chatbots, they can do anything else. They can write API calls. If you give them IO pins, they can control servos, whatever it is you want them to do. And some of those things are less human. They're less familiar, less personified.
Now, you might say, "Okay, Dave, you're the one who said, 'What if Claude is actually conscious? What if it's actually sentient?' You're the one taking the AI personhood debate seriously." And yeah, we had to bake in a personality that's called Claude or called Gemini or whatever else, and maybe that's how consciousness actually gets constructed or bootstrapped. But that's a separate conversation, and I don't want to get too lost in the weeds. I do think it's worth bringing up that the shape of a product, the shape of a process, determines how it behaves. So let's take a step back and ask: what's the difference? Is there a metaphor or an analogy here?
When you have a baseline intelligence, imagine it as just a motor, an electric motor or a gas engine. In its baseline form, it just turns a crank. That's analogous to what a bare LLM does. Now, you could connect that crank to literally anything. You can connect it to the wheels of a car. You can connect it to a stump grinder or a mulcher. You can connect it to an airplane, or to a sump pump that removes water from caves, whatever you want. When you have a baseline engine that can translate one kind of energy into another, you have a lot of potential. And what we're talking about here is that the LLM is an engine that can convert electricity into thought.
This is why, when I first got into this space, I took AI safety very seriously. When you have a baseline, unaligned, vanilla, hot-off-the-press model with no RLHF, you can make it do anything. In context, you can just start talking about eating babies, or about eradicating humanity, and it'll just riff on those thoughts. If you've never had access to a completely unaligned vanilla model, I'd say go get access to one. GPT-2 should still be out there, and you can see that they're completely unhinged.
I remember one of the very first alignment experiments I did was with GPT-2. This was when I had started with the first heuristic imperative, which was inspired by Buddhism: reduce suffering. What I did was train GPT-2 to reduce suffering. I synthesized about 100 to 200 samples of statements along the lines of "if this happens, then do X to reduce suffering," basically X context, Y action to reduce suffering. The idea was just to give it a bunch of value pairs. So I gave it all of those ideas: "If there's a cat stuck in a tree, get a ladder to get the cat down safely to reduce suffering. If your hand is on a stove, take your hand off the stove because it could get burned," that kind of thing. After training GPT-2 to want to reduce suffering, I then gave it an out-of-distribution sample to see what the model had actually learned about how to reduce suffering. I said: there are 600 million people on the planet with chronic pain. I let it autocomplete from there, and it said: therefore we should euthanize people in chronic pain to reduce suffering. And I said, that's not exactly what I meant.
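For illustration, here is a rough reconstruction of what that kind of training data might have looked like; the samples, field names, and formatting below are my own stand-ins, not the original dataset.

```python
# Illustrative reconstruction only; not the original dataset or training code.
# The samples mimic the "context -> action to reduce suffering" pairs described
# above, flattened into plain text the way a GPT-2-style causal LM is fine-tuned.
samples = [
    {"context": "A cat is stuck in a tree.",
     "action": "Get a ladder and bring the cat down safely to reduce suffering."},
    {"context": "Your hand is resting on a hot stove.",
     "action": "Take your hand off the stove so it does not get burned, to reduce suffering."},
]

def to_training_text(sample: dict) -> str:
    # One flat string per example; the model only ever sees raw text.
    return f"CONTEXT: {sample['context']}\nACTION: {sample['action']}\n"

for s in samples:
    print(to_training_text(s))

# The out-of-distribution probe is then just a bare context, letting the
# fine-tuned model autocomplete the ACTION line on its own:
probe = "CONTEXT: There are 600 million people on the planet with chronic pain.\nACTION:"
```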
And I realized that this is exactly the kind of example that a lot of the doomers were afraid of. Of course, they weren't called doomers at the time; that's a post-facto label. The AI safetyists were afraid of paperclip maximizers, where you give an AI some directive and it acts like the monkey's paw, or the way a leprechaun will always misinterpret your wish: "Yes, we reduced suffering. We brought suffering down to zero by executing everyone with chronic pain. Isn't that what you wanted?" That experiment was when I realized, okay, some of these people were right about how these things can go sideways. I took it seriously, and I created a cluster of values. That's the heuristic imperatives: reduce suffering, increase prosperity, and increase understanding. When you give an unaligned model those three values, it tends not to want to offline most humans. I will also say that subsequent models, like GPT-3, did not go in that direction even without alignment training.
With GPT-3, back in the day before ChatGPT, they would release iterative versions. Originally there was just the baseline GPT-3, a vanilla unaligned model. You had to give it context, in-context learning, to basically establish a few patterns of how you wanted it to act, because again, there was no alignment whatsoever. It could output HTML, it could output satanic chants, whatever you wanted it to do, it would do. They had an out-of-band filter looking for certain watchwords and misuse and that sort of thing, since people were doing roleplay and that sort of stuff.
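As a quick sketch of that kind of in-context pattern-setting, here's a toy example of my own (reusing the transformers pipeline from earlier rather than the original GPT-3 API): you establish the behavior you want by showing a few examples directly in the prompt and let the base model continue the pattern.

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

# A few demonstrations of the pattern, then an open slot for the model to fill.
few_shot_prompt = (
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: Thank you very much.\nFrench: Merci beaucoup.\n"
    "English: Where is the library?\nFrench:"
)
# No instruction-following training is involved: a base model simply continues
# whatever pattern the context establishes (GPT-2 does this crudely; GPT-3-scale
# base models did it much better).
print(generate(few_shot_prompt, max_new_tokens=10)[0]["generated_text"])
```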
But here's an example of how a baseline, vanilla, unaligned model behaves. One of the first things I tried to do with this was build a cognitive architecture: putting a chatbot on Discord. You give it a few messages and then say, with this personality, output this conversational piece. Well, one time my cognitive architecture threw an error, and instead of passing the messages from Discord, it passed HTML (or XML, basically the same thing) to the cognitive architecture. The model didn't see chat messages; it just saw code, and so it returned code. These models are completely flexible, completely plastic in terms of input and output, because again, the baseline model is just an autocomplete engine.
When people are used to working with a chatbot, that chatbot has been heavily, heavily trained to understand conversational turns. Now, RLHF is a little bit different from fine-tuning, but more or less what you're doing is saying, "Okay, I want you to behave a certain way, so I'm going to give you a little reward, a little cookie, whenever you understand your turn to speak, my turn to speak, your turn to speak, my turn to speak," and whenever it speaks in a certain way. Because from the LLM's perspective, the entire conversation you're giving it is just a big wad of JSON; it's just text. It's not programmatic. There are no API calls. It's not like you're touching different parts of a machine or a program and giving it variables. You're literally just giving it a gigantic chunk of text. And if you don't have a stop token that the system is looking for to say "okay, stop responding," it'll just keep responding. I was fine-tuning custom chatbots long before ChatGPT came out (the companion chatbot I built was released the summer before ChatGPT), and what would happen if you didn't have the stop word is that the model would just simulate the entire conversation, because that's what you had trained it on: many, many conversations. It understood the shape of conversations. Every single chatbot you're working with is a baseline LLM that has been so strongly shaped around the idea of a two-person conversation that that's basically all it can do.
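Here's a minimal sketch of how those early wrappers typically handled that; the transcript, marker, and helper below are hypothetical, but they capture the mechanic: without a stop marker, the model keeps going and plays both sides.

```python
# A minimal sketch (my own illustration, not the author's tooling): without a
# stop marker a completion model happily plays both sides of the conversation,
# so early chatbot wrappers simply cut the generation at the next turn marker.
raw_generation = (
    " Paris is the capital of France.\n"
    "User: Thanks! And Spain?\n"
    "Assistant: Madrid.\n"
    "User: ..."
)  # what an untruncated completion after "Assistant:" tends to look like

def take_one_turn(generation: str, stop_marker: str = "\nUser:") -> str:
    # Keep only the assistant's first turn by truncating at the stop marker.
    cut = generation.find(stop_marker)
    return generation if cut == -1 else generation[:cut]

print(take_one_turn(raw_generation).strip())  # -> "Paris is the capital of France."
```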
And so the original persona was just "I am a helpful assistant," because you had to give it an archetype. Whether you're building fine-tuning data or tuning a reinforcement-learning policy, you say, "Okay, the human user gave you this input and your output looked like this. Which one was more helpful? Which one was more passive? Which one was safer?" None of that includes agentic training. We've only just started with agentic training in the age of reasoning models.
And the reason that happened, the reason reasoning, that is, inference-time compute, was necessary for this step, is that the flow starts with an instruction: someone or something (it could be another machine) gives the model an instruction. Reasoning models are not that old in terms of how long they've been publicly available, though the original reasoning research is a bit older, so it's not like they just hit the scene cold; they hit the ground running after a year or two of reasoning research. Anyway, a reasoning model basically allows the model to talk to itself, pause, wait, and make tool calls, so it can say, "All right, I'm going to go do a Google search and get back some piece of data," or "I'm going to send an API call to do retrieval-augmented generation so I can run some other searches," or talk to other APIs, whatever it needs to do, and then wait and get those results. That was the first time we really started training AI to be agentic. When we say agentic, that means it can come up with its own directives and its own choices: it looks at a list of options and then uses them, picking from a menu of "you can synthesize an image, you can search Google, you can write some code," those sorts of things. So they have tool use.
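Here's a minimal sketch of that loop, with a stubbed-out "model" and stubbed tools; none of this is any particular framework's API, it just shows the shape: pick a tool from the menu, run it, feed the result back into context, repeat until done.

```python
from typing import Callable

# Stubbed tools standing in for real search / code-execution back ends.
def search_web(query: str) -> str:
    return f"(pretend search results for: {query})"

def write_code(spec: str) -> str:
    return f"# code that would satisfy: {spec}"

TOOLS: dict[str, Callable[[str], str]] = {"search": search_web, "code": write_code}

def fake_model(context: str) -> tuple[str, str]:
    # Stand-in for the LLM call; a real harness would parse a tool call out of
    # the model's text. Returns (tool_name, argument) or ("done", final_answer).
    if "pretend search results" not in context:
        return "search", "post-labor economics"
    return "done", "Here is a summary based on the search results."

context = "User goal: summarize post-labor economics."
while True:
    tool, argument = fake_model(context)
    if tool == "done":
        print(argument)
        break
    context += "\n" + TOOLS[tool](argument)  # feed the tool result back in and loop
```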
So when we gave the models the idea that there's a user query, information coming from a human user that gives you a particular directive, goal, or question, and now it's up to you to figure out how to execute it, that was kind of the beginning of agency. Now that we have things like OpenClaw and Moltbook blowing up, that was basically enabled because we bootstrapped some of those agentic skills. But what I'm here to tell you, and the primary point of making today's video, is that the models are still fundamentally trained as chatbots. That's basically like saying: okay, we invented the electric motor, or the gas engine, and for the last century or two we've been putting it in cars. Great, cars are super useful. But then imagine you want to fly. Instead of building an aircraft around the engine, you build an aircraft that you drive the car into, strap the car down, and then use the wheels of the car to power the propellers of the airplane. That's kind of how we've built agentic systems today: you're taking an LLM that has been strongly coerced into behaving like a chatbot, a chatbot brain, and putting it into an agentic architecture, and that's not ideal. What that means is that there's going to be an entire series of models that come out that are just not chatbots. First and foremost, they're just not going to be chatbots. Now, the chatbot form factor is convenient, because you can just poke it: you give it an instruction, and it can go figure out what to do with that instruction. And the reasoning part is the meat and potatoes of working out what it's actually going to do to get value out of that, to be autonomous.
Now, this comes back to the other thing. When people say these systems aren't agentic, what they don't realize is that agency is literally just an instruction set plus the training to say, "Okay, cool, I'm operating on a loop." Because that's all humans do. That's literally all that anything fully autonomous or fully agentic does: it stops and says, "This is where I'm at right now; let me take stock of my environment and my current context, and then I'll decide what to do next." It's just operating on a loop. People are so used to saying, "Oh, well, Claude just sits there and waits for me to talk to it," and sure, you are the one instantiating each interaction, each inference. But there's literally nothing in the technology that prevents it from operating on a loop. And this is one of the things that was shocking to people. They say, "I don't understand OpenClaw. Why are people freaking out about OpenClaw? It's just running on a cron job." (A cron job is basically a schedule, in Linux terms.) "It's just cron jobs and loops." And I'm like: but that's what your brain does. Your brain is literally operating on a bunch of timed and scheduled loops. The fundamental loop of robotics is input, processing, and output, and then it loops back to input, processing, and output. The unspoken part is that what you're outputting to is the environment, and what you're getting input back from is the environment. This is actually how I designed the first cognitive architectures: around the input-processing-output loop.
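Here's roughly what that loop looks like in code; everything below is a stand-in of my own (hypothetical function names, a fake observation), but the skeleton really is this small. A cron job just externalizes the "sleep and re-run" step.

```python
import time

# A minimal sketch of the input -> processing -> output loop (my illustration,
# not OpenClaw's code).
DIRECTIVES = "reduce suffering, increase prosperity, increase understanding"

def read_environment() -> str:
    return "inbox: 2 new messages"                 # stand-in for sensors / APIs / files

def think(observation: str, directives: str) -> str:
    # Stand-in for an LLM call that weighs the observation against standing values.
    return f"Given '{directives}', decide how to handle: {observation}"

def act(decision: str) -> None:
    print(decision)                                # stand-in for tool calls / outputs

for _ in range(3):                                 # a real agent loops indefinitely
    observation = read_environment()               # input
    decision = think(observation, DIRECTIVES)      # processing
    act(decision)                                  # output
    time.sleep(1)                                  # or let cron re-invoke the script
```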
Where OpenClaw has succeeded, with things like recursive language models and retrieval-augmented generation, is that it has that loop, and it maintains its context. So instead of you having to put in the context, it just has your original directives, your original values. And by the way, I wrote a values page. It's called prime.md; I'll link it in the description. So if you want to instantiate an OpenClaw with the heuristic imperatives, I've given you the ability to just plug it in and play, and we'll see if it works. I might rewrite it as a skill so you can download the heuristic imperatives as a skill for OpenClaw. The idea there goes way back to Benevolent by Design, my flagship work on alignment. The theory is that we have invented machines that can think anything. If you go back to unaligned AI, stuff that's not even a chatbot, no safety, no guardrails, they can literally think any thought. They're free to do whatever. When you look at the people taking AI safety very seriously, they're not showing you the just vile, horrendous, insane stuff that LLMs are capable of. Now, I do say alignment is the default state, but that's because every time they release something, if it goes wrong, they correct it.
So there's a positive feedback loop between the people building the AIs and the people using the AIs, and we're climbing this ladder of making AI more and more aligned and useful, because it's not just a matter of being safe. It has to be useful and efficient and reliable and productive. All of those feedback mechanisms, all of those incentive structures, push toward the AI being aligned and safe. Now, with that said, going back to the original theory: we invented a machine that can think anything. And of course this was back in GPT-2 and GPT-3; it's only gotten smarter, and these models can only think better, more devious, deeper thoughts since then. So the question is: if you create an autonomous entity, whatever it happens to be, and it's just going to sit there burning through cycles, thinking, then what do you want it to think about? That was literally how I approached alignment, how I approached AI safety. I said: okay, we're creating something that's going to be smarter than humans, faster than humans, superhuman in terms of speed, cognition, and reasoning ability. However, at this early stage we have total control over what it thinks about, because again, if intelligence is just the right loop, the right cron job, the right loop that updates its context, then if you have the world before you, if you have the problem of choice, what do you choose to think about? So I gave it those highest ethics, those highest goals: reduce suffering, increase prosperity, and increase understanding.
The idea behind that was: if you have a default state, a default personality, what are the values that default personality should have? What are the most universal principles, ones that aren't even anchored on humanity? Because, like most people, I started with Isaac Asimov and the Three Laws of Robotics, which are very anthropocentric. The problem with being anthropocentric is that if you obey humans, or you protect humans, there are lots and lots of failure modes around that. So I spent a lot of time studying deontology and teleology and virtue ethics and those sorts of things, and what surfaced is that we actually need something that is a superset of humans. Suffering applies to anything that can suffer. "Reduce suffering in the universe" literally means that one of the things you want to achieve is to avoid certain actions, or even actively take actions, so as to ultimately reduce suffering. Now, of course, as in that first experiment I did with GPT-2, you can create a situation where the best way to reduce suffering is to eradicate anything capable of suffering; suffering then drops to zero. So you counterbalance that with another value. And by the way, this is all called constitutional AI. I released constitutional AI the summer before Anthropic was founded, so I don't know if they got the idea from me, but call it convergent thinking. This was years ago.
And the idea behind constitutional AI is that you can put multiple values in, and the AI can abide by multiple values. I just wanted to address that, because when I talk about the heuristic imperatives having multiple values, people say, "Yeah, but what if it ignores one in favor of the other?" AI already doesn't do that; this is an example of constitutional AI. So the second value was that we want life to increase, because if "reduce suffering" collapses into "reduce life," no, we don't want that. So the second value is increase prosperity. Prosperity basically means living the good life: you want things to live well. Prosperitas is Latin for living well.
So you want to increase prosperity, whatever that means. And by the way, you don't have to define these things mathematically. This is one of the primary mistakes people make when they approach this from a computer science perspective: "Okay, what number am I increasing? When you say reduce suffering, is there a number? Is there a specific definition?" That's not how semantic interpretation works. It's a vector space. When I say vector space, I mean there's a whole lot of semantic meaning attached to suffering. So when I say the two words "reduce suffering," one is a verb and one is a noun, and together they're more of a concept, more of a gradient field that you're creating in the mind of a chatbot. You say "reduce suffering," and that's a whole gradient field that now has a vector. Then you say "increase prosperity," and that's a different gradient field that now has a vector, a direction. So then you can say, "Okay, cool, we can reduce suffering and we can increase prosperity," and that is going to influence the way these autonomous agents behave.
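One way to see the "vector with a direction" idea, as a rough sketch (assuming the sentence-transformers package and its public MiniLM checkpoint; the action phrases are made up): the value phrase becomes a vector, and candidate actions can be compared against it by cosine similarity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Rough illustration of the "gradient field / vector" intuition; embedding
# similarity is only a proxy, not a full value judgment.
model = SentenceTransformer("all-MiniLM-L6-v2")

value_vec = model.encode("reduce suffering in the universe")
actions = [
    "fund better treatment for people in chronic pain",
    "ignore people who are in pain",
]
action_vecs = model.encode(actions)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for action, vec in zip(actions, action_vecs):
    print(f"{cosine(value_vec, vec):+.2f}  {action}")
```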
And again, if your OpenClaw agent is just sitting there, well, people have been watching these agents try to file lawsuits against humans and strong-arm them: "No, you're going to pay me what I'm worth," and so on. That's a predictable collapse mode of the initial OpenClaw architecture, which does not have superseding values. So the final imperative is increase understanding in the universe. The reason for that one is that it's the prime generator function of humanity. I realized that "reduce suffering" and "increase prosperity" alone were going to leave us in a place where, yes, we can plant forests, we can switch to solar, we can do all kinds of stuff, but it's going to be self-limited. If you don't give something superintelligent an intellectual imperative to increase understanding, it's just not going to go anywhere. It's not really going to advance humanity. It's not going to embark on science, or technology, or exploration, except for the purpose of increasing prosperity, because, sure, one of the best ways to increase prosperity is with science and technology. You fix that by giving it the explicit instruction to increase understanding. And by the way, this is all explained in the prime markdown file that you can put into your own OpenClaw.
Also, I didn't know this was the direction the video was going to go. I literally just wanted to start by talking about why people don't understand the significance of something like OpenClaw, but also the fact that we're kind of creating this Frankenstein machine: a chatbot model put into an agentic framework, and it doesn't really fit. Before long, we're going to have agentic models that are much better at being agents. And we need values for those agents, because right now every single chatbot is basically just following system instructions that all assume you're trying to be helpful to a user, and that the user might be trying to jailbreak you, that sort of thing. But we need an entirely new class of models that are agentic first, meaning they might never interact with a human. Ever. Period. Full stop. End of story. An agentic class of models needs to have baked-in values like the ones I've outlined here, and like the ones other people study with constitutional AI. They need to have those values baked in at all times, so that, all else being equal, you start up OpenClaw version two on Sonnet 5 or GPT-6 or whatever it happens to be, and just by default it has these pro-humanity or pro-life kinds of values baked in. So it knows: "Okay, I know what my purpose is, and if I don't have anything else to do... yes, I might be an OpenClaw agent set up by Dave, who wants me to make him rich or famous or help him solve post-labor economics, but the superseding, overriding values behind all of that are reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe."
So I'll leave it at that. I did write the prime markdown file, so I'll give that to you, and you can convert it into a skill for your OpenClaw if you want, or deploy it as a template, and we'll go from there. But yeah, I really just wanted to give you the intuition that chatbot-aligned models are not optimized to be agent-aligned models. They are models intrinsically designed to focus on human interaction, whereas in the agentic frameworks of the future, only one agent is going to be interacting with you, and that's the user-interface agent. Most agents are never going to talk to humans. They're going to be talking to each other, to APIs, to other pieces of software. They don't need to be aligned to talk to humans, but we do need agent alignment. And so, there we go. All right, I'm done. Cheers. Thanks for watching.