Okay, this unleashed my agent
Thanks to HubSpot for sponsoring this
video.
For the past few weeks, a massive amount
of progress has been made on making
your agents self-evolving. It ranges from the
AutoAgent project, a concept evolved
from Andrej Karpathy's autoresearch that
uses Claude Code or Codex to
self-evolve an agent harness for a
specific set of tasks, and which achieved
number one on SpreadsheetBench
and number one on TerminalBench, to
Claude Code's leaked source
code, where people found a hidden
Auto Dream feature that gets Claude
Code to automatically extract learnings
and best practices from the conversation,
to the super popular Hermes agent, which has
almost taken the growth away from
OpenClaw because it's an agent that remembers
what it learns and gets more capable
over time. The question is, what is the
actual mechanism behind all those
self-learning projects? And what is the
state-of-the-art implementation that you
can take away for your own agent
building? That is what I want to
take you through today: the
state-of-the-art way of building a
self-evolving agent that gets smarter
the more you use it. So, first of all,
you should actually break down those
different projects into two groups. AutoAgent
and autoresearch are
very different creatures compared with
the rest of those self-evolving agent
setups. AutoAgent or autoresearch
is a mechanism to improve the agent
harness or software itself, which means
the goal of AutoAgent is to produce an
agent harness that can complete a
specific type of task better, while
Hermes agent, Auto Dream, and many other
self-learning skills really focus
on in-context learning or memory
output. So, they serve very
different purposes. With AutoAgent
or autoresearch, fundamentally it is
a for loop that keeps running, where the user
defines a vision or PRD in a
program.md file that clearly explains
what this agent or model should do. Then
you get a recent agent harness like
Claude Code or Codex to read this
program.md and make improvements to the
system itself, which can be the agent
harness runtime itself or a special
model and script. Then the agent runs an
evaluation to compare the performance of
this new version against the
previous one, decides whether to keep or
discard the improvements, and repeats this
loop indefinitely. The result is a system
that is fixed once it's produced, rather than a mechanism that
makes your existing agent
continuously learn. And to run this
loop, you actually need
a database of tasks and a
programmatic way to evaluate and verify
the performance, and in many cases you
probably don't have that large a database
or a deterministic way to verify the work.
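
As a rough illustration of that keep-or-discard loop, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `evolve`, `toy_propose`, and `toy_score` are hypothetical names, the "harness" is just a dict with one setting, and in a real setup the proposal step would be a coding agent editing the harness and the score would come from running the task database.

```python
import random
from typing import Callable, List

def evolve(
    harness: dict,
    propose: Callable[[dict], dict],
    score: Callable[[dict, List[str]], float],
    tasks: List[str],
    iterations: int = 20,
) -> dict:
    """Keep a candidate harness only if it scores strictly better than the best so far."""
    best, best_score = harness, score(harness, tasks)
    for _ in range(iterations):
        candidate = propose(best)              # stand-in for "agent edits the harness"
        candidate_score = score(candidate, tasks)
        if candidate_score > best_score:       # keep the improvement
            best, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the loop tries again
    return best

# Toy stand-ins so the loop can run end to end (purely hypothetical).
_rng = random.Random(0)

def toy_propose(harness: dict) -> dict:
    """Randomly nudge one setting up or down, never below zero."""
    return {"retries": max(0, harness["retries"] + _rng.choice([-1, 1]))}

def toy_score(harness: dict, tasks: List[str]) -> float:
    """Pretend that three retries is the ideal configuration for these tasks."""
    return -abs(harness["retries"] - 3)

if __name__ == "__main__":
    print(evolve({"retries": 0}, toy_propose, toy_score, ["task-1", "task-2"]))
```

The point of the sketch is the shape of the loop: without a task set and a programmatic `score`, there is nothing to compare candidates against.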
To make a not-so-appropriate comparison,
this AutoAgent approach is almost like
training or fine-tuning a model, because the
output is the model or agent harness itself.
But once produced, this harness or model
is more or less frozen. Whereas what Hermes
agent, Auto Dream, and other
self-learning skills introduce is a
memory mechanism for in-context learning
that makes sure the agent actually
remembers its actions and feedback so that
it can make better judgment calls next
time, which means you get an agent
that grows smarter the longer you use
it. And this second branch is the part
that is much more practically useful
today. Claude Code, OpenClaw, and Hermes
agent all have their own
setup for the self-evolving part, and
we're going to take you through the
implementation of each one of them, so
that by the end you have a good
understanding of the differences between
the implementations and also
of what a state-of-the-art
implementation looks like for achieving this
type of in-context self-learning
mechanism. But before we dive into this,
one thing I think a lot of people get wrong
with agents right now is that they
assume more agentic is always better.
But in reality, there's a spectrum of
different ways you can deliver a
large language model-based system, from
just a single large language model
call to a workflow-based system where you chain
steps together, just like you do in
Zapier and n8n. And at the other end,
you have a fully agentic system that can
make decisions, generate skills, and
evolve over time. So as a
builder, don't always go for a fully
agentic system, because it costs more
tokens and can be slower. Instead,
choose the right architecture
based on the use case. Sometimes you
want something deterministic and
predictable, and other times you want
something more flexible and adaptive.
And this AI agent cheat sheet
from HubSpot is actually really useful.
It covers the fundamentals of different
agentic systems. It breaks down and
compares different production large
language model systems: how they are
architected, what they are good at, and
what type of use case each suits
best, so you can decide what kind of
system actually fits your use case, as
well as a list of tips and pitfalls that
will really help you make your agent
system much more effective. So, if
you're building agents, it's a solid
reference for learning and thinking
through the architecture decisions. I
put the link in the description below,
so you can download it for free. And thanks
again to HubSpot for sponsoring this
video. Now, let's get back to the right
harness setup for in-context
self-learning. At a high level, to
make sure the agent actually continuously
learns from its own actions and feedback,
there are three main pillars that
power that. The first is memory, which normally contains the
important facts, like a user.md or
CLAUDE.md file. And normally, there will
be a separation between hot memory,
which is always loaded
into the system prompt of the agent,
and warm memory, which is
loaded on demand. The second one is
skills. A skill quite often contains the
domain knowledge for the agent to execute
a very specific type of task. And the third is
history, which logs the raw
conversation thread, so the agent can
refer back. Each agent harness, like
Claude Code, OpenClaw, or Hermes agent,
tackles different parts of those three
pillars. Let's first take a look
at Claude Code and how it implements a
three-layer memory system that many
people don't know about. So, when Claude
Code was first introduced, it initially had
just this CLAUDE.md file, and whatever is in this
MD file is fed into the agent's system
prompt. This is where most people
started to put a lot of preferences and
additional guardrails to address the agent's
behavior. But the problem is that
this file very quickly became too
bloated and too large. So a common
practice is that people would just put
an index or table of contents of all the
other files into CLAUDE.md,
with a description telling the agent when to read
and update which file. And from this,
people already built this type of hot
and warm memory setup, where hot memory
is always part of the system
prompt, and warm memory is
loaded on demand. This
setup is how about 99% of people
use Claude Code today. But many people
don't know Claude Code has actually evolved a
lot and has this three-layer memory
system in place already. There's an
article from Artyom where he gives a very
detailed breakdown of how the memory
system works, which is very useful, so I
highly recommend you go check it out. But
at a high level, Claude Code already
has an auto memory feature that you
can turn on. This auto memory feature is
basically an instruction to the agent to
do something similar to what
some of you are already doing. Once it is turned
on, there is a special prompt as part
of Claude Code's system prompt that covers
when to save memory and what type of
things should be considered worth
saving. The agent will then start saving
those different memory files into the
.claude folder for each individual
project. It has a very specific
structure, with a memory.md file
that serves as the index or table of
contents of all the memory files. And
Claude Code has an organization
convention for different types of
memories. A memory could be something related
to the user, a piece of feedback
they gave, something related to a certain
project, or a reference doc. So,
if you open your .claude folder, then
projects, then your specific project, you
might see a memory folder that has this
memory.md file that just lists the table
of contents, while each specific memory
file contains the main details. So, the
process is basically: you talk to
Claude Code, and because of the special
prompt it has, if Claude Code notices
there's something worth remembering
about the user, project, or feedback, it will
try to create a file and index it in the
memory.md file. This memory.md file
is automatically loaded as part of the
system prompt, so that the agent
knows which
memories exist and can read those files on
demand. This kind of works in many
situations. But the problem is that
this system is purely prompt-based,
which means that to make it work, you have to
make sure the agent remembers to create and
update those memories, and we know
large language models can easily forget
and skip steps. That causes a
problem, because it means those memories
get outdated very fast, and this
outdated information can actually
pollute the context and hurt
performance. That's why they
introduced this Auto Dream feature. And
this Auto Dream feature is something
that came out of the Claude Code source
code leak, where people found this
hidden feature called Auto Dream, which
does memory consolidation. It's basically
a background process that is
triggered after certain sessions
finish. It starts a new
Claude Code session with a special
prompt that asks Claude Code to first look at
what's already in memory, then check
the conversation history to see if
there's any memory that is outdated,
then consolidate all the different
memories and update the index. And
this process is triggered while
your Claude Code is not running, to gather
all the sessions, read the memory, store
results, and consolidate. So this is the
three-layer memory system Claude Code
has. It evolved from just a single CLAUDE.md
file to an auto-extracted memory
system, and now a background async
process that automatically keeps
memory updated. And even though it is
actually pretty simple, it more or less
represents a state-of-the-art setup for
memory itself. Which means you
should have hot and warm memory, where
hot memory is always loaded into the
system prompt and normally includes an
index or table of contents of the warm
memory that can be loaded on demand. And
then you give the agent instructions about
when and where to write those memories,
as well as some async process to
automatically update them. But the
limitation of the Claude Code setup is that
it mainly has a mechanism to handle
that kind of fact memory. There
are other very important pieces that need to
be filled in, like skills, which hold
domain knowledge, as well as the
auditable history. Claude
Code does have a skill feature, and it
does have the raw conversation log, but
the conversation log, for example, is not
searchable. It is there, but it is not
designed for the agent to search across,
because that didn't really make sense in
the coding agent context. And skills,
even though they're supported,
more or less rely on a human to find
some skills and equip Claude Code with them.
It's those gaps that make people
feel OpenClaw is so much smarter than
other agents when they first try it.
Because it treats memory as a
first-class citizen. They have a
list of more defined memory files, each
representing a different aspect. They
also have a bootstrap MD file, which
instructs the agent to chat and collect
this information proactively from the user.
They also have a daily log that provides a
high-level snapshot of interactions
between human and agent. And most
importantly, they have a memory
search tool out of the box. This memory
search tool searches across all those
memory files as well as the raw
conversation history, and that's what
makes OpenClaw feel like it just
remembers things across all the different
sessions. Another aspect is
skills. The OpenClaw agent has a very
specific instruction that tells the agent
it can use ClawHub to search for
more relevant skills and can add,
remove, and update skills on the go. When
you look at the OpenClaw setup, it's
actually still very simple, but they
designed the whole system to make
sure this type of self-improvement is at the
core of the agent harness. However, it
still has problems. When you
use OpenClaw, you will notice that all the
memory creation, skill creation, and
memory search still require a human to
prompt it properly. There's no
proactive process that
autonomously updates those memories. And
this is the gap that Hermes agent comes in
and tries to solve. They basically
introduce two concepts that really make
the agent feel much better. One is
autonomous skill generation, the other is the
memory reviewer. Autonomous skill
creation is the core of the system.
Hermes agent has a mechanism that
counts the number of steps the agent takes.
Every time the agent runs more than
10 steps without creating any skills, it
spins up a new sub-agent that
does not block the main agent process
but runs in the background to review what has
been done and decide whether there are any
useful skills that can be created to
make this complex process more stable.
The prompt of the skill reviewer agent
basically looks something like this.
"Review the conversation above and
consider saving or updating a skill if
appropriate. Focus on: was a
non-trivial approach used to complete a
task, one that required trial and error
or changing course due to experimental
findings along the way?" From there, the
agent will create a skill in a format like
this. The agent is equipped with a
skill manager tool that allows it to
create a new skill, patch or edit an existing
skill, delete one, or write and remove
files from a skill. They also add a
proactive prompt in the main agent
saying, "When using a skill and finding
it outdated, incomplete, or wrong, patch
it immediately. Don't wait to be asked.
Skills that aren't maintained become
liabilities." And it is this flywheel
system that makes Hermes agent feel
so much smarter in terms of extracting
its learnings and doing better next
time. And because it gives the agent the
ability to create skills itself, they
also add a safety scan,
which means when the agent tries to create a new
skill, it goes through the skill
guard Python file, where they define a
whole bunch of reject patterns. If
any of those is detected, the scan
automatically fails, deletes the skill,
and sends a message back to the agent
so that it knows how to adjust the
skill. If it's all good, the skill is
saved. So, that's the first one, autonomous
skill generation. It basically has an
autonomous process to make sure domain
or procedural knowledge is autonomously
saved and maintained. And on the
other side, they do the same
thing for general memory and facts.
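
To make the safety scan on newly created skills concrete, here is a minimal Python sketch of the idea. It is not Hermes agent's actual skill guard: the function names, the `store` dict, and the three reject patterns are all hypothetical, and a real guard would carry a much longer, audited pattern list.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical reject patterns, chosen only to illustrate the mechanism.
REJECT_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),                      # destructive delete from root
    re.compile(r"curl[^\n|]*\|\s*(ba)?sh"),           # piping a download into a shell
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

@dataclass
class ScanResult:
    ok: bool
    reasons: List[str]

def scan_skill(text: str) -> ScanResult:
    """Return which reject patterns, if any, the skill text matches."""
    reasons = [p.pattern for p in REJECT_PATTERNS if p.search(text)]
    return ScanResult(ok=not reasons, reasons=reasons)

def save_skill(name: str, text: str, store: Dict[str, str]) -> str:
    """Save the skill if it passes the scan; otherwise drop it and explain why.

    The returned string is what would be sent back to the agent.
    """
    result = scan_skill(text)
    if not result.ok:
        store.pop(name, None)  # make sure a rejected skill is not kept around
        return f"Skill '{name}' rejected: matched {result.reasons}"
    store[name] = text
    return f"Skill '{name}' saved"
```

Returning the matched patterns in the rejection message is what lets the agent adjust the skill and try again, rather than just seeing a silent failure.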
So, out of the box, Hermes agent has
four main tiers of memory.
They have a user.md file, which mainly
contains who the user is, preferences, style,
and workflow habits, as well as a memory.md
file, which contains environment
facts about the project, conventions, and
operating system. Those two are
part of the system prompt every time.
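
A minimal Python sketch of what "part of the system prompt every time" can look like in practice. The function name, the section headings, and the directory layout are assumptions for illustration; only the two file names come from the description above.

```python
from pathlib import Path
from typing import Sequence

def build_system_prompt(
    base_prompt: str,
    memory_dir: Path,
    hot_files: Sequence[str] = ("user.md", "memory.md"),
) -> str:
    """Append every hot-memory file that exists to the base system prompt."""
    sections = [base_prompt.strip()]
    for name in hot_files:
        path = memory_dir / name
        if path.is_file():                      # missing files are simply skipped
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

Because these files are injected on every turn, anything stored in them costs tokens on every call, which is a good reason to keep them small and push detail into on-demand files.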
Then they use skills for the domain
knowledge that is loaded on demand,
as well as the raw history. Every
single conversation is
saved to a local SQLite DB and can
be searched and retrieved using session
search. And if you need it, they also have
a way for you to plug in a semantic
memory layer, like Mem0 or Honcho.
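
Here is a small Python sketch of a searchable raw history backed by SQLite, using only the standard library. The table layout and the function names (`open_history`, `log_message`, `session_search`) are assumptions for illustration, not Hermes agent's real schema, and the search is a plain substring match rather than anything semantic.

```python
import sqlite3
from typing import List, Tuple

def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the conversation log. The default is an in-memory DB for demos."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
    )
    return conn

def log_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one raw message to the log."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def session_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[str, str, str]]:
    """Return the most recent messages whose content contains the query string."""
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE content LIKE ? ORDER BY id DESC LIMIT ?",
        (f"%{query}%", limit),
    )
    return rows.fetchall()

if __name__ == "__main__":
    db = open_history()
    log_message(db, "s1", "user", "We use pnpm, not npm, in this repo.")
    log_message(db, "s2", "user", "Deploys go out on Fridays.")
    print(session_search(db, "pnpm"))
```

Exposing something like `session_search` to the agent as a tool is what turns a log that merely exists into one the agent can actually refer back to.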
The main parts the agent manages, apart
from skills and the raw conversation
history, are just these two files,
memory.md and user.md. Each of
them has a very strict character cap,
and in total they are less than 4,000
characters. So, you can see that they
really try to push the agent to use
skills as the way to maintain most task
knowledge. And they have a similar type of
async background process to extract
memory. It counts the number of
agent turns, and every 10 turns, if
no memory extraction has happened
before, it spawns a new memory
reviewer agent with a special prompt: "Has
the user revealed things about themselves,
their persona, desires, preferences? Has
the user expressed expectations about how you
should behave?" If so, it saves them to those
two files. So, this is how the Hermes
agent works. It basically has hot
memory that is automatically extracted
every 10 turns, as well as warm memory
in the form of all sorts of skills, which are
again automatically extracted every time
there are more than 10 steps, as well
as a large cold memory of conversation
history, a semantic long-term DB that the
agent can search. After going
through this, you can basically map out
how the different agents work and
understand why Hermes agent just feels
smarter: it has those async
autonomous processes across skill and memory
creation and updates, as well as a way for the
agent to search the raw conversation log.
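
The counting mechanism behind both reviewers can be sketched in a few lines of Python. This is an illustration under assumptions, not Hermes agent's code: `ReviewTrigger` is a hypothetical class, the reviewer is any callable you pass in (in a real harness it would start a sub-agent with the review prompt), and a plain thread stands in for the background process.

```python
import threading
from typing import Callable, List, Optional

class ReviewTrigger:
    """Fire a background review once `threshold` events pass without an extraction."""

    def __init__(self, threshold: int, review: Callable[[List[str]], None]):
        self.threshold = threshold
        self.review = review
        self.count = 0
        self.transcript: List[str] = []

    def record(self, event: str) -> Optional[threading.Thread]:
        """Log one step or turn; return the reviewer thread if one was started."""
        self.transcript.append(event)
        self.count += 1
        if self.count < self.threshold:
            return None
        self.count = 0
        snapshot = list(self.transcript)       # copy so the reviewer sees a stable view
        worker = threading.Thread(target=self.review, args=(snapshot,), daemon=True)
        worker.start()                         # does not block the main agent loop
        return worker

    def mark_extracted(self) -> None:
        """Call when the agent saved a skill or memory on its own, to reset the count."""
        self.count = 0
```

One trigger instance with a threshold of 10 could count tool steps for the skill reviewer, and a second one could count conversation turns for the memory reviewer.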
And this is more or less the state-of-the-art
implementation for building any kind
of in-context self-learning into
your agent too. You basically use skills
to capture domain knowledge, memory for
facts, and a searchable and auditable
raw history. And ideally you have an async
process, so you don't rely on the agent or a
human to extract and maintain that
knowledge. And if you're already using
OpenClaw, you actually don't have to
switch to Hermes agent to get this type
of really good self-learning experience.
There are different skills on the market
already available that you can
plug in to enhance your OpenClaw or
Claude Code's memory and self-learning
setup. Here are three skills that I
tested and found to take a pretty novel
approach. I put a table here that you can
look at in detail. Their
setup is very similar to what we just
discussed; implementation-wise,
each one has its own pros and
cons. The most popular one is the
self-improving agent skill. It
introduces a simple memory structure.
Apart from OpenClaw's own memory, it
has a .learnings folder with
learnings, errors, and feature requests
files by default. And it makes pretty smart
use of hooks to make sure memory
creation and updates are more formal.
For example, it uses a user-submit
hook: every time you send a
message, it captures that and feeds in
a small piece of prompt to make
sure the agent follows the memory generation
pattern. It also has a post-tool-use
hook: after every bash command,
it checks the result of the bash
command against a set of
error patterns. If the command produced
errors, it appends an "error
detected" reminder prompt to the tool
result. And for OpenClaw, at
bootstrap, there is also a
self-improvement reminder MD file that
is injected as part of the system
prompt. So if you already have an agent
that you've used for a while, you don't
have to suddenly change your agent to
another one. That said, migration
from OpenClaw to Hermes agent is
actually pretty simple. They have just
one command to migrate everything over.
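
The post-tool-use check described a moment ago, where a bash result is matched against error patterns and a reminder is appended, can be sketched as below. The patterns, the reminder wording, and the `post_tool_use` function are hypothetical and only illustrate the mechanism. This is not the skill's actual hook script, and real harnesses each define their own hook interface.

```python
import re

# Hypothetical error patterns; a real hook would tune these to its environment.
ERROR_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\berror\b", r"traceback", r"command not found", r"permission denied")
]

REMINDER = (
    "[error detected] Consider logging what went wrong and how it was fixed "
    "to your learnings file."
)

def post_tool_use(tool_name: str, output: str) -> str:
    """Append a reminder to bash output that looks like an error; pass everything else through."""
    if tool_name == "bash" and any(p.search(output) for p in ERROR_PATTERNS):
        return f"{output}\n\n{REMINDER}"
    return output
```

Since the reminder rides along with the tool result, the agent sees it at the exact moment the error is in context, instead of relying on a system-prompt instruction it may have drifted away from.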
So this is basically the state of the art in
how teams are achieving in-context
self-learning agent behavior. And as you
can see, it's actually surprisingly
simple. So if you're building your own
agent harness, I hope this is useful.
Meanwhile, if you want to learn more, I
also have a more detailed breakdown of
different agent memory and harness setups
with a step-by-step module in AI Builder
Club, where we have a group of top AI
builders who are launching agent
products. We have a weekly workshop
where I or other industry experts
come and share the latest tips and
practical learnings. And we recently
launched a new platform called
Crewllet, which is also a self-improving
agent that monitors all the critical data
across your business, prioritizes
growth actions autonomously, and every
day and every week reviews the results so
you can drive growth autonomously.
You just give it your company website and
connect all your business data sources
and integrations. It will analyze
across the different data sources, build
the organization memory, and start taking
actions autonomously across content,
leads, ads, or any other growth
operations. And most importantly, it
remembers all the actions it ever took, so
it can review and improve the next
time. We're opening early access to
members of AI Builder Club. So, if you're
interested, I put the links to both AI
Builder Club and Cruise It in the
description below, so you can check them out.
Thank you, and I'll see you next time.
This video explores the current state of the art for building self-evolving AI agents. It distinguishes between two groups: 'auto-agents' (which improve the harness or system itself through repeated evaluation loops) and in-context learning agents like Hermes (which use memory and skills to grow smarter through experience). It walks through how agent harnesses like Claude Code, OpenClaw, and Hermes implement the three pillars of memory (split into hot and warm), skills, and history, along with the importance of autonomous background processes (async consolidation) for keeping information up to date and effective.