Okay, this unleashed my agent

Transcript

0:01

Thanks HubSpot for sponsoring this

0:03

video.

0:04

Over the past few weeks, a massive amount

0:06

of progress has been made for making

0:08

your agents self-evolving. From the auto

0:10

agent project, which is a concept evolved

0:12

from Andrej Karpathy's auto research that

0:14

utilizes Claude Code or Codex to

0:16

self-evolve an agent harness for a

0:18

specific set of tasks, and it achieved

0:21

number one on SpreadsheetBench,

0:23

and number one on Terminal-Bench. As

0:25

well as from Claude Code's leaked source

0:27

code, where people found this hidden

0:28

auto dream feature that is getting Claude

0:30

Code to automatically extract learnings

0:32

and best practices from the conversation.

0:34

To the super popular Hermes agent that has

0:36

almost taken the growth away from Open

0:38

Claw, because it's an agent that remembers

0:41

what it learns and gets more capable

0:42

over time. The question is, what is the

0:44

actual mechanism behind all those

0:45

self-learning projects? And what is the

0:48

state-of-the-art implementation that you

0:49

can take away for your own agent

0:51

building? And this is what I want to

0:52

take you through today. What's the

0:53

state-of-the-art way of building

0:55

a self-evolving agent that gets smarter

0:57

the more you use it? So, first of all,

0:59

you should actually break down those

1:00

different projects into two groups. Auto

1:03

agent and auto research are actually a

1:04

very different creature compared with

1:06

the rest of those self-evolving agent

1:08

setups. Auto agent or auto research

1:10

is a mechanism to improve the agent

1:12

harness or software itself, which means

1:15

the goal of auto agent is to produce an

1:17

agent harness that can complete a

1:19

specific type of task better. While

1:21

Hermes agent, auto dream, and many other

1:23

self-learning skills are really focusing

1:25

on the in-context learning or memory

1:27

output. So, they serve very

1:29

different purposes. With auto agent

1:31

or auto research, fundamentally it is

1:32

a for loop that is running, where the user

1:34

would define a vision or PRD in a

1:37

program.md file that clearly explains

1:39

what this agent or model should do. And

1:41

you will get a latest agent harness like

1:43

Claude Code or Codex to read this

1:45

program.md and make improvements to the

1:47

system itself, which can be the agent

1:49

harness runtime itself or a special

1:51

model and script. Then, the agent will run an

1:53

evaluation to compare performance of

1:55

this new version against the previous one

1:58

and decide whether they should keep or

1:59

discard the improvements and repeat this

2:01

loop indefinitely. The output is a system

2:03

that is fixed once it's produced, rather than a mechanism that

2:05

makes your existing agent

2:06

learn continuously. And to run this

2:08

loop, you will actually have to have

2:10

a database of tasks and a

2:11

programmatic way to evaluate and verify

2:14

the performance, which in many cases you

2:16

probably don't have that large database

2:18

or a deterministic way to verify the work.
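
The keep-or-discard loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the harness is reduced to a single hypothetical `threshold` setting, `propose_change` stands in for the coding agent editing the harness after reading program.md, and `evaluate` stands in for a real benchmark run over your task set. None of these names come from the auto agent or auto research projects.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical: a "harness" reduced to a dict of tunable settings.
Harness = Dict[str, float]


def evaluate(harness: Harness, tasks: List[float]) -> float:
    """Toy scorer (higher is better). Stands in for a programmatic benchmark."""
    return -sum(abs(harness["threshold"] - t) for t in tasks) / len(tasks)


def propose_change(harness: Harness, rng: random.Random) -> Harness:
    """Stand-in for the coding agent proposing an edit to the harness."""
    candidate = dict(harness)
    candidate["threshold"] += rng.uniform(-0.1, 0.1)
    return candidate


def improve(
    harness: Harness, tasks: List[float], iterations: int, seed: int = 0
) -> Tuple[Harness, float]:
    """Run the loop: propose, evaluate, keep only strict improvements."""
    rng = random.Random(seed)
    best_score = evaluate(harness, tasks)
    for _ in range(iterations):
        candidate = propose_change(harness, rng)
        score = evaluate(candidate, tasks)
        if score > best_score:
            harness, best_score = candidate, score  # keep the change
        # otherwise the candidate is discarded and the loop repeats
    return harness, best_score
```

The point of the sketch is the dependency it exposes: without a task set and a scoring function you can run automatically, there is nothing for the loop to compare against.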

2:20

As a perhaps imperfect comparison,

2:23

this auto agent approach is almost like

2:25

training or fine-tuning a model because

2:27

the output is the model or agent harness itself.

2:29

But once produced, this harness or model

2:31

is kind of frozen. Whereas what Hermes

2:33

agents or Auto Dream or other

2:35

self-learning skills introduced is this

2:37

mechanism for in-context learning: a memory

2:40

mechanism to make sure the agent actually

2:42

remembers its actions and feedback so that

2:44

it can make better judgment calls next

2:46

time, which means you get this agent

2:47

that grows smarter the longer you use

2:49

it. And this second branch is the part

2:51

that is much more practically useful

2:53

today. So Claude Code, OpenClaw, or Hermes

2:55

agent, they all have their own different

2:57

setup for the self-evolving part. And

2:58

I'm going to take you through the

2:59

implementation for each one of them. So

3:00

by the end, you have a good

3:02

understanding of the differences between

3:04

each implementation and also form a good

3:05

understanding of what a state-of-the-art

3:07

implementation looks like for this

3:08

type of in-context self-learning

3:10

mechanism. But before we dive into this,

3:13

one thing I think a lot of people get wrong

3:14

with agents right now is that they

3:16

assume more agentic is always better.

3:19

But in reality, there's a spectrum:

3:21

there are different ways you can deliver

3:22

large language model-based systems, from

3:25

just one single large language model

3:26

call to workflow-based systems that chain

3:28

steps together just like how you do in

3:30

Zapier and n8n. And on the other end,

3:33

you have fully agentic systems that can

3:35

make decisions, generate skills, and

3:37

evolve over time. So as an agent

3:39

builder, don't always go for a fully

3:41

agentic system because it costs more

3:43

tokens and can be slower. Instead, you

3:45

choose the right architecture setup

3:46

based on the use case. Sometimes you

3:48

want something deterministic and

3:50

predictable, and other times you want

3:52

something more flexible and adaptive.

3:54

And this one, this AI agent cheat sheet

3:56

from HubSpot is actually really useful.

3:59

It covers the fundamentals of different

4:00

agentic systems. It breaks down and

4:02

compares different production large

4:04

language model systems, how they are

4:06

architected, what they are good at, and

4:08

what type of use case suits you the

4:09

best. So, you can decide what kind of

4:10

system actually fits your use case, as

4:13

well as a list of tips and pitfalls that

4:14

will really help you make your agent

4:16

system much more effective. So, if

4:18

you're building agents, it's a solid

4:20

reference for you to learn and think

4:21

through the architecture decisions. I

4:23

put the link in the description below,

4:25

so you can download it for free. And thanks

4:27

again to HubSpot for sponsoring this

4:29

video. Now, let's get back to the right

4:31

harness setup for in-context

4:32

self-learning. So, at a high level, to

4:34

make sure the agent actually continuously

4:36

learns from its own actions and feedback,

4:38

there are three main pillars that

4:39

power that. The first is memory, which normally contains the

4:41

important facts, like a user.md or

4:43

CLAUDE.md file. And normally, there will

4:45

be a separation between the hot memory,

4:47

which is something that is always loaded

4:49

into the system prompt of the agent,

4:51

versus warm memory, things that will be

4:53

loaded on demand. And second one is

4:55

skills. A skill quite often contains the

4:57

domain knowledge for the agent to execute a

4:59

very specific type of task. And third is

5:01

history, which logs the raw

5:03

conversation thread, so the agent can

5:05

refer back. And each agent harness, like

5:06

Claude Code, OpenClaw, or Hermes Agent,

5:09

tackles different parts of those three

5:11

pillars. Let's first take a look

5:13

at Claude Code and how it implements this

5:15

three-layer memory system that many

5:16

people don't know about. So, when Claude

5:18

Code was first introduced, it initially had

5:20

this CLAUDE.md file. And whatever is in this

5:22

MD file will be fed into the agent's system

5:24

prompt. And this is where most people

5:26

started to put a lot of preferences and

5:28

additional guardrails to adjust the agent's

5:30

behavior. But the problem is that then

5:32

this file very quickly became too

5:34

bloated and too large. Then a common

5:35

practice is that people would just put an

5:37

index or table of contents of all those

5:40

other files into CLAUDE.md,

5:42

with a description telling the agent when to read

5:45

and update which file. And from this,

5:46

people already built this type of hot

5:48

and warm memory setup, where hot memory

5:50

is something that is always part of the system

5:52

prompt, and warm memory is something

5:54

that will be loaded on demand. And this

5:55

setup is pretty much how 99% of people are

5:58

using Claude Code today. But many people

6:00

don't know Claude Code has actually evolved a

6:01

lot and has this three-layer memory

6:04

system in place already. And there's one

6:06

article from Artyom where he gives a very

6:08

detailed breakdown of how the memory

6:09

system works, which is very useful. So, I

6:11

highly recommend you go check it out. But

6:13

at a high level, Claude Code already

6:15

introduced an auto memory feature that you

6:17

can turn on. This auto memory feature is

6:19

basically an instruction asking the agent

6:21

to achieve something similar to what

6:23

some of you are already doing. Once it's turned

6:25

on, it has this special prompt as part

6:27

of the Claude Code system prompt, covering

6:29

when to save memory and what type of

6:31

things should be considered as worth

6:32

saving. And the agent will start saving

6:34

those different memory files into the

6:36

.claude folder for each individual

6:38

project. And it has a very specific

6:39

structure that has this memory.md file

6:41

that serves as the index or table of

6:44

contents of all the memory files. And

6:46

Claude Code has this organization

6:47

convention for different types of

6:49

memories. It could be something related

6:50

to the user, or a piece of feedback

6:52

they give, or related to certain

6:54

projects, as well as reference docs. So,

6:57

if you open your .claude folder,

6:59

then projects, then your specific project, you

7:01

might see a memory folder that has this

7:03

memory.md file that just logs the table

7:05

of contents. And each specific memory

7:07

file contains the main details. So, the

7:09

process is basically: you talk to

7:10

Claude Code, and because of the special

7:13

prompt it has, if Claude Code notices

7:15

there's something worth remembering

7:17

about the user, project, or feedback, it will

7:19

try to create a file and index it in the

7:21

memory.md file. And this memory.md file

7:23

will be automatically loaded as part of the

7:25

system prompt to the agent, so that it

7:27

will know what different

7:28

memories exist and read those files on

7:30

demand. And it kind of works in many

7:32

situations. But the problem here is that

7:34

this system is purely prompt-based,

7:36

which means to make it work, you have to

7:38

make sure the agent remembers to create and

7:41

update those memories, which we know

7:43

large language models can easily forget

7:45

and skip some steps. And that causes a

7:47

problem because that means those memories

7:50

get outdated very fast. And this

7:51

outdated information can actually

7:53

pollute the context and impact the

7:55

performance negatively. That's why they

7:57

introduced this auto dream feature. And

7:59

this auto dream feature is something

8:01

exposed during the Claude Code source

8:03

code leak. So people discovered this

8:05

hidden feature called auto dream, which

8:06

is memory consolidation. It's basically

8:09

a background process that would be

8:10

triggered after a certain session

8:12

finishes. It starts a new

8:13

Claude Code session with this special

8:15

prompt asking Claude Code to first look at

8:18

what's already in the memory, then check

8:20

the conversation history to see if

8:22

there's any memory that is outdated,

8:25

then consolidate all the different

8:26

memory as well as update the index. And

8:28

this process will be triggered while

8:30

your Claude Code is not running to gather

8:32

all the sessions, read the memory, store

8:34

results, and consolidate. So this is the

8:36

three-layer memory system Claude Code

8:38

has. It evolved from just a single

8:40

CLAUDE.md file to an auto-extracted memory

8:42

system, and now a background async

8:44

process that will automatically keep

8:46

memory updated. And even though it is

8:48

actually pretty simple, it kind of

8:49

represents a state-of-the-art setup for

8:51

the memory itself. Which means you

8:52

should have hot and warm memory, where

8:54

hot memory is always loaded into the

8:56

system prompt and normally includes an

8:58

index or table of contents of the warm

9:00

memory that can be loaded on demand. And

9:02

then you give the agent instructions about

9:04

when and where to write those memories,

9:06

as well as some async process to

9:08

automatically update it. But the

9:09

limitation of the Claude Code setup is that

9:11

it mainly has mechanisms to handle

9:13

this kind of fact memory. But there

9:15

are other very important pieces that need to

9:16

be filled in, like the skill, which is

9:18

domain knowledge, as well as the

9:19

auditable history. And even though Claude

9:21

Code does have a skill feature, and also

9:24

does have the raw conversation log, but

9:26

the conversation log for example is not

9:27

searchable. It is there, but it is not

9:29

designed for the agent to search across

9:31

because it didn't really make sense in

9:33

the coding agent context. And skills,

9:35

even though they're supported,

9:36

more or less rely on a human to find

9:38

some skills and equip Claude Code with them.

9:40

And it's those gaps that make people

9:42

feel OpenClaw is so much smarter than

9:44

other agents when they first try it.

9:46

Because it treats memory as a

9:47

first-class citizen. So, they have a

9:49

list of more defined memory files, each

9:51

representing a different aspect. And they

9:53

also have a bootstrap MD file, which

9:55

will instruct the agent to chat and collect

9:57

this information proactively from the user.

9:59

And they also have a daily log that provides a

10:01

high-level snapshot of interactions

10:03

between human and agent. And most

10:05

importantly, they have this memory

10:06

search tool out of the box. And this memory

10:08

search tool will search across all those

10:10

memory files as well as the raw

10:12

conversation history. And that's what

10:14

makes OpenClaw feel like it just

10:16

remembers things across all different

10:17

sessions. And also, another aspect is

10:19

skills. The OpenClaw agent has very

10:21

specific instructions telling the agent

10:23

that it can use ClawHub to search for

10:25

more relevant skills and can add,

10:27

remove, and update skills on the go. And when

10:29

you look at the OpenClaw setup, it's

10:30

actually very simple still, but they

10:32

just designed the whole system to make

10:34

sure this type of self-improvement is at the

10:36

core of their agent harness. However, it

10:38

still has problems. So, when you

10:40

use OpenClaw, you will notice that all of the

10:42

memory creation, skill creation, and

10:45

memory search still requires a human to

10:46

prompt it properly. And there's no such

10:48

thing as a proactive process that is

10:50

autonomously updating those memories. And

10:52

this is the gap that Hermes agent comes in

10:54

and tries to solve. And they basically

10:56

introduced two concepts that really make

10:58

the agent feel much better. One is

11:00

autonomous skill generation, the other is a

11:02

memory reviewer. And autonomous skill

11:04

creation is the core of the system. So,

11:07

Hermes agent has this mechanism that

11:09

counts the number of steps the agent takes.

11:12

And every time the agent runs more than

11:14

10 steps without creating any skills, it

11:16

will spin up this new sub agent that

11:18

will not block the main agent process,

11:20

but runs in the background to review what has

11:22

been done and decide if there are any

11:24

useful skills that can be created to

11:26

make this complex process more stable.
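
The step-counting trigger described above can be sketched roughly as follows. This is a minimal illustration, not Hermes agent's actual code: the class name, method names, and the use of a thread for the non-blocking sub-agent are all assumptions, and `reviewer` is a placeholder for whatever spins up the reviewing agent. Only the threshold of 10 steps comes from the video.

```python
import threading
from typing import Callable, List

REVIEW_AFTER_STEPS = 10  # threshold mentioned in the video


class SkillReviewTrigger:
    """Counts agent steps and launches a background review (hypothetical API)."""

    def __init__(self, reviewer: Callable[[List[str]], None]) -> None:
        self.reviewer = reviewer
        self.steps_since_skill = 0
        self.transcript: List[str] = []
        self.threads: List[threading.Thread] = []

    def record_step(self, description: str, created_skill: bool = False) -> None:
        self.transcript.append(description)
        if created_skill:
            self.steps_since_skill = 0  # a skill was saved, restart the count
            return
        self.steps_since_skill += 1
        if self.steps_since_skill > REVIEW_AFTER_STEPS:
            self.steps_since_skill = 0
            snapshot = list(self.transcript)  # copy so the main loop can continue
            thread = threading.Thread(
                target=self.reviewer, args=(snapshot,), daemon=True
            )
            thread.start()  # the main agent is not blocked by the review
            self.threads.append(thread)

    def wait_for_reviews(self) -> None:
        """Block until all background reviews finish (useful in tests)."""
        for thread in self.threads:
            thread.join()
```

Because the counter lives in code rather than in the prompt, the review happens even when the main agent forgets about skills entirely, which is the difference from a purely prompt-based setup.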

11:28

And the prompt of the skill reviewer agent

11:29

basically looks something like this.

11:31

Review the conversation above and

11:32

consider saving or updating a skill if

11:35

appropriate. Focus on: was a

11:37

non-trivial approach used to complete a

11:39

task, one that required trial and error

11:42

or changing course due to experimental

11:44

findings along the way. And from there,

11:45

the agent will create a skill in a format like

11:48

this. So, the agent is equipped with a

11:50

skill manager tool that allows it to

11:52

create a new skill, patch or edit an existing

11:54

skill, delete one, or write and remove

11:56

files from a skill. They also add a

11:58

proactive prompt in the main agent

12:00

saying, "When using a skill and finding

12:02

it outdated, incomplete, or wrong, patch

12:05

it immediately. Don't wait to be asked.

12:07

Skills that aren't maintained become

12:08

liabilities." And it is this whole

12:10

system that makes Hermes agent just feel

12:12

so much smarter in terms of extracting

12:15

its learnings and doing it better next

12:16

time. And because it gives the agent the

12:18

ability to create skills itself, they

12:20

also add this concept of safety scan,

12:22

which means when the agent tries to create a new

12:24

skill, it will go through the skill

12:26

guard Python file, where they define a

12:28

whole bunch of reject patterns. And once

12:30

those are detected, it will

12:31

automatically fail and delete the skill

12:33

and also send a message back to the agent

12:35

so that it can know how to adjust the

12:37

skill. If it's all good, then it will be

12:39

saved. So, this is the first one, autonomous

12:40

skill generation. It basically has this

12:42

autonomous process to make sure domain

12:44

or procedural knowledge is autonomously

12:47

saved and maintained. And on the

12:48

other side, they're also doing the same

12:50

thing for the general memory and facts.

12:52

So, out of the box, Hermes agent has these

12:54

four main tiers of different memories.

12:56

They have a user.md file, which mainly

12:59

contains who the user is, preferences, style,

13:01

workflow habits, as well as a memory.md

13:03

file, which contains the environment

13:05

facts, such as project conventions and the

13:08

operating system. And those two are

13:09

part of the system prompt every time.
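
As a rough sketch of what "always part of the system prompt" can look like in code: the helper below reads the two hot-memory files and appends them to a base prompt, truncating each one to a character cap. The file names follow the transcript; the function name, the directory layout, and the even per-file split are assumptions for illustration. The video only says the combined cap is under 4,000 characters.

```python
from pathlib import Path

HOT_FILES = ("user.md", "memory.md")  # file names as given in the video
TOTAL_CHAR_CAP = 4000  # combined cap from the video; the per-file split is assumed


def build_system_prompt(
    base_prompt: str,
    memory_dir: Path,
    per_file_cap: int = TOTAL_CHAR_CAP // len(HOT_FILES),
) -> str:
    """Append capped hot-memory files to the base prompt (hypothetical helper)."""
    sections = [base_prompt]
    for name in HOT_FILES:
        path = memory_dir / name
        if not path.exists():
            continue  # a missing file just means nothing is remembered yet
        text = path.read_text(encoding="utf-8")[:per_file_cap]
        sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

A hard cap like this is what forces the design trade-off: anything that does not fit in the hot files has to live in a skill or in searchable history instead.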

13:11

Then they use skills for the domain

13:12

knowledge that will be loaded on demand,

13:14

as well as raw history. So, every

13:16

single conversation history will be

13:18

saved to a local SQLite DB. It can

13:20

be searched and retrieved using session

13:22

search. And if you need, they also have

13:24

a way for you to plug into a semantic

13:26

memory layer, like Mem0 or Honcho.
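
To make the searchable-history idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The table layout and function names are invented for illustration and are not Hermes agent's schema or its session search tool. A plain `LIKE` substring match is used to keep the example dependency-free; a real setup might use full-text or semantic search instead.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, "
        "session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def log_message(
    conn: sqlite3.Connection, session_id: str, role: str, content: str
) -> None:
    """Append one raw conversation message."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def search_sessions(
    conn: sqlite3.Connection, query: str, limit: int = 5
) -> List[Tuple[str, str, str]]:
    """Return the newest messages containing the query as a substring.

    Note: % and _ inside the query act as LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE content LIKE ? ORDER BY id DESC LIMIT ?",
        (f"%{query}%", limit),
    )
    return rows.fetchall()
```

Exposing something like `search_sessions` to the agent as a tool is what lets it look up earlier sessions on demand instead of carrying them in the prompt.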

13:29

The main parts the agent manages, apart

13:31

from skills and the raw conversation

13:33

history, are just these two files of

13:35

memory.md and user.md. And each one of

13:37

them has a very strict character cap,

13:39

that in total is less than 4,000

13:41

characters. So, you can see that they

13:43

really try to push the agent to just use

13:45

skills as the way to maintain most task

13:47

knowledge. And they have a similar type of

13:49

async background process to extract

13:51

memory. It counts the number of

13:53

agent turns. And every 10 turns, if

13:56

no memory extraction has happened

13:57

before, it will spawn a new memory

14:00

reviewer agent with a special prompt: Has

14:02

the user revealed things about themselves,

14:05

their persona, desires, preferences? Has the

14:07

user expressed expectations about how you

14:10

should behave? If so, save them to those

14:12

two files. So, this is how the Hermes

14:14

agent works. It basically has a hot

14:15

memory that is automatically extracted

14:17

every 10 turns, as well as warm memory

14:19

of all sorts of different skills, which are

14:21

again automatically extracted every time

14:23

when there are more than 10 steps. As well

14:25

as cold memory for conversation

14:27

history, plus a semantic long-term DB that the

14:30

agent can search. And after going

14:31

through this, you can basically map out

14:33

how the different agents work and

14:35

understand why Hermes agent feels just

14:37

smarter, because it has those async

14:39

autonomous processes across skill and memory

14:41

creation and updates, as well as a way for the

14:43

agent to search the raw conversation log.

14:46

And this is kind of the state-of-the-art

14:47

implementation for you to build any kind

14:49

of in-context self-learning aspect for

14:52

your agent too. You basically use skills

14:53

to capture domain knowledge, memory for

14:56

facts, and a searchable and auditable

14:58

raw history. And ideally have an async

15:00

process, so you don't rely on the agent or a

15:02

human to extract and maintain that

15:05

knowledge. And if you're already using

15:07

OpenClaw, you actually don't have to

15:08

change to Hermes agent to get this type

15:11

of really good self-learning experience.

15:13

There are different skills on the market

15:14

that are already available that you can

15:16

plug in to enhance your OpenClaw or

15:18

Claude Code's memory and self-learning

15:19

setup. And here are three skills that I

15:21

tested and found to take pretty novel

15:23

approaches. I put a table here that you can

15:25

take a look at in detail. But their

15:26

setup is very similar to what we just

15:28

discussed before. Just implementation

15:29

wise, each one has its own pros and

15:31

cons. And the most popular one is this

15:33

self-improving agent skill. They

15:35

introduced a simple memory structure.

15:37

Apart from OpenClaw's own memory, they

15:39

have this .learnings folder with

15:41

learnings, errors, and feature requests

15:43

by default. And they make pretty smart

15:45

use of hooks to make sure this memory

15:47

creation and updates are more formal.

15:50

For example, they use this user submit

15:52

hook. So every time after you send a

15:53

message, they will capture that and feed

15:56

a small piece of prompt to just make

15:57

sure the agent follows this memory generation

16:00

pattern. Then they also have this post

16:02

tool use hook. After every bash command,

16:05

they will check the result from bash

16:07

command to see if it matches any

16:09

error pattern. If it does introduce

16:11

errors, they will again append an error

16:14

detected reminder prompt as part of the tool

16:16

result. And for OpenClaw, when it is

16:18

bootstrapped, they also have this

16:20

self-improvement reminder MD file that

16:22

is injected as part of the system

16:24

prompt. So if you already have an agent

16:26

that you've used for a while, you don't

16:27

have to suddenly change your agent to

16:29

another one. Though the migration

16:30

from OpenClaw to Hermes agent is

16:32

actually pretty simple. They have just

16:34

one command to migrate everything over.

16:35

So this is basically the state of the art of

16:37

how teams are achieving in-context

16:39

self-learning agent behavior. And as you

16:41

can see, it's actually surprisingly

16:43

simple. So if you're building your own

16:45

agent harness, I hope this is useful.

16:47

Meanwhile, if you want to learn more, I

16:48

also have a more detailed breakdown of

16:50

different agent memory and harness setup

16:52

with step-by-step modules in AI Builder

16:54

Club, where we have a group of top AI

16:55

builders who are launching agent

16:57

products. And we have a weekly workshop

16:59

where myself or other industry experts

17:01

will come and share the latest tips and

17:03

practical learnings. And we recently

17:04

launched this new platform called

17:06

Crewllet that is also a self-improving

17:08

agent that monitors all the critical data

17:10

across all your business, prioritizes

17:12

growth actions autonomously, and every

17:14

day and every week reviews the results so

17:16

you can drive the growth autonomously.

17:18

You just give it your company website,

17:20

connect all your business data sources

17:21

and integrations. It will analyze

17:23

across different data sources and build

17:25

an organization memory and start taking

17:27

actions autonomously across content,

17:30

leads, ads, or any other growth

17:32

operations. And most importantly, it

17:34

remembers all actions it ever took, so

17:36

it can review and improve the next

17:38

time. We're opening early access to

17:39

members of AI Builder Club. So, if you're

17:41

interested, I put the link of both AI

17:43

Builder Club and Cruise It in the

17:45

description below, so you can check them out.

17:46

Thank you, and I'll see you next time.

Interactive Summary

This video explores the current state of the art for building self-evolving AI agents. It distinguishes between two primary groups: 'auto-agents' (focused on improving the harness or system itself through repeated evaluation loops) and 'in-context learning' agents like Hermes (which use memory and skills to grow smarter through experience). The author walks through how agent harnesses such as Claude Code, OpenClaw, and Hermes implement the three pillars of memory, skills, and history, including the split between hot and warm memory, along with the importance of autonomous background processes (async consolidation) for keeping information up to date.