Automating Large Scale Refactors with Parallel Agents - Robert Brennan, AllHands

Transcript

0:00

All

0:21

right. Thank you all for for joining for

0:23

automating massive refactors with uh

0:25

with parallel agents. Um super excited

0:28

to talk to you all today about uh you

0:30

know what we're doing with open hands to

0:32

really automate large scale chunks of

0:34

software engineering work. Lots of uh

0:36

lots of toil related to tech debt, code

0:39

maintenance, code modernization. Uh

0:41

these are tasks that are super

0:42

automatable. Uh you can throw agents at

0:44

them, but they tend to be way too big

0:46

for like you know a single just one

0:48

shot. So it involves a lot of what we

0:50

call agent orchestration. Uh we're going

0:52

to talk a little bit about how we do

0:53

that uh with Open Hands and also just

0:55

more generically.

0:58

Uh a little bit about me. Um my name is

1:00

Robert Brennan. I'm the co-founder and

1:03

CEO at Open Hands. Uh my background is

1:06

in dev tooling. I've been working in

1:08

open source dev tools for over a decade

1:10

now. I've also been working in natural

1:12

language processing for about the same

1:13

amount of time. Um uh I've been really

1:17

excited over the last few years to see

1:18

those two fields suddenly converge as

1:20

LLMs are really good at writing code. Um

1:22

and I'm super excited to be be working

1:24

in the space. Uh then open hands is an

1:26

MIT licensed coding agent. Open hands

1:29

started as OpenDevin about a year and a

1:31

half ago when Devin first launched their

1:33

uh demo video of a fully autonomous

1:35

software engineering agent. Uh my

1:37

co-founders and I saw that got super

1:39

excited about you know what was possible

1:40

what the future of software engineering

1:42

might look like. uh but realized that

1:44

that shouldn't happen in a black box,

1:45

right? If our jobs are going to

1:46

change, we want that change to be driven

1:48

by the software development community.

1:50

We want to have a say in that change. Um

1:52

and so we started OpenHands, uh, then OpenDevin,

1:55

as a way to give the community a way to

1:58

help drive what the future of software

1:59

engineering might look like in an AI

2:01

powered world.

2:04

Uh so hopefully not uh controversial for

2:06

me to say that software development is

2:08

changing. Um, I know my workflow has

2:10

changed a great deal, uh, in the last

2:12

year and a half. Um, uh, I would say

2:15

now, like, you know, pretty much every

2:17

line of code that I write goes through

2:18

an agent. Uh, rather than me opening up

2:21

my IDE and typing out lines of code, I'm

2:23

now asking an agent to do the work for

2:24

me. I'm still, you know, doing a lot of

2:27

critical thinking. You know, a lot of

2:28

the the mentality of the job hasn't

2:31

changed, but what the actual work looks

2:32

like has changed quite a bit. Uh, but

2:35

what I want to convince you all of is

2:36

that it's still changing. We're still

2:38

just in the first innings of this

2:39

change. We still haven't realized all

2:41

the um all the impact that large

2:44

language models have already brought

2:46

to the job and are going to continue to

2:48

bring to the job as they improve. Uh I

2:50

would say even if you froze large

2:51

language models today and they didn't

2:53

get any better, you would still see the

2:54

job of software engineering changing

2:56

very drastically over the next two to

2:58

three years as we figure out ways to

3:00

operationalize the technology. Uh I

3:02

think there's still a lot of uh sort of

3:05

psychological and organizational hurdles

3:07

to adopting uh large language models

3:10

within software engineering. Um and

3:12

we're seeing a lot of those hurdles

3:15

disappear as time goes on.

3:20

A brief history of kind of how we got

3:22

here. U everything started I would say

3:24

with what I call context-unaware code

3:26

snippets. um some of the first large

3:28

language models it turned out were very

3:30

good at writing chunks of code

3:31

especially things that they'd seen over

3:32

and over again. So you could ask it to

3:34

write bubble sort. Uh you could ask it

3:36

for you know small algorithms you know

3:38

how to how to access a SQL database

3:40

things like that. Uh it was able to

3:42

generate little bits of code. It was

3:43

able to to you know it seemed to

3:45

understand the logic a bit. But this was

3:47

totally context unaware right it was

3:48

just dropping code into a chat window

3:50

that you had asked for. It had no idea

3:52

what project you were working on what

3:53

the context was.

3:55

Shortly thereafter we got these

3:57

context-aware code generation. Uh so like

4:00

GitHub copilot as autocomplete um was

4:03

probably like the the best example here

4:05

right uh so you actually was in your IDE

4:08

it could see you know where you're

4:09

typing you know what the what the code

4:11

you're working on in uh and it could

4:13

generate code that was specific to your

4:14

codebase that reference you know local

4:16

variable names that reference you know

4:18

local table names in your database uh

4:20

huge huge improvement for um uh you know

4:23

our productivity so instead of copy

4:25

pasting back and forth between the chat

4:26

GPT window and your IDE, now all of a

4:29

sudden, you can see the little robot get

4:30

its eyes. It can see inside your

4:31

codebase and it can actually generate

4:33

relevant code for your for your your

4:35

codebase.

4:37

And then I think the the giant leap

4:39

happened in early 2024 um with the

4:42

launch of Devin and then uh the next day

4:44

the launch of OpenDevin, now OpenHands.

4:46

Uh this is where we first started to see

4:49

autonomous coding agents. So this is

4:51

when AI started not just writing code

4:53

but could run the code that it wrote and

4:55

it could Google an error message that

4:56

came out, find a stack overflow article,

4:58

apply that to the code, add some debug

5:00

statements into the code and run it and

5:02

see what happens. Basically automating

5:04

the entire inner loop of development. Um

5:06

this was this was a huge uh step

5:09

function forward. Um you can see the

5:11

little the little robot gets arms in

5:13

this picture. Um this was a this was a

5:15

huge jump at least at least in my own

5:16

productivity. um being able to like just

5:20

write a couple sentences of English,

5:21

give it to an agent and let it churn

5:23

through the task until it's got

5:25

something that's actually working,

5:26

running, tests are passing.

5:29

And then now what we're seeing is uh

5:31

parallel agents, what we're calling

5:32

agent orchestration. Uh folks are

5:34

figuring out how to get multiple agents

5:36

working uh in parallel, sometimes

5:39

talking to each other, sometimes

5:40

spinning up new agents under the hood.

5:42

Um you know, agents creating agents. Um

5:45

this is uh I would say kind of bleeding

5:47

edge of what's possible. Um people are

5:50

just starting to experiment with this

5:51

are just starting to see success with

5:52

this at scale but there are some uh some

5:55

really good tasks that are um uh very

5:59

amenable to this sort of workflow. Uh

6:01

and it has the potential to really uh

6:04

automate away a huge mountain of tech debt

6:06

that sits under you know every

6:07

contemporary software company.

6:13

a little bit about kind of like the the

6:14

market landscape here. Um, again, you

6:16

can kind of see that same evolution from

6:18

left to right where we really started

6:20

with, you know, plugins like GitHub

6:21

copilot inside of our existing IDEs and

6:24

we got these like AI AI empowered IDEs,

6:26

IDEs with like AI tacked onto them. Um, I

6:30

would say your your median developer is

6:32

kind of adopting local agents now. They

6:33

may be running Claude Code locally for uh

6:35

one or two things. Um, maybe some ad hoc

6:38

tasks. Uh your early adopters though are

6:41

starting to look at cloud-based agents,

6:43

agents that get their own sandbox

6:45

running in the cloud. This allows uh

6:47

those early adopters to run as many

6:50

agents as they want in parallel. U it

6:52

allows them to run those agents much

6:53

more autonomously than if they were

6:54

running on their local laptop, right? If

6:56

it's running on your local laptop,

6:57

there's nothing stopping the agent from

6:58

doing rm -rf /, trying to delete

7:01

everything in your home directory,

7:02

whatever it might do, installing some

7:04

weird software. Whereas if it's got its

7:06

own like containerized environment

7:07

somewhere in the cloud, you can run a

7:09

little bit more safely knowing that you

7:11

know the worst it can do is ruin its own

7:13

environment uh and um uh you don't have

7:16

to like sit there babysitting it and

7:17

hitting the Y key every time it wants to

7:19

run a command. Uh so those cloud-based

7:21

environments much more scalable uh a bit

7:23

more secure. Um and then uh I would say

7:26

at the far right here what we're really

7:27

just seeing the top like 1% of early

7:29

adopters uh start to experiment with is

7:32

orchestration. this idea that you not

7:34

only have these agents running in the

7:35

cloud, but you have them talking to each

7:37

other. Uh you're coordinating those

7:39

agents, you know, on a larger task. Uh

7:42

maybe those agents are spinning out sub

7:43

aents within the cloud that have their

7:45

own sandbox environments. Uh some really

7:47

cool stuff happening there. Uh I would

7:49

say, you know, with open hands, we we

7:52

generally started with cloud agents. Uh

7:54

we've leaned back a little bit and built

7:56

local CLI similar to Claude Code in order

7:58

to meet developers where they are today.

8:00

you know these these types of

8:01

experiences are much more comfortable

8:03

for developers. Uh you know we've been

8:04

using autocomplete for decades just got

8:06

a million times better with GitHub Copilot.

8:08

Um I would say these experiences

8:10

on the right side are very foreign to

8:12

developers. They feel very strange to

8:13

like hand off a task to an agent or a

8:15

fleet of agents uh and let them do the

8:17

work for you. It feels kind of like uh

8:19

for me at least uh the jump that I made

8:22

when I went from being an IC to being a

8:23

manager um is is what it feels like

8:26

going from writing code myself to giving

8:28

that code to agents. Uh so very very

8:30

different way of working. I think one that

8:31

developers have been very slow to

8:32

adopt. Uh but again the top 1% or so of

8:35

engineers that we've seen adopt the

8:37

stuff on the right side of this uh

8:39

landscape. Uh they've been able to get

8:41

you know massive massive lifts in

8:43

productivity and tackle huge backlogs of

8:45

tech debt that other teams just weren't

8:47

getting to.

8:49

Uh some examples of where you would want

8:51

to use orchestration rather than a

8:53

single agent. Uh typically these are

8:55

tasks that are going to be very

8:56

repeatable and very automatable.

8:59

Uh so some examples are things like the

9:01

basic code maintenance tasks, right?

9:03

Every codebase has to uh you know

9:05

there's there's a certain amount of work

9:07

to do to just keep the lights on, right?

9:09

To keep dependencies up to date to uh

9:11

make sure that any vulnerabilities get

9:13

solved. Uh we have one client for

9:15

instance that is using open hands to uh

9:19

remediate CVEs throughout their entire

9:20

codebase. They have tens of thousands of

9:22

developers, thousands and thousands of

9:24

repositories. Um and basically every

9:26

time a new vulnerability gets announced

9:28

in an open source project, they have to

9:29

go through their entire codebase, figure

9:31

out which of their repos are vulnerable,

9:33

uh submit a pull request to that

9:34

codebase to uh actually uh you know

9:37

resolve the CVE, update whatever

9:38

dependency, fix breaking API changes. Uh

9:41

and they have seen a 30x improvement on

9:43

time-to-resolution for these CVEs by doing

9:46

uh orchestration at scale. uh they

9:49

basically have a setup now where every

9:50

time a CVE gets announced, a new

9:52

vulnerability comes in. Uh they kick off

9:54

an OpenHands session to scan a repo for

9:58

that vulnerability. Uh make any code

10:00

changes that are necessary and open up a

10:02

pull request and all the downstream team

10:04

has to do is click merge, validate the

10:06

changes.

10:08

Um you can also do this for like

10:09

automating documentation and release

10:11

notes. Um there's a bunch of

10:13

modernization challenges that uh

10:15

companies face. Um, for instance, uh,

10:19

you might want to add type annotations

10:21

to your Python codebase if you're

10:22

working with Python 3. Um, you might

10:25

want to split your Java, you know, like

10:27

a monolith into microservices. Um, these

10:29

are the sorts of tasks that are still

10:31

going to take a lot of um, thought for

10:33

an engineer. You know, you can't just

10:35

like one shot it with code and say like

10:38

uh, you know, refactor my monolith into

10:39

microservices, but it is still very real

10:42

work, right? You're still just kind of

10:43

like copying and pasting a lot of code

10:44

around. So if you thoughtfully orchestrate

10:46

agents together, they can do this. Um a

10:49

lot of migration stuff. So migrating

10:51

from like old versions of Java to new

10:52

versions of Java. We're working with one

10:54

client to migrate a bunch of Spark 2

10:56

jobs to Spark 3. Um we've uh used OpenHands

11:01

to migrate our entire front end from

11:03

React uh from Redux to Zustand. U so you

11:06

can do these very large migrations.

11:08

Again, lots of very rote work. It still

11:09

takes a lot of um thinking from a human

11:12

about how they're going to orchestrate

11:13

these agents. Um and there's a lot of

11:15

tech debt, uh, detecting unused code and

11:17

getting rid of that um you know we we

11:20

have one client who's using our SDK to

11:23

basically scan their Datadog logs every

11:25

time there's a new error pattern go into

11:26

the codebase and uh add error handling

11:29

fix whatever problem is uh is cropping

11:31

up. Um, so lots of things that you know

11:34

are a little too big for a single agent

11:36

to just one shot. Um, but are super

11:38

automatable are good tasks to handle

11:40

with an agent as long as you're

11:41

thoughtful about orchestrating them.

11:45

A bit about why these aren't one-shot-able

11:47

tasks. Uh, some of them are

11:49

technological problems, some of them are

11:50

more like human psychological problems.

11:52

On the technology side, you have a

11:54

limited amount of context uh that you

11:56

can give to the agent. So extremely long

11:58

running tasks, or tasks that span like a

12:00

very large code base. Usually you don't

12:02

really have enough context there. You're going

12:03

to have to uh compact that context

12:05

window to the point the agent might get

12:07

lost. Uh we've all seen the laziness

12:09

problem. Uh I've tried to launch out

12:11

some of these types of tasks. And the

12:12

agent will say, "Okay, I migrated three

12:14

of your 100 services. I need to hire a

12:17

team of six people to do the rest." Um

12:20

uh the agents often lack domain

12:22

knowledge within your codebase, right?

12:23

They don't have the same intuition that

12:24

you do for the problem.

12:26

Uh and errors compound when you go on

12:28

these really long trajectories with an

12:30

agent. Uh a tiny error in the beginning

12:32

is going to uh you know compound over

12:35

time. The agent is going to basically

12:36

repeat that error over and over and over

12:37

again for every single step that it

12:39

takes in its task. Uh and then on the

12:42

human side uh you know we do have this

12:44

intuition for the problem we can't

12:45

convey. You know say you want to break

12:46

your monolith into microservices. You

12:48

probably have a mental model of how

12:49

that's going to work. Uh if you just

12:51

tell the agent break the monolith into

12:53

microservices it's just going to take a

12:55

shot in the dark. based on patterns seen

12:56

in the past without any real

12:58

understanding of your codebase.

13:00

Uh we have some difficulty decomposing

13:02

apps for agents and understanding like

13:04

what an agent can actually get done uh in

13:07

one shot. Um uh we also like you you uh

13:12

do need this intermediate review

13:14

intermediate checkin from the human as

13:15

the agent's doing its work. We'll talk a

13:17

little bit about what that loop looks

13:18

like later. Uh but it's again not

13:20

something you can just like tell an

13:21

agent to do and expect the final result

13:23

to come in. have to kind of approve

13:25

things as the agent goes along. Uh and

13:27

then not having a true definition of

13:29

done. I think uh if you don't really

13:30

know what finish looks like for this

13:32

project, it's hard to tell the agent.

13:37

Uh on these types of orchestration

13:38

paths, I want to make it super clear that

13:40

we don't expect every developer to be

13:41

doing agent orchestration. Um, we think

13:43

most developers are going to use a

13:45

single agent locally uh for you know

13:47

sort of ad hoc tasks that are common for

13:49

engineers building new features uh

13:52

fixing a bug things like that. I think

13:54

running Claude Code locally uh in a

13:56

familiar environment alongside an IDE is

13:58

probably going to be a common workflow

13:59

at least for the next couple years. Uh

14:01

what we're seeing is that a small

14:03

percentage of engineers who are early

14:05

adopters of agents who are really

14:06

excited about agents are finding ways to

14:08

orchestrate agents to tackle like huge

14:11

mountains of tech debt at scale and get

14:14

a much bigger lift in productivity for

14:16

that smaller select set of tasks. Right?

14:18

You're not going to see a 3,000% lift in

14:20

productivity for all software

14:21

engineering. Probably going to get more

14:22

of that, you know, 20% lift that

14:24

everybody's been reporting. uh but for

14:27

some select tasks like CVE remediation

14:29

or codebase modernization you can get a

14:32

massive massive lift you can do you know

14:34

engineer-years of work in a

14:36

couple weeks

14:39

I want to talk a little bit about what

14:41

these workflows look like in practice so

14:44

this loop probably looks pretty familiar

14:45

if you're used to working with local

14:47

agents um this is very typical loop that

14:50

looks a lot like the inner loop of

14:51

development for you know non-AI coding as

14:53

well but basically you know you give the

14:55

agents a prompt

14:56

uh it does some work in the background.

14:58

Maybe you babysit it and watch, you

15:00

know, everything it's doing and hit the

15:01

Y key every time it wants to run a

15:02

command. Uh then the agent finishes, you

15:05

look at the output. Uh you see the tests

15:06

are passing. You see if this actually

15:08

satisfies uh what you asked for and then

15:11

maybe you prompt the agent again to get

15:12

it to get a little closer to the answer.

15:14

Or maybe you're satisfied with the

15:15

result. You uh you know, you commit the

15:17

results and and push.

15:20

For bigger orchestrated tasks, this

15:22

becomes a little bit more complicated.

15:24

Uh basically what you need to do is uh

15:27

you, or maybe hand-in-hand with Claude, you

15:30

want to decompose your task into a

15:32

series of tasks that can be executed

15:35

individually by agents. Uh then you'll

15:37

send off an agent for each one of those

15:38

individual tasks and you'll do one of

15:41

those one of those agents for each of

15:42

the individual tasks. And then finally

15:45

at the end uh you maybe with the help of

15:47

an agent are going to need to pull in

15:48

all the output together from all those

15:50

individual agents into a single change

15:52

uh and merge that into your codebase.

15:58

Very importantly there's still a lot of

15:59

human in the loop here. Um you need to

16:02

review not just the final output of the

16:04

collated result but uh the intermediate

16:06

outputs for each agent. Um I like to

16:09

tell folks the goal is not to automate

16:10

this process 100%. It's something like

16:12

90% automation. Uh that's still, you

16:15

know, an order of magnitude productivity

16:16

lift. Um I think this is this is really

16:19

tricky to get right. This is where a lot

16:21

of like thought comes into the process

16:23

of like how am I going to break the task

16:25

down so that I can verify individual

16:27

step uh and so that uh I can actually uh

16:31

automate this whole process without just

16:32

ending up with an AI-coded mess.

16:37

Uh this is a typical git workflow that I

16:39

like to use for tasks like this. Uh

16:42

typically we'll start a new branch on

16:43

our repository. Uh we might add some

16:46

high level context to that branch using

16:47

like an agents.md or, in OpenHands, the

16:50

concept of a micro-agent. Uh it's just

16:52

a markdown explaining you know here's

16:54

what we're doing here. Uh just so the

16:56

agent knows okay we're migrating from

16:58

Redux to Zustand, or we're going to migrate

17:01

these Spark 2 jobs to Spark 3. uh you

17:04

might want to put some kind of

17:04

scaffolding in place. Uh I'll talk a

17:06

little bit more about examples of of uh

17:08

scaffolding later. Uh you're going to

17:11

create a bunch of agents based on that

17:12

on that first branch. Uh the idea is

17:15

that they're going to be submitting

17:16

their work into that branch and it's

17:18

basically going to accumulate our work

17:19

as we go along and then eventually once

17:22

we get to the end we can rip out our

17:23

scaffolding and merge that branch into

17:24

main. Uh now for uh if you're you're

17:28

kind of getting started with this I

17:30

would suggest limiting yourself to about

17:31

three to five concurrent agents. Uh I

17:34

find more than that your brain starts to

17:35

break. Uh but for folks that have really

17:37

adopted orchestration at scale uh we see

17:40

them running hundreds even thousands of

17:42

agents concurrently. Usually a human is

17:45

not uh in the loop for you know one

17:47

human is not on the hook to review every

17:48

single one but maybe those agents are

17:50

sending out pull requests to individual

17:52

teams things like that. Um, so you can

17:54

scale up very aggressively once you

17:55

start to get a feel for how all this

17:57

works and you feel like you have a very

17:58

good way of getting that human input

18:00

into the loop.
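
A minimal sketch of the branch workflow just described: create the accumulation branch and commit a context note for the agents. The branch name, file name, and note text are placeholders; the path follows the OpenHands micro-agent convention, and a plain AGENTS.md at the repo root plays the same role for other tools.

```python
import os
import subprocess

def start_refactor_branch(branch: str, context_note: str) -> None:
    # Long-lived branch that every sub-agent will merge its work into.
    subprocess.run(["git", "checkout", "-b", branch], check=True)

    # A short markdown note so each agent spawned on this branch understands
    # the overall migration before it starts.
    path = ".openhands/microagents/migration.md"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(context_note)

    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", "Add migration context for agents"], check=True)

start_refactor_branch(
    "migrate-redux-to-zustand",
    "# Migration context\nWe are migrating frontend state from Redux to Zustand.\n",
)
```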

18:04

I'm going to kick it off to uh my

18:06

coworker Calvin here. He's going to talk

18:08

about uh a very very large scale

18:10

migration uh basically u eliminating

18:13

code smells from the OpenHands codebase

18:15

that he did using our refactor SDK up

18:18

here.

18:23

Open

18:25

hands excels at solving scoped tasks. Give

18:28

it a focused problem something like fix

18:30

my failing CI add and debug this end

18:32

point and it delivers. But like all

18:35

agents it can stumble when the scope

18:37

grows too large. Let's say I want to

18:39

refactor an entire code base. Maybe

18:41

enforce stricter typing, update a

18:44

dependency or even migrate from one

18:46

framework to another.

18:48

These are not tasks. They're sprawling

18:50

interconnected changes that can touch

18:52

hundreds of files.

18:54

To battle problems at this scale, we're

18:56

using the open hands agent SDK to build

18:58

tools designed to specifically

19:00

orchestrate collaboration between humans

19:02

and multiple agents.

19:06

As an example, let's work to eliminate

19:08

code smells from the OpenHands codebase.

19:11

Here's the repository structure. Just

19:13

the core agent definition has about 380

19:16

files uh spanning 60,000 lines of code.

19:20

Says a lot about the volume of the code

19:22

but not much about the structure. So

19:24

let's use our new tools to visualize the

19:26

dependency graph of this chunk of the

19:28

repository.

19:30

Here each node represents a file. The

19:33

edges show dependencies who imports who.

19:36

And as we keep zooming out it becomes

19:38

clear this tangled web is why

19:40

refactoring at scale is hard. To make

19:43

this manageable, we need to break the

19:44

graph up into human-sized chunks. Think

19:48

PR-sized batches that an agent can handle and a

19:49

human can understand.

19:52

There are many ways to batch based on

19:53

what's important to you. Graph theoretic

19:56

algorithms give strong guarantees about

19:57

the structure of edges in between

19:59

induced batches, but for our purposes,

20:02

we can simply use the existing directory

20:04

structure to make sure that semantically

20:06

related files appear inside the same

20:07

batch. Navigating back to the dependency

20:10

graph, we can see that the colors of the

20:11

nodes are no longer randomly

20:13

distributed. Instead, they correspond to

20:15

the batch that each of those associated

20:16

files exist. Zooming out and zooming

20:19

back in, we easily find a cluster of

20:21

adjacent nodes that are all the same

20:22

color, which indicates that an agent is

20:24

going to access all of those files

20:25

simultaneously.

20:28

Of course, this graph is still large and

20:30

incredibly tangled. To construct a

20:32

simpler view, we'll build a new graph

20:34

where nodes are batches and the edges

20:37

between those nodes are dependencies

20:38

that are inherited from the files within

20:41

each of those batches. This view is much

20:43

simpler. We can see the entire structure

20:45

on our screen at the same time.
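
A rough sketch of the batching idea just described: group files by their top-level directory, then collapse the file-level import graph into a batch-level graph. This is a generic illustration, not the actual refactor-SDK tooling; the file_deps input (each file mapped to the files it imports) is assumed to come from whatever import analysis you already have.

```python
from collections import defaultdict

def batch_by_directory(files):
    """Group files into batches keyed by their top-level directory."""
    batches = defaultdict(set)
    for f in files:
        top = f.split("/")[0] if "/" in f else "."
        batches[top].add(f)
    return dict(batches)

def batch_graph(file_deps, batches):
    """Collapse file-to-file dependencies into batch-to-batch edges."""
    owner = {f: b for b, fs in batches.items() for f in fs}
    edges = set()
    for src, targets in file_deps.items():
        for dst in targets:
            a, b = owner.get(src), owner.get(dst)
            if a and b and a != b:
                edges.add((a, b))  # batch a depends on batch b
    return edges

# Example: two tiny batches where "agent" imports from "utils".
deps = {"agent/loop.py": {"utils/log.py"}, "utils/log.py": set()}
batches = batch_by_directory(deps.keys())
print(batch_graph(deps, batches))  # {('agent', 'utils')}
```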

20:48

But this is something we can work with using

20:51

a graph. We can identify batches that

20:53

have no dependencies and inspect the files they

20:56

contain. This batch, for example, batch 16, looks

21:00

like it's an init file. It's probably

21:02

empty. Let's check.

21:04

Now, this is a tool intended for human

21:06

AI collaboration. So, once we know that

21:08

this file is empty, we might determine

21:10

that it's better to move it elsewhere.

21:12

Or maybe we're okay keeping it inside

21:14

this batch. And all that we want to do

21:15

is add a note to ourselves here so

21:18

we know the contents.

21:20

Of course, when refactoring code, it's

21:22

important to consider the complexity of

21:24

what it is you're moving. This batch is

21:27

trivial. Let's find one that's a little

21:29

bit more complex. Here's a batch that

21:31

has four files that all do real work, and the

21:34

complexity measures reflect this. These

21:37

are useful to indicate to a human that

21:39

we should be more careful with this batch than, for

21:41

example, the first ones.

21:54

You need to identify what's wrong in the

21:56

first place. Enter the verifier.

21:59

There are several different ways of

22:00

defining the verifier based on what you

22:02

care about. You can define it to be

22:04

programmatic, so it calls a bash

22:06

command. This is useful if your

22:08

verification is checking unit tests or

22:11

running a linter or a type check.

22:14

Instead though, because I'm interested

22:15

in code smells, I'm going to be using a

22:17

language model that's going to be

22:18

looking at the code and trying to

22:20

identify any problematic patterns based

22:22

on a set of rules that I provided.
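
A small sketch of the two verifier flavors just described: a programmatic one that passes or fails based on a shell command, and an LLM-based one that reviews code against a set of rules. This is illustrative only, not the refactor SDK's actual interface, and review_with_llm is a stand-in for whatever model call you use.

```python
import subprocess

def command_verifier(command: str):
    """Programmatic verifier: pass/fail based on a shell command
    (unit tests, a linter, a type check)."""
    def verify(batch_files):
        # Runs the configured command regardless of the batch contents.
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr
    return verify

def llm_verifier(rules: str, review_with_llm):
    """LLM verifier: ask a model to flag code smells against a rule set.
    review_with_llm(prompt) -> str is a placeholder for your model call."""
    def verify(batch_files):
        code = "\n\n".join(open(f).read() for f in batch_files)
        report = review_with_llm(
            f"Rules:\n{rules}\n\nCode:\n{code}\n\nList any violations, or reply OK."
        )
        return report.strip() == "OK", report
    return verify
```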

22:24

Now, let's go back to our first batch

22:26

and actually put this verifier to use.

22:28

Remember, this batch is trivial and

22:30

fortunately the verifier recognizes it

22:32

as such. It comes back with a nice

22:34

little report indicating which patterns it

22:36

identified and which it didn't. And the status of

22:38

this batch is turned to completed green.

22:41

Good.

22:42

And this change in status is also

22:44

reflected in the batch graph. Navigating

22:47

back and toggling the color display, we

22:48

can see that we have exactly one node

22:51

out of many completed and the rest are

22:53

still yet to be handled. But this

22:55

already gives us a really good sense of

22:57

the work that we've done and how it fits

22:59

into the bigger picture.

23:01

So now our strategy for ensuring that

23:03

there are no code smells in the entirety

23:04

of our repository is straightforward. We

23:06

just have to ensure that every single

23:08

node on this batch graph turns green. So

23:10

let's go back to our batches and

23:12

continue verifying till we run across a

23:14

failure.

23:16

We'll keep going in dependency order, making

23:18

sure that we pick nodes that don't have

23:19

any dependencies on other batches that

23:21

we have yet to analyze. This next batch

23:24

is about as simple as the first, but

23:26

because the init file is a little bit

23:28

more complex, the report that gets

23:29

generated is a little bit more verbose.

23:32

Continuing down the list, we come across

23:34

the batch we identified earlier with some

23:36

chunky files of relatively high code

23:38

complexity. And this batch happens to

23:40

give us our first failure. Notice

23:43

that the status turns red instead of

23:44

green. Now this batch has more files

23:48

than what we've seen in the past. So the

23:50

verification report is proportionally

23:52

longer. Looking through, we see that it is

23:55

listing, file by file, the code smells that were

23:58

identified and where.

24:00

I see one file is particularly egregious

24:03

with its violations. We'll have to come

24:05

back to that.

24:07

And if we zoom all the way back out to

24:09

the batch graph and look at the status

24:11

indicators, we'll see the two green

24:12

nodes representing the batches we've

24:14

already successfully verified. We'll

24:16

also see the red representing the batch

24:18

that just failed verification. Now,

24:21

our stated goal is to turn this entire

24:23

graph green. This red node presents a

24:25

little bit of an issue. To convert this

24:27

red node into a green node, we need to

24:28

address the problems that the verifier

24:31

found using the next step of the

24:32

pipeline, the fixer.

24:35

Just like the verifier, the fixer can be

24:37

defined in a number of different ways.

24:39

The programmatic fixer can run a bash

24:41

command or you can feed the entire batch

24:43

into a language model and hope it

24:45

addresses the issues in a single step.

24:47

But by far the most powerful fixer that

24:49

we have uses the OpenHands agent SDK to make a

24:52

clean copy of the code and spin up an

24:54

agent that has access to all sorts of

24:56

tools to run tests, examine the code,

25:00

look at documentation online, do whatever

25:02

it needs to address these issues. So

25:05

let's go back to the failing batch and

25:07

run the fixer and see what happens.

25:10

Now this part of the demo is sped up

25:12

considerably, but because we're

25:14

exploring these batches in dependency

25:15

order, while we're waiting, we can

25:17

continue to go down the list, running

25:19

our verifiers, and spinning up new

25:20

instances of the OpenHands agent using the

25:23

SDK until we come across a node that's

25:25

blocked because one of its upstream

25:27

dependencies is still incomplete.

25:30

When the fixer is done, the status of

25:31

the batch is reset. We'll need to rerun

25:34

verification in the future to make sure

25:35

the associated batch turns green again.

25:38

Looking at the report that the fixer has

25:39

returned, there's not much information,

25:41

just the title of the PR. We've set this

25:44

up so that every fixer produces a nice

25:46

tidy pull request ready for human

25:48

approval. Just because the refactor is

25:50

automated doesn't mean it shouldn't be

25:52

reviewed.

25:54

And here's the generated PR. The agent

25:56

does an excellent job of summarizing the

25:58

code smells it identified, the changes

26:00

made to address those as well as any

26:01

changes that they have to make. It

26:04

also leaves helpful notes for the reviewer and

26:06

some notes for anybody working on this

26:08

part of the code in future.

26:12

And when we look at the content of this,

26:14

we see it's not very risky. All the changes

26:16

are tightly focused on addressing the

26:18

code smells that we provided earlier.

26:20

And we've only modified a couple hundred

26:22

lines of code, the bulk of which is

26:23

simply refactoring a nested block into its

26:26

own function call.

26:28

Not all PRs are going to be this small, but

26:30

our batching strategy and narrow

26:32

instructions ensure that the scope of

26:34

the changes are well considered. This

26:36

helps to improve performance, but it

26:38

also makes them easy to review.

26:41

From here, the full process for removing

26:43

code smells from the entire codebase

26:46

becomes clear. Use the verifier to

26:48

identify problems. Use the fixer to spin

26:51

up agents to address those problems. Review

26:53

and merge those PRs. Unblock new fixes

26:56

and repeat until the entire graph is green.
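
A sketch of that loop in code, assuming the batch graph, verifier, and fixer from the earlier sketches; in the real demo the fixer opens a PR and a human reviews it before the batch is re-verified.

```python
def run_pipeline(batch_edges, batches, verify, fix, max_rounds=10):
    """Drive every batch to green: verify, fix failures, and re-verify,
    only touching a batch once all of its dependencies are green."""
    deps = {b: set() for b in batches}
    for a, b in batch_edges:               # a depends on b
        deps[a].add(b)
    status = {b: "pending" for b in batches}

    for _ in range(max_rounds):
        for b in batches:
            if status[b] == "green":
                continue
            if not all(status[d] == "green" for d in deps[b]):
                continue                   # blocked: an upstream batch isn't green yet
            ok, report = verify(batches[b])
            if ok:
                status[b] = "green"
            else:
                fix(batches[b], report)    # e.g. spin up an agent and open a PR for review
        if all(s == "green" for s in status.values()):
            break
    return status
```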

26:59

We've already used this tool to make

27:01

some pretty significant changes to the

27:03

code including typing and improving

27:06

tests. And we could not have done it

27:08

without the OpenHands SDK powering everything

27:10

under the hood. All

27:14

right. So, that's the uh open hands

27:16

refactor SDK powered by our open hands

27:19

agent SDK. Uh we're going to walk

27:20

through a little bit later on the

27:22

workshop how to build something a little

27:24

simpler but very similar where we get

27:26

parallel agents working together to fix

27:28

tasks that were discovered by initial

27:30

agent.

27:34

Uh I want to talk a little bit about

27:36

strategy for both decomposing tasks and

27:38

sharing context between these agents.

27:40

These are both really big important

27:42

parts of agent orchestration. Uh so

27:44

effective task decomposition

27:46

uh you're really looking to uh break

27:49

down your very big problem into tasks

27:51

that a single agent can solve, a single

27:52

agent can one shot. Um something that

27:55

can fit in a single commit, single pull

27:57

request. Um super super important

27:59

because you don't want to be, you know,

28:01

constantly iterating with each of the

28:02

sub agents. You want each one, you want

28:04

a pretty good guarantee that each one is

28:06

just going to one-shot the thing. you'll

28:07

be able to rubber stamp it and get

28:09

merged into your ongoing branch.

28:12

Uh you want to look for things that can

28:13

be parallelized. This is going to be a

28:15

huge way to increase the uh the speed of

28:18

the task. Um you know, if you're just

28:20

executing a bunch of different agents

28:22

serially, you might as well just have a

28:24

single agent moving through the task

28:25

serially. U the more you can

28:27

parallelize, the more you get many

28:28

agents working at once, the faster

28:30

you're going to able to move through the

28:31

task uh and iterate. Um, you want things

28:35

that you can verify as correct very

28:36

easily and quickly. Ideally, you'll have

28:38

something where you can just like look

28:40

at the CI/CD status and have good

28:42

confidence that if everything's green,

28:43

you're good. Uh, maybe you'll need to

28:45

click through the application itself,

28:47

something like that, run a command

28:48

yourself to verify that things look good

28:50

to you. Uh, but you want to be able to

28:52

very quickly understand whether an agent

28:54

has done the work you asked it to or

28:55

not. U, and you want to have clear

28:57

dependencies and order in between tasks.

29:00

Uh you notice these these uh criteria

29:02

are pretty similar to how you might

29:04

break down work for an engineering team,

29:06

right? You need to make sure that you

29:07

have tasks that are maybe separable,

29:09

tasks that like different people on your

29:11

team can execute in parallel and then

29:13

collate the results together. You want to

29:14

know uh once I get task A done, then

29:16

that unlocks tasks B, C, and D and then

29:19

once those are done, we can do E. Um so

29:21

very similar to breaking down work for a

29:23

team of engineers.

29:26

Uh there are a few different strategies

29:28

for breaking down a very large refactor

29:30

like the one we saw Calvin do. Uh

29:32

the simplest, like, most naive one is to just go

29:34

piece by piece. You know you might

29:35

iterate through every file in the

29:36

repository, every directory, maybe every

29:39

function or class. Um you know this this

29:42

uh is a fairly straightforward way to do

29:45

things. It works well uh if those um

29:48

pieces can be kind of executed

29:52

um you know without depending on one

29:53

another too much. Um so good examples

29:56

might be like adding type annotations

29:58

throughout your Python codebase. Um

30:01

uh and then you know at the very end

30:03

once you've migrated every single file

30:04

say you can collect all those results

30:05

into a single PR.

30:08

A slightly more sophisticated thing

30:10

would be to create a dependency tree. Um

30:13

and the idea here is to add some

30:14

ordering to that piece by piece approach

30:16

where you know you start as we saw

30:18

Calvin do you start with like the leaf

30:20

nodes in your dependency graph right you

30:22

start with maybe your utility files get

30:24

those migrated over um and then anything

30:26

that depends on those you know it's

30:27

going to have those those initial fixes

30:29

in place and the dependencies can uh can

30:31

start working through um you know their

30:34

their part of the process. You can

30:35

basically work your way back up to whatever

30:37

the entry point of the application is.

30:39

Uh this is often a a better way to

30:42

proceed. Um it's more kind of a

30:43

principled approach for how you're going

30:44

to order through these tasks.
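
A minimal sketch of that ordering using Python's standard-library topological sorter; the dependency map (each file to the files it imports) is assumed input from your own analysis.

```python
from graphlib import TopologicalSorter

# file -> files it imports; leaves (utilities) have no dependencies
deps = {
    "app.py":      {"services.py", "utils.py"},
    "services.py": {"utils.py"},
    "utils.py":    set(),
}

# TopologicalSorter yields nodes whose dependencies are already done,
# so utility files come first and the entry point comes last.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['utils.py', 'services.py', 'app.py']

# Each file (or batch of files) in this order becomes one agent task;
# files at the same level can be handed to agents in parallel.
```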

30:49

Another example is to create some kind

30:50

of scaffolding that allows you to live

30:52

in both the like pre-migrated and post

30:55

migrated worlds. Um we did this uh for

30:58

example when migrating our React state

31:00

management system. Uh we basically had

31:02

an agent set up uh some scaffolding that

31:05

would allow us to to work with both

31:06

Redux and Zustand at the same time.

31:09

Um pretty ugly, not something you would

31:12

actually really want to do. Um but it

31:13

allowed us to test the application as

31:15

each individual component got migrated

31:17

from the old state management system to

31:18

the new state management system. Uh and

31:21

then we sent off parallel agents for

31:23

each of the components, uh, to get each

31:25

component done and then at the very end

31:27

once everything was using Zustand we

31:29

were able to rip out all of the u all

31:32

the scaffolding so there was no more

31:33

mention of redux and everything was

31:35

working but having that scaffolding in

31:37

place allowed us to validate you know as

31:39

each agent finished its work for just

31:40

that one component we could validate the

31:42

application was still working that

31:43

component still works uh we didn't have

31:45

to do everything all at once we got some

31:47

kind of human uh feedback from the

31:49

agents

31:52

uh next I want to talk a bit about

31:53

context sharing uh as you go through a

31:57

big large scale project like this uh

31:59

you're going to learn things right

32:00

you're going to figure out okay what I

32:02

my original mental model wasn't actually

32:03

complete I didn't actually uh you know

32:06

understand the problem correctly um your

32:08

agents might uh run into that you know

32:10

you might have a fleet of agents you got

32:12

10 agents running they're all hitting

32:13

the exact same problem you kind of want

32:15

to share the solution of that problem so

32:17

they're not all getting stuck right

32:19

there's a bunch of different strategies

32:20

for doing this context sharing between

32:22

agents

32:24

Uh, one strategy that I think the most

32:26

naive thing you can do is share

32:27

everything. Basically, every agent sees

32:28

every other agent's context. Uh, this

32:30

is, uh, not great. Uh, it's basically

32:32

the same thing as just having a single

32:33

agent working iteratively through the

32:35

task. Uh, you're going to blow through your

32:36

context window really quickly if you do

32:38

something like this. Uh, so this is this

32:40

is not going to help.

32:43

Uh, a somewhat better approach would be

32:45

to have the human being just sort of

32:47

manually enter information into the

32:48

agents. Uh if you have a chat message, a

32:50

chat window with each agent, you can

32:52

just paste in like hey use library 1.2.3

32:55

instead of 1.2.2.

32:57

Um the human can also modify like an

33:00

agents.md or micro-agent to pass messages

33:03

to these agents. Uh but this does

33:05

involve manual human effort. Um it

33:08

involves a lot more like babysitting of

33:10

the agents. So it's it's not super

33:12

scalable.

33:14

Uh you can also have the agents

33:15

basically share context with each other

33:17

through a file like agents.md. Uh you can

33:20

allow the agents to actually modify this

33:21

file themselves. Uh maybe they send a

33:23

pull request into the file as they learn

33:25

new things. Uh downside here is that

33:27

sometimes agents will try and learn

33:28

unimportant things. Uh they can get kind

33:30

of aggressive about pushing information

33:32

to this file. Uh so doing some kind of

33:34

human review seems to help.
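
A tiny sketch of the shared-notes idea: agents append lessons to one markdown file that every new agent reads at startup, with a human reviewing the additions. The file name is arbitrary; an agents.md or an OpenHands micro-agent file plays the same role.

```python
from pathlib import Path

NOTES = Path("AGENT_NOTES.md")  # arbitrary name for this sketch

def record_lesson(agent_id: str, lesson: str) -> None:
    """Append a lesson learned so later (or parallel) agents don't hit the same wall.
    In practice this lands in a pull request so a human can prune noisy notes."""
    with NOTES.open("a") as f:
        f.write(f"- ({agent_id}) {lesson}\n")

def load_lessons() -> str:
    """Prepended to each new agent's instructions."""
    return NOTES.read_text() if NOTES.exists() else ""

record_lesson("agent-07", "Use library 1.2.3 instead of 1.2.2; 1.2.2 breaks the build.")
```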

33:37

And then last uh this is probably the

33:40

most like leading edge idea here. Um,

33:42

but you can basically give each agent

33:44

a tool that allows it to send

33:45

messages to other agents. Uh, it could

33:47

be like a broadcast message that goes

33:48

out to all the other agents. Uh, or it

33:50

could be, uh, you know, point-to-point

33:52

conversation. Uh, this is super, uh, fun

33:55

to experiment with. We're doing a lot

33:56

uh, to experiment with this now, uh,

33:58

with our SDK. Um, but it's, uh, it's

34:01

tricky to get right. It's, uh, you you

34:03

once you get agents talking to each

34:05

other, you're like increasing the, uh,

34:07

level of non-determinism in the system.

34:09

Uh, things can get a little strange. Uh I

34:11

have an example here on the right of uh

34:14

this is from a doctor's report where

34:15

they had two agents just talk to each

34:17

other. They just entered into a loop of

34:18

wishing each other zen perfection.
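
A toy sketch of the messaging idea: an in-memory bus an orchestrator could expose to agents as a tool, with broadcast and point-to-point delivery. Real setups would need persistence and, as noted, this raises the level of non-determinism.

```python
from collections import defaultdict, deque

class MessageBus:
    """Toy in-memory message bus for agent-to-agent coordination."""
    def __init__(self):
        self.inboxes = defaultdict(deque)
        self.agents = set()

    def register(self, agent_id):
        self.agents.add(agent_id)

    def send(self, sender, recipient, text):
        self.inboxes[recipient].append((sender, text))

    def broadcast(self, sender, text):
        for a in self.agents:
            if a != sender:
                self.inboxes[a].append((sender, text))

    def receive(self, agent_id):
        msgs = list(self.inboxes[agent_id])
        self.inboxes[agent_id].clear()
        return msgs

bus = MessageBus()
for a in ("agent-1", "agent-2", "agent-3"):
    bus.register(a)
bus.broadcast("agent-1", "Pin the library to 1.2.3; 1.2.2 fails its import check.")
print(bus.receive("agent-2"))
```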

34:22

Um

34:24

cool. Uh now I want to work through an

34:26

exercise. Uh I would love it if you all

34:29

want to follow along. Um you can access

34:32

this presentation for uh copy pasting

34:35

purposes at uh dev.shophands-workshop.

34:41

Um, we'll work through some coding

34:43

exercises with the open hands SDK

34:45

specifically to uh do CVE remediation at

34:48

scale. Um, we're going to write a script

34:51

that will take in a GitHub repository,

34:53

scan it for open source vulnerabilities

34:55

for CVEs. Um, uh, and then set up a

34:59

parallel agent for every single

35:00

vulnerability we find to solve that and

35:02

open up a pull request.

35:04

So, dub.shophandworkshop.

35:08

uh let me know anybody can access it.

35:11

>> It's gonna be the slideshow.

35:14

>> So, so it should be the slideshow if you

35:16

want to. There will be um

35:19

uh copy pasteable prompts and uh links

35:22

and stuff like that around slide 29.

35:24

>> Got it.

35:25

>> We'll get there.

35:29

Uh so in terms of how this process is

35:32

going to work,

35:33

uh basically we're going to start with

35:35

one agent that runs a CVE scan on this

35:38

repository. It's going to scan for

35:39

vulnerabilities. Uh what's nice about

35:41

using an agent for this is it can look

35:42

at the um uh the repository and decide

35:47

how am I going to scan for

35:48

vulnerabilities, right? Am I going to

35:49

use Trivy to scan a Docker image? Uh am

35:51

I going to run npm audit on a

35:53

package.json?

35:55

uh so it can it can basically detect the

35:56

programming language to figure out how

35:58

am I going to scan for CVEs here. Uh

36:01

then once we have our list of

36:02

vulnerabilities, we're going to run a

36:04

separate agent for each individual

36:06

vulnerability. Uh each of these agents

36:08

is going to research whether or not it's

36:10

solvable. Uh it's going to update the

36:11

relevant dependency, fix any breaking

36:13

API changes throughout the codebase, and

36:15

then open up a pull request. Uh what's

36:18

nice about this is that we can merge

36:19

those individual PRs once they're ready.

36:21

You

36:21

>> show the link again. Yeah.

36:25

Uh what's nice about running the solving

36:27

in parallel is that you know we get we

36:29

get a bunch of different PRs. Uh so we

36:32

can merge them as they're ready. If one

36:34

agent gets stuck, one of the

36:35

vulnerabilities isn't solvable. All the

36:37

other ones are still going to work. Uh

36:39

maybe we get to 90% or 95% solved. Uh we

36:42

don't have to get to 100% in order to

36:44

have any value here. Uh just some quick

36:47

pseudo code of what this is going to

36:49

look like.

36:50

Uh so this is an example using the

36:52

OpenHands SDK of how to create an agent.

36:55

You can see we create a large language

36:56

model. Um we then pass that large

36:59

language model to an agent object along

37:01

with some tools. Uh a terminal, a file

37:04

editor, a task tracker for planning. Uh

37:07

we give it a workspace and then we just

37:09

tell it we want to do run. Uh this is a

37:13

pretty like naive hello world example.

37:14

We'll see how it gets a little bit more

37:16

complicated as we progress through this

37:17

particular task.
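
A sketch mirroring the pseudo code described on this slide. The import path, class names, tool names, and parameters below are assumptions reconstructed from the talk, not verified OpenHands agent-SDK API; check the SDK docs and repository for the real signatures.

```python
from openhands.sdk import LLM, Agent, Conversation  # assumed import path and class names

llm = LLM(model="anthropic/claude-sonnet-4", api_key="...")      # assumed signature
agent = Agent(
    llm=llm,
    tools=["terminal", "file_editor", "task_tracker"],           # assumed tool names
)
conversation = Conversation(agent=agent, workspace="./my-repo")  # assumed signature
conversation.run(
    "Scan this repository for vulnerable dependencies (CVEs) and "
    "write the findings to cves.json."
)
```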

37:19

Uh but then once that first agent is

37:21

done, we're going to iterate through all

37:23

of the vulnerabilities we get back out.

37:25

Um and then for each one, we'll send off

37:27

a new agent uh asking it to solve that

37:29

particular CVE.
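
A sketch of that fan-out, assuming the scanning agent wrote its findings to a cves.json file; solve_cve is a placeholder for spinning up one SDK agent per vulnerability, as in the previous sketch.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

def solve_cve(cve: dict) -> str:
    """Placeholder: spin up one agent that updates the affected dependency,
    fixes breaking API calls, and opens a pull request. Returns the PR URL."""
    raise NotImplementedError

# Assumed format: the scanning agent wrote a JSON list of {"id": ...} records.
with open("cves.json") as f:
    cves = json.load(f)

# A handful of solver agents in parallel; one stuck agent doesn't block the rest.
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {pool.submit(solve_cve, c): c["id"] for c in cves}
    for fut in as_completed(futures):
        cve_id = futures[fut]
        try:
            print(f"{cve_id}: opened {fut.result()}")
        except Exception as err:
            print(f"{cve_id}: skipped ({err})")
```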

37:35

All right. So, uh to get started here,

37:37

uh I would say create a new GitHub

37:40

repository. Uh we'll save our work

37:43

there. Uh you're also going to need both

37:45

a GitHub token and an LLM token.

37:49

Uh, I would, uh, if you sign up for for

37:51

OpenHands at app.all-hands.dev, you can get

37:54

$10 of free LLM credits there. Um,

37:59

if you're already an existing user, let

38:00

me know and I can I can bump up your

38:02

your existing credits for the purpose of

38:03

this exercise.

38:05

Um, then we're going to start uh an

38:08

agent server. Uh, this is a um uh

38:13

basically like a Docker container that's

38:15

going to house all the work that our

38:16

agents are doing. Uh this is a great way

38:19

again to run agents securely and more

38:21

scalably. So instead of running the

38:23

agents on our local machine to solve all

38:25

these CVEs uh we're going to run them

38:27

inside of a container. Hypothetically if

38:29

we were doing thousands of CVEs we could

38:31

run this in like a Kubernetes cluster so

38:33

that you know we have as many

38:34

workstations as we want for our agents

38:36

but for the purposes of this exercise

38:37

we'll just run one one Docker container

38:39

as a home for our agents. Um then we can

38:42

create uh an agents.md or an OpenHands

38:44

micro-agent to uh you know start

38:47

working through this task. I'm going to

38:48

be using the OpenHands CLI as we go here.

38:51

Um you're welcome to check out the open

38:52

hands CLI. You can also use Cursor or Claude

38:54

code or whatever you're used to using uh

38:56

as we uh kind of vibe our way through a

38:58

CVE remediation process with OpenHands.

39:02

Uh I'm going to give it a couple

39:04

minutes. I'm going to walk through

39:05

creating my GitHub repo, getting my

39:07

GitHub token, etc. Um

39:10

uh if you all have any trouble feel free

39:12

to raise your hand and come around and

39:14

uh help you know getting it all out etc.

39:52

You said app.allhands.dev

39:54

app.

39:55

>> Yeah.

40:06

So, I've got my new GitHub repo here.

40:11

Uh, so I'm gonna add a quick open hands

40:14

micro agent here.

40:32

Perfect.

40:44

I'm just going to tell

40:46

a

40:48

uh process for remediating

40:52

with agents.

40:56

relevant talks

40:59

for the open hand SDK are at

41:08

open hands SDK.

41:13

So some data opens a little bit of

41:15

context

41:18

similar to agent. Um we now have officed

41:24

uh to get a token. I'm not actually

41:27

going to do it here so that was my token

41:29

but you can go to GitHub settings your

41:32

profile

41:35

then developer settings

41:39

personal access tokens.

41:42

I like to do classic tokens.

41:45

Uh classic token. Give it a name and

41:50

then uh the repo scope is really what

41:52

you'll need. Uh that way we can open up

41:54

pull requests uh to solve the CVEs

41:56

involved.

42:02

>> We did a classic token not the new

42:05

thing.

42:07

I I haven't gotten a link

42:10

used to you're welcome to do. I guess

42:13

you could create a new repository.

42:14

>> I haven't got to them either. So,

42:16

>> I'm not with you.

42:20

>> Back in the old days.

42:22

>> So, what permissions do we need to

42:24

>> uh just the repo permission?

42:34

Also, it's going to show you sign up for

42:36

app.all-hands.dev.

42:39

Um,

42:41

you go to

42:43

API keys under your profile here, you can

42:46

get your OpenHands API key, your LLM key here.

42:50

I won't show it, but

42:53

this will allow you to use our proxy

43:00

step.

43:05

Last, I'm gonna

43:07

start up some agent server here. You'll

43:09

probably want to copy paste this out of

43:11

the presentation.

43:34

Got my repo close

43:53

dinner.

44:05

Maybe

44:24

that's back here. If you do want to work

44:26

with the OpenHands CLI,

44:33

uv tool install openhands

45:08

I'm going to start up the open hands

45:12

CLI.

45:13

Again, you can use Claude Code, Cursor,

45:15

whatever else if you want. Uh you folks

45:17

need a little more time with the setup.

45:20

key get token set up.

45:23

Sorry, check.

45:26

Uh so I'm gonna start with

45:29

this first prompt. Uh basically what

45:31

we're going to do is we're going to

45:33

point our agent uh at the open hands SDK

45:37

point it at the documentation

45:39

uh and just ask it to basically check

45:42

that our LLM API key is working that it

45:44

can actually do an LLM completion. This

45:46

will be like a very basic hello world.

45:48

just kind of get started here. Um, I'm

45:51

going to tell it uh I'm using I'm using

45:53

the open hands uh key that I generated

45:56

at app.all-hands.dev.

45:58

Um, so I'm telling it to use this open

46:00

hands Sonnet 4 model. Uh, you can replace

46:03

this with Anthropic. If you want to use

46:05

just like a regular anthropic API key.

46:08

Uh, you may need to set this model a bit

46:09

differently depending on if you're using

46:11

OpenAI, using LiteLLM.

46:14

You can look at the LiteLLM docs to

46:16

figure out if you have an open API key

46:17

or an open AI key. Uh you can look at

46:20

the LiteLLM docs to figure out which

46:21

model string to plug in. But I'm just

46:24

going to copy paste this as is.

46:39

Sorry, what's the step for uh agents.md

46:42

or the one for open hands?

46:44

>> So I would say just create a u a file

46:48

either the agents.md if you're working

46:49

with a a tool that's compatible with

46:51

that or

46:53

uh for OpenHands we have what's called a

46:56

micro-agent. I can get to it. Uh so

46:59

.openhands/microagents/ is the folder;

47:03

by convention repo.md is the

47:05

description of the repository you're in.

47:07

Um and I just gave it a couple links to

47:11

the SDK documentation uh and the

47:13

repository for the SDK so it has access

47:15

to you know basically the the API docs

47:18

there.

47:21

This is kind of an optional step. Make

47:22

things a little easier though.

47:34

is doing. All right, it thinks it's got

47:37

something good. So, let's see what's

47:40

going on.

47:42

Python CVE solver

47:57

need

47:59

environment variables.

48:10

I'm using the to set my brightness here.

48:15

Make sure I don't check those in.

48:23

One more time.

48:33

Got a small error. Looks like the agent

48:36

didn't quite get the API doc right.

48:40

Let's uh paste the error back. See what

48:43

happens.

49:01

Let's try again.

49:03

Of

49:14

course, never never go.

49:51

Not there.

49:54

She's working.

50:49

version.

50:57

>> Let's use club.

51:13

UV

51:16

tool install that breaks.

51:19

>> Yeah.

51:19

>> You know what version of UV you're on?

51:21

>> I'm on 096.9.6.

51:26

>> What error are you getting?

51:28

>> I don't know why.

51:30

No executables are provided by package

51:32

open hands. Removing tool error failed

51:34

to install entry points.

51:37

>> I'm newish to the Python world. So I

51:39

assumed I was doing silly.

51:42

>> You could try updating on 111 which is

51:45

what I'm on. But okay. Yeah, I'll try.

51:48

>> Another question.

51:49

>> Yeah.

51:49

>> Um, so I was able to I see you running

51:52

through the CLI. I was able to run this

51:53

on the like all all.dev.

51:57

>> Yeah. Cool. and it submitted a PR and

51:58

created it. Looks good.

52:00

>> Awesome.

52:00

>> Why are you doing it through the CLI?

52:02

>> Uh really just for

52:05

um

52:06

normally I actually prefer to work

52:08

through the web UI here. Um

52:12

I think uh being able to like run and

52:15

show that script is working locally. Uh

52:17

it's like a little bit better of a hand

52:19

out. I actually like to work through the

52:20

web UI normally and then have the agent

52:23

push and I pull locally if I really want

52:25

to work locally, but figured that was

52:26

just extra extra steps for presenting

52:29

purposes.

52:30

Yeah, feel free to use the the web or

52:32

the tool.

52:41

Looks like I

52:44

API key here. Come

53:12

Jesus.

54:00

200.

54:01

>> What's that?

54:03

>> Should we get 200?

54:09

>> Uh yeah, you should get something like

54:11

this. Uh like I just got finally uh

54:14

where the other one says hello.

54:21

Just

54:25

section.

54:37

Anybody managed to get connection

54:39

working?

54:40

>> I think so. I've created the file.

54:43

>> Nice.

54:47

Uh just a quick view of what this looks

54:50

like

54:52

in the first

54:56

basically you can see we create an LLM, tell

54:58

it what model we want to use, what API key we

55:00

want we want to use

55:03

and then just send a quick message to to

55:06

the LLM to make sure it's actually working.
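
A sketch of that hello-world check. As with the earlier sketch, the import path and the completion call are assumptions about the SDK rather than verified API; the model string and key come from environment variables so they never get checked in.

```python
import os
from openhands.sdk import LLM  # assumed import path

llm = LLM(
    model=os.environ["LLM_MODEL"],      # e.g. the model string shown on app.all-hands.dev
    api_key=os.environ["LLM_API_KEY"],  # kept in the environment, never committed
)
print(llm.completion([{"role": "user", "content": "Say hello."}]))  # assumed method
```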

55:10

Uh all right for the second time I'm

55:11

going to move towards prompt two. Uh so

55:15

here we're going to actually start to do

55:17

some work for the uh so we're going to

55:19

tell um you know the agent we're working

55:21

with uh we want to use the SDK to create

55:25

a new agent uh that's going to take in a

55:28

GitHub repository. Uh it's going to

55:31

connect to a remote workspace uh running

55:33

at localhost 8000. Again, that's the

55:36

docker start command from before. If you

55:37

haven't already run that now's a good

55:39

time to get Docker running. uh Docker

55:42

run this agent server. Uh

55:46

it's going to uh clone our repository

55:49

into that Docker container. Uh we're

55:51

going to create an agent that's going to

55:53

work inside that Docker container and

55:54

we're going to tell that agent to scan

55:56

this repository for any vulnerabilities.
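
Roughly, the first part of prompt two amounts to the sketch below; the workspace class name and its arguments are assumptions for illustration, with the agent server from the earlier docker command assumed to be listening on localhost:8000:

```python
# Prompt 2, part 1 (sketch): connect to the agent server running in Docker
# and clone the target repository into that sandbox.
# NOTE: the class name and its arguments are illustrative assumptions.
from openhands.sdk import RemoteWorkspace  # assumed name

workspace = RemoteWorkspace(
    host="http://localhost:8000",  # the docker-run agent server from earlier
)

# Clone the repo we want to scan into the container's working directory.
workspace.execute_command(
    "git clone https://github.com/<org>/<repo> /workspace/repo"
)
```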

56:43

with the OpenHands CLI. Is there a way

56:46

to interrupt and get it to stop?

56:48

>> Uh, hit Ctrl+P to pause. Yeah.

56:53

>> And then can I insert my corrections?

56:55

>> Yeah. Then you can type me a message or

56:57

just type continue.

57:08

Yeah,

57:13

>> I got the CLI to install, but I had to

57:14

add '-ai'.

57:19

>> Seems on PyPI that there's a '-ai' version,

57:23

but then it says in the docs.

57:26

>> I think the '-ai' one is

57:27

deprecated, but it is a usable

57:29

CLI. You want to use that

57:32

service that one off our team.

57:40

Did you get the '-ai' one to work?

57:42

Because as soon as I tried to run it, it

57:44

crashed. Oh,

57:47

>> oops. It installed. I was so happy.

57:49

>> Yeah, it installed and then it it didn't

57:51

work.

57:52

>> There's a deprecation warning when I go

57:53

to version. So, yeah,

57:55

>> There is also an option to download an

57:57

executable binary on our release page.

57:59

>> Okay,

57:59

>> that might be straightforward. You can

58:01

also run it in a docker container. Um,

58:05

if you check the CLI docs,

58:09

I think there's a uv run option as well.

58:58

Try UV run.

59:10

the version.

59:11

The version

59:23

that's for the regular one, not the '-ai' one.

59:27

>> Okay.

59:29

Thank you.

59:33

Okay, supposedly we have an agent working

59:36

here. Let's see. Going to run it with

59:39

a repo. It should have a few CVEs in it.

59:42

Let's see if we find any vulnerabilities.

59:46

By default, Open Hands will uh

59:53

we'll visualize the output here. So, we

59:55

can see the agent working uh even with

59:57

the SDK, pretty similar to how we saw

60:00

the uh CLI.

60:03

Uh you can see its task list.

60:07

It's uh

60:10

the repository.

60:18

It uh doesn't have Trivy itself. So

60:20

it's installing Trivy. It's basically doing

60:22

what we would expect an agent to do. Uh,

60:25

we've given it a task and it's getting to it.

60:49

So, we're running Trivy now.

61:04

Let me show a bit about what this

61:06

generated code looks like. Uh you can

61:09

see, so we instantiated our LLM in the

61:11

first step.

61:14

Now we're actually passing this LLM to

61:15

an agent. We're also giving it terminal

61:17

tool and file editor tool. Uh we're

61:20

creating this remote workspace that's

61:22

connecting to our Docker container so

61:23

that the agent can start working in its own

61:25

environment. Uh we create what's called

61:27

a conversation which is basically one

61:29

chunk of context that the agent is going to

61:30

manage as it goes about its work. Uh

61:33

we pass it a task with some clear

61:35

instructions for what it's supposed to

61:36

do, and then send that task.
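
Put together, that walkthrough has roughly this shape, continuing from the earlier sketches (it reuses the llm and workspace objects); the tool and class names below are assumptions for illustration, and the SDK docs are the source of truth:

```python
# Prompt 2, part 2 (sketch): wire the LLM, tools, and remote workspace into
# an agent, then drive it through a conversation.
# NOTE: Agent, Conversation, BashTool, and FileEditorTool are assumed names.
from openhands.sdk import Agent, Conversation
from openhands.tools import BashTool, FileEditorTool

agent = Agent(
    llm=llm,                               # the LLM from the prompt-1 sketch
    tools=[BashTool(), FileEditorTool()],  # terminal + file editing
)

# One conversation is one chunk of context the agent manages as it works.
conversation = Conversation(agent=agent, workspace=workspace)

conversation.send_message(
    "Scan the repository in /workspace/repo for known CVEs in its "
    "dependencies and write the findings to /workspace/vulnerabilities.json."
)
conversation.run()  # let the agent work until it decides it is done
```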

61:46

Looks like that initial scanner agent is

61:48

almost done.

61:57

Looks like that agent ran just fine. Got

62:00

these results.

62:08

I'll keep uh keep plugging along here.

62:10

We've got an agent that's uh scanning

62:13

for vulnerabilities.

62:17

Uh so the next thing I'm going to ask

62:19

this to do is basically we're going to

62:21

reach into the environment and get the

62:23

vulnerability list out from it. Uh the

62:27

idea is we're going to have it save

62:29

the vulnerabilities to a JSON file. Uh

62:32

then, on that workspace object

62:33

inside of the docker container we can

62:35

run execute command in order to get

62:36

those vulnerabilities back out. We also

62:38

have some options for like

62:40

manipulating files uh within the

62:43

workspace. Uh then for now we're just

62:45

going to iterate over the

62:46

vulnerabilities.json file, print it out

62:49

just so we can see we were able to reach

62:51

into this workspace and get some

62:52

information back out.
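
In code, pulling the results back out of the sandbox might look something like this sketch (reusing the workspace object from before); the execute_command return shape is an assumption, but the idea is just running a command against the workspace and parsing the JSON it returns:

```python
# Prompt 3 (sketch): reach into the Docker workspace and read back the
# vulnerabilities the scanner agent wrote to disk.
# NOTE: the return value's attributes are illustrative assumptions.
import json

result = workspace.execute_command("cat /workspace/vulnerabilities.json")
vulnerabilities = json.loads(result.stdout)  # assumed: result exposes stdout

for vuln in vulnerabilities:
    # One line per finding, just to prove the round trip works.
    print(vuln.get("id"), vuln.get("package"), vuln.get("severity"))
```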

64:26

All

64:50

right. Supposedly

64:52

good to go. See what happens.

64:56

Sheep.

66:05

Got some vulnerability results.

66:33

Agent's finished. Let's see if our

66:34

script can get

66:37

results back.

66:46

Parsing the JSON.

67:13

Um,

68:16

One more time.

68:25

>> What is the observation event? So for

68:28

every

68:30

step, uh, there's an action

68:32

and then an observation. So it might be

68:34

run this command and then an observation

68:36

comes back with the output of that.

68:40

>> Uh, it's more than that; it's

68:45

basically the entire trajectory of

68:46

events the agent takes, and then there's

68:49

two kinds of events actions and

68:50

observations. So whenever we make

68:53

calls to the LLM, it comes back with an

68:55

action to take or basically a tool call

68:58

uh and then the observation is the result of that

68:59

tool call.
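
To make that concrete, walking the trajectory conceptually looks like the sketch below, reusing the conversation from the earlier sketch; the attribute and type names are assumptions, but the structure is what the answer describes: alternating actions (tool calls chosen by the LLM) and observations (the results fed back to it):

```python
# Sketch: walking a conversation's event trajectory.
# NOTE: conversation.state.events and the event type names are assumptions.
for event in conversation.state.events:
    kind = type(event).__name__
    if "Action" in kind:
        # An action is what the LLM decided to do next, essentially a tool call.
        print("ACTION     ", getattr(event, "tool_name", kind))
    elif "Observation" in kind:
        # An observation is the result of that tool call, fed back to the LLM.
        print("OBSERVATION", str(getattr(event, "content", ""))[:80])
```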

69:23

If anyone's stuck on anything, happy to

69:25

come around. Feel free to raise a hand.

69:39

Number

69:41

three.

69:52

>> Nice. Yeah, it looks like it's printing

69:54

the CVE list. Yeah, that looks good.

70:06

create like

70:08

a specific sub agent for each script we

70:12

are running? Why are you overwriting

70:15

the same file again and again?

70:19

So the process we're going through

70:22

here with the five prompts, this

70:24

is really

70:26

uh to demonstrate what it would feel

70:28

like to actually like build with our

70:29

SDK, right? uh this is not the way that

70:32

I would... this is the way I would maybe

70:34

like work if I was actively working on a

70:36

problem you know I could have just given

70:37

you this whole fully packaged code

70:40

base pre-built right yeah

70:41

>> that had all this built but uh

70:45

is that what you're asking like why are

70:46

we pasting these prompts in

70:48

one by one

70:48

>> eventually we get a very large script

70:52

right? We should break it into several

70:54

separate files or sections

70:57

>> yeah yeah yeah no I think there's

70:59

there's definitely better ways to

71:00

organize this code than to have one

71:01

single script just uh easier for demo

71:04

purposes. Yes, I do have a

71:07

demo repo, um, I think it's openhands CVE

71:10

demo that uses special classes. There's

71:13

a single, you know, CVE agent subclass

71:17

that's a little bit more than just this

71:20

one script.

71:33

We're still parsing JSON.

71:52

Seems

72:09

Yes.

72:46

Focus.

72:49

Enough of us.

73:38

That's a beautiful question.

73:48

Our

74:16

SC

75:04

the open source models.

75:13

We're actually

75:34

I don't know what I'll be doing.

75:39

All right.

75:41

The thing is

75:49

I mean

76:00

>> Yeah.

76:20

Heat.

Summary


This video discusses the evolution and application of AI agents in software engineering, focusing on automating large-scale refactoring and maintenance tasks. It introduces Open Hands, an MIT-licensed coding agent, and traces the progression from simple code snippets to context-aware generation, autonomous agents, and now parallel agents (agent orchestration). The presentation highlights the shift from IDE plugins to cloud-based agents for scalability and security. It details use cases for agent orchestration, such as CVE remediation and code modernization, emphasizing that while single agents are useful for smaller tasks, complex refactors benefit from coordinated multi-agent approaches. The video also touches upon the challenges and strategies for task decomposition, context sharing, and human-in-the-loop processes. Finally, it provides a practical demonstration of using the Open Hands SDK for CVE remediation at scale, showcasing the setup and execution of agents for scanning, solving, and submitting pull requests for vulnerabilities.
