Hacking Subagents Into Codex CLI — Brian John, Betterup

Transcript

0:02

Hi everybody, my name is Brian John and

0:05

I'm excited today to talk to you about

0:07

hacking sub agents into the Codex CLI. So

0:10

who am I? I'm a principal fullstack

0:13

engineer. My current focus at work is AI

0:16

enablement for R&D. So think helping our

0:19

R&D team members get their work done

0:22

faster and with higher quality using AI.

0:25

The company I work for is BetterUp. It's

0:27

an awesome place to work. We've been

0:29

using AI since the very beginning. I've

0:31

been there for over eight years now,

0:33

which is longer than any place I've ever

0:34

worked before. And our mission is to

0:37

help people everywhere live their lives

0:39

with better purpose, clarity, and

0:41

passion. If that sounds interesting to

0:43

you, and you want to work on cool stuff

0:45

with LLMs,

0:47

please hit me up. I'll add my contact

0:49

info in the last slide.

0:52

So, why would we want to hack sub agents

0:55

into Codex CLI?

0:59

Well, I've been using Claude Code as my

1:01

daily driver since the very beginning.

1:04

It's a great tool. It's got tons of

1:06

bells and whistles. It's got great

1:08

models,

1:09

and I use sub agents all the time. But I

1:12

don't want to be locked in to one tool,

1:14

and I really don't want to be locked in

1:16

to one model family.

1:19

I wanted to be able to use other tools,

1:22

particularly Codex CLI, because the

1:24

models look really good and I want to be

1:27

able to still use sub agents with them so

1:29

that I can use my workflows

1:31

with other tools.

1:35

Context management. So, as you all know,

1:38

sub agents are amazing for context

1:40

management. The main agent can give a

1:43

problem to a sub agent. It can go off,

1:46

do its work, use its tokens, and pass

1:49

just the answer back to the main agent.

1:51

And all the context that got used up by the

1:53

sub agent doesn't end up in your main

1:55

context window, which is incredible.

2:00

And I don't think I have to say any more

2:02

about this one. We've all seen this way

2:03

too many times and it gets annoying.

2:06

And I have to give credit where credit

2:08

is due. This talk by Dex Horthy changed

2:12

the way that I work with AI.

2:15

The workflows he proposes here I found

2:18

to be really effective especially in

2:22

working with large code bases. I'd

2:25

recommend you check out this talk. He's

2:27

also talking at AI Engineer Code this

2:29

year and I recommend you check out that

2:31

one too because I'm sure it's going to

2:32

be great.

2:36

All right, so let's talk about design.

2:40

At the end of the day, a sub agent is

2:42

really simple. It's just another

2:44

instance of the main agent. So our

2:46

design can also be really simple.

2:49

In this case, we're going to have our

2:50

parent Codex session.

2:52

We're going to have it run a script.

2:54

It's just going to be a wrapper script

2:56

that's going to kind of take care of

2:58

like figuring out what agent to run.

3:00

It's going to build the prompt, etc.

3:02

It's going to kick off codex exec. So

3:05

that child Codex is going to run as the

3:07

sub agent. It's going to respond to the

3:09

prompt. It's going to do its work and

3:10

it's going to write its answer into a

3:12

file and then our wrapper script is

3:15

going to read that file and it's going

3:17

to print that result to standard out and

3:19

give it back to the parent Codex

3:20

session.

3:22

Pretty straightforward.
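
The flow just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code from the talk's repository: the function names, the `answer.txt` file name, and the prompt wording are hypothetical, and it assumes `codex exec <prompt>` runs a single non-interactive session. The runner is injectable so the flow can be exercised without a real Codex install.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

# A runner executes an argv list. It is a parameter so the flow can be
# tested with a stand-in instead of a real child Codex process.
Runner = Callable[[Sequence[str]], None]


def default_runner(argv: Sequence[str]) -> None:
    """Run the child process and raise if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def build_prompt(agent_prompt: str, query: str, answer_file: Path) -> str:
    """Combine the agent's instructions, the task, and where to write the answer."""
    return (
        f"{agent_prompt}\n\n"
        f"Task:\n{query}\n\n"
        f"Write your final answer to the file {answer_file} and nothing else there."
    )


def run_subagent(agent_prompt: str, query: str, workdir: Path,
                 runner: Runner = default_runner) -> str:
    """Launch the child agent, then return whatever it wrote to the answer file."""
    # The answer file lives inside the workspace so that a workspace-scoped
    # sandbox is still allowed to write it.
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        answer_file = Path(tmp) / "answer.txt"
        prompt = build_prompt(agent_prompt, query, answer_file)
        # Assumption: `codex exec <prompt>` runs one non-interactive session.
        runner(["codex", "exec", prompt])
        return answer_file.read_text(encoding="utf-8")
```

A wrapper script would then `print(run_subagent(...))`, since standard out is the only channel the parent session reads.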

3:26

Well, this is simple, so it should be

3:28

easy, right? Well, that's what I thought

3:30

too. And I started to get all these

3:32

errors from Codex when I tried it.

3:36

Codex's sandbox really seems to not want

3:39

to let you do this. Now, you can of

3:41

course run it with dangerously skip

3:42

permissions or whatever. I don't do

3:45

that,

3:47

but to get it to work with the normal

3:49

set of permissions actually was really,

3:52

really hard and I banged my head against

3:53

the wall a long time trying to get this

3:55

to work.

3:59

So, figuring out the minimum required

4:01

permissions is probably the hardest part

4:04

about this: getting the combination just

4:06

right. On the parent, you need at least

4:08

sandbox of workspace-write to be able

4:10

to run the codex command. You can

4:12

always run that dangerously whatever

4:15

whatever command if you want. Again, I

4:17

don't really do that. The child process

4:20

is a little bit trickier. The sandbox

4:22

prevents its access to the OpenAI

4:25

credentials in your home directory since

4:27

it's outside of the workspace.

4:30

You need at least sandbox workspace-write

4:32

again so that it can write the

4:35

file that the wrapper script is going

4:38

to read and you need to disable this

4:41

thing called the rollout recorder

4:44

which is like a logging thing, just

4:46

because the parent sandbox again it

4:48

prevents file system access to any

4:51

subcommands

4:52

that are outside of the workspace
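
As a rough illustration of the child-side settings listed above, the sketch below assembles the command line for the child process. It is hedged throughout: `--sandbox workspace-write` and `-c key=value` config overrides are assumptions about the Codex CLI that should be checked against `codex exec --help`, and `build_child_command` is a hypothetical helper. The talk does not say which setting disables the rollout recorder, so the sketch leaves a slot (`extra_overrides`) for it rather than guessing a name.

```python
from typing import List, Sequence


def build_child_command(prompt: str, reasoning_effort: str = "medium",
                        extra_overrides: Sequence[str] = ()) -> List[str]:
    """Assemble argv for the child agent with workspace-write sandboxing."""
    argv = [
        "codex", "exec",
        # Assumption: the child needs write access inside the workspace so it
        # can write the answer file that the wrapper script reads.
        "--sandbox", "workspace-write",
        # Assumption: reasoning effort is passed as a config override.
        "-c", f"model_reasoning_effort={reasoning_effort}",
    ]
    # Slot for further overrides, for example whatever setting turns off the
    # rollout recorder in your Codex version (not named in the talk).
    for override in extra_overrides:
        argv += ["-c", override]
    argv.append(prompt)
    return argv
```

Keeping the command in one builder makes it easier to audit exactly which permissions the child receives.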

5:00

All right, before we go any further, I

5:02

have to give a quick note about

5:03

security.

5:05

Meta recently wrote a great paper called

5:07

the Agents Rule of Two that I think

5:09

explains this really, really well. And

5:12

what it says is there's three things you

5:13

need to care about with your agent when

5:15

it comes to security: whether it's

5:17

processing untrustworthy input, whether

5:20

it has access to sensitive systems or

5:22

private data, and whether it can change

5:24

state or communicate externally.

5:27

In our case, we're not processing

5:29

untrustworthy inputs.

5:31

We do have access to sensitive systems

5:33

or private data because we're probably

5:35

working with a proprietary codebase.

5:39

And it can change state and it also can

5:41

communicate externally. Now the

5:44

state that it can change is really kind

5:45

of dependent on your system. In my case,

5:48

it's really not very high risk and the

5:51

communication it does externally is just

5:53

to OpenAI's API endpoint. So again,

5:57

not a major risk, I would say. So that

5:59

puts us in the lower risk category. But

6:03

importantly,

6:04

lower risk does not mean no risk. So

6:07

your mileage may vary here. You need to

6:09

make your own determination on if this

6:11

is something you feel comfortable with.
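
The three-property check described above can be written down as a small helper. This is only a sketch of the counting logic as the talk summarizes it; the class and function names are hypothetical, and the "lower risk" and "higher risk" labels are a simplification, not wording from the original write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    processes_untrusted_input: bool
    accesses_sensitive_data: bool
    changes_state_or_communicates: bool


def properties_held(profile: AgentProfile) -> int:
    """Count how many of the three properties apply to this agent."""
    return sum([
        profile.processes_untrusted_input,
        profile.accesses_sensitive_data,
        profile.changes_state_or_communicates,
    ])


def risk_category(profile: AgentProfile) -> str:
    """Lower risk when at most two of the three properties hold."""
    return "higher risk" if properties_held(profile) == 3 else "lower risk"
```

For the setup in this talk, the profile would be `AgentProfile(False, True, True)`, which the helper places in the lower risk category. As noted above, lower risk does not mean no risk.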

6:15

With that, let's move forward.

6:18

All right. So, to get Codex to be able

6:20

to use sub agents with this wrapper

6:23

script and everything, we have to tell

6:25

it how to run them. So in our AGENTS.md

6:28

we're going to have just a little bit of

6:30

information here that tells Codex, hey,

6:32

when I say use the whatever sub agent go

6:38

and actually like run this script and

6:41

you know with these commands or whatever

6:43

and that's how you do it.

6:47

Also we have to tell it when to run sub

6:50

agents. So that would be, you know, when

6:52

the user asks or just when you think it's

6:54

helpful. Then we want to tell it what

6:56

sub agents are available and what they

6:58

do.
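
To make the three pieces of guidance concrete (how to run, when to run, what is available), here is a sketch that renders such a section as text. The wording, the heading, and the `render_subagent_section` helper are all hypothetical; the actual AGENTS.md in the talk's repository may look quite different.

```python
from typing import Dict


def render_subagent_section(agents: Dict[str, str], command: str) -> str:
    """Build a text block covering how to run, when to run, and what exists."""
    lines = [
        "## Sub agents",
        "",
        "How to run: write the agent name and the query to their files, then run:",
        f"    {command}",
        "",
        "When to run: when the user asks for a sub agent, or when you think it would help.",
        "",
        "Available sub agents:",
    ]
    # One bullet per agent: its name and a one-line description of what it does.
    lines += [f"- {name}: {description}" for name, description in agents.items()]
    return "\n".join(lines) + "\n"
```

Generating the section from one list of agents keeps the instructions in step with the agent definitions, though writing it by hand works just as well for a couple of agents.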

7:04

All right, with that, let's do a quick

7:06

demo.

7:10

I've put together a really quick and

7:13

small proof of concept repository. It's

7:16

open source. You can go and take a look

7:18

at it yourself. I'll have the URL at the

7:20

end of the talk. Let's just take a look

7:22

at what's in here.

7:24

So, first of all, let's take a look at

7:26

our agents.

7:28

I just created a couple of toy agents

7:30

here. Let's go take a look at how

7:32

they're defined.

7:35

You can see here each agent has a name.

7:38

It also has a reasoning effort. So,

7:40

depending on what kind of work it's

7:42

doing, you can give it a light, medium,

7:44

you can give it a high reasoning effort,

7:47

whatever you think is appropriate. Then

7:49

you just give it, you know, the prompt

7:51

for the agent. So very similar to kind

7:53

of how Claude Code sub agents work. In

7:55

this case, it's just counting words. You

7:58

know, this other one is a file writer

7:59

agent. Just going to take some text and

8:01

put it in a file. Don't need much

8:03

reasoning for that.
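
An agent definition with a name, a reasoning effort, and a prompt could be loaded along these lines. The file layout here (a `key: value` header, a `---` line, then the prompt) is an assumption made for illustration, as are the names and the set of allowed effort values; the talk's repository may use a different format.

```python
from dataclasses import dataclass

# Assumption: these are the effort levels the definitions may use.
ALLOWED_EFFORTS = {"low", "medium", "high"}


@dataclass(frozen=True)
class AgentDefinition:
    name: str
    reasoning_effort: str
    prompt: str


def parse_agent_definition(text: str) -> AgentDefinition:
    """Parse a hypothetical 'header, ---, prompt' agent definition."""
    header, separator, body = text.partition("\n---\n")
    if not separator:
        raise ValueError("expected a '---' line between header and prompt")
    fields = {}
    for line in header.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    effort = fields.get("reasoning_effort", "medium")
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return AgentDefinition(fields["name"], effort, body.strip())
```

A simple agent such as a file writer would carry a low effort, while one that analyzes a large codebase might justify a high one.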

8:07

All right. So now let's look at our

8:09

wrapper script.

8:15

It's really small, only 72 lines.

8:18

It basically just takes in the inputs.

8:24

It's going to call this agent executor

8:26

Python class, which I'll show in just a

8:28

minute. Also very small, and it's going

8:29

to return the agent's output to

8:34

standard out so that the main agent can

8:37

see it. Let's look at that agent

8:39

executor class.

8:42

Not going to go through this whole

8:43

thing. Again, it's pretty small.

8:46

It basically just kicks off the child sub

8:48

agent with the proper permissions and

8:50

with the right reasoning effort

8:53

and it disables the rollout recorder, all

8:55

that kind of stuff. It just does all that

8:56

for you. So pretty handy. One thing that

9:00

I think I didn't cover when we looked at

9:02

AGENTS.md

9:05

that's kind of important here is this

9:08

part. So when we're telling Codex how

9:13

to invoke the sub agent, we're going to

9:15

have it write the agent name to a file.

9:17

We're going to have it write the user's

9:19

query to a file and then we're going to

9:20

have it run this command.

9:22

You know, another alternative to this

9:24

would be to actually pass the agent name

9:27

and the query as command arguments. The

9:30

reason why we don't want to do that is

9:32

because of Codex's permissioning system.

9:36

As long as the command looks exactly the

9:38

same, you only have to grant permission

9:42

once. But if you have different

9:44

arguments to the command, you have to

9:46

approve it every time. So it gets really

9:49

annoying if you have to approve every

9:52

time that Codex wants to call a sub

9:55

agent.

9:57

So in this case, we make the command

9:58

look exactly the same. Codex is just

10:00

going to run it.

10:03

Now, if you run again with dangerously

10:05

skip permissions or whatever, you don't

10:06

have to worry about this.
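
The file-based hand-off described above, where the command text never changes because the inputs travel through files, might look like the following. The file locations and function names are hypothetical; only the idea (agent name in one file, query in another, a constant command) comes from the talk.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical fixed locations, relative to the workspace root.
AGENT_NAME_FILE = Path(".subagent/agent_name.txt")
QUERY_FILE = Path(".subagent/query.txt")


def write_request(workspace: Path, agent_name: str, query: str) -> None:
    """What the parent does before running the (always identical) command."""
    (workspace / AGENT_NAME_FILE).parent.mkdir(parents=True, exist_ok=True)
    (workspace / AGENT_NAME_FILE).write_text(agent_name, encoding="utf-8")
    (workspace / QUERY_FILE).write_text(query, encoding="utf-8")


def read_request(workspace: Path) -> Tuple[str, str]:
    """What the wrapper does on startup: no command-line arguments needed."""
    agent_name = (workspace / AGENT_NAME_FILE).read_text(encoding="utf-8").strip()
    query = (workspace / QUERY_FILE).read_text(encoding="utf-8")
    return agent_name, query
```

Because the wrapper takes no arguments, the approved command string is the same for every sub agent call, which is what lets a single approval cover all of them.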

10:09

But all right, let's go in. Oh, then

10:10

we've also got this wrapper script

10:12

around Codex. So, let's take a look at

10:13

that real quick. Super simple. What

10:16

it does is it takes the Codex home

10:19

files from your home directory. It's

10:20

going to sync them into a subdirectory

10:23

so it has access to them and it's going

10:25

to set CODEX_HOME to that directory.

10:27

And it's just going to launch Codex. In

10:29

this case, I'm launching in full auto

10:31

mode, which is just like shorthand for

10:33

workspace-write plus, I think, approval

10:36

on-request or something like that. I

10:38

can't remember which one. But pretty

10:40

straightforward. Not much going on here.

10:42

Really not much code.
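
The launcher described above can be approximated as follows. This is a sketch under assumptions: the `.codex-home` subdirectory name and the helper names are hypothetical, and `--full-auto` is assumed to be the flag behind the "full auto mode" mentioned in the talk. The source directory is passed in explicitly, so nothing here depends on where your Codex home actually lives.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Mapping


def sync_codex_home(source: Path, workspace: Path, subdir: str = ".codex-home") -> Path:
    """Copy the Codex home directory into the workspace and return the copy's path."""
    target = workspace / subdir
    # dirs_exist_ok lets repeated launches refresh an existing copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


def child_environment(codex_home: Path, base: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the environment with CODEX_HOME pointing at the synced copy."""
    env = dict(base)
    env["CODEX_HOME"] = str(codex_home)
    return env


def launch_command() -> List[str]:
    """Assumption: `--full-auto` selects the full auto mode mentioned in the talk."""
    return ["codex", "--full-auto"]
```

A launcher would then call something like `subprocess.run(launch_command(), env=child_environment(target, os.environ))`. Since the synced copy can include credentials, it is probably wise to keep that subdirectory out of version control.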

10:45

All right, let's go ahead and launch

10:46

this.

10:50

Okay,

10:51

now let's just give it just a quick

10:54

query.

10:56

I'm going to tell it to use its word

10:57

counter sub agent. Have it go off and do

11:00

that.

11:04

You're going to see it

11:06

figure out that it needs to run this

11:08

agent exec. It's going to go ahead and

11:10

put the name of the agent in a file.

11:12

It's going to put the query in a file. Then

11:15

it's going to ask me for permissions to

11:16

run it. And it's really important here

11:17

that I say yes and don't ask again for

11:19

this command. That way it's not going to

11:22

ask me every time it has to run a sub

11:25

agent.

11:26

You'll notice that it's running

11:28

everything in serial here. Codex does

11:30

not have the ability to run things

11:32

asynchronously like Claude does. So, this

11:35

is slower. And Codex in general, if

11:38

you've used it, I think you'll find it's

11:40

slower overall than Claude Code. But

11:45

I think that's really kind of

11:46

intentional. It seems like Codex is really

11:48

kind of meant to be more of like a

11:50

hands-off unattended type of a tool

11:53

whereas Claude Code is meant to be more

11:55

kind of iterative and so you know I

11:58

think that's actually okay. I found this

11:59

okay for me the way that I've used

12:01

Codex. All right, so we can see we got

12:04

that result back printed to standard out

12:06

here and then

12:09

Codex just gave us back the answer. So,

12:10

let's just do one more with this file

12:12

writer sub agent.

12:18

Again, it's going to do the same thing.

12:20

It's going to write that agent name into

12:22

a file. It's going to write

12:25

the query into a file. Then, it's going

12:28

to call that same command. It will not

12:30

ask for permissions this time.

12:34

Oh, and we're using the timeout 600 here

12:36

because some of these agents can

12:38

actually take a long time to run. If

12:40

you're having it do a big task that's

12:42

going to have it look across a whole

12:44

codebase and you have a large codebase,

12:46

it can take up to 10 minutes. I've

12:48

actually seen them take longer, up to 20

12:49

minutes sometimes. So, you might even

12:52

want a longer timeout here. This is what

12:54

I set for this example. In this case,

12:57

this is a pretty easy one. So, it only

12:59

took about 40 seconds. All right. So, it

13:01

wrote the file. Just go ahead and verify

13:03

that.
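
For the timeout 600 mentioned a moment ago, a Python wrapper could enforce the same limit itself. This sketch assumes the 600 is a limit in seconds around the sub agent command; `run_with_timeout` is a hypothetical helper, and returning `None` on expiry is just one possible policy.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(argv: Sequence[str], timeout_seconds: float = 600) -> Optional[str]:
    """Return the command's stdout, or None if it ran past the time limit."""
    try:
        completed = subprocess.run(
            list(argv),
            capture_output=True,   # collect stdout and stderr instead of inheriting them
            text=True,             # decode output as text
            timeout=timeout_seconds,
            check=True,            # raise CalledProcessError on a non-zero exit
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None
    return completed.stdout
```

Given that runs of up to 20 minutes were observed, a caller working on a large codebase might pass `timeout_seconds=1200` or more.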

13:06

All right.

13:08

All right, that's all I have. You can

13:11

find the code at that URL.

13:14

You can find BetterUp at betterup.com.

13:17

If you have any questions for me, you

13:18

can use my email address or you can DM

13:21

me on X. I don't post anything on X, so

13:24

really no reason to follow me, but go

13:26

ahead if you want. And I hope this was

13:29

helpful for you. And again, if BetterUp

13:32

sounds like an interesting place to you,

13:34

please hit me up. Have a great day.

Interactive Summary

Brian John explains how to add sub agents to the Codex CLI to improve context management and stay flexible across AI tools. He demonstrates a wrapper script design in which the parent agent invokes child agents by writing the agent name and query to files, so the command stays identical and needs approval only once, and he covers the sandbox permissions the setup requires. The presentation includes a live proof-of-concept demonstration and addresses security considerations regarding agent execution.