Build Hour: Agents SDK

Watch on YouTube

Now Playing

Transcript

1352 segments

0:00

Hey everyone, welcome to OpenAI Build

0:02

Hours. I'm Christine. I'm on the startup

0:05

marketing team and I'm here with Steve.

0:07

>> Hi everybody.

0:08

>> So you might might remember Steve.

0:10

Welcome back to build hours.

0:12

>> Thank you. It's great to be back

0:13

>> from our last session on responses API.

0:16

Um but today we're going to be talking

0:18

about the agents SDK.

0:20

So if this is your first build hour, uh

0:24

the purpose of build hours is to empower

0:25

you with the best practices, tools, AI

0:28

expertise to help you build with our

0:29

APIs and models. Down below on your

0:32

screen is our homepage. So you can find

0:34

all of our past sessions including

0:36

Steve's one on responses API on demand

0:39

and this will also be on demand after

0:41

our live session. So to give you a

0:44

little snapshot on what we're going to

0:46

be talking about today. So, first we'll

0:49

be going through a lot of the new

0:50

updates on agents SDK. There's now a

0:54

codec style harness um that's really

0:56

improved it and Steve will walk you

0:58

through all of these new releases. Um we

1:01

also have some new releases in the API

1:03

that we'll walk through and then we'll

1:04

move into a live demo. So, this will be

1:06

the majority of the time that we'll be

1:08

spending today. Um and we'll be building

1:09

an agentic uh task tracker and then as

1:12

always we have a lot of time set aside

1:14

for Q&A. So on the right side of your

1:17

screen, you'll see a Q&A tab. So feel

1:19

free to submit your questions. Um we

1:22

have our team who is dialing in

1:24

virtually to answer them during the

1:25

session as well as saving some to answer

1:28

live towards the end. And we'll have a

1:30

very special guest be joining us

1:31

virtually uh to help answer some of

1:33

these. So with that, I will turn it back

1:36

over to Steve to walk through some of

1:37

these slides.

1:38

>> Awesome. Thanks, Christine. Uh so great

1:40

to be back on Build Hour. My name is

1:42

Steve. I'm an engineer on the API team

1:44

here at OpenAI. First thing I want to

1:46

talk about is some of the updates we've

1:48

made to the agents SDK in the last

1:49

couple months. Um kind of uh kind of the

1:53

backstory here is that models are

1:54

getting way better at doing work over

1:57

long periods of time. Uh a lot of folks

1:59

have seen the sort of meter chart that

2:00

shows the ability of models to kind of

2:02

work uh on their own for longer and

2:04

longer periods of time that's just kind

2:05

of going continually up and to the

2:07

right. And there are a lot of great

2:08

examples of this uh even within our own

2:10

products. So many of you know and love

2:12

Codeex, our agent coding tool that you

2:14

can use to kind of build software and do

2:17

tasks from end to end. You've probably

2:19

seen it run for maybe minutes or up to

2:21

an hour. If you've used the goal

2:22

feature, maybe you've seen it run for

2:23

much longer than this internally. Folks

2:25

have gotten Codex to run for days, up to

2:27

a week on tasks. So these models are

2:30

really really able to kind of do things

2:32

over really long trajectories, which is

2:33

really cool. Um we also uh have this

2:36

sort of security agent that's built on

2:38

top of Codeex. um you know the models

2:40

are getting really really good at

2:41

finding security vulnerabilities

2:43

especially in legacy software and so we

2:45

have an agent that's powered by codecs

2:47

that actually scans repos and our our

2:50

own code and code that we depend on for

2:52

vulnerabilities that might affect the

2:54

security of our systems and then we also

2:56

have an internal data agent that's kind

2:59

of powered again by codeex that's

3:01

connected to our data links able to

3:02

answer any questions so I can say

3:04

something like how many people are using

3:06

the agents SDK how many requests were

3:08

there in the responses API two days ago

3:10

and where previously I might have spent

3:12

an hour writing that SQL query, now it's

3:15

just a simple prompt and a few minutes

3:16

later I have an answer which is which is

3:18

really awesome. Uh the reality is though

3:21

that building production grade agents in

3:22

your own systems is is still pretty

3:24

hard. Uh it's pretty hard to kind of

3:27

strike this balance between maximizing

3:29

performance and getting and having sort

3:32

of a really uh flexible maybe cross

3:34

model provider platform. um because it

3:38

more and more the models are kind of

3:39

being tailored to the individual

3:41

harnesses and so it's kind of hard to

3:42

stay in distribution while also

3:43

maximizing that flexibility and that

3:46

customization. Um there's also a lot of

3:49

challenges that come with sort of modern

3:51

computer using agents. So we think of

3:53

codeex as a coding agent uh that can do

3:55

things within a sandbox. Most most of

3:57

the time when you're running codeex that

3:59

sandbox is just your local laptop. So

4:01

those two things are together. We'll

4:02

talk about how we switched that up with

4:03

the agents SDK. But if you're deploying

4:05

agents in production, oftentimes you

4:07

want to be able to run these agents in

4:10

sort of an isolated environment and then

4:12

you have to deal with problems like the

4:14

container dies, it expires, it goes

4:16

away, what do you do? Does that thing

4:18

loadbearing? Did you have state on that

4:20

then rehydrate? Uh this stuff is is

4:22

really really hard. Uh and then kind of

4:24

the third thing is like at the end of

4:26

the day, you want to be able to have

4:27

this sort of like really customizable

4:29

framework you can use to bring your own

4:30

data in and uh maybe add your own skills

4:33

and and all of this stuff like this. And

4:35

so this combination of things is kind of

4:36

makes deploying agents in production a

4:38

little bit challenging. Uh the cool

4:40

thing about the agent SDK, we've added

4:42

we've made some updates to it, but sort

4:44

of the baseline is this open- source

4:45

customizable framework that maybe you've

4:47

used since we released the agent SDK

4:49

last March. It's been around for a

4:51

while, continues to be open source and

4:53

and super flexible. Uh we've added in

4:56

this concept of sandbox using agents and

4:59

you get to decide where these sandboxes

5:01

are run. So if you are you know maybe a

5:03

modal user or an EDB user or a

5:05

cloudflare user and you're kind of

5:06

already using those sandbox products or

5:08

maybe you're new to sandboxes and new to

5:10

sandbox using agents and you want to but

5:12

you're already using one of these

5:13

platforms we have provided first class

5:16

support for using the sandbox products

5:19

in a bunch of these uh different

5:21

platforms like ETB modal cloudflare in

5:23

addition versell uh blackel daya uh and

5:28

just things like docker and uh if you

5:30

just want to use your local laptop for

5:32

testing. Those are kind of all supported

5:33

first class. And then on top of this is

5:35

this Codex style harness. So bringing in

5:37

all of the great stuff that makes CEX

5:39

Codex, all the great tools that are in

5:40

distribution with our models, but work

5:42

great with other models as well. And

5:44

then some of the cool things like skill

5:46

use, uh, computer use, memory, and we'll

5:49

be demoing some of this stuff here in a

5:50

little bit. So uh, yeah. So basically

5:54

what the agents SDK is is it provides

5:56

this sort of model native harness that

5:58

the that the agent can use in order to

6:00

kind of do productive work over long

6:01

periods of time. So if we think about

6:03

maybe what building an agent might have

6:04

looked like you know a year a year and a

6:06

half ago where you're just kind of using

6:07

an API directly an LLM API directly you

6:10

would have done something like you know

6:12

you set up a loop you make a request to

6:14

an LLM API you get some stuff back maybe

6:17

you get a tool you execute that tool you

6:19

update the context and then you kind of

6:21

repeat until the model has no is no

6:23

longer calling any tools you might be

6:25

layering in stuff in addition to that

6:26

like web search file search MCP kind of

6:29

like building spending most of your time

6:30

building this orchestration layer around

6:32

this loop where actually you should be

6:34

spending most of the time building

6:35

product building stuff into your product

6:37

that that makes it it much better for

6:39

your end users and so agents SDK is kind

6:41

of designed to help you do all that

6:43

stuff and not really think about the

6:45

orchestration. So the agent loop is

6:47

built in again very CEX-like in its uh

6:49

design. Uh also web search, file search,

6:52

MCPS, code interpreter, skills, these

6:54

are things are all you get out of the

6:56

box or really easy to turn on. And then

6:58

we've kind of added in some of this like

6:59

sandbox using functionality that I'll

7:02

talk about uh pretty shortly. The thing

7:04

that I'm most excited about that we've

7:06

kind of done here is we've split the

7:08

harness from the compute. And uh what

7:12

does that mean basically? So if you can

7:14

kind of imagine a world where the model

7:17

is working sort of directly on a uh

7:20

computer that it itself is running on.

7:22

So if you can kind of imagine like

7:23

codecs running on your laptop is the

7:25

sort of example where the harness and

7:26

the compute are tied together. Uh

7:28

imagine putting that into production

7:30

right so you have sort of a container

7:32

where the agent is both running. So like

7:34

the the loop that is calling your LLM

7:37

API in addition to working on the same

7:38

file system those things are together.

7:40

This kind of creates a few problems when

7:42

you run this in production at scale. A

7:44

your sandboxes become all of the sudden

7:47

loadbearing and if one dies or goes away

7:49

then all of a sudden all that state is

7:51

gone and you don't have some sort of

7:52

external place where maybe you can

7:54

refresh from. In addition, you have to

7:57

sort of do interesting uh sort of

8:00

gymnastics to manage your secrets. You

8:02

ideally don't want to have any secrets

8:03

on your sandbox otherwise you might be

8:05

vulnerable to prompt injection attacks

8:08

or exfiltration. And so if you kind of

8:10

split those things up, then you can

8:11

treat the sandbox as this totally

8:13

ephemeral thing that you don't really

8:15

have to worry if it lives or dies. It

8:17

can expire, go away. And then the uh

8:19

harness, which is running maybe in a

8:21

temporal job or in AWS or you know kind

8:23

of wherever your infrastructure is

8:24

already running can sort of handle the

8:26

rehydration and snapshotting and all

8:29

this stuff. And the agents SDK is going

8:30

to make that really easy. And I'll kind

8:32

of show some examples of this. Um, so

8:36

agent SDK kind of like already fully

8:37

packed with a ton of features that you

8:40

might have already been using things

8:41

like web search, file search, uh, you

8:44

know, agent memory, text in addition to

8:46

some modalities that you've probably

8:48

been using, uh, text and voice. Um,

8:50

there are some new things that are we're

8:52

pretty excited about that we've added.

8:53

So this is kind of first class support

8:54

for skills. Uh, containers is kind of a

8:57

new thing sort of in the API, also in

8:59

the SDK. And then we've also added agent

9:01

memory so your tasks can improve over

9:03

time um and get better as they as they

9:06

go. So some of the kind of core features

9:09

here again uh code a really you know

9:12

codeex inspired harness a lot of those

9:14

same tools sandbox use so your models

9:16

can work with files in sort of a

9:18

virtualized environment you know do work

9:20

over long periods of time write bash

9:22

write python write whatever to kind of

9:24

accomplish tasks and then as always open

9:27

source super customizable

9:29

socy harness what kind of concretely

9:32

does that mean um the kind of things

9:34

we've brought from codeex into the

9:36

agency SDK are these things like uh

9:39

autoco compaction uh especially you know

9:42

this computer use through through uh the

9:44

shell. So the model is able to write

9:46

shell commands. Um we've you know the

9:49

codec style loop is this sort of async

9:51

shell interaction loop where it can

9:52

write a command maybe wait a little bit

9:54

if it doesn't finish it can go do

9:56

something else and then kind of come

9:57

back to those things and it kind of

9:58

keeps track of what commands it has

10:00

running at any given time. And so we've

10:02

kind of brought that into the agent SDK

10:03

to allow you know coding agents to work

10:06

kind of in any domain in any

10:08

infrastructure. And then sandboxes are

10:10

super crucial because you know modern

10:12

work means working over a lot of files.

10:15

If you think about kind of the work that

10:16

you and I do every day maybe you're

10:17

working in a big codebase with a bunch

10:19

of files editing files and and making

10:21

your changes. Maybe you're working with

10:22

a bunch of PDFs. Maybe you are you know

10:25

creating word documents or powerpoints.

10:27

a agents are no different and they need

10:28

access to those files and a kind of a a

10:30

a place to create those files and store

10:32

them. And so that's kind of where

10:33

sandboxes come into the picture. It's an

10:35

isolated environment where the model has

10:37

access to some files and can do stuff

10:39

and and produce uh meaningful output.

10:42

And uh kind of again agent SDK open

10:45

source and customizable. It's been open

10:47

source since day one. Super, you know,

10:50

uh model agnostic. If you want to use

10:52

other model providers, you're more than

10:53

welcome to. Anything that kind of uses

10:55

the responses API format will plug in

10:56

really well. Uh it you can make this

10:58

multi-tenant super easily. So if you

11:00

want to have a system that's running the

11:01

agents SDK where you're kind of

11:03

processing many requests from users at

11:05

once again super easy. Made session

11:07

management really simple. So agents SDK

11:09

is going to manage the snapshotting uh

11:13

rehydration of tasks. You can kind of

11:14

start from where you left off. All that

11:15

stuff is really simple. Kind of show a

11:17

demo of that. And then really

11:19

configurable. You can bring your own

11:20

tools, bring your own MCPs and and kind

11:22

of all that stuff. So I want to switch

11:24

gears a little bit and talk about some

11:26

other cool things that we've shipped in

11:27

the API in the last couple months. Uh

11:30

first one is this hosted shell tool in

11:32

the responses API. So idea here is I

11:35

want to make a just a quick API call. I

11:38

want to provide some files there that I

11:40

want the model to do something with. Um

11:42

and behind the scenes we're going to

11:43

spin up a container, load that up with

11:45

the files that you've asked us to. The

11:46

model can write code, do some stuff in

11:48

that container and then return the

11:50

results back to you. So think about this

11:52

as like a lightweight version of the

11:53

agents SDK. It's kind of this ephemeral

11:55

container. You can put a few files in

11:57

there. The model can work in that

11:58

container and then it can finish and you

12:00

can you know create these out of band.

12:02

We have a containers endpoint where you

12:04

can go and create a container, upload

12:05

files to it, spin it up, attach it to a

12:08

responses API request or you can kind of

12:10

use it in auto mode where you just add

12:12

some files and then we'll spin up the

12:14

container for you and then spin it back

12:15

down when you're done. Really easy way

12:17

to kind of get started with this, you

12:18

know, sandbox using paradigm.

12:20

We've also added the ability for you to

12:24

kind of control network access in the

12:26

containers you spin up. So if you want

12:27

to do things like domain allow lists,

12:29

lock it down entirely so you have no

12:31

egress or ingress into that container.

12:33

Uh these are kind of like all things

12:35

we've added so you have a little bit

12:36

more security control over your

12:37

containers. And we've also added the

12:40

skills API. So if anybody has used

12:42

skills in the past, there's kind of like

12:44

a pretty common problem that comes into

12:46

play which is that you kind of need a

12:47

central source of truth for these

12:49

skills. whether that's maybe a GitHub or

12:51

a bucket or now we have a skills API

12:54

where you can kind of upload your skills

12:55

too. Uh and a skill is kind of this

12:58

bundle for for those who haven't used

12:59

it. It's kind of a bundle of files that

13:01

has sort of one skill MD file that has

13:03

all the top level stuff in it, but they

13:04

can also contain scripts or other

13:06

resources that the model can use to kind

13:08

of do a pretty specific task. So, the

13:11

example I always use is like maybe you

13:12

have a tax prep skill that kind of

13:14

defines like, okay, here's all the

13:15

things that I need to know as an agent

13:17

in order to do somebody's taxes for like

13:19

tax year 2025. Has all the like IRS

13:22

rules and maybe some scripts to like

13:23

process documents and like fill in a

13:25

1040, something like that. Um, in the

13:27

with the skills API now, you could

13:28

upload that, you can iterate on it over

13:30

time and create versions. You can set up

13:31

default version and then reference those

13:33

versions with the hosted shell and all

13:35

that stuff works together super

13:36

seamlessly.

13:39

Uh yeah, so kind of a couple curl

13:42

snippets here of how you might use this

13:43

thing. Pretty easy. If you already have

13:45

zip files of your skills, you can just

13:46

kind of upload them directly to the API.

13:50

And then the last thing is we've also

13:52

shipped uh the sandbox agent stuff that

13:55

we will be talking about today in

13:57

Typescript. This is a big ask from a lot

13:58

of folks after we ship the Python

13:59

version in April. Uh sand the TypeScript

14:02

version is now available. So if you are

14:03

a TypeScript user, uh knock yourself

14:06

out. So, uh, want to switch gears a

14:08

little bit and kind of go over to the

14:09

demo. Uh, we're going to be kind of like

14:12

building a task tracker with the agents

14:13

SDK and kind of making it aic and then

14:15

offloading all the work so I don't have

14:17

to do anything anymore.

14:19

So, we're going to

14:24

Oops.

14:31

Flip over here. And uh I kind of put

14:33

this kind of like vibe coded this uh

14:37

kind of like sort of like a linear

14:39

board. Doesn't really do anything right

14:40

now, but I can kind of create tasks and

14:43

uh assign them to folks. So the sort of

14:45

idea here is that we are going to be

14:48

doing some conference planning. So you

14:50

can kind of imagine this is a tool that

14:51

I might use as a conference planner to

14:53

put together some talks. So I might have

14:56

tasks for you know creating program

14:59

assets or refining somebody's talk or

15:03

you know uploading stuff to the website

15:04

things like that. Um my goal here is not

15:06

to do any work myself. So we're going to

15:08

kind of automate a lot of this with the

15:09

agents SDK. So I'm going to create a

15:11

fake task here and I'm going to say uh

15:13

you know create program assets and we'll

15:18

say we can leave this blank and we have

15:20

a couple of things down here. where we

15:22

have we can have a assignment. Uh I need

15:25

to refresh this.

15:30

Sorry guys, I need to reset my demo.

15:34

Cool.

15:36

Okay, there we go. So I can uh I can

15:39

create a blank task here that's we can

15:41

say create program

15:44

assets

15:46

and uh I can assign this. Right now it's

15:48

Right now it's just me, but just got a

15:50

sneak peek at what might be coming in a

15:51

few minutes. Um, and I can upload files

15:54

to this. So, if I just go ahead and grab

15:57

my fake files here,

16:01

can drag these in. I can create an

16:04

issue. Uh, and in a pre-agentic world, I

16:07

would assign this to somebody. Maybe I'd

16:09

assign it to myself and I would I would

16:10

do this work, but that's not really that

16:12

much fun. So, we're going to go ahead

16:13

and create an agent to kind of help us

16:14

automate this. So, if I hop back here,

16:21

uh, we are just going to comment in this

16:25

agent definition. So, I want to take a

16:27

sec to kind of walk through what we're

16:28

looking at here. So, if you've been an

16:30

agent SDK user in the past, this will

16:32

look pretty familiar to you. Uh, the

16:34

sandbox agent is a new type of agent.

16:36

Uh, it actually is just a subclass of

16:38

the agent class that you might have been

16:40

using before. So, a lot of the same

16:42

things will still apply. A lot of the

16:44

same parameters are still going to be

16:45

there. We are having we have a couple

16:48

new things that we'll kind of talk about

16:49

in a sec, but suffice to say that we are

16:52

adding some instructions that are kind

16:53

of defined up here. Uh, and then we'll

16:55

be adding some other stuff later. So, if

16:57

I go back to my demo, refresh,

17:01

should be able to assign this to the

17:03

program editor agent. And what I'll say

17:05

is like, please take a look at these

17:08

files and uh, edit them for clarity.

17:14

I'll hit send

17:16

and then we should be able to see that

17:18

kickoff. So the first thing that's going

17:19

to happen is uh because I've configured

17:22

this first agent to use docker as its

17:25

sandbox can actually see that docker

17:27

container running and uh the agents SDK

17:30

is going to handle uploading these files

17:32

to that container. So if I pop this

17:34

might be a little hard to see but uh we

17:37

can see that a lot of these files have

17:39

now made it into the container. So the

17:41

agent is going to be sandboxed to just

17:43

this folder and we'll have access to all

17:44

of these files that it can then use to

17:47

look at. We can as we kind of look over

17:49

here, we can see that it's executing

17:50

commands. It's kind of like looking

17:52

through the files and figuring out

17:54

what's what. And then it's going to kind

17:55

of produce some output. And so it just

17:57

finished and it gave us it just kind of

18:00

gave us a little log of of what it did

18:02

and kind of told us that it's missing

18:04

some data here. That's kind of okay for

18:05

now. So,

18:08

uh, this is great, but we kind of want

18:09

to be able to customize this agent and,

18:12

uh, kind of give it some like really

18:13

specific context that is pretty

18:16

particular to what it needs to do. And a

18:18

really good way to do that is through

18:19

skills. So, prior to this demo, I kind

18:22

of put together this skill that's for a

18:25

conference, the conference program

18:26

editor. It's got some stuff in here.

18:28

Again, this is kind of entirely vibe

18:29

coded, so the actual content doesn't

18:31

matter. But what I want to point out is

18:32

that GitHub is actually a really great

18:34

place to store skills and kind of

18:35

addition to the skills API. The reason

18:37

GitHub is so great is because you can

18:40

have a lot of this a lot of these uh

18:42

kind of concerns of producing and

18:44

editing skills are kind of taken care of

18:46

for you. So you know if you want to have

18:49

version control over your skills, if you

18:50

want to have pull requests so that you

18:52

can review changes to skills, this is

18:54

all really great and a pretty common way

18:56

that people are starting to create

18:57

repositories of their skills is kind of

18:59

in Git. And we have first class support

19:01

for that in the agent SDK. So if I come

19:03

back here and I comment this stuff back

19:06

19:08

and hop down, we can see that we have

19:10

this uh skills capability. And I want to

19:13

pause for a sec to kind of talk about

19:14

what a capability is in the new SDK.

19:17

Capability is kind of uh this object

19:20

that is able to combine a bunch of

19:22

different concepts all into one. So if

19:24

you want to bring in uh tools, if you

19:27

want to bring in additional

19:27

instructions, maybe add some stuff to

19:29

the manifest so that a capability can

19:31

put things into the computer when it

19:33

starts up. These are kind of all things

19:35

capability can do. The sort of default

19:37

set is file system. So this is the

19:39

ability for the model to view images and

19:42

run the apply patch tool. So kind of

19:43

like edit files in line. And then we

19:45

also have the shell tool which is

19:47

actually a combination of of two tools.

19:48

This is as I mentioned earlier exact

19:50

command and write standard in. This is

19:52

this async bash command flow. And then

19:55

we have compaction which is allows the

19:57

model to compact its context and keep

19:59

going even when it exceeds the context

20:01

window. And this allows the model to

20:02

keep working for a super long period of

20:05

time, hours, weeks, technically uh

20:07

there's no limit to to how long it can

20:09

work. So these this is kind of the

20:11

default set. We're also adding the

20:12

skills capability. And then what we're

20:14

saying here is that we want to load our

20:15

skills from this git repo. And so I'm

20:17

pointing it at this git repo that I just

20:20

created. Uh we're saying we want to pull

20:22

from main. If you're not using GitHub,

20:24

you can actually change the host here,

20:25

but uh it's we're kind of like using

20:27

GitHub by default. So I'll go ahead and

20:29

save this. We'll flip back to our demo.

20:32

I'm going to ask the question, tell me

20:35

the most

20:37

interesting

20:39

thing from your skill file.

20:43

Will it send? This is going to kind of

20:44

spin back up, but I want to take a sec

20:45

now to actually talk about how we do

20:47

resume uh like pausing and resuming

20:50

behavior between sessions at the agents

20:52

SDK. When you start a sandbox agent, the

20:54

agents SDK will handle spinning up that

20:56

sandbox for you based on how you define

20:58

it. So, if I hop back to the code here,

21:02

we can go and say for by default, we're

21:04

using Docker. And so, I'm kind of just

21:06

providing these two classes here to kind

21:08

of define where I want to run the agent.

21:09

In this case, I'm saying I want to use

21:10

the Docker sandbox client. the and I can

21:13

provide some options. Different backends

21:15

might have different options. Docker, I

21:17

can provide an image. In this case, it's

21:19

just uh the Python 312 image. And uh you

21:23

can kind of easily swap this out for any

21:25

of the backends that that I mentioned

21:26

before. And we'll kind of show an

21:27

example of that in a little bit. Um but

21:29

what's cool is that when that when this

21:31

task spins down, when it stops, the

21:33

agent will automatically stop that

21:35

container and then we'll take a snapshot

21:36

of the file system and put it somewhere

21:38

you define. Right now that place is just

21:41

my laptop. So if I go over here

21:45

uh we can see some of the snapshots. So

21:48

this is kind of the a couple tasks that

21:50

I recently created. And if I untar one

21:52

of these and look, it's basically just

21:55

the exact file system that the model was

21:57

using. So when I resume that task, it is

22:01

grabbing that tarball from the place

22:02

that I defined, spinning up a new

22:04

container if the last one doesn't exist

22:06

anymore and then rehydrating that file

22:08

system onto that new container so that

22:11

it to the model looks exactly the same

22:13

as it did when it left off and the model

22:15

is actually unaware of the fact that

22:16

it's running on a new container is

22:18

totally oblivious to that. And this

22:20

makes resuming really easily easy and

22:21

seamless. And if you don't want to use

22:23

your local file system or tarballs, we

22:25

have sort of provider specific ways that

22:28

you can do this. So if you're a modal

22:29

customer and you use volumes or their

22:31

snapshot concept, this stuff plugs right

22:33

in super easily. But kind of by default,

22:36

we're using this sort of like naive

22:37

tarball approach to kind of help make

22:39

this easy across any provider. So uh we

22:42

can kind of go back and if I refresh

22:46

should be done. And it was able to kind

22:48

of like look at the skill file. If I

22:50

look at the output here, we can see that

22:53

it probably was kind of graping over

22:55

this skill file. Uh, it did quite a bit

22:59

quite a bit of stuff. So, um, but it was

23:01

able to kind of give me some output

23:02

here, kind of, uh, telling me what was

23:05

the most interesting thing here. So, now

23:08

we have an agent. It can use a skill. We

23:10

want to kind of hoist this thing up into

23:12

the cloud. We don't want to be running

23:13

this in Docker forever. Docker's great

23:14

for testing, especially on your laptop,

23:16

because just kind of works out of the

23:17

box. It uh is a really great place to

23:21

kind of do local development, but it's a

23:23

bit hard to deploy in production.

23:24

There's a bunch of companies that have

23:26

built really great first class agent

23:28

sandbox products that you can use that

23:30

we've added first class support for. In

23:32

this demo, I'm going to be using modal,

23:34

but you could also be using Cloudflare,

23:35

ETB, Verscell, Daytona, Blackel, uh or

23:39

you know any of the other you can kind

23:41

of bring your own implementation if if

23:42

you choose to. So I've already set up a

23:45

uh modal kind of app here for us. So I'm

23:49

going to go back to my

23:52

uh code here and we'll go to step three

23:57

and we will enable modal and here we're

24:00

kind of returning this sandbox provider.

24:02

Um but the kind of the key thing here is

24:04

just this client and options which are

24:05

the things that the the agents SDK

24:07

expects and here we're just kind of

24:09

providing this modal sandbox client. If

24:11

I go look at the definition of this

24:13

thing, uh this just inherits from base

24:15

sandbox client just like the docker

24:16

version and we are saying here that uh

24:20

we are going to use the same image. So

24:22

that same Python 312 image and then uh

24:26

in addition we're going to be using R2

24:28

for snapshots. So that both our

24:29

snapshots are now off my laptop and into

24:31

the cloud. Additionally, the modal

24:34

sandbox is going to be moving into the

24:35

cloud as well. So we'll kind of show an

24:37

example of what this looks like. So save

24:38

all this. Go back to my

24:42

here and we'll say uh edit these files

24:46

for clarity.

24:49

Go back,

24:59

load this,

25:02

and we'll assign it to our program

25:05

editor and we'll say get started,

25:08

please.

25:10

And so now we're starting it. We can see

25:12

that we're starting a modal sandbox

25:13

instead. And if I hop over to modal, I

25:16

can see that now we have a sandbox uh

25:18

that's starting. Zoom in a little bit if

25:20

that's hard to see. And this will take

25:23

uh usually they start up in just a few

25:25

seconds or around a second. So now it's

25:27

running. What's happening behind the

25:28

scenes is the agent SDK is handling sort

25:31

of the applying all of the files that we

25:34

have defined onto that sandbox. And so

25:36

want to take a sec now to actually go

25:37

and look at this manifest object that we

25:40

talked about. And a manifest is is

25:42

pretty simple in concept. You're

25:43

basically saying this is what I want the

25:44

file system to look like when the agent

25:46

spins up. Here's how I kind of attach

25:48

files to this task in a way that I can

25:50

share across tasks or share across

25:51

agents. So manifest is just kind of a

25:54

simple class. Um we're saying the

25:56

entries are this tree here basically

25:58

kind of defining a directory structure

26:00

and this this is a pretty simple

26:01

example. So we're just saying we want to

26:03

kind of establish this base directory

26:04

structure and then we'll upload files to

26:06

the sandbox directly. But you could also

26:08

here plug in things like files if you

26:10

want to copy things from, you know,

26:11

wherever the harness is running. Or if

26:14

you want to plug in an R2 bucket, S3

26:16

bucket, an Azure blob store account, all

26:18

these things are fair game, a GitHub

26:20

repo, kind of a whole flexible set of

26:22

primitives here that allow you to put

26:24

the files on the container that the

26:26

model needs in order to kind of get

26:27

started. And what happens is that we

26:29

render a version of the file tree to the

26:31

model when it starts up so that it can

26:33

kind of see an example of its workspace.

26:36

Uh so that it doesn't really have to do

26:37

too much grepping or lsing to understand

26:39

what's on the file system. And so you

26:40

can kind of add descriptions to these

26:41

things too, but we won't get too far

26:43

into that. Um so now if I go back to the

26:49

demo, we can see that this finished. Uh

26:52

and what I want to show here is that now

26:55

if I go to my Cloudflare account and

26:57

refresh, we can see that we now have

27:00

snapshots that are stored in the cloud.

27:01

So if I want to resume this task and I

27:06

dep have this app deployed in

27:07

production, it's really easy for the the

27:11

agent to just kind of like pull its

27:12

context for the agent to pull its

27:14

context and the snapshot from R2 instead

27:16

of you know anywhere else. Uh it just

27:18

provides like a fast pretty clear

27:20

interface for this and so just want to

27:22

give a quick demo of that. So now uh we

27:25

kind of want to we have this agent it's

27:27

deployed in the cloud. It's pulling its

27:29

snapshots from another cloud provider

27:31

and we want to go ahead and add our own

27:32

tools into it because you know bringing

27:34

our own context and our own

27:35

functionality into agents is is pretty

27:37

critical. So if I go back to my program

27:39

editor agent, I'm going to comment this

27:41

back in

27:44

and we have some predefined tools that

27:46

we built sort of before this demo.

27:49

There's there's three of them. We have

27:50

one for updating the status of a task.

27:52

We have one for updating the assenee of

27:54

a task. And then we have one for

27:56

searching all of the assignees. So, uh,

27:58

if you've used the agents SDK before,

28:00

function tools work exactly the same way

28:02

as they always have. They're essentially

28:03

just Python functions or in TypeScript,

28:05

TypeScript functions, and you decorate

28:07

them using this function tool decorator.

28:10

And what this does is it allows us to

28:13

translate this function call into sort

28:15

of the API representation of the

28:17

function tool with all the parameters

28:19

uh, clearly defined so the model knows

28:21

what to call. And then when the agents

28:23

SDK receives from the API an instruction

28:25

to call a specific tool, it routes it

28:27

automatically to this function, calls

28:29

this function and then sends the

28:30

response right back. So you don't have

28:32

to sort of like be implementing this

28:34

function calling loop yourself. Uh and

28:36

so we will uh just show a quick demo of

28:40

this. I'll go back here and I'll say um

28:45

please assign this

28:50

task to Steve and mark it as ready for

28:55

review.

28:57

Let's send. While this is spinning up,

28:59

I'll kind of cover some of the other

29:00

cool things you can do with the function

29:02

tool. Um kind of a bunch of things you

29:04

can do here. So if you want to kind of

29:06

override the name, like what the model

29:07

sees is different from what the name of

29:09

the function is, you can do that. You

29:10

can provide a description to kind of

29:11

help steer the model to how to use this

29:13

tool. You can provide guard rails around

29:17

uh whether you know if the model's call

29:19

trying to call the tool with specific

29:20

arguments. You can add hooks around that

29:22

to kind of you know control the

29:24

execution flow. You can add timeouts and

29:26

things like this which is which is

29:27

pretty cool. Um and you can kind of like

29:30

set whether it's enabled kind of

29:32

dynamically maybe based on a feature

29:33

flag or something else. Uh so you don't

29:35

have to like comment and comment out

29:36

these these lists of tools which is

29:37

which is pretty cool. So go back to our

29:40

demo here

29:43

and we should see it uh assign the task

29:48

to me which is cool. So we can kind of

29:49

see like using these tools and this is a

29:51

great way to sort of bring your own

29:53

context or your own functionality from

29:55

your app into your agent whether maybe

29:57

you have want to give it access to your

29:59

database or want to have it be able to

30:02

you know uh use Slack through an MCP or

30:04

something like this. These are kind of

30:05

like great ways to bring in your own

30:07

functionality into the SDK and into your

30:09

agent. So, uh, now we are going to talk

30:14

a little bit about like tool call

30:15

approvals. So, ideally, um, there are

30:19

certain tools that maybe kind of require

30:21

a human of the loop step in order for

30:22

you to have confidence that it's kind of

30:24

doing what you ask and not going off the

30:26

rails. If you have been a Codex user in

30:28

the past, you're pretty familiar with

30:29

this. It tends to ask you a lot, do you

30:30

want me to run this bash command or am I

30:32

allowed to access the internet? This

30:34

sort of thing. You can do this in the

30:35

agent SDK pretty easily by just

30:37

providing a function. Uh you can I can

30:40

either say you know true directly or I

30:43

can provide a function that returns true

30:45

or false dynamically based on the

30:47

parameters. And so in this example we've

30:49

said if you're trying to set the status

30:51

to done then it needs my approval first.

30:54

I'm not going to let you mark something

30:55

as done without taking a quick look at

30:57

it first. And so if I refresh this and

31:00

go back and we'll go back to this other

31:01

task we have and I'll say please mark

31:04

this task as done.

31:07

We'll send that and this is going to

31:08

spin up a sandbox rehydrate the state.

31:11

Uh it will kind of like examine its

31:13

current state and then it should call

31:14

the tool and then we should kind of see

31:16

a tool call approval um pop over and

31:19

over here. Uh while this is working

31:22

another thing we can kind of show is the

31:23

ability for the agents to kind of hand

31:25

off to each other. So let's say I want

31:28

to, you know, I kind of have this

31:30

built-in mechanism here where we have

31:32

the ability to search and update the

31:33

assigne. And this is great for agents to

31:35

kind of work together. So I have two

31:37

agents now in my system. I have one for

31:39

producing assets. Um, and then I have

31:41

one for just like the program editor

31:43

agent that we looked at earlier. And so

31:45

I can go ahead and create an issue new

31:46

issue here and say like uh please

31:50

31:52

refine and uh

31:56

this content

31:59

and produce some assets. We'll hit uh

32:03

drop some files in our same set and then

32:06

we'll assign this to the program editor

32:09

agent and we'll say like please uh

32:12

refine this content

32:15

and then assign to the what's the other

32:20

one called asset producer agent

32:26

and we'll create that. Then we'll kick

32:28

this task off. the program editor agent

32:30

will be the first one to pick it up. It

32:31

will kind of do its thing and then it

32:33

will uh reassign this to the asset

32:36

producer agent and then that agent will

32:38

kind of kick off and then you know be

32:40

able to create some assets and then put

32:42

them back in the sandbox. And so kind of

32:44

coming back to my uh previous task here

32:48

it looks like this marked as done. If I

32:50

go to activity here I can see that we

32:52

now have this request uh for approval.

32:56

And if I go ahead and approve this, then

32:57

this will kind of kick the flow back off

33:00

and then it will eventually kind of get

33:01

marked as as done here.

33:04

So, uh let's see. I think the last thing

33:08

I want to show here is sort of the

33:09

ability to

33:11

to mount uh external buckets to your

33:14

manifests in a way that uh allows you to

33:18

kind of bring in external data. So, what

33:20

are the reasons you might want to do

33:21

this? Let's say you have either a ton of

33:23

files, let's say you have like hundreds

33:24

of PDFs that you need the model to be

33:26

able to work through and it's kind of

33:28

annoying to just, you know, have to copy

33:29

all of those to a sandbox every time you

33:32

want to spin it up and have it do some

33:34

work. Or let's say you have data with a

33:36

a strong freshness constraint, so the

33:38

data is changing all the time and you

33:40

want the agents to be able to work over

33:41

data that is like super live. In either

33:43

one of these cases, mounting data from a

33:45

centralized source of truth instead of

33:47

copying it in where it might already be

33:48

out of date by the time the agent starts

33:50

working is pretty critical. And we've

33:52

built that natively into the agents SDK.

33:55

So if I go to

33:58

34:03

if I go here, what I'm going to do is

34:05

kind of comment this in. And we've built

34:06

this uh R2 attachment store. And so this

34:08

is kind of doing a couple things. This

34:10

is a taking our uploaded files and then

34:13

instead of just putting them on my

34:15

laptop and then copying them to the

34:17

container, we're actually now uploading

34:19

them straight to R2. So that's going to

34:21

be our source of truth just for our

34:22

application layer. And then back in the

34:26

manifest, oops,

34:32

uh we're saying if we are using this R2

34:33

attachment store, then we're actually

34:34

going to use a different manifest here.

34:36

And we have a similar layout. So we have

34:39

task and then these four folders. But

34:40

instead of the input being an empty

34:43

directory that we then copy files into,

34:45

we're instead saying I want to use this

34:46

S3 mount type. And we're technically

34:49

using an R2 bucket, but we're using the

34:50

S3 mount. The reason is is because R2

34:52

and S3 are API compatible. So we have

34:54

one object for them both. But as we look

34:56

through the parameters here, we can see

34:58

we're using this R2 bucket name passing

35:00

passing we're passing the access key and

35:03

secret key. And then we're using this uh

35:05

modal cloud bucket strategy. A lot of

35:08

these sandbox providers, modal included,

35:09

have native ways to mount external

35:12

volumes to their sandboxes. And so we're

35:14

using the sort of built-in one for

35:15

modal. But if you're using a Docker

35:17

container or sort of any generic

35:19

container, we provide a couple out of

35:20

the box. You can either use our clone or

35:23

fuse. And these both work equally well.

35:24

And it will depend a little bit on your

35:26

systems which one will work better for

35:27

you, but they're both provided out of

35:29

the box. So pretty easy to just plug

35:31

this stuff in. And so now uh I will go

35:35

and create another issue here. We'll say

35:39

edit these assets again please. And

35:43

we'll grab our files.

35:46

Drop these in.

35:48

Hit create.

35:54

And

35:56

we should see so this has been I didn't

35:59

assign this but we'll sign in a sec. Uh

36:01

what I want to show here though is now

36:02

we have if we go back to our bucket we

36:05

now have tasks uh and we have a a

36:08

reference to our task here that has all

36:10

the files in it. So now if I go back and

36:14

then assign it to our program editor,

36:16

program editor and say get started,

36:19

hit send, it's going to be able to we'll

36:22

watch this thing spin up and then it

36:24

will be able to read and write to this

36:26

volume just like it would if it were on

36:28

the file system local to the container.

36:31

So you can kind of bring in these places

36:33

of context of you know file storage from

36:36

kind of anywhere across the internet and

36:37

the model will be able to access it as

36:39

if it were on the local file system.

36:40

It's completely opaque to the model that

36:42

it's actually over operating over the

36:44

network. So there are some trade-offs

36:46

here. You might incur some additional

36:47

latency as the model is kind of like

36:49

lsing maybe a large bucket that takes

36:51

you know a long time to list all the

36:52

files through. Uh, but there's sort of

36:54

this, you know, really nice trade-off

36:56

that you get where you get to, you know,

36:58

maintain your freshness constraint or

36:59

have it work over a ton of files without

37:01

having to sort of like copy all them all

37:03

onto the container where you might be

37:05

sort of like limited in space. Um, so

37:08

cool. We can wait a sec for it to see

37:10

the spin up, but that kind of brings us

37:12

to the end of the demo here.

37:15

Uh, you know, we kind of covered a bunch

37:18

of cool things. Uh we showed how we can

37:22

build agents that both inspect files,

37:24

spin up sandboxes, and kind of any

37:25

environment that you choose, maybe one

37:27

that you're already using. Uh agents

37:29

that can edit code and kind of like work

37:31

on these long horizon tasks for maybe

37:34

minutes or hours or longer uh over, you

37:37

know, whatever period of time that you

37:39

want and whatever context you want. So

37:41

yeah, with that, I think we'll do some

37:42

questions.

37:43

>> Yeah.

37:44

>> Cool.

37:45

>> Okay. Um next slide. I'm really excited

37:50

to welcome Nish on stage with us. So

37:52

Nish is product for agents. Nish, do you

37:56

want to give a quick intro?

38:00

>> Hey guys. Uh yes. So product manager on

38:02

agents. Uh really excited about the

38:04

stuff we're doing with agents SDK. I've

38:07

been answering some of the questions in

38:08

the channel. Really great questions and

38:10

uh yeah, excited to get into it.

38:12

>> Awesome. Okay. Um next slide here.

38:15

Perfect. Um, so the first question, when

38:17

are harnessed based agents best versus

38:20

building with just responses API?

38:22

>> Cool. Yeah. So I think that if you the

38:25

responses API is great for a huge

38:27

variety of tasks, you know, I think the

38:29

places where it really shines is and

38:31

it's an agentic model by default and or

38:34

agentic API by default and we kind of

38:35

talked about that in the responses API

38:37

build hour from October. um it really

38:39

shines I think if you are using if maybe

38:42

you're using a language where you can't

38:43

find a really good uh sort of agentic

38:46

framework that you like. So maybe you're

38:47

building an elixir, maybe you're

38:49

building in HASLL or something a little

38:51

bit more say esoteric, you can always

38:53

just build on the API. You can kind of

38:55

recreate a lot of these concepts

38:56

yourself. It might be a little bit more

38:57

work. Um the you know sort of like

38:58

harnesses are becoming uh sort of really

39:01

key to how the models operate. So in

39:03

particular, the codeex harness with

39:05

openAI models is sort of like becoming

39:07

more and more coupled as time moves on.

39:08

And so you know in the future, it's very

39:10

very possible that you will be able to

39:12

get the best performance from the

39:13

harnesses that the models are trained

39:14

with. And uh so that's where kind of the

39:17

harnesses are are really going to shine.

39:20

Um and also if you you are using uh an

39:23

environment where you can kind of drop

39:24

these things in, it just is making if

39:26

you're building agents, it's going to

39:27

make a lot of this stuff super easy. Um,

39:28

however, if you're doing, you know, kind

39:30

of oneshot tasks like maybe a

39:32

translation use case where you just need

39:33

to like just oneshot translate a

39:35

document or, you know, taking

39:37

unstructured data and structuring it

39:38

into JSON or some other format,

39:40

sponsor's API just out of the box is

39:42

going to be really good at that stuff.

39:43

And you can use all the hosted tools to

39:44

kind of have these like really light

39:46

agentic loops that are uh pretty easy to

39:48

to orchestrate in just one API call.

39:52

>> Cool.

39:54

>> Okay, next question. Does the agents SDK

39:57

feature any built-in out ofthe-box

39:59

persistent state management to handle

40:00

this pause and resume behavior natively?

40:03

>> Yeah, totally. So, I showcase a little

40:05

bit that snapshot mechanism. That's

40:06

totally out of the box. That works if

40:08

you just start using it. Um, the only

40:10

thing you have to define is kind of

40:11

where you want to store the snapshots,

40:12

whether it's in R2 as I mentioned or

40:14

just kind of local to the to disk which

40:16

is the default. Um, and so this is kind

40:18

of that, you know, really out of the box

40:21

thing where it will take one sandbox's

40:23

file system and then zip it up and then

40:26

put it somewhere else. Uh, so that

40:28

another sandbox can resume from the

40:29

point where it left off. Um, but also

40:31

the run state is sort of managed. So you

40:33

can think about two pieces of state

40:35

here. It's kind of like the file system

40:36

and then all of the messages in that

40:38

conversation so far that the agent will

40:40

need to continue from agents SDK manages

40:42

handling kind of like both of those

40:44

pieces in in one shot. And so you can

40:46

take that thing that has both a pointer

40:48

to where the snapshots stored and then

40:50

also the full roll out and then you can

40:52

store in your database. It's just a JSON

40:53

object and then load it back up later

40:55

and then you can kind of resume

40:56

losslessly maybe if you have sort of a

40:58

multi-node system or something like this

41:00

that just has the database to share

41:02

state.

41:04

>> Nish anything to add there or we can

41:06

move on to next question. Yeah. No, I

41:08

think that covers it. Like essentially

41:10

we store the roll out and we also

41:11

snapshot the file system which is uh a

41:14

little bit unique compared to other

41:15

SDKs. Um so you should be able to uh

41:19

resume very like long running uh dayong

41:22

agents pretty easily.

41:24

>> Cool. Okay. Next question. How would you

41:27

compare the code executing harness in

41:29

the agents SDK via the sandbox agent

41:32

versus the actual codeex harness? Any

41:34

capability? Uh Nish, you want to take

41:36

this one or?

41:37

>> Yeah. Yeah. Um so there are some sort of

41:40

small uh differences between the sort of

41:43

like actual codeex binary and the agents

41:45

SDK. Agents SDK does come with a lot of

41:48

those core capabilities. Uh some of

41:50

those are like the bash tool. Uh the way

41:52

codeex edits files, skills, memories. Um

41:57

there are some things in the codeex

41:59

binary that uh we are still going to

42:01

like build into agents SDK eventually.

42:04

uh the codex binary is like this sort of

42:05

like multi-threaded thing that you can

42:07

run uh multiple different sub aents in

42:10

agents SDK we have this concept of

42:12

handoffs so you can do something like

42:13

very similar in agents SDK you should be

42:15

able to do like a lot of the things that

42:17

you can do in the codex binary and

42:18

agents SDK um but yeah ultimately like

42:21

agents SDK is like very much uh in

42:24

distribution uh through the Codex

42:27

harness today.

42:29

>> Awesome. Thanks Nish. Um, next question.

42:32

Can agents persist and restore

42:33

longrunning workflows automatically?

42:35

>> Yeah, definitely. This is kind of what I

42:37

showed in in the demo. It's kind of like

42:39

very easy with very little. We didn't

42:40

look at a lot of my orchestration code,

42:42

but there's not really that much to it.

42:44

Just kind of kicks off the SDK, stores

42:46

the the roll out, that JSON file in a

42:49

file just on the laptop, and then is

42:52

able to kind of resume uh pretty easily.

42:54

So yeah, I think mostly most of the sort

42:57

of engineering that you'll do is

42:58

figuring out what tools to bring in,

43:00

what what skills should I add to the

43:02

agent and uh the sort of orchestration

43:04

of starting and stopping and resuming

43:06

tasks is is sort of uh just out of the

43:08

box.

43:12

>> Okay. Can the sandbox get auto destroyed

43:15

after the task is completed or is it a

43:17

better pattern to keep the sandbox

43:19

running?

43:19

>> Yeah, this is a good this is a great

43:20

question. Uh so I I actually only demoed

43:23

one mode of the ages SDK. But there's

43:24

two ways that you can use sandboxes. You

43:27

can do the sort of what we call um you

43:29

know SDKowned sandbox where when you

43:31

spin the task up if you just provide it

43:34

the recipe for creating the sandbox. So

43:36

you provide it one of the clients like a

43:37

docker client or an ETB client it will

43:40

handle spinning up that sandbox for you

43:42

and then when the when the turn is done

43:44

it will spin that sandbox down. But you

43:47

can also eagerly load a sandbox. So you

43:49

can kind of create a sandbox out of band

43:51

and then pass the reference of the

43:52

sandbox to the agent and then that's

43:54

what we call like user own sandbox and

43:56

so you would handle managing the

43:58

lifetime of that thing. It the SDK won't

44:00

automatically spin it down at the end of

44:01

the turn. So you can kind of keep it

44:03

warm and then use it across like

44:04

multiple turns instead of spinning it up

44:06

and spinning it down. Um, but the out

44:08

of- thebox behavior is that we'll handle

44:10

spinning it up and and and down so that

44:12

you don't end up in a situation where

44:14

you've built your demo and then all of a

44:15

sudden you have 100 sandboxes running in

44:18

the cloud that you're paying for and

44:19

it's kind of providing a bad outcome. So

44:21

there's a bunch of different ways that

44:22

you can use this. Um, but yeah, by

44:24

default we the HSDK will will handle the

44:27

lifetime.

44:31

>> Hey, how should I populate and persist

44:34

my file system? Yeah.

44:36

>> Yeah. Happy to take this one. Um, so

44:38

essentially we have this concept called

44:40

the manifest uh in agents SDK. Uh, the

44:43

thing that I really like about the

44:44

manifest concept is you can kind of

44:46

choose how you want to populate and

44:48

persist uh the file system into the

44:49

sandbox. Uh, so you can essentially copy

44:52

over all of your files from cloud

44:54

storage into the sandbox or you can

44:56

mount uh, you know, all of those files

44:59

uh, onto the sandbox. And there's sort

45:01

of like different benefits of of either

45:04

if you copy over everything like during

45:06

run the runtime the the model will be

45:08

able to read the files a lot more

45:09

quickly but you know startup will be

45:11

slower because you are copying like sort

45:13

of this big blob uh into the sandbox and

45:15

there's sort of the second world where

45:17

you can optionally mount something on

45:19

the sandbox and uh startup will be like

45:22

uh very very fast but like over the

45:25

runtime it has to like read these files

45:27

over the network. Um, but the nice thing

45:29

about agents SDK is we kind of like let

45:30

you do multiple different ways to kind

45:33

kind of like mount or copy over things

45:34

in the sandbox and then persist that uh

45:37

after the sandbox spins down.

45:40

>> Awesome. Thanks, Nish.

45:42

>> Okay, last question here. Can a

45:44

supervisor agent monitor and coordinate

45:46

hundreds of specialized agents

45:48

simultaneously?

45:49

>> I think the answer is yes. It would

45:51

definitely it would take a little bit of

45:52

work to kind of spin this up. And we

45:53

have some uh cool stuff around sort of

45:55

like multi- aent frameworks that's kind

45:57

of coming in the near like weeks and

45:59

months. Um so it will get a lot easier

46:01

to do this in the future, but uh there's

46:04

nothing kind of stopping you from doing

46:05

this now. You can kind of, you know, go

46:07

crazy with agents and spin up a ton and

46:10

then have one that can kind of check on

46:11

the status between them. I think like

46:13

ways we've seen good coordination

46:15

patterns is either by through messages.

46:17

You can give the parent agent a tool

46:19

that allows it to understand what sub

46:21

aents are running and then check in on

46:23

them. So kind of like check in on their

46:25

progress or you can sort of like have

46:27

them all communicate via the same file

46:29

system or like a database or something

46:31

like this with kind of a variety of

46:32

communication mechanisms you can use to

46:34

coordinate them. But yeah, nothing

46:35

stopping you from from doing this. And

46:37

um you know, I think that this sort of

46:39

massively parallel work will become sort

46:41

of the default in the near future.

46:45

>> Love that. Okay, so let's wrap up with

46:47

some resources. So on the screen here

46:50

are everything you need to know to kind

46:51

of get started with agents SDK. Um we

46:54

had a ton of questions also in the chat

46:56

about your demo and where they can where

46:58

they can find this. So on the very

46:59

bottom is our build hour repo. Um we

47:02

have all the code from the last build

47:04

hours and we'll be we'll be posting that

47:06

um very shortly right after right after

47:08

we're off camera. Um then for any

47:12

upcoming build hours, feel free to check

47:14

back on our homepage. Um so if you hit

47:17

the next slide, we show you just the

47:19

link again. Um and after this session,

47:22

we'll email out a survey. Let us know

47:23

what topics you want to hear about next.

47:25

And we'll also include the link for this

47:28

recording so you can catch up if you

47:30

missed the beginning. Um you can catch

47:31

get up to speed um there. And that's all

47:36

for now. Thanks so much for joining us

47:37

and we'll see you next time.

Interactive Summary

Ask follow-up questions or revisit key timestamps.

In this Build Hour session, the OpenAI team introduces the latest updates to the Agents SDK, highlighting its ability to handle long-running tasks using a Codex-inspired harness. The session features a live demonstration of building an agentic task tracker, showcasing features like sandbox isolation, persistent state management through snapshots, and the use of tools, skills, and cloud-based deployments like Modal to run production-grade agents.