Building durable Agents with Workflow DevKit & AI SDK - Peter Wielander, Vercel
Thank you all for coming. Hello. Hello.
I don't know about you, but when I write agents, I like focusing on the capabilities and the features, and I like not thinking about all of the extra effort that goes into getting something that works locally into production. Something that's very useful for that is the workflow pattern, and that's why we developed the Workflow DevKit, which is what we're talking about today. Presumably, if you're here, you've had similar issues. Over the course of this session we are going to turn a coding agent into a workflow-backed coding agent.
So, this slide: we have an open-source example ready to go. This is on the vercel/examples repository, so you can clone that and check out the vibe coding platform app inside. We're going to be using this app for today's demo. And after we're done, we get first-class observability built in, plus durability and reliability. We get a lot of extra features like resumability, and the DevKit makes it very easy to add human-in-the-loop workflows and similar things.
So if you think about the general agent loop that we've all seen before, we mostly have calls back and forth between an LLM and our tool calls and backend code, right? That would include MCP servers, human approval, any kind of async task. The usual way to go about this is to wire up some queues and a database, especially if you're doing long-running agents that might run for hours, and you want to scale, and you're running on serverless, for example. You want some kind of reliability layer in between, which is usually filled by queues. Then you'll also need to add a lot of error and retry code, you'll need to store all the messages people are sending as well as the intermediate state, and you'll probably also need to add some kind of observability layer. All of those things we are going to do today using only a single library: the Workflow DevKit. It's open source, it runs with any of your TypeScript frontends or backends, and it can run on any cloud. We're going to be deploying to Vercel today, but this could just as easily run on any of your cloud stacks or any of your custom stacks.
All right, so who here has heard of the workflow pattern or has used a workflow library before? Show of hands. All right, that's less than half. I'm going to quickly explain what the workflow pattern is, to make it clear what we're doing, and then in about two minutes we're going to go into the code. The workflow pattern is essentially a sort of orchestration layer that separates your code into steps that can run in isolation, can be retried, and have their data persisted, plus an orchestration layer that we call a workflow — different platforms have different names for it. In our case here, the workflow part would be the loop that calls the LLM, then goes to the tool calls, and then back to the LLM calls. The steps would be our actual tool calls and our LLM calls.
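To make that split concrete, here's a minimal, self-contained TypeScript sketch of the workflow pattern: an orchestration loop (the "workflow") that only decides what happens next, and isolated "steps" that do the actual work. All names here are illustrative stand-ins, not the DevKit's API.

```typescript
// Minimal sketch of the workflow pattern: a deterministic orchestration
// loop (the "workflow") alternating between "steps" (LLM calls, tool calls).
type StepResult = { toolCall?: string; answer?: string };

// A "step": in a real workflow engine this would run in isolation,
// be retried on failure, and have its input/output persisted.
async function llmStep(history: string[]): Promise<StepResult> {
  // Fake model: asks for a tool once, then answers.
  return history.includes("tool:done")
    ? { answer: "final answer" }
    : { toolCall: "runCommand" };
}

async function toolStep(name: string): Promise<string> {
  return "tool:done"; // fake output for whichever tool `name` refers to
}

// The "workflow": loops LLM step -> tool step until the model answers.
async function agentWorkflow(prompt: string): Promise<string> {
  const history = [prompt];
  for (;;) {
    const result = await llmStep(history);
    if (result.answer) return result.answer;
    history.push(await toolStep(result.toolCall!));
  }
}
```

In this decomposition, only the steps touch the outside world; the loop itself is pure control flow, which is what lets an engine persist and replay it.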
Right, so looking at the agenda for today: we're going to be jumping into the code and adding the Workflow DevKit, which is going to be quite fast, and then we have a lot of time to talk about the cool additional features it adds, like resumable streams out of the box, how to suspend and resume at any point, and how to add webhooks for human-in-the-loop processes. At the end there's going to be ample time for Q&A, but there is a reason you're here in the workshop and not watching this online, which is that you can ask questions. Please do so at any point — feel free to raise your hand or just shout out the question, and we'll get right to it.
All right, so as I said, we're working off the vercel/examples repository, and we're going to be working off of the conf branch. Why this branch? I stripped out a bunch of the excess code in the example to make sure we can focus on the most important parts. And every checkpoint from this workshop has its own branch, so if you're not coding along directly, you can also check out the steps one by one, look at the diffs, and see what changed between them.
All right. So I have already run npm run dev locally on this platform, just to show you what it looks like. I'm going to run a simple query. This is a coding agent, right? It's like a code editor, but without the code editing: it can take a prompt, generate some files, and it'll eventually show you an iframe with the finished app, deployed. It's mostly UI with a few simple tool calls that we'll look at in a second. The file system and output run on Vercel Sandbox, but you could just as easily run this locally.
Looking at the code — I'm going to go and check out our actual branch. We have one endpoint that accepts our chat messages, right? It does some regular model ID validation to see whether the model is supported, and in the end it simply creates an agent. Oh, yes?
>> What was that branch one more time?
>> The branch was conf. And you can see we'll move on to conf/2- and so on — just look for the numbers and you'll find all the checkpoints.
Yes. So our main endpoint just accepts some messages and calls the AI SDK agent, which is essentially the same thing as a streamText call. We pass some tools, and internally it'll just loop from streamText call to streamText call, and then stream all of the generated chunks back to the client in a format that's easy for the client to understand. This is all regular AI SDK code that you could replace with a different library if you want; it's mostly there to support the UI. But again, all of the actual agent stuff is very simple and happens here.
Oh, let's also take a look at the tools we have. There are four. createSandbox and getSandboxUrl are very simple — they just wrap Sandbox.create and the URL getter. Similarly, runCommand essentially wraps sandbox.runCommand, and generateFiles will generate a file from a simple prompt. We're going to take a look at one of these tool calls as an example. We have a prompt that looks somewhat like a markdown file — sort of what to do, what not to do. And my hotkeys are not working. Back to the tool call: we also have an input schema — a Zod schema for what the AI is supposed to pass; this is all very standard — and then an execute function, which wraps sandbox.runCommand with some error handling.
So that's essentially our entire agent code setup, and in the frontend we just call useChat from the AI SDK to consume the stream and display things in the UI. So let's get started adding workflow to this. Any questions before I get started?
Cool. All right, so step one is we're going to run npm install for workflow and @workflow/ai, which will give us the latest versions. workflow is the main library, and @workflow/ai contains some helpers — some wrappers that work well with the Workflow DevKit.
Now that we have this installed — we are running a Next.js app here — we're going to extend the compiler to compile workflow code by wrapping our config with withWorkflow, which we can import from workflow/next, and that'll set up Next.js. Yes?
>> Quick verifying question — you are in the example vibe coding directory?
>> Yes.
>> So adding this will let the compiler know to also compile our workflow code separately, which we'll get into more in a second. And then for convenience, we can also add a TypeScript plugin to our tsconfig — same package — and that'll give us better autocompletion for our workflow code.
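Based on what the talk describes (the withWorkflow wrapper imported from workflow/next — treat the names as the talk presents them rather than verified documentation), the Next.js config change is roughly:

```ts
// next.config.ts — wrap the existing config so the compiler also
// builds "use workflow" / "use step" code into its own bundles
import { withWorkflow } from "workflow/next";

const nextConfig = {
  /* your existing Next.js options */
};

export default withWorkflow(nextConfig);
```

The tsconfig plugin mentioned for autocompletion would be added under compilerOptions.plugins from the same package.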
So we talked about a workflow having an orchestration layer and a number of steps. What we're going to do first is write the orchestration layer. In our case, that is essentially just the agent, right? It does the loop that calls steps back and forth. We're going to add a new file — you can call it whatever you want — and we're going to move our agent call over there. I'm going to call this our code workflow, which is going to hold all of our workflow code. Then I'm going to auto-complete a bunch of imports. Thank you, AI. So we're just passing most of the arguments that we would otherwise get from here over there, and this completes the refactor — essentially we've done nothing but pull some of the workflow code out into its own file. This is where it gets interesting: now that this is a separate function, we can use the "use workflow" directive, which will mark this function for our compiler as a workflow function.
What this does under the hood — yes, oh, sorry — is compile all of the code related to the function into a separate bundle, and it ensures that there are no imports of anything that would have side effects, because the workflow orchestration layer needs to be deterministic. That way it can be re-run in a deterministic fashion, and there are no worries about state pollution.
So now that we have this, we need to mark our LLM calls as steps. Because those calls happen inside the agent, this is a little harder to do here, so we ended up writing a DurableAgent class, which is essentially the same thing as Agent, but with a "use step" marker on the actual LLM calls it makes under the hood.
Now that we have this set up, we're going to await the actual streaming and see if there's anything else we need to do. Checking for errors — oh yes, we need a stream to write to. Previously we could just write to the stream that the API handler gave us; now we have to create a new stream to write to. We export a getWritable function from workflow, which gets a stream implicitly associated with the workflow. We're going to get into that a little more in a second, but for now we'll just pass it to our agent. And we're going to see if this is the right type. Presumably not.
And then finally, back in our API handler, we need to call our workflow in a way the framework understands, which for us is a call to start, with the arguments passed separately — essentially telling it to start a new workflow on this function. start can be imported from workflow/api. Okay, so now we essentially have the workflow fully hooked up, and a lot of this was just pulling out some code and adding a directive.
Yeah — a colleague of mine has volunteered to help anyone who's following along and has debugging questions. Just reach out.
>> I'm on the team as well, so let me know if you're following along.
>> I'll be around to help.
>> And finally, this start call returns a run instance that has the stream we just set up writing to, which we can return to the UI. So this completes our workflow definition. Now, we also said we would need to mark things as steps. The DurableAgent class already marked the LLM calls as steps, but our tools right now are not marked as steps.
Thankfully, this is very easy: in the execute function for each of these tools, you can just write "use step", and that lets the compiler know this is a separate chunk of code to execute in a separate instance. Right? If this were deployed to production, it would run in a separate serverless instance, the inputs and outputs would be cached if it already ran, and it would be retried if it failed. So I'm going to go through the other tool calls and add "use step" to those too — thankfully we only have four of them. And that should complete our transformation.
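As a sketch of the shape (the tool object here is hand-rolled for illustration; in the app the tools are defined with the AI SDK's tool() helper and a Zod schema): at plain runtime a "use step" directive is just an inert string expression, so this compiles and runs as-is — the workflow compiler is what gives it meaning.

```typescript
// Hypothetical stand-in for one of the four tools, with the execute
// function marked as a step. "use step" is a directive the workflow
// compiler picks up; without that compiler it is a no-op string.
const runCommandTool = {
  description: "Run a shell command in the sandbox",
  execute: async (input: { command: string }): Promise<string> => {
    "use step"; // compiled into an isolated, retried, cached step
    // real version: wrap sandbox.runCommand(input.command) with error handling
    return `ran: ${input.command}`;
  },
};
```

Because the step boundary is just the function marked by the directive, the tool's inputs and outputs are what get persisted and replayed.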
So now we can run npm run dev and see if this works as expected. We're going to reload our page — and it seems like nothing changed. Let's actually run a query. And we can see that it's still streaming as expected. So for us, developing locally, all we had to do was pull out a function and add some directives. But now, if I deploy this to any adapter — the Vercel adapter, an AWS adapter, or maybe your own — this will run in isolation, with durability and all of those good things.
Something that's really nice for local development, too: if I go into the same folder and run npx workflow web, which is a CLI call that starts a local web UI to inspect our runs, you can see that our run is currently still running. Every step — everything that is marked as a step — will have a span here, and you can inspect the inputs, the outputs, and any associated events. And we can see that our workflow just completed, I think, and yeah, this comes built in. Yes?
>> And just for clarification: every time you're prompting your vibe coder, that is one instance of the workflow that runs to completion? So each one is —
>> Exactly, yeah. And you could model this any way you want. You could also model an entire user session as one workflow, and have the workflow do a loop: wait for the next query, and so on. Again, we can run code for weeks if we need to, essentially, and I'm going to go into some tools for that in a second.
So now that we have this set up, you can see that on the right side we don't get any helpful feedback. But if I visit this link, I can see that our app has likely been created correctly — or it failed because of some errors — and either way, we're not getting any output on the right side. The reason this is happening is that we are streaming the agent output to the client, but our tools aren't actually writing to the stream right now. So what we can do, similarly, in our tool calls, is get the writable, which will get the same writable instance as the workflow itself. There is an infinite number of streams you can create and consume in a workflow — you can also tag them with a certain name and then fetch them by it — but this will get the default instance.
Once we have a writable, we can connect to it by getting the writer, and now we can write any kind of information to the iframe to be consumed. I think we want something like data-create-sandbox — I think that's what I hooked up in the UI — and then the sandbox ID. So this is me just writing a data packet that our UI knows how to consume. Now that I've done this, if I reload the app and start again, we'll see that at least the sandbox-create call presumably gets filled in correctly at the start.
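The data-packet idea can be sketched self-contained: a tool writes a typed packet, and the UI renders the types it recognizes. The packet shape and names below are illustrative; in the app, the writer comes from getWritable() and the types match what useChat's data parts expect.

```typescript
// Sketch: tools write typed data packets to the workflow's stream, and the
// UI renders packets it recognizes. A plain array stands in for the stream.
type DataPacket = { type: string; id: string; status?: string };
const stream: DataPacket[] = [];

const writer = {
  write(packet: DataPacket): void {
    stream.push(packet); // real version: getWritable().getWriter().write(...)
  },
};

function createSandboxTool(): void {
  // packet tagged with a type the UI knows how to render
  writer.write({ type: "data-create-sandbox", id: "sandbox-1", status: "loading" });
}

function uiRender(): string[] {
  // the UI consumes packets and maps known types to components
  return stream.map((p) =>
    p.type === "data-create-sandbox" ? `Creating sandbox ${p.id}…` : ""
  );
}
```

Writing a second packet with a "done" status from the same tool is how the UI would learn to stop showing the loading state — which is exactly the gap observed in the demo.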
>> Yeah — you said that there are streams that you can create. What do you mean by that?
>> Right, yes. So the adapter used for workflows — in local development this would just be a file; in production this might be a Redis instance — supports the workflow asking it to create a new stream, in Redis for example, and passing that stream back. So any time you call getWritable, it'll create a stream — again, in Redis, for example — with the ID of that workflow, and pass it back. Any step can attach to it, and any client can attach to it, and on localhost this would be written to a file and read from a file.
>> Sorry, what are you setting up right now?
>> Right. So previously we had an API handler that took some messages, called the agent, and then streamed messages back from that API handler. Now we have an API handler that starts a workflow, and it passes back the stream that this workflow creates. What this also allows us to do — I think that was not working correctly; I'm going to restart the server just to see if that's the case.
>> Anyone else need help getting this set up?
>> Seems good. Good point you made: something this allows us to do is that the stream is not bound to the API handler. This means that at any point we can resume this stream. If you lose connection to your API handler and then the user reconnects, this stream still exists, and we can reconnect to it to resume the session. This is also part of the durability aspect: everything you do in a workflow, you can resume at any point.
I'm going to restart this query and hope it works this time. Yeah — so now that I've hooked up this data packet, you can see the special UI handling for creating a sandbox works. But even after it's done, it's not showing up as done. This is because we're only writing the initial loading-state packet. So I could go through all of our tools, add more packets, and make the UI richer, but instead I'm going to check out a different branch — the conf-sleep branch — actually, I'll go for the workflow one first. Sorry, conf/2-workflow, which already has all of these writer calls populated. There's no difference otherwise. So now that all of our tools have these write calls, the stream again looks the same as it did when we started out in this app.
All right. So now that we have streams working again, everything is working as expected, we have more observability, and we can deploy this with durability. I talked about resumable streams before — we're going to see if we can get this stream to resume, so we have durable sessions.
The only thing we need to do to make that work is to go to our API endpoint, and where we get the run instance, also return the run ID as additional information — I can return run.runId, for example. Again, any way you do this is fine; I'm adding it as a header here because we're already returning a stream in the body. However you pass the ID to the UI, it's something the UI can then use to resume the stream.
From here, the UI should be able to decide whether it has a run ID and whether it should resume a stream. So we're going to create a new endpoint — a dynamic [id] segment, then a folder called stream, and a route handler inside. This is just Next.js configuration for adding an API route at /chat/[id]/stream, and we're going to auto-complete with AI. What we're essentially doing is getting the ID from the params, and then all we do is call getRun from the workflow API, which gets the run instance, and then we can return the same stream that we return from the other endpoint — just without calling the actual agent, only doing the stream. I think that should be good. We're also taking a startIndex, which the AI helpfully added: we can get a readable stream from a certain start point — I think that's why it was auto-completed. So if you're trying to resume a stream midway, you can pass which chunk you were on when you left off. Now that this is done, I'm going to comment out the things we don't currently need.
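Here's a self-contained model of why this resume endpoint can work at all: because chunks are persisted in a store keyed by run ID (a file locally, Redis in production), any reader can re-attach later at an arbitrary startIndex. The names are illustrative, not the DevKit's API.

```typescript
// Sketch of resumable streams: chunks are persisted per run ID, so a new
// reader can attach later and replay from any start index.
const chunkStore = new Map<string, string[]>();

function appendChunk(runId: string, chunk: string): void {
  const chunks = chunkStore.get(runId) ?? [];
  chunks.push(chunk);
  chunkStore.set(runId, chunks);
}

// What a /chat/[id]/stream handler conceptually does: re-read from offset.
function readFrom(runId: string, startIndex = 0): string[] {
  return (chunkStore.get(runId) ?? []).slice(startIndex);
}

function demo(): string {
  appendChunk("run_1", "hello ");
  appendChunk("run_1", "world");
  // client saw chunk 0, disconnected, and resumes at index 1
  return readFrom("run_1", 1).join("");
}
```

Because the store outlives any single API handler, losing the HTTP connection loses nothing — exactly the durable-session property described above.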
We need the UI to support this conditional — whether to resume or start a new chat. So I'm going to go to our chat frontend and pull in some code from a different branch for simplicity — it's on the conf/4-streams branch — which I'll just show for completeness. We already do a useChat call in the UI to consume the stream; all we added now is a transport layer, which is this big block here with some middleware for the stream. It says: when starting a call, first check whether we have an existing run ID, and if so, reconnect by calling the other API endpoint instead. I'm handwaving over this a little because it's client-side handling for transitions. If there are more questions about this, please feel free.
All right. So that gives us resumable streams. And I'm also going to demo what it looks like if we deploy this and see it in production. I'm going to kick that off, and then we can check out a production preview deployment.
In the meantime, the next thing we're going to talk about is events and resumability. The way workflows run is that every step runs on its own serverless instance in production; the actual workflow orchestration layer is only invoked very briefly to facilitate step runs. What this allows us to do is have a workflow suspend for any amount of time — a workflow could wait for a week and not consume any resources. This is built into the Workflow DevKit: inside a workflow — anything tagged with "use workflow" — we can simply call sleep for three days, for example, and that will pause the workflow for three days and then resume where we left off. If someone was trying to reconnect to a stream in the meantime — say this was a sleep of an hour — the stream would just reconnect to the same endpoint, and things would resume from there. So we don't lose anything by losing the instance that runs the code, because we can always restart it and resume from where we left off.
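A sketch of how a durable sleep can avoid holding a process: persist a wake-at timestamp, tear everything down, and on the next invocation treat a sleep whose deadline has passed as already complete. This is an illustration of the idea, not the DevKit's implementation.

```typescript
// Durable sleep sketch: instead of blocking a process for days, persist a
// wake-at time; when the workflow is re-invoked, a sleep whose deadline
// has passed completes immediately (like a cached, finished step).
type SleepRecord = { wakeAt: number };
const sleepLog: SleepRecord[] = [];
let sleepCursor = 0;

function durableSleep(ms: number, now: number): "suspend" | "done" {
  if (sleepCursor < sleepLog.length) {
    const rec = sleepLog[sleepCursor];
    if (now >= rec.wakeAt) {
      sleepCursor++; // deadline passed: replay straight past the sleep
      return "done";
    }
    return "suspend"; // still sleeping: tear down again
  }
  sleepLog.push({ wakeAt: now + ms });
  return "suspend"; // first encounter: persist the deadline, release resources
}

function demo(): ["suspend" | "done", "suspend" | "done"] {
  const first = durableSleep(1000, 0); // suspends; nothing keeps running
  sleepCursor = 0; // workflow re-invoked later by a timer...
  const second = durableSleep(1000, 5000); // ...after the deadline passed
  return [first, second];
}
```

The scheduler that re-invokes the workflow at the right time is the platform's job; the workflow code only ever sees "sleep, then the next line".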
And this is useful for AI agents because we can expose it as a tool call: we can give the AI agent a tool that sleeps for any amount of time, and then use it to make an agent that essentially acts like a cron job — "every day, read my emails and do this thing" — that would be a sleep of one day. Yes?
>> When the agent goes down, does all of its state go down with it?
>> Yes — well, when it sleeps —
>> No, when it sleeps it's just paused. But if the process gets killed for some reason, where does its state go?
>> So the way it works is that any step call is cached. When an input goes to a step call, we register that as an event, we run the step, and if the step completes, we cache the output and record that this step has run to completion. So if it was something like this, where we run the agent first — let's say we run the agent and we run a bunch of steps — the state of the workflow function at that point in time is saved, and all of the outputs from all of the step calls are saved. When we restart the workflow from that specific line of code, it rehydrates the entire state and just continues from there. And this happens so that, again, we don't have to replay any of the code in a way that does any actual resource consumption.
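That answer can be modeled concretely: completed step outputs live in an event log, so a fresh invocation re-runs the workflow function from the top, but finished steps are served from the log and their side effects never execute twice. Self-contained sketch with illustrative names:

```typescript
// Rehydration sketch: the event log stores each completed step's output;
// on restart the workflow function re-runs from the top, but finished
// steps are replayed from the log instead of executing again.
const eventLog: string[] = [];
let pos = 0;
let sandboxesCreated = 0; // counts real side effects

async function step(fn: () => Promise<string>): Promise<string> {
  if (pos < eventLog.length) return eventLog[pos++]; // rehydrate from log
  const out = await fn();
  eventLog.push(out);
  pos++;
  return out;
}

async function workflow(): Promise<string> {
  const id = await step(async () => {
    sandboxesCreated++; // expensive side effect, must run exactly once
    return "sandbox-1";
  });
  return await step(async () => `deployed ${id}`);
}

async function demo(): Promise<{ result: string; effects: number }> {
  await workflow();
  pos = 0; // the instance died; a fresh invocation replays the function
  const result = await workflow();
  return { result, effects: sandboxesCreated };
}
```

The replay is cheap precisely because it never re-executes completed work: it is just reading the log back into the function's local variables.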
Yeah, so we can use this to make an agent that's essentially a cron job again, and we can use it to make agents that run for weeks, or interact with your information over a very long time horizon.
And while I've been talking, we have deployed our current app to Vercel. So I can check out this preview deployment, for example, and you can see the app is now live online and working just as it usually does. Yes, it works perfectly. And then, again, I can use the UI to inspect this at any point: if I call workflow web with the backend flag set to Vercel and the environment set to preview, for example, that'll let it know where our deployment is to be found, and it'll spawn up the same UI. Now we can check on this run that's running in production, and you can see we're getting the same kind of information here.
Yeah. So — oh, I'm not going to cancel the run. I could cancel the run. Let's cancel it. This is to show that the way it works locally is, conceptually, the exact same way it works in production, which is the UX we are aiming for.
All right. I talked a little bit about sleep and suspend — let's go and write this sleep tool call. It's going to be very simple. I'm going to copy this tool and write it from scratch; I'll just call it sleep.ts. We're going to trim the input schema down to something like timeoutMilliseconds, and the actual run command becomes none of this — instead we just call sleep. Because sleep is already a step that we export from the workflow library, we don't need to mark this function with "use step". Let's see if this — oh, this should be a number. There you go.
>> Can you say that again? Why don't you need the "use step"?
>> Oh — so this is already a step that we export from workflow. The observability will also show it as a step, which we'll see in a second.
And this should just work, assuming the prompt is good, which we're going to modify to say something like: sleeps for this amount of time — only use this tool if the user directs you to do so. All right, and we need a double quote here. There we go. So now that this sleep call is set up, that should be all we need to do. We'll call it the sleep tool, and we're going to add it to our tools list.
And I think I confused our compiler a little bit — or at least TypeScript. This seems to work great. Okay, now we have the tool. We also want the UI to display when it's sleeping, so I'm going to add another function to log the sleep. The reason we're doing this is that we cannot write to a stream directly from a workflow, because then it wouldn't be deterministic anymore — every re-run of the workflow would write to the stream again.
>> Yeah, so I'm trying to run the project. I had to create a Vercel API key for the AI gateway, and I'm getting an error that says a header is missing from the request.
>> Do you have the OIDC option enabled in the project settings? You skipped this in the beginning, but —
>> Oh yes — because this code uses Vercel Sandbox, you would need to be logged in for that. My mistake — this should be running locally. If you don't use the sandbox — we'll have a branch that doesn't use the sandbox for after the talk.
>> For now you might —
>> At this point I'll just do it afterwards. It's fine.
So here I'm just going to add another call to the writable — let's see, we're going to need the tool call ID — and now this just writes to the stream, and that should allow us to show it in the UI correctly. Let me see if I configured the UI to correctly interpret this packet. There is no data-sleep type, which I think might — wait. Yes.
All right. So now that I have this, I can start our app again. And it loads. We can try out the second prompt here, which is "sleep for 30 seconds and then return" — just to show that it's going to correctly interpret the sleep call. It's not showing the data packet here, sadly, but we can go to the web UI and see that it's engaged in the sleep call, and this is going to return after 30 seconds.
All right, so that's sleep, and there's one final feature I want to show you, which is webhooks and the ability to resume from webhooks easily. Implementing webhooks is usually quite difficult — a headache — and in our case, I'm going to check out the conf/5-hooks branch and show you that, in the same fashion as sleep, we can add a new tool. I'll just show you where the actual tool call is: it's just a log call, and then we create a webhook, which is a function we export from workflow. We can then log the webhook URL to the client (or anywhere else) and await the webhook, and this will suspend for as long as necessary, until someone hits that URL. Let's see — the server is running, and I can hopefully show you this live. I'll reload this and prompt: wait for human approval before starting, and build a Pokédex. Let's see if it handles this correctly. I've been changing branches, so I might need to restart my server.
And the way this works under the hood is that, again, we create a URL and we suspend the workflow until a call comes in to that endpoint. And this comes with a lot of extra features — I'm going to run this query — like, I could also call respondWith if I wanted to: this is a full API request handler, so I could respond with a response object and treat it as a regular API endpoint. I could also check the body against a schema, for example, and only resume once it matches.
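The mechanism can be modeled with a plain promise: creating a hook registers a token and hands back a URL plus a promise, and the handler for that URL resolves the promise, which is what wakes the awaiting workflow. Self-contained sketch with illustrative names; the real hook function from workflow (per the talk) additionally persists across process restarts and lets you validate the body and respond.

```typescript
// Webhook-resume sketch: a hook is a token with a pending promise; the
// incoming HTTP call resolves it, resuming the awaiting workflow.
const pending = new Map<string, (body: unknown) => void>();

function createHook(token: string): { url: string; result: Promise<unknown> } {
  const result = new Promise<unknown>((resolve) => pending.set(token, resolve));
  return { url: `/hooks/${token}`, result };
}

// What the generated endpoint conceptually does on an incoming request:
function handleHookRequest(token: string, body: unknown): boolean {
  const resolve = pending.get(token);
  if (!resolve) return false; // unknown or already-used hook
  pending.delete(token);
  resolve(body);
  return true;
}

async function demo(): Promise<unknown> {
  const hook = createHook("abc");
  // the workflow would `await hook.result` here, suspended until the call:
  handleHookRequest("abc", { approved: true });
  return hook.result;
}
```

In the durable version the "promise" survives the process dying, for the same reason sleep does: the suspension point is recorded, and the incoming request is just another event that resumes the run.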
So this gives you full control, but the nice thing is it does hook up the URL internally, and you can see that it's paused, waiting for a human to click on this link. If you're running on localhost, it's a localhost link; running in production, it will be whatever your deployment URL is.
>> About both sleep and human approval: a workflow is purely steps, and steps always run to completion, right? So sleep is a step — it's not a suspension of the execution; it's a step?
>> No — we model it as a step for the observability and for how you call it, but it is an internal feature that completely suspends the workflow and all steps; nothing is running while we sleep. You can also run sleep alongside another step and combine the promises if you want. It works like a step call in the sense that it's an execution that takes a certain amount of time, and you can use promise/await syntax to model that, but again, it completely suspends unless there is something else running at the time. And the same goes for the webhook: it's modeled as a step for the observability, but it completely suspends unless you have other code running at the time.
>> So just for my understanding: if you have an agent running with a workflow, it keeps running.
>> Yeah.
>> You connect to it again, let's say through another session, and you call sleep in that session — does the previous one just stop whatever it was doing? If you have two sessions —
>> So let's say we have a coding session, it already built an app, and then it's sleeping for a week, and then we reconnect to the stream — is that the —
>> No — the thing is, let's say I kick off a workflow and it's calculating the digits of pi; it just keeps going. But I connect to the same sandbox, and then I call sleep — will it stop calculating pi?
>> um so the way you would do this in a
workflow is again let's let's see how we
would code this
>> you have a sandbox there sleep in the
sandbox
>> well you can connect to this sandbox you
connect again to the sandbox and some
thread call sleep does the whole sandbox
go
>> So the sandbox is Vercel Sandbox, which, just imagine it as an EC2 instance. This is just a helper for us to spin up an instance to run this coding agent, to run the code and to store the files. If you model this differently, you wouldn't have to use a sandbox, and the sleep call doesn't happen as a bash call, for example. Those are two different things...
>> Right.
>> ...like an orchestration thing, versus when you're actually in this box and you call sleep in the sandbox.
>> Okay, so there are two different...
>> Right. So there is sleep that you could call from a terminal in the sandbox, as a terminal command, or there's sleep from the workflow, which suspends the workflow.
Uh, yeah. So we have these features for webhooks, right, and we can see that after I clicked on the URL it resumed and then coded me a Pokédex. Um, that is all of the features we're going to cover in this session, and I think we have ample time for Q&A, about 20 minutes at least. Please go for it.
>> How would I spin up a Claude Code session with this?
>> A Claude Code session? Remotely, or are you...
>> No, kind of run it and kick it off as an agent doing certain stuff. Um, is that possible, and then kind of orchestrate that as agents?
>> That is possible. So Claude Code, if you're talking about the app, like the terminal app, right, Claude Code, that doesn't use the workflow features internally, so it's hard to isolate that or know where the orchestration is. You could write your own version of Claude Code, or take the Claude Code source code and add workflow and step directives for the calls, and that would then run as a workflow in the cloud.
>> There's no way to say, like, okay, I have my steps, you know, spin up Claude Code, type this command and wait for it? Everything would be a Vercel workflow, but how would I actually boot it, drive it, like, code it?
>> It is one command, though, right? So you know what you're asking.
>> If you... so if you're calling Claude Code in a... so maybe there's some confusion about where this is running, right? For the coding agent here, if the coding agent runs mkdir, right, for creating a folder, that mkdir command runs in a step, but it runs against a sandbox, the sandbox being a VM, and so this VM's state is not managed by the workflow itself. So if you call Claude Code on the VM, that's essentially treated like an SSH session. But if you run any agents or steps within the workflow, right, those steps are going to be resumable and observable through the workflow pattern.
Um, another question: how do I control what my agent has access to, going out to the internet, doing stuff?
>> This would be whatever you're already doing for the agent, right? In the end, you're going to be doing tool calls and stream calls to the LLM provider, right? Um, that is in your code already, presumably, and whatever you're already using to control permissions there, like your tool calls, for example. Right? If your tool call allows you to delete a resource in S3, for example, then you, as the caller, can write whatever code you want, in the usual way.
>> So it's my job to implement it; it's not that it has some wrappers built in.
>> Yeah, all in the sandboxes.
>> Workflows is a general orchestration layer for durable execution, and doesn't necessarily provide a sandbox for running code, like running third-party code, or running agent code, or making files. That's something the sandbox is good for, because every sandbox instantiation is a new VM that only lasts for as long as your session lasts.
>> Yes.
>> Yeah. So if I'm running workflows, and I'm creating a lot of agent workflows through my product, how does that get queued up on your system? How does that get run? Are there rate limiting or concurrency controls that we can use?
>> Yes. So this goes into some of the patterns that all of this is going to support, and for the most part supports right now, which is: if you're deploying, for example, to Vercel, right, as usual, if you do Next.js, every deploy is a separate live URL, right, that if you call it spawns up a serverless instance, and so your workflows are bound to the deployment. So something very nice that you get here is: if you have an agent and it runs for a week, but you deploy five times during this week, those new deploys are going to be isolated from the original workflow, and the original workflow is going to run to completion, and any new workflows will run on the new deployments. And we'll also allow upgrading between those. So say you have a workflow that runs for a year, right, because it's like, every month, give me a summary of so and so, right, but you have new code and you want the workflow to take its current state and use the new code. Um, there's going to be an upgrade button in the UI that checks for compatibility between the old workflow and the new workflow by checking all of the step signatures and all of the existing events, and then you can upgrade the workflow. Or you can currently already cancel and rerun with the new workflow.
>> Um, is there a timeout for those workflow steps?
>> Oh, yes. So if you're doing serverless, right, on whatever platform you're on, whether it be Lambda or something else, your serverless functions are going to have timeouts. The nice thing is that every step runs in its own serverless function, so the timeouts only apply to the steps. So if one individual step you have runs the risk of running more than five minutes, maybe 15 minutes depending on the platform, then you can split it into two steps. Or, if it runs into the timeout, right, it'll fail, it'll retry, maybe the retry will be faster, and you'll see in the UI that, oh, this step is being retried after 15 minutes a bunch of times, right, presumably because it's failing, and then you can go and split it into two steps, upgrade the workflow, and it'll just continue from there.
>> And, uh, yeah, the other point was around queuing workflows, like I trigger the agent a bunch of times.
>> Yeah.
>> Does it get queued? Like, how does that process work?
>> Right. You can model this in different ways. Right now, again, we're doing this from an API route where every call to this API route will create a new workflow. Um, again, the only interactable output you have is a stream in this case. So it'll do things, it'll write to the stream; nobody looks at the stream; we don't even know the workflow is running. You can kick off ten of these, right, and they're going to be running in the background. There is essentially no limit to how many you can create, because they all run in serverless functions, so you can scale for as much compute as there is in your provider, which is a lot of compute. Um, you can also list active runs; there's an API here for interfacing with your runs: look at all of the runs that are running, look at which version they're on, what step they're on, cancel them. I hope that answers your question. Oh, concurrency, yes. So right now it's infinite concurrency, but very soon we'll add per-step or per-workflow concurrency, where you can say this workflow is only supposed to run at most 10 times at the same time, and any extra start gets queued, so that it will wait for those 10 to reduce and then slide in. You can also use that to have a free tier, for example, on your product, where there are 10 instances running for your free tier at any one point, and some people that come in later will wait in the queue, but your pro tier has infinite concurrency.
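The concurrency behavior described (at most N simultaneous runs, extra starts queued) can be sketched generically. This is plain TypeScript, not the kit's announced per-workflow concurrency API; `ConcurrencyGate` is an illustrative name.

```typescript
// At most `limit` functions execute at once; extra calls wait in FIFO order.
class ConcurrencyGate {
  private running = 0;
  private queue: (() => void)[] = [];
  constructor(private limit: number) {}

  private acquire(): Promise<void> {
    if (this.running < this.limit) {
      this.running++;
      return Promise.resolve(); // a slot is free: start immediately
    }
    // No slot: park until release() wakes us and hands over the slot.
    return new Promise((resolve) =>
      this.queue.push(() => { this.running++; resolve(); })
    );
  }

  private release(): void {
    this.running--;
    const next = this.queue.shift();
    if (next) next(); // hand the freed slot to the next queued start
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    await this.acquire();
    try { return await fn(); } finally { this.release(); }
  }
}
```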
>> Yeah.
>> Can I roll back steps, in a way that, let's say I have 10 steps, but in step seven I think, like, okay, let's go back to step three? Would that be possible, to like reset the state of the workflow?
>> You can technically do this. We don't currently support it, but it would be very easy to implement, because, again, for every step the inputs and outputs are cached, and we can enter the workflow at any point and sort of replay from there. So we'd need to expose this in the UI as a function to resume from step so and so. But yes, that would be possible. More likely, that's what you want to do, because you control the workflow.
So you could hook in there, and since this is just JavaScript, you might step into the state or do something, and it can just keep going.
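The replay idea mentioned above (cached step inputs and outputs let the engine re-enter the workflow at any point) can be illustrated with a small memo table. `runStep` is an illustrative helper, not the kit's API: outputs are keyed by step name plus serialized input, so re-running the workflow replays cached results instead of executing earlier steps again.

```typescript
// Memo table standing in for the engine's durable step cache.
const stepCache = new Map<string, unknown>();
let executions = 0; // counts real executions (not replays), for illustration

async function runStep<T>(
  name: string,
  input: unknown,
  fn: () => Promise<T>
): Promise<T> {
  const key = `${name}:${JSON.stringify(input)}`;
  if (stepCache.has(key)) return stepCache.get(key) as T; // replayed
  executions++;
  const out = await fn();
  stepCache.set(key, out); // persist output for future replays
  return out;
}

async function replayableWorkflow(n: number): Promise<number> {
  const doubled = await runStep("double", n, async () => n * 2);
  const squared = await runStep("square", doubled, async () => doubled * doubled);
  return squared;
}
```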
Maybe my second question. So if you go through the steps, you said, okay, you're passing inputs and outputs across, and that's what gets cached. Is there a way to attach metadata, or does it always have to be in the inputs and outputs of the function?
>> You can also attach metadata. Uh, we'll have a tagging API soon where you can add arbitrary tags to the workflow at any point in the workflow run, and you can use those tags also to maybe decide whether to exit early or deduplicate your runs. Um, yeah.
>> Yes.
>> About the deployment: are we tied to Vercel, or is it possible to, like...
>> As I mentioned before, there are two aspects to this. There's the front-end side of the framework; the docs are on useworkflow.dev. You can see, for the front-end side, which is also sort of the API layer it might work with, we currently support all of these platforms, and more are coming soon. And then there's separately the deployment target, right? Next.js you can deploy to anything right now, right? This would work with anything you can deploy Next.js to, for example, or any of these other frameworks. And we have a first-party implementation, for example for Postgres, that uses Postgres as the durability layer. And as we build this out and the community comes in, we'll have support for essentially any back end, because underneath, the TypeScript framework connects to any storage or queue layer. So anything that provides storage, a database, or a queue can be used as the back end for this.
>> Related question: for the observability, will you also have exporters to, like, Datadog and stuff?
>> Yeah, so we have multiple things. We have an API that you can use to access the data directly, and we also have open-source UI components that you can use to display it, and then you can export this to Datadog if you want. Yeah.
>> You talked about sleep a little bit, and talked about how it's essentially like a cron job. Is there more scheduling and cron control within workflows?
>> So, because it's just TypeScript, if you're in a workflow you can do something like, let's say we call sleep, for example, right? This would just be: resume in one day. But what you can also do is, this is just a promise, or you can treat it as a promise, so you can do: while true, sleep one day, and then, you know, run your code, and it'll run once a day. If you wanted to run once a day at 2 a.m., you could say, you know, compute how much time until 2:00 a.m. tomorrow, thank you AI, and then, you know, done. And you could also wake up every hour, do some checks on whether you actually want to run the rest of the code, and if not, go back to sleep. Anything you can do with cron you can do here. And if you want concurrency controls, or any kind of other deterministic controls, you have control flow in TypeScript here. You can check external APIs, for example, which you have to wrap in a step, but you can make fetch calls if you want to actually check data and then determine from there.
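The 2 a.m. pattern just described can be sketched in plain TypeScript. `msUntilNextHour` is an illustrative helper, and in a real workflow the sleep call would durably suspend the run rather than hold a timer.

```typescript
// Milliseconds from `now` until the next time the local clock reads
// `hour`:00 (later today if still ahead, otherwise tomorrow).
function msUntilNextHour(hour: number, now: Date = new Date()): number {
  const next = new Date(now);
  next.setHours(hour, 0, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1); // already past: tomorrow
  return next.getTime() - now.getTime();
}

// Inside a workflow this would pair with a durable sleep (pseudocode):
//   while (true) {
//     await sleep(msUntilNextHour(2)); // suspend completely until ~2 a.m.
//     await runDailyJob();             // then do the day's work
//   }
```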
>> So if you wanted to do an agent that runs every once in a while, every day, you could have a scheduling wrapper, a scheduling workflow, that launches another agent workflow?
>> Also, yeah. You can start workflows from workflows, or you could do this, right, where you sleep a day and then call your agent. And then, depending on the stream... right, this all writes to the same stream; presumably you don't want that. The writable allows you to do namespaces, right? Um, you can get a new writable here, and then every time it runs you can have a new stream that has a deterministic name, and you can choose which stream to connect to.
>> Yeah.
>> Is there cancellation logic? Like, say I have something waiting for a long time, and then I decide to not have that be a thing. How can I just, like, stop an existing sleep from waking?
>> Right. So you can cancel your workflows from the observability UI, or from the API, or the CLI; in all of those avenues you can call cancel. Or you can also say, well, I don't even know if I want to sleep to the end of the duration. What you can do is, let me remove this part here, you can do something like, you know, await Promise.race, and you can race the sleep of one day against some other thing, actually human approval: maybe wake up earlier if a human clicks a button, rather than waiting the whole day. Okay. Um, there you go.
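The Promise.race pattern shown on stage can be sketched as follows. `sleep` and `humanApproval` here are stand-ins for the kit's suspending primitives; in a real workflow both sides of the race would suspend durably rather than hold timers.

```typescript
// Stand-in sleep that resolves to a "timeout" marker after `ms`.
function raceSleep(ms: number): Promise<"timeout"> {
  return new Promise((resolve) => setTimeout(() => resolve("timeout"), ms));
}

// Stand-in human approval: resolves when an external signal fires
// (e.g. a button click delivered via webhook in the real system).
function humanApproval(signal: Promise<void>): Promise<"approved"> {
  return signal.then(() => "approved");
}

// Wake on whichever happens first: the timeout or the human.
async function waitForApprovalOrTimeout(
  approvalSignal: Promise<void>,
  timeoutMs: number
): Promise<"approved" | "timeout"> {
  return Promise.race([raceSleep(timeoutMs), humanApproval(approvalSignal)]);
}
```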
>> Yes.
>> Um, if you have multiple agents running, what would be your advised way of having them communicate with each other?
>> Um, it sort of depends on what kind of communication you're looking for.
>> Firing things off while they're working, but I wasn't sure there was...
>> So in steps you have access to all code APIs, Node.js APIs, fetch, etc. You can have a database, right? If you want to coordinate over your own data source, you can have a database. Um, if you want to have multiple agents, you can use some of the same streams to write to and share a stream. Um, yeah.
>> I guess it's up to us, ultimately, with our steps, that they're, like, idempotent, and if they have side effects when they fail halfway, that it's well behaved. Like, that's not at your orchestration layer; that's up to us?
>> Right. For the workflow layer, we guarantee that there are no side effects, because if you try to import some code that does side effects, I think it'll just say, like, can't compile; you know, don't do it.
>> That was true for workflows, but for steps...
>> For steps, they can have side effects. That's sort of the point.
>> So it's up to us: if it fails, we need to make sure that it's transactional and rerunnable and idempotent.
>> There are some error controls you can add here, where if a step fails, it'll usually fail with an error that tells the workflow orchestration layer that it can retry it. You can also catch this error and say, well, if it's, you know, this kind of error, don't retry it. Instead, signal to the human to do something, or try this other avenue. Yeah.
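The catch-and-decide retry behavior described can be sketched like this. `FatalStepError` and `withRetries` are illustrative names, not the kit's real error classes: ordinary failures are retried, while errors marked fatal escalate immediately so the workflow can take another path.

```typescript
// Errors of this class should not be retried (e.g. bad input, denied permission).
class FatalStepError extends Error {}

// Retry `fn` up to `maxAttempts` times, rethrowing fatal errors immediately.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err instanceof FatalStepError) throw err; // don't retry: escalate
      lastError = err; // transient: loop and try again
    }
  }
  throw lastError; // exhausted all attempts
}
```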
See there's something else.
>> Do you have one of the branches that has, like, the complete code for what you just did?
>> Yes. So they all build on top of each other, so the final branch has the human approval tool call, the sleep tool call, resumable streams, um, and using webhooks.
>> Uh, I will see how I can post one for general access.
>> Just tweet it out.
>> Yes, we'll do that.
>> Uh, the workflows are in beta, and so...
>> Yes. Yeah. Um...
>> Okay. Yeah. Uh, yes, I forgot to mention this important point: Workflow DevKit is in beta, and we have a GitHub; people are using it in production, especially for durable agent stuff.
>> Internally, we have, I think, more than 1 million workflows that have been run...
>> A day. Yeah.
>> It's mostly just, like, getting the API to be stable and working through a bunch of issues. But, uh...
>> Yeah.
>> yeah,
>> If there's any feature that you need or that you really want to see, we have an RFC section on GitHub Discussions for upcoming features, things that we'll ship by GA or shortly afterwards. And then open issues, right, where you can add any issue, and presumably we'll be able to get to it soon, or someone from the community will. Again, all of the adapters that help Workflow DevKit run on any kind of cloud back end, or your homebrew back end, all of those adapters are also open source. So you can see exactly what's happening, and you can connect it to your own back end just by looking at that code, and we'll be happy to help you.
Can you talk about, uh...
>> Uh, yeah. Um, what would you like to know?
>> The roadmap for, like...
>> Right. So for versioning, I talked a little bit about the ability to upgrade runs across versions, right? Versioning is going to be very simple, where we have a CRUD interface for all of the versions that you have created, which for most people will be a deployment, right? If you deploy your... um, you know, CI deploys your code to a preview environment or production, every deployment will be one version, and you can list those versions at any point using the workflow API, and the run will know which version it's running on, and you can call run.upgrade to see if it's compatible with a new version, to upgrade it to that version. I don't know, any more things?
>> What about versioning of webhooks?
>> No... yeah. So every deployment gets its own URL, and not just on Vercel, but presumably in your... if you go to AWS Lambda, for example, right, every deployment has its own URLs, so the webhooks would apply to its own URLs, which means that you don't need to worry about versioning, except for tagging a version when you first create the deploy. And then whatever you decide you want to be your main version is the one you route to via your public API.
>> Yeah, I think, um, sorry, obviously I have experience in this stuff from previous work. I think a lot of people, unfortunately, it's like isolation, but sometimes you want to sort of fix in place things that have been running for a while.
>> Yeah. Migrations, almost, like agent migrations to a new version.
>> Yeah. So this is the same as upgrading in that sense, right? But say you have a bunch of runs that are all on a certain version, and you have shipped new code, and you want all those runs to be upgraded to the new version, or migrated, right? Um, in the UI you'll be able to select however many runs you want, or with the CLI you'll be able to get a list and then say, for these 20 IDs, I want to upgrade the run to this version. It'll do an internal check of, can I resume these workflows from a certain point, like, can I migrate them in place, sort of, because the step signatures overlap? Or, if not, it'll offer you the option to cancel all of the existing runs and rerun them on the new version with the same input. Um, if you write your code in a way that's compatible, again, there are going to be different options for in-place migrations.
>> How would it detect that, just by code parts not being changed?
>> So, because we're essentially a compiler plugin, we can get full compatibility information, and we are saving the input and output signatures to a manifest that we're uploading for the versions. And so, for every version, we can tell what the signatures are for every step, and for the workflow itself, and all the other things that are happening in between.
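The signature-manifest idea can be sketched as a simple compatibility check. The manifest shape and `canUpgrade` are assumptions for illustration; the real compiler plugin records richer information. The rule sketched here: an in-place upgrade is allowed only if every step the old run has already executed has an identical signature in the new version.

```typescript
interface StepSignature {
  name: string;
  input: string;  // serialized input type signature
  output: string; // serialized output type signature
}
type Manifest = Record<string, StepSignature>;

// True if every already-executed step keeps the same signature in the
// new version, so cached outputs can be replayed against the new code.
function canUpgrade(
  oldManifest: Manifest,
  newManifest: Manifest,
  executedSteps: string[]
): boolean {
  return executedSteps.every((name) => {
    const oldSig = oldManifest[name];
    const newSig = newManifest[name];
    if (!oldSig || !newSig) return false; // step missing in one version
    return newSig.input === oldSig.input && newSig.output === oldSig.output;
  });
}
```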
>> Another thing here is the workflow function itself. So, yes, it's replayed a whole bunch of times during execution anyway.
>> So when you want to upgrade, your event...
>> It's kind of the code that you run the event log against.
>> Sure. There's a lot of variation between, kind of, I added a step and all the previous steps stayed the same, or, you know, this one got changed. So it's like, if everything's done automatically, it feels like, okay, I could go down immediately with all my agents, or up...
>> There's two ways you could be versioning, and I think the thing with this is, you build for a platform where you assume that your code is always going to evolve in the same place. So what we've seen is, you end up with your first version of your workflow that you ship, but as you start making updates, you start a new version; your code now has all the old stuff in there, and you have no guarantees of what version the actual code that you're running is on. The default assumption is that my code could be running against any event log, and it starts great, and then you have to juggle that mental model...
>> But it was the same with, like, queues; we've had that forever. There's a natural step to go say, cool, we're just going to assume that when you push a new version of the workflow, you pin everything, and so you don't have to worry about that mental model of the code. Instead it's: time to upgrade, push a button, I can go in, or even, you know, theoretically choose exactly how much of that gets upgraded. There's a lot of stuff that you can do on top, since you have that. But what's nice is, that's a hard UX for us to build, and when done well, hopefully very...
>> I don't know, are we close to done?
>> I think we're close to done.
>> Uh, we'll be sticking around for more questions, so, uh...
>> I guess, okay, so let's say I'm a third party, like... But I think the other part is observability. Um, I poked around and I don't see, like, much of a dashboard. I expect that obviously you're going to build one, right? Uh, and then I also want to import it into my Datadog, or maybe not Datadog, but, hey, what's this supposed to...
>> Uh, OpenTelemetry spans, which we'll, you know, be able to emit. Um, we'll add some context to the spans by default, presumably. So if you just pipe your spans through to Datadog, it'll already have a lot of information on the steps and event log. And you can also submit your own telemetry, obviously.
>> So is that the plan, or is it that you have first-party...?
>> The plan is that we will support, first party, adding all of the, sort of, step and event-log related context. We'll presumably export a helper to add some of this information to the spans, and then any other information you want to tag in there is up to you.
>> Can I, uh, attach, like, secrets to a workflow, in a way that when I need to update them, they kind of, like, you know...
>> Yeah. So, for one, right now you can inspect all of the input and output data, right, and that's obviously for you, as someone with access to the API, which someone consuming the workflow or starting the workflow through a web API wouldn't usually have.
>> No... secrets in steps.
>> Yeah. So the workflows run in the same deployment as your code usually does, and have access to the process environment, right? So you can inject environment variables the way you would usually do. And as long as you don't log them, which, again, presumably you wouldn't do anyway, it's the same as an API endpoint. And then, if you want your data to be secret, right, um, right now we expose it in observability, if you have access, but we will also allow, in the future, end-to-end encryption for any data stored.
>> Right. Then we'll close the session, but we'll be around a little bit more for questions, if you want to come over.
The video introduces the open-source Workflow Development Kit, designed to simplify moving local AI agents to production by providing durability, reliability, and observability. It achieves this by separating agent code into deterministic workflow orchestration and isolated, retriable steps. The session demonstrates integrating the kit into a Next.js coding agent, highlighting features like resumable streams, long-running processes using sleep functionality, and human-in-the-loop workflows via webhooks.