Claude Agent SDK [Full Workshop] — Thariq Shihipar, Anthropic
2858 segments
[music]
Okay. Yeah, thanks for joining me. I uh
I'm still on the West Coast time, so it
feels like I'm doing this at like 7:00
a.m.
Uh so yeah, but um glad to talk to you
about the Claude agent SDK. So um yeah,
I think like this is going to be like a
rough agenda, but we're going to talk
about we're going to talk about like
what is the claud agent SDK? Why use it?
There's so many other agent frameworks.
What is an agent? What is an agent
framework?
um how do you design an agent uh using
the agent SDK or or just in general? Um
and then I'm going to do some like live
coding or Claude is going to do some
live coding on prototyping an agent. Um
and uh I've got some starter code. But
uh yeah, I I the whole goal of this is
like know we got two hours. We're going
to be super collaborative, ask
questions. Um, this is also going to be
not like a super canned demo in the
sense that like we're going to be like
thinking through things live. You know,
I'm not going to have all the answers
right away. Um, and I think that'll be a
good way of like building an agent loop
I think is like really very much like
kind of an art or intuition. So, um, but
yeah, before we get started, just
curious, a show of hands, like how many
people have heard of the cloud agent SDK
or Okay, great. Cool. How many have like
used it or tried it out? Okay, awesome.
Okay, so pretty good show of hands. Um,
yeah, so I'll I'll just get started on
like the like, you know, overview on
agents. I I think that like this is I I
I think something that people
[clears throat] have seen before, but I
think it still is taking some time to
like really sink in. Uh how AI features
are evolving, you know? So I think like
when GPT, you know, 3 came out, it was
really about like single LLM features,
right? You're like, oh, like, hey, can
you categorize this like return a
response in one of these categories? Um,
and then we've got more like workflow
like things, right? Hey, like can you
like take this email and label it or
like, hey, here's my codebase like index
via rag. Can you give me like the next
completion or the next um the next file
to edit, right? And so that's what we'd
call like a workflow where you're very
like structured. You're like, hey, like
given this code, give me code back out,
right? And now we're getting to agents,
right? And uh like the canonical agent
to use is cloud code, right? Cloud code
is a tool where you don't really tell
it. We don't restrict what it can do
really, right? You're just talking to it
in text and it will take a really wide
variety of actions, right? And so agents
uh build their own context, like decide
their own trajectories, are working very
very autonomously, right? And so, uh,
yeah, and I think like as the future
goes on, like agents will get more and
more autonomous. Um, and we,
uh, yeah, I think it's like we're kind
of at a break point where we can start
to build these agents. Um, they're not
perfect, you know, but it's definitely
like the right time to get started. So,
um, yeah, Cloud Code, I'm sure many of
you have have tried or used. Um it is
yeah I think the first true agent right
like the first uh time where I saw an AI
working for like 10 20 30 minutes right
so um yeah it's a coding agent and uh
the cloud agent SDK is actually built on
top of cloud code and uh the reason we
did that is because
um basically we found that when we were
building agents at anthropic we kept
rebuilding
the same parts over and over again. And
so to to give you a sense of like what
that looks like, of course, they're the
models to start, right? Um, and then in
the harness, you've got tools, right?
And that's like sort of the first
obvious step, like let's add some tools
to this harness. And later on, we'll
give an example of sort of like trying
to build your own harness from scratch,
too, and and what that looks like and
and how challenging it can be. But tools
are not just like your own custom tools.
might be tools to interact with your
file system like with cloud code. Um did
the volume just go up or were they not
holding it close enough? [laughter]
Okay. Now anyways um got tools tools you
run in a loop and then you have the
prompts right like the core agent
prompts the um the prompts for the
things like that. Uh and then finally
you have the file system right and or
not finally but you have the file
system. The file system is a way of
context engineering that we'll talk more
about later, right? And I think like I
one of the key insights we had through
cloud code was thinking a lot more
through the like context not just a
prompt, it's also the tools, the files
and scripts that it can use. Um, and
then there are skills which we've like
rolled out recently and uh we can talk
more about skills uh um if that's
interesting to you guys as well. Um and
then yeah things like uh sub aents uh
web search you know like um like
research compacting hooks memory there
all these like other things around the
harness as well um and uh it ends up
being quite a lot. So the cloud agent
SDK is all of these things packaged up
for you to use right [clears throat]
um and yeah you have your application.
So I I think like
uh to give you a sense of uh yeah to
give you a sense of like
maybe why the cloud agent SDK is um
yeah like like so yeah people are
already building agents on the SDK a lot
of software agents uh you know software
reliability security triaging bug
finding um site and dashboard builders
if
These are extremely popular. If you're
using it, you should absolutely use the
SDK. Um, I guess office agents, if
you're doing any sort of office work,
tons of examples there. Um, got some
like, you know, legal, finance,
healthcare ones. Um, so yeah, there are
tons of people building on top of it.
Um, I want to Oh, yeah. Okay. So, why
the cloud agent SDK, right? Like why did
we do it this way? It's why did we build
it on top of cloud code? And we realized
basically that as soon as we put cloud
code out, yeah, the engineers started
using it, but then the finance people
started using it and the data science
people started using it and the
marketing people started using it and
yeah, I think it just like it we just
realized that people were using cloud
code for non-coding tasks and we felt
and and as we were building, you know,
non-coding agents, we kept coming back
to it, right? And so, um, it's a like,
and we'll go more into why that just
works, why we you could use cloud code
for non-coding task. Uh, spoiler alert,
it's like the bash tool. Um, but yeah,
it's uh it it was something that we saw
as an emergent pattern that we want to
use and we've built our agents on top of
it, right? And uh these are lessons that
we've learned from deploying cloud code
that we've sort of baked in. So, uh,
tool use errors or compacting or things
like that, stuff that is like very can
take a lot of scale to find, you know,
like what are the best practices we've
sort of baked into the cloud agent SDK.
Um, as a result, we have a lot of strong
opinions on the best way to build
agents. Uh, like I think the cloud agent
SDK is quite opinionated. I'll talk over
some of these opinions and and why like
uh why we chose them, right? Um but
yeah, one of the big opinions of the
bash tool is the most powerful agent
tool. So okay, um what what are like
what I would describe as the anthropic
way to build agents, right? And I'm not
I'm not saying that you can only build
agents using the API this way, right?
But this is like um if you're using our
opinionated stack on the agent SDK, what
is it? Right? So roughly Unix primitives
like the bash and file system and you
know we're going to go over like
prototyping an agent using cloud code
and my goal is really to sort of show
you what that looks like in real time
right like why is batch useful why is
the file system useful why not just use
tools um yeah agents uh I mean you can
also make workflows and we'll talk about
that a bit later but agents build their
own context um thinking about code
generation for non-coding
um like we use codegen to generate docs,
query the web, like do data analysis,
take uh unstructured action. So um
there's a lot of like uh this can be
pretty counterintuitive to some people
and again in the like prototyping
session, we'll we'll go over how to use
code generation for non-coding agents.
Um and yeah, every agent has a container
or is hosted locally because this is
cloud code. uh it needs a file system,
it needs bash, it needs to be able to
operate on it. And so it's a very very
different architecture. I'm not planning
to talk too much about the architecture
today, but we can at the end if that's
what people are interested in in or
sorry by architecture I mean hosting
architecture like how do you host an
agent and like uh what are best
practices there? Have you talked about
that at the end? Um [clears throat] yeah
so
well let me pause there because I feel
like I covered a lot already. any
questions so far on the agent SDK agents
um yeah like what you get from it
>> can you can you explain what code
generation for non-coding means exactly
>> yeah um this is um like basically when
you ask cloud code to do a task right
like let's say that you ask it to uh
find the weather in San Francisco and
like you know tell me what I should wear
or something right? Like uh what it
might do is it might start writing a
script uh to fetch a weather API, right?
And then start like maybe it wants it to
be reusable. Like maybe you want to do
this pretty often, right? So it might
fetch the weather API and then get the
like maybe even get your location
dynamically right based on your IP
address and then it will like um you
know check the weather and then maybe
like call out to like a sub agent to
give you recommendations. Maybe there's
an API for your closet or wardrobe,
right? It's like so that's an example. I
I think that like it's kind of um for
any single example we can talk over how
you might use code codegen. Uh a lot of
it is like composing APIs is like the
high level way to think about it. Yeah.
>> Uh yeah. And [clears throat]
>> yeah uh workflow versus agent uh like
for repetitive task or you know like a
process a business process that is
always the same. Do you will still
prefer to build an agent versus a fully
deterministic workflow?
>> Yeah. So, we do have
>> Oh, sure. Yeah. Yeah. Um, so the
question the question was about
workflows versus agents and would you
still use the cloud agent SDK for
workflows? Is that right? Um, yes. And
and so uh I mean we I just we just sort
of tell you what we do internally
basically and what we do internally is
we've done a lot of like GitHub
automations and Slack automations built
on the cloud agent SDK. So, uh, you
know, we have a bot that triages issues
when it comes in. That's a pretty
workflow like thing, but we've still
found that, you know, in order to triage
issues, we want it to be able to clone
the codebase and sometimes spin up a
Docker container and test it and things
like that. And so, it's still ends up
being like a very like there's a lot of
steps in the middle that need to be
quite free flowing. Um, and then you
like give structured output at the end.
So, um, yes. All right, we'll take one
more question and then we'll keep going.
So, yeah, in the blue. Yeah. Uh so could
you talk about security and guardians
like if if you know you're using cloud
agent SDK and you know you lean towards
using bash as the you know all powerful
generic tool and is the onus on uh
building the agent builder to make sure
that you know you're preventing against
like common attack vectors or is that
something that the model is is is doing
itself?
>> Yeah. So I I think this is sort of like
the Swiss chief. Oh yeah. Okay. So the
question was uh permissions on the bash
tool, right? Or like how do you think
about permissions and guardrails the
like in like when you're giving the
agent this much power over you know your
its environment and the computer, how do
you make sure it's aligned, right? And
so the way we think about this is uh
what we call like the Swiss cheese
defense, right? So like there is um like
on every layer some defenses and
together we hope that it like blocks
everything, right? So obviously on the
model layer uh we do a lot of um
alignment there. We actually just put
out a really good paper on reward
hacking. Super recommend you check that
out. Um so like definitely I think cloud
models like we try and make them very
very aligned, right? And uh so yeah
there's the model alignment behavior
then there is like the harness itself,
right? And so we have a lot of like
permissioning and prompting um and uh
like we do a pass par parser on the bash
tool for example so we know um fairly
reliably like what the bash tool is
actually doing and definitely not
something you want to build yourself. Um
and then finally
the last layer is sandboxing right so
like let's say that an someone has
maliciously taken over your agent what
can it actually do uh we've included a
sandbox and like where you can sandbox
network request um and sandbox uh file
system operations outside of the file
system. And so, uh, yeah, ultimately
that's what they call like the lethal
triacto, right? Is like, um, like the
ability to like execute code in an
environment, change a file system, um,
excfiltrate the code, right? I think I'm
getting the lethal trifecta a little bit
wrong there, but like the idea is
basically like if they can excfiltrate
your like information back out, right?
Um, that's like they still need to be
able to extract information. And so if
you sandbox the network, that's a good
way of doing it. Um if you're hosting on
a sandbox container like Cloudflare uh
modal or you know E2B Daytona like all
of these like sound sandbox providers
they've also done like some level level
of security there right it's like you're
not hosting it on your personal computer
um or on a computer with like your prod
secrets or something so uh yeah lots of
different layers there and and yeah we
can talk more about hosting in depth um
so okay so I'm going to uh talk a little
bit about bash is all you need you
Um, I think this is something that Oh,
yeah. Um, this is like my stickick, you
know? I'm just going to like keep
talking about this until everyone like
uh agrees with me. Um, or like I think
this is something that we found
atanthropic. I think it is sort of
something I discovered once I got here.
Um, bash is what makes code so good,
right? So, I think like you guys have
probably seen like code mode or
programmatic tool use, right? like the
uh different ways of like composing MLPS
uh cloudfl put out some blog post on
that we put out some blog posts uh the
way I think about code mode is like or
bash is that it was like the first code
mode right so the bash tool allows you
to you know like store the results of
your tool calls to files uh store memory
dynamically generate scripts and call
them compose functionality like tail
graph uh it lets you use existing
software like fmp or libra office right
so there's a lot of like interesting
things and powerful things that the
batch tool can do. And like think about
like again what made cloud code so good.
If you were designing an agent harness,
maybe what you would do is you'd have a
search tool and a lint tool and an
execute tool, right? And like you have
end tools, right? Like every time you
thought of like a new use case, you're
like, I need to have another tool now,
right? Um instead now cloud just uses
grap, right? And nodes your package
manager. So it runs like npm run like
test.ts or index.ts s or whatever,
right? Like it can lint, right? And it
can find out how you lint, right? And
can run npm run lint if if you don't
have a llinter. It can be like what if I
install eslint for you, right? So, um
this is like you know like I said the
first programmatic tool calling first
code mode, right? Like you can do a lot
of different actions very very
generically, right? Um and so to talk
about this a little bit in the context
of non-coding agents, right? So let's
say that we have an email agent and the
user is like okay how much did I spend
on ride sharing this week um a you know
like it's got one tool call or generally
it's got the ability to search your
inbox right and so it can run a query
like hey search Uber oryft right and
without bash it it searches Uber oryft
it gets like a hundred emails or
something and now it's just got to
think about it. You know what I mean?
And I I think like a good like analogy
is sort of like imagine if someone came
to you with like like a stack of papers
and like hey, how much did I spend on
ride sharing this week? Can you like
read through my emails? You know, I mean
like that that would be really hard,
right? Like uh you need very very good
precision and recall to do it. Um or
with bash, right? Like let's say there's
a Gmail search script, right? It takes
in a query function. Um, and then you
can start to save that query function to
a file or pipe it. You can GP for
prices. You know, you can uh then add
them together. You can check your work
too, right? Like you can say, okay, let
me grab all my prices, store those as
like in a file with line numbers and
then let me then be able to check
afterwards like uh was this actually a
price? Like what does each one correlate
to? Right? So there's a lot more like
dynamic information you can do to check
your work with the bash tool. So this is
like
um just a simple example but like
hopefully showing you sort of the power
of like the composability of bash right
so I'll pause there any questions on
bash is all you need the bash tool any
any thing I can make a little bit
clearer
>> do you have stats on how many people use
yolo mode
>> uh stats on yolo mode we probably do
um I mean internally we we don't uh but
that's just I think we just have a
higher security posture. Um,
[clears throat]
yeah, I'm not sure. Uh, I can probably
pull that. Any other questions on bash?
Okay, cool. Um, yeah, just to give you
like some more examples like let's say
that you had an email API and you wanted
to uh, you know, like go through like
fetch my like tell me who emailed me
this week, right? So, you've got two
APIs. You've got an inbox API and a
contact API. Um this is like a way you
can do it via bash. You can also do it
via codegen. This is kind of like enough
bash that it is codegen, right? Like um
bash is a ostensibly codegen tool. Um
and then yeah like let's say that you
wanted to you had a video meeting agent,
right? You wanted to say like find all
the moments where the speaker says
quarterly results in this earnings call,
right? You can use ffmpeg to like slice
up this video, right? um you can use jq
to like uh start analyzing the
information afterward. So um yeah, lots
of like def like powerful ways to use uh
to use bash. So
I'm going to talk a little bit about
workflows and agents. Yeah, you can do
both. You could use uh build workflows
and agents on the agent SDK. Um yeah,
agents are like cloud code. If if you
are like building something where you
want to talk to it in natural language
and take action flexibly, right? Then
that's why you're building an agent,
right? Like you want you have an agent
that talks to your like business data
and you want to get insights or
dashboards or answer questions or uh
write code or something like that's an
agent, right? And then a workflow is
kind of like, you know, we do a lot of
GitHub actions for example, right? So
you define the inputs and outputs very
closely, right? So you're like, "Okay,
take it a PR and give me a code review."
Um, and yeah, both of these you can use
agent SDK for. Um, when building
workflows, you can use structured
outputs. We just released this. Um, you
can, yeah, Google agent SDK structured
outputs. Um, but yeah, so you can do
both. I'm going to primarily be talking
about agents right now. A lot of the
things that you can like learn from this
are applicable to workflows as well. So,
um, yeah, we'll we'll talk about this.
Uh, wait, show of hands. How many people
have like designed an agent loop before?
Okay, cool. Okay, great. Great. Um, so
yeah, I mean, I think the number one
thing the the metalarning for designing
an agent loop to me is just to read the
transcripts over and over again. Like
every time you see see the agent
running, just read it and figure out
like, hey, what is it doing? Why is it
doing this? can I uh help it out
somehow? Right? Um and uh we'll do some
of that later, right? So we'll uh we'll
build an agent loop. Um but here is the
uh the three parts to an agent loop,
right? So uh first it's gather context,
right? Second is taking action and the
third is verifying the work, right? And
uh this is like not the only way to
build an agent, but I think a pretty
good way to think about it. Um gathering
context is uh like you know for cloud
code it's grepping and finding the files
needed, right? Um you know for an email
agent it's like finding the relevant
emails, right? Um, and so these are all
like pretty um, yeah, like I I think
thinking about how it finds this context
is very important and I think a lot of
people sort of uh, skip the step or like
underthink it. This can be like very
very important. Uh, and then taking
action um, how does it like do its work?
Uh, does it have the right tools to do
it like code generation, uh, bash these
are more flexible ways of taking action,
right? And then verification is another
really important step. And so the
basically what I'd say right now is like
if you're thinking of building an agent,
think about like can you verify its
work, right? And if you can verify its
work, it's like a great like candidate
for an agent. If you can't verify its
work, like it's like you know coding you
can verify by lending, right? And you
can at least make sure it compiles. So
that's great. uh if you're doing let's
say deep research for example it's
actually a lot harder to verify your
work one way you can do it is by citing
sources right so that's like a step in
verification but obviously research is
less verifiable than code in some ways
right because like code has a compile
step right you can also like execute it
then see what it does right so um I
think like thinking on you know like as
we build agents the ones that are
closest to being very general are the
ones with the verification step that is
very strong right So I I think there was
a question here. Yeah.
>> So when where do you generate a plan of
the work?
>> Yeah. I mean you you might
>> question
>> Oh yeah sorry the the question was when
do you generate a plan um before you run
through it. So um like in cloud code you
don't always generate a plan. Uh but if
you want to you'd insert it between the
gathering context and taking action
step, right? And so um plans sort of
help the agent think through step by
step, but they add some latency, right?
And so there is like some trade-off
there. Um but yeah, the agent SDK helps
you like do some planning as well. So
yeah.
>> Yeah. Can you like make the agent create
that to-do list for like 100%
sure that it will create that to-do list
and run by it?
>> Uh yeah. So the question was will the
agent create the to-do list? Uh yes. Um
if you're using the agent SDK, we have
like some to-do tools that come with it
and so it will like maintain and check
off to-dos and you can display that as
you go. So yep.
Um,
any other questions about this right
now? Okay, cool. Okay, so I'm going to
quickly talk about like like how do you
do this stuff? You like what are your
tools for doing it, right? And uh there
are three things you can do that you
have tools, bash and code generation,
right? And I I think traditionally I
think a lot of people are only thinking
about tools and uh yeah, basically one
of the call to actions is just figuring
out like thinking about it more broadly,
right? So tools are extremely structured
and very very reliable, right? Like if
you want to sort of have as fast an
output as possible with minimal errors,
uh minimal retries, uh tools are great.
Uh cons, they're high context usage. If
anyone's built an agent with like 50 or
100 tools, right? Like they take up a
lot of context and the model it kind of
gets a little bit confused, right? Um
there's no like sort of discoverability
of the tools. Um and they're not
composable, right? and and I say tools
in the sense of like if you're using you
know messages or completion API right
now um that's how the tools work of
course like you know there's like code
mode and programmatic tool calling so
you can sort of blend some of these um
but [clears throat] there's bash so bash
is very composable right like uh static
scripts low context usage uh it can take
a little bit more discovery time because
like let's say that you have whatever
you have like the playright MCP or
something like that um or sorry the
playright CLI the playright like bash
tool um you can do playright-help to
figure out all the things you can do but
the agent needs to do that every time
right so it needs to like discover what
it can do um which is kind of powerful
that it helps take away some of the high
context usage but add some latency um
there might be slightly lower call rates
you know just because like it has a
little bit more time to um it needs to
like find the tools and what it can do.
Um but this will definitely like improve
as it goes. And then finally, codegen
highly composable dynamic scripts. Um
they take the longest to execute, right?
So they need linking possibly
compilation. API design becomes like a
very very interesting step here, right?
And I and I'll talk more about like uh
best like how to think about API design
in an agent. Um but yeah I think this is
like how we like the the three tools you
have and so yeah using tools think you
still want some tools but you want to
think about them as atomic actions your
agent usually needs to execute in
sequence and you need a lot of control
over right so for example in cloud code
we don't use bash to write a file we
have a write file tool right because we
want the user to be able to sort of see
the output and approve it and um we're
not really composing write file with
other things, right? It's like very
atomic action. Um, sending an email is
another example. Like any sort of like
non-destruct like destructible or sort
of like you know uh unreversible change
is definitely like a a tool is a good
place for that. Um then [clears throat]
we've got bash. Uh so for example there
are like uh composable actions like
searching a folder using GitHub linting
code and checking for errors or memory.
Um and so yeah you can write files to
memory and that can be your bash like
bash can be your memory system for
example right so um and then finally
you've got code generation right so if
you're trying to do this like highly
dynamic very flexible logic composing
APIs uh like you're doing data analysis
or deep research or like reusing
patterns and so um yeah we'll talk more
about uh code generation in a bit
um any questions so far about like the
SDK loop loop or tools versus bash
versus codegen. Yeah.
>> Yeah. Uh I was going to ask
[clears throat] you are you going to
have any readymade tools for like
offloading results [snorts]
>> offloading tool called results like into
the file system or
>> like let's say goes to bash and then
context explodes.
>> Does it like [clears throat] typed a
command that like do everything up?
>> Okay.
>> Or or otherwise just like long outputs
polluting your history.
>> Sure. Yeah. Yeah. Yeah. I imagine like
all the time just uploading them to
files.
>> Yeah. Yeah. I I think that's a good
common practice. I think um we
I I remember seeing some PRs about this
very recently on on cloud code about
handling very long outputs and I I I
don't know exactly like I I think I
think we are moving towards a place
where more and more things are being
like just stored in the file system and
this is like a good example. Yeah, like
it's storing like long outputs uh over
time. Um, I think like generally
prompting the agent to do this is a good
uh way to think about it. Or even if you
have I think like something I just do
always now is like whenever I have a
tool call I um I save it like the
results of the tool call to the file
system so that you can like search
across it and then have the tool call
return the path of the result. Um just
because like that helps it like sort of
recheck its work. So um yes. Um, do you
find that you need to [clears throat]
use like the skills um kind of structure
to help claude along to use the bash
better or out of the box? You know,
that's not necessary.
>> Yeah. So, the question was about skills
and like do we need skills to use bash
better? Um, yeah, for context skills
maybe I can
Okay, skills. Okay. Yeah, skills are
basically a way of like uh you know
allowing our agent to take longer
complex tasks and like sort of load in
things via context, right? So some like
for example we have uh a bunch of DOCX
skills and these DOCX skills tell it how
to do code generation to generate these
files, right? And so um yeah, I think
overall skills are yeah, basically just
a collection of files. They're also sort
of like an example of being very like
file system or bash tool pilled, right?
Um because they're really just folders
that your agent can like CD into and
like read, right? Um and so yeah, they
give like what we found the skills are
really good for is pretty like
repeatable instructions that need a lot
of expertise in them. Uh like for
example, we released a front-end design
skill recently that I really really like
and um it's really just sort of a very
detailed and good prompt on how to do
front-end design. Uh but it comes from
like our best, you know, like uh AI
front-end engineer, you know what I
mean? And he like really put a lot of
top thought and iteration to it. So
that's one way of using skills. Um
>> yeah,
>> quick question. Why use that front
skill?
>> Sure. It's pretty good. Thanks for
publishing it. Uh I want to understand
uh there are multiple MP files like MP
is also there and it is also at the user
level
and then there are skill files like is
there like a priority order should some
stuff be relegated to claw.md and some
other stuff should only come to
skill.md? H so the question was about
skill.md versus claw.md and how to think
about uh that right and uh I think like
I I will say all of these concepts are
so new you know I mean like even cloud
code is like released it like eight or
nine months ago right like um and so
skills were released like two weeks ago
like I like I won't pretend to know all
of the best practices for for everything
right um I think generally
skills are a form of progressive context
disclosure closure and that's sort of a
pattern that we've talked about a bunch
right like with like uh bash and you
know like preferring that over like you
know purely like normal tool calls is
like it's a way of like the agent being
like okay I need to do this let me find
out how to do this and then let me read
in this skill empty right so you ask it
to make a docx file and then it like cds
into the directory reads how to do it
writes some scripts and keeps going so
um yeah I think like there's still some
intuition to build around like what what
exactly you like define as a skill and
how you split it out. Um but uh yeah, I
think uh yeah, lots of best practices to
learn there still. Um
>> yeah,
>> so yesterday
we [clears throat] talked about the
future of skills over time.
>> Do you see these as ultimately becoming
part of the model and some of the skills
this is just a way to bridge the gap for
now?
>> Yeah. Yeah. So the question was are
skills ultimately part of the model? Um
are they a way to bridge the gap? I
missed Barry's talk at Barry and M's
talk yesterday, but uh yeah, I think
roughly the idea is that the model will
get better and better at doing a wide
variety of tasks and skills are the best
way to give it out of distribution
tasks, right? Um, [clears throat] but I
I would broadly say that like it's
really really hard especially like you
know if you're like uh not at a lab to
like tell where the models are going
exactly. Um my general rule of thumb is
like I try and like rethink or rewrite
my like agent code like every 6 months.
Uh just cuz I'm like uh things have
probably changed enough that I've like
baked in some assumptions here. And so
like I think that like our agent SDK is
built to as much as possible sort of
advance with capabilities, right? Like
the bash tool will get better and
better. Uh we're building it on top of
cloud code. So as cloud code evolves,
you'll get those wins out out of the
gate. Um but at the same time like you
know things are so different now like
than they were a year ago in in terms of
like AI engineering, right? And I think
like a general best practice to me is
sort of like, hey, we can write code 10
times faster. We should throw out code
10 times faster as well. Um, and I think
thinking about like not so like hedging
your bets on like where is the future
right now, but like what can we do today
that really works, right? And like like
let's get market share today and not be
afraid to throw out code later. Um, if
you're a startup, this is arguably your
largest advantage that you have over
competitors. They're like, you know,
larger [snorts] companies have like
six-month incubation cycles. And so
they're always like stuck in the past of
like the agent capabilities, right? And
so your advantage is that you can like
be like, hey, the agent the capabilities
are here right now. Let me build
something that uses this right now,
right? So, um, yeah. Uh
any any other questions on for we're
talking about skills in bash. Okay. It
seems like there are a lot of skill
questions. So um yeah uh I I think at
the back someone you might have to
shout.
>> Yeah. So why would you use a skill
versus an API? They look very similar to
that Python program there could be a
package, right?
>> Yeah. The question was why use a skill
versus an API? Um, good question. I I
think that like um when you like these
are all forms of progressive disclosure
basically to the agent to figure out
what it needs to do. Um, and I'll go
over like uh examples of like you just
have an API, right? In in our like in
our prototyping session. Um, it's
totally like use case dependent, right?
Like just I think like I don't have a
like I don't think there's a general
rule. I think it's like read the
transcript and see what your agent
wants. If your agent always wants like
thinks about the API better as like a
API.ts file or something or API.py file,
do that. You know, that's great. Like I
think skills are like like sort of an
introduction into like thinking about
the file system as a way of storing
context, right? And they're a great
abstraction. Um, but there are many ways
to use the system. Um, and I I should
say that like something about skills
that like you need the bash tool, you
need a virtual file system, things like
that. So the agent SDK is like basically
the only way to really use skills to
like their full extent right now. So um
yeah. Yeah. Back there.
>> Can we expect a marketplace for skills?
>> Yeah. The question was can we expect a
marketplace for skills? So um yeah,
clock code has a plug-in marketplace
that you can also use with the agent
SDK. Uh we're evolving that over time,
you know, like it was like a very much a
v0ero. Um, and by marketplace, I'm not
sure if people will be charging for this
exactly. It's more just like a discovery
system, I think. Um, but yeah, that
exists right now. You can do SL plugins
in cloud code. Um, and and you can find
some. So, yeah. Yep.
>> What's your current thinking about when
you're going to reach for like the SDK,
you know, to solve a problem?
>> When? Yes. The question is when do I use
the SDK to solve a problem? uh if I'm
building an agent basically I I think
that like um my overall belief is that
like for any agent the bash tool gives
you so much power and flexibility and
using the file system gives you so much
power and flexibility that you can
always ek out performance gains over it
right and so uh yeah in the prototyping
part of this talk we're going to like
look at an example with only tools and
an example without with you bash and the
file system and compare those two. Um,
and yeah, that's what I mean by being
bashful to build. I'm like I I just like
start from the agent SDK, you know, and
I think a lot of people at Enthropic
have started like doing that as well.
So, um, of course I I do want to say
that there are lots of times where the
agent SDK is kind of annoying because
you've got like this network sandbox
container and you're like, I hate like I
don't want to do this, you know? I mean,
like I want to run on my browser
locally, right? Um, I totally get that.
And I think it's there is like a real
performance trade-off. Um the way I
think about it is sort of like React
versus like jQuery, you know, like I
like I when I was coming up, I was like
very into webdev and like you know I was
using jQuery and Backbone and then React
came out and it was by Facebook and
they're like you have to here's JSX like
we just made this up and and now there's
a bundler, right? I'm like it's so
annoying. Um, but like they generally
makes the model or it makes it made web
apps more powerful, right? And I think
we're sort of like the agent SDKs are
like the react of agent frameworks to me
because it's like we build our own stuff
on top of it. So, you know, it's real
and all the annoying parts of it are
just like things where we're annoyed
about it too, but we're like it just
works like you have like got to do this,
you know? Um, so yeah.
Uh, yeah. Okay. more more skill
questions I guess. Yeah. Right here.
>> Uh what's the question? Bash.
>> Oh, sure. Bash question. Great. I love
bash.
>> Custom internal like bash tools.
>> Yeah.
>> How do you let the agent discover that
or do those have to become tool tools?
>> Okay. The question is if you have custom
agent bash tools, how do you let the
agent discover that? By custom bash
tools, you mean like bash scripts?
>> We have we have bash scripts. Yeah.
>> Um yeah. So I I think uh where is it?
you just put it in the file system and
you tell it like hey like here is a
script. Uh you can call it you know I
I'm generally thinking in the context of
the cloud agent SDK where it has the
file system and the bash tools are tied
together. This is kind of an anti
pattern I see sometimes where people are
like, "Oh, like we're going to host the
bash tool in this like virtualized place
and it's not going to interact with
other parts of like the agent loop, you
know, and that sort of, you know, makes
it hard cuz if if you got a tool result
that's saving a file, then your bash
tool can't like uh read it, you know, I
mean, unless it's all in one one
container." So, does that answer your
question? Like
>> Yeah, kind of. I mean, like, so you're
just saying you just put it in like
system prompt or something? Yeah, I just
put in system prompting like hey you
have access to this. I would like sort
of design all my CLI scripts to have
like a d-help or something so that the
model can call that and then it can like
progressively disclose like every like
subcomand inside of the script. Yeah.
>> Uh yeah m
>> yeah so uh like my question is around
when to reach for the agent SDK. So have
you designed or rather would you
recommend someone use the agent SDK to
build like a generic chat agent as
compared to like oh you know I'm
building an agent where you have some
input and the agent goes and does some
stuff and finally I care about the
output as compared to let's say someone
like are you using or do you foresee
using the agent to build like the agent
SDK to build like clot the the app
rather than clot code. Uh yeah. So the
question is when do we reach for the
agent SDK uh does um like uh like would
we use the agent SDK to build cloud.AI
which is a more traditional chatbot uh
than cloud code. Um I one I think cloud
code is like a very like like interface
is not a traditional chatbot interface
but like the inputs and outputs are
right like you input code in you you get
like or you input text in you get text
out and you they take actions along the
way um you might have seen that like
when we rolled out doc creation for
cloud.ai AI. Um, now it has the ability
to spin up a file system and like create
spreadsheets and PowerPoint files and
things like that by generating code. And
so that is like you know we're in the
midst of sort of like um like merging
our agent loops and stuff like that. But
but broadly like uh like yeah cloud.ai
will like is getting more and more like
you see it with skills and the memory
tool and stuff more and more file system
pills, right? So, uh, we do think this
like a broad thing that you can use just
just generally and happy to talk through
examples.
Um, yeah, one more question then we'll
keep going. Yeah.
>> Still trying to understand the rule of
thumb on when to build a tool or use a
tool, when to
wrap something with a script or just let
the agent go wild on the bash because I
I'll give you an example. Let's say I
need to access a database
from time to time. I can use an MCP. I
can wrap it in a script and I can just
let the agent call an endpoint from B
directly from bash, right?
>> Yeah, great question. Great question.
So, it still trying to gro like when to
use tools versus bash versus codegen and
you gave an example like okay, I have a
database. Um, I want the agent to be
able to access it in some way. What
should I do? Should I create a tool that
queries the database in some way? Um,
should I use the bash? Should I use
codegen? Right? These are all these are
three ways of doing it. Um I think that
they are like you could use any of them
and I I think like part of it is like I
I think unfortunately there's no like
single best practice, right? This is
like kind of a system design problem.
But let's say that you want to access
your bash your database via tool. You
would do that if your database was very
very structured and you had to be very
careful about like I don't know you're
accessing like user sensitive
information or something like that and
you're like hey I I can only take in
this input and I need to like give this
output and I have to mask everything
else about the database from the agent
right obviously that like sort of limits
what the agent can do right like it
can't write a very dynamic query right
um if you're writing a full-on SQL
query, I would definitely use bash or
cogen uh just because when the model is
writing a SQL query, it can make
mistakes and the way it fixes it is is
its mistakes is by like linting or like
running the file, looking at the output,
seeing if there are errors and then
iterating on it, right? Um, and so I
generally like if I'm building an agent
today, I'm giving it as much access to
my database as possible and then I'm
like putting in guard rails, right? Like
I'm probably limiting its like right
access in different ways. But what I
probably what I would do is like I would
give it right access and put in specific
rules and then give it feedback if it
tries to do something it can't do. You
know what I mean? And so I know this is
like kind of a hard problem, but I think
this is the like set of problems for us
to solve, right? Like we built a bash
tool parser. Um, and that's a super
annoying problem. Uh, but we need to
solve that in order to like let the
agent work generally, right? And same
thing with like database like like yes,
it's quite hard to understand what is a
query doing, but if you can solve that,
you can let your agent work more
generally over time. So um yeah I think
thinking about it uh like flexibly as
much as possible and keeping tools to be
like very very like sort of atomic
actions right that you need a lot of
guarantees around.
Um
>> a follow on the same thing right uh how
do you ensure the role based access
controls are taken care of
>> how do you uh so the question is how do
you ensure that the role based act uh
access controls are taken care of
usually that's in like how you provision
your API key or your backend service or
something like that right like um I
think that like probably what I do is
like I create like temporary API keys
sometimes people create proxies in
between to insert the API keys
um if you're concerned about
excfiltration of that. Um but yeah, I
would create like API keys for your
agents that are scoped in certain ways
and so then on the back end you can sort
of check it's like you know what it's
trying to do and like uh if it's a an
agent you can like give it different
feedback. So yeah.
>> All right. I have one more question.
>> Um anything you could tell us uh more
about the the memory tool the internal
memory tool? Um,
I have I I'm not trying to like keep a
secret. I I don't know exactly like I
haven't read the code, but I I think it
generally works on on the file system.
And so, um,
>> you expose it to uh to the uh SDK or is
it already available?
>> Um, I would say that like we we've had
this question a bunch. I would just use
the file system on in the cloud agent
SDK. I would just create like a memories
folder or something and tell it to write
memories there. Um it's like I I don't
know the exact implementation of the
memory tool but it does use the file
system in in in that way. So yeah
um all right yeah last question on this.
Yeah
>> yeah how you are manage for the b and
the code how you are managing the like
reusability suppose the same agent is
rolled out to hundreds of users and uh
same code every time it is generating
and every time it is executing. So how
can we use the reusability? Yeah, that's
a really good question. So, uh yeah,
let's say you have two agents
interacting with two different people.
The question is like how do you think
about reusability between agents or how
do agents communicate, right? Um I think
uh this is a thing to be discovered. I
think like but I think there's a lot of
best practices and system design to be
done on like um because traditionally
with web apps you're serving one app to
like a million people right and with
agents like with cloud code we serve
like you know a onetoone like container
when you use cloud code on the web it's
like it's your container right and so
there's not a lot of like communication
between containers it's a very very
different paradigm I'm not going to say
that like I know exactly the best system
design to do that right and like I think
there's a lots of best practices on like
okay these agents are reusing work um
how can we give them like like
general scripts that combine together
the work that they've done how can we
make them share it um I would generally
think this is sort of like a tangent but
on like agent communication frameworks I
would say that like we probably don't
need like a whole we don't I I think
this is more of a personal opinion I
think like we probably don't need to
reinvent and uh like a new communication
system. There are like the agents are
good at using the things that we have
like HTTP requests and hash tools and
API keys and uh named pipes and all of
these things. And so like probably like
the agents are just making HTTP requests
back and forth from each other, you
know, using HTTP server. Um there's a
bunch of interesting work there. I've
seen people make like a virtual forum
for their agents to communicate and they
like post topics and like reply and
stuff like that. Um kind of cool. I
think there's a lot of things to explore
and discover there. Yeah. Okay. Um going
to keep going a little bit. How are we
doing for time? Okay. It's got an hour
left, I think. Okay. Um
cool. So, an example of designing an
agent. Uh this is a like yeah let's this
is not the prototyping session but I
think this is like will be a good sort
of like like lee way into it. Let's say
we're making a spreadsheet agent. Uh
what is the best way to search a
spreadsheet? What's the best way to
execute code like or what's the best way
to take action in a spreadsheet? What is
the best way to link a spreadsheet?
Right? These are all like really
interesting things to do. Uh I'm going
to do like a Figma and we can go over
it. Um, if someone could grab a water as
well, that'd be great. I like could
really use water. I'm uh Yeah. Yeah.
Okay. Um, thanks. Uh,
okay. So, we're going to um
Yeah, let's let's talk through it. Uh,
or why don't you spend like a couple
minutes yourselves thinking about this
question? You have a spreadsheet agent.
You want it to be able to search. You
want to be able to like gather context,
take action, verify its work. How would
you think about it? Right? So like just
spend some time thinking through that.
Take some notes or something.
Okay. Is everyone get had a little bit
of time to think about this? Does anyone
want more time or want to just dive into
it? Okay. Uh, what's the best way for an
agent to search a spreadsheet? Realizing
I have to type with one hand now. Um,
I should figure this out because I'm
going to type later. Okay. Um, the Okay,
searching a spreadsheet. Uh, any any
ideas how do you search a spreadsheet?
Like what would you do?
>> CSV.
>> Okay. You've got a CSV. Okay. And now
like your agent wants to like search the
CSV. What what does it do?
>> A GP. Okay. Uh what does the GP look
like?
>> Needs to look at all the headers.
>> Looks at the headers. Okay.
>> Headers of all sheets.
>> Okay. Great. Yeah. Yeah. And let's say
I'm looking for the revenue in 2024 or
something. Um
now I've got my headers like uh I'm just
going to pull up a spreadsheet, right?
Um let's say that the revenue is in
there's a revenue column and then
there's like a
uh so yeah let's see
okay so yeah let's say it's something
like this right like um How do I get
revenue in 2026? Right? So, this is sort
of like a tabular problem, right? Like
there is revenue here and there's also
2026 here, right? So, it's like a
multi-dimensional step, right? We could
look at the headers that will then give
us uh like if you just pull this, you'll
get 100, 200, 300, right? So, we need a
little bit more. Uh any other ideas?
>> Yeah,
>> there's a bash tool for it. Uh, a AWK, I
think.
>> O. Okay. Yeah. Yeah. Yeah. And what
would it A for?
>> Well, depends on what you what you're
looking for.
>> Yeah. Yeah. Yeah. That that's a
question, right? Like what is the user
looking for, right? They're probably
looking for something like this, like
revenue in 2026, right? Um,
>> maybe use the APIs to use the Google
tools to add all the numbers together or
V look up something like this, right?
>> Yeah. So the idea is like use the APIs
like use the Google APIs to like look it
up. Um that's great. Uh but yeah, let's
say we're working locally. We need to
sort of design these APIs. Yeah.
>> SQLite ord
CSV directly and work.
>> Oh, interesting. Okay. Yeah, I didn't
know that. That's great. So yeah, you
you use SQLite to query a CSV. Um that's
a great like sort of creative way of
thinking about API interfaces, right?
like um if you can translate something
into a interface that the agent knows
very well that's great right and so like
if you have a data source if you can
convert it into a SQL query then your
agent really knows how to search SQL
right so thinking about this
transformation step is really really
interesting it's a great way of like
designing like an agentic search
interface so
>> um yeah over there
>> sorry real quick while we're talking
about tools because you can use TSV for
some of the stuff as well
>> um is there any good ranking within the
with Is Cloud smart enough to start
ranking the right tool for the right
job? Because that's kind of what we're
talking about here is right tool for the
right job.
>> Yeah. Is Cloud smart enough to write
rank the right cool tool for the right
job? Uh yeah, if you prompt it, you
know, like or like I I think that's one
of those things where like I don't know,
let's find out like let's read the
transcript. Uh if it's not like how can
you help it?
>> Yeah. Just sort of like I I think all of
these things are like an intuition, you
know? It's like like kind of like riding
a horse. Not that I've ever rode a
horse, but I know just like I imagine
it's like running.
[laughter]
>> Yeah. Like you like, you know, you're
sort of giving these signals to the
horse. You're calming it down. You're
trying to understand what it how how do
you push it faster? You know what I
mean? And sort of like it's a very
organic like thing, right? Um like I
think we like to say that models are
grown and not designed, right? And so
we're like sort of understanding their
capabilities. Yeah. Uh yeah what and
where it is. Yeah
>> quick question. So is a way to add like
metadata to the spreadsheet? Can you
give descriptions in a different
document?
>> Yeah that's for example KPI
to build intelligence to ask questions.
>> Yeah. So that's another great pattern is
like okay can you add metadata to a
spreadsheet? So these are some questions
that you might want to think about
before like when you're thinking about
search is like what pre-processing can
you do to make the search better, right?
And so one example is that you translate
it into like a SQL format or something
where you use something that can query
it, right? That's like a translation
step. Another step is like maybe you
have a tool or um like a a
pre-processing step where another agent
annotates the the spreadsheet and and
like adds information so that the agent
can then like search across that
information better. Right. So um yeah,
one more. Um, I was just curious what I
mean all those tools sound great, but
yeah, why can't the agent just,
>> you know, do what was suggested, read
the header and then just get the date?
Like I feel like that should pretty
trivial
or retest.
>> Yeah, probably I should have like
prepared this in code. But yeah, I I
built a ton of spreadsheet agents
before. Basically, it's
>> not it's kind of hard to do. Yeah. Yeah.
So, um, basically what what I would
think about is like, so we we've got
like Okay, I
Sean, do you have suggestions on how it
can talk and code at the same time? Go
ahead.
>> Oh, I see. Yeah.
>> Do you work at Whisper Flow or something
or
>> Stick the mic in your shirt?
>> There's a microphone button. [laughter]
>> There's a microphone button on the bat.
>> Stick the mic in your shirt.
Oh, I I I just don't trust that stuff,
man. Okay. Um,
[laughter]
maybe I shouldn't be working in an AI
lab. Um, okay. So,
uh, let's see.
>> Hold on. Hold on. Um,
search. So,
>> one way to do it is like
you see in spreadsheets, right? Like you
can say here you can design formulas
right so like B3
2
right so this is a syntax for example
that the agent's pretty familiar with
like B3 to B5 right and so you can
design an agentic search interface which
is like this right like B3
B5 or something right so like your
agentic search interface can take in a
range right it can taking a range
string, right? And these are things that
like the agent knows pretty well, right?
Like you can um do SQL queries, right?
Agent knows SQL queries pretty well,
right? Um and uh like these you can also
uh do XML, right? Sorry, the font is so
small. Um
okay. Uh yeah, you can also do XML.
I'm not sure if you guys know but like
uh XLX files are XML in the back end
right and XML is very structured uh you
can do like an XML search query uh and
there are different libraries that can
do that so that's one example right is
like how do you search and gather
context and I hope this sort of like
illustrates to you that like gathering
context is really really creative right
like and and like there's so many
iterations and if you just if you've
only tried one iteration it's probably
not enough right like think about like
as many different ways as you can like
try these out, right? Like try SQL, try
try the CER, try try the GP and O and
like all of these things and um have a
few tests that you're trying across
different things and and see what the
agent likes and what it what it doesn't
like. Um it's going to be different for
each case.
>> Sorry.
When you say agent, you're referring to
the the model or because we're building
an agent here.
>> Yeah. and you're relying on already free
existing knowledge of how to handle XML
who's who's doing that the model.
>> Yeah, because the question is like who
uh where is the knowledge come from? Is
it the model? Is it like what is what do
I mean by the agent? Yeah, generally I
think what you're looking for is like
you have a problem you want to make it
as indistribution as possible for the
agent, right? And so the agent knows a
lot about a lot of different things. It
knows a lot about for example finance,
right? So if you ask it to make a DCF
model, it knows what DCF is, right? And
you can if if you want to give it more
information, you can make a skill,
right? But so it knows what DCF is. It
knows what SQL is. Can it combine those
things together, right? And so like uh
ideally you want to like your your
problem is going to be out of
distribution in some way, right? like
like there's some like information
that's not on the internet or something
that you have um or something somewhat
unique to you and you want to try and
like massage it to be as in distribution
as possible. Um and uh yeah it's it's
very very creative I think like uh you
know it's not like a it's not a science
to be [laughter] very much like an art.
So, um, yeah. Okay. So, we we've tried
gathering context, then taking action.
Um, we can probably do a lot of the same
stuff here that we've done before,
right? Like we can do like insert
2D array, right? Um, if we've got like a
SQL interface, right, we can um we can
do a SQL query, we can edit XML. Um,
these are like often very similar,
right? Like taking action and gathering
context that that you probably want a
similar API back and forth. And then the
last thing is verifying work, right?
Like how do you think about how do you
think about that? Um, check
for null pointers, right, is one of the
ways to do it. Um, any other ideas on on
verification or Yeah.
>> Sorry, I'm I'm a bit confused if you say
>> like when when you're using other SDKs
to build Asian, I don't need to tell it
how it should gather the context.
>> Sure.
>> I just give it the context and explain
this is what like basically I explain in
plain English
>> what is meant to do.
>> Yeah.
>> And what I tend to do and you tell me if
I'm wrong, I actually end up creating a
separate agent for QA.
>> Oh, interesting.
>> To to verify because I don't trust the
agent to verify itself.
But I'm just I'm I'm just a bit I
confused about the level of detail I
need to provide the agent in that
example.
>> Yeah. Okay. So the question is about um
giving context to the agent versus
having it gather its own context. Uh you
mentioned that you sometimes use a Q&A
agent. Uh can I ask like what like
domain you you're building your agent in
or
>> in uh cyber security.
>> Okay. Sure. Yeah. Yeah. Um, I think that
I I think I need to like look into more
specifics, but the cloud agent SDK is
great for cyber security and like I
would generally push people on like let
the agent gather context as much as
possible, you know, like let it find its
own work as much as possible. Um, you're
trying to give it the tools to find its
own work. The way I think about this is
kind of like let's say that someone
locked you in a room and they were they
were like giving you tasks, you know,
like that's your what your job was like
a Mr. Beast sort of like scenario,
right? Like you get $500,000 if you stay
in this room for 6 months. Um then like
like someone's giving you a message,
what tools would you want to be able to
do it, right? Like would you just want
like a list of papers or like would you
want a calculator or like a computer?
Right? Probably I would want a computer,
right? I'd want Google. I'd want like
all of these things, right? And so like
I wouldn't want the person to send me
like a stack of papers being like, "Hey,
this is probably all the information you
need." I'd rather just be like, "Hey,
just give me a computer. Give me the
problem. Let me search it and figure it
out." Right? And so that's how I think
about agents as well. like they need
like like you know they're stuck in a
room.
>> So I need to give them tools. So if you
can go back to the slides you have to
the graph you had
>> to the graph like this or
>> yeah the so basically that gathering
context is basically these are the tools
I'm offering it.
>> Yeah. Exactly. Yeah. You're I'm giving
it like maybe an API for code
generation. Maybe I'm giving it a SQL
tool. Maybe I'm giving a bash. These are
all like examples, right? So yeah, you
have one more question
>> question. So for all the agents that
you're [clears throat] having,
do they share the same context window?
>> Interesting. Yeah. So do agents share
the context window? I think I think this
is like an interesting question just
overall about how you manage context. Um
I think and I haven't talked about this
too much yet, but sub agents are like a
very very important way of managing
context. Um, I think that this is like
we're using more and more sub agents
inside of cloud code and I would think
about like doing sub agents very
generally. So like what we might do for
the spreadsheet agent is maybe we have a
search sub agent, right? So like sub
aents are great for when you need to do
a lot of work and return an answer to
the main agent. So for search, let's say
the question is like how do I find my
revenue in 2026? Maybe you need to do a
bunch of results. Maybe you need to like
uh search the internet, maybe you need
to search the spreadsheet, things like
that. And there's a bunch of things that
don't need to go into the context of the
main agent. The main agent just needs to
see the follow result, right? And so
that's a great sub agent task. Um I
don't have a dedicated sub aent slide
here, but like yeah, they're very very
useful and I I think a great way to
think about things. Um yeah,
>> and just to just to build on that
question actually
>> for verification for example, you can
imagine doing that through a skill or a
sub agent. You might even want to have
an adversarial like the security example
is a great one. Want to have really go
to town on it and not really have any
sympathetic relationship with the work
already done. It's a very I I get it's a
spectrum, but do you like are you saying
yes, you'd use a sub agent here, you'd
use a skill? How would you think about
this?
>> Yeah, definitely. So question on like uh
do you sub agents or oh
>> I'm sure it'll work just to make sure.
>> Oh, sure. Okay. Yeah. Yeah. Thank you.
Appreciate it. Um okay. Yeah. Uh can you
use sub agents for verification? Uh yes.
I I think this is a pattern. I think
like ideally the the best form of
verification is rulebased, right? You're
like is there like a null pointer or
something? Uh that's like easy
verification. it it doesn't lint or
compile like like as many rules as you
can try and insert them and again be
creative right like for example uh in
cloud code if the agent tries to write
to a file that we know it hasn't read
yet like we haven't seen we haven't seen
it enter the read cache we throw it an
error we we tell it like hey uh you
haven't read this file yet try reading
it first right and that's an example of
sort of like a deterministic tool that
we insert into the verification step and
So as much as possible like anytime you
are thinking about you know verification
first step is like what can you do
deterministically what like what like
you know outputs can you do and again
like when you're choosing which a like
types of agents to make the agents that
have more deterministic rules are better
you know like they just like like it
just makes a lot of sense right so um of
course as the models get better and
better at reasoning then you can have
these sub agents that check the work of
the main agent the Main thing there is
to like avoid uh context pollution. So
you probably wouldn't want to like fork
the context. You'd probably want to
start a new context session and just be
like, "Hey, yeah, adversarily check um
the work of like this this output was
made by a junior analyst at McKenzie or
something. They graduated from like not
a grade school like their GPA like you
know like like just like feed it a bunch
of stuff and then tell it to critique
it, right? Like that's like one of the
tools of the sub agent, right? And so um
yeah, the more you like
uh yeah, as the models get better and
better, that sort of verification will
become better as well. Um but doing it
deterministically is like a great start.
>> Yeah.
>> Just a [clears throat] question about
the verify work. So
>> yeah.
>> Um so let's say we found no pointers.
It's probably easy to just say, "Okay,
fix it." But like you know let's say we
deploy it to production and the client
is using it that's not us and they
somehow get into a spot where the whole
spreadsheet is deleted and so like like
on what level do we need to bake in like
the ability to like undo tools and
because like um let's say the QA agent
returns that their spreadsheet is empty.
>> Yeah.
>> Not necessarily is able to undo for so
like what was your advice there?
>> Yeah. So the question is like how do you
think about state and like undoing and
redoing being able to um fix errors
basically right? I think this is like a
really good question and honestly
another sort of like um like when you
think about like what are agents good at
right like or what problem domains are
agents good at? How reversible is the
work is like a really good intuition
right? So code is quite reversible. you
can just like go back, you can undo the
git history. We we come with like, you
know, these atomic operations right out
of the gate, right? Like I use git
constantly through cloud code. I I don't
type g commands anymore, right? So, um
that's like a really good example. A
really bad example is computer use,
[clears throat] you know, because
computer use has is not reversible in
state, right? Like let's say you go to
like door-ash.com and you add like the
user wants you to order a Coke and you
add order a Pepsi now like you can't
just go back and click on the Coke. You
have to like go to the cart and you have
to remove the Pepsi, right? And so your
mistake has like compounded this like
you know this state and the state
machine has gotten more complex, right?
And and so like whenever you're dealing
with like very very complex state
machines that you can't undo or redo of
it does become harder, right? And I
think one of the questions for you as an
engineer is like can you turn this into
a reversible state machine kind of like
you said can you store state between
checkpoints such that the user can be
like oh my spreadsheet is messed up
right now just go back to the previous
uh checkpoint right uh potentially even
can the model go back to previous
checkpoints um I I think someone had
this like time travel tool um that they
were giving one of the coding agents
which was kind of cool where you're like
it's like you can time travel back to a
point before this happened. You know
what I mean? Uh it's kind of fun. I
think like all of these tools, some of
them don't work that well yet, but you
know, we'll we'll get there. Um yeah,
thinking about state and verification is
is very useful, right? So, um yeah,
quick question at the back.
>> Yeah. Um
>> I'm kind of curious about scale. Um so
what if the spreadsheet is like millions
of rows million and thou hundreds of
thousands of columns right or just like
any sort of database like in that type
of situation how would you go about
searching there's obviously a context
you have to commentate for
>> yeah this is great um I probably should
have done the spreadsheet example as my
coding example for for a preview my
coding like agent is a Pokemon agent um
probably spreadsheet would have been
Okay. Uh the question was what if the
spreadsheet is very big? If you have a
million rows, uh how do you think about
>> 100 column
>> yeah 100,000 columns or 100 columns or
whatever like how do you think about it
right like your database is also very
big like how do you how do you do that?
Um I think for all of these things uh
one of course as the data becomes larger
and larger it's just a harder problem
like you know it just absolutely is your
accuracy will go down right like cloud
code is worse in larger code bases than
it is in smaller code bases right as the
models get better they will get better
at all of that um for all of these I
would think about like how would I do
this if I had a spreadsheet that was
like a million columns and a million
rows what would I do I I mean I would
need to start searching for it Right. I
would need to be like like if I'm
searching for revenue, I'd be like
searching Ctrl+F revenue and then I'd go
check each of these like results and I'd
be like is this right? And then like I'd
see like hey is there a number here? And
then I'd probably keep a scratch pad
like a new sheet where I'm like hey like
equals revenue equals this you know and
and and store this reference and and
keep going. So I I think that's a good
way of thinking about it is like the
model should you should never like read
the entire spreadsheet into context
because it would it would take too much
right like um you want to give it like
the starting amount of context that's
also how you work right like let's say
that you open up the spreadsheet what
you see is rows is this right you see
like the first 10 rows and the first
like you know 30 columns or something
right that's what you see you don't load
all of it into context right away you
probably have an intuition for like,
hey, I should load more of this into
context, right? And and like, oh, I
should navigate to this other sheet,
right? And this other sheet has more
data, right? Um, but you need to like
sort of you gather context yourself,
right? And so the agent can operate in
the same way. It can like navigate to
these sheets, read them, like try and
like keep a scratch pad, keep some notes
and keep going. So that's how I would
think about it. Uh, yeah, in the back.
>> Yeah. So my question is about managing
context pollution and actually I guess
relates to the previous question. Um do
you have a rule of thumb for you know
what fraction of the context window do
you use before you start hitting
diminishing returns or it becomes less
effective?
>> Yeah the question is yeah context
management. Do you have a rule of thumb
for like uh how much of the context
window to use before it becomes less
effective? This is actually I'd say a
pretty interesting problem right now.
Um,
I think a lot of times when I talk to
people who are using cloud code, they're
like, I'm on my fifth compact. I'm like,
what? Like like I've I like almost have
never done a compact before. You know
what I mean? Like I have to like test
the UX myself by like like forcing
myself to get compacted. Um just because
like I I tend to like clear the context
window very often right when I'm using
cloud code myself just because like um
at least in in code the state is in the
the files of the codebase right so let's
say that I've made some changes uh cloud
code can just look at my git diff and be
like [snorts] oh hey these are the
changes you made it doesn't need to know
like my entire chat history with it you
know in order to continue a new task
right and so in cloud code I clear the
context very very often often and I'm
like, "Hey, look at my outstanding get
changes. I'm working on this. Can you
help me extend it in this way?" Right?
That's like a way of thinking about it.
And um when you're building your own
agent, like let's say we're building a
spreadsheet agent, it gets a little bit
more complex because your users are less
technical, right? And they don't know
what a context window is, right? Um that
is like I'd say a hard problem. I think
there's like some UX design there of
like can you reset the conversation
state, right? like can you maybe every
time the user asks a new question can
you do your own compact or something and
can you like uh summarize the context?
Um does it like in a spreadsheet a lot
of the state is in the spreadsheet
itself so it probably doesn't need you
know to know the entire context. um can
you store user preferences
um as it goes so that you remember some
of this stuff you know like there's a
lot of like again like it's an art
there's like so many different angles
and ways in which you can do this right
um but yeah you are trying to like sort
of minimize context usage um you
probably don't need s a million context
or something you know I mean like you
just need good context management like
UX design yeah um yeah
>> um just I just want to ask the sub
agents were made to protect the conduct
of the core agent. Right.
>> That's right. Yeah. Sub agents were made
to
>> spreadsheet. Would we be able to use
multiple sub agents and try to make a
process where we chunk up the
spreadsheet in the case where it's super
large. So then the agents can kind of
run through each portion like in
parallel with each other.
>> Yeah. Yeah. I mean um Yeah. So like one
of the things I love about cloud code is
that we are like the best experience for
using sub agents like especially sub
agents with bash. It is very very good.
I didn't really quite realize uh all the
pain. Um I think if anyone's going to
QCON, I believe Adam Wolf is giving a
talk on QCON about how we did the bash
tool. Adam's a legend and the bash tool
such a good job. Um when you're running
parallel sub agents at the same time,
bash becomes like very complex and there
are lots of like like race conditions
and stuff like that and and so there's a
lot of work that we've solved there,
right? So this is like one of the things
I love about cloud code is you can just
be like hey like spin up three sub
agents to do this task and it will do
that and in the agent SDK as well you
you can just ask it to do that. So
number one sub agents are a great
primitive in the agent SDK and I haven't
seen anyone do it as well. So that's
like a big reason to use it. Um yes
generally you want it you want these sub
agents to preserve context. Let's say
you have if you have a spreadsheet, you
could potentially have multiple read sub
aents going on at the same time, right?
So maybe the main agent is like, "Hey,
can this agent read and summarize sheet
one? Can this agent read and summarize
sheet two? Can this re agent summarize
sheet three?" And then they return their
results and then the agent maybe spins
off more sub agents again. Right? So
this is like another
knob you have. Um, and I I think what I
want to say is like
there's like we've talked so many so
much about like all these different
creative ways that you can like do
things. This is like the level at which
you should think about should have to
think about your problem. You should not
really in my opinion think about like uh
like how like how do I spin off a
process to make a sub agent or like you
know like the system engineering between
like behind like what is a compact or
something right? So like we take care of
all of this for you in the harness so
that you can think about like hey what
sub agents do I need to spin off right
and like how do I create a a a genic
search interface and how do I like
verify it's work these are the really
core and hard problems that you have to
solve [laughter] and any time you spend
not solving these problems and and
solving like lower level problems you're
probably not delivering value to your
users you know and and so um yeah I I
think sub agents big fan of the agent
SDK in case of yeah uh yeah
>> so uh like we have this uh text and the
verification task so where exactly we
need to put the verification in this
example I let's say after generation of
the SQL query I can verify it is the
right query is generated or not that is
the one path second path is like
generation the query directly executing
and once I will get the output then I
will do the verification so uh and how
how agent can choose dynamically like
which one is the right path?
>> Yeah. So the question is like where do
you do verification? Uh is it only at
the end? You do it in the middle like
things like that. I would say like
everywhere you can just like constantly
verify right like uh like I said we do
some verification in the read step of
the of cloud code right so that's like a
great example um you can do it at the
end you should absolutely do it at the
end but at any other point if you have
rules or heruristics especially uh like
if for example you're like hey one of my
rules is that you shouldn't do like the
the total number of columns you should
search is should be under 10,000 or
under a thousand or something that's
like a a nice way of doing it, right?
Like similarly here like maybe you
shouldn't be inserting like a huge like
row like of of values like give feedback
to the model be like hey chunk this up
right you throw an error and give a
feedback and the great thing about the
model is like it listens to feedback it
will read the error outputs right and
then it'll just keep going so yeah
verification is definitely like I I know
I have it in this like as a sort of a
loop but um it's definitely more like
verification can happen anywhere and and
should happen anywhere like like put it
in as many places as you can. So, um all
right, I do need to start doing some of
the prototyping, but I'll take one more
question. So, right here. Yeah.
>> How do we say how do we form the steps?
I mean, like how do we say the agent
that go search first and then this step
and then do that step?
>> How does the loop actually from the
start point to the end? How do we
>> you just tell it? So, like uh
>> like is it in a system prompt or
>> Yeah, in the system prompt. Yeah. So
like with cloud code, we just give it
the bash tool and we're like, "Hey, like
gather context, read your files, uh do
stuff like run your linting, you know
what I mean?" Um, and so yeah, again
with the agent, you don't need to
enforce this, right? You don't need to
tell it, hey, like you need to do this
because like sometimes it might not be
necessary, right? Like let's say that
someone is asking a readonly question
for your spreadsheet.
you don't need to like verify that uh
like you're that there are no compile
errors, right? Because there you haven't
done any write errors, write write
operations, right? So, um let the agent
be intelligent and and like in the same
way that you would like that same
freedom when you're doing your work,
right? Uh you're trapped in this box or
whatever like same way, right? Uh so,
okay, cool. I I I do want to try and see
if I can do some prototyping now that we
have this uh uh the the holder as well.
Um okay, yeah, execute lint. We've done
a bunch of Q&A. Okay. Prototyping. Okay.
Let's say that you have an agent, right?
Like you want you want to build an
agent. You come out of this talk and
you're like great. I have a bunch of
ideas. How how do I do this? Um I think
what I say overall is like building an
agent should be simple. Your agent at
the end should be simple, but simple is
not the same as easy, right? So like it
should be very simple to get started and
it is just go to cloud code, give cloud
code some scripts and libraries and uh
custom cloud identities and ask it to do
it, right? That's what we're going to
do, right? Um that's like it should be
so easy to be like, hey, this is my API.
This would be like an API key. uh can
you like go search like you know I
[clears throat] don't know like my
customer support tickets or something
and organize them by priority or
something like that right and then look
at what cloud code does and and and
iterate on it right and this is like a
great way of like just skipping to like
the hard domain specific problems that
you have right so you have a lot of like
domain problems like how do you organize
your data your agentic search how do you
like create guard rails on your database
these are all questions that you can
just start solving right away with cloud
code, right? And so try and like build
something that feels pretty good with
cloud code. And I think generally what
I've seen is that you can do this and
get really good results just out of the
bat using cloud code locally, right? And
and you should have high conviction by
the end of it, right? And so um yeah, I
think like [laughter]
I forgot more info. Watch my AI engineer
talk. Uh this is like a deck for
internal that we were using. Um okay. So
uh yeah, I'm going to be inserting this.
So yeah, you're getting what we what we
show customers, right? So um okay. Uh
yeah. So yeah, use use cloud code. Uh
again, simple but simple is not easy,
right? So like the amount of code in
your agent should not be like super
large. Doesn't need to be huge. doesn't
need to be extremely complex, but it
does need to be elegant. It needs to be
like what the model wants. You want to
have this interesting insight. Let's
turn the the model into a SQL query. Oh,
let's turn this spreadsheet into a SQL
query and then go from there, right? So,
um, think about it that way. And cloud
code is like a great way of doing that.
So, okay, uh, let's make a Pokemon
agent, right? This is what we're going
to do. Uh, Pokemon is a game with a lot
of information. There are thousands of
Pokemon, each with a ton of moves. Um,
uh, we want to be pretty general. And so
there is actually like a Pokey API. Um,
and the reason I chose Pokemon is just
because like I know that you guys have
your own APIs as well, right? And
they're all like very unique, right? And
uh, so I wanted to choose something with
a kind of complex API that I haven't
tried before. Um, so the Poke API has
like, you know, you can search up
Pokemon like Ditto. Uh, you can search
up like items and things like that. Um,
and so it's got this like yeah, this
custom API. You've got everything in the
games, right? So, um, and yeah, like one
of the Quest things
agent might want, your user might want
to do is make a Pokemon team, right? I
love Pokemon. I know very little about
making an interesting Pokemon team for
competitive play. Uh, could my agent
help me with that? That'd be that'd be
cool, right? So, um, my goal is to make
an agent that can chat about Pokemon and
then we will like, you know, see what we
can do, right? And and and how far we
get. So, um, I've done like some of this
work already and I will like open up and
show you. So, um, the first step and the
prompt here is like the first step is
I'm I'm going to do mostly code
generation for this, right? And so, um,
let me
Is that going to be on GitHub somewhere?
>> Uh, actually it is. Uh, yeah, it's on my
personal GitHub.
>> Oh, yeah. I was going to commit all of
this as well.
>> Yeah.
>> Um, yeah. Yeah. So, uh, I think my
personal GitHub is, let's see. All
right.
>> Is it secure GitHub or does it have
malware in [laughter] it?
>> You guys are AI engineers. Yeah. Like,
if you can get owned, that's that's your
fault. Um,
yeah. So,
um, yeah, you can you can clone this if
you'd like. Um, I need to push the last
change this. So, okay. So, um, yeah. Can
you guys see this? Should I put it in
dark mode instead or is this fine? Like,
um,
>> dark mode.
>> Dark mode. Okay. [laughter]
>> Okay. Okay, this better.
>> No.
>> You want a different dark mode?
>> Dark hard. Okay. I don't think this as
good as it's going to get, guys. Um,
okay.
I Is it How does this work? Can you guys
still hear me or
>> Okay. Um, okay. So here's an example of
like I've taken the the prompt I gave it
was
hey I go search Pokey API for its API
and create a TypeScript library right
and so this is all by coded um and so
you can see here that it's created this
like interface for Pokemon right and so
it's created like this Pokemon API I can
get by name I can list Pokemon I can get
all Pokemon I can get species and
ability abilities and stuff like that.
And so like this is just a prompt that I
gave it, right? And it generated this
like TypeScript API. It also did it for
moves. Um and then it's created this um
like uh it's created this like API that
I can use import Poke API right from the
Poke API SDK. And uh yeah, you can see
like sort of how it's like set set this
up. And uh now in contrast, right, and
and so this is the cloud. MB, right?
This is a TypeScript SDK for the Pokey
API. Um, this is like the the modules in
the Pokey API. Here are some of the key
features. Um, uh, I'm asking it to write
scripts in the examples directory and
then it will execute those scripts to
help me with my queries, right? Um, and
I give it some example scripts. It
doesn't always need all this
information, right? Like, uh, but yeah,
fetching Pokemon, listing the resources,
getting data, things like that. So this
is like my agent really. It's like a
prompt I gave it to generate a
TypeScript library and then this
cloud.md and I I can chat with it in
cloud code. I'll also show you a version
of it that is just tools, right? So here
I'm using the messages completion API,
right? And I've given it a bunch of
tools from the API. So like get Pokemon,
get Pokemon species, uh get Pokemon
ability, get Pokemon type, get move. So
you've defined all of these tools and
you can see that like you know I also
just gave it a prompt and told it to
make the tools. Um it doesn't want to
make a 100 tools right like there's a
ton of smoke on or sorry um pokey API
data. Um but like it you know there's
only so many parameters it can do. So
it's got this like tool call and now um
and I I made like a little chat
interface with it. Right. So let me now
go here and say like uh this is my tool
calling.
Um
>> did I
great. So yeah, here we've got this
chat.ts, right? Um
I I use bun when I'm prototyping stuff
just cuz like I don't want to compile
from Typescript to JavaScript. Um and uh
again bun has like linting built into
it. Uh it's a way of like simplifying
for the agent so the agent doesn't need
to remember to compile but TypeScript is
better for generation because it has
types right. I'm going to start this
like fun chat and then I'm going to try
like, okay, what are the generation
two water Pokemon?
Um, and you'll see that it's it's
starting to like search and I'm logging
all the tool calls here. This is very
very important, right? Because like it
needs to like do the tool calls. And so
you can see that what it's doing is like
it's searching a bunch of Pokemon. Um,
and then it told me, okay, here are the
water Pokemon for Gen 2, right? It's got
Toadile, Crocenoff, or alligator. You
can see sort of like how it's thought
like in between each step, it's thinking
through um the previous steps. Right
now, like let's say that I want to do
with claw code. I think I might need to
>> uh
I need to delete this example.
>> Um Oh, yeah.
>> Small question. How do you log the the
tool calls? It's like there's just an
argument you can
>> Oh yeah, this is um this is like in the
normal API, right? So I just like uh in
the model every time it logs it, I just
call this this is in the like normal
anthropic API um in the SDK. I I'll get
back to get to the SDK. Um it's just
like you just log every system message.
So, um, just doing it in console logs.
Does that make sense or Yeah. Okay.
Yeah.
>> So, so that chat interface you were
showing, is that just using the regular
API or
>> Yeah, that's using the regular API.
>> So, not the agent SK,
>> not the agent SDK. Yeah. Yeah. And so,
what I'm going to do here is um here I'm
going to delete the script
because I don't want it to cheat. Um,
but okay. So, here you know that um I've
I'm just opening cloud code. I've
created a bunch of files here. I'm going
to say like, can you tell me all the
generation 2 water Pokemon?
Um, and then we'll see what it can do,
right? So, um, [clears throat] I forget
if I need to prompt it to write a script
or something. I think it'll be fine.
We'll see what happens.
>> Do you mind going to the core SDK file
and just showing you talked about
getting context and then action and then
verification? Can you show that in the
code and how we're configuring the tool
description?
>> Yeah. So, uh, we haven't done the SDK
part yet. So, so far I've just put put
some APIs in cloud code. Yeah. Yeah.
>> Sorry, I thought I missed that. That's
>> Yeah. Yeah. Yeah. Of course. Okay. Um,
but yeah. So, okay. You can see here um
it's it's given me a lot more, right?
And um
>> yeah, it's given me a lot more. So, it
it it's it's saying there's 20 water
Pokemon, right? And I think this is
roughly right. I've like um
uh what did it do?
I think it just knows. Okay.
That's funny. Live this. Um
um anyways uh
yeah Pokemon is slightly in distribution
which is which is I I guess good
[laughter]
um but yeah so like what what it will do
is like it will try and like write like
a script and uh because you don't want
it to think as much right so here it's
like okay what I'm going to do is
um let's see gen two water type Pokemon
and where is it?
Okay. So, yeah, you can see here it
knows like, okay, the start of the
generations. It fetches these uh per
API. Um I guess this decided not to use
like my pre-built API here. Um
and then uh yeah, and and then runs it,
right? So, um I think I need to like
improve the cloud. MV for this. But
anyways, you can see that like it's able
to like check 200 plus Pokémon and then
check for their type and and you know
get their get their information, right?
So this is like uh just a quick example
on like how to do codegen and how to use
cloud code to do it, right? So um we'll
run this script and then like uh um like
keep going, right? So, uh it will give
me the output and um yeah, basically
what I want to show, let's see, we have
roughly 15 minutes left. Um
>> play Pokemon.
>> The time play Pokemon. Yeah. Yeah.
Actually, this is one of the demos I was
thinking of doing. Um Cloud Code plays
Pokemon. So, like let's say you want to
do like an agentic version of Cloud
Plays Pokemon. How would you do it?
um
what you would do I think is like you
would give it access to the internal
memory of the uh the ROM right and so
let's say that it wanted to find its
party it could search that in memory and
Pokémon Red is like a very well in
distribution uh reverse engineered uh
game right and so it could search in
memory to be like hey these are the
Pokemon um these are like this is how I
figure out where the map is this how I
navigate it right so this is like maybe
exercise to the reader if you want to
try it out. It's like um there is like a
no.js GBA emulator. Um I think I have to
legally say you have to go buy Pokemon
Red and try it. Um but yeah, I think
like uh yeah, good example. Anyways,
here so it's it's fetched all of them
and it's listed all their types and um
yeah, you can see how it's like used
code generation to do this, right? So um
a quick example of using cloud code to
prototype this. Um now there can be like
more interesting like data here. So um I
do want to leave time for example. So I
think I'll just sort of like for
questions. So I'll just sort of go
through like an example. Let's say
you're making competitive Pokemon.
Competitive Pokemon has a lot of
different variables and data. So, this
is like a a
text file from this online like a
library basically which stores like all
of the Pokemon and their like moves and
who they work well with and don't work
well with and you know like who they're
countered by and all of these things,
right? So, there's a ton of data here,
right? And it's all in text file. Um,
which is actually pretty good for cloud
code, right? because I can say like,
okay, um, hey, I'm going to give it a
little bit more data. Normally, I put
this in the, um, check the data folder.
Tell me,
I I want to make a team around Venusaur.
Can you give me some suggestions based
on the Smogon data? Um,
and Smoke on is like this online API.
And so I'm I'm not entirely sure what
it'll do here yet. I haven't done this
query before. Uh, but we'll see. I think
it'll be it'll be fun. Um,
where am I? Oh, I see.
Um, yeah. But what I wanted to do is
sort of grapple through this this data,
right? And and sort of figure out from
itself from first principles, not having
seen this data before, how can I like
answer my query, right? So um while it
does does that I'll I'll take any
questions. Yeah.
>> Um first of all great work job. Uh so
this is like really on top of cloud code
>> and so my question is if we were to
deploy this customer basically
>> are we supposed to have cloud code
running in like a like a swarm or are we
somehow able to take the cloud code part
out just bot and the agent SDK?
>> Yeah. So, let me show you like very
quickly like what the what it looks like
to use the agent SDK here. Um, so I've
already done this file system, right?
And again, I want you to think about the
file system as a way of doing context
engineering, right? Like this is like a
lot of the inputs into the agent. So, my
actual agent file is like 50 lines,
right? Um, and it's mostly just like
random like boiler plate, right? Like I
guess, yeah, it's decided to stop it
from
uh writing scripts outside of the custom
scripts directory. Again, fully
backcoded. So, um yeah, you can see like
it just runs this query, takes in the
working directory
um and uh like like runs it in a loop,
right? And so probably I'd want to like
turn into like some allowed tools here
and stuff, but it it's very simple. And
and so um if I were to like
productionize this, the first step I do
is like okay, I I've tested it on cloud
cloud code. It seems to do pretty well.
I write this file. Then I put it there
are two ways to do it. So one is I do
think that like local apps might be
coming back with AI because I think that
like there's such an overhead to running
it. Like for example, cloud code is a
front-end app, right? Like it works on
your computer. So maybe the way I shift
this as a Pokemon app is like hey I have
like an app that you install and it
works locally on your computer and
writing scripts. I think that's one way
of doing it, right? Um the other way is
yeah you have you [clears throat] host
it in a sandbox. Um and again there's a
bunch of different sandbox providers
that make it really easy like Cloudflare
has a good example um of using the agent
SDK and it's just like sandbox.st start
you know and then like bun agent.ts ts
and that's kind of all it takes, right?
Like it's like like they've abstracted
away a lot of it. Um so you run like the
sandbox um and then you communicate with
it and um yeah I think there is like
some very interesting stuff that I'm not
sure I had time to get to but um like I
I think some interesting questions are
like um
yeah like how do you do this sort of
like service now you're just spinning up
a sub like a sandbox per user. Um,
there's a lot of like I'd say best
practices to solve here. One thing I
just want to call out for you guys to
think about, um, if you're making a an
agent with a UI, like let's say that you
have, uh, yeah, my Pokemon agent and I
wanted to have an UI that is adaptable
to the user, right? Like maybe some
users are doing team building, some
users are helping it with their game,
some users just want pictures of
Pokemon. How would how would I have an
agent that adapts in in real time to my
user, right? Um the way I would do it is
in my sandbox, I would have a dev
server, right? And the dev server would
expose a port. Um it would run on bun or
node or something. It would like expose
a port. The agent could edit code and it
would live refresh and and your user
would be interacting with that website.
This is how a lot of like site builders
like lovable and stuff work, right? they
they use sandboxes and they host
essentially a dev server. And so
thinking about this for your users, if
you want a customized interface, this is
a great way to do it. Um, okay, let's
see. Let's see what it did. Um,
okay, cool. Okay. So, um it's like
written this like script has generated
like show me some base stats and
suggested it a like um uh a move set and
some teammates and you can see sort of
like see what did it do? Um control E.
Um
yeah. Okay. Okay. So, you can see here
what it started doing is like it started
searching for Venusaur, right? And it
started finding uh those types the like
those Pokemon and when it does that it
also gets other Pokemon that mentioned
Venusaur. So, it gets like its teammates
and it counters and stuff, right? And
it's sort of over this time found
interesting Pokemon, right, that like it
might work with, right? So, it's done a
bunch of these searches and it's gone
these profile. It's found most common
teammates and and written a script to to
analyze it, right? And so this is all
based on a text file. Of course, I could
have pre-processed a text file a little
bit more. Um, but yeah, it's like done
this sort of like interesting
um analysis for me, right? And again,
I'll I'll push out more code to the
GitHub repo. And um I'll also tweet
about this. I'm on Twitter. I'm uh
TRQ212.
Uh I tweet a lot. So, uh, definitely
like mostly about agent SDK stuff. Um,
but yeah, we have about 8 minutes left,
so I want to spend the rest of time
taking questions about kind of anything,
you know, and I'm sorry we didn't get to
do more prototyping. Um, but, uh, yeah,
over there.
>> Yeah, I was going to say it's a flaw
play. Can you uh sort of plug this in
with that just to see if the agent will
uh be more selective with the team it uh
tries to capture?
>> Yeah, put it in in Cloud Play's Pokemon.
Yeah. Yeah, I do want to play CL
codeplays Pokemon. I think that would be
fun. Yeah. Yeah. I I think cloud plays
Pokemon. I think we try and keep it like
a pure reasoning task as much as
possible. Yeah. Um other questions?
Yeah.
>> I was curious about how people are
monetizing
like
you know kind of like
you kind of like lose the opportunity to
get all the margins.
>> Yeah. I'm curious like
shipping your own SDK so that they kind
of take the usage base.
>> Yeah. I I do think overall, especially
right now, agents are kind of pricey,
you know what I mean? Because like um
the models are have just started to get
agentic. We really focus on like having
the most intelligent models, you know,
and like you generally this is just like
an overall like SAS business software
thing. You'd rather charge fewer people
more money that really have like a hard
problem, you know? And so I think this
is still good. like you probably should
find um you know these hard use cases
but I would say like number one make
sure you're solving a problem that
people want to pay for right is like the
number one step right and then number
two um yeah I think you could do
subscription or token based I I think
this kind of comes down to like how much
you expect people to use your product uh
versus like how much you expect them to
like use it occasionally like cloud code
obviously people use a lot and in order
to like we do a mix of like if we give
you some rate limits and if you exceed
it we do uh usage based pricing. Um I
think that like yeah it's very like
dependent on your own user base and kind
of like what they will do but I will say
monetization is something you should
think about up front and design your you
know agent around because it's hard to
walk back these promises.
Um, yeah, back there.
>> I haven't heard you talk at all about
hooks and be curious to hear your take
on how
>> uh Yeah, there's so much to talk about.
Um, hooks are great. We we do ship with
hooks. Um, hooks are a way of doing
deterministic verification in particular
or inserting context. So, um, you know,
we fire these hooks as events and you
can register them in the in the agent
SDK. There's like a guide on how to do
that. Um, examples of things you might
use hooks for is like for example, um,
yeah, you can run it to verify the like
a spreadsheet each time. Uh, you can
also like let's say I'm working with an
agent and, uh, I'm the agent is doing
some spreadsheet operations and the user
has also changed the spreadsheet. This
is an interesting like place to use a
hook because you can be like hey has
after every tool call insert changes
that the user has made uh and and so
you're giving it kind of live context
changes um in an interesting way. So um
yeah I think uh uh yeah there there's
more stuff on like the docs about hooks
um and happy to like talk about it
afterwards as well. Yeah, more
questions. Yeah.
like I do
in
>> I realize it's working.
>> I want to take the same conversation
that I've already done because I'm going
through
>> and convert that into a new
>> okay
>> which is that I followed a few steps now
it's actually working but I don't want
to rewrite all of the code to write
[clears throat] it
like because it works.
>> Yeah. Sure. Yeah. So like let's say
you've done this prototyping, you found
something that works. What I would do is
like I somewhere the cloud MD like
obviously like when I tried doing this
one time it like didn't use my API
directly and it wrote JavaScript. I
should have been more specific in my
cloud. Mmd to be like hey you should use
this. Um [snorts]
I yeah I think like that's one thing. Um
the second thing is uh yeah do summarize
in terms have the helper scripts that
you need and then like write something
like this agent.ts script for like to
run the test again. Uh yeah more
questions in the grade. Uh yeah, I just
put it a Pokemon and I think it's lying
about using the scripts answer. It tries
a couple times like my SDK isn't very
good it tries twice and then it's like
oh here's your comparison table but it's
just because it's a distribution. Do you
have any advice for that kind of
problem?
>> Yeah, this is a good question and and
you know like I'm I think there is some
messiness, right? Like I I think one of
the things if an agent knows an answer
um and you want to like sort of like
fight it kind of to be like okay like no
it's generation 9 now and like Venusaur
stats have changed and there's like this
new like charact like um this is hard I
actually think uh one of the ways of
doing that is hooks. So you can say for
example like hey uh don't if if you've
like returned a response without writing
a script you know you can check that you
can be like give feedback to bit like
please make sure you write a script
please make sure you read this data
right and and you can use hooks to like
give that feedback in in the same way
that in cloud code uh we have these like
rules like make sure you read a file
before you write to it right so add some
determinism uh it can definitely be like
I said it's an art you know sometimes
you know yeah maybe like like writing
course I guess probably um [laughter]
yeah and the gray
>> how are you guys dealing with like large
code bases I'm working like a 50 million
plus code base and so
>> bre tool doesn't work really
>> um so I'm having to build like my own
like semantic indexing type thing to
kind of help with that right
>> is there any kind of like addthropic
maybe thinking about how that can be
more native to the product like you know
in a couple months is the thing I'm
writing just going to go away or like
how how do you guys think about
Okay, your last question in a couple
months is you're thinking to go away
generally. Yes. Yeah. [laughter] Anytime
you ask about AI, yeah. Uh I think that
um
semantic search this is a cloud code
question more than a security question,
but happy to answer it. Like um we you
know there are trade-offs of semantic
search. It's more brittle. Um I think
you have to like index and and and
search and we it's not necessar the
model is not trained on semantic search
and so I think that's sort of like a
problem like you know grap it's trained
on because it's like it's easy to do
that but like semantic search you're
implementing your bespoke query um for
like very large code bases you know we
have lots of customers that work in
large code bases I think what I've seen
is sort of like they just do like good
claw mds you start in you know trying
Make sure you start in the directory you
want, have like good like verification
steps and hooks and links and things
like that. And so u you know that's what
we do. We don't have you know a custom
we dog food clunk code, right? So um
yeah.
>> Okay. Yeah. Last question.
>> We have to close unfortunately actually.
Give it up for Tariq everyone.
[applause]
[music]
>> [music]
>> Heat.
Ask follow-up questions or revisit key timestamps.
The presentation introduces the Claude Agent SDK, built upon Claude Code, as a framework for developing autonomous AI agents. It highlights the evolution of AI features from single LLMs to structured workflows and then to agents that build their own context and trajectories. A core tenet, dubbed the "Anthropic way," emphasizes the bash tool and file system as powerful, composable primitives for agent design, enabling code generation for both coding and non-coding tasks. The agent design loop involves gathering context, taking action, and verifying work, with a strong focus on verification. The speaker also discusses security measures like the "Swiss cheese defense" and practical considerations for agent development, including the use of tools, bash, and code generation, and managing context with sub-agents, particularly for large datasets. Prototyping is encouraged using Claude Code to quickly iterate on domain-specific problems.
Videos recently processed by our community