From Stateless Nightmares to Durable Agents

From Stateless Nightmares to Durable Agents — Samuel Colvin, Pydantic

Transcript

0:00

Hi, I'm Samuel from Pydantic and today I'm

0:03

going to give a demo of Pydantic AI,

0:05

Temporal and Pydantic Logfire. I'll also

0:08

cover Pydantic Evals. So we have in

0:12

Pydantic AI support for Temporal and

0:14

DBOS, two durable execution frameworks.

0:16

We're actually adding a bunch more. I

0:18

think we've had something like five pull

0:19

requests to add other durable execution

0:22

or like workflow orchestration backends.

0:25

But at the moment it's these two.

0:26

And I think it's fair to say Temporal

0:27

are like the big incumbent in this space

0:30

and they're kind of, I guess,

0:32

leaders in how you do this. To

0:34

demonstrate this, a simple example of,

0:36

like, I go ask an LLM a question, it replies,

0:40

mostly just works and we don't need the

0:41

durable execution component. So, but

0:43

once we get into longer running

0:45

workflows, that's where it really

0:47

becomes a problem. In particular, where

0:49

we've done enough compute that we don't

0:50

want to lose it or we've spent enough

0:53

time on that compute that we really

0:54

don't want to have to start again for

0:55

the user. That's what for example I

0:58

think OpenAI... I think it's public

1:00

that OpenAI use Temporal for their

1:02

deep research and I think some of the

1:04

other LLM deep research do the same

1:06

thing. So, I'll start with a kind

1:08

of toy example and then I'll move on to

1:09

a more deep research type example, in

1:12

fact a deep research example. But before

1:13

we get into that, let me run this

1:15

example without durable execution. So this is two

1:18

agents that play 20 questions. So

1:22

instead of just a yes/no answer, they

1:23

get to give a little bit more detail

1:25

like yes, kind of not really, no,

1:27

completely wrong. Um that was because I

1:29

was getting bored of waiting for them to

1:30

take ages to succeed with

1:32

just the yes/no. So, we have an answer

1:35

agent which runs a relatively small

1:37

model, Haiku 3.5, because I didn't know 4.5

1:39

came out until about an hour ago. And

1:41

this basically takes a question and

1:43

answers yes... well, with one of these

1:45

answers, and it gets added into its

1:48

context the secret object that

1:49

you're looking for um which in this

1:51

example is a potato. So, and then we

1:54

have the questioner agent, or the

1:57

player agent, which has a bit more context on

2:00

what it's going to go and do. You don't

2:01

need to read through all of this

2:02

stuff. This code is public now. It's

2:05

a pull request, but I'll merge that

2:06

afterwards, but you should get the idea.

2:08

Um, and the way that the questioner

2:12

agent gets to ask its questions is by

2:15

calling a tool, ask a question. Inside

2:17

that tool, we run the other agent, the

2:20

answer agent to basically decide the

2:22

answer to this question, and then we

2:24

respond. And it takes a little bit of

2:26

time to run. You can see in this case,

2:27

it succeeded pretty quickly. Sometimes

2:30

it's amazing how, even with these very

2:32

simple questions, even very intelligent

2:34

LLMs get themselves completely confused

2:36

and go down some weird track and

2:38

like get very confused. But you can see

2:40

in the last run that it asked a bunch of

2:42

questions, got down to like is this a

2:45

fruit or vegetable? Is it a fruit? No.

2:47

So it knew it was a vegetable, is it

2:49

orange? And it worked out the answer was

2:51

potato. But obviously... and here it is

2:53

running again. I don't know how many

2:55

steps it's going to take. Sometimes it

2:56

can take up to like 50 steps to get this

2:59

question right. And obviously the

3:00

problem is if this dies either because

3:03

we have some unreliable endpoint

3:07

within our system or because we're

3:09

running in the cloud and Kubernetes

3:11

decides it wants to scale or whatever it

3:13

might be. If we run this again, we

3:15

obviously have to start from scratch.
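
The structure described above can be sketched with plain functions standing in for the two LLM agents. This is a hypothetical stand-in, not the Pydantic AI code from the talk: the names `answer_agent`, `ask_question`, and `questioner_agent` are illustrative, and the questions are hard-coded where a real model would generate them. The point it tries to show is that all progress lives in process memory, so a crash throws it away.

```python
from typing import Optional, Tuple

SECRET = "potato"  # hard-coded for the sketch

# Canned questions and replies; a real questioner model would invent these.
QUESTIONS = {
    "Is it alive?": "no",
    "Is it food?": "yes",
    "Is it a fruit?": "no",
    "Is it orange?": "not really",
    "Is it a potato?": "yes",
}


def answer_agent(question: str) -> str:
    """Stand-in for the small answering model."""
    return QUESTIONS.get(question, "kind of")


def ask_question(question: str) -> str:
    """The tool the questioner calls; it runs the other agent to get a reply."""
    return answer_agent(question)


def questioner_agent(crash_at: Optional[int] = None) -> Tuple[Optional[str], int]:
    """Stand-in for the questioning model; returns (guess, steps taken)."""
    transcript = []  # progress exists only in this process's memory
    for step, question in enumerate(QUESTIONS, start=1):
        if step == crash_at:
            raise RuntimeError("process killed")  # simulated pod eviction
        reply = ask_question(question)
        transcript.append((question, reply))
        if SECRET in question and reply == "yes":
            return SECRET, step
    return None, len(transcript)


try:
    questioner_agent(crash_at=4)
except RuntimeError:
    pass  # three replies were collected, then lost with the process

guess, steps = questioner_agent()  # the rerun starts again at question 1
print(guess, steps)
```

Nothing in this version remembers the first three replies, which is the gap durable execution is meant to close.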

3:17

That is problematic in this case. Um but

3:20

you can imagine as the tasks get longer

3:22

and longer just restarting it gets more

3:24

and more problematic. So I think the

3:27

other thing to say about this

3:30

20 questions example is, although it's

3:32

pretty simple to understand and it feels

3:34

like a toy, it is actually directly

3:36

equivalent to a deep research case where

3:38

effectively the agent is, like, going

3:41

off on a quest to go and find an answer

3:43

to a question where it needs to ask the

3:45

like you know troll at the bottom of the

3:48

garden the, like, question to

3:50

the next riddle to get to the next

3:52

endpoint, right? Like, deep research is

3:53

effectively this 20 questions just with

3:56

web search or RAG or

3:59

whatever it might be as your

4:00

intermediate steps. So let's turn that

4:03

20 questions into a durable agent.

4:07

so this is mostly the same code for

4:09

simplicity. I've actually copied it here

4:10

but you see we have our answer agent. We

4:13

need to wrap it in this TemporalAgent,

4:15

which gives us another thing that

4:16

behaves like an agent, like a Pydantic AI

4:19

agent. So it's also a subclass of

4:20

AbstractAgent. We do the same with our

4:23

questioner agent. To keep things

4:25

simple, we aren't doing the same

4:27

stuff about passing around the answer in

4:30

context. We've just hardcoded the answer

4:32

into the system prompt here for

4:34

the answer agent. But apart from these

4:37

like adding the Temporal wrappers, you

4:39

can, as I will show later, just apply

4:42

durable execution later. But here's

4:44

where the Temporal bit comes in. And

4:47

I'm not a salesperson for Temporal. And

4:48

although the underlying stuff they do is

4:50

amazing, I do think some of their Python

4:52

abstractions are kind of ugly. But

4:54

anyway, the principle of Temporal is

4:56

that you have workflows and activities.

5:00

And workflows need to be entirely

5:02

deterministic and activities then need

5:05

to do anything that is non-deterministic

5:08

like IO in particular. So you can

5:10

basically do anything inside a workflow

5:11

other than IO and calling random. And if

5:14

that's the case, then you

5:15

have a deterministic system. And what

5:19

you can think of what Temporal is doing

5:20

in the background is it's running that

5:23

workflow and it's basically recording

5:25

every activity that runs, and both

5:28

the inputs to that and the outputs.

5:30

And so if you want to rerun it from like

5:33

from the beginning to a certain point,

5:34

it can basically plug in those answers.
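
The record-and-replay idea can be illustrated with a toy model. This is a deliberately simplified sketch and not Temporal's implementation or API: `ToyWorkflowRun`, `call_llm`, and `workflow` are hypothetical names, and the history here stores only activity outputs in call order, whereas the talk notes that Temporal records inputs as well.

```python
from typing import Any, Callable, List, Optional


class ToyWorkflowRun:
    """One execution (or replay) of a workflow against a shared history."""

    def __init__(self, history: List[Any], crash_after: Optional[int] = None) -> None:
        self.history = history          # recorded activity results, in call order
        self.crash_after = crash_after  # simulate the process dying after N real calls
        self.position = 0               # index of the next activity call
        self.real_calls = 0             # activities actually executed in this run

    def activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.position < len(self.history):
            # Replay: plug in the recorded result instead of doing the work.
            result = self.history[self.position]
        else:
            if self.real_calls == self.crash_after:
                raise RuntimeError("process killed")
            result = fn(*args)          # the slow, non-deterministic part
            self.history.append(result)
            self.real_calls += 1
        self.position += 1
        return result


def call_llm(prompt: str) -> str:
    """Stand-in for an expensive LLM request."""
    return f"answer to {prompt!r}"


def workflow(run: ToyWorkflowRun) -> List[str]:
    """Deterministic workflow code: every side effect goes through run.activity."""
    return [run.activity(call_llm, f"question {i}") for i in range(1, 6)]


history: List[Any] = []

try:
    workflow(ToyWorkflowRun(history, crash_after=3))  # dies before the 4th call
except RuntimeError:
    pass

resumed = ToyWorkflowRun(history)
answers = workflow(resumed)        # same code, run again from the top
print(resumed.real_calls)          # only the calls missing from history were executed
```

The replay only lines up because the workflow issues the same activity calls in the same order on every run, which is why the workflow code itself has to be deterministic.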

5:36

And I'll show what that looks like. So

5:38

this is how we define our workflow. The

5:40

activities here are implicit. The point

5:42

is that this TemporalAgent takes care

5:45

of turning all of the IO that you need

5:48

to do to call an LLM into activities in

5:50

the background, including tool calls.

5:52

So, OpenAI claim to have Temporal

5:54

support, but they don't support tool

5:56

calls as activities, which to me makes

5:58

it slightly a chocolate teapot. Like,

6:00

there's actually no point in having any

6:01

of these things without tool

6:03

calling, or little point. But we

6:05

define our workflow like this. I think

6:08

you can for the most part just copy

6:09

paste their definitions of how to

6:11

do it. Here we have our play mechanism.

6:14

The point here is we're

6:15

going to connect to the Temporal server,

6:17

which is what's going to record the

6:19

state of our task or our agent as it's

6:22

executing and be able to resume and

6:23

stuff. I have Temporal running locally

6:26

here. This is just the open source

6:27

version of Temporal, which runs as a

6:29

separate process and I can restart that

6:31

to kind of kill the state and that's why

6:33

we're connecting to localhost. In

6:35

production you use Temporal's cloud.

6:37

That's why they make so much money.

6:39

And here we run the worker. This is

6:42

where we're actually going to kick off

6:43

our workflow. In general, we just kick

6:45

off our workflow with execute_workflow.

6:48

We pass in the workflow that we

6:50

want to run. We pass the inputs. There

6:53

aren't any inputs in this case because

6:54

we just start. So, there aren't any

6:56

here. And it will then go and run. And

6:58

so, if I run this, you will see

7:03

it start to execute.

7:06

You'll see it running. The only couple

7:08

of things to note, there's a couple of

7:09

log messages. Ah, and we immediately

7:12

have this exception, "broken". And that was

7:14

because to simulate some system that's

7:17

unreliable, inside the tool I added that 20%

7:20

of the time it's going to

7:21

break. What you will see is that

7:23

Temporal has immediately taken care of

7:25

continuing after that. So even though

7:28

this broke, it will continue to run. And

7:31

I may have set 20% to be too high

7:33

because it's now failing all the time,

7:35

but it's actually going to continue and

7:37

deal with those runtime errors and just

7:38

continue to operate absolutely fine. Let

7:41

me... However, I think I dialed up 20% too

7:43

high just before. So, I'm going to

7:46

actually see if this is going to

7:47

continue to operate.
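
The retry behaviour seen here can be approximated by hand, which is the "you could implement it without Temporal" version. The sketch below is an assumption-laden stand-in: `run_with_retries` and `FlakyTool` are hypothetical names, the backoff values are arbitrary, and the tool fails on a fixed schedule rather than randomly 20% of the time so that the example is reproducible.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    activity: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 0.01,
    backoff: float = 2.0,
) -> T:
    """Call `activity` until it succeeds, waiting longer after each failure."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= backoff
    raise ValueError("max_attempts must be at least 1")


class FlakyTool:
    """Stand-in for the demo's unreliable tool: fails its first N calls."""

    def __init__(self, fail_first: int) -> None:
        self.fail_first = fail_first
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.fail_first:
            raise RuntimeError("broken")
        return "not really"


tool = FlakyTool(fail_first=2)
reply = run_with_retries(tool)  # two failures are absorbed, the third call succeeds
print(reply, tool.calls)
```

A hand-rolled loop like this only survives failures inside a living process; it does nothing for the case where the process itself is killed.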

7:49

Obviously, when you give a demo,

7:50

everything suddenly grinds to a halt.

7:52

Someone recently said they hate demos

7:53

where everything goes to plan, and I

7:55

said you'll never need to worry about

7:56

that with me. Um, I don't know why that

7:59

has actually ground to a complete halt.

8:00

I don't know whether that's it just

8:02

repeatedly failing.

8:06

Let me... I'm going to kill the Temporal server

8:08

here and restart it so that we don't

8:10

have the state stored. And I will clear

8:13

this and run it again. And you should

8:16

now see it succeeding most of the time

8:17

and failing 10% of the time. So yeah,

8:19

you see it now asking questions and

8:22

occasionally breaking. Good timing, but

8:24

continuing. So that's one of the

8:26

things Temporal does. It just does the

8:28

retry logic that you could

8:31

implement without temporal, but they do

8:32

it very nicely. But there are more

8:34

powerful things. So let's say this

8:35

process that's in the middle of running

8:37

gets killed by Kubernetes. Now, so we go

8:40

across here and we just, like, kill it.

8:42

Process gets killed. Now, what I didn't

8:44

show you is I also instrumented this

8:46

with Logfire. So if we look at our

8:49

workflow, we can see exactly what was

8:51

going on here. And we can see

8:53

what's going on here. So we have the

8:55

different calls to Claude and then

8:56

inside that we're running the

8:58

activity which is then running the

9:01

other agent. But in particular if we

9:03

come to the top level start for the

9:04

workflow I can take the the workflow ID

9:07

if we come back over to code here you'll

9:10

see I had some code in here to

9:12

basically allow me to continue with a

9:13

given resume ID and to continue a

9:16

workflow. Now for the most part you

9:18

wouldn't have to do this. This is just

9:19

for the sake of the demo. If I just

9:21

reran the script again, it would kick

9:23

off this workflow again and it would run

9:24

the two in parallel. That would look

9:26

really confusing. So instead of doing

9:28

that, I'm specifically hanging on a

9:30

particular workflow to finish. So you

9:32

can see what's going on. So if I

9:35

run my script again, but I give it the

9:37

workflow that was ongoing. Now you see

9:40

it's already whizzed forward to question

9:42

six and it's continuing to operate. So,

9:44

we've got it to basically resume without

9:46

having to add any resume code anywhere

9:48

in our actual agent code. We just set up

9:51

Temporal and it works. And you can see

9:53

exactly what's happened if you look

9:56

at Logfire. What you will see is that

9:58

whole first bunch of calls

10:02

to the LLM responded in like 5

10:04

milliseconds. So, these were not

10:06

actually sent to the LLM. Temporal just

10:08

returned the result, the kind of cached

10:10

result that it already had for each of

10:11

these cases. So we're able to

10:13

effectively zoom forward to the point

10:16

where it then continues to call the

10:18

LLM. It's as if you've gone through

10:20

everywhere that you're doing IO and

10:22

you've set up caching on each

10:24

individual call so that you can run your

10:26

code. I see some people nodding, which

10:28

is making me feel a bit better about

10:29

explaining this. But we don't have to do

10:31

the inference. We don't have to wait the

10:32

time. We can basically run our workflow

10:34

code that's generally very fast because

10:36

it's no IO. It's just procedural. and it

10:39

will just keep getting results instantly

10:40

until it gets to the point where it

10:41

needs to continue. And you see

10:44

in this case, it's got itself completely

10:46

confused and it's off wondering about

10:48

whether this thing is a salad bowl. So,

10:50

you see how sometimes the LLM does well,

10:52

sometimes it does terribly. I'm

10:55

going to actually I'll leave that

10:57

running to see whether it you see it's

10:58

it knows it's related to food, but it's

11:00

got itself really confused. I will

11:02

just say, as it might interest you,

11:05

just before this I was wondering how the

11:07

different models would perform and so I

11:10

ran some evals with Pydantic Evals on

11:12

these different cases and you can see

11:14

here we have it's a bit hard to read on

11:16

the screen, but GPT-4.1, Gemini, and

11:20

Claude Sonnet 4.5 and you can see the

11:25

different assertions for each case

11:27

whether they passed or failed here and

11:29

you can see the the average cost. You

11:32

can see Gemini was way way cheaper, way

11:34

way faster. And somewhere we should

11:37

have, if we look at an individual case,

11:39

we have a a metric for how many steps it

11:42

took to succeed. I think maybe we have

11:44

to scroll over. Yeah. Question count.

11:45

You can see here Gemini was way quicker

11:48

each time. I discovered subsequently

11:50

having having checked the results that

11:52

actually the reason Gemini is way faster

11:54

and answers much more quickly is it just

11:56

invents an answer that's wrong and I

11:57

wasn't checking it. So, this is not

11:59

perfect yet, but

12:02

this is definitely an interesting case

12:03

for evals and, like, working

12:05

out which model is actually better

12:06

because they're definitely not

12:07

particularly good at it by default. But

12:09

yeah, in my naive case, Gemini did way

12:11

better, but that's not representative.
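
The pitfall described here, a model that looks fast because nobody checks whether its guess is right, can be shown with a tiny hand-written scorer. This sketch does not use the Pydantic Evals API; `RunResult` and `summarise` are hypothetical names, and the model names and numbers are invented for illustration, not results from the talk.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunResult:
    model: str
    guess: str
    question_count: int


def summarise(results: List[RunResult], secret: str) -> Dict[str, Dict[str, float]]:
    """Per model: accuracy, plus mean question count over correct runs only."""
    report: Dict[str, Dict[str, float]] = {}
    for model in sorted({r.model for r in results}):
        runs = [r for r in results if r.model == model]
        correct = [r for r in runs if r.guess.strip().lower() == secret]
        report[model] = {
            "accuracy": len(correct) / len(runs),
            "mean_questions_when_correct": (
                mean(r.question_count for r in correct) if correct else float("nan")
            ),
        }
    return report


# Invented data: "fast-model" finishes quickly but sometimes guesses wrong.
results = [
    RunResult("fast-model", "carrot", 4),
    RunResult("fast-model", "potato", 6),
    RunResult("careful-model", "potato", 18),
    RunResult("careful-model", "potato", 22),
]

report = summarise(results, secret="potato")
for model, stats in report.items():
    print(model, stats)
```

Reporting step counts only for runs that pass the correctness check keeps a quick but wrong answer from being counted as a win.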

12:13

Anyway, I'm going to leave that because

12:15

it's got 46 steps in and it's still

12:17

failing to work out that that thing's a

12:18

potato. I can show the evals case if

12:20

you want, but I think it might be more

12:22

interesting to look at a deep research

12:25

case, which is a kind of more meaningful

12:27

example of where you would run durable

12:29

execution. And also doing stuff in

12:31

parallel, which is also one of the

12:33

things that like just works out of the

12:34

box with Temporal without you having to

12:36

write any code. So, this is my very

12:39

quick last night hours attempt at

12:41

building deep research. I honestly think

12:43

it's as good as lots of the actual deep

12:46

research systems. So we define

12:48

our plan for deep research and this is

12:52

effectively our deep research

12:54

plan. So it has an executive summary

12:56

what you would effectively pump out to

12:57

the user about what I'm going to go and

12:59

do. Then we have a list of web search

13:02

steps. We have a maximum of

13:04

five here so it doesn't take forever and

13:06

then we have analysis instructions. And

13:09

the point is that like I think this is

13:11

one of the big changes in AI this year,

13:13

answering a bit the question I had on

13:14

the other Zoom. I think we've moved from

13:16

thinking that like agents in the sense

13:18

of so there are three definitions of

13:20

agents. There is the like AI definition

13:22

which is LLMs calling tools in a loop.

13:26

There is the tech definition which is a

13:28

microservice, and then there is the

13:30

business definition which is something

13:32

that can replace a human. Ignoring

13:34

the business definition for a minute. If

13:36

you think about the AI and the like

13:37

engineering definitions, we thought at

13:40

the beginning of this year you would

13:41

have one AI agent, one LLM calling tools

13:45

in a loop within each microservice. I

13:47

think we've moved more and more to think

13:48

that the agents are actually the kind of

13:51

quantum of development. They are the the

13:53

micro tasks that you

13:56

build up to form what most

13:58

people would think of as an agent,

14:00

something that actually goes and

14:00

autonomously completes a task. And so

14:03

our deep research agent is actually made

14:04

up of multiple agents. So we have this

14:07

plan agent which goes off with a prompt

14:09

and it returns an instance... structured

14:11

data extraction gives you an instance of

14:13

this Pydantic model, which is your plan to

14:14

run it. Then you have the search agent

14:17

which has access in this case to search

14:19

tool, or I'll show using Tavily in the

14:21

other case, which is using a faster

14:24

model, Gemini Flash, in this case, and

14:26

then, in this case, I'm using

14:29

Claude Sonnet 4.5 for the final analysis

14:32

stage. So I suppose this is a bit what

14:34

people talk about when they're leaning

14:35

towards graphs. I haven't built this in

14:37

a graph although I could because it

14:38

doesn't need a graph. It's not complex

14:40

enough to need a graph. And durable

14:42

execution is a way better way of getting

14:44

snapshotting, but like much more

14:46

granular support for durable

14:48

execution. We added a tool that allowed

14:50

the analysis agent to do a bit more web

14:52

search if it really wanted to. I don't

14:53

think it uses it, but this is the actual

14:55

deep research code. So you can see how

14:58

concise it is. We run the plan agent. We

15:01

get back our plan. We run in parallel

15:04

all of the search agents. So, we're just

15:06

using a TaskGroup from Python to run

15:09

all of these. We get those results,

15:10

which will all be the text results of

15:13

the different bits of search. We use

15:15

format_as_xml to basically smash all of

15:17

that into a massive lump of reasonably

15:19

readable data for the analysis agent.
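
The shape of this pipeline (plan, parallel searches, one formatting step, final analysis) can be sketched with stub coroutines. Everything below is a hypothetical stand-in rather than the code shown in the talk: the three "agents" return canned strings instead of calling models, a dataclass stands in for the Pydantic plan model, `format_results` is a simplistic substitute for an XML formatter (it does no escaping), and `asyncio.gather` is used in place of a task group so the sketch also runs on Python versions before 3.11.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchPlan:
    executive_summary: str
    web_search_steps: List[str]
    analysis_instructions: str

    def __post_init__(self) -> None:
        # Cap mentioned in the talk: at most five searches, so runs stay short.
        if len(self.web_search_steps) > 5:
            raise ValueError("at most 5 web search steps")


async def plan_agent(question: str) -> ResearchPlan:
    """Stand-in for the planning model."""
    return ResearchPlan(
        executive_summary=f"Research: {question}",
        web_search_steps=[f"{question} (angle {i})" for i in range(1, 4)],
        analysis_instructions="Compare the findings and cite sources.",
    )


async def search_agent(query: str) -> str:
    """Stand-in for a search-enabled model; sleeps to mimic network latency."""
    await asyncio.sleep(0.01)
    return f"findings for {query}"


def format_results(results: List[str]) -> str:
    """Wrap each result in a tag so the analysis step sees clear boundaries."""
    return "\n".join(f"<result>{r}</result>" for r in results)


async def analysis_agent(context: str, instructions: str) -> str:
    """Stand-in for the final, stronger model."""
    return f"{instructions}\n{context.count('<result>')} sources analysed"


async def deep_research(question: str) -> str:
    plan = await plan_agent(question)
    # Run every search concurrently; gather returns results in input order.
    results = await asyncio.gather(*(search_agent(q) for q in plan.web_search_steps))
    return await analysis_agent(format_results(list(results)), plan.analysis_instructions)


report = asyncio.run(deep_research("hedge funds in London that write Python"))
print(report)
```

Because the searches are independent, running them concurrently means the search stage takes roughly as long as the slowest single search rather than the sum of all of them.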

15:21

Then we go off and run the agent. And we

15:23

run the kind of question that I'm asking

15:25

AI relatively regularly for sales, which

15:28

is find me a list of hedge funds that

15:29

write Python in London. And if I go and

15:31

run this, uv run deep research,

15:39

we'll see it starting to churn away. We

15:42

can see it in the terminal with Logfire.

15:45

But we can also

15:48

come over here to Logfire. Let me clear

15:52

that. Go to the bottom here and we

15:55

can see this this run here as it's going

15:58

on. It's run the plan step in nine

16:00

seconds and you can see all of the

16:02

search steps going on in parallel. Once

16:05

they've finished, it will start. You can

16:06

see the analysis agent has just started.

16:09

We can look at the individual searches.

16:11

So, you get a pretty good idea of what

16:13

happened, the question it got asked,

16:15

the queries it decided to run, bunch of

16:19

data from Medium, different sites,

16:21

structured data, and then like the agent

16:24

also bangs in quite a lot of context,

16:26

right? So this is each individual one of

16:28

our like 10 parallel searches. And now

16:31

the analysis is going to go and run with

16:33

all of that input. You can see so far

16:36

we've spent 8 cents on this

16:38

particular run. We'll see what it gets

16:41

to by the time it finishes. But

16:43

obviously the problem with this is if I

16:44

kill this now, it's just going to die

16:46

and I'd have to restart from the

16:47

beginning if I wanted to run it

16:49

again. So, while that churns away, let

16:53

me start introducing you to the durable

16:55

execution example. Spoiler: it's going

16:58

to be pretty similar. I discovered last

17:00

night that there's a bug with the Vertex

17:03

SDK that means that you can't use it

17:04

with Temporal right now. So, I've

17:06

swapped out... I think we should fix

17:08

that, or at least I'll be whinging at

17:11

DeepMind today to go and fix that. So,

17:14

I've switched it out to OpenAI Responses

17:16

here and I'm using Tavily instead of

17:19

the built-in search. Yeah, but other

17:21

than that, this is all pretty similar

17:22

code. I could have probably imported the

17:25

code from the other module. I just

17:26

decided to duplicate it just to keep

17:29

things easy. But you see again, we do

17:31

the same thing. We wrap our agents in

17:33

TemporalAgent. This analysis one can

17:35

take more than I think whatever the

17:37

default activity duration is because

17:39

it's a long, basically, build-up of a

17:42

summary. And so I give it I think it was

17:44

taking longer than 2 minutes or whatever

17:46

the default is. So I just gave it an

17:47

hour so it's not going to fail. And

17:49

then, for me, the most powerful

17:51

bit is everything here inside my

17:54

workflow looks exactly the same. I don't

17:56

have to do any crazy stuff to do

17:57

parallelism. I just use TaskGroup

18:01

exactly the same. I could use

18:04

asyncio.gather. It's all just imperative Python

18:06

code as you would be used to. I could

18:08

have a if I wanted to run this

18:09

periodically, I could sleep for seven

18:12

days in here. Temporal would take care

18:14

of pausing everything. Again, I'm not

18:15

here to be a Temporal salesperson. I

18:17

don't love everything about what they

18:18

do, but it's a pretty powerful way of

18:20

thinking about code. We don't have to do

18:22

all the infra stuff. And then

18:24

ultimately, again, smash all my context

18:27

into the last agent and run it. And

18:30

again, there's a bit of plug-in stuff. I

18:32

have to plug in Logfire, add some plugins,

18:34

add the agents as plugins. But again, my

18:36

code to actually go and kick it off is

18:37

just execute_workflow. Simple as that.

18:40

And I asked it here a slightly more

18:42

controversial question of what's the

18:44

best Python agent framework to use for

18:46

durable execution and type safety. And

18:48

we will pray to God it gives the right

18:50

answer when we run it in front of

18:52

everyone. If I go and kick that off and

18:54

run this

18:56

again, we should see it. If we come over

18:58

here, we should see it running in

19:00

Logfire. You can see we have the stuff

19:02

related to kicking off the agent. It's

19:05

kicking off the workflow, excuse me,

19:06

here. And we have the searches beginning

19:09

to happen. But the powerful

19:12

bit here is again imagine that we're

19:15

halfway through running all these

19:16

searches. We're about to start the final

19:17

step and something comes along and kills

19:20

the process. In general, you'd

19:23

have to go and completely restart this

19:25

process and run your deep research

19:27

all over again. With Temporal,

19:30

it will just go and rerun that workflow

19:32

automatically. In this case, I'm

19:34

restarting it and just running that one

19:35

workflow. But in general, it would just

19:37

automatically go and be restarted and on

19:41

the next time that Kubernetes comes

19:42

up, the workflow will run as it would

19:45

have done before, but it will get

19:47

answers to each individual question

19:49

basically instantly. And so

19:52

if it's not going to fail for me, which

19:55

it seems to be, there we are. It started

19:57

again. You see the plan took 24

20:00

milliseconds. The searches all took no time at

20:03

all in the grand scheme of things

20:04

because it just got the result back from

20:06

Temporal immediately. And then the

20:08

analysis, that was the task we needed

20:11

to run that we hadn't run yet. Obviously

20:13

that needs to go and start again because

20:14

that's an activity, and

20:16

activities obviously have to run again

20:17

from scratch. And so once that finishes

20:21

I think it does take quite a long time.

20:24

Maybe I can show the previous output. Or

20:26

did we not get to displaying the

20:28

previous output? Did it ironically

20:29

actually fail the time before? But

20:31

hopefully once this finishes, we should

20:33

be able to see its analysis, which,

20:36

you know, I think is on a par with what

20:38

I see from the other deep research

20:39

things. Obviously, there will be some

20:41

there's some UI work to do to display

20:42

this in a nice deep research

20:44

interface. There we are. It's completed

20:47

and its primary recommendation is

20:49

Pydantic AI with Temporal. So, it

20:52

did what I hoped it would do. And you

20:53

see it's given a reasonable report here

20:55

of like the relative trade-offs of the

20:58

other inferior agent frameworks and it

21:00

should have done an executive summary at

21:02

the beginning with with links. Yeah. So

21:04

it said Pydantic AI; LangGraph, obviously,

21:07

if you love snapshotting or writing

21:09

unsafe code, type-unsafe code; Temporal on

21:11

its own which makes sense. Yeah. So

21:13

there's the summary. That is

21:16

the main stuff I had to show. I will

21:18

merge the durable execution stuff in

21:20

here. So go here. Just one other thing I

21:22

want to just say quickly while I have... I

21:24

can't work out how to post a comment but

21:26

like, you'll find it on Pydantic if

21:28

you look for it. Oh, I have... I

21:31

can do that if anyone wants to take a

21:32

picture of that QR code. The other thing

21:34

I just wanted to mention we're about to

21:36

announce Pydantic AI Gateway. So if

21:38

anyone wants to try it early let us

21:41

know. But yeah, that platform will be

21:43

landing soon. You'll be able to use

21:45

Pydantic AI Gateway directly to buy inference

21:48

from any of the big models or most of

21:50

the open source models and self-hosting

21:52

for enterprise all the observability

21:54

stuff. But I'll save you the full

21:56

spiel, but that's coming soon. I think

21:57

some of you will find it interesting.

21:59

That's it. Thanks so much for watching.

22:01

If you want to learn more about Pydantic

22:03

AI, Pydantic AI Gateway, or Pydantic

22:05

Logfire, please scan these QR codes. If

22:07

you have any feedback, please come

22:09

and talk to us. Thanks so much for

22:11

listening.

Interactive Summary

This video provides a practical demonstration of using Pydantic AI in combination with Temporal to build durable, reliable LLM agents. The presenter explains why durable execution matters for long-running workflows: without it, a system failure or process restart loses all progress. Using a '20 questions' game and a 'deep research' agent as examples, the demo shows how Temporal records the inputs and outputs of completed activities (the non-deterministic steps such as LLM and tool calls), so that the deterministic workflow code can be replayed and resume from where it left off without re-executing expensive I/O operations. The session also covers Pydantic Logfire for observability, briefly shows Pydantic Evals for comparing models, and mentions the upcoming Pydantic AI Gateway.