The Unbearable Lightness of Agent Optimization — Alberto Romero, Jointly

Transcript

0:03

Right. Hello everyone. Uh today I will

0:05

present Meta Adaptive Context

0:07

Engineering, or Meta-ACE for short, which

0:10

is a new framework designed to optimize

0:12

AI agents beyond single dimension

0:14

approaches. We will explore how

0:17

orchestrating multiple adaptation

0:19

strategies can overcome the limitations

0:21

of existing context engineering methods.

0:25

Now a little introduction about myself.

0:28

Uh so I'm Alberto Romero. I'm the

0:30

co-founder and CEO at Jointly. And for

0:32

context, at Jointly we build domain

0:34

specialized agents for regulated

0:36

industries where policy adherence

0:38

constraints are particularly strict.

0:41

Most of our research work is in the area

0:44

of self-optimizing agent architectures uh

0:47

using systematic approaches.

0:49

Now about myself, I have spent uh 20

0:52

plus years at the intersection of AI and

0:54

data. Uh some of my recent experience

0:57

includes being the CTO and co-founder of

0:59

human AI uh think ML-based risk

1:02

prediction for mobility which was

1:04

acquired by Aon in 2023 and in my

1:08

previous role I headed up Citibank's

1:10

GenAI engineering team.

1:13

Now here's our agenda for today. Um

1:16

we'll begin with the motivation and

1:18

problems that current systems face. Then

1:21

we'll review the ACE framework and its

1:24

limitations.

1:26

Um after surveying recent research uh

1:29

insights, we'll introduce the Meta-ACE

1:31

approach. We'll discuss its architecture

1:34

and strategy toolbox, show some results

1:38

um and finish with future directions and

1:40

challenges.

1:43

Now the agentic context engineering

1:45

framework or ACE for short uh for which

1:48

you've got the paper link uh on the

1:50

slide there. So it's it's very popular

1:53

framework um and the paper um came out a

1:56

few months ago. Um basically it organizes

2:00

adaptation into three roles. First of all

2:02

there's a generator that produces

2:04

reasoning paths. Then there's a

2:06

reflector that extracts lessons. And

2:09

finally, there is a curator that

2:10

synthesizes these lessons into

2:12

incremental updates.

2:14

ACE uses incremental delta updates and a

2:18

grow and refine mechanism to prevent

2:20

context collapse and maintain relevance.

2:23

Now, most importantly, it can improve

2:25

without labeled data by learning directly

2:28

from execution feedback.
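
A minimal Python sketch of the generator, reflector and curator loop just described, with incremental delta updates and a simple grow-and-refine step. Everything here is illustrative: `Playbook`, `run_episode` and the callables passed to it are hypothetical names, the callables stand in for LLM calls, and the pruning rule (keep the most recent entries) is an assumption, not the method from the ACE paper.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Evolving context that is changed only through small delta updates."""
    bullets: list[str] = field(default_factory=list)

    def apply_delta(self, delta: list[str]) -> None:
        # Grow: append new lessons instead of rewriting the whole context.
        for bullet in delta:
            if bullet not in self.bullets:
                self.bullets.append(bullet)

    def refine(self, max_items: int) -> None:
        # Refine (assumed rule): keep only the most recent max_items lessons.
        overflow = max(0, len(self.bullets) - max_items)
        del self.bullets[:overflow]


def run_episode(task, playbook, generate, reflect, curate, max_items=100):
    """One adaptation step driven by execution feedback, with no labels."""
    trajectory = generate(task, playbook)  # generator: reasoning path + feedback
    lessons = reflect(trajectory)          # reflector: extract lessons
    delta = curate(lessons, playbook)      # curator: turn lessons into a delta
    playbook.apply_delta(delta)
    playbook.refine(max_items)
    return trajectory
```

Because the playbook is only appended to and pruned, a single bad reflection adds one noisy entry rather than overwriting everything learned so far, which is the intuition behind avoiding context collapse.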

2:32

Now so ACE has been um quite successful

2:35

and has achieved substantial gains

2:37

across some of the most popular agent

2:39

benchmarks like AppWorld or FiNER uh

2:42

almost 11% compared to previous

2:45

state-of-the-art approaches such as GEPA

2:48

or DC.

2:50

Um and it's also achieved an 8.6%

2:54

um gain on financial reasoning tasks.

2:58

Um there are four fundamental

3:01

limitations um for ACE that I'm going to

3:05

reflect on and um just discuss on the

3:08

next slide. Um and those form the basis

3:11

for um for Meta-ACE basically.

3:15

Now as I was saying um despite its

3:18

strengths ACE has got four critical

3:21

failure modes. First it is highly

3:24

dependent on the reflector. Um so when

3:27

reflection fails the context becomes

3:28

noisy and even harmful.

3:32

Uh secondly there's feedback

3:34

brittleness which means that when

3:36

ground truth signals are weak or absent

3:38

ACE may reinforce incorrect behaviors.

3:43

Third, there's the task complexity blindness

3:46

um which leads it to treat simple and

3:49

complex tasks the same which can be a

3:52

waste of resource uh and also a miss of

3:55

opportunities um for optimization

3:59

and then finally um ACE optimizes only

4:02

the context dimension, so it ignores compute,

4:04

memory and parameter updates.

4:08

Now the 2024 and 2025 research landscape

4:11

offers um four key insights in my view.

4:15

First of all uh verification

4:17

mechanisms uh like self-evaluation,

4:20

multi-model consensus and execution

4:22

checks are really important for

4:25

robustness of any solution.

4:28

Secondly, uh adaptive compute allocation

4:31

shows that small models can outperform

4:33

much larger ones by selectively

4:36

increasing inference steps.

4:39

The third one is that structured memory

4:41

architectures outperform linear context

4:44

accumulation by organizing facts

4:47

as graphs or multi-granular memories.

4:51

Then finally, test time training bridges

4:54

inference and learning uh and enables

4:56

temporary parameter updates to yield

4:58

large accuracy gains.

5:01

So these advances suggest that we need a

5:03

hybrid multi-dimensional system.

5:08

Now, Meta-ACE um addresses ACE's limitations

5:12

by adding a meta controller that learns

5:14

to orchestrate multiple adaptation

5:17

strategies based on a task's complexity,

5:20

uncertainty, verifiability,

5:23

and also resource constraints. So

5:25

instead of applying the same procedure

5:28

to every problem, Meta-ACE profiles each

5:31

task and allocates the right combination

5:34

of strategies across context, compute,

5:36

verification, memory and parameter

5:38

dimensions.

5:40

Um so this adaptive uh learned

5:43

coordination is what enables it to

5:45

outperform single dimension methods.

5:50

Now the Meta-ACE framework consists of

5:53

four layers. So getting into the

5:55

architecture

5:57

um the first layer is the task profiling

5:59

one which assesses complexity,

6:02

uncertainty, verifiability and resource

6:04

budgets.

6:06

Then there is a lightweight meta

6:08

controller that selects and allocates

6:10

adaptation strategies accordingly.

6:13

The next layer down is a strategy

6:15

execution one and that carries out the

6:18

reflection, adaptive compute,

6:20

hierarchical verification,

6:22

structured memory retrieval and selective

6:25

uh test time training. And then finally

6:29

uh there's a feedback aggregation layer

6:30

that collects the outcomes and updates

6:33

the meta controller's policy through

6:35

meta-learning.

6:36

So this layered design allows the system

6:38

to learn from its experience and uh

6:41

continuously refine its decision making.

6:47

Now in terms of the task profiling um

6:50

there are four key dimensions that are

6:52

being assessed. The first one is uh

6:55

semantic complexity. So this is

6:58

basically an embedding based similarity

7:00

to uh known task distributions that gets

7:03

produced.

7:05

Uh second one is uncertainty

7:07

quantification.

7:09

Uh think of it as a relative softmax uh

7:12

scoring that predicts model confidence.

7:15

The third one is verifiability

7:17

assessment. So whether we can execute

7:19

and validate the output.

7:22

And then the fourth one is resource

7:24

availability. So we take into

7:26

consideration the context window, the

7:28

compute budget and even other

7:30

constraints such as time.

7:32

So the output of this layer of the task

7:35

profiling layer is a 32-dimensional task

7:38

embedding which is what feeds as input

7:41

into the meta controller.
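
A rough Python sketch of how the four profiling signals could be computed, using only the standard library. The function names, the `TaskProfile` class, the "one minus best cosine similarity" complexity measure and the "one minus top softmax probability" uncertainty measure are all assumptions made for illustration. The talk does not describe how these signals are turned into the 32-dimensional embedding, so the sketch stops at four raw features.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_complexity(task_vec, known_task_vecs):
    # Assumption: tasks far from every known task embedding are more complex.
    best = max((cosine(task_vec, k) for k in known_task_vecs), default=0.0)
    return 1.0 - max(0.0, best)


def softmax_uncertainty(logits):
    # Assumption: uncertainty is 1 minus the largest softmax probability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return 1.0 - max(exps) / sum(exps)


@dataclass
class TaskProfile:
    complexity: float       # semantic complexity in [0, 1]
    uncertainty: float      # model uncertainty in [0, 1]
    verifiable: bool        # can the output be executed and validated?
    budget_fraction: float  # share of context/compute/time budget available

    def as_features(self):
        return [self.complexity, self.uncertainty,
                float(self.verifiable), self.budget_fraction]
```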

7:45

Now in terms of the strategy toolbox um

7:48

Meta-ACE draws from six strategies.

7:51

First one is minimal context which uses

7:54

concise prompts for simple tasks.

7:58

Um then we use ACE reflection uh which

8:01

retains the generator-reflector-curator

8:04

loop for incremental knowledge

8:05

accumulation um as established by uh

8:08

standard ACE.

8:10

Then we also use adaptive compute which

8:13

scales the number of reasoning steps or

8:15

samples based on the task difficulty.

8:19

We also use hierarchical verification

8:22

that combines self-evaluation, multi-model

8:25

consensus and execution checks.

8:28

uh adaptive memory uh that retrieves

8:31

relevant information from structured

8:32

multi-granular memories and then finally

8:35

we use selective test time training

8:38

which applies temporary parameter

8:40

updates such as LoRA adapters for high

8:42

stakes tasks.

8:44

So the meta controller learns to combine

8:46

these tools effectively over time.
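
To make the toolbox concrete, here is a small Python sketch that names the six strategies and turns per-strategy scores into allocation weights with a softmax. The `Strategy` enum, the `allocate` function and the softmax choice are assumptions for illustration; in the talk the scores would come from the learned meta controller, which is not shown here.

```python
import math
from enum import Enum


class Strategy(Enum):
    MINIMAL_CONTEXT = "minimal_context"
    ACE_REFLECTION = "ace_reflection"
    ADAPTIVE_COMPUTE = "adaptive_compute"
    HIERARCHICAL_VERIFICATION = "hierarchical_verification"
    ADAPTIVE_MEMORY = "adaptive_memory"
    SELECTIVE_TEST_TIME_TRAINING = "selective_test_time_training"


def allocate(scores, temperature=1.0):
    """Turn per-strategy scores into allocation weights that sum to 1."""
    peak = max(scores.values())
    # Subtract the peak before exponentiating for numerical stability.
    exps = {s: math.exp((v - peak) / temperature) for s, v in scores.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}
```

A lower `temperature` concentrates the budget on the top-scoring strategy, while a higher one spreads it more evenly across the toolbox.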

8:51

Now the um reward formula um upon which

8:55

the the learning strategy is selected

8:59

accounts for the following components.

9:01

Um the first one is the correctness of

9:04

an action or prediction which is

9:05

accuracy.

9:07

Then we also have the penalty associated

9:11

um with resources used or negative

9:13

outcomes. So one minus cost and then there is

9:16

the trustworthiness of the models which

9:18

is self-expressed certainty.

9:21

So the confidence calibration basically

9:23

uh with weighted importance determined

9:25

by the hyperparameters alpha, beta and

9:28

gamma.
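
Read as a weighted sum, the reward described here could look like the following Python sketch. The additive form, the function name `meta_reward`, the default weights and the requirement that every input is normalized to [0, 1] are assumptions; the talk only names the three components and the hyperparameters alpha, beta and gamma.

```python
def meta_reward(accuracy, cost, calibration, alpha=0.5, beta=0.25, gamma=0.25):
    """R = alpha * accuracy + beta * (1 - cost) + gamma * calibration."""
    inputs = (("accuracy", accuracy), ("cost", cost), ("calibration", calibration))
    for name, value in inputs:
        # Assumption: all three components are normalized to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return alpha * accuracy + beta * (1.0 - cost) + gamma * calibration
```

With these example weights, a correct, free and well-calibrated answer scores 1.0, and the same answer at maximum cost scores 0.75, so accuracy dominates while cost still separates otherwise equal outcomes.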

9:31

In terms of the uh meta-learning loop um

9:35

we have four sources of feedback

9:37

collection. Uh first of all is task

9:40

outcomes. The success failure or

9:43

correctness um of the task. Then we've

9:47

got the strategy performance. So what is

9:50

the individual contribution of each

9:52

strategy to the overall performance of

9:54

the task?

9:56

Then we also have efficiency metrics

9:58

such as the compute, latency, memory.

10:02

And then finally we've got confidence

10:04

calibration. So whether predictions are

10:07

accurate.

10:10

Um so moving on to um how we go about

10:15

uh solving the uh the limitations from

10:18

ACE. The first one was the weak reflector

10:21

problem. So ACE's issue is that there is

10:25

a a 50 to 60% performance drop when

10:28

reflector quality degrades. Um with Meta-ACE

10:31

we introduce um uh three things

10:35

basically. So first of all is quality

10:38

gates. Um so it's a learned classifier

10:42

that blocks harmful deltas and secondly

10:45

there's a multi-signal reflector uh or

10:48

reflection which basically um is an

10:50

ensemble of specialist models uh when

10:53

there is a level of uncertainty.

10:56

Uh and then the third one is adaptive

10:59

strategy allocation. So the meta

11:01

controller learns when reflection fails

11:04

and then it routes to verification or

11:06

test time compute instead.

11:09

Um so we can expect to maintain an 80%

11:13

plus performance even when the uh

11:15

reflector degrades around 30%.

11:20

Now the the second um limitation we had

11:24

was um the feedback quality

11:26

brittleness.

11:28

So what we observe with ACE is that there

11:30

can be significant degradation without

11:33

reliable ground truth signals.

11:36

Uh with Meta-ACE we introduce a

11:38

hierarchical verification cascade um

11:41

where we can expect a 50 to 60%

11:44

reduction in errors from poor feedback

11:47

and that's through three tiers. The

11:49

first tier is self verification which is

11:52

just a fast filter. We just accept if the

11:55

confidence level is over a certain

11:57

value. Second tier is a multi-model

12:00

consensus. So we leverage a diverse

12:03

range of models such as GPT-4, Claude and

12:07

DeepSeek and we do confidence-weighted

12:09

voting. And then the tier three is

12:13

execution based verification

12:16

uh where we leverage code sandboxes,

12:18

API validation and schema compliance.
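
The three tiers can be sketched in Python as a cascade that escalates only when a cheaper tier cannot accept the answer. The names `Verdict` and `verify`, the threshold values and the escalation rule are assumptions for illustration; `peer_votes` stands in for the outputs of the diverse models, and `execute` for a sandbox run, API validation or schema check.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    accepted: bool
    tier: str  # which tier produced the decision


def verify(answer, self_confidence, peer_votes, execute=None,
           accept_threshold=0.9, consensus_threshold=0.6):
    # Tier 1: fast self-verification filter.
    if self_confidence >= accept_threshold:
        return Verdict(True, "self")
    # Tier 2: confidence-weighted voting over (agrees, confidence) pairs.
    total = sum(conf for _, conf in peer_votes)
    if total > 0:
        support = sum(conf for agrees, conf in peer_votes if agrees) / total
        if support >= consensus_threshold:
            return Verdict(True, "consensus")
    # Tier 3: execution-based check, used only when a checker is available.
    if execute is not None:
        return Verdict(bool(execute(answer)), "execution")
    return Verdict(False, "unverified")
```

Ordering the tiers from cheapest to most expensive means most answers never reach the sandbox, which keeps the cost of verification proportional to how doubtful the answer is.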

12:23

Um the the third um limitation we had

12:27

was uh task complexity mismatch. Um so

12:31

in a sense the fact that ACE uses uniform

12:36

processing um also for simple tasks

12:39

which can be a waste of resource. So

12:42

Meta-ACE adapts uh strategy allocation

12:45

dynamically rather than using the same

12:47

heavy pipeline for everything. The

12:50

alphas are allocation weights for the

12:52

six optimization strategies and they

12:55

represent how much computational budget

12:57

is assigned to each strategy for a given

13:00

task. So simple tasks um require minimal

13:05

processing and can save around 90% uh

13:09

compute compared to standard ACE.

13:13

Moderate tasks um are more of a balanced

13:16

approach um that includes ACE plus

13:20

verification and then complex tasks um

13:23

basically use heavy test-time compute,

13:26

multiple attempts and memory retrieval.
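
As a simplified Python illustration of these three tiers, the rule below maps a complexity score to a set of strategies. The function name, the strategy labels and the 0.33 and 0.66 cut-offs are assumptions; the talk describes learned allocation weights rather than fixed thresholds, so this is only a hand-written stand-in for that policy.

```python
def route_by_complexity(complexity, simple_below=0.33, complex_above=0.66):
    """Map a complexity score in [0, 1] to the strategies to run."""
    if complexity < simple_below:
        # Simple tasks: minimal processing only.
        return ["minimal_context"]
    if complexity <= complex_above:
        # Moderate tasks: ACE reflection plus verification.
        return ["ace_reflection", "hierarchical_verification"]
    # Complex tasks: heavy test-time compute and memory retrieval.
    return ["adaptive_compute", "adaptive_memory"]
```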

13:31

Um so just to conclude with some results

13:35

um and and these are initial results uh

13:38

we have observed um around an 8 to 11%

13:42

uh improvement on agent benchmarks.

13:46

Um we have also observed a six to eight

13:49

points improvement on on some domain

13:52

specific tasks. um also a 30 to 40%

13:56

reduction in compute costs um through

13:59

the allocation of um adaptive strategies

14:04

um and overall there's um there's more

14:07

robustness more consistency

14:10

um and you know we can generalize better

14:13

we can use the framework across a a

14:16

diverse range of of domains so the

14:19

conclusion is that um Meta-ACE can

14:22

orchestrate context, compute,

14:25

verification, memory and parameter

14:27

adaptation and produce a robust uh

14:31

self-improvement

14:32

um framework for agents.

14:35

Um future work will implement uh and

14:38

evaluate the full system across uh a a

14:43

more diverse range of domains and we'll

14:46

continue exploring meta-learning and this

14:49

will involve also incorporating um

14:52

additional strategies as well.

14:56

Now I also wanted to touch on um

14:58

additional applications of Meta-ACE that I

15:01

think are quite relevant. Um so first

15:05

one is um for multimodal AI systems. So

15:09

for example deciding when to use vision

15:12

versus uh language processing again can

15:15

be um a like a a meta adaptive uh

15:19

strategy decisioning.

15:21

Um also when you have uh compound AI

15:25

systems that um require different models

15:29

for different stages um and the

15:32

complexity is um you know is substantial

15:37

uh we can actually um uh in a in a meta

15:41

adaptive manner uh select the most

15:43

effective uh strategies to to resolve a

15:47

task end to end. Um also um for human

15:52

collaboration um so in other words to

15:54

determine when to have a human in the

15:57

loop and also for continual learning

16:00

systems um where we are balancing

16:03

exploration versus exploitation.

16:06

Um so the the core takeaway is that

16:10

optimization requires a meta layer of

16:13

intelligence and and that has to be

16:15

trained um and you know um it requires

16:20

um a lot of trial and error before it

16:22

can actually um perform at the right

16:25

level.

16:27

In terms of the future direction and

16:29

challenges um there are still several

16:32

challenges that remain. So the meta

16:34

controller's training uh may be unstable

16:37

um due to sparse rewards and this

16:39

can be mitigated through curriculum

16:41

learning. Uh also robust advantage

16:44

estimation and um regularization of

16:47

entropy.

16:49

Also computational overhead from

16:51

profiling and multiple uh strategies um

16:55

needs to be reduced with efficient

16:57

models. Um we can leverage things like

17:00

lazy execution, batching and caching.
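
As one example of the caching idea, repeated profiling of the same task text can be memoized with the standard library's `functools.lru_cache`. The `profile_task` function and its word-count heuristic are hypothetical placeholders for an expensive profiling call; the `CALLS` counter exists only to show that the second lookup is served from the cache.

```python
from functools import lru_cache

CALLS = {"profile": 0}  # counts how often the expensive path actually runs


@lru_cache(maxsize=1024)
def profile_task(task_text: str) -> float:
    CALLS["profile"] += 1
    # Placeholder for an expensive step such as embedding the task text.
    return min(len(task_text.split()) / 50.0, 1.0)


profile_task("summarize the quarterly risk report")
profile_task("summarize the quarterly risk report")  # served from the cache
```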

17:04

Um also uh the verification uh

17:08

cascades can be brittle if all models um

17:11

make the same mistake. So we need

17:13

diverse models um with confidence

17:17

weighting and human oversight um as well

17:20

as active learning. Uh meta-learning loops

17:23

require substantial data. Uh synthetic

17:26

task generation, off-policy

17:28

learning uh transfer from related

17:31

domains and sample efficient algorithms

17:34

uh can also help as well. And finally uh

17:38

addressing these challenges um

17:40

is going to be key to scaling Meta-ACE and

17:44

applying it across um a wide range of

17:46

domains.

17:48

So that was all from me. Thank you very

17:50

much for listening. Um, and yeah, uh,

17:54

appreciate you being there. Thank you.

Interactive Summary

This video introduces Meta Adaptive Context Engineering (Meta-ACE), a framework designed to optimize AI agents by moving beyond single-dimension approaches. The speaker explains four limitations of the existing Agentic Context Engineering (ACE) framework (reflector dependency, feedback brittleness, task complexity blindness, and optimization restricted to the context dimension) and proposes Meta-ACE as a solution. A meta controller profiles each task and dynamically selects from a toolbox of six strategies covering context, compute, verification, memory, and parameter updates. The speaker reports initial results of roughly 8 to 11% improvement on agent benchmarks and a 30 to 40% reduction in compute costs, along with greater robustness and consistency.
