HomeVideos

The 3 Pillars of Autonomy – Michele Catasta, Replit

Now Playing

The 3 Pillars of Autonomy – Michele Catasta, Replit

Transcript

664 segments

0:00

[Music]

0:14

[Music]

0:21

So at Raplet, we're building a coding

0:23

agent for nontechnical users. It's a

0:26

very peculiar challenge I would say

0:28

compared to many people in this room.

0:29

And what I'm going to talk about today

0:31

is why autonomy has become kind of the

0:33

northstar that we keep chasing you know

0:36

since we launched the very first version

0:37

of rapid agent September last year.

0:41

Let's start from this very interesting

0:45

plot in case my clicker worked which now

0:48

does. Um I'm sure you all have seen it

0:51

the semiync value that published by Zixs

0:54

a few weeks ago and it kind of clarified

0:57

a bit the landscape you know for all of

0:58

us uh agent builders. On one hand, you

1:01

have the low latency interactions that

1:04

really allow you to stay in the loop,

1:05

you know, so you can do deep work and

1:07

focus really on the on the coding task

1:09

at hand, but you need to be an expert.

1:11

You need to know exactly what to the

1:12

model for and you need to understand

1:14

quickly if you want to accept the

1:15

changes or not. Then for several months

1:18

many of us including replet we kind of

1:21

lived in this I think value that where

1:24

the agent wasn't autonomous enough to

1:26

really delegate a task and come back and

1:28

see it accomplished but at the same time

1:31

it run long enough not to keep in the

1:33

zone not to keep in the loop likely over

1:36

time we managed to go all the way on the

1:38

right and now we have agents that runs

1:39

for several hours in a row. What I'm

1:42

going to be arguing with today and hopes

1:44

is not going to stop inviting me to this

1:46

event is the fact that there is an

1:48

additional dimension like a third

1:49

dimension to this plot that you know it

1:51

hasn't been covered here and namely the

1:53

fact is how do we build autonomous

1:56

agents for nontechnical users.

1:59

So what I'm going to be arguing today is

2:01

that there are two types of autonomy.

2:03

One of it is more supervised. So think

2:06

of the you know Tesla FSD example. When

2:09

you sit in a Tesla, you're still

2:12

expected to have a driving license.

2:13

You're going to be sitting in front of

2:15

the steering wheel. Perhaps 99% of the

2:17

time, you're not going to use it, but

2:18

you're there in order to take care of

2:20

the longtail events. And similarly, a

2:23

lot of the coding agents that we have

2:25

today require you to be technically

2:27

savvy in order to use them correctly.

2:30

We at Replet and uh other companies at

2:33

this point are focusing on kind of the

2:35

whimo experience for autonomous coding

2:38

agents. So you're expected to sit in the

2:40

back. You don't even have access to the

2:42

steering wheel. And I expect you

2:44

basically not to need any driving

2:46

license. Uh why is this important?

2:48

Because we want to empower every

2:50

knowledge worker to create software. And

2:52

I can't expect knowledge workers to know

2:54

what kind of technical decisions an

2:56

agent should be making. We should

2:57

offload completely the level of

2:59

complexity away from them.

3:02

Of course, it took a while to get here.

3:04

So I'm I'm sure what I'm showing you

3:06

here is something that all of you are

3:07

very familiar with. It took several

3:10

years to go from I know maybe less than

3:13

a minute feedback loop constant

3:14

supervision and talking about

3:16

completions and talking about

3:17

assistance. These are areas where the AI

3:20

power is and really been pioneering this

3:22

this type of user interaction. Then we

3:26

slowly climbed through you know higher

3:28

levels of autonomy. So we had the first

3:30

version of the agents based on on react.

3:32

So we concocted autonomy with a very

3:35

simple paradigm on top of LMS. Then

3:38

likely AI providers understood that tool

3:40

calling was extremely important poured a

3:42

lot of effort on that. So we built the

3:44

next version of agents with native tool

3:46

calling. And then I would say there is a

3:48

third generation of agents which I call

3:50

autonomous and that's when we started to

3:52

break the barrier of say one hour of

3:54

autonomy. Basically the the agent being

3:56

capable of running on long horizon tasks

3:58

and remaining coherent. It happens to be

4:01

the case that those are also the

4:02

versions of rapid agent that we launched

4:04

for the last year. So the B3 is the one

4:07

that we launched a couple of months ago

4:08

and it has exactly showcases those

4:10

properties. So the question for today is

4:13

can we actually build fully autonomous

4:15

agents and how do we get there?

4:18

So I'm going to try to redefine the

4:21

definition of autonomy today. I think

4:23

that often times we conflate autonomy

4:26

with a concept of something in the lungs

4:27

for a for a lot of time and usually as a

4:30

user you lose control. In reality what

4:34

the autonomy that I want to give to

4:36

agents can be very specifically scoped

4:39

and what I mean by that is especially

4:42

with rapid agent 3 what we accomplish is

4:44

we we make sure that our agent takes

4:46

holy technical decisions. Of course,

4:48

that could lead to very long gap between

4:51

the different user interactions and in

4:52

case the agent again runs for several

4:54

hours. But this happens if and only if

4:56

the scope of the task you're giving to

4:58

the agent is really broad. And it turns

5:01

out that in reality you can have an

5:02

agent that is really autonomous and is

5:04

still fast as long as you give it a very

5:07

narrow scope for the task, you know, at

5:09

hand. So what we can accomplish in this

5:12

way is that the user still maintains

5:14

control on the aspects that they care

5:16

about and a user cares about what

5:17

they're building. Especially again our

5:19

users, knowledge workers, they don't

5:21

care about how something has been built.

5:23

They just want to see their goals to be

5:25

accomplished. So autonomy should not be

5:27

basically conflated with long run times.

5:31

And similarly, it shouldn't become a

5:34

vanity metric. You know, a lot of us are

5:35

talking about it as a as a badge of

5:37

honor. And it's definitely been exciting

5:39

to see in the last few months that you

5:40

know many of us broke the the barrier of

5:42

running several hours in a row. But I

5:44

think in terms of how to build agents

5:47

that are going to be more powerful and

5:49

more steable in the future, we kind of

5:50

have to change a bit uh the the target

5:53

the metric that we that we keep in mind.

5:55

So think about it in this way. Tasks

5:58

have a natural level of complexity. And

6:00

basically what we care about is that

6:02

they have a minimum irreducible amount

6:04

of work that they express. What agents

6:07

do is that they always go through this

6:09

loop of planning, implementing and

6:10

testing. And of course to make this

6:13

happen and to make it work correctly,

6:14

you want this work to be happening over

6:16

a long quering trajectory. So our goal

6:19

is to maximize the reducible runtime of

6:22

the agent. By reducible, I mean having a

6:25

span of time where the user doesn't have

6:27

to make any technical decisions and the

6:29

agent can accomplish the task again in

6:31

full autonomy. This is especially

6:33

important for us because I can't trust

6:36

our users to make technical decisions.

6:38

So they they need a proper technical

6:39

collaborator by their side. I want to

6:42

abstract away as much complexity as

6:44

possible from the process of software

6:45

creation. And last but not least, I want

6:49

the users to feel in control of what

6:51

they're creating without startling their

6:54

creativity because they have also to

6:55

think about the technical decision that

6:57

the agent is making.

6:59

So now what are the pillars of autonomy?

7:02

How are we making this happen? I would

7:04

say there are three pillars that are

7:06

extremely important to think about. The

7:08

first one is of course the capabilities

7:10

of frontier models like the baseline IQ

7:13

that we inject in the main agentic loop.

7:15

I'm going to leave this as an exercise

7:17

to the reader and to other people in the

7:18

room. I'm really glad a lot of you are

7:20

building amazing models that you know we

7:22

use all the time at Rapid. So this is

7:24

the pillar number one. The second pillar

7:26

is verification. It's very important

7:30

that we test for local correctness of

7:32

our agent at every step that it takes

7:34

and the reason is fairly intuitive. If

7:36

you are building on very shaky

7:38

foundations, eventually the castle will

7:40

topple down. So we brought verification

7:43

in the loop to make sure that in a sense

7:45

you are having you know nines of

7:47

reliability where in the compounding

7:48

errors than an agent will make

7:50

unavabodably if you know you don't put

7:52

any control on it. And last but not

7:54

least, you heard it on stage even

7:56

earlier. I'm sure you're going to be

7:57

hearing this, you know, the entire day

7:58

or the entire duration of the

8:00

conference. Uh the importance of context

8:02

management. So on one end you want to

8:04

have an agent that is capable of being

8:06

globally coherent. So it's aligned with

8:08

the intent of the user, the expectation

8:10

of the user. But at the same time, it is

8:12

also to be capable of managing both the

8:14

high level goal and the single task that

8:16

the agent is working on. I think we made

8:18

amazing progress in the last months on

8:20

context management. But I'm also excited

8:22

to see, you know, where we're going as a

8:24

field.

8:25

Let's start from the first pillar that

8:27

we work actively at rapid which is

8:29

verification.

8:31

So why do we focus on this? Over the you

8:34

know last year we realize something that

8:38

I think each one of you has experienced.

8:40

So without testing agents build a lot of

8:42

painted doors. In our case the painted

8:45

doors are very visible because we create

8:47

a lot of web applications. So you end up

8:49

basically trying to click on a button

8:51

and the handler is not looked up or some

8:54

of the data that we're showing is

8:55

actually mock data and it's not coming

8:57

it's not coming from a database. But in

8:59

general this phenomenon spans you know

9:00

across every type of component you're

9:02

building being it front end or back end

9:05

a lot of components are actually not

9:06

fully fleshed uh by the agent. So we run

9:10

some evaluations internally. We found

9:12

out that more than 30% of the individual

9:14

features happen to be broken. Know the

9:16

first time that are cooked by the agent.

9:18

And that also means that almost every

9:21

applications at least one broken feature

9:24

or painted door. They're hard to find.

9:27

The reason is users are not going to

9:29

spend time testing every single button,

9:31

every single field. And this is also

9:34

probably one of the reasons why a lot of

9:37

our users, especially the nontechnical

9:39

ones, still can't trust coding agents

9:41

very much. They are shocked when they

9:43

find that there's a painted door out

9:44

there. So, how do we solve this problem?

9:48

Fundamentally, we need an agent must

9:50

gather all the feedback that they need

9:52

from their environment, right? It's

9:54

easier said than done. Um again

9:57

nontechnical users not only cannot make

9:59

technical decisions but also they cannot

10:01

provide the technical feedback that you

10:03

know an agent is required to make

10:05

progress and most what they can do is

10:07

basic you know quality assurance testing

10:09

they can literally go around the UI

10:12

click interact with application

10:14

I'm I'm sure you have tried it in your

10:16

life this is extremely tedious to do and

10:18

it leads to a very bad user experience

10:20

and even though we relied on that with

10:22

our first release of the agent last year

10:25

quickly we undo that users don't want to

10:26

spend time doing testing. So we had to

10:29

find a complete you know orthogonal

10:31

solution to that which is autonomous

10:33

testing and it solves several different

10:36

issues. The first one is it breaks the

10:38

feedback bottleneck. Even if again we

10:41

ask feedback to the user we were not

10:43

given enough of that. Now we don't have

10:45

to wait anymore for human feedback. We

10:47

have a way to elicit as much information

10:49

as possible from the app autonomously.

10:52

We also want to prevent the accumulation

10:54

of small errors. What I was saying

10:55

before, we don't want to have

10:56

compounding errors while the agent is

10:58

building. And last but not least, we

11:00

have to overcome the laziness of

11:02

frontier models. So we need to verify

11:04

that whenever a model tells us that a

11:06

task has been completed, there is

11:08

actually the truth and that result is

11:09

not being elucinated.

11:12

There is a wide spectrum of code

11:14

verification that you know you can

11:16

accomplish. I think we all started from

11:18

the very left. You know you have basic

11:20

study code analysis with LSPs. We have

11:23

been executing the code since we had

11:24

basically LMS that were capable of

11:26

debugging and then we slowly started to

11:28

move towards the right. So generating

11:30

unit tests and running them it has a

11:32

limitation. It's limited only to

11:34

functional correctness. Uh unit testing

11:36

is not very powerful to do like proper

11:38

integration testing by definition. We

11:41

started also to do now API testing but

11:43

is only limited to API code. So you can

11:46

test endpoint of an applications. you

11:48

can't really test how a web app

11:50

functions and looks like and for this

11:53

reason in the last few months H has and

11:56

other companies are putting a lot of

11:58

effort in really creating autonomous

12:00

testing based on the browser you know in

12:02

case the app that we're building is a

12:03

web application there are two main

12:05

categories here one is computer use it's

12:08

a onetoone mapping with user interface

12:10

so the model is directly interacting

12:11

with the application it requires

12:13

screenshots it tends to be fairly

12:15

expensive and fairly slow I'm sure you

12:18

you tested it yourself. A good way in in

12:20

the middle is browser use where we

12:23

simulate the user interface. You can

12:25

then interact with the browser and with

12:27

the web application and it relies on

12:29

basically accessing the DOM through

12:30

abstractions.

12:33

So how do we how do we make this work?

12:36

Um what we do is that we generate

12:38

applications that are amenable to

12:40

testing and we sort of merge everything

12:44

together from the previous slides that I

12:46

showed you. So we allow the our testing

12:49

agent to interact with an application

12:51

and gather screenshots in case nothing

12:53

has worked. So we have a full back to

12:54

computer use. But the vast majority of

12:57

times what we do is that we have

12:58

programmatic interactions with the

13:00

applications. So we interact with the

13:01

database, we read the logs, we do API

13:04

calls, we literally click on the app and

13:07

get back all the information that we

13:09

need. And by putting all of this

13:10

together, we collect enough feedback

13:13

that allows our agent both to make

13:16

progress and also to fix all the painted

13:18

doors that it encounters.

13:21

Just a know short technical deep dive on

13:25

how we accomplish this. I'm sure you

13:27

have seen a lot of the toolbased uh

13:30

browser use. There are amazing libraries

13:32

out there. First one comes to my mind is

13:34

stan and the idea is that you have an

13:37

agent that has a few very generic tools

13:40

exposed. So know the agent can create a

13:42

new tab, can click, can fill forms etc

13:45

etc. The limitation is that it's

13:48

difficult to enumerate all the different

13:50

type of interactions you could be having

13:51

with a browser. The problem of testing

13:53

is very similar to the Tesla analogy I

13:56

was making before. Maybe this cardality

13:58

of tools available is enough for 99% of

14:01

the interaction types. But then there is

14:03

always a long tale of idiosyncratic

14:06

interactions that a user makes with the

14:07

with a web application that are hard to

14:10

map into this tool these different tool

14:12

calls. So what we do uh in our case at

14:15

rapid is we directly write playrite code

14:19

and playrite code is first of all very

14:21

manageable for LLMs. LLMs are kind of

14:24

amazing at writing playright. You know

14:26

this is the experience that we had uh

14:27

since we started to work on this project

14:29

is also very powerful and expressive. So

14:33

in a sense it's a super set to what you

14:34

can express uh on the compared to the

14:37

left on the tools uh testing and last

14:40

but not least there is beauty in

14:42

creating playright code because you can

14:44

reuse those tests. The moment you write

14:46

a test in script then you can rerun it

14:48

as many times as you want. So in a sense

14:50

the moment you created a test you're

14:52

also creating a regression test suite

14:54

that you can keep running in the future.

14:56

And all these kind of uh tricks that I

14:59

explained to you right now, they helped

15:01

us to create something that is roughly a

15:04

order of magnitude cheaper and faster

15:05

compared to computer use. And we'll go

15:08

back later on how important latency is.

15:11

The second thing that the second pillar

15:13

that I wanted to talk about today of

15:14

course is context management. And I'm

15:16

going to go very fast here because I

15:18

think you're going to be hearing a lot

15:19

of talks today about it. The the high

15:22

level message here is that long context

15:25

models are not needed to work on queer

15:27

and long trajectories. Uh from

15:30

experience, we found that most of the

15:31

tasks, even the more ambitious one, can

15:34

be accomplished within the 200,000

15:35

tokens. So we're still not in a world

15:38

where working with models that have 10

15:41

million or 100 million uh context

15:43

windows is necessary to actually run

15:44

autonomous agents. And we accomplish

15:47

this by means of learning how to do

15:49

context management correctly. So first

15:52

of all, there are several different ways

15:54

to maintain state which don't imply

15:57

chucking all the state into your context

15:59

window. You can do that for example by

16:02

using the codebase itself to maintain

16:04

state. So you can write documentation

16:06

while the agent is creating new code.

16:08

You can also include the plan

16:10

description and all the different task

16:12

list that the agent is working on. You

16:13

can persist them on the file system. So

16:15

even there like have a lot of ways to

16:17

offload your memories. And last but not

16:19

least and this is something I think you

16:20

know Antropic has been uh really

16:22

evangelizing about um you can even dump

16:25

directly your memories in the file

16:27

system and then making sure that your

16:29

agent decides when to write them back

16:31

the moment they become relevant to your

16:32

work. So for this reason we have been

16:35

seeing a lot of announcements in the

16:36

last couple of months. Uh just pick this

16:38

one from Entropic with Cloud Sonet 4.7.

16:42

So I wish 4.5 uh they have been able to

16:45

run uh focus task for more than 30 hours

16:48

in a row. We have seen similar results

16:50

from open AI on the math problems. So I

16:53

think we we kind of broke the barrier of

16:55

running for long and you know being able

16:57

to have coent tasks.

17:00

I would say the key ingredient to make

17:01

this happen has been how good models

17:04

hand as agent builders have become in

17:06

doing sub agent orchestration.

17:09

Sabages basically work by means of

17:11

they're invoked in the core loop. So

17:12

it's a completely it's starting from a

17:14

blank slate uh from a completely fresh

17:17

context. You as an agent builder decide

17:19

what subset of the context to inject

17:21

when the sub agent starts and it's a

17:24

concept that is very similar I think to

17:26

everyone who's been writing software you

17:27

know in the last decades is separation

17:29

of concerns. So you decide what your sub

17:31

engine is going to be working on. You

17:32

give it the least possible amount of

17:34

context. You allow it to run to

17:35

completion. you only get the output the

17:38

results. You inject them back into the

17:39

main loop and you keep running in this

17:41

way. Of course, it significantly

17:43

improves the number of memories per

17:45

compression. I just brought this plot

17:48

from directly from reput agent running

17:50

in production the moment we kicked in

17:52

our new subvision orchestrator on the a

17:55

on the y-axis you can see the number of

17:57

memories per compression. So we went

17:59

from roughly 35 to 4550 recently. So big

18:04

improvement in terms of how often we are

18:07

recompressing our context just because

18:10

we can offload a lot of the context

18:12

pollution by means of using sub agents.

18:16

I'm going to give an example where this

18:18

made the difference for us. You know the

18:19

what I'm showing you here is more kind

18:21

of a cost optimization in a sense like

18:23

you're compressing less. You also have

18:25

separation of concerns which definitely

18:27

make your agent a bit smarter. In the

18:29

case of testing,

18:32

working with sub engine was almost

18:34

mandatory for us and basically we

18:36

started to work on automated testing

18:37

even before we were very advanced in

18:39

terms of subgent orchestration. And what

18:41

we found out is of course again as I was

18:44

saying before it makes things easier,

18:46

better cost, less pollution. But when

18:49

you allow the main loop not only to

18:52

create code but also to do browser opt

18:54

browser actions to put back the

18:56

observation of your browser actions into

18:58

the main loop you tend to confuse the

19:00

the hedging loop very much because at

19:02

this point there is a lot of it in terms

19:04

of the action that your main loop is

19:06

looking at. So in order to make this

19:09

work not only we have to build all the

19:11

playright framework that I was showing

19:12

to you before but we also have to move

19:14

our entire architecture into sub aents.

19:16

So at this point you can see very

19:18

clearly why there is a separation of

19:19

concern here. Got the main agent loop

19:21

running. We decide at a certain point

19:24

that it's time to verify if the output

19:26

of the agent has been correct. We make

19:28

this happen all within a sub agent. Then

19:30

we scratch the context window of that

19:31

sub agent. We just return back the last

19:34

observation to the agent loop and then

19:35

we keep running in that way. So if

19:38

you're having issues today making your

19:40

sub agents uh work correctly, this is

19:42

one of the reasons why that you want to

19:44

take a look at.

19:46

So I think we covered the high level of

19:49

how to create more and more powerful uh

19:51

autonomous agents over time and I only

19:54

see us as a field becoming even more

19:56

proficient than that in the next months.

19:58

There is one additional ingredient

20:00

though that is going to make the

20:01

difference and it's parallelism. And I

20:03

will argue that parallelism is important

20:06

not because it's going to make agents

20:08

more powerful per se, but rather because

20:10

it's going to make the user experience

20:12

more exciting. So of course it is great

20:16

to have an agent that is capable of

20:18

running autonomously for long, but at

20:20

the same time it comes with the price of

20:22

making the user experience less

20:24

thrilling. You are not in the zone

20:26

anymore. What you do is that you write a

20:28

very long prompt. It's translated into a

20:30

task list uh and then you go to have

20:33

lunch with your colleagues and then you

20:34

come back and you hope that the agent is

20:36

done. That is not the kind of experience

20:38

that most of the productive people want

20:39

to have in life. You know, you want to

20:41

see as much work as done as possible in

20:43

the shortest span of time.

20:45

So what we do as a as a field at this

20:48

point has been to create parallel

20:49

agents. It's a very common trade-off

20:52

which by the way doesn't only apply to

20:54

agents. it it applies to computing in

20:56

general and for parallel agents what you

20:58

do is that you you trade off basically

21:01

extra compute in exchange for time. Why

21:03

there is this trade-off? So first of all

21:05

when you're running agents in parallel

21:08

you're gathering the same context in

21:10

multiple context windows. So every

21:13

single parallel agent you will be

21:14

running probably shares say 80% of the

21:16

context across the board. So of course

21:18

you are just putting more computed work

21:21

because you're running those agents in

21:23

parallel. There is also another cost

21:25

that is kind of intangible for a lot of

21:28

you here in the room because I'm sure

21:29

you're all expert software developers.

21:32

But what do you do with the output of

21:34

multiple par agents at the end? Often

21:36

times you need to resolve merge

21:38

conflicts. So as a reminder, my users

21:40

don't even know what's the concept of

21:42

merge conflicts. It's something that I

21:43

have to figure out on our own. So the

21:47

current way in which we think of

21:48

parallel agents in in the space doesn't

21:51

really apply to rapid. Now at the same

21:53

time I still want to very much to

21:55

accomplish this. There are so many

21:57

interesting features that you can enable

21:59

with parallelism aside from the fact

22:00

that you can get more work done. Uh at

22:03

times you want to you want testing to be

22:05

running in parallel with the agent that

22:07

creates code. Testing no matter how much

22:09

we optimize it is still very slow. If an

22:11

agent is only spending time on testing

22:13

users are not going to be engaging with

22:15

your application anymore. Um, at the

22:17

same time, it's also great to have a

22:18

synchronous process running while your

22:20

agent is running because you can inject

22:22

useful information back into the main

22:24

core loop. And last but not least is a

22:26

very common technique that we know boost

22:29

performance if you have enough budget to

22:32

do so. You should be sampling multiple

22:35

trajectories at the same time. So a lot

22:37

of perks are coming with parallel

22:38

agents. But the the way in which we

22:42

implement them today which I go

22:43

basically call user has an orchestrator

22:45

is the fact that tasks the par task that

22:48

you want to run are determined by you by

22:50

the user and each task is dispatched in

22:53

its own thread. So there's a bit of

22:56

manual process even the task

22:57

decomposition in a sense is happening in

22:59

your mind while you're thinking about

23:00

which agents you want to run and then

23:03

the moment you get back all the results

23:05

you need to go through the problem of

23:06

merge conflicts and often times this is

23:09

not trivial at all no matter how many

23:11

amazing tools are out there. So what

23:14

we're working on today for our next

23:16

version of the agent is having the core

23:19

loop as the orchestrator. So the key

23:21

difference here is the fact that the the

23:24

subtask that we're going to be working

23:26

on are not determined by the user but

23:28

they're determined by the corion loop

23:30

and the parallelism is basically

23:32

deciding on the fly. The agent does the

23:35

task de composition on behalf of the

23:36

user and this comes with a couple of

23:38

advantages. First of all again there's

23:41

no cognitive burden to for the user to

23:43

understand how they should be

23:44

decomposing the task. At the same time

23:47

also there are ways in which you can

23:49

create tasks that sort of mitigate the

23:53

problem of merge conflicts. I'm not

23:55

claiming that we're going to be able to

23:56

mitigate it 100%. There are so many

23:58

corner cases in which merge conflict

24:00

will still represent a problem but there

24:02

are a lot of different techniques known

24:04

in software engineering to make sure

24:06

that you can try to have multiple sub

24:08

agent not stepping on each other tools.

24:10

So the core loop as an orchestrator is

24:14

going to be the our main bet for the

24:16

next few months.

24:18

And in case you're passionate about

24:19

these topics,

24:21

I'm always hiring a rabbit. Thank you.

24:30

[Music]

Interactive Summary

The presentation discusses the development of autonomous coding agents at Replit, specifically tailored for non-technical users. The speaker defines two types of autonomy: supervised (like Tesla's FSD, requiring user intervention) and fully autonomous (the 'Waymo' experience, removing the need for technical oversight). The strategy focuses on empowering knowledge workers by abstracting technical complexity, with the main pillars being frontier model capabilities, autonomous verification, and context management. The speaker emphasizes that autonomy should not be equated solely with long run-times, but rather with 'reducible' runtime where the agent handles tasks without user intervention. Future developments focus on using the core agent loop as an orchestrator to manage parallelism and sub-task decomposition, aiming to improve user experience without requiring users to manage merge conflicts.

Suggested questions

4 ready-made prompts