Making Codebases Agent Ready – Eno Reyes, Factory AI

Transcript

0:13

[music]

0:20

Hey everybody, my name is Eno. Uh really

0:23

pumped to talk today about uh something

0:26

that at Factory we care a lot about. uh

0:28

when we started 2 and 1/2 years ago uh

0:31

we said that our mission is to bring

0:33

autonomy to software engineering. Um and

0:35

that is like got a ton of loaded words

0:38

in it. That sounds a little buzzwordy

0:39

right now, but I think that the my goal

0:42

is that you guys leave this like roughly

0:44

20 minutes uh with a bunch of insights

0:47

that will apply to your organization uh

0:49

and the teams that you build, the

0:51

companies you advise, um and if you're

0:53

building products in the space, uh

0:54

insight into like sort of maybe how to

0:56

think about building autonomous systems

0:58

and also making your engineering org one

1:01

that's able to use agents really

1:03

successfully. Um, a sort of like plus of

1:06

this is that ideally this applies to any

1:08

tools you're using that involve AI. So

1:10

it won't be specific to like our product

1:12

or any of the other amazing tools out

1:13

there. Um, I'd like to start with a

1:16

little bit about uh, you know, Andrej

1:18

Karpathy had a very well-timed tweet. Uh,

1:20

so of course I'm going to mention it.

1:21

Uh, you know, he he kind of talked about

1:23

uh, this idea of software 2.0 coming

1:26

from auto uh, the the ability to verify

1:29

things, right? Um, this is something

1:31

that's in sort of like the the mind of

1:34

Silicon Valley right now as uh the most

1:37

frontier models are built with post-

1:38

training that involve lots of like

1:40

verifiable tasks. Um, and really I think

1:42

the most interesting thing here is the

1:44

sort of frontier and boundary of what

1:46

can be solved by AI systems is really

1:48

just a uh sort of an input function of

1:51

whether or not you can specify an

1:53

objective and search through the space

1:55

of possible uh solutions, right? And so

1:58

uh we are used to building software uh

2:00

purely via specification. We say like

2:02

the algorithm does this and like input

2:05

is x output is y. But if you sort of

2:07

shift your mindset to thinking about

2:09

automation via verification uh it is a

2:12

little bit of a of of a difference in

2:14

what is possible to build. Um and there

2:17

is another great blog post by uh Jason

2:20

where he talks about the asymmetry of

2:22

verification. Uh this is like pretty

2:24

intuitive to most people who know about

2:26

like P versus NP. Uh it's like a a thing

2:28

that a lot of people have talked about

2:29

throughout the like history of computing

2:32

and and software. But there are a ton of

2:34

tasks that are much easier to verify

2:36

than they are to solve. Um and and vice

2:38

versa, but but the the most interesting

2:40

sorts of uh easy to verify problems are

2:43

ones where there's an objective truth.

2:45

They're pretty quick to validate whether

2:47

or not they're true. Uh they're

2:49

scalable. So validating a bunch of these

2:51

things maybe in parallel uh is easy. Um

2:54

it's low noise so your chance of

2:56

validating it is like really really

2:58

high. Um and they have continuous sort

3:01

of signals. Uh it's not just like a

3:03

binary yes no but like maybe you're 30%

3:06

70% 100% accurate or correct. Um and you

3:10

know the reason I bring both these

3:11

things up is software development is

3:14

highly verifiable. Right? This is like

3:16

the frontier. It's why uh software

3:19

development agents are the most advanced

3:20

agents in the world right now. uh and

3:23

there are so much uh there's so much

3:26

work that has been put in uh over the

3:27

last you know 20 to 30 years around the

3:30

automated validation and verification of

3:32

software that you build um testing right

3:35

unit tests end to end tests QA tests

3:38

right um the frontier of this is

3:40

expanding there's tons of cool companies

3:42

like Browserbase and computer use

3:44

agents and all these things that are

3:45

making it easier to validate uh really

3:48

complex visual or front-end changes um

3:50

docs, right, having like an OpenAPI spec

3:53

for your codebase uh is something that

3:55

can be automated. It's validated. Um I I

3:58

I can go through and enumerate a bunch

4:00

of these, but I actually think it is

4:01

sort of a nice checklist for yourself,

4:03

right? Do you have some automated

4:05

validation for the format of your code?

4:08

Uh do you have linters? These things

4:10

for professional software engineers are

4:12

sort of like, yeah, of course we do. But

4:13

I think you can go a step further,

4:15

right? This is where that continuous

4:17

validation component comes in. Um, do

4:19

you have linters that are so

4:21

opinionated that a coding agent will

4:23

always make code that is exactly at the

4:26

level of what your senior engineers will

4:27

produce? How do you do that? What does

4:29

that even mean? Right? Do you have tests

4:32

that will fail when AI slop has been

4:34

introduced? Uh, and when high-quality AI

4:37

code is introduced, those tests pass,

4:39

right? These additional layers of

4:41

validators are things that most code

4:44

bases actually lack because humans are

4:46

pretty good at handling most of this

4:49

stuff without the automated validation.
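
As one illustration of what an "opinionated" linter rule could look like, here is a minimal sketch using Python's standard `ast` module. The two rules (no bare `except:`, every function declares a return type) and the name `find_violations` are assumptions made for this example, not rules mentioned in the talk. The point is only that a team preference can be turned into a check an agent can run and get feedback from.

```python
import ast


def find_violations(source: str) -> list[str]:
    """Check source code against two example house rules (both hypothetical)."""
    found: list[tuple[int, str]] = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Example rule 1: bare `except:` clauses are not allowed.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append((node.lineno, "bare except"))
        # Example rule 2: every function must declare a return type.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns is None:
            found.append((node.lineno, f"function '{node.name}' has no return annotation"))
    # Report in source order so the feedback is stable between runs.
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]


if __name__ == "__main__":
    sample = "def f(x):\n    try:\n        return x\n    except:\n        return None\n"
    for violation in find_violations(sample):
        print(violation)
```

A check like this returns specific, line-level messages rather than a single pass or fail, which is the kind of signal an agent can act on when it revises its change.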

4:51

Right? Your company may be at some test

4:53

coverage rate that's like 50% or 60%.

4:56

And that's good enough because humans

4:58

will test manually. Um you may have a

5:00

flaky build that every third build it

5:02

sort of fails and everyone at your

5:03

company secretly hates it but no one

5:05

says anything, right? These are the

5:07

sorts of things that we know are true

5:08

about large code bases. And as you scale

5:11

out to extremely large code bases,

5:12

organizations with 44,000 plus

5:15

engineers, right? Uh this starts to

5:17

become a very accepted norm that the bar

5:19

is sort of maybe at 50% or 60%. Um and

5:23

the reality is is most software orgs can

5:25

actually scale like that. uh it's sort

5:27

of fine to be at that lower uh barrier,

5:30

but when you start introducing AI agents

5:32

into your software development life

5:34

cycle, and I don't just mean in

5:35

interactive coding, but really across

5:37

the board, right? Uh review,

5:38

documentation, testing, all this stuff.

5:41

Um this breaks their capabilities. Most

5:43

of you have probably only seen an AI

5:45

agent that operates in a codebase that

5:47

has uh a decent amount of validation. Um

5:51

I think a lot of the best companies in

5:52

the world right now actually have

5:54

introduced very rigorous validation

5:56

criteria and it means that their ability

5:58

to use agents is significantly greater

6:01

than that of your like average uh

6:03

developer.

6:05

Uh you know and and if you think about

6:07

it this like traditional loop of

6:09

understanding a problem, designing a

6:11

solution to the problem, coding it out

6:13

and then testing it uh sort of shifts if

6:16

you have really rigorous validation. Uh

6:18

it becomes a process of when you're

6:20

using agents specifying the constraints

6:23

by which you would like to be validated

6:25

and what should be built. Uh generating

6:27

solutions to that outcome verifying uh

6:30

both with your automated validation as

6:32

well as with your your own intuition. Um

6:34

and then iteration where you continue to

6:36

iterate on that loop. This move from

6:39

sort of like traditional development to

6:41

specification-driven development is

6:43

one that we're starting to see sort of

6:45

bleed into all of the different tools.

6:46

Different tools have spec mode. Droids

6:49

have like our Droid is our coding agent

6:51

have like specification mode, plan mode.

6:53

Uh there are entire IDEs that orient

6:56

you around this like specification-driven

6:58

flow. Um and if you combine these two

7:01

things together, this is really how you

7:03

build reliable and high-quality

7:05

solutions. So if you think about it,

7:07

what is like the best decision for you

7:09

to make as an organization? Is it

7:12

spending 45 days comparing every single

7:14

possible coding tool in the space and

7:16

then determining that one tool is

7:17

slightly better because it's 10% more

7:20

accurate at SWE-bench or is it making

7:22

changes to your organizational practices

7:24

that enable all of these coding agents

7:26

to succeed and then picking one that

7:28

your, you know, developers like or

7:30

honestly letting people choose from the

7:32

tons of amazing tools out there.
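
The specify, generate, verify, iterate loop described a moment ago can be sketched roughly as follows. This is a sketch under stated assumptions: `run_spec_loop`, the `generate` callable (standing in for whatever coding agent is used), and the validator signature are all hypothetical names for illustration, not the API of any particular tool.

```python
from typing import Callable, Optional

# A validator takes a candidate change and returns a list of problems (empty means pass).
Validator = Callable[[str], list[str]]


def run_spec_loop(
    spec: str,
    generate: Callable[[str, list[str]], str],
    validators: list[Validator],
    max_iterations: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Generate candidates for a spec until every validator passes or the budget runs out."""
    feedback: list[str] = []
    for _ in range(max_iterations):
        # Generate: the agent sees the spec plus feedback from the previous attempt.
        candidate = generate(spec, feedback)
        # Verify: collect problems from every automated validator (lint, tests, ...).
        feedback = [problem for validate in validators for problem in validate(candidate)]
        if not feedback:
            return candidate, []
    # Budget exhausted: return no candidate and the outstanding problems for a human to review.
    return None, feedback
```

The design choice worth noting is that the validators, not the generator, decide when the loop stops, so swapping in a different agent leaves the acceptance criteria unchanged.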

7:35

And when you have these validation

7:37

criteria, you can actually introduce way

7:40

more complex AI workflows to your

7:42

organization, right? Uh if you cannot

7:44

automatically validate whether or not a

7:47

uh a PR is like reasonably successful or

7:50

has code that won't definitely break

7:52

prod, uh you are not going to be

7:54

parallelizing several like agents at

7:56

once, right? you are not going to be

7:58

decomposing a large-scale modernization

8:01

project uh into a bunch of different

8:03

subtasks like that is that is a very

8:06

frontier style task to use AI for and if

8:09

the single task execution right the

8:11

simple I would like to get this done

8:13

here's exactly how I'd like it to be

8:15

done and here's how you should validate

8:16

if that does not work nearly 100% of the

8:19

time you can sort of forget successfully

8:22

using these other things at scale in

8:24

your company um when you get into other

8:26

tools like code review, right? Uh if you

8:28

want a really high-quality AI-generated

8:30

code review, you need documentation for

8:33

your AI systems. Uh and yes, uh agents

8:36

will get better at, you know, picking

8:38

out, you know, whether or not to run

8:40

lint or test. They will get better at

8:42

finding solutions when you don't have

8:45

explicit pointers. They'll get better at

8:47

search, but they won't get better at

8:49

just randomly creating this validation

8:51

criteria out of thin air. Right? This is

8:53

why we believe software developers, by

8:55

the way, are going to continue to be

8:56

heavily involved in the process of

8:58

building software because your role

9:00

starts to shift to curating the sort of

9:03

environment and garden that your

9:04

software is built from. You're setting

9:06

the constraints. You're building these

9:08

automations and introducing continued

9:10

opinionatedness

9:12

uh into the uh into these automations.

9:14

Um, and you know, if your company

9:17

doesn't have at least all of these,

9:18

right? Then that means that there's a

9:20

lot of work that you can do totally

9:22

absent of a procurement cycle or buying

9:24

one tool or trying out another one. Uh,

9:27

and so plug is that we help

9:30

organizations do this, right? I think

9:32

that it's great to have tools that allow

9:34

you to uh go in and assess this stuff.

9:37

They have ROI analytics that let you

9:39

interact. Um but I think that for most

9:42

organizations uh there is actually like

9:44

a very clear way to do this right you

9:47

can go and analyze where are you across

9:50

those eight different pillars of like

9:52

automated validation do you have a

9:54

linter, how good is the linter, do you

9:55

have AGENTS.md files, an open standard

9:58

that almost every single coding agent

10:00

supports um you can improve uh and

10:03

systematically enhance uh these

10:05

different validation criteria uh and you

10:08

can go through and say Well, we're

10:10

seeing that coding agents are reliable

10:12

enough for a senior developer to use,

10:14

but our junior developers, if you have

10:16

the tooling to to tell, by the way, like

10:18

which developer is using what tools, you

10:20

you you can ask questions like maybe our

10:22

junior developers are actually totally

10:25

unable to use these coding agents. And

10:27

you'll learn that the reason why is not

10:28

because they're like more incompetent or

10:30

they don't know how to use the tool, but

10:31

because there's these niche practices

10:33

that you don't have automated validation

10:35

for, right? And if you think about what

10:37

what is the difference between a like

10:39

Google or a Meta and a uh a still large

10:43

but like 2,000 person engineering org? The

10:46

difference is that a new grad with

10:47

effectively zero context can go and ship

10:50

a change to make YouTube's like boundary

10:52

like slightly more round and it won't

10:54

with some degree of confidence take down

10:56

YouTube for like a billion users, right?

10:58

And the reason that's possible is

11:00

because of the insane amounts of

11:01

validation that have to happen on that

11:03

code for it to be shipped. The big

11:06

difference that we now have is we have

11:07

coding agents that can go and identify

11:10

exactly where these gaps are and they

11:12

can actually remediate those fixes.
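
A very rough sketch of what "identifying the gaps" might look like as a script is shown below. The `SIGNALS` table and `find_gaps` are hypothetical. The file and directory names are common conventions chosen for illustration, and this is not the list of pillars from the talk. Checking that a file exists says nothing about how good the linter or tests are, so this is a starting point only.

```python
from pathlib import Path

# Hypothetical checklist: each signal maps to paths whose presence suggests it is in place.
SIGNALS: dict[str, list[str]] = {
    "agent instructions": ["AGENTS.md"],
    "linter config": ["ruff.toml", ".flake8", ".eslintrc.json"],
    "tests": ["tests", "test"],
    "CI pipeline": [".github/workflows"],
}


def find_gaps(repo_root: str) -> list[str]:
    """Return the names of signals for which none of the candidate paths exist."""
    root = Path(repo_root)
    return [
        name
        for name, candidates in SIGNALS.items()
        if not any((root / candidate).exists() for candidate in candidates)
    ]


if __name__ == "__main__":
    # Report gaps for the current working directory.
    for gap in find_gaps("."):
        print(f"missing: {gap}")
```

The output of a check like this could be handed to a coding agent as a task list, for example "add a linter config" or "write AGENTS.md", which matches the remediation step described above.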

11:14

Right? So you can ask a coding agent,

11:16

could you figure out where we're not

11:18

being opinionated enough about our

11:19

linters. You can ask a coding agent to

11:22

generate tests. We have an engineer

11:23

named Alvin who I love this quote. He

11:26

said a slop test is better than no test.

11:28

Uh and I think that that's slightly

11:30

controversial, but the thing that I

11:31

would argue here is that just having

11:33

something there, right, that it passes

11:36

uh when changes are correct and somewhat

11:39

accurately uh matches to the spec of

11:41

what you want built, uh people will

11:43

enhance it. They'll upgrade it and other

11:45

agents will actually notice these tests.

11:48

They will follow the patterns. So the

11:50

more opinionated you get, the faster the

11:52

cycle continues. So I think that what

11:54

you guys should be thinking about is

11:55

what are the feedback loops in our

11:56

organization that we are catering

11:58

towards. If you have better agents, they

12:01

will make the environment better which

12:03

will make the agents better which will

12:04

mean you have more time to make the

12:05

environment better. And this is sort of

12:07

the new DevX loop as well that

12:09

organizations can invest in uh that will

12:11

enhance all of the tools that you're

12:13

procuring, right? So no matter whether

12:15

it's a code review tool, a coding agent,

12:17

etc., they will all benefit. Um and I

12:19

would argue that it sort of shifts your

12:21

mental model about what you're as a

12:22

leader investing in when you're

12:24

investing in your software org right

12:26

now. The idea of uh you know opex as

12:30

like the input to engineering projects

12:31

like we are investing in we want more

12:33

people in order to solve this problem.

12:35

we need 10 more people. Um, I would I

12:37

would argue that uh the other thing that

12:39

you can now start investing in is this

12:40

environment feedback loop that enables

12:42

these additional people to be

12:44

significantly more successful, right?

12:46

And I think that that's the feedback

12:48

loop that can actually take quite a lot

12:49

of value because coding agents can just

12:51

scale this out. So you know all of this

12:54

is to say there's a lot that can be done

12:56

outside of the like product itself uh to

12:59

enable these systems and the best coding

13:01

agents will actually take advantage of

13:03

these validation loops right so if your

13:05

coding agent isn't proactively seeking

13:08

linters, tests, etc., then you know at the

13:11

end of the day it's not going to be as

13:13

good as one that will seek those

13:15

validation criteria and in addition to

13:17

that when organizations uh uh think

13:20

about these sorts of things if you're

13:22

the person who's able to say, "Here's my

13:24

opinion. Here's how I want software to

13:26

be built." It scales your capabilities

13:29

out greater than ever before. Like one

13:31

opinionated engineer can actually

13:33

meaningfully change the velocity of the

13:35

entire business if you take this to

13:37

heart. Uh and you have a way to measure

13:39

and systematically improve. Um so that's

13:42

uh you know the the majority of uh what

13:45

I came here to say. I think that the the

13:47

the only thing that I'd leave you with

13:49

uh is that when you think about where AI

13:52

is going and like where we're at today,

13:54

we are still really early in our

13:57

journey of using software development

13:58

agents. If you want a world where the

14:02

moment a customer issue comes in, a bug

14:04

is filed, that ticket is picked up, a

14:07

coding agent executes on that, that

14:10

feedback is presented to a developer,

14:12

they click approve, that code is merged

14:15

and deployed to production in a feedback

14:17

loop that takes maybe an hour, 2 hours.

14:20

That will be possible, right? We all are

14:23

sort of skeptical about that fully

14:24

autonomous flow. That is technically

14:27

feasible today. The limiter is not the

14:29

capability of the coding agent. The

14:31

limit is your organization's validation

14:33

criteria. So this is like an investment

14:35

that made today will make your

14:37

organization not 1.5x, not 2x, but that

14:42

is where the real like 5x, 6x, 7x comes

14:45

from. Um, and it's sort of a an easy

14:47

thing to say and it's an unfortunate

14:49

story because what that means is you

14:51

have to invest in this. It's not

14:52

something that like AI will just

14:54

magically give to you. Uh it's a choice

14:56

that you as an organization have. Uh and

14:58

if you make it now, I can guarantee you

15:00

that you will be in the top 1 to 5% of

15:03

organizations in terms of eng velocity.

15:05

Um and you will outcompete everybody

15:06

else in the field. So highly recommend

15:09

investing in this sort of stuff and

15:10

hopefully you found this helpful and

15:12

have some lessons to take home. Thanks.

15:14

[applause]

15:16

[music]

Interactive Summary

Eno discusses the critical role of rigorous validation and feedback loops in successfully integrating AI coding agents into software engineering organizations. Rather than focusing solely on selecting the best AI tool, he argues that companies must invest in automated verification—such as linters, testing frameworks, and clear development specifications—to build reliable, high-quality autonomous systems. By shifting towards 'specification-driven development' and enhancing internal validation criteria, organizations can significantly increase their engineering velocity and unlock the full potential of AI agents.
