What Data from 20m Pull Requests Reveal About AI Transformation — Nick Arcolano, Jellyfish

Transcript

0:00

Hi, my name is Nicholas Arcolano and I'm

0:02

the head of research at Jellyfish.

0:04

Today, I'd like to talk to you about AI

0:06

transformation, specifically what real

0:08

world data can tell us about what's

0:10

actually happening in the wild. Now, a

0:13

lot of AI native companies are being

0:15

founded right now, and there are many

0:16

more existing companies that are trying

0:18

to transform themselves into being AI

0:20

native. I've talked to many folks from

0:23

these companies, and they all have the

0:24

same big questions. Number one, what

0:27

does good adoption of AI coding tools

0:29

and agents actually look like? Uh,

0:32

number two, what productivity gains

0:34

should I be expecting as we transform

0:36

our team and the tools that we use? Uh,

0:39

three, what are the side effects of this

0:42

transformation?

0:43

And perhaps most importantly, if AI

0:46

transformation isn't delivering as

0:48

advertised, what's going on and what can

0:50

you do about it? Now, at Jellyfish, we

0:52

believe the best way to get answers is

0:54

with data. So in the next 15-20 minutes

0:56

or so, I'm going to give you some

0:58

data-backed insights from studies we've

0:59

done to help you tackle these big

1:01

questions.

1:03

Okay, before we jump in though, uh let's

1:05

take a minute to talk about the data

1:07

behind the rest of the stuff in this

1:08

talk. Uh now at Jellyfish, we provide

1:12

analytics and insights for software

1:13

engineering leaders. And to do this, we

1:15

combine information from multiple

1:17

sources, including

1:19

usage and interactions with AI coding

1:21

tools like Copilot, Cursor, Claude Code,

1:25

uh interactions uh with autonomous

1:27

coding agents, things like Devin and

1:29

Codex as well as PR review bots. We

1:32

also combine this with data from source

1:34

control platforms like GitHub, so we can

1:36

understand things about the actual

1:38

codebase where the work is happening. We

1:40

also pull in data from task management

1:43

platforms uh things like Linear or Jira

1:46

and that tells you about what the actual

1:47

goal of uh the work being done is. So

1:50

for the rest of this talk we're going to

1:52

be looking at findings from a data set

1:53

with data like this uh across our

1:56

customers that comprises about 20 million

1:58

pull requests. Uh these were written

2:00

and merged by about 200,000 uh developers

2:03

from around a thousand companies. We've

2:06

been collecting this data for more than

2:07

a year. So today we'll be looking at

2:09

results that span from June 2024 to the

2:12

present. Okay,

2:14

so let's dig in. Question one, what does

2:18

good adoption look like?

2:21

Well, [sighs and gasps] let's start with

2:23

lines of code. I don't think this is a

2:25

great metric, but it's one we all hear

2:26

about in the media a bunch, so it's

2:28

worth talking about. Here's data from a

2:31

cohort of companies we've been tracking

2:32

since June of last year. The purple bar

2:34

represents the fraction of those

2:36

companies that are generating 50% or

2:38

more of their code with AI. So if you

2:40

look at that purple bar, you can see

2:42

that starting last summer, only about 2%

2:44

of these companies were generating 50%

2:46

or more of their code with AI. But you

2:48

can see this has been steadily growing.

2:50

And as of last month, among these same

2:52

companies, now nearly half are

2:53

generating 50% or more of their code

2:56

with AI.

2:57

Now, I think a more useful thing to look

3:00

at actually is developer adoption

3:02

because this gets at the actual behavior

3:05

change that you want to see in your

3:07

team. It's also the thing I've seen that

3:09

correlates most directly with good

3:11

productivity outcomes. And we're going

3:12

to talk about this a lot more later. Uh

3:15

but first, we define an AI adoption rate

3:18

for developers by computing the fraction

3:20

of time that they use AI tools when they

3:22

code. So 100% for a developer, that

3:25

means you're using AI tools every time

3:27

you code. A company's adoption rate for

3:30

the whole company, that's just the

3:31

average of the adoption rates for all

3:33

their individuals. So 100% for a company

3:35

means that every developer is using AI

3:37

every time they code. So what you see

3:40

here, this is a plot of the 25th, 50th,

3:43

and 75th percentile of company adoption

3:45

rates uh by week for the developers and

3:48

companies that we've been tracking. And

3:50

if you look at the AI adoption rates as

3:52

of last summer, you can see the median

3:54

adoption rate was around 22%. So, uh,

3:57

median company developers are using AI

4:01

22% of the time that they code. It's

4:03

grown steadily since then, and today

4:05

we're seeing median adoption rates close

4:08

to 90%. Now, if you're like me and

4:12

you're using multiple tools constantly

4:14

in parallel, both synchronous and

4:16

asynchronous modes, uh, you're you're at

4:18

100%. It might seem crazy to you that

4:20

not everyone else is at 100%. However,

4:22

the reality is that for many teams,

4:24

there are still real technical,

4:26

organizational, and cultural barriers to

4:28

adopting these tools more completely.
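A minimal sketch of the adoption-rate definition described above. The talk does not say how "time" is measured, so this assumes one boolean per coding session (True means an AI tool was used). All names and data are hypothetical.

```python
from statistics import mean, quantiles

def developer_adoption_rate(sessions):
    """Fraction of a developer's coding sessions that involved an AI tool.

    `sessions` is a list of booleans, one per coding session.
    The unit of "time" (session, day, PR) is an assumption of this sketch.
    """
    if not sessions:
        return None  # no coding activity, so no rate
    return sum(sessions) / len(sessions)

def company_adoption_rate(developers):
    """Average of the per-developer rates, skipping inactive developers."""
    rates = [developer_adoption_rate(s) for s in developers.values()]
    rates = [r for r in rates if r is not None]
    return mean(rates) if rates else None

# Hypothetical data: developer -> AI-used flag per coding session in one week.
acme = {
    "dev_a": [True, True, True, True],
    "dev_b": [True, False, False, False],
    "dev_c": [True, True, False, False],
}
acme_rate = company_adoption_rate(acme)  # (1.0 + 0.25 + 0.5) / 3

# 25th, 50th, and 75th percentiles across several hypothetical companies.
company_rates = [0.10, 0.22, 0.35, 0.60, 0.90]
p25, p50, p75 = quantiles(company_rates, n=4, method="inclusive")
```

Under this definition, a company reaches 100% only when every active developer uses AI in every session, which matches the description in the talk.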

4:30

So, that brings me to my final point on

4:32

adoption. You might ask, what about

4:35

autonomous coding agents? Now, the

4:37

results I've just shown you, those are

4:39

overwhelmingly from interactive coding

4:40

tools, things like Copilot, Cursor,

4:43

Claude Code. Now, we know that these

4:45

tools all have interactive agentic

4:47

modes, but what about your your kind of

4:49

true fully autonomous agents like your

4:52

Devins or your Codexes? Maybe you're

4:54

using agents like these or something

4:56

else to great effect or maybe you

4:57

haven't really gotten going with

4:59

autonomous agents yet. It's fine. You

5:01

know, wherever you are in your journey,

5:03

but if it feels like you're slow going

5:05

getting off the ground with autonomous

5:06

agents, I'm here to tell you you're not

5:09

alone. So in our data set, we only see

5:12

about 44% of companies have done

5:14

anything with autonomous agents at all

5:16

in the past 3 months. The vast majority

5:19

of that work is what you'd consider um

5:21

trialing and experimentation type stuff

5:23

like not full scale production and

5:26

ultimately it all amounts to less than

5:28

uh 2% of the millions of PRs that were

5:31

merged over that time frame. Uh so you

5:34

know still very early days.

5:38

All right, let's move on. Now, I'd like

5:40

to talk about productivity. So, even

5:42

though autonomous agents aren't yet

5:44

delivering at scale, we're still seeing

5:47

big gains from adoption of interactive

5:49

coding agents. So, let's talk about what

5:51

we're seeing. First though, what do we

5:54

mean by productivity? This can be a very

5:57

loaded term, kind of squishy,

5:59

overloaded. There's many ways to attack

6:01

it. Uh, a good place to start though,

6:04

just plain old PR throughput. How many

6:06

pull requests does the average engineer

6:08

merge per week? Not the most exotic

6:11

metric, but it's proven. It's widely

6:13

accepted. Uh do note that the absolute

6:16

level of PR throughput is something that

6:18

varies, right? It depends on things like

6:21

how you like to scope work. It actually

6:23

also depends on your architecture and

6:25

put a pin in that because we're going to

6:26

talk about that more later. Uh however,

6:29

measuring the change in PR throughput,

6:31

especially to keep all these other

6:32

things constant. Measuring that for your

6:34

team is a good way to uh track

6:36

productivity gains. Another good one,

6:38

cycle time. Uh you know, lots of

6:41

different ways to define that one, but

6:43

basically the latency or lead time to

6:46

code getting deployed. For our purposes,

6:48

we'll take each PR and we'll measure the

6:50

time frame from the first commit in the

6:51

PR until it was merged.
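A minimal sketch of the two metrics just defined: PR throughput (merged PRs per engineer per week) and cycle time (first commit in the PR until merge). The record layout and the sample data are hypothetical, not Jellyfish's schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical merged PRs for one team in one week.
prs = [
    {"author": "dev_a", "first_commit": datetime(2025, 3, 3, 9), "merged": datetime(2025, 3, 3, 17)},
    {"author": "dev_a", "first_commit": datetime(2025, 3, 4, 10), "merged": datetime(2025, 3, 6, 10)},
    {"author": "dev_b", "first_commit": datetime(2025, 3, 5, 8), "merged": datetime(2025, 3, 5, 20)},
]
engineers = ["dev_a", "dev_b", "dev_c"]  # dev_c merged nothing this week

def pr_throughput(prs, engineers):
    """Merged PRs per engineer for the period covered by `prs`."""
    return len(prs) / len(engineers)

def cycle_time_hours(pr):
    """Hours from the first commit in the PR until it was merged."""
    return (pr["merged"] - pr["first_commit"]).total_seconds() / 3600

throughput = pr_throughput(prs, engineers)                 # 3 PRs / 3 engineers
median_cycle = median(cycle_time_hours(pr) for pr in prs)  # in hours
```

Dividing by all engineers on the team, not only those who merged something, keeps the metric from looking better when fewer people ship.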

6:54

Okay, so here's what we're seeing for

6:57

changes in PR throughput. And let me

6:59

explain this chart. Uh, every data point

7:02

here is a snapshot of a given company on

7:04

a given week. The x-axis is the

7:07

company's AI adoption rate that we

7:09

discussed earlier. The y-axis is the

7:11

company's average PRs per engineer that

7:14

week. So you can see here a clear

7:16

correlation between AI adoption and PR

7:18

throughput. The average trend here is

7:20

about a 2x change as you go from zero to

7:23

full adoption. So on average, a company

7:26

should expect to double their PR

7:28

throughput if they go from not using AI

7:30

at all, which not really anybody's doing

7:32

anymore, to 100% uh adoption of AI

7:35

coding tools.

7:38

Now, we also see some gains in cycle

7:39

time. So more work is happening and it's

7:42

happening faster. This is similar to the

7:44

previous chart, but now on the y-axis,

7:46

we're looking at median cycle time for

7:48

PRs merged each week instead of PR

7:50

throughput. Uh, this is a cool chart. As

7:52

an aside, I like the cycle time

7:54

distribution because you can see these

7:56

two clear bands horizontally. So that

7:59

lower horizontal cluster that

8:01

corresponds to tasks that take less than

8:02

a day and then you see sort of a valley

8:05

and then there's a band in the middle

8:06

for tasks that take about two days. Then

8:08

there's a long tail of stuff going up

8:09

the y-axis that takes much longer. I've

8:11

truncated it here because as we all know

8:13

some things can take uh quite a while

8:16

[laughter] to to get merged. Um but you

8:20

know what's exciting here is the average

8:21

trend is a 24% decrease in cycle times

8:24

as you go from 0% to 100% adoption of AI

8:27

coding tools.

8:30

So big picture is good news for

8:33

productivity gains and maybe you're

8:35

seeing these things in your own

8:36

organization but uh what about the side

8:39

effects? We all know there's no free

8:40

lunch. So, what other things change as

8:42

you go through an AI transformation?

8:46

Well, one thing we've observed is that

8:48

PRs are getting bigger. So, here's a

8:50

plot like the previous ones I've showed,

8:52

except now the y-axis is PR size. So, on

8:57

average, teams that have fully adopted

8:59

AI coding tools are pushing PRs that are

9:01

18% larger in terms of net lines of code

9:04

added. Now that size change is due much

9:07

more uh you know when I say net it's due

9:09

more to additions than deletion. So that

9:12

means that the combined change is

9:13

primarily coming from net new code not

9:15

necessarily just uh you know fully

9:17

rewritten or heavily reworked code. Uh

9:21

another kind of interesting detail is

9:23

that the average number of files touched

9:25

is about the same. So this change is

9:26

more about code that's uh it's more

9:29

thorough or maybe just more verbose. But

9:31

it's not the case that AI is touching

9:33

more files and changing code in more

9:35

different places in in the code base.

9:37

This is largely happening within the

9:39

same files.

9:41

Well, now if teams are pushing more PRs

9:44

and writing and merging them faster and

9:46

the PRs are getting bigger, then you

9:49

might be wondering about quality. So,

9:51

are we seeing effects on quality as we

9:53

use more AI and push code faster? Well,

9:57

right now the answer is not really.

9:59

We're not really seeing any big effects.

10:01

We've looked at bug tickets created and

10:03

we looked at rates of PR reverts, code

10:05

that had to be rolled back, and we

10:07

haven't found any statistically

10:08

significant relationship with the rate

10:10

of AI adoption.

10:12

Uh interestingly we have found increases

10:15

in the rates of bugs resolved. Uh when

10:18

you dig into the data you find this is

10:20

because um teams are disproportionately

10:22

using AI to tackle bug tickets in their

10:24

backlog. So you see a lot more uh bug

10:28

tickets being um uh addressed by AI but

10:32

not necessarily being caused by AI. Uh

10:36

this makes sense. You know, bugs are

10:37

often well-scoped, verifiable tasks that

10:39

AI coding tools can be set up well to

10:40

succeed at. And we're seeing uh a lot of

10:42

people having success throwing AI at

10:45

those kinds of tasks. Uh but basically

10:47

there's there's no smoking gun on

10:49

quality yet though you know we're going

10:52

to keep digging in here especially as

10:54

usage of of asynchronous agents grows.

10:58

All right last question.

11:01

What if what you're seeing at your org

11:04

doesn't align with the kind of results

11:06

we've been talking about here so far?

11:08

You know what if you're listening to

11:10

this and it is just not your reality.

11:13

Well, I think I've made it clear uh so

11:16

far that the most important thing to

11:17

focus on first is adoption. You're not

11:19

going to see gains until you get folks

11:21

using these tools at scale. I think

11:22

that's common sense, but maybe you are

11:23

seeing high adoption and you're still

11:25

not seeing the kind of productivity

11:26

gains that all your friends on LinkedIn

11:28

are crowing about. So, what's going on?

11:32

Well, we've looked at a lot of things

11:33

here and there's plenty more to

11:35

investigate, but I'd like to share one

11:37

that's particularly interesting and uh

11:39

that's code architecture. By code

11:42

architecture uh what I mean is how are

11:46

the code for your products and services

11:48

organized across your repositories. So

11:51

uh think about code being organized into

11:53

monorepos versus polyrepos and that

11:57

arrangement of of your code. It could be

12:00

indicative of monolithic services versus

12:03

microservices. It could be the

12:04

difference between a centralized versus

12:06

a more federated product strategy. Uh

12:09

and the way that we actually measure

12:10

this, you know, one key metric for

12:12

understanding it is active repos per

12:15

engineer. This is actually a pretty

12:16

straightforward one. It's just how many

12:18

distinct repos a typical engineer uh

12:22

pushes code to in a given week.
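A minimal sketch of active repos per engineer, assuming "typical" means the mean across engineers (a median would be an equally reasonable reading). The push events are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical push events for one week: (engineer, repo).
pushes = [
    ("dev_a", "api"), ("dev_a", "api"), ("dev_a", "web"),
    ("dev_b", "api"),
    ("dev_c", "billing"), ("dev_c", "web"), ("dev_c", "infra"),
]

def active_repos_per_engineer(pushes):
    """Average number of distinct repos each engineer pushed to in the period."""
    repos_by_engineer = defaultdict(set)
    for engineer, repo in pushes:
        repos_by_engineer[engineer].add(repo)  # sets ignore repeat pushes
    return mean(len(repos) for repos in repos_by_engineer.values())

weekly_value = active_repos_per_engineer(pushes)  # (2 + 1 + 3) / 3
```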

12:25

One really cool thing about this metric

12:27

is that it's scale independent. So it

12:30

turns out that you know by computing

12:31

this per engineer normalizing by the

12:33

number of engineers you remove any

12:35

correlation with the size of of the

12:38

company uh with the size of the team. So

12:40

in other words this metric it tells you

12:43

something about the shape of the code

12:44

that your engineers have to work with on

12:46

a daily and weekly basis and it tells

12:48

you that regardless of how big your

12:49

company is.

12:52

So you know this metric that that I'm

12:54

introducing here this is what the

12:56

distribution of that metric looks like.

12:58

Uh here's a probability distribution

13:00

across the companies in our study. The

13:03

more centralized architectures you can

13:05

see on the left uh and then there's a

13:06

long tail of highly distributed

13:09

architectures to the right and then more

13:11

balanced architectures, you know,

13:12

balanced and lightly distributed lie

13:14

between these two extremes. So we we've

13:16

got these four regimes as you increase

13:20

um the active repos per engineer.
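A sketch of bucketing companies into the four regimes by active repos per engineer. The talk does not give its thresholds, so quartiles of the observed values stand in as placeholder cut points; the company values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

REGIMES = ["centralized", "balanced", "distributed", "highly distributed"]

def regime_cutoffs(values):
    """Three cut points splitting the observed values into four groups.

    Quartiles are a placeholder; the real thresholds are not stated in the talk.
    """
    return quantiles(values, n=4, method="inclusive")

def regime(value, cutoffs):
    """Map an active-repos-per-engineer value to one of the four regime names."""
    return REGIMES[bisect_right(cutoffs, value)]

# Hypothetical active-repos-per-engineer values, one per company.
companies = {"a": 1.0, "b": 1.5, "c": 2.0, "d": 3.0, "e": 8.0}
cutoffs = regime_cutoffs(list(companies.values()))
labels = {name: regime(value, cutoffs) for name, value in companies.items()}
```

With labels like these, the adoption-versus-throughput trend can be fit separately within each regime instead of across all companies at once.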

13:23

So you know, here's where it gets really

13:26

interesting. So remember those 2x gains

13:29

in PR throughput that I showed you

13:31

before. Here's a flashback. Remember

13:33

this. Uh well, if we take this plot, you

13:36

know, take all these data points, all

13:37

these different companies, and you

13:40

segment on um this active repos per

13:43

engineer.

13:45

We've got, you know, four different

13:47

regimes that we can do this analysis in.

13:49

So we've got centralized, balanced,

13:51

distributed, and highly distributed. And

13:54

if we perform that same analysis, we see

13:56

big differences. So looking at that top

13:59

row, you can see centralized and

14:01

balanced code architectures,

14:04

uh, they trend more like 4x, not like

14:06

2x. So they're doing much better than

14:08

the average. And the distributed

14:10

architecture there, uh, in the the lower

14:13

left-hand corner in the teal, that that

14:15

looks more like that global 2x trend

14:17

that we see when you look at all the

14:18

data. What's really interesting is this

14:21

highly distributed case. There's

14:22

essentially no correlation here between

14:24

AI adoption and PR throughput. Um, and

14:28

actually the the weak trend that does

14:29

exist is actually slightly negative. So

14:33

what's what's going on here? Like why

14:35

are teams with highly distributed

14:36

architectures struggling? They don't

14:38

seem to be getting real gains, at least

14:40

not on average from AI. Well, a big part

14:44

of what you're seeing here is really the

14:46

problem of context. So most of today's

14:48

tools are really set up uh best to work

14:51

with one repo at a time. You know, we've

14:53

used these uh you know, you pick a repo

14:55

and and you dive in and combining

14:57

context across repos, it's often

14:59

challenging. It's challenging uh for

15:01

humans as well as for coding tools and

15:03

for agents. Uh moreover,

15:06

the relationships between these repos

15:08

and the systems and products they relate

15:10

to, they're often not even written down

15:12

very clearly. They might be largely

15:14

locked in the heads of senior engineers.

15:16

they're definitely not accessible often

15:18

to coding tools and agents. So, it's

15:20

going to take some time for for teams to

15:22

invest in the context engineering that's

15:24

needed here. It's an interesting

15:26

challenge and especially you know in

15:28

light of the fact that uh a lot of folks

15:30

are saying you may have heard this too

15:32

that microservices are the right way to

15:34

go for AI-native development. So I could

15:37

see a world certainly where we solve

15:39

these context challenges. We adopt

15:41

autonomous agents at scale. They're set

15:43

up for success and this whole thing

15:44

flips and this highly distributed

15:46

category becomes the most productive way

15:48

to do things. But right now this is what

15:51

we're seeing out in the world. Um as an

15:54

aside, another thing you might notice

15:55

here is that all of these distributions

15:57

they you know as you go from the most

15:59

centralized to the most distributed uh

16:02

these uh you know this um PR per

16:04

engineer uh shifts upward uh you know

16:08

what's happening is the absolute number

16:09

of repos uh increases as architectures

16:13

get more distributed. Basically, in a

16:15

highly distributed architecture, it just

16:17

takes more PRs overall to get things

16:18

done due to things like migrations,

16:20

cross-repo coordination. And I bring

16:21

this up because this is one of the many

16:23

reasons why counting PRs in the absolute

16:25

sense isn't isn't a great metric. You

16:27

really need to be tracking change in PR

16:29

throughput to understand productivity.

16:31

Uh because these things vary due to to

16:35

factors like architecture choices.
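A small sketch of why change in PR throughput is more informative than the absolute level. The two teams and their numbers are hypothetical: the polyrepo team merges more PRs per engineer in absolute terms, yet its gain is much smaller.

```python
def relative_change(baseline, current):
    """Fractional change from baseline to current (0.5 means +50%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline

# Hypothetical weekly PRs per engineer before and after rolling out AI tools.
teams = {
    "monorepo_team": {"before": 1.0, "after": 2.0},
    "polyrepo_team": {"before": 4.0, "after": 4.4},
}
changes = {
    name: relative_change(t["before"], t["after"]) for name, t in teams.items()
}
# monorepo_team doubles (+100%); polyrepo_team moves by roughly +10%.
```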

16:38

Okay, so that's it. Uh to recap, you

16:41

know, probably not news to anyone uh

16:43

watching this, but AI coding tools are

16:45

being used in a big way. Autonomous

16:47

agents though, not so much. It's still

16:49

uh still early days. Uh we're seeing big

16:52

productivity gains with more code

16:54

being shipped and faster. Even if all

16:57

you're using is interactive AI coding

16:59

tools like Copilot, Cursor, and Claude

17:01

Code, you feel like maybe, you know,

17:03

you're not uh as up on agentic, you

17:06

know, fully autonomous agentic coding as

17:08

you ought to be. A 2x change in PR

17:10

throughput uh should be your your

17:12

expectation. You should you should be

17:13

seeing that or more. Um but also you

17:16

should expect bigger PRs. Uh but maybe

17:19

we can all ease up on some extreme

17:22

quality anxiety. Like we want to keep an

17:24

eye on that, but we're just not seeing

17:25

big issues there. At least not yet. And

17:28

finally, there are a lot of reasons why

17:30

your mileage may vary and we're going to

17:32

continue looking at this. But one place

17:34

you can start is to think about your

17:36

code architecture. how it might be

17:37

holding you back, what you can do um you

17:40

know to to compensate for some of the

17:43

context limitations you have and

17:44

ultimately try to unlock some of those

17:46

uh the sweet AI productivity gains. So

17:50

that's it. That's all I've got. I'm

17:52

Nicholas Arcolano, head of research at

17:54

Jellyfish. Thank you so much for

17:55

listening.

Interactive Summary

Nicholas Arcolano, Head of Research at Jellyfish, discusses the current state of AI in software development based on real-world data: about 20 million pull requests from about 200,000 developers at around a thousand companies. He highlights that while AI adoption, particularly with interactive coding tools, is surging and delivering significant productivity gains (roughly doubled PR throughput on average when going from zero to full adoption), the use of fully autonomous agents remains in its early stages. Furthermore, Arcolano emphasizes that code architecture significantly affects these productivity gains, noting that highly distributed architectures currently pose challenges for AI tools due to context limitations.
