The 5 Levels of AI Coding (Why Most of You Won't Make It Past Level 2)

Transcript

0:00

90% of Claude Code was written by Claude

0:02

Code. Codex is releasing features

0:04

entirely written by Codex. And yet most

0:07

developers using AI empirically get

0:10

slower, at least at first. The gap

0:12

between these two facts is where the

0:13

future of software lives. Imagine

0:15

hearing this at work. Code must not be

0:18

written by humans. Code must not even be

0:20

reviewed by humans. Those are the first

0:23

two principles of a real production

0:24

software team called StrongDM and their

0:27

software factory. They're just three

0:30

engineers. No one writes code. No one

0:32

reviews code. The system is a set of AI

0:35

agents orchestrated by markdown

0:37

specification files. The system is

0:39

designed to take a specification, build

0:41

the software, test the software against

0:43

real behavior scenarios, and

0:45

independently ship it. All the humans do

0:48

is write the specs and evaluate the

0:50

outcomes. The machines do absolutely

0:53

everything in between. As I was saying,

0:55

meanwhile, and yes, it's true, over

0:58

at Anthropic, 90% of Claude Code's

1:00

codebase was written by Claude Code

1:02

itself. Boris Cherny, who leads the

1:04

Claude Code project at Anthropic, hasn't

1:06

personally written code in months. And

1:08

Anthropic's leadership is now estimating

1:10

that functionally 100% of the

1:12

code produced at the company is AI

1:14

generated. And yet at the same time, in

1:17

the same industry, with us here on the

1:19

same planet, a rigorous 2025 randomized

1:22

controlled trial by METR found that

1:24

experienced open-source developers using

1:27

AI tools took 19% longer to complete

1:32

tasks than developers working without

1:34

them. There is a mystery here. They're

1:36

not going faster, they're going slower.

1:38

And here's the part that should really

1:40

unsettle you. Those developers are bad

1:42

at estimation. They believed AI had made

1:45

them 24% faster. They were wrong not

1:48

just about the direction but about the

1:50

magnitude of the change. Three teams are

1:53

running lights out software factories.

1:56

The rest of the industry is getting

1:57

measurably slower. Just a few teams

1:59

around tech are running truly lights out

2:02

software factories. The rest of the

2:04

industry tends to get measurably slower

2:06

while convincing themselves and everyone

2:08

around them with press releases that

2:09

they're speeding up. The distance

2:11

between these two realities is the most

2:14

important gap in tech right now and

2:16

almost nobody is talking honestly about

2:19

it and what it takes to cross it. That

2:21

is what this video is about. Dan

2:22

Shapiro, the CEO over at Glowforge and

2:25

a veteran of multiple companies built

2:26

on the boundary between software and

2:28

physical products, just published a

2:30

framework earlier this year in 2026 that

2:32

maps where the industry stands. He calls

2:35

it the five levels of vibe coding. And

2:37

the name is deliberately informal

2:38

because the underlying reality is what

2:40

matters. Level zero is what he calls

2:43

spicy autocomplete. You type the code,

2:45

the AI suggests the next line. You

2:48

accept or reject. This is GitHub Copilot

2:50

in its original format. Just a faster

2:52

tab key. The human is really writing the

2:54

software here. And the AI is just

2:56

reducing the keystrokes and the effort

2:57

your fingers expend. Level one is the coding

3:00

intern. You hand the AI a discrete, well-scoped task, like write this function, build this component, or refactor this

3:15

module. You then review as the human

3:17

everything that comes back. The AI

3:19

handles the tasks. The human handles the

3:21

architecture, the judgment and the

3:22

integration. Do you see the pattern

3:24

here? Do you see how the human is

3:25

stepping back more and more through

3:27

these levels? Let's keep going. Level

3:29

two is the junior developer. The AI

3:31

handles multi-file changes. It can

3:33

navigate a codebase. It can understand

3:35

dependencies. It can build features that

3:36

span modules. You're reviewing more

3:39

complicated output, but you as a human

3:41

are still reading all of the code.

3:42

Shapiro estimates that 90% of developers

3:45

who say they are AI native are operating

3:48

at this level. And I think from what

3:49

I've seen, he's right. Software

3:51

developers who operate here think

3:53

they're farther along than they are.

3:55

Let's move on. Level three, the

3:57

developer is now the manager. This is

3:59

where the relationship starts to flip.

4:01

This is where it gets interesting.

4:02

You're now not writing code and having

4:04

the AI help. You're simply directing the

4:06

AI and you're reviewing what it

4:08

produces. Your day is reading, approving, or rejecting, but at the

4:14

feature level, at the PR level. The

4:17

model is doing the implementation. The

4:18

model is submitting PRs for your review.

4:21

You have to have the judgment. Almost

4:23

everybody tops out here right now. Most

4:26

developers, Shapiro says, hit that

4:27

ceiling at level three because they are

4:30

struggling with the psychological

4:33

difficulty of letting go of the code.

4:35

But there are more levels. And this is

4:37

where it gets spicy and exciting. Level

4:39

four is the developer as the product

4:41

manager. You write a specification, you

4:44

leave, you come back hours later and

4:46

check whether the tests pass. You're not

4:48

really reading the code anymore. You're

4:50

just evaluating the outcomes. The code

4:52

is a black box. You care whether it

4:54

works, but because you have written your

4:56

eval so completely, you don't have to

4:59

worry too much about how it's written if

5:01

it passes. This requires a level of

5:03

trust both in the system and in your

5:06

ability to write specs. And that quality

5:08

of spec writing almost nobody has

5:10

developed well yet. Level five, the dark

5:13

factory. This is effectively a black box

5:16

that turns specs into software. It is

5:18

where the industry is going. No human

5:20

writes the code. No human even reviews

5:23

the code. The factory runs autonomously

5:26

with the lights off. Specification goes

5:29

in, working software comes out. And you

5:32

know, Shapiro is correct. Almost nobody

5:34

on the planet operates at this level.

5:36

The rest of the industry is mostly

5:38

between level one and level three, and

5:40

most of them are treating AI kind of

5:42

like a junior developer. I like this

5:44

framework because it gives us really

5:46

honest language for a conversation

5:48

that's been drowning in hype. When a

5:50

vendor tells you their tool writes code

5:52

for you, they often mean level one. When

5:55

a startup says they're doing agentic

5:57

software development, they often mean

5:59

level two or three. But when StrongDM

6:01

says their code must not be written by

6:03

humans, they really do mean level five,

6:06

the dark factory, and they actually

6:08

operate there. The gap between marketing

6:11

language and operating reality is

6:13

enormous. And collapsing that gap into

6:16

what is actually going on on the ground

6:18

requires changes that go way beyond

6:21

picking a better AI tool. So many people

6:24

look at this problem and think this is a

6:26

tool problem. It's not a tool problem.

6:28

It's a people problem. So what does

6:31

level five software development actually

6:34

look like? I think StrongDM's software

6:37

factory is the most thoroughly

6:38

documented example of level five in

6:40

production. Simon Willison, one of the

6:42

most careful and credible observers in

6:44

the developer tooling space, calls

6:46

StrongDM's software factory, quote, "The

6:49

most ambitious form of AI assisted

6:51

software development that I've seen

6:53

yet." The details are really worth

6:55

digging into here because they reveal

6:57

what it looks like to run a dark factory

6:59

for software on today's agents. And as

7:02

we have this discussion, I want you to

7:05

keep in mind that for most of us

7:07

listening, we are getting to time

7:09

travel. We are seeing how a bold vision

7:12

for the future can be translated into

7:14

reality with today's agents and today's

7:16

agent harnesses. It is only going to get

7:19

easier as we go into 2026 which is one

7:22

of the reasons I think this is going to

7:25

be a massive center of gravity for

7:27

future agentic software development

7:29

practices. We are all going to level

7:31

five. So what does StrongDM do? The

7:34

team is three people. Justin McCarthy,

7:36

CTO, Jay Taylor, and Nan Chowan. They've

7:39

been running the factory since July of

7:41

last year, actually. And the inflection

7:44

point they identify is Claude 3.5

7:46

Sonnet, which shipped actually in the

7:49

fall of 2024. That's when long horizon

7:52

agentic coding started compounding

7:54

correctness more than compounding

7:56

errors. Give them credit for thinking

7:58

ahead. Almost no one was thinking in

8:00

terms of dark factories that far back.

8:03

But they found that 3.5 sonnet could

8:06

sustain coherent work across sessions

8:09

long enough that the output was reliable

8:11

and it wasn't just a flash in the pan.

8:14

It wasn't just demo worthy and so they

8:16

built around it. The factory runs on an

8:18

open-source coding agent called

8:19

attractor. The repo is just three

8:22

markdown specification files and that's

8:24

it. That's the agent. The specifications

8:27

describe what the software should do.

8:29

The agent reads them. It writes the code

8:31

and it tests it.
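To make that loop concrete, here is a minimal sketch of what spec-driven building can look like. This is not StrongDM's actual attractor code; the file layout, the `make test` command, and the agent callable are all illustrative assumptions.

```python
# Hypothetical sketch of a spec-driven build loop; not StrongDM's real tooling.
# File locations, the agent callable, and the test command are illustrative assumptions.
import pathlib
import subprocess
from typing import Callable

SPEC_DIR = pathlib.Path("specs")   # assumed location of the markdown spec files
MAX_ATTEMPTS = 5                   # bound the number of build attempts


def build_from_specs(run_agent: Callable[[str], None]) -> bool:
    """Feed every markdown spec to a coding agent, then check the repo's own tests.

    `run_agent` is whatever coding agent you use (Claude Code, Codex CLI, etc.),
    invoked here as an opaque callable that edits the working tree.
    """
    spec_text = "\n\n".join(p.read_text() for p in sorted(SPEC_DIR.glob("*.md")))

    for _ in range(MAX_ATTEMPTS):
        run_agent(f"Implement the software described by these specs:\n\n{spec_text}")
        # The repo's own checks run here; external scenarios are evaluated separately,
        # by a harness the agent never sees.
        if subprocess.run(["make", "test"]).returncode == 0:
            return True
    return False


if __name__ == "__main__":
    # Stub agent so the sketch runs end to end; replace with a real agent invocation.
    build_from_specs(lambda prompt: print(f"[agent] would act on {len(prompt)} chars of spec"))
```

The essential property is that the spec files are the only human-authored input; the agent edits the working tree, and the humans come back to judge outcomes.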

8:33

And here's where it gets really interesting and where most

8:35

people's mental model really starts to

8:37

break down. Strong DM doesn't actually

8:40

use traditional software tests. They use

8:42

what they call scenarios. And the

8:44

distinction is important. Tests

8:46

typically live inside the codebase. The

8:48

AI agent can read them, which means the

8:50

AI agent can, intentionally or not,

8:53

optimize for passing the tests rather

8:55

than building correct software. It's the

8:58

same problem as teaching to the test in

9:00

education. You can get perfect scores

9:02

and shallow understanding. Scenarios are

9:04

different. Scenarios live outside the

9:06

codebase. They're behavioral

9:08

specifications that describe what the

9:10

software should do from an external

9:12

perspective, stored separately so the

9:15

agent cannot see them during

9:16

development. They function as a holdout

9:19

set. The same concept that machine

9:21

learning practitioners use to prevent

9:23

overfitting. The agent builds the

9:25

software and the scenarios evaluate

9:27

whether the software actually works. The

9:30

agent never sees the evaluation

9:32

criteria. It can't game the system.
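As a rough illustration of the idea (StrongDM's actual scenario format isn't described here), an external scenario suite can be as simple as black-box checks kept outside the repo the agent works in. Everything below, including the endpoints and expected values, is invented for the sketch.

```python
# Illustrative holdout-style scenario runner; lives OUTSIDE the repo the agent can see.
# The service URL, endpoints, and expected values are made-up examples.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # the built software, treated as a black box

SCENARIOS = [
    # (description, method, path, payload, expected_status, extra_check)
    ("health endpoint responds", "GET", "/healthz", None, 200, lambda body: True),
    ("new user gets an id", "POST", "/users", {"name": "Ada"}, 201,
     lambda body: "id" in body),
]


def run_scenario(method, path, payload):
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read() or b"{}")


def main():
    failures = []
    for desc, method, path, payload, expected_status, extra_check in SCENARIOS:
        try:
            status, body = run_scenario(method, path, payload)
            ok = status == expected_status and extra_check(body)
        except Exception as exc:  # network errors and non-2xx responses count as failures
            ok, status = False, exc
        if not ok:
            failures.append((desc, status))
    print(f"{len(SCENARIOS) - len(failures)}/{len(SCENARIOS)} scenarios passed")
    for desc, status in failures:
        print(f"FAILED: {desc} ({status})")


if __name__ == "__main__":
    main()
```

The separation is the point: because this file never enters the agent's context, passing it is evidence of correct behavior rather than of teaching to the test.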

9:34

This is really a new idea in software

9:36

development and I don't see it

9:38

implemented very frequently yet. But it

9:40

solves a problem that nobody was

9:42

thinking about when all the code was

9:44

written by humans. When humans write

9:46

code, we don't tend to worry about the

9:48

developer gaming their own test suite

9:50

unless incentives are really, really

9:52

skewed at that organization and then you

9:54

have bigger problems. When AI writes the

9:57

code, optimizing for test passage is the

10:00

default behavior unless you deliberately

10:02

architect around it. And it's one of the

10:04

most important differences to really

10:07

understand as you start to think about

10:09

AI as a code builder. StrongDM

10:11

architected around that with external

10:14

scenarios. The other major piece of the

10:16

architecture is what StrongDM calls

10:18

their digital twin universe. Behavioral

10:21

clones of every external service the

10:24

software interacts with: a simulated

10:26

Okta, a simulated Jira, a simulated

10:29

Slack, Google Docs, Google Drive, Google

10:31

Sheets. The AI agents develop against

10:34

these digital twins, which means they

10:36

can run full integration testing

10:38

scenarios without ever touching real

10:41

production systems, real APIs, or real

10:44

data. It's a complete simulated

10:46

environment purpose-built for autonomous

10:48

software development.
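A digital twin in this sense can start very small. The toy below stands in for an identity provider; it is not StrongDM's actual Okta twin, and a real one would model far more behavior (rate limits, pagination, error modes).

```python
# Toy "digital twin" of an identity provider for integration testing.
# Purely illustrative; a real twin would model much more of the service it clones.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

FAKE_USERS = {"u1": {"id": "u1", "email": "ada@example.com", "active": True}}


class FakeIdpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/api/v1/users/"):
            user = FAKE_USERS.get(self.path.rsplit("/", 1)[-1])
            body = json.dumps(user or {"error": "not found"}).encode()
            self.send_response(200 if user else 404)
        else:
            body = b'{"error": "unknown endpoint"}'
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


if __name__ == "__main__":
    # Scenario runs point the software under test at http://localhost:9001
    HTTPServer(("localhost", 9001), FakeIdpHandler).serve_forever()
```

Integration scenarios then point the built software at the fake's local URL, so full end-to-end runs never touch a real tenant, real APIs, or real data.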

10:50

And the output is real. CXDB, their AI context store, has

10:53

16,000 lines of Rust, nine and a half

10:55

thousand lines of Go, and 700 lines of

10:58

TypeScript. It's shipped, it's in

11:00

production, it works, it's real

11:01

software, and it's built by agents end

11:03

to end. And then the metric that tells

11:04

you how seriously they take it. They say

11:07

if you haven't spent $1,000 per human

11:10

engineer per day, your software factory has room

11:12

for improvement. I think they're right.

11:15

That's not a joke. $1,000 per engineer

11:17

per day enables AI agents to run at a

11:20

volume that makes the cost of compute

11:23

meaningful if you are giving them a

11:25

mission to build software that has real

11:27

scale and real utility in production use

11:30

cases and it's often still cheaper than

11:32

the humans they're replacing.
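To put a rough shape on that number, here is a back-of-the-envelope sketch. The blended token price and per-task usage are purely illustrative assumptions, not StrongDM's or any vendor's quoted figures.

```python
# Back-of-the-envelope: what $1,000/day of agent compute could buy.
# All prices and ratios below are illustrative assumptions, not quoted rates.
DAILY_BUDGET_USD = 1_000
BLENDED_PRICE_PER_MTOK = 10.0       # assumed blended $/million tokens (input + output)
TOKENS_PER_AGENT_TASK = 2_000_000   # assumed tokens consumed per end-to-end build/test run

tokens_per_day = DAILY_BUDGET_USD / BLENDED_PRICE_PER_MTOK * 1_000_000
tasks_per_day = tokens_per_day / TOKENS_PER_AGENT_TASK

print(f"{tokens_per_day / 1e6:.0f}M tokens/day -> ~{tasks_per_day:.0f} agent tasks/day per engineer")
# With these assumptions: 100M tokens/day, roughly 50 full agent runs per engineer per day.
```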

11:34

Let's hop over and look at what the hyperscalers

11:36

are doing. The self-referential loop has

11:39

taken hold at both Anthropic and

11:41

OpenAI, and it's stranger than the hype

11:43

might make it sound. Codex 5.3 is the

11:46

first frontier AI model that was

11:47

instrumental in creating itself. And

11:50

that's not a metaphor. Earlier builds of

11:51

Codex would analyze training logs,

11:53

would flag failing tests, and might

11:55

suggest fixes to training scripts. But

11:58

this model shipped as a direct product

12:01

of its own predecessor's coding labor.

12:04

OpenAI reported a 25% speed improvement

12:07

and 93% fewer wasted tokens in the

12:11

effort to build Codex 5.3. And those

12:14

improvements came in part from the model

12:16

identifying its own inefficiencies

12:19

during the build process. Isn't that

12:21

wild? Claude Code is doing something

12:22

similar. 90% of the code in Claude Code,

12:25

including the tool itself, was built by

12:27

Claude Code, and that number is rapidly

12:29

converging toward 100%.

12:31

Boris Cherny isn't joking when he talks

12:34

about not writing code in the last few

12:35

months. He's simply saying his role has

12:37

shifted to specification, to direction,

12:40

to judgment. Anthropic estimates that the

12:43

whole company is moving to entirely AI

12:45

generated code about now. Everyone at

12:48

Anthropic is architecting and the

12:51

machines are implementing. And the

12:52

downstream numbers tell the same story.

12:55

When I made a video on Cowork and

12:57

talked about how it was written in 10

12:59

days by four engineers, what I want you

13:02

to remember is it wasn't just four

13:04

engineers hyper-typing so that they could

13:06

get that out super fast and write every

13:08

line by hand. No, no, no. They were

13:11

directing machines to build the code for

13:14

Cowork. And that's why it was so fast.

13:16

4% of public commits on GitHub are now

13:19

directly authored by Claude Code, a

13:21

number that Anthropic thinks will exceed

13:23

20% by the end of this year. I think

13:25

they're probably right. Claude Code by

13:27

itself has hit a billion dollar run rate

13:30

just 6 months since launch. This is all

13:33

real today in February of 2026. The

13:36

tools are building themselves. They're

13:38

improving themselves. They're

13:40

enabling us to go faster at improving

13:42

themselves and that means the next

13:44

generation is going to be faster and

13:46

better than it would have been otherwise

13:47

and we're going to keep compounding. The

13:49

feedback loop on AI has closed and the

13:53

question is not whether we're going to

13:55

start using AI to improve AI. The

13:57

question is how fast that loop is going

13:59

to accelerate and what it means for the

14:02

40 or 50 million of us around the world

14:04

who currently build software for a

14:05

living. This is true for vendors as much

14:08

as it's true for software developers.

14:10

And I don't think we talk about that

14:11

enough because the gap between what's

14:13

possible at the frontier in February of

14:15

2026 and what tends to happen in

14:18

practice and what vendors want to sell

14:20

has never been wider. That METR study, a

14:23

randomized controlled trial, by the way,

14:24

not a survey, found that open source

14:27

developers using AI coding tools

14:29

completed their tasks 19% slower. We

14:32

talked about that, right? The

14:33

researchers controlled for task

14:34

difficulty. They controlled for

14:36

developer experience. They controlled

14:38

even for tool familiarity and none of it

14:40

mattered. AI made even experienced

14:42

developers slower. Why? In a world where

14:45

Cowork can ship that fast. Why? Because

14:48

the workflow disruption outweighed the

14:50

generation speed. Developers spent time

14:53

evaluating AI suggestions, correcting

14:56

almost right code, context switching

14:58

between their own mental model and the

15:00

model's output, and debugging really

15:02

subtle errors introduced by generated

15:04

code that looked correct but weren't.

15:06

46% of developers in broader surveys say

15:09

they don't fully trust AI generated

15:11

code. These folks aren't Luddites, right?

15:13

This is experienced engineers running

15:15

into a consistent problem. The AI is

15:18

fast, but it struggles with the

15:19

reliability needed to be trusted without what they

15:22

view as vital human review. And this

15:25

is, ironically, the J curve that adoption

15:28

researchers keep identifying. When you

15:30

bolt an AI coding assistant onto an

15:33

existing workflow, productivity dips

15:36

before it gets better. It goes down like

15:38

the bottom of a J. Sometimes for a

15:40

while, sometimes for months. And the dip

15:42

happens because the tool changes the

15:44

workflow, but the workflow has not been

15:46

redesigned around the tool explicitly.

15:49

And so you're kind of running a new

15:51

engine on old transmission. The gears

15:54

are going to grind. Most organizations

15:55

are sitting in the bottom of that J

15:57

curve right now. And many of them are

15:59

interpreting the dip as evidence that AI

16:02

tools don't work, that the vendors did

16:04

not tell them the truth, and that the

16:06

evidence that their workflows haven't

16:08

adapted is really evidence that AI is

16:11

hype and not real. I think GitHub

16:13

Copilot might be the clearest

16:15

illustration of this. It has 20 million

16:17

users, 42% market share among AI coding

16:20

tools, apparently. Uh, and lab studies

16:22

show 55% faster code completion on

16:25

isolated tasks. I'm sure that makes the

16:28

people driving GitHub Copilot happy in

16:30

their slide decks. But in production,

16:32

the story is much more complicated.

16:35

There are larger poll requests. There

16:36

are higher review costs. There's more

16:38

security vulnerabilities introduced by

16:40

generated code. And developers are

16:43

wrestling with how to do it well. One

16:44

senior engineer put it really sharply.

16:46

Copilot makes writing code cheaper but

16:49

owning it more expensive. And that is

16:51

actually a very common sentiment I've

16:52

heard across a lot of engineers in the

16:54

industry, not just for Copilot but for

16:56

AI generated code in general. The

16:58

organizations that are seeing

17:00

significant, call it 25 to 30% or more,

17:02

productivity gains with AI are not the

17:05

ones that just installed Copilot, held a

17:08

one-day seminar and called it done.

17:10

They're the ones that thought carefully,

17:12

went back to the whiteboard and

17:14

redesigned their entire development

17:16

workflow around AI capabilities:

17:19

changing how they write their specs,

17:20

changing how they review their code,

17:22

changing what they expect from junior

17:24

versus senior engineers, changing their

17:26

CI/CD pipelines to catch the new

17:28

category of errors that AI generated

17:30

code introduces. End-to-end process

17:33

transformation. It's not about tool

17:35

adoption. And end-to-end transformation

17:37

is hard. Sometimes it's politically

17:40

contentious. It's expensive. It's slow

17:42

and most companies don't have the

17:44

stomach for it. Which is why most

17:46

companies are stuck at the bottom of the

17:48

J curve. Which is why the gap between

17:50

frontier teams and everyone else is not

17:53

just widening, it's accelerating

17:55

rapidly. Because those teams on the edge

17:57

that are running dark factories, they

17:59

are positioned to gain the most. As

18:01

tools like Opus 4.6 and Codeex 5.3

18:05

enable widespread agentic powers for

18:08

every software engineer on the planet.

18:10

95% of those software engineers don't

18:12

know what to do with that. It's the ones

18:14

that are actually operating at level

18:15

four, level five that truly get the

18:18

multiplicative value of these tools. So

18:20

if this is a politically contentious

18:22

problem, if this is not just a tool

18:24

problem but a people problem, we need to

18:26

look at the nature of our software

18:29

organizations. Most software

18:31

organizations were designed to

18:33

facilitate people building software.

18:36

Every process, every ceremony, every

18:38

role. They exist because humans building

18:41

software in teams need coordination

18:44

structures. Stand-up meetings exist

18:46

because developers working on the same

18:47

codebase, they've got to synchronize every

18:50

single day. Sprint planning exists

18:52

because humans can only hold a certain

18:54

number of tasks in working memory and

18:56

then they need a regular cadence to

18:58

reprioritize. Code review exists because

19:00

humans make mistakes that other humans

19:02

can catch. QA teams exist because the

19:05

people who build software, they can't

19:07

evaluate it objectively. You get the

19:09

idea. Every one of these structures is a

19:12

response to a human limitation. And when

19:14

the human is no longer the one writing

19:16

the code, the structures aren't just

19:19

optional, they're friction. So what does

19:22

sprint planning look like when the

19:24

implementation happens in hours, not

19:26

weeks? What does code review look like

19:28

when no human wrote the code and no

19:31

human can really review the diff that AI

19:34

produced in 20 minutes because it's

19:35

going to produce another one in 20 more

19:37

minutes? So what does a QA team do when

19:39

the AI already tested against scenarios

19:42

it was never shown? StrongDM's

19:43

three-person team doesn't have sprints.

19:46

They don't have standups. They don't

19:48

have a Jira board. They write specs and

19:50

they evaluate outcomes. That is it.

19:53

The entire coordination layer that

19:55

constitutes the operating system of a

19:57

modern software organization, the layer

19:59

that most managers spend 60% of their

20:02

time maintaining, is just deleted. It

20:05

does not exist. Not because it was

20:07

eliminated as a cost-saving measure, but

20:09

because it no longer serves a purpose.

20:12

This is the structural shift that's

20:13

harder to see than the tech shift, and

20:16

it might matter more. The question is

20:18

becoming what happens to the

20:19

organizational structures that were

20:21

built for a world where humans write

20:24

code? What happens to the engineering

20:26

manager whose primary value is

20:28

coordination? What happens to the scrum

20:31

master, the release manager, the

20:32

technical program manager whose job is

20:34

to make sure a dozen teams ship on time?

20:38

Look, those roles don't disappear

20:39

overnight, but the center of gravity is

20:42

shifting. The engineering manager's

20:44

value is moving from coordinate the team

20:48

building the feature to define the

20:50

specification clearly enough that agents

20:52

build the feature. The program manager's

20:54

value is moving from track dependencies

20:57

between human teams to architect the

20:59

pipeline of specs that flow through the

21:01

factory. The skills that matter are

21:03

shifting very rapidly from coordination

21:06

to articulation. From making sure people

21:08

are rowing in the same direction to

21:10

making sure the direction is described

21:12

precisely enough that machines can go do

21:14

it. And oh, by the way, for engineering

21:16

managers, there's an extra challenge.

21:18

How do you coach your engineers to do

21:20

the same thing? It's a people challenge.

21:22

If you think this is a trivial shift,

21:24

you have never tried to write a

21:26

specification detailed enough for an AI

21:28

agent to implement it correctly without

21:30

human intervention. And you've certainly

21:32

never sat down and tried to coach an

21:34

engineer to do the same. It is a

21:35

different skill. It requires the kind of

21:38

rigorous systems thinking that most

21:40

organizations have never needed from

21:42

most of their people because the humans

21:44

on the other end of the spec could fill

21:45

in the gaps with judgment, with context,

21:48

with a slack message that says, "Did you

21:49

mean X or Y?" The machines don't have

21:52

that layer of human context. They build

21:54

what you described. If what you

21:56

described was ambiguous, you're going to

21:58

get software that fills in the gaps with

22:00

the software's guesses, not customer-centric

22:02

guesses. The bottleneck has moved from

22:04

implementation speed to spec quality.
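As a hedged illustration of that difference, compare an ambiguous requirement with one restated as behavior an agent can actually be held to. The feature, thresholds, and status codes here are invented for the example.

```python
# Ambiguous spec (an agent will fill the gaps with guesses):
#   "Users should be locked out after too many failed logins."
#
# Precise spec, restated as checkable behavior (invented numbers, purely illustrative):
#   "After 5 consecutive failed logins, the account is locked for 30 minutes.
#    While locked, any login attempt returns HTTP 423."

def check_lockout_behavior(client):
    """Black-box scenario an evaluator could run against the built service.

    `client` is a hypothetical test client exposing login(user, password) -> status code.
    """
    # Five bad passwords in a row: the service keeps rejecting with 401.
    for _ in range(5):
        assert client.login("ada", "wrong-password") == 401
    # Next attempt, even with the right password, is refused: the account is locked.
    assert client.login("ada", "correct-password") == 423, "expected lockout (HTTP 423)"
```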

22:07

And spec quality is a function of how

22:10

deeply you understand the system, your

22:12

customer, and your problem. That kind of

22:15

understanding has always been the

22:17

scarcest resource in software

22:19

engineering. The dark factory doesn't

22:20

reduce the demand for that. It just

22:22

makes the demand an absolute law. It

22:25

becomes the only thing that matters.

22:28

Now, let's be honest. Everything that I

22:30

have just talked about assumes you're

22:32

building from scratch. Most of the

22:34

software economy is not built from

22:36

scratch. The vast majority of enterprise

22:39

software is brownfield. It's existing

22:41

systems. It's accumulated over years,

22:43

over decades. It's running in

22:45

production, serving real users, carrying

22:47

real revenue. CRUD applications that

22:50

process business transactions. Monoliths

22:52

that have grown organically through 15

22:54

years of feature additions. CI/CD

22:56

pipelines tuned to the quirks of a

22:58

specific codebase and a specific team's

23:00

workflow. Config management that exists

23:02

in the heads of the three people who've

23:04

been at the company long enough to

23:05

remember why that one environment

23:07

variable is set to that one value. You

23:09

know who you are. You cannot dark

23:11

factory your way through a legacy

23:13

system. You cannot just pretend that you

23:15

can bolt that on. It doesn't work that

23:17

way. The specification for that does not

23:19

exist. The tests, if they're any, cover

23:22

30% of your existing codebase, and the

23:24

other 70% runs on institutional

23:26

knowledge and tribal lore and someone

23:29

who shows up once a week in a polo shirt

23:31

and knows where all the skeletons are

23:33

buried in the code. The system is the

23:35

specification. It's the only complete

23:38

description of what the software does

23:40

because no one ever wrote down the

23:42

thousand implicit decisions that

23:44

accumulated over a decade or more of

23:47

patches of hot fixes of temporary

23:49

workarounds that of course became

23:51

permanent. This is the truth about the

23:54

interstitial states that lie along this

23:57

continuum toward more autonomous

23:59

software development. For most

24:01

organizations, the path is not to start

24:04

with deploy an agent that writes code.

24:06

It starts with let's develop a

24:08

specification for what your

24:11

existing software actually does.

24:14

And that specification work, that reverse

24:17

engineering of the implicit knowledge

24:19

embedded in a running system, is very

24:22

difficult and it's deeply human work. It

24:25

requires the engineer who knows why the

24:27

billing module has the one edge case for

24:29

Canadian customers. It requires the

24:31

architect who remembers which

24:34

microservice was carved out of the

24:36

monolith under duress during the 2021

24:38

outage and we've always maintained it

24:39

ever since. It requires the product

24:41

person who can explain what the software

24:44

actually does for real users versus what

24:46

the PRD says it does. Domain expertise,

24:49

ruthless honesty, customer

24:51

understanding, systems thinking. exactly

24:54

the human capabilities that matter even

24:57

more in the dark factory era, not less.

25:00

Look, the migration path is different

25:02

for every business, but it starts to

25:04

look something like this. First, you use

25:07

your AI as much as you can at say level

25:09

two or level three to accelerate the

25:11

work your developers are already doing,

25:14

writing new features, fixing bugs,

25:16

refactoring modules. This is where most

25:18

organizations are now, and it's where the J-curve productivity dip happens. You should expect that.

25:26

Second, you start using AI to document

25:29

what your system really does, generating

25:32

specs directly from the code, building

25:34

scenario suites that capture real

25:36

existing behavior, creating the holdout

25:38

sets that a future dark factory will

25:40

need. Then you redesign your CI/CD

25:43

pipeline to handle AI generated code at

25:45

volume: different testing strategies,

25:47

different review processes, different

25:49

deployment gates. Fourth, you begin to

25:53

shift new development to level

25:55

four or five autonomous agent patterns

25:57

while maintaining the legacy system in

26:00

parallel.
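As a sketch of what that second step can look like in practice, the snippet below records a legacy service's current responses as golden scenarios that can later serve as the holdout suite. The endpoints, URL, and file layout are invented for illustration.

```python
# Capture the current behavior of a legacy service as golden scenarios.
# Endpoints and output path are illustrative; point these at your real system.
import json
import pathlib
import urllib.request

LEGACY_BASE_URL = "http://legacy.internal:8080"   # assumed legacy service
ENDPOINTS_TO_CAPTURE = ["/api/orders/123", "/api/customers/42", "/healthz"]
OUTPUT = pathlib.Path("scenarios/golden.json")    # kept outside the agent's repo


def capture():
    golden = []
    for path in ENDPOINTS_TO_CAPTURE:
        with urllib.request.urlopen(LEGACY_BASE_URL + path) as resp:
            golden.append({
                "path": path,
                "status": resp.status,
                "body": json.loads(resp.read()),
            })
    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT.write_text(json.dumps(golden, indent=2))
    print(f"captured {len(golden)} golden scenarios to {OUTPUT}")


if __name__ == "__main__":
    capture()
```

A future factory then has to reproduce this recorded behavior before any rewrite is trusted, which is exactly the holdout discipline described earlier, applied to brownfield code.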

26:02

That path takes time. Anyone telling you otherwise is selling you

26:04

something. The organizations that will

26:06

get there the fastest aren't necessarily

26:08

the ones that bought the fanciest vendor

26:10

tools. They're the ones who can write

26:13

the best and most honest specs about

26:15

their code, who have the deepest domain

26:17

understanding, who have the discipline

26:19

to invest in the boring, unglamorous

26:21

work of documenting what their systems

26:24

really do and of how they can support

26:26

their people to scale up in the ways

26:29

that will support this new dark factory

26:31

era. I cannot give you a clear timeline

26:33

here. For some organizations, this is

26:36

looking like a multi-year transition,

26:38

and I don't want to hide the ball on

26:39

that. Some are going faster and it's

26:41

looking like multimonth. It will depend,

26:43

frankly, on the stomach you have for

26:45

organizational pain. And that brings me

26:47

to the talent reckoning. Junior

26:49

developer employment is dropping 9 to

26:52

10% within six quarters of widespread AI

26:55

coding tool adoption, according to a

26:56

2025 Harvard study. Anyone out there at

26:59

the start of their career is nodding

27:00

along and saying it's actually worse

27:01

than that. In the UK, graduate tech

27:04

roles fell 46% in 2024 with a further

27:08

53% drop projected by 2026. In the US,

27:11

junior developer job postings have

27:13

declined by 67%.

27:16

Simply put, the junior developer

27:18

pipeline is starting to collapse, and

27:20

the implications go far beyond the

27:22

people who cannot find entry-level jobs,

27:24

although that is bad enough and it's a

27:26

real issue. The career ladder in

27:28

software engineering has always worked

27:30

like this. Juniors learn by doing. They

27:34

write simple features. They fix small

27:35

bugs. They absorb the codebase through

27:38

immersion. Seniors review the work and

27:40

mentor them and catch their mistakes.

27:42

Over five to seven years, a junior becomes

27:44

a senior through accumulated experience.

27:47

The system is frankly an apprenticeship

27:50

model wearing enterprise clothing. AI

27:52

breaks that model at the bottom. If AI

27:54

handles the simple features and the

27:56

small bug fixes, the work that juniors

27:58

lean on, where do the juniors learn? If

28:01

AI reviews code faster and more

28:03

thoroughly than a senior engineer doing

28:05

a PR review, where does the mentorship

28:07

start to happen? The career ladder is

28:09

getting hollowed out from underneath.

28:11

Seniors at the top, AI at the bottom,

28:13

and a thinning middle where learning

28:14

used to happen. So, the pipeline is

28:16

starting to break. And yet, we need more

28:19

excellent engineers than we have ever

28:21

needed before, not fewer engineers. I've

28:24

said this before. I do not believe in

28:26

the death of software engineering. We

28:28

need better engineers. The bar is rising

28:31

and it's rising toward exactly the

28:34

skills that have always been the hardest

28:36

to develop and the hardest to hire for.

28:38

The junior of 2026 needs the systems

28:41

design understanding that was expected

28:43

of a mid-level engineer in 2020. Not

28:46

because the entry-level work necessarily

28:48

got harder, but because the entry-level

28:50

work got automated and the remaining

28:53

work requires deeper judgment. And you

28:55

don't need someone who can write a CRUD

28:57

endpoint anymore. Right? The AI will

28:58

handle that in a few minutes. You need

29:00

someone who can look at a system

29:01

architecture and identify where it will

29:04

break under load, where the security

29:06

model has gaps, where the user

29:08

experience falls apart at the edge

29:09

cases, and where the business logic

29:11

encodes assumptions that are about to

29:13

become wrong. And if you think as a

29:15

junior that you can use AI to patch

29:17

those gaps, I've got news for you. The

29:19

seniors are using AI to do that and they

29:22

have the intuition over the top. So you

29:24

need systems thinking, you need customer

29:26

intuition. You need the ability to hold

29:28

a whole product in your head and reason

29:31

about how those pieces interact. You

29:33

need the ability to write a

29:34

specification clearly enough that an

29:36

autonomous agent can implement it

29:38

correctly, which requires understanding

29:40

the problem deeply enough to anticipate

29:42

the questions the agent does not know to

29:45

ask. Those skills have always separated

29:47

really great engineers from merely

29:49

adequate ones. The difference now is

29:51

that adequate is no longer a viable

29:53

career position regardless of seniority

29:56

because adequate is what the models do.

29:58

Anthropic's hiring has already shifted.

30:00

OpenAI's hiring has already shifted.

30:02

Hiring is shifting across the industry

30:04

and it's shifting toward generalists

30:06

over specialists. People who can think

30:08

across domains rather than people who

30:11

are expert in one really narrow tech

30:13

stack. The logic is super

30:14

straightforward, right? When the AI

30:16

handles the implementation, the human's

30:19

value is in understanding the problem

30:21

space broadly enough to direct

30:22

implementation correctly. A specialist

30:25

who knows everything about Kubernetes

30:26

but can't reason about the product

30:28

implications of an architectural

30:30

decision is way way less valuable than a

30:33

generalist who understands the systems,

30:35

the users, and the business constraints

30:36

even if they can't hand-configure a pod.

30:39

Some orgs are moving toward what amounts

30:41

to a medical residency model for their

30:43

junior engineers. Simulated environments

30:45

where early career developers learn by

30:47

working alongside AI systems, reviewing

30:49

AI output, and developing judgment about

30:51

what's correct and what's subtly wrong

30:53

by working with AI. It is not the same

30:56

thing as learning by writing code from

30:58

scratch. I don't want to pretend it is,

31:00

but it might be better training for a

31:02

world where the job is directing and

31:04

evaluating AI output rather than

31:06

producing code from a blank editor. I

31:08

will also call out, as I've called out

31:10

before, there are organizations

31:12

preferentially hiring juniors right now,

31:15

despite the pipeline collapsing

31:17

precisely because the juniors they are

31:20

looking for provide an AI native

31:22

injection of fresh blood into an

31:24

engineering org where most of the

31:27

developers started their careers long

31:29

before ChatGPT launched in 2022. In

31:32

that world, having people who are AI

31:34

native from the get-go can be a huge

31:36

accelerating factor. And that points to

31:38

one of the things that is a plus for

31:40

juniors coming in. Lean into the AI if

31:43

you're a junior. Lean into your

31:45

generalist capabilities. Lean into how

31:48

quickly you can learn. Show that you can

31:50

pick up a problem set and solve it in a

31:53

few minutes with AI across a really wide

31:56

range of use cases. Gartner is

31:58

projecting that 80% of software

32:00

engineers will need to upskill in AI

32:02

assisted dev tools by 2027. They're estimating

32:05

wrong; it's going to be 100%. The number

32:09

is not the point. The question isn't

32:11

whether the skills need to change. We

32:13

all know they will. It's whether we in

32:15

the industry can develop the training

32:18

infrastructure quickly enough to keep

32:20

pace with the capability change. Because

32:22

I've got to be honest with you, if

32:24

you're a software engineer and the last

32:27

model you touched was released in

32:30

January of 2026, you are out of date.

32:33

You need a February model. And that is

32:35

going to keep being true all the way

32:36

through this year and into next year.

32:38

And whether the organizations that

32:40

depend on software can tolerate a period

32:43

where the talent pipeline is being built

32:45

and rebuilt like this on a monthly basis

32:48

is a big question because you have to

32:51

invest in your people more to get them

32:54

through this period of transition. So

32:56

what does the shape of a new org look

32:58

like when we look at AI native startups?

33:01

How are they different from these

33:02

traditional orgs? Cursor, the AI-native

33:05

code editor, is past half a billion

33:07

dollars in annual recurring revenue and

33:09

it has, at last count, a few

33:12

dozen employees. It's operating at

33:14

roughly three and a half million in

33:16

revenue per employee in a world where

33:18

the average SaaS company is generating

33:22

$600,000 per employee. Midjourney is

33:25

similar. They have the story of

33:26

generating half a billion in revenue

33:28

with a small team, around a hundred people

33:31

or a little bit more depending on who's

33:32

counting. Lovable is well into the

33:34

hundreds of millions of dollars in ARR in

33:37

just a few months and their team is

33:39

scaling but it's way way behind the

33:42

amount of revenue gain they're

33:43

experiencing. They are also seeing that

33:45

multi-million dollar revenue per

33:47

employee world. The top 10 AI native

33:50

startups are averaging three and change

33:52

million in revenue per employee which is

33:55

between five and six times the SaaS

33:57

average. This is happening enough that

34:00

it is not an outlier. This is the

34:02

template for an AI native org. So what

34:05

does that org look like? If you have 15

34:07

people generating a hundred

34:08

million a year, which we've seen in

34:10

multiple cases in 2025, what does that

34:12

look like? It does not look like a

34:14

traditional software company. It does

34:16

not have a traditional engineering team,

34:18

a traditional product team, a QA team, a

34:20

DevOps team. It looks like a small group

34:23

of people who are exceptionally good at

34:26

understanding what users need, who are

34:28

exceptional at translating that into

34:30

clear spec, and who are directing AI

34:32

systems that handle that implementation.

34:34

The org chart is flattening radically.

34:37

The layers of coordination that exist to

34:39

manage hundreds of engineers building a

34:41

product can be deleted when the

34:43

engineering is done by agents. The

34:45

middle management layer is going to

34:47

either evolve into something

34:48

fundamentally different at these big

34:50

companies or it's going to cease to

34:52

exist entirely. The only people who

34:55

remain are the ones whose judgment

34:58

cannot be automated. The ones who know

35:00

what to build for whom and why, and who

35:02

have excellent AI sense. Sort of like

35:06

horse sense where you have a sense of

35:08

the horse if you're a rider and you can

35:10

direct the horse where you want to go.

35:11

You'll need people who have that sense

35:13

with artificial intelligence. And yes,

35:15

it is a learned skill. The restructuring

35:18

that is going to happen as more and more

35:20

companies move toward that cursor model

35:23

of operating, even if they never

35:25

completely get there, that restructuring

35:27

is real. It's going to happen. It's

35:30

going to be very painful for specific

35:32

people in specific roles. the middle

35:34

management layer, the junior developer

35:36

whose entry-level work is getting

35:38

automated first, the QA engineers who

35:40

just run manual test passes, the release

35:43

manager whose entire value is just

35:46

coordination. Those kinds of roles are

35:49

going to have to transform or they're

35:51

just going to disappear. And for people

35:53

in those roles, you need to find ways to

35:57

move toward developing with AI and

36:02

rewriting your entire workflow around

36:04

agents as central to your development.

36:07

That is going to look different

36:08

depending on your stack, your manager's

36:10

budget for token spend, and your

36:13

appetite to learn. But you need to lean

36:16

that way as quickly as you can for your

36:18

own career's sake. I want to leave you

36:21

with one thing that gets lost in every

36:24

conversation about AI and jobs. We have

36:27

never found a ceiling on the demand for

36:30

software and we have never found a

36:32

ceiling on the demand for intelligence.

36:34

Every time the cost of computing has

36:36

dropped from mainframes to PCs, from PCs

36:40

to cloud, from cloud to serverless, the

36:43

total amount of software the world

36:44

produced did not stay flat. It exploded.

36:48

New categories of software that were

36:50

economically impossible at the old cost

36:52

structure became viable and then

36:54

ubiquitous and then essential. The cloud

36:56

didn't just make existing software

36:58

cheaper to run. It created SaaS, mobile

37:01

apps, streaming, real-time analytics,

37:03

and a hundred other categories that

37:05

could not exist when you had to buy a

37:07

rack of servers to ship something. I

37:09

think the same dynamic applies now and

37:12

it applies at a scale that dwarfs every

37:15

previous transition. Every company in

37:17

every industry needs software. Most of

37:20

them, like a regional hospital or a

37:22

mid-market manufacturer or a family

37:24

logistics company. They can't afford to

37:26

build what they need at current labor

37:28

costs. A custom inventory system

37:30

traditionally could cost a half a

37:32

million or more and take over a year. A

37:34

patient portal integration might cost a

37:36

third of a million. You get the idea.

37:38

These companies tend to make do with

37:40

spreadsheets today. But we are dropping

37:43

the cost of software production by an

37:46

order of magnitude or more. And now that

37:48

unmet need is becoming addressable. Not

37:52

theoretically now. You can serve markets

37:55

that traditional software companies

37:57

could never afford to enter. The total

38:00

addressable market for software is

38:02

exploding. Now this can sound like a

38:05

very comfortable rebuttal to people

38:06

struggling with the pain of jobs

38:08

disappearing. It is not the same thing.

38:10

Just saying the market is getting bigger

38:12

doesn't fix it. But it is a structural

38:15

observation about what happens as

38:17

intelligence gets cheaper. The demand is

38:20

going to go up, not down. We watched

38:23

this happen with compute, with storage,

38:25

with bandwidth, with every resource

38:27

that's ever gotten dramatically cheaper.

38:29

Demand has never saturated. The

38:32

constraint has always moved to the next

38:34

bottleneck. And in this case, the

38:35

bottleneck is the judgment to know what to build and

38:37

for whom. The people who thrive in this

38:40

world are going to be the ones who were

38:42

always the hardest to replace. The ones

38:44

who understand customers deeply, who

38:47

think in systems, who can hold ambiguity

38:49

and make decisions under uncertainty,

38:52

who can articulate what needs to exist

38:54

before it exists at all. The dark

38:56

factory does not replace those people

38:58

and it won't. It amplifies them. It

39:00

turns a great product thinker with five

39:02

engineers into a great product thinker

39:05

with unlimited engineering capacity. The

39:07

constraint moves from can we build it to

39:10

should we build it and should we build

39:12

it has always been the harder and more

39:14

interesting question. I don't have a

39:16

silver bullet to magically resolve this

39:18

but I have to tell you that we must

39:20

confront the tension or we are being

39:22

dishonest. The dark factory is real. It

39:26

is not hype. It actually works. A small

39:29

number of teams around the world are

39:30

producing software without any humans

39:33

writing or reviewing code. They are

39:35

shipping production code that

39:39

improves with every single model

39:41

generation. The tools are building

39:43

themselves. The feedback loop is closed.

39:46

And those teams are going faster and

39:48

faster and faster and faster. And yet

39:51

most companies aren't there. They're

39:52

stuck at level two. They're getting

39:54

measurably slower with AI tools they

39:56

believe are making them faster. They're

39:58

wrong. They're running organizational structures

40:01

designed for a world where humans do all

40:03

of the implementation work. Both of

40:06

these things are true at the same time.

40:08

The frontier is farther ahead than

40:10

almost anyone wants to admit and the

40:13

middle is farther behind than the

40:15

frontier teams like to talk about. The

40:17

distance between them isn't a technology

40:20

gap. It's a people gap. It's a culture

40:23

gap. It's an organizational gap. It's a

40:25

willingness to change gap that no tool

40:29

and no vendor can close. The enterprises

40:31

that get across this distance are not

40:34

the ones that buy the best coding tool.

40:37

They're the ones that do the very hard,

40:39

very slow, very unglamorous work of

40:41

documenting what their systems do, of

40:44

rebuilding their org charts and their

40:45

people around the skill of judgment

40:48

instead of the skill of coordination.

40:50

And they are organizations who invest in

40:52

the kind of talent that understands

40:55

systems and customers deeply enough to

40:58

direct machines to build anything that

41:00

should be built. And those orgs need to

41:02

be honest enough with themselves to

41:04

admit that this change will not happen

41:06

as fast as they want it to because

41:08

people change slowly. The dark factory

41:11

does not need more engineers, but it

41:14

desperately needs better ones. And

41:16

better means something different than it

41:18

did a few years ago. It means people who

41:20

can think clearly about what should

41:22

exist, describe it precisely enough that

41:24

machines can build it and who can

41:26

evaluate whether what got built actually

41:29

serves the real humans it was built for.

41:32

This has always been the hard part of

41:34

software engineering. We just used to

41:36

let the implementation complexity hide

41:39

how few people were actually good at it.

41:41

The machines have now stripped away that

41:43

camouflage, and we're all about to find

41:45

out how good we are at building

41:48

software. I hope this video has helped

41:50

you make sense of the enormous gap

41:52

between the dark factories of automated

41:54

software production and the way most of

41:57

us are building software today. Best of

41:59

luck navigating that transition. I wrote

42:01

up a ton of exercises and a ton of

42:04

resources over on the Substack if you'd

42:06

like to dig in further. This tends to be

42:07

something where people want to learn

42:09

more, so I wanted to give you as much as

42:10

I could. Have fun, enjoy, and I'll see

42:13

you in the comments.
