Red Green Refactor is OP With Claude Code

Transcript

0:00
Let's talk about a ridiculously easy way to get better results from a coding agent, using a software practice that's 20 or 30 years old at this point. This is red-green-refactor, or, on Simon Willison's weblog here, red-green TDD. TDD stands for test-driven development, and TDD's most prolific advocate is probably Mr. Kent Beck, here in Extreme Programming Explained.

0:23
XP was a software practice developed in, I don't know, the 90s or 2000s, and it advocated extremely aggressive use of unit tests. Or "aggressive" is maybe not the right word, but everything in the practice was built around unit tests. As Simon said, the most disciplined form of TDD is test-first development: you write the automated tests first, confirm that they fail, and then iterate on the implementation until the tests pass. And as Simon says, this turns out to be a fantastic fit for coding agents. I have definitely found this to be true. But what do the red and green here mean?

0:56
Well, red essentially means you write a failing test and the CI goes red. In other words, any automated type checks or tests that you've got in the repo will, at that point, be red. You've written a failing test to verify that the thing is going to work once it's built. This might be that you're fetching something from a database using some kind of SDK, and you basically write the test that the fetch is going to work before you've even implemented the API method, before you've even implemented the DB schema. And then, once that red test is in, you write a green implementation to make the CI go green again. In other words, all of your unit tests go, "Yep, tick. That looks good." The important thing is that this implementation is minimal, because in the next step we're going to refactor the code we just wrote, to make it prettier, to factor it into the shape we want. And we get the luxury of doing that because we have already made our CI green, so we have a set of tests that check our refactor doesn't break anything. Now, for

experienced software engineers, you're probably asking: okay, why does this matter more now than it did ten minutes ago? Red-green-refactor has always been an amazing way to build software, so why are we talking about it in the AI age? Well, for me, I find it really, really comforting when I see an agent doing red-green-refactor. Let's imagine this sequence of events: the agent writes a failing test, and I can see in the CI, or in the agent's output, that the test failed. I then see it write an implementation, and it doesn't change anything about the test, doesn't try to fake it to make it pass, and the test goes green. Now, if it's a reasonable agent, it's pretty hard for it to fake that. And this means I don't end up reading a lot of the tests that are created during my red-green-refactor loop. I maybe skim them, especially the titles, to understand what is being tested, but I don't necessarily read all of the implementations, because I've seen them go red and then go green, so I feel pretty confident that they're reasonably testing the thing they're supposed to.
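As a sketch of what that red-then-green sequence could look like in code (the `fetch_user` function and the in-memory `db` dict here are invented for illustration, not taken from the video), the same test is run once before any implementation exists and once after a minimal one lands:

```python
import unittest

fetch_user = None  # red: the test is written before any implementation exists

class TestFetchUser(unittest.TestCase):
    def test_returns_record_by_id(self):
        db = {"42": {"name": "Ada"}}  # stand-in for a real database/SDK
        self.assertEqual(fetch_user(db, "42")["name"], "Ada")

def run_suite():
    # runs the suite and reports whether every test passed
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(TestFetchUser).run(result)
    return result.wasSuccessful()

print("red" if not run_suite() else "green")  # the suite fails first

# green: the minimal implementation that satisfies the test, nothing more
def fetch_user(db, user_id):
    return db[user_id]

print("green" if run_suite() else "red")  # the untouched test now passes
```

The point of the sketch is the ordering: the agent's output shows the same unchanged test failing and then passing, which is what makes the green hard to fake.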

2:48
And of course, once this loop is over and it's committed to the branch, I then go and QA that chunk of work, so I catch anything that the tests might have missed. And at that point I can generally flush out any bad tests, if there are any. To make this work, I have a TDD skill that I like to invoke when I'm building these features.

3:06
You can find all my skills at the link below. Now, there's stuff in here that I'm not covering in this video; the main focus is the red-green-refactor approach. So, down here, I get it to do an incremental loop for each remaining behavior: write the next test, see that it fails, and then write the minimal code to make it pass. The rule is that it should only do one test at a time. I find that this is a really important caveat, because it means you don't end up with a huge splurge of tests. This is one thing that LLMs love to do: they love to create huge horizontal layers, and then they'll try to one-shot an implementation that passes all 90 of those tests. So they will do one massive file edit where they'll add 90 different tests. Now, that's possible, but you end up with a lot of crap tests, in my opinion. So I just get it to focus on the thing that it's implementing at the time, writing a single test for that.
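To make that one-test-at-a-time loop concrete, here is a hypothetical pair of increments (the `slugify` function is an invented example, not from the video), where each test is written alone, seen to fail, and then drives exactly the code that makes it pass:

```python
import unittest

def slugify(title):
    # increment 1: this line was driven by test_lowercases
    text = title.lower()
    # increment 2: this line was added only after increment 1 was green,
    # driven by test_replaces_spaces
    return text.replace(" ", "-")

class TestSlugify(unittest.TestCase):
    # written first, on its own, and seen to fail before slugify existed
    def test_lowercases(self):
        self.assertEqual(slugify("Hello"), "hello")

    # written second, only after the first test was passing
    def test_replaces_spaces(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify).run(result)
print(result.testsRun, result.wasSuccessful())
```

Each test earns its place by having driven a specific piece of the implementation, which is the quality improvement the one-at-a-time rule is after.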

3:57
Then it writes the implementation for that, then another test, another implementation, another test, another implementation. You end up with tests that are really important for actually guiding the implementation. In other words, this one-test-at-a-time idea is really focused on improving the quality of your tests. Then, once it's done with this incremental loop, I say: after all tests pass, look for refactor candidates. But again, that's probably the topic for another video.

4:20
Red-green-refactor is a thing that you have to know how to do. And unsaid throughout this whole video is the fact that feedback loops matter so, so much with AI. Because AI is so eager to create code and find the fastest solution to your problem, you need to impose some back pressure on it to keep it in a stable state. Strong types, like TypeScript of course, or unit tests, or things like that, can really assist you in getting high-quality code. And so I think code quality is actually more important than ever, because if you've got a low-quality codebase, the LLM is going to replicate what it sees. Just like any developer out there, it will be happy to play in the mud if what you have is mud.

5:02
Now, if you're digging this stuff and you like what you're seeing on this channel, then you should check out my newsletter, which is where all of these videos go, and also sign up there to get my new agent skills first, too. I am putting together a Claude Code course, which is going to be very, very exciting, so I will, of course, let you know there when it drops. Thanks for watching, and I will see you in the next one.

Interactive Summary

The video explains how using the "red green refactor" practice, also known as Test-Driven Development (TDD), can significantly improve results from coding agents. TDD involves writing a failing test (red), then the minimal implementation to make it pass (green), and finally refactoring the code while ensuring tests continue to pass. This method is particularly beneficial for coding agents as it provides clear feedback loops, building confidence in the agent's output without extensive manual review. A key rule for agents is to handle one test at a time, preventing the creation of numerous low-quality tests. The speaker emphasizes that feedback loops and maintaining high code quality are more crucial than ever in the AI age to guide agents and prevent the replication of subpar code.
