
The FATAL flaw of coding agents..


Transcript


0:00

Here's why your coding agent gets dumber the longer you use it. In this video, I'm going to show you why you're leaving up to 90% of your agent's performance on the table, and the simple yet powerful fix that will save you from hours of debugging. By the end, you'll have a complete understanding of context rot, the sinister problem that ruins your sessions without you realizing it's happening.

0:21

I'm Roman. I published a top-3% paper at NeurIPS, the largest AI conference in the world. Now I'm focused on pushing agentic coding to its limits.

0:31

So what is context rot? Context rot is the phenomenon where a large language model's attention gets stretched thin as input tokens increase, resulting in decreasing intelligence, an increased hallucination rate, lower-quality instruction following, and perceived laziness. Hallucinations are particularly malignant since they are very convincing and can go undetected in your codebase, causing catastrophic bugs.

0:54

Context rot occurs because, as in humans, the attention of models is a finite resource. Also, training examples, in both the pre-training and reinforcement learning phases, are relatively short conversations. As context length grows, the model is increasingly out of distribution.

1:12

Research shows that context rot can decrease model effectiveness by up to 90% in coding tasks. It's my belief that much of the literature on context rot understates its impact on LLMs, because most studies measuring context rot use benchmarks that are far simpler than coding and reasoning, such as needle-in-a-haystack. This is why proper context engineering and agentic orchestration is a core skill to have in the modern era, regardless of your profession.

1:39

Your agent is already context rotted before you type a single word. Smart users start their sessions between 23,000 and 26,000 tokens; that's the system prompt and built-in tools alone. Average users sit at 40 to 50K because they've added MCP servers and a decent-sized CLAUDE.md. And egregious users are already above 50 to 100,000 tokens: massive CLAUDE.md files crammed with 30,000 tokens of instructions, every MCP server they found on Twitter, plugins, custom agents. That's like trying to fit Harry Potter and the Sorcerer's Stone into Claude's context window and then asking it to write to your precious codebase. At this point, model intelligence has dropped by 40 to 90% before you even typed your first message.

2:29

So, as the context builds up, our model's effective IQ begins to drop. Naturally, our goal is to maximize our model's intelligence while keeping all critical information in the context window. Notably, there are also some tasks that a dumber or less reliable model can still complete.

2:47

So, the solution to all of this is context engineering. The primary ways to context engineer are /clear, which resets the context window; /rewind, which lets you trim recent context by jumping to a previous checkpoint; /compact, which summarizes the conversation; and spawning a sub-agent, which creates an isolated context window. If you want to learn more about how to context engineer properly, join my community linked below. It's the largest agentic coding community on Skool.

3:13

So, the context rot problem is particularly tough with coding agents because they must get up to speed on your codebase. The quicker and more efficiently you can get them the information they need, the better. We need to focus on good specs for them to read, and point them to the files they should read for a given task. Don't just let Claude run off and read whatever he pleases. And believe me, he will. Point him in the right direction and give him a map, not a novel.
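The slash commands above can be thought of as a simple decision rule on how full the context window is. Here is an illustrative sketch; the thresholds, the 200K default window, and the function name are my own assumptions, not behavior of Claude Code or any real API:

```python
def recommend_action(used_tokens: int, window: int = 200_000) -> str:
    """Toy heuristic for when to keep working, /compact, or /clear.

    The 40%/70% thresholds are illustrative assumptions only.
    """
    fill = used_tokens / window
    if fill < 0.4:
        return "continue"  # plenty of headroom, keep working
    if fill < 0.7:
        return "compact"   # summarize the conversation to reclaim space
    return "clear"         # reset, then re-feed only the essential spec

# e.g. a session that starts at 45K tokens of system prompt + MCP tools
print(recommend_action(45_000))   # continue
print(recommend_action(120_000))  # compact
print(recommend_action(150_000))  # clear
```

The point of a rule like this is to act before quality degrades, rather than waiting for autocompact to fire on its own.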

3:43

Many people assume that a 1 million token context window makes a model more robust against context rot. But actually, these models are just as susceptible to context rot as every other model, regardless of the size of their context windows. So instead of the dumb zone being the latter half of the context window, the dumb zone is 90% of the context window. This leads to absolutely catastrophic decreases in performance quality: a practically unusable model for any application at all, let alone coding. At 700,000 tokens, it's like stacking ten copies of Harry Potter and the Sorcerer's Stone on top of one another and then asking the agent to code something for you. As you can imagine, this is not going to go well.
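The claim above can be put in rough numbers. A minimal sketch, taking the "half" and "90%" dumb-zone fractions from the narration (they are the video's rough figures, not benchmark results):

```python
def usable_tokens(window: int, dumb_zone_fraction: float) -> int:
    """Tokens of headroom before quality degrades, per the video's rough claim."""
    return round(window * (1 - dumb_zone_fraction))

# 200K window where the latter half is the "dumb zone"
print(usable_tokens(200_000, 0.5))    # 100000
# 1M window where 90% of the window is the "dumb zone"
print(usable_tokens(1_000_000, 0.9))  # 100000
```

Under this framing, a 5x larger window buys no extra usable headroom at all.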

4:25

Now, let me show you context rot in action. I gave the exact same prompt to Claude: "Build me a landing page for my AI agency. Please make it absolutely beautiful, with a strong and shocking white lightning hero effect built with shaders from WebGL." On the left, a clean session with only the system prompt taking up context. On the right, I first fed 60% of Harry Potter and the Philosopher's Stone into the context window, starting work at 78% context.

4:53

This is the quality difference that comes from irrelevant information being left in the context window. The model is obviously less capable while under the influence of context rot, and it is also lazier. This is because it's been rewarded during reinforcement learning not to do work that would fill the context window, resulting in this perceived laziness. As you can see, the white lightning hero effect has a stark quality difference. And I think of it this way: front-end tasks like this are quite easy for LLMs. Imagine how much their performance decreases when they are in your complex codebases making changes.

5:28

So, here are some key takeaways for your next coding session. Take these seriously and your agentic coding journey will change drastically.

5:37

Stop treating coding agents like chatbots. They are incredibly sensitive; every token in context will impact the trajectory of their response. So, context engineer aggressively and actively.

5:47

Models are amplifiers, not reducers. Stop trying to take yourself out of the loop until you have mastered agentic patterns; sit on the loop instead. No effort in equals no effort out. 2x effort in equals 20x output, at least.

6:04

Now, watch what your agent reads at startup. Don't let it just explore your repo and get context rotted; tell it what to read ahead of time. Turn off autocompact. You are doing work, not watching TV, and you won't need it if you're smart about context management.

6:21

Context is not free storage or a place to teach models every little thing. Every token you store costs you intelligence, so stop putting so many behavioral instructions in your CLAUDE.md.

6:32

It's great to know about context rot when using coding agents, but what's even better is having a community to talk about it with. If you want to go deeper on this, join the free community. It's the number one agentic coding community on Skool. Link in the description. Thanks for watching.
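The CLAUDE.md takeaway is easy to act on with a rough size check. A minimal sketch, assuming the common ~4 characters-per-token English heuristic (the 5,000-token budget is an illustrative placeholder, not a figure from the video):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (English-text heuristic)."""
    return len(text) // 4

def fits_budget(text: str, budget: int = 5_000) -> bool:
    """True if the instruction file fits an (illustrative) token budget."""
    return estimate_tokens(text) <= budget

# e.g. an instructions file of 100 repeated behavioral rules
instructions = "Always run the test suite before committing.\n" * 100
print(estimate_tokens(instructions))  # 1125
print(fits_budget(instructions))      # True
```

For real accounting you would use the model provider's own tokenizer or token-counting endpoint; the heuristic here is just for a quick sanity check before a session starts.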

Interactive Summary

This video explains the problem of 'context rot' in coding agents, where the performance of large language models (LLMs) degrades as the input token count increases. This degradation leads to decreased intelligence, increased hallucinations, and lower quality instruction following. The issue arises because LLM attention is a finite resource, similar to human attention, and training data typically involves shorter conversations. Research indicates context rot can reduce model effectiveness by up to 90% in coding tasks. The video highlights that even models with large context windows are susceptible, with the 'dumb zone' simply shifting to a larger portion of the context. Several strategies for 'context engineering' are proposed to mitigate this problem, including clearing the context, trimming recent context, summarizing the conversation, and spawning sub-agents. The speaker emphasizes that users should actively manage the context provided to agents, directing them to specific files and providing clear instructions rather than allowing them to explore freely. Key takeaways include treating coding agents with sensitivity to token count, actively managing context, and avoiding the temptation to fully automate the process before mastering agentic patterns.
