The FATAL flaw of coding agents.
Here's why your coding agent gets dumber
the longer you use it. In this video,
I'm going to show you why you're leaving
up to 90% of your agent's performance on
the table and the simple yet powerful
fix that will save you from hours of
debugging. By the end, you'll have a
complete understanding of context rot,
the sinister problem that ruins your
sessions without you realizing it's
happening. I'm Roman. I published a top
3% paper at NeurIPS, the largest AI
conference in the world. Now I'm focused
on pushing agentic coding to its limits.
So what is context rot? Context rot is
the phenomenon where a large language
model's attention gets stretched thin as
input tokens increase, resulting in
decreasing intelligence, increased
hallucination rate, lower quality
instruction following, and perceived
laziness. Hallucinations are
particularly malignant since they are
very convincing and can go undetected in
your codebase, causing catastrophic
bugs. Context rot occurs because, like
in humans, the attention of models is a
finite resource. Also, training examples
both in the pre-training and
reinforcement learning phases are
relatively short conversations. As
context length grows, the model is
increasingly out of distribution.
Research shows that context rot can
decrease model effectiveness by up to
90% in coding tasks. It's my belief that
much of the literature on context rot
understates its impact on LLMs due to
the fact that most studies measuring
context rot use benchmarks that are far
simpler than coding and reasoning such
as needle-in-a-haystack. This is why
proper context engineering and agentic
orchestration are core skills to have in
the modern era. Regardless of your
profession, your agent is already
context rotted before you type a single
word. Smart users start their sessions
between 23,000 and 26,000 tokens. That's
system prompts and built-in tools alone.
Average users sit at 40 to 50K because
they've added MCP servers and a
decent-sized CLAUDE.md. And egregious users
are already above 50,000 to 100,000 tokens:
massive CLAUDE.md files crammed with
30,000 tokens of instructions, every MCP
server they found on Twitter, plugins,
custom agents. That's like trying to fit
Harry Potter and the Sorcerer's Stone into
Claude's context window and then asking
it to write to your precious codebase.
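To see where your own sessions land, here is a minimal sketch of a startup-footprint estimator. It assumes the common rough heuristic of about 4 characters per token (real tokenizers vary), a ~23,000-token base for system prompt and built-in tools as claimed above, and hypothetical file paths:

```python
# Rough estimate of how many tokens your session starts with.
# Assumptions: ~4 chars/token heuristic (not a real tokenizer),
# ~23k base tokens for system prompt + built-in tools.
from pathlib import Path

CHARS_PER_TOKEN = 4  # crude heuristic, not exact

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def startup_footprint(paths: list[str], base_tokens: int = 23_000) -> int:
    """Base system prompt + tools, plus every instruction file you load."""
    total = base_tokens
    for p in paths:
        path = Path(p)
        if path.exists():  # skip files that aren't present
            total += estimate_tokens(path.read_text())
    return total

# Hypothetical instruction files for illustration:
print(startup_footprint(["CLAUDE.md", ".mcp.json"]))
```

If this prints 50,000 or more before you have typed a word, you are in the "egregious" bucket described above.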
At this point, model intelligence has
dropped by 40 to 90% before you even
typed your first message. So, as the
context builds up, our model's effective
IQ begins to drop. Naturally, our goal
is to maximize our model's intelligence
while keeping all critical information
in the context window. Notably, there
are also some tasks where a dumber or
less reliable model can still complete
them. So, the solution to all of this is
context engineering. The primary ways to
context engineer are /clear, which resets
the context window; /rewind, which lets you
trim recent context by jumping to a
previous checkpoint; /compact,
which summarizes the conversation; and
spawning a sub-agent, which creates an
isolated context window. If you want to
learn more about how to context engineer
properly, join my community linked
below. It's the largest agentic coding
community on Skool. So, the context rot
problem is particularly tough with
coding agents due to the fact that they
must get up to speed on your codebase.
The quicker and more efficiently you can
get them the information they need, the
better. We need to focus on good specs
for them to read and point them in the
direction of the files they should read
for a given task. Don't just let Claude
run off and read whatever he pleases.
And believe me, he will. Point him in the
right direction and give him a map, not
a novel.
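As a rough illustration of "a map, not a novel", here is a minimal sketch of a kickoff prompt that points the agent at exactly the files it needs instead of letting it explore. The task description and file paths are hypothetical:

```python
# Hypothetical example: assemble a "map, not a novel" kickoff prompt.
# The task and file paths below are made up for illustration.

TASK = "Add rate limiting to the public API endpoints."

# Point the agent at the handful of files that actually matter.
RELEVANT_FILES = [
    "src/api/routes.py",         # where the endpoints are registered
    "src/api/middleware.py",     # where a rate limiter would plug in
    "docs/specs/rate-limit.md",  # the spec to implement
]

def build_prompt(task: str, files: list[str]) -> str:
    """Short, directive prompt: the task plus explicit file pointers."""
    pointers = "\n".join(f"- {path}" for path in files)
    return (
        f"{task}\n\n"
        "Read only these files before making changes:\n"
        f"{pointers}\n\n"
        "Do not explore the rest of the repo."
    )

print(build_prompt(TASK, RELEVANT_FILES))
```

The point of the structure: a few hundred tokens of pointers replaces tens of thousands of tokens of the agent wandering your repo on its own.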
Many people assume that a 1 million
context window model is actually more
robust against context rot. But
actually, these models are just as
susceptible to context rot as every
other model, regardless of the size of
their context windows. So instead of the
dumb zone being in the latter half of the
context window, the dumb zone is in 90%
of their context window. This leads to
absolutely catastrophic decreases in
performance quality: a practically
unusable model for any application at
all, let alone coding. At 700,000
tokens, it's like fitting 10 Harry
Potter and the Sorcerer's Stones on top of
one another, and then asking the agent
to code something for you. As you can
imagine, this is not going to go well.
Now, let me show you context rot in
action. I gave the exact same prompt to
Claude. Build me a landing page for my
AI agency. Please make it absolutely
beautiful with a strong and shocking
white lightning hero effect pulled off
with shaders from WebGL. On the left, a clean
session with only the system prompt
taking up context. On the right, I first
fed 60% of Harry Potter and the
Philosopher's Stone into the context
window, starting work at 78% context.
This is the quality difference that
comes from irrelevant information being
left in the context window. The model is
obviously less capable while under the
influence of context rot, and it is also
lazier. This is because it's been
rewarded during reinforcement learning
to not do work that would fill the
context window, resulting in this
perceived laziness. As you can see, the
white lightning hero effect has a stark
quality difference. And think of it
this way. Front-end tasks like this
are quite easy for LLMs. Imagine how
their performance decreases when they
are in your complex code bases making
changes. So, here are some key takeaways
for your next coding session. Take
these seriously and your agentic coding
journey will change drastically.
Stop treating coding agents like chat
bots. They are incredibly sensitive.
Every token in context will impact
trajectory of their response. So,
context engineer aggressively and
actively. Models are amplifiers, not
reducers. Stop trying to take yourself
out of the loop until you have mastered
agentic patterns. Sit on the loop
instead. No effort in equals no effort
out. 2x effort in equals 20x output at
least. Now, watch what your agent reads
at startup. Don't let it just explore
your repo and get context-rotted. Tell
it what to read ahead of time. Turn off
autocompact. You are doing work, not
watching TV. You won't need it if you're
smart about context management. Context
is not free storage or a place to teach
models every little thing. Every token
you store costs you intelligence. So
stop putting so many behavioral
instructions in your CLAUDE.md. It's
great to know about context rot when
using coding agents. But what's even
better is having a community to talk
about it with. If you want to go deeper
on this, join the free community. It's
the number one agentic coding community on
Skool. Link in the description. Thanks
for watching.