Why Coding Agents Hallucinate.
If you have ever tried to solve
hallucinations in coding agents, then
this video is for you. In today's video,
I'm going to take you through why coding
agents are mathematically designed to
hallucinate and how this information
will change your coding workflows
forever.
And if you've been trying to fix this
with a better prompt, bigger specs, or
more instructions, you've been making it
worse. I'm Roman. I published a top-3% paper at NeurIPS, the largest AI
conference in the world. Now I'm on a
mission to become the best AI coder.
Here's how the naive user builds with Claude Code. Claude hallucinates a function that doesn't exist. So you add
function that doesn't exist. So you add
rules to prevent this in the future.
Claude will remember this. You pile
rules into your cloud.md thinking that
Claude will learn. After Claude still
doesn't fix his act, you decide to take
it to the specs. You draft up a massive
spec document telling Claude exactly
what to do in every situation. Specify
deliberately all of the things that he
can't or shouldn't do. And after 5
hours, you have a 10,000line monolith
plan. Claude surely won't screw this up.
Except Claude just gets worse. It begins to ignore instructions, hallucinate more, and leave you with a bug-ridden codebase.
Because with all of this added context,
what you effectively just did is cause
severe context rot and context
poisoning. Context rot is a phenomenon
where the attention in a language model
gets stretched thin, leading to a model that ignores instructions and shows lower intelligence. That's because, just like a human's, an LLM's attention is a finite resource. On
top of this, with all of these
unnecessary lines in your specs, you are
boxing in the model. The creator of Claude Code himself said, "Don't box in the model." Coding models perform best when they are given a strict goal (the specification) and a path to achieve it (the implementation plan).
When the model has nowhere else to go,
it bluffs. But there is a better way and
it starts with understanding why LLMs
hallucinate. Researchers at OpenAI
released a paper that mathematically
proves that hallucinations are a default
behavior of how large language models
are trained. In the paper, there were
two patterns in particular that are
important to learn. I call them the
knowledge gap and the confidence trap.
The first pattern is what I call the
knowledge gap. The model only knows what
it was trained on and what is given in
context. If a fact appeared once or
never in its training data, the model is
forced to make an educated guess. And in
my experience, this guess looks like an
interpolation over the model's training
data, meaning it will try to choose the
most plausible answer based on what it
knows. This is why models are so good at
boilerplate code. They have seen it
millions of times in their training
data. However, for your novel codebase, they are forced to interpolate. And when a model interpolates from the wrong reference points, you get hallucinations. Even worse, you can't tell whether these are well-behaved outputs or hallucinations, because hallucinations are meant to be convincing.
However, the knowledge gap is a solvable problem. Since the model has knowledge gaps from training, we turn to the other way to teach models things: in-context memory. We must give the model proper information about how we write code, the layout of our codebase, and the docs it needs to make the implementation.
However, as we discussed, you can't just
give the model everything upfront every
time because this causes context rot.
So, what does this look like in our
codebase? Well, we want to give the
model a map, not a novel. Specs should be traversable and bite-sized instead of monolithic, and should tell the model
what we want, not exactly how it should
do it. We want to map out what is going
on in our repository in a way that makes
sense to the model without forcing it to
go and read all of the code, which is an
impossible task. The model should be able to grab what it needs, when it needs it.
Part of our job as Agentic engineers is
to steer the model and tell it where to
look instead of just letting it run free
or giving it too much information up
front and boxing it in. I call this LLM
friendly architecture. And outside of
your codebase, the real solution to
knowledge gaps is to get familiar with
context engineering, maximizing signal
while minimizing noise. Mastering this
principle will decrease hallucination
rates drastically. Learn the best ways to trim context during your coding session instead of just defaulting to autocompact.
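As one concrete sketch of a "map, not a novel": a short generated index with one line per module, which the agent can skim and then open only the files it needs. The `build_map` helper and its output format below are my own illustrative assumptions, not something the video prescribes.

```python
# Minimal sketch of a "codebase map" generator: one line per Python module,
# taken from its docstring, so an agent can navigate the repo without being
# forced to read all of the code up front.
import ast
import os

def first_docstring_line(path: str) -> str:
    """Return the first line of a module's docstring, or a placeholder."""
    with open(path, encoding="utf-8") as f:
        try:
            doc = ast.get_docstring(ast.parse(f.read()))
        except SyntaxError:
            doc = None
    return doc.splitlines()[0] if doc else "(no docstring)"

def build_map(root: str) -> str:
    """Walk the repo and emit 'relative/path: summary' lines an agent can grep."""
    lines = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root)
                lines.append(f"{rel}: {first_docstring_line(path)}")
    return "\n".join(sorted(lines))
```

Checking such a map into the repo (or regenerating it per session) gives the model a traversable entry point instead of a monolithic spec.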
The second problem posed in the paper is the confidence trap, and it points to a simple, logical reason why models hallucinate. Even when
the model is uncertain, it was rewarded for sounding confident. Think of it like
when you take a multiple choice test.
You never just leave the bubbles blank.
You choose one, the one that looks the
most reasonable. This is exactly what LLMs do: they are maximizing the expected value of their output.
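The multiple-choice incentive can be sketched in a few lines. Under accuracy-only grading, which is an assumption mirroring the paper's argument, an uncertain guess always has a higher expected score than abstaining:

```python
# Sketch of the incentive behind the confidence trap: if wrong answers and
# abstentions both score zero, guessing strictly dominates saying "I don't know".
# The numbers are illustrative assumptions.
def expected_score(p_correct: float, guess: bool, abstain_credit: float = 0.0) -> float:
    """Expected grade for one question: 1 point if right, 0 if wrong or blank."""
    return p_correct if guess else abstain_credit

p = 0.25  # the model is unsure: only a 25% chance its best guess is right
print(expected_score(p, guess=True))   # guessing still earns 0.25 in expectation
print(expected_score(p, guess=False))  # abstaining earns nothing
```

As long as training rewards are shaped this way, a confident bluff is the rational output for the model, which is why the fix has to live outside the prompt.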
Here's the thing about the confidence
trap. You cannot solve it with context.
The confidence trap is an incentive
problem, not a knowledge problem. The
model was trained to always give an
answer. You can give it perfect context,
every file in your repo, complete
documentation, and it will still
confidently pick one approach when it
should have been asking you a question.
It will still generate code instead of
saying, "I'm not sure this is right."
So, what do you do? You build
verification loops around the model so
that when it bluffs, the bluff gets
caught. The most reliable verification
is deterministic. Tests don't
hallucinate. A linter doesn't
hallucinate. A type checker doesn't
hallucinate. But don't count out
non-deterministic verification either.
Not every problem can be boxed into a
test. Parallel sub-agents are fantastic
at sweeping code, making sure it matches
specs, and does exactly what it is
supposed to do. This works because verification and generation are split into separate context windows.
The more layers of verification that we
can add after code is generated, the
better. So instead of tackling the
problem of hallucinations by preventing
them, try to catch them instead.
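A verification loop can be as plain as running a list of deterministic checks after each generation and surfacing what failed. The specific tools named below (pytest, ruff, mypy) are common choices I'm assuming, not ones the video mandates:

```python
# Sketch of a verification loop: run deterministic checks after code is
# generated and report which ones caught a problem, instead of trusting the
# model's confident output.
import subprocess

CHECKS = [
    ["pytest", "-q"],        # tests don't hallucinate
    ["ruff", "check", "."],  # a linter doesn't hallucinate
    ["mypy", "."],           # a type checker doesn't hallucinate
]

def verify(checks=CHECKS) -> list[str]:
    """Run each check command; return the commands that failed."""
    failed = []
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed
```

Each layer you append to the list is another chance to catch a bluff after generation, which is cheaper than trying to prevent the bluff with more context.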
The same training objective that makes models good at writing code is the one that guarantees they'll sometimes write the wrong code confidently. That is not changing.
There's no framework, plugin, or
architecture that will solve all your
problems. And when everyone has access
to the same tools, they are no longer a
moat. If you ground yourself and stop
chasing the next big thing, you will see
that coding with AI takes an incredible
amount of skill. The biggest reframe for
me personally was that all of these so-called "AI breakthroughs" are just reskins of the same two core patterns: context engineering for when the model doesn't have the answers, and harness engineering for when the model inevitably tries to convince you it does. You are the variable at every
stage from planning to implementation to
verification. The models are the same.
The fact that these models are imperfect and require skill to wield is not a bad thing. The skill is what separates
agentic coders from vibe coders and the
returns to those who learn the tools
will be monumental. Now is the best time
to learn agentic coding from first principles and build the workflows and harnesses that fit you and your apps. The largest agentic coding community on Skool is linked below. It's free. I'll see you in there.