Linus Lays down the Law
Hey there, buddy. Do you know why I
pulled you over? Undisclosed use of AI
while coding. Yeah, that's an actionable
offense. You need to cite when you use an
LLM. Okay, Buster. If you continue this,
we're going to throw you clean out of
the city. Okay. Yes. This is my mason
jar creeping into the camera angle.
Is it a hot cup of pee? Yes, it is. Now,
you're probably thinking, "Okay, what
the heck are you even talking about? Why
are we talking about needing to publicly
uh announce that you're using an AI?"
Well, the reason why we're talking about
it is that the Linux development team
and Linus himself have finally come to a
conclusion on how LLMs can be used
within the Linux kernel development. And
for the last 6 months, there has been a
bit of a raging war. Okay, there's a
Franz Ferdinand involved that kicked off
the entire thing. And there's lots of
angry and sweaty nerds in the comments.
So to kick this thing off, I think it
would be best if we just looked over the
code that kicked off this entire
thing. It's only a little 19-line change
in which there's a replacement of this
old hash table that used to be used, but
instead there's now this new
DEFINE_HASHTABLE macro. Instead of saying how
big it is numerically, you say it as a
power of two: 2 to the 7th is 128. So you
say, "Hey, I only need seven bits."
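A rough userspace sketch of the sizing idea being described here, not the actual patch: the table size is given as a number of bits, and a helper that takes the bit count picks the bucket instead of a hand-written mask. All names below (TABLE_BITS, bucket_by_mask, bucket_by_bits) are made up for illustration, and the multiplicative hash is only an assumption about the general style of helper involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these names come from the patch. */
#define TABLE_BITS 7                    /* "I only need seven bits" */
#define TABLE_SIZE (1u << TABLE_BITS)   /* 2^7 = 128 buckets */

/* Older style: the size is a plain number and the bucket is picked by masking. */
static inline unsigned int bucket_by_mask(uint32_t key)
{
    return key & (TABLE_SIZE - 1);
}

/* Newer style: a helper takes the bit count and always returns an in-range index. */
static inline unsigned int bucket_by_bits(uint32_t key, unsigned int bits)
{
    assert(bits > 0 && bits <= 32);
    /* Multiplicative hash: multiply, then keep only the top `bits` bits. */
    return (uint32_t)(key * 0x61C88647u) >> (32 - bits);
}
```

Either way the table has 128 buckets. The difference is only in who does the bit arithmetic: the caller, or a shared helper.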
Instead of creating a key by doing a
little bit of masking operations,
you just simply get to use some
of the more convenient functions. Same
thing with adding. Deleting is virtually
the exact same. So, even me as a layman
looking at this code, this looks rather
reasonable, right? But if you zoom in,
no, we're talking about zooming in.
Okay. No, no, no, no, no, no. You're not
listening to me. I want you to get in
it, baby. Do you see that __read_mostly?
Do you see how that __read_mostly was
removed? Well, a bunch of the reviewers
didn't really catch that. I assume they
thought that DEFINE_HASHTABLE uses
__read_mostly. Oh, but it doesn't. And this
caused a performance regression. Now,
performance regressions. Oopsy daisies.
Changing of interfaces, usually not a
big deal. This is just kind of part and
parcel for developing any large
sophisticated piece of software. But
there were several people that got quite
hot and bothered. We're talking about
frothing here, okay? They were out there
just yelling, pulling a Jerry Maguire,
just throwing his briefcase everywhere.
And the thing is that the performance
regression and the patch actually really
didn't cause any problems. Yeah, people
had to fix some stuff, but no one was
really upset. What actually caused the
whole thing was this article right here
on June 26, 2025, which is talking about
supporting kernel development with large
language models. And this was actually
just recapping a talk in 2025 from the
Open Source Summit given by Sasha Levin.
In this talk, he kind of talks about how
he's utilizing LLMs for kernel
development. In one of the examples, he
points to that patch I just got done
showing you that was merged in 6.15. But
here's the thing is that when the
kernel developers inevitably found this
just a short bit later, they said, "Hey,
wait a second. That patch he's pointing
to, I knew about that patch. I reviewed
that patch. I am baffled that he
apparently saw no need to disclose this
when he was posting the patch. Or am I
missing something in the lore links?"
And this, my friends, is how World War
AI happened. Now, there's obviously a
lot of conversation going on, going back
and forth, disputing the legal
challenges of using AI, which honestly,
I'm on their team. There is definitely
some gray area on the old copyright,
clean room engineering. Can you ever
have clean room engineering if the AI
itself likely has the copyrighted code
actually stored within its weights or
compressed within its weights? Yeah,
that's a serious problem. And
that is a serious legal problem to the
point that even Microsoft Copilot comes
with the Customer Copyright Commitment.
It's a provision in the Microsoft
product terms that describes Microsoft's
obligations to defend customers against
certain third-party intellectual
property claims relating to output
content. Microsoft is stating it is
willing to go to bat for its customers
using Copilot if Copilot leads to
generated copyrightable and enforceable
code. But that's only part of the reason
why a lot of people were upset. A big
part of it is that if I would have known
that it was AI generated, I would have
reviewed it more strongly. I was going
to even ask Sasha if this came from some
new tool. I think I should have. And
yes, it would have been nice if Sasha
mentioned it was completely generated by
an LLM because I would have taken a
deeper look at it. It appears from the
comments below that it does indeed have
a slight bug which I WOULD HAVE CAUGHT
if I would have known this was 100%
generated. And this honestly to me
this is kind of like an interesting
point. You know what, I will have to
first and foremost say that the people
developing Linux are obviously extremely
talented engineers and I think it's
completely reasonable to say that
they're likely better engineers than I
am. But this idea that you would have
reviewed it different knowing that it
was an LLM versus someone you know. I
don't know about that. Why would you
ever feel that way? Isn't that
the purpose of a review for you to put
on your thinking cap and to think deeply
about the code being merged such that
you make sure that it is of the quality
you would expect? And so if somebody
hands me 19 lines and it's my job to say
yes, this code is good or no, this code
is bad, I really don't care who hands it
to me. I'm going to do a review and
ultimately even me, as a layman, I
would have been able to see that, hey,
these two are different. I would have
had to check: does DEFINE_HASHTABLE come
with this intrinsic flag, __read_mostly? I
don't, in fact, know what that
does. I just assume that what it does is
it somehow changes either implementation
or how optimizations work to the point
where reading becomes much more
optimized and writes become more
penalized. In other words, this is
something that's read a whole bunch. As
it says right here, read mostly, right?
It just seems pretty straightforward.
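For reference, a minimal sketch of what that annotation amounts to. In the kernel, __read_mostly is a placement hint rather than a different data structure: it asks the linker to group rarely written variables into their own section so they do not share cache lines with frequently written data. The kernel's hashtable header also has a separate DEFINE_READ_MOSTLY_HASHTABLE variant, which is why dropping the annotation is easy to miss. The example below is a userspace approximation under those assumptions; READ_MOSTLY, struct bucket, and demo_lookup are hypothetical names, and the section attribute is only applied on ELF targets with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace imitation of the kernel annotation (an assumption,
 * not kernel code). On other toolchains it expands to nothing. */
#if defined(__ELF__) && (defined(__GNUC__) || defined(__clang__))
#define READ_MOSTLY __attribute__((section(".data..read_mostly")))
#else
#define READ_MOSTLY
#endif

#define DEMO_BITS 7
#define DEMO_SIZE (1u << DEMO_BITS)

struct bucket { void *first; };

/* Same type, same size, same code paths; only the placement in memory differs. */
static struct bucket plain_table[DEMO_SIZE];
static struct bucket grouped_table[DEMO_SIZE] READ_MOSTLY;

/* Lookups behave identically for both tables. */
static inline struct bucket *demo_lookup(struct bucket *table, unsigned int key)
{
    assert(table != NULL);
    return &table[key & (DEMO_SIZE - 1)];
}
```

Nothing about reads or writes changes functionally, so a missing annotation only shows up as a performance difference, never as a wrong result.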
You can change data structures or you
can change optimizations to make these
things work where reading is kind of
the expected path. And there's actually
quite a bit of this throughout this
entire uh little argument going back and
forth of the people who reviewed the
code. And it's because they signed their
name, but they missed this bug. And
yeah, the bug wasn't a big deal. Yeah,
it was fixed. But it's kind of this
weird cope that I'm seeing that yeah, I
would have reviewed harder knowing it's
an AI. I don't really
understand that. I would just assume you
review well because that's what you got
to do. And you know, I hate to be on the
side of defending anything with AI, but
this just feels like something I have
to. The outcome after 6 months of
arguing is this document right here, AI
Coding Assistants, which effectively
says that, hey, if you're going to be
contributing to the Linux kernel using
an LLM, you have a couple requirements
yourself. A, all code must be compatible
with the licenses. And B, AI agents must
not add Signed-off-by tags. They can only
add an Assisted-by tag. That means
somebody else is liable. Which honestly
is a pretty good approach. Meaning that
any line of code that hits the potential
to become merged into the kernel,
somebody has to say, "Hey, I own this
code. Hey, I fully understand it. I've
done 100% of the review and I say this
code is good." You cannot allow a
machine to say the code is good. But a
more interesting point in this argument
is actually down below in one of the
many comments that says, "At the same
time, the explicit goal of generating
code with LLMs is to make every developer
more productive at writing patches,
meaning there will be more patches to
review and reviewers will be under more
pressure. And in the long term, there
will be fewer new reviewers because none
of the junior developers who outsource
their understanding of the code to an
LLM will be learning enough to take on
that role." It's actually kind of an
interesting uh proposition. And it's
definitely something I've talked about a
whole bunch, which is the value of
learning in the day and age of AI. Do
you really need to learn? Should you
really take the time to understand
everything or should you just be like, "Yo,
Dario, take the wheel"? And for me, you
know what? This is my culture, okay?
It's learning. And for the juniors out
there, when they see that and they hear
that, they're like, "That's your
culture? Learning?" I really do have to
sympathize with this, though. Right at
the very end right here says, "I think
writing code is already the easiest and
most enjoyable part of software
development." So, it seems like the
worst part is trying to be automated
away. Honestly, I can feel that so deep.
Holy cow. The most enjoyable part of the
software development practice isn't
trying to figure out what to program.
That is long. That is difficult. And it
takes a whole bunch of time to come up
with good ideas. I can tell you I have
failed. I have created so many projects
that absolutely went to the wayside.
I've created a few projects that actually
were, you know, of very high
value and that people ended up using a whole
bunch. But the coding part, the coding
part is and has always been the easiest
and the most enjoyable part of software
development. So, I can like I completely
sympathize with this. This has been one
of my big gripes about just reviewing
code. Reviewing code ain't fun. Nobody
likes reviewing code. And this is why
ultimately I think this little document
that's been developed, this is just a
stop gap. I don't know how long this
document can hold up, but my guess is it
won't hold up that long. Meaning that
they're going to claim the new tools are
able to review just as well as humans.
And it's going to kind of go into this
point of why can't I have agents on the
Signed-off-by tag? Hey, they're better at
code reviewing than I am. They find more
bugs than I do. And so my guess is
within one to two years this is going to
have to change because the amount of
patches coming in could very well not
just double but could end up
10xing, and the reviewers are
simply going to be unable to get through
that code and some other thing is going
to have to happen. I somehow doubt that
Linus is gonna allow Linux to be
sloppified, but the pressure from the
slop trebuchet, it's intense. And my
suspicion is that it's only going to
grow year over year. You probably
noticed that I actually haven't said
much on whether I think you should be
able to use LLMs or not inside the
kernel. And that's because honestly, I
don't have a really good gauge. That
patch that we looked at, it was pretty
well done. Yeah, the read mostly thing
should have been caught. We could blame
the reviewers. We could blame the person
who created it, Sasha Levin, for not catching it.
But mistakes are made and it potentially
would not have been caught by an AI
either. But I'm also on the
other side of the fence, which is that I
think you should review the code you put
out there. Like I'm still on that side.
And so I don't know what it's going to
take for me to change that. And so even
if I can produce an infinite amount of
code, it also means that I can't go
infinitely fast because I still have to
read it. And my guess is that Linux will
be in this kind of state for a very very
long time. And of course very long in
the day and age of technology is
somewhere between 2 and 50 years. I don't
know. Okay. The future's difficult.
That's it. That's what I wanted to yap
about. I always find the Linux, you
know, the Linux mailing list and the LWN
just very interesting: this kind of
argument that's always happening, that
these are people that are deeply
concerned about code, the code craft,
the process of creating code, and kind
of their gigantic responsibility to
never be wrong. So for me, this is kind
of like a large, I guess, canary in the
coal mine for AI usage. If the kernel
can remain as consistent as it has been,
unlike all these sloppified websites, and
be able to take advantage of using AI,
then hey, you know, it probably means
me and you need to take more looks into
it all. And most of all, I think
the big thing to just always think about
is that at the end of the day writing
code and building things and building
kind of your ideas like this is amazing.
This is an amazing thing we get to do.
Who would have guessed in the entire
history of the world there would be
something in which millions of people
could create whatever comes to their
mind. We live in truly kind of like an
amazing day and age. So really with all
the FUD with all the kind of just
non-stop fear-mongering. Sorry, I keep
putting this. I just I just love taking
drinks. I just leave it right in the
camera shot. So, with all the
fear-mongering being done by Dario and
Sam and these hype beasts, to me, I look
at it as one of the, you know, one of
the greatest days to be writing software
and to have expertise because expertise is
probably going to be needed for quite
some time. So, sorry. I know some of you
want her to G on your stack until you
produce 37,000 lines of code, but for
the rest of us, we're still just trying
to make sense of the world. Hey, the
name, it's the hopeen, okay? That's what
I want. That's what I want in life.
Also, I do think there's a big issue
with copyright and I really do hope some
people go to jail because honestly, given
what they did to Aaron Swartz just
a little bit earlier, only a few years
back, the fact that all the execs and all
the researchers just stealing massive
amounts of stuff face absolutely no jail
time or consequences seems a little bit
suspicious to me. Just throwing it out
there. If we were to get to true
justice, I think jail time is in order,
at least a court hearing.
A gen again again
a genen. Hey, do you want to learn how
to code? Do you want to become a better
back-end engineer? Well, you got to
check out boot.dev. Now, I personally
have made a couple courses for them. I
have live walkthroughs available for free on
YouTube of the whole course. Everything
on boot.dev you can go through for free.
But if you want the gamified experience,
the tracking of your learning and all
that, then you got to pay up the money.
But hey, go check them out. It's
awesome. Many content creators you know
and you like make courses there.
boot.dev/prime for 25% off.
The video discusses the debate surrounding the use of Large Language Models (LLMs) in Linux kernel development. It explores a specific instance where an AI-generated patch was merged without disclosure, leading to a performance regression and a broader discussion on liability, the necessity of code review, and the potential impact of AI on learning and the future of software development.