Only 40 lines of code
Some people just have odd hobbies, you know. Like, I know some of you out there, after a long day at the pixel mines, like to go home and play a game of Factorio. Yes, Factorio: for when assembly just doesn't give you the same high it used to. Now for Jeremir, his special hobby, as he likes to call it, is skimming the OpenJDK commit log every few weeks. And this time, while scanning it, he ended up finding a commit with a 40-line fix that eliminated a 400x performance gap.
Now, this is actually super cool. How it works is pretty amazing. I love it. There are going to be flame graphs, and I'm going to explain what flame graphs are if you've never used them. And also, hey, you shouldn't knock it till you rock it, okay? Don't make fun of somebody for looking over change logs, okay? That's a good pastime. Now,
the change log that got his attention was this one right here: replacing reading /proc to get thread CPU time with clock_gettime. The diff contained 96 insertions and 54 deletions, and the change set adds a 55-line JMH benchmark, which means the production code itself was actually reduced. And it turns out this change addresses a bug from back in 2018, JDK-8210452, which says getCurrentThreadUserTime is 30 to 400x slower than getCurrentThreadCpuTime. So it's been around for a while. Okay, we've all been there. I've had a few Jira tickets that lasted the entire tenure of my previous job and were never fixed. So I get it. Hey, no stones being thrown here. Okay, buddy.
All right, so let's first look over the code that got deleted, because when you see this, you'll be surprised at how much effort was required to get the current user time for a thread. First there's a bunch of declarations: they get the current thread right here, and then a whole lot of initialization of dummy variables and a buffer. The first thing it does is open a file by its proc name. Then it reads all the data into this 2K buffer, adds a little null terminator at the end, and closes the file it opened. Then, after reading everything from the file, it does this reverse character lookup into the string right here. Now, apparently this parsing was a source of many bugs; it just kept happening over and over again. So they came up with this approach where it finds the last closing parenthesis, and at that point you know everything after it is information about the process. Then it skips whitespace until it hits a non-whitespace character, and then it reads 13 numbers. That's right. Remember, these proc files aren't actual real files; they're generated on the fly. So the kernel takes a bunch of numbers, converts them into strings, and copies them to user space. Then user space takes that string, does a little parsing of its own with a reverse char search, and converts those stringified numbers back into numbers, just to get the last two right here. And at that point, you've got yourself a little bit of that user time. Okay, easy as that. Just a little one, two, three.
You ever wonder why people working at really low levels tend to be grumpy? We call them angry neckbeards, okay? Now you understand why: a lot of people had to put up with a lot. So don't you be sitting there adjusting your little div, going, I just want a little drop shadow. There are people who spent their lives trying to figure out why drop shadows aren't working, and you're over there complaining. You don't know what's going on in there. Okay,
buddy. And of course, here's the new implementation. There's a bunch of internal stuff in there that's rather confusing in exactly how it works, but nonetheless, it does a little bit of bit-flipping inside this little clock ID, and then down here, boom, it just gets the thread time. Yes, to a lot of people that still looks like very ugly code, but it is significantly better. Compared to reading out a proc file, doing a bit of string processing, hitting an sscanf, and then grabbing the 12th and 13th numbers out of a list, we can all agree this has to be better, right?
Now, the best way to understand the performance of how something runs is with flame graphs. If you've never used a flame graph before, here one is. They're actually not too confusing; they're pretty simple. How you read one is from the very bottom: that's the bottom-most function call, and each of these little tops is a different function call. Now, the ordering doesn't necessarily matter; these are not time-dependent. Instead, you look at the top, and each box's width is the percentage of the program's running time spent within that function. So you just read it as relative portions. You can say, hey, the time spent closing a file, this operation right here, looks like 33% of the time of getting user time was just spent closing a file. Oh, hold on, I forgot it shows a little percent sign when you hover over it: it's 34% of the time. Read was 16.8% of the time, and fopen was 43%. Yes, if you do quick maths in your head, about 90% of the time was spent just fangling and jangling around with files. This right here was the sscanf time: only 3.9% of the time was it actually doing something. The rest of the time is just off in file land, hopping around and doing who knows what with all these little futexes, these fast userspace mutexes. Okay, hey, get my mutex out your mouth, buddy. I actually have no idea what that means. And of course,
the change: this is what the change looks like. Now, you can tell right away that something's happened. You don't have to be a genius here; you don't really have to understand how this works to look at it and say, hey, it just looks like a lot of stuff is not happening anymore. There's one little peak right here, one super thin flame, and we don't really care what that is. And that's it. It's obvious: very little is happening.
Jeremir was nice enough, of course, to actually do the benchmarking as well, so we can have some real numbers: it went from about 11 microseconds down to 279 nanoseconds. Now, this is shocking, right? I just would have never guessed I'd been spending a lot of time here. And by a lot of time, I mean microseconds; sure, it's not a ton of time, but nonetheless, I would have never guessed it. I would have assumed this was about the fastest operation possible and never investigated it at any point, but this is what happened. This is a recent fix from just a couple weeks ago, and I think these kinds of investigations are super cool. I really do appreciate these write-ups.
There's even more cool information within the write-up about how everything exactly works. I just love the fact that there are people out there making things better all the time, especially these super intense ones, and then you get to read about it and go, oh my gosh, I never would have put together that getting user time would be so dang complicated. I just thought it was interesting, so I figured I'd take my time and share a little something. By the way, if you like a little bit more of a technical one, hey, the thing is, engineering is fun. I actually really like engineering, even if I don't work on the JDK and I don't program in C++. I don't do any of that. But it's still awesome to read this stuff, still super cool to look at, and I just enjoy, you know, engineering at its finest. So, I hope you enjoyed it, too. Hey, press that like button or something. Tell me you liked it, or tell me you didn't; honestly, I don't really care which one you choose. Just don't be a grump, okay? The name is ThePrimeagen. Hey, do you want to
learn how to code? If you want to become a better back-end engineer, you've got to check out boot.dev. Now, I've personally made a couple of courses with them, and I have live walkthroughs of the whole course freely available on YouTube. Everything on boot.dev you can go through for free, but if you want the gamified experience, the tracking of your learning, and all that, then you've got to pay up. But hey, go check them out. It's awesome. Many content creators you know and like make courses there. boot.dev/prime for 25% off.
The video discusses a significant performance improvement in the OpenJDK, discovered by an individual named Jeremir while reviewing commit logs. A 40-line fix eliminated a 400x performance gap in retrieving thread CPU time. The original method involved reading from a proc file, parsing strings, and converting numbers, which was inefficient and prone to bugs. The new implementation, using `clock_gettime`, is much faster and simpler. The video also explains flame graphs as a tool to visualize performance, showing how the old method spent most of its time on file operations rather than actual computation. The fix, originating from a 2018 bug, highlights the importance of diligent code review and the surprising complexity of seemingly simple operations.