Only 40 lines of code
Some people just have odd hobbies, you know. Like, I know some of you out there, after a long day at the pixel mines, like to go home and play a game of Factorio. Yes, Factorio: for when assembly just doesn't give you the same high it used to. Now for Jeremir, his special hobby, as he likes to call it, is skimming the OpenJDK commit log every few weeks. And this time, while scanning it, he ended up finding a commit with a 40-line fix that eliminated a 400x performance gap.
Now, this is actually super cool. How it works is pretty amazing. I love it. There are going to be flame graphs, and I'm going to explain what flame graphs are if you've never used them. And also, hey, you shouldn't knock it till you rock it, okay? Don't make fun of somebody for looking over change logs, okay? That's a good pastime. Now,
the change log that got his attention was this one right here: replacing reading /proc to get thread CPU time with clock_gettime. The diff contained 96 insertions and 54 deletions, and the change set adds a 55-line JMH benchmark, which means the production code itself was actually reduced. And it turns out this change addresses a bug from back in 2018, JDK-8210452, which says getCurrentThreadUserTime is 30 to 400x slower than getCurrentThreadCpuTime. So it's been around for a while. Okay, we've all been there. I've had a few Jira tickets that lasted the entire tenure of my previous job and were never fixed. So I get it. Hey, no stones being thrown here. Okay, buddy.
All right, so let's first look over the code that got deleted, because when you see this, you'll be surprised at how much effort was required to get the current user time for a thread. First there's a bunch of declarations: they get the current thread right here, and then a whole lot of initialization of dummy variables and a buffer. The first thing it does is open a file by its proc name. Then it reads all the data into this 2K buffer, adds a little null terminator at the end, and closes the file it opened. Then, after reading everything from the file, it does this reverse character lookup into the string right here. Now, apparently this parsing was a source of many bugs; it just kept happening over and over again. So they came up with this approach where it finds the last closing parenthesis, and at that point you know everything after it is information about the process. Then it skips whitespace until it hits a non-whitespace character, and then it reads 13 numbers. That's right. Remember, these proc files aren't actual real files; they're generated on the fly. So the kernel takes a bunch of numbers, converts them into strings, and copies them to user space. Then user space takes that string, does a little parsing of its own with a reverse char search, and converts those stringified numbers back into numbers, just to get the last two right here. And at that point, you've got yourself a little bit of that user time. Okay, easy as that. Just a little one, two, three.
You ever wonder why people working at really low levels tend to be grumpy? We call them angry neckbeards, okay? Now you understand why: a lot of people had to put up with a lot. So don't you be sitting there adjusting your little div, going, I just want a little drop shadow. There are people who spent their lives trying to figure out why drop shadows aren't working, and you're over there complaining. You don't know what's going on in there. Okay,
buddy. And of course, here's the new implementation. There's a bunch of internal stuff in there that's rather confusing in exactly how it works, but nonetheless, it does a little bit of bit-flipping inside this little clock ID, and then down here, boom, it just gets the thread time. Yes, to a lot of people that still looks like very ugly code, but it is significantly better. Compared to reading out a proc file, doing a bit of string processing, hitting an sscanf, and then grabbing the 12th and 13th numbers out of a list, we can all agree this has to be better, right?
Now, the best way to understand the performance of how something runs is with flame graphs. If you've never used a flame graph before, here one is. They're actually not too confusing; they're pretty simple. How you read one is from the very bottom: that's the bottom-most function call, and each of these little tops is a different function call. Now, the ordering doesn't necessarily matter; these are not time-dependent. Instead, you look at the top, and each box's width is the percentage of the program's running time spent within that function. So you just read it as relative portions. You can say, hey, the time spent closing a file, this operation right here, looks like 33% of the time of getting user time was just spent closing a file. Oh, hold on, I forgot it shows a little percent sign when you hover over it: it's 34% of the time. Read was 16.8% of the time, and fopen was 43%. Yes, if you do quick maths in your head, about 90% of the time was spent just fangling and jangling around with files. This right here was the sscanf time: only 3.9% of the time was it actually doing something. The rest of the time is just off in file land, hopping around and doing who knows what with all these little futexes, these fast userspace mutexes. Okay, hey, get my mutex out your mouth, buddy. I actually have no idea what that means. And of course,
the change: this is what the change looks like. Now, you can tell right away that something's happened. You don't have to be a genius here; you don't really have to understand how this works to look at it and say, hey, it just looks like a lot of stuff is not happening anymore. There's one little peak right here, one super thin flame, and we don't really care what that is. And that's it. It's obvious: very little is happening.
Jeremir was nice enough, of course, to actually do the benchmarking as well, so we can have some real numbers: it went from about 11 microseconds down to 279 nanoseconds. Now, this is shocking, right? I just would have never guessed I'd been spending a lot of time here. And by a lot of time, I mean microseconds; sure, it's not a ton of time, but nonetheless, I would have never guessed it. I would have assumed this was about the fastest operation possible and never investigated it at any point, but this is what happened. This is a recent fix from just a couple weeks ago, and I think these kinds of investigations are super cool. I really do appreciate these write-ups.
There's even more cool information within the write-up about how everything exactly works. I just love the fact that there are people out there making things better all the time, especially these super intense ones, and then you get to read about it and go, oh my gosh, I never would have put together that getting user time would be so dang complicated. I just thought it was interesting, so I figured I'd take my time and share a little something. By the way, if you like a little bit more of a technical one, hey, the thing is, engineering is fun. I actually really like engineering, even if I don't work on the JDK and I don't program in C++. I don't do any of that. But it's still awesome to read this stuff, still super cool to look at, and I just enjoy, you know, engineering at its finest. So, I hope you enjoyed it, too. Hey, press that like button or something. Tell me you liked it, or tell me you didn't; honestly, I don't really care which one you choose. Just don't be a grump, okay? The name is ThePrimeagen. Hey, do you want to
learn how to code? If you want to become a better back-end engineer, you've got to check out boot.dev. Now, I've personally made a couple of courses with them, and I have live walkthroughs of the whole course freely available on YouTube. Everything on boot.dev you can go through for free, but if you want the gamified experience, the tracking of your learning, and all that, then you've got to pay up. But hey, go check them out. It's awesome. Many content creators you know and like make courses there. boot.dev/prime for 25% off.
The video discusses a significant performance improvement in the OpenJDK, discovered by an individual named Jeremir while reviewing commit logs. A 40-line fix eliminated a 400x performance gap in retrieving thread CPU time. The original method involved reading from a proc file, parsing strings, and converting numbers, which was inefficient and prone to bugs. The new implementation, using `clock_gettime`, is much faster and simpler. The video also explains flame graphs as a tool to visualize performance, showing how the old method spent most of its time on file operations rather than actual computation. The fix, originating from a 2018 bug, highlights the importance of diligent code review and the surprising complexity of seemingly simple operations.