PewDiePie beat ChatGPT?

Transcript

0:00

Anyway, so PewDiePie probably has done

0:02

some distillation is my guess.

0:04

[laughter]

0:05

Is that Is that what we're about to find

0:06

out? PewDiePie is just distilling poor

0:10

Anthropic, and Dario right now is just

0:12

dying on the inside.

0:13

>> DeepSeek 2.5, way bigger model than mine.

0:16

Facebook's flagship model, Llama 4

0:20

Maverick. [laughter]

0:23

>> Does anyone even consider Facebook at

0:25

the race anymore? When you're like

0:26

Facebook's, you know, flagship model,

0:29

I'm just like, oh, okay. Yeah. Like, is

0:31

this equivalent to Karpathy's nanoGPT?

0:34

Like, what are we looking at right here?

0:35

Model B, Llama 4 Maverick, destroyed. And

0:40

most importantly of all, my model

0:44

outperforms ChatGPT 4o [laughter]

0:49

in like November or something. That

0:51

works. This sounds like impressive, but

0:54

it's way less impressive once we

0:56

>> No, first off, it depends on what he

0:57

actually means by that. But if if you do

0:59

outcompete 4o, that's actually super

1:01

impressive. So, I don't know what's

1:03

about to happen in this video, but hey,

1:06

I'm in. That's cool. In like November or

1:09

something that

1:11

sounds like impressive, but it's way

1:14

less impressive once we get into it, and

1:16

we will get into it. It also sounds way

1:18

less impressive considering the fact

1:20

that I almost burned down my house twice

1:23

for this project.

1:25

It also sounds way LESS IMPRESSIVE. I

1:28

LOST MY GODDAMN VICINITY FOR THIS. NOW,

1:31

I have not created my own AI. I have

1:33

merely taken an AI model and trained it.

1:37

It's like stealing a child on the street

1:40

instead of birthing one myself. It's way

1:43

more effective that way. Plus, it would

1:45

cost million. What? Okay. I mean, that's

1:49

it's true. I guess stealing a child is

1:51

way more like time effective than than

1:54

creating one yourself. True. True.

1:56

Absolutely true. It cost millions and

1:58

millions of dollars in infrastructure,

2:00

which I do not have yet. Am I talking

2:02

about birthing or But it's important for

2:06

you to understand just how naive I was

2:08

going into this cuz I knew nothing about

2:10

machine learning, training, AI, and

2:13

coding that I mentioned. My model is a

2:16

coding model. I can make a coding model.

2:18

Sure, Felix. Great idea. But I also know

2:22

I wasn't that crazy going into this, and

2:24

I'll explain why. But mainly the fact

2:26

that I wanted to do this was because I

2:28

wanted to learn.

2:28

>> Is he doing what I think he's doing?

2:30

Whoopsies.

2:34

Yeah, look at this.

2:38

He put the He put the He put the little

2:40

microphone on his fingy. Snapped it on

2:43

to the old finger. Like, most people hold

2:45

it. He just snapped it on. The man's a

2:48

revolutionary genius. I swear he just

2:50

does things and I just I don't even know

2:52

what to do with it. He's just uh also

2:54

just to be completely fair, him saying

2:57

he knows nothing about coding but going

2:58

to build a coding model. Uh I mean

3:01

technically that makes him on par with

3:04

every other AI ML researcher right now.

3:07

Great idea. But I also know I wasn't

3:10

that crazy going into this and I'll

3:12

explain why. But mainly the fact that I

3:14

wanted to do this was because I wanted

3:16

to learn. The reason I'm here with this

3:18

video and the series of events has just

3:20

been me approaching this philosophy of

3:24

just yes this might be difficult but I

3:26

will learn from it step by step. You

3:28

know that takes you places. And that's

3:30

why I'm so excited to announce today's

3:33

sponsor, which is boot.dev. Boot.dev

3:35

is a website that teaches you how to

3:37

code. But for real, I did their Linux

3:39

course and it's there is a nonzero

3:42

chance I might accidentally show up by

3:44

accident in this video. Very exciting.

3:47

Okay, very very exciting. Also, by the

3:49

way, just so you guys know, my audio

3:51

again is kind of goofy. Every time I

3:53

pause it, it requires me to like go back

3:55

a little bit cuz there's like a few

3:56

seconds of silence. So, if you're

3:57

wondering why I'm repeating it,

3:58

fantastic. I don't know what's

3:59

happening. When you actually understand

4:00

how something works, it changes things

4:02

for real. It's none of this Duolingo

4:05

bing fake learning. Okay, it's fun. It's

4:08

engaging and it works. I'm super excited

4:11

to do their other courses as well.

4:13

Create your own AI agent in Python

4:15

course. I'm going to say something that

4:17

makes me sound like such a boomer dad,

4:19

but I am at this point. So, to heck with

4:22

it. Instead of focusing on gaming too

4:24

much lately, I just been focusing on

4:26

leveling up myself. But it's true. It's

4:29

such an amazing feeling and I want you

4:32

guys to experience it as well. And I

4:33

think boot.dev is an amazing venture to

4:36

do so. So check it out in the

4:37

description and I will remind you later

4:40

as well. Okay. Now, funny enough to

4:43

dude, he's right. I hope you took that

4:45

little QR code. Do you see that little

4:46

QR code? I'm going to give you guys one

4:48

more chance. Okay. Well, the problem is

4:50

that hopefully that doesn't screw it up.

4:52

That hopefully that title [laughter]

4:56

>> true. It's such an amazing feeling and I

4:59

want you guys to experience it as well.

5:01

And I I can't believe PewDiePie is is a

5:03

better modern philosopher than most of

5:05

tech Twitter. You don't need to learn

5:07

[ __ ] PewDiePie is like, actually, as you get

5:08

older, it turns out it's really

5:10

meaningful and awesome to actually

5:12

become an expert in something and to

5:13

feel like you're actually improving at

5:14

something. I can't believe, dude, we

5:16

live in just the strangest world. PewDiePie

5:19

is now the modern philosopher. Funny

5:21

enough, to train my model, AI model, it

5:24

ironically it works very similar to how

5:26

you would train yourself on boot.dev.

5:28

There's an instruction of a problem. You

5:30

get a framework on how to initiate or

5:32

what to use and then there's a validated

5:35

answer at the end of it. Basically, I got

5:37

around 100,000 samples like that [music]

5:39

and then you feed that to the model

5:40

which slightly nudges its parameters

5:43

like it slightly probably nudges your

5:45

brain [music] and bada bing bada boom

5:47

you have trained your model. It's kind

5:49

of like this. You're AI. Look at this.

5:55

Are you learning?

5:59

Are you getting this?
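
The sample layout described above (an instruction, a framework to start from, and a validated answer) can be sketched roughly as follows. This is only an illustration: the field names `instruction`, `scaffold`, and `answer`, the `Sample` class, and the JSON Lines layout are assumptions, not the format actually used in the video.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Sample:
    instruction: str  # the problem statement shown to the model
    scaffold: str     # the "framework": starter code or a hint on what to use
    answer: str       # a solution that has already been validated


def to_jsonl(samples):
    """Serialize samples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


# One hypothetical sample; the video mentions roughly 100,000 of these.
samples = [
    Sample(
        instruction="Return the absolute value of n.",
        scaffold="def absolute(n):\n    ...",
        answer="def absolute(n):\n    return -n if n < 0 else n",
    )
]
jsonl = to_jsonl(samples)
```

Each line of `jsonl` would be one training example; fine-tuning on many such lines is what "slightly nudges" the model's parameters.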

6:02

>> You will look at these. You will look at

6:05

100,000 more. Play. Pay attention.

6:14

>> This might take a while. The model. Hey,

6:16

to be fair, humans are super efficient.

6:18

Like if you really think about it, it

6:21

takes a baby from zero to to like one to

6:25

figure out how to walk. So it's only

6:27

seen however many hours of real life

6:31

from zero to one. And and for the first

6:33

couple months, its eyeballs didn't even

6:34

work. And it doesn't have a developed

6:36

nervous system that far into this entire

6:38

situation. It didn't even have muscles

6:40

and it figured it out. It takes AI like

6:43

billions upon billions of hours to be

6:45

able to do anything. You know, humanity

6:48

hyperefficient when you really think

6:50

about it. This little fellow right here,

6:51

hyper efficient learner. Okay,

6:56

>> hyper efficient. This might take a

6:57

while.

6:58

>> The model that I used was Qwen 32B,

7:00

which is already amazing at coding, but

7:02

I needed it to be amazing at coding.

7:04

>> He's also doing instruction tuning. By

7:06

the way,

7:06

>> the whole reason I decided to do this

7:07

and where I landed, cuz I discovered

7:09

something. There are many ways to

7:12

benchmark and test your model. I

7:14

discovered there's one called Aider

7:15

Polyglot. If you remember last video, I

7:18

used the agent Aider to code my own web

7:20

UI. This is

7:21

>> Am I allowed to show this on Twitch?

7:23

What do we What do we do with this? Am I

7:24

Am I about to get banned? Is legendary

7:27

gooning lord Dan Clancy about to come

7:28

in here and ban me? I'm sorry, Dan.

7:30

>> This is a respectable benchmark [music]

7:32

that has coding in six different

7:34

languages. That's six more languages

7:36

than I know.

7:38

But what was interesting about this is

7:40

[music] that state-of-the-art models

7:41

like ChatGPT perform like garbage on

7:44

this benchmark. 18.2%. Uh, what my model

7:49

that I was planning to train on performs

7:51

8% [music]

7:52

trash but it performs 16% if you use a

7:57

different format called whole format

7:59

instead of diff format. Basically

8:01

there's two different formats but one of

8:03

them is important. How do I explain

8:05

this?

8:05

>> It's basically like this. You draw a

8:07

picture. Okay, imagine you draw a

8:09

picture and you want to add a cloud. But

8:11

instead of just adding the [ __ ]

8:13

cloud, you redraw the entire picture

8:15

with the cloud. It makes no [ __ ]

8:17

sense. But basically, a lot of

8:19

>> Did the original one The original one

8:20

did not have a

8:22

>> the cloud.

8:22

>> The original one did not have a a

8:24

dingleberry. He added more than just a

8:26

cloud in that. Okay. Hey, hey,

8:31

there is some bonus content in this one

8:32

that I was not I did not think we were

8:34

allowed to have bonus content of instead

8:36

of just writing the whole thing. It's

8:38

just stupid. It wastes compute and it

8:40

wastes time and I never used it as well.
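
The difference between the two edit formats can be sketched like this. It is a simplified stand-in, not Aider's actual parser: `apply_whole` and `apply_search_replace` are hypothetical helpers, and character counts are used as a rough proxy for the tokens the model has to generate.

```python
ORIGINAL = (
    "def greet():\n"
    "    print('hello')\n"
    "\n"
    "def main():\n"
    "    greet()\n"
)


def apply_whole(_old: str, new_file: str) -> str:
    """Whole format: the model re-emits the entire file, which replaces the old one."""
    return new_file


def apply_search_replace(old: str, search: str, replace: str) -> str:
    """Diff-style format: swap one exact snippet and leave everything else untouched."""
    if old.count(search) != 1:
        raise ValueError("search text must match exactly once")
    return old.replace(search, replace)


# Whole format: the model has to redraw the full picture to add one cloud.
whole_output = ORIGINAL.replace("'hello'", "'hello, world'")

# Diff-style format: the model only states what to find and what to put there.
search, replace = "print('hello')", "print('hello, world')"

via_whole = apply_whole(ORIGINAL, whole_output)
via_diff = apply_search_replace(ORIGINAL, search, replace)

whole_cost = len(whole_output)           # characters emitted in whole format
diff_cost = len(search) + len(replace)   # characters emitted in diff-style format
```

Both paths end with the same file, but the diff-style edit is shorter to generate. The catch is that the search text has to match exactly, which is the kind of thing a smaller model can get wrong.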

8:42

So I thought hey if I can just fix the

8:44

format I will make my model 16%. And I

8:48

will almost beat ChatGPT. The goal: beat

8:51

ChatGPT. Easy, because what I had on my

8:55

side at my disposal my arsenal was

8:59

Chinese AI research. You'd think China

9:02

would be on the more censorship side of

9:04

things. At least that's what I thought.

9:06

How wrong I was. It's completely the

9:08

opposite. DeepSeek, China, Chinese AI

9:11

basically just [music] released their

9:13

model for anyone to run and a whole

9:15

document with their entire training

9:17

process in great just don't ask about

9:20

the three T's. You know what I'm

9:21

[laughter]

9:26

[laughter]

9:31

detail? It's amazing. There's so much

9:33

information from these research

9:35

documents combined with the open source

9:37

community. There's so much to try and it

9:40

just makes you really want to try it

9:41

yourself. At least that's how I saw it.

9:43

Even though a lot of it was super

9:45

advanced and I didn't understand

9:46

anything, eventually I did. I [laughter]

9:48

think

9:50

it's also hilarious reading these

9:51

documents. Chinese AI researchers are

9:53

unironically comedians cuz they write

9:55

the most unhinged [ __ ]. They say [ __ ] like,

9:57

"Oh, we trained our model on 2,048

10:01

GPUs." You're like, "What? That's like

10:03

$60 million." And and you keep reading

10:06

and they're like, "We emphasize this is

10:08

an economical APPROACH." I'M LIKE,

10:09

"ECONOMICAL? WHAT'S THE non-economical

10:12

version?" But I think that kind of puts

10:14

things in perspective and why a lot of

10:15

companies in the west, [music] they

10:17

don't want to share information. So,

10:19

with Chinese AI research on my side and

10:21

a boot.dev course,

10:22

>> to be completely fair, 2,048, 2 to the 11,

10:27

uh, H200s is an extremely cheap amount

10:31

of GPUs to be able to train. And the

10:33

reason why is because they uh you know

10:35

the whole uh

10:38

hey chat GPT

10:40

yo dog, how how does quantum mechanics

10:43

work? Yo, ain't give me that knowledge.

10:46

[laughter]

10:47

I that's that's what happens. ready to

10:48

do this thing.

10:50

Now, what do you need to train AI? Data.

10:54

Now, how do you get data without

10:55

sacrificing your soul in the process?

10:57

Well, there is options, believe it or

10:59

not. You can mine the stack, which is

11:01

this 60 TB dataset that you're

11:03

[music] allowed to train on. Good thing

11:04

I kept my hard drives around. You can

11:06

use publicly available data sets. Yep.

11:09

You can also mine git, which is a little

11:10

more of a gray zone cuz you have to

11:12

check for licenses, which yeah, you just

11:14

you just hit all the MIT licenses. It's

11:16

actually something I did want to see is

11:17

like how good could you make nanoGPT if

11:20

you just use uh MIT licenses and like

11:23

hey can you make an okay inline

11:24

autocomplete from MIT maybe which I'm

11:27

sure everyone is doing right [laughter]

11:29

these big tech companies they're they're

11:31

being super ethical and great about it

11:33

I'm sure that's why they're not sharing

11:34

any information. You can also synthesize

11:38

your data. Glorious synthetic data. By the

11:42

way, is PewDiePie becoming the next, uh,

11:45

Rossmann? Is this what we're... Are we in

11:47

real time? Oh, hold on. Time out. Time

11:49

out. Are we in real time watching the

11:52

guy who gave us Ligma? The Ligma Chad

11:56

to like Luke Smith pipeline, Vim Diesel.

11:59

Is this what we're watching like right

12:01

now in live action? Like, where are we

12:03

inside the cification of it all? Is Is

12:06

this what's happening? I think that's

12:07

what's happening. Wow. Don't taste so

12:09

good. I tried every single method there

12:12

is. It's been a mess. This is scraping

12:15

git or enriching my data. This is

12:17

scraping for more data. This is running

12:22

testing on the data. And this is

12:24

augmenting the data. And this is my

12:27

eight LLMs. This is the level we've

12:29

reached. This project was kind of like I

12:32

was like in the middle of a freeway and

12:34

there's a bunch of cars that I

12:36

had to direct constantly because [ __ ]

12:38

had to move for this to finish.

12:40

Everything took so much time and my GPUs

12:43

had to constantly be cooking and I had

12:45

to do all this makeup cosmetic surgery

12:48

to the data cuz all these developers

12:50

writing [music] poor lazy first of all

12:53

why do you even publish some of that

12:55

lazy whoa whoa whoa whoa whoa whoa whoa

12:58

whoa whoa whoa, PewDiePie, what do what do you

13:00

mean commit I don't know that's dude

13:03

that's that's obvious what that one

13:05

means okay it means that they went on

13:07

Stack Overflow or asked an LLM

13:10

about something. A bunch of code was

13:12

generated. It worked and they didn't

13:14

understand what happened. The second one

13:17

is also evidently clear. This guy's

13:19

working on CI right now. Do you have any

13:22

idea what CI stands for? It stands for

13:24

Dante's Inferno. Okay, it's just it's

13:27

awful. Okay, that's how you spell Dante.

13:29

It starts with a C. It's it's actually

13:31

insane. And so he's just praying, please

13:34

work. This is my one chance. This is my

13:36

one chance to actually have it happen.

13:38

And then this last one is also very very

13:40

obvious. Okay, some guy started fixing a

13:44

bug and then he accidentally had an idea

13:46

for a feature and then found three more

13:47

bugs along the way. And so a lot of [ __ ]

13:50

happened and you can't just like say all

13:51

that. So you're just like, yo, there's

13:53

just like just just changes. Okay, we

13:55

made changes. I don't want to talk about

13:57

it. Just a lot of stuff happened. Like

13:59

this is really good commit commit

14:01

messages. I know what happened on all of

14:02

them. Lazy instructions. Don't make a

14:05

commit if you're just going to add a

14:06

comment. But finally, I had my data, but

14:09

I still knew I needed more. And I also

14:12

wanted to try out synthetic generation

14:15

of data, which is basically you

14:18

typically take the strongest model that

14:19

you have and you ask it, hey, look at

14:22

this. Make more in this format. And my

14:26

god, it was a beautiful thing. You get

14:28

the perfect data exactly the way you

14:31

want it. It's amazing. But the problem

14:34

is, and maybe you already know this, AI

14:37

is wrong all the [ __ ] time.

14:39

[laughter]

14:40

>> Okay, it's basically like this. I tell

14:43

AI,

14:44

>> you know, also what is crazy. Can I just

14:45

can I throw something out there? Uh, in

14:47

today's world,

14:49

why just why oh why if you say AI is

14:52

wrong all the time on the internet,

14:54

people are like,

14:56

uh, hallucin hallucinations have been

14:58

largely solved. What are you talking

14:59

about? Well, dude, if you're getting No,

15:02

no one gets hall. That That was like a

15:03

2024 problem. Okay. No, no one gets

15:05

hallucinations. Like, what are you

15:07

talking about? I get hallucinations all

15:09

the time. It just makes up [ __ ]

15:11

constantly. Like, what do you what do

15:13

you mean?

15:17

[laughter]

15:18

My drawing skills have really I'm sure

15:21

glad I learned how to draw. [laughter]

15:25

>> I was just thinking like what happened

15:26

to his drawing skills?

15:27

by showing him a burger. And then AI

15:30

[music] makes what looks like a burger,

15:33

but I open the lid and he put razor

15:36

blades in there. So my harness is a

15:39

burger eater [laughter]

15:43

to check that if the real burgers are

15:45

made instead of fake burgers. I think I

15:48

explained that pretty well. You lied to

15:50

me. My synthetic approach, I know that

15:53

most people don't care. I used

15:55

OSS-Instruct from Magicoder and

15:57

Evol-Instruct, which is an amazing method.
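
The "burger eater" harness can be sketched as a filter that only keeps generated samples whose code actually survives its tests. This is a minimal illustration under assumptions: `passes` and the sample layout are hypothetical, and running generated code with `exec` in the same process is unsafe outside a toy example (a real harness would sandbox it and add timeouts).

```python
def passes(solution_src: str, test_src: str) -> bool:
    """Return True only if the solution runs and all of its test assertions hold."""
    namespace = {}
    try:
        exec(solution_src, namespace)  # define the candidate function(s)
        exec(test_src, namespace)      # a failing assert raises AssertionError
    except Exception:
        return False
    return True


TESTS = "assert absolute(-3) == 3\nassert absolute(2) == 2"

candidates = [
    # A real burger: correct solution.
    {"solution": "def absolute(n):\n    return -n if n < 0 else n", "tests": TESTS},
    # Looks like a burger, razor blades inside: wrong for negative numbers.
    {"solution": "def absolute(n):\n    return n", "tests": TESTS},
]

kept = [c for c in candidates if passes(c["solution"], c["tests"])]

# A harness that is too lenient lets the bad sample through as well.
weak_tests = "assert absolute(2) == 2"
leaked = passes(candidates[1]["solution"], weak_tests)
```

Here `kept` holds only the correct sample, while `leaked` shows how weakening the checks lets garbage data pass, which is the failure mode described later in the video.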

15:59

It's basically like that cloning dancing

16:02

guy technique. You feed it a code

16:04

snippet and then you say hey make it

16:06

into this format and also make it do

16:09

another one but make it more advanced. I

16:11

don't want to get into this technical it

16:13

doesn't matter. this video would take

16:14

forever. But the problem I had was I was

16:16

not getting enough of it and I don't

16:18

trust

16:18

>> also like how do you know like if you're

16:20

not a coder, real talk, if you're not a

16:22

coder, how do you know you're even

16:23

generating the right thing? Is this why

16:26

ChatGPT puts like if statements and

16:30

stupid guard checks around everything?

16:33

I don't know if you guys noticed that,

16:34

but chat chat jippity 35 Codex is just

16:37

like, oh, hey, is that a function that

16:38

takes in a number? Hey, I'm going to do

16:39

a quick type check. Hey, is that a

16:41

number? Are you sure you're a number? Hey,

16:43

if you're a number, are you like a whole

16:44

number? Oh, uh oh, you're a decimal

16:47

number. Hey, if you're a decimal number,

16:48

are you at least like greater than a

16:50

hundredth? It's just like, dude, what

16:51

are you doing? I just asked you to do

16:53

the absolute value. You don't we don't

16:55

need to do all this. Just just put put

16:58

the bags in the put the fries in the

17:00

bags, bro. Like, it's not that hard.

17:01

It's crazy how much if checks it does.

17:04

So, I validated it. I kept thinking the

17:06

problem was my test harness, so I needed

17:08

to FIX THAT. WHAT AN IDIOT.

17:12

THAT JUST MADE IT so I passed more

17:14

garbage. So when the time came for me to

17:16

finally train my model after months,

17:19

months, I was so excited. I ran the

17:23

benchmark and I had made an AI model

17:27

that finally

17:28

>> is worse.

17:29

>> That was worse. I had made it worse.

17:33

I probably should have quit then, but I

17:36

am way too stubborn for that. I just

17:38

can't. I had I don't have it in me. I

17:39

can't do it. Nice. This is when you

17:41

realize AI is kind of like magic, right?

17:43

But it also is garbage data in, garbage

17:46

data out. There was a lot of issues with

17:48

my data. When I had fixed my harness, I

17:50

had just let more garbage data pass

17:52

through. There's also all these other

17:54

issues like empty white spaces and

17:56

classic coding issues that I just wasn't

17:58

aware about. So that's why I had [music]

18:00

made it worse. But I gave it another

18:02

attempt. And this time, no more

18:05

mistakes, no more dillydallying. Lock in

18:09

again. The model is worse. [laughter]

18:15

Oh my gosh.

18:17

Oh, the brother locked in and actually

18:19

made it significantly worse.

18:22

[laughter]

18:23

Oh, that has to hurt a little bit. That

18:25

has to hurt a little bit. There's a meme

18:26

here. There's there I just feel it. I

18:29

just feel like there's a meme here.

18:30

Right here.

18:35

Something like that. There's something

18:36

out there. Okay. Okay, I don't know what

18:37

to do with it. I'm just going to let it

18:38

go out into the ether. Somebody will

18:40

come up with something way funnier, but

18:42

we're going to call it We're going to

18:43

call it even. Okay, we're going to call

18:44

it even. It was my harness yet again.

18:48

I don't know why I got so stuck on this

18:50

[ __ ] thing. I didn't even need

18:52

synthetic data. I just wanted it to

18:53

work. I just wanted to work so bad.

18:56

Finally, I had fixed the [ __ ] thing.

18:58

And the benchmark came in 16%. Sometimes

19:02

15, sometimes 14, but 16 was the

19:06

highest. The model is not going to solve

19:08

the

19:09

>> Why not just I'm c I I Is it just

19:12

because he wanted synthetic data to win

19:13

so bad? Why not just GitHub MIT that

19:17

crap? GitHub MIT that just get it. Just

19:21

get give me all the MI.

19:23

>> Honestly, it's not going to solve the

19:24

same problem every time. It it just

19:25

doesn't work like that. There is a level

19:27

of randomness to it to the benchmark

19:30

performance, but 16 was a ceiling. And

19:32

that makes sense because that was a

19:33

ceiling in the official whole format or

19:36

whatever. And I had fixed it. So it's in

19:38

the diff format, but I had not made a

19:40

[ __ ] difference. I had not made the

19:42

model smarter. And remember, I need to

19:44

beat 18% to say that I beat ChatGPT,

19:48

otherwise this means nothing. So my plan

19:50

of attack was to add reasoning to my

19:52

data. Adding reasoning to your data is

19:55

basically making it write out some

19:57

thinking before it solves the problem.

20:00

Instead of doing two plus two in your

20:01

head, you go, "Oh, okay. So, let me

20:04

reasoning to your Why is that mirrored?

20:08

Why is he always trolling us?" I swear

20:12

every image in all of his videos,

20:14

there's just like some small amount of

20:16

trolling. This is not some sort of like

20:19

right to left language. This is just

20:21

English inverted. Look, T h e, right?

20:24

Like just right there. It's just just

20:26

messing with it's just he's just trying

20:28

to get my brain

20:29

>> up there. Some thinking before it solves

20:31

the problem. Instead of doing 2 plus 2

20:33

in your head, you go, "Oh, okay. So, let

20:35

me break this down. So, I'll have two

20:37

apples and then if I add two more apples

20:40

and then I count them all together, then

20:42

I will have four." This is really

20:45

effective for complicated issues to

20:47

break down the problem into parts and it

20:49

really can improve the performance of

20:51

the AI but when it's simple questions

20:54

and it still does it it can be very very

20:56

irritating as well. You've probably seen

20:58

this actually if you use a stronger AI

21:00

model. A lot of times you ask it

21:02

questions, it goes, "Oh, absolutely.

21:04

Let's break this down into parts. First,

21:06

we're gonna..." You're like, "That was a

21:07

simple yes or no question. What do you

21:09

mean?" But it's a really effective way

21:10

to make your model solve problems more

21:13

accurately. And a lot of these smaller

21:16

open source models that I train on

21:18

struggle with this because they just

21:19

haven't been trained on it enough. So

21:21

that was my plan of attack to make my

21:23

model smarter. and I found a study that

21:26

showed a 10% gain in performance. I'm like,

21:29

10% that's way more than I need, baby.
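
Adding reasoning to a training sample can be sketched as putting the written-out thinking in front of the final answer in the text the model learns to produce. The `<think>` tags, the `prompt`/`completion` keys, and the `with_reasoning` helper are assumptions for illustration; the video does not show the actual format.

```python
def with_reasoning(instruction: str, reasoning: str, answer: str) -> dict:
    """Build a sample whose target text contains the thinking before the answer."""
    target = f"<think>\n{reasoning.strip()}\n</think>\n{answer.strip()}"
    return {"prompt": instruction, "completion": target}


# The apples example from the video, written as one training sample.
sample = with_reasoning(
    instruction="What is 2 + 2?",
    reasoning=(
        "I have two apples. I add two more apples. "
        "Counting them all together gives four."
    ),
    answer="4",
)
```

Training on completions shaped like this is what teaches a smaller model to break the problem down before committing to an answer.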

21:32

Let's just clone this git repo and get

21:34

going. The only problem was, of course,

21:37

that the ungodly level of computation

21:39

that was needed for this. And that's

21:41

when things were starting to smell

21:43

funny. I realized my room had this weird

21:47

aura to it all of a sudden. I swear it

21:49

didn't used to smell like that. So, I

21:51

decide to reboot my computer and

21:55

Lightning strikes all across it. Smoke

21:58

starts pouring out. My whole life

22:00

flashed before my eyes. I turn off the

22:01

computer, but the damage was done. One

22:04

of my GPUs [music] had died. Rest in

22:07

peace. I tested each one one by one. And

22:10

it seemed like everything else was okay.

22:12

It was just this one problem. And

22:14

looking at my purchase history, that one

22:16

GPU came from a different factory. You

22:19

have to understand what a hack job this

22:21

setup that I have. By the way, I love

22:23

that is like what a hack job his setup

22:25

is. And it's it's like the single most

22:28

beautiful

22:31

home setup I have ever seen. Yeah. It's

22:34

like my huge hack job. Oh, by the way,

22:36

you just excelled beyond everything I

22:38

have ever seen anybody just do by

22:40

themselves. Like that is beautiful.

22:42

That's a beautiful case. Does a great

22:45

job.

22:46

Okay. I mean, that one's less that Okay,

22:49

that's less of a good job.

22:53

That one's not as nice. But I mean, but

22:55

this one right here, that one's really

22:57

good.

22:59

It's very nice. [laughter]

23:01

Not so nice. That's not so nice. That's

23:02

not so nice. I don't know about that

23:03

one.

23:04

>> Bifurcated. It's undervolted to death.

23:07

These GPUs run on 450 watt. I run it on

23:10

175 watt just so my house wouldn't

23:12

[ __ ] crash every time. And then these

23:15

are hacked Chinese 4090 GPUs. This whole

23:18

thing is held together by prayers.

23:20

Literally in Japan, they sell these

23:22

prayer-infused IT badges. Where is it?

23:25

It's in front of my computer. Official

23:28

Japanese priests.

23:31

I probably have now blessed my computer

23:34

and I was ready to give it another shot

23:37

only for the smell to appear again.

23:40

After sniffing my GPUs like a maniac, I

23:42

concluded my GPUs was not the problem

23:45

this time. And eventually I found it. I

23:48

don't think it's supposed to look like

23:49

that. Again, I was using cable that was

23:52

rated for 1,500 watts. I was running

23:54

over 2,000. Change cable. We were good

23:58

to go. You know, I was planning on uh I

24:01

I had we had to do some power work at my

24:03

house and so I actually had uh some

24:06

extra wattage brought over to this room

24:08

cuz I was like, "Okay, you know, I

24:09

should probably maybe I should think

24:10

about building my own GPU rack."

24:13

I I don't know, bro.

24:17

I don't know. I don't know if I want all

24:20

of this. This kind of It just kind of

24:22

worries me about the whole the whole

24:24

situation. I don't really know what I'm

24:26

doing. Uh I don't want to end up

24:28

smelling I don't want to end up smelling

24:29

GPUs. You don't want that smoke. I don't

24:31

know if I want that smoke. Burning down

24:33

your house is really dangerous. True.

24:34

True. It's true.

24:36

It's actually really true. Like the the

24:39

more you burn down your house, like the

24:40

more likely you are to die. Okay. Hey, I

24:43

just want to throw that out there.

24:44

That's like it's it's you can't escape

24:46

it. Kept crashing still.

24:50

>> Oh, it's still thought it crashed. It

24:54

has crashed. Oh, [ __ ] What a pile of

24:59

You'd think training would be just this

25:00

straightforward thing, but it's really

25:03

not. It's taking too long to train. I

25:05

need more computer. Okay, what am I

25:07

going to do? It's not my fault. I know

25:09

what you're thinking, Felix. Are you

25:11

building another computer?

25:13

>> No, I am just extending the one I have.

25:16

[laughter] Of course. Of course. I'm an

25:18

epic minimalist.

25:22

>> He's Dude, Felix, I sorry, Mr. Mr. uh

25:25

Mr. Pie. Uh, by the end of this video, I

25:29

swear he's going to be renting his own

25:32

place, being like, I actually had power

25:33

issues and I needed to be able to get

25:34

more power, so therefore I had to go and

25:37

rent my own facility cuz there's no way

25:39

I'm going to be able to rent this in my

25:40

in my like standard housing unit. I need

25:43

I need all the powers I can get. I feel

25:44

like that's where we're going. My

25:46

bathroom and I drilled a hole.

25:51

It's super heavy on your computer. And

25:53

every time it crashes, I have to [ __ ]

25:56

defibrillate it back. Bring it back from

25:58

a coma. It's not pleasant. It's And with

26:02

everything that had happened in the

26:03

past, extremely stressful. And I really,

26:06

really, really started doubting what the

26:09

[ __ ] I'm doing here for these 2%. I

26:12

started calling upon Deepseek API

26:14

because it's practically for free. And

26:17

eventually I had 15,000 samples which is

26:20

way less than I had aimed for but these

26:23

were the top of the top, the crème de la

26:26

crème samples. So hold on let me get this

26:28

straight. Okay. Okay guys we're we're

26:30

going to run we're going to run some

26:31

numbers here. Okay let me jump over here

26:33

really quickly. Let's open up [ __ ] GNU

26:35

image manipulation program. Uh in which

26:38

some people are offended by its name.

26:39

You can shut up if you are. Uh so with

26:41

[ __ ] let me get this straight. There's

26:44

something called open code. uh or sorry

26:46

not open code. Sorry, sorry, sorry,

26:47

sorry. I got DAX on the mind. I don't

26:49

know what happened there. There's

26:50

OpenAI. Then there's something called

26:52

DeepSeek. Right. Now, what DeepSeek did is

26:56

that they went and requested a bunch of

26:58

synthetic examples from OpenAI, stole

27:01

it,

27:03

and then used that to train their model.

27:05

So now PewDiePie, Mr. Mr. Pie over here, all

27:10

right, is doing the exact same thing to

27:13

DeepSeek. And now I'm no mathematician

27:16

but if a equals b and b equals c

27:23

therefore a equals c which in this case

27:25

means that PewDiePie is actually taking

27:28

from OpenAI.

27:31

okay

27:33

brain draining one of the great American

27:36

companies. Okay they're stealing. Okay.

27:38

Hey PewDiePie. You can't steal. Stealing

27:40

is wrong. Okay. Here's the thing. Okay,

27:43

hold on. So, if you just go on GitHub or

27:45

you take some books, that's one thing.

27:46

Okay, but if you go to a company that

27:47

has stolen the entire internet and then

27:49

proceed to steal their content, that is

27:52

unjust. They spent money on the on the

27:55

new stuff, therefore that's bad. I'm

27:58

just throwing that out there and that's

27:59

unethical. Beautifully crafted

28:02

step-by-step reasoning the world has

28:04

ever seen. I did my supervised

28:06

fine-tuning, three epochs, and I ran the

28:09

benchmark and it scored

28:12

>> 17.

28:14

Are you [ __ ] kidding me?

28:16

>> But as I mentioned, the performance is

28:18

kind of random. So I kept running the

28:20

benchmark. I had eventually given up,

28:23

but I had done one more just for the

28:25

[ __ ] You know what's so funny about

28:26

what he's saying that people aren't

28:28

putting together. How much do you want

28:29

to bet the benchmarks that OpenAI also

28:32

published? You want to guess? You want

28:34

to guess what they look like? This is

28:35

the Opus, uh, 4.6 tracker.

28:39

Yo, dog. Why does it do really good and

28:41

then the next day it goes from 60% to

28:43

51%. 60% 55%. 58 54 52 54. And this is

28:50

all in the exact same benchmark.

28:53

Nothing's even changing.

28:56

Look at the look at the variance.
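
The run-to-run swing being pointed at here can be put into numbers. Below is a minimal Python sketch that summarizes repeated scores on the same benchmark, using only the eight percentages read aloud above. The function names `summarize_runs` and `beats_noise`, and the "two standard deviations" threshold, are assumptions made for illustration; they are a rough rule of thumb, not anything the tracker itself is known to compute.

```python
import statistics


def summarize_runs(scores):
    """Summarize repeated scores (in percent) from the same benchmark."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "spread": max(scores) - min(scores),
    }


def beats_noise(score_a, score_b, stdev, k=2.0):
    """Rough check: is the gap between two scores larger than k stdevs?

    k=2.0 is an assumed rule-of-thumb threshold, not a formal test.
    """
    return abs(score_a - score_b) > k * stdev


# The eight percentages read off the tracker in the transcript.
tracker_scores = [60, 51, 60, 55, 58, 54, 52, 54]
summary = summarize_runs(tracker_scores)
print(summary)
```

With a spread this wide on an unchanged benchmark, a gap of a point or two between two single runs says very little on its own, which is the point being made about reading single benchmark numbers.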

28:59

And then, oh my gosh, you jump over to

29:02

to Codex.

29:07

You know what's so funny is everybody

29:11

last week on Twitter was just like,

29:13

"Dude, Codex sucks."

29:16

You go here and you're like, "Oh, that's

29:18

because it's been sitting at like a 60s

29:20

pretty much average the entire time."

29:22

And then proceeds to go down to a 45%

29:24

and everybody on Twitter talks about it.

29:27

Crazy. It's actually crazy that you can

29:30

go, "Oh, yeah. I see that. Look at the

29:32

weekly trend." [ __ ] the bed. [laughter]

29:36

It's nuts that like I can actually line

29:38

this up to reality.

29:41

I know. Anyways, holy [ __ ] This one is

29:44

having like a god run. It's at 40%. What

29:47

is happening? It drops to 30. I'm like,

29:50

"Hold, please hold." It drops to 25. It

29:55

drops to 20. And it finally finishes

29:59

all the exercises in the benchmark. It

30:02

is done at 19.6%.
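
The "god run" that starts at 40% and settles at 19.6% is what a cumulative pass rate tends to look like: early on, a handful of exercises dominate the number. A minimal Python sketch of that effect follows. The function name `running_pass_rate` and the pass/fail list are made up for illustration and are not data from the video.

```python
def running_pass_rate(results):
    """Return the cumulative pass rate (percent) after each exercise."""
    passed = 0
    rates = []
    for count, ok in enumerate(results, start=1):
        passed += ok  # True counts as 1, False as 0
        rates.append(100.0 * passed / count)
    return rates


# Hypothetical outcomes for ten exercises: a lucky start, then mostly failures.
example_results = [True, True, False, False, False,
                   True, False, False, False, False]
rates = running_pass_rate(example_results)
print([round(r, 1) for r in rates])
```

The early values swing wildly because each exercise moves the average a lot; only the value after the last exercise is the benchmark score.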

30:05

I have beat chat.

30:09

It felt so goddamn good. STOP. STOP.

30:14

DON'T LISTEN to this guy. The benchmark

30:17

is invalid. I did not check for

30:19

contamination before the benchmark.

30:21

Basically, if you grab data all across

30:23

the internet, there's a high chance that

30:25

you're going to grab data that might

30:27

already exist in a benchmark somewhere.

30:29

So, you have to check for contamination.

30:31

I didn't check for contamination. I

30:33

didn't want to check for contamination,

30:35

but my stupid conscience was like, I

30:37

should probably do it.
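
The video does not say how the contamination check was done. One common approach is to look for long word sequences that appear in both the training data and the benchmark. The Python sketch below assumes that approach. The function names, the 8-word window, and the sample strings are all hypothetical.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in a piece of text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(training_samples, benchmark_items, n=8):
    """Split training samples into (clean, flagged) by n-gram overlap.

    A sample is flagged if it shares at least one n-gram with any
    benchmark item. Samples shorter than n words are never flagged.
    """
    benchmark_ngrams = set()
    for item in benchmark_items:
        benchmark_ngrams |= ngrams(item, n)

    clean, flagged = [], []
    for sample in training_samples:
        if ngrams(sample, n).isdisjoint(benchmark_ngrams):
            clean.append(sample)
        else:
            flagged.append(sample)
    return clean, flagged


# Hypothetical data for illustration only.
benchmark = ["write a function that returns the sum of two integers a and b"]
training = [
    "Task: write a function that returns the sum of two integers a and b please",
    "reverse a linked list in place without allocating extra memory",
]
clean, flagged = decontaminate(training, benchmark)
print(len(clean), len(flagged))
```

Exact n-gram matching is a deliberately simple choice: it catches copied benchmark text but misses paraphrases, so it is a floor on contamination rather than a guarantee of a clean data set.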

30:39

>> I was backing up, going through my data

30:41

for the gazillionth time, and I realized

30:44

there was some contamination. It's not a

30:46

huge deal, but it's like if I just am

30:49

honest, whatever. To me, it's not good

30:52

enough. And I want to clear out my data

30:54

and I'm going to retrain again,

30:57

benchmark again. Oh my gosh, it's

30:59

painful watching this because I know for

31:01

a fact that

31:05

PewDiePie is more ethical than the guys

31:07

running those state-of-the-art models.

31:10

Like, oh my gosh, what are we doing? Oh,

31:14

why? Why is ligma man more ethical? This

31:17

is just my whole reality is breaking

31:19

before me. It's just it's hurting. Okay,

31:21

I'm in physical pain. All right.

31:23

>> And I think it will still give me the

31:25

same result cuz it's such [music] small

31:27

contamination. I'm kind of worried cuz

31:30

I'm running out of time. [laughter]

31:32

>> Have I done [ __ ] all this entire time?

31:35

Have I achieved nothing this entire

31:37

time? And the whole thing was a

31:39

hocus-pocus. That's what I thought. So this

31:41

time I went all out. I trained on my

31:44

entire data set. Previously, I trained

31:46

on a small subset of what I thought was

31:48

my best data because it takes forever to

31:51

train. It takes forever. It takes days,

31:54

weeks. And since I reached the score of

31:56

19.6, I was like, uh, I'M DONE. BUT NOT.

32:01

THEN I made another discovery. I was

32:04

training on the wrong model. A major

32:07

update. I'm watching I'm watching my

32:09

video. this guy. I'm giving feedback to

32:12

my editor and I'm like, "Hold on, hold

32:15

on. What is that?

32:18

That is not the coder version."

32:20

>> YEAH, DUDE. OH MY GOSH. I NOTICED THAT.

32:24

THAT WAS EARLIER in the video and I was

32:25

just like, "Oh, weird. It's Qwen 2.

32:27

It's Qwen whatever." And I was like,

32:29

"Huh, okay. That's interesting that he's

32:31

using Qwen whatever. Not a big deal."

32:34

AND THEN HE HE'S TALKING ABOUT IT. OH, I

32:37

WISH I WOULD HAVE SET OUT. IT WOULD have

32:39

been so many IQ units. Oh, no.

32:43

>> Oh, I [laughter] the bag so hard.

32:46

>> Have I been training on the wrong

32:47

version this entire time?

32:52

>> Oh,

32:55

so maybe this has all been a blessing in

32:57

disguise because I feel like with these

32:59

two things in mind, I I should I

33:01

should beat Chachi. I should. But we

33:04

will know in a couple days. Did I beat

33:06

Chachi?

33:08

Well, I trained on the coding model and

33:10

the first score

33:13

4.4%.

33:17

I don't know what the [ __ ] happened. I

33:18

can probably think of five things, but I

33:20

don't care to figure it all out. It's

33:21

going to take forever. I just retrained

33:23

again and the model scored 25%, baby.

33:30

I have not just beat Chachi. I beat

33:32

Chachi twice. They're

33:36

that's really good.

33:39

That's a big improvement. Wow. Okay.

33:42

>> August shitty version as well. I was so

33:45

relieved. I was like, "Thank God."

33:47

>> You know, there is one problem though

33:48

that he's not saying, which I I I feel,

33:50

you know, I feel like especially a guy

33:52

with such uh, you know, such integrity

33:54

as him is that he's not mentioning it at

33:57

all,

33:59

which is Jippy 4o, though being worse,

34:02

will make you feel incredible. I mean,

34:04

the sycophancy of 4o was so good that

34:08

like you could ask it anything and it's

34:10

going to love you and it's going to make

34:12

you feel loved and seen and it's going

34:15

to cause an entire group of people on

34:17

Twitter that, if you say that 4o is

34:19

ridiculous and people that are addicted

34:20

to AI and are causing their entire life

34:22

to be ruined will go after you and your

34:25

family and will have absolutely insane

34:27

mental breakdowns live on the internet.

34:29

That's how good that model is. Okay? You

34:32

can't tell me that a model's you can't

34:35

you just can't you cannot convince me

34:37

that there has ever been such a great

34:38

model as 4o. No one has created a

34:41

stronger group of sycophancy than

34:43

Jippy 4o. It was incredible. I was like,

34:46

"Thank God." But I made another

34:49

discovery. Okay, it's not over. A third

34:52

of the benchmark was not even running.

34:54

C++ and JavaScript just wasn't being

34:57

tested properly. Oh, that's okay. Okay.

34:58

No, nobody likes those languages

35:00

anyways. Okay. Nobody even cares about

35:01

it. Honestly, Pew, I think you're I

35:03

think you're fine. I think we can all I

35:05

think I can speak for everybody when I

35:07

say that if you never touch JavaScript

35:09

or or C++, you're living a good life.

35:11

Okay. A good good life. I run it again

35:14

and the final score

35:17

36% baby.

35:21

This means I also beat Google Schmoogles

35:24

thing as well. And I think I beat I beat

35:27

4.1 or something. I don't know. Chachi,

35:29

it's a massacre out here. You're being

35:31

destroyed. It's embarrassing. OpenAI

35:33

stock just demolished. Just quit

35:36

already. This is what pops the AI

35:38

bubble. This [laughter]

35:41

That's actually really impressive.

35:42

Generated a bunch of synthetic data and

35:44

did that well.

35:46

That's really good.

35:49

That's like really really good. Is is is

35:52

Pewds, is Pewds about to start

35:53

accepting investments and creating the

35:55

first Japanese-based AI company with

35:58

Japanese hardware prayer tags on every

36:01

GPU?

36:03

Is that where we're going? Is this the

36:04

universe we're about to land in? I was

36:06

like, there are still issues. I'm going

36:08

to do some post training. 1,500 samples,

36:12

splinky blinky, five epochs, and I ran

36:16

the benchmark again. Pure

36:18

decontaminated, let it be known. Purer

36:21

than the fountain of youth. 39.1%

36:24

baby.

36:27

Another destroy. I think this one I beat

36:29

Google Smoogle. I Where's the stupid

36:30

benchmark? Yeah, Gemini. Gemini Pro. You

36:35

guys pay for that. I did not think this

36:37

would even ever come close. But here's

36:39

the embarrassing part about all of this.

36:42

>> That does have to hurt Google a little

36:44

bit. Okay, this guy right here very

36:47

disappointed.

36:50

He has spent so much time.

36:53

Now look at him. Now look at him.

36:56

Glasses entirely too blue.

36:59

Doesn't even know what to do with his

37:00

life anymore. He's just very very upset.

37:04

Five. But Qwen 3 is out and it scores

37:08

40%.

37:10

So unless I beat 40% this means nothing.

37:12

[laughter]

37:13

And I'm I don't I'm out of time. I don't

37:15

have [ __ ] time. I need to send this

37:18

video to my editor. Yes, there was one

37:19

more thing. A model being good at one

37:21

benchmark is stupid as [ __ ] Okay, I

37:24

need to test this model on other

37:26

benchmarks as well. I want to test it on

37:28

SWE-bench, all these other coding

37:29

benchmarks. Unless you're improving that

37:31

it see future and said what benchmark he

37:36

was going to say and pulled it out.

37:39

>> I got that future sight.

37:41

>> I don't even know. I am just out of

37:42

time. Okay, I don't have time to do it,

37:44

but I will. And if this model ever gets

37:46

to a point where I feel like it's good

37:48

enough, I would love to share it, but I

37:49

think if anything, I might just move on

37:51

to a different project in in the

37:52

background because it takes a long time.

37:55

And uh I'm kind of tired. That being

37:57

said, you've seen me fail a lot in this

37:59

video. I have become so accustomed to

38:02

failure, you have no idea. I've almost

38:05

given up on this project so many times.

38:07

There are so many times where I'm just

38:08

like, I don't know what I'm doing. This

38:09

is the stupidest thing ever. I have

38:11

graveyards full of just garbage,

38:14

debunkle, schmunkle, deformed data that

38:18

I have generated thinking this is the

38:20

best. [laughter]

38:22

I have gone through the whole alphabet

38:24

of failures. I was just so way in over

38:26

my head on this project. But I think the

38:28

number one thing I've learned, how do I

38:30

explain this? When you install Linux,

38:32

here's what happens. Linus, the creator,

38:35

becomes your godfather inevitably. And I

38:38

was watching one of a random video of

38:40

him talking and he was talking about how

38:41

he's doing this project and he was

38:43

failing. But that's okay because that's

38:46

how you learn. Uh some people think that

38:48

failure is a bad thing and I happen to

38:50

be one of those people who actually

38:53

enjoy doing

38:56

things I'm not good at

38:59

because it's how you learn.

39:02

>> Damn, Twitter could really use that

39:04

advice, dude. the mo the modern AI crowd

39:07

that's just like

39:09

it's all about taste now. You can taste

39:12

these nuts. Um so it's just like man

39:15

this is so good. I again I cannot

39:18

believe we're living in a timeline where

39:21

like the CEO of Y Combinator feels

39:25

incredibly less wise than PewDiePie.

39:29

Like what the he what is this world

39:32

we're living in? This can't even be

39:35

real, right? Am I Am I wrong here? Like,

39:39

this it's just No, no, no, no. I refuse

39:44

it. But yet, here we are. I don't know,

39:48

man.

39:50

I love the fact that Pewds is going on

39:53

this journey and doing this because

39:54

honestly, there are so many people that

39:57

just need to be encouraged to see like

39:59

failure is actually really good.

40:02

You should be measuring your growth not

40:05

in just successes, but more importantly

40:07

in how many things you've been failing

40:08

at. Like honestly, people don't realize

40:11

like just how valuable that fail fail

40:13

fail succeed path is. It's incredible.

40:16

It's just oh my god.

40:17

>> I'm like he's speaking to me right now.

40:20

Oh my god. But I really feel like that's

40:23

the main thing I've learned from all

40:24

this because there's so much to learn

40:26

from failure. Learn from it and iterate

40:29

and keep working. I think if you have

40:30

expectations of how things should go for

40:32

yourself, you're just going to get

40:33

disappointed and you're going to want to

40:35

give up. So, expect to fail, embrace

40:38

failing. That's the message I want to

40:41

send out to you kids. Last thing, this

40:44

is a coding model and I think a lot of

40:45

people are looking at coding models like

40:48

are they going to replace and I saw

40:49

Linus say this as well and he was

40:51

basically saying coding models if

40:53

anything it will just bring in more

40:54

people interested in learning how to

40:55

code again. That was on Linus Tech

40:58

Tips. Very, very good message. Absolutely

41:00

loved it. He said that the, you know,

41:02

the genie's out of the bo the bottle.

41:04

Pandora's opened up the box. They're

41:06

here to stay. But does he think No. He

41:09

just thinks it's going to end up

41:10

[clears throat] aligning closer to a

41:12

tool.

41:14

Good take. Good take, Linus. Good take,

41:16

Linus. Like, I would never have learned

41:18

wanting to learn how to code if it

41:20

wasn't for AI coming into the picture.

41:22

So, on the final note, learn something

41:25

new. Check out boot.dev. I highly

41:28

recommend it. It's a great course. Pick

41:30

out whatever you want. Challenge

41:31

yourself into something difficult that

41:33

may be above what you think you're

41:35

capable of. That's it.

41:37

>> Now, scram.

41:39

>> I'm just kidding, bro.

41:42

This month, we're traveling a lot. And

41:44

what always happened whenever we travel?

41:46

The inevitable connecting to public

41:49

Wi-Fi. Free Wi-Fi is a trap. Get the

41:52

reference. If you connect to someone

41:54

else's Wi-Fi, you might as well just

41:55

give up your credit card and banking

41:57

[music] information and all your

41:58

information they ever have. You should

42:00

always use a VPN. NordVPN. Say it with

42:05

me. Nordmeos

42:08

board connect always. I'm even connected

42:11

right now.

42:11

>> Now that we're going that direction,

42:12

>> I made a little module for me.

42:14

>> I wasn't ready.

42:16

>> Cuz whatever I do online, if I want to

42:18

download is my goddamn business. It's my

42:22

[clears throat]

42:23

It's my goddamn right. So, thank you

42:26

NordVPN for making it possible to free

42:28

yourself, protect yourself online. If

42:31

you go to nordvpn.com/piepie, you get a

42:33

huge deal on a 2-year plan, plus bonus

42:35

extra nordvpn.com

42:39

online privacy. Seriously, this is the

42:42

best deal for NordVPN you're going to

42:43

find. So, take advantage. Thank you,

42:45

Nord, for sponsoring this video. That's

42:47

nordvpn.com/pie.

42:49

>> Very lovely. Hey, that was really good.

42:52

again. I can't believe how good, dude.

42:54

He's just so good. It's just so good.

42:56

Absolutely good. The name

43:00

is ThePrimeagen.

Interactive Summary

The video details the arduous process of training an AI model, focusing on the challenges and unexpected hurdles encountered. The creator initially aimed to outperform ChatGPT-4 with their own model, but faced numerous setbacks including nearly burning down their house. Key challenges involved data acquisition, cleaning, and formatting, with significant time spent on iterating and refining the training data. The creator also highlighted the surprising performance of Chinese AI research and the ethical considerations of data sourcing. Despite numerous failures and a steep learning curve, the project ultimately led to valuable lessons about embracing failure as a learning opportunity and the importance of persistence. The video also touches upon the role of AI in coding education and the evolving landscape of AI development, with sponsored segments promoting learning platforms and VPN services.