
Is Google allowed to be mad at this?


Transcript


0:00

All right, we have a very serious problem here in America. This is something that I need to make a video on, because not enough people are talking about it. It is a violation, a deep violation of ethics, and it just makes me sick. Honestly, the behavior I'm seeing, not only is it not ethical, it could be downright felonious. So you're probably asking yourself, "Well, what the heck are we talking about? What is this felonious behavior?"

0:21

Well, you wouldn't believe what Google just recently discovered. Google DeepMind and the GTIG, which of course stands for Google Threat Intelligence Group, have identified an increase in model extraction attempts, or distillation attacks: a method of intellectual property theft that violates Google's terms of service. Yes, you heard it here first. The company that has stolen the world's data to sell it back to us in the form of generative AI is upset because someone is stealing from the generative AI. OH MY GOSH. You cannot make this up, dude. The fact that Google is even attempting to cry intellectual property theft. This has to be one of the richest ironies I have ever seen in my entire lifetime.

1:18

It's not just Google. OpenAI is also sounding the alarm. Okay? OpenAI believes the best future is one in which we move forward with democratic AI. By the way, by democratic AI they certainly don't mean open-sourcing more of their models. Okay? [laughter] Wow. That would be way too democratic. AI that is shaped by the principles America has always stood for. In advancing democratic AI, America is competing with the Chinese Communist Party, which is determined to become the global leader in AI by 2030. That's one reason why the release of DeepSeek's R1 model at the Lunar New Year one year ago was so noteworthy as a gauge of the state of competition.

2:01

It goes on later to say that OpenAI is providing an assessment of DeepSeek's distillation techniques, effectively just walking through how they go about taking all the data from, say, OpenAI or Google or Anthropic and using it to train their model. And you already know how Dario feels about those Chinese open-source models. Okay, open weights? Not in this household.

2:25

All right, for those that have no idea what distillation and all those things even are, I can give a very high-level, 30,000-foot overview of what actually goes on. With these LLMs, you effectively go through two major rounds of training. Now, this is obviously much more complex in actual practice, but for the sake of this video you can think of it as two rounds. First there's the pre-training section, where you feed in a list of words that you, say, found on a blog, have the model predict the next word that's going to come out of the blog, and then correct all of the billions upon billions of parameters according to the eight-line algorithm known as backpropagation.
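
To make that pre-training round concrete, here is a minimal sketch of a single next-word-prediction step in PyTorch. Everything in it is illustrative: a toy model, random token IDs standing in for a scraped blog post, and plain SGD rather than anything Google or DeepSeek actually uses.

```python
# Minimal sketch of one next-token pre-training step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim = 1000, 64

# A deliberately tiny "language model": embed each token, project back to the vocabulary.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Pretend these token IDs came from a blog post found on the open web.
tokens = torch.randint(0, vocab_size, (1, 33))
inputs, targets = tokens[:, :-1], tokens[:, 1:]     # target = the next word

logits = model(inputs)                              # shape: (1, 32, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()         # backpropagation: assign blame to every parameter
optimizer.step()        # nudge the parameters (billions of them in a real LLM)
optimizer.zero_grad()
```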

3:06

When I was learning about this back in college, back when I was getting my master's in AI, there were a lot of partial derivatives involved. But what error backpropagation really means is simply this: the proportion by which you incorrectly contributed to the answer is the proportion by which you should change, along with some multiplicative scaling down called the learning rate.
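
Written as an equation, that "proportional blame, scaled down by the learning rate" description is just the standard gradient-descent update, where the partial derivative is the blame assigned to each weight:

```latex
% Each weight moves against its share of the blame for the loss L,
% scaled down by the learning rate \eta.
w_i \leftarrow w_i - \eta \, \frac{\partial L}{\partial w_i}
```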

3:27

Anyway, that first round gives the models their basic shape, and then there's the second round, called instruction tuning, in which you have a question and then you have an answer. This is how AIs go from simple next-word prediction to the broader "Hey, how does this work?" "Well, this is exactly how it works, and this is blah blah blah blah." That happens through this phase right here.
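
Concretely, an instruction-tuning example is little more than a question paired with the answer the model should learn to produce. The record below is a made-up illustration of that shape, not data from any real training set:

```python
# A hypothetical instruction-tuning record: train the model to emit the
# "answer" text whenever it is shown the "question" text.
example = {
    "question": "Hey, how does this work?",
    "answer": "Well, this is exactly how it works: the model is fine-tuned on "
              "question/answer pairs so it responds instead of just continuing text.",
}

# During fine-tuning the pair is flattened into a single token sequence,
# and the loss is applied to the answer portion.
prompt = f"User: {example['question']}\nAssistant: {example['answer']}"
```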

3:46

Now, how DeepSeek works is: "Hey, ChatGPT, how do I code a Python server?" That's the question. Guess what the answer is? It's just doing the instruction tuning by using ChatGPT. From this lovely video, which I'll link in the description: they typically call this behavioral cloning. And that's because you get none of the nuance of the pre-training, none of that learning of what counts as a halfway match. You simply go, "Okay, this is how you should answer, always." You're just trying to mimic the behavior of another model.
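
In other words, the distillation pipeline being complained about is conceptually as simple as the sketch below: send your questions to a stronger "teacher" model, save its answers, and use the resulting pairs as your own instruction-tuning data. This is a hypothetical illustration using the OpenAI Python client and a placeholder model name, not DeepSeek's actual code.

```python
# Hypothetical "behavioral cloning" data collection (illustrative only):
# ask a teacher model questions and keep its answers as instruction-tuning pairs.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

questions = [
    "How do I code a Python server?",
    "Explain backpropagation in one paragraph.",
]

with open("distilled_pairs.jsonl", "w") as f:
    for q in questions:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder teacher model
            messages=[{"role": "user", "content": q}],
        )
        answer = resp.choices[0].message.content
        # Each line becomes one instruction-tuning example for the student model.
        f.write(json.dumps({"question": q, "answer": answer}) + "\n")
```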

4:15

And this is where Google and OpenAI and all of them are claiming intellectual property theft: China is stealing their instruction tuning. I cannot believe we live in a world where Google, one of the leaders in stealing the world's data for pre-training, actually has the... the cojones to release an executive summary claiming that people are stealing their intellectual property and breaking terms of service. Bro, how many terms of service have you broken by gathering pre-training data from sites? You have literally broken millions, if not tens of millions, of terms of service across the internet.

5:00

So to actually write it down, like, "Dude, you know what's super unfair and unethical? When people steal from us." Dude, it's the most classic definition of unethical. It's always this way. It's like, yeah, yeah, ethics don't really count until it's unethical for me. Then it counts. Then it's for real. Then I'm upset about it. How dare you come in here and steal intellectual property? I mean, I know like five minutes ago I said intellectual property is fake and doesn't exist, but no, now it's real.

5:27

Now, this just shows me that there's really one big takeaway, which is always the exact same takeaway. It's not about democratic AI where everybody gets AI. It's about the democratic AI that only I can provide for you. Okay? OpenAI, Google, they want to be at the tippity top. They want to be the ones providing the AI, shaping its input, being able to say what is and what is not allowed to be said. I'm not really convinced it's about anything else but the bottom line. And so when Google cries intellectual property theft, not going to lie, I don't shed a single tear.

The name is ThePrimeagen.

6:09

Hey, do you want to learn how to code? Do you want to become a better back-end engineer? Well, you've got to check out boot.dev. Now, I personally have made a couple of courses for them, and I have free live walkthroughs of the whole course available on YouTube. Everything on boot.dev you can go through for free, but if you want the gamified experience, the tracking of your learning, and all that, then you've got to pay up the money. But hey, go check them out. It's awesome. Many content creators you know and like make courses there. boot.dev/prime for 25% off.

Summary

The speaker addresses what they call a serious ethical problem: Google DeepMind and OpenAI are complaining about intellectual property theft through "model extraction" or "distillation attacks" by entities like DeepSeek, often attributed to Chinese efforts. The speaker finds this highly ironic and hypocritical, arguing that Google and OpenAI themselves have extensively scraped data for pre-training, often violating terms of service. They explain that distillation works by "behaviorally cloning" the instruction tuning of existing models. Ultimately, the speaker concludes that these companies' concerns are less about promoting true "democratic AI" or ethics, and more about maintaining their market dominance, control over AI narratives, and profitability.
