Is Google allowed to be mad at this?
All right, we have a very serious
problem here in America. This is
something that I need to make a video on
because not enough people are talking
about it. It is a violation, a deep
violation of ethics. And it just makes
me sick. Honestly, the behavior I'm
seeing, not only is it not ethical, it
could be downright felonious. So, you're
probably asking yourself, "Well, what
the heck are we talking about? What is
this felonious behavior?" Well,
you wouldn't believe what Google just
recently discovered. Google DeepMind
and the GTIG, which of course stands for
Google Threat Intelligence Group, have
identified an increase in model extraction
attempts, or distillation attacks, a
method of intellectual property theft
that violates Google's terms of service.
Yes, you heard it here first. The
company that has stolen the world's
data to sell back to us in the form of
generative AI is upset because
someone is stealing from the generative AI.
OH MY GOSH. You cannot make this up,
dude. The fact that Google is even
attempting to cry intellectual property
theft. This has to be one of the
richest ironies I have ever seen in my
entire lifetime. And it's not just Google.
OpenAI is also blowing the whistle.
Okay? OpenAI believes the
best future is one in which we move
forward with democratic AI. By the way,
by democratic AI, they certainly don't
mean open sourcing more of their models.
Okay? [laughter]
Wow. That would be way too democratic.
AI that is shaped by the principles
America has always stood for. In
advancing democratic AI, America is
competing with the Chinese Communist
Party determined to become the global
leader in AI by 2030. That's one reason
why the release of Deepseek R1's model
at the Lunar New Year one year ago was
so noteworthy as a gauge of the state of
competition. It goes on later to say
that OpenAI is providing an assessment
of DeepSeek's distillation techniques,
effectively just going through how
they're taking all the data
from, say, OpenAI or Google or
Anthropic and using it to train their
model. And you already know how Dario
feels about those Chinese
open-source models. Okay, open weights?
Not in this household. All right, for
those who have no idea what
distillation and all those things are,
I can give a very high-level,
30,000-foot overview of what actually goes
on. So with these LLMs, effectively you
go through roughly two major rounds of
training. Now, this is obviously much
more complex in actual practice, but for
the sake of this video, you can
think of it as two rounds. First, there's
the pre-training section, where you feed
in a list of words, say from a blog,
have the model predict the
next word that's going to come out of
the blog, and then correct all of the
billions upon billions of parameters
according to the eight-line algorithm
known as backpropagation. When I was
learning about this back in college,
back when I was getting my master's in
AI, there were a lot of partial
differentials. But what error
backpropagation really means is simply
this: the proportion in which you
incorrectly contributed to the answer is
the proportion in which you should
change, along with some multiplicative
scaling-down called the learning
rate. Anyway, so this kind of gives the
models their basic shape. And then
there's the second round, called
instruction tuning, in which you have a
question and then you have an answer.
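To make that concrete, here's a minimal toy sketch of what one instruction-tuning example can look like. This is my own illustration, not any lab's actual pipeline: real systems use proper tokenizers and chat templates, but the core idea is that the question and answer get joined into one token stream, and the loss is typically computed only on the answer tokens.

```python
# Toy sketch of an instruction-tuning example (illustrative only).
# A question/answer pair is joined into one stream, and a mask marks
# which tokens the model is actually trained to predict.
pair = {
    "question": "How do I reverse a list in Python?",
    "answer": "Use my_list[::-1] or my_list.reverse().",
}

def build_training_example(pair):
    q_tokens = pair["question"].split()  # crude whitespace "tokenizer"
    a_tokens = pair["answer"].split()
    tokens = q_tokens + a_tokens
    # 0 = prompt token (no loss), 1 = answer token (compute loss here)
    loss_mask = [0] * len(q_tokens) + [1] * len(a_tokens)
    return tokens, loss_mask

tokens, mask = build_training_example(pair)
print(sum(mask), "of", len(tokens), "tokens contribute to the loss")
```

The point of the mask is that the model learns to *produce* answers given questions, rather than to predict the questions themselves.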
This is kind of how AIs go from
simple next-word prediction
into this broader mode of, "Hey,
how does this work?" "Well, this is
exactly how it works, and this is blah
blah blah blah." And that happens through
this phase right here. Now, how DeepSeek
works is that: "Hey, ChatGPT, how do I
code a Python server?" That's the
question. Guess what the answer is? It's
just doing the instruction tuning by
using ChatGPT. From this lovely video,
which I'll link in the description, they
typically call this behavioral cloning.
And that is because you're getting none
of the nuance of the pre-training, none
of that learning where an answer is,
like, a halfway match. You just simply go,
"Okay, this is how you should answer, always."
You're just trying to mimic the behavior
of another model. And this is where
Google and OpenAI and all of them are
claiming intellectual property theft.
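For those who want it concrete, the cloning loop the labs are complaining about is, in spirit, as simple as this toy sketch. This is entirely my own illustration: `teacher_model` is a stub standing in for where a real pipeline would call another company's hosted model API, which is exactly the part the terms-of-service dispute is about.

```python
# Toy sketch of distillation-by-behavioral-cloning (illustrative only).
# teacher_model is a stub; a real pipeline would call a hosted model's
# API here and collect its answers at scale.
def teacher_model(prompt: str) -> str:
    canned = {
        "How do I code a Python server?":
            "Use the http.server module from the standard library.",
    }
    return canned.get(prompt, "I'm not sure.")

def collect_cloning_data(prompts):
    """Ask the teacher each prompt and keep the (question, answer)
    pairs as instruction-tuning data for a student model."""
    return [{"question": p, "answer": teacher_model(p)} for p in prompts]

dataset = collect_cloning_data(["How do I code a Python server?"])
print(dataset[0]["answer"])
```

The student never sees the teacher's weights or pre-training data; it only imitates the teacher's outputs, which is why the video calls it behavioral cloning.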
China is stealing their instruction
tuning. I cannot believe we live in a
world where Google, one of the leaders
in stealing the world's data for
pre-training, actually has the cojones
to release an
executive summary claiming that people
are stealing their intellectual property
and breaking terms of service. Bro, how
many terms of service have you broken
by gathering pre-training data from
sites? You know, you have literally
broken millions, if not tens of millions,
of terms of service throughout the
internets. So to actually, like, write it
down, like, "Dude, you know what's super
unfair and unethical? When people steal
from us, dude." It's like the most
classic definition of unethical. It's
always this way. It's like, yeah, yeah,
ethics don't really count until it's
unethical for me. Then it counts. Then
it's for real. Then I'm like upset about
it. How dare you come in here and steal
intellectual property? I mean, I know
like 5 minutes ago I said intellectual
property is fake and doesn't exist, but
no, it's real. Now, this just shows me
that there's really one big takeaway,
which is always the exact same takeaway.
It's not about the democratic AI and
everybody gets AI. It's about the
democratic AI that only I can provide
for you. Okay? OpenAI, Google, they
want to be at the tippity top. They want
to be the ones providing the AI, shaping
its input, being able to say what is and
what is not allowed to be said. I'm not
really convinced it's for anything else
but the bottom line. And so when Google
cries intellectual property theft, not
going to lie, I don't shed a single
tear. The name
is the Primeagen.
Hey, do you want to learn how to code?
Do you want to become a better back-end
engineer? Well, you got to check out
boot.dev. Now, I personally have made a
couple of courses with them. I have free
live walkthroughs of the whole course
available on YouTube. Everything on
boot.dev you can go through for free.
But if you want the gamified experience,
the tracking of your learning and all
that, then you got to pay up the money.
But hey, go check them out. It's
awesome. Many content creators you know
and like make courses there.
boot.dev/prime for 25% off.