The Mythos Situation | TheStandup
So, George Hotz, we've invited Low
Level on to help us kind of work through
this because honestly, I would just like
to say George Hotz sounds like an anime
villain in this post and it's very
exciting and it makes me just want to
high-five him so bad.
Uh, anyways, sorry. Says the following.
What if I release a zero day a day until
a big new model is released? Will this
finally make OpenAI and Anthropic shut
up about cyber security risk? Mark like
these things are not that hard to find
in most software. I heard something
about costing 20K in tokens. I'd do it
for less if it wasn't for some whiny
bug bounty program. The reason there
aren't zero days everywhere is cuz
nobody seriously looks because hacking
other people's with them is illegal
and criminals are usually not very
skilled or they would choose a different
line of work. Want more zero days to be
found? Make hacking legal. Until then,
don't try to claim it's hard. It's just
not incentivized.
>> I want to say first off, I don't think
criminals are dumb or unskilled. Please
don't hack me. I just want to get that
out of the way. You guys are smart and
handsome and you're my favorite people.
I just want to make sure that that's
clear, please.
>> Anyways, Ed, proceed. I do want to say
one thing too that has nothing to do
with the the actual content of this
which which Ed will take over and that's
just like if I were George Hotz I would
never have been able to like resist
naming my uh X feed Hotz takes
>> because it's so like you know what I
mean like I good on him for not going
there because I would absolutely I'm
like like I would have prefaced that
tweet before I typed it with here's
another Hotz takes for you. Hotz take
for you. Right. It would be so good.
Anyway,
>> so good.
>> Take it away.
>> Wait, hold on. Hold on. There's one
there's one more thing before we get
started. There's just one more small
thing I want to say. Let me just uh take
this quick thing and I'm going to put it
up here
>> and then it's time for the big reveal.
>> Low Level responds with, "Holy this is
the dumbest take I have ever read." I
just wanted to make sure just in case
anyone was wondering.
>> Yeah. Yeah. I mean, I do kind of feel
that way. Um, so let me just preface
this. First of all, it was called the
Cold War because the Cold War was cold.
Oh, because Russia is cold. Um, it's a
it's a George Hotz reference if you're
if you're an OG.
>> That makes sense, though.
>> Did you find the errors?
>> I don't even know what they look. What
do they even look like?
>> They're in the phone.
>> In the phone?
>> Yeah, they're definitely in there. I
just don't know how we labeled them.
>> I got it. Don't worry.
>> You got to figure it out. We're running
out of time. Prime, you got to find them
and meet me at the standup.
>> Roger.
They're hit the fo.
It's so simple.
Get all the context you need to debug
your problem because code breaks. So fix
it faster with Sentry.
>> First of all, I have no problem with
Geo. This isn't like some weird drama
fun thing. I want to kind of set the
table straight with that. Um, but yeah,
I I think the the argument that Geo is
trying to make here is that the only
reason more zero days are not found is
because there's no incentive. Um, okay.
Well, I I don't agree with that. First
of all, there are plenty of bug bounty
programs out there that will literally
pay you to find vulnerabilities. Uh, and
some of them pay very well. Like for
example, the the Apple iPhone Zero-Click
RCE bug bounty will pay you literally $2
to $3 million if you can find a zero-click
RCE in the iPhone and then even something
lower like a Microsoft like I think
MSRC's payout for like Windows RCE is
like 250K to 500K right now for like a
zero click on Windows. So there is money
to be made in the in the AI or in the in
the vulnerability research space, right?
And I think all Geo is trying to say
here is something something something.
Uh the Mythos press release was bad,
right? It's a it's a marketing campaign.
Whatever you want to say about it. Um
and so I I understand why people are are
making that argument, right? Like you
know it's very I think bad PR for
company that sells exquisite tool to
hold on to exquisite tool and then not
give access to it and say only special
people can have our tool because it
makes you look like an Um, but
I think regardless of your thoughts on
the marketing of that, it is important
to recognize the fact that if you go uh
Prime, can you go to cybergym.com real
quick and go to the graph? It's on the
homepage there.
>> I'm gone.
>> While he's doing that, the the ability
of AI models to both in closed
source and open-source software find
vulnerabilities by literally just giving
it access to the code and saying, "Hey,
find me bugs in this code. Go." is
becoming better and better and better to
the point where like Mythos I'm very
close to some people that are like
actively using Mythos at work and it is
causing like like issues based on how
good that is, right? Yeah. So so
CyberGym basically is is a is a
collection of bugs that exist in
software, right? So like bugs in, I
think FFmpeg is one, bugs in curl is
another. Um and so what CyberGym does is
it takes a model and with a set of
prompts says hey go and find bugs in
this stuff, right? And the the success
rate is how many of the bugs that are
known to exist get found by the model in
this. And you can see a pretty, you
know, not exponential, but straight line
curve going up to the Anthropic model
that recently got previewed by some
people that it's at an 83% success rate
of the bugs that are known to exist in
these code bases. It can find 83% of
them. Again, we we don't know the cost
um data in those. We don't know if like
the models are being like uh backfed the
information, so they're like training
themselves on previous CyberGym runs.
We don't know any of that. Um, but it it
there is this really weird issue
happening where like any Joe Schmo with
not a ton of security research work or
not a ton of security knowledge can with
a couple hundred bucks like worst case
find bugs in software. And I think that
is like an existential security threat
to software right now as we know it. So
I'm kind of curious on your guys' take
on that. What do you guys think about
the the Mythos situation? Because I know
I know how I feel. I'm not sure if I
actually asked Prime what he thinks
about that the Mythos thing.
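Editor's sketch: the success rate described above (how many of the bugs known to exist get found by the model) is a simple ratio. The snippet below is a minimal illustration of that ratio only. The bug identifiers and the `success_rate` helper are hypothetical and are not CyberGym's actual data format or harness.

```python
def success_rate(known_bugs, found_bugs):
    """Fraction of known bugs that the model rediscovered."""
    known = set(known_bugs)
    if not known:
        raise ValueError("need at least one known bug")
    # Only findings that match a known bug count; extra findings are ignored.
    return len(known & set(found_bugs)) / len(known)


# Hypothetical bug identifiers, made up for illustration only.
known = ["ffmpeg-001", "ffmpeg-002", "curl-001", "curl-002", "curl-003", "curl-004"]
found = ["ffmpeg-001", "curl-001", "curl-002", "curl-003", "curl-004", "not-a-known-bug"]

rate = success_rate(known, found)
print(f"{rate:.0%} of known bugs found")
```

Note that this ratio says nothing about cost per run or about findings outside the known set, which are the caveats raised in the discussion.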
>> Oh, I have ideas and I have thoughts
about it. Oh, yeah. Uh
oh. So, I guess the first thing is that
it there's two there's kind of like
three there's three problems here. First
problem is is Mythos really as good as
they say and obviously I have no
internal information. I've just seen
some graphs. Uh dirty data is like a
huge gigantic problem in all benchmarks.
All benchmarks are being fed back into
the models. It's really actually hard to
tell like what does a 20% improvement on
software engineering bench actually
mean? Especially when the fact that not
you could write zero lines of actual
solution code and get 100% on software
engineering bench. It turns out there's
other benches that are also horribly
inaccurate. There's a whole paper about
why all the major benches are just
completely fudgible and made up of bull.
So it's very hard for me to understand
from a bench perspective. Uh second,
>> I guess the middle ground would be like
so if if if Claude Mythos is as good as
it is, then yes, that is going to
inevitably cause problems because we're
going to go from not too capable to
hyper capable in a moment. Thus,
everybody can go through and hack
everything and thus Dario will be able
to get his ultimate goal, which is
regulations. And so, that kind of
worries me. Pull up the ladder really
quickly and make sure that humans can't
code because human coding, that's
dangerous right there. Uh, and so
that's, you know, so I think that that's
true. There's the second one which is
this is just another C compiler again
from uh Anthropic where they hype up
this gigantic thing like oh my gosh it's
written a C compiler and then you go
look at the details it's like well it
can't write a bootloader cuz we didn't
we could not seem to spend enough tokens
to convince it to write it within 32k it
could only write it within like 67k or
whatever it was to be able to actually
>> and also we tested it recursive or we we
iteratively tested it off of like the 30
years of tests that the GNU C compiler
already had.
>> We also gave it all the answers and then
it figured out all the questions. It was
crazy. It was like it played Jeopardy
and it was really good at it. And so
it's like there's this whole marketing
buzz which is it's really hard to kind
of cut through that. And then obviously
the last one which is they're just
downright lying. I somehow doubt that
they're they're downright lying. I think
they're just overstating it. If they're
downright lying then you know this is
just going to be business as usual.
It'll just be yet another disappointing
model release and that's that. And so
for me that's kind of how I I I'm on
middle ground which is I think it's more
hype than reality but of course I
haven't seen it because I just don't
know cuz they won't let me see it. I'm
too dangerous to have it.
>> I think there was a a similar model that
um ChatGPT or OpenAI just just released like
it's like ChatGPT 54c or something.
They keep their model naming
convention.
>> They're starting to actually align
though. At least I know like the higher
the number we're good and good.
>> Yeah. Right. Right. And they don't add
like random O to it now. Um but I think
there is a comparable model that you can
get access to like just by uploading
your driver's license if you're into
that. Um you know proving that you're a
real person. So there's there's models
to test out, but yeah, I don't know.
It's just it is it is concerning because
we we have kind of two forks we can go
down. There's a one where everyone gets
access to it. Everyone can create zero
days and we kind of enter this like
really dangerous cyber no man's land. But
the other side is like Anthropic keeps
the access to themselves forever and now
like only this list of like 10 companies
can make zero like can find zero days in
the software.
>> Dude, you forgot the third.
>> What does that do? They move to the
Cayman Islands and then they just take
over every government by hacking all the
software and Daario finally realizes his
role as the bad guy. Like that would be
that I mean super villain is right there
if this is true.
>> That's true. Casey, what's your take?
You saw you were in a chat before.
>> Uh I'm sorry the chat. What was the chat
>> that you were going to say something
before? What's What's your take?
>> Was I really?
>> Mhm.
>> Well uh I definitely could say something
but I think the thing I would say is
probably not very interesting. Uh, and
that is that I think I probably agree
with both George and Ed at the same time
here, which should be impossible because
they're supposed to be disagreeing, but
I don't know. It kind of sounds similar
to me. And the reason I
>> secret third thing,
>> it's not really a secret third thing.
It's just like, let me let me offer a
different interpretation or slightly
different interpretation, which is to
say,
>> um, so I feel like machines are pretty
good at pattern matching actually. Um,
and so like I don't think... it's like put
aside whether Claude Mythos is good or
not because I realize that's hard to
independently verify this time. But like
I think it's reasonable to expect that
at some point because we are spending at
this point like trillions of dollars
probably on doing computation for these
things. At some point they should be
able to pattern match bugs uh reasonably
well and at a very high rate. meaning as
long as you're willing to pay for the
compute time, we can scan lots of
software uh for a lot longer than we
were currently having humans do it,
right? I think that's a pretty
reasonable thing to expect. Whether
Claude Mythos has done it or not
shouldn't really be the question because
somebody can do this eventually if we
keep spending this much money. It should
get there. Uh among the things that AI
could eventually do, that one doesn't
sound that implausible to me. And so,
um, what I would say is I think it's
reasonable to expect that that either
has or will occur.
Two, I do think humans were doing this
very well before individual humans like
some of them, they were finding things
that probably Claude Mythos still could
never find. Like, I mean, like things
like Rowhammer attacks and things like
that, uh, that are just like way out in
kind of crazy land. Um, or attacks
through like old legacy stuff like the
APIC and things like that. Like so
humans were actually very good at this
task but there weren't very many of them
right and so what I would say is moving
to something like Claude Mythos or
whatever that thing happens to be that
can do this is kind of like what George
Hotz was saying it's kind of like saying
hey everybody from now on if you just
like hack people's bank accounts you get
the money all the great humans at this
in the world who are currently doing
something else would now be incentivized
to go do this thing and we would have
found way more zero days. I mean, there
are so many programmers who if they had
been raised in some kind of a way in a
society and a religion where stealing
people's money was considered virtuous,
we would have found so many more zero
days right now than we have. And so, I
think I'm kind of in a way I think I see
I think both people's points are
actually totally valid. Like like I
think like yeah, we could have found way
more zero days if we didn't heavily
disincentivize
people from like making hundreds or
billions of dollars off of hacking,
which is what they could have. And we
said, nah, you get 50k, 100k. Maybe if
it's something crazy like an RCE, you
can actually get a million. It's like,
come on, guys. That's not equivalent to
what they could already make working at
a startup or something like that if
they're that good, right? Or
>> Yeah. There's no guarantee that side
either. like they don't actually get the
gas like you work at a startup at least
you get some money
>> or even just not even a startup just go
to Google and you get that a stock or
whatever right or something like this uh
so anyway in general I would say um I
see I I can see both I can see both
points I don't think I I don't really
think they're in as much tension as it
would sound if that makes sense
>> I agree
>> yeah I thought Geohot was saying more like
he was making an econ argument about it
of like we're we put a lot of costs on
hacking already. So
that's what's stopping it from
happening. Like what you're saying,
Casey, right? In the sense that like
>> Yeah.
>> Okay. So now we're going to have another
way to do it. It also costs money, but
then we still have the other cost of
like you could go to jail for doing it.
Like that's the social cost we impose on
people doing it, right? I mean, I just I
just took him to be saying like, "It's
not that impressive that it found zero
days because if you gave me, you know,
if you gave me 50 great programmers who
are all doing other stuff, we could
crank out so many zero days, you
wouldn't even believe it." And I kind of
and I kind of believe him because, you
know, you look around the world and
there are, you know, some really good
security teams out there and they do
crank out zero days pretty effing fast
and they don't even tell us about all of
them, right?
>> Uh, North Korea keeps on making money
like obviously they're they're
successful.
>> Yeah. So anyway, I I I I'm not trying to
say that either person is is 100% right
and somehow you can marry the two
completely. I'm just saying there's I
think there's some merit to both things.
So I'm I'm actually I'm happy either
way. I'm happy with either take.
>> So your your point about um if you got a
room of 50 good programmers together and
they'd find zero days is actually kind
of the the argument that the article um
"Vulnerability Research Is Cooked" makes
on sockpuppet.org that I referenced in
a video and I think Theo did too. Um,
where one paragraph that he calls out
basically
>> the O, sorry, that the O referenced in a
video. Um,
>> okay. I don't know what that is.
>> Spell it. Casey, spell it out in your
head and it'll make sense.
>> The O Christ.
>> Um,
>> so software security a lot of the times
can be chalked up to the fact that a lot
of software just has not had elite
attention or what is it called? Um, like
advanced attention. I would say basic
attention is suffering from many
software projects
>> now with black for sure but more more
complex platforms right so his assertion
is that like software security has been
a talent problem for so long where it's
like it's not that there aren't people
that know how to find bugs AI isn't
solving a unique problem the AI is
solving the scalability problem where
it's like you can train the AI to do a
thing that Joe knows how to do and now
you have a hundred mediocre but 100
Joes, right? Um, and and that's that's an
issue for kind of the econ of of cyber
security for sure. And yeah, I want to
be very clear like I don't disagree with
George from the or Geo from the
perspective of like more people equals
more bugs, right? But like obviously
like that that is the problem that we
just don't have more smart people. that
that has been the the entire industry's
plight for a long time is that like
there just aren't people who have not
only security knowledge but knowledge of
you know uh web server stacks and
hypervisors and drivers and OSes like you
get these very niche skill sets and when
you divide them up into those skill sets
over and over again you you you're left
with like 10 or 20 people on planet
Earth that know how to like attack a
certain technology so AI you know if you
know security now you can talk to the AI,
learn about hypervisors in a week and
then suddenly you can find bugs in ESXi,
you know, Hyper-V, etc. Um, so yeah, I
guess I agree. Like the the dumbest take
thing was more I was I was mad at
Geohot's ego because it basically came off
as like you. I'm so smart. I know
all the zero days. I could do this
myself in my sleep. And it's like, dude,
no you couldn't. Like you're telling me
you could drop a zero day every day in
macOS until someone paid you? Like no
you couldn't. Shut up. Um, but I I hear
what he's saying.
>> I really hope he takes this as a
challenge. I want Geohot a zero day.
>> Geohot in one week. I will eat a sock on
stream. Like straight up, I will do it.
I don't care.
>> You shouldn't say that. Geohot, you heard
it here. Ed will eat a sock on stream
>> if you do a week of zero days.
>> Okay. All right. A week is a week is
actually possible. I'm talking a month.
Uh
>> okay. A month.
>> One month. And so yeah, that's my
>> I would also add like just, you know,
because I'm I constantly harp on this
point, but I want to bring it up pretty
much every time is just that
>> this is also why AI company behavior
like is a problem because this is
generally a good thing. meaning like we
do actually want the ability for us to
get 100% coverage for security and we
know that we can't get enough people to
do it really right like not in a white
hat sense right
>> maybe maybe you could take uh George Hotz's
suggestion seriously and just go like
make hacking legal and then we just have
a crap ton more black hats and that
eventually sorts it out but I mean
wouldn't necessarily be
>> yeah that wouldn't be yeah that's
exactly they're white hats now
everyone's a white hat Now, um, so we do
I think in general this is solving a a
good, you know, this is this is a way AI
could solve a problem usefully. If it
actually can just spit out lists of
pretty well-curated potential bug places
that we can go look, that's very
helpful, right? And so the problem is
like the only reason they were able to
make that is lots and lots of extremely
talented security researchers who are
getting literally zero dollars from
Anthropic for this. And that is not
acceptable. It's just not like I'm
sorry, but like you know, Ed should be
getting a check for this or and everyone
like him. That's just kind of how it is
because it's like you used their it's
all of their expertise and all you're
really doing is very slowly and
cumbersomely and kind of clumsily
eventually building a machine that can
deploy the same analysis somewhat
reliably uh based on all of their work.
And like I just don't like it. I don't
like the fact that they're not getting a
check and I'm never going to like it.
You could you can talk to me all day
long about how someday we're going to
live in a post scarcity society and Ed
will be getting a UBI check or something
like this or whatever it is, right? And
hopefully I'll be getting one too,
although I didn't do any security
research so I don't know, maybe I won't
be getting that check. I don't know.
I don't know how universal the U in universal
basic income is. But like I don't like
this. they should be getting paid now
because Claude is, you know, getting
huge like everyone at everyone in
Anthropic is getting paid very well. Uh,
so it's not like there isn't money being
disbursed whether they're making or
losing money or anything else you want
to talk about. It's like money is being
disbursed to people. It's just not the
people who did most of the work.
>> Also, you got to throw
>> Casey would
>> you can go. Oh,
>> I was just going to ask Casey if he was
going to be happy about it though if
Anthropic spun out a consumer rack
business though. Yeah. Now we're talking
if if they were like AI racks like we
got racks we got racks for your AI
server.
>> Hot AI racks in your local area.
>> I liked it now.
>> Yeah, exactly. We will send send you
some hot racks. Uh, also by the way, not
only are they taking all, you know, your
whole argument with them taking and not
properly attributing or, you know, the
people who put all the work benefiting
from it, uh, they're also making it so
that I can't buy a GPU or RAM or CPUs
now or anything. I have that
>> you can't buy a GPU or RAM. And also, I
believe Ed literally just said he
doesn't have access to this freaking
model. So, like a bunch of security
researchers, I don't know exactly what
subset, but like a bunch of security
researchers, many of whom probably did
some pretty cool stuff, they don't even
get to use this thing. That's that's how
ridiculously backwards it is. Like WTF,
guys.
>> Yeah, I thought that was why they called
it Mythos, though.
>> And yeah, that's why it's called Mythos.
Um, Anthropic would argue that it is too
dangerous for little old me to have
access to it, right? Depending on, you
know, uh,
>> who knows what you'll do, man. Who
knows? I'll find that zero day and I'll
hack into Dario's phone. No, I don't
know, man. It's
>> I I understand where they're coming
from, but at the same time, I understand
why it looks like a huge marketing ploy
and I'm not sure which way to lean,
honestly.
>> Yeah. Okay. No, that's true.
>> I think it just
>> that's a whole other angle.
>> I would think that they'd have so much
more credibility if they just quit uh
effectively like giving us shaken baby
syndrome constantly with their
marketing. It's just like it's
constantly going back and forth. Every
single couple months you're getting hit
with the new, "Hey, we're all out of
jobs here shortly. Hey, this thing is
super dangerous." I mean, you got to
remember that Dario was at ChatGPT or
OpenAI. I like to call I like to call
the company ChatGPT. He was at ChatGPT
during the GPT-2 days and the official
language around ChatGPT 2, 7 years ago,
was ChatGPT 2 is too dangerous to
release to the public.
>> So like this is not that's what the two
sto
>> that we've been on this like roller
coaster. I think that's one thing that's
just largely hurting the credibility is
you can only cry wolf so many times even
and then when a real wolf happens like
if this is a real wolf everyone's like
yeah okay okay C compiler boy tell me
all about it
>> but they don't care they don't care
right they don't care because they're
the the baby that they're shaking is
called an investor that's that's who
they have more money they have to shake
the money out of the pockets right they
don't they don't care what we think
right because we're not going to write
them the next hundred billion dollars
that they need to like keep going and
they're kind of locked in this, you
know, it's a bitter bitter winner take
all kind of war for this like core
technology part, right? And so they have
to be the last AI company standing
because whoever is that company takes
all the money and the other people kind
of go to zero, right? Like unless unless
there's some real differentiation soon
where it's like oh the AIs bifurcate and
like Claude is only for code and can't
do anything else anymore and like
ChatGPT is only for like you know uh the
humanities or something like good luck
good luck raising money for that
40.
>> Yeah.
>> Yeah. Uh so maybe that's not true but
you know what I mean. If there's some
kind of really severe bifurcation, then
maybe they could both survive. But you,
you know, they're in a winner take all
battle right now. And so they got to
keep saying this, every release has to
be the one that's this is the one that
it will take over the world. And if it
doesn't quite, well, you know, it'll be
next.
>> You know that uh Claude got sorry, just
one quick thing. Uh do you know that uh
Red Bull in 2007, was it 2011? No, 2013
maybe.
>> Oh, Red Bull was too dangerous to
release.
>> No, Red Bull claimed that it gave you
wings. remember the day that it gave you
wings? It was sold, it was sued
successfully, I believe, for $10 million
because it in fact did not give you
wings. It was not superior to coffee.
>> And so I'm pretty sure in college I got
a check for like $2.30 from that.
>> Yes. And so I I am curious.
>> Ed, you sued Red Bull and won, bro. You
should make a video about it.
>> Call me the lawyer. Low Level. Okay,
listen.
>> Uh
Low
Legal. Let's go. Low Legal. Uh, but I'm
actually curious if if they keep saying
that and then it doesn't happen, do they
open themselves up to a false
advertisement, class action law? Like,
can you keep saying this and then not
get like Red Bull made claims and then
they got sued? Why not why not other
people? Why can't other people get sued
for that?
>> I think the problem with like with Red
Bull is like the the case was so
obvious, right? Like Red Bull does not
give you wings. End of case. Like, okay,
fine. Like any judge over the age of
>> I would have liked to hear the defense
for that one. Yes, it does.
>> Your honor. Your honor.
>> The problem is
>> they they had like these wings like
strapped to their back and they go like,
"I drank your Red Bull this morning and
here are my wings.
>> We ship you wings." Yeah.
>> Um but the problem with anything
technological when it comes to the
government or legislation or or you know
judicial process is that like boomers
and higher run the world right now when
it comes to these levels of like jur of
uh of of making um like legal decisions
and you couldn't explain to anybody at
that age unfortunately like right now
just people that are like running these
processes what it even means to find a
bug and then and then show them Mythos's
claims and like and make a sound legal
argument that would like go well in
court.
>> You're right. You're right because
Kamala Harris did actually think
computing was in the literal clouds and
so
>> it's my favorite clip of all time. Yeah,
there's a clip of her.
>> Josh,
>> put the clip in.
>> So, you're now no longer are you
necessarily keeping those private files
in some file cabinet that's locked in
the basement of the house. It's on your
laptop and it's then therefore up here
in this cloud that exists above us,
right?
>> She'll have the last laugh though when
like uh SpaceX is launching uh AI data
centers into space and come like that's
what I was talking about. That's what I
was talking about.
>> Yeah, it's cloud storage. So, you're
probably right.
>> A great clip where she's talking about
the cloud and she literally points above
and goes like the cloud it's like above
us and stuff or something like that.
It's so good.
>> She should have known that it wasn't
there because she would You don't see a
series of tubes.
>> There's a series of tubes necessary.
>> Series of tubes. I learned that
recently.
>> It's true.
>> Um, okay. I got a I got a question for
you, Ed, like in this in this vein about
your thoughts on it.
>> So, right now, I get that there's
there's basically like the argument
>> like, okay, I'm a company. I release my
thing. I run some models as like a
preventative thing to look for zero
days. the bad guys run models to try and
look for zero days. We kind of fight it
out and it's whatever, right? So, I
think like everyone's saying like if the
hackers can use it, I can use it. That's
fine. But the thing that makes me like a
little bit more like I don't really know
is like for the state of like a bunch of
open source stuff like and I'm an open
source maintainer and I already can't
convince a company to send me $100 a
month to maintain this thing for them.
There's no chance I'm getting them to
Well, I'm definitely not going to spend
20k of compute. Yeah. Every time I
release something and decide that now
it's safe, right?
>> But and like I can't get any companies
to pay for that and sponsor it. But like
if I'm a, you know, if I'm the one
little pin in the XKCD comic
that's holding up from Nebraska, the the
bad guys only need to do mine once. So
I'm wondering like kind of how you see
that as like the landscape affecting
open source things like that cuz it
seems very asymmetric in that way.
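Editor's sketch: the asymmetry described above can be put in rough numbers. This is a back-of-the-envelope illustration, assuming the "20k of compute" per scan figure mentioned in the discussion. The release count and both helper functions are hypothetical, and real costs are not known.

```python
def defender_cost(releases, scan_cost):
    """A maintainer who re-scans every release pays once per release."""
    return releases * scan_cost


def attacker_cost(scans_until_hit, scan_cost):
    """An attacker stops paying after the first scan that yields a usable bug."""
    return scans_until_hit * scan_cost


SCAN_COST = 20_000  # the "20k of compute" figure quoted in the discussion
RELEASES = 12       # hypothetical: one release a month for a year

print("defender:", defender_cost(RELEASES, SCAN_COST))
print("attacker:", attacker_cost(1, SCAN_COST))
```

Under these assumptions the maintainer's bill grows with every release, while the attacker only needs one successful scan.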
>> I mean I think it's asymmetric for that
reason right like the reason why you can
make the argument that Anthropic is
afraid is because you are the linchpin
on the infrastructure of the internet
and no one has funded you so far. You
have had zero security audits or zero
security work done on your stuff. And so
like if you give access to these models,
if you really are the linchpin in the
internet, you already aren't getting
money from Netflix, Google, whoever
that's using your software. And the
black hats know that you're the linchpin
keeping the internet up. They're
going like they're going to make use of
that model to to do the exploitation,
right? Um does that answer your
question? I mean like I think it's just
like the amount of power that it gives
to a single organization given the
current like
>> state of open source software in
particular um is very dangerous and to
be very clear
>> these models are also doing are also
very good at doing closed source software
right like my recommendation to anybody
interested in this by the way is like go
take a capture the flag problem from
like CTFtime or crackmes.one or whatever and
uh hook up Ghidra to GhidraMCP and then
use Claude Code on GhidraMCP it will
reverse engineer and find a bug in that
in that problem in a matter of minutes.
Like it is it like like Opus 4.6 is a
better reverse engineer than I am and
I've been doing this for like coming on
14 years. Uh it's honestly terrifying to
watch it work. So if you're if you're
even remotely interested in this, go
give it a shot and you you'll kind of
see what I'm talking about. It's It's
scary how fast it moves.
>> Yeah,
>> because that so that part that's where
I'm like, you know, whether it's Mythos
or not, I feel like right now a bunch of
stuff you could just maybe it'll cost
more tokens or it'll take longer or
something, but like a lot of stuff you
>> you still could find.
>> Yeah. And the models also like any model
does this obviously, but like the the
current models are really bad about like
false positives. Like I've done security
research uh in my free time on like
Chrome, ESXi, and some other like
routers that I've like download
>> regular weekend activity,
>> classic weekend activity. um and the
amount of times I've gotten like
critical finding like buffer overflow in
like the the RPC handler for this thing
and it's like okay all right dude like
write me an ASAN harness that tests that
and you'll see very quickly oh sorry
just kidding it's not actually there um
and so the magic is like if Mythos is
able to make fewer false positives, you
increase the signal-to-noise
ratio in this process,
which is scary right because it just
means you need less people to triage the
uh the reports and ultimately find real
bugs faster.
>> Uh, so I have another
question with this Mythos thing and and
maybe I'm curious I'm curious about your
security expertise. Isn't this whole
withholding a model kind of like a
doomed uh proposition to begin with?
Meaning that if OpenAI has a similarly
powerful Mythos model and they're
competing for the, like, zero-sum
game kind of outcome of who is
the best model. Doesn't it mean that
when OpenAI has it, they will just
release it? Like, and then aren't we
just forced to go out because whoever
kind of releases it gets the customers
and then that by having the customers,
you win. And so then you just get out
ahead. Like, doesn't this kind of cause
like a weird thing where Yeah. we're
like, "Oh, we can't do this." You know,
Dario's like saying we can't do it, but
won't we just kind of fall right into it
the moment there's two people that have
it?
>> Yeah. I mean, that's I'm not like
on capitalism. I'm just saying
that's more of like a capitalism problem
than it is like a security problem,
right? But yeah, your your point is
basically like if actor A says thing too
dangerous but could
>> model open source model shall we say
>> and actor B has same thing and wants to
make money with slightly less ethics
potentially. Yeah, actor B is going to
release it or Yeah, exactly. Chinese
model, Russian model, whatever.
>> Um
>> well I mean that's literally what I mean
Dario quit OpenAI cuz he's like bro
they keep they keep making models that
can kill humanity, right? Okay. So, I'm
starting a company where we make models
that could kill humanity,
>> but they're mine. Uh, also Chinese
models after OpenAI or Anthropic
releases one. So, I think that that
might be a little bit difficult. They
might be a little bit behind.
>> Has anyone seen
>> Riverside chat? But yeah, I mean, OpenAI
literally has a model that they claim
they haven't made any claims, I don't
think, about like Mythos equivalents,
right? Um, but they're doing effectively
the same thing where it's KYC,
know your customer. So you have to like
upload your ID and like talk about what
work you do and you get access to GPT-5.4
cyber which I'm assuming is just a model
that's trained better on bug patterns,
right? Use-after-free, out-of-bounds reads,
etc. Um, now if it's actually better
than Mythos, who knows, right? But you know
it's I think we're all just trending
regardless of what Anthropic wants to
do. I think we're trending towards every
person on planet Earth with a couple
bucks having access to models that are
very good at bug hunting. Uh, and the
question is, what does that mean for
software, right? Does software get more
secure? Does the world just get more
scary for a long time and it never
really like resolves itself? Like, what
do we do with that information? And
that's a tough question to answer.
>> I'm interested to know how expensive
it's going to be. That's the other
question.
>> I mean, this is obviously the question
kind of that we've been talking about
for a while on the pod and in life in
general is what are what are token costs
going to look like if uh OpenAI and
Anthropic both get all of the customers
that they would like to have? Uh,
because the cost won't be the same. If
demand 10x's or 100x's or 1,000x's, it
won't be
>> so I'm not
>> the price will not be
>> I'm not super well read on this. Is it
true that inference currently is at a
loss
>> like
>> I've heard I've heard both
>> both. Okay.
>> Some people are so confident I I have
been looking to try and find a
definitive answer.
>> I'm the confident one by the way he's
referencing.
>> Okay. Oh no no no. I mean well I'm not
going to reveal my sources.
I asked ChatGPT and I asked Claude.
>> They both said, "Of course not."
>> Yeah. Yeah. Right. I've heard I've heard
though that some some people are saying
they are running it at a loss or it's a
bit complicated because like pretty sure
Anthropic's probably running some
percentage of accounts on the $200 plan
at a loss,
>> right? Um but like is is API pricing at
cost or below? And then how do you
factor in like training and stuff?
I my my personal take is that
>> inference itself just looked at in the
myopic view of just inference it makes a
lot of money but you also then once you
zoom out now you start saying hardware
and all the incidental stuff around it
probably still makes money but then when
you zoom out to say like every time you
release a model you defunct your
previous model that is going to have
that has a very large burden and they
keep on not making money and needing to
raise more money so I have a sneaking
suspicion that part of it is very hard
to make money in the current state.
>> All right. Well, OpenAI is like
publicly like losing money, right? But
is Anthropic also negative or
>> they just had another big raise as well,
so I'm assuming I thought they just
raised like $6 billion or something.
Could be wrong
>> about that chat. Fact check me. I know
OpenAI did 120 billion
>> uh raise.
>> So much money.
>> This is the
>> Yeah, cash.
>> This is the one that I actually was
really curious to see. This is the only
benchmark that I was super curious to
see if they're going to uh do well.
Anthropic Opus 4.6 Max cost approximately
$9,000 and got 0.5% score on ARC AGI. So
this is like the super test and
humans get into the high 90s. Uh AIs get
like uh GPT 4 high cost $5,000 and
got 2%. Gemini 3.1 did 4% for $2,200.
>> And so it's like this really difficult
uh it's a really difficult test for AIS
to pass. And so Mythos did not add
itself to this one. So this is the
reason why I largely think it's more
like hype marketing than it is anything
because to me this is like a really
great indicator at least into some sort
of better model improvement. And so I
didn't see it.
>> Sure.
>> Uh let me can I can I just give a
counter point to that though?
>> Sure. Yeah. Yeah. Yeah. Yeah. Yeah.
Yeah.
>> Once again with the huge disclaimer that
I don't do any AI stuff. So this is just
off the cuff. But ARC AGI, if I'm not
mistaken, is a benchmark specifically to
test how well AIs perform uh on learning
completely arbitrary new things that
don't exist anywhere in their training
data. That's the only thing that it's
intelligence of this all.
>> Exactly. And so the only reason I would
want to point out that I don't think
that test says very much about this
particular security thing is security is
not that true. Like nobody nobody is
claiming that Claude Mythos came out and
discovered a whole new set of classes of
security exploits that no one had ever
come up with before. What it's saying is
that it went and found a bunch of the
exact same kinds of zero days
>> that someone like Ed would find if they
went and spent a week on that piece of
software, right? Like so they're not
claiming that this thing is somehow more
intelligent than the predecessor in that
way. It's claiming that it's got better
pattern matching
>> and like stringing things together to
create exploits, right? That process
which is well known. And so, so I don't
think ARC AGI necessarily tells us very
much about whether it can do those
things because those things are very
well-known tasks that security
researchers know how to do and we kind
of know the process that you do to do
them, right? So,
>> yes, that's okay. I will I will I will
concede that point most certainly that
the security at least known and obvious
security vulnerabilities such as
use-after-frees and all the fun stuff
like the stuff that happened in FFmpeg
with jumping ahead somewhere in a buffer
based on
>> yeah the these things are very common
kinds of bugs they're not like unusual
the things that they've talked about are
like very very standard and so that
seems like a more plausible claim like
hey we just were able to scale up the
sort of security checking that a
security researcher would do it can do
that thing and and find you know
potential places for that
>> a lot more plausible than AGI.
>> Yeah. The thing too for I feel like for
the security side of it as oppo like as
opposed to constructing a product or a
new product or like building a feature
where you have to get like in some ways
all the things right for a security
thing I only need to find one of the
things that are wrong.
>> Yeah. which is like a
like you can test a bunch of the
scenarios like you're saying that
already exist and I only need one thing
to be wrong in the program for then me
to be able to take control of it.
>> Well, and it's combinatorial, right?
Like a lot of what security research is
doing is like a it's pattern matching
for these kinds of bugs and then b going
like okay if I did this one followed by
this one would that produce an exploit?
What if I did in the opposite order?
What if I did this one and then this one
and then that one? Okay, what if I did
this one? Right? And again, these are
things computers are good at like that.
It's not you don't have to believe in
some kind of a weird like supernatural
like AGI achieved internally Sam Altman
nonsense to believe that this is
something a computer could do. It's it's
much more plausible if anything than
some of the other claims. So that's why
I I would like say I'm I'm not that like
when I saw this I wasn't like that's got
to be false. I was like okay yeah I
could believe that. Yeah.
>> I don't know. Most of vuln research is
like, you know, take a function that gives
user input, like define your threat model,
and then do source-to-sink analysis on
some vulnerable function or failure to
gate a function on like a length check
and like does user data get there? Bug
confirmed. And like yeah, that's literally
just pattern matching that we've solved
a lot of the times previously with like
satisfiability solvers, right? Like angr
and like Z3. Like take the graph of a
function, turn it into a math problem, can
you solve the math problem? Cool, bug
confirmed. Well, now with AI, it's just
like that process of doing source-to-
sink on like text, it can do incredibly
fast, right? It's very good. Now,
obviously, because it's stochastic, it
creates a lot of false positives, but if
we can figure out a way to reduce the
false positives or uh automate the
validation of those false positives,
then yeah, it's it's crazy. And I think
what they thought about what's that
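[Editor's sketch] The source-to-sink idea described above can be illustrated with a toy reachability check. This is a minimal sketch, not how angr, Z3, or any model actually works: the data-flow graph, the function names, and the choice of `memcpy` as the sink are all invented for illustration. It reports every path from an input source to a dangerous sink that never passes through a length check.

```python
from collections import deque

# Hypothetical data-flow graph: an edge A -> B means "data from A flows into B".
# Every function name here is invented for illustration.
FLOWS = {
    "recv_packet": ["parse_header", "log_packet"],
    "parse_header": ["check_length", "copy_name"],
    "check_length": ["copy_body"],
    "copy_name": ["memcpy"],
    "copy_body": ["memcpy"],
    "log_packet": [],
}

SOURCES = {"recv_packet"}   # where attacker-controlled input enters
SINKS = {"memcpy"}          # dangerous operations
GATES = {"check_length"}    # nodes that validate the data (a length check)


def unguarded_paths(flows, sources, sinks, gates):
    """Return every source-to-sink path that never passes through a gate."""
    found = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            found.append(path)
            continue
        for nxt in flows.get(node, []):
            # Gated flows are treated as safe; also avoid revisiting nodes.
            if nxt in gates or nxt in path:
                continue
            queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    for path in unguarded_paths(FLOWS, SOURCES, SINKS, GATES):
        print(" -> ".join(path))
```

In this toy graph the only flagged path is the one through `copy_name`, which skips the length check. A real finding would still need validation, for example with a sanitizer harness, which is the false-positive problem discussed above.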
>> Have they thought about asking Mythos?
>> I know. Come on. Can you just No
mistakes, please. Um the thing that
sets Mythos apart,
according to the Anthropic report, is its
ability to chain together primitives.
Right? So the scary part from like a
cyber crime perspective is like you have
uh gadget A that gives you an arbitrary
read and gadget B that gives you an
arbitrary write. Okay. Like those two
separate things are like not super
important if they're not used together.
Well, what Mythos is able to do is out
of 100 tests, I think it's like 83% of
the time, find exploit primitives in a
vulnerable codebase and chain them
together to get RCE, right? That's the
scary part because then that's true like
end-to-end exploit creation for a bad
actor. And that's I think what scares
Anthropic the most.
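[Editor's sketch] To make "chaining primitives" concrete, here is a toy model in which each primitive needs some capabilities and grants others, and a simple forward-chaining search looks for a sequence that reaches RCE. The primitive names and their relationships are hypothetical; this only illustrates the idea that an arbitrary read and an arbitrary write matter more together than apart. It is not a description of what Mythos does internally.

```python
# Hypothetical exploit primitives: each needs some capabilities and grants others.
# Names and relationships are invented for illustration only.
PRIMITIVES = {
    "gadget_a": {"needs": set(), "gives": {"arbitrary_read"}},
    "gadget_b": {"needs": set(), "gives": {"arbitrary_write"}},
    "leak_base": {"needs": {"arbitrary_read"}, "gives": {"known_addresses"}},
    "overwrite_ptr": {
        "needs": {"arbitrary_write", "known_addresses"},
        "gives": {"rce"},
    },
}


def find_chain(primitives, goal):
    """Forward chaining: keep applying any usable primitive until the goal."""
    have, chain = set(), []
    progress = True
    while goal not in have and progress:
        progress = False
        for name in sorted(primitives):
            prim = primitives[name]
            usable = prim["needs"] <= have            # prerequisites satisfied
            adds_something = not prim["gives"] <= have  # grants a new capability
            if name not in chain and usable and adds_something:
                chain.append(name)
                have |= prim["gives"]
                progress = True
                if goal in have:
                    break
    return chain if goal in have else None


if __name__ == "__main__":
    print(find_chain(PRIMITIVES, "rce"))
    # Without the write primitive, the read alone never reaches the goal.
    read_only = {k: v for k, v in PRIMITIVES.items() if k != "gadget_b"}
    print(find_chain(read_only, "rce"))
```

Removing either gadget leaves the search with no chain to the goal, which matches the point that the two separate findings are not very important until they are used together.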
>> Um, now I know there's argument where
like Firefox wasn't in the sandbox for
that experiment. So like it doesn't
actually matter. But I mean just apply
that process to the sandbox and the same
thing applies. You know, it's just I
think they wanted to prove a point that
it could do that.
>> Well, and also I mean again like as I've
said many times, I can't stand AI
companies, so I'm not trying to defend
them or anything, but I'm just trying to
point I'm just trying to point out how
plausible this stuff is to me from a
neutral observer standpoint.
>> Classic case defending AI companies.
>> Yeah, I know, right?
>> Um
>> I know. Uh if you think about it, it's
like look, security researchers who do
not number that many were already
cranking out zero days at a much too
alarming rate for me, right? Like like
you know there's a hack every other day,
right? It's like CVEs are piling up
like there's no tomorrow and yeah, not all
of them are actually all that bad or
whatever, but like it's not like
security researchers were having trouble
producing a fair number of of critical
vulnerabilities even with the limited
resources that they had. So, it's also
not weird to think that like if you had
more automation, you would find a lot
more of them. It doesn't like there's
clearly just a lot of bugs, guys. Like
there's a lot of freaking bugs and it's
just doesn't seem that unusual that if
you have more sophisticated pattern
matching, more sophisticated
combinatorial checking where the security
researcher doesn't have to spend a lot of
time setting up the tool because it can
just kind of ingest the code and it
knows what roughly what it means.
>> Yeah, I mean their rate's going
to increase if nothing else. Existing
security teams' rates of finding
exploits. It has to. I mean it just has
to. Unless this thing is just a complete
pile of crap,
>> it's got to. The other thing too we've
been seeing from each like new
generation of model is that they're
getting at least from my experience and
what I'm what I'm reading from people
and everything they're getting better at
calling other tools.
>> So like they call out to stuff more
regularly
>> and they can pay attention for longer.
recompile this code and see you know
make this exploit and run it against
this thing or whatever right like those
are all things that if you automate them
a security researcher gets much faster
at finding because they're not having to
set up the tooling themselves to like go
work on this exploit like whatever
whatever those steps were they don't
have to do them anymore right
>> right so then if you're like oh now it
can run instead of like I have to prompt
it at every stage for the next thing to
do is I can give it
>> 10 rough things say try a bunch of
combinations of these and and it runs
for 24 hours.
>> Yeah.
>> You're just like a lot. It's literally
like in in my mind some of it is like
Yeah. Well, we already know fuzzers
exist. Like we use them all the time and
they're good,
>> right? It's like in some ways almost
like
>> Yeah. It's like fuzzer. It's like fuzzer
squared, right? It's like a thing now
that can like target the fuzzing at
things specifically. So that things that
would be very hard for stochastic testing
to catch
>> because when you have stochastic testing
and you have to chain two things
together, you're never going to randomly
pick the two things that would have to
happen for them to work. Here's a thing
that can like target that specifically
and go like, "Oh, I think combine these
two things. Probably let me fuzz that
specific path." Oh, yep. I got it right.
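[Editor's sketch] The "fuzzer squared" point can be shown with a toy target that only misbehaves when two conditions hold at once: a 4-byte magic header matches and a length field overstates the body. The parser, the magic value, and both input generators are invented for illustration. Blind random input has to guess the header by luck, while a targeted generator fixes the header and fuzzes only the part behind it.

```python
import random

MAGIC = b"HDR!"  # hypothetical 4-byte header the toy parser expects


def toy_parser(data: bytes) -> None:
    """Toy target: 'crashes' only if the magic matches AND the length lies."""
    if len(data) < 5 or data[:4] != MAGIC:
        return  # rejected early, nothing interesting is reached
    declared = data[4]
    body = data[5:]
    if declared > len(body):
        raise IndexError("simulated out-of-bounds read")


def count_crashes(make_input, tries: int, seed: int = 0) -> int:
    """Run the parser on generated inputs and count simulated crashes."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(tries):
        try:
            toy_parser(make_input(rng))
        except IndexError:
            crashes += 1
    return crashes


def blind_input(rng: random.Random) -> bytes:
    # Purely random bytes: the magic must be guessed by luck (1 in 2**32).
    return bytes(rng.randrange(256) for _ in range(8))


def targeted_input(rng: random.Random) -> bytes:
    # The magic is fixed; only the length byte and the body are fuzzed.
    body = bytes(rng.randrange(256) for _ in range(3))
    return MAGIC + bytes([rng.randrange(256)]) + body


if __name__ == "__main__":
    print("blind:", count_crashes(blind_input, 1000))
    print("targeted:", count_crashes(targeted_input, 1000))
```

The design choice here is that the knowledge of which path to aim at lives in `targeted_input`; in the scenario described above, that is the part the model would write.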
>> That's where it gets crazy is like you
just have the AI write the fuzzer and
then like if you can automate that
process, you win a lot of the time. It's
it's pretty pretty amazing. Um, I do
have to go though. I have a meeting in 3
minutes, so I got to I got to rip. Um
>> Oh, hopefully you get Mythos access.
Congrats.
>> That'd be neat. No, it's not going to
happen.
>> Come on, guys. Give him
>> Have a good one, man.
>> I like you guys, but it looks like it's
the end of our show, unfortunately.
>> Yeah.
>> True.
>> All right.
>> Thank you everybody. I would just like
to say that uh I would I would just like
to say that Casey and TJ and obviously
Temu Casey that just left, commonly
known as Low Level Learning, uh you guys you
know you make the show magic and
and now I'm just going to go about being
lonely again. Kind of sad.
>> Oh, Prime.
I knew that was coming. I thought I I
thought I was going to get booed, but I
I just assumed something was going to
happen. All right. Um, the real the the
good news is is that you can enjoy full
episodes of the standup now on YouTube.
If you go to the standup pod full, which
I'm going to try to rename hopefully at
some point, we're trying to work some
things out to get it a better name. But
right now, YouTube, am I right? Um,
>> if you go to the website, if you go to
our website, will it have links to
these?
>> Yeah, it will. It will. And it'll have
it spelled out. Uh we'll we'll make it
more clear once we figure everything out
over the next week. Maybe by the time
you're listening to this on YouTube, by
the time you're listening to this on
YouTube, uh we're going to upload all of
the backlog to that channel as well. So
we should have every episode on YouTube
in one spot, very easy to see, etc.
Obviously, you always can, you know, RSS
download the audio directly. Don't press
the red button on that site. Of course,
Teej, what is that web address that
people should go to for
>> the standup pod? Hey,
>> the stand. Go to the standup pod.com.
All the links will be there. All the
episodes will be there. You want
YouTube, you want Spotify, you want
downloads, you want RSS, you got it.
>> The standup pod.com.
>> Yeah. Yeah. Check this out. I'm just
going to do something for the audience.
Look at this. If you go here, you click
Trash made a black mirror app, you can
go and you can listen to it right on the
website. You can have all the
information right here. You can trashes
app right there. You can go in here.
>> We don't even look at this. We don't
even charge you.
>> You can play on Spotify. You can
download and just have personally for
you to do whatever you do.
>> That's for you.
>> Now that we're And then I'll make it
I'll make it so it links to the YouTube
there later as well now that we're going
to have a dedicated YouTube channel for
that too. So for all of you out there,
you know,
>> the AI companies claim that you're going
to get UBI, but we're actually giving
you universal basic podcast.
>> You just get it for free. UBP.
>> UBP. You know me.
>> UBP. Yeah,
>> UBP.
>> Yeah,
>> I was going to say, well, I don't know
what I was going to say. That's fine. We
should just
>> We should really just end this episode.
>> Stick a fork in it, guys. It's done.
>> All right. Thanks.
>> Good seeing everybody.
>> Thanks, YouTube. Thanks again, uh,
whatever your name is. Teej, you're
pretty neat. Boot up the day.
V coating errors on my screen.
Terminal coffee
and
living the dream.
The video features a panel discussion centered on George Hotz's provocative take regarding cyber security and the release of Anthropic's 'Mythos' model. The participants explore whether the limited discovery of zero-day vulnerabilities is truly a lack of skill among hackers or, as Hotz suggests, a lack of financial incentive. They analyze the potential impact of AI tools in vulnerability research, the hype versus reality of AI model benchmarks, and the broader existential security implications of AI models being capable of identifying and chaining exploits. The conversation also touches on the ethical concerns surrounding AI companies withholding access to powerful tools and the lack of fair compensation for the security researchers whose expertise informs these AI advancements.