Is Mythos too Dangerous?

Transcript

0:00

Here we are. Claude did it again.

0:02

Dropped a new version of itself. Okay.

0:04

But this one, it has a very special

0:06

name. Okay. It's It's much better. We're

0:08

not on the old Sonnet or Opus or Haiku.

0:11

No, we've been upgraded to Mythos. The

0:15

greatest model to ever be dropped. In

0:17

fact, it's so great. It's so fantastic

0:20

that you, you, the per... Yeah. You, sitting

0:22

there. Yeah. You right now. You can't

0:24

you can't have you can't have that.

0:26

Okay. Hey, you're not allowed to touch

0:27

that. Apparently, this model is finding

0:31

bugs and uh able to crack out of

0:33

sandboxes like nobody's business. We are

0:36

talking about able to take down

0:38

computers just simply by connecting

0:40

them. They're the Chuck Norris, God rest

0:42

his soul, of of all of the models, okay?

0:44

It's just able just to destroy

0:46

everything apparently. Okay, you got to

0:47

hide your kids, hide your Raspberry Pis

0:49

cuz they're taking everybody out here.

0:51

So, let's talk about this new model for

0:53

a second. They kind of released a bunch

0:54

of stats for it and then they released

0:56

the part that would be considered the

0:58

scary part. The part that you always see

1:00

Anthropic does, right? Because this is

1:02

pretty typical of Anthropic is they have

1:04

a new model and then what do they do

1:06

with it? They're like, "Dude, by the

1:07

way, AI super scary. The most scary

1:10

ever. So scary. US government. Hey,

1:13

government so scary. You better put some

1:16

regulation in place and help us control

1:18

because man, it's scary." So, first

1:21

let's just go with the least interesting

1:22

of the items, which honestly I don't

1:24

care about any of these numbers cuz

1:26

honestly it really means nothing to me.

1:28

But here we go. The SWE-bench Pro,

1:31

Mythos Preview, the new model, 77.8%

1:34

versus Opus 4.6 at 53.4%. So, as you can

1:38

see, it's dramatically better.

1:40

Practically 20% better. Now, what does

1:43

that actually mean for you or me? Well,

1:45

it doesn't really mean anything because

1:46

you're not going to touch this model.

1:48

You know, you're not allowed to.

1:49

Nobody's allowed to. Only a few people

1:51

at Amazon, Google, and Apple, and a

1:54

couple other top companies and the US

1:55

government are allowed to touch this

1:57

model. And you can see the rest of the

1:59

benchmarks just seems to perform super,

2:02

you know, super much better than Opus

2:04

4.6. On the reasoning side, the GPQA

2:07

Diamond, Mythos Preview dominates Opus

2:10

4.6. Humanity's Last Exam, Mythos Preview

2:14

without tools still gets an F, but I

2:16

mean, we're we're getting near D

2:18

territory. And you know what? D's earn

2:20

degrees at some of the places. And

2:22

Mythos with tools actually does get a D.

2:25

Okay, it is passing some colleges. This

2:27

is some serious PhD level intelligence

2:29

going on here. The actual interesting

2:31

part about the model is security

2:33

research. I've already just released a

2:35

video about this. How Daniel Stenberg,

2:38

the uh maintainer, lead maintainer of

2:40

curl has said, "Hey, AI reporting, it's

2:42

gotten a lot better. It's actually

2:43

starting to show real issues." For a long

2:46

time, AI inside the security field has

2:49

been a security issue itself because it

2:51

just inundates any maintainer with so

2:54

many fake reports that it's actually

2:56

impossible for maintainers to really be

2:58

able to operate on their own repository.

2:59

But then a kind of a shift, a big shift

3:01

happened with 4.6. We're actually

3:02

starting to see AI being actually, oh

3:05

wow, no, this is actually serious now.

3:07

Now it can seriously find things. But

3:09

this new one, Mythos, apparently is real

3:12

good. During our testing, we found that

3:14

Mythos Preview is capable of identifying

3:17

and then exploiting zero-day

3:18

vulnerabilities in every major operating

3:20

system and every major web browser when

3:23

directed by a user to do so. The

3:26

vulnerabilities it finds are often

3:27

subtle and difficult to detect. Many of

3:30

them are 10 or 20 years old with the

3:33

oldest we have found so far being a now

3:36

patched 27-year-old bug in OpenBSD, an

3:40

operating system known primarily for its

3:42

security. Mythos Preview wrote a web

3:45

browser exploit that chained together

3:46

four vulnerabilities writing a complex

3:49

JIT heap spray that escaped both

3:52

renderer and OS sandboxes. It

3:54

autonomously obtained local privilege

3:56

escalation exploits on Linux and other

3:59

operating systems by exploiting subtle

4:00

race conditions and KASLR bypasses. It

4:03

autonomously wrote a remote code execution

4:06

exploit on FreeBSD's NFS server that

4:09

granted full root access to

4:10

unauthenticated users by splitting a 20

4:13

gadget ROP chain over multiple packets.

4:16

It even found a 16-year-old

4:18

vulnerability in FFmpeg, the hand

4:20

artisanally crafted library. So if this is

4:23

all to be believed and this is actually

4:25

what is happening and we are literally

4:27

entering into the most impressive era

4:31

for AI ever to the point where releasing

4:34

the model publicly would result in every

4:36

system that has ever existed being

4:39

hacked. Well we got ourselves a bit of a

4:42

problem now don't we? And that is why

4:44

Anthropic has said the following. We do

4:46

not plan to make Claude Mythos Preview

4:48

generally available. We plan to launch

4:50

new safeguards with an upcoming Claude

4:52

Opus model, allowing us to improve and

4:54

refine them with a model that does not

4:56

pose the same level of risk as Mythos

4:58

Preview. So that 20-plus improvement on

5:00

SWE-bench, baby, you're never going to

5:02

taste that. Okay? You're never going to

5:04

get your sweet hands on that one. But

5:06

you might get a smarter Claude. Does

5:08

that mean we're entering into the nation

5:09

of geniuses on a GPU that's stored in a

5:12

warehouse in which Anthropic owns and

5:14

you are now able to create everything

5:15

you've ever wanted just with a simple

5:18

quick text description? Well, it doesn't

5:20

necessarily sound like it. It sounds

5:22

like some people might have it, but I

5:24

don't think you're going to have it

5:26

anytime soon, and I'm probably not going

5:27

to have it anytime soon either. See, the

5:29

thing is, they're going to release it to

5:31

a few select tech cartel leaders, and

5:34

who knows when it's actually going to

5:36

happen. So, is it as big of a deal as we

5:38

are seeing or is it not? Obviously, we

5:40

can see the receipts with FFmpeg saying,

5:43

"Hey, thanks for the patch." But some

5:45

aren't buying it. You got Boris saying,

5:47

"Hey, it's very powerful and should feel

5:49

terrifying." Kind of continuing to push

5:51

the same narrative, but just never

5:53

forget the exact same narrative was

5:55

pushed with ChatGPT 2. It is really

5:58

dangerous. You got to be super careful.

6:00

It's honestly too dangerous to release.

6:03

Well, the best we can hope for is that

6:04

ChatGPT also happens to have ChatGPT 6

6:07

or something, or ChatGPT Cosmos, going to

6:10

be coming out and that will force

6:12

Anthropic to have to catch up and

6:14

release their super powerful model which

6:16

is also just a weird place to be in that

6:18

we're I what did I just say there? Me

6:21

rooting for OpenA... Oh my gosh, something

6:24

got into my head there for a second. But

6:25

I think Lowle said it best. They called

6:28

it Mythos because no one's ever going to

6:29

see it. They're literally trying to rage

6:32

bait us right now. I'm feeling it. I'm

6:34

feel I'm feeling the baiting. You know,

6:36

it's hard not to look at all this and

6:37

realize that there's some part of my

6:40

skills every year becoming more and more

6:42

irrelevant. You know, the ability to

6:44

hammer out all those Vim shortcuts. Kind

6:46

of a dying skill, right? It's a little

6:48

sad. I I mean, I personally think it's

6:50

pretty dang sad, but it's an ending

6:52

skill. It's a It's a skill that I don't

6:54

think the younger kids, them young

6:56

fellas, are going to really learn

6:57

because they don't really have to learn

6:59

it. And it's becoming more and more

7:00

apparent that people would rather just

7:03

hammer on to a model than actually learn

7:05

any of these tasks or these like really

7:07

fine difficult things anyways. And so

7:09

here we are. So the things that you know

7:11

I have defined myself with over the last

7:14

20 years. See while you guys went out

7:16

smoking with cigarettes, staying up too

7:19

late, probably experimenting with

7:21

mind-altering drugs. I, on the other hand,

7:23

was sharpening my skills. And now those

7:26

skills, maybe they're a little bit more

7:28

useless. Every single year, a little bit

7:30

more useless. But honestly, I'm okay

7:32

with it. I know that might be strange to

7:33

say, but I am okay with it. I'm okay if

7:36

these things do turn out to be fantastic

7:38

that I don't have to be uh I don't have

7:40

to identify myself as the greatest Neovim

7:43

user of all time. It's cool. I can still

7:46

use Neovim and I can still enjoy it, but

7:48

it doesn't have to be my identity. And

7:50

also I'm just happy I've done all those

7:52

years of trying to understand how to

7:54

make good software because now even if I

7:56

do AI generate something I can go oh

7:59

yeah this is here's why it's wrong I can

8:02

just understand things at a level in

8:04

which people who've never even touched

8:05

software have no idea about. So hey am I

8:08

happy about that still? Sure. And maybe

8:10

you know what one day those skills even

8:12

could become invalidated. And if they

8:14

are I guess I have to be okay with that.

8:16

That's it. I just kind of wanted to yap

8:18

about this because, you know, it's it's

8:20

been an interesting time and I genuinely

8:23

really appreciate that I still have uh

8:25

the chance just to yap to yap to you

8:27

guys, you know, to kind of talk about

8:28

these things cuz I know a lot of people

8:30

they feel kind of really unsure about

8:33

everything. They feel kind of worried

8:34

about everything. Uh especially with

8:37

just all of just the crazy talk from the

8:39

hype beast being like, "Oh, it's the end

8:41

of the universe." Even this report right

8:43

here by Anthropic being like it's it

8:46

knows how to take advantage of every

8:48

single browser, every single operating

8:50

system. It's finding bugs 27 years old.

8:53

You're absolutely going to get destroyed

8:55

if we let this thing out. It's just

8:57

constant fear instilling,

8:59

you know, just attacks on you at all

9:01

times. And you know, I see these things.

9:03

I'm like, "Okay, hey, I'm glad that if

9:05

it really is that that Anthropic making

9:08

quote unquote steps towards Amazon and

9:11

Google and all this nonsense to be able

9:13

to patch all these problems, but at the

9:15

same time, I don't want to have to live

9:16

under this like intense pressure and

9:20

this intense constant barrage of just

9:22

negativity. Like I can look at it as

9:24

like, wow, I now have the ability to

9:26

accomplish things that before would have

9:29

taken me a lot longer. They would have

9:30

been a lot harder. I would have been

9:32

less likely to even start them just

9:33

because I can only have so many side

9:35

projects. Now I get the benefit to be

9:38

able to abandon several side projects.

9:40

Like I have been able to abandon more

9:43

projects than I've ever done in my

9:45

lifetime thanks to the power of AI. And

9:47

honestly, that feels pretty amazing.

9:49

Hey, the name is ThePrimeagen. Hey, is that

9:52

HTTP? Get that out of here. That's not

9:55

how we order coffee. We order coffee via

9:57

ssh terminal.shop. Yeah, you want a real

10:00

experience. You want real coffee. You

10:02

want awesome subscriptions so you never

10:04

have to remember again. Oh, you want

10:06

exclusive blends with exclusive coffee

10:09

and exclusive content? Then check out

10:11

CRON. You don't know what SSH is?

10:14

>> Well, maybe the coffee is not for you.

10:21

Living the dream.

Interactive Summary

Anthropic has released a new, highly advanced AI model named Mythos Preview, which demonstrates significantly improved performance over previous versions like Opus 4.6, particularly in security vulnerability detection. This model is reportedly capable of finding and exploiting zero-day vulnerabilities in major operating systems and browsers, including a 27-year-old bug in OpenBSD. Due to its advanced capabilities and potential risks, Mythos Preview will not be made generally available. Instead, Anthropic plans to launch new safeguards with an upcoming Claude Opus model that does not pose the same level of risk. The release has sparked discussions about the rapid advancement of AI, its potential impact on various skills and professions, and the ongoing debate around AI safety and regulation, with some viewing the model's restricted release as a form of "rage baiting" while others express genuine concern about its power.