Mythos unleashed on Opensource

Transcript

0:00

Over the last couple months, you've

0:01

probably seen headlines that look

0:03

something like this: Mythos. It's

0:06

entirely too dangerous. And even just

0:09

last month, Mozilla posted this blog

0:12

post that said, "The zero days are

0:15

numbered." And more so, they even write,

0:17

"Defenders finally have a chance to win

0:19

decisively." Very, very funny. First

0:21

off, you can't say a chance to, which

0:24

means that it's not guaranteed. And

0:26

then use a word that means, "Hey, it's

0:28

guaranteed." No, like really like look

0:30

at this. Often without wavering or doubt

0:32

like this. This is [laughter]

0:34

it. You might win decisively. What are

0:38

you doing, Mo? This is just silly. That

0:40

means for the last little bit, it's been

0:42

kind of difficult to figure out what the

0:44

heck is actually going on. Is Mythos

0:46

really the thing that's going to end all

0:49

and be all of security, or is it

0:51

actually going to be just another model

0:53

release? Yeah, things get slightly

0:55

better, but nothing that's just going to

0:56

blow your socks off. Just looking at the

0:58

marketing, it's probably going to be the

1:00

greatest thing that has ever existed

1:02

ever. Just saying. Going out on a hunch

1:04

here. But today, an amazing article

1:06

comes out called "Mythos finds a curl

1:09

vulnerability." Daniel Stenberg is

1:11

probably best known for the library

1:13

curl. You've probably used it several

1:14

several times, and if not, your AI agent

1:16

is pounding it away, but also he is

1:19

known for this article, "The I in LLM

1:22

stands for intelligence," where he kind of

1:23

bemoans the state of security. This is

1:26

what's going on: security projects

1:28

are being inundated with really slop

1:30

level PRs and security bugs that are

1:33

just absolutely denying the attention of

1:36

maintainers. We call it a denial-of-

1:37

attention attack. Kind of unique. And he

1:39

was one of the first people to really

1:41

write strongly against this saying how

1:43

just annoying and just awful this AI

1:46

revolution has been. This of course was

1:48

on January 2nd, 2024. Fast forward about

1:51

2 years later on January 26, 2026 and

1:55

Curl finally says, "You know what? Our

1:57

paid-for bug bounty program, it just has

1:59

to go away. Honestly, the level of slop

2:02

is just absolutely distracting and we

2:05

can get nothing done." And then of

2:07

course, three months later, April 22nd,

2:09

2026, which is only one day after the

2:12

"defenders have a chance to win

2:14

decisively" post, Daniel writes this article:

2:16

"High quality chaos." No more AI slop.

2:19

Meaning that the actual PRs and things

2:21

have gone up significantly in quality

2:24

and he can now finally rely on these AI

2:27

assisted security reports. And this has

2:30

largely been a lot of people's

2:31

experience. Like if you were any sort of

2:33

serious programmer, 2024 AI was

2:36

effectively useless. There were a few

2:39

cases where it's actually pretty sweet,

2:41

but besides for that, it was largely

2:43

ineffective. Now, today, there's

2:46

actually some use cases, and it actually

2:47

is pretty dang good at finding security issues.

2:49

I've heard from many, many, many

2:51

security researchers that are actively

2:53

employed at small to large companies

2:55

saying, "Hey, this thing's actually gone

2:57

from completely useless to very useful."

3:00

And now to this Mythos finds a curl

3:03

vulnerability. I am very excited about

3:06

the article being written right here

3:08

about Mythos and curl because this is

3:10

going to be somebody who's gone through

3:12

all the proper cycles recognizing when

3:14

AI first came on the scene that it had a

3:16

huge amount of gaps and it's largely not

3:18

that useful. And now today it's improved

3:21

and there's actually like some good

3:22

utility that can be drawn from it. And

3:24

since curl is approximately 178,000

3:28

lines of C, I mean Mythos must have a

3:30

heyday in it. Well, I've read the

3:32

article and I have a lot to yap about.

3:35

I mean, there's some juice

3:38

in here, people. But of course, before

3:39

we get started, the bag. Hey, is that

3:42

HTTP? Get that out of here. That's not

3:45

how we order coffee. We order coffee via

3:47

ssh terminal.shop. Yeah. You want a real

3:50

experience? You want real coffee? You

3:52

want awesome subscriptions so you never

3:54

have to remember again? Oh, you want

3:56

exclusive blends with exclusive coffee

3:59

and exclusive content? Then check out

4:01

Cron. You don't know what SSH is?

4:04

>> Well, maybe the coffee is not for you.

4:15

>> All right. So, Mythos finds a curl

4:17

vulnerability. Yes, a singular one. That

4:21

might be a little disappointing to some

4:22

of you. I think right away your hearts

4:24

probably dropped a little bit. Back in

4:26

April 2026, Anthropic caused a lot of

4:28

media noise when they concluded that

4:30

their new AI model, Mythos, is too

4:33

dangerously good at finding security

4:35

flaws in source code. Apparently, Mythos

4:38

was so good at this that Anthropic would

4:40

not release this model to the public

4:42

yet, but instead trickle it out to a

4:46

select few companies for a while to

4:48

allow a few good ones to get a head

4:50

start and fix most of the pressing

4:52

problems first before the general

4:54

populace would get their hands on it. By

4:56

the way, I do agree with Daniel's little

4:58

question mark here. I'm not really sure

4:59

why just a few people, you know, like

5:02

the few good ones. Like, I don't know

5:03

what that says about a lot of these

5:04

companies. a lot of companies just out

5:06

there just completely vulnerable because

5:08

Anthropic won't share. Okay, kind of. I

5:10

mean, cool, I guess. So, as part of

5:12

Project Glasswing, which is Anthropic's

5:14

reach out with Mythos saying, "Hey,

5:15

these certain companies and these

5:17

certain open source projects get early

5:19

access." So, as part of the Glasswing

5:21

outreach, Curl was contacted and Daniel

5:24

had the offer, hey, would you like to

5:26

use Mythos to go through your curl

5:29

library? Of course, he was excited about

5:31

it, but then uh you know, some weeks

5:33

went by and nobody reached out again.

5:34

And then eventually someone reached out

5:36

and just said, "Hey, we're not going to

5:37

give it to you, but we will analyze curl

5:40

for you. Do you want us to do that?" To

5:41

me, the distinction isn't that

5:43

important. It's not that I would have a

5:45

lot of time to explore lots of different

5:47

prompts and doing deep dive adventures

5:49

anyway. Getting the tool to generate

5:51

the first proper scan and analysis would

5:53

be great, whoever did it. I happily

5:56

accepted this offer. Honestly, kind of

5:57

the best of both worlds, right? You got

5:59

the person that's been using this model

6:00

a whole bunch to be able to go and do

6:03

the scan for you on your codebase. You

6:05

don't have to do a bunch of the

6:06

learning. You don't have to set up a

6:07

bunch of harnesses. Ideally, they

6:09

probably have things that are more

6:11

advanced than say I would have to begin

6:13

with, right? Before we kind of proceed,

6:14

there's a little bit of a backstory you

6:16

have to understand about Curl. Curl's

6:17

been around for a very, very long time.

6:20

And so, they have a fairly extensive

6:22

suite of tests. They also just have a

6:24

whole bunch of other tools that they

6:26

run. And this is kind of this screenshot

6:28

that Daniel provides which means that

6:29

they have code style. They have banned

6:31

functions. There's certain functions in

6:32

C with which you can shoot yourself in the foot a

6:35

lot easier than others. I'm thinking of

6:36

strcpy versus strncpy.

6:38

Obviously human review, bot reviews, no

6:41

binary blobs, no git force push, a bunch

6:43

of other stuff in here that's actually

6:45

pretty important. And putting that all

6:46

together it has led curl down a route in

6:49

which it has had very few vulnerabilities

6:51

kind of disclosed. And when I say few, I

6:53

mean they've had a couple hundred CVEs

6:56

total. So you can imagine that on May

6:58

6th, 2026, when they received the Mythos

7:01

report, they were probably pretty

7:03

excited because what are the chances

7:04

that 176,000 lines of C code doesn't

7:09

have quite a few bugs lurking around in

7:11

it. They've had 573 separate individuals

7:14

commit to curl over its lifetime, which

7:16

means that it's not even just one

7:18

person's thought process, but just a

7:20

huge amount of people that have

7:21

committed over a long period of time.

7:23

All right, so we're almost done with

7:24

this story and then we'll get to the

7:25

yapping. The report concluded it found

7:27

five confirmed security vulnerabilities.

7:29

Now, this is a pretty classic AI-ism.

7:32

Whenever AI has confidence, it's going

7:34

to tell you. Like, there's

7:36

something amazingly magical about the

7:38

fact that even if something is wrong,

7:40

the AI is so dang confident in being

7:44

incorrect. It's honestly a skill I could

7:46

learn a lot from. So, after a couple

7:48

hours of investigation, this is what

7:49

Daniel says. We had trimmed the list

7:52

down and were left with one confirmed

7:54

vulnerability. The other four were three

7:56

false positives and the fourth we deemed

7:58

just a bug. The single confirmed

8:00

vulnerability is going to end up as a

8:02

severity low CVE planned to get

8:04

published in sync with our pending next

8:06

curl release in 8.21.0

8:09

in late June. The flaw is not going to

8:11

make anyone gasp for breath. So in other

8:14

words, it's nothing that's like super

8:16

important, important enough for them to

8:17

do some sort of out-of-band release.

8:19

They're just like, "Hey, next time we

8:20

release, hey, sometime in the future,

8:22

don't even worry about it." That's how

8:23

much we're concerned about it. So at

8:25

some point we will go out and we

8:28

will fix it in public. No one rush in.

8:30

Hey, no need to rush in. A side effect

8:32

of the Mythos report is that it found

8:34

about 20 bugs that were described nicely

8:37

with very low false positive rates. And

8:39

so that actually is something that just

8:41

generally was a pretty good deal for

8:42

Daniel. Curl is certainly getting better

8:44

thanks to this report, but counted by

8:46

the volume of issues found all the

8:48

previous AI tools we have used have

8:50

resulted in larger bug fix amounts. Now

8:52

this is where things kind of get funny.

8:54

The reality is that curl has been

8:56

using AI now to kind of do a lot of

8:58

analysis on its code and it's able to

9:00

find more and more bugs. Even though in

9:03

the beginning Daniel was very very very

9:05

hesitant about AI, they have since

9:08

changed their tune as AI has improved

9:11

and since then they've been able to find

9:13

a large amount of bugs which means that

9:15

when Mythos ran it actually found very

9:18

few bugs. Now I want you to think about

9:20

that for a second. What does that mean

9:22

to you? Before I give my conclusion,

9:24

let's read Daniel's conclusion first. My

9:26

personal conclusion can however not end

9:28

up with anything else than that the big

9:30

hype around this model so far was

9:33

primarily marketing. I see no evidence

9:36

that this setup finds issues to any

9:38

particular higher or more advanced

9:40

degree than other tools have done before

9:43

Mythos. Maybe this model is a little bit

9:46

better. But even if it is, it is not

9:48

better to a degree that seems to make a

9:50

significant dent in code analyzing. And he

9:52

prefaces it with the following: This is

9:54

just one source code repository and

9:57

maybe it is much better on other things.

9:59

I can only tell and comment on what it

10:01

found here. All the AI tools that they

10:03

have already been testing out. All the

10:05

security researchers that have helped

10:07

make curl a better product. All the

10:09

stuff that came before it effectively

10:11

did most of the cleaning to where when

10:14

Mythos was unleashed on this product,

10:17

there wasn't the same kind of like just,

10:19

you know, headline making outcome.

10:22

Instead, it's just like, oh yeah, that

10:24

was a bug. Oh yeah, it's not like all

10:26

that severe. Yeah, they found

10:28

some extra additional stuff. And he

10:30

further concludes with this. These were

10:32

absolutely not the last bugs to find or

10:34

report. Just while I was writing the

10:36

drafts for this blog post, we have

10:38

received more reports from security

10:40

researchers about suspected problems.

10:42

The AI tools will improve further and

10:44

researchers can find new and different

10:46

ways to prompt the existing AIs to make

10:48

them find more. We have not reached the

10:50

end of this yet. In other words, even

10:53

after Mythos, there's still just people

10:56

maybe empowered with security tools or

10:58

not finding more vulnerabilities. So, is

11:01

this the end? Can we definitively say

11:04

the zero days are numbered? Defenders

11:07

finally have a chance to win decisively?

11:10

No. I don't think we can say those

11:12

things. Do I think that the field of

11:14

security is going to be complicated?

11:16

Sure. It's going to be pretty dang

11:18

complicated. And I think as these tools

11:19

get more and more advanced and we're

11:21

able to run faster, not only does it

11:23

allow the defenders to, you know,

11:25

effectively have a better castle, if you

11:27

will, it also allows the offenders to

11:29

come at it in a much more aggressive

11:31

way. So, is this the end? No. Are we

11:34

like spiraling towards a

11:36

security doomsday situation? No, I don't

11:39

think we are. And for me, this is kind

11:41

of the cold water that needed to be

11:42

poured on the hype because it's really

11:44

difficult to understand because it's one

11:45

thing if someone... there's all these

11:47

Twitter users like, "Oh my gosh." Because

11:49

you probably remember this back

11:50

in the day when GPT-3.5 came out and so

11:53

many people are like, "Dude, your jobs

11:55

are over, software engineers. GPT-3.5 is

11:58

amazing." And you use it and you're

11:59

like, "Dude, what the hell are you even

12:01

talking about?" Like, yeah, that's cool,

12:04

but what? And so the nice part is that

12:07

because I could use it, I could actually

12:09

see the effects. But with Mythos, it's

12:12

kind of like the monster you don't get

12:13

to see. They're like, "Dude, just trust

12:14

me, bro. What's behind that door is

12:16

insane. It's so insane." And then you

12:18

got Mozilla like, "Dude, defenders have

12:20

a chance at winning definitively." Like,

12:23

those are crazy claims. Those are claims

12:25

that just feel pretty outlandish. I have

12:27

a particularly hard time believing any

12:30

of it. So, when I read this stuff, I

12:31

just realize like, yeah, okay, sure, it

12:34

could very well be better. And I assume

12:36

it's better. It's significantly larger,

12:38

right? It's all the latest and greatest

12:40

stuff, and it's a harness that's been

12:41

designed to do security stuff. So, one

12:43

would hope that it's actually pretty

12:45

good, but it's not something incredibly

12:48

special, something that has never been

12:49

seen before. It's just an iteration on

12:51

the things we already have. And yeah,

12:53

maybe the harness is a bit better. Maybe

12:55

how they have the thing set up can go a

12:57

little bit longer, a little bit deeper

12:59

with all the different projects.

13:01

Therefore, you'll get better results,

13:03

but it's not something that's completely

13:04

out of reach of other tools. I love that

13:07

conclusion because, A, it goes in the

13:09

face of Mozilla's just ridiculous

13:11

statement of the zero days are numbered.

13:13

Like, that is an absurd statement to

13:16

make. Like, there's there's just no more

13:18

security problems. You don't even have

13:19

to worry about security problems when

13:20

you got Mythos, baby. Like,

13:24

what are we doing here,

13:26

Mozilla? Absolutely absurd. But more

13:29

so, it just shows that there's still a

13:32

lot of human ingenuity and creativity

13:35

that is needed to be able to control and

13:37

drive these things to be able to find

13:38

new bugs. It's not just simply like,

13:40

hey, just fire and forget. It still

13:43

requires people to be actively in the

13:45

loop because even Mythos reported five

13:48

critical vulnerabilities and it ended

13:50

with one actual vulnerability. So again,

13:53

I feel like I've been saying this a

13:55

whole lot lately. There is still so much

13:58

room for expertise in our field and I

14:00

just feel that more than ever because

14:02

everybody can move so fast now, but

14:04

people don't actually know what they're

14:06

doing. It's not like

14:08

the average intelligence of a person has

14:10

risen. No, just how much you offload has

14:13

risen. That is it. So, yeah, Mythos is

14:15

probably better. I just assume it's

14:17

better. But by the degree that it's so

14:20

dangerous they can't even release it in

14:21

public, probably not. Honestly, probably

14:24

not. OpenAI has this kind of thing behind an

14:27

identification wall where you can get

14:29

access to a more security focused model.

14:32

You could probably get a lot of the same

14:33

things that you can go get with Mythos

14:35

right now from it. So, yeah, it's

14:37

probably just a small increment better.

14:39

But I will say I am a little tired of

14:41

this marketing. Like, the

14:42

non-stop marketing towards developers is

14:45

ridiculous because I mean let's just

14:46

face it. We're kind of the cash cow of

14:48

these models. Like, the people

14:50

keeping these models afloat are all of

14:52

us token users. I mean that should be

14:54

pretty obvious because Jensen over here

14:56

says he'd be deeply alarmed if his

14:58

$500,000 a year engineer is not spending

15:01

a minimum of $250,000

15:04

on tokens a year. Honestly, our world could

15:06

be so much better if Sam and Dario

15:09

weren't competing for who's going to be

15:10

the first trillionaire with AI. Like

15:12

that's just what we're saying that we

15:13

are the pawns in the situation and

15:15

they're just trying to compete with it.

15:16

And of course, that's why you see these

15:18

absurd tweets. That's why two years ago,

15:20

never forget hashtag Sam tweeted a

15:23

picture of the Death Star saying, "Hey,

15:25

we have a big announcement tomorrow.

15:26

Death Star [laughter]

15:29

AGI achieved." And now look at us.

15:32

That's two years later. It's still the

15:34

same crap going on. We're still seeing

15:35

the same headlines. We're the target

15:37

market, people. I just wanted to say all

15:39

these things. I wanted to read this

15:40

article because I honestly thought

15:41

Daniel brought some really great

15:42

perspective, which was, hey, these tools

15:44

are really fantastic. They've helped us

15:46

close several several several bugs and

15:48

actually even also closed out a bunch of

15:50

CVEs. We've really liked it. It's gone

15:53

from the I in LLM stands for

15:56

intelligence to, hey, this crap sucks

15:58

to, oh, hey, actually, no, it's starting

16:00

to get pretty dang good, all over the

16:02

course of 2 years. That's all I got for

16:04

you. Hey, this is a good article,

16:05

Daniel. I really liked it.

Interactive Summary

The video discusses the highly anticipated AI model Mythos and its supposed groundbreaking ability to find security vulnerabilities, contrasting initial marketing hype with its actual performance when tested on the Curl library. Daniel Stenberg, the maintainer of Curl, initially skeptical of AI, observed that while AI tools have improved and helped Curl find numerous bugs over time, Mythos, after extensive testing, only identified one low-severity vulnerability and about 20 minor bugs. The speaker and Stenberg conclude that Mythos, despite its significant marketing, does not offer a remarkably superior capability compared to other existing AI tools or human researchers, serving as a "cold water" reality check on exaggerated claims about AI's revolutionary impact on cybersecurity. The video also criticizes the pervasive and often misleading marketing of AI tools to developers.