Vibe Coding With GLM 5
In this video, I'm going to be vibe coding with the newly released GLM 5, a model that ZAI put out just a few hours ago.
This is the new flagship open source model that we've been
waiting for.
And in this video,
I'm going to share everything that you need to know about
this new model.
We're going to dive deep into benchmarks.
We're going to give it a full stack test and we're going to
put it through the new bridge bench to see how it scores on
the bridge bench relative to other frontier models.
But with that being said,
I have a like goal of 200 likes on this video.
And if you have not already liked,
subscribed or joined the discord, make sure you do so.
And with that being said, let's dive straight into the video.
All right. So the first thing that I want to dive into is the context of this model. So this is the staple from ZAI: it's, again, 202,000 tokens of context, which is consistent with other GLM models that we've seen.
But the really interesting factor here is the cost. You know, this is one of the biggest things with GLM models: the cost. These are very affordable models. $1 per million on the input and $3.20 per million on the output. That's like very, very affordable.
When I was using this on stream for day 132 of vibe coding an app until I make a million dollars, I was able to work with it for about an hour and I only spent about $6.
So in terms of cost,
this is an incredibly affordable model.
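By the way, if you want to sanity-check that math, here's a rough sketch of it in Python; the input/output token split is my own guess, not something I measured:

```python
# Back-of-the-envelope session cost at GLM 5's quoted rates.
# The token counts are illustrative guesses, not measured values.
INPUT_RATE = 1.00 / 1_000_000   # $1 per million input tokens
OUTPUT_RATE = 3.20 / 1_000_000  # $3.20 per million output tokens

input_tokens = 4_500_000   # hypothetical hour of agentic coding with big contexts
output_tokens = 500_000    # hypothetical

cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.2f}")  # ~$6.10, roughly the ~$6/hour seen on stream
```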
Now here's what ZAI actually says about the model. I want to highlight a couple of points rather than read the whole thing. One thing that they do say is that it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. So they are talking about Opus 4.6 and GPT 5.3 when they say that.
So the one thing that I do want to highlight, though, is the throughput, the tokens per second. You can see that it's actually running quite slow right now. I did see it earlier today running at about 36 tokens per second when I was on stream, but it's since slowed down a little bit.
They're probably experiencing a ton of demand since it's
launch day.
I'll be interested to see if that speeds up or slows down.
This, again, is only based off of the data from the last 30 minutes, as you guys can see. So it's a little bit slow, but I did see it at 36 tokens per second.
If we compare this with Opus 4.6, what you're going to see is that it runs at about 33 tokens per second at best. So for the past 30 minutes at least, Opus 4.6 has been running faster than GLM 5.
But in terms of cost, that is the biggest thing: Opus 4.6 is $5 per million and $25 per million respectively, and GLM 5 is just in a league of its own on cost. That kind of pricing is what we continue to see out of some of these frontier models.
But in terms of cost and speed, I will say right off the bat: incredibly affordable model; speed, kind of mid, I mean, on par with Claude. We'll see how it fluctuates over the coming weeks as they come online and get a little bit more stable. But on cost, this is an incredibly affordable model.
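And if you ever want to check those tokens-per-second numbers yourself instead of trusting a dashboard, here's a minimal sketch that times a streamed completion through OpenRouter's OpenAI-compatible endpoint; the GLM 5 model slug is an assumption, so verify it on openrouter.ai first:

```python
# Rough tokens-per-second estimate from a streamed completion.
# Assumes OpenRouter's OpenAI-compatible API; "z-ai/glm-5" is an
# assumed slug (check the real one on openrouter.ai).
import time
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

start = time.time()
deltas = 0
stream = client.chat.completions.create(
    model="z-ai/glm-5",  # assumed slug
    messages=[{"role": "user", "content": "Explain quicksort in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        deltas += 1  # each content delta is roughly one token

elapsed = time.time() - start
print(f"~{deltas / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```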
But with that being said,
let's take a look at some of the benchmarks.
Okay. So one quick thing to note is that GLM 5 has not yet been added to the leaderboards on LMArena. But on Artificial Analysis, which is a great benchmark to take a look at, we do now have access to GLM 5; it's been benchmarked on artificialanalysis.ai. And that's what we're going to take a look at here.
So again, we just covered speed, but you can see it's kind of reflective of what we already saw over on OpenRouter: you can see 47 tokens per second, whereas Opus 4.6 is at 66 and Claude 4.5 Sonnet is at 78. And then you have some of the Gemini models, which are just very, very fast. You also have GPT 5.2 extra high, which is ranking at 95 now. So on speed, it lags a little bit behind the frontier models.
But the biggest thing that I want to draw you guys' attention to, and it's very, very impressive here, is the Artificial Analysis intelligence index. I'm going to zoom in a little bit so you guys can see this better, but check this out: GLM 5 literally matches Claude Opus 4.5 on the intelligence index. It scores a 50; Claude Opus 4.5 also scored a 50; Claude Opus 4.6 scored a 53; GPT 5.2 scored a 51. And GLM 5 actually beats out Gemini 3 Pro Preview, which is just absolutely incredible. You have an open source model, with the pricing that it has, and it beats out some of the frontier labs. Very, very impressive.
Now, one thing I do want to draw your attention to, though, is the coding index. This is something that we saw, and if we take a look over, I'm going to go over to my X real quick, because you're going to see a little bit more information there.
So here are the benchmarks that ZAI shared.
You guys can pause the video to take a look at this as
well,
but this is going to be reflective of what I'm about to
show you too.
So if you look at SWE-bench Verified right here, you can see that it only scores a 77.8, which is obviously good; it beats out Gemini 3 Pro. But compare that to some of the frontier coding models like Opus 4.6, or in this case Opus 4.5, and it's still about three points behind, which actually is a pretty large difference.
And if you look at the Artificial Analysis coding index, the same thing gets reflected. You can see that it scored only a 44 on the coding index, which is still high up there, but still, you know, a decent amount below some of the frontier models.
So if you are on the frontier, if you're using Opus, using Claude models, using Claude Code, using Codex, this is still not going to be up to par in terms of coding with what you're used to, right, if you're using Claude or GPT. But the thing is, when you do look at the pricing, that's when things get a little bit interesting, right? It's like: hey, are you going to spend $10 on a task to have Claude Opus 4.6 do it, and maybe do it a little bit better? Or are you going to spend, you know, 20 cents to have GLM 5 do it? It's kind of an interesting trade-off here.
Also, one thing that you guys should note is that it performed very, very well on the Artificial Analysis agentic index. It scored a 63, beating out GPT 5.2, beating out Claude Opus 4.5, beating out Kimi K 2.5, and beating out all the Gemini models. I mean, that is incredible for an open source model.
The next thing that I want to show you guys is very, very impressive, and it's something that we will definitely be learning more about. Let me just find it; it's down here. Okay. Check this out, guys.
If we go to this one here, this is a very important benchmark. What this is, is basically the hallucination rate index, so a lower score is better. And what I want to highlight to you guys in this benchmark is that GLM 5 has scored the lowest hallucination rate of any model. It scored a 34%, whereas Claude 4.5 Sonnet, which is next, in second, scored a 47. You can see that even some of the Gemini models, I mean, they hallucinate like crazy; they're at 90%, 88%, whereas Claude Opus 4.6 is at 75%. But GLM 5? 34.49%. So in terms of hallucination, I am incredibly impressed by what they were able to achieve here. This is the best score that's ever been produced on this index. Very, very impressed by this benchmark.
But with that being said, I think this last one is going to sum it up. If we just take a look at the price, where lower is better, what you can see here is that GLM 5 scored a 1.6, whereas Claude Opus 4.6 scored a 10 and GPT 5.2 scored a 4.8. So this gives you the picture: if you're using GLM 5, you know, it's a 1.6 versus a 10. I mean, that is just insane, right? You're talking about a model that is incredibly affordable and incredibly capable. So if you are a budget vibe coder, this is going to be a model that you are definitely going to want to take a look at. It performs very well on the benchmarks and it's very affordable.
Okay, so the next thing we're going to do is actually vibe code using GLM 5 here inside of opencode, and we're going to give it some real-world vibe coding tasks in production for BridgeMind.
So one thing I do want to highlight: you guys may have noticed it's currently 8:30 a.m. The last clip that you guys just watched was yesterday, and I want to tell you guys why. I was putting GLM 5 through the bridge bench, and here are the results. These are the preliminary results; they're not yet available on bridgemind.ai, and I'm going to push this here in a minute. But GLM 5, after I used it yesterday and actually put it through the bridge bench, made it so that I couldn't even put out this video that you're watching right now. My initial plan was to put this video out yesterday, but it's now, you know, 14, 16 hours into the future. It's 8:30 a.m. February 12th, and you can now see that we do have the results from GLM 5. But look at this, guys.
So here are the initial results from the bridge bench.
Overall, you can see the scores: Opus 4.6 scored a 60.1, GPT 5.2 Codex got a 58.3, and then GLM 5 a 41.5. And I want to highlight the average response time for each of these tasks.
So what the bridge bench is, is it basically puts the model through six different categories: UI, code generation, refactoring, security, algorithms, and debugging. And it launches all the tasks concurrently and runs the model on all of them.
Right.
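Bridge bench's actual code isn't shown in this video, so here's just a minimal sketch of the pattern I described, with hypothetical names: launch every task concurrently and record per-task latency.

```python
# Minimal sketch of the pattern: fire all benchmark tasks concurrently
# and record per-task response time. run_task and the placeholder call
# are hypothetical; this is not bridge bench's actual code.
import asyncio
import time

CATEGORIES = ["ui", "codegen", "refactoring", "security", "algorithms", "debugging"]

async def run_task(model: str, category: str, prompt: str) -> float:
    start = time.monotonic()
    await asyncio.sleep(0)  # placeholder for the real model call
    return time.monotonic() - start

async def run_bench(model: str, tasks: list[tuple[str, str]]) -> None:
    times = await asyncio.gather(*(run_task(model, c, p) for c, p in tasks))
    print(f"{model}: avg {sum(times) / len(times):.1f}s over {len(times)} tasks")

tasks = [(c, f"{c} task prompt") for c in CATEGORIES]  # placeholder tasks
asyncio.run(run_bench("glm-5", tasks))
```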
And what I want to highlight is the average response time for each of these models. So: Opus 4.6, 8.3 seconds; GPT 5.2 Codex, 19.9 seconds; and then GLM 5, 156.7 seconds average response time.
Another thing is the completion rate. Look at this: it only completed 75 out of 130 of its tasks. So even though this thing is a benchmark beast, you know, we were just looking at the Artificial Analysis benchmarks, inside bridge bench, on real-world tasks, when you're actually using the model through OpenRouter or opencode, this is what we're seeing: very, very slow, and a very bad completion rate. So even though this thing is benchmark-maxing, hey, in actual practice it's not super reliable. It literally caused me to have to delay this video.
So I just want to show you guys, this is the bridge bench. I'm going to post this on the website, but in order to do that, we're actually going to be vibe coding with GLM 5 in this video. And we're going to be basically using GLM 5 to help us get the bridge bench deployed. So if we look here, here's the public leaderboard. Here's how you can find it: you go to bridgemind.ai, you go to community, and you go to bridge bench.
Okay.
This is placeholder data; we're running this locally. But what I want to do is, first of all, make it so this table has much better UI, right? So let's go over to GLM 5 here. Let's go here. And I'm going to use Bridge Voice; this is the voice-to-text tool that I use.
So let's go here. BridgeMind UI, localhost, bridge bench. I want you to review the BridgeMind UI and the bridge bench page, specifically the table that is on the bridge bench page. Right now this table has horizontal scrolling and it's terribly styled. I want you to completely reinvent the styling of the page to make it better, make it more modern, make it professional, and compact the table. Make it so that there's no scrolling, so that it's just better overall styling.
Okay.
So let's drop that in.
We'll see what it comes up with in the styling.
And now let's go over to bridge bench, because I want to show you guys another thing that I did last night, since I actually had a lot of time to work with the model after the last time you guys saw this. What I did is we're also creating something called a creative HTML benchmark. And you can see here that I have ZAI right here, GLM 5. And what this is, is, you can see if I go here, we have these MD files, right? And these MD files are prompts that we're going to start benchmarking each of these models with, right? So we basically give GLM 5 this prompt here, and it's able to generate a single HTML file, so that we can compare how these models actually do with styling on a very simple task, right? So this one is: create a single HTML file with a full-screen, animated lava lamp; use only HTML, CSS, and JavaScript. So it's a pretty good test of just being able to do that, right? It's like, okay, do this real quick.
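The harness behind this is basically a loop: read each MD prompt, ask the model for one self-contained HTML file, save it. Here's a sketch of that loop; the folder layout and model slug are assumptions, not my exact setup:

```python
# Sketch of the creative HTML benchmark loop: one .md prompt in,
# one self-contained .html file out. The prompts/ and out/ folders
# and the "z-ai/glm-5" slug are assumptions.
from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")
Path("out").mkdir(exist_ok=True)

for prompt_file in sorted(Path("prompts").glob("*.md")):
    resp = client.chat.completions.create(
        model="z-ai/glm-5",  # assumed slug
        messages=[{"role": "user", "content": prompt_file.read_text()}],
    )
    html = resp.choices[0].message.content or ""
    out_path = Path("out") / f"{prompt_file.stem}.html"
    out_path.write_text(html)
    print(f"wrote {out_path}")
```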
But let's go over here. What I want to do is drop in the bridge bench, and then drop in the BridgeMind UI. And I'm also going to drop in, let's grab this here, that same prompt. Opencode is decent sometimes, but other times, no. So let's paste this in.
I want you to add another section to this page: there should be tabs at the top that allow people to choose which benchmark, or which leaderboard, they want to see. So there's the bridge bench, which is the scores that you see and the current existing table that you see. That is the most important; that should be what people land on by default. That is the official bridge bench.
But if you look at bridge bench, you will notice this creative HTML styling and all of the prompts that are included with it. This is going to be an additional layer of bridge bench. And for this, I want you to update this page so that it has this tab system and this new tab. And in this new tab, I want you to display all of the system prompts that we're putting to the test.
So let's go here and let's actually just paste this in.
Okay, it's pasted.
So let's do that and then I'm gonna give it,
so you need to review bridge bench and review the system
prompts for the HTML and just make sure that you update the
page that has this tab system and then it also includes the
system prompts and allow people to copy the system prompts
as well.
So we'll just start there.
Now, the one thing that you do wanna look out for is that these are working on the same pages, right? So they could overlap, but we're just gonna send it anyway and see if it can do it. But one thing that's very important, and that you guys are even seeing right now with this task that I just gave it: improve the styling. This is a super easy task.
And even though, like at the start of the video, I already see a lot of people hyping this model up because it is a benchmark beast: it performs very well in benchmarks like Artificial Analysis, you know, it tops the charts, it performs well on not hallucinating. But the issue here is that, hey, in practice, from what I'm seeing of this model, and hey, it is day two, we can't be too hard on it, right? But the problem is that it's super, super slow and it's not very reliable. And that'll probably change over time; obviously it's day two of them launching this. But it's like, hey, when Opus 4.6 launches, it's reliable right out of the gate. These frontier models, like, hey, we get GPT 5.3 Codex, it's usable, it's workable. It's usable in practice. Whereas with GLM 5, honestly, it's just not doing that great of a job for me. So that's what I've noticed so far in terms of how it actually works in practice. And it's kind of disrupted this video, to be honest, because it's so slow and so unreliable that I'm having a hard time even making this video.
I mean, yesterday I was already making the video that you're watching right now. And I was like, okay, now we're gonna put it through the bridge bench, and, I mean, look at this: it was taking almost three minutes to do every single task. So, you know, compound that over 130 tasks and you guys can do the math real quick. I was literally waiting for this benchmark to complete for hours, and yeah, it's ridiculous. It did not do a good job.
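If you actually run that math with the averages from the bench, it's brutal:

```python
# The math from the bench numbers: 130 tasks at GLM 5's average latency.
avg_seconds = 156.7
num_tasks = 130
sequential_hours = avg_seconds * num_tasks / 3600
print(f"{sequential_hours:.1f} hours back to back")  # ~5.7 hours
# Even launched concurrently, retries on the 55 of 130 tasks that failed
# and provider slowdowns stretch the wall-clock time into hours.
```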
It did not do a good job.
So, you know, it's a little bit disappointing to see that,
right?
But I mean, we'll keep giving it a shot.
It's just slow.
It's just very slow.
So, that's a little bit frustrating.
Whereas, you know, hey, let's go here, right? I wanna show you guys this. So, this here, this is a Cursor instance that has actually been going through it, and it looks like it's done now, which is, thank goodness. But look at this. This is a perfect example of it. So, I basically go back over here, right?
And the prompt initially was to test this model in this, right? I was testing it in this creative HTML benchmark, right? And you can see, if we go here, it's basically putting GLM 5 to the test, and we're using Opus 4.6 to guide GLM 5 through this test, right?
And it's saying: good, two down, prompt two was 22,000 characters in 150 seconds, let me keep waiting, this model is taking over two minutes per prompt. So it slept for 180 seconds, then 300 seconds, then 600 seconds, then 600 seconds, and then another 600 seconds. You can now see that all 10 HTML files are complete. But hey, it literally just took 30 minutes to get 10 HTML files. So we can now stop this task, because it's done.
just like compare models,
because I know that there's going to be a lot of goofy
content creators that are hyping this up, which yeah,
this thing's a benchmark beast.
It's very affordable.
But hey, in practice, if you're a serious vibe coder,
this thing is just not very reliable because watch this.
So, I just put this one through, right? And what I want to do is say: hey, here are the prompts, here's our output. Review this. We now have our 10 completed HTML files from GLM 5. I now want you to test it and put it through the benchmarks with this model. And now I'm going to go over to OpenRouter here, and I'm going to drop in Opus 4.6, and you guys are going to see, I literally guarantee you, this model is going to be insanely fast. So watch this.
Anthropic 4.6, okay? Watch how fast this goes. I bet this will complete in under 60 seconds; that's going to be my guess. So it's sleeping for 60 seconds, and it's going to check on it periodically. So we'll come back to that, but my bet is that it's going to complete in under 60 seconds.
And now, if we go back here, what you guys are seeing is, look at this: error, file not found. Users, desktop, bridge mind, bridge mind UI styling guide. This thing is still looking at the styling guide and thinking about its approaches. This one's finally actually writing code. But hey, if you're a serious vibe coder, you are not going to be using GLM 5, at least in the current state that it's in.
one of the reasons that we're seeing this,
if we pull up OpenRouter here, look at the uptime.
Look at the uptime.
Look at how bad it is.
Uptime, 62%, 94%, ZAI, 97%.
It's better from ZAI, but the uptime, uptime not available,
77%, 73.69%.
Whereas if you look at Opus 4.6,
it's going to be like a hundred percent across the board.
Yeah, 99.6, 99, 99.99.
Like you're just not going to have those issues with some
frontier models.
Like, these Chinese laboratories, whatever, they can't get their act together. You know, they just launched, so we'll give them some grace. I think looking at GLM 4.7 would probably give us a little bit of a better reference, right? Let's look at GLM 4.7. So uptime: 94%, 91%, 99.6. That's better. What about from ZAI directly? Are they not even offering this anymore on OpenRouter? Oh, six more, okay. ZAI at the bottom here: 97%. So it's like a 97%, and that's not 99.9, right?
Not like what you're seeing from Claude. So it is a thing that they sometimes struggle with uptime on these models. I'm not sure why, but, I mean, yeah, this is what we're seeing right now.
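One practical workaround when providers are this flaky is OpenRouter's provider routing. As I understand their docs, you can pass a provider preference order and let requests fall back when a provider is down; treat the exact fields in this sketch as something to verify:

```python
# Hedged sketch: OpenRouter provider routing to prefer one provider and
# fall back when it's down. The "provider" options reflect my reading of
# OpenRouter's docs; verify them before relying on this.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "model": "z-ai/glm-5",  # assumed slug
        "messages": [{"role": "user", "content": "hello"}],
        "provider": {
            "order": ["z-ai"],        # try the first-party provider first
            "allow_fallbacks": True,  # route elsewhere if it's down
        },
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```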
It's like, yeah, this model is slow. So I may have to pause this video so that it can actually go through.
Now, it looks like I was incorrect; it looks like this is a deep thinker. Let me give it more time. So, it's done with the lava lamp. And I'm gonna give this some time, because I want to make it so that these actually finish, so that you guys can at least see what it created. But if we just waited here, I kid you not, this is probably gonna take 10 minutes to even get through two prompts.
So I'm gonna cut to the point where these actually finish. Right now it's 8:41. We'll see how fast these are, but it's definitely slow and unreliable right now. I think we'll give it a little bit of a break, but I'll come back to you guys once these are actually complete, and we'll review what it comes up with.
All right, guys, so it's been about another 10 minutes. It's now 8:50, these are still working, and they weren't able to make the changes. Here's what we're looking at: it basically did nothing. And I don't know, I don't think that this is a really great model in practice. So even though these are still working, I want to interrupt them, because we'll just use Opus 4.6 for that; it won't be that difficult with Opus 4.6. So in practice, I'm not seeing GLM 5 produce the results that I need as a vibe coder. In practice, not reliable and not fast enough; at least, that's my perspective on it right now.
Now, back to this here. So, I will say I was wrong: Opus is taking longer than 60 seconds for this HTML benchmark. But since this is done, I think this will be a good opportunity for you guys to see the differences between the HTML files and the creativity produced by Opus 4.6 versus GLM 5. So you can see that in the past, what has it been, about 10 minutes I guess, Opus 4.6 was able to create six of these so far, and it even created a nice table for us. You can see that GLM 5 did this one in 103 seconds versus Opus 4.6's 95; then it's 150 versus 117, 146 versus 106, and 128 versus 116. So you can see that on time, Opus 4.6 is much faster than GLM 5.
Now, the big question: let's actually compare some of these HTML files, right? So let's drop this one in, and then let's drop this one in. So let's do the lava lamp, and let's check out the coffee being poured. I want you to start these and give me the URL, so that I can actually see these HTML files for review. I'll put them in a list. Okay, so we're gonna output this in a list, and then we're gonna actually take a look at these, so that we at least have something to go off of for what GLM 5 is actually able to produce. So what we're seeing is, okay: benchmark beast; in practice, though, just not great.
So, okay, let's check out this lava lamp. We'll do side-by-sides. Let's take a look at the Opus 4.6 lava lamp. Okay, wait, hold up, I put in the wrong thing. Okay, there we go. So here is the lava lamp produced by Opus 4.6, okay? Here's the lava lamp; pretty cool. And then let's now take a look at the lava lamp produced by GLM 5. So let's just go over here, cancel out of this, go here, and drop this in. There we go. All right, lava lamp. Oh my gosh, this is killing me. Okay, hold up. Lava lamp. There we go. Okay, so let's drop this in. There we go.
Okay, so I think this is actually a good example. Here's the difference in the lava lamp between GLM 5 and Opus 4.6. You guys can be the judge of which one you think is better. I think another good one is the coffee being poured; let's look at that one. And again, hey, I did have to wait literally 30 minutes for GLM 5 to come back with all of these, so we're only now finally able to see them. Here is the coffee being poured by Opus 4.6, and here is the coffee being poured by GLM 5. Let's take a look at these.
Yeah, so I don't know, guys. I mean, you guys can be the judges of this, right? But even right off the get-go, in my opinion, I mean, just look at the flow. Let's refresh this, right? I mean, I don't know; you guys can be the judge of it. Let me know what you think in the comment section down below.
Let's also see this aquarium fish tank; I want to see this one. Same thing: only give me the URL to go to. So what do you guys think so far? We'll take a look at the aquarium, but here's the lava lamp, and here is the coffee being poured. Let's now take a look at this aquarium fish tank. This will be available in a moment; I'm literally going to have to use Opus 4.6 to be able to get it available. Here's the aquarium fish tank from Opus 4.6, and then here's the aquarium fish tank from GLM 5.
Similar here. You guys can be the judges of this, but even with stuff like this, look at the flow of these down here. Do you guys see how these are flowing, and how those over there are kind of still, right? Whereas, okay, these are moving really well. I mean, you guys can be the judges of it; I won't, you know, give too much away here. But here is now the solar system.
Same thing. Okay, solar system. And then we'll also do the thunderstorm over the city. Okay, solar system. Here is the solar system from, okay, great. All right, same thing. Solar system. Oh wow. Okay. So, solar system from Opus; solar system from GLM 5.
I mean, uh, there was more here, "a vision of celestial mechanics," and it drew, like, literal mechanics. I don't know why it put the mechanics in the sun, but, uh, you guys can be the judges of these. So you have the solar system, the tropical fish tank, the morning coffee, and the lava lamp.
In my opinion, I think that Opus 4.6 is better. And if you go back to the benchmarks, that is what you see. You do have that cost trade-off where, yes, GLM 5 is much more affordable. But in practice, no serious vibe coder is going to be using GLM 5, at least right now, with the speed that it has. Speed is becoming more and more important. And even though GLM 5 is, yes, much cheaper, it's also just cheap in quality, in my opinion. Opus 4.6 is incredibly reliable, it's much faster, and it's actually going to do the tooling correctly so that it can get what you need done.
GLM 5, on the other hand, at least today, from what we're seeing so far, has not been able to help me with actual tasks, right, actual vibe coding tasks: hey, I need you to update this UI, I need you to add this page, I need you to add this tab. Nope. Twenty minutes later, I still have nothing to show for it.
So, you guys can be the judge of this. I think we're definitely going to want to keep using GLM 5 over the coming week to see if this gets better. But, I mean, you guys saw it firsthand: I was trying to make this video yesterday, and if you go back to the bridge bench, this is what I experienced, okay? Massive response times, very slow, and on completion, it wasn't able to finish my tasks for the bridge bench. So, this is what I'm seeing. You guys can let me know what you think in the comment section down below. But GLM 5, I think if you're a serious vibe coder, it's probably just going to be a little bit hard to use right now. So, we'll continue to stay up to date with this.
I am impressed by the trajectory of open source models.
You know, one thing that I'll share with you guys before we finish off the video is a chart that somebody from the BridgeMind community actually shared with me. And I think it tells a really, really interesting story about these open source models. Let me find this real quick and show it to you guys, because I think it is important to understand the trajectory of these open source models. Here it is.
So, look at this. This is the frontier lag analysis, the crossover that people are anticipating, and what you're looking at is open versus closed source models. You can see here that it goes all the way back to April 2025. Here's where some of the open source models come in: here's DeepSeek, here's ZAI. And basically, what this chart is projecting is that there's going to be an inflection point in June of this year, where people really think the open source models are actually going to start outperforming the closed source models.
I don't know what you guys think about this. It says the projected crossover is June of 2026, but I actually don't agree with it. I think this is going to change; I think the frontier models are actually going to get way better. And yes, it's possible to create these benchmark beasts. But when we actually use the models in practice, sometimes they're just not very performant; they don't perform as well as some of these frontier models. So, you can put a model through a benchmark, and you can see how it performs on SWE-bench and all these other Artificial Analysis benchmarks. But the most important thing is, hey, how does it work in practice for a real vibe coder? And as a vibe coder, I noticed immediately: slow, unreliable, unusable, for me personally.
So, that's what I'll share. I do think this will get better; I think GLM 5 will become faster and more reliable as they get more GPUs online, or whatever they need to do. But so far, I'm not impressed in my actual use case. And I do think that Opus 4.6 is better, especially when you look at some of these examples that we have. In real practice, Opus 4.6 is actually able to work with me, whereas GLM 5 is not reliable.
So, I'm not going to certify GLM 5 as a BridgeMind certified model, but I will be putting it in the bridge bench. You guys can go check it out at bridgemind.ai. And if you guys liked this video, make sure you like, subscribe, and join the Discord community. And with that being said, I will see you guys in the future. Thank you.