Vibe Coding With MiniMax M2.5
In this video, I'm going to be vibe coding with the newly released MiniMax M2.5.
This is a model that MiniMax released yesterday; you can see here, created February 12th, 2026.
The context window is 204,000 tokens, and the pricing is 30 cents per million input tokens and $1.20 per million output tokens.
So an incredible model.
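Just to put that pricing in perspective, here's a minimal sketch of the cost math at those rates; the session token counts below are hypothetical numbers for illustration, not real usage data:

```typescript
// Cost math at the quoted MiniMax M2.5 rates:
// $0.30 per million input tokens, $1.20 per million output tokens.
const INPUT_PER_MILLION = 0.3;
const OUTPUT_PER_MILLION = 1.2;

function sessionCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_PER_MILLION
  );
}

// Hypothetical heavy agentic session: 2M tokens in, 300k tokens out.
console.log(sessionCost(2_000_000, 300_000).toFixed(2)); // "0.96"
```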
We're going to be doing a deep dive today on the
benchmarks.
We're going to be putting it through my real-world vibe coding workflow on production tasks for BridgeMind.
And I'm going to be showing you guys how it performs on the Bridge Bench,
which is a newly released vibe coding benchmark that I've created and that is going to be open sourced soon.
But we're going to be really putting this model to the test
to see how it actually performs.
Because as you guys know,
just because a model performs well on the benchmarks does
not mean that it's necessarily going to be that performant
in real world scenarios.
So with that being said, I do have a like goal of 200 likes on this video.
If you guys have not already joined the bridge mind discord
community, make sure you do so.
This is a community of over 5,000
builders that are shipping daily using vibe coding in their
workflow.
These are the people that are on the frontier of the AI
revolution.
This is the place to be in 2026.
So if you have not already joined this community, make sure you do so.
There's going to be a link in the description down below.
And if you haven't already liked and subscribed,
make sure you do so.
And with that being said, let's get right into the video.
All right.
So the first thing that I want to cover is the benchmarks that were provided directly by MiniMax.
And what I want to draw your guys' attention to is the fact that MiniMax scored an 80.2% on SWE-bench Verified, which is only 0.6 away from Claude Opus 4.6.
So this is very impressive.
They also released benchmarks on SWE-bench Pro.
It performed only 0.2 below GPT-5.2, just about matched Claude Opus 4.6, and is about 1.3% lower than Claude Opus 4.5 on SWE-bench Pro; very impressive here.
And one thing I want to highlight as well is the jump from MiniMax M2.1 to MiniMax M2.5.
MiniMax did a fantastic job here; look at the difference between the M2.1 iteration and 2.5.
You can see that in all of these benchmarks, they absolutely crushed it in terms of the improvement from their last model to this model.
Now, Vibe Pro is apparently a new benchmark, and you can see it's up there again with these frontier models.
So it's one thing to look at these benchmarks and say, oh wow, this is hitting 80.2% on SWE-bench Verified, which is a critical benchmark that I personally look to and think is very, very important.
So this is what we're seeing initially out of the benchmarks provided by MiniMax, but I think it's a little bit more important to also check the benchmarks from Artificial Analysis.
So this is another great benchmarking tool.
And here you can actually see the Artificial Analysis Intelligence Index.
This is important.
And what you guys see here is that it actually does not perform well on this intelligence index.
It gets a 42, where GLM 5 scored a 50, Opus 4.5 scored a 50, and GPT-5.2 Extra High scored a 51.
Then the coding index; I saw this and I was like, oh wow, that's interesting.
It scored a 37 on the coding index, where GLM 5 scored a 44 and some of the frontier models are getting 48 and 49; it actually performed not that great.
It performs worse than Sonnet on this coding index.
So I thought this was interesting, because it performs very well on SWE-bench Verified, but then on the Artificial Analysis coding index it scores low.
So that's one thing to note.
Now, here on the agentic index, which measures agentic capability benchmarks, it scored pretty well.
You can see GLM 5 is actually the top model on this benchmark, with MiniMax M2.5 falling four points behind Opus 4.5 and GPT-5.2.
So definitely something to note there.
But also, one thing I always look at is the hallucination rate.
So if we go down here in these benchmarks; I always look at this, and hold on here.
GLM 5 is at 34%, and where is MiniMax M2.5?
I don't even see it.
Okay: 88%.
So this thing's going to be hallucinating all over the place.
And I actually have an example of that later; but yeah, a high hallucination rate, 88%.
But I mean, you could make the argument that a lot of these models have high hallucination rates, right?
Still, 88% is pretty high, and you can see Claude Opus 4.5 is way down here at 57%.
So this model will hallucinate.
It's just interesting, in terms of the benchmarks, to see it get a 37 on this coding index; and then you look at SWE-bench and you're like, okay, 80.2; something's not adding up here.
So I don't know what's up with that.
Let's now check speed.
What are we looking at on OpenRouter?
So it looks like 25 tokens per second; that is slow.
I mean, the uptime is good, 99% uptime, so they did do a better job than GLM 5.
The uptime on GLM 5 yesterday was absolutely awful, and the speed was absolutely awful too.
Let's even look at GLM 5; it's still not great.
Look at the uptime on this model: 13 tokens per second, 12 tokens per second, 33.
What's it actually at from Z.ai? What are they running GLM 5 at?
Okay, they're doing 29 tokens per second.
So MiniMax M2.5, if we look directly from MiniMax: it's about on par with GLM 5 speed-wise, but it's slower than Opus 4.6 and the GPT models.
So definitely something to note there.
But with that being said, I actually want to show you guys next a really important example that I ran yesterday on the Bridge Bench.
So if you want to check out the Bridge Bench, I think it's going to be a really useful tool for us to be able to test models.
I will say one thing; let's actually zoom in a little bit.
When I put MiniMax M2.5 through the Bridge Bench, it performed pretty well.
It scored a 59.7 on the Bridge Bench, so 0.4 away from Claude Opus 4.6 on vibe coding tasks, and it was able to complete 100% of the tasks.
And then the cost; I mean, the cost for this model is obviously going to be a huge aspect of the argument for using it.
You can look at the context and be like, ah, the context isn't great; but the cost, oh my gosh.
This is the lowest-priced model on the market right now, in terms of a model with frontier-level claims that also has pricing this good.
So if you look at the Bridge Bench, you'll see that it only cost 72 cents for me to run it through all 130 tasks.
But note the speed: 25.8 seconds on average per task, whereas Opus 4.6 was 8.3 seconds.
But if you haven't checked the Bridge Bench out, I would definitely suggest that you do.
Also, one thing that I've added to the Bridge Bench that I think is really going to be helpful for everyone: look at this.
There's also this Creative HTML section, and this will give you a visual for the model's capabilities, right?
And I actually have a video; hold on, this is a very helpful video.
Let's go over here.
So this is the MP4, and we're actually going to be building in this video: we're going to use MiniMax to basically create the ability to export MP4s from that Creative HTML section.
But we'll get into that a little bit later in the video.
But for example, I put the models in the Bridge Bench through this Creative HTML assignment, and I want you guys to see it; we're going to scroll back in.
So we have GLM 5 here, MiniMax M2.5 here, Claude Opus 4.6, and then Gemini 3 Pro.
Let's just play this, and you guys are going to see something very interesting; it's going to feed into the context and the argument of these models being a little bit cheap.
Okay.
So look at MiniMax M2.5 here.
The task was to create a neon open sign, and you can look at Claude Opus 4.6 here; it absolutely nails it, right?
Perfect brick background, perfect neon sign; it even has effects for the radiating light, which is insane.
GLM 5, it's just crazy; even look at the font that it used for drawing "OPEN." That isn't that great, but then look at MiniMax M2.5 on this.
Look at this.
Look at how bad it is.
And I think this is why the Bridge Bench is so important, and why the other things we're going to be doing are so important: because you can have models that are benchmark beasts, but then in actual practice they're complete garbage, right?
So there are all these goofy AI influencers hyping up these models, like "MiniMax M2.5: Opus performance at a fiftieth of the cost," right?
And it's like, that's not really the case.
Look at this.
It may be doing well on the benchmarks, but why, in one shot, did it produce this on this Creative HTML assignment, where Opus produced this and Gemini 3 produced this, right?
And I think that's a very, very interesting example: it spaced the letters out as "OBN," and that's not "OPEN," right?
And you can see Gemini 3 Pro and Claude Opus 4.6 did amazing on this task, right?
That's a good task; that's a good example.
Now let's look at "thunderstorm over city."
I'm pretty sure this one actually did a pretty good job.
So if you guys look at the thunderstorm over the city, it didn't do as badly on this one; it actually did a pretty good job here.
Let's zoom this out a little bit on this screen so you guys can see it a bit better; but yeah, this did a pretty good job.
GLM 5 also did a good job on this one, and Opus did a good job on this one.
So I really like what MiniMax did for this one, but let's look at another one.
Let's look at the aquarium fish tank.
I did see this one, so this was interesting.
Look at this one; this one is a good example.
It's all glitchy, and the way the fish are swimming; I don't know, maybe these fish were born with some special things about them, but look at this: why is this one swimming backwards?
You know what I mean?
It's things like that where you look at it and then say, okay, well, let's look at what Opus 4.6 produced.
Look, right: all of the fish are swimming in the right direction, and these weeds down here, the seaweed, is flowing naturally.
Whereas with M2.5, it's all spazzing out and the fish are swimming all over the place.
And when you see stuff like this, it kind of brings up the argument: okay, it scored an 80.2 on SWE-bench Verified, but in actual practice, my fish are going to be swimming upside down.
Right.
And that's the important thing to know.
And I think that's another thing: what I want to be for this community, and for this vibe coding revolution in general, is somebody that people can actually go to who is not going to be hyping up new models.
I think there are a lot of people that create videos, put them on YouTube, and are just hype, hype, hype.
That's not going to be me.
I'm going to do my absolute best to really put these models to the test so that you guys get a clear view of what you're going to be seeing in real-world scenarios.
So this is another good example.
Look, this is the retro space one, and you guys can see all of this at bridgemind.ai, but this is what Opus 4.6 produced.
Okay, this is insane.
So this is what Opus 4.6 produced for Space Invaders, right?
You have insane effects; absolutely incredible.
Look how good it is; look down here, it's dodging the bullets.
I mean, that is insane how good this is.
Right.
And then you look at MiniMax; I haven't even looked at this yet.
I mean, look at how bad this is.
Look at the bullets: they're all going diagonal, which, I mean, it's all right, but you can see it noticeably.
The score is going up, but why are all the aliens coming in from the far left, right?
Even stuff like that; weird details, like why are the bullets so diagonal?
Just weird stuff like that.
Whereas you compare with Opus and it's like, whoa, this literally made a perfect Space Invaders game: the score is going up perfectly, and there are animations and effects.
And I think that's something that people need to look at.
So when you look at these benchmarks and it's like, oh, well, this scored 80.2; well, in real-world practice, is this thing just a benchmark beast, or are my fish going to be swimming upside down and is my Space Invaders game going to be shooting diagonally, right?
So you can make the argument that for the price difference, there's going to be an argument there.
This model did perform very well on the Bridge Bench; in terms of being able to complete these tasks and earn a good score, it did exactly that.
But now, with all of these benchmark overviews done, let's actually dive in and start building a very important feature: using MiniMax to solve bugs and work across the entire stack to build production features for BridgeMind, to test whether this is going to be a model that is capable in my personal production vibe coding workflow.
Let's get into it.
Okay, so I now have OpenCode opened up.
And what you guys can see is that I've connected MiniMax M2.5 via OpenRouter.
But one thing that I do want to cover, just so that everybody knows: if you guys are a budget vibe coder, OpenCode literally offers this model for free via OpenCode Zen, which is incredible.
So what I'm going to do for this vibe coding test is use OpenCode Zen for free on the right, and we're going to use OpenRouter on the left.
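For anyone wiring this up themselves: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a minimal sketch looks like the following. The model slug "minimax/minimax-m2.5" is my assumption here; check the OpenRouter models page for the real one.

```typescript
// Minimal sketch: calling MiniMax M2.5 through OpenRouter's
// OpenAI-compatible chat completions endpoint.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "minimax/minimax-m2.5", // assumed slug -- verify on openrouter.ai
    messages: [{ role: "user", content: "Say hello." }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```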
And the first thing that I want to do: there was a bug that I had, and I just want to see, okay, how it does on simple bug finding.
Look at this issue here; let's see if we can fix this one, just very easily.
I mean, this is a very simple issue; it's a hydration issue over on the BridgeMind UI.
And let's just paste this in and see what it is able to come up with.
Just paste it in.
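For context, a Next.js hydration error usually means the server-rendered HTML didn't match what React rendered on the client, often from something like a timestamp, random value, or window check. Here's a generic sketch of the standard fix pattern, not the actual BridgeMind code: defer the client-only value until after mount so both render passes agree.

```tsx
import { useEffect, useState } from "react";

// Generic hydration-mismatch fix: render a stable placeholder on the
// server and on the first client pass, then swap in the client-only
// value after mount. Component and values are hypothetical.
export function LastUpdated() {
  const [now, setNow] = useState<string | null>(null);

  useEffect(() => {
    // Runs only on the client, after hydration has completed.
    setNow(new Date().toLocaleTimeString());
  }, []);

  return <span>{now ?? "--"}</span>;
}
```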
Now, the next thing I want to work on: if we go over to BridgeMind; I kind of showed you guys this example, right? This video that you guys are seeing here was actually made using Remotion.
So I've been working on understanding better agentic workflows, and I was able to give a Claude Code instance Remotion skills and have it create this based off the HTML files that were produced in another project, right?
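For anyone who hasn't used Remotion: it renders React components frame by frame into video. Here's a minimal sketch of the kind of setup involved; the component names and numbers are hypothetical, not the actual BridgeMind project. Notably, if fps or durationInFrames don't match the source animation's real timing, the rendered video plays back sped up, which may be what happened with my comparison clip.

```tsx
import React from "react";
import { Composition, useCurrentFrame, interpolate } from "remotion";

// One tile of a hypothetical comparison grid: fades in over the first second.
const ComparisonGrid: React.FC = () => {
  const frame = useCurrentFrame();
  const opacity = interpolate(frame, [0, 30], [0, 1]); // 1s fade at 30 fps
  return <div style={{ opacity }}>model comparison grid goes here</div>;
};

export const Root: React.FC = () => (
  <Composition
    id="BridgeBenchComparison"
    component={ComparisonGrid}
    durationInFrames={30 * 15} // 15 seconds...
    fps={30} // ...only if this matches the source animation's timing
    width={1920}
    height={1080}
  />
);
```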
But what I want to do is, I think it's probably possible for us to create basically a function here.
And this is going to be a really difficult task, to see if M2.5 can do it; I think that Opus would be able to achieve this, based off of just my personal experience.
But what I want to do is, if we go over to the Bridge Bench: I want to create a new feature where people are able to click a button and select which models they want to compare.
And it's then going to produce an MP4 video with a grid layout like this.
Do you guys see what I mean?
So this was very complex for me to do in Remotion.
And one thing that I noticed, if you guys saw it, is that it's all sped up and fast-paced.
I don't know why; I was doing this last night with Remotion, and it's just not quite right.
You know what I mean? It's just a little bit weird, right?
And I don't want to be the only one that can produce these comparisons, because I think it would be a great marketing opportunity for people to be able to literally create those comparisons with the click of a button.
Nobody else is doing that, right?
So I want to do this: I want to create a button that I can just click; select, say, compare Opus 4.6 with GLM 5, then click that button, and it creates a grid layout as an MP4 video from the HTML provided, with the BridgeMind logo at the top.
That's going to be great for marketing purposes.
So I want to see if I'm able to do this with MiniMax M2.5.
I think that I would be able to do this with some of the frontier models, so testing it with MiniMax is going to be a great opportunity, because it answers: for actually difficult tasks, is this thing going to understand what I'm trying to accomplish, right?
Let's paste this in, and I'm going to use BridgeVoice to prompt it.
This is the official BridgeMind voice-to-text tool.
Also, what you guys are seeing is that I'm actually using BridgeSpace, which is the ADE, the agent development environment.
Currently it's in beta and we're fixing some performance issues, but that's the ADE that I'm using right now.
But let's give this a prompt real quick.
I want you to review this page in the BridgeMind website,
and I want you to take a particular note of the creative
HTML page.
I need you to build a new feature where users are able to
click a button that says produce MP4 comparison.
And when they click this button,
it should show a dropdown of a multi-select menu for the
user to be able to select which models they want to compare
for that particular HTML test.
They should be allowed to select as many models as they
want.
And there needs to be a utility where they're able to
select which models they want to compare and then click
create MP4.
And when they click that button,
it's going to produce an MP4 comparison that allows the
user to compare the models in a grid layout that is branded
bridge bench with the BridgeMind logo.
You can check out this image as reference so that you
understand something similar to what I want to produce.
Okay, so we're going to drop this in, and this is what BridgeVoice just outputted.
So if you guys aren't using voice-to-text tools and you want to be a serious vibe coder: I see people doing whatever, right? And I'm like, why are you not using voice-to-text tools yet?
If you're not using voice-to-text tools, that is insane to me.
So what I want to do now, hold on: I'm going to do one thing in BridgeVoice real quick.
So this is BridgeVoice.
I'm going to clear my shortcut real quick here and change it to a different shortcut, and then I'm going to take a screenshot.
Oh, I need to launch CleanShot X here.
All right, so I use CleanShot X for screenshots; it's just a very easy way to do screenshots.
I'm going to now take this screenshot of the grid layout, and here's what I'm going to drop in.
And this is even going to give us a good look at whether it's possible.
Okay, I think that this model takes in images, correct?
I mean, some of these models don't take in images.
We're not even going to put it in plan mode; we're just going to check out its one-shot capabilities.
So is it able to see it? And the other one's still working; we'll be able to see that.
But this one says, "I'll explore the BridgeBench page and understand its structure first."
And I will say, if it's able to accomplish this in one shot, I'll be very impressed.
This is a very difficult task.
To be able to compile HTML, in the animated way that it is rendering, into an MP4 video, and to be able to download that; think about it.
You're talking about capturing; you're talking about MP4 generation.
This is a difficult feature.
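For a sense of what's involved: the standard in-browser approach is to capture a canvas as a MediaStream and pipe it into MediaRecorder. Here's a minimal sketch, with a hypothetical element id; note that MediaRecorder in most browsers emits WebM, not MP4, so a real MP4 needs an extra transcode step.

```typescript
// Record a canvas animation in the browser. MediaRecorder typically
// outputs WebM; an MP4 requires transcoding afterwards.
const canvas = document.querySelector<HTMLCanvasElement>("#demo-canvas")!;
const stream = canvas.captureStream(30); // capture at 30 fps
const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });
const chunks: Blob[] = [];

recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks, { type: "video/webm" });
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = "comparison.webm"; // WebM unless you transcode
  a.click();
};

recorder.start();
setTimeout(() => recorder.stop(), 15_000); // stop after 15 seconds
```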
Now, this one on the right here took five minutes; you can see it took five minutes to complete.
This was using, again, OpenCode Zen; the free version that's offered via OpenCode.
Let's see if it was actually able to fix that hydration issue.
Let's go over here and launch localhost again.
And look at that: it was not able to fix our hydration issue.
We're getting the same exact issue.
So I think that a good test for this would be to say: well, it didn't fix it, right? It didn't fix it with MiniMax M2.5.
But if we change the model and switch to a frontier provider, like Opus 4.6, is that going to be able to fix it, right?
"I need you to, hold on; fix this error. Okay, I need you to fix this error."
And this is going to use Claude Opus 4.6, right?
So this is going to give us a good idea: okay, it didn't work with MiniMax M2.5; well, is this just a really difficult error to solve?
That's going to be up to Opus 4.6.
I already saw it; look how fast that was.
That is insane.
So Opus 4.6: 20 seconds. 20 seconds on Opus 4.6.
And we refresh this: no more error. No more error.
So this example right here, I think, is a good example, okay?
And this matches my personal experience with some of these Chinese models; if you guys have ever heard the phrase "made in China," I think that it applies here.
The only thing that I don't understand is that this model performs so well on the benchmarks, right?
You look at the benchmarks, even the Bridge Bench, and you're like, okay, it performs so well, right?
Well, why are the fish upside down?
And even with this one, right; this is another example: the MiniMax M2.5 lava lamp. What's up with it?
Why is it just a square? Why is the lava lamp just a square?
And that's why we're creating the Bridge Bench, why I like to run these models through it, and why I can tell you guys right now: I am not going to be just another AI influencer saying "MiniMax M2.5, Opus 4.6 performance at a fiftieth of the cost."
No; I'm a true vibe coder.
My point is, I know the difference between a good model and a bad model.
Okay.
And yes, you can make the argument: hey, MiniMax M2.5 is so cheap, so inexpensive, that for a lot of tasks, if you prompt it really well and give it exactly what it needs to do the task, it's going to do it for much cheaper than Opus 4.6 would, right?
So if you have simple tasks that are not prone to hallucination: sure, give them to M2.5.
You're going to save money.
But the issue is: hey, I spent five minutes trying to fix a simple hydration issue, and Opus 4.6 fixed it in 20 seconds, right?
So even with that example right there, we still need to give this one a shot.
And one thing I'll note is that the model is slow.
Opus 4.6 would already be coding this implementation, 100 percent; I guarantee it.
And MiniMax M2.5 is still thinking.
So I'm going to cut the video here and let this run, because it could run for 10 minutes, and I don't want to sit here and have the video be 40 minutes long while I wait for the response.
But I think this task that's running here on the left is going to give us a really good view into how good this model is at real-world tasks, because this is a difficult task.
Let's see if MiniMax M2.5 is able to do it.
And with that being said, I'll come back once this task is done, and then we'll review its work.
All right, guys.
So the MiniMax M2.5 model did complete the task, but I want you to take close note that the last clip you guys just saw was over 30 minutes ago.
It took over 30 minutes for MiniMax M2.5 to produce this.
Now, that could be because it just released yesterday.
I mean, you did see from OpenRouter earlier in the video that it's a slower model.
But draw your attention here.
So we now have this here, which is what it created.
It did center this well, because it added in this new option here.
So this is a dropdown; the styling, I will say, does look good, and it followed the theme.
But let's now choose MiniMax M2.5, Gemini 3 Pro, GLM 5, and Claude Opus 4.6.
And remember, I think this is an incredibly difficult task.
If it's able to complete it in one shot, I'll be impressed.
Let's drop this in.
It says, hold on here: "MP4 comparison recording captures canvas elements. For best results, wait for animations to fully load."
Let's click start recording.
And this is interesting: instead of rendering the video, it records the live page.
That's an interesting approach that it took.
So let's do 15 seconds and then see what it creates.
Okay, so we're going to do 15 seconds; stop; download this.
Whoa.
All right, so it downloaded a WebM file.
And it didn't work.
It didn't work, chat. It didn't work.
All right, so yeah, that is what it is, right?
It downloaded a WebM file, and it didn't work.
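That WebM result actually makes sense: MediaRecorder output is WebM in most browsers, so the missing piece is a transcode step. Here's a minimal server-side sketch, assuming the ffmpeg CLI is installed and with hypothetical file paths:

```typescript
import { spawn } from "node:child_process";

// Transcode the browser's WebM recording into an H.264 MP4.
// yuv420p keeps the file playable in most players.
function webmToMp4(input: string, output: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const ff = spawn("ffmpeg", [
      "-y", "-i", input,
      "-c:v", "libx264", "-pix_fmt", "yuv420p",
      output,
    ]);
    ff.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with ${code}`))
    );
  });
}

await webmToMp4("comparison.webm", "comparison.mp4");
```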
So one thing to note is that it did do a decent job creating this UI here.
But like I said, this is a difficult task; it's a task that I do think Opus 4.6 could have completed in one shot, and this wasn't able to do it, right?
So looking at what we have: first of all, the model is a little bit slow.
That's what we're seeing from these Chinese models; they can be a little bit unreliable and slower, right?
Because they don't have the number of GPUs that some of these other frontier labs in America have, right?
So I will say, I'm not particularly impressed: number one, the hydration issue we had, a simple issue that it wasn't able to fix.
And I'm not impressed by some of the results that we have, like, for example, the neon light here.
Again, this is what MiniMax M2.5 produced, so you guys be the judges of that, right?
"Open"? No; that is a B, that is not a P.
So I think you guys can be the judges of this.
My personal perspective: I don't know what was going on there, but yes, it's a very cheap model.
If we go back to OpenRouter, it's like: hey, at the end of the day, to some extent you are going to get what you pay for, right?
So even though this model appears to be benchmaxxed to me, it did perform well on the Bridge Bench; even on my own benchmark, it performed well.
Hold on, that's localhost.
So if we go to BridgeMind here: go to BridgeMind, go to Community, go to Bridge Bench, and here it is.
It performed; it got a 59.7.
So we're going to have to improve the Bridge Bench; maybe we can add in some type of UI element where the output is graded by another LLM, because right now, how this benchmark works is: it gives the model a set of tasks, and based off the completion of those tasks, it scores it, right?
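Here's a hypothetical sketch of that LLM-judge idea, reusing the OpenRouter endpoint from earlier; the judge model slug, prompt, and score parsing are all illustrative assumptions, not how the Bridge Bench actually works today:

```typescript
// Ask a judge model to score a rendered HTML submission from 0 to 10.
async function judgeHtml(task: string, html: string): Promise<number> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "anthropic/claude-opus-4.5", // assumed judge slug
      messages: [{
        role: "user",
        content:
          `Task: "${task}". Score this HTML's likely visual quality ` +
          `from 0 to 10. Reply with only the number.\n\n${html}`,
      }],
    }),
  });
  const data = await res.json();
  return Number(data.choices[0].message.content.trim());
}
```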
So we're going to continue to improve the Bridge Bench, because what we want to do is weed out models like MiniMax M2.5 that are good on benchmarks.
They're good on, let's say, LeetCode-style tasks, right?
But in practice, when you give them actual assignments, like our neon sign here or our hydration issue, they're not great in a real vibe coding workflow.
So that's definitely one thing to note.
I'm going to be using this on stream today; I'm going to be live right after this video finishes premiering, but I'm just not super impressed.
From the cost perspective, if you are a budget vibe coder, this is great, because you're obviously getting a much better iteration; if you were a big fan of MiniMax M2.1, you're going to love MiniMax M2.5.
But the only issue is: if you're on the frontier, if you're using Claude models or GPT models, this is not going to be something that makes you go, oh, what's up with this?
and that is to share with you guys,
out of all the emails that I get from people that are
trying to sponsor and partner with BridgeMind,
I get more emails, more than anybody,
from these Chinese labs.
They all want me to use their model.
They want me to partner with them.
They want to give me free stuff.
They want to pay me.
And here's what I will say.
Does that reveal some aspect of what's happening here?
Look at this lava lamp.
Is this lava lamp that good?
I don't know.
I don't know how Minimax M2.5 scores in 80.2% on the SWE
bench,
and then also it performs very well in the BridgeMind.
So I think we'll just have to use it.
I'll use it a little bit more on stream, so we get a little bit more of a view of what it looks like in a very long session, maybe an hour or two.
But so far, between the Creative HTML tasks and the task we gave it here with this MP4 download; I mean, can we do it with just one model?
Let's just try it with one.
Can we do this? Start recording, stop.
Does that do anything? Let's go here.
Still nothing, yeah.
So we're going to have to build this in ourselves.
If you guys are wondering whether Claude Opus 4.6 can do this: we are going to test it.
We're going to build this feature out today with Claude Opus 4.6, and I'm assuming it's going to be able to one-shot it, maybe two-shot it.
So with that being said, I'll be using this a little bit, but it's probably not going to be a model that I'm going to be using in my personal vibe coding workflow.
Based off of what we're seeing inside the Creative HTML benchmark, as well as some of these actual production tasks, what we're seeing is that this model is benchmaxxed, and I am not going to give it the BridgeMind stamp of approval.
But with that being said, guys,
I will see you guys here on stream in a minute.
And if you guys have not already liked, subscribed,
or joined the Discord, make sure you do so.
And with that being said,
I will see you guys in the future.