MiniMax M2.5 explained in 5min..

Transcript

0:00

MiniMax M2.5 is the next iteration of the previous M2.1 model, and its broad market implications are quite large. The model is a contemporary of state-of-the-art models like Anthropic Opus 4.6, OpenAI GPT 5.2, and Gemini 3 Pro. But what's really impressive is that each token activates only 10 billion of the model's 230 billion total parameters.

0:26

Now, this kind of sparsity isn't anything new per se: the model activates only around 4% of its "brain power" to generate output. But when you compare MiniMax M2.5 against other models like GLM 5, Kimi K2.5, and DeepSeek V3.2 Special, you start to see why this is such a big achievement. You might wonder why this even matters, and here's why. To fit trillions of tokens of training data into a model, you need to pick the right model size, and MiniMax chose 230 billion parameters for their M2 series.

1:02

Once this knowledge is packed into the model's weights, you also need to retrieve that information from the model efficiently during inference. In the case of the MiniMax M2 series, they chose to activate only 4% of the total parameters, about 10 billion.

1:20

That means every token the model outputs uses only 4% of the model's weights, via a mixture-of-experts architecture. Even at this 10-billion-active-parameter sparsity, the model achieves 80% on the SWE-bench Verified benchmark, neck and neck with Anthropic's most recent model, Opus 4.6.
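The mixture-of-experts idea behind that sparsity can be sketched in a few lines: a small router scores a set of expert networks per token, and only the top-k experts actually run, so the rest of the weights stay idle. This is an illustrative toy (8 experts, top-2, made-up sizes), not MiniMax's actual architecture, where the active fraction works out to roughly 10B / 230B ≈ 4%.

```python
import numpy as np

# Toy mixture-of-experts routing: per token, a router picks the top-k
# experts and only those experts' weights are used. Sizes are illustrative,
# not MiniMax M2.5's real configuration.
rng = np.random.default_rng(0)

n_experts, top_k, d_model = 8, 2, 16
expert_weights = rng.standard_normal((n_experts, d_model, d_model))
router = rng.standard_normal((d_model, n_experts))

def moe_forward(token_vec):
    scores = token_vec @ router                   # one score per expert
    active = np.argsort(scores)[-top_k:]          # indices of the top-k experts
    # softmax over the selected experts' scores only
    w = np.exp(scores[active] - scores[active].max())
    w /= w.sum()
    out = sum(wi * (token_vec @ expert_weights[i]) for wi, i in zip(w, active))
    return out, active

token = rng.standard_normal(d_model)
out, active = moe_forward(token)
print(f"active experts: {sorted(active.tolist())}, "
      f"fraction of experts used: {top_k / n_experts:.0%}")
```

Here only 2 of 8 experts (25%) run per token; scale the same idea up and you get M2.5's ~4% of 230B parameters active.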

1:39

And because the memory footprint is so low at 10 billion active parameters, MiniMax is not only able to serve the model at 3% of Opus 4.6's output-token cost, but also to host it at nearly twice the speed, around 100 tokens per second. Looking at the progression from M1 to M2 to M2.1 to M2.5, they managed to raise performance at each step on a release cycle of roughly 50 days, while holding pricing at $0.30 per million input tokens and $1.20 per million output tokens.
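At those prices, the always-on-agent economics discussed next are easy to check. An agent that streams output nonstop for a year at 50 tokens per second (the sustained rate the video assumes later) generates about 1.58 billion tokens, and at $1.20 per million output tokens that lands almost exactly on the $1,892-per-year figure quoted below. Input-token cost is ignored in this back-of-envelope sketch.

```python
# Back-of-envelope check of the always-on agent's annual cost.
# Only continuous output generation is priced; input tokens are ignored.
tokens_per_second = 50             # sustained generation rate assumed in the video
seconds_per_year = 365 * 24 * 3600
price_per_million_output = 1.20    # USD, MiniMax M2.5 output pricing

tokens_per_year = tokens_per_second * seconds_per_year
annual_cost = tokens_per_year / 1e6 * price_per_million_output
print(f"{tokens_per_year:,} tokens/year -> ${annual_cost:,.2f}/year")
# -> 1,576,800,000 tokens/year -> $1,892.16/year
```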

2:20

Here's another way to look at why this is such a big deal. Ever since Moltbook and OpenClaw went viral, one impression they left on people's minds was the idea of the "always-on agent": an agent that's always listening, always running background tasks, and always ready to work on things. But one of the problems with this was cost. As cool as an always-on agent is, the total cost of ownership was just too high for a lot of people. So, assuming you run an agent 24/7 for an entire year, constantly outputting tokens, like leaving the bathroom tap running, the cost looks very different depending on which model you choose. Much like hiring an employee, different models come at different wages in terms of salary per year, but also at different levels of intelligence, which in this case is sampled by the SWE-bench benchmark. With MiniMax M2.5, you're only paying $1,892 per year in salary, assuming it runs at 50 tokens per second for the entire duration. And even at that cost, the model is just as capable as Opus 4.6 in performance, at 2 to 3% of the cost of intelligence.

3:39

So what is the broad market impact of something like this? One of the biggest debates around the AI bubble has been the falling cost of intelligence: as AI models get cheaper and cheaper to run, the demand for graphics cards should decrease, because models like MiniMax M2.5 are becoming more and more efficient, meaning we can do more with fewer graphics cards. So when people saw companies like OpenAI, Oracle, and xAI pouring tens of billions of dollars into graphics cards while the cost of intelligence was dropping, it caused panic in the market about potentially overprovisioning graphics cards as models get cheaper and more efficient.

4:22

The argument against this is Jevons paradox: as a resource gets cheaper, demand for it actually grows as a consequence. In other words, the falling cost of intelligence is actually a good thing, because now more people will want to use it. And MiniMax M2.5 is really one of the first signals of this paradox, with the cost of intelligence now nearing the cost of electricity, which means aggregate demand for AI and agents should also increase.

4:52

Now, I think we all envision that models like MiniMax M2.5 will someday be able to run on our phones. Well, we're just not there yet. In fact, running MiniMax M2.5 still requires up to 400 GB of VRAM to fit both the model's weights and the KV cache for inference. But quantized versions of the previous M2.1 model, which shares the same architecture, could already run locally, and since MiniMax M2.5's weights will be open-sourced shortly, we'll soon be able to run it locally at home at much lower precision. You can try the MiniMax M2.5 model on the MiniMax coding plan; check out the link in the description for a 12% discount for both new and existing users.
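The 400 GB figure can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bytes per parameter, with the KV cache and activations on top. The precisions below are illustrative assumptions, not MiniMax's actual serving setup.

```python
# Rough VRAM needed for the weights alone of a 230B-parameter model at
# different precisions (KV cache and activations come on top of this).
params = 230e9

for name, bytes_per_param in [("bf16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:,.0f} GB for weights alone")
# bf16 weights alone (~460 GB) already exceed the quoted 400 GB budget,
# while 4-bit quantization (~115 GB) is what makes local inference plausible.
```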

Interactive Summary

MiniMax M2.5 is a new state-of-the-art AI model that rivals leading models like Anthropic Opus 4.6 and OpenAI GPT 5.2. Its key innovation is sparsity: it activates only 10 billion of its 230 billion parameters per token, which translates to superior efficiency. This efficiency lets M2.5 match Opus 4.6 on benchmarks at a significantly lower cost (3% for output tokens) and nearly double the speed. The model's low operational cost makes "always-on agents" economically viable, at only about $1,892 annually for 24/7 operation. This marks a significant step in the falling cost of intelligence, potentially increasing overall demand for AI in line with the Jevons paradox. Although the model currently requires substantial VRAM, MiniMax plans to open-source M2.5's weights, enabling local, lower-precision deployments.
