MiniMax M2.5 explained in 5 minutes
MiniMax M2.5 is the next iteration after the previous M2.1 model, and the broad market implications of this model are significant. The model is a contemporary of state-of-the-art models like Anthropic's Opus 4.6, OpenAI's GPT-5.2, and Gemini 3 Pro. But what's really impressive is that each token activates only 10 billion of the model's 230 billion total parameters. Now, this kind of sparsity isn't anything new per se, where the model only activates around 4% of its brainpower to generate output. But when you compare MiniMax M2.5 against other models like GLM-5, Kimi K2.5, and DeepSeek V3.2 Special, you start to see why this is such a big achievement. Now, you might wonder why this even matters, and here's why. In order to fit trillions of tokens of training data into the model, you need to pick the right model size, and MiniMax chose 230 billion parameters for their M2 series. Once this knowledge is packed into the model's weights, you also need to retrieve that information efficiently during inference. In the case of the MiniMax M2 series, they chose to activate only about 4% of the parameters: 10 billion. That means every token the model outputs uses only 4% of the model's entire weights, through mixture of experts. And even at this 10-billion-parameter sparsity, they were able to score 80% on the SWE-bench Verified benchmark, which is neck and neck with Anthropic's most recent model, Opus 4.6.
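The sparsity figures above are easy to sanity-check with a back-of-the-envelope script. The parameter counts are the ones quoted in the video; the "compute saving" line is a rough approximation that assumes per-token compute scales linearly with active parameters.

```python
# Back-of-the-envelope check of the MoE sparsity figures quoted above.
# Parameter counts are the ones stated in the video; per-token compute
# scales roughly with *active* parameters, which is why a 230B model
# can be served so cheaply.

total_params = 230e9   # total parameters in MiniMax M2.5
active_params = 10e9   # parameters activated per token (mixture of experts)

sparsity = active_params / total_params
print(f"Active fraction per token: {sparsity:.1%}")   # ≈ 4.3%

# A dense 230B model would spend roughly 23x more FLOPs per token.
print(f"Per-token compute vs. dense: ~{total_params / active_params:.0f}x less")
```

That ~4% activation ratio is exactly the "4% of its brainpower" figure mentioned above.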
And because the memory footprint is so low at 10 billion active parameters, MiniMax is not only able to serve the model at 3% of Opus 4.6's output-token cost, but also to host it at nearly twice the speed, around 100 tokens per second. Looking at the progression from M1 to M2, to M2.1, to M2.5, they were able to raise performance at each step on a release cycle of roughly 50 days, all while holding the cost steady at $0.30 per million input tokens and $1.20 per million output tokens.
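Taking the quoted rates as $0.30 per million input tokens and $1.20 per million output tokens, it's easy to estimate what a workload costs. The session token counts below are hypothetical, purely for illustration.

```python
# Cost estimate at the MiniMax M2.5 rates quoted above.
# The session token counts are hypothetical, purely for illustration.

INPUT_PRICE = 0.30   # USD per million input tokens
OUTPUT_PRICE = 1.20  # USD per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at the quoted per-million rates."""
    return (input_tokens / 1e6) * INPUT_PRICE + (output_tokens / 1e6) * OUTPUT_PRICE

# A long agentic coding session: 2M tokens read, 0.5M tokens generated.
print(f"${session_cost(2_000_000, 500_000):.2f}")  # $0.60 + $0.60 = $1.20
```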
Here's another way to look at why this is such a big deal. Ever since Moltbook and OpenClaw went viral, one impression left on people's minds was the idea of the always-on agent: an agent that's always listening, always doing background tasks, and always ready to work on things. But one of the problems with this was cost. As cool as having an always-on agent was, the total cost of ownership was simply too high for a lot of people. So assume you're running an agent 24/7 for an entire year, constantly outputting tokens, like leaving the bathroom tap running; the cost will look very different depending on which model you choose. And as you can see, much like hiring an employee for a job, different models come at different wages in terms of salary per year, but also at varying levels of intelligence, which in this case is sampled by the SWE-bench benchmark. With MiniMax M2.5, you're only paying $1,892 per year in salary, assuming the agent runs at 50 tokens per second for the entire duration. And even at that cost, the model is just as capable as Opus 4.6 in performance, at 2 to 3% of the cost of intelligence. So what is the broad market
impact for something like this? One of the biggest debates around the AI bubble has been the idea of the falling cost of intelligence: as AI models get cheaper and cheaper to run, the demand for graphics cards will decrease, because models like MiniMax M2.5 are becoming more and more efficient, which means we can do more with fewer graphics cards. So when people saw companies like OpenAI, Oracle, and xAI spending tens of billions of dollars on graphics cards while the cost of intelligence was dropping, it caused panic in the market about potentially overprovisioning graphics cards as models get cheaper and more efficient.
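The "annual salary" figure from earlier can be reproduced directly: an agent streaming 50 output tokens per second, nonstop for a full year, billed at the quoted $1.20 per million output tokens.

```python
# Reproduce the ~$1,892/year "always-on agent" figure quoted earlier:
# an agent emitting 50 output tokens/sec, 24/7, for a full year,
# billed at $1.20 per million output tokens.

TOKENS_PER_SEC = 50
OUTPUT_PRICE = 1.20                     # USD per million output tokens
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000

tokens_per_year = TOKENS_PER_SEC * SECONDS_PER_YEAR   # ~1.58 billion tokens
annual_cost = tokens_per_year / 1e6 * OUTPUT_PRICE

print(f"${annual_cost:,.2f} per year")  # $1,892.16 per year
```

That lines up with the $1,892 figure mentioned above, which is why the always-on-agent economics suddenly look viable.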
And the argument against this was Jevons paradox: as a resource gets cheaper, demand actually grows as a consequence. In other words, the falling cost of intelligence is actually a good thing, because now more people will want to use it. And MiniMax M2.5 is really one of the first signals of this paradox, where the cost of intelligence is nearing the cost of electricity at this point, which means the aggregate demand for AI and agents should also increase. Now, I think we
all envision that models like MiniMax M2.5 will someday be able to run on our phones. Well, we're just not there yet. In fact, running MiniMax M2.5 still requires up to 400 GB of VRAM to fit both the model's weights and the KV cache for inference. But quantized versions of the previous M2.1 model, which shares the same architecture, could already run locally, and since MiniMax M2.5's model weights will be open-sourced shortly, we'll soon be able to run it locally at home at a much lower precision. You can try the MiniMax M2.5 model on the MiniMax coding plan; check out the link in the description for a 12% discount for both new and existing users.
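As a rough check on the ~400 GB figure mentioned above: weight memory scales with parameter count times bytes per parameter, so quantizing to lower precision shrinks the footprint roughly linearly. The precisions below are common serving choices, not confirmed MiniMax release formats, and KV cache comes on top.

```python
# Rough VRAM estimate for the weights of a 230B-parameter model at
# various precisions. These are ballpark weight sizes only; KV cache
# and activation memory (which push M2.5 toward the ~400 GB quoted
# above) come on top and depend on context length and batch size.

TOTAL_PARAMS = 230e9

for name, bytes_per_param in [("FP16/BF16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{name:>9}: ~{gb:.0f} GB of weights")
```

At 4-bit precision the weights alone drop to roughly 115 GB, which is what makes the "run it locally at lower precision" scenario plausible on high-end home hardware.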