Vibe Coding With Kimi K2.5
Hello, everyone, and welcome back to another video.
In this video,
I'm going to be vibe coding with the newly released Kimi K2.5, which was released yesterday.
It's a model from Moonshot AI,
and this is a new frontier model.
As you can see here on OpenRouter,
the context window is 262,000 tokens,
and it's a relatively affordable model at 60 cents per
million on the input and $3 per million on the output.
This is an open source model and is effectively now the
number one open source model,
but we're going to get into that as we vibe code with this
newly released model in this video.
But before we get started, guys,
we are going to be using Kimi K2.5 inside of Cursor.
I configured it using OpenRouter and it's configured here.
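If you want to sanity-check that setup yourself, OpenRouter exposes an OpenAI-compatible endpoint, so a request against it is enough to confirm the model is reachable. A minimal sketch; the model slug is an assumption on my part, check OpenRouter for the exact identifier:

```typescript
// Minimal sketch of hitting Kimi K2.5 through OpenRouter's
// OpenAI-compatible chat completions endpoint. The model slug is an
// assumption; check openrouter.ai for the exact identifier.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "moonshotai/kimi-k2.5", // assumed slug
    messages: [{ role: "user", content: "Hello from Cursor" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```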
And we're going to be working on a variety of different
tasks, both across the back end,
as well as the front end and the database.
But before we get started, I do ask that you guys like,
subscribe, turn on post notifications,
and join the BridgeMind Discord community,
which is the fastest growing vibe coding community on the
internet right now.
If you guys haven't already seen,
we do have a like goal for this video of 200 likes.
So if you guys could like the video, let's hit that goal.
But with that being said,
let's actually take a look at this model, who made it,
what is different about it,
and how it ranks up on the leaderboards and the benchmarks.
So here is like the summary from Moonshot.
So they say,
Kimi K2.5 is Moonshot AI's native multimodal model,
delivering state-of-the-art visual coding capability and a
self-directed agent swarm paradigm, which is very,
very interesting.
We're going to get into that in a second here because Kimi
K2.5 is different in this way.
So it's built on Kimi K2 with continued pre-training over approximately 15 trillion mixed visual and text tokens, which is insane.
It delivers strong performance in general reasoning,
visual coding, and agentic tool calling.
So one really important thing to note is the speeds that
people are actually getting with this model.
So you can see here that Moonshot AI itself is at about 29 tokens per second directly from the provider, which is actually relatively slow.
But then there are some faster ones: GMI Cloud is at 75 and then Fireworks is at 104.
So a little bit faster with those providers, but a relatively slow model, I will say.
But this model is not yet in LMArena.
So we can't see how it stacks up there.
But Artificial Analysis has added this model to its benchmarks.
So what I want to dive into before we start vibe coding
with this model and putting it through the test of real
vibe coding workflows, what I first want to do is, okay,
where does this stack up with speed?
Where does it stack up in intelligence?
So even here, you can actually see that Kimi K2.5 in the Artificial Analysis benchmark actually stacked up pretty well.
So it may be a little bit slower when we use it.
There may be different demands on OpenRouter that cause it
to be a little bit slower.
But Artificial Analysis actually puts this at 119.
So it's number five on speed, which is surprising.
So I don't know how they're integrating with that model,
but in terms of speed, it's actually number five,
which is surprising.
But in intelligence, this is like a very,
very important index.
So let's just scroll down here.
Kimi K2.5 is performing up there with frontier models like GPT 5.1, Gemini 3 Pro, Claude Opus 4.5, and GPT 5.2.
So you can see it's literally ranking at 47,
number five on the list,
which is just absolutely incredible.
It's beating out Claude 4.5 Sonnet.
It totally beats out MiniMax M2.1.
Wait,
let's add MiniMax because I think that like in the context
of like these cheaper open source models,
I think we definitely want to add MiniMax M2.1 and GLM 4.7
to this list to see, okay,
how does this stack up to those models?
So MiniMax M2.1, you can see is back here at 40,
and then GLM 4.7 is here at 42.
So Kimi K2.5 is the number one open source model now.
And in terms of like that cost affordability, I mean,
look at that.
I mean, 60 cents per million on the input,
$3 per million on the output.
So it's literally like a ninth of the cost of Opus 4.5,
and you're not losing a ton of intelligence.
However,
I do want to look at the coding index and highlight this.
This model is not that good at coding, okay?
Like, it obviously is still a good coding model.
It still beats out Sonnet 4.5,
but you can see here that in the intelligence index,
it's like up there with the frontier models.
But then you look at the coding index and you're like, ah, you know, it's eight points below Opus 4.5.
It just didn't perform super, super well on the coding index.
And this is reflected too,
like in their actual benchmarks that they put out,
which we're going to look at here in a second.
But the next thing that I want to show you guys that I did
notice when I was doing a review of some of these
benchmarks is the hallucination rate, okay?
Because sometimes you can have a model that is high on the
intelligence index or high on the coding index,
but then it has a crazy high hallucination rate.
Like, for example, a good example of this is GLM 4.7.
GLM 4.7 has a 90% hallucination rate, which is insane,
right?
That's very, very high.
But what I do want to kind of show you guys is that Kimi K2.5 actually does very, very well on this hallucination benchmark.
It beats out models like, I think, I'm pretty sure it beats out, where is it?
Where is Gemini?
Yeah, look at this.
Gemini 3 Pro here, 88% on the hallucination rate.
And with Kimi K2.5, they just did a very, very good job with hallucination.
So this is going to be a model where, hey,
it does have a good hallucination rate.
That's definitely one thing that I noticed.
That's actually pretty good.
So with that being said,
let's take a look at the blog that Moonshot AI actually put
out.
So they said, today we are introducing Kimi K2.5,
the most powerful open source model to date.
So we already read this,
but Kimi K2.5 builds on Kimi K2 with continued
pre-training over approximately 15 trillion mixed visual
and text tokens.
Built as a native multimodal model,
K2.5 delivers state-of-the-art coding and vision
capabilities.
I don't know about state-of-the-art coding, but it also delivers a self-directed agent swarm paradigm.
This is very interesting.
So they say that for complex tasks,
Kimi K2.5 can self-direct an agent swarm with up to 100
sub-agents executing parallel workflows across up to 1500
tool calls.
I don't know how expensive that's going to be to launch 100
sub-agents,
but the fact that it does a self-directed agent swarm,
I don't know how we're going to be able to see that or like
what that looks like in practice.
But later in this video,
once we start putting this into practice, we may see that.
I'm not sure yet.
Compared with a single agent setup,
this reduces execution time by up to 4.5x.
The agent swarm is automatically created and orchestrated
by Kimi K2.5 without any predefined sub-agents or
workflows.
So literally the model itself is able to deploy this agent
swarm.
Whereas let's say that you were using Claude Code or you
were using one of those other models,
you're going to have to say, hey,
launch 10 sub-agents to do this task.
But this seems like it's going to be self-directed.
So I don't know how we'll be able to actually see that,
but we'll see it here in a minute once you start coding.
So in agents, it does very well on these benchmarks.
It's leading.
But these aren't benchmarks that I really look at.
The biggest benchmark that I look at for vibe coding, obviously, is this SWE-bench Verified coding benchmark here, SWE-bench Verified.
So you can see here that like this model, like I said,
you know, it does well on a lot of these benchmarks.
You can see that it's up here in image and agents and
video.
But the issue is that for vibe coding,
it actually doesn't perform very well.
You can see right here, 76.8.
You can look at it and say, okay,
it does perform better than Gemini 3 Pro.
So that's something to look at.
And you can say, hey, look, look at the cost.
Look at how well it does with the hallucination rate and
these other factors.
But in terms of it being the number one coding model,
it's not even close.
I mean,
those four points on this benchmark do make a difference.
So Opus 4.5 is still leading for the best coding model,
and GPT is still up there as well.
Kimi K2.5, not as good.
But with that being said,
I think that gives us enough information to be able to like
test this out.
So let's go back over to Cursor here.
And what are we going to be working on?
We're going to be working across both the back end and
front end.
We're going to be working in BridgeMind.
So as you guys know, we are launching BridgeMind here.
Let's just log in real quick.
I think that I have an account here.
So let me just sign in and all right, perfect.
So here's BridgeMind.
There's a couple different things that I want Kimi K2.5 to
actually build out for us.
So with BridgeMind, what we're working on is the Bridge MCP.
Okay.
So for example,
I can go here and I can just create a new project.
And this project has basically this task list.
Okay.
So this is a way that you can work across agents, where you can see I have To Do, In Progress, In Review, Complete, and Canceled.
And what this does is allow you to work with agents; you can create instructions, and you guys will see how this works.
But what we're going to focus on particularly is that we
want to build basically a prompt library and a skill
library for people to be able to use BridgeMind and use
pre-built prompts and pre-built agent skills that they'll
be able to just grab with their subscription.
So this is kind of a new concept.
So we're not really going to be focusing on the Bridge MCP.
We're more so going to be focusing on and testing Kimi K2.5
on its ability to actually build out back-end logic and
then build that into the front end.
So I'm just going to launch a new agent here and we're
going to be working with a lot of different agents.
Okay.
So the first thing I want to do is I'm going to open up a
browser and we're going to go over to localhost here and
we're going to sign in again.
Let's see here.
Like even one thing here.
Let's zoom out.
Okay.
There we go.
Okay.
So let's zoom out.
We're going to log in again and we can do this in a second.
Let's give it its first prompt.
So all I'm going to do is I'm going to drag and drop the
database.
I'm going to drag and drop the API and I'm going to drag
and drop the UI and I'm going to drag and drop the web app.
Okay.
And I'm then going to give it my first prompt.
Drag and dropping.
Okay.
Hold on.
I have to be careful here so I don't get this nested.
Okay.
So I'm just going to add this.
Okay, BridgeMind web app.
I want you to review the database, API, UI, and web app,
and I need you to build a new table in our Drizzle schema
called skills.
Skills will be similar to prompts in how it's structured.
I need you to review the database in the API because I need
you to create the new schema for this skills table.
And then I need you to create a new module in the API so
that users can create skills.
And then it will be similar to the prompt schema where
there can be system skills that are created by admins.
So I want you to review the database, how it's structured,
the API, and how it's structured,
and then I want you to introduce this new skills schema.
And I want you to build the module for this in the API so
that users can create skills, update skills,
and so that admins can create skills and manage skills.
And I want to start out with the most simple schema
structure possible.
So all we really need is the ability for users to create a
skill, to add content to that.
And that's all I need for now.
So review the project and create a structured plan before
you code anything.
But first,
you need to do an in-depth review of these different
repositories.
You can launch sub-agents to do this.
And I then need you to create a structured plan for
building out this new functionality.
You need to make sure that you add a skills link in the
sidebar of the BridgeMind web app and then create the pages
for this and integrate this functionality across the
database, API, and front-end web app.
Okay, so you guys kind of got my prompt there.
We are going to put Cursor in plan mode for this, but I did ask it to launch sub-agents, and sub-agents are new in Cursor.
So sub-agents, if you guys don't know what they are: it's essentially the ability for Cursor to spin up multiple sub-agents rather than just working in, like, a single conversation chat.
So this actually helps with context.
It helps with doing things faster.
But also, if you guys remember, like, you know, Kimi K2.5,
I don't know how we're going to be able to see this or if
there's going to be some type of visual representation of
this, but they did say that it can launch agent swarms.
So what that actually looks like in practice,
I'm not sure yet, but we'll see how that works.
It does look like it's a little bit slow, like, planning next moves, whereas, you know, some of the other models will go like super, super fast.
But let's launch another agent and let's actually sign in
here and let's move on to, like, the next portion of what we're going to be working on, because we are going to be working on multiple things at once.
Okay.
So the next thing that I want to try is just like,
let's see the UI capabilities of this model.
So we're going to drop in this div.
I want you to do an in-depth review of the styling of this
dashboard.
Right now, this is not the styling I want.
It does not look good.
There's wrapping all over the place.
It's not professional.
I want you to review the other parts of the website and
update the styling so it's more compact, more professional,
and more modern.
Okay, so we'll drop that in, and that should do the trick.
Let's actually add the BridgeMind web app just so that it
knows, okay,
like it's not going to be searching for what project that's
in.
Then here, okay,
so here's those sub-agents that I was talking about.
So: I'll launch sub-agents to explore the database, API, and
front-end structures in parallel to understand how prompts
are currently implemented so I can model skills similarly.
So one great part about using sub-agents and the reason
that you should be using them is because let's say that
you're working on something like this, right?
Where I drop in database, API, UI, and a web app.
You can ask it to launch sub-agents so that rather than
just a singular agent that's, you know,
doing all of this work,
it can deploy multiple agents to review the database,
review the API, review the front end,
and then they come back together to work.
So it leads to much faster, better results.
So this styling is going in.
This agent here is going to be doing a lot.
Like this is going to be testing the front end one-shot
capabilities of this.
Can I launch another browser or let's at least go back to
localhost?
Because what we do need is an MCP page.
I believe that I may have one.
Let's see.
Okay.
Yeah.
I do have one.
So, okay.
Review this page here in this project.
I need you to update it to be styled much better because
rather than there being so much information about the
BridgeMind MCP, number one, I need you to rebrand it, not to BridgeMind MCP, but to BridgeMCP,
like you see it in other parts of the project.
I also need you to improve this so that it actually has the
directions for being able to use the MCP in different AI
agents like Cursor, Codex, Claude Code, and other commonly used AI agents.
You need to update the page so that the main focus is
actually being able to install and use the MCP with these
AI agents.
Note that in order to use the MCP,
they will need to create an account and get an API key.
The first thing that you should do is that you should
actually go to the web app and look at the API to see how
this works and see how this properly is structured.
Launch as many sub-agents as you need to get this done and
then update the BridgeMind MCP page with these
specifications so that there is a better structured
information of how to actually use this MCP so that users
can just copy and paste things in and it's easy to set up
for users and there's good instructions and directions.
Okay, that's a lot.
And the reason that I use, like, voice-to-text tools is because it's just a great way to be able to stream your consciousness, in my opinion, of like, okay, this is what I want done.
You know, I speak at 170 words per minute.
So rather than typing at 100 words per minute, I instead talk at 170.
So it's just a much faster and better way to do vibe
coding.
Also, it does look like these two agents did complete.
Now, I do want to highlight one key thing here.
And this is an issue that I have when using like a lot of
open source, non-frontier-lab models, which is like,
you can see here that it just like stopped.
So look at this, red prompts.ts and it just stopped, right?
Continue, continue.
So I literally had to just say, hey,
continue because it just stopped.
Like, and this is a common thing that I have when using OpenRouter sometimes in Cursor, or using some of these open source models in Cursor: sometimes I'll be using it and it'll just, like, stop, right?
And that's actually one thing to check,
which I didn't check at the start of this video and I
probably should check now,
but I did configure this with Cursor, right?
But one thing to look at is maybe we can take a look at
models and see: is Kimi K2.5 actually added natively as a
model?
Because it wasn't yesterday, but it could be added now.
Let's see.
You can see that this Kimi K2 here has been added by
Cursor.
So I'm actually surprised that they haven't added it
natively yet.
Kimi K2.5.
Yes, that's actually kind of surprising.
So Cursor has still not added Kimi K2.5,
which is unfortunate.
They really should be adding that model.
They have added Kimi K2,
but you can see here that I'm actually connected to Kimi
K2.5 via OpenRouter.
So I hope that Cursor does add it natively, because a lot of people do want to be using these models rather than just your common frontier lab models like Opus 4.5 or Gemini 3 Pro.
But one thing that we are seeing is obviously it's getting
the job done, but in terms of reliability,
like we had these two agents stopped and it is going very,
very slow.
This is a really slow model in practice.
Like right now,
I know if we go back over to the Artificial Analysis index, like you can see, okay, it did put it at speed 119, right?
Which is like incredibly fast, 119 tokens per second.
But how were they integrating with that, right?
Like, did they use, if you go back over to OpenRouter, like, were they using this Fireworks to do that?
I don't know, you know, who they use for that,
but you can see Moonshot AI.
The throughput here is 32 tokens per second.
And here it's even 10.
And I don't know,
could I check on OpenRouter to see how fast?
Here, let me pull this up on my other monitor.
And even here, the same thing happened.
Now let me read the current MCP page to understand what
needs to be updated.
And it just stopped again.
So I'm going to have to say continue.
So I don't know if this is necessarily, like, the model's issue or if it's a Cursor issue.
But hey,
you can see that when I'm trying to use this in Cursor,
it is not like grasping the,
like it's just bugging out on me, right?
So let me see if I can actually go to,
and I've spent 14 cents so far on these prompts,
just so you guys know.
So I'm going to go over to activity and see if I can see
the speed that's coming out of this.
Okay, so this is perfect, guys.
So check this out.
So in terms of speed, this is what I'm getting right now.
Like Kimi K2.5, you can see the speed.
In some areas, I'm getting, like here with Kimi K2.5, this one was 154.3 tokens per second for a tool call, but, you know, this one was eight, right?
And so you can see the fluctuation there, but on average,
look at this, 14 tokens per second, 38 tokens per second,
28 tokens per second, 35 tokens per second,
29 tokens per second, 111 tokens per second.
So that's the speed that you guys are seeing.
For some of the larger tasks,
like can we find a task that was like actually like,
I mean,
this one was two cents and it was 111 tokens per second.
Can we see what it actually was?
Yeah, I mean, it was just a tool call.
So tool calls, like, I don't know.
I'm just looking at it,
and I think that when you are doing vibe coding,
you do get a good feel for how fast a model is in practice
and how reliable it is.
And look at what we're seeing.
You know,
we're continuing to get issues inside of Cursor when using
that.
So it's just not very reliable.
Same thing here, explored four files and then stopped.
So I'm going to do continue again.
And this is an issue, right?
And I will say we have to give it some time.
We have to let whatever's happening happen, you know,
and then hopefully Cursor will maybe natively integrate Kimi K2.5 rather than us having to go through OpenRouter, because sometimes it's not as reliable.
Okay, here is the plan that Kimi K2.5 has created.
So this actually does look good.
So let's check it out.
So architecture overview.
So for the database,
it's going to create a skills.ts schema.
It's going to create a skill type enum, which is okay.
It's going to update the relations.ts file with the skills
relations, which is good.
And then it's going to add the export to the index.ts file, which is perfect.
Then in the BridgeMind API,
it's going to create this new module, the new controller.
It's going to create this system skills controller, then the skills service, and the DTOs.
And then for the web app,
it's going to add that link in the sidebar.
It's going to create that new page.
It's going to add the types,
and then it's going to add the API endpoints,
and then it's going to add the components for skills.
So that does look okay.
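To make that concrete, here is a minimal sketch of what a skills table along those lines could look like in Drizzle. The column names and types are my own assumptions, not the schema the agent actually generated.

```typescript
// Hypothetical sketch of the skills table described in the plan, using
// drizzle-orm's Postgres helpers. Column names and types are assumptions.
import { pgTable, pgEnum, uuid, text, timestamp } from "drizzle-orm/pg-core";

// "System" skills are admin-created, mirroring how prompts work.
export const skillTypeEnum = pgEnum("skill_type", ["user", "system"]);

export const skills = pgTable("skills", {
  id: uuid("id").primaryKey().defaultRandom(),
  userId: uuid("user_id"), // nullable for system skills created by admins
  name: text("name").notNull(),
  content: text("content").notNull(),
  type: skillTypeEnum("type").notNull().default("user"),
  createdAt: timestamp("created_at").defaultNow().notNull(),
  updatedAt: timestamp("updated_at").defaultNow().notNull(),
});
```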
We're just going to see like here is all the different
endpoints that it's going to build.
So like, for example, here are the endpoints.
So skills controller,
it's going to create a skills post endpoint,
which is obviously perfect.
Get, get by ID, patch, delete.
So these are within the skills controller.
And then for the system skills,
which I don't know how I feel about this.
I think this will be fine because it's just creating
different routes that are going to be admin only,
which is fine.
And then here is going to be the API endpoints for this.
So I think that this is okay.
You know, here are all the endpoints.
And I will say that this is actually a pretty good plan
that it just created from Kimi K2.5.
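For reference, the module/controller/DTO vocabulary in that plan suggests a NestJS-style API, so the skills controller would roughly map to something like this. Every name here is illustrative, not the code the agent produced.

```typescript
// Rough sketch of the skills endpoints from the plan, assuming a
// NestJS-style API. All names are illustrative, not the generated code.
import { Body, Controller, Delete, Get, Param, Patch, Post } from "@nestjs/common";

// Hypothetical DTO; the real shape depends on the skills schema.
class CreateSkillDto {
  name!: string;
  content!: string;
}

@Controller("skills")
export class SkillsController {
  @Post() // POST /skills: users create a skill
  create(@Body() dto: CreateSkillDto) { /* delegate to a SkillsService */ }

  @Get() // GET /skills: list the current user's skills
  findAll() {}

  @Get(":id") // GET /skills/:id: fetch one skill
  findOne(@Param("id") id: string) {}

  @Patch(":id") // PATCH /skills/:id: update a skill
  update(@Param("id") id: string, @Body() dto: Partial<CreateSkillDto>) {}

  @Delete(":id") // DELETE /skills/:id: remove a skill
  remove(@Param("id") id: string) {}
}
```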
So let's click build on this.
And there's now 18 to-dos.
So even though it only said that it explored four files,
it looks to me like it explored a lot more than four files.
And then here, so this one's now working.
This is now implementing this plan.
You guys can see that this is now in progress for the dashboard.
What is this?
Is this just doing this?
Okay, so it did stop again.
So I'm continuously having to do continue, continue,
continue.
So I don't know why that's the case.
Maybe somebody can let me know in the comment section down
below why exactly we continue to have to say continue,
continue, continue and why halfway through the prompt,
like it's just stopping.
This could be a cursor issue.
It could be the fact that it's just a pretty new model,
but now we're actually getting some code out of it.
And this was for the Bridge MCP page.
So I don't know why we keep having it stop, right?
I do not like that, but it did do those searches.
It's just that maybe it's not very verbose.
And there's nothing wrong with that.
I mean, some models are too verbose, right?
Like Claude Sonnet 4.5 is too verbose.
And maybe it's just that this model is just not very
verbose.
And when it's done with its work, it just, like, stops.
But that's actually not a good thing, because it didn't say, like, hey, here's what I did.
Here's what's next.
But I will say now it is building.
So whatever,
whatever we're experiencing with the model stopping and
being a little bit unreliable, we are getting through that.
And now it's building out this plan, which has 18 to-dos.
So I'm excited to see how that turns out.
Now, in terms of this one for the MCP page,
we're going to see this go in here in just a second.
So let's go here.
This is going to be updated.
So this is the page that's being updated.
And remember, our instructions were, hey,
we need better instructions for how to actually like
install the MCP and how to use the MCP.
So this is this one here.
This page is still updating.
And I've got to say, it's slow.
Okay.
This model is slow.
I don't think this model is very fast.
And, you know,
something that we talk about as a community and that I've
had conversations with some of you about as well is that,
hey, in 2026,
one of the biggest things is going to be speed.
Like, for those of you that remember when Composer 1 came out: the hype totally died down for it because it wasn't that intelligent of a model, right?
But the one thing about Composer 1 is it was such a fast model that it did make a very big difference, because it was just so fast.
And in terms of speed, it's like, hey,
we just can't have all these slow models.
Okay, so here's another issue.
So this one says building, right?
But it stopped again.
So it did this to-do and then it just stopped,
right?
And I don't like that.
I have to keep saying continue.
Again,
maybe somebody can let me know in the comment section down
below why exactly that's happening,
but we're continuing to kind of have to watch these agents.
And again,
this one's still just updating this Bridge MCP page.
I mean, this is like incredibly slow.
Now, the file is 649 lines long, or 652 lines long, but this is relatively slow, because if you use a model like
Sonnet or Opus or even GPT,
I think GPT is faster than this model.
So, hey, like you take GPT, for example, right?
And if we go back, where's our, hold on,
let me pull up my browser and I'm going to show you guys
like one of the biggest things with, like,
let's take ChatGPT, for example, or GPT 5.2, right?
One of the biggest things with GPT models is that, hey,
they're just a little bit slow.
So even though the model performs very, very well.
And if you go over to, you know, the intelligence index,
it's literally number one on intelligence.
But one of the reasons that people use Claude Code way more is because they just say, hey, you know, GPT models are just slow, right?
And I think that people now realize that, hey,
the faster that you can get the model, the better.
And what I'm seeing out of Kimi K2.5 Thinking is just that it's a little bit slow.
And look at this.
Okay, so it did have this error too.
So it wasn't a one-shot.
So this is a very,
very common issue in Next.js where the prop href expects a string or object in <Link>, but got undefined instead.
Open your browser's console to view the component stack
trace.
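For context, that error almost always means a <Link> was rendered with an undefined href, typically from looking up a route key that doesn't exist. A tiny illustration with hypothetical names, not the actual BridgeMind code:

```tsx
// Typical cause of "prop `href` expects a `string` or `object` in
// `<Link>`, but got `undefined`": indexing a route map with a bad key.
// All names here are hypothetical, not from the actual codebase.
import Link from "next/link";

const routes: Record<string, string> = { docs: "/docs" };

export function BrokenNav() {
  // routes["agentss"] is undefined at runtime (typo), so <Link> throws.
  return <Link href={routes["agentss"]}>Docs</Link>;
}

export function SafeNav() {
  // Guarding with a fallback route avoids the crash.
  return <Link href={routes["agentss"] ?? "/"}>Docs</Link>;
}
```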
So it literally not only took a very long time to make a
simple update, but it also did it with errors.
So I don't know how I feel about that.
Review this error that you created and fix it.
So not a one-shot, even on like a simple UI page.
So not sure.
We'll give it a chance.
We'll let these three tasks finish.
We've got this one here, with fix client layout, fix projects page, fix agents page.
And if you guys remember, one interesting thing is if you go back to the original prompt, what I asked was: I want you to do an in-depth review of the styling of this dashboard; right now, this is not the styling I want; it does not look good.
So I guess that it, okay, here, here's the issue.
I dropped in this div, right?
So we were looking at, and now I have an issue here.
Maybe I can go to here.
Let's go.
Let's go over to the web app.
So we're going to go to port 3001 here.
So, ooh, I don't know how I feel.
It did make it more compact,
but it also took away the header.
Okay, this is very, very interesting.
So I asked it,
I want you to do an in-depth review of the styling of this
dashboard, and I passed in the div, right?
And the reason that I passed in the div is because I only
wanted it to focus on that div section, right?
And instead of Kimi K2.5 intuitively knowing that all I
wanted to update was this div,
you can see that it actually did hallucinate and it started
going in and updating every single page.
Look at this.
It updated the client layout.
It updated the page when in reality,
all I wanted it to do was to update this main dashboard
page, which it did.
But if you look at the to-do list,
you can see it's fix projects page, fix agents page, update card components, fix sidebar styling.
It even took away the button for the sidebar.
So there was a button, if you guys remember,
up in the header here where I could actually collapse the
sidebar, but it took that away.
And now there's no header for me to even be able to
collapse or open up my sidebar.
So that updated the client layout.
I don't know if it's going to be able to fix that,
but we're going to have to click continue because it also
did that thing again where it stopped working halfway
through.
So, so far, how long is this video?
27 minutes.
I'm not super impressed.
You know, I think that a lot of people,
when these new models come out, they hype up the models,
right?
Because for a lot of people that are creating content or
whatnot, they want to keep the AI hype going, right?
But my approach to it is, hey,
we're like seriously putting vibe coding into real
practice.
If you guys know, like, you know,
I'm vibe coding every day until I make a million dollars.
So every single day I wake up and I use these models and I
use them for hours and hours and hours on end.
And pretty quickly, you can realize, okay,
this model is not doing a great job.
So I know that it's performing well on the benchmarks,
but in practice, I mean, this thing is hardly usable.
So I don't want to, like, bash it, and, you know, obviously there's a reason that it's performing so well on the leaderboards, but in actual practice, this is kind of a hard model to use.
I mean, I'm using it through OpenRouter.
I'm using it in Cursor.
You can see here, like if we refresh, like look at this,
26 tokens per second, 13 tokens per second.
What is the average?
Average spend, average day, average request?
Okay,
I wish I could see like an average speed rather than just
like the speed for each tool call.
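For what it's worth, if you export the per-request stats, the honest average is total tokens over total generation time, so tiny tool calls don't skew it. A quick sketch with made-up numbers:

```typescript
// Weighted-average throughput from per-request stats like the ones on
// the OpenRouter activity page. The numbers below are made up.
type RequestStat = { tokens: number; seconds: number };

const stats: RequestStat[] = [
  { tokens: 420, seconds: 30 }, // ~14 tok/s, a long generation
  { tokens: 111, seconds: 1 },  // ~111 tok/s, a tiny tool call
];

// Total tokens / total time weights long requests properly, unlike a
// naive mean of the per-call rates (which would claim ~62 tok/s here).
const totalTokens = stats.reduce((sum, r) => sum + r.tokens, 0);
const totalSeconds = stats.reduce((sum, r) => sum + r.seconds, 0);
console.log(`${(totalTokens / totalSeconds).toFixed(1)} tokens/sec`);
```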
But yeah, I mean,
I think it's a little bit slow and then it's not really
getting exactly what I want done, done.
Because for example, like, hey, if I take this right here,
right?
And I'll just show this to you guys.
So let's go back, right?
And that was actually the MCP.
So I didn't want to do that.
Let's hold on.
Let's restore to this checkpoint here.
Hold on.
Can we restore here?
Okay, hold on.
Let's go back over here.
So it was this one here.
So I want you to do an in-depth review of the styling of
this dashboard.
So if we switch the model and I'm literally going to turn
off Kimi K2.5 Thinking because I actually don't think it's that good.
I'm not impressed, to be honest with you guys.
So it could be, I mean,
if you guys are like big open source users and using it
like locally,
maybe through a different provider that works a little bit
better, then hey, like go for it.
But in terms of using this in Cursor and using it through OpenRouter, it's not reliable.
It's a little bit slow.
And then I'm seeing the model hallucinate like quite a bit
here.
So I want to turn this off and let's even like just use
Gemini 3 Flash, for instance, here.
So if I revert this and this was the prompt where,
remember, I passed in this, right?
And I told you guys, I said, it's probably just, and look,
you can see it immediately.
First of all, look at how fast this is.
That index where it said that KBK 2.5 was number five on
speed, that just did not look right to me,
if I'm being completely honest with you guys.
That model is really slow, because if you look at Gemini 3 Flash here, it immediately is doing all these.
It already created its to-dos.
And like I said, what I found that Kimi K2.5 did,
and this is just a very particular example that I pick up
on as a vibe coder, is that when I drop in a div,
the model should intuitively know that I'm focused in on
that div.
And what Kimi K2.5 did is it did not, like,
it kind of hallucinated there.
And look at that, boom, already done.
And that's the thing is that, like, as a vibe coder,
you know very quickly, like, how good these models are.
And Kimi K2.5, remember, when I ran this with Kimi K2.5,
it broke my sidebar.
First of all,
it took like five minutes to even do anything,
and it didn't really make the styling that much better.
But with Gemini 3 Flash, it made the styling much better.
It knew that all I wanted to update was that dashboard
page, whereas Kimi K2.5 was updating like the entire,
like a bunch of different pages.
It was slow.
It messed up my nav bar.
So I'm not going to like bash the model,
but I'm not going to approve this model.
I'm not going to give this the BridgeMind stamp of approval.
We'll give it some time, but honestly, guys,
I'm just not impressed.
It was just a little bit slow.
It wasn't following what I was asking it to do.
It was hallucinating a bit.
So I'm actually surprised that the benchmarks were as good
as they were because I did not think that it reflected the
speed that I saw on the benchmarks.
I don't think that it reflected the intelligence that we
saw on the benchmarks.
But we'll give it some more time.
I may use this a little bit more on stream just to give it
a little bit more of a fair shake.
But in terms of, you know,
using it in a vibe coding workflow, I'm not impressed.
So I'm curious as to why it's doing so well on the
benchmarks because that's surprising.
This does not feel like a 47 on the intelligence index to
me.
And especially when you look at the speed,
I am not getting 119 tokens per second.
So I don't know if they paid them off or something,
but I'm not going to approve this.
I could use it a little bit more on stream just for the
sake of giving it another shot,
but I don't perceive that this will be a model that I'm
going to be using a bunch.
But with that being said, guys,
I'm going to wrap up the video.
If you guys haven't already liked and subscribed or joined
the Discord, make sure you do so.
I'm live pretty much every day,
vibe coding an app until I make a million dollars.
And with that being said, guys,
I will see you guys in the future.