Vibe Coding With Kimi K2.5

Transcript

0:00

Hello, everyone, and welcome back to another video.

0:03

In this video,

0:04

I'm going to be vibe coding with the newly released Kimi K

0:07

2.5, which was released yesterday.

0:10

It's a model from Moonshot AI,

0:12

and this is a new Frontier model.

0:15

As you can see here on OpenRouter,

0:17

the context window is 262,000,

0:19

and it's a relatively affordable model at 60 cents per

0:22

million on the input and $3 per million on the output.

0:25

This is an open source model and is effectively now the

0:29

number one open source model,

0:31

but we're going to get into that as we vibe code with this

0:34

newly released model in this video.

0:36

But before we get started, guys,

0:37

we are going to be using Kimi K2.5 inside of Cursor.

0:40

I configured it using OpenRouter and it's configured here.
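
For anyone reproducing this setup outside of Cursor's settings UI, here is a minimal sketch of calling the model through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug is an assumption based on OpenRouter's usual moonshotai/ naming; check the model page for the exact ID.

```ts
// Minimal sketch of hitting Kimi K2.5 through OpenRouter's OpenAI-compatible
// API. The model slug below is an assumption; verify it on openrouter.ai.
async function askKimi(prompt: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "moonshotai/kimi-k2.5", // assumed slug
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```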

0:43

And we're going to be working on a variety of different

0:46

tasks, both across the back end,

0:48

as well as the front end and the database.

0:51

But before we get started, I do ask that you guys like,

0:54

subscribe, turn on post notifications,

0:57

and join the BridgeMind Discord community,

0:58

which is the fastest growing vibe coding community on the

1:01

internet right now.

1:03

If you guys haven't already seen,

1:04

we do have a like goal for this video of 200 likes.

1:07

So if you guys could like the video, let's hit that goal.

1:10

But with that being said,

1:11

let's actually take a look at this model, who made it,

1:14

what is different about it,

1:16

and how it ranks up on the leaderboards and the benchmarks.

1:18

So here is like the summary from Moonshot.

1:21

So they say,

1:21

Kimi K2.5 is Moonshot AI's native multimodal model,

1:26

delivering state-of-the-art visual coding capability and a

1:28

self-directed agent swarm paradigm, which is very,

1:32

very interesting.

1:32

We're going to get into that in a second here because Kimi

1:35

K2.5 is different in this way.

1:37

So built on Kimi K2 with continued pre-training over

1:40

approximately 15 trillion, which is insane,

1:43

mixed visual and text tokens.

1:45

It delivers strong performance in general reasoning,

1:48

visual coding, and agentic tool calling.

1:50

So one really important thing to note is the speeds that

1:54

people are actually getting with this model.

1:55

So you can see here Moonshot AI,

1:57

it's about 29 tokens per second directly from the provider,

2:00

which is actually relatively slow.

2:03

But then there are some faster ones: GMI Cloud is at 75 and then

2:06

Fireworks is at 104.

2:07

So a little bit faster with those providers,

2:09

but relatively slow model, I will say.

2:11

But this model is not yet in LMArena.

2:14

So we can't see how it stacks up there.

2:16

But Artificial Analysis,

2:18

it has added this model to its benchmark.

2:21

So what I want to dive into before we start vibe coding

2:24

with this model and putting it through the test of real

2:27

vibe coding workflows, what I first want to do is, okay,

2:30

where does this stack up with speed?

2:32

Where does it stack up in intelligence?

2:34

So even here,

2:34

you can actually see that Kimi K2.5 in the Artificial

2:38

Analysis benchmark actually stacked up pretty well.

2:42

So it may be a little bit slower when we use it.

2:45

There may be different demands on OpenRouter that cause it

2:47

to be a little bit slower.

2:48

But Artificial Analysis actually sets this at 119.

2:52

So it's number five on speed, which is surprising.

2:54

So I don't know how they're integrating with that model,

2:56

but in terms of speed, it's actually number five,

2:59

which is surprising.

3:00

But in intelligence, this is like a very,

3:03

very important index.

3:04

So let's just scroll down here.

3:05

Kimi K2.5 is performing up there with Frontier models like GPT

3:10

5.1, Gemini 3 Pro, Claude Opus 4.5, and GPT 5.2.

3:14

So you can see it's literally ranking at 47,

3:17

number five on the list,

3:19

which is just absolutely incredible.

3:20

It's beating out Claude 4.5 Sonnet.

3:22

It totally beats out MiniMax M2.1.

3:25

Wait,

3:25

let's add MiniMax because I think that like in the context

3:28

of like these cheaper open source models,

3:30

I think we definitely want to add MiniMax M2.1 and GLM 4.7

3:34

to this list to see, okay,

3:35

how does this stack up to those models?

3:38

So MiniMax M2.1, you can see is back here at 40,

3:41

and then GLM 4.7 is here at 42.

3:43

So Kimi K2.5 is the number one open source model now.

3:49

And in terms of like that cost affordability, I mean,

3:51

look at that.

3:52

I mean, 60 cents per million on the input,

3:53

$3 per million on the output.

3:55

So it's literally like a ninth of the cost of Opus 4.5,

3:58

and you're not losing a ton of intelligence.
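
To make that rough cost claim concrete, here is a back-of-the-envelope comparison. The Kimi K2.5 prices are the ones quoted above; the Opus 4.5 figures of $5 input and $25 output per million tokens are an assumption based on Anthropic's published list pricing.

```ts
// Per-million-token prices in USD. Kimi K2.5 prices are from the video;
// the Opus 4.5 prices are an assumption (Anthropic's list pricing).
const kimi = { input: 0.6, output: 3.0 };
const opus = { input: 5.0, output: 25.0 };

// Cost of a session in dollars, given input and output token counts.
function sessionCost(
  p: { input: number; output: number },
  inTokens: number,
  outTokens: number,
): number {
  return (inTokens / 1e6) * p.input + (outTokens / 1e6) * p.output;
}

// Example: a heavy agentic session with 2M input and 0.5M output tokens.
console.log(sessionCost(kimi, 2_000_000, 500_000)); // 2.70
console.log(sessionCost(opus, 2_000_000, 500_000)); // 22.50
```

Under those assumed prices the ratio comes out closer to 8x than 9x, but the ballpark holds.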

4:00

However,

4:01

I do want to look at the coding index and highlight this.

4:03

This model is not that good at coding, okay?

4:06

Like it definitely is obviously

4:08

a good coding model.

4:09

It still beats out Sonnet 4.5,

4:11

but you can see here that in the intelligence index,

4:13

it's like up there with the Frontier models.

4:15

But then you look at the coding index and you're like, ah,

4:18

you know, it's eight points below Opus 4.5.

4:20

It just didn't, it didn't perform super,

4:22

super well in the coding index.

4:24

And this is reflected too,

4:26

like in their actual benchmarks that they put out,

4:28

which we're going to look at here in a second.

4:29

But the next thing that I want to show you guys that I did

4:32

notice when I was doing a review of some of these

4:35

benchmarks is the hallucination rate, okay?

4:37

Because sometimes you can have a model that is high on the

4:40

intelligence index or high on the coding index,

4:42

but then it has a crazy high hallucination rate.

4:44

Like, for example, a good example of this is GLM 4.7.

4:47

GLM 4.7 has a 90% hallucination rate, which is insane,

4:51

right?

4:51

That's very, very high.

4:53

But what I do want to kind of show you guys is that Kimi

4:56

K2.5 actually does very,

4:58

very well on this hallucination benchmark.

5:00

It beats out models like,

5:02

I think I'm pretty sure it beats out, where is it?

5:04

Where is Gemini?

5:05

Yeah, look at this.

5:05

Gemini 3 Pro here, 88% on the hallucination rate.

5:09

And Kimi K2.5, they just did a very,

5:11

very good job with hallucination.

5:13

So this is going to be a model where, hey,

5:15

it does have a good hallucination rate.

5:17

That's definitely one thing that I noticed.

5:19

That's actually pretty good.

5:20

So with that being said,

5:22

let's take a look at the blog that Moonshot AI actually put

5:25

out.

5:25

So they said, today we are introducing Kimi K2.5,

5:28

the most powerful open source model to date.

5:31

So we already read this,

5:32

but Kimi K2.5 builds on Kimi K2 with continued

5:35

pre-training over approximately 15 trillion mixed visual

5:38

and text tokens.

5:39

Built as a native multimodal model,

5:42

K2.5 delivers state-of-the-art coding and vision

5:44

capabilities.

5:44

I don't know about state-of-the-art coding,

5:46

but in a self-directed agent swarm paradigm.

5:50

This is very interesting.

5:51

So they say that for complex tasks,

5:53

Kimi K2.5 can self-direct an agent swarm with up to 100

5:58

sub-agents executing parallel workflows across up to 1500

6:02

tool calls.

6:03

I don't know how expensive that's going to be to launch 100

6:05

sub-agents,

6:05

but the fact that it does a self-directed agent swarm,

6:08

I don't know how we're going to be able to see that or like

6:10

what that looks like in practice.

6:12

But later in this video,

6:13

once we start putting this into practice, we may see that.

6:15

I'm not sure yet.

6:15

Compared with a single agent setup,

6:18

this reduces execution time by up to 4.5x.

6:20

The agent swarm is automatically created and orchestrated

6:23

by Kimi K2.5 without any predefined sub-agents or

6:26

workflows.

6:27

So literally the model itself is able to deploy this agent

6:30

swarm.

6:30

Whereas let's say that you were using Claude Code or you

6:33

were using one of those other models,

6:35

you're going to have to say, hey,

6:35

launch 10 sub-agents to do this task.

6:38

But this seems like it's going to be self-directed.

6:40

So I don't know how we'll be able to actually see that,

6:42

but we'll see it here in a minute once you start coding.

6:44

So in agents, it does very well on these benchmarks.

6:47

It's leading.

6:48

So I don't,

6:48

I haven't... these aren't benchmarks that I really look

6:51

at.

6:52

The biggest benchmarks that I look at, obviously,

6:53

for vibe coding is this SWE-bench Verified coding benchmark

6:57

here, SWE-bench Verified.

6:59

So you can see here that like this model, like I said,

7:02

you know, it does well on a lot of these benchmarks.

7:05

You can see that it's up here in image and agents and

7:07

video.

7:08

But the issue is that for vibe coding,

7:11

it actually doesn't perform very well.

7:12

You can see right here, 76.8.

7:15

You can look at it and say, okay,

7:17

it does perform better than Gemini 3 Pro.

7:19

So that's something to look at.

7:21

And you can say, hey, look, look at the cost.

7:23

Look at how well it does with the hallucination rate and

7:26

these other factors.

7:27

But in terms of it being the number one coding model,

7:30

it's not even close.

7:31

I mean,

7:31

those four points on this benchmark do make a difference.

7:34

So Opus 4.5 is still leading for the best coding model,

7:38

and GPT is still up there as well.

7:40

Kimi K2.5, not as good.

7:41

But with that being said,

7:42

I think that gives us enough information to be able to like

7:46

test this out.

7:47

So let's go back over to Cursor here.

7:49

And what are we going to be working on?

7:51

We're going to be working across both the back end and

7:53

front end.

7:54

We're going to be working in BridgeMind.

7:56

So as you guys know, we are launching BridgeMind here.

7:59

Let's just log in real quick.

8:00

I think that I have an account here.

8:04

So let me just sign in and all right, perfect.

8:07

So here's BridgeMind.

8:08

There's a couple of different things that I want Kimi K2.5 to

8:11

actually build out for us.

8:13

So with BridgeMind,

8:15

what we're working on is we're working on the Bridge MCP.

8:17

Okay.

8:18

So for example,

8:18

I can go here and I can just create a new project.

8:21

And this project has basically this task list.

8:24

Okay.

8:25

So this is a way that you can work across agents where you

8:27

can see I have To Do, In Progress, In Review, Complete,

8:30

and Canceled.

8:31

And what this does is this allows you to work with agents

8:34

that will then,

8:35

you can create instructions and you guys will see how this

8:37

works.

8:37

But what we're going to focus on particularly is that we

8:40

want to build basically a prompt library and a skill

8:44

library for people to be able to use BridgeMind and use

8:49

pre-built prompts and pre-built agent skills that they'll

8:52

be able to just grab with their subscription.

8:54

So this is kind of a new concept.

8:56

So we're not really going to be focusing on the Bridge MCP.

8:58

We're more so going to be focusing on testing Kimi K2.5

9:01

on its ability to actually build out back-end logic and

9:04

then build that into the front end.

9:06

So I'm just going to launch a new agent here and we're

9:08

going to be working with a lot of different agents.

9:09

Okay.

9:10

So the first thing I want to do is I'm going to open up a

9:12

browser and we're going to go over to localhost here and

9:15

we're going to sign in again.

9:17

Let's see here.

9:19

Like even one thing here.

9:20

Let's zoom out.

9:21

Okay.

9:22

There we go.

9:22

Okay.

9:22

So let's zoom out.

9:23

We're going to log in again and we can do this in a second.

9:26

Let's give it its first prompt.

9:27

So all I'm going to do is I'm going to drag and drop the

9:29

database.

9:30

I'm going to drag and drop the API and I'm going to drag

9:33

and drop the UI and I'm going to drag and drop the web app.

9:36

Okay.

9:37

And I'm then going to give it my first prompt.

9:40

Drag and dropping.

9:41

Okay.

9:41

Hold on.

9:42

I have to be careful here so I don't get this nested.

9:44

Okay.

9:44

So I'm just going to add this.

9:45

Okay, BridgeMind web app.

9:48

I want you to review the database, API, UI, and web app,

9:53

and I need you to build a new table in our Drizzle schema

9:58

called skills.

10:00

Skills will be similar to prompts in how it's structured.

10:04

I need you to review the database in the API because I need

10:07

you to create the new schema for this skills table.

10:11

And then I need you to create a new module in the API so

10:14

that users can create skills.

10:16

And then it will be similar to the prompt schema where

10:20

there can be system skills that are created by admins.

10:23

So I want you to review the database, how it's structured,

10:26

the API, and how it's structured,

10:28

and then I want you to introduce this new skills schema.

10:32

And I want you to build the module for this in the API so

10:35

that users can create skills, update skills,

10:38

and so that admins can create skills and manage skills.

10:42

And I want to start out with the most simple schema

10:47

structure possible.

10:48

So all we really need is the ability for users to create a

10:52

skill, to add content to that.

10:54

And that's all I need for now.

10:56

So review the project and create a structured plan before

11:00

you code anything.

11:01

But first,

11:01

you need to do an in-depth review of these different

11:04

repositories.

11:05

You can launch sub-agents to do this.

11:08

And I then need you to create a structured plan for

11:11

building out this new functionality.

11:13

You need to make sure that you add a skills link in the

11:17

sidebar of the BridgeMind web app and then create the pages

11:21

for this and integrate this functionality across the

11:24

database, API, and front-end web app.
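
As a rough illustration of what this prompt is asking for, here is a minimal Drizzle (pg-core) sketch of a skills table. The column names, the enum, and the nullable creator reference are assumptions about BridgeMind's conventions, not the schema the model actually produced.

```ts
import { pgEnum, pgTable, text, timestamp, uuid } from "drizzle-orm/pg-core";

// "system" skills are admin-created; "user" skills belong to a single user.
export const skillType = pgEnum("skill_type", ["user", "system"]);

// Deliberately minimal, per the prompt: a skill is just named content.
export const skills = pgTable("skills", {
  id: uuid("id").primaryKey().defaultRandom(),
  name: text("name").notNull(),
  content: text("content").notNull(),
  type: skillType("type").notNull().default("user"),
  createdBy: uuid("created_by"), // nullable for system skills (assumption)
  createdAt: timestamp("created_at").defaultNow().notNull(),
  updatedAt: timestamp("updated_at").defaultNow().notNull(),
});
```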

11:27

Okay, so you guys kind of got my prompt there.

11:30

We are going to put cursor in plan mode for this,

11:33

but I did ask it to launch sub-agents,

11:35

and sub-agents are new in Cursor.

11:37

So sub-agents, if you guys don't know what sub-agents are,

11:39

it's essentially the ability for Cursor to be able to spin

11:42

up multiple sub-agents rather than just working in like a

11:45

single conversation chat.

11:46

So this actually helps with context.

11:48

It helps with doing things faster.

11:50

But also, if you guys remember, like, you know, Kimi K2.5,

11:53

I don't know how we're going to be able to see this or if

11:55

there's going to be some type of visual representation of

11:57

this, but they did say that it can launch agent swarms.

12:01

So what that actually looks like in practice,

12:03

I'm not sure yet, but we'll see how that works.

12:06

It does look like it's a little bit slow,

12:08

like when planning next moves, whereas, you know,

12:09

some of the other models will go like super,

12:11

super fast.

12:12

But let's launch another agent and let's actually sign in

12:15

here and let's move on to the next portion of

12:19

what we're going to be working on because we are going to

12:21

be working on multiple things at once.

12:22

Okay.

12:23

So the next thing that I want to try is just like,

12:26

let's see the UI capabilities of this model.

12:28

So we're going to drop in this div.

12:30

I want you to do an in-depth review of the styling of this

12:33

dashboard.

12:34

Right now, this is not the styling I want.

12:36

It does not look good.

12:37

There's wrapping all over the place.

12:39

It's not professional.

12:40

I want you to review the other parts of the website and

12:42

update the styling so it's more compact, more professional,

12:45

and more modern.

12:47

Okay, so we'll drop that in, and that should do the trick.

12:51

Let's actually add the BridgeMind web app just so that it

12:54

knows, okay,

12:54

like it's not going to be searching for what project that's

12:56

in.

12:56

Then here, okay,

12:57

so here's those sub-agents that I was talking about.

13:00

So I'll launch sub-agents to explore the database, API, and

13:02

front-end structures in parallel to understand how prompts

13:05

are currently implemented so I can model skills similarly.

13:09

So one great part about using sub-agents and the reason

13:12

that you should be using them is because let's say that

13:14

you're working on something like this, right?

13:16

Where I drop in database, API, UI, and a web app.

13:19

You can ask it to launch sub-agents so that rather than

13:21

just a singular agent that's, you know,

13:23

doing all of this work,

13:24

it can deploy multiple agents to review the database,

13:27

review the API, review the front end,

13:29

and then they come back together to work.

13:31

So it leads to much faster, better results.

13:35

So this styling is going in.

13:37

This is doing, this agent here is going to be doing a lot.

13:40

Like this is going to be testing the front end one-shot

13:44

capabilities of this.

13:45

Can I launch another browser or let's at least go back to

13:48

localhost?

13:50

Because what we do need is an MCP page.

13:53

I believe that I may have one.

13:54

Let's see.

13:54

Okay.

13:55

Yeah.

13:55

I do have one.

13:56

So, okay.

13:58

Review this page here in this project.

14:01

I need you to update it to be styled much better because

14:04

rather than there being so much information about the

14:07

BridgeMind MCP, number one, I need you to rebrand it,

14:12

not to BridgeMind MCP, but to BridgeMCP,

14:15

like you see it in other parts of the project.

14:18

I also need you to improve this so that it actually has the

14:21

directions for being able to use the MCP in different AI

14:25

agents like Cursor, Codex, Claude Code,

14:29

and other commonly used AI agents.

14:33

You need to update the page so that the main focus is

14:35

actually being able to install and use the MCP with these

14:38

AI agents.

14:40

Note that in order to use the MCP,

14:42

they will need to create an account and get an API key.

14:45

The first thing that you should do is that you should

14:47

actually go to the web app and look at the API to see how

14:50

this works and see how this properly is structured.

14:54

Launch as many sub-agents as you need to get this done and

14:58

then update the BridgeMind MCP page with these

15:00

specifications so that there is better-structured

15:05

information on how to actually use this MCP so that users

15:08

can just copy and paste things in and it's easy to set up

15:11

for users and there's good instructions and directions.
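
For context, the copy-paste setup this prompt is asking for usually boils down to an entry in the client's MCP config; in Cursor that is the mcpServers map in ~/.cursor/mcp.json. The sketch below shows that shape as a TypeScript object; the server name, package, and env var are hypothetical, not BridgeMind's real values.

```ts
// Hypothetical MCP client config, shown as a typed object. Serialized, this
// is what a user would paste into ~/.cursor/mcp.json (Cursor) or the
// equivalent config file for other agents.
const config = {
  mcpServers: {
    "bridge-mcp": {
      // hypothetical server name
      command: "npx",
      args: ["-y", "@bridgemind/mcp"], // hypothetical package name
      env: { BRIDGEMIND_API_KEY: "<your API key>" }, // hypothetical env var
    },
  },
};

console.log(JSON.stringify(config, null, 2));
```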

15:14

Okay, that's a lot.

15:15

And the reason that I use like voice to text tools is

15:17

because it's just a great way to be able to stream your

15:20

consciousness, in my opinion, of like, okay,

15:22

this is what I want done.

15:24

You know, I speak at 170 words per minute.

15:26

So rather than me typing at 100 words per minute,

15:28

I instead, you know, talk at 170.

15:32

So it's just a much faster and better way to do vibe

15:34

coding.

15:35

Also, it does look like these two agents did complete.

15:38

Now, I do want to highlight one key thing here.

15:42

And this is an issue that I have when using like a lot of

15:46

open source, non-Frontier lab models, which is like,

15:50

you can see here that it just like stopped.

15:52

So look at this, read prompts.ts and it just stopped, right?

15:56

Continue, continue.

15:58

So I literally had to just say, hey,

16:00

continue because it just stopped.

16:02

Like,

16:03

and this is a common thing that I have when using like open

16:05

router sometimes in Cursor or using like some of these open

16:08

source models in Cursor is that sometimes I'll be using

16:11

it and it'll just like stop, right?

16:13

And that's actually one thing to check,

16:15

which I didn't check at the start of this video and I

16:18

probably should check now,

16:19

but I did configure this with Cursor, right?

16:22

But one thing to look at is maybe we can take a look at

16:25

models and see: is Kimi K2.5 actually added natively as a

16:29

model?

16:30

Because it wasn't yesterday, but it could be added now.

16:33

Let's see.

16:34

You can see that this Kimi K2 here has been added by

16:37

Cursor.

16:37

So I'm actually surprised that they haven't added it

16:39

natively yet.

16:40

Kimi K2.5.

16:42

Yes, that's actually kind of surprising.

16:43

So Cursor has still not added Kimi K2.5,

16:48

which is unfortunate.

16:49

They really should be adding that model.

16:51

They have added Kimi K2,

16:54

but you can see here that I'm actually connected to Kimi

16:56

K2.5 via OpenRouter.

16:59

So I hope that Cursor does add it natively because a lot of

17:02

people do want to be using these models rather than just

17:06

using like your common Frontier lab models like Opus 4.5

17:09

or Gemini 3 Pro.

17:11

But one thing that we are seeing is obviously it's getting

17:13

the job done, but in terms of reliability,

17:15

like we had these two agents stopped and it is going very,

17:20

very slow.

17:22

This is a really slow model in practice.

17:24

Like right now,

17:26

I know if we go back over to the Artificial Analysis index,

17:28

like you can see, okay, it did put it at speed 119, right?

17:32

Which is like incredibly fast, 119 tokens per second.

17:36

But how were they integrating with that, right?

17:39

Like did they use, if you go back over to OpenRouter,

17:41

like were they using this Fireworks to do that?

17:43

I don't know, you know, who they use for that,

17:46

but you can see Moonshot AI.

17:47

The throughput here is 32 tokens per second.

17:49

And here it's even 10.

17:51

And I don't know,

17:53

could I check on OpenRouter to see how fast?

17:55

Here, let me pull this up on my other monitor.

17:57

And even here, the same thing happened.

17:59

Now let me read the current MCP page to understand what

18:02

needs to be updated.

18:03

And it just stopped again.

18:05

So I'm going to have to say continue.

18:06

So I don't know if this is necessarily like the models

18:10

issue or if it's a Cursor issue.

18:12

But hey,

18:13

you can see that when I'm trying to use this in Cursor,

18:15

it is not like grasping the,

18:19

like it's just bugging out on me, right?

18:21

So let me see if I can actually go to,

18:23

and I've spent 14 cents so far on these prompts,

18:26

just so you guys know.

18:26

So I'm going to go over to activity and see if I can see

18:28

the speed that's coming out of this.

18:30

Okay, so this is perfect, guys.

18:31

So check this out.

18:32

So in terms of speed, this is what I'm getting right now.

18:35

Like Kimi K2.5, you can see the speed.

18:38

In some areas, I'm getting like here, like Kimi K2.5,

18:42

this here, you know,

18:43

this one was 154.3 tokens per second for tool calls, but,

18:48

you know, this was eight, right?

18:50

And so you can see the fluctuation there, but on average,

18:53

look at this, 14 tokens per second, 38 tokens per second,

18:55

28 tokens per second, 35 tokens per second,

18:57

29 tokens per second, 111 tokens per second.

19:00

So that's the speed that you guys are seeing.

19:02

For some of the larger tasks,

19:03

like can we find a task that was like actually like,

19:06

I mean,

19:06

this one was two cents and it was 111 tokens per second.

19:09

Can we see what it actually was?

19:11

Yeah, I mean, it was just a tool call.

19:13

So tool calls, like, I don't know.

19:16

I just am, I'm looking at it,

19:17

and I think that when you are doing vibe coding,

19:19

you do get a good feel for how fast a model is in practice

19:23

and how reliable it is.

19:24

And look at what we're seeing.

19:26

You know,

19:26

we're continuing to get issues inside of Cursor when using

19:30

that.

19:31

So it's just not very reliable.

19:32

Same thing here, explored four files and then stopped.

19:36

So I'm going to do continue again.

19:38

And this is an issue, right?

19:40

And I will say we have to give it some time.

19:43

We have to let whatever's happening happen, you know,

19:46

and then hopefully Cursor will maybe natively integrate

19:49

Kimi K2.5 into Cursor rather than us having to go through

19:53

OpenRouter, because sometimes it's not as reliable.

19:55

Okay, here is the plan that Kimi K2.5 has created.

19:59

So this actually does look good.

20:00

So let's check it out.

20:01

So architecture overview.

20:02

So for the database,

20:03

it's going to create a skills.ts schema.

20:06

It's going to create a skill type enum, which is okay.

20:09

It's going to update the relations.ts file with the skills

20:11

relations, which is good.

20:13

And then it's going to add this,

20:14

the export to the index.tsx file or ts file,

20:17

which is perfect.

20:18

Then in the BridgeMind API,

20:19

it's going to create this new module, the new controller.

20:22

It's going to create this system skills controller,

20:24

then the service, the skills there, the DTOs,

20:28

and then for the web app,

20:29

it's going to add that link in the sidebar.

20:31

It's going to create that new page.

20:33

It's going to add the types,

20:34

and then it's going to add the API endpoints,

20:35

and then it's going to add the components for skills.

20:37

So that does look okay.

20:39

We're just going to see, like, here are all the different

20:42

endpoints that it's going to build.

20:43

So like, for example, here are the endpoints.

20:45

So skills controller,

20:46

it's going to create a skills post endpoint,

20:48

which is obviously perfect.

20:49

GET, GET by ID, PATCH, DELETE.

20:52

So these are within the skills controller.

20:53

And then for the system skills,

20:56

which I don't know how I feel about this.

21:00

I think this will be fine because it's just creating

21:03

different routes that are going to be admin only,

21:06

which is fine.

21:07

And then here is going to be the API endpoints for this.

21:10

So I think that this is okay.

21:13

You know, here are all the endpoints.
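
Since the module/controller/service/DTO split in this plan reads like a NestJS layout, here is a minimal sketch of what that skills controller could look like. Only the route verbs mirror the plan; the DTO fields and service stubs are assumptions.

```ts
import {
  Body,
  Controller,
  Delete,
  Get,
  Injectable,
  Param,
  Patch,
  Post,
} from "@nestjs/common";

// Hypothetical DTOs; the plan keeps skills to named content only.
class CreateSkillDto { name!: string; content!: string; }
class UpdateSkillDto { name?: string; content?: string; }

@Injectable()
export class SkillsService {
  // Stub methods; the real service would call the Drizzle layer.
  create(dto: CreateSkillDto) { return { id: "skill-id", ...dto }; }
  findAll() { return []; }
  findOne(id: string) { return { id }; }
  update(id: string, dto: UpdateSkillDto) { return { id, ...dto }; }
  remove(id: string) { return { id, deleted: true }; }
}

@Controller("skills")
export class SkillsController {
  constructor(private readonly skills: SkillsService) {}

  @Post() create(@Body() dto: CreateSkillDto) { return this.skills.create(dto); }
  @Get() findAll() { return this.skills.findAll(); }
  @Get(":id") findOne(@Param("id") id: string) { return this.skills.findOne(id); }
  @Patch(":id") update(@Param("id") id: string, @Body() dto: UpdateSkillDto) {
    return this.skills.update(id, dto);
  }
  @Delete(":id") remove(@Param("id") id: string) { return this.skills.remove(id); }
}
```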

21:14

And I will say that this is actually a pretty good plan

21:17

that it just created from Kimi K2.5.

21:18

So let's click build on this.

21:20

And there's now 18 to-dos.

21:22

So even though it only said that it explored four files,

21:26

it looks to me like it explored a lot more than four files.

21:29

And then here, so that's this one's now working.

21:31

This is now implementing this plan.

21:33

You guys can see that this is now in process for the

21:35

dashboard.

21:37

What is this?

21:37

Is this just doing this?

21:38

Okay, so it did stop again.

21:40

So I'm continuously having to do continue, continue,

21:42

continue.

21:43

So I don't know why that's the case.

21:44

Maybe somebody can let me know in the comment section down

21:47

below why exactly we continue to have to say continue,

21:50

continue, continue and why halfway through the prompt,

21:52

like it's just stopping.

21:53

This could be a Cursor issue.

21:55

It could be the fact that it's just a pretty new model,

21:57

but now we're actually getting some code out of it.

21:59

And this was for the Bridge MCP page.

22:02

So I don't know why we keep having it stop, right?

22:08

I do not like that, but it did do those searches.

22:10

It's just that maybe it's not very verbose.

22:13

And there's nothing wrong with that.

22:15

I mean, some models are too verbose, right?

22:16

Like Claude Sonnet 4.5 is too verbose.

22:20

And maybe it's just that this model is just not very

22:22

verbose.

22:23

And when it's done with its work, it just like,

22:25

but that's actually not a good thing because it didn't say

22:27

like, hey, here's what I did.

22:28

Here's what's next.

22:29

But I will say now it is building.

22:31

So whatever,

22:32

whatever we're experiencing with the model stopping and

22:34

being a little bit unreliable, we are getting through that.

22:37

And now it's building out this plan, which has 18 to-dos.

22:40

So I'm excited to see how that turns out.

22:41

Now, in terms of this one for the MCP page,

22:43

we're going to see this go in here in just a second.

22:46

So let's go here.

22:47

This is going to be updated.

22:48

So this is the page that this is updated.

22:50

And remember what our instructions were was, hey,

22:52

we need better instructions for how to actually like

22:56

install the MCP and how to use the MCP.

23:00

So this is this one here.

23:02

This page is still updating.

23:03

And I've got to say, it's slow.

23:06

Okay.

23:06

This model is slow.

23:08

I don't think this model is very fast.

23:11

And, you know,

23:11

something that we talk about as a community and that I've

23:15

had conversations with some of you about as well is that,

23:17

hey, in 2026,

23:18

one of the biggest things is going to be speed.

23:21

Like for those of you that remember when Composer 1 came

23:23

out, which Composer 1,

23:24

like the hype totally died down for it because it's not

23:27

that, it wasn't that intelligent of a model, right?

23:29

But the one thing about Composer 1 is it was such a fast

23:33

model that it did make a very big difference because it was

23:37

just so fast.

23:39

And in terms of speed, it's like, hey,

23:41

we just can't have all these slow models.

23:44

Okay, so here's another issue.

23:45

So this one says building, right?

23:47

But it stopped again.

23:48

So it did this to-do and then it just stopped,

23:52

right?

23:52

And I don't like that.

23:53

I have to keep saying continue.

23:56

Again,

23:56

maybe somebody can let me know in the comment section down

23:58

below why exactly that's happening,

24:00

but we're continuing to kind of have to watch these agents.

24:03

And again,

24:04

this one's still just updating this Bridge MCP page.

24:07

I mean, this is like incredibly slow.

24:09

Now, the file is around 650 lines long,

24:14

but this is relatively slow because if you use a model like

24:19

Sonnet or Opus or even GPT,

24:22

I think GPT is faster than this model.

24:25

So, hey, like you take GPT, for example, right?

24:29

And if we go back, where's our, hold on,

24:31

let me pull up my browser and I'm going to show you guys

24:33

like one of the biggest things with, like,

24:37

let's take ChatGPT, for example, or GPT 5.2, right?

24:40

One of the biggest things with GPT models is that, hey,

24:44

they're just a little bit slow.

24:46

So even though the model performs very, very well.

24:49

And if you go over to, you know, the intelligence index,

24:52

it's literally number one on intelligence.

24:54

But one of the reasons that people use Claude Code way more

24:58

is because they just say, hey, you know,

24:59

GPT models are just slow, right?

25:02

And I think that people now realize that, hey,

25:04

the faster that you can get the model, the better.

25:07

And what I'm seeing out of Kimi K2.5 Thinking is just that it's

25:10

a little bit slow.

25:12

And look at this.

25:13

Okay, so it did have this error too.

25:15

So it wasn't a one-shot.

25:17

So this is a very,

25:18

very common issue in Next.js where the prop href expects a

25:22

string or object in Link, but got undefined instead.

25:24

Open your browser's console to view the component stack

25:26

trace.
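
For reference, this error almost always means a <Link> was rendered with an undefined href, and the usual fix is a guard before rendering the link. A minimal sketch, assuming a nav-item shape like the one this dashboard probably uses:

```tsx
import Link from "next/link";

type NavItem = { label: string; href?: string }; // href missing on bad data

// Guarding before rendering <Link> avoids the "prop `href` expects a string
// or object, but got `undefined`" runtime error seen above.
export function NavLink({ item }: { item: NavItem }) {
  if (!item.href) return <span>{item.label}</span>; // degrade gracefully
  return <Link href={item.href}>{item.label}</Link>;
}
```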

25:27

So it literally not only took a very long time to make a

25:30

simple update, but it also did it with errors.

25:33

So I don't know how I feel about that.

25:36

Review this error that you created and fix it.

25:39

So not a one-shot, even on like a simple UI page.

25:42

So not sure.

25:45

We'll give it a chance.

25:46

We'll let these three tasks finish.

25:48

We've got this one here, which fix client layout,

25:51

fix projects page, fix agents page,

25:53

which if you guys remember this,

25:55

like one interesting thing is if you go back to the

25:57

original prompt,

25:59

what I asked was I asked the styling of this dashboard

26:03

right now.

26:03

This is not the styling I want.

26:04

It does not look good.

26:04

So I guess that it, okay, here, here's the issue.

26:09

I dropped in this div, right?

26:11

So we were looking at, and now I have an issue here.

26:13

Maybe I can go to here.

26:14

Let's go.

26:14

Let's go over to the web app.

26:15

So we're going to go to port 3001 here.

26:18

So, ooh, I don't know how I feel about this.

26:20

It did make it more compact,

26:22

but it also took away the header.

26:26

Okay, this is very, very interesting.

26:28

So I asked it,

26:29

I want you to do an in-depth review of the styling of this

26:32

dashboard, and I passed in the div, right?

26:34

And the reason that I passed in the div is because I only

26:37

wanted it to focus on that div section, right?

26:41

And instead of Kimi K2.5 intuitively knowing that all I

26:45

wanted to update was this div,

26:47

you can see that it actually did hallucinate and it started

26:51

going in and updating every single page.

26:53

Look at this.

26:54

It updated the client layout.

26:55

It updated the page when in reality,

26:57

all I wanted it to do was to update this main dashboard

27:02

page, which it did.

27:03

But if you look at the to-do list,

27:05

you can see it's fix projects page, fix agents page,

27:08

update card components, fix sidebar styling.

27:11

It even took away the button for the sidebar.

27:13

So there was a button, if you guys remember,

27:15

up in the header here where I could actually collapse the

27:17

sidebar, but it took that away.

27:20

And now there's no header for me to even be able to

27:23

collapse or open up my sidebar.

27:25

So that updated the client layout.

27:27

I don't know if it's going to be able to fix that,

27:29

but we're going to have to click continue because it also

27:31

did that thing again where it stopped working halfway

27:33

through.

27:33

So so far, how long is this video?

27:36

27 minutes.

27:37

I'm not super impressed.

27:39

You know, I think that a lot of people,

27:40

when these new models come out, they hype up the models,

27:44

right?

27:44

Because for a lot of people that are creating content or

27:47

whatnot, they want to keep the AI hype going, right?

27:49

But my approach to it is, hey,

27:52

we're like seriously putting vibe coding into real

27:54

practice.

27:54

If you guys know, like, you know,

27:55

I'm vibe coding every day until I make a million dollars.

27:58

So every single day I wake up and I use these models and I

28:01

use them for hours and hours and hours on end.

28:03

And pretty quickly, you can realize, okay,

28:06

this model is not doing a great job.

28:09

So I know that it's performing well on the benchmarks,

28:11

but in practice, I mean, this thing is hardly usable.

28:14

So I don't want to like bash it and, you know, like say,

28:18

I mean,

28:18

obviously there's a reason that it's performing so well in

28:20

the leaderboards, but in actual practice,

28:23

this is kind of a hard model to use.

28:25

I mean, I'm using it in OpenRouter.

28:26

I'm using it in Cursor.

28:28

You can see here, like if we refresh, like look at this,

28:31

26 tokens per second, 13 tokens per second.

28:34

What is the average?

28:35

Average spend, average day, average request?

28:38

Okay,

28:39

I wish I could see like an average speed rather than just

28:41

like the speed for each tool call.

28:43

But yeah, I mean,

28:44

I think it's a little bit slow and then it's not really

28:48

getting exactly what I want done, done.

28:51

Because for example, like, hey, if I take this right here,

28:54

right?

28:55

And I'll just show this to you guys.

28:56

So let's go back, right?

29:00

And that was actually the MCP.

29:01

So I didn't want to do that.

29:02

Let's hold on.

29:03

Let's restore to this checkpoint here.

29:07

Hold on.

29:07

Can we restore here?

29:11

Okay, hold on.

29:12

Let's go back over here.

29:14

So it was this one here.

29:15

So I want you to do an in-depth review of the styling of

29:17

this dashboard.

29:18

So if we switch the model and I'm literally going to turn

29:21

off Kimi K2.5 Thinking because I actually don't think it's

29:24

that good.

29:25

I'm not impressed, to be honest with you guys.

29:27

So it could be, I mean,

29:28

if you guys are like big open source users and using it

29:31

like locally,

29:32

maybe through a different provider that works a little bit

29:34

better, then hey, like go for it.

29:36

But in terms of using this in Cursor and using it through

29:38

OpenRouter, it's not reliable.

29:40

It's a little bit slow.

29:41

And then I'm seeing the model hallucinate like quite a bit

29:45

here.

29:45

So I want to turn this off and let's even like just use

29:49

Gemini 3 Flash, for instance, here.

29:51

So if I revert this and this was the prompt where,

29:55

remember, I passed in this, right?

29:57

And I told you guys, I said, it's probably just, and look,

30:00

you can see it immediately.

30:01

First of all, look at how fast this is.

30:04

That index where it said that Kimi K2.5 was number five on

30:08

speed, that just did not look right to me,

30:11

if I'm being completely honest with you guys.

30:14

That model is really slow because if you look at Gemini 3

30:17

Flash here, it immediately is doing all these.

30:21

It already created its to-dos.

30:23

And like I said, what I found that Kimi K2.5 did,

30:26

and this is just a very particular example that I pick up

30:29

on as a vibe coder, is that when I drop in a div,

30:34

the model should intuitively know that I'm focused in on

30:38

that div.

30:39

And what Kimi K2.5 did is it did not, like,

30:42

it kind of hallucinated there.

30:44

And look at that, boom, already done.

30:45

And that's the thing is that, like, as a vibe coder,

30:47

you know very quickly like how, how good these models are.

30:51

And Kimi K2.5, remember, when I ran this with Kimi K2.5,

30:55

it broke my sidebar.

30:57

First of all,

30:57

it took like five minutes to even do anything,

31:00

and it didn't really make the styling that much better.

31:03

But with Gemini 3 Flash, it made the styling much better.

31:06

It knew that all I wanted to update was that dashboard

31:09

page, whereas Kimi K2.5 was updating like the entire,

31:12

like a bunch of different pages.

31:14

It was slow.

31:15

It messed up my nav bar.

31:17

So I'm not going to like bash the model,

31:19

but I'm not going to approve this model.

31:20

This is,

31:21

I'm not going to give this the BridgeMind stamp of approval.

31:25

We'll give it some time, but honestly, guys,

31:27

I'm just not impressed.

31:29

It was just a little bit slow.

31:30

It wasn't following what I was asking it to do.

31:33

It was hallucinating a bit.

31:34

So I'm actually surprised that the benchmarks were as good

31:37

as they were because I did not think that it reflected the

31:41

speed that I saw on the benchmarks.

31:43

I don't think that it reflected the intelligence that we

31:45

saw on the benchmarks.

31:47

But we'll give it some more time.

31:48

I may use this a little bit more on stream just to give it

31:51

a little bit more of a fair shake.

31:52

But in terms of, you know,

31:53

using it in a vibe coding workflow, I'm not impressed.

31:57

So I'm curious as to why it's doing so well on the

32:00

benchmarks because that's surprising.

32:03

This does not feel like a 47 on the intelligence index to

32:05

me.

32:06

And especially when you look at the speed,

32:08

I am not getting 119 tokens per second.

32:10

So I don't know if they paid them off or something,

32:13

but I'm not going to approve this.

32:15

I could use it a little bit more on stream just for the

32:17

sake of giving it another shot,

32:18

but I don't perceive that this will be a model that I'm

32:20

going to be using a bunch.

32:21

But with that being said, guys,

32:23

I'm going to wrap up the video.

32:24

If you guys haven't already liked and subscribed or joined

32:27

the Discord, make sure you do so.

32:29

I'm live pretty much every day,

32:30

vibe coding an app until I make a million dollars.

32:32

And with that being said, guys,

32:34

I will see you guys in the future.

Summary

This video provides a real-world assessment of Moonshot AI's newly released Kimi K2.5 model through the lens of 'vibe coding' in Cursor. The creator initially highlights the model's impressive benchmarks, where it ranks as a top-tier open-source model in terms of intelligence and features an innovative 'agent swarm' capability. However, practical testing reveals a significant gap between these rankings and actual performance. The model struggled with slow response speeds, frequent interruptions requiring manual continuation, and a lack of intuitive understanding of coding tasks, often leading to UI hallucinations and errors. Ultimately, the creator concludes that despite its high intelligence scores on paper, the model's unreliability and poor instruction-following make it less effective than competitors like Gemini 3 Flash for professional development workflows.
