Cursor just crushed Claude Code

Transcript

0:00

So a few days ago I opened up Cursor.

0:02

As always, I went and I switched my model here to Opus 4.7,

0:05

and I started giving it a bunch of instructions and getting it to build out an app for me.

0:09

Now, after about 2 or 3 hours, I actually ran through all of my usage

0:12

of the high-end premium models here and, actually unknown to me,

0:16

Cursor switched my model to Composer 2.5, which at the time, and even still

0:21

now, is their brand new model that they just released.

0:23

Now, I didn't know that this switch had happened, or I didn't see it in the UI

0:27

until I realized that all of the things I was asking Cursor to do were happening 3 or 4 times faster,

0:33

and the results that I was getting were pretty much the exact same as what I was getting before

0:37

with Opus 4.7,

0:38

if not even a little bit better in terms of the structure of the output.

0:42

So I actually dug into this. I was like, why is this so much faster?

0:44

I realized that it had switched me over here to Composer 2.5,

0:48

and then I started looking into this new model and just using it for the next 2 or 3 days.

0:52

And what I realized is that this model is as good, if not better than all of the frontier models.

0:57

We're talking Opus, we're talking GPT 5.5.

1:00

And I don't know why more people are not talking about it, which is why I'm making this video.

1:04

Now, I actually just had Composer 2.5 whip up a quick presentation so I can go through this with you,

1:08

but what I'm going to do later in the video is show you a real demo.

1:10

So wait for that if you want to see exactly how it works.

1:13

But the main significance here is that Cursor's new Composer model is almost

1:17

as good as all of the frontier models, but at a fraction of the cost.

1:20

That's why I put it on the homepage here.

1:22

It costs $0.50 per task using the Composer model, versus

1:26

$7 per task using Opus 4.7 on the exact same benchmark.

1:31

This is not scientific.

1:32

This is just Cursor's own benchmark.

1:34

The thing is, this is 14 times cheaper, at least in this test,

1:37

and in my tests it's much cheaper, it's much faster, and it runs just as good.
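
The "14 times cheaper" figure follows directly from the two per-task prices quoted above. A minimal sketch of that arithmetic, assuming the $0.50 and $7 figures from Cursor's own benchmark as stated in the video (the 100-task workload and the helper names are hypothetical, added only for illustration):

```python
# Per-task costs as quoted in the video from Cursor's own benchmark.
# These are not independently verified figures.
COMPOSER_COST_PER_TASK = 0.50  # dollars per task, Composer 2.5
OPUS_COST_PER_TASK = 7.00      # dollars per task, Opus 4.7


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times larger one per-task cost is than another."""
    return expensive / cheap


def cost_for_tasks(cost_per_task: float, n_tasks: int) -> float:
    """Total spend for n_tasks, assuming the per-task cost stays flat."""
    return cost_per_task * n_tasks


ratio = cost_ratio(OPUS_COST_PER_TASK, COMPOSER_COST_PER_TASK)
print(f"Opus 4.7 costs {ratio:.0f}x as much per task as Composer 2.5")

# Hypothetical workload of 100 tasks, purely to show how the gap scales.
print(f"Composer 2.5: ${cost_for_tasks(COMPOSER_COST_PER_TASK, 100):.2f}")
print(f"Opus 4.7:     ${cost_for_tasks(OPUS_COST_PER_TASK, 100):.2f}")
```

Real usage is billed by tokens and plan limits rather than a flat per-task price, so this only restates the benchmark ratio.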

1:42

So I'm going to go through all of the key benchmarks and give you all of the data,

1:44

and then we'll get into a live demo so you can see what I mean.

1:46

Okay. So let's go through this here.

1:48

Cursor is just winning on every front right now.

1:51

They have a great model.

1:52

They have the best agentic coding harness, which I want to talk about later.

1:55

And they're just one of the most pleasant tools to use for real software development.

1:59

Even if you want to use Opus, or you want to use GPT 5.5,

2:02

you will get a better result using it inside of Cursor because of the harness

2:07

that Cursor has developed.

2:08

And if you're unfamiliar with what a harness is, I have a whole section on that later.

2:11

Claude Code, Codex, right, Cursor.

2:15

These are harnesses.

2:16

These are tools that are built around an LLM, and Cursor's harness

2:20

objectively is just better than pretty much all of the other ones that are on the market.

2:24

Which is why, in my opinion, it's really winning the agent development race.

2:27

So if we go through a few stats of this model, you can see that it was released

2:30

May 18th, just eight days from when I'm recording this video.

2:33

It was based on the Kimi K2.5 checkpoint, or at least that's what they modeled it off of.

2:37

And then this is the same base as Composer 2, which is the previous model they had.

2:41

That was also very good.

2:43

That was released in March, except this one obviously is significantly better if we keep going.

2:47

There's a few other stats you can look at here in terms of the parameters, inference, all of this

2:51

kind of stuff.

2:52

I'm not going to bore you if you don't know what those mean,

2:54

but pretty much this uses a mixture of experts architecture in the background

2:57

and had a kind of interesting compute and training split.

3:00

If you're interested, look it up.

3:02

I'm not going to bore you with all of the details.

3:03

Okay. So let's go through a few of the benchmarks here.

3:06

First, I have a third party evaluation benchmark.

3:08

This is the Artificial Analysis Coding Agent Index.

3:11

You can see that it's practically tied with Opus 4.7 and GPT 5.5.

3:16

If we keep going and we look at SWE-bench Multilingual, you can see that

3:20

it is very similar performance.

3:21

In fact, even better than GPT 5.5.

3:24

If we have a look at Cursor Bench v3.1, of course it's going to shine here

3:28

because this is Cursor's own benchmark.

3:30

But we can see it outperforms the field.

3:32

And then if we look at Terminal Bench 2.0, it's pretty much the exact same as Opus.

3:36

And then of course GPT 5.5 is still leading by quite a bit.

3:39

So if you still want to do shell heavy work, probably you're going to go with that.

3:42

But still very, very impressive.

3:44

And the most impressive part is the cost, right.

3:47

This is a significantly more efficient model.

3:49

It is way cheaper.

3:50

It is way faster as you're going to see.

3:52

And just in their comparison here, when they were running this on I believe it was

3:56

the Cursor benchmark,

3:57

the cost per task was $0.50

4:00

for Composer 2.5, and nearly $7 for Opus 4.7.

4:04

Now, I don't know what the value was for GPT, or I didn't put it in this presentation.

4:08

I imagine it would be less.

4:09

But still,

4:10

if you were just to look at this, you would imagine the model is going to be significantly worse,

4:14

but it's not.

4:14

If anything, it's lagging by 2 or 3 percentage points,

4:17

which really isn't noticeable in real world coding tasks,

4:20

especially when you want to get something done fast and when you want to have usage left

4:24

for the next day, which you just can't say when you're using these high-end Anthropic models

4:28

unless you're extremely rich.

4:29

Now let's keep going and let's talk about coding harnesses,

4:33

because this, to me is actually the more interesting kind of part of the story here.

4:37

When we talk about Cursor, if we just talk about these raw models, right,

4:41

It's like giving some text input and giving some text output.

4:44

your frontier models are still going to perform better than something like Composer 2.5.

4:48

Now that's because these are more general purpose models.

4:51

They're not just used for coding.

4:53

And well, when you just use them from the base API, you just generally get a better response

4:57

because they have more training data and they're just more designed to do that.

5:01

However, when you want to use these models in a special task

5:04

or specialized use case like coding, you're usually not just using a raw API.

5:08

You have context management, you have skills, you have tools, right?

5:12

You have sub agents, all of these different things.

5:14

And those get wrapped into what we call a coding harness.

5:17

Now, Claude Code, that's a coding harness.

5:20

When you use Claude Code, you're connecting to MCPs.

5:22

You have this context management going on.

5:24

You have tools in the background that are being used that you don't even know about,

5:27

to search for files and find context and information.

5:30

Now, Cursor is also a coding harness.

5:33

When you go and you load up Opus or you load up Composer or you load up

5:37

GPT inside of Cursor, you're using Cursor's harness,

5:41

which means the way the context is being injected, the way the tools are being loaded,

5:45

the way that skills are being presented is actually being controlled by Cursor.

5:49

And Cursor has a really good harness, to the point where

5:52

if you actually just bring in any of these models that you would use outside.

5:56

So GPT 5.5, Opus, whatever, and you run them inside of Cursor, you're going to see

6:01

that you get a better response because Cursor has spent more time optimizing their coding harness.

6:07

And that means that when you combine their really good harness with their now really good model,

6:11

you get an even better result because they kind of mesh and have that chemistry, right?

6:15

It's just like how Apple silicon runs really well on MacBooks because it's designed to do exactly that.

6:21

And just to give you a bit of a better visual, you can have a look at kind of a quick graphic

6:24

on what a harness actually looks like.

6:27

Now you can see that this is really the orchestration layer, right.

6:30

So context, looping, tools,

6:33

and then actually connecting and reading your code like I was talking about before.

6:37

The model is really just one part of this entire equation.

6:40

Now it's the brain. Sure.

6:41

It's really what's needed in order to do anything or come up with any creative work,

6:46

but the way that the model functions is completely dictated by the harness that's around it.

6:51

Because again, all a model can do is just predict text.

6:54

That's literally all it does.

6:55

You give it some text in and it will give you some text out.

6:58

So we need to control what text goes into it.

7:00

And then we need to provide things like tools to the model.

7:03

So if I have a really good tool in this harness, but I don't have it in another harness,

7:08

my model can perform better just because of the quality of the tools that it has access to.

7:13

If my looping and failure handling are a lot better in one harness, I'm going to get a better result

7:17

because I'm going to fix all of these errors proactively and have the model

7:21

run multiple turns right, or do more things than it would do in another harness.
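
The idea that the harness controls what text goes in, supplies the tools, and loops on failures can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Cursor or Claude Code is implemented: `fake_model`, `read_file`, `TOOLS`, `run_harness`, and the `CALL`/`DONE` reply format are all hypothetical names invented for the example.

```python
from typing import Callable, Dict, List


# A "model" is just text in, text out. This hypothetical stub asks for a
# tool once and then answers; a real harness would call an LLM here.
def fake_model(prompt: str) -> str:
    if "TOOL RESULT" not in prompt:
        return "CALL read_file main.py"
    return "DONE The file defines a greeting."


# Hypothetical tool: the harness, not the model, actually executes it.
def read_file(path: str) -> str:
    files = {"main.py": "print('hello')"}
    if path not in files:
        raise FileNotFoundError(path)
    return files[path]


TOOLS: Dict[str, Callable[[str], str]] = {"read_file": read_file}


def run_harness(task: str, model: Callable[[str], str], max_turns: int = 5) -> str:
    """Build context, call the model, run any requested tool, and repeat."""
    context: List[str] = [f"TASK: {task}"]
    for _ in range(max_turns):
        reply = model("\n".join(context))  # the harness decides what text goes in
        if reply.startswith("DONE"):
            return reply[len("DONE"):].strip()
        if reply.startswith("CALL"):
            try:
                _, name, arg = reply.split(maxsplit=2)
                result = TOOLS[name](arg)
            except Exception as exc:  # feed failures back instead of stopping
                result = f"ERROR: {exc!r}"
            context.append(f"TOOL RESULT: {result}")
        else:
            context.append("NOTE: reply was not understood, try again")
    return "gave up after max_turns"


print(run_harness("Summarize main.py", fake_model))
```

Swapping in a better tool or better error feedback changes the final answer without touching the model at all, which is the point being made about harness quality.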

7:25

I'm being super vague with how I'm describing this, but Cursor

7:28

has really set themselves up from the beginning to be the best harness.

7:32

They fine-tuned everything.

7:33

They've written the best descriptions, they have the best system prompts,

7:36

they have the best context engineering and indexing of your codebase.

7:40

And this allows them to give you a much better result when you're coding, regardless

7:44

of what model you're using.

7:45

Because the tools, the infrastructure, and the orchestration around it are simply just better.

7:50

And if you don't believe me, go and use Claude Code on a project and then go and use Cursor.

7:54

And I guarantee you you will see a noticeable difference, even if you use

7:57

the exact same model at the exact same settings.

8:00

Again, the model is just predicting text.

8:03

All of these other things really are that full picture.

8:05

And I'm going to make a full video on this later to break it down more.

8:08

Anyways, point is, the harness is really important.

8:10

Cursor is a really good harness and that's why I think it's important to talk about this.

8:14

And just as I was discussing here, the model is the engine.

8:16

The harness is the car.

8:17

Doesn't matter how good your engine is if it's in a shitty car, right?

8:20

That's just the analogy to go with.

8:21

Now, like I talked about, Cursor didn't just ship a model, it fine-tuned the harness around it.

8:26

So not only do they have Composer 2.5, but they have the specific coding harness, which is the Cursor

8:31

application fine tuned around that model, which just gives you this really good result.

8:35

Now, because I know a lot of you guys are going to ask this.

8:37

Yes. The site that I made here was built using Composer 2.5 inside of Cursor.

8:42

But you'll actually notice that I have it deployed here to a public here.now

8:46

domain, which was entirely free, that I didn't need to pay for,

8:50

and it required pretty much no setup at all.

8:52

Now, here.now is sponsoring this video and has been sponsoring my channel, but

8:55

I love mentioning them because you literally don't even need to make an account or pay for anything.

8:59

It is completely free.

9:01

The way that it works is you literally just copy the setup instructions for your AI agent.

9:05

You paste it to your agent and you just say, hey, I want to deploy this site using here.now,

9:10

and it just figures it out for you and deploys the site to a URL.

9:14

So you can see that I did that here. I said deploy this using here.now.

9:17

I gave it the setup instructions and, if we scroll down, it installed the skill

9:20

and then it just deployed it to this URL which we're on right now, which I can now

9:24

share with anyone, so this presentation is one that you guys can access.

9:28

And then by default it will be available for 24 hours.

9:31

If you want to claim the site and have it be available permanently,

9:34

again, you don't need to pay for anything, you can just claim it and connect it to your account.

9:38

So you make a here.now account.

9:40

And then it will just save the URL permanently so that you're able to access it

9:43

and it's connected to your account.

9:45

You can do this for literally hundreds of different websites.

9:47

It is the fastest, easiest way to deploy something, especially if you just want to send it

9:51

to someone for testing and you can set up all kinds of additional stuff like secrets.

9:55

You can claim domains, you can have a drive, variables, API keys, all of this kind of stuff.

9:59

I myself have deployed many different sites and that's why I'm happy to have them

10:02

as a sponsor of this channel, because I think it is a genuinely really cool tool

10:06

that all of you guys can use if you are using AI agents.

10:09

Now, of course, I believe you can upgrade and you can get, you know, like a $4

10:12

per month plan if you want thousands of sites or a developer plan.

10:15

But I've never needed to do that.

10:16

I just use it for free and I'm always happy to share with you guys.

10:19

Check it out from the link below.

10:20

It's literally just here.now.

10:22

You can deploy sites for free.

10:23

Okay, so now what I want to do is get into a live demo and show you the speed difference

10:29

and, importantly, the quality difference between Composer 2.5 and a model like Opus or GPT 5.5.

10:35

So I have two Cursor applications open on screen.

10:39

On the left-hand side, I'm going to be using Composer 2.5 in fast mode,

10:43

and on the right side I'm using Opus 4.7 in medium fast.

10:47

Now, in order for me to use the fast mode here, I did need to enable the max mode in Cursor,

10:51

which now means that I'm going to be paying six times the default price.

10:54

So we'll look at the credits and usage afterwards to see how much this actually ends up costing me,

10:59

because it's going to go on extra usage on my plan, but I want it to be fair and do two fast mode

11:03

comparisons rather than just comparing to the default opus one, but we can look at that as well.

11:08

So what I'm going to do is just paste in a prompt.

11:10

The prompt is build a production ready real time collaboration

11:13

whiteboard web application similar to a simplified Miro or Excalidraw.

11:17

The application must support multiple users editing the same board, etc.

11:21

Okay, so I'm going to paste it in both of them and just hit enter at approximately the same time.

11:26

And we're going to see how long this takes.

11:28

Obviously I'll fast forward through it using both of these models and overall

11:31

kind of the result that we get. Not the most scientific comparison,

11:34

but I just want to show you quickly what we're looking at here in terms of the side

11:38

by side comparison on one model that is going to be, in this case, like 20 times

11:42

more expensive than the other.

11:43

Okay, so already here Composer's going and it's starting and it's working on the application.

11:48

It's writing code.

11:49

Whereas Opus still hasn't started doing anything yet.

11:51

Yes, it's in thinking mode.

11:53

So maybe it's coming up with like a more clever plan

11:55

or something, but it's like randomly, for some reason, checking the workspace contents

11:59

while this is already going and just spitting out a massive amount of code

12:02

and you can see, you know, 120 line files pretty much instantly being completed,

12:06

whereas Opus hasn't even started writing any code yet.

12:09

Again, not a wholly scientific comparison, but I'm just trying to show you the speed difference

12:13

that we're looking at here, because this is drastic to me,

12:16

and I felt this a lot when I was doing my own coding projects.

12:19

All right. So Composer just finished here.

12:22

I don't know the exact time, but it looks like it was maybe 3 or 4 minutes to execute this.

12:27

And then if we're looking at our, what is it, Opus version

12:30

on this side, it's still going. Right now it's running the dependencies.

12:34

It's trying to test everything. So we'll see how long that's going to take.

12:36

So for now it looks like this is running.

12:38

Let's actually just open it up.

12:39

You can see that okay. That's pretty impressive.

12:44

Let me see.

12:44

Can I open multiple of these.

12:45

So that's going to be, I guess, the key distinguisher.

12:49

So let's run this and put it over here okay.

12:52

It looks good to start.

12:53

And let's see if I draw okay.

12:56

If I do a circle, if I do a stroke okay.

13:00

Whatever.

13:01

Let's just mess around with a bunch of different stuff here.

13:04

Let's add some text or something.

13:06

I don't know how the text one works. Oh, I got to do text here.

13:09

Let's see.

13:09

Text and edit this hello world.

13:13

And you can see that it's happening in real time.

13:15

And obviously on localhost it's quite fast but you get the idea.

13:18

So I mean that's pretty good in the first run here

13:22

of this application, and there's probably a bunch of other stuff that I'm missing.

13:25

I can zoom in, zoom out, and it actually shows where I am on the other screen, which is super cool.

13:30

Okay.

13:31

So I mean, it just essentially one-prompted that application.

13:34

Opus is still running right now.

13:36

So just to be fair, let's stop this dev server just so that

13:39

it doesn't have some issue running it, and let's see how much longer it takes for Opus.

13:42

Okay, so Opus finally finished.

13:44

I was actually getting impatient relative to Composer here.

13:47

I think we're at at least 15 minutes,

13:48

whereas, again, with Composer it was a maximum of like four, and it didn't even run the application for me.

13:53

So I'm just going to say run the app for me because it wasn't doing that.

13:57

Even though Composer did that by default.

13:59

It's giving me the instructions.

14:01

So let's see if it can run it.

14:02

Hopefully it will spin up the shell and then we can test it and see if it works.

14:06

Now, one thing to note, because I was just looking at the code,

14:08

given that I just had to sit here for 15 minutes, is that if we look at the Opus version,

14:13

we actually get it with pure CSS and JS, so no TypeScript, and it's not using React.

14:20

Yeah, we have a server and whatever separated out. It looks like at least

14:22

the structure is somewhat decent in terms of all of the different files.

14:27

But then if we go and we have a look at our Composer version,

14:31

we can see that this actually used a separate client,

14:34

a separate server, and it used React for the client.

14:37

You can see with this let me close this here kind of app.

14:40

Dot of main dot.

14:42

All of this.

14:43

Again we have some folder structures here: hooks, lib, canvas.

14:48

And generally the structure is as good, if not better than, the, what do you call it here,

14:53

Opus version on the right-hand side. For the server,

14:55

I mean, it's a little bit barebones,

14:57

what we've got going on here in terms of the WebSockets.

14:59

It looks like it's just transmitting, like, all of the data.

15:02

So that's probably an issue if I look into that a little bit deeper.
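
To show why transmitting all of the data on every change can become a problem, here is a minimal sketch comparing a full-board message with a message that carries only the shape that changed. It is not taken from the generated app: the message shapes, the `full_state` and `upsert` type names, and the 1,000-rectangle board are all hypothetical, and it uses plain JSON strings rather than a real WebSocket connection.

```python
import json


def full_state_message(board: dict) -> str:
    """Serialize every shape on the board, as when all data is resent."""
    return json.dumps({"type": "full_state", "shapes": board})


def delta_message(shape_id: str, shape: dict) -> str:
    """Serialize only the single shape that changed."""
    return json.dumps({"type": "upsert", "id": shape_id, "shape": shape})


def apply_message(board: dict, message: str) -> dict:
    """Apply either kind of message to a client's local copy of the board."""
    msg = json.loads(message)
    if msg["type"] == "full_state":
        return dict(msg["shapes"])
    updated = dict(board)
    updated[msg["id"]] = msg["shape"]
    return updated


# Hypothetical board with 1,000 rectangles; one of them is then moved.
old_board = {f"s{i}": {"kind": "rect", "x": i, "y": i} for i in range(1000)}
moved = {"kind": "rect", "x": 5, "y": 99}
new_board = dict(old_board)
new_board["s5"] = moved

full = full_state_message(new_board)
delta = delta_message("s5", moved)

# Both messages bring a client to the same state; only the size differs.
print(len(full), len(delta))
```

Resending full state is simpler and self-correcting, so it can be a reasonable first version; the cost only shows up as boards and user counts grow.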

15:05

The point is that it at least proactively went with TypeScript, whereas this one just went with pure JS.

15:10

Okay. So it's saying let's check this out here.

15:12

So let's go to the demo one.

15:16

What is it saying to go to let's go to be my board okay.

15:19

And let's open this twice and let's test it and just do a sanity check and make sure that this works.

15:24

Okay. Here we go.

15:26

Paste this and let's try to draw something.

15:30

Can I draw

15:32

okay.

15:33

And it looks like unless there's some issue that I cannot do

15:38

anything share maybe link.

15:43

So work.

15:44

Yeah.

15:44

Like, I cannot seem to do anything at all.

15:48

Like, it's just completely broken.

15:51

So that's great.

15:52

I waited 15 minutes for all of that.

15:54

So I'm not going to try to debug this right now.

15:55

But this is kind of what I'm talking about, guys, is that yeah, we think Opus

15:59

or we think these models are the best,

16:00

but then something like Composer comes out and just completely tears it apart.

16:03

And sure, this is just one experiment.

16:04

Of course it can make mistakes.

16:05

It's not a scientific method I'm showing you right here, but I have this in Opus 4.7, medium

16:10

fast mode, where I'm paying probably 20, 25 bucks, what I should be for this model to run.

16:15

Meanwhile, we get a better result in four minutes, right?

16:19

That actually functions from this extremely cheap Composer model.

16:23

So just to be fair, let's switch over now to GPT 5.5,

16:25

give it the same prompt, and see what we get.

16:27

Okay, so we just started running GPT 5.5.

16:30

I just made it full screen here. You can see I'm doing medium fast.

16:32

So same thing I had to put in max mode.

16:34

So again I'm going to spend an arm and a leg to run this here inside of Cursor.

16:37

But hopefully this is going to give us a functioning response.

16:40

Let's see.

16:41

All right. So Codex is wrapping up now.

16:43

Now this is taking a long time as well I think we're at about ten minutes at this point.

16:47

So a little bit shorter than opus but still taking a good amount of time.

16:51

I will say, however, that the code quality that we're getting, at least at first

16:54

glance, is significantly better than what we had with all of the other models.

16:58

I just skimmed through some of the files.

16:59

It did use React and TypeScript

17:01

by default. The server looks a little bit more complete in terms of what it's actually doing.

17:06

We have shared types, we have automated testing.

17:09

And just again, me quickly going through this is not a scientific analysis here,

17:14

but even just looking at like the board and all of these different things,

17:17

it seems like this would be a more maintainable application overall,

17:21

and at least at first glance for me is a little bit more understandable.

17:24

Now it's also running the tests automatically,

17:26

making sure that everything is going to work, and then hopefully it's going to spin up the app.

17:29

So once it spins it up, we'll test it and see if it does actually function.

17:33

But at first glance, I'm feeling a little bit

17:34

more positive here with GPT, or with Codex, or whatever you want to call it.

17:38

I guess it's just GPT because we're not using it in the Codex app, and same thing as before.

17:42

I'm using the fast mode in a medium effort, which I think is kind of a fair analysis.

17:47

Okay, so a few minutes later and it looks like this is working now,

17:52

so we can do a rectangle, we can do an ellipse.

17:56

If I go here we can do a sticky note okay.

17:58

And just place it on screen I don't know how I'm supposed to like zoom in on this.

18:04

Because that doesn't really seem to be working.

18:07

And it looks like, yeah, we can add text box.

18:09

Okay, let's go back here to select.

18:13

We have a name that I guess we can update.

18:16

So overall it's not the most clean version of the application,

18:21

but at least it's functioning, unlike the version that we had with Opus here.

18:25

And in terms of the time

18:26

analysis here, this took a similar amount of time as Opus after I told it to run the app,

18:29

and it was doing all of the testing and all of this, at least we're getting something functioning.

18:33

But still, my favorite version of the app

18:35

so far, at least in terms of just initial function, is the one that we had with Composer.

18:40

Now, I didn't know that was going to happen when I ran these prompts.

18:42

I haven't run these before.

18:43

I'm not trying to be biased towards one particular model.

18:46

In fact, I would hope that these models would perform better, but it doesn't seem that that's the case.

18:50

Now, GPT does seem like it has a bit more reasoning, a bit more thinking.

18:54

It's trying to structure the app in a bit more of a clever way.

18:57

So I appreciate that in terms of the way that it structured it here.

19:00

And I'm sure with a few more prompts, I could get it to give me a much better result

19:03

if I told it more specifically what I want, but overall a little bit disappointing compared

19:08

to what we got with Composer in like five minutes, whereas this took like 15 or 20 minutes to run.

19:13

So anyways, guys, that's going to wrap up this video.

19:15

I wanted to show you how good the Composer model is,

19:18

how cheap it is, how well it works inside of Cursor, and encourage you to use it.

19:23

I'm not biased towards

19:24

any one of these individual models, I just want to use whatever one is the best. I found

19:27

it works really well for me, so hopefully this gave you something to chew on,

19:31

and let me know what you think of the Composer model and your experience with it,

19:34

and I will see you in another video.

Interactive Summary

The video highlights the impressive performance and cost-efficiency of Cursor's new 'Composer 2.5' model compared to premium frontier models like Opus 4.7 and GPT 5.5. The presenter demonstrates through a live coding challenge that Composer 2.5 delivers high-quality, functional results significantly faster and at a much lower cost ($0.50 vs $7 per task). The video emphasizes that Cursor's success is largely due to its superior 'harness'—the orchestration layer that manages context, tools, and code indexing—which optimizes the output of any underlying AI model.