
Vibe Coding With Claude Sonnet 4.6


Transcript


0:00

I'm going to be vibe coding with the newly released Claude

0:03

Sonnet 4.6. This is a model that was released yesterday by

0:07

Anthropic, and so far,

0:09

I am very impressed I was able to use this model

0:12

extensively yesterday and so far.

0:14

I have a lot to say about it I've already put it through

0:16

the bridge bench I've put it through the creative HTML test

0:19

and I have a lot to talk about and show you in this video

0:22

We are going to be looking at benchmarks We are going to be

0:25

putting it through production vibe coding tasks to really

0:28

give this model a fair shake And by the end of this video,

0:31

I will share whether or not it is bridge mind certified

0:34

with that being said if you haven't already Liked

0:37

subscribed or joined the discord make sure you do so

0:40

There's a link in the description down below to join the

0:42

fastest growing vibe coding community on the internet right

0:44

now And with that being said,

0:46

let's get right into the video All right So the first thing

0:49

that I want to do is actually take a look at this model

0:51

inside of OpenRouter, see how fast it's running from

0:54

Anthropic, check out the price, and compare it a little bit

0:57

with Opus 4.6.
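If you'd rather pull these numbers programmatically than eyeball the OpenRouter page, the public models endpoint exposes context length and per-token pricing. A minimal sketch; the model slugs are assumptions, so check the model list for the exact IDs:

```python
# Sketch: pulling pricing/context numbers from OpenRouter's public model list
# instead of the web UI. The model slugs below are assumptions; check
# https://openrouter.ai/models for the exact IDs.
import requests

resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
resp.raise_for_status()
models = {m["id"]: m for m in resp.json()["data"]}

for slug in ("anthropic/claude-sonnet-4.6", "anthropic/claude-opus-4.6"):  # assumed IDs
    m = models.get(slug)
    if m is None:
        print(f"{slug}: not listed")
        continue
    pricing = m["pricing"]  # per-token prices, returned as strings
    print(
        f"{slug}: context={m['context_length']:,} tokens, "
        f"input=${float(pricing['prompt']) * 1e6:.2f}/M, "
        f"output=${float(pricing['completion']) * 1e6:.2f}/M"
    )
```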

1:00

So right away you can see the one million token context window.

1:01

That's pretty good,

1:02

but that was expected from this model. We also can see

1:06

three dollars per million on the input and fifteen dollars

1:09

per million on the output That is the same exact pricing as

1:12

sonnet 4.5 If we compare this now to opus 4.6,

1:16

you know You're talking about a 1 million in context with

1:18

opus 4.6 and then the price Five dollars and twenty five

1:21

dollars respectively.

1:22

So you're talking about, what is that, close to a 40%

1:26

cost reduction when talking about Sonnet compared to

1:30

Opus. So you're talking about a 40% cost reduction. The

1:33

question really is: okay, how much of a speed-up is Sonnet

1:37

compared with opus and we can see that here So here you can

1:40

see that Anthropic is putting out this model

1:43

at 40 tokens per second And with opus you're looking at 32

1:48

tokens per second. So what is that, a 25% speed-up, I

1:51

believe. So you're talking about a 25% speed-up when

1:55

comparing Sonnet

1:56

versus Opus, which is pretty good; you get 25% faster. And then if

1:59

you look at the latency: 1.22 seconds versus 2

2:03

.25 seconds,

2:04

so it almost cuts the latency in half.
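For anyone who wants to sanity-check that math, the percentages come straight from the numbers quoted above; a quick sketch:

```python
# Quick arithmetic behind the Sonnet 4.6 vs. Opus 4.6 comparison quoted above.
sonnet = {"input": 3.00, "output": 15.00, "tok_per_s": 40, "latency_s": 1.22}
opus   = {"input": 5.00, "output": 25.00, "tok_per_s": 32, "latency_s": 2.25}

cost_reduction = 1 - sonnet["input"] / opus["input"]          # same ratio on output: 15/25
speed_up       = sonnet["tok_per_s"] / opus["tok_per_s"] - 1  # throughput gain
latency_cut    = 1 - sonnet["latency_s"] / opus["latency_s"]  # latency reduction

print(f"cost reduction: {cost_reduction:.0%}")    # 40%
print(f"throughput gain: {speed_up:.0%}")         # 25%
print(f"latency reduction: {latency_cut:.0%}")    # ~46%, roughly "cut in half"
```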

2:07

So this is a more performant model for sure, and then you're talking about a

2:11

little bit more of a price reduction. But something I do

2:13

want to emphasize is that if you guys remember Claude

2:16

Sonnet 4.5 was very, very popular. And the reason that it

2:22

was popular is that, at the time, Claude Sonnet

2:25

4.5 was the best model that Anthropic had; it was when Opus 4

2:29

.1 was the leading opus model Okay And something I want to

2:33

highlight here that some people may forget is that the

2:36

reason that sonnet?

2:37

4.5 was so much better at that time,

2:40

which is what I don't know maybe four or five months ago at

2:42

that point The reason it was so much better is that it had

2:45

this price three dollars per million and fifteen dollars

2:48

per million While the leading opus model was fifteen

2:51

dollars per million and seventy five dollars per million So

2:53

this was a 5x increase in price to use Opus models.

2:58

But that's no longer the case right and the point that I

3:01

want to make here that everybody needs to understand is if

3:03

you guys Remember when sonnet 4.5 was like by far the best

3:06

model that everybody was using that was when opus was way

3:09

more expensive And they really reduced the price for opus

3:13

So that brings in the

3:16

debate, it makes the debate a little bit hotter: hey,

3:18

Should you use sonnet or should you use opus?

3:20

So with that being said,

3:22

let's now take a look at the benchmarks and look at the

3:25

improvement from sonnet 4.5 to 4.6 So here it is.

3:28

You can see the terminal bench 2.0.

3:30

You're looking at an 8.1 percent improvement. Agentic coding:

3:34

SWE-bench Verified. This is the most important

3:37

coding benchmark that I always look at and here they were

3:40

able to get a 2.4 percent Improvement which is okay,

3:43

but you're talking about one point:

3:45

it's still 1.2 percent worse than Opus 4.6. So in terms of

3:50

coding you're still looking at that opus 4.6 and saying

3:53

okay This model is more capable there now if you do compare

3:57

this in the agentic computer use It's just right up there

4:01

with Opus 4.6. It does very well with computer use, and if we

4:05

actually go to the blog that anthropic put out They

4:08

emphasized computer use a lot in this blog.

4:12

Look at this.

4:12

They actually emphasized this first They said almost every

4:16

organization has software it can't easily automate,

4:18

specialized systems and tools built before modern

4:20

interfaces like APIs existed. To have AI use this software,

4:23

users would previously have had to build bespoke

4:27

connectors But a model that can use a computer the way a

4:29

person does changes that equation So this is something that

4:32

I've talked about previously on live streams and on my

4:35

channel, which is just the fact that, hey, computer use is coming.

4:39

That's the reason that I think OpenClaw is such a big deal

4:42

is that computer use is going to become a much bigger deal

4:46

in the future. You're going to have isolated environments,

4:48

and these AI agents are going to be using these computers.
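For context, this is roughly what a computer-use request looks like through Anthropic's API: you hand the model a virtual display tool, it replies with click, type, and screenshot actions, and your harness executes them in an isolated environment. A minimal sketch based on the original computer-use beta; the model ID is a placeholder, and newer models may use a newer tool version string:

```python
# Minimal sketch of a computer-use request against Anthropic's API.
# Assumptions: the model ID below is a placeholder, and the beta/tool version
# strings follow the original computer-use beta. The agent loop that executes
# actions and returns screenshots is elided.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-6",  # placeholder ID, not confirmed
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",   # virtual display the model can click and type into
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{
        "role": "user",
        "content": "Open the legacy invoicing tool and export last month's report.",
    }],
)

# The model replies with tool_use blocks (screenshot, click, type, ...);
# a real agent loop would execute each action and send the result back.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```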

4:51

Look at Sonnet 3.5 in October 2024 compared to where

4:55

sonnet 4.6 is now look at that 14.9 compared to 72.5,

4:59

which is just insane So take note of that also They did put

5:04

out look at this the agentic financial analysis that beat

5:08

out the office tasks. It also performed very

5:11

well on agentic tool use, right up there with Opus 4.6. So

5:15

this model is on par with opus 4.6.

5:18

It's not it's not better than it You know,

5:20

we've seen sonnet models like sonnet 4.5 was better than

5:23

the opus model that existed at the time You're like, okay,

5:26

use use sonnet 4.5, right?

5:28

But now that we have opus 4.6 with the price that we're

5:31

getting it at,

5:31

you know You're only talking about a 40% price reduction

5:34

and then a 25% speed improvement So that does kind of bring

5:37

into the equation: what should you be using?

5:40

Look at novel problem solving: Opus 4.6 is

5:43

10.5 percent higher than Sonnet 4.6. So there's a

5:47

lot that you can look at here and say okay opus 4.6 is

5:50

still a really capable model That is better than sonnet 4

5:54

.6.

5:55

So you have to weigh Okay,

5:57

do I want that improvement in speed but do I also want to

6:01

have that at a price reduction, right?

6:02

So like you have to you just have to ask yourself that

6:04

question Another thing that they emphasized is that users

6:07

preferred sonnet 4.6 to opus 4.5 Their frontier model from

6:12

November 59% of the time.

6:14

This is really interesting that they said this So 59% of

6:19

the time people actually preferred this model and they

6:21

they rated it as significantly less prone to

6:25

over-engineering and laziness.

6:27

So that's very very interesting.

6:28

It means that it's better instruction following Okay,

6:30

so that's another important thing.

6:32

Also look at this guys So if you guys are familiar with the

6:36

vending bench,

6:37

this is a really trending benchmark, the Vending

6:41

Bench.

6:42

Look at how it performed sonnet 4.6 outperforms sonnet 4.5

6:46

on vending bench arena by investing in capacity Early then

6:50

pivoting to profitability in the final stretch.

6:52

Look at how it performed 6,000

6:55

versus like barely over 2,000

6:57

So it was it was like close to 3x more performant,

7:00

which is insane So you're looking at this and you're saying

7:03

okay Look at the difference between sonnet 4.5 and 4.6 Like

7:06

that's actually a pretty noticeable difference And what we

7:10

want to look at now is the Bridge Bench,

7:13

and we want to see okay How does this model actually

7:16

perform in our own personal benchmarks?

7:18

And what we've done is we've created bridge bench to do

7:21

that, and I want to emphasize that I have now indexed this in

7:25

bridge bench.

7:25

You guys can go check it out If you go to bridge mind and

7:28

just go to community go to bridge bench There's also going

7:30

to be a link in the description down below but check out

7:32

how this performed.

7:33

So Here's Claude Opus 4.6.

7:36

Here's Claude sonnet 4.6.

7:38

Here is the entire ranking so you can see here's GLM 5

7:41

Here's Qwen 3.5.

7:42

Here's MiniMax M2.5.

7:44

Here's GPT 5.2.

7:45

Codex So Claude sonnet 4.6 outperforms GPT 5.2 codex.

7:50

It does better in completions.

7:52

It also is faster So here you can see that the the total

7:55

time and I'm gonna change up the formatting of this But the

7:58

total time for sonnet 4.6 to complete this benchmark was

8:01

924 seconds Whereas with Opus 4.6,

8:05

it was 983 seconds and with GPT 5.2 codex.

8:09

It took 2245 seconds,

8:11

so that's where, you know, with these Anthropic

8:13

models There's so much more performance and speed and I

8:16

will say that they did improve this with GPT 5.3 codex They

8:20

improved the speed but this model we can't benchmark it yet

8:23

since it's not available in OpenRouter or the API The next

8:26

thing I want to take a look at is the creative HTML

8:28

benchmark So this is a benchmark that we use to be able to

8:32

actually look at UI comparisons So the first one is the

8:35

lava lamp So let's go take a look at the lava lamp that

8:38

sonnet 4.6 produced.

8:40

Here it is right here So this is the lava lamp produced by

8:43

sonnet 4.6.

8:44

You guys can make your judgments on this Let's go back over

8:47

to Opus 4.6 and look at this.

8:50

So here's Opus 4.6.

8:51

Here is sonnet 4.6 So definitely different approaches.

8:54

I don't know what you guys think is better But here's the

8:57

lava lamp.

8:58

Another one that I like to take a look at is the retro

9:00

space invaders This is probably my favorite one to look at.

9:03

Here's how Opus 4.6 performed I think it did very well on

9:07

this on this creative HTML test,

9:09

but let's take a look at how sonnet 4.6 performed Let's

9:13

take a look at it here sonnet 4.6 And I did post this on X

9:16

and a lot of people were saying that they thought that

9:18

Sonnet 4.6 did a better job because it followed the

9:23

instructions. Look at the alien behind it, it added in a

9:25

boss. And the spaceship down there at

9:29

the bottom is a little bit better styled It did make it

9:32

more retro.

9:33

So you're seeing the styling and the styling so far is

9:36

pretty impressive So in terms of styling,

9:39

I think that this is a very capable model for styling Let's

9:43

go check out if it is in design arena.

9:45

Let's go over here and check out design arena AI let's go

9:49

see if it's in design arena yet.

9:51

I actually don't know if it's in design arena Let's check

9:53

the leaderboards though.

9:54

So sonnet it is in design arena.

9:55

Look at this.

9:56

So sonnet 4.6 performs 1338 so it is behind opus 4.6.

10:02

So opus 4.6 got 1397 the thinking version got 1373 and then

10:07

GLM 5 got 1354, and then Sonnet 4.6 got

10:12

fourth place with 1338.

10:14

So that's actually interesting It did perform worse than

10:17

GLM 5 but still up there highly outperforming Gemini 3 Pro

10:21

and way outperforming GPT 5.2 High. So that gives you guys a

10:26

look at the UI and the styling There I think that even

10:30

though like it is 1338, you know,

10:32

you are taking into account that,

10:33

okay It is more affordable.

10:35

It's faster.

10:36

That is very important to note Also another important

10:40

benchmark that I do want to pull up is the artificial

10:42

analysis This is one of the most important benchmarks that

10:45

I look at.

10:45

So let's see.

10:46

It should be benchmarked Let's take a look and just make

10:49

sure real quick.

10:50

So here is Artificial Analysis.

10:52

This is a really important benchmark Let's see if it's

10:54

indexed here.

10:55

Let's go to sonnet 4.6 it is here.

10:58

So here it is adaptive reasoning and then let's also add

11:00

GPT 5.2 to the mix as well. I think I turned it off by

11:03

accident.

11:04

So let's see here sonnet.

11:06

Oh my gosh guys Look at this sonnet 4.6 max is ranked

11:10

number one.

11:11

That is insane.

11:12

So it actually outperforms, or I guess it's tied,

11:14

but I guess it's outperforming here.

11:16

Maybe it did a little bit better So it's outperforming but

11:18

look at this sonnet 4.6 on the intelligence index

11:21

outperforms opus 4.5 And let's take a look at opus 4.6 and

11:26

add that to the equation.

11:27

So here's opus 4.6 So it is under opus 4.6 And again,

11:30

that's what we're seeing: whereas with

11:33

Sonnet 4.5 versus Opus 4.1, Sonnet 4.5 was a better

11:38

model at that time, right?

11:40

But what we're seeing is that opus 4.6 is actually still

11:44

Outperforming sonnet 4.6,

11:45

which is actually a good thing if we go to the coding

11:48

index.

11:48

Let's take a look there Wow, look at this guys.

11:50

So on the coding index,

11:52

this is big So on the coding index sonnet 4.6 is the best

11:57

performing model in the Artificial Analysis

12:00

coding index Look at this 49 and then 51.

12:04

So this is the highest performing model in the artificial

12:08

analysis coding index. I definitely need to make an X post

12:10

about that because that's a big deal.

12:12

This is a high-value index, so we want to take a look

12:17

at this now.

12:17

Oh my gosh, the agentic index: it absolutely performs really well.

12:22

Let's see, is GLM 5 added to this index as well? Because I

12:25

know GLM 5 performs very well on this one as well Let's add

12:28

it so GLM 5 performs well on this agentic index But look at

12:31

this it destroys GPT 5.2 and again a little bit under opus 4

12:35

.6 Max. But look at that 63, that's very, very performant.

12:40

Another thing that we want to take a look at is the

12:42

hallucination index So let's scroll down a little bit and

12:44

take a look at that.

12:45

That is the index that measures the hallucination rate So

12:48

let's take a look at this.

12:49

Wow.

12:50

Wow, chat.

12:51

Look at this 38% on the hallucination index This is insane

12:56

So if you look at opus 4.6 That's at 60% and on this

13:00

benchmark the lower the score the better and look at that 38

13:04

%. So that is a 10-point reduction in the hallucination rate from

13:08

sonnet 4.5 So that is very very impressive.

13:12

That's something to look at right hallucination rate is

13:15

very very important You know a big problem with Gemini

13:18

models is that they hallucinate all over the place, right?

13:20

Even Gemini 3 Pro, look at that, 88%. And something

13:24

that we see with these Anthropic models is that they do

13:28

have very good hallucination rates, right?

13:30

Meaning lower hallucination.

13:32

You look at GPT 5.2 Extra High, 78%, whereas now, look at this:

13:36

sonnet 4.6 38% So this model is going to hallucinate less

13:41

than opus 4.6 and that is going to make it more reliable So

13:45

you're talking about a model that's going to be more

13:46

reliable.

13:47

That's faster.

13:48

That's cheaper. This model has a lot of

13:52

potential guys and that's going to summarize the benchmark

13:55

review And now I want to move on to actually using this

13:59

model inside of Claude Code and talking to you guys about

14:02

what I saw yesterday in my vibe coding session,

14:05

a vibe coding no-talking stream, and I also was using

14:08

this model when it initially launched during my vibe coding

14:12

"an app until I make a million dollars" series. So with that

14:14

being said,

14:15

let's launch Claude code and actually start vibe coding

14:18

with this model All right guys,

14:19

so I have bridge voice and bridge space launched These are

14:23

two tools in our vibe coding suite that I'm going to be now

14:27

using in order to vibe code with sonnet 4.6 You can see

14:31

bridge voice is a very good voice to text tool and a bridge

14:34

space is a great ADE agentic development Environment to be

14:38

able to manage an agent swarm So you can see I have 16

14:42

sonnet 4.6 agents open and I have my voice to text tool And

14:46

I'm just gonna start working like I normally would in

14:48

production and you guys can follow along as I put this to

14:51

the test But for a minute here,

14:52

I'm just gonna put all of this to the test to watch this

14:54

The first thing I'm going to do is drop in bridge bench I

14:57

need you to review a bridge bench and take note of the new

15:00

speed test bench that we have What current results do we

15:04

have coming out of the speed test?

15:05

Are there any models that have completed the speed test?

15:07

So we're going to drop this in and then we're also going to

15:10

go over here and we're going to drop in the bridge Mind UI

15:13

I need you to do an in-depth review: launch three sub-agents

15:16

to be able to review this website and review it in terms Of

15:19

the performance for lighthouse and make sure that this

15:22

website is optimized for speed and performance as well as

15:25

for SEO Do an in-depth deep dive and analyze it and then

15:29

tell me your findings with these three sub agents So I'm

15:32

going to drop these in here.
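For reference, the kind of Lighthouse performance and SEO check those sub-agents are being asked to run can also be reproduced directly with the Lighthouse CLI. A sketch, assuming a local dev server (the URL is a placeholder):

```python
# Sketch: the kind of Lighthouse performance/SEO audit described above, run
# via the Lighthouse CLI. The localhost URL is a placeholder for whatever
# dev server the site runs on.
import json
import subprocess

url = "http://localhost:3000"  # placeholder dev-server URL
subprocess.run(
    [
        "npx", "lighthouse", url,
        "--only-categories=performance,seo",
        "--output=json",
        "--output-path=lighthouse-report.json",
        "--chrome-flags=--headless",
    ],
    check=True,
)

with open("lighthouse-report.json") as f:
    report = json.load(f)

for category in ("performance", "seo"):
    score = report["categories"][category]["score"] * 100  # 0..1 -> 0..100
    print(f"{category}: {score:.0f} (target: 90+)")
```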

15:33

I'm then going to go over here.

15:34

I'm going to drop in a bridge voice.

15:37

I Need you to do an in-depth review of the customized

15:40

scroll bar that is in this application I need you to make

15:43

sure that it has an on theme scroll bar for this app that

15:46

works Well both on Windows and Mac and Linux I also need

15:50

you to make sure that that top header looks good and is

15:52

customized for Windows Mac and Linux So let's now do that.

15:56

So it looks like only one model has the results so far

15:58

anthropic So what I need to do now is I'm going to drop in

16:01

bridge mind UI I need you to review this website and go to

16:05

the bridge bench page and add this as a toggle for the

16:08

speed test bench Where we will be able to display the

16:11

leaderboards for the fastest models Please create this new

16:14

toggle in the bridge bench page and then update accordingly

16:17

add this sonnet 4.6 model to the leaderboard there So we're

16:21

going to pass this in and just for reference So you guys

16:23

can see what that is doing So if we go over here and we

16:25

actually go over to local host What you guys are gonna see

16:28

is that we can go to bridge bench,

16:30

which I kind of showed you guys earlier, but you can now...

16:32

Wow, it already added the speed bench. And wow,

16:36

so it already did that that's actually really impressive So

16:38

let's um,

16:39

let's now go over here and let's let's drop in another

16:42

prompt.

16:43

So let's drop in bridge voice I do have an issue with

16:46

bridge voice locally I need you to launch three sub agents

16:48

to do an in-depth review of bridge voice We made some

16:51

updates recently when we added in the ability to be able to

16:54

toggle the view of the widget between Logo only versus

16:57

having the logo and bridge voice for the widget pill and

17:00

now the widget pill is no longer showing when I initially

17:03

launched Bridge Voice. Would you launch three sub-agents to

17:05

do an in-depth deep dive to figure out why the Widget pill

17:09

is no longer displaying on Mac Do an in-depth deep dive and

17:12

figure out what is causing this and then implement a fix so

17:15

that it does display on launch So I'm gonna do that and I'm

17:19

gonna launch another agent and this one is for bridge

17:21

Actually,

17:22

I can't do that because we're using it locally right now So

17:24

let's see this one.

17:25

So these agents are all working right now This one is doing

17:28

a look there and then I'm also gonna do a follow prompt on

17:31

this one I need you to look at the models that we have

17:34

tested in open router like GLM 5 Minimax M 2.5 and I now

17:39

need you to launch sub agents for each of these models that

17:41

we have not put through the speed Bench and I need you to

17:44

benchmark these models and then output their results

17:48

respectively But launch sub agents for each model that we

17:50

have not yet put through the speed bench yet So we're gonna

17:53

put all those models.

17:53

I'm just gonna send that there we're also going to launch

17:56

here and I will say that so far what I am finding is that

17:59

this is noticeable. Like, I was using this last night, and

18:03

this model is very fast and it is reliable You know when

18:06

you look at that hallucination rate one thing that I can

18:08

say for certain is that it is very reliable. It's

18:12

a very reliable model.

18:13

It's cheap.

18:14

It's fast.

18:15

I need you to create an agent team that is built around

18:17

identifying AI slop code and Ways that we are able to

18:22

improve performance in bridge mind API You need to create a

18:26

team that will be able to evaluate the API which is a nest

18:29

JS Application and this team will be able to work together

18:32

to identify findings that we can improve the code without

18:36

breaking functionality Improve performance without breaking

18:39

functionality launch this team and output your findings and

18:43

then using that same team I need you to implement fixes

18:46

again without breaking any functionality.

18:48

So let's also paste that in So that is going to create a

18:51

team So we'll put teams to the test and one thing with

18:53

teams is that, you know, Claude Code released teams, and

18:56

teams were something where I said,

18:57

hey This is going to be useful But Claude Opus 4.6 is just

19:01

a little bit too expensive to actually get the use out of

19:03

teams So with sonnet the interesting part about sonnet is

19:07

that sonnet is a 40% cost reduction So with that 40% cost

19:11

reduction that is going to make teams a little bit more

19:14

usable because with Opus if you launch a team It's going to

19:17

max out your usage immediately.

19:18

But with sonnet, it's you know,

19:20

you're looking at 40% cost reduction So you're just gonna

19:23

get way more use out of your plan,

19:24

right and it's right up there in performance It's even

19:27

better with the hallucination rate.

19:29

So definitely something to take note of. Let's launch the Bridge

19:31

Mind API, and I already put it in: I need you to run the test

19:35

suite and fix any failing tests.

19:37

Let's launch this one.

19:37

So you can see that team is now being created Let's now go

19:40

over here and drop in another one here and I'm gonna drop

19:44

in the bridge mind web app and the bridge mind UI I need

19:47

you to do an analysis of the onboarding flow and I need you

19:50

to make sure that the styling is polished and that it Is

19:52

accurate and flows well with best practices.

19:55

So let's drop this in another interesting thing that we

19:58

have is this one is now done. So this one is telling us about

20:01

the Lighthouse performance: 72 to 78, where the target is 90-plus.

20:05

The total estimate is 30 to 40 hours for full

20:08

implementation.

20:09

Well, not with Sonnet 4.6; it'll

20:10

probably just take about 10 minutes. So I now need you to

20:14

output all of the findings so that we can get our

20:17

lighthouse score to improve Run the test and then create a

20:21

structured readme file with everything that is needed So

20:25

that we can create documentation and then be able to start

20:28

improvements. But first, you

20:30

need to run the test and then create the readme

20:32

respectively Okay,

20:34

so let's paste that in now and then I'm gonna go over here

20:36

and I have an interesting problem here Which is the URLs

20:39

when I'm redirecting with bridge space and bridge voice.

20:42

So check out the URLs here So you see this one redirected

20:46

to this: this port and then this desktop code. It's

20:50

not a very friendly URL and what I want to do is I want to

20:53

improve the URL redirects for both bridge space and Bridge

20:56

voice without impacting the auth functionality.

20:59

So I'm gonna drop in both of these redirects and I'm going

21:02

to paste this in I'm going to say I want you to take a look

21:06

at the URLs that I get sent to when Using the auth flow for

21:11

both a bridge space and bridge voice take a look at these

21:14

URLs. And then also, can I go back one?

21:18

So yeah I want to go back one too so that it can take a

21:21

look at like the other step Because I think that this can

21:24

be improved and this is a little bit more of a just

21:26

difficult and nuanced task It's a little bit weird, right?

21:30

So let's drop this in so we have bridge voice bridge space

21:33

and you can see that even in the bridge Voice it says

21:36

electron redirect, right?

21:37

That's not necessarily correct, right?

21:39

It's a little bit outdated. So let's now pass in Bridge

21:41

Voice, and let's also pass in Bridge Space Tauri. And what

21:45

we're now going to do is we're going to say I want you to

21:48

do a deep dive on the redirect URLs and the auth flow for

21:52

both bridge space and bridge voice think through best

21:55

practices for creating a user friendly URL system you can

22:00

see that for bridge voice it has electron redirect even

22:03

though this isn't even an Electron application. So

22:06

there's some issues there It would be really nice if we

22:09

could optimize the URLs to be user friendly and follow best

22:12

practices launch three sub agents and Search the internet

22:16

for the best practices for this and then I need you to

22:20

create a structured plan for updates That will not break

22:23

functionality but will update the URLs and this auth flow

22:26

to have URLs that are more user-friendly and follow

22:29

best practices So I also need to drop in the bridge mind UI

22:34

and again We are currently using bridge voice and bridge

22:37

space which are tools in our vibe coding tool suite You can

22:41

currently subscribe to bridge mind pro for 50% off your

22:45

first three months So $10 a month for your first three

22:48

months,

22:49

you can go to bridge mind AI and learn more But these

22:51

products are getting closer and closer to shipping They

22:54

already are in production for Mac OS and Windows and we're

22:57

getting to more stable releases But these products are

23:00

going to be very helpful for vibe coders and I use them

23:03

daily in my workflow So it's a creative match patterned

23:06

here.

23:06

So this is the recommendation So these are all the issues

23:10

that it found. So now, I want you to launch sub-agents

23:14

that will implement updates for each in the best practice

23:17

way That will not break functionality,

23:18

but will implement the fixes accordingly. So we're going to

23:23

now pass that in. It looks like all two thousand

23:26

two hundred and eight tests are passing. So we don't need to

23:28

worry about that.

23:29

Let's see this here.

23:30

So we are passing these tests, it looks like. Let's

23:32

look at the speed test here.

23:33

So Aurora Alpha: two hundred fifty-eight tokens per second;

23:36

Claude Sonnet 4.6: 63 tokens per second.

23:39

So here on the speed test this actually was running at 63

23:42

tokens per second So even though it said 40 tokens per

23:45

second in my speed test I got sixty three point eight

23:49

tokens per second,

23:49

which is absolutely nuts, inside of Open

23:53

Router, inside of the Bridge Bench speed test.

23:55

So that's one thing to look at there.
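The speed test itself is conceptually simple: time a completion and divide the reported output tokens by the wall-clock seconds. This isn't the actual Bridge Bench harness, just a rough sketch over OpenRouter; the model slug is an assumption, and wall-clock time also includes request latency:

```python
# Rough tokens-per-second measurement over OpenRouter (not the real Bridge
# Bench harness, just the general idea): time one completion and divide the
# completion tokens reported in `usage` by the elapsed wall-clock time.
import os
import time
import requests

t0 = time.monotonic()
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-sonnet-4.6",  # assumed slug
        "messages": [{"role": "user", "content": "Write a 500-word summary of how HTTP caching works."}],
        "max_tokens": 1024,
    },
    timeout=300,
)
resp.raise_for_status()
elapsed = time.monotonic() - t0

completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} tok/s")
```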

23:56

Sixty-three tokens per second. I'm very interested to see

23:59

what we get out of Opus 4.6. It looks like Qwen

24:03

3.5 was at fifty-one, and then a couple more

24:06

are still running I need you to add Opus four point six to

24:09

this as well.

24:10

So let's let's do this as well We'll get the results from

24:13

Opus four point six here These agents are analyzing and

24:16

what I will say about using this model is: yesterday I

24:20

was using it I'm using it right now.

24:21

And what I will say is I notice it being more reliable I

24:25

noticed the hallucination rate being lower I notice it

24:28

being faster and I notice it being able to one-shot So in

24:32

terms of talking about this model and whether you should be

24:35

using it whether it has the bridge mind stamp of approval

24:38

Boom bridge mind stamp of approval.

24:40

I will say I can tell that this is a good model just after

24:44

using it This is something that you want to start

24:47

implementing in your workflow that 40% cost reduction and

24:50

the speed increase for this model is very impressive It's

24:54

noticeable.

24:55

You're going to notice it when you're launching teams,

24:57

you're going to notice it when you're using sub agents You

24:59

know, this is another thing that I wanted to

25:02

talk about: if you go to the model menu, right?

25:04

This is something that you guys need to know about.

25:06

So the model menu has changed for Claude Code. So check this

25:10

out.

25:10

So: select model, switch between Claude models.

25:13

So the default recommended is Opus four point six This is

25:16

the most capable model for complex work and you now have

25:19

the ability to, and this is new with yesterday's

25:22

deployment Of sonnet four point six.

25:23

They also shipped this so now you can choose the one

25:26

million in context for Opus four point six But look at

25:30

this.

25:30

It's billed as extra usage.

25:32

Do you guys see this?

25:33

So I don't know why they necessarily did this This is

25:35

weird.

25:36

So They have the default right?

25:39

But then they also have the one million in context and then

25:42

that's billed as extra usage and I don't know why they did

25:45

that So if you want to get absolutely taxed you can select

25:50

the one million in context and then it's the same thing for

25:53

sonnet So you can choose sonnet and it says best for

25:55

everyday tasks and then you can also choose sonnet 1m and

25:59

it says billed as extra usage. And this is actually billed

26:03

at a higher price.

26:05

So: six dollars per million on input and $22.50 per million on output.
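To see what that premium actually means per request, here is the standard tier versus the 1M-context tier at the prices quoted above, with a completely made-up request size purely for illustration:

```python
# Illustrative cost comparison of the standard vs. 1M-context Sonnet 4.6 tiers,
# using the per-million-token prices quoted above. The request size is made up.
def cost(input_tokens: int, output_tokens: int, in_per_m: float, out_per_m: float) -> float:
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

input_tokens, output_tokens = 150_000, 8_000  # hypothetical single request

standard = cost(input_tokens, output_tokens, 3.00, 15.00)   # $3 / $15 per million
long_ctx = cost(input_tokens, output_tokens, 6.00, 22.50)   # $6 / $22.50 per million

print(f"standard tier: ${standard:.2f}")    # ~$0.57
print(f"1M-context tier: ${long_ctx:.2f}")  # ~$1.08, roughly 1.9x
```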

26:09

You know,

26:09

that's something interesting, and it's a little bit

26:12

misleading If I'm being completely honest because you're

26:15

looking at open router, right?

26:16

And if you go over to open router What we're gonna see is

26:19

you look at it and you're like,

26:20

oh one million in context three dollars per million on the

26:22

input Fifteen dollars per million on the output.

26:23

That's great. But then in Claude Code, for some reason, they're

26:28

making these 1 million in context models Basically unusable

26:31

because they're not even allowing you to use it with your

26:34

plan.

26:35

They're billing it as extra usage So this is something I

26:38

wanted to highlight Maybe you guys can let me know in the

26:41

comment section down below: what is this model

26:44

even running at right?

26:45

Like sonnet, can I even ask it?

26:47

Like could I switch over here and could I say what is the

26:50

context window on this model?

26:52

Will it respond?

26:53

Will it let us know because 1 million in context is like

26:57

great, right?

26:58

Wow Okay,

26:59

this is actually something new that we just learned guys So

27:01

sonnet 4.6 if you're using it just normally without having

27:06

it set to this 1 million in context, it has a two hundred

27:09

thousand token context window.

27:12

I'm gonna dock it a little bit here This is not good.

27:15

This is not a good approach.

27:17

So they're billing it for a higher price with more cost per

27:23

token you're unable to use the 1 million in context with

27:26

your plan, and the normal default mode only has a 200,000-token

27:31

context. Like, that's really, really interesting, because if

27:35

you look at Gemini obviously Gemini models They all have 1

27:38

million they have over a million in context and this is

27:41

just hey if you're using Gemini 3 Pro It's using that by

27:43

default right same thing with GPT.

27:45

I can only look at GPT 5.2, but GPT 5.2 Codex, this one has

27:48

a context of 400,000

27:50

right, and that's, what, 2x? So that's one thing

27:53

that I'm gonna kind of dock against Claude Code: I don't

27:57

necessarily know why they're doing that I'm still going to

27:59

approve the model I'm going to be using this but why is it

28:02

that they're not offering us this model with the 1 million

28:05

in context window with our plan? That's just a little bit

28:08

weird, right?

28:09

So let's go back over here and it looks like these are

28:12

still running.

28:12

We have the speed test going up This one is all fixed and

28:15

this is very very fast great work I now need you to launch

28:19

sub agents that will implement the fixes without breaking

28:22

functionality or writing AI slop code So these are all

28:25

working.

28:25

Let's launch this one.

28:26

Let's see how this one is doing.

28:27

So it does have a plan here So what did it say?

28:30

So what changes does it want to make? So it says change URL.

28:33

Which one? How is it gonna change the URL? Output a table of your

28:37

plan.

28:37

Show me the URL before and then show me what you're going

28:41

to change it to So let's go here and let's just kind of

28:44

make sure that that looks right But that's definitely one

28:47

thing to note is that this model selector is a little bit

28:50

odd I don't know what you guys think about this,

28:52

but definitely something that we want to take note of

28:56

right?

28:56

Go ahead and implement it.

28:58

So I'm going to implement that but even with this context

29:01

window drawback I still am noticing some differences and

29:06

some improvements and now it does bring Sonnet back into

29:09

the equation,

29:10

you know. With Sonnet models, if you guys remember, Sonnet 4.5,

29:12

before we got Opus 4.5, was by far the best

29:18

coding model on the market and then we got Opus 4.5

29:21

Everybody forgot about Sonnet 4.5,

29:23

but Sonnet 4.6 this is a significant improvement for the

29:27

Sonnet suite and it comes at a 40% cost reduction with

29:31

about a 25% speed increase So definitely something that you

29:35

guys are going to see me working with in

29:38

my vibe coding workflows And with that being said guys,

29:41

I'm gonna give it the bridge mind stamp of approval I hope

29:44

this helped you guys and you guys are gonna see me using

29:47

this in streams And with that being said if you haven't

29:49

already liked, subscribed, and joined the Discord, make sure you

29:52

do so and I will see you guys in the future

Interactive Summary

The video reviews the new Claude Sonnet 4.6 model, comparing its performance, speed, and cost to Opus 4.6. Sonnet 4.6 offers a 40% cost reduction and a 25% speed increase over Opus 4.6. Benchmarks show improvements in areas like agentic coding (SWE-bench Verified) and a significant reduction in hallucination rates (38% for Sonnet 4.6 vs. 60% for Opus 4.6). The reviewer also tests Sonnet 4.6 in practical vibe coding tasks inside Claude Code and gives it the BridgeMind stamp of approval.
