
Vibe Coding With Claude Opus 4.6 And Agent Teams


Transcript


0:00

Hey everyone, and welcome back to another video.

0:02

In this video,

0:03

I'm going to be vibe coding with the newly released Opus 4.6

0:06

and using a new feature in Claude Code called agent

0:11

teams.

0:12

Okay.

0:13

So right here, I'm using BridgeSpace.

0:15

And as you can see,

0:16

I have six Claude Opus 4.6 terminals opened up in

0:20

BridgeSpace.

0:20

This is actually an ADE product that we are working on

0:24

right now.

0:24

But before we dive right into the video,

0:26

I have a like goal of 200 likes on this video.

0:30

So if you haven't already liked and subscribed,

0:32

make sure you do so. BridgeMind is also the fastest growing

0:35

vibe coding community on the internet right now.

0:38

So if you haven't already joined our Discord community,

0:40

make sure you check the Discord link in the description

0:42

down below and join the Discord.

0:44

We're about to pass 5,000 members in the Discord.

0:47

So with that being said,

0:48

let's now dive right into the video.

0:50

Okay.

0:50

The first thing that I want to cover is a little bit of the

0:53

statistics,

0:54

the leaderboards, and the benchmarks for Claude Opus 4.6.

0:58

So over here on OpenRouter,

1:00

you can see that the biggest difference in Claude Opus 4.6

1:04

obviously is the context window.

1:06

You have a million tokens of context with this model,

1:08

which is the first Opus model that actually has a million

1:12

in context.

1:13

The previous Opus models were around 200,000.

1:16

So if you go back to Opus 4.5, you're going to see 200,000.

1:19

And if you go back to Opus 4.1,

1:21

you're going to see, what is it, 200,000.

1:24

So they were able to keep it the same price,

1:26

but they increased the context window by 5x.

1:30

So one thing to note in this is that if you are using

1:32

Claude Opus 4.6 in Claude Code,

1:34

this is actually not yet accessible to you.

1:36

This is in beta.

1:38

So if you're using it via Claude Code,

1:40

you do not yet have access to the 1 million context

1:42

window.

1:43

It's still at 200,000,

1:45

but with Cursor or any product that's integrating with Opus

1:49

4.6 via the API,

1:50

you're going to have access to this one million context

1:53

window.
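
For reference, here is a minimal sketch of what opting into the larger context window might look like through the API, assuming the Anthropic Python SDK and a beta flag along the lines of the one used for earlier 1M-context models; the model ID and beta name below are assumptions, so check the current docs:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The model ID and beta flag below are assumptions for illustration;
# earlier 1M-context models gated access behind a beta flag like this one.
response = client.beta.messages.create(
    model="claude-opus-4-6",          # assumed model ID
    betas=["context-1m-2025-08-07"],  # assumed 1M-context beta flag
    max_tokens=4096,
    messages=[{"role": "user", "content": "Summarize this repository: ..."}],
)
print(response.content[0].text)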

1:53

But let's actually see what they say.

1:54

So Opus 4.6 is Anthropic's strongest model for coding and

1:58

long running professional tasks.

1:59

It is built for agents that operate across entire workflows

2:02

rather than single prompts,

2:04

making it especially effective for large code bases,

2:07

complex refactors and multi-step debugging that unfolds

2:10

over time.

2:11

So that's a lot of yapping,

2:12

but let's go check the Artificial Analysis leaderboard,

2:16

because this is really interesting. I was taking a

2:18

look at this, and if you look at the Artificial Analysis

2:21

Intelligence Index,

2:22

you'll note that Opus 4.6 now is at the top of this

2:26

leaderboard.

2:26

So it beats out GPT-5.2 Extra High.

2:29

So that's 51.

2:31

Opus 4.6 is 53 on this leaderboard.

2:34

You can see that Opus 4.5 is at 50.

2:36

So they were able to get a three point bump here.

2:39

But one thing that I want to draw your attention to is the

2:41

coding index.

2:42

So look at this.

2:43

So we have 48 and 48.

2:45

And this is one of the biggest issues that I actually saw

2:48

with this release.

2:49

It's in the benchmarks here.

2:51

So Opus 4.5 right here, Opus 4.6 right here.

2:54

And what I want to draw your attention to is the SWE-bench

2:57

Verified benchmark, which was 80.8%.

3:00

So that's one thing that I was not necessarily disappointed

3:03

by,

3:04

but I was hoping that we would see like 83% on this benchmark.

3:08

This is the number one benchmark for coding that I take a

3:10

look at.

3:10

So they were able to improve Terminal-Bench for

3:13

coding,

3:14

which is important, by quite a bit actually, by over 5%.

3:17

So that's nice.

3:18

But I was looking at this one here and I was like, Oh man,

3:21

that kind of stinks, because you know the SWE-bench

3:24

Verified.

3:25

That's where, Hey,

3:25

if we see a big jump in that benchmark performance,

3:28

that's when we're going to see a big jump in performance

3:30

for us when we're actually using it in practice, vibe

3:32

coding.

3:33

So I was a little bit disappointed in that.

3:35

Another important one to look at here is the speed.

3:38

So Opus 4.6 here, you can see, I can't highlight this,

3:41

but this is Opus 4.6.

3:43

You can see it's at 73 tokens per second.

3:45

And then Opus 4.5 is at 88.

3:47

So we saw a little bit of a decrease in tokens per second.

3:50

Now this does fluctuate over time.

3:52

So this isn't like set in stone,

3:53

but tokens per second did go down a little bit,

3:55

which is a little bit discouraging, but not too bad.

3:59

But for intelligence and speed,

4:01

really what we're seeing is that on some of these other

4:03

benchmarks,

4:04

Opus 4.6 was able to see a really big improvement.

4:07

Like one of those is definitely like this one here,

4:09

novel problem solving on the ARC-AGI-2 leaderboard.

4:12

This saw a big jump of 30%.

4:15

So that's huge.

4:16

Also office tasks saw a big jump.

4:19

Financial tasks saw a 5% jump.

4:21

Multidisciplinary reasoning on Humanity's

4:23

Last Exam saw a big jump as well.

4:25

Agentic search saw a big jump.

4:27

So, you know, a lot of these saw a big jump.

4:29

Look at this.

4:29

Agentic tool use also saw a jump.

4:32

Agentic computer use saw a big jump.

4:35

But, you know, the biggest disappointing thing in this,

4:37

in the benchmarks was definitely Agentic coding.

4:40

I was surprised to see this not only not improve,

4:43

but also take a little bit of a step back.

4:45

So discouraging there, but all in all,

4:48

I think that it is going to be a smarter model across the

4:51

board.

4:52

Maybe just not specifically on the coding index.

4:54

The next thing that I'd like to cover about this new model is

4:57

a little bit less about the model specifically,

5:01

but more so about the update that Anthropic gave to Claude

5:06

Code alongside this model.

5:07

So if you now go to models,

5:08

I want to highlight something here.

5:10

So with Opus 4.6,

5:12

you now have the capability to adjust the effort.

5:16

So if we kind of just zoom in here,

5:17

actually let's launch another workspace here and do a

5:19

control T.

5:20

Let's just do a single,

5:21

let's just do a single and then we'll launch,

5:23

we'll launch one Claude Code instance in this,

5:26

but let's just launch a new Claude Code instance.

5:28

And this is again, this is BridgeSpace,

5:30

which is very useful, but if we are in Opus 4.6, right?

5:33

And let's do model.

5:34

Okay.

5:34

And what you're going to see is that the default

5:36

recommended model is now Opus 4.6.

5:38

And with this model,

5:39

you do not have this ability with Haiku or Sonnet.

5:42

So this is specific to Opus 4.6,

5:45

but you can now adjust the effort.

5:47

Do you see how it says medium effort here?

5:49

So all I have to do to adjust this is I can set it to high

5:51

effort.

5:52

I can set it to medium effort,

5:53

or I can set it to low effort.

5:55

So this is similar to Codex where, hey, with Codex,

5:58

they literally offer it as like different models, right?

6:01

Like Codex is offering GPT-5.3 Codex, you know,

6:05

extra high or medium or high, you know,

6:07

so they have four different models.

6:08

So this is kind of Anthropic's approach to this,

6:10

which is actually much better rather than offering like a

6:13

specific model.

6:13

They just allow you to adjust the effort in Claude Code,

6:17

which I like.

6:17

So that's definitely one thing to note that when you are

6:20

setting your model inside Claude Code,

6:21

make sure you check out the effort level.

6:24

That's going to be very nice to be able to set it at a

6:27

specific effort level.

6:28

If you're working on maybe an easier task,

6:30

you can set it to a lower effort.

6:31

So you use fewer tokens.

6:33

And then if it's a little bit more complex,

6:34

you can set it to high effort,

6:35

and it's going to think longer and do a better job.

6:37

But an important thing to know about this model is

6:40

that this is a new capability of Opus 4.6 inside of Claude

6:43

Code.
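
In Claude Code you set this interactively through the model picker, but as a rough sketch of the same idea in API terms, assuming the Messages API accepts an effort knob the way Anthropic described it (the parameter name, placement, and model ID here are assumptions, not confirmed API shape):

import anthropic

client = anthropic.Anthropic()

def ask(prompt: str, effort: str) -> str:
    # effort is "low", "medium", or "high"; passed via extra_body because
    # the exact parameter shape is an assumption here, not a confirmed API.
    response = client.messages.create(
        model="claude-opus-4-6",        # assumed model ID
        max_tokens=2048,
        extra_body={"effort": effort},  # assumed effort parameter
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Low effort for simple edits to save tokens, high effort for hard debugging.
print(ask("Rename `usr` to `user` across this file: ...", effort="low"))
print(ask("Find the race condition in this scheduler: ...", effort="high"))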

6:44

The last thing I want to do before we actually get our

6:46

hands dirty and start vibe coding with Opus 4.6,

6:49

I want to show you probably the most important thing that

6:52

actually got released yesterday.

6:54

So it's, you know, it's interesting.

6:55

Like if we go back to the benchmarks,

6:57

I'm not going to hype this stuff up.

6:59

I think that a lot of people that watch this channel and

7:02

watch BridgeMind and are in the BridgeMind community know

7:05

that I don't really perceive myself as a

7:08

content creator,

7:09

I'm not going to hype up models or hype things up that

7:12

shouldn't be hyped up.

7:13

So I want to draw your guys' attention to one last thing

7:15

before we go over agent teams.

7:17

Look at the jump between Claude Opus 4.1 and Claude Opus 4.5

7:22

on the coding index.

7:24

You know,

7:24

we literally saw a jump of 11 points on this coding index.

7:27

And that's why so many people once we got Opus 4.5,

7:30

they were like, this is magic, right?

7:32

Because we saw such an improvement on the SWE-bench

7:34

Verified benchmark,

7:35

and the coding capability saw a huge jump.

7:39

But if I'm being completely honest with you guys,

7:41

it's like, hey, look at this, it's at 48,

7:44

we did not get a substantial improvement in the coding

7:47

capabilities from 4.5 to 4.6.

7:50

Now there are going to be a couple other things that are

7:52

better about the model.

7:54

But definitely one thing to note is that this is not going

7:57

to change a whole lot in terms of its coding capabilities,

8:01

when you look at the benchmarks,

8:03

because the benchmarks do tell a lot about a model.

8:07

And I can see this right now it's like, hey,

8:09

this is not going to be a model with that same jump from

8:12

Opus 4.1 to Opus 4.5.

8:14

But with that being said,

8:16

I want to now highlight one of the biggest things that got

8:19

released yesterday that may have been a little bit

8:21

overlooked by a lot of people.

8:23

And we were using this on stream yesterday.

8:25

And it's called agent teams.

8:28

And agent teams is a new experimental feature

8:32

from Anthropic that you can use inside of Claude Code.

8:36

And you guys can go, I'm gonna drop this link

8:38

in the description.

8:39

So if you guys want to take a look at it,

8:41

and kind of read through it yourself,

8:43

but I'm going to cover pretty much the basics that you guys

8:45

should know.

8:46

And I'm actually going to draw how this works up on the

8:48

whiteboard.

8:48

Okay.

8:49

So sub agents, as you guys know,

8:52

sub agents have been a very big thing.

8:54

So I'm just going to use BridgeVoice and give Claude Code a

8:57

quick prompt here.

8:58

I want you to launch five sub agents to do an in-depth

9:00

review of the BridgeMind UI and identify any

9:03

inconsistencies in the theme and styling on different pages

9:06

on the website.

9:07

So we're going to launch some sub agents right now.

9:10

Okay, so we're going to drop in the BridgeMind UI.

9:12

And let's just drop this in with Opus 4.6 and give it that

9:15

prompt, right.

9:16

So what this is going to be doing is this is going to be

9:19

launching what's called sub agents.

9:21

So we've had sub agents for a while now.

9:23

And what actually are sub agents?

9:25

Well, you can see that right here,

9:26

they have their own context window,

9:28

and the results are returned to the caller,

9:30

they report back to the main agent,

9:33

the main agent manages all the work; good for focused tasks where only

9:36

the result matters.

9:36

And then the token cost is going to be lower when compared

9:40

with agent teams.

9:41

But the important thing to understand is the difference

9:43

between these two things, right?

9:44

So you have agent teams and you have sub agents.

9:47

So what agent teams are,

9:48

and I'm just going to highlight this on the whiteboard: with

9:50

sub agents, you had a main agent here that, like right now,

9:55

is going to give it to five sub agents, right?

9:58

So this is what sub agents are.

10:01

Okay, so sub agents, you have a main agent,

10:03

and the sub agents only communicate back to the main agent.

10:07

Okay, so these are all communicating with the main agent.

10:10

Okay.

10:10

This is how sub agents work.

10:13

Okay, this is how they work.

10:16

The sub agents communicate with the main agent,

10:18

and that's how they work, you can see,

10:20

it's launching these sub agents and these sub agents are

10:23

now working.

10:23

Okay, you can see that they're reviewing,

10:25

reviewing product page styling, reviewing auth page styling.

10:28

So these are reviewing different things, right?

10:30

But now what this new agent teams feature is,

10:33

is that this is actually a very big deal.

10:36

Now,

10:37

I'm going to talk a little bit more about it and some of

10:39

the nuances of it,

10:40

but how agent teams work is: instead of, let's say, you had

10:44

sub agents.

10:45

Now, instead of that being the approach,

10:47

you basically have something that looks like this.

10:50

Okay, so let's say that you have

10:53

your main agent here, okay, and you now have five agents,

10:58

right?

10:59

And previously,

10:59

you only had them communicating with that main agent,

11:03

but now these sub agents actually are able to send messages

11:06

to one another.

11:08

Okay,

11:08

so the difference here is that these sub agents can now all

11:12

communicate as a team because they can now communicate from

11:15

agent to agent here,

11:17

rather than only being able to communicate to the main

11:19

agent.

11:20

So that's the difference with

11:22

agent teams.
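
To make the two topologies concrete, here is a toy model in Python, not the real Claude Code internals, just the shape of the message flow being described: sub agents only report to the main agent, while teammates can also message each other directly.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    inbox: list = field(default_factory=list)

    def send(self, other: "Agent", text: str) -> None:
        # Deliver a message into another agent's inbox.
        other.inbox.append((self.name, text))

# Sub-agent topology: workers only report back to the main agent.
main = Agent("main")
subs = [Agent(f"sub-{i}") for i in range(5)]
for sub in subs:
    sub.send(main, "here are my findings")  # sub agent -> main agent only

# Agent-team topology: a team lead plus teammates that can talk to each other.
lead = Agent("team-lead")
auth = Agent("auth-auditor")
exposure = Agent("exposure-auditor")
auth.send(exposure, "double-check the /admin endpoints")        # peer-to-peer
exposure.send(lead, "confirmed: /admin lacks an auth check")    # report to lead
print(lead.inbox)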

11:23

And I'm actually going to show you guys how to set this up.

11:25

And I'm going to show you a couple examples that I actually

11:28

implemented yesterday on stream and a couple of my thoughts

11:32

with agent teams.

11:33

Now I'm going to drop this link in the description below.

11:36

And the best way to be able to set this up is to literally

11:40

just drop the link into Claude Code and ask it,

11:44

would you confirm that this is set up and enabled with my

11:47

Claude Code?

11:48

If it isn't,

11:49

make sure you enable it. Just give it that prompt.
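
If you would rather see roughly what "enable it" amounts to, here is a sketch that assumes the feature is gated behind a settings flag; the video never shows the actual key, so the variable name below is purely hypothetical, and you should use whatever name the docs page gives you:

import json
from pathlib import Path

settings_path = Path.home() / ".claude" / "settings.json"
settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}

# Hypothetical flag name -- the real key is whatever the agent teams
# documentation specifies; this is a placeholder for illustration.
settings.setdefault("env", {})["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] = "1"

settings_path.write_text(json.dumps(settings, indent=2))
print(f"wrote {settings_path}")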

11:52

I already have mine enabled.

11:53

So this isn't going to do anything for me.

11:55

I already have it enabled,

11:56

but let's now go over to this agent down here.

11:59

And what we're going to do is ask: how many teams are

12:02

configured?

12:03

Would you list out the teams that are associated with my

12:06

Claude Code?

12:07

So we're going to,

12:08

we're going to ask it in Claude Code,

12:11

but we're going to ask it what teams we actually

12:14

configured,

12:14

because basically what I was doing yesterday is I was

12:17

setting up multiple agent teams.

12:20

And you can see here, I have an API security review team,

12:23

and then I also have a code quality fix team.

12:26

So I have these teams here and let's actually,

12:29

can I drag this?

12:30

Okay.

12:30

I don't have that configured yet.

12:32

So with this, you can see that these teams, for example,

12:35

code quality fix.

12:37

So the purpose of this team is to fix all bugs and clean up

12:41

AI slop across the BridgeMind monorepo.

12:44

So there's a team lead, and that's,

12:48

that's the team lead.

12:49

That's this here.

12:50

So rather than it just being a main agent that all the sub

12:52

agents talk to,

12:53

now you have a team lead and that's what it looks like.

12:56

So the team lead, we have a UI fixer.

12:59

So this is a front end bug fixer.

13:00

We have an admin fixer, the admin panel bug fixer.

13:03

So it's three members in total, and they,

13:06

you know, of those three members,

13:07

both workers are general purpose agents, right?

13:09

So these, these agents are working together as a team.

13:11

And then we also have an API security review.

13:14

And with that team, you can see that I have a team lead,

13:17

an auth deep auditor, a biz logic auditor,

13:19

an exposure auditor, and a verify auditor.
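
The video doesn't show the underlying team definition format, so purely as an illustration of the structure being described, a lead plus specialized members, here is a small Python model; none of this is Anthropic's actual schema:

from dataclasses import dataclass

@dataclass
class Teammate:
    name: str
    role: str

@dataclass
class Team:
    name: str
    purpose: str
    lead: Teammate
    members: list[Teammate]

api_security_review = Team(
    name="api-security-review",
    purpose="Audit the BridgeMind API for security issues",
    lead=Teammate("team-lead", "coordinates the audit and merges findings"),
    members=[
        Teammate("auth-deep-auditor", "reviews authentication and sessions"),
        Teammate("biz-logic-auditor", "reviews business-logic authorization"),
        Teammate("exposure-auditor", "hunts exposed endpoints and secrets"),
        Teammate("verify-auditor", "re-checks and de-duplicates findings"),
    ],
)
print(f"{api_security_review.name}: lead + {len(api_security_review.members)} members")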

13:22

Okay.

13:22

So we have these teams and I'm going to show you what it

13:25

looks like to actually run one of these.

13:26

Okay.

13:27

So.

13:28

We're going to use the API security review team.

13:30

Okay, I want you to enable the API and hold on.

13:34

I'm not gonna say enable.

13:35

I'm just gonna say I want to work with the API security

13:39

review team and I want this team to do an in-depth review

13:43

of the BridgeMind API and do an audit of our API security

13:47

any findings that the team finds should be outputted and

13:50

inputted into a readme file of the findings with detailed

13:54

descriptions of each finding so that we can complete the

13:58

task and make sure that the API is secure.

14:00

Okay,

14:00

so we're going to drop this in right and what you guys are

14:03

going to see is you're going to see Opus 4.6 be able to

14:06

work as a team.

14:07

So it says I'll set up a security audit team to thoroughly

14:09

review the BridgeMind API.

14:11

Let me create the team and organize the work.

14:13

Okay, so actually what it did here.

14:16

Hold on.

14:17

Did it just create a new team?

14:19

Hold on.

14:20

Wait, did it not actually let me create the team?

14:22

Okay,

14:23

I actually don't think that that did that correctly and

14:26

this is one thing to note: Anthropic

14:29

did say that there were going to be some issues.

14:32

Hold on.

14:32

I actually want to stop this because it looked like it was

14:35

creating a, was it creating a new team?

14:37

It said, I'll set up a security audit team to... That's not

14:40

what I wanted to do.

14:41

I want to, let's prompt this.

14:43

Let's now prompt this one because this one knows what teams

14:45

we have.

14:46

So maybe you have to like do really well with prompting.

14:49

I want you to use this team.

14:52

So let's go back over here and let's actually make sure

14:54

that we drop it in so that it knows, like, hey, use this team.

14:57

That may be something that you need to do just while this

15:00

is like a beta feature.

15:01

It's like an experimental feature.

15:03

So it had a hard time kind of like enacting this right,

15:06

but let's go over here and let's check what did this one

15:09

actually find here.

15:10

So in here,

15:11

this was the one that was trying to, it was launching sub

15:13

agents, right?

15:14

So it's asking, want me to start fixing any of these issues?

15:17

So what I'm going to do is I'm actually going to change the model and

15:19

I'm going to change it to low effort because UI issues

15:23

aren't like super complex.

15:24

They don't need high effort and high thinking.

15:27

Yes.

15:27

I now want you to fix the findings and update the code

15:30

respectively.

15:31

So we'll be able to launch that now.

15:35

So one thing that I want to test just briefly is like let's

15:38

go over and let's actually launch.

15:40

Let's launch localhost here and let's go into the

15:43

BridgeMind UI, right?

15:44

Let's go over here and hold on.

15:46

I think there were some updates being made.

15:47

So we may have to kill our terminal real quick.

15:49

Let's go over here and where is it?

15:52

Let's see here.

15:54

It is let's go here and let's just do that.

15:58

Okay,

15:58

so we're going to restart this real quick and we're going

16:01

to go back.

16:02

Okay.

16:02

Look at this.

16:03

This is, this is not good. And, okay.

16:05

There we go.

16:06

All right.

16:06

So you can see our home page here and let's just try a

16:09

brief UI fix, right? And look at this.

16:12

So this is this agent here and it's making updates.

16:16

Let's go over to BridgeSpace real quick.

16:17

So it's this agent and it's actually asking high impact

16:20

only recommend.

16:21

Okay,

16:22

what agent is causing all these UI issues that

16:25

we're experiencing?

16:25

So let's see.

16:26

Let's go back over here and let's just drop this issue and

16:30

let's have an agent fix that real quick.

16:35

I want you to fix this error that I'm getting in the

16:37

BridgeMind UI.

16:38

So let's...

16:40

BridgeMind.

16:40

Okay,

16:41

and that's why we're going to need to set up a dictionary

16:42

for BridgeVoice.

16:44

So speaking of which,

16:45

let's actually pull up BridgeVoice real quick.

16:48

And we are going to have to solve a couple issues here.

16:52

So we are having a storage error here.

16:55

So what I want to do is I'm going to work with Codex 5.3,

16:58

and I'm actually going to drop in.

17:01

And this is a great strategy, okay?

17:02

So what I'm going to do is I'm going to work with Codex 5.3,

17:05

and I'm going to have it write a handoff prompt.

17:08

I want you to write a handoff prompt to Claude Opus 4.6.
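
The handoff prompt itself is just structured context for the next model. Something along these lines, an illustrative template rather than the exact prompt generated in the video:

You are taking over a debugging task from another model.
Problem: sign-in on the BridgeVoice dashboard fails with a storage error.
Tried so far: <summary of the previous model's investigation>
Relevant files: <paths the previous model identified>
Current hypothesis: <best guess at the root cause>
Your job: confirm the root cause, implement a fix without breaking other
functionality, and verify that sign-in works end to end.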

17:12

And just so you guys know,

17:13

yesterday was one of the craziest days that we've had in

17:16

AI.

17:17

Probably to this point, we had two frontier models drop.

17:20

Now the only caveat to that that I would say is that if you

17:24

go back to the benchmarks,

17:25

this was not the same improvement that we saw from Opus 4.1

17:28

to Opus 4.5.

17:30

So even though we did get two new frontier models, to me,

17:33

it seems like these models were not that much of a jump.

17:36

They were like small iterations that are definitely going

17:38

to be better,

17:39

but not the same jump that we saw from like

17:42

5.1 to 5.2.

17:44

Or like you can see here, Opus 4.1 to Opus 4.5, right?

17:48

So it's a little bit of a different situation than what we

17:50

had.

17:52

But what we're going to do now is we're actually going to

17:54

drop in this handoff prompt to Opus 4.6, okay?

17:58

So we're going to drop this in and we're going to give this

18:02

handoff prompt to Opus 4.6 and we're going to have Opus 4.6

18:05

do this.

18:06

So this,

18:06

it says you're taking over a debugging task and it has

18:08

this,

18:09

and now we're going to go back over and let's just use Warp

18:12

here and we're going to drop in and we're going to do

18:14

Claude and I'm going to let you guys look at this prompt.

18:17

So what I'm going to do here is I'm actually going to

18:18

change this and I'm going to change the model and I'm going

18:20

to do high effort on this because this is a little bit more

18:23

of a complex debugging task.

18:24

So I'm going to drop in the handoff from Codex 5.3.

18:28

I'm going to give it here.

18:29

And this is a good strategy because Codex does think for

18:32

longer and what a lot of people say is it's like, hey,

18:36

you can have Codex instances that are running for up to

18:38

like an hour, very, very easily.

18:40

Like you'll have Codex instances that are running for very,

18:42

very long times.

18:44

And you know, something to really understand is that, hey,

18:46

like you definitely want to know that like,

18:49

when do I enact and when do I enable Codex versus this,

18:52

right?

18:53

It's like very important to know that.

18:55

So let's see if we can refresh this page.

18:56

I think that there's an issue somewhere on my computer

18:59

where I'm running this.

19:00

Okay.

19:00

I do see it.

19:01

So let me, let me,

19:01

let me cancel out of this and then let's launch this again

19:04

and let's go over to warp and we're going to just restart

19:07

the server real quick because it was having a cache issue.

19:10

So let's pull this back up and let's launch localhost.

19:13

And there we go.

19:15

Okay.

19:15

So here's our front end, right?

19:16

And what I want to try is I just want to try a simple UI

19:19

test.

19:19

So this here, "Code at the speed of thought."

19:21

This is a little bit outdated at this point.

19:23

So I'm going to drop in this instance right here.

19:27

And what we're going to do is we're going to go over to

19:29

warp.

19:30

I'm going to take this and hold on here.

19:32

Did this not, did this prompt not submit?

19:34

Do you guys see this?

19:34

Hold on.

19:36

This one that we had here, did this not submit?

19:38

What happened here?

19:39

We had it, didn't we?

19:41

Oh, it didn't.

19:42

Did this not submit?

19:43

I don't know if you guys see that, but for some reason it,

19:44

it didn't, it didn't submit my prompt.

19:46

I'm not sure why.

19:48

Okay.

19:48

Okay.

19:48

Let's try this again then.

19:49

All right.

19:49

You're taking over.

19:50

All right.

19:50

Let's try this and let's submit.

19:54

Okay.

19:54

Perfect.

19:55

All right.

19:55

So let's launch Claude here and we're going to see if we

19:57

can update the styling of that section as well inside of

20:01

our website.

20:02

Let's drop in this section here and let's just say I need

20:06

you to review this section on the homepage of the

20:09

BridgeMind UI.

20:10

And what I need you to do is I need you to update it with a

20:13

completely different component and different-looking section focused

20:16

more on the different particular products in the BridgeMind

20:20

suite.

20:20

I want you to focus on the BridgeMind MCP.

20:22

I want you to focus on BridgeCode.

20:24

I want you to focus on BridgeSpace and I want you to focus

20:28

on.

20:30

BridgeVoice, and there should be some information and some

20:33

nice, uh,

20:34

unique components and graphics for each of these products.

20:38

But this component can be removed and replaced with four

20:41

different components covering each separate product.

20:45

Okay.

20:45

So what I'm going to also do,

20:47

I'm going to submit this prompt,

20:48

but I also want to go back over to, um,

20:51

BridgeSpace here, and you can see this team.

20:53

And what I want to highlight here out of this team is that

20:56

it's been working for the past five minutes and 58 seconds.

20:59

And what I want to show you guys is with these teams,

21:01

they're able to work together and message each other,

21:05

right?

21:05

So you can see that this one was where we launched that

21:07

team and you can see auth auditor, exposure auditor, and it's

21:12

running these different API, these API teammates, right?

21:17

So it's security agents that are all running.

21:19

And you can see that this one biz logic auditor, right?

21:22

This one is doing that particular task.

21:24

So what's interesting about these agents,

21:27

these agent teams is that each of these teammates is like

21:30

highly specialized for a particular action, right?

21:33

So, you know,

21:34

we had another team that was code quality fix,

21:37

which was basically just focused on making it so that

21:40

there's no AI slop, right?

21:42

So we can go and we can add, for example,

21:44

let's just add the BridgeMind API.

21:46

And let's say that we want to enact this team, right?

21:49

So this, this here, I want to say,

21:52

I want you to enact the code quality fix team that will do

21:57

a thorough review of the BridgeMind API and identify any

22:01

areas where the code quality needs to be fixed and updated.

22:05

Do not update any code,

22:06

but compile all the findings from this team into a readme

22:10

file with different tasks and instructions of why it needs

22:14

to be fixed,

22:15

why the code quality is bad and how to fix it and make sure

22:18

that the changes in the instructions will not break any of

22:21

the functionality.

22:22

So again, I'm using BridgeVoice for that.

22:23

It's a very, very good speech-to-text tool.

22:25

We're going to have that launching probably next week.

22:28

This agent is now done.

22:30

So this was the sub agents, right?

22:31

So this one was launching sub agents,

22:32

which are different, right?

22:33

That's where each sub

22:36

agent communicates with the main agent.

22:38

Whereas this now, this team,

22:40

so it says to recap what was delivered.

22:42

So it says all done,

22:43

the security team has been fully shut down and cleaned up

22:46

to recap what was delivered.

22:48

Three auditors ran in parallel across the entire

22:51

BridgeMind API code base, 23 findings documented, zero critical,

22:55

five high, 10 medium, eight low.

22:57

All findings consolidated into this readme with

23:00

severity levels, affected files, descriptions,

23:03

recommended fixes, and OWASP categories.

23:06

So it then says the file is ready for your team to

23:08

review and start working through the remediation

23:10

priorities.

23:11

Great work.

23:12

I now need you to launch the team once again to actually

23:15

start working through in updating the code respectively and

23:18

complete all of the updates so that there are no more

23:21

security vulnerabilities based off of the findings.

23:25

Okay.

23:25

Okay.

23:26

One issue that just happened is one of these agents did

23:29

just make an update to BridgeVoice.

23:31

So it just opened up and closed.

23:33

So let's just do hold on here.

23:35

I need to fix this.

23:36

And this is the issue that we're having with BridgeVoice

23:37

right now.

23:37

This is BridgeVoice.

23:38

It's great.

23:39

It did just relaunch.

23:40

So hold on.

23:41

Let's give that prompt again.

23:43

I now want you to launch the agent team to actually start

23:46

working through the remediation priorities and updating the

23:49

code.

23:50

Make sure that each is updated correctly without impacting

23:54

or breaking functionality and that we solve the

23:56

vulnerability without breaking code.

23:58

The code that it writes needs to not be AI slop, be clear,

24:02

concise, and well-written.

24:04

And once again,

24:05

the most important thing is that we fix the security

24:07

vulnerabilities without breaking functionality.

24:10

Now I want you to launch the team again to update the code

24:13

and solve and fix the priorities and the vulnerabilities.

24:16

So we'll launch this, we'll submit this prompt,

24:19

and this is once again going to restart that team.

24:22

Now I did say that there was one caveat with the teams,

24:25

right?

24:25

And the caveat is if we go to our usage here,

24:28

what you guys are going to see with our usage,

24:30

and I guess this is one thing, I mean, look,

24:32

so this is here, right?

24:33

I haven't really used that much.

24:36

I haven't launched agents on that many tasks, right?

24:38

But what I want you guys to take very close note of is the

24:41

22% used.

24:45

So the teams that we've been enacting,

24:48

like literally yesterday, when I was,

24:49

when I was using teams,

24:51

one of the teams ran through over 500,000

24:55

tokens inside of one task that I gave it.

24:58

So like one thing to note is that if you are using agent

25:01

teams, you are going to be burning through tokens,

25:04

especially with Opus 4.6.

25:06

Now,

25:06

one thing that I will say to that is that you could use

25:09

this with like Haiku 4.5,

25:11

or you could use it with Claude Sonnet 4.5, you know,

25:14

but obviously the coding capabilities are going to drop a

25:16

ton.

25:18

But here's what I will say with that.

25:20

Once we get Claude Sonnet 5,

25:23

this is going to be incredibly more useful.

25:26

Right now, you know,

25:27

we're using agent teams and we're using it with incredibly

25:29

expensive models, right?

25:31

Like if we go back to Opus 4.6, you know, look at the cost,

25:35

right?

25:36

It's expensive.

25:37

It's $5 per million on the input, $25 per million on the output,

25:40

and $10 per thousand web searches.

25:43

Like it's expensive.
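
To put those prices in perspective, here is a quick back-of-the-envelope calculation in Python using the rates quoted above and a half-million-token team run like the one mentioned below; the input/output split is an assumption just to make the arithmetic concrete:

# Rates quoted above: $5 per million input tokens, $25 per million output tokens.
PRICE_IN_PER_M = 5.00
PRICE_OUT_PER_M = 25.00

# Assume a 500K-token team task splits 400K input / 100K output; the video
# only gives the total, so this split is illustrative.
input_tokens, output_tokens = 400_000, 100_000

cost = (input_tokens / 1e6) * PRICE_IN_PER_M + (output_tokens / 1e6) * PRICE_OUT_PER_M
print(f"~${cost:.2f} for one team task at this split")  # ~$4.50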

25:45

So like we have to be at the point where it's like, okay,

25:48

we want to use this feature,

25:49

but you also have to know that you're going to be going

25:52

through insane amounts of token usage with this feature,

25:57

right?

25:57

Like this team is launching as well.

25:59

And I don't know how many tokens this one is going to run

26:01

through,

26:01

but yesterday I had one agent team that went through that,

26:05

right?

26:06

And it literally went through half a million tokens inside

26:14

of one task that I gave it.

26:15

So that's like unbelievable.

26:17

Let's go back over to this agent here.

26:19

So this agent looks like it's done.

26:20

It finished inside of two minutes and 53 seconds.

26:22

Also, this one did run as well.

26:24

So let's see if our agent,

26:26

it looks like it potentially did.

26:28

It says sign in with credentials.

26:29

So let's see if it was able to fix our issue that we had.

26:32

This one ran for what, five minutes?

26:34

And it did.

26:36

Oh, wow.

26:36

Okay.

26:37

So guys, that's actually huge.

26:38

So the issue that we had here is that before the stream,

26:41

but not the stream before that, that I started with this video,

26:43

um, that actually was not working.

26:47

And what I did is I had GPT 5.3 review what was actually

26:52

causing that issue, right?

26:53

Cause signing in was not working.

26:55

So this is the BridgeVoice dashboard.

26:56

So I can see, for example,

26:57

like some of my recent activities,

26:59

here's the prompts that I've been giving it in this

27:01

session.

27:01

You can see the total words that I've spoken to it,

27:03

speaking time sessions and words per minute.

27:06

Um, so very useful here.

27:07

This is also where I set like my shortcuts and whatnot and

27:09

where I can manage my subscription, but here's my history,

27:12

but it was able to one-shot this.

27:14

I gave it to 5.3 first,

27:15

and then I had it handed off to Opus 4.6 and Opus 4.6 was

27:19

actually able to implement it fairly quickly.

27:21

Like that is one thing that you should take very close note

27:24

of is that, Hey, you know,

27:26

Codex 5.3 is going to run for longer,

27:29

but Opus 4.6 is going to run shorter and it's probably

27:31

going to do a pretty good job.

27:33

So the next task, this was that UI update.

27:35

Let's see how it actually did on that UI update.

27:37

So it says, okay, perfect.

27:39

So it removed that task, right?

27:40

And what I wanted to do is I wanted to like take,

27:43

basically remove that particular component that was there

27:45

previously and then update it with different sections about

27:49

each product in the suite of vibe

27:50

coding tools that we're building.

27:57

So that looks nice.

27:59

"Shift from your terminal", BridgeCode.

28:00

That looks good.

28:01

We'll have to make some updates, but it also just, yeah,

28:03

this did a really good job.

28:06

There's going to be a couple of things that we're going to

28:07

want to change with this,

28:08

but this actually did, look at this.

28:10

It even did like a little SVG background for BridgeVoice.

28:13

That's really cool.

28:14

Yeah.

28:14

I mean, the backgrounds turned out really nice.

28:16

It was able to know like the different colors and themes

28:18

that we're using for each of these products.

28:20

So that's, that actually did a really good job on the UI.

28:24

I would say very, very good job here.

28:27

Let's try another one.

28:27

How long is this video?

28:28

I don't want to get, okay, we're 21, we're at 21 minutes,

28:30

at least on this session.

28:31

So I don't want this video to get too long.

28:34

I'll try and cap it at 30 minutes here, but let's,

28:37

let's see how this one is doing, this team here, are the two

28:40

agents still working?

28:41

I'll report when they finished.

28:42

So I will say this just to summarize,

28:45

cause I don't want the video to get too long.

28:47

I will be streaming today for day 128 of vibe coding an app

28:50

until I make a million dollars,

28:51

but this teams feature is actually a game changer.

28:55

It's just that it may not actually be practical until we

28:58

get some cheaper models that are better on SWE-bench,

29:02

but are cheaper than Opus 4.6, right?

29:04

Like Opus 4.6,

29:05

you're just going to be spending through your usage a ton.

29:08

Like even if we go back to that usage,

29:09

like did our usage go up?

29:10

Cause that team's running.

29:11

Let's check it out.

29:12

Yeah.

29:13

27%.

29:13

So another 5% was used on this session just from

29:17

these teams running, right?

29:19

Like this one, this one's running,

29:20

this one's running, and it went up another 5%, right?

29:23

So if you're running these teams,

29:25

you're just going to max out your usage like very,

29:27

very quickly.

29:28

So I think that this feature is incredible.

29:31

Again, the difference is substantial,

29:34

with teammates and agents being able to message each other.

29:37

This is a game changer.

29:39

Um, you will get better results if you're using teams,

29:41

but right now it doesn't really seem very practical

29:44

because, hey,

29:44

you're running it with models that are just too expensive.

29:47

So until we get models that are cheaper,

29:49

like a Claude Sonnet 5,

29:50

we're not really going to be able to get like a ton of use

29:53

out of this because, you know, I'm on the 20x plan.

29:55

If you're on the 5x plan,

29:57

you would already have maxed out

29:58

your usage for a five-hour period, uh,

30:01

just from running these two teams, right?

30:03

So it just doesn't really make sense for your average

30:05

user, right?

30:05

So definitely something to take note of, but it did,

30:09

it did do a great job on the UI.

30:10

It fixed our issue here that we were having with

30:12

BridgeVoice.

30:13

The teams were running and cleaning up the code.

30:15

Um,

30:16

but this is like the thing that's interesting about these

30:18

teammates.

30:19

So it sends messages.

30:21

So it does this send message that you're seeing.

30:23

That's what you're seeing.

30:24

The teams talk, the teammates talk to each other.

30:26

So this is incredible.

30:28

I'm not,

30:28

I don't think that this model is like a game changer in

30:32

terms of the coding performance, just because, like,

30:36

this was not the same jump that we saw from 4.1 to 4.5.

30:39

But with that being said, like,

30:41

I think that we're going to see a jump once the Sonnet 5

30:44

model releases, this is a model that's been rumored.

30:47

I think that Sonnet 5 is probably going to be a game

30:49

changer.

30:50

Whereas with this model, it's definitely an iteration.

30:53

There's some improvements in other parts of the benchmarks.

30:55

It is going to be a better model.

30:57

Like,

30:58

you obviously would want to use Opus 4.6 over 4.5.

31:02

But, you know, when you look at the coding index,

31:04

I don't think that we're going to see like massive

31:05

improvements in the coding quality,

31:07

but we'll see with time.

31:09

I think we need to give it a little bit more time.

31:10

But with that being said, guys, this is good.

31:12

I'm going to wrap up the video here.

31:14

I don't want it to get too long, but Opus 4.6,

31:17

I'm impressed what they did with some of the different

31:19

features.

31:20

And I'm super excited about agent teams.

31:23

I'm very excited to be using this model in my streams.

31:26

And I also will be having a review of 5.3 Codex.

31:30

I did upgrade to the ChatGPT Pro plan.

31:32

So I'm going to be using that model in the streams as well.

31:36

But with that being said, guys,

31:37

if you haven't already liked and subscribed,

31:39

make sure you do so.

31:41

Join the Discord,

31:41

the fastest growing vibe coding community on the internet

31:43

right now.

31:44

So there's a link in the description below.

31:46

And with that being said, guys,

31:47

I will see you guys in the future.

Summary

This video provides a deep dive into the newly released Claude Opus 4.6, focusing on its benchmarks, features, and the experimental agent teams capability in Claude Code. The presenter highlights the increase to a 1-million-token context window and the new ability to adjust effort levels for tasks. While noting that the coding performance jump is more iterative compared to previous versions, the video demonstrates how agent teams allow multiple AI agents to collaborate by communicating directly with each other. However, a significant caveat is mentioned regarding the high token consumption and cost associated with running these complex multi-agent workflows.
