Claude Opus 4.1 -- The new king of coding?
Hello everyone. This is Professor
Patterns and today's a great day because
the Opus 4.1 model has also been released. This is something I've been waiting for for a really long time,
and it's an upgrade from the current
Claude Opus 4 model. And it's great for
things like agentic tasks, real-world
coding, as well as reasoning. And we see
here how much of an improvement it is.
So, Sonnet 3.7 is at 62.3, Opus 4 is at 72.5, and Opus 4.1 is at 74.5 on SWE-bench Verified.
So they say it's about a one standard
deviation improvement over Opus 4 on
their junior developer benchmark. And
the claim is that it's about the same
performance leap as you would have when
you go from Sonnet 3.7 to Sonnet 4. I don't really know how I feel about that because, to be honest, I kind of still like the Sonnet 3.7 model more than the Sonnet 4 model. And I'm only saying this from experience, because I've been using the Sonnet models heavily through OpenRouter. Honestly, if you take a look at my OpenRouter credits page, you'll see that over the last two or three months I've probably spent about $2,500 to $3,000 on OpenRouter credits alone. And
that doesn't even include my Claude Max subscription, the API I've been using from Claude directly, or the API I've been using through Amazon Bedrock. So I've probably spent closer to $15,000 over the last three months on all of these models.
So I think I'm in a pretty decent position to figure out whether or not there's a real difference between these models. I'm excited about Opus 4.1. I think Opus 4 by itself is great: if I have a bug
anywhere in my codebase, it's able to
find this bug and fix this problem like
really, really easily. And I just need
to review it and just give it a go and
say like, yep, this sounds great. Okay,
so enough talking. Let's call this model
through OpenRouter. I could have used the Claude provider directly as well, but that's fine. For now, I'm just going to give it the Claude Opus 4.1 model and make sure that looks right. I will also enable extended thinking and allow the full budget of 6,553 thinking tokens.
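For anyone following along, here's a minimal sketch of what a call like this could look like through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug, the shape of the reasoning parameter carrying the thinking budget, and the placeholder prompt are assumptions, not exactly what's shown on screen:

```python
# Minimal sketch of the setup: Claude Opus 4.1 via OpenRouter's
# OpenAI-compatible chat completions endpoint, with an extended
# thinking budget. The model slug and the "reasoning" parameter
# shape are assumptions; the prompt is just a placeholder.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.1",   # assumed OpenRouter slug
        "reasoning": {"max_tokens": 6553},      # extended thinking budget
        "messages": [
            {"role": "user", "content": "Describe the main layout issues with this landing page."}
        ],
    },
    timeout=300,
)
print(response.json()["choices"][0]["message"]["content"])
```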
Here I'm actually giving it a codebase that has a bunch of different things in it, so it's not just an empty codebase. This is the codebase for my website, Percr. As you can see, it's a full landing page, but it doesn't really have anything else in there; this particular folder is just for the landing page development. And what I'm
going to do is simply take a screenshot
of this and go here. What's wrong with
my hero page? And I paste the screenshot link. The background is really off. Can you help improve it further? Think step by step and put yourself in the mind of an expert UI/UX designer and researcher.
List out all the things to improve and
do it.
This is kind of how I usually work with this sort of prompting: I ask the model to put itself in the mind of whatever expert I want it to be. And there we go, it's starting to work. This is my first time ever seeing it run. Oh my god, that was 20 cents. Now that's 50 cents, and it hasn't really done anything yet.
Okay, so it looks like it is finally
done now. And this cost me about $2.50.
Um, and this is the current state of the
site and the landing page. And this is
the one that I just created.
I absolutely hate it. This is a complete waste of $2.50. But I can see the appeal this might have for some people; I just don't think it's the right type of thing for the product.
But I guess the UI is kind of cool. You
hover your mouse there and all the stars
kind of move and stuff. We have this
little hover effect kind of thing.
But would this really feel premium? I don't know. It's a stars-and-space kind of vibe, I guess. But okay, that's what you get for vibe coding. Let me try it one more time. I gave it the
same exact prompt and this time it built
this
which, to be honest, at least looks different. At least it doesn't look like whatever that was last time. I think it's a little more enterprise-grade. It did add in a bunch of stuff that's not true, so that needs to get worked on a little bit. But honestly, this time around it's not as bad as it was. Would I still change it?
I don't know. Um, maybe I'll think about
it. Let's also see how this works for LinkedIn: create a post for LinkedIn. So, I basically had it create a full master document that essentially lists all of the things the site actually does, and I want to see what the writing style looks like. So, it does have em dashes, which is a little bit of a giveaway that this is an AI-generated post. It's a 212-word count. And what's the raw content? "We flip this equation. Here's what changed."
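If you want to run those same quick checks yourself, here's a tiny sketch; the post text is just a placeholder:

```python
# Quick word-count and em-dash checks on a generated post.
# The post text below is just a placeholder.
post = "We flip this equation. Here's what changed..."

word_count = len(post.split())
em_dash_count = post.count("\u2014")  # U+2014, the em dash character
print(f"{word_count} words, {em_dash_count} em dashes")
```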
Yeah, I don't know. I think that if I were to create a post like this, it would be pretty obvious to me that it was AI-generated. So, is it there yet in terms of the writing side of things? I don't really think so. But maybe I just need to play around with the prompt a little bit: maybe give it a couple of one- or two-shot examples of how I want it to sound, or maybe create a full brand writing style guide. If you haven't watched my video on Claude Projects,
I recommend it. That way you can create an entire project and give it a full knowledge base. Here I could add some text content on, let's say, brand writing guidelines and style, and then any time I'm making a chat in that project, it's always going to be able to reference the writing style guidelines and all of those kinds of things.
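As a rough sketch of what that few-shot, style-guided request could look like through the same OpenRouter endpoint (the guideline text and the example post are hypothetical placeholders, and the model slug is the same assumption as before):

```python
# Sketch of seeding a request with brand writing guidelines plus a
# one-shot example post, then asking for a new post in the same voice.
# The guidelines and the example post are hypothetical placeholders.
import os
import requests

brand_guidelines = (
    "Write in first person, keep sentences short, avoid em dashes and "
    "buzzwords, and end with a concrete takeaway."
)
example_post = (
    "Shipped a small fix today. It took three tries. "
    "The lesson: test the boring path first."
)

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.1",  # assumed slug, as before
        "messages": [
            {"role": "system", "content": f"Brand writing guidelines:\n{brand_guidelines}"},
            {"role": "user", "content": "Write a short LinkedIn post announcing a new feature."},
            {"role": "assistant", "content": example_post},  # one-shot example of the voice
            {"role": "user", "content": "Now write a LinkedIn post about the new landing page."},
        ],
    },
    timeout=300,
)
print(response.json()["choices"][0]["message"]["content"])
```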
But yeah, so far, in terms of coding, it looks great. I'm going to try it out with a couple of bugs that I know I haven't been able to fix yet, and that way I'll have a fuller understanding of how well this model actually does. In terms of cost, yeah, it is an expensive model, so I'm not going to be using it all the time. This is just something I would use when I'm really frustrated and can't seem to get to the bottom of a problem with Claude Sonnet, and then I'd either debug it myself or just use Claude Opus. But
that's it for this video. I think that
overall I'm really excited about this
new update. The model does seem really great at figuring out a lot of different things. I still need to experiment with it a little bit more: let me give it a couple of really demanding tasks, things like a huge codebase with a lot of different interactions, and see what I get back. But
more to come. Thank you all for tuning
in. If you haven't watched my video on
the GPT-OSS model, which was released literally an hour after this model,
watch that one as well. I posted a link
to it in the description below. Thank
you all for tuning in. I'll see you in
the next one. Goodbye.
Professor Patterns introduces the Claude Opus 4.1 model, highlighting its significant improvements in coding, reasoning, and agentic tasks over its predecessor. Through practical tests involving UI design and LinkedIn post generation, the author evaluates its strengths and weaknesses, concluding that while it is an expensive model best reserved for complex debugging, its problem-solving capabilities are highly promising.