Claude Opus 4.1 -- The new king of coding?
Hello everyone. This is Professor
Patterns and today's a great day because
the Opus 4.1 model has also been released. This is something I've been waiting for for a really long time,
and it's an upgrade from the current
Claude Opus 4 model. And it's great for
things like agentic tasks, real-world
coding, as well as reasoning. And we see
here how much of an improvement it is.
So, Sonnet 3.7 is at 62.3, Opus 4 is at 72.5, and Opus 4.1 is at 74.5 on SWE-bench Verified.
So they say it's about a one standard
deviation improvement over Opus 4 on
their junior developer benchmark. And
the claim is that it's about the same
performance leap as you would have when
you go from Sonnet 3.7 to Sonnet 4. I don't really know how I feel about that because, to be honest, I kind of still like the Sonnet 3.7 model more than the Sonnet 4 model. And I'm only saying this from experience, because I've been using the Sonnet models heavily through OpenRouter. Honestly, if you take a look at my OpenRouter credits page, you'll see that over the last two or three months I've probably spent about $2,500 to $3,000 on OpenRouter credits alone. And
that doesn't even include my Claude Max subscription, the API I've been using from Claude directly, or the API I've been using through Amazon Bedrock. So I've probably spent closer to $15,000 over the last three months on all of these models.
So I think I'm in a pretty decent position to figure out whether or not there's a real difference between these models. I'm excited about Opus 4.1. I think Opus 4 by itself is great: if I have a bug
anywhere in my codebase, it's able to
find this bug and fix this problem like
really, really easily. And I just need
to review it and just give it a go and
say like, yep, this sounds great. Okay,
so enough talking. Let's call this model
through OpenRouter. I could have used the Claude provider directly as well, but that's fine. For now, I'm just going to give it the Claude Opus 4.1 model and make sure that looks right. I will also enable extended thinking and allow the full budget of 6,553 thinking tokens.
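For anyone following along, here's a minimal sketch of what a call like this could look like through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug, the shape of the reasoning parameter carrying the thinking budget, and the placeholder prompt are assumptions, not exactly what's shown on screen:

```python
# Minimal sketch of the setup: Claude Opus 4.1 via OpenRouter's
# OpenAI-compatible chat completions endpoint, with an extended
# thinking budget. The model slug and the "reasoning" parameter
# shape are assumptions; the prompt is just a placeholder.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.1",   # assumed OpenRouter slug
        "reasoning": {"max_tokens": 6553},      # extended thinking budget
        "messages": [
            {"role": "user", "content": "Describe the main layout issues with this landing page."}
        ],
    },
    timeout=300,
)
print(response.json()["choices"][0]["message"]["content"])
```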
Here I'm actually giving it a codebase that has a bunch of different things in it, so it's not just an empty codebase. This is the codebase for my website, Percr. As you can see, it's a full landing page, but it doesn't really have anything else in there; this particular folder is just for the landing page development. And what I'm
going to do is simply take a screenshot
of this and go here. What's wrong with
my hero page? And I paste the screenshot link. The background is really off. Can you help improve it further? Think step by step and put yourself in the mind of an expert UI/UX designer and researcher.
List out all the things to improve and
do it.
This is kind of how I usually work with this sort of prompting: I ask the model to put itself in the mind of whatever expert I want it to be. And there we go, it's starting to work. This is my first time ever seeing it run. Oh my god, that was 20 cents. Now that's 50 cents, and it hasn't really done anything yet.
Okay, so it looks like it is finally
done now. And this cost me about $2.50.
Um, and this is the current state of the
site and the landing page. And this is
the one that I just created.
I absolutely hate it. This is a complete waste of $2.50. But I can see the appeal this might have for some people; I just don't think it's the right type of thing for the product.
But I guess the UI is kind of cool. You
hover your mouse there and all the stars
kind of move and stuff. We have this
little hover effect kind of thing.
But would this really feel premium? I don't know. It's a stars-and-space kind of vibe, I guess. But okay, that's what you get for vibe coding. Let me try it one more time. I gave it the
same exact prompt and this time it built
this
which, to be honest, at least looks different. At least it doesn't look like whatever that was last time. I think it's a little more enterprise-grade. It did add in a bunch of stuff that's not true, so that needs to get worked on a little bit. But honestly, this time around it's not as bad as it was. Would I still change it?
I don't know. Um, maybe I'll think about
it. Let's also see how this works for LinkedIn: create a post for LinkedIn. So, I basically had it create a full master document that essentially lists all of the things the site actually does, and I want to see what the writing style looks like. So, it does have em dashes, which is a little bit of a giveaway that this is an AI-generated post. It's a 212-word count. And what's the raw content? "We flip this equation. Here's what changed."
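If you want to run those same quick checks yourself, here's a tiny sketch; the post text is just a placeholder:

```python
# Quick word-count and em-dash checks on a generated post.
# The post text below is just a placeholder.
post = "We flip this equation. Here's what changed..."

word_count = len(post.split())
em_dash_count = post.count("\u2014")  # U+2014, the em dash character
print(f"{word_count} words, {em_dash_count} em dashes")
```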
Yeah, I don't know. I think that if I were to create a post like this, it would be pretty obvious to me that it was AI-generated. So, is it there yet in terms of the writing side of things? I don't really think so. But maybe I just need to play around with the prompt a little bit: maybe give it a couple of one- or two-shot examples of how I want it to sound, or maybe create a full brand writing style guide. If you haven't watched my video on Claude Projects,
I recommend it. That way you can create an entire project and give it a full knowledge base. Here I could add some text content on, let's say, brand writing guidelines and style, and then any time I'm making a chat in that project, it's always going to be able to reference the writing style guidelines and all of those kinds of things.
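As a rough sketch of what that few-shot, style-guided request could look like through the same OpenRouter endpoint (the guideline text and the example post are hypothetical placeholders, and the model slug is the same assumption as before):

```python
# Sketch of seeding a request with brand writing guidelines plus a
# one-shot example post, then asking for a new post in the same voice.
# The guidelines and the example post are hypothetical placeholders.
import os
import requests

brand_guidelines = (
    "Write in first person, keep sentences short, avoid em dashes and "
    "buzzwords, and end with a concrete takeaway."
)
example_post = (
    "Shipped a small fix today. It took three tries. "
    "The lesson: test the boring path first."
)

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-opus-4.1",  # assumed slug, as before
        "messages": [
            {"role": "system", "content": f"Brand writing guidelines:\n{brand_guidelines}"},
            {"role": "user", "content": "Write a short LinkedIn post announcing a new feature."},
            {"role": "assistant", "content": example_post},  # one-shot example of the voice
            {"role": "user", "content": "Now write a LinkedIn post about the new landing page."},
        ],
    },
    timeout=300,
)
print(response.json()["choices"][0]["message"]["content"])
```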
But yeah, so far, in terms of coding, it looks great. I'm going to try it out with a couple of bugs that I know I haven't been able to fix yet, and that way I'll have a fuller understanding of how well this model actually does. In terms of cost, yeah, it is an expensive model, so I'm not going to be using it all the time. This is just something I would use when I'm really frustrated and can't seem to get to the bottom of a problem with Claude Sonnet, and then I'd either debug it myself or just use Claude Opus. But
that's it for this video. I think that
overall I'm really excited about this
new update. The model does seem really great at figuring out a lot of different things. I still need to experiment with it a little bit more: let me give it a couple of really demanding tasks, things like a huge codebase with a lot of different interactions, and see what I get back. But
more to come. Thank you all for tuning
in. If you haven't watched my video on
the GPT-OSS model, which was released literally an hour after this model,
watch that one as well. I posted a link
to it in the description below. Thank
you all for tuning in. I'll see you in
the next one. Goodbye.
Professor Patterns introduces the Claude Opus 4.1 model, highlighting its significant improvements in coding, reasoning, and agentic tasks over its predecessor. Through practical tests involving UI design and LinkedIn post generation, the author evaluates its strengths and weaknesses, concluding that while it is an expensive model best reserved for complex debugging, its problem-solving capabilities are highly promising.