SAME DAY: Opus 4.6 AND Chat GPT 5.3!
Today I got to try out Opus 4.6 and ChatGPT 5.3. Okay, these are the latest and greatest state-of-the-art models. It is kind of funny, because at 10:45 a.m. today, Opus 4.6 launches, but then at 11:12 a.m., GPT-5.3 Codex launches. There's something about it, I'm not going to lie. I think Sam is feeling a little bit jealous. Okay, he's a little upset at Dario for making those commercials. Not very happy about getting made fun of. Yet here we are. Anywho, I built the same application in both of them. As you can see, this beautiful application right here, and this beautiful application
right here. How I did it: I asked each one to make a JSX transformer that could take in JSX and produce a 60-frames-per-second terminal application that's rendered using Bun, and I wanted the JSX-to-JavaScript transformer to be written in Rust. I wanted it to have hot module reloading.
It involved making a bunch of prompts to both, but I used the same initial seed prompt, let each do plan mode. GPT asked a couple of extra questions, got those done. Opus only asked me one question, got that done. And then all the follow-up answers were answered in the exact same order, doing the same thing. So, as close to a, you know, sanitized test as one could make.
Also, what is this? What did Claude draw here? What is this? I don't know what to do with this. Ridiculous. So, how did it actually do? Well, first I want to point out that ChatGPT 5.3 actually did make some JSX. It actually has on-the-fly compiling. It actually had quite a few things. It never got hot module reloading working, but I can come in here, simply adjust this thing right here, resave it, just rerun it, and you can see the thing moves. Actually pretty impressive. I am pretty happy about it.
I even asked Opus about it. Like, "Hey Opus, how did ChatGPT 5.3 do?" And it's just like, "Well, it kind of cheated on the syntax. It was doing more of a DSL than anything else." Also, I just want to let you know, it was kind of a creative approach, because it did sidestep the entire JavaScript ecosystem of JSX tooling. And you know what I said? I said, "Hey, that's smart." Okay? Sidestepping the whole JavaScript ecosystem for tooling. That's what I have to say about it.
On the other hand, this is Opus's. Okay, first off, the art: not that good. Okay: Squid, Crab, and what appears to be LA. Okay, this is LA. You know, the old dongle, I guess, represents LA. Well, its code's a little bit different. First off, you'll notice something right here. Uh, I hate to tell you this: that ain't LA. It just cheated, which, I'm not going to lie to you, I'm not very happy about. I'm not very happy at all. It just cheated. It just uses functions right here. This is not the compiled code; this is just the actual running code.
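To make "it just uses functions" concrete, here's a hedged sketch of the distinction, with a hypothetical element factory named h standing in for whatever Opus actually called it:

```ts
// What a JSX compiler emits: plain function calls that build a tree.
type VNode = { tag: string; props: Record<string, unknown>; children: unknown[] };

function h(tag: string, props: Record<string, unknown>, ...children: unknown[]): VNode {
  return { tag, props, children };
}

// Real JSX source that a transformer would accept:
//   const ui = <box color="cyan">hello</box>;
// The compiled form, which is apparently what Opus's app just writes
// by hand, skipping the compile step entirely:
const ui = h("box", { color: "cyan" }, "hello");
console.log(ui);
```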
But I will say, on the other hand, it did get hot module reloading actually working. So, hey, a little bit better for Opus on some of the things. But the non-JSX thing, not very happy about it. Also, something kind of interesting: it took about 2,000 lines of JavaScript to make this happen, whereas it's only about 1,000 lines of JavaScript for ChatGPT. Also, the ChatGPT one has an actually working JSX parser, written with only 520 lines of Rust code. Whereas the compiler for Opus, which, I mean, I don't know what's going on here, it's not actually running, it's not actually doing JSX, is 1,300 lines of code. So, is that a good measurement? I can't really tell.
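For a sense of what "an actually working JSX parser" has to do, here's a toy sketch. The real thing was 520 lines of Rust; this is a few lines of TypeScript handling only the trivial single-element case, with every name hypothetical:

```ts
// Minimal JSX-ish parse: <tag attr="v">text</tag> into a tree node.
type JsxNode = { tag: string; attrs: Record<string, string>; children: string[] };

function parseJsx(src: string): JsxNode {
  const m = /^<(\w+)((?:\s+\w+="[^"]*")*)\s*>(.*)<\/\1>$/s.exec(src.trim());
  if (!m) throw new Error("not a simple JSX element");
  const attrs: Record<string, string> = {};
  for (const [, key, value] of m[2].matchAll(/(\w+)="([^"]*)"/g)) attrs[key] = value;
  return { tag: m[1], attrs, children: [m[3]] };
}

console.log(parseJsx(`<box color="cyan">hello</box>`));
// { tag: "box", attrs: { color: "cyan" }, children: ["hello"] }
```

A real parser also handles nesting, expressions in braces, fragments, and error recovery, which is where the line counts come from.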
One-for-one features? It's really, really hard. But I can say there was live, actual JSX compiled in one of them, in fewer lines of code. That's a pretty good measurement for me. Also, just generally speaking, I tend to prefer ChatGPT's code output compared to Claude's. I can't tell you why. I just seem to like its styling a little bit better. I think it just tends to organize things a little bit better. Its functions maybe aren't as egregious as Opus's. Opus is just like, "Dude, I'm solving. I'm solving. I'm solving. Next function, I'm solving." Like, it just doesn't feel as good ever looking at the code. So, honestly, I feel like ChatGPT kind of won here.
Now, at the end of the day, does it actually matter which model you use? I actually don't think the state-of-the-art models at this point really matter. As long as you hold it right, it'll just work. I think at the end of the day, if you're going to use a model, they're all really, really decent, right? They can all go pretty far producing code. And at the end of the day, it's on you to make good code. And it's the same old thing; nothing has really changed. The people that weren't producing good code just produce bad code faster. The people that were producing good code produce good code, and probably not at an entirely changed rate, maybe some small chunk of an increase, because producing good code is actually really hard. So, is it going to matter if you're on 4.5 versus 4.6, or if you're still stuck on, like, 5.1 Codex versus 5.3 Codex? I don't think it's going to really matter.
I think at the end of the day, if you know what you're doing, and you practice, you get in those reps, you get the time in the saddle, and you work really, really hard at your craft, I think you're going to be happy no matter what model you're using, even if it's unbreakable Kimi K2. And I think that's the real takeaway here. And I also think that's why you see these model providers competing so hard: because when one makes an improvement, the other one needs to, because there's not really anything holding people there, right? It's just like, okay, I like this model. I like this model. No, I like that model.
Honestly, I think one of the big things is that people have a good experience with one model and then they're like, "Oh, this is the best one." And then you just kind of stick with it out of some sort of weird emotional appeal for some amount of time, and then it does something, and you're like, "I don't like it anymore. I'm going to try this other one." And then it does something good, and you're like, "Nah, this is my new favorite one." I just wanted to throw that out there, because I think you're going to get hit with a bunch of benchmarks over the next few days. I'm not even sure any of them really matter at this point, because, let's just face it, when you're producing 10,000, nay, 15,000 lines of code a day, does it really matter? I'm just not sure it matters at all.
And a lot of that's because I just have this theory. Okay, now we're going to get into some weird theory time. Okay, I think that the average programmer is of quality X. Now, here's the weird part about the quality X: I am not really sure if it's between 0 and 1. I think it might be between -1 and 1, where some engineers, and you know this, you've worked with some engineers, when they build things, afterwards it actually takes more work and more time to figure out how to undo some of the things that have been done. And I kind of feel like AI is more of a multiplier, right? It's not an addition. It's not like plus five, where you're just that much better. It doesn't take somebody that's a -0.5 and turn them into a 4.5, whatever these numbers mean, developer. No, it's more like a multiplier. So if you work with somebody that, say, is a -0.1, you know, like, they deliver features, but there's a bit of tech debt that kind of comes associated with those features, the only thing I think you're getting is that you're going to be transferring them from a -0.1 to a -1. They're just going to make those questionable features 10 times faster. That's kind of, like, my grand unifying theory of programming as it is, along with kind of the impacts of AI.
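Just to pin down the arithmetic of that theory, here's a toy version in code; the multiplier of 10 is a made-up number, not a measurement:

```ts
// Toy model: AI multiplies a developer's quality rather than adding to it.
// Quality is a hand-wavy number in [-1, 1]; 10 is purely illustrative.
const AI_MULTIPLIER = 10;

function outputWithAI(quality: number): number {
  return quality * AI_MULTIPLIER; // a multiplier, not quality + 5
}

console.log(outputWithAI(0.8));  //  8: good code, shipped much faster
console.log(outputWithAI(-0.1)); // -1: the same tech debt, 10x faster
```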
And, I know there's going to be another crazy one, but I also think that the impact of AI is proportional to how much you're on this side of the equation versus that side of the equation. Like, there are some things where, man, it makes life easy. Like, "Yo, go run this integration test, and then I want you to figure out what's breaking about it, and then I want you to just give me a report of why you think something breaks. I'm going to come back to you in 30 minutes, and I'll see if you figured out some stuff and kind of follow your leads." Oh my gosh, that saves me literally hours a day. Like, that actually is an incredible feature of AI.
Okay, that's it. That's all I have to say. Okay, bye-bye. Take care. The name is... I don't really care. Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via ssh terminal.shop. Yeah. You want a real experience? You want real coffee? You want awesome subscriptions so you never have to remember again? Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out Cron. You don't know what SSH is?
>> Well, maybe the coffee is not for you.
Living the dream.
The speaker compares the capabilities of two state-of-the-art AI models, Opus 4.6 and GPT-5.3 Codex, by having each build a Rust-based JSX transformer for a 60fps terminal application. While GPT-5.3 successfully implemented a working JSX parser with high efficiency, Opus 4.6 got hot module reloading working but "cheated" on the JSX requirement. The video concludes that AI serves as a productivity multiplier: it helps skilled developers produce good code faster, but also enables less skilled developers to create technical debt at an accelerated pace.