
SAME DAY: Opus 4.6 AND Chat GPT 5.3!


Transcript


0:00

Today I got to try out Opus 4.6 and Chat Jippity 5.3. Okay, these are the latest and greatest state-of-the-art models. It is kind of funny, because at 10:45 a.m. today Opus 4.6 launches, but then at 11:12 a.m. Jippity 5.3 Codex launches. There's something about it. I'm not going to lie, I think Sam is feeling a little bit jealous. Okay, he's a little upset at Dario for making those commercials, not very happy about getting made fun of. Yet here we are.

0:24

Anywho, I built the same application in both of them. As you can see, this beautiful application right here, and this beautiful application right here. How I did it is I asked it to make a JSX transformer that could take in JSX and produce a 60-frames-per-second terminal application that's rendered using Bun, and I wanted the transformer, for JSX to JavaScript, to be written in Rust. I wanted to have hot module reloading.

0:44

It involved making a bunch of prompts to both, but I did the same initial seed prompt and let it do plan mode. Jippity asked a couple extra questions, got those done. Opus only asked me one question, got those. And then all the follow-up answers were answered in the exact same order, doing the same thing. So, as close of a, you know, sanitized test as one could make.
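The core of the ask, compiling JSX element syntax down to plain function calls, can be sketched in a few lines. This is a toy illustration, not the Rust transformer from the video; the `h` factory name and the single self-closing-tag pattern are assumptions:

```javascript
// Toy JSX transform: rewrites self-closing JSX tags like
// <Box color="red"/> into h(Box, {color: "red"}) calls.
// A real transformer (like the Rust one in the video) would
// parse properly; a regex only covers this narrow case.
function transformJSX(source) {
  return source.replace(
    /<([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z]+="[^"]*")*)\s*\/>/g,
    (_, tag, attrs) => {
      const props = [...attrs.matchAll(/([A-Za-z]+)="([^"]*)"/g)]
        .map(([, k, v]) => `${k}: ${JSON.stringify(v)}`)
        .join(", ");
      return `h(${tag}, {${props}})`;
    }
  );
}
```

So `transformJSX('const ui = <Box color="red"/>;')` yields `'const ui = h(Box, {color: "red"});'`, which is the same basic rewrite a real JSX-to-JavaScript compiler performs before the result ever runs under Bun.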

1:03

Also, what is this? What did Claude draw here? What is this? I don't know what to do with this. Ridiculous.

1:09

So how did it actually do? Well, first I want to point out that Chat Jippity 5.3 actually did make some JSX. It actually has on-the-fly compiling. It actually had quite a few things. It never got hot module reloading working, but I can come in here, simply adjust this thing right here, resave it, just rerun it, and you can see the thing moves. Actually pretty impressive. I am pretty happy about it.
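For flavor, the 60-frames-per-second terminal rendering both apps do boils down to repainting the screen on a roughly 16 ms budget with ANSI escape codes. A minimal sketch, not code from either app; the string-array frame-buffer shape is an assumption:

```javascript
// One terminal frame: move the cursor home and repaint every line.
// Using cursor-home (ESC[H) instead of a full screen clear each
// frame avoids flicker; erase-in-line clears leftovers per row.
function renderFrame(lines) {
  const CURSOR_HOME = "\x1b[H";
  const CLEAR_LINE = "\x1b[2K";
  return CURSOR_HOME + lines.map((l) => CLEAR_LINE + l).join("\n");
}

// 60 fps means a fresh frame roughly every 1000 / 60 ≈ 16.7 ms.
function startLoop(getLines, write = (s) => process.stdout.write(s)) {
  return setInterval(() => write(renderFrame(getLines())), 1000 / 60);
}
```

Calling `startLoop(() => ["hello", String(Date.now())])` would repaint those two rows about sixty times a second until the returned interval is cleared.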

1:33

I even asked Opus about it. Like, "Hey Opus, how did Chat Jippity 5.3 do?" And it's just like, "Well, it kind of cheated on the syntax. It was doing more of a DSL than anything else." Also, I just want to let you know that it was kind of a creative approach, because it did sidestep the entire JavaScript ecosystem with JSX tooling. And you know what I said? I said, "Hey, that's smart," okay, sidestepping the JavaScript ecosystem for tooling. That's what I have to say about it.

1:57

On the other hand, this is Opus's. Okay, first off, the art: not that good. Okay, Squid, Crab, and what appears to be LA. Okay, this is LA. You know, the old dongle, I guess, represents LA. Well, its code's a little bit different. First off, you'll notice something right here. Uh, I hate to tell you this: that ain't LA. It just cheated, which, I'm not going to lie to you, I'm not very happy. I'm not very happy at all. It just cheated. It just uses

functions right here. This is not the compiled code; this is just the actual running code. But I will say, on the other hand, it did get hot module reloading actually working. So hey, a little bit better for Opus on some of the things. But the non-JSX thing, not very happy about it.

2:38

Also, something kind of interesting: it took about 2,000 lines of JavaScript to make this happen, whereas it's only about 1,000 lines of JavaScript for Chat GPT. Also, the Chat GPT one has an actually working JSX parser, also written with only 520 lines of Rust code. Whereas the compiler for Opus, which, I mean, I don't know what's going on here, it's not actually running, it's not actually doing JSX, is 1,300 lines of code. So, is that a good measurement? I can't really tell. One-for-one features, it's really, really hard. But I can say there was live, actual JSX compiled in one of them, in fewer lines of code. That's a pretty good measurement for me. Also, just

generally speaking, I tend to prefer Chat Jippity's code output compared to Claude's. I can't tell you why. I just seem to like its styling a little bit better. I think it just tends to organize things a little bit better. Its functions aren't maybe as egregious as Opus's. Opus is just like, "Dude, I'm solving. I'm solving. I'm solving. Next function, I'm solving." It just doesn't ever feel as good looking at the code. So, honestly, I feel like Chat Jippity kind of won here. Now, at the

end of the day, does it actually matter which model you use? I actually don't think the state-of-the-art models at this point really matter. As long as you hold it right, it'll just work. I think, at the end of the day, if you're going to use a model, they're all really, really decent, right? They can all go pretty far producing code. And at the end of the day, it's on you to make good code. And it's the same old thing; nothing has really changed. The people that weren't producing good code just produce bad code faster. The people that were producing good code, they produce good code, and probably not at an entirely changed rate, maybe some small chunk of an increase, because producing good code is actually really hard.

4:19

So, is it going to matter if you're on 4.5 versus 4.6, if you're still stuck on, like, 5.1 Codex versus 5.3 Codex? I don't think it's going to really matter. I think, at the end of the day, if you know what you're doing and you practice, you get in those reps, you get the time in the saddle, and you work really, really hard at your craft, I think you're going to be happy no matter what model you're using, even if it's Unbreakable Kimi K. And I think that's

the real takeaway here. And I also think that that's why you see these model providers competing so hard: because when one does an improvement, the other one needs to, because there's not really anything holding people there, right? It's just like, okay, I like this model. I like this model. No, I like that model. Honestly, I think one of the big things is that people have a good experience with one model and then they're like, "Oh, this is the best one." And then you just kind of stick with it out of some sort of weird emotional appeal for some amount of time, and then it does something, and you're like, "I don't like it anymore. I'm going to try this other one." And then it does something good, and you're like, "Nah, this is my new favorite one." I just wanted to throw

that out there because I think you're going to get hit with a bunch of benchmarks over the next few days. I'm not even sure any of them really matter at this point, because let's just face it: when you're producing 10,000, nay, 15,000 lines of code a day, does it really matter? I'm just not sure it matters at all.

5:28

And a lot of that's because I just have this theory. Okay, now we're going to get into some weird theory time. Okay, I think that the average

programmer is of quality X. Now, here's the weird part about the quality of X: I am not really sure if it's between 0 and 1. I think it might be between negative 1 and 1, where some engineers, and you know this, you've worked with some engineers, when they build things, afterwards it actually takes more work and more time to figure out how to undo some of the things that have been done. And I kind of feel like AI, it's more of a multiplier, right? It's not an addition. It's not like plus five, you're not just that much better. It doesn't take somebody that's a negative 0.5 and turn them into a 4.5, whatever these numbers mean, developer. No, it's more like a multiplier. So if you work with somebody that, say, is a negative 0.1, you know, like they deliver features, but there's a bit of tech debt that kind of comes associated with those features, the only thing I think you're getting is that you're transferring them from a negative 0.1 to a negative 1. They're just going to make those questionable features 10 times faster. That's kind of my grand unifying theory of programming as it is, along with kind of the impacts of AI.

6:34

And I also think, I know there's going to be another crazy one, I also think that the impact of AI is proportional to how much you're on this side of the equation versus this side of the equation. Like, there are some things that, man, it makes life easy. Like, "Yo, go run this integration test, and then I want you to figure out what's breaking about it, and then I want you just to give me a report of why you think something breaks. I'm going to come back to you in 30 minutes, and I'll see if you figured out some stuff and kind of follow your leads." Oh my gosh. Saves me literally hours a day. Like, that actually is an incredible feature of AI.
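The multiplier theory above is easy to make concrete with a bit of arithmetic. A toy sketch on the video's negative-one-to-one quality scale; the function names and the 10x figure are just illustrations of the argument, not anything measured:

```javascript
// The video's theory: AI multiplies a developer's quality rather
// than adding to it, so it amplifies whatever direction the
// developer already points in, including the negative one.
function withAI(quality, multiplier = 10) {
  return quality * multiplier;
}

// The "plus five" additive model the video rejects, under which a
// negative-0.5 developer would magically become a 4.5.
const additive = (quality, bonus = 5) => quality + bonus;

// Multiplier model: the negative-0.1 tech-debt shipper becomes a
// negative 1, shipping questionable features 10 times faster.
```

Under the multiplier model, `withAI(-0.1)` lands at about -1 and `withAI(0.5)` at 5, while `additive(-0.5)` gives the 4.5 the video argues never actually happens.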

7:06

Okay, that's it. That's all I have to say. Okay, bye-bye. Take care.

7:10

The name... The name is... I don't really... The name is... I don't, I don't really care. A Jen. Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via ssh terminal.shop. Yeah. You want a real experience? You want real coffee? You want awesome subscriptions so you never have to remember again? Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out Cron. You don't know what SSH is?

7:39

>> Well, maybe the coffee is not for you.

7:46

Living the dream.

Interactive Summary

The speaker compares the capabilities of two advanced AI models, Opus 4.6 and GPT-5.3, by having them build a JSX transformer for a 60fps terminal application. While GPT-5.3 successfully implemented a working JSX parser with high efficiency, Opus 4.6 managed to implement hot module reloading but 'cheated' on the JSX requirement. The video concludes that AI serves as a productivity multiplier, meaning it helps skilled developers produce good code faster, but also enables less skilled developers to create technical debt at an accelerated pace.
