Vibe Coding With GPT 5.3 Codex

Transcript

0:00

In this video, I am putting the newly

0:02

released GPT 5.3 Codex to the test

0:05

in a new benchmark that I am calling

0:07

BridgeBench.

0:09

BridgeBench is going to put GPT 5.3

0:11

Codex to the test, from giving it tasks

0:14

like building an FPS game inside one prompt

0:18

to building the Windows 10 interface all in

0:21

one shot.

0:22

If you guys would like to check out

0:24

the prompts that I'm going to be using

0:25

in BridgeBench and even test it out yourself,

0:28

you can just go to bridgemind.ai and

0:31

go to BridgeBench.

0:32

There'll be a link to this down in

0:34

the description below, but you'll be able to

0:36

actually see and copy the prompts that I'm

0:38

using for this test that you're going to

0:40

see in this video.

0:41

Before we get too deep into the video,

0:43

we are only 25 members away from hitting

0:46

5,000 members in the BridgeMind Discord community.

0:49

So if you guys are watching and have

0:50

not already joined the fastest growing vibe coding

0:52

community on the internet right now, make sure

0:54

you check out the description below and join

0:56

the BridgeMind Discord community.

0:58

Also, I do have a 200 like goal

1:01

on this video.

1:01

So if you guys haven't already liked the

1:03

video or subscribed and turned on post notifications,

1:06

make sure you do so.

1:07

And with that being said, let's get right

1:08

into the video.

1:10

Let's now open up BridgeSpace, which is the

1:12

ADE, the agent development environment that we've been

1:15

working on that's going to ship in the

1:17

next week or so.

1:18

And what I'm going to do now is

1:20

I'm going to open up a new workspace

1:22

in the BridgeBench directory.

1:25

So let's go over to my desktop and

1:27

let's select BridgeBench.

1:28

And we are going to launch six codex

1:31

agents inside of this workspace.

1:33

So I just select the directory.

1:35

I select how many terminals I want to

1:36

open up and I select which AI agent

1:37

I actually want to use.

1:38

And now I'm going to create this new

1:40

workspace.

1:41

And just like that, we have six codex

1:43

agents that are ready to go.

1:44

So I'm actually going to close one of

1:46

these because we only have five prompts here.

1:49

So we need five codex agents.

1:51

So what I'm going to do is I'm

1:52

going to give a quick prompt and I'm

1:57

just going to add the

1:58

prompt one.

2:00

And I'm just going to say, I want

2:02

you to work inside

2:06

of the GPT 5.3 Codex directory and

2:11

complete this project.

2:16

OK, so we're going to input this prompt

2:19

and we're going to use this same exact

2:22

prompt.

2:22

What I just used was BridgeVoice, which I'm

2:24

using for my prompts there.

2:26

That's another product in the BridgeMind suite of

2:28

vibe coding tools that we are releasing.

2:30

So we're just going to add all of

2:32

these prompts that are in the BridgeBench and

2:34

paste in the same exact prompt here.

2:36

And we're going to do number four now.

2:39

So it's prompt number four and it is

2:41

now prompt number five.

2:42

So here we are.

2:44

We have all of these ready to go.

2:47

And just like that, let's just submit all

2:49

of these.

2:49

They're all in extra high.

2:51

They're all going to work in the GPT

2:52

5.3 codex directory.

2:54

And we're going to let these work because

2:56

as you guys know, one thing about GPT

2:59

models is that they are a little bit

3:01

slower.

3:02

They think for longer, especially when you're in

3:04

this extra high mode.

3:06

So we are using GPT 5.3 codex

3:08

extra high for this.

3:10

But while these actually work, you can see

3:12

that each of them are working in their

3:14

respective directories here.

3:15

But what I want to do now is

3:17

I actually want to take a look at

3:18

the benchmarks while these are working that OpenAI

3:22

released when they released GPT 5.3 because

3:25

I think that benchmarks are a very valuable

3:28

thing to look at for us to understand

3:30

what this model's capabilities are.

3:32

So while these all work and give us

3:34

a good look of what this model is

3:36

capable of, let's now take a look at

3:38

the benchmarks while these complete in the background.

3:40

All right.

3:41

So one very important thing that you guys

3:43

need to know is that GPT 5.3

3:45

codex released the same day as Opus

3:49

4.6 did.

3:50

They basically released this at the same time

3:53

for comparison's sake so that people would be

3:55

able to be like, oh, which one's better?

3:56

Which one's better, right?

3:58

Now, a lot of people are saying that

3:59

GPT 5.3 is better.

4:01

But one thing to know is that GPT

4:03

5.3 codex is not available via the

4:06

API.

4:07

You can only use this inside of OpenAI's

4:10

codex right now.

4:11

So if you're using the codex CLI, you

4:13

can use GPT 5.3. That's the only

4:15

place that you can use it right now.

4:17

So it is a little bit unfortunate because

4:19

we also cannot see GPT 5.3 codex

4:22

inside of LM Arena.

4:24

So if we go to LM Arena real

4:25

quick, because it's not in the API, we

4:28

are unable to see it inside of the

4:30

LM Arena.

4:31

So you can see here Opus 4.6

4:33

is absolutely crushing it on LM Arena at

4:37

1576.

4:37

So it's dominating GPT 5.2 high.

4:40

Now, we don't know how that will stack

4:42

up once GPT 5.3 codex is actually

4:45

available inside of the API and is able

4:47

to be benchmarked on these leaderboards.

4:49

But Opus 4.6 is actually crushing it.

4:52

But in terms of GPT 5.3, it's

4:54

not yet available in the API.

4:56

So we're unable to see it on the

4:58

benchmarks.

4:58

But let's go over here and see the

5:00

SWE Bench Pro.

5:01

So this is how it compares to GPT

5:04

5.2 codex.

5:05

So you can see that that's actually a

5:08

bad chart.

5:08

Like here is one that's very, very interesting.

5:10

The Terminal Bench 2.0. So this is

5:13

one of the biggest jumps that OpenAI did

5:17

with this model.

5:17

So you can see that on the Terminal

5:19

Bench, it jumped from 64% in GPT

5:23

5.2 codex to 77.3% with

5:26

GPT 5.3 codex, which is actually a

5:29

huge jump.
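
To put the size of that jump in numbers, here is a minimal sketch of the arithmetic, using only the two Terminal Bench 2.0 scores quoted above. The function names are illustrative, not part of any benchmark tooling.

```python
# Scores quoted in the video for Terminal Bench 2.0 (percent of tasks solved).
old_score = 64.0   # GPT 5.2 Codex
new_score = 77.3   # GPT 5.3 Codex


def point_gain(old: float, new: float) -> float:
    """Absolute change, in percentage points."""
    return new - old


def relative_gain(old: float, new: float) -> float:
    """Change relative to the old score, as a percentage."""
    return (new - old) / old * 100.0


points = round(point_gain(old_score, new_score), 1)
relative = round(relative_gain(old_score, new_score), 1)
print(f"{points} points, {relative}% relative")  # 13.3 points, 20.8% relative
```

So the jump is about 13 percentage points, or roughly a fifth more tasks solved than the previous model.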

5:30

And it does beat out Opus 4.6

5:32

in the Terminal Bench by quite a bit.

5:34

So that is a big deal for us

5:37

as people that are using CLI tools.

5:39

So here's a little example that they gave

5:41

of what the differences in capabilities are.

5:44

They basically gave it the same prompt.

5:46

So interesting there to be able to see

5:48

that.

5:49

But let's check it out down here.

5:51

OS World Verified.

5:53

You can see a huge jump there.

5:54

So that is an agentic computer use benchmark

5:57

where the agent has to complete productivity tasks

6:00

in a virtual desktop computer environment.

6:02

So this is one thing that I want

6:03

to kind of highlight here because I've talked

6:05

about it a little bit on stream, but

6:07

I think that computer use is going to

6:09

start to become a bigger deal in 2026.

6:13

We saw that with Clawdbot or OpenClaw coming

6:16

out and people using it for computer use.

6:18

So that's big as well.

6:19

It saw a huge jump there.

6:21

They have a lot of text here.

6:23

Can we get?

6:23

Okay, here we go.

6:24

Finally, in the appendix, we have some of

6:26

the information here.

6:26

So one thing to note, and this came

6:30

with both the Opus 4.6 drop as

6:32

well as the GPT 5.3 drop.

6:35

In both of these models, we did not

6:37

see a significant improvement in the SWE

6:42

benchmarks.

6:43

So this is the SWE Bench Pro benchmark,

6:45

but you can see that it hardly improved

6:47

from GPT 5.2 to GPT 5.3.

6:50

We saw a 0.4% improvement on

6:52

the SWE Bench Pro.

6:54

Now on the terminal bench, we saw a

6:56

huge improvement, which is great.

6:57

And in these other areas, we also saw

6:59

a pretty big improvement.

7:01

But with SWE Bench Pro and SWE Bench

7:04

Verified, which is also an important benchmark, we're

7:07

not really seeing the massive jumps that we

7:09

saw.

7:10

For example, from Opus 4.1 to Opus

7:12

4.6, there was a huge jump in

7:14

capability.

7:15

Same thing from GPT 5 to what we

7:17

now have, GPT 5.3, a pretty large

7:20

jump in coding capabilities.

7:22

But this just seems like a very small

7:23

iteration when you're actually looking at the benchmarks.

7:27

But I will say after having used the

7:29

model for hours and hours and hours on

7:32

end, I will say that that small jump,

7:35

that small iteration has made a difference as

7:38

well as some of the other improvements in

7:39

the models.

7:40

In my personal workflow, I have noticed it

7:42

one-shot things that I've been struggling on

7:45

doing with the previous models, GPT 5.2

7:48

or even Opus 4.5. I did notice

7:51

that GPT 5.3 was able to

7:53

one-shot things that it was not able to

7:55

do before.

7:56

So I will say that, but that's one

7:57

thing I want to highlight is that in

7:59

terms of the benchmarks and the coding benchmarks,

8:01

we're not seeing some massive jump with

8:04

5.3 or Opus 4.6 for that matter.

8:07

But specifically with 5.3, we're seeing only

8:09

a 0.4% increase here.

8:11

So not going to be a massive improvement

8:13

in coding capabilities, but still a small improvement.

8:17

So maybe that small improvement helps us do

8:19

a lot.

8:20

So that's the benchmarks, that's what you guys

8:22

need to know.

8:23

Just to summarize, not super big jumps in

8:26

coding, but some pretty big jumps in other

8:28

areas like the Terminal Bench 2.0 and

8:30

OS World Verified.

8:31

So those are some pretty impressive jumps, but

8:33

coding, not that great.

8:34

But with that being said, let's go check

8:36

up on our coding agents and see if

8:37

they're done yet.

8:38

All right.

8:38

So all of our agents have now finished

8:40

working.

8:41

So just as a reminder, we are building

8:43

an FPS game, a Windows 10 interface, the

8:46

flight simulator, and the BridgeTrade stock app, as

8:49

well as a new BridgeMind landing page.

8:51

So let's actually now pull these up and

8:54

see what was created starting with the new

8:57

Windows 10 interface, which was created by ChatGPT,

9:00

or sorry, GPT 5.3 Codex.

9:03

So here we are.

9:04

So we have a Windows interface.

9:05

Let's just click around a little bit.

9:07

Let's check out the command prompt and see

9:09

if this works.

9:10

So let's do an LS.

9:11

So we have desktop.

9:13

Let's CD to our desktop.

9:16

That's interesting.

9:17

Okay.

9:17

So let's CD out.

9:18

So how can we...

9:20

So this looks like it works.

9:21

The command prompt works.

9:22

It looks like we have a calculator.

9:24

Does the calculator work?

9:25

Let's do six times seven equals...

9:28

Okay, 42.

9:30

Okay.

9:30

So that looks like it works.

9:32

You can see the calculator.

9:33

When I dismiss it, it stays down here.

9:35

If I close it, it gets removed.

9:37

That looks nice.

9:37

What's this?

9:38

This is a notepad.

9:39

Hello there.

9:40

Can I save this?

9:41

Let's see if I can actually save this.

9:42

Let's save as, and we'll save it to...

9:46

Let's save it in our desktop, and we'll

9:48

just title it test.

9:51

And if this shows up on the desktop,

9:52

that's actually interesting, which it doesn't look like

9:55

it does.

9:55

So it looks like we just saved it,

9:58

but it didn't actually save.

9:59

Let's see file explorer and...

10:02

Oh, here's test, and here's hello there.

10:04

Oh, wow.

10:04

Okay.

10:04

So it did do that correctly.

10:07

There's the test.

10:09

Now, for some reason, it doesn't actually show

10:10

up on my desktop here.

10:14

Can I drag and drop this into other...

10:17

Okay.

10:17

I can't drag and drop.

10:19

Can I just...

10:20

Okay.

10:20

I can delete these.

10:21

I can rename them.

10:23

Remember, this is just in a simple HTML

10:25

file.

10:26

So it was able to do all of

10:27

this in one shot.

10:28

If I go to my recycle bin, here's

10:30

what that looks like.

10:30

You can see that test.txt that I

10:34

had was now moved to the recycling bin.

10:38

So, I mean, that's pretty impressive.

10:40

Microsoft Edge.

10:41

I mean, I don't think this is going

10:42

to work because it's in an HTML file.

10:45

So, yeah, it's not going to work.

10:47

So that's decent.

10:49

Now, photos.

10:50

What is up with this?

10:52

Photos is all right.

10:53

Control panel.

10:55

Here's everything in the control panel.

10:56

Can I actually change this to...

10:58

Oh, wow.

10:59

Okay.

10:59

So I can adjust the taskbar color.

11:01

I can adjust the window color, which is

11:03

pretty cool.

11:03

I can turn on dark mode, dark mode,

11:06

light mode.

11:06

I mean, this is pretty basic, in my

11:10

opinion.

11:11

It's all right.

11:12

You can adjust the brightness, resolution.

11:14

I don't notice any difference with the resolution.

11:19

But settings.

11:21

Settings is interesting.

11:23

Paint.

11:24

Can we paint?

11:24

We can paint.

11:25

So that's cool.

11:27

And remember, you know, this is all in

11:28

just one shot.

11:29

We can erase.

11:30

We can fill.

11:32

Green.

11:34

Lines.

11:34

Rectangle.

11:36

I mean, this is...

11:37

Wow.

11:37

This is pretty cool.

11:39

So that does work.

11:41

Now, with this, I mean, I guess this

11:43

is all the functionality that it has.

11:45

So if we do the windows, like, here's

11:47

what it has.

11:48

The weather.

11:49

I doubt that the weather works.

11:50

Yeah.

11:50

So, I mean, this is just in a

11:51

basic HTML, but it did do a decent

11:53

job.

11:53

I mean, it's got the time correct.

11:55

You can see it down here.

11:58

11:03, 2/7/2026.

11:59

It's got the Wi-Fi.

12:00

I don't think it's actually connected to my

12:01

Wi-Fi, but we can adjust the volume.

12:03

Battery.

12:04

There's not actually any battery, but I mean,

12:06

for a one shot, this is okay.

12:08

Obviously, you can see, like, it's not like,

12:11

I mean, it's not Windows 10, but, you

12:13

know, it did a pretty good job for

12:14

one prompt.

12:15

So I would give this maybe like a

12:18

6 out of 10.

12:19

You can move everything around.

12:21

I don't know what else you guys would

12:22

like to see, but, I mean, it did

12:24

build the Windows 10 interface.

12:27

Not exactly Windows 10, but definitely was able

12:31

to build something pretty, pretty unique here that

12:33

reflects Windows 10.

12:35

So I would give it a 6 out

12:36

of 10.

12:37

All right.

12:37

The next thing that I want to test

12:39

out is this FPS game.

12:41

So first of all, okay, so it's called

12:42

Neon Breach.

12:44

Browser FPS built with a pure JavaScript raycasting

12:48

engine.
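
For readers unfamiliar with the term, raycasting can be sketched in a few lines. This is not the code the agent generated (the video never shows it); it is a minimal illustration, assuming a hypothetical grid level and a simple fixed-step ray march, of how such an engine turns a 2D map into wall columns.

```python
import math

# Hypothetical 5x5 level: 1 is a wall, 0 is open floor.
LEVEL = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]


def cast_ray(x, y, angle, step=0.01, max_dist=20.0):
    """March from (x, y) along `angle` (radians) until a wall cell is hit.

    Returns the distance travelled, or max_dist if nothing is hit.
    """
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        dist += step
        col = math.floor(x + dx * dist)
        row = math.floor(y + dy * dist)
        # Leaving the map counts as hitting something.
        if not (0 <= row < len(LEVEL) and 0 <= col < len(LEVEL[0])):
            return dist
        if LEVEL[row][col] == 1:
            return dist
    return max_dist


def wall_height(dist, screen_height=200):
    """Closer walls are drawn taller: height is inversely proportional to distance."""
    return min(screen_height, int(screen_height / dist))
```

A renderer would call `cast_ray` once per screen column, sweeping the angle across the field of view, and draw a vertical strip of `wall_height` pixels for each result. Real engines usually replace the fixed-step march with a grid-stepping (DDA) loop for speed.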

12:49

Controls are WASD.

12:52

Mouse look, shift, sprint, 1, 2, 3, switch

12:54

weapon, R to reload, left click to fire.

12:57

Okay.

12:58

So let's go and first of all, like,

13:00

I do want to pull up cursor and

13:02

I want to see how many agents, how

13:05

many lines of code did these agents actually

13:07

create?

13:07

So for the Windows interface, let's see this.

13:09

So that did, the one shot for the

13:12

Windows 10 interface was 5,308 lines of

13:15

code.

13:15

For this FPS game, it's only 1,623.

13:20

So let's click to start.

13:22

So let's just mouse look, shift to sprint,

13:24

1, 2, 3, switch weapon.

13:24

Okay.

13:25

Let's start.

13:26

So one, whoa.

13:29

Okay.

13:30

First of all, the, there's a map in

13:32

the left corner here.

13:33

There's, you can see the map in the

13:35

top left corner here.

13:36

Oh, am I taking damage?

13:37

Oh, shoot.

13:37

I'm literally taking damage right now.

13:39

The, okay.

13:39

First of all, the styling isn't that great.

13:41

What is this?

13:42

Okay.

13:43

Shotgun shells.

13:43

How do I get a gun out?

13:44

How do I shoot?

13:46

Whoa.

13:47

Okay.

13:48

So I'm shooting, but I'm also getting hit.

13:51

Am I getting killed?

13:52

What, how am I getting, okay.

13:54

Game over.

13:55

Enemies killed.

13:56

So I survived for 29 seconds.

13:58

Accuracy was 17%.

13:59

Let's restart.

14:00

Okay.

14:00

So this is actually like not that good.

14:03

So shotgun shells, can I, I can't even

14:04

shoot.

14:05

I'm, I'm, I'm doing, okay, wait, hold up.

14:07

Oh, whoa.

14:08

Okay.

14:08

So I think I did shoot it.

14:12

Okay.

14:12

So I'm sprinting, I'm pressing shift.

14:17

Okay.

14:17

I think I did.

14:18

Was that me killing it?

14:19

Okay.

14:19

I think that was me killing it.

14:21

Okay.

14:22

I mean, the graphics are horrible.

14:26

The animation's not good.

14:28

There's health over here.

14:29

I mean, there's a map in the up,

14:31

upper left-hand corner.

14:33

The styling that it did, like the spacing

14:35

that you guys see, not good.

14:38

They're also like, I just don't even understand.

14:41

Like, it just wasn't very creative, if I'm

14:42

being completely honest.

14:43

Like, hey, the map is nice.

14:45

There's like a ton of people in over

14:48

here.

14:49

Can I like get over here?

14:50

Okay.

14:51

I mean, there's no animations.

14:55

There's like, I wish there would be like

14:56

some sort of, okay.

14:58

It's reloading my shotgun.

14:59

So reloading works.

15:01

Okay.

15:03

Oh, I mean, there's, I wish there was

15:05

like some type of, you know, gunshots or

15:09

something like that.

15:10

Here's health.

15:11

Here's, what is this?

15:12

Machine ammo.

15:14

How do I?

15:15

Okay.

15:16

Okay.

15:16

I switched to my machine gun.

15:18

Oh, whoa.

15:19

Okay.

15:19

Game over.

15:20

We survived for 79.5 seconds.

15:22

Apparently, my accuracy was 217%.
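
An accuracy above 100% points to a counting bug. The video does not inspect the generated code, so the cause is unknown; one plausible explanation, assumed here purely for illustration, is that every shotgun pellet is counted as a hit while each trigger pull is counted as one shot. The function names and the numbers below are hypothetical.

```python
def accuracy_naive(pellet_hits: int, shots_fired: int) -> float:
    """Counts every pellet as a hit but every trigger pull as one shot."""
    if shots_fired == 0:
        return 0.0
    return pellet_hits / shots_fired * 100.0


def accuracy_per_shot(shots_that_hit: int, shots_fired: int) -> float:
    """Counts a shot as a hit at most once, so the result stays within 0-100."""
    if shots_fired == 0:
        return 0.0
    return min(shots_that_hit, shots_fired) / shots_fired * 100.0


# Hypothetical round: 6 shotgun blasts, 4 of them connect, 13 pellets land in total.
print(round(accuracy_naive(13, 6)))     # 217
print(round(accuracy_per_shot(4, 6)))   # 67
```

Counting hits per shot rather than per pellet keeps the statistic in a range a player can interpret.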

15:25

So I don't know.

15:26

I would give this like a two out

15:27

of 10.

15:28

Um, this was not very good.

15:31

And you know, this could go back to

15:32

our system prompt that we gave it.

15:33

And just that it, uh, maybe

15:36

misunderstood our system prompt, but you can look

15:38

at this and like, obviously the animations and

15:41

what it created here, not super impressed by

15:44

what it did.

15:45

So that could tie back to our system

15:48

prompt.

15:48

We may need to, you know, update that

15:50

system prompt for future BridgeBench tests, but

15:53

I am not impressed with what Codex did

15:54

here.

15:56

The next thing that I want to check

15:58

out is the front end styling.

16:00

So this is the remake of the

16:02

BridgeMind website that it made.

16:04

So, you know, one thing that GPT models

16:06

have been notoriously bad at has been styling.

16:09

Now, as I've used GPT 5.3

16:12

Codex in the past couple of days, I

16:13

will say that when given the correct information

16:16

and prompts, it has been doing a much

16:18

better job than what we had with like

16:20

GPT 5 for sure.

16:22

But what I will say with this is

16:24

it's not that great.

16:25

What it created here.

16:26

I mean, let's, um, let's try shrinking this.

16:28

How's the, I mean, I would say that

16:31

the responsiveness is actually pretty good.

16:33

Like the responsiveness and spacing is, is pretty

16:35

good.

16:36

Um, it could do a couple of things

16:38

better, like maybe making those buttons

16:40

not wrap and just making them, you know,

16:42

compact a little bit more, um, and adjust

16:44

to the screen size, but, um, overall, definitely

16:47

better than what we've seen with previous GPT

16:50

models.

16:50

I will say that, Hey, with like one

16:52

shot with no styling guide, um, it's doing

16:55

a pretty decent job.

16:56

It just created this landing page.

16:58

I believe.

16:58

Yeah.

16:59

Like if you, all these buttons will just

17:00

link to, uh, to different parts of that

17:02

page.

17:02

So I would give this a four out

17:06

of 10.

17:06

Um, I'm not super impressed by like the

17:10

background gradients.

17:11

Like this is like kind of your standardized

17:12

AI look.

17:13

I don't feel like this is really unique.

17:15

You know, sometimes when you're using Gemini 3

17:17

Pro, you can get these really unique UIs.

17:20

Uh, but with GPT, I would say that

17:21

it didn't do a great job on this.

17:23

Now with that, I will say like, uh,

17:25

for example, with BridgeVoice, I was having

17:27

GPT 5.3 Codex make a couple updates

17:29

to BridgeVoice and all the UI that

17:31

you guys see here.

17:32

Um, like even like these, these theme changes

17:34

and whatnot, these were all made by a

17:37

GPT 5.3 Codex.

17:39

And this one had a little bit more

17:41

of a structured styling in place to begin

17:43

with.

17:44

And GPT 5.3 did do a really

17:46

good job making it like very uniform.

17:49

So that's just one example that I will

17:50

show you.

17:51

Like from my personal work, when you're getting,

17:53

when you're actually using 5.3 in practice

17:55

and you're giving it the correct style guides

17:57

and you're giving it the correct references and

17:59

you're prompting it really well, uh, 5.3

18:02

can do a really good job at improving

18:04

your styling.

18:04

It definitely did in the case of

18:06

BridgeVoice here, but for just a one shot,

18:08

like off rip, just singular prompt, not super

18:12

impressed with this one.

18:13

I'm going to give it a four out

18:14

of 10 for the styling on this website

18:16

with one shot.

18:17

Now for another cool one, let's go to

18:19

the flight simulator.

18:20

So here it is the flight simulator.

18:22

So it says procedural terrain, full flight model,

18:25

dynamic weather and waypoint challenge.

18:28

Um, so here, so to pitch, W, okay,

18:30

so it's WASD.

18:31

Um, so pitch roll, we can do rolls,

18:34

a arrow left, down arrow, right.

18:36

Okay.

18:37

Um, rudder.

18:38

So there's some like actually some decent sorts

18:39

of, throttle, Shift up, Control down, camera, chase,

18:43

uh, cockpit.

18:44

Okay.

18:45

So let's, uh, let's try this.

18:47

So camera, C, chase slash cockpit.

18:49

Okay.

18:50

Whoa.

18:51

Okay.

18:52

All right.

18:53

So here is the look here.

18:55

So let's try moving around a little bit.

18:57

Okay.

18:57

So first of all, whoa, uh, codex, what

19:01

are you doing?

19:03

Okay.

19:03

So one thing I will say is we

19:05

are completely inverted.

19:06

I mean, I don't know if this is

19:07

like straight out of Flight, if you guys

19:08

have seen that movie, but, um, I don't

19:10

know why the mountains are above us, but

19:13

that does not look right.

19:14

There's the weather change.

19:15

There's the daylight change.

19:17

Um, so control C is going to change

19:19

the view here.

19:21

Um, you can see that we have like

19:23

an altitude checker over in the top left.

19:26

What is that?

19:26

It's VSI waypoints.

19:29

Uh, I have a buddy.

19:29

That's a flight.

19:31

He is an airline pilot.

19:32

So, you know, he may understand how to

19:34

fly planes.

19:35

I personally don't, but when you're in this

19:37

mode, it looks like it's, um, man, there's

19:39

like a lot of glitches with this.

19:41

This is not smooth at all.

19:42

Um, let's go back to here.

19:44

And we are like literally in the Matrix

19:46

right now.

19:47

Uh, we're in a bunch of green, very

19:49

glitchy.

19:51

Uh, there are like some interesting things like

19:53

on the left and right.

19:54

I mean, it is like, looks like a

19:55

flight simulator, but why are the mountains in

19:57

the sky?

19:58

Why am I in a green blob?

20:00

I'm going to give this a two out

20:01

of 10.

20:02

Uh, this is not, this is not good.

20:04

Uh, you know, we may want to, at

20:06

some point, adjust the prompts to be a

20:09

little bit less specific and maybe let codex

20:11

have its way with it.

20:14

Um, so let's actually try that.

20:17

So we're going to give this a shot.

20:19

We're going to try and rebuild this flight

20:20

simulator, but rather than giving it such a

20:22

distinguished particular prompt, we're just going to give

20:25

it something basic here.

20:26

Let's exit.

20:27

Let's pause this real quick and let's actually

20:29

pull BridgeSpace back up.

20:30

And what I want to do is let's

20:31

actually launch a new workspace, uh, inside of

20:35

let's, let's pull BridgeBench back up.

20:37

And, um, let's just launch a singular codex

20:40

instance in this workspace.

20:42

And what I want to do is I'm

20:42

going to just give it this prompt.

20:45

I want you to create a new flight

20:48

simulator that does not follow the readme or

20:51

the prompt in the existing flight simulator.

20:53

This is to be a completely newly invented

20:56

flight simulator.

20:57

All I want you to do is create

20:59

a flight sim that is simple and accurate

21:02

and allows me to fly around a map

21:04

in a plane.

21:05

Okay.

21:05

Let's just try that.

21:07

And we'll, we'll see if that's a little

21:09

bit better.

21:10

Um, you know, and we'll see if it,

21:12

you know, using its creativity, if maybe the

21:14

system prompts that we're giving it is actually

21:16

causing it to like hallucinate.

21:17

So we'll let it kind of be a

21:19

little bit more creative on its own and

21:21

we'll give it less of a, uh, less

21:23

of like specific instructions that we want it

21:26

to follow.

21:26

And we'll more so just kind of let

21:28

it run and do its own thing.

21:29

So let's say I want you to put

21:32

the code in, in, in a flight sim

21:38

2 dot HTML file.

21:42

Okay.

21:43

So we're going to give this, and then

21:44

we're going to review the, um, stock trading

21:46

application that it made, and we'll let this one

21:48

work.

21:48

And then we'll come back to this and

21:50

finish off the video with a review

21:51

of this, but so far with that existing

21:54

flight simulator, yeah, that was, this is not

21:56

good.

21:57

This was not good.

21:57

I'm going to give it a two out

21:58

of 10.

21:59

Literally the mountains are in the sky.

22:00

Uh, yeah, two out of 10.

22:02

All right.

22:02

So here is BridgeTrade.

22:04

So this is what it created for, uh,

22:07

basically the prompt that was requesting it to

22:09

create a website where I would get real

22:11

time stock analytics and be able to see

22:13

like stock market prices and stuff.

22:15

And right off the bat, I'm going to

22:17

say like, this is a pretty big fail.

22:20

Um, and I'll show you why like, Hey,

22:22

like this is not the price of Apple.

22:23

It's updating live.

22:24

It's literally Saturday right now.

22:25

So I don't know why it's actively updating.
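
A price feed that keeps ticking on a Saturday is a sign the numbers are simulated, because US exchanges run their regular session on weekdays only. Below is a minimal sketch of the kind of check an app could make before showing "live" updates. It assumes the timestamp passed in is already in US Eastern time and ignores market holidays and early closes; the function name is hypothetical.

```python
from datetime import datetime, time

# Regular US equity session, in Eastern time.
OPEN, CLOSE = time(9, 30), time(16, 0)


def is_regular_session(eastern_now: datetime) -> bool:
    """True during regular US trading hours (Mon-Fri, 9:30-16:00 Eastern).

    `eastern_now` is assumed to already be expressed in US Eastern time.
    Market holidays and early closes are ignored in this sketch.
    """
    if eastern_now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return OPEN <= eastern_now.time() < CLOSE


# Saturday, February 7, 2026, late morning: the market is closed.
print(is_regular_session(datetime(2026, 2, 7, 11, 3)))  # False
```

An app wired to a real quote API would show the last closing price outside these hours instead of a moving number.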

22:27

So obviously this does not connect with like

22:29

any free, there's like a bunch of free

22:31

APIs that you can integrate with.

22:32

So I don't know why Codex wasn't intuitively

22:34

able to search that up, identify, you know,

22:37

what it would be able to do to

22:39

build this and then be able to build

22:40

that in.

22:40

So that's a little bit discouraging there.

22:42

And then in terms of the styling, like

22:44

this just has absolutely zero creativity at all.

22:47

It tried to create like some logo that

22:48

looks bad.

22:50

Um, so I'm going to give this a

22:51

one out of 10.

22:52

I think this is a complete fail on

22:53

this prompt.

22:54

Um, and uh, yeah, that's, that's not good.

22:58

We're going to give this a one out

22:58

of 10, but let's go check back out

23:00

on the flight simulator and see how it

23:02

did.

23:02

All right.

23:02

So here's the conversation for the flight simulator.

23:05

So this actually completed very quickly.

23:07

So this one ran and it only created,

23:11

let's see here.

23:11

So this one was 721 lines of

23:18

HTML and it was able to only work.

23:21

It didn't work for very long.

23:22

So let's like compare that first off with

23:24

the other one that it created.

23:26

Right?

23:26

So let's go here.

23:27

You can see this flight simulator, the one

23:29

that it created initially, this one was

23:31

2,500 lines.

23:33

So this one, you know, was 2,500

23:35

lines, about what?

23:37

Close to like three and a half times

23:38

larger.
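
The "three and a half times" estimate checks out against the two line counts quoted above. A quick sketch of the arithmetic, treating the 2,500 figure as the approximate count read off the screen:

```python
first_attempt_lines = 2500   # approximate count quoted for the original flight simulator
second_attempt_lines = 721   # count quoted for the simplified rebuild

ratio = first_attempt_lines / second_attempt_lines
print(f"{ratio:.2f}x")  # 3.47x
```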

23:39

Let's now go and let's do flight sim

23:41

two and go to flight sim two here.

23:45

And um, let's see, what is this?

23:47

Whoa.

23:47

Oh my gosh.

23:48

Okay.

23:49

Uh, so this one, yeah, so this, this

23:53

one, I don't even know what that is.

23:55

All right.

23:56

That is not good.

23:58

Okay.

23:58

So this one gave us a different approach.

24:00

It gave us a different look and I

24:01

don't even know what I'm looking at here.

24:03

This is a, this one's like really, really

24:04

bad.

24:05

So, um, yeah, I mean, I would say

24:07

this is obviously a one out of 10

24:09

as well.

24:09

So I, I don't know if, how, if

24:11

we could potentially be making, sorry, I've just

24:14

put the OBS, but I don't know if

24:15

we can make our BridgeBench better to

24:17

really be putting these to the test, but

24:19

I just wanted to put GPT 5.3

24:21

to the test with some like system prompts

24:24

to be able to test like, okay, in

24:26

one shot, what is it able to create

24:28

in terms of like the Windows interface.

24:30

But you know, in practice, when I've been

24:32

using GPT 5.3 in my normal workflow,

24:35

like you can even see this here, it

24:37

has been able to do a very good

24:39

job in creating real SaaS products for me.

24:42

And the styling, I have found that the

24:45

styling is a little bit more creative.

24:48

Um, you know, I think that for us

24:49

to be able to look at like some

24:51

of these things like these here, these, these

24:53

sections here on my website about BridgeCode

24:55

and about the BridgeMind MCP and about

24:57

BridgeSpace, the agent development environment, and

25:00

BridgeVoice, you know, these were created using

25:03

Codex.

25:04

So I think that for styling and when

25:07

we're actually applying it, it's like, Hey, when

25:08

you're actually using the tools in practice, it

25:12

is a little bit different than just saying,

25:14

Hey, go build a flight simulator in one

25:16

shot.

25:16

Right.

25:16

It's like, I think that, you know, I

25:18

think it'll be nice to have some type

25:20

of BridgeBench and I want to build

25:22

this out.

25:22

Maybe the community can help, um, as we

25:24

create some better system prompts to be putting

25:26

these different models to the test.

25:29

Um, but you know, I think this was

25:30

just a fun way to be able to

25:31

put it to the test, but all in

25:32

all, some of these things that Codex created,

25:35

I'm actually like not super impressed with.

25:37

Right.

25:37

I mean, the best thing that it

25:38

created was that Windows interface.

25:40

And then like some of this stuff just

25:41

wasn't that good.

25:42

Right.

25:42

So I don't know, I don't know how

25:44

I feel about that.

25:45

I think that with the Opus 4.6

25:47

with a 1M context window, there's going

25:49

to be room for both in our workspace

25:51

and in our workflows.

25:53

Right.

25:53

But I will say this, this styling that

25:55

it did here, I really liked it.

25:57

And I think it did a good job.

25:58

Like even like, hey, the BridgeVoice tab

26:00

here.

26:01

Right.

26:01

Like I had to create a BridgeMind

26:03

theme.

26:04

And what's really cool about this is like,

26:06

you can see when I hover over this

26:07

now it's that BridgeMind gold.

26:08

Right.

26:09

So I just thought that that was cool.

26:10

Like it did a good job, like picking

26:11

up on like really, you know, unique details

26:14

like that, as well as like, you know,

26:16

building stuff like this collapse feature, like, right.

26:18

Like that looks really good.

26:19

And Hey, the real state of software development,

26:22

when you're using vibe coding and using AI

26:24

models to help you is that it's not

26:27

a one shot.

26:28

A lot of people think about vibe coding.

26:30

They think, Oh, like the AI models can't,

26:32

you know, it can't create a flight simulator

26:33

in one shot, or it can't create, you

26:35

know, the Windows 10 interface in one

26:37

shot, but really like what people are using

26:39

vibe coding for right now, or at least

26:41

what I'm using vibe coding for right now

26:42

is you build and then you iterate, right.

26:45

So you actually get the products working right.

26:47

And you kind of just go one step

26:49

at a time and you slowly iterate and

26:52

improve coding with the AI models via vibe

26:55

coding, but then you ultimately get something like

26:58

BridgeVoice, or you get something like BridgeSpace,

27:01

which is a great product, right.

27:02

And these products are going to be launching

27:04

soon.

27:04

So I think it's a great opportunity for

27:06

us to start to build out something like

27:08

the BridgeBench.

27:09

You know, I don't know how I feel

27:10

about these initial five prompts that I tested

27:12

with it.

27:12

I may want to change that over time,

27:14

but I think that with GPT 5.3,

27:17

you know, it's not like a substantial improvement

27:20

in coding, right?

27:21

Like we saw that with the benchmarks, but

27:23

even here, like, Hey, you have a flight

27:24

simulator, right?

27:25

And the mountains are in the sky, right?

27:28

And then you have an FPS game and

27:30

there's no animations for the bullets.

27:31

And there's no like animations for the characters

27:35

when you actually shoot them to like show

27:36

that you actually like killed them or

27:38

whatever.

27:38

Right.

27:39

So it's like, you know, there's certain details

27:41

and intuition that GPT 5.3 Codex, it

27:44

does not have, um, that may need some

27:46

coaxing from the developer, right?

27:48

And that's why, hey, when you are

27:50

vibe coding, it's very important that you need

27:52

to understand, Hey, AI is just an extension

27:55

of the developer.

27:56

So if you understand, um, these different concepts

27:59

and you're able to prompt it correctly, then

28:01

you're going to be able to use a

28:03

model like 5.3, that will do a

28:04

good job for you.

28:05

Now, one thing I will say that I

28:07

did notice during this test is that these

28:10

prompts actually finished relatively quickly.

28:13

Like right here, this wrote 4,338 lines

28:16

of code here in HTML to build a,

28:18

what was this?

28:18

This was a, the, the bridge trade, right?

28:21

And it did do it in like less

28:23

than 10 minutes.

28:24

So I think that the substantial improvement that

28:27

we're seeing is like, I think that I

28:30

am seeing a speed up with 5.3.

28:32

Now this could be because they switched and

28:35

they partnered with Cerebras and now they have

28:37

better and higher, you know, faster inference.

28:39

But I will say that I definitely have

28:41

noticed a speed increase.

28:42

I also have noticed some other intuitive things

28:45

such as it being able to run terminal

28:47

commands a little bit better and being able

28:49

to work in and launch different sub-agents.

28:51

You know, you can see that this one,

28:52

you know, worked for another three minutes on

28:54

doing a final polish pass before it, you

28:56

know, it did this polish pass.

28:58

So, you know, there's certain things and it's

29:00

right there.

29:01

Sorry about that.

29:01

So, you know, there's certain things that I'm

29:02

seeing with this model that are definitely an

29:04

improvement, that are like, hey, I'm happy

29:07

that we have the model, but this model

29:09

is not the substantial jump that we've seen

29:11

with other models.

29:12

For example, like moving from o3 to GPT-5,

29:16

right?

29:16

Like that was a substantial jump, or 4o

29:19

to o1.

29:20

That was another substantial jump that OpenAI did,

29:22

right?

29:22

So I'm more looking forward to the next

29:25

iteration of whatever they come out with, whether

29:28

it's GPT 6 or GPT 5.3, because we didn't

29:32

see that substantial jump in the coding index.

29:34

And that's really what we need to look

29:36

at is, hey, SWE-Bench Pro, SWE-bench

29:38

Verified.

29:38

These are the coding benchmarks that we need

29:40

to be looking at.

29:43

But I will say, obviously it's a slight

29:45

improvement.

29:46

It is a speed up.

29:48

I'm going to be using Codex more in

29:49

my workflow.

29:50

I did purchase the ChatGPT Pro plan

29:52

for $200 a month.

29:54

So, you know, sometimes I think that people

29:55

are asking the question, they're saying, Hey, which

29:57

model is better?

29:58

Opus models?

Interactive Summary

This video puts the newly released GPT 5.3 Codex to the test using a custom suite called BridgeBench. The creator evaluates the model's ability to build complex applications like an FPS game, a Windows 10 interface, and a flight simulator in a single prompt. While the model shows impressive speed and significant improvements in terminal-based benchmarks, the results for one-shot complex coding tasks are mixed, ranging from decent interfaces to glitchy simulations. The creator concludes that while GPT 5.3 isn't a massive leap in coding logic, it excels when used iteratively in a 'vibe coding' workflow.