Vibe Coding With GPT 5.3 Codex
In this video, I am putting the newly
released GPT 5.3 Codex to the test
in a new benchmark that I am calling
BridgeBench.
BridgeBench is going to put GPT 5.3
Codex to the test with tasks that range
from building an FPS game in one prompt
to building the Windows 10 interface, all in
one shot.
If you guys would like to check out
the prompts that I'm going to be using
in BridgeBench and even test it out yourself,
you can just go to bridgemind.ai and
go to BridgeBench.
There'll be a link to this down in
the description below, but you'll be able to
actually see and copy the prompts that I'm
using for this test that you're going to
see in this video.
Before we get too deep into the video,
we are only 25 members away from hitting
5,000 members in the BridgeMind Discord community.
So if you guys are watching and have
not already joined the fastest growing vibe coding
community on the internet right now, make sure
you check out the description below and join
the BridgeMind Discord community.
Also, I do have a 200 like goal
on this video.
So if you guys haven't already liked the
video or subscribed and turned on post notifications,
make sure you do so.
And with that being said, let's get right
into the video.
Let's now open up BridgeSpace, which is the
ADE, the agent development environment that we've been
working on that's going to ship in the
next week or so.
And what I'm going to do now is
I'm going to open up a new workspace
in the BridgeBench directory.
So let's go over to my desktop and
let's select BridgeBench.
And we are going to launch six Codex
agents inside of this workspace.
So I just select the directory.
I select how many terminals I want to
open up and I select which AI agent
I actually want to use.
And now I'm going to create this new
workspace.
And just like that, we have six Codex
agents that are ready to go.
So I'm actually going to close one of
these because we only have five prompts here.
So we need five Codex agents.
So what I'm going to do is
give a quick prompt. I'm
going to add
prompt one,
and I'm just going to say: I want
you to work inside
of the GPT 5.3 Codex directory and
complete this project.
OK, so we're going to input this prompt
and we're going to use this same exact
prompt.
What I just used there was BridgeVoice, which I'm
using to dictate my prompts.
That's another product in the BridgeMind suite of
vibe coding tools that we are releasing.
So we're just going to add all of
these prompts that are in the BridgeBench and
paste in the same exact prompt here.
And we're going to do number four now.
So it's prompt number four and it is
now prompt number five.
So here we are.
We have all of these ready to go.
And just like that, let's just submit all
of these.
They're all in extra high.
They're all going to work in the GPT
5.3 Codex directory.
And we're going to let these work because
as you guys know, one thing about GPT
models is that they are a little bit
slower.
They think for longer, especially when you're in
this extra high mode.
So we are using GPT 5.3 Codex
extra high for this.
But while these actually work, you can see
that each of them is working in its
respective directory here.
But what I want to do now is
I actually want to take a look at
the benchmarks while these are working that OpenAI
released when they released GPT 5.3 because
I think that benchmarks are a very valuable
thing to look at for us to understand
what this model's capabilities are.
So while these all work and give us
a good look of what this model is
capable of, let's now take a look at
the benchmarks while these complete in the background.
All right.
So one very important thing that you guys
need to know is that GPT 5.3
Codex was released the same day as Opus
4.6.
They basically released these at the same time
for competition's sake so that people would be
able to be like, oh, which one's better?
Which one's better, right?
Now, a lot of people are saying that
GPT 5.3 is better.
But one thing to know is that GPT
5.3 Codex is not available via the
API.
You can only use this inside of OpenAI's
Codex right now.
So if you're using the Codex CLI, you
can use GPT 5.3. That's the only
place that you can use it right now.
So it is a little bit unfortunate because
we also cannot see GPT 5.3 Codex
inside of LM Arena.
So if we go to LM Arena real
quick, because it's not in the API, we
are unable to see it inside of the
LM Arena.
So you can see here Opus 4.6
is absolutely crushing it on LM Arena at
1576.
So it's dominating GPT 5.2 high.
Now, we don't know how that will stack
up once GPT 5.3 Codex is actually
available inside of the API and is able
to be benchmarked on these leaderboards.
But Opus 4.6 is actually crushing it.
But in terms of GPT 5.3, it's
not yet available in the API.
So we're unable to see it on the
benchmarks.
But let's go over here and see the
SWE Bench Pro.
So this is how it compares to GPT
5.2 Codex.
So you can see that that's actually a
bad chart.
Like here is one that's very, very interesting.
Terminal Bench 2.0. So this is
one of the biggest jumps that OpenAI made
with this model.
So you can see that on Terminal
Bench, it jumped from 64% with GPT
5.2 Codex to 77.3% with
GPT 5.3 Codex, which is actually a
huge jump.
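To put that jump in perspective, here is a minimal sketch of the arithmetic, using only the two Terminal Bench scores quoted above. The function name `score_jump` is illustrative, not anything from OpenAI's materials.

```python
def score_jump(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the old score in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return points, relative


# Scores quoted in the video: GPT 5.2 Codex at 64%, GPT 5.3 Codex at 77.3%.
points, relative = score_jump(64.0, 77.3)
print(f"{points:.1f} percentage points, {relative:.1f}% relative improvement")
```

That is a gain of about 13 percentage points, or roughly a fifth over the previous score, which is why it stands out next to the 0.4-point change on SWE Bench Pro discussed later.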
And it does beat out Opus 4.6
in the Terminal Bench by quite a bit.
So that is a big deal for us
as people that are using CLI tools.
So here's a little example that they gave
of the differences in capabilities.
They basically gave it the same prompt.
So interesting there to be able to see
that.
But let's check it out down here.
OSWorld-Verified.
You can see a huge jump there.
So that is an agentic computer use benchmark
where the agent has to complete productivity tasks
in a virtual desktop computer environment.
So this is one thing that I want
to kind of highlight here because I've talked
about it a little bit on stream, but
I think that computer use is going to
start to become a bigger deal in 2026.
We saw that with Clawdbot or OpenClaw coming
out and people using it for computer use.
So that's big as well.
It saw a huge jump there.
They have a lot of text here.
Can we get?
Okay, here we go.
Finally, in the appendix, we have some of
the information here.
So one thing to note, and this came
with both the Opus 4.6 drop as
well as the GPT 5.3 drop.
In both of these models, we did not
see a significant improvement in the SWE
benchmarks.
So this is the SWE Bench Pro benchmark,
but you can see that it hardly improved
from GPT 5.2 to GPT 5.3.
We saw a 0.4% improvement on
the SWE Bench Pro.
Now on Terminal Bench, we saw a
huge improvement, which is great.
And in these other areas, we also saw
a pretty big improvement.
But with SWE Bench Pro and SWE Bench
Verified, which is also an important benchmark, we're
not really seeing the massive jumps that we
saw.
For example, from Opus 4.1 to Opus
4.6, there was a huge jump in
capability.
Same thing from GPT 5 to what we
now have, GPT 5.3, a pretty large
jump in coding capabilities.
But this just seems like a very small
iteration when you're actually looking at the benchmarks.
But I will say after having used the
model for hours and hours and hours on
end, I will say that that small jump,
that small iteration has made a difference as
well as some of the other improvements in
the models.
In my personal workflow, I have noticed it
one-shot things that I've been struggling on
doing with the previous models, GPT 5.2
or even Opus 4.5. I did notice
that GPT 5.3 was able to one-shot
things that it was not able to
do before.
So I will say that, but that's one
thing I want to highlight is that in
terms of the benchmarks and the coding benchmarks,
we're not seeing some massive jump with
5.3 or Opus 4.6 for that matter.
But specifically with 5.3, we're seeing only
a 0.4% increase here.
So not going to be a massive improvement
in coding capabilities, but still a small improvement.
So maybe that small improvement helps us do
a lot.
So that's the benchmarks, that's what you guys
need to know.
Just to summarize, not super big jumps in
coding, but some pretty big jumps in other
areas like Terminal Bench 2.0 and
OSWorld-Verified.
So those are some pretty impressive jumps, but
coding, not that great.
But with that being said, let's go check
up on our coding agents and see if
they're done yet.
All right.
So all of our agents have now finished
working.
So just as a reminder, we are building
an FPS game, a Windows 10 interface, the
flight simulator, and the BridgeTrade stock app, as
well as a new BridgeMind landing page.
So let's actually now pull these up and
see what was created starting with the new
Windows 10 interface, which was created by ChatGPT,
or sorry, GPT 5.3 Codex.
So here we are.
So we have a Windows interface.
Let's just click around a little bit.
Let's check out the command prompt and see
if this works.
So let's do an ls.
So we have Desktop.
Let's cd to our desktop.
That's interesting.
Okay.
So let's cd out.
So how can we...
So this looks like it works.
The command prompt works.
It looks like we have a calculator.
Does the calculator work?
Let's do six times seven equals...
Okay, 42.
Okay.
So that looks like it works.
You can see the calculator.
When I dismiss it, it stays down here.
If I close it, it gets removed.
That looks nice.
What's this?
This is a notepad.
Hello there.
Can I save this?
Let's see if I can actually save this.
Let's save as, and we'll save it to...
Let's save it in our desktop, and we'll
just title it test.
And if this shows up on the desktop,
that's actually interesting, which it doesn't look like
it does.
So it looks like we just saved it,
but it didn't actually save.
Let's see file explorer and...
Oh, here's test, and here's hello there.
Oh, wow.
Okay.
So it did do that correctly.
There's the test.
Now, for some reason, it doesn't actually show
up on my desktop here.
Can I drag and drop this into other...
Okay.
I can't drag and drop.
Can I just...
Okay.
I can delete these.
I can rename them.
Remember, this is just in a simple HTML
file.
So it was able to do all of
this in one shot.
If I go to my recycle bin, here's
what that looks like.
You can see that test.txt that I
had was now moved to the recycling bin.
So, I mean, that's pretty impressive.
Microsoft Edge.
I mean, I don't think this is going
to work because it's in an HTML file.
So, yeah, it's not going to work.
So that's decent.
Now, photos.
What is up with this?
Photos is all right.
Control panel.
Here's everything in the control panel.
Can I actually change this to...
Oh, wow.
Okay.
So I can adjust the taskbar color.
I can adjust the window color, which is
pretty cool.
I can turn on dark mode, dark mode,
light mode.
I mean, this is pretty basic, in my
opinion.
It's all right.
You can adjust the brightness, resolution.
I don't notice any difference with the resolution.
But settings.
Settings is interesting.
Paint.
Can we paint?
We can paint.
So that's cool.
And remember, you know, this is all in
just one shot.
We can erase.
We can fill.
Green.
Lines.
Rectangle.
I mean, this is...
Wow.
This is pretty cool.
So that does work.
Now, with this, I mean, I guess this
is all the functionality that it has.
So if we do the windows, like, here's
what it has.
The weather.
I doubt that the weather works.
Yeah.
So, I mean, this is just in a
basic HTML, but it did do a decent
job.
I mean, it's got the time correct.
You can see it down here.
11:03, 2/7/2026.
It's got the Wi-Fi.
I don't think it's actually connected to my
Wi-Fi, but we can adjust the volume.
Battery.
There's not actually any battery, but I mean,
for a one shot, this is okay.
Obviously, you can see, like, it's not like,
I mean, it's not Windows 10, but, you
know, it did a pretty good job for
one prompt.
So I would give this maybe like a
6 out of 10.
You can move everything around.
I don't know what else you guys would
like to see, but, I mean, it did
build the Windows 10 interface.
Not exactly Windows 10, but definitely was able
to build something pretty, pretty unique here that
reflects Windows 10.
So I would give it a 6 out
of 10.
All right.
The next thing that I want to test
out is this FPS game.
So first of all, okay, so it's called
Neon Breach.
Browser FPS built with a pure JavaScript raycasting
engine.
Controls are WASD to move,
mouse to look, Shift to sprint, 1, 2, 3 to switch
weapons, R to reload, left click to fire.
Okay.
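For anyone unfamiliar with the term, a raycasting engine fakes 3D by casting one ray per screen column across a 2D grid map and drawing each wall slice taller the closer the ray hits. The sketch below shows that core idea only. It is written in Python rather than the JavaScript the model generated, the map is hypothetical, and it uses simple fixed-step marching instead of the DDA grid traversal a real engine would normally use.

```python
import math

# Hypothetical map for illustration: 1 = wall, 0 = open floor.
GRID = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]


def cast_ray(x: float, y: float, angle: float,
             step: float = 0.01, max_dist: float = 20.0) -> float:
    """March along the ray in small steps and return the distance to the first wall."""
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        dist += step
        col = int(x + dx * dist)
        row = int(y + dy * dist)
        if GRID[row][col] == 1:
            return dist
    return max_dist


def wall_height(dist: float, screen_height: int = 200) -> int:
    """Nearer walls are drawn taller, which creates the illusion of depth."""
    return min(screen_height, int(screen_height / dist))


# Player stands in the top-left open cell, looking east (angle 0).
distance = cast_ray(1.5, 1.5, 0.0)
print(f"wall is about {distance:.2f} units away")
```

The map is fully enclosed by walls, so every ray stops before it can leave the grid. A renderer would call `cast_ray` once per screen column, sweeping the angle across the field of view, and draw a vertical strip of `wall_height(distance)` pixels for each.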
So let's go and, first of all,
I do want to pull up Cursor and
see how
many lines of code these agents actually
created.
So for the Windows interface, let's see this.
So the one shot for the
Windows 10 interface was 5,308 lines of
code.
For this FPS game, it's only 1,623.
So let's click to start.
So let's just mouse look, shift to sprint,
1, 2, 3, switch weapon.
Okay.
Let's start.
So one, whoa.
Okay.
First of all, the, there's a map in
the left corner here.
There's, you can see the map in the
top left corner here.
Oh, am I taking damage?
Oh, shoot.
I'm literally taking damage right now.
The, okay.
First of all, the styling isn't that great.
What is this?
Okay.
Shotgun shells.
How do I get a gun out?
How do I shoot?
Whoa.
Okay.
So I'm shooting, but I'm also getting hit.
Am I getting killed?
What, how am I getting, okay.
Game over.
Enemies killed.
So I survived for 29 seconds.
Accuracy was 17%.
Let's restart.
Okay.
So this is actually like not that good.
So shotgun shells, can I, I can't even
shoot.
I'm, I'm, I'm doing, okay, wait, hold up.
Oh, whoa.
Okay.
So I think I did shoot it.
Okay.
So I'm sprinting, I'm pressing shift.
Okay.
I think I did.
Was that me killing it?
Okay.
I think that was me killing it.
Okay.
I mean, the graphics are horrible.
The animation's not good.
There's health over here.
I mean, there's a map in the up,
upper left-hand corner.
The styling that it did, like the spacing
that you guys see, not good.
There's also, like, I just don't even understand.
Like, it just wasn't very creative, if I'm
being completely honest.
Like, hey, the map is nice.
There's like a ton of people over
here.
Can I like get over here?
Okay.
I mean, there's no animations.
There's like, I wish there would be like
some sort of, okay.
It's reloading my shotgun.
So reloading works.
Okay.
Oh, I mean, there's, I wish there was
like some type of, you know, gunshots or
something like that.
Here's health.
Here's, what is this?
Machine ammo.
How do I?
Okay.
Okay.
I switched to my machine gun.
Oh, whoa.
Okay.
Game over.
We survived for 79.5 seconds.
Apparently, my accuracy was 217%.
So I don't know.
I would give this like a two out
of 10.
Um, this was not very good.
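An accuracy reading above 100% usually means hits and shots are being counted in different units. The generated game's source is not shown in the video, so the following is only one plausible explanation, with made-up numbers chosen to land near the 217% on screen: a shotgun that registers a hit per pellet but a shot per trigger pull.

```python
def accuracy(hits: int, shots: int) -> float:
    """Accuracy as a percentage, guarding against division by zero."""
    return 0.0 if shots == 0 else hits / shots * 100


# Hypothetical values, not taken from the generated game.
PELLETS_PER_SHELL = 8
trigger_pulls = 6
pellets_on_target = 13

# Suspected bug: hits counted per pellet, shots counted per trigger pull.
buggy = accuracy(pellets_on_target, trigger_pulls)

# Consistent units: every pellet fired counts as one shot.
fixed = accuracy(pellets_on_target, trigger_pulls * PELLETS_PER_SHELL)

print(f"buggy: {buggy:.0f}%, fixed: {fixed:.0f}%")
```

Whatever the real cause is, a sanity check that accuracy never exceeds 100% is the kind of detail a follow-up prompt would need to ask for.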
And you know, this could go back to
our system prompt that we gave it.
It could just be that it, uh, maybe
misunderstood our system prompt, but you can look
at this and like, obviously the animations and
what it created here, not super impressed by
what it did.
So that could tie back to our system
prompt.
We may need to, you know, update that
system prompt for future BridgeBench tests, but
I am not impressed with what Codex did
here.
The next thing that I want to check
out is the front end styling.
So this is the remake of the BridgeMind
website that it made.
So, you know, one thing that GPT models
have been notoriously bad at has been styling.
Now, as I've used GPT 5.3
Codex in the past couple of days, I
will say that when given the correct information
and prompts, it has been doing a much
better job than what we had with like
GPT 5 for sure.
But what I will say with this is
it's not that great.
What it created here.
I mean, let's, um, let's try shrinking this.
How's the, I mean, I would say that
the responsiveness is actually pretty good.
Like the responsiveness and spacing is, is pretty
good.
Um, it could do a couple of things
better, like maybe making those buttons
not wrap and just making them, you know,
compact a little bit more, um, and adjust
to the screen size, but, um, overall, definitely
better than what we've seen with previous GPT
models.
I will say that, Hey, with like one
shot with no styling guide, um, it's doing
a pretty decent job.
It just created this landing page.
I believe.
Yeah.
Like if you, all these buttons will just
link to, uh, to different parts of that
page.
So I would give this a four out
of 10.
Um, I'm not super impressed by like the
background gradients.
Like this is like kind of your standardized
AI look.
I don't feel like this is really unique.
You know, sometimes when you're using Gemini 3
Pro, you can get these really unique UIs.
Uh, but with GPT, I would say that
it didn't do a great job on this.
Now with that, I will say like, uh,
for example, with BridgeVoice, I was having
GPT 5.3 Codex make a couple updates
to BridgeVoice and all the UI that
you guys see here.
Um, like even like these, these theme changes
and whatnot, these were all made by
GPT 5.3 Codex.
And this one had a little bit more
of a structured styling in place to begin
with.
And GPT 5.3 did do a really
good job making it like very uniform.
So that's just one example that I will
show you.
Like from my personal work, when you're getting,
when you're actually using 5.3 in practice
and you're giving it the correct style guides
and you're giving it the correct references and
you're prompting it really well, uh, 5.3
can do a really good job at improving
your styling.
It definitely did in the case of BridgeVoice
here, but for just a one shot,
like off rip, just singular prompt, not super
impressed with this one.
I'm going to give it a four out
of 10 for the styling on this website
with one shot.
Now for another cool one, let's go to
the flight simulator.
So here it is the flight simulator.
So it says procedural terrain, full flight model,
dynamic weather and waypoint challenge.
Um, so here, to pitch, W... okay,
so it's WASD.
Um, so pitch, roll, we can do rolls:
A, arrow left, down arrow, right.
Okay.
Um, rudder.
So there are actually some decent sorts
of controls: throttle, Shift up, Control down, camera chase,
uh, cockpit.
Okay.
So let's, uh, let's try this.
So camera, C, chase slash cockpit.
Okay.
Whoa.
Okay.
All right.
So here is the look here.
So let's try moving around a little bit.
Okay.
So first of all, whoa, uh, codex, what
are you doing?
Okay.
So one thing I will say is we
are completely inverted.
I mean, I don't know if this is
like straight out of Flight, if you guys
have seen that movie, but, um, I don't
know why the mountains are above us, but
that does not look right.
There's the weather change.
There's the daylight change.
Um, so control C is going to change
the view here.
Um, you can see that we have like
an altitude checker over in the top left.
What is that?
It's VSI, waypoints.
Uh, I have a buddy
who flies.
He is an airline pilot.
So, you know, he may understand how to
fly planes.
I personally don't, but when you're in this
mode, it looks like it's, um, man, there's
like a lot of glitches with this.
This is not smooth at all.
Um, let's go back to here.
And we are like literally in the Matrix
right now.
Uh, we're in a bunch of green, very
glitchy.
Uh, there are like some interesting things like
on the left and right.
I mean, it is like, looks like a
flight simulator, but why are the mountains in
the sky?
Why am I in a green blob?
I'm going to give this a two out
of 10.
Uh, this is not, this is not good.
Uh, you know, we may want to, at
some point, adjust the prompts to be a
little bit less specific and maybe let Codex
have its way with it.
Um, so let's actually try that.
So we're going to give this a shot.
We're going to try and rebuild this flight
simulator, but rather than giving it such a
detailed, particular prompt, we're just going to give
it something basic here.
Let's exit.
Let's pause this real quick and let's actually
pull BridgeSpace back up.
And what I want to do is let's
actually launch a new workspace, uh,
let's pull BridgeBench back up.
And, um, let's just launch a single Codex
instance in this workspace.
And what I want to do is I'm
going to just give it this prompt.
I want you to create a new flight
simulator that does not follow the readme or
the prompt in the existing flight simulator.
This is to be a completely newly invented
flight simulator.
All I want you to do is create
a flight sim that is simple and accurate
and allows me to fly around a map
in a plane.
Okay.
Let's just try that.
And we'll, we'll see if that's a little
bit better.
Um, you know, and we'll see if it,
you know, using its creativity, if maybe the
system prompts that we're giving it is actually
causing it to like hallucinate.
So we'll let it kind of be a
little bit more creative on its own and
we'll give it less of a, uh, less
of like specific instructions that we want it
to follow.
And we'll more so just kind of let
it run and do its own thing.
So let's say I want you to put
the code in a "flight sim 2" HTML
file.
Okay.
So we're going to give it this, and then
we're going to review the, um, stock trading
application that it made, and we'll let this one
work.
And then we'll come back to this and
finish off the video with a review
of it. But so far, with that existing
flight simulator, yeah, this is not
good.
This was not good.
I'm going to give it a two out
of 10.
Literally the mountains are in the sky.
Uh, yeah, two out of 10.
All right.
So here is BridgeTrade.
So this is what it created for, uh,
basically the prompt that was requesting it to
create a website where I would get real
time stock analytics and be able to see
like stock market prices and stuff.
And right off the bat, I'm going to
say like, this is a pretty big fail.
Um, and I'll show you why like, Hey,
like this is not the price of Apple.
It's updating live.
It's literally Saturday right now.
So I don't know why it's actively updating.
So obviously this does not connect with like
any free API. There's like a bunch of free
APIs that you can integrate with.
So I don't know why Codex wasn't intuitively
able to search that up, identify, you know,
what it would be able to do to
build this and then be able to build
that in.
So that's a little bit discouraging there.
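The complaint about prices ticking on a Saturday can be made concrete. Here is a minimal sketch of the check a real-time stock page could run before showing "live" updates. It assumes regular US equity hours (Monday to Friday, 9:30 to 16:00 New York time), takes a datetime that is already in New York local time, and deliberately ignores market holidays and early closes. The function name is hypothetical.

```python
from datetime import datetime, time

# Regular US equity session in New York local time (holidays not handled).
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def is_regular_session(ny_local: datetime) -> bool:
    """Return True if the given New York local time falls in a weekday session."""
    if ny_local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return SESSION_OPEN <= ny_local.time() < SESSION_CLOSE


saturday_morning = datetime(2026, 2, 7, 11, 3)
friday_morning = datetime(2026, 2, 6, 11, 3)
print(is_regular_session(saturday_morning), is_regular_session(friday_morning))
```

A page that keeps changing prices while this returns `False` is almost certainly generating its numbers locally instead of pulling them from a market data API.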
And then in terms of the styling, like
this just has absolutely zero creativity at all.
It tried to create like some logo that
looks bad.
Um, so I'm going to give this a
one out of 10.
I think this is a complete fail on
this prompt.
Um, and uh, yeah, that's, that's not good.
We're going to give this a one out
of 10, but let's go check back out
on the flight simulator and see how it
did.
All right.
So here's the conversation for the flight simulator.
So this actually completed very quickly.
So this one ran and it only created,
let's see here.
So this one was 721 lines of
HTML, and
it didn't work for very long.
So let's like compare that first off with
the other one that it created.
Right?
So let's go here.
You can see this flight simulator, the one
that it created initially, this one was
2,500 lines.
So this one, you know, was 2,500
lines, about what?
Close to like three and a half times
larger.
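The size comparison is easy to check. The sketch below uses the two figures quoted above (2,500 is a rounded number read off the screen) and includes a small helper for counting lines in a file yourself. The helper name is illustrative, and no file paths from the video are assumed.

```python
from pathlib import Path


def count_lines(path: str) -> int:
    """Count the lines in a text file, tolerating odd encodings."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


# Figures quoted in the video; the first is approximate.
first_attempt_lines = 2500
second_attempt_lines = 721

ratio = first_attempt_lines / second_attempt_lines
print(f"first attempt is about {ratio:.2f}x the size of the second")
```

The ratio comes out just under 3.5, which matches the estimate given on the spot.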
Let's now go and let's do flight sim
two and go to flight sim two here.
And um, let's see, what is this?
Whoa.
Oh my gosh.
Okay.
Uh, so this one, yeah, so this, this
one, I don't even know what that is.
All right.
That is not good.
Okay.
So this one gave us a different approach.
It gave us a different look and I
don't even know what I'm looking at here.
This is a, this one's like really, really
bad.
So, um, yeah, I mean, I would say
this is obviously a one out of 10
as well.
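For a quick overall picture, the scores given in this video can be averaged. This is only a convenience summary of the ratings stated above, not an official BridgeBench scoring method, and the dictionary keys are informal labels.

```python
from statistics import mean

# Scores out of 10 as given in the video for the five BridgeBench prompts.
scores = {
    "Windows 10 interface": 6,
    "FPS game": 2,
    "BridgeMind landing page": 4,
    "Flight simulator": 2,
    "BridgeTrade stock app": 1,
}
retry_score = 1  # the simpler second flight simulator prompt

average = mean(scores.values())
average_with_retry = mean([*scores.values(), retry_score])
print(f"five prompts: {average:.1f}/10, including the retry: {average_with_retry:.2f}/10")
```

Across the five original prompts that works out to 3 out of 10, with the Windows 10 interface the only result rated above a 5.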
So I, I don't know how
we could potentially be making, sorry, I've just
put the OBS, but I don't know if
we can make our BridgeBench better to
really be putting these to the test, but
I just wanted to put GPT 5.3
to the test with some system prompts
to be able to test like, okay, in
one shot, what is it able to create
in terms of like the Windows interface.
But you know, in practice, when I've been
using GPT 5.3 in my normal workflow,
like you can even see this here, it
has been able to do a very good
job in creating real SaaS products for me.
And the styling, I have found that the
styling is a little bit more creative.
Um, you know, I think that for us
to be able to look at like some
of these things like these here, these, these
sections here on my website about BridgeCode
and about the BridgeMind MCP and about
BridgeSpace, the agent development environment, and
BridgeVoice, you know, these were created using
Codex.
So I think that for styling and when
we're actually applying it, it's like, Hey, when
you're actually using the tools in practice, it
is a little bit different than just saying,
Hey, go build a flight simulator in one
shot.
Right.
It's like, I think that, you know, I
think it'll be nice to have some type
of BridgeBench and I want to build
this out.
Maybe the community can help, um, as we
create some better system prompts to be putting
these different models to the test.
Um, but you know, I think this was
just a fun way to be able to
put it to the test, but all in
all, some of these things that Codex created,
I'm actually like not super impressed with.
Right.
I mean, the best thing that it
created was that Windows interface.
And then like some of this stuff just
wasn't that good.
Right.
So I don't know, I don't know how
I feel about that.
I think that with Opus 4.6
and its 1M context window, there's going
to be room for both in our workspace
and in our workflows.
Right.
But I will say this, this styling that
it did here, I really liked it.
And I think it did a good job.
Like even like, Hey, the BridgeVoice tab
here.
Right.
Like I had it create a BridgeMind
theme.
And what's really cool about this is like,
you can see when I hover over this
now it's that BridgeMind gold.
Right.
So I just thought that that was cool.
Like it did a good job, like picking
up on like really, you know, unique details
like that, as well as like, you know,
building stuff like this collapse feature, like, right.
Like that looks really good.
And Hey, the real state of software development,
when you're using vibe coding and using AI
models to help you is that it's not
a one shot.
A lot of people think about vibe coding.
They think, Oh, like the AI models can't,
you know, create a flight simulator
in one shot, or can't create, you
know, the Windows 10 interface in one
shot, but really like what people are using
vibe coding for right now, or at least
what I'm using vibe coding for right now
is you build and then you iterate, right.
So you actually get the products working right.
And you kind of just go one step
at a time and you slowly iterate and
improve coding with the AI models via vibe
coding, but then you ultimately get something like
BridgeVoice, or you get something like
BridgeSpace, which is a great product, right.
And these products are going to be launching
soon.
So I think it's a great opportunity for
us to start to build out something like
BridgeBench.
You know, I don't know how I feel
about these initial five prompts that I tested
with it.
I may want to change that over time,
but I think that with GPT 5.3,
you know, it's not like a substantial improvement
in coding, right?
Like we saw that with the benchmarks, but
even here, like, Hey, you have a flight
simulator, right?
And the mountains are in the sky, right?
And then you have an FPS game and
there's no animations for the bullets.
And there's no like animations for the characters
when you actually shoot them to like show
it that you actually like killed them or
whatever.
Right.
So it's like, you know, there's certain details
and intuition that GPT 5.3 Codex
does not have, um, that may need some
coaxing from the developer, right?
And that's why, when you are
vibe coding, it's very important that you
understand, Hey, AI is just an extension
of the developer.
So if you understand, um, these different concepts
and you're able to prompt it correctly, then
you're going to be able to use a
model like 5.3, that will do a
good job for you.
Now, one thing I will say that I
did notice during this test is that these
prompts actually finished relatively quickly.
Like right here, this wrote 4,338 lines
of code here in HTML to build a,
what was this?
This was BridgeTrade, right?
And it did do it in like less
than 10 minutes.
So I think that the substantial improvement that
we're seeing is like, I think that I
am seeing a speed up with 5.3.
Now this could be because they switched and
they partnered with Cerebras and now they have
better and higher, you know, faster inference.
But I will say that I definitely have
noticed a speed increase.
I also have noticed some other intuitive things
such as it being able to run terminal
commands a little bit better and being able
to work in and launch different sub-agents.
You know, you can see that this one,
you know, worked for another three minutes on
doing a final polish pass before it, you
know, it did this polish pass.
So, you know, there's certain things and it's
right there.
Sorry about that.
So, you know, there's certain things that I'm
seeing with this model that are definitely an
improvement, where it's like, Hey, I'm happy
that we have the model, but this model
is not the substantial jump that we've seen
with other models.
For example, like moving from o3 to GPT 5,
right?
Like that was a substantial jump, or 4o
to o1.
That was another substantial jump that OpenAI did,
right?
So I'm more looking forward to the next
iteration of whatever they come out with, whether
it's GPT 6 or GPT 5.3, because we didn't
see that substantial jump in the coding index.
And that's really what we need to look
at is, Hey, SWE Bench Pro, SWE Bench
Verified.
These are the coding benchmarks that we need
to be looking at.
But I will say, obviously it's a slight
improvement.
It is a speed up.
I'm going to be using Codex more in
my workflow.
I did purchase the ChatGPT Pro plan
for $200 a month.
So, you know, sometimes I think that people
are asking the question, they're saying, Hey, which
model is better?
Opus models?
This video puts the newly released GPT 5.3 Codex to the test using a custom suite called BridgeBench. The creator evaluates the model's ability to build complex applications like an FPS game, a Windows 10 interface, and a flight simulator in a single prompt. While the model shows impressive speed and significant improvements in terminal-based benchmarks, the results for one-shot complex coding tasks are mixed, ranging from decent interfaces to glitchy simulations. The creator concludes that while GPT 5.3 isn't a massive leap in coding logic, it excels when used iteratively in a 'vibe coding' workflow.