Shipping with Codex

Transcript

0:08

I'm Thibaud.

0:10

And I'm here at OpenAI and I build

0:12

Codex.

0:14

With Codex, we're building an AI

0:18

software engineer.

0:19

I personally like to think about it as a

0:22

little bit like a human teammate.

0:25

You can pair program with it on your

0:27

computer.

0:28

You can delegate to it. Or, as you'll

0:31

see, you can give it a job without

0:33

explicit prompting.

0:37

There's been, recently,

0:39

a massive vibe shift.

0:42

This has started from August, where we

0:44

had pretty decent usage, and since then,

0:46

thanks to all of you,

0:48

we've grown tenfold.

0:51

Today, I want to start by sharing some

0:54

of the recent updates that have created

0:56

this vibe shift.

0:58

Then, we'll bring some engineers from

1:00

OpenAI to show you some examples of how

1:03

we use Codex day-to-day.

1:06

Some of them are building here on the

1:07

Codex team. Some of them are just really

1:10

excited users of Codex at OpenAI.

1:13

Let's first talk about some of those

1:15

updates.

1:17

Codex now works everywhere you build.

1:20

Whether it's in your IDE, your terminal,

1:23

GitHub, web, or mobile.

1:27

No matter where you are,

1:28

it is the same powerful agent under the

1:31

hood.

1:32

The first and most important improvement

1:34

we made was to completely overhaul the

1:37

agent.

1:39

We think of the agent as a combination

1:41

of two things.

1:42

The reasoning model under the hood and

1:45

its tool harness to allow it to act and

1:48

impact change upon the world to create

1:50

value for you.

1:51

First, the model.

1:53

In August, we shipped GPT-5, our best

1:57

agentic model thus far.

2:00

That was until we listened to your

2:02

feedback.

2:04

And we improved upon it by shipping

2:06

GPT-5 Codex, a model that was further

2:10

optimized for work within Codex,

2:12

improving by being smarter, better

2:14

following code style, and adapting its

2:17

thinking time.

2:19

One of my favorite quotes from the

2:21

feedback from you all was that it feels

2:25

a little bit more like a true senior

2:27

engineer because it gives so few

2:29

compliments. And it also pushes back

2:33

on bad ideas.

2:36

Next, we completely rewrote the harness

2:39

to make the most of the new models.

2:42

We added support for planning, MCP, auto

2:45

compaction,

2:47

so that you can have these really long

2:48

conversations and interactions, and so

2:50

much more.

2:52

At this point, we started seeing the CLI

2:55

usage take off.

2:58

But, there's more feedback.

3:00

The model felt really good. The agent

3:03

was useful, but the CLI felt early.

3:06

We appreciate the feedback, and so we

3:09

decided to completely revamp the Codex

3:11

CLI. We simplified approval modes,

3:14

created a more legible UI, and added a

3:17

ton of polish.

3:20

And by default, it works with

3:22

sandboxing, so it is safe by default,

3:25

but you always have control.

3:28

It's been a work in progress, and we

3:30

shipped a big update last Friday.

3:32

We'll ship a new release today again.

3:36

More feedback from you all. A bunch of

3:38

you collaborate with the agent and want

3:40

to look at the code at the same time.

3:42

This is why we shipped it in the IDE

3:45

directly as a native extension.

3:48

Here, it works with your code alongside,

3:52

you know, you having control over your

3:54

IDE, you get this little collaborator. It

3:56

works in VS Code, it works in Cursor,

3:59

and other popular forks.

4:01

This immediately took off.

4:04

Within the first week, we had 100,000

4:06

users. Many of you, I'm sure, are in

4:08

this room. A lot of our users prefer to

4:11

use Codex in their IDE directly. Part of

4:14

the magic here is that it is the exact

4:17

same agent.

4:18

It is the same open-source harness that

4:21

is powering the CLI bundled right within

4:24

the extension.

4:26

At the same time, we're also upgrading

4:28

Codex Cloud

4:30

so that

4:31

you could run many more tasks in

4:33

parallel.

4:35

For us, this is still the beginning, but

4:37

we think it's incredibly cool to be able

4:39

to command Codex through your phone.

4:42

Cloud tasks now run 90%

4:45

faster. They can set up their

4:46

dependencies automatically and verify

4:48

their work by taking screenshots and

4:50

sending them to you.

4:52

Giving the agent its own computer really

4:54

feels magical when it works.

4:57

And then, you can start working with

4:59

agents like this in tools like GitHub.

5:02

Or now Slack.

5:05

Here's an example of one engineer. Some

5:07

of you might know him.

5:09

Who had a question. And then another

5:11

engineer, Steve Lee, immediately jumps

5:13

on it and delegates it to Codex. Here,

5:16

Codex receives the entire context from

5:18

the thread and just gets to work. A

5:20

couple of minutes later, it posts a

5:22

solution together with a summary. It

5:24

actually went, explored the whole

5:26

problem, and wrote some code.

5:28

All of this progress means that we can

5:31

write code so much faster,

5:33

which also means that we have a lot of

5:35

code collectively to review.

5:38

Validating and reviewing code is now

5:40

becoming a huge bottleneck.

5:43

We've been thinking about this for

5:44

a while.

5:46

Past experiments with code review at

5:48

OpenAI showed that it could be

5:51

useful, but also oftentimes noisy.

5:53

Previous attempts had to be turned off

5:56

because users were complaining about the

5:58

lack of signal.

6:00

So, we purposely trained GPT-5 Codex to

6:03

be great at ultra-thorough code review.

6:07

It goes through the dependencies, all

6:09

the code in depth inside its little

6:10

container,

6:11

truly explores the contract of like your

6:14

intent and what actually happens in the

6:17

implementation,

6:19

and then comes back with high-quality

6:20

findings. We now find that many teams

6:23

decide to enable it by default and

6:25

almost want to make it mandatory

6:28

because it produces such high-signal

6:31

findings every time.

6:33

You can trigger it while pairing with

6:34

Codex, or you can completely automate it

6:37

by running on every PR in GitHub.

6:40

Okay.

6:41

It's been a busy few months for a small,

6:44

growing team.

6:46

We've been using Codex to build Codex.

6:48

There's really no way we could have done

6:50

it without it.

6:52

Even more fun has been seeing OpenAI as

6:56

a whole get accelerated.

6:59

Today,

7:00

92%

7:02

almost all of OpenAI's technical staff

7:04

uses Codex daily.

7:07

Up from 50% around last July.

7:10

Engineers that use Codex submit 70% more

7:13

PRs per week.

7:16

And pretty much all PRs are reviewed by

7:18

Codex.

7:20

When it finds an issue, people are

7:21

actually excited. It saves you time. You

7:25

ship with more confidence.

7:27

There's nothing worse than finding a bug

7:29

after you actually shipped the feature.

7:32

When we as a team see the stats, it

7:35

feels great. But even better is being at

7:37

lunch with someone

7:40

who then goes, "Hey, I use Codex all the

7:42

time. Here's a cool thing that I do with

7:45

it. Do you want to hear about it?"

7:48

And so we wanted to give you a taste of

7:49

that.

7:51

So, let's get to lunch with a few

7:53

teammates

7:54

and hear about their stories. They'll

7:56

show you real workflows of our teams,

7:59

how they use it every day.

8:01

Please welcome Nacho to the stage to

8:03

talk about iterating on UI

8:06

for the ChatGPT iOS app.

8:09

[Applause]

8:12

Thanks, Thibaud.

8:15

Hello. My name is Nacho Soto. I'm a

8:17

member of the core iOS team at OpenAI.

8:20

I'm going to do two things today. I'm going

8:22

to tell you about a workflow that I use

8:24

frequently when building the ChatGPT

8:26

app. And I'd like to share a demo that

8:28

shows you how I do this.

8:30

Let's start with the demo.

8:34

Thibaud asked me to build a weather app.

8:36

So, I have a starter project with just

8:38

an empty window.

8:42

And I've also asked ChatGPT to make a

8:44

mock-up of what I want the UI to look

8:45

like.

8:48

So, I'm going to ask Codex

8:50

to implement that design.

8:57

Great. While that's running, let me tell

8:59

you what's special about how Codex is

9:01

going to implement this.

9:03

Working in the ChatGPT core team means I

9:05

spend a lot of time on infrastructure,

9:07

performance, but also do some amount of

9:09

front-end work.

9:12

Recently, I worked on this small feature

9:14

where we simplified our personalization

9:16

screen to make our new ChatGPT

9:18

personalities more discoverable.

9:21

And I'm sure you've run into something

9:22

like this before. With that last 10% of

9:25

polish, like getting these headers and

9:27

footers aligned, it's actually taking

9:29

90% of the time.

9:32

But Codex can help you with that 10%.

9:35

And you can work on that while you do

9:37

something else. Maybe you're watching

9:38

some of the other DevDay talks.

9:41

You can even have nine other terminal

9:43

tabs running Codex if you want to be a

9:45

true 10x engineer.

9:50

Who here has been sent a pull request

9:52

from a junior engineer, and within a few

9:54

seconds you know that they didn't

9:55

actually test it, cuz there's no way

9:57

that it works.

10:00

If you used ChatGPT or any other agent 6

10:03

months ago, you were working with that

10:05

junior engineer.

10:07

But Codex is not.

10:10

Like Thibaud said, I would argue that Codex

10:12

is now a senior engineer.

10:15

It doesn't just write the code and

10:17

assume that it works. It will verify

10:19

that it does.

10:21

I'm a big fan of TDD, test-driven

10:23

development. And I think Codex really

10:26

thrives with that workflow.

10:29

It will run your tests, fix the code,

10:31

run your tests again, over and over

10:33

until they pass.

10:36

But why stop at unit tests?

10:39

Codex is multimodal, which means it can

10:41

also verify its work visually.

10:46

A few weeks ago, we gave Codex that

10:48

superpower of being able to see images.

10:51

So, I taught it to generate snapshots

10:53

for the UI code that it writes.

10:56

And best of all, it's actually very

10:58

simple.

10:59

First,

11:00

I made this very simple makefile that

11:02

runs the unit tests to extract the

11:05

SwiftUI previews.

11:07

And that calls a small Python script,

11:09

which Codex wrote, by the way.

11:11

And that extracts those images, puts

11:13

them in a folder so Codex can find them.

11:17

Then in the agents.md I just told Codex

11:19

about that script, and I've asked it to

11:21

use it to verify its work.
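
The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the script shown in the talk: the function name `collect_snapshots`, the directory layout, and the assumption that the snapshot tests write PNG files somewhere under one output directory are all hypothetical.

```python
import shutil
from pathlib import Path


def collect_snapshots(test_output_dir: Path, snapshot_dir: Path) -> list[Path]:
    """Copy every PNG under test_output_dir into snapshot_dir and return the copies.

    Both directory names are placeholders; the talk does not say where the
    preview images are written or collected.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Materialize the listing first so newly copied files are never re-scanned.
    for image in sorted(test_output_dir.rglob("*.png")):
        # Flatten nested paths into one file name so the agent only has
        # a single folder to look through.
        relative = image.relative_to(test_output_dir)
        target = snapshot_dir / "__".join(relative.parts)
        shutil.copyfile(image, target)
        copied.append(target)
    return copied
```

A make target could run the snapshot tests and then call a script like this, and the agents.md file would then only need to name that target and the folder to inspect.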

11:25

We use this workflow to build the

11:26

ChatGPT iOS and Mac app. You could do

11:29

the same on web, for example, with tools

11:32

like Storybook or Playwright.

11:35

So, that's my workflow. I give Codex

11:37

some tools to generate screenshots so we

11:39

can verify the UI code that it writes.

11:42

Let's check in and see how Codex is

11:43

doing.

11:47

Okay, so if I scroll back,

11:51

looks like it wrote some code, started

11:52

with a plan

11:54

uh to review the existing code, uh

11:57

implement the UI, and provide preview

12:00

data to verify that it's good. So,

12:03

great, looks like it wrote all that

12:04

code, ran the snapshot tests.

12:08

Cool. So, uh

12:10

no, I guess no, but for 3 minutes, go

12:12

ahead and run that app.

12:18

Cool.

12:20

So, obviously this is a very simple

12:22

example, but it actually scales

12:24

to much larger projects

12:27

like the ChatGPT app.

12:28

And it can run for many hours depending

12:30

on the tasks, iterating over and over

12:33

until it's pixel perfect.

12:36

And speaking of working for many hours,

12:38

I'd like to pass it over to Freal, who's

12:40

going to show us how to scale these

12:42

verification loops to run for longer

12:44

periods of time and more complex

12:46

problems.

12:47

Thank you.

12:48

[Applause]

12:53

Thanks, Nacho.

12:54

I'm Freal, and I work on developer

12:56

productivity.

12:57

Here at OpenAI, I've set high scores for

13:00

the longest sessions or the most tokens

13:03

produced.

13:04

I'm known as the guy that gets Codex to

13:05

do this.

13:07

For being able to use Codex to one-shot

13:09

big features and complex code changes.

13:11

I've seen the GPT-5 Codex model work for

13:13

over 7 hours productively. That was my

13:15

prompt. Or process more than 150 million

13:18

tokens over the course of a marathon

13:20

session.

13:23

This is one of those projects.

13:25

It's a complex refactor, a major feature

13:27

for my personal JSON parser project.

13:31

And for large projects like this, there

13:33

can be long periods of time where it

13:34

seems like

13:36

all of the tests are failing until the

13:38

work is complete, especially when you're

13:39

making that core change.

13:41

Now, this is a JSON parser built for

13:43

streaming tool calls,

13:44

a parser for the AI age.

13:47

And this PR has over 15,000

13:49

lines of code changed, and it was

13:51

created over many hours of work from

13:52

Codex,

13:54

but only a few minutes and a handful of

13:56

prompts from me.

13:57

Let's walk through how I go from prompt

13:59

to pull request.

14:02

We'll do this in just a couple prompts.

14:04

First, we'll tell Codex that we want a

14:06

plan to implement our feature.

14:09

Then, we're going to review that plan

14:11

and tell it to execute. And finally, we

14:14

ship.

14:18

Here, I've opened my project in VS Code,

14:21

and I'll open up our Codex extension as

14:23

well.

14:25

Uh I have a fairly complex feature I

14:26

want to implement, and I've prepared

14:28

that in a document for me to

14:29

read.

14:31

And I'm going to tell Codex that I want

14:32

it to write a plan to implement this

14:34

feature, and I've described the end

14:36

state.

14:37

But I want Codex to do the heavy lifting

14:39

for me and research how to integrate

14:41

this library into my parser.

14:44

So, what I do is I ask Codex to write a

14:46

spec.

14:48

And I'm going to go ahead and kick that

14:49

off.

14:51

And actually, I'm going to turn off auto

14:53

context here. A little bit of an aside,

14:55

I've rehearsed this a few times, and

14:57

it's actually found finished specs from

15:00

my Git history and cheated and copied

15:02

right to the end of the process.

15:05

So, I'm going to have it really do the

15:07

work.

15:09

And uh

15:10

you'll notice I don't need to tell it a

15:11

lot. I've given it my example, I've told

15:13

it to do some research,

15:15

follow the example of the code that I've

15:17

already got.

15:20

So,

15:22

while that's working, let me show you

15:23

what I mean by a plan or an exec spec.

15:27

This is my plans.md file. I've

15:29

abbreviated everything here so we don't

15:31

have to read all of it. It's 160 lines.

15:34

Uh but really what I'm doing here is I'm

15:36

writing a design document for design

15:38

documents.

15:40

Codex is now a senior engineer, after

15:41

all, so we should be asking it to do

15:43

some of its own paperwork, too.

15:46

And like most engineers writing a design

15:48

doc, it's going to start by copying from

15:50

an example. I've got one here.

15:54

And I tell it,

15:55

you know, this plan

15:57

is going to be a living document. It's

15:59

going to describe its big picture. It's

16:01

going to have a to-do list and progress

16:03

that it keeps up to date.

16:04

And I also want to say,

16:07

why do I keep on saying exec plan?

16:09

And I'm doing that because I want to

16:11

give the model a term to anchor on and

16:13

know when I use the term exec plan,

16:16

use plans.md to design that, to iterate

16:19

on it, and follow up.

16:21

It's good to give it a term that's

16:22

unique so that it knows to reflect back

16:24

on that. And when I say that, it's

16:26

something special, not just any design

16:28

doc or implementation spec.

16:31

So,

16:32

in this spec, we've got our progress,

16:34

our surprises and discoveries. We even

16:35

have a decision log in here for me to

16:37

keep track of what it's been working on.
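
To make the shape of such a plan concrete, here is a small sketch that renders an empty exec plan skeleton. The section names (big picture, progress, surprises and discoveries, decision log) come from the talk; everything else, including the function name `render_exec_plan`, the checkbox notation, and the plain-text layout, is an assumption and not the actual plans.md recipe.

```python
from datetime import date

# Section names as mentioned in the talk; their order and layout are assumed.
SECTIONS = ["Big picture", "Progress", "Surprises and discoveries", "Decision log"]


def render_exec_plan(title: str, todo_items: list[str], today: date) -> str:
    """Return a plain-text skeleton that an agent can keep updating as it works."""
    lines = [f"Exec plan: {title}", f"Last updated: {today.isoformat()}", ""]
    for section in SECTIONS:
        lines.append(section)
        if section == "Progress":
            # The to-do list lives under Progress so items can be checked off in place.
            lines.extend(f"[ ] {item}" for item in todo_items)
        else:
            lines.append("(to be filled in as work proceeds)")
        lines.append("")
    return "\n".join(lines)
```

Because the agent is asked to keep this one file current, the same document can double as its working memory and as a summary for the human reviewer.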

16:39

Now,

16:40

normally I don't ask engineers to write

16:42

this much. I only do that when maybe I

16:44

don't like their project.

16:46

[Laughter]

16:47

But in this case, this helps Codex steer

16:50

towards a completed project. It is its

16:52

memory as it works on this large plan.

16:56

And after this talk, we'll upload the

16:58

plans.md recipe to our OpenAI cookbooks

17:01

so any of you can adopt it in your

17:02

repositories.

17:05

Now,

17:06

how does it know

17:07

how to use this plans.md? As I mentioned

17:10

earlier,

17:12

I've used my agents.md.

17:14

I drop a couple lines here in my

17:16

agents.md, just a few instructions. When

17:18

you're working on something complex,

17:20

this is what an exec plan is, refer to

17:22

plans.md,

17:24

make sure that you're following that.

17:26

Now, as you can see, it's doing quite a

17:28

bit of research on the side here, so

17:30

let's go ahead and look at a completed

17:31

spec.

17:34

So, I've switched over to a completed

17:36

session here, and it's written my spec.

17:39

Let me open up that plan here.

17:43

So,

17:44

I can review this. I can give it

17:46

feedback. I can look at Okay, that looks

17:48

like, you know,

17:50

quite a lot of words, but it is what I

17:51

wanted to do, and it has a plan.

17:54

Looks like a couple spikes, some

17:56

features that it wants to implement, and

17:58

of course documentation. So, that looks

18:00

good to me. I'm going to go ahead and

18:01

tell

18:02

Codex,

18:05

let's go ahead

18:06

and implement.

18:10

And we can't type today. There we go.

18:13

And

18:14

so while that runs, uh I like to keep an

18:17

eye on Codex. It keeps something

18:19

scrolling on my screen. My manager knows

18:20

that I'm still working.

18:23

And I like to watch the tests. So,

18:27

what I'll do is I'll kick off these

18:28

tests. They run very fast. Uh Codex

18:31

helped me write all of these, by the

18:32

way, from simple property tests or

18:34

simple uh unit tests to exhaustive

18:36

property tests. There's even some

18:38

fuzzing in this crate.

18:39

And uh so, I'll keep an eye on this, and

18:42

if it stays red for too long, I might

18:45

intervene and say, you know, Codex,

18:46

maybe we need to back out. Maybe that

18:48

plan is going a little off the rails.

18:51

All right, let's go ahead and look at

18:54

what it's completed in this project. So,

18:57

I'm skipping ahead to Codex having

18:59

finished that task. By the way, that

19:01

took over an hour in my previous

19:03

session, so we're skipping ahead quite a

19:04

ways. And it looks like it's written

19:06

some new tests.

19:08

Um they're all passing, which is great.

19:10

Uh let's go ahead and look at the

19:11

changes.

19:13

Okay.

19:14

Wow, and it looks like it vendored in

19:16

and even maybe forked or updated the

19:19

upstream library to make some changes to

19:21

implement what it needed to do.

19:23

Now,

19:24

again, I don't have all day to read all

19:26

of this to you, so I'm going to go ahead

19:27

and open up the plan again.

19:30

So, I open up the plan, and I can see in

19:33

the progress, it's checked off some big

19:35

items. It's completed some spikes.

19:37

It's updated documentation in the

19:38

readme.

19:40

Plans.md specifies that all of these

19:42

plans have to be a living document, and

19:44

so I can use this as an executive

19:45

summary to know what it's accomplished.

19:48

That way I don't have to read all the

19:49

code myself.

19:54

Okay.

19:55

It looks like it's done and the tests

19:57

are passing.

19:59

So,

20:00

uh

20:01

what I've shown you today is we can go

20:03

from

20:04

uh

20:05

an implementation idea feature

20:07

a prompt to a PR in only a few steps.

20:11

Rigorous planning and thorough testing

20:14

enabled the model to work on this

20:15

feature for a sustained period of time.

20:17

And let's just see how many lines of

20:19

code it's written.

20:28

Crashed.

20:30

Okay.

20:31

Okay.

20:32

4,200 lines of code and just about an

20:36

hour of work.

20:37

Incredible.

20:38

Now,

20:39

I could just merge this as is, but I

20:41

would really like another set of eyes on

20:42

this code.

20:44

Thankfully, we have Daniel up next to

20:45

talk about code review.

20:54

Hello.

21:03

All right, my name is Daniel and I'm an

21:05

engineer here on the Codex team. So, uh

21:09

today I want to talk about code reviews.

21:13

As Thibault mentioned, we launched code

21:15

reviews on GitHub a couple months ago

21:16

and it has been a huge hit.

21:19

Um

21:20

both externally, but especially

21:21

internally, we love code reviews. We

21:24

have them running on all of our PRs and

21:26

it's finding so many bugs that we would

21:28

have otherwise missed. And some of these

21:30

bugs are so complex that you have to

21:33

like read and reread the comment a

21:35

couple times to even understand what

21:36

it's saying. So, I highly highly

21:39

recommend you enable code reviews for

21:41

all your GitHub PRs.

21:42

Um

21:44

here's an example of one of my PRs

21:46

that's on the Codex repo. It's open

21:48

source.

21:49

So,

21:50

I uh

21:52

pushed a feature and then immediately

21:54

Codex started reviewing my code and it

21:57

found a P1 issue. Great.

22:00

Uh so, then I said, "Thanks, Codex.

22:02

Please fix it." And that kicked off a

22:04

background task um to make that

22:07

change. And then once that got merged, I

22:09

said, "All right, Codex. Um

22:12

now that you have all this, now that you

22:13

have this change, review it again. Make

22:15

sure we don't have any issues."

22:18

And then it found another issue.

22:20

And then uh I was just embarrassed.

22:23

So,

22:25

this got me thinking.

22:28

What if you could have a workflow where

22:30

you create a feature

22:32

and then you review it for bugs and then

22:34

if there are any bugs, you fix it and

22:35

then you review it again and then you

22:37

fix and review and fix and review until

22:39

theoretically your code doesn't have any

22:41

issues.
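
The review-and-fix cycle described here can be written down as a small loop. This is only a sketch of the control flow: `review` and `fix` are hypothetical callables standing in for whatever actually runs the review and applies the fix, and the `max_rounds` cap is an added assumption so the loop cannot run forever.

```python
from typing import Callable


def review_until_clean(
    review: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_rounds: int = 5,
) -> int:
    """Alternate review and fix until a review returns no findings.

    Returns the number of review rounds that were run, and raises
    RuntimeError if findings remain after max_rounds reviews.
    """
    for round_number in range(1, max_rounds + 1):
        findings = review()
        if not findings:
            return round_number
        # Hand the whole batch of findings to the fixer, then review again.
        fix(findings)
    raise RuntimeError(f"still finding issues after {max_rounds} rounds")
```

The final clean review is counted as a round, so a change that needed two fixes reports three rounds.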

22:42

So,

22:43

uh we decided to make this super easy by

22:45

bringing code reviews to local as well.

22:48

And I'm going to show you how to do that

22:49

with slash commands. And this is what I

22:51

do every day before I even submit the

22:53

PR.

22:56

Okay.

22:57

Uh so, I'm working on a little feature.

23:00

Uh you can see it has like three

23:03

different commits. It's a pretty small

23:04

one.

23:05

Um and I have the CLI running on the

23:08

side. So,

23:10

all I have to do is write

23:13

/review,

23:14

hit enter,

23:17

and then you'll see there are a couple

23:18

different options here.

23:20

So, the first option is reviewing

23:22

against a base branch, just like a PR.

23:24

So, this would take some of your, well,

23:26

all of your commits in your base branch,

23:28

compare it to main,

23:29

uh just like a normal PR, and then look

23:31

at the whole diff and try to find any

23:33

uh issues with it.
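
For reference, the diff a pull request shows against its base branch can be reproduced locally with plain Git. The sketch below is an assumption about what "reviewing against a base branch" looks at, not a description of how the Codex CLI gathers its input; the helper names are hypothetical, while `git diff <base>...HEAD` is standard Git syntax.

```python
import subprocess


def base_branch_diff_command(base: str = "main") -> list[str]:
    """Build the Git command for a pull-request-style diff against `base`."""
    # The three-dot form diffs HEAD against its merge base with `base`,
    # so only changes made on the current branch are included.
    return ["git", "diff", f"{base}...HEAD"]


def base_branch_diff(base: str = "main", repo_dir: str = ".") -> str:
    """Run the diff inside repo_dir and return it as text."""
    result = subprocess.run(
        base_branch_diff_command(base),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # Raise if Git fails, e.g. when the base branch does not exist.
    )
    return result.stdout
```

Calling `base_branch_diff("main")` from inside a feature branch returns the combined diff of all its commits, which is the same view a reviewer gets on the pull request page.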

23:35

There are other options, too, like

23:36

reviewing uncommitted changes or

23:38

a specific commit or custom review

23:39

instructions, but what I usually do at

23:42

the end of the day when I have a bunch

23:43

of different commits is just review the

23:46

whole thing. So, I select the first

23:47

option. Now, I have to select a base

23:49

branch. Usually main is the first one,

23:51

so I hit enter again.

23:53

And now code review begins.

23:56

So,

23:58

a question I have is

24:00

why is it so good?

24:02

Why is GPT-5 Codex so good at code

24:05

reviews? Because we actually trained it

24:06

specifically on finding very technical

24:09

bugs and it will go on for a very long

24:12

time researching all sorts of different

24:14

files and then when it has a hypothesis

24:16

for something that could be wrong, it'll

24:18

even write tests, um scripts, execute

24:21

them to make sure that it gives you like

24:24

one or maybe at most two critical issues

24:28

that you have to fix before you land

24:29

your PR. It doesn't give you like 20 or

24:31

30 different things that it one-shots

24:33

from just looking at your diff. It

24:37

doesn't waste your time.

24:39

So, um

24:42

yeah, there's actually a bug here.

24:44

Uh

24:45

If anyone gets Oh, nice.

24:47

It It got it for us.

24:49

Um so, it's a P0. Great. Um

24:54

and it's exactly correct. So, we aren't

24:57

supposed to be hardcoding the string

24:59

here in the code. We should be getting

25:01

that dynamically. So,

25:04

all I have to do now is tell Codex,

25:07

please fix.

25:11

Uh and usually I don't even read the

25:14

comments, so

25:16

uh

25:17

it just goes.

25:19

But,

25:20

yeah, and the nice thing about

25:23

reviews in the CLI is that it actually

25:25

spawns a separate thread from the

25:26

parent. So, let's say you've been

25:28

working on this feature and it is like

25:30

super biased, you know, you have to do

25:32

this feature like this, you have to

25:33

implement it like that.

25:35

The review thread is separate. It has a

25:37

fresh pair of eyes, a fresh context, new

25:40

chat, uh so it doesn't have that same

25:43

implementation bias and it'll help find

25:45

these bugs for you.

25:47

Uh so, yeah, that is going to go ahead

25:50

and and uh you know, give us some

25:53

changes. While that runs, I want to

25:54

actually show you how you can enable

25:57

um reviews on all your PRs. So, go to

25:59

chat.openai.com/codex.

26:02

And then

26:03

you just connect your GitHub.

26:06

And then there's a button here called

26:07

enable code review.

26:08

So, this will take you to the code

26:10

review settings and you can have like

26:12

repository level settings to say like I

26:14

want this repo to get code reviews, I

26:17

want that one to not, but I just have

26:19

this toggle over here that I just say,

26:21

"Review all of my pull requests. Please

26:23

make sure I don't ship any regressions

26:25

to prod."

26:26

So,

26:28

let's go back.

26:29

Fantastic.

26:30

Uh it made the change. Let's see.

26:33

That looks correct. Yeah, now it's

26:35

getting the prompt directory

26:36

dynamically. So, now that this is done,

26:41

what I want to do is I want to run

26:42

/review again.

26:44

So, I hit /review, enter, enter.

26:47

Great. So, this will start another

26:50

review thread. And then once that goes

26:53

on, hopefully it won't find any issues,

26:55

but if there are any issues, you can

26:56

continue it again.

26:57

Um and then once that's done, it gives

26:59

you a thumbs up.

27:01

You commit.

27:03

You push to Git. And then you get one

27:05

final thumbs up from uh Codex on your PR

27:08

and you're merged.

27:10

So,

27:11

that is what I do every day

27:13

using /review in my daily

27:15

workflow before I even create a PR.

27:18

Thank you so much and I'll hand it back

27:20

to Thibault to wrap it up.

27:25

[Music]

27:27

All right, folks.

27:29

That's it.

27:30

I hope today's demos gave you a glimpse

27:32

of

27:33

how we're shipping faster and with more

27:36

confidence with Codex and a little bit

27:39

about where we're going.

27:41

If you haven't tried Codex yet, just npm

27:44

install. This will give you Codex right

27:47

in your terminal. Then you just type

27:48

Codex and you could get going and use a

27:51

lot of the things that we demoed you

27:52

today. Everything we showed to you today

27:54

is real and you can use it right away.

27:57

Gabriel Peel, one of the people here

28:00

working on the Codex team, actually just

28:02

sent me a message that v0.45

28:05

of the CLI is out like right now. It has

28:08

a few

28:09

incremental updates and also support for

28:12

uh OAuth MCP, which I think is very

28:14

cool. Uh so, just go and install it. Um

28:17

and this will give you the latest

28:18

version. And then if you want to hang

28:20

with a few of the people building Codex,

28:23

uh just come and join us at the booths.

28:25

Uh there will be some of us there and

28:26

also some of the, you know, top users of

28:29

Codex here at OpenAI. We also have

28:32

uh a Q&A

28:33

on uh Discord that you can join and uh

28:37

this will uh start shortly. So, come and

28:39

say hi. Don't be shy. And uh thank you

28:41

for joining today.

28:43

[Music]

28:43

[Applause]

Interactive Summary

Thibaud introduces Codex as an AI software engineer, describing it as a human teammate for pair programming and delegation, which has experienced tenfold growth due to recent updates. Codex now functions across various environments like IDEs, terminals, GitHub, web, and mobile. Key technical improvements include an overhauled agent with the GPT-5 Codex model (smarter, better code style, adaptive thinking) and a rewritten harness supporting planning and long conversations. The CLI was revamped, an IDE native extension was launched (gaining 100k users in a week), and Codex Cloud tasks run 90% faster with visual verification capabilities. Codex also integrates with tools like GitHub and Slack for task delegation and solution generation. A significant feature is its ultra-thorough code review capability, powered by GPT-5 Codex, trained to find critical technical bugs, which can be automated on GitHub PRs or used locally. Internally at OpenAI, 92% of technical staff use Codex daily, leading to a 70% increase in PR submissions and almost all PRs being reviewed by Codex. Demonstrations included Nacho Soto using Codex for UI development with visual snapshot verification, Freal leveraging it for complex, sustained refactors over many hours using detailed "exec plans," and Daniel showcasing local code reviews with automatic bug fixing and re-review cycles. Users are encouraged to install Codex via npm to use its features immediately.
