New AI coding paradigm - OpenAI Symphony

Transcript

0:00

So, OpenAI just released this

0:01

open-source repo called Symphony. On the

0:03

surface level, it looks like a

0:05

orchestrator that allows you to manage

0:07

coding agents through a ticket tracker

0:08

like Linear, but it is a lot more than

0:10

just connecting to Linear. It's a totally

0:13

different way of interacting with

0:14

agents. So, the way we use coding agents

0:16

has shifted a lot over the past few

0:18

months. From initially just the

0:19

auto-complete to primarily interactive

0:21

sessions with a coding agent, to now most of

0:23

us running two or three different

0:25

sessions in parallel, each working in an

0:27

isolated worktree for different

0:28

features or bug fixes. And then new

0:30

tooling like Superset or Conductor

0:32

has been introduced to help you run and

0:34

manage different interactive coding

0:36

sessions more easily. The problem is that

0:38

even with those tools, many people,

0:40

including myself, will feel this burden

0:42

when we are working on more than like

0:43

three different sessions, because we just

0:45

can't context switch every minute. And I

0:47

personally have, multiple times,

0:49

sent the wrong instruction to the

0:50

wrong thread. So, the ceiling of how much we

0:52

can get out of those coding agents is

0:54

no longer the model capability, but our

0:57

own attention and cognitive load. And

0:59

the reason project Symphony is so

1:00

interesting is that OpenAI's

1:02

engineering team had this realization

1:03

that the current experience has been

1:05

oriented around coding sessions and merged

1:07

PRs, but in reality for the past

1:09

decades, software workflows have been largely

1:11

organized around deliverables, things

1:13

like issues, tasks, tickets, milestones.

1:15

Engineering leaders have been managing

1:17

massive amounts of tasks across thousands

1:19

of workers, not by reviewing everyone's

1:21

PR, but by looking at final outcomes using

1:23

tools like Linear and Atlassian. And

1:25

OpenAI's proposed solution is to move

1:27

the human up a level. Instead of managing

1:29

two or three interactive sessions, you

1:31

manage tickets. The agent works at the

1:33

ticket level, reports back through the

1:35

ticket itself, and you stay in the loop

1:37

without monitoring individual sessions.

1:38

The ticket tracker becomes the state machine

1:40

itself. And the way Symphony makes this

1:42

work is almost embarrassingly simple,

1:44

but very effective. It's a background

1:46

process. You run it once, point it to a

1:48

workflow file, which we'll talk about a bit

1:49

more, and then it runs forever. Every 30

1:51

seconds, this background process will

1:53

glance through your Linear board. If it

1:55

finds any ticket in the to-do status, it will

1:57

set up an isolated workspace and start

1:59

an agent in that workspace. And the whole

2:01

system has three key components. One is

2:03

the scheduler, the background process

2:05

that polls ticket data, sets up

2:07

workspaces, and manages the session life cycle,

2:09

and a workflow.md file that lives inside

2:11

your repo. It contains the configuration of the

2:13

scheduler and detailed instructions for the

2:15

coding agent to know how to work with

2:16

the ticketing system. And the

2:18

external system, like Linear, is a durable

2:20

state machine for humans to interact with the

2:22

agent. And this whole setup is actually

2:23

very flexible. You don't have to use

2:25

Linear. You don't have to use Codex.

2:27

You can actually customize it to whatever

2:28

you want. But the overall implementation

2:30

concept is what's interesting. And the

2:31

most interesting part is this

2:32

workflow.md file. It basically breaks

2:35

down into two parts. The top part is the

2:36

YAML front matter. It configures

2:38

the scheduler directly, like which Linear

2:40

project it is, what type of ticket it

2:41

should pick up, where the agent should

2:43

create isolated workspace, and even

2:45

programmatic hooks to run after it sets

2:47

up the workspace. And this is very

2:48

useful, so you no longer need to rely on

2:50

the agent to set those things up. As well as

2:52

how many agents can be run in parallel

2:54

and specific agent settings. And after

2:56

that, the bottom half is a Markdown

2:58

body. This is the prompt the agent gets every

3:00

single turn, with the ticket details rendered in. It's a

3:03

standard operating procedure for

3:04

handling tickets in this repo. How

3:06

should the agent plan a task? How should the agent

3:08

go validate its work? And what would be

3:10

considered as done? And when should

3:11

it reach out for human review? And what I

3:13

love about this design is that the same

3:15

file just lives inside your repo, so it's

3:17

version controlled and can be changed

3:19

through a normal pull request. And the

3:20

file itself contains some programmatic

3:22

rules that control the scheduler and also

3:24

what an agent does. There's no separate

3:26

config service, no admin panel, no UI at

3:28

all. And the team only code base on this

3:30

workflow. So, when you onboard a new

3:32

agent capability or add a new step in

3:34

the process, you just very easily change

3:36

this markdown file, and the rest will

3:37

just follow. And this whole system is

3:39

designed to be very flexible. You don't have

3:41

to use Codex, and you don't have to use

3:43

Linear. They have one example

3:44

implementation in Elixir, which is a

3:46

programming language. But they have this

3:48

spec.md file that details how this

3:50

framework or system is designed. So, you

3:53

can just drop this file to any coding

3:55

agent and ask it to build and design a

3:57

system in any programming language.
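
To make the scheduler idea concrete, here is a minimal Python sketch of one polling pass as described earlier: look at the board, pick up tickets in to-do, respect a cap on parallel agents, and give each ticket its own workspace. It is not taken from the Symphony repo or its spec. `Ticket`, `pick_tickets`, `workspace_for`, `poll_once`, the `"Todo"` status string, and the workspace path layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Ticket:
    """Hypothetical, minimal view of a tracker ticket."""
    id: str
    status: str


def pick_tickets(tickets, running_ids, max_agents):
    """Return the to-do tickets that should get a new agent on this pass."""
    free_slots = max(0, max_agents - len(running_ids))
    waiting = [t for t in tickets
               if t.status == "Todo" and t.id not in running_ids]
    return waiting[:free_slots]


def workspace_for(root, ticket):
    """One isolated directory per ticket (the layout is an assumption)."""
    return str(PurePosixPath(root) / ticket.id)


def poll_once(fetch_tickets, start_agent, running_ids, max_agents, root):
    """One scheduler pass; a real loop would repeat this every 30 seconds."""
    for ticket in pick_tickets(fetch_tickets(), running_ids, max_agents):
        start_agent(ticket, workspace_for(root, ticket))
        running_ids.add(ticket.id)
```

In this sketch `fetch_tickets` and `start_agent` are passed in as plain callables, so the same pass could sit on top of Linear, Jira, or any other tracker, and any agent harness.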

3:59

There are already a lot of different

4:00

community attempts. Like someone

4:01

building a custom TUI based on the task

4:03

data. And also another person already

4:05

rebuilt it to support Claude Code as the

4:07

agent harness. And I'm going to show you

4:09

step by step how you can set these

4:10

things up. But orchestrating agents is

4:12

only part of the work. As OpenAI

4:14

mentioned, this whole thing only works

4:16

if your coding agent's environment is

4:18

set up properly in a way that it can

4:20

complete tickets end-to-end autonomously,

4:22

which you can call harness engineering,

4:23

but fundamentally it's just whether your

4:25

environment or code base has been set up

4:27

in the right way, so the agent has

4:29

everything it needs to complete tasks

4:30

end-to-end. Typical questions are: is

4:32

the system bootable, so the agent can just

4:34

run a script to get everything set up

4:36

without spending time to figure that

4:37

part out. And does the system have a

4:39

proper documentation structure for

4:40

different things. And I think most

4:42

people do have these two things

4:43

properly set up in their CLAUDE.md or

4:45

AGENTS.md file. But the part I think most

4:47

teams haven't set up is the

4:48

self-verifying tools. They allow the agent

4:51

to do an end-to-end test after

4:53

implementing something. And even submit

4:54

a video recording to prove that it has

4:57

tested and it's working in the ticket

4:59

directly just like in their demo. But in

5:01

the docs, they didn't really mention how

5:02

they were handling this part. So, I did

5:04

some research across many major skills.

5:07

And the best one I found is this

5:08

Playwright CLI tool. So, I believe many

5:10

of us are pretty familiar with

5:12

Playwright MCP, which allows the agent to use

5:14

the browser and do a task, check the

5:16

logs. But the problem before was that

5:17

the Playwright MCP setup took a

5:20

huge amount of tokens in the context window

5:21

even when it's not needed. But they have

5:23

released this Playwright CLI tool

5:25

alongside an agent skill that details

5:27

every single command. And the most

5:29

interesting command is this video

5:30

recording CLI. So, Playwright allows the

5:32

agent to run commands like video start

5:35

and video stop to capture browser

5:36

session into an MP4 or WebM video. They

5:39

even have some pretty advanced video

5:41

rendering capability where they can add

5:43

different chapters on the screen. Like

5:45

here's one example video where it can

5:47

record its own session and even add new

5:49

HTML elements on top of the screen to

5:52

annotate the action the agent took. And

5:54

then upload the session into Linear, so you

5:56

can very easily verify if things

5:58

actually work. And as far as I know,

6:00

other tools like Chrome DevTools MCP or

6:02

agent browser don't have this video

6:04

capability out of the box. So, this is one

6:06

very important skill that will make your

6:07

whole experience complete. And

6:08

meanwhile, there are also other skills

6:10

that you should add. And I'll just take one

6:11

of the repos I have as an example. We have

6:13

this Playwright CLI tool that has a

6:15

skill as well as a list of references for the

6:17

agent to know how to like record a video

6:19

and trace the debug logs. And we also

6:21

have a skill here to tell the agent how to

6:23

start the server locally. And because ours

6:25

is pretty straightforward, it's just

6:27

a skill file. But sometimes for more

6:28

complicated things, you can create

6:30

predefined scripts as well. So, the agent no

6:32

longer spends cognitive power on those

6:33

types of stuff. And meanwhile, I also

6:35

created this Linear skill that allows the

6:37

agent to know how to operate Linear

6:38

tickets by using the Linear API, as well as

6:41

things like uploading video evidence of the

6:43

test. And we actually have more

6:44

documentation about different parts of the

6:46

system. And in the AGENTS.md or CLAUDE.md

6:49

file, this is where we have a proper

6:50

index of different documentation

6:52

systems, so you can always go and find

6:53

the relevant information. We also give

6:55

more detailed debugging skills. For

6:57

example, we use Grafana to track and

6:59

store all the logs in production. And

7:00

we added a relevant Grafana log skill in

7:03

our repo, so the agent can fetch real

7:05

production logs for bug fixing. And all

7:07

those things try to serve one

7:08

purpose, which is setting up your code

7:10

base so that your agent can fix bugs,

7:12

build new features, and verify things are

7:14

working fully autonomously end-to-end. I

7:16

put all the skills inside AI Build Club, so

7:18

you can copy-paste and ask your agent to

7:20

customize for your own code base. I put

7:22

the link in the description below, so

7:23

you can join and access. And once you

7:25

set this up, even if you don't use

7:27

Symphony, they're still going to be

7:28

really useful. But after that, this is

7:30

where we can start setting up

7:31

Symphony, connecting to Linear, as well as

7:33

this workflow.md file. So, once you

7:35

clone the Symphony repo, you'll see a

7:37

folder like this. You'll have this

7:38

folder of Elixir. So, this is one

7:40

version implemented in the Elixir programming

7:42

language from OpenAI. And most of the

7:44

time, you can just use this Elixir

7:46

directly. But if you want to customize

7:47

it to, like, connect not to Linear, but

7:49

connect to Trello or Jira, you can ask a

7:51

coding agent to customize it or even

7:54

build it in a different language by

7:55

pointing to the spec.md file. And here's

7:57

basically what I did in the Python folder. I

7:59

just pointed to the spec.md file and asked it to

8:02

build a new version in Python. But most

8:04

of the time, you actually don't need to

8:05

do that. You can just reuse what OpenAI

8:07

provided. And firstly, you can confirm

8:09

whether the script is... So, you can run the

8:10

script by doing this, which points to the

8:13

Symphony program that has been built.

8:14

And run help. So, this should show you

8:16

the actual command about how to run

8:18

Symphony. You basically just do Symphony

8:20

and point to a path to workflow.md file.

8:22

And by default, you can't just run

8:23

Symphony like this. You can run this to

8:25

bind the Symphony command to the specific

8:28

path. So, just run this. And then you

8:30

can do Symphony, point to a specific

8:32

workflow.md file. And by default, it

8:34

will give you this warning. Then you can

8:36

add this argument to the command, which

8:38

will set our Symphony background process

8:39

like this. It will track all the tasks,

8:41

show you the project, and next refresh

8:43

time. It will track the specific Linear

8:45

project you set up every 30 seconds. If

8:47

there is any ticket in to-do, it will pick it

8:49

up and show up in this list. And all

8:51

those configurations are actually

8:52

defined in workflow.md file. So, in

8:54

workflow.md file, in the front matter,

8:56

there is a project slug. And the Symphony

8:58

script will basically read that

9:00

metadata,

9:01

importing information from a specific

9:02

project. Same thing for all the other

9:04

configurations, like how frequently it

9:06

should poll the ticket data, what are the

9:08

things it should do after setting up a

9:09

new workspace, how many agents can be run

9:11

at the same time, and the Codex

9:13

configuration. But once you set up this,

9:15

it's basically monitoring the specific

9:16

Symphony repo with Elixir

9:18

implementation. What we want to do is

9:20

apply this to your own workspace. It's

9:22

actually pretty straightforward. You can

9:24

just open any coding agent like Codex

9:26

or Claude Code, point to the spec.md file

9:28

and say, "I want to set up Symphony for

9:30

my repo, and we will reuse the Elixir

9:32

implementation here, and help me build

9:34

the workflow.md file for my repo." With

9:36

just one command, the coding agent is smart

9:38

enough to look at your own repo and

9:41

design a workflow.md file inside there.
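
Since the workflow file is YAML front matter on top and a Markdown prompt below, reading it comes down to splitting on the `---` fences. Here is a minimal Python sketch of that split, assuming flat `key: value` settings only. A real front matter with nested YAML would need a proper YAML parser, and the keys in `EXAMPLE` (`project_slug`, `poll_interval_seconds`, `max_agents`) are hypothetical placeholders, not Symphony's actual schema.

```python
def split_workflow(text):
    """Split a workflow file into (front matter settings, prompt body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()  # no front matter: the whole file is the prompt
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        raise ValueError("front matter is never closed with '---'")
    settings = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Keep only "key: value" lines and skip comments.
        if sep and not key.lstrip().startswith("#"):
            settings[key.strip()] = value.strip()
    return settings, "\n".join(lines[end + 1:]).strip()


# Hypothetical workflow file; the keys are illustrative only.
EXAMPLE = """---
project_slug: my-project
poll_interval_seconds: 30
max_agents: 2
---
You are working on one ticket. Plan, implement, verify, then ask for review.
"""

settings, prompt = split_workflow(EXAMPLE)
```

Because both halves live in one file, a single pull request can change the scheduler settings and the agent's instructions together.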

9:43

And this is the one it created for me,

9:44

including the project slug and API key

9:47

and all the other configurations. But

9:49

you do need to set up Linear first. If

9:51

you haven't created a Linear account yet,

9:52

just go create one and then add a new

9:54

project. And in this project, click on

9:56

the button here, and you can just paste it into

9:59

your coding agent. This thing in the

10:00

middle here is the project slug, or you

10:03

can manually paste it into the workflow.md

10:04

file as well. And meanwhile, you need to

10:06

get a Linear API key, which you can get

10:08

by clicking on settings, security and

10:10

access, and add a new personal API key

10:12

here. And once you've done that, you should

10:13

run this command, which will save the

10:14

Linear API key globally on your

10:16

computer. So, every time the agent tries

10:18

to use Linear, it can access any

10:20

projects you have access to. And there

10:21

is some configuration you should do,

10:23

which is statuses. So, Symphony out of the box

10:26

is designed for a special status

10:27

control flow, like a human review status

10:30

and also a merging status. Once you put a

10:32

ticket into to-do, Symphony will

10:33

automatically pick it up and put it in

10:35

progress and trigger an agent session.

10:37

And once the agent finishes the work, it will

10:38

change it to the human review status, so that

10:40

you can review the work. And once

10:42

finished, you can set the status to be

10:43

merging, which will trigger the agent to

10:45

automatically raise a PR from this work.
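
The status flow just described is a small state machine with three different actors. A minimal Python sketch, assuming the status names `"Todo"`, `"In Progress"`, `"Human Review"`, and `"Merging"` (your board may spell them differently), and with the hypothetical helper `advance` and actor labels invented for illustration:

```python
# (current status, who acts) -> next status.
# Status strings and actor names are assumptions for illustration.
TRANSITIONS = {
    ("Todo", "scheduler"): "In Progress",    # Symphony picks the ticket up
    ("In Progress", "agent"): "Human Review",  # agent finishes its work
    ("Human Review", "human"): "Merging",    # you approve; agent raises a PR
}


def advance(status, actor):
    """Return the next status, or raise if this actor may not move the ticket."""
    try:
        return TRANSITIONS[(status, actor)]
    except KeyError:
        raise ValueError(
            f"{actor!r} cannot move a ticket out of {status!r}"
        ) from None


# Walk one ticket through the happy path.
status = "Todo"
for actor in ("scheduler", "agent", "human"):
    status = advance(status, actor)
```

Keying the table on the actor as well as the status captures the point of the design: the agent can hand work over for review, but only a human can move a ticket on to merging.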

10:47

And once you've done all that, you can

10:48

run Symphony, pass in your

10:50

workflow.md file, plus this "I understand

10:53

that this will be running without the

10:55

usual guardrails" argument. And now

10:57

Symphony will be working and picking up

10:58

all the tickets in your project here. To

11:00

make it easier, you can also create a

11:01

new view, set up this board, so that you

11:03

get this kind of Kanban experience. But

11:05

to just test, I can just create a ticket:

11:07

change the landing page hero copy from

11:09

"Your company on autopilot" to "Your AI

11:11

growth team", and then set the status

11:14

to be to-do. And this should trigger our

11:15

agent here. If I go back here, you can

11:17

see this time it picked up this ticket,

11:20

and then you can see the agent session

11:21

show up, and then the last agent message

11:23

here. And depending on your settings, you

11:25

can also go check this workspace. You

11:27

can see inside this workspace, it has

11:29

one workspace per ticket. So, each one

11:31

is running in an isolated environment. And

11:33

this example implementation also has a

11:35

kind of web UI dashboard that you can

11:37

visit, and this will list out similar

11:39

information to what you see in the terminal

11:41

here. Not particularly useful, but I

11:43

just thought I'd mention this. And you

11:44

can see after a while, this agent

11:46

changes the ticket to in progress status,

11:48

which is reflected in our Linear board as

11:50

well. And if I click on that, the agent made

11:52

a plan and logged all the steps it did.

11:55

After a few minutes, the agent checked off

11:57

every single item on the checklist, and

11:59

uploaded a video recording to verify

12:02

things are working. And as a human, I

12:03

can just very easily see if things are

12:05

working or not. And once I mark

12:06

something as merging, it will also

12:08

create a PR for me. So, this is the whole

12:09

end-to-end process and how you set

12:11

things up. It definitely feels like the

12:12

future. If you hit any blockers, I have a

12:14

more detailed step-by-step breakdown, as

12:16

well as all the skills posted in the AI

12:18

Build Club. Every week, we have a workshop

12:20

to go through those latest learnings and

12:21

answer any questions. So, if you're

12:23

interested, you can click on the link

12:24

below and join our next batch. But this

12:26

is project Symphony, how it works and

12:28

what the implications are. If you found

12:29

this video useful, please give me a

12:31

subscribe and comment below. Thank you,

12:32

and I'll see you next time.

Interactive Summary

This video introduces OpenAI's open-source project, Symphony, an orchestrator designed to manage coding agents at the ticket level rather than through manual, session-based interaction. By shifting the focus to deliverables like Linear tickets, it reduces cognitive load and allows for autonomous, end-to-end task completion. The system uses a 'workflow.md' configuration file within a repository to define agent behaviors, while also leveraging specific skills—such as Playwright with video recording capabilities—to ensure agents can verify their own work, allowing human developers to manage outcomes instead of micromanaging individual coding sessions.
