Context Engineering & Coding Agents with Cursor

Transcript

0:07

[Applause]

0:09

I'm Lee and I'm on the Cursor team and

0:11

I'm going to talk about how building

0:13

software has evolved. So, thanks for

0:14

being here.

0:16

We started with punch cards and

0:18

terminals back in the 60s where

0:20

programming was this new superpower, but

0:22

it was inaccessible to most people. And

0:25

then in the 70s, programmers grew up

0:28

writing BASIC on their Apple IIs and

0:30

their Commodore 64s. Then in the 80s,

0:34

GUIs started to get mainstream, but

0:36

still most programming was done on

0:38

text-based terminals. It wasn't until the

0:42

'90s and the 2000s that we started to see

0:44

programming shift to graphical

0:46

interfaces. So FrontPage and

0:48

Dreamweaver, which you might remember,

0:50

allowed beginners to drag and drop and

0:53

build websites. And new editors and

0:55

IDEs like Visual Studio made it easier

0:58

for professionals to work in very large

1:00

codebases. And I, of course, had to add

1:02

my favorite text editor, Sublime Text

1:04

here. I'm sure some of you have used it

1:06

before. It's a good one. Now with AI,

1:09

building software is becoming more

1:11

accessible and powerful than ever.

1:13

Unlike this slower shift from terminals

1:16

to GUIs, the shift to writing code with AI is

1:20

really being speedrun. The progress of

1:23

decades is happening in just a few

1:25

years. And with each iteration, the

1:28

interface and the UX are changing to

1:30

allow the models to achieve more

1:32

ambitious tasks. So I'd like to talk

1:35

about context engineering and how coding

1:38

agents have evolved over the past few

1:40

years from the perspective of Cursor.

1:42

I'll show how we've gone from

1:44

autocompleting your next action to fully

1:47

autonomous coding agents. And finally,

1:49

we'll have Michael, Cursor's CEO, talk about

1:51

the future of where we believe software

1:54

engineering is headed.

1:56

So, let's start with Tab. One of the

1:59

products that inspired Cursor was GitHub

2:02

Copilot. It showed that with

2:04

improvements to the UX of autocomplete

2:07

and with better models, we can make

2:09

writing code much easier. We released

2:12

the first version of Tab back in 2023

2:15

and the experience has evolved from

2:17

predicting the next word to the next

2:19

line and then ultimately to where your

2:21

cursor is going to go next. Tab now

2:24

handles over 400 million requests per

2:27

day. And this means we have a lot of

2:29

data about which suggestions users

2:31

accept and reject. This led to us moving

2:35

from an off-the-shelf model to training

2:38

a model specialized for next-action

2:40

prediction. So to improve this model, we

2:44

use data to positively reinforce

2:46

behaviors that lead to accepted

2:48

suggestions and then negatively

2:50

reinforce rejected suggestions. And

2:53

we're able to do this in near real time.

2:55

So you can accept a suggestion and then

2:58

30 minutes later the Tab model has been

3:00

updated using online RL based on your

3:03

feedback.
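
To make the accept/reject feedback loop concrete, here is a minimal sketch. It is not Cursor's actual training code. It assumes a reward of +1 for an accepted suggestion and -1 for a rejected one, and applies a REINFORCE-style update to a tiny logistic policy that decides whether to show a suggestion. The feature vector, learning rate, and function names are all hypothetical.

```python
import math


def show_probability(weights, features):
    """Probability that the policy shows a suggestion (logistic model)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update_on_feedback(weights, features, accepted, lr=0.1):
    """One policy-gradient step for a suggestion that was shown.

    Assumed reward scheme: +1 if the user accepted, -1 if they rejected.
    For a logistic policy, the gradient of log p(show) with respect to
    the weights is (1 - p) * features.
    """
    reward = 1.0 if accepted else -1.0
    p = show_probability(weights, features)
    return [w + lr * reward * (1.0 - p) * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
features = [1.0, 0.5]  # hypothetical context features
before = show_probability(weights, features)
after_accept = show_probability(
    update_on_feedback(weights, features, accepted=True), features
)
after_reject = show_probability(
    update_on_feedback(weights, features, accepted=False), features
)
print(before, after_accept, after_reject)
```

An accept nudges the policy toward showing similar suggestions again, and a reject nudges it away.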

3:05

Getting this experience right has taken

3:08

a lot of iterations. There's a delicate

3:10

balance between the speed of the

3:12

suggestion, the quality of the

3:14

suggestion, and also just the general UX

3:16

for how it's displayed. If it's slower

3:19

than 200 milliseconds, it kind of takes

3:21

you out of your flow. But you also don't

3:23

want to see fast unhelpful suggestions.

3:26

So with our latest release now, we show

3:28

fewer suggestions, but we have higher

3:30

confidence that they're going to be

3:32

accepted.
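
One way to picture "fewer suggestions, higher confidence" is a simple gate on predicted acceptance and latency. The sketch below is an illustration only: the 200 ms budget comes from the talk, while the 0.6 confidence threshold, the `Suggestion` type, and the `should_show` function are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str
    accept_probability: float  # model's estimate that the user will accept
    latency_ms: float          # time taken to produce the suggestion


def should_show(s, min_confidence=0.6, latency_budget_ms=200.0):
    """Show only suggestions that are fast enough and likely to be accepted."""
    return s.latency_ms <= latency_budget_ms and s.accept_probability >= min_confidence


candidates = [
    Suggestion("return user.id", accept_probability=0.9, latency_ms=80),
    Suggestion("return None", accept_probability=0.3, latency_ms=40),    # fast but unhelpful
    Suggestion("return user.name", accept_probability=0.8, latency_ms=350),  # good but too slow
]
shown = [s.text for s in candidates if should_show(s)]
print(shown)
```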

3:34

We find Tab really helpful for domains

3:36

where AI models just aren't as helpful

3:39

yet. And the bottleneck here really is

3:41

your own typing speed. Now, most people

3:43

type at about 40 words per minute, even

3:46

though I'm sure all of you type at 90

3:48

plus, right? We've got some amazing

3:49

typists in here.

3:51

So what would it look like if we allow

3:53

the AI models to write more code for us?

3:57

This is where coding agents come in and

3:59

this is that next evolution of coding

4:01

with AI. You can talk to models directly

4:04

in products like Cursor or, like we saw

4:06

in Codex, and have them create or update

4:09

entire blocks of code.

4:12

Something we've tried really hard to

4:14

make a focus in Cursor is giving you

4:16

control over the level of autonomy of

4:18

working with the models. So, one of the

4:20

first features we added back in 2023 was

4:23

prompting models to add inline

4:25

suggestions. This would take your

4:27

current line as well as the broader file

4:29

context and then pass it to the model to

4:32

suggest a diff.

4:34

Shortly after, we released our first

4:36

steps towards a coding agent, which was

4:38

a feature called Composer, which some of

4:41

the longtime Cursor users may

4:43

remember. We even have a pixelated

4:46

Twitter demo that I've included here of

4:48

one of the first versions. This made it

4:50

much easier to do multi-file edits with

4:53

more of a conversational UI.

4:55

And then in 2024, we added a fully

4:58

autonomous coding agent. This saw models

5:01

use more tokens as they were getting

5:03

better at tool calling and it allowed

5:05

Cursor to gather its own context.

5:08

So in the previous versions, you had to

5:10

provide all of that context up front,

5:12

which was a bit more difficult. So let's

5:14

talk about some of the ways that we've

5:16

optimized the Cursor agent harness.

5:19

There's been a lot of talk recently

5:21

about context engineering as an

5:23

evolution of prompt engineering, which I

5:26

personally find really helpful. As

5:28

models are getting better, getting high

5:30

quality output is less about specific

5:33

prompting tricks, although those can

5:35

still help, but it's more about giving

5:37

the models the right context. And not

5:40

just any context, but intentional

5:42

context. Models get worse at recalling

5:45

information as the size of the context

5:47

increases. And in reality, you don't

5:50

want to push the limits of the context

5:52

window. You want to use a minimal amount

5:54

of high-quality tokens. And this is why

5:57

the retrieval of code is actually really

5:59

important and fundamental to context

6:01

engineering. So let's look at an example

6:03

of searching code in a larger codebase.

6:07

We found that when you give models very

6:09

powerful tools, it can significantly

6:11

improve the rate at which code is

6:13

accepted.

6:15

Many coding agents now use commands like

6:17

grep or ripgrep to look for direct string

6:21

matches across files and directories.

6:23

And as new models are trained on tool

6:25

calling and agents get better at using

6:27

tools, the search quality does improve.

6:30

However, we found that you can make

6:32

searching even better by automatically

6:34

indexing your codebase and creating

6:36

embeddings. So this allows us to have

6:38

semantic search. So I can ask the agent

6:41

to "update the top navigation," and even if the

6:44

file is actually called header.tsx,

6:46

semantic search allows the agent to go

6:48

and quickly and accurately find the

6:51

correct code during the retrieval

6:52

process
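
The header.tsx example can be sketched in a few lines. This is a toy, not Cursor's indexing pipeline: the `CONCEPTS` table is a hypothetical stand-in for a learned embedding model, and the file names and summaries are made up. It shows the index being built ahead of time, and a query that a literal string match misses but a similarity search finds.

```python
import math

# Hypothetical concept table standing in for a learned embedding model:
# words with similar meaning map to the same dimension.
CONCEPTS = {
    "header": 0, "navigation": 0, "nav": 0, "menu": 0,
    "footer": 1, "copyright": 1,
    "billing": 2, "invoice": 2, "price": 2,
}


def embed(text):
    """Map text to a small concept-count vector."""
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().replace(".", " ").split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


FILES = {
    "header.tsx": "header component with logo and menu links",
    "footer.tsx": "footer component with copyright notice",
    "billing.ts": "invoice and price calculation helpers",
}

# Built once, ahead of time (the "offline" indexing step).
INDEX = {path: embed(f"{path} {summary}") for path, summary in FILES.items()}


def grep_search(term):
    """Literal substring match over paths and summaries."""
    return [p for p, summary in FILES.items() if term in p or term in summary]


def semantic_search(query):
    """Return the indexed path whose vector is closest to the query."""
    q = embed(query)
    return max(INDEX, key=lambda path: cosine(q, INDEX[path]))


literal_hits = grep_search("navigation")
semantic_hit = semantic_search("update the top navigation")
print(literal_hits, semantic_hit)
```

At query time only the query is embedded; the per-file vectors were already computed during indexing.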

6:54

For generating embeddings, we also moved

6:56

from an off-the-shelf embedding model to

6:58

training a custom model that helped us

7:00

produce more accurate results and we

7:02

constantly A/B test the performance of

7:04

using semantic search. We found that in

7:07

comparison, when using grep alone, users

7:10

would send more follow-up questions and

7:12

also spend more tokens. So semantic

7:14

search is really helpful. One of the

7:17

biggest wins though is it shifts where

7:19

the compute happens. You spend the

7:21

compute and the latency upfront during

7:24

the indexing rather than at inference

7:26

time when the agent is actually being

7:28

invoked. So in other words, you're doing

7:30

the heavy lifting offline, which means

7:32

you can get faster and cheaper responses

7:34

at runtime without sacrificing

7:36

performance and putting that on the

7:38

user. So the takeaway here is you likely

7:40

want both grep and semantic search for the

7:43

best results. And we'll have a full blog

7:45

post soon that talks about some of these

7:47

results. So giving the models better

7:50

tools helps improve their quality. But

7:53

what about the UX of actually using

7:55

these coding agents? There's been a lot

7:57

of exploration with coding CLIs from

7:59

OpenAI's Codex to Claude to Cursor's

8:02

own CLI. And the idea here is to find

8:05

the most minimal abstraction over the

8:07

model, kind of iterate on the harness

8:10

and then make the agent extensible. But

8:12

we don't believe CLIs are the final

8:15

state or the end goal of working with

8:16

coding agents. What I like about the

8:19

terminal is that it opens up a new

8:21

surface for coding agents to run. So

8:23

this can be in the CLI. It can also be

8:26

on the web or from your phone. It can be

8:28

from a bug report in Slack, which I use

8:30

all the time. It can be from a backlog

8:33

item in Linear, just automatically

8:34

triaged for you.

8:37

Because CLI-based agents are scriptable,

8:40

you can use them in any type of

8:41

environment which is really helpful. We

8:43

use this internally to automatically

8:45

write docs or update parts of our

8:47

codebase. And it can be as simple as

8:49

just doing cursor -p and then a prompt

8:52

and having text or even structured

8:53

formats like JSON come back.
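
Scripting a CLI agent this way might look like the sketch below. It deliberately does not assume any real CLI's flags: the caller supplies the command, and a stand-in "agent" (a Python one-liner that echoes the prompt back as JSON) is used so the example runs anywhere. The `run_agent` helper and the JSON shape are hypothetical.

```python
import json
import subprocess
import sys


def run_agent(command, prompt):
    """Run a CLI agent non-interactively and parse its stdout as JSON.

    `command` is the executable plus any flags, supplied by the caller;
    the prompt is appended as the final argument.
    """
    result = subprocess.run(
        [*command, prompt], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


# Stand-in agent: replace with the real CLI invocation in practice.
fake_agent = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'prompt': sys.argv[1], 'status': 'ok'}))",
]

reply = run_agent(fake_agent, "write docs for the auth module")
print(reply["status"])
```

Because the result is structured data, it can feed directly into other scripts, such as a docs job or a CI step.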

8:56

We also believe that you'll need more

8:58

specialized agents, which makes sense

9:01

when you see the keynote today. Last

9:03

year, we started experimenting with

9:04

using AI models to read and review code

9:07

instead of just writing and editing

9:09

code. And we made an internal tool

9:11

called Bugbot. It tried to help you find

9:14

meaningful logic bugs in your code. And

9:17

after using it internally for about 6

9:19

months, we found that it actually caught

9:22

a lot of bugs that we missed on code

9:24

reviews. So we decided to make it public

9:26

and funnily enough it actually caught a

9:28

bug that took down Bugbot itself, which

9:32

of course we accidentally ignored. So we

9:34

learned to then really pay attention to

9:36

those Bugbot comments.

9:38

Newer models are also getting very good

9:40

at longer horizon tasks. So one way

9:43

we've pushed agents to run longer inside

9:46

of Cursor is having them plan and do

9:48

more research upfront. This not only

9:50

gives you a chance to verify the

9:52

requirements of what you're trying to

9:54

build and course correct along the way,

9:56

but we've also seen it significantly

9:58

improve the quality of the code

10:00

generated, which makes sense, right?

10:02

You're giving the models much higher

10:04

quality input context. And to do this

10:06

well, it's more than a simple prompt

10:08

change like "plan better"; you

10:10

actually need to have deeper product

10:12

integration in how you store the plans,

10:14

how you edit the files, and also giving

10:16

the model new tools.

10:19

It also makes sense to allow the agent

10:21

to create and manage a to-do list. This

10:24

gives the model the critical context so

10:26

it doesn't forget the task it's working

10:28

on or waste tokens. And it's like they

10:30

can have notes that they can constantly

10:32

reference. One area we're still

10:35

exploring is taking your to-dos and

10:37

making them have the same source of

10:39

truth, which is your codebase, which I

10:41

know is something that I would

10:42

personally use for smaller projects

10:44

where maybe I don't need a fully

10:45

featured task management tool.
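
A to-do list for an agent can be very small. The sketch below is an assumption about the shape of such a tool, not Cursor's implementation: the agent adds and completes tasks, and a compact rendering is placed back into the model's context each turn so the plan is always in view. The `TodoList` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TodoList:
    # Maps task description to "pending" or "done"; dicts keep insertion order.
    items: dict = field(default_factory=dict)

    def add(self, task):
        self.items.setdefault(task, "pending")

    def complete(self, task):
        if task not in self.items:
            raise KeyError(f"unknown task: {task}")
        self.items[task] = "done"

    def render(self):
        """Compact text block to include in the model's context each turn."""
        return "\n".join(
            f"[{'x' if status == 'done' else ' '}] {task}"
            for task, status in self.items.items()
        )


todos = TodoList()
todos.add("find the header component")
todos.add("rename the nav links")
todos.complete("find the header component")
rendered = todos.render()
print(rendered)
```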

10:48

Another important part of agent

10:50

extensibility is allowing you to package

10:53

up your workflows and then share them

10:55

with your team. So custom commands are a

10:58

way to share prompts and then rules

11:01

allow you to include important context

11:03

in every single agent conversation. One

11:05

way our engineers have found this really

11:07

helpful internally is packaging up our

11:09

commit standards and guidelines, putting

11:12

them in /commit, and then being able

11:14

to pass in tickets like you pass in the

11:16

Linear ticket that you're working on.

11:18

Another thing that I've noticed is that

11:20

a lot of the context engineering

11:22

breakthroughs actually happen in user

11:24

space first. So all of you, the power

11:27

users, figure out the workflows and the

11:28

patterns that actually work really well

11:31

and then as they get adopted they make

11:33

their way back into the core product as

11:35

features. So we see this with plans,

11:37

memories, and rules, which are really all like

11:39

this.

11:41

Speaking of teams, you want to trust

11:43

these agents to write code for you. But

11:45

that requires keeping a human in the

11:47

loop. Which is why when the agent tries

11:50

to run shell commands, Cursor will ask

11:52

you if you would like to run it just

11:53

once or if you're comfortable, you can

11:55

add it to the allow list to auto run in

11:58

the future. And all these settings can

11:59

be stored in code and explicitly shared

12:02

with your team, including blocking

12:04

certain shell commands or actions. Our

12:06

latest release also has custom hooks, so

12:09

you can tap into every part of the

12:11

agent's run. Maybe you want to have a

12:13

shell script that runs when the agent

12:15

finishes, for example.
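
The run-once, allow-list, and block-list behavior can be sketched as a small decision function. This is a simplified illustration, not Cursor's logic or settings format: it checks only the program name (the first token), so a production version would also need to handle chained commands, pipes, and subshells. The `decide` function and the example lists are hypothetical.

```python
import shlex


def decide(command, allow, block):
    """Return 'block', 'run', or 'ask' for a shell command.

    Only the program name (first token) is checked, and the block list
    takes priority over the allow list.
    """
    tokens = shlex.split(command)
    program = tokens[0] if tokens else ""
    if program in block:
        return "block"
    if program in allow:
        return "run"
    return "ask"  # fall back to asking the human


allow = {"ls", "git"}
block = {"rm"}

decisions = [
    decide(cmd, allow, block)
    for cmd in ("git status", "rm -rf build", "npm test")
]

# The user approves npm for future runs, so it is added to the allow list.
allow.add("npm")
after_approval = decide("npm test", allow, block)
print(decisions, after_approval)
```

Keeping the two lists as plain data is what makes it possible to store them in the repository and share them with a team.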

12:17

So, we've covered a lot of ground here.

12:19

Coding agents have evolved quite a bit

12:21

in the past year, and they're getting

12:23

better and better when you give them

12:24

very powerful tools. And as the models

12:27

have got more capable, we've actually

12:29

been able to remove overly precise

12:32

instructions from our system prompts

12:34

that just weren't necessary anymore. So,

12:37

what would it look like if we allowed

12:39

agents to run for significantly longer?

12:42

What is the right interface for managing

12:44

multiple coding agents?

12:47

If you're just getting started coding

12:49

with agents, I don't recommend

12:51

immediately trying to juggle multiple

12:53

agents. I mean, let's be honest, are we

12:56

really being productive running nine

12:58

CLIs in parallel?

13:00

Probably not.

13:03

Probably not yet, though. I mean, not

13:05

only do you need to set up your local

13:06

machine for running parallel agents, but

13:09

it's also kind of hard to review the

13:10

output of all of these agents. So, we

13:12

don't think that this form factor is the

13:14

end goal or the end state, but there is

13:16

promise here. One thing we've been

13:19

dogfooding over the past few months is a

13:21

new type of interface for managing

13:23

multiple coding agents. And we found

13:25

this really helpful internally when

13:28

maybe you have an agent in the

13:29

foreground, but you need to ask

13:31

questions about the codebase or maybe do

13:33

some research about tools you want to

13:35

integrate or small refactors. When you

13:38

have this fast coding model in the

13:40

foreground, you can really stay in the

13:41

flow and then you have your parallel

13:43

agents kind of run other tasks in the

13:45

background which could run for much

13:47

longer. Those could be in the foreground

13:50

on your machine. They can be in the

13:52

background on the cloud. Each one of

13:54

these decisions has unique constraints

13:56

that right now you have to think about

13:57

and spend a lot of time on. If you're in

13:59

the cloud, you get these sandboxed virtual

14:02

machines, which are really nice for very

14:04

long horizon tasks, but the trade-off is

14:07

that it usually takes longer to boot up

14:09

and you have to set up some initial

14:10

configuration with the environment that

14:12

you're working in. But running agents

14:15

locally in parallel is kind of a

14:17

different type of isolation. If you have

14:19

multiple agents that are trying to

14:21

modify the same set of files on your

14:23

local machine, you need to have tools

14:26

like Git worktrees that allow you to

14:28

have different copies of your codebase

14:30

where they can run independently. And

14:32

then you also have to think about all

14:34

the other parts of local dev like

14:36

managing access to your database and

14:39

viewing the worktrees on different

14:40

ports. Today, and I talked to some

14:42

developers earlier, a lot of this is

14:44

happening in userland and people are

14:47

writing scripts and hacks to make this

14:48

work really well. And what we're working

14:50

on and exploring is actually building

14:52

this natively into the Cursor product.
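
The kind of userland script described here might start from something like the sketch below. It only plans the setup and does not run anything: for each agent it produces a `git worktree add -b <branch> <path>` command (to be run from inside the repository) and a distinct dev-server port. The directory layout, branch naming, and base port are assumptions.

```python
from pathlib import PurePosixPath


def plan_worktrees(repo_dir, agent_names, base_port=3000):
    """Plan one Git worktree and one dev-server port per agent.

    Each worktree is a sibling directory of the repository with its own
    branch, so agents can edit the same files without colliding.
    """
    repo = PurePosixPath(repo_dir)
    plans = []
    for offset, name in enumerate(agent_names):
        path = repo.parent / f"{repo.name}-{name}"
        plans.append({
            "agent": name,
            # Creates branch agent/<name> and checks it out at <path>.
            "command": ["git", "worktree", "add", "-b", f"agent/{name}", str(path)],
            "port": base_port + offset,
        })
    return plans


plans = plan_worktrees("/home/me/app", ["docs", "refactor"])
for plan in plans:
    print(plan["port"], " ".join(plan["command"]))
```

Shared resources such as a local database still need separate handling; this covers only files and ports.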

14:55

Another idea that we've started to

14:57

explore for multiple agents is being

14:59

able to have the models compete against

15:00

each other. So what if you had GPT-5 high

15:03

reasoning versus medium or low reasoning

15:06

and then you can pick the best result or

15:08

compare results across different model

15:10

providers with Cursor's agent. This will

15:13

soon be an option to go from one to n

15:16

for any given prompt and any models.
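
Going from one to n can be sketched as running the same prompt through several runners in parallel and keeping the best result. Everything here is a placeholder: the runners are plain functions standing in for model calls, and scoring by output length stands in for a real signal such as passing tests or a human picking the winner.

```python
from concurrent.futures import ThreadPoolExecutor


def best_of_n(prompt, runners, score):
    """Run one prompt through every runner in parallel.

    Returns (name, output) for the output with the highest score.
    """
    with ThreadPoolExecutor(max_workers=len(runners)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in runners.items()}
        results = {name: future.result() for name, future in futures.items()}
    best = max(results, key=lambda name: score(results[name]))
    return best, results[best]


# Placeholder runners standing in for different models or reasoning levels.
runners = {
    "high": lambda p: f"{p}: detailed patch with tests",
    "medium": lambda p: f"{p}: patch",
    "low": lambda p: p,
}

winner, output = best_of_n("fix login bug", runners, score=len)
print(winner, output)
```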

15:20

Part of context engineering for agents

15:22

is making it so they can check their own

15:24

work. So the agent needs to be able to

15:27

run the code, test it, and then verify

15:29

it's actually working correctly, which

15:31

is why we're exploring giving the agent

15:33

computer use. They can then control a

15:35

browser to view network requests or take

15:39

snapshots of the DOM and even give

15:41

feedback about the design of the page.

15:45

As you can tell, there's still a lot to

15:47

figure out on the right interface, the

15:50

right product experience for managing

15:51

multiple coding agents. Some of the

15:53

things I just showed are available in

15:55

Cursor today in beta. So go try them out

15:57

if you're curious. And we'll have a

15:59

stable release later this month. But I

16:01

would love to hear your feedback on how

16:02

you want to work with coding agents in

16:04

the future. So come find me later and we

16:06

can talk about it. And speaking of the

16:08

future, I'd like to welcome Michael to

16:09

the stage to talk about where software

16:11

engineering is headed next.

16:17

Thanks, Lee. Our goal with Cursor is to

16:21

automate coding. We think that half of

16:24

that is a model problem and an autonomy

16:27

problem. And we think that half of that

16:29

is a human-computer interaction problem

16:32

of what the act of building software

16:33

looks like. We want engineers to be more

16:36

ambitious, more inventive, and more

16:39

fulfilled. And today I want to hint a

16:41

little bit at the picture of the future

16:43

that I think we can create together, one

16:46

where AI frees up more time to work on

16:49

the parts of building software that you

16:51

love.

16:53

Imagine waking up in the morning,

16:54

opening Cursor, and seeing that all of

16:57

your tedious work has already been

16:58

handled. On-call issues were fixed and

17:01

triaged overnight. Boilerplate you

17:04

never wanted to write was generated,

17:06

tested, and ready to merge. A world

17:08

where code review is actually fun, too.

17:12

Instead of being buried in your busy

17:14

work, your energy goes toward the things

17:16

that drew you to engineering in the

17:18

first place, solving hard problems,

17:21

designing beautiful systems, and

17:23

building things that matter.

17:25

Imagine agents that deeply understand

17:28

your codebase, your team's style, and your

17:32

product sense. Agents that come back to

17:34

you after working for long, long, long

17:36

periods of time and show their work in

17:38

higher level programming languages.

17:40

Agents that propose ideas, help you

17:43

explore new directions, break down

17:45

complex projects into pieces you can

17:47

accept, reject, or refine. Ones that

17:50

extend your ambition, but never take

17:52

away your thinking and judgment. When

17:54

you have a problem too complex for

17:57

agents, they show you what they tried,

17:59

pulling in runtime logs or debugging

18:01

tools. You'll never start from scratch.

18:05

This is the future we're working

18:06

towards, a world where building software

18:09

feels less like toil and much more like

18:11

play and where creativity is the focus.

18:15

And I think it's possible sooner

18:16

than even some of the most ambitious

18:18

people in this room think.

18:20

If this vision excites you, we'd

18:22

love to chat. And if you haven't tried

18:24

Cursor, we've been shipping lots of

18:26

improvements to our agent and to our

18:28

editor. We'd love to hear what you

18:29

think. Thank you.

18:31

[Applause]

Interactive Summary

The presentation discusses the evolution of software development from punch cards in the 1960s to the current era of AI-driven coding. It highlights how AI is rapidly accelerating the accessibility and power of software building, contrasting with the slower shifts of the past. Lee from the Cursor team details Cursor's contributions, including 'Tab' for next-action prediction and the development of fully autonomous coding agents. Key innovations such as context engineering, leveraging semantic search for efficient code retrieval, and enabling agents to plan and manage tasks are explained. The talk also explores the challenges and future possibilities of managing multiple specialized agents, envisioning a future where AI handles tedious work, allowing engineers to focus on more creative and ambitious problem-solving.