Defying Gravity - Kevin Hou, Google DeepMind

Transcript

0:13

[music]

0:21

All right. Hello. Last one of the day.

0:23

Can we get a uh little energy boost?

0:25

Who's ready? Who's ready? [applause]

0:29

All right, happy Friday. I hope everyone

0:31

has had a good week, a good conference.

0:33

Um, and let me tell you, it's been a

0:35

really bad week if you are Gravity.

0:37

Wicked 2 is coming out tonight. And

0:40

then, of course, Antigravity came out

0:42

earlier this week alongside Gemini 3 Pro

0:44

on Tuesday.

0:46

Google Antigravity is a brand new IDE

0:49

out of Google DeepMind. It's the first

0:52

one from a foundational lab and it is

0:54

coming right off the press. In fact, um

0:57

I probably should be working on the

0:58

product right now, but I wanted to spend

0:59

some time to share what we've built here

1:01

today.

1:04

Antigravity is unapologetically

1:06

agent first. And today, I'm going to

1:08

tell you a little bit about what that

1:10

means and how it manifests in the

1:12

product. But perhaps maybe a little bit

1:14

more interestingly, we're going to talk

1:15

a little bit about how we got here.

1:17

Product principles, direction of the

1:18

industry, these sorts of things. Um so

1:21

my name is Kevin Hou. I lead our product

1:22

engineering team at Google Antigravity.

1:25

And let's start with the basics. Um, and

1:28

first just to get a sense of the room.

1:29

Um, who has used Antigravity?

1:33

All right, there you go. Power of

1:34

Google. Love it. Um, who's used the

1:37

agent manager?

1:40

Cool. Nice. Good. Good. All right. So,

1:42

basics of Antigravity.

1:45

Antigravity, notably Antigravity, not

1:47

Anti-Gravity. Antigravity. It's an AI

1:49

developer platform with three surfaces.

1:51

The first one is an editor. The second

1:54

one is a browser and the third one is

1:57

the agent manager. So we'll dive into

1:59

what this means, which one what what

2:01

each looks like. So a paradigm shift

2:04

here is that agents are now living

2:07

outside of your IDE and they can

2:08

interact across many different surfaces

2:10

that your agent or that you as a

2:12

software developer might spend time in.

2:14

And let's start with the agent manager.

2:15

So that's the thing up top. This is your

2:17

central hub. It's an agent first view

2:20

and it pulls you one level higher than

2:22

just looking at your code. So instead of

2:24

looking at diffs, you'll be kind of a

2:27

little bit further back. And at any

2:29

given time, there is one agent manager

2:31

window.

2:32

Now you have an AI editor. This is

2:34

probably what you've grown to love and

2:36

expect. Has all the bells and whistles

2:38

that you would expect. Uh lightning fast

2:40

autocomplete. This is the part where you

2:42

can make your memes about yes, we forked

2:43

VS Code. And it has an agent sidebar.

2:46

And this is the sort of thing it's

2:47

mirrored with the agent manager. And

2:49

this is when you need to dive into your

2:51

editor to accomplish maybe your 80% to

2:53

100% of your task. And at any point, we

2:56

made it very very easy because we

2:57

recognize not everything can be done

2:58

purely with an agent for you to command

3:01

E or control E and hop instantly from

3:04

the editor into the agent manager and

3:06

vice versa. And this takes under 100

3:09

milliseconds. It's zippy. And then

3:11

finally, something that I love, an agent

3:14

controlled browser. This is really,

3:15

really cool. And hopefully for the folks

3:17

in the room that have tried

3:18

Antigravity, you've noticed some of the

3:19

magic that we've put in behind here. So,

3:22

we have an agent controlled Chrome

3:24

browser. And this gives the agent access

3:26

to the richness of the web. And I mean

3:28

that in two ways. The first one, context

3:31

retrieval, right? It has the same

3:32

authentication that you would in your

3:34

normal Chrome. You can give it access to

3:35

your Google Docs. You can give it access

3:37

to, you know, your GitHub dashboards and

3:39

things like that and interact with a

3:40

browser like you would as an engineer.

3:43

But also what you're seeing on the

3:44

screen is that it lets you it lets the

3:46

agent take control of your browser,

3:48

click and scroll and run JavaScript and

3:51

do all the things that you would do to

3:52

test your apps. So here I put together

3:55

this like random artwork generator. All

3:57

you do is refresh and you get a new

3:58

picture of, um, like a piece of

4:01

Thomas Cole artwork. And now we added in

4:03

a new feature which is this little

4:04

modal card. And the agent

4:06

actually went out and said, "Okay, I

4:07

made all the code, but instead of

4:09

showing you a diff of what I did, let's

4:11

instead show you a recording of Chrome."

4:13

So, this is a recording of Chrome where

4:15

the blue circle is the mouse. It's

4:17

moving around the screen, and in this

4:18

way, you get verifiable results. So,

4:21

this is what we're very excited about

4:22

our uh our Chrome browser. And then the

4:25

agent manager can serve as your control

4:27

panel. The editor and the browser are

4:29

tools for your agent. And we want you to

4:32

spend time in the agent manager. And as

4:34

models get better and better, I bet you

4:36

you're going to be spending more and

4:37

more time inside of this agent manager.

4:39

And it has an inbox, and I'll talk a

4:41

little bit about this and sort of why we

4:43

did this, but it lets you manage many

4:46

agents at once. So you can have things

4:49

that require your attention. For

4:50

example, running terminal commands. We

4:52

don't want it to just kind of go off and

4:53

just run every terminal command. There

4:54

are probably some commands that you want

4:55

to make sure you you hit okay on. So

4:57

things like this will get surfaced

4:58

inside of this inbox. One click, you can

5:00

manage many different things happening

5:01

at once.
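
[Editor's sketch, not part of the talk.] The inbox idea described above, where several agents queue up actions that need a human okay, could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the `Inbox` and `Request` classes and the `SAFE_COMMANDS` set are hypothetical names invented for illustration, and nothing here reflects how Antigravity actually implements its inbox.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

# Hypothetical policy: commands an agent may run without asking the human.
# The talk only says that some commands should require an explicit okay.
SAFE_COMMANDS = {"ls", "cat", "pwd"}


@dataclass
class Request:
    id: int
    agent: str
    command: str
    approved: Optional[bool] = None  # None means still waiting on the human


@dataclass
class Inbox:
    requests: List[Request] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def submit(self, agent: str, command: str) -> Request:
        """An agent asks to run a command; safe ones are approved right away."""
        req = Request(next(self._ids), agent, command)
        words = command.split()
        if words and words[0] in SAFE_COMMANDS:
            req.approved = True
        self.requests.append(req)
        return req

    def pending(self) -> List[Request]:
        """Everything, across all agents, that still needs attention."""
        return [r for r in self.requests if r.approved is None]

    def decide(self, request_id: int, approve: bool) -> None:
        """Record the human's one-click decision for a single request."""
        for r in self.requests:
            if r.id == request_id:
                r.approved = approve
                return
        raise KeyError(request_id)
```

The point of the design is that the human looks at one `pending()` list instead of polling each agent's conversation separately.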

5:03

And it has a wonderful OS level

5:05

notification. So if there is something

5:06

that you need, it will sort of let you

5:08

know. And this kind of solves that

5:09

problem of multi-threading across many

5:12

tasks at once. And so our team is

5:14

thrilled to launch this brand new

5:16

product. It's a brand new product

5:18

paradigm. And we did so in conjunction

5:19

with Gemini 3, which was a very exciting

5:21

week for the team. But alas, we ran out

5:25

of capacity.

5:26

[laughter]

5:27

Um, this has been tormenting me the last

5:29

couple of days. And so I apologize. On

5:31

behalf of the Antigravity team, I'd

5:33

like to apologize for our global chip

5:34

shortage. Um, we're working around the

5:36

clock to try and make this work for you.

5:37

Uh, hopefully we'll have a few less of

5:39

these sorts of errors. Um, but we, it's

5:41

what's been really exciting is people

5:43

who have used the product have seen what

5:45

the magic of combining these three

5:46

surfaces can do for your workflows, for

5:48

your software development. Um, so let's

5:50

talk about it. Why did we build the

5:53

product? How did we arrive at this sort

5:55

of conclusion? You might say, "Oh,

5:57

adding in a new window, it's pretty

5:58

pretty random, right?" It's this one to

5:59

many relationship between the agent

6:01

manager and many other surfaces.

6:04

Um, and it's important to remember I've

6:06

I've been at this conference a couple of

6:07

times and and everything every single

6:09

time there is this theme. The product is

6:11

only ever as good as the models that

6:13

power it. And this is very important for

6:15

us as builders, right? Every year there

6:17

is this sort of new step function. The

6:19

first there was a year when it was

6:21

autocomplete, right? Copilot. And this

6:23

this sort of thing was only enabled

6:24

because models suddenly got good at

6:26

doing this short form autocomplete. And

6:28

then we had chat. We had chat with RLHF.

6:30

Then we had agents. So you can see how

6:32

every single one of these product

6:33

paradigms is sort of motivated by some

6:35

change that happens with model

6:36

capabilities. And it's a blessing that

6:39

our team is able to work and be embedded

6:41

inside of DeepMind. We had access to

6:44

Gemini for a couple of months um earlier

6:46

and we were able to work with the

6:47

research team to basically figure out

6:48

you know what are the strengths that we

6:50

want to show off in our product. What

6:51

are the things that we can exploit and

6:52

then also what are the gaps right this

6:54

desired experience where are the gaps in

6:57

the model and and how can we fix that

6:59

right and so this is this was a very

7:01

very powerful part of why Antigravity

7:03

came to be and there are four main

7:05

categories of improvements powered by a

7:07

little Nano Banana artwork. The first one

7:10

is intelligence and reasoning. You all

7:12

are probably familiar with this. You use

7:13

nano or you used um Gemini 3 and you

7:16

probably thought it was a smarter model

7:17

this is good it's better at instruction

7:19

following it's better at using tools.

7:21

There's more nuance in the tool use. You

7:22

can afford things like, you know,

7:24

there's a browser now. There's a million

7:25

things that you could do in a browser.

7:26

It can literally even execute

7:28

JavaScript. How do you get an agent to

7:30

understand the nuance of all these

7:31

tools? It can do longer [clears throat]

7:33

running tasks. These things now take a

7:35

bit longer, right? And so you can afford

7:37

to run these things in the background.

7:39

It thinks for longer. Just time has

7:41

gotten stretched out. And then

7:43

multimodal. I really love this property

7:45

of what Google has been up to. the

7:47

multimodal functionality of Gemini 3 is

7:49

off the charts and you start combining

7:51

it with all these other models like Nano

7:53

Banana Pro um and you really get

7:55

something magical. So we have these

7:56

roughly four different categories where

7:58

things have gotten much better

8:01

and if you think about these properties

8:02

the question becomes what do we do about

8:04

these differences and from a product

8:07

perspective it's like how do you

8:08

construct a product that can take

8:09

advantage of this new wave and hopefully

8:11

and in my opinion this is the next step

8:13

function: autocomplete, chat, agents, and

8:16

then I probably got to come up with

8:18

something more interesting than whatever

8:19

this thing is called.

8:22

So step one is we want to raise the

8:24

ceiling of capability.

8:26

We want to aim higher, have higher

8:27

ambition.

8:30

And so a lot of the teams at DeepMind

8:33

were working on all sorts of cutting

8:34

edge research, right? There's Google is

8:37

a big big company. And one of my

8:38

learnings going from a startup to one of

8:40

these bigger companies is that there is

8:41

a team of people that is attacking a

8:43

very very hard technical problem. And as

8:45

a nerd, this is super exciting, right?

8:47

And then as a product person it's like

8:48

wow we can start using computer use. So

8:52

browser use has been one of these huge

8:54

unlocks.

8:57

And this is twofold right I mentioned

8:58

the sort of retrieval aspect of things.

9:02

Um

9:04

I guess for for software engineers there

9:05

is much more that happens that is beyond

9:07

the code right you can roughly think

9:08

about it as there's what to build

9:10

there's how to build it and then you

9:12

actually have to build it. I would say

9:13

building it has become more or less you

9:16

know it's reasonable for the model to

9:17

now given context it can generate the

9:18

code that hopefully functionally works

9:21

and then you've got the what to build

9:22

this is the part that is up to you kind

9:24

of human imagination and then there's

9:26

the how to build it right and there's

9:27

this richness in context the richness

9:29

and institutional knowledge and these

9:30

are the sorts of things that having

9:33

access to a browser having access to

9:34

your bug dashboards having access to

9:36

your experiments all these sorts of

9:38

things that now gives the agent this

9:40

additional level of context and maybe I

9:42

should have clicked before, but if you

9:43

saw on the screen, let's see, how do I

9:45

do this?

9:46

So, this is now the other side of

9:48

things. Browser is verification. So, you

9:50

might have seen this video, this is a

9:51

tutorial video that we put together on

9:52

just how to use it. But this is the

9:54

agent. The blue border indicates that

9:56

it's being controlled by the agent. And

9:58

so, this is a flight tracker. You put

9:59

in, you know, a flight ID and then it'll

10:01

give you sort of the start and end of of

10:03

that flight. And this is being done

10:04

entirely by a Gemini computer use

10:07

variant. So it can click, it can scroll,

10:10

it can retrieve the DOM, it can do all

10:12

the things. And then what's really cool

10:13

is you end up with not just a diff, you

10:16

end up with a screen recording of what

10:18

it did. So it's changed the game. And

10:20

the model can take this and because it

10:21

has the ability to understand images, it

10:24

can take this and iterate from there. So

10:26

that was the first category, browser

10:27

use, just an insane, insane magical

10:29

experience. Now the second place that we

10:32

wanted to spend time is on image

10:33

generation. And we noticed this theme

10:35

when we, you know, when I when I first

10:36

started at at Google, we noticed, okay,

10:38

Gemini is spending a lot of time on

10:39

multimodal. And this is really great for

10:42

consumer use cases, right? Nano Banana 2

10:43

was was mindboggling. Um, but also for

10:46

devs. Devs are inherently this is a

10:49

multimodal experience. You're not just

10:51

looking at text. You're looking at the

10:52

output of websites. You're looking at

10:53

architecture diagrams. There's so much

10:55

more to coding than just text. And so

10:59

there's image understanding. This is

11:01

verifying screenshots, verifying

11:03

recordings, all these sorts of things.

11:05

And then the beautiful part about Google

11:06

is that you have this synergistic

11:08

nature. This product takes into account

11:09

not just Gemini 3 Pro, but also takes

11:12

into account the image side of things.

11:14

And so here I want to give you a quick

11:15

demo of um mockups. So I have a hunch

11:19

and you all probably believe this too.

11:20

Design is going to change, right? You're

11:24

going to spend, you know, maybe some

11:25

time iterating with an agent to to

11:26

arrive at a mockup. But for something

11:28

like, oh, let's build this website. We

11:30

can start in image space. And what's

11:32

really cool about image space is it lets

11:33

you do really cool things like this. We

11:35

can add comments. And so you end up

11:37

commenting and leaving a bunch of a

11:39

bunch of queued up responses. And it's

11:41

kind of like GitHub. You'll just say,

11:42

"All right, now update the design."

11:45

And then it'll put it in here. The agent

11:47

is smart enough to know when and how to

11:48

apply those comments. And now we're

11:50

iterating with the agent in image space.

11:52

So really, really cool new capability.

11:54

And what was awesome is that um we had

11:57

Nano Banana Pro, you know, we pulled an

11:59

all-nighter for uh for the Gemini launch

12:01

because that was our first launch. Then

12:02

they said, "Do it again. Do it on

12:04

Thursday." So we made Gemini Pro um or

12:07

I'm getting all these model names

12:08

confused. The image gen one, the Nano

12:10

Banana one, we made that available on

12:11

day one. I'm running on very little

12:13

sleep on day one inside of the

12:15

Antigravity editor. And our hope is

12:17

that the Antigravity editor is this

12:18

place where any sort of new capability

12:20

can be represented inside of our

12:22

product.

12:24

And so step two was all right, we have

12:26

this new capability. We've pushed the

12:28

ceiling higher. Agents can do longer

12:30

running tasks. They can do more

12:31

complicated things. They can interact on

12:32

other surfaces. And so this necessitates

12:35

a new interaction pattern. And we're

12:37

calling this artifacts.

12:40

This is a new way to work with an agent.

12:43

And this is one of my favorite parts

12:44

about the product. And at its core is

12:46

this agent manager.

12:48

So let's start by defining an artifact.

12:51

An artifact is a dynamic representation

12:54

of something that the agent generates.

12:56

Sorry, it's a an artifact is something

12:58

that the agent generates that is a

13:00

dynamic representation of information

13:02

for you and your use case. And the key

13:05

here is that it's dynamic.

13:07

Artifacts are used to keep the agent

13:09

organized. They can be used for, uh, kind

13:11

of like self-reflection and and

13:13

self-organization. It can be used to

13:15

communicate with the user to maybe give

13:17

you a screenshot to maybe give you a

13:18

screen recording like we described. And

13:20

it can also be used across agents,

13:22

whether this be with our browser sub

13:24

agent or with other conversations or as

13:27

memory. And this is what you see on the

13:29

right side of this agent manager. We've

13:32

dedicated sort of half the screen and

13:34

and your sidebar to this concept of

13:36

artifacts.

13:39

And so we've all tried to follow along

13:41

chain of thought. And I would say this,

13:44

you know, we did some fanciness here

13:45

inside of the agent manager to make sure

13:47

conversations are broken up into like

13:48

chunks. So in theory, you could follow

13:50

along a little bit better in the

13:51

conversation view, but ultimately you're

13:53

looking at a lot a lot of strings, a lot

13:54

of tokens. This is like very hard to

13:56

follow. And then this is actually like

13:58

there's like 10 of these, right? So you

14:00

just scroll and scroll and scroll.

14:01

You're like, "What the heck did this

14:02

agent do?" And and this this has been

14:04

traditionally the way that people review

14:07

and sort of supervise agents. You're

14:08

kind of just looking at the thought

14:10

patterns.

14:12

But isn't it much easier to understand

14:13

what is going on inside of this visual

14:15

representation? And that is what an

14:17

artifact is. The whole point and the

14:19

reason why I'm not just standing up here

14:20

and giving you this long, you know,

14:22

stream of consciousness is because I

14:23

have a PowerPoint. The PowerPoint is my

14:25

artifact. And so Gemini 3 is really

14:29

really strong with this sort of visual

14:30

representation. It's really strong with

14:32

multimodal. And so instead of showing

14:34

this, which of course we always let you

14:36

show, we always we will always show you

14:37

this, but we want to focus on this. And

14:39

I think this is the game-changing part

14:41

about Antigravity.

14:43

And the theme is this dynamism.

14:46

The model can decide if it wants to

14:49

generate an artifact. And let's remember

14:50

there are some tasks. We're changing a

14:52

title. We're changing something small.

14:53

Doesn't really need to to produce an

14:54

artifact for this. So, it will decide if

14:57

it needs an artifact. And then second,

14:59

what type of artifact? And this is where

15:01

it's really cool. There are many,

15:03

potentially infinite, ways

15:05

that it can represent information. And

15:07

so, the common ones are markdown in the

15:11

concept of a of a plan and a

15:12

walkthrough. So, this is probably what

15:14

you've used most most often. When you

15:16

start a task, it will do some research.

15:17

It will put together a plan. This is

15:19

much very much like a PRD. It will even

15:21

list out open questions. So, you can see

15:22

in this feedback section, it'll surface,

15:24

hey, you should probably answer these

15:26

three questions before I get going. And

15:27

what's really awesome, and we're betting

15:28

on the models here, what's really

15:30

awesome is that the model will decide

15:31

whether or not it can auto continue. If

15:33

it has no questions, why should it wait?

15:35

It should just go off. But more often

15:38

than not, there are probably areas where

15:39

you may be underspecified or maybe it

15:41

did something during research, right?

15:42

Everyone has gone through and and

15:43

started a big refactor then realized

15:45

they actually don't have all the

15:45

information ahead of them. They got to

15:46

go back to the drawing board, maybe talk

15:48

to some people. Same idea. So it'll

15:50

surface um it'll surface open questions

15:53

for you. And so that's you'll start with

15:55

that implementation plan and then you'll

15:56

say all right LGTM let's like send it.

15:59

You go all the way down. It might

16:00

produce other artifacts. You know we've

16:02

got a task list here. This is the way

16:03

that you can monitor the the progress of

16:06

the agent instead of looking at the

16:07

conversation. It might put together some

16:08

architecture diagrams and then you'll

16:10

get a you'll get a walkthrough at the

16:12

end and this walkthrough you kind of saw

16:13

a glimpse of this before but it is hey

16:16

how do I prove to you agent to human

16:18

that I did the correct thing and I did

16:20

it well and then this is the part that

16:22

you'll end with it's kind of like a PR

16:24

description and then there's a whole

16:25

host of other types, right? Images, screen

16:27

recordings, these Mermaid diagrams. And

16:30

really what's what's what's quite cool

16:32

is that because it's dynamic the agent

16:34

will decide this over time so suddenly

16:35

there's maybe a new type of artifact

16:36

that maybe we missed, right? And then

16:40

it'll figure that out. It'll just become

16:41

part of the experience. So it's very

16:43

scalable. But this artifact primitive is

16:44

something that's very very powerful that

16:46

I'm pretty excited about. And then I

16:49

guess another question is why is it

16:50

needed? So we'll always explain to the

16:52

user what the purpose of this artifact

16:53

is. Um and then interestingly like who

16:57

should see it? So should the sub agents

16:59

see it? Should the other agents see it?

17:02

Should other conversations see this?

17:03

Should this be stored in my memory bank?

17:05

Right? If this is something that I

17:06

derived, one of the cool examples um

17:08

that I like is like if you give it a a

17:10

piece of documentation and give it your

17:11

API key, it'll like go off and run curl

17:14

requests to basically figure out the

17:15

exact schema of like what the types of

17:18

APIs you're using and it'll do this like

17:20

deep research um for quite a while and

17:22

then it'll give you a report and

17:23

basically like deeply understand uh this

17:25

sort of uh this sort of API. You

17:27

wouldn't want to just throw that away

17:28

and have to rederive it the second time

17:30

you did this. So it'll store it in your

17:31

memory and then all of a sudden that's

17:32

just a part of your knowledge base. So,

17:35

and then there's also this idea of like

17:37

notifications, right? So, if there's an

17:38

open question, you want the agent to be

17:40

proactive with you. And that's another

17:43

very cool property of this artifact

17:44

system. We want to be able to provide

17:47

feedback along this cycle. So, from task

17:50

start to task end, we want to be able to

17:52

provide feedback and inform the agent on

17:54

what to change.
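
[Editor's sketch, not part of the talk.] The artifact properties described above (a type chosen by the agent, a stated purpose, open questions that block auto-continue, and a choice of who should see it, including memory) can be pictured as a small data structure. This is a minimal Python sketch under stated assumptions: `Kind`, `Audience`, `Artifact`, and `remember` are hypothetical names chosen for illustration, and the real product's data model is not described in the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Artifact types named in the talk; the agent may pick others over time.
    PLAN = "plan"
    TASK_LIST = "task_list"
    WALKTHROUGH = "walkthrough"
    IMAGE = "image"
    SCREEN_RECORDING = "screen_recording"
    DIAGRAM = "diagram"


class Audience(Enum):
    # "Who should see it?" as posed in the talk.
    USER = "user"
    SUB_AGENTS = "sub_agents"
    OTHER_CONVERSATIONS = "other_conversations"
    MEMORY = "memory"


@dataclass
class Artifact:
    kind: Kind
    purpose: str  # why the artifact exists, always explained to the user
    body: str
    audience: set = field(default_factory=lambda: {Audience.USER})
    open_questions: list = field(default_factory=list)

    def needs_user_attention(self) -> bool:
        """With no open questions the agent can simply auto-continue."""
        return bool(self.open_questions)


def remember(artifacts, memory):
    """Copy artifacts marked for memory into a knowledge base keyed by purpose."""
    for artifact in artifacts:
        if Audience.MEMORY in artifact.audience:
            memory[artifact.purpose] = artifact.body
    return memory
```

In this picture, the API research report from the example above would carry `Audience.MEMORY`, so the agent would not have to rederive it the next time.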

17:56

And the artifact system lets you iterate

17:58

with the model more fluidly

18:01

during this process of execution. And

18:04

so, not to sound like a complete Google

18:05

shill, but I love Google Docs, right?

18:07

Google Docs is a great pattern. It's

18:09

awesome. The comments are great. And

18:11

this is how you might interact with a

18:12

colleague, right? You're collaborating

18:13

on a document. Then all of a sudden, you

18:14

want to leave a text-based comment. So,

18:16

we took inspiration from that. We took

18:18

inspiration from GitHub. But you leave

18:20

comments. You highlight text. You say,

18:21

"Hey, maybe this part needs to get

18:22

ironed out a bit more. Maybe there's a

18:24

part that you missed or actually don't

18:25

use Tailwind. Use vanilla CSS." So,

18:28

these are the sorts of comments that you

18:29

would leave. You'd batch them up and

18:30

then you go off and send. And then in

18:32

image space, this is very cool. We now

18:34

have this like Figma style drag and drop

18:37

like or not drag, you know, highlight to

18:38

select. And now you're leaving comments

18:40

in a in a completely different modality,

18:42

right? And we've done this and

18:43

instrumented the agent to naturally

18:46

take your comments into consideration

18:47

without interrupting that task execution

18:49

loop. So at any point during your

18:52

conversation, you could just say, "Oh,

18:53

actually, you know, mid mid browser

18:55

actuation, I actually really don't like

18:56

the way that that turned out. Let me

18:58

just highlight that, tell you,

19:00

send it off." and then I'll just get

19:02

notified when you're done taking into

19:03

consideration those comments. And so

19:06

it's a whole new way of working. And

19:08

this is really at the center of what

19:09

we're trying to build with Antigravity.

19:10

It's pulling you out into this higher

19:12

level view. And the agent manager really

19:15

is built to optimize the UI of

19:18

artifacts.

19:20

So we have a beautiful, beautiful

19:22

artifact review system. We're very proud

19:25

of this. And it can also handle sort of

19:29

the

19:30

property that is like parallelism and

19:32

orchestration. So whether this be many

19:34

different projects, whether this be the

19:36

same project and you just want to

19:37

execute maybe a design mockup iteration

19:39

at the same time you're doing research

19:41

on an API at the same time you're

19:42

iterating and and and actually building

19:44

out your app. You can do all these

19:45

things in parallel and the artifacts are

19:48

the way that you provide that feedback.

19:49

The notifications are the way that you

19:50

know that something requires your

19:52

attention. It's a completely different

19:53

pattern. And what's really nice is that

19:55

you can you can take a step back and of

19:57

course you can always go into the

19:58

editor. I'm not going to lie to you.

19:59

There are tasks that you know you maybe

20:01

don't trust the agent yet. We don't

20:02

trust the models yet. And so you can

20:04

command E and you can command E and

20:05

it'll open inside the editor within a

20:07

split second with the exact files, the

20:09

exact artifacts and that exact

20:11

conversation open ready for you to

20:13

autocomplete away to continue chatting

20:15

synchronously to get you from 80% to

20:17

100%. So we always want to give devs

20:19

that escape hatch. But in the future

20:22

world, we're building for the future.

20:23

You'll spend a lot of time in this agent

20:25

manager working with parallel sub

20:27

agents, right? It's a very very exciting

20:28

concept.

20:31

Okay, so now that you've seen we've got

20:33

new capabilities, multitude of new

20:35

capabilities, we've got a new form

20:37

factor. Now the question is like what is

20:40

going on under the hood at DeepMind? And

20:42

the secret here is a lesson that I guess

20:45

we've just learned over the past I don't

20:47

know we've spent like or I I've

20:48

personally spent like three years in in

20:50

codegen. It's just to be your your

20:52

biggest user, right? And that creates

20:54

this research and product flywheel.

20:58

And so I will tell you Antigravity will

20:59

be the most advanced product on the

21:01

market because we are building it for

21:02

ourselves. We are our own users. And so

21:06

in the day-to-day

21:08

we were able to give Google engineers,

21:11

DeepMind researchers, we were able to

21:12

give them an early access and now an

21:14

official access to Antigravity

21:16

internally. And so now all of a sudden

21:19

the actual experience of the models that

21:21

people are improving, the actual

21:23

experience of of using the agent manager

21:26

and touching artifacts

21:28

is letting them see at a very very real

21:31

level what are the gaps in the model.

21:34

And whether it be computer use, whether

21:37

it be image generation, whether it be

21:41

instruction following, right? Every

21:42

single one of these teams, and there are

21:44

many teams at Google, has some hand

21:46

inside of this very, very full stack

21:48

product.

21:50

And so you might notice as an

21:51

infrastructure engineer, you might say,

21:52

"Oh, this is a bit slow.

22:03

page. Well, go off and and make that

22:05

better, right? So, it gives you this

22:07

level of insight that evals just simply

22:08

can't give you. And I think that's

22:10

what's really cool about being at

22:11

DeepMind. You are able to integrate product

22:14

and research in a way that creates this

22:15

flywheel and pushes that frontier. And I

22:18

guarantee you that whatever that

22:20

frontier provides, we will provide in

22:21

Antigravity for the rest of the world.

22:23

These are the same product. And so, I'll

22:25

give you two examples of how this has

22:27

worked. The first one was that computer

22:28

use example, right? In collaboration

22:31

with the computer use team which we sit

22:33

you know a couple couple tens of feet

22:35

away from we identify gaps on both sides

22:38

right so we're not just using an API we

22:40

are interacting across teams to

22:42

basically say oh like the capability is

22:44

kind of off here can can we go off and

22:46

figure out what's going on here maybe

22:47

there's a there's a mismatch in data

22:48

distribution and then on the other side

22:50

it's like yo your like agent harness is

22:53

like pretty screwed up you got to fix

22:55

your tools right and so then we'll go

22:56

off and we'll fix our side but it's this

22:58

harmony it's it's both sides talking to

23:00

each other that really makes this type

23:01

of thing possible. Similarly, you come

23:04

up with a new product paradigm

23:05

artifacts. Artifacts were not good on

23:08

the initial on the initial uh versions,

23:10

right? What part of training, what part

23:12

of data distribution includes this like

23:14

weird concept of reviews? And so, it

23:16

took a little bit of plumbing, a little

23:18

bit of work with the research team to

23:19

figure out, all right, let's steadily

23:21

improve this ability. Let's give you a

23:22

hill to climb. And then now we were able

23:25

to launch Gemini 3 Pro with a very good

23:28

ability to handle these sorts of

23:29

artifacts. And so it's this cyclic

23:31

nature that I'm really really betting

23:33

on.

23:34

And this is really how Antigravity

23:37

will defy gravity. We've got pushing the

23:40

ceiling. We're going to have an agent

23:41

with very very high level of ambition.

23:43

We're going to try and do as much as we

23:44

can. And this includes vibe coding.

23:47

Though I will say there are some

23:48

excellent products out there by Google.

23:50

AI Studio is an excellent product.

23:53

We are in the business of increasing the

23:55

ceiling.

23:58

Second, we built this agent first

24:00

experience: artifacts, agent manager. And

24:03

then finally, we have this research

24:05

product flywheel. And this is the magic.

24:07

And this is the three-step process that

24:08

we used in building Antigravity.

24:13

So, it's been a blast. I mean, I've I've

24:14

been back at um AI Engineer Summit.

24:17

Thank you again, swyx and Ben, for

24:18

having me. It's been awesome to come

24:20

back every year. And so on behalf of the

24:21

Antigravity team, I just want to thank

24:22

you for your time, for your patience as

24:25

you use the product um and your support.

24:28

And of course,

24:30

you too can adopt a TPU and help us uh

24:33

turn off PagerDuty a bit more. Um and

24:36

then of course, you know, you could also

24:37

yell at me on Twitter. That's another

24:38

way of doing it. Maybe do it in DMs

24:39

instead. Um but we've got a lot of

24:41

exciting things and I'm really really

24:42

excited to bring Antigravity to market.

24:44

The team is thrilled that this is now

24:46

out in the wild. So we welcome your

24:48

feedback. Um, and thank you again for

24:50

listening. Enjoy the rest of the

24:51

conference. [applause]

24:57

[music]

Interactive Summary

Kevin Hou, product engineering lead for Google Antigravity, the new agent-first IDE from Google DeepMind, presents the tool's core architecture and philosophy. The platform operates across three main surfaces: an editor, an agent-controlled browser, and an agent manager. The agent manager serves as a hub for managing long-running tasks, artifacts, and multi-agent workflows. The team works embedded with DeepMind's research teams and uses internal feedback to improve model capabilities such as computer use and multimodal interaction, which in turn improves the developer experience.
