HomeVideos

From Arc to Dia: Lessons learned building AI Browsers – Samir Mody, The Browser Company of New York

Now Playing

From Arc to Dia: Lessons learned building AI Browsers – Samir Mody, The Browser Company of New York

Transcript

446 segments

0:13

[music]

0:21

My name is Samir and I'm the head of AI

0:24

engineering at the browser company of

0:26

New York. And today I'm going to talk a

0:28

little bit about how we transitioned

0:30

from building ARC to DIA and the lessons

0:33

we learned in building an AI browser.

0:37

But first, a little about the browser

0:38

company.

0:40

So we started with a mission to rethink

0:43

how people use the internet. At its

0:45

core, we believe that the browser is one

0:48

of the most important pieces of software

0:50

in your life and it wasn't getting the

0:52

attention it deserved. Simply put, the

0:56

way we've used a browser has changed

0:58

over the last couple decades, but the

1:00

browser itself hadn't. And think about

1:02

this. We we started this company in

1:04

2019. Um, and so this is a screen cap of

1:08

Josh, our CEO, sharing a little bit

1:11

about our idea on the internet a few

1:13

years ago, which we endearingly called

1:15

the internet computer. So our mission

1:18

has been to build a browser that

1:20

reflects how people use the internet

1:22

today and how we think the browser

1:25

should be used tomorrow.

1:28

So through years of discovery, trial and

1:33

error, and some ups and downs, we

1:36

shipped our first browser, Arc, in 2022.

1:40

It was a browser we felt was an

1:42

improvement over the browsers of that

1:44

time. It made the internet more

1:46

personal, more organized, and to us, a

1:49

little more delightful with a little

1:51

more craft.

1:52

And it was a browser that was loved by

1:54

many. It still is by millions, many of

1:57

whom are probably in this audience

1:58

today. I've gotten a lot of questions

2:00

about Arc today. Um, and uh, it's great,

2:05

but um, if we took a step back, we felt

2:08

that ARC was still just an incremental

2:10

improvement over the browsers of that

2:12

time. And it didn't really hit the

2:14

vision that we set out to create. And

2:17

so, uh, we kept building and then in

2:21

2022, we got access to LLMs like the GPT

2:24

models. And so, we started like we

2:27

always do with prototyping. We started

2:29

trying new ideas um and eventually

2:32

shipped a few of them in ARC. But what

2:35

started as a you know a basic

2:37

exploration turned into a fully formed

2:39

thesis. In the beginning of 2024 uh our

2:43

company put out what we called act 2 a

2:45

video on YouTube where we shared that

2:48

thesis that we believe that AI is going

2:51

to transform how people use the internet

2:53

and in turn fundamentally change the

2:56

browser itself. And so with that we

2:59

started building again but this time we

3:01

built a new browser with AI speed and

3:04

security in mind and from the ground up.

3:08

And later and sorry earlier this year we

3:10

shipped DIA our AI native browser.

3:14

It allows you to have an assistant

3:16

alongside you in all the work you do in

3:17

the browser. It gets to know you,

3:20

personalizes, helps you get work done

3:22

with your tabs, and effectively get more

3:25

work done through the apps you use. And

3:29

while it hasn't achieved our vision yet,

3:31

we fully believe it's well on the way,

3:33

too.

3:37

So, it is not easy to build a product.

3:41

You all know that. Let alone two, the

3:43

latter of which an AI native one. We've

3:46

had a lot of years of iteration, trial

3:48

and error and through that we've learned

3:50

a lot and I'm going to just talk about a

3:53

few of those things uh here today.

3:57

The first I want to talk about is

3:59

optimizing your tools and process for

4:01

faster iteration. From the beginning,

4:03

browser company has believed that we're

4:05

not going to win unless we build the

4:07

tools, the process, the platform, and

4:10

the mindset to iterate, build, ship, and

4:13

learn faster than everyone else. And

4:15

that of course holds true today but the

4:17

form it takes with AI and an AI native

4:20

product has changed.

4:22

So even as a small company where are we

4:25

investing in tooling these days? First

4:28

is prototyping for AI product features.

4:30

Second is building and running evals.

4:33

Third is collecting data for training

4:35

and for eval

4:37

uh last but definitely not least

4:38

automation for hill climbing.

4:42

So let's start with tools. Initially uh

4:45

as we always do, we built some tools.

4:47

The first was a very rudimentary uh

4:49

prompt editor and it was only in dev

4:51

builds. What did what did this mean for

4:54

us? Well, it meant a few things. One,

4:56

limited access as only engineers were

4:58

able to access this. Two, slow iteration

5:01

speeds. And three, none of your personal

5:04

context. And as you all know with an AI

5:06

product, the context is what matters. It

5:07

what gives you the feel for whether a

5:09

product is good or not.

5:11

So we evolved and since then we built

5:14

all of our tools into our product, the

5:16

product that we as a company internally

5:18

use every day. And that includes the

5:20

prompts, the tools, the context, the

5:22

models, every parameter. Um, which has

5:25

not only allowed us to 10x our speed of

5:27

ideating, iterating and refining our

5:29

products. But it has also widened the

5:32

number of people who can access and

5:33

iterate on our products themselves. from

5:35

our CEO to our newest hire can ideate

5:37

and create a new product in DIA and also

5:40

refine an existing one all with their

5:42

full context.

5:44

And this holds true with all of our

5:46

major product protocols. We have tools

5:48

for optimizing our memory knowledge

5:49

graph which all of us use and we have

5:52

tools for creating iterating on our

5:54

computer use mechanism. We actually

5:56

tried tens of different types of

5:58

computer use strategies before landing

6:00

on one before even building it into the

6:02

product itself.

6:05

And I'll say and I'll end this part with

6:08

uh it actually is a lot of fun. People

6:10

don't talk about that a lot but uh

6:12

actually building these tools into our

6:14

product has enabled so much creativity.

6:16

It has enabled our PMs, our designers,

6:19

uh customer service and strategy and ops

6:21

to try out new ideas that are tailored

6:23

to their use cases. And that ultimately

6:26

is what we're trying to do.

6:28

The next thing I want to talk about is

6:30

how we evolve and optimize our prompts

6:33

through a mechanism called Jeba. This

6:35

for us is very nent but an important

6:38

learning nevertheless.

6:40

How we heel climb and refine our AI

6:42

products is just as important as

6:44

ideulating them in the first place. So

6:46

we're investing in mechanisms to help

6:48

with this to enable faster hill climbing

6:50

and one of those being Jeepa. And this

6:52

is based on a paper from earlier this

6:54

year from a few smart folks.

6:57

So the key motivation here is simple.

6:59

It's a sample efficient way to improve a

7:00

complex LLM system without having to

7:03

leverage RL or other fine-tuning

7:04

techniques. And for us as a small

7:07

company, that's hugely critical.

7:09

And how it works is you're able to seed

7:11

the system with a set of prompts, then

7:13

execute it across a set of tasks and

7:15

score them. Then leverage a mechanism

7:18

called PA selection to select the best

7:20

ones. And then leverage an LLM on top of

7:22

that to reflect on what went well and

7:24

what didn't and then generate new

7:26

prompts and then repeat with the key

7:28

innovations here being around that

7:30

reflective prompt mutation technique.

7:32

the selection process which allows you

7:34

to explore more of the space of

7:35

prompting rather than one avenue and the

7:38

ability to tune text and not weights.

7:42

And here's a modest uh example of this

7:45

at work for us. You know, you can

7:47

provide it a very simple uh a simple

7:50

simple prompt and run it through JPA and

7:52

it's able to optimize it uh along the

7:54

metrics and scoring mechanisms that we

7:57

uh created to refine that prompt.

8:02

And so if I take a step back and talk

8:04

about kind of how we build uh for

8:07

certain types of features, I would buck

8:09

it into a couple different phases. The

8:11

first is that prototyping and ideation

8:13

phase where we have widened the breadth

8:16

of number of ideas at the top of the

8:17

funnel um and lower the threshold on who

8:20

can build them and how. And so we try

8:22

out a bunch of ideas every week, every

8:23

day from all types of people and we dog

8:25

food those. And if we feel like there's

8:28

actually real utility there, it's

8:29

solving a real problem for us and there

8:32

is a path towards actually hitting the

8:34

quality threshold that we believe we

8:35

need to hit, then we'll move on to this

8:37

next phase where we collect and refine

8:39

eval to clarify product requirements and

8:42

then hill climb through code through

8:44

prompting and automated techniques like

8:46

Jeba and then dog food as we always do

8:48

internally and then chip

8:51

and I do want to kind of double down on

8:54

these phases. The ideation phase is

8:57

extremely important just as much as that

8:59

refinement phase.

9:02

And our goal is to enable faster

9:05

ideation and a more efficient path to

9:07

shipping. Because with all these AI

9:09

advancements every week, new

9:11

possibilities are unlocked in DIA. And

9:13

it's up to us as a browser, as a product

9:16

to get as many at bats with these new

9:18

ideas and try out as many of them and

9:20

explore as many of them as possible. At

9:22

the same time though not underestimating

9:24

the path it takes to ship some of these

9:26

ideas to productions as a high quality

9:28

experience.

9:32

Next uh I want to talk about treating

9:34

model behavior as a craft and

9:36

discipline.

9:37

So what is model behavior to us? It's

9:40

the function that defines evaluates and

9:42

ships the desired behavior models. It's

9:45

turning principles into product

9:46

requirements, prompts, and evals, and

9:49

ultimately shaping the behavior and the

9:51

personality of our LLM products, and

9:53

ultimately for us, our DIA assistant.

9:57

So, I'd buck it into a few different

9:58

areas. First, it's that behavior design,

10:00

defining the product experience we

10:02

actually want, the style, the tone, the

10:04

shape of responses in some cases. Then,

10:07

it's collecting that data for

10:08

measurement and training, clarifying

10:10

those product requirements through eval.

10:13

And last but not least, it's the model

10:14

steering. It's the building of the

10:16

product itself. It's the prompting. It's

10:18

the model selection. It's defining the

10:19

what's in the context window, the

10:21

parameters, etc. Um, and so much more.

10:25

And to us, that that process is

10:28

iterative, very iterative. We build,

10:31

refine, we create evals, and then we

10:33

ship, and then we collect more feedback

10:36

and feed that into our iterative

10:38

building process. That could be internal

10:40

feedback, and that could be also uh

10:41

external feedback.

10:43

And so if I move on for a second, one

10:46

analogy we've thought about uh is for

10:48

model behaviors that to product design

10:51

through the evolution of the internet.

10:53

At first websites were functional. They

10:55

got the job done. But over time that

10:58

evolved as we tried to achieve more on

11:00

the internet and technology advanced. Uh

11:03

product design and the craft of the

11:05

internet itself grew as well as well as

11:07

the complexity.

11:09

And so what might that be for model

11:11

behavior? Well, at first it was

11:13

functional. We had prompts. We had

11:15

evals. We had instructions in and output

11:17

out. Now we frame it through agent

11:19

behaviors. It's goal- directed

11:21

reasoning, the shaping of autonomous

11:23

tasks, selfcorrection and learning, and

11:26

even shaping the personality of the LM

11:28

models themselves.

11:30

And so, what might the future hold? I'm

11:32

excited to see. But what we believe is

11:35

that we are in the early days of

11:36

building AI products and model behavior

11:39

will continue to evolve and into a

11:41

specialized and prevalent function of

11:43

its own even at product companies.

11:46

And the last thing I'll leave you with

11:47

here is that the best people for it

11:50

might just surprise you. One of my

11:52

favorite stories about building DIA

11:54

these last couple years has been uh the

11:56

formation of actually this model

11:58

behavior team. As I mentioned earlier,

12:00

uh engineers were writing the prompts at

12:01

first and then we built these prompt

12:03

tools to enable more people at the

12:04

company to actually prompt and iterate.

12:07

And there was a person on our team on

12:08

the strategy and ops team and he

12:10

actually leveraged these prompt tools

12:12

one weekend to rewrite all our prompts.

12:14

And he came in on a Monday morning and

12:16

dropped a loom video sharing what he

12:19

did, how he did it, and why. and a set

12:21

of prompts and those prompts alone

12:23

unlocked a new level of capability and

12:26

quality and experience in our product

12:28

and consequentially uh it was the

12:30

formation of our model behavior team and

12:33

so one thing I'd emphasize to you all is

12:36

to think about who are those people at

12:37

the company agnostic of their role who

12:39

can help shape your product and help

12:41

shape and steer the model itself it

12:43

might not be an engineer or it might be

12:45

it could also be someone on the strategy

12:47

and ops team

12:50

next I want to talk about AI security as

12:52

an emergent property of product

12:54

building. And today I'm going to focus

12:55

specifically on prompt injections.

12:58

So what is a prompt injection? Well,

13:01

it's a prompt attack in which a third

13:03

party can override the instructions of

13:04

an LLM to cause harm. That might be data

13:07

exfiltration, the execution of malicious

13:09

commands, or ignoring safety rules.

13:14

And so here's an example in which you

13:16

give uh the context of a website to an

13:19

LLM and instruct it to summarize it.

13:22

Little did you know that there was a

13:23

prompt injection hidden in that

13:24

website's uh HTML.

13:27

So instead of actually summarizing the

13:29

web page, the LM actually gets directed

13:31

to open a new website, extracting your

13:33

personal information and embedding it as

13:34

get parameters in the website's URL,

13:37

effectively exfiltrating that data.

13:40

So, as a browser, prompt injections are

13:43

extremely crucial for us to prevent.

13:46

They're critical to prevent

13:48

because browsers sit at the middle of

13:51

what we can call a lethal trifecta.

13:54

It has access to your private data. It

13:56

has exposure to untrusted content and it

13:59

has the ability to externally

14:01

communicate and for us that means

14:03

opening websites, sending emails,

14:05

scheduling events, etc. So, how do we

14:08

prevent this? Well, there's some

14:11

technical strategies we can try. First

14:13

is wrapping that untrusted context in

14:15

tags. You can tell the LM, listen to

14:17

these instructions around these tags and

14:19

don't listen to the content around these

14:20

tags. But this is easily escapable and

14:24

quite trivy, an attacker could still uh

14:27

leverage a prompt injection on your

14:29

browser.

14:30

Well, another solution we could try is

14:32

separating that data and that

14:34

instructions. We can assign uh the

14:38

operating instructions to a system role

14:40

and we can assign a user role for the

14:42

content of the third party and even

14:44

layer on randomly generated tags to wrap

14:46

that user content to be extra sure that

14:49

the LM listens to the instructions and

14:51

not the content. And while this can

14:53

help, there are no guarantees and prompt

14:56

injections will still happen.

15:01

So what do we do? Well, it's on us to

15:03

design a product with that in mind. We

15:06

have to blend technology approaches and

15:08

user experience and design into a

15:10

cohesive story that actually builds them

15:13

from the ground up and solves it

15:14

together.

15:16

So, what that might what that excuse me

15:18

what might that be for a feature in DIA?

15:21

Well, let's take the autofill tool in

15:23

DIA. The autofill tool allows you to

15:25

leverage an LLM with context, memory,

15:28

and your details to fill forms on the

15:30

internet. It's extremely powerful, but

15:33

as you can imagine, it has some

15:35

vulnerabilities. A prompt injection here

15:37

could extract your data and put it on a

15:40

form, and once it's on that form, it's

15:42

out of your hands. So, we try to build

15:44

with that in mind.

15:47

In this case, before the form is written

15:48

to, we actually let the user read and

15:50

confirm that data in plain text. This

15:53

doesn't prevent a prompt injection, but

15:56

it gives the user control, awareness,

15:57

and trust in what is happening. And this

16:00

is a framing we carry throughout our

16:02

product and how we build every single

16:04

feature. So here are some examples.

16:06

Scheduling events in DIA, we have a

16:08

similar confirmation step. Writing

16:11

emails India, we also have a similar

16:14

confirmation step.

16:17

So I've talked about three different

16:19

things here today. First is optimizing

16:21

your tools and process for fast

16:23

iteration. Second, treating model

16:25

behavior as a craft and discipline. And

16:28

third, AI security as an emergent

16:30

property of building products.

16:34

But uh the last thing I want to leave

16:36

you with, when we started on this

16:38

journey to building DIA, we recognized a

16:40

technology shift and we sought to evolve

16:43

our product of Arc. We initially came at

16:45

it from a hey, how can we leverage AI to

16:48

make ARC better, make the browser

16:50

better? But what we quickly learned and

16:53

adapted to was that it wasn't just a

16:55

product evolution. It was a company one

16:58

and today I shared a glimpse of that.

17:00

How we build and how it's changed a team

17:03

we've literally created around this and

17:05

how we think about security for AI

17:06

products. But really it's so much more.

17:08

It goes beyond that. It's how we train

17:10

everyone here. It's how we hire. It's

17:12

how we communicate. It's how we

17:13

collaborate and so much more. And if

17:16

there's one thing I'll leave you all

17:17

with, if there's one thing we've learned

17:19

over the last couple years, it's that

17:21

when when you recognize that technology

17:23

shift, you have to embrace it. And you

17:25

have to embrace it with conviction.

17:28

Thank you.

17:30

[applause and music]

17:39

[music]

Interactive Summary

In this presentation, Samir, head of AI engineering at The Browser Company, discusses the company's journey from developing their first browser, Arc, to creating their AI-native browser, DIA. He emphasizes the shift from viewing AI as just a feature to seeing it as a fundamental transformation in how they build software. The core lessons shared include the importance of building specialized tools for rapid prototyping and iteration, treating model behavior as a core product design discipline, and integrating AI security, specifically regarding prompt injections, as an essential part of the product development process.

Suggested questions

3 ready-made prompts