Ramp: Lessons from Building a New AI Product - The Pragmatic Summit

Transcript

0:05

Today we're going to talk about AI at Ramp. I'll give a quick introduction to what Ramp is. Then, really briefly, we'll walk through the simplest possible expense use case, one you can all resonate with, because I see everybody's drinking coffee. Then we'll talk quickly about a lesson we learned this year while building a gazillion agents, and the paradigm shift that's happening, especially after February 6. Then we'll double-click into how we built one of our most popular agents, the policy agent. And finally, we'll dig into the infrastructure this requires on our side and, most importantly in my mind, the culture shift that needs to happen on everyone's teams to deliver products into the hands of your customers in the fastest, most impactful way.

1:07

So, without further ado, a quick intro to Ramp. We are the number one finance platform for modern businesses. We have 50,000-plus customers, and we're in the business of saving you time and money. I've seen some of those names on the name tags here, so thank you for being customers.

1:30

Really exciting. So, really quickly: a cup of coffee usually takes about 15 minutes of your time, because you've got to do three simple things which unfortunately take minutes each. This compounds through the company, and what Ramp does, in the simplest possible terms, is condense that time and return the money back.

1:57

So, a simple story of a transaction: from tapping the card, to writing a memo, to classifying the transaction according to your GL, to sourcing and attaching the receipt, to normalizing the merchant against your inventory of merchants, all of it is done agentically at Ramp. This was our first foray. Probably about three years ago by now we started doing these one-shot things with AI (normalize a merchant, write a memo), and it's been working really well as the models get better.

2:30

What else is going on at the company? Well, literally every persona at the company is wasting time on a lot of manual work: from AP clerks to your finance team, from your purchasing teams to your data teams. At Ramp we used to have a channel called help-data, where somebody would ask for a CSV and some poor person would go and write a SQL query. We replaced it about a year and a half ago. So a lot of time is being spent, and the complexity has a ramp shape: it only increases as you go through different jobs to be done.

3:12

If you watched the Super Bowl, you might be familiar with Brian, our agent. We've been writing a lot of agents, literally one for every job to be done, to cover, in the end state, the entirety of what admins, employees, and finance teams are doing that is not directly related to making money. We want you all to be making money and focused on your customers, not on how to close the books.

3:37

But what's been happening for the past few weeks is that we're living through the most exciting paradigm shift in software. It requires a complete rethink, and with that rethink, a simplification of your stack. What we learned is that you don't need to build a thousand agents. Last year we intentionally allowed each individual team to go and experiment, and we ended up with maybe four different ways of doing the same thing, both for synchronous agents and for background agents. Instead, you want to drive your framework towards a single agent with a thousand skills.

4:21

So let's talk about what software traditionally used to focus on. Every process, especially in the modern AI stack, boils down to: one, an event (say, you receive an invoice and you want to pay it); two, prompt instructions for what you want to do with it; three, guardrails, like an expense policy or a payables policy; four, context, the data the agent should consider; and five, tools, the APIs and actions it can take. Traditionally, software would focus only on four and five. In the new paradigm, software is doing everything. So you want to focus on building an autonomous system of action that can react, reason, and act without a human, or with very little human supervision.
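
The five ingredients above can be sketched as a minimal agent turn. This is a hypothetical illustration (the names and shapes are ours, not Ramp's actual framework):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentRequest:
    event: dict                      # 1. what happened (e.g. an invoice arrived)
    instructions: str                # 2. prompt: what to do with it
    guardrails: str                  # 3. policy text the decision must respect
    context: dict                    # 4. data the agent should consider
    tools: dict[str, Callable] = field(default_factory=dict)  # 5. actions it may take

def run_agent(req: AgentRequest, llm: Callable[[str], dict]) -> dict:
    """One react/reason/act turn: combine all five parts into a prompt,
    let the model pick a tool, execute it, and return the result."""
    prompt = (
        f"Event: {req.event}\n"
        f"Instructions: {req.instructions}\n"
        f"Guardrails: {req.guardrails}\n"
        f"Context: {req.context}\n"
        f"Available tools: {list(req.tools)}"
    )
    decision = llm(prompt)           # e.g. {"tool": "pay_invoice", "args": {...}}
    tool = req.tools[decision["tool"]]
    return tool(**decision.get("args", {}))
```

The point of the shape is that the "new paradigm" system owns all five parts, not just the context and the tools.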

5:11

So what does this mean in terms of what we're building? First, we decided to consolidate the verbal interactions with the agents into a single conversational UX. At the end of last year we literally had about five different conversational UXs. We've now consolidated them into what we call Omnichat, omni as in omnipresent. It is now being deployed to every surface of the product. And it works well alongside the traditional UX, because you still need tables and buttons, and you don't always want to be talking to your software. This is a good example of what Omnichat looks like: "Please onboard a new employee." Omnichat can resolve the employee to an employee ID, look up their corporate structure through an HRIS tool, and find a workflow, an agentic workflow we created previously called the new-hire playbook. The agent asks, "Would you like me to onboard the person using this playbook?" How is this possible?

6:16

We built an in-house, lightweight agent framework that provides orchestration, with tools that engineers build very quickly. Most recently we had one product manager vibe-code about 20 tools, so engineers are no longer needed to build them. And sometimes your workflows are more involved; for example, employee onboarding consists of four steps. So you can just go on Ramp and describe what you want to happen when a new employee joins: give them a card, make sure they get receipts for every transaction, congratulate them on Slack, and check in with them in two weeks. We can now compile this into a runnable, deterministic workflow and then give it to the agent to execute. Playbooks make use of tools.
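
As a sketch of the idea (hypothetical step names and format, not Ramp's actual compiler), turning a described playbook into a deterministic list of tool calls might look like:

```python
# Hypothetical sketch: compile a natural-language playbook into an ordered,
# deterministic list of tool calls that an agent can execute step by step.
PLAYBOOK_STEPS = {
    "give them a card": ("issue_card", {}),
    "make sure they get receipts for every transaction": ("enable_receipt_matching", {}),
    "congratulate them on slack": ("send_slack_message", {"template": "welcome"}),
    "check in with them in two weeks": ("schedule_checkin", {"delay_days": 14}),
}

def compile_playbook(description: list[str]) -> list[tuple[str, dict]]:
    """Map each described step to a known tool call. Unknown steps fail loudly
    rather than being improvised, which keeps the workflow deterministic."""
    workflow = []
    for step in description:
        key = step.lower().rstrip(".")
        if key not in PLAYBOOK_STEPS:
            raise ValueError(f"no tool registered for step: {step!r}")
        workflow.append(PLAYBOOK_STEPS[key])
    return workflow
```

The compile step is where the nondeterminism ends: once the list exists, execution is just running tool calls in order.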

7:02

How does this all come together? This is an example that Viral is going to double-click on next. Upon swiping the card, there's a real-time policy review happening directly in the software: the policy agent enforces your company's requirements with regard to spend. Therefore it's very safe to give Ramp cards to literally every employee in your company. Then there's a handoff to an accounting coding agent that classifies the transaction and applies the rules of your back-office and finance teams. As an employee, I have no idea how a certain transaction should map to our GL, and that's what typical traditional products would do: they'd expose it to you. The agent is much better at doing it, because it has the full context of your chart of accounts and it understands your ERP. Then it can either auto-approve or, in the worst case, involve a human in the loop to review materiality or notify that there is out-of-policy spend. With that, please welcome Viral, who will dive deeper into the policy agent.

8:12

Thanks Nick.

8:19

Awesome. So, a lot of finance teams are looking at receipts like this basically every day, and they might have hundreds or thousands of them. If you told me to look at this one and decide whether to approve or reject the transaction, I'm probably going to make a mistake. The policy agent reasons over this image and all the transaction data we have, and it told me there were eight guests on the receipt (I could barely see that when I was looking at it), that it was below the $80-per-person cap we have internally, and that they were going to a team welcome dinner. Because the amount and the merchant were verified as well, the policy agent told me to approve this transaction. Similarly, for this OpenAI transaction, Anand was testing out some ChatGPT features, so the policy agent told me this was a valid business expense and to approve it. And then this $3 bakery charge was rejected, because it wasn't part of an overtime purchase and it didn't happen on the weekend.
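
For the meal example, the kind of check the agent's reasoning boils down to is simple once the fields (total, guest count) have been extracted from the receipt. A hypothetical sketch:

```python
# Hypothetical sketch: once guests and total are extracted from the receipt,
# the per-person cap check is a one-liner.
def review_meal(total: float, guests: int, per_person_cap: float = 80.0) -> str:
    if guests < 1:
        raise ValueError("a meal receipt needs at least one guest")
    per_person = total / guests
    return "approve" if per_person <= per_person_cap else "flag for review"
```

The hard part, as the talk stresses, is the extraction and context, not the arithmetic.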

9:22

So really, we looked at this as an opportunity to rethink how Ramp was set up. Controllers and finance teams are looking at transactions like these and making these decisions every day. A Fortune 500 company that is one of our customers came to us and said, "Hey, can you make sure that you approve these types of expenses and reject those types of expenses?" They basically had a list of all the rules that Ramp should follow. We saw this as an opportunity not to add more of the incremental deterministic rules that used to define our product (I worked on some of the first versions of those), but to take a page from Andrej Karpathy, who says English is the new programming language, and turn the expense policy itself into the rules. You can see Ramp's expense policy on the left; this is a screenshot from our production environment, and we're seeing really great use of our policy agent product. It needed to start really organically, so we operated like an early-stage startup. We're already very incremental and fast at Ramp, but we found some design partners, like that Fortune 500 company, iterated really quickly, and had weekly meetings with all of them to understand exactly what feedback we wanted to hear and what we could improve.

10:46

I think one of the most important things we realized across Ramp is that we really needed to lean into the fact that AI products cannot be one-shotted. You need to start with something simple, as long as everyone on your team (PMs, designers, engineers) is aligned that you're not going to have perfection on day one. That was actually one of the main cultural learnings. So we dogfooded a lot of this work internally and started with an even more constrained problem: deciding whether our "coffee with a colleague" transactions should be approved or rejected. These are small-dollar transactions that are low risk according to our finance team. One of the early learnings, especially as we released this into production, was that a lot of the reason the policy agent would be wrong was less about the models themselves and more about the context we were giving the LLMs. We could have sat down and thought through all the context at the beginning, before we even kicked off any engineering work, but we realized the best thing would be to learn from our live internal data. For example, we learned that the role and title of an employee is super important when looking at expense policy docs: certain levels, the C-suite for example, might have higher limits, or be allowed to fly first class on certain flights. So we started extracting more information from receipts and pulling in information from HRIS fields that are already on Ramp. Will is going to talk you through exactly the iterations we went through to implement the policy agent and some of the learnings along the way.


12:34

>> Yeah.

12:37

>> All right. Cool. Um, awesome. Um, so

12:40

when we first started building the

12:42

policy agent internally, um, we dream,

12:44

we went big. We're like, hey, let's

12:45

automate all of finance. Let's automate

12:47

all reviews. But when it came down to

12:49

it, we actually had to start small. Um,

12:51

is that cup of coffee, you know, in your

12:53

expense policy? And the reason that we

12:55

did that was because even though the

12:57

problem sounds simple to automate, you

13:00

know, is this a simple question. Is this

13:02

in policy or not? Um, it was going to

13:04

grow to be complex. Kind of like Vir

13:06

said, we could have gone down and we

13:08

could have figured out what context do

13:09

we have, how can we add it, how can we

13:11

put it all together in a way that Ellen

13:12

can understand and you know, put it all

13:14

together from the get-go. But we knew

13:16

that even if we aimed and got everything

13:19

right the first time, it was probably

13:21

going to be wrong once you applied and

13:22

generalized it and went to another

13:23

business. Um, so

13:27

So, the simpler the system, the easier it is to iterate on top of it. And once you iterate, you know what's going to work and what's not, and you can layer complexity on top of that. I think that's pretty important to keep in mind when you're starting to build an LLM or agent product.

13:43

For us, we started really simple, very much the classic setup: an expense comes in, we retrieve the context around it, we pass it through a series of well-defined LLM calls ("Is this in policy? Why is it in policy? How can we show the user that it's in policy?"), and then we give an output that makes sense to the user. Eventually we learned that each expense is different: we can classify an expense by whether it's travel, a meal, or entertainment, do conditional prompting, retrieve context based on that classification, pass it through the LLM calls, and give the model some tools so it can also autonomously decide, "Hey, I actually need flight information," or "I need this employee's level," and layer that on top. A few iterations later, we came to a full-on agentic workflow. We ended up with complex tools that read across our whole platform, and these tools are shared across all of our agents, not just the policy agent: we have a company-internal toolbox that all of our agents can easily reach into. And we gave it the capability to write as well. So it's now writing decisions, writing reasoning, and auto-approving expenses on users' behalf. And it runs in a loop.
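
A minimal sketch of that middle iteration, conditional prompting, with hypothetical category and context names: classify the expense first, then fetch only the context that category needs before the policy call.

```python
# Hypothetical sketch of the conditional-prompting stage: classify the
# expense, then request only the context that category needs.
CATEGORY_CONTEXT = {
    "travel": ["flight_itinerary", "employee_level"],
    "meal": ["guest_count", "per_person_cap"],
    "entertainment": ["event_approval"],
}

def classify(expense: dict) -> str:
    """Stand-in for an LLM classification call."""
    merchant = expense["merchant"].lower()
    if "airline" in merchant:
        return "travel"
    if "restaurant" in merchant or "cafe" in merchant:
        return "meal"
    return "entertainment"

def build_policy_prompt(expense: dict) -> dict:
    category = classify(expense)
    return {
        "category": category,
        "context_needed": CATEGORY_CONTEXT[category],
        "question": f"Is this {category} expense of ${expense['amount']} in policy?",
    }
```

In the final agentic version, the model asks for this context itself via tools instead of having it pre-routed.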

14:53

So now it's more of a black box, and that's the trade-off you get. As you go from simple to complex systems, your capability goes up, your autonomy goes up, your agents are able to do more, your AI can do more, your AI seems smarter. But in exchange, you're losing traceability and explainability. We can look at the reasoning tokens the LLM gives us, but in the end we have no control over it: it's going to do what it thinks is right, make the tool calls, and tell you whether something is right or wrong. A smaller black box becomes a bigger black box as the system becomes more complex.

15:29

One thing that's really important when doing something like this is that from the beginning you need really good auditability. Even if you know how the system works, assume that the inputs and outputs are all you know, and make sure they're correct. If it were a black-box system and you only saw the input and output, could you verify that it did the right thing? And even if that black box changes, you should still be able to reason about whether the output is correct. As with many products we've built at Ramp and across other companies, we assumed the users would be correct: if the user says approve, the agent should approve; if the user says reject, the agent should reject. But it turns out the users are actually sometimes incorrect. They don't know the expense policy, they trust their employees, they're lazy, it's a Sunday, who knows? So it turns out we can't always do what the users are doing, because sometimes that's where finance teams come back to you and say, "Hey, this is wrong. This shouldn't be on the company card."

16:26

So we had to define our own definition of correctness. To do that, we held a weekly labeling session across the functions working on this product, and that had two really good outcomes. One was that we had a ground-truth data set we could always test against and knew was correct. And two was that everyone was on the same page: if our agent got something wrong, everyone knew it got it wrong; if our agent was missing context, everyone knew it was missing that context. So there was less communication overhead, everyone was on the same page, and they could focus on what's really the priority and stay aligned on that.

17:02

Initially, getting all those people together in a room every week and giving them homework to label a hundred data points was expensive. Everyone has things to do, and sometimes they don't come back with their homework done; it almost becomes tedious even though it's so important. So we wanted to make it as simple as possible. The way we did that was to look for third-party vendors that could provide us the tools to label and collect the data, but it turns out some tools are too specific to one use case and some are too general. We could have spent weeks trying out different tools, but we decided to just build our own. So we used Claude Code with Streamlit and basically one-shotted all of it. The greatest part is that it's low maintenance and low risk: it lives in a part of the codebase where, if it breaks, we can fix it right away, and deploys happen in seconds. Non-engineers can go and personalize it; they can vibe-code it with Claude Code. And this was with Opus 4, so now with Opus 4.6 I expect it's even better. With something like that, it's definitely easier and sometimes cheaper to do something one-off like this.
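
The core of such a labeling tool is small; a hypothetical sketch (the real one wraps a UI like Streamlit around logic of this shape): record each reviewer's verdict and take the majority as ground truth, sending ties back to the weekly session.

```python
from collections import Counter
from typing import Optional

def record_label(store: dict, txn_id: str, reviewer: str, label: str) -> None:
    """Append one reviewer's verdict for a transaction."""
    assert label in {"approve", "reject"}
    store.setdefault(txn_id, {})[reviewer] = label

def ground_truth(store: dict, txn_id: str) -> Optional[str]:
    """Majority vote across reviewers; None if unlabeled or tied."""
    votes = Counter(store.get(txn_id, {}).values())
    if not votes:
        return None
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return None  # tie: needs discussion in the weekly session
    return top
```

Keeping the store a plain dict is what makes the tool cheap to rebuild or personalize.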

18:04

And with the ground-truth data set, we were able to make quick iterations. We could find out, "Hey, we need employee levels," add that, run it against the data set, and see: does it actually catch the case and give the right approve-or-reject answer? Being able to iterate that quickly was actually a key point in developing this. We had really early confidence that this could actually work, and we were able to get a lot of buy-in, get a lot of customers onboarded, and have them try it out as design partners.

18:35

As part of doing that iteration with the data set, you have evals. Obviously, I think everyone in this room now knows about evals and what they mean, but it's pretty important to have them early on. Don't let perfectionism get in the way: you don't need a full data set of a thousand points that you're testing against on every iteration. We started with five, and we knew we were not going to fail those five. Then we kept adding and adding. Make sure the evals are easy to run, so anyone can go and just run that command, and make sure the results are really easy to understand: people can look at them, get instant output, and see, "This is what the model's doing; this is good, this is bad." And if you run the evals as part of your CI, everyone can now safely merge in code. Because whenever you think you're doing something right for the LLMs or the agent, like giving more context or giving it tools, more likely than not it's going to have some bad consequence you didn't see coming: context rot, wrong tool instructions, or maybe a docstring that was a little confusing and conflicting. It might have consequences, and you just want to make sure you're catching those.

19:42

Then, and I'll touch on this briefly, online evals are also great. The ones above are offline: you have a historical data set and you're testing against it. Online evals can be a little more confusing and harder to measure, but if you can measure anything about how your users are interacting with the system, definitely set that up as a leading metric. For us, part of that was the rates of each decision type. We had an "unsure" decision, which just meant that the agent didn't have enough information, so we could measure that online. It's a much simpler eval, but it also gave us a pretty good health check as our system was running.
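
The "unsure rate" health check described here can be as simple as a rolling counter. A hypothetical sketch (names and threshold are ours):

```python
# Hypothetical sketch of an online health metric: the share of recent
# decisions where the agent answered "unsure" (i.e. lacked information).
from collections import deque

class UnsureRateMonitor:
    def __init__(self, window: int = 1000, alert_threshold: float = 0.2):
        self.decisions = deque(maxlen=window)   # keep only the recent window
        self.alert_threshold = alert_threshold

    def record(self, decision: str) -> None:
        self.decisions.append(decision)

    def unsure_rate(self) -> float:
        if not self.decisions:
            return 0.0
        return sum(d == "unsure" for d in self.decisions) / len(self.decisions)

    def healthy(self) -> bool:
        return self.unsure_rate() <= self.alert_threshold
```

A rising unsure rate is a leading signal that context is missing, before any ground-truth label exists.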

20:15

>> cool and another great part about eval

20:18

is Uh with evals you can make confident

20:20

model changes. Uh whenever a new model

20:22

comes out open 46 GPT53 you want to make

20:25

sure that you can leverage those new

20:27

models because sometimes that could mean

20:28

the difference between you know your

20:29

system getting one part of the problem

20:31

right to wrong. But it could also mean

20:32

the opposite. It could have it could

20:34

actually be not good without any problem

20:36

changes or changing how your system

20:38

works. So um having evos really set up

20:40

and being able to benchmark really helps

20:41

um make confident model changes.
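
In that spirit, a toy benchmark harness (hypothetical, not Ramp's) that scores candidate models against the same labeled cases makes the swap decision mechanical:

```python
# Hypothetical sketch: score candidate models on the same labeled cases,
# so a model swap is justified by numbers rather than vibes.
def accuracy(model, dataset: list[dict]) -> float:
    hits = sum(model(c["input"]) == c["expected"] for c in dataset)
    return hits / len(dataset)

def pick_model(candidates: dict, dataset: list[dict]) -> str:
    """Return the name of the best-scoring candidate model."""
    scores = {name: accuracy(fn, dataset) for name, fn in candidates.items()}
    return max(scores, key=scores.get)
```

The same harness catches the opposite case too: if the shiny new model scores lower, you keep the old one.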

20:46

Cool. Um so now that policy agent we've

20:49

been developing this for a while it's

20:50

available for everyone on the RAM

20:52

platform. Some of the things that we

20:53

learned along the way is that um clot

20:56

code as engineers is very exciting. We

20:58

have full control. We get to modify our

20:59

cloud MD. We get to make sure you know

21:00

tell it to not leave comments. It won't

21:02

leave comments hopefully. Um turns out

21:04

it's not just us. Um finance people also

21:06

really like to have you know modify

21:08

their cloud MD which is their expense

21:09

policy. So if something went wrong with

21:11

the decision then we just like tell them

21:13

hey go update your policy doc which to

21:15

them it's a little scary concept to

21:17

begin like this is a document like you

21:19

know you don't mess with that um you

21:20

have to go through a lot of hoops if you

21:21

want to mess with that but it turns out

21:23

if you get them really excited about the

21:24

feedback loop hey change that you'll see

21:26

it right away turns out they'll be like

21:28

really excited to do this um and then

21:31

And then trust builds over time. Some of our earliest customers for this were some of the Fortune 500s. We actually started with the really big enterprise customers because they would get the most value: they have the most expenses coming in and the most time spent reviewing coffee expenses. So we rolled out to them and let them build trust. We didn't do any autonomous action at first; we just said, "Hey, we're going to give you a suggestion." That's how we phrased it: suggestions. Eventually they came to us and said, "Okay, you know what? I want to go from suggestions to auto-approvals. Anything under $20, you guys are mostly right, I don't care about this, just go auto-approve it." So we gave them the autonomy slider, a way to turn it on, and then they could actually do it themselves.
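
The autonomy slider reduces to a small routing rule. A hypothetical sketch, assuming a customer-set dollar threshold:

```python
# Hypothetical sketch of the autonomy slider: below the customer-set
# threshold the agent acts on its own verdict; otherwise it only suggests.
from typing import Optional

def route_decision(verdict: str, amount: float,
                   auto_approve_limit: Optional[float]) -> str:
    """Return the action to take for a policy-agent verdict."""
    if verdict != "approve":
        return "suggest"                      # never auto-act on rejections
    if auto_approve_limit is not None and amount <= auto_approve_limit:
        return "auto_approve"
    return "suggest"
```

Setting the limit to None is the "suggestions only" position the rollout started in.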

22:13

And last but not least, similar to LLMs, users thrive in product feedback loops. When you're building an AI product, just as an LLM can test whether its code was right and iterate, users work the same way. We gave them in-product ways to improve the expense policy doc and to improve the agent and how it operates, and they're more than excited to take it over, improve it, and personalize it for themselves. From here, I'll pass it on to Ian, who's going to talk about the infrastructure and the culture at Ramp that led us to building the policy agent.

22:54

>> Hey everybody.

22:56

So you've heard a little a little bit

22:58

about like how we're kind of getting

22:59

leverage to all of the different finance

23:02

teams as we operate on top of their

23:04

financial infrastructure and really try

23:05

to get leverage for our customers. Um

23:07

but I think a big thing that we also

23:09

spend a lot of time thinking about is

23:11

how can we get leverage for ramp itself,

23:14

the engineers, our XFN orgs, all the

23:16

people that we work with um every single

23:18

day. And this slide is this section is

23:21

pretty intentionally named AI

23:22

infrastructure and culture because we

23:24

think that this is both like a really

23:26

challenging infrastructure problem but

23:28

it's also really challenging culture

23:29

problem and changing how you work as

23:31

well is a big part of the story.

23:35

To start on the infrastructure side: the core of how most applied AI happens at Ramp is our applied AI service. At a 10,000-foot view, this looks something like an LLM proxy, something like LiteLLM. But there are really three main extensions we've invested in to make it a lot more powerful for our use cases. The first is structured output and a consistent API and SDKs across different model providers. This can be pretty tricky to do, especially with how quickly the provider APIs are changing, but it's a problem we don't want downstream product teams to have to think about. So if you have an idea like "I want to switch from GPT-5.3 to Opus" or "I want to try Gemini 3 Pro," you should be able to do that with a config change and really quickly iterate on semantic similarity, code sandboxing, and structured-output calls that way.

24:28

The other thing we've spent a ton of time thinking about is batch processing and workflow handling. This is really useful for evals or, for us, bulk document or data analysis. That's also something we don't want teams to have to spend a bunch of time on: how do you batch this, how do you handle rate limits, do we want to run it as an offline or online job with a provider like Anthropic? We just want to handle that for downstream consumers so they can focus on providing value for downstream customers.

24:54

pretty big deal is the ability to trace

24:56

different costs across teams and against

24:58

products as well. And this allows us to

25:00

kind of identify the parado you know

25:02

curve of like what is the best kind of

25:04

model performance for cost? How are

25:05

these evolving over time? what teams are

25:07

actually not you know building something

25:09

that's going to be sustainable long term

25:10

for different product services and this

25:13

can be really really important to just

25:14

remove all this work from internal teams

25:16

having to think about this. And the

25:18

last thing that's kind of I think funny

25:20

to think about. We often joke that

25:22

you know our customers are actually

25:24

using more of a frontier model

25:26

than they may even know is out yet.

25:28

It allows us to stay at the frontier:

25:29

when a new model comes out it's a

25:31

one-line config change that impacts every

25:33

single SDK downstream and so rather than

25:36

teams having to learn the SDK or go into

25:38

dozens of different call sites,

25:40

they can just change it in one place for

25:42

their specific team and they now get the

25:44

benefit of being on the latest and

25:45

greatest models that we've kind of

25:47

vetted and built into the rest of the

25:49

system.
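
A minimal sketch of this kind of unified layer, assuming a hypothetical design (the adapter registry, `TeamConfig`, `UnifiedClient`, and the model strings are all illustrative, not Ramp's actual internal service): provider adapters sit behind one interface, a team's model choice lives in config, and every call is tagged with the team so cost can be traced later.

```python
from dataclasses import dataclass, field

# Hypothetical registry of provider adapters; a real version would wrap
# each vendor SDK behind this single interface.
ADAPTERS = {}

def register(provider):
    def wrap(fn):
        ADAPTERS[provider] = fn
        return fn
    return wrap

@register("openai")
def _call_openai(model, prompt):
    return f"[openai:{model}] {prompt}"      # stub; real adapter calls the vendor SDK

@register("anthropic")
def _call_anthropic(model, prompt):
    return f"[anthropic:{model}] {prompt}"   # stub

@dataclass
class TeamConfig:
    provider: str
    model: str
    team: str                 # used for per-team cost attribution

@dataclass
class UnifiedClient:
    config: TeamConfig
    usage_log: list = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        adapter = ADAPTERS[self.config.provider]
        out = adapter(self.config.model, prompt)
        # Trace every call against the team so cost/performance curves
        # can be built across teams and products later.
        self.usage_log.append({"team": self.config.team,
                               "provider": self.config.provider,
                               "model": self.config.model})
        return out

# Switching providers or models is just a config change; none of the
# downstream call sites need to be touched.
cfg = TeamConfig(provider="openai", model="gpt-x", team="expenses")
client = UnifiedClient(cfg)
client.complete("Classify this receipt.")
cfg.provider, cfg.model = "anthropic", "opus-x"
client.complete("Classify this receipt.")
```

The point of the sketch is the shape, not the stubs: one call site, one usage log, and model choice isolated to configuration.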

25:53

Our product as you've kind of heard

25:54

earlier, works on a lot of

25:57

very sensitive data and very sensitive

25:59

workflows. And I think often times uh

26:01

you know something that I hear from

26:02

engineers in the space is this kind of

26:04

concept of hallucination and safety and

26:07

how are you actually going to be able to

26:08

produce a lot of these things to have

26:09

benefits to downstream finance teams.

26:12

And we're pretty big believers that it

26:13

all comes down to the catalog of tools

26:15

that teams are building and integrating

26:16

with on a daily basis. And so what

26:19

you're seeing here is our internal tool

26:21

catalog. So an example would be like get

26:24

a policy snippet or PDM rate or recent

26:27

transactions. And these are built

26:29

alongside of product teams to really

26:31

understand a lot of the nuances in the

26:32

data and the use case. And what's really

26:34

cool about this is not only can you see

26:36

where there are gaps in our offering:

26:37

oh, we actually don't have a tool for

26:39

this specific use case. These can be

26:41

used both in internal repos and our core

26:43

product. And so if you have an idea of I

26:44

want to do a cool reimbursement agent

26:47

idea, here are the different ways to

26:48

integrate the tools, the different APIs

26:50

and systems that they integrate with.

26:51

And now you can prototype that on a

26:53

totally new product in a vibe-coded

26:55

surface area without having to worry

26:56

about like learning all of these things

26:58

from scratch or building the tools on

27:00

your own. We're up to like many hundreds

27:02

of these tools today and, as Nick

27:04

mentioned earlier, we think this could

27:06

be like multiple thousands over time.
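
A rough sketch of what a shared tool catalog might look like, under assumed names (the `Tool` shape, `register_tool`, and the stub bodies are hypothetical; `get_policy_snippet` and `recent_transactions` are just the examples named above): each tool carries a description an agent can read, and registration makes it discoverable from any internal repo or product surface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str   # what an agent sees when deciding which tool to call
    fn: Callable       # the actual integration with an internal API

CATALOG: dict[str, Tool] = {}

def register_tool(name: str, description: str):
    """Add a tool to the shared catalog so any agent or prototype can
    discover and call it instead of rebuilding the integration."""
    def wrap(fn):
        CATALOG[name] = Tool(name, description, fn)
        return fn
    return wrap

@register_tool("get_policy_snippet",
               "Fetch the expense-policy text relevant to a category.")
def get_policy_snippet(category: str) -> str:
    return f"Policy for {category}: ..."        # stub; real tool hits the policy service

@register_tool("recent_transactions",
               "List a user's most recent card transactions.")
def recent_transactions(user_id: str, limit: int = 5) -> list[str]:
    return [f"txn-{i}" for i in range(limit)]   # stub

# A new agent idea can browse the catalog for gaps and reuse what exists.
available = sorted(CATALOG)
print(available)
```

Browsing `CATALOG` is also how you see the gaps: a use case with no matching entry is a tool that still needs to be built.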

27:11

On the topic of context, another big

27:14

thing we think about is context for our

27:16

customers of how do we actually

27:17

integrate the financial stack and allow

27:18

them to be a lot more productive. But we

27:20

noticed like a very similar problem

27:22

internally um on our engineering team.

27:24

And I think something that's not

27:26

always obvious is that you know even if

27:28

you're using something like Claude Code

27:30

or Codex, there's all this fragmentation

27:32

of actually what you do on a daily basis

27:34

to get work done in your company

27:36

that's not integrated, too. There's logs

27:38

in Datadog. There's a production, you

27:40

know database that has a bunch of things

27:42

going on. There's different alerting

27:44

systems. There's Incident.io. There's a

27:46

Slack message you have to pull in.

27:47

There's a notion doc. And then there's a

27:49

lot of like knowledge that those actual

27:50

specific product teams have of how they

27:53

actually need to get work done as well.

27:55

And so at the end of last year, we

27:58

decided to start out and try to solve

28:00

this problem of how can we actually

28:01

integrate all this context and build our

28:03

own internal background coding agent

28:05

which we've called Ramp Inspect. You may

28:07

have seen this on LinkedIn or X. We

28:09

actually have open sourced the blueprint

28:10

of how we built this and at the end I

28:12

can definitely show you guys a link of

28:13

where to find that. And the the progress

28:16

has been pretty phenomenal of actually

28:18

integrating this into a background agent

28:20

that can run autonomously as people are

28:22

in meetings, as bug fixes come up, and

28:24

things like that. And currently this

28:27

month, Ramp Inspect is responsible for

28:29

over 50% of PRs that we merge to

28:31

production. I have some interesting stats.

28:33

We're really big nerds about

28:35

numbers and things like that.

28:36

So we have this dashboard to kind of

28:38

create this interesting,

28:41

subtle, healthy competition but also

28:43

inspire people that they can actually

28:45

use this as well. And so you can see

28:47

engineering has a huge lead in the

28:49

number of sessions, but you also have

28:51

product, you also have design, there's

28:53

risk, legal, corporate finance, and even

28:55

marketing and CX teams using Ramp

28:57

Inspect, and they're doing things like

28:59

simple copy changes, they're doing logic

29:01

fixes, they're trying to respond to

29:03

incidents or bugs and what's been really

29:05

cool to see as this has evolved over

29:08

time

29:11

is how we've actually designed a couple

29:13

of these things with some core

29:15

principles to be really powerful. So what

29:17

you're seeing here is a Ramp Inspect

29:18

session. I think this is an example of

29:20

like a query that we were trying to fix.

29:22

This spins up in the background a really

29:24

fast Modal code sandbox. This allows us

29:27

to resume, spin up, and spin down

29:28

these containers in an isolated

29:30

environment which has the same

29:31

environment that you would have if

29:32

you're developing at Ramp. There's a

29:34

series of tasks to keep it on track and

29:36

it creates a GitHub branch and

29:38

integrates with all of the context

29:39

documents, our Datadog, our read

29:42

replica so it can actually write queries

29:43

and different context documents that

29:45

product teams have put together.

29:48

And what's really, I think, subtle

29:49

about how we've designed this is we've

29:51

designed it to be multiplayer first. And

29:53

that means that as you integrate or you

29:55

try to pair with like a designer or

29:57

somebody on the PM team, you can

30:00

actually help them like level up their

30:01

own prompting skills. They can give us

30:03

feedback of, hey, click on this link.

30:04

This actually failed in a way that I

30:06

wasn't expecting. And so that can be a

30:08

really great source of like

30:09

cross-functional collaboration. That was a

30:12

very subtle design choice that we made

30:13

that ended up having a really big impact

30:15

for the company. And then these can be

30:17

kicked off either via the kanban UI, we

30:20

have an API, and then also a Slack

30:22

thread. And we can take the full context

30:24

of the Slack thread when it is actually

30:25

kicked off. So you don't have to

30:26

reprompt it with a bunch of conversation

30:28

that happened earlier.
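
As a small sketch of the Slack kickoff idea (the payload shape, `SlackMessage`, and `build_session_prompt` are hypothetical, not the open-sourced blueprint's actual API): folding the whole thread into the kickoff prompt is what spares you from re-prompting the agent with the earlier conversation.

```python
from dataclasses import dataclass

@dataclass
class SlackMessage:
    author: str
    text: str

def build_session_prompt(thread: list[SlackMessage]) -> str:
    """Fold an entire Slack thread into the kickoff prompt so the
    background agent starts with the discussion's full context."""
    context = "\n".join(f"{m.author}: {m.text}" for m in thread[:-1])
    task = thread[-1].text   # the message that triggered the agent
    return f"Conversation so far:\n{context}\n\nTask:\n{task}"

# Hypothetical thread: two humans debugging, then a mention that kicks
# off the background session.
thread = [
    SlackMessage("dana", "The export job failed again last night."),
    SlackMessage("sam", "Looks like a timeout in the CSV writer."),
    SlackMessage("dana", "@agent please fix the timeout and open a PR."),
]
print(build_session_prompt(thread))
```

The same prompt-building step would sit behind the kanban UI and API entry points too; only the source of the context differs.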

30:32

What you see here is um we also have a

30:34

full VS Code environment. We run VNC

30:36

inside of a Modal sandbox as well. So

30:38

this allows us to have Chrome DevTools

30:40

and MCP. So it can actually do full

30:42

stack work which is pretty cool. And it

30:44

has access to the 150,000-plus

30:46

tests that we have. So it also knows if

30:48

things are broken, can respond to the CI

30:50

inside of GitHub and actually patch

30:52

fixes before it actually pings you that

30:54

the PR is done.
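
The "patch CI before pinging you" behavior can be sketched as a small retry loop; everything here is a stub standing in for the real CI and agent calls (`attempt_patch` and `finish_pr` are invented names, and each "patch" just pretends to fix one failing test).

```python
def attempt_patch(branch: str, failing: list) -> list:
    """Stub for the agent's fix attempt: pretend it resolves the first
    failing test and return the remainder."""
    return failing[1:]

def finish_pr(branch: str, failures: list, max_rounds: int = 3) -> str:
    """Keep patching while CI is red; only notify the human once the
    branch is green or the retry budget runs out."""
    for _ in range(max_rounds):
        if not failures:
            return "pinged: PR ready for review"
        failures = attempt_patch(branch, failures)
    return "pinged: needs human help"

result = finish_pr("fix/timeout", ["test_csv_export", "test_retry"])
```

The design choice the loop captures: the human is interrupted exactly once, at the end, rather than on every red CI run.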

30:56

Um the link for this is

30:58

builders.ramp.com.

31:00

I think it's one of the first,

31:01

or the most recent,

31:03

blog posts that we have, and we

31:05

open-sourced like the whole blueprint of

31:06

how to build this and put this together

31:08

as well. I think there's also a GitHub

31:10

repo called open inspect which is an

31:11

open source implementation of this as

31:13

well.

31:17

So it's been pretty interesting to see

31:19

the impact that Ramp Inspect has had

31:20

where over 50% of PRs that we merge on a

31:23

weekly basis go through the system.

31:25

And so with all this time not spent on

31:28

thinking about these really low-level

31:29

firefighting tasks or really low-level

31:32

small fixes or tweaks that can be kind

31:34

of democratized across the company,

31:36

we're really rethinking like how our

31:38

engineering teams operate and think

31:39

about their job and how they can

31:42

actually be really impactful in this

31:44

new kind of AI native future.

31:47

And so as a thought experiment,

31:49

let's pretend we have two different

31:51

teams. I'm sure everyone in this room

31:52

has worked with like their handful of

31:54

extraordinary teams, and maybe teams that

31:56

are finding their footing. And you'll

31:58

notice that there's like a couple of

31:59

different qualities that

32:01

may resonate. So we have team A on the

32:03

left here. And let's say that they

32:05

really care about impact. They handle

32:06

ambiguous problems. They understand the

32:08

product, business, and data. They adopt

32:10

new tools. They can find creative

32:12

solutions and they obsess over like the

32:14

user experience. And then team B may

32:17

also resonate with some people. You

32:18

know, they debate libraries. They add

32:20

process when things start to feel

32:21

chaotic. They constantly complain about

32:23

headcount. They bike shed the details

32:26

instead of actually focusing on the user

32:28

experience like hey should we use you

32:30

know functional programming paradigm

32:32

here or what version of you know

32:33

different TypeScript libraries do we

32:35

want to use. And then they build before

32:37

understanding the problem right they

32:38

just say hey we're going to just vibe

32:40

code this, bro, don't worry. Or they focus

32:42

on you know performative code quality or

32:44

nitpicks that may

32:46

very much be a subjective

32:48

matter as well. I've worked on

32:50

both of these teams and I think the

32:51

argument that I'm going to make today is

32:53

that there's going to be divergence I

32:55

think depending on what side of the

32:56

aisle you land on. This is a study

32:59

from Harvard that came out I think at the

33:01

end of last year and it was very much

33:03

geared towards juniors and seniors in

33:05

terms of what's actually happening with

33:06

hiring trends in engineering since

33:09

AI tools have accelerated. And I think

33:11

what this glosses over is I don't think

33:13

it's just a years of experience problem.

33:15

I actually think it's very much all

33:17

of the different qualities that I said

33:18

in team A versus team B that really make

33:21

it apparent that like coding was never

33:23

really the hardest part of a lot of jobs

33:25

for a long time. There's all

33:28

these other engineering principles that

33:29

become more important than just raw

33:31

coding speed. So when you think about

33:33

like a staff or staff-plus engineer,

33:36

you're really compensating those people

33:37

more for a lot of the judgment that they

33:40

bring to the table, the context, the

33:41

ability to see around corners, all the

33:43

learning that they have, the actual like

33:45

scar tissue. And so if, you know, you ask

33:47

Opus 4.6 to do something, they'll have

33:50

the knowledge to actually know if that

33:51

is not going to work or that's actually

33:53

a bad idea. And I think one thing that a

33:55

lot of the narratives that we see in the

33:56

media get wrong about coding agents

33:59

is they don't really identify the fact

34:01

that you could still build the wrong

34:02

thing just a lot faster and you can

34:04

build like bigger messes. And I think

34:06

that having a lot of these skills of a

34:08

team A and really focusing on like what

34:10

is the context and reason behind this

34:11

will only become more important um in

34:14

AI.

34:16

And so what does that actually look

34:18

like? We hit on some of these things.

34:20

Figuring out what to build and

34:22

understanding users well enough. Selling

34:24

an idea to skeptical stakeholders. This

34:26

is still something when we decided to

34:28

build a background coding agent. This

34:30

was not something that was obvious that

34:31

we should be spending time on this.

34:33

Making good design decisions with

34:36

incomplete information and maintaining

34:38

momentum through the long middle of this

34:40

project, which can be really gnarly. And

34:42

I think this last bit, you know,

34:44

everyone in this room, I'm sure, is

34:45

painfully aware of, you know, the

34:47

conversation around SaaS and the

34:49

stock market and things like that. And I

34:51

think this is like a big element that

34:52

they gloss over, which is that yes, it's

34:55

easy to vibe code something, but

34:56

actually going through that middle

34:58

process is like why you need really good

34:59

engineers to actually get something

35:00

deployed that has product market fit

35:03

that people are really excited about.

35:04

Um, and I think not enough people

35:06

recognize that.

35:10

And so where does that leave us?

35:11

Personally, I think there's a lot of

35:13

kind of doomerism and scariness around a

35:15

lot of the AI narratives, but I think

35:17

it's also a really exciting time to be

35:19

building. Unlike maybe factory work or

35:22

farming, software is never done. We have

35:25

this uh really kind of like meme

35:28

internally where we say, you know, job's

35:30

not finished. You've probably seen in

35:31

the marketing as well. And I think

35:32

software is perpetually not finished.

35:34

And so with all this extra capacity,

35:37

with people focusing less on this kind

35:39

of low-level work and more on high

35:40

leverage engineering tasks, I think four

35:43

things are going to really happen. I

35:45

think companies are just going to chase

35:46

opportunities they couldn't afford to

35:48

pursue. I don't know if we would be

35:50

chasing these like agentic workflows and

35:52

really thinking about bigger scale

35:54

problems in the financial stack if this

35:56

technology didn't exist. People are

35:58

going to enter adjacent markets. They're

36:00

going to try to stitch together more

36:01

value for customers. It's not going to

36:03

be like because everyone's 2x more

36:05

productive, you need 2x fewer, or half

36:07

the people. You're going to rebuild

36:09

systems that are too expensive to touch.

36:11

I think building an internal background

36:12

coding agent for a company that does

36:15

financial operations software felt

36:17

like probably a pretty crazy idea, but

36:19

now that makes a ton of sense. And fourth: raise

36:21

the bar for what good enough means. I

36:23

think, you know, being able to kind of

36:25

build more mind-blowing experiences for

36:27

users, provide a lot more value is going

36:29

to be the narrative of the next decade.

36:32

And I'm super excited to be able to

36:33

build some of these things and see what

36:35

everyone in this room is going to build,

36:36

too. So, thank you.

Interactive Summary

The presentation discusses RAMP's application of AI, starting with its initial agentic approach for finance automation, evolving to a paradigm of a single agent with many skills. It details the development and learnings from their "Policy Agent," which automates expense policy enforcement through real-time review, and highlights the importance of starting simple, iterating, building ground truth data, and robust evaluation systems. The talk then covers RAMP's AI infrastructure, including a unified AI service and an internal tool catalog, and introduces "RAMP Inspect," a background coding agent responsible for over 50% of PRs. Finally, it addresses the cultural shift in engineering, emphasizing that AI amplifies the need for judgment, context, and problem understanding over raw coding speed, leading to new opportunities and a higher bar for product value.
