Lessons from Building a New AI Product at Ramp

Lessons from Building a New AI Product at Ramp - The Pragmatic Summit

Transcript

0:05

Today we're going to talk about AI at uh

0:08

Ramp. And uh I'm going to give a

0:11

quick introduction into what

0:12

Ramp is. Um really briefly we're going

0:15

to walk through the simplest possible

0:17

expense use case that you guys can all

0:19

resonate with because I see everybody's

0:20

drinking coffee. And um then we're going

0:23

to uh talk quickly about a lesson that

0:27

we learned uh this year while we were

0:29

building a gazillion agents um and sort of

0:32

the pivot in the paradigm that's

0:34

happening especially after February 6.

0:36

And um then we're going to double click

0:39

onto how we built one of our most

0:41

popular agents, the policy agent. Um and

0:44

then finally we'll dig into the

0:47

infrastructure build that this is

0:49

requiring us to do on our side

0:52

and in my mind most importantly the

0:54

culture shift that needs to happen on

0:57

everyone's teams in order to be able to

0:59

operate in a way that delivers products

1:02

into the hands of your customers in the

1:04

fastest and most impactful way. Uh so

1:07

without further ado uh quick intro about

1:10

Ramp. Uh we are the number one finance

1:12

platform for modern businesses. We have

1:15

50,000 plus customers and we're in the

1:17

business of saving you time and money.

1:21

Uh we have uh I've seen some of the uh

1:24

some of those names on the name

1:26

tags here. So thank you for being Ramp

1:27

customers. Uh

1:30

really exciting. Uh really quickly. So a

1:33

cup of coffee uh takes uh usually about

1:37

15 minutes of your time um because you

1:40

got to do these three simple things which

1:43

unfortunately take minutes. Um this

1:45

compounds through the company and uh

1:48

what Ramp does in the simplest possible

1:50

way we just condense time and uh return

1:54

money back. Uh so a simple story of a

1:57

transaction from tapping the card to

2:00

writing a memo to classifying the

2:01

transaction according to your GL to

2:04

sourcing the receipt attaching the

2:05

receipt um normalizing the merchant to

2:08

your um inventory of merchants uh is all

2:12

done agentically at Ramp, and this was

2:14

our first foray uh probably by now uh

2:17

you guys still still here? probably by

2:19

now about 3 years ago we started doing

2:21

these one-shot things with AI, uh normalize

2:24

merchant, write a memo, and it's been

2:26

working really really well as the models

2:28

get better uh what else is going on at

2:31

the company well literally every persona

2:35

uh at the company is wasting time on a

2:40

lot of manual work uh so from AP clerks

2:43

to your finance team from your

2:45

purchasing teams

2:47

uh keep going more finance work. Your

2:50

data teams uh at Ramp we used to have a

2:52

channel called help data where somebody

2:55

will ask for a CSV and a poor person

2:56

will go and write a SQL query. Uh we

2:59

replaced it about a year and a half ago.

3:02

Uh so a lot of time being spent and the

3:05

complexity has a ramp shape. It only

3:07

increases as you go through different

3:09

jobs to be done. Um so if you guys watched the

3:12

Super Bowl uh you might be familiar with

3:14

Brian um our agent. Uh so we've been

3:17

writing a lot of agents literally for

3:19

every job to be done to cover the

3:22

entirety in the end state the entirety

3:24

of what admins, employees, and finance

3:27

teams are doing that is not directly

3:29

related to making the money. We want you

3:32

all to be making money and focus on your

3:34

customers not on how to close the books.

3:37

Uh but what's been happening for the

3:39

past few weeks is uh that we're living

3:43

through the most exciting paradigm shift

3:45

in software. Um and it requires a complete

3:48

rethink, and with the rethink, simplification

3:52

of your stack. Um so what we learned is

3:55

you don't need to build a thousand

3:57

agents. We intentionally last year

3:59

allowed each individual team to go and

4:01

experiment and we ended up maybe with

4:03

four different ways of doing the same

4:05

thing both for synchronous agents as

4:07

well as for background agents. Um but

4:10

instead you want to drive your framework

4:14

towards a single agent with a thousand

4:18

skills. Uh so let's talk about what the

4:21

software traditionally used to focus on.

4:23

So every process um especially in the

4:26

modern AI stack boils down to

4:28

having an event. Um so you can

4:31

receive an invoice and you want to pay

4:33

it. Um some prompt instructions of what

4:36

you want to do with it and some

4:37

guardrails like a policy uh like an

4:39

expense policy or a payables policy. Um

4:42

context what is the data that the agent

4:45

should consider and then finally tools.

4:48

These are APIs and actions that you can

4:50

do. And traditionally software would

4:52

focus on only four and five. Uh in the

4:55

new paradigm software is doing

4:58

everything. So you want to focus on

5:01

building an autonomous system of action

5:04

that can react, reason and act without a

5:08

human or with very little human

5:09

supervision.
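The five ingredients listed above (event, prompt instructions, guardrails, context, tools) can be sketched as a single data structure. This is only an illustration of the idea, not Ramp's framework: the class name `AgentRun`, the field names, the `build_prompt` helper, and the sample invoice are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentRun:
    """The five ingredients named in the talk; all names here are illustrative."""
    event: dict                     # 1. what happened, e.g. an invoice arrived
    instructions: str               # 2. prompt: what to do about the event
    guardrails: List[str]           # 3. policy rules the agent must respect
    context: dict                   # 4. data the agent should consider
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)  # 5. actions


def build_prompt(run: AgentRun) -> str:
    """Flatten the ingredients into one prompt string for a model call."""
    rules = "\n".join(f"- {rule}" for rule in run.guardrails)
    facts = "\n".join(f"{key}: {value}" for key, value in sorted(run.context.items()))
    tool_names = ", ".join(sorted(run.tools))
    return (
        f"Event: {run.event['type']}\n"
        f"Instructions: {run.instructions}\n"
        f"Guardrails:\n{rules}\n"
        f"Context:\n{facts}\n"
        f"Available tools: {tool_names}"
    )


# Hypothetical example: an invoice arrives and the agent may pay it.
run = AgentRun(
    event={"type": "invoice_received", "amount": 1200},
    instructions="Decide whether to pay this invoice.",
    guardrails=["Invoices over 1000 need a matching purchase order."],
    context={"vendor": "Acme", "purchase_order": "PO-17"},
    tools={"pay_invoice": lambda invoice_id: f"paid {invoice_id}"},
)
prompt = build_prompt(run)
```

In this framing, traditional software would own only `context` and `tools`; the agentic version also owns the event handling, the instructions, and the guardrails.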

5:11

Uh so what does it mean in terms of what

5:13

we're building?

5:15

So first uh we decided we're going to

5:19

consolidate the interactions

5:22

um verbal interactions uh with the

5:26

agents to a single conversational UX. Uh

5:28

we literally at the end of last year we

5:30

had about five different conversational

5:32

UXs. We now have consolidated it into

5:34

what we call an Omnichat. Omni meaning

5:37

omnipresent. It is now being deployed

5:39

to every surface of the product. And it

5:42

works well with the traditional UX

5:43

because you still need tables and

5:45

buttons and uh you don't always want to

5:47

be talking uh to your software. Um but

5:51

this is a good example of what Omnichat

5:52

looks like. Uh "please onboard a new

5:54

employee." Uh Omnichat can resolve an

5:57

employee to uh an employee ID and look

6:00

up through an HRIS tool uh their

6:02

corporate structure. And it found a

6:06

workflow, an agentic workflow that we

6:07

created previously called the new hire

6:09

playbook. and the agent is asking would

6:11

you like me to onboard the person

6:13

using this playbook? How is this

6:15

possible? We built an in-house

6:17

lightweight agent framework uh that

6:19

provides orchestration with uh tools

6:22

that engineers are very quickly building

6:24

and most recently uh we had one product

6:26

manager vibe-code about 20 tools. So

6:28

engineers are no longer needed to build

6:30

those tools. Um and sometimes your

6:33

workflows are involved such as

6:36

employee onboarding, which consists of four

6:38

steps. So you can just go on Ramp and

6:40

describe what you want to happen when

6:42

a new employee joins. Give them a card.

6:45

Um make sure they get receipts for every

6:47

transaction. Congratulate them on

6:49

Slack and check in with them in two

6:51

weeks. We now are able to compile this

6:53

into a runnable deterministic workflow

6:57

um and then give it to the agent to

6:59

execute. Uh playbooks make use of tools

7:02

and how this all comes together. Um this

7:06

is an example which uh Viral is going to

7:08

double click on next: uh upon swiping

7:11

the card uh there's a real-time policy

7:14

review that's happening uh directly in

7:16

the software and policy agent enforces

7:20

your company requirements with regard to

7:23

spend. Um therefore it's very safe to

7:25

give Ramp cards to literally every

7:27

employee in your company. And there's a

7:29

handoff happening with uh an accounting

7:31

coding agent that uh classifies this

7:33

transaction and applies the rules of your

7:37

back office team, of your finance team. As

7:39

an employee, I have no idea how certain

7:41

transactions should match to our GL, and

7:43

that's what typical traditional products

7:45

would do. They will expose it to you. Um

7:48

so the agent is much better at doing it

7:49

because it has the full context of your

7:51

chart of accounts. it understands your

7:52

ERP and then it can either auto approve

7:55

or in the worst case scenario it will

7:57

involve uh the human in the loop to

7:59

review materiality or notify that there

8:02

is an out of policy spend. Uh with that

8:06

uh please welcome Viral who'll uh dive

8:08

deeper into the policy agent.
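The new-hire playbook described earlier in this section (give them a card, require receipts, congratulate them on Slack, check in after two weeks) ends up as a deterministic list of tool calls once it has been compiled. The sketch below shows only that compiled artifact and a runner for it; the natural-language compile step is assumed to have already happened, and every tool name, the `Step` type, and `run_playbook` are hypothetical stand-ins rather than Ramp's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    tool: str                                  # name of a tool in the shared toolbox
    args: dict = field(default_factory=dict)   # fixed arguments chosen at compile time


# Hypothetical tools; real ones would call internal APIs.
def issue_card(employee_id):
    return f"card issued to {employee_id}"


def require_receipts(employee_id):
    return f"receipts required for {employee_id}"


def send_slack(employee_id, message):
    return f"slack to {employee_id}: {message}"


def schedule_check_in(employee_id, days):
    return f"check-in with {employee_id} in {days} days"


TOOLBOX = {
    "issue_card": issue_card,
    "require_receipts": require_receipts,
    "send_slack": send_slack,
    "schedule_check_in": schedule_check_in,
}

# The four onboarding steps from the talk, in a fixed order.
NEW_HIRE_PLAYBOOK = [
    Step("issue_card"),
    Step("require_receipts"),
    Step("send_slack", {"message": "Welcome aboard!"}),
    Step("schedule_check_in", {"days": 14}),
]


def run_playbook(steps, employee_id):
    """Execute each step in order and return a log of what was done."""
    return [TOOLBOX[step.tool](employee_id, **step.args) for step in steps]


log = run_playbook(NEW_HIRE_PLAYBOOK, "emp_42")
```

Because the step list is plain data, the same playbook produces the same tool calls on every run, which is what makes it deterministic even though an agent triggers it.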

8:12

Thanks Nick.

8:19

Awesome. So, a lot of finance teams are

8:22

looking at receipts like this basically

8:24

every day and maybe they might have

8:26

hundreds or thousands of these. If you

8:28

told me to look at this and decide if I

8:30

should approve or reject this

8:31

transaction, I'm probably going to make

8:33

a mistake.

8:34

So, policy agent basically reasons on

8:37

this image and all the transaction data

8:38

that we have and told me that there were

8:40

eight guests in the receipt. I could

8:42

barely see that when I was looking at

8:43

it. Uh, it was below the $80 a person

8:46

cap that we have internally. uh they

8:49

were going for a team welcome dinner uh

8:51

and so because the amount was verified

8:53

as well and the merchant uh policy agent

8:56

told me to approve this transaction.

8:59

Similarly for this OpenAI transaction,

9:01

Anand was testing out um some

9:03

ChatGPT features and so policy agent told

9:06

me this was a valid uh business

9:08

expense and told me to approve it and

9:10

then this $3 bakery charge

9:14

was uh rejected because uh it wasn't

9:17

uh part of an overtime purchase and it

9:19

didn't happen on the weekend.

9:22

So really we looked at this as an

9:24

opportunity to rethink how Ramp was set

9:27

up. Um controllers and finance teams are

9:30

looking at transactions like these and

9:32

and making these decisions every day.

9:34

And a Fortune 500 company that is one of

9:36

our customers was coming to us and

9:38

saying, "Hey, can you uh make sure that

9:40

you approve these types of expenses and

9:42

reject these types of expenses?" And

9:43

they basically had a list of all the

9:45

rules that uh Ramp uh should

9:48

follow. And we kind of saw this as an

9:50

opportunity not to kind of add more

9:52

incremental deterministic rules that

9:55

kind of defined our product. and I

9:56

worked on some of the first versions of

9:57

these um but actually kind of take a

10:01

page from Andrej Karpathy saying that

10:03

English is the new programming language

10:05

and kind of turn the expense policy into

10:07

the rules themselves. So um you can you

10:10

can see Ramp's expense policy on the left

10:12

and and this is a screenshot from our

10:14

production environment but we are seeing

10:16

really great uh use out of our policy

10:19

agent product and it kind of needed to

10:22

start really

10:25

um organically. So we kind of operated

10:27

like an early-stage startup. We're

10:28

already very incremental and

10:30

fast at Ramp but uh we found some design

10:33

partners like that Fortune 500 company.

10:35

We iterated really quickly and we had

10:37

weekly meetings with all of them

10:39

to kind of understand exactly what uh

10:41

feedback we wanted to hear and what what

10:43

we could improve.

10:46

I think one of the main important um I

10:49

guess things that we realized across uh

10:51

Ramp is that we really needed to lean

10:54

into the fact that AI products cannot be

10:56

one-shotted. You need to start with

10:58

something simple. And so as long as

11:00

everyone on your team, PMs, designers,

11:03

engineers are aligned, that you're not

11:04

going to have perfection on day one. I

11:07

think that was actually one of the main

11:08

like cultural learnings. Um, and so we

11:11

dogfooded a lot of this work internally

11:13

uh and started with an even more

11:14

constrained problem of trying to decide

11:17

whether our coffee with a colleague

11:18

transaction should be approved or

11:20

rejected. These are single uh uh dollar

11:23

amount transactions that are low risk um

11:26

uh according to our finance team. And so

11:28

we started uh with these transactions

11:30

and uh one of the early learnings

11:32

especially as we kind of released this

11:35

uh into production was that a lot of the

11:37

reason that policy agent would be wrong

11:39

would be less on the models themselves

11:41

and more about the context that we were

11:43

giving uh to the LLMs themselves. So uh

11:46

we could have sat down and thought

11:47

about all the context in the beginning

11:49

before we even kicked off any

11:51

engineering work. Uh but we realized

11:53

actually the best thing would be to

11:54

learn from some of our live internal

11:56

data. And so uh for example we learned

11:58

that the role and the title of an

12:00

employee is super important when looking

12:02

at expense policy docs. Certain levels,

12:04

C-suite for example, might have higher

12:06

limits, maybe they can fly first class

12:08

for certain flights, and so we

12:09

started extracting more information from

12:11

receipts started uh pulling in

12:13

information from HRIS fields that are

12:15

already on Ramp. And so um Will is going

12:18

to kind of talk you through exactly the

12:20

iterations that we went through to

12:22

implement policy agent and and uh some

12:24

of the learnings along the way.

12:32

down. It's down.

12:34

>> Yeah.

12:37

>> All right. Cool. Um, awesome. Um, so

12:40

when we first started building the

12:42

policy agent internally, um, we dreamed,

12:44

we went big. We're like, hey, let's

12:45

automate all of finance. Let's automate

12:47

all reviews. But when it came down to

12:49

it, we actually had to start small. Um,

12:51

is that cup of coffee, you know, in your

12:53

expense policy? And the reason that we

12:55

did that was because even though the

12:57

problem sounds simple to automate, you

13:00

know, is this a simple question. Is this

13:02

in policy or not? Um, it was going to

13:04

grow to be complex. Kind of like Viral

13:06

said, we could have sat down and we

13:08

could have figured out what context do

13:09

we have, how can we add it, how can we

13:11

put it all together in a way that an LLM

13:12

can understand and you know, put it all

13:14

together from the get-go. But we knew

13:16

that even if we aimed and got everything

13:19

right the first time, it was probably

13:21

going to be wrong once you applied and

13:22

generalized it and went to another

13:23

business. Um, so

13:27

the simpler the system, I think the

13:29

easier it is to iterate on top of it.

13:31

And once you iterate, you know what's

13:32

going to work, you know what's not, and

13:33

you can kind of layer complexity on top

13:34

of that. And I think that's pretty

13:36

important to um, keep in mind when

13:37

you're building an, um, LLM or an agent

13:40

starter. So for us we started really

13:43

simple very very um kind of the classic

13:46

you know we have an expense come in

13:47

retrieve the context around it we pass

13:49

it through a series of LLM calls that are

13:51

very well defined of like hey is this in

13:53

policy why is it in policy how can we

13:55

show the user that it's in policy and then

13:57

give an output that uh makes sense in

13:58

this way to the user eventually we

14:01

learned that each expense is kind of

14:03

different we can classify an expense

14:04

based on is it travel is it a meal is it

14:06

entertainment do conditional prompting

14:08

and then retrieve context based on that

14:10

and pass it to a series of LLM calls and

14:12

give it some tools so that it can also

14:14

autonomously decide hey um I need flight

14:16

information actually or I need this

14:18

employee's level um and kind of layer

14:20

that on top and a few iterations later

14:22

we came to a full on agentic workflow um

14:26

we ended up with um complex tools to

14:29

read across all of our platform and

14:31

these tools are shared across all of

14:33

our agents it's not just for policy

14:34

agent we have a company internal toolbox

14:36

that all of our agents easily can

14:38

you know reach into. And we gave it

14:40

the um capability to

14:44

write as well. So it's now writing

14:46

decisions. It's writing uh reasoning.

14:47

It's auto-approving expenses on

14:50

users' behalf. Um and it goes in a loop.

14:53

So um you know now it's more of a black

14:55

box. And that's kind of the trade-off

14:56

you get. Um as you go from simple to

15:00

complex systems um your capability goes

15:02

up, your uh autonomy goes up, your

15:04

agents are able to do more, your AI can

15:06

do more, your AI seems smarter. But in

15:08

exchange,

15:10

you're losing traceability and

15:11

explainability. Uh we look at it now, we

15:14

can kind of look at the reasoning tokens

15:15

that the LLM gives us, but in the end, we

15:17

have no control over it. It's going to

15:18

do what it thinks is right. It's going

15:19

to make the tool calls. It's going to

15:21

tell you it's right or wrong. So a

15:23

smaller black box becomes a bigger black

15:24

box as the system becomes more complex.
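The intermediate stage described above, where an expense is first classified and the category then selects the prompt and the context to fetch, can be sketched as a small routing table. Everything here is an assumption for illustration: the category names follow the talk (travel, meal), but the prompt text, the context field names, and the keyword-based `classify` function (a stand-in for what would be an LLM call) are invented.

```python
# Category-specific prompt text; wording is hypothetical.
CATEGORY_PROMPTS = {
    "travel": "Check fare class and trip dates against the travel section.",
    "meal": "Check the per-person amount and the number of guests.",
    "other": "Check the expense against the general policy.",
}

# Which context fields to retrieve for each category; names are hypothetical.
CATEGORY_CONTEXT = {
    "travel": ["employee_level", "flight_details"],
    "meal": ["guest_count", "receipt_total"],
    "other": ["receipt_total"],
}


def classify(expense):
    """Stand-in for an LLM classification call: keyword match on merchant category."""
    merchant = expense.get("merchant_category", "").lower()
    if "airline" in merchant or "hotel" in merchant:
        return "travel"
    if "restaurant" in merchant or "cafe" in merchant:
        return "meal"
    return "other"


def plan_review(expense):
    """Pick the prompt and the context fields based on the expense category."""
    category = classify(expense)
    return {
        "category": category,
        "prompt": CATEGORY_PROMPTS[category],
        "context_fields": CATEGORY_CONTEXT[category],
    }


plan = plan_review({"merchant_category": "Restaurant", "amount": 480})
```

A pipeline like this stays easy to trace because each stage has a visible input and output; the fully agentic version trades that away by letting the model decide which tools and context it needs.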

15:29

So one thing that is really important uh

15:31

when doing something like this is from

15:32

the beginning you need really good

15:33

auditability. Um, assume even if you

15:36

know how it works, assume that your

15:38

inputs and outputs are all you know and

15:40

make sure that it's correct. Um, so if

15:44

it was a blackbox system and you only

15:46

saw the input output, can you verify

15:47

that it did the right thing? And even if

15:49

that blackbox changes, you should be

15:50

able to reason about whether the output

15:52

is correct. Um, as with many products

15:55

that we built at Ramp and across, you

15:57

know, other companies, we thought that

15:59

the users would be correct. Uh, you

16:00

know, if the user says approve, the

16:02

agent should approve. If the user says

16:04

reject, the agent should reject. But

16:06

turns out the users are actually

16:08

incorrect. They're wrong. They are

16:09

sometimes, you know, they don't know the

16:10

expense policy. You know, they trust

16:12

their employees. They're lazy. It's a

16:14

Sunday. Who knows? Um, so turns out we

16:17

can't always do what the users are doing

16:19

because sometimes that's where uh

16:20

finance teams come back to you and are

16:22

like, "Hey, this is wrong. This

16:23

shouldn't be on the uh company card."

16:26

So, we had to define our own definition

16:28

of correctness. Um and to do that uh we

16:30

had a weekly labeling session with

16:32

people across functions that are working on

16:33

this product. Um and that had two um

16:36

kind of really good outcomes. One was

16:39

that we had a ground truth data set that

16:41

we could always test against and we knew

16:42

that this was correct. And two was that

16:44

everyone was on the same page. If our

16:46

agent got something wrong, everyone knew

16:48

that it got it wrong. Or you know our

16:50

agent is missing context, everyone knew

16:52

that it's missing that context. So there

16:54

was less communication. everyone's on

16:55

the same page and um they could focus on

16:57

what's really a priority and kind of have

16:59

alignment on that.
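A ground-truth data set like the one described above is what makes an offline eval possible: run the agent over the labeled cases and report where it disagrees with the labels. The sketch below is a minimal version under stated assumptions: `toy_agent` is a deliberately naive stand-in for the real policy agent, and the five cases and their labels are invented.

```python
def toy_agent(expense):
    """Naive stand-in for the policy agent: approve small amounts only."""
    return "approve" if expense["amount"] <= 10 else "reject"


# Hypothetical labeled cases, as they might come out of a labeling session.
CASES = [
    {"id": "c1", "expense": {"amount": 4}, "label": "approve"},
    {"id": "c2", "expense": {"amount": 7}, "label": "approve"},
    {"id": "c3", "expense": {"amount": 300}, "label": "reject"},
    {"id": "c4", "expense": {"amount": 12}, "label": "approve"},  # toy rule misses this
    {"id": "c5", "expense": {"amount": 95}, "label": "reject"},
]


def evaluate(agent, labeled_cases):
    """Return accuracy plus a readable list of every disagreement with the labels."""
    failures = []
    for case in labeled_cases:
        got = agent(case["expense"])
        if got != case["label"]:
            failures.append({"id": case["id"], "expected": case["label"], "got": got})
    correct = len(labeled_cases) - len(failures)
    return correct / len(labeled_cases), failures


accuracy, failures = evaluate(toy_agent, CASES)
```

Returning the individual failures, not just a score, keeps the result easy to read, and a CI job could fail the build whenever `failures` is non-empty for cases that are expected to always pass.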

17:02

Initially um getting all those people

17:05

together in a room every week giving

17:06

them homework to label 100 data points,

17:08

it's expensive, you know.

17:10

Everyone has things to do and, you know,

17:12

sometimes they don't come back with

17:13

their homework done. It just kind

17:14

of almost becomes tedious even

17:16

though it's so important so we wanted to

17:18

make it as simple as possible and the

17:20

way we did that was that we looked for

17:22

third party vendors that could provide

17:24

us the tools to label data and collect

17:25

the data but turns out some tools are

17:28

too specific to a use case some tools

17:29

are too general and we could have spent

17:31

weeks trying out different tools, but we

17:33

decided let's just build our own. Um, so

17:35

we used Claude Code using Streamlit. We

17:38

basically one-shotted all of this. And

17:40

the greatest part of it all is that it's

17:42

low maintenance um low risk. It's in a

17:44

part of the codebase that if it breaks,

17:46

we can fix it right away. Deploys happen

17:47

in like seconds. And

17:49

non-engineers can go and personalize it.

17:51

They can vibe-code it. They can

17:52

use Claude Code. And this was with Opus 4. So

17:54

now with Opus 4.6, I expect it's even

17:56

better. And uh with something like that,

17:58

it's definitely easier and cheaper

17:59

sometimes to do something one-off like

18:01

this.

18:04

And

18:06

with that, with the ground truth data

18:07

set, we were able to make quick

18:08

iterations. We were able to find out,

18:10

hey, we need employee levels, add that.

18:12

How does that work? Run it against this

18:13

data set. Does it actually catch it? And

18:15

now say accept or approve. Um, and we

18:17

were able to make really quick

18:18

iterations and that was kind of the key

18:20

um that was actually kind of a key point

18:22

in developing this. uh we had really

18:24

early confidence that this could

18:26

actually work and we were able to

18:27

actually get a lot of buy-in, um get

18:30

a lot of customers onboarded and kind

18:31

of try it out as a design partner. Um

18:35

and as part of like doing that iteration

18:38

with the data set you had evals and I

18:40

feel like evals are being you know

18:41

obviously everyone I think in this room

18:43

now knows about evals and what they mean

18:44

but um it's pretty important to have

18:47

them early on. I wouldn't say that you

18:48

know don't let perfectionism you know

18:50

get in the way. You don't need a full

18:51

data set of a thousand data points that

18:52

you're testing against every iteration.

18:54

We started with five, you know, and we

18:56

knew that those five we were not going

18:57

to fail. We kept adding and adding and

18:59

adding and you know make sure it's easy

19:02

to run. Anyone could go and just run

19:04

that command and then make sure that the

19:06

results are really easy to understand.

19:07

Um they are able to look at it get

19:09

instant you know output like and

19:10

understand like hey this is what the

19:12

model's doing. this is like good, this

19:13

is bad. And like if you run it as part

19:15

of your CI, then everyone now can safely

19:18

merge in code. Um because whenever um

19:22

whenever you think you're doing

19:23

something right for the LLM or agent,

19:25

giving more context, giving a tool, more

19:27

likely than not, it's probably going to

19:28

have some kind of bad, you know,

19:29

consequence that you didn't see

19:31

happening. Context rot. Um whether it be

19:33

the tool instructions are wrong or maybe

19:35

the docstring was like a little

19:36

confusing and conflicting. Um so it

19:38

might have consequences. You just want

19:39

to make sure you're

19:40

catching against those. Um, and then

19:42

I'll touch on it briefly, but online

19:44

evals are also great. So, these are

19:45

offline. You have a data set, it's

19:47

historical, you're testing it, but if

19:49

you can, online evals can be a little

19:50

more confusing and uh harder to kind of

19:52

measure, but if you can measure anything

19:54

that as your users are interacting with

19:56

the system, definitely as a leading

19:57

metric, I'd set them up. And for us,

19:59

part of that was, hey, what are the rates

20:01

of like decisions? We had an unsure

20:02

decision, which just meant that

20:04

the agent didn't have enough

20:05

information. So, we could measure that

20:07

online. You know, it's a much simpler eval, but

20:09

that also gave us a pretty good health

20:10

check um as our system was running.
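The online health check described above, tracking how often the agent returns each decision and in particular how often it says "unsure", can be sketched as a rolling counter. The class name `DecisionMonitor`, the window size, and the decision labels other than "unsure" are assumptions for the example.

```python
from collections import Counter, deque


class DecisionMonitor:
    """Tracks the share of each decision over the most recent N decisions."""

    def __init__(self, window=1000):
        # deque with maxlen drops the oldest decision once the window is full
        self.recent = deque(maxlen=window)

    def record(self, decision):
        self.recent.append(decision)

    def rates(self):
        """Return {decision: fraction of the window}, or {} if nothing recorded."""
        total = len(self.recent)
        if total == 0:
            return {}
        counts = Counter(self.recent)
        return {decision: count / total for decision, count in counts.items()}


monitor = DecisionMonitor(window=4)
for decision in ["approve", "approve", "reject", "unsure"]:
    monitor.record(decision)
rates = monitor.rates()
```

A rising "unsure" share is a cheap leading signal that the agent is missing context, well before labeled data shows a drop in accuracy.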

20:15

>> Cool. And another great part about evals

20:18

is that with evals you can make

20:19

confident model changes. Uh whenever a

20:22

new model comes out, Opus 4.6, GPT-5.3, you

20:24

want to make sure that you can leverage

20:26

those new models because sometimes that

20:28

could mean the difference between you

20:29

know your system getting one part of the

20:31

problem right to wrong. But it could

20:32

also mean the opposite. It

20:34

could actually be not good without any

20:36

prompt changes or changing how your

20:37

system works. So um having evals really

20:39

set up and being able to benchmark

20:41

really helps um make confident model

20:43

changes.

20:46

Cool. Um so now that policy agent we've

20:49

been developing this for a while it's

20:50

available for everyone on the Ramp

20:52

platform. Some of the things that we

20:53

learned along the way is that um Claude

20:56

Code, as engineers, is very exciting. We

20:58

have full control. We get to modify our

20:59

CLAUDE.md. We get to, you know,

21:00

tell it to not leave comments. it won't

21:02

leave comments hopefully. Um, turns out

21:04

it's not just us. Um, finance people

21:06

also really like to, you know, modify

21:08

their CLAUDE.md, which is their expense

21:09

policy. So if something went wrong with

21:11

the decision then we just like tell them

21:13

hey go update your policy doc which to

21:15

them it's a little scary concept to

21:17

begin with, like this is a document, like you

21:19

know you don't mess with that. Um you

21:20

have to go through a lot of hoops if you

21:21

want to mess with that but it turns out

21:23

if you get them really excited about the

21:24

feedback loop hey change that you'll see

21:26

it right away. Turns out they'll be like

21:28

really excited to do this. Um and then

21:31

trust builds over time. So some of the

21:32

earlier customers that we had were some

21:34

of the Fortune 500s. We actually started

21:36

with the really big, you know, um

21:38

enterprise customers that we had because

21:39

we thought that they would have the most

21:40

value. They have the most expenses

21:42

coming in. They have the most time spent

21:44

on reviewing coffee expenses. Um so roll it out

21:49

to them, let them have the trust.

21:51

We didn't do any autonomous action.

21:52

We're just like, "Hey, we're going to

21:53

give you a suggestion." That's

21:55

kind of how we phrased it,

21:56

suggestions. And eventually they came to

21:58

us and were like, "Okay, you know what?

22:00

I want to go from suggestions to auto

22:02

approvals. Like anything under $200, you

22:04

guys are mostly right. I don't care

22:06

about this. Let me just go auto approve

22:07

it." So we gave them the autonomy

22:09

slider. We gave them a way to just like

22:10

turn it on and then they actually could

22:12

do it themselves. And then last but not

22:15

least, um similar to LLMs, users thrive,

22:18

you know, in product feedback loops. Um

22:19

so you know when you're building an AI

22:21

product and you have a full way of like

22:23

LLMs can test if their code was right and

22:26

they're able to iterate, users are the same

22:29

way um give them in product ways to

22:31

improve the expense policy doc improve

22:33

the agent and how it operates and um

22:35

they're more than excited to kind of

22:36

take it over themselves and um kind of

22:38

improve it and personalize it for them.

22:40

So um from here I'll pass it on to Ian

22:43

who's going to kind of talk about the

22:44

infrastructure and the culture that we

22:46

have at Ramp that kind of led us to

22:48

building the policy agent.
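The autonomy slider mentioned in this section, where a customer moves from suggestions to auto-approvals for anything under a chosen amount, reduces to a small routing rule. This is a hedged sketch: the function name `route`, the outcome labels, and the choice to never auto-act on rejections are assumptions, not a description of the shipped product.

```python
from typing import Optional


def route(suggestion: str, amount: float, auto_approve_limit: Optional[float]) -> str:
    """Decide whether the agent's suggestion is applied automatically.

    auto_approve_limit is the customer's slider setting; None means the
    customer only wants suggestions and every expense goes to a human.
    """
    if (
        suggestion == "approve"
        and auto_approve_limit is not None
        and amount < auto_approve_limit  # "anything under $200" in the talk's example
    ):
        return "auto_approved"
    return "needs_review"


# Slider off: suggestion only. Slider at 200: small approvals skip review.
outcomes = [
    route("approve", 45.0, None),
    route("approve", 45.0, 200.0),
    route("approve", 250.0, 200.0),
    route("reject", 45.0, 200.0),
]
```

Keeping the limit in the customer's hands lets trust build incrementally: the agent's behavior does not change, only how much of its output is acted on without a human.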

22:54

>> Hey everybody.

22:56

So you've heard a little bit

22:58

about like how we're kind of getting

22:59

leverage to all of the different finance

23:02

teams as we operate on top of their

23:04

financial infrastructure and really try

23:05

to get leverage for our customers. Um

23:07

but I think a big thing that we also

23:09

spend a lot of time thinking about is

23:11

how can we get leverage for Ramp itself,

23:14

the engineers, our XFN orgs, all the

23:16

people that we work with um every single

23:18

day. And this section is

23:21

pretty intentionally named AI

23:22

infrastructure and culture because we

23:24

think that this is both like a really

23:26

challenging infrastructure problem but

23:28

it's also a really challenging culture

23:29

problem and changing how you work as

23:31

well is a big part of the story

23:35

and so to kind of start on the

23:36

infrastructure side. The core of how

23:38

most of applied AI happens at Ramp is

23:41

our applied AI service. And at

23:44

like a 10,000 foot view, this looks

23:46

something kind of like an LLM proxy or

23:49

something like LiteLLM. But there's

23:50

really three kind of main extensions

23:52

that we've invested in to make this a

23:54

lot more powerful for a lot of our use

23:56

cases. The first is like structured

23:58

output and consistent API and SDKs

24:00

across different model providers. This

24:02

can be pretty tricky to do, especially

24:03

with how quickly the APIs are changing,

24:05

but it's a problem that we don't want

24:06

downstream product teams to have to

24:08

think about. So, if you have an idea of

24:10

I want to switch from uh GPT 5.3 to Opus

24:14

or I want to try Gemini 3 Pro, you

24:17

should be able to do that with a config

24:18

change and really quickly be able to

24:20

iterate on semantic similarity and

24:22

trying to do a bunch of different um you

24:24

know, code sandboxing and structured

24:25

output calls that way. The other thing

24:28

that we've spent a ton of time thinking

24:29

about is kind of batch processing and

24:31

workflow handling. This is really useful

24:32

for evals, or if you're doing, like for

24:35

us, bulk document or data analysis. Um

24:38

and that's something that we also don't

24:39

want teams to have to spend a bunch of

24:40

time on of how do you want to batch this

24:42

and handle it with rate limits and do we

24:43

want to do this on an offline or online

24:45

job with something like Anthropic. We

24:47

just want to handle that for downstream

24:49

consumers so they can just focus on

24:50

providing value for downstream

24:52

customers. And then the last which is a

24:54

pretty big deal is the ability to trace

24:56

different costs across teams and against

24:58

products as well. And this allows us to

25:00

kind of identify the Pareto, you know,

25:02

curve of like what is the best kind of

25:04

model performance for cost? How are

25:05

these evolving over time? What teams are

25:07

actually not you know building something

25:09

that's going to be sustainable long term

25:10

for different product surfaces. And this

25:13

can be really really important to just

25:14

remove all this work from internal teams

25:16

having to think about this. And the

25:18

last thing that's kind of I think funny

25:20

to think about, we often joke about

25:22

that, you know, our customers are

25:24

actually using more of a

25:25

frontier model than they may even know

25:27

is even out yet, is it allows us to stay

25:29

at the frontier when a new model comes

25:30

out. It's a one-line config change that

25:33

impacts every single SDK downstream. And

25:35

so rather than teams having to learn the

25:37

SDK or go into 12 or dozens of different

25:40

call sites, they can just change it in

25:41

one place for their specific team. And

25:43

they now get the benefit of being on the

25:45

latest and greatest models um that we've

25:47

kind of vetted and built into the rest

25:48

of the system.
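
As a rough illustration of the gateway idea described above (one config entry per team decides the model, and every call is attributed to a team and product for cost tracking), here is a minimal Python sketch. It is not Ramp's implementation. The names `TEAM_MODELS`, `PRICE_PER_1K_TOKENS`, `resolve_model`, `CostLedger`, `complete`, and `fake_provider` are all hypothetical, the model identifiers are placeholders, and the prices are made-up numbers.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical per-team config: the single place a team changes its model.
TEAM_MODELS = {
    "default": "provider-a/model-large",
    "policy-agent": "provider-b/model-fast",
}

# Hypothetical prices in USD per 1,000 tokens (illustrative numbers only).
PRICE_PER_1K_TOKENS = {
    "provider-a/model-large": 0.010,
    "provider-b/model-fast": 0.002,
}


def resolve_model(team: str) -> str:
    """Look up the team's configured model, falling back to the default."""
    return TEAM_MODELS.get(team, TEAM_MODELS["default"])


@dataclass
class CostLedger:
    """Accumulates spend keyed by (team, product)."""
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, team: str, product: str, model: str, tokens: int) -> float:
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS[model]
        self.totals[(team, product)] += cost
        return cost


def complete(team, product, prompt, ledger, call_provider):
    """Single entry point: call sites never name a model or provider SDK."""
    model = resolve_model(team)
    text, tokens = call_provider(model, prompt)
    ledger.record(team, product, model, tokens)
    return text


def fake_provider(model: str, prompt: str):
    """Stand-in for a real provider call; returns (text, tokens_used)."""
    return f"[{model}] {prompt}", len(prompt.split())


ledger = CostLedger()
reply = complete("policy-agent", "expenses", "is this coffee in policy",
                 ledger, fake_provider)
print(reply)
print(dict(ledger.totals))
```

Under these assumptions, moving a team to a newer model means editing one entry in `TEAM_MODELS`, and the ledger keeps attributing spend to the same team and product keys, so costs before and after the switch can be compared.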

25:51

Our

25:53

product as you've kind of heard earlier

25:55

works on a lot of like very

25:57

sensitive data and very sensitive

25:59

workflows and I think oftentimes uh

26:01

you know something that I hear from

26:02

engineers in the space is this kind of

26:04

concept of hallucination and safety and

26:07

how are you actually going to be able to

26:08

produce a lot of these things to have

26:09

benefits to downstream finance teams and

26:12

we're pretty big believers that it all

26:13

comes down to the catalog of tools that

26:15

teams are building and integrating with

26:17

on a daily basis. And so what you're seeing

26:19

here is our internal tool catalog. So

26:23

an example would be like get a policy

26:25

snippet or per diem rate or recent

26:27

transactions. And these are built

26:29

alongside of product teams to really

26:31

understand a lot of the nuances in the

26:32

data and the use case. And what's really

26:34

cool about this is not only can you see

26:36

where there's gaps in our offering that

26:37

oh we actually don't have a tool for

26:39

this specific use case. These can be

26:41

used both in internal repos and our core

26:43

product. And so if you have an idea of I

26:44

want to do a cool reimbursement agent

26:47

idea, here are the different ways to

26:48

integrate the tools, the different APIs

26:50

and systems that they integrate with.

26:51

And now you can prototype that on a

26:53

totally new product and vibe coded

26:55

surface area without having to worry

26:56

about like learning all of these things

26:58

from scratch or building the tools on

27:00

your own. We're up to like many hundreds

27:02

of these tools today and we as Nick

27:04

mentioned earlier think that this could

27:06

be like multiple thousands over time.
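
To make the tool catalog idea concrete, here is a minimal Python sketch of a shared registry that agents and prototypes can draw from. It is an illustration under stated assumptions, not Ramp's catalog. The decorator `tool`, the helpers `list_tools` and `call_tool`, the tool names, and all returned policy text and rates are hypothetical placeholder values.

```python
from typing import Callable, Dict, List

# Hypothetical shared catalog: tool name -> description and implementation.
TOOL_CATALOG: Dict[str, dict] = {}


def tool(name: str, description: str) -> Callable:
    """Register a function in the catalog under a stable name."""
    def decorator(fn: Callable) -> Callable:
        TOOL_CATALOG[name] = {"description": description, "fn": fn}
        return fn
    return decorator


@tool("get_policy_snippet", "Return the policy text for a spend category.")
def get_policy_snippet(category: str) -> str:
    # Placeholder data; a real tool would query the policy store.
    policies = {"meals": "Example policy text for meals."}
    return policies.get(category, "No policy found.")


@tool("get_per_diem_rate", "Return the daily allowance for a city.")
def get_per_diem_rate(city: str) -> float:
    # Placeholder rates, not real figures.
    rates = {"new york": 100.0}
    return rates.get(city.lower(), 50.0)


def list_tools() -> List[str]:
    """Browsing the catalog shows what exists and where the gaps are."""
    return sorted(TOOL_CATALOG)


def call_tool(name: str, **kwargs):
    """Dispatch by name so any agent or prototype can reuse the same tools."""
    if name not in TOOL_CATALOG:
        raise KeyError(f"No tool named {name!r}; this is a gap in the catalog.")
    return TOOL_CATALOG[name]["fn"](**kwargs)


print(list_tools())
print(call_tool("get_per_diem_rate", city="New York"))
```

Because tools are looked up by name, a prototype for something like a reimbursement agent can reuse the registered functions instead of rebuilding them, and a missing name surfaces as an explicit gap.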

27:11

On the topic of context, another big

27:14

thing we think about is context for our

27:16

customers of how do we actually

27:17

integrate the financial stack and allow

27:18

them to be a lot more productive. But we

27:20

notice like a very similar problem

27:22

internally um on our engineering team.

27:24

And I think something that's like not

27:26

always obvious is that you know even if

27:28

you're using something like Claude Code

27:30

or Codex, there's all this fragmentation

27:32

of actually what you do on a daily basis

27:34

to get work done in your company that

27:36

isn't integrated too. There's logs

27:38

in Datadog. There's a production, you

27:40

know, database that has a bunch of

27:42

things going on. There's different

27:43

alerting systems. There's incident.io.

27:45

There's a Slack message you have to pull

27:47

in. There's a Notion doc. And then

27:48

there's a lot of like knowledge that

27:50

those actual specific product teams have

27:52

of how they actually need to get work

27:54

done as well. And so at the end of last

27:57

year, we decided to set out and try to

28:00

solve this problem of how can we

28:01

actually integrate all this context and

28:02

build our own internal background coding

28:05

agent, which we've called Ramp Inspect.

28:07

You may have seen this on LinkedIn or X.

28:09

We actually have open sourced the

28:10

blueprint of how we've built this and at

28:12

the end I can definitely show you guys a

28:13

link of where to find that. And the

28:15

progress has been pretty phenomenal of

28:17

actually integrating this into a

28:19

background agent that can run

28:21

autonomously as people are in meetings

28:22

or as bug fixes come up and things like

28:25

that. And currently this month, Ramp

28:28

Inspect is responsible for over 50% of

28:30

PRs that we merge to production. I have

28:32

some interesting we're like really big

28:34

nerds with uh stats and numbers and

28:36

things like that. So we have this

28:37

dashboard to kind of create this like

28:39

interesting one like subtle healthy

28:42

competition but also inspire people that

28:44

they can actually use this as well. And

28:46

so you can see engineering uh has a huge

28:49

lead of the amount of sessions, but you

28:50

also have product, you also have design,

28:52

there's risk, legal, corporate, finance,

28:55

and even marketing and CX teams using

28:56

Ramp Inspect. And they're doing things

28:58

like simple copy changes. They're doing

29:01

logic fixes. They're trying to respond

29:03

to incidents or bugs. And what's been

29:05

really cool to see as this has evolved

29:08

over time

29:11

is how we've actually designed a couple

29:13

of these things um with some core

29:15

principles to be really powerful. So

29:17

what you're seeing here is a Ramp

29:18

Inspect session. I think this is an example

29:20

of like a query that we were trying to

29:21

fix. This spins up in the background

29:24

really fast, um, Modal code sandbox. This

29:26

allows us to like resume spin up and

29:28

spin down these containers in an

29:30

isolated environment which has the same

29:31

environment that you would have if

29:32

you're developing at Ramp. There's a

29:34

series of tasks to keep it on track and

29:36

it creates a GitHub branch and

29:38

integrates with all of the context

29:39

documents, our Datadog, our read

29:42

replica so it can actually write queries

29:43

and different context documents that

29:45

product teams have um have put together.

29:48

And what's really uh I think subtle

29:49

about how we've designed this is we've

29:51

designed it to be multiplayer first. And

29:53

that means that as you integrate or you

29:55

try to pair with like a designer or

29:57

somebody on the PM team, you can

30:00

actually help them like level up their

30:01

own prompting skills, they can give us

30:03

feedback of, hey, click on this link.

30:04

This actually failed in a way that I

30:06

wasn't expecting. And so that can be a

30:08

really great source of like

30:09

cross-functional collaboration. That was a

30:12

very subtle design choice that we made

30:13

that ended up being a really big impact

30:15

for the company. And then these can be

30:17

kicked off either via a kanban UI, we

30:20

have an API, and then also a Slack

30:22

thread. and we can take the full context

30:24

of the Slack thread when it is actually

30:25

kicked off. So you don't have to

30:26

reprompt it with a bunch of conversation

30:28

that happened earlier.

30:32

What you see here is um we also have a

30:34

full VS Code environment. We run VNC

30:36

inside of a Modal sandbox as well. So

30:38

this allows us to have Chrome dev tools

30:40

and MCP. So it can actually do full

30:42

stack work which is pretty cool. And it

30:44

has access to the 150 plus thousand

30:46

tests that we have. So, it also knows if

30:48

things are broken, can respond to the CI

30:50

inside of GitHub, and actually patch

30:52

fixes before it actually pings you that

30:54

the PR is done.

30:56

Um, the link for this is

30:58

builders.ramp.com.

31:00

I think it's like one of the first uh

31:01

blog posts that we have or the most

31:03

recent blog post that we have and we

31:05

open sourced like the whole blueprint of

31:06

how to build this and put this together

31:08

as well. I think there's also a GitHub

31:10

repo called Open Inspect which is an

31:11

open source implementation of this as

31:13

well.

31:17

So, it's been pretty interesting to see

31:19

the impact that Ramp Inspect has had

31:20

where over 50% of PRs that we merge on a

31:23

weekly basis go through the system.

31:25

And so with all this time not spent on

31:28

thinking about these really low-level

31:29

firefighting tasks or really low-level

31:32

small fixes or tweaks that can be kind

31:34

of democratized across the company,

31:36

we're really rethinking like how our

31:38

engineering teams operate and think

31:39

about their job and how they can

31:42

actually be really impactful in this

31:44

new kind of AI native future.

31:47

And so as a thought experiment, um we

31:49

let's pretend we have two different

31:51

teams. I'm sure everyone in this room

31:52

has worked with like their handful of

31:54

extraordinary teams, maybe teams that

31:56

are finding their footing. And you'll

31:58

notice that there's like a couple of

31:59

different qualities that

32:01

may resonate. So we have team A on the

32:03

left here. And let's say that they

32:05

really care about impact. They handle

32:06

ambiguous problems. They understand the

32:08

product, business, and data. They adopt

32:10

new tools. They can find creative

32:12

solutions and they obsess over like the

32:14

user experience. And then team B may

32:17

also resonate with some people. You

32:18

know, they debate libraries. They add

32:20

process when things start to feel

32:21

chaotic. They constantly complain about

32:23

headcount. They bike shed the details

32:26

instead of actually focusing on the user

32:28

experience like hey should we use you

32:30

know functional programming paradigm

32:32

here or what version of you know

32:33

different typescript libraries do we

32:35

want to use and then they build before

32:37

understanding the problem right they

32:38

just say hey we're going to just vibe

32:40

code this bro don't worry or they focus

32:42

on you know performative code quality or

32:44

nitpicks that may not actually that may

32:46

be very much like a subjective kind of

32:48

matter as well. I've worked on

32:50

both of these teams and I think the

32:51

argument that I'm going to make today is

32:53

that there's going to be divergence I

32:55

think depending on what side of the

32:56

aisle you land. This is a study

32:59

from Harvard that was out uh I think the

33:01

end of last year and it was very much

33:03

geared towards juniors and seniors in

33:05

terms of what's actually happening with

33:06

hiring trends in uh in engineering since

33:09

AI tools have accelerated. And I think

33:11

what this glosses over is I don't think

33:13

it's just a years of experience problem.

33:15

I actually think it's very much um all

33:17

of the different qualities that I said

33:18

in team A versus team B that really make

33:21

it apparent that like coding was never

33:23

really the hardest part of a lot of jobs

33:25

for a long time. There's all

33:28

these other engineering principles that

33:29

become more important than just raw

33:31

coding speed. So when you think about

33:33

like a staff or a staff plus engineer,

33:36

you're really compensating those people

33:37

more for a lot of the judgment that they

33:40

bring to the table, the context, the

33:41

ability to see around corners, all the

33:43

learning that they have, the actual like

33:45

scar tissue. And so if you know you ask

33:47

Opus 4.6 to do something, they'll have

33:50

the knowledge to actually know if that

33:51

is not going to work or that's actually

33:53

a bad idea. And I think one thing that a

33:55

lot of the narratives that we see in the

33:56

media get wrong about coding agents

33:59

is they don't really identify the fact

34:01

that you could still build the wrong

34:02

thing just a lot faster and you can

34:04

build like bigger messes. And I think

34:06

that having a lot of these skills of a

34:08

team A and really focusing on like what

34:10

is the context and reason behind this

34:11

will only become more important um in

34:14

AI.

34:16

And so what does that actually look

34:18

like? We hit on some of these things.

34:20

Figuring out what to build and

34:22

understanding users well enough. Selling

34:24

an idea to skeptical stakeholders. This

34:26

is still something when we decided to

34:28

build a background coding agent. This

34:30

was not something that was obvious that

34:31

we should be spending time on this.

34:33

Having good design decisions with

34:36

incomplete information and maintaining

34:38

momentum through the long middle of this

34:40

project, which can be really gnarly. And

34:42

I think this last bit, you know,

34:44

everyone in this room, I'm sure, is

34:45

painfully aware of, you know, the

34:47

conversation around SaaS and the

34:49

stock market and things like that. And I

34:51

think this is like a big element that

34:52

they gloss over, which is that yes, it's

34:55

easy to vibe code something, but

34:56

actually going through that middle

34:58

process is like why you need really good

34:59

engineers to actually get something

35:00

deployed that has product market fit

35:03

that people are really excited about.

35:04

Um, and I think not enough people

35:06

recognize that.

35:10

And so where does that leave us?

35:11

Personally, I think there's a lot of

35:13

kind of doomerism and scariness around a

35:15

lot of the AI narratives, but I think

35:17

it's also a really exciting time to be

35:19

building. Unlike maybe factory work or

35:22

farming, software is never done. We have

35:25

this uh really kind of like meme

35:28

internally where we say, you know, job's

35:30

not finished. You've probably seen in

35:31

the marketing as well. And I think

35:32

software is perpetually not finished.

35:34

And so with all this extra capacity,

35:37

with people focusing less on this kind

35:39

of low-level work and more on high

35:40

leverage engineering tasks, I think four

35:43

things are going to really happen. I

35:45

think companies are just going to chase

35:46

opportunities they couldn't afford to

35:48

pursue. I don't know if we would be

35:50

chasing these like agentic workflows and

35:52

really thinking about bigger scale

35:54

problems in the financial stack if this

35:56

technology didn't exist. People are

35:58

going to enter adjacent markets. They're

36:00

going to try to stitch together more

36:01

value for customers. It's not going to

36:03

be like because everyone's 2x more

36:05

productive, you need two less or half

36:07

the people. You're going to rebuild

36:09

systems that are too expensive to touch.

36:11

I think building an internal background

36:12

coding agent uh for a company that does

36:15

financial operations um software felt

36:17

like probably a pretty crazy idea, but

36:19

now that makes a ton of sense. And raise

36:21

the bar for what good enough means. I

36:23

think, you know, being able to kind of

36:25

build more mind-blowing experiences for

36:27

users, provide a lot more value is going

36:29

to be the narrative of the next decade.

36:32

And I'm super excited to be able to

36:33

build some of these things and see what

36:35

everyone in this room is going to build,

36:36

too. So, thank you.

Interactive Summary

The presentation details RAMP's strategy for integrating AI into financial operations, highlighting their shift toward a multi-skilled 'single agent' architecture rather than thousands of separate agents. It showcases the 'Policy Agent' for automated expense management and 'Ramp Inspect,' an internal background coding agent that generates over 50% of the company's production code changes. The speakers also emphasize a cultural shift in engineering, where human judgment and context become more critical than raw coding speed as AI handles low-level tasks.