Small Bets, Big Impact: Building GenBI at a Fortune 100 – Asaf Bord, Northwestern Mutual

Transcript

0:13

[music]

0:21

Doesn't this look like something's going

0:22

to drop from the ceiling? Like a ground

0:25

zero type thing? [snorts]

0:27

Be honest. It's like who has the buzzer

0:28

that if I really suck, they press it

0:31

and everything falls down through the

0:32

trap door. No.

0:34

>> Be careful.

0:35

>> Yeah. Okay. Who was it?

0:37

>> Okay. You tell me if I'm doing okay or

0:39

if I should take a couple of steps back.

0:40

Right. So, hi everyone. I'm Assaf. Um,

0:45

and I'm here to talk about GenBI. And

0:47

kind of first disclaimer, this

0:49

presentation was not created with Gen

0:50

AI. Um to be honest, I actually started

0:54

doing it uh with uh GPT o3 back in

0:57

August. [snorts] Uh and then I did kind

1:00

of a first draft and then a couple of

1:02

weeks back I wanted to come in and

1:03

refresh it before the conference and

1:05

then GPT-5 took over and completely messed up

1:09

my slides. So I ended up doing it

1:12

manually kind of old-fashioned. So if

1:13

I'm missing like an em dash somewhere in

1:16

the middle, let me know after. Okay.

1:18

[snorts]

1:19

Uh so first of all, a bit of

1:21

housekeeping. What's GenBI? So, it's a

1:23

fusion of Gen AI and BI. It's basically

1:25

an agent that helps people answer

1:28

business questions with data, like a

1:31

business intelligence person would do in

1:32

real life. [snorts] Uh, the reason that

1:35

we're pursuing GenBI is really because

1:36

of the data democratization that it can

1:38

bring, right? So having access to data

1:41

at your fingertips without having to be

1:43

reliant on a BI team that helps you find

1:45

a report, figure out what it means, uh,

1:48

understand your world before they can

1:50

even give you any kind of input. Uh, so

1:53

that's GenBI. Uh, a bit about

1:55

Northwestern Mutual. That's where I

1:57

work. So we're a financial services,

2:00

life insurance, and wealth management company.

2:01

Been around for 160 years. [snorts] Uh,

2:04

some very impressive numbers there. But

2:06

first of all, I want to say why is

2:08

Northwestern Mutual a great place to do

2:10

Gen AI. We got a lot of data, we got a

2:13

lot of money, we got a lot of use cases,

2:16

and we got access to some of the best

2:17

talent uh anyone can dream of. Really

2:20

truly humbled by the people that I get

2:22

to work with. Um but on the flip side,

2:26

why is it hard to do Gen AI at

2:28

Northwestern Mutual? Because it is a

2:30

very risk-averse company, right? If you

2:33

think about it, our main motto is

2:35

generational responsibility. I call it

2:38

don't f up. Uh because what we end

2:41

up selling to people is a decades-long

2:45

commitment, right? You buy life

2:48

insurance now. Uh if you stay with us

2:52

until it comes to term, so to speak,

2:55

that can be 20, 40, 80 years down the

2:58

line, depending on when you buy it and

2:59

how long you get to live. And so

3:01

stability is something that's very

3:04

important for us because it's important

3:05

for our clients. So how do we balance

3:08

stability with innovation? That's what I

3:11

want to talk about today. Um and really

3:14

the four main challenges that we had

3:17

when we even came up with the idea kind

3:20

of a pie-in-the-sky GenBI concept. Uh

3:23

[snorts] first of all, no one's done it

3:24

before, right? Truly, no one's done

3:26

GenBI in this fashion in the past. Uh

3:29

secondly, and this was really a

3:31

preference for us. We wanted to use

3:34

actual data that's messy because we knew

3:37

that that's where the real

3:40

challenges are going to be, right?

3:41

Understanding actual messy data for a 160-year-

3:44

old company and how can we perform well

3:47

within that ecosystem. Um the third was

3:51

kind of a blind trust bias. So um

3:55

the trust that we had to build was both

3:57

with the users but also with the

3:59

leadership of the company, right? How can

4:02

we bring accurate information accurate

4:04

answers to people when uh all of these

4:08

things that we know about and everyone's

4:09

talked about is just out there, right?

4:11

No one's blind to the trust barriers, no

4:14

one's blind to the accuracy barriers. So

4:16

how do we convince people that this is actually

4:17

something that we can trust in the

4:20

company? And lastly,

4:23

Um but really firstly when we go to

4:26

approach this from an enterprise

4:27

perspective: [snorts] budget, the impact,

4:29

right? How do we convince someone in a

4:32

leadership uh organization where risk

4:34

aversion is ingrained in the DNA to

4:38

even invest in something like this that

4:40

no one's done before, we don't really

4:42

know how we would do it uh we're not

4:44

even sure what it would look like when it

4:45

comes to term.

4:48

uh so I'll start kind of one by one uh

4:51

and first of all really talk about why

4:53

we chose to use actual data uh and not

4:56

synthesized data or cleansed data.

4:58

>> [snorts]

4:58

>> Uh so really it's about making sure that

5:00

we understand the actual complexities

5:02

that we will have to face when we

5:05

eventually want to go to production

5:06

right? We know that, you know, building uh

5:09

POCs and demos is so easy, but the gap

5:12

from POC to production is so broad, uh

5:14

especially in this gen AI space

5:16

especially because we don't know upfront

5:18

how to design the system or what we would

5:20

expect it to behave like. So making sure

5:23

that we operate with real data just gave

5:25

us that extra confidence that when

5:26

something works in the lab it's very

5:28

likely to also work in reality. Uh but

5:32

also, and maybe no less

5:36

important, is that we got to work with

5:38

actual people who work with the data day

5:40

in and day out and that gave us two

5:42

things. Okay, first of all subject

5:44

matter expertise, which is super

5:46

critical for us to be able to validate

5:47

that the system is actually working. It gave

5:49

us a lot of real life examples of what

5:51

people are actually asking in a

5:54

corporate setting and what people have answered

5:55

to them. So basically the eval right and

5:58

all the testing and stuff. Uh but at the

6:00

end of the day it also brought the

6:04

business to be a part of the research

6:06

project itself and they became kind of

6:09

bought into the idea as part of the

6:11

process. So we didn't just test

6:14

something in the lab and then had to

6:15

convince someone to go ahead and use it.

6:18

The end users were part of the research

6:21

process itself. And so when eventually

6:23

it matured enough so we can take some of

6:25

that to production, they were already

6:28

there and they actually were pulling

6:29

that. They told us we want to take this,

6:31

how can we wrap it, how can we package

6:33

it uh quickly enough so we can put it

6:35

into practice.
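
As an aside on what "the eval" looked like in practice, here is a minimal sketch of the kind of harness those SME question-and-answer pairs make possible. The file format, the `answer_question` entry point, and the crude grading rule are assumptions for illustration, not the team's actual implementation.

```python
# Minimal sketch of an eval harness built from SME Q&A pairs (illustrative only).
# `answer_question` stands in for whatever GenBI entry point exists; it is assumed here.
import json
from dataclasses import dataclass

@dataclass
class EvalCase:
    question: str          # a real question a business user asked
    reference_answer: str  # the answer a BI analyst actually gave

def load_cases(path: str) -> list[EvalCase]:
    """Load SME-curated question/answer pairs from a JSONL file."""
    with open(path) as f:
        return [EvalCase(**json.loads(line)) for line in f]

def grade(candidate: str, reference: str) -> bool:
    """Crude grading: did the agent's answer mention the key figures and terms
    from the reference? A real harness would use SME review or an LLM judge."""
    key_tokens = {t for t in reference.lower().split() if t.isdigit() or len(t) > 6}
    return all(t in candidate.lower() for t in key_tokens) if key_tokens else False

def run_eval(answer_question, cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the agent's answer passes grading."""
    passed = sum(grade(answer_question(c.question), c.reference_answer) for c in cases)
    return passed / len(cases)
```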

6:38

Uh and the next part was really about

6:40

building trust. Uh so this is about

6:43

building trust first of all with our

6:45

management team. All right. Now, I don't

6:47

know about you, but last time that I got

6:49

a million dollars to do a research

6:51

project that I wanted on a pie-in-the-sky idea.

6:53

I woke up from the dream and I realized

6:56

that this is not how things work in

6:58

reality. You don't just get a million

6:59

dollars and go ahead and try something

7:01

out. Uh, you have to show that you know

7:04

what you're doing. And part of what we

7:06

did, it's kind of listed out here, but

7:09

obviously, you know, we did all the

7:10

regular stuff, right? We worked in a

7:12

sandbox environment. We made sure that

7:14

we're not using actual client data. We

7:16

made sure to put in all the security

7:17

risk aside. But uh one of the first

7:21

approaches that we said we're going to

7:22

take is we're not just going to build a

7:24

tool that's going to be uh released to

7:27

everyone, right? We understood very

7:30

quickly that um how people interact with

7:34

the tool, their ability to verify that

7:37

what they're getting is right and also

7:39

give us feedback changes dramatically

7:41

depending on their expertise and

7:42

understanding of the data. So we took

7:44

that crawl, walk, run approach that

7:46

basically said we're first going to

7:48

release it to actual BI experts, right?

7:52

people that would be able to do it on

7:54

their own and know what good looks like

7:56

when they get it. And we're just going

7:58

to expedite the process for them kind of

8:00

like a GitHub co-pilot. The next phase

8:02

would be to bring it to business

8:04

managers and again people who are closer

8:06

to the BI team, but when they see a

8:09

mistake, they can pretty much figure out

8:12

that what they're seeing is wrong

8:13

because they're used to seeing that on

8:15

a day-to-day basis. Um and they might

8:18

be less sensitive to these types of

8:19

mistakes and be more inclined to give us

8:21

that feedback instead of just, you know,

8:23

dumping it aside and never using it

8:25

again. Giving this type of tool to

8:27

executives in the company, I don't even

8:29

know when we're going to get there,

8:31

right? Like an executive, they want

8:33

clear, concise answers that they know

8:36

they can trust. We're definitely not

8:38

there yet. I think that's the vision uh

8:40

at some point in time, but the system is

8:42

not accurate enough for us to get there.

8:44

Maybe it never will be.

8:46

>> [snorts]

8:46

>> Um, another lever

8:49

that we kind of used to build inherent

8:51

trust into the system is that we said,

8:53

well, at the get-go, we're not going to

8:56

even try to build SQLs, right? This is

9:00

very complex. This is very hard even for

9:03

a person. So, we said step number one,

9:06

let's just bring information that is

9:08

already in the ecosystem that's already

9:10

verified, right? We have a lot of uh

9:12

certified reports and dashboards. Um and

9:15

actually in the conversations we had

9:16

with some of the BI teams that we worked

9:19

with, they told us guys like 80% of the

9:22

work that we do is basically sending

9:23

people to the right report and helping

9:25

them figure out how to use it. So the

9:27

report is already there. Um and that

9:30

again built some inherent trust into how

9:33

we architected the system because we

9:34

said we're not going to make up

9:36

information. We're just going to deliver

9:38

you the same asset that you would have

9:39

gotten anyway just in a much faster much

9:42

more interactive way. Uh and that was

9:44

the alignment of expectations that we

9:46

did very upfront with the uh users and

9:49

also with the management team.
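
To make the "deliver the certified asset, don't make things up" idea concrete, here is a minimal sketch of retrieval over a catalogue of certified reports. The catalogue fields and the `embed` text-to-vector function are stand-ins I'm assuming, not Northwestern Mutual's actual interfaces.

```python
# Sketch: answer by retrieving an existing certified report rather than generating SQL.
# The catalogue fields and the `embed` function are assumptions for illustration.
import math
from dataclasses import dataclass

@dataclass
class CertifiedReport:
    name: str
    description: str   # what the report covers, in business language
    owner: str
    url: str

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_certified_report(question: str,
                          reports: list[CertifiedReport],
                          embed) -> CertifiedReport:
    """Return the certified report whose description is closest to the question.
    `embed` is any text-to-vector function (e.g. an enterprise embedding service)."""
    q_vec = embed(question)
    return max(reports, key=lambda r: cosine(q_vec, embed(r.description)))
```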

9:51

Now [clears throat]

9:54

the biggest um

9:57

process or kind of the most important

9:59

approach that we took when uh

10:01

approaching our leadership team and

10:02

convincing them that we want to do this

10:05

was to create a very gradual incremental

10:08

process that gave them a lot of

10:10

visibility and control. Uh and it was

10:13

very important for us to build

10:15

incremental deliveries throughout that

10:17

process so that [snorts] uh not only did

10:20

they have the visibility into what

10:22

are we funding now, what do we get out

10:25

of it, they actually had business

10:26

deliverables they could realize

10:28

potential from throughout the process

10:31

and at any point in time they could pull

10:33

the plug right and say okay like it's

10:35

not working well or we got enough out of

10:37

it or you know the next phase is so you

10:40

know unknown and long that we don't want to

10:42

invest in it further. And this is how we

10:44

basically broke it down. So phase one

10:47

was just pure research, right? We kind

10:49

of did the shift from natural language

10:51

to SQL. We figured out how to write

10:53

responses. We figured out how to

10:55

understand questions that are coming in.

10:57

Just kind of setting the stage. [snorts]

10:59

Phase two was about really

11:00

understanding, okay, so what does good

11:03

metadata and good context look like from

11:06

the perspective of a BI agent, right? It

11:08

looks very different if you're just

11:10

chatting with something or if you're

11:11

trying to do RAG with, you know,

11:13

unstructured data like documents and uh

11:16

business knowledge and stuff like that.

11:18

And this phase on its own already had uh

11:21

impact on the business because when we

11:23

define what good metadata looks like for

11:24

an LLM, uh, we could immediately apply

11:29

that also to just the ecosystem of data

11:32

users across the enterprise. Um, and by

11:35

understanding how to extract LM from the

11:37

information, we could also how to

11:40

extract metadata. Sorry, here's where

11:42

[snorts] the trap door comes into play,

11:43

right? Um, we could also project that on

11:47

how or what good metadata looks like for

11:50

humans interacting with the data. We

11:52

have another initiative around semantic

11:54

layer going on which tries to model

11:56

exactly that and this provided a very

11:58

valuable input to that initiative as

12:00

well. But the immediate next step was

12:02

basically just doing this kind of uh

12:05

multi-context semantic search, right?

12:07

People coming in asking different

12:08

questions and having the system figure

12:11

out what's the right context, what's the

12:13

right information we need, and uh bring

12:15

them in. And this is something that could

12:16

already be packaged as its own product

12:19

and delivered uh and basically just do

12:22

kind of a data finder and data owner

12:25

finder which is something that could

12:28

take anywhere between two to maybe four

12:30

weeks in an enterprise like Northwestern

12:33

Mutual just finding what data exists and

12:35

who owns it so I can start uh the

12:37

conversation with them.

12:39

Um and the next layer was really about

12:42

pulling in information and trying to do

12:44

some light pivoting around the data. Um

12:47

each one of these steps as you can see

12:50

also created an input to the to the

12:52

following step so that the research

12:55

itself was kind of

12:57

self-propelling and there were

12:59

incremental outcomes coming out of each

13:00

one of these phases. Uh the next one is

13:03

more kind of setting it up for

13:05

enterprise level usage. So understanding

13:07

roles of uh different users coming

13:10

in what they may be asking about what

13:12

type of access we want to give them etc

13:14

and eventually and this is still some

13:17

ways to go ahead uh building kind of a

13:19

fully-fledged GenBI agent which doesn't

13:21

only quote information from existing

13:23

reports but can actually run SQL

13:26

queries on its own uh pull in more data

13:29

do more sophisticated joins between

13:31

different data so it can answer more

13:32

complex questions so that's the road map

13:35

right? That's kind of the high-level plan.

13:37

Now, why did that work? Well, kind of

13:39

quickly summarizing them: we talked about

13:41

uh how we get value uh early and we get

13:44

value often. Each one of these was a six-

13:46

week sprint at the end of which we had

13:49

a very tangible deliverable coming

13:52

back to the business that we could

13:53

decide to productize. Uh and at any

13:56

point in time, we could decide how we

13:59

want to move forward. There was

14:00

transparent progress. There was

14:02

incremental business value. Uh each one

14:04

of these steps allowed us to learn

14:06

something that helped feed the next

14:08

step.

14:09

And maybe the most important part and

14:11

that's the bottom line here and that's

14:13

the part that executives really look at.

14:16

How do we control the risk in continuing

14:18

to invest in this type of research

14:20

project and this is really about

14:23

eliminating things like sunk cost bias,

14:25

right? We already paid, you know,

14:27

whatever, a million dollars. Let's just

14:28

get through the project see what we get

14:30

at the end. This eliminates the uh

14:33

fear of competitors coming in and

14:35

maybe we don't need to continue

14:37

investing in this right so everyone in

14:39

the industry is researching GenBI and

14:41

there are solutions like Databricks

14:43

Genie that are coming up and they're

14:44

getting better and better maybe at some

14:46

point in time it's better for us as an

14:47

organization to actually adopt Databricks

14:49

Genie, but at that point, again,

14:52

first it's much easier for us to pull

14:54

the plug on the funding, but we already

14:56

have a good understanding of what good

14:59

looks like we have benchmarks that we

15:00

used for ourselves when testing our own

15:02

system that we can test a third party

15:04

solution with. And we know what to

15:07

expect, right? We know what works, we

15:09

know what doesn't. We know what a kind

15:11

of fluffy demo from a vendor would look

15:13

like. And we know where to drill in to

15:15

ask the tough questions.

15:18

So let's see kind of what it looks like

15:20

under the hood and how we productize

15:23

different elements uh of this

15:24

architecture. Uh and maybe kind of very

15:26

quickly, why can't we just do it with uh

15:28

ChatGPT? So you [snorts] know, just

15:30

dumping a schema into ChatGPT doesn't

15:32

work. Usually schemas are very messy.

15:34

It's not uh easy to understand the

15:36

context and the meaning of things.

15:38

[snorts] Uh and eventually governance is

15:40

super important. So there was a lot of

15:41

governance built into the architecture

15:43

that was very hard to apply on ChatGPT from

15:46

the outside but even solutions like you

15:48

know Databricks Genie, as a third party, are much

15:50

harder to govern from the outside than

15:52

from the inside. But still TBD.

15:55

Uh so the stack kind of looks like this.

15:57

Uh we have a data and metadata layer

16:00

that we produced. We have four different

16:01

agents that are running across the

16:03

pipeline. A metadata agent that

16:06

understands the context. A RAG agent

16:07

that finds the different reports. An SQL

16:10

agent that can pull more data if we need

16:12

that. And then eventually what we call a

16:14

BI agent that takes all that information

16:15

and delivers an answer to the question

16:17

that was asked. On top of that, we slap

16:20

governance and trust and orchestration

16:22

and eventually some kind of a contextual

16:24

UI. Um and this is how the flow goes. So

16:29

when a business question comes in we uh

16:32

push it into the orchestrator and

16:34

it basically decides how to facilitate the

16:36

process. The first thing that we do is

16:38

understanding the context. So that's

16:40

where that metadata agent comes in works

16:42

with the catalog works with all the

16:44

documentation that we have across the

16:45

system to understand what we're being

16:47

asked about and what's the relevant

16:49

information to share. Then we go to the

16:51

RAG agent which tries to find an

16:53

existing report again out of a list of

16:55

certified reports that we know are

16:58

allowed for people to use and people

16:59

have spent a lot of time fine-tuning

17:01

them and making them as accurate as

17:03

possible.

17:04

If we can't find the report or if it's

17:07

not exactly what we need to, um, use,

17:09

that's where we go to the SQL agent that

17:12

basically tries to create a more um

17:15

exact query or a more elaborate query.

17:18

And even if the report that we have is

17:21

not usable as is, it gives us that

17:24

initial seed of a query that we can then

17:26

expand on rather than having to build

17:28

one from scratch. So it's kind of like a

17:31

few-shot uh example, but in this case the

17:35

example that we give is very very close

17:37

to the actual result that we're

17:39

expecting to get. We then execute it

17:42

against the database, pull the results, and push them

17:43

into the BI agent, which

17:46

uh translates that to a business

17:49

answer and doesn't just dump data back on

17:52

the user and this is what goes into the

17:55

final answer. Now there's obviously some

17:56

kind of a loop that says if I'm in the

17:58

same conversation I'm probably talking

18:00

about the same data so we don't have to

18:02

talk about this or do this again and

18:03

again. Now

18:06

each one of these three components, each

18:07

one of these three agents can be

18:09

packaged as its own product and

18:13

delivered to production with a very

18:16

tangible and actual impact on business

18:19

metrics. Okay. And that's the kind of

18:22

beauty of this uh approach that after we

18:26

productize each one of these, we could

18:28

have basically said stop or let's move

18:30

forward.
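
Putting the flow just described into one place, here is a minimal sketch of the orchestration step. The agent interfaces are assumed callables, and the governance and conversation-loop pieces are deliberately left out.

```python
# Sketch of the orchestration flow described above (agent internals are assumed/stubbed).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Context:
    question: str
    metadata: dict            # what the metadata agent learned from the catalogue/docs

@dataclass
class Report:
    name: str
    sql: str                  # the certified report's underlying query
    usable_as_is: bool

def answer_business_question(
    question: str,
    metadata_agent: Callable[[str], dict],
    rag_agent: Callable[[Context], Optional[Report]],
    sql_agent: Callable[[Context, Optional[str]], str],
    run_sql: Callable[[str], list[dict]],
    bi_agent: Callable[[Context, list[dict]], str],
) -> str:
    """One pass through the GenBI pipeline: context, report lookup,
    optional SQL generation seeded by the closest report, execution, narration."""
    ctx = Context(question=question, metadata=metadata_agent(question))

    report = rag_agent(ctx)                      # try a certified report first
    if report and report.usable_as_is:
        sql = report.sql                         # trusted, pre-verified query
    else:
        seed = report.sql if report else None    # closest report acts as a few-shot seed
        sql = sql_agent(ctx, seed)               # governance checks would gate this step

    rows = run_sql(sql)
    return bi_agent(ctx, rows)                   # business answer, not a data dump
```

The point the sketch tries to capture is that the closest certified report is reused either as the trusted answer or as the seed for the SQL agent, so the system rarely starts from a blank page.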

18:32

uh and just giving some bottom-line

18:34

numbers around some of these. So just

18:36

the RAG agent that pulls the right

18:39

report uh allowed us to take about 20%

18:43

of the overall capacity of the BI team

18:46

that basically said uh all we do is just

18:50

share the right report with the right

18:51

person. So we were able to automate

18:53

around 80% out of those uh 20% and we're

18:56

talking about a team of 10 people. So

19:00

roughly two people full-time job all

19:03

they do is find the right report and

19:05

send it to the right person.

19:08

uh the metadata understanding that we

19:10

got from learning how to interact with

19:12

the data through an LLM allowed us to

19:14

run an A/B test in the semantic layer

19:17

project that we did and that allowed us

19:19

to prove back again to the senior

19:23

leadership in the company that there is

19:25

value, tangible value, measurable

19:28

value in enriching metadata. And we did

19:31

that basically by running uh a battery

19:34

of questions um against a database that

19:37

had good metadata and one that didn't

19:39

have good metadata. And we showed how much

19:42

better an LLM performs when having the

19:44

right metadata in place. So basically

19:46

proving the value of something that can

19:48

be very fluffy like hey let's bring in

19:50

more documentation into the code.
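
As a rough picture of the A/B test shape, here is a sketch: the same question battery is run against a metadata-rich catalogue variant and a bare one, and the pass rates are compared. `ask_with_schema` and `grade` are assumed stand-ins for the real system and its scoring, not the actual interfaces.

```python
# Sketch of the metadata A/B test: same question battery, two catalogue variants.
# `ask_with_schema` and the grading function are assumptions standing in for the real system.
def ab_test_metadata(questions, reference_answers, ask_with_schema, grade,
                     rich_schema, bare_schema):
    """Return pass rates (rich, bare) for an LLM answering over two schema variants."""
    def pass_rate(schema):
        hits = sum(
            grade(ask_with_schema(q, schema), ref)
            for q, ref in zip(questions, reference_answers)
        )
        return hits / len(questions)

    return pass_rate(rich_schema), pass_rate(bare_schema)
```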

19:53

Right now we're experimenting with the

19:55

data pivoting bot. Uh so once you have a

19:58

dashboard or a report be able to change

20:01

the time horizon some of the views some

20:03

of the segmentations and the groupings

20:04

of the data again kind of real time

20:06

without having a person do that for uh a

20:10

business stakeholder.
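
The pivoting idea can be sketched roughly like this; the column names and the pandas dependency are assumptions, and the real bot would translate the natural-language request into these parameters.

```python
# Sketch of the "data pivoting" idea: re-slice an existing report's data on request.
# Column names and the pandas dependency are assumptions for illustration.
import pandas as pd

def repivot(report_df: pd.DataFrame, date_col: str, value_col: str,
            group_by: str, freq: str = "M") -> pd.DataFrame:
    """Re-aggregate a report: change the time grain (freq) and grouping column
    without anyone hand-editing the underlying dashboard."""
    df = report_df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    return (
        df.groupby([pd.Grouper(key=date_col, freq=freq), group_by])[value_col]
          .sum()
          .reset_index()
    )
```

For example, a request like "show this by region and quarter" could map to `repivot(df, "sale_date", "premium", group_by="region", freq="Q")`, where those column names are purely hypothetical.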

20:12

And some of the next steps are evaluating the

20:13

tools that are out there for uh GenBI,

20:16

like Databricks Genie for example, and

20:18

we're going to go into a much more

20:20

rigorous process of enriching our

20:21

catalog with metadata and documentation

20:24

and that's also going to come out of a

20:26

lot of the learnings that we got from uh

20:28

the research that we've done. So even if

20:30

we don't end up writing a GenBI agent

20:34

full-fledged end to end, we already got

20:37

a lot of value back from this and this

20:39

is really what allowed our senior

20:41

leadership team to continuously invest

20:43

in this project quarter over quarter.

20:47

One thing that I want to wrap up with is

20:50

just a couple of thoughts I had about

20:52

the future. So um I think we talk a lot

20:55

about how to prepare data. I think

20:57

that's going to be a huge area in the

20:58

market and there are probably going to be

21:00

a lot of companies and tools that are

21:01

going to help us with that. Uh building

21:04

very specific, task-specific models and

21:07

applications. I think a lot of startups

21:09

and companies are going to come up from

21:10

that area. Uh co-pilots are really about making

21:14

sure that we meet the users where they

21:15

are. Uh and securing models is obviously

21:18

a very big thing. The last thing is the

21:20

one I want to focus on the

21:22

most because that's kind of a recent

21:24

thought that came to me a couple of

21:26

weeks ago: how we do pricing of SaaS in

21:29

the Gen AI era. Uh this is really about

21:32

the fact that one individual person

21:35

today can be 10x more effective uh than

21:38

they used to be in the past. And then do

21:41

we price uh software based on seats or

21:44

do we price software based on how much

21:46

they used it or do we price software

21:48

based on the value that they got out of

21:50

it? Uh Salesforce is already

21:53

experimenting with that. So that the

21:54

data cloud product at Salesforce is

21:56

starting to be uh usage-priced and not

21:59

seat-priced. And I think this is going

22:01

to have a big impact on just the uh kind

22:04

of SaaS economics worldwide.

22:07

uh and it doesn't even matter if the

22:09

product itself is gen AI. It's really

22:10

about what the person using the

22:13

product can do and what they can do in

22:15

their other time uh and whether it still

22:18

makes sense to price it by how many

22:19

employees you have or how much work you

22:22

get done with the employees that you

22:24

have.

22:26

That is me and thank you very much for

22:28

listening and thanks for not opening the

22:30

door on me.

22:37

>> [music]

22:43

[music]


Interactive Summary

Assaf introduces GenBI, a fusion of Generative AI and Business Intelligence, aimed at democratizing data access within enterprises. He highlights the unique challenges faced at Northwestern Mutual, a risk-averse financial services company with a focus on long-term commitment, when implementing such a novel concept. The presentation details a strategic approach to overcome these challenges, including using messy, real-world data, building trust incrementally with users and leadership through a "crawl, walk, run" method, and developing the GenBI agent through a phased, incremental process. This approach provided early, tangible business value, ensured transparent progress, and allowed for continuous risk control, avoiding sunk cost bias. The technical architecture involves multiple specialized agents (metadata, RAG, SQL, BI) to process business questions and deliver accurate answers. The project has already yielded significant benefits, such as automating BI team capacity and proving the measurable value of enriching metadata. Assaf concludes by pondering future implications of GenAI, particularly the shift towards usage-based pricing for SaaS products due to increased individual productivity.
