Leveraging AI in the production of official statistics: How AI helps NSOs deliver statistics faster…

Transcript

0:04

So good morning everyone, good

0:07

afternoon, good evening. Thank you

0:09

for joining this side

0:13

event, which is part of the lead-up

0:16

to the 57th session of the Statistical

0:18

Commission. The webinar is on

0:22

leveraging AI in the production of

0:24

official statistics: how AI helps NSOs

0:27

deliver statistics faster, cheaper

0:30

and better. Today's session will

0:33

focus on AI for data production, so to

0:36

speak the supply side of statistics,

0:40

serving as the counterpart to the

0:42

Friday seminar on 28 February, which

0:45

will focus on AI readiness and data use,

0:48

or what we would call the demand side

0:51

of this conversation. This seminar

0:54

will be organized in four strategic

0:57

dialogues around four concrete case

0:59

studies, and each segment will consist of

1:02

a 7-minute presentation followed by a

1:06

13-minute strategic discussion.

1:08

Participants should use the Q&A tab

1:11

in Teams to submit their questions

1:14

throughout the event.

1:18

Just to give you a few more

1:21

housekeeping details.

1:23

On the agenda, we will have

1:26

opening remarks by Mr. Ashwell Jenneker,

1:29

who is Deputy Director General for

1:32

statistical operations and provincial

1:34

coordination at Statistics South Africa.

1:36

Mr. Jenneker is representing the chair

1:38

of the bureau of the Committee of

1:40

Experts on Big Data and Data Science for

1:42

Official Statistics,

1:44

and as moderator we will have Mr.

1:46

Osama Rahman, who is Director of Strategy,

1:49

Planning, Innovation, Delivery Assurance

1:51

and Support at the Office for National

1:53

Statistics of the United Kingdom. Mr.

1:56

Rahman is representing the chair of

1:58

the Data Science Leaders

2:00

Network.

2:01

The first segment, on the use of AI

2:05

to enhance statistical operations,

2:07

from manual processing to algorithmic

2:09

supervision, will start with a case

2:11

study by Mr. Elio Villaseñor, who is Director

2:14

of the Data Science Laboratory and

2:16

Modern Methods of Information Production

2:19

at the National Institute of Statistics

2:22

and Geography (INEGI) of Mexico, and we will have

2:24

a discussion with Professor Shu

2:27

Renjun, representing the

2:30

global China hub and from Sang

2:33

University in China.

2:36

Segment two will be on the use of AI to

2:38

fill data gaps, and we will have a

2:41

case study presented by Mr. Setia

2:45

Pramana, Professor of Statistics at the

2:47

Politeknik Statistika STIS in

2:50

Indonesia, representing the

2:53

regional hub for big data and data

2:55

science for Asia and the Pacific in

2:57

Indonesia. The discussion will be led

3:01

by Ms. Mariel Lisagera Toledo,

3:04

head of research at the National

3:07

School of Statistical Sciences of the

3:08

Brazilian Institute of Geography and

3:10

Statistics (IBGE), also representing the

3:13

UN regional hub for big data and data

3:15

science in Brazil, and Mr. Markus Sovala,

3:19

who is Director General of Statistics

3:21

Finland.

3:23

The third segment will be on the

3:25

governance challenges of AI in

3:28

statistical organizations, and we will

3:31

have a presentation

3:33

first by Mr. Matias Jock from

3:35

Statistics Netherlands, and the

3:38

discussants will be Mr.

3:41

Gary Dunnet, interim chief of methodology

3:44

and statistics

3:46

at Statistics New Zealand, and Ms.

3:49

Franchesca Frankie K, chief

3:52

information officer of the Central

3:54

Statistics Office of Ireland. And

3:57

finally, in the final segment, on

3:58

institutionalizing AI in statistical

4:01

production, we will have a presentation

4:03

by Mr. Rohit Badway, Deputy Director

4:08

General for data informatics and

4:11

innovation at the National Statistics

4:14

Office of India. The discussion will

4:16

be led by Mr. Ivan Murenzi, Director

4:19

General of the National Institute of

4:22

Statistics of Rwanda, also representing

4:24

the UN regional hub for big data in

4:27

Africa. At the end we will have

4:29

closing remarks by Mr. Am Andur,

4:33

chief of the Data Innovation and

4:35

Capacity Branch of the United Nations

4:37

Statistics Division. With this I

4:40

would like to hand over to Mr.

4:44

Jenneker from Statistics South Africa

4:46

for the official opening remarks,

4:49

and the floor is yours.

4:53

>> Thank you, Luis.

4:56

Good morning, good afternoon, good

4:58

evening, depending on where you are

5:00

around the globe, and a hearty welcome

5:04

from the Committee of Experts on Big

5:07

Data and Data Science for Official

5:10

Statistics.

5:12

Now, according to Fortune Business

5:15

Insights, the AI industry will grow from

5:19

367

5:21

billion to 2.5 trillion by 2034.

5:29

This means a compound annual growth rate

5:31

of about 26%.

5:34

So when the Committee of Experts looked

5:36

at this tremendous growth that will take

5:38

place,

5:41

we considered it when we changed

5:43

our mandate in 2024,

5:46

and we've now got specific reference to

5:49

AI. What we were specifically

5:52

thinking about is that we need to provide

5:55

strategic direction on emerging new

6:00

technologies and methodologies

6:03

such as AI, and that's to the whole

6:08

official statistics industry.

6:12

We also need to promote the practical

6:14

use of AI in official statistics,

6:19

promote capacity development,

6:23

and enhance communication and advocacy

6:26

for AI initiatives in the industry.

6:31

Lastly, but most important, we need

6:34

to build trust. We need to build public

6:38

trust in AI. But not just public trust:

6:41

we also need to build trust in the

6:44

industry, because you must remember

6:47

we're an old and established

6:49

statistics industry, and once something

6:52

new like AI comes along, we need to make

6:56

sure that we build the trust so that we

6:58

can adopt it into our processes. So

7:03

the question today is:

7:05

how do we produce statistics that are

7:08

faster, less costly, and of

7:13

higher quality? Meaning they are available

7:16

at a lower level but of higher

7:19

quality. We'll have discussions

7:21

around that.

7:23

So strategy is good. It's good to have

7:25

strategy, to know where we're going, but

7:28

today we'll focus on implementation.

7:32

So we're going to move from talking to

7:35

doing.

7:38

We'll move to doing today: not so

7:40

much talking, but how do we start doing?

7:44

So I wish you fruitful deliberations, and

7:48

I hope that the discussions that

7:53

follow will bring you much closer to

7:57

implementation. Thank you very much. Let

7:59

me hand over to Osama. You take it from

8:02

there. Thanks.

8:05

You're still muted, Osama.

8:11

>> Handing over and telling them I'm on

8:13

mute. Thank you. As Luis has said,

8:16

we have a great set of presentations and

8:20

presenters, and a great set of

8:25

panelists and discussants. So I think

8:27

we're going to have a really good

8:28

conversation. So rather than waste any

8:31

more time, let's go to my good friend

8:33

Elio Villaseñor, and he's going to talk

8:36

about using LLMs across the GSBPM.

8:42

>> Thank you, Osama. I will share my screen.

8:49

Can you see my presentation?

8:53

>> Can see. Yep.

8:54

>> Thank you.

8:56

Hello everyone. Today I would like to

8:58

present a project that we have conducted

9:00

at INEGI using large language models for a

9:04

very specific, high-impact task: coding

9:07

open-ended responses in surveys and

9:10

censuses.

9:11

Before I introduce the project, let me

9:13

briefly explain what large language

9:16

models are. These models produce

9:19

dense, vector-based representations of

9:22

language. Unlike traditional approaches,

9:24

where a document is represented as a

9:26

sparse vector based on word occurrence,

9:29

language models represent words and text

9:32

as vectors in a high-dimensional space.

9:35

The key idea is that in that space,

9:38

geometric distances reflect

9:40

semantic relationships. Words with

9:43

similar meanings tend to be close to

9:45

each other, and analogous relationships

9:47

are preserved in a consistent way. In

9:50

addition, these representations are

9:52

contextual: the same word may be mapped

9:55

to different vectors depending on the

9:57

context in which it appears.
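[Editor's note] As a minimal sketch of the idea that geometric distance tracks semantic similarity — the three toy vectors below are invented for illustration, not taken from any real embedding model:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (real models use hundreds of dimensions).
emb = {
    "doctor":  [0.90, 0.80, 0.10],
    "nurse":   [0.85, 0.75, 0.15],
    "tractor": [0.10, 0.20, 0.90],
}

# Semantically related words end up closer in the vector space.
print(cosine_similarity(emb["doctor"], emb["nurse"]))    # high, near 1.0
print(cosine_similarity(emb["doctor"], emb["tractor"]))  # much lower
```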

10:01

At INEGI, we are able to experiment with

10:04

these models within our data science

10:07

lab, an infrastructure that integrates

10:09

both technological and human

10:11

capabilities.

10:13

This responds to the need to integrate

10:15

diverse data sources and apply AI to tasks

10:20

such as text classification, spatial

10:22

analytics, and synthetic data generation,

10:25

among other use cases.

10:28

Specifically, we have developed machine

10:30

learning models for automatic coding of

10:33

open-ended responses. Traditionally,

10:35

this process is manual: expert coders

10:38

interpret each response and assign a

10:41

code from a reference classification

10:43

catalog. These coded responses are

10:46

essential for producing statistics.

10:48

However, in many operations, the volume

10:50

is extremely large. For instance, in a

10:53

recent exercise, the intercensal survey,

10:56

more than 12 million households

11:01

were interviewed. So you can imagine the

11:03

number of people needed to code all

11:05

responses manually within a reasonable

11:09

time frame. For some time, part of the

11:12

task has been automated using rule-based

11:15

approaches driven by word-occurrence

11:17

patterns. In our experience, around 70%

11:21

can be coded this way while preserving

11:24

good quality. But the remaining 30%

11:27

still requires manual coding. To reduce

11:30

that workload,

11:33

we developed machine learning methods that

11:36

use already-coded responses to train

11:38

algorithms capable of detecting patterns

11:41

in vector representations of text.

11:45

With these algorithms, we found

11:47

it possible to reduce manual effort

11:49

while maintaining coding quality.

11:52

In recent years, language models have

11:55

significantly improved automatic coding,

11:57

because embedding representations capture

12:00

meaning

12:01

more effectively.

12:04

The workflow changes as follows: each

12:07

word is represented as an embedding

12:09

vector, each response is modeled as a

12:13

sequence of vectors, and models are

12:15

trained to classify these sequences and

12:19

assign an appropriate code. The key

12:22

element is that these models can output

12:26

an uncertainty score, which can be tuned so

12:30

that the system codes only those

12:33

responses where the model is

12:35

sufficiently confident. This allows us to

12:38

quantify uncertainty

12:40

and enforce a minimum desired level of

12:43

quality.
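[Editor's note] A minimal sketch of that confidence-gated workflow — the probabilities and the 0.9 threshold below are invented for illustration; in practice they come from a trained classifier and a calibration exercise:

```python
# Each item: (response_id, predicted_code, model_confidence in [0, 1]).
predictions = [
    ("r1", "OCC-2310", 0.97),
    ("r2", "OCC-5120", 0.62),
    ("r3", "OCC-7412", 0.91),
    ("r4", "OCC-1112", 0.40),
]

THRESHOLD = 0.9  # tuned so that auto-coded items meet the target accuracy

auto_coded = [p for p in predictions if p[2] >= THRESHOLD]
human_review = [p for p in predictions if p[2] < THRESHOLD]

# Only high-confidence responses are coded automatically;
# the rest are routed to expert coders.
print([p[0] for p in auto_coded])    # ['r1', 'r3']
print([p[0] for p in human_review])  # ['r2', 'r4']
```

Raising the threshold increases the accuracy of the auto-coded set but shrinks it, which is exactly the tunable trade-off discussed later in the session.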

12:45

In a recent operation, we achieved about a

12:49

50% reduction in manual coding workload

12:53

while maintaining accuracy of 94.3%,

12:57

comparable to the quality typically

12:59

observed in traditional manual

13:02

workflows. In other words, we optimize

13:05

resources and reserve expert time for

13:08

the most difficult cases.

13:11

We also collaborated with INE Chile to implement

13:13

this approach for automatic coding in

13:16

Chile's victimization survey, achieving

13:19

highly competitive results. This

13:23

analysis highlights the value of moving

13:25

from traditional representations to

13:27

embeddings and sequence-based models:

13:30

using word-occurrence features, accuracy

13:33

was around 85%; adding embeddings to

13:37

traditional methods like extreme gradient

13:40

boosting increased performance to roughly

13:43

87%;

13:45

using sequence classification reached

13:47

about 91%;

13:50

and using a large model, for example

13:54

RoBERTa, a fine-tuning of BERT for Spanish,

13:58

with generic embeddings reached around 92%.

14:02

It is important to note that large

14:04

models such as BERT often require

14:07

specialized computing infrastructure,

14:08

like high-performance GPUs, which may not

14:11

be readily available in standard

14:14

production environments at statistical

14:16

offices. Therefore, lightweight

14:18

alternatives such as fastText

14:20

embeddings remain very relevant, offering

14:23

a strong balance between performance and

14:25

computational cost.

14:29

Working with embeddings and large

14:30

language models also opens the door to

14:32

intelligent agents. We are currently

14:35

developing this with the goal of

14:38

evaluating the quality of our institute's

14:40

data and metadata, so they can be

14:43

effectively consumed by agents that

14:45

answer questions using tabular data and

14:48

knowledge encoded in metadata, such as

14:50

variable definitions and survey design

14:53

information. These models can generate SQL-

14:56

like queries from natural language.
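[Editor's note] A minimal sketch of how such an agent might be grounded: assemble the metadata (variable definitions) into the prompt context before asking for a SQL-like query. Everything below — the variable names, the table name `survey`, and the prompt wording — is a hypothetical illustration, not INEGI's actual pipeline:

```python
# Hypothetical metadata catalog: variable name -> definition.
metadata = {
    "ingreso_mensual": "Monthly household income in pesos (survey section IV).",
    "ocupacion": "Occupation code of the respondent (SINCO classification).",
}

def build_prompt(question: str) -> str:
    # Ground the agent: put variable definitions in the context so the
    # model maps natural-language terms onto the right columns.
    context = "\n".join(f"- {name}: {desc}" for name, desc in metadata.items())
    return (
        "You translate questions into SQL over table `survey`.\n"
        f"Available variables:\n{context}\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_prompt("What is the average monthly income by occupation?")
print(prompt)
```

The hard part, as the speaker notes, is exactly this context-construction step: if the wrong variables are surfaced, the generated query will be confidently wrong.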

14:58

However, the critical step is providing

15:01

the agent with the right context to

15:04

accurately identify the relevant

15:06

variables and determine the correct

15:08

query. This is an ongoing effort.

15:12

More broadly,

15:14

We are exploring

15:15

>> You have one minute left, just to warn

15:17

you.

15:18

>> Yes, thank you. How these models can

15:22

support different processes within the

15:24

GSBPM: for example, a chatbot to train

15:27

interviewers, tools to assess data and

15:29

metadata quality, automated generation of

15:32

informative documents, and semantic

15:34

search over our repositories.

15:38

Lessons learned include: a unifying

15:40

architecture is essential for

15:42

scalability and production deployment;

15:44

high-performance on-premises computing

15:46

supports our data and long-term

15:48

sustainability; and, finally, international

15:50

collaboration has significantly

15:52

accelerated our progress. Going forward,

15:55

our priorities include improving data

15:58

and metadata quality to strengthen AI

16:00

readiness; continuing to develop use

16:03

cases that leverage these models in

16:05

products and services; and addressing

16:07

challenges related to human and

16:09

financial resources, continuous

16:11

infrastructure updates, and the adoption

16:13

of appropriate security measures. Thank

16:16

you.

16:18

>> Brilliant. Thank you, Elio. And as

16:21

always, great stuff from INEGI. Right.

16:23

Is Renjun with us?

16:27

>> Yes, I'm here.

16:28

>> Yes.

16:30

Is your camera on? Let me see. Okay.

16:36

Okay. Can you hear? Okay.

16:44

Hey, can you see my screen?

16:48

>> Uh, my camera.

16:50

>> There you go. Now I can see you.

16:52

Perfect.

16:53

>> Okay.

16:55

Thank you, Osama, and thank you, Mr. Villaseñor, for a

16:58

very concrete and well-structured

17:00

demonstration. This is exactly what this

17:02

session promised: from AI potential

17:05

to AI production. I will not recap what

17:08

he has shown. Instead, let me name what

17:10

I think is the most strategically

17:12

significant thing in your presentation,

17:14

because it is easy to miss amid the

17:17

technical detail. What INEGI has built is

17:20

not just an automated coding system. It

17:23

is a tunable governance architecture.

17:25

The tunable accuracy trade-off curve he

17:28

showed is not a technical artifact. It

17:30

is a policy instrument. It gives the

17:33

subject-matter director, not just the

17:36

data scientist, a lever to decide how

17:38

much human oversight to retain, and that

17:43

design choice, in my view, is what makes

17:45

this architecture ready for production

17:48

in a way that many AI pilots are not. But

17:51

the curve also surfaces a profound

17:53

question about the shifting role of the

17:56

statistician, which is the focus of our

17:59

session. So let me put two questions to

18:01

Mr. Villaseñor,

18:02

and then I will offer some broader

18:04

reflections for the room. The first

18:08

question: your

18:10

presentation showed AI deployed across

18:13

four phases of the GSBPM, from interviewer-

18:17

training agents in phase 4, through

18:19

automatic coding in phase 5, quality

18:21

assessments in phase 6, to document

18:23

generation, semantic search, and chatbots

18:25

in phase 7. That is a remarkably broad

18:28

footprint. My question is about the

18:30

people: when you moved automatic coding

18:33

into production, what happened to the

18:36

coding staff? Were they retrained as

18:39

supervisors of the algorithm, reviewing

18:41

the cases below the confidence threshold,

18:44

or were they redeployed to other tasks? And,

18:48

practically speaking, did they need new

18:50

skills that your organization had to

18:52

build?

18:55

>> Yes.

18:57

Thank you very much, Renjun, for your

19:00

question. Yes, the

19:03

people,

19:06

the expert coders who are already in

19:10

the institution, are still coding the hard,

19:15

the difficult, responses, right?

19:19

And much of the workforce

19:22

that was needed for coding

19:25

all the responses

19:28

is not permanent staff of the

19:32

institute. They are hired for

19:35

each

19:37

specific operation. So

19:42

we just need to hire fewer people now

19:46

for that.

19:50

>> I see, very interesting. So I

19:55

guess, maybe for other, do you

19:59

have any suggestion for other statistical

20:01

offices? Maybe they have existing

20:04

statisticians. How should they,

20:07

do you have any suggestion, do they need some

20:09

skill retraining or redeployment? What,

20:11

do you have any suggestion?

20:22

>> Do you want to repeat the question?

20:24

Thank you.

20:25

>> So my question is, maybe for other

20:28

offices, we have coding,

20:32

we have coding staff, right? We have coding staff.

20:33

Do you have a suggestion for

20:36

those offices? Should they

20:39

redeploy the coding staff to other,

20:43

maybe higher-level, roles, or should they

20:46

retrain

20:49

those coding staff,

20:53

maybe to work with those who have a

20:58

higher level of skills, to handle

21:02

the new scenario? Should they

21:04

retrain those staff? Do you

21:08

have any suggestion on that?

21:12

>> I have to say that

21:16

putting

21:17

these

21:20

models in production is a new thing that

21:24

is happening right now.

21:28

Actually, the intercensal survey was the

21:31

first operation where we have put

21:34

this coding system into production. So many

21:38

of the consequences of doing this

21:42

we are

21:46

just starting to face, right? We don't

21:48

have the evaluation of the

21:51

processes already done; it is something that

21:54

is happening now.

21:57

>> Yeah, I guess the human capital freed up,

21:59

maybe the human

22:02

capital can be freed up for higher-

22:04

value work.

22:06

>> Yeah, so.

22:09

>> Okay, so now I have a second

22:12

question. This is about something

22:15

your slides revealed that I found

22:17

very striking: the crime narrative

22:19

classification work, the joint project

22:22

between INEGI and INE Chile, showed a

22:26

calibration diagram where you used

22:28

temperature scaling to align the model's

22:30

confidence scores with actual

22:32

accuracy, and you published the full

22:34

pipeline as open-source code on GitHub

22:36

from the lab. Here's what I want to

22:39

ask: that certainty parameter, the confidence

22:42

threshold that determines what gets

22:44

autocoded and what goes to human review,

22:48

who decides where to set it? Is it the

22:50

data scientist, the survey director, or

22:53

some institutional governance body? And

22:56

is that decision documented as part of

22:58

the statistical methodology, the way we

23:01

document sampling design or estimation

23:03

procedures?

23:06

>> Yes, it has to be part of the

23:08

methodology, right? And, as you

23:12

mentioned, we have a threshold on the

23:15

uncertainty

23:18

level, and we move this

23:22

threshold for different classes, so we

23:27

can guarantee that we preserve the

23:31

quality of classification made by manual

23:35

coders, right? That's the general idea.
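[Editor's note] A minimal sketch of the per-class thresholding described here — the class codes, probabilities, and threshold values are invented for illustration:

```python
# Per-class confidence thresholds, tuned so each class keeps
# manual-coding-level quality (values are illustrative).
thresholds = {"OCC-11": 0.85, "OCC-52": 0.95}  # harder class gets a stricter bar

def route(predicted_class, confidence):
    # Auto-code only if confidence clears that class's own threshold.
    if confidence >= thresholds[predicted_class]:
        return "auto"
    return "human_review"

print(route("OCC-11", 0.90))  # auto: clears the 0.85 bar
print(route("OCC-52", 0.90))  # human_review: below the 0.95 bar
```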

23:40

>> Okay. Thank you. I want to build on

23:42

both of your answers and offer three

23:44

reflections for the room. First, on the

23:47

shifting role of the statistician: what INEGI's

23:50

experience illustrates, and what Mr. Villaseñor

23:53

has just described from the inside, is

23:56

that automating a routine process does not

23:59

eliminate the need for statistical

24:01

judgment. It means the statistician's

24:05

expertise moves from executing the coding

24:08

task to governing the system that

24:10

executes it: setting the confidence

24:12

threshold, monitoring for distribution

24:14

shift when new response patterns

24:16

emerge, deciding when to retrain,

24:19

and signing off on the quality of the AI

24:22

system's output. This is a higher-order role,

24:25

and frankly a more intellectually

24:28

demanding one, but our training programs

24:31

and our job descriptions in most NSOs

24:34

have not yet caught up with this

24:35

reality. Mr. Villaseñor's own lesson, that

24:39

human and financial resources remain

24:42

a challenge, underlines this point.

24:45

The second one is on balancing efficiency

24:47

and rigor. The tunable trade-off curve is

24:49

elegant, but it raises a governance

24:51

question that goes beyond INEGI: if 94.3%

24:55

accuracy at 50% autocoding is the chosen

24:58

operating point for one survey, who

25:01

ensures consistency across different

25:03

surveys within the same office? What if

25:06

the population survey team sets an

25:09

aggressive threshold while the economic

25:11

survey team is more conservative? We

25:14

need institutional-level governance

25:16

frameworks, standard operating procedures

25:18

for AI systems in production, not just

25:21

team-level technical decisions. The UN

25:24

quality framework for statistical

25:26

algorithms provides the conceptual

25:28

foundation, but operationalizing it is

25:30

the work ahead. The third one is on

25:33

making these gains globally accessible. I

25:35

want to highlight something from INEGI's

25:38

presentation that deserves more

25:39

attention: the INEGI-INE Chile

25:42

collaboration on crime narrative

25:43

classification was published as open-source

25:45

code. This is a model of open-source

25:47

cooperation that makes efficiency gains

25:50

transferable. A statistical office in

25:52

another Spanish-speaking country can

25:55

adapt that pipeline rather than build it

25:57

from scratch. At the UN global hub in

25:59

China, we have taken a similar approach:

26:01

in December 2025, we contributed to the

26:04

launch of the UN handbook on sensing

26:07

for agricultural

26:11

production and its toolkit. The principle is

26:13

the same: not every NSO building its own

26:15

AI from scratch, but a global community

26:18

maintaining shared, validated, open tools. The

26:21

strategic question I want to leave

26:23

with this room is: can we do the same for

26:26

AI-assisted survey coding? Can the UN

26:29

Global Platform host validated

26:31

multilingual coding models for standard

26:33

classifications, occupation,

26:36

industry, crime type, that any office can

26:39

deploy? The technical ingredients exist,

26:42

thanks to the kind of work INEGI is

26:44

leading. What we need is the

26:46

collaborative commitment. Thank you.

26:49

>> Thank you. Brilliant discussion.

26:52

Okay, Elio, there is a question for you

26:54

in the Q&A tab from my colleague

26:57

Farah Nanoir here at the ONS, and he's

27:00

interested in hearing about the

27:02

benefits of training a bespoke model

27:05

versus just using an existing LLM and

27:08

RAG. Do you have views on when you might

27:12

favor one approach over the other?

27:14

The question is in the Q&A tab if you

27:16

can see it.

27:23

>> Okay. Well, actually, to build a

27:28

RAG system, we need to use an LLM.

27:32

Maybe it's not

27:37

very

27:39

obvious, but this is because in

27:45

commercial solutions we just build the

27:47

RAG by giving the LLM the right

27:51

context, right? But if we

27:57

work with an LLM that is already in

28:02

our own infrastructure, we can build

28:07

the RAG with this LLM.

28:14

When we use the embeddings,

28:18

it's like we use the dense

28:21

representation of the text, and

28:24

we

28:26

can

28:28

go straightforwardly

28:31

through a typical machine-learning

28:35

procedure, and I think that is a

28:39

more standardized way to pursue this than

28:43

asking directly, the

28:46

agent, in the classification

28:50

workloads.

28:52

Maybe the advantage of using the

28:55

RAG is that you don't need a big

28:58

training

29:00

data set, right? This is maybe

29:03

the main advantage of doing that: if you

29:06

don't have a large training

29:10

data set, it is a way to pursue.
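[Editor's note] A minimal sketch of the retrieval step in such a RAG setup — the toy two-dimensional vectors and the `catalog` snippets are invented for illustration; a real system would use an embedding model and an in-house LLM:

```python
import math

def cosine(u, v):
    # Similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy "embedded" knowledge base: snippet -> 2-d vector.
catalog = {
    "Occupation is coded with the SINCO catalog.": [0.9, 0.1],
    "Income is collected in section IV of the questionnaire.": [0.1, 0.9],
}

def retrieve(query_vec, k=1):
    # Rank snippets by similarity to the query embedding, keep the top k.
    ranked = sorted(catalog.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this query vector came from embedding "How are occupations coded?".
context = retrieve([0.8, 0.2])
prompt = f"Context:\n{context[0]}\nQuestion: How are occupations coded?"
print(prompt)
```

The appeal noted in the answer is visible here: no labeled training set is needed, only an embedded document collection and an LLM to answer from the retrieved context.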

29:14

>> Thank you.

29:15

>> Thank you. Renjun, I have a question

29:18

for you. Mhm.

29:19

>> As we use gen AI more and more across

29:24

the GSBPM,

29:26

the traditional ways in which we've

29:29

assessed statistical methodology may not

29:32

work. Do you have any thoughts on,

29:33

when doing this sort of work,

29:36

how do you assess quality?

29:39

>> That's an excellent question,

29:41

and it gets at something

29:44

fundamental. Traditional quality

29:46

frameworks for official statistics, like

29:49

the European Statistics Code of

29:51

Practice and the UN Fundamental Principles,

29:54

with their quality dimensions, were designed

29:57

for a world where the production process

29:59

was deterministic and

30:01

traceable: you could audit every step. With

30:04

gen-AI models, that traceability breaks

30:08

down; the model's reasoning is not directly

30:11

inspectable. So how do we assure

30:13

quality? I think we need a layered

30:15

approach. The first layer is output

30:17

validation: benchmarking against

30:19

ground truth. This is what INEGI has done

30:22

well: compare the model's codes against a

30:25

human-coded gold standard and measure

30:27

accuracy,

30:29

precision, and recall. This works for

30:30

classification tasks where we have historical

30:33

reference data. It is necessary but not

30:36

sufficient. The second layer is

30:37

calibration: ensuring the model knows

30:40

what it does not know. The crime narrative

30:42

work that INEGI published with

30:45

INE Chile used temperature scaling to

30:47

calibrate the model's confidence scores,

30:50

so that when the model says "I'm

30:53

90% confident," it is actually correct

30:55

90% of the time. This is critical,

30:58

because the entire human-in-the-loop

31:00

architecture depends on the quality of

31:02

the uncertainty estimates. If the model

31:04

is overconfident, low quality breaks

31:07

through. Calibration testing should

31:10

become a standard part of the AI quality

31:12

assurance toolkit for NSOs. The third

31:15

layer is behavior monitoring in

31:17

production. This is the piece most offices

31:19

have not yet built. Once the model is

31:22

deployed, you need to continuously check

31:25

whether today's distribution of inputs

31:28

matches the distribution the model was

31:30

trained on. If survey respondents start

31:33

describing their jobs differently, because

31:35

of economic change, new industries, or

31:38

even linguistic drift, the model's

31:41

training data becomes stale. Statistical

31:45

process control techniques, which

31:47

our community knows well, can be adapted

31:50

for this purpose: control charts on model

31:53

confidence distributions, on category-

31:56

level accuracy,

31:58

validation against fresh human-coded

32:00

samples. The fourth layer, and this is

32:03

specific to generative AI as opposed to

32:05

classification AI, is semantic

32:08

evaluation. When gen AI produces a

32:11

statistical report or data description,

32:14

there is no single correct answer to

32:16

benchmark against. Here we need a

32:18

different quality paradigm: human expert

32:21

review of a random sample, structured

32:24

rubrics for factual accuracy and

32:27

coherence, and, increasingly, AI-assisted

32:30

evaluation, where a second model

32:32

checks the first. This is an active

32:35

area of research, and I would not claim

32:37

we have solved it. But the principle is

32:40

clear: any gen-AI output that enters the

32:42

official statistical production chain

32:45

must be subject to documented quality

32:48

review, and the review process itself

32:50

must be auditable. The honest

32:52

answer is that traditional quality

32:54

methodology still works for the

32:56

principles, accuracy, timeliness, coherence,

32:59

comparability, but the

33:01

measurement tools need to expand, and that

33:04

expansion is an urgent task for the

33:06

international statistical community, with

33:08

the UN-CEBD and the DSLN as the right bodies

33:13

to lead it. Thank you.
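[Editor's note] The temperature-scaling step described in the calibration layer can be sketched as follows. This is a generic illustration with invented logits and a hand-picked temperature, not the INEGI-INE Chile code: temperature scaling divides a model's logits by a constant T > 1 before the softmax, which softens overconfident probabilities (in practice T is fitted on a held-out validation set):

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by T before normalizing; T > 1 softens the distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for one response over three candidate codes.
logits = [4.0, 1.0, 0.5]

raw = softmax(logits)              # uncalibrated confidence
calibrated = softmax(logits, 2.5)  # T = 2.5 softens the confidence

print(round(max(raw), 3), round(max(calibrated), 3))
```

Because the ranking of classes is unchanged, only the reported confidence moves, which is exactly why calibration matters for a confidence-threshold routing system.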

33:15

>> Thank you. And I've got one last

33:18

question for you. This is in the Q&A,

33:21

from Ezi. So, how have you handled

33:24

cases where the AI model produces

33:27

incorrect classifications,

33:29

and what mechanisms are in place to

33:31

make sure those errors don't propagate?

33:35

>> Yes. Right. This is a question,

33:39

well, we cannot avoid

33:43

all errors in classification.

33:44

Actually, manual coders also

33:48

make mistakes, because not everyone is an

33:51

expert, and they apply different criteria

33:56

for some codes. So we have

34:02

to live with those errors. What we

34:05

can do with machine learning

34:08

algorithms is that we can have, as an

34:12

output, the uncertainty,

34:16

or the confidence,

34:18

of the model

34:22

in making the classification,

34:25

right? So we can use this factor of

34:28

confidence to

34:33

leave the difficult cases to the

34:35

experts. That's the way we

34:38

can manage that.

34:40

>> Okay. Thank you. Uh I'm going to make

34:42

one comment because I think you said

34:43

something very important. I think um you

34:45

know with the hype around AR you hear oh

34:47

hallucinations and everything else. The

34:48

incorrect benchmark as you've said is

34:50

not zero errors. The correct benchmark

34:54

is what's the error rate given our

34:57

current methodology and that's what you

34:58

need to benchmark against, which I think is a

35:00

very important point and I think a lot

35:01

of people miss out on that. Okay, that was

35:04

a fantastic presentation and discussion

35:07

and thank you for those who ask

35:09

questions. Sorry, we are not

35:11

going to have time to uh

35:14

look um answer every question that's

35:16

raised in the Q&A but keep them coming

35:18

for each section uh and we'll we'll make

35:21

sure we answer some. Okay. So, uh next up

35:24

we have my very good friend Setia

35:26

Pramana from uh the Statistics Polytechnic

35:29

in Indonesia. Uh I'm excited

35:31

about this because for quite a few years

35:33

I've been hearing about all the great

35:35

work that they've been doing in in in

35:38

Indonesia.

35:39

So, uh today Setia is going to talk about

35:42

using AI to derive timely statistics

35:44

from satellite imagery. Um Setia, are

35:48

you here?

35:49

>> Yes. Can you hear me? I can hear you cuz

35:52

there we go.

35:53

>> You can see your presentation, right? So

35:55

I think you just have to tell I think

35:56

Clarence is running the presentation. So

35:58

you just have to tell them next slide or

35:59

whatever whatever you want. Right.

36:01

>> Okay. Thank you, Clarence.

36:03

>> Over to you.

36:05

>> Thank you very much. It's great to meet

36:07

you over uh after the year. So uh thank

36:12

you very much for inviting me and then

36:14

it's great pleasure for me to uh be part

36:16

of this uh event and meet uh all my old

36:21

friends like uh Elio and all colleagues

36:24

here I think is also good names of

36:26

colleagues from UNSD and also from UN-CEBD

36:30

and uh uh Osama right now I'm just like

36:33

almost a month appointed as the director

36:35

of statistical methodology right now in

36:38

BPS Statistics Indonesia. I'm still also

36:41

holding the director of uh regional hub

36:44

on big data center for official

36:46

statistics in uh Asia and the Pacific. I think

36:50

uh next. Yeah. Yeah.

36:55

Okay. Yes. So before focusing on the I

36:58

think the modernization of agricultural

37:00

statistics using AI and non-traditional data

37:02

sources let me just overview what we

37:05

have done so far. uh I just brief uh

37:08

overview what we have done mostly is

37:10

about uh using several data sources such

37:13

as mobile positioning data for different

37:16

uh type of uh official statistics such

37:19

as uh tourism statistics, migration, and

37:22

also uh other statistics but we will

37:25

focus later on the mixed method of how to

37:27

use the uh satellite for agricultural

37:30

statistics. Next please. We also as uh

37:34

we also try to use uh gen AI for

37:39

uh classifications. So it is leveraging

37:43

gen AI for automatic prediction of

37:46

Indonesian standard industrial

37:47

classifications. Now it's been uh we are

37:50

still working on that and then we

37:52

hopefully that we can use it for uh our

37:55

economic census this year. Also we have

37:59

uh I think this one also I discussed and

38:01

also have some good very good feedback

38:04

uh last year together with Osama that uh

38:07

we use uh we develop now I pass AI AI

38:11

knowledge for metadata chatbot and

38:13

automatic interpretations but today next

38:16

please I'm going to just focus because I

38:18

just have seven minutes oh I still have

38:20

another one sorry we also use the earth

38:23

observation for poverty mapping and also

38:26

child deprivations for poverty mapping

38:28

it's uh we we have the SDGS award for

38:32

this in 2023, and then for mixed method

38:35

actually we just last year we have also

38:37

SDG award for um uh using satellite imagery

38:42

for agricultural statistics next please

38:46

yes so uh agriculture statistic is a

38:50

fundamental

38:51

uh data right for policy especially

38:55

Indonesia for inflation control also

38:57

national planning. But uh next please

39:00

the traditional data collection method

39:02

alone is no longer sufficient to meet

39:05

today's demands uh as you know that we

39:07

need to have data timely granular and

39:09

also high frequency statistics. So

39:12

BPS initiated a transformation a

39:14

few years ago to integrate satellite

39:16

imageries machine learning and also

39:18

other data sources into official

39:20

statistic production especially for

39:22

agricultural statistics. So

39:25

traditionally uh right now statistic

39:26

Indonesia conducts the area sampling

39:29

frame survey every month especially in

39:32

the last week and then this is to obtain

39:34

the um the pendical stage. So the

39:40

harvest and uh estimation times so we

39:43

have uh

39:46

sampling frame uh survey for uh

39:51

obtaining the productivity rights. So

39:53

this methods actually provide strong

39:55

ground truths but it has limitations. It

39:58

has high operational cost. Yeah. Because

40:00

we have thousands of uh staff going to

40:03

the field every month. Also time

40:06

constraints and also related to

40:09

logistical challenges across Indonesia

40:11

because some of area is difficult to

40:13

access also maybe dangerous to access

40:16

also a limited temporal frequency. So we

40:19

try to combine the use of uh satellite

40:22

imagery or aeros machine learning

40:25

algorithm to predict the uh next please

40:29

to predict the uh uh phenological

40:34

phases. So we know that we start from uh

40:37

tillage, vegetative one, vegetative two, etc. So

40:40

we use um satellites to predict the area

40:44

of uh the paddy and also the

40:48

phenological phases of this uh paddy.

40:51

Why paddy? Because this is our main

40:53

staple. So why do we use Sentinel? Why

40:56

Sentinel? Because it's radar-based and

40:58

also it can penetrate clouds. uh it

41:01

works well in tropical region like

41:03

Indonesia and then uh hopefully we can

41:06

detect planting and also growing stages

41:08

and also hopefully that we can detect

41:11

harvesting phases and then this is a

41:13

crucial breakthrough because it allows

41:15

continuous monitoring not only uh we we

41:18

don't need to go to we don't need to

41:19

bring our staff to the to the field but

41:22

we can uh allow us to continue

41:24

monitoring without uh waiting for a

41:26

field visit which is actually difficult

41:28

and then uh we also uh use this uh based

41:33

on the guiding principles of official

41:35

statistics. Next please. And yeah so how

41:40

the prediction is made? So this actually

41:42

in collaboration with several uh several

41:46

uh agencies and also ministries in Indonesia

41:49

and also we uh collaborate with the

41:51

UNSD, UN ESCAP and also with several uh

41:55

agencies such as FAO. So in Indonesia we

41:58

use uh we collaborate with BRIN, uh our

42:01

national research agency, and then uh we

42:04

based on the uh sentinel data uh images

42:09

from BRIN from the research agency for

42:12

every 12 days we check the sampling frame segments

42:15

and then we predict the area of each

42:17

phenological classes and then later we try to

42:21

estimate the area of all phases, particularly the

42:24

harvested paddy, using the sampling design

42:27

including relative standard error. Next
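The expansion step described here — scaling segment-level results up through the sampling design and attaching a relative standard error — might look roughly like this for a simple random sample of segments. This is a deliberate simplification with made-up numbers; the actual area sampling frame survey uses a more elaborate design:

```python
import math

def estimate_total_area(sample_areas, population_segments):
    """Expand a simple random sample of per-segment harvested areas
    (hectares) to a population total, with its relative standard error."""
    n = len(sample_areas)
    mean = sum(sample_areas) / n
    var = sum((a - mean) ** 2 for a in sample_areas) / (n - 1)  # sample variance
    total = population_segments * mean
    fpc = 1 - n / population_segments          # finite population correction
    se_total = population_segments * math.sqrt(fpc * var / n)
    return total, se_total / total             # (total, relative standard error)

total, rse = estimate_total_area([2.1, 1.8, 2.4, 2.0, 1.7], population_segments=1000)
```

With these toy numbers the estimated total is 2,000 ha with a relative standard error of about 6%.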

42:29

>> just to let you know you got like a

42:31

minute left.

42:32

>> Yes. Next please.

42:35

>> Yes. This actually because uh Indonesia

42:38

is quite big so we also uh make some uh

42:42

uh checking area the ground check area

42:44

and then the in general right most of

42:47

the provinces have accuracy above 80%. Of

42:50

course, there's still area that below

42:52

80% of accuracy. And then based on the

42:55

estimation, the new method demonstrates

42:57

the capability of capturing the

43:00

harvesting patterns similar to the

43:02

current official data through the uh area

43:07

sampling frame survey and then uh we

43:10

have high accuracy in some areas. So in

43:13

some areas with low accuracy actually

43:16

we try to uh fine-tune the model. Next

43:18

please. The last one. Yes. So the

43:22

product delivery it's been like uh uh

43:25

part of the UN handbook on agricultural

43:28

statistics together with Lorenzo and

43:30

also Ronald from the UN. So it's

43:33

part of this and then if you want to

43:36

know in more detail about this we also

43:39

have a knowledge videos to share. We

43:42

have also the uh product of uh

43:46

publication but it's still in uh Bahasa

43:49

Indonesia. Later we also will put that

43:52

in in in English. Next

43:56

is this ah yes uh last year we uh we

44:01

have uh awards uh of this uh uh method

44:05

of this uh breakthrough uh geo particip

44:08

uh geo awards uh member awards of

44:12

modernizing ag statistic uh using

44:15

satellite imagery for phenological

44:18

stages. I think that's all because I just

44:21

have only seven minutes hopefully that

44:23

can Thank you very for your attention

44:25

and then looking forward for any

44:27

fruitful discussion or comment for our

44:30

uh

44:31

>> Thank you. Thank you. 7 minutes but

44:32

there was a lot in there. Fantastic.

44:34

>> Okay. Uh Mar Louisa uh and Marcus are

44:39

you here?

44:41

>> Yes.

44:43

>> Yes.

44:45

>> Can't see you yet.

44:50

>> Can you see me? I'm here. I can hear

44:53

you. Can't see you yet. Is your camera

44:56

on?

44:58

>> Yes, it's on. Interesting.

45:04

>> H

45:05

I can see Marcus now.

45:08

>> And now I can see you as well. Mary

45:10

Louisa. Great. Okay. Fantastic. Uh Sia,

45:12

you should keep your camera on because I

45:13

am going to bring you into this

45:14

discussion as well. So don't disappear.

45:17

Uh right. Um

45:19

>> I'm here.

45:20

>> Okay. Okay. So, Mar Louisa, I'm going to

45:22

go to you first. Um,

45:25

what work has the Brazil regional hub uh

45:29

been doing to enable more use of AI

45:32

derived statistics using big data to fill

45:35

the gaps and then how do you determine

45:38

what use cases to investigate?

45:43

>> Well, h thank you very much for the

45:45

invitation to participate in this

45:47

important discussion.

45:50

It's truly a pleasure to be here

45:52

especially with my colleague Satia from

45:56

the regional hub in Indonesia.

45:59

H

46:00

and I think that Satia example clearly

46:05

illustrates the potential of AI derived

46:08

statistics to fill the critical data

46:11

gaps. particularly when we think about

46:14

domains where traditional surveys face

46:17

limitations in costs and however I think

46:21

it also raises an essential question for

46:25

NSOs under what conditions can AI

46:28

derived outputs be used responsibly as

46:32

official statistics and from the

46:34

perspective of the UN regional hub for

46:36

big data in Brazil. Uh this is one of

46:39

the central questions guiding our work

46:42

across Latin America and the Caribbean.

46:46

uh our recent consultation with 21

46:49

countries showed that almost 40% of the

46:53

NSOs already have AI initiatives

46:57

integrated into their production process

47:01

most commonly in the processing phase to

47:04

improve like data validation cleaning

47:08

and efficiency.

47:10

Uh this confirms that AI is already

47:13

strengthening statistical production but

47:17

scaling its use to fill statistical gaps

47:20

requires clear and robust criteria to

47:23

ensure quality and trust

47:27

and supporting countries in

47:30

operationalizing these principles is a

47:33

central priority of the regional hub.

47:36

One example is our regional project with

47:40

11 countries to develop climate change

47:43

indicators using satellite imagery and

47:46

AI through a shared

47:50

cloud-based platform, and these

47:52

initiatives enable countries to

47:55

co-develop validated methodologies

47:58

uh to share infrastructure and ensure

48:02

consistent quality standards. And

48:08

in parallel uh through our training

48:11

activities with the UN global platform,

48:15

we are helping countries assess

48:17

cloud-native data science infrastructure

48:21

and build the capacity required to

48:24

validate and to deploy these methods

48:27

responsibly.

48:29

Um and this work also helps identify

48:32

some priority use cases. We focus

48:36

particularly on domains where there are

48:40

clear policy needs where traditional

48:43

data sources are insufficient and where

48:48

alternative data sources can complement

48:50

the existing statistical systems.

48:54

And what what we have seen is that AI

48:57

does not replace statistical systems. It

49:01

strengthens their ability to fulfill

49:03

their core mission by combining strong

49:06

validation frameworks uh transparent

49:09

methodologies and shared infrastructure.

49:12

NSOs can responsibly use AI derived

49:17

statistics to fill the critical gaps

49:19

while maintaining the trust and the

49:22

important statistical integrity.

49:28

>> Thank you. Marcus, I've got a question

49:31

for you. I think you know from Setia's

49:33

presentation

49:35

there's two aspects here that take us

49:37

away from traditional production of

49:39

statistics. And one is that these sorts

49:42

of big data are not data that NSOs have

49:45

traditionally been used to working with.

49:46

We're traditionally we tend to work with

49:48

data we either collect ourselves

49:51

and also maybe uh government

49:53

administrative data. So you know this

49:55

big data site of all it's not our data

49:57

and then using Gen AI tools that's not

50:01

part of the standard skill set of

50:03

statisticians in NSOs. So how as a leader

50:06

do you set a culture that allows for

50:09

innovation and that encourages the use

50:12

of big data and new approaches?

50:16

>> Um thank you Osama.

50:19

May I start by uh noting that uh uh use

50:24

of AI in a context of geospatial data is

50:29

extremely natural in the sense that

50:32

these uh satellite observations and also

50:36

other forms of collecting data where

50:40

where the physical location of the

50:43

object is measured, because it can

50:46

be done even without satellite

50:49

information.

50:51

the the but the nature of of the data is

50:56

that the data sets are huge and it's

50:59

very natural to ask can we somehow

51:03

automate the process of

51:06

using the data, and

51:10

also, you mentioned the

51:13

important aspect here u even uh

51:17

statistics Finland is not really big

51:20

user of satellite observations

51:24

because we have access to other big data

51:27

sets. uh having this u uh the location

51:33

information included. Um

51:38

the nature of the satellite imagery

51:41

and the administrative data is

51:45

similar in exactly one uh aspect,

51:49

which is that we are not

51:52

able to very much affect how the data is

51:55

collected. We have to cope with the data

51:58

that someone else has produced. We don't

52:02

simply have the resources to uh

52:06

to to force other data collectors

52:10

collect exactly the information that we

52:14

need. We have to cope with the already

52:17

existing data. And this is my basically

52:21

first question related to to our speaker

52:26

today. Uh how much uh work

52:30

you describe is actually done inside a

52:34

statistical institution and and how much

52:37

outside uh in in other government agency

52:40

and secondly

52:42

how how this everything is financed

52:45

who is paying to whom I mean is this

52:48

data expensive

52:50

and if and is the accuracy of data uh

52:55

high enough. I've understood that the

52:59

kind of very um uh the data the the

53:05

where the granularity is high it's it's

53:09

free or almost uh um free very cheap but

53:14

the more information you want to get, the more

53:17

you have to pay uh uh have you have you

53:20

how you have handled this uh cost part

53:24

of the story Uh do you have a budget for

53:27

compensating this data or

53:31

is the data given to you free? And um

53:36

and and and finally definitely

53:39

everything is a question of what our

53:42

organizations what we are able to do. Uh

53:46

it's actually big uh uh process of

53:49

learning and changing organizations.

53:52

I believe that this consists of kind of

53:56

a right combination of planning and

53:59

anarchy. In an anarchy I I mean in a

54:03

sense that

54:05

every person is a kind of every every

54:08

person has a side of creativity

54:13

a side of kind of research attitude. uh

54:17

and uh and there are a lot of things

54:19

that you have to experiment.

54:21

But to get these uh um what

54:25

people invent and learn into production

54:30

needs very normal structures:

54:35

you have to set targets, you have to set

54:38

timetables, you have to define your

54:41

corporate policies and uh so it's a kind

54:44

of balancing these different aspects and

54:47

and basically I feel that my role is is

54:51

to

54:53

try to generate

54:55

um

54:57

environment and culture where people can

55:02

freely uh start to think

55:06

perhaps I can use uh uh the AI in in my

55:09

everyday work. Perhaps I can try to do

55:12

it some here or or or in some other

55:16

place. But at the same time

55:19

I feel that I I have to start to ask is

55:23

this in a production or are you just

55:26

dreaming about it? So it's a combination

55:29

of the uh the encouragement and uh

55:35

gradually going towards demanding

55:37

results. Thank you very much.

55:40

>> Okay. I have to say I'm all in favor of

55:42

a bit of anarchy. Um, and you

55:46

raised an interesting point around costs

55:48

and financing

55:50

>> particularly with lots of this data

55:52

being collected or generated by

55:55

private companies not all of it but a

55:56

lot of it is and how I think this is for

55:59

another discussion somewhere else at

56:00

another time how

56:05

what are the things we have to do to

56:06

make this data a public good but that

56:08

that's for another discussion there's a

56:10

bunch of things there that you both

56:11

raised and I'm just going to go back to

56:13

Seia for a second. Uh, Seiia, are you

56:15

there?

56:18

>> Yes, I'm here. You hear me?

56:19

>> Okay. Oh, there your camera's off again.

56:21

I can't see you.

56:22

>> Um, so you're also heavily Okay. You're

56:25

also heavily involved in training

56:27

through the university.

56:28

>> Just so I'm interested, what have you

56:30

been doing to ensure statisticians have

56:32

the skills and technical knowledge to

56:34

use uh, you know, these new forms of

56:36

data, uh, machine learning tools, uh,

56:40

geno tools. So what have you been doing

56:41

on the train? What have you been doing

56:42

on the training side because that's

56:44

actually quite important I think.

56:47

>> Okay. Thank you. Uh can I just respond

56:49

to Marcus and then respond of course to you

56:51

also, just a bit? Thank

56:53

you very much Marcus, also Maria Louisa.

56:56

So uh a good pleasure to have all the

56:59

comments from you and just quick respond

57:02

to what I think what is raised by Maria

57:05

and also by Marcus that uh how about how

57:07

it works right and how about the funding

57:09

or the team and the planning etc. For

57:11

example, the cost related to uh the data

57:14

sources indeed the data of big data uh

57:17

is not free. Some for example mobile

57:19

phone we have to pay in collaboration

57:21

with uh our mobile network operators but

57:24

for example for satellite we can obtain

57:26

it from Sentinel because it's free but

57:28

again it's in collaboration with our

57:30

research agency which actually they are

57:32

using they're more they're making the

57:35

satellite imageries to fit to our uh uh

57:38

needs and then uh whether it's only BPS

57:41

of course uh we in the beginning

57:43

actually the challenge is not only about

57:45

technical capacities also related to um

57:49

collaboration between uh ministries

57:51

for example ministry of planning,

57:53

ministry of finance I think related to

57:54

the budget, Marcus, I think you

57:56

mentioned about that the budget and also

57:59

how about we can use that ministry of

58:01

agriculture, ministry of research, etc.

58:03

So in the first years actually it

58:05

started like three years ago. In the

58:07

first years we we try to collaborate and

58:10

to make into into one uh single

58:13

um vision that we can use satellite

58:16

imagery as part of our national uh

58:19

office statistical uh official

58:21

statistics and then right now it's still

58:23

on the uh research but mix method

58:26

actually the use of uh earth observation

58:29

is part of the big data utilization road

58:31

map of BPS Statistics Indonesia and

58:33

hopefully the goal is to make it official this

58:36

year. That's actually why I'm

58:37

appointed to be the director to make

58:39

sure that uh I think it can be used as

58:41

official and then uh what you mentioned

58:44

uh again Osama related to uh the uh uh

58:48

uh training right of course we we

58:51

conduct uh uh luckily we have a uh

58:55

Politeknik Statistika where we have also

58:58

the secretariat of our uh regional hub

59:02

same like FG and also the same like what

59:04

Maria has in Brazil. We have a school

59:07

where we actually not only uh produce uh

59:11

young uh talented data scientists and

59:14

statisticians but also actually together

59:16

with BPS, together

59:19

with uh the training center of

59:21

BPS, we conduct regular training

59:23

hands-on workshop covering data

59:25

management machine learning and AI uh

59:28

topics including um uh gen AI,

59:31

responsible AI and etc. And uh this also

59:35

we organize a knowledge sharing program

59:39

in the data science community including

59:40

government institution academia and also

59:42

practitioner. This hopefully can

59:44

accelerate the collective learning and

59:46

promote good practice of uh AI and also

59:48

official statistics in Indonesia.

59:51

I think that's just to to respond to

59:52

your uh comment or thank you.

59:56

>> Right. Uh I think we've got time for one

59:58

or two questions that have come up. Uh I

60:01

think this one is going to Satia. Uh how

60:04

do you do the validation of ground truth

60:06

validation? So this is uh a question

60:09

from anonymous in the Q&A. Their second

60:12

question

60:24

maybe my connection. Let me check.

60:26

>> Okay. How would how do you do the

60:28

validation the ground truth validation?

60:39

Uh oh.

60:42

>> Uh I lost signal here. Can you repeat

60:45

some? I'm sorry that

60:47

>> maybe no

60:49

worries. So um with mixed methods you

60:52

often with these sorts of data

60:53

especially with earth observation data

60:55

there's some ground truth validation

60:57

that needs to be done. How have you been

60:59

doing that?

61:02

>> Oh yes. So we select several area in

61:06

different types of areas in Indonesia, like in

61:09

the coastal area in the uh

61:13

mountain areas to see the situation of

61:16

the ground check. So we send uh several

61:19

uh staff to uh some area to check out

61:23

whether our prediction actually

61:24

is uh true in the field. So

61:27

that's why we we observe that most of

61:29

the area actually have high accuracy but

61:32

some area maybe is not that uh good for

61:36

example area that um uh montaneous area

61:40

that are difficult to uh access, for

61:43

example that's uh area that have lower

61:46

accuracy but we send uh I think more

61:49

than almost 100 staff to go to the

61:52

field and then check the the prediction

61:55

and then we actually have to uh align

62:01

the time when the satellite is

62:04

going through the area and the time that

62:07

we have to go on check because sometimes

62:10

the window of checking should be the

62:12

same as when the satellite is moving over

62:14

that area and then the time when we are

62:16

going to the field, Osama.
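A ground-check tally of the kind described here could be computed as below. Everything in this sketch is hypothetical — the area names, phase labels, and the 80% flagging rule are illustrative stand-ins, not BPS's actual validation procedure:

```python
def ground_check_accuracy(checks, threshold=0.80):
    """Compare satellite-based phase predictions with field observations
    and flag areas whose agreement rate falls below the threshold.
    `checks` maps area -> list of (predicted_phase, observed_phase)."""
    report = {}
    for area, pairs in checks.items():
        hits = sum(1 for pred, obs in pairs if pred == obs)
        accuracy = hits / len(pairs)
        report[area] = {"accuracy": accuracy, "needs_tuning": accuracy < threshold}
    return report

report = ground_check_accuracy({
    "coastal":  [("harvest", "harvest"), ("veg1", "veg1"),
                 ("veg2", "veg2"), ("veg1", "veg1")],
    "mountain": [("harvest", "veg2"), ("veg1", "veg1"),
                 ("veg2", "harvest"), ("veg1", "veg1")],
})
```

Areas flagged `needs_tuning` would be the ones where, as described above, the model is fine-tuned further.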

62:19

>> okay um uh much as Eric is a friend I'm

62:22

going to leave his question because you

62:24

can always get in touch with SEIA

62:25

directly uh even via me. Uh I've got one

62:28

for Maria Louisa. Um we'll we'll pick up

62:31

Eric's question separately. Um

62:35

um I've got a question for Maria Louisa.

62:38

Um which is kind of similar. Um

62:42

again it's around you know what have you

62:43

been doing in terms of skill any

62:46

specific things that you think are

62:48

important in terms of building skills

62:50

for using you know this sort of big data

62:53

and you know using these uh techniques?

62:55

I know you've been doing quite a bit.

62:56

Anything that you think is particularly

62:58

important?

63:03

>> Yes, there are some aspects that are

63:06

important. H and when we think about the

63:10

big question of

63:14

under what conditions can AI derived

63:17

outputs be used responsibly as official

63:20

statistics. So we need to think about many

63:23

skills that must be developed in

63:26

the staff.

63:29

H when you you think that the

63:31

statistician maybe he's going to change

63:34

his role from being the one that's

63:39

going to make all of the analysis but to

63:42

be a supervisor

63:44

of the models. Uh so uh we have like a

63:50

more focus on the computational uh

63:54

aspects and we try to to to to give this

63:57

approach in our in our trainings

64:01

h and also to define some some reference

64:05

to work. For example, when the

64:08

statistician needs to validate against

64:12

the trusted reference data because the

64:16

AI derived estimates must be

64:19

always compared with survey data

64:21

and administrative sources, and this kind

64:26

of principles must guide the

64:28

production of official stat statistics

64:31

also related to to how to measure and

64:35

communicate uncertainty

64:38

uh because when we are thinking about AI

64:41

outputs, they must

64:45

include clear information about

64:48

their confidence and their limitations

64:51

and how we transfer the

64:54

statistical knowledge the theoretical

64:57

statistical knowledge to this kind of of

65:00

universe

65:01

um also

65:04

h I think there's a change in how we see

65:07

the models because as Satia mentioned

65:11

the assessment of potential bias

65:15

and the fitness for purpose. Uh and we

65:18

the statistician needs to see

65:21

that the models are not perfect.

65:25

They must be like sufficiently

65:28

reliable for the intended statistical

65:31

use. And uh I think a last point that

65:35

that must be at the core of the

65:39

training for a statistician: the

65:41

question about transparency and

65:44

reproducibility

65:46

because the methods must be documented

65:49

very clearly. H and

65:53

the staff need to be to be like

65:55

oversighting the model design the

65:58

validation implementation.

66:00

So like as the regional hub we

66:05

try to focus uh our support to

66:08

the countries in operationalizing these

66:11

principles as a central priority for the

66:14

new skills that that must be developed.

66:18

>> Thank you. Marcus, I have one last quick

66:21

question for you. In a world where you

66:23

allow a bit of anarchy, and new entrants

66:26

into NSOs are just, you know,

66:28

technically stronger and stronger all

66:29

the time, you know, for old people like

66:32

me whose technical skills aren't what

66:34

they used to be, but has to supervise

66:36

this work, what what's what's

66:39

the capability development people like

66:41

me need?

66:44

Oh well um perhaps the most important

66:48

thing is uh trying to listen what what

66:51

what people are saying. If you don't

66:54

understand what they are doing you may

66:57

ask and uh and and also it's helpful for

67:02

young younger people that you are trying

67:05

to force them to explain what they are

67:08

trying to do. It also helps them to

67:12

understand what what they are doing. But

67:15

also uh I I tried to emphasize that the

67:21

the leadership in a in a condition uh

67:25

what is uncertain because many aspects

67:27

of use of AI are such that we simply

67:31

don't know, no one knows what

67:35

can be done. Then the good policy is to

67:40

uh

67:41

let people at the same time try what they

67:45

want to try, uh let them create

67:49

their own methods and ideas and uh and

67:54

also let them just play with the uh

67:59

tools. We decided um to buy

68:05

um Copilot licenses for

68:09

40% of staff. So basically it was a

68:13

random number 40% and uh 40% of staff

68:17

they have now um a better Copilot

68:22

license and then they can start to play

68:24

with it but but so the the playing uh

68:30

it's a way to create ideas but quite

68:33

soon you have to ask people that can you

68:36

show me that what you have done is

68:39

actually beneficial for our

68:42

institution. So it's a combination of

68:44

free thinking and uh and gradually the

68:48

planning and discipline orders are

68:51

coming in um to affect the

68:56

playing ground. Thank you.

68:59

>> Thank you. Okay, Marcus Mar Louisa

69:01

Setia, thank you very much and we'll

69:03

move on to the next thing.

69:04

>> Thank you.

69:05

>> Okay, so uh

69:07

>> thank you very much. We're now going to

69:08

move on and hear about the governance

69:10

challenges of AI in statistical

69:13

organizations from the PETs task team.

69:16

Uh and Matjaz Jug from the PETs task

69:19

team and stats Netherlands is going to

69:21

do a presentation and uh Gary Dunnet

69:25

from Stats New Zealand and Frankie K

69:27

from CSO Ireland are going to be

69:29

discussing. So uh Matjaz, over to you.

69:33

>> Yeah. Uh thank you Osama. Uh yeah just

69:36

uh sharing the screen. Oh it's already

69:39

there. Yeah sorry. Uh yeah uh yeah thank

69:42

you for the invitation to present the

69:44

work of our task team. So uh the privacy

69:46

enhancing technology task team has been

69:48

um active since 2018 investigating the

69:51

role of PETs uh in official

69:53

statistics and since 2022

69:56

uh we also have UN PET Lab which is kind

69:59

of a community of practice also doing

70:02

experimentation practical

70:04

experimentation. So first of all I would

70:05

like to thank um yeah all colleagues

70:07

from the task team and UN PET Lab for

70:09

providing input for this presentation.

70:11

Uh next slide. Uh so uh

70:18

uh let's start with what are PETs. So

70:21

there are a lot of definitions floating

70:23

around but uh I like this one uh because

70:26

it's very simple. Privacy enhancing

70:28

technologies are a suite of tools that

70:30

can help maximize the use of data by

70:32

reducing the risks inherent to data use.

70:35

Uh so which captures that PETs really

70:42

focus on the data in use, as

70:42

opposed to you know traditional

70:44

encryption uh which focus on other parts

70:47

uh uh of the uh of the cycle and uh

70:51

basically reducing the the privacy and

70:54

confidentiality risks. And if next one

70:58

um so in that uh respect some people

71:02

even talk about not just uh privacy

71:06

enhancing technologies but partnership

71:07

enhancing technologies because they also

71:10

um improve uh trust and then uh in many

71:14

use cases we need to rely on the other

71:18

partners like a data providers like a

71:20

researchers uh or tech companies and and

71:24

pets can help there. Uh and then finally

71:26

if you click again uh uh they support

71:29

principles of privacy by design, purpose

71:31

limitation, data minimization, all those

71:33

principles that come with the privacy

71:34

regulations such as GDPR. Next slide.

71:39

Uh so uh going now to the AI uh uh let's

71:42

look at the definition of AI model. Uh

71:44

so u according to Google the AI model is

71:47

a computer program or algorithm. uh and

71:50

this is a bit misleading because it's

71:52

not the kind of algorithm or program

71:55

that traditionally we were used

71:58

to uh because it can learn patterns and

72:01

relationships in the data which means

72:03

that um uh uh next one uh AI model

72:08

actually is data or can contain the data

72:11

that can be sensitive, if it was

72:13

trained on sensitive data. So we need to

72:15

treat AI models differently than we had

72:18

been treating the regular methods in

72:21

the past. Next

72:24

uh so that means uh there are a lot of

72:27

uh additional disclosure risks that come

72:29

with the AI and AI pipelines. So I would

72:31

just mention very quickly a couple of

72:33

them. Uh so of course if we want to

72:36

train uh the model uh on sensitive data

72:40

that often includes some risks with

72:43

leakage uh of sensitive information then

72:46

the model itself can leak; similarly,

72:48

our outputs can leak the

72:50

data if they're not protected. Also,

72:52

AI models

72:56

enable more complex attacks

73:01

uh prompt injection, like we were used

73:03

to SQL injection. Now we have

73:05

prompt injection in LLMs. Uh and even uh

73:09

yeah it is possible in some

73:11

circumstances to even reverse engineer

73:13

the whole model or data can leak in the

73:16

logs. Next

73:20

uh so uh going now to PETs, so here uh

73:24

I will refer to our uh definition or

73:27

classification of PETs that we developed

73:29

as part of the UN task team. Uh in input

73:33

privacy pets uh broadly there are

73:35

methods, cryptographic and other

73:37

methods that provide some guarantees uh

73:40

that uh one or more parties can

73:42

participate in computation with

73:43

assurance that other parties will not

73:45

learn anything about the sensitive data

73:47

of that party and on the other side the

73:50

output privacy is the guarantee that

73:51

sensitive information cannot be reverse

73:53

engineered. Uh so in examples in input

73:56

privacy are of course uh encryption like

73:59

you can work on fully encrypted or

74:00

partially encrypted data or you can

74:02

split the data into the parts

74:05

that are not readable on their own but

74:07

you can compute with them in case of uh

74:09

secure multi-party computation or in

74:12

output privacy examples uh we can have

74:14

statistical disclosure control like

74:16

k-anonymization but also new methods like

74:18

differential privacy. Next
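To make this input/output split concrete, here is a toy sketch (illustrative only, not the task team's code; the values and party counts are made up): additive secret sharing stands in for the secure multi-party computation mentioned on the input side, and the Laplace mechanism stands in for differential privacy on the output side.

```python
import random

PRIME = 2_147_483_647  # modulus for share arithmetic (any large prime works)

def share(value, n_parties=3):
    """Input privacy: split a value into random additive shares mod PRIME.
    No single share reveals anything about the value on its own."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three parties each secret-share a value; each "server" sums one column of
# shares, and only the final total is ever reconstructed.
values = [50_000, 62_000, 48_000]
columns = zip(*[share(v) for v in values])
partial_sums = [sum(col) % PRIME for col in columns]
total = reconstruct(partial_sums)  # equals sum(values) = 160_000

def laplace_release(true_value, epsilon=1.0, sensitivity=1.0):
    """Output privacy: epsilon-differentially-private release of a statistic.
    A difference of two exponentials gives Laplace(0, sensitivity/epsilon) noise."""
    rate = epsilon / sensitivity
    return true_value + random.expovariate(rate) - random.expovariate(rate)
```

The sum-of-shares property is exact, so the aggregate is unchanged even though no party's raw value ever travels; the noisy release trades a small, tunable amount of accuracy for a formal disclosure guarantee.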

74:21

uh so now let's go to the examples. Uh

74:24

so uh I will start with confidential AI.

74:26

This is based on secure enclaves and

74:28

a secure enclave is an approach that actually

74:31

was used uh in production in Indonesia

74:35

and Setia mentioned already that use

74:37

case uh for protecting mobile phone data

74:39

but of course can be also used to

74:41

protect not just data but also the

74:43

models as well as the the user

74:44

interaction with the model. Uh and this

74:47

approach can be extended u into the

74:49

confidential federated retrieval-augmented

74:54

generation systems,

74:57

which are more complex use cases

75:00

because they also protect the

75:01

knowledge bases

75:04

so that information cannot leak from

75:07

there. Uh so that's kind of one more

75:09

centralized example and next uh example

75:12

is an example of a more distributed

75:15

scenario. So in that case within the UN

75:18

pet lab we have a project that started

75:20

back in 2022

75:22

uh with the uh idea that uh if NSOs need

75:27

to share and collectively train the

75:29

model on their own data without uh uh

75:33

wanting to share the training

75:35

data of course uh then you can um

75:38

organize um a protocol with federated

75:41

learning and and have iterative uh uh

75:44

model training uh and then uploading the

75:46

weights that are then aggregated into

75:49

the new version of the model. Uh now

75:51

federated learning on its own doesn't

75:54

provide uh complete privacy guarantees.

75:57

So in that case the team also

76:00

experimented with differential privacy

76:02

and homomorphic encryption which are

76:04

methods to do this part of

76:06

protection. Uh and the results

76:10

were published in a

76:12

paper. So uh next uh
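The protocol described here — each NSO trains locally, shares only its model weights (optionally with noise added), and a coordinator averages them — can be sketched roughly as follows. This is a toy one-parameter least-squares model, not the published UN PET Lab experiment; `local_update` and the party data are invented for illustration.

```python
import random

def local_update(w, data, lr=0.1):
    """One gradient step of least squares for a toy model y ~ w * x,
    run by each party on data that never leaves its premises."""
    grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, parties, epsilon=None):
    """Each party trains locally and shares only its updated weight,
    optionally adding Laplace noise for differential privacy; the
    coordinator sees weights, never training data."""
    updates = []
    for data in parties:
        w = local_update(global_w, data)
        if epsilon is not None:
            w += random.expovariate(epsilon) - random.expovariate(epsilon)
        updates.append(w)
    return sum(updates) / len(updates)  # federated averaging

# three "NSOs" holding private (x, y) observations of roughly y = 2x
parties = [[(1.0, 2.1), (2.0, 3.9)], [(1.5, 3.1)], [(3.0, 6.2)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, parties)  # without noise, w settles near 2
```

In the real setting the noise calibration (and optionally homomorphic encryption of the weight updates) is what supplies the formal guarantee that federated learning alone lacks.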

76:17

uh

76:17

>> you got about a minute left.

76:19

>> Yeah. With conclusions. So uh by

76:24

mitigating these risks of data

76:27

exposures that I explained, PETs can

76:30

act as trust technologies. Um so allow

76:34

AI to be used responsibly and safely.

76:36

And examples of that could be that we

76:39

can minimize the data that is

76:44

needed from the input side or

76:47

make sure that it's confidential.

76:49

The PETs can also

76:51

support purpose enforcement. For

76:53

example, scripts in secure enclaves have

76:55

to be pre-agreed which means that they

76:57

cannot be used for other purposes than

76:58

what was already agreed. Uh then uh I

77:01

mentioned protecting user data, prompts,

77:03

and model weights. uh enable co-creation

77:06

and sharing of AI models uh and also um

77:09

more automated disclosure control. So

77:13

basically increasing the trust between

77:15

the collaborating parties
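The purpose-enforcement point above — pre-agreed scripts in secure enclaves that cannot be used for anything else — can be sketched as an allow-list of script hashes. This is a toy illustration only: a real enclave relies on hardware attestation, not a plain `exec`, and the script and data here are invented.

```python
import hashlib

APPROVED = set()

def approve(script: str):
    """Both parties review and pre-agree a script; only its hash is stored."""
    APPROVED.add(hashlib.sha256(script.encode()).hexdigest())

def run_in_enclave(script: str, data):
    """The enclave refuses to run any script that was not pre-agreed,
    so the data cannot be used for purposes beyond what was agreed."""
    if hashlib.sha256(script.encode()).hexdigest() not in APPROVED:
        raise PermissionError("script not pre-agreed for this purpose")
    env = {"data": data, "result": None}
    exec(script, env)  # in a real enclave this would be attested code
    return env["result"]

agreed = "result = sum(data) / len(data)"  # pre-agreed: compute only the mean
approve(agreed)
mean = run_in_enclave(agreed, [4, 6, 8])   # allowed, returns 6.0
```

Any unapproved script — say one that tries to copy the raw records out — is rejected before it touches the data, which is the purpose-limitation guarantee in miniature.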

77:19

and finally uh the next slide

77:24

uh I would just like to to put to leave

77:26

this on uh so in the task team uh we are

77:29

collecting and disseminating case studies

77:31

from projects that are introducing pets

77:34

and in the new update of the PET Guide we

77:36

will particularly focus on the data

77:38

governance aspects of pets uh and uh

77:42

that includes the role of pets in AI and

77:45

um within the work streams we will have

77:48

uh we already have practical

77:50

experimentation that's supported by the

77:52

UN Global Platform uh and we plan to

77:54

have the next open house in April uh so we

77:57

will provide an invitation to that day.

78:00

Thank you very much and uh look forward

78:02

to questions

78:03

>> Brilliant, thank you. Okay, uh Gary, Frankie

78:09

>> are you

78:11

>> hi Okay.

78:12

>> Okay. So, so far I can see you, hear

78:14

you, but can't see you yet.

78:17

>> Okay. Um, I can see me.

78:23

Can you see me yet? I think it's just

78:25

taking a little time to get from Cork.

78:28

>> Yep.

78:28

>> Okay.

78:31

>> Matias, can you see them?

78:33

>> Ah, okay. That's interesting. Okay.

78:37

>> Uh, all right. Well, let's go anyway

78:38

because you know what? Frankie and I are

78:39

very good friends. cuz I know what she

78:40

looks like. Um,

78:43

okay. Uh, and Gary's also a very good

78:45

friend, so I know what he looks like,

78:46

even though I can't see either of them

78:48

right now. I don't know what's going on.

78:49

Okay. Uh,

78:52

uh, everyone can see Frankie apart from

78:54

me. Okay. Um, I don't know. Uh, Frankie,

78:58

look, uh, sort of Matias has sort of

79:01

talked about, uh, AI models, uh, and

79:04

some of the risks and, and they are able

79:06

to parse large-scale data very quickly

79:10

and repeatedly and in variations uh and you

79:14

know this raises issues around

79:16

confidentiality and disclosability

79:19

at a senior leadership level how do you

79:21

get an organization to address such

79:23

issues again I think you know with these

79:25

emerging technologies right I think

79:28

we're quite used to addressing these

79:30

issues with the data we have used and

79:33

the statistical methods we have used but

79:34

you know the world's changing so how at

79:36

a senior leadership level do you get an

79:38

organization to address such issues.

79:41

>> Uh yeah, thank you and an absolutely

79:43

fascinating um presentation from Matias.

79:46

Uh I certainly uh learned a lot from it.

79:49

So thank you very much. Um and obviously

79:52

Matias has kind of made some

79:53

really interesting points um in terms of

79:56

you know is there a real potentially a

79:58

real shift in our relationships with um

80:02

with stakeholders and and a little bit

80:04

although he was talking about it was

80:05

about building trust but it's almost

80:07

saying actually don't trust anybody um

80:10

and including and maybe particularly um

80:12

our tech providers um and somebody

80:16

coming from the technology side um you

80:18

know been thinking a lot about sort of

80:20

cyber attacks and of course zero trust

80:21

is is one of the basis for that and

80:23

therefore you know maybe moving it into

80:26

other areas in terms of relationships is

80:28

is perhaps you know a sensible um

80:31

absolutely sensible thing uh to do um

80:34

but I guess one of my reflections on

80:36

what Matias was talking about is um

80:38

those sorts of technology and techniques

80:40

um you know are really interesting but

80:42

they're quite expensive um and they can

80:45

be quite difficult to implement um you

80:48

know using scarce resources

80:50

Um and you know lots of NSOs are facing

80:53

funding challenges. So um whilst

80:56

technology absolutely plays a part in

80:57

this I think it's part of the approach

81:00

to the challenges that you were

81:01

mentioning. Um and I suppose I'm kind of

81:04

think it's interesting what you say. Has

81:05

it changed a lot? Is this a mind shift

81:07

for our staff in terms of how they

81:09

interact with data and these models? But

81:11

I suppose to me we've always had to

81:14

protect our data. So how much is it

81:17

really a change at a high level for

81:19

NSOs? Um so in a way many of the issues

81:23

around confidentiality and

81:24

disclosability are not necessarily that

81:27

different from the challenges that we've

81:29

always faced in using data but of course

81:31

the scale is completely different and

81:33

therefore the approaches that Matias was

81:35

talking about are absolutely necessary.

81:38

Um and I was very interested in what

81:40

Renan was talking about um in the first

81:42

presentation you know about you know

81:44

thinking about some of the um less

81:46

technology based or centric processes

81:48

such as making sure that we're

81:50

benchmarking we're retraining to avoid

81:52

model drift etc.

81:54

>> Um

81:55

>> so I suppose my point is that it can't

81:58

we can't just rely on technology we need

82:00

to have a holistic approach to governing

82:03

AI. Um and we will have some of those

82:06

aspects already in place. I think

82:07

somebody else also mentioned the codes

82:09

of practice that we have such as the

82:11

UN's Fundamental Principles of Official Statistics and um

82:14

the EU's ES CoP. So they're still really

82:16

relevant in terms of providing

82:18

foundations and how we govern the use of

82:21

of GenAI within statistics. We've still

82:23

got to maintain trust with our stakeholders.

82:26

Um and in particular, you know, in this

82:28

area, it's about maintaining the trust

82:30

of people and the companies that are

82:31

providing their data to us. um whether

82:34

that's directly through surveys or in

82:35

this case obviously indirectly through

82:38

um the training data that's in these

82:40

models. Um so we've got to be

82:42

transparent about what data we collect u

82:44

and what we're going to do with it. We

82:46

need to continue to make sure it's

82:48

representative and

82:50

without bias. Um and you know those

82:52

challenges are equally applicable to all

82:55

of the data we use as well as genai

82:57

models. And somebody also mentioned data

83:00

is at the heart of this: data quality. um

83:03

uh and again it does become more

83:05

challenging in the world of LLMs but we

83:07

do have the skills to review and analyze

83:10

it and again there are similarities that

83:12

we can learn from admin or big data um

83:15

I'm leading an EU project which

83:18

is around AI machine learning in in

83:20

official statistics and one of the work

83:23

packages is providing an amended quality

83:25

model for machine learning for example

83:27

which is based around the total survey

83:29

error model I think we're going to

83:31

continue to need humans in the loop

83:33

or on the loop and that's going to be a

83:35

really important part of our quality

83:37

assurance um going forward for quite

83:39

some time. We need to be continue to

83:42

think about ethics um you know just

83:44

because we can doesn't mean we should

83:46

you know and it reinforces that need

83:48

about being transparent about what we're

83:50

doing and we have to be thinking you

83:52

know in our organizations what are the

83:53

legal frameworks in which we're

83:55

operating so in the EU we have um the AI

83:58

act, but that has to be read alongside

84:01

GDPR and the Data Governance Act and I

84:04

suppose some NSIs are also taking

84:06

on a role of data uh stewardship and

84:09

working with organizations that create

84:11

the data that are used to perhaps train

84:13

these models. Um, but it's extremely

84:17

difficult to do that once you get beyond

84:19

the public sector. Um, and this

84:21

actually might mean us trying to

84:24

expand into engaging with tech companies

84:27

to try and help make sure, you know,

84:29

we're keeping our data safe. We're

84:30

protecting privacy and confidentiality

84:33

whilst making sure that the training

84:36

data and the answers these models come

84:37

up with, you know, are at the quality

84:39

and the standard and the citations that

84:41

we expect um within the world of

84:43

official statistics. I'll be honest,

84:45

it's quite a sensitive topic at the

84:47

moment as countries are looking to

84:51

become

84:53

less dependent on the current

84:54

large tech companies. There's a lot of

84:57

discussions around data sovereignty,

85:00

data

85:02

repatriation

85:02

and that you know feeds into these

85:04

conversations around you know governance

85:06

over AI. Um and to be honest NSIs I

85:09

don't think are particularly good

85:11

with this sort of engagement and

85:12

therefore working as a community for

85:14

example the UNECE project on AI

85:17

readiness the work that the OECD have

85:19

been leading, the World Bank has been doing

85:21

work, so has the IMF, so really working

85:23

together as a community I think is

85:25

particularly important um but you know

85:28

many of those aspects I've talked about

85:30

are based on being able to trust those

85:32

relationships trust the providers of the

85:34

data the users of our data and the and

85:37

suppliers um of tech. So we need to

85:40

continue to have skills in in building

85:42

relationships.

85:43

Um but if we're moving more towards

85:46

that, you know, don't trust people, then

85:49

probably a lot of those um techniques

85:51

that Matias was talking about are likely

85:53

to play a part. Um so I think we're

85:56

going to need to continue within our

85:58

organizations to think about the the

86:00

governance, the processes, the people,

86:03

those sorts of skills as well as the

86:04

technology. Um so as I said in summary

86:08

we're kind of asking our organizations

86:09

to do much the same in a way to high

86:12

level we have to have strong governance

86:13

we and and ethics to protect the data we

86:16

use uh and explain all of that to to our

86:19

stakeholders but technically how we go

86:21

about that is clearly changing and

86:23

potentially the way that we interact

86:25

with suppliers and users of our data is

86:27

also changing. So, you know, maybe it's

86:30

about trusting no one to start with and

86:33

building those relationships and

86:34

therefore that's what as organizations

86:36

we need to prepare for.

86:38

>> Thank you. So, Gary, I actually agree

86:41

with Frankie that at a high level

86:43

there's nothing new here. We've always

86:45

had the concerns about confidentiality,

86:47

data security and things like that. But

86:50

as these new big data

86:53

sets start being used, you know, new

86:55

tools, you know, GenAI tools and things

86:57

like that, there's other technologies

86:58

like pets that we can use to help us.

87:01

Um, from your methodological background,

87:04

how do we actually gain a better

87:06

understanding of these technologies and

87:08

tools so that we have good quality

87:10

assurance of our outputs and can explain

87:13

them?

87:16

>> Yeah. So it's an interesting interesting

87:19

challenge and that that it is um in some

87:22

ways it is developing you know new as as

87:24

we're going with new new methods and

87:26

that but but is also um as Frankie

87:30

touched on a little bit that there is

87:32

also the existing that we've been using

87:34

models for for some time um you know and

87:37

and you know imputation is a model

87:40

um and you know we've been using

87:42

different levels of sophistication over

87:45

the years but but it is taking it to a

87:47

whole new level. Um I think it was

87:49

interesting though as well is that

87:51

Ashwell uh when he when he um gave his

87:55

introductory comments, you know, we've

87:57

heard a lot about trust and and I think

87:58

there's there's a really key key aspect

88:00

that we've got to keep there. But I

88:04

suppose um when I'm sitting down and and

88:08

sort of, you know, thinking about from a

88:11

methodological point of view, you know,

88:13

how we how we release this, I sort of it

88:16

goes back to,

88:18

you know, can the result be explained?

88:21

You know, you use a thing

88:24

called the sniff test. You know, does it

88:26

smell right? It's it's you know and and

88:29

there's a whole you know

88:32

you getting the methodologists to

88:34

actually run some simpler models and see

88:36

how they align you know the calibration

88:38

and um making sure that you know that

88:42

the the data the results and the context

88:45

are are accurate. You know we've we've

88:48

when you do research you know you have a

88:50

number of processes that you follow

88:52

through you know follow you know what

88:54

are your validation objectives you know

88:56

how are you going to validate the data

88:58

sets you know what are the appropriate

89:01

um you know sort of measures and and um

89:04

detection you know how are you going to

89:06

do validation experiments and they still

89:09

hold in in today in in you know the use

89:11

of AI. So you know I encourage the my my

89:14

methodologists to just apply their

89:16

normal research techniques into these

89:19

these methods and of course we've got

89:22

aspects around bias you know um

89:27

we can start using more what if analysis

89:29

simulation models different flavors of

89:31

it like you know prompt testing

89:34

um you know thinking about okay um can

89:38

we alter the the the um the test data

89:41

set to see you know what what might

89:43

happen um think about scalability what

89:47

will happen there um and then of course

89:50

you know um you know Frankie touched on

89:52

the fact that that you know ethics and

89:54

fairness have to be considered as well

89:56

and I think that's a growing uh area

89:59

that methodologists perhaps haven't been

90:02

thinking a little bit back you know

90:03

about in the past and I think we got to

90:06

become a lot stronger in that and and we

90:07

think about data sovereignty um so It's

90:11

it's interesting because I think um some

90:15

of our existing quality frameworks

90:19

are still highly relevant. We just have

90:22

to apply them in a different lens. The

90:24

final few things I do want to talk about

90:27

though as well is that um

90:30

we've really got to when I'm talking you

90:33

know testing um when we're going to

90:35

introduce new models and that you know I

90:38

really do focus on how are we going to

90:39

monitor the model um and you know for

90:42

ongoing monitoring and what are

90:45

the parameters or

90:48

expectations that we would expect to see

90:51

and what happens if it goes a little bit

90:53

astray. Okay. And then the final thing

90:55

is is becoming a whole lot more

90:57

transparent um about the use of models

91:00

and what we're doing um and and as part

91:04

of you know the work that we when we

91:06

publish I am really pushing hard for um

91:11

you know if we do analysis using simpler

91:13

models to actually publish those as well

91:16

so that um end users can actually

91:20

look at our simpler models and think oh

91:22

actually can we use or how can we apply

91:25

that as part of our regular

91:28

monitoring of the final data um sort of

91:30

thing. So yep.
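The ongoing-monitoring point Gary makes — agree expectations up front and flag when the model drifts outside them — is often implemented with a statistic such as the Population Stability Index. A minimal sketch follows; the 0.1 and 0.25 thresholds are common rules of thumb rather than any standard, and the score data is invented.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (at deployment)
    and a new sample of the same model output."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]  # from baseline

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x > e for e in edges)] += 1
        return [max(c / len(xs), 1e-6) for c in counts]  # avoid log(0)

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]         # scores when model went live
stable   = [i / 100 + 0.001 for i in range(100)] # essentially unchanged
drifted  = [i / 200 + 0.5 for i in range(100)]   # distribution has shifted
```

A PSI under roughly 0.1 suggests the population is stable; a value above roughly 0.25 would be the "goes a little bit astray" trigger for the review or retraining step described above.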

91:33

>> Thank you. Uh Matias I have a question

91:36

for you. Um so you talked about all this

91:38

in the context of AI models but actually

91:41

Frankie has picked up on something. Uh

91:44

data stewardship of sort of other data

91:48

not just data we collect or generate

91:50

ourselves is becoming more and more

91:51

important and I think you know there

91:53

there's a possibility where we'll see

91:55

NSOs

91:57

uh thinking about becoming national data

91:59

offices or data platform organizations

92:01

and we may move to what my good friend

92:04

Geoff Bowlby of StatCan refers to as a

92:06

wholesale model where really what we're

92:07

doing is making data available. Whereas

92:09

our current model is very much a retail

92:12

model where we produce the product, you

92:13

know, the statistics. In a world where

92:16

we sort of move to

92:20

more wholesale data provision, do PETs

92:23

become more important then and why? And

92:25

if so, why?

92:28

Yeah, I think it's a very good uh uh

92:31

question because, uh, first thing about PETs:

92:35

we shouldn't see them as a kind of

92:37

silver bullet for all problems with the

92:40

data protection and privacy protection.

92:42

We need to look at them in the context

92:44

of also organizational measures because

92:48

you need to put measures in place and

92:49

some some are better to be technical or

92:52

technological measures and some are

92:53

better to be organizational measures.

92:56

But in this context of the data

92:58

stewardship role of the NSO,

93:01

what we can expect is that the data

93:05

governance arrangements in those,

93:09

whatever they are, hubs or data

93:12

spaces or these new data

93:16

ecosystems, will vary depending on

93:19

the purpose of use of data and uh

93:22

partners in the data ecosystem And I

93:27

think that pets offer you here some

93:29

possibility to balance the

93:34

kind of risk with the usability

93:40

and and more fine-tune it to the

93:43

particular

93:44

use case or particular type of

93:48

data arrangement and I think uh

93:52

that NSOs that have the ambition to

93:55

become um data stewards should invest in

93:57

skills in pets because it's not that we

94:00

would need to you know develop these

94:02

cryptographic methods ourselves they

94:04

they they are already developed by other

94:06

organizations that are much better

94:07

suited for that but we need to know how

94:09

to apply them and and and when and in

94:12

which uh cases and and that will require

94:15

skills and I think these skills is

94:16

similar to other situation when we need

94:19

to also have a good skills how to manage

94:21

metadata and and data standards and

94:23

interoperability if we want to be in

94:26

this position of data stewards uh in

94:29

broader role of NSO.

94:31

>> Okay, brilliant. Osama, can I

94:35

give a little bit of an example

94:38

where we've we've had some experience

94:40

and it's pretty simple example and and

94:42

I'm sure many agencies have done it but

94:46

we we were bringing together um telco

94:49

data and and working with uh uh the

94:53

three suppliers of of telecom services

94:56

in in New Zealand and um we wanted to

95:00

bring all the all the data together, but

95:04

they were obviously very cautious about

95:07

giving us data because if we slipped up

95:10

and published

95:11

data, their opposition would see, you

95:14

know, their their reach and what have

95:16

you.

95:17

So, so the way we approached that was

95:19

through building a partnership with each

95:21

of the three telcos and actually getting

95:25

them to confidentialize the data and

95:28

then shipping the confidential data

95:30

through to us and that gave them a very

95:33

strong

95:35

uh you know there was no reason you know

95:37

because even if we did slip up

95:40

there would be no confidential data to

95:42

give out because it wasn't confidential

95:44

in the first place. Now, what was

95:46

interesting there was that the telcos

95:48

themselves actually weren't that strong

95:51

in confidentializing data because, you

95:53

know, they're all about billing and

95:56

making sure the connections work. And so

95:59

we actually ended up implanting some of

96:02

our staff into their data systems to

96:05

create the data sets that we were

96:08

wanting and then you know and and

96:11

demonstrating and talking them through

96:12

the confidential methodologies.

96:15

Now very simple example but but it

96:18

actually talks about strengths, talks

96:21

about data governance, it talks about moving

96:23

beyond our boundaries. Um, and I think

96:26

that that's quite a, you know, a

96:28

powerful model to to think about as we

96:30

go forward. Um, so yeah, just an example

96:34

I wanted to share.
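The confidentialize-at-source approach Gary describes can be as simple as threshold suppression applied before any data leaves the provider. The sketch below is a toy illustration with invented area names, counts, and threshold, not Stats NZ's actual method.

```python
def confidentialize(cells, threshold=10):
    """Primary suppression: blank any cell whose count falls below the
    threshold, so small groups cannot be singled out. Run at the data
    provider, so only non-sensitive cells ever reach the statistics office."""
    return {area: (count if count >= threshold else None)
            for area, count in cells.items()}

# hypothetical counts of active connections per small area from one telco
raw = {"area_A": 1250, "area_B": 7, "area_C": 430, "area_D": 3}
safe = confidentialize(raw)
# safe == {"area_A": 1250, "area_B": None, "area_C": 430, "area_D": None}
```

Because the suppression happens inside the telco's systems, a later publication mistake at the NSO cannot disclose anything sensitive: the sensitive cells never made the trip.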

96:35

>> Just for what it's worth, I've been

96:37

thinking for a while with our work. A

96:39

partnership model with private sector

96:42

providers where you work with them, I

96:43

think has worked much better for us than

96:45

a commercial model where we've tried to

96:46

procure data, you know, from some uh,

96:49

commercial provider. Okay. Uh, look, uh, I've not

96:51

been the greatest moderator because

96:52

we're a bit behind time. Matias, there is

96:55

a question or two for you mainly about,

96:57

you know, joining pets and I think

96:59

something about how you get skills. Can

97:01

I leave that to you to maybe pick up

97:03

separately uh as I move on to uh the

97:06

next section? Uh so for the next section

97:09

is basically uh so uh we have Rohit

97:12

Badwaj from the national statistics office of

97:15

India although I suspect he's still very

97:17

busy right now with the uh AI summit

97:19

that's going on in India and he's going

97:21

to talk about the road map from moving

97:24

from ad hoc pilots to a mature

97:27

institutionalized data science operation

97:29

within an NSO. Rohit, are you here?

97:32

>> Yes I'm there. Maybe the camera is

97:35

taking some time for me to come up on

97:37

the screen but I can start you know in

97:39

in the interest of time.

97:42

>> Okay, go for it.

97:44

>> Okay, thank you. Thank you Osama. So yes

97:47

uh there's an exciting time in India

97:49

with AI summit happening really

97:50

exciting. So I I'm just going to present

97:53

what we have been talking about. We have

97:54

been talking about uh uh you know all

97:56

the pilots, all the work everybody is

97:58

doing but the ultimate goal is to make

98:00

it happen and bring

98:02

everything to production. So can we

98:03

go to the next slide please?

98:06

Yes. So this is my uh this is how I'm

98:09

trying to present this. I I'm going to

98:11

talk about how we did it. I mean how we

98:14

how we created a collaborative

98:15

framework, how we you know strategically

98:18

pivot from passive to proactive and

98:20

what was what has been our governance

98:21

model and how how has been the journey

98:23

so far. I'll not say the journey is

98:25

complete. The journey still has some time to

98:27

go. But yes, I'll try to

98:29

put up a presentation for that.

98:32

And then I am also going to talk about

98:34

uh the impact as you know the AI summit

98:36

is also about the impact. So what type

98:38

of impact does it have on our ecosystem

98:40

that's what I'm going to talk about it

98:42

in terms of use cases and other KPIs you

98:45

know. So next next slide please.

98:50

Yes. So traditional barriers I'm not

98:52

going to delve much into it. We all know

98:54

about it. That's one important point

98:56

that we feel that it's time that

98:57

everyone starts using AI so that the

99:00

capacity gap between those who are using it

99:02

and those who are not using it does not

99:04

become so wide that it becomes

99:06

impossible to fill. So our vision is to

99:08

leverage collaboration. So

99:10

collaboration is key here. Leverage

99:12

collaboration to accelerate AI adoption

99:14

for quality data and statistics for our

99:16

national development. We call it

99:18

Viksit Bharat, by 2047, when we'll

99:21

have the 100 years of our independence.

99:23

Next please.

99:26

So this has been the approach,

99:31

uh, you know,

99:35

the approach to data

99:36

innovation. First is identify use cases.

99:39

If you don't have the use cases nobody

99:41

is going to believe that anything is

99:43

possible using AI in official

99:44

statistics. Do some in-house POCs to

99:47

find out what is possible and what is

99:49

not. Do research about the best

99:51

practices, the feasibility, how the

99:53

world is taking up various things which

99:55

you plan to undertake. Those are the

99:57

very important points. Then collaborate

99:59

and this is key to our whole effort of

100:02

innovation: actively collaborate with

100:04

academic institutions, multilateral

100:05

agencies, and non-governmental partners.

100:07

And I'll go into detail as I move

100:09

forward. And last but not least,

100:12

document each step: document

100:15

the AI use cases, document the AI

100:17

readiness framework. We have basically

100:20

documented both of these things; we

100:22

have documented use cases, the

100:24

readiness framework is also there, and of

100:26

course the working papers and everything

100:27

else can follow. So our basic approach

100:30

is experiment, scale, and then govern.

100:33

Next, please.

100:36

So I'll talk a little bit about

100:37

the collaborative framework which we

100:39

have adopted during our

100:41

innovation journey. We call

100:43

it a triple helix model. It's a

100:45

well-known model where government,

100:47

academia, and industry, including

100:49

startups, all come together

100:51

and work to achieve a goal.

100:54

Additionally, we also brought in

100:56

researchers and students by way

100:59

of hackathons and other outreach

101:02

activities, which I'll talk about as I

101:04

move forward. So collaboration is the

101:06

cornerstone of whatever

101:08

success we have achieved so far. Next,

101:10

please.

101:13

So let me just discuss some of the

101:15

innovation journey we

101:17

undertook. We started

101:20

sometime in 2022, and there were a lot of

101:22

delays in publishing the guidelines. The

101:25

ownership of the whole

101:26

innovation activity was not clear.

101:29

There were not many use cases

101:32

on which the business units agreed,

101:34

and whenever we approached

101:36

anyone for collaboration, they asked us,

101:37

"What is in it for us?" All

101:40

these questions slowed us down, and

101:42

for one and a half years we

101:44

remained in a passive state. Then we

101:46

decided to become strategically

101:48

more proactive. Our senior

101:51

management reached out to different

101:52

partners and convinced them to become

101:54

collaborators in our journey of

101:57

innovation. We started working on

101:59

transparent criteria for partnerships and

102:02

for awarding projects, and

102:05

lastly, of course, direct

102:08

academia-industry partnerships. This is

102:10

very important, because not everything

102:12

can be done by the statistical office

102:14

itself. It has to have partners who are

102:16

well-versed with technology and can

102:18

do it in a much better manner. Next,

102:20

please.

102:22

So these are some of the elements.

102:24

The governance model is laid out here:

102:27

we have a council headed

102:29

by the head of the NSO, who is the

102:30

secretary of our ministry, along with

102:33

external experts, transparent

102:36

project selection criteria,

102:38

guidelines based on trustworthy AI, and

102:41

regular review. Adaptive governance

102:43

is key to all this. Some of the best

102:44

practices, which we

102:46

felt over a period of time are very

102:48

important: AI champions need to be

102:50

identified and

102:52

encouraged. Be very proactive

102:53

in building partnerships.

102:56

Prioritize

102:58

knowledge sharing. Attract

103:01

specialized AI talent. Secure dedicated

103:04

funding. These are some of the factors

103:05

which, as we moved along

103:07

our journey, played a very crucial

103:09

role in getting our projects

103:13

started and keeping them sustainable. We

103:16

followed the EPIC framework, again a

103:17

well-established framework of education,

103:19

partnership, infrastructure, and

103:20

commercialization, where commercialization is

103:22

in the sense of productionization. Next, please.

103:26

So this is our journey; it's all on this

103:28

screen, and I'm not going to read it.

103:30

We started in early 2022,

103:32

spent almost one and a half years

103:34

in a less active

103:37

state, and then restarted in July 2024.

103:40

Since then, we have had 12 use cases

103:43

underway in

103:46

some form, and five are already

103:48

deployed in production. I'd request

103:50

all of you who are there: please visit

103:52

our website; there is an AI

103:55

use cases section available in our

103:57

offering. Go there, experience it yourself;

104:01

I'll be very happy to get any feedback.

104:02

Next, please.

104:05

So these are the outcomes in terms of

104:06

numbers; 12 use cases, and the numbers

104:09

speak for themselves. The best part is that we have

104:11

been able to collaborate with 18

104:14

institutions, voluntary organizations,

104:17

and different partners, and we have been

104:19

able to engage more than 8,500

104:22

students, especially

104:24

by way of hackathons and by

104:25

connecting with these

104:27

institutions. As written here,

104:29

we have worked with IITs, we

104:31

have worked with startups, we

104:33

have worked with voluntary organizations; all the names

104:34

are there, so I need not read them. One thing

104:36

I just wanted to tell you is that we have

104:38

been able to create an MCP

104:41

server for our data, which has really been

104:43

lapped up by the community;

104:45

the tweet announcing it has more

104:47

than four lakh, that is,

104:50

400,000 views. Next, please.
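For listeners unfamiliar with MCP (Model Context Protocol) servers: they expose data and operations as named tools that an LLM client can discover and invoke. The sketch below is only a toy illustration of that tool-registry idea, not NSO India's published code base and not the real MCP SDK; the tool name, indicator, and figure are invented.

```python
# Toy sketch of the tool-registry idea behind an MCP-style server.
# Not the real MCP SDK; the tool name, indicator, and value are invented.
from typing import Callable, Dict

class ToyToolServer:
    """Registers named tools that an AI client could discover and call."""
    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}

    def tool(self, name: str):
        """Decorator that registers a function under a tool name."""
        def register(fn: Callable) -> Callable:
            self._tools[name] = fn
            return fn
        return register

    def list_tools(self):
        # A client first asks the server what it can do...
        return sorted(self._tools)

    def call(self, name: str, **kwargs):
        # ...then invokes a tool by name with keyword arguments.
        return self._tools[name](**kwargs)

server = ToyToolServer()

@server.tool("get_indicator")
def get_indicator(indicator: str, year: int) -> dict:
    # Stand-in for a lookup against a statistical data portal.
    fake_data = {("cpi", 2023): 5.4}  # invented figure, illustration only
    return {"indicator": indicator, "year": year,
            "value": fake_data.get((indicator, year))}

print(server.list_tools())  # ['get_indicator']
print(server.call("get_indicator", indicator="cpi", year=2023))
```

In a real MCP deployment, the listing and invocation steps happen over a standardized protocol between the assistant and the server; the registry pattern is the same.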

104:54

So this is our workflow, and this is

104:56

important. Quickly identify the use

104:58

case. Do rapid prototyping; don't take

105:01

too long on that, because people will

105:02

not believe that something is possible.

105:04

Evaluate and validate, then decide

105:07

which ones are fit to be scaled, then

105:09

make your institution ready with the

105:11

capacity, have a framework for that if

105:13

possible, and then move to

105:15

deployment and production.

105:17

Maintenance is something we

105:19

have outsourced; we are working with

105:20

one of our partners to maintain and

105:22

orchestrate the entire thing. Next,

105:24

please.
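The workflow just described (identify, prototype, evaluate, scale, deploy, with maintenance handled by a partner) can be sketched as a gated pipeline. This is an illustrative sketch only; the stage names, scores, and the 0.7 threshold are invented for illustration, not NSO India's actual criteria.

```python
# Illustrative sketch of the use-case workflow described above:
# identify -> prototype -> evaluate -> (scale -> deploy) or shelve.
# Stage names, scores, and the 0.7 threshold are invented examples.
from dataclasses import dataclass, field
from typing import List

@dataclass
class UseCase:
    name: str
    eval_score: float = 0.0        # filled in at the evaluation stage
    history: List[str] = field(default_factory=list)

    def advance(self, stage: str) -> None:
        self.history.append(stage)

def run_pipeline(case: UseCase, fit_threshold: float = 0.7) -> str:
    case.advance("identified")
    case.advance("prototyped")      # rapid prototyping: keep this step short
    case.advance("evaluated")
    if case.eval_score < fit_threshold:
        return "shelved"            # not every prototype is fit to scale
    case.advance("scaled")
    case.advance("deployed")        # maintenance may then be outsourced
    return "in production"

classifier = UseCase("code classifier", eval_score=0.85)
chatbot = UseCase("website chatbot", eval_score=0.55)
print(run_pipeline(classifier))     # in production
print(run_pipeline(chatbot))        # shelved
```

The gate after evaluation is the point the speaker stresses: deciding early which prototypes are fit to scale keeps the portfolio credible.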

105:27

Yeah, for all the

105:28

use cases, you can say, we have a NotebookLM,

105:31

and corresponding to that we have a

105:34

tool available. We have the MCP

105:36

server there; it's again available on

105:38

our website, and the details are there. We

105:39

have the NIC code classifier there, again

105:42

available live on our website. Semantic

105:44

search for the data portal is live there

105:46

as well: you go to our data portal and

105:48

it's there; the AI search is there. Next,

105:50

please.
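Semantic search of the kind mentioned for the data portal typically embeds dataset descriptions and user queries as vectors and ranks datasets by similarity. As a simplified stand-in (a bag-of-words cosine similarity in pure Python rather than a learned embedding model; the catalog titles are invented examples), the ranking step looks like this:

```python
# Simplified sketch of semantic-style search over dataset descriptions.
# Real deployments use learned embeddings; here a bag-of-words vector and
# cosine similarity stand in. Catalog titles are invented examples.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, catalog: list, top_k: int = 2) -> list:
    q = vectorize(query)
    ranked = sorted(catalog, key=lambda d: cosine(q, vectorize(d)),
                    reverse=True)
    return ranked[:top_k]

catalog = [
    "consumer price index monthly series",
    "periodic labour force survey unemployment rate",
    "index of industrial production annual",
]
print(search("unemployment labour survey", catalog, top_k=1))
# ['periodic labour force survey unemployment rate']
```

A production system would replace `vectorize` with a sentence-embedding model, so queries match on meaning rather than only on shared words.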

105:53

And then we have a website chatbot as

105:55

well, and a few of them are in the

105:57

process of being piloted or are at the

106:00

PoC stage. The idea is that in the next six

106:02

months we create a data-as-a-service

106:05

platform with the help of AI. Next,

106:08

please.

106:11

So these are the lessons learned;

106:12

it's all there. I'll only make one point,

106:15

which is that active engagement is always

106:17

better than passively waiting for

106:19

something. And in NSOs, because we are

106:21

publicly funded, it's our responsibility

106:24

to create a safe place to fail,

106:26

because that's very important,

106:28

especially for innovation practices. As for the

106:31

way forward and how we intend to move

106:33

forward, I'll say we'll keep expanding

106:35

the collaborations; we will create an AI

106:39

readiness framework for data (by the way,

106:41

we have one, and it has also been

106:43

selected for the upcoming

106:45

ISI conference, I think sometime in May);

106:48

and lastly, we are going to make all

106:50

the code bases public. Our MCP server

106:53

code base is public on our GitHub;

106:55

anybody can go and raise a pull request.

106:58

Building a replicable model for other

107:00

national statistics offices is our

107:02

ultimate aim. Next, please.

107:07

So this is, in a nutshell, a picture, again

107:10

generated by AI, which tells the

107:12

innovation journey of NSO India. It's

107:14

all there. I thought I'd put it up

107:16

last, so if there's a discussion, it

107:18

can be helpful. Thank you very much, and

107:20

I'm ready for any questions or

107:21

discussion.

107:23

>> Brilliant. Thank you. Okay. Ivan, are

107:26

you here with us?

107:29

>> Great.

107:30

>> Okay.

107:30

>> Yes. Um

107:32

>> So there was an interesting comment

107:33

there from Rohit about

107:35

trying to set up a model for other

107:37

NSOs, which is one aspect of

107:42

working in the open. There's been a

107:44

lot of talk about working in the open:

107:47

open tools, open all that sort of stuff.

107:49

Why do you think there's a lot of talk

107:50

about this, and why is it important?

107:53

>> Yeah, thank you, and it's been

107:56

fascinating to follow the presenters and

107:59

the discussions. I think openness is

108:03

a central issue as we move forward when

108:05

it comes to AI, because openness is

108:09

about trust, and as we all know, as

108:13

statistics offices, our value has

108:17

so much been about transparency,

108:21

reproducibility, accountability; that's

108:24

really where we've derived our legitimacy

108:28

in official statistics. So even as we

108:31

embark on integrating AI, and

108:34

referring back to the idea of

108:36

institutionalization:

108:37

for sure, that's already happening;

108:41

it's no longer a question of whether to

108:42

institutionalize, it's probably more

108:45

how we do it in a deliberate, responsible,

108:48

and sustainable way, and that's why

108:50

this key question you raise is very

108:53

important in terms of openness in

108:56

how we do things. And I think,

108:57

specifically,

108:59

we could think of it being

109:01

beneficial to all of us in a couple of

109:04

ways. One is that with openness

109:08

we are able to

109:10

get feedback from others on

109:13

the things we are doing and the models

109:16

we are developing, which obviously helps

109:19

us get better. But also, with openness,

109:22

we are able to share and learn from

109:24

each other.

109:26

And the other

109:29

thing is that with openness we

109:32

are also able to build that trust

109:36

in the users of our statistics. So

109:40

I think openness is definitely a

109:43

fundamental issue here that we should

109:46

all keep in mind and

109:49

be able to integrate. But thinking

109:52

of some of the ideas around the reuse

109:57

of models and the like: in most

110:00

of our statistics offices, this being a

110:03

new area, we can all agree that some

110:07

of the ideas, like those India has presented

110:09

and the others, it would really be

110:12

beneficial to adopt, given the

110:15

limited skills or competencies in

110:19

statistics offices. I think openness

110:21

also brings that opportunity, and I think

110:23

through the regional hubs that

110:26

we have, like the one in Africa, that's

110:29

one of the things we aim for, in the

110:31

sense that what we do in Rwanda, or

110:33

what is done in another country, can

110:36

be shared. So openness is

110:38

really a critical one, and I hope we

110:41

all adopt it.

110:43

>> Excellent. Right. I have a question for

110:46

you, if you're still around. Okay. So

110:50

I regularly say that prototyping is

110:52

easy but implementation is very hard, and

110:55

if I looked at the workflow diagram that

110:57

you recently showed, it would suggest it's

110:59

easy. I'm not quite sure it is so easy,

111:01

but what key lesson do you have for us

111:04

to make the journey from prototyping to

111:07

implementation in business systems

111:09

easier?

111:11

>> Okay. Thank you, Osama. Three

111:15

points, basically. Find your use case;

111:19

do a quick pilot on that. You need

111:22

to quickly work with people who have

111:25

the skills; not every skill exists in an

111:27

NSO, so you need to find partners

111:30

who are ready to work with you, be

111:32

it through procurement or otherwise,

111:35

and start working with them. Then

111:37

show it to the business or domain

111:40

experts, and once that convinces them,

111:42

bring those people on board. It's not

111:46

always the case that the domain side is already

111:47

on board; at times they need

111:49

some evidence to get on board. Once

111:52

they are on board, ensure that there

111:54

is a time limit before you

111:58

produce a minimum viable product.

112:00

>> Sure. And for that,

112:02

when we were doing it,

112:05

all our models were open-weight

112:07

models, but my

112:09

opinion is that we should not wait;

112:10

whatever suits anyone, that

112:13

should be the approach. But produce

112:15

something which can be shown to

112:17

senior management as something viable,

112:20

and then get

112:23

the trust of senior management on

112:25

your side. Once you have

112:27

senior management support and something

112:29

to show, I guess everything else

112:31

falls into place.

112:34

>> Which actually links back to something

112:36

Marcus said earlier: when

112:38

you show senior management, they're only

112:39

going to be interested if they can

112:41

see that there's an impact here, a

112:43

potential impact, and not just that the

112:44

work is interesting.

112:47

Going back to the theme of openness: so

112:49

you've talked about openness in terms of

112:51

transparency and sharing. To what

112:54

extent are you, at the National

112:58

Institute of Statistics of Rwanda,

113:00

using open tools like GitHub?

113:02

I think more and more NSOs are

113:04

putting out the code that they use

113:07

for various things that they develop

113:08

onto their GitHub repositories. To what

113:11

extent are you using that, and also

113:12

putting your own code out onto a GitHub

113:14

repository?

113:17

>> Yeah, not to the desirable extent.

113:22

I think this is something that we

113:25

have just started at the moment. The way

113:27

we've approached it, the things we're

113:30

doing are mostly still in the pilot phase.

113:32

So we've been open to collaborating with

113:35

different partners, like the UK

113:38

Office for National Statistics or the World Bank,

113:41

and some of our projects we've opened

113:44

up to them for feedback and validation.

113:47

But the idea is what you just touched on,

113:49

which is that at the point where we

113:52

feel like some of these things are

113:54

meaningful and good

113:58

to go, then we open them up through the

114:01

regional hub. And we are working closely

114:03

with the Africa statistics office

114:07

in

114:10

Addis Ababa, which coordinates this, so

114:14

that we come to the level where some

114:16

of these projects will be made more

114:18

available. But so far, what we are doing

114:20

is through workshops; we've had

114:24

several workshops where we invite

114:25

colleagues from Africa, and they come and

114:28

see what we're doing, and we're able to

114:30

share more from a practical perspective.

114:34

>> Okay, brilliant. Because I've been a

114:36

bad moderator and run over time,

114:40

I'm just going to give you

114:42

some of my views on what we've heard

114:43

today. We have new large data sets

114:46

and new tools, which means

114:51

there is more and more that we can do,

114:53

but that requires both capability

114:55

building within NSOs

114:58

and also more collaboration between

115:01

NSOs, while adhering to those fundamental

115:05

principles that I think NSOs have always

115:07

adhered to. All this

115:09

new big data and all this new technology

115:11

and AI doesn't actually change those

115:13

fundamental principles, to me. And that

115:15

gets back to what Ivan's been

115:17

saying: openness, sharing, this all helps,

115:20

and I think there is something

115:21

that UNSD can do there. Which is

115:23

perfect, because I'm now handing over to

115:25

Ammer to say some closing words.

115:31

>> Thank you very much, Osama, and

115:33

great to see you again, and many of

115:35

the colleagues here. Let me assure

115:38

you that you are very far from being

115:40

a bad moderator; you've

115:42

been a fantastic moderator, and you

115:45

led this with effectiveness and

115:46

efficiency. So, thank you very much to

115:48

you, and thank you to all the colleagues

115:50

and speakers for their great and

115:52

valuable insights, really. I'm really

115:55

grateful for that. Allow me also very

115:58

quickly to thank Luis and my

115:59

team, Maria and Clarence, for all their

116:01

support of this webinar. So, thank you

116:03

very much to all. Clearly, I'm not

116:06

going to try, or even attempt,

116:10

to summarize any of these

116:12

discussions. But allow me

116:14

just to make three points from the UNSD

116:17

perspective on this

116:20

fantastic webinar. The first point is

116:23

really about transitioning from

116:25

production to readiness.

116:27

While in this webinar we

116:30

heard from all of you and from all the

116:32

speakers their insights about

116:36

the tools of production and how

116:39

to develop the AI systems

116:43

needed for such production, and so on,

116:45

and hence many of the discussions seemed

116:48

technical, inevitably, I would

116:51

say, the speakers and the discussions

116:54

delved into really important policy

116:57

issues, or raised some

117:00

policy issues. Let me

117:03

try to enumerate some of them, not

117:05

exhaustively, obviously:

117:07

issues of human capital and managing

117:10

human capital came up; issues of

117:12

collaboration, and I think

117:14

here South-South cooperation came through

117:16

very strongly, with the

117:18

examples of the regional and global

117:21

hubs and so on;

117:24

issues of quality assurance, cost and

117:26

financing, privacy issues, and a holistic

117:30

approach to governance in AI came up as

117:33

important as well. All that leads me

117:37

to the importance of this work and

117:40

to the importance of the CEBD's role

117:43

in hopefully bringing things together

117:47

and connecting the dots in all this, and

117:49

again I'm grateful to Ashwell for his

117:52

presence and for opening this webinar.

117:54

But I think, more importantly, it shows

117:56

that this webinar provides a

117:58

fantastic preparation and foundation

118:01

for the Friday seminar on the 27th,

118:04

where I hope to see many of you,

118:07

because building on this

118:10

webinar and the tools of production that

118:12

we discussed here, we can move to a

118:14

meaningful discussion on the demand side,

118:16

that is, on AI readiness. So that's

118:20

my first point. My second point is that

118:24

we in UNSD are actually trying to walk

118:27

the talk. We are fascinated

118:30

by what's happening, but we are also

118:32

trying to modernize our own internal

118:34

operations, testing new data engineering

118:37

practices assisted by the use of AI,

118:40

such as LLM technologies, and sharing our

118:43

experiences with our communities of

118:45

practice. One example is that we are

118:47

deploying an AI-driven knowledge

118:49

management system to analyze capacity

118:52

gaps and ensure our technical

118:54

innovations meet real demand in the

118:57

global statistical community, and we look

118:59

forward to sharing this with

119:02

you in more detail on another

119:04

occasion. My third point is that we

119:08

remain focused at UNSD on supporting our

119:10

member states through innovation.

119:12

We continue working with our

119:15

partners to ensure that official

119:16

statistics are machine-actionable and

119:18

interoperable

119:20

through the UN Global Platform, to which many

119:23

of you have contributed and continue to

119:25

contribute, and we're again grateful for

119:27

all this contribution. We are moving

119:29

beyond theoretical training to providing

119:31

sandboxes that would allow NSOs to

119:35

overcome local IT bottlenecks and access

119:38

cloud infrastructure and big data tools

119:40

as a service. We're also

119:42

operationalizing these principles

119:44

through the UN system data commons, a

119:47

federated architecture that transitions

119:49

our data from isolated silos into an AI-ready

119:52

semantic knowledge graph. So

119:55

let me again close this fantastic

119:58

webinar by thanking all of you for your

120:00

contributions, and by inviting you all to

120:05

attend, participate, and again

120:07

speak in the strategic and governance

120:10

discussion in the Friday seminar on

120:13

AI readiness. I think, with all

120:16

of this, and with these two events

120:19

together, we hope that we are

120:20

contributing to bringing this

120:23

community closer to what everybody

120:25

mentioned repeatedly, which is the issue

120:27

of trust: trust by the public,

120:30

but also trust within the statistical

120:33

community, about AI applications,

120:37

readiness, and use. Thank

120:40

you very much, and hopefully I'll see you

120:43

all on Friday. Thank you.

120:45

>> Thank you. Yeah, I hope to see lots

120:47

of people in New York. Thank you,

120:49

everyone.

120:51

Yes, thanks to all the speakers, all the

120:52

presenters, Luis and the group for

120:55

organizing, and everyone for contributing

120:57

to the chat and also asking questions.

121:00

Bye for now.

121:02

>> Thank you very much.

121:04

>> Bye

121:04

>> bye.

121:05

>> Very good evening, good afternoon, good

121:06

morning to all.

121:07

>> Byebye.

121:08

>> Byebye.

121:09

>> All right. See you.

121:12

>> Thank you.

Interactive Summary

This video discusses leveraging Artificial Intelligence (AI) in the production of official statistics, focusing on how National Statistical Offices (NSOs) can use AI to deliver statistics faster, cheaper, and better. The session is structured around four case studies: enhancing statistical operations through AI, filling data gaps with AI, addressing governance challenges of AI in statistical organizations, and institutionalizing AI in statistical production. Key themes include the application of large language models (LLMs) for tasks like coding open-ended survey responses, using AI and satellite imagery for agricultural statistics, the importance of data governance and quality assurance for AI-driven statistics, and the need for capability development and trust-building within the industry. The discussion also touches upon the shift in the statistician's role, the balance between efficiency and regulation, and the benefits of open-source collaboration.
