Leveraging AI in the production of official statistics: How AI helps NSOs deliver statistics faster, cheaper and better
So good morning, good afternoon, good evening everyone. Thank you for joining this side event, which is part of the lead-up to the 57th session of the Statistical Commission. The webinar is on leveraging AI in the production of official statistics: how AI helps NSOs deliver statistics faster, cheaper and better. Today's session will focus on AI for data production, so to speak the supply side of statistics, serving as the counterpart to the Friday seminar on 28 February, which will focus on AI readiness and data use, or what we would call the demand side of this conversation. The seminar is organized in four strategic dialogues around four concrete case studies. Each segment will consist of a 7-minute presentation followed by a 13-minute strategic discussion, and participants should use the Q&A tab in Teams to submit their questions throughout the event.
Just a few more housekeeping details on the agenda. We will have opening remarks by Mr. Ashwell Jenneker, Deputy Director General for Statistical Operations and Provincial Coordination at Statistics South Africa. Mr. Jenneker is representing the chair of the Bureau of the Committee of Experts on Big Data and Data Science for Official Statistics. As moderator we will have Mr. Osama Rahman, Director of Strategy, Planning, Innovation, Delivery, Assurance and Support at the Office for National Statistics of the United Kingdom. Mr. Rahman is representing the chair of the Data Science Leaders Network.
The first segment, on the use of AI to enhance statistical operations, from manual processing to algorithmic supervision, will start with a case study by Mr. Elio Villaseñor, Director of the Data Science Laboratory and Modern Methods of Information Production at the National Institute of Statistics and Geography (INEGI) of Mexico, followed by a discussion with Professor Shu Renjun, representing the UN Global Hub in China, from Sang University in China.
Segment two will be on the use of AI to fill data gaps, with a case study presented by Mr. Setia Pramana, professor of statistics at Politeknik Statistika STIS in Indonesia and representing the UN Regional Hub for Big Data and Data Science for Asia and the Pacific in Indonesia. The discussion will be led by Ms. Maria Luisa Toledo, head of research at the National School of Statistical Sciences of the Brazilian Institute of Geography and Statistics (IBGE), also representing the UN Regional Hub for Big Data and Data Science in Brazil, and Mr. Markus Sovala, Director General of Statistics Finland.
The third segment will be on the governance challenges of AI in statistical organizations, with a presentation by Mr. Matjaž Jug from Statistics Netherlands; the discussants will be Mr. Gary Dunnet, interim chief of methodology and statistics at Statistics New Zealand, and Ms. Frankie Kay, Chief Information Officer of the Central Statistics Office of Ireland. Finally, in the closing segment on institutionalizing AI in statistical production, we will have a presentation by Mr. Rohit Bhardwaj, Deputy Director General for Data Informatics and Innovation at the National Statistics Office of India; the discussion will be led by Mr. Ivan Murenzi, Director General of the National Institute of Statistics of Rwanda, also representing the UN Regional Hub for Big Data in Africa. At the end, we will have closing remarks by Mr. Am Andur, chief of the Data Innovation and Capacity Branch of the United Nations Statistics Division. With this, I would like to hand over to Mr. Jenneker from Statistics South Africa for the official opening remarks. The floor is yours.
>> Thank you, Luis. Good morning, good afternoon, good evening, depending on where you are around the globe, and a hearty welcome from the Committee of Experts on Big Data and Data Science for Official Statistics. Now, according to Fortune Business Insights, the AI industry will grow from 367 billion to 2.5 trillion by 2034. That is a compound annual growth rate of about 26%. When the Committee of Experts looked at this tremendous growth that will take place, we considered it when we changed our mandate in 2024, and we now have specific reference to AI. What we were specifically thinking about is that we need to provide strategic direction on emerging new technologies and methodologies such as AI, and that's for the whole official statistics industry. We also need to promote the practical use of AI in official statistics, promote capacity development, and enhance communication and advocacy for AI initiatives in the industry. Lastly, but most importantly, we need to build trust. We need to build public trust in AI. But not just public trust: we also need to build trust within the industry, because, remember, we are an old and established statistics industry, and when something new like AI comes along, we need to make sure we build the trust so that we can adopt it into our processes. So the question today is: how do we produce statistics that are faster, less costly and of higher quality; available at a lower level, but of higher quality? We will have discussions around that. Strategy is good: it's good to have a strategy to know where we're going, but today we will focus on implementation. We are going to move from talking to doing; not so much talking, but how do we start doing? I wish you fruitful deliberations, and I hope the discussions that follow will bring you much closer to implementation. Thank you very much. Let me hand over to Osama. You take it from there. Thanks.
>> You're still muted, Osama.
>> Handing over and telling them I'm on mute. Thank you. As Luis has said, we have a great set of presentations and presenters, and a great set of panelists and discussants, so I think we're going to have a really good conversation. Rather than waste any more time, let's go to my good friend Elio Villaseñor, who is going to talk about using LLMs across the GSBPM.
>> Thank you, Osama. I will share my screen. Can you see my presentation?
>> Can see. Yep.
>> Thank you. Hello everyone. Today I would like to present a project we have conducted at INEGI that uses large language models for a very specific, high-impact task: coding open-ended responses in surveys and censuses.
Before I introduce the project, let me briefly explain what large language models are. These models produce dense, vector-based representations of language. Unlike traditional approaches, where a document is represented as a sparse vector based on word occurrence, language models represent words and text as vectors in a high-dimensional space. The key idea is that in that space, geometric distances reflect semantic relationships: words with similar meanings tend to be close to each other, and analogous relationships are preserved in a consistent way. In addition, these representations are contextual: the same word may be mapped to different vectors depending on the context in which it appears.
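To make the geometric idea concrete, here is a minimal sketch, assuming the sentence-transformers package and a public multilingual model (this is an illustration, not INEGI's own stack): semantically similar responses map to nearby vectors.

```python
# A minimal sketch of semantic distance between embedded responses.
# Assumes the sentence-transformers library and a public multilingual model.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

responses = [
    "vendedor de frutas en el mercado",   # fruit seller at the market
    "comerciante de verduras",            # vegetable trader
    "ingeniero de software",              # software engineer
]
vectors = model.encode(responses)  # one dense vector per response

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically similar occupations should sit closer in the vector space.
print(cosine(vectors[0], vectors[1]))  # expected: relatively high
print(cosine(vectors[0], vectors[2]))  # expected: lower
```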
At INEGI, we are able to experiment with these models within our data science lab, an infrastructure that integrates both technological and human capabilities. This responds to the need to integrate diverse data sources and apply AI to tasks such as text classification, spatial analytics and synthetic data generation, among other use cases.
Specifically, we have developed machine learning models for the automatic coding of open-ended responses. Traditionally, this process is manual: expert coders interpret each response and assign a code from a reference classification catalogue. These coded responses are essential for producing statistics. However, in many operations the volume is extremely large. For instance, in a recent exercise, the intercensal survey, more than 12 million households were interviewed, so you can imagine the number of people needed to code all responses manually within a reasonable time frame. For some time, part of the task has been automated using rule-based approaches driven by word-occurrence patterns. In our experience, around 70% can be coded this way while preserving good quality, but the remaining 30% still requires manual coding. To reduce that workload, we developed machine learning methods that use already-coded responses to train algorithms capable of detecting patterns in vector representations of text. With these algorithms, we found it possible to reduce manual effort while maintaining coding quality.
In recent years, language models have significantly improved automatic coding, because embedding representations capture meaning more effectively. The workflow changes as follows: each word is represented as an embedding vector, each response is modeled as a sequence of vectors, and models are trained to classify these sequences and assign an appropriate code. The key element is that these models can output an uncertainty score, which can be tuned so that the system codes only those responses where the model is sufficiently confident. This allows us to quantify uncertainty and enforce a minimum desired level of quality.
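A minimal sketch of the thresholding logic just described, with illustrative numbers rather than INEGI's production code: the classifier's top-class confidence is compared against a tunable threshold, and only the confident cases are auto-coded.

```python
# Human-in-the-loop routing: auto-code above a tunable confidence threshold,
# send everything else to expert coders. Numbers are illustrative.
import numpy as np

def route_responses(probabilities, threshold=0.90):
    """probabilities: (n_responses, n_codes) array of class probabilities."""
    confidence = probabilities.max(axis=1)          # model's top-class confidence
    predicted_code = probabilities.argmax(axis=1)
    auto_coded = confidence >= threshold            # boolean mask
    return predicted_code, auto_coded

# Example: 5 responses, 3 candidate codes.
probs = np.array([
    [0.97, 0.02, 0.01],   # confident -> auto-coded
    [0.55, 0.40, 0.05],   # uncertain -> manual review
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
    [0.05, 0.03, 0.92],
])
codes, auto = route_responses(probs, threshold=0.90)
print(f"auto-coded: {auto.sum()} / {len(auto)}; manual: {(~auto).sum()}")
# Raising the threshold trades automation rate for accuracy:
# the tunable curve discussed later in this session.
```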
In a recent operation, we achieved about a 50% reduction in the manual coding workload while maintaining an accuracy of 94.3%, comparable to the quality typically observed in traditional manual workflows. In other words, we optimize resources and reserve expert time for the most difficult cases.
We also collaborated with INE Chile to implement this approach for automatic coding in Chile's victimization survey, achieving highly competitive results. This analysis highlights the value of moving from traditional representations to embeddings and sequence-based models: using word-occurrence features, accuracy was around 85%; adding embeddings to a traditional method like extreme gradient boosting increased performance to roughly 87%; using sequence classification reached about 91%; and using a large model, for example RoBERTa, a BERT variant fine-tuned for Spanish, with generic embeddings reached around 92%. It is important to note that large models such as BERT often require specialized computing infrastructure, like high-performance GPUs, which may not be readily available in standard production environments at statistical offices. Therefore, lightweight alternatives such as fastText embeddings remain very relevant, offering a strong balance between performance and computational cost.
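As an illustration of the lightweight, CPU-friendly option, here is a hedged sketch assuming the fasttext package and a hypothetical training file of already-coded responses in fastText's supervised format; it is not the exact setup used at INEGI.

```python
# Lightweight supervised coding with fastText. Assumes a hypothetical file
# train.txt in fastText's supervised format, one response per line, e.g.:
#   __label__5223 vendedor de frutas en el mercado
#   __label__2512 desarrollador de software
import fasttext

model = fasttext.train_supervised(
    input="train.txt",
    epoch=25,
    wordNgrams=2,   # word n-grams add robustness to short, noisy responses
    dim=100,
)

labels, probs = model.predict("comerciante de verduras", k=1)
print(labels[0], probs[0])  # top code and its confidence score
```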
Working with embeddings and large language models also opens the door to intelligent agents. We are currently developing this with the goal of evaluating the quality of our institute's data and metadata, so they can be effectively consumed by agents that answer questions using tabular data and knowledge encoded in metadata, such as variable definitions and survey design information. These models can generate SQL-like queries from natural language. However, the critical step is providing the agent with the right context to accurately identify the relevant variables and determine the correct query. This is an ongoing effort.
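A hedged sketch of the context step just described: the agent sees only curated metadata, not microdata, and is asked to produce a SQL query for a natural-language question. The table name, columns and the OpenAI client are illustrative assumptions, not INEGI's implementation.

```python
# Metadata-grounded text-to-SQL: the model only sees variable definitions.
# Table and column names are hypothetical; any chat LLM backend would do.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY in the environment

metadata = """
Table: census_households
  state_code  TEXT  -- state code (hypothetical, 01-32)
  hh_size     INT   -- number of persons in the household
  water_piped INT   -- 1 if dwelling has piped water, else 0
"""

question = "How many households per state have piped water?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You translate questions into SQL. Use only the tables "
                    "and columns defined in the metadata below.\n" + metadata},
        {"role": "user", "content": question},
    ],
)
# Candidate SQL, to be validated before ever touching production data.
print(response.choices[0].message.content)
```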
We are exploring
>> You have one minute left, just to warn you.
>> Yes, thank you. ...how these models can support different processes within the GSBPM: for example, a chatbot to train interviewers, tools to assess data and metadata quality, automated generation of informative documents, and semantic search over our repositories.
Lessons learned include: a unifying architecture is essential for scalability and production deployment; high-performance on-premises computing supports data sovereignty and long-term sustainability; and, finally, international collaboration has significantly accelerated our progress. Going forward, our priorities include improving data and metadata quality to strengthen AI readiness, continuing to develop use cases that turn these models into products and services, and addressing challenges related to human and financial resources, continuous infrastructure updates, and the adoption of appropriate security measures. Thank you.
>> Brilliant. Thank you, Elio. And as always, great stuff from INEGI. Right. Is Renjun with us?
>> Yes, I'm here.
>> Yes. Is your camera on? Okay. Can you hear?
>> Hey, can you see my screen? Uh, my camera.
>> There you go. Now I can see you. Perfect.
>> Okay.
Thank you, Osama, and thank you, Mr. Villaseñor, for a very concrete and well-structured demonstration. This is exactly what this session promised: from AI potential to AI production. I will not recap what he has shown. Instead, let me name what I think is the most strategically significant thing in your presentation, because it is easy to miss amid the technical detail. What INEGI has built is not just an automated coding system. It is a tunable governance architecture. The tunable accuracy trade-off curve he showed is not a technical artifact; it is a policy instrument. It gives the subject matter director, not just the data scientist, a lever to decide how much human oversight to retain, and that design choice, in my view, is what makes this architecture ready for production in a way that many AI pilots are not. But the curve also surfaces a profound question about the shifting role of the statistician, which is the focus of our session. So let me put two questions to Mr. Villaseñor, and then I will offer some broader reflections for the room. The first question: your presentation showed AI deployed across four phases of the GSBPM, from interviewer training agents in phase 4, through automatic coding in phase 5 and quality assessments in phase 6, to document generation, semantic search and chatbots in phase 7. That is a remarkably broad footprint. My question is about the people. When you moved automatic coding into production, what happened to the coding staff? Were they retrained as supervisors of the algorithm, reviewing the cases below the confidence threshold, or were they redeployed to other tasks? And, practically speaking, did they need new skills that your organization had to build?
>> Yes. Thank you very much, Renjun, for your question. Yes, the coder experts who are already in the institution are still coding the hard, difficult responses. And much of the workforce that was needed for coding all the responses is not permanent staff of the institute; they are hired for the concrete operation. So we just need to hire fewer people now for that.
>> I see, very interesting. So I guess, for other statistical offices that do have permanent coding staff, do you have any suggestion? Should they redeploy the coding staff to other, perhaps higher-level work, or retrain them with new skills to handle the new scenario?
>> Do you want to repeat the question? Thank you.
>> My question is: other offices have existing coding staff. Do you have a suggestion for those offices? Should they redeploy the coding staff to other, maybe higher-value work, or should they retrain those staff with higher-level skills to handle the new scenario?
I have to say that putting these models into production is a very new thing that is happening right now. Actually, the intercensal survey was the first operation where we put this coding system into production, so we are only just starting to face many of the consequences of doing this. We don't yet have a finished evaluation of the processes; it is something that is happening now.
>> Yeah, I guess the human capital freed up can maybe be redirected to higher-value work.
>> Yeah.
>> Okay, so now I have a second question. This is about something your slides revealed that I found very striking: the crime narrative classification work, the joint project between INEGI and INE Chile. You showed a calibration diagram where you used temperature scaling to align the model's confidence score with actual accuracy, and you published the full pipeline as open-source code on GitHub. Here is what I want to ask: the confidence threshold that determines what gets auto-coded and what goes to human review, who decides where to set it? Is it the data scientist, the survey director, or some institutional governance body? And is that decision documented as part of the statistical methodology, the way we document sampling design or estimation procedures?
Yes, it has to be part of the methodology, right? And, as you mentioned, we have a threshold on the uncertainty level, and we move this threshold for different classes, so we can guarantee that we preserve the quality of the classification made by manual coders. That's the general idea.
>> Okay. Thank you. So I want to build on both of your answers and offer three reflections for the room. First, on the shifting role of the statistician: what INEGI's experience illustrates, and what Mr. Villaseñor has just described from the inside, is that automating a routine process does not eliminate the need for statistical judgment. It shifts the statistician's expertise from executing the coding task to governing the system that executes it: setting the confidence threshold, monitoring for distribution shift when new response patterns emerge, deciding when to retrain, and signing off on the quality of the AI system's output. This is a higher-order role, and frankly a more intellectually demanding one, but our training programs and our job descriptions in most NSOs have not yet caught up with this reality. Mr. Villaseñor's own lesson, that human and financial resources remain a challenge for the organization, underlines this point. The second reflection is on balancing efficiency and rigor. The tunable trade-off curve is elegant, but it raises a governance question that goes beyond INEGI. If 94.3% accuracy at 50% auto-coding is the chosen operating point for one survey, who ensures consistency across different surveys within the same office? What if the population survey team sets an aggressive threshold while the economic survey team is more conservative? We need institution-level governance frameworks and standard operating procedures for AI systems in production, not just team-level technical decisions. The UNECE quality framework for statistical algorithms provides the conceptual foundation, but operationalizing it is the work ahead. The third reflection is on making these gains globally accessible. I want to highlight something from INEGI's presentation that deserves more attention: the INEGI-INE Chile collaboration on crime narrative classification was published as open-source code. This is a model of South-South cooperation that makes efficiency gains transferable. A statistics office in another Spanish-speaking country can adapt that pipeline rather than build it from scratch. At the UN Global Hub in China, we have taken a similar approach: in December 2025, we contributed to the launch of the UN handbook on remote sensing for agricultural statistics and its production toolkit. The principle is the same: not every NSO building its own AI from scratch, but a global community maintaining shared, validated, open tools. The strategic question I want to leave with this room is: can we do the same for AI-assisted survey coding? Could the UN Global Platform host validated multilingual coding models for standard classifications, occupation, industry, crime type, that any office can deploy? The technical ingredients exist, thanks to the kind of work INEGI is leading. What we need is the collaborative commitment. Thank you.
>> Thank you. Brilliant discussion. Okay, Elio, there is a question for you in the Q&A tab from my colleague Farah Nanoir here at the ONS. He's interested in hearing about the benefits of training a bespoke model versus just using an existing LLM and RAG. Do you have views on when you might favor one approach over the other? The question is in the Q&A tab if you can see it.
Okay. Well, actually, to build a RAG system we need to use an LLM. Maybe that's not very obvious, because in commercial solutions we just build the RAG by giving the LLM the right context. But if we work with an LLM that is already in our own infrastructure, we can build the RAG with that LLM. When we use the embeddings, we use the vector representation of the text and can go straightforwardly via a typical machine learning procedure, and I think that is a more standardized way to pursue classification workloads than asking the agent directly. Maybe the advantage of using RAG is that you don't need a big training data set. That is perhaps the main advantage: if you don't have a large training data set, it is a way to proceed.
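As a sketch of the RAG route he describes, under the assumption of a generic embedding model and a hypothetical code catalogue: retrieve the nearest catalogue entries for a response and hand them to an LLM as context, with no large labelled training set required.

```python
# Minimal retrieval step of a RAG coding pipeline. Catalogue codes and
# descriptions are hypothetical; the embedding model is a public one.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

catalogue = {                      # reference classification (hypothetical)
    "5223": "shop sales assistant, market stall seller",
    "2512": "software developer",
    "6111": "field crop grower",
}
codes = list(catalogue)
cat_vecs = model.encode([catalogue[c] for c in codes])

response = "I sell fruit at a street market"
q = model.encode([response])[0]

sims = cat_vecs @ q / (np.linalg.norm(cat_vecs, axis=1) * np.linalg.norm(q))
top = np.argsort(sims)[::-1][:2]   # two most similar catalogue entries

context = "\n".join(f"{codes[i]}: {catalogue[codes[i]]}" for i in top)
prompt = (f"Candidate codes:\n{context}\n\n"
          f"Response: {response}\nChoose the best code.")
print(prompt)  # this prompt would go to an LLM; no large training set needed
```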
>> Thank you.
>> Thank you. Renjun, I have a question for you.
>> Mhm.
>> As we use gen AI more and more across the GSBPM, the traditional ways in which we've assessed statistical methodology may not work. Do you have any thoughts on how, when doing this sort of work, you assess quality?
That's an excellent question, and it gets at something fundamental. Traditional quality frameworks for official statistics, like the European Statistics Code of Practice, the UN Fundamental Principles, the quality dimensions, were designed for a world where the production process was deterministic and traceable: you could audit every step. With gen AI models, that traceability breaks down; the model's reasoning is not directly inspectable. So how do we assure quality? I think we need a layered approach. The first layer is output validation: benchmarking against ground truth. This is what INEGI has done well: compare the model's codes against a human-coded gold standard and measure accuracy, precision, recall. This works for classification tasks where we have historical reference data. It is necessary but not sufficient. The second layer is calibration: ensuring the model knows what it does not know. The crime narrative work that INEGI published with INE Chile used temperature scaling to calibrate the model's confidence scores, so that when the model says "I'm 90% confident", it is actually correct 90% of the time. This is critical because the entire human-in-the-loop architecture depends on the quality of the uncertainty estimates. If the model is overconfident, the quality guarantee breaks down. Calibration testing should become a standard part of the AI quality assurance toolkit for NSOs.
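For readers unfamiliar with the technique, a minimal sketch of temperature scaling on illustrative data (not the published INEGI-INE Chile code): a single scalar T is fitted on a held-out validation set so that softmax(logits / T) matches observed accuracy.

```python
# Temperature scaling: fit one scalar T by minimizing validation NLL.
# Logits and labels here are synthetic, purely for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
val_logits = rng.normal(size=(500, 10)) * 4        # deliberately overconfident
val_labels = rng.integers(0, 10, size=500)

result = minimize_scalar(nll, bounds=(0.5, 10.0),
                         args=(val_logits, val_labels), method="bounded")
T = result.x
print(f"fitted temperature: {T:.2f}")
# At inference, use softmax(logits / T): "90% confident" should then be
# correct about 90% of the time, which the threshold routing relies on.
```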
The third layer is behavior monitoring in production. This is the piece most offices have not yet built. Once the model is deployed, you need to continuously check whether today's distribution of inputs matches the distribution the model was trained on. If survey respondents start describing their jobs differently, because of economic change, new industries, or even linguistic drift, the model's training data becomes stale. Statistical process control techniques, which our community knows well, can be adapted for this purpose: control charts on model confidence distributions and on category-level accuracy, and validation against fresh human-coded samples.
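One way this monitoring could be operationalized, sketched here with illustrative data and an arbitrary control limit: compare the production confidence distribution against the deployment-time baseline with a two-sample test.

```python
# Drift check on the model's confidence distribution using a two-sample
# Kolmogorov-Smirnov test. Distributions and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
baseline_conf = rng.beta(8, 2, size=5000)   # confidences at deployment
current_conf = rng.beta(6, 3, size=1200)    # confidences on new responses

stat, p_value = ks_2samp(baseline_conf, current_conf)
if p_value < 0.01:                           # control limit, to be tuned
    print(f"ALERT: confidence shifted (KS={stat:.3f}, p={p_value:.1e})")
    # Trigger: draw a fresh human-coded sample, re-estimate accuracy,
    # and consider retraining before drift reaches published figures.
else:
    print("confidence distribution stable")
```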
The fourth layer, and this is specific to generative AI as opposed to classification AI, is semantic evaluation. When gen AI produces a statistical report or data description, there is no single correct answer to benchmark against. Here we need a different quality paradigm: human expert review of a random sample, structured rubrics for factual accuracy and coherence, and, increasingly, AI-assisted evaluation, where a second model checks the first. This is an active area of research, and I would not claim we have solved it. But the principle is clear: any gen AI output that enters the official statistical production chain must be subject to documented quality review, and the review process itself must be auditable. The honest answer is that the traditional quality methodology still works at the level of principles, accuracy, timeliness, coherence, comparability, but the measurement toolkit needs to expand, and that expansion is an urgent task for the international statistical community, with the UN-CEBD and the Data Science Leaders Network as the right bodies to lead it. Thank you.
>> Thank you. And I've got one last question for you, from Ezi in the Q&A: how have you handled cases where the AI model produces incorrect classifications, and what mechanisms are in place to make sure those errors don't propagate?
>> Yes, right. This is a good question. Well, we cannot avoid some errors from classification. Actually, manual coders also make mistakes, because not everyone is an expert and they apply different criteria for some codes. So we have to live with those errors. What we can do with machine learning algorithms is have as an output the uncertainty, or the confidence, of the model in making the classification. So we can use this confidence factor to leave the difficult cases to the experts. That's the way we can manage it.
>> Okay, thank you. I'm going to make one comment, because I think you said something very important. With the hype around AI, you hear about hallucinations and everything else. The incorrect benchmark, as you've said, is not zero errors; the correct benchmark is the error rate given our current methodology, and that's what you need to benchmark against, which I think is a very important point that a lot of people miss. Okay, that was a fantastic presentation and discussion, and thank you to those who asked questions. Sorry, we are not going to have time to answer every question raised in the Q&A, but keep them coming for each section and we'll make sure we answer some. Okay. So, next up we have my very good friend Setia Pramana from Politeknik Statistika STIS in Indonesia. I'm excited about this, because for quite a few years I've been hearing about all the great work they've been doing in Indonesia. Today Setia is going to talk about using AI to derive timely statistics from satellite imagery. Setia, are you here?
>> Yes. Can you hear me?
>> I can hear you, there we go. You can see your presentation, right? I think Clarence is running the presentation, so you just have to tell them "next slide" or whatever you want.
>> Okay. Thank you, Clarence.
>> Over to you.
>> Thank you very much. It's great to meet you again after a year. Thank you very much for inviting me; it's a great pleasure to be part of this event and to meet all my old friends, like Elio, and colleagues from UNSD and the UN-CEBD. And Osama, almost a month ago I was appointed Director of Statistical Methodology at BPS-Statistics Indonesia, while still holding the role of director of the UN Regional Hub on Big Data and Data Science for Official Statistics for Asia and the Pacific. Next, please.
Okay. Yes. Before focusing on the modernization of agricultural statistics using AI and non-traditional data sources, let me give a brief overview of what we have done so far. Mostly we have been using several data sources, such as mobile positioning data, for different types of official statistics, such as tourism and migration statistics, but later we will focus on a mixed method using satellite data for agricultural statistics. Next, please. We have also been trying to use gen AI for classification: leveraging gen AI for automatic prediction of the Indonesian Standard Industrial Classification. We are still working on that, and we hope to use it for our economic census this year. We have also developed, and I discussed this last year together with Osama, with some very good feedback, an AI knowledge base for a metadata chatbot and automatic interpretation. But today, next please, I will focus, because I just have seven minutes... oh, I still have one more, sorry: we also use earth observation for poverty mapping and child deprivation mapping; we received the SDGs award for this in 2023. And for the mixed method, just last year we also received an SDG award for using satellite imagery for agricultural statistics. Next, please.
Yes. Agricultural statistics are fundamental data for policy, especially in Indonesia for inflation control and national planning. But, next please, traditional data collection methods alone are no longer sufficient to meet today's demands: as you know, we need timely, granular and high-frequency statistics. So BPS initiated a transformation a few years ago to integrate satellite imagery, machine learning and other data sources into official statistics production, especially for agricultural statistics.
Traditionally, Statistics Indonesia conducts the area sampling frame survey every month, especially in the last week, to obtain the phenological stage, and thus the harvest and estimation times; and we have a separate sampling frame survey for obtaining productivity. This method provides strong ground truth, but it has limitations: high operational cost, because we have thousands of staff going to the field every month; time constraints; logistical challenges across Indonesia, because some areas are difficult or even dangerous to access; and limited temporal frequency.
So we try to combine satellite imagery, earth observation and machine learning algorithms to predict, next please, the phenological phases. We start from tillage, vegetative 1, vegetative 2 and so on, and we use satellites to predict the paddy area and its phenological phases. Why paddy? Because rice is our main staple. And why Sentinel? Because it is radar-based and can penetrate clouds, so it works well in a tropical region like Indonesia. We can detect planting and growing stages, and hopefully harvesting phases, and this is a crucial breakthrough because it allows continuous monitoring: we don't need to bring our staff to the field, and we can keep monitoring without waiting for a field visit, which is actually difficult. We also do this based on the guiding principles of official statistics. Next, please.
And how is the prediction made? This is actually in collaboration with several agencies and ministries in Indonesia, and we also collaborate with UNSD, UNESCAP and agencies such as FAO. In Indonesia we collaborate with BRIN, our national research agency. Based on the Sentinel images from BRIN, every 12 days we check the area sampling frame segments and predict the area of each phenological class, and then we estimate the area of all phases, particularly the harvested paddy, using the sampling design, including the relative standard error. Next...
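A deliberately simplified sketch of the two steps just described, with synthetic features, hypothetical phase labels and a crude variance formula standing in for BPS's actual design-based estimator: (1) classify each segment's phenological phase from radar time-series features, (2) expand the classified sample to an area estimate with a relative standard error.

```python
# Step 1: classify phenological phase per segment from (synthetic)
# backscatter features at several 12-day passes. Step 2: expand to a
# design-based area estimate. All data and weights are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

X_train = rng.normal(size=(800, 6))          # hypothetical VH backscatter series
y_train = rng.integers(0, 4, size=800)       # 0=tillage .. 3=harvest (phases)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

X_segments = rng.normal(size=(300, 6))       # current month's sampled segments
phase = clf.predict(X_segments)

weights = rng.uniform(50, 150, size=300)     # inverse inclusion probabilities
seg_area = rng.uniform(5, 25, size=300)      # hectares of paddy per segment
harvest = (phase == 3).astype(float) * seg_area

total = np.sum(weights * harvest)
# Simplified SRS-style variance of the expanded total, for illustration only.
var = np.var(weights * harvest, ddof=1) * len(harvest)
rse = 100 * np.sqrt(var) / total
print(f"estimated harvested area: {total:,.0f} ha (RSE {rse:.1f}%)")
```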
>> Just to let you know, you've got about a minute left.
>> Yes. Next, please. Yes. Because Indonesia is quite big, we also do ground checks in selected areas, and in general most of the provinces have accuracy above 80%. Of course, there are still areas below 80% accuracy. Based on the estimation, the new method demonstrates the capability of capturing harvesting patterns similar to the current official data from the area sampling frame survey, and we have high accuracy in some areas. In areas with low accuracy, we try to fine-tune the model. Next, please. The last one. Yes. In terms of product delivery, this work has become part of the UN handbook on remote sensing for agricultural statistics, together with Lorenzo and Ronald from the UN. If you want to know more detail, we also have knowledge videos to share, and a publication, though it is still in Bahasa Indonesia; later we will also publish it in English. Next.
Is this... ah, yes: last year we received an award for this breakthrough, a GEO award for modernizing agricultural statistics using satellite imagery for phenological stage detection. I think that's all, because I just have seven minutes. Thank you very much for your attention, and I'm looking forward to a fruitful discussion and your comments.
>> Thank you. Thank you. Seven minutes, but there was a lot in there. Fantastic. Okay. Maria Luisa and Markus, are you here?
>> Yes.
>> Yes.
>> Can't see you yet.
>> Can you see me? I'm here.
>> I can hear you. Can't see you yet. Is your camera on?
>> Yes, it's on. Interesting.
>> I can see Markus now. And now I can see you as well, Maria Luisa. Great. Okay. Fantastic. Setia, you should keep your camera on, because I am going to bring you into this discussion as well. So don't disappear. Right.
>> I'm here.
>> Okay. So, Maria Luisa, I'm going to go to you first. What work has the Brazil regional hub been doing to enable more use of AI-derived statistics using big data to fill the gaps, and how do you determine which use cases to investigate?
>> Well, thank you very much for the invitation to participate in this important discussion. It's truly a pleasure to be here, especially with my colleague Setia from the regional hub in Indonesia. I think Setia's example clearly illustrates the potential of AI-derived statistics to fill critical data gaps, particularly in domains where traditional surveys face limitations in cost. However, I think it also raises an essential question for NSOs: under what conditions can AI-derived outputs be used responsibly as official statistics? From the perspective of the UN Regional Hub for Big Data in Brazil, this is one of the central questions guiding our work across Latin America and the Caribbean. Our recent consultation with 21 countries showed that almost 40% of the NSOs already have AI initiatives integrated into their production processes, most commonly in the processing phase, to improve data validation, cleaning and efficiency. This confirms that AI is already strengthening statistical production, but scaling its use to fill statistical gaps requires clear and robust criteria to ensure quality and trust, and supporting countries in operationalizing these principles is a central priority of the regional hub. One example is our regional project with 11 countries to develop climate change indicators using satellite imagery and AI through a shared cloud-based platform. This initiative enables countries to co-develop validated methodologies, to share infrastructure, and to ensure consistent quality standards. In parallel, through our training activities with the UN Global Platform, we are helping countries access cloud-native data science infrastructure and build the capacity required to validate and deploy these methods responsibly. This work also helps identify priority use cases: we focus particularly on domains where there are clear policy needs, where traditional data sources are insufficient, and where alternative data sources can complement the existing statistical systems. What we have seen is that AI does not replace statistical systems; it strengthens their ability to fulfill their core mission. By combining strong validation frameworks, transparent methodologies and shared infrastructure, NSOs can responsibly use AI-derived statistics to fill critical gaps while maintaining trust and statistical integrity.
>> Thank you. Markus, I've got a question for you. I think, from Setia's presentation, there are two aspects here that take us away from the traditional production of statistics. One is that these sorts of big data are not data that NSOs have traditionally been used to working with: traditionally we tend to work with data we either collect ourselves or with government administrative data. So this big data, first of all, is not our data. And then, using gen AI tools is not part of the standard skill set of statisticians in NSOs. So how, as a leader, do you set a culture that allows for innovation and encourages the use of big data and new approaches?
>> Thank you, Osama. May I start by noting that the use of AI in the context of geospatial data is extremely natural, in the sense that satellite observations, and also other forms of data collection where the physical location of the object is measured, because that can be done without satellite information, produce huge data sets, so it's very natural to ask whether we can somehow automate the process of using the data. You also mentioned an important aspect here: even Statistics Finland is not really a big user of satellite observations, because we have access to other big data sets that include location information. One aspect in which the nature of satellite imagery and of administrative data is exactly similar is that we are not able to affect very much how the data is collected. We have to cope with data that someone else has produced; we simply don't have the resources to force other data collectors to collect exactly the information we need. We have to cope with the already existing data. And this is basically my first question to our speaker today: how much of the work you describe is actually done inside the statistical institution, and how much outside, in other government agencies? And secondly, how is all of this financed: who is paying whom? Is this data expensive, and is the accuracy of the data high enough? I've understood that the data where the granularity is high is free, or almost free, very cheap, but the more information you want, the more you have to pay. How have you handled this cost part of the story? Do you have a budget for compensating for this data, or is the data given to you free? And finally, everything is a question of what our organizations are able to do. It is actually a big process of learning and of changing organizations. I believe this consists of the right combination of planning and anarchy. By anarchy I mean, in a sense, that every person has a side of creativity, a side of research attitude, and there are a lot of things you have to experiment with. But to get what people invent and learn into production needs very normal structures: you have to set targets, you have to set timetables, you have to define your corporate policies. So it's a kind of balancing of these different aspects, and basically I feel that my role is to try to generate an environment and a culture where people can freely start to think: perhaps I can use AI in my everyday work, perhaps I can try it here or in some other place. But at the same time, I feel that I have to start to ask: is this in production, or are you just dreaming about it? So it's a combination of encouragement and of gradually moving towards demanding results. Thank you very much.
>> Okay. I have to say I'm all in favor of a bit of anarchy. And you raised an interesting point around costs and financing, particularly with lots of this data being collected or generated by private companies; not all of it, but a lot of it is. And, I think this is for another discussion somewhere else at another time: what are the things we have to do to make this data a public good? But that's for another discussion. There's a bunch of things there that you both raised, and I'm just going to go back to Setia for a second. Setia, are you there?
>> Yes, I'm here. Can you hear me?
>> Okay. Oh, there, your camera's off again. I can't see you. So, you're also heavily involved in training through the university. I'm interested: what have you been doing to ensure statisticians have the skills and technical knowledge to use these new forms of data, machine learning tools, gen AI tools? What have you been doing on the training side? Because that's actually quite important, I think.
>> Okay, thank you. Can I just respond to Markus and Maria Luisa first, and then respond to you? Thank you very much, Markus, and also Maria Luisa; it's a great pleasure to have all your comments. Just a quick response to what was raised by Maria and by Markus about how it works, the funding, the team, the planning and so on. For example, on the cost of the data sources: big data is indeed not free. For mobile phone data, for example, we have to pay, in collaboration with our mobile network operators; but satellite data we can obtain for free, because Sentinel is free, though again in collaboration with our research agency, which makes the satellite imagery fit our needs. And as to whether it's only BPS: in the beginning the challenge was not only technical capacity but also collaboration between ministries, for example the ministry of planning and the ministry of finance, which relates to the budget Markus mentioned, and also the ministry of agriculture, the ministry of research, and so on. In the first years, and this started about three years ago, we tried to collaborate and to come to one single vision: that we can use satellite imagery as part of our national official statistics. Right now it is still at the research stage, but the mixed method using earth observation is part of the big data utilization roadmap of BPS-Statistics Indonesia, and hopefully it will become official this year; that is actually why I was appointed director, to make sure it can be used as official statistics. And then, on what you mentioned, Osama, about training: luckily we have Politeknik Statistika STIS, which also hosts the secretariat of our regional hub, and, like what Maria has in Brazil, we have a school where we not only produce young, talented data scientists and statisticians, but also, together with the BPS training centre, conduct regular hands-on training workshops covering data management, machine learning and AI topics, including gen AI, responsible AI, and so on. We also organize knowledge-sharing programs in the statistical community, including government institutions, academia and practitioners. This hopefully can accelerate collective learning and promote good practice in AI and official statistics in Indonesia. That is just to respond to your comments, thank you.
>> Right. I think we've got time for one or two questions that have come up. I think this one is going to Setia: how do you do the validation, the ground truth validation? This is a question from anonymous in the Q&A, their second question.
>> Maybe it's my connection. Let me check.
>> Okay. How do you do the validation, the ground truth validation?
>> I lost the signal here. Can you repeat, Osama? I'm sorry.
>> No worries. So, with mixed methods, with these sorts of data, especially earth observation data, there's some ground truth validation that needs to be done. How have you been doing that?
>> Oh, yes. So we select several areas of different types, like coastal areas and mountain areas, to see the situation on the ground. We send staff to those areas to check whether our prediction is actually happening in the field. That is how we observe that most areas have high accuracy, but some areas are not that good: for example, mountainous areas that are difficult to access have lower accuracy. We send, I think, almost 100 staff to go to the field and check the predictions. And we have to align the time when the satellite is passing over the area with the time we go to check, because the checking window should match when the satellite is moving over that area and when we go to the field, Osama.
>> Okay. Much as Eric is a friend, I'm going to leave his question, because you can always get in touch with Setia directly, even via me; we'll pick up Eric's question separately. I've got a question for Maria Luisa, which is kind of similar. Again, it's around skills: what have you been doing, and are there any specific things you think are important, in terms of building skills for using this sort of big data and these techniques? I know you've been doing quite a bit. Anything you think is particularly important?
>> Yes, there are some aspects that are important. When we think about the big question, under what conditions can AI-derived outputs be used responsibly as official statistics, we need to think about many skills that must be developed in the staff. Consider that the statistician's role may change, from being the one who makes all of the analysis to being a supervisor of the models. So we now have more focus on the computational aspects, and we try to take this approach in our trainings, and also to define some references to work from. For example, the statistician needs to validate against trusted reference data, because AI-derived estimates must always be compared with survey data and administrative sources; these are the kinds of principles that must guide the production of official statistics. It is also related to how to measure and communicate uncertainty, because AI outputs must include clear information about their confidence and their limitations, and to how we translate theoretical statistical knowledge into this kind of universe. Also, I think there is a change in how we see the models, because, as Setia mentioned, there is the assessment of potential bias and of fitness for purpose. The statistician needs to accept that the models are not perfect; they must be sufficiently reliable for the intended statistical use. And a last point that must be at the core of the training for a statistician: the question of transparency and reproducibility, because the methods must be documented very clearly, and the staff need to be overseeing the model design, the validation and the implementation. So, as the regional hub, we try to focus our support to the countries on operationalizing these principles as a central priority for the new skills that must be developed.
>> Thank you. Markus, I have one last quick question for you. In a world where you allow a bit of anarchy, and new entrants into NSOs are just technically stronger and stronger all the time, for old people like me whose technical skills aren't what they used to be, but who have to supervise this work, what's the capability development people like me need?
Oh, well, perhaps the most important thing is trying to listen to what people are saying. If you don't understand what they are doing, you may ask, and it's also helpful for younger people that you push them to explain what they are trying to do; it helps them to understand what they are doing themselves. I would also emphasize leadership under uncertainty: many aspects of the use of AI are such that simply no one knows what can be done. Then the good policy is to let people try what they want to try, let them create their own methods and ideas, and also let them just play with the tools. We decided to buy Copilot licenses for 40% of staff; 40% was basically a random number, and those 40% of staff now have a better Copilot license and can start to play with it. Playing is a way to create ideas, but quite soon you have to ask people: can you show me that what you have done is actually beneficial for our institution? So it's a combination of free thinking, with planning and disciplined order gradually coming in to shape the playing ground. Thank you.
>> Thank you. Okay, Markus, Maria Luisa, Setia, thank you very much, and we'll move on to the next thing.
>> Thank you.
>> Okay, so...
>> Thank you very much. We're now going to move on and hear about the governance challenges of AI in statistical organizations from the PET task team. Matjaž Jug, from the PET task team and Statistics Netherlands, is going to give a presentation, and Gary Dunnet from Stats New Zealand and Frankie Kay from CSO Ireland are going to be discussing. So, Matjaž, over to you.
>> Yeah, thank you, Osama. Just sharing the screen... oh, it's already there. Sorry. Thank you for the invitation to present the work of our task team. The privacy-enhancing technologies task team has been active since 2018, investigating the role of PETs in official statistics, and since 2022 we also have the UN PET Lab, which is a community of practice doing practical experimentation. So, first of all, I would like to thank all colleagues from the task team and the UN PET Lab for providing input for this presentation. Next slide. Let's start with what PETs are. There are a lot of definitions floating around, but I like this one because it's very simple: privacy-enhancing technologies are a suite of tools that can help maximize the use of data by reducing the risks inherent to data use. This captures that PETs really focus on the data in use, as opposed to traditional encryption, which focuses on other parts of the cycle, and that they basically reduce the privacy and confidentiality risks. Next one. In that respect, some people even talk not just about privacy-enhancing technologies but about partnership-enhancing technologies, because they also improve trust, and in many use cases we need to rely on other partners, such as data providers, researchers, or tech companies, and PETs can help there. And finally, if you click again: they support principles of privacy by design, purpose limitation, and data minimization, all the principles that come with privacy regulations such as the GDPR. Next slide.
Going now to AI, let's look at the definition of an AI model. According to Google, an AI model is a computer program or algorithm. This is a bit misleading, because it's not the kind of algorithm or program we were traditionally used to: it can learn patterns and relationships in the data, which means, next one, that an AI model actually is data, or can contain data, that can be sensitive, if of course it was trained on sensitive data. So we need to treat AI models differently from how we had been treating regular methods in the past. Next.
That means there are a lot of additional disclosure risks that come with AI and AI pipelines. I will just mention a couple of them very quickly. Of course, if we want to train the model on sensitive data, that often includes risks of leakage of sensitive information; then the model itself can leak, and similarly the outputs can leak the data if they are not protected. AI models also enable more complex attacks: prompt injection, for example; like we were used to SQL injection, we now have prompt injection in LLMs. And in some circumstances it is even possible to reverse engineer the whole model, or data can leak in the logs. Next.
Going now to PETs, here I will refer to the definition, or classification, of PETs that we developed as part of the UN task team. Input privacy PETs are, broadly, cryptographic and other methods that provide guarantees that one or more parties can participate in a computation with assurance that the other parties will not learn anything about that party's sensitive data. On the other side, output privacy guarantees that sensitive information cannot be reverse engineered from the outputs. Examples of input privacy are, of course, encryption, where you can work on fully or partially encrypted data, or you can split the data into parts that are not readable on their own but that you can still compute with, as in secure multi-party computation. Examples of output privacy include statistical disclosure control, like k-anonymization, but also newer methods like differential privacy. Next.
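As a minimal illustration of the output-privacy side, here is a sketch of the Laplace mechanism from differential privacy applied to a single count; the epsilon value and the query are illustrative, and a real deployment would need full privacy-budget accounting.

```python
# Laplace mechanism: release a count with epsilon-differential privacy.
# The sensitivity of a counting query is 1, so noise scale is 1/epsilon.
import numpy as np

def laplace_count(true_count: int, epsilon: float, rng=None) -> float:
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

true_count = 1342                      # e.g. households matching a query
print(laplace_count(true_count, epsilon=0.5))
# Smaller epsilon -> more noise -> stronger guarantee that no single
# record can be reverse engineered from the published output.
```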
Now let's go to the examples. I will start with confidential AI. This is based on secure enclaves, and the secure enclave approach was actually used in production in Indonesia, Setia already mentioned that use case, to protect mobile phone data; but it can of course also be used to protect not just the data but also the models, as well as the user's interaction with the model. This approach can be extended into confidential federated retrieval-augmented generative systems, which are more complex use cases because they also protect the knowledge bases, so that information cannot leak from there. So that's a more centralized example, and the next example is of a more distributed scenario.
In that case, within the UN PET Lab we have a project that started back in 2022, with the idea that if NSOs need to collectively train a model on their own data, without wanting to share the training data, they can organize a protocol with federated learning and have iterative model training, uploading the weights that are then incorporated into the new version of the model. Now, federated learning on its own does not provide complete privacy guarantees, so the team also experimented with differential privacy and homomorphic encryption, which are methods for doing that part of the protection. The results were published in a paper. Next.
uh
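What follows is a toy, single-round sketch of federated averaging with clipping and Gaussian noise, to illustrate the combination of federated learning and differential privacy described here; the actual PET Lab protocol, models and parameters are in the published paper, and all values below are invented.

```python
# One simulated round of federated averaging with per-client clipping
# and Gaussian noise added at aggregation time.
import numpy as np

rng = np.random.default_rng(0)
global_model = np.zeros(4)                               # shared model weights
local_updates = [rng.normal(size=4) for _ in range(3)]   # stand-ins for NSO gradients

CLIP = 1.0    # bounds any single client's influence on the model
SIGMA = 0.5   # noise scale: larger sigma, stronger privacy, lower utility

def clip_update(update: np.ndarray, bound: float) -> np.ndarray:
    """Rescale an update so its L2 norm is at most `bound`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, bound / max(norm, 1e-12))

clipped = [clip_update(u, CLIP) for u in local_updates]

# The server averages clipped updates and adds calibrated noise, so no
# single client's contribution can be read off the new global model.
noise = rng.normal(0.0, SIGMA * CLIP / len(clipped), size=4)
global_model = global_model + np.mean(clipped, axis=0) + noise
print(global_model)
```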
>> you got about a minute left.
>> Yeah, with the conclusions. So, by mitigating these risks of data exposure that I explained, PETs can act as trust technologies and allow AI to be used responsibly and safely. Examples of that: we can minimize the data that is needed on the input side, or make sure it stays confidential. PETs can also support purpose enforcement; for example, scripts in secure enclaves have to be pre-agreed, which means they cannot be used for purposes other than what was agreed. Then, as I mentioned, protecting user data, prompts and model weights; enabling co-creation and sharing of AI models; and also more automated disclosure control. So, basically, increasing the trust between the collaborating parties.

And finally, the next slide; I would just like to leave this up. In the task team we are collecting and disseminating case studies from projects that are introducing PETs, and in the new update of the PET Guide we will particularly focus on the data governance aspects of PETs, which includes the role of PETs in AI. Within the work streams we already have practical experimentation supported by the UN Global Platform, and we plan to hold the next open house in April, so we will send out invitations for that. Thank you very much, and I look forward to your questions.
>> Brilliant, thank you. Okay. Gary, Frankie, are you with us?
>> Hi. Okay.
>> So far I can hear you, but I can't see you yet.
>> Okay. I can see me. Can you see me yet? I think it's just taking a little time to get from Cork.
>> Yep.
>> Okay.
>> Matias, can you see them?
>> Ah, okay. That's interesting. All right, well, let's go anyway, because Frankie and I are very good friends, so I know what she looks like, and Gary's a very good friend, so I know what he looks like, even though I can't see either of them right now. I don't know what's going on; everyone can see Frankie apart from me, apparently.

Frankie, look, Matias has talked about AI models and some of the risks. They are able to parse large-scale data very quickly, repeatedly, and in variations, and this raises issues around confidentiality and disclosure. I think we're quite used to addressing these issues with the data and the statistical methods we have used, but with these emerging technologies the world's changing. So, at a senior leadership level, how do you get an organization to address such issues?
>> Yeah, thank you, and an absolutely fascinating presentation from Matias; I certainly learned a lot from it, so thank you very much. Matias has made some really interesting points in terms of whether there is potentially a real shift in our relationships with stakeholders. Although he was talking about building trust, it's almost saying: actually, don't trust anybody, including, and maybe particularly, our tech providers. As somebody coming from the technology side, I've been thinking a lot about cyber attacks, and of course zero trust is one of the bases for that, so maybe moving it into other areas, in terms of relationships, is perhaps an absolutely sensible thing to do.

But I guess one of my reflections on what Matias was talking about is that those sorts of technologies and techniques are really interesting, but they're quite expensive, and they can be quite difficult to implement using scarce resources. And lots of NSOs are facing funding challenges. So, whilst technology absolutely plays a part in this, I think it's only part of the approach to the challenges you were mentioning. And I think it's interesting what you say: has it changed a lot? Is this a mind shift for our staff in terms of how they interact with data and these models? To me, we've always had to protect our data. So how much is it really a change, at a high level, for NSOs? In a way, many of the issues around confidentiality and disclosure are not necessarily that different from the challenges we've always faced in using data, but of course the scale is completely different, and therefore the approaches that Matias was talking about are absolutely necessary.

And I was very interested in what Renjun was talking about in the first presentation, about some of the less technology-centric processes, such as making sure that we're benchmarking and retraining to avoid model drift, and so on.
>> So I suppose my point is that we can't just rely on technology; we need a holistic approach to governing AI. And we will have some of those aspects already in place. I think somebody else also mentioned the codes of practice that we have, such as the UN Fundamental Principles of Official Statistics and the EU's European Statistics Code of Practice. They're still really relevant in terms of providing the foundations for how we govern the use of GenAI within statistics. We've still got to maintain trust with our stakeholders. In particular, in this area, it's about maintaining the trust of the people and the companies that are providing their data to us, whether that's directly through surveys or, in this case, indirectly through the training data that's in these models. So we've got to be transparent about what data we collect and what we're going to do with it. We need to continue to make sure it's representative and without bias, and those challenges are equally applicable to all of the data we use, as well as to GenAI models. And somebody also mentioned that data quality is at the heart of this; it does become more challenging in the world of LLMs, but we do have the skills to review and analyze it, and again there are similarities we can learn from admin or big data.
I'm leading an EU project on AI and machine learning in official statistics, and one of the work packages is providing an amended quality model for machine learning, for example, based around the total survey error model. I think we're going to need to continue to have humans in the loop, or on the loop, and that's going to be a really important part of our quality assurance for quite some time. We need to continue to think about ethics: just because we can doesn't mean we should, and that reinforces the need to be transparent about what we're doing. And we have to think, in our organizations, about the legal frameworks in which we're operating; in the EU we have the AI Act, but that has to be read alongside GDPR and the Data Governance Act. I suppose some NSOs are also taking on a role of data stewardship, working with the organizations that create the data that are perhaps used to train these models. But it's extremely difficult to do that once you get beyond the public sector. And this actually might mean us expanding into engaging with tech companies, to try and help make sure we're keeping our data safe and protecting privacy and confidentiality, whilst making sure that the training data, and the answers these models come up with, are at the quality, the standard, and with the citations that we expect within the world of official statistics.

I'll be honest, it's quite a sensitive topic at the moment, as countries are looking to become less dependent on the current large tech companies. There's a lot of discussion around data sovereignty and data repatriation, and that feeds into these conversations around governance over AI. And, to be honest, I don't think NSOs are particularly good at this sort of engagement, and therefore working as a community, for example the UNECE project on AI readiness, the work that the OECD has been leading, the work the World Bank has been doing, so has the IMF; really working together as a community is, I think, particularly important. But many of those aspects I've talked about are based on being able to trust those relationships: trust the providers of the data, the users of our data, and the suppliers of tech. So we need to continue to have skills in building relationships.
But if we're moving more towards that "don't trust anyone" stance, then probably a lot of those techniques that Matias was talking about are likely to play a part. So I think we're going to need to continue, within our organizations, to think about the governance, the processes, the people and those sorts of skills, as well as the technology. So, as I said, in summary: we're asking our organizations to do much the same at a high level; we have to have strong governance and ethics to protect the data we use, and explain all of that to our stakeholders. But technically how we go about that is clearly changing, and potentially the way that we interact with suppliers and users of our data is also changing. So maybe it's about trusting no one to start with, and building those relationships, and that's what, as organizations, we need to prepare for.
>> Thank you. So, Gary, I actually agree with Frankie that at a high level there's nothing new here; we've always had concerns about confidentiality, data security and things like that. But as these new big data sets start being used, and new tools, GenAI tools and things like that, there are other technologies, like PETs, that we can use to help us. From your methodological background, how do we actually gain a better understanding of these technologies and tools, so that we have good quality assurance of our outputs and can explain them?
>> Yeah, so it's an interesting challenge, and in some ways it is new, as we develop new methods as we go. But, as Frankie touched on a little, there is also the existing practice: we've been using models for some time. Imputation is a model, and we've been using different levels of sophistication over the years, but this is taking it to a whole new level. I think it was interesting, as well, when Ashwell gave his introductory comments: we've heard a lot about trust, and I think that's a really key aspect that we've got to keep. But when I'm sitting down and thinking, from a methodological point of view, about how we release this, it goes back to: can the result be explained? You use a thing called the sniff test: does it smell right? And you get the methodologists to actually run some simpler models and see how they align, the calibration, and make sure that the data, the results and the context are accurate. When you do research, you have a number of processes that you follow: what are your validation objectives, how are you going to validate the data sets, what are the appropriate measures and detection methods, how are you going to do validation experiments. And they still hold today, in the use of AI. So I encourage my methodologists to just apply their normal research techniques to these methods. And of course we've got aspects around bias.
We can start using more what-if analysis and simulation models, different flavors of it, like prompt testing: thinking about whether we can alter the test data set to see what might happen, and thinking about scalability and what will happen there. And then, of course, Frankie touched on the fact that ethics and fairness have to be considered as well, and I think that's a growing area that methodologists perhaps haven't thought about enough in the past; we've got to become a lot stronger in that, and we need to think about data sovereignty. So it's interesting, because I think some of our existing quality frameworks are still highly relevant; we just have to apply them through a different lens.

The final few things I want to talk about: when we're going to introduce new models, I really do focus on how we are going to monitor the model on an ongoing basis, on what parameters or expectations we would expect to see, and on what happens if it goes a little bit astray. And then the final thing is becoming a whole lot more transparent about the use of models and what we're doing. As part of the work we publish, I'm really pushing hard for this: if we do analysis using simpler models, actually publish those as well, so that end users can look at our simpler models and think, oh, actually, can we use that, or how can we apply it as part of our regular monitoring of the final data. So, yep.
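As one concrete form of the ongoing monitoring Gary describes, here is a minimal sketch that compares current model scores against a baseline using the population stability index (PSI); the data are simulated and the 0.2 alert threshold is an illustrative rule of thumb, not anything from Stats NZ's actual practice.

```python
# Toy model-drift monitor: flag when the distribution of current model
# outputs shifts away from the distribution observed at deployment.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two score samples."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b, _ = np.histogram(baseline, bins=edges)
    c, _ = np.histogram(current, bins=edges)
    b = np.clip(b / b.sum(), 1e-6, None)  # avoid log(0)
    c = np.clip(c / c.sum(), 1e-6, None)
    return float(np.sum((c - b) * np.log(c / b)))

rng = np.random.default_rng(1)
baseline_scores = rng.normal(0.0, 1.0, 5_000)  # scores at deployment time
current_scores = rng.normal(0.4, 1.0, 5_000)   # this month's scores, drifted

score = psi(baseline_scores, current_scores)
print(f"PSI = {score:.3f}")
if score > 0.2:  # common rule-of-thumb threshold for "significant shift"
    print("Alert: distribution shift detected; investigate or retrain.")
```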
>> Thank you. Matias, I have a question for you. You talked about all this in the context of AI models, but Frankie picked up on something: data stewardship of other data, not just data we collect or generate ourselves, is becoming more and more important. I think there's a possibility we'll see NSOs thinking about becoming national data offices, or data platform organizations, and we may move to what my good friend Jeff Bulby of StatCan refers to as a wholesale model, where really what we're doing is making data available, whereas our current model is very much a retail model, where we produce the product, the statistics. In a world where we move to more wholesale data provision, do PETs become more important, and if so, why?
Yeah, I think it's a very good question. First of all, we shouldn't see PETs as a kind of silver bullet for all problems with data protection and privacy protection; we need to look at them in the context of organizational measures as well, because you need to put measures in place, and some are better as technical or technological measures and some are better as organizational measures. But in this context of the data stewardship role of NSOs, what we can expect is that the data governance arrangements in those hubs, or data spaces, or these new data ecosystems, will vary depending on the purpose of use of the data and the partners in the data ecosystem. And I think that PETs offer you a possibility here to balance risk against usability, and to fine-tune that to the particular use case or the particular type of data arrangement. I also think that NSOs that have the ambition to become data stewards should invest in skills in PETs. It's not that we would need to develop these cryptographic methods ourselves; they are already developed by other organizations that are much better suited for that. But we need to know how to apply them, and when, and in which cases, and that will require skills. And these skills are similar to other situations where we also need good skills in managing metadata, data standards and interoperability, if we want to be in this position of data stewards, in a broader role of the NSO.
>> Okay, brilliant.
>> Osama, can I give a little bit of an example where we've had some experience? It's a pretty simple example, and I'm sure many agencies have done it, but we were bringing together telco data, working with the three suppliers of telecom services in New Zealand, and we wanted to bring all the data together. They were obviously very cautious about giving us data, because if we slipped up and published data, their competitors would see their reach and what have you. So the way we approached that was by building a partnership with each of the three telcos and actually getting them to confidentialize the data, and then ship the confidentialized data through to us. That gave them a very strong assurance: even if we did slip up, there would be no confidential data to give out, because it wasn't confidential in the first place. Now, what was interesting there was that the telcos themselves actually weren't that strong at confidentializing data, because they're all about billing and making sure the connections work. So we actually ended up embedding some of our staff in their data systems, to create the data sets we were wanting and to demonstrate and talk them through the confidentialization methodologies. A very simple example, but it speaks to strength, to data governance, to moving beyond our boundaries. And I think that's quite a powerful model to think about as we go forward. So, yes, just an example I wanted to share.
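As an illustration of the kind of confidentialization this can involve, here is a minimal Python sketch of aggregation with small-cell suppression before data leaves the owner; the threshold, regions and records are all hypothetical, and the methodology actually used with the telcos was certainly richer.

```python
# Toy disclosure control: aggregate record-level data to coarse cells
# and suppress any cell below a minimum size before sharing.
from collections import Counter

MIN_CELL = 10  # cells smaller than this are suppressed before release

# Hypothetical record-level data: (region, subscriber_id) pairs.
records = [("North", i) for i in range(120)] + [("South", i) for i in range(7)]

counts = Counter(region for region, _ in records)
safe_output = {
    region: (n if n >= MIN_CELL else "suppressed")
    for region, n in counts.items()
}
print(safe_output)  # {'North': 120, 'South': 'suppressed'}
```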
>> Just for what it's worth, I've been thinking for a while, with our work, that a partnership model with private sector providers, where you work with them, has worked much better for us than a commercial model where we've tried to procure data from some commercial provider. Okay. Look, I've not been the greatest moderator, because we're a bit behind time. Matias, there are a question or two for you, mainly about joining up PETs, and I think something about how you get skills; can I leave those with you to maybe pick up separately, as I move on to the next section? So, for the next section we have Rohit Badwaj from the national statistics office of India, although I suspect he's still very busy right now with the AI summit that's going on in India, and he's going to talk about the road map for moving from ad hoc pilots to a mature, institutionalized data science operation within an NSO. Rohit, are you here?
>> Yes, I'm here. Maybe the camera is taking some time for me to come up on the screen, but I can start, you know, in the interest of time.
>> Okay, go for it.
>> Okay, thank you. Thank you, Osama. So yes, it's an exciting time in India, with the AI summit happening; really exciting. I'm just going to present what we have been talking about. We have all been talking about the pilots, all the work everybody is doing, but the ultimate goal is to make it happen and bring everything into production. Can we go to the next slide, please?

Yes. So this is how I'm going to present this. I'm going to talk about how we did it: how we created a collaborative framework, how we strategically pivoted from passive to proactive, what our governance model has been, and how the journey has gone so far. I won't say the journey is complete; there is still some way to go. But yes, I'll try to put up a presentation on that. And then I am also going to talk about the impact, as the AI summit is also about impact: what type of impact this has had on our ecosystem, in terms of use cases and other KPIs. So, next slide, please.
Yes. So, traditional barriers: I'm not going to delve much into them; we all know about them. One important point is that we feel it's time for everyone to start using AI, so that the gap between those who are using it and those who are not does not become so wide that it is impossible to close. Our vision is to leverage collaboration, and collaboration is key here: leverage collaboration to accelerate AI adoption for quality data and statistics for our national development goal. We call it Viksit Bharat by 2047, when we'll mark 100 years of our independence. Next, please.
So this has been our approach to data innovation. First, identify use cases: if you don't have the use cases, nobody is going to believe that anything is possible using AI in official statistics. Do some in-house PoCs to find out what is possible and what is not. Do research on best practices and feasibility, on how the world is taking up the various things you plan to undertake; those are very important points. Then collaborate, and this is key to our whole effort of innovation: actively collaborate with academic institutions, multilateral agencies and non-governmental partners. I'll go into detail as I move forward. And last but not least, document each step: document the AI use cases, document the AI readiness framework. We have documented both of these things; we have documented use cases, the readiness framework is also there, and of course the working papers and everything else can follow. So our basic approach is: experiment, scale, and then govern. Next, please.
I'll talk a little bit about the collaborative framework we have adopted in our innovation journey. We call it a triple helix model; it's a well-known model where government, academia and industry, including startups, all come together and work to achieve a goal. Additionally, we also brought in researchers and students, by way of hackathons and other extension activities, which I'll talk about as I move forward. So collaboration is the cornerstone of whatever success we have achieved so far. Next, please.
Let me just discuss some of the journey we undertook in our innovation. We started sometime in 2022, and there were a lot of delays in publishing the guidelines. The ownership of the whole innovation activity was not clear; there were not many use cases that the business areas agreed on; and whenever we approached anyone for collaboration, they asked us: what is in it for us? All these questions dragged us down, and for one and a half years we remained in a passive state. Then we decided to become strategically more proactive. Our senior management reached out to different partners and convinced them to become collaborators in our journey of innovation. We started working on transparent criteria for partnerships and for assigning projects, and, lastly, on direct academic-industry partnerships. This is very important, because not everything can be done by the statistical office itself; it has to have partners who are well versed in the technology and can do it in a much better manner. Next, please.
So these are some of the elements. The governance model is written out: we have a council headed by the head of the NSO, who is the secretary of our ministry, with experts from outside, transparent project selection criteria, guidelines based on trustworthy AI, and regular review; adaptive governance is key to all this. Some of the best practices that we have found, over time, to be very important: AI champions need to be identified and encouraged to be very proactive in building partnerships; prioritize knowledge sharing; attract specialized AI talent; secure dedicated funding. These are things which, as we moved along our journey, have played a very crucial role in getting our projects started and keeping them sustainable. We followed the EPIC framework, again a well-established framework: education, partnership, infrastructure and commercialization, commercialization in the sense of productionization. Next, please.
So this is our journey; it's all on this screen, and I'm not going to read it all. We started in early 2022, spent almost one and a half years in a less active state, then started properly in July 2024, and by July 2026 we have 12 use cases underway in some form, with five already deployed in production. I'll request all of you who are here: please visit our website; there is an AI use cases section available in our offering. Go there and experience it yourself; I'll be very happy to get any feedback. Next, please.
So these are the outcomes in numbers: 12 active use cases; the number itself speaks. The best part is that we have been able to collaborate with 18 institutions, voluntary organizations and different partners, and we have been able to engage more than 8,500 students, especially by way of hackathons and by connecting with these institutions. As is written there, we have worked with IITs, we have worked with startups, we have worked with voluntary organizations; all the names are there and I need not read them. One thing I did want to tell you is that we have been able to create an MCP server for our statistics, which has really been lapped up by the community; the tweet announcing it has more than four lakh, which is about 400,000, views. Next, please.
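For readers curious what such a server looks like, here is a minimal, hypothetical sketch using the FastMCP helper from the official `mcp` Python SDK; the tool and the indicator values are invented for illustration and are not NSO India's actual code, which, as noted later, is public on their GitHub.

```python
# A toy MCP server exposing one statistics-lookup tool to MCP-capable
# AI clients; run it and the tool is served over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("official-statistics")

# Placeholder indicator values; a real server would query a data portal.
INDICATORS = {"cpi_yoy": 5.1, "unemployment_rate": 4.2}

@mcp.tool()
def get_indicator(name: str) -> float:
    """Return the latest value of a published indicator."""
    if name not in INDICATORS:
        raise ValueError(f"Unknown indicator: {name}")
    return INDICATORS[name]

if __name__ == "__main__":
    mcp.run()
```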
So this is our workflow, and this is important. Quickly identify the use case. Do rapid prototyping, and don't take too long on it, because otherwise people will not believe that something is possible. Evaluate and validate, and then decide which ones are fit to be scaled; then make your institution ready with the capacity, and have a framework for it if possible; and then put it into production and deployment. Maintenance is something we have outsourced; we are working with one of our partners to maintain and orchestrate the entire thing. Next, please.
Yeah, some of the use cases; you could say all the use cases. We have a notebook LLM tool corresponding to them, available to us. We have the MCP server there; it's available on our website, and the details are there. We have the NIC code classifier, again available live on our website. Semantic search for the data portal is live as well: you go to our data portal and the AI search is there. Next, please.
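As a flavour of how such a portal search can work, here is a minimal sketch assuming the sentence-transformers library and its all-MiniLM-L6-v2 model; the dataset titles are invented, and a production pipeline would add indexing, chunking and reranking on top.

```python
# Toy semantic search: rank dataset titles by embedding similarity to
# a query, so matches are by meaning rather than keyword overlap.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

datasets = [
    "Consumer Price Index, monthly, by state",
    "Periodic Labour Force Survey microdata",
    "Annual Survey of Industries summary tables",
]
dataset_embeddings = model.encode(datasets, convert_to_tensor=True)

query = "inflation statistics"
query_embedding = model.encode(query, convert_to_tensor=True)

# "inflation" never appears in the titles, yet cosine similarity in
# embedding space still surfaces the CPI dataset first.
scores = util.cos_sim(query_embedding, dataset_embeddings)[0]
best = int(scores.argmax())
print(datasets[best], float(scores[best]))
```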
And then we have a website chatbot as well, and a few more use cases are in the process of being piloted, or are at the PoC stage. The idea is that in the next six months we create a data-as-a-service platform with the help of AI. Next, please.
So these are the lessons learned; it's all there. I'll only make one point: active engagement is always better than passively waiting for something. And in NSOs, because we are publicly funded, it's our responsibility to create a safe place to fail, because that's very important, especially for innovation practices. As for the way forward, how we intend to move: we'll keep expanding the collaboration; we will create an AI readiness framework for data (by the way, we have one, and it has also been selected for the upcoming ISI conference, I think sometime in May); and lastly, we are going to make all the code bases public. Our MCP server code base is already public on our GitHub; anybody can go and raise a pull request. Building a replicable model for other national statistics offices is our ultimate aim. Next, please.
So this is, in a nutshell, a picture, again generated by AI, which shows the innovation journey of NSO India. It's all there; I thought I'd put it up last, so that if there's a discussion it can be helpful. Thank you very much, and I'm ready for any question or discussion.
>> Brilliant, thank you. Okay. Ivan, are you here with us?
>> Great.
>> Okay.
>> Yes.
>> So there was an interesting comment there from Rohit about trying to set up a model for other NSOs, which is one aspect of working in the open. There's been a lot of talk about working in the open: open tools, open everything. Why do you think there's a lot of talk about this, and why is it important?
>> Yeah, thank you, and it's been fascinating to follow the presenters and the discussions. I think openness is a central issue as we move forward when it comes to AI, because openness is about trust. As we all know, as statistics offices our value has very much been about transparency, reproducibility and accountability; that's really where we've derived our legitimacy in official statistics. So even as we embark on integrating AI, and referring back to the idea of institutionalization: for sure, that's already happening; it's no longer a question of whether to institutionalize, it's more about how we do it in a deliberate, responsible and sustainable way. And that's why this key question you raise about openness in how we do things is so important. Specifically, we could think of it as beneficial to all of us in a couple of ways. One is that with openness we are able to get feedback from others on the things we are doing and the models we are developing, which obviously helps us get better; but also, with openness, we are able to share and learn from each other. And the other thing is that with openness we are able to build trust with the users of our statistics. So I think openness is definitely a fundamental issue here, one we should all keep in mind and be able to integrate. But thinking of some of the ideas around the reuse of models and the like: in most of our statistics offices, this being a new area, we can all agree that some of the ideas, like the ones India and the others have presented, would really be beneficial to adopt, given the limited skills or competencies in statistics offices. I think openness also brings that opportunity, and through the regional hubs that we have, like the one in Africa, that's one of the things we aim for: that what we do in Rwanda, or what is done in another country, can be shared. So openness is really a critical one, and I hope we all adopt it.
>> Excellent. Rohit, I have a question for you, if you're still around. Okay. So, I regularly say that prototyping is easy but implementation is very hard, and if I looked at the workflow diagram that you showed, it would suggest it's easy. I'm not quite sure it is so easy. What key lesson do you have for us, to make the journey from prototyping to implementation in business systems easier?
>> Okay, thank you, Osama. Three points, basically. Find your use case and do a quick pilot on it. You need to quickly work with people who have the skills; not every skill exists in an NSO, so you need to find partners who are ready to work with you, be it through procurement or otherwise, and start working with them. Then show it to the business or the domain experts, and once it helps them, bring those people on board. It's not always the case that the domain side is already on board; at times they need some evidence to get on board. Once they are on board, ensure that there is a time limit before you produce a minimum viable product.
>> Sure.
>> And for that, when we were doing it, all our models were open-weight models, but in my opinion we should not wait; whatever suits anyone should be the approach. Produce something which can be shown to senior management as something viable, and get the trust of senior management on your side. Once you have senior management support and something to show, I guess everything else falls into place.
>> Which actually links back to something Marcus said earlier: when you show senior management, they're only going to be interested if they can see that there's an impact here, a potential impact, and not just that the work is interesting. Going back to the theme of openness: you've talked about openness in terms of transparency and sharing. To what extent are you, at the National Institute of Statistics of Rwanda, using open tools like GitHub? I think more and more NSOs are putting the code they use for the various things they develop onto their GitHub repositories. To what extent are you using that, and also putting your own code out onto a GitHub repository?
>> Yeah, not to the desirable extent. I think this is something we have just started; the things we're doing are still mostly in the pilot phase. We've been open to collaborating with different partners, like the office for national statistics of the UK or the World Bank, where we've opened up some of our projects to them for feedback and validation. But the idea is what you just touched on: at the point where we feel some of these things are meaningful and good to go, we open them up through the regional hub. And we're working closely with the Africa statistics office in Addis Ababa, so that we reach the level where some of these projects are made more widely available. But so far, what we are doing is through workshops; we've had several workshops where we invite colleagues from across Africa, and they come and see what we're doing, and we're able to share more from a practical perspective.
>> Okay, brilliant. Because I've been a bad moderator and run over time, I'm just going to give you some of my views on what we've heard today. We have new, large data sets and new tools, which means there is more and more that we can do, but that requires both capability building within NSOs and more collaboration between NSOs, while adhering to those fundamental principles that NSOs have always adhered to; all this new big data, new technology and AI doesn't actually change those fundamental principles, to me. And that gets back to what Ivan has been saying: openness and sharing all help, and I think there is something that UNSD can do there. Which is perfect, because I'm now handing over to Ammer to say some closing words.
>> Thank you. Thank you very much, Osama, and great to see you again, and many of the colleagues here. Let me assure you that you are very far from being a bad moderator; you've been a fantastic moderator, and you led this with effectiveness and efficiency. So thank you very much, and thank you to all the colleagues and speakers for their great and valuable insights; I'm really grateful for that. Allow me also, very quickly, to thank Luis and my team, Maria and Clarence, for supporting this webinar. Clearly, I'm not going to try, or even attempt, to summarize any of these discussions, but allow me just to make three points from the UNSD perspective on this fantastic webinar.

The first point is really about transitioning from production to readiness. In this webinar we heard from all the speakers their insights about the tools of production and how to develop the AI systems needed for such production, and hence many of the discussions seemed technical; but clearly, and inevitably I would say, the speakers and the discussions also delved into important policy issues. I can try to enumerate some of them, not exhaustively: issues of human capital, and the managing of human capital, came up; issues of collaboration, and here I think south-south cooperation came through very strongly, with the examples of the regional and global hubs; and issues of quality assurance, cost and financing, privacy, and a holistic approach to governance in AI came up as important as well. All that points to the importance of this work and of the CEBD's role in, hopefully, bringing things together and connecting the dots. And again, I'm grateful to Ashwell for his presence and for opening this webinar, but more importantly, I think it proves that this webinar provides a fantastic preparation and foundation for the Friday seminar on the 27th, where I hope to see many of you, because building on this webinar and the tools of production we discussed here, we can move to a meaningful discussion on the demand side of AI readiness. So that's my first point.
My second point is that we at UNSD are actually trying to walk the talk. We are fascinated by what's happening, but we are also trying to modernize our own internal operations, testing new data engineering practices assisted by the use of AI, such as LLM technologies, and sharing our experiences with our communities of practice. One example is that we are deploying an AI-driven knowledge management system to analyze capacity gaps and ensure our technical innovations meet real demand in the global statistical community, and we look forward to sharing this with you in more detail at another opportunity.

My third point is that we at UNSD remain focused on supporting our member states through innovation. We continue working with our partners to ensure that official statistics are machine-actionable and interoperable through the UN Global Platform, to which many of you have contributed and continue to contribute, and we're again grateful for all those contributions. We are moving beyond theoretical training to providing sandboxes that allow NSOs to overcome local IT bottlenecks and access cloud infrastructure and big data tools as a service. We're also operationalizing these principles through the UN system Data Commons, a federated architecture that transitions our data from isolated silos into an AI-ready semantic knowledge graph.

So let me again close this fantastic webinar by thanking all of you for your contributions, and by inviting you all to attend, participate and speak in the strategic and governance discussion at the Friday seminar on AI readiness. With these two events together, we hope we are contributing to bringing this community closer to what everybody mentioned repeatedly, which is the issue of trust: trust by the public, but also trust within the statistical community in AI applications, readiness and use. Thank you very much, and hopefully I'll see you all on Friday. Thank you.
>> Thank you. Yes, I hope to see lots of people in New York. Thank you, everyone. Thanks to all the speakers and presenters, to Luis and the group for organizing, and to everyone for contributing to the chat and asking questions. Bye for now.
>> Thank you very much.
>> Bye
>> bye.
>> Very good evening, good afternoon, good
morning to all.
>> Bye-bye.
>> Bye-bye.
>> All right. See you.
>> Thank you.