Small Area Estimation: bringing theory to practice
Good morning, good afternoon,
good evening everyone. Thank you for
joining this side event on small area
estimation,
bringing theory to practice. We will
have the next hour and probably
half an hour, 90 minutes, for
covering the topics. Our side
event is one of the 57th Statistical
Commission side events, and it is being
held virtually. We have
distinguished guests here, speakers
presenting regional aspects of the
work and country examples, and also
more from the World
Bank and UNSD in terms of organizing and
managing this project so far. So,
without further ado, I'll pass
the mic to our colleague Ms. Haoyi,
who is leading this activity from the
United Nations Statistics Division, to do the
opening and overview of the session.
We have a very tight program
and lots of participants, so we will try to
make the most out of it. Haoyi, please
go ahead.
>> Thank you so much, Daniel. Thank you.
Welcome, colleagues. My name is Haoyi Chen,
coordinator of the Inter-Secretariat Working
Group on Household Surveys, here at the
Statistics Division. So really,
welcome to this event on small area
estimation that is
jointly organized by UNSD,
the World Bank and the regional commissions
from Africa,
Asia-Pacific and Latin America, in support of
policies for leaving no one behind.
So as countries strive to deliver on
the Sustainable Development Goals and
leave no one behind, the demand for
disaggregated, timely and policy-relevant
data is growing.
Many NSOs face constraints in producing
reliable estimates at subnational and lower
geographic areas using
survey data.
Small area estimation really offers a practical and
methodologically robust pathway to
bridge this gap by integrating survey
data with administrative data, geospatial
and other non-traditional data sources to
produce more granular insight while
maintaining statistical rigor. We from
the UNSD side have been partnering with
our colleagues here, and you will
be hearing from them, to advance
the use of small area estimation in
national statistical
offices. We are really grateful for their
partnership, including on
materials, guidelines, e-learning courses
and workshops.
We will be hearing about their work, and also from
countries
that have done great work in applying
SAE.
We look forward to hearing from
them, and we also look forward to
better serving our communities
in the use of small area estimation. Over to you.
Thank you so much.
>> Thank you, Haoyi, for setting the stage. I
think we have a sound issue from you
a little bit, but I think most of
the topics have been
covered. Just making sure
that our partners are in the house:
the regional commissions and the World Bank,
who is really pushing this forward
for enabling NSOs to work
with small area estimation where
it makes sense. So, without further ado,
I'll switch to the World Bank,
who has been the anchor for this
program. Our senior economist David
will take you through why
small area estimation, what the Bank
is doing in supporting countries, and
some of the outcomes of the
work they have been doing
so far. So I will pass it to David. David,
the floor is yours.
>> Thank you very much,
Daniel. Let's give me a moment
perhaps to get the presentation up. Um,
can everyone see the presentation?
Let me also go into slideshow mode.
>> Wonderful.
>> Um, so I hope everyone can hear me well.
If not, I can speak louder. And thank
you very much, Daniel and Haoyi, for
organizing this presentation. It's
really an honor and a pleasure to be
here with you and with so many
colleagues from national statistics
offices. I'm going to be talking about
geospatial small area estimation, with a
focus on a recent application that we're
currently working on for Nigeria.
This is a nice complement, in my view, to
a lot of the work that the Bank has done
traditionally on small area estimation with
survey and census data, including the
relatively recently produced guidelines,
but we are also looking to
extend this to geospatial
applications.
um so I think we mostly understand the
benefits of data integration. Surveys
measure very important socioeconomic
indicators but they're expensive and
because of that they're small and
because of that we cannot be very
granular with them. So most surveys
cannot go below state or district
levels. And of course we would love more
granular information. And so to do that
we can combine surveys with
comprehensive auxiliary data to increase
accuracy and precision of estimates. Um
and this is uh potentially useful for
many things. We can use it to target
social assistance programs, monitor and
evaluate programs and do quality control
for sample surveys. Um there are
concerns that sample surveys may not
always be representative. For example,
if potential areas in a country are
affected by conflict. Um there may be
some places where enumerators can't go
and data integration actually offers a
method for measuring that or estimating
that. Traditionally, as I mentioned,
we've used survey and census data for
small area estimation. There's a long
history of this going back to Fay and
Herriot in the late 70s in the US,
estimating average income for counties.
Then in the Bank, Elbers, Lanjouw and Lanjouw
kind of popularized the use of survey
and census small area estimation around the
world. Jiang and Lahiri, and Molina and
Rao, also made improvements to the
methods. And so this is great, but
censuses are very expensive. They're
conducted typically once every 10 years,
but in some countries not always that
frequently. And so there's been more
recently a large and growing literature
combining survey and geospatial data to
try to see if we can do small area
estimation more frequently. Amazingly,
this started in 1988 with a paper that
was 30 years ahead of its time. Then
everybody forgot about the potential of
geospatial data until 2016, and then a
relatively recent literature, as I
mentioned, has really pushed forward the
boundaries on it. And there are a couple
of recent reviews, one of which I
did, that basically show that this
stuff works, that the geospatial
data that is available is predictive of
many important socioeconomic
indicators. There's a lot of interest
in big data. I'm focused on geospatial
data in particular because it is
publicly available and we don't have to
worry about selection bias. I still
have some concerns with mobile phone
data. I believe it can be useful for
some applications, but still not
everybody has a mobile phone. Also,
there's been a really revolutionary
increase in the access to publicly
available imagery and indicators, spurred
on by platforms like Google Earth Engine
and Microsoft Planetary Computer.
And now a lot of research, as I mentioned,
shows that geospatial
indicators really excel at predicting
spatial variation and urbanization. And
if you think about the things that it
can measure, like buildings and nighttime
lights and vegetation and land
classification, these are all kind of
good proxies for urbanization and how
urban places are, and that in turn is
correlated with many important
socioeconomic indicators. For example, almost
everywhere, on average, rural places
are poorer than urban places.
So one area in which this has been
applied quite frequently is poverty
estimation. We've done a number of
tests relative to survey data in about
eight countries and shown that the
increase in precision is equivalent
to expanding the survey sample by about a
factor of 2.5 to nine, depending on the
context and the measure. Surveys cost
a huge amount of money to implement, as
you all know, a million dollars and more.
So if we can expand the precision of
these by 2.5 to nine using a
procedure that is essentially free,
in my book that qualifies as a huge win.
Although much of the research so far has
applied this to poverty estimation, I
believe it can be applied to many other
indicators, and in fact we are doing so.
Not all indicators are sufficiently
precisely measured to do this. Not all
indicators are correlated enough with
population density to do this, but many
are. So just to give you an example
of the best-case scenario, and this is
from older work done in Tanzania,
we used geospatial small area
estimation to go basically from the
district to the commune level.
The lower level is on the right. The
higher level is on the left. And
what's notable about this, that we don't
show here, is that the average precision
as measured by the coefficients of
variation is about the same in both of
these pictures. And so the first one
did not use geospatial data. The one on
the right does. And it just shows that in
this case, which was kind of a
best-case scenario, we were able
to go a level lower in terms of
administrative units with no loss in
precision.
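The "factor of 2.5 to nine" framing used a moment ago can be read through a standard approximation: under simple random sampling, the standard error, and therefore the width of a confidence interval, shrinks in proportion to one over the square root of the sample size. The following is a minimal sketch of that arithmetic only, not the evaluation procedure used in the studies mentioned. The function names are hypothetical, and design effects are ignored.

```python
import math


def equivalent_sample_expansion(width_ratio: float) -> float:
    """Sample-size multiplier implied by a confidence-interval width ratio.

    width_ratio = (model-based CI width) / (direct survey CI width).
    Assumes width is proportional to 1 / sqrt(n), so n scales with 1 / ratio**2.
    """
    if not 0.0 < width_ratio <= 1.0:
        raise ValueError("width_ratio must be in (0, 1]")
    return 1.0 / width_ratio ** 2


def width_ratio_for_expansion(factor: float) -> float:
    """Inverse mapping: CI width ratio implied by a sample-size multiplier."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1")
    return 1.0 / math.sqrt(factor)


if __name__ == "__main__":
    # Halving the interval width corresponds to a fourfold larger sample.
    print(equivalent_sample_expansion(0.5))  # 4.0
    # Width ratios implied by the 2.5x and 9x figures quoted in the talk.
    for factor in (2.5, 9.0):
        print(factor, round(width_ratio_for_expansion(factor), 3))
```

On this reading, a factor of nine corresponds to intervals roughly one third as wide as the direct survey estimates.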
So this is great. And then you might
think, why isn't everybody using it?
There are some obstacles,
I think, that have held back adoption.
One of them is the variety
and sophistication of methods that
have been used for this, and that can be
confusing. People have used linear
mixed models, which are relatively simple
but can still be somewhat complicated, like
empirical best predictor models. These
have been used traditionally for
survey and census data. They can also be
used for geospatial data. There's also
tree-based machine learning like extreme
gradient boosting and now, more recently,
mixed effects gradient boosting.
So those are maybe one step up
from linear mixed models in terms of
complication and sophistication. But then a
lot of the literature has been focused
on deep learning approaches, especially
convolutional neural networks, and then
more recently AI using foundational
transformer models. These are kind of
big AI models that have been trained
on many thousands of terabytes
of imagery to recognize features, and
then they can be fine-tuned with data
to develop predictions tailored to
that data. So that's one issue:
the methods can be sophisticated. A
second issue is that the EA location
can be sensitive. Notably, the Demographic
and Health Surveys publish jittered
EA location information, but not all
surveys come with EA location
information. And so for public
researchers, these can sometimes be
difficult to obtain. This can be
somewhat complicated to implement
and requires considerable computing
capacity and memory. This is
especially true for the more
sophisticated deep learning and AI
methods. And of course, these also
require very strong technical skills.
The tools and the documentation are
still evolving and developing. We're
trying, but we've frankly not gotten
as far as we would have liked in terms
of getting the tools and documentation
out there. So these are all obstacles
that are kind of holding back adoption.
The hope is that through events like
these, and finishing tools and
documentation, and doing workshops of the
sort that we've organized in the
recent past, the word will get
out and people will start taking
advantage of the benefits of publicly
available geospatial data. So what we're
trying to do is work on applications
generating small area estimates of
indicators for countries, doing
additional research and evaluation,
testing methods using geolocated census
data. There are a few countries where
we've been able to obtain geolocated
census data, and those are great for
testing methods. And then developing
tools: we're working on two R
packages in particular, one called
Geolink and one called povmap. The
first one helps download publicly
available geospatial indicators. The
second one helps integrate them with
survey data. We've also learned
to utilize Google Earth Engine, which is
a very powerful platform for obtaining
publicly available geospatial data.
And of course now, with the development
of AI and the AI chatbots, you can
just ask it to write code for you.
Momentum is accelerating here, but a lot
of work remains. So I'm going to talk
here mostly about an application I'm
working on currently for our country
team in Nigeria, where I'm doing
geospatial small area estimation for 10
indicators. The 10 indicators are
listed there. I think what's neat about
this is that almost all of the work and sweat
goes into downloading the geospatial
indicators. Once you have that, the cost
of adding new survey indicators is
essentially negligible. It's just, you
know, switching in a different dependent
variable in the same code. So, you
know, going to 20 indicators would not be
difficult if they can be
predicted by geospatial data. And so
just to give you a sense of what we're
doing, we created a shapefile of
about a million one-square-kilometer
grids covering all of Nigeria, and then
we use that to obtain publicly available
geospatial features, starting with
population estimates. There are some
relatively recent population estimates
produced by WorldPop that we used, and
when we use that we find that
about half of the grids in Nigeria are
populated, nearly 500,000, and so we can
obtain geospatial data for those
populated grids, merge those with the
survey data using EA centroids obtained
from the survey (we were able to get
those), then estimate a model using the
survey data, use that model to predict
outcomes for each grid, weight it by the
estimated population and aggregate it to
the desired geographic level. We're
doing both wards and local government
areas. The most complicated part of this
is using bootstrap techniques to
estimate confidence intervals. That
can be difficult, but our aim
is to get documentation out on
that, on how to do it. So just to
give you a sense of the geospatial
features being used: the modeled
population estimates come from
WorldPop. Nighttime lights can be
downloaded from the Colorado School of
Mines website. There's building data
from 2018 and 2023,
land cover, cropland, open-source
cell tower locations, which may not be
super reliable, but we can use
them. It's also not the most
predictive variable, as I'll show
you. Vegetation index, average rainfall,
aerosol and ozone index, which are
measures of pollution that are local,
and then conflict events. We also
experimented with these new and
interesting Google DeepMind embeddings,
which are available on Google Earth
Engine. They work well on their own, but
when we have all these other variables
in the model, the embeddings add
very little. So you can estimate a
model at the grid level. P hat here is
our survey estimate of poverty in a grid
that's been matched to the
enumeration area. We use as
predictors both predictors at the grid level
and predictors at the target area level,
which is a ward or an LGA. And there's
an error term both at the ward level
and at the grid level. We use an arcsine
transformation because we're
estimating proportions and we want to
keep the estimates bounded between
zero and one.
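Written out, a grid-level model of the kind just described might look like the following. This is a sketch under assumptions: the slide itself is not reproduced in the transcript, so the symbols, the linear form and the normality of the two error terms are illustrative choices rather than the presenter's exact specification. Here p-hat is the survey estimate for grid g, x holds the grid-level predictors, z holds the predictors for the ward or LGA containing the grid, and u and epsilon are the area-level and grid-level error terms.

```latex
% g indexes populated grid cells; a(g) is the ward (or LGA) containing grid g.
\[
  \arcsin\!\sqrt{\hat{p}_{g}}
    \;=\; \mathbf{x}_{g}^{\top}\boldsymbol{\beta}
    \;+\; \mathbf{z}_{a(g)}^{\top}\boldsymbol{\gamma}
    \;+\; u_{a(g)} \;+\; \varepsilon_{g},
  \qquad
  u_{a} \sim N(0, \sigma_{u}^{2}), \quad
  \varepsilon_{g} \sim N(0, \sigma_{\varepsilon}^{2}).
\]
% Back-transforming a prediction \hat{y}_g on the arcsine scale
% keeps the implied rate inside [0, 1]:
\[
  \tilde{p}_{g} \;=\; \sin^{2}\!\bigl(\hat{y}_{g}\bigr),
  \qquad \hat{y}_{g} \in [0, \pi/2].
\]
```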
Also, in this case we're using
extreme gradient boosting. I find
that's widely used and very mature
software. It's been very well tested.
We've tested it in house. It slightly
outperforms linear models in many cases.
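Whatever learner produces the grid-level predictions, the final steps of the workflow described earlier (predict for each populated grid, weight by estimated population, aggregate to wards or LGAs) can be sketched in a few lines. This is a minimal illustration, not the project's code: the dictionary keys, ward names and numbers are made up, and the simple back-transformation ignores any bias correction and the bootstrap step used for confidence intervals.

```python
import math
from collections import defaultdict


def arcsine_transform(p: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def inverse_arcsine(y: float) -> float:
    """Map a value on the arcsine scale back to a proportion in [0, 1]."""
    y = min(max(y, 0.0), math.pi / 2)  # clip to the valid range first
    return math.sin(y) ** 2


def aggregate_to_areas(grids):
    """Population-weighted mean of grid-level predicted rates, per area.

    `grids` is an iterable of dicts with hypothetical keys:
    'area', 'population' and 'pred_transformed' (prediction on arcsine scale).
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for g in grids:
        rate = inverse_arcsine(g["pred_transformed"])
        weighted_sum[g["area"]] += g["population"] * rate
        population[g["area"]] += g["population"]
    return {a: weighted_sum[a] / population[a] for a in population if population[a] > 0}


# Made-up grid predictions for two hypothetical wards.
grids = [
    {"area": "ward_A", "population": 1000, "pred_transformed": arcsine_transform(0.60)},
    {"area": "ward_A", "population": 3000, "pred_transformed": arcsine_transform(0.20)},
    {"area": "ward_B", "population": 500, "pred_transformed": arcsine_transform(0.40)},
]
estimates = aggregate_to_areas(grids)
for area, rate in sorted(estimates.items()):
    print(f"{area}: {rate:.2f}")
```

The more populous grid dominates within ward_A, which is the point of weighting by estimated population before aggregating.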
Not always, of course. But it has a
very flexible functional form that
handles nonlinearities and interactions
quite well, and you can measure the
importance of predictors through Shapley
decompositions without too much
difficulty. So it's a little more
flexible than linear models and a little
more interpretable than the AI and deep
learning models. Perhaps not as accurate
as mixed effects gradient boosting, but
that's a little more complicated and we
haven't tested it. And, you know,
probably not as accurate as
convolutional neural networks or
transformers. But the other times that
we've tested those, we used
different features than we do here. And
so I think with the extra features we
have here we can get very solid
estimates even with extreme gradient
boosting. And indeed the models are
pretty predictive. If you look at the
R-squareds, these are in-sample R-squareds, but
there is some regularization, so it's
probably not overfitting. When it
comes to XGBoost and just predicting
these indicators, the R-squareds are
quite high. They do vary. So then,
I mentioned you can do Shapley
decomposition to look at what measures
are important, and when we look at
poverty, it looks like it's buildings,
pollution, land classification and
nighttime lights that are doing most of
the work. And I think this makes
sense. These are all correlated with
urbanization, kind of as we'd expect.
And fortunately I think they're pretty
solid in terms of the quality of the
data. Obviously not every building
is going to be correct, but on average
there's very useful information. And
I think what's nice is that this, as
we've seen in the past, leads to large
increases in precision relative to the
direct survey estimates. These are
the mean widths of the confidence
intervals of the estimates, and
they fall by over a half. And
that's really impressive, I
think, because the mean
confidence interval widths are basically
proportional to the standard error, and
the standard error is the square root of
the variance, and the variance is
inversely proportional to the size of
the sample. So this is an efficiency
gain approximately equal to expanding
the sample by a factor of eight or six
by this measure. So I
think that's again a big win. They are
less precise than the, quote,
"representative" state-level estimates.
So there's an average confidence
interval width of 15 for the estimates
that are currently published. We would
go up to 26 even for LGA estimates if
you use this method. And so I think this
leads to some philosophical
questions of how precise is precise
enough to publish. So I just want to
give you a sense. These are
preliminary estimates, but these are what
the maps look like. They show quite a
bit of poverty in the northern half of
Nigeria. But there are pockets, and
you can see where there are pockets
where it's redder than other places, and
even in the north and the northeast
there are pockets where it's notably
less poor than other places. And also
in the south you can see
there's quite a bit of variation even
within states. These have been
benchmarked to match the survey
estimates at the state level, which I
think is important. You can see
somewhat similar patterns but also some
differences in multidimensional poverty,
which is, I think, interesting. And
also some pockets where you can see
particular wards. We can look at
other indicators, like the secondary
enrollment rate for education. Again,
different pockets, sort of towards the
middle of the country, where that's
weaker. So these are interesting. I
think for someone who knows Nigeria
better than I do they would be very
interesting, but I want to again caution
that these are preliminary. They need to
be reviewed within the Bank and by the
government. So, you know,
they're not ready to be used quite yet.
Anyway, to conclude, I feel that
geospatial SAE is practical,
inexpensive and useful. All of this is
using publicly available data, so
it's relatively cheap, and for those
reasons I think it should be used
routinely. Getting to routine
use requires tools, knowledge and
computing power, and I think we
can help, certainly with the tools and
the knowledge. On the research on
methods: there's been a lot of
research on methods. I think it's useful,
but it shouldn't stop applications at
this stage, and in many cases where we've
compared methods the differences in
accuracy are really not that major. So
as an extreme example,
there's a recent paper that's coming out
soon that compares different methods
for combining census and survey data, and
we found that, when we
simulated a targeting program, the
preferred method beat the less
preferred method slightly, and that
translated to a difference of 0.01 percentage
points in simulated poverty after a
targeting program. So in my view that's
small enough that it shouldn't
cause a lot of debate or hold back
adoption. There can be
a lot of discussion about methods, but in
practical terms the differences are
probably not that major in many cases.
So meanwhile the methods offer
trade-offs on interpretability,
simplicity to estimate, and parsimony in
terms of how easy it is to communicate
model parameters. And so if the
differences in accuracy in a practical
sense are not that huge, maybe one does
want to go for a method that offers more
interpretability or is simpler to
explain. There can be bias in this
kind of technique. The model-based
estimates reduce sampling error.
The reduction in sampling error
outweighs the introduction of model
error, so it's a net win. The estimates
are more accurate on average, but there
is model error. And so we have to
think about how to deal with that. And
maybe this involves some sort of redress
mechanism, being open to a
procedure for handling complaints if
there are complaints. It is possible
that a place looks less poor in terms of
its urbanization than it is. And even
though everything is more
accurate on average, these cases can
be important. And then I think we need
to think a little harder about how to decide
when estimates are sufficiently precise
to publish. For survey-based estimates,
often there's a threshold based on
the coefficient of variation, adopted by
many national statistics offices.
Actually, I feel this is problematic for
proportions because
it varies depending on whether you use a
measure or its
complement. So it's quite possible we
could publish the non-poverty rate but
not publish the poverty rate, and that
doesn't really make sense to me. Or you
could publish the in-school rate but not
the out-of-school rate. So I'm not a
huge fan of coefficients of variation
for proportions. They can be useful for
other things. And then in general,
you know, these estimates could be
useful for policy even if they're not
entirely precise. So thinking about
what the right
precision measure is, if any, for publishing
these, I think, is important. Of
course, quality control regardless is
crucial. So whenever we do this there
does need to be some sort of process of
evaluating the estimates, making sure
they make sense, etc. But I do believe
that it is not useful to suppress
useful estimates, and that these
techniques can be very widely applied
and really help provide more policy-relevant
and useful data worldwide.
Thank you very much.
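The asymmetry noted in the talk for coefficient-of-variation thresholds applied to proportions is easy to check numerically. In this minimal sketch the poverty rate, sample size and 10 percent publication threshold are all made-up values, and the standard error uses the simple-random-sampling formula rather than any design-based estimator.

```python
import math


def proportion_se(p: float, n: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def coefficient_of_variation(estimate: float, se: float) -> float:
    """CV = standard error divided by the point estimate."""
    return se / estimate


poverty_rate = 0.10   # hypothetical estimate
sample_size = 400     # hypothetical sample
cv_threshold = 0.10   # hypothetical publication rule

se = proportion_se(poverty_rate, sample_size)  # identical for p and 1 - p
cv_poor = coefficient_of_variation(poverty_rate, se)
cv_nonpoor = coefficient_of_variation(1.0 - poverty_rate, se)

publish_poor = cv_poor <= cv_threshold
publish_nonpoor = cv_nonpoor <= cv_threshold
print(f"SE = {se:.4f}")
print(f"poverty rate:     CV = {cv_poor:.3f}, publish = {publish_poor}")
print(f"non-poverty rate: CV = {cv_nonpoor:.3f}, publish = {publish_nonpoor}")
```

Both rates carry exactly the same information and the same standard error, yet with these numbers only the non-poverty rate clears the threshold.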
>> Thank you, David, for walking us
through this impressive example on
Nigeria. I'm sure everybody is
looking forward to seeing the published
results, which would be something that
the NSOs would be very much
interested in picking up
on their side.
There are a couple of questions,
but we'll come to the questions
at the end so that we can answer them
along the way. I'm jotting some of them down:
some are in the chat, some in the Q&A,
and we have some from the
registration submissions. So I will
come back to that, but really, thank you
for putting the
research into practice. I mean, that's
what everybody is looking
forward to. And having said that, the
regional commissions are at the center
of this, making sure these come to
practice by, you know, doing the
intermediary work that brings NSOs in
the region to these advanced
research methodologies, their use and
implementation. I will transition
to our regional colleagues. We have ECA
and ESCAP on this session.
We'll start with ECA. I'll share your
presentation, Angela. Angela is our
lead from UNECA, who's pushing
this work with the African
region, and a number of activities have
been happening in the past couple of
years on SAE. I think it would be
great to see where they are at and
what they're pushing towards. So
I'll try to put your presentation up,
Angela. In the meantime, you can introduce
yourself, please.
>> Thank you so much, Daniel. As he has
indicated, my name is Angela Kiconco and
I'm a statistician at the African Centre
for Statistics of the United Nations
Economic Commission for Africa. I'm happy
to share with you. We've
each been given five minutes, so it's a
really short presentation, about a
slide. But it's really to
report on what we've been able to do
in the year 2025. But maybe as a way of
background, I would like to share that
we've been doing this since 2023
and we have seen, of course, improvements
based on the experiences that we
encounter as
this course is done by the
participants. Could you kindly go to the
next slide, please? Yeah. So, like I
mentioned, we started in 2023, and
in terms of trends we have seen an
increase in the number of countries
that have been sending nominees to
participate in this e-learning
small area estimation course. I think
probably we could share the link in
the chat for those that don't know about
it, to maybe go there and see. But
it is a self-paced e-learning
course, and it lasts about 7 weeks.
It has several modules and so on and so
forth in terms of materials. So in
the case of Africa, for example, in 2023
we had only nine countries, but of course
the approach at the time was also
different, because the call was just open
and it was whoever saw the call that,
you know, self-enrolled. At least in
the case of Africa, we decided that we'd
pick about 30 participants
for a manageable class. In the next
year the approach was different. We
reached out to the heads of national
statistical offices and requested them to
nominate, but that time it was
really focusing on the anglophone
countries, and not all of them at that
time, though. So we had about 12
countries. And this time round we put out
a call to all the 54 countries, including
the francophone ones, and we had a response
from about 30 countries. Now, specifically
in 2025, we got about 70 nominees,
people proposed by the heads of NSOs,
because they understand their people;
that is at least the basis for
having the heads of NSOs as
the entry point for us. And we requested
them to register themselves, at least
for the anglophone one, and 41 did. But of
course we can see the numbers keep
reducing, and then of course committing
to, you know, doing the course
over the 7-week period, you can see the
numbers again also keep reducing, and
ultimately those that actually do the
assignment, because if you do not do the
assignment and pass it, then you
cannot say that you have completed or
done the course or, you know, qualified
to take a certificate showing that you at least
have an idea. Yeah. So that has been,
or that was, the case with the English or
the anglophone course. In the case of the
francophone one, we had nine countries that
sent nominees, of which there were
about 27; 17 registered, and
unfortunately in the case of the
francophone course we were not able to have the
course done as per the model that is used,
because there's an issue with some of
the materials. But it is still planned
that once this is completed,
they will be benefiting from
the course. So the course is done in
such a way that, for those that enroll,
over the seven weeks, at least on a
weekly basis, they'll have sessions with
a course facilitator
who takes them through what you call
a synchronous class, and that is of
course still virtual. But during this
session there are discussions about the
challenges that the participants may
be experiencing, especially with the
videos or the materials they would have
interacted with in the course of the
week. And, as you can see, I think it
still comes up as an issue, a challenge,
the attrition, but we need to
see what would be a solution for it.
Now, we also did have an in-person
training. Sorry, someone is trying
to call me. Yes. So, an in-person
training for the component that
David has just taken us through. And
of course the prerequisite was that
the participants for the
in-person training, and this was only for a
select set of anglophone countries,
had to have completed the e-learning
course, the small area estimation e-learning
course, with proof that they had
completed it, because it was
part of, you know, the base for what
was going to be learned in person, though
of course this one is skewed more to
the geospatial side. As you can see, it
is on Earth observation data. So it was a
5-day, more or less really intensive
workshop in terms of the material
being offered and so on, and we are
really grateful for the partnership from
the World Bank, the East African
Community, as well as UNSD, that
supported the workshop
to happen, this training workshop. So
in a nutshell we had about 15
participating countries. Could you just
go back one? Sorry. Yeah. So we had 15
participating countries, and those are
they, and a total of 26 participants.
We also ensured, at least for the host
country, which is Kenya, that we invited the
data person that is in the office of
the UN RCO, that is, the Resident
Coordinator's Office. Next, please. As
I conclude, I just thought I'd highlight
some of the challenges that continue
to persist. I think, of course, people
who start with
packages other than R usually have a
challenge, because this course is run in
R, and so if one doesn't do that
foundational bit of learning R, they
have a challenge doing the course. And
then of course this has really
persisted, especially for the e-learning:
the completion rate continues to
remain a challenge, but we hope that
along the way, even during this webinar,
we can have maybe ideas on how we can
overcome this one. But there do exist
opportunities, at least in the case of
Africa. We see strong support from
the top management, and by top management
I mean the heads of the national
statistics offices, and this is evidenced
by their responsiveness to a request for
nominees. The other is that the
interest in small area estimation
work is also evidenced in the
application, especially by those that
have, you know, gained the skills. Of
course, later during this webinar a
colleague from Ghana will be sharing the
Ghana experience. David alluded to
the Nigeria case, but then also, as
individuals, some
participants from the course have gone ahead
to do their own papers. An example is
some colleagues from Kenya. And we
still think that, at least so as not to
leave people behind, it would be
great to have the materials, you know,
expanded to other packages that people
are more familiar and more comfortable
with, be it Stata or Python. So I'd like
to thank you all for
your attention, and also appreciate
our partners once again. Thank
you so much for listening to me.
>> Thank you, Angela. Thank you for
taking us through the story of the region in
the past year. I was
part of the in-person training and I
have witnessed, you know, the progress
even from countries, and also the
innovative ways of participants
pulling in other tools. I think you were
mentioning other tools like Stata
and Python. We had a number of
participants who were proficient in
Stata who were able to even pull their
data from Stata into this
workshop, and I think it's something
that we need to think about, putting some
more examples in that area.
Thank you for that. We'll come back to
questions later on, at the end. Now,
saving some time, I'll pass it to Roth
from Bangkok. He's also leading
the regional work in Asia and
the Pacific. Over to you. Please go
ahead.
>> Thanks so much, Daniel. Um, so give me a
few seconds while I share my screen. Um,
hopefully it's showing up all right.
Seems to be okay on my side. Can you let
me know if you see the screen?
>> Okay, perfect. Uh well thanks thanks
again Daniel and thanks uh David and
Angela for um sharing the work that
you've been doing. Uh good morning good
afternoon uh evening everyone. So my
name is Sana Rod. I'm an associate
statistician from the statistics
division here at ESCAP. Um and it's a
pleasure to be with you today and,
well, go beyond the
training to share a bit more on
ESCAP's capacity development program,
the one that we conducted back in
2025.
Um I should first start off by saying
that uh ESCAP has been working closely
with uh UNSD, ECA uh and UN agencies
such as UNICEF and development partners
namely the World Bank to implement the
capacity development program. uh so
their support was uh very much crucial
to the success of the 2025 um activities
and even before uh getting into the
activities I should also mention that um
our SAE program built on the decisions
of our previous committee, I mean, back
since the seventh session, which
was in 2020 to prioritize data
integration and innovation integrate big
data into official statistics and
promote a whole of society approach um
to implement our Declaration
on Navigating Policy with Data to
Leave No One Behind. So there's history
to that, and with that foundation we
carried out the following
activities
um back in 2025. Well, first of all, uh
in collaboration with uh UNSD, ECA and
the World Bank, we published the uh
how-to guide on geospatial SAE in
R. It's a practical guide with
runnable code and real data for
practitioners and policy makers alike
interested in geospatial SAE.
The guide is available in both HTML
and GitHub versions, as you should
be able to see in a bit. I'll share
my screen to give you a quick
demo of the guide. So this is the HTML
version. So you can see it's very
interactive. We also prepared a
short demo for you as well, if
you would like to know how to navigate
the guide. I won't play it because
we're running out of time. But
please do watch it and
navigate the guide
in your spare time. I'll share
the link um with you in the chat as
well. There are subcomponents to
each chapter that you can go through.
There are chapters on setting
up R. As Angela mentioned, some
participants in the workshop that I'm
going to tell you about later on had
trouble getting used to
working in R, and we've incorporated
that in here as well. But overall
this guide is essentially a
practical walkthrough where, if
you're interested in geospatial
SAE, you can simply go through the
chapters and
you'll be able to not only
understand the importance and the
usability of geospatial uh SAE but
you'll also be able to uh run the codes
and um you know uh follow along the
examples and eventually um use uh all
this knowledge and codes uh for your own
uh country context or indicator of of
interest. Um, so let me get back to the
uh slide,
right? Um,
yes. So, um, that's that's the uh how-to
guide. Uh, and I highly encourage
everyone to visit it. Uh, the link will
be shared with you in the chat later on.
Second, with support from UNSD and the
World Bank, we organized our 2025
Asia-Pacific capacity
development program on SAE, and we
combined virtual e-learning,
facilitated by an expert
from the World Bank, with an
in-person regional workshop in
Bangkok on geospatial SAE, which I will
briefly tell you about later on as well.
So the guided e-learning session started on
the 2nd of October um and it concluded
on the 14th of January because it
includes seven weeks of facilitated
online classes, and participants went
through sessions where
the expert conducted
tutorials, essentially through the
e-learning
course that we developed. Links
will be shared again in the chat box,
on the SIAP platform.
SIAP is our Statistical Institute
for Asia and the Pacific. And through that
link uh participants went through uh all
of the materials. And all of the
participants were able to complete the
course, including the graded assignments,
and they were able to attend
the workshop. Similar to what
Angela mentioned, this was the main
requirement for participants to
participate in the in-person workshop.
There were 20 of them in the 2025
cohort, from 10 countries, so
two per country. And let me get the
list for you. They're from Bangladesh,
India, Indonesia, Malaysia, Pakistan,
Philippines, Sri Lanka, Tajikistan,
Thailand, and Vietnam. We're lucky to
have uh one participant from Malaysia,
Miss Fisa uh here with us who will be
able to share her experience later on.
But yeah, so this course
enabled participants to
prepare themselves for the in-person workshop
and contribute as much as they
could. Speaking of the workshop, it
happened from the 24th to the 28th of
November, and it was attended by
all 28 participants. The focus of the
workshop was on providing hands-on
capacity building support to NSO
staff, essentially on using R, or a
little bit of Python, to conduct geospatial
SAE for their indicators of interest.
We had Dr. Josh Merfeld; some of
you might have worked with or known him.
He was our facilitator, and he taught
participants about working with
shapefiles, raster data, packages like
the ones that David mentioned earlier,
geol, titer, povmap, and
estimating models such as, obviously,
povmap, mentioned earlier, and
then extreme gradient boosting
as well. Participants did bring
their own, well, many of them brought
their own data sources, and they were
able to estimate their indicator of
interest using the skills
learned from the workshop. If
you're interested you can click on the
link or I will share in the chat box
again um to learn more about uh the
workshop itself. And the link at the
bottom there is another link in
reference to the how-to guide
mentioned earlier. Last but not least,
uh we wanted to extend our capacity
building initiatives even further. Uh so
we organized our Asia-Pacific Stats Café
on geospatial SAE on the 27th of
January 2026. We invited
participants from our workshop and
e-learning program to share their
experiences and lessons learned in a
panel, with expert reflections from
Josh and a resource person from
BPS Indonesia.
Uh we also took the opportunity to
showcase the how-to guide again on
geospatial SAE. As you can see, we're
very proud of that,
in collaboration with colleagues from
ECA, UNSD and the World Bank. And we
got quite a number of participants
registered, 60 plus, and
the panel discussion was lively and then
everyone was was very engaged uh and
more information on that can be found
again in the link that will be shared uh
in the chat later on. So that's all
from ESCAP in 2025, and we look
forward to 2026 and what
the year will bring. Um thanks again for
the opportunity Daniel. Uh back to you.
>> Thank you, Ro. I mean, it's
really good to see uh what's happening
in in the African region and in the Asia
Pacific, and probably
participants have picked up that this is
happening for people who are
motivated, because it's not like
other kinds of courses where you just go for
a week. We also have a prerequisite
of people finishing the 7-week module,
which really puts the groundwork
in place for the 5-day physical
workshop at the end. We will be
sharing all this uh information later on
all the right links. I see a couple
of questions coming in about which link,
where the e-learning courses are,
and things like that. They will be shared,
and the recording of this meeting along
with the presentation will be shared
with all the participants who registered
here. So now, with all the regional
commissions' backing, and UNSD and
our partners in the World Bank,
countries have been participating in
this, taking the course online
and then also doing the physical
course for a week. It is time now
to switch to countries and see what's
happening from their side of
the work on SAE. Without
further ado, I will pass this to our
colleague from Chile from the National
Institute of Statistics in Chile who
will be showcasing their SAE
experience and they've been doing a lot
of experimental statistics using SAE and
we'll be uh yeah following up our
presentation. Miss Aier will be
presenting uh uh the activities in in
NA. U the floor is yours Mr.
Hello. Uh so well good morning everyone.
Thank you for the invitation. Can you
see my presentation?
>> We saw it and then it's gone. Can you
reshare again?
>> Oh
me
put it on presentation mode and I think
it's not working.
>> Okay.
>> Huh?
>> Yeah we can see it. Okay,
>> thank you.
>> Okay, thank you.
So, well, my name is Javier Torres. I'm
from Chile and we're going to show you
the results of the implementation of
small area estimations uh in the
national victimization survey.
So a little context uh our victimization
survey is called national
and it's one of the longest-running
victimization surveys in Latin America.
It has been collected annually since 2005
and it has a sample of about 24,000
households, providing national and
regional representativeness.
So in 2023 we had a major redesign of
the survey. We redesigned the
questionnaire and the sampling, and with
that we collected the first survey
with communal representativeness for
the urban areas of 136 communes.
This came from a growing demand from our
government for better geographical
disaggregation, given the characteristics
of the phenomenon. However,
carrying out such a survey every year is
expensive and it's not
sustainable in the long run. So for 2024
we used SAE to obtain reliable estimates
of the proportion of households
victimized by violent crimes, which is
the main estimate of the survey,
for the 136 communes of the
design. The results were just
published in January and
you can see them on our website.
So, regarding SAE, we started working
back in 2018, actually, with the
first capacity building phase.
This came from assistance from ECLAC. So
we started working in 2018 with the
survey from that moment, which is
the old version of the survey, and we
worked with the proportion of households
victimized by high social impact crimes.
This was a working paper that
was published in 2024, and it gave us
a lot of lessons, mostly that it was
necessary to have a strong
theoretical framework and to establish
strong criteria for evaluating the
results, and that these be consistent
with the phenomenon.
So from 2018 to 2022 we mostly
worked on capacity building in the NSO, and
then for 2024 we established SAE
estimations as official estimations of
the survey. In 2023 we had an
exercise with the communal survey,
which allowed us to better understand the
covariates and the needs of the model.
So for 2024, as I told you, we used
SAE as official
estimates.
So we used a methodological
framework, a proposed
framework. For the
specification phase,
we evaluated the user needs, the
data availability and the SAE methods
available. For the user needs, we
defined that our estimator was going to
be the proportion of households
victimized by violent crimes, which
is an aggregate of seven
different crimes such as robbery or
assault. And for the data availability, we
put effort into creating a theoretical
framework which guided the search for
the covariates. So we established some
dimensions like you know socioeconomic
or socio demographic characteristics but
we also look for data regarding uh crime
and victimization like police records
and infrastructure and environment such
as satellite images, national uh
community information and
and the system and informal settlements
data.
In this phase we also chose our
target of estimation, which is the commune.
The commune is the smallest
administrative subdivision in Chile and
it's equivalent to a municipality. And
finally we decided on using the EBLUP
based on the Fay-Herriot model, as it has been
applied to poverty estimations in Chile
and is quite consolidated as a
methodology.
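A minimal sketch of the idea behind the Fay-Herriot EBLUP mentioned here, not the institute's implementation. The estimate for each commune is a weighted blend of the direct survey estimate and a regression prediction, and the weight on the direct estimate shrinks when its sampling variance is large. The function name, the commune values and the model variance are illustrative assumptions; in practice the model variance and the regression coefficients are estimated from the data.

```python
def fay_herriot_blend(direct, synthetic, sampling_var, model_var):
    """Blend a direct estimate with a regression-synthetic estimate.

    gamma = model_var / (model_var + sampling_var) is the weight given to
    the direct estimate, so noisy domains lean more on the model.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma


# Hypothetical communes: (direct proportion, model prediction, sampling variance)
communes = {
    "small_sample": (0.30, 0.20, 0.0100),
    "large_sample": (0.30, 0.20, 0.0004),
}
MODEL_VAR = 0.0016  # assumed between-commune variance, normally estimated

for name, (direct, synthetic, psi) in communes.items():
    estimate, gamma = fay_herriot_blend(direct, synthetic, psi, MODEL_VAR)
    print(f"{name}: gamma={gamma:.2f}, estimate={estimate:.3f}")
```

With the same direct and model values, the commune with the larger sampling variance ends up much closer to the model prediction.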
After that we went on to the analysis
and adaptation phase. To reduce
the volatility associated with the
estimates, the sampling variance is
modeled using a generalized variance
function, and this gave us a more stable
and robust measure of variance, which is
subsequently used as an input for the
model. We also established domain
inclusion criteria, which tell us
how many communes will have a purely
synthetic estimation and which ones were
going to be direct and
synthetic.
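The talk does not say which functional form the generalized variance function takes. As a hedged illustration only, the sketch below assumes a log-linear form, log(variance) = a + b * log(n), fitted by least squares across domains, and then uses the fitted curve as the smoothed sampling variance. The function names and the data are hypothetical.

```python
import math


def fit_gvf(sample_sizes, direct_variances):
    """Fit log(variance) = a + b * log(n) by ordinary least squares."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(v) for v in direct_variances]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def smoothed_variance(n, a, b):
    """Variance predicted by the fitted GVF for a domain of size n."""
    return math.exp(a + b * math.log(n))


# Hypothetical domains: sample size and noisy direct variance estimate.
sizes = [40, 80, 160, 320]
raw = [0.0062, 0.0021, 0.0016, 0.0005]
a, b = fit_gvf(sizes, raw)
for n, v in zip(sizes, raw):
    print(n, v, round(smoothed_variance(n, a, b), 5))
```

The smoothed values replace the noisy direct variances as the known sampling variances in the area-level model.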
So part of these criteria were the
degrees of freedom, the number of
observations and the design effect for
each domain. And then we searched for
our final model.
So for our model we have a model
selection algorithm,
statistical criteria and conceptual
validation. I'm going to explain
each one of them very briefly.
So we used a baseline model
including regional dummy variables.
We used stepwise selection and we
explored combinatorial
variable subsets. So we threw in all the
covariates that we have and tried to
simulate different models.
We also looked at the
statistical significance of the
covariates, the AIC and the BIC. We also
looked at the residual diagnostics,
multicollinearity
checks and a benchmarking
check. And finally, for the conceptual
validation, we selected covariates that
are expected to cover key thematic
dimensions related to victimization.
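The exploration of covariate subsets scored by AIC and BIC can be sketched as follows. This is an illustration under simplifying assumptions, not the selection algorithm used for the survey: it fits plain least-squares regressions (not the area-level model), uses the Gaussian AIC and BIC up to an additive constant, and all function and variable names are hypothetical. It also assumes the covariates are not perfectly collinear and that no subset fits the data exactly.

```python
import itertools
import math


def ols_rss(rows, y):
    """Residual sum of squares of an OLS fit (rows already include the intercept)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def best_subset_by_aic(covariates, y):
    """Score every non-empty covariate subset; return (aic, bic, names) with the lowest AIC."""
    n = len(y)
    names = sorted(covariates)
    results = []
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            rows = [[1.0] + [covariates[name][i] for name in subset]
                    for i in range(n)]
            rss = ols_rss(rows, y)
            k = len(subset) + 1  # slopes plus intercept
            aic = n * math.log(rss / n) + 2 * k
            bic = n * math.log(rss / n) + k * math.log(n)
            results.append((aic, bic, subset))
    return min(results)
```

The statistical ranking is only one input: as described above, the chosen covariates also have to make conceptual sense for victimization.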
So,
for the evaluation, finally, after we have
our model, we evaluate its precision
and certainty, and to verify the
assumption of linearity we check the
residuals, checking that there are no
influential domains. We also look for
consistency with the regional estimates,
checking that the SAE estimates fall within the
confidence intervals of the direct
regional estimates.
In this case, the region is the first-level
administrative division in Chile
and the communes are part of the region.
So we expected that no
domain was above the confidence
intervals.
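One way to read this consistency check, assumed here for illustration, is to aggregate the commune-level estimates to the region using household counts as weights and to test whether the result falls inside the confidence interval of the direct regional estimate. The function name, the weights and the 95% normal interval are assumptions, not details given in the talk.

```python
def regional_consistency(commune_estimates, households,
                         direct_estimate, direct_se, z=1.96):
    """Aggregate commune estimates to the region and compare with the direct CI."""
    total = sum(households)
    aggregated = sum(p * h for p, h in zip(commune_estimates, households)) / total
    lower = direct_estimate - z * direct_se
    upper = direct_estimate + z * direct_se
    return aggregated, lower <= aggregated <= upper


# Hypothetical region with two communes.
value, consistent = regional_consistency(
    commune_estimates=[0.10, 0.20],
    households=[100, 300],
    direct_estimate=0.18,
    direct_se=0.01,
)
print(round(value, 3), consistent)
```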
We also look at error measures.
In particular, we compare the root mean
square error of the SAE with that of the
direct estimations of the survey.
This is the
graphic we have here, and it shows us
that the SAE is significantly more
efficient than direct estimations,
especially in areas with smaller
sample sizes. In the figure
we also sort the communes by sample
size, which is highly associated with
the reliability of the estimation
according to our NSO standards.
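A rough sketch of why the gain is largest for small samples, under the assumption of a Fay-Herriot model with known model variance. In that case the leading term of the predictor's mean squared error is gamma times the sampling variance, with gamma = model_var / (model_var + sampling_var). The sketch ignores the extra uncertainty from estimating the regression coefficients and the model variance, so real gains are smaller; all values and names are hypothetical.

```python
import math


def rmse_direct_and_model(sampling_var, model_var):
    """Return (direct RMSE, leading-term RMSE of the Fay-Herriot predictor)."""
    gamma = model_var / (model_var + sampling_var)
    return math.sqrt(sampling_var), math.sqrt(gamma * sampling_var)


MODEL_VAR = 0.0016  # assumed between-domain variance
# Hypothetical domains ordered from smallest to largest sample size.
for sampling_var in (0.0100, 0.0025, 0.0004):
    direct, model = rmse_direct_and_model(sampling_var, MODEL_VAR)
    print(f"direct={direct:.4f} model={model:.4f} ratio={model / direct:.2f}")
```

The ratio of the two errors moves toward 1 as the sampling variance shrinks, which matches the pattern of the two estimators converging for communes with bigger samples.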
So we have the point estimations. This
is for 2024,
where we can see the EBLUP
estimations tend to be more conservative
than direct estimations, which
we can see in blue,
in the areas with the smaller
sample sizes. Also, as we expected, for
the communes with bigger sample sizes
both estimators are quite
similar.
So this is sorted by sample size:
the smallest on the left and the biggest on
the right.
So, well, this has been a long
piece of work for us. We have been working on
this for the past six years, and there are
a lot of lessons that we have taken from
the different exercises that we have
done. First, the quality of SAE
estimates depends strongly on the
relevance, the coverage and consistency
of the auxiliary variables, which are not
easy to find. Regarding the
limitations of the administrative
records used, we identified in some cases
coverage limitations. We worked with
136 communes out of around 300 that
there are in Chile, and it was really hard
to find data that covered all
136.
We also found missing data in the
administrative records, and
inconsistencies, which made it
necessary to perform some imputations or,
in some cases, to reject the use of the
data.
Also, using an aggregated
victimization indicator can facilitate
the explanation of the phenomenon but
introduces challenges in generating
predictive models. As I told you, this
was seven different crimes with seven
different characteristics. So it's
harder to find auxiliary
variables that can
work for all seven of these crimes.
Also, uh models with high predictive
performance are not always easily
communicable. So, for public policy
context, it is essential to balance
statistical precision with
interpretability and transparency.
For us, also,
incorporating SAE requires
safeguarding compatibility with direct
estimates and across periods, which is
related to
the last point. The annual
frequency of our survey requires
iterating, adjusting and
progressively validating the models,
strengthening their institutional use
over time, and requires
distinguishing stages of methodological
learning, model testing and
implementation, which do not always align
with the timelines for publishing results. So
we have a very short timeline here. We
gather the data in the last
trimester of the year and we publish during
the second trimester of the year. So
actually being able to produce SAE
estimations each year is a
serious challenge for us.
And for the future, as we already have
our first version of the SAE work, we
are looking to expand its use to
other indicators of interest, for
example the dark figure of crime or
the perception of insecurity. So it is
important for us to understand that the
results depend both on the quality of the
auxiliary information sources
and on aspects related to direct
estimations.
Also, as for the limitations of
the administrative records, we
identified in some cases
limitations such as missing data. So
we need to
search for data every year, because
data that worked for 2024 might not be
updated for 2025 and so on. So it's
constant work.
And
finally, although it is possible to
make reliable commune-level
estimates for the 2024 period,
for us
it has been a difficulty
that, as we produced direct
estimations for 2023
and synthetic estimations for 2024, we
cannot rely on the usual statistical
tests to compare the estimates with the
previous period. And this is going to
happen to us again in 2025, because we
have direct estimations again. So this
is something that we're trying to
research: how to
use this data and avoid the
comparisons between the two different
methods. So that's what we are
doing right now. Thank you.
>> Many thanks, Navier, for an excellent
presentation and also for showcasing
the history of SAE
in the National Institute of Chile.
It shows clearly the
maturity of the work you're doing.
I will come back for the Q&A later on,
but I will pass now to the other
continent,
to Africa, and Edward from the Ghana
Statistical Service will be presenting their
experience in SAE. We will come
back to the questions later on.
Edward the floor is yours.
>> Okay. Thank you. Um good afternoon once
again from Ghana.
Um I'll be presenting on Ghana's
experience using SAE and then what we
have been doing so far, to date.
screen is back.
>> Okay. Please can you see my screen?
>> Yes. Yes, we do.
>> Okay. Thank you. So
is it rolling or is still the same page?
Okay, sure. So this is going to be the
outline for the presentation. I have
the introduction, and then why we are
into SAE, the entry point where we
started from, the capacity building and
training, from training to practice, what
we have been doing and what we have
done so far, and then what we are doing
as GSS when it comes to SAE to
institutionalize it, and then the challenges
encountered so far, and then how we
are moving forward, and then looking at
what we will do in the future.
So in Ghana we have 16 regions, that's
the admin 2 regions, and then we have 261
metropolitan, municipal and district
assemblies, that's the admin 3. Mostly
our surveys are actually at the admin 2
level, but the demand for data is always
coming in from the admin 3 level because
of some of the local government policies
that they have over there. So what we
adopted, as part of this SAE,
is that we have to estimate for them.
Mostly, in the past, we couldn't,
because we didn't have any idea of these
estimates, and normally we would give them
the regional ones and then try to make
assumptions about what the districts
would be like. But now that we have
been trained in SAE by the
World Bank, UNSD, UNECA, UNFPA and
other regional commissions,
we have some ideas of how we can do
these estimations using the SAE methods
that we have been trained on. I
in particular was part of the recently
ended training that we had in Kenya.
I was present there with Daniel
and the team from the World Bank.
So we have been through this
training in the past. I remember my
colleagues also were there some time ago,
and then I also went, and we are
also training other people in the
office. What we do is that in most work
that we undertake in SAE we try to
include a particular person or two to
also be part of the work, so that they
also get some of the experience
in SAE, so that we all work together and
also get more people with some
skills in SAE to help us work.
From that, we, the Ghana Statistical Service,
have been able to publish 15 reports
using SAE, and these reports were
from the Ghana Demographic and Health
Survey data and the Population and
Housing Census data, and they
range over exclusive
breastfeeding, childhood immunization,
women's empowerment, gender-based
violence, excessive alcohol consumption,
double burden of malnutrition
and all that. We have a lot: there are 15
reports that have been published together
with UNFPA and the USA DHS. So in
these reports we try to use different
methods. We use the ELL method, we use
the EBP, we use Fay-Herriot.
The reason why we choose a particular method
sometimes revolves around how we
interpret the results and the
assumptions of the estimation methods to the
policy makers, so that they also embrace the
results that they are seeing. And
sometimes we test these methods to see
which one is giving us the best
estimate, not just on statistical
assumptions, but we look at what we know
from these districts, the data we have
from the past and what the estimates are
giving us: which one is closest to being
true or being real regarding the numbers
we are getting. And sometimes we test
the assumptions. Like with the UNFPA method
that we are using, the ELL, normally we
do logistic regression
on these data sets and then we test
the ROC, that's the area under the
curve. We test a lot of
things to see which one is performing
better before we choose the model to
use.
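As an illustration of the area-under-the-curve check mentioned for the logistic regression step, here is a small sketch that computes the AUC directly from its definition: the share of positive/negative pairs in which the positive case receives the higher predicted probability, with ties counted as half. The function name and the data are hypothetical, and this is not the code used by GSS.

```python
def auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = has the characteristic) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(labels, scores))
```

A value near 0.5 means the model separates the two groups no better than chance, while a value near 1 means it ranks almost every positive case above every negative one.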
And then we are also trying, as the
Ghana Statistical Service,
to, like I said, institutionalize
the SAE method in Ghana. So we
release a report, not just testing these
estimates or testing these methodologies,
but we release the report to policy
makers to use, to the district-level
local government policy makers to also
use. Sometimes we invite them to these
publications so that they get a sense of
whatever we are doing for them: the
numbers we are giving to them, how they
were generated, how the estimates came
about, so that they know what they are
going to use the numbers for and what
went into them. And there are still
processes in place to make sure that SAE
methods become a core part of whatever we
do in GSS. We are mostly incorporating it.
Apart from the DHS reports that we
have done, currently, as I'm sitting here,
we are also working on the MPI reports
that were released some months ago, I
think last
So we are doing that. That was also at
regional level, and it's like a trend
analysis from 2022 to 2025. We are also
trying to run SAE for all these years
for all the districts, that's the 261
districts that we have in Ghana, and it's
not just the headcount of poverty, but we
are also doing it at the indicator level and
even running the intensity of
poverty for all these districts. So with
all these models there are
different methods that we are trying to
use. Currently we are done with the
headcounts, which were done using the ELL.
The intensity I've done for 2023, and
we tested the EBP, Fay-Herriot and
the Bayes approach. The reason is
that we can't estimate the intensity of
multidimensional poverty for each
household; it has to be an area-level
indicator, so we need an area-level model
to do this. That's why we are trying
to use the Fay-Herriot, the EBP and the
Bayes, that's the Bayesian approach. And
the current one that we have that has
been accepted is based on the EBP, so we
are trying to use the EBP to estimate
intensity for all the years also. So GSS
is currently doing a lot, and I won't say
we haven't faced some challenges. We have,
and that's also the reason why
these are based purely on census
data sets. We haven't actually included
geospatial data sets yet, because sometimes
it comes with data harmonization between
the survey, the census and even the
geospatial data sets. And then also
communicating uncertainty and model-based
estimates to non-technical users
is also a challenge sometimes. So we
try to find a way of interpreting them
in simpler terms or in simple language
for them to understand and also cherish
what we are doing at GSS. And then also
choosing the right auxiliary variables
becomes difficult sometimes. I mean,
with the MPI that we are doing,
we have to also make sure that
the indicators that we used in
estimating the headcount poverty
don't also end up in the
auxiliary variables that we are using,
the predictors we are using to estimate
poverty. So these are all difficult.
Sometimes you need to bring in other
external data sets, but because of how you
have to harmonize, and when you are not
getting the right variables to merge data
sets, choosing the right auxiliary
variables becomes a challenge. But, well, we
are doing our best. And then also the
right model: a lot of challenges
come up when choosing the right
model. And then we are facing capacity
constraints, or, let me put it that way,
we were facing capacity
constraints when it comes to estimating
models, but we try to put a member who
has not had skills in SAE in the team
so that the person picks something up
when it comes to SAE. So the next
time we can also rely on this person to
help us in SAE models.
And then what helps us move forward: I've
mentioned one, we put members in the team
so that they can do it, and we also
learn from each other. So we communicate;
sometimes we bring in external people
to also discuss what we are doing so
that they give us their points of
view, and then we put them together. And
we also had great leadership on
the part of the GSS management, where
they accept that, okay, this is not
our traditional way of doing things.
This is something new,
but we have accepted it and we are also
trying to take it forward. And
we give them thanks. And then, looking
ahead: currently there are plans for
the GSS data science team to also look
at external data sources and how we can
incorporate them, which is also solving our
challenges when it comes to data
harmonization. And then I mentioned the
work on multidimensional
poverty that we are working on. There are
labor statistics coming. We are working on
the Ghana Living Standards Survey
currently. We are also working on the
consumption and the poverty line
estimates, and when that's done we are
also going to run estimates for the
district level, because that one is also
representative at the regional level. And
the data science team is working on
call detail records. I am not
part of that team, so I don't really know
what is in the data set, but they are
working on how they can get some
variables to help us estimate these
models at the district level,
something that will distinguish one
district from another, not just based on
the human beings living in it, but
something that is peculiar to a
particular district from the data set. So
that is also work that we are doing
at the Ghana Statistical Service. Thank
you.
>> Thank you, Edward, for going through
Ghana's experience. I mean, it's really
impressive to see all this reporting
using all this SAE modeling,
which seems to be a bit of
a challenge for many NSOs. But going
through these examples and answering the
questions at the local level,
that's where decisions are made, where
the impact is much more important.
That's really great to see.
Yeah, time is flying and we are
trying to get everybody in, and we'll
probably add extra minutes at the end
for more Q&A. So without further ado,
I'm now switching to another
continent, going to Asia-Pacific. We have
our colleague FIA from the Department of
Statistics Malaysia. She
will be presenting the experience of SAE
in their
department. FISA, the floor is yours.
Please go ahead.
>> okay thank you Mr. David, let me share
my slide first.
Can you see my slide?
>> It's coming.
>> Yes, I can see your slide.
>> Okay. Assalamu alaikum and hello
everyone. I am Faizar Rosanti Taj Aaros,
a statistician from the Department of
Statistics Malaysia. Today I will share
my experience attending the SAE capacity
development program under ESCAP, UNSD and
the World Bank, with our team using SAE to
estimate income and poverty at the state
legislative assembly level in Kluang, Johor.
This is the geography of Na and the
arrow shows the location of the study,
which is Kluang.
Below, the four maps clearly show the
different boundaries for each
administrative level in Kluang, Johor. The
first map is our administrative district
of Kluang as admin level two. Then the
district is divided into three
parliaments as admin level three. Each
parliament actually consists of two
SLAs, with a total of six SLAs at admin
level four, and we have 11 census districts at
admin level five. For this study we use
three sets of data. First, the Population
and Housing Census 2020, with data down to
enumeration block level in shapefile
format. Then we have the Household Income
and Expenditure Survey 2022, with
data down to living quarters level in CSV
format. And we have satellite image data
for 2024 in CSV and TIF format.
Using our census shapefile data,
we generate the population map
based on the enumeration blocks. From
this map we can see that most of the
population of Kluang administrative
district is in the parliament of Kluang.
Then we create a new shapefile by
combining the census and survey data, and
the second map shows the
distribution of average household income
across the Kluang area.
Using data from Google Earth
Engine, we also plot the nighttime
light points from their
coordinates, and we turn each point
into a polygon centered on it. This map
displays nighttime light intensity for
Kluang.
Then we use the satellite images to create
a simple map showing how bright each
area is at night. From the nighttime
light map, we can conclude that the most
populated areas are also the brightest
areas in the satellite image data. The
polygons are colored based on the
average nighttime light, and the areas
with no data are shown in light gray in
the second map. Then we apply a treatment
to the EBs with no satellite data: we
replace the missing data with the global
mean.
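The mean-replacement step described here can be sketched as below. This is a minimal illustration only, not the team's actual code: the function name and the nighttime light values are hypothetical, and missing readings are assumed to be represented as None.

```python
from statistics import mean

def impute_global_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    global_mean = mean(observed)
    return [global_mean if v is None else v for v in values]

# Hypothetical average nighttime light per enumeration block;
# None marks a block with no satellite data.
ntl = [4.0, None, 10.0, 1.0, None]
filled = impute_global_mean(ntl)
print(filled)
```

Filling with a single global mean keeps every block in the model, but it also pulls the imputed blocks toward the average, so it is worth checking how many blocks are affected.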
After finishing the data cleaning,
we used the Guidelines to Small Area
Estimation for Poverty Mapping, published
by the World Bank, to identify which
method should be
used for this study. From the decision
tree on method availability, we chose
an area-level model-based approach. This is
because the census and survey data were
not collected in the same time frame:
the census was in 2020 and its data were
received at EB level, while the survey
data were from 2022 and at LQ
level. So for this study we chose
the Fay-Herriot model. Then we used
LASSO with the glmnet package in R for
variable selection, and transformed the
predictors to improve the fit of the
model. The lambda value was used to
extract the variables with nonzero
coefficients.
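The idea of keeping only the variables with nonzero coefficients at a given lambda can be sketched as follows. The presentation used the glmnet package in R; the block below is instead a small pure-Python coordinate-descent LASSO written for illustration, with hypothetical toy data, no intercept, and the objective (1/(2n))·||y − Xb||² + lambda·||b||₁ assumed.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent (no intercept; data assumed centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
            beta[j] = new
    return beta

# Hypothetical toy data: y depends only on the first predictor.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
y = [-6.0, -3.0, 0.0, 3.0, 6.0]
beta = lasso_cd(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case the second predictor is dropped and the first is shrunk from 3 toward zero, which is the usual trade-off: LASSO selects variables but biases the surviving coefficients, so the selected set is often refitted in the final model.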
For this study, we use income
indicators, poverty line information
and nighttime light from satellite images
as auxiliary information to help
stabilize the estimates for these areas. We
cannot use the direct survey estimates
alone because the sample in an area
might include only a few
households, so the numbers become
noisy, unstable and sometimes misleading.
We use the Fay-Herriot model to improve
the estimates by combining two sources of
information: what the survey
tells us and what we know from other data
about the small area. If an area has
strong survey data, the model relies
mainly on the survey. But if the survey
data are weak, the model borrows strength
from the auxiliary information. For this
model, we group households by EB and
calculate the direct survey poverty rate
for each EB. Then the model improves this
estimate using the auxiliary data,
without any transformation.
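The "borrowing strength" behaviour described here corresponds to the standard Fay-Herriot shrinkage form: each area's estimate is a weighted average of its direct estimate and a regression-based (synthetic) prediction, with weight gamma = sigma2_u / (sigma2_u + D), where D is the area's sampling variance. The sketch below assumes the model variance sigma2_u and the synthetic predictions are already available (in practice they are estimated from the data); all numbers and names are hypothetical.

```python
def fay_herriot_shrinkage(direct, sampling_var, synthetic, sigma2_u):
    """Combine direct and synthetic estimates area by area.

    gamma close to 1: precise survey estimate, rely mainly on the survey.
    gamma close to 0: noisy survey estimate, rely mainly on the auxiliary model.
    """
    estimates, gammas = [], []
    for d, D, s in zip(direct, sampling_var, synthetic):
        gamma = sigma2_u / (sigma2_u + D)
        gammas.append(gamma)
        estimates.append(gamma * d + (1.0 - gamma) * s)
    return estimates, gammas

# Hypothetical poverty rates for two enumeration blocks.
direct = [0.30, 0.10]          # direct survey estimates
sampling_var = [0.0001, 0.01]  # block 1 is precise, block 2 is noisy
synthetic = [0.20, 0.20]       # predictions from auxiliary data
est, gam = fay_herriot_shrinkage(direct, sampling_var, synthetic, sigma2_u=0.0025)
print(est, gam)
```

Here the precise block stays close to its direct estimate of 0.30, while the noisy block moves most of the way from 0.10 toward the synthetic value of 0.20.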
The Fay-Herriot model without any
transformation can be uneven or noisy.
From the chart, we can see that the
relationship between the direct survey
estimates and the model estimates is
less clear and the
points are scattered. The
Brown test results show that the
correlation between the model-predicted
and the direct survey estimates is 0.68.
This correlation shows that the
auxiliary data we use are informative and
support the model in producing better
and more stable poverty estimates.
Next we try fitting the Fay-Herriot model
with a log transformation, using the emdi
package. We apply a log transformation to
stabilize the modeling scale. After the
transformation, we convert the results back
to the original poverty rate scale using
a bias-corrected back-transformation, or
bc_sm, so they are interpretable. The output
now includes improved estimates for every
EB plus their uncertainty, or MSE, with
bias-corrected back-transformation to
the original scale. As a result, the
relationship becomes clearer, the data
points are more orderly and the model
becomes more consistent.
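Why a bias correction is needed after a log transformation can be sketched as below. The direct estimate and its variance are moved to the log scale with the delta method, and on the way back a half-variance term is added before exponentiating, because the plain exponential of a log-scale estimate tends to understate the mean on the original scale. This is a generic lognormal-style illustration under assumed inputs; the exact variance term used by a particular back-transformation option in the emdi package may differ, and the numbers are hypothetical.

```python
import math

def to_log_scale(direct, sampling_var):
    """Delta method: log of the estimate and its approximate variance."""
    return math.log(direct), sampling_var / direct ** 2

def back_transform(log_estimate, var_term=0.0):
    """Return to the original scale; var_term=0 gives the naive back-transformation."""
    return math.exp(log_estimate + 0.5 * var_term)

# Hypothetical direct poverty rate and sampling variance for one block.
log_est, log_var = to_log_scale(0.2, 0.0004)
naive = back_transform(log_est)
corrected = back_transform(log_est, var_term=log_var)
print(naive, corrected)
```

The corrected value is slightly above the naive one; the gap grows with the variance, so the correction matters most for the noisiest areas.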
The Brown test results show that the
correlation between the model with
transformation and the direct survey
estimates improves from 0.68 to
0.79.
The main model accuracy indicator here
is the MSE, which shows how confident we
are in the estimates.
In summary, the Fay-Herriot model gives
more stable and reliable poverty estimates
for each small area. The model improves on
the direct survey results by reducing noise
and borrowing strength from auxiliary
information to make the estimates more
accurate. Now, the limitations and way
forward for this study. To improve data
quality, we need stronger administrative
and geospatial data such as housing
conditions, land use, school attendance
and clinic visits. We can also incorporate
other data sources such as mobile
phone data, digital primary records and
additional satellite indicators to
strengthen the model. Besides that, all
data sets must include geographic
boundary information so that the
coordinates are more accurate and the
model improves. And last but not least,
better partnership through collaboration
with local authorities, welfare departments,
utility providers and other facilities
providers is essential to access micro
data, and these data sources can
significantly enhance the accuracy of
small area poverty estimates. That's all
from me. Thank you very much for your
attention.
>> Thank you, Fisa. That's impressive.
I was even thinking of going into the
course myself. It's really
refreshing to see the
sweat that goes into running all this
modeling. As they say in
Malaysia statics plume in harmony that's
the motto. So we've now
come to the conclusion of the
presentations. We have an extra 5 to
10 minutes for Q&A. I have seen that
most of the questions are being
answered in the chat, very much thanks to
David and Howie Adit. But I will
open one question for all
three country presenters, for
Edward, Fisa and Javier. I think you've
alluded to this in your presentations.
In the Q&A we have a
question from Sanchez. He's asking
if you can say a few words
about how policy makers and governments
are accepting SAE results when you
present them. What do you
consider important in communicating
this to policy makers? If you have
some experience you can share
with us, that would be good.
You can jump in and
answer, Fisa, Edward or Javier.
>> Okay. Thank you. For Ghana, I know
that the estimates that we are
producing are a great deal for policy
makers, especially when we go to the
local government. They use some of the
estimates that we give them in their
district common fund estimates. So when
they are preparing their district
common fund formula, that's how they
call it here, they use some of these
estimates to guide them. So yes,
it's a great deal for them. When we
produce them, they come, they sit in and
then we release the numbers to them.
So I know they are using them because
that's one of the things that they do.
And I think the government also, yes,
because I've seen the Ministry of
Sanitation and some of the others relying
on these numbers that we put out for the
districts
to do some of the policies. I know they
are revising some of the documents
because of some of the estimates that we
put out there.
>> Thank you, Edward. Any input from
Fisa or Ms. Javier?
>> Okay. In Malaysia
we are currently looking seriously at
small area statistics. We have started
publishing small area statistics, such as
parliament and SLA statistics, and our
current strategic plan for the department
also includes small area estimation
for our future statistics. That's all
from me.
>> Thank you. Thank you. And now to Chile.
>> Yes. Well, poverty
estimations with SAE models are already
in place. We do not produce those
models, though, but they have been
working for around two or three years,
maybe a little bit more. And for us, with
victimization, it has been
a lot of work to get
these estimations considered. As we told
you, we started working in 2018, and only
by 2024 were we allowed to publish these
estimations within the
official results of the survey.
And we still publish them as
experimental statistics, so they are
still not fully official. But
we are working on it. We are expecting
that after two or three iterations we
might publish the results as official
statistics.
>> Thank you. Thank you very much. I'll
just switch back for a few extra
minutes to David. I've seen a
number of questions answered there.
Maybe you can give us a quick
highlight, if you want to
broadcast this to the whole group, of
some of the questions. Most of them are
on technicalities and the types of models
used. I've also seen a few questions
requesting support for doing some
of the work. So in light of
the support that would be
available in the coming year, if you
could briefly present what we're
going to do from our side in terms
of supporting countries.
>> Yep. Sure.
So, thank you, Daniel. For
supporting countries, usually the
World Bank has a contact person in
country, and it's best to work through
that person, or a contact person that
works with NSOs.
I can work through
that person and support country teams,
as I've been doing with Nigeria and will
soon be doing with Colombia.
On some of the technicalities,
I put my
email address in the chat. I'm happy to
address any questions that people may
have. One thing I would point out is
that cross-sectional small area
estimation is very different from
intertemporal survey-to-survey
imputation. The same variables
that may do well in cross-sectional
small area estimation may not do well when
trying to predict across time.
Simple models with geospatial
indicators, for example, do not do as
well predicting across time as they do
across space. In general, I feel
more confident using models that have
been trained in a cross-section to
predict across space rather than
applying them to intertemporal
prediction, though my colleagues have
been working on that, as I put in the
chat, and have written quite a bit, and
I would encourage people to talk to
them about that. It's just a
different thing. So yes, I'm
looking forward to continuing this
agenda. Certainly the World
Bank is still very interested in how to
use models and data integration to
produce better data, and we're happy to
support any countries that
have the same interests. Thank you.
>> Thank you, David. Just
for the last 30 seconds or so, back to our
regional commissions, Angela and Roth. I
think this is something that also
comes to you: the support for countries
and the plan for next year. If
you can say a few words on that, that
would be great.
>> Okay, I'll go first. Thank you, Daniel,
for the opportunity. Without a doubt,
these techniques are
really important, and they help us
leverage the data that we have
to produce disaggregated
estimates and
fill gaps. So they are definitely
important. As the African region, we would
want to continue promoting this among
the member states and
exposing them to it, so that at the end of
the day we can have a critical mass of
individuals that can actually
apply these techniques, use these
methods and do this modeling to achieve
the ultimate goal. We definitely need this
for our Agenda 2063 as Africa, as well as
for the global agenda.
So we'll continue to encourage
other member states to apply this. Thank
you. Back to you.
>> Thank you Angela. Back to you Roth for
the last uh word.
>> Thanks, Daniel. Thanks, everyone, for
the active contributions. Like Angela,
ESCAP is really keen to explore
opportunities to help countries out.
As colleagues mentioned,
our doors are open if you're
interested in reaching out and letting
us know how we can help you
with your work. Small area estimation,
particularly geospatial small area
estimation, ties in nicely
to our work on big data, data science
and data integration. So in a
way it really is part of our plan
to extend support to countries in our
region as much as we can, whether that be
through knowledge management,
capacity building, you know, aka
projects, or providing hands-on
training or even webinar-type
support. So do
look out for more information from us.
The best way to do that is through
our website. That's the first
point of contact. You can also
reach out to me or to our generic
email more generally as well. Back
to you, Daniel. Thank you.
>> Thank you. Thank you, Ros. Thank you,
everybody, for attending the session. I
just put up one slide because there was
back and forth in the chat. This is a
number of tools available. You
don't have to wait for a response from us
or from a specific agency. You can go
ahead and do a lot yourself: self-paced
courses that were
mentioned during the call are online, and
some exercises are already there. But in
general, yes, UNSD, in collaboration
with partners, will be happy to help
in this process. Unfortunately, Howie
has to leave, but from our side,
from the UNSD side, we are coordinating
this work with the Inter-Secretariat
Working Group on Household Surveys as
an item that we're working on. So
we'll keep in touch, and thank you for
your attention and for making it to the
last bit. Sorry for taking an extra
five minutes of your time. I appreciate
your patience and
follow-up. Have a great day.
>> Thank you, Daniel. Thank you everyone.
>> Thank you everybody. That's you.
This webinar highlights the practical application of Small Area Estimation (SAE) to produce granular, policy-relevant data for the Sustainable Development Goals. It features collaborative efforts between the UNSD, the World Bank, and regional commissions in Africa and Asia-Pacific to bridge data gaps by integrating surveys with geospatial and administrative data. The session showcases methodological advancements, capacity-building initiatives like e-learning and workshops, and real-world case studies from Nigeria, Chile, Ghana, and Malaysia, focusing on poverty, victimization, and health indicators.