Sabbatical Leave Programme 2017 Title of the research – Leveraging Big Data for SDGs Monitoring and Reporting in Latin America and the Caribbean Staff member name: Giovanni Savio Institution: UN Economic Commission for Latin America and the Caribbean, UN- ECLAC Academic supervisor name and title: Mr. Alan Belward, Head of Unit D06, Knowledge for Sustainable Development and Food Security, European Commission Joint Research Centre, JRC Date: 25 January 2018 © United Nations Sabbatical Leave Programme The views and recommendations expressed in the present report are solely those of the original author and other contributors and do not necessarily reflect the official views of the United Nations, its agencies or its Member States. Textual material may be freely reproduced with proper citation as appropriate. Endorsement by academic supervisor This is to certify that the present report is based on the research undertaken by Mr. Giovanni Savio during the period July-November 2017 at JRC, European Commission, under my supervision. Signature: Name: Alan BELWARD Title: Head of Unit JRC D06 Date: January, 2018


Page 1: Sabbatical Leave Programme 2017 Title of the …...Big Data and the SDGs 12-17 5. Empirical Applications 17-32 6. Conclusions and Recommendations 32-34 References 3/38 Acknowledgements

Sabbatical Leave Programme 2017

Title of the research – Leveraging Big Data for SDGs Monitoring and Reporting in Latin America and the Caribbean

Staff member name: Giovanni Savio

Institution: UN Economic Commission for Latin America and the Caribbean, UN-ECLAC

Academic supervisor name and title: Mr. Alan Belward, Head of Unit D06, Knowledge for Sustainable Development and Food Security, European Commission Joint Research Centre, JRC

Date: 25 January 2018

© United Nations Sabbatical Leave Programme

The views and recommendations expressed in the present report are solely those of

the original author and other contributors and do not necessarily reflect the official

views of the United Nations, its agencies or its Member States. Textual material may

be freely reproduced with proper citation as appropriate.

Endorsement by academic supervisor

This is to certify that the present report is based on the research undertaken by Mr.

Giovanni Savio during the period July-November 2017 at JRC, European

Commission, under my supervision.

Signature:

Name: Alan BELWARD

Title: Head of Unit JRC D06

Date: January, 2018


Contents

Acknowledgements

Abstract

1. Introduction

2. Definition of Big Data

3. Big Data and Official Statistics

4. Big Data and the SDGs

5. Empirical Applications

6. Conclusions and Recommendations

References


Acknowledgements

The research was carried out during my UN Sabbatical Leave period at the Joint Research Centre, JRC, of the European Commission. I am indebted to Mr. Alan Belward and Ms. Apollonia Miola for hosting me in their Unit during the four months of the Sabbatical period. I am also grateful to Ms. Francesca Campolongo and Mr. Andrea Pagano of the JRC for their logistical support during my research. My thanks also go to the participants in an internal JRC seminar, organized by Unit D6 in October 2017, where the main results and ideas of the research were presented and discussed.


Abstract

The adoption of the Sustainable Development Goals (SDGs) by the United Nations General Assembly in September 2015 calls on National Statistics Offices (NSOs) worldwide to underpin a data revolution. NSOs should extend both the scope and the disaggregation of the data traditionally produced, and measure new economic, social and environmental phenomena, leaving no one behind.

There is a growing consensus that, in the digital era, Big Data might

strengthen traditional data sources and statistics in monitoring

sustainable well-being, facilitating the transformative agenda that

official statisticians should implement in the forthcoming years facing

the new challenges.

This research reviews Big Data definitions and analyses sources for monitoring economic and environmental indicators for Latin American and Caribbean countries. It examines the advantages of a more intensive use of Big Data (frequency, granularity, coverage, data collection costs, etc.) and the requirements Big Data should satisfy for effective use in official statistics and for effective and efficient use as proxies for the SDGs.

The research also tests, statistically and econometrically, whether Big Data might help fill existing gaps in official statistics for two main SDG indicators for Latin American and Caribbean countries, namely GDP and poverty.

The empirical analyses, carried out with particular reference to poverty, show that official statisticians might gain considerable advantages from using Big Data sources to meet the growing demand from policy makers.


1. Introduction

The adoption of the SDGs in September 2015 calls on NSOs worldwide to underpin a data revolution (see Independent Expert Advisory Group on a Data Revolution for Sustainable Development (2014)). NSOs will be asked to extend both the scope and the disaggregation of the data produced, and to measure new economic, social and environmental phenomena, leaving no one behind.

There is a growing consensus that Big Data may strengthen traditional

data sources and statistics in monitoring sustainable well-being,

facilitating the transformative agenda that NSOs should implement in

the forthcoming years.

This research reviews Big Data sources for monitoring economic and environmental indicators for Latin America and the Caribbean. It examines their advantages (frequency, granularity, coverage, data collection costs, etc.) and the requirements they should fulfil for effective and efficient use as proxies for the SDGs. Finally, it tests, statistically and econometrically, whether Big Data might help fill existing gaps for some SDG indicators of Latin American and Caribbean countries.

The research focuses on earth observation and satellite images as the

main instrument to derive early estimates for SDGs, particularly those

considered in the present report, namely GDP and poverty.

The research is divided into five parts. The first part deals with the definition of Big Data and reviews their use in official statistics.

The second part reviews the potential of Big Data for official statistics.

The third part is an in-depth analysis of Big Data sources of information for economic and environmental SDG indicators. It also reviews the mechanisms in place at the national and international levels for using Big Data in SDG monitoring and reporting.

The fourth part is mostly empirical and constitutes the core of the study. It analyses how earth observation data can be used to produce maps of GDP and poverty indices at a very disaggregated geographical level (1 square km) for immediate use in SDG monitoring and reporting. In both cases, the characteristics of the data used, the available literature on similar exercises and the preliminary results are discussed in detail.

The final part summarizes the results and proposes lines of action for the future.

2. Definition of Big Data

Big Data are becoming the by-product of the increasing digitalization of our modern economies. This phenomenon is likely to endure for years to come. However, there is currently no uniformly accepted definition of Big Data. Big Data are generally characterized by the so-called four Vs, namely velocity, volume, veracity and variety, as in the infographics of the IBM Big Data & Analytics Hub (see footnote 1 and Figure 2.1).

Figure 2.1: Properties of Big Data - The 4 Vs

Some authors also refer to other Vs that are relevant in order to fully describe Big Data. Amongst those characteristics are: Value (the information and insights that Big Data provide), Viability (quick and cost-effective assessment of a particular variable's relevance), Variability (due to changing definitions, irregularities in the data, and the existence of a multitude of data dimensions resulting from multiple disparate data types and sources) and Visualization (presenting the data in a manner that is readable and accessible).

__________________

1 Available at http://www.ibmbigdatahub.com/infographic.

The meaning of the four Vs is summarised in Table 2.1, which is drawn from TechAmerica Foundation (2012). As correctly pointed out by Manske, Sangokoya, Pestre, and Letouzé (2016), Big Data refers not only to data, but also to the whole ecosystem that produces and uses them. This gives rise to the three Cs definition of Big Data, which characterizes them as the union of Big Data Crumbs (new kinds of passively generated data), Capacity (the technical and human capacity to yield insights from these data) and Community (new actors, for example from the private sector and the research community).

The definition of Big Data provided by the TechAmerica Foundation,

although rather general, is the one adopted here. It states that: ‘Big Data

is a term that describes large volumes of high velocity, complex and

variable data that require advanced techniques and technologies to

enable the capture, storage, distribution, management, and analysis of

the information.’

Table 2.1: Characteristics of Big Data

Characteristic: Volume

Description: The sheer amount of data generated or data intensity that must be ingested, analysed, and managed to make decisions based on complete data analysis

Attribute: According to IDC's Digital Universe Study, the world's "digital universe" is in the process of generating 1.8 Zettabytes of information - with continuing exponential growth - projecting to 35 Zettabytes in 2020

Driver: Increase in data sources, higher resolution sensors

Characteristic: Velocity

Description: How fast data is being produced and changed and the speed with which data must be received, understood, and processed

Attribute:

- Accessibility: Information when, where, and how the user wants it, at the point of impact

- Applicable: Relevant, valuable information for an enterprise at a torrential pace becomes a real-time phenomenon

- Time value: real-time analysis yields improved data-driven decisions

Driver:

- Increase in data sources

- Improved thru-put connectivity

- Enhanced computing power of data generating devices

Characteristic: Variety

Description: The rise of information coming from new sources both inside and outside the walls of the enterprise or organization creates integration, management, governance, and architectural pressures on IT

Attribute:

- Structured: 15% of data today is structured, row, columns

- Unstructured: 85% is unstructured or human generated information

- Semistructured: The combination of structured and unstructured data is becoming paramount.

- Complexity: where data sources are moving and residing

Driver:

- Mobile

- Social Media

- Videos

- Chat

- Genomics

- Sensors

Characteristic: Veracity

Description: The quality and provenance of received data

Attribute: The quality of Big Data may be good, bad, or undefined due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations

Driver: Data-based decisions require traceability and justification

Following a definition based on data sources (see United Nations Economic and Social Council (2013)), Big Data types are classified as follows:

1. Sources arising from the administration of a programme, be it

governmental or not, e.g., electronic medical records, hospital visits,

insurance records, bank records and food banks;

2. Commercial or transactional sources arising from the transaction

between two entities, e.g., credit card transactions and online

transactions (including from mobile devices);

3. Sensor network sources, e.g., satellite imaging, road sensors and

climate sensors, such as those pertaining to Remote Sensing data

sources;

4. Tracking device sources, e.g., tracking data from mobile telephones

and the Global Positioning System (GPS);


5. Behavioural data sources, e.g., online searches (about a product, a

service or any other type of information) and online page views; and

6. Opinion data sources, e.g., comments on social media.

Sensor network sources, particularly satellite imaging, are those used in

this research to proxy the relevant SDGs.

There are other attempts to classify types of Big Data, such as the one proposed by the UN Economic Commission for Europe (see footnote 2), which is more general, goes down to a two-digit classification level and seems to merit further investigation.

Administrative data, traditionally organized in a structured way by public administrations, should not be classified as Big Data, but could become such if their velocity and volume were to increase, as seems plausible in the future.

3. Big Data and Official Statistics

In addition to generating new commercial opportunities in the private

sector, Big Data are potentially a very interesting data source for official

statistics, either for use on their own, or in combination with more

traditional data sources, such as sample surveys and administrative

registers.

Nowadays, Big Data are unanimously described as a transformative tool

for official statistics, and the statistical community has recognized the

potential for Big Data in improving accuracy and reducing costs for

NSOs around the world.

Typical examples of the use of Big Data of various types for statistical purposes include the 'web scraping' of internet data to produce the 'billion prices' Consumer Price Indices; Google searches for nowcasting the state of the economy, employment and unemployment rates, car sales, tourism demand, and migration data; social media messages (especially content and sentiment) to proxy confidence and sentiment indicators of consumers and enterprises in various activity sectors; and satellite images to obtain estimates of activity levels and growth, energy consumption, land and water use, and poverty conditions.

__________________

2 See http://www1.unece.org/stat/.


It is this last area of use of Big Data that represents the focus of this

research.

However, extracting relevant and reliable information from Big Data

sources and incorporating it into the statistical production process is not

an easy task. There are challenges regarding analysis, capture, search,

sharing, storage, transfer, visualization, and information privacy of big

data. These challenges require new technologies to uncover hidden

values from large datasets that are diverse, complex, and massive in

scale.

Some of the biggest challenges that statisticians face in their use of Big

Data concern methodology. Many Big Data sources, such as social media

messages, are composed of observational data without a well-defined

target population, structure and quality. This makes it difficult to apply

traditional statistical methods based on sampling theory. The

unstructured nature of many Big Data sources makes it even more

difficult to extract meaningful statistical information.

For NSOs, a key question concerns how the quality of official statistics can be guaranteed if Big Data are used totally or in part to derive estimates. Because these data are collected for non-statistical purposes, they usually do not meet statistical standards in many respects, such as representativeness, coverage, concepts, definitions and collection methods. To use these data, official statisticians should investigate and understand their statistical characteristics and improve the accuracy of these non-statistical extrapolators through weighting, filling in gaps in coverage, bias adjustments, averaging with other extrapolators, and benchmarking and balancing. For this, it is imperative that Big Data be accompanied by appropriate metadata, which should be carefully scrutinized before the data are used to produce official statistics. Unfortunately, such metadata are not always available, which might result in a certain loss of control and in dependency on the part of official statisticians.
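As an illustration of one of the adjustments listed above, benchmarking, the following is a minimal sketch of pro-rata benchmarking, in which a proxy series is rescaled so that it adds up to an official total. The function name, the regional proxy values and the national total are all hypothetical and chosen only for illustration; pro-rata scaling is one simple option among several and is not necessarily the method used in this research.

```python
def benchmark_pro_rata(proxy, official_total):
    """Scale proxy values so that they sum to an official benchmark total."""
    proxy_total = sum(proxy)
    if proxy_total <= 0:
        raise ValueError("proxy values must have a positive sum")
    factor = official_total / proxy_total
    return [value * factor for value in proxy]


# Hypothetical proxy values for four regions (illustrative numbers only),
# e.g. an indicator derived from a non-statistical source.
regional_proxy = [10.0, 30.0, 40.0, 20.0]

# Hypothetical official national total that the regional values must add up to.
national_total = 500.0

benchmarked = benchmark_pro_rata(regional_proxy, national_total)
print(benchmarked)
```

The regional shares implied by the proxy are preserved, while the level is taken from the official figure. This is why benchmarking is often attractive when a proxy is informative about distribution but not about levels.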

Privacy and legal issues form another challenge. The prevention of the

disclosure of the identity of individuals is an imperative, but this is

difficult to guarantee when dealing with Big Data. Since legislation

typically lags behind the emergence of new social phenomena, the legal

situation for cases involving Big Data is not always clear. In such cases,

one may have to fall back on ethical standards to decide on whether and

how to use Big Data. Other legal issues relate to copyright and the

ownership of data. Even if data may legally be used, this does not imply

that it is wise or appropriate to do so.


Another obvious challenge is the processing, storage and transfer of

large data sets. Technological advances in computing, larger storage

facilities and high bandwidth data channels may partly solve these

issues. Having data processed at the source, thus preventing the transfer

of large data sets and the duplication of storage, may also be considered.

These technological challenges include mechanisms for ensuring the

security of data, which is of the utmost importance because of privacy

and confidentiality concerns and makes, for example, cheap cloud-based

solutions less attractive.

Another issue is the possible volatility of Big Data sources and, given that official statistics often take the form of time series, the availability of long, temporally consistent time series from these sources. For many users, the continuity of these series is of the utmost importance.

Still another issue is the skills required for dealing with Big Data. Modern data scientists may be better equipped than traditionally trained statisticians. Probably more important is the need for a different mindset, as the use of Big Data may imply a paradigm shift, including an increased and modified use of modelling and forecasting techniques (Daas and Puts, 2014a; Struijs and Daas, 2013).

In 2014 the United Nations established a Global Working Group (GWG)

to ‘provide a strategic vision, direction, and a global programme on big

data for official statistics, to promote practical use of sources of big data

for official statistics, while finding solutions to their challenges, and to

promote capacity building and sharing of experiences in this respect.’

The GWG provides strategic vision, direction and coordination for a

global programme on Big Data for official statistics, including indicators

of the 2030 Agenda for Sustainable Development. It also promotes the

practical use of Big Data sources, capacity-building, training and sharing

of experience. Finally, it fosters communication and advocacy with

respect to the use of Big Data for policy applications and offers advice

on building public trust in the use of Big Data from the private sector.

Since 2014, annual reports by the GWG have been discussed by UN member countries' Chief Statisticians within the framework of the UN Statistical Commission. These reports constitute an up-to-date overview of activities carried out internationally in the many fields of intersection between Big Data and official statistics.


The GWG established three teams on mobile phone data, satellite

imagery and social media data, respectively, to draft guidance and

develop practice through pilot projects. Another team was established

dealing with access to data and building partnerships with the private

sector and other communities; that team drafted provisional agreements

for access to data with globally operating Big Data providers.

An additional team was established with the aim to communicate the

benefits and value of Big Data, which included fundraising strategies to

enable developing countries to actively participate in pilot projects.

Given the context of the 2030 Agenda for Sustainable Development, it

was also agreed that one team would be tasked specifically with keeping

track of the links between the indicators needed for monitoring the SDGs

and Big Data applications. Finally, two more teams were created: one on

training, skills and capacity-building, and one on cross-cutting issues,

such as methodology, classification and quality frameworks.

4. Big Data and the SDGs

The adoption of the SDGs in September 2015 calls on NSOs worldwide to underpin a data revolution. NSOs are now asked to extend both the scope and the disaggregation of the data produced, and to measure new economic, social and environmental phenomena, leaving no one behind. Most of the SDG indicators to be collected should be disaggregated by sex, income level, age, geographical area, and activity.

There is a growing consensus that Big Data might strengthen traditional

data sources and statistics in monitoring sustainable well-being,

facilitating the transformative agenda that NSOs should implement in

the forthcoming years.

Measuring human outcomes using new kinds of data emitted by humans, such as those observed through satellite images, is indeed an area with considerable promise, but it also carries significant challenges and uncertainties.

For the most part, the 'Big Data and SDG' debate is framed as a measurement and monitoring issue. There is already a significant body of evidence that Big Data hold this potential, as the examples and literature discussed in the next chapter summarize for the indicators (poverty and GDP) that are relevant in the framework of this research (see footnote 3).

Apart from contributing to the estimation of missing observations, the empirical applications presented here might help improve timeliness and geographical disaggregation for the SDGs reported in Table 4.1 (see UNSC (2017)).

Recent research indicates that data availability for the SDGs faces tremendous challenges worldwide, even before considering the expressed need for data disaggregation by sex, geographic area, sector, income level, etc. (see footnote 4), in order to leave no one behind.

Following the tiering system of the Interagency and Expert Group on the

SDG Indicators (IAEG-SDG) to categorize the indicators, the picture is

unfortunately not promising.

The IAEG-SDG has classified the indicators into three categories based

on the soundness of methodology and the availability of data. Tier I

indicators have an established methodology and regularly produced data;

Tier II indicators have an established methodology but not regularly

produced data; and Tier III indicators are indicators with no established

methodology.
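The tiering rule just described can be written as a small decision function. This is only a sketch of the rule as stated in the text; the function and parameter names are hypothetical and do not come from the IAEG-SDG.

```python
def classify_tier(established_methodology: bool, data_regularly_produced: bool) -> str:
    """Return the indicator tier according to the rule described in the text."""
    if not established_methodology:
        # No established methodology: Tier III, whatever the data situation.
        return "Tier III"
    # Established methodology: the tier depends on whether data are regularly produced.
    return "Tier I" if data_regularly_produced else "Tier II"


print(classify_tier(True, True))
print(classify_tier(True, False))
print(classify_tier(False, False))
```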

Only 42% of the 230 indicators are Tier I, and only 62% of those Tier I indicators (about 25% of all indicators) can be found online in a publicly accessible format.
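The shares quoted above can be combined with simple arithmetic. The sketch below takes the rounded percentages from the text at face value; the variable names are illustrative. Multiplying the two shares gives roughly 26% of all indicators, close to the 25% quoted. The small difference may reflect rounding in the published shares, which is an assumption rather than something stated in the source.

```python
total_indicators = 230          # total number of SDG indicators cited in the text
tier1_share = 0.42              # share of indicators classified as Tier I
online_share_of_tier1 = 0.62    # share of Tier I indicators available online

# Approximate number of Tier I indicators implied by the rounded share.
tier1_count = tier1_share * total_indicators

# Share of all indicators that are both Tier I and publicly available online.
online_share_of_all = tier1_share * online_share_of_tier1

print(round(tier1_count), round(online_share_of_all * 100))
```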

Without publicly accessible data, citizens and external groups cannot hold UN member states accountable for their progress in implementing each of the goals. Even the agenda's cornerstone indicator on extreme poverty lacks data for 72 countries over the last 15 years.

Much more work needs to be done to establish baselines and allow for

the monitoring of progress in each goal.

__________________

3 See also Data-Pop Alliance (2016) for examples of using Big Data for other SDGs.

4 Available at https://www.cgdev.org/blog.


Table 4.1: Relevant sustainable development goals, targets

and indicators

Goals and Targets (from the 2030

Agenda)

Indicators

Goal 1. End poverty in all its forms

everywhere

1.1 By 2030, eradicate extreme poverty for all

people everywhere, currently measured as

people living on less than $1.25 a day

1.1.1 Proportion of population below

the international poverty line, by sex,

age, employment status and

geographical location (urban/rural)

1.2 By 2030, reduce at least by half the

proportion of men, women and children of all

ages living in poverty in all its dimensions

according to national definitions

1.2.1 Proportion of population living

below the national poverty line, by sex

and age

1.2.2 Proportion of men, women and

children of all ages living in poverty in

all its dimensions according to national

definitions 1.4 By 2030, ensure that all men and women,

in particular the poor and the vulnerable, have

equal rights to economic resources, as well as

access to basic services, ownership and

control over land and other forms of property,

inheritance, natural resources, appropriate new

technology and financial services, including

microfinance

1.4.1 Proportion of population living in

households with access to basic

services

Goal 7. Ensure access to affordable,

reliable, sustainable and modern energy for

all

7.1 By 2030, ensure universal access to

affordable, reliable and modern energy

services

7.1.1 Proportion of population with

access to electricity

7.2 By 2030, increase substantially the share

of renewable energy in the global energy mix

7.2.1 Renewable energy share in the

total final energy consumption

7.3 By 2030, double the global rate of

improvement in energy efficiency

7.3.1 Energy intensity measured in

terms of primary energy and GDP

Goal 8. Promote sustained, inclusive and

sustainable economic growth, full and

productive employment and decent work

for all

8.1 Sustain per capita economic growth in

accordance with national circumstances and,

in particular, at least 7 per cent gross domestic

product growth per annum in the least

developed countries

8.1.1 Annual growth rate of real GDP

per capita

8.2 Achieve higher levels of economic

productivity through diversification,

8.2.1 Annual growth rate of real GDP

per employed person

Page 15: Sabbatical Leave Programme 2017 Title of the …...Big Data and the SDGs 12-17 5. Empirical Applications 17-32 6. Conclusions and Recommendations 32-34 References 3/38 Acknowledgements

15/38

technological upgrading and innovation,

including through a focus on high-value added

and labour-intensive sectors

8.3 Promote development-oriented policies

that support productive activities, decent job

creation, entrepreneurship, creativity and

innovation, and encourage the formalization

and growth of micro-, small-and medium-

sized enterprises, including through access to

financial services

8.3.1 Proportion of informal

employment in non-agricultural

employment, by sex

8.4 Improve progressively, through 2030,

global resource efficiency in consumption and

production and endeavour to decouple

economic growth from environmental

degradation, in accordance with the 10-Year

Framework of Programmes on Sustainable

Consumption and Production, with developed

countries taking the lead

8.4.1 Material footprint, material

footprint per capita and material

footprint per GDP

8.4.2 Domestic material consumption,

domestic material consumption per

capita, and domestic material

consumption per GDP

8.9 By 2030, devise and implement policies to

promote sustainable tourism that creates jobs

and promotes local culture and products

8.9.1 Tourism direct GDP as a

proportion of total GDP and in growth

rate

Goal 9. Build resilient infrastructure,

promote inclusive and sustainable

industrialization and foster innovation

9.2 Promote inclusive and sustainable

industrialization and, by 2030, significantly

raise industry’s share of employment and

gross domestic product, in line with national

circumstances, and double its share in least

developed countries

9.2.1 Manufacturing value added as a

proportion of GDP and per capita

9.4 By 2030, upgrade infrastructure and

retrofit industries to make them sustainable,

with increased resource-use efficiency and

greater adoption of clean and environmentally

sound technologies and industrial processes,

with all countries taking action in accordance

with their respective capabilities

9.4.1 CO2 emission per unit of value

added

9.5 Enhance scientific research, upgrade the

technological capabilities of industrial sectors

in all countries, in particular developing

countries, including, by 2030, encouraging

innovation and substantially increasing the

number of research and development workers

9.5.1 Research and development

expenditure as a proportion of GDP

Page 16: Sabbatical Leave Programme 2017 Title of the …...Big Data and the SDGs 12-17 5. Empirical Applications 17-32 6. Conclusions and Recommendations 32-34 References 3/38 Acknowledgements

per 1 million people and public and private

research and development spending

9.b Support domestic technology

development, research and innovation in

developing countries, including by ensuring a

conducive policy environment for, inter alia,

industrial diversification and value addition to

commodities

9.b.1 Proportion of medium and high-

tech industry value added in total value

added

Goal 10. Reduce inequality within and

among countries

10.1 By 2030, progressively achieve and

sustain income growth of the bottom 40 per

cent of the population at a rate higher than the

national average

10.1.1 Growth rates of household

expenditure or income per capita

among the bottom 40 per cent of the

population and the total population

10.2 By 2030, empower and promote the

social, economic and political inclusion of all,

irrespective of age, sex, disability, race,

ethnicity, origin, religion or economic or other

status

10.2.1 Proportion of people living

below 50 per cent of median income,

by sex, age and persons with

disabilities

10.4 Adopt policies, especially fiscal, wage

and social protection policies, and

progressively achieve greater equality

10.4.1 Labour share of GDP,

comprising wages and social protection

transfers

Goal 12. Ensure sustainable consumption

and production patterns

12.2 By 2030, achieve the sustainable

management and efficient use of natural

resources

12.2.1 Material footprint, material

footprint per capita and material

footprint per GDP

12.2.2 Domestic material consumption,

domestic material consumption per

capita and domestic material

consumption per GDP

12.c Rationalize inefficient fossil-fuel subsidies that encourage wasteful consumption by removing market distortions, in accordance with national circumstances, including by restructuring taxation and phasing out those harmful subsidies, where they exist, to reflect their environmental impacts, taking fully into account the specific needs and conditions of developing countries and minimizing the possible adverse impacts on their development in a manner that protects the poor and the affected communities

12.c.1 Amount of fossil-fuel subsidies per unit of GDP (production and consumption) and as a proportion of total national expenditure on fossil fuels

Goal 14. Conserve and sustainably use the

oceans, seas and marine resources for

sustainable development

14.7 By 2030, increase the economic benefits

to small island developing States and least

developed countries from the sustainable use

of marine resources, including through

sustainable management of fisheries,

aquaculture and tourism

14.7.1 Sustainable fisheries as a

proportion of GDP in small island

developing States, least developed

countries and all countries

Goal 17. Strengthen the means of

implementation and revitalize the Global

Partnership for Sustainable Development

17.1 Strengthen domestic resource

mobilization, including through international

support to developing countries, to improve

domestic capacity for tax and other revenue

collection

17.1.1 Total government revenue as a

proportion of GDP, by source

17.3 Mobilize additional financial resources for developing countries from multiple sources

17.3.2 Volume of remittances (in

United States dollars) as a proportion

of total GDP

17.13 Enhance global macroeconomic

stability, including through policy

coordination and policy coherence

17.13.1 Macroeconomic Dashboard

5. Empirical Applications

This part of the report presents the work carried out with night-lights satellite images, which are used as a proxy for mapping poverty and GDP at a very fine geographical level. While mapping results have been finalized for the poverty indices, the work on GDP is still under way and final results cannot be shown here. However, the sub-section on GDP reviews the existing literature and discusses the steps already carried out in the empirical analysis. This section is divided into three sub-sections: a description of the Big Data satellite images used in the empirical applications, the GDP analysis, and the poverty indices analysis.

5.1 Night Lights and Big Data from Satellite Images

Earth observation has been used in many respects to shed light on specific aspects of human development, such as economic output, population and demography, urban development, land, water and natural resources use, weather and climate change, and pollution monitoring. In parallel, there has been a growing use of night lights, one of the most important by-products of satellite remote sensing, as a proxy for measuring economic, social and environmental phenomena.

This research has made extensive use of the information coming from satellite images, as processed by the US Department of Defense and its Defense Meteorological Satellite Program’s Operational Linescan System (DMSP-OLS).

A characteristic of DMSP-OLS data that has attracted much of the attention of researchers in recent years is their availability at a very fine geographical level (1 square km). This makes it possible to estimate a number of statistics at sub-national detail, particularly those related to the level and growth of economic activity, and so to address the chronic lack of official statistics at a fine geographical level. Indeed, this is the level of disaggregation requested within the framework of the SDGs.

The DMSP is the meteorological program of the US Department of Defense, which started its activities in the mid-1960s with the objective of collecting worldwide cloud cover observations on a daily basis. The program was officially acknowledged and declassified in 1972, and its data were made available to the world community.

The DMSP program has been repeatedly upgraded over time, with the latest series incorporating the Operational Linescan System (OLS). The data are now released as Version 4, covering the years 1992-2013. The DMSP satellite flies in a sun-synchronous low earth orbit (833 km mean altitude) and makes a night-time pass typically between 20.30 and 22.00 local time each night. Because the satellite orbits the earth 14 times a day, global coverage can be obtained every 24 hours.

The OLS has two broadband sensors, one in the visible/near-infrared (VNIR, 0.4-1.1 μm) and one in the thermal infrared (10.5-12.6 μm) wavebands. The OLS is an oscillating scan radiometer with a broad field of view (about 3,000 km swath). It captures images at a nominal resolution of 0.56 km, which is smoothed on board into 5x5 pixel blocks of 2.8 km. This is done to reduce the amount of memory required on board the satellite.

Scientists at the National Oceanic and Atmospheric Administration’s

(NOAA) National Geophysical Data Center (NGDC) process these raw

data and distribute the final data to the public, following an undertaking

of monumental difficulty. In processing, they remove observations for

places experiencing the bright half of the lunar cycle, the summer

months when the sun sets late, auroral activity (the northern and southern

lights), and forest fires. These restrictions remove intense sources of

natural light, leaving mostly man-made light. Observations where cloud

cover obscures the earth’s surface are also excluded. Finally, data from

all orbits of a given satellite in a given year are averaged over all valid

nights to produce a satellite-year dataset.

It is these datasets that are distributed to the public. Each satellite-year

dataset is a grid reporting the intensity of lights as a six-bit digital

number, for every 30 arc-second output pixel (approximately 0.86 square

km at the equator) between 65 degrees south and 75 degrees north

latitude.
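The pixel size quoted above can be checked with a short calculation. The sketch below is only an illustration under a spherical-Earth assumption (the function name and the radius constant are not from the report); it also shows how the ground area of a 30 arc-second pixel shrinks away from the equator.

```python
import math

# Assumption: spherical Earth with the mean radius below (not from the report).
EARTH_RADIUS_KM = 6371.0088


def pixel_area_km2(lat_deg, arcsec=30.0):
    """Approximate ground area of an arcsec x arcsec pixel centred at lat_deg."""
    side_rad = math.radians(arcsec / 3600.0)
    # North-south extent is the same everywhere on a sphere.
    north_south_km = EARTH_RADIUS_KM * side_rad
    # East-west extent shrinks with the cosine of the latitude.
    east_west_km = north_south_km * math.cos(math.radians(lat_deg))
    return north_south_km * east_west_km


area_equator = pixel_area_km2(0.0)   # close to the 0.86 square km quoted above
area_lat_60 = pixel_area_km2(60.0)   # about half the equatorial area
```

Under this approximation, per-pixel totals at different latitudes are not directly comparable unless they are weighted by pixel area.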

Table 5.1.1: DMSP-OLS satellites

Year   F10       F12       F14       F15       F16       F18
1992   F101992   -         -         -         -         -
1993   F101993   -         -         -         -         -
1994   F101994   F121994   -         -         -         -
1995   -         F121995   -         -         -         -
1996   -         F121996   -         -         -         -
1997   -         F121997   F141997   -         -         -
1998   -         F121998   F141998   -         -         -
1999   -         F121999   F141999   -         -         -
2000   -         -         F142000   F152000   -         -
2001   -         -         F142001   F152001   -         -
2002   -         -         F142002   F152002   -         -
2003   -         -         F142003   F152003   -         -
2004   -         -         -         F152004   F162004   -
2005   -         -         -         F152005   F162005   -
2006   -         -         -         F152006   F162006   -
2007   -         -         -         F152007   F162007   -
2008   -         -         -         -         F162008   -
2009   -         -         -         -         F162009   -
2010   -         -         -         -         -         F182010
2011   -         -         -         -         -         F182011
2012   -         -         -         -         -         F182012
2013   -         -         -         -         -         F182013

The digital number is an integer between 0 (no light) and 63. A small fraction of pixels (0.1 percent), generally in rich and dense city areas, are censored at 63. In practice, sensor settings vary over time across satellites and with the age of a satellite, so that comparisons of raw digital numbers over years can be problematic. This explains why, in their last few years of life, satellites are accompanied by the new satellites that replace them; see Table 5.1.1 and footnote 5.

In the statistical work, we use the stable lights version, which is not intercalibrated across time or satellites, and control for such issues through panel regression estimation with fixed effects for time and satellites (see footnote 6). The digital number is not exactly proportional to the physical amount of light received (called true radiance) for several reasons. The first is sensor saturation, which is analogous to top-coding. Further, the scaling factor (“gain”) applied to the sensor in converting it into a digital number varies for reasons that are not explained, possibly to allow Air Force analysts to get clearer information on cloud cover.
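To illustrate why fixed effects help when sensor settings differ across satellites, the following minimal sketch compares a pooled regression slope with a "within" (group-demeaned) slope. It is a simplified, one-way version of the idea, not the specification used in this research; all numbers are made up for illustration and the function names are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Slope after removing group means, equivalent to group fixed effects."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, a, b in zip(groups, x, y):
        totals[g][0] += a
        totals[g][1] += b
        totals[g][2] += 1
    x_dm = [a - totals[g][0] / totals[g][2] for g, a in zip(groups, x)]
    y_dm = [b - totals[g][1] / totals[g][2] for g, b in zip(groups, y)]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


# Hypothetical data: the same true slope (0.3) for both satellites, but a
# satellite-specific offset that stands in for different sensor settings.
satellites = ["F14"] * 3 + ["F15"] * 3
log_lights = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
offsets = {"F14": 0.0, "F15": 2.0}
log_gdp = [0.3 * x + offsets[s] for s, x in zip(satellites, log_lights)]

pooled = ols_slope(log_lights, log_gdp)                 # distorted by the offsets
within = within_slope(satellites, log_lights, log_gdp)  # recovers 0.3
```

In this toy example the pooled slope is inflated by the satellite offsets, while the within estimator recovers the common slope.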

Unfortunately, the level of gain applied to the sensor is not recorded in

the data. The DMSP nighttime lights provide the longest continuous time

series of global urban remote sensing products, now spanning 22 years.

The flagship product is the stable lights, an annual cloud-free composite

of average digital brightness value for the detected lights, filtered to

remove ephemeral lights and background noise.

NGDC's recent reprocessing of the DMSP time series has produced 34 annual products from six satellites spanning 22 years. This is referred to as the v.4 DMSP stable lights time series, which is the one used here for the GDP studies.

The follow-on to DMSP for global low-light imaging of the Earth at night is the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB), flown on the joint NASA-NOAA Suomi National Polar-orbiting Partnership (Suomi NPP) satellite. These are the data used here for mapping poverty indices. They cover a shorter time span (monthly composites are available only from 2012 onwards, and annual composites only for 2015), but they are more precise than the earlier DMSP images and are released to the public promptly, a few days after the end of each month.

__________________

5 This happened for all satellites except F16, which was replaced by the last orbiting satellite, F18, without an overlapping period.

6 There are different versions of the data; three of particular importance are the “raw,” the “stable lights” and the “calibrated” versions. The stable lights version removes ephemeral events such as fires and background noise. The calibrated version is currently available only for 2006 and has the advantage of not being saturated (top-coded) at the highest intensities. We performed the analyses here primarily with the stable lights version, but did sensitivity checks using other measures and found only small quantitative differences.

VIIRS DNB provides several key improvements over DMSP-OLS data,

including a vast reduction in the pixel footprint (ground instantaneous

field of view [GIFOV]), uniform GIFOV from nadir to edge of scan,

lower detection limits, wider dynamic range, finer quantization, and in-

flight calibration (Miller et al. 2012; Elvidge et al. 2013; Miller et al.

2013).

Prior to averaging, the DNB data is filtered to exclude data impacted by

stray light, lightning, lunar illumination, and cloud-cover. Cloud-cover

is determined using the VIIRS Cloud Mask product (VCM). In addition,

data near the edges of the swath are not included in the composites

(aggregation zones 29-32). Temporal averaging is done on a monthly

and annual basis. The version 1 series of monthly composites has not

been filtered to screen out lights from aurora, fires, boats, and other

temporal lights. However, the annual composites have layers with

additional separation, removing temporal lights and background (non-

light) values.

The version 1 products span the globe from 75N latitude to 65S. The products are produced in 15 arc-second geographic grids and are made available in GeoTIFF format as a set of 6 tiles. The tiles are cut at the equator and each spans 120 degrees of longitude. Each tile is actually a set of images containing average radiance values and numbers of available observations.

In the monthly composites, there are many areas of the globe where it is

impossible to get good quality data coverage for that month. This can be

due to cloud-cover, especially in the tropical regions, or due to solar

illumination, as happens toward the poles in their respective summer

months. Therefore, it is imperative that users of these data utilize the

cloud-free observations file and not assume a value of zero in the average

radiance image means that no lights were observed.
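The warning above can be made concrete with a small sketch. It assumes the average radiance and the cloud-free observation counts have already been read into two equally sized nested lists; the function name and the `min_obs` threshold are illustrative choices, not part of the VIIRS product definition.

```python
def mask_unobserved(avg_radiance, cloud_free_obs, min_obs=1):
    """Return radiance with None where a pixel had too few cloud-free looks.

    A zero in avg_radiance is kept only when the pixel was actually observed
    at least min_obs times; otherwise it is flagged as missing (None).
    """
    return [
        [rad if n_obs >= min_obs else None
         for rad, n_obs in zip(rad_row, obs_row)]
        for rad_row, obs_row in zip(avg_radiance, cloud_free_obs)
    ]


# Hypothetical 2x2 tile: two pixels report 0.0 radiance without observations.
radiance = [[0.0, 1.5],
            [0.0, 0.0]]
observations = [[0, 4],
                [3, 0]]
masked = mask_unobserved(radiance, observations)
```

Only the lower-left pixel is a genuine "no lights observed" value; the two pixels with zero observations become missing rather than dark.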

5.2 The use of night lights to predict regional GDP in Latin America and

Caribbean countries

The first application uses night-lights satellite images to predict regional GDP data for Latin American countries.

In three excellent survey articles, Ghosh, Anderson, Elvidge, and Sutton

(2013), Huang, Yang, Gao, Yang, and Zhao (2014) and Donaldson and

Storeygard (2016) refer to numerous examples of use of night-lights as

correlates for GDP, poverty, informal economic activity and remittances,

human ecological footprint, energy and electric power consumption,

demography, fishing, anthropogenic gas emissions, information and

communication technology, urban structure and population, and carbon

dioxide emissions.

Nowadays, the use of night lights as a proxy for GDP, or as an instrument to improve the quality of national accounts data at national and sub-national level, has become standard in empirical economics. The obvious advantages of night lights are that they generally show a good correlation with GDP, they are available for free and for a long time span, and they are objectively measured.

In an early study, after the release of the first data by the DMSP-OLS, Elvidge, Baugh, Kihn, Kroehl, and Davis (1997) focused on the correlation between luminosity and GDP at the country level in a single year (1994) and found a strong correlation between the two measures for 21 countries. A characteristic of DMSP-OLS data that has attracted much of the attention of subsequent research is their availability at a very fine geographical level (1 square km). This makes it possible to estimate the level of economic activity at sub-national detail, addressing a chronic deficiency of official national accounts statistics. Examples that exploit, in a simple statistical framework, the capacity of night lights to predict the sub-national level of activity include, among others, Sutton and Costanza (2002), Ebener, Murray, Tandon, and Elvidge (2005), Doll, Muller, and Morley (2006), Sutton, Elvidge, and Ghosh (2007), Ghosh, Powell, Elvidge, Baugh, Sutton, and Anderson (2010), Bhandari and Roychowdhury (2011), and Chen and Nordhaus (2011).

Work on DMSP-OLS data has expanded consistently since the seminal

work of Henderson, Storeygard, and Weil (2012). In an annual panel of

countries from 1992 to 2008, Henderson, Storeygard, and Weil (2012)

show how to combine a lights measure with an income measure to improve estimates of economic growth, under the assumption that the errors of the two data sources are independent. The authors estimate an elasticity of around 0.3 of measured GDP growth with respect to lights growth, for use in predicting income growth. They also estimate a structural elasticity of lights growth with respect to GDP growth of just over 1.0. Finally, as night lights data are observed at a much finer geographical detail than standard official output measures, the authors use them to obtain estimates of income growth at the sub- or supra-national level in the sub-Saharan Africa region, under the assumption that elasticities calculated at national level are stable enough to be applied at finer or larger geographical detail.
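The arithmetic behind the elasticity of about 0.3 can be sketched as follows. This is a stylized illustration only: the weight placed on measured growth is a free parameter here (in the original paper it is derived from assumptions on measurement error), and the function names are hypothetical.

```python
def lights_predicted_growth(lights_growth, elasticity=0.3):
    """GDP growth predicted from lights growth (both in log-change terms)."""
    return elasticity * lights_growth


def composite_growth(measured_growth, lights_growth,
                     weight_on_measured, elasticity=0.3):
    """Convex combination of measured growth and lights-predicted growth."""
    if not 0.0 <= weight_on_measured <= 1.0:
        raise ValueError("weight_on_measured must lie between 0 and 1")
    predicted = lights_predicted_growth(lights_growth, elasticity)
    return (weight_on_measured * measured_growth
            + (1.0 - weight_on_measured) * predicted)


# Hypothetical case: lights grow by 10 per cent, official GDP by 5 per cent.
predicted = lights_predicted_growth(0.10)     # roughly 3 per cent
blended = composite_growth(0.05, 0.10, 0.5)   # roughly 4 per cent
```

The closer the weight is to one, the more the composite estimate relies on official statistics; a lower weight would be natural where national accounts are considered less reliable.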

The paper by Henderson, Storeygard, and Weil (2012) was the first to use night lights in a complete statistical and econometric framework to estimate real economic growth in a panel of time series, whereas the previously cited studies were conducted with data in levels, across countries, and in general for single time periods.

Following the examples provided in Henderson, Storeygard, and Weil

(2012) on sub-Saharan Africa region, subsequent and innovative

literature has used lights as a proxy for economic activity within fine

geographic units, for which no alternative data source is available,

including cities (Stathakis (2016); Storeygard (2016)), ethnic homelands

(Alesina, Michalopoulos, and Papaioannou (2016); Michalopoulos and

Papaioannou (2013); Michalopoulos and Papaioannou (2014)), large

uniform grid squares (Henderson, Squires, Storeygard, and Weil (2016)),

and grid squares around natural areas such as those surrounding rivers

(Bleakley and Lin (2012)).

Sub-national administrative units have been deeply investigated by Lee

(2016) for North Korea, Bundervoet, Maiyo, and Sanghi (2015) for

Kenya and Rwanda, Mellander, Lobo, Stolarick, and Matheson (2015)

for Sweden, Obikili (2015) for Nigeria, Roychowdhury, Jones,

Arrowsmith, and Reinke (2012) for India, and Shi, Yu, Huang, Hu, Yin,

Chen, Chen, and Wu (2014) for China.

Quite recently, while some papers have confirmed the ideas underlying the lights-to-GDP hypothesis at the country level (see, e.g., Elvidge, Hsu, Baugh, and Ghosh (2014)), the approach used by Henderson, Storeygard, and Weil (2012) has been criticized because of the implicit assumption of stable elasticities made in deriving sub- and/or supra-national estimates.

Indeed, Bickenbach, Bode, Nunnenkamp, and Söder (2016) have tested

the relationship between long-term growth rates of GDP per square km

and the long-term growth rates of lights per square km for sub-regions

of Brazil, India, Europe and the United States. They find that the

resulting growth elasticities are not stable across the geography of each

country or region, and infer that nightlights data are not a good proxy

for sub-regional GDP, thus invalidating the results obtained by

Henderson, Storeygard, and Weil (2012) and part of the literature

mentioned above.

Likewise, Addison and Stewart (2015) use growth rates as the basis for

testing the suitability of night-lights data as a proxy for GDP at the

national level and set out clearly the criteria for what constitutes a good

proxy.

First and foremost, the proxy variable (night lights) should have a

statistically significant and positive correlation with the variable it

would substitute for. Second, that relationship should hold up when the

data are expressed in growth rates rather than levels. In other words, one

should expect to find statistically significant elasticities of growth

between night lights and the economic variable. Third, the elasticity should

be constant over time.

In this regard, the authors disagree with Bickenbach, Bode, Nunnenkamp, and Söder (2016) that instability of elasticities across sub-regions would be a problem. On the contrary, growth in sub-regional night-lights data can serve as a good proxy for growth in sub-regional GDP as long as the corresponding disparate sub-regional elasticities remain constant over time. Moreover, growth in national GDP will be a simple weighted average of growth in sub-regional GDP, with the elasticities serving as weights. However, the authors add that there might be at least one circumstance where one would want growth elasticities to be equal across regions: when one is concerned about changes in the spatial distribution of growth, moving from one sub-region to another.

Another circumstance we would add is that, if one wants to estimate

GDP at finer geographical level using night lights, one should have

statistically equal elasticities at the lower and higher geographical levels.

However, this circumstance is not further considered by the authors, as

their paper is constrained to national boundaries.

Addison and Stewart (2015) found that, although the data do not reject a positive correlation of lights with different measures of GDP (total, non-agricultural and manufacturing), the growth elasticities of night lights with respect to these economic variables are too small and/or unstable over time for practical use.

For Latin American countries, the literature on lights and GDP is quite scarce and not systematic. Ghosh, Anderson, Powell, Sutton, and Elvidge (2009) exploit the potential for estimating the formal and informal economy of Mexico in 2000 through a ’blindly donor-approach’ using the estimated relationships between the spatial patterns of nighttime satellite imagery and economic activity in the United States. Muzzini, Eraso Puig, Anapolsky, Lonnberg, and Mora (2016), using night lights data and gross product at province level, derive agglomeration-level estimates of real GDP in Argentina for the period 1996-2010. Lorena (2013) found a strong linear relationship between DMSP-OLS night lights outbreaks and, amongst others, GDP data for the Espirito Santo area of Brazil. As discussed above, Bickenbach, Bode, Nunnenkamp, and Söder (2016) use data on real GDP for 4820 Brazilian municipalities in 1999–2010 and test for parameter stability across five statistical regions: Norte, Nordeste, Sudeste, Sul and Centro-Oeste. The authors found that a stable relationship between night lights growth and true GDP growth does not appear to exist across Brazilian regions.

Finally, in a global exercise on the correlation (in levels) between GDP, night lights and population at national level during 1992-2012, Elvidge, Hsu, Baugh, and Ghosh (2014) classify Latin America and Caribbean countries as follows:

- Rapid Growth in Lighting (the sum of the GDP and population correlation coefficients exceeds 1.8, from highest to lowest): Chile, Bolivia, Grenada and St. Lucia;

- Moderate Growth in Lighting (the sum of the GDP and population correlation coefficients is larger than 1 and less than 1.8, from highest to lowest): Honduras, Belize, Argentina, Guatemala, Trinidad and Tobago, Panama, Suriname, Brazil, Paraguay, El Salvador, Peru, Ecuador, Antigua and Barbuda, Barbados, Bahamas, Nicaragua, Mexico, Costa Rica, Haiti;

- Stable Lighting (lacking a strong correlation with GDP, population or both, with the sum of coefficients between around 0 and 1): Uruguay, Dominican Republic, Saint Vincent and the Grenadines, Saint Kitts and Nevis, Venezuela, Guyana, Colombia; and

- GDP centric (countries having a positive correlation coefficient with GDP and a negative correlation coefficient with population): Dominica.
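The thresholds in the classification above can be written as a small decision rule. The sketch below is one possible reading of those thresholds: the treatment of boundary values and of sums below zero (returned as "Unclassified") is an assumption, and the input correlations in the examples are invented.

```python
def classify_lighting(gdp_corr, pop_corr):
    """Assign a lighting category from two correlation coefficients.

    gdp_corr: correlation between lights and GDP.
    pop_corr: correlation between lights and population.
    """
    total = gdp_corr + pop_corr
    if total > 1.8:
        return "Rapid Growth in Lighting"
    if total > 1.0:
        return "Moderate Growth in Lighting"
    # With pop_corr < 0 the sum cannot exceed 1, so this case never
    # conflicts with the two growth categories above.
    if gdp_corr > 0 and pop_corr < 0:
        return "GDP centric"
    if total >= 0:
        return "Stable Lighting"
    return "Unclassified"  # assumption: case not covered in the text


# Invented coefficients, one per category.
examples = {
    "A": classify_lighting(0.95, 0.95),
    "B": classify_lighting(0.70, 0.60),
    "C": classify_lighting(0.30, 0.20),
    "D": classify_lighting(0.80, -0.30),
}
```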

This research innovates with respect to the previous literature in at least three respects. First, it analyses in a systematic way the relationship between DMSP-OLS night lights and GDP in Latin America and Caribbean countries at the finest level possible, looking at the conditions under which lights can be used at a very detailed geographical level, where they deliver their full value added in terms of mapping economic development. Second, the research uses both a time-series and a spatial econometric approach in the analysis, particularly in the panel regressions conducted on night lights and GDP data. Third, use is made of regional and sub-regional data produced by NSOs, central banks and other relevant official agencies belonging to the national statistical system of each country, after a careful analysis of the data and metadata available.

5.3 The use of night lights to map poverty indices worldwide

Consistently reducing poverty is one of the main objectives of the sustainable development agenda. The first Goal of the SDGs is to end poverty in all its forms everywhere, and its first two targets set two ambitious objectives to be reached by 2030: (a) eradicate extreme poverty for all people everywhere, currently measured as people living on less than 1.25 US dollars a day, and (b) reduce at least by half the proportion of men, women and children of all ages living in poverty in all its dimensions according to national definitions.

Poverty is the general term describing living conditions that are

detrimental to health, comfort, and economic development. One of the

sources for statistics on global poverty is the World Bank, which has

collected and distributed national level data on poverty levels since

1990. The estimation method is based on the analysis of household

budget and expenditure surveys conducted in almost 100 developing

countries. Survey questions cover sources of income, consumption

expenditures, and numbers of individuals making up the household.

Based on data from the World Development Indicators, approximately 10.7 per cent of the world population, or 765 million people, lived in extreme poverty in 2013, on less than 1.90 US dollars per day (in 2011 PPP) under the new definition adopted by the World Bank. The two published measures from the World Bank are called here ODP and TDP (one and two USD poverty lines, respectively).

Individual countries also establish their own poverty line for the national

data, here called NPL. However, differing standards in defining poverty

make pooling the national poverty line data problematic. There are also

a number of problems recognized with the World Bank poverty line data.

Furthermore, not all countries around the world conduct the surveys, and

the survey repeat cycle is uncertain. The inter-comparability of the

estimates is also uncertain due to difficulties in reconciling consumption

and income data, plus discrepancies in the purchasing power parity

estimates for individual countries. It is also possible for governments to

influence the outcome of the surveys since they design the questions,

select the areas for survey and conduct the interviews.

The international poverty line, being a threshold expressed in USD, is not applicable to prosperous countries such as the USA, Japan or those of Western Europe. Finally, data are in most cases outdated, because processing the statistical information takes considerable time, and final poverty lines are generally not available in a disaggregated form at the country level.

Another important international source for poverty measures is the UN

Development Programme, UNDP, which since 2010 has published a

Multidimensional Poverty Index, MPI, in its annual reports. The

indicator starts from considering that, like economic and social

development, poverty is multidimensional, a circumstance that is

generally ignored by headline money metric measures of poverty.

The MPI complements monetary measures of poverty by considering overlapping deprivations suffered by individuals at the same time in three fundamental dimensions (individual indicators in parentheses): Health (Nutrition and Child mortality), Education (Years of schooling and Children enrolled), and Standard of living (Cooking fuel, Toilet, Water, Electricity, Floor and Assets).

Overall, ten components are considered in the three dimensions. The indicator can be broken down by region, ethnicity and other groupings, as well as by dimension and by each of the ten sub-indicators. The index covers 102 countries and uses micro data from household surveys; it is therefore prone to the same criticisms and shortcomings as the World Bank indicators.

In parallel, the UNDP calculates another composite index, the Human

Development Index (HDI), which is a summary measure of

achievements in three key dimensions of human development, namely a

long and healthy life, access to knowledge and a decent standard of

living. Although not properly a poverty indicator, it is used here as an

indicator related to well-being and depicting particular aspects of human

development.

Spatially disaggregated global maps of poverty indicators, especially if updated on an annual or semi-annual basis, would be extremely beneficial for tracking the effectiveness of poverty-reduction efforts in specific localities and the consequences of natural disasters, epidemics and conflicts, or for other general policy purposes. Satellite images could make it possible to update spatially disaggregated poverty maps on an annual, semi-annual or even monthly basis.

This part of the research presents a spatially disaggregated map of

poverty indices derived from satellite data on night lights and population

data drawn at very fine geographical level.

The map is based on the assumption that lights are a proxy for wealth: in developing countries, populated areas that are poorly lit are expected to have a higher percentage of poor people, and vice versa.

Two spatially disaggregated datasets are used to form the global poverty index, which is obtained by dividing the Gridded Population of the World, Version 4 (GPWv4), of the Center for International Earth Science Information Network (CIESIN) at Columbia University, by night lights collected through the VIIRS instrument.

The index is formed by dividing population by the average visible band

digital number from the VIIRS lights. In areas where no lighting is

detected, the lights dataset has a value of one, thus passing the GPWv4

population into the poverty index, which reaches its maximum (of

poverty) of 100.
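
The construction of the index can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name `night_light_poverty_index`, the toy grids and the rescaling of the ratio so that its largest value equals 100 are assumptions. The text only states that population is divided by the light value, that unlit cells carry a light value of one, and that the index peaks at 100.

```python
def night_light_poverty_index(population, lights):
    """Divide population by night-light intensity, cell by cell.

    Both arguments are equally shaped lists of rows. Light values below
    one (no detected lighting) are set to one, so that the population
    count passes through unchanged, as described in the text.
    The rescaling to a 0-100 range is an assumption of this sketch.
    """
    raw = [
        [pop / max(light, 1.0) for pop, light in zip(pop_row, light_row)]
        for pop_row, light_row in zip(population, lights)
    ]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return [[0.0 for _ in row] for row in raw]
    return [[100.0 * value / peak for value in row] for row in raw]


# Toy 2x2 grids (hypothetical values).
population = [[500.0, 200.0], [1000.0, 0.0]]
lights = [[50.0, 1.0], [0.0, 10.0]]
index = night_light_poverty_index(population, lights)
```

In this toy case the densely populated, unlit cell receives the maximum value, while the well-lit cell with a similar order of population scores close to zero.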

While GPWv4 is gridded with an output resolution of 30 arc-seconds, or

about 1 km at the equator, VIIRS data feature a higher spatial resolution

(15 arc-seconds, about 500 m). Therefore, a bilinear resampling is

performed, which makes the data comparable in terms of resolution before

final processing. Since the night-time lights product has a latitudinal

extent of 65°S to 65°N, this determined the extent of the analysis.
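
The resampling step can be illustrated with a small pure-Python sketch. It assumes two regular, aligned grids and samples the finer (15 arc-second) lights grid at the centres of the coarser (30 arc-second) cells. It is not the GIS routine actually used for the report, and `bilinear_sample` and `downsample_by_two` are hypothetical helper names.

```python
import math


def bilinear_sample(grid, row, col):
    """Bilinearly interpolate `grid` at fractional (row, col) indices."""
    last_row = len(grid) - 1
    last_col = len(grid[0]) - 1
    # Indices of the four surrounding cells, clamped to the grid.
    r0 = min(max(math.floor(row), 0), last_row)
    c0 = min(max(math.floor(col), 0), last_col)
    r1 = min(r0 + 1, last_row)
    c1 = min(c0 + 1, last_col)
    # Fractional distances from the upper-left cell.
    fr = min(max(row - r0, 0.0), 1.0)
    fc = min(max(col - c0, 0.0), 1.0)
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def downsample_by_two(grid):
    """Resample a fine grid onto a grid with cells twice as large.

    Each coarse cell centre falls halfway between four fine cell
    centres, i.e. at fractional index (2i + 0.5, 2j + 0.5).
    """
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [bilinear_sample(grid, 2 * i + 0.5, 2 * j + 0.5) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 4x4 block of light values at the finer resolution.
fine_lights = [
    [0.0, 2.0, 4.0, 4.0],
    [2.0, 4.0, 4.0, 4.0],
    [8.0, 8.0, 0.0, 0.0],
    [8.0, 8.0, 0.0, 4.0],
]
coarse_lights = downsample_by_two(fine_lights)
```

With an exact factor of two, the bilinear value at each coarse cell centre coincides with the mean of the four fine cells it covers.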

Linear regressions were performed between night lights and the various

poverty indices discussed above. Results are shown in Figure 5.3.1.

Correlations are quite strong between MPI and NPI, but weaken

considerably when other indices are considered. Figure 5.3.2 gives more

detail on the correlation between MPI and NPI, which is found to be positive and

high (around 0.70), as expected. Results are not qualitatively different

when one considers sub-classes of MPI components, such as those

excluding Education and/or Health.
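
As an illustration of the kind of calculation involved, the sketch below fits an ordinary least squares line and computes the Pearson correlation in plain Python. The function name `fit_line` and the five data points are hypothetical; they are not the country data behind the figures, and the correlation they produce is not the value reported in the text.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical country-level values, for illustration only.
npi = [10.0, 20.0, 30.0, 40.0, 50.0]
mpi = [0.05, 0.12, 0.14, 0.22, 0.27]

intercept, slope, r = fit_line(npi, mpi)
```

A positive slope together with a correlation close to one would indicate that the night-light index tracks the survey-based index closely; departures from linearity would show up as a lower correlation.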

Figure 5.3.1: Regression of NPI over different poverty indices

Figure 5.3.2: Regression of MPI over NPI

Figure 5.3.3: Scatterplot of MPI and NPI

The estimated linear regression coefficients for all relations considered in

Figure 5.3.1 were then used to obtain detailed maps (1 square km) of

the various indices for all countries worldwide.

Since the resulting poverty data set is at 30 arc-second resolution, it can be

aggregated to either national or sub-national levels, depending on the aims

of the analysis.
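
The mapping and aggregation steps can be sketched as follows. The coefficients, grids and region codes are hypothetical, and weighting cells by population when aggregating is an assumption of this sketch: the text does not state which aggregation rule was applied.

```python
from collections import defaultdict


def predict_grid(npi_grid, intercept, slope):
    """Apply fitted linear coefficients to every cell of an NPI grid."""
    return [[intercept + slope * cell for cell in row] for row in npi_grid]


def aggregate_by_region(value_grid, region_grid, weight_grid):
    """Weighted mean of cell values within each region code."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for values, regions, weights in zip(value_grid, region_grid, weight_grid):
        for value, region, weight in zip(values, regions, weights):
            weighted_sum[region] += value * weight
            weight_total[region] += weight
    # Regions with zero total weight are left out of the result.
    return {
        region: weighted_sum[region] / weight_total[region]
        for region in weighted_sum
        if weight_total[region] > 0
    }


# Hypothetical inputs: NPI cells, region codes and population weights.
npi_grid = [[10.0, 30.0], [50.0, 20.0]]
region_grid = [["A", "A"], ["B", "B"]]
population_grid = [[100.0, 300.0], [50.0, 150.0]]

index_grid = predict_grid(npi_grid, intercept=0.0, slope=0.01)
regional_index = aggregate_by_region(index_grid, region_grid, population_grid)
```

The same function serves national or sub-national aggregation, depending on whether `region_grid` holds country codes or codes of smaller administrative units.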

Here we simply show the results obtained for the NPI at the world level,

reported in Figure 5.3.4. Overall, the results are in line with

expectations.

Most areas in sub-Saharan Africa show high poverty levels, together

with countries in Asia such as Afghanistan, Bangladesh, Cambodia and

Mongolia. Surprisingly, some countries in Europe show comparatively

high poverty levels when compared with their European counterparts,

which rank better in terms of poverty. This is the case, for

example, of Ireland and Norway in the map. This might in part be due to

a bias in our procedure, which penalizes areas where governments

have embarked on energy-saving policies in the last few years.

It should further be noted that our linear regressions lack predictive

power because of possible non-linearities (especially at lower poverty

levels) in the relation between night lights and poverty indices, as clearly

emerges from the previous figures. It is worth mentioning that

experiments carried out with non-linear regressions did not show

qualitative improvements in the final results.

Figure 5.3.4: Normalized poverty index with night lights, World

6. Conclusions and Recommendations

The adoption of the Sustainable Development Goals in September 2015

by the United Nations General Assembly is calling on NSOs worldwide to

underpin a data revolution, which is difficult to achieve without

changing the structure and functioning of national statistical systems

worldwide. Indeed, NSOs should extend both the scope and

disaggregation of the data traditionally produced, and measure new

economic, social and environmental phenomena, leaving no one behind.

Nowadays, there is a broad consensus that, in the digital era, Big Data

might strengthen traditional data sources and statistics in monitoring

sustainable well-being, facilitating the transformative agenda of NSOs

as they face the new challenges.

This research has reviewed Big Data definitions and discussed the

blurred borderline between the daily work of official

statisticians and the possibilities offered by earth observation through

satellite images. Two of the most relevant sets of indicators on which

many SDG indicators are built, GDP and poverty, have been

mapped.

GDP and poverty mapping is possible at a very fine geographical level,

one square km, using satellite images that are publicly and freely available to

everybody. The econometric calculations are straightforward, although the results

might sometimes require an act of faith, which is in any case quite familiar

to official statisticians and their users.

The empirical analyses carried out, with particular reference to poverty,

show that there might be considerable advantages from the use of Big

Data sources in meeting the growing demand coming

from policy makers.

The research greatly benefited from the use of US satellite data.

European earth observation data are another remarkable source of

statistical information.

Indeed, Copernicus is perhaps the most ambitious earth observation

programme to date. This initiative, headed by the European Commission

in partnership with the European Space Agency, provides accurate,

timely and easily accessible information to improve the management of the

environment, understand and mitigate the effects of climate change and

ensure civil security. The data are delivered by upwards

of 30 satellites, including the dedicated Sentinels.

The Sentinels provide a unique set of observations, starting with the all-

weather, day and night radar images from Sentinel-1. Sentinel-2

satellites are designed to deliver high-resolution optical images for land

services, while Sentinel-3 provides data for services relevant to the ocean

and land. Sentinel-4 and -5 will provide data for atmospheric

composition monitoring from geostationary and polar orbits,

respectively. Sentinel-6 will carry a radar altimeter to measure global

sea-surface height, primarily for operational oceanography and for

climate studies.

The use of this remarkable source of information for

SDG monitoring and reporting is still in a preliminary phase, but

an enormous amount of information awaits investigation to help

shape the future of our planet for the benefit of all, leaving no one behind.

References

Abdulkadri, A., A. Evans, and T. Ash (2016). An Assessment of Big Data

for Official Statistics in the Caribbean - Challenges and Opportunities.

48. UN-ECLAC Series Studies and Perspectives, pp. 1–56.

Addison, D. M. and B. P. Stewart (2015). Nighttime lights revisited: the

use of nighttime lights data as a proxy for economic variables. Policy

Research Working Paper 7496. World Bank.

Alesina, A., S. Michalopoulos, and E. Papaioannou (2016). “Ethnic

Inequality”. In: Journal of Political Economy 124.2, pp. 428–488.

Bhandari, L. and K. Roychowdhury (2011). “Night Lights and Economic

Activity in India: A study using DMSP-OLS night time images”. In:

Proceedings of the Asia-Pacific Advanced Network 32.0, p. 218.

Bickenbach, F., E. Bode, P. Nunnenkamp, and M. Söder (2016). “Night

lights and regional GDP”. In: Review of World Economics 152.2, pp.

425–447.

Bleakley, H. and J. Lin (2012). “Portage and Path Dependence”. In: The

Quarterly Journal of Economics 127.2, pp. 587–644.

Bundervoet, T., L. Maiyo, and A. Sanghi (2015). “Bright Lights, Big

Cities: measuring national and subnational economic growth in Africa

from outer space, with an application to Kenya and Rwanda”.

Chen, X. and W. D. Nordhaus (2011). “Using luminosity data as a proxy

for economic statistics”. In: Proceedings of the National Academy of

Sciences 108.21, pp. 8589–8594.

Doll, C. N. H., J. P. Muller, and J. G. Morley (2006). “Mapping regional

economic activity from night-time light satellite imagery”. In:

Ecological Economics 57.1, pp. 75–92.

Donaldson, D. and A. Storeygard (2016). “The view from above:

applications of satellite data in economics”. In: Journal of Economic

Perspectives 30.4, pp. 171–198.

Ebener, S., C. Murray, A. Tandon, and C. C. Elvidge (2005).

“From wealth to health: modelling the distribution of income per capita

at the sub-national level using night-time light imagery”. In:

International Journal of Health Geographics 4.1, p. 5.

Elvidge, C. D., K. E. Baugh, E. A. Kihn, H. W. Kroehl, and C. W. Davis

(1997). “Relation between Satellite Observed Visible-Near Infrared

Emissions, Population, Economic Activity and Electric Power

Consumption”. In: International Journal of Remote Sensing 18.6, pp.

1373–1379.

Elvidge, C. D., F. Hsu, K. E. Baugh, and T. Ghosh (2014). “National

trends in satellite observed lighting”. In: Global urban monitoring and

assessment through earth observation. Vol. 23. Boca Raton, FL: CRC

Press. Chap. 6, pp. 97–119.

Ghosh, T., S. Anderson, C. D. Elvidge, and P. Sutton (2013). “Using

Nighttime Satellite Imagery as a Proxy Measure of Human Well-Being”.

In: Sustainability 5.12, pp. 4988–5019.

Ghosh, T., S. Anderson, R. L. Powell, P. C. Sutton, and C. D. Elvidge

(2009). “Estimation of Mexico’s Informal Economy and Remittances

Using Nighttime Imagery”. In: Remote Sensing 1.3, pp. 418–444.

Ghosh, T., R. L. Powell, C. D. Elvidge, K. E. Baugh, P. C. Sutton, and

S. Anderson (2010). “Shedding light on the global distribution of

economic activity”. In: The Open Geography Journal 3.1.

Harvey, A. C. (1991). Forecasting, structural time series models and the

Kalman Filter. Cambridge University Press. ISBN: 9780521405737.

Henderson, J. V., T. L. Squires, A. Storeygard, and D. N. Weil (2016).

The Global Spatial Distribution of Economic Activity: Nature, History,

and the Role of Trade. National Bureau of Economic Research.

Henderson, J. V., A. Storeygard, and D. N. Weil (2012). “Measuring

Economic Growth from Outer Space”. In: American Economic Review

102.2, pp. 994–1028.

Huang, Q., X. Yang, B. Gao, Y. Yang, and Y. Zhao (2014). “Application

of DMSP/OLS Nighttime Light Images: A Meta-Analysis and a

Systematic Literature Review”. In: Remote Sensing 6.8, pp. 6844–6866.

Independent Expert Advisory Group on a Data Revolution for

Sustainable Development (2014). A World That Counts - Mobilising the

Data Revolution for Sustainable Development. United Nations, New

York, pp. 1–30.

Lee, Y. S. (2016). International Isolation and Regional Inequality:

Evidence from Sanctions on North Korea. Working Paper 575. Stanford,

CA: Stanford Center for International Development.

Lorena, R. B. (2013). “Avaliação da potencialidade das imagens de luzes

noturnas DMSP/OLS para a construção de indicadores socioeconômicos

para o estado do Espírito Santo”. In: Anais XVI Simpósio Brasileiro de

Sensoriamento Remoto. SBSR - Foz do Iguaçu (PR), Brasil, pp. 1355–

1362.

Manske, J., D. Sangokoya, G. Pestre, and E. Letouzé (2016).

Opportunities and Requirements for Leveraging Big Data for Official

Statistics and the Sustainable Development Goals in Latin America.

White Paper Series Data-Pop Alliance, pp. 1–71.

Mellander, C., J. Lobo, K. Stolarick, and Z. Matheson (2015). “Night-

Time Light Data: A Good Proxy Measure for Economic Activity?” In:

PLOS ONE 10.10. Ed. by Guy J-P. Schumann.

Michalopoulos, S. and E. Papaioannou (2013). “Pre-Colonial Ethnic

Institutions and Contemporary African Development”. In: Econometrica

81.1, pp. 113–152.

— (2014). “National Institutions and Subnational Development in

Africa”. In: The Quarterly Journal of Economics 129.1, pp. 151–213.

Muzzini, E., B. Eraso Puig, S. Anapolsky, T. Lonnberg, and V. Mora

(2016). Leveraging the Potential of Argentine Cities: A Framework for

Policy Action. The World Bank.

Obikili, N. (2015). “An Examination of Subnational Growth in Nigeria:

1999-2012”. In: South African Journal of Economics 83.3, pp. 335–356.

Roychowdhury, K., S. J. Jones, C. Arrowsmith, and K. Reinke (2012).

“Night-Time Lights and Levels of Development: A Study Using DMSP-

OLS Night-Time Images at the Sub-National Level”. In: Proceedings of

the XXII ISPRS Congress, Melbourne, VIC, Australia. Vol. 25, pp. 93–

98.

Shi, K., B. Yu, Y. Huang, Y. Hu, B. Yin, Z. Chen, L. Chen, and J. Wu

(2014). “Evaluating the Ability of NPP-VIIRS Nighttime Light Data to

Estimate the Gross Domestic Product and the Electric Power

Consumption of China at Multiple Scales: A Comparison with DMSP-

OLS Data”. In: Remote Sensing 6.2, pp. 1705–1724.

Stathakis, D. (2016). “Forecasting urban expansion based on night

lights”. In: ISPRS - International Archives of the Photogrammetry,

Remote Sensing and Spatial Information Sciences XLI-B8, pp. 1049–

1054.

Storeygard, A. (2016). “Farther on down the Road: Transport Costs,

Trade and Urban Growth in Sub-Saharan Africa”. In: The Review of

Economic Studies 83.3, pp. 1263–1295.

Sutton, P. C. and R. Costanza (2002). “Global estimates of market and

non-market values derived from nighttime satellite imagery, land cover,

and ecosystem service valuation”. In: Ecological Economics 41.3, pp.

509–527.

Sutton, P. C., C. D. Elvidge, and T. Ghosh (2007). “Estimation of gross

domestic product at sub-national scales using nighttime satellite

imagery”. In: International Journal of Ecological Economics & Statistics

8 (S07), pp. 5–21.

TechAmerica Foundation (2012). Demystifying Big Data: A Practical

Guide to Transforming the Business of Government. Tech. rep.

United Nations Economic and Social Council (2013). Big Data and

Modernization of Statistical Systems - Report of the Secretary-General.

Forty-fifth session of the UN Statistical Commission, NY, 4-7 March

2014, doc. E/CN.3/2014/11. December. New York, pp. 1–16.