big data for official statistics @ konferensi big data indonesia 2016
Big Data, a Big Challenge for Official Statistics?
Setia Pramana
Pusat Kajian Komputasi Statistik
Sekolah Tinggi Ilmu Statistik
Official Statistics
Statistics published by government agencies or other public bodies such as international organizations as a public good.
One of the Official Statistics Producers in Indonesia is BPS
Badan Pusat Statistik
POSITION
BPS is a non-ministerial government institution
Under and responsible directly to the President of the Republic of Indonesia
Headed by a Chief Statistician
TASK
To execute governmental duty in the field of statistics according to the prevailing laws and regulations
Types of Data
Basic Statistics
• Used for a broad range of purposes
• Utilized by government and society
• Characteristics: cross-sectoral, national in scale, broadly aggregated
• Undertaken by: BPS-Statistics Indonesia
Sectoral Statistics
• Utilized by particular institutions to fulfill their main tasks
• Undertaken by: government institutions (independently or in collaboration with BPS)
Special Statistics
• Utilized to fulfill the specific needs of business, education, socio-cultural, and other purposes
• Undertaken by: non-government institutions, organizations, individuals, and/or other parts of society
NATIONAL STATISTICAL SYSTEM
[Diagram: components of the national statistical system: Statistical Data Request; Statistical Community Forum; Resources, Methods, Infrastructure, Science & Technology, Law; BPS as Clearing House of Statistical Information and Provider of Statistical Information; Feedback. The numbered links (1) to (5) are explained in the notes.]
Type: undertaker; methods; result
• Basic Statistics: BPS; census, survey, compilation of administrative products, others; data
• Sectoral Statistics: government institutions; survey, compilation of administrative products, others; data
• Special Statistics: community; survey, compilation of administrative products, others; synopsis
Coordination, Integration, Synchronization, Standardization (CISS)
NOTES: (1) BPS coordinates statistical undertakings; (2) Govt institutions submit survey plans, and recommendations are provided; (3) Govt institutions submit the results to BPS; (4) Private institutions or the community submit a synopsis; (5) Govt institutions and private institutions/community coordinate and cooperate with BPS
Data Collection Method
CENSUS
Enumerating all population units in Indonesia
Is conducted to obtain characteristics of the population at a certain period of time
Is held decennially
SURVEY
Enumerating samples of a population
Is conducted to estimate the characteristics of a population at a certain period of time
COMPILATION OF ADMINISTRATIVE RECORDS
Data collection, processing, dissemination, and analysis based on administrative records of government or community
Census Conducted by BPS (1)
Population Census
• Mandated by Law No. 16/1997 and part of the UN agenda;
• Held every 10 years, in years ending in 0;
• The 2010 Population Census was the sixth Population Census in Indonesia after 1961, 1971, 1980, 1990, and 2000;
• The 2010 Population Census was held to collect basic data on housing and population, demographic parameters, data for MDGs evaluation, and program targeting.
http://sp2010.bps.go.id
Census Conducted by BPS (2)
Agricultural Census
• Used as benchmark data for the agricultural sector;
• Conducted every 10 years, in years ending in 3;
• The 2013 Agricultural Census was the sixth agricultural census in Indonesia after 1963, 1973, 1983, 1993, and 2003;
• Agricultural characteristics: farmer households, number of livestock, land tenure, etc.
http://st2013.bps.go.id
Census Conducted by BPS (3)
Economic Census
• Conducted every 10 years, in years ending in 6;
• The upcoming 2016 Economic Census will be the fourth economic census in Indonesia after 1986, 1996, and 2006;
• Enterprise characteristics: number of enterprises, labor force, etc.
Surveys Conducted by BPS
Several surveys conducted by BPS, among others:
Susenas (National Socio-Economic Survey),
Sakernas (National Labor Force Survey),
Price Survey (Consumer, Rural Consumer, Wholesale)
Business and Consumer Tendency Survey
Industrial Survey
Indonesia Demographic and Health Survey
Etc.
Data Obtained from Compilation of Administrative Records
• Human Development Index,
• agricultural indicators,
• export and import,
• transportation,
• flow fund accounts,
• gross domestic product
• tourism
Official Statistics News (Berita Resmi Statistik / BRS)
Monthly:
• Inflation/Consumer Price Index
• Export
• Import
• Trade Balance
• Tourism
• Transportation
• Farmer Terms of Trade
• Grain Producer Price
• Wage
• Wholesale Price Index
Quarterly (Feb, May, Aug, Nov):
• GDP/Economic Growth
• Business Tendency Index
• Consumer Tendency Index
• Manufacturing Industry (Large and Medium Scale; Micro and Small Scale)
• Producer Price Index (Oct’13)
Four-monthly, Semesterly, Annually:
• Poverty (January and July)
• Employment (May and November)
• Forecast: Production of Paddy, Maize, and Soybeans
  - Preliminary Figure Year n-1 (March)
  - Forecast I Year n (July)
  - Forecast II Year n (November)
Quality of Statistics
Accuracy
Relevance
Timeliness
Accessibility
Coherence
Interpretability
Will Big Data Replace Official Statistics?
Official Statistics vs. Big Data
Dr. Jose Ramon G. Albert, NSCB, Philippines
Complementary Roles for Official Statistics and Big Data
• Provide variables to help BPS stratify better for sample surveys
• Improve sample survey estimates
• Help to compensate for nonresponse
• Help to check BPS estimates
• Help to improve the frequency and timeliness of data releases
• Help to improve and provide more small-area estimates
Cavan Capps and Tommy Wright, U.S. Census Bureau
Sources of Big Data
• Administrative data that arise from the administration of a program, be it governmental or not (e.g., electronic medical records, hospital visits, insurance records, bank records, and food banks)
• Commercial or transactional digital data that arise from the transaction between two entities (e.g., credit card transactions and online transactions, including from mobile devices)
• Sensor data (e.g., satellite imaging, road sensors, and climate sensors)
• GPS tracking devices (e.g., tracking data from mobile telephones)
• Behavioral data (e.g., online searches about a product, service, or any other type of information and online page views)
• Opinion data (e.g., comments on social media)
What has been done?
Connecting the Data
Google Dengue and Influenza Trends
Global Pulse: rice price trend based on Twitter
What has been done?
Proof of Concept Projects
Big Data for Predicting Commuting Patterns
Using Big Data to Nowcast Food Prices
Big Data for Predicting Commuting Patterns
Collaboration with Pulse Lab UN Jakarta
A big data project using multiple sources of data, e.g. social media and statistical data.
Aim: to better understand inter-city commuting patterns using social media.
Offers a less expensive and easier way to collect information.
Data Sources
Jabodetabek Commuter Survey 2014
Sample: 13,120 households from 13 kabupaten/kota (regencies/cities)
Twitter (February 2014)
7 million tweets
Based on geotags
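One way geotagged tweets could be turned into commuting flows is sketched below. This is only an illustration, not the method used in the project: it assumes each tweet has already been mapped to a kabupaten/kota, and it guesses a user's home city from night-time tweets and work city from daytime tweets. The function name, the hour windows, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

def infer_commutes(tweets, night=(21, 5), day=(9, 17)):
    """Count home -> work flows from (user_id, hour, city) records.

    Assumption: the most frequent night-time city is "home" and the
    most frequent daytime city is "work". Hour windows are illustrative.
    """
    night_cities = defaultdict(Counter)
    day_cities = defaultdict(Counter)
    for user, hour, city in tweets:
        if hour >= night[0] or hour < night[1]:
            night_cities[user][city] += 1
        elif day[0] <= hour < day[1]:
            day_cities[user][city] += 1

    flows = Counter()
    # Only users observed in both windows can be classified.
    for user in night_cities.keys() & day_cities.keys():
        home = night_cities[user].most_common(1)[0][0]
        work = day_cities[user].most_common(1)[0][0]
        if home != work:  # same city means no inter-city commute
            flows[(home, work)] += 1
    return flows

# Made-up sample records: (user_id, hour of day, city)
tweets = [
    ("u1", 23, "Bogor"), ("u1", 2, "Bogor"), ("u1", 10, "Jakarta"),
    ("u2", 22, "Depok"), ("u2", 11, "Depok"),
    ("u3", 22, "Bekasi"), ("u3", 14, "Jakarta"), ("u3", 15, "Jakarta"),
]
flows = infer_commutes(tweets)
for (home, work), n in sorted(flows.items()):
    print(home, "->", work, n)
```

Flows counted this way could then be set against the origin-destination shares from the commuter survey, keeping in mind that Twitter users are not a random sample of households.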
Preliminary Results
[Figure: two panels, Commuter Survey and Twitter]
Using Big Data to Nowcast Food Prices
Collaboration with Pulse Lab UN Jakarta
Aim: to nowcast food prices using multiple sources of data including social media, Google Trends, and crowdsourcing as well as official statistics from BPS, Ministry of Trade and Ministry of Agriculture.
Locus: Kota Mataram, NTB
Time: March–July 2015
Data Sources
Data Sources, cont’d
Crowdsource
Premise
Contributor’s payment method:
online cash transfers (PayPal),
mobile money,
grocery vouchers,
bitcoin,
gift cards.
Plus other incentives
Premise: Commodities
Premise: Quality checking and Fraud detection
Profile Fraud
A user creates multiple profiles on multiple phones. This gives the illusion that all observations are from different users with unique user IDs, when in actuality they are duplicates.
Group Collaboration Fraud
Users travel together to the same markets and submit the same items.
Premise: Fraud detection
Location Fraud
Users attempt to submit observations from one location as sourced from multiple locations by manually changing the store name.
Duplicate Data Fraud
Users attempt to submit the same products multiple times, often by changing the price for each submission.
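Two of the fraud patterns above lend themselves to simple rule-based checks. The sketch below is an assumption about how such checks might look, not Premise's actual detection logic: the record fields (user, store, product, day, price) and both function names are hypothetical.

```python
from collections import defaultdict

def flag_duplicates(observations):
    """Duplicate data fraud: the same user reports the same product at
    the same store on the same day more than once (prices may differ).
    Returns the indices of all observations in such groups."""
    groups = defaultdict(list)
    for i, obs in enumerate(observations):
        key = (obs["user"], obs["store"], obs["product"], obs["day"])
        groups[key].append(i)
    return sorted(i for idx in groups.values() if len(idx) > 1 for i in idx)

def flag_group_collaboration(observations):
    """Group collaboration fraud: different users report an identical
    item and price at the same store on the same day.
    Returns {(store, product, day, price): [users]} for suspect groups."""
    users = defaultdict(set)
    for obs in observations:
        key = (obs["store"], obs["product"], obs["day"], obs["price"])
        users[key].add(obs["user"])
    return {key: sorted(u) for key, u in users.items() if len(u) > 1}

# Made-up sample observations
observations = [
    {"user": "a", "store": "Market A", "product": "rice", "day": "2015-03-03", "price": 10000},
    {"user": "a", "store": "Market A", "product": "rice", "day": "2015-03-03", "price": 10500},
    {"user": "b", "store": "Market A", "product": "rice", "day": "2015-03-03", "price": 10000},
    {"user": "c", "store": "Market B", "product": "eggs", "day": "2015-03-03", "price": 20000},
]
duplicate_rows = flag_duplicates(observations)
suspect_groups = flag_group_collaboration(observations)
print(duplicate_rows)
print(suspect_groups)
```

Matches from rules like these are only candidates for review: two honest contributors can legitimately report the same price at the same market on the same day.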
Data Sources
Food Prices: Consumer Price Index, BPS
HK 1.1 and HK 1.2
3 March to 13 July 2015
Crowdsource: Data Preparation
107,973 records
Check and revise mismatched quantities and sizes.
Standardize prices, since the crowdsourced data use different units and unit sizes.
Data cleaning: remove records that are incomplete, erroneous, or contain unacceptable values.
Spline approach
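The price standardization step can be illustrated with a small conversion to price per kilogram. This is a minimal sketch under assumed conventions: the unit table, the function name, and the rule that unusable records return None are illustrative choices, not the project's actual cleaning code.

```python
# Grams per unit; "ons" is the Indonesian market unit of 100 g.
# The set of units handled here is an assumption for illustration.
UNIT_TO_GRAMS = {"kg": 1000, "g": 1, "ons": 100}

def price_per_kg(price, quantity, unit):
    """Convert a reported price for `quantity` of `unit` into a price
    per kilogram. Returns None for records that cannot be standardized
    (unknown unit, non-positive price or quantity)."""
    grams = UNIT_TO_GRAMS.get(unit.strip().lower())
    if grams is None or quantity <= 0 or price <= 0:
        return None
    return price * 1000 / (quantity * grams)

# Made-up sample records: (price in rupiah, quantity, unit)
records = [(5000, 500, "g"), (12000, 1, "kg"), (2500, 2, "ons"), (3000, 1, "bunch")]
standardized = [price_per_kg(p, q, u) for p, q, u in records]
print(standardized)
```

Records that come back as None (here the one sold by the bunch) would need a separate rule or manual review before entering the price series.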
Crowdsource: Challenges
Different unit size
Unknown commodity quality.
Many outliers (large range between maximum and minimum prices).
Contains strange observations, e.g. price: Rp 1,-
Incomplete data
The number of observations varies over time.
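Outliers such as a price of Rp 1 can be screened with a robust rule based on the median and the median absolute deviation (MAD). The sketch below is one possible approach, not the cleaning procedure used in the project; the function name and the cutoff k = 5 are assumptions.

```python
from statistics import median

def drop_outliers(prices, k=5.0):
    """Keep prices within k * MAD of the median.

    The median and MAD are used instead of the mean and standard
    deviation because extreme values barely affect them.
    If the MAD is zero (more than half the prices are identical),
    the input is returned unchanged; that case needs another rule.
    """
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        return list(prices)
    return [p for p in prices if abs(p - med) <= k * mad]

# Made-up prices per kg for one commodity on one day
prices = [10000, 10500, 9800, 11000, 1, 10200, 95000]
cleaned = drop_outliers(prices)
print(cleaned)
```

In this sample the median is 10,200 and the MAD is 400, so anything more than 2,000 away from the median is dropped. Applying the rule per commodity and per day keeps genuine price changes over time from being treated as outliers.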
Preliminary Results
[Figures: price charts for Rice, Beef, Chicken, Eggs, Flour, Tomato, Sweet Potato, Long Bean, and Green Chili. The Tomato, Sweet Potato, Long Bean, and Green Chili charts show three series: PREMISE, Rata-Rata Pasar (market average), and Rata-Rata Swalayan (supermarket average).]
Discussion
Similar patterns for commuting behavior and Twitter movement.
Similar trends between the crowdsourcing approach and the BPS survey for all commodities.
Data cleaning needs to be more robust and automated.
Extend to other commodities to predict inflation.
Can the crowdsourcing approach provide food prices quickly and reliably?
Summary
Big Data complements official statistics
Many approaches still have to be explored and studied
Big Data "may replace" the conventional approach in some areas
Contributions from stakeholders are needed
Researchers
STIS
Setia Pramana
Ricky Yordani
Budi Yuniarto
Robert Kurniawan
Pulse Lab
Jonggun Lee
Imaddudin