better data: combining survey, administrative and big data · 2019-01-04 · large data ‣...
![Page 1: Better data: combining survey, administrative and big data · 2019-01-04 · Large Data ‣ Administrative data:-medical and tax records -driver licences, civil registry ‣ social](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/1.jpg)
researchICTsolutions
Better data: combining survey, administrative and big data
Dr. Christoph Stork
![Page 2](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/2.jpg)
Big Data = combining various data sets, including large data
‣ Large Data (private sector and administrative data): n → all (n approaches N)
  - all bank accounts
  - all mobile subscribers
  - all taxpayers
  - all driver licence holders
‣ Survey Data:
  - Quarterly LFS (Labour Force Survey), South Africa 2016: n = 30,000
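The "n approaches N" point can be made concrete with the textbook standard error of a mean under simple random sampling without replacement, which includes the finite population correction (1 - n/N). The sketch below uses a hypothetical population size and standard deviation. It covers sampling error only: it says nothing about coverage or measurement error, which large data can still suffer from.

```python
import math


def standard_error_of_mean(s: float, n: int, N: int) -> float:
    """Standard error of a sample mean under simple random sampling
    without replacement, with finite population correction (1 - n/N)."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    fpc = 1.0 - n / N
    return math.sqrt(fpc * s**2 / n)


# Hypothetical population of 10 million units with standard deviation 1.0;
# n = 30,000 mirrors the LFS sample size quoted above.
N = 10_000_000
for n in (30_000, 1_000_000, 5_000_000, N):
    print(f"n = {n:>10,}  SE = {standard_error_of_mean(1.0, n, N):.6f}")
```

At n = N the standard error is exactly zero: sampling error disappears, but only for the population the data actually covers.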
![Page 3](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/3.jpg)
Large Data
‣ Administrative data:
  - medical and tax records
  - driver licences, civil registry
  - social security, crime statistics, electricity consumption, educational statistics
‣ Commercial transactions data:
  - stock exchange data, FX
  - bank, credit card and supermarket transactions
  - insurance records, loyalty card records…
‣ Sensors and tracking devices: sensors, M2M, satellite, GPS devices…
‣ Online activities / social media: web scraping of online search activity, online page views, blogs, Facebook, Twitter
![Page 4](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/4.jpg)
Example: Population density changes in the Colombo region, weekday/weekend, based on 100 million Call Detail Records per day generated by Sri Lankan mobile operators
(Maps: weekday and Sunday at 06:30, 12:30 and 18:30; legend: decrease in density, increase in density)
by Sriganesh Lokanathan
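The slide does not say how the density maps were computed. One plausible sketch, assuming each CDR carries a subscriber ID, a cell (tower) ID, a day type and an hour, counts distinct subscribers per cell and compares Sunday with a weekday at the same hour. The records, cell IDs and cell areas below are hypothetical, and this is not necessarily the method behind the Colombo maps.

```python
from collections import defaultdict

# Hypothetical CDR rows: (subscriber_id, cell_id, day_type, hour)
cdrs = [
    ("a", "C1", "weekday", 12), ("b", "C1", "weekday", 12),
    ("c", "C1", "weekday", 12), ("a", "C2", "sunday", 12),
    ("b", "C1", "sunday", 12), ("c", "C2", "sunday", 12),
]


def distinct_users(rows, day_type, hour):
    """Count distinct subscribers seen in each cell for one day type and hour."""
    seen = defaultdict(set)
    for subscriber, cell, day, h in rows:
        if day == day_type and h == hour:
            seen[cell].add(subscriber)
    return {cell: len(subs) for cell, subs in seen.items()}


def density_change(rows, hour, cell_area_km2):
    """Per-cell change in subscribers per km^2: Sunday minus weekday."""
    weekday = distinct_users(rows, "weekday", hour)
    sunday = distinct_users(rows, "sunday", hour)
    return {
        cell: (sunday.get(cell, 0) - weekday.get(cell, 0)) / area
        for cell, area in cell_area_km2.items()
    }


print(density_change(cdrs, 12, {"C1": 2.0, "C2": 1.0}))  # {'C1': -1.0, 'C2': 2.0}
```

Counting distinct subscribers rather than raw records keeps heavy callers from dominating a cell; people without phones still go uncounted.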
![Page 5](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/5.jpg)
Big Data
‣ Admin data: Ministry of Finance, Social Security, Civil Registry
‣ Survey data: Labour Force Survey, National Household Income and Expenditure Survey, Economic Surveys (informal sector)
‣ Private sector data: Banks, Mobile Operators, Other Companies
![Page 6](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/6.jpg)
Surveys are snapshots - Big Data is a movie
Example of Namibian surveys: 2001 Census, 2003/4 NHIES, 2006 DHS, 2006/7 NHIES, 2011 Census, 2012 LFS, 2013 LFS, 2014 LFS, 2016 LFS
Can Big Data be used to:
‣ Fill gaps (interpolate key statistics)?
‣ Reduce the frequency of surveys?
‣ Make statistics more accurate?
‣ Reduce sample size (census 4% of population)?
![Page 7](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/7.jpg)
Governments need reliable data
‣ Survey data can be misleading if sampling is not done properly (e.g. US election polling)
‣ Big data can be misleading if what is being measured is not well understood
  - Mobile operator data does not include information on non-users
  - Bank data does not capture informal sector income
‣ Social media as an early indicator of unemployment?
  - Self-reinforcing trends
  - Fake news
![Page 8](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/8.jpg)
Digital and Financial Divide
![Page 9](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/9.jpg)
Finscope 2015 Survey for South Africa
Big and Admin Data may overlook these
![Page 10](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/10.jpg)
Finscope 2015 Survey for South Africa
75% of South Africans may leave only a thin digital trace
![Page 11](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/11.jpg)
(Chart: ZICTA 2015 ICT Survey, Individuals 10+, Zambia, urban and rural: active mobile users; smartphone owners among mobile phone owners. Annotation: Urban bias)
(Chart: ZICTA 2015 ICT Survey, Households with working…, and Finscope 2015, Individual 16+, Zambia, male and female: financially included; access to mobile phone; access to computer; access to Internet. Annotation: Male bias)
![Page 12](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/12.jpg)
Zambia -90 db
![Page 13](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/13.jpg)
Informal Business Surveys
![Page 14](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/14.jpg)
![Page 15](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/15.jpg)
![Page 16](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/16.jpg)
A listing is compiled for each enumeration area (EA); the listings serve as sampling frames for the simple random selection of households and businesses
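Once each EA has a listing, drawing the simple random sample is straightforward. A minimal sketch, assuming listings are held as a dictionary from a hypothetical EA code to the listed units, and that the same number of units is drawn from each EA:

```python
import random

# Hypothetical listings: EA code -> listed households/businesses
listings = {
    "EA-001": [f"EA-001/HH{i}" for i in range(1, 41)],
    "EA-002": [f"EA-002/HH{i}" for i in range(1, 26)],
}


def select_per_ea(listings, k, seed=None):
    """Simple random sample without replacement of up to k units per EA listing."""
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return {ea: rng.sample(units, min(k, len(units))) for ea, units in listings.items()}


sample = select_per_ea(listings, k=12, seed=2019)
for ea, units in sample.items():
    print(ea, len(units), units[:3])
```

Fixing the seed makes the selection reproducible and auditable. In a real design the EAs themselves are usually selected in an earlier stage, which also feeds into the survey weights.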
![Page 17](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/17.jpg)
Large data alone is not enough for policy purposes
‣ Large data is available for the rich or middle class
‣ Large data for the Internet of Things
‣ Little is available for the informal sector or the poor:
  - outside coverage areas
  - not using technology
  - no bank account, only using cash
  - no health insurance
  - no permanent address
‣ The informal sector makes up a large share of our societies
![Page 18](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/18.jpg)
Generating Big Data through Triangulation
(Diagram: admin data, survey data and private sector data)
![Page 19](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/19.jpg)
Estimating Income
‣ Source: Banks
  - Frequency: continuous
  - Type: private sector large data
  - What: salaries received (formal)
  - Data: address, age, gender, ID
  - Enrichment: years of work experience, education, skills, sector
‣ Source: Ministry of Finance
  - Frequency: continuous
  - Type: admin large data
  - What: pay-as-you-earn (PAYE) tax (formal salaries), income from self-employment
  - Data: address, age, gender, ID
  - Enrichment: years of work experience, education, skills
‣ Source: Labour Force Survey
  - Frequency: annual / occasional
  - Type: survey
  - What: income of the formally employed, income of the informally employed, income of the self-employed
  - Data: age, gender, work experience, education, location, sector, type of employment (full-time, part-time or occasional)
  - Enrichment: having a bank account, receiving salary in a bank account, paying social security
![Page 20](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/20.jpg)
Income Data Triangulation
2015 LFS formal income = 2015 total salaries declared for tax = 2015 total salaries received in bank accounts
2015 LFS informal income = 2015 LFS total income - total salaries declared for tax
(Diagram: 2015 LFS, estimate relationship; 2020 LFS, re-calibrate; in between, monthly or quarterly PAYE tax and salary transactions data are used to interpolate income data and the number of people receiving a formal salary)
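The two identities above, plus the "estimate relationship, then interpolate" step, can be sketched as follows. The relationship is assumed here to be a simple ratio of informal income to tax-declared salaries in the benchmark year, carried forward onto (annualised) quarterly PAYE totals until the next LFS. The slide does not specify the model, and all figures are hypothetical.

```python
def informal_income(lfs_total_income: float, salaries_declared_for_tax: float) -> float:
    """Informal income = LFS total income - salaries declared for tax (benchmark year)."""
    return lfs_total_income - salaries_declared_for_tax


def benchmark_ratio(lfs_total_income: float, salaries_declared_for_tax: float) -> float:
    """Assumed relationship: informal income per unit of tax-declared salaries."""
    informal = informal_income(lfs_total_income, salaries_declared_for_tax)
    return informal / salaries_declared_for_tax


def interpolate_informal(paye_salaries: dict[str, float], ratio: float) -> dict[str, float]:
    """Between surveys, scale each period's annualised PAYE salaries by the benchmark ratio."""
    return {period: total * ratio for period, total in paye_salaries.items()}


def relative_gap(a: float, b: float) -> float:
    """Consistency check between two sources that should agree (e.g. tax vs bank salaries)."""
    return abs(a - b) / b


# Hypothetical 2015 benchmark (annual totals, arbitrary currency units)
ratio_2015 = benchmark_ratio(lfs_total_income=500.0, salaries_declared_for_tax=400.0)
print(ratio_2015)  # 0.25
print(relative_gap(400.0, 392.0))  # tax-declared vs bank-received salaries
print(interpolate_informal({"2016Q1": 408.0, "2016Q2": 416.0}, ratio_2015))
```

When the 2020 LFS arrives, recomputing the ratio re-calibrates the series. A large relative gap between the tax and bank salary totals would signal that the two sources measure different things and need reconciling before either is used for interpolation.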
![Page 21](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/21.jpg)
Understanding Informal Sector better
‣ Large and big data reveal little about the informal sector
‣ Estimate the link between the informal and formal sectors from detailed LFSs
‣ Interpolating labour force statistics from large data may then also make it possible to interpolate informal sector statistics
![Page 22](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/22.jpg)
Detecting Strikes
‣ Bank transaction data classified as salary or wage payments can be screened for temporary interruptions by location
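A minimal screening rule, assuming salary-classified transactions have already been aggregated into monthly payment counts per location: flag any month whose count falls below a fraction of the median of the preceding months. The location names, the 3-month window and the 50% threshold are hypothetical choices, not taken from the slide.

```python
from statistics import median

# Hypothetical monthly counts of salary/wage payments by location
payments = {
    "location_A": [980, 1010, 995, 1002, 310, 990],
    "location_B": [2000, 2030, 1985, 2010, 2020, 1995],
}


def flag_interruptions(series, window=3, threshold=0.5):
    """Indices of months whose count is below `threshold` times the
    median of the preceding `window` months."""
    flags = []
    for i in range(window, len(series)):
        baseline = median(series[i - window:i])
        if series[i] < threshold * baseline:
            flags.append(i)
    return flags


for place, series in payments.items():
    print(place, flag_interruptions(series))
```

Using the median keeps a single anomalous month from distorting the baseline for the months after it. A flag marks a candidate interruption (a strike, but possibly also a payroll delay) that still needs checking.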
![Page 23](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/23.jpg)
Matching Jobs to skills
‣ Source: Universities, VTCs (SAQA)
  - Frequency: annual
  - Type: admin data
  - What: supply of graduates by field and date
‣ Source: company online portal, company survey, skills audit, tracer studies
  - Frequency: occasional
  - Type: survey / demand driven
  - What: vacancies by skill, address, date
‣ Source: Labour Force Survey
  - Frequency: annual / occasional
  - Type: survey
  - What: number of unemployed by age, gender, work experience, skills, education and location
  - Enrichment: paying social security
‣ Source: Social Security
  - Frequency: continuous
  - Type: admin data
  - What: number of unemployed
  - Data details: address, age, gender, years of work experience, education, skills
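Combining these sources comes down to comparing demand and supply per skill. A sketch, assuming (a big assumption in practice) that graduate fields, vacancy skills and the skills of the unemployed have already been mapped to one common classification; the skill names and counts are hypothetical.

```python
from collections import Counter

# Hypothetical counts, already mapped to a common skill classification
graduates = Counter({"nursing": 120, "accounting": 300, "plumbing": 40})
unemployed = Counter({"accounting": 150, "plumbing": 10, "welding": 25})
vacancies = Counter({"nursing": 200, "accounting": 90, "plumbing": 70, "welding": 15})


def skill_gaps(vacancies, *supplies):
    """Vacancies minus available supply per skill: positive = shortage, negative = surplus."""
    supply = Counter()
    for source in supplies:
        supply.update(source)  # Counter.update adds counts rather than replacing them
    skills = set(vacancies) | set(supply)
    return {skill: vacancies.get(skill, 0) - supply.get(skill, 0) for skill in sorted(skills)}


print(skill_gaps(vacancies, graduates, unemployed))
```

Treating graduates and the registered unemployed as one pool is a simplification; weighting them differently (for example by work experience) would be a natural refinement.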
![Page 24](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/24.jpg)
Combining Social Security with Labour Force Survey Data
(Diagram: 2015 LFS, estimate relationship; 2020 LFS, re-calibrate; in between, quarterly social security data are used to interpolate employment and unemployment numbers)
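The "estimate relationship, interpolate, re-calibrate" cycle can be sketched as simple ratio benchmarking. At each LFS quarter, compute the ratio of LFS unemployed to unemployed registered with social security; between surveys, interpolate that ratio linearly and apply it to the quarterly social security counts; after the latest survey, carry the last ratio forward. This is one assumed choice among several (more formal benchmarking methods exist), and the quarter indices and counts are hypothetical.

```python
def estimate_unemployed(ss_counts: dict[int, float], lfs_benchmarks: dict[int, float]) -> dict[int, float]:
    """ss_counts: quarter index -> unemployed registered with social security.
    lfs_benchmarks: quarter index -> LFS unemployed, for survey quarters only
    (each survey quarter must also appear in ss_counts)."""
    ratios = sorted((q, lfs_benchmarks[q] / ss_counts[q]) for q in lfs_benchmarks)
    estimates = {}
    for q, registered in sorted(ss_counts.items()):
        before = [(bq, r) for bq, r in ratios if bq <= q]
        after = [(bq, r) for bq, r in ratios if bq >= q]
        if before and after:
            (q0, r0), (q1, r1) = before[-1], after[0]
            # Linear interpolation of the ratio between the surrounding surveys
            ratio = r0 if q0 == q1 else r0 + (q - q0) / (q1 - q0) * (r1 - r0)
        elif before:
            ratio = before[-1][1]  # after the latest survey: carry forward
        else:
            ratio = after[0][1]  # before the first survey: carry backward
        estimates[q] = registered * ratio
    return estimates


# Hypothetical quarter indices: 0 = 2015 LFS quarter, 20 = 2020 LFS quarter
ss = {0: 100_000, 4: 110_000, 8: 120_000, 20: 100_000, 21: 90_000}
only_2015 = estimate_unemployed(ss, {0: 400_000})
recalibrated = estimate_unemployed(ss, {0: 400_000, 20: 500_000})
print(only_2015)
print(recalibrated)
```

Comparing the two runs shows the re-calibration at work: once the 2020 LFS is available, the interim quarters are revised. Employment numbers could be handled the same way.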
![Page 25](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/25.jpg)
Impact of raising minimum wages in SA
‣ Asking businesses? They will always say it will reduce jobs
‣ Triangulation:
  - Bank and tax data (once enriched with demographic information): profile of the lower salary spectrum
  - Labour Force Survey data: profile of informal wage earners close to the minimum threshold
‣ Determine the gap between the two profiles:
  - Who are those below the minimum wage?
  - How likely is a higher minimum wage to make them formal?
  - Who would benefit?
  - How strong will the incentive be to remain informal?
‣ Big Data allows the impact to be measured and the change to be reversed quickly if harmful
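The "determine the gap between the two profiles" step can be illustrated with wage distributions alone. The sketch below uses hypothetical wages and a hypothetical monthly minimum of 3,500 (not South Africa's actual rate). It compares the share of earners below the minimum, and just below it, in a formal profile (as bank and tax data might provide) and an informal one (as the LFS might provide). A real analysis would also compare the demographic enrichment variables.

```python
def threshold_profile(wages: list[float], minimum: float, band: float = 0.2) -> dict[str, float]:
    """Share of earners below `minimum`, and share within `band` (a fraction) below it."""
    if not wages:
        raise ValueError("need at least one wage")
    below = [w for w in wages if w < minimum]
    near = [w for w in below if w >= minimum * (1 - band)]
    return {"share_below": len(below) / len(wages), "share_near_below": len(near) / len(wages)}


MINIMUM = 3_500.0  # hypothetical monthly minimum wage
formal = [3_500, 3_800, 4_200, 5_000, 6_100, 7_000]  # hypothetical, e.g. bank/tax data
informal = [1_500, 2_200, 2_900, 3_100, 3_400, 4_100]  # hypothetical, e.g. LFS data

formal_p = threshold_profile(formal, MINIMUM)
informal_p = threshold_profile(informal, MINIMUM)
gap = {key: informal_p[key] - formal_p[key] for key in formal_p}
print(formal_p, informal_p, gap)
```

The threshold and the 20% band are parameters to vary, not findings.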
![Page 26](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/26.jpg)
Can Big Data be used to fill gaps (interpolate key statistics)?
Yes
Can Big Data be used to reduce the frequency of surveys?
Can Big Data be used to make statistics more accurate?
Can Big Data be used to reduce sample size?
We will only know once we try
![Page 27](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/27.jpg)
All this assumes access can be negotiated
![Page 28](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9ef3dc10a7c60e23520c2d/html5/thumbnails/28.jpg)
Thank you
Dr. Christoph Stork
www.researchictsolutions.com