pakistan census data – case study

Pakistan Census Data

Collection for a Robust Distribution

Case Study

Population

187,418,849 est.

3rd August 2014

Source: census.gov.pk

Objectives

● Data availability

● Open data

● Transparency

● Robust access

● Widely accessible formats

Sources

● Population Census Organization (census.gov.pk)

● World Bank (data.worldbank.org)

● ReliefWeb (reliefweb.int)

● USAID (usaid.gov)

Best Source

● Population Census Organization (census.gov.pk)

Detailed data exists but is not available in reusable, widely accessible formats. In fact, the website itself is unavailable most of the time.

● World Bank (data.worldbank.org)

● ReliefWeb (reliefweb.int)

● USAID (usaid.gov)

Data is available in several accessible formats, but it is brief, limited and directed.

Problems

● No downloadable data format available

● Website inaccessible for most of the day

● No semantic management for available data

● No easy way to access the data programmatically

Collection Methodology

● Start with the 1998 census data¹.

● Data is available for each district.

● Each district's data is accessible² as an HTML page.

● Patience!

1. Who am I kidding?! That is the only census data available.

2. Only when the website is available and accessible.

First idea; Last idea

● Scrape the website

● Scrape the files on the website

Scrape, Convert, Save. Easy Peasy!

PHP Library – Simple HTML DOM

Project website: http://sourceforge.net/projects/simplehtmldom/

Easier said than done!

1. Server non-responsive to script calls.

2. Server unavailable after script comes across an error.

3. Ridiculous latency. (Patience methodology applies here.)

4. Non-semantic data, e.g. some districts have extra information columns, which causes an error and takes us back to #2.

5. HTML files were literally saved from Microsoft Office!!

Yes, Save as… (Web page)
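Points 4 and 5 are where a strict parser tends to fall over. The sketch below shows one tolerant way to read such tables. It is not the Simple HTML DOM code used in the project: it swaps in Python's standard `html.parser`, and the sample table, its values and the `_extra` key are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the current cell, None outside

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse the stray whitespace that Office exports tend to add.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_to_records(rows):
    """Pair each data row with the header row, keeping any surplus cells."""
    if not rows:
        return []
    header, *body = rows
    records = []
    for cells in body:
        record = dict(zip(header, cells))
        extra = cells[len(header):]
        if extra:
            record["_extra"] = extra  # hypothetical key for unexpected columns
        records.append(record)
    return records


# Made-up sample: the second data row carries an extra column.
sample = """
<table>
  <tr><th>Area</th><th>Population</th></tr>
  <tr><td>Urban</td><td><span>1,000</span></td></tr>
  <tr><td>Rural</td><td>2,000</td><td>note</td></tr>
</table>
"""
parser = TableRows()
parser.feed(sample)
records = rows_to_records(parser.rows)
```

Keeping surplus cells under a separate key, rather than raising an error, means one odd district page does not abort the run. Nested tables are not handled in this sketch.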

Problems:

● No looping through data files (Server timeout).

● Even with a delay between requests, any error leads to a long server timeout.

Solution:

● Manually run the script for each file, one by one. The process goes as follows:

Scraping, finally

PHP → file_get_contents → Simple HTML DOM → JSON → Save file
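The same fetch, parse, convert and save chain can be sketched for a single file. The project itself used PHP's `file_get_contents` and Simple HTML DOM; this sketch swaps in Python's standard library so it runs without extra packages. The names `CellCollector`, `fetch`, `convert` and `scrape_one` are hypothetical, as is the `{"rows": [...]}` output shape.

```python
import json
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class CellCollector(HTMLParser):
    """Minimal stand-in for Simple HTML DOM: gathers table cell text by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def fetch(url, timeout=30):
    """Download one page; the timeout stops a stalled server hanging the run."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumption: UTF-8. Office-saved pages may need another encoding.
        return response.read().decode("utf-8", errors="replace")


def convert(html):
    """Turn the page's table rows into a JSON string."""
    collector = CellCollector()
    collector.feed(html)
    rows = [[cell.strip() for cell in row] for row in collector.rows if row]
    return json.dumps({"rows": rows}, ensure_ascii=False, indent=2)


def scrape_one(url, out_path, fetcher=fetch):
    """Fetch, convert and save a single file, mirroring the one-by-one run."""
    Path(out_path).write_text(convert(fetcher(url)), encoding="utf-8")
    return out_path
```

Calling `scrape_one(url, "district.json")` once per district page mirrors the manual approach: a failure affects only that file and the run can be repeated later. The `fetcher` parameter exists only so the conversion can be tried on a local HTML string without touching the server.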

Received Data - Final Version

Restructured Data - Final Version

Further Steps; Making Data Useful

1. Return to the original objectives of this whole process.

2. Restructure all data into a standard format.

3. Acquire missing data.

4. Make it all available for public use.
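Step 2 could look something like the sketch below. It is only an illustration in Python of what a "standard format" might mean: the field labels, the sample values and the snake_case convention are assumptions, not the structure used in the published repository.

```python
import json
import re


def normalise_key(label):
    """Lower-case a column label and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def normalise_value(text):
    """Convert '1,234' style strings to int; leave other text trimmed."""
    cleaned = text.strip()
    if re.fullmatch(r"[0-9][0-9,]*", cleaned):
        return int(cleaned.replace(",", ""))
    return cleaned


def standardise(district, record):
    """Return one flat record with predictable keys and typed values."""
    out = {"district": district}
    for label, value in record.items():
        out[normalise_key(label)] = normalise_value(value)
    return out


# Made-up input resembling one scraped row.
raw = {"Population (1998)": "1,234", "Area Type": " Urban "}
print(json.dumps(standardise("Example District", raw)))
```

Giving every district the same keys and real numbers is what makes the data easy to load from other tools, which is the point of the original objectives.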

Get it, share it or contribute to it at git.io/pk-census

Thank you!

@jabranr | hello@jabran.me

http://git.io/pk-census
