sumandro_anatomy_of_nsso_data_opendatacamp_20120324

Sumandro Chattapadhyay @ajantriks @ajantriks.net Anatomy of NSSO Data

Uploaded by sumandro on 29-Nov-2014


DESCRIPTION

Presentation made at OpenDataCamp in Bangalore (24th March 2012) on the organisation of unit-level data published by the National Sample Survey Organisation (NSSO), Govt. of India.

TRANSCRIPT

Page 1: sumandro_anatomy_of_nsso_data_opendatacamp_20120324

Sumandro Chattapadhyay

Anatomy of NSSO Data

1. Pre-History
2. Glossary
3. Schedules
4. Organisation of Data
5. The Extraction
6. Looking at the Data

1. Pre-History

1862 The British administration constituted the Statistical Committee for the preparation of forms for primary data collection, followed by the publication of the first Statistical Abstract of British India (1840-1865)

1881 First Decennial Population Census begins

1914 The Directorate of Statistics was established in Calcutta; it later became the Directorate of Commercial Intelligence and Statistics, which was entrusted with the compilation of colonial trade statistics

1916 Indian Industrial Commission

1925 Economic Enquiry Committee

1939 Wholesale Price Index collection and calculation begins

1947 P. C. Mahalanobis was appointed the Honorary Statistical Advisor

1949 The Central Statistical Unit was established

1951 The Central Statistical Organisation and the Department of Statistics were established; they continue to be the major organisations for the collection of national-level data


2. Glossary

Round: Each round of data collection by NSSO, usually of annual duration

Schedule: The questionnaire for one thematic focus of data collection; each Round has multiple Schedules

Thick Round: Major data collection rounds repeated every 5 years (hence called quinquennial rounds)

Thin Round: Minor data collection rounds conducted in the years between thick rounds

State-Region: Usually a cluster of three or more districts in a state

Fixed-Width File: A plain-text data file in which every field occupies a fixed range of columns, defined by the column widths, a pad character and left/right alignment.

.do File: A Stata script file containing a sequence of Stata commands.

.dta File: Stata's format for data files; it stores a table of variables and observations, much like a spreadsheet, and can also be read in R.

.smcl File: A Stata file format for log files; it records the Stata commands run and their results.
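To make the fixed-width idea concrete, here is a minimal Python sketch of writing and reading one fixed-width record. The field names, widths, pad characters and alignments in `FIELDS` are hypothetical, chosen only to show how width, pad character and alignment work together; they are not taken from any NSSO layout file.

```python
# Hypothetical field specification: (name, width, pad character, alignment).
# Illustrative only; not from an NSSO layout file.
FIELDS = [
    ("serial", 2, "0", "right"),
    ("age", 3, "0", "right"),
    ("district", 8, " ", "left"),
]


def format_record(values):
    """Pad each value to its fixed width and concatenate into one line."""
    parts = []
    for name, width, pad, align in FIELDS:
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name!r} value {text!r} exceeds width {width}")
        # Right-aligned fields get padding on the left, left-aligned on the right.
        parts.append(text.rjust(width, pad) if align == "right" else text.ljust(width, pad))
    return "".join(parts)


def parse_record(line):
    """Slice a line back into fields purely by position and strip the padding."""
    record, pos = {}, 0
    for name, width, pad, align in FIELDS:
        chunk = line[pos:pos + width]
        record[name] = chunk.lstrip(pad) if align == "right" else chunk.rstrip(pad)
        pos += width
    return record


line = format_record({"serial": 7, "age": 45, "district": "Pune"})
print(repr(line))          # '07045Pune    '
print(parse_record(line))  # {'serial': '7', 'age': '45', 'district': 'Pune'}
```

Because parsing relies on position alone, the data cannot be read without knowing the column ranges. Note that in this sketch a field made up only of pad characters (for example a zero-padded value of 0) comes back as an empty string.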


3. Schedules

Main themes for thick rounds / quinquennial surveys

Consumer expenditure

Employment and Unemployment

Debt and Investment

Manufacturing Enterprises (Organised and Unorganised)

Main themes for thin rounds

Participation and Expenditure in Education

Particulars of Slum and Housing Condition

Morbidity and Healthcare

Situation Assessment Survey of Farmers

Land and Livestock Holding


4. Organisation of Data

Organisation of Raw Data

- The fixed-width file (.txt)

- The numeric coding of information (responses stored as codes)

The Supporting Files

- The ‘schedule’ file – survey questionnaire

- The ‘layout’ file – how the information is organised in data files

- The ‘readme’ file – how different data sets are organised

- The state and district codes

Level

- Information about a single entity coded across multiple rows

4. Organisation of Data

Raw Data

12121212121212121212232323232323232323343434343434343434

Layout

Column 1-2: Person Serial Number
Column 3-4: Age of the Person
Column 5-6: Educational Status
…

Schedule

Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?

[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]…
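A minimal Python sketch of how the pieces fit together: the layout gives the column positions, and the schedule's code list turns stored codes into labels. It assumes the layout is available as plain-text lines in the simplified form shown on this slide (actual NSSO layout files may be distributed in other formats), and the input line `"012334"` is a hypothetical record, not taken from the raw data above.

```python
import re

# Layout lines in the simplified form shown on the slide (assumed format).
LAYOUT_TEXT = """\
Column 1-2: Person Serial Number
Column 3-4: Age of the Person
Column 5-6: Educational Status"""

# Educational status codes, as listed in the schedule on the slide.
EDUCATION_CODES = {
    "12": "up to class X",
    "23": "class X-XII",
    "34": "graduate and higher",
}


def parse_layout(text):
    """Turn 'Column a-b: Name' lines into (name, start, end) tuples."""
    specs = []
    for row in text.splitlines():
        match = re.fullmatch(r"Column (\d+)-(\d+): (.+)", row.strip())
        if match:
            specs.append((match.group(3), int(match.group(1)), int(match.group(2))))
    return specs


def decode(line, specs):
    """Slice one raw line by the layout, then label the education code."""
    # Layout columns are 1-based and inclusive; Python slices are 0-based, end-exclusive.
    record = {name: line[start - 1:end] for name, start, end in specs}
    code = record["Educational Status"]
    record["Educational Status"] = EDUCATION_CODES.get(code, "unknown code " + code)
    return record


print(decode("012334", parse_layout(LAYOUT_TEXT)))
# {'Person Serial Number': '01', 'Age of the Person': '23', 'Educational Status': 'graduate and higher'}
```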


4. Organisation of Data

Raw Data with Levels

120112121212121212121212021212121212121212122301232323232323232323

Layout

Column 1-2: Person Serial Number
Column 3-4: Level Code
Column 5-6: Age of the Person [if Level = 01]; Educational Status [if Level = 02]
…

Schedule

Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?

[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]…
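With levels, the meaning of columns 5-6 depends on the level code in columns 3-4, so the rows for one person have to be read together. A minimal Python sketch of regrouping level rows into one record per person, following this slide's layout; the four input rows are hypothetical, and rows whose level code is not covered by this partial layout are simply skipped.

```python
from collections import defaultdict

# Educational status codes from the schedule on the slide.
EDUCATION_CODES = {"12": "up to class X", "23": "class X-XII", "34": "graduate and higher"}

# Meaning of columns 5-6 for each level code in columns 3-4 (per the slide's layout).
FIELD_BY_LEVEL = {"01": "age", "02": "education"}


def merge_levels(lines):
    """Combine the level rows of each person into one record keyed by serial number."""
    people = defaultdict(dict)
    for line in lines:
        serial, level, value = line[0:2], line[2:4], line[4:6]
        field = FIELD_BY_LEVEL.get(level)
        if field is None:
            continue  # level code not described in this partial layout
        if field == "education":
            value = EDUCATION_CODES.get(value, "unknown code")
        people[serial][field] = value
    return dict(people)


rows = ["120145", "120223", "230131", "230234"]  # hypothetical level rows
print(merge_levels(rows))
# {'12': {'age': '45', 'education': 'class X-XII'}, '23': {'age': '31', 'education': 'graduate and higher'}}
```

Keying on the person serial number is what joins the levels back together here; in a real survey file the key would likely need to include household identifiers as well, since a person serial number is only unique within a household.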



5. The Extraction

Converting NSSO raw data to tabular (comma-separated) form using Stata

- The .do file: Set of Stata commands for extraction

- The ‘infix’ command: Mapping variables to data columns

- The ‘label variable’ command: Labeling the variables

- The levels: Multiple data rows for a single entity

- The .dta file: Stata's data file format

- The .smcl file: Log file of the Stata commands and their results
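The deck performs this step in Stata, where an `infix` statement maps variables to column ranges (for example `infix serial 1-2 age 3-4 educ 5-6 using rawfile.txt`, with a hypothetical file name). As a rough analogue outside Stata, here is a minimal Python sketch that reads fixed-width lines with an infix-style column specification and writes comma-separated output. The variable names and positions in `SPEC` are illustrative only, and the sketch assumes a single level (one row per entity).

```python
import csv
import io

# Infix-style specification: variable name and 1-based inclusive start/end columns.
# Names and positions are illustrative, not from an NSSO layout file.
SPEC = [("serial", 1, 2), ("age", 3, 4), ("educ", 5, 6)]


def fixed_width_to_csv(raw_text, spec):
    """Read fixed-width lines and return them as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in spec])
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        writer.writerow([line[start - 1:end].strip() for _, start, end in spec])
    return out.getvalue()


raw = "014512\n022334\n"  # two hypothetical records
print(fixed_width_to_csv(raw, SPEC), end="")
# serial,age,educ
# 01,45,12
# 02,23,34
```

Keeping the column specification as data, rather than hard-coding slices, mirrors the role of the layout file: a new round or schedule only needs a new specification.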

6. Looking at the Data

NSSO – Raw Data


NSSO – Extracted Data
