sumandro_anatomy_of_nsso_data_opendatacamp_20120324
DESCRIPTION
Presentation made at OpenDataCamp in Bangalore (24th March 2012) on the organisation of unit-level data published by National Sample Survey Organisation, Govt. of India.
TRANSCRIPT
1. Pre-History
2. Glossary
3. Schedules
4. Organisation of Data
5. The Extraction
6. Looking at the Data
1. Pre-History
1862 British Administration constituted the Statistical Committee for preparation of forms for primary data collection, followed by the publication of the first Statistical Abstract of British India (1840-1865)
1881 First Decennial Population Census begins
1914 Directorate of Statistics was established in Calcutta; it later became the Directorate of Commercial Intelligence and Statistics, which was entrusted with the compilation of colonial trade statistics
1916 Indian Industrial Commission
1925 Economic Enquiry Committee
1939 Wholesale Price Index collection and calculation begins
1947 P. C. Mahalanobis was appointed the Honorary Statistical Advisor
1949 The Central Statistical Unit was established
1951 Central Statistical Organization and the Department of Statistics are established. They continue to be the major organisations for collection of national-level data
2. Glossary
Round: Each round of data collection by NSSO, usually of annual duration
Schedule: Each thematic focus for data collection, multiple Schedules per Round
Thick Round: Major data collection rounds repeated every 5 years (hence called quinquennial rounds)
Thin Round: Minor data collection rounds
State-Region: Usually a cluster of three or more districts in a state
Fixed-Width File: Fixed-width text files are data files in text format specified by fixed column widths, pad character and left/right alignment.
.do File: A Stata file format. Collection of Stata commands.
.dta File: A Stata file format for data files, similar to Excel, readable by R
.smcl File: A Stata file format for log files, automatically records the Stata commands and results
3. Schedules
Main themes for thick rounds / quinquennial surveys
Consumer expenditure
Employment and Unemployment
Debt and Investment
Manufacturing Enterprises (Organised and Unorganised)
Main themes for thin rounds
Participation and Expenditure in Education
Particulars of Slum and Housing Condition
Morbidity and Healthcare
Situation Assessment Survey of Farmers
Land and Livestock Holding
4. Organisation of Data
Organisation of Raw Data
- The fixed-width file (.txt)
- The binary coding of information
The Supporting Files
- The ‘schedule’ file – survey questionnaire
- The ‘layout’ file – how the information is organised in data files
- The ‘readme’ file – how different data sets are organised
- The state and district codes
Level
- Coding information about single entity in multiple rows
4. Organisation of Data
Raw Data
12121212121212121212232323232323232323343434343434343434
Layout
Column 1-2: Person Serial Number
Column 3-4: Age of the Person
Column 5-6: Educational Status
…
Schedule
Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?
[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
…
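The relationship between the raw data, the layout and the schedule can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the sample record `"013523"` is hypothetical (not taken from any NSSO file), and the names `LAYOUT`, `EDUCATION_LABELS` and `parse_record` are made up for this sketch. The column positions and the code list are the ones shown in the layout and schedule above.

```python
# Layout from the slide: (field name, first column, last column),
# 1-indexed and inclusive, as in the layout file.
LAYOUT = [
    ("person_serial", 1, 2),
    ("age", 3, 4),
    ("education_code", 5, 6),
]

# Code list from the schedule on the slide.
EDUCATION_LABELS = {
    "12": "up to class X",
    "23": "class X-XII",
    "34": "graduate and higher",
}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of raw string fields."""
    return {name: line[start - 1:end] for name, start, end in layout}


# Hypothetical record: person 01, aged 35, educational status code 23.
record = parse_record("013523")
record["education"] = EDUCATION_LABELS.get(record["education_code"], "unknown code")
print(record)
```

The layout tells the reader where each field sits in the line; the schedule tells the reader what the value found there means.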
4. Organisation of Data
Raw Data with Levels
120112121212121212121212021212121212121212122301232323232323232323
Layout
Column 1-2: Person Serial Number
Column 3-4: Level Code
Column 5-6: Age of the Person [if Level = 01]; Educational Status [if Level = 02]
…
Schedule
Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?
[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
…
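With levels, the same columns carry different information depending on the level code, so the rows for one person have to be read separately and then combined. A minimal Python sketch of that idea follows, assuming the toy layout above. The rows in `rows` are hypothetical, and `FIELD_BY_LEVEL` and `merge_levels` are illustrative names, not anything defined by NSSO.

```python
from collections import defaultdict

# Columns 5-6 mean different things depending on the level code in columns 3-4.
FIELD_BY_LEVEL = {"01": "age", "02": "education_code"}


def merge_levels(lines):
    """Combine the rows belonging to one person into a single dict per person."""
    people = defaultdict(dict)
    for line in lines:
        serial, level, value = line[0:2], line[2:4], line[4:6]
        field = FIELD_BY_LEVEL.get(level)
        if field is None:
            continue  # level not described in this toy layout
        people[serial][field] = value
    return dict(people)


# Hypothetical rows: two persons, each with a level-01 and a level-02 row.
rows = ["010135", "010223", "020162", "020212"]
people = merge_levels(rows)
print(people)
```

Each person ends up as one entry keyed by serial number, which is the single-row-per-entity shape that a table or spreadsheet expects.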
5. The Extraction
Converting NSSO raw data to tabular form (Comma Separated) using Stata
- The .do file: Set of Stata commands for extraction
- The ‘infix’ command: Mapping variables to data columns
- The ‘label variable’ command (often shortened to ‘label var’): Labeling the variables
- The levels: Multiple data rows for single entity
- The .dta file: The Stata spreadsheet format
- The .smcl file: The Stata commands and results log file
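The presentation performs this extraction in Stata. For readers without Stata, the same idea (map column ranges to named variables, then write a comma-separated table) can be sketched with the Python standard library. This is a swapped-in illustration under stated assumptions, not the author's workflow: the layout is the toy one from the earlier slides, the input lines are hypothetical, and `fixed_width_to_csv` is a made-up helper name.

```python
import csv
import io

# Toy layout: (variable name, start offset, end offset), 0-indexed, end exclusive.
LAYOUT = [("person_serial", 0, 2), ("age", 2, 4), ("education_code", 4, 6)]


def fixed_width_to_csv(lines, layout=LAYOUT):
    """Return CSV text with a header row and one row per fixed-width line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        writer.writerow([line[start:end] for _, start, end in layout])
    return out.getvalue()


# Hypothetical raw lines, as they might be read from a .txt data file.
raw_lines = ["013523\n", "026212\n"]
csv_text = fixed_width_to_csv(raw_lines)
print(csv_text)
```

Values are kept as strings so that leading zeros in codes and serial numbers survive the conversion.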
NSSO – Raw Data
NSSO – Extracted Data
Sumandro Chattapadhyay