Farm Household Surveys DATABASE ORGANISATION
AND DATA CLEANING
Glwadys Aymone GBETIBOUO C4ECOSOLUTIONS, CAPE TOWN
Economics analyses of climate change impacts workshop
Accra, Ghana
• Database organisation and cleaning, or data management, is generally seen as a set of tasks tied to the tabulation phase of the survey: in other words, activities carried out towards the end of the survey project, on computers in clean offices.
• Survey data management should instead begin concurrently with questionnaire design. Key points to consider:
– Nature and identification of the statistical units observed
– Built-in redundancies
– Length and complexity of the questionnaire
– Sample size and design
– Survey timing and scheduling
DATA ENTRY: “flat file”
Codification of the statistical unit
    ADM0          ADM1          ADM2     CADM0  CADM1  CADM2  CODE
    South Africa  Eastern Cape  Aberden  7      1      1      700101
Household code: 8-digit code
    HHCODE
    70010101
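The codification above can be sketched in Python. The field widths used here (1 digit for CADM0, 3 for CADM1, 2 for CADM2, and 2 for the household number) are an assumption inferred from the single example 7, 1, 1 → 700101 and from the 8-digit household code. The slides do not state them, and the function names are hypothetical.

```python
# Assumed field widths, inferred from the example 7, 1, 1 -> 700101.
# The slides do not state how the digits are allocated.
CADM0_WIDTH, CADM1_WIDTH, CADM2_WIDTH, HH_WIDTH = 1, 3, 2, 2


def _pad(value: int, width: int, name: str) -> str:
    """Zero-pad a non-negative integer and reject values that do not fit."""
    text = str(value).zfill(width)
    if value < 0 or len(text) != width:
        raise ValueError(f"{name}={value} does not fit in {width} digit(s)")
    return text


def build_area_code(cadm0: int, cadm1: int, cadm2: int) -> str:
    """Concatenate the three administrative codes into one area CODE."""
    return (
        _pad(cadm0, CADM0_WIDTH, "CADM0")
        + _pad(cadm1, CADM1_WIDTH, "CADM1")
        + _pad(cadm2, CADM2_WIDTH, "CADM2")
    )


def build_hhcode(area_code: str, household_number: int) -> str:
    """Append the household number to the area code to form HHCODE."""
    expected = CADM0_WIDTH + CADM1_WIDTH + CADM2_WIDTH
    if len(area_code) != expected or not area_code.isdigit():
        raise ValueError(f"area code must be {expected} digits: {area_code!r}")
    return area_code + _pad(household_number, HH_WIDTH, "household")


def split_hhcode(hhcode: str) -> dict:
    """Recover the components of an HHCODE under the assumed widths."""
    total = CADM0_WIDTH + CADM1_WIDTH + CADM2_WIDTH + HH_WIDTH
    if len(hhcode) != total or not hhcode.isdigit():
        raise ValueError(f"HHCODE must be {total} digits: {hhcode!r}")
    fields = {}
    start = 0
    for name, width in (
        ("cadm0", CADM0_WIDTH),
        ("cadm1", CADM1_WIDTH),
        ("cadm2", CADM2_WIDTH),
        ("household", HH_WIDTH),
    ):
        fields[name] = int(hhcode[start:start + width])
        start += width
    return fields


area = build_area_code(7, 1, 1)
hhcode = build_hhcode(area, 1)
print(area, hhcode, split_hhcode(hhcode))
```

Storing the code as text rather than as a number keeps any leading zeros intact, which is why the sketch returns strings.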
DATA ENTRY SYSTEM
• A complex household survey typically contains hundreds of variables. For example, the household survey dataset of the 2003 GEF study has 1342 variables.
• After the survey instrument has been finalized, you develop the data entry system and provide a protocol for data entry.
• Coding questionnaire
• Coding sheet
• Household data: 12 worksheets
• Climate data, soil data, runoff data
DATA ENTRY
    hhcode    TIB    farmtype  relhead  hhsize  gender1  age1
    HHCODE    TIB    1.0.1     1.0.2    1.1     1.2.1.1  1.2.2.1
    70010101  13:50  3         3        4       1        34
    70010102  14:30  1         1        8       1        83
    70010103  13:55  1         1        3       1        68
    70010104  17:30  3         1        2       1        71
    70010105  09:25  3         1        4       1        45
    70010106  15:30  1         3        6       1        -99
    70010107  07:30  3         -99      6       1        38
    70010108  13:00  1         1        3       1        75
    70010109  08:36  3         -99      5       1        -99
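As a minimal sketch of how such a flat file can be handled, the Python below reads the rows shown above (one row per household, one column per variable), checks that hhcode identifies each row uniquely, and counts the -99 entries per variable. Treating -99 as the missing-value code is an assumption based on the recoding done later in the slides. The helper names are hypothetical.

```python
from collections import Counter

# Rows copied from the data entry slide; the first line holds the variable names.
FLAT_FILE = """\
hhcode TIB farmtype relhead hhsize gender1 age1
70010101 13:50 3 3 4 1 34
70010102 14:30 1 1 8 1 83
70010103 13:55 1 1 3 1 68
70010104 17:30 3 1 2 1 71
70010105 09:25 3 1 4 1 45
70010106 15:30 1 3 6 1 -99
70010107 07:30 3 -99 6 1 38
70010108 13:00 1 1 3 1 75
70010109 08:36 3 -99 5 1 -99
"""

MISSING_CODE = "-99"  # assumption: -99 marks a missing answer


def read_flat_file(text):
    """Parse whitespace-separated text into a header and a list of row dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        values = line.split()
        if len(values) != len(header):
            raise ValueError(f"line {number}: expected {len(header)} fields")
        rows.append(dict(zip(header, values)))
    return header, rows


def duplicate_keys(rows, key="hhcode"):
    """Return the key values that occur on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def missing_counts(header, rows):
    """Count, for each variable, how many rows carry the missing code."""
    return {name: sum(row[name] == MISSING_CODE for row in rows) for name in header}


header, rows = read_flat_file(FLAT_FILE)
dupes = duplicate_keys(rows)
missing = missing_counts(header, rows)
print("households:", len(rows))
print("duplicate hhcodes:", dupes)
print("missing per variable:", missing)
```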
Data cleaning
• Generally, data are subjected to three control mechanisms:
1. range checks
2. consistency checks
3. typographical checks
Range checks
Every variable in the survey should contain only data within a limited domain of valid values.

    tab farmtype, missing

       farmtype |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            -99 |          4        0.99        0.99
              1 |        191       47.16       48.15
              2 |         71       17.53       65.68
              3 |        138       34.07       99.75
              9 |          1        0.25      100.00
    ------------+-----------------------------------
          Total |        405      100.00

           hhcode   farmtype   remark
     39. 70013308          9   CHECK DATA FOR THIS OBS.
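The same range check can be sketched in Python. The set of valid codes {1, 2, 3} is an assumption read off the frequency table above (the slides do not list the questionnaire's code book), and -99 is assumed to be the missing-value code, so it is not flagged. The function name is hypothetical.

```python
VALID_FARMTYPE = {1, 2, 3}  # assumption: valid codes inferred from the frequency table
MISSING_CODE = -99          # assumption: -99 marks a missing answer


def out_of_range(records, variable, valid_values, missing_code=MISSING_CODE):
    """Return the hhcode of every record whose value is neither valid nor missing."""
    flagged = []
    for record in records:
        value = record[variable]
        if value != missing_code and value not in valid_values:
            flagged.append(record["hhcode"])
    return flagged


# Three observations taken from the slides; 70013308 carries the invalid value 9.
records = [
    {"hhcode": "70010101", "farmtype": 3},
    {"hhcode": "70010102", "farmtype": 1},
    {"hhcode": "70013308", "farmtype": 9},
]

to_check = out_of_range(records, "farmtype", VALID_FARMTYPE)
for code in to_check:
    print(code, "CHECK DATA FOR THIS OBS.")
```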
Consistency checks
Values from one question should be consistent with values from another question.
• Demographic consistency of the household: household size should equal the number of males plus the number of females.

    gen test = hhmales + hhfemales
    list hhcode hhsize hhmales hhfemales test remark if test != hhsize

      hhcode   hhsize  hhmales  hhfemales  test  remark
    70013319       18        3          3     6  CHECK DATA FOR THIS OBS.
    70030507       14        4          4     8  CHECK DATA FOR THIS OBS.

• Consistency of age and other individual characteristics: a frequency table reveals implausible values, such as an age of 281.

    tab age5

      hhcode   age5  remark
    70041703    281  CHECK DATA FOR THIS OBS.
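Both consistency checks can be sketched in Python as follows. The flagged households are the ones shown above. The household labelled EXAMPLE is a hypothetical consistent case added for contrast, and the upper age bound of 120 is an assumption, not a value given in the slides.

```python
MAX_PLAUSIBLE_AGE = 120  # assumption: the slides give no explicit bound
MISSING_CODE = -99       # assumption: -99 marks a missing answer


def size_mismatches(households):
    """Return hhcodes where males + females does not equal household size."""
    return [
        h["hhcode"]
        for h in households
        if h["hhmales"] + h["hhfemales"] != h["hhsize"]
    ]


def implausible_ages(ages, max_age=MAX_PLAUSIBLE_AGE, missing_code=MISSING_CODE):
    """Return hhcodes whose recorded age is outside 0..max_age (missing is skipped)."""
    return [
        hhcode
        for hhcode, age in ages
        if age != missing_code and not 0 <= age <= max_age
    ]


households = [
    {"hhcode": "70013319", "hhsize": 18, "hhmales": 3, "hhfemales": 3},
    {"hhcode": "70030507", "hhsize": 14, "hhmales": 4, "hhfemales": 4},
    {"hhcode": "EXAMPLE", "hhsize": 6, "hhmales": 3, "hhfemales": 3},  # hypothetical
]
ages = [
    ("70041703", 281),
    ("EXAMPLE", 45),  # hypothetical
]

bad_sizes = size_mismatches(households)
bad_ages = implausible_ages(ages)
for code in bad_sizes + bad_ages:
    print(code, "CHECK DATA FOR THIS OBS.")
```

A flag only marks the observation for review against the paper questionnaire. It does not say which of the inconsistent values is the wrong one.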
Typographical checks
• A typographical error may be a transposition of digits, such as entering 41 rather than 14. Such errors can be checked through double data entry of all questionnaires.
• Another common error is entering -999 rather than the missing-value code -99 in a numerical input:
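    * Restrict the loop to numeric variables; replace fails on string variables
    ds, has(type numeric)
    foreach var of varlist `r(varlist)' {
        * Recode the mistyped -999 to the missing code -99
        replace `var' = -99 if `var' == -999
        * Convert the missing code -99 to Stata's missing value (.)
        replace `var' = . if `var' == -99
    }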
Use the tab (tabulate) command to obtain frequency tables of the data.
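Double data entry, mentioned above as the way to catch transposed digits, amounts to keying each questionnaire twice and comparing the two files field by field. The Python below is a minimal sketch of that comparison. The first pass uses values from the data entry slide, while the second pass contains a hypothetical transposition (43 instead of 34) invented for illustration. The function name is hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (hhcode, variable, first_value, second_value) for each disagreement.

    Both arguments map hhcode -> {variable: value}. A household present in only
    one pass is reported with variable set to None.
    """
    differences = []
    for hhcode in sorted(set(first_pass) | set(second_pass)):
        first = first_pass.get(hhcode)
        second = second_pass.get(hhcode)
        if first is None or second is None:
            differences.append((hhcode, None, first, second))
            continue
        for variable in sorted(set(first) | set(second)):
            if first.get(variable) != second.get(variable):
                differences.append(
                    (hhcode, variable, first.get(variable), second.get(variable))
                )
    return differences


# First pass: values from the data entry slide.
first_pass = {
    "70010101": {"hhsize": 4, "age1": 34},
    "70010102": {"hhsize": 8, "age1": 83},
}
# Second pass: hypothetical re-entry with age1 transposed for household 70010101.
second_pass = {
    "70010101": {"hhsize": 4, "age1": 43},
    "70010102": {"hhsize": 8, "age1": 83},
}

differences = compare_entries(first_pass, second_pass)
for hhcode, variable, a, b in differences:
    print(f"{hhcode} {variable}: first={a} second={b} CHECK DATA FOR THIS OBS.")
```

Each reported disagreement is then resolved by going back to the paper questionnaire, since the comparison alone cannot tell which pass is correct.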