SHARE Data Cleaning

Stephanie Stuck, MEA (Mannheim Research Institute for the Economics of Aging, www.mea.uni-mannheim.de)

Vienna, November 5/6th

Page 2: SHARE Data  Cleaning

General philosophy

Respondents are the experts on their own lives; in general we (still) take their answers very seriously.

Only change data if you are sure it is wrong. If answers seem implausible but you are not sure what to do, indicate this via a flag variable.

Page 3: SHARE Data  Cleaning

General rules

Please use the data files with the original sampid to check and correct data (do not use the data version with sampid2).

Always write programs to correct data (Stata .do or SPSS .sps files). Please never change data directly (e.g. no changes in data editors).

Page 4: SHARE Data  Cleaning

General rules

Keep original variables (name: "varname_original")

Add flag variables to indicate changes (name: "varname_flag")

Save corrected data files under a new name (e.g. "filename_corrected")
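The three naming rules above can be sketched in a few lines. The slides ask for Stata or SPSS programs; the Python below is only an illustration of the same pattern, and the variable `yrbirth`, the ids, and the helper names `correct_value` and `corrected_filename` are hypothetical.

```python
import os

def correct_value(rows, case_id, varname, new_value, id_var="sampid"):
    """Correct one variable for one case, keeping the original value and a flag."""
    orig_name = f"{varname}_original"
    flag_name = f"{varname}_flag"
    for row in rows:
        # Create the companion variables once; later calls do not overwrite them.
        row.setdefault(orig_name, row[varname])
        row.setdefault(flag_name, 0)
        if row[id_var] == case_id:
            row[varname] = new_value
            row[flag_name] = 1  # 1 = value was changed by this program
    return rows

def corrected_filename(filename):
    """Build the name of the corrected file, e.g. dn.dta -> dn_corrected.dta."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_corrected{ext}"

# Hypothetical example data: one implausible year of birth.
rows = [
    {"sampid": "A-001", "yrbirth": "1840"},
    {"sampid": "A-002", "yrbirth": "1938"},
]
rows = correct_value(rows, "A-001", "yrbirth", "1940")
output_name = corrected_filename("dn.dta")
```

Because the correction lives in a program rather than in a manual edit, it can be rerun on a fresh copy of the raw file and reviewed by others.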

Page 5: SHARE Data  Cleaning

General rules

Don’t always take wave 1 information for granted; it can be wrong, too.

Sometimes we will have to change wave 1 data, too.

We will have another release of wave 1 data together with the public release of wave 2.

We will probably already have a minor update of release 2.0.1 early next year.

Page 6: SHARE Data  Cleaning

Very next steps

Check for country-specific deviations, e.g. routing errors, ep071, ep098, the hc module etc.

Send information on all country-specific deviations to MEA. Please don’t forget an English translation or explanation of the deviations.

Information on important deviations in central variables should be available to all FRB authors together with release 0

Page 7: SHARE Data  Cleaning

Very next steps

Check financial amounts for implausible values, e.g. negative or very high amounts

outliers

zero values

wrong currencies

typing errors

“drunken interviewers” problem

also consider frequencies of payments etc.
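A first screening pass over a financial variable could look like the sketch below. It only flags values for review and changes nothing. The rule for "very high" (more than `high_factor` times the median of the positive amounts) is an assumption made for this illustration, not a SHARE rule, and the function name and example amounts are hypothetical.

```python
from statistics import median

def flag_amounts(amounts, high_factor=20):
    """Label each amount as missing, negative, zero, very_high or ok."""
    positive = [a for a in amounts if a is not None and a > 0]
    # Assumed reference point: the median of all positive amounts.
    typical = median(positive) if positive else None
    flags = []
    for a in amounts:
        if a is None:
            flags.append("missing")
        elif a < 0:
            flags.append("negative")
        elif a == 0:
            flags.append("zero")
        elif typical is not None and a > high_factor * typical:
            flags.append("very_high")  # candidate outlier, typo or wrong currency
        else:
            flags.append("ok")
    return flags

# Hypothetical monthly amounts; the last one looks like a typing or currency error.
flags = flag_amounts([1200, 0, -50, 1500, 900, 1200000])
```

Flagged cases still need a human decision: a very high value may be a typing error, an amount in the wrong currency, or a yearly payment reported where a monthly one was asked for.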

Page 8: SHARE Data  Cleaning

Wrong sampid, cvid or respid

MEA already checks for mismatches within and between waves

Please ask survey agencies and send all information you have on renamed cases, mismatches etc. to MEA

Whenever you find new information on mismatches, e.g. in remarks, send the information to MEA

Please send data files with old and new ids for renamed cases to MEA, provide information on date and reason (if possible) in additional variables

Sometimes only the CV or only the individual modules (DN etc.) have to be renamed (especially but not only if respondents are exchanged within households). Please don’t forget to provide information where changes have to be done.

MEA will correct files and send lists with hard cases to country teams to check/ask survey agencies again
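One possible layout for the list of renamed cases, and how it might be applied, is sketched below. The field names (`old_id`, `new_id`, `date`, `reason`, `modules`), the ids, and the example entry are all invented for illustration; the slides only ask for old and new ids plus date, reason, and where the change applies.

```python
def apply_renames(records, renames, module, id_var="respid"):
    """Return copies of records with ids replaced where a rename covers this module."""
    lookup = {r["old_id"]: r for r in renames if module in r["modules"]}
    out = []
    for rec in records:
        new = dict(rec)
        new[f"{id_var}_original"] = rec[id_var]  # always keep the original id
        hit = lookup.get(rec[id_var])
        if hit:
            new[id_var] = hit["new_id"]
            new[f"{id_var}_flag"] = 1
        else:
            new[f"{id_var}_flag"] = 0
        out.append(new)
    return out

# Hypothetical rename list: this case is renamed in the DN module only.
renames = [
    {
        "old_id": "X-01",
        "new_id": "X-02",
        "date": "2007-03-01",
        "reason": "respondents exchanged within household",
        "modules": {"DN"},
    }
]
dn_fixed = apply_renames([{"respid": "X-01"}], renames, module="DN")
cv_fixed = apply_renames([{"respid": "X-01"}], renames, module="CV")
```

Recording the affected modules per rename covers the case where only the CV or only the individual modules have to be changed.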

Page 9: SHARE Data  Cleaning

General checks

Corrections based on checks of frequency distributions, e.g. outliers, values out of range

Corrections based on consistency checks within and between modules and waves

Page 10: SHARE Data  Cleaning

More concrete

Check for empty cases

Check for duplicates

Check year of birth between coverscreen (cv_r and cv_h) and dn module, drop-offs and vignettes respectively, and possibly with the gross sample

Check gender CV/DN vs. drop-off/vignettes

Check for consistency of dates:

Check information on marital status:

Check respondent dummies

Check ch module against coverscreen

Check relation to coverscreen respondent
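Several of the checks listed above reduce to three small operations: finding duplicate ids, finding cases with no answers at all, and comparing one variable across two sources. The sketch below shows these in plain Python; the variable names (`respid`, `yrbirth`, `dn003`) and the example rows are placeholders, not confirmed SHARE names.

```python
from collections import Counter

def find_duplicates(rows, id_var="respid"):
    """Ids that occur more than once in a file."""
    counts = Counter(row[id_var] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)

def find_empty_cases(rows, id_var="respid"):
    """Ids of cases where every variable except the id is missing."""
    return [
        row[id_var]
        for row in rows
        if all(v in (None, "") for k, v in row.items() if k != id_var)
    ]

def find_mismatches(rows_a, rows_b, var_a, var_b, id_var="respid"):
    """Ids present in both files whose values for the two variables differ."""
    b = {row[id_var]: row[var_b] for row in rows_b}
    return [
        row[id_var]
        for row in rows_a
        if row[id_var] in b and row[var_a] != b[row[id_var]]
    ]

# Hypothetical coverscreen and DN extracts.
cv = [{"respid": "R1", "yrbirth": 1940}, {"respid": "R2", "yrbirth": 1935}]
dn = [{"respid": "R1", "dn003": 1940}, {"respid": "R2", "dn003": 1953}]
to_review = find_mismatches(cv, dn, "yrbirth", "dn003")
```

A mismatch only says that the two sources disagree, not which one is right, so in line with the general philosophy the listed cases are candidates for review or flagging rather than automatic correction.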

Page 11: SHARE Data  Cleaning

Interviewer remarks

Go through the remarks: a lot of them are not helpful, but some are very important (e.g. exchanged respondent, amounts apply to all family members, different time horizons etc.)

Categorize problems as much as possible

Write programs to correct data if possible

Flag cases where unsure

Collect information on questions that caused a lot of problems / didn’t work, for future waves

Page 12: SHARE Data  Cleaning

Open questions

Go through open questions and code answers into original values if possible

Priority list of variables: education, employment status

Page 13: SHARE Data  Cleaning

How to go on

Your experience is very much appreciated

Please send information on what you have done, what problems you found etc. to MEA

MEA will send out more information, results of our discussion now, ‘checking lists’, ‘common problems’, etc.

We should have another meeting/workshop, maybe in February, or we could have an extra meeting, e.g. in Mannheim