Data Management by Shahzad Asghar Arain

Using EpiData and SPSS Shahzad Asghar Arain [email protected] Cell 92 312 514 9114 http://shahzadasghar.info

Upload: shahzad-asghar-arain

Post on 30-Oct-2014


Page 1: Data Management by Shahzad Asghar Arain

Using EpiData and SPSS

Shahzad Asghar Arain

[email protected] 92 312 514 9114

http://shahzadasghar.info

Page 2: Data Management by Shahzad Asghar Arain

References

Public domain (PDF) book on data management:

Bennett, et al. (2001). Data Management for Surveys and Trials. A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf

EpiData Association Website: http://www.epidata.dk/

Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm

Page 3: Data Management by Shahzad Asghar Arain

Data Management

Planning data needs
Data collection
Data entry and control
Validation and checking
Data cleaning and variable transformation
Data backup and storage
System documentation
Other

Page 4: Data Management by Shahzad Asghar Arain

Types of Data Base Management Systems (DBMSs)

Spreadsheets (e.g., Excel, SPSS Data Editor)
  Prone to error, data corruption, & mismanagement
  Lack data controls, limited programmability
  Suitable only for small and didactic projects
  Also good for last step data cleaning

Commercial DBMS programs (e.g., MySQL, Oracle, Access)
  Limited data control, good programmability
  Slow & expensive
  Powerful and widely available

Public domain programs (e.g., EpiData, Epi Info)
  Controlled data entry, good programmability
  Suitable for research and field use

Page 5: Data Management by Shahzad Asghar Arain

We will use two platforms:

EpiData
  controlled data entry
  data documentation
  export (“write”) data

SPSS
  import (“read”) data
  analysis
  reporting

Page 6: Data Management by Shahzad Asghar Arain

What is EpiData?

EpiData is a small computer program (about 1.2 MB) for simple or programmed data entry and data documentation
It is highly reliable
It runs on Windows computers
It runs on Macs and Linux only with emulator software
Interface
  pull-down menus
  work bar

Page 7: Data Management by Shahzad Asghar Arain

History of EpiInfo & EpiData

1976–1995: EpiInfo (DOS program) created by CDC (in wake of swine flu epidemic)
  Small, fast, reliable, 100,000+ users worldwide
1995–2000: DOS dies slow painful death
2000: CDC releases EpiInfo2000
  Based on Microsoft Jet (Access) data engine
  Large, slow, unreliable (resembled EpiInfo in name only)
2001: Loyal EpiInfo user group decides it needs real “EpiInfo for Windows”
  Creates open source public domain program
  Calls program “EpiData”

Page 8: Data Management by Shahzad Asghar Arain

Goal: Create & Maintain Error-Free Datasets

Two types of data errors
  Measurement error (i.e., information bias): discussed over the last couple of weeks
  Processing errors (errors that occur during data handling): discussed this week
Examples of data processing errors
  Transpositions (91 instead of 19)
  Copying errors (O instead of 0)
  Additional processing errors described on p. 18.2

Page 9: Data Management by Shahzad Asghar Arain

Avoiding Data Processing Errors

Manual checks (e.g., handwriting legibility)
Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
Double entry and validation*
  Operator 1 enters data
  Operator 2 enters data in separate file
  Check files for inconsistencies
Screening during analysis (e.g., look for outliers)

* covered in lab
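The double entry procedure can be sketched in Python as a field-by-field comparison of the two operators' files. This is only an illustration of the idea, not how EpiData's own validation works; the CSV layout, the `ID` key column, the sample values, and the function name `compare_entries` are all assumptions made for the example.

```python
import csv
import io


def compare_entries(file1, file2, key="ID"):
    """Return (record id, field, value 1, value 2) for every disagreement."""
    rows1 = {row[key]: row for row in csv.DictReader(file1)}
    rows2 = {row[key]: row for row in csv.DictReader(file2)}
    problems = []
    for rec_id in sorted(set(rows1) | set(rows2)):
        a, b = rows1.get(rec_id), rows2.get(rec_id)
        if a is None or b is None:
            # The record was entered by only one operator.
            problems.append((rec_id, "<record>", a is not None, b is not None))
            continue
        for field in a:
            if a[field] != b.get(field):
                problems.append((rec_id, field, a[field], b.get(field)))
    return problems


# Hypothetical entries: operator 2 transposed the age of record 1 (91 for 19).
operator1 = io.StringIO("ID,AGE,SEX\n1,19,1\n2,45,2\n")
operator2 = io.StringIO("ID,AGE,SEX\n1,91,1\n2,45,2\n")
mismatches = compare_entries(operator1, operator2)
for item in mismatches:
    print(item)
```

Each reported mismatch would then be resolved against the paper form rather than by trusting either operator.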

Page 10: Data Management by Shahzad Asghar Arain

Controlled Data Entry

Criteria for accepting & rejecting data
Types of data controls
  Range checks (e.g., restrict AGE to reasonable range)
  Value labels (e.g., SEX: 1 = male, 2 = female)
  Jumps (e.g., if “male,” jump to Q8)
  Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  Must enters
  etc.
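To make the data controls concrete, here is a minimal Python sketch of must-enter, range, value-label, and consistency checks applied to one record. In EpiData these rules live in the .CHK file; the code below is not CHK syntax, and the field names, the 0 to 120 age range, and the function name `check_record` are assumptions chosen for illustration.

```python
# Value labels taken from the example above: 1 = male, 2 = female.
SEX_LABELS = {1: "male", 2: "female"}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    errors = []

    # Must-enter fields: these may not be left blank.
    for field in ("AGE", "SEX"):
        if record.get(field) is None:
            errors.append(f"{field} must be entered")

    # Range check (assumed plausible range for this sketch).
    age = record.get("AGE")
    if age is not None and not 0 <= age <= 120:
        errors.append("AGE outside range 0-120")

    # Value labels: only coded values are legal.
    sex = record.get("SEX")
    if sex is not None and sex not in SEX_LABELS:
        errors.append("SEX must be 1 (male) or 2 (female)")

    # Consistency check across two fields.
    if sex == 1 and record.get("HYSTERECTOMY") == "yes":
        errors.append("hysterectomy = yes not allowed when sex = male")

    return errors


good = check_record({"AGE": 45, "SEX": 2, "HYSTERECTOMY": "yes"})
bad = check_record({"AGE": 190, "SEX": 1, "HYSTERECTOMY": "yes"})
print(good)
print(bad)
```

A jump would be handled in the same spirit: when SEX is 1, the entry form skips the questions that apply only to women, so the inconsistent value can never be typed in the first place.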

Page 11: Data Management by Shahzad Asghar Arain

Data Processing Steps

1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in .REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS

Page 12: Data Management by Shahzad Asghar Arain

Filenaming and File Management

c:\path\filename.ext
A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
Some systems are case sensitive (Unix)
Others are not (Windows)
Always be aware of
  Physical location (local, removable, network)
  Path (folders and subfolders)
  Filename (proper)
  Extension
Demo Windows Network Explorer: right-click Start Bar > Explore
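As a small illustration of the pieces of a file specification, the sketch below uses Python's standard `pathlib` module to split the generic example `c:\path\filename.ext` into location, path, filename, and extension, and to show the difference in case sensitivity. The file names `DEMO.REC` and `demo.rec` are made up for the comparison.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Split the generic example from the slide into its components.
path = PureWindowsPath(r"c:\path\filename.ext")
parts = {
    "drive": path.drive,          # physical location
    "folder": str(path.parent),   # path (folders and subfolders)
    "filename": path.stem,        # filename proper
    "extension": path.suffix,     # extension
}
for label, value in parts.items():
    print(label, value)

# Windows-style paths compare without regard to case; Unix-style paths do not.
same_on_windows = PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec")
same_on_unix = PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec")
print(same_on_windows, same_on_unix)
```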

Page 13: Data Management by Shahzad Asghar Arain

Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web Browser
.doc        Microsoft Word
.xls        Microsoft Excel

Page 14: Data Management by Shahzad Asghar Arain

Selected EpiData Variable Types

Variable Type         Examples
Text                  _ _
                      <A >
Numeric               ##
                      ##.#
Date                  <mm/dd/yyyy>
                      <dd/mm/yyyy>
Auto ID               <IDNUM>
Soundex (sanitized)   <S >
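A Soundex field stores a phonetic code instead of the name itself, which is why it counts as sanitized. The sketch below implements the standard American Soundex rules in Python to show what such a code looks like. It is an assumption that this matches EpiData's own Soundex coding in every detail; the function name `soundex` is hypothetical.

```python
# Letter groups of the standard American Soundex scheme.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


for name in ("Robert", "Rupert", "Snow"):
    print(name, soundex(name))
```

Similar-sounding names such as Robert and Rupert receive the same code, so records can still be matched without storing the original spelling.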

Page 15: Data Management by Shahzad Asghar Arain

EpiData Variable Names

Variable name is based on the text that occurs before the variable type indicator code
EpiData variable naming defaults vary depending on installation
Create variable names exactly as specified
To be safe, denote variable names in {curly brackets}
For example, to create a two-byte (two-digit) numeric variable called age, use the question:

What is your {age}? ##
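To illustrate how the curly-bracket convention ties a name to a field, the following Python sketch pulls the variable name and the numeric field width out of a questionnaire line. It is a simplified stand-in, not EpiData's actual parser: it handles only `{name}` plus a `#` field, and the function name `parse_question` is hypothetical.

```python
import re


def parse_question(line):
    """Extract the {name} and the numeric field (## or ##.#) from one line."""
    name = re.search(r"\{(\w+)\}", line)
    field = re.search(r"#+(?:\.#+)?", line)
    if name is None or field is None:
        return None
    spec = field.group(0)
    return {
        "name": name.group(1),
        "width": len(spec),                       # total characters in the field
        "decimals": len(spec.partition(".")[2]),  # digits after the point
    }


age = parse_question("What is your {age}? ##")
weight = parse_question("Body {weight} in kg ###.#")
print(age)
print(weight)
```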

Page 16: Data Management by Shahzad Asghar Arain

Demo / Work Along

Create QES file [demo.qes]
Convert QES to REC [demo.rec]
Create CHK file [demo.chk]
Create double entry file [demo2.rec]
Enter data
Validate data

Fname    Lname    DOB         SEX   DEATHAGE
John     Snow     3/15/1813   1     45
George   Orwell   6/25/1903   1     46

Page 17: Data Management by Shahzad Asghar Arain
Page 18: Data Management by Shahzad Asghar Arain

Codebooks

Contain info that helps users decipher data file content and structure
Include:
  Filename(s)
  File location(s)
  Variable names
  Coding schemes
  Units
  Anything else you think might be useful
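The codebook idea can be sketched in a few lines of Python: for each variable, record how many values were entered, which values occur, and any coding scheme. The records are the two rows from the demo table and the SEX labels come from the value-label example; the function name `build_codebook` and the output layout are assumptions, not the format produced by EpiData's codebook generator.

```python
def build_codebook(records, labels=None):
    """Summarise each variable: count of entries, observed values, labels."""
    labels = labels or {}
    codebook = {}
    for row in records:
        for name, value in row.items():
            entry = codebook.setdefault(
                name, {"n": 0, "values": set(), "labels": labels.get(name, {})}
            )
            if value is not None:
                entry["n"] += 1
                entry["values"].add(value)
    return codebook


records = [
    {"FNAME": "John", "LNAME": "Snow", "SEX": 1, "DEATHAGE": 45},
    {"FNAME": "George", "LNAME": "Orwell", "SEX": 1, "DEATHAGE": 46},
]
codebook = build_codebook(records, labels={"SEX": {1: "male", 2: "female"}})
for name, entry in codebook.items():
    print(name, entry["n"], sorted(entry["values"]), entry["labels"])
```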

Page 19: Data Management by Shahzad Asghar Arain

EpiData codebook generators

Page 20: Data Management by Shahzad Asghar Arain

File Structure Codebook

Full codebook contains descriptive statistics (demo)

Page 21: Data Management by Shahzad Asghar Arain

Notice descriptive statistics

Page 22: Data Management by Shahzad Asghar Arain

Conversion of Data File

Requires common intermediate file format
Examples of common intermediate files
  .TXT = plain text
  .DBF = dBase program
  .XLS = Excel
Steps
  Export .REC file to .TXT file
  Import .TXT file into SPSS
  Save permanent SAV file

Page 23: Data Management by Shahzad Asghar Arain
Page 24: Data Management by Shahzad Asghar Arain

Plain (“raw”) TXT data

plain ASCII data format
no column demarcations
no variable names
no labels
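Because a raw text file carries no column demarcations or variable names, whoever reads it must supply the layout separately. The Python sketch below shows the idea with a fixed-width layout; the column positions in `LAYOUT` and the function name `parse_line` are hypothetical and do not describe the layout of any file in this course.

```python
# Assumed layout: (variable name, start column, end column), zero-based.
LAYOUT = [("FNAME", 0, 10), ("LNAME", 10, 20), ("SEX", 20, 21), ("DEATHAGE", 21, 23)]


def parse_line(line, layout=LAYOUT):
    """Cut one raw line into named fields using the column positions."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Build two raw lines the way a fixed-width export would pad them.
raw_lines = [
    "John".ljust(10) + "Snow".ljust(10) + "1" + "45",
    "George".ljust(10) + "Orwell".ljust(10) + "1" + "46",
]
rows = [parse_line(line) for line in raw_lines]
for row in rows:
    print(row)
```

In the workflow described here, the SPSS syntax file plays this role: it tells SPSS where each variable sits in the raw text and what to call it.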

Page 25: Data Management by Shahzad Asghar Arain

tox-samp.txt   tox-samp.not

Page 26: Data Management by Shahzad Asghar Arain

SPSS Data Export / Import

[Diagram: REC, TXT (raw data), SPS (syntax), SAV]

Page 27: Data Management by Shahzad Asghar Arain

Lines beginning with * are comments (ignored by command interpreter)

Next set of commands show file location and structure via SPSS command syntax

Page 28: Data Management by Shahzad Asghar Arain

Labels being imported into SPSS

Delete * if you want this command to run

Page 29: Data Management by Shahzad Asghar Arain
Page 30: Data Management by Shahzad Asghar Arain
Page 31: Data Management by Shahzad Asghar Arain

Ethics of Data Keeping

Confidentiality (sanitized files, free of identifiers)
Beneficence
Equipoise
Informed consent (To what extent?)
Oversight (IRB)