research data workflow
DESCRIPTION
Research data workflow. Practice in Slovenian Social Science Data Archives. SERSCIDA WP4 – Workshop, Ljubljana, September 2013. Covers the Submission Information Package (SIP), Archival Information Package (AIP), and Dissemination Information Package (DIP).
TRANSCRIPT
![Page 1: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/1.jpg)
Research data workflow
Practice in Slovenian Social Science Data Archives
SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013
![Page 2: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/2.jpg)
www.serscida.eu
SIP, AIP, DIP
• Submission Information Package (SIP)
• Archival Information Package (AIP)
• Dissemination Information Package (DIP)
[Diagram of SIP, AIP and DIP; AIP labelled "Long-term preservation"]
![Page 3: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/3.jpg)
Recommended formats – input
| Type of material | Recommended format | Other acceptable formats |
|---|---|---|
| Questionnaire | Rich Text Format (*.rtf); structured metadata record of questionnaire (*.xml) by DDI or CAI programme (*.bmi) | other text formats (*.docx, *.txt, etc.); *.pdf or other graphical formats; printed version |
| Data material (data file) | SPSS (*.por, *.sav); plain text data, ASCII (*.txt) + structured text or mark-up file containing metadata information (variable names, labels, categories, question text) | other statistical packages; tables (*.xlsx etc.); databases |
| Textual material (study description, codebook, interviewer instructions, speech to respondents, copies of research reports) | Rich Text Format (*.rtf) | printed version; *.pdf or other graphical formats; other text formats (*.docx, *.txt, etc.) |
![Page 4: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/4.jpg)
Recommended formats – distribution
• STUDY DESCRIPTION: DDI structured XML
• DATA FILE: ASCII + XML, distributed in formats that can be exported from Nesstar
• OTHER TEXTUAL MATERIAL: PDF
![Page 5: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/5.jpg)
Recommended formats – archiving
• DATA FILE: ASCII (*.txt) + XML with DDI file and data description
![Page 6: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/6.jpg)
Recommended formats – archiving
• QUESTIONNAIRE, TEXT MATERIAL: original (any format) + distribution files (PDF)
• STUDY DESCRIPTION: DDI structured XML
![Page 7: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/7.jpg)
Licence Agreement
Free:
• to Share: to copy, distribute and transmit the work
• to Remix: to adapt the work
• to make commercial use of the work
Under the following conditions:
• Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Free:
• to Share: to copy, distribute and transmit the work
• to Remix: to adapt the work
Under the following conditions:
• Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
• Noncommercial: You may not use this work for commercial purposes.
![Page 8: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/8.jpg)
Naming files and versioning
File name format: StudyID_MaterialType_Language_Version_Subversion.FileFormat
Example: sutr1006_p1_sl_v1_r2.txt
URN: URN:SI:UNI-LJ-FDV:ADP:StudyID_MaterialType_Language_Version
Example: URN:SI:UNI-LJ-FDV:ADP:sutr1006_p1_sl_v1
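The naming convention above can be checked mechanically. The following Python sketch is only an illustration: it assumes that the name components contain no underscores or dots, and the helper names `parse_file_name` and `build_urn` are hypothetical, not part of any archive tooling.

```python
import re
from typing import NamedTuple

# Prefix taken from the URN example on the slide.
URN_PREFIX = "URN:SI:UNI-LJ-FDV:ADP"


class ArchiveFileName(NamedTuple):
    study_id: str
    material_type: str
    language: str
    version: str
    subversion: str
    file_format: str


# Assumption: no component contains "_" or ".".
_PATTERN = re.compile(
    r"^(?P<study_id>[^_.]+)_(?P<material_type>[^_.]+)_(?P<language>[^_.]+)"
    r"_(?P<version>[^_.]+)_(?P<subversion>[^_.]+)\.(?P<file_format>[^_.]+)$"
)


def parse_file_name(name: str) -> ArchiveFileName:
    """Split a file name into the six components of the convention."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return ArchiveFileName(**match.groupdict())


def build_urn(parts: ArchiveFileName) -> str:
    """Build the URN, which omits the subversion and the file format."""
    stem = "_".join(
        [parts.study_id, parts.material_type, parts.language, parts.version]
    )
    return f"{URN_PREFIX}:{stem}"


parts = parse_file_name("sutr1006_p1_sl_v1_r2.txt")
print(build_urn(parts))
```

Applied to the example file name, this reproduces the example URN shown on the slide.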
![Page 9: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/9.jpg)
Managing workflow
• Project tracking software
• Task for every study, with 29 subtasks covering:
- general part with email correspondence
- managing deposited materials
- preparing data file
- preparing study description
- publishing
http://nesstar2.adp.fdv.uni-lj.si:8080/browse/RAZ-4536
![Page 10: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/10.jpg)
Cleaning data and documentation
• Frequencies check
• Variable names, values
• Missing values
• Recode
• Weight
• Anonymisation
• Cumulative dataset
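As an illustration of the first and third checks in the list, the Python sketch below counts the values of one variable, flags codes that are absent from the codebook, and counts missing values. It is a minimal sketch with invented data; the function name `frequency_check` and the codes used are hypothetical, and the slides do not say which tool the archive uses for this step.

```python
from collections import Counter


def frequency_check(values, documented_codes, missing_codes=()):
    """Return (frequencies, undocumented codes, number of missing values).

    values           -- observed values of one variable (None = system missing)
    documented_codes -- valid codes listed in the codebook
    missing_codes    -- codes declared as user-missing (e.g. 9 = "no answer")
    """
    counts = Counter(values)
    undocumented = sorted(
        code
        for code in counts
        if code is not None
        and code not in documented_codes
        and code not in missing_codes
    )
    n_missing = sum(
        n for code, n in counts.items() if code is None or code in missing_codes
    )
    return counts, undocumented, n_missing


# Invented example: codes 1-3 are documented, 9 is declared missing.
answers = [1, 2, 2, 3, 9, 7, None]
counts, undocumented, n_missing = frequency_check(answers, {1, 2, 3}, {9})
print(counts[2], undocumented, n_missing)
```

Here the value 7 would be reported as undocumented, which is the kind of discrepancy that usually has to be resolved with the depositor.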
![Page 11: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/11.jpg)
Anonymisation
Sebastian Kočar
Expert Assistant in Social Science Data Archives
SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013
![Page 12: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/12.jpg)
Anonymisation in the archives - types
• basic anonymisation, mostly of academic research datasets
• anonymisation of Eurostat files
• anonymisation of official statistics Public Use Files (PUF)
![Page 13: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/13.jpg)
Basic anonymisation of distributed microdata in archives
• deleting variables: direct identifiers (telephone numbers, addresses, etc.) are removed.
• recoding indirect identifiers (while still allowing serious researchers to receive datasets with the indirect identifiers not recoded). Recoding includes removing values and bracketing, i.e. combining the categories of a variable.
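The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the variable names (`telephone`, `address`, `age`) and the age brackets are invented for the example and are not the archive's actual rules.

```python
# Hypothetical names of direct identifiers to be deleted.
DIRECT_IDENTIFIERS = {"telephone", "address"}

# Hypothetical brackets: (lower bound, upper bound or None for open-ended, label).
AGE_BRACKETS = [
    (0, 17, "0-17"),
    (18, 34, "18-34"),
    (35, 54, "35-54"),
    (55, None, "55+"),
]


def bracket_age(age: int) -> str:
    """Recode an exact age into a broader category (bracketing)."""
    for low, high, label in AGE_BRACKETS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age out of range: {age}")


def anonymise_record(record: dict) -> dict:
    """Delete direct identifiers and bracket the indirect identifier 'age'."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age"] = bracket_age(cleaned["age"])
    return cleaned


respondent = {"telephone": "000 000", "address": "Example St. 1", "age": 42, "q1": 3}
print(anonymise_record(respondent))
```

The survey answer `q1` passes through unchanged; only the identifying variables are removed or coarsened.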
![Page 14: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/14.jpg)
Anonymisation of Eurostat files (the case of the Eurostat Labour Force Survey)
• deleting variables: indirect identifiers and unneeded variables are removed (municipality, wave no., etc.)
• bracketing: age, marital status, education, years of residence, age of establishment of residence, duration of search for employment, professional status, country & nationality
• classification: income amounts are not given; respondents are divided into classes based on their income
• aggregation: economic activity and occupation values are aggregated at 1-digit level
• top-coding: restricting the upper range of a variable (no. of hours worked)
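Three of the techniques listed above (classification, aggregation and top-coding) are simple enough to sketch in a few lines of Python. The cap of 80 hours, the income class boundaries and the sample occupation code are invented for illustration; they are not the Eurostat criteria.

```python
from bisect import bisect_right


def top_code(value: float, cap: float) -> float:
    """Top-coding: values above the cap are replaced by the cap."""
    return min(value, cap)


def income_class(income: float, upper_bounds: list) -> int:
    """Classification: return the 1-based income class.

    upper_bounds must be sorted; an income equal to a bound
    falls into the next class.
    """
    return bisect_right(upper_bounds, income) + 1


def aggregate_to_one_digit(code: str) -> str:
    """Aggregation: keep only the first digit of a hierarchical code."""
    return code[0]


# Hypothetical thresholds, for illustration only.
HOURS_CAP = 80
INCOME_BOUNDS = [500, 1000, 2000]

print(top_code(95, HOURS_CAP))
print(income_class(750, INCOME_BOUNDS))
print(aggregate_to_one_digit("2411"))
```

Aggregating to the first digit works here because hierarchical classifications of occupation and economic activity encode the broadest group in the leading position of the code.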
![Page 15: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/15.jpg)
Anonymisation of official statistics Public Use Files for distribution in archives
• anonymisation software: μ-Argus, R (sdcMicro, bethel, sampling packages), Cornell Anonymization Toolkit, synthetic data generators
• anonymisation techniques: data reduction techniques (global recoding, local suppression, etc.), data perturbation techniques (micro-aggregation, PRAM, etc.), sampling, generating synthetic microdata
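To make one of the perturbation techniques concrete, the Python sketch below shows a simple univariate micro-aggregation with fixed group size: values are sorted, split into groups of at least k records, and each value is replaced by its group mean. This is a deliberately simplified variant written for illustration; it is not the algorithm implemented in μ-Argus or sdcMicro, and the function name `microaggregate` is hypothetical.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbours.

    Groups are formed from the sorted values; a final group with fewer
    than k members is merged into the preceding group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the records, ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


# Invented example values.
incomes = [10, 50, 12, 48, 11, 52, 49]
print(microaggregate(incomes, 3))
```

The total of the variable is preserved, while no published value corresponds to fewer than k respondents.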
![Page 16: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/16.jpg)
Anonymisation – a case study
• PUF prepared in cooperation with the SORS Sector for General Methodology and Standards
• anonymisation procedure following the Eurostat LFS anonymisation criteria (in SPSS)
• calculating individual and global risk (R – sdcMicro)
• calculating strata allocation, based on individual risk averages by strata (R – bethel)
• stratified sampling, based on the inclusion probability of a given case (R – sampling – samplecube)
• recalculating sample weights
• LFS 2010 PUF distributed in August 2013
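The risk and weighting steps of the case study were carried out with R packages; the Python sketch below only illustrates the underlying ideas in a much simplified form. It assumes a naive risk measure (one divided by the number of records sharing the same combination of key variables) rather than the estimators used by sdcMicro, and it assumes that a weight is recalculated after subsampling by dividing it by the inclusion probability. All function names, variable names and data are hypothetical.

```python
from collections import Counter


def individual_risks(records, key_vars):
    """Naive individual risk: 1 / number of records with the same key values."""
    keys = [tuple(record[var] for var in key_vars) for record in records]
    frequencies = Counter(keys)
    return [1 / frequencies[key] for key in keys]


def global_risk(risks):
    """Naive global risk: the average of the individual risks."""
    return sum(risks) / len(risks)


def recalculate_weight(weight, inclusion_probability):
    """Adjust a sample weight for a record kept with the given probability."""
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return weight / inclusion_probability


# Invented records with two key variables.
records = [
    {"sex": "f", "age": "18-34"},
    {"sex": "f", "age": "18-34"},
    {"sex": "m", "age": "55+"},
    {"sex": "m", "age": "18-34"},
]
risks = individual_risks(records, ["sex", "age"])
print(risks, global_risk(risks))
print(recalculate_weight(120.0, 0.5))
```

Records that are unique on the key variables receive the highest risk, which is why averaging risks by stratum can guide how the sample is allocated across strata.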
![Page 17: Research data workflow](https://reader035.vdocuments.mx/reader035/viewer/2022062323/56816359550346895dd414f6/html5/thumbnails/17.jpg)