dealing with confidential research information - anonymisation techniques and other measures to...
TRANSCRIPT
![Page 1: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/1.jpg)
Dealing with confidential research information
-Anonymisation techniques and other
measures to enable using and sharing research data
Data Management and Sharing workshopEdinburgh, 17 June 2008
![Page 2: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/2.jpg)
Using and sharing confidential
research data…obtained from people as participants
Requires a combination of:• discussing consent and confidentiality with
participants / respondents (dialogue)• anomymisation of data• user access restrictions
researchers only; use licence with confidentiality agreement; data unavailable for certain time period
What is required depends on:• nature of research• planned data uses• is study specific
![Page 3: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/3.jpg)
Identity disclosure
A person’s identity can be disclosed through: • direct identifiers
name, address, postcode, telephone number, voice, picture
usually NOT essential research information (administrative)
• indirect identifiers – possible disclosure in combination with other information occupation, geography, unique or exceptional values
(outliers) or characteristics
![Page 4: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/4.jpg)
Why anonymise data?
• Ethical reasons – protect identity (sensitive, illegal, confidential
info)– disguise research location
• Commercial reasons
• Legal reasons – protect personal data (DPA)
![Page 5: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/5.jpg)
Essential points
• Never disclose personal data (unless specific consent)
• Reasonable / appropriate level of anonymity
• Maintain maximum meaningful info
• Where possible replace rather than remove
• Identifying info may provide context, do not over-anonymise
• Re-users of data have the same legal and ethical obligation to NOT disclose confidential info as primary users
![Page 6: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/6.jpg)
Anonymising quantitative data
• Remove direct identifiersnames, address, institution
• Reduce the variable precision through aggregationpostcode sector vs full postcode, birth year vs date of birth, occupational categories
• Generalise meaning of text occupational expertise
• Restrict upper / lower ranges to hide outliersincome, age
![Page 7: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/7.jpg)
Relational data
Extra care needed - combinations of related datasets or a dataset in combination with publicly available info can disclose information e.g. businesses studied are mapped in publication
![Page 8: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/8.jpg)
Geo-referenced data
Point data may reveal position of individuals, organisations, businesses, etc.
• Remove point coordinates – loss of all geographical info
• Reduce precision - replace point coordinates with line or polygon of larger areakm2 area, postcode district, ward, road
• Reduce precision - replace point coordinate with meaningful variable typifying the geographical position catchment area, poverty index, population density
But: geo-referenced data are valuable for re-use. Maintaining geo-references and imposing access restrictions is better
![Page 9: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/9.jpg)
Anonymising qualitative data
• Plan or apply editing at startanonymise during transcription, highlight sensitive info for
later anonymising• Except: longitudinal studies - anonymise when data
collection complete (linkages)• Avoid blanking out information• Use pseudonyms or codes• Removing or aggregating identifiers in text can distort
data, make them unusable and unreliable or misleading - avoid over-anonymising
• Consistency within research team and throughout project
• [bracket] replacements for clarity • XML mark-up can be used for anonymisation (TEI tag)
<seg type="anonymised">word to be anonymised</seg>
![Page 10: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/10.jpg)
Tips
• Always consider anonymisation together with consent agreements and user access restrictions
• Regulating / restricting user access may offer a better solution than anonymising
• Remove, mask, change identifiers
• Maintain maximum information
• Create log of all anonymisations
• Keep copy of original data
• Plan at start of research, not at the end
Example: Anonymisation log interview transcripts
Interview / Page Original Changed toInt1p1 Spain European p1 E-print Ltd Printing p2 20th J une June
p2 Amy MoiraInt2p1 Francis my friend
![Page 11: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/11.jpg)
Sources
• Clark, A. 2006. Anonymising research data. NCRM Working Paper Series 7/06. ESRC National Centre for Research Methods.[http://www.ncrm.ac.uk/research/outputs/publications/WorkingPapers/2006/0706_anonymising_research_data.pdf]
• Economic and Social Data Services (ESDS) guidelines, UK Data Archive
• Inter-University Consortium for Political and Social Research (ICPSR). 2005. Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle. 3rd Edition. ICPSR, Ann Arbor.
• Timescapes meetings & discussions
![Page 12: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management](https://reader036.vdocuments.mx/reader036/viewer/2022081518/5515ec9455034638038b5110/html5/thumbnails/12.jpg)
Exercises / scenarios
• Anonymising qualitative data: – Foot & mouth study Cumbria 2001-2003 (5407)– Conflicts and violence in prison (4596)
• Anonymising quantitative data: Labour Force Survey• Confidential relational and geo-referenced data: British
Household Panel Survey