Access to sensitive data in the UK: a principles-based approach
Felix Ritchie
Overview
• Design principles
• Policies developed
• Conclusion: why do principles matter?
Part 1
Design principles
The framework principle
• Data access is driven from first principles
[Diagram elements: user needs; NSI options; legal environment; technology; solution]
…which is not this model; how about…
[Diagram elements: user needs; NSI principles; legal environment; technology; principles of access; solution]
The principles in use today at ONS
• The value of microdata is well-established
• There are risks in not making full use of data
• Public bodies should be supporting research
– for the public benefit
– for their own benefit
• Not every research project needs detailed data
– data released should be consistent with need
• Access to data should be driven by cost-benefit or cost-effectiveness assessments
The interaction between law and principle
• Up to 2002: various dubious practices
– £1 contracts
– Researchers using own equipment
– Poor records of microdata use
• 2002-2008
– New recording system for applications
– Review and rationalisation of legal gateways
– But still many hurdles to cross
• 2008 onwards
– Experience led to significant provision in law for research use
The legislative model
• Statistics and Registration Service Act 2007
– single law allowing, in principle, access to all govt data via ONS
– flexible Approved Researcher scheme
– ONS given a statutory duty to support research
– but not a free-for-all
• ONS has a duty to protect confidentiality
– even for Approved Researchers
– data release has to be consistent with need
→ the data model
The data model: what is it?
• ‘Spectrum’ of access points balancing
– value of data
– ease of use
– disclosure risk
• for a given level of confidentiality, maximise data use and convenience
• no ‘one-size-fits-all’ solution
– no absolute prohibitions
– trade-off is made explicit
– users determine appropriate level of access
Use of confidential data: the access spectrum
[Figure: access points arranged along a spectrum]
Type of access: None; VML (ONS sites); VML (Govt sites); Secure data service; Special licences; Licensed data archive; Internet
Anonymisation: Little → Complete
SDC of inputs: None → Complete
Restrictions on users: Many → None
SDC of outputs: Complete → None
Groupings shown: Distributed access; Distributed data
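One way to read the spectrum, given the principle that data released should be consistent with need, is as an ordered list from which the least restrictive sufficient option is chosen. The sketch below is illustrative only: the function name and the idea of passing in a set of "sufficient" access points are assumptions made for the example, not ONS procedure. Only the ordering of the access points is taken from the slide.

```python
from typing import Optional, Set

# Access points in the order shown on the slide, from most detailed data and
# most restrictions on users to least. The "None" (no access) end is omitted.
ACCESS_POINTS = [
    "VML (ONS sites)",
    "VML (Govt sites)",
    "Secure data service",
    "Special licences",
    "Licensed data archive",
    "Internet",
]


def choose_access_point(sufficient: Set[str]) -> Optional[str]:
    """Pick the least restrictive access point that still meets the need.

    `sufficient` is a hypothetical input: the access points whose data are
    detailed enough for the project in question.
    """
    # Walk from the least restrictive end of the spectrum towards the most.
    for point in reversed(ACCESS_POINTS):
        if point in sufficient:
            return point
    return None  # nothing on the spectrum meets the need


print(choose_access_point({"VML (ONS sites)", "Special licences"}))
```

A project that could be served either in the VML or under a special licence is directed to the special licence, the more convenient of the two; an empty set returns `None`.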
The data model: does it work?
• Options should cover most cases
– Can’t be perfect in every case
– Jumps between solutions should reflect data utility and patterns of research use
• Pretty efficient
– Fairly transparent
– Users balance their own costs/benefits
– Economies of scale delivering mass solutions
• How do we define/describe access points?
→ the security model
Part 2
Policies developed
The VML security model: Why was it needed?
• Tendency to focus on single risks, e.g. IT
• Poor understanding of complementarity of risk management measures
• New developments (e.g. output SDC, distributed access) not covered by current models
The VML security model: How does it work?
• valid statistical purpose
• trusted researchers
• anonymisation of data
• technical controls around data
• disclosure control of results
[Diagram: safe projects + safe people + safe data + safe setting + safe outputs = safe use; annotations: active researcher management; principle-based SDC]
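As a rough illustration of describing an access point along all five dimensions rather than a single risk, the sketch below records free-text controls per dimension and lists any dimension left undescribed. The class and method names are hypothetical, and the pairing of each bullet with a "safe" dimension in the comments is an assumption read off the slide. The sketch only makes gaps visible; it does not decide whether use is safe, since the measures are meant to complement one another.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AccessPointDescription:
    """Describe one access point along the five dimensions on the slide.

    Each field holds free text describing the control in place, or an empty
    string if none has been described. The class itself is hypothetical.
    """
    safe_projects: str = ""   # e.g. valid statistical purpose
    safe_people: str = ""     # e.g. trusted researchers
    safe_data: str = ""       # e.g. anonymisation of data
    safe_setting: str = ""    # e.g. technical controls around data
    safe_outputs: str = ""    # e.g. disclosure control of results

    def undescribed(self) -> List[str]:
        """Return the dimensions for which no control has been described."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A description that focuses on a single risk (IT) leaves four gaps visible.
it_only = AccessPointDescription(safe_setting="technical controls around data")
print(it_only.undescribed())
```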
Making people safe: researcher vs data management
• Traditional focus on ‘data management’
– Responsibility for security and operations rests with NSI
– Security based on ‘worst case’ scenarios
• Consequences of data management approach
– High cost of pre-release anonymisation
– Lack of communication: lack of mutual understanding of needs, priorities and working practices
– Culture of distrust: researchers do not take responsibility for data confidentiality; researchers do not understand, or see the need to understand, SDC
– Risk of researchers attempting to subvert data security
The VML model: Active Researcher Management
• Researchers will engage with NSI if given a chance
• Actively engage with researchers
– In explaining NSI goals
– In explaining disclosure control
– In understanding researcher needs and working practices
– In securing cooperation to minimise sensitive output
• Responsibility for data security shared between NSI and researcher (NSI always gets the final say)
• Certify researchers as part of the security model
ARM: A matter of perspective
Negative: researchers as risks
– “we’re doing this to protect the data” (from you)
– “you must limit your output to reduce the chance of disclosure”
Positive: researchers as collaborators
– “doing this allows us to supply you with more detailed data”
– “limit your output because we have finite resources; people who produce good output get their results back quicker”
Costs and benefits of ARM
• Better security
• More efficient management
• Easier change management
• There are costs:
– Initial training costs
– Ongoing communication costs
Statistical Disclosure Control: Why was a new model needed?
• No value in protecting data
– Protect only the results people want to take away
• But ‘traditional’ methods rely upon a finite set of outputs – not appropriate for research
Principles-based SDC
• SDC at the point of release
• trained NSI staff and researchers
• agreement on principles and purpose
• safe vs unsafe outputs, based on functional form
• No absolute restrictions
– Procedures for resolving differences crystal-clear
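The points above can be sketched as a default rule with an override. This is a minimal illustration under stated assumptions, not the VML procedure: the function name is hypothetical, and the two example output types and their classification are assumptions chosen for the example rather than taken from the slides.

```python
from typing import Optional

# Illustrative classification only: which output types count as "safe" or
# "unsafe" is an assumption here, not taken from the slides.
OUTPUT_CLASS = {
    "regression coefficients": "safe",
    "frequency table": "unsafe",
}


def release_decision(output_type: str,
                     reviewer_override: Optional[bool] = None) -> bool:
    """Return True if the output may be released.

    Safe types are released by default and unsafe (or unknown) types are held
    by default, but neither rule is absolute: after discussion with the
    researcher, the reviewer can override the default either way.
    """
    if reviewer_override is not None:
        return reviewer_override
    return OUTPUT_CLASS.get(output_type, "unsafe") == "safe"
```

The default depends only on the functional form of the output, and the override parameter stands in for the agreed procedure for resolving differences.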
Part 3
Concluding comments
What have we learnt?
• Design based on first principles…
– made design slow but robust
– helped identify failings in current approaches
– showed where new models were needed
– allowed the evaluation of new and different models
• But this is the wisdom of hindsight
– Development was a heuristic process
Next stages
• Translation of VML model in its entirety to academic partners
– First major test of the robustness of procedures
• Cost-benefit analysis of VML operations and review of strategic function
– Does the VML have a future?
• Models for international data sharing
– Can principles surmount the insurmountable?
Questions?
• Felix Ritchie
• Microdata Analysis and User Support
• [email protected]
• Virtual Microdata Laboratory (VML)
• Microdata Analysis and User Support
• [email protected]