Statistics Canada’s Real Time Remote Access Solution
2011 MSIS Meeting
Karen Doherty
May 2011
2011-05-23 Statistics Canada • Statistique Canada
Background
Access to, and analysis of, StatCan data is fundamental to the fulfilment of our mandate.
Access has traditionally been provided through:
• Aggregate data posted on the Agency’s website;
• Public use microdata files (PUMFs); and
• Special and custom tabulations of aggregate data.
Currently, 20 Research Data Centres (RDCs), located in universities, provide researchers across the country with access to confidential microdata files.
Background
StatCan is facing increasing demands for greater access to detailed microdata
Advances in IT offer opportunities for producing, disseminating, mining and analysing data
Researchers are frustrated with the impediments to data access imposed by StatCan
RTRA – The Business Solution
An on-line remote access facility that allows researchers to run data analyses on microdata sets
Data sets are stored in a central and secure location under the control and care of StatCan
Data Access Strategy
Type of user: service & cost
• Researcher: RDC, Remote Access, Research Contracts
• Data User: Custom data products, PUMF
• General user (student, reporter, etc.): Pre-packaged tables
Development of a Working System
Phase 1 – completed 2009
• Identification of business requirements, focusing on components such as security, legal, and functionality
Phase 2 – completed 2010
• Pilot version, with a limited number of researchers and restrictions on the types of requests allowed and the level of detail provided
Phase 3 – first production version, 2011
• Functionality will be expanded incrementally in order to evaluate security measures and mitigate risks
Solution Approach
• Examined lessons learned from other national statistical institutes (NSIs)
• Determined the key requirements of the model
• Adopted a model similar to the Australian Bureau of Statistics (ABS) model
• Built on the existing e-File Transfer (e-FT) facility to securely transfer files across the “air gap”
• Addressed security issues via four control points:
   • Secure dataset housing
   • Secure transit of datasets
   • Registered user validation
   • Confidentiality rules for output
• Sought the right balance between risk and security
How RTRA Works
• Researcher submits a SAS program
• Request passes through firewalls to a secure server
• Upon vetting, tables are returned to the researcher in the specified format
• If a request does not comply, the submission is not run and the log is returned to the researcher for adjustment
• All submissions are monitored and logged; logs are kept for auditing purposes
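The submission flow above can be sketched in Python to show where rejection, execution and audit logging fit. This is a minimal illustration under stated assumptions, not StatCan's implementation: `Gateway`, `pre_scan`, `run_and_vet` and the in-memory audit log are hypothetical stand-ins for the SAS-based components, and the daily cap reuses the limit of 10 successful submissions per researcher per day listed under the post-scan rules.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Callable, List, Optional

DAILY_LIMIT = 10  # successful submissions per researcher per day


@dataclass
class Result:
    ran: bool      # True if the program was executed
    tables: list   # vetted output tables (empty if not run)
    log: str       # returned to the researcher either way


@dataclass
class Gateway:
    # Hypothetical hooks: pre_scan returns a list of rule violations;
    # run_and_vet executes the program and returns vetted tables.
    pre_scan: Callable[[str], List[str]]
    run_and_vet: Callable[[str], list]
    audit_log: list = field(default_factory=list)
    successes: Counter = field(default_factory=Counter)

    def submit(self, researcher: str, program: str,
               today: Optional[date] = None) -> Result:
        today = today or date.today()
        violations = list(self.pre_scan(program))
        if self.successes[(researcher, today)] >= DAILY_LIMIT:
            violations.append(
                f"daily limit of {DAILY_LIMIT} successful submissions reached")

        if violations:
            # Non-compliant: do not run; return the log for adjustment.
            result = Result(ran=False, tables=[], log="\n".join(violations))
        else:
            result = Result(ran=True, tables=self.run_and_vet(program), log="OK")
            self.successes[(researcher, today)] += 1

        # Every submission is logged for auditing, whether or not it ran.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "researcher": researcher,
            "ran": result.ran,
            "log": result.log,
        })
        return result
```

Keeping the log append outside the run/reject branch is what guarantees that rejected submissions are audited as well as successful ones; only successful runs count toward the daily cap.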
How RTRA Works
Pre-Scan of requests:
• Limits access to data files
• Ensures that the programming guidelines have been followed
• Uses an automated SAS process to control output
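A pre-scan of this kind can be approximated with pattern checks on the submitted program text. The sketch below is hypothetical: RTRA performs this step with an automated SAS process, and the set of permitted procedures and the dataset name `rtra.lfs2010` are invented for illustration. A production scanner would need a real SAS parser rather than regular expressions.

```python
import re
from typing import List

# Hypothetical guidelines; the actual RTRA rules are not listed in the slides.
ALLOWED_PROCS = {"freq", "tabulate", "means"}
ALLOWED_DATASETS = {"rtra.lfs2010"}  # hypothetical dataset name

PROC_RE = re.compile(r"\bproc\s+(\w+)", re.IGNORECASE)         # PROC <name>
DATA_RE = re.compile(r"\bdata\s*=\s*([\w.]+)", re.IGNORECASE)  # data=<lib.ds>


def pre_scan(program: str) -> List[str]:
    """Return every guideline violation found; an empty list means the program may run."""
    problems = []
    # Only whitelisted procedures may be used.
    for proc in PROC_RE.findall(program):
        if proc.lower() not in ALLOWED_PROCS:
            problems.append(f"PROC {proc.upper()} is not permitted")
    # Only datasets exposed to RTRA users may be referenced.
    for ds in DATA_RE.findall(program):
        if ds.lower() not in ALLOWED_DATASETS:
            problems.append(f"dataset {ds} is not accessible")
    return problems
```

Returning every violation rather than stopping at the first lets the researcher correct the whole program in one pass, which matters when successful submissions are rationed.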
Post-Scan of outputs:
• Applies a controlled rounding algorithm to output tables
• Limits each submission to 10 tables
• Limits each researcher to 10 successful program submissions per day
• Supports two output formats (.sas7bdat and HTML)
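The slides do not describe how the rounding tool works. As a hedged illustration only, the sketch below applies controlled rounding to one-dimensional tables (a list of non-negative counts plus their total) and enforces the 10-table limit. The base of 5, the rule of rounding up the cells with the largest remainders, and the function names are assumptions; controlled rounding of two-way and higher tables is a harder problem, commonly formulated as a network-flow or linear-programming problem.

```python
from typing import List, Tuple

MAX_TABLES = 10  # per-submission limit listed above
BASE = 5         # hypothetical rounding base


def controlled_round_1d(cells: List[int], base: int = BASE) -> Tuple[List[int], int]:
    """Round each non-negative count to an adjacent multiple of `base` so that
    the rounded cells add up to a rounded total that is itself adjacent to the
    true total (one-dimensional controlled rounding)."""
    floors = [c - c % base for c in cells]
    residues = [c % base for c in cells]
    # Rounding k cells up gives a total of sum(floors) + k*base; choosing k as
    # floor or ceil of sum(residues)/base keeps that total adjacent to the true one,
    # and k never exceeds the number of cells with a non-zero residue.
    k = round(sum(residues) / base)
    # Round up the k cells with the largest residues (ties keep table order).
    order = sorted(range(len(cells)), key=lambda i: residues[i], reverse=True)
    rounded = list(floors)
    for i in order[:k]:
        rounded[i] += base
    return rounded, sum(rounded)


def post_scan(tables: List[List[int]]) -> List[Tuple[List[int], int]]:
    """Reject submissions over the table limit, then round every table."""
    if len(tables) > MAX_TABLES:
        raise ValueError(f"{len(tables)} tables requested; the limit is {MAX_TABLES}")
    return [controlled_round_1d(t) for t in tables]
```

For example, the counts 12, 7, 3 and 9 (total 31) round to 10, 5, 5 and 10 with a published total of 30. The additivity constraint keeps published tables internally consistent: the rounded cells always sum to the rounded total shown.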
Methodological Challenge
There is no absolute criterion for defining confidential data; in terms of disclosure control, however, StatCan applies risk management practices to safeguard the confidentiality of microdata
Developed specific rules:
• Slightly masked microdata files
• Automatic disclosure rules for tabular outputs
• Pre-scan for inputs
• Post-scan for outputs
The strategy involves trade-offs among the four potential methodologies; any decision involves managing risk and considering the level of security required
Architecture of RTRA – Design
Technologies
• File Transfer – e-FT Services (COTS)
• Workflow Components – SAS
• User Authentication – SAS and StatCan Customer Relations Management System (CRMS)
• Archive – Folder
• Data Views – SAS
• Automated Workflow – SAS Sniffer
• Post-Scan – StatCan rounding tool RNDII.exe
User Interface
User creates a request
User Interface
User logs onto RTRA from StatCan website
User Interface
User submits the request
Resulting data are delivered to an external FTP server via the StatCan e-FT system
Future Direction
Adjust the service based on client feedback on requirements, and reach a wider audience of academics and the private sector
Bring the solution in sync with the new WAN infrastructure used by the Research Data Centres
Make additional cross-sectional surveys available to researchers
Develop vetting procedures for longitudinal surveys and administrative data
2011 Work Plan
Quality indicators for frequency indicators – June 2011
Means, medians, percentiles, ratios and proportions – August 2011
Investigate support for other programming languages such as SPSS – ongoing
Add Census information – November 2011
Work with the Generalized Tabulation System (G-Tab) development team to determine whether G-Tab can automate confidentiality by type of output – beginning in 2011
Conclusion
RTRA is starting to gain traction among Government of Canada researchers.
As the system evolves, Statistics Canada believes this tool will become a key component of the toolset available to researchers such as:
• policy researchers in government departments and agencies (federal, provincial, or municipal)
• academic researchers in Canadian universities
• any other researcher who agrees to the RTRA terms and conditions of use