Statistics Canada’s Real Time Remote Access Solution 2011 MSIS Meeting – Karen Doherty May 2011

Statistics Canada’s Real Time Remote Access Solution

2011 MSIS Meeting

Karen Doherty

May 2011

2011-05-23 Statistics Canada • Statistique Canada

Background

Access to, and analysis of, StatCan data is fundamental to the fulfilment of our mandate.

Access has traditionally been provided through:
• Aggregate data posted on the Agency’s website;
• Public use microdata files (PUMFs); and
• Special tabulations and customizations of aggregate data.

Currently 20 Research Data Centres (located in universities) provide access to confidential microdata files to researchers across the country

Background

StatCan is facing increasing demands for greater access to detailed microdata

Advances in IT offer opportunities for producing, disseminating, mining and analysing data

Researchers are frustrated with the impediments to data access imposed by StatCan

RTRA – The Business Solution

An on-line remote access facility that allows researchers to run data analyses on microdata sets

Data sets are stored in a central and secure location under the control and care of StatCan

Data Access Strategy

Type of User: Service & Cost
• Researcher: RDC, Remote Access, Research Contracts
• Data User: Custom data products, PUMF
• General user (student, reporter, etc.): Pre-packaged tables

Development of a Working System

Phase 1 – completed 2009
• Identification of business requirements focusing on components such as security, legal, and functionality

Phase 2 – completed 2010
• Pilot version – limited number of researchers, with restrictions on the types of requests allowed and the level of detail provided

Phase 3 – first production version – 2011
• Functionality will be expanded incrementally in order to evaluate security measures and mitigate risks

Solution Approach

• Examined lessons learned from other NSIs
• Determined key requirements of the model
• Adopted a model similar to the ABS model
• Built on the existing e-File Transfer (e-FT) facility to securely transfer files across the “air gap”
• Security issues addressed via four control points:
    • Secure dataset housing
    • Secure transit of datasets
    • Registered Users validation
    • Confidentiality rules for output
• Right balance of risk versus security

How RTRA Works

• Researcher submits SAS program

• Request passes through firewalls to secure server

• Upon vetting, tables are returned to researcher in specified format

• If the request does not comply, the submission is not run and the log is returned for adjustment

• All submissions are monitored and logged, and the logs are kept for auditing purposes
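
The flow above can be sketched in a few lines of Python. This is only an illustration of the accept/reject/log logic described on the slide, not RTRA's implementation (which the slides say is built on SAS and e-FT). The names `Gateway`, `Result`, `complies` and `run_and_vet` are hypothetical, and the compliance check and table production are passed in as placeholder hooks because the slides do not describe them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class Result:
    accepted: bool
    tables: List[str]  # vetted output tables (empty when the request is rejected)
    log: str           # returned to the researcher, e.g. for adjustment


@dataclass
class Gateway:
    # Hypothetical hooks: the real checks and SAS execution are not described.
    complies: Callable[[str], Optional[str]]  # returns a reason, or None if OK
    run_and_vet: Callable[[str], List[str]]   # runs the program, returns vetted tables
    audit_log: List[dict] = field(default_factory=list)

    def submit(self, user: str, program: str) -> Result:
        reason = self.complies(program)
        if reason is None:
            result = Result(True, self.run_and_vet(program), "OK")
        else:
            # Non-compliant requests are not run; only the log goes back.
            result = Result(False, [], f"Not run: {reason}")
        # Every submission is recorded for auditing, whether or not it ran.
        self.audit_log.append({
            "user": user,
            "time": datetime.now(timezone.utc).isoformat(),
            "accepted": result.accepted,
            "log": result.log,
        })
        return result
```

A caller would construct `Gateway` with its own check and execution functions, call `submit(user, program)`, and inspect `audit_log` afterwards; rejected submissions appear in the log alongside accepted ones.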

How RTRA Works

Pre-Scan of requests:
• Limits access to data files
• Ensures that the programming guidelines have been followed
• Uses an automated SAS process to control output

Post-Scan of outputs:
• Applies a controlled rounding algorithm to output tables
• Limits each submission to 10 tables
• Limits each researcher to 10 successful program submissions per day
• Supports two formats for output (.sas7bdat and HTML)
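
As a rough illustration of the post-scan rules, the Python sketch below enforces the two limits stated on the slide and rounds table counts before release. Several parts are assumptions: the slides do not give the rounding base (5 is used here), and the rounding shown is simple unbiased random rounding, not StatCan's controlled rounding algorithm, which additionally keeps rounded cells consistent with table totals. The names `PostScan` and `random_round` are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

MAX_TABLES_PER_SUBMISSION = 10  # from the slide
MAX_SUCCESSES_PER_DAY = 10      # from the slide
ROUNDING_BASE = 5               # assumption: the slides give no base


def random_round(count: int, base: int = ROUNDING_BASE, rng=random) -> int:
    """Unbiased random rounding of a count to a multiple of `base`.

    Stand-in only: controlled rounding also preserves additivity of totals.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    # Round up with probability remainder/base so the expected value is unchanged.
    return lower + base if rng.random() < remainder / base else lower


class PostScan:
    def __init__(self) -> None:
        # Successful submissions per (user, day).
        self._successes: Dict[Tuple[str, date], int] = defaultdict(int)

    def release(self, user: str, day: date,
                tables: List[List[int]]) -> List[List[int]]:
        if len(tables) > MAX_TABLES_PER_SUBMISSION:
            raise ValueError("too many tables in one submission")
        if self._successes[(user, day)] >= MAX_SUCCESSES_PER_DAY:
            raise ValueError("daily submission limit reached")
        # Only submissions that pass both checks count toward the daily limit.
        self._successes[(user, day)] += 1
        return [[random_round(c) for c in table] for table in tables]
```

Each table is represented here as a flat list of counts for brevity. A rejected submission raises `ValueError` and does not count toward the researcher's daily total, which matches the slide's wording of 10 successful submissions per day.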

Methodological Challenge

There is no absolute criterion for defining confidential data; however, for disclosure control, StatCan applies risk management practices to safeguard the confidentiality of microdata

Developed specific rules:
• Slightly masked microdata files
• Automatic disclosure rules for tabular outputs
• Pre-scan for inputs
• Post-scan for the outputs

The strategy involves trade-offs among the four potential methodologies; any decision involves managing risk and considering levels of security

Architecture of RTRA – Design

Technologies

• File Transfer – e-FT Services (COTS)
• Workflow Components – SAS
• User Authentication – SAS and StatCan Customer Relations Management System (CRMS)
• Archive – Folder
• Data Views – SAS
• Automated Workflow – SAS Sniffer
• Post-Scan – StatCan rounding tool RNDII.exe

User Interface

User creates a request

User Interface

User logs onto RTRA from StatCan website

User Interface

User submits the request

Resulting data to be delivered to an external FTP server via StatCan e-FT system

Future Direction

Adjust the service based on client feedback on requirements, and reach a wider audience of academics and the private sector

Bring the solution in sync with the new WAN infrastructure used by Research Data Centres

Increase availability of additional cross-sectional surveys to researchers

Develop vetting procedures for longitudinal surveys and administrative data

2011 Work Plan

Quality indicators for frequency indicators – June 2011

Means, medians, percentiles, ratios and proportions – August 2011

Investigate support for other programming languages such as SPSS – ongoing

Add Census information – November 2011

Work with the Generalized Tabulation System (G-Tab) development team to see if G-Tab can automate confidentiality by type of output – beginning in 2011

Conclusion

Starting to gain traction among Government of Canada researchers.

As the system evolves, Statistics Canada believes this tool will become a key component of the toolset available to researchers such as:
• policy researchers in government departments and agencies (federal, provincial, or municipal)
• academic researchers in Canadian universities
• any other researcher who agrees to the RTRA terms and conditions of use