Integrated Approach Processing
Marie Brodeur, Director General, Industry Statistics Branch, Statistics Canada
St. Lucia, February 2014
SNA seminar in the Caribbean
Why A Centralized Process?
• Best practices
• Standardization of processes
• Cross-survey comparisons
• Enterprise-centric processing / coherence analysis
• Efficient use of resources
• Transportable knowledge across survey programs
UES Post-Collection Processing
[Diagram components: Records from Collection; Pre-Grooming; Edit & Imputation; Allocation / Estimation; Subject Matter Review & Correction Tool; Data Service Center; Tax Data; Business Register]
Collection
• Pre-contact (Dec-Jan)
  – Mostly for Business Register (BR) births; verification of contact information (name, address, …)
  – By phone (in a few cases, a letter or a fact sheet is sent)
• Mail-out of questionnaires (Jan-March)
  – 2 or 3 mail-out dates
• Follow-up in case of non-response for some units (begins about a month after mail-out)
  – Phone call, remail or fax
• Mail-back of questionnaires
• Verification of received questionnaires / edits
  – Is the questionnaire complete, or are some key variables missing? (Edit follow-up by phone in some cases)
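The non-response follow-up rule can be sketched as a simple selection step. This is a minimal illustration only: the 30-day offset stands in for "about a month", and the record layout (`unit_id`, `mailout_date`, `received`) is a hypothetical assumption, not a Statistics Canada data structure.

```python
from datetime import date, timedelta

# Assumption: "about a month after mail-out" is modelled as 30 days.
FOLLOW_UP_DELAY = timedelta(days=30)


def units_due_for_follow_up(units, today):
    """Return IDs of non-responding units whose follow-up window has opened.

    `units` is a list of dicts with hypothetical keys:
    'unit_id', 'mailout_date' (datetime.date) and 'received' (bool).
    """
    due = []
    for unit in units:
        if unit["received"]:
            continue  # questionnaire already mailed back; no follow-up needed
        if today >= unit["mailout_date"] + FOLLOW_UP_DELAY:
            due.append(unit["unit_id"])
    return due


units = [
    {"unit_id": "A", "mailout_date": date(2014, 1, 15), "received": True},
    {"unit_id": "B", "mailout_date": date(2014, 1, 15), "received": False},
    {"unit_id": "C", "mailout_date": date(2014, 3, 1), "received": False},
]
print(units_due_for_follow_up(units, date(2014, 2, 20)))  # ['B']
```

Unit C is not selected yet because its own mail-out date is later, which is why the window is computed per unit rather than from a single campaign date.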
Centralized Collection
[Diagram components: Pre-Contact; Mailout; Receipt (75% target); Delinquent Follow-Up; Capture / Imaging; Edit / Verification; Prioritize; "Clean" Records]
Use Of Tax Data
• Validation (comparison)
  – Verify dubious collected data against the equivalent tax data record
• Imputation
  – One of the methods used for non-response
• Estimation
  – Direct data replacement
  – Calibration estimates
• Update the Business Register
• Allocation of survey data (using tax revenues, salaries and expenses)
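Validation against tax data can be illustrated with a relative-difference check. The slide does not say how "dubious" is defined, so the 25% tolerance and the record layout below are hypothetical assumptions chosen only to make the idea concrete.

```python
def flag_dubious(survey_value, tax_value, tolerance=0.25):
    """Flag a collected value that differs from its tax-data counterpart
    by more than `tolerance`, measured relative to the tax value.

    The tolerance is an illustrative assumption, not a documented rule.
    """
    if tax_value == 0:
        # No meaningful relative comparison; flag any nonzero survey value.
        return survey_value != 0
    return abs(survey_value - tax_value) / abs(tax_value) > tolerance


# Hypothetical (survey revenue, tax revenue) pairs.
records = {
    "unit_1": (1_050_000, 1_000_000),  # 5% difference
    "unit_2": (400_000, 1_000_000),    # 60% difference
}
flagged = [uid for uid, (s, t) in records.items() if flag_dubious(s, t)]
print(flagged)  # ['unit_2']
```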
Centralized Processing Systems And Databases
• Develop centralized systems
  – Move away from stand-alone systems
  – Single point of access for security
• Integrated Questionnaire
• Metadata System
• Edit and Imputation
• Allocation and Estimation
• Data Warehouse
Enterprise Portfolio Managers
• Top 350 enterprises in Canada
• Status
  – Platinum, Gold, Silver, Bronze
• Personal visits
• Enterprise profiling
• Coordination of mail-out and collection
• Enterprise / establishment coherence
• Holistic response management
  – Strategic Response Unit
  – Escalation process / Statistics Act
What Is E & I?
Editing
• Verify that parts add up to totals
• Ensure that there are no missing values where parts add up to a total
• Ensure consistency between related variables
Imputation
• Change values in fields that fail edit rules so that the resulting data satisfy all edit rules (in practice, reported data will rarely be changed)
• Impute for missing or partially reported data
• Impute entire records in the case of total non-response
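The three kinds of editing checks listed above (missing values, parts adding up to a total, consistency between related variables) can be sketched for a single record. The field names and the example consistency rule are hypothetical; real edit rules are specified by subject matter.

```python
def edit_failures(record, parts, total, consistency_rules=(), tolerance=0):
    """Return the names of the edits that `record` fails.

    - "missing": a part or the total is None (other edits are then skipped,
      since they cannot be evaluated)
    - "balance": the parts do not add up to the total within `tolerance`
    - consistency rules: (name, predicate) pairs that must return True
    """
    fields = list(parts) + [total]
    if any(record.get(f) is None for f in fields):
        return ["missing"]
    failures = []
    if abs(sum(record[p] for p in parts) - record[total]) > tolerance:
        failures.append("balance")
    for name, rule in consistency_rules:
        if not rule(record):
            failures.append(name)
    return failures


# Hypothetical rule: a unit reporting salaries must report employees.
rules = [("salaries_need_employees",
          lambda r: r["salaries"] == 0 or r["employees"] > 0)]
parts = ["salaries", "other_expenses"]

ok = {"salaries": 60, "other_expenses": 40, "total_expenses": 100, "employees": 3}
unbalanced = {"salaries": 60, "other_expenses": 30, "total_expenses": 100, "employees": 3}
inconsistent = {"salaries": 60, "other_expenses": 40, "total_expenses": 100, "employees": 0}
gap = {"salaries": 60, "other_expenses": None, "total_expenses": 100, "employees": 3}

print(edit_failures(ok, parts, "total_expenses", rules))            # []
print(edit_failures(unbalanced, parts, "total_expenses", rules))    # ['balance']
print(edit_failures(inconsistent, parts, "total_expenses", rules))  # ['salaries_need_employees']
print(edit_failures(gap, parts, "total_expenses", rules))           # ['missing']
```

Records that fail any edit are the candidates for imputation; the example rule assumes the `employees` field is always present.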
Why Is E&I Necessary?
• To produce a complete and consistent data file that accounts for all sampled units: units that did not respond to the survey must be imputed, as must units that did not provide a complete response
• To correct erroneous responses
E&I Terminology
Data Group
• Groupings of records, defined by subject matter (SM), that will be kept together for imputation purposes
• These groupings are based on multiple dimensions: industry (NAICS) and geography (province)
The data groups used for a specific survey will depend on:
• the initial sample design (number of units sampled and the level of stratification used)
• the number of records that respond to the survey (a minimum of 5 or 10 responding records is required)
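Forming data groups can be sketched as grouping records by industry and province, then checking each group against the minimum number of respondents. The record keys are hypothetical, and because the slide does not say how undersized groups are handled, the sketch only reports them.

```python
from collections import defaultdict

MIN_RESPONDENTS = 5  # the slide cites a minimum of 5 or 10 records


def build_data_groups(records, min_respondents=MIN_RESPONDENTS):
    """Group records by (NAICS, province) and split the groups into those
    with enough respondents for imputation and those without.

    Each record is a dict with hypothetical keys 'naics', 'province'
    and 'responded' (bool). Returns two dicts mapping group key -> number
    of respondents.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["naics"], rec["province"])].append(rec)
    usable, too_small = {}, {}
    for key, members in groups.items():
        n_resp = sum(1 for r in members if r["responded"])
        target = usable if n_resp >= min_respondents else too_small
        target[key] = n_resp
    return usable, too_small


records = (
    [{"naics": "311", "province": "ON", "responded": True} for _ in range(6)]
    + [{"naics": "311", "province": "QC", "responded": True} for _ in range(2)]
    + [{"naics": "311", "province": "QC", "responded": False} for _ in range(3)]
)
usable, too_small = build_data_groups(records)
print(usable)     # {('311', 'ON'): 6}
print(too_small)  # {('311', 'QC'): 2}
```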
BANFF E & I System
• Impute for missing key variables as specified by subject matter (e.g., total revenue, total expenses)
• Impute for other missing variables:
  – Apply historical trend
  – Apply current-year trend
  – Use a donor (for partial imputation)
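A common way to "use a donor" is nearest-neighbour donor imputation: copy the missing fields from the most similar complete respondent. The sketch below assumes that method, a single matching variable and absolute distance; the function and field names are hypothetical and this is not the BANFF interface.

```python
def donor_impute(recipient, donors, match_var, missing_vars):
    """Fill `missing_vars` in `recipient` from the donor closest on `match_var`.

    Assumptions (illustrative only): donors are complete, edit-passing
    respondents from the same data group, and distance is the absolute
    difference on one matching variable. Returns (imputed record, donor id).
    """
    if not donors:
        raise ValueError("no donors available in this data group")
    donor = min(donors, key=lambda d: abs(d[match_var] - recipient[match_var]))
    imputed = dict(recipient)  # leave the original record untouched
    for var in missing_vars:
        if imputed.get(var) is None:
            imputed[var] = donor[var]
    return imputed, donor["id"]


donors = [
    {"id": "D1", "revenue": 900, "salaries": 300},
    {"id": "D2", "revenue": 2100, "salaries": 800},
]
recipient = {"id": "R1", "revenue": 1000, "salaries": None}
imputed, used = donor_impute(recipient, donors, "revenue", ["salaries"])
print(imputed, used)  # {'id': 'R1', 'revenue': 1000, 'salaries': 300} D1
```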
BANFF Algorithms
DIFTREND – Historical trend imputation
CURRATIO – Current ratio imputation
PREVALUE – Value from the previous period for the same unit is imputed
PREAUX – Historical value of a proxy variable for the same unit
CURAUX – Current value of a proxy variable for the same unit
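The five algorithms can be sketched as simple estimators. The ratio forms used for DIFTREND and CURRATIO (ratios of sums over the respondents of a data group) are assumptions written for illustration, not formulas taken from the BANFF documentation, and the function names are hypothetical.

```python
def ratio(numerators, denominators):
    """Ratio of sums over the respondents of a data group."""
    den = sum(denominators)
    if den == 0:
        raise ZeroDivisionError("denominator total is zero for this data group")
    return sum(numerators) / den


def diftrend(prev_value, resp_cur, resp_prev):
    """Historical trend: the unit's previous value times the group's
    current-to-previous trend for the same variable."""
    return prev_value * ratio(resp_cur, resp_prev)


def curratio(cur_aux, resp_cur, resp_cur_aux):
    """Current ratio: the unit's current proxy value times the group's
    current ratio of the target variable to the proxy."""
    return cur_aux * ratio(resp_cur, resp_cur_aux)


def prevalue(prev_value):
    """PREVALUE: carry forward the unit's own previous value."""
    return prev_value


def preaux(prev_aux):
    """PREAUX: use the unit's historical proxy value."""
    return prev_aux


def curaux(cur_aux):
    """CURAUX: use the unit's current proxy value."""
    return cur_aux


# Respondents' revenue doubled (400 vs 200), so a unit with 50 last year gets 100.
print(diftrend(50, [150, 250], [100, 100]))  # 100.0
# Respondents' revenue is half their proxy total (400 / 800).
print(curratio(40, [150, 250], [300, 500]))  # 20.0
```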
Allocation - Definition & Purpose
Definition: Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
Purpose:
• To provide fully processed microdata on a fiscal-year basis for establishments or locations in sample for the UES
• To determine the distribution of value added by province
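Allocation from a collection entity down to its establishments can be sketched as a proportional split on an administrative indicator such as tax revenue, salaries or expenses. Proportional sharing and the rounding rule are illustrative assumptions; the deck does not specify the allocation formula.

```python
def allocate(entity_total, indicators):
    """Distribute a collection-entity total across its establishments in
    proportion to an administrative indicator (e.g., tax revenue).

    `indicators` maps establishment ID -> indicator value. Shares are
    rounded to whole units, and the rounding remainder goes to the
    establishment with the largest indicator so the parts still add up
    to the entity total.
    """
    base = sum(indicators.values())
    if base <= 0:
        raise ValueError("indicator total must be positive")
    shares = {est: round(entity_total * v / base) for est, v in indicators.items()}
    largest = max(indicators, key=indicators.get)
    shares[largest] += entity_total - sum(shares.values())
    return shares


print(allocate(1_000_000, {"E1": 500, "E2": 300, "E3": 200}))
# {'E1': 500000, 'E2': 300000, 'E3': 200000}
print(allocate(100, {"E1": 1, "E2": 1, "E3": 1}))
# {'E1': 34, 'E2': 33, 'E3': 33}
```

Keeping the allocated parts additive matters because establishment values are later summed by province.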
Sample Survey Allocation
[Diagram components: SAMPLE (Establishments 1 to 4); Questionnaire 1; Questionnaire 2; Collection/Processing; Allocation; Establishments 1 to 4 and Establishment U]
23-04-21 Statistics Canada • Statistique Canada
Overview of the IBSP Rolling Estimates Approach
[Diagram components: Sampling; Multi-Mode Collection; Follow-Up; Editing; Automated Processing; Manual Editing; Imputation; Estimation; Rolling Estimates; Quality Indicators and Scores; Active Management; Interpretation & Dissemination]
Active Management – Strategy Settings
• A subset of all key estimates is selected
• All key estimates are:
  – Ranked from the most to the least important
  – Weighted relative to one another using an importance factor
  – Assigned a quality target
• Targets are set in line with the importance factor
• Active collection ends for a key estimate when its quality indicator meets the quality target
• Active management and sampling strategies are coherent by design
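The stopping rule for active collection can be sketched as a filter over key estimates, ordered by importance. The sketch assumes the quality indicator behaves like a CV (lower is better), so a target is met when the QI is at or below it; the estimate names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeyEstimate:
    name: str
    importance: float         # relative importance factor
    quality_indicator: float  # current QI (assumed: lower is better, like a CV)
    quality_target: float


def still_active(estimates):
    """Return key estimates whose QI has not yet met the quality target,
    most important first. Estimates at or below target leave active
    collection."""
    pending = [e for e in estimates if e.quality_indicator > e.quality_target]
    return sorted(pending, key=lambda e: e.importance, reverse=True)


ests = [
    KeyEstimate("revenue", 1.0, 0.04, 0.05),   # target met
    KeyEstimate("salaries", 0.6, 0.09, 0.07),
    KeyEstimate("expenses", 0.8, 0.12, 0.06),
]
print([e.name for e in still_active(ests)])  # ['expenses', 'salaries']
```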
Active Management – Definitions
• Quality Indicator (QI)
  – QI = Sampling CV & Imputation CV & Pseudo Relative Bias
• Measure of Impact (MI) score
  – Impact of a unit on the QI for a given estimate
  – Units imputed from a poor model, or with reported/imputed values far from their predicted values, will have high MIs
Empirical Study – RY2011 Prototype
• Parallel run for 47 business surveys
• Four rolling-estimates iterations
• Total CV calculated for all key estimates (8,600) at each iteration