Towards a Better Integration of Survey and Tax Data in the Unified Enterprise Survey
Claude Turmelle, Statistics Canada
ICES-III, Montréal, Québec, Canada, June 18-21, 2007
Outline
- Overview of the UES
- Characteristics of the target population
- Current use of tax data
  - At sampling
  - At imputation
  - At estimation
- Issues and challenges
- Towards a better use of tax data
- Conclusion
Overview of the UES
- The Unified Enterprise Survey (UES) started in 1997
- Objectives
  - Integrate all annual business surveys into one unified survey framework
  - Produce quality financial and commodity estimates
    - At national and sub-national levels
    - At industrial levels
- Target population
  - All Canadian businesses within the covered industries
  - The UES is an establishment-based survey
- Coverage over time
  - 1997: Seven industries
  - 1998: Sixteen more (including Wholesale)
  - 1999: Four more (including Retail)
  - 2000: Four more (including Manufacturing)
  - ...
  - 2007: Now covers over 60 major industries
Characteristics of the Target Population
- Divided into two main types of businesses: unincorporated (T1) and incorporated (T2)
- General Index of Financial Information (GIFI) data are available electronically for the entire T2 population
- T1 data are available electronically for only about half of the T1s (the e-filers)
- An enterprise is either
  - Complex: multi-provincial and/or multi-industry and/or multi-legal
  - Simple: the opposite
- An enterprise is also either
  - Single: only one establishment
  - Multi: more than one establishment
- Simple-Single enterprises represent about 95% of the population, but only about 40% of the economy
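
The two classifications above combine into four enterprise types. A minimal sketch of that rule follows; the `Enterprise` record and its field names are hypothetical, chosen only for illustration, and are not taken from the UES frame.

```python
from dataclasses import dataclass


@dataclass
class Enterprise:
    # Hypothetical fields, for illustration only.
    n_provinces: int
    n_industries: int
    n_legal_entities: int
    n_establishments: int


def classify(e: Enterprise) -> str:
    """Label an enterprise as Simple/Complex and Single/Multi."""
    # Complex: multi-provincial and/or multi-industry and/or multi-legal.
    is_complex = e.n_provinces > 1 or e.n_industries > 1 or e.n_legal_entities > 1
    # Multi: more than one establishment.
    is_multi = e.n_establishments > 1
    first = "Complex" if is_complex else "Simple"
    second = "Multi" if is_multi else "Single"
    return f"{first}-{second}"
```

For example, `classify(Enterprise(1, 1, 1, 1))` gives `"Simple-Single"`, the group that makes up about 95% of the population.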
Current Use of Tax Data
- Why would someone use tax data?
  - Improve the efficiency of the sample design
  - Reduce the response burden
  - Reduce the collection cost
  - Improve the quality of the estimates
At sampling
- Some key variables taken from different tax files are put on the sampling frame
  - Total Revenue, Total Expenses from GIFI
  - Total Sales from the Goods & Services Tax (GST)
  - Salaries & Wages, # Employees from Payroll Deductions (PD7)
- Used to define a size measure (Total Revenue) for each establishment on the frame
- Used to stratify the population by size and to define the Take-None (T-N) portion
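
As a rough illustration of how a size measure can drive both the Take-None cut-off and the size strata, here is a minimal sketch. The function name, the exclusion threshold, and the stratum boundaries are all assumptions made for the example; the actual UES values and stratification method are not given in this presentation.

```python
def assign_stratum(total_revenue, exclusion_threshold, size_boundaries):
    """Return 'take-none' or a size-stratum index (0 = smallest surveyed units).

    All thresholds are hypothetical inputs, not UES parameters.
    """
    # Units below the exclusion threshold fall in the Take-None portion.
    if total_revenue < exclusion_threshold:
        return "take-none"
    # Otherwise count how many size boundaries the unit reaches.
    stratum = 0
    for boundary in sorted(size_boundaries):
        if total_revenue >= boundary:
            stratum += 1
    return stratum
```

With an assumed threshold of $30,000 and boundaries at $250,000 and $5,000,000, a unit with $20,000 in revenue is Take-None, while a unit with $1,000,000 lands in stratum 1.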
At imputation
- Used to replace survey data (financial variables) for a predetermined sub-sample of selected Simple-Single units
- Also used to replace survey data for some non-respondents
- Used as auxiliary data during imputation
At estimation
- GIFI data are used to produce estimates for all T2 units falling in the T-N portion
- T1 e-filer data are used to produce estimates for all T1 units falling in the T-N portion
UES Survey Design at a Glance
[Diagram: the T2 and T1 populations, each split by the exclusion threshold. Labels recovered from the diagram:]
- T2
  - Main sample to be surveyed
  - T2 Take-None: census of GIFI
- T1
  - Main sample to be surveyed
  - T1 Take-None: sample of e-filers
- Questionnaire labels
  - Not eligible for tax: full questionnaire
  - Tax replaced
  - Characteristic questionnaire (services surveys) or full questionnaire (other surveys)
- For variables available from tax:
  - Total estimate = Survey estimate (T1, T2) + T2 Take-None estimate + T1 Take-None e-filer estimate
- For variables not available from tax (characteristics):
  - Total estimate = Survey estimate (T1, T2)
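
The two total-estimate rules on this slide can be written as one small function. This is only a sketch of the arithmetic; the function and parameter names are hypothetical.

```python
def total_estimate(survey_estimate, t2_take_none=0.0, t1_take_none_efiler=0.0,
                   available_from_tax=True):
    """Combine the components of a total, following the rules on the slide."""
    if available_from_tax:
        # Financial variables: the Take-None portions are covered by tax data.
        return survey_estimate + t2_take_none + t1_take_none_efiler
    # Characteristics: no tax source, so the Take-None portion is not covered.
    return survey_estimate
```

For a characteristic variable the Take-None components are simply dropped, which is the coverage gap raised later under Issues and Challenges.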
Issues and Challenges
At sampling
- Sometimes we get inconsistent tax data
  - Ex.: GIFI Total Revenue = $2M, GST Total Sales = $25M
- What do we do?
  - We use a conservative approach, i.e. we take the maximum
  - We manually verify and adjust the extreme cases (we'll make use of survey data if available)
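
A minimal sketch of the conservative rule: take the maximum of the available sources and flag large discrepancies for manual verification. The function name and the `review_ratio` cut-off are assumptions; the presentation does not say how an "extreme case" is defined.

```python
def size_measure(gifi_revenue, gst_sales, review_ratio=5.0):
    """Return (size, needs_review) from two possibly inconsistent tax values.

    review_ratio is a hypothetical cut-off for flagging extreme cases.
    """
    # Keep only the sources that are present and positive.
    values = [v for v in (gifi_revenue, gst_sales) if v is not None and v > 0]
    if not values:
        # Nothing usable: no size measure, and the unit needs attention.
        return None, True
    # Conservative approach: take the maximum.
    size = max(values)
    # Flag the unit when the two sources disagree by a large factor.
    needs_review = len(values) == 2 and max(values) / min(values) >= review_ratio
    return size, needs_review
```

Applied to the example on the slide, `size_measure(2_000_000, 25_000_000)` keeps $25M as the size measure and flags the unit for review.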
At sampling (cont'd)
- Sometimes all we get is # Employees or Salaries & Wages (Revenue is missing or $0)
- What do we do?
  - We model Total Revenue using what's available
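
The presentation does not describe the model itself. One simple possibility, shown here purely as an assumption, is a ratio model: estimate revenue per dollar of Salaries & Wages from units that report both, then apply that ratio to units with missing or zero revenue. The dictionary keys and function names are hypothetical.

```python
def fit_revenue_ratio(units):
    """Revenue per dollar of salaries, from units that report both values."""
    complete = [(u["salaries"], u["revenue"]) for u in units
                if u.get("revenue") and u.get("salaries")]
    if not complete:
        raise ValueError("no unit reports both revenue and salaries")
    total_salaries = sum(s for s, _ in complete)
    total_revenue = sum(r for _, r in complete)
    return total_revenue / total_salaries


def modelled_revenue(unit, ratio):
    """Use reported revenue when present, otherwise the ratio prediction."""
    if unit.get("revenue"):
        return unit["revenue"]
    return ratio * unit["salaries"]
```

In practice such a ratio would presumably be fitted within homogeneous groups (for example by industry) rather than over the whole frame.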
At imputation
- Sometimes we can't find the link to tax data (ex.: not-for-profit organizations)
- Sometimes we link to 2 or more tax files
- We currently use direct tax replacement (i.e. Ysurvey = Xtax). Should we instead use a modelling approach (i.e. Ysurvey = f(Xtax))?
  - Studies have shown that in some cases it might be more appropriate to use f(X)
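
To make the contrast between direct replacement and a modelling approach concrete, the sketch below fits the simplest possible f, a line through the origin, on units where both survey and tax values are known, and compares its error with that of Ysurvey = Xtax. The choice of f and the function names are assumptions for illustration; the studies mentioned above may have used other models.

```python
def fit_through_origin(x_tax, y_survey):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxx = sum(x * x for x in x_tax)
    sxy = sum(x * y for x, y in zip(x_tax, y_survey))
    return sxy / sxx


def mean_abs_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Made-up paired data in which survey values run 10% above tax values.
x_tax = [100.0, 200.0, 300.0]
y_survey = [110.0, 220.0, 330.0]

b = fit_through_origin(x_tax, y_survey)
direct_error = mean_abs_error(x_tax, y_survey)                  # Ysurvey = Xtax
model_error = mean_abs_error([b * x for x in x_tax], y_survey)  # Ysurvey = f(Xtax)
```

On this made-up data the fitted slope is about 1.1, so the model removes the systematic 10% gap that direct replacement leaves in place.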
At estimation
- Currently, we use the one-phase Horvitz-Thompson estimator
  - It's a very simple and fairly efficient estimator
  - Unfortunately, it could be severely biased if the model y = x doesn't hold
\[
\hat{Y} = \sum_{i \in s} w_i \, y_i^{*},
\quad \text{where} \quad
y_i^{*} =
\begin{cases}
y_i & \text{for units collected through the questionnaire} \\
x_i & \text{for tax-replaced units}
\end{cases}
\]
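
The one-phase Horvitz-Thompson total with tax replacement is a weighted sum that uses the survey value y for questionnaire units and the tax value x for tax-replaced units. A minimal sketch follows, with hypothetical dictionary keys and made-up numbers.

```python
def ht_total(sample):
    """Weighted total using y for questionnaire units and x for tax-replaced units."""
    total = 0.0
    for unit in sample:
        # y* is the tax value for tax-replaced units, the survey value otherwise.
        y_star = unit["x"] if unit["tax_replaced"] else unit["y"]
        total += unit["w"] * y_star
    return total


# Toy sample: the second unit is tax replaced, and its tax value (150)
# differs from what the questionnaire would have collected (200).
sample = [
    {"w": 10, "y": 100, "x": 90, "tax_replaced": False},
    {"w": 10, "y": 200, "x": 150, "tax_replaced": True},
]
estimate = ht_total(sample)
all_surveyed = sum(u["w"] * u["y"] for u in sample)
```

In this toy case the estimate is 2500 while surveying every unit would have given 3000, which illustrates the bias that appears when y = x does not hold. In a real sample, y is of course not observed for tax-replaced units.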
At estimation (cont'd)
- Estimates for variables not available from the tax files (characteristics/commodities) do not cover the T-N portion
- For some characteristics the T-N portion can account for a lot more than 10%
Data quality
- Response rates (What is a respondent?)
  - Respond to tax but not to the characteristic questionnaire
  - Reported tax data vs imputed tax data
  - Planned tax replacement vs tax replacement for non-response
- Variance & CV
  - A lot of imputation occurs in the current strategy (incl. tax replacement)
  - Shouldn't we include the variance due to imputation?
Towards a Better Use of Tax Data
- Understand the particularities of the different tax data sources (ex.: GST vs T2 is currently under investigation)
- Explore different administrative files to help with particular sub-populations (ex.: not-for-profit organizations)
- Keep investigating why Ysurvey ≠ Xtax even when they should conceptually be equal
- Explore the idea of using Ysurvey = f(Xtax)
- Fine-tune our definition of who is eligible for tax replacement and who is not
- Currently studying the possibility of using a more robust estimator to protect against the potential bias
- Developing a strategy to cover the entire population for all variables of interest
- Start taking into account the variability introduced by imputation when computing variances and CVs
- A framework is under development to define response rates when both tax data and survey data are used for the same units
- Explore the possibility of making use of all the GIFI data, not only for the T-N and the sample
[Diagram: the survey design diagram repeated, with added "Eligible" and "Ineligible" labels. Labels recovered from the diagram:]
- T2
  - Main sample to be surveyed
  - T2 Take-None: census of GIFI
- T1
  - T1 Take-None: sample of e-filers
- Eligible / Ineligible
- Questionnaire labels
  - Not eligible for tax: full questionnaire
  - Tax replaced
  - Characteristic questionnaire (services surveys) or full questionnaire (other surveys)
- For variables available from tax:
  - Total estimate = Survey estimate (T1, T2) + T2 Take-None estimate + T1 Take-None e-filer estimate
- For variables not available from tax (characteristics):
  - Total estimate = Survey estimate (T1, T2)
Conclusion
- Since the introduction of the UES, the use of tax data has increased consistently
- It has significantly reduced response burden and the cost of the survey
- Unfortunately, this has sometimes come at the expense of reduced data interpretability
- Fortunately, it was recently decided that we would take a few steps back to evaluate how we currently do things and to determine how we could improve our strategy
For more information, please contact
Claude Turmelle, (613) 951-3327
Visit our web site at www.statcan.ca