A Probability Sample Strategy for improving the quality of the Consumer Price Index Survey using the Information of the Business Register
Luigi Biggeri, Piero Demetrio Falorsi
National Statistical Institute of Italy (ISTAT)
Summary
This presentation proposes a new sampling strategy for the Italian CPI survey. It aims to solve some of the problems of the current design, which is based on purposive sampling and can therefore bias the estimates. A complex multistage random sampling scheme with probabilities proportional to size (pps) is proposed, in which the inclusion (selection) probabilities at each stage are proportional to turnover.
The two main innovations concern the selection of elementary items and the estimation procedure. Both rest on an observational strategy that makes it possible (i) to calculate proxy values of the weights w, which are unknown at elementary item level; and (ii) to define a consistent estimation method through which the national CPI estimate is obtained as a weighted sum of the estimates of the subpopulation indices.
Summary
1. The Current CPI construction: characteristics and issues
2. Analysis and new studies
3. A proposal for a probability sampling strategy
4. Sampling frame and design
5. Estimation method
6. Concluding remarks
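The Laspeyres-type index
$$I^{y,m;\,y-1,12} = \sum_{a}\sum_{c \in a}\sum_{v \in c}\sum_{j \in v} w^{y-1,12}_{c,v,j}\; r^{y,m;\,y-1,12}_{c,v,j}, \qquad \sum_{a}\sum_{c \in a}\sum_{v \in c}\sum_{j \in v} w^{y-1,12}_{c,v,j} = 1$$
$$r^{y,m;\,y-1,12}_{c,v,j} = \frac{P^{y,m}_{c,v,j}}{P^{y-1,12}_{c,v,j}}$$
where P is the price; y the year; m the month; a = geographic area; c = local district; v = outlet; j = elementary item. Prices and weights refer to December (month 12) of year y−1 as the base.
1. The Current CPI construction: characteristics and issues (a)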
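The aggregation above can be sketched in a few lines. This is a minimal illustration, assuming that the December weights and prices are available for every (area, district, outlet, item) cell; the dictionary layout, the function name `laspeyres_type_index` and all figures are hypothetical. The weights are renormalised so that they sum to 1.

```python
from typing import Dict, Tuple

# Hypothetical cell key: (area a, local district c, outlet v, elementary item j)
Cell = Tuple[str, str, str, str]


def laspeyres_type_index(weights: Dict[Cell, float],
                         dec_prices: Dict[Cell, float],
                         cur_prices: Dict[Cell, float]) -> float:
    """Weighted sum of price relatives r = P(y, m) / P(y-1, 12).

    The weights refer to the base period (December of year y-1) and are
    renormalised here so that they sum to 1.
    """
    total = sum(weights.values())
    index = 0.0
    for cell, w in weights.items():
        relative = cur_prices[cell] / dec_prices[cell]
        index += (w / total) * relative
    return index


weights = {("North", "c1", "v1", "rice"): 0.6, ("North", "c1", "v2", "bread"): 0.4}
dec_prices = {("North", "c1", "v1", "rice"): 2.00, ("North", "c1", "v2", "bread"): 3.00}
cur_prices = {("North", "c1", "v1", "rice"): 2.10, ("North", "c1", "v2", "bread"): 3.30}

print(round(laspeyres_type_index(weights, dec_prices, cur_prices), 4))  # 1.07
```

Each cell contributes its weight times its price relative: 0.6 × 1.05 + 0.4 × 1.10 = 1.07.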
The current purposive sample strategy of the CPI survey
• Prices of a fixed basket of 562 representative products (purposively chosen) are collected in two different ways:
– (a) centrally (roughly 60 products) by Istat staff, through specific sampling procedures;
– (b) locally (roughly 500 products) directly by the staff of the Municipal Statistical Offices involved in the survey.
• Local survey: three sampling stages:
– The first-stage units (PSUs) are the chief towns of provinces (86 municipalities out of 103);
– The second-stage units are outlets (roughly 40,000), purposively chosen in each PSU (in December of each year) to be representative of consumer behaviour, as a kind of quota sampling;
– In each selected outlet, the best-selling elementary items of the products in the fixed basket (chosen in December of each year) are observed (roughly 400,000).
1. The Current CPI construction: characteristics and issues (b)
• Elementary indices are computed at municipality level as unweighted geometric means.
• The national index is calculated by successive territorial aggregation of the elementary indices, using weights at the different levels based on population, National Accounts data and the household expenditure survey.
• A CPI is also calculated for each sampled municipality.
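A sketch of this two-step computation, assuming one product, two municipalities and a single aggregation level (the actual procedure aggregates through several territorial levels, with weights from population, National Accounts and survey data). The elementary index is written as the unweighted geometric mean of the price relatives (a Jevons-type index); the function names, weights and prices are hypothetical.

```python
import math
from typing import Dict, List


def jevons(base: List[float], current: List[float]) -> float:
    """Unweighted geometric mean of the price relatives p1 / p0."""
    logs = [math.log(p1 / p0) for p0, p1 in zip(base, current)]
    return math.exp(sum(logs) / len(logs))


def aggregate(indices: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted arithmetic mean of lower-level indices (weights renormalised)."""
    total = sum(weights[k] for k in indices)
    return sum(indices[k] * weights[k] / total for k in indices)


# Hypothetical elementary indices for one product in two municipalities
elementary = {
    "A": jevons([1.00, 2.00], [1.10, 2.20]),  # both relatives equal 1.10
    "B": jevons([4.00], [4.00]),              # unchanged price
}
population = {"A": 3.0, "B": 1.0}             # hypothetical weights
print(round(aggregate(elementary, population), 4))  # 1.075
```

Within a municipality every price quote counts equally; the weights only enter at the aggregation step.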
1. The Current CPI construction: characteristics and issues (c)
Some issues of the current survey
1. The current survey structure, based on a purposive sampling strategy, does not allow accuracy to be evaluated; attempts to estimate the variance should be made.
2. Not all chief towns of provinces are included in the survey, and small municipalities are not included at all.
3. The criterion of selecting the “best-selling elementary item” of each product in each outlet could introduce an unknown bias.
4. The lack of adequately detailed information on household consumption expenditure prevents the use of weights at the elementary aggregate level and at municipal and regional level.
2. Analysis and new studies (a)
The need for experimental analysis
1. To assess the importance of the possible biases, analyses and computations must be carried out through adequately designed experiments.
2. To evaluate and improve the quality of the Italian CPIs, last year Istat established a Scientific Committee that is reviewing the different aspects of the index construction process. The Committee has stressed the need to study and verify the feasibility of building a probability sampling strategy.
2. Analysis and new studies (b)
1. The proposal is tailored to the survey of prices collected locally.
2. A business register of local units (outlets), updated yearly, has recently become available.
3. The turnover of each outlet for each product can be estimated and used to construct the weights.
4. The proposed survey framework, based on a probability sampling strategy, guarantees unbiased estimates and should address most of the issues mentioned above.
5. The sample design consists of a three-stage selection scheme (local districts, outlets and items) with probabilities proportional to turnover, used as a proxy for consumer expenditure.
6. Index estimation is based on an observational scheme that yields proxy measures of the weights. A generalised regression estimator is used, which makes the indices calculated for different estimation domains (planned or not) coherent with one another.
3. A proposal for a probability sampling strategy
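The parameter of interest is the national price index
$$I^{1,0}_{pop} = \sum_{c=1}^{N}\sum_{v=1}^{M_c}\sum_{d=1}^{D_{c,v}}\sum_{j=1}^{J_{d,c,v}} w_{d,j,c,v}\; r^{1,0}_{d,j,c,v}, \qquad r^{1,0}_{d,j,c,v} = \frac{P^{1}_{d,j,c,v}}{P^{0}_{d,j,c,v}}$$
$$w_{d,j,c,v} = \frac{F^{0}_{d,j,c,v}}{F^{0}_{.,.,.,.}}, \qquad F^{0}_{.,.,.,.} = \sum_{c=1}^{N}\sum_{v=1}^{M_c}\sum_{d=1}^{D_{c,v}}\sum_{j=1}^{J_{d,c,v}} F^{0}_{d,j,c,v}$$
where c = local district, v = outlet, d = type of product, j = item; N is the number of local districts, M_c the number of outlets in district c, D_{c,v} the number of types of product sold by outlet v and J_{d,c,v} the number of its items of type d.
$r^{1,0}_{d,j,c,v}$ = price index of item (d,j,c,v) between the base period 0 and period 1
$w_{d,j,c,v}$ = weight of item (d,j,c,v), in terms of its turnover (sales) $F^{0}_{d,j,c,v}$ in the base period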
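If base-period turnover is taken to be the value of sales, F⁰ = P⁰·Q⁰ (an assumption made only for this illustration), the population parameter is the textbook Laspeyres price index. The sketch checks this numerically on three hypothetical items; the record layout and function names are invented.

```python
from typing import List, Tuple

# Hypothetical item record: (base price P0, current price P1, base quantity Q0)
Item = Tuple[float, float, float]


def population_index(items: List[Item]) -> float:
    """sum_k w_k * r_k, with w_k = F0_k / F0_total and r_k = P1_k / P0_k."""
    turnover = [p0 * q0 for p0, _, q0 in items]  # assumed F0_k = P0_k * Q0_k
    total = sum(turnover)
    return sum((f / total) * (p1 / p0) for f, (p0, p1, _) in zip(turnover, items))


def laspeyres_direct(items: List[Item]) -> float:
    """Textbook Laspeyres price index sum(P1 * Q0) / sum(P0 * Q0)."""
    return sum(p1 * q0 for _, p1, q0 in items) / sum(p0 * q0 for p0, _, q0 in items)


items = [(2.0, 2.1, 100.0), (3.0, 3.3, 50.0), (10.0, 9.5, 5.0)]
print(round(population_index(items), 5), round(laspeyres_direct(items), 5))
```

The turnover shares here are 0.5, 0.375 and 0.125. Both functions return the same value because w_k·r_k = P1_k·Q0_k / ΣP0·Q0.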
4. Sampling frame and design (a)
PSUs: Local districts (municipalities in Italy) are selected within each geographical area through balanced sampling, so that the direct sample estimates of the totals of some auxiliary variables equal their known totals (Deville and Tillé, 2004).
SSUs: The sampling design for outlets links D distinct samples, one for each type of product (TP). Outlets are selected with a coordination technique based on permanent random numbers (PRN). The technique aims at a high degree of overlap among the samples selected for the different types of product, and so reduces the total number of sampled outlets for a given number of observed items (Ohlsson, 1995).
FINAL UNITS: A probability sampling scheme for item selection, based on iterative hierarchical drawing of groups of products, is proposed. Such a scheme is feasible and solves the current problem of defining the fixed basket of products.
4. Sampling frame and design (b)
Planned domains for the survey estimates:
The most detailed domain is the geographical area by Type of Product (TP), where a TP is an element of the four-digit COICOP classification.
4. Sampling frame and design (b)
Sampling frame: Local Unit Archive
– Yearly updated
– The information contained in the archive (sales area, number of employees, economic activity code, geographical zone) is used for stratification by size, outlet typology, etc.
– The NACE code makes it possible to establish which outlets sell items of each TP to households
– A table linking NACE codes and types of products has been constructed
4. Sampling frame and design (c)
Table 1. Example of a table linking types of product and NACE codes (1 = outlets with that NACE code sell the type of product; 0 = they do not)

                                    NACE code
Type of product                     52.11.1  52.11.2  52.11.3  …  52.24.1  …  52.27.4  52.48.6  52.48.E  …  52.74.0
1    Rice                              1        1        1     …     0     …     1        0        0     …     0
2    Bread                             1        1        1     …     1     …     1        0        0     …     0
3    Pasta                             1        1        1     …     0     …     1        0        0     …     0
..   …                                 …        …        …     …     …     …     …        …        …     …     …
207  Expenditures for religion         0        0        0     …     0     …     0        1        1     …     0

NACE codes shown: 52.11.1 hypermarket; 52.11.2 supermarket; 52.11.3 grocery discount; 52.24.1 retail sale of bread; 52.27.4 retail sale of other groceries n.e.c.; 52.48.6 retail sale of objects for arts, religion, etc.
4. Sampling frame and design (d)
Sampling frame construction: definition of turnover
– Outlet turnover $\tilde F^{0}_{.,.,c,v}$: taken from the business register, where it is known exactly only for enterprises with a single local unit; otherwise it is imputed using other data sources.
– Turnover by outlet and type of product $\tilde F^{0}_{d,.,c,v}$: estimated using different data sources (fiscal data, business register, National Accounts, Household Budget Survey).
Note that possible imputation errors do not bias the sampling strategy, because the inclusion probabilities computed from the imputed values are still known and positive; they can only increase the variance.
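A toy check of the last claim. It uses Poisson sampling, a simpler design than the balanced and PRN schemes proposed here, chosen only because its moments have closed forms. The Horvitz-Thompson estimator remains unbiased whatever size measure defines the probabilities. When the probabilities are scaled to the same expected sample size, those exactly proportional to the target variable give the smallest variance, so imputation errors can only inflate it. Function names and values are hypothetical.

```python
from typing import List, Tuple


def pps_probs(size: List[float], n: float) -> List[float]:
    """Inclusion probabilities proportional to a size measure, summing to n."""
    total = sum(size)
    return [n * s / total for s in size]


def poisson_ht_moments(y: List[float], pi: List[float]) -> Tuple[float, float]:
    """Exact mean and variance of the Horvitz-Thompson total under Poisson sampling.

    Each unit k enters the sample independently with probability pi_k, so
    E[HT] = sum_k pi_k * (y_k / pi_k) = sum_k y_k and
    Var[HT] = sum_k y_k**2 * (1 - pi_k) / pi_k.
    """
    mean = sum(p * (yk / p) for yk, p in zip(y, pi))
    var = sum(yk ** 2 * (1 - p) / p for yk, p in zip(y, pi))
    return mean, var


y = [10.0, 20.0, 30.0, 40.0]           # true turnover (hypothetical)
imputed = [14.0, 16.0, 33.0, 37.0]     # turnover with imputation errors
exact_mean, exact_var = poisson_ht_moments(y, pps_probs(y, n=2))
noisy_mean, noisy_var = poisson_ht_moments(y, pps_probs(imputed, n=2))
print(exact_mean, noisy_mean)          # both equal the true total (up to rounding)
print(exact_var, noisy_var)            # the noisy size measure inflates the variance
```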
4. Sampling frame and design (e)
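Local district selection
– The $n_{(a)}$ sample local districts are drawn from the $N_{(a)}$ local districts of area a by means of a balanced sampling design with inclusion probabilities proportional to turnover:
$$\pi_c = n_{(a)}\,\frac{\tilde F^{0}_{.,.,c,.}}{\tilde F^{0}_{.,.,(a)}}$$
– The balancing equations are
$$\sum_{c=1}^{n_{(a)}} \frac{\mathbf{x}_c}{\pi_c} = \sum_{c=1}^{N_{(a)}} \mathbf{x}_c$$
with
$$\mathbf{x}_c = \left(\tilde F^{0}_{1,.,c,.}, \dots, \tilde F^{0}_{d,.,c,.}, \dots, \tilde F^{0}_{D,.,c,.}\right)'$$
where $\tilde F^{0}_{d,.,c,.}$ is the overall turnover of the c-th local district for the d-th type of product, calculated by summing up frame data, and $\tilde F^{0}_{.,.,(a)}$ is the total turnover of area a.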
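The balanced sample itself would be drawn with a dedicated algorithm such as the cube method of Deville and Tillé (2004), which is not reproduced here. This sketch only computes π_c and measures how far a candidate sample is from satisfying the balancing equations, one gap per type of product. The frame values, sample choices and function names are hypothetical.

```python
from typing import List, Sequence

# turnover[c][d]: hypothetical frame turnover of district c for type of product d
Frame = Sequence[Sequence[float]]


def district_probs(turnover: Frame, n: int) -> List[float]:
    """pi_c = n * F_c / F_area, with F_c the district turnover summed over all TPs."""
    f_c = [sum(row) for row in turnover]
    total = sum(f_c)
    return [n * f / total for f in f_c]


def balancing_gap(turnover: Frame, pi: List[float], sample: List[int]) -> List[float]:
    """HT estimate minus known frame total, for each type of product d."""
    n_tp = len(turnover[0])
    known = [sum(row[d] for row in turnover) for d in range(n_tp)]
    ht = [sum(turnover[c][d] / pi[c] for c in sample) for d in range(n_tp)]
    return [h - k for h, k in zip(ht, known)]


turnover = [[30.0, 10.0], [10.0, 30.0], [15.0, 5.0], [5.0, 15.0]]
pi = district_probs(turnover, n=2)
print(balancing_gap(turnover, pi, [0, 3]))  # balanced sample: gaps close to zero
print(balancing_gap(turnover, pi, [0, 2]))  # unbalanced sample
```

Both samples have the same size and the same π_c values. Only the first one reproduces the known TP totals, which is the property the balanced design enforces.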
4. Sampling frame and design (f)
Outlet Selection
– Separate samples are drawn, one for each type of product.
– Each sample is drawn with a PRN coordination technique that maximises the overlap among the outlets selected for the different types of product.
– In the sample selected for the generic type of product d (d = 1, …, D), the outlets are stratified by typology within the local district.
– The final inclusion probability of an outlet is proportional to its size, measured by its turnover for the d-th type of product in the area.
4. Sampling frame and design (g)
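$$\pi_{d,v|c} = \frac{m_{d(a)}\,\tilde F^{0}_{d,.,c,v}}{\pi_c\,\tilde F^{0}_{d,.,(a)}}, \qquad \pi_{d,c,v} = \pi_c\,\pi_{d,v|c} = m_{d(a)}\,\frac{\tilde F^{0}_{d,.,c,v}}{\tilde F^{0}_{d,.,(a)}}$$
where $m_{d(a)}$ is the number of outlets to be sampled for type of product d in area a, and $\tilde F^{0}_{d,.,(a)}$ the total turnover for type of product d in area a.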
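A sketch of PRN coordination for two types of product. It assumes that all outlets lie in one sampled local district (so π_c plays no role) and uses sequential Poisson sampling. This is one common PRN-based rule, which keeps the m outlets with the smallest ratio PRN / size; whether it is the exact variant intended here is an assumption. Its inclusion probabilities are approximately the pps values π_{d,c,v} above. The PRN values are fixed by hand so the example is reproducible; in practice they would be drawn once from U(0, 1) and stored in the frame. All names and numbers are hypothetical.

```python
from typing import Dict, List


def pps_inclusion(size: Dict[str, float], m: int) -> Dict[str, float]:
    """pi_v = m * F_v / F_total, capped at 1 for simplicity."""
    total = sum(size.values())
    return {v: min(1.0, m * s / total) for v, s in size.items()}


def sequential_poisson(prn: Dict[str, float], size: Dict[str, float], m: int) -> List[str]:
    """Select the m outlets with the smallest ratio PRN / size (zero-size outlets excluded)."""
    eligible = [v for v, s in size.items() if s > 0]
    ranked = sorted(eligible, key=lambda v: prn[v] / size[v])
    return sorted(ranked[:m])


# Permanent random numbers: one per outlet, shared by all types of product
prn = {"v1": 0.12, "v2": 0.85, "v3": 0.40, "v4": 0.63, "v5": 0.05}
# Hypothetical turnover of each outlet for two types of product
rice = {"v1": 50.0, "v2": 40.0, "v3": 30.0, "v4": 0.0, "v5": 20.0}
bread = {"v1": 10.0, "v2": 5.0, "v3": 20.0, "v4": 15.0, "v5": 8.0}

s_rice = sequential_poisson(prn, rice, m=2)
s_bread = sequential_poisson(prn, bread, m=2)
print(s_rice, s_bread)                       # samples drawn with the same PRNs
print(sorted(set(s_rice) | set(s_bread)))    # outlets actually visited
print(pps_inclusion(rice, m=2))
```

Because every TP sample reuses the same random number for a given outlet, outlets with small PRNs tend to be selected for several types of product at once. This is what keeps the union of the D samples small.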
Item selection (1)
To select items with known probabilities, the main operational difficulty is building the list of all items sold in the outlet that belong to the type of product for which the outlet was included in the sample.
– One way to overcome this difficulty is to define:
• a hierarchical tree classification of elementary products for each type of product;
• a selection procedure for each level of this structure.
– The procedure should be translated into a specific algorithm, implemented on the laptop used by the interviewer for data collection. This quickly identifies a very small subset of homogeneous items from which the item is selected in the outlet.
4. Sampling frame and design (h)
[Figure: example of a hierarchical tree classification for TP = Equipment for Sport. Upper-level nodes include Swimming, Skiing, Other sports and Body building, with sub-nodes such as Water polo, Downhill skiing, Cross-country skiing, Tennis and Football; leaf items include flippers, masks, swimsuit, water polo equipment, skis, ski boots, ski wear, snow suit, body building, tennis racket, sports wear, short trousers, undershirt, football, sneakers and gloves.]
Item Selection (2)
– At each level, the item selection procedure uses inclusion probabilities defined on the basis of information available in the sampled outlet or available a priori as auxiliary information.
– Ideally, the probability used at each level would be proportional to the turnover of each unit relative to the outlet's total turnover for the set of units among which the selection is made at that level.
– Probability selection makes it possible to define unbiased estimators.
– The efficiency of the estimates depends on the kind of selection probabilities used.
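A sketch of the hierarchical draw in the ideal case described above, where the probability at each level is proportional to turnover. The product of the conditional probabilities then telescopes to the item's share of the outlet's turnover for the TP. The tree reuses labels from the sport example, but its turnovers are invented. In the field these would usually be unknown and replaced by whatever information is available in the outlet. The type alias and function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple, Union

# A node is either a leaf (item turnover) or a dict of child nodes
Node = Union[float, Dict[str, "Node"]]


def turnover(node: Node) -> float:
    """Leaf turnover, or the sum over all leaves below an inner node."""
    if isinstance(node, dict):
        return sum(turnover(child) for child in node.values())
    return float(node)


def draw_item(tree: Node, rng: random.Random) -> Tuple[List[str], float]:
    """Descend one level at a time, picking a child with probability
    proportional to its turnover; return the path and its overall probability."""
    path: List[str] = []
    prob = 1.0
    node = tree
    while isinstance(node, dict):
        names = list(node)
        sizes = [turnover(node[name]) for name in names]
        choice = rng.choices(names, weights=sizes, k=1)[0]
        prob *= turnover(node[choice]) / sum(sizes)
        path.append(choice)
        node = node[choice]
    return path, prob


# Hypothetical turnover for one outlet and TP = Equipment for Sport
sport = {
    "Swimming": {"Swimsuit": 30.0, "Flippers": 10.0},
    "Skiing": {"Downhill skiing": {"Ski boots": 25.0, "Skis": 15.0},
               "Cross-country skiing": {"Skis": 5.0}},
    "Other sports": {"Tennis": {"Tennis racket": 15.0}},
}
path, prob = draw_item(sport, random.Random(7))
print(path, prob)
```

Whatever path is drawn, the returned probability equals the leaf turnover divided by the TP turnover of the outlet (here 100). This is why the final inclusion probability of an item can be written in terms of item turnover alone.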
4. Sampling frame and design (i)
Final inclusion probabilities
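The sampling scheme gives each item an inclusion probability proportional to the ratio between the item turnover and the overall turnover of the d-th TP in area a:
$$\pi_{d,j,c,v} \propto \frac{F^{0}_{d,j,c,v}}{F^{0}_{d,.,(a)}}$$
Since the index weight $w_{d,j,c,v}$ is also proportional to $F^{0}_{d,j,c,v}$, the ratio $w_{d,j,c,v}/\pi_{d,j,c,v}$ is approximately constant within each TP and area. This expression therefore shows that the proposed sample design is approximately self-weighting.
4. Sampling frame and design (l)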
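A numerical illustration of the self-weighting property, assuming that the same exact turnovers F enter both the weights and the probabilities. With imputed turnovers in the probabilities the ratios would only be approximately constant. The function name, sample and values are hypothetical.

```python
from typing import List


def weight_over_pi(F: List[float], F_all: float, n: int) -> List[float]:
    """Ratio w_k / pi_k for the items of one TP in one area.

    w_k  = F_k / F_all       (index weight, share of total turnover)
    pi_k = n * F_k / F_tp    (inclusion probability proportional to turnover)
    """
    F_tp = sum(F)
    return [(f / F_all) / (n * f / F_tp) for f in F]


F = [10.0, 20.0, 30.0, 40.0]  # hypothetical item turnover, one TP and area
ratios = weight_over_pi(F, F_all=400.0, n=2)
print(ratios)                  # each ratio is F_tp / (n * F_all) = 0.125, up to rounding

# With a self-weighting design the HT estimate is a scaled sum of sampled relatives
relatives = [1.02, 1.05, 0.98, 1.01]
sample = [1, 3]
print(sum(ratios[k] * relatives[k] for k in sample))
```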
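In the estimation phase it is useful to express the weight through the following factorisation:
$$w_{d,j,c,v} = k_{.,.,c,v}\; k_{d,.,c,v}\; k_{d,j,c,v} = \frac{F^{0}_{.,.,c,v}}{F^{0}_{.,.,.,.}}\cdot\frac{F^{0}_{d,.,c,v}}{F^{0}_{.,.,c,v}}\cdot\frac{F^{0}_{d,j,c,v}}{F^{0}_{d,.,c,v}}$$
Therefore, a proxy observable value of this weight can be calculated as
$$\tilde w_{d,j,c,v} = \tilde k_{.,.,c,v}\; \tilde k_{d,.,c,v}\; \tilde k_{d,j,c,v}$$
where $\tilde k_{.,.,c,v}$, $\tilde k_{d,.,c,v}$ and $\tilde k_{d,j,c,v}$ are respectively the imputed values of $k_{.,.,c,v}$, $k_{d,.,c,v}$ and $k_{d,j,c,v}$.
5. Estimation method (a)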
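A sketch of the factorisation. The three factors are the outlet's share of total turnover, the TP's share of the outlet's turnover, and the item's share of the outlet's TP turnover. With exact turnovers the product telescopes back to w. In practice each factor is replaced by an imputed value, which may come from a different source; which source feeds which factor is an assumption here. All values are hypothetical.

```python
from typing import Tuple


def weight_factors(F_all: float, F_outlet: float,
                   F_outlet_tp: float, F_item: float) -> Tuple[float, float, float]:
    """Return (k_{.,.,c,v}, k_{d,.,c,v}, k_{d,j,c,v}) from base-period turnovers."""
    k_outlet = F_outlet / F_all        # outlet share of total turnover
    k_tp = F_outlet_tp / F_outlet      # TP share of the outlet's turnover
    k_item = F_item / F_outlet_tp      # item share of the outlet's TP turnover
    return k_outlet, k_tp, k_item


# Exact (unknown in practice) turnovers: the product telescopes to F_item / F_all
k = weight_factors(F_all=1000.0, F_outlet=50.0, F_outlet_tp=20.0, F_item=5.0)
w = k[0] * k[1] * k[2]
print(w)  # 0.005 = 5 / 1000, up to rounding

# Proxy weight from imputed components (hypothetical values)
k_tilde = (55.0 / 1000.0, 0.35, 0.25)
w_tilde = k_tilde[0] * k_tilde[1] * k_tilde[2]
print(w_tilde)
```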
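The general index estimate can be obtained by means of the generalised regression estimator proposed by Valliant (1999), based on the model
$$r^{1,0}_{d,j,c,v} = \mathbf{x}'_{d,j,c,v}\,\boldsymbol{\beta} + \varepsilon_{d,j,c,v}$$
The expression of the estimator is
$$\hat I^{1,0}_{GREG} = \sum_{a=1}^{A}\sum_{d=1}^{D}\sum_{c=1}^{n_{(a)}}\sum_{v=1}^{m_{d,c}}\sum_{j} g_{d,j,c,v}\,\frac{\tilde w_{d,j,c,v}}{\pi_{d,j,c,v}}\, r^{1,0}_{d,j,c,v} = \hat I^{1,0}_{HT} + \left(\mathbf{X} - \hat{\mathbf{X}}_{HT}\right)'\hat{\boldsymbol{\beta}}$$
where $g_{d,j,c,v}$ are the regression (calibration) factors, $\mathbf{X}$ is the vector of known auxiliary totals and $\hat{\mathbf{X}}_{HT}$ its Horvitz-Thompson estimate.
In this way the sample estimates of the auxiliary variables equal the population totals, known or estimated from external sources (Household Budget Survey, National Accounts).
5. Estimation method (b)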
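A sketch of a generic linear GREG calibration (chi-square distance with unit scale factors), written in plain Python. The exact form used by Valliant (1999) may differ. The design weights d_k stand in for w̃/π. The auxiliary vector (TP membership indicators) and the known totals (TP expenditure shares) are hypothetical choices made for illustration, as are the function names and all numbers.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for col in range(i, n + 1):
                M[r][col] -= f * M[i][col]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][col] * z[col] for col in range(i + 1, n))) / M[i][i]
    return z


def greg_weights(d: List[float], x: Matrix, X: List[float]) -> List[float]:
    """Calibrated weights d_k * g_k, with g_k = 1 + lambda' x_k (linear GREG).

    lambda solves T lambda = X - X_ht, where T = sum_k d_k x_k x_k',
    so that sum_k d_k g_k x_k reproduces the known totals X exactly.
    """
    p = len(X)
    T = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)] for i in range(p)]
    X_ht = [sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(T, [Xi - Xh for Xi, Xh in zip(X, X_ht)])
    return [dk * (1.0 + sum(l * xi for l, xi in zip(lam, xk))) for dk, xk in zip(d, x)]


d = [0.30, 0.30, 0.25, 0.25]                          # hypothetical w~/pi, 4 sampled items
x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # TP membership indicators
X = [0.55, 0.45]                                      # hypothetical known TP shares
r = [1.02, 1.04, 0.99, 1.01]                          # sampled price relatives

wg = greg_weights(d, x, X)
print([round(v, 4) for v in wg])
print(round(sum(wk * rk for wk, rk in zip(wg, r)), 4))
```

With indicator auxiliaries the calibration reduces to rescaling the weights within each TP, so that each TP's weight total matches its known share. This is the coherence property across domains mentioned earlier.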
The proposed strategy is consistent with current Italian practice: the samples of elementary items and outlets are updated every year to take into account the rapid changes in the universes of products and outlets. Selecting outlets and items with permanent random number techniques makes it simple to update the samples every year while achieving a prefixed rotation rate (Ohlsson, 1995).
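One way a prefixed rotation rate can be obtained with PRNs, in the spirit of Ohlsson (1995), is to shift the starting point of the selection interval each year while keeping the units' random numbers fixed. Under Poisson sampling with a shift no larger than π, the expected fraction of the sample rotated out is about shift / π. The sketch assumes Poisson sampling and equal probabilities; the PRN values, the function name and the shift are hypothetical.

```python
from typing import Dict, List


def prn_poisson(prn: Dict[str, float], pi: Dict[str, float], start: float) -> List[str]:
    """Poisson sampling with PRN: select v if (u_v - start) mod 1 < pi_v."""
    return sorted(v for v, u in prn.items() if (u - start) % 1.0 < pi[v])


prn = {"v1": 0.05, "v2": 0.18, "v3": 0.27, "v4": 0.42, "v5": 0.66, "v6": 0.91, "v7": 0.35}
pi = {v: 0.3 for v in prn}                     # hypothetical inclusion probabilities

year1 = prn_poisson(prn, pi, start=0.0)
year2 = prn_poisson(prn, pi, start=0.1)        # shift the starting point by 0.1
print(year1, year2)
print(sorted(set(year1) - set(year2)))         # outlets rotated out
```

Here the shift of 0.1 against π = 0.3 targets a rotation of about one third, and one of the three outlets is replaced.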
Meanwhile, once selected, the sample of local districts remains unchanged for several years. This is justified by cost considerations, connected with the high cost of training the interviewers in the local districts, and by the fact that the structure of local districts changes very slowly over time.
6. Concluding remarks (a)
To verify the feasibility of the proposed probability sampling design, an experimental version of the frame has been implemented to test various aspects of the sampling strategy, and an experimental selection of local districts (corresponding to municipalities) and outlets for the Italian survey has been carried out. The outcome of the experiments has been encouraging.
Many other experiments have to be carried out to evaluate: (i) the feasibility and cost-efficiency of implementing the proposed probability sampling strategy; (ii) the quality improvements that can be obtained by using the proposed strategy only partially.
6. Concluding remarks (b)