[wiley series in probability and statistics] measurement errors in surveys (biemer/measurement) ||...


CHAPTER 26

MODELS FOR MEMORY EFFECTS IN COUNT DATA

Piet G. W. M. van Dosselaar Netherlands Central Bureau of Statistics

26.1 INTRODUCTION

At the Netherlands Central Bureau of Statistics, the analysis of memory effects was initiated by Dirk Sikkel in the early eighties, and since then several surveys have been analyzed with respect to memory errors. However, no attempt has been made to formalize the procedures in a probabilistic setting. In this chapter we will present a general theory for the modeling of memory effects in count data from surveys without an accessible external gauging device, as well as some new models for reported counts of labor market transitions.

We consider a sample of respondents for whom we would like to count the number of events of a certain type that occurred to them during a certain time interval. Information about these events is collected by using retrospective questions, and therefore our estimate of the number of experienced events may be affected by memory errors. Models for memory effects are developed to estimate the magnitude of the effects, to construct better survey estimates and, if possible, to develop correction procedures for the retrospective data.

The methods used to model memory effects are closely related to the type of gauge that is available. The word "gauge" refers to any measure of the "true" net or mean number of events that can serve as a standard of comparison for the net or mean number of retrospectively reported events. Section 26.2 of this chapter gives a classification of these methods by type of gauging device. The remaining sections deal exclusively with the modeling of memory effects where, owing to the absence of external sources, the gauging device has to come from the survey itself. Whether a survey can furnish such a gauge depends on the combination of sample design (overlapping sample waves, for instance) and the dependence of the target variable on calendar time. Although the ultimate model specification depends strongly on several survey characteristics, it turns out that a general model can be derived, which can be extended to meet the goals and design of a particular survey. This general model consists of two components: the generation of events through a Poisson process, and the conditional probability (as a function of elapsed time) that an event that actually took place is also reported. For a particular survey this model can be extended by specifying these two components in more detail. This general model, together with some examples of its possible extensions, is treated in Section 26.3. Section 26.4 gives an example of how the general model is used in the process of building and estimating models for numbers of transitions between two labor market positions. The data on these transitions originate from the Dutch Continuing Labor Force Survey (CLFS). Subsection 26.4.1 describes the design of the CLFS. Subsection 26.4.2 formulates the memory effect models in terms of the general model. In subsection 26.4.3 these models are estimated using transformed duration data in a two-step procedure. The model that can be considered optimal on both practical and theoretical grounds is derived in full and estimated using untransformed duration data in a one-step procedure in subsection 26.4.4. Section 26.5 contains the main conclusions.

The views expressed in this chapter are those of the author and do not necessarily reflect the policies of the Netherlands Central Bureau of Statistics.

26.2 CLASSIFICATION OF METHODS BY TYPE OF GAUGING DEVICE

The methods that can be used to model memory effects can be classified into three groups according to the type of gauge that is available. In most cases the index of interest is the rate of reporting relative to the "true" population total or population mean of events, as derived (estimated) from the gauge that is used.

Class 1: External Validating Records Whenever we have access to existing records on the actual events experienced by our group of respondents, the effects of forgetting and telescoping can be studied by matching, as was done, for instance, by Sudman and Bradburn (1973) and Mathiowetz (1985). Since both survey and gauging device relate to the same respondents, analyses at the individual level are possible.


Class 2: Reference Survey of Better Quality In this case we have at our disposal an external gauge, independent of our sample, in the form of a reference survey in which memory errors are (almost) absent. (See van Dosselaar, et al., 1989a.)

Class 3: No External Gauge In most cases we do not have access to an external gauge and we must look for a gauging device within the survey itself. If seasonal effects are present in the data it is essential that the design provide us with measurements of the target variable with various time-lags between the time of occurrence of the events and the day of interview (e.g., overlapping samples). In the case that seasonal effects can be ruled out a priori, a one-time retrospective study would suffice to study memory effects. In both cases we need some information on the dates of the events and we have to make some assumptions with respect to the "true" total or mean number of events in our group of respondents. Two examples of this class of problems can be found in van Dosselaar, et al. (1989a). A general model for this "no external gauge" situation is developed in Section 26.3 of this chapter.

26.3 THE THEORY FOR CLASS 3 PROBLEMS

We have a sample of respondents for whom we would like to estimate the total number of events experienced in a certain period of time. The information we have on these events is retrospective and likely to be biased due to recall errors. We have no access to external validating records, nor is there a reference survey of better quality to provide a reliable gauge. Therefore we have to look for a gauging device within the survey at hand. In this section a general model is developed that can be used as a starting point for any particular survey without an external gauge but with an appropriate design and target variable.

26.3.1 The General Model

Suppose we have a sample of respondents interviewed at time $\tau$ about their experiences with events of a particular kind during the time period $(t_0, \tau]$. The aim of the survey is to estimate the number of events experienced by this sample in the period $(t_0, \tau]$. It is assumed that

1. reported events and their occurrence dates have been reported correctly, and

2. events are distributed in time as a Poisson process.


Assumption 1 implies that overreporting is impossible and that telescoping effects are ignored. A necessary condition for assumption 2 to hold is that experienced events occur randomly and are rare in the population.

Let $n_T$ be the "true" number of events experienced by a randomly selected respondent in our sample in a relatively small interval $T = (t, t + s] \subset (t_0, \tau]$, and let $n^*_T(\tau)$ be the number of events in this interval reported at time $\tau$. Let $\lambda(t)$ be the intensity of the Poisson process underlying $n_T$, and let $\lambda_T$ be the integral of $\lambda(t)$ over $T$. Informally, $\lambda(t)$ can be defined as the rate of occurrence of events (number of events per time unit) at time $t$, and $\lambda_T$ as the expected number of events in $T$. For a formal definition of the intensity of a Poisson process, see Snyder (1985). For the "true" number of events in $T$ we have the following distribution:

$$P[n_T = k] = \frac{\lambda_T^{k}\, e^{-\lambda_T}}{k!}, \qquad k = 0, 1, 2, \ldots \tag{26.1}$$

For the distribution of the number of events in T reported at time τ we can write:

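$$\begin{aligned}
P[n^*_T(\tau) = k] &= \sum_{j=0}^{\infty} P[n^*_T(\tau) = k,\; n_T = k + j] \\
&= \sum_{j=0}^{\infty} P[n^*_T(\tau) = k \mid n_T = k + j]\; P[n_T = k + j] \\
&= \sum_{j=0}^{\infty} P[k \text{ out of } k + j \text{ events in } T \text{ reported at } \tau]\; P[n_T = k + j],
\end{aligned} \tag{26.2}$$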

for $k = 0, 1, 2, \ldots$. Note that $j \geq 0$ as a consequence of assumption 1. Let us now concentrate on the probability that $k$ out of the $k + j$ events that occurred in $T$ are reported at $\tau$. Because the length $s$ of $T$ is small, we can treat all of the $k + j$ events in $T$ as having equal probabilities of being reported at time $\tau$ and as having occurrence times that are uniformly distributed over $T$. Let $h(u)$ be the probability that a randomly selected respondent reports an event that actually took place $u$ time units before the interview. Note that $h(u)$ is defined to depend only on the time elapsed since the event, not on calendar time $t$ or on the intensity $\lambda(t)$. The probability that an event that occurred in $T$ is reported at time $\tau$ is given by equation (26.3) and can be interpreted as the average value of the reporting probability $h(u)$ over the interval $[\tau - t - s, \tau - t)$:

$$H_T(\tau) = P[\text{event in } T \text{ is reported at } \tau] = \frac{1}{s} \int_{\tau - t - s}^{\tau - t} h(u)\, du \tag{26.3}$$

where we take $h(u)$ to be a continuous function of $u$ on the interval $(0, \infty)$. If we assume (assumption 3) that

$n^*_T(\tau) \mid n_T = m$ is distributed as a binomial random variable with parameters $m$ and $H_T(\tau)$,

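we obtain the following well-known result:

$$\begin{aligned}
P[n^*_T(\tau) = k] &= \sum_{j=0}^{\infty} \binom{k+j}{k} H_T(\tau)^{k} \bigl(1 - H_T(\tau)\bigr)^{j}\; P[n_T = k + j] \\
&= \sum_{j=0}^{\infty} \frac{(k+j)!}{k!\, j!} H_T(\tau)^{k} \bigl(1 - H_T(\tau)\bigr)^{j}\, \frac{\lambda_T^{k+j} e^{-\lambda_T}}{(k+j)!} \\
&= \frac{\bigl(\lambda_T H_T(\tau)\bigr)^{k} e^{-\lambda_T}}{k!} \sum_{j=0}^{\infty} \frac{\bigl(\lambda_T (1 - H_T(\tau))\bigr)^{j}}{j!} \\
&= \frac{\bigl(\lambda_T H_T(\tau)\bigr)^{k} e^{-\lambda_T H_T(\tau)}}{k!}, \qquad k = 0, 1, 2, \ldots
\end{aligned} \tag{26.4}$$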

Assumption 3 will hold if, for example, the length $s$ of interval $T$ is small enough that all events occurring in the interval have approximately the same probability of being reported and approximately the same probability of occurring at any time within $T$. From equation (26.4) we learn that the number of events that happened in $T$ and that are actually reported at $\tau$ is also Poisson distributed, but now with parameter $\lambda_T H_T(\tau)$. We can interpret $\lambda_T H_T(\tau)$ as the rate at which events in $T$ are reported at time $\tau$. This rate is the product of two factors: the rate of occurrence in the interval $T$ and the probability of recall at $\tau$ of a given occurrence. This Poisson distribution for the reported number of events in a certain interval can serve as a starting point for the modeling of memory effects in a particular survey.
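The two ingredients of the general model, averaging $h(u)$ over the lag interval as in (26.3) and thinning a Poisson count as in (26.4), can be checked by simulation. Below is a minimal Python sketch (standard library only) under purely illustrative assumptions: a hypothetical reporting function $h(u) = e^{-0.1u}$ with $u$ in months, and hypothetical values $\lambda_T = 4$, $t = 3$, $s = 1$ and $\tau = 6$, none of which come from the chapter. Each simulated event receives a uniform occurrence time in $T$ and is reported with probability $h$ of its elapsed time, so the reported counts should have mean and variance close to $\lambda_T H_T(\tau)$. The Poisson draws use Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def average_reporting_prob(h, t, s, tau, steps=10_000):
    """Equation (26.3): mean of h(u) over lags u in [tau - t - s, tau - t), midpoint rule."""
    lag_start = tau - t - s
    width = s / steps
    return sum(h(lag_start + (k + 0.5) * width) for k in range(steps)) / steps


def simulate_reported_counts(lam_T, h, t, s, tau, reps, rng):
    """Simulate the reported count n*_T(tau) for `reps` independent respondents."""
    counts = []
    for _ in range(reps):
        reported = 0
        for _ in range(poisson_draw(lam_T, rng)):   # true events in T
            occurrence = t + s * rng.random()        # uniform occurrence time in T
            if rng.random() < h(tau - occurrence):   # reported with prob h(elapsed time)
                reported += 1
        counts.append(reported)
    return counts


if __name__ == "__main__":
    def h(u):
        """Hypothetical reporting probability for an event u months old."""
        return math.exp(-0.1 * u)

    lam_T, t, s, tau = 4.0, 3.0, 1.0, 6.0   # hypothetical values
    H = average_reporting_prob(h, t, s, tau)
    counts = simulate_reported_counts(lam_T, h, t, s, tau, 20_000, random.Random(1))
    print(f"H_T(tau)       = {H:.4f}")
    print(f"lambda_T * H   = {lam_T * H:.4f}")
    print(f"simulated mean = {statistics.mean(counts):.4f}")
    print(f"simulated var  = {statistics.variance(counts):.4f}")
```

With these values $\lambda_T H_T(\tau) \approx 3.12$, and the simulated mean and variance should both land close to it, as the Poisson result (26.4) predicts.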

26.3.2 Possible Extensions of the General Model

For a particular survey the general model of subsection 26.3.1 can be specified in more detail. If the events are not too rare and sample sizes not too small, it can be used as a model at the individual level. Examples can be found in Sikkel (1985), where individual models are proposed for the reported numbers of contacts with the family doctor, the specialist and the dentist. In Sikkel (1990) some of these models are extended to study differences in medical consumption between several subpopulations. In those papers the intensity $\lambda(t)$ is treated as a Gamma-distributed latent random variable, instead of assuming $\lambda(t)$ to be constant over the individuals. For these models the integrated intensity $\lambda_T$ for any individual is simply the product of that individual's parameter value $\lambda(t)$ and the length $s$ of the time period $T$ under study. Several functional forms are proposed for the average forgetting probability $1 - H_T(\tau)$, which give alternative descriptions of the possible dependence of the memory effects on the time elapsed since the contact and on the number of previously reported contacts. Ultimately, the resulting Poisson-Gamma distributions are used to derive the distribution of so-called "profiles" of contacts. The models are estimated on retrospective data from a complete year of a continuing survey, to neutralize seasonal effects and to provide a gauge (i.e., no memory effects in the data on a short period preceding the interview). These models illustrate the necessity of the assumption that the events under study occur randomly. Several models for the contacts with the family doctor and the specialist performed very well, whereas all models failed hopelessly for the more regular contacts with the dentist.
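To see what a Gamma-distributed individual intensity does to the reported counts, here is a minimal Python sketch with hypothetical parameter values (shape $a = 2$, scale $\theta = 1.5$, $s = 1$, average reporting probability $H = 0.8$), not estimates from the cited papers. Given $\lambda$, the reported count is Poisson with mean $\lambda s H$; mixing over $\lambda$ gives marginal mean $a\theta sH$ and marginal variance $a\theta sH + a(\theta sH)^2$, the moments of a negative binomial distribution, so the counts are overdispersed relative to a single Poisson.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_mixture(shape, scale, s, H, n, rng):
    """Reported counts when each individual's intensity is Gamma(shape, scale)."""
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(shape, scale)        # individual intensity
        counts.append(poisson_draw(lam * s * H, rng))
    return counts


def mixture_moments(shape, scale, s, H):
    """Marginal mean and variance of the Poisson-Gamma (negative binomial) count."""
    c = scale * s * H
    mean = shape * c
    return mean, mean + shape * c * c


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    counts = simulate_mixture(2.0, 1.5, 1.0, 0.8, 40_000, random.Random(7))
    print("theoretical mean, variance:", mixture_moments(2.0, 1.5, 1.0, 0.8))
    print("simulated mean, variance:  ",
          statistics.mean(counts), statistics.variance(counts))
```

The variance exceeding the mean is the signature of individual heterogeneity that a plain Poisson model would miss.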

Whenever the number of reported events is rather small compared to the number of respondents or whenever the sample size is small, the general model can still be used, but now as a model at an aggregate level. Examples of this type of model can be found in van Dosselaar, et al. (1989a, 1989b), where several models for the reported numbers of transitions between the labor market states "employed" and "not employed" are proposed. Modified versions of these models and new estimation results will be presented in the next section of this chapter.

26.4 EXAMPLES FROM THE DUTCH CONTINUING LABOR FORCE SURVEY

26.4.1 Survey Design and Target Variables

In January 1987 the Netherlands Central Bureau of Statistics started the CLFS, which covers the noninstitutional population resident in the Netherlands. One of the survey goals is to publish figures on the labor market status (employed, unemployed and non-economically active) and on labor market dynamics. We will focus on transitions from one labor market status to another; for each transition the main aspects are the day of occurrence and the labor market status before and after the transition.

For the CLFS a stratified multistage sample (wave) of approximately 12,000 addresses is drawn every month; in July and August the sample size is half as large. Respondents are asked to give their employment histories for the year preceding the interview. For every respondent the past 12 months can be partitioned into periods of employment (up to a maximum of three jobs), unemployment, and inactivity. As a result of the one-year retrospection period and the periodicity of the survey, the waves partly overlap (see Figure 26.1). For every wave the interviews are distributed more or less homogeneously over the month. A respondent interviewed on, say, April 23, 1989 gives information on the period April 24, 1988–April 23, 1989. Hence, for any month under review, information is available from respondents from thirteen different waves, the first and last of which give information on only part of that month. The survey design is described in more detail in van Bastelaer (1988).

[Figure 26.1. Sample design of the continuing labor force survey. Rows: sample waves, January through the following March; columns: months under review (27 consecutive months, January through March). c = complete information available; p = only partial information available.]
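The overlap of the waves can be made concrete with a small Python sketch. It assumes that months are numbered consecutively and that each wave is numbered by its interview month; `retrospection_window` and `waves_covering` are hypothetical helpers, and the treatment of an interview on 29 February is our own assumption, which the chapter does not discuss.

```python
from datetime import date, timedelta


def retrospection_window(interview):
    """First and last day covered by a respondent interviewed on `interview`.

    The window runs from the day after the same calendar date one year
    earlier through the interview date itself.
    """
    try:
        same_day_last_year = interview.replace(year=interview.year - 1)
    except ValueError:
        # Assumption: an interview on 29 February is treated as if the
        # same day last year were 28 February.
        same_day_last_year = date(interview.year - 1, 2, 28)
    return same_day_last_year + timedelta(days=1), interview


def waves_covering(month):
    """Waves (numbered by interview month) whose window touches `month`.

    Wave `month` covers the month only up to its interview day and wave
    `month + 12` only after its interview day, so the first and last of
    these 13 waves give partial information.
    """
    return list(range(month, month + 13))


if __name__ == "__main__":
    print(retrospection_window(date(1989, 4, 23)))
    print(waves_covering(1))
```

For the example in the text, `retrospection_window(date(1989, 4, 23))` returns the pair April 24, 1988 and April 23, 1989.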

Respondents are asked what their current main activities are, whether they are looking for a job, and whether they would be available if offered a job. Since retrospective questions on these topics cannot be formulated in the same wording as the questions about current activities, validity problems other than recall errors may exist for retrospective data concerning availability and looking for a job. As availability and looking for a job are necessary conditions for the most frequently used definitions of unemployment, problems of validity may also exist for data concerning unemployment. Therefore, the models for memory effects have been developed only for the two-state process with states "employed" and "not employed." The latter category contains both unemployed and economically inactive respondents.

Even so, a validity problem remains, because the survey questionnaire allows for only the three most recent jobs, so that information on the period before the third job is missing for respondents who report this maximum. Although the respondents who report three jobs constitute on average only (roughly) 0.25 percent of the total response in a wave, they have a great influence on the transition data. This influence can be quantified, as the respondents who report this maximum have to answer an additional question about the number of jobs they have had in the past year besides those already mentioned. Exclusion of these "3+-jobs" appears to lead to a total number of reported transitions that is roughly 5 percent lower than it would have been had they been included in the survey setup. These unreported "3+-jobs," of course, are concentrated in the most distant part of the one-year retrospection period, where their omission leads to an extra decline in reported transitions of roughly 10 to 15 percent. Including this distant part of the retrospection period in the estimation process would result in a serious bias in the estimated memory effect parameters. To avoid the influence of this aspect of the survey setup on the estimation of the memory effect parameters, models have been developed and estimated only for numbers of transitions reported with a maximum time lag of six months.

A few words need to be said about the possible presence of telescoping effects, which are ignored by the general model of Section 26.3. One should distinguish between the telescoping of events into or out of the one-year retrospection period on the one hand, and the telescoping of events within the period on the other. The effects of telescoping of events into or out of the one-year retrospection period are likely to be removed by using only data from the most recent half of the retrospection period (for the CLFS, the six months immediately preceding the interview), as this type of telescoping is located around the boundary of the retrospection period. The effect of the misplacement of dates within the retrospection period (internal telescoping) depends mainly on the average direction of these misplacements. If internal telescoping is on average either forward or backward, the estimates of the monthly intensities of the Poisson process might be biased. Whenever internal telescoping is present but without a specific direction (the misplacement of a random event has zero expectation), it might increase the variances of the estimated intensities. There is some evidence (Mathiowetz, 1985) that both forward and backward telescoping are present in labor market data, with magnitudes that are about equal, so that we can expect the estimated intensities to be nearly unbiased.

The time period under study runs from June 16, 1987 to June 15, 1988, divided into 12 one-month periods. Although the interviews are roughly equally distributed over the month for every sample wave, we will treat all respondents from a monthly sample as if they were interviewed on the 15th of that month.

Figure 26.2. Month of interview by month of reported event and time lag. Rows: time lag from date of report ($j$); columns: month of reported event ($i$); entries: month of interview ($i + j - 1$).

| j \ i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| 2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |

In this section we will only use data from respondents who have reported at least one transition in the one-year retrospection period, assuming that respondents who do not report a transition have responded correctly. This assumption is based on the notion that, for most people, matters of employment and unemployment are of great importance. Moreover, a single labor market transition in the past year can be considered a very salient event, not likely to be forgotten.

26.4.2 Notation and Model Specifications

The two states of the process are indexed by $x$, where $x = 0$ stands for labor market status "not employed" and $x = 1$ for labor market status "employed." The one-month periods are denoted by $T_i$ ($i = 1, 2, \ldots, 12$), where $T_1$ stands for the period "June 16, 1987–July 15, 1987", $T_2$ for the period "July 16, 1987–August 15, 1987", and so on. The different time lags with which the numbers in the two-way table are measured are indexed by $j$ ($j = 1, 2, \ldots, 6$), where $j$ stands for a time lag (in months) in the range $[j - 1, j)$, the month of interview being $i + j - 1$. Figure 26.2 displays the months of interview classified by month of reported event and time lag from date of report. Note that all pairs $(i, j)$ for which $i + j - 1$ takes on the same value concern the same wave, from now on denoted by $S_{i+j-1}$. In this situation we have $2 \times 12 \times 6$ "general models": for each state $x$ there are twelve intervals $T_i$, the numbers of transitions in each of which are reported by six different sample waves $S_{i+j-1}$.
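The bookkeeping of the $2 \times 12 \times 6$ table can be sketched in a few lines of Python. The names `wave`, `cells` and `by_wave` are hypothetical; the sketch only encodes the rule that cell $(x, i, j)$ is reported by wave $S_{i+j-1}$, so the cells sharing a wave lie on a diagonal of the table.

```python
from collections import defaultdict

STATES = (0, 1)          # 0 = not employed, 1 = employed
MONTHS = range(1, 13)    # i: T_1 = June 16-July 15, 1987, ..., T_12
LAGS = range(1, 7)       # j: time lag in months, in the range [j - 1, j)


def wave(i, j):
    """Interview month (wave index) of the sample reporting on T_i with lag j."""
    return i + j - 1


# Every cell (x, i, j) of the table of reported transitions.
cells = [(x, i, j) for x in STATES for i in MONTHS for j in LAGS]

# Group the cells by the wave S_{i+j-1} that reports them.
by_wave = defaultdict(list)
for x, i, j in cells:
    by_wave[wave(i, j)].append((x, i, j))

if __name__ == "__main__":
    print(len(cells), "cells from", len(by_wave), "waves")  # 144 cells from 17 waves
    # Reproduce the layout of Figure 26.2: rows = lag j, columns = month i.
    for j in LAGS:
        print(j, [wave(i, j) for i in MONTHS])
```

The first and last waves each report on only one month ($i = j = 1$ and $i = 12$, $j = 6$), while the middle waves contribute a full diagonal of six months.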

Let $n_{xij}$ be the number of reported transitions from state $x$ in $T_i$ for wave $S_{i+j-1}$, and let $\tilde{n}_{xij}$ be the corresponding "true" number of transitions. The $n_{xij}$ coincide with $n^*_{xT_i}(i + j - 1)$ as defined in the general model, and the $\tilde{n}_{xij}$ coincide with $n_{xT_i}(i + j - 1)$, where the argument $i + j - 1$ is added to identify this particular group of respondents. Note that there was no need for this extra argument identifying the sample in the definition of the general model, since the general model is defined for one sample only. Now, however, we have $2 \times 12 \times 6$ "general models" with data from 17 ($= 12 + 6 - 1$) different samples, which have to be identified. These 144 general models are extended by specifying the integrated intensities $\lambda_{xT_i}$ and the reporting probabilities $H_{xT_i}(i + j - 1)$ in more detail.

The intensity $\lambda_x(t)$ for a randomly selected individual is approximated by the constant $\lambda_{xi}$ for $t \in T_i$, so that the integrated intensity $\lambda_{xT_i}$ for the sample that reports on $T_i$ in interview month $i + j - 1$ is the product of $\lambda_{xi}$ and the total time spent in state $x$ during $T_i$, aggregated over all the respondents from that sample. This total time spent in $x$ during $T_i$ by $S_{i+j-1}$ is denoted by $t_{xij}$. In the sequel these $t_{xij}$ are called durations (note that these durations are not related to individual job length!).

The probability of reporting a transition from state $x$ in period $T_i$ that occurred between $j - 1$ and $j$ months ago corresponds to $H_{xT_i}(i + j - 1)$ in the general model. Recall that in the general model the reporting probability is defined to depend only on the time elapsed since the event. Hence $H_{xT_i}(i + j - 1)$ depends only on the time lag $j = (i + j - 1) - i + 1$, and we will denote this conditional probability by $H_{xj}$.

For HXj the following functional forms are considered:

1. $H_{xj} = H_{xj}$: unrestricted memory effects ($H_{x1} \equiv 1$);

2. $H_{xj} = 1 - \beta_x (j - \tfrac{1}{2})$: linear memory effects;

3. $H_{xj} = \exp[-\beta_x (j - \tfrac{1}{2})]$: exponential memory effects;

4. $H_{xj} = H_j$: state-independent memory effects ($H_1 \equiv 1$);

5. $H_{xj} = 1 - \beta (j - \tfrac{1}{2})$: state-independent linear memory effects;

6. $H_{xj} = \exp[-\beta (j - \tfrac{1}{2})]$: state-independent exponential memory effects;

7. $H_{xj} = 1$: no memory effects.

Models 1 and 4 are not based on a continuous function on the time axis. They simply give the reporting rate in time-lag period $j$ relative to time-lag period one. To provide a gauge, $H_{x1}$ (model 1) and $H_1$ (model 4) are fixed at one. The linear models are obtained by applying equation (26.3) with $h_x(u) = 1 - \beta_x u$. The exponential models are close approximations of the functions obtained when applying equation (26.3) with $h_x(u) = \exp(-\beta_x u)$ (the correct functions obtained by this procedure will be derived in subsection 26.4.4).
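The link between $h_x(u)$ and $H_{xj}$ can be checked numerically. The sketch below is our own computation, not necessarily the form derived in subsection 26.4.4: it averages $h(u)$ over the lag interval $[j - 1, j)$ as in (26.3), with a hypothetical $\beta = 0.03$. For the linear $h$ the average is exactly $1 - \beta(j - \tfrac{1}{2})$; for the exponential $h$ the exact average divided by $\exp[-\beta(j - \tfrac{1}{2})]$ equals $\sinh(\beta/2)/(\beta/2) \approx 1 + \beta^2/24$ for every $j$, which is why the listed exponential form is a close approximation when $\beta$ is small.

```python
import math


def linear_H(beta, j):
    """Models 2 and 5: average of h(u) = 1 - beta*u over the lag interval [j - 1, j)."""
    return 1.0 - beta * (j - 0.5)


def exponential_H_approx(beta, j):
    """Models 3 and 6 as listed: exp(-beta*(j - 1/2))."""
    return math.exp(-beta * (j - 0.5))


def exponential_H_exact(beta, j):
    """Equation (26.3) with h(u) = exp(-beta*u) averaged over [j - 1, j)."""
    if beta == 0.0:
        return 1.0
    return (math.exp(-beta * (j - 1)) - math.exp(-beta * j)) / beta


if __name__ == "__main__":
    beta = 0.03  # hypothetical value, for illustration only
    for j in range(1, 7):
        print(j,
              round(linear_H(beta, j), 4),
              round(exponential_H_approx(beta, j), 6),
              round(exponential_H_exact(beta, j), 6))
```

Because the ratio does not depend on $j$, the approximation shifts all $H_{xj}$ by the same tiny factor rather than distorting their pattern over the time lags.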

Under the model the $n_{xij}$ are distributed according to a Poisson distribution which, from (26.4), has parameter $\lambda_{xi} t_{xij} H_{xj}$ ($x = 0, 1$; $i = 1, 2, \ldots, 12$; $j = 1, 2, \ldots, 6$). This is also the expectation and the variance of $n_{xij}$ under the model. For the estimation of the model parameters by the maximum likelihood method we need the joint likelihood for the entire $2 \times 12 \times 6$ table of reported transitions. By treating all $n_{xij}$ as if they were mutually independent, this joint likelihood is easily obtained as the product of the 144 marginal likelihoods. This assumed independence is guaranteed for data from different sample waves. Under the Poisson process the numbers on a wave diagonal are independent for each state $x$. The main problem that could arise with respect to the assumption of independence is the possible dependence of the pairs $(n_{0ij}, n_{1ij})$. One way to overcome this problem is to assume a special form of the bivariate Poisson distribution (e.g., Section 11.4 of Johnson and Kotz, 1969) for the pairs $(n_{0ij}, n_{1ij})$, and to write the joint likelihood for the entire table as the product over $i$ and $j$ of these bivariate distributions. However, for reasons of simplicity we choose to assume complete independence and write the joint likelihood as the product of the marginal distributions over all 144 cells. After estimating the model parameters, we will calculate Spearman's rank correlation coefficient for the paired normalized residuals of $n_{0ij}$ and $n_{1ij}$ to test whether these are correlated. The maximum likelihood procedure is carried out by maximizing the natural logarithm of the joint likelihood, using the Newton-Raphson algorithm.
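A minimal Python sketch of Newton-Raphson maximum likelihood for one of these models, model 6 (state-independent exponential), with Poisson means $\lambda_{xi} t_{xij} \exp[-\beta(j - \tfrac{1}{2})]$. This is a simplified variant of our own, not the author's program: for fixed $\beta$ the likelihood is maximized by $\lambda_{xi} = N_{xi}/S_{xi}(\beta)$, with $N_{xi} = \sum_j n_{xij}$ and $S_{xi}(\beta) = \sum_j t_{xij} e^{-\beta(j - 1/2)}$, so Newton steps are needed only for $\beta$ on the resulting profile log-likelihood, which gives the same maximum likelihood estimates as maximizing over all parameters jointly. Its score is $\sum N_{xi} m_{xi}(\beta) - \sum n_{xij}(j - \tfrac{1}{2})$ and its negative second derivative is $\sum N_{xi} v_{xi}(\beta) \geq 0$, where $m_{xi}$ and $v_{xi}$ are the mean and variance of the lag midpoints under weights proportional to $t_{xij} e^{-\beta(j - 1/2)}$. The data layout (dictionaries keyed by $(x, i)$) is an assumption.

```python
import math

LAG_MIDPOINTS = [j - 0.5 for j in range(1, 7)]   # lag j = 1..6, midpoint j - 1/2


def fit_common_beta(counts, durations, beta=0.0, tol=1e-12, max_iter=50):
    """Newton-Raphson ML fit of model 6 via the profile log-likelihood in beta.

    counts[g][j-1] and durations[g][j-1] hold n_xij and t_xij for the group
    g = (x, i) and lag j = 1..6.  Returns beta and the profiled lambda_xi.
    """
    c = LAG_MIDPOINTS
    total_nc = sum(n * cj for g in counts for n, cj in zip(counts[g], c))
    for _ in range(max_iter):
        score, info = -total_nc, 0.0
        for g in counts:
            w = [t * math.exp(-beta * cj) for t, cj in zip(durations[g], c)]
            s = sum(w)
            m1 = sum(wj * cj for wj, cj in zip(w, c)) / s
            m2 = sum(wj * cj * cj for wj, cj in zip(w, c)) / s
            n_g = sum(counts[g])
            score += n_g * m1              # first derivative of the profile log-likelihood
            info += n_g * (m2 - m1 * m1)   # minus its second derivative (>= 0)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    lam = {g: sum(counts[g]) / sum(t * math.exp(-beta * cj)
                                   for t, cj in zip(durations[g], c))
           for g in counts}
    return beta, lam


if __name__ == "__main__":
    true_beta = 0.03   # hypothetical value
    durations = {(x, i): [100.0 + 5 * i + 3 * j for j in range(1, 7)]
                 for x in (0, 1) for i in range(1, 13)}
    # "Counts" set equal to the model means, so the fit should recover true_beta.
    counts = {g: [0.02 * (1 + g[0]) * t * math.exp(-true_beta * cj)
                  for t, cj in zip(durations[g], LAG_MIDPOINTS)]
              for g in durations}
    beta_hat, lam_hat = fit_common_beta(counts, durations)
    print(round(beta_hat, 6))   # prints 0.03
```

Because the profile log-likelihood is concave in $\beta$, Newton-Raphson started at $\beta = 0$ converges in a few steps; the state-dependent models would need a vector of $\beta_x$ and a small matrix solve instead.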

To estimate the model parameters we need the "true" durations. Inspection of the reported durations $t_{xij}$, however, reveals a systematic effect of time lag: for $x = 0$ (not employed) the reported durations increase and for $x = 1$ (employed) they decrease as a function of $j$. Under assumption 1, the main explanation for this phenomenon is that respondents fail to report episodes of work. This explanation is consistent with the cognitive tasks the respondents have to perform during the interview: they first have to remember whether or not they had a job in a certain time period, before the more detailed information on the dates of job commencement and job termination is requested. The recall effects in reported durations are dealt with in two different ways. In subsection 26.4.3 the reported durations are transformed to eliminate the effects of time lag, and the models are estimated using these corrected durations. In subsection 26.4.4 a form of time correction is built into the model, and estimation takes place using the original reported durations.

26.4.3 The Models Estimated with Transformed Durations

In this subsection models 1–6 are estimated using transformed durations. The $t_{1ij}$ are transformed to $\tilde{t}_{1ij}$ by dividing them by a factor $\delta_j$. The $\tilde{t}_{0ij}$ then are equal to $t_{0ij} + t_{1ij} - \tilde{t}_{1ij}$, so that the total time spent in the two states is unchanged. The factors $\delta_j$ are calculated from a $12 \times 6$ table of reported durations $t^{(w)}_{1j}$ by wave and time lag according to equation (26.5), using reported durations from the first 12 waves:

$$\delta_j = \frac{\sum_{w=1}^{12} t^{(w)}_{1j}}{\sum_{w=1}^{12} t^{(w)}_{11}}, \qquad j = 1, 2, \ldots, 6 \tag{26.5}$$

where $t^{(w)}_{1j}$ stands for the state 1 durations in the $j$th month before the interview as reported by wave $w$. Note that $t^{(w)}_{1j}$ coincides with $t_{1(w-j+1)j}$ for all pairs $(w, j)$ for which $w - j + 1$ is one of the integers $1, 2, \ldots, 12$. Both numerator and denominator in equation (26.5) contain the reported durations for 12 calendar months, so that seasonal effects are neutralized. To understand the rationale behind the $\delta_j$, divide both numerator and denominator in equation (26.5) by 12. Then the numerator gives the average duration in state 1 in a one-month period, measured with a time lag in months in the range $[j - 1, j)$. The denominator now gives this average duration measured with a time lag in months in the range $[0, 1)$. Hence $\delta_j$ describes the effect of the time lag on the average reported durations, and by applying this correction factor we artificially inflate the durations from time-lag period $j$ to bring them to the level of the durations reported in the first month before the interview, assuming that these are (almost) unaffected by recall error. The resulting vector $\delta$ of correction factors $\delta_j$ is (1, 0.971, 0.948, 0.918, 0.893, 0.863), clearly showing that reported durations decline with increasing time lag.

Table 26.1. Model Fits for Different Models for $H_{xj}$

| Model for memory effects | $G^2$ | df | p-value |
|---|---|---|---|
| 1: Unrestricted | 137.83 | 110 | 0.037 |
| 2: Linear | 147.89 | 118 | 0.033 |
| 3: Exponential | 146.28 | 118 | 0.040 |
| 4: State-independent | 138.63 | 115 | 0.067 |
| 5: State-independent linear | 147.93 | 119 | 0.037 |
| 6: State-independent exponential | 146.31 | 119 | 0.046 |
| 7: No memory effects | 223.76 | 120 | 0.000 |
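The two-step correction can be sketched in Python as follows. The function names and the list-of-lists layout are assumptions; `correction_factors` implements (26.5) on a 12 × 6 table of state 1 durations (rows = waves, columns = time lags), and `transform_durations` divides $t_{1ij}$ by $\delta_j$ and adjusts $t_{0ij}$ so that $t_{0ij} + t_{1ij}$ is preserved. The durations in the usage example are made up, shrinking by 3 percent per month of lag.

```python
def correction_factors(t1_by_wave_lag):
    """Equation (26.5): delta_j = sum_w t1[w][j] / sum_w t1[w][1].

    t1_by_wave_lag[w][j-1] holds the state-1 duration reported by wave w
    for the jth month before the interview (12 waves x 6 lags).
    """
    n_lags = len(t1_by_wave_lag[0])
    col_sums = [sum(row[j] for row in t1_by_wave_lag) for j in range(n_lags)]
    return [col / col_sums[0] for col in col_sums]


def transform_durations(t0, t1, delta):
    """Inflate state-1 durations by 1/delta_j and keep t0 + t1 fixed.

    t0[i][j-1], t1[i][j-1]: reported durations for month i and lag j.
    """
    t1_new = [[v / d for v, d in zip(row, delta)] for row in t1]
    t0_new = [[a + b - c for a, b, c in zip(r0, r1, r1n)]
              for r0, r1, r1n in zip(t0, t1, t1_new)]
    return t0_new, t1_new


if __name__ == "__main__":
    # Made-up reported durations that shrink by 3 percent per month of lag.
    table = [[(900 + 10 * w) * (1 - 0.03 * (j - 1)) for j in range(1, 7)]
             for w in range(1, 13)]
    delta = correction_factors(table)
    print([round(d, 3) for d in delta])   # [1.0, 0.97, 0.94, 0.91, 0.88, 0.85]
```

Since every wave in the made-up table declines by the same proportion, the factors recover that decline exactly; with real data the column sums over 12 calendar months average out the seasonal pattern.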

Models 1-6 are estimated for a 2 × 12 × 6 table of reported transitions $n_{xij}$ with a corresponding table of transformed durations $\tilde t_{xij}$. The model without memory effects (model 7) is estimated with untransformed durations. Table 26.1 gives the likelihood ratio test statistic $G^2$ with the corresponding degrees of freedom and p-values for all seven models.

Examining Table 26.1, we see that the p-values for all models except the one without memory effects (which has a p-value of 0.000) lie between 0.033 and 0.067, that is, around 0.05. The p-value for the no-memory-effects model indicates that memory effects are present in the data. Comparing this model with the state-independent exponential model, we see that adding one parameter describing the memory effects decreases the likelihood ratio statistic $G^2$ by about 77 (from 223.76 to 146.31). The difference between the two models is highly significant. We also see that the state-independent models do not differ significantly from their state-dependent counterparts, which was not the case in van Dosselaar et al. (1989a, 1989b), where untransformed durations were used to estimate the models.
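To make these comparisons concrete, here is a Python sketch that refers the differences in $G^2$ from Table 26.1 to chi-square distributions. The helper `chi2_sf` is our own (an exact recurrence for integer degrees of freedom, not a library routine); applied to each model's $G^2$ and df it should approximately reproduce the p-values in the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the regularized
    upper incomplete gamma function Q, with y = x / 2, starting from
    Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# (G^2, df) for the seven models of Table 26.1
fits = {1: (137.83, 110), 2: (147.89, 118), 3: (146.28, 118), 4: (138.63, 115),
        5: (147.93, 119), 6: (146.31, 119), 7: (223.76, 120)}

# Goodness-of-fit p-values of the individual models
fit_p = {m: chi2_sf(g2, df) for m, (g2, df) in fits.items()}

# Nested comparisons (restricted model, more general model): the drop in G^2 is
# referred to a chi-square distribution with the difference in df.
lr_tests = {}
for restricted, general in [(7, 6), (4, 1), (5, 2), (6, 3)]:
    (g_r, df_r), (g_g, df_g) = fits[restricted], fits[general]
    lr_tests[(restricted, general)] = (g_r - g_g, df_r - df_g, chi2_sf(g_r - g_g, df_r - df_g))

for pair, (d_g2, d_df, p) in lr_tests.items():
    print(pair, round(d_g2, 2), d_df, f"{p:.3g}")
```

The drop of 77.45 on 1 df between models 7 and 6 is overwhelming, whereas the drops of 0.80 (5 df), 0.04 and 0.03 (1 df each) when moving to the state-independent models are far from significant.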


The values of $G^2$ for the models that incorporate memory effects are still rather high, which could be due to a wave effect not accounted for by the model. For some waves the numbers of reported transitions tend to be larger than average and for others smaller than average, probably because of the number of respondents in the wave who change jobs frequently. An analysis of normalized residuals by wave shows that there are indeed some waves for which the residuals differ significantly from the average.

Other possible explanations for high $G^2$ values include telescoping effects (i.e., misreporting of dates of employment); differences in the distributions of interview days over the interview months, possibly resulting in misclassification of transitions with respect to the periods $T_i$; misclassification error in the classification of employment; interactions between calendar period and time lag; or the possible invalidity of assumption 2. Neglecting the complex design (see Chapter 31) or possible dependencies in the data (see Gleser and Moore, 1985) could also have led to inflated $G^2$ values.

Spearman's rank correlation coefficient $\rho$ for the paired normalized residuals of $n_{0ij}$ and $n_{1ij}$ is roughly −0.05 for models 1 and 2, roughly 0.001 for models 3 through 6, and 0.26 for model 7. Performing Spearman's rank correlation test for independence, we see that the corresponding values of the test statistic $\rho\,(71)^{1/2}$ lie outside the critical region (with $\alpha = 0.05$) for models 1-6. From these results it would seem that the assumed independence between $n_{0ij}$ and $n_{1ij}$ holds rather well for the models that fit the data best.
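The test can be sketched in Python as below. We assume the statistic $\rho\,(n-1)^{1/2}$ is referred to the standard normal distribution, with $n = 72$ pairs (one per period and time-lag cell of the 12 × 6 table), which matches the factor $71^{1/2}$ in the text; tied values receive average ranks. The function names are ours.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def spearman_z(rho, n_pairs):
    """Large-sample statistic rho * sqrt(n - 1), roughly N(0, 1) under independence."""
    return rho * math.sqrt(n_pairs - 1)


n_pairs = 12 * 6  # one pair of residuals per (period, time lag) cell
print(round(spearman_z(0.26, n_pairs), 2))   # 2.19: beyond 1.96 (model 7)
print(round(spearman_z(-0.05, n_pairs), 2))  # -0.42: within +-1.96 (models 1 and 2)
```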

Table 26.2 shows the parameter estimates that describe the memory effects. The differences between the memory effect parameters for state 0 and state 1 in the state-dependent models are not significant. Table 26.3 contains the parameter estimates for $\lambda_{0i}$ for all seven models and for a "model free" situation. This "model free" estimate is given by $n_{0i1}/t_{0i1}$ and can be regarded as an estimate of the intensity when only the most recent data on period $T_i$ are used. Under the assumption of no memory effects in the month preceding the interview, these "model free" estimates are unbiased. Their standard errors, given by $n_{0i1}^{1/2}/t_{0i1}$, are however rather high. Looking at the estimates of $\lambda_{0i}$ for models 1-6, we see that their values deviate neither much nor systematically from the "model free" ones, but they have considerably smaller standard errors. Standard errors of estimates are computed as the square roots of the diagonal elements of the estimated covariance matrix of the parameter vector, which is equal to minus the inverse of the matrix of second-order derivatives of the log-likelihood with respect to the model parameters. The estimated values for model 7 are clearly biased downwards.
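For a single Poisson count, the general recipe just described (minus the inverse of the second derivative of the log-likelihood) reduces to the familiar standard error of the model free estimate. A small Python check, assuming $n_{0i1}$ is Poisson with mean $\lambda_{0i} t_{0i1}$; the cell values are hypothetical.

```python
import math


def model_free_estimate(n, t):
    """Model-free intensity n / t from the most recent month, with standard error sqrt(n) / t."""
    return n / t, math.sqrt(n) / t


def hessian_standard_error(n, t, lam):
    """Standard error from minus the inverse second derivative of the Poisson
    log-likelihood l(lam) = n * log(lam * t) - lam * t, where l''(lam) = -n / lam**2."""
    second_derivative = -n / lam ** 2
    return math.sqrt(-1.0 / second_derivative)


# Hypothetical cell: 25 reported transitions over 5000 units of exposure time.
lam_hat, se = model_free_estimate(25, 5000.0)
print(lam_hat, se)  # 0.005 0.001
```

At $\hat\lambda = n/t$ the second derivative equals $-t^2/n$, so both routes give $n^{1/2}/t$; the relative standard error $n^{-1/2}$ explains why estimates based on one month of data are imprecise.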


Table 26.2. Parameter Estimates for Different Models for $H_{xj}$

Model for memory effects            Parameter     Estimate            Standard error
                                                  state 0   state 1   state 0   state 1
1: Unrestricted                     $H_{x2}$      0.936     0.904     0.052     0.045
                                    $H_{x3}$      0.814     0.833     0.046     0.042
                                    $H_{x4}$      0.803     0.791     0.046     0.040
                                    $H_{x5}$      0.806     0.789     0.046     0.039
                                    $H_{x6}$      0.761     0.747     0.044     0.037
2: Linear                           $\beta_x$     0.044     0.046     0.007     0.006
3: Exponential                      $\beta_x$     0.053     0.055     0.010     0.009
4: State-independent                $H_2$         0.918               0.034
                                    $H_3$         0.825               0.031
                                    $H_4$         0.796               0.030
                                    $H_5$         0.796               0.030
                                    $H_6$         0.753               0.028
5: State-independent linear         $\beta$       0.046               0.005
6: State-independent exponential    $\beta$       0.054               0.006

Note: For the state-independent models 4-6 a single estimate applies to both states.

Table 26.3. Parameter Estimates for $\lambda_{0i}$ in Different Models for Memory Effects (Standard Errors in Parentheses)

Model for $H_{xj}$: 1 = Unrestricted, 2 = Linear, 3 = Exponential, 4 = State-independent, 5 = State-independent linear, 6 = State-independent exponential, 7 = No memory effects

Parameter         Model free   1          2          3          4          5          6          7
$\lambda_{01}$    464 (85)     477 (35)   468 (32)   474 (33)   480 (32)   470 (31)   476 (31)   363 (23)
$\lambda_{02}$    876 (117)    884 (52)   865 (45)   876 (47)   889 (46)   869 (42)   879 (43)   676 (30)
$\lambda_{03}$    736 (76)     805 (45)   792 (39)   801 (41)   810 (40)   794 (37)   804 (38)   636 (28)
$\lambda_{04}$    393 (54)     424 (29)   418 (27)   422 (28)   426 (27)   419 (26)   424 (27)   333 (20)
$\lambda_{05}$    462 (58)     431 (30)   425 (28)   429 (29)   434 (28)   426 (27)   431 (27)   337 (20)
$\lambda_{06}$    370 (56)     411 (29)   405 (27)   410 (28)   414 (27)   407 (26)   411 (26)   321 (20)
$\lambda_{07}$    597 (66)     626 (37)   617 (33)   624 (35)   630 (34)   619 (31)   626 (32)   494 (24)
$\lambda_{08}$    330 (50)     333 (25)   328 (23)   332 (24)   335 (23)   329 (22)   333 (23)   267 (18)
$\lambda_{09}$    296 (45)     301 (24)   296 (22)   299 (23)   303 (22)   297 (21)   301 (22)   245 (17)
$\lambda_{0,10}$  343 (49)     302 (23)   298 (22)   301 (22)   304 (22)   299 (21)   302 (22)   241 (17)
$\lambda_{0,11}$  385 (54)     328 (24)   325 (23)   329 (23)   330 (22)   326 (22)   330 (22)   260 (17)
$\lambda_{0,12}$  334 (50)     284 (22)   281 (21)   284 (21)   286 (21)   282 (20)   286 (20)   222 (15)

Note: Table entries have been multiplied by $10^5$.


The estimation results presented in this subsection show that memory effects are present in these retrospective data on transitions and that models for memory effects can be used to obtain estimates for the intensities of the Poisson process generating the "true" number of transitions. Further, the estimates are efficient and nearly unbiased when compared to the "model free" estimates. The estimated memory effect parameters for the state-dependent models do not show a significant difference between the values for state 0 and state 1. These results are encouraging and support our theory that respondents mainly fail to report whole periods of employment.

We have estimated our models in a two-step procedure with the aid of transformed durations. To estimate and test these models properly, and to examine whether the decline in reported employment durations can be linked to the decline in reported jobs, new models incorporating a duration correction are necessary. The mathematical derivation of such models is rather complicated. Therefore we will perform this task for the state-independent exponential memory effects model only. We have chosen this particular model for both practical and theoretical reasons: its fit is as good or as bad as that of the others, but it uses only one degree of freedom in modeling the memory effects; the state independence is consistent with the respondents' cognitive task; the exponential forgetting curve is widely used in cognitive psychology; and, finally, the form of the function ensures that the estimated probabilities $H_{xj}$ will always be within the range [0, 1].

26.4.4 The State-Independent Exponential Memory Effects Model with Incorporated Time Correction

In this subsection the state-independent exponential memory effects model with incorporated time correction is developed, estimated and interpreted. The specification of the reporting probability is derived again in complete agreement with the general model, whereas it remained an approximation in the corresponding model of the previous subsection. The decline in employment durations will be accounted for by the model, so that the model can be estimated with untransformed data in a one-step procedure.

For the derivation of this model, let us first examine what happens when an event occurs at time $t$. At time $t$ the event is placed in memory, and we want respondents to retrieve the information on the event under certain interview conditions at time $\tau > t$. Human memory is fallible, and we assume that for every event experienced by our randomly selected respondent there is a time span $\Delta t$ after which the event is not retrieved from memory under the realized interview conditions of the survey. The exponential memory effects model states that $\Delta t$ is exponentially distributed on the positive real numbers with parameter $\beta > 0$, that is, $f_\beta(\Delta t) = \beta \exp(-\beta \Delta t)$. The probability that an event is reported $u$ time units after its occurrence is given by $h(u) = \exp(-\beta u)$, which is obtained by integrating the density $f_\beta$ of $\Delta t$ over the interval $(u, \infty)$. This procedure can be applied with distributions $f(\Delta t)$ other than the exponential, and the general model thus can be made even more general by specifying $f(\Delta t)$ instead of $h(u)$, which can be derived from $f(\Delta t)$. In complete agreement with the general model, the (integrated) probability $H_j$ (the subscript $x$ for state has been omitted) of reporting an event that happened between $j - 1$ and $j$ months ago is given by the following equation:

$$H_j = \int_{j-1}^{j} e^{-\beta u}\,du = \frac{e^{-\beta(j-1)} - e^{-\beta j}}{\beta} \tag{26.6}$$
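A quick numerical check of (26.6) in Python, using the estimate $\hat\beta = 0.0539$ reported in Table 26.4 (helper names are ours). Besides matching a midpoint-rule integral of $h(u)$, the closed form shows that successive $H_j$ differ by the constant factor $e^{-\beta}$.

```python
import math


def reporting_probability(u, beta):
    """h(u) = exp(-beta * u): probability that an event u months old is still reported."""
    return math.exp(-beta * u)


def integrated_reporting_probability(j, beta):
    """H_j of equation (26.6): integral of h(u) over the month (j - 1, j)."""
    return (math.exp(-beta * (j - 1)) - math.exp(-beta * j)) / beta


def midpoint_integral(f, a, b, steps=10_000):
    """Plain midpoint rule, used here only to cross-check the closed form."""
    width = (b - a) / steps
    return width * sum(f(a + (k + 0.5) * width) for k in range(steps))


beta = 0.0539  # estimate reported in Table 26.4
H = [integrated_reporting_probability(j, beta) for j in range(1, 7)]
numeric = [midpoint_integral(lambda u: reporting_probability(u, beta), j - 1, j)
           for j in range(1, 7)]
```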

For the estimation of the error in the measured durations, write these durations $\hat t_{xij}$ as $\hat p_{xij} m_{ij}$, where $m_{ij}$ is equal to $\hat t_{0ij} + \hat t_{1ij}$ and $\hat p_{xij}$ stands for the proportion of $m_{ij}$ spent in state $x$. Corresponding with this decomposition, $t_{xij}$ should be written as $p_{xij} m_{ij}$, where $p_{xij}$ stands for the "true" proportion of $m_{ij}$ spent in state $x$. To model the deviations of $\hat p_{xij}$ from $p_{xij}$, we will treat these proportions as measured and "true" probabilities, respectively, of occupying state $x$ at time $t$ for a randomly selected respondent.

Let $\tau$ be the time of interview for such a randomly selected individual. Denote the "true" probability of occupying state $x$ at time $t$ by $p_x(t)$ and the probability of occupying state $x$ at time $t$ measured at time $\tau$ by $p_x(t;\tau)$. Under the assumption that reported jobs are remembered correctly, we can write:

$$p_0(t;\tau) = p_0(t) + P[X(t;\tau) = 0,\ X(t) = 1] \tag{26.7}$$

where $X(t;\tau)$ and $X(t)$ denote the measured and true labor market position, respectively, of a randomly selected respondent at time $t$. In words, (26.7) states that the probability of a report at time $\tau$ of unemployment at time $t$ is equal to the probability of truly being unemployed at time $t$ plus the probability of truly being employed but falsely reporting unemployment at time $t$. The second probability on the right of equation (26.7) can be related to the reporting probability $h(u) = \exp(-\beta u)$ with $u$ in the interval $(0, \tau - t)$. With $X(t) = 1$, our randomly selected respondent has a job at time $t$. Let $W_t$ be the waiting time between $t$ and the last day of this work period, and let $a_t(\cdot)$ be the density of $W_t$. The second probability on the right of equation (26.7) can now be written as


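$$\begin{aligned}
P[X(t;\tau) = 0,\ X(t) = 1]
&= \int_{0}^{\tau-t} P[X(t;\tau) = 0 \mid X(t) = 1,\ W_t = s]\; a_t(s)\; P[X(t) = 1]\, ds \\
&= \int_{0}^{\tau-t} \bigl(1 - h(\tau - t - s)\bigr)\, a_t(s)\, p_1(t)\, ds
\end{aligned} \tag{26.8}$$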

The probabilities are only integrated over the interval $[0, \tau - t)$ because values of $s$ greater than $\tau - t$ imply that the job held at time $t$ is still held on the day of interview and is therefore reported with probability one. Substituting equation (26.8) into equation (26.7) and rewriting the resulting equation in terms of $p_1(t;\tau)$ and $p_1(t)$ leads to the following equation linking the measured probability of being in state 1 to the "true" probability of being in that state:

$$p_1(t;\tau) = p_1(t)\left[1 - \int_{0}^{\tau-t} a_t(s)\bigl(1 - h(\tau - t - s)\bigr)\, ds\right] \tag{26.9}$$

For our purpose of time correction it will be sufficient to have a waiting time distribution with only one parameter, independent of $t$. This implies that, for the moment, we will consider the process that governs job ending to be homogeneous (i.e., independent of calendar time). For such a process, it can be shown (see Cox and Lewis, 1966, Section 4.2) that the waiting time at any randomly selected time point $t$ with $X(t) = 1$ is exponentially distributed with parameter $\alpha > 0$. Substituting $a_t(s) = \alpha \exp(-\alpha s)$ and $h(\tau - t - s) = \exp(-\beta(\tau - t - s))$ in equation (26.9) and integrating the resulting function over the interval $[0, \tau - t)$ yields equation (26.10), which gives an expression for $p_1(t;\tau)$ in terms of $p_1(t)$, $\alpha$, $\beta$, and $\tau - t$, viz.

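$$p_1(t;\tau) = p_1(t)\left(\frac{\alpha}{\alpha-\beta}\, e^{-\beta(\tau-t)} - \frac{\beta}{\alpha-\beta}\, e^{-\alpha(\tau-t)}\right) \tag{26.10}$$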
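The integration step behind (26.10) is not spelled out in the text. Writing $d = \tau - t$ and assuming $\alpha \neq \beta$, it can be checked as follows (our own working, offered as a sketch):

$$\begin{aligned}
\int_{0}^{d} \alpha e^{-\alpha s}\bigl(1 - e^{-\beta(d-s)}\bigr)\,ds
&= \bigl(1 - e^{-\alpha d}\bigr) - \alpha e^{-\beta d}\int_{0}^{d} e^{-(\alpha-\beta)s}\,ds \\
&= 1 - e^{-\alpha d} - \frac{\alpha}{\alpha-\beta}\bigl(e^{-\beta d} - e^{-\alpha d}\bigr),
\end{aligned}$$

so that the bracket in (26.9) becomes

$$1 - \int_{0}^{d} a_t(s)\bigl(1 - h(d-s)\bigr)\,ds
= e^{-\alpha d} + \frac{\alpha}{\alpha-\beta}\bigl(e^{-\beta d} - e^{-\alpha d}\bigr)
= \frac{\alpha}{\alpha-\beta}\,e^{-\beta d} - \frac{\beta}{\alpha-\beta}\,e^{-\alpha d}.$$

The bracket equals 1 at $d = 0$, as it should, since no recall loss can occur for the state at the moment of the interview. For $\alpha = \beta$ the expression is replaced by its limit $(1 + \alpha d)\,e^{-\alpha d}$.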

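Treating $\hat p_{1ij}$ as $p_1(i - \tfrac12;\, i + j - 1)$ and $p_{1ij}$ as $p_1(i - \tfrac12)$, we can write $t_{1ij}$ in the likelihood function as

$$t_{1ij} = \frac{\hat t_{1ij}}{\dfrac{\alpha}{\alpha-\beta}\, e^{-\beta(j-\frac12)} - \dfrac{\beta}{\alpha-\beta}\, e^{-\alpha(j-\frac12)}} \tag{26.11}$$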
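Equation (26.10) can also be checked by simulation. The Python sketch below draws the remaining job duration $W_t \sim \text{Exp}(\alpha)$ and the recall span $\Delta t \sim \text{Exp}(\beta)$ for a respondent employed at time $t$, counts how often the job is still reported at $\tau = t + d$, and compares the result with the closed form; it also applies (26.11) to a reported duration at the mid-month lag $j - \tfrac12$. Function names, the seed and the sample size are arbitrary choices of ours.

```python
import math
import random


def recall_factor(d, alpha, beta):
    """p_1(t; tau) / p_1(t) from equation (26.10), with d = tau - t.

    alpha: rate of the exponential waiting time until the current job ends;
    beta: rate of the exponential recall span. For alpha == beta the limiting
    form (1 + alpha * d) * exp(-alpha * d) is used.
    """
    if math.isclose(alpha, beta):
        return (1.0 + alpha * d) * math.exp(-alpha * d)
    return (alpha * math.exp(-beta * d) - beta * math.exp(-alpha * d)) / (alpha - beta)


def simulate_recall_factor(d, alpha, beta, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability for a respondent employed at t."""
    rng = random.Random(seed)
    reported = 0
    for _ in range(n):
        w = rng.expovariate(alpha)            # time until the job held at t ends
        if w >= d:                            # job still held at the interview
            reported += 1
        elif rng.expovariate(beta) > d - w:   # ended job still recalled at the interview
            reported += 1
    return reported / n


def corrected_state1_duration(t1_reported, j, alpha, beta):
    """Equation (26.11): reported state 1 duration for time lag j divided by the
    recall factor at the mid-month lag j - 1/2."""
    return t1_reported / recall_factor(j - 0.5, alpha, beta)


print(recall_factor(2.5, 0.2372, 0.0539), simulate_recall_factor(2.5, 0.2372, 0.0539))
```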

whereas $t_{0ij}$ is computed by subtracting $t_{1ij}$ from $m_{ij}$ $(= \hat t_{0ij} + \hat t_{1ij})$. Using (26.4), we see that $n_{xij}$ is Poisson distributed with parameter $\lambda_{xi} H_j t_{xij}$, with $H_j$ as given by equation (26.6) and $t_{xij}$ as given by equation (26.11). Again, parameters are estimated by the maximum likelihood method, using the Newton-Raphson algorithm to maximize the natural logarithm of the likelihood function, which is written as the product of the likelihoods for the 2 × 12 × 6 separate cells in the table of reported numbers of transitions.

Table 26.4. Parameter Estimates for the Exponential Memory Effects Model with Incorporated Time Correction

Parameter         Estimate   St. error      Parameter         Estimate   St. error
$\beta$           0.0539     0.0065         $\alpha$          0.2372     0.0881
$\lambda_{01}$    464        32             $\lambda_{07}$    613        33
$\lambda_{02}$    857        44             $\lambda_{08}$    326        23
$\lambda_{03}$    786        39             $\lambda_{09}$    295        22
$\lambda_{04}$    414        27             $\lambda_{0,10}$  297        21
$\lambda_{05}$    421        28             $\lambda_{0,11}$  324        22
$\lambda_{06}$    401        27             $\lambda_{0,12}$  280        20

Note: Except for $\beta$ and $\alpha$, table entries have been multiplied by $10^5$.

Estimation of this model leads to a value of $G^2$ of 145.75 with 118 degrees of freedom and a corresponding p-value of 0.042. The estimated model parameters are presented in Table 26.4.
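The pieces of this likelihood can be sketched in Python as follows. This is not the author's Newton-Raphson implementation; the nested-list layout and function names are assumptions. The time-corrected durations from (26.11) are taken as input, and for fixed $\alpha$ and $\beta$ the Poisson score equations give the intensities in closed form. The degrees-of-freedom count reproduces the 118 reported above: 144 cells minus 24 intensities $\lambda_{xi}$ and the two parameters $\alpha$ and $\beta$.

```python
import math


def integrated_h(beta, n_lags):
    """H_1, ..., H_n_lags from equation (26.6), as a 0-based list."""
    return [(math.exp(-beta * j) - math.exp(-beta * (j + 1))) / beta for j in range(n_lags)]


def expected_counts(lam, beta, t):
    """mu[x][i][j] = lambda_{xi} * H_j * t_{xij} for states x, periods i, time lags j."""
    H = integrated_h(beta, len(t[0][0]))
    return [[[lam[x][i] * H[j] * t[x][i][j] for j in range(len(H))]
             for i in range(len(t[x]))] for x in range(len(t))]


def profile_intensities(n, beta, t):
    """For fixed beta (and alpha, through t), setting the Poisson score to zero gives
    lambda_hat_{xi} = sum_j n_{xij} / sum_j H_j t_{xij}."""
    H = integrated_h(beta, len(t[0][0]))
    return [[sum(n[x][i]) / sum(h * d for h, d in zip(H, t[x][i]))
             for i in range(len(t[x]))] for x in range(len(t))]


def g2_statistic(n, mu):
    """Likelihood ratio statistic G^2 = 2 * sum[n * log(n / mu) - (n - mu)], with 0 * log 0 = 0."""
    total = 0.0
    for n_x, mu_x in zip(n, mu):
        for n_xi, mu_xi in zip(n_x, mu_x):
            for obs, exp_ in zip(n_xi, mu_xi):
                total += (obs * math.log(obs / exp_) if obs > 0 else 0.0) - (obs - exp_)
    return 2.0 * total


# Degrees of freedom: 2 x 12 x 6 cells minus 24 intensities lambda_{xi} and alpha, beta.
df = 2 * 12 * 6 - (2 * 12 + 2)
print(df)  # 118
```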

The estimation results for this model are very encouraging and easily interpreted. As in Section 26.4.3, the estimated integrated intensities $\lambda_{0i}$ have values close to the model free ones of that section and have much smaller standard errors. Under the model, the time span $\Delta t$ after which an event is no longer recalled is exponentially distributed with parameter $\hat\beta = 0.0539$. Since an exponential distribution with parameter $\beta$ has expectation $1/\beta$ and variance $1/\beta^2$, we find that under the realized interview conditions the average job of a respondent can be expected to be recalled up to 18.6 $(= 1/\hat\beta)$ months after it has ended, with a variance of $(18.6)^2$. By the same argument we may conclude that for our group of respondents, who have reported at least one transition over the past year, the average job length is 4.2 $(= 1/\hat\alpha)$ months with a variance of $(4.2)^2$. This may seem short for an average job length, but the bulk of this group consists of people who can be labeled as "movers" on the labor market, that is, people who change jobs frequently. For a discussion of the concept of "movers" and the associated "mover-stayer" models, see for instance Langeheine and van de Pol (1990).

The incorporated time correction leads to a vector of correction factors $\delta_j$ with values (0.998, 0.988, 0.968, 0.943, 0.914, 0.882). These values, obtained by substituting $\hat\alpha$ and $\hat\beta$ in the denominator of the right-hand side of equation (26.11), imply a smaller correction (a less steep decline with increasing time lag) than the factors computed a priori in subsection 26.4.3, but this could be due to the fact that the a priori computation used some data that are not used here, and vice versa.
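The reported factors can be checked by evaluating the denominator of (26.11) at the mid-month lags $j - \tfrac12$ with $\hat\alpha = 0.2372$ and $\hat\beta = 0.0539$ from Table 26.4. The short Python check below (the constant and function names are ours) reproduces the six published values to three decimals, together with the mean recall span and mean job length quoted above.

```python
import math

ALPHA_HAT = 0.2372  # estimate of alpha, Table 26.4
BETA_HAT = 0.0539   # estimate of beta, Table 26.4


def recall_factor(d, alpha, beta):
    """Denominator of (26.11) as a function of the lag d = tau - t (alpha != beta)."""
    return (alpha * math.exp(-beta * d) - beta * math.exp(-alpha * d)) / (alpha - beta)


# Mid-month lags j - 1/2 for time lags j = 1, ..., 6
delta = [recall_factor(j - 0.5, ALPHA_HAT, BETA_HAT) for j in range(1, 7)]
print([round(v, 3) for v in delta])  # [0.998, 0.988, 0.968, 0.943, 0.914, 0.882]

# Mean recall span 1/beta and mean job length 1/alpha, in months
print(round(1 / BETA_HAT, 1), round(1 / ALPHA_HAT, 1))  # 18.6 4.2
```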

The normalized residuals on a wave diagonal tend to show the same sign (+ or −), which supports the hypothesis that wave effects are present in the data. This hypothesis is also supported by the table of estimated "true" proportions, which has both diagonals with proportions that are relatively high for almost all periods $T_i$ and diagonals with proportions that are small for almost all $T_i$. Dividing the total number of reported transitions over a wave diagonal by the total number of expected transitions yields a quantity that can be interpreted as the effect of the wave that corresponds with the diagonal. Multiplying all expected numbers on each diagonal by the corresponding wave effect results in a considerable reduction of the value of Pearson's $\chi^2$. The wave effects can be regarded as the scale factor in the scaled inhomogeneous Poisson process (see Snyder, 1975, example 2.4.5). Incorporating such wave effects by a random component will improve the fit, as measured by $G^2$, but will not affect the parameter estimates very much.
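A Python sketch of this wave-effect diagnostic follows. It assumes the wave effect is the ratio of reported to expected totals on a wave diagonal, so that rescaling makes each diagonal's expected total equal its reported total; the function names and the toy table are hypothetical.

```python
def wave_effects(observed, expected):
    """Wave effect per wave diagonal: total reported / total expected counts.

    observed[i][j], expected[i][j]: counts for period i and time lag j (0-based).
    With 1-based indices the reporting wave is w = i + j - 1, so the 0-based key
    i + j used here equals w - 1.
    """
    obs_tot, exp_tot = {}, {}
    for i, (o_row, e_row) in enumerate(zip(observed, expected)):
        for j, (o, e) in enumerate(zip(o_row, e_row)):
            obs_tot[i + j] = obs_tot.get(i + j, 0.0) + o
            exp_tot[i + j] = exp_tot.get(i + j, 0.0) + e
    return {w: obs_tot[w] / exp_tot[w] for w in exp_tot}


def rescale_by_wave(expected, effects):
    """Multiply every expected count on a wave diagonal by that wave's effect."""
    return [[e * effects[i + j] for j, e in enumerate(row)] for i, row in enumerate(expected)]


def pearson_x2(observed, expected):
    """Pearson's chi-square statistic: sum over cells of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected) for o, e in zip(o_row, e_row))


# Hypothetical toy table: flat expectations, observed counts inflated or deflated by wave.
expected = [[10.0] * 6 for _ in range(12)]
observed = [[10.0 * (1.0 + 0.1 * ((i + j) % 3 - 1)) for j in range(6)] for i in range(12)]
effects = wave_effects(observed, expected)
print(pearson_x2(observed, expected), pearson_x2(observed, rescale_by_wave(expected, effects)))
```

In this constructed example the discrepancy is purely a wave effect, so rescaling removes it entirely; with real data the reduction in Pearson's $\chi^2$ would only be partial.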

Spearman's rank correlation coefficient for the paired normalized residuals of $n_{0ij}$ and $n_{1ij}$ is equal to −0.014. The test for independence based on this coefficient indicates that the assumed independence between $n_{0ij}$ and $n_{1ij}$ is not likely to be violated.

26.5 CONCLUSIONS

The main conclusions of this chapter are that retrospective data on counts of events can be used to estimate the "true" number of counts of these events for the respondents in the survey, even in the case where no external gauging device is available. Provided we have a survey design and a target variable that satisfy certain conditions, models can be used to detect memory effects, to quantify the memory effects and to estimate the "true" number of counts efficiently. The memory effects themselves can be used to correct the table of reported events for recall loss. For a two-state process the estimated intensities can be used to estimate correct transition probabilities and calculate (period to period) transition tables. The exponential memory effects model with incorporated time correction might be used to set up an imputation scheme at record level to correct for recall loss. The development of such a procedure to select the most appropriate records for imputation requires further analysis.