an r package for analyzinig truncated datadeu... · an r package for analyzing truncated data ......
TRANSCRIPT
An R package for analyzingtruncated data
Carla Moreira1∗, Jacobo de Una-Alvarez1 and Rosa M. Crujeiras2
1 Department of Statistics and OR, University of Vigo2 Department of Statistics and OR, University of Santiago de Compostela
∗
Introduction Algorithms for DTD Package description Conclusions
Outline
1 Introduction
2 Algorithms for DTD
3 Package description
4 Conclusions
Moreira et al. useR! 2009 DTDA package 2/30
Introduction Algorithms for DTD Package description Conclusions
Motivation examples
Astronomy
Epidemiology
Economy
Survival Analysis
In these cases, we must apply specialized statistical modelsand methods due the need to accommodate the event oflosses in the sample, such as grouping, censoring ortruncation.
Moreira et al. useR! 2009 DTDA package 3/30
Introduction Algorithms for DTD Package description Conclusions
Truncation Scheme
Moreira et al. useR! 2009 DTDA package 4/30
Introduction Algorithms for DTD Package description Conclusions
Truncation Scheme
t1
Moreira et al. useR! 2009 DTDA package 4/30
Introduction Algorithms for DTD Package description Conclusions
Truncation Scheme
t1 t2Observational Window
Moreira et al. useR! 2009 DTDA package 4/30
Introduction Algorithms for DTD Package description Conclusions
Truncation Scheme
Let X∗ be the ultimate time of interest with df F
(U∗, V ∗) the pair of truncation times, with joint df K
We observe (U∗,X∗, V ∗) if and only if U∗ ≤ X∗ ≤ V ∗
Let (Ui,Xi, Vi), i = 1, ..., n be the observed data.
Under the assumption of independence between X∗ and (U∗, V ∗):
The full likelihood is given by:
Ln(f, k) =n
∏
j=1
fjkj∑n
i=1 Fiki
Moreira et al. useR! 2009 DTDA package 5/30
Introduction Algorithms for DTD Package description Conclusions
Truncation SchemeWhere:
f = (f1, f2, ..., fn)
k = (k1, k2, ..., kn)
Fi =∑n
m=1 fmJim
andJim = I[Ui≤Xm≤Vi] = 1 if Ui ≤ Xm ≤ Vi,
or zero otherwise.As noted by Shen (2008):
Ln(f, k) =
n∏
j=1
fj
Fj
×
n∏
j=1
Fjkj∑n
i=1 Fiki
= L1(f) × L2(f, k)
Moreira et al. useR! 2009 DTDA package 6/30
Introduction Algorithms for DTD Package description Conclusions
Efron-Petrosian estimators
The condicional NPMLE of F (Efron-Petrosian, 1999) is definedas the maximizer of L1(f).
1
fj
=
n∑
i=1
Jij ×1
Fi
, j = 1, ..., n
where Fi =
n∑
m=1
fmJim.
This equation was used by Efron and Petrosian (1999) tointroduce the EM algorithm to compute f .
Moreira et al. useR! 2009 DTDA package 7/30
Introduction Algorithms for DTD Package description Conclusions
EM algorithm from Efron and Petrosian (1999)
EP1. Compute the initial estimate F(0) corresponding to
f(0) = (1/n, ..., 1/n);
EP2. Apply (1) to get an improved estimator f(1) to compute the
F(1) pertaining to f(1);
EP3. Repeat Step EP2 until convergence criterion is reached.
Moreira et al. useR! 2009 DTDA package 8/30
Introduction Algorithms for DTD Package description Conclusions
Shen EstimatorInterchanging the roles of X’s and (Ui, Vi):
Ln(f, k) =n
∏
j=1
kj
Kj
×
n∏
j=1
Kjfj∑n
i=1 Kifi
= L1(k) × L2(k, f)
where
Ki =
n∑
m=1
kmI[Um≤Xi≤Vm] =
n∑
m=1
kmJim
and maximizing L1(k):
1
kj
=
n∑
i=1
Jji
1
Ki
, j = 1, ..., n
with Ki =n
∑
m=1
kmJim.
Moreira et al. useR! 2009 DTDA package 9/30
Introduction Algorithms for DTD Package description Conclusions
Shen Estimator
Shen (2008) showed that the solutions are the unconditionalNPMLE of F and K, respectively, and both estimators can beobtained by:
fj =
[
n∑
i=1
1
Kj
]−11
Kj
, j = 1, ..., n
kj =
[
n∑
i=1
1
Fj
]−11
Fj
, j = 1, ..., n
Moreira et al. useR! 2009 DTDA package 10/30
Introduction Algorithms for DTD Package description Conclusions
EM algorithm from Shen (2008)
S1. Compute the initial estimate F(0) corresponding to
f(0) = (1/n, ..., 1/n);
S2. Apply (4) to get the first step estimator k(1) and compute the
K(1) pertaining to k(1);
S3. Apply (3) to get the first step estimator f(1) and its
corresponding F(1);
S4. Repeat Steps S2 and S3 until convergence criterion is reached.
Moreira et al. useR! 2009 DTDA package 11/30
Introduction Algorithms for DTD Package description Conclusions
DTDA-package
efron.petrosian(X,...)
lynden(X,...)
shen(X,...)
Moreira et al. useR! 2009 DTDA package 12/30
Introduction Algorithms for DTD Package description Conclusions
DTDA-package
efron.petrosian(X,...)
lynden(X,...)
shen(X,...)
3 examples data sets with X ∼ Unif(0,1)and:
Ex.1 U ∼ Unif(0,0.5), V ∼ Unif(0.5,1)
Ex.2 U ∼ Unif(0,0.25), V ∼ Unif(0.75,1)
Ex.3 U ∼ Unif(0,0.67), V ∼ Unif(0.33,1)
Moreira et al. useR! 2009 DTDA package 13/30
Introduction Algorithms for DTD Package description Conclusions
efron.petrosian illustration under double truncation
EX.1-50% of truncation
efron.petrosian(X,U,V,. . .)
>iter
>f
>FF
>S
>Sob
>upperF
>lowerF
>upperS
>lowerS0.2 0.6 1.0
0.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.6 1.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 14/30
Introduction Algorithms for DTD Package description Conclusions
efron.petrosian illustration under double truncation
0.2 0.6 1.0
0.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.6 1.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 15/30
Introduction Algorithms for DTD Package description Conclusions
efron.petrosian illustration under left truncation
EX.1
efron.petrosian(X,U,...)
0.2 0.4 0.6 0.8 1.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 16/30
Introduction Algorithms for DTD Package description Conclusions
efron.petrosian illustration under left truncation
0.2 0.4 0.6 0.8 1.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 17/30
Introduction Algorithms for DTD Package description Conclusions
lynden illustration under double truncation
EX.2-25% of truncation
lynden(X,U,V,...)
>iter
>NJ
>f
>FF
>h
>S
>Sob
>upperF
>lowerF
>upperS
>lowerS
0.2 0.6
0.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.6
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 18/30
Introduction Algorithms for DTD Package description Conclusions
lynden illustration under double truncation
0.2 0.6
0.0
0.2
0.4
0.6
0.8
1.0
EP estimator
Time of interest
0.2 0.6
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 19/30
Introduction Algorithms for DTD Package description Conclusions
lynden illustration under right truncation
EX.2
lynden(X,V,...)
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 20/30
Introduction Algorithms for DTD Package description Conclusions
lynden illustration under right truncation
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Survival
Time of interest
Moreira et al. useR! 2009 DTDA package 21/30
Introduction Algorithms for DTD Package description Conclusions
shen illustration under double truncation
EX.3-67% of truncation
shen(X,U,V...)
>iter
>f
>FF
>S
>Sob
>k
>fU
>fV
>upperF
>lowerF
>upperS
>lowerS
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Shen estimator
Time of interest
0.2 0.4 0.6 0.8
0.2
0.6
1.0
Survival
Time of interest
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Marginal U
Time of interest
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Marginal V
Time of interest
Moreira et al. useR! 2009 DTDA package 22/30
Introduction Algorithms for DTD Package description Conclusions
shen illustration under double truncation
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Shen estimator
Time of interest
0.2 0.4 0.6 0.8
0.2
0.6
1.0
Survival
Time of interest
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Marginal U
Time of interest
0.2 0.4 0.6 0.8
0.0
0.4
0.8
Marginal V
Time of interest
Moreira et al. useR! 2009 DTDA package 23/30
Introduction Algorithms for DTD Package description Conclusions
Summary
The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.
Moreira et al. useR! 2009 DTDA package 24/30
Introduction Algorithms for DTD Package description Conclusions
Summary
The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.
This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).
Moreira et al. useR! 2009 DTDA package 25/30
Introduction Algorithms for DTD Package description Conclusions
Summary
The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.
This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).
Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.
Moreira et al. useR! 2009 DTDA package 26/30
Introduction Algorithms for DTD Package description Conclusions
Summary
The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.
This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).
Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.
Plots of cumulative distributions and survival functions areprovided.
Moreira et al. useR! 2009 DTDA package 27/30
Introduction Algorithms for DTD Package description Conclusions
Summary
The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data.
This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).
Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.
Plots of marginal cumulative distributions and survivalfunctions are provided.
There are no R packages with double truncation scheme.
Moreira et al. useR! 2009 DTDA package 28/30
Introduction Algorithms for DTD Package description Conclusions
Acknowledgments
Work supported by the research Grant MTM2008-03129 andMTM2008-0310 of the Spanish Ministerio de Ciencia eInnovacion
Grant PGIDIT07PXIB300191PR of the Xunta de Galicia
Moreira et al. useR! 2009 DTDA package 29/30
Introduction Algorithms for DTD Package description Conclusions
References
Efron, B. and Petrosian, V. (1999)Nonparametric methods for doubly truncated data.Journal of the American Statistical Association, 94, 824-834.
Lynden-Bell, D. (1971)A method of allowing for known observational selection insmall samples applied to 3CR quasars.Mon. Not. R. Astr. Soc, 155, 95-118.
Moreira, C. and de Una Alvarez, J.(Under revision)Bootstrapping the NPMLE for doubly truncated data.Journal of Nonparametric Statistics.
Shen P-S. (2008)Nonparametric analysis of doubly truncated data.Annals of the Institute of Statistical Mathematics, DOI10.1007/s10463-008-0192-2.
Moreira et al. useR! 2009 DTDA package 30/30