Estimation of Pareto Distribution Functions from Samples Contaminated by Measurement Errors
Presenter: Lwando Kondlo
Supervisor: Prof. C. Koen
SKA Postgrad Bursary Conference
December 5, 2009
The model for a variable X measured with error is $Y = X + \varepsilon$, where Y is the observed value and $\varepsilon$ is the measurement error.
Estimation of the density/distribution function of X is often important.
This is a classical deconvolution problem. The specific case where X has a Pareto form is discussed.
Pareto distribution: a model for positive data. Examples include:
◦ Distribution of income and wealth among individuals
◦ Masses of molecular clouds, etc.
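The Finite-Support Pareto distribution (FSPD) with lower limit L, upper limit U and exponent a has the density
$$g(x) = \frac{a\, x^{-a-1}}{L^{-a} - U^{-a}}, \qquad L \le x \le U.$$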
Distributional parameters are estimated by fitting the FSPD to a set of data.
This is not appropriate if the data are contaminated by measurement errors.
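A minimal sketch of the FSPD in Python may help fix ideas. It assumes the truncated-Pareto density a x^(-a-1) / (L^(-a) - U^(-a)) on [L, U]; the function names and the seed are illustrative and not taken from the presentation. Sampling inverts the distribution function.

```python
import random


def fspd_pdf(x, L, U, a):
    """Assumed FSPD density on [L, U]; zero outside the support."""
    if x < L or x > U:
        return 0.0
    return a * x ** (-a - 1) / (L ** (-a) - U ** (-a))


def fspd_cdf(x, L, U, a):
    """Distribution function corresponding to fspd_pdf."""
    if x <= L:
        return 0.0
    if x >= U:
        return 1.0
    return (L ** (-a) - x ** (-a)) / (L ** (-a) - U ** (-a))


def fspd_sample(n, L, U, a, rng):
    """Draw n values by inverting the distribution function."""
    lo, hi = L ** (-a), U ** (-a)
    return [(lo - rng.random() * (lo - hi)) ** (-1.0 / a) for _ in range(n)]


rng = random.Random(1)  # arbitrary seed, for reproducibility only
xs = fspd_sample(1000, 3.0, 6.0, 1.5, rng)
print(min(xs), max(xs))  # error-free values stay inside [L, U]
```

Fitting this density directly to data is only sensible when the observations are error-free, which is the limitation the slide points out.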
The aims are:
◦ To develop methodology for deconvolution when X is known to be of Pareto form.
◦ To apply the methodology to real (radio astronomical) data.
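If X has the PDF g(·) and the measurement error $\varepsilon$, independent of X, has the PDF h(·), then $Y = X + \varepsilon$ has the PDF
$$f_Y(y) = \int_{-\infty}^{\infty} g(x)\, h(y - x)\, dx.$$
When g is the FSPD, the integral runs over [L, U] and $f_Y$ is the convolved PDF (CPDF).
The CPDF could differ substantially from the FSPD.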
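To see how the CPDF can differ from the FSPD, the convolution integral can be evaluated numerically. The sketch below assumes Gaussian measurement errors with mean zero and standard deviation σ, and the truncated-Pareto form of the FSPD; both are assumptions made here, as are the function names and the midpoint-rule grid size.

```python
import math


def fspd_pdf(x, L, U, a):
    """Assumed FSPD density on [L, U]; zero outside the support."""
    if x < L or x > U:
        return 0.0
    return a * x ** (-a - 1) / (L ** (-a) - U ** (-a))


def cpdf(y, L, U, a, sigma, n_grid=400):
    """Midpoint-rule approximation of the FSPD convolved with N(0, sigma^2)."""
    dx = (U - L) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = L + (i + 0.5) * dx
        z = (y - x) / sigma
        total += fspd_pdf(x, L, U, a) * math.exp(-0.5 * z * z)
    return total * dx / (sigma * math.sqrt(2.0 * math.pi))


# Compare the two densities below, at, and above the lower limit L = 3.
for y in (2.8, 3.0, 4.5):
    print(y, fspd_pdf(y, 3.0, 6.0, 1.5), cpdf(y, 3.0, 6.0, 1.5, 0.4))
```

Under these assumptions the CPDF is positive outside [L, U], where the FSPD is exactly zero, and it is smoothed near the two limits.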
Probability-probability (P-P) plots, which compare the observed and theoretical distribution functions, can be used to assess the fit.
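The coordinates of a P-P plot can be computed as in the sketch below. The plotting position (i - 0.5)/n for the i-th ordered observation is one common convention and is an assumption here, as is the helper name `pp_points`; any fitted distribution function can be passed in as `cdf`.

```python
def pp_points(sample, cdf):
    """Return (empirical, theoretical) probability pairs for a P-P plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # enumerate starts at 0, so (i + 0.5) / n equals (rank - 0.5) / n.
    return [((i + 0.5) / n, cdf(y)) for i, y in enumerate(ordered)]


# Toy check with a uniform distribution on [0, 1], whose CDF is F(y) = y.
points = pp_points([0.9, 0.1, 0.5, 0.3, 0.7], lambda y: y)
for empirical, theoretical in points:
    print(empirical, theoretical)
```

Points lying close to the 45-degree line indicate that the theoretical distribution function describes the sample well.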
Simulated data with the following parameter values are used:
L    U    a     σ
3    6    1.5   0.4
1. The contaminated data extend beyond the interval [L, U] over which the error-free data occur.
2. The shape of the distribution is changed.
◦ This will lead to biased estimates of L, U and the power-law exponent a.
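The first effect is easy to reproduce in a small simulation. The sketch below uses the parameter values L = 3, U = 6, a = 1.5 and σ = 0.4, and assumes a truncated-Pareto form for the error-free data and Gaussian errors; the sample size, seed and names are arbitrary choices made for illustration.

```python
import random

L, U, a, sigma = 3.0, 6.0, 1.5, 0.4
rng = random.Random(2009)  # arbitrary seed, for reproducibility only


def fspd_sample(n, L, U, a, rng):
    """Draw n values from the assumed FSPD by inverting its CDF."""
    lo, hi = L ** (-a), U ** (-a)
    return [(lo - rng.random() * (lo - hi)) ** (-1.0 / a) for _ in range(n)]


x = fspd_sample(1000, L, U, a, rng)               # error-free values
y = [xi + rng.gauss(0.0, sigma) for xi in x]      # contaminated values

outside = sum(1 for yi in y if yi < L or yi > U)
print("error-free range:  ", min(x), max(x))
print("contaminated range:", min(y), max(y))
print("fraction outside [L, U]:", outside / len(y))
```

Taking the sample minimum and maximum of the contaminated values as estimates of L and U would therefore place them too low and too high, respectively.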
Estimation is based on maximising the likelihood (or log-likelihood) of the observed data given the model.
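The log-likelihood of the CPDF for observations $y_1, \dots, y_n$ is
$$\ell(L, U, a, \sigma) = \sum_{i=1}^{n} \log f_Y(y_i; L, U, a, \sigma),$$
where $f_Y$ denotes the CPDF.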
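A sketch of this log-likelihood is given below, assuming Gaussian errors and the truncated-Pareto form of the FSPD, with the convolution evaluated by a midpoint rule. All names, the seed and the grid size are illustrative. The function is only evaluated at two parameter sets here; in practice it would be handed to a numerical optimiser to obtain the MLEs.

```python
import math
import random


def fspd_pdf(x, L, U, a):
    """Assumed FSPD density on [L, U]; zero outside the support."""
    if x < L or x > U:
        return 0.0
    return a * x ** (-a - 1) / (L ** (-a) - U ** (-a))


def cpdf(y, L, U, a, sigma, n_grid=300):
    """Midpoint-rule approximation of the FSPD convolved with N(0, sigma^2)."""
    dx = (U - L) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = L + (i + 0.5) * dx
        z = (y - x) / sigma
        total += fspd_pdf(x, L, U, a) * math.exp(-0.5 * z * z)
    return total * dx / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, L, U, a, sigma):
    """Sum of log CPDF values; the floor guards against log(0)."""
    return sum(math.log(max(cpdf(y, L, U, a, sigma), 1e-300)) for y in data)


# Simulate contaminated data at the parameter values used in the slides.
rng = random.Random(5)  # arbitrary seed
lo, hi = 3.0 ** (-1.5), 6.0 ** (-1.5)
x = [(lo - rng.random() * (lo - hi)) ** (-1.0 / 1.5) for _ in range(200)]
data = [xi + rng.gauss(0.0, 0.4) for xi in x]

ll_true = log_likelihood(data, 3.0, 6.0, 1.5, 0.4)
ll_small_sigma = log_likelihood(data, 3.0, 6.0, 1.5, 0.1)
print(ll_true, ll_small_sigma)
```

A model that understates σ cannot account for the observations lying outside [L, U], so its log-likelihood is lower than at the generating parameter values.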
Application to the data in the histogram leads to the following estimates:
       L       U       a
FSPD   2.267   7.124   1.186
CPDF   3.047   6.028   1.445
N.B.: the CPDF fitted to the data with errors gives MLEs close to the true parameter values of 3, 6 and 1.5.
The methodology is illustrated by fitting the CPDF to a sample of giant molecular cloud masses in the galaxy M33 (Engargiola et al., 2003).
       L      U      a      σ
MLE    6.9    77.7   1.33   3.47
s.e.   0.65   4.97   0.26   0.55
Masses are in units of solar masses.
The estimates agree well with those of Engargiola et al. (2003), in particular with their exponent estimate a = 1.6 ± 0.3.
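The agreement in the exponent can be quantified with a rough calculation. It assumes that the two estimates are independent and that the quoted ± 0.3 can be treated as a standard error; neither assumption is stated in the slides.

```python
import math

a_mle, se_mle = 1.33, 0.26   # CPDF estimate and standard error from the table
a_ref, err_ref = 1.6, 0.3    # value quoted from Engargiola et al. (2003)

diff = abs(a_ref - a_mle)
combined = math.sqrt(se_mle ** 2 + err_ref ** 2)  # assumes independence
z = diff / combined
print(round(z, 2))  # 0.68
```

Under these assumptions the two values differ by less than one combined standard error.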
The linear form of the P-P plot indicates that the estimated distribution fits the sample of giant molecular clouds very well.
Deconvolution is a useful statistical method for recovering an unknown distribution of X in the presence of errors.
The methodology for deconvolution when X is known to be of Pareto form has been developed.
Satisfactory results were obtained with the MLE method.
The price paid is that the analysis is more complicated.
Thanks to everyone who contributed to the work presented:
1. Prof. C. Koen (Supervisor)
2. Funding: SKA SA (Kim, Anna and Daphne)
3. University of the Western Cape (Leslie and Rennet)