Nonparametric regression models estimation in R
New Challenges for Statistical Software - The Use of R in Official Statistics, 27 March 2014
Maer Matei Monica Mihaela
Bucharest University of Economic Studies
National Scientific Research Institute for Labour and Social Protection
Eliza Olivia Lungu
National Scientific Research Institute for Labour and Social Protection
Theoretical background: nonparametric estimation of regression functions with both categorical and continuous data (Racine and Li, 2004)
Software solution: R np package (Hayfield and Racine, 2008)
Practical problem: estimate the impact of overeducation on earnings
Objectives:
To model a dataset comprised of continuous, discrete, or categorical data (nominal
or ordinal), or any combination.
To construct a more flexible model.
To let the data determine an appropriate model without specifying the functional
forms for objects being estimated.
METHOD: nonparametric regression based on kernel methods
Key notions
- generalized product kernels
- kernels for categorical data
- bandwidth selection
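The key notions above can be illustrated with a minimal base R sketch of a local constant (Nadaraya-Watson) estimator whose weights are a generalized product kernel. It assumes a Gaussian kernel for the continuous regressor and the Aitchison-Aitken kernel for an unordered categorical regressor, which is one common choice and not necessarily the one used in the slides. The function names (k_cont, k_cat, nw_mixed), the simulated data, and the fixed values of h and lambda are hypothetical and chosen for illustration only.

```r
# Gaussian kernel for a continuous regressor, bandwidth h > 0.
k_cont <- function(x, x0, h) dnorm((x - x0) / h) / h

# Aitchison-Aitken kernel for an unordered categorical regressor with
# c levels and smoothing parameter lambda in [0, (c - 1) / c].
k_cat <- function(z, z0, lambda, c) {
  ifelse(z == z0, 1 - lambda, lambda / (c - 1))
}

# Local constant (Nadaraya-Watson) estimate at the point (x0, z0).
# The weight of each observation is the product of the two kernels.
nw_mixed <- function(y, x, z, x0, z0, h, lambda) {
  c <- length(unique(z))
  w <- k_cont(x, x0, h) * k_cat(z, z0, lambda, c)
  sum(w * y) / sum(w)
}

# Simulated data (an assumption, not the REFLEX sample).
set.seed(1)
n <- 200
x <- runif(n)
z <- sample(c("a", "b"), n, replace = TRUE)
y <- sin(2 * pi * x) + ifelse(z == "b", 1, 0) + rnorm(n, sd = 0.2)

# Fixed, hand-picked smoothing parameters; in practice they are data-driven.
fit_a <- nw_mixed(y, x, z, x0 = 0.25, z0 = "a", h = 0.05, lambda = 0.05)
fit_b <- nw_mixed(y, x, z, x0 = 0.25, z0 = "b", h = 0.05, lambda = 0.05)
```

With lambda = 0 the categorical kernel splits the sample by category, while at its upper bound (c - 1) / c every category receives the same weight and the categorical variable is effectively smoothed out of the estimate.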
R package “np” (Hayfield and Racine, 2008) provides:
- density estimation
- regression and derivative estimation for both categorical and continuous data
- a range of kernel functions and bandwidth selection methods
- tests of significance for nonparametric regression
- a variety of bootstrap methods for computing standard errors, nonparametric confidence bounds, and bias-corrected bounds
FUNCTIONS
npunitest - tests equality of two univariate density/probability functions (Maasoumi and Racine, 2002).
npregbw - computes a bandwidth object for a p-variate kernel regression estimator defined over mixed continuous and discrete data, using the method of Racine and Li (2004) and Li and Racine (2004).
npreg - computes a kernel regression estimate of a one-dimensional dependent variable on p-variate explanatory data, given a set of explanatory data and dependent data and a bandwidth specification, using the method of Racine and Li (2004) and Li and Racine (2004).
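A typical two-step call sequence for npregbw and npreg might look like the sketch below. It assumes the np package is installed. The data are simulated and the variable names (x, z, y) are hypothetical stand-ins, not the REFLEX variables. The choices regtype = "ll" (local linear) and bwmethod = "cv.ls" (least-squares cross-validation) are illustrative options, not necessarily the settings used for the results in these slides.

```r
library(np)  # assumes the np package is installed

set.seed(42)
n <- 100
# Simulated stand-in data with one continuous and one categorical regressor.
dat <- data.frame(
  x = runif(n),
  z = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- sin(2 * pi * dat$x) + ifelse(dat$z == "b", 1, 0) + rnorm(n, sd = 0.2)

# Step 1: data-driven bandwidths for a local linear estimator,
# selected by least-squares cross-validation.
bw <- npregbw(y ~ x + z, data = dat, regtype = "ll", bwmethod = "cv.ls")

# Step 2: kernel regression estimate using those bandwidths.
model <- npreg(bws = bw)

summary(model)  # bandwidths, kernel types, and fit statistics
model$R2        # in-sample R-squared
```

Separating bandwidth selection from estimation means the costly search is run once and the resulting bandwidth object can be reused for estimation, testing, and plotting.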
The difficulties we encountered relate to estimation time, especially when the bootstrap-based routines for significance testing are called.
- Execution time for most routines increases exponentially with the number of observations and also increases with the number of variables involved.
- Data-driven bandwidth selection methods involving multivariate numerical search can be time-consuming, particularly for large datasets.
- A version of this package, ‘npRmpi’, is under development to facilitate computation involving large datasets.
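To see where the time goes in data-driven bandwidth selection, consider a minimal base R sketch of leave-one-out least-squares cross-validation for a single continuous regressor. It is a simplified illustration, not the np implementation: the function name cv_ls, the simulated data, and the search interval are assumptions. Each evaluation of the objective touches every pair of observations, and the numerical search repeats that evaluation many times.

```r
# Leave-one-out least-squares cross-validation objective for a
# Nadaraya-Watson estimator with one continuous regressor.
cv_ls <- function(h, x, y) {
  # n x n matrix of Gaussian kernel weights: one entry per pair of points.
  K <- dnorm(outer(x, x, "-") / h)
  diag(K) <- 0  # leave observation i out of its own prediction
  y_loo <- as.vector(K %*% y) / rowSums(K)
  mean((y - y_loo)^2)
}

# Simulated data (an assumption for illustration).
set.seed(7)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# One-dimensional numerical search over the bandwidth.
opt <- optimize(cv_ls, interval = c(0.01, 0.5), x = x, y = y)
h_cv <- opt$minimum
```

With several regressors the search becomes multivariate and is usually restarted from several starting points to avoid local minima, which multiplies the cost further.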
Estimate the impact of overeducation on earnings
- The REFLEX database includes information on the early career outcomes of graduates who completed ISCED 5 in 1999/2000, for 14 countries
- UK sample: 932 graduates
- Main independent variable: job-education match
Dependent variable
Other independent variables
gender
number of months employed since graduation (totworkdu)
number of months at current job (workdu)
Testing equality of the density functions
‘Srho’: 0.04526657 P Value: < 2.22e-16 *** Null of equality is rejected at the 0.1% level
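Output of this form is produced by npunitest. A minimal sketch of such a call follows, assuming the np package is installed. The two simulated samples and their names (earn_matched, earn_over) are hypothetical and do not reproduce the REFLEX data or the groups compared in the slide. The number of bootstrap replications is reduced only to keep the run time short.

```r
library(np)  # assumes the np package is installed

set.seed(123)
# Simulated stand-ins for an outcome observed in two groups.
earn_matched <- rnorm(100, mean = 0)
earn_over    <- rnorm(100, mean = 1)

# Entropy-based test of equality of two univariate densities.
# boot.num is kept small here only to limit run time.
test <- npunitest(earn_matched, earn_over, boot.num = 99)

test$Srho  # value of the test statistic
test$P     # bootstrap P value for the null of equal densities
```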
Significance test for the estimated coefficients and R squared
Country | X1 (total work duration) | X2 (work duration, current job) | X1 (job-education match) | X2 (gender) | R squared
UK | 0.070 | 0.320 | 0.008 | < 2.22e-16 | 0.145
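Per-variable P values of this kind can be obtained with npsigtest, which runs bootstrap significance tests on a bandwidth object. The sketch below assumes the np package is installed and uses simulated data with hypothetical variable names (x1, x2, z, y). It does not reproduce the UK results, and boot.num is reduced only to keep the run time short.

```r
library(np)  # assumes the np package is installed

set.seed(2014)
n <- 100
# Simulated stand-in data: x1 and z affect y, x2 does not.
dat <- data.frame(
  x1 = runif(n),
  x2 = runif(n),
  z  = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$y <- sin(2 * pi * dat$x1) + ifelse(dat$z == "b", 1, 0) + rnorm(n, sd = 0.2)

# Cross-validated bandwidths for a local linear estimator.
bw <- npregbw(y ~ x1 + x2 + z, data = dat, regtype = "ll")

# Bootstrap significance tests, one per explanatory variable.
sig <- npsigtest(bws = bw, boot.num = 99)
sig$P  # vector of P values, one per explanatory variable

# R-squared of the corresponding regression estimate.
npreg(bws = bw)$R2
```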
Partial local linear nonparametric response plots: UK case
Conclusions
The results allow us to understand the impact of overeducation on the earnings distribution without assuming a functional form for the relationship between overeducation and the earnings of higher education graduates.