
Presented by Wenli Li, Shuhong Li, and Vivian Tam
Venables and Ripley, Section 8.7
November 22, 2004

One-Dimensional Curve-Fitting


INTRODUCTION

• Curve-fitting:
  – Sample data: {(x0, y0), (x1, y1), ..., (xn, yn)}

• Interpolation & extrapolation

• One-dimensional curve-fitting (Section 8.7):
  – The functional form is not pre-specified
  – Splines (ns, smooth.spline)
  – Local regression (loess, supsmu, kernel smoother and locpoly)

• Data set:
  – One independent & one dependent variable
  – Examples: GAGurine & Mercury level


GAGurine (MASS)

• Dataset:
  – Variables:
    • Age: independent
    • GAG: dependent
  – Sample size: 314
  – Sample values:
      Age: 0.00  0.00  ...  0.46  0.47  ...  17.30  7.67
      GAG: 23.0  23.8  ...  18.6  26.4  ...   1.9   9.3

• Classical way:
    library(MASS)
    attach(GAGurine)
    plot(Age, GAG, main="Degree 6 polynomial")
    # Fit a degree-8 polynomial and test the terms sequentially
    GAG.lm <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6) + I(Age^7) + I(Age^8))
    anova(GAG.lm)
    # Refit with the terms up to degree 6 and draw the fitted curve
    GAG.lm2 <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6))
    xx <- seq(0, 17, len=200)
    lines(xx, predict(GAG.lm2, data.frame(Age=xx)), col="red")

Terms added sequentially (first to last)

            Df  Sum of Sq  Mean Sq  F-value    Pr(F)
I(Age^1)=Age 1      12590    12590   593.58  0.0000
I(Age^2)     1       3751     3751   176.84  0.0000
I(Age^3)     1       1492     1492    70.32  0.0000
I(Age^4)     1        449      449    21.18  0.00001
I(Age^5)     1        174      174     8.22  0.00444
I(Age^6)     1        286      286    13.48  0.00028
I(Age^7)     1         57       57     2.70  0.10151
I(Age^8)     1         45       45     2.12  0.14667
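
The degree-6 polynomial fit can also be written with poly(), which builds orthogonal polynomial terms and is usually better conditioned than raw powers of Age. The following is a minimal sketch, assuming the MASS package is installed; the object names fit_raw, fit_orth and max_diff are illustrative and do not come from the slides.

```r
library(MASS)  # provides the GAGurine data frame

# Raw powers of Age, as in the classical fit
fit_raw <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6),
              data = GAGurine)

# The same degree-6 model expressed with orthogonal polynomial terms
fit_orth <- lm(GAG ~ poly(Age, 6), data = GAGurine)

# Both parameterisations span the same model space, so the fitted
# values should agree up to rounding error
max_diff <- max(abs(fitted(fit_raw) - fitted(fit_orth)))
```

Only the coefficients differ between the two fits; the fitted curve is the same, so either form can be passed to predict() for plotting.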


SPLINES

• Algorithm:
  – Function: ns( )
  – Generates a basis matrix for natural cubic splines
  – Usage: ns(x, df, knots, intercept=F, Boundary.knots, derivs)
  – Arguments:
    • Required: x, the predictor variable.
    • Optional:
      – df: degrees of freedom. One can supply df rather than knots; ns then chooses df-1-intercept knots at suitably chosen quantiles of x. This argument is ignored if knots is supplied.
      – knots: breakpoints that define the spline.
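
The relationship between df and the number of knots can be checked directly on the basis matrix. This is a small sketch, assuming the splines and MASS packages are available; the names B, n_cols and n_knots are illustrative.

```r
library(splines)
library(MASS)  # provides the GAGurine data frame

# Natural cubic spline basis with 5 degrees of freedom, no intercept column
B <- ns(GAGurine$Age, df = 5)

n_cols  <- ncol(B)                   # one basis column per degree of freedom
n_knots <- length(attr(B, "knots"))  # interior knots: df - 1 - intercept = 5 - 1 - 0
```

The interior knots stored in attr(B, "knots") are quantiles of Age, which is why they crowd together where the data are dense.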


SPLINES

Function: smooth.spline( )
• Fits a cubic B-spline smooth to the input data.
• Usage: smooth.spline(x, y, w = <<see below>>, df = <<see below>>, spar = 0, cv = F, all.knots = F, df.offset = 0, penalty = 1)
• Arguments:
  – Required: x, values of the predictor variable. There should be at least ten distinct x values.
  – Optional:
    • y: response variable, of the same length as x.
    • df: a number which supplies the degrees of freedom = trace(S), where S is the smoother matrix, rather than a smoothing parameter.
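
As a hedged illustration of the df argument, the sketch below asks smooth.spline for roughly 10 equivalent degrees of freedom and reads back the value it actually achieved. It assumes R's smooth.spline (whose argument list differs slightly from the usage line above) and the MASS package; the names fit_df and fit_auto are illustrative.

```r
library(MASS)  # provides the GAGurine data frame

# Request a target number of equivalent degrees of freedom, df = trace(S)
fit_df <- with(GAGurine, smooth.spline(Age, GAG, df = 10))

# Leave df unspecified: the smoothing parameter is then chosen automatically
# (generalized cross-validation in R when cv = FALSE)
fit_auto <- with(GAGurine, smooth.spline(Age, GAG))

achieved_df <- fit_df$df    # should be close to the requested 10
auto_df     <- fit_auto$df  # degrees of freedom picked by the automatic choice
```

Comparing achieved_df and auto_df gives a quick sense of how much smoothing the automatic choice applies relative to a hand-picked df.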


SPLINES

library(splines)
plot(Age, GAG, type="n", main="Spline")
# splines
lines(Age, fitted(lm(GAG ~ ns(Age, df=5))), col="red")
lines(Age, fitted(lm(GAG ~ ns(Age, df=10))), lty=3, col="green")
lines(Age, fitted(lm(GAG ~ ns(Age, df=20))), lty=4, col="blue")
# smoothing spline
lines(smooth.spline(Age, GAG), lwd=3, col="black")
legend(12, 50, c("red: df=5", "green: df=10", "blue: df=20", "Smoothing"), lty=c(1,3,4,1), lwd=c(1,1,1,3), bty="n")


KERNEL SMOOTH

Function: ksmooth( )
• Estimates a probability density or performs scatterplot smoothing using kernel estimates.
• Usage: ksmooth(x, y=NULL, kernel="box", bandwidth=0.5, range.x=range(x), n.points=length(x), x.points=<<see below>>)
• Arguments:
  – Required: x, vector of x data
  – Optional:
    • y: vector of y data. This must be the same length as x, and missing values are not accepted.
    • kernel: "box", "triangle", "parzen", "normal"
    • bandwidth: larger values of bandwidth make smoother estimates, smaller values of bandwidth make less smooth estimates.
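
The effect of bandwidth can be quantified rather than only eyeballed. The sketch below is an illustration under stated assumptions: it uses R's ksmooth (which accepts the "box" and "normal" kernels) and a hypothetical helper, roughness, that is not part of any package and simply averages the squared second differences of the fitted curve.

```r
library(MASS)  # provides the GAGurine data frame

# Hypothetical helper: mean squared second difference of the fitted values;
# smaller values indicate a smoother curve
roughness <- function(fit) mean(diff(fit$y, differences = 2)^2, na.rm = TRUE)

# Same data and kernel, two bandwidths; both fits use the same default grid
fit_narrow <- with(GAGurine, ksmooth(Age, GAG, kernel = "normal", bandwidth = 1))
fit_wide   <- with(GAGurine, ksmooth(Age, GAG, kernel = "normal", bandwidth = 5))

rough_narrow <- roughness(fit_narrow)
rough_wide   <- roughness(fit_wide)
```

With the wider bandwidth each estimate averages over more neighbouring observations, so rough_wide should come out smaller than rough_narrow.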


Kernel Smoother

# kernel smoother:
plot(Age, GAG, type="n", main="ksmooth")
lines(ksmooth(Age, GAG, "normal", bandwidth=1), col="red")
lines(ksmooth(Age, GAG, "normal", bandwidth=5))
legend(12, 50, c("red: bandwidth=1", "black: bandwidth=5"), bty="n")


LOESS
• Uses local polynomial regression to fit a curve determined by one or more numerical predictors
• Gets a predicted value at each point by fitting a weighted linear regression, where the weights decrease with distance from the point of interest
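
The idea of one weighted linear regression per prediction point can be sketched by hand. This is a simplified illustration, not the loess implementation: local_linear is a hypothetical helper, and the tricube weight function and nearest-neighbour window are assumptions chosen to mirror the usual description of the method.

```r
library(MASS)  # provides the GAGurine data frame

# Hypothetical helper: a single local linear fit evaluated at the point x0
local_linear <- function(x, y, x0, span = 2/3) {
  k <- ceiling(span * length(x))            # number of neighbours in the window
  d <- abs(x - x0)                          # distance of each point from x0
  h <- sort(d)[k]                           # window half-width: k-th smallest distance
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # tricube weights, zero outside the window
  fit <- lm(y ~ x, weights = w)             # weighted linear regression
  unname(predict(fit, data.frame(x = x0)))  # fitted value at x0
}

# Estimate the smooth at Age = 5
est <- with(GAGurine, local_linear(Age, GAG, x0 = 5))
```

Repeating this calculation over a grid of x0 values and joining the results traces out the smooth curve.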


LOESS Parameters

• f: controls the window size
• weights: decrease with distance from the point x being fitted
• span: the parameter alpha which controls the degree of smoothing
• degree: the degree of the polynomials to be used, up to 2


LOESS

Code:

library(MASS)
attach(GAGurine)
plot(Age, GAG, type="n", main="loess")
# degree is limited to 2, so the higher-order fits use degree=2
lines(loess.smooth(Age, GAG, span=2/3, degree=1), col="red", lwd=1)
lines(loess.smooth(Age, GAG, span=2/3, degree=2), col="blue", lwd=2)
lines(loess.smooth(Age, GAG, span=1/3, degree=2), col="green", lwd=1)
legend(10, 45, c("Red: span=2/3,deg=1", "Blue: span=2/3,deg=2", "green: span=1/3,deg=2"), bty="n")


SUPSMU
• Serves a purpose similar to that of the function loess
• Runs three running-lines smoothers with different spans; the best of the three smoothers is chosen by cross-validation
• If there are substantial correlations between observations close in x-value, then a pre-specified fixed span smoother should be used. Reasonable span values are 0.2 to 0.4


SUPSMU Parameters:
• span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
• bass: controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness
• periodic: if TRUE, the smoother assumes x is a periodic variable with values in the range [0.0, 1.0] and period 1.0. An error occurs if x has values outside this range
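
The three ways of controlling supsmu described above can be tried side by side. This is a minimal sketch, assuming the MASS package is installed; the object names are illustrative, and the fixed span of 0.3 is simply a value picked from the 0.2 to 0.4 range mentioned earlier.

```r
library(MASS)  # provides the GAGurine data frame

# Default: span = "cv", the span is chosen by cross-validation
fit_cv <- with(GAGurine, supsmu(Age, GAG))

# Same cross-validated choice, pushed towards a smoother curve with bass
fit_bass <- with(GAGurine, supsmu(Age, GAG, bass = 10))

# Pre-specified fixed span (illustrative value from the 0.2 to 0.4 range)
fit_fixed <- with(GAGurine, supsmu(Age, GAG, span = 0.3))

# Each result is a list with components x and y, ready for lines()
n_out <- length(fit_cv$x)
```

All three results are evaluated at the same sorted x values, so they can be overlaid on one plot with lines().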

References:
Friedman, J. H. (1984) A variable span scatterplot smoother. Laboratory for Computational Statistics, Stanford University, Technical Report No. 5


Code:
plot(Age, GAG, type="n", main="supsmu")
lines(supsmu(Age, GAG))
lines(supsmu(Age, GAG, bass=3), lty=3)
lines(supsmu(Age, GAG, bass=10), lty=4)
legend(12, 50, c("default", "bass=3", "bass=10"), lty=c(1,3,4), bty="n")


LOCPOLY
• Estimates a probability density function, regression function or their derivatives using local polynomials
• A fast binned implementation, with approximations over an equally-spaced grid, is used for fast computation
• In a simple form: locpoly(x, y, degree=#, bandwidth=#)

Parameters:
• locpoly(x, y, drv=0, degree=<<see below>>, kernel="normal", bandwidth, gridsize=401, bwdisc=25, range.x=<<see below>>, binned=FALSE, truncate=TRUE)
• drv: order of derivative to be estimated
• degree: degree of local polynomial used
• bandwidth: the kernel bandwidth smoothing parameter
• range.x: vector containing the minimum and maximum values of 'x' at which to compute the estimate
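
The drv and gridsize parameters can be illustrated with a short sketch. It assumes the KernSmooth and MASS packages are installed; reusing the dpill bandwidth for the derivative estimate is a simplifying assumption made here for illustration, since dpill targets local linear regression rather than derivative estimation.

```r
library(MASS)        # provides the GAGurine data frame
library(KernSmooth)  # provides locpoly() and dpill()

# Plug-in bandwidth for local linear regression
h <- with(GAGurine, dpill(Age, GAG))

# Regression estimate (drv = 0) from a local linear fit
fit <- with(GAGurine, locpoly(Age, GAG, degree = 1, bandwidth = h))

# First-derivative estimate (drv = 1) from a local quadratic fit;
# the bandwidth is reused as an assumption, not tuned for derivatives
dfit <- with(GAGurine, locpoly(Age, GAG, drv = 1, degree = 2, bandwidth = h))

# Both results are lists with x and y evaluated on the default grid of 401 points
n_grid <- length(fit$x)
```

Plotting dfit with lines() shows where GAG falls most steeply with Age.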


LOCPOLY

Code:
library(MASS)
attach(GAGurine)
library(KernSmooth)
plot(Age, GAG, type="n", main="(Age, GAG) Locpoly")
# select the bandwidth and print it
(h <- dpill(Age, GAG))
lines(locpoly(Age, GAG, degree=0, bandwidth=h), col="red", lty=1, lwd=2)
lines(locpoly(Age, GAG, degree=1, bandwidth=h), col="blue", lty=3, lwd=3)
lines(locpoly(Age, GAG, degree=2, bandwidth=h), col="green", lty=4, lwd=3)
legend(10, 40, c("const=0 red", "linear=1 blue", "quad=2 green"), lty=c(1,3,4), bty="n")
# name the data frame so that it, not the most recently loaded package, is detached
detach(GAGurine)


LOCPOLY : GAGurine


Example: Mercury Level
• Model: Mercury and Alkalinity
• In 1990 to 1991, largemouth bass were studied in 53 different Florida lakes to examine the mercury contamination level and the factors that influenced the level of mercury absorption in the fish
• One factor studied was the alkalinity level of the water
• The graph of mercury level and alkalinity level is plotted to study the relationship


Mercury Level Graphs

Coding:
# assumes the variables Alkalinity and Mercury are already available (e.g. attached)

#1 loess
plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Loess")
lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=1), col="red", lwd=2)
lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=2), col="blue", lwd=2)
legend(65, 1.0, c("deg=1 Red", "deg=2 Blue"), bty="n")

#2 supsmu
plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Supsmu")
lines(supsmu(Alkalinity, Mercury, bass=1), lty=1, col="red", lwd=2)
lines(supsmu(Alkalinity, Mercury, bass=10), lty=3, col="blue", lwd=3)
legend(58, 1.0, c("bass=1 red", "bass=10 blue"), lty=c(1,3), bty="n", lwd=2)

#3 ksmooth
plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Ksmooth")
lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=1), col="green", lwd=2)
lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=5), col="red", lty=2, lwd=2)
legend(75, 1.0, c("bw=1", "bw=5"), lty=c(1,2), bty="n")

#4 locpoly
library(KernSmooth)
plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Locpoly")
# select bandwidth
(h <- dpill(Alkalinity, Mercury))
lines(locpoly(Alkalinity, Mercury, degree=0, bandwidth=h), lty=1, col="green", lwd=2)
lines(locpoly(Alkalinity, Mercury, degree=1, bandwidth=h), lty=2, col="red", lwd=2)
lines(locpoly(Alkalinity, Mercury, degree=2, bandwidth=h), lty=3, col="purple", lwd=3)
legend(75, 1.0, c("const", "linear", "quad"), lty=c(1,2,3), bty="n")


SUMMARY

• Use One-Dimensional Curve-Fitting when:
  – Scatter plot does not result in a linear model
  – Data transformation does not give a satisfactory linear model result
  – Accommodate future data
  – Include previous outliers
  – Business applications

• Several methods discussed, including:
  1. SPLINES
  2. LOESS
  3. SUPSMU
  4. KSMOOTH
  5. LOCPOLY

• Parameters such as bandwidth, df, derivative, smoothness and degree control how closely the fitted curve follows the data.