Regression Tree and Multivariate Adaptive Regression Splines (MARS)

Munmun Biswas

Dept. of Statistics, Brahmananda Keshab Chandra College

July 28, 2020

Classification Problem

CART in classification problem

We have discussed the idea

We have seen the rpart package for implementation

We have also seen certain limitations of the CART algorithm

To improve effectiveness, certain ensemble methods have been proposed:

Bagging
Boosting
Random Forest

To be discussed in the next class

Regression Problem

Data: (y_i, x_i), i = 1, ..., n, where x_i = (x_i1, ..., x_ij, ..., x_ip)'

The CART algorithm splits the x-space into partitions, say {R_1, R_2, ..., R_M}

The regression tree models the response as f(x) = Σ_{m=1}^M c_m I(x ∈ R_m), with c_m = ave(y_i | x_i ∈ R_m) (see the sketch below)

Figure: partition of the (x_1, x_2)-space into regions R_1, R_2, R_3, R_4
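A minimal R sketch (not from the slides) of the last point: once a partition is fixed, the fitted value in each region R_m is just the mean response there. The objects y and region below are hypothetical, standing for the response vector and a factor giving each observation's region.

c_m   <- tapply(y, region, mean)       # one fitted constant per region: c_m = ave(y_i | x_i in R_m)
f_hat <- c_m[as.character(region)]     # regression-tree prediction for each observation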

The CART algorithm

The partitioning of the variables is done in a top-down, greedy fashion.

A partition performed earlier in the tree will not change based on later partitions.

The model begins with the entire data set, S. It searches every distinct value of every input variable to find the predictor and split value that partition the data into two regions R_1 and R_2 such that the overall sum of squared errors is minimized (a small search sketch follows):

Minimize SSE = Σ_{i ∈ R_1} (y_i − c_1)^2 + Σ_{i ∈ R_2} (y_i − c_2)^2

Having found the best binary split, we partition the data into the two resulting regions.

Repeat the splitting process on each of the two regions.

This process is continued until some stopping criterion is reached.
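A minimal R sketch (not from the slides, and ignoring the recursion) of the greedy search above for a single numeric predictor: every distinct value of x is tried as a split point and the two-region SSE is computed.

best_split <- function(x, y) {
  best <- list(split = NA, sse = Inf)
  for (t in sort(unique(x))) {
    left  <- y[x <  t]
    right <- y[x >= t]
    if (length(left) == 0 || length(right) == 0) next
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse) best <- list(split = t, sse = sse)
  }
  best
}
# e.g. best_split(bodydim_train$height, bodydim_train$weight), using the data
# split defined in the rpart slide below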

Cost Complexity Parameter

Stopping rule

minsplit: the minimum number of data points required to attempt a split before it is forced to create a terminal node.

maxdepth: the maximum number of internal nodes between the root node and the terminal nodes.

We typically grow a very large tree as defined in the previous section and then prune it back to find an optimal subtree.

There is often a balance to be achieved between the depth and complexity of the tree to optimize predictive performance on unseen data.

Minimize {SSE + α × |T|}, where |T| is the number of terminal nodes of the tree.

Obtain the smallest pruned tree that has the lowest penalized error (see the pruning sketch below).
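A minimal rpart sketch of this cost-complexity pruning (assuming a fitted rpart object such as m2 from the implementation slide below): pick the cp value whose cross-validated error in the cp table is smallest and prune back to it.

library(rpart)
best_cp   <- m2$cptable[which.min(m2$cptable[, "xerror"]), "CP"]
m2_pruned <- prune(m2, cp = best_cp)   # smallest subtree with the lowest penalized error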

Implementation of Regression Tree

R package rpart

Using rattle() (see the launch sketch below)

Rattle is a free graphical user interface for Data Science, developed using R. R is a free software environment for statistical computing, graphics, machine learning and artificial intelligence. Together, Rattle and R provide a sophisticated environment for data science, statistical analyses, and data visualisation.
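A minimal sketch (assuming the rattle package is installed) of how the GUI is started from an R session:

library(rattle)
rattle()   # opens the Rattle window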

Data Description

Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals: 247 men and 260 women.

Figure: Body Dimension Data

Implementation using rpart

library(rsample)      # initial_split(), training(), testing()
library(rpart)
library(rpart.plot)
library(Metrics)      # rmse function

body_dimension_data_interest <- read.csv("~/Desktop/Workshop stat ml/My talk/body_dimension_data_interest.csv", row.names = NULL)
View(body_dimension_data_interest)

set.seed(123)
bodydim_split <- initial_split(body_dimension_data_interest, prop = 0.7)
bodydim_train <- training(bodydim_split)
bodydim_test  <- testing(bodydim_split)

m1 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova")
rpart.plot(m1)   # to view the tree
plotcp(m1)       # to check for the pruning

m2 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova",
            control = list(cp = 0, xval = 10))
plotcp(m2)

pred <- predict(m1, newdata = bodydim_test)
obs  <- bodydim_test$weight
rmse(obs, pred)   # test-set root mean squared error

Illustration using rattle()

Advantages / Disadvantages

Advantages

Trees are easy to interpret

Trees can handle multicollinearity

The tree method is non-parametric (essentially assumption-free)

Disadvantages

High variance caused by the hierarchical nature of the process

Lack of smoothness of the predictor surface (which MARS alleviates)

Difficulty in modeling additive structure (which MARS captures)

MARS

For the regression tree process, the data were partitioned in a way that produced the "best" split with reference to the deviances from the mean on either side of the split.

For MARS, a similar process is used to find the best split with reference to the deviances from a spline function on either side of the split.

The spline functions used by MARS are (sketched in R below):

(x − t)_+ = x − t if x > t, and 0 otherwise
(t − x)_+ = t − x if t > x, and 0 otherwise

Each function is piecewise linear. By multiplying these splines together it is possible to produce quadratic or cubic curves.

The pair of functions (X − t)_+, (t − X)_+ is called a reflected pair, while t is called a knot.

Recall: a regression tree uses the basis functions I(X_j > c) and I(X_j ≤ c).
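A minimal R sketch (not from the slides) of the reflected pair for a knot t; each piece is linear in x and zero on the other side of the knot.

hinge_pos <- function(x, t) pmax(x - t, 0)   # (x - t)_+
hinge_neg <- function(x, t) pmax(t - x, 0)   # (t - x)_+

x <- seq(0, 10, by = 0.5)
cbind(x, pos = hinge_pos(x, 4), neg = hinge_neg(x, 4))   # knot at t = 4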

Illustrative Example

Multivariate Adaptive Regression Splines

Data: (y_i, x_i), i = 1, ..., n, where x_i = (x_i1, ..., x_ij, ..., x_ip)'

Consider basis functions of the form C = {(X_j − x_ij)_+, (x_ij − X_j)_+}, i = 1, ..., n, j = 1, ..., p

Model building strategy

The forward pass

f(X) = β_0 + Σ_{m=1}^M β_m h_m(X)

Each h_m(X) is either a function in C or a product of two or more such functions

At each step the β_m's are estimated by minimizing the residual sum of squares

The backward pass

It prunes the model down to its most effective part

The forward pass

Step 1: Start with h_0(X) = 1; f^(1) = β_0^(1), M^(1) = {h_0(X)}

Step 2: Add to the model a function of the form b_1 (X_j − t)_+ + b_2 (t − X_j)_+, with t ∈ {x_1j, ..., x_Nj}, that produces the largest decrease in training error. Say this is achieved by j = J and t = x_kJ.
Model: f^(2) = β_0^(2) + β_1^(2) (X_J − x_kJ)_+ + β_2^(2) (x_kJ − X_J)_+,
M^(2) = {h_0(X), h_1(X), h_2(X)}, where h_1(X) = (X_J − x_kJ)_+ etc.

...

Step m + 1: Add to the model a function of the form b_{2m−1} h_l(X) (X_j − t)_+ + b_{2m} h_l(X) (t − X_j)_+, with h_l(X) ∈ M^(m), that produces the largest decrease in training error. Say this is achieved by j = J′, t = x_k′J′ and l = L. Then
M^(m+1) = M^(m) ∪ {h_{2m−1}(X), h_{2m}(X)}, with
h_{2m−1}(X) = h_L(X) (X_J′ − x_k′J′)_+ and h_{2m}(X) = h_L(X) (x_k′J′ − X_J′)_+

(A simplified one-step search is sketched below.)
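A minimal R sketch (not from the slides) of one forward-pass search step. B is the current basis matrix (its first column the constant h_0 = 1), X the predictor matrix and y the response; for brevity the candidate reflected pair is only combined with the constant term rather than with every h_l in the current model.

forward_step <- function(B, X, y) {
  best <- list(rss = Inf)
  for (j in seq_len(ncol(X))) {
    for (t in unique(X[, j])) {
      h_new <- cbind(pmax(X[, j] - t, 0), pmax(t - X[, j], 0))   # reflected pair at knot t
      rss   <- sum(lm.fit(cbind(B, h_new), y)$residuals^2)       # refit all coefficients by least squares
      if (rss < best$rss) best <- list(rss = rss, j = j, t = t, basis = h_new)
    }
  }
  best
}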

The backward pass

The forward pass algorithm stops when the model set contains some preset number of terms.

Stopping rule: one can also fix the degree of the interaction terms.

The model typically overfits the data, as it includes a large number of terms.

The backward pass prunes the model.

It deletes the least effective terms one by one.

Generalized cross-validation (GCV) is used to choose the size of the model and the most effective subset of terms (a small sketch follows):

GCV(λ) = Σ_{i=1}^n (y_i − f_λ(x_i))^2 / (1 − M(λ)/n)^2,

where λ is the size of the model and M(λ) is the effective number of parameters in the model.
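A minimal R sketch (not from the slides) of the GCV criterion exactly as written above, given observed and fitted values and the effective number of parameters M_lambda.

gcv <- function(y, yhat, M_lambda) {
  n <- length(y)
  sum((y - yhat)^2) / (1 - M_lambda / n)^2
}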

Implementation of MARS in R

# implementation of MARS
library(earth)
library(caret)
library(Metrics)   # rmse function, also loaded in the rpart slide

mars1 <- earth(weight ~ ., data = bodydim_train)
print(mars1)       # for model summary
summary(mars1)
plot(mars1, which = 1)

mars2 <- earth(weight ~ ., data = bodydim_train, degree = 2)
summary(mars2)
plot(mars2, which = 1, legend.pos = 0)

# performance of the model on the test data set
summary(mars1, newdata = bodydim_test)
yhat <- predict(mars1, newdata = bodydim_test)
yobs <- bodydim_test$weight
rmse(yobs, yhat)
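A possible follow-up (not on the slides) using the caret package loaded above: cross-validated tuning of the MARS degree and the number of retained terms (nprune); the grid values here are only illustrative.

set.seed(123)
mars_cv <- train(
  weight ~ ., data = bodydim_train,
  method    = "earth",
  tuneGrid  = expand.grid(degree = 1:2, nprune = seq(2, 20, by = 2)),
  trControl = trainControl(method = "cv", number = 10)
)
mars_cv$bestTune   # degree / nprune pair with the lowest cross-validated RMSE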

Advantage and Disadvantage of MARS

Advantages:

Accurate if the local linear relationships are correct.
Quick computation.
Can work well with both large and small data sets.
Provides automated feature selection.
The non-linear relationships between the features and the response are fairly intuitive.
Can be used for both regression and classification problems.
Does not require feature standardization.

Disadvantages:

Not accurate if the local linear relationships are incorrect.
Typically not as accurate as more advanced non-linear algorithms (random forests, gradient boosting machines).
The earth package does not incorporate more advanced spline features (i.e. piecewise cubic models).
Missing values must be pre-processed.

Thank You
