
Regression Tree and Multivariate Adaptive Regression Splines (MARS)

Munmun Biswas

Dept. of Statistics, Brahmananda Keshab Chandra College

July 28, 2020


Classification Problem

CART in classification problem

We have discussed the idea
We have seen the rpart package for implementation
We have also seen certain limitations of the CART algorithm

To improve its effectiveness, certain ensemble methods have been proposed:

Bagging
Boosting
Random Forest

To be discussed in the next class


Regression Problem

Data: $(y_i, x_i)$, $i = 1, \dots, n$, where $x_i = (x_{i1}, \dots, x_{ij}, \dots, x_{ip})'$

The CART algorithm splits the x-space into partitions, say $\{R_1, R_2, \dots, R_M\}$

The regression tree models the response as $f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$, where $c_m = \mathrm{ave}(y_i \mid x_i \in R_m)$

[Figure: partition of the $(x_1, x_2)$ space into regions $R_1, R_2, R_3, R_4$]
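As a minimal illustration of the prediction rule $f(x) = \sum_m c_m I(x \in R_m)$ (a sketch with made-up numbers, not from the slides), two regions defined by a single split on $x_1$ give fitted values that are simply the region means:

# two regions R1 = {x1 <= 5} and R2 = {x1 > 5}; the data are illustrative only
x1 <- c(1, 2, 3, 6, 7, 9)
y  <- c(10, 12, 11, 20, 22, 21)
c1 <- mean(y[x1 <= 5])                   # fitted constant for R1 (= 11)
c2 <- mean(y[x1 >  5])                   # fitted constant for R2 (= 21)
f  <- function(x) ifelse(x <= 5, c1, c2)
f(c(2, 8))                               # predictions: 11 and 21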


The CART algorithm

The partitioning of the predictor space is done in a top-down, greedy fashion.

A partition performed earlier in the tree will not change based on later partitions.

The model begins with the entire data set, S. It searches every distinct value of every input variable to find the predictor and split value that partition the data into two regions R1 and R2 such that the overall sum of squared errors is minimized (a small R sketch of this search follows after this list):

Minimize $\mathrm{SSE} = \sum_{i \in R_1} (y_i - c_1)^2 + \sum_{i \in R_2} (y_i - c_2)^2$

Having found the best binary split, we partition the data into the two resulting regions.

Repeat the splitting process on each of the two regions.

This process is continued until some stopping criterion is reached.
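A minimal sketch of the exhaustive split search for one numeric predictor (an illustration, not the rpart implementation); the commented call uses hypothetical column names from the body dimension data introduced later:

# best single split of one numeric predictor x on response y, by SSE
best_split <- function(x, y) {
  best <- list(split = NA, sse = Inf)
  for (t in sort(unique(x))) {
    left  <- y[x <= t]
    right <- y[x >  t]
    if (length(left) == 0 || length(right) == 0) next
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse) best <- list(split = t, sse = sse)
  }
  best
}
# hypothetical usage: best_split(bodydim_train$height, bodydim_train$weight)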


Cost Complexity Parameter

Stopping rule

minsplit: the minimum number of data points required to attempt a split before it is forced to create a terminal node.

maxdepth: the maximum number of internal nodes between the root node and the terminal nodes.

We typically grow a very large tree as defined in the previous section and then prune it back to find an optimal subtree.

There is often a balance to be achieved in the depth and complexity of the tree to optimize predictive performance on some unseen data.

Minimize $\{\mathrm{SSE} + \alpha |T|\}$, where $|T|$ is the number of terminal nodes of the tree.

Obtain the smallest pruned tree that has the lowest penalized error (a short rpart pruning sketch follows below).
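A minimal sketch of this grow-then-prune workflow with rpart, using the built-in mtcars data purely for illustration (here cp plays the role of $\alpha$ above):

library(rpart)
# grow a deliberately large tree (cp = 0), then prune back at the cp value
# with the smallest cross-validated error in the cp table
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(cp = 0, minsplit = 2, xval = 10))
printcp(fit)                                               # cp table
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)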


Implementation of Regression Tree

R package rpart

Using rattle()

Rattle is a free graphical user interface for Data Science, developed using R. R is a free software environment for statistical computing, graphics, machine learning and artificial intelligence. Together, Rattle and R provide a sophisticated environment for data science, statistical analyses, and data visualisation.
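For reference, a minimal way to open the Rattle GUI from an R session (assuming the rattle package is installed):

library(rattle)
rattle()   # launches the graphical interface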


Data Description

Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals: 247 men and 260 women.

Figure: Body Dimension Data


Implementation using rpart

library(rsample)
library(rpart)
library(rpart.plot)
library(Metrics)   # rmse function

body_dimension_data_interest <- read.csv("~/Desktop/Workshop stat ml/My talk/body_dimension_data_interest.csv",
                                         row.names = NULL)
View(body_dimension_data_interest)

set.seed(123)
bodydim_split <- initial_split(body_dimension_data_interest, prop = 0.7)
bodydim_train <- training(bodydim_split)
bodydim_test  <- testing(bodydim_split)

m1 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova")
rpart.plot(m1)   # to view the tree
plotcp(m1)       # to check for the pruning

m2 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova",
            control = list(cp = 0, xval = 10))
plotcp(m2)

pred <- predict(m1, newdata = bodydim_test)
obs  <- bodydim_test$weight
rmse(pred, obs)


Illustration using rattle()


Advantages/Disadvantages

Advantages

Trees are easy to interpret

Trees can handle multicollinearity

The tree method is non-parametric (assumption-free)

Disadvantages

High variance caused by the hierarchical nature of the process

Lack of smoothness of the predictor surface (which MARS alleviates)

Difficulty in modeling additive structure (which MARS can capture)


MARS

For the regression tree process, the data were partitioned in a way that produced the "best" split with reference to the deviances from the mean on either side of the split.

For MARS a similar process is used to find the best split with reference to the deviances from a spline function on either side of the split.

The spline functions used by MARS are:

$(x - t)_+ = \begin{cases} x - t & x > t \\ 0 & \text{otherwise} \end{cases}$ and $(t - x)_+ = \begin{cases} t - x & t > x \\ 0 & \text{otherwise} \end{cases}$

Each function is piecewise linear. By multiplying these splines together it is possible to produce quadratic or cubic curves.

The pair of functions $(X - t)_+$, $(t - X)_+$ is called a reflected pair, while $t$ is called a knot (a small R sketch of these functions follows below).

Recall: the regression tree uses as basis functions $I(X_j > c)$ and $I(X_j \le c)$.
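A minimal R sketch of a reflected pair of hinge functions and their product (the knot values are illustrative assumptions):

# reflected pair with knot t
hinge_pos <- function(x, t) pmax(x - t, 0)   # (x - t)+
hinge_neg <- function(x, t) pmax(t - x, 0)   # (t - x)+

x   <- seq(0, 10, by = 0.1)
h1  <- hinge_pos(x, t = 4)                   # piecewise linear in x
h2  <- hinge_neg(x, t = 7)
h12 <- h1 * h2                               # product of two hinges: locally quadratic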


Illustrative Example


Multivariate Adaptive Regression Splines

Data: $(y_i, x_i)$, $i = 1, \dots, n$, where $x_i = (x_{i1}, \dots, x_{ij}, \dots, x_{ip})'$

Consider basis functions of the form $C = \{(X_j - x_{ij})_+, (x_{ij} - X_j)_+\}$, $i = 1, \dots, n$, $j = 1, \dots, p$

Model building strategy

The forward pass

$f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X)$

Each $h_m(X)$ is either a function in $C$ or a product of two or more such functions

In each step the $\beta_m$'s are estimated by minimizing the residual sum of squares

The backward pass

It prunes the model into its most effective part


The forward pass

Step 1: Start with $h_0(X) = 1$; $f^{(1)} = \beta_0^{(1)}$, $\mathcal{M}^{(1)} = \{h_0(X)\}$

Step 2: Add to the model a function of the form $b_1(X_j - t)_+ + b_2(t - X_j)_+$, with $t \in \{x_{1j}, \dots, x_{Nj}\}$, that produces the largest decrease in training error (sketched in R below). Say this is achieved by $j = J$ and $t = x_{kJ}$.
Model: $f^{(2)} = \beta_0^{(2)} + \beta_1^{(2)} (X_J - x_{kJ})_+ + \beta_2^{(2)} (x_{kJ} - X_J)_+$,
$\mathcal{M}^{(2)} = \{h_0(X), h_1(X), h_2(X)\}$, where $h_1(X) = (X_J - x_{kJ})_+$, etc.

. . .

Step m + 1: Add to the model a function of the form $b_{2m-1} h_l(X)(X_j - t)_+ + b_{2m} h_l(X)(t - X_j)_+$ with $h_l(X) \in \mathcal{M}^{(m)}$ that produces the largest decrease in training error. Say this is achieved by $j = J'$, $t = x_{k'J'}$ and $l = L$.
$\mathcal{M}^{(m+1)} = \mathcal{M}^{(m)} \cup \{h_{2m-1}(X), h_{2m}(X)\}$, where $h_{2m-1}(X) = h_L(X)(X_{J'} - x_{k'J'})_+$ and $h_{2m}(X) = h_L(X)(x_{k'J'} - X_{J'})_+$
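A minimal R sketch of the Step 2 search (an illustration, not the earth implementation): for each predictor and each observed knot, fit the reflected pair by least squares and keep the pair giving the smallest residual sum of squares.

# one forward-pass step starting from the intercept-only model
forward_step <- function(X, y) {
  best <- list(rss = Inf)
  for (j in seq_len(ncol(X))) {
    for (t in unique(X[, j])) {
      h1 <- pmax(X[, j] - t, 0)        # (Xj - t)+
      h2 <- pmax(t - X[, j], 0)        # (t - Xj)+
      fit <- lm(y ~ h1 + h2)
      rss <- sum(resid(fit)^2)
      if (rss < best$rss) best <- list(j = j, t = t, rss = rss)
    }
  }
  best
}
# hypothetical usage: forward_step(as.matrix(mtcars[, -1]), mtcars$mpg)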


The backward pass

The forward pass algorithm stops when the model set contains some preset number of terms.

Stopping rule: one can fix the degree of interaction terms

The model typically overfits the data, including a large number of terms

The backward pass prunes the model

It deletes the least effective terms one by one

Generalized cross-validation is used to choose the size of the model and the most effective subset of terms

$\mathrm{GCV}(\lambda) = \dfrac{\sum_{i=1}^{n} (y_i - \hat{f}_\lambda(x_i))^2}{(1 - M(\lambda)/n)^2}$, where $\lambda$ is the size of the model and $M(\lambda)$ is the effective number of parameters in the model (a small R sketch follows below).
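As a minimal sketch, the GCV criterion above can be written as a small R helper (the arguments are generic; the commented call with mars1 and the value of $M(\lambda)$ are hypothetical):

# GCV(lambda) = RSS / (1 - M(lambda)/n)^2, as on this slide
gcv <- function(y, yhat, m_lambda) {
  n <- length(y)
  sum((y - yhat)^2) / (1 - m_lambda / n)^2
}
# hypothetical usage: gcv(bodydim_train$weight, predict(mars1), m_lambda = 10)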


Implementation of MARS in R

# implementation of MARS
library(earth)
library(caret)

mars1 <- earth(weight ~ ., data = bodydim_train)
print(mars1)      # for model summary
summary(mars1)
plot(mars1, which = 1)

mars2 <- earth(weight ~ ., data = bodydim_train, degree = 2)
summary(mars2)
plot(mars2, which = 1, legend.pos = 0)

# performance of the model in the test data set
summary(mars1, newdata = bodydim_test)
yhat <- predict(mars1, newdata = bodydim_test)
yobs <- bodydim_test$weight
rmse(yhat, yobs)
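Since caret is loaded above but not otherwise used, a possible extension (a sketch with assumed tuning values, not from the slides) is to tune the MARS hyperparameters nprune and degree by cross-validation:

# cross-validated tuning of MARS via caret; grid values are illustrative
tune_grid <- expand.grid(degree = 1:2, nprune = seq(2, 20, by = 2))
mars_cv <- train(weight ~ ., data = bodydim_train, method = "earth",
                 metric = "RMSE",
                 trControl = trainControl(method = "cv", number = 10),
                 tuneGrid = tune_grid)
mars_cv$bestTune   # best degree / nprune found by 10-fold CV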


Advantages and Disadvantages of MARS

Advantages:

Accurate if the local linear relationships are correct.
Quick computation.
Can work well with both large and small data sets.
Provides automated feature selection.
The non-linear relationships between the features and the response are fairly intuitive.
Can be used for both regression and classification problems.
Does not require feature standardization.

Disadvantages:

Not accurate if the local linear relationships are incorrect.
Typically not as accurate as more advanced non-linear algorithms (random forests, gradient boosting machines).
The earth package does not incorporate more advanced spline features (e.g. piecewise cubic models).
Missing values must be pre-processed.


Thank You
