Regression Tree and Multivariate Adaptive Regression Splines (MARS)

Munmun Biswas

Dept. of Statistics, Brahmananda Keshab Chandra College

July 28, 2020

Classification Problem

CART in classification problem

We have discussed the idea

We have seen the rpart package for implementation

We have also seen certain limitations of the CART algorithm

To improve effectiveness, certain ensemble methods have been proposed:

Bagging
Boosting
Random Forest

To be discussed in the next class

Regression Problem

Data: (y_i, x_i), i = 1, ..., n, where x_i = (x_i1, ..., x_ij, ..., x_ip)'

The CART algorithm splits the x-space into partitions, say {R_1, R_2, ..., R_M}

The regression tree models the response as f(x) = Σ_{m=1}^M c_m I(x ∈ R_m), with c_m = ave(y_i | x_i ∈ R_m) (see the sketch below)

Figure: partition of the (x_1, x_2)-space into regions R_1, R_2, R_3, R_4
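A minimal R sketch (not from the slides) of the last point: once a partition is fixed, the fitted value in each region R_m is just the mean response there. The objects y and region below are hypothetical, standing for the response vector and a factor giving each observation's region.

c_m   <- tapply(y, region, mean)       # one fitted constant per region: c_m = ave(y_i | x_i in R_m)
f_hat <- c_m[as.character(region)]     # regression-tree prediction for each observation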

The CART algorithm

The partitioning of the variables is done in a top-down, greedy fashion.

A partition performed earlier in the tree will not change based on later partitions.

The model begins with the entire data set, S. It searches every distinct value of every input variable to find the predictor and split value that partition the data into two regions R_1 and R_2 such that the overall sum of squared errors is minimized (a small search sketch follows):

Minimize SSE = Σ_{i ∈ R_1} (y_i − c_1)^2 + Σ_{i ∈ R_2} (y_i − c_2)^2

Having found the best binary split, we partition the data into the two resulting regions.

Repeat the splitting process on each of the two regions.

This process is continued until some stopping criterion is reached.
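A minimal R sketch (not from the slides, and ignoring the recursion) of the greedy search above for a single numeric predictor: every distinct value of x is tried as a split point and the two-region SSE is computed.

best_split <- function(x, y) {
  best <- list(split = NA, sse = Inf)
  for (t in sort(unique(x))) {
    left  <- y[x <  t]
    right <- y[x >= t]
    if (length(left) == 0 || length(right) == 0) next
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse) best <- list(split = t, sse = sse)
  }
  best
}
# e.g. best_split(bodydim_train$height, bodydim_train$weight), using the data
# split defined in the rpart slide below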

Cost Complexity Parameter

Stopping rule

minsplit: the minimum number of data points required to attempt a split before it is forced to create a terminal node.

maxdepth: the maximum number of internal nodes between the root node and the terminal nodes.

We typically grow a very large tree as defined in the previous section and then prune it back to find an optimal subtree.

There is often a balance to be achieved between the depth and complexity of the tree to optimize predictive performance on unseen data.

Minimize {SSE + α × |T|}, where |T| is the number of terminal nodes of the tree.

Obtain the smallest pruned tree that has the lowest penalized error (see the pruning sketch below).
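A minimal rpart sketch of this cost-complexity pruning (assuming a fitted rpart object such as m2 from the implementation slide below): pick the cp value whose cross-validated error in the cp table is smallest and prune back to it.

library(rpart)
best_cp   <- m2$cptable[which.min(m2$cptable[, "xerror"]), "CP"]
m2_pruned <- prune(m2, cp = best_cp)   # smallest subtree with the lowest penalized error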

Implementation of Regression Tree

R package rpart

Using rattle() (see the launch sketch below)

Rattle is a free graphical user interface for Data Science, developed using R. R is a free software environment for statistical computing, graphics, machine learning and artificial intelligence. Together, Rattle and R provide a sophisticated environment for data science, statistical analyses, and data visualisation.
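A minimal sketch (assuming the rattle package is installed) of how the GUI is started from an R session:

library(rattle)
rattle()   # opens the Rattle window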

Data Description

Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals: 247 men and 260 women.

Figure: Body Dimension Data

Implementation using rpart

library(rsample)      # initial_split(), training(), testing()
library(rpart)
library(rpart.plot)
library(Metrics)      # rmse function

body_dimension_data_interest <- read.csv("~/Desktop/Workshop stat ml/My talk/body_dimension_data_interest.csv", row.names = NULL)
View(body_dimension_data_interest)

set.seed(123)
bodydim_split <- initial_split(body_dimension_data_interest, prop = 0.7)
bodydim_train <- training(bodydim_split)
bodydim_test  <- testing(bodydim_split)

m1 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova")
rpart.plot(m1)   # to view the tree
plotcp(m1)       # to check for the pruning

m2 <- rpart(formula = weight ~ ., data = bodydim_train, method = "anova",
            control = list(cp = 0, xval = 10))
plotcp(m2)

pred <- predict(m1, newdata = bodydim_test)
obs  <- bodydim_test$weight
rmse(obs, pred)   # test-set root mean squared error

Illustration using rattle()

Advantages / Disadvantages

Advantages

Trees are easy to interpret

Trees can handle multicollinearity

The tree method is non-parametric (essentially assumption-free)

Disadvantages

High variance caused by the hierarchical nature of the process

Lack of smoothness of the predictor surface (which MARS alleviates)

Difficulty in modeling additive structure (which MARS captures)

MARS

For the regression tree process, the data were partitioned in a way that produced the "best" split with reference to the deviances from the mean on either side of the split.

For MARS, a similar process is used to find the best split with reference to the deviances from a spline function on either side of the split.

The spline functions used by MARS are (sketched in R below):

(x − t)_+ = x − t if x > t, and 0 otherwise
(t − x)_+ = t − x if t > x, and 0 otherwise

Each function is piecewise linear. By multiplying these splines together it is possible to produce quadratic or cubic curves.

The pair of functions (X − t)_+, (t − X)_+ is called a reflected pair, while t is called a knot.

Recall: a regression tree uses the basis functions I(X_j > c) and I(X_j ≤ c).
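A minimal R sketch (not from the slides) of the reflected pair for a knot t; each piece is linear in x and zero on the other side of the knot.

hinge_pos <- function(x, t) pmax(x - t, 0)   # (x - t)_+
hinge_neg <- function(x, t) pmax(t - x, 0)   # (t - x)_+

x <- seq(0, 10, by = 0.5)
cbind(x, pos = hinge_pos(x, 4), neg = hinge_neg(x, 4))   # knot at t = 4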

Illustrative Example

Multivariate Adaptive Regression Splines

Data: (y_i, x_i), i = 1, ..., n, where x_i = (x_i1, ..., x_ij, ..., x_ip)'

Consider basis functions of the form C = {(X_j − x_ij)_+, (x_ij − X_j)_+}, i = 1, ..., n, j = 1, ..., p

Model building strategy

The forward pass

f(X) = β_0 + Σ_{m=1}^M β_m h_m(X)

Each h_m(X) is either a function in C or a product of two or more such functions

At each step the β_m's are estimated by minimizing the residual sum of squares

The backward pass

It prunes the model down to its most effective part

The forward pass

Step 1: Start with h_0(X) = 1; f^(1) = β_0^(1), M^(1) = {h_0(X)}

Step 2: Add to the model a function of the form b_1 (X_j − t)_+ + b_2 (t − X_j)_+, with t ∈ {x_1j, ..., x_Nj}, that produces the largest decrease in training error. Say this is achieved by j = J and t = x_kJ.
Model: f^(2) = β_0^(2) + β_1^(2) (X_J − x_kJ)_+ + β_2^(2) (x_kJ − X_J)_+,
M^(2) = {h_0(X), h_1(X), h_2(X)}, where h_1(X) = (X_J − x_kJ)_+ etc.

...

Step m + 1: Add to the model a function of the form b_{2m−1} h_l(X) (X_j − t)_+ + b_{2m} h_l(X) (t − X_j)_+, with h_l(X) ∈ M^(m), that produces the largest decrease in training error. Say this is achieved by j = J′, t = x_k′J′ and l = L. Then
M^(m+1) = M^(m) ∪ {h_{2m−1}(X), h_{2m}(X)}, with
h_{2m−1}(X) = h_L(X) (X_J′ − x_k′J′)_+ and h_{2m}(X) = h_L(X) (x_k′J′ − X_J′)_+

(A simplified one-step search is sketched below.)
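A minimal R sketch (not from the slides) of one forward-pass search step. B is the current basis matrix (its first column the constant h_0 = 1), X the predictor matrix and y the response; for brevity the candidate reflected pair is only combined with the constant term rather than with every h_l in the current model.

forward_step <- function(B, X, y) {
  best <- list(rss = Inf)
  for (j in seq_len(ncol(X))) {
    for (t in unique(X[, j])) {
      h_new <- cbind(pmax(X[, j] - t, 0), pmax(t - X[, j], 0))   # reflected pair at knot t
      rss   <- sum(lm.fit(cbind(B, h_new), y)$residuals^2)       # refit all coefficients by least squares
      if (rss < best$rss) best <- list(rss = rss, j = j, t = t, basis = h_new)
    }
  }
  best
}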

The backward pass

The forward pass algorithm stops when the model set contains some preset number of terms.

Stopping rule: one can also fix the degree of the interaction terms.

The model typically overfits the data, as it includes a large number of terms.

The backward pass prunes the model.

It deletes the least effective terms one by one.

Generalized cross-validation (GCV) is used to choose the size of the model and the most effective subset of terms (a small sketch follows):

GCV(λ) = Σ_{i=1}^n (y_i − f_λ(x_i))^2 / (1 − M(λ)/n)^2,

where λ is the size of the model and M(λ) is the effective number of parameters in the model.
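A minimal R sketch (not from the slides) of the GCV criterion exactly as written above, given observed and fitted values and the effective number of parameters M_lambda.

gcv <- function(y, yhat, M_lambda) {
  n <- length(y)
  sum((y - yhat)^2) / (1 - M_lambda / n)^2
}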

Implementation of MARS in R

# implementation of MARS
library(earth)
library(caret)
library(Metrics)   # rmse function, also loaded in the rpart slide

mars1 <- earth(weight ~ ., data = bodydim_train)
print(mars1)       # for model summary
summary(mars1)
plot(mars1, which = 1)

mars2 <- earth(weight ~ ., data = bodydim_train, degree = 2)
summary(mars2)
plot(mars2, which = 1, legend.pos = 0)

# performance of the model on the test data set
summary(mars1, newdata = bodydim_test)
yhat <- predict(mars1, newdata = bodydim_test)
yobs <- bodydim_test$weight
rmse(yobs, yhat)
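A possible follow-up (not on the slides) using the caret package loaded above: cross-validated tuning of the MARS degree and the number of retained terms (nprune); the grid values here are only illustrative.

set.seed(123)
mars_cv <- train(
  weight ~ ., data = bodydim_train,
  method    = "earth",
  tuneGrid  = expand.grid(degree = 1:2, nprune = seq(2, 20, by = 2)),
  trControl = trainControl(method = "cv", number = 10)
)
mars_cv$bestTune   # degree / nprune pair with the lowest cross-validated RMSE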

Advantage and Disadvantage of MARS

Advantages:

Accurate if the local linear relationships are correct.
Quick computation.
Can work well with both large and small data sets.
Provides automated feature selection.
The non-linear relationships between the features and the response are fairly intuitive.
Can be used for both regression and classification problems.
Does not require feature standardization.

Disadvantages:

Not accurate if the local linear relationships are incorrect.
Typically not as accurate as more advanced non-linear algorithms (random forests, gradient boosting machines).
The earth package does not incorporate more advanced spline features (i.e. piecewise cubic models).
Missing values must be pre-processed.

Thank You
