7/24/2019 Chapter II - Overview of Supervised Learning
Overview of Supervised Learning
By
Amitava Bandyopadhyay and Boby John
SQC & OR Division
Indian Statistical Institute
Contents
Supervised learning as a function approximation
Parametric and non-parametric methods of function approximation
Two extremes: linear models and nearest neighbour
Major classes of approach
Types of supervised learning problems: predictive and explanatory
Bayes' optimal classifier
Assessing model accuracy and quality of fit: training and holdout data, concepts of MSE, training and test errors, bias and variance, flexibility and interpretability, overfitting and its implications
Model selection basic lesson
Problems of high dimensional data
Supervised Learning as Function Approximation
Supervised learning consists of estimating a target variable on the basis of a set of inputs. In general, therefore, the problem may be mathematically stated as
$Y = f(X) + \varepsilon$, where $X = (X_1, X_2, \ldots, X_k)$ represents a vector of input variables and Y may or may not be a vector.
The term $\varepsilon$ represents random error, to be explained later.
In the supervised learning setup we often assume that the target may be expressed as a function of the inputs. However, the true function $f(X)$ is generally unknown, and $\hat{f}(X)$ is an estimate of the true function.
Note: In a supervised learning problem we generally try to estimate the average (mean), median, rate or proportion of the target variable for given input values.
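As a rough illustration of the note above, the sketch below simulates data from $Y = f(X) + \varepsilon$ with a hypothetical linear f and three possible input values, then estimates the mean of Y at each input value by simple averaging. The function `f`, the input levels, the sample size and the noise level are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """Hypothetical 'true' function; in practice f is unknown."""
    return 1.0 + 2.0 * x

# Inputs take a few discrete values so that Y can be averaged at each one
levels = (0.0, 1.0, 2.0)
x = rng.choice(levels, size=3000)
eps = rng.normal(0.0, 1.0, size=x.size)  # random error with mean 0
y = f(x) + eps                           # Y = f(X) + epsilon

# Estimate the mean of Y for each given input value
f_hat = {v: y[x == v].mean() for v in levels}

for v in levels:
    print(f"x = {v}: true mean {f(v):.2f}, estimated mean {f_hat[v]:.3f}")
```

With roughly 1000 observations per level, the averages land close to the true means; with continuous inputs there are rarely repeated x values, which is why smoothing or model assumptions become necessary.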
Data for Model Fitting
Fitting the function to estimate values of the output (target) variable Y is called model fitting.
The models are fitted using data collected on both the output (Y) and the input (X) variables. In the usual setup, the data are represented as an N × (p + 1) matrix, where N is the number of observations and p is the number of input variables. In most cases Y is a single variable rather than a vector. However, there are occasions when there is more than one output variable.
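A minimal sketch of the N × (p + 1) layout described above, using NumPy. The sizes, the coefficients used to generate Y and the random data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 8, 3  # hypothetical: 8 observations on 3 input variables

X = rng.normal(size=(N, p))  # columns are the inputs X1, ..., Xp
# A single output Y, generated here from an assumed linear relationship plus noise
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=N)

# The data matrix: one row per observation, p input columns followed by Y
data = np.column_stack([X, y])
print(data.shape)  # (8, 4), i.e. N x (p + 1)

# Recover the inputs and the output before model fitting
X_part, y_part = data[:, :p], data[:, p]
```

With q output variables, the same layout would simply carry q output columns, giving an N × (p + q) matrix.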
Training,
Two Different Types of Models
Models may be fitted to estimate the value of Y or to classify the response into one of several classes. These two types may be referred to as the estimation and classification settings, respectively.
Estimation setting: In this case we estimate the average or median of Y for a given set of input variables. In this setup the error is usually measured as $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ represents the estimated value of $y_i$ for the given values of X. In certain cases the average absolute deviation, $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$, is also used.
Classification setting: In this case we classify the response into one of several classes on the basis of the values of X. In this setup the error is measured as $\frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$. The function $I(y_i \neq \hat{y}_i)$ is called the indicator function; it takes the value 1 if $y_i \neq \hat{y}_i$ and 0 otherwise. The number of cases for which the error is measured is given by n.
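The error measures for the two settings (plus the average absolute deviation mentioned for the estimation setting) can be computed directly. The function names below are illustrative and not taken from any particular library.

```python
import numpy as np

def mean_squared_error(y, y_hat):
    """Estimation setting: sum of (y_i - y_hat_i)^2 divided by n."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def mean_absolute_deviation(y, y_hat):
    """Alternative estimation error: average of |y_i - y_hat_i|."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

def misclassification_rate(y, y_hat):
    """Classification setting: sum of I(y_i != y_hat_i) divided by n."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    indicator = (y != y_hat).astype(int)  # 1 if misclassified, 0 otherwise
    return indicator.mean()

# Usage on small hand-made examples
print(mean_squared_error([3, 5, 2], [2.5, 5, 4]))  # (0.25 + 0 + 4) / 3
print(misclassification_rate(["a", "b", "b", "c"], ["a", "b", "c", "a"]))  # 2 / 4 = 0.5
```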
Note
Apart from the two settings of estimation and classification, we sometimes have a hypothesis testing setup.
In this setup, certain statements made about some variables, usually response variables, are verified from data. In order to verify the statements it is often necessary to estimate some values. These activities and the corresponding methodologies have been covered in a separate section of this course.
Different Types of
Bayes' Classifier
Bayes' classifier provides an optimality criterion for classification models. Let the response Y be a categorical variable with K different classes (labels).
Consider a classifier that assigns Y to class j such that $P(Y = j \mid X = x) \geq P(Y = k \mid X = x)$ for all $k \neq j$, i.e. the response is allocated to the class with maximum conditional probability. This classifier is called Bayes' classifier, and it can be shown that the Bayes' classifier gives the lowest rate of classification error among all classifiers.
Overall Bayes Error Rate $= 1 - E\left(\max_j P(Y = j \mid X)\right)$, where the expectation averages the probability over all possible values of X.
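A small worked sketch of the Bayes classifier, assuming the joint distribution of a discrete input X and a two-class response Y is fully known. In practice it never is, which is why the Bayes classifier serves as a benchmark rather than a usable method. The probability table below is hypothetical.

```python
from collections import defaultdict

# Hypothetical joint distribution P(X = x, Y = j): x in {0, 1, 2}, classes j in {0, 1}
joint = {
    (0, 0): 0.30, (0, 1): 0.05,
    (1, 0): 0.10, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.35,
}

def conditional_probs(joint):
    """Return P(Y = j | X = x) for each x, and the marginal P(X = x)."""
    marginal = defaultdict(float)
    for (x, _), p in joint.items():
        marginal[x] += p
    cond = defaultdict(dict)
    for (x, j), p in joint.items():
        cond[x][j] = p / marginal[x]
    return dict(cond), dict(marginal)

def bayes_classify(cond, x):
    """Allocate x to the class j with the maximum P(Y = j | X = x)."""
    return max(cond[x], key=cond[x].get)

def bayes_error_rate(cond, marginal):
    """1 - E[max_j P(Y = j | X)], averaging over the distribution of X."""
    return 1.0 - sum(marginal[x] * max(cond[x].values()) for x in marginal)

cond, marginal = conditional_probs(joint)
for x in sorted(marginal):
    print(x, bayes_classify(cond, x))  # 0 -> class 0, 1 -> class 1, 2 -> class 1
print(round(bayes_error_rate(cond, marginal), 4))  # 0.2
```

Here $E(\max_j P(Y = j \mid X)) = 0.30 + 0.15 + 0.35 = 0.80$, so no classifier can have an error rate below 0.2 on this distribution.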
Parametric and Non-Parametric Methods
Parametric models assume a particular form of the function, say linear or polynomial. For example, the analyst may assume linearity: $f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$. In this case the analyst only has to estimate a set of parameters ($\beta_0, \beta_1, \ldots, \beta_p$) to fit the model.
Non-parametric methods do not make explicit assumptions about the functional form of f. Instead, they seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. Thus non-parametric methods aim at fitting the data as accurately as possible but do not assume how the inputs are related to the output (target).
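The contrast can be sketched with NumPy: a parametric fit (least squares for the linear form above) against a non-parametric k-nearest-neighbour (k-NN) average. The true function $\sin(2\pi x)$, the noise level, the sample sizes and k = 10 are assumptions chosen only to show a case where the assumed linear form is far from the true shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Parametric: least-squares estimates of beta_0, ..., beta_p."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_linear(beta, X):
    return beta[0] + X @ beta[1:]

def predict_knn(X_train, y_train, X_new, k):
    """Non-parametric: average y over the k nearest training points."""
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def true_f(X):
    """Hypothetical non-linear truth, unknown in practice."""
    return np.sin(2 * np.pi * X[:, 0])

X_train = rng.uniform(0, 1, size=(200, 1))
y_train = true_f(X_train) + rng.normal(0, 0.1, size=200)
X_test = rng.uniform(0, 1, size=(500, 1))
y_test = true_f(X_test) + rng.normal(0, 0.1, size=500)

beta = fit_linear(X_train, y_train)
mse_linear = np.mean((y_test - predict_linear(beta, X_test)) ** 2)
mse_knn = np.mean((y_test - predict_knn(X_train, y_train, X_test, k=10)) ** 2)
print(f"test MSE, linear: {mse_linear:.3f}   k-NN (k=10): {mse_knn:.3f}")
```

Had the true f been linear, the parametric fit would be expected to do well with far fewer observations, since only p + 1 parameters must be estimated.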
Comparison of Parametric and Non-Parametric Models
Advantages of the non-parametric approach: As these approaches avoid the assumption of a particular functional form of f, they have the potential to accurately fit a wide range of possible shapes of f. In contrast, a parametric approach assumes a functional form and therefore runs a major risk that the assumed form is very different from the true shape.
Advantages of the parametric approach: These approaches reduce the problem to one of estimating a handful of parameters and consequently require a relatively small number of observations. In contrast, non-parametric methods depend on the observed values of Y to uncover underlying patterns. Consequently, these methods require a much larger number of observations.
When a parametric model fits well, we may assume that a
Two
The Continuum of Models
We present the models from the perspective of flexibility (complexity) vs. interpretability. The ordering is approximate.
The models that appear at the beginning are more interpretable but less flexible:
a. Linear models
b. Subset selection, stepwise regression and ridge regression
c. Generalized Linear Models (GLM)
d. Generalized Additive Models (GAM)
e. Tree based models
f. Bagging and Boosting
g. Regression splines and local regression models
h. Support
Concept of Overfitting
When a fitted model shows very small training error but high test error, the model is said to have overfitted the data.
Overfitting refers to extracting nuances of the particular data rather than explaining the phenomenon.
These models have low bias for the training data set. However, they have high variance, since fitting with a different data set may lead to a large change in the model parameters.
Overfitted models fit the training data very well but do not fit the validation or test data well.
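Test and Training Error
[Figure: error rate (vertical axis) plotted against flexibility/complexity (horizontal axis), showing a training error curve and a test error curve, with the underfitting area and the overfitting area marked.]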
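The shape of the test/training error picture on this slide can be reproduced approximately with k-NN regression, where flexibility increases as k decreases. This is a simulation sketch under assumed settings (the true function, noise level, sample sizes and k values are all hypothetical), not the data behind the original figure.

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * np.pi * x)  # hypothetical truth

def knn_predict(x_train, y_train, x_new, k):
    """Average y over the k training points nearest to each x_new (1-D inputs)."""
    dist = np.abs(x_new[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

sigma = 0.3
x_train = rng.uniform(0, 1, 100)
y_train = true_f(x_train) + rng.normal(0, sigma, 100)
x_test = rng.uniform(0, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, sigma, 1000)

train_err, test_err = {}, {}
for k in (100, 50, 20, 10, 5, 1):  # from least to most flexible
    train_err[k] = np.mean((y_train - knn_predict(x_train, y_train, x_train, k)) ** 2)
    test_err[k] = np.mean((y_test - knn_predict(x_train, y_train, x_test, k)) ** 2)
    print(f"k = {k:3d}   training MSE = {train_err[k]:.3f}   test MSE = {test_err[k]:.3f}")
```

Training MSE keeps falling as k decreases, reaching zero at k = 1 where each training point is its own nearest neighbour, whereas test MSE is typically U-shaped, mirroring the underfitting and overfitting areas.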
Concept of Flexibility and Complexity
A method is said to be more flexible if it allows a larger range of shapes to be fitted.
More flexible models require a larger number of parameters to be estimated. For example, a k-nearest neighbour approach with k = 10 and N = 10000 will require about N/k = 1000 parameters to be estimated, since it effectively fits a separate mean in each of roughly N/k neighbourhoods. However, if there are 10 independent variables, a linear model will require only 11 parameters (10 slopes and 1 intercept) to be estimated.
Models with a larger number of parameters are said to be more complex. Thus more flexible models are expected to be more complex.
Types of Errors
In a model fitting exercise we come across three types of errors: the irreducible error, bias and variance.
Irreducible error: As we may fail to consider all variables, or there may be uncontrollable variation even when all measurable variables have been considered, all fitted models have a certain quantum of error. This error is called the irreducible error and is often denoted by $\varepsilon$.
Bias: The amount by which the average of the estimate differs from the true mean. Lower bias, therefore, indicates a smaller departure from the true mean on average.
Variance: The extent to which the estimated function ($\hat{f}$) varies around its own mean.
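Bias and variance can be estimated by simulation when the true function is known: draw many training sets, refit each time, and look at how $\hat{f}(x_0)$ behaves at a fixed point $x_0$. These are the quantities in the standard decomposition of expected test error at $x_0$ into $\mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\varepsilon)$. The sketch below uses k-NN regression; the true function, $x_0 = 0.25$, the noise level and the number of repetitions are illustrative assumptions.

```python
import numpy as np

def true_f(x):
    return np.sin(2 * np.pi * x)  # hypothetical truth; true_f(0.25) = 1

def knn_at_point(x_train, y_train, x0, k):
    """k-NN estimate of f(x0): mean of y over the k nearest training points."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

def bias_variance_at(x0, k, n=100, sigma=0.3, reps=500, seed=0):
    """Refit on `reps` fresh training sets and summarise the estimates of f(x0)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 1, n)
        y = true_f(x) + rng.normal(0, sigma, n)  # irreducible error has sd sigma
        estimates[r] = knn_at_point(x, y, x0, k)
    bias = estimates.mean() - true_f(x0)  # average estimate minus true mean
    variance = estimates.var()            # spread of f_hat around its own mean
    return bias, variance

for k in (1, 10, 50):
    b, v = bias_variance_at(0.25, k)
    print(f"k = {k:2d}   bias = {b:+.3f}   variance = {v:.4f}")
```

As k grows the estimate averages over a wider neighbourhood, so its variance falls while its bias grows; this is the trade-off behind the U-shaped test error curve.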
Types of Supervised Learning Problems
Supervised learning problems may be divided into three broad classes, namely explanatory, predictive, and combination.
We are often interested in understanding the way the response Y is impacted by the input variables $X_1, \ldots, X_p$. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y. We instead want to understand the relationship between X and Y, or more specifically, to understand how Y changes as a function of $X_1, \ldots, X_p$. Now $\hat{f}$ cannot be treated as a black box, because we need to know its exact form. These setups are often called Explanatory Analytics.
Predictive Analytics: In certain cases the analyst may be solely interested in prediction accuracy and may not be interested in increasing substantive understanding. In such cases it is often appropriate to use very flexible functions that estimate values of f accurately, since the form of $\hat{f}$ need not be interpretable.
Combination: In certain cases we may be interested in both prediction and explanation of the phenomenon.
Examples of Explanatory Analytics
Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.
What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y, in the sense that increasing the predictor is associated with increasing values of Y. Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.
Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?
Examples of Predictive Analytics
Estimating stock price
Finding out whether a credit card transaction is fraudulent
Estimating how long a particular economic situation, like a recession, may last
Note:
1. Whether a particular problem is explanatory or predictive depends on the specific situation. An investor may be interested in knowing the likely stock price or may wish to understand the variables that impact the price.
2. In many cases the problem at hand may be a combination of predictive and explanatory. We may be interested in accurately predicting the drilling time of an oil rig or the chance of failure of an instrument, and at the same time we may like to
Model Selection: Basic Lesson
Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating f may be appropriate. For example, linear models allow for relatively simple and interpretable inference, but may not yield as accurate predictions as some other approaches. In contrast, some of the highly non-linear approaches that we discuss in later chapters of this course can potentially provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.
Summary
A large part of Business Analytics consists of developing an understanding of responses, predicting their values, or a combination of both. The techniques used for this purpose are called supervised learning techniques.
Supervised learning techniques essentially boil down to estimating a function of the explanatory variables that approximates the value of the response for given values of the explanatory variables.
Supervised learning problems consist of problems of estimation and problems of classification. Sometimes we have problems of hypothesis testing as well.
Choosing and fitting the function is known as model fitting. Models with a larger number of parameters are more flexible (complex). However, such models are usually more difficult to interpret.
Two broad approaches, parametric and non-parametric, are used to estimate the function. The parametric
Summary (Continued)
Model fitting is carried out from two perspectives: explanatory and predictive. While flexible models are preferred for prediction, interpretability is most important for explanatory models.
The data used to fit models are called training data. Generally the collected data need to be divided into three sets: training, test and validation.
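One common way to divide the collected data into three sets is a random split of row indices. The sketch below assumes a 60/20/20 training/validation/test split and a fixed seed; both are illustrative choices, not prescriptions from this chapter, and the function name is hypothetical.

```python
import numpy as np

def train_validation_test_split(n_rows, frac_train=0.6, frac_val=0.2, seed=0):
    """Randomly partition row indices 0..n_rows-1 into training, validation and test sets."""
    if not 0 < frac_train + frac_val < 1:
        raise ValueError("frac_train + frac_val must be between 0 and 1")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)  # shuffle so the split does not depend on row order
    n_train = int(frac_train * n_rows)
    n_val = int(frac_val * n_rows)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

train, val, test = train_validation_test_split(1000)
print(len(train), len(val), len(test))  # 600 200 200
```

A typical use is to fit candidate models on the training set, choose among them (for example, pick k in k-NN) on the validation set, and use the test set only once at the end to estimate error on new data.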
Review Questions
What is explanatory analytics?
What is predictive analytics?
What is a non-parametric model? Give an example.
What are bias and variance?
What is meant by overfitting? Why is it risky?
Parametric models are generally more flexible but less interpretable. Do you agree?
Explain the concept of NN (nearest neighbour) briefly.
In a linear model you try to express $E(Y \mid X_1, X_2, \ldots, X_k)$ as a linear function of $X_1, X_2, \ldots, X_k$. Do you agree? Explain briefly.