advanced classificaon methods - unipi.itdidawiki.cli.di.unipi.it/lib/exe/fetch.php/dm/le... ·...

Post on 10-Jul-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Advancedclassifica-onmethods

EnsembleMethods

•  Constructasetofclassifiersfromthetrainingdata

•  Predictclasslabelofpreviouslyunseenrecordsbyaggrega-ngpredic-onsmadebymul-pleclassifiers

GeneralIdeaOriginal

Training data

....D1 D2 Dt-1 Dt

D

Step 1:Create Multiple

Data Sets

C1 C2 Ct -1 Ct

Step 2:Build Multiple

Classifiers

C*Step 3:

CombineClassifiers

Whydoesitwork?

•  Supposethereare25baseclassifiers– Eachclassifierhaserrorrate,ε=0.35– Assumeclassifiersareindependent– Probabilitythattheensembleclassifiermakesawrongpredic-on:

∑=

− =−⎟⎟⎠

⎞⎜⎜⎝

⎛25

13

25 06.0)1(25

i

ii

iεε

ExamplesofEnsembleMethods

•  Howtogenerateanensembleofclassifiers?– Bagging

– Boos-ng

Bagging

•  Samplingwithreplacement

•  Buildclassifieroneachbootstrapsample

•  Eachsamplehasprobability(1–1/n)nofbeingselected

Original Data 1 2 3 4 5 6 7 8 9 10Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7

Boos-ng

•  Anitera-veproceduretoadap-velychangedistribu-onoftrainingdatabyfocusingmoreonpreviouslymisclassifiedrecords–  Ini-ally,allNrecordsareassignedequalweights– Unlikebagging,weightsmaychangeattheendofboos-nground

Boos-ng

•  Recordsthatarewronglyclassifiedwillhavetheirweightsincreased

•  Recordsthatareclassifiedcorrectlywillhavetheirweightsdecreased

Original Data 1 2 3 4 5 6 7 8 9 10Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4

• Example4ishardtoclassify

• Itsweightisincreased,thereforeitismorelikelytobechosenagaininsubsequentrounds

Example:AdaBoost

•  Baseclassifiers:C1,C2,…,CT

•  Errorrate:

•  Importanceofaclassifier:

( )∑=

≠=N

jjjiji yxCw

N 1

)(1δε

⎟⎟⎠

⎞⎜⎜⎝

⎛ −=

i

ii ε

εα

1ln21

Example:AdaBoost

•  Weightupdate:

•  Ifanyintermediateroundsproduceerrorratehigherthan50%,theweightsarerevertedbackto1/nandtheresamplingprocedureisrepeated

•  Classifica-on:

factor ionnormalizat theis where

)( ifexp)( ifexp)(

)1(

j

iij

iij

j

jij

i

Z

yxCyxC

Zww

j

j

⎪⎩

⎪⎨⎧

==

−+

α

α

( )∑=

==T

jjj

yyxCxC

1

)(maxarg)(* δα

BoostingRound 1 + + + -- - - - - -

0.0094 0.0094 0.4623B1

a = 1.9459

Illustra-ngAdaBoostDatapointsfortraining

Ini-alweightsforeachdatapoint

OriginalData + + + -- - - - + +

0.1 0.1 0.1

Illustra-ngAdaBoostBoostingRound 1 + + + -- - - - - -

BoostingRound 2 - - - -- - - - + +

BoostingRound 3 + + + ++ + + + + +

Overall + + + -- - - - + +

0.0094 0.0094 0.4623

0.3037 0.0009 0.0422

0.0276 0.1819 0.0038

B1

B2

B3

a = 1.9459

a = 2.9323

a = 3.8744

Rule-BasedClassifier

•  Classifyrecordsbyusingacollec-onof“if…then…”rules

•  Rule:(Condi&on)→y–  where

•  Condi&onisaconjunc-onsofa`ributes•  yistheclasslabel

–  LHS:ruleantecedentorcondi-on–  RHS:ruleconsequent–  Examplesofclassifica-onrules:

•  (BloodType=Warm)∧(LayEggs=Yes)→Birds•  (TaxableIncome<50K)∧(Refund=Yes)→Evade=No

Rule-basedClassifier(Example)

R1:(GiveBirth=no)∧(CanFly=yes)→BirdsR2:(GiveBirth=no)∧(LiveinWater=yes)→FishesR3:(GiveBirth=yes)∧(BloodType=warm)→MammalsR4:(GiveBirth=no)∧(CanFly=no)→Rep-lesR5:(LiveinWater=some-mes)→Amphibians

Name Blood Type Give Birth Can Fly Live in Water Classhuman warm yes no no mammalspython cold no no no reptilessalmon cold no no yes fisheswhale warm yes no yes mammalsfrog cold no no sometimes amphibianskomodo cold no no no reptilesbat warm yes yes no mammalspigeon warm no yes no birdscat warm yes no no mammalsleopard shark cold yes no yes fishesturtle cold no no sometimes reptilespenguin warm no no sometimes birdsporcupine warm yes no no mammalseel cold no no yes fishessalamander cold no no sometimes amphibiansgila monster cold no no no reptilesplatypus warm no no no mammalsowl warm no yes no birdsdolphin warm yes no yes mammalseagle warm no yes no birds

Applica-onofRule-BasedClassifier

•  Arulercoversaninstancexifthea`ributesoftheinstancesa-sfythecondi-onoftheruleR1:(GiveBirth=no)∧(CanFly=yes)→BirdsR2:(GiveBirth=no)∧(LiveinWater=yes)→FishesR3:(GiveBirth=yes)∧(BloodType=warm)→MammalsR4:(GiveBirth=no)∧(CanFly=no)→Rep-lesR5:(LiveinWater=some-mes)→Amphibians

TheruleR1coversahawk=>BirdTheruleR3coversthegrizzlybear=>Mammal

Name Blood Type Give Birth Can Fly Live in Water Classhawk warm no yes no ?grizzly bear warm yes no no ?

RuleCoverageandAccuracy•  Coverageofarule:– Frac-onofrecordsthatsa-sfytheantecedentofarule

•  Accuracyofarule:– Frac-onofrecordsthatsa-sfyboththeantecedentandconsequentofarule

Tid Refund Marital Status

Taxable Income Class

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes 10

(Status=Single)→No

Coverage=40%,Accuracy=50%

HowdoesRule-basedClassifierWork?R1:(GiveBirth=no)∧(CanFly=yes)→BirdsR2:(GiveBirth=no)∧(LiveinWater=yes)→FishesR3:(GiveBirth=yes)∧(BloodType=warm)→MammalsR4:(GiveBirth=no)∧(CanFly=no)→Rep-lesR5:(LiveinWater=some-mes)→Amphibians

AlemurtriggersruleR3,soitisclassifiedasamammalAturtletriggersbothR4andR5Adogfishsharktriggersnoneoftherules

Name Blood Type Give Birth Can Fly Live in Water Classlemur warm yes no no ?turtle cold no no sometimes ?dogfish shark cold yes no yes ?

Characteris-csofRule-BasedClassifier

•  Mutuallyexclusiverules– Classifiercontainsmutuallyexclusiverulesiftherulesareindependentofeachother

– Everyrecordiscoveredbyatmostonerule

•  Exhaus-verules– Classifierhasexhaus-vecoverageifitaccountsforeverypossiblecombina-onofa`ributevalues

– Eachrecordiscoveredbyatleastonerule

FromDecisionTreesToRules

YESYESNONO

NONO

NONO

Yes No

{Married}{Single,

Divorced}

< 80K > 80K

Taxable Income

Marital Status

Refund

Classification Rules

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income<80K) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income>80K) ==> Yes

(Refund=No, Marital Status={Married}) ==> No

Rulesaremutuallyexclusiveandexhaus-ve

Rulesetcontainsasmuchinforma-onasthetree

RulesCanBeSimplified

YESYESNONO

NONO

NONO

Yes No

{Married}{Single,

Divorced}

< 80K > 80K

Taxable Income

Marital Status

Refund

Tid Refund Marital Status

Taxable Income Cheat

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes 10

Ini-alRule:(Refund=No)∧(Status=Married)→No

SimplifiedRule:(Status=Married)→No

EffectofRuleSimplifica-on•  Rulesarenolongermutuallyexclusive– Arecordmaytriggermorethanonerule– Solu-on?•  Orderedruleset•  Unorderedruleset–usevo-ngschemes

•  Rulesarenolongerexhaus-ve– Arecordmaynottriggeranyrules– Solu-on?•  Useadefaultclass

OrderedRuleSet

•  Rulesarerankorderedaccordingtotheirpriority–  Anorderedrulesetisknownasadecisionlist

•  Whenatestrecordispresentedtotheclassifier–  Itisassignedtotheclasslabelofthehighestrankedruleithas

triggered–  Ifnoneoftherulesfired,itisassignedtothedefaultclass

R1:(GiveBirth=no)∧(CanFly=yes)→BirdsR2:(GiveBirth=no)∧(LiveinWater=yes)→FishesR3:(GiveBirth=yes)∧(BloodType=warm)→MammalsR4:(GiveBirth=no)∧(CanFly=no)→Rep-lesR5:(LiveinWater=some-mes)→Amphibians

Name Blood Type Give Birth Can Fly Live in Water Classturtle cold no no sometimes ?

RuleOrderingSchemes

•  Rule-basedordering–  Individualrulesarerankedbasedontheirquality

•  Class-basedordering–  Rulesthatbelongtothesameclassappeartogether

Rule-based Ordering

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income<80K) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income>80K) ==> Yes

(Refund=No, Marital Status={Married}) ==> No

Class-based Ordering

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income<80K) ==> No

(Refund=No, Marital Status={Married}) ==> No

(Refund=No, Marital Status={Single,Divorced},Taxable Income>80K) ==> Yes

BuildingClassifica-onRules

•  DirectMethod:•  Extractrulesdirectlyfromdata•  e.g.:RIPPER,CN2,Holte’s1R

•  IndirectMethod:•  Extractrulesfromotherclassifica-onmodels(e.g.decisiontrees,neuralnetworks,etc).•  e.g:C4.5rules

DirectMethod:Sequen-alCovering

1.  Startfromanemptyrule2.  GrowaruleusingtheLearn-One-Rule

func-on3.  Removetrainingrecordscoveredbytherule4.  RepeatStep(2)and(3)un-lstopping

criterionismet

ExampleofSequen-alCovering

(i) Original Data (ii) Step 1

ExampleofSequen-alCovering…

(iii) Step 2

R1

(iv) Step 3

R1

R2

AspectsofSequen-alCovering

•  RuleGrowing

•  InstanceElimina-on

•  RuleEvalua-on

•  StoppingCriterion

•  RulePruning

RuleGrowing

•  Twocommonstrategies

Status =Single

Status =Divorced

Status =Married

Income> 80K...

Yes: 3No: 4{ }

Yes: 0No: 3

Refund=No

Yes: 3No: 4

Yes: 2No: 1

Yes: 1No: 0

Yes: 3No: 1

(a) General-to-specific

Refund=No,Status=Single,Income=85K(Class=Yes)

Refund=No,Status=Single,Income=90K(Class=Yes)

Refund=No,Status = Single(Class = Yes)

(b) Specific-to-general

RuleGrowing(Examples)•  CN2Algorithm:

–  Startfromanemptyconjunct:{}–  Addconjunctsthatminimizestheentropymeasure:{A},{A,B},…–  Determinetheruleconsequentbytakingmajorityclassofinstancescovered

bytherule

•  RIPPERAlgorithm:–  Startfromanemptyrule:{}=>class–  AddconjunctsthatmaximizesFOIL’sinforma-ongainmeasure:

•  R0:{}=>class(ini-alrule)•  R1:{A}=>class(rulealeraddingconjunct)•  Gain(R0,R1)=t[log(p1/(p1+n1))–log(p0/(p0+n0))]•  wheret:numberofposi-veinstancescoveredbybothR0andR1p0:numberofposi-veinstancescoveredbyR0n0:numberofnega-veinstancescoveredbyR0p1:numberofposi-veinstancescoveredbyR1n1:numberofnega-veinstancescoveredbyR1

InstanceElimina-on

•  Whydoweneedtoeliminateinstances?–  Otherwise,thenextruleis

iden-caltopreviousrule•  Whydoweremoveposi-ve

instances?–  Ensurethatthenextruleis

different•  Whydoweremovenega-ve

instances?–  Preventunderes-ma-ng

accuracyofrule–  ComparerulesR2andR3in

thediagram

class = +

class = -

+

+ +

+++

++

++

++

+

+

+

+

++

+

+

-

-

--- -

-

--

- -

-

-

-

-

--

-

-

-

-

+

+

++

+

+

+

R1R3 R2

+

+

RuleEvalua-on

•  Metrics:– Accuracy

– Laplace

– M-es-mate

knnc++

=1

knkpnc

++

=

n:Numberofinstancescoveredbyrule

nc:Numberofinstancescoveredbyrule

k:Numberofclasses

p:Priorprobability

nnc=

StoppingCriterionandRulePruning

•  Stoppingcriterion– Computethegain–  Ifgainisnotsignificant,discardthenewrule

•  RulePruning– Similartopost-pruningofdecisiontrees– ReducedErrorPruning:

•  Removeoneoftheconjunctsintherule•  Compareerrorrateonvalida-onsetbeforeandalerpruning•  Iferrorimproves,prunetheconjunct

SummaryofDirectMethod•  Growasinglerule

•  RemoveInstancesfromrule

•  Prunetherule(ifnecessary)

•  AddruletoCurrentRuleSet

•  Repeat

DirectMethod:RIPPER•  For2-classproblem,chooseoneoftheclassesasposi-ve

class,andtheotherasnega-veclass–  Learnrulesforposi-veclass–  Nega-veclasswillbedefaultclass

•  Formul--classproblem–  Ordertheclassesaccordingtoincreasingclassprevalence(frac-onofinstancesthatbelongtoapar-cularclass)

–  Learntherulesetforsmallestclassfirst,treattherestasnega-veclass

–  Repeatwithnextsmallestclassasposi-veclass

DirectMethod:RIPPER•  Growingarule:

–  Startfromemptyrule–  AddconjunctsaslongastheyimproveFOIL’sinforma-ongain–  Stopwhenrulenolongercoversnega-veexamples–  Prunetheruleimmediatelyusingincrementalreducederrorpruning

–  Measureforpruning:v=(p-n)/(p+n)•  p:numberofposi-veexamplescoveredbytheruleinthevalida-onset

•  n:numberofnega-veexamplescoveredbytheruleinthevalida-onset

–  Pruningmethod:deleteanyfinalsequenceofcondi-onsthatmaximizesv

DirectMethod:RIPPER

•  BuildingaRuleSet:– Usesequen-alcoveringalgorithm

•  Findsthebestrulethatcoversthecurrentsetofposi-veexamples•  Eliminatebothposi-veandnega-veexamplescoveredbytherule

– Each-mearuleisaddedtotheruleset,computethenewdescrip-onlength•  stopaddingnewruleswhenthenewdescrip-onlengthisdbitslongerthanthesmallestdescrip-onlengthobtainedsofar

DirectMethod:RIPPER

•  Op-mizetheruleset:– ForeachrulerintherulesetR

•  Consider2alterna-verules:–  Replacementrule(r*):grownewrulefromscratch–  Revisedrule(r’):addconjunctstoextendtheruler

•  Comparetherulesetforragainsttherulesetforr*andr’•  ChooserulesetthatminimizesMDLprinciple

– Repeatrulegenera-onandruleop-miza-onfortheremainingposi-veexamples

IndirectMethods

Rule Set

r1: (P=No,Q=No) ==> -r2: (P=No,Q=Yes) ==> +r3: (P=Yes,R=No) ==> +r4: (P=Yes,R=Yes,Q=No) ==> -r5: (P=Yes,R=Yes,Q=Yes) ==> +

P

Q R

Q- + +

- +

No No

No

Yes Yes

Yes

No Yes

IndirectMethod:C4.5rules

•  Extractrulesfromanunpruneddecisiontree•  Foreachrule,r:A→y,– consideranalterna-veruler’:A’ →ywhereA’isobtainedbyremovingoneoftheconjunctsinA

– Comparethepessimis-cerrorrateforragainstallr’s

– Pruneifoneofther’shaslowerpessimis-cerrorrate

– Repeatun-lwecannolongerimprovegeneraliza-onerror

IndirectMethod:C4.5rules

•  Insteadoforderingtherules,ordersubsetsofrules(classordering)– Eachsubsetisacollec-onofruleswiththesameruleconsequent(class)

– Computedescrip-onlengthofeachsubset•  Descrip-onlength=L(error)+gL(model)•  gisaparameterthattakesintoaccountthepresenceofredundanta`ributesinaruleset(defaultvalue=0.5)

ExampleName Give Birth Lay Eggs Can Fly Live in Water Have Legs Class

human yes no no no yes mammalspython no yes no no no reptilessalmon no yes no yes no fisheswhale yes no no yes no mammalsfrog no yes no sometimes yes amphibianskomodo no yes no no yes reptilesbat yes no yes no yes mammalspigeon no yes yes no yes birdscat yes no no no yes mammalsleopard shark yes no no yes no fishesturtle no yes no sometimes yes reptilespenguin no yes no sometimes yes birdsporcupine yes no no no yes mammalseel no yes no yes no fishessalamander no yes no sometimes yes amphibiansgila monster no yes no no yes reptilesplatypus no yes no no yes mammalsowl no yes yes no yes birdsdolphin yes no no yes no mammalseagle no yes yes no yes birds

C4.5versusC4.5rulesversusRIPPERC4.5rules:

(GiveBirth=No,CanFly=Yes)→Birds

(GiveBirth=No,LiveinWater=Yes)→Fishes

(GiveBirth=Yes)→Mammals

(GiveBirth=No,CanFly=No,LiveinWater=No)→Rep-les

()→Amphibians

GiveBirth?

Live InWater?

CanFly?

Mammals

Fishes Amphibians

Birds Reptiles

Yes No

Yes

Sometimes

No

Yes No

RIPPER:

(LiveinWater=Yes)→Fishes

(HaveLegs=No)→Rep-les

(GiveBirth=No,CanFly=No,LiveInWater=No)

→Rep-les

(CanFly=Yes,GiveBirth=No)→Birds

()→Mammals

C4.5versusC4.5rulesversusRIPPER

PREDICTED CLASS Amphibians Fishes Reptiles Birds MammalsACTUAL Amphibians 0 0 0 0 2CLASS Fishes 0 3 0 0 0

Reptiles 0 0 3 0 1Birds 0 0 1 2 1Mammals 0 2 1 0 4

PREDICTED CLASS Amphibians Fishes Reptiles Birds MammalsACTUAL Amphibians 2 0 0 0 0CLASS Fishes 0 2 0 0 1

Reptiles 1 0 3 0 0Birds 1 0 0 3 0Mammals 0 0 1 0 6

C4.5andC4.5rules:

RIPPER:

AdvantagesofRule-BasedClassifiers

•  Ashighlyexpressiveasdecisiontrees•  Easytointerpret•  Easytogenerate•  Canclassifynewinstancesrapidly•  Performancecomparabletodecisiontrees

top related