

08<Classification>-1

Lecture 08 Classification-based Learning

• Topics
– Basics
– Decision Trees
– Multi-Layered Perceptrons
– Applications


08<Classification>-2

Basics

• Machine Learning
– Inductive learning
– Deductive learning
– Abductive learning (reasoning)
– Reinforcement learning
– Collaborative learning

• Classification
– Given a set of examples, each labelled as belonging to a specific class, learn how to classify the examples
– Inductive learning
– Supervised learning


08<Classification>-3

Basics

• Decision Trees
– A symbolic representation of a reasoning process.
– It describes a data set by a tree-like structure.

• Multi-Layered Perceptrons (MLP)
– A subsymbolic representation architecture with multiple layers of perceptrons
– Parallel inference
– Learning by generalizing patterns to approximate functions


08<Classification>-4

Decision Trees

• Example

Household (responded: 112, not responded: 888, Total: 1000)
  Homeownership = Yes: responded: 9, not responded: 334, Total: 343
  Homeownership = No: responded: 103, not responded: 554, Total: 657
    Household Income ≤ $20,700: responded: 14, not responded: 158, Total: 172
    Household Income ≥ $20,701: responded: 89, not responded: 396, Total: 485
      Savings Accounts = No: responded: 3, not responded: 208, Total: 211
      Savings Accounts = Yes: responded: 86, not responded: 188, Total: 274


08<Classification>-5

Decision Trees

• Data representation: attribute-based language

• Attribute-list and two examples:
A-list: [Gender Age Blood Smoking Caffeine HT?]
Ex1: [Male 50-59 high ≥1pack ≥3cups high]
Ex2: [Female 50-59 low ≥1pack ≥3cups normal]
– Where Blood: Blood-pressure; Caffeine: Caffeine-intake; HT: Hypertension, class label


08<Classification>-6

Decision Trees

• Knowledge representation: tree-like structure
– The tree always starts from the root node and grows down by splitting the data at each level into new nodes according to some predictor (attribute, feature).
– The root node contains the entire data set (all data records), and child nodes hold respective subsets of that set.
– A split in a decision tree corresponds to the predictor with the maximum separating power.
– The best split does the best job in creating nodes where a single class dominates.
– Two of the best-known methods of calculating the predictor's power:
• Gini coefficient
• Entropy


08<Classification>-7

Decision Trees

• The Gini coefficient is calculated as the area between the Lorenz curve and the diagonal divided by the whole area below the diagonal, i.e., A/(A + B), where A is the area between the diagonal and the Lorenz curve and B is the area below the Lorenz curve.
• The Gini coefficient ranges from 0 (perfect equality) to 1 (perfect inequality).
• We look for the split with the largest Gini coefficient.

[Figure: Lorenz curve for Class A. X axis: % of instances in the parent node (0 to 100); Y axis: % of instances in Class A (0 to 100). A: area between the diagonal and the Lorenz curve; B: area below the Lorenz curve.]

• Brown formula:

$G = \left| 1 - \sum_{i=0}^{n-1} (x_{i+1} - x_i)(y_{i+1} + y_i) \right|$

where x_i and y_i are the cumulative fractions of the total population and of the class, respectively, with x_0 = y_0 = 0.


08<Classification>-8

Decision Trees

• Selecting an optimal tree with Gini splitting

[Figure: candidate tree grown with Gini splits. The root node (Class A: 100, Class B: 50, Total: 150) is split on Predictor 1; further yes/no splits on Predictors 2-6 yield seven leaf nodes with (Class A, Class B, Total) counts of (59, 1, 60), (23, 3, 26), (11, 0, 11), (4, 1, 5), (2, 1, 3), (1, 8, 9) and (0, 36, 36), so a single class dominates in almost every leaf.]


08<Classification>-9

Decision Trees

• Calculation of the Gini coefficient for Class A (with the 7 leaf nodes forming a Lorenz curve)
• (Non-increasingly) sorted Class A instances per leaf node:
– 59, 23, 11, 4, 2, 1, 0
• Corresponding cumulative percentages for Class A:
– 59/100, 82/100, 93/100, 97/100, 99/100, 100/100, 100/100
• Corresponding cumulative percentages for the total population:
– 60/150, 86/150, 97/150, 102/150, 105/150, 114/150, 150/150
• G(A) = |1 − [60/150 · 59/100 + (86−60)/150 · (59+82)/100 + (97−86)/150 · (82+93)/100 + (102−97)/150 · (93+97)/100 + (105−102)/150 · (97+99)/100 + (114−105)/150 · (99+100)/100 + (150−114)/150 · (100+100)/100]| = 0.311
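As a quick check of the arithmetic above, here is a minimal Python sketch that applies the Brown formula to the seven leaf nodes of the Gini tree (the function name is illustrative, not from the lecture):

    def gini_for_class(class_counts, total_counts):
        # Brown formula: G = |1 - sum_i (x_{i+1} - x_i)(y_{i+1} + y_i)|,
        # where x = cumulative fraction of the population and
        # y = cumulative fraction of the class, leaves sorted by class count.
        leaves = sorted(zip(class_counts, total_counts), reverse=True)
        class_sum = sum(class_counts)
        pop_sum = sum(total_counts)

        g = 1.0
        x_prev, y_prev = 0.0, 0.0      # Lorenz curve starts at (0, 0)
        cum_class, cum_pop = 0, 0
        for c, t in leaves:
            cum_class += c
            cum_pop += t
            x, y = cum_pop / pop_sum, cum_class / class_sum
            g -= (x - x_prev) * (y + y_prev)
            x_prev, y_prev = x, y
        return abs(g)

    # Leaves of the Gini-split tree: (Class A, Total) per leaf node.
    class_a = [59, 23, 11, 4, 2, 1, 0]
    totals  = [60, 26, 11, 5, 3, 9, 36]
    print(round(gini_for_class(class_a, totals), 3))   # 0.311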


08<Classification>-10

Decision Trees

• Gain chart of Class A

[Figure: gain chart for Class A. X axis: % of the total population (0 to 100); Y axis: % of Class A (0 to 100); two curves are plotted, one for the Gini splits and one for a manual split selection.]


08<Classification>-11

Decision Trees

• Selecting an optimal tree with random splitting

[Figure: candidate tree grown with randomly chosen splits. The same root node (Class A: 100, Class B: 50, Total: 150) is split on Predictor 5; further yes/no splits on the remaining predictors yield leaf nodes with (Class A, Class B, Total) counts of (9, 7, 16), (23, 9, 32), (12, 6, 18), (29, 5, 34), (8, 9, 17), (17, 6, 23) and (2, 8, 10), so the classes stay far more mixed within each leaf than under Gini splitting.]


08<Classification>-12

Decision Trees

• Extracting rules from decision trees
• The path from the root node to a bottom leaf reveals a decision rule.
• For example, a rule associated with the right bottom leaf in the figure that represents the Gini splits can be represented as follows:

if (Predictor 1 = no)
and (Predictor 4 = no)
and (Predictor 6 = no)
then class = Class A


08<Classification>-13

Decision Trees

• Entropy
– Node A contains n classes, c_i, i = 1, …, n, each with probability p(c_i)

– Entropy of node A:

$E(A) = -\sum_{i=1}^{n} p(c_i)\,\log_2 p(c_i)$

– Entropy of the child nodes of A produced by some split (CN_i: child node i; p(CN_i): probability of CN_i):

$E(CN) = \sum_{i=1}^{k} p(CN_i) \cdot E(CN_i)$

– Gain of the split at node A:

$Gain(A, split) = E(A) - E(CN)$
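The entropy and gain formulas translate directly into code. A minimal Python sketch (the function names and the example split counts are illustrative, not from the lecture):

    import math

    def entropy(class_counts):
        # E = -sum_i p(c_i) * log2 p(c_i); empty classes contribute nothing
        total = sum(class_counts)
        return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

    def split_gain(parent_counts, child_counts_list):
        # Gain(A, split) = E(A) - sum_i p(CN_i) * E(CN_i)
        total = sum(parent_counts)
        e_children = sum((sum(child) / total) * entropy(child) for child in child_counts_list)
        return entropy(parent_counts) - e_children

    # Root node with 100 Class A and 50 Class B instances, and a hypothetical split
    # into two children with (Class A, Class B) counts (37, 12) and (63, 38):
    print(round(split_gain([100, 50], [[37, 12], [63, 38]]), 4))   # about 0.0127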


08<Classification>-14

Decision Trees

• Select the split with the largest gain.
• When to end splitting?
• Calculate the deviation D of the split against random splitting:

$D = \sum_{i=1}^{v} \left[ \frac{(p_i - p'_i)^2}{p'_i} + \frac{(n_i - n'_i)^2}{n'_i} \right]$

$p'_i = p \cdot \frac{p_i + n_i}{p + n}; \qquad n'_i = n \cdot \frac{p_i + n_i}{p + n}$

– p and n: instances of the two classes at the node; p_i and n_i: their counts in child node i
– p'_i and n'_i: expected instances in classes p and n in child node i if randomly distributed

• Null hypothesis: if the splitting is random, the deviation D will be distributed according to the χ² distribution with v − 1 degrees of freedom, where v is the number of child nodes.


08<Classification>-15

Decision Trees

• χ² distribution (pdf): critical values of χ² for r degrees of freedom and p-value:

r   0.25  0.20  0.15  0.10  0.05  0.025 0.02  0.01  0.005 0.0025 0.001 0.0005
1   1.32  1.64  2.07  2.71  3.84  5.02  5.41  6.63  7.88  9.14   10.83 12.12
2   2.77  3.22  3.79  4.61  5.99  7.38  7.82  9.21  10.60 11.98  13.82 15.20
3   4.11  4.64  5.32  6.25  7.81  9.35  9.84  11.34 12.84 14.32  16.27 17.73

• Stop splitting if D ≤ χ²(v − 1), the critical value for r = v − 1 degrees of freedom at the chosen p-value.
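A minimal Python sketch of this stopping test for a two-class node (the helper name and the example child counts are illustrative, not from the lecture):

    def split_deviation(p, n, children):
        # D = sum_i [(p_i - p'_i)^2 / p'_i + (n_i - n'_i)^2 / n'_i],
        # where p'_i and n'_i are the counts expected under a random split.
        d = 0.0
        for p_i, n_i in children:
            share = (p_i + n_i) / (p + n)        # fraction of instances in child i
            p_exp, n_exp = p * share, n * share
            d += (p_i - p_exp) ** 2 / p_exp + (n_i - n_exp) ** 2 / n_exp
        return d

    # Node with 100 Class A / 50 Class B, split into two hypothetical children.
    children = [(37, 12), (63, 38)]
    d = split_deviation(100, 50, children)
    # v = 2 children, so v - 1 = 1 degree of freedom; chi^2 at p = 0.05 is 3.84 (table above).
    print(round(d, 2), "-> stop splitting" if d <= 3.84 else "-> keep the split")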


08<Classification>-16

Decision Trees

• The main advantage of the decision-tree approach to classification is that it visualises the solution; it is easy to follow any path through the tree.

• Relationships learned by a decision tree can be expressed as a set of rules, which can then be used in developing an intelligent system.


08<Classification>-17

Decision Trees

• Data preprocessing
– Continuous data, such as age or income, have to be grouped into ranges, which can unwittingly hide important patterns.
– Missing or inconsistent data have to be filled in or resolved.

• Inability to examine more than one variable at a time. This confines trees to problems that can be solved by dividing the solution space into several successive rectangles.


08<Classification>-18

Multi-Layered Perceptrons

• Example

[Figure: an example MLP with an input layer, a middle (hidden) layer and an output layer; input signals enter on the left and output signals leave on the right.]


08<Classification>-19

Multi-Layered Perceptrons

• Emulating a biological neural network

[Figure: two biological neurons connected by synapses, showing the soma, dendrites and axon of each.]


08<Classification>-20

Multi-Layered Perceptrons

• Data representation: coded attribute-based language
• Each input node takes an attribute (feature)
• Coded attribute-list and two examples:
A-list: [Gender Age Blood Smoking Caffeine HT?]
Ex1: [0 0.5 1 1 1 1]
Ex2: [1 0.5 0 1 1 0]
– Gender (categorical data): 0: male; 1: female
– Age (continuous data): age/100 (or 0.1: <10 yrs; 0.2: 20~29; …; 0.9: 90~99; 1.0: >99)
– Blood (Blood-pressure): 0: low; 1: high
– Smoking: cigarettes/40 (or 0: 0 cigarettes; 0.1: <10 cigarettes; 0.5: 10~19; 1: ≥1 pack)
– Caffeine (Caffeine-intake, continuous data): cups/3 (or 0: 0 cups; 0.5: 1~2 cups; 1.0: ≥3 cups)
– HT (Hypertension, class label): 0: normal; 1: high
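A minimal Python sketch of this encoding (the helper name is illustrative; the ranges follow the list above, with the continuous attributes capped at 1):

    def encode_patient(gender, age, blood, cigarettes_per_day, cups_per_day):
        # Encode one record into the [Gender Age Blood Smoking Caffeine] input vector.
        return [
            0.0 if gender == "male" else 1.0,     # Gender: 0 male, 1 female
            age / 100.0,                          # Age scaled to [0, 1]
            1.0 if blood == "high" else 0.0,      # Blood pressure: 0 low, 1 high
            min(cigarettes_per_day / 40.0, 1.0),  # Smoking: cigarettes/40, capped at 1 pack
            min(cups_per_day / 3.0, 1.0),         # Caffeine: cups/3, capped at >= 3 cups
        ]

    # Ex1: male, 50-59, high blood pressure, >= 1 pack, >= 3 cups -> [0, 0.5, 1, 1, 1]
    print(encode_patient("male", 50, "high", 40, 3))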


08<Classification>-21

Multi-Layered Perceptrons

• Knowledge representation: weight, neuron, and network structure

[Figure: a single neuron. Input signals x1, x2, …, xn arrive on connections with weights w1, w2, …, wn; the neuron produces the output signal Y.]


08<Classification>-22

Multi-Layered Perceptrons

• Neuron structure: A neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is −1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1.

• The neuron uses the following transfer or activation function:

$X = \sum_{i=1}^{n} x_i w_i$

$Y(X) = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}$

• Y(X) is called a sign function, Y_sign.


08<Classification>-23

Multi-Layered Perceptrons

• Sample activation functions

[Figure: graphs of the step, sign, sigmoid and linear activation functions, each plotted as Y against X between −1 and +1.]

$Y^{step} = \begin{cases} 1 & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases}$

$Y^{sign} = \begin{cases} +1 & \text{if } X \ge 0 \\ -1 & \text{if } X < 0 \end{cases}$

$Y^{sigmoid} = \dfrac{1}{1 + e^{-X}}$

$Y^{linear} = X$
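The four activation functions as a minimal Python sketch (using only the standard library):

    import math

    def step(x):      # 1 if x >= 0 else 0
        return 1.0 if x >= 0 else 0.0

    def sign(x):      # +1 if x >= 0 else -1
        return 1.0 if x >= 0 else -1.0

    def sigmoid(x):   # 1 / (1 + e^-x), smooth output in (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def linear(x):    # identity
        return x

    print(step(-0.3), sign(-0.3), round(sigmoid(-0.3), 3), linear(-0.3))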


08<Classification>-24

MLP - Perceptrons

• Network structure: perceptrons
• A perceptron is the simplest form of a neural network, consisting of a single neuron with adjustable synaptic weights and a hard limiter.
• A single-layer two-input perceptron:

[Figure: a two-input perceptron. Inputs x1 and x2, weighted by w1 and w2, feed a linear combiner; together with the threshold θ, a hard limiter produces the output Y.]


08<Classification>-25

MLP - Perceptrons

• The aim of the perceptron is to classify inputs, x1, x2, . . ., xn, into one of two classes, say Y=A1 or Y=A2.

• In the case of an elementary perceptron, the n-dimensional input space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

$\sum_{i=1}^{n} x_i w_i - \theta = 0$


08<Classification>-26

MLP - Perceptrons

[Figure: (a) Two-input perceptron: the line x1w1 + x2w2 − θ = 0 separates Class A1 from Class A2 in the (x1, x2) plane (cf. ax + by = c). (b) Three-input perceptron: the plane x1w1 + x2w2 + x3w3 − θ = 0 separates the two classes in (x1, x2, x3) space (cf. ax + by + cz = d).]


08<Classification>-27

MLP - Perceptrons

• A perceptron learns its classification by making small adjustments in the weights to reduce the difference (error) between the desired and actual outputs of the perceptron.
• The initial weights are randomly assigned, usually in a small range, and then updated to obtain an output consistent with the training examples.
• If the error is positive, we need to increase the perceptron output; if it is negative, we need to decrease it.


08<Classification>-28

MLP - Perceptron learning algorithm

• Step 1: Initialization
Set the initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].


08<Classification>-29

MLP - Perceptron learning algorithm

• Step 2: Activation
(a) Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

$Y(p) = Y_{step}\!\left[\sum_{i=1}^{n} x_i(p)\,w_i(p) - \theta\right]$

where n is the number of the perceptron inputs.

(b) Calculate the output error:

$e(p) = Y_d(p) - Y(p)$


08<Classification>-30

MLP - Perceptron learning algorithm

• Step 3: Weight training
Calculate the weight correction at iteration p, Δwi(p), using the delta rule:

$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$

where α is the learning rate. Update the weights of the perceptron:

$w_i(p+1) = w_i(p) + \Delta w_i(p)$

• Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.

• Epoch: an epoch is one complete pass of the weight-adjustment process through the whole set of training examples.
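A minimal Python sketch of Steps 1-4, training a two-input perceptron on the logical AND operation (the data set, the learning rate and the convention of updating the threshold as a weight on a fixed input of −1 are illustrative choices, not from the slides):

    import random

    def step(x):
        return 1.0 if x >= 0 else 0.0

    # Training set for logical AND: ([x1, x2], desired output Yd)
    data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

    # Step 1: initialise weights and threshold in [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    theta = random.uniform(-0.5, 0.5)
    alpha = 0.1                                   # learning rate

    for epoch in range(100):                      # repeat epochs until convergence
        misclassified = 0
        for x, y_d in data:
            # Step 2: activation and output error
            y = step(sum(xi * wi for xi, wi in zip(x, w)) - theta)
            e = y_d - y
            # Step 3: delta-rule weight correction and update
            for i in range(len(w)):
                w[i] += alpha * x[i] * e
            theta += alpha * (-1) * e             # threshold treated as a weight on input -1
            misclassified += int(e != 0)
        if misclassified == 0:                    # Step 4: stop once every example is correct
            break

    print([step(sum(xi * wi for xi, wi in zip(x, w)) - theta) for x, _ in data])  # [0.0, 0.0, 0.0, 1.0]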


08<Classification>-31

Multi-Layered Perceptrons

• Perceptrons cannot solve linearly inseparable problems
• Increasing layers: from perceptrons to MLP
• An MLP is a feedforward neural network with one or more hidden layers
• The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.


08<Classification>-32

Multi-Layered Perceptrons

• MLP with two hidden layers

[Figure: an MLP with an input layer, a first hidden layer, a second hidden layer and an output layer; input signals flow from left to right to produce the output signals.]


08<Classification>-33

Multi-Layered Perceptrons

• Learning in an MLP proceeds the same way as for a perceptron.
• First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer.
• If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated - the back-propagation learning algorithm (BP).


08<Classification>-34

Multi-Layered Perceptrons

[Figure: a three-layer back-propagation network. The input layer (neurons 1, 2, …, i, …, n) receives the inputs x1, x2, …, xn; the hidden layer (neurons 1, 2, …, j, …, m) is connected to the inputs by weights wij; the output layer (neurons 1, 2, …, k, …, l) is connected to the hidden layer by weights wjk and produces the outputs y1, y2, …, yl. Input signals propagate forward; error signals propagate backward.]


08<Classification>-35

MLP - BP Learning Algorithm

• Step 1: Initialization
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

$\left(-\dfrac{2.4}{F_i},\; +\dfrac{2.4}{F_i}\right)$

where Fi is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis.


08<Classification>-36

MLP - BP Learning Algorithm

• Step 2: Activation
Activate the MLP by applying inputs x1(p), x2(p), …, xn(p) and desired outputs yd,1(p), yd,2(p), …, yd,l(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

$y_j(p) = Y_{sigmoid}\!\left[\sum_{i=1}^{n} x_i(p)\,w_{ij}(p) - \theta_j\right]$

where j = 1, …, m.


08<Classification>-37

MLP - BP Learning Algorithm

• Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:

$y_k(p) = Y_{sigmoid}\!\left[\sum_{j=1}^{m} x_{jk}(p)\,w_{jk}(p) - \theta_k\right]$

where k = 1, …, l and x_{jk}(p) is the input to output neuron k coming from hidden neuron j, i.e. y_j(p).

(c) Calculate the output errors of the neurons in the output layer:

$e_k(p) = y_{d,k}(p) - y_k(p)$


08<Classification>-38

MLP - BP Learning Algorithm

• Step 3: Weight training
(a) Calculate the error gradient for the neurons in the output layer:

$\delta_k(p) = y_k(p)\,\left[1 - y_k(p)\right]\,e_k(p)$

Calculate the weight corrections by the delta rule:

$\Delta w_{jk}(p) = \alpha \cdot y_j(p) \cdot \delta_k(p)$

Update the weights at the output neurons:

$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$


08<Classification>-39

MLP - BP Learning Algorithm

• Step 3: Weight training (continued)
(b) Calculate the propagated errors for the neurons in the hidden layer:

$e_j(p) = \sum_{k=1}^{l} \delta_k(p)\,w_{jk}(p)$

Calculate the error gradient for the neurons in the hidden layer:

$\delta_j(p) = y_j(p)\,\left[1 - y_j(p)\right]\,e_j(p)$

Calculate the weight corrections by the delta rule:

$\Delta w_{ij}(p) = \alpha \cdot x_i(p) \cdot \delta_j(p)$

Update the weights at the hidden neurons:

$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$


08<Classification>-40

MLP - BP Learning Algorithm

• Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
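Putting Steps 1-4 together, a minimal NumPy sketch of the BP algorithm for a 2-2-1 network learning the XOR operation (the task, learning rate and epoch limit are illustrative assumptions, not from the lecture; thresholds are again updated as weights on a fixed input of −1):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Yd = np.array([[0], [1], [1], [0]], dtype=float)            # desired XOR outputs

    n, m, l = 2, 2, 1            # input, hidden and output layer sizes
    alpha = 0.5                  # learning rate

    # Step 1: initialise weights and thresholds in (-2.4/Fi, +2.4/Fi)
    w_ij = rng.uniform(-2.4 / n, 2.4 / n, (n, m))
    theta_j = rng.uniform(-2.4 / n, 2.4 / n, m)
    w_jk = rng.uniform(-2.4 / m, 2.4 / m, (m, l))
    theta_k = rng.uniform(-2.4 / m, 2.4 / m, l)

    for epoch in range(20000):                     # Step 4: iterate until the error is small
        sse = 0.0
        for x, yd in zip(X, Yd):
            # Step 2: forward pass through the hidden and output layers
            y_j = sigmoid(x @ w_ij - theta_j)
            y_k = sigmoid(y_j @ w_jk - theta_k)
            e_k = yd - y_k
            sse += float(e_k @ e_k) / 2.0
            # Step 3: error gradients for the output and hidden layers
            delta_k = y_k * (1 - y_k) * e_k
            delta_j = y_j * (1 - y_j) * (w_jk @ delta_k)
            # Delta-rule corrections and weight/threshold updates
            w_jk += alpha * np.outer(y_j, delta_k)
            theta_k += alpha * (-1) * delta_k
            w_ij += alpha * np.outer(x, delta_j)
            theta_j += alpha * (-1) * delta_j
        if sse < 0.001:
            break

    # Outputs should approach [0, 1, 1, 0]; plain BP can occasionally settle in a
    # local minimum on XOR, in which case a different seed (or the momentum term
    # described on the next slide) usually helps.
    print(np.round(sigmoid(sigmoid(X @ w_ij - theta_j) @ w_jk - theta_k).ravel(), 2))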


08<Classification>-41

Multi-Layered Perceptrons

• We can accelerate training by including a momentum term in the delta rule, changing it into the generalized delta rule:

$\Delta w_{jk}(p) = \beta \cdot \Delta w_{jk}(p-1) + \alpha \cdot y_j(p) \cdot \delta_k(p)$

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.


08<Classification>-42

Multi-Layered Perceptrons

• To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

• Heuristic 1
If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased. The sum of squared errors (network performance measure):

$E = \dfrac{1}{2}\sum_{k} \left(y_{d,k} - y_k\right)^2$

• Heuristic 2
If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.


08<Classification>-43

Multi-Layered Perceptrons

• Adapting the learning rate
– If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
– If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
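A minimal stand-alone Python sketch of this adaptation rule (the helper name and the per-epoch error values are illustrative; the ratio and factors are the typical values quoted above):

    def adapt_learning_rate(alpha, sse, sse_prev, ratio=1.04, down=0.7, up=1.05):
        # Decrease alpha if the error grew by more than `ratio`, increase it if the error fell.
        if sse_prev is None:
            return alpha
        if sse > ratio * sse_prev:
            return alpha * down       # error jumped up: slow down (and the epoch would be redone)
        if sse < sse_prev:
            return alpha * up         # error went down: speed up
        return alpha

    alpha, prev = 0.1, None
    for sse in [0.90, 0.60, 0.75, 0.50, 0.48]:    # illustrative per-epoch errors
        alpha = adapt_learning_rate(alpha, sse, prev)
        prev = sse
        print(round(alpha, 4))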


08<Classification>-44

Applications

• Decision trees
– Data mining
• Churn model construction

• Multi-Layered Perceptrons
– Function approximation
– Pattern recognition
• Hand-writing recognition
– Case-based retrieval