
Linear vs. Quadratic Discriminant Classifier: an Overview

Alaa Tharwat

Electrical Dept. - Suez Canal University - Egypt
Scientific Research Group in Egypt (SRGE)

Email: [email protected]

April 2, 2016


Agenda

Introduction

Building a Classifier Model

Numerical Examples

Conclusions and Future Work


Introduction

The main objective is to:

Explain the principles of linear and quadratic classifiers.

Introduce numerical examples to explain how linear and quadratic classifiers work.


Introduction

A pattern or sample is represented by a vector or a set of m features, which represents one point in the m-dimensional space (R^m) called the pattern space.

The goal of the pattern classification process is to train a model using the labelled patterns to assign a class label to an unknown pattern.

The classifier is represented by c decision or discriminant functions ({f1, f2, . . . , fc}).

The decision functions are used to determine the decision boundaries between classes and the region or area of each class.


Introduction

Figure: The structure of building a classifier, which includes N samples and c discriminant functions or classes. The diagram shows the input sample (xi ∈ R^m, with features x1, . . . , xm), the discriminant functions f1(x), f2(x), . . . , fc(x), and a maximum selector that outputs the class label.


Introduction

Discriminant functions are used to build the decision boundaries that discriminate between the different classes, i.e. that separate them into different regions (ωi, i = 1, 2, . . . , c).

Assume we have two classes (ω1) and (ω2); thus, there are two different discriminant functions (f1 and f2), and the decision boundary is calculated as follows: S12 = f1 − f2. The decision region or class label of an unknown pattern x is calculated as follows:

sgn(S12(x)) = sgn(f1(x) − f2(x)) =
    Class 1    : for S12(x) > 0
    Undefined  : for S12(x) = 0
    Class 2    : for S12(x) < 0        (1)
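A minimal sketch of this two-class sign rule in Python (NumPy); the two discriminant functions below are placeholders chosen only for illustration, not functions from the slides:

```python
import numpy as np

def f1(x):                       # illustrative discriminant function (placeholder)
    return -0.5 * x @ x + np.array([3.5, 4.5]) @ x

def f2(x):                       # illustrative discriminant function (placeholder)
    return -0.5 * x @ x + np.array([3.5, 2.5]) @ x

def classify(x):
    """Label x from the sign of S12(x) = f1(x) - f2(x), as in Equation (1)."""
    s12 = f1(x) - f2(x)
    if s12 > 0:
        return "Class 1"
    if s12 < 0:
        return "Class 2"
    return "Undefined (on the boundary)"

print(classify(np.array([2.0, 4.0])))   # prints "Class 1"
```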


Introduction

The posterior probability of three classes.

Figure: Decision regions of three classes. The top panel shows the curves P(x|ω1)P(ω1), P(x|ω2)P(ω2), and P(x|ω3)P(ω3) over x; the bottom panel shows the decision boundaries S12, S13, and S23 that separate the regions ω1, ω2, and ω3 in the (x1, x2) plane.


Building a Classifier Model

Let ω1, ω2, . . . , ωc be the set of c classes; P(x|ωi) represents the likelihood function.

P(ωi) represents the prior probability of each class, which reflects the prior knowledge about that class; it is simply equal to the ratio between the number of samples in that class and the total number of samples in all classes (N).

Bayes formula calculates the posterior probability from the prior and the likelihood as follows:

P(ω = ωi | x) = P(x | ω = ωi) P(ωi) / P(x) = (likelihood × prior) / evidence        (2)

P(ω = ωi | x) represents the posterior probability (a posteriori), and P(x) represents the evidence, which is calculated as follows: P(x) = ∑_{i=1}^{c} P(x | ω = ωi) P(ωi).

P(x) is used only to scale the expressions in Equation (2), so that the posterior probabilities sum to one (∑_{i=1}^{c} P(ωi | x) = 1).

Generally, P(ωi | x) is calculated using the likelihood (P(x|ωi)) and the prior probability (P(ωi)).
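A small sketch of Equation (2), with made-up likelihood and prior values just to show that dividing by the evidence P(x) makes the posteriors sum to one:

```python
import numpy as np

# Hypothetical likelihoods P(x|w_i) and priors P(w_i) for c = 3 classes.
likelihood = np.array([0.05, 0.20, 0.01])
prior      = np.array([4/12, 4/12, 4/12])

evidence  = np.sum(likelihood * prior)        # P(x) = sum_i P(x|w_i) P(w_i)
posterior = likelihood * prior / evidence     # P(w_i|x), Equation (2)

print(posterior, posterior.sum())             # the posteriors sum to 1
print("predicted class:", int(np.argmax(posterior)) + 1)
```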


Building a Classifier Model

Assume that P(x|ωi) is normally distributed (P(x|ωi) ∼ N(µi, Σi)) as follows:

P(x|ωi) = N(µi, Σi) = (1 / √((2π)^m |Σi|)) exp(−½ (x − µi)^T Σi^{-1} (x − µi))        (3)

where µi represents the mean of the i-th class, Σi is the covariance matrix of the i-th class, |Σi| and Σi^{-1} represent the determinant and inverse of the covariance matrix, respectively, and m represents the number of features or variables of the sample (x).

Σ =
[ var(x1, x1)   cov(x1, x2)   . . .   cov(x1, xN)
  cov(x2, x1)   var(x2, x2)   . . .   cov(x2, xN)
      ...            ...      . . .       ...
  cov(xN, x1)   cov(xN, x2)   . . .   var(xN, xN) ]        (4)
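A direct NumPy transcription of Equation (3); the mean and covariance below are placeholders chosen only for illustration:

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Multivariate normal density N(mu, cov) evaluated at x (Equation (3))."""
    m = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** m * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

# Placeholder mean and covariance matrix.
mu  = np.array([3.5, 4.5])
cov = np.array([[1.0, 0.0],
                [0.0, 1.0]])
print(gaussian_density(np.array([2.0, 2.0]), mu, cov))
```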


Building a Classifier Model Discriminant Functions for the Normal Density

Assume we have two classes ω1 and ω2, and each class has one discriminant function (fi, i = 1, 2). If we have an unknown pattern (x) and P(ω1|x) > P(ω2|x), then the unknown pattern belongs to the first class (ω1). Similarly, if P(ω2|x) > P(ω1|x), then x belongs to ω2.

fi(x) = ln(P(x|ω = ωi) P(ωi)) = ln(P(x|ω = ωi)) + ln(P(ωi)),  i = 1, 2
       (the common term ln P(x) is dropped because it is the same for both classes)

      = ln[ (1 / √((2π)^m |Σi|)) exp(−½ (x − µi)^T Σi^{-1} (x − µi)) ] + ln(P(ωi))

      = −½ (x − µi)^T Σi^{-1} (x − µi) − (m/2) ln(2π) − ½ ln|Σi| + ln(P(ωi))

      = −½ (x^T Σi^{-1} x + µi^T Σi^{-1} µi − 2 µi^T Σi^{-1} x) − (m/2) ln(2π) − ½ ln|Σi| + ln(P(ωi))        (5)


Building a Classifier Model Discriminant Functions for the Normal Density

The decision boundary or surface between the classes ω1 and ω2 is represented by the difference between the two discriminant functions as follows:

S12 = f1 − f2 = ln P(ω = ω1|x) − ln P(ω = ω2|x) = ln[ P(x|ω = ω1) P(ω1) / (P(x|ω = ω2) P(ω2)) ]

    = ln[ P(x|ω = ω1) / P(x|ω = ω2) ] + ln[ P(ω1) / P(ω2) ]

    = ln P(x|ω = ω1) + ln P(ω1) − ln P(x|ω = ω2) − ln P(ω2)        (6)


Building a Classifier Model Discriminant Functions for the Normal Density

S12(x) = −½ [ x^T Σ1^{-1} x − 2 µ1^T Σ1^{-1} x + µ1^T Σ1^{-1} µ1
            − (x^T Σ2^{-1} x − 2 µ2^T Σ2^{-1} x + µ2^T Σ2^{-1} µ2)
            + ln|Σ1| − ln|Σ2| ] + ln[ P(ω1) / P(ω2) ]

       = −½ x^T (Σ1^{-1} − Σ2^{-1}) x + (µ1^T Σ1^{-1} − µ2^T Σ2^{-1}) x
            − 0.5 (µ1^T Σ1^{-1} µ1 − µ2^T Σ2^{-1} µ2 + ln|Σ1| − ln|Σ2|) + ln[ P(ω1) / P(ω2) ]

       = x^T W x + w^T x + W0        (7)


Building a Classifier Model Discriminant Functions for the Normal Density

W = −½ (Σ1^{-1} − Σ2^{-1})        (8)

w^T = µ1^T Σ1^{-1} − µ2^T Σ2^{-1}        (9)

W0 = −0.5 (µ1^T Σ1^{-1} µ1 − µ2^T Σ2^{-1} µ2 + ln|Σ1| − ln|Σ2|) + ln[ P(ω1) / P(ω2) ]        (10)

where W0 represents the threshold or bias, w represents the slope (the coefficients of the linear term), and W is the coefficient matrix of the quadratic term x^T W x. Thus, the decision boundary is a quadratic function or curve, and the classifier is called a Quadratic Discriminant Classifier (QDC).

sgn(S12(x)) =
    +ve  if x^T W x + w^T x + W0 > 0  →  x ∈ ω1
     0   if x^T W x + w^T x + W0 = 0  (on the boundary)
    −ve  if x^T W x + w^T x + W0 < 0  →  x ∈ ω2        (11)
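A sketch of Equations (8)-(11) in NumPy: it builds W, w, and W0 for two classes and classifies a point from the sign of S12(x). The means, covariance matrices, and priors below are placeholders chosen only for illustration, not values from the slides.

```python
import numpy as np

def qdc_boundary(mu1, mu2, cov1, cov2, p1, p2):
    """Return W, w, W0 of S12(x) = x^T W x + w^T x + W0 (Equations (8)-(10))."""
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    W = -0.5 * (inv1 - inv2)                                           # (8)
    w = inv1 @ mu1 - inv2 @ mu2                                        # (9), as a column vector
    W0 = (-0.5 * (mu1 @ inv1 @ mu1 - mu2 @ inv2 @ mu2
                  + np.log(np.linalg.det(cov1)) - np.log(np.linalg.det(cov2)))
          + np.log(p1 / p2))                                           # (10)
    return W, w, W0

def sign_rule(x, W, w, W0):
    """Equation (11): the sign of S12(x) decides the class."""
    s12 = x @ W @ x + w @ x + W0
    return "omega_1" if s12 > 0 else ("omega_2" if s12 < 0 else "on the boundary")

# Placeholder class parameters.
mu1, mu2   = np.array([0.0, 0.0]), np.array([3.0, 3.0])
cov1, cov2 = np.eye(2), np.diag([4.0, 1.0])
W, w, W0   = qdc_boundary(mu1, mu2, cov1, cov2, p1=0.5, p2=0.5)
print(sign_rule(np.array([1.0, 1.0]), W, w, W0))   # the point is closer to mu1 -> omega_1
```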


Building a Classifier Model Special Case: Common Covariance Matrices

Assume the covariance matrices of all classes are equal (Σ1 = Σ2 = Σ); hence, the quadratic term W vanishes.

Similarly, the term ln|Σ1| − ln|Σ2| vanishes and W0 becomes easier to calculate.

Moreover, w becomes easier to compute.

The discriminant function is simplified from a quadratic to a linear function, and the classifier is called a Linear Discriminant Classifier (LDC).


Building a Classifier Model Special Case: Common Covariance Matrices

S12 = w^T x + W0        (12)

where

w = Σ^{-1} (µ1 − µ2)        (13)

and

W0 = −0.5 (µ1^T Σ^{-1} µ1 − µ2^T Σ^{-1} µ2) + ln[ P(ω1) / P(ω2) ]        (14)
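The corresponding LDC sketch for Equations (12)-(14), again with placeholder means, shared covariance matrix, and priors chosen only for illustration:

```python
import numpy as np

def ldc_boundary(mu1, mu2, cov, p1, p2):
    """Return w and W0 of the linear boundary S12(x) = w^T x + W0 (Equations (13)-(14))."""
    inv = np.linalg.inv(cov)
    w = inv @ (mu1 - mu2)                                              # (13)
    W0 = -0.5 * (mu1 @ inv @ mu1 - mu2 @ inv @ mu2) + np.log(p1 / p2)  # (14)
    return w, W0

# Placeholder means and a shared covariance matrix.
mu1, mu2 = np.array([3.5, 6.0]), np.array([2.5, 3.0])
cov = np.diag([1.0, 4.0])
w, W0 = ldc_boundary(mu1, mu2, cov, p1=0.5, p2=0.5)

x = np.array([3.0, 5.0])
s12 = w @ x + W0                                                       # (12)
print(s12, "->", "omega_1" if s12 > 0 else "omega_2")
```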


Building a Classifier Model Special Case: Common Covariance Matrices

The decision boundary is the point where S12 = 0, and this point is calculated as follows:

S12 = 0  →  (µ1 − µ2)^T Σ^{-1} x − 0.5 (µ1^T Σ^{-1} µ1 − µ2^T Σ^{-1} µ2) + ln[ P(ω1) / P(ω2) ] = 0        (15)

The decision boundary xDB is

xDB = (µ1 + µ2) / 2 + (Σ / (µ2 − µ1)) ln[ P(ω1) / P(ω2) ]        (16)

When the two classes are equiprobable, the second term vanishes and the decision boundary is the midpoint between the class means.

The decision boundary is closer to the class that has the lower prior probability. For example, if P(ωi) > P(ωj), then |µj − xDB| < |µi − xDB|.


Building a Classifier Model Special Case: Common Covariance Matrices

Figure: Steps of calculating the discriminant classifier given three classes, each class has four samples. The diagram shows the data matrix (X), the mean of each class (µi), the mean-centered data (Di = xi − µi), the covariance matrices (Σi = Di Di^T), the discriminant functions (fi), and the resulting decision boundaries (S12, S13, S23) in the (x1, x2) plane.


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

In this example, the features were statistically independent, i.e. all off-diagonal elements of the covariance matrices were zeros, and had the same variance (σ²). Thus,

1. The covariance matrices were diagonal and their diagonal elements were σ².

2. The geometrical interpretation of this case is that each class is centered around its mean, and the distances from the mean to all samples of the same class are equal.

3. The distributions of all classes are spherical in an m-dimensional space.


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

Given three different classes denoted by ω1, ω2, and ω3 (each row of a class matrix is a sample (x1, x2)):

ω1 = [3.00 4.00; 3.00 5.00; 4.00 4.00; 4.00 5.00],
ω2 = [3.00 2.00; 3.00 3.00; 4.00 2.00; 4.00 3.00],
ω3 = [6.00 2.00; 6.00 3.00; 7.00 2.00; 7.00 3.00]        (17)

The mean of each class is:

µ1 = [3.50 4.50], µ2 = [3.50 2.50], µ3 = [6.50 2.50]        (18)


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

Subtract the mean of each class from each sample in that class as follows:

D1 = [−0.50 −0.50; −0.50 0.50; 0.50 −0.50; 0.50 0.50],
D2 = [−0.50 −0.50; −0.50 0.50; 0.50 −0.50; 0.50 0.50],
D3 = [−0.50 −0.50; −0.50 0.50; 0.50 −0.50; 0.50 0.50]        (19)

The covariance matrix for each class (Σi) is:

Σ1 = Σ2 = Σ3 = [1.00 0.00; 0.00 1.00]        (20)

Σ1^{-1} = Σ2^{-1} = Σ3^{-1} = [1.00 0.00; 0.00 1.00]        (21)
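A short NumPy check of the numbers above. Note that, to reproduce the slides, the class "covariance" is computed as the unnormalized scatter Di^T Di of the mean-centered data (the deck's convention), not as the sample covariance divided by n − 1:

```python
import numpy as np

w1 = np.array([[3, 4], [3, 5], [4, 4], [4, 5]], dtype=float)
w2 = np.array([[3, 2], [3, 3], [4, 2], [4, 3]], dtype=float)
w3 = np.array([[6, 2], [6, 3], [7, 2], [7, 3]], dtype=float)

for i, w in enumerate([w1, w2, w3], start=1):
    mu = w.mean(axis=0)                  # class mean, Equation (18)
    D = w - mu                           # mean-centered data, Equation (19)
    cov = D.T @ D                        # scatter-style covariance, Equation (20)
    print(f"class {i}: mu = {mu}, cov =\n{cov}")
```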


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

The discriminant functions for each class are:

fi(x) = −½ (x − µi)^T Σi^{-1} (x − µi) − (m/2) ln(2π) − ½ ln|Σi| + ln(P(ωi))        (22)

(The term −(m/2) ln(2π) is identical for all classes, so it is dropped from the constants below; it does not affect the decision boundaries.)

f1 = −0.5x1² − 0.5x2² + 3.50x1 + 4.50x2 − 17.35
f2 = −0.5x1² − 0.5x2² + 3.50x1 + 2.50x2 − 10.35
f3 = −0.5x1² − 0.5x2² + 6.50x1 + 2.50x2 − 25.35        (23)

The decision boundaries between each pair of classes are as follows:

S12 = f1 − f2  →  x2 = 3.50
S13 = f1 − f3  →  x2 = 1.5x1 − 4.00
S23 = f2 − f3  →  x1 = 5.00        (24)

As shown, the decision boundary S12 depends only on x2. Thus, for all samples belonging to class ω1, x2 must be greater than 3.5 so that S12 is positive.
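A sketch that evaluates the three discriminant functions of Example 1 and checks which class wins on either side of the boundary x2 = 3.5; as in the slides, the common −(m/2) ln(2π) term is dropped:

```python
import numpy as np

means  = [np.array([3.5, 4.5]), np.array([3.5, 2.5]), np.array([6.5, 2.5])]
priors = [4/12, 4/12, 4/12]
cov_inv = np.eye(2)                       # every covariance matrix is the identity

def f(i, x):
    """f_i(x) from Equation (22); the common -(m/2) ln(2*pi) term is dropped,
    and ln|Sigma_i| = 0 because every covariance matrix is the identity."""
    mu = means[i]
    return -0.5 * (x - mu) @ cov_inv @ (x - mu) + np.log(priors[i])

for x in [np.array([3.0, 4.0]), np.array([3.0, 3.0])]:   # above and below x2 = 3.5
    scores = [f(i, x) for i in range(3)]
    print(x, np.round(scores, 2), "-> class", int(np.argmax(scores)) + 1)
```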


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

Figure: The calculated decision boundaries for three different classes where the features or variables are statistically independent and have the same variance. The plot shows the three classes, their means µ1, µ2, and µ3, and the boundaries S12 (x2 = 3.5), S13 (x2 = 1.5x1 − 4), and S23 (x1 = 5) in the (x1, x2) plane.


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

Figure: Classification of three Gaussian classes with the same covariance matrix (Σ1 = Σ2 = Σ3 = σ²I) (our first example). Top figure: the green, red, and blue surfaces represent the discriminant functions f1, f2, and f3, respectively. Bottom: decision boundaries (separation curves) S12 = f1 − f2, S13 = f1 − f3, and S23 = f2 − f3.


Numerical Examples Example 1: Equal Variance (Σi = σ2I)

Given an unknown or test sample (T = [2 2]):

f1 = −5.35,  f2 = −2.35,  f3 = −11.35        (25)

Since f2 is the maximum, the test sample is assigned to class ω2.

The slope of the discriminant function is not affected by changing the prior probability. On the other hand, the bias of each discriminant function changes according to the prior probability. Assume the prior probabilities of the three classes in our example were changed to be as follows: P(ω1) = 8/12, P(ω2) = 2/12, and P(ω3) = 2/12.

f1 = −0.5x1² − 0.5x2² + 3.50x1 + 4.50x2 − 16.94
f2 = −0.5x1² − 0.5x2² + 3.50x1 + 2.50x2 − 10.64
f3 = −0.5x1² − 0.5x2² + 6.50x1 + 2.50x2 − 25.64        (26)
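A sketch that reproduces Equation (25) by plugging the test sample T = [2, 2] into the three polynomials of Equation (23) and taking the maximum:

```python
import numpy as np

# Coefficients (a, b, c) of f_i = -0.5*x1**2 - 0.5*x2**2 + a*x1 + b*x2 + c, Equation (23).
coeffs = [(3.50, 4.50, -17.35),
          (3.50, 2.50, -10.35),
          (6.50, 2.50, -25.35)]

x1, x2 = 2.0, 2.0                                     # the test sample T
scores = [-0.5 * x1**2 - 0.5 * x2**2 + a * x1 + b * x2 + c for a, b, c in coeffs]
print(np.round(scores, 2))                            # [ -5.35  -2.35 -11.35]
print("assigned class:", int(np.argmax(scores)) + 1)  # class 2
```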


Numerical Examples Example 2: Equal Variance (Σi = Σ)

In this example, the covariance matrices of all classes were equal but arbitrary.

The variances of the variables were not equal.

The geometrical interpretation of this case is that the distributions of all classes are elliptical in the m-dimensional space.


Numerical Examples Example 2: Equal Variance (Σi = Σ)

Table: Feature values, mean, mean-centered data, and covariance matrices for all classes.

Pattern No. | x1   | x2   | Class | Mean (x1, x2) | D (x1, x2)   | Covariance Matrix (Σi)
1           | 3.00 | 5.00 | ω1    | 3.50, 6.00    | −0.50, −1.00 | Σ1 = [1.00 0.00; 0.00 4.00]
2           | 3.00 | 7.00 | ω1    |               | −0.50,  1.00 |
3           | 4.00 | 5.00 | ω1    |               |  0.50, −1.00 |
4           | 4.00 | 7.00 | ω1    |               |  0.50,  1.00 |
5           | 2.00 | 2.00 | ω2    | 2.50, 3.00    | −0.50, −1.00 | Σ2 = [1.00 0.00; 0.00 4.00]
6           | 2.00 | 4.00 | ω2    |               | −0.50,  1.00 |
7           | 3.00 | 2.00 | ω2    |               |  0.50, −1.00 |
8           | 3.00 | 4.00 | ω2    |               |  0.50,  1.00 |
9           | 6.00 | 1.00 | ω3    | 6.50, 2.00    | −0.50, −1.00 | Σ3 = [1.00 0.00; 0.00 4.00]
10          | 6.00 | 3.00 | ω3    |               | −0.50,  1.00 |
11          | 7.00 | 1.00 | ω3    |               |  0.50, −1.00 |
12          | 7.00 | 3.00 | ω3    |               |  0.50,  1.00 |


Numerical Examples Example 2: Equal Variance (Σi = Σ)

The values of the inverse of the covariance matrices are as follows:

Σ1^{-1} = Σ2^{-1} = Σ3^{-1} = [1.00 0.00; 0.00 0.25]        (27)

The discriminant functions were then calculated and their values are as follows:

f1 = −0.5x1² − 0.125x2² + 3.50x1 + 1.50x2 − 11.72
f2 = −0.5x1² − 0.125x2² + 2.50x1 + 0.75x2 − 5.35
f3 = −0.5x1² − 0.125x2² + 6.50x1 + 0.50x2 − 22.72        (28)

The decision boundaries between each pair of classes were then calculated as follows:

S12 = f1 − f2  →  x1 = 6.37 − 0.75x2
S13 = f1 − f3  →  x2 = 3.00x1 − 11.00
S23 = f2 − f3  →  x2 = 16x1 − 69.48        (29)
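A sketch for Example 2: because the covariance matrices are shared, the quadratic terms cancel in every pairwise difference, so the boundaries of Equation (29) are linear. The check below evaluates f1 − f2 at a point on S12 (constant terms common to every class are dropped, as in the slides):

```python
import numpy as np

cov_inv = np.diag([1.0, 0.25])                     # shared Sigma^-1, Equation (27)
means   = [np.array([3.5, 6.0]), np.array([2.5, 3.0]), np.array([6.5, 2.0])]
priors  = [4/12, 4/12, 4/12]

def f(i, x):
    """f_i(x) as in Equation (28); terms common to every class are dropped."""
    mu = means[i]
    return (-0.5 * x @ cov_inv @ x + mu @ cov_inv @ x
            - 0.5 * mu @ cov_inv @ mu + np.log(priors[i]))

# The quadratic term is identical for all classes, so S12 = f1 - f2 is linear.
x = np.array([3.0, 4.5])                           # a point on S12: x1 = 6.375 - 0.75*x2
print(round(f(0, x) - f(1, x), 6))                 # 0.0 -> x lies on the boundary S12
```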


Numerical Examples Example 2: Equal Variance (Σi = Σ)

Figure: The calculated decision boundaries for three different classes where their covariance matrices were equal but arbitrary. The plot shows the three classes, their means µ1, µ2, and µ3, and the linear boundaries S12 (x1 = 6.37 − 0.75x2), S13 (x2 = 3x1 − 11), and S23 (x2 = 16x1 − 69.48) in the (x1, x2) plane.


Numerical Examples Example 2: Equal Variance (Σi = Σ)

Figure: Classification of three Gaussian classes with the same covariance matrix (Σ1 = Σ2 = Σ3) (our second example). Green, red, and blue surfaces represent f1, f2, and f3, respectively.


Numerical Examples Example 3: Different Covariance matrices (Σi = arbitrary)

In this example, the covariance matrices were different for all classes; this can be considered the general case.

The distributions of all classes were different.

Table: Feature values, mean, mean-centered data, and covariance matrices for all classes.

Pattern No. | x1   | x2   | Class | Mean (x1, x2) | D (x1, x2)   | Covariance Matrix (Σi)
1           | 7.00 | 3.00 | ω1    | 7.50, 3.50    | −0.50, −0.50 | Σ1 = [1.00 0.00; 0.00 1.00]
2           | 8.00 | 3.00 | ω1    |               |  0.50, −0.50 |
3           | 7.00 | 4.00 | ω1    |               | −0.50,  0.50 |
4           | 8.00 | 4.00 | ω1    |               |  0.50,  0.50 |
5           | 2.00 | 2.00 | ω2    | 3.50, 2.50    | −1.50, −0.50 | Σ2 = [9.00 0.00; 0.00 1.00]
6           | 5.00 | 2.00 | ω2    |               |  1.50, −0.50 |
7           | 2.00 | 3.00 | ω2    |               | −1.50,  0.50 |
8           | 5.00 | 3.00 | ω2    |               |  1.50,  0.50 |
9           | 1.00 | 6.00 | ω3    | 3.00, 6.50    | −2.00, −0.50 | Σ3 = [16.00 0.00; 0.00 1.00]
10          | 5.00 | 6.00 | ω3    |               |  2.00, −0.50 |
11          | 1.00 | 7.00 | ω3    |               | −2.00,  0.50 |
12          | 5.00 | 7.00 | ω3    |               |  2.00,  0.50 |


Numerical Examples Example 3: Different Covariance matrices (Σi = arbitrary)

The values of the inverse of the covariance matrices are as follows:

Σ1^{-1} = [1.00 0.00; 0.00 1.00], Σ2^{-1} = [0.11 0.00; 0.00 1.00], Σ3^{-1} = [0.06 0.00; 0.00 1.00]        (30)

The discriminant functions were then calculated and their values are as follows:

f1 = −0.50x1² − 0.50x2² + 7.50x1 + 3.50x2 − 35.35
f2 = −0.06x1² − 0.50x2² + 0.39x1 + 2.50x2 − 6.00
f3 = −0.03x1² − 0.50x2² + 0.19x1 + 6.50x2 − 23.89        (31)

The decision boundaries between each pair of classes were then calculated as follows:

S12 = f1 − f2  →  x2 = 0.44x1² − 7.11x1 + 29.35
S13 = f1 − f3  →  x2 = −0.16x1² + 2.44x1 − 3.82
S23 = f2 − f3  →  x2 = −0.01x1² + 0.05x1 + 4.47        (32)
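A sketch that recomputes the quadratic boundary S12 of Equation (32) from the class statistics of Example 3 (classes 1 and 2), using the W, w, W0 construction of Equations (8)-(10) with the equal priors 4/12 used in the example:

```python
import numpy as np

mu1, mu2   = np.array([7.5, 3.5]), np.array([3.5, 2.5])       # classes 1 and 2 of Example 3
cov1, cov2 = np.diag([1.0, 1.0]), np.diag([9.0, 1.0])
p1 = p2 = 4 / 12                                               # equal priors

inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
W  = -0.5 * (inv1 - inv2)                                                    # Equation (8)
w  = inv1 @ mu1 - inv2 @ mu2                                                 # Equation (9)
W0 = (-0.5 * (mu1 @ inv1 @ mu1 - mu2 @ inv2 @ mu2
              + np.log(np.linalg.det(cov1)) - np.log(np.linalg.det(cov2)))
      + np.log(p1 / p2))                                                     # Equation (10)

# S12(x) = x^T W x + w^T x + W0; solving S12 = 0 for x2 reproduces
# x2 = 0.44*x1^2 - 7.11*x1 + 29.35 from Equation (32).
print(np.round(W, 2), np.round(w, 2), round(W0, 2))
```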


Numerical Examples Example 3: Different Covariance matrices (Σi = arbitrary)

Figure: The calculated decision boundaries for three different classes where their covariance matrices are different (our third example). The plot shows the three classes and the quadratic boundaries S12 = −0.44x1² + 7.11x1 + x2 − 29.35 = 0, S13 = −0.48x1² + 7.32x1 − 3x2 − 11.46 = 0, and S23 = −0.04x1² + 0.20x1 − 4x2 + 17.88 = 0 in the (x1, x2) plane.


Numerical Examples Example 3: Different Covariance matrices (Σi = arbitrary)

Figure: Classification of three Gaussian classes with different covariance matrices (our third example). Green, red, and blue surfaces represent the discriminant functions f1, f2, and f3, respectively.


Conclusions and Future Work

We explained how to construct linear and quadratic classifiers.

In the future, we will explain how the singularity problem occurs and how to solve it.
