Fisher Linear Discriminant

Upload: mustafa-haider

Post on 03-Apr-2018


TRANSCRIPT

  • Slide 1/23: Data Representation vs. Data Classification

    PCA finds the most accurate data representation in a lower-dimensional space: it projects the data onto the directions of maximum variance.

    However, the directions of maximum variance may be useless for classification.

    The Fisher Linear Discriminant instead projects onto a line whose direction is useful for data classification.

    [Figure: "apply PCA to each class"; panels labeled "separable" and "not separable"]

  • Slide 2/23: Fisher Linear Discriminant

    Main idea: find a projection to a line such that samples from different classes are well separated.

    Example in 2D: a bad line to project onto leaves the classes mixed up; a good line to project onto keeps the classes well separated.

  • Slide 3/23: Fisher Linear Discriminant

    Suppose we have 2 classes and $d$-dimensional samples $x_1, \dots, x_n$, where $n_1$ samples come from the first class and $n_2$ samples come from the second class.

    Consider a projection onto a line, and let the line direction be given by a unit vector $v$.

    The scalar $v^t x_i$ is the distance of the projection of $x_i$ from the origin. Thus $v^t x_i$ is the projection of $x_i$ into a one-dimensional subspace.
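    As a quick illustration of this projection step, here is a minimal NumPy sketch (the samples and the direction below are made up for illustration, not taken from the slides):

        import numpy as np

        # Toy 2-D samples, one row per sample x_i (illustrative values only)
        X = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 5.0]])

        v = np.array([1.0, 1.0])
        v = v / np.linalg.norm(v)   # unit vector giving the line direction

        y = X @ v                   # y_i = v^t x_i: the 1-D projections
        print(y)                    # one scalar per sample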

  • Slide 4/23: Fisher Linear Discriminant

    Thus the projection of sample $x_i$ onto a line in direction $v$ is given by $v^t x_i$.

    How do we measure the separation between the projections of different classes?

    Let $\mu_1$ and $\mu_2$ be the means of classes 1 and 2, and let $\tilde{\mu}_1$ and $\tilde{\mu}_2$ be the means of the projections of classes 1 and 2. Then

    $\tilde{\mu}_1 = \frac{1}{n_1} \sum_{x_i \in \text{Class 1}} v^t x_i = v^t \Big( \frac{1}{n_1} \sum_{x_i \in \text{Class 1}} x_i \Big) = v^t \mu_1$, and similarly $\tilde{\mu}_2 = v^t \mu_2$.

    $|\tilde{\mu}_1 - \tilde{\mu}_2|$ seems like a good measure of separation.
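    The identity $\tilde{\mu}_k = v^t \mu_k$ is easy to check numerically; a small sketch with made-up samples:

        import numpy as np

        X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])   # class-1 samples (made up)
        v = np.array([0.6, 0.8])                              # a unit direction

        mu1 = X1.mean(axis=0)              # class mean in the original d-dimensional space
        mu1_tilde = (X1 @ v).mean()        # mean of the projected samples

        print(np.isclose(mu1_tilde, v @ mu1))   # True: mean of projections = projection of the mean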

  • Slide 5/23: Fisher Linear Discriminant

    How good is $|\tilde{\mu}_1 - \tilde{\mu}_2|$ as a measure of separation? The larger $|\tilde{\mu}_1 - \tilde{\mu}_2|$ is, the better the expected separation.

    [Figure: the vertical axis is a better line than the horizontal axis to project onto for class separability; however, the means projected onto the horizontal axis are farther apart, i.e. $|\hat{\mu}_1 - \hat{\mu}_2| > |\tilde{\mu}_1 - \tilde{\mu}_2|$]

  • Slide 6/23: Fisher Linear Discriminant

    The problem with $|\tilde{\mu}_1 - \tilde{\mu}_2|$ is that it does not consider the variance of the classes.

    [Figure: classes 1 and 2 with projected means $\tilde{\mu}_1$ and $\tilde{\mu}_2$; along one direction the projections have large variance, along the other they have small variance]

  • Slide 7/23: Fisher Linear Discriminant

    We need to normalize $|\tilde{\mu}_1 - \tilde{\mu}_2|$ by a factor that is proportional to the variance.

    Suppose we have samples $z_1, \dots, z_n$. Their sample mean is $\mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i$.

    Define their scatter as $s = \sum_{i=1}^{n} (z_i - \mu_z)^2$.

    Thus scatter is just the sample variance multiplied by $n$. Scatter measures the same thing as variance, namely the spread of the data around the mean; it is simply on a different scale.

    [Figure: a data set with larger scatter and one with smaller scatter]
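    A quick numerical check that scatter is $n$ times the ($1/n$-normalized) sample variance, using made-up values:

        import numpy as np

        z = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up 1-D samples
        n = len(z)

        scatter = np.sum((z - z.mean()) ** 2)   # s = sum_i (z_i - mu_z)^2
        variance = np.var(z)                    # (1/n) * sum_i (z_i - mu_z)^2

        print(scatter, n * variance)            # equal: scatter = n * variance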

  • Slide 8/23: Fisher Linear Discriminant

    Fisher's solution: normalize $|\tilde{\mu}_1 - \tilde{\mu}_2|$ by the scatter.

    Let $y_i = v^t x_i$, i.e. the $y_i$ are the projected samples.

    The scatter for the projected samples of class 1 is $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2$.

    The scatter for the projected samples of class 2 is $\tilde{s}_2^2 = \sum_{y_i \in \text{Class 2}} (y_i - \tilde{\mu}_2)^2$.

  • Slide 9/23: Fisher Linear Discriminant

    We need to normalize by both the scatter of class 1 and the scatter of class 2.

    Thus the Fisher linear discriminant is to project onto the line in the direction $v$ which maximizes

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    We want the projected means to be far from each other, the scatter of class 1 to be as small as possible (i.e. the samples of class 1 cluster around the projected mean $\tilde{\mu}_1$), and the scatter of class 2 to be as small as possible (i.e. the samples of class 2 cluster around the projected mean $\tilde{\mu}_2$).
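    The objective $J(v)$ can be evaluated directly from the projected samples; a minimal sketch (the function name and arguments are mine, not from the slides):

        import numpy as np

        def fisher_objective(v, X1, X2):
            """J(v) = (mu1~ - mu2~)^2 / (s1~^2 + s2~^2) for a candidate direction v."""
            y1, y2 = X1 @ v, X2 @ v              # projected samples of each class
            m1, m2 = y1.mean(), y2.mean()        # projected means
            s1_sq = np.sum((y1 - m1) ** 2)       # scatter of projected class 1
            s2_sq = np.sum((y2 - m2) ** 2)       # scatter of projected class 2
            return (m1 - m2) ** 2 / (s1_sq + s2_sq)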

  • Slide 10/23: Fisher Linear Discriminant

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    If we find a $v$ which makes $J(v)$ large, we are guaranteed that the classes are well separated: the projected means are far from each other, a small $\tilde{s}_1$ implies that the projected samples of class 1 are clustered around the projected mean $\tilde{\mu}_1$, and a small $\tilde{s}_2$ implies that the projected samples of class 2 are clustered around the projected mean $\tilde{\mu}_2$.

  • Slide 11/23: Fisher Linear Discriminant Derivation

    All we need to do now is express $J$ explicitly as a function of $v$ and maximize it. This is straightforward, but it needs linear algebra and calculus.

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    Define the separate class scatter matrices $S_1$ and $S_2$ for classes 1 and 2. These measure the scatter of the original samples $x_i$ (before projection):

    $S_1 = \sum_{x_i \in \text{Class 1}} (x_i - \mu_1)(x_i - \mu_1)^t, \qquad S_2 = \sum_{x_i \in \text{Class 2}} (x_i - \mu_2)(x_i - \mu_2)^t$
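    In code, each class scatter matrix is a one-liner; a sketch assuming X1 and X2 hold the samples of each class as rows:

        import numpy as np

        def scatter_matrix(X):
            """S = sum_i (x_i - mu)(x_i - mu)^t over the rows of X."""
            D = X - X.mean(axis=0)   # deviations from the class mean
            return D.T @ D           # d x d scatter matrix

        # S1 = scatter_matrix(X1);  S2 = scatter_matrix(X2)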

  • Slide 12/23: Fisher Linear Discriminant Derivation

    Now define the within-class scatter matrix $S_W = S_1 + S_2$.

    Recall that $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2$.

    Using $y_i = v^t x_i$ and $\tilde{\mu}_1 = v^t \mu_1$:

    $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} \big(v^t x_i - v^t \mu_1\big)^2 = \sum_{y_i \in \text{Class 1}} v^t (x_i - \mu_1)(x_i - \mu_1)^t v = v^t S_1 v$
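    The identity $\tilde{s}_1^2 = v^t S_1 v$ can be confirmed numerically; a small sketch with made-up class-1 samples:

        import numpy as np

        X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]])   # made-up samples
        v = np.array([0.6, 0.8])

        D = X1 - X1.mean(axis=0)
        S1 = D.T @ D                              # scatter matrix of class 1

        y1 = X1 @ v
        s1_sq = np.sum((y1 - y1.mean()) ** 2)     # scatter of the projected class-1 samples

        print(np.isclose(s1_sq, v @ S1 @ v))      # True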

  • Slide 13/23: Fisher Linear Discriminant Derivation

    Similarly, $\tilde{s}_2^2 = v^t S_2 v$.

    Therefore $\tilde{s}_1^2 + \tilde{s}_2^2 = v^t S_1 v + v^t S_2 v = v^t S_W v$.

    Let us also rewrite the separation of the projected means:

    $(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (v^t \mu_1 - v^t \mu_2)^2 = v^t (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t v = v^t S_B v$

    where we define the between-class scatter matrix $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t$.

    $S_B$ measures the separation between the means of the two classes (before projection).
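    Likewise, the squared distance between the projected means equals $v^t S_B v$; a small numerical check (the means and direction below are illustrative):

        import numpy as np

        mu1 = np.array([3.0, 3.6])        # illustrative class means
        mu2 = np.array([3.33, 2.0])
        v = np.array([0.6, 0.8])

        d = mu1 - mu2
        S_B = np.outer(d, d)              # between-class scatter matrix (rank 1)

        print(np.isclose((v @ mu1 - v @ mu2) ** 2, v @ S_B @ v))   # True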

  • Slide 14/23: Fisher Linear Discriminant Derivation

    Thus our objective function can be written as

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \dfrac{v^t S_B v}{v^t S_W v}$

    Maximize $J(v)$ by taking the derivative with respect to $v$ and setting it to 0:

    $\dfrac{d}{dv} J(v) = \dfrac{\big(\frac{d}{dv} v^t S_B v\big)\, v^t S_W v \;-\; v^t S_B v\, \big(\frac{d}{dv} v^t S_W v\big)}{(v^t S_W v)^2} = \dfrac{2 S_B v\, (v^t S_W v) \;-\; (v^t S_B v)\, 2 S_W v}{(v^t S_W v)^2} = 0$

  • Slide 15/23: Fisher Linear Discriminant Derivation

    We need to solve $(v^t S_W v)\, S_B v - (v^t S_B v)\, S_W v = 0$.

    Dividing by $v^t S_W v$:

    $\dfrac{v^t S_W v}{v^t S_W v}\, S_B v - \dfrac{v^t S_B v}{v^t S_W v}\, S_W v = 0$

    $S_B v - \lambda S_W v = 0, \qquad \text{where } \lambda = \dfrac{v^t S_B v}{v^t S_W v}$

    $S_B v = \lambda S_W v$, which is a generalized eigenvalue problem.
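    In practice the condition $S_B v = \lambda S_W v$ can be handed to a generalized symmetric eigensolver; a sketch using SciPy (the helper name is mine):

        import numpy as np
        from scipy.linalg import eigh

        def fld_direction(S_B, S_W):
            """Direction v maximizing J(v), via the generalized eigenproblem S_B v = lambda S_W v."""
            eigvals, eigvecs = eigh(S_B, S_W)       # assumes S_W is positive definite
            return eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue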

  • Slide 16/23: Fisher Linear Discriminant Derivation

    If $S_W$ has full rank (the inverse exists), we can convert this to a standard eigenvalue problem:

    $S_B v = \lambda S_W v \;\;\Longrightarrow\;\; S_W^{-1} S_B v = \lambda v$

    But $S_B x$, for any vector $x$, points in the same direction as $\mu_1 - \mu_2$:

    $S_B x = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t x = (\mu_1 - \mu_2)\big[(\mu_1 - \mu_2)^t x\big] = \alpha\,(\mu_1 - \mu_2)$

    since $(\mu_1 - \mu_2)^t x$ is just a scalar $\alpha$. Thus we can solve the eigenvalue problem immediately:

    $v = S_W^{-1}(\mu_1 - \mu_2)$

    because $S_W^{-1} S_B \big[S_W^{-1}(\mu_1 - \mu_2)\big] = S_W^{-1}\big[\alpha\,(\mu_1 - \mu_2)\big] = \alpha\, S_W^{-1}(\mu_1 - \mu_2) = \alpha\, v$.
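    For two classes the whole problem therefore reduces to a single linear solve; a minimal sketch (the function name is mine):

        import numpy as np

        def fld_two_class(X1, X2):
            """v proportional to S_W^{-1} (mu1 - mu2); rows of X1, X2 are the samples of each class."""
            mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
            S1 = (X1 - mu1).T @ (X1 - mu1)
            S2 = (X2 - mu2).T @ (X2 - mu2)
            S_W = S1 + S2
            return np.linalg.solve(S_W, mu1 - mu2)   # solves S_W v = mu1 - mu2 without forming the inverse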

  • Slide 17/23: Fisher Linear Discriminant Example

    Data: Class 1 has 5 samples, $c_1 = [(1,2),\,(2,3),\,(3,3),\,(4,5),\,(5,5)]$.

    Class 2 has 6 samples, $c_2 = [(1,0),\,(2,1),\,(3,1),\,(3,2),\,(5,3),\,(6,5)]$.

    Arrange the data in two separate matrices:

    $c_1 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 3 \\ 4 & 5 \\ 5 & 5 \end{bmatrix}, \qquad c_2 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \\ 3 & 2 \\ 5 & 3 \\ 6 & 5 \end{bmatrix}$

    Notice that PCA performs very poorly on this data because the direction of largest variance is not helpful for classification.

  • Slide 18/23: Fisher Linear Discriminant Example

    First compute the mean of each class:

    $\mu_1 = \text{mean}(c_1) = \begin{bmatrix} 3 & 3.6 \end{bmatrix}, \qquad \mu_2 = \text{mean}(c_2) = \begin{bmatrix} 3.3 & 2 \end{bmatrix}$

    Compute the scatter matrices $S_1$ and $S_2$ for each class:

    $S_1 = 4\,\text{cov}(c_1) = \begin{bmatrix} 10 & 8 \\ 8 & 7.2 \end{bmatrix}, \qquad S_2 = 5\,\text{cov}(c_2) = \begin{bmatrix} 17.3 & 16 \\ 16 & 16 \end{bmatrix}$

    The within-class scatter matrix is

    $S_W = S_1 + S_2 = \begin{bmatrix} 27.3 & 24 \\ 24 & 23.2 \end{bmatrix}$

    It has full rank, so we do not have to solve for eigenvalues. The inverse of $S_W$ is

    $S_W^{-1} = \text{inv}(S_W) = \begin{bmatrix} 0.39 & -0.41 \\ -0.41 & 0.47 \end{bmatrix}$

    Finally, the optimal line direction is

    $v = S_W^{-1}(\mu_1 - \mu_2) \approx \begin{bmatrix} -0.79 \\ 0.89 \end{bmatrix}$
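    The numbers on this slide can be reproduced in a few lines of NumPy (the printed values agree up to rounding):

        import numpy as np

        c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
        c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

        mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # [3. 3.6] and [3.33 2.]
        S1 = (c1 - mu1).T @ (c1 - mu1)                # [[10. 8.], [8. 7.2]]
        S2 = (c2 - mu2).T @ (c2 - mu2)                # [[17.33 16.], [16. 16.]]
        S_W = S1 + S2                                 # [[27.33 24.], [24. 23.2]]

        v = np.linalg.inv(S_W) @ (mu1 - mu2)
        print(v)                                      # approximately [-0.79  0.89]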

  • Slide 19/23: Fisher Linear Discriminant Example

    Notice that as long as the line has the right direction, its exact position does not matter.

    The last step is to compute the actual 1-D vector $y$. Let us do it separately for each class, using the direction rescaled to (approximately) unit length, $v^t \approx \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}$:

    $Y_1 = v^t c_1^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 3 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 0.81 & 0.89 & 0.24 & 1.05 & 0.40 \end{bmatrix}$

    $Y_2 = v^t c_2^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 3 & 5 & 6 \\ 0 & 1 & 1 & 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} -0.65 & -0.57 & -1.22 & -0.49 & -1.06 & -0.25 \end{bmatrix}$

    All projections of class 1 are positive and all projections of class 2 are negative, so the two classes are well separated on the line.
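    The same projections can be computed directly (any positive rescaling of $v$ gives the same line):

        import numpy as np

        c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
        c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

        v = np.array([-0.79, 0.89])
        v = v / np.linalg.norm(v)        # roughly [-0.66  0.75]; rescaling does not change the line

        Y1, Y2 = c1 @ v, c2 @ v
        print(Y1)                        # all positive: class 1
        print(Y2)                        # all negative: class 2, well separated in 1-D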

  • Slide 20/23: Multiple Discriminant Analysis (MDA)

    We can generalize FLD to multiple classes. In the case of $c$ classes, we can reduce the dimensionality to $1, 2, 3, \dots, c-1$ dimensions.

    Project each sample $x_i$ to a linear subspace: $y_i = V^t x_i$, where $V$ is called the projection matrix.

  • Slide 21/23: Multiple Discriminant Analysis (MDA)

    Objective function:

    $J(V) = \dfrac{\det(V^t S_B V)}{\det(V^t S_W V)}$

    Let $n_i$ be the number of samples of class $i$, let $\mu_i$ be the sample mean of class $i$, and let $\mu$ be the total mean of all samples:

    $\mu_i = \dfrac{1}{n_i} \sum_{x_k \in \text{Class } i} x_k, \qquad \mu = \dfrac{1}{n} \sum_{i} x_i$

    The within-class scatter matrix $S_W$ is

    $S_W = \sum_{i=1}^{c} S_i = \sum_{i=1}^{c}\; \sum_{x_k \in \text{Class } i} (x_k - \mu_i)(x_k - \mu_i)^t$

    The between-class scatter matrix $S_B$ is

    $S_B = \sum_{i=1}^{c} n_i\, (\mu_i - \mu)(\mu_i - \mu)^t$

    Its maximum rank is $c - 1$.

  • Slide 22/23: Multiple Discriminant Analysis (MDA)

    $J(V) = \dfrac{\det(V^t S_B V)}{\det(V^t S_W V)}$

    The optimal projection matrix $V$ to a subspace of dimension $k$ is given by the eigenvectors corresponding to the largest $k$ eigenvalues.

    First solve the generalized eigenvalue problem $S_B v = \lambda S_W v$. There are at most $c-1$ distinct solution eigenvalues; let $v_1, v_2, \dots, v_{c-1}$ be the corresponding eigenvectors.

    Thus we can project to a subspace of dimension at most $c-1$.
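    A hedged sketch of the whole MDA recipe in NumPy/SciPy (the helper name and arguments are mine; it assumes $S_W$ is positive definite):

        import numpy as np
        from scipy.linalg import eigh

        def mda_projection(X, labels, k):
            """d x k projection matrix V from the top-k generalized eigenvectors of S_B v = lambda S_W v."""
            d = X.shape[1]
            mu = X.mean(axis=0)                                  # total mean of all samples
            S_W = np.zeros((d, d))
            S_B = np.zeros((d, d))
            for c in np.unique(labels):
                Xc = X[labels == c]
                mu_c = Xc.mean(axis=0)
                S_W += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
                S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter, rank <= c-1
            eigvals, eigvecs = eigh(S_B, S_W)                    # generalized eigenproblem
            order = np.argsort(eigvals)[::-1]                    # largest eigenvalues first
            return eigvecs[:, order[:k]]                         # k <= c-1 useful directions

        # Usage: Y = X @ mda_projection(X, labels, k)            # y_i = V^t x_i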

  • Slide 23/23: FDA and MDA Drawbacks

    FLD reduces the dimension only to $k = c - 1$ (unlike PCA). For complex data, projection to even the best line may result in inseparable projected samples.

    FLD will fail:

    1. If $J(v)$ is always 0: this happens if $\mu_1 = \mu_2$. (PCA performs reasonably well here.)

    2. If $J(v)$ is always small: the classes have large overlap when projected to any line. (PCA will also fail here.)