Fisher Linear Discriminant

Upload: mustafa-haider

Post on 03-Apr-2018


TRANSCRIPT

  • Slide 1/23: Data Representation vs. Data Classification

    PCA finds the most accurate data representation in a lower-dimensional space: it projects the data onto the directions of maximum variance.

    However, the directions of maximum variance may be useless for classification.

    The Fisher Linear Discriminant instead projects onto a line whose direction is useful for data classification.

    [Figure: "apply PCA to each class"; panels labeled "separable" and "not separable"]

  • Slide 2/23: Fisher Linear Discriminant

    Main idea: find a projection to a line such that samples from different classes are well separated.

    Example in 2D: a bad line to project onto leaves the classes mixed up; a good line to project onto keeps the classes well separated.

  • Slide 3/23: Fisher Linear Discriminant

    Suppose we have 2 classes and $d$-dimensional samples $x_1, \dots, x_n$, where $n_1$ samples come from the first class and $n_2$ samples come from the second class.

    Consider a projection onto a line, and let the line direction be given by a unit vector $v$.

    The scalar $v^t x_i$ is the distance of the projection of $x_i$ from the origin. Thus $v^t x_i$ is the projection of $x_i$ into a one-dimensional subspace.
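    As a quick illustration of this projection step, here is a minimal NumPy sketch (the samples and the direction below are made up for illustration, not taken from the slides):

        import numpy as np

        # Toy 2-D samples, one row per sample x_i (illustrative values only)
        X = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 5.0]])

        v = np.array([1.0, 1.0])
        v = v / np.linalg.norm(v)   # unit vector giving the line direction

        y = X @ v                   # y_i = v^t x_i: the 1-D projections
        print(y)                    # one scalar per sample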

  • Slide 4/23: Fisher Linear Discriminant

    Thus the projection of sample $x_i$ onto a line in direction $v$ is given by $v^t x_i$.

    How do we measure the separation between the projections of different classes?

    Let $\mu_1$ and $\mu_2$ be the means of classes 1 and 2, and let $\tilde{\mu}_1$ and $\tilde{\mu}_2$ be the means of the projections of classes 1 and 2. Then

    $\tilde{\mu}_1 = \frac{1}{n_1} \sum_{x_i \in \text{Class 1}} v^t x_i = v^t \Big( \frac{1}{n_1} \sum_{x_i \in \text{Class 1}} x_i \Big) = v^t \mu_1$, and similarly $\tilde{\mu}_2 = v^t \mu_2$.

    $|\tilde{\mu}_1 - \tilde{\mu}_2|$ seems like a good measure of separation.
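    The identity $\tilde{\mu}_k = v^t \mu_k$ is easy to check numerically; a small sketch with made-up samples:

        import numpy as np

        X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])   # class-1 samples (made up)
        v = np.array([0.6, 0.8])                              # a unit direction

        mu1 = X1.mean(axis=0)              # class mean in the original d-dimensional space
        mu1_tilde = (X1 @ v).mean()        # mean of the projected samples

        print(np.isclose(mu1_tilde, v @ mu1))   # True: mean of projections = projection of the mean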

  • Slide 5/23: Fisher Linear Discriminant

    How good is $|\tilde{\mu}_1 - \tilde{\mu}_2|$ as a measure of separation? The larger $|\tilde{\mu}_1 - \tilde{\mu}_2|$ is, the better the expected separation.

    [Figure: the vertical axis is a better line than the horizontal axis to project onto for class separability; however, the means projected onto the horizontal axis are farther apart, i.e. $|\hat{\mu}_1 - \hat{\mu}_2| > |\tilde{\mu}_1 - \tilde{\mu}_2|$]

  • Slide 6/23: Fisher Linear Discriminant

    The problem with $|\tilde{\mu}_1 - \tilde{\mu}_2|$ is that it does not consider the variance of the classes.

    [Figure: classes 1 and 2 with projected means $\tilde{\mu}_1$ and $\tilde{\mu}_2$; along one direction the projections have large variance, along the other they have small variance]

  • Slide 7/23: Fisher Linear Discriminant

    We need to normalize $|\tilde{\mu}_1 - \tilde{\mu}_2|$ by a factor that is proportional to the variance.

    Suppose we have samples $z_1, \dots, z_n$. Their sample mean is $\mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i$.

    Define their scatter as $s = \sum_{i=1}^{n} (z_i - \mu_z)^2$.

    Thus scatter is just the sample variance multiplied by $n$. Scatter measures the same thing as variance, namely the spread of the data around the mean; it is simply on a different scale.

    [Figure: a data set with larger scatter and one with smaller scatter]
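    A quick numerical check that scatter is $n$ times the ($1/n$-normalized) sample variance, using made-up values:

        import numpy as np

        z = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up 1-D samples
        n = len(z)

        scatter = np.sum((z - z.mean()) ** 2)   # s = sum_i (z_i - mu_z)^2
        variance = np.var(z)                    # (1/n) * sum_i (z_i - mu_z)^2

        print(scatter, n * variance)            # equal: scatter = n * variance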

  • Slide 8/23: Fisher Linear Discriminant

    Fisher's solution: normalize $|\tilde{\mu}_1 - \tilde{\mu}_2|$ by the scatter.

    Let $y_i = v^t x_i$, i.e. the $y_i$ are the projected samples.

    The scatter for the projected samples of class 1 is $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2$.

    The scatter for the projected samples of class 2 is $\tilde{s}_2^2 = \sum_{y_i \in \text{Class 2}} (y_i - \tilde{\mu}_2)^2$.

  • Slide 9/23: Fisher Linear Discriminant

    We need to normalize by both the scatter of class 1 and the scatter of class 2.

    Thus the Fisher linear discriminant is to project onto the line in the direction $v$ which maximizes

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    We want the projected means to be far from each other, the scatter of class 1 to be as small as possible (i.e. the samples of class 1 cluster around the projected mean $\tilde{\mu}_1$), and the scatter of class 2 to be as small as possible (i.e. the samples of class 2 cluster around the projected mean $\tilde{\mu}_2$).
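    The objective $J(v)$ can be evaluated directly from the projected samples; a minimal sketch (the function name and arguments are mine, not from the slides):

        import numpy as np

        def fisher_objective(v, X1, X2):
            """J(v) = (mu1~ - mu2~)^2 / (s1~^2 + s2~^2) for a candidate direction v."""
            y1, y2 = X1 @ v, X2 @ v              # projected samples of each class
            m1, m2 = y1.mean(), y2.mean()        # projected means
            s1_sq = np.sum((y1 - m1) ** 2)       # scatter of projected class 1
            s2_sq = np.sum((y2 - m2) ** 2)       # scatter of projected class 2
            return (m1 - m2) ** 2 / (s1_sq + s2_sq)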

  • Slide 10/23: Fisher Linear Discriminant

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    If we find a $v$ which makes $J(v)$ large, we are guaranteed that the classes are well separated: the projected means are far from each other, a small $\tilde{s}_1$ implies that the projected samples of class 1 are clustered around the projected mean $\tilde{\mu}_1$, and a small $\tilde{s}_2$ implies that the projected samples of class 2 are clustered around the projected mean $\tilde{\mu}_2$.

  • Slide 11/23: Fisher Linear Discriminant Derivation

    All we need to do now is express $J$ explicitly as a function of $v$ and maximize it. This is straightforward, but it needs linear algebra and calculus.

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

    Define the separate class scatter matrices $S_1$ and $S_2$ for classes 1 and 2. These measure the scatter of the original samples $x_i$ (before projection):

    $S_1 = \sum_{x_i \in \text{Class 1}} (x_i - \mu_1)(x_i - \mu_1)^t, \qquad S_2 = \sum_{x_i \in \text{Class 2}} (x_i - \mu_2)(x_i - \mu_2)^t$
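    In code, each class scatter matrix is a one-liner; a sketch assuming X1 and X2 hold the samples of each class as rows:

        import numpy as np

        def scatter_matrix(X):
            """S = sum_i (x_i - mu)(x_i - mu)^t over the rows of X."""
            D = X - X.mean(axis=0)   # deviations from the class mean
            return D.T @ D           # d x d scatter matrix

        # S1 = scatter_matrix(X1);  S2 = scatter_matrix(X2)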

  • Slide 12/23: Fisher Linear Discriminant Derivation

    Now define the within-class scatter matrix $S_W = S_1 + S_2$.

    Recall that $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} (y_i - \tilde{\mu}_1)^2$.

    Using $y_i = v^t x_i$ and $\tilde{\mu}_1 = v^t \mu_1$:

    $\tilde{s}_1^2 = \sum_{y_i \in \text{Class 1}} \big(v^t x_i - v^t \mu_1\big)^2 = \sum_{y_i \in \text{Class 1}} v^t (x_i - \mu_1)(x_i - \mu_1)^t v = v^t S_1 v$
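    The identity $\tilde{s}_1^2 = v^t S_1 v$ can be confirmed numerically; a small sketch with made-up class-1 samples:

        import numpy as np

        X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]])   # made-up samples
        v = np.array([0.6, 0.8])

        D = X1 - X1.mean(axis=0)
        S1 = D.T @ D                              # scatter matrix of class 1

        y1 = X1 @ v
        s1_sq = np.sum((y1 - y1.mean()) ** 2)     # scatter of the projected class-1 samples

        print(np.isclose(s1_sq, v @ S1 @ v))      # True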

  • Slide 13/23: Fisher Linear Discriminant Derivation

    Similarly, $\tilde{s}_2^2 = v^t S_2 v$.

    Therefore $\tilde{s}_1^2 + \tilde{s}_2^2 = v^t S_1 v + v^t S_2 v = v^t S_W v$.

    Let us also rewrite the separation of the projected means:

    $(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (v^t \mu_1 - v^t \mu_2)^2 = v^t (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t v = v^t S_B v$

    where we define the between-class scatter matrix $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t$.

    $S_B$ measures the separation between the means of the two classes (before projection).
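    Likewise, the squared distance between the projected means equals $v^t S_B v$; a small numerical check (the means and direction below are illustrative):

        import numpy as np

        mu1 = np.array([3.0, 3.6])        # illustrative class means
        mu2 = np.array([3.33, 2.0])
        v = np.array([0.6, 0.8])

        d = mu1 - mu2
        S_B = np.outer(d, d)              # between-class scatter matrix (rank 1)

        print(np.isclose((v @ mu1 - v @ mu2) ** 2, v @ S_B @ v))   # True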

  • Slide 14/23: Fisher Linear Discriminant Derivation

    Thus our objective function can be written as

    $J(v) = \dfrac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \dfrac{v^t S_B v}{v^t S_W v}$

    Maximize $J(v)$ by taking the derivative with respect to $v$ and setting it to 0:

    $\dfrac{d}{dv} J(v) = \dfrac{\big(\frac{d}{dv} v^t S_B v\big)\, v^t S_W v \;-\; v^t S_B v\, \big(\frac{d}{dv} v^t S_W v\big)}{(v^t S_W v)^2} = \dfrac{2 S_B v\, (v^t S_W v) \;-\; (v^t S_B v)\, 2 S_W v}{(v^t S_W v)^2} = 0$

  • Slide 15/23: Fisher Linear Discriminant Derivation

    We need to solve $(v^t S_W v)\, S_B v - (v^t S_B v)\, S_W v = 0$.

    Dividing by $v^t S_W v$:

    $\dfrac{v^t S_W v}{v^t S_W v}\, S_B v - \dfrac{v^t S_B v}{v^t S_W v}\, S_W v = 0$

    $S_B v - \lambda S_W v = 0, \qquad \text{where } \lambda = \dfrac{v^t S_B v}{v^t S_W v}$

    $S_B v = \lambda S_W v$, which is a generalized eigenvalue problem.
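    In practice the condition $S_B v = \lambda S_W v$ can be handed to a generalized symmetric eigensolver; a sketch using SciPy (the helper name is mine):

        import numpy as np
        from scipy.linalg import eigh

        def fld_direction(S_B, S_W):
            """Direction v maximizing J(v), via the generalized eigenproblem S_B v = lambda S_W v."""
            eigvals, eigvecs = eigh(S_B, S_W)       # assumes S_W is positive definite
            return eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue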

  • Slide 16/23: Fisher Linear Discriminant Derivation

    If $S_W$ has full rank (the inverse exists), we can convert this to a standard eigenvalue problem:

    $S_B v = \lambda S_W v \;\;\Longrightarrow\;\; S_W^{-1} S_B v = \lambda v$

    But $S_B x$, for any vector $x$, points in the same direction as $\mu_1 - \mu_2$:

    $S_B x = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t x = (\mu_1 - \mu_2)\big[(\mu_1 - \mu_2)^t x\big] = \alpha\,(\mu_1 - \mu_2)$

    since $(\mu_1 - \mu_2)^t x$ is just a scalar $\alpha$. Thus we can solve the eigenvalue problem immediately:

    $v = S_W^{-1}(\mu_1 - \mu_2)$

    because $S_W^{-1} S_B \big[S_W^{-1}(\mu_1 - \mu_2)\big] = S_W^{-1}\big[\alpha\,(\mu_1 - \mu_2)\big] = \alpha\, S_W^{-1}(\mu_1 - \mu_2) = \alpha\, v$.
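    For two classes the whole problem therefore reduces to a single linear solve; a minimal sketch (the function name is mine):

        import numpy as np

        def fld_two_class(X1, X2):
            """v proportional to S_W^{-1} (mu1 - mu2); rows of X1, X2 are the samples of each class."""
            mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
            S1 = (X1 - mu1).T @ (X1 - mu1)
            S2 = (X2 - mu2).T @ (X2 - mu2)
            S_W = S1 + S2
            return np.linalg.solve(S_W, mu1 - mu2)   # solves S_W v = mu1 - mu2 without forming the inverse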

  • Slide 17/23: Fisher Linear Discriminant Example

    Data: Class 1 has 5 samples, $c_1 = [(1,2),\,(2,3),\,(3,3),\,(4,5),\,(5,5)]$.

    Class 2 has 6 samples, $c_2 = [(1,0),\,(2,1),\,(3,1),\,(3,2),\,(5,3),\,(6,5)]$.

    Arrange the data in two separate matrices:

    $c_1 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 3 \\ 4 & 5 \\ 5 & 5 \end{bmatrix}, \qquad c_2 = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \\ 3 & 2 \\ 5 & 3 \\ 6 & 5 \end{bmatrix}$

    Notice that PCA performs very poorly on this data because the direction of largest variance is not helpful for classification.

  • Slide 18/23: Fisher Linear Discriminant Example

    First compute the mean of each class:

    $\mu_1 = \text{mean}(c_1) = \begin{bmatrix} 3 & 3.6 \end{bmatrix}, \qquad \mu_2 = \text{mean}(c_2) = \begin{bmatrix} 3.3 & 2 \end{bmatrix}$

    Compute the scatter matrices $S_1$ and $S_2$ for each class:

    $S_1 = 4\,\text{cov}(c_1) = \begin{bmatrix} 10 & 8 \\ 8 & 7.2 \end{bmatrix}, \qquad S_2 = 5\,\text{cov}(c_2) = \begin{bmatrix} 17.3 & 16 \\ 16 & 16 \end{bmatrix}$

    The within-class scatter matrix is

    $S_W = S_1 + S_2 = \begin{bmatrix} 27.3 & 24 \\ 24 & 23.2 \end{bmatrix}$

    It has full rank, so we do not have to solve for eigenvalues. The inverse of $S_W$ is

    $S_W^{-1} = \text{inv}(S_W) = \begin{bmatrix} 0.39 & -0.41 \\ -0.41 & 0.47 \end{bmatrix}$

    Finally, the optimal line direction is

    $v = S_W^{-1}(\mu_1 - \mu_2) \approx \begin{bmatrix} -0.79 \\ 0.89 \end{bmatrix}$
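    The numbers on this slide can be reproduced in a few lines of NumPy (the printed values agree up to rounding):

        import numpy as np

        c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
        c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

        mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)   # [3. 3.6] and [3.33 2.]
        S1 = (c1 - mu1).T @ (c1 - mu1)                # [[10. 8.], [8. 7.2]]
        S2 = (c2 - mu2).T @ (c2 - mu2)                # [[17.33 16.], [16. 16.]]
        S_W = S1 + S2                                 # [[27.33 24.], [24. 23.2]]

        v = np.linalg.inv(S_W) @ (mu1 - mu2)
        print(v)                                      # approximately [-0.79  0.89]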

  • Slide 19/23: Fisher Linear Discriminant Example

    Notice that as long as the line has the right direction, its exact position does not matter.

    The last step is to compute the actual 1-D vector $y$. Let us do it separately for each class, using the direction rescaled to (approximately) unit length, $v^t \approx \begin{bmatrix} -0.65 & 0.73 \end{bmatrix}$:

    $Y_1 = v^t c_1^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 3 & 5 & 5 \end{bmatrix} = \begin{bmatrix} 0.81 & 0.89 & 0.24 & 1.05 & 0.40 \end{bmatrix}$

    $Y_2 = v^t c_2^t = \begin{bmatrix} -0.65 & 0.73 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 3 & 5 & 6 \\ 0 & 1 & 1 & 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} -0.65 & -0.57 & -1.22 & -0.49 & -1.06 & -0.25 \end{bmatrix}$

    All projections of class 1 are positive and all projections of class 2 are negative, so the two classes are well separated on the line.
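    The same projections can be computed directly (any positive rescaling of $v$ gives the same line):

        import numpy as np

        c1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
        c2 = np.array([[1, 0], [2, 1], [3, 1], [3, 2], [5, 3], [6, 5]], dtype=float)

        v = np.array([-0.79, 0.89])
        v = v / np.linalg.norm(v)        # roughly [-0.66  0.75]; rescaling does not change the line

        Y1, Y2 = c1 @ v, c2 @ v
        print(Y1)                        # all positive: class 1
        print(Y2)                        # all negative: class 2, well separated in 1-D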

  • Slide 20/23: Multiple Discriminant Analysis (MDA)

    We can generalize FLD to multiple classes. In the case of $c$ classes, we can reduce the dimensionality to $1, 2, 3, \dots, c-1$ dimensions.

    Project each sample $x_i$ to a linear subspace: $y_i = V^t x_i$, where $V$ is called the projection matrix.

  • Slide 21/23: Multiple Discriminant Analysis (MDA)

    Objective function:

    $J(V) = \dfrac{\det(V^t S_B V)}{\det(V^t S_W V)}$

    Let $n_i$ be the number of samples of class $i$, let $\mu_i$ be the sample mean of class $i$, and let $\mu$ be the total mean of all samples:

    $\mu_i = \dfrac{1}{n_i} \sum_{x_k \in \text{Class } i} x_k, \qquad \mu = \dfrac{1}{n} \sum_{i} x_i$

    The within-class scatter matrix $S_W$ is

    $S_W = \sum_{i=1}^{c} S_i = \sum_{i=1}^{c}\; \sum_{x_k \in \text{Class } i} (x_k - \mu_i)(x_k - \mu_i)^t$

    The between-class scatter matrix $S_B$ is

    $S_B = \sum_{i=1}^{c} n_i\, (\mu_i - \mu)(\mu_i - \mu)^t$

    Its maximum rank is $c - 1$.

  • Slide 22/23: Multiple Discriminant Analysis (MDA)

    $J(V) = \dfrac{\det(V^t S_B V)}{\det(V^t S_W V)}$

    The optimal projection matrix $V$ to a subspace of dimension $k$ is given by the eigenvectors corresponding to the largest $k$ eigenvalues.

    First solve the generalized eigenvalue problem $S_B v = \lambda S_W v$. There are at most $c-1$ distinct solution eigenvalues; let $v_1, v_2, \dots, v_{c-1}$ be the corresponding eigenvectors.

    Thus we can project to a subspace of dimension at most $c-1$.
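    A hedged sketch of the whole MDA recipe in NumPy/SciPy (the helper name and arguments are mine; it assumes $S_W$ is positive definite):

        import numpy as np
        from scipy.linalg import eigh

        def mda_projection(X, labels, k):
            """d x k projection matrix V from the top-k generalized eigenvectors of S_B v = lambda S_W v."""
            d = X.shape[1]
            mu = X.mean(axis=0)                                  # total mean of all samples
            S_W = np.zeros((d, d))
            S_B = np.zeros((d, d))
            for c in np.unique(labels):
                Xc = X[labels == c]
                mu_c = Xc.mean(axis=0)
                S_W += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
                S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter, rank <= c-1
            eigvals, eigvecs = eigh(S_B, S_W)                    # generalized eigenproblem
            order = np.argsort(eigvals)[::-1]                    # largest eigenvalues first
            return eigvecs[:, order[:k]]                         # k <= c-1 useful directions

        # Usage: Y = X @ mda_projection(X, labels, k)            # y_i = V^t x_i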

  • Slide 23/23: FDA and MDA Drawbacks

    FLD reduces the dimension only to $k = c - 1$ (unlike PCA). For complex data, projection to even the best line may result in inseparable projected samples.

    FLD will fail:

    1. If $J(v)$ is always 0: this happens if $\mu_1 = \mu_2$. (PCA performs reasonably well here.)

    2. If $J(v)$ is always small: the classes have large overlap when projected to any line. (PCA will also fail here.)