kernelized discriminant analysis and adaptive methods for discriminant analysis

Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis Haesun Park Georgia Institute of Technology, Atlanta, GA, USA (joint work with C. Park) KAIST, Korea, June 2007

Page 1: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Haesun Park

Georgia Institute of Technology,

Atlanta, GA, USA

(joint work with C. Park)

KAIST, Korea, June 2007

Page 2: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Clustering

Page 3: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Clustering: grouping of data based on similarity measures

Page 4: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Classification

Classification: assign a class label to new, unseen data

Page 5: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Data Mining

• Mining or discovery of new information (patterns or rules) from large databases

Data Preparation

Preprocessing

Data Reduction

• Dimension reduction
- Feature Selection
- Feature Extraction

• Classification
• Clustering
• Association Analysis
• Regression
• Probabilistic modeling
…

Page 6: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Feature Extraction

• Optimal feature extraction
- Reduce the dimensionality of data space
- Minimize effects of redundant features and noise

Curse of dimensionality (number of features)

[Diagram: feature extraction → apply a classifier to predict a class label of new data]

Page 7: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Linear dimension reduction

Maximize class separability in the reduced dimensional space


Page 9: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

What if the data are not linearly separable?

Nonlinear Dimension Reduction

Page 10: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Contents

• Linear Discriminant Analysis

• Nonlinear Dimension Reduction based on Kernel Methods

- Nonlinear Discriminant Analysis

• Application to Fingerprint Classification

Page 11: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

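Linear Discriminant Analysis (LDA)

For a given data set $\{a_1, \dots, a_n\}$ with $r$ classes and $n_i$ data in class $i$

Centroids:

$$c_i = \frac{1}{n_i} \sum_{a \in \text{class } i} a, \qquad c = \frac{1}{n} \sum_{i=1}^{n} a_i$$

• Within-class scatter matrix

$$S_w = \sum_{i=1}^{r} \sum_{a \in \text{class } i} (a - c_i)(a - c_i)^T$$

• trace(Sw)

$$\operatorname{trace}(S_w) = \sum_{i=1}^{r} \sum_{a \in \text{class } i} \|a - c_i\|^2$$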

Page 12: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

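• Between-class scatter matrix

$$S_b = \sum_{i=1}^{r} n_i (c_i - c)(c_i - c)^T$$

• trace(Sb)

$$\operatorname{trace}(S_b) = \sum_{i=1}^{r} n_i \|c_i - c\|^2$$

$G^T$: $a_1, \dots, a_n \rightarrow G^T a_1, \dots, G^T a_n$

maximize $\operatorname{trace}(G^T S_b G)$, minimize $\operatorname{trace}(G^T S_w G)$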
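The scatter matrices and their traces can be checked numerically. The following is a minimal NumPy sketch, assuming the data are stored as the columns of a matrix `A` with one integer label per column; the function name `scatter_matrices` and the toy data are illustrative and not from the talk.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Return (S_w, S_b) for data stored as the columns of A (m x n)."""
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)              # global centroid c
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for cls in np.unique(labels):
        A_i = A[:, labels == cls]                  # data in class i
        c_i = A_i.mean(axis=1, keepdims=True)      # class centroid c_i
        S_w += (A_i - c_i) @ (A_i - c_i).T
        S_b += A_i.shape[1] * (c_i - c) @ (c_i - c).T
    return S_w, S_b

# Toy data (hypothetical): 3 classes of 20 points each in 4 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
A = rng.normal(size=(4, 60)) + 2.0 * np.eye(4)[:, labels]
S_w, S_b = scatter_matrices(A, labels)

# trace(S_w) equals the sum of squared distances to the class centroids.
within = sum(
    np.sum((A[:, labels == k] - A[:, labels == k].mean(axis=1, keepdims=True)) ** 2)
    for k in range(3)
)
print(np.trace(S_w), within)
```

A small trace(S_w) together with a large trace(S_b) indicates tight, well-separated classes, which is what the transformation $G^T$ is chosen to achieve in the reduced space.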

Page 13: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

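Eigenvalue problem: $S_w^{-1} S_b x = \lambda x$

$$J(G) = \max_G \operatorname{trace}\left( (G^T S_w G)^{-1} (G^T S_b G) \right)$$

$$S_w^{-1} S_b X = X \Lambda$$

G: leading eigenvectors of $S_w^{-1} S_b$ (columns of X)

rank(Sb) ≤ number of classes - 1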
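As a rough illustration of the eigenvalue formulation, the sketch below forms $S_w^{-1} S_b$ and keeps the eigenvectors of the largest eigenvalues. It assumes $S_w$ is nonsingular (more data than dimensions); `lda_transform` and the synthetic data are hypothetical names and values chosen for the example.

```python
import numpy as np

def lda_transform(A, labels):
    """Columns of G: eigenvectors of S_w^{-1} S_b for the r-1 largest eigenvalues."""
    classes = np.unique(labels)
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for cls in classes:
        A_i = A[:, labels == cls]
        c_i = A_i.mean(axis=1, keepdims=True)
        S_w += (A_i - c_i) @ (A_i - c_i).T
        S_b += A_i.shape[1] * (c_i - c) @ (c_i - c).T
    # Assumption: S_w is nonsingular, so S_w^{-1} S_b can be formed by a solve.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    return evecs[:, order].real, S_b

# Synthetic data: 3 classes, 30 points each, 5 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 30)
A = rng.normal(size=(5, 90)) + 3.0 * np.eye(5)[:, labels]

G, S_b = lda_transform(A, labels)
Y = G.T @ A                                   # reduced representation
print(Y.shape, np.linalg.matrix_rank(S_b, tol=1e-8))
```

Because the weighted centroid differences $n_i (c_i - c)$ sum to zero, $S_b$ has rank at most $r - 1$, so at most $r - 1$ eigenvalues are nonzero and the reduced dimension is at most the number of classes minus one.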

Page 14: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Face Recognition

92 x 112 image → 10304-dimensional vector → $G^T$ → ?

Dimension reduction to maximize the distances among classes.

Page 15: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Text Classification

• A bag of words: each document is represented with frequencies of words contained

Education: Faculty, Student, Syllabus, Grade, Tuition, …

Recreation: Movie, Music, Sport, Hollywood, Theater, …

GT

Page 16: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Generalized LDA Algorithms

$S_w^{-1} S_b x = \lambda x$ → $S_b x = \lambda S_w x$

• Undersampled problems: high dimensionality and a small number of data

$S_w$ is singular, so $S_w^{-1} S_b$ cannot be computed
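A short numerical check of why the undersampled case breaks classical LDA: within each class the centered data sum to zero, so rank($S_w$) is at most n - r, which is below the data dimension when the data are few. The sizes below (dimension 50, 12 data points, 3 classes) are arbitrary values chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n_per_class = 50, 3, 4          # dimension 50, only n = 12 data points
labels = np.repeat(np.arange(r), n_per_class)
A = rng.normal(size=(m, r * n_per_class))

S_w = np.zeros((m, m))
for cls in range(r):
    A_i = A[:, labels == cls]
    D = A_i - A_i.mean(axis=1, keepdims=True)   # centered class data
    S_w += D @ D.T

rank_Sw = np.linalg.matrix_rank(S_w, tol=1e-8)
print("rank(S_w) =", rank_Sw, "dimension =", m)
```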

Page 17: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Nonlinear Dimension Reduction based on Kernel Methods

Page 18: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Nonlinear Dimension Reduction

nonlinear mapping → linear dimension reduction $G^T$

$$(x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$$
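For this particular mapping, the inner product in the mapped space can be computed without forming the mapped vectors, since it equals the squared inner product in the original space. The check below is a small sketch; the names `phi` and `k` and the two test vectors are chosen only for illustration.

```python
import numpy as np

def phi(x):
    """Explicit nonlinear mapping (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def k(x, y):
    """Kernel that reproduces <phi(x), phi(y)>: the squared inner product."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
lhs = float(phi(x) @ phi(y))    # inner product in the 3-dimensional mapped space
rhs = k(x, y)                   # evaluated in the original 2-dimensional space
print(lhs, rhs)
```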

Page 19: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Kernel Method

• If a kernel function k(x,y) satisfies Mercer's condition, then there exists a mapping $\Phi$ for which $\langle \Phi(x), \Phi(y) \rangle = k(x, y)$ holds

$A \rightarrow \Phi(A)$, $\langle x, y \rangle \rightarrow \langle \Phi(x), \Phi(y) \rangle = k(x, y)$

• For a finite data set $A = [a_1, \dots, a_n]$, Mercer's condition can be rephrased as: the kernel matrix $K = [k(a_i, a_j)]_{1 \le i, j \le n}$ is positive semi-definite.
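The finite-data form of Mercer's condition is easy to test: build the kernel matrix and inspect its eigenvalues. The sketch below uses an inhomogeneous quadratic kernel as an example; `kernel_matrix` and the random data are assumptions made for the illustration.

```python
import numpy as np

def kernel_matrix(A, k):
    """K[i, j] = k(a_i, a_j) for the columns a_1, ..., a_n of A."""
    n = A.shape[1]
    return np.array([[k(A[:, i], A[:, j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 8))                       # 8 data points in 3 dimensions

K = kernel_matrix(A, lambda x, y: (x @ y + 1.0) ** 2)
eigvals = np.linalg.eigvalsh(K)                   # K is symmetric
print("smallest eigenvalue:", eigvals.min())
```

Up to rounding error, all eigenvalues should be nonnegative for a kernel that satisfies the condition.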

Page 20: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Nonlinear Dimension Reduction by Kernel Methods

Given a kernel function k(x,y): $\langle \Phi(x), \Phi(y) \rangle = k(x, y)$

$G^T$: linear dimension reduction

Page 21: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

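Positive Definite Kernel Functions

• Gaussian kernel

$$k(x, y) = \exp(-\|x - y\|^2 / \sigma)$$

• Polynomial kernel

$$k(x, y) = (\gamma_1 \langle x, y \rangle + \gamma_2)^d \quad (d > 0, \ \gamma_1, \gamma_2 \in \mathbb{R})$$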
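Both kernels can be evaluated for all pairs of data at once. The following is a minimal vectorized sketch with data stored as columns; the parameter names `sigma`, `gamma1`, `gamma2` and `d` are labels chosen for this example and may not match the symbols on the original slide.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """K[i, j] = exp(-||x_i - y_j||^2 / sigma) for columns x_i of X, y_j of Y."""
    sq = (X ** 2).sum(axis=0)[:, None] + (Y ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-np.maximum(sq, 0.0) / sigma)   # clip tiny negative round-off

def polynomial_kernel(X, Y, gamma1, gamma2, d):
    """K[i, j] = (gamma1 * <x_i, y_j> + gamma2)^d."""
    return (gamma1 * (X.T @ Y) + gamma2) ** d

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 6))                       # 6 data points in 3 dimensions
K_gauss = gaussian_kernel(A, A, sigma=2.0)
K_poly = polynomial_kernel(A, A, gamma1=1.0, gamma2=1.0, d=2)
print(K_gauss.shape, K_poly.shape)
```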

Page 22: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Nonlinear Discriminant Analysis using Kernel Methods

$\{a_1, a_2, \dots, a_n\} \rightarrow \{\Phi(a_1), \dots, \Phi(a_n)\}$

Want to apply LDA: $S_b x = \lambda S_w x$

$\langle \Phi(x), \Phi(y) \rangle = k(x, y)$

Page 23: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

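Nonlinear Discriminant Analysis using Kernel Methods

$\{a_1, a_2, \dots, a_n\} \rightarrow \{\Phi(a_1), \dots, \Phi(a_n)\}$: $S_b x = \lambda S_w x$

$$\left\{ \begin{bmatrix} k(a_1, a_1) \\ \vdots \\ k(a_n, a_1) \end{bmatrix}, \dots, \begin{bmatrix} k(a_1, a_n) \\ \vdots \\ k(a_n, a_n) \end{bmatrix} \right\}: \quad S_b u = \lambda S_w u$$

Apply Generalized LDA Algorithms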
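One way to read this step: each data point $a_j$ is replaced by the column of kernel values $[k(a_1, a_j), \dots, k(a_n, a_j)]^T$, and a generalized LDA algorithm is applied to those columns. The sketch below uses a regularized solve (adding a small multiple of the identity to $S_w$) rather than the GSVD-based algorithm of the talk; `kernel_rlda`, the Gaussian kernel parameter, the regularization value `mu` and the two-ring data are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    sq = (X ** 2).sum(axis=0)[:, None] + (Y ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-np.maximum(sq, 0.0) / sigma)

def kernel_rlda(A, labels, sigma=1.0, mu=1e-3):
    """Solve S_b u = lambda (S_w + mu I) u on the columns of the kernel matrix."""
    K = gaussian_kernel(A, A, sigma)              # column j represents a_j
    classes = np.unique(labels)
    n = K.shape[0]
    c = K.mean(axis=1, keepdims=True)
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for cls in classes:
        K_i = K[:, labels == cls]
        c_i = K_i.mean(axis=1, keepdims=True)
        S_w += (K_i - c_i) @ (K_i - c_i).T
        S_b += K_i.shape[1] * (c_i - c) @ (c_i - c).T
    # Reduce to a symmetric eigenproblem with the Cholesky factor of S_w + mu I.
    L = np.linalg.cholesky(S_w + mu * np.eye(n))
    M = np.linalg.solve(L, np.linalg.solve(L, S_b).T)     # L^{-1} S_b L^{-T}
    evals, V = np.linalg.eigh((M + M.T) / 2.0)            # ascending eigenvalues
    return np.linalg.solve(L.T, V[:, ::-1][:, : len(classes) - 1])

# Two concentric rings: not linearly separable in the original 2-D space.
rng = np.random.default_rng(0)
n_per = 40
angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n_per)
radii = np.repeat([1.0, 3.0], n_per) + 0.1 * rng.normal(size=2 * n_per)
A = np.vstack([radii * np.cos(angles), radii * np.sin(angles)])
labels = np.repeat([0, 1], n_per)

U = kernel_rlda(A, labels, sigma=1.0, mu=1e-3)
Y = U.T @ gaussian_kernel(A, A, 1.0)              # 1-dimensional representation
print(Y[0, labels == 0].mean(), Y[0, labels == 1].mean())
```

A new point x would be reduced the same way, as `U.T @ gaussian_kernel(A, x[:, None], sigma)`, so only kernel evaluations against the training data are needed.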

Page 24: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Generalized LDA Algorithms

$S_b x = \lambda S_w x$

Minimize trace($x^T S_w x$): $x^T S_w x = 0$, $x \in \operatorname{null}(S_w)$

Maximize trace($x^T S_b x$): $x^T S_b x \neq 0$, $x \in \operatorname{range}(S_b)$

Page 25: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Generalized LDA algorithms

RLDA
• Add a positive diagonal matrix $\alpha I$ to $S_w$ so that $S_w + \alpha I$ is nonsingular

LDA/GSVD
• Apply the generalized singular value decomposition (GSVD) to $\{H_w, H_b\}$ in $S_b = H_b H_b^T$ and $S_w = H_w H_w^T$

To-N(Sw)
• Projection to null space of $S_w$
• Maximize between-class scatter in the projected space
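The factors $H_w$, $H_b$ and the To-N(Sw) idea can be sketched as follows for an undersampled problem. This is an illustrative reading of the two bullets, not the authors' implementation: `scatter_factors`, `to_null_sw`, the tolerance and the random data are assumptions.

```python
import numpy as np

def scatter_factors(A, labels):
    """H_w and H_b with S_w = H_w H_w^T and S_b = H_b H_b^T."""
    c = A.mean(axis=1, keepdims=True)
    Hw, Hb = [], []
    for cls in np.unique(labels):
        A_i = A[:, labels == cls]
        c_i = A_i.mean(axis=1, keepdims=True)
        Hw.append(A_i - c_i)
        Hb.append(np.sqrt(A_i.shape[1]) * (c_i - c))
    return np.hstack(Hw), np.hstack(Hb)

def to_null_sw(H_w, H_b, tol=1e-10):
    """Project onto null(S_w), then maximize between-class scatter there."""
    U, s, _ = np.linalg.svd(H_w, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    N = U[:, rank:]                               # orthonormal basis of null(S_w)
    W, s_b, _ = np.linalg.svd(N.T @ H_b, full_matrices=False)
    keep = int(np.sum(s_b > tol * s_b.max()))     # directions with nonzero S_b
    return N @ W[:, :keep]

# Undersampled toy problem: dimension 30, 12 data points, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)
A = rng.normal(size=(30, 12))

H_w, H_b = scatter_factors(A, labels)
G = to_null_sw(H_w, H_b)
Y = G.T @ A
print(G.shape, np.trace(G.T @ H_w @ H_w.T @ G), np.trace(G.T @ H_b @ H_b.T @ G))
```

In the projected space the within-class scatter vanishes, so all training points of a class map to the same point, while the between-class scatter stays positive.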

Page 26: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Generalized LDA Algorithms

To-R(Sb)
• Transformation to range space of $S_b$
• Diagonalize within-class scatter matrix in the transformed space

To-NR(Sw)
• Reduce data dimension by PCA
• Maximize between-class scatter in range($S_w$) and null($S_w$)

Page 27: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Data sets

Data       dim   no. of data   no. of classes
Musk       166   6599          2
Isolet     617   7797          26
Car        6     1728          4
Mfeature   649   2000          10
Bcancer    9     699           2
Bscale     4     625           3

From Machine Learning Repository Database

Page 28: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Experimental Settings

Original data → Split → Training data, Test data

Dimension reducing: kernel function k and a linear transformation $G^T$

Predict class labels of test data using training data
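The experimental protocol can be sketched end to end on synthetic data. To keep the example short it uses a linear regularized LDA in place of a kernel method and a 1-nearest-neighbor classifier, which the slides do not specify; `rlda_fit`, `one_nn_predict`, the split sizes and the data are all assumptions.

```python
import numpy as np

def rlda_fit(A, labels, mu=1e-3):
    """Regularized LDA: leading eigenvectors of (S_w + mu I)^{-1} S_b."""
    classes = np.unique(labels)
    m = A.shape[0]
    c = A.mean(axis=1, keepdims=True)
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for cls in classes:
        A_i = A[:, labels == cls]
        c_i = A_i.mean(axis=1, keepdims=True)
        S_w += (A_i - c_i) @ (A_i - c_i).T
        S_b += A_i.shape[1] * (c_i - c) @ (c_i - c).T
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w + mu * np.eye(m), S_b))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    return evecs[:, order].real

def one_nn_predict(Y_train, labels_train, Y_test):
    """Label each test column with the label of its nearest training column."""
    d = ((Y_test[:, :, None] - Y_train[:, None, :]) ** 2).sum(axis=0)
    return labels_train[np.argmin(d, axis=1)]

# Original data (synthetic): 3 classes, 50 points each, 6 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
A = rng.normal(size=(6, 150)) + 6.0 * np.eye(6)[:, labels]

# Split into training and test data.
perm = rng.permutation(150)
train, test = perm[:100], perm[100:]

# Fit G on the training data only, then reduce both sets with G^T.
G = rlda_fit(A[:, train], labels[train])
pred = one_nn_predict(G.T @ A[:, train], labels[train], G.T @ A[:, test])
accuracy = np.mean(pred == labels[test])
print("prediction accuracy:", accuracy)
```

The transformation is estimated from the training data alone, so the test accuracy measures how well the reduced representation generalizes.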

Page 29: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

[Figure: prediction accuracies by method; each color represents a different data set]

Page 30: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Linear and Nonlinear Discriminant Analysis

Data sets

Page 31: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Face Recognition

Page 32: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Application of Nonlinear Discriminant Analysis to Fingerprint Classification

Page 33: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Fingerprint Classification

Left Loop, Right Loop, Whorl, Arch, Tented Arch

From NIST Fingerprint database 4

Page 34: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Previous Works in Fingerprint Classification

Feature representation:
• Minutiae
• Gabor filtering
• Directional partitioning

Apply Classifiers:
• Neural Networks
• Support Vector Machines
• Probabilistic NN

Our Approach
• Construct core directional images by DFT
• Dimension Reduction by Nonlinear Discriminant Analysis

Page 35: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Construction of Core Directional Images

Left Loop Right Loop Whorl

Page 36: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Construction of Core Directional Images

Core Point

Page 37: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Discrete Fourier transform (DFT)


Page 39: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Construction of Directional Images

Computation of local dominant directions by DFT and directional filtering → Core point detection → Reconstruction of core directional images

• Fast computation of DFT by FFT

• Reliable for low quality images

Page 40: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Computation of local dominant directions by DFT and directional filtering
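The slides do not spell out how the dominant direction is computed. One plausible sketch, under that caveat: take the 2-D FFT of a local block, find the strongest nonzero frequency, and take the ridge direction as perpendicular to that frequency vector. The function name `dominant_direction` and the synthetic block are hypothetical, and the directional filtering step is not shown.

```python
import numpy as np

def dominant_direction(block):
    """Estimate the ridge direction of an image block, in radians in [0, pi).

    Angles are measured in array coordinates (x = column index, y = row index).
    """
    F = np.fft.fft2(block - block.mean())
    F[0, 0] = 0.0                                   # ignore the DC component
    ky, kx = np.unravel_index(np.argmax(np.abs(F)), F.shape)
    fy = np.fft.fftfreq(block.shape[0])[ky]         # signed frequency along rows
    fx = np.fft.fftfreq(block.shape[1])[kx]         # signed frequency along columns
    normal = np.arctan2(fy, fx)                     # direction across the ridges
    return (normal + np.pi / 2.0) % np.pi           # direction along the ridges

# Synthetic 32 x 32 block of parallel ridges with frequency vector (4, 3) / 32.
yy, xx = np.mgrid[0:32, 0:32]
block = np.cos(2.0 * np.pi * (4 * xx + 3 * yy) / 32.0)
theta = dominant_direction(block)
print(theta)
```

The FFT keeps this cheap for every block of the image, which matches the "fast computation of DFT by FFT" point made in the talk.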

Page 41: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Construction of Directional Images

512 x 512 → 105 x 105

Page 42: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Nonlinear Discriminant Analysis

105 x 105 image (11025-dim. space) → $G^T$ → 4-dim. space

Classes: Left loop, Right loop, Whorl, Arch, Tented arch

Maximizing class separability in the reduced dimensional space

Page 43: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Comparison of Experimental Results

NIST Database 4, prediction accuracies (%)

Rejection rate (%)           0      1.8    8.5    20.0
Nonlinear LDA/GSVD           90.7   91.3   92.8   95.3
PCASYS +                     89.7   90.5   92.8   95.6
Jain et al. [1999, TPAMI]    -      90.0   91.2   93.5
Yao et al. [2003, PR]        -      90.0   92.2   95.6

Page 44: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Summary

• Nonlinear Feature Extraction based on Kernel Methods

- Nonlinear Discriminant Analysis

- Kernel Orthogonal Centroid Method (KOC)

• A comparison of Generalized Linear and Nonlinear Discriminant Analysis Algorithms

• Application to Fingerprint Classification

Page 45: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

• Dimension reduction (feature transformation): linear combination of original features

• Feature selection: select a part of the original features

- gene expression microarray data analysis: gene selection

• Visualization of high dimensional data

• Visual data mining

Page 46: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Core point detection

• $\theta_{i,j}$: dominant direction on the neighborhood centered at (i, j)

• Measure consistency of local dominant directions

$$\left| \sum_{i=-1}^{1} \sum_{j=-1}^{1} [\cos(2\theta_{i,j}), \sin(2\theta_{i,j})] \right|$$

i.e., the distance from the starting point to the finishing point when the nine unit vectors are placed head to tail

• the lowest value -> Core point
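The consistency measure can be sketched directly from the formula: sum the doubled-angle unit vectors over each 3 x 3 neighborhood and take the length of the sum. Doubling the angle makes directions that differ by π count as identical. The function names and the synthetic direction field (a single singular point at row 7, column 9) are assumptions for the example.

```python
import numpy as np

def consistency_map(theta):
    """|sum over the 3x3 neighborhood of [cos(2 theta), sin(2 theta)]| per pixel."""
    c, s = np.cos(2.0 * theta), np.sin(2.0 * theta)
    h, w = theta.shape
    sum_c = np.zeros((h - 2, w - 2))
    sum_s = np.zeros((h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            sum_c += c[di:di + h - 2, dj:dj + w - 2]
            sum_s += s[di:di + h - 2, dj:dj + w - 2]
    out = np.full((h, w), np.inf)                  # border pixels are never chosen
    out[1:-1, 1:-1] = np.hypot(sum_c, sum_s)
    return out

def detect_core(theta):
    """Return (row, col) of the lowest consistency value."""
    m = consistency_map(theta)
    row, col = np.unravel_index(np.argmin(m), m.shape)
    return int(row), int(col)

# Synthetic direction field whose directions rotate around the point (7, 9).
yy, xx = np.mgrid[0:15, 0:15]
theta = 0.5 * np.arctan2(yy - 7, xx - 9)
core = detect_core(theta)
print(core)
```

Where the directions agree the nine vectors add up to a length near 9, while around a singular point they largely cancel, which is why the lowest value marks the core.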

Page 47: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

References

• L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000

• P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003

• H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34:2067-2070, 2001

• J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003

• H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003

• S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, J. Larsen and S. Douglas, pp. 41-48, IEEE, 1999

• B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998

• G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000

• V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000

Page 48: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

• S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002

• C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1):98-102, 2005

• A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999

• Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003

• C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810

• C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear

• C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006