Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

Haesun Park, Georgia Institute of Technology, Atlanta, GA, USA (joint work with C. Park)
KAIST, Korea, June 2007



  • Clustering

  • Clustering: grouping of data based on similarity measures

  • Classification: assign a class label to new, unseen data

  • Data Mining: mining or discovery of new information (patterns or rules) from large databases

    Data preparation: preprocessing, dimension reduction (feature selection, feature extraction), data reduction

    Tasks: classification, clustering, association analysis, regression, probabilistic modeling

  • Feature Extraction: optimal feature extraction
    - Reduce the dimensionality of the data space
    - Minimize the effects of redundant features and noise (curse of dimensionality)

    Then apply a classifier to predict the class label of new data

  • Linear dimension reduction: maximize class separability in the reduced-dimensional space

  • What if the data is not linearly separable? Nonlinear dimension reduction

  • Contents

    Linear Discriminant Analysis

    Nonlinear Dimension Reduction based on Kernel Methods - Nonlinear Discriminant Analysis

    Application to Fingerprint Classification

  • Linear Discriminant Analysis (LDA)

    For a given data set {a1, …, an} with class centroids cj and global centroid c:

    Within-class scatter matrix: Sw = Σj Σ_{ai in class j} (ai − cj)(ai − cj)T; trace(Sw) measures the within-class spread

  • Between-class scatter matrix: Sb = Σj nj (cj − c)(cj − c)T; trace(Sb) measures the between-class spread

    Find GT that maps a1, …, an to GTa1, …, GTan, minimizing trace(GTSwG) and maximizing trace(GTSbG)

  • Eigenvalue problem: Sw-1Sb X = X Λ; G is formed from the leading eigenvectors. Since rank(Sb) ≤ number of classes − 1, at most (number of classes − 1) discriminant directions are obtained.
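As a concrete illustration of the scatter matrices and the eigenproblem above, here is a minimal sketch in Python (NumPy/SciPy), not the presentation's own code. The function name and the one-sample-per-column layout are my choices, and it assumes Sw is nonsingular (enough samples relative to the dimension):

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(A, labels, dim):
    # A: d x n data matrix, one sample per column (hypothetical layout).
    d, n = A.shape
    c = A.mean(axis=1, keepdims=True)               # global centroid
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for k in np.unique(labels):
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)         # class centroid
        Sw += (Ak - ck) @ (Ak - ck).T               # within-class scatter
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T   # between-class scatter
    # Generalized eigenproblem Sb x = lambda Sw x; assumes Sw nonsingular.
    w, V = eigh(Sb, Sw)
    # Keep the 'dim' leading eigenvectors (dim <= number of classes - 1).
    return V[:, np.argsort(w)[::-1][:dim]]
```

For undersampled problems Sw is singular and this direct solve fails, which is exactly what motivates the generalized LDA algorithms on the later slides.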

  • Face Recognition: 92 × 112 images (10304-dimensional space); find GT for dimension reduction that maximizes the distances among classes

  • Text Classification: a bag of words; each document is represented by the frequencies of the words it contains

    Example classes: Education (faculty, student, syllabus, grade, tuition, …) vs. Recreation (movie, music, sport, Hollywood, theater, …)

  • Generalized LDA Algorithms. Undersampled problems: high dimensionality and a small number of data points make Sw singular, so Sw-1Sb cannot be computed.

  • Nonlinear Dimension Reduction based on Kernel Methods

  • Nonlinear dimension reduction: apply a nonlinear mapping, then linear dimension reduction GT in the mapped space

  • Kernel Method: if a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Φ for which < Φ(x), Φ(y) > = k(x, y) holds. For a finite data set A = [a1, …, an], Mercer's condition can be rephrased as: the kernel matrix [k(ai, aj)] is positive semi-definite.

  • Nonlinear dimension reduction by kernel methods: given a kernel function k(x, y), perform linear dimension reduction GT implicitly in the feature space

  • Positive Definite Kernel Functions

    Gaussian kernel: k(x, y) = exp(−||x − y||^2 / (2σ^2))

    Polynomial kernel: k(x, y) = (< x, y > + c)^d
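A hedged sketch of these two kernels (function names and the 2σ² parameterization are my assumptions; samples are stored as columns):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); samples are columns.
    sq = (np.sum(A**2, axis=0)[:, None] + np.sum(B**2, axis=0)[None, :]
          - 2 * A.T @ B)
    return np.exp(-sq / (2 * sigma**2))

def polynomial_kernel(A, B, degree=2, c=1.0):
    # k(x, y) = (<x, y> + c)^degree; samples are columns.
    return (A.T @ B + c) ** degree
```

On a finite data set, Mercer's condition amounts to the kernel matrix K = gaussian_kernel(A, A) being positive semi-definite, which can be checked empirically via `np.linalg.eigvalsh(K).min() >= -tol`.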

  • Nonlinear Discriminant Analysis using Kernel Methods

    Map {a1, a2, …, an} to {Φ(a1), …, Φ(an)}, where < Φ(x), Φ(y) > = k(x, y); we want to apply LDA in the mapped space: SbΦ x = λ SwΦ x

  • Using the kernel matrix with entries k(a1, a1), …, k(an, an), the feature-space problem SbΦ x = λ SwΦ x is rewritten as an n-dimensional eigenproblem Sb u = λ Sw u in the expansion coefficients u; then apply the generalized LDA algorithms
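One way to make the kernel-space eigenproblem concrete is the formulation of Mika et al. (cited in the references), where both scatter matrices are built from the n × n kernel matrix. This is my hypothetical sketch, not the presentation's code; a small ridge eps is added because the within-class matrix N is typically singular:

```python
import numpy as np
from scipy.linalg import eigh

def kda(K, labels, dim, eps=1e-6):
    # Kernel discriminant analysis on a precomputed kernel matrix K.
    # Between- and within-class scatter are expressed through K, giving
    # an n-dimensional generalized eigenproblem M u = lambda N u for the
    # expansion coefficients u of the discriminant directions.
    n = K.shape[0]
    m = K.mean(axis=1, keepdims=True)               # overall mean vector
    M = np.zeros((n, n))
    N = np.zeros((n, n))
    for k in np.unique(labels):
        idx = labels == k
        nk = int(idx.sum())
        mk = K[:, idx].mean(axis=1, keepdims=True)  # class mean vector
        M += nk * (mk - m) @ (mk - m).T             # between-class term
        Kk = K[:, idx]
        N += Kk @ (np.eye(nk) - 1.0 / nk) @ Kk.T    # within-class term
    w, U = eigh(M, N + eps * np.eye(n))             # ridge for singular N
    return U[:, np.argsort(w)[::-1][:dim]]
```

A new point x is then projected as Σi ui k(ai, x); the training points project as U.T @ K.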

  • Generalized LDA Algorithms

    Minimize trace(xT Sw x): xT Sw x = 0 ⇔ x ∈ null(Sw)

    Maximize trace(xT Sb x): xT Sb x > 0 for x ∈ range(Sb)

  • Generalized LDA algorithms

    RLDA: add a positive diagonal matrix εI to Sw so that Sw + εI is nonsingular

    LDA/GSVD: apply the generalized singular value decomposition (GSVD) to {Hw, Hb}, where Sb = Hb HbT and Sw = Hw HwT

    To-N(Sw): project to the null space of Sw, then maximize the between-class scatter in the projected space
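The regularization idea can be illustrated directly. This minimal sketch is my own (the function name and the arbitrary ε value are assumptions): adding εI makes the generalized eigenproblem solvable even when Sw is singular, as in undersampled problems:

```python
import numpy as np
from scipy.linalg import eigh

def rlda(Sw, Sb, dim, eps=1e-3):
    # RLDA sketch: Sw may be singular in undersampled problems, so solve
    # Sb x = lambda (Sw + eps*I) x instead; eps > 0 is a regularization
    # parameter (the value here is arbitrary).
    d = Sw.shape[0]
    w, V = eigh(Sb, Sw + eps * np.eye(d))
    return V[:, np.argsort(w)[::-1][:dim]]
```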

  • Generalized LDA algorithms (continued)

    To-R(Sb): transform to the range space of Sb, then diagonalize the within-class scatter matrix in the transformed space

    To-NR(Sw): reduce the data dimension by PCA, then maximize the between-class scatter in range(Sw) and null(Sw)

  • Data sets (from the Machine Learning Repository)

    Data      dim   no. of data   no. of classes
    Musk      166   6599          2
    Isolet    617   7797          26
    Car       6     1728          4
    Mfeature  649   2000          10
    Bcancer   9     699           2
    Bscale    4     625           3

  • Experimental Settings: split the original data into training and test data; choose a kernel function k and a linear transformation GT for dimension reduction; predict the class labels of the test data using the training data

  • Prediction accuracies by method (each color represents a different data set)

  • Linear and nonlinear discriminant analysis: results on the data sets

  • Face Recognition

  • Application of Nonlinear Discriminant Analysis to Fingerprint Classification

  • Fingerprint Classification: five classes (left loop, right loop, whorl, arch, tented arch); images from NIST Fingerprint Database 4

  • Previous Works in Fingerprint Classification

    Feature representation: minutiae, Gabor filtering, directional partitioning

    Classifiers applied: neural networks, support vector machines, probabilistic NN

    Our approach: construct core directional images by DFT, then reduce dimension by nonlinear discriminant analysis

  • Construction of core directional images (examples: left loop, right loop, whorl)

  • Construction of core directional images: locating the core point

  • Discrete Fourier transform (DFT)

  • Construction of Directional Images

    - Computation of local dominant directions by DFT and directional filtering
    - Core point detection
    - Reconstruction of core directional images

    Fast computation of the DFT via the FFT; reliable for low-quality images

  • Computation of local dominant directions by DFT and directional filtering
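The presentation's actual scheme combines the DFT with directional filtering; the following is only my hypothetical sketch of the spectral-peak idea, estimating a block's dominant ridge orientation from the peak of its 2D Fourier magnitude spectrum:

```python
import numpy as np

def dominant_direction(block):
    # Estimate the dominant ridge orientation (in [0, pi)) of a local
    # image block from the peak of its 2D Fourier magnitude spectrum.
    F = np.fft.fftshift(np.abs(np.fft.fft2(block - block.mean())))
    h, w = F.shape
    cy, cx = h // 2, w // 2
    F[cy, cx] = 0                       # ignore the DC component
    py, px = np.unravel_index(np.argmax(F), F.shape)
    # Ridges run perpendicular to the dominant spatial frequency.
    return (np.arctan2(py - cy, px - cx) + np.pi / 2) % np.pi
```

For example, a block of horizontal stripes (intensity varying along y) has its spectral peak on the vertical frequency axis, so the returned orientation is 0 (ridges along x).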

  • Construction of directional images: a 512 × 512 fingerprint image yields a 105 × 105 directional image

  • Nonlinear discriminant analysis: 105 × 105 directional images (11025-dimensional space) are mapped by GT to a 4-dimensional space, maximizing class separability among the five classes (left loop, right loop, whorl, tented arch, arch)

  • Comparison of Experimental Results on NIST Database 4: prediction accuracies (%)

    Rejection rate (%)         0     1.8   8.5   20.0
    Nonlinear LDA/GSVD         90.7  91.3  92.8  95.3
    PCASYS+                    89.7  90.5  92.8  95.6
    Jain et al. [1999, TPAMI]  -     90.0  91.2  93.5
    Yao et al. [2003, PR]      -     90.0  92.2  95.6

  • Summary

    Nonlinear feature extraction based on kernel methods: nonlinear discriminant analysis, kernel orthogonal centroid method (KOC)

    A comparison of generalized linear and nonlinear discriminant analysis algorithms

    Application to fingerprint classification

  • Dimension reduction

    Feature transformation: linear combinations of the original features

    Feature selection: select a subset of the original features (e.g., gene selection in gene expression microarray data analysis)

    Visualization of high-dimensional data; visual data mining

  • Core point detection

    θi,j: dominant direction on the neighborhood centered at (i, j)

    Measure the consistency of the local dominant directions by | Σ_{Δi,Δj = −1,0,1} [cos(2θi+Δi,j+Δj), sin(2θi+Δi,j+Δj)] |, the distance from the starting point to the finishing point of the chained direction vectors

    The position with the lowest value is taken as the core point
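The consistency measure on this slide can be sketched as follows; the 3 × 3 neighborhood, the field layout, and the helper names are my assumptions:

```python
import numpy as np

def direction_consistency(theta, i, j):
    # Consistency of the local dominant directions theta (radians) on the
    # 3x3 neighborhood of (i, j): the norm of the summed doubled-angle
    # vectors [cos(2*theta), sin(2*theta)]. A low value means the
    # directions nearly cancel, which indicates a core point.
    patch = theta[i-1:i+2, j-1:j+2]
    v = np.array([np.cos(2 * patch).sum(), np.sin(2 * patch).sum()])
    return np.linalg.norm(v)

def detect_core(theta):
    # Return the interior pixel with the lowest consistency value.
    h, w = theta.shape
    best, pos = np.inf, None
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            c = direction_consistency(theta, i, j)
            if c < best:
                best, pos = c, (i, j)
    return pos
```

In a region of uniform direction the nine doubled-angle vectors align and the norm is large (9 for unit vectors); around a core point the directions spread over all orientations and the sum nearly cancels.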

  • References

    L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000
    P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003
    H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34:2067-2070, 2001
    J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003
    H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003
    S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, pp. 41-48, IEEE, 1999
    B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998
    G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000
    V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000
    S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002
    C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1):98-102, 2005
    A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999
    Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003
    C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810
    C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear
    C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006