
CCA with Its Applications (Part II)

Chen Songcan (陈松灿), [email protected]

http://parnec.nuaa.edu.cn

Nanjing University of Aeronautics & Astronautics

MLA'06, Nanjing

CCA Illustration

Given a set X = {x_1, …, x_n} ⊂ R^p and a set Y = {y_1, …, y_n} ⊂ R^q, CCA aims to simultaneously seek w_x ∈ R^p and w_y ∈ R^q that maximize the correlation between the projections w_x^T x and w_y^T y [Bor99] (used especially for feature extraction).

CCA formulation

$$\max_{w_x, w_y} \frac{\operatorname{cov}\left(w_x^T x,\; w_y^T y\right)}{\sqrt{\operatorname{var}\left(w_x^T x\right)\,\operatorname{var}\left(w_y^T y\right)}} \;=\; \max_{w_x, w_y} \frac{\frac{1}{n}\sum_{i=1}^{n} w_x^T x_i\, y_i^T w_y}{\sqrt{\left(\frac{1}{n}\sum_{i=1}^{n} w_x^T x_i x_i^T w_x\right)\left(\frac{1}{n}\sum_{i=1}^{n} w_y^T y_i y_i^T w_y\right)}}$$

$$\Rightarrow\; \begin{pmatrix} 0 & XY^T \\ YX^T & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} XX^T & 0 \\ 0 & YY^T \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

where $X = [x_1, \ldots, x_n]$, $Y = [y_1, \ldots, y_n]$ (samples assumed centered).
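To make the eigenproblem concrete, here is a minimal NumPy sketch (not from the slides; the function name `solve_cca` and the small ridge `eps` are our own choices for numerical stability):

```python
# Minimal linear CCA via the generalized eigenproblem above.
import numpy as np
from scipy.linalg import eigh

def solve_cca(X, Y, eps=1e-8):
    """X: p x n, Y: q x n; columns are paired samples."""
    p, n = X.shape
    q = Y.shape[0]
    Xc = X - X.mean(axis=1, keepdims=True)   # center each view
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxy = Xc @ Yc.T / n
    Cxx = Xc @ Xc.T / n + eps * np.eye(p)    # small ridge for stability
    Cyy = Yc @ Yc.T / n + eps * np.eye(q)
    # Assemble the block matrices of the generalized eigenproblem
    A = np.block([[np.zeros((p, p)), Cxy],
                  [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))],
                  [np.zeros((q, p)), Cyy]])
    lam, W = eigh(A, B)                      # generalized symmetric eigensolver
    w = W[:, -1]                             # top eigenvalue = top correlation
    return w[:p], w[p:], lam[-1]
```

The top generalized eigenvalue equals the largest canonical correlation, and the corresponding eigenvector splits into the pair (w_x, w_y).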

CCA Alternative formulation

$$\max_{w_x, w_y} \; \sum_{i=1}^{n}\sum_{j=1}^{n} w_x^T (x_i - x_j)(y_i - y_j)^T w_y$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\sum_{j=1}^{n} w_x^T (x_i - x_j)(x_i - x_j)^T w_x = 1, \qquad \sum_{i=1}^{n}\sum_{j=1}^{n} w_y^T (y_i - y_j)(y_i - y_j)^T w_y = 1$$

$$\min_{w_x, w_y} \; \sum_{i=1}^{n} \left( w_x^T (x_i - \bar{x}) - w_y^T (y_i - \bar{y}) \right)^2$$

$$\text{s.t.}\quad \sum_{i=1}^{n} \left( w_x^T (x_i - \bar{x}) \right)^2 = 1, \qquad \sum_{i=1}^{n} \left( w_y^T (y_i - \bar{y}) \right)^2 = 1$$

Basis of our LPCCA
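The pairwise-difference form above matches standard CCA because of the identity $\sum_{ij} (x_i - x_j)(y_i - y_j)^T = 2n \sum_i (x_i - \bar{x})(y_i - \bar{y})^T$. A quick numerical check (illustrative code, not from the slides):

```python
# Verify that the pairwise-difference numerator equals 2n times the centered
# cross-covariance sum, which is why the two CCA formulations are equivalent.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
X, Y = rng.normal(size=(p, n)), rng.normal(size=(q, n))

pairwise = sum((X[:, [i]] - X[:, [j]]) @ (Y[:, [i]] - Y[:, [j]]).T
               for i in range(n) for j in range(n))
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
assert np.allclose(pairwise, 2 * n * Xc @ Yc.T)
```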

Applications

- image retrieval [HSS04], image segmentation [LGD05], image de-noising [He04], image analysis [Nie02], feature detection in images [And00]
- text categorization, text mining, text retrieval, image-text retrieval, machine translation [For04]
- computer vision [KSE05]
- pattern recognition [SZL05]
- biometrics [YVN03]
- signal processing [SS06]
- location estimation in wireless sensor networks [PKY06]
- facial expression recognition [ZZZ06]
- Multi-view SVM design [FHM05]

New Applications

Sensor signal vs. location

location estimation in wireless sensor networks

[PKY06] J.J. Pan, J.T. Kwok, Q. Yang, Y. Chen, Multidimensional Vector Regression for Accurate and Low-Cost Location Estimation in Pervasive Computing, IEEE Transactions on Knowledge and Data Engineering, 18(2006):1181-1193.

New Applications: facial expression recognition (1: happiness, sadness, surprise, anger, disgust, 6: fear)

Landmark points vs. Semantic ratings (estimation)(JAFFE database)

[ZZZ06] W. Zheng, X. Zhou, C. Zou, L. Zhao, Facial Expression Recognition Using Kernel Canonical Correlation Analysis (KCCA), IEEE Transactions on Neural Networks, 17(2006):233-238.

New Applications: Multi-view SVM design

[FHM05] J. D.R. Farquhar, D. R. Hardoon, H. Meng, J. Shawe-Taylor, S. Szedmak, Two view learning: SVM-2K, Theory and Practice, NIPS 2005.


Our previous work

- Locality preserving CCA (LPCCA)
- Kernelized LPCCA
- Their applications in pose estimation and data visualization

Our previous work - LPCCA

Motive: CCA is linear and is insufficient to reveal the nonlinear correlation between two sets X and Y of real-world variables.

Objective:

$$\max_{w_x, w_y} \; \sum_{i=1}^{n}\sum_{j=1}^{n} S^x_{ij} S^y_{ij} \cdot w_x^T (x_i - x_j)(y_i - y_j)^T w_y$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\sum_{j=1}^{n} \left(S^x_{ij}\right)^2 w_x^T (x_i - x_j)(x_i - x_j)^T w_x = 1, \qquad \sum_{i=1}^{n}\sum_{j=1}^{n} \left(S^y_{ij}\right)^2 w_y^T (y_i - y_j)(y_i - y_j)^T w_y = 1$$

$$\Rightarrow\; \begin{pmatrix} 0 & X S_{xy} Y^T \\ Y S_{yx} X^T & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} X S_{xx} X^T & 0 \\ 0 & Y S_{yy} Y^T \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

where $S^x_{ij}$ and $S^y_{ij}$ are locality (similarity) weights in the X and Y views.

[SC06] Tingkai Sun, Songcan Chen, Locality preserving CCA with applications to data visualization and pose estimation, Image and Vision Computing, in press, 2006.
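A rough sketch of the above, assuming binary kNN graphs for the similarity weights and the squared-weight constraints as reconstructed; the exact weighting scheme is in [SC06], and all helper names here are our own:

```python
# Rough LPCCA sketch: locally weighted difference scatters + generalized eig.
import numpy as np
from scipy.linalg import eigh

def knn_similarity(X, k=5):
    """Binary symmetric kNN affinity over the columns of X (one simple choice)."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(d2[i])[1:k + 1]] = 1.0   # skip self at index 0
    return np.maximum(S, S.T)

def weighted_scatter(X, Y, W):
    """sum_ij W_ij (x_i - x_j)(y_i - y_j)^T = 2 X (D - W) Y^T for symmetric W."""
    L = np.diag(W.sum(axis=1)) - W
    return 2 * X @ L @ Y.T

def lpcca(X, Y, k=5, eps=1e-8):
    Sx, Sy = knn_similarity(X, k), knn_similarity(Y, k)
    Cxy = weighted_scatter(X, Y, Sx * Sy)
    Cxx = weighted_scatter(X, X, Sx ** 2) + eps * np.eye(X.shape[0])
    Cyy = weighted_scatter(Y, Y, Sy ** 2) + eps * np.eye(Y.shape[0])
    p, q = X.shape[0], Y.shape[0]
    A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
    lam, W_ = eigh(A, B)
    return W_[:p, -1], W_[p:, -1]   # top locally-weighted canonical pair
```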

Our previous work

[Figures: pose-estimation results and error distribution histograms for CCA, KCCA, and LPCCA.]

KLPCCA, TNN, in revision.

Our new work

- CIPCA - Class-information-incorporated PCA
- DCCA - Discriminant CCA
- DCCAM - DCCA with missing samples
- Multi-Kernel learning
- Beyond KCCA

CIPCA

Motive:
1) utilize the class information for feature extraction
2) CIPCA and CCA can be unified into one framework

Objective:

$$z = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^{D+c}$$

$$\max_{W} \operatorname{tr}\left(W^T C_{zz} W\right) \;\Rightarrow\; \hat{y} = W_Y W_X^{+}\, x$$

$$\max_{w} \frac{w^T C_{zz} w}{w^T D w}, \quad\text{where } D = \begin{pmatrix} XX^T & 0 \\ 0 & YY^T \end{pmatrix} \text{ for CCA, and } D = I \text{ for CIPCA}$$

[CS05] Songcan Chen, Tingkai Sun, Class-information-incorporated principal component analysis, Neurocomputing, 69 (2005) 216-223.
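Under the unified view, the CIPCA case (D = I) is just PCA on the concatenated vector z. A minimal sketch, assuming a one-hot class encoding for y (the slides do not fix the encoding):

```python
# Minimal CIPCA sketch: PCA on z = [x; y] with a one-hot class encoding.
import numpy as np

def cipca(X, labels, d=2):
    """X: D x n data matrix; labels: length-n integer class labels (0..c-1)."""
    labels = np.asarray(labels)
    D, n = X.shape
    c = labels.max() + 1
    Y = np.eye(c)[:, labels]                 # c x n one-hot class encoding
    Z = np.vstack([X, Y])                    # concatenated vector z = [x; y]
    Zc = Z - Z.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(Zc @ Zc.T / n)
    W = vecs[:, ::-1][:, :d]                 # top-d eigenvectors (D = I case)
    Wx, Wy = W[:D], W[D:]                    # split back into x- and y-parts
    # Per our reading of the garbled slide, the class part of a new x can be
    # estimated as y_hat = Wy @ np.linalg.pinv(Wx) @ x.
    return Wx, Wy
```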

CIPCA

Results

Our new work

- CIPCA
- DCCA
- DCCAM
- Multi-Kernel learning
- Beyond KCCA

DCCA - modified feature extractor

Motive:
1) for feature fusion and multimodal recognition

Applications: Multiple Biometrics [JRS04]

DCCA

Motive:
2) in CCA, the correlation between $(x_i, y_i)$ is insufficient to discriminate between classes
3) in DCCA, we take correlation as the similarity metric and aim to maximize the within-class correlation and minimize the between-class correlation

Consider the correlation between classes.

DCCA

- maximize the within-class correlation
- minimize the between-class correlation

Objective function:

$$\max_{w_x, w_y} \frac{w_x^T \left( C_w - \eta\, C_b \right) w_y}{\left( w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y \right)^{1/2}}$$

where $\eta$ is a tunable parameter.

DCCA

Within-class correlation:

$$C_w = \sum_{i=1}^{c} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} x_k^{(i)} \left( y_l^{(i)} \right)^T = X A Y^T$$

Between-class correlation:

$$C_b = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \neq i}}^{c} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} x_k^{(i)} \left( y_l^{(j)} \right)^T = X \left( 1_n 1_n^T - A \right) Y^T = -X A Y^T$$

(the last step uses $X 1_n = 0$ for centered data), where

$$A = \begin{pmatrix} 1_{n_1 \times n_1} & & \\ & \ddots & \\ & & 1_{n_c \times n_c} \end{pmatrix} \in \mathbb{R}^{n \times n}$$

is a block-diagonal matrix of size n-by-n.

DCCA

Objective function:

$$\max_{w_x, w_y} \frac{w_x^T \left( C_w - \eta\, C_b \right) w_y}{\left( w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y \right)^{1/2}} = \max_{w_x, w_y} \frac{(1 + \eta)\, w_x^T C_w w_y}{\left( w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y \right)^{1/2}}$$

(using $C_b = -C_w$, so the solution is independent of $\eta$)

$$\Rightarrow\; \begin{pmatrix} 0 & X A Y^T \\ Y A X^T & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} XX^T & 0 \\ 0 & YY^T \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$
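A compact sketch of this eigenproblem, assuming the columns of X and Y are centered and sorted by class so that A is the block-diagonal all-ones matrix above (the function name, the `class_sizes` interface, and the ridge `eps` are our own):

```python
# Sketch of the DCCA generalized eigenproblem above.
import numpy as np
from scipy.linalg import block_diag, eigh

def dcca(X, Y, class_sizes, eps=1e-8):
    """X: p x n, Y: q x n paired views; class_sizes: [n_1, ..., n_c]."""
    p, q, n = X.shape[0], Y.shape[0], X.shape[1]
    A = block_diag(*[np.ones((ni, ni)) for ni in class_sizes])  # n x n
    Cw = X @ A @ Y.T                       # within-class correlation, X A Y^T
    Cxx = X @ X.T + eps * np.eye(p)
    Cyy = Y @ Y.T + eps * np.eye(q)
    M = np.block([[np.zeros((p, p)), Cw], [Cw.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
    lam, W = eigh(M, B)
    return W[:p, -1], W[p:, -1]            # top discriminant pair (wx, wy)
```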

DCCA Results on toy problem

[Figure: toy data and the projections found by CCA, PLS, and DCCA.]

The toy data are generated as

$$y_i = W^T x_i + b + \varepsilon_i$$

where $W \in \mathbb{R}^{2 \times 2}$, $b = [1, 1]^T$, and $\varepsilon_i$ is the imposed Gaussian noise.
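For reference, an illustrative generator for such toy data; the slide's exact W entries are not recoverable from this copy, so the values below are assumed:

```python
# Illustrative toy-data generator for y_i = W^T x_i + b + eps_i.
import numpy as np

rng = np.random.default_rng(0)
n = 200
W = np.array([[0.6, -0.8],
              [0.8,  0.6]])               # assumed 2x2 matrix (not from slide)
b = np.array([1.0, 1.0])                  # b = [1, 1]^T, as on the slide
X = rng.normal(size=(2, n))
Y = W.T @ X + b[:, None] + 0.1 * rng.normal(size=(2, n))  # Gaussian noise eps_i
```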

DCCA Results on real datasets

Hypertext categorization

(WebKB dataset)

DCCA Results: Handwritten numeral recognition (Multiple Feature database)

The work on DCCA has been submitted.

http://www.ics.uci.edu/~mlearn/MLSummary.html

Our new work

- CIPCA
- DCCA
- DCCAM
- Multi-Kernel learning
- Beyond KCCA

DCCAM

Motive:
1) deal with missing samples due to sensor failure, high cost, etc.
2) "missing samples" means "not pairwise" for X and Y
3) differs from "missing data" (or "incomplete data")

[Figure: missing data vs. missing samples.]

DCCAM

Usual strategies for missing data (a simple sketch of strategy 2 follows this list):
1) expectation maximization [DHS00]
2) substitution of the class mean for missing data
3) iterative estimation of the missing data [WM01]
4) local least squares imputation [KGP05]
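For contrast with DCCAM's direct approach, a toy sketch of strategy 2, class-mean substitution (function name and interface are our own):

```python
# Toy class-mean substitution for missing samples.
import numpy as np

def impute_class_mean(X, labels, missing_mask):
    """Replace columns of X flagged in the boolean missing_mask by their class mean."""
    X = X.copy()
    labels = np.asarray(labels)
    for c in np.unique(labels):
        cls = labels == c
        mean_c = X[:, cls & ~missing_mask].mean(axis=1)  # mean of observed samples
        X[:, cls & missing_mask] = mean_c[:, None]
    return X
```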

Our solution: directly deal with the training set with missing samples.

DCCAM

Objective:

$$\max_{w_x, w_y} \frac{w_x^T \left( C_w - \eta\, C_b \right) w_y}{\left( w_x^T C_{xx} w_x \cdot w_y^T C_{yy} w_y \right)^{1/2}} \;\Rightarrow\; \begin{pmatrix} 0 & X A Y^T \\ Y A X^T & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} XX^T & 0 \\ 0 & YY^T \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

where

$$A = \begin{pmatrix} 1_{m_1 \times n_1} & & \\ & \ddots & \\ & & 1_{m_c \times n_c} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

is a block-diagonal matrix of size m-by-n.
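Constructing this rectangular A is straightforward; a small sketch with assumed per-class counts (class i keeps m_i samples in X and n_i in Y):

```python
# Block-diagonal all-ones A of size m x n for DCCAM's un-pairwise views.
import numpy as np
from scipy.linalg import block_diag

def dccam_A(x_class_sizes, y_class_sizes):
    """Blocks of size m_i x n_i, one per class."""
    blocks = [np.ones((mi, ni)) for mi, ni in zip(x_class_sizes, y_class_sizes)]
    return block_diag(*blocks)

# e.g., 3 classes: X keeps [9, 8, 7] samples and Y keeps [10, 9, 10]
A = dccam_A([9, 8, 7], [10, 9, 10])       # shape (24, 29): m-by-n with m != n
```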

DCCAM Results on Toy problem

[Figure: (a) data, (b) CCA, (c) PLS, (d) DCCAM.]

Circled samples are missing ones. For (b) CCA and (c) PLS, the un-pairwise samples are deleted.

DCCAM Results on Real datasets

Multiple Feature Database (handwritten numerals), 10 classes, 100 pairs per class. For each class, we

1) delete 10 samples from X, and delete 10 samples from Y.

2) delete 20 samples from X, and delete 20 samples from Y.

3) delete 30 samples from X, and delete 30 samples from Y.

4) delete 40 samples from X, and delete 40 samples from Y.

5) delete 50 samples from X, and delete 50 samples from Y.

6) delete 60 samples from X, and delete 60 samples from Y.

Note: the deleted samples are NOT pairwise.

[Figure: missing samples and the resulting un-pairwise samples.]

DCCAM Results on Real datasets

Adjustment of the comparative algorithms. For CCA and PLS, the algorithms are adjusted to match the case of "missing samples":

1) delete the remaining un-pairwise samples (named CCA-I, PLS-I) and DCCA-I.

2) substitution of the class mean vector for missing samples (named CCA-II, PLS-II) and DCCA-II.

3) iterative estimation of the missing samples (named CCA-III, PLS-III).

Note that DCCA-I and -II are just for comparison, and no DCCA-III is introduced.

DCCAM Results on Real datasets

X-axis: the number of missing samples per class, i.e., 10, 20, …, 50, 60.

Y-axis: the recognition accuracy.


The work on DCCAM has been submitted.

DCCAM advantages

- Supervised learning method, due to the incorporation of class information
- Tolerance to missing samples
- Superior performance to some multimodal recognition algorithms
- Direct processing of missing samples, without resorting to artificially compensating for them
- Relative insensitivity to the number of missing samples
- Only one tunable parameter, d, the final dimensionality of the extracted features, which makes it easy to manipulate in real applications

Our new work

- CIPCA
- DCCA
- DCCAM
- Multi-Kernel learning
- Beyond KCCA

Multi-Kernel Learning (MKL)

Motive:
1) approximate heterogeneous data sources;
2) avoid model or kernel parameter selection to some extent.

Related work [BLJ04, SRS05]: conic combinations of multiple kernels (sketched below).
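For concreteness, a tiny sketch of a conic kernel combination in the spirit of [BLJ04, SRS05]; here the weights β_k are fixed by hand for illustration, whereas those works learn them:

```python
# Conic (nonnegative) combination of base kernels: K = sum_k beta_k * K_k.
import numpy as np

def rbf(X, gamma):
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-gamma * d2)

X = np.random.default_rng(0).normal(size=(3, 30))
kernels = [rbf(X, g) for g in (0.1, 1.0, 10.0)]   # heterogeneous base kernels
beta = np.array([0.5, 0.3, 0.2])                  # conic weights: beta_k >= 0
K = sum(b * Kk for b, Kk in zip(beta, kernels))   # combined kernel stays PSD
```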

Multi-Kernel Learning (MKL)

Our Method: borrow multiple CCA (mCCA) to fuse multiple kernels.

Essence: Single dataset → Multi-Kernel mapping + Multi-CCA (NmCCA) → MKL

Multi-Kernel Learning (MKL)

New Multi-CCA (NmCCA)

[Formulations of mCCA and NmCCA, via a Least Squares Criterion, shown in the original slides.]

MKL Results compared to [SRS05]

Classification performance comparison

MKL Results compared to [SRS05]

t-test comparison of MultiK-MHKS with the following algorithms:

'*' denotes that the difference between the two corresponding algorithms is not significant at the 5% significance level, i.e., t-value < 1.7341.

The work on MKL has been submitted.

Our new work

- CIPCA
- DCCA
- DCCAM
- Multi-Kernel learning
- Beyond KCCA

Beyond KCCA

Proof of "KCCA = 2-KPCA + CCA"

Beyond KCCA

Insight from "KCCA = 2-KPCA + CCA"

Reveal the essence of KCCA. What is important …

1) the original kernel mappings can be generalized to empirical kernel mappings, i.e.,

$$\phi: x \mapsto \phi(x) \;\text{ and }\; \psi: y \mapsto \psi(y) \quad\Rightarrow\quad x \mapsto K_X(X, x) \;\text{ and }\; y \mapsto K_Y(Y, y)$$

Beyond KCCA

2) More generally2) More generally

where both where both FFxx and and FFyy are nonare non--negative real functions.negative real functions.AdvantagesAdvantagesAny types of kernel can be employed, Any types of kernel can be employed, e.g., graphe.g., graph--kernel, stringkernel, string-- kernel, treekernel, tree--kernel, and etc., kernel, and etc.,

even noneven non--kernel functionskernel functions

( , ) and ( , )X YF Fx X x y Y ya a
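A small sketch of the empirical kernel mapping: each sample is represented by its kernel (or, more generally, non-negative F) values against the training set, after which plain linear CCA applies (all names here are illustrative):

```python
# Empirical kernel mapping: x -> K(X, x) = [k(x_1, x), ..., k(x_n, x)]^T.
import numpy as np

def empirical_map(K_fn, X_train, x):
    """Represent x by its kernel values against the n training samples."""
    return np.array([K_fn(xi, x) for xi in X_train.T])

rbf = lambda a, b, g=1.0: np.exp(-g * ((a - b) ** 2).sum())
X_train = np.random.default_rng(0).normal(size=(4, 20))
phi_x = empirical_map(rbf, X_train, X_train[:, 0])   # 20-dim mapped feature
# Replacing k with any non-negative function F_X(X, x) gives the "more general"
# mapping of the slide, including non-kernel functions.
```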

The work on Beyond KCCA has been submitted.

References

[And00] S. Ando. Image field categorization and edge/corner detection from gradient covariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(2):179-190.

[Bor99] Canonical correlation: a tutorial, at http://people.imt.liu.se/~magnus/cca/tutorial/, 1999.

[DHS00] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, New York, 2000.

[WM01] B. Walczak and D.L. Massart, Dealing with missing data, Part I. Chemometrics & Intelligent Laboratory Systems, vol. 58, no. 1, pp. 15-27, 2001.

[For04] B. Fortuna. Kernel canonical correlation analysis with applications. In: SIKDD 2004 at Multiconference, Ljubljana, Slovenia, 2004.

[He04] Y. Hel-Or, The canonical correlations of color images and their use for demosaicing, HP Labs Technical Report, HPL-2003-164(R.1), February 2004.

[HSS04] D.R. Hardoon, S. Szedmak, J. Shawe-Taylor. Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 2004, 16:2639-2664.

[JRS04] A.K. Jain, A. Ross and S. Prabhakar, An introduction to biometric recognition, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics, Vol. 14, No. 1, January 2004.

[KGP05] H. Kim, G.H. Golub and H. Park, Missing value estimation for DNA microarray gene expression data: local least squares imputation, Bioinformatics, vol. 21, 187-198, 2005.

[KSE05] E. Kidron, Y.Y. Schechner, M. Elad. Pixels that sound. IEEE Proc. of Computer Vision and Pattern Recognition, 2005, 1:88-95.

[LGD05] M. Loog, B. van Ginneken, R.P.W. Duin. Dimensionality reduction of image features using the canonical contextual correlation projection. Pattern Recognition, 2005, 38(12):2409-2418.

[Nie02] A. Nielsen. Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data. IEEE Transactions on Image Processing, 2002, 11:293-305.

[SS06] P. Schreier, L. Scharf, Canonical coordinates for transform coding of noisy sources, IEEE Transactions on Signal Processing, 2006, 54(1):235-243.

[SZL05] Q. Sun, S. Zeng, Y. Liu, P.-A. Heng, D.-S. Xia. A new method of feature fusion and its application in image recognition. Pattern Recognition, 2005, 38(12):2437-2448.

[YVN03] Y. Yamanishi, J.-P. Vert, A. Nakaya, et al. Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. Bioinformatics, 2003, 19(Suppl. 1):i323-i330.

Thanks!