kernel methods for topological data analysis - ucl · kernel methods for topological data analysis...
TRANSCRIPT
Kernel Methods for Topological Data Analysis Toward Topological Statistics
Kenji Fukumizu
The Institute of Statistical Mathematics (Tokyo, Japan)
Joint work with Genki Kusano and Yasuaki Hiraoka (Tohoku Univ.) supported by JST CREST
2nd UCL Workshop on the Theory of Big Data. 6 - 8 January 2016
1
Topological Data Analysis• TDA: a new method for extracting topological or geometrical
information of data.
• Persistence homology: a recent mathematical tool (Edelsbrunner et al 2002; Carlsson 2005)
• Progress of computational topology: computing topological invariants in feasible computational cost.
• Big data? Not only the size, buy complex structure matters TDA.
2
TDA: Various applications
33
Brain artery treese.g. age effect(Bendich et al 2014)
Brain Science
Structure change of proteinseg. open / closed(Kovacev-Nikolic et al 2015)
Material Science
Computer Vision
Shape signature, natural image statistics(Freedman & Chen 2009)
Data of highly complex geometric structure
Often difficult to define good feature vectors / descriptors
etc…Non-crystal materials(Nakamura, Hiraoka, Hirata, Escolar, Nishiura. Nanotechnology 26 (2015))
Liquid Glass
TDA provides a compact representation for geometry of such data
Biology / protein
Algebraic Topology• Homology group
4
≅
Algebraic operation
Homology group: topological invariance
0 dim(connected comp.) : �� ≅ �1 dim(ring) : �� ≅ � ⊕ �
・・・
Various topological invariances
e.g. Euler number = ∑ −1 �dim ���
The generators of 1 dim. homology group
≅ Independent ℓ-dim. holes
Topology of statistical data?
5
Noisy sample
True structure
� neighborhood(e.g. manifold learning)
Small � disconnected object
Large � Small ring is not visible
Stable extraction of topology is NOT easy!
Persistence Homology• All � considered
� = �� ���� ⊂ �� , �� ≔∪���
� ��(��)
6
� small � large
Two rings (generators of 1 dim homology) persist in a long interval.
• Persistence homology (formal definition)Filtration of topological spaces X ∶ �� ⊂ �� ⊂ ⋯ ⊂ ��
���(X): �� �� → �� �� → ⋯ → ��(��) ≅ ⊕���
���[��, ��]
� �, � ≅ 0 → ⋯ → 0 → � → ⋯ → � → 0 → ⋯ → 0
• Persistence diagram (PD)
7
at �� at ��
�: field
Irreducible decomposition
Birth and death of a generator �
PD plots the birth and death of each generator of PH in a 2D graph (� ≥ �).
�
Birth Death
The lifetime of each generator is rigorously defined,and can be computed numerically.
Handy descriptor of complex geometry
Topological statistics: systematic data analysis in TDA• Conventional analysis in TDA: analysis WITH PH/PD
Data Computation of PH Display(PD) Analysis by human
• Our statistical approach to TDA: analysis OF random PH/PD
8
Many data setsComputation
of PD’s
PD1
PD2
PD3
PDn
Many persistent diagrams
Analysis of PD’s
Software available
CGAL / PHAT
Features / Descriptors
How?
Topological Statistics
“I predict a new subject of statistical topology. Rather than count the number of holes, Betti numbers, etc., one will be more interested in the distribution of such objects on noncompact manifolds as one goes out to infinity.”
— Isadore Singer, 2004.
9
Kernel representation of PD• Vectorization of PD by positive definite kernel
• PD = Discrete measure �� ≔ ∑ ���∈��
• Embedding of PD’s into RKHSℇ�: �� ↦ ∫ � ⋅, � ��� � = ∑ �(⋅, ��)� ∈ �� , Vectorization
• For some kernels (e.g., Gaussian, Laplace), ℇ� is injective.
• By vectorization,• a number of methods for data analysis can be applied,
SVM, regression, PCA, CCA, etc. • tractable computation is possible with kernel methods.
10
�: positive definite kernel��: corresponding RKHS
Persistence Weighted Gaussian (PWG) Kernel
Generators close to the diagonal may be noise, and should be discounted.
���� �, � = � � � � exp −��� �
���
� � = ��,� � ≔ arctan �Pers � � (�, � > 0)
Pers � ≔ � − � for � ∈ { �, � ∈ ��|� ≥ �}
11
Pers(x1)
• Stability with PWG kernel embedding• PWGK defines a distance measure on the persistence diagrams,
�� ��, �� ≔ ℇ� �� − ℇ� �� ��
Statiblity Theorem (Kusano, Hiraoka, Fukumizu 2015+) �: compact subset in �� . � ⊂ �, � ⊂ �� : finite sets.If � > � + 1, then with PWG kernel (�, �, �),
�� ��(�), ��(�) ≤ � �� �, � .
�: constant depending only on �, �, �, �, �
��(�): � dimensional persistence diagram of �
�� : Haussdorff distance
This stability is not known for Gaussian kernel.12
A small change of a set causes only a small change in PD
2nd‑level kernel
2nd-level kernel (SVM for measures, Muandet, Fukumizu, Dinuzzo, Schölkopf 2012)
• RKHS-Gaussian kernel � ��, �� = exp −����� ��
�
���
derives
� ��, �� = exp −ℇ�(��)�ℇ�(��)
��
�
���
13
PD1
PD2
PD3
PDm
ℇ� ���
ℇ� ���
…ℇ� ���
Vectors in RKHSPDData sets
Application of pos. def. Kernel on RKHS
��, ��: Persistence diagrams
Data analysis method
Embedding
Computational issueThe number of generators in a PD may be large (≥ 103, 104 )
For ��� = ∑ ���
(�)����� ∪ Δ, � ���, ��� = exp −
ℇ�(���)�ℇ�(���)��
�
��� requires computation
ℇ�(���) − ℇ�(���)��
�
= ∑ ∑ � ���
, �����
� ����� �� + ∑ ∑ � ��
�, ��
���
� ��
��
� �� − 2 ∑ ∑ � ���
, �����
� ����� �� .
The number of exp −�����
�
��� = �(����) computationally expensive for � ≈ 10�
14
� = max{��|� = 1, … , �}
• Approximation by random features (Rahimi & Recht 2008)
By Bochner’s theorem
exp −�����
�
��� = � ∫ � ���� �������
����
�� � �
� ��
Approximation by sampling: ��, … , ��: �. �. �. ~ ��
exp −�����
�
��� ≈ ��
�∑ � ���ℓ
��� � ���ℓ����
ℓ��
∑ ∑ � ���
, �����
� �� ≈��� ��
�
�∑ ∑ ∑ � ��
�� ��
�� ���ℓ
���(�)
� ���ℓ���
(�)�ℓ��
��
� ����� ��
=�
�∑ ∑ � ��
�� ���ℓ
���(�)
∑ � ���
� ���ℓ���
(�)��
������ ��
�ℓ ��
Computational cost �(��) 2nd level Gram matrix �(��� + ���). c.f. �(����)
Big gain if �, � ≪ �
15
Gaussian distribution =: ��
� dim.
(Fourier transform)
Comparison: Persistence Scale Space Kernel(Reininghaus et al 2015)
• PSS Kernel
�� �, � =1
8��exp
� − � �
8�− exp
� − �� �
8�
�� = (�, �) for � = (�, �).
ℇ�(�) is considered.
• Comparison between PWGK and PSSK • PWGK can control the discount around the diagonal independently of the
bandwidth parameter.• PSSK is not shift-invariant Random feature approximation is not applicable.• In Reininghaus et al 2015, 2nd level kernel is not considered.
16
Pos. def. on �, � � ≥ �0 on Δ.
Synthetic example: SVM classification• Classification of PD’s by SVM
• Synthetic data• One big circle (random location and sample size) �1
with or without small circle �0
• � = XOR((birth(�1)<1 && death(�1)>4)), �0 exists)• In fact, data include noise.• 100 for training
and 100 for testing
• Result (correct classification)• PWGK (proposed): 83.8%• PSSK (comparison): 46.5%
17
S0
S1
S1S1
S0
noise
PD1
Data points 1
Data points 2
No S0
� = 1
Application: Transition of Silica (SiO2 )If cooled down rapidly from the liquid state, SiO2 changes into the glass state (not to crystal).Goal: identify the temperature of phase transition.Data: Molecular Dynamics simulation for SiO2. 3D arrangements of the atoms are used for computing PD at 80 temperatures. (Nakamura et al 2015; Hiraoka et al 2015)
18
Examples of PD’s
Liquid Glass (Amorphous)
Amorphous: “soft” structure
Change point detection• Data along a parameter �
�� , � = 1, … , �.
Kernel Change Point Analysis with Fisher Discriminant score (Harchoui et al 2009): For each �, two classes are defined by the data before and after �. Fisher score on RKHS is used.
• For each �, compute ���:� =�
�∑ Φ(��)�
��� and �����:� =�
���∑ Φ(��)�
����� .
• Compute Δ� ≔ ��:� + ����:� + �� ��
�(���:� − �����:�)��
�
.
• Find max�
Δ�.
• For the packing problem, �� = ℇ� ��� (� = 1, … , 80).
19
• Detection of liquid-glass state transition• Approach in physics:Estimation using derivatives of enthalpy, but not so
accurate. Estimate = [2000K, 3500K]• Our approach: Change point detection by Kernel FDR.• Number of generators in a PD is 30000 at most difficult to use PSSK directly• Only PWGK (proposed) is applied with random features.
20
KFDR Gram matrix
Detected change point = 3100K
• Kernel PCA with PWGK
21
Sharp change between the two phases.
(Colored by the result of change point detection. Colors are not used for KPCA).
PD + PWGK successfully capture “soft” and complex structure of the glass state.
Conclusion• Goal: establishing topological statistics
• Persistence homology can be used for defining useful descriptors to complex objects of geometrical structures.
• Systematic statistical methods for TDA: Kernel methods introduce systematic data analysis to TDA.• Vectorization of persistence diagrams by kernel embedding.• Persistence weighted Gaussian kernel flexible kernel for noise.• A toolbox of kernel methods become available:
SVM, kernel ridge regression, kernel PCA, kernel CCA, etc ….
22