Density Traversal Clustering and Generative Kernels
![Page 1: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/1.jpg)
Amos Storkey, School of Informatics.
Density Traversal Clustering and Generative Kernels
a generative framework for spectral clustering
Amos Storkey, Tom G Griffiths
University of Edinburgh
![Page 2: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/2.jpg)
Amos Storkey, School of Informatics, University of Edinburgh
Attribute Generalisation
![Page 3: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/3.jpg)
Prior work
• Tishby and Slonim
• Meila and Shi
• Coifman et al
• Nadler et al
![Page 4: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/4.jpg)
Example: Transition Matrix
![Page 5: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/5.jpg)
Example: 20 Iterations
![Page 6: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/6.jpg)
Example: 400 Iterations
![Page 7: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/7.jpg)
Argument
• A priori dependence on data.
• No generative model.
• Inconsistent with underlying density.
• Clusters are spatial characteristics that are properties of distributions.
• Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.
![Page 8: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/8.jpg)
But we do know
• Know diffusion asymptotics, but probabilistic formalism inconsistent with data density:
– Finite time-step, infinite data limit equilibrium distribution does not match data distribution.
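The mismatch can already be seen on a finite sample. The sketch below assumes the usual row-normalised random walk on a Gaussian affinity matrix; the data, `sigma`, and the function names are illustrative and not taken from the talk. The equilibrium weights of that walk are proportional to the node degrees, so they are not uniform over the sample, whereas the empirical data distribution puts weight 1/n on every point.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Symmetric Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def random_walk_equilibrium(W):
    """Equilibrium of the row-normalised walk P = diag(d)^-1 W, which is d / sum(d)."""
    degrees = W.sum(axis=1)
    return degrees / degrees.sum()

rng = np.random.default_rng(0)
# Illustrative data (an assumption): one dense blob and one sparse blob.
X = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
               rng.normal(3.0, 1.0, size=(10, 2))])
W = gaussian_affinity(X, sigma=0.5)
P = W / W.sum(axis=1, keepdims=True)   # standard random-walk transition matrix
pi = random_walk_equilibrium(W)

# pi is left-invariant under P, so it is the equilibrium distribution.
stationary_error = np.max(np.abs(pi @ P - pi))
# Distance between the equilibrium and the empirical weights 1/n.
uniform = np.full(len(X), 1.0 / len(X))
mismatch = np.max(np.abs(pi - uniform))
```

Points in the dense blob receive more equilibrium mass than points in the sparse blob, which is the kind of inconsistency with the data density that the slide refers to.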
![Page 9: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/9.jpg)
Density Traversal Clustering
• Define a discrete-time, continuous-space, diffusing Markov chain.
• Definition dependent on some latent distribution.
• Call this the Traversal Distribution.
![Page 10: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/10.jpg)
The Markov chain
• Transition with probability
  $$P(x_t \mid x_{t-1}) = \frac{D(x_t, x_{t-1})\, P^*(x_t)\, S(x_t)^{-1}}{Z(x_{t-1})}, \qquad Z(x) = \int dy\, D(y, x)\, P^*(y)\, S(y)^{-1}$$
• D(y,x) is Gaussian centred at x, P* is the Traversal distribution.
• Here S is given by the solution of
  $$S(x) = \int dy\, \frac{P^*(y)\, D(y, x)}{S(y)}$$
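Why this choice of S gives the right equilibrium can be checked in a few lines. The following is a sketch, assuming D is symmetric (as for a Gaussian, $D(y,x) = D(x,y)$) and reading the slide's definitions as $P(x_t \mid x_{t-1}) = D(x_t, x_{t-1})\, P^*(x_t)\, S(x_t)^{-1} / Z(x_{t-1})$ with $S(x) = \int dy\, P^*(y)\, D(y,x) / S(y)$.

```latex
\begin{align*}
Z(x) &= \int dy\, D(y,x)\, P^*(y)\, S(y)^{-1} = S(x)
  && \text{(by the equation defining } S\text{)} \\
P^*(x)\, P(y \mid x) &= \frac{P^*(x)\, D(y,x)\, P^*(y)}{S(x)\, S(y)}
  = P^*(y)\, P(x \mid y)
  && \text{(detailed balance, using } D(y,x) = D(x,y)\text{)} \\
\int dx\, P^*(x)\, P(y \mid x) &= P^*(y) \int dx\, P(x \mid y) = P^*(y)
  && \text{(so } P^* \text{ is an equilibrium distribution)}
\end{align*}
```

Under these assumptions the chain leaves the traversal distribution invariant for any finite step size, which addresses the inconsistency noted for the earlier diffusion formulations.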
![Page 11: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/11.jpg)
Generative procedure
![Page 12: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/12.jpg)
Problems
• Random walk in continuous space.
• Each step involves many intractable integrals.
• Real Bayesians would...
• Good prior distributions over distributions are a hard problem, but we need a prior for traversal distributions.
![Page 13: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/13.jpg)
CHEAT
• Doing all the integrals is not possible, but...
– All integrals are with respect to the traversal distribution.
– Use empirical data proxy.
– All the integrals now become sample estimates: sums over the data points.
– Everything is computable in the space of data points.
– WORKS!: never need to evaluate the probability at a point, only integrals over regions.
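A minimal sketch of the sample-estimate computation, assuming the traversal distribution is replaced by the empirical distribution over the n data points, so that the equation for S becomes S_i = (1/n) Σ_j W_ij / S_j. The damped fixed-point iteration, the unnormalised Gaussian used for D (a constant factor in D only rescales S), and all function names are illustrative choices, not taken from the talk.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Symmetric Gaussian affinity W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_consistency(W, n_iter=200):
    """Solve s_i = (1/n) * sum_j W_ij / s_j by a damped fixed-point iteration.

    The geometric-mean damping is an assumption made here for stability;
    the undamped update s <- W @ (1/s) / n tends to oscillate.
    """
    n = W.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    return s

def transition_matrix(W, s):
    """T_ij = W_ij / (s_j * Z_i), with Z_i = sum_k W_ik / s_k."""
    unnormalised = W / s[None, :]
    return unnormalised / unnormalised.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))           # illustrative data
W = gaussian_affinity(X, sigma=1.0)
s = solve_consistency(W)
T = transition_matrix(W, s)

# How well s satisfies the sample version of the equation for S.
residual = np.max(np.abs(s - (W @ (1.0 / s)) / W.shape[0]))
# The empirical distribution (weight 1/n per point) should be left invariant by T.
uniform = np.full(len(X), 1.0 / len(X))
equilibrium_error = np.max(np.abs(uniform @ T - uniform))
```

At the fixed point `T` is symmetric and doubly stochastic, so the empirical distribution is its equilibrium, and every quantity is a sum over data points rather than an integral.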
![Page 14: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/14.jpg)
We get…
• Scaled likelihood $P(x_i \mid \text{centre } x_j) / P(x_i) = n\,(A^D)_{ij}$
– $A = W S^{-1}$
– W is the usual affinity.
– $S^{-1}$ is an extra consistency term.
• More generally have out-of-sample scaled likelihood:
– $P(x \mid \text{centre } y) / P(x) = n\, a(x)^T A^{D-2}\, b(y)$, where a(x) and b(x) are the traversal probabilities to and from x.
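As a sketch of how the in-sample scaled likelihoods could be computed, assume A denotes the normalised sample transition matrix (rows summing to one) and D is the number of traversal steps. The slide's notation leaves the normalisation open, so this reading is an assumption, as are the function names and the toy data. Starting the chain at x_j and taking D steps gives the probability of reaching x_i as an entry of A^D; dividing by the empirical P(x_i) = 1/n gives n·A^D.

```python
import numpy as np

def traversal_matrix(X, sigma, n_iter=200):
    """Sample transition matrix built from a Gaussian affinity and consistency terms."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    n = len(X)
    s = np.ones(n)
    for _ in range(n_iter):
        # Damped fixed-point update for s_i = (1/n) sum_j W_ij / s_j (an illustrative solver).
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    A = W / s[None, :]
    return A / A.sum(axis=1, keepdims=True)

def scaled_likelihoods(A, n_steps):
    """n * A^n_steps: D-step transition probabilities divided by the empirical 1/n."""
    n = A.shape[0]
    return n * np.linalg.matrix_power(A, n_steps)

rng = np.random.default_rng(1)
# Illustrative data: two well-separated clusters of 20 points each.
X = np.vstack([rng.normal(-3.0, 0.5, size=(20, 2)),
               rng.normal(+3.0, 0.5, size=(20, 2))])
A = traversal_matrix(X, sigma=1.0)
L20 = scaled_likelihoods(A, 20)

within = L20[:20, :20].mean()    # centre and point in the same cluster
across = L20[:20, 20:].mean()    # centre and point in different clusters
```

With clusters this far apart, the within-cluster scaled likelihoods are well above the across-cluster ones at 20 steps; as the number of steps grows, the chain mixes and every entry tends towards 1.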
![Page 15: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/15.jpg)
Example: Scaled likelihoods
![Page 16: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/16.jpg)
Example: 20 Iterations
![Page 17: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/17.jpg)
Example: 400 Iterations
![Page 18: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/18.jpg)
Initial distribution
• Can consider other initial distributions.
• Specifically can consider delta functions at mixture centres.
• Variational Bayesian Mixture models…
![Page 19: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/19.jpg)
Demo
![Page 20: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/20.jpg)
Number of clusters
• Scaled likelihoods for a three-cluster problem.
![Page 21: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/21.jpg)
Number of clusters
• Scaled likelihoods for a five-cluster problem.
![Page 22: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/22.jpg)
Cluster allocations
![Page 23: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/23.jpg)
Cluster allocations
![Page 24: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/24.jpg)
Conclusion
• A priori formulation of spectral clustering.
• Can be used as any other spectral procedure.
• But also provides scaled likelihoods; can be combined with Bayesian procedures.
• Variational Bayesian formalism.
• Small sample approximation issues.
• Better to have a flexible density estimator.
![Page 25: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/25.jpg)
Generative Kernels
• Related to Seeger: Covariance Kernels from Bayesian Generative Models
[Figure: a density over X and its corresponding traversal process; a Gaussian Process over X space. Data is obtained by diffusing in X space using the traversal process, and then local averaging and additive noise.]
![Page 26: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/26.jpg)
Generative Kernels
• Covariance Kij is
  $$K(r, s) = \int dx\, dy\; P(x \text{ sourced } r)\, P(y \text{ sourced } s)\, K(x, y)$$
• Again use sample estimates.
• Presume measured target is local average.
• Just standard basis function derivation of GP.
![Page 27: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/27.jpg)
Motivation
• Generative model generates clustered data positions.
• Targets diffuse using traversal process.
• Target values suffer locality averaging influence:
– Diffused objects locally influence one another’s target values so everyone becomes like their neighbours.
• E.g. Accents.
• Can add local measurement noise.
![Page 28: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/28.jpg)
Kernel Clustering
• Use sample estimates again to get kernel
• Can also incorporate a prior over iterations and integrate out.
• For example can use matrix exponential exp(A) instead of $A^D$.
$$K(r, s) = \sum_{ij} \left[a(r)^T A^{D-1}\right]_{i} \left[a(s)^T A^{D-1}\right]_{j} K_{ij}$$
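A sketch of the sample-estimate kernel restricted to the data points, under stated assumptions: `A` is the normalised sample transition matrix, the source probabilities for data point x_k after D steps are taken to be row k of A^D, and the local covariance `K_local` is the identity (independent local values). None of these choices is confirmed by the slides. For the matrix-exponential variant, one prior over the number of iterations that produces it is a Poisson prior with mean `t` (a hypothetical parameter), since Σ_D e^{-t} t^D / D! · A^D = exp(t(A − I)); the slides do not say which prior was used.

```python
import numpy as np

def traversal_matrix(X, sigma, n_iter=200):
    """Sample transition matrix built from a Gaussian affinity and consistency terms."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    n = len(X)
    s = np.ones(n)
    for _ in range(n_iter):
        # Damped fixed-point update for s_i = (1/n) sum_j W_ij / s_j (an illustrative solver).
        s = np.sqrt(s * (W @ (1.0 / s)) / n)
    A = W / s[None, :]
    return A / A.sum(axis=1, keepdims=True)

def poisson_mixture(A, t):
    """exp(t (A - I)) via eigendecomposition; relies on A being symmetric at the fixed point."""
    A_sym = 0.5 * (A + A.T)            # remove rounding-level asymmetry
    eigvals, eigvecs = np.linalg.eigh(A_sym)
    return (eigvecs * np.exp(t * (eigvals - 1.0))) @ eigvecs.T

def traversal_kernel(M, K_local):
    """K = M K_local M^T, where row k of M holds the source probabilities of point k."""
    return M @ K_local @ M.T

rng = np.random.default_rng(2)
# Illustrative data: two well-separated clusters of 20 points each.
X = np.vstack([rng.normal(-3.0, 0.5, size=(20, 2)),
               rng.normal(+3.0, 0.5, size=(20, 2))])
A = traversal_matrix(X, sigma=1.0)
K_local = np.eye(len(X))               # assumed local covariance

M_power = np.linalg.matrix_power(A, 20)
M_exp = poisson_mixture(A, t=20.0)
K_power = traversal_kernel(M_power, K_local)
K_exp = traversal_kernel(M_exp, K_local)

within = K_exp[:20, :20].mean()        # covariance inside one cluster
across = K_exp[:20, 20:].mean()        # covariance between the two clusters
```

Both kernels are positive semi-definite by construction, and for this toy data the across-cluster covariance is far smaller than the within-cluster covariance.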
![Page 29: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/29.jpg)
Generating targets for rings data
• Can generate from the model:
• Across cluster covariance is low.
• Within cluster continuity.
![Page 30: Density Traversal Clustering and Generative Kernels](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56812c42550346895d90c73c/html5/thumbnails/30.jpg)
The point?
• Density dependence matters in missing data problems.
• Gaussian process: data with missing targets has no influence.
• Density Traversal Kernel: data with missing targets affects kernel, and hence has influence.