![Page 1: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/1.jpg)
Estimating Density Functionals
Barnabás Póczos
![Page 2: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/2.jpg)
Why are we all here?
![Page 3: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/3.jpg)
Curious
![Page 4: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/4.jpg)
To solve these problems, our main tool is always the same
![Page 5: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/5.jpg)
Collect data & learn from data
![Page 6: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/6.jpg)
Difficult & Important
The world is very complicated...
We have to understand complex relationships across the data.
Basic questions about the data
How random is the data?
• How large is its entropy?
How large is the dependence among the instances? Which variables are dependent, which ones are independent?
• How large is their mutual information?
How different are the distributions of the instances?
• How large is the divergence between the distributions?
⇒ We need entropy, dependence, and divergence estimators to do machine learning
![Page 7: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/7.jpg)
Entropy, Mutual Information, Divergence
C. Shannon
A. Rényi
I. Csiszár
Fernandes & Gloor: Mutual information is critically dependent on
prior assumptions: would the correct estimate of mutual
information please identify itself? BIOINFORMATICS Vol. 26 no. 9 2010, pages 1135–1139
![Page 8: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/8.jpg)
“Mutual information” query produces 325,000 hits on Google Scholar, and the
first 10 papers have more than 30,065 citations.
Most of these papers are application papers, e.g. in feature selection, computer
vision, medical image processing, image alignment, and data fusion. As we find
better estimators, such applications can simply use them.
“Big Data” search on Google Scholar produces 181,000 hits, and the first 10 hits
have 12,872 citations.
Similarly, the “Deep Learning” search produces 106,000 hits, and the first 10
papers have 8,485 citations (as of May 28, 2017).
Developing efficient estimators for mutual information and related
quantities is highly important in many applications.
![Page 9: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/9.jpg)
How should we estimate them?
Naïve plug-in approach using density estimation
Density: nuisance parameter
Density estimation: difficult, curse of dimensionality!
histogram
kernel density estimation
k-nearest neighbors [D. Loftsgaarden & C. Quesenberry. 1965.]
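To make the plug-in idea concrete, here is a minimal sketch, assuming a one-dimensional sample and a histogram density estimate. The function name `plugin_entropy_histogram`, the bin count, and the Gaussian test sample are illustrative choices, not taken from the talk.

```python
import numpy as np

def plugin_entropy_histogram(x, bins=30):
    """Plug-in Shannon entropy (in nats) of a 1-D sample via a histogram density."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    probs = counts / counts.sum()        # probability mass per bin
    nz = probs > 0                       # skip empty bins (0 * log 0 = 0)
    density = probs[nz] / widths[nz]     # histogram density estimate in each bin
    # Plug the estimated density into H = -E[log f(X)]
    return -np.sum(probs[nz] * np.log(density))

rng = np.random.default_rng(0)
sample = rng.normal(size=5000)
estimate = plugin_entropy_histogram(sample)
true_value = 0.5 * np.log(2 * np.pi * np.e)   # entropy of N(0, 1)
```

In one dimension this works well, but a grid with 30 bins per axis has 30^d cells in d dimensions, which is the curse of dimensionality the slide refers to.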
How can we estimate them directly, without estimating the density?
![Page 10: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/10.jpg)
Part I
Consistent estimators for
Rényi entropy
Rényi mutual information
A large class of divergences that includes Rényi and L2
They avoid density estimation!
![Page 11: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/11.jpg)
Part II Generalize ML to sets and distributions
Most machine learning algorithms operate on vectorial objects.
The world is complicated. Often
• hand-crafted vectorial features are not good enough
• it is natural to work with complex inputs directly (sets or distributions...)
Dealing with complex objects:
• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on these unknown distributions represented by sets
Classify galaxy clusters
• Each galaxy can be represented by a feature vector
• Each cluster can be represented by a set of these vectors
• We can’t concatenate the feature vectors into a huge vector
![Page 12: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/12.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: entropy estimation
![Page 13: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/13.jpg)
ENTROPY ESTIMATION
Estimate Rényi entropy without density estimation
Using [figure]
![Page 14: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/14.jpg)
Rényi-α entropy estimators using kNN graphs
Calculate:
Pál, Póczos & Szepesvári. NIPS 2010
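The formula on this slide did not survive extraction. As a rough illustration of the idea, the sketch below follows the general recipe of kNN-graph entropy estimators: sum the p-th powers of the kNN edge lengths with p = d(1 − α), normalize by n^(1 − p/d), and take a logarithm. Everything specific here is an assumption made for illustration: the function names, the use of directed edges, the brute-force distance computation, and the Monte Carlo calibration of the normalizing constant on uniform samples.

```python
import numpy as np

def knn_graph_length(points, k, p):
    """Sum of p-th powers of the edge lengths of the (directed) k-NN graph."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    knn = np.sort(dists, axis=1)[:, :k]        # distances to the k nearest neighbors
    return np.sum(knn ** p)

def renyi_entropy_knn(points, alpha, k=3, n_calib=20, seed=0):
    """Sketch of a kNN-graph Renyi entropy estimate for 0 < alpha < 1."""
    n, d = points.shape
    p = d * (1.0 - alpha)
    rng = np.random.default_rng(seed)
    # Assumption: calibrate the constant by simulation on the unit cube,
    # where the Renyi entropy of the uniform distribution is 0.
    gamma = np.mean([
        knn_graph_length(rng.random((n, d)), k, p) / n ** (1 - p / d)
        for _ in range(n_calib)
    ])
    length = knn_graph_length(points, k, p)
    return np.log(length / (gamma * n ** (1 - p / d))) / (1.0 - alpha)

rng = np.random.default_rng(1)
sample = 2.0 * rng.random((500, 2))   # uniform on [0, 2]^2, true entropy log 4
h_hat = renyi_entropy_knn(sample, alpha=0.5)
```

No density is estimated anywhere: only nearest-neighbor distances enter the computation.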
![Page 15: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/15.jpg)
Theoretical Results
Pál, Póczos & Szepesvári, NIPS 2010
First high probability rate on Rényi entropy estimators.
Convergence rate
Almost surely consistent
![Page 16: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/16.jpg)
Why is this entropy estimator consistent?
The larger the entropy, the longer the kNN graph is.
Quasi-subadditivity:
Details in Pál, Póczos & Szepesvári, NIPS 2010
![Page 17: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/17.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: dependence estimation
![Page 18: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/18.jpg)
MUTUAL INFORMATION ESTIMATION
Estimate MI without density estimation
Using [figure]
![Page 19: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/19.jpg)
How can we get mutual information estimators from entropy estimators?
Trick: Information is preserved under monotonic transformations.
Monotone transform Uniform margins
![Page 20: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/20.jpg)
Transformation to Get Uniform Margins
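Monotone transformation leading to uniform margins?
Prob theory 101: if X_i has a continuous distribution function F_i, then F_i(X_i) is uniform on [0, 1].
The copula transformation: (X_1, ..., X_d) ↦ (F_1(X_1), ..., F_d(X_d))
A little problem: we don’t know the distribution functions F_i…
Solution: use the empirical distribution functions (ranks are enough)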
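A minimal sketch of the rank-based (empirical) copula transformation, assuming the data has no ties. The function name `empirical_copula_transform` and the toy data are illustrative.

```python
import numpy as np

def empirical_copula_transform(x):
    """Map each column of x into (0, 1] using its ranks, i.e. the
    empirical distribution function evaluated at the sample points."""
    n = x.shape[0]
    # argsort of argsort gives 0-based ranks per column (assumes no ties)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / n

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
x[:, 1] = np.exp(x[:, 0] + 0.5 * x[:, 1])    # dependent coordinates, non-uniform margins
z = empirical_copula_transform(x)

# Strictly increasing transformations of the coordinates leave the ranks unchanged.
x_warped = np.column_stack([x[:, 0] ** 3, np.log(x[:, 1])])
z_warped = empirical_copula_transform(x_warped)
```

Each column of `z` takes the values 1/n, 2/n, ..., 1, so the margins are uniform by construction while the dependence between the columns is kept.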
![Page 21: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/21.jpg)
Sklar’s Theorem, 1959
F is a composition of its copula C and the marginals: F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d))
The copula couples the joint distribution to its margins, and preserves all the dependencies
[Figure: joint distribution = copula distribution + margins]
![Page 22: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/22.jpg)
Copula based methods are popular in financial analysis.
So popular and powerful that they led to the global financial crisis…
It is time to make them
more popular in
machine learning too!…
![Page 23: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/23.jpg)
Consistency theorem:
REGO: Rank-based estimation of Rényi information using Euclidean Graph Optimization
1st direct, consistent Rényi mutual information estimator
Póczos, Kirshner & Szepesvári. AI and Statistics, 2010.
Other Euclidean graphs: TSP, MST, Minimal Matching, ...
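The reason an entropy estimator on the copula-transformed sample yields a mutual information estimator can be written out as follows. This is a sketch assuming continuous margins; the symbols Z, c, f and f_i are introduced here for illustration and are not from the slides.

```latex
\begin{align*}
I_\alpha(X) &= \frac{1}{\alpha-1}\log\int f(x)^{\alpha}
  \Big(\prod_{i=1}^{d} f_i(x_i)\Big)^{1-\alpha}\,dx, \\
Z &= \big(F_1(X_1),\dots,F_d(X_d)\big) \sim c
  \quad\text{(copula density, uniform margins)}, \\
I_\alpha(X) &= I_\alpha(Z)
  = \frac{1}{\alpha-1}\log\int_{[0,1]^d} c(z)^{\alpha}\,dz
  = -H_\alpha(Z).
\end{align*}
```

The first equality in the last line is the invariance of Rényi mutual information under coordinate-wise monotone transformations; the second holds because the margins of Z have density 1 on [0, 1].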
![Page 24: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/24.jpg)
Convergence Rate
Pál, Póczos & Szepesvári, NIPS 2010
![Page 25: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/25.jpg)
Robustness to Outliers
The amount of change caused by adding one outlier x
Empirical mean:
REGO:
It cannot be arbitrarily big!
Póczos, Kirshner & Szepesvári, AI and Statistics, 2010
![Page 26: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/26.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: divergence estimation
![Page 27: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/27.jpg)
DIVERGENCE ESTIMATION
Estimate divergence without density estimation
Using [figure]
![Page 28: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/28.jpg)
The Estimator
![Page 29: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/29.jpg)
Asymptotically Unbiased
We need to prove:
The estimator
Normalized k-NN distances converge to the Erlang distribution
Póczos & Schneider, AISTATS 2011
All we need is
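The estimator's formula is missing from the extracted slides. The sketch below shows one k-NN based Rényi divergence estimate of the kind the slide describes, built from normalized k-NN distances and a Gamma-function correction derived from the Erlang limit. Treat the exact form as an assumption: the function names, the brute-force distances, and the parameter choices are illustrative, and the correction constant is the one that follows from the moments of the Erlang distribution rather than a formula copied from the paper.

```python
import math
import numpy as np

def kth_nn_distance(queries, data, k, exclude_self=False):
    """Distance from each query point to its k-th nearest neighbor in data."""
    diffs = queries[:, None, :] - data[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    if exclude_self:
        np.fill_diagonal(dists, np.inf)     # queries and data are the same sample
    return np.sort(dists, axis=1)[:, k - 1]

def renyi_divergence_knn(x, y, alpha, k=5):
    """Sketch of a k-NN estimate of D_alpha(p || q) from x ~ p and y ~ q."""
    n, d = x.shape
    m = y.shape[0]
    rho = kth_nn_distance(x, x, k, exclude_self=True)   # k-NN distances within x
    nu = kth_nn_distance(x, y, k)                       # k-NN distances from x to y
    ratio = ((n - 1) * rho ** d) / (m * nu ** d)        # behaves like q(x) / p(x)
    # Bias correction from the moments of the Erlang (Gamma) limit distribution
    log_b = (2 * math.lgamma(k) - math.lgamma(k - alpha + 1)
             - math.lgamma(k + alpha - 1))
    integral = math.exp(log_b) * np.mean(ratio ** (1 - alpha))
    return math.log(integral) / (alpha - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
y_same = rng.normal(size=(1000, 2))
y_shift = rng.normal(size=(1000, 2)) + np.array([1.0, 0.0])
d_same = renyi_divergence_knn(x, y_same, alpha=0.8)
d_shift = renyi_divergence_knn(x, y_shift, alpha=0.8)
```

For two Gaussians with identity covariance the true value is α‖μ₁ − μ₂‖²/2, so `d_same` should be close to 0 and `d_shift` close to 0.4.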
![Page 30: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/30.jpg)
A little problem…
Asymptotic uniform integrability… Solutions:
1 2 3
Be careful, mistakes are easy to make!
Strong law of large numbers [NIPS]
Need:
Appendix of Póczos & Schneider, AISTATS 2011
![Page 31: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/31.jpg)
Be careful, some mistakes are easy to make…
We want:
Helly–Bray theorem
[Annals of Statistics]
Enough:
Fatou lemma:
Fatou lemma:
[Journal of Nonparametric Statistics, Problems of Information Transmission,
IEEE Trans. on Information Theory]
![Page 32: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/32.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: Part 2, ML on distributions
![Page 33: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/33.jpg)
ML on Distributions
Dealing with complex objects:
• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on these unknown distributions represented by sets
![Page 34: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/34.jpg)
Machine Learning on Distributions
If we can estimate divergences and inner products between
distributions, then we can construct ML algorithms
that operate on distributions.
Many ML algorithms only require
the pairwise distances between the inputs
the inner products between the inputs
Classification
Regression
Low-dimensional embedding
Anomaly detection
![Page 35: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/35.jpg)
Distribution Regression / Classification
Differences compared to standard methods on vectors
• The inputs are distributions, density functions (not vectors)
• We don’t know these distributions, only sample sets are available (error in variables model)
[Figure: training distributions P1, P2, P3, …, Pm with labels Y1=1, Y2=0, Y3=1, …, Ym=0; the label of a new distribution Pm+1 is unknown]
![Page 36: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/36.jpg)
Distribution Classification
Problems:
Solution: Use RKHS based SVM!
Dual form of SVM:
Calculate the Gram matrix
![Page 37: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/37.jpg)
Kernel Estimation
Linear kernel:
Polynomial kernel:
Gaussian kernel:
Solution: make it symmetric, and project it to the cone of PSD matrices
We already know how!
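A Gram matrix assembled from estimated quantities is generally neither symmetric nor positive semidefinite. The repair step named on the slide can be sketched as follows, assuming an eigenvalue-clipping projection (the nearest PSD matrix in Frobenius norm). The function name `project_to_psd` and the toy matrix are illustrative.

```python
import numpy as np

def project_to_psd(gram):
    """Symmetrize a noisy Gram matrix, then project it onto the PSD cone."""
    sym = 0.5 * (gram + gram.T)               # make it symmetric
    eigvals, eigvecs = np.linalg.eigh(sym)    # eigendecomposition of a symmetric matrix
    clipped = np.clip(eigvals, 0.0, None)     # set negative eigenvalues to zero
    return (eigvecs * clipped) @ eigvecs.T

# Toy "estimated" Gram matrix: asymmetric and indefinite after symmetrization
noisy = np.array([[1.0, 0.9, 0.2],
                  [0.8, 1.0, 0.9],
                  [0.3, 0.9, 1.0]])
psd = project_to_psd(noisy)
```

The repaired matrix can then be passed to any kernel method that expects a valid Gram matrix, such as the dual SVM.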
![Page 38: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/38.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: computer vision
![Page 39: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/39.jpg)
Image Representation with Distributions
Each image patch is represented by PCA compressed SIFT vectors.
SIFT = Scale-invariant feature transform. PCA: 128 dim ⇒ d dim
Image patches •Overlapping •Non-overlapping
Patch locations •Grid points •Interesting points •Random
Patch sizes •Same •Different, •Hierarchy
Dealing with complex objects
break into smaller parts,
represent the object as a sample set of these parts
d-dimensional sample set representation of the image
Each set is considered as a sample set from some unknown distribution.
Each image is represented as a set of these d dim feature vectors.
![Page 40: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/40.jpg)
Detecting Anomalous Images
B. Póczos, L. Xiong & J. Schneider, UAI, 2011.
50 highway images
5 anomalies
2-dimensional sample set representation of images (128 dim SIFT ⇒ 2 dim)
Anomaly score: divergences between the distributions of these sample sets
![Page 41: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/41.jpg)
Detecting Anomalous Images
[Figure: images labeled 1, 2, 3, 4, 9, 5, 8, 6, 7, 10 and 55, 54, 53, 51, 52]
![Page 42: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/42.jpg)
2-dimensional sample set representation
GMM-5 Density Approximation
[Figure: panels labeled 1, 2, 3, 4, 9, 5, 8, 6, 7, 10 and 55, 54, 53, 51, 52]
![Page 43: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/43.jpg)
Noisy USPS Dataset Classification with SDM
Results:
SVM on raw images 82.1 ± .5% accuracy
Original (noiseless) USPS dataset is easy ~97%
SDM on the 2D distributions, Rényi divergence: 96.0 ± .3% accuracy
[Figure axis labels: 160, 160]
Each instance (image) is a set of 500 2d points
1000 training and 1000 test instances
![Page 44: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/44.jpg)
Multidimensional Scaling of USPS Data
Raw images
using Euclidean distance Estimated Euclidean distance
between the distributions
Nonlinear embedding with MDS into 2d.
10 instances from the digits 1, 2, 3, 4.
Calculate pairwise Euclidean distances.
![Page 45: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/45.jpg)
Local Linear Embedding of Distributions
[Figure panels: 72 rotated COIL froggies; edge-detected COIL froggy]
[Figure panels: Euclidean distance between images; Euclidean distance between distributions]
![Page 46: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/46.jpg)
Object Classification ETH-80 [Leibe and Schiele, 2003]
BoW: 88.9%
NPR: 90.1%
8 categories, 400 images, each image is represented by 576 18-dimensional points
2-fold CV, 16 runs
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
![Page 47: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/47.jpg)
Outdoor Scenes Classification [Oliva and Torralba, 2001]
Best published: 91.57% (Qin and Yung, ICMV 2010)
NPR: 92.3%
Categories: coast, mountain, country, forest, street, highway, city, tall building
8 categories, 2688 images, each represented by 1815 53-dimensional points.
10-fold CV, 16 runs
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
![Page 48: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/48.jpg)
Sport Events Classification [Li and Fei-Fei, 2007]
8 categories, 1040 images, each represented by 295 to 1542 57-dimensional points.
Categories: badminton, bocce, croquet, polo, sailing, climbing, rowing, snowboard
Best published: 86.7% (Zhang et al, CVPR 2011)
NPR: 87.1%
2-fold CV, 16 runs
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
![Page 49: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/49.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: astronomy
![Page 50: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/50.jpg)
Find new scientific laws in physics
Goal: Estimate the dynamical mass of galaxy clusters.
Importance: Galaxy clusters are the largest gravitationally bound systems in the Universe. Dynamical mass measurements are important for understanding the behavior of dark matter and normal matter.
Difficulty: We can only measure the velocities of galaxies, not the mass of their cluster.
Physicists estimate dynamical cluster mass from a single velocity dispersion.
Our method: Estimate the cluster mass from the whole distribution of velocities rather than just a single velocity dispersion.
![Page 51: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/51.jpg)
Find new scientific laws in physics
Michelle Ntampaka et al., A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters, ApJ 2015
![Page 52: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/52.jpg)
B. Póczos, L. Xiong & J. Schneider, UAI, 2011.
What are the most anomalous galaxy clusters?
The most anomalous galaxy cluster contains mostly
star forming blue galaxies
irregular galaxies
Sloan Digital Sky Survey (SDSS)
continuum spectrum
505 galaxy clusters
(10-50 galaxies in each)
7530 galaxies
Find interesting Galaxy Clusters
Blue galaxy Red galaxy
Credits: ESA, NASA
![Page 53: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/53.jpg)
Given a distribution of particles,
our goal is to predict the
parameters of the simulated
universe
Find the parameters of Universe
![Page 54: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/54.jpg)
OUTLINE
Part 1, Estimators: entropy estimation, dependence estimation, divergence estimation
Part 2, ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding
Applications: computer vision, astronomy, other applications
Current topic: other applications
![Page 55: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/55.jpg)
Understanding Turbulence
Credits: ESA, NASA, PPPL, Wikipedia
![Page 56: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/56.jpg)
Turbulence Data Classification
Simulated fluid flow through time (JHU Turbulence Research Group)
Goal: find vortices! (find interesting events, patterns, phenomena)
[Figure: velocity distributions of examples labeled Positive (vortex), Negative, Negative]
• 11 positive, 20 negative examples
• Results: leave-one-out cross-validation: 97%
![Page 57: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/57.jpg)
Finding Vortices
Classification probabilities
![Page 58: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/58.jpg)
Find Interesting Phenomena in Turbulence Data
Anomaly scores
Anomaly detection with 1-class SDM
![Page 59: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/59.jpg)
Vorticity Scores
Find Interesting Phenomena in Turbulence Data
Xiong, Póczos, and Schneider, NIPS 2011.
![Page 60: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/60.jpg)
Agriculture
![Page 61: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/61.jpg)
Surrogate robotic system in the field
![Page 62: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/62.jpg)
Surrogate robotic system in the field
The surrogate system collecting data at the TAMU field site. The carriage supports two boom assemblies, each of which carries a sensor pod. The carriage slides up and down the column, allowing a full scan of a plant.
![Page 63: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/63.jpg)
Surrogate robotic system in the field
The carriage/dual-boom assembly moves up and down the column at a constant scanning speed. At its highest travel point the assembly clears the canopy (right).
![Page 64: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/64.jpg)
Data collection with sensor pods
A sensor pod is deployed into a row and scans a plant
![Page 65: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/65.jpg)
![Page 66: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/66.jpg)
| Name | Range | RMSE (% of range) |
|---|---|---|
| Leaf angle* | 75.94 | 3.30 (4.35%) |
| Leaf radiation angle* | 120.66 | 4.34 (3.60%) |
| Leaf length* | 35.00 | 0.87 (2.49%) |
| Leaf width [max] | 3.61 | 0.27 (7.48%) |
| Leaf width [average] | 2.99 | 0.21 (7.02%) |
| Leaf area* | 133.45 | 8.11 (6.08%) |
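The percentages in parentheses appear to be the RMSE expressed relative to the range of each trait. A minimal sketch that recomputes them from the Range and RMSE columns, assuming that interpretation (the helper name `relative_rmse` is hypothetical):

```python
# Range and RMSE values copied from the table; units are not stated on the slide.
rows = [
    ("Leaf angle*", 75.94, 3.30),
    ("Leaf radiation angle*", 120.66, 4.34),
    ("Leaf length*", 35.00, 0.87),
    ("Leaf width [max]", 3.61, 0.27),
    ("Leaf width [average]", 2.99, 0.21),
    ("Leaf area*", 133.45, 8.11),
]


def relative_rmse(value_range, rmse):
    """Return the RMSE as a percentage of the trait's range."""
    return 100.0 * rmse / value_range


for name, value_range, rmse in rows:
    print(f"{name}: {relative_rmse(value_range, rmse):.2f}%")
```

Under this reading, every recomputed value agrees with the table to two decimal places.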
![Page 67: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/67.jpg)
Extensions
L2 divergence:
Conditional Rényi Mutual Information:
Variables: x, y, z
B. Póczos & J. Schneider, AISTATS, 2012.
![Page 68: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/68.jpg)
Rates for kNN Estimators
![Page 69: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/69.jpg)
Problem Setting
![Page 70: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/70.jpg)
Notation
![Page 71: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/71.jpg)
k-NN Distances
![Page 72: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/72.jpg)
k-NN density estimation
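As an illustration of the idea, here is a minimal sketch of the standard k-NN density estimate, p_k(x) = k / (n · c_d · ε_k(x)^d), where ε_k(x) is the distance from x to its k-th nearest sample point and c_d is the volume of the unit ball in d dimensions. The notation and the function names are assumptions made for this example, not taken from the slides.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit Euclidean ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_distance(x, sample, k):
    """Euclidean distance from x to its k-th nearest point in sample."""
    dists = sorted(math.dist(x, y) for y in sample)
    return dists[k - 1]


def knn_density(x, sample, k):
    """k-NN density estimate k / (n * c_d * eps_k(x)^d) at the point x."""
    n, d = len(sample), len(x)
    eps = knn_distance(x, sample, k)
    return k / (n * unit_ball_volume(d) * eps ** d)


# Tiny 1-D example: the 2nd-nearest sample point to x = 0 is at distance 1,
# so the estimate is 2 / (4 * 2 * 1), i.e. close to 0.25.
sample = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(knn_density((0.0,), sample, k=2))
```

The brute-force distance search is quadratic in the sample size; a spatial index such as a k-d tree would be the usual choice for larger samples.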
![Page 73: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/73.jpg)
k-NN density Estimator Properties
The Plug-In estimator:
![Page 74: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/74.jpg)
Plug-In Estimator Properties
![Page 75: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/75.jpg)
Issues with the Plug-In Estimator
![Page 76: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/76.jpg)
Fixed-k functional estimators
![Page 77: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/77.jpg)
Bias Correction
![Page 78: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/78.jpg)
Bias Correction
That is,
![Page 79: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/79.jpg)
Bias Correction
Bias Correction for Divergences:
![Page 80: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/80.jpg)
Known Bias Correction Functions
![Page 81: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/81.jpg)
Bias Correction: Entropy Special Case
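For the entropy case, the bias-corrected fixed-k estimator can be sketched as follows. This assumes the usual Kozachenko-Leonenko form: the plug-in estimate −(1/n) Σ log p_k(X_i), built from leave-one-out k-NN densities, plus the additive correction log k − ψ(k), where ψ is the digamma function. The function names and the exact normalization (n − 1 other points) are assumptions of this sketch.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def unit_ball_volume(d):
    """Volume of the unit Euclidean ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_entropy(sample, k=3):
    """Bias-corrected fixed-k entropy estimate (in nats) from a list of points."""
    n, d = len(sample), len(sample[0])
    mean_log_eps = 0.0
    for i, x in enumerate(sample):
        # Distance from x to its k-th nearest neighbour among the other n-1 points.
        dists = sorted(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        mean_log_eps += math.log(dists[k - 1]) / n
    # Plug-in estimate: minus the average log of the k-NN density estimate.
    plug_in = (math.log(n - 1) - math.log(k)
               + math.log(unit_ball_volume(d)) + d * mean_log_eps)
    # Additive bias correction for the logarithm with fixed k.
    correction = math.log(k) - digamma_int(k)
    return plug_in + correction


# Uniform samples on [0, 1] have differential entropy 0.
rng = random.Random(0)
points = [(rng.random(),) for _ in range(300)]
print(knn_entropy(points, k=3))
```

Note that k stays fixed as n grows; the correction term removes the asymptotic bias that the plug-in estimate would otherwise keep for fixed k.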
![Page 82: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/82.jpg)
Discussion of Assumptions
Here, we discuss some of these challenges, motivating the assumptions we make to overcome them.
![Page 83: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/83.jpg)
Discussion of Assumptions
For these reasons, it has been common in the analysis of k-NN estimators to make the following assumption:
![Page 84: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/84.jpg)
Discussion of Assumptions
![Page 85: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/85.jpg)
Discussion of Assumptions
![Page 86: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/86.jpg)
(A4) is a Mild Assumption
![Page 87: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/87.jpg)
Concentration of k-NN Distances
![Page 88: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/88.jpg)
Concentration of k-NN Distances
Concentration Corollary:
![Page 89: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/89.jpg)
Main results: Bias Bound
![Page 90: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/90.jpg)
Main results: Variance Bound
![Page 91: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/91.jpg)
Main results: MSE Bound
![Page 92: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/92.jpg)
Benefits
Compared to plug-in estimators, fixed-k estimators:
• are faster to compute,
• can also exhibit superior rates of convergence
![Page 93: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/93.jpg)
Benefits
Main result: Under some conditions
![Page 94: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster](https://reader033.vdocuments.mx/reader033/viewer/2022050217/5f63149486d88d6a490de0c5/html5/thumbnails/94.jpg)
Take Me Home!
Part 1: Entropy estimation, Divergence estimation, Dependence estimation
• 1st divergence estimators
• 1st Rényi MI estimator: robust, rank statistics only
• direct, consistent estimators, rates
Part 2: classification, regression, clustering, anomaly detection, low-dim embedding
• Support Distribution Machines
Applications: computer vision, astronomy, other applications
• Solves new problems in Astronomy, Turbulence data analysis, Agriculture
• Outperforms state-of-the-art results in CV benchmarks