Estimating Density Functionals Barnabás Póczos

Post on 22-Jul-2020

Page 1: Estimating Density Functionalszoltan.szabo/ml... · Gaussian kernel: Solution: make it symmetric, and project it to the cone of PSD matrices 37 ... Physicists estimate dynamical cluster

Estimating Density Functionals

Barnabás Póczos

Why are we all here?

Curious

To solve these problems, our main tool is always the same

Collect data & learn from data

Difficult & Important

⇒ We need entropy, dependence, and divergence estimators to do machine learning

How random is the data?

• How large is its entropy?

How large is the dependence among the instances? Which variables are dependent, which ones are independent?

• How large is their mutual information?

How different are the distributions of the instances?

• How large is the divergence between the distributions?

Basic questions about the data

The world is very complicated...

We have to understand complex relationships across the data.

Entropy, Mutual Information, Divergence

C. Shannon

A. Rényi

I. Csiszár

Fernandes & Gloor: Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself? Bioinformatics, Vol. 26, no. 9, 2010, pages 1135–1139

A “mutual information” query produces 325,000 hits on Google Scholar, and the first 10 papers have more than 30,065 citations.

Most of these papers are application papers, e.g. in feature selection, computer vision, medical image processing, image alignment, and data fusion. As we find better estimators, such applications can simply use them.

A “Big Data” search on Google Scholar produces 181,000 hits, and the first 10 hits have 12,872 citations.

Similarly, a “Deep Learning” search produces 106,000 hits, and the first 10 papers have 8,485 citations (as of May 28, 2017).

Developing efficient estimators for mutual information and related quantities is highly important in many applications.

How should we estimate them?

Naïve plug-in approach using density estimation:
• histogram
• kernel density estimation
• k-nearest neighbors [D. Loftsgaarden & C. Quesenberry, 1965]

Density: nuisance parameter

Density estimation: difficult, curse of dimensionality!

How can we estimate them directly, without estimating the density?

Part I

Consistent estimators for

Rényi entropy

Rényi mutual information

A large class of divergences that includes Rényi and L2

They avoid density estimation!

Part II: Generalize ML to sets and distributions

Most machine learning algorithms operate on vectorial objects.

The world is complicated. Often
• hand-crafted vectorial features are not good enough
• it is natural to work with complex inputs directly (sets or distributions...)

Dealing with complex objects:
• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on these unknown distributions represented by sets

Each galaxy can be represented by a feature vector

Classify galaxy clusters

Each cluster can be represented by a set of these vectors

We can’t concatenate the feature vectors into a huge vector

OUTLINE

Part 1: Estimators
• entropy estimation
• dependence estimation
• divergence estimation

Part 2: ML on distributions
• classification, regression, clustering, anomaly detection, low-dim embedding

Applications
• computer vision
• astronomy
• other applications

Next section: entropy estimation

ENTROPY ESTIMATION

Estimate Rényi entropy without density estimation

Rényi-α entropy estimators using kNN graphs

Calculate:

Pál, Póczos & Szepesvári. NIPS 2010
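The formula on this slide did not survive extraction. As a hedged reconstruction in the spirit of the cited paper, the estimator sums the p-th powers of the kNN-graph edge lengths, L_p, with p = d(1 − α), and returns log(L_p / (γ n^{1 − p/d})) / (1 − α). The sketch below assumes 0 < α < 1, uses brute-force neighbor search, and calibrates the constant γ by Monte Carlo on uniform samples. All function names are illustrative and not the authors' code.

```python
import math
import random


def knn_graph_length(points, k, p):
    """Sum of (edge length)**p over edges from each point to its k nearest neighbors."""
    total = 0.0
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        total += sum(r ** p for r in dists[:k])
    return total


def calibrate_gamma(n, d, alpha, k, repeats=5, seed=0):
    """Monte Carlo estimate of the constant gamma from uniform samples on the unit cube."""
    rng = random.Random(seed)
    p = d * (1.0 - alpha)
    values = []
    for _ in range(repeats):
        pts = [[rng.random() for _ in range(d)] for _ in range(n)]
        values.append(knn_graph_length(pts, k, p) / n ** (1.0 - p / d))
    return sum(values) / repeats


def renyi_entropy_knn(points, alpha, k, gamma):
    """Renyi-alpha entropy estimate (0 < alpha < 1) from the kNN graph length."""
    n, d = len(points), len(points[0])
    p = d * (1.0 - alpha)
    length = knn_graph_length(points, k, p)
    return math.log(length / (gamma * n ** (1.0 - p / d))) / (1.0 - alpha)


# Uniform sample on the unit square (true entropy 0) and the same sample scaled by 2.
rng = random.Random(1)
sample = [[rng.random(), rng.random()] for _ in range(200)]
gamma = calibrate_gamma(n=200, d=2, alpha=0.5, k=3)
h_unit = renyi_entropy_knn(sample, alpha=0.5, k=3, gamma=gamma)
h_scaled = renyi_entropy_knn([[2 * a, 2 * b] for a, b in sample], alpha=0.5, k=3, gamma=gamma)
```

Scaling every point by 2 in d = 2 dimensions should shift the estimate by exactly d·log 2, which is the behavior expected of a differential entropy; no density estimate is formed at any step.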

Theoretical Results

Pál, Póczos & Szepesvári, NIPS 2010

First high probability rate on Rényi entropy estimators.

Convergence rate

Almost surely consistent

Why is this entropy estimator consistent?

The larger the entropy, the longer the kNN graph is.

Quasi-subadditivity:

Details in Pál, Póczos & Szepesvári, NIPS 2010

OUTLINE (next section: dependence estimation)

MUTUAL INFORMATION ESTIMATION

Estimate MI without density estimation

How can we get mutual information estimators from entropy estimators?

Trick: Information is preserved under monotonic transformations.

Monotone transform ⇒ uniform margins

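Transformation to Get Uniform Margins

Monotone transformation leading to uniform margins?

Prob theory 101: if $X_i$ has a continuous distribution function $F_i$, then $F_i(X_i)$ is uniform on $[0, 1]$.

The copula transformation: $Z = (F_1(X_1), \dots, F_d(X_d))$

A little problem: we don’t know the distribution functions $F_i$…

Solution: use the empirical distribution functions (ranks are enough)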
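The rank-based copula transformation on this slide can be sketched as follows. This is a minimal illustration, assuming the data are a list of d-dimensional points without ties; the function name `empirical_copula_transform` is hypothetical. Each coordinate is replaced by its rank within that coordinate, divided by n, which is the empirical distribution function evaluated at the sample points.

```python
import math
import random


def empirical_copula_transform(points):
    """Replace every coordinate by its rank within that coordinate, scaled to (0, 1]."""
    n, d = len(points), len(points[0])
    transformed = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Indices of the points sorted by their j-th coordinate.
        order = sorted(range(n), key=lambda i: points[i][j])
        for rank, i in enumerate(order, start=1):
            transformed[i][j] = rank / n
    return transformed


rng = random.Random(0)
x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(6)]
z = empirical_copula_transform(x)

# Strictly increasing transforms of each coordinate leave the ranks unchanged.
x_warped = [[math.exp(a), b ** 3] for a, b in x]
z_warped = empirical_copula_transform(x_warped)
```

Because only ranks enter the transformed sample, any estimator applied to `z` is automatically invariant to strictly increasing transformations of the individual coordinates.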

Sklar’s Theorem, 1959

F is a composition of its copula C and the marginals.

The copula couples the joint distribution to its margins, and preserves all the dependencies.

[Figure: the joint distribution decomposed into its copula distribution and its margins]
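For reference, Sklar's theorem is usually stated as below. The symbols are the standard ones and are assumed here rather than taken from the slide: F is the joint distribution function of (X_1, …, X_d), F_i are the marginal distribution functions, and C is the copula.

```latex
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)
```

When the marginals are continuous, C is unique and is the joint distribution function of $(F_1(X_1), \dots, F_d(X_d))$, a random vector whose margins are all uniform on $[0, 1]$.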

Copula based methods are popular in financial analysis.

So popular and powerful that they led to the global financial crisis…

It is time to make them more popular in machine learning too!…

REGO: Rank-based Estimation of Rényi information using Euclidean Graph Optimization

1st direct, consistent Rényi mutual information estimator

Consistency theorem:

Other Euclidean graphs: TSP, MST, Minimal Matching, ...

Póczos, Kirshner & Szepesvári. AI and Statistics, 2010.
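The link between the entropy estimator and the mutual information estimator can be written out as follows. This is a sketch using standard definitions that are assumed here, not copied from the slide: f is the joint density of X with marginal densities f_i, and c is the density of the copula-transformed vector Z = (F_1(X_1), …, F_d(X_d)) on the unit cube.

```latex
I_\alpha(X) = \frac{1}{\alpha - 1} \log \int f(x)^{\alpha}
              \Bigl(\prod_{i=1}^{d} f_i(x_i)\Bigr)^{1-\alpha} \, dx,
\qquad
H_\alpha(Z) = \frac{1}{1 - \alpha} \log \int_{[0,1]^d} c(z)^{\alpha} \, dz,
\qquad
I_\alpha(X) = I_\alpha(Z) = -H_\alpha(Z)
```

The first equality in the last expression holds because Rényi mutual information is invariant under strictly increasing transformations of the individual coordinates; the second holds because the margins of Z are uniform, so the product of marginal densities equals 1 on the unit cube. Applying a Rényi entropy estimator to the rank-transformed sample and flipping the sign therefore gives a mutual information estimate.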

Convergence Rate

Pál, Póczos & Szepesvári, NIPS 2010

Robustness to Outliers

The amount of change caused by adding one outlier x

It cannot be arbitrarily big!

Póczos, Kirshner & Szepesvári, AI and Statistics, 2010

Empirical mean:

REGO:

OUTLINE (next section: divergence estimation)

DIVERGENCE ESTIMATION

Estimate divergence without density estimation

The Estimator
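The estimator's formula is missing from the transcript. The sketch below is a hedged reconstruction of a k-NN based Rényi divergence estimator of the kind described in Póczos & Schneider (AISTATS 2011). For each X_i it compares ρ_k(i), the distance to its k-th nearest neighbor among the other X points, with ν_k(i), the distance to its k-th nearest neighbor among the Y points, and multiplies by a Gamma-function constant that removes the asymptotic bias. The brute-force search and all function names are assumptions made for illustration.

```python
import math
import random


def kth_nn_distance(x, points, k, skip_index=None):
    """Distance from x to its k-th nearest neighbor among points."""
    dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != skip_index)
    return dists[k - 1]


def renyi_divergence_knn(xs, ys, alpha, k):
    """Estimate the Renyi-alpha divergence of p (sample xs) from q (sample ys)."""
    n, m, d = len(xs), len(ys), len(xs[0])
    # Bias correction; requires k > |alpha - 1| so both Gamma arguments are positive.
    b = math.gamma(k) ** 2 / (math.gamma(k - alpha + 1) * math.gamma(k + alpha - 1))
    total = 0.0
    for i, x in enumerate(xs):
        rho = kth_nn_distance(x, xs, k, skip_index=i)  # within-sample distance
        nu = kth_nn_distance(x, ys, k)                 # cross-sample distance
        total += ((n - 1) * rho ** d / (m * nu ** d)) ** (1.0 - alpha)
    integral = b * total / n  # estimates the integral of p^alpha * q^(1 - alpha)
    return math.log(integral) / (alpha - 1.0)


rng = random.Random(0)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
ys_same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
ys_shifted = [[a + 3.0, b + 3.0] for a, b in ys_same]

r_same = renyi_divergence_knn(xs, ys_same, alpha=0.8, k=5)
r_shifted = renyi_divergence_knn(xs, ys_shifted, alpha=0.8, k=5)
```

With both samples drawn from the same Gaussian the estimate should be near zero, and it should grow clearly once one sample is shifted. The ratio of k-NN distances plays the role of a density ratio, but k stays fixed, so no consistent density estimate is ever formed.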

Asymptotically Unbiased

We need to prove:

The estimator

Normalized k-NN distances converge to the Erlang distribution

Póczos & Schneider, AISTATS 2011

All we need is

A little problem…

Asymptotic uniform integrability… Solutions:

1 2 3

Be careful, mistakes are easy to make!

Strong law of large numbers [NIPS]

Need:

Appendix of Póczos & Schneider, AISTATS 2011

Be careful, some mistakes are easy to make…

We want:

Helly–Bray theorem

[Annals of Statistics]

Enough:

Fatou lemma:

Fatou lemma:

[Journal of Nonparametric Statistics, Problems of Information Transmission,

IEEE Trans. on Information Theory]

OUTLINE (next section: ML on distributions: classification, regression, clustering, anomaly detection, low-dim embedding)

ML on Distributions

Dealing with complex objects:
• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on these unknown distributions represented by sets

Machine Learning on Distributions

Many ML algorithms only require
• the pairwise distances between the inputs
• the inner products between the inputs

If we can estimate divergences and inner products between distributions, then we can construct ML algorithms that operate on distributions:
• Classification
• Regression
• Low-dimensional embedding
• Anomaly detection
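To illustrate why pairwise distances are enough, here is a minimal sketch of a k-nearest-neighbor classifier that never looks at the inputs themselves, only at distances to the training items. It assumes the distances (for example, estimated divergences between sample sets) have already been computed; the function name `knn_predict` and the numbers are made up for the example.

```python
from collections import Counter


def knn_predict(dist_to_train, train_labels, k=3):
    """Majority vote among the k training items closest to the query."""
    nearest = sorted(range(len(train_labels)), key=lambda i: dist_to_train[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Distances from one query distribution to five training distributions.
train_labels = [1, 0, 1, 0, 1]
pred_a = knn_predict([0.1, 2.0, 0.3, 1.5, 0.2], train_labels, k=3)
pred_b = knn_predict([2.0, 0.1, 1.9, 0.2, 3.0], train_labels, k=1)
```

Any divergence estimate between sample sets can be supplied as the distance, which is what lets the same classifier operate on distributions instead of vectors.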

Distribution Regression / Classification

[Figure: training distributions P1, …, Pm with labels Y1 = 1, Y2 = 0, Y3 = 1, …, Ym = 0; the label of a new distribution Pm+1 is unknown]

Differences compared to standard methods on vectors:
• The inputs are distributions, density functions (not vectors)
• We don’t know these distributions, only sample sets are available (error in variables model)

Distribution Classification

Problems:

Solution: Use RKHS based SVM!

Dual form of SVM:

Calculate the Gram matrix
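The dual form is missing from the transcript. The standard soft-margin SVM dual, written with a kernel K evaluated on distributions, is given below as a reference sketch. It assumes labels $y_i \in \{-1, +1\}$ (the 0/1 labels used elsewhere in the slides would be mapped to −1/+1) and a regularization constant C; both are notational assumptions.

```latex
\max_{\alpha \in \mathbb{R}^m} \;
  \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
    \alpha_i \alpha_j \, y_i y_j \, K(P_i, P_j)
\quad \text{subject to} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
```

The training inputs enter only through the Gram matrix entries $K(P_i, P_j)$, so it is enough to estimate those entries from the sample sets.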

Kernel Estimation

Linear kernel:

Polynomial kernel:

Gaussian kernel:

Problem: the estimated Gram matrix may be neither symmetric nor positive semidefinite (PSD).

Solution: make it symmetric, and project it to the cone of PSD matrices

We already know how!
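The symmetrize-and-project step can be sketched with NumPy as follows. This is one common way to do it, assumed here rather than taken from the slides: average the matrix with its transpose, then set the negative eigenvalues to zero, which gives the nearest PSD matrix in Frobenius norm. The function name `project_to_psd` is illustrative.

```python
import numpy as np


def project_to_psd(gram):
    """Symmetrize an estimated Gram matrix and clip its negative eigenvalues to zero."""
    sym = 0.5 * (gram + gram.T)
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, 0.0, None)
    # Rebuild U diag(clipped) U^T.
    return (eigvecs * clipped) @ eigvecs.T


# A noisy, asymmetric "Gram matrix" with one negative eigenvalue after symmetrization.
noisy = np.array([[1.0, 2.0],
                  [1.6, 1.0]])
k_psd = project_to_psd(noisy)
```

In this example the symmetrized matrix has eigenvalues 2.8 and −0.8; dropping the negative one leaves a matrix with every entry equal to 1.4, which a kernel method such as an SVM can then use safely.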

OUTLINE (next section: applications in computer vision)

Image Representation with Distributions

Dealing with complex objects: break into smaller parts, represent the object as a sample set of these parts.

Image patches: • Overlapping • Non-overlapping

Patch locations: • Grid points • Interesting points • Random

Patch sizes: • Same • Different • Hierarchy

Each image patch is represented by PCA compressed SIFT vectors.

SIFT = Scale-invariant feature transform. PCA: 128 dim ⇒ d dim

Each image is represented as a set of these d dim feature vectors (d-dimensional sample set representation of the image).

Each set is considered as a sample set from some unknown distribution.

Detecting Anomalous Images

B. Póczos, L. Xiong & J. Schneider, UAI, 2011.

50 highway images

5 anomalies

2-dimensional sample set representation of images (128 dim SIFT ⇒ 2 dim)

Anomaly score: divergences between the distributions of these sample sets

Detecting Anomalous Images

[Figure: images numbered 1 to 10 and 51 to 55]

2-dimensional sample set representation

GMM-5 Density Approximation

[Figure: panels for images 1 to 10 and 51 to 55]

Noisy USPS Dataset Classification with SDM

Each instance (image) is a set of 500 2d points

1000 training and 1000 test instances

Results:
• SVM on raw images: 82.1 ± 0.5% accuracy
• SDM on the 2D distributions, Rényi divergence: 96.0 ± 0.3% accuracy

Original (noiseless) USPS dataset is easy: ~97%

Multidimensional Scaling of USPS Data

10 instances from figures 1,2,3,4.

Calculate pairwise Euclidean distances.

Nonlinear embedding with MDS into 2d.

[Figure panels: raw images using Euclidean distance; estimated Euclidean distance between the distributions]

Local Linear Embedding of Distributions

72 rotated COIL froggies; edge detected COIL froggy

[Figure panels: Euclidean distance between images; Euclidean distance between distributions]

Object Classification: ETH-80 [Leibe and Schiele, 2003]

8 categories, 400 images, each image is represented by 576 18-dimensional points

2-fold CV, 16 runs

BoW: 88.9%

NPR: 90.1%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

Outdoor Scenes Classification [Oliva and Torralba, 2001]

8 categories, 2688 images, each represented by 1815 53-dimensional points.

Categories: coast, mountain, country, forest, street, highway, city, tall building

10-fold CV, 16 runs

Best published: 91.57% (Qin and Yung, ICMV 2010)

NPR: 92.3%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

Sport Events Classification [Li and Fei-Fei, 2007]

8 categories, 1040 images, each represented by 295 to 1542 57-dimensional points.

Categories: badminton, bocce, croquet, polo, sailing, climbing, rowing, snowboard

2-fold CV, 16 runs

Best published: 86.7% (Zhang et al., CVPR 2011)

NPR: 87.1%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

OUTLINE (next section: applications in astronomy)

Find new scientific laws in physics

Goal: Estimate the dynamical mass of galaxy clusters.

Importance: Galaxy clusters are the largest gravitationally bound systems in the Universe. Dynamical mass measurements are important for understanding the behavior of dark matter and normal matter.

Difficulty: We can only measure the velocities of galaxies, not the mass of their cluster.

Physicists estimate the dynamical cluster mass from a single velocity dispersion.

Our method: Estimate the cluster mass from the whole distribution of velocities rather than just a single velocity dispersion.

Find new scientific laws in physics

Michelle Ntampaka et al., A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters, ApJ 2015

Find interesting Galaxy Clusters

What are the most anomalous galaxy clusters?

Sloan Digital Sky Survey (SDSS):
• continuum spectrum
• 505 galaxy clusters (10-50 galaxies in each)
• 7530 galaxies

The most anomalous galaxy cluster contains mostly
• star forming blue galaxies
• irregular galaxies

[Figure: a blue galaxy and a red galaxy. Credits: ESA, NASA]

B. Póczos, L. Xiong & J. Schneider, UAI, 2011.

Find the parameters of the Universe

Given a distribution of particles, our goal is to predict the parameters of the simulated universe.

OUTLINE (next section: other applications)

Understanding Turbulence

Credits: ESA, NASA, PPPL, Wikipedia

Turbulence Data Classification

Simulated fluid flow through time (JHU Turbulence Research Group)

Goal: find interesting events, patterns, phenomena; in particular, find vortices!

[Figure: velocity distributions of one positive (vortex) example and two negative examples]

• 11 positive, 20 negative examples
• Results: leave-one-out cross-validation: 97%

Finding Vortices

Classification probabilities

Find Interesting Phenomena in Turbulence Data

Anomaly scores

Anomaly detection with 1-class SDM

Vorticity Scores

Find Interesting Phenomena in Turbulence Data

Xiong, Póczos, and Schneider, NIPS 2011.

Agriculture

Surrogate robotic system in the field

Surrogate robotic system in the field

The surrogate system collecting data at the TAMU field site. The carriage supports two boom assemblies, each of which carries a sensor pod. The carriage slides up and down on the column, allowing full scanning of a plant.

Surrogate robotic system in the field

The carriage/dual-boom assembly moves up and down the column at a constant scanning speed. At its highest travel point the assembly clears the canopy (right).

Data collection with sensor pods

A sensor pod is deployed into a row and scans a plant

Name                     Range     RMSE error
Leaf angle*              75.94     3.30 (4.35%)
Leaf radiation angle*    120.66    4.34 (3.60%)
Leaf length*             35.00     0.87 (2.49%)
Leaf width [max]         3.61      0.27 (7.48%)
Leaf width [average]     2.99      0.21 (7.02%)
Leaf area*               133.45    8.11 (6.08%)
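The percentages in the table appear to be the RMSE divided by the range of each trait. That reading is an assumption, but it reproduces every printed value, as the short check below shows (the helper name `relative_rmse_percent` is made up for this example).

```python
def relative_rmse_percent(value_range, rmse):
    """RMSE expressed as a percentage of the trait's range."""
    return 100.0 * rmse / value_range


# (range, RMSE) pairs copied from the table.
table = {
    "Leaf angle": (75.94, 3.30),
    "Leaf radiation angle": (120.66, 4.34),
    "Leaf length": (35.00, 0.87),
    "Leaf width [max]": (3.61, 0.27),
    "Leaf width [average]": (2.99, 0.21),
    "Leaf area": (133.45, 8.11),
}
percentages = {
    name: round(relative_rmse_percent(value_range, rmse), 2)
    for name, (value_range, rmse) in table.items()
}
```

Under this reading, the largest relative errors are for the two leaf width measurements (about 7%), and the smallest is for leaf length (about 2.5%).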

Extensions

L2 divergence:

Conditional Rényi Mutual Information:

[Figure: variables x, y, z]

B. Póczos & J. Schneider, AISTATS, 2012.

Rates for kNN Estimators

Problem Setting

Notation

K-NN Distances

k-NN density estimation
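The slide body is missing from the transcript. A common form of the k-NN density estimate, given here as a hedged sketch rather than the slide's exact definition, is p̂_k(x) = k / (n · c_d · ε_k(x)^d), where ε_k(x) is the distance from x to its k-th nearest sample point and c_d is the volume of the d-dimensional unit ball. Some references use k − 1 in the numerator instead of k. Function names are illustrative.

```python
import math
import random


def unit_ball_volume(d):
    """Volume of the unit Euclidean ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_density(x, sample, k):
    """k-NN density estimate at x: k / (n * c_d * eps_k(x)**d)."""
    n, d = len(sample), len(x)
    eps = sorted(math.dist(x, y) for y in sample)[k - 1]
    return k / (n * unit_ball_volume(d) * eps ** d)


# Uniform sample on the unit square: the true density at the center is 1.
rng = random.Random(0)
sample = [[rng.random(), rng.random()] for _ in range(2000)]
estimate = knn_density([0.5, 0.5], sample, k=50)
```

For the estimate to be consistent, k has to grow with n, which is the usual regime for plug-in estimators built on a density estimate.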

k-NN density Estimator Properties

The Plug-In estimator:

Plug-In Estimator Properties

Issues with the Plug-In Estimator

Fixed-k functional estimators

Bias Correction

Bias Correction

That is,

Bias Correction

Bias Correction for Divergences:

Known Bias Correction Functions

Bias Correction: Entropy Special Case
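The slide content is missing. For Shannon entropy, the best-known fixed-k estimator with bias correction is the Kozachenko-Leonenko estimator, sketched below as an assumed illustration of this special case: plugging a fixed-k density estimate into −log p gives a bias of log k − ψ(k), which is removed by using the digamma function ψ in place of the logarithm. The form used here is ψ(n) − ψ(k) + log c_d + (d/n) Σ log ε_k(i); other references differ in minor conventions, and the function names are illustrative.

```python
import math
import random


def digamma_int(k):
    """Digamma at a positive integer: psi(k) = -gamma_Euler + sum_{j=1}^{k-1} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, k))


def kl_entropy(points, k):
    """Kozachenko-Leonenko style fixed-k estimate of Shannon differential entropy."""
    n, d = len(points), len(points[0])
    # Log-volume of the d-dimensional unit ball.
    log_c_d = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_eps_sum = 0.0
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        log_eps_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + log_c_d + d * log_eps_sum / n


# Standard normal in one dimension: true entropy is 0.5 * log(2 * pi * e), about 1.4189.
rng = random.Random(0)
gauss_sample = [[rng.gauss(0, 1)] for _ in range(400)]
h_hat = kl_entropy(gauss_sample, k=3)
h_true = 0.5 * math.log(2 * math.pi * math.e)
```

Here k stays fixed at 3 as n grows, so the underlying density estimate is not consistent, yet the corrected entropy estimate still lands close to the true value.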

Discussion of Assumptions

Here, we discuss some of these challenges, motivating the assumptions we make to overcome them.

Discussion of Assumptions

For these reasons, it has been common in the analysis of k-NN estimators to make the following assumption:

Discussion of Assumptions

Discussion of Assumptions

(A4) is a Mild Assumption

Concentration of k-NN Distances

Concentration of k-NN Distances

Concentration Corollary:

Main results: Bias Bound:

Main results: Variance Bound

Main results: MSE Bound

Benefits

Compared to plug-in estimators, fixed-k estimators:
• are faster to compute,
• can also exhibit superior rates of convergence

Benefits

Main result: Under some conditions

Take Me Home!

Part 1: Estimators (entropy estimation, dependence estimation, divergence estimation)
• direct, consistent estimators, rates
• 1st Rényi MI estimator: robust, rank statistics only
• 1st divergence estimators

Part 2: ML on distributions (classification, regression, clustering, anomaly detection, low-dim embedding)
• Support Distribution Machines

Applications (computer vision, astronomy, other applications)
• Outperforms state-of-the-art results in CV benchmarks
• Solves new problems in Astronomy, Turbulence data analysis, Agriculture