distinguishing distributions with maximum testing powerszabo/talks/invited_talk/... · maximize...
TRANSCRIPT
![Page 1: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/1.jpg)
Distinguishing Distributions with MaximumTesting Power
Zoltan Szabo (Gatsby Unit, UCL)
Wittawat Jitkrittum Kacper Chwialkowski Arthur Gretton
Realeyes, Budapest
August 24, 2016
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 2: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/2.jpg)
Contents
Motivating examples: NLP, computer vision.
Two-sample test: t-test Ñ distribution features.
Linear-time, interpretable, high-power, nonparametric t-test.
Numerical illustrations.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 3: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/3.jpg)
Motivating examples
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 4: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/4.jpg)
Motivating example-1: NLP
Given: two categories of documents (Bayesian inference,neuroscience).
Task:
test their distinguishability,most discriminative words Ñ interpretability.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 5: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/5.jpg)
Motivating example-2: computer vision
Given: two sets of faces (happy, angry).
Task:
check if they are different,determine the most discriminative features/regions.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 6: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/6.jpg)
One-page summary
Contribution:
We propose a nonparametric t-test.
It gives a reason why H0 is rejected.
It has high test power.
It runs in linear time.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 7: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/7.jpg)
One-page summary
Contribution:
We propose a nonparametric t-test.
It gives a reason why H0 is rejected.
It has high test power.
It runs in linear time.
Dissemination, code:
NIPS-2016 [Jitkrittum et al., 2016]: full oral = top 1.84%.
https://github.com/wittawatj/interpretable-test.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 8: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/8.jpg)
Two-sample test, distribution features
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 9: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/9.jpg)
What is a two-sample test?
Given:
X “ txiuni“1
i .i .d.„ P, Y “ tyjunj“1
i .i .d.„ Q.
Example: xi = i th happy face, yj = j th sad face.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 10: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/10.jpg)
What is a two-sample test?
Given:
X “ txiuni“1
i .i .d.„ P, Y “ tyjunj“1
i .i .d.„ Q.
Example: xi = i th happy face, yj = j th sad face.
Problem: using X , Y test
H0 : P “ Q, vs
H1 : P ‰ Q.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 11: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/11.jpg)
What is a two-sample test?
Given:
X “ txiuni“1
i .i .d.„ P, Y “ tyjunj“1
i .i .d.„ Q.
Example: xi = i th happy face, yj = j th sad face.
Problem: using X , Y test
H0 : P “ Q, vs
H1 : P ‰ Q.
Assume X ,Y Ă Rd .
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 12: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/12.jpg)
Ingredients of two-sample test
Test statistic: λn “ λnpX ,Y q, random.Significance level: α “ 0.01.Under H0: PH0
p λn ď Tαlooomoooncorrectly accepting H0
q “ 1 ´ α.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 13: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/13.jpg)
Ingredients of two-sample test
Test statistic: λn “ λnpX ,Y q, random.Significance level: α “ 0.01.Under H0: PH0
p λn ď Tαlooomoooncorrectly accepting H0
q “ 1 ´ α.
Under H1: PH1pTα ă λnq “ Ppcorrectly rejecting H0q =: power.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 14: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/14.jpg)
Towards representations of distributions: EX
Given: 2 Gaussians with different means.
Solution: t-test.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 15: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/15.jpg)
Towards representations of distributions: EX 2
Setup: 2 Gaussians; same means, different variances.
Idea: look at 2nd-order features of RVs.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 16: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/16.jpg)
Towards representations of distributions: EX 2
Setup: 2 Gaussians; same means, different variances.
Idea: look at 2nd-order features of RVs.
ϕx “ x2 ñ difference in EX 2.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 17: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/17.jpg)
Towards representations of distributions: further moments
Setup: a Gaussian and a Laplacian distribution.
Challenge: their means and variances are the same.
Idea: look at higher-order features.
Let us consider feature/distribution representations!
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 18: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/18.jpg)
Kernel: similarity between features
Given: x and x1 objects (images or texts).
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 19: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/19.jpg)
Kernel: similarity between features
Given: x and x1 objects (images or texts).
Question: how similar they are?
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 20: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/20.jpg)
Kernel: similarity between features
Given: x and x1 objects (images or texts).
Question: how similar they are?
Define features of the objects:
ϕx : features of x,
ϕx1 : features of x1.
Kernel: inner product of these features
kpx, x1q :“ 〈ϕx, ϕx1〉 .
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 21: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/21.jpg)
Kernel examples on Rd (γ ą 0, p P Z`)
Polynomial kernel:
kpx, yq “ p〈x, y〉 ` γqp .
Gaussian kernel:
kpx, yq “ e´γ}x´y}22 .
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 22: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/22.jpg)
Towards distribution features
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 23: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/23.jpg)
Towards distribution features
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 24: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/24.jpg)
Towards distribution features
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 25: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/25.jpg)
Towards distribution features
{MMD2pP,Qq “ ĚKP,P ` ĘKQ,Q ´ 2ĘKP,Q (without diagonals in ĚKP,P, ĘKQ,Q)
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 26: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/26.jpg)
Kernel Ñ distribution feature
Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 27: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/27.jpg)
Kernel Ñ distribution feature
Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.
Feature of P (mean embedding):
µP :“ Ex„Prϕxs.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 28: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/28.jpg)
Kernel Ñ distribution feature
Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.
Feature of P (mean embedding):
µP :“ Ex„Prϕxs.
Previous quantity: unbiased estimate of
MMD2pP,Qq “ }µP ´ µQ}2 .
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 29: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/29.jpg)
Kernel Ñ distribution feature
Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.
Feature of P (mean embedding):
µP :“ Ex„Prϕxs.
Previous quantity: unbiased estimate of
MMD2pP,Qq “ }µP ´ µQ}2 .
Valid test [Gretton et al., 2012]. Challenges:
1 Threshold choice: ’ugly’ asymptotics of n {MMD2pP,Pq.2 Test statistic: quadratic time complexity.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 30: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/30.jpg)
Linear-time tests
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 31: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/31.jpg)
Linear-time 2-sample test
Recall:
MMD2pP,Qq “ }µP ´ µQ}2Hpkq .
Changing [Chwialkowski et al., 2015] this to
ρ2pP,Qq :“ 1
J
Jÿ
j“1
rµPpvj q ´ µQpvj qs2.
with random tvjuJj“1 test locations
ρ is a metric (a.s.). How do we estimate it? Distribution under H0?
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 32: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/32.jpg)
Estimation
Estimate
{ρ2pP,Qq “ 1
J
Jÿ
j“1
rµPpvj q ´ µQpvj qs2,
where µPpvq “ 1n
řni“1 kpxi , vq. Using kpx, vq “ e´ }x´v}2
2σ2 ,
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 33: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/33.jpg)
Estimation – continued
{ρ2pP,Qq “ 1
J
Jÿ
j“1
rµPpvj q ´ µQpvj qs2
“ 1
J
Jÿ
j“1
«1
n
nÿ
i“1
kpxi , vj q ´ 1
n
nÿ
i“1
kpyi , vj qff2
“ 1
J
Jÿ
j“1
pznq2j “ 1
JzTn zn,
where zn “ 1n
řni“1 rkpxi , vjq ´ kpyi , vj qsJj“1loooooooooooooomoooooooooooooon
“:zi
P RJ .
Good news: estimation is linear in n!
Bad news: intractable null distr. =?n {ρ2pP,Pq wÝÑ sum of J
correlated χ2.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 34: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/34.jpg)
Normalized version gives tractable null
Modified test statistic:
λn “ nzTn Σ´1n zn,
where Σn “ covptzi ui q.Under H0:
λnwÝÑ χ2pJq. ñ Easy to get the p1 ´ αq-quantile!
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 35: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/35.jpg)
Our idea
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 36: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/36.jpg)
Idea
Until this point: test locations (V) are fixed.
Instead: choose θ “ tV, σu to
maximize lower bound on the test power.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 37: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/37.jpg)
Idea
Until this point: test locations (V) are fixed.
Instead: choose θ “ tV, σu to
maximize lower bound on the test power.
Theorem (Lower bound on power)
For large n, test power ě Lpλnq; L: explicit function, increasing.
Here,
λn “ nµTΣ
´1µ: population version of λn.
µ “ Exyrz1s, Σ “ Exy
“pz1 ´ µqpz1 ´ µqT
‰.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 38: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/38.jpg)
Convergence of the λn estimator
Training objective λnpXtr ,Ytr q converges to λn.
But λn is unknown.Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq. Use λnpXtr ,Ytr q « λn.
Theorem (Guarantee on objective approximation)ˇsupV ,K zTn pΣn ` γnq´1zn ´ supV ,Kµ
TΣ
´1µ
ˇ“ O
´n´ 1
4
¯.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 39: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/39.jpg)
Convergence of the λn estimator
Training objective λnpXtr ,Ytr q converges to λn.
But λn is unknown.Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq. Use λnpXtr ,Ytr q « λn.
Theorem (Guarantee on objective approximation)ˇsupV ,K zTn pΣn ` γnq´1zn ´ supV ,Kµ
TΣ
´1µ
ˇ“ O
´n´ 1
4
¯.
Examples:
K “ tkσpx, yq “ e´}x´y}2 : σ ą 0u,K “ tkApx, yq “ e´px´yqTApx´yq : A ą 0u.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 40: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/40.jpg)
Numerical demos
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 41: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/41.jpg)
Parameter settings
Gaussian kernel (σ). α “ 0.01. J “ 1. Repeat 500 trials.Report
PprejectH0q « #times λn ą Tα holds
#trials.
Compare 4 methods
ME-full: Optimize V and Gaussian bandwidth σ.ME-grid: Optimize σ. Fix V [Chwialkowski et al., 2015].MMD-quad: Test with quadratic-time MMD [Gretton et al., 2012].MMD-lin: Test with linear-time MMD [Gretton et al., 2012].
Optimize kernels to power in MMD-lin, MMD-quad.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 42: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/42.jpg)
NLP: discrimination of document categories
5903 NIPS papers (1988-2015).Keyword-based category assignment into 4 groups:
Bayesian inference, Deep learning, Learning theory, Neuroscience
d “ 2000 nouns. TF-IDF representation.
Problem nte ME-full ME-grid MMD-quad MMD-lin
1. Bayes-Bayes 215 .012 .018 .022 .008
2. Bayes-Deep 216 .954 .034 .906 .262
3. Bayes-Learn 138 .990 .774 1.00 .238
4. Bayes-Neuro 394 1.00 .300 .952 .972
5. Learn-Deep 149 .956 .052 .876 .500
6. Learn-Neuro 146 .960 .572 1.00 .538
Performance of ME-full rOpnqs is comparable to MMD-quad rOpn2qs.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 43: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/43.jpg)
NLP: most/least discriminative words
Aggregating over trials; example: ’Bayes-Neuro’.
Most discriminative words:
spike, markov, cortex, dropout, recurr, iii, gibb.
learned test locations: highly interpretable,’markov’, ’gibb’ (ð Gibbs): Bayesian inference,’spike’, ’cortex’: key terms in neuroscience.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 44: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/44.jpg)
NLP: most/least discriminative words
Aggregating over trials; example: ’Bayes-Neuro’.
Least dicriminative ones:
circumfer, bra, dominiqu, rhino, mitra, kid, impostor.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 45: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/45.jpg)
Distinguish positive/negative emotions
Karolinska Directed Emotional Faces (KDEF) [Lundqvist et al., 1998].70 actors = 35 females and 35 males.d “ 48 ˆ 34 “ 1632. Grayscale. Pixel features.
` :happy neutral surprised
´ :afraid angry disgusted
Problem nte ME-full ME-grid MMD-quad MMD-lin˘ vs. ˘ 201 .010 .012 .018 .008
` vs. ´ 201 .998 .656 1.00 .578
Learned test location (averaged) =
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 46: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/46.jpg)
Summary
We proposed a nonparametric t-test:
linear time,high-power (« ’MMD-quad’),
2 demos: discriminating
documents of different categories,positive/negative emotions.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 47: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/47.jpg)
Thank you for the attention!
Acknowledgements: This work was supported by the GatsbyCharitable Foundation.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 48: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/48.jpg)
Contents
Non-convexity, informative features.
Number of locations (J).
Computational complexity.
Estimation of MMD2.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 49: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/49.jpg)
Non-convexity, informative features
2D problem:
P :“ N pr0; 0s, Iq,Q :“ N pr1; 0s, Iq.
V “ tv1, v2u.Fix v1 to ▲.
Contour plot ofv2 ÞÑ λnptv1, v2uq.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 50: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/50.jpg)
Number of locations (J)
Small J:
often enough to detect the difference of P & Q.few distinguishing regions to reject H0.faster test.
Very large J:
test power need not increase monotonically in J (morelocations ñ statistic can gain in variance).defeats the purpose of a linear-time test.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 51: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/51.jpg)
Computational complexity
Optimization & testing: linear in n.
Testing: O`ndJ ` nJ2 ` J3
˘.
Optimization: O`ndJ2 ` J3
˘per gradient ascent.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 52: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/52.jpg)
Estimation of MMD2
Squared difference between feature means:
MMD2pP,Qq “ }µP ´ µQ}2H
“ 〈µP ´ µQ, µP ´ µQ〉H“ 〈µP, µP〉H ` 〈µQ, µQ〉H ´ 2 〈µP, µQ〉H“ EP,Pkpx, x1q ` EQ,Qkpy, y1q ´ 2EP,Qkpx, yq.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 53: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/53.jpg)
Estimation of MMD2
Squared difference between feature means:
MMD2pP,Qq “ }µP ´ µQ}2H
“ 〈µP ´ µQ, µP ´ µQ〉H“ 〈µP, µP〉H ` 〈µQ, µQ〉H ´ 2 〈µP, µQ〉H“ EP,Pkpx, x1q ` EQ,Qkpy, y1q ´ 2EP,Qkpx, yq.
Unbiased empirical estimate for txi uni“1 „ P, tyjunj“1 „ Q:
{MMD2pP,Qq “ ĚKP,P ` ĘKQ,Q ´ 2ĘKP,Q.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 54: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/54.jpg)
Chwialkowski, K., Ramdas, A., Sejdinovic, D., and Gretton, A.(2015).Fast Two-Sample Testing with Analytic Representations ofProbability Measures.In Neural Information Processing Systems (NIPS), pages1981–1989.
Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B., andSmola, A. (2012).A kernel two-sample test.Journal of Machine Learning Research, 13:723–773.
Jitkrittum, W., Szabo, Z., Chwialkowski, K., and Gretton, A.(2016).Interpretable distribution features with maximum testingpower.In Neural Information Processing Systems (NIPS).(accepted).
Lundqvist, D., Flykt, A., and Ohman, A. (1998).
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power
![Page 55: Distinguishing Distributions with Maximum Testing Powerszabo/talks/invited_talk/... · maximize lower bound on the test power. Theorem (Lower bound on power) Forlargen,testpowerě](https://reader035.vdocuments.mx/reader035/viewer/2022071110/5fe5651704883f628f113c5e/html5/thumbnails/55.jpg)
The Karolinska directed emotional faces-KDEF.Technical report, ISBN 91-630-7164-9.
Zoltan Szabo Distinguishing Distributions with Maximum Testing Power