AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Corinna Cortes1, Xavier Gonzalvo1, Vitaly Kuznetsov1, Mehryar Mohri2,1, Scott Yang2
1Google Research
2Courant Institute
ICML, 2017 / Presenter: Anant Kharkar
Outline
1 Introduction: Motivation & Contributions
2 Theoretical Background: Regularizer Derivation; Generalization Bounds
3 AdaNet: Basics; Algorithm; Results; Variations
Motivation & Contributions
Motivation
Lack of theory behind model architecture design
Usually requires domain knowledge
Contributions
New regularizer that learns parameters & architecture simultaneously
Additive model
Context
Strongly convex optimization problems
Generic network structure (not just MLP)
Regularizer Derivation
Function families:

\[ \mathcal{H}_1 = \left\{ x \mapsto u \cdot \Psi(x) \,:\, u \in \mathbb{R}^{n_0},\ \|u\|_p \le \Lambda_{1,0} \right\} \]

\[ \mathcal{H}_k = \left\{ x \mapsto \sum_{s=1}^{k-1} u_s \cdot (\varphi_s \circ h_s)(x) \,:\, u_s \in \mathbb{R}^{n_s},\ \|u_s\|_p \le \Lambda_{k,s},\ h_s \in \mathcal{H}_s \right\} \]

Definitions:
x: input
u: connection weights to previous layers
Ψ: feature vector
φ: activation function
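To make the nested structure concrete, here is a minimal numpy sketch (dimensions, norm bounds, and activation are illustrative assumptions; Ψ is taken as the identity feature map and p = 2): a unit in H_2 mixes the activated outputs of norm-bounded units in H_1.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 4, 3        # input dim, number of layer-1 units (assumed values)
Lam = 1.0            # norm bound Lambda
phi = np.tanh        # activation function

def project(u, lam, p=2):
    """Project u onto the l_p ball of radius lam (p = 2 for simplicity)."""
    nrm = np.linalg.norm(u, ord=p)
    return u if nrm <= lam else u * (lam / nrm)

# Units in H_1: x -> u . Psi(x), with Psi = identity and ||u||_p <= Lam
U1 = np.stack([project(rng.normal(size=n0), Lam) for _ in range(n1)])

def h1(x):
    return U1 @ x            # outputs of the n1 layer-1 units

# A unit in H_2: x -> u . (phi o h1)(x), i.e. norm-bounded weights over
# the activated outputs of lower-layer units
u2 = project(rng.normal(size=n1), Lam)

def h2(x):
    return u2 @ phi(h1(x))

x = rng.normal(size=n0)
out = h2(x)
```

Because |tanh| ≤ 1 and ‖u2‖_2 ≤ Λ, the unit's output is bounded by Λ√n1, which is the kind of control the norm constraints buy.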
Generalization Bounds
Rademacher Complexity:

\[ \widehat{\mathfrak{R}}_S(G) = \frac{1}{m}\, \mathbb{E}_{\sigma} \left[ \sup_{h \in G} \sum_{i=1}^{m} \sigma_i h(x_i) \right] \]

Definitions:
x: input
h: function expressed by a single unit
σ_i: i.i.d. Rademacher random variables, uniform on {−1, +1}
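As an illustration (not from the slides), the empirical Rademacher complexity can be estimated by Monte Carlo over draws of σ. For the norm-bounded linear class G = {x ↦ u·x : ‖u‖_2 ≤ Λ}, the inner supremum has the closed form Λ‖Σ_i σ_i x_i‖_2 by duality, so only the expectation needs sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, Lam = 200, 5, 1.0
X = rng.normal(size=(m, d))      # sample S = (x_1, ..., x_m)

def empirical_rademacher(X, Lam, n_draws=2000, rng=rng):
    """Monte Carlo estimate of R_S(G) for G = {x -> u.x : ||u||_2 <= Lam}.

    For this class, sup_{||u||_2 <= Lam} sum_i s_i (u . x_i)
    = Lam * ||sum_i s_i x_i||_2 (dual norm).
    """
    m = X.shape[0]
    total = 0.0
    for _ in range(n_draws):
        s = rng.choice([-1.0, 1.0], size=m)   # Rademacher variables
        total += Lam * np.linalg.norm(s @ X)  # closed-form supremum
    return total / (n_draws * m)

est = empirical_rademacher(X, Lam)
```

For standard normal inputs the estimate comes out near Λ√(d/m), matching the familiar O(1/√m) decay of linear-class complexity.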
Theorem 1:

\[ R(f) \le \widehat{R}_{S,\rho}(f) + \frac{4}{\rho} \sum_{k=1}^{l} \|w_k\|_1\, \mathfrak{R}_m(\widetilde{\mathcal{H}}_k) + \frac{2}{\rho} \sqrt{\frac{\log l}{m}} + C(\rho, l, m, \delta) \]

\[ C(\rho, l, m, \delta) = \sqrt{ \left\lceil \frac{4}{\rho^2} \log\!\left(\frac{\rho^2 m}{\log l}\right) \right\rceil \frac{\log l}{m} + \frac{\log\frac{2}{\delta}}{2m} } \]

The term \(\frac{4}{\rho} \sum_{k=1}^{l} \|w_k\|_1\, \mathfrak{R}_m(\widetilde{\mathcal{H}}_k)\) is a weighted sum of Rademacher complexities.
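To get a feel for the bound's additive penalties, C(ρ, l, m, δ) can be evaluated numerically (a quick illustration with made-up values of ρ, l, m, δ; not from the slides). It decays roughly like √(log l / m) as the sample size grows:

```python
import math

def C(rho, l, m, delta):
    """C(rho, l, m, delta) from Theorem 1, as transcribed above (requires l >= 2)."""
    ceil_term = math.ceil((4.0 / rho**2) * math.log(rho**2 * m / math.log(l)))
    return math.sqrt(ceil_term * math.log(l) / m
                     + math.log(2.0 / delta) / (2.0 * m))

# The penalty shrinks as the sample size m grows (depth l = 3, rho = 0.5 fixed)
values = [C(rho=0.5, l=3, m=m, delta=0.05) for m in (1_000, 10_000, 100_000)]
```

With these values the penalty drops from roughly 0.31 at m = 1,000 to about 0.04 at m = 100,000, so for realistic sample sizes the complexity term Σ_k ‖w_k‖_1 R_m(H̃_k) dominates the bound.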
Corollary 1:

\[ R(f) \le \widehat{R}_{S,\rho}(f) + \frac{2}{\rho} \sum_{k=1}^{l} \|w_k\|_1 \left[ \bar{r}_\infty \Lambda_k N_k^{1/q} \sqrt{\frac{2\log(2 n_0)}{m}} \right] + \frac{2}{\rho}\sqrt{\frac{\log l}{m}} + C(\rho, l, m, \delta) \]

where \(C(\rho, l, m, \delta)\) is as in Theorem 1.
Basics
Adaptively grows network structure
Objective function:

\[ F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi\!\left( 1 - y_i \sum_{j=1}^{N} w_j h_j(x_i) \right) + \sum_{j=1}^{N} \Gamma_j |w_j| \]

Definitions:
Φ: nondecreasing convex function (e.g. \(\Phi(x) = e^x\))
Γ_j = λ r_j + β
r_j = \(\mathfrak{R}_m(\mathcal{H}_{k_j})\) (Rademacher complexity of the family containing h_j)

This objective is convex by construction: Φ is convex and its argument is affine in w, and the ℓ1 penalty is convex.
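A direct numpy transcription of F(w), with Φ = exp and made-up subnetwork outputs h_j(x_i), complexities r_j, and hyperparameters λ, β (all placeholder values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 100, 3
y = rng.choice([-1.0, 1.0], size=m)     # labels y_i
H = rng.normal(size=(m, N))             # H[i, j] = h_j(x_i), subnetwork outputs
r = np.array([0.1, 0.2, 0.3])           # r_j = R_m(H_{k_j}) (placeholder values)
lam, beta = 0.1, 0.01                   # lambda, beta hyperparameters
Gamma = lam * r + beta                  # Gamma_j = lambda * r_j + beta

def F(w, Phi=np.exp):
    margins = y * (H @ w)                         # y_i * sum_j w_j h_j(x_i)
    return Phi(1.0 - margins).mean() + Gamma @ np.abs(w)

w1 = np.array([0.5, -0.2, 0.1])
w2 = np.array([-0.3, 0.4, 0.0])
```

Convexity can be sanity-checked at a midpoint: F((w1 + w2)/2) never exceeds the average of F(w1) and F(w2).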
Algorithm
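The algorithm figure on this slide did not survive extraction. At a high level, AdaNet grows the network greedily: each round trains candidate subnetworks, keeps the one that most decreases the objective F, and stops when no candidate improves it. Below is a heavily simplified, hypothetical sketch of that loop only: random tanh units stand in for candidate subnetworks, and weight fitting is faked with constants, so this illustrates the control flow rather than the paper's actual candidate generation or optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 5
X = rng.normal(size=(m, d))
y = np.sign(X @ rng.normal(size=d))     # toy labels

def F(w, outs, gammas):
    # regularized objective from the Basics slide, with Phi = exp
    margins = y * (outs @ w)
    return np.exp(1.0 - margins).mean() + gammas @ np.abs(w)

ensemble = []                           # outputs h_j(x_i) of accepted subnetworks
best = np.inf
for t in range(10):
    # candidate subnetworks: random tanh units (a crude stand-in)
    candidates = [np.tanh(X @ rng.normal(size=d)) for _ in range(5)]
    scored = []
    for h in candidates:
        outs = np.column_stack(ensemble + [h])
        k = outs.shape[1]
        w = np.full(k, 0.1)             # fake "trained" weights
        g = np.full(k, 0.06)            # fake Gamma_j penalties
        scored.append((F(w, outs, g), h))
    f_new, h_new = min(scored, key=lambda s: s[0])
    if f_new >= best:                   # no candidate improves F: stop growing
        break
    best = f_new
    ensemble.append(h_new)
```

The stopping rule is what makes the structure adaptive: the network only deepens or widens while doing so lowers the complexity-regularized objective.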
Results
Variations
AdaNet.R: adds the regularization term R(w, h) = Γ_h ‖w‖_1 to the objective
AdaNet.P: subnetworks connect only to the previous subnetwork
AdaNet.D: applies dropout to subnetwork connections
AdaNet.SD: uses the standard deviation of the final-layer outputs instead of the Rademacher complexity