TRANSCRIPT
Uncertainty in Bayesian Neural Nets
August 4, 2017
Overview
• BNN review
• Visualization experiments
• BNN results
BNN
Prior: $p(W)$
Likelihood: $p(Y|X,W)$
Approximate posterior: $q(W)$
Posterior predictive: $E_{q(W)}[p(y|x,W)]$
BNN
• Variational inference
• Maximize a lower bound on the marginal log-likelihood

$\log p(Y|X) \ge E_{q(W)}[\log p(Y|X,W) + \log p(W) - \log q(W)]$
[Graphical model: X and W are parents of Y; the prior and posterior approximation live on W, the likelihood on Y.]
Dependent on the number of data points (minibatch of size M out of N):

$\frac{1}{M}\sum_{i=1}^{M} \log p(Y_i \mid X_i, W) + \frac{1}{N}\log\frac{p(W)}{q(W)}$
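The minibatch bound above can be sketched in a few lines of numpy. This is an illustrative toy (a linear model with fixed noise and made-up sizes), not the talk's implementation; all names and values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N total points, one minibatch of M (sizes are illustrative).
N, M, D = 1000, 20, 3
X = rng.normal(size=(M, D))
y = X @ np.ones(D) + 0.1 * rng.normal(size=M)

# Fully factorized Gaussian q(W) = prod_i N(w_i | mu_i, sigma_i^2),
# prior p(W) = N(0, 1), Gaussian likelihood with fixed noise 0.1.
mu, log_sigma = np.zeros(D), np.full(D, -1.0)
sigma = np.exp(log_sigma)

def log_normal(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((x - m) / s) ** 2

def elbo_estimate(n_samples=64):
    """Minibatch ELBO: (1/M) sum_i log p(y_i|x_i,W) + (1/N) log[p(W)/q(W)]."""
    vals = []
    for _ in range(n_samples):
        w = mu + sigma * rng.normal(size=D)          # reparameterized W ~ q(W)
        log_lik = log_normal(y, X @ w, 0.1).mean()   # (1/M) sum_i log p(y_i|x_i,W)
        log_prior = log_normal(w, 0.0, 1.0).sum()    # log p(W)
        log_q = log_normal(w, mu, sigma).sum()       # log q(W)
        vals.append(log_lik + (log_prior - log_q) / N)
    return float(np.mean(vals))

print(elbo_estimate())
```

In practice the gradient of this estimate with respect to mu and log_sigma is what gets optimized; the reparameterized sample keeps the estimator differentiable.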
Different priors and posterior approximations
• Priors p(W):
  • $N(0, \sigma^2)$
  • Scale-mixtures of Normals
  • Sparsity-inducing
• Posterior approximations q(W):
  • Delta peak: $q(W) = \delta(W)$
  • Fully factorized Gaussians: $q(W) = \prod_i N(w_i \mid \mu_i, \sigma_i^2)$
  • Bernoulli dropout
  • Gaussian dropout
  • MNF
Multiplicative Normalizing Flows (MNF)
• Augment the model with an auxiliary variable
Christos Louizos, Max Welling, ICML 2017
[Graphical models: generative model with X, W → Y and auxiliary Z → W; inference model over W and Z.]
$z \sim q(z), \quad W \sim q(W \mid z)$

$q(W) = \int q(W \mid z)\, q(z)\, dz$

$q(W \mid z) = \prod_{i=1}^{D_{in}} \prod_{j=1}^{D_{out}} N(w_{ij} \mid z_i \mu_{ij}, \sigma_{ij}^2)$
New lower bound (normalizing flows for q(z)):

$\log p(Y|X) \ge E_{q(W,z)}[\log p(Y|X,W) + \log p(W) - \log q(W \mid z) + \log r(z \mid W) - \log q(z)]$
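Ancestral sampling from the MNF posterior $q(W) = \int q(W \mid z)\, q(z)\, dz$ can be sketched as below. The single planar-flow step for q(z) and all sizes and initializations are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3  # illustrative layer sizes

# Variational parameters (illustrative values).
mu = rng.normal(scale=0.1, size=(d_in, d_out))       # mu_ij
sigma = np.full((d_in, d_out), 0.1)                  # sigma_ij
u, w_f, b = rng.normal(size=d_in), rng.normal(size=d_in), 0.0  # planar-flow params

def sample_W():
    """Ancestral sample from q(W) = ∫ q(W|z) q(z) dz.

    q(z): standard normal pushed through one planar-flow step,
    q(W|z): N(w_ij | z_i * mu_ij, sigma_ij^2)  (multiplicative noise on rows).
    """
    z0 = rng.normal(size=d_in)
    z = z0 + u * np.tanh(w_f @ z0 + b)               # planar normalizing flow
    eps = rng.normal(size=(d_in, d_out))
    return z[:, None] * mu + sigma * eps             # W ~ q(W|z)

W = sample_W()
print(W.shape)  # (4, 3)
```

Because z multiplies whole rows of the mean, a factorized Gaussian over W becomes an (intractable but flexible) mixture after integrating z out, which is why the bound above needs the auxiliary distribution r(z|W).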
Predictive Distributions
Uncertainties
• Model uncertainty (epistemic uncertainty)
  • Captures ignorance about which model is most suitable to explain the data
  • Reduces as the amount of observed data increases
  • Summarized by generating function realizations from our distribution
• Measurement noise (aleatoric uncertainty)
  • Noise inherent in the environment, captured in the likelihood function
• Predictive uncertainty
  • Entropy of the prediction: $H[p(y|x)]$
Visualization Experiments
• 1D regression
• Classification of MNIST (visualized in 2D)
• Questions:
  • Activations
  • Number of samples
  • Held-out classes
  • Types of uncertainty
Sigmoid: $(1+e^{-x})^{-1}$   Tanh
Softplus: $\ln(1+e^x)$   ReLU: $\max(0,x)$
BNNs with Different Activation Functions
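For quick reference, the activations listed above can be written directly in numpy (a trivial sketch; tanh is numpy's builtin):

```python
import numpy as np

def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))   # (1 + e^-x)^-1
def softplus(x): return np.log1p(np.exp(x))        # ln(1 + e^x)
def relu(x):     return np.maximum(0.0, x)         # max(0, x)
# tanh is np.tanh

# Softplus is a smooth version of ReLU, and its derivative is the sigmoid.
print(softplus(0.0), sigmoid(0.0), relu(-2.0))
```

The choice matters for the uncertainty plots later: smooth activations (softplus, tanh) give smooth sampled functions, while ReLU gives piecewise-linear ones.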
Uncertainty of Decision Boundaries
• Setup:
  • Classification of MNIST
  • Train: 50,000  Test: 10,000
  • Architecture: 784-100-2-100-10
  • BNN: fully factorized Gaussian, $N(0,1)$ prior; activations: softplus
[Figure: decision boundaries, NN vs BNN.]

Decision Boundaries – 3 Samples
Plot of $\arg\max_y p(y|x)$ at each point
Uncertainty of Decision Boundaries: Held-Out Classes
• Setup:
  • Classification of digits 0 to 4 (5 to 9 held out)
  • Architecture: 784-100-100-2-100-100-10
  • BNN: fully factorized Gaussian, $N(0,1)$ prior; activations: softplus
[Figure: decision boundaries, NN vs BNN.]
Where do you think the held-out classes will go? Inside or outside the circle?
Held-Out Classes
Unseen classes don't get encoded as something far away; instead they are encoded near the mean.
Confidence of predictions? Maybe the large areas have high entropy. Argmax vs max.
Class Boundaries – Confidences
Sharp transitions. There isn't much uncertain space: mostly uniform, high confidence.
[Figure panels: argmax, max, entropy.]
Effect of the Choice of Activation Function
• Softplus
• ReLU
• Tanh
[Figure rows for Softplus, ReLU, Tanh, and a mix (softplus, ReLU, tanh); columns: Sample 1, Sample 2, Sample 3, mean of $q(W)$, $E_{q(W)}[p(y|x,w)]$.]
Number of Data Points
[Figure: columns for 25,000, 10,000, 1,000, and 100 training points; rows: argmax, max, entropy, $E_{q(W)}[p(y|x)]$.]
Model vs Output Uncertainty
• Predictive uncertainty = $H[p(y|x)]$
• Output uncertainty: $H[p(y|x,\bar{w})]$ where $\bar{w}$ = mean of $q(W)$ — high entropy at the output (on the decision boundary)
• Model uncertainty: $H[E_{q(W)}[p(y|x,w)]]$ — high-variance predictions
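The two entropies can be compared on a toy two-class model. Everything here (input, sizes, the parameters of q(W)) is an illustrative assumption, not the talk's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    e = np.exp(a - a.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(-1)

# Toy 2-class classifier: logits = x @ W, with a factorized Gaussian q(W).
x = np.array([0.5, -1.0])
mu = rng.normal(size=(2, 2))                 # mean of q(W)
sigma = 0.5 * np.ones((2, 2))

samples = mu + sigma * rng.normal(size=(200, 2, 2))   # W ~ q(W), 200 draws
probs = softmax(x @ samples)                          # p(y|x,W) per sample

output_H = entropy(softmax(x @ mu))      # H[p(y|x, w_bar)], w_bar = mean of q(W)
predictive_H = entropy(probs.mean(0))    # H[E_q[p(y|x,W)]]
print(output_H, predictive_H)
```

When the weight samples disagree, the averaged predictive distribution flattens out and its entropy exceeds the entropy at the mean weights; when they agree, the two numbers are close.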
Model vs Output Uncertainty

100 training data points:
                     Train   Test   Held Out
Model uncertainty     .07     .26     .43
Output uncertainty    .03     .15     .25

25,000 training data points:
                     Train   Test   Held Out
Model uncertainty     .06     .06     .43
Output uncertainty    .05     .05     .36

Small data: model uncertainty. Large data: output uncertainty.
[Figure: NN vs BNN vs GP+NN.]
From "Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks" (July 2017).
Visualize the Landscape of the Likelihood
[Figure: $p(y_{train}|x_{train},W)$ over a 2D slice $(w_1, w_2)$.]
The dimension of W is large, so use a 2D auxiliary variable.
Visualize the Landscape of the Likelihood
• Auxiliary variable model
[Graphical models: generative model with X, W → Y and auxiliary Z; inference model over W and Z.]
Architecture: 784-100-100-2-10-10-10 (NN and BNN)

$z \sim q(z)$ (2D)
$W \sim q(W \mid z), \quad r(z \mid W)$
$q(W \mid z) = \delta(W \mid z)$
$q(W) = \int \delta(W \mid z)\, q(z)\, dz$

$\log p(Y|X) \ge E_{q(W,z)}[\log p(Y|X,W) + \log p(W) - \log q(W \mid z) + \log r(z \mid W) - \log q(z)]$

hyper-network → hypo-network
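The $\delta(W \mid z)$ construction amounts to a hyper-network: all of the randomness lives in the 2D auxiliary z, and the full weight vector is a deterministic function of it. A minimal sketch, assuming a linear hyper-network (the map, its size, and q(z) are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_w = 10            # flattened weight dimension (illustrative)

# Hyper-network: a fixed linear map from 2D z to the full weight vector,
# so q(W|z) = delta(W - g(z)) and q(W) inherits its randomness from q(z).
A = rng.normal(size=(d_w, 2))

def g(z):
    """Deterministic 'hypo-network' weights as a function of z."""
    return A @ z

z = rng.normal(size=2)   # z ~ q(z); standard normal here for illustration
W = g(z)                 # W is fully determined by the 2D z
print(W.shape)  # (10,)
```

Because z is only 2D, the likelihood log p(y|x, W=g(z)) can be evaluated on a grid over z and plotted directly, which is what the landscape slides below do.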
Decision Boundaries
[Figure: decision boundaries for three samples $z_1, z_2, z_3$ and $E_{q(z)}[p(y|x,z)]$.]
Likelihood Landscape
[Figure panels over the 2D $(z_1, z_2)$ plane, repeated across several slides: $\log p(y_{train}|x_{train},W,z)$, $\log p(y_{test}|x_{test},W,z)$, and $\log p(y_{train}|x_{train},W,z) + \log r(z|W) - \log q(z)$.]
Recent BNN Papers
• Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
• Variational Dropout Sparsifies Deep Neural Networks (2017)
• Bayesian Compression for Deep Learning (2017)
• Adversarial perturbations
• Compression
Adversarial Perturbations
[Figures: MNIST and CIFAR10.]
Compression vs Uncertainty
[Figure: entropy $H[p]$.]
Conclusion
• Used visualizations to help understand uncertainty in BNNs
• Goal: improve uncertainty estimates and generalization

Applications
• Active learning
• Bayesian optimization
• RL
• Safety
• Efficiency
References
• Weight Uncertainty in Neural Networks (2015)
• Variational Dropout and the Local Reparameterization Trick (2015)
• Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016)
• Variational Dropout Sparsifies Deep Neural Networks (2017)
• On Calibration of Modern Neural Networks (2017)
• Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
Thank You