Machine Learning using Bayesian Approaches - cedar.buffalo.edu/~srihari/talks/IISc-Sept2008.pdf
TRANSCRIPT
Machine Learning using Bayesian Approaches
Sargur N. Srihari University at Buffalo, State University of New York
Outline
1. Progress in ML and PR
2. Fully Bayesian Approach
   1. Probability theory: Bayes theorem, conjugate distributions
   2. Making predictions using marginalization
   3. Role of sampling methods
   4. Generalization of neural networks, SVMs
3. Biometric/Forensic Application
4. Concluding Remarks
Example of Machine Learning: Handwritten Digit Recognition
• Handcrafted rules result in a large number of rules and exceptions
• Better to have a machine that learns from a large training set
• Provides a probabilistic prediction
• Wide variability of the same numeral
Progress in PRML Theory
• Unified theory involving
  – Regression and classification
  – Linear models, neural networks, SVMs
• Basis vectors
  – polynomial, non-linear functions of weighted inputs, centered around samples, etc.
• The fully Bayesian approach
• Models from statistical physics
The Fully Bayesian Approach
• Bayes Rule
• Bayesian Probabilities
• Concept of Conjugacy
• Monte Carlo Sampling
Rules of Probability
• Given random variables X and Y
• Sum Rule gives the marginal probability:

  p(X) = Σ_Y p(X, Y)

• Product Rule: joint probability in terms of conditional and marginal:

  p(X, Y) = p(Y | X) p(X)

• Combining the two gives Bayes Rule:

  p(Y | X) = p(X | Y) p(Y) / p(X),  where p(X) = Σ_Y p(X | Y) p(Y)

• Viewed as: posterior ∝ likelihood × prior

[Figure: grid illustrating the joint distribution p(X, Y).]
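The sum, product, and Bayes rules above can be checked numerically on a small joint table; the 2×2 joint distribution below is an illustrative assumption, not from the slides:

```python
import numpy as np

# Hypothetical joint distribution p(X, Y) over binary X and Y
# (numbers are illustrative; they only need to sum to 1).
p_xy = np.array([[0.3, 0.1],   # row i: X = i
                 [0.2, 0.4]])  # col j: Y = j

# Sum rule: marginals obtained by summing out the other variable
p_x = p_xy.sum(axis=1)                 # p(X)
p_y = p_xy.sum(axis=0)                 # p(Y)

# Product rule: p(X, Y) = p(Y | X) p(X)
p_y_given_x = p_xy / p_x[:, None]

# Bayes rule: p(X | Y) = p(Y | X) p(X) / p(Y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]

# Consistency: conditional times marginal recovers the joint
assert np.allclose(p_x_given_y * p_y[None, :], p_xy)
```

Each column of `p_x_given_y` is a properly normalized posterior over X for the corresponding observed Y.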
Bayesian Probabilities
• Classical or frequentist view of probability
  – Probability is the frequency of a random, repeatable event
• Bayesian view
  – Probability is a quantification of uncertainty
    • Whether Shakespeare's plays were written by Francis Bacon
    • Whether the moon was once in its own orbit around the sun
• Use of probability to represent uncertainty is not an ad hoc choice
  – Cox's Theorem: if numerical values represent degrees of belief, then a simple set of axioms for manipulating degrees of belief leads to the sum and product rules of probability
Role of the Likelihood Function
• Uncertainty in w expressed as

  p(w | D) = p(D | w) p(w) / p(D),  where p(D) = ∫ p(D | w) p(w) dw by the sum rule

• The denominator is a normalization factor; it involves marginalization over w
• Bayes theorem in words: posterior ∝ likelihood × prior
• The likelihood function is used in both Bayesian and frequentist paradigms
  – Frequentist: w is a fixed parameter determined by an estimator; error bars on the estimate come from the possible data sets D
  – Bayesian: there is a single data set D, and uncertainty is expressed as a probability distribution over w
Bayesian versus Frequentist Approach
• Inclusion of prior knowledge arises naturally
• Coin toss example
  – Suppose we need to learn from coin tosses so that we can predict whether a toss will land Heads
  – A fair-looking coin is tossed three times and lands Heads each time
  – The classical maximum-likelihood estimate of the probability of landing Heads is 1, implying all future tosses will land Heads
  – A Bayesian approach with a reasonable prior leads to a less extreme conclusion

[Figure: prior p(µ) and posterior p(µ|D).]
• The Beta distribution has the property of conjugacy
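The contrast can be made concrete in two lines; the Beta(2, 2) prior here is an assumed "fair-looking coin" choice, and the posterior mean follows the standard conjugate result:

```python
# Coin lands Heads in all N = 3 tosses. The MLE of mu = P(Heads) is 1,
# while a Beta(a, b) prior (a = b = 2, an assumed prior expressing a
# fair-looking coin) gives the posterior mean (m + a) / (N + a + b).
N, m = 3, 3          # tosses, heads
a, b = 2, 2          # hypothetical prior pseudo-counts

mu_mle = m / N                        # 1.0: predicts Heads forever
mu_bayes = (m + a) / (N + a + b)      # 5/7, a less extreme conclusion
```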
Bayesian Inference with the Beta Distribution
• The posterior is obtained by multiplying the Beta prior by the binomial likelihood

  (N choose m) µ^m (1 − µ)^(N−m)

  yielding

  p(µ | m, l, a, b) ∝ µ^(m+a−1) (1 − µ)^(l+b−1)

  – where m is the number of heads and l = N − m the number of tails
• The result is another Beta distribution
  – The data effectively increase the value of a by m and of b by l
  – As the number of observations increases, the distribution becomes more peaked

[Figure: one step of the process: prior with a = 2, b = 2; likelihood µ¹(1 − µ)⁰ for a single observation N = m = 1 with x = 1; posterior with a = 3, b = 2.]
Predicting the Next Trial Outcome
• Need the predictive distribution of x given the observed data D
  – From the sum and product rules:

  p(x=1 | D) = ∫₀¹ p(x=1, µ | D) dµ = ∫₀¹ p(x=1 | µ) p(µ | D) dµ = ∫₀¹ µ p(µ | D) dµ = E[µ | D]

• The expected value of the posterior distribution can be shown to be

  p(x=1 | D) = (m + a) / (m + a + l + b)

  – which is the fraction of observations (both fictitious and real) that correspond to x = 1
• Maximum-likelihood and Bayesian results agree in the limit of infinitely many observations
  – On average, uncertainty (variance) decreases as data are observed
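The identity p(x=1 | D) = E[µ | D] can be checked numerically for an assumed Beta posterior (hyperparameters and counts below are illustrative), using only the standard library:

```python
# Numerically verify that the predictive probability equals the
# posterior mean E[mu | D] = (m + a) / (m + a + l + b) for a
# Beta(m + a, l + b) posterior.
import math

a, b = 2, 2          # prior hyperparameters (assumed for illustration)
m, l = 3, 1          # observed heads and tails (assumed)

A, B = m + a, l + b  # posterior is Beta(A, B)
beta_norm = math.gamma(A) * math.gamma(B) / math.gamma(A + B)

def posterior_pdf(mu):
    """Beta(A, B) density for the posterior p(mu | D)."""
    return mu ** (A - 1) * (1 - mu) ** (B - 1) / beta_norm

# Trapezoidal integration of mu * p(mu | D) over [0, 1]
n = 100_000
xs = [i / n for i in range(n + 1)]
ys = [x * posterior_pdf(x) for x in xs]
integral = sum((ys[i] + ys[i + 1]) / 2 for i in range(n)) / n

expected = A / (A + B)   # closed-form posterior mean
assert abs(integral - expected) < 1e-4
```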
Bayesian Regression (Curve Fitting)
• Estimating the polynomial weight vector w
  – In the fully Bayesian approach, consistently apply the sum and product rules and integrate over all values of w
• Given training data X = (x₁,…,x_N), T = (t₁,…,t_N) and a new test point x, the goal is to predict the value of t
  – i.e., we wish to evaluate the predictive distribution p(t | x, X, T)
• Applying the sum and product rules of probability:

  p(t | x, X, T) = ∫ p(t, w | x, X, T) dw        [sum rule: marginalizing over w]
                 = ∫ p(t | x, w, X, T) p(w | x, X, T) dw        [product rule]
                 = ∫ p(t | x, w) p(w | X, T) dw        [dropping variables on which each factor does not depend]

  where p(t | x, w) = N(t | y(x, w), β⁻¹), and the posterior distribution over parameters p(w | X, T) is also Gaussian
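A minimal sketch of the resulting closed-form predictive distribution for a polynomial basis, following the standard conjugate-Gaussian result; the hyperparameters alpha and beta, the degree, and the sine toy data are all assumptions for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, degree = 2.0, 25.0, 3   # assumed prior / noise precisions

x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)

def phi(x):
    """Polynomial design matrix: one row of basis values per input."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

Phi = phi(x_train)
# Gaussian posterior over w: S_N^{-1} = alpha*I + beta*Phi^T Phi,
# m_N = beta * S_N * Phi^T * t
S_N = np.linalg.inv(alpha * np.eye(degree + 1) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t_train

def predict(x):
    """Predictive mean and variance of t at new input(s) x:
    mean = phi(x) m_N, var = 1/beta + phi(x) S_N phi(x)^T."""
    ph = phi(x)
    mean = ph @ m_N
    var = 1 / beta + np.sum((ph @ S_N) * ph, axis=1)
    return mean, var

mean, var = predict(np.array([0.5]))
```

The predictive variance has two parts: the irreducible noise 1/β plus a term from the remaining uncertainty in w, so it can never fall below the noise floor.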
Conjugate Distributions
• Discrete, binary:
  – Bernoulli: single binary variable
  – Binomial: N samples of a Bernoulli (Bernoulli is the N = 1 case)
  – Beta: continuous variable in [0, 1]; conjugate prior of the Bernoulli/Binomial
• Discrete, multi-valued:
  – Multinomial: one of K values = K-dimensional binary vector (K = 2 recovers the binary case)
  – Dirichlet: K random variables in [0, 1]; conjugate prior of the Multinomial
• Continuous:
  – Gaussian (the large-N limit of the Binomial)
  – Von Mises: angular variable
  – Gamma: conjugate prior of the univariate Gaussian precision
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
  – Student's-t: generalization of the Gaussian robust to outliers; an infinite mixture of Gaussians
  – Exponential: special case of the Gamma
  – Uniform
Bayesian Approaches in Classical Pattern Recognition
• Bayesian Neural Networks
  – Classical neural networks use maximum likelihood to determine the network parameters (weights and biases)
  – The Bayesian version marginalizes over the distribution of parameters in order to make predictions
• Bayesian Support Vector Machines
Practicality of the Bayesian Approach
• Marginalization over the whole parameter space is required to make predictions or compare models
• Factors making it practical:
  – Sampling methods such as Markov chain Monte Carlo
  – Increased speed and memory of computers
  – Deterministic approximation schemes
    • Variational Bayes is an alternative to sampling
![Page 16: Machine Learning using Bayesian Approaches - cedar.buffalo.edusrihari/talks/IISc-Sept2008.pdf · extreme conclusion p µ ... Bernoulli Single binary variable Multinomial One of K](https://reader036.vdocuments.mx/reader036/viewer/2022070617/5e0f817a351154089741616b/html5/thumbnails/16.jpg)
16
Markov Chain Monte Carlo (MCMC)
• Rejection sampling and importance sampling for evaluating expectations of functions suffer from severe limitations, particularly in high dimensions
• MCMC is a very general and powerful framework
  – "Markov" refers to the sequence of samples rather than the model being Markovian
• Allows sampling from a large class of distributions
• Scales well with dimensionality
• MCMC originated in statistical physics (Metropolis, 1949)
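The Metropolis algorithm can be sketched in a few lines; the target (a standard normal via its log-density) and the proposal width are illustrative choices, not from the slides:

```python
import numpy as np

def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Metropolis sampler with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + rng.normal(0, step)
        # Accept with probability min(1, p(x_new) / p(x))
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new
        samples.append(x)
    return np.array(samples)

# Target: standard normal, log p(x) = -x^2/2 up to a constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20_000)
```

Because the proposal is symmetric, the Hastings correction drops out and the acceptance ratio reduces to p(x_new)/p(x).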
Gibbs Sampling
• A simple and widely applicable Markov chain Monte Carlo algorithm
• A special case of the Metropolis-Hastings algorithm
• Consider a distribution p(z) = p(z1,…,zM) from which we wish to sample, and choose an initial state for the Markov chain
• Each step replaces the value of one variable zi with a value drawn from p(zi | z\i), where z\i denotes z1,…,zM with zi omitted
• The procedure is repeated by cycling through the variables in some particular order
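The steps above can be sketched for a bivariate Gaussian with correlation ρ (an assumed example), where each full conditional p(zi | z\i) is itself a univariate Gaussian:

```python
import numpy as np

rho, n_samples = 0.8, 20_000   # assumed correlation and chain length
rng = np.random.default_rng(1)

z1, z2 = 0.0, 0.0              # initial state of the Markov chain
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    # Full conditionals: z1 | z2 ~ N(rho*z2, 1 - rho^2), and symmetrically
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho ** 2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho ** 2))
    samples[i] = z1, z2

# Empirical correlation of the chain should approach rho
emp_rho = np.corrcoef(samples.T)[0, 1]
```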
Machine Learning in Forensics
• Q-K (questioned vs. known) comparison involves significant uncertainty in some modalities
  – e.g., handwriting
• Learn intra-class variations
  – so as to determine whether the questioned sample is genuine
[Figure: known signatures 1-6 compared with each other yield the genuine distribution of similarity scores (e.g., .4, .1, .2, .3, .2, .1); the questioned signature compared with the knowns (comparisons a-d) yields the questioned distribution (e.g., 0, .4, .1, .2); the genuine and questioned distributions are then compared.]
Methods for Comparing Distributions
• Kolmogorov-Smirnov Test
• Kullback-Leibler Divergence
• Bayesian Formulation
Kolmogorov-Smirnov Test
• Determines whether two data sets were drawn from the same distribution
• Compute the statistic D: the maximum absolute difference between the two empirical c.d.f.s S_N1(x) and S_N2(x):

  D = max_{−∞ < x < ∞} | S_N1(x) − S_N2(x) |

• D is mapped to a probability P according to

  P_KS = Q_KS( [√Ne + 0.12 + 0.11/√Ne] D )

  where Ne is the effective number of data points
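The statistic D can be computed directly from the two empirical c.d.f.s; the Gaussian test samples below are illustrative:

```python
import numpy as np

def ks_statistic(s1, s2):
    """Two-sample K-S statistic D = max_x |S_N1(x) - S_N2(x)|."""
    s1, s2 = np.sort(s1), np.sort(s2)
    grid = np.concatenate([s1, s2])
    # Empirical c.d.f. of each sample evaluated on the pooled grid
    cdf1 = np.searchsorted(s1, grid, side="right") / len(s1)
    cdf2 = np.searchsorted(s2, grid, side="right") / len(s2)
    return np.max(np.abs(cdf1 - cdf2))

rng = np.random.default_rng(2)
# Same distribution vs. a 2-sigma mean shift (assumed demo data)
d_same = ks_statistic(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
d_diff = ks_statistic(rng.normal(0, 1, 500), rng.normal(2, 1, 500))
```

Since the empirical c.d.f.s only change at sample points, evaluating them on the pooled sample grid is enough to find the maximum gap.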
![Page 22: Machine Learning using Bayesian Approaches - cedar.buffalo.edusrihari/talks/IISc-Sept2008.pdf · extreme conclusion p µ ... Bernoulli Single binary variable Multinomial One of K](https://reader036.vdocuments.mx/reader036/viewer/2022070617/5e0f817a351154089741616b/html5/thumbnails/22.jpg)
22
Kullback-Leibler (KL) Divergence
• If an unknown distribution p(x) is modeled by an approximating distribution q(x)
• The KL divergence is the average additional information required to specify the value of x as a result of using q(x) instead of p(x):

  KL(p‖q) = −∫ p(x) ln q(x) dx − ( −∫ p(x) ln p(x) dx )
          = −∫ p(x) ln [ q(x) / p(x) ] dx
![Page 23: Machine Learning using Bayesian Approaches - cedar.buffalo.edusrihari/talks/IISc-Sept2008.pdf · extreme conclusion p µ ... Bernoulli Single binary variable Multinomial One of K](https://reader036.vdocuments.mx/reader036/viewer/2022070617/5e0f817a351154089741616b/html5/thumbnails/23.jpg)
23
K-L Properties
• A measure of the dissimilarity of p(x) and q(x)
• Non-symmetric: KL(p‖q) ≠ KL(q‖p)
• KL(p‖q) ≥ 0, with equality iff p(x) = q(x)
  – proof via convex functions (Jensen's inequality)
• Not a metric
• For B discrete bins:

  D_KL = Σ_{b=1}^{B} P_b log( P_b / Q_b )

• Convert to a probability using P_KL = e^(−ς D_KL)
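A sketch of the discrete form and the probability-like score; the bin probabilities and the scale constant ς below are assumed values for illustration:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL = sum_b P_b * log(P_b / Q_b)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0          # bins with P_b = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.4, 0.3, 0.2, 0.1]            # assumed binned distribution
q = [0.25, 0.25, 0.25, 0.25]        # assumed uniform comparison
zeta = 1.0                          # assumed scale constant

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)          # differs: KL is non-symmetric
p_kl = np.exp(-zeta * d_pq)         # maps divergence into (0, 1]
```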
![Page 24: Machine Learning using Bayesian Approaches - cedar.buffalo.edusrihari/talks/IISc-Sept2008.pdf · extreme conclusion p µ ... Bernoulli Single binary variable Multinomial One of K](https://reader036.vdocuments.mx/reader036/viewer/2022070617/5e0f817a351154089741616b/html5/thumbnails/24.jpg)
24
Bayesian Comparison of Distributions
• S1: samples from one distribution; S2: samples from another
• D1, D2: the theoretical distributions from which S1 and S2 came
• Task: determine the statistical similarity between S1 and S2, i.e., P(D1 = D2), given that we only observe samples from the two distributions
• Method:
  – Determine p(D1 = D2 = a particular distribution f)
  – Marginalize over all f (non-informative prior)
  – If f is Gaussian, we obtain a closed-form solution for the marginalization over the family of Gaussians F
Bayesian Comparison of Distributions

P(D1 = D2 | S1, S2)
 = ∫_F P(D1 = D2 = f | S1, S2) df        [marginalizing over all f, members of the family F]
 = ∫_F [ p(S1, S2 | D1 = D2 = f) p(D1 = D2 = f) / p(S1, S2) ] df        [Bayes rule]
 = ∫_F [ p(S1, S2 | D1 = D2 = f) / p(S1, S2) ] df        [assuming all distributions equally likely, i.e., uniform p(f)]
 = ∫_F p(S1, S2 | D1 = D2 = f) df / [ ∫_F p(S1 | D1 = f1) df1 · ∫_F p(S2 | D2 = f2) df2 ]        [marginalizing the denominator over all f; since S1 and S2 are conditionally independent given their distributions, it factors, and the uniform p(f) again drops out]
 = Q(S1S2) / ( Q(S1) × Q(S2) ),   where Q(S) = ∫_F p(S | f) df and S1S2 denotes the pooled samples
Bayesian Comparison of Distributions (F = Gaussian Family)

p(D1 = D2 | S1, S2) = Q(S1S2) / ( Q(S1) × Q(S2) ),  where Q(S) = ∫_F p(S | f) df

For Gaussians a closed form for Q is obtainable:

Q(S) = ∫₀^∞ ∫_{−∞}^{∞} p(S | µ, σ) dµ dσ

Evaluating Q for the Gaussian family gives

Q(S) = 2^(−3/2) × π^((1−n)/2) × n^(−1/2) × C^(1−n/2) × Γ(n/2 − 1)

where n is the cardinality of S and

C = Σ_{x∈S} x² − ( Σ_{x∈S} x )² / n
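A direct transcription of Q(S) into code (the formula and its exponents are taken as given on the slide; the sample values are illustrative, and n > 2 is required for the Gamma term):

```python
import math

def Q(S):
    """Closed-form Q(S) for the Gaussian family, as on the slide."""
    n = len(S)                                   # cardinality of S
    C = sum(x * x for x in S) - sum(S) ** 2 / n
    return (2 ** -1.5 * math.pi ** ((1 - n) / 2) * n ** -0.5
            * C ** (1 - n / 2) * math.gamma(n / 2 - 1))

def similarity(S1, S2):
    """Score for P(D1 = D2): Q of pooled samples over product of Qs."""
    return Q(S1 + S2) / (Q(S1) * Q(S2))

# Hypothetical similarity scores: nearby vs. well-separated samples
close = similarity([0.1, 0.2, 0.15, 0.25], [0.12, 0.18, 0.22, 0.16])
far = similarity([0.1, 0.2, 0.15, 0.25], [0.8, 0.9, 0.85, 0.95])
```

Pooling two nearby samples keeps C small, so the score is large; pooling well-separated samples inflates C and drives the score toward zero.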
Signature Data Set
• Genuine and forged samples for each of 55 individuals
[Figure: genuine and forgery samples for one person.]
Feature computation
Correlation-Similarity Measure
Feature Vector and Similarity
Bayesian Comparison
Genuines
Forgeries
Clear Separation
Information-Theoretic (KS+KL)
Graph Matching
Comparison of KS and KL (threshold* and information-theoretic** methods)

Knowns   KS       KL       KS & KL
4        76.38%   75.10%   79.50%
7        80.10%   79.50%   81.00%
10       83.53%   84.27%   86.78%

*S. N. Srihari, A. Xu, and M. K. Kalera. Learning strategies and classification methods for off-line signature verification. Proc. of the 7th Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), pages 161-166, 2004.
**Harish Srinivasan, Sargur N. Srihari, and Matthew J. Beal. Machine learning for person identification and verification. In Machine Learning in Document Analysis and Recognition (S. Marinai and H. Fujisawa, eds.), Springer, 2008.
Error Rates of Three Methods: CEDAR Data Set

[Figure: percentage error (5-30%) versus number of training samples (5-20) for the K-L, K-S, KS+KL, and Bayesian methods.]
Why Does the Bayesian Formalism Do Much Better?
• Few samples are available for determining the distributions
• Significant uncertainty exists
• The Bayesian approach models uncertainty better than the information-theoretic methods, which assume that the distributions are fixed
Summary
• ML is a systematic approach, rather than a collection of ad hoc methods, for designing information processing systems
• There is much progress in ML and PR
  – A unified theory of different machine learning approaches is developing, e.g., neural networks, SVMs
• The fully Bayesian approach involves making predictions with complex distributions, made feasible by sampling methods
• The Bayesian approach leads to higher-performing systems with small sample sizes
Lecture Slides
• PRML lecture slides at http://www.cedar.buffalo.edu/~srihari/CSE574/index.html