NAÏVE BAYES CLASSIFIER
ACM Student Chapter, Heritage Institute of Technology
10th February, 2012
SIGKDD Presentation by Anirban Ghose, Parami Roy, Sourav Dutta
CLASSIFICATION
● What is it?
Assigning a given piece of input data to one of a given number of categories.
e.g. classifying kitchen items: separating cups from saucers.
CLASSIFICATION
● Why do we need it?
Separating like things from unlike things.
e.g. categorizing different kinds of livestock, such as cows and goats.
CLASSIFICATION
● Looking for identifiable patterns.
Predicting whether an e-mail is spam or non-spam from patterns observed in previous mails.
Automatic categorization of online articles.
CLASSIFICATION
● Allowing extrapolation.
Given the red dots, predicting the value at the blue box.
Classification Techniques
• Decision Tree based methods
• Rule-based methods
• Memory based methods
• Neural Networks
• Naïve Bayes Classifier
• Support Vector Machines
Problem Statement
Play Tennis: Training Examples
Day  Outlook  Temperature  Humidity  Wind  Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Problem Statement
• Domain space: the set of values an attribute can have.
• Domain space of the previous example:
o Outlook – {Sunny, Overcast, Rain}
o Temperature – {Hot, Mild, Cool}
o Humidity – {High, Normal}
o Wind – {Strong, Weak}
o Play Tennis – {Yes, No}
Problem Statement
• Instances (X): the set of items over which the concept is defined.
Here, the set of all possible days described by the attributes Outlook, Temperature, Humidity and Wind.
• Target concept (c): the concept or function to be learned.
c : X → {0,1}
c(x) = 1 : Play Tennis = Yes
c(x) = 0 : Play Tennis = No
Problem Statement
• Hypothesis (h): a statement assumed to be true for the sake of argument; here, a conjunction of constraints on the attributes.
h : X → {0,1}
• For each attribute, the hypothesis holds one of:
? – any value is acceptable
<value> – a single required value
Ø – no value is acceptable
Problem Statement
• Training examples – prior knowledge: a set of input vectors (instances), each paired with a label (outcome).
Input vector: Outlook – Sunny, Temperature – Hot, Humidity – High, Wind – Weak.
Label: Play Tennis – No
Problem Statement
Training examples can be:
• Positive example: the instance satisfies all the constraints of the hypothesis.
h(x) = 1
• Negative example: the instance fails one or more constraints of the hypothesis.
h(x) = 0
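To make the hypothesis representation concrete, here is a minimal sketch (our own encoding, not from the slides) of how h(x) evaluates an instance against the ?/<value>/Ø constraints:

```python
# Our own encoding of a hypothesis: "?" accepts any value, a literal
# string requires exactly that value, and None (standing in for Ø)
# accepts no value at all.
def h(x, constraints):
    """Return 1 if instance x satisfies every attribute constraint, else 0."""
    return int(all(
        c == "?" or (c is not None and c == a)
        for a, c in zip(x, constraints)
    ))

x = ("Sunny", "Hot", "High", "Weak")      # D1 from the table
print(h(x, ("Sunny", "?", "?", "?")))     # 1 -> positive example
print(h(x, ("Rain", "?", "?", "?")))      # 0 -> negative example
```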
Learning Algorithm
• Naïve Bayes Classifier – a supervised learning technique.
• Supervised learning: the machine-learning task of inferring a function from supervised (labelled) training data.
g : X → Y
X : input space
Y : output space
A Quick Recap
• Conditional probability: P(A|B) = P(A∩B) / P(B)
• Multiplication rule: P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A)
• Independent events: P(A∩B) = P(A)·P(B)
• Total probability: P(B) = P(B|A)·P(A) + P(B|~A)·P(~A)
[Venn diagram of events A and B]
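As a quick sanity check of these rules, here is a small numeric illustration (a toy dice example of our own, not from the slides):

```python
from fractions import Fraction

# Two events on a roll of two fair dice:
# A = "first die shows 6", B = "the two dice sum to 10".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if o[0] == 6}
B = {o for o in outcomes if sum(o) == 10}

P = lambda E: Fraction(len(E), len(outcomes))  # uniform probability

# Conditional probability: P(A|B) = P(A∩B)/P(B) = (1/36)/(3/36) = 1/3
assert P(A & B) / P(B) == Fraction(1, 3)
# Multiplication rule: P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A)
assert P(A & B) == (P(A & B) / P(B)) * P(B) == (P(A & B) / P(A)) * P(A)
```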
A Few Important Definitions
o Prior probability: let p be an uncertain quantity. The prior probability is the probability distribution that expresses one's uncertainty about p before the "data" is taken into account.
o Posterior probability: the posterior probability of a random event or an uncertain proposition is the conditional probability assigned after the relevant evidence is taken into account.
Bayes' Theorem
o P(h): prior probability of hypothesis h.
o P(D): prior probability that the training data D will be observed.
o P(D|h): probability of observing data D given some world in which hypothesis h holds.
o P(h|D): posterior probability of h (the quantity to be found).
• Then, by Bayes' theorem:
P(h|D) = P(D|h)·P(h) / P(D)
MAP HYPOTHESIS
• The maximum a posteriori (MAP) hypothesis is the most probable hypothesis given the data:
hMAP = argmax over h in H of P(h|D) = argmax over h in H of P(D|h)·P(h)
(the denominator P(D) is the same for every h and can be dropped).
• If all hypotheses are equally probable a priori, i.e. P(hi) = P(hj) for all i and j, this reduces to the maximum likelihood hypothesis hML = argmax over h in H of P(D|h).
Example
• A medical diagnosis problem with two alternative hypotheses:
1) The patient has a particular form of cancer.
2) The patient does not have that particular form of cancer.
Example – Bayes' Theorem
Test outcomes:
a) + (positive – test indicates the rare disease)
b) - (negative – test does not indicate the rare disease)
Prior knowledge:
P(cancer) = 0.008      P(~cancer) = 0.992
P(+|cancer) = 0.98     P(-|cancer) = 0.02
P(+|~cancer) = 0.03    P(-|~cancer) = 0.97
Example – Bayes' Theorem
Suppose we now observe a new patient for whom the lab test returns a positive value.
Should we diagnose the patient as having cancer or not?
Solution
Applying Bayes' theorem, and noting that the denominator P(+) is the same for both hypotheses, we get:
P(cancer|+) ∝ P(+|cancer)·P(cancer) = (0.98)(0.008) = 0.0078
P(~cancer|+) ∝ P(+|~cancer)·P(~cancer) = (0.03)(0.992) = 0.0298
Since 0.0298 > 0.0078, the MAP hypothesis is ~cancer: we diagnose the patient as not having cancer.
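A minimal sketch of the same computation (the variable names are our own):

```python
# Priors and likelihoods from the slide.
priors = {"cancer": 0.008, "~cancer": 0.992}
p_pos_given = {"cancer": 0.98, "~cancer": 0.03}   # P(+ | h)

# Unnormalized posteriors: P(h|+) ∝ P(+|h)·P(h).
posterior = {h: p_pos_given[h] * priors[h] for h in priors}
print(posterior)   # {'cancer': 0.00784, '~cancer': 0.02976}

# The MAP hypothesis is the one with the larger posterior.
h_map = max(posterior, key=posterior.get)
print(h_map)       # '~cancer' -> diagnose as not having cancer
```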
Naïve Bayes Classifier
• Supervised Learning Technique
• Bayes Theorem
• MAP Hypothesis
Naïve Bayes Classifier
• Prior knowledge:
• A training data set
• A new instance of data: <a1, a2, …, an>
• Objective:
• Classify the new instance of data <a1, a2, …, an>.
• Find P(vj|a1, a2, …, an) for every possible classification vj.
• Find the maximum probability among them.
Naïve Bayes Classifier
• We want P(vj|a1, a2, …, an) for all vj in V.
• Using Bayes' theorem:
P(vj|a1, a2, …, an) = P(a1, a2, …, an|vj)·P(vj) / P(a1, a2, …, an)
• The denominator is the same for every vj, so vMAP = argmax over vj in V of P(a1, a2, …, an|vj)·P(vj).
Naïve Bayes Classifier
• Why naïve?
• We assume all attributes to be conditionally independent given the classification.
• P(a1, a2, …, an|vj) = Πi P(ai|vj), the product taken over i = 1 to n
• vNB = argmax over vj in V of P(vj)·Πi P(ai|vj)
Play Tennis : Training Examples
Day  Outlook  Temperature  Humidity  Wind  Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
New Instance: < Sunny, Cool, High, Strong >
Probability Estimate
• We define our probability estimate to be the frequency of data combinations within the training examples.
• P(vj) = fraction of training examples in which vj occurs.
• P(ai|vj) = fraction of the examples classified as vj in which ai occurs.
Example
• Let's calculate P(Overcast|Yes).
• Number of training examples classified as Yes = 9
• Number of times Outlook = Overcast given the classification is Yes = 4
• Hence, P(Overcast | Yes) = 4/9
• Prior probability
P(Yes) = 9/14, i.e. P(playing tennis)
P(No) = 5/14, i.e. P(not playing tennis)
• Look-up tables of conditional probabilities, built from the training data:

Outlook    P(·|Yes)  P(·|No)
Sunny      2/9       3/5
Overcast   4/9       0/5
Rain       3/9       2/5

Temperature  P(·|Yes)  P(·|No)
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity   P(·|Yes)  P(·|No)
High       3/9       4/5
Normal     6/9       1/5

Wind     P(·|Yes)  P(·|No)
Weak     6/9       2/5
Strong   3/9       3/5
• P(Yes)·P(Sunny|Yes)·P(Cool|Yes)·P(High|Yes)·P(Strong|Yes)
= 9/14 × 2/9 × 3/9 × 3/9 × 3/9 = 0.0053
• P(No)·P(Sunny|No)·P(Cool|No)·P(High|No)·P(Strong|No)
= 5/14 × 3/5 × 1/5 × 4/5 × 3/5 = 0.0206
• Since 0.0206 > 0.0053, vNB = No. Hence, we can't play tennis given the weather conditions.
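The whole computation fits in a few lines of code. A minimal sketch (our own variable names, with counts taken straight from the table above):

```python
from collections import Counter, defaultdict

# Play Tennis training examples: (Outlook, Temperature, Humidity, Wind, label).
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

label_counts = Counter(row[-1] for row in data)   # numerators for P(vj)
attr_counts = defaultdict(Counter)                # numerators for P(ai|vj)
for *attrs, label in data:
    for i, a in enumerate(attrs):
        attr_counts[label][(i, a)] += 1

def v_nb(instance):
    """Return (best class, scores) where score = P(vj) * prod_i P(ai|vj)."""
    scores = {}
    for label, n in label_counts.items():
        p = n / len(data)                         # P(vj)
        for i, a in enumerate(instance):
            p *= attr_counts[label][(i, a)] / n   # P(ai|vj)
        scores[label] = p
    return max(scores, key=scores.get), scores

print(v_nb(("Sunny", "Cool", "High", "Strong")))
# ('No', {'No': 0.0206, 'Yes': 0.0053})  (values rounded)
```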
Drawback of the Estimate
• What happens if a probability estimate is zero?
• The estimate is zero when a particular attribute value never occurs in the training data set for a given classification.
• A single zero factor then forces the whole product vNB to zero for that classification, no matter how large the other probabilities are.
Example
• Suppose that in a new training set the attribute Outlook never takes the value Overcast when the example is labelled Yes.
• Then P(Overcast|Yes) = 0, and
• vNB = P(Yes) × P(Overcast|Yes) × P(Cool|Yes) × … = 0
Solution
• Use the m-estimate of probability:
P(ai|vj) = (nc + m·p) / (n + m)
where
n is the number of training examples with classification vj,
nc is the number of those examples in which the attribute value ai occurs,
p is the prior estimate of the attribute value (e.g. 1/k for an attribute with k possible values),
m is the equivalent sample size, which determines how heavily the prior is weighted.
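A minimal sketch of the m-estimate, applied to the zero estimate from the previous slide (the choices p = 1/3 and m = 3 are our own, for illustration):

```python
def m_estimate(nc, n, p, m):
    """m-estimate of probability: (nc + m*p) / (n + m)."""
    return (nc + m * p) / (n + m)

# Outlook has 3 possible values, so take the uniform prior p = 1/3.
# With nc = 0 (Overcast never seen with Yes) and n = 9 Yes examples:
print(m_estimate(0, 9, 1/3, 3))   # 0.0833... instead of 0
```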
Disadvantages of Naïve Bayes Classifier
1) It requires initial knowledge of many probabilities.
2) There is a significant computational cost in determining the Bayes optimal hypothesis.
Conclusion
• Naïve Bayes is based on the independence assumption:
o Training is very easy and fast.
o Testing is straightforward: just look up tables or calculate conditional probabilities with normal distributions.
• A popular generative model:
o Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated.
o Many successful applications, e.g. spam mail filtering.
o A good candidate as a base learner in ensemble learning.
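For continuous attributes, the per-class conditionals can indeed be modelled with normal distributions. A quick sketch using scikit-learn's GaussianNB (the feature values and labels below are made up purely for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features, e.g. (temperature in °C, humidity in %).
X = np.array([[20.0, 80.0], [25.0, 60.0], [30.0, 90.0], [22.0, 65.0]])
y = np.array(["No", "Yes", "No", "Yes"])   # made-up labels

# GaussianNB fits one normal distribution per feature and class.
clf = GaussianNB().fit(X, y)
print(clf.predict([[24.0, 70.0]]))         # prints the predicted label
```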