naïve bayes 𝑖 𝜶 - kangwoncs.kangwon.ac.kr/.../2015_machinelearning/07_naive_bayes.pdf ·...

𝑠𝑖𝑔𝑚𝑎 𝜶

Machine Learning

2015.08.01.

Naïve Bayes

Probability Basics

• Prior, conditional and joint probability for random variables

• Prior probability: $P(X)$

• Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$

• Joint probability: $\mathbf{X} = (X_1, X_2)$, $P(\mathbf{X}) = P(X_1, X_2)$

• Relationship: $P(X_1, X_2) = P(X_2 \mid X_1)\,P(X_1) = P(X_1 \mid X_2)\,P(X_2)$

• Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, $P(X_1, X_2) = P(X_1)\,P(X_2)$

• Bayesian Rule
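
The relationships listed above can be checked numerically. The following is a minimal sketch, assuming a made-up joint distribution over two binary variables; the table `joint` and the helpers `marginal` and `conditional` are hypothetical names introduced only for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X1, X2) over two binary variables
# (illustrative numbers, not taken from the slides).
joint = {
    (0, 0): F(3, 10), (0, 1): F(1, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}

def marginal(index, value):
    """P(X_index = value): sum the joint over the other variable."""
    return sum(p for xs, p in joint.items() if xs[index] == value)

def conditional(target, t_val, given, g_val):
    """P(X_target = t_val | X_given = g_val) = joint / marginal."""
    key = [None, None]
    key[target], key[given] = t_val, g_val
    return joint[tuple(key)] / marginal(given, g_val)

p_x1 = marginal(0, 1)                    # P(X1 = 1)
p_x2 = marginal(1, 1)                    # P(X2 = 1)
p_x2_given_x1 = conditional(1, 1, 0, 1)  # P(X2 = 1 | X1 = 1)
p_x1_given_x2 = conditional(0, 1, 1, 1)  # P(X1 = 1 | X2 = 1)

# Relationship: P(X1, X2) = P(X2|X1) P(X1) = P(X1|X2) P(X2)
product_rule_holds = (
    joint[(1, 1)] == p_x2_given_x1 * p_x1 == p_x1_given_x2 * p_x2
)

# Bayes' rule: P(X1|X2) = P(X2|X1) P(X1) / P(X2)
bayes_rule_holds = p_x1_given_x2 == p_x2_given_x1 * p_x1 / p_x2

# Independence would require P(X1, X2) = P(X1) P(X2); it fails for this table.
is_independent = joint[(1, 1)] == p_x1 * p_x2

print(product_rule_holds, bayes_rule_holds, is_independent)
```

Exact fractions are used so that the equalities can be compared without floating-point tolerance.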


Probabilistic Classification

• Establishing a probabilistic model for classification

• Discriminative model

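$P(C \mid \mathbf{X})$, $C = c_1, \dots, c_L$, $\mathbf{X} = (X_1, \dots, X_n)$

[Figure: a discriminative probabilistic classifier takes the input $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and outputs $P(c_1 \mid \mathbf{x}), P(c_2 \mid \mathbf{x}), \dots, P(c_L \mid \mathbf{x})$.]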


Probabilistic Classification

• Establishing a probabilistic model for classification (cont.)

• Generative model

• Classifies according to the patterns in the data

• Examines the data given a label → learns the relationship between data and labels


Bayes' Theorem

• Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on conditions that might be related to the event.

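• Expresses the relationship between the prior and posterior probabilities of two random variables

• Determines how the posterior probability is updated when new evidence is presented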

• $P(A)$ = prior probability of hypothesis $A$

• $P(B)$ = prior probability of training data $B$

• $P(A \mid B)$ = probability of $A$ given $B$

• $P(B \mid A)$ = probability of $B$ given $A$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
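
As a rough illustration of the theorem, the sketch below computes a posterior from a prior and a likelihood. All numbers are hypothetical, and `posterior` is a helper name introduced here; $P(B)$ is obtained with the law of total probability, which the slides do not state explicitly.

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b

# Hypothetical inputs (assumptions for illustration only).
p_a = 0.3              # P(A): prior probability of hypothesis A
p_b_given_a = 0.8      # P(B|A)
p_b_given_not_a = 0.1  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(f"P(A) = {p_a:.3f}, P(A|B) = {p_a_given_b:.3f}")
```

Observing $B$ raises the probability of $A$ here because $B$ is much more likely under $A$ than under its complement.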


Bayes' Theorem

• MAP classification rule

• MAP: Maximum A Posteriori

• Assign $x$ to $c^*$ if

$$P(C = c^* \mid X = x) > P(C = c \mid X = x), \quad c \neq c^*, \; c = c_1, \dots, c_L$$

• Generative classification with the MAP rule

• Apply Bayes' rule

$$P(C = c_i \mid X = x) = \frac{P(X = x \mid C = c_i)\,P(C = c_i)}{P(X = x)} \propto P(X = x \mid C = c_i)\,P(C = c_i) \quad \forall\, c_i$$
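
Because $P(X = x)$ is the same for every class, the MAP decision only needs the products $P(X = x \mid C = c_i)\,P(C = c_i)$. A minimal sketch, assuming three classes with made-up priors and likelihoods (`map_classify` and the class names are hypothetical):

```python
def map_classify(priors, likelihoods):
    """Pick the class c that maximizes P(X=x | C=c) * P(C=c).

    priors[c]      -- P(C = c)
    likelihoods[c] -- P(X = x | C = c) for the one observed x
    """
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    best = max(scores, key=scores.get)
    # Dividing by P(X = x) rescales every score equally,
    # so it never changes which class wins.
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    return best, posteriors

# Hypothetical values for a single observation x.
priors = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
likelihoods = {"c1": 0.10, "c2": 0.40, "c3": 0.30}

best, posteriors = map_classify(priors, likelihoods)
print(best, posteriors)
```

Note that `c1` has the largest prior but is not selected, since the likelihood of the observation under `c2` outweighs it.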


Naïve Bayes

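• Applying Bayes' rule requires considering all of the data → learning the joint probability $P(X_1, \dots, X_n \mid C)$: difficult

• 10 binary features → $2^{10}$ data combinations

• Thus, assume that all input features are conditionally independent → Naïve Bayes rule

• The conditional probabilities of the individual features are assumed to be independent of one another

• Number of cases for the conditional probabilities: $2^n \to 2n$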
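
The reduction from $2^n$ to $2n$ cases can be made concrete with a short count. This is only a sketch of the counting argument for one class; `joint_cases` and `naive_cases` are hypothetical helper names.

```python
def joint_cases(n_features, n_values=2):
    """Distinct feature combinations needed for P(X1, ..., Xn | C), per class."""
    return n_values ** n_features

def naive_cases(n_features, n_values=2):
    """Entries needed for the separate tables P(Xi | C), per class."""
    return n_values * n_features

for n in (2, 10, 20):
    print(f"n={n:2d}  joint: {joint_cases(n):8d}  naive: {naive_cases(n):3d}")
```

For the 10 binary features mentioned above, the joint table has 1024 combinations per class while the naïve factorization needs only 20 entries.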


Naïve Bayes

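• Naïve Bayes

$$
\begin{aligned}
P(X_1, X_2, \dots, X_n \mid C) &= P(X_1 \mid X_2, \dots, X_n, C)\,P(X_2, \dots, X_n \mid C) \quad \text{(probability chain rule)} \\
&= P(X_1 \mid C)\,P(X_2, \dots, X_n \mid C) \\
&= P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C) \\
&= \prod_{i=1}^{n} P(X_i \mid C)
\end{aligned}
$$

• MAP classification rule for $x = (x_1, x_2, \dots, x_n)$:

$$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \neq c^*, \; c = c_1, \dots, c_L$$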
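
The factorization can be sketched in code as a product over per-feature tables. The tables and prior below are hypothetical values for a single class, and the log-space variant is a common implementation choice rather than something stated in the slides.

```python
import math

def naive_likelihood(x, cond_probs):
    """P(x1, ..., xn | c) under the naive assumption: product of P(xi | c).

    cond_probs[i] maps each value of feature i to P(Xi = value | C = c).
    """
    result = 1.0
    for i, value in enumerate(x):
        result *= cond_probs[i][value]
    return result

def naive_log_score(x, cond_probs, prior):
    """log P(c) + sum_i log P(xi | c).

    Ranks classes the same way as the product, but avoids underflow when n
    is large. math.log raises ValueError if any probability is exactly 0.
    """
    return math.log(prior) + sum(
        math.log(cond_probs[i][value]) for i, value in enumerate(x)
    )

# Hypothetical tables for one class c with two binary features.
cond_probs_c = [
    {0: 0.7, 1: 0.3},  # P(X1 | c)
    {0: 0.4, 1: 0.6},  # P(X2 | c)
]
prior_c = 0.5          # P(c)
x = (1, 0)

likelihood = naive_likelihood(x, cond_probs_c)        # 0.3 * 0.4
log_score = naive_log_score(x, cond_probs_c, prior_c)
print(likelihood, math.exp(log_score))
```

Evaluating the same score for each class and taking the largest gives the MAP decision.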


Example

• Example: Play Tennis


Example

• Learning Phase

Outlook       Play=Yes   Play=No
Sunny         2/9        3/5
Overcast      4/9        0/5
Rain          3/9        2/5

Temperature   Play=Yes   Play=No
Hot           2/9        2/5
Mild          4/9        2/5
Cool          3/9        1/5

Humidity      Play=Yes   Play=No
High          3/9        4/5
Normal        6/9        1/5

Wind          Play=Yes   Play=No
Strong        3/9        3/5
Weak          6/9        2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
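
The learned tables can be used to classify a new day with the MAP rule. The sketch below copies the probabilities from the tables above; the test instance (Sunny, Cool, High, Strong) is an assumption chosen for illustration and does not appear in the slides, and `score` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Priors and conditional probabilities copied from the learning-phase tables.
prior = {"Yes": F(9, 14), "No": F(5, 14)}
cond = {
    "Outlook": {
        "Sunny":    {"Yes": F(2, 9), "No": F(3, 5)},
        "Overcast": {"Yes": F(4, 9), "No": F(0, 5)},
        "Rain":     {"Yes": F(3, 9), "No": F(2, 5)},
    },
    "Temperature": {
        "Hot":  {"Yes": F(2, 9), "No": F(2, 5)},
        "Mild": {"Yes": F(4, 9), "No": F(2, 5)},
        "Cool": {"Yes": F(3, 9), "No": F(1, 5)},
    },
    "Humidity": {
        "High":   {"Yes": F(3, 9), "No": F(4, 5)},
        "Normal": {"Yes": F(6, 9), "No": F(1, 5)},
    },
    "Wind": {
        "Strong": {"Yes": F(3, 9), "No": F(3, 5)},
        "Weak":   {"Yes": F(6, 9), "No": F(2, 5)},
    },
}

def score(x, label):
    """Unnormalized posterior: P(label) * product of P(x_i | label)."""
    s = prior[label]
    for feature, value in x.items():
        s *= cond[feature][value][label]
    return s

# Assumed test instance (not part of the slides).
x = {"Outlook": "Sunny", "Temperature": "Cool",
     "Humidity": "High", "Wind": "Strong"}

scores = {label: score(x, label) for label in prior}
prediction = max(scores, key=scores.get)

for label, s in scores.items():
    print(f"Play={label}: {s} (about {float(s):.4f})")
print("Prediction: Play =", prediction)
```

For this instance the score for Play=No is larger, so the MAP rule predicts No. The 0/5 entry for Overcast means any Overcast day gets a score of exactly zero for Play=No, whatever the other features are.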




QA

Thank you.

박천음, 박찬민, 최재혁

𝑠𝑖𝑔𝑚𝑎 𝜶, Kangwon National University

Email: [email protected]
