
Page 1: Naïve Bayes Discussion - UMD

Naïve Bayes Discussion

Computational Linguistics: Jordan Boyd-Graber, University of Maryland
Lecture 1A

Computational Linguistics: Jordan Boyd-Graber | UMD Naïve Bayes Discussion | 1 / 13

Page 2: Naïve Bayes Discussion - UMD

Roadmap

• Content Questions
• Administrivia Questions
• NB Exercise

Page 3: Naïve Bayes Discussion - UMD

Content Questions

Page 4: Naïve Bayes Discussion - UMD

Administrivia Announcements

• Use Piazza
• Submit server
• TA office hours

Page 5: Naïve Bayes Discussion - UMD

Administrivia Questions

Page 6: Naïve Bayes Discussion - UMD

NB Example

Documents

D1 (Spam): abuja man
D2 (Ham): man dog
D3 (Spam): cialis deal
D4 (Ham): logistic mother logistic abuja
D5 (Spam): abuja deal
D6 (Ham): bagel deal
D7 (Spam): cialis dog

What's |C| and |V|?

• |C| = 2 (spam vs. ham)
• |V| = 8: 'deal', 'dog', 'bagel', 'logistic', 'mother', 'cialis', 'abuja', 'man'
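The counts on this slide can be checked with a short script. A minimal sketch (the corpus is transcribed from the slide's table; the variable names are mine):

```python
# Toy spam/ham corpus transcribed from the slide.
docs = [
    ("Spam", "abuja man"),
    ("Ham", "man dog"),
    ("Spam", "cialis deal"),
    ("Ham", "logistic mother logistic abuja"),
    ("Spam", "abuja deal"),
    ("Ham", "bagel deal"),
    ("Spam", "cialis dog"),
]

# |C| is the number of distinct class labels;
# |V| is the number of distinct word types across all documents.
classes = {label for label, _ in docs}
vocab = {word for _, text in docs for word in text.split()}

print(len(classes))  # |C| = 2
print(len(vocab))    # |V| = 8
```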

Page 10: Naïve Bayes Discussion - UMD

NB Example

Background Probabilities

(Same seven documents as above: D1, D3, D5, D7 spam; D2, D4, D6 ham.)

What's P̂(c_j)?

Page 12: Naïve Bayes Discussion - UMD

NB Example

Background Probabilities

• For spam:

$$\hat{P}(c_j = \mathrm{spam}) = \frac{N_c + 1}{N + |C|} = \frac{4 + 1}{7 + 2} = \frac{5}{9}$$

• For ham:

$$\hat{P}(c_j = \mathrm{ham}) = \frac{N_c + 1}{N + |C|} = \frac{3 + 1}{7 + 2} = \frac{4}{9}$$
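The two smoothed priors can be reproduced mechanically. A sketch (corpus and names are mine; `Fraction` keeps the 5/9 and 4/9 exact instead of rounding):

```python
from collections import Counter
from fractions import Fraction

# Class label of each document in the toy corpus: 4 spam, 3 ham.
labels = ["Spam", "Ham", "Spam", "Ham", "Spam", "Ham", "Spam"]

doc_counts = Counter(labels)   # N_c: documents per class
n_docs = len(labels)           # N = 7
n_classes = len(doc_counts)    # |C| = 2

# Add-one (Laplace) smoothed prior: P(c) = (N_c + 1) / (N + |C|)
prior = {c: Fraction(doc_counts[c] + 1, n_docs + n_classes)
         for c in doc_counts}

print(prior["Spam"])  # 5/9
print(prior["Ham"])   # 4/9
```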

Page 18: Naïve Bayes Discussion - UMD

NB Example

Conditional Probabilities

(Same seven documents as above.)

What's the conditional probability P̂(w = dog | c)?

Page 19: Naïve Bayes Discussion - UMD

NB Example

Conditional Probabilities

• For spam:

$$\hat{P}(w = \mathrm{dog} \mid c = \mathrm{spam}) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|} = \frac{1 + 1}{8 + 8} = \frac{1}{8}$$

• For ham:

$$\hat{P}(w = \mathrm{dog} \mid c = \mathrm{ham}) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|} = \frac{1 + 1}{8 + 8} = \frac{1}{8}$$
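The smoothed word likelihoods follow the same recipe: count tokens per class, then add-one smooth over the shared vocabulary. A sketch (names are mine):

```python
from collections import Counter
from fractions import Fraction

docs = [
    ("Spam", "abuja man"), ("Ham", "man dog"), ("Spam", "cialis deal"),
    ("Ham", "logistic mother logistic abuja"), ("Spam", "abuja deal"),
    ("Ham", "bagel deal"), ("Spam", "cialis dog"),
]

vocab = {w for _, text in docs for w in text.split()}  # |V| = 8
tokens = {"Spam": Counter(), "Ham": Counter()}         # T_cw counts
for label, text in docs:
    tokens[label].update(text.split())

def likelihood(word, c):
    """Add-one smoothed P(w | c) = (T_cw + 1) / (sum_w' T_cw' + |V|)."""
    return Fraction(tokens[c][word] + 1,
                    sum(tokens[c].values()) + len(vocab))

print(likelihood("dog", "Spam"))  # (1+1)/(8+8) = 1/8
print(likelihood("dog", "Ham"))   # (1+1)/(8+8) = 1/8
```

Note that both classes happen to have exactly 8 tokens, so the denominators match.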

Page 27: Naïve Bayes Discussion - UMD

NB Example

Prediction

What if you saw a document with the word "dog"?

• For spam:

$$P(c \mid d) \propto P(c) \prod_{1 \le i \le n_d} P(w_i \mid c) = \frac{5}{9} \cdot \frac{1}{8} \approx 0.07$$

• For ham:

$$P(c \mid d) \propto P(c) \prod_{1 \le i \le n_d} P(w_i \mid c) = \frac{4}{9} \cdot \frac{1}{8} \approx 0.06$$

These scores aren't probabilities. What if we wanted the real probabilities?
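The closing question on this slide is answered by normalizing: divide each unnormalized score by the sum of the scores over all classes. A sketch reusing the values derived above:

```python
from fractions import Fraction

# Unnormalized Naive Bayes scores for a one-word document "dog",
# using the prior and likelihood values from the slides.
score = {
    "Spam": Fraction(5, 9) * Fraction(1, 8),  # 5/72, about 0.07
    "Ham":  Fraction(4, 9) * Fraction(1, 8),  # 4/72, about 0.06
}

# Normalize so the scores sum to one.
total = sum(score.values())
posterior = {c: s / total for c, s in score.items()}

print(posterior["Spam"])  # 5/9
print(posterior["Ham"])   # 4/9
```

Since "dog" has the same likelihood under both classes, the 1/8 factor cancels and the posterior collapses back to the prior.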

Page 36: Naïve Bayes Discussion - UMD

NB Example

Conditional Probabilities

• For spam:

$$\hat{P}(w = \mathrm{logistic} \mid c = \mathrm{spam}) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|} = \frac{0 + 1}{8 + 8} = \frac{1}{16}$$

• For ham:

$$\hat{P}(w = \mathrm{logistic} \mid c = \mathrm{ham}) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|} = \frac{2 + 1}{8 + 8} = \frac{3}{16}$$

Page 42: Naïve Bayes Discussion - UMD

NB Example

Prediction

What if you saw a document with the words "logistic" "logistic" "dog"?

• For spam:

$$P(c \mid d) \propto P(c) \prod_{1 \le i \le n_d} P(w_i \mid c) = \frac{5}{9} \cdot \frac{1}{16} \cdot \frac{1}{16} \cdot \frac{1}{8} = \frac{5}{18432} \approx 0.0003$$

• For ham:

$$P(c \mid d) \propto P(c) \prod_{1 \le i \le n_d} P(w_i \mid c) = \frac{4}{9} \cdot \frac{3}{16} \cdot \frac{3}{16} \cdot \frac{1}{8} = \frac{1}{512} \approx 0.002$$

Ham wins: the two occurrences of "logistic" outweigh the slightly higher spam prior.
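The whole worked example can be wrapped into one small trainer and scorer. A sketch (the function names are mine), reproducing the exercise's numbers exactly with `Fraction`:

```python
from collections import Counter
from fractions import Fraction

docs = [
    ("Spam", "abuja man"), ("Ham", "man dog"), ("Spam", "cialis deal"),
    ("Ham", "logistic mother logistic abuja"), ("Spam", "abuja deal"),
    ("Ham", "bagel deal"), ("Spam", "cialis dog"),
]

def train(docs):
    """Add-one smoothed Naive Bayes: returns prior P(c) and likelihood P(w|c)."""
    n_c = Counter(label for label, _ in docs)
    vocab = {w for _, text in docs for w in text.split()}
    tokens = {c: Counter() for c in n_c}
    for label, text in docs:
        tokens[label].update(text.split())
    prior = {c: Fraction(n_c[c] + 1, len(docs) + len(n_c)) for c in n_c}
    def likelihood(word, c):
        return Fraction(tokens[c][word] + 1,
                        sum(tokens[c].values()) + len(vocab))
    return prior, likelihood

prior, likelihood = train(docs)

def score(text, c):
    """Unnormalized P(c|d): prior times the product of per-word likelihoods."""
    result = prior[c]
    for word in text.split():
        result *= likelihood(word, c)
    return result

print(score("logistic logistic dog", "Spam"))  # 5/18432 (about 0.0003)
print(score("logistic logistic dog", "Ham"))   # 1/512 (about 0.002)
```

In practice these products are computed as sums of log probabilities to avoid underflow on longer documents; exact fractions are only practical for a toy corpus like this one.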