
Introduction to Machine Learning

Jordan Boyd-Graber, University of Maryland
FEATURE ENGINEERING


Content Questions


Quiz!


Admin Questions

- Writeup must fit in one page
- Unit tests are not comprehensive
- Don't break the autograder
- HW3 due next week


PAC Learnability: Rectangles

Is the hypothesis class of axis-aligned rectangles PAC learnable?

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.


What's the learning algorithm?

Take the tightest axis-aligned rectangle enclosing all of the positive examples. Call this h_S, which we learned from data. By construction, h_S ⊆ c.
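The slides leave the learner implicit, so here is a minimal Python sketch of the tightest-rectangle learner; the function names `tightest_rectangle` and `predict` are mine, not from the deck:

```python
def tightest_rectangle(positives):
    """Fit h_S: the smallest axis-aligned rectangle (b, t, l, r)
    enclosing every positive example (x, y)."""
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return min(ys), max(ys), min(xs), max(xs)   # b, t, l, r

def predict(rect, point):
    """h_S labels a point positive iff it lies inside the rectangle."""
    b, t, l, r = rect
    x, y = point
    return l <= x <= r and b <= y <= t

h_S = tightest_rectangle([(1.0, 2.0), (3.0, 0.5), (2.0, 4.0)])
print(h_S)                        # (0.5, 4.0, 1.0, 3.0)
print(predict(h_S, (2.0, 1.0)))   # True: inside the fitted rectangle
```

Because h_S is the tightest fit around the positives, it always lies inside the true rectangle c, which is exactly the h_S ⊆ c property the proof below relies on.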


Proof

Let c ≡ [b, t] × [l, r]. By construction, h_S ⊆ c, so it can only give false negatives. The region of error is precisely R ≡ c \ h_S. WLOG, assume P(R) ≥ ε. Consider rectangles R_1, ..., R_4, one strip along each edge of c.

We get a bad h_S only if no observation falls in some R_i. So let's bound this probability.
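To make the strips concrete (an assumption following the standard proof): take each R_i to be the smallest strip along one edge of c with probability mass ε/4, e.g.

\[
R_1 = [b, b'] \times [l, r], \qquad b' = \inf\{\beta : P([b, \beta] \times [l, r]) \geq \varepsilon/4\},
\]

and similarly for the top, left, and right strips. If the sample hits all four strips, h_S reaches past the inner edge of each one, so c \ h_S ⊆ R_1 ∪ ... ∪ R_4 and the error mass is at most ε. Hence P(R) > ε requires some R_i to contain no sample point.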


Bounds

\begin{align}
\Pr[\text{error}] &= \Pr\left[\bigcup_{i=1}^{4} \{x \notin R_i\}\right] \tag{1}\\
&\leq \sum_{i=1}^{4} \Pr[x \notin R_i] \tag{2}\\
&= \sum_{i=1}^{4} \bigl(1 - P(R_i)\bigr)^m \tag{3}
\end{align}

If we assume that P(R_i) ≥ ε/4, then

\[
\Pr[\text{error}] \leq 4\left(1 - \frac{\varepsilon}{4}\right)^{m} \leq 4 \exp\left\{-\frac{m\varepsilon}{4}\right\} \tag{4}
\]

Solving for m gives

\[
m \geq \frac{4 \ln(4/\delta)}{\varepsilon} \tag{5}
\]
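As a sanity check on (5), a small Monte Carlo sketch under assumed settings: D is uniform on the unit square, the true rectangle and the helper name `pac_rectangle_demo` are illustrative choices, not from the slides.

```python
import math
import random

def pac_rectangle_demo(eps=0.1, delta=0.05, trials=200, seed=0):
    """Draw m points from D (uniform on the unit square), fit the
    tightest rectangle around the positives, and measure the error
    mass P(c \\ h_S); bound (5) says it exceeds eps rarely."""
    rng = random.Random(seed)
    m = math.ceil(4 * math.log(4 / delta) / eps)  # 176 for these values
    b, t, l, r = 0.2, 0.8, 0.3, 0.7               # assumed true concept c
    failures = 0
    for _ in range(trials):
        sample = [(rng.random(), rng.random()) for _ in range(m)]
        pos = [(x, y) for x, y in sample if l <= x <= r and b <= y <= t]
        if pos:
            xs = [x for x, _ in pos]
            ys = [y for _, y in pos]
            area_h = (max(xs) - min(xs)) * (max(ys) - min(ys))
        else:
            area_h = 0.0          # empty h_S: every positive is missed
        # Under uniform D, error mass = area(c) - area(h_S) since h_S is inside c
        err = (r - l) * (t - b) - area_h
        failures += err > eps
    return m, failures / trials   # failure rate should be below delta

print(pac_rectangle_demo())       # e.g. (176, 0.0)
```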


Concept Learning

Are Boolean conjunctions PAC learnable? Think of every feature as a Boolean variable; in a given example, the variable takes the value 1 if its corresponding feature appears in the example and 0 otherwise. In this way, if the number of measured features is n, the concept is represented as a Boolean function c : {0,1}^n → {0,1}. For example, we could define a chair as something that has four legs, that you can sit on, and that is made of wood. Can you learn such a conjunction concept over n variables?
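In code, the encoding step might look like this; the feature names are invented for the chair example, and only the 0/1 encoding itself comes from the slide:

```python
FEATURES = ["has_four_legs", "can_sit_on_it", "made_of_wood"]  # assumed names

def encode(observed_features):
    """Map an example to a {0,1} vector: bit i is 1 iff feature i appears."""
    return tuple(int(f in observed_features) for f in FEATURES)

print(encode({"has_four_legs", "made_of_wood"}))  # (1, 0, 1)
```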


Algorithm

Start with

\[
h = \bar{x}_1 x_1 \bar{x}_2 x_2 \cdots \bar{x}_n x_n \tag{6}
\]

(say no to everything). For every positive example you see, remove the literals that example contradicts: if x_i = 1, drop x̄_i; if x_i = 0, drop x_i. Example: 10001, 11001, 10000, 11000 (a code sketch follows the list):

- After first example: x_1 x̄_2 x̄_3 x̄_4 x_5
- After last example: x_1 x̄_3 x̄_4
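A runnable sketch of this eliminate-contradicted-literals algorithm; the set representation and names are my choices, the algorithm itself is the slide's:

```python
def learn_conjunction(n, positive_examples):
    """Start with all 2n literals and drop each literal contradicted
    by a positive example; what survives is the learned conjunction."""
    # A literal is (i, True) for x_i or (i, False) for its negation
    h = {(i, sign) for i in range(n) for sign in (True, False)}
    for example in positive_examples:      # e.g. the bit string "10001"
        for i, bit in enumerate(example):
            # bit '1' contradicts the negated literal, '0' the positive one
            h.discard((i, bit == "0"))
    return h

h = learn_conjunction(5, ["10001", "11001", "10000", "11000"])
print(sorted(h))   # [(0, True), (2, False), (3, False)] = x_1 x̄_3 x̄_4
```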


Observations

- Having seen no data, h says no to everything
- Our algorithm can be too specific: it might not say yes when it should
- We make an error on a literal if we've never seen it before (there are 2n literals: x_1, x̄_1, ..., x_n, x̄_n)


Solving for number of examples

General learning bounds for consistent hypotheses:

\[
m \geq \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right) \tag{7}
\]

For conjunctions over n variables, |H| = 3^n (each variable appears positively, appears negated, or is absent), so

\[
m \geq \frac{1}{\varepsilon}\left(n \ln 3 + \ln \frac{1}{\delta}\right) \tag{8}
\]
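Plugging assumed numbers into (8) shows the scale; the helper name is mine:

```python
import math

def conjunction_sample_bound(n, eps, delta):
    """Bound (8): m >= (1/eps) * (n*ln(3) + ln(1/delta))."""
    return math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)

# e.g. n = 10 features, eps = 0.1, delta = 0.05
print(conjunction_sample_bound(10, 0.1, 0.05))   # 140
```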


3-DNF

3-DNF concepts are not efficiently PAC learnable unless P = NP.
