Machine Learning Reading: Chapter 18

Post on 20-Dec-2015


TRANSCRIPT

Page 1

Machine Learning

Reading: Chapter 18

Page 2

Machine Learning and AI

Improve task performance through observation, teaching

Acquire knowledge automatically for use in a task

Learning as a key component in intelligence

Page 3

What does it mean to learn?

Page 4

Different kinds of Learning

Rote learning

Learning from instruction

Learning by analogy

Learning from observation and discovery

Learning from examples

Page 5

Inductive Learning

Input: examples (x, f(x)) of an unknown function f

Output: a function h that approximates f

A good hypothesis, h, is a generalization or learned rule

Page 6

How do systems learn?

Supervised

Unsupervised

Reinforcement

Page 7

Three Types of Learning

Rule induction, e.g., decision trees

Knowledge-based, e.g., using a domain theory

Statistical, e.g., Naïve Bayes, nearest neighbor, support vector machines

Page 8

Applications

Language/speech: machine translation, summarization, grammars

IR: text categorization, relevance feedback

Medical: assessment of illness severity

Vision: face recognition, digit recognition, outdoor scene recognition

Security: intrusion detection, network traffic, credit fraud

Social networks: email traffic

To think about: applications to systems, computer engineering, software?

Page 9

Language Tasks

Text summarization
Task: given a document, which sentences could serve as the summary?
Training data: summary + document pairs
Output: rules which extract sentences given an unseen document

Grammar induction
Task: produce a tree representing syntactic structure given a sentence
Training data: set of sentences annotated with parse trees
Output: rules which generate a parse tree given an unseen sentence

Page 10

IR Task

Text categorization (http://www.yahoo.com)
Task: given a web page, is it news or not?
Binary classification (yes, no), or classify as one of business & economy, news & media, computers
Training data: documents labeled with category
Output: a yes/no response for a new document; a category for a new document

Page 11

Medical

Task: does a patient have heart disease (rated on a scale from 1 to 4)?

Training data:
Age, sex, cholesterol, chest pain location, chest pain type, resting blood pressure, smoker?, fasting blood sugar, etc.
Characterization of heart disease (0, 1-4)

Output: given a new patient, classification by disease

Page 12

General Approach

Formulate task

Prior model (parameters, structure)

Obtain data

What representation should be used? (attribute/value pairs)

Annotate data

Learn/refine model with data (training)

Use model for classification or prediction on unseen data (testing)

Measure accuracy

Page 13

Issues

Representation
How to map from a representation in the domain to a representation used for learning?

Training data
How can training data be acquired?

Amount of training data
How well does the algorithm do as we vary the amount of data?

Which attributes influence learning most?

Does the learning algorithm provide insight into the generalizations made?

Page 14

Classification Learning

Input: a set of attributes and values

Output: a discrete-valued function (learning a continuous-valued function is called regression)

Binary or boolean classification: the category is either true or false

Page 15

Learning Decision Trees

Each node tests the value of an input attribute

Branches from the node correspond to possible values of the attribute

Leaf nodes supply the values to be returned if that leaf is reached

Page 16

Example

http://www.ics.uci.edu/~mlearn/MLSummary.html

Iris Plant Database: which of 3 classes is a given Iris plant?
Iris Setosa, Iris Versicolour, Iris Virginica

Attributes: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm

Page 17

Summary statistics:

               Min  Max  Mean  SD    Class Correlation
sepal length:  4.3  7.9  5.84  0.83   0.7826
sepal width:   2.0  4.4  3.05  0.43  -0.4194
petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

Rules to learn:

If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa

If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour

If sepal length < 5 and sepal width > 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica

Page 18

Data  S-length  S-width  P-length  P-width  Class
1     6.8       3        6.3       2.3      Versicolour
2     7         3.9      2.4       2.2      Setosa
3     2         3        2.6       1.7      Virginica
4     3         3.4      2.5       1.1      Virginica
5     5.5       3.6      6.8       2.4      Versicolour
6     7.7       4.1      1.2       1.4      Setosa
7     6.3       4.3      1.6       1.2      Setosa
8     1         3.7      2.8       2.2      Virginica
9     6         4.2      5.6       2.1      Versicolour

Page 19

Data  S-length  S-width  P-length  Class
1     6.8       3        6.3       Versicolour
2     7         3.9      2.4       Setosa
3     2         3        2.6       Virginica
4     3         3.4      2.5       Virginica
5     5.5       3.6      6.8       Versicolour
6     7.7       4.1      1.2       Setosa
7     6.3       4.3      1.6       Setosa
8     1         3.7      2.8       Virginica
9     6         4.2      5.6       Versicolour

Page 20

Constructing the Decision Tree

Goal: Find the smallest decision tree consistent with the examples

Find the attribute that best splits the examples
Form a tree with root = best attribute
For each value vi (or range) of the best attribute:

Select those examples with best = vi

Construct subtree_i by recursively calling decision tree with the subset of examples and all attributes except best

Add a branch to the tree with label = vi and subtree = subtree_i
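The recursive procedure above can be sketched in Python for discrete-valued attributes (the dict-of-attributes example format and the function names are our own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the class distribution of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def remainder(examples, attr):
    """Weighted entropy of the subsets produced by splitting on attr."""
    groups = {}
    for ex in examples:
        groups.setdefault(ex[attr], []).append(ex["class"])
    n = len(examples)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def build_tree(examples, attributes):
    labels = [ex["class"] for ex in examples]
    if len(set(labels)) == 1:                  # all examples agree: leaf
        return labels[0]
    if not attributes:                         # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    # the attribute minimizing the remainder maximizes information gain
    best = min(attributes, key=lambda a: remainder(examples, a))
    branches = {}
    for v in {ex[best] for ex in examples}:    # one branch per observed value
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        branches[v] = build_tree(subset, rest)
    return (best, branches)
```

A tree comes back as nested (attribute, {value: subtree}) pairs, with class labels at the leaves.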

Page 21

Construct example decision tree

Page 22

Tree and Rules Learned

Learned tree (example numbers at the leaves):

S-length
  < 5.5   → examples 3, 4, 8 (Virginica)
  ≥ 5.5   → P-length
              ≥ 5.6  → examples 1, 5, 9 (Versicolour)
              ≤ 2.4  → examples 2, 6, 7 (Setosa)

If S-length < 5.5, then Virginica

If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour

If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa

Page 23

Comparison of Target and Learned

Learned:
If S-length < 5.5, then Virginica
If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour
If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa

Target:
If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
If sepal length < 5 and sepal width > 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica

Page 24

Text Classification

Is text_i a finance news article?

Positive or negative?

Page 25

20 attributes

Investors: 2, Dow: 2, Jones: 2, Industrial: 1, Average: 3, Percent: 5, Gain: 6, Trading: 8, Broader: 5, stock: 5, Indicators: 6, Standard: 2, Rolling: 1, Nasdaq: 3, Early: 10, Rest: 12, More: 13, first: 11, Same: 12, The: 30

Page 26

20 attributes

Men’s Basketball Championship UConn Huskies Georgia Tech Women Playing Crown Titles Games Rebounds All-America early rolling Celebrates Rest More First The same

Page 27

Example  stock  rolling  the  class
1        0      3        40   other
2        6      8        35   finance
3        7      7        25   other
4        5      7        14   other
5        8      2        20   finance
6        9      4        25   finance
7        5      6        20   finance
8        0      2        35   other
9        0      11       25   finance
10       0      15       28   other

Page 28

Issues

Representation
How to map from a representation in the domain to a representation used for learning?

Training data
How can training data be acquired?

Amount of training data
How well does the algorithm do as we vary the amount of data?

Which attributes influence learning most?

Does the learning algorithm provide insight into the generalizations made?

Page 29

Constructing the Decision Tree

Goal: Find the smallest decision tree consistent with the examples

Find the attribute that best splits the examples
Form a tree with root = best attribute
For each value vi (or range) of the best attribute:

Select those examples with best = vi

Construct subtree_i by recursively calling decision tree with the subset of examples and all attributes except best

Add a branch to the tree with label = vi and subtree = subtree_i

Page 30

Choosing the Best Attribute: Binary Classification

Want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction

Information theory (Shannon and Weaver, 1949)

H(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)

H(1/2, 1/2) = −1/2 log2(1/2) − 1/2 log2(1/2) = 1 bit
H(1/100, 99/100) = −0.01 log2(0.01) − 0.99 log2(0.99) ≈ 0.08 bits
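As a quick check, the two example values can be computed directly, reading the second distribution as (0.01, 0.99) (a minimal sketch; the function name H is ours):

```python
import math

def H(*probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(H(1/2, 1/2))    # 1.0 bit: a 50/50 split is maximally uncertain
print(H(0.01, 0.99))  # ~0.08 bits: a near-certain outcome carries little information
```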

Page 31

Expected information remaining after testing attribute A, whose values split the examples into subsets with pi positive and ni negative examples:

Remainder(A) = Σ_i (pi + ni)/(p + n) · H(pi/(pi + ni), ni/(pi + ni))

For the example set, p = n = 5 (10 examples in all), so H(1/2, 1/2) = 1 bit

Page 32

Information Gain

Information gain (from attribute test) = difference between the original information requirement and new requirement

Gain(A) = H(p/(p + n), n/(p + n)) − Remainder(A)

Page 33

Example  stock  rolling  the  class
1        0      3        40   other
2        6      8        35   finance
3        7      7        25   other
4        5      7        14   other
5        8      2        20   finance
6        9      4        25   finance
7        5      6        20   finance
8        0      2        35   other
9        0      11       25   finance
10       0      15       28   other

Page 34

Candidate splits (example numbers in each branch):

stock:   < 5 → 1, 8, 9, 10;   5-10 → 2, 3, 4, 5, 6, 7
rolling: < 5 → 1, 5, 6, 8;   5-10 → 2, 3, 4, 7;   ≥ 10 → 9, 10

Gain(stock) = 1 − [4/10 H(1/4, 3/4) + 6/10 H(4/6, 2/6)]
            = 1 − [0.4(0.811) + 0.6(0.918)] = 1 − 0.875 ≈ 0.125

Gain(rolling) = 1 − [4/10 H(1/2, 1/2) + 4/10 H(1/2, 1/2) + 2/10 H(1/2, 1/2)] = 0
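The two gains can be recomputed from the table, taking the class proportions within each branch before applying H (the set names and helper functions are ours):

```python
import math

def H(p, n):
    """Entropy in bits of a split with p positive and n negative examples."""
    def h(x):
        return 0.0 if x == 0 else -x * math.log2(x)
    return h(p / (p + n)) + h(n / (p + n))

finance = {2, 5, 6, 7, 9}                                 # finance rows of the table
stock_branches   = [{1, 8, 9, 10}, {2, 3, 4, 5, 6, 7}]    # stock < 5, stock 5-10
rolling_branches = [{1, 5, 6, 8}, {2, 3, 4, 7}, {9, 10}]  # rolling < 5, 5-10, >= 10

def gain(branches, n_total=10):
    rem = sum(len(b) / n_total * H(len(b & finance), len(b - finance))
              for b in branches)
    return H(5, 5) - rem                                  # 5 finance, 5 other overall

print(round(gain(stock_branches), 3))    # 0.125
print(round(gain(rolling_branches), 3))  # 0.0
```

Every rolling branch is an even finance/other mix, so that split conveys no information at all.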

Page 35

Other cases

What if the class is discrete-valued, not binary?

What if an attribute has many values (e.g., 1 per instance)?

Page 36

Training vs. Testing

A learning algorithm is good if it uses its learned hypothesis to make accurate predictions on unseen data

Collect a large set of examples (with classifications)

Divide into two disjoint sets: the training set and the test set

Apply the learning algorithm to the training set, generating hypothesis h

Measure the percentage of examples in the test set that are correctly classified by h

Repeat for different sizes of training sets and different randomly selected training sets of each size.

Page 37

Page 38

Division into 3 sets

Inadvertent peeking:

Parameters that must be tuned (e.g., how to split values)

Generate different hypotheses for different parameter values on the training data

Choosing the parameter values that perform best on the test data lets the test data influence the hypothesis, so a third, held-out set is needed for such choices

Why do we need to do this for selecting best attributes?

Page 39

Overfitting

Learning algorithms may use irrelevant attributes to make decisions (for news: day published and newspaper)

Decision tree pruning:

Prune away attributes with low information gain

Use statistical significance to test whether the gain is meaningful

Page 40

K-fold Cross Validation

To reduce overfitting

Run k experiments, using a different 1/k of the data for testing each time, and average the results

5-fold, 10-fold, leave-one-out
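A minimal sketch of the procedure (the callback train_and_score stands in for "train on one set, measure accuracy on the other"; all names here are ours):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, k, train_and_score):
    """Average score over k runs, each holding out a different fold for testing."""
    folds = k_fold_indices(len(examples), k)
    scores = []
    for i in range(k):
        test = [examples[j] for j in folds[i]]
        train = [examples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train, test))
    return sum(scores) / k
```

With k = len(examples) this is leave-one-out.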

Page 41

Ensemble Learning

Learn from a collection of hypotheses

Majority voting

Enlarges the hypothesis space

Page 42

Boosting

Uses a weighted training set:
Each example j has an associated weight wj ≥ 0; higher-weighted examples have higher importance

Initially, wj = 1 for all examples

Next round: increase the weights of misclassified examples, decrease the other weights

From the new weighted set, generate hypothesis h2

Continue until M hypotheses have been generated

Final ensemble hypothesis = weighted-majority combination of all M hypotheses, weighting each hypothesis according to how well it did on the training data
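One round of the reweighting step can be sketched as follows (the specific update factor error/(1 − error) follows the AdaBoost convention; that choice is our assumption, not stated on the slide):

```python
def reweight(weights, correct, error):
    """One boosting round: shrink the weights of correctly classified examples
    by error/(1 - error) (the AdaBoost convention; an assumption here), then
    renormalize, so misclassified examples gain relative weight."""
    beta = error / (1 - error)
    new = [w * (beta if ok else 1.0) for w, ok in zip(weights, correct)]
    z = sum(new)
    return [w / z for w in new]

# 4 equally weighted examples; the hypothesis misclassifies only the last one
w = reweight([0.25] * 4, [True, True, True, False], error=0.25)
print(w)  # the misclassified example's weight rises from 0.25 to 0.5
```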

Page 43

AdaBoost

If the input learning algorithm L is a weak learning algorithm (L always returns a hypothesis whose weighted error on the training set is slightly better than random), then AdaBoost returns a hypothesis that classifies the training data perfectly for large enough M

That is, it boosts the accuracy of the original learning algorithm on the training data

Page 44

Issues

Representation
How to map from a representation in the domain to a representation used for learning?

Training data
How can training data be acquired?

Amount of training data
How well does the algorithm do as we vary the amount of data?

Which attributes influence learning most?

Does the learning algorithm provide insight into the generalizations made?