CS364 Artificial Intelligence Machine Learning
Matthew Casey
portal.surrey.ac.uk/computing/resources/l3/cs364

Uploaded by butest, 05-Jul-2015

TRANSCRIPT

Page 1: CS364 Artificial Intelligence Machine Learning

12775

CS364 Artificial Intelligence Machine Learning

Matthew Casey

portal.surrey.ac.uk/computing/resources/l3/cs364

Page 2: CS364 Artificial Intelligence Machine Learning

Learning Outcomes

• Describe methods for acquiring human knowledge
  – Through experience
• Evaluate which of the acquisition methods would be most appropriate in a given situation
  – Limited data available through example

Page 3: CS364 Artificial Intelligence Machine Learning

Learning Outcomes

• Describe techniques for representing acquired knowledge in a way that facilitates automated reasoning over the knowledge
  – Generalise experience to novel situations
• Categorise and evaluate AI techniques according to different criteria such as applicability and ease of use, and intelligently participate in the selection of the appropriate techniques and tools, to solve simple problems
  – Strategies to overcome the ‘knowledge engineering bottleneck’

Page 4: CS364 Artificial Intelligence Machine Learning


What is Learning?

• ‘The action of receiving instruction or acquiring knowledge’

• ‘A process which leads to the modification of behaviour or the acquisition of new abilities or responses, and which is additional to natural development by growth or maturation’

Source: Oxford English Dictionary Online: http://www.oed.com/, accessed October 2003

Page 5: CS364 Artificial Intelligence Machine Learning

Machine Learning

• Negnevitsky: ‘In general, machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy’ (2005:165)
• Callan: ‘A machine or software tool would not be viewed as intelligent if it could not adapt to changes in its environment’ (2003:225)
• Luger: ‘Intelligent agents must be able to change through the course of their interactions with the world’ (2002:351)

Page 6: CS364 Artificial Intelligence Machine Learning

Types of Learning

• Inductive learning
  – Learning from examples
• Evolutionary/genetic learning
  – Shaping a population of individual solutions through survival of the fittest
• Emergent learning
  – Learning through social interaction: game of life

Page 7: CS364 Artificial Intelligence Machine Learning

Inductive Learning

• Supervised learning
  – Training examples with a known classification from a teacher
• Unsupervised learning
  – No pre-classification of training examples
  – Competitive learning: learning through competition on training examples

Page 8: CS364 Artificial Intelligence Machine Learning

Key Concepts

• Learn from experience
  – Through examples, analogy or discovery
• To adapt
  – Changes in response to interaction
• Generalisation
  – To use experience to form a response to novel situations

Page 9: CS364 Artificial Intelligence Machine Learning


Generalisation

Page 10: CS364 Artificial Intelligence Machine Learning

Machine Learning

• Techniques and algorithms that adapt through experience
• Used for:
  – Interpretation / visualisation: summarise data
  – Prediction: time series / stock data
  – Classification: malignant or benign tumours
  – Regression: curve fitting
  – Discovery: data mining / pattern discovery

Page 11: CS364 Artificial Intelligence Machine Learning

Why?

• Complexity of task / amount of data
  – Other techniques fail or are computationally expensive
• Problems that cannot be defined
  – Discovery of patterns / data mining
• Knowledge Engineering Bottleneck
  – ‘Cost and difficulty of building expert systems using traditional […] techniques’ (Luger 2002:351)

Page 12: CS364 Artificial Intelligence Machine Learning

Common Techniques

• Least squares
• Decision trees
• Support vector machines
• Boosting
• Neural networks
• K-means
• Genetic algorithms

Hastie, T., Tibshirani, R. & Friedman, J.H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag.

Page 14: CS364 Artificial Intelligence Machine Learning

Neural Networks

• Output is a complex (often non-linear) combination of the inputs

[Figure: a single neuron. Input signals x1…xm are multiplied by synaptic weights wk1…wkm, summed with the bias wk0 at the summing junction Σ, and passed through the activation function Φ(·) to produce the output yk.]
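As a sketch of what the neuron figure computes (not from the slides; the weights, inputs and the choice of a logistic sigmoid for Φ are illustrative assumptions):

```python
import math

def neuron(x, w, bias):
    """Single neuron: inputs x are weighted by synaptic weights w,
    summed with the bias at the summing junction, then passed through
    an activation function (here a logistic sigmoid stands in for Phi)."""
    s = bias + sum(wi * xi for wi, xi in zip(w, x))  # summing junction
    return 1.0 / (1.0 + math.exp(-s))                # activation Phi(s)

# Hypothetical inputs and weights, for illustration only
print(round(neuron([1.0, 0.5], [0.4, -0.2], 0.1), 3))  # 0.599
```

Changing the weights changes how each input contributes to the output, which is what learning algorithms for neural networks adjust.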

Page 15: CS364 Artificial Intelligence Machine Learning

Decision Trees

• A map of the reasoning process, good at solving classification problems (Negnevitsky, 2005)
• A decision tree represents a number of different attributes and values
  – Nodes represent attributes
  – Branches represent values of the attributes
• A path through the tree represents a decision
• A tree can be associated with rules

Page 16: CS364 Artificial Intelligence Machine Learning

Example 1

• Consider one rule for an ice-cream seller (Callan 2003:241)

IF Outlook = Sunny
AND Temperature = Hot
THEN Sell

Page 17: CS364 Artificial Intelligence Machine Learning

Example 1

Outlook (root node)
├─ Sunny → Temperature (node)
│   ├─ Hot → Sell (leaf)
│   ├─ Mild → Holiday Season
│   │   ├─ Yes → Sell
│   │   └─ No → Don’t Sell
│   └─ Cold → Don’t Sell
└─ Overcast → Holiday Season
    ├─ Yes → Temperature
    │   ├─ Hot → Sell
    │   ├─ Mild → Don’t Sell
    │   └─ Cold → Don’t Sell
    └─ No → Don’t Sell

Page 18: CS364 Artificial Intelligence Machine Learning

Construction

• Concept learning: inducing concepts from examples
• We can intuitively construct a decision tree for a small set of examples
• Different algorithms are used to construct a tree based upon the examples
  – Most popular: ID3 (Quinlan, 1986)

Page 19: CS364 Artificial Intelligence Machine Learning

Example 2

Draw a tree based upon the following data:

Item  X      Y      Class
1     False  False  +
2     True   False  +
3     False  True   −
4     True   True   −

Page 20: CS364 Artificial Intelligence Machine Learning

Example 2: Tree 1

Y
├─ True → Negative {3,4}
└─ False → Positive {1,2}

Page 21: CS364 Artificial Intelligence Machine Learning

Example 2: Tree 2

X
├─ True → Y {2,4}
│   ├─ True → Negative {4}
│   └─ False → Positive {2}
└─ False → Y {1,3}
    ├─ True → Negative {3}
    └─ False → Positive {1}

Page 22: CS364 Artificial Intelligence Machine Learning

Which Tree?

• Different trees can be constructed from the same set of examples
• Which tree is the best?
  – Based upon the choice of attributes at each node in the tree
  – A split in the tree (branches) should correspond to the predictor with the maximum separating power
• Examples can be contradictory
  – Real-life data is noisy
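Separating power can be checked numerically on the four Example 2 items. The sketch below assumes the table reads as item 1: X=False, Y=False, +; item 2: X=True, Y=False, +; item 3: X=False, Y=True, −; item 4: X=True, Y=True, −; splitting on Y then separates the classes completely while X tells us nothing:

```python
import math

# Example 2 data as (X, Y, class) tuples for items 1-4
data = [(False, False, '+'), (True, False, '+'),
        (False, True, '-'), (True, True, '-')]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain(i):
    """Information gain of splitting on column i (0 = X, 1 = Y)."""
    labels = [row[-1] for row in data]
    remainder = sum(
        len(sub) / len(data) * entropy(sub)
        for v in (True, False)
        for sub in [[row[-1] for row in data if row[i] == v]])
    return entropy(labels) - remainder

print(gain(0), gain(1))  # X gains nothing (0.0); Y separates perfectly (1.0)
```

This is exactly the criterion Tree 1 satisfies and Tree 2 does not: Tree 1 tests the maximally separating attribute first.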

Page 23: CS364 Artificial Intelligence Machine Learning

Example 3

• Callan (2003:242-247): locating a new bar

Page 24: CS364 Artificial Intelligence Machine Learning

Choosing Attributes

• Entropy: a measure of disorder (high is bad)
• For c classification categories, an attribute a that has value v, and probability p_i of v being in category i, the entropy E is:

  E(a = v) = − Σ_{i=1}^{c} p_i log₂ p_i
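The formula translates directly into code; a minimal sketch (the function name and interface are ours, not from the slides):

```python
import math

def entropy(counts):
    """E(a = v) = -sum over i of p_i * log2(p_i), where counts holds the
    number of examples in each of the c categories for attribute value v."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)   # zero counts contribute nothing

# The City/Town = Y case worked through on the following slides:
# 7 positive and 3 negative examples
print(round(entropy([7, 3]), 3))  # 0.881
```

Note that a pure split (all examples in one category) gives entropy 0, the best possible score.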

Page 25: CS364 Artificial Intelligence Machine Learning

Entropy Example

• Choice of attributes: City/Town, University, Housing Estate, Industrial Estate, Transport and Schools
• City/Town is either Y or N
• For Y: 7 positive examples, 3 negative
• For N: 4 positive examples, 6 negative

Page 26: CS364 Artificial Intelligence Machine Learning

Entropy Example

• City/Town as root node:
  – For c = 2 (positive and negative) classification categories
  – Attribute a = City/Town that has value v = Y
  – Probability of v = Y being in category positive: p_positive = 7/10
  – Probability of v = Y being in category negative: p_negative = 3/10

Page 27: CS364 Artificial Intelligence Machine Learning

Entropy Example

• City/Town as root node:
  – For c = 2 (positive and negative) classification categories
  – Attribute a = City/Town that has value v = Y
  – Entropy E is:

  E(City/Town = Y) = −(7/10) log₂(7/10) − (3/10) log₂(3/10) = 0.881

Page 28: CS364 Artificial Intelligence Machine Learning

Entropy Example

• City/Town as root node:
  – For c = 2 (positive and negative) classification categories
  – Attribute a = City/Town that has value v = N
  – Probability of v = N being in category positive: p_positive = 4/10
  – Probability of v = N being in category negative: p_negative = 6/10

Page 29: CS364 Artificial Intelligence Machine Learning

Entropy Example

• City/Town as root node:
  – For c = 2 (positive and negative) classification categories
  – Attribute a = City/Town that has value v = N
  – Entropy E is:

  E(City/Town = N) = −(4/10) log₂(4/10) − (6/10) log₂(6/10) = 0.971

Page 30: CS364 Artificial Intelligence Machine Learning

Choosing Attributes

• Information gain: the expected reduction in entropy (high is good)
• Entropy of the whole example set T is E(T)
• Examples with a = v, where v is the jth value, are T_j, so E(a = v) = E(T_j)
• Gain is:

  Gain(T, a) = E(T) − Σ_{j=1}^{V} (|T_j| / |T|) E(T_j)
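The gain formula can be sketched in code as well (the function names and interface are ours; the entropy helper repeats the earlier definition so the snippet stands alone):

```python
import math

def entropy(counts):
    """-sum of p_i * log2(p_i) over the class counts for one value."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(total_counts, partitions):
    """Gain(T, a) = E(T) - sum_j (|T_j| / |T|) * E(T_j).
    total_counts: class counts over all of T;
    partitions: one list of class counts per value v_j of attribute a."""
    n = sum(total_counts)
    return entropy(total_counts) - sum(
        (sum(tj) / n) * entropy(tj) for tj in partitions)

# City/Town example: T is 11 positive / 9 negative;
# value Y gives 7 / 3 and value N gives 4 / 6
print(round(gain([11, 9], [[7, 3], [4, 6]]), 3))  # 0.067
```

This reproduces the 0.067 worked out for City/Town on the following slides.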

Page 31: CS364 Artificial Intelligence Machine Learning

Information Gain Example

• For the root of the tree there are 20 examples:
  – For c = 2 (positive and negative) classification categories
  – Probability of being positive, with 11 examples: p_positive = 11/20
  – Probability of being negative, with 9 examples: p_negative = 9/20

Page 32: CS364 Artificial Intelligence Machine Learning

Information Gain Example

• For the root of the tree there are 20 examples, so |T| = 20:
  – For c = 2 (positive and negative) classification categories
  – Entropy of all training examples E(T) is:

  E(T) = −(11/20) log₂(11/20) − (9/20) log₂(9/20) = 0.993

Page 33: CS364 Artificial Intelligence Machine Learning

Information Gain Example

• City/Town as root node:
  – 10 examples for a = City/Town with value v = Y: |T_Y| = 10, E(T_Y) = 0.881
  – 10 examples for a = City/Town with value v = N: |T_N| = 10, E(T_N) = 0.971

  Gain(T, City/Town) = 0.993 − ((10/20) × 0.881 + (10/20) × 0.971) = 0.067

Page 34: CS364 Artificial Intelligence Machine Learning


Example 4

• Calculate the information gain for the Transport attribute

Page 35: CS364 Artificial Intelligence Machine Learning


Information Gain Example

Page 36: CS364 Artificial Intelligence Machine Learning

Choosing Attributes

• Choose the root node as the attribute that gives the highest information gain
  – In this case the attribute Transport, with 0.266
• Branches from the root node then become the values associated with the attribute
  – Recursive calculation of attributes/nodes
  – Filter examples by attribute value

Page 37: CS364 Artificial Intelligence Machine Learning

Recursive Example

• With Transport as the root node:
  – Select the examples where Transport is Average (1, 3, 6, 8, 11, 15, 17)
  – Use only these examples to construct this branch of the tree
  – Repeat for each attribute value (Poor, Good)

Page 38: CS364 Artificial Intelligence Machine Learning

Final Tree

Transport
├─ A (Average) → Housing Estate {1,3,6,8,11,15,17}
│   ├─ L → Industrial Estate {11,17}
│   │   ├─ Y → Positive {11}
│   │   └─ N → Negative {17}
│   ├─ M → University {1,3,15}
│   │   ├─ Y → Positive {1,3}
│   │   └─ N → Negative {15}
│   ├─ S → Negative {6}
│   └─ N → Negative {8}
├─ P (Poor) → Industrial Estate {2,4,5,9,10,13,14,18}
│   ├─ Y → Positive {5,9,14}
│   └─ N → Negative {2,4,10,13,18}
└─ G (Good) → Positive {7,12,16,19,20}

Callan 2003:243

Page 39: CS364 Artificial Intelligence Machine Learning

ID3

• Procedure Extend(Tree d, Examples T)
  – Choose best attribute a for root of d
    • Calculate E(a=v) and Gain(T,a) for each attribute
    • The attribute with the highest Gain(T,a) is selected as best
  – Assign best attribute a to root of d
  – For each value v of attribute a
    • Create a branch for a=v, resulting in sub-tree d_j
    • Assign to T_j the training examples from T where a=v
    • Recurse into the sub-tree with Extend(d_j, T_j)
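The Extend procedure above can be sketched as a short recursive function. This is a simplified implementation under our own assumptions (examples as (attribute-dict, label) pairs, a nested dict for the tree, majority class at exhausted nodes); it is not Quinlan's original code:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def extend(examples, attributes):
    """ID3 sketch: return a nested tree {attribute: {value: subtree or label}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class

    def remainder(a):                   # weighted entropy after splitting on a
        split = {}
        for x, label in examples:
            split.setdefault(x[a], []).append(label)
        return sum(len(ls) / len(labels) * entropy(ls) for ls in split.values())

    # Best attribute = highest Gain(T, a) = E(T) - remainder(a)
    best = max(attributes, key=lambda a: entropy(labels) - remainder(a))
    rest = [a for a in attributes if a != best]
    return {best: {v: extend([(x, l) for x, l in examples if x[best] == v], rest)
                   for v in {x[best] for x, _ in examples}}}

# Example 2 data (items 1-4): ID3 picks Y, the perfectly separating attribute
examples = [({'X': False, 'Y': False}, '+'), ({'X': True, 'Y': False}, '+'),
            ({'X': False, 'Y': True}, '-'), ({'X': True, 'Y': True}, '-')]
print(extend(examples, ['X', 'Y']))
```

On the Example 2 data this yields a one-node tree splitting on Y, matching the "which tree" discussion earlier.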

Page 40: CS364 Artificial Intelligence Machine Learning

Issues

• Use prior knowledge where available
• Not all the examples may be needed to construct a tree
  – Test generalisation of the tree during training and stop when the desired performance is achieved
  – Prune the tree once constructed
• Examples may be noisy
• Examples may contain irrelevant attributes

Page 41: CS364 Artificial Intelligence Machine Learning

Extracting Rules

• We can extract rules from decision trees
  – Create one rule for each root-to-leaf path
  – Simplify by combining rules
• Other techniques are not so transparent:
  – Neural networks are often described as ‘black boxes’: it is difficult to understand what the network is doing
  – Extraction of rules from trees can help us to understand the decision process
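The one-rule-per-path idea is easy to mechanise. A sketch over the nested-dict tree representation used in the ID3 sketch above; the toy tree here is a hypothetical fragment for illustration, not the full bar example:

```python
def extract_rules(tree, conditions=()):
    """Emit one IF ... THEN rule for each root-to-leaf path of the tree."""
    if not isinstance(tree, dict):                    # leaf: a class label
        body = ' AND '.join(f'{a} is {v}' for a, v in conditions)
        return [f'IF {body} THEN {tree}']
    (attr, branches), = tree.items()                  # node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attr, value),))
    return rules

# Hypothetical fragment of a tree, in the nested-dict form
toy = {'Transport': {'Good': 'Positive',
                     'Average': {'Industrial Estate': {'Yes': 'Positive',
                                                       'No': 'Negative'}}}}
for rule in extract_rules(toy):
    print(rule)
# IF Transport is Good THEN Positive
# IF Transport is Average AND Industrial Estate is Yes THEN Positive
# IF Transport is Average AND Industrial Estate is No THEN Negative
```

The rule-simplification step (combining rules) would then operate on this list.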

Page 42: CS364 Artificial Intelligence Machine Learning


Rules Example

[The final tree from Page 38 (Callan 2003:243) is shown again here; each root-to-leaf path gives one rule.]

Page 43: CS364 Artificial Intelligence Machine Learning

Rules Example

• IF Transport is Average
  AND Housing Estate is Large
  AND Industrial Estate is Yes
  THEN Positive
• …
• IF Transport is Good
  THEN Positive

Page 44: CS364 Artificial Intelligence Machine Learning

Summary

• What are the benefits/drawbacks of machine learning?
  – Are the techniques simple?
  – Are they simple to implement?
  – Are they computationally cheap?
  – Do they learn from experience?
  – Do they generalise well?
  – Can we understand how knowledge is represented?
  – Do they provide perfect solutions?

Page 45: CS364 Artificial Intelligence Machine Learning

Source Texts

• Negnevitsky, M. (2005). Artificial Intelligence: A Guide to Intelligent Systems. 2nd Edition. Essex, UK: Pearson Education Limited. Chapter 6, pp. 165-168; chapter 9, pp. 349-360.
• Callan, R. (2003). Artificial Intelligence. Basingstoke, UK: Palgrave MacMillan. Part 5, chapters 11-17, pp. 225-346.
• Luger, G.F. (2002). Artificial Intelligence: Structures & Strategies for Complex Problem Solving. 4th Edition. London, UK: Addison-Wesley. Part IV, chapters 9-11, pp. 349-506.

Page 47: CS364 Artificial Intelligence Machine Learning


Articles

• Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, vol. 1, pp.81-106.

• Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.