
Page 1: CpSc 881: Machine Learning Instance Based Learning

CpSc 881: Machine Learning

Instance Based Learning

Page 2: CpSc 881: Machine Learning Instance Based Learning


Copyright Notice

Most slides in this presentation are adapted from the textbook slides and various other sources. The copyright belongs to the original authors. Thanks!

Page 3: CpSc 881: Machine Learning Instance Based Learning


Instance Based Learning (IBL)

IBL methods learn by simply storing the presented training data.

When a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new query instance.

IBL approaches can construct a different approximation to the target function for each distinct query. They can construct local rather than global approximations.

When IBL methods use complex symbolic representations for instances, the approach is called Case-Based Reasoning (CBR).

Page 4: CpSc 881: Machine Learning Instance Based Learning


Advantages and Disadvantages of IBL Methods

Advantage: IBL Methods are particularly well suited to problems in which the target function is very complex, but can still be described by a collection of less complex local approximations.

Disadvantage I: The cost of classifying new instances can be high (since most of the computation takes place at this stage).

Disadvantage II: Many IBL approaches typically consider all attributes of the instances ==> they are very sensitive to the curse of dimensionality!

Page 5: CpSc 881: Machine Learning Instance Based Learning


Instance Based Learning

Nearest Neighbor:

Given query instance xq, first locate the nearest training example xn, then estimate

f^(xq) <- f(xn)

K-Nearest Neighbor:

Given query instance xq:

take a vote among its k nearest neighbors, if the target function is discrete-valued;

take the mean of the f values of its k nearest neighbors, if the target function is real-valued:

f^(xq) <- (1/k) Σ_{i=1}^{k} f(xi)
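As a concrete illustration, here is a minimal NumPy sketch of the real-valued k-NN estimate above (the function name and toy data are illustrative, not from the slides):

```python
import numpy as np

def knn_regress(X_train, f_train, x_q, k=3):
    """Estimate f(x_q) as the mean of f over the k nearest training examples."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean distance to every training instance
    nearest = np.argsort(dists)[:k]                 # indices of the k closest examples
    return f_train[nearest].mean()                  # f^(x_q) <- (1/k) * sum_i f(x_i)

# toy usage
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
f_train = np.array([1.0, 2.0, 3.0, 10.0])
print(knn_regress(X_train, f_train, np.array([0.2, 0.1])))   # averages the 3 nearest f values -> 2.0
```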

Page 6: CpSc 881: Machine Learning Instance Based Learning


k-Nearest Neighbor Learning in Euclidean Space

Assumption: All instances x correspond to points in the n-dimensional space R^n, x = <a1(x), a2(x), …, an(x)>.

Measure used: Euclidean distance: d(xi, xj) = sqrt( Σ_{r=1}^{n} (ar(xi) - ar(xj))^2 )

Training algorithm: For each training example <x, f(x)>, add the example to the list training_examples.

Classification Algorithm: Given a query instance xq to be classified:

Let x1…xk be the k instances from training_examples that are nearest to xq.

Return f^(xq) <- argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(xi))

where δ(a,b) = 1 if a = b and δ(a,b) = 0 otherwise.
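A minimal sketch of this classification algorithm in NumPy, assuming discrete class labels; the function name and the two toy clusters are illustrative only:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=5):
    """Return argmax_v sum_i delta(v, f(x_i)) over the k nearest neighbors of x_q."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean distances d(x_i, x_q)
    nearest = np.argsort(dists)[:k]                 # the k closest training instances
    votes = Counter(y_train[nearest])               # tally delta(v, f(x_i)) for each label v
    return votes.most_common(1)[0][0]

# toy usage: two clusters labeled '+' and '-'
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array(['+', '+', '+', '-', '-', '-'])
print(knn_classify(X_train, y_train, np.array([0.5, 0.5]), k=3))   # -> '+'
```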

Page 7: CpSc 881: Machine Learning Instance Based Learning


Voronoi Diagram

[Figure: training examples labeled + and -, with a query point xq. The 1-NN rule classifies xq as +, while the 5-NN rule classifies it as -. A second panel shows the decision surface for 1-NN, which partitions the instance space into the Voronoi diagram of the training examples.]

Page 8: CpSc 881: Machine Learning Instance Based Learning


Behavior in the Limit

Let p(x) denote the probability that instance x is labeled 1 (positive) rather than 0 (negative).

Nearest neighbor: as the number of training examples -> ∞, its behavior approaches that of the Gibbs algorithm (Gibbs: with probability p(x) predict 1, else 0).

k-nearest neighbor: as the number of training examples -> ∞ and k grows large, its behavior approaches that of the Bayes optimal classifier (Bayes optimal: if p(x) > 0.5 then predict 1, else 0).

Note that the Gibbs algorithm has at most twice the expected error of the Bayes optimal classifier.

Page 9: CpSc 881: Machine Learning Instance Based Learning


Distance-Weighted Nearest Neighbors

k-NN can be refined by weighting the contribution of the k neighbors according to their distance to the query point xq, giving greater weight to closer neighbors.

To do so, replace the last line of the algorithm with

f^(xq) <- argmax_{v ∈ V} Σ_{i=1}^{k} wi δ(v, f(xi))

where wi = 1 / d(xq, xi)^2
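A sketch of the distance-weighted vote in NumPy (names are illustrative). Since wi = 1/d^2 blows up when the query coincides with a training point, the sketch simply returns that point's label in that case, as is commonly done:

```python
import numpy as np

def weighted_knn_classify(X_train, y_train, x_q, k=5):
    """Distance-weighted k-NN: argmax_v sum_i w_i * delta(v, f(x_i)), with w_i = 1/d(x_q, x_i)^2."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] == 0.0:                    # exact match with a training instance
        return y_train[nearest[0]]
    weights = 1.0 / dists[nearest] ** 2             # closer neighbors get larger weights
    scores = {}
    for idx, w in zip(nearest, weights):            # accumulate weighted votes per label
        scores[y_train[idx]] = scores.get(y_train[idx], 0.0) + w
    return max(scores, key=scores.get)
```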

Page 10: CpSc 881: Machine Learning Instance Based Learning


Remarks on kNN

Advantages:

the NN algorithm can estimate complex target concepts locally and differently for each new instance to be classified;

the NN algorithm provides good generalization accuracy on many domains;

the NN algorithm learns very quickly;

the NN algorithm is robust to noisy training data;

the NN algorithm is intuitive and easy to understand, which facilitates implementation and modification.

Disadvantages:

the NN algorithm has large storage requirements, because it has to store all the data;

the NN algorithm is slow during classification, because all the training instances have to be visited;

the accuracy of the NN algorithm degrades as noise in the training data increases;

the accuracy of the NN algorithm degrades as the number of irrelevant attributes increases.

Efficient memory indexing of the training instances has been proposed to speed up classification. The most popular indexing techniques are based on multidimensional trees (e.g., kd-trees).
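For instance, SciPy's kd-tree can serve as such an index; the sketch below (assuming SciPy is available, with made-up data) builds the tree once and then answers nearest-neighbor queries without scanning every training instance:

```python
import numpy as np
from scipy.spatial import cKDTree

X_train = np.random.rand(10_000, 3)          # 10,000 training instances in R^3 (made-up data)
y_train = np.random.randint(0, 2, 10_000)    # binary labels

tree = cKDTree(X_train)                      # build the multidimensional index once
dists, idx = tree.query([0.5, 0.5, 0.5], k=5)       # retrieve the 5 nearest neighbors
prediction = np.bincount(y_train[idx]).argmax()     # majority vote among them
```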

Page 11: CpSc 881: Machine Learning Instance Based Learning


Curse of Dimensionality

Inductive bias of k-nearest neighbor: the assumption that the classification of an instance xq will be most similar to the classification of other instances that are nearby in Euclidean distance.

Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional.

The distance is calculated over all attributes of the instance. Imagine instances described by 20 attributes, of which only two are relevant to the target function: the many irrelevant attributes can dominate the distance, so truly similar instances may appear far apart.

Page 12: CpSc 881: Machine Learning Instance Based Learning


Curse of Dimensionality

Data in a single dimension are relatively densely packed.

Adding a dimension "stretches" the points across that dimension, pushing them further apart.

Adding more dimensions spreads the points even further apart: high-dimensional data is extremely sparse.

Distance measures become meaningless, because all points become nearly equidistant (see the small experiment below).
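This equi-distance effect is easy to see numerically. The following small NumPy experiment (illustrative, not from the slides) shows the ratio between the farthest and the nearest neighbor of a point shrinking toward 1 as the dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for dim in (1, 2, 10, 100, 1000):
    X = rng.random((500, dim))                  # 500 points drawn uniformly from the unit hypercube
    d = np.linalg.norm(X - X[0], axis=1)[1:]    # distances from the first point to all others
    print(dim, round(d.max() / d.min(), 2))     # ratio approaches 1: everything is roughly equidistant
```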

Page 13: CpSc 881: Machine Learning Instance Based Learning


Curse of Dimensionality

Solutions:

weight the attributes differently (use cross-validation to determine the weights);

eliminate the least relevant attributes (again, use cross-validation to determine which attributes to eliminate).

Page 14: CpSc 881: Machine Learning Instance Based Learning


Locally Weighted Regression

kNN forms a local approximation to f for each query point xq. Why not form an explicit approximation f^(x) for the region surrounding xq?

Locally weighted regression generalizes nearest-neighbor approaches by constructing an explicit approximation to f over a local region surrounding xq.

In such approaches, the contribution of each training example is weighted by its distance to the query point.

Page 15: CpSc 881: Machine Learning Instance Based Learning


An Example: Locally Weighted Linear Regression

f is approximated by: f^(x)=w0+w1a1(x)+…+wnan(x)

Gradient descent can be used to find the coefficients w0, w1,…wn that minimize some error function.

The error function, however, should be different from the one used in the Neural Net since we want a local solution.

Different possibilities:

1. Minimize the squared error over just the k nearest neighbors.

2. Minimize the squared error over the entire training set D, but weight the contribution of each example by some decreasing function K of its distance from xq.

3. Combine 1 and 2.

E1(xq) = (1/2) Σ_{x ∈ k nearest nbrs of xq} (f(x) - f^(x))^2

E2(xq) = (1/2) Σ_{x ∈ D} (f(x) - f^(x))^2 K(d(xq, x))

E3(xq) = (1/2) Σ_{x ∈ k nearest nbrs of xq} (f(x) - f^(x))^2 K(d(xq, x))
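A minimal sketch of criterion 3, assuming a Gaussian kernel K and solving the weighted squared-error minimization in closed form with weighted least squares rather than gradient descent (all names and the kernel choice are illustrative):

```python
import numpy as np

def lwlr_predict(X_train, f_train, x_q, k=10, sigma=1.0):
    """Fit f^(x) = w0 + w1*a1(x) + ... + wn*an(x) on the k nearest neighbors of x_q, weighted by K."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]
    K = np.exp(-dists[nearest] ** 2 / (2 * sigma ** 2))     # kernel K(d(x_q, x)), decreasing with distance
    A = np.hstack([np.ones((k, 1)), X_train[nearest]])      # design matrix with an intercept column for w0
    sw = np.sqrt(K)                                          # weighted least squares via sqrt-weight scaling
    w, *_ = np.linalg.lstsq(A * sw[:, None], f_train[nearest] * sw, rcond=None)
    return np.concatenate([[1.0], x_q]) @ w                  # evaluate the local linear model at x_q
```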

Page 16: CpSc 881: Machine Learning Instance Based Learning


Radial Basis Function (RBF)

Approximating Function:

f^(x) = w0 + Σ_{u=1}^{k} wu Ku(d(xu, x))

Ku(d(xu,x)) is a kernel function that decreases as the distance d(xu,x) increases (e.g., the Gaussian function); and k is a user-defined constant that specifies the number of kernel functions to be included.

Although f^(x) is a global approximation to f(x) the contribution of each kernel function is localized.

RBF approximations can be implemented as a two-layer neural network and trained with a very efficient two-step algorithm:

Find the parameters of the kernel functions (e.g., use the EM algorithm)

Learn the linear weights of the kernel functions.

Page 17: CpSc 881: Machine Learning Instance Based Learning


Radial Basis Function Network

where the ai(x) are the attributes describing instance x, and

f^(x) = w0 + Σ_{u=1}^{k} wu Ku(d(xu, x))

One common choice for Ku(d(xu,x)) is

Ku(d(xu, x)) = exp( -d(xu, x)^2 / (2 σu^2) )
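A sketch of the two-step procedure with this Gaussian kernel: step 1 picks the kernel centres xu (here simply sampled at random from the training set, a common simplification in place of, e.g., EM), and step 2 learns the linear weights by least squares. All names are illustrative:

```python
import numpy as np

def fit_rbf(X_train, f_train, k=10, sigma=1.0, seed=0):
    """Step 1: choose k kernel centres x_u. Step 2: learn the linear weights w_0..w_k."""
    rng = np.random.default_rng(seed)
    centers = X_train[rng.choice(len(X_train), size=k, replace=False)]      # kernel centres x_u
    d = np.linalg.norm(X_train[:, None, :] - centers[None, :, :], axis=2)   # distances d(x_u, x) for all x
    Phi = np.hstack([np.ones((len(X_train), 1)),                            # column of ones for w0
                     np.exp(-d ** 2 / (2 * sigma ** 2))])                   # K_u(d(x_u, x)) columns
    w, *_ = np.linalg.lstsq(Phi, f_train, rcond=None)                       # linear weights w_0, w_1..w_k
    return centers, w

def rbf_predict(x, centers, w, sigma=1.0):
    """f^(x) = w0 + sum_u w_u * K_u(d(x_u, x))."""
    d = np.linalg.norm(centers - x, axis=1)
    return w[0] + np.exp(-d ** 2 / (2 * sigma ** 2)) @ w[1:]
```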

Page 18: CpSc 881: Machine Learning Instance Based Learning


Case-Based Reasoning (CBR)

CBR is similar to k-NN methods in that:

They are lazy learning methods, in that they defer generalization until a query comes around.

They classify new query instances by analyzing similar instances while ignoring instances that are very different from the query.

However, CBR is different from k-NN methods in that:

They do not represent instances as real-valued points, but instead, they use a rich symbolic representation.

CBR can thus be applied to complex conceptual problems such as the design of mechanical devices or legal reasoning

Applications of CBR:

Design: landscape, building, mechanical, conceptual design of aircraft sub-systems

Planning: repair schedules

Diagnosis: medical

Adversarial reasoning: legal

Page 19: CpSc 881: Machine Learning Instance Based Learning


Case-Based Reasoning (CBR)

Methodology

Instances represented by rich symbolic descriptions (e.g., function graphs)

Search for similar cases, multiple retrieved cases may be combined

Tight coupling between case retrieval, knowledge-based reasoning, and problem solving

Challenges

Find a good similarity metric

Indexing based on a syntactic similarity measure and, when that fails, backtracking and adapting to additional cases

Page 20: CpSc 881: Machine Learning Instance Based Learning


CBR process

[Diagram: the CBR cycle. A new case is matched against the case base to retrieve matched cases; the closest case is reused to suggest a solution, and revised if adaptation is needed (using knowledge and adaptation rules); the solved case is then retained, so the system learns by adding it to the case base.]

Page 21: CpSc 881: Machine Learning Instance Based Learning


CBR example: Property pricing

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price (£)
1     8              2         1            terraced  1       poor       20,500
2     8              2         2            terraced  1       fair       25,000
3     5              1         2            semi      2       good       48,000
4     5              1         2            terraced  2       good       41,000

Test instance:

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price (£)
5     7              2         2            semi      1       poor       ???

Page 22: CpSc 881: Machine Learning Instance Based Learning


How rules are generated

There is no unique way of doing it. Here is one possibility:

Examine cases and look for ones that are almost identical

Compare case 1 and case 2:

R1: If recep-rooms changes from 2 to 1 then reduce price by £5,000

Compare case 3 and case 4:

R2: If Type changes from semi to terraced then reduce price by £7,000

Page 23: CpSc 881: Machine Learning Instance Based Learning


Matching

Comparing the test instance with each stored case:

matches(5,1) = 3
matches(5,2) = 3
matches(5,3) = 2
matches(5,4) = 1

The estimated price of case 5 is £25,000, taken from case 2, one of the closest matching cases.
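A sketch of this matching step on the property example; the attribute names and dictionary layout are illustrative, but the values and match counts come from the slides:

```python
# case base from the slides: attribute -> value, plus the known price
case_base = {
    1: {"location": 8, "bedrooms": 2, "recep": 1, "type": "terraced", "floors": 1, "condition": "poor", "price": 20500},
    2: {"location": 8, "bedrooms": 2, "recep": 2, "type": "terraced", "floors": 1, "condition": "fair", "price": 25000},
    3: {"location": 5, "bedrooms": 1, "recep": 2, "type": "semi",     "floors": 2, "condition": "good", "price": 48000},
    4: {"location": 5, "bedrooms": 1, "recep": 2, "type": "terraced", "floors": 2, "condition": "good", "price": 41000},
}
test = {"location": 7, "bedrooms": 2, "recep": 2, "type": "semi", "floors": 1, "condition": "poor"}

def matches(case, query):
    """Count the attributes on which a stored case agrees with the query."""
    return sum(case[a] == query[a] for a in query)

scores = {c: matches(case_base[c], test) for c in case_base}
print(scores)   # {1: 3, 2: 3, 3: 2, 4: 1} -- cases 1 and 2 tie; the slides take case 2 (£25,000) as the estimate
```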

Page 24: CpSc 881: Machine Learning Instance Based Learning


Adapting

Reverse rule 2: if Type changes from terraced to semi, then increase the price by £7,000.

Applying reversed rule 2, the new estimate of the price of property 5 is £32,000 (£25,000 + £7,000).

Page 25: CpSc 881: Machine Learning Instance Based Learning


Learning

So far we have a new case and an estimated price; nothing has been added to the case base yet.

If we later find that the house sold for £35,000, the case would be added to the case base.

We could also add a new rule: if the location code changes from 8 to 7, increase the price by £3,000.

Page 26: CpSc 881: Machine Learning Instance Based Learning


About CBR

Problems with CBR:

How should cases be represented?

How should cases be indexed for fast retrieval?

How can good adaptation heuristics be developed?

When should old cases be removed?

Advantages:

A local approximation is found for each test case.

Knowledge is in a form understandable to human beings.

Fast to train.

Page 27: CpSc 881: Machine Learning Instance Based Learning


Lazy vs. Eager Learning

Eager learning:

Learning = acquiring an explicit description of the target concepts from the whole training set.

Classification = an instance gets a classification using the explicit description of the target concepts.

Instance-based learning (lazy learning):

Learning = storing all training instances.

Classification = an instance gets a classification equal to the classification of the nearest instances to it.

Accuracy:

Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function.

Eager methods must commit to a single hypothesis that covers the entire instance space.