CES 514 – Data Mining, Lecture 8: Classification (contd…)



Example: PEBLS

PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
– Works with both continuous and nominal features
– For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
– Each record is assigned a weight factor
– Number of nearest neighbors: k = 1

Example: PEBLS

Class   Single   Married   Divorced
Yes       2         0         1
No        2         4         1

d(V1, V2) = Σ_i | n1i/n1 – n2i/n2 |

where n1i is the number of records of class i having value V1, n1 is the total number of records with value V1, and similarly for n2i and n2.

Distance between nominal attribute values:

d(Single,Married)

= | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1

d(Single,Divorced)

= | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0

d(Married,Divorced)

= | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1

d(Refund=Yes,Refund=No)

= | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Class   Refund=Yes   Refund=No
Yes         0            3
No          3            4
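The MVDM distances above can be checked with a short sketch (a minimal illustration; the per-value class counts are transcribed from the two contingency tables on this slide):

```python
# Modified value difference metric (MVDM) for nominal attributes:
# d(V1, V2) = sum_i | n1i/n1 - n2i/n2 |, where nji counts records
# with value Vj in class i, and nj is the total for value Vj.

def mvdm(counts1, counts2):
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    classes = set(counts1) | set(counts2)
    return sum(abs(counts1.get(c, 0) / n1 - counts2.get(c, 0) / n2)
               for c in classes)

# Class counts per attribute value, read off the tables above.
marital = {
    "Single":   {"Yes": 2, "No": 2},
    "Married":  {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}
refund = {
    "Yes": {"Yes": 0, "No": 3},
    "No":  {"Yes": 3, "No": 4},
}

print(mvdm(marital["Single"], marital["Married"]))   # 1.0
print(mvdm(marital["Single"], marital["Divorced"]))  # 0.0
print(mvdm(refund["Yes"], refund["No"]))             # 6/7 ≈ 0.857
```

The three printed values match the worked examples on this slide.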

Example: PEBLS

Distance between record X and record Y:

Δ(X, Y) = wX · wY · Σ_{i=1}^{d} [d(Xi, Yi)]²

where d is the number of attributes.

Tid   Refund   Marital Status   Taxable Income   Cheat
X     Yes      Single           125K             No
Y     No       Married          100K             No

where wX and wY are weight factors:

wX = (number of times X is used for prediction) / (number of times X predicts correctly)

wX ≈ 1 if X makes accurate predictions most of the time
wX > 1 if X is not reliable for making predictions
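Putting the pieces together, the record distance Δ(X, Y) can be sketched as below (a minimal illustration: the weights are taken as 1 and the continuous Taxable Income attribute is omitted, so only the two nominal attributes contribute):

```python
# PEBLS distance between two records:
#   Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2
# where d(.,.) is the per-attribute MVDM distance.

def mvdm(counts1, counts2):
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    classes = set(counts1) | set(counts2)
    return sum(abs(counts1.get(c, 0) / n1 - counts2.get(c, 0) / n2)
               for c in classes)

# Class counts per value for the two nominal attributes in the example.
TABLES = {
    "Refund":  {"Yes": {"Yes": 0, "No": 3}, "No": {"Yes": 3, "No": 4}},
    "Marital": {"Single":   {"Yes": 2, "No": 2},
                "Married":  {"Yes": 0, "No": 4},
                "Divorced": {"Yes": 1, "No": 1}},
}

def pebls_distance(x, y, w_x=1.0, w_y=1.0):
    total = 0.0
    for attr, table in TABLES.items():
        d = mvdm(table[x[attr]], table[y[attr]])
        total += d * d                      # squared per-attribute distance
    return w_x * w_y * total

X = {"Refund": "Yes", "Marital": "Single"}
Y = {"Refund": "No",  "Marital": "Married"}
print(pebls_distance(X, Y))   # (6/7)^2 + 1^2 ≈ 1.735
```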

Support Vector Machines

Find a linear hyperplane (decision boundary) that will separate the data

Support Vector Machines

One Possible Solution

[Figure: decision boundary B1]

Support Vector Machines

Another possible solution

[Figure: decision boundary B2]

Support Vector Machines

Other possible solutions

[Figure: other possible decision boundaries]

Support Vector Machines

Which one is better? B1 or B2? How do you define better?

[Figure: decision boundaries B1 and B2]

Support Vector Machines

Find the hyperplane that maximizes the margin (e.g., B1 is better than B2).

[Figure: B1 with margin boundaries b11 and b12, B2 with margin boundaries b21 and b22; the margin of B1 is wider]

Support Vector Machines

[Figure: hyperplane B1: w·x + b = 0, with margin boundaries b11: w·x + b = 1 and b12: w·x + b = –1]

f(x) = +1 if w·x + b ≥ 1
     = –1 if w·x + b ≤ –1

Margin = 2 / ||w||

Support Vector Machines

We want to maximize:  Margin = 2 / ||w||

– Which is equivalent to minimizing:  L(w) = ||w||² / 2

– But subject to the following constraints:

  f(xi) = +1 if w·xi + b ≥ 1
        = –1 if w·xi + b ≤ –1

This is a constrained optimization problem

– Numerical approaches to solve it (e.g., quadratic programming)
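As a concrete illustration of the margin formula (a toy sketch with hand-chosen points and a hand-chosen separating hyperplane, not the output of a QP solver):

```python
import math

# Toy 2D data: class +1 and class -1 on opposite sides of x1 + x2 = 2.
points = [((2.0, 2.0), +1), ((3.0, 2.0), +1),
          ((0.0, 0.0), -1), ((1.0, -1.0), -1)]

# A separating hyperplane w.x + b = 0, scaled so that the closest
# points of each class satisfy |w.x + b| = 1.
w = (0.5, 0.5)
b = -1.0

def f(x):
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s >= 1 else (-1 if s <= -1 else 0)

# Every training point satisfies its margin constraint.
assert all(f(x) == y for x, y in points)

margin = 2.0 / math.hypot(*w)
print(margin)   # 2 / ||w|| = 2 / sqrt(0.5) ≈ 2.828
```

Shrinking ||w|| widens the margin, which is why minimizing ||w||²/2 subject to the constraints maximizes it.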

Overview of optimization

Simplest optimization problem: Maximize f(x) (one variable)

If the function has nice properties (such as being differentiable), we can use calculus to solve the problem: solve the equation f′(x) = 0. Suppose a is a root. Then if f″(a) < 0, a is a (local) maximum.

Tricky issues:
• How do we solve the equation f′(x) = 0?
• What if there are many solutions? Each is only a “local” optimum.

How to solve g(x) = 0

Even polynomial equations are very hard to solve.

A quadratic has a closed-form solution. What about higher degrees?

Numerical techniques (iteration):
• bisection
• secant
• Newton–Raphson, etc.

Challenges:
• choosing the initial guess
• rate of convergence
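The Newton–Raphson iteration above can be sketched in a few lines (a minimal illustration, applied to a hypothetical f(x) = –x² + 4x by solving f′(x) = 0):

```python
# Newton-Raphson to solve g(x) = 0:  x_{k+1} = x_k - g(x_k) / g'(x_k).
# Here g = f' for f(x) = -x^2 + 4x, so g(x) = -2x + 4 and g'(x) = -2.

def newton(g, g_prime, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):       # challenge: needs a good initial guess
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:         # stop once the update is tiny
            break
    return x

g = lambda x: -2.0 * x + 4.0
g_prime = lambda x: -2.0

root = newton(g, g_prime, x0=0.0)
print(root)   # 2.0 -- and f''(2) = -2 < 0, so x = 2 is a maximum
```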

Functions of several variables

Consider a function of two variables, F(x, y).

To find the maximum of F(x, y), we solve the system of equations:

∂F/∂x = 0 and ∂F/∂y = 0

If we can solve this system of equations, then we have found a local maximum or minimum of F.

We can solve the equations using numerical techniques similar to the one-dimensional case.

When is the solution maximum or minimum?

• Hessian:

• if the Hessian is positive definite in the neighborhood of a, then a is a minimum.

• if the Hessian is negative definite in the neighborhood of a, then a is a maximum.

• if it is neither, then a is a saddle point.

H(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ]
          [ ∂²f/∂y∂x   ∂²f/∂y²  ]
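For two variables, definiteness of the 2×2 Hessian reduces to the familiar second-derivative test on its determinant and top-left entry; a minimal sketch (the three test functions are hypothetical examples):

```python
# Classify a critical point from its Hessian H = [[fxx, fxy], [fyx, fyy]]:
#   det(H) > 0 and fxx > 0  -> positive definite -> minimum
#   det(H) > 0 and fxx < 0  -> negative definite -> maximum
#   det(H) < 0              -> indefinite        -> saddle point

def classify(fxx, fxy, fyy):
    det = fxx * fyy - fxy * fxy
    if det > 0:
        return "minimum" if fxx > 0 else "maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

print(classify(2, 0, 2))    # f = x^2 + y^2  at (0,0): minimum
print(classify(-2, 0, -2))  # f = -x^2 - y^2 at (0,0): maximum
print(classify(2, 0, -2))   # f = x^2 - y^2  at (0,0): saddle point
```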

Application - linear regression

Problem: given (x1,y1), … (xn, yn), find the best linear relation between x and y.

Assume y = Ax + B. To find A and B, we will minimize the squared error

E(A, B) = Σ_{j=1}^{n} (yj – A xj – B)²

Since this is a function of two variables, we can solve it by setting

∂E/∂A = 0 and ∂E/∂B = 0
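Setting the two partial derivatives to zero gives the usual normal equations, which have a closed-form solution; a minimal sketch (the data points are hypothetical, chosen to lie exactly on y = 2x + 1):

```python
# Solve dE/dA = 0 and dE/dB = 0 for E(A, B) = sum_j (y_j - A*x_j - B)^2.
# The resulting normal equations give the closed form below.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    A = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    B = (sy - A * sx) / n                           # intercept
    return A, B

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
A, B = fit_line(xs, ys)
print(A, B)   # 2.0 1.0
```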

Constrained optimization

Maximize f(x,y) subject to g(x,y) = c

Using a Lagrange multiplier λ, the problem is formulated as maximizing:

h(x, y, λ) = f(x, y) + λ (g(x, y) – c)

Now, solve the equations:

∂h/∂x = 0,  ∂h/∂y = 0,  ∂h/∂λ = 0
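As a worked instance (a hypothetical problem: maximize f(x, y) = x·y subject to x + y = 4), the three Lagrange equations are y + λ = 0, x + λ = 0, and x + y – 4 = 0, giving x = y = 2:

```python
# Maximize f(x, y) = x*y subject to g(x, y) = x + y = 4.
# h(x, y, L) = x*y + L*(x + y - 4); setting dh/dx = y + L = 0,
# dh/dy = x + L = 0, dh/dL = x + y - 4 = 0 gives x = y = 2, L = -2.

x, y, lam = 2.0, 2.0, -2.0

# Check that all three Lagrange equations hold at the solution.
assert y + lam == 0.0
assert x + lam == 0.0
assert x + y - 4.0 == 0.0

# Sanity check: no feasible point on a fine grid does better than f(2,2) = 4.
best = max(t * (4.0 - t) for t in [i / 100.0 for i in range(401)])
print(x * y, best)   # 4.0 4.0
```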

Support Vector Machines (contd)

What if the problem is not linearly separable?

Support Vector Machines

What if the problem is not linearly separable?
– Introduce slack variables ξi

Need to minimize:

L(w) = ||w||² / 2 + C Σ_{i=1}^{N} (ξi)^k

Subject to:

f(xi) = +1 if w·xi + b ≥ 1 – ξi
      = –1 if w·xi + b ≤ –1 + ξi
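The slack ξi measures by how much record i violates its margin constraint; equivalently ξi = max(0, 1 – yi(w·xi + b)), the hinge loss. A minimal sketch of the soft-margin objective (hand-chosen w, b, hypothetical 1D points, and k = 1):

```python
# Soft-margin objective L(w) = ||w||^2 / 2 + C * sum_i xi_i  (with k = 1),
# where xi_i = max(0, 1 - y_i * (w.x_i + b)) is the slack for record i.

def slack(w, b, x, y):
    s = sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(0.0, 1.0 - y * s)

def objective(w, b, data, C=1.0):
    norm_sq = sum(wj * wj for wj in w)
    total_slack = sum(slack(w, b, x, y) for x, y in data)
    return norm_sq / 2.0 + C * total_slack

# Hypothetical 1D data; the point (0.5, -1) sits on the wrong side.
data = [((2.0,), +1), ((-2.0,), -1), ((0.5,), -1)]
w, b = (1.0,), 0.0

print([slack(w, b, x, y) for x, y in data])  # [0.0, 0.0, 1.5]
print(objective(w, b, data))                 # 0.5 + 1.5 = 2.0
```

The parameter C trades off margin width against the total constraint violation.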

Nonlinear Support Vector Machines

What if decision boundary is not linear?

Nonlinear Support Vector Machines

Transform data into higher dimensional space
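A sketch of the idea (hypothetical 1D data where the class depends on |x|, so no single threshold on x separates it; mapping x to (x, x²) makes the classes separable by a threshold on the new x² coordinate):

```python
# 1D data: class +1 when |x| is large, class -1 otherwise.
# Not linearly separable in x, but separable after mapping x -> (x, x^2).

data = [(-2.0, +1), (-0.5, -1), (0.5, -1), (2.0, +1)]

def phi(x):
    return (x, x * x)   # transform into a higher-dimensional space

# In the transformed space, the hyperplane x^2 = 2.25 separates the classes.
predictions = [(+1 if phi(x)[1] > 2.25 else -1) for x, _ in data]
print(predictions)   # [1, -1, -1, 1] -- matches the labels

# By contrast, no threshold on raw x works: every candidate threshold
# (in either orientation) misclassifies at least one point.
def separable_1d(data):
    xs = sorted(set(x for x, _ in data))
    for t in xs:
        for sign in (+1, -1):
            if all((sign if x > t else -sign) == y for x, y in data):
                return True
    return False

print(separable_1d(data))   # False
```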

Artificial Neural Networks (ANN)

X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0

[Figure: black box with inputs X1, X2, X3 and output Y]

Output Y is 1 if at least two of the three inputs are equal to 1.

Artificial Neural Networks (ANN)

X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0

[Figure: perceptron with input nodes X1, X2, X3, link weights 0.3, 0.3, 0.3, and an output node with threshold t = 0.4]

Y = I(0.3 X1 + 0.3 X2 + 0.3 X3 – 0.4 > 0)

where I(z) = 1 if z is true, and 0 otherwise
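Evaluating this unit over all eight input rows reproduces the truth table above; a minimal check:

```python
# Perceptron from the slide: Y = I(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0),
# which fires exactly when at least two of the three inputs are 1.
from itertools import product

def perceptron(x1, x2, x3):
    return int(0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 > 0)

for x1, x2, x3 in product([0, 1], repeat=3):
    y = perceptron(x1, x2, x3)
    assert y == int(x1 + x2 + x3 >= 2)   # "at least two inputs are 1"
    print(x1, x2, x3, "->", y)
```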

Artificial Neural Networks (ANN)

The model is an assembly of inter-connected nodes and weighted links

The output node sums up its input values according to the weights of its links

The weighted sum is compared against some threshold t

[Figure: black-box perceptron with input nodes X1, X2, X3, link weights w1, w2, w3, and an output node with threshold t producing Y]

Perceptron model:

Y = I(Σ_i wi Xi – t)   or   Y = sign(Σ_i wi Xi – t)

General Structure of ANN

[Figure: neuron i with inputs I1, I2, I3, weights wi1, wi2, wi3, weighted sum Si, activation function g(Si), output Oi, and threshold t]

[Figure: multilayer network with an input layer (x1 … x5), a hidden layer, and an output layer producing y]

Training ANN means learning the weights of the neurons

Algorithm for learning ANN

Initialize the weights (w0, w1, …, wk)

Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples
– Objective function:

E = Σ_i [Yi – f(w, Xi)]²

– Find the weights wi that minimize the above objective function, e.g., using the backpropagation algorithm
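One concrete instance of such a weight-learning procedure is the classic perceptron update rule w ← w + η(y – ŷ)x, a simpler scheme than backpropagation, sketched here on the majority-function data from the earlier slides (η is a hypothetical learning rate):

```python
# Perceptron learning rule: after each example, nudge the weights by
# eta * (y - y_hat) * x so the output moves toward the class label.
from itertools import product

data = [((x1, x2, x3), int(x1 + x2 + x3 >= 2))
        for x1, x2, x3 in product([0, 1], repeat=3)]

w = [0.0, 0.0, 0.0]   # initial weights
t = 0.0               # threshold, learned like a bias term
eta = 0.1             # hypothetical learning rate

def predict(x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) - t > 0)

for epoch in range(200):                # passes over the training data
    errors = 0
    for x, y in data:
        delta = y - predict(x)
        if delta:
            errors += 1
            for i in range(3):
                w[i] += eta * delta * x[i]
            t -= eta * delta            # threshold update
    if errors == 0:                     # converged: all examples correct
        break

print(w, t)
print(all(predict(x) == y for x, y in data))   # True
```

Because this data is linearly separable, the perceptron convergence theorem guarantees the loop terminates with all examples classified correctly.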

WEKA

WEKA implementations

WEKA has implementations of all the major data mining algorithms, including:
• decision trees (CART, C4.5, etc.)
• the naïve Bayes algorithm and its variants
• nearest-neighbor classifiers
• linear classifiers
• support vector machines
• clustering algorithms
• boosting algorithms, etc.
