Machine Learning, Data Mining, Genetic Algorithms, Neural Networks


Upload: butest

Post on 04-Dec-2014


Page 1: Machine Learning, Data Mining, Genetic Algorithms, Neural

Copyright R. Weber

Machine Learning, Data Mining, Genetic Algorithms, Neural Networks

ISYS370

Dr. R. Weber

Page 2: Machine Learning, Data Mining, Genetic Algorithms, Neural

Concept Learning is a Form of Inductive Learning

• Learner uses:
  - positive examples (instances that ARE examples of a concept), and
  - negative examples (instances that ARE NOT examples of a concept)

Page 3: Machine Learning, Data Mining, Genetic Algorithms, Neural

Concept Learning

• Needs empirical validation
• Dense or sparse data determine the quality of different methods

Page 4: Machine Learning, Data Mining, Genetic Algorithms, Neural

Validation of Concept Learning i

• The learned concept should be able to correctly classify new instances of the concept
  - When it succeeds on a real instance of the concept, it finds true positives
  - When it fails on a real instance of the concept, it finds false negatives

Page 5: Machine Learning, Data Mining, Genetic Algorithms, Neural

Validation of Concept Learning ii

• The learned concept should be able to correctly classify new instances of the concept
  - When it succeeds on a counterexample, it finds true negatives
  - When it fails on a counterexample, it finds false positives
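The four validation outcomes named on the two validation slides can be counted directly from labelled test instances. The following is a minimal sketch, assuming boolean labels where True marks a real instance of the concept; the function name `confusion_counts` and the sample labels are hypothetical and serve only as illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a learned concept.

    actual[i] is True when instance i really is an example of the concept;
    predicted[i] is True when the learned concept classifies it as one.
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for is_instance, said_instance in zip(actual, predicted):
        if is_instance and said_instance:
            counts["TP"] += 1      # succeeds on a real instance
        elif is_instance:
            counts["FN"] += 1      # fails on a real instance
        elif said_instance:
            counts["FP"] += 1      # fails on a counterexample
        else:
            counts["TN"] += 1      # succeeds on a counterexample
    return counts


# Hypothetical labels for six new instances (illustration only).
actual = [True, True, True, False, False, False]
predicted = [True, True, False, False, True, False]
counts = confusion_counts(actual, predicted)
```

Rates such as accuracy can then be derived from these four counts.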

Page 6: Machine Learning, Data Mining, Genetic Algorithms, Neural

Rule Learning

• Learning widely used in data mining
• Version Space Learning is a search method to learn rules
• Decision Trees

Page 7: Machine Learning, Data Mining, Genetic Algorithms, Neural

Decision trees

• Knowledge representation formalism
• Represent mutually exclusive rules (disjunction)
• A way of breaking up a data set into classes or categories
• Classification rules that determine, for each instance with attribute values, whether it belongs to one or another class
• Not incremental

Page 8: Machine Learning, Data Mining, Genetic Algorithms, Neural

Decision trees

- leaf nodes (classes)
- decision nodes (tests on attribute values)
- from decision nodes, branches grow for each possible outcome of the test

From Cawsey, 1997

Page 9: Machine Learning, Data Mining, Genetic Algorithms, Neural

Decision tree induction

• Goal is to correctly classify all example data
• Several algorithms to induce decision trees: ID3 (Quinlan 1979), CLS, ACLS, ASSISTANT, IND, C4.5
• Constructs decision tree from past data
• Attempts to find the simplest tree (not guaranteed because it is based on heuristics)

Page 10: Machine Learning, Data Mining, Genetic Algorithms, Neural

ID3 algorithm

• From:
  - a set of target classes
  - training data containing objects of more than one class
• ID3 uses tests to refine the training data set into subsets that contain objects of only one class each
• Choosing the right test is the key

Page 11: Machine Learning, Data Mining, Genetic Algorithms, Neural

How does ID3 choose tests?

• Information gain or ‘minimum entropy’
• Maximizing information gain corresponds to minimizing entropy
• Predictive features (good indicators of the outcome)

Page 12: Machine Learning, Data Mining, Genetic Algorithms, Neural

Choosing tests

• Information gain or ‘minimum entropy’
• Maximizing information gain corresponds to minimizing entropy
• Predictive features (good indicators of the outcome)
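The link between information gain and entropy can be made concrete with a short computation. This is a sketch under stated assumptions, not the ID3 implementation itself: the function names, the attribute names ("job", "income", "loan") and the four rows are hypothetical, chosen so that one attribute is a perfect predictor and the other carries no information.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the test minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


# Hypothetical training data (illustration only).
rows = [
    {"job": "salaried", "income": "high", "loan": "good"},
    {"job": "salaried", "income": "low", "loan": "good"},
    {"job": "waged", "income": "high", "loan": "bad"},
    {"job": "waged", "income": "low", "loan": "bad"},
]

# The test with the highest gain leaves the least entropy in the subsets.
best = max(["job", "income"], key=lambda a: information_gain(rows, a, "loan"))
```

Here the test on "job" splits the data into pure subsets, so it is the predictive feature a gain-based chooser would pick first.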

Page 13: Machine Learning, Data Mining, Genetic Algorithms, Neural

   Monthly income   Job status   Repayment   Loan status
1  2,000            Salaried     200         Good
2  4,000            Salaried     600         Very bad
3  3,000            Waged        300         Very good
4  1,500            Salaried     400         Bad

Page 14: Machine Learning, Data Mining, Genetic Algorithms, Neural

Data mining tasks ii

• Link analysis
• Deviation detection

Rules:
• Association generation
• Relationships between entities
• How things change over time, trends

Page 15: Machine Learning, Data Mining, Genetic Algorithms, Neural

KDD applications

• Fraud detection
  - Telecom (calling cards, cell phones)
  - Credit cards
  - Health insurance
• Loan approval
• Investment analysis
• Marketing and sales data analysis
  - Identify potential customers
  - Effectiveness of sales campaign
  - Store layout

Page 16: Machine Learning, Data Mining, Genetic Algorithms, Neural

Text mining

The problem starts with a query and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query.

Page 17: Machine Learning, Data Mining, Genetic Algorithms, Neural

Text mining applications

• IBM Text Navigator
  - Cluster documents by content;
  - Each document is annotated by the 2 most frequently used words in the cluster;
• Concept Extraction (Los Alamos)
  - Text analysis of medical records;
  - Uses a clustering approach based on trigram representation;
  - Documents in vectors, cosine for comparison;
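The "documents in vectors, cosine for comparison" idea can be sketched in a few lines. The code below assumes character trigrams (the slide does not say whether the trigrams are over characters or words), and the function names and sample sentences are hypothetical; it is not the Los Alamos system.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Represent a document as counts of its character trigrams."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents (illustration only).
docs = [
    "patient reports chest pain",
    "patient reported chest pains",
    "invoice for office chairs",
]
vectors = [trigram_vector(d) for d in docs]
similar = cosine(vectors[0], vectors[1])
different = cosine(vectors[0], vectors[2])
```

A clustering step would then group documents whose pairwise cosine is high; trigram vectors tolerate small spelling and inflection differences, which is one reason they suit noisy text.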

Page 18: Machine Learning, Data Mining, Genetic Algorithms, Neural

Problem solving method    Reasoning type
rule-based ES             deductive reasoning
case-based reasoning      analogical reasoning
inductive ML, NN          inductive reasoning
algorithms                search

Page 19: Machine Learning, Data Mining, Genetic Algorithms, Neural

Genetic Algorithms (GA)

Page 20: Machine Learning, Data Mining, Genetic Algorithms, Neural

Genetic algorithms (i)

• learn by experimentation
• based on human genetics, it originates new solutions
• representational restrictions
• good to improve quality of other methods, e.g., search algorithms, CBR
• evolutionary algorithms (broader)

Page 21: Machine Learning, Data Mining, Genetic Algorithms, Neural

Genetic algorithms (ii)

• requires an evaluation function to guide the process
• population of genomes represents possible solutions
• operations are applied over these genomes
• operations can be mutation, crossover
• operations produce new offspring
• an evaluation function tests how fit an offspring is
• the fittest will survive to mate again
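The loop described in these bullets can be sketched as follows. Everything specific here is an assumption made for illustration: the toy objective (count of 1 bits, often called OneMax), single-point crossover, the mutation rate, keeping the fittest half as survivors, and all function and parameter names.

```python
import random


def fitness(genome):
    """Evaluation function: number of 1 bits (a toy objective)."""
    return sum(genome)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(pop_size=20, length=16, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        # The fittest half survives unchanged to mate again.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            mother, father = rng.sample(survivors, 2)
            offspring.append(mutate(crossover(mother, father, rng), rng))
        population = survivors + offspring
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
```

Because survivors are copied unchanged into the next generation, the best fitness in the population never decreases in this sketch.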

Page 22: Machine Learning, Data Mining, Genetic Algorithms, Neural

Genetic Algorithms ii

• http://ai.bpa.arizona.edu/~mramsey/ga.html You can change parameters

• http://www.rennard.org/alife/english/gavgb.html Steven Thompson presented

Page 23: Machine Learning, Data Mining, Genetic Algorithms, Neural

Neural Networks (NN)

Page 24: Machine Learning, Data Mining, Genetic Algorithms, Neural

the evidence

training vision

~= 2nd-5th week

Page 28: Machine Learning, Data Mining, Genetic Algorithms, Neural

NN: model of brains

input, output

neurons, synapses

electric transmissions

Page 29: Machine Learning, Data Mining, Genetic Algorithms, Neural

Elements

• input nodes
• output nodes
• links
• weights

Page 30: Machine Learning, Data Mining, Genetic Algorithms, Neural

terminology

• input and output nodes (or units) connected by links

• each link has a numeric weight

• weights store information

• networks are trained on training sets (examples) and are then tested on test sets to assess the networks’ accuracy

• learning/training takes place as weights are updated to reflect the input/output behavior

Page 31: Machine Learning, Data Mining, Genetic Algorithms, Neural

The concept

4 legs   fly   lay eggs
0        1     1          => bird
1        0     0          => mammal
1        1     0          => mammal

1 = Yes, 0 = No

Page 33: Machine Learning, Data Mining, Genetic Algorithms, Neural

The concept

4 legs   fly   lay eggs
0        1     1          => bird
1        0     0          => mammal
1        1     0          => mammal
0.5      0.5   0.5        (weights)

1 = Yes, 0 = No

Page 34: Machine Learning, Data Mining, Genetic Algorithms, Neural

4 legs   fly   lay eggs
0        1     1          => bird     0*0.5+1*0.5+1*0.5 = 1
1        0     0          => mammal   1*0.5+0*0.5+0*0.5 = 0.5
1        1     0          => mammal   1*0.5+1*0.5+0*0.5 = 1
0.5      0.5   0.5        (weights)

Goal is to have weights that recognize different representations of mammals and birds as such

Page 35: Machine Learning, Data Mining, Genetic Algorithms, Neural

4 legs   fly   lay eggs
0        1     1          => bird     0*0.5+1*0.5+1*0.5 = 1
1        0     0          => mammal   1*0.5+0*0.5+0*0.5 = 0.5
1        1     0          => mammal   1*0.5+1*0.5+0*0.5 = 1
0.5      0.5   0.5        (weights)

Suppose we want bird to be greater than 0.5 and mammal to be equal to or less than 0.5

Page 36: Machine Learning, Data Mining, Genetic Algorithms, Neural

4 legs   fly    lay eggs
0        1      1          => bird     0*0.25+1*0.25+1*0.5 = 0.75
1        0      0          => mammal   1*0.25+0*0.25+0*0.5 = 0.25
1        1      0          => mammal   1*0.25+1*0.25+0*0.5 = 0.5
0.25     0.25   0.5        (weights)

Suppose we want bird to be greater than 0.5 and mammal to be equal to or less than 0.5
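The arithmetic on these slides can be reproduced in a few lines. This sketch uses the three examples, the two weight sets, and the 0.5 threshold from the slides; the function names `weighted_sum` and `classify` are hypothetical.

```python
# Features are (4 legs, fly, lay eggs); 1 = Yes, 0 = No.
examples = [
    ((0, 1, 1), "bird"),
    ((1, 0, 0), "mammal"),
    ((1, 1, 0), "mammal"),
]


def weighted_sum(features, weights):
    """Sum of each feature value multiplied by its link weight."""
    return sum(f * w for f, w in zip(features, weights))


def classify(features, weights, threshold=0.5):
    """Bird if the weighted sum is greater than the threshold, else mammal."""
    return "bird" if weighted_sum(features, weights) > threshold else "mammal"


first = [0.5, 0.5, 0.5]      # weights before adjustment
second = [0.25, 0.25, 0.5]   # weights after adjustment

sums_first = [weighted_sum(f, first) for f, _ in examples]
sums_second = [weighted_sum(f, second) for f, _ in examples]
all_correct = all(classify(f, second) == label for f, label in examples)
```

With the first weights the third example (a mammal) sums to 1 and would be labelled a bird; with the second weights all three examples fall on the intended side of 0.5.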

Page 37: Machine Learning, Data Mining, Genetic Algorithms, Neural

The training

$\text{Output} = \text{Step}\left(\sum w_{ij} f_{ij}\right)$

• learning takes place as weights are updated to reflect the input/output behavior
• Goal: minimize the error between the representation of the expected and the actual outcome

Targets: => mammal (1), => bird (0)

Features: 4 legs, flies, eggs; example input: 0 1 1

[Figure: grids of values indexed by i=1, 2, 3 and j=1, 2, 3; extracted rows, in order: 0 0 0, 0 0 0, 0 0 0, 1 0 0, 1 0 0, 1 0 0, 1 0 0, 1 0 0, 1 1 1, 1 0 0, 1 1 1, 1 1 1]
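One common way to update weights so that a step-function output matches the targets is the perceptron rule. The slide does not name its update rule, so the sketch below is an assumption: it uses the perceptron rule, a step threshold at 0, no bias term, a learning rate of 1, and hypothetical function names. The examples and the target encoding (mammal = 1, bird = 0) follow the slide.

```python
def step(x):
    """Step activation: 1 when the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0


def train(examples, n_features, rate=1.0, max_epochs=100):
    """Perceptron rule: nudge weights by rate * error * input after each mistake."""
    weights = [0.0] * n_features
    for _ in range(max_epochs):
        errors = 0
        for features, target in examples:
            output = step(sum(w * f for w, f in zip(weights, features)))
            error = target - output  # expected minus actual outcome
            if error != 0:
                errors += 1
                weights = [w + rate * error * f for w, f in zip(weights, features)]
        if errors == 0:  # every example is classified correctly
            break
    return weights


# Features are (4 legs, flies, eggs); targets: mammal = 1, bird = 0.
examples = [((0, 1, 1), 0), ((1, 0, 0), 1), ((1, 1, 0), 1)]
weights = train(examples, n_features=3)
predictions = [step(sum(w * f for w, f in zip(weights, x))) for x, _ in examples]
```

Starting from all-zero weights, a single correction on the second example is enough here, because "4 legs" alone separates the three training examples.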

Page 38: Machine Learning, Data Mining, Genetic Algorithms, Neural

NN demo…..

Page 39: Machine Learning, Data Mining, Genetic Algorithms, Neural

Characteristics

• NN implement inductive learning algorithms (through generalization); therefore, they require several training examples to learn
• NN do not provide an explanation of why the task was performed the way it was
• no explicit knowledge; uses data
• Classification (pattern recognition), clustering, diagnosis, optimization, forecasting (prediction), modeling, reconstruction, routing

Page 40: Machine Learning, Data Mining, Genetic Algorithms, Neural

Where are NN applicable?

• Where they can form a model from training data alone;
• When there may be an algorithm, but it is not known, or has too many variables;
• There are enough examples available
• It is easier to let the network learn from examples
• Other inductive learning methods may not be as accurate

Page 41: Machine Learning, Data Mining, Genetic Algorithms, Neural

Applications (i)

• to predict movement of stocks, currencies, etc., from previous data;
• to recognize signatures by comparing those made (e.g., in a bank) with those stored;
• to monitor the state of aircraft engines (by monitoring vibration levels and sound, early warning of engine problems can be given); British Rail have been testing an application to monitor diesel engines;

Page 42: Machine Learning, Data Mining, Genetic Algorithms, Neural

Applications (ii)

• Pronunciation (rules with many exceptions)

• Handwritten character recognition (a network with 200,000 weights is impossible to train; the final network had 9,760 weights, used 7,300 examples to train and 2,000 to test, 99% accuracy)

• Learn brain patterns to control and activate limbs as in the “Rats control a robot by thought alone” article

• Credit assignment

Page 43: Machine Learning, Data Mining, Genetic Algorithms, Neural

CMU Driving ALVINN

• learns from human drivers how to steer a vehicle along a single lane on a highway
• ALVINN is implemented in two vehicles equipped with computer-controlled steering, acceleration, and braking
• cars can reach 70 mph with ALVINN
• programs that consider the entire problem environment reach only 4 mph

Page 44: Machine Learning, Data Mining, Genetic Algorithms, Neural

Why use NN for the driving task?

• there is no good theory of driving, but it is easy to collect training samples
• training data is obtained with a human* driving the vehicle
  - 5 min training, 10 min algorithm runs
• driving is continuous and noisy
• almost all features contribute useful information

*humans are not very good generators of training instances when they behave too regularly without making mistakes

Page 45: Machine Learning, Data Mining, Genetic Algorithms, Neural

the neural network

• INPUT: a video camera feeds a 30x32 grid of input nodes
• OUTPUT: a layer of 30 nodes corresponding to steering direction
• the vehicle steers in the direction of the output node with the highest activation
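Picking the steering direction from the output layer amounts to finding the most active of the 30 output nodes. The sketch below shows only that selection step; the function name and the activation values are hypothetical, and how a node index maps to an actual steering angle is not stated on the slide, so it is left out.

```python
def steering_index(activations):
    """Return the index of the output node with the highest activation."""
    if len(activations) != 30:
        raise ValueError("expected 30 output activations")
    return max(range(len(activations)), key=lambda i: activations[i])


# Hypothetical activations: a bump centred on node 20 (illustration only).
activations = [max(0.0, 1.0 - abs(i - 20) / 5) for i in range(30)]
chosen = steering_index(activations)
```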

Page 46: Machine Learning, Data Mining, Genetic Algorithms, Neural

Resources

http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html#what

http://www.ri.cmu.edu/projects/project_160.html

http://www.txtwriter.com/Onscience/Articles/ratrobot.html