

CS 621 Artificial Intelligence

Lecture 21 - 04/10/05

Prof. Pushpak Bhattacharyya, IIT Bombay

Artificial Neural Networks: Models, Algorithms, Applications


Soft Computing

• A neural network is a computing device composed of very simple, interconnected processing elements called neurons.

• The processing power of such a device comes from the large number of connections.

• There are different models of neural networks, such as the Perceptron, Backpropagation networks, the Hopfield net, Adaptive Resonance Theory, and the Kohonen net.

• The first three are supervised learning models; the rest are unsupervised.


Fundamental Unit: Perceptron

• A perceptron is a single neuron, motivated by biological brain cells.

• The weighted sum of its inputs, ∑ wixi (i = 1 to n), is compared with the threshold θ.

• If the net input exceeds the threshold, the neuron fires, i.e., assumes state 1; otherwise its state is 0.


Computing with Perceptron

X1  X2  Y
--  --  --
 0   0  0
 0   1  0
 1   0  0
 1   1  1

0·w1 + 0·w2 < θ
0·w1 + 1·w2 < θ
1·w1 + 0·w2 < θ
1·w1 + 1·w2 > θ

From this, one possible solution is θ = 1.5, w1 = 1.0, w2 = 1.0.
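As a quick check, a minimal Python sketch of this linear threshold unit (names are mine), using the solution above, reproduces the AND truth table:

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, theta=1.5):
    """Linear threshold unit: fire (output 1) iff w1*x1 + w2*x2 > theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))   # prints the four AND truth-table rows
```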


Geometrical View


Perceptron Training Algorithm (PTA): Preprocess


Pre-processing for PTA

• The threshold is absorbed as an extra weight w0, and a new input line is introduced which is fed a constant input of -1.

• Also, we negate the components of the vectors in the 0-class, so that only the single test

W·X = ∑ wixi > 0 (i = 0 to n)

needs to be performed.

• In PTA we need to find the values <w0, w1, ..., wn>.
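A minimal sketch of this preprocessing step (the helper name is mine): augment each vector with the constant -1 input and negate the 0-class vectors.

```python
def preprocess(one_class, zero_class):
    """Augment each example with a constant -1 input (to absorb the threshold)
    and negate the 0-class vectors, so that the single test W.X > 0 remains."""
    augmented = [(-1,) + tuple(x) for x in one_class]
    negated = [tuple(-v for v in (-1,) + tuple(x)) for x in zero_class]
    return augmented + negated
```

For the AND function this yields exactly the four vectors used in the PTA example below.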


Perceptron Training Algorithm

Input: preprocessed vectors Xi, i = 1 to m

1. Choose random values for wi, i = 0 to n
2. i := 1
3. IF W·Xi > 0 goto 6
4. W := W + Xi
5. goto 2
6. i := i + 1
7. IF i > m goto 9
8. goto 3
9. End

• Thus, whenever an example is misclassified, we add it to the weight vector; this continues until the test W·Xi > 0 is satisfied for all input vectors.
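A direct translation of these steps into Python might look like the following sketch (function and variable names are mine; zero initial weights are used instead of random ones, matching the worked example on the next slide):

```python
def pta(vectors, w=None):
    """Perceptron Training Algorithm: whenever W.Xi fails the test W.Xi > 0,
    add Xi to W and restart the scan from the first vector."""
    n = len(vectors[0])
    w = list(w) if w is not None else [0] * n          # step 1 (zeros here)
    i = 0                                              # step 2 (0-based index)
    while i < len(vectors):                            # step 7: stop when i > m
        x = vectors[i]
        if sum(wk * xk for wk, xk in zip(w, x)) > 0:   # step 3: test W.Xi > 0
            i += 1                                     # step 6: next example
        else:
            w = [wk + xk for wk, xk in zip(w, x)]      # step 4: W := W + Xi
            i = 0                                      # step 5: goto 2
    return w
```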


Example of PTA: AND function

1-class: { <1, 1> }
0-class: { <0, 0>, <0, 1>, <1, 0> }

Augmented classes:
1-class: { <-1, 1, 1> }
0-class: { <-1, 0, 0>, <-1, 0, 1>, <-1, 1, 0> }

Negating the 0-class vectors, the total set of vectors is:
X1 = <-1, 1, 1>
X2 = <1, 0, 0>
X3 = <1, 0, -1>
X4 = <1, -1, 0>


Example of PTA (contd)

Start with an initial value for the weight vector:

W = <0, 0, 0>

W·X1 = 0, so the test fails.

Wnew = Wold + X1 = <-1, 1, 1>

This in turn fails for X2.

Keep adding the failed vectors; eventually a correct weight vector must emerge.

WHY?
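For concreteness (my own check, not on the slides), running the PTA sketch given earlier on these four vectors does terminate; with the zero initialisation it ends with W = <4, 3, 2>, i.e. θ = 4, w1 = 3, w2 = 2, which satisfies W·Xi > 0 for all four vectors:

```python
X = [(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)]
print(pta(X))   # -> [4, 3, 2], a valid AND perceptron with theta = 4
```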


Perceptron Training Convergence Theorem

Whatever the initial choice of weights, the Perceptron Training Algorithm will eventually converge to correct weight values, provided the function being trained is linearly separable.


Non-Linearity

How to deal with non-linearity:

• 1. Tolerate some error: e.g., in the case of XOR, classify the X points but include one O point as well.

• 2. Separate by a higher-order surface, but in that case the neuron no longer remains a linear threshold element.


Non-Linearity (contd)

• Introduce more hyperplanes so that one segment encloses points of only one kind and no point of the other kind.


Multilayer Perceptrons

• Multilayer perceptrons are more powerful than the single-layer perceptron.

• The neurons use the sigmoid function as the input-output relationship.

• Merits of the sigmoid:

1. Biological plausibility: it mimics the behaviour of actual brain cells.

2. Easy-to-compute derivative: dy/dx = y(1 - y).
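A small sketch (assuming the logistic form y = 1/(1 + e^(-x)) is the sigmoid meant here) that also checks the derivative identity numerically:

```python
import math

def sigmoid(x):
    """Logistic sigmoid y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Numerical check of dy/dx = y * (1 - y) at an arbitrary point.
x, h = 0.3, 1e-6
y = sigmoid(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(numeric - y * (1 - y)) < 1e-8)   # True
```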


Backpropagation Algorithm


Backpropagation Algorithm (contd)

• Each neuron in one layer is connected to all neurons in the next layer, if any.

Let I = <i1, i2, ..., im> be the input vector,

O = <o1, o2, ..., on> be the output vector, and

T = <t1, t2, ..., tn> be the target vector.

Then the error is E = ½ ∑ (ti - oi)², summed over i = 1 to n.


BP: Gradient Descent

Δwji ∝ -∂E/∂wji

where Δwji is the change in the weight between the ith (feeding) and the jth (fed) neuron.

With the learning rate η as the constant of proportionality,

Δwji = -η ∂E/∂wji

From this, the weight-change values for the whole network can easily be found.

The weight change always reduces the error; hence BP is by nature a greedy strategy.
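As one concrete instance (a standard derivation, not spelled out on the slides): for a weight feeding an output neuron with the sigmoid activation and the error E defined above, the chain rule gives -∂E/∂wji = (tj - oj)·oj·(1 - oj)·oi. A minimal sketch:

```python
def output_weight_update(t_j, o_j, o_i, eta):
    """Weight change for a connection from neuron i into sigmoid output neuron j,
    assuming E = 1/2 * sum((t - o)^2): delta_w_ji = eta*(t_j - o_j)*o_j*(1 - o_j)*o_i."""
    delta_j = (t_j - o_j) * o_j * (1 - o_j)
    return eta * delta_j * o_i
```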


Analysis of BP

Influence of the learning rate: a large value gives fast progress but causes oscillation about the minimum; too small a value makes the progress of the algorithm very slow.

Symmetry breaking: if the mapping demands different weights but we start with the same weight everywhere, BP will never converge.

Momentum factor: the momentum term hastens the operating point towards the minimum and, once near the minimum, dampens the oscillations.
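(In its commonly used form, the update with momentum is Δw(t) = -η ∂E/∂w + α Δw(t-1), where α is the momentum factor; the contribution of the previous step carries the weights through flat regions and averages out oscillations near a minimum.)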


Local Minima

Due to the greedy nature of BP, it can get stuck in a local minimum m and never reach the global minimum g, since the error can only decrease under its weight changes.


Some useful tips for BP

• If the network weights do not change, one of the following might have happened:

a) The learning rate is too small.

b) Network paralysis: the neurons are operating near the saturation region (output close to 1 or 0) because the inputs are highly positive or negative. In this case, scale the inputs.

c) The network is stuck in a local minimum. In this case, start with a fresh random set of weights.


Tips (contd)

• If the input values are evenly distributed between positive and negative, use the tanh(x) curve instead of the sigmoid.


Application

• Loan Defaulter recognition: classification task.

Name      Age  Sex  Address                       Income (Rs/yr)  Defaulter (Y/N)
--------  ---  ---  ----------------------------  --------------  ---------------
A.K. Das  43   M    5th Street, Juhu Reclamation  5 lakhs         N
S. Singh  52   M    A.S. Marg, Powai              0.7 lakh        Y


Network Design

• Input layer with 4 neurons (in actuality hundreds of attributes).

• Hidden layer with 2 neurons.

• Output layer with a single neuron.

• Learning rate of about 0.6 and momentum factor of about 0.3.

• Training can be done with very efficient packages/hardware, e.g. SNNS: http://www-ra.informatik.uni-tuebingen.de/SNNS/
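A bare-bones sketch of such a 4-2-1 sigmoid network in plain Python (the attribute encoding and random weights are placeholders of mine; real training would use a package such as SNNS):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

n_in, n_hidden, n_out = 4, 2, 1   # input, hidden, output layer sizes
eta, alpha = 0.6, 0.3             # learning rate and momentum from the slide (training loop not shown)

# One weight row per neuron; the extra weight acts on a constant -1 bias input.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
w_out    = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

def forward(x):
    """Forward pass through the 4-2-1 network."""
    x = list(x) + [-1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden] + [-1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w_out]

print(forward([0.43, 1.0, 0.0, 0.5]))   # scaled <age, sex, addr, income> attributes
```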


Hopfield net

• Inspired by associative memory, in which retrieval is not by address but by part of the data itself.

• Consists of N neurons, fully connected with symmetric weights wij = wji.

• No self-connection, so the weight matrix is zero-diagonal and symmetric.

• Each computing element or neuron is a linear threshold element with threshold = 0.
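A minimal sketch of the asynchronous dynamics described on the following slides (the function names and the ±1 state convention are assumptions of mine):

```python
import random

def net_input(w, state, j):
    """Weighted sum into neuron j; w is the symmetric, zero-diagonal weight matrix."""
    return sum(w[j][i] * state[i] for i in range(len(state)) if i != j)

def run_async(w, state, steps=100):
    """Asynchronous mode: repeatedly pick a random neuron and set it to +1
    if its net input exceeds the threshold (0 here), else to -1."""
    state = list(state)
    for _ in range(steps):
        j = random.randrange(len(state))
        state[j] = 1 if net_input(w, state, j) > 0 else -1
    return state
```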


Computation


Stability

• Asynchronous mode of operation: at any instant, a randomly selected neuron compares its net input with the threshold.

• In the synchronous mode of operation all neurons update themselves simultaneously at any instant of time.

• Since there are feedback connections in the Hopfield Net, the question of stability arises. At every time instant the network evolves and finally settles into a stable state.

• How does the Hopfield Net function as associative memory?

• One needs to store or stabilize a vector which is the memory element.


Example

w12 = w21 = 5

w13 = w31 = 3

w23 = w32 = 2

At time t=0

s1(t) = 1

s2(t) = -1

s3(t) = 1

Unstable state: the net input to neuron 1 is w12·s2(t) + w13·s3(t) = 5·(-1) + 3·1 = -2, which is below the threshold 0 while s1(t) = 1, so neuron 1 will flip.

A stable pattern is called an attractor for the net.


Energy Consideration

• Stable patterns correspond to minimum energy states.

• Energy at state <x1, x2, x3, …, xn>

• E = -½ ∑j ∑i≠j wji xi xj

• Change in energy always comes out to be negative in the asynchronous mode of operation. Energy always decreases.

• Stability ensured.
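As a check on the earlier three-neuron example (my arithmetic): counting each pair once, E = -(w12·x1·x2 + w13·x1·x3 + w23·x2·x3), so the state <1, -1, 1> has E = -((-5) + 3 + (-2)) = 4, and after neuron 1 flips to -1 the energy becomes -(5 + (-3) + (-2)) = 0; the flip has lowered the energy, as claimed.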


Association Finding

• Association detection is a very important problem in data mining.

• A Hopfield net can be trained to store associations.

• Classic example: people who buy bread also buy butter.

• Stores keep bread and butter together.

• Sweet shops serving Bengali sweets also keep popular Bengali magazines.


Hopfield Net as a Storehouse of Associations

• Get the features of situations in terms of <attribute, value> pairs.

• Train the Hopfield net with an appropriate learning algorithm (the Hopfield rule).

• Use this to detect associations in future examples.
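The slides do not spell out the Hopfield rule; one standard Hebbian form, used here as an assumption, sets wij to the sum over the stored ±1 patterns of xi·xj, with a zero diagonal:

```python
def store(patterns):
    """Hebbian (Hopfield-style) storage: w[i][j] = sum over patterns of x_i * x_j, i != j."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w
```

A partial or noisy cue can then be completed by running the asynchronous dynamics sketched earlier from that corrupted state; the attractor it settles into is the recalled association.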


Conclusion

• Neural nets and fuzzy logic are alternative modes of computation that provide ways of discovering knowledge in large volumes of data.

• The first crucial step is feature selection.

• The features can be fed into the neural net.

• The features can be described qualitatively, supported with profiles, for data mining tasks.


References

Books:

R. J. Schalkoff, Artificial Neural Networks, McGraw-Hill, 1997.

G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall India, 1995.

E. Cox, Fuzzy Logic, Charles River Media Inc. Publication, 1995.

J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.

Journals

Neural Computation, IEEE Transactions on Neural Networks, Data and Knowledge Engineering, SIGKDD, IEEE Expert, etc.