
1

Data Mining, Lecture 8: Artificial Neural Networks

2

Do you see any shape other than circles?

3

What do you see in the picture?

4

What do you see in the picture?

5

What do you see in the picture?

6

A biological neuron

7

An artificial neuron

8

Conventional computers vs. neural nets

In a von Neumann machine:

- fetch, decode, execute, …

In a neural net:

- a different style of processing (signal processor)
- massively parallel processing
- information is stored in a set of weights (memory)
- robust to noise and hardware failure
- …

[Diagram: CPU and memory exchanging instructions and data]

9

Biological information

Statistics about the human body:

- The brain's average power consumption is roughly constant, about 20 watts.
- The brain is about 2% of total body weight (1.3 kg), yet it consumes about 20% of the body's oxygen.
- The brain (the cerebral cortex) is a large sheet of neurons, 2 to 3 mm thick with an area of about 22,000 mm², containing roughly 10^11 neurons (on the order of the number of stars in the galaxy), each with 10^3 to 10^4 connections. This sheet is tightly compressed and folded to fit inside the limited volume of the skull.
- Neurons are among the oldest and longest cells in the body. Like other cells, some die and are replaced, but some are never replaced at all.
- Some neurons are very long, up to a few feet (about 1 meter), running from the point of stimulus to the spinal cord.

10

Human Brain

A rhinoceros weighs about 30 times as much as a human, yet its brain weighs only half as much as a human brain.

11

Artificial Neural Networks (ANN)

X1 X2 X3 | Y
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 1
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 1
 0  0  0 | 0

[Diagram: inputs X1, X2, X3 feeding a black box that produces output Y]

Output Y is 1 if at least two of the three inputs are equal to 1.

12

Artificial Neural Networks (ANN)

(Same truth table as on the previous slide.)

[Diagram: inputs X1, X2, X3, each with weight 0.3, feeding an output node with threshold t = 0.4]

Y = I(0.3·X1 + 0.3·X2 + 0.3·X3 − 0.4 > 0), where I(z) = 1 if z is true and 0 otherwise.

13

Artificial Neural Networks (ANN)

The model is an assembly of interconnected nodes and weighted links.

The output node sums its input values, weighted by the links that feed it.

The weighted sum is compared against a threshold t.

[Diagram: inputs X1, X2, X3 with weights w1, w2, w3 feeding an output node with threshold t]

Perceptron model:

Y = I(Σi wi·Xi − t > 0), or equivalently Y = sign(Σi wi·Xi − t).
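As a concrete illustration (my own sketch, not code from the slides), the following Python snippet implements this perceptron with the weights of 0.3 and threshold t = 0.4 from the previous slide, and verifies the truth table from slide 11:

```python
def perceptron(x, w, t):
    """Fire (output 1) when the weighted input sum exceeds the threshold t."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > t else 0

w, t = [0.3, 0.3, 0.3], 0.4  # weights and threshold from slide 12

# Check every row of the slide-11 truth table:
# Y is 1 exactly when at least two of the three inputs are 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            y = perceptron((x1, x2, x3), w, t)
            assert y == (1 if x1 + x2 + x3 >= 2 else 0)
            print(x1, x2, x3, "->", y)
```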

14

Definition

The basic model: an artificial neuron

Warren McCulloch & Walter Pitts (1943)

[Diagram: inputs 1, x1, x2, x3, …, xn with weights w0 = b, w1, w2, …, wn feeding a summation unit Σ followed by a threshold, producing the activation H(wᵀx − b)]

H(wᵀx − b) = 1 if wᵀx ≥ b, and 0 otherwise.

15

Function

Geometry: a linear separation boundary.

[Diagram: separating line with normal vector w and axis intercepts b/w1 and b/w2]

16

Task

Learn a binary classification f: ℝⁿ → {0, 1}, given examples (x, y) in ℝⁿ × {0, 1} (positive/negative examples).

Evaluation: mean number of misclassifications on a test set.

17

Linear Classification

The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

Σ(i=1..m) wi·xi + b = 0

[Diagram: points of classes C1 and C2 in the (x1, x2) plane, separated by the decision boundary]

- Decision boundary: w1·x1 + w2·x2 + b = 0
- Decision region for C1: w1·x1 + w2·x2 + b > 0
- Decision region for C2: w1·x1 + w2·x2 + b ≤ 0
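A short Python sketch of this decision rule; the weights, bias, and test points are made-up illustrative values, not taken from the slides:

```python
# Hypothetical hyperplane w1*x1 + w2*x2 + b = 0 (illustrative values only).
w1, w2, b = 1.0, -1.0, 0.5

def classify(x1, x2):
    """Assign C1 on the positive side of the hyperplane, C2 otherwise."""
    return "C1" if w1 * x1 + w2 * x2 + b > 0 else "C2"

for point in [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]:
    print(point, "->", classify(*point))
```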

18

Artificial Neuron

Using an activation function and a threshold, the neuron can implement a simple logic function:

Example: AND function

x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1

[Diagram: neuron with inputs X1, X2, weights 1 and 1, and threshold 1.5, computing AND]

19

Artificial Neuron

Example 2: OR function

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1

[Diagram: neuron with inputs X1, X2, weights 2 and 2, and threshold 1.5, computing OR]
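A small Python sketch (mine, following the weights given on these two slides) showing that the same threshold unit implements both gates:

```python
# Threshold unit: fires when the weighted sum strictly exceeds the threshold.
def unit(x, w, threshold):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

# AND: weights (1, 1), threshold 1.5 (slide 18).
# OR:  weights (2, 2), threshold 1.5 (slide 19).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y_and = unit(x, (1, 1), 1.5)
    y_or = unit(x, (2, 2), 1.5)
    print(x, "AND ->", y_and, " OR ->", y_or)
```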

20

Perceptron

Rosenblatt (1962):
- Adapted from visual perception in humans
- Pattern classification

– Pattern classification

[Diagram: perceptron with inputs weighted w1, w2, w3, …, wn, a summation unit Σ with threshold θ, and bias B]

21

Minsky & Papert (1969) offered a solution to the XOR problem: combine perceptron unit responses using a second layer of units.

[Diagram: solution to the XOR problem; perceptron units 1 and 2 feeding a second-layer unit 3]

22

XOR problem

[Plot: XOR inputs in the (x1, x2) plane; the pairs (+1, −1) and (−1, +1) give output +1 (green points), the pairs (+1, +1) and (−1, −1) give output −1 (red points), and two separating lines with offset 0.1 divide the classes]

In this graph of the XOR function, input pairs giving output +1 and −1 are depicted as green and red points, respectively. These two classes cannot be separated by a single line; two lines are needed. The following NN with two hidden nodes realizes this non-linear separation, where each hidden node is a perceptron describing one of the two blue lines.

This NN uses the sign activation function. The two green arrows indicate the directions of the weight vectors of the two hidden nodes, (1,-1) and (-1,1). They indicate the regions where the network output will be 1. The output node is used to combine the outputs of the two hidden nodes.
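Below is a minimal Python sketch of such a network. The hidden weight vectors (1, −1) and (−1, 1) and the sign activation come from the slide; the small thresholds (±0.1, matching the 0.1 in the figure) and the exact output combination are my reconstruction, so treat those values as assumptions:

```python
def sign(z):
    return 1 if z >= 0 else -1

def xor_net(x1, x2):
    # Hidden nodes: weight vectors (1, -1) and (-1, 1) from the slide;
    # the 0.1 offsets are assumed from the figure.
    h1 = sign(1 * x1 - 1 * x2 - 0.1)
    h2 = sign(-1 * x1 + 1 * x2 - 0.1)
    # Output node combines the two hidden outputs (an assumed OR-like unit).
    return sign(h1 + h2 + 0.1)

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # +1 exactly when x1 != x2
```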

23

Types of decision regions

A network with a single node separates the input space into two regions:
w0 + w1·x1 + w2·x2 ≥ 0 and w0 + w1·x1 + w2·x2 < 0.

[Diagram: single node with inputs 1, x1, x2 and weights w0, w1, w2]

A one-hidden-layer network realizes a convex region: each hidden node realizes one of the lines (L1, L2, L3, L4) bounding the convex region, and the output node, with hidden weights of 1 and bias −3.5, fires only when all four hidden nodes do (see the sketch after this slide).

[Diagram: four hidden nodes, each realizing one bounding line, feeding an output node with weights 1 and bias −3.5]

A two-hidden-layer network realizes the union of three convex regions (P1, P2, P3): each box represents a one-hidden-layer network realizing one convex region, and the output node, with weights 1 and bias −0.5, fires when any of them does.

[Diagram: three one-hidden-layer subnetworks P1, P2, P3 feeding an output node with weights 1 and bias −0.5]
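As an illustration of this construction (the specific lines and region are made-up examples, not the slide's figure), a Python sketch that ANDs three half-plane units to test membership in a triangular convex region:

```python
# Each hidden unit outputs 1 on one side of a line w0 + w1*x1 + w2*x2 >= 0.
def half_plane(x1, x2, w0, w1, w2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

# Hypothetical lines bounding the triangle with corners (0,0), (1,0), (0,1):
# x1 >= 0, x2 >= 0, and x1 + x2 <= 1.
lines = [(0, 1, 0), (0, 0, 1), (1, -1, -1)]

def in_convex_region(x1, x2):
    # Output node with bias -(n - 0.5): fires only if ALL n hidden units fire,
    # mirroring the -3.5 bias used with four lines on the slide.
    s = sum(half_plane(x1, x2, *w) for w in lines)
    return 1 if s - (len(lines) - 0.5) > 0 else 0

print(in_convex_region(0.2, 0.2))  # 1: inside the triangle
print(in_convex_region(0.9, 0.9))  # 0: outside (x1 + x2 > 1)
```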

24

A four-layer network:

[Diagram: inputs x1, x2, …, xn feeding hidden layers and an output layer]

Multi-layer perceptron

25

Algorithm for learning ANN

Initialize the weights (w0, w1, …, wk)

Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples.

Objective function:

E = Σi [ Yi − f(w, Xi) ]²

Find the weights wi that minimize the objective function above, e.g., using the backpropagation algorithm (see lecture notes).

26

Training: Backprop algorithm

The BP algorithm searches for weight values that minimize the total error of the network over the training set.

BP consists of the repeated application of the following two passes:

- Forward pass: the network is activated on one example, and the error of (each neuron of) the output layer is computed.
- Backward pass: the network error is used for updating the weights (the credit assignment problem). Starting at the output layer, the error is propagated backwards through the network, layer by layer.

27

BP

Back-propagation training algorithm

BP adjusts the weights of the network in order to minimize the total mean squared error.

[Diagram: network activation (forward step) followed by error propagation (backward step)]
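To make the two passes concrete, here is a minimal NumPy sketch (my own illustration, not the lecture's code) of a one-hidden-layer network trained with the squared-error objective above, learning XOR; the layer sizes, activation, and learning rate are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set (inputs X, targets Y) as a stand-in training set.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 units; sizes and learning rate are assumptions.
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass: activate the network and measure the output error.
    H = sigmoid(X @ W1 + b1)          # hidden activations
    out = sigmoid(H @ W2 + b2)        # network output
    err = out - Y                     # gradient of 0.5 * sum((Y - out)^2)

    # Backward pass: propagate the error layer by layer and update weights.
    d_out = err * out * (1 - out)           # output-layer delta
    d_hid = (d_out @ W2.T) * H * (1 - H)    # hidden-layer delta
    W2 -= lr * H.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```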