CSCI 4410 Lecture 11: Introduction to Neural Networks (adapted from Kathy Swigger)
TRANSCRIPT
Biological Neuron

The Neuron - A Biological Information Processor

- Dendrites - the receivers
- Soma - neuron cell body (sums input signals)
- Axon - the transmitter
- Synapse - point of transmission
- A neuron activates after a certain threshold is met
- Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction
Biological Neuron

[Figure: diagram of a biological neuron.]

Advantage of the Brain
Inherent advantages of the brain: "distributed processing and representation"

- Parallel processing speeds
- Fault tolerance
- Graceful degradation
- Ability to generalize
Prehistory

W.S. McCulloch & W. Pitts (1943). "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, 5, 115-137.
Truth Table for Logical AND

| x | y | x & y |
|---|---|-------|
| 1 | 1 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 0 |

The unit computes a weighted sum of its inputs: x and y enter with weight +1 each, and a constant input 1 enters with weight -2, giving sum = x + y - 2. The output is 0 if sum < 0, else 1.
• This seminal paper pointed out that simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT.
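To make this concrete, here is a minimal Python sketch (mine, not from the slides) of such a threshold unit, wired as the AND and OR gates using the weights and biases from the diagrams on this slide and the next:

```python
def mp_unit(inputs, weights, bias):
    """McCulloch-Pitts-style unit: fire (1) if the weighted sum
    plus bias is non-negative, else stay silent (0)."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 0 if s < 0 else 1

def AND(x, y):
    # weights +1, +1 and bias -2: sum = x + y - 2
    return mp_unit((x, y), (1, 1), -2)

def OR(x, y):
    # weights +1, +1 and bias -1: sum = x + y - 1
    return mp_unit((x, y), (1, 1), -1)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), OR(x, y))
```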
Nervous Systems as Logical Circuits
Groups of these “neuronal” logic gates could carry out any computation, even though each neuron was very limited.
Here x and y enter with weight +1 each, and a constant input 1 enters with weight -1, giving sum = x + y - 1. The output is 0 if sum < 0, else 1.

Truth Table for Logical OR

| x | y | x \| y |
|---|---|--------|
| 1 | 1 | 1 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 0 | 0 | 0 |
• Could computers built from these simple units reproduce the computational power of biological brains?
• Were biological neurons performing logical operations?
The Perceptron

Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan, New York, NY.

Subsequent progress was driven by the invention of learning rules inspired by ideas from neuroscience...

Rosenblatt's Perceptron could automatically learn to categorise or classify input vectors into types. It obeyed the following rule: if the sum of the weighted inputs exceeds a threshold, output 1, else output -1.

output = 1 if Σ input_i × weight_i > threshold
output = -1 if Σ input_i × weight_i < threshold
Networks

- Network parameters are adapted so that the network discriminates between classes.
- For m classes, the classifier partitions the feature space into m decision regions.
- The separation between the classes is the decision boundary; in more than 2 dimensions this is a surface.
Networks

- For 2 classes, we can view the net output as a discriminant function y(x, w) where:
  y(x, w) = 1 if x is in C1
  y(x, w) = -1 if x is in C2
- We need training data with known classes to generate an error function for the network.
- We need a (supervised) learning algorithm to adjust the weights.
Linear discriminant functions

[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), with a straight decision boundary separating Decision Region 1 from Decision Region 2.]

A linear discriminant function is a mapping which partitions feature space using a linear function.

Simple form of classifier: "separate the two classes using a straight line in feature space".
The Perceptron as a Classifier

For d-dimensional data, the perceptron consists of d weights, a bias, and a thresholding activation function. For 2-D data we have:

1. Weighted sum of the inputs: a = w0 + w1 x1 + w2 x2
2. Pass through the activation function: T(a) = -1 if a < 0; T(a) = +1 if a >= 0

The output {-1, +1} is the class decision.

If we group the weights as a vector w, the net output is given by:

Output = w · x + w0 (bias or threshold)

We can view the bias as another weight, from an input which is constantly on.
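As an illustrative sketch of this classifier (Python; the example weights and input are hypothetical, chosen to echo the worked example that appears a few slides later):

```python
def perceptron_classify(x, w, w0):
    """2-D perceptron: a = w0 + w1*x1 + w2*x2, thresholded to {-1, +1}."""
    a = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if a >= 0 else -1

def perceptron_classify_folded(x, w_aug):
    """Same thing with the bias folded in as a weight on a constant input of 1."""
    a = sum(wi * xi for wi, xi in zip(w_aug, list(x) + [1]))
    return 1 if a >= 0 else -1

print(perceptron_classify((2.0, 1.0), (0.5, 0.3), -1.0))         # +1
print(perceptron_classify_folded((2.0, 1.0), (0.5, 0.3, -1.0)))  # +1
```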
Common Activation Function Choices
Nodes representing Boolean functions
Using the step activation function from the previous slide.
Network Learning

The standard procedure for training the weights is gradient descent.

For this process we have a set of training data from known classes, used in conjunction with an error function (e.g. sum-of-squares error) to assign an error to each instantiation of the network.

Then do: w_new = w_old - η∇E(w), where η is a learning rate and the error signal is T - O (T = desired output, O = actual output).

This moves us downhill in error.
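One such downhill step, as a sketch (Python/NumPy; a linear unit and sum-of-squares error are my assumptions, since the slide leaves the model unspecified):

```python
import numpy as np

def gradient_descent_step(w, X, T, lrate=0.5):
    """One batch step for a linear unit O = X @ w with sum-of-squares
    error E = 0.5 * sum((T - O)**2).  The gradient of E w.r.t. w is
    -X.T @ (T - O), so stepping against the gradient gives
    w_new = w_old + lrate * X.T @ (T - O)."""
    O = X @ w                          # actual outputs
    return w + lrate * X.T @ (T - O)   # T = desired, O = actual
```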
Illustration of Gradient Descent

[Figure: the error E(w) plotted as a surface over the weights w0 and w1.]
Illustration of Gradient Descent

[Figure: the same error surface; the direction of steepest descent = the direction of the negative gradient.]
Illustration of Gradient Descent

[Figure: the same error surface; one update step moves from the original point in weight space to a new point in weight space.]
Example

Inputs: I1 = 2, I2 = 1, plus a constant bias input of 1.
Weights: W0 = -1 (the bias/threshold weight θ), W1 = .5, W2 = .3.

Output = sum(weights × inputs) + threshold (bias)
Output = (2 × .5) + (1 × .3) + (-1) = .3
Activation = 1 if > 0 and 0 if < 0, so the unit outputs 1.

Updating the functions:

Wi(t+1) = Wi(t) + ΔWi(t)
θ(t+1) = θ(t) + Δθ(t)
ΔWi(t) = (T - O) Ii
Δθ(t) = (T - O)

Error = T - O (where T = desired output and O = actual output). Here T = 0 and O = 1, so T - O = -1:

W1(t+1) = .5 + (0-1)(2) = -1.5
W2(t+1) = .3 + (0-1)(1) = -.7
θ(t+1) = -1 + (0-1) = -2
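The same update as a Python sketch (values taken directly from the slide):

```python
inputs = [2, 1]            # I1, I2
w = [0.5, 0.3]             # W1, W2
theta = -1.0               # W0, the bias/threshold weight
T = 0                      # desired output

s = sum(wi * ii for wi, ii in zip(w, inputs)) + theta   # 0.3
O = 1 if s > 0 else 0                                   # activation -> 1
err = T - O                                             # -1

# delta-rule updates: Wi += (T - O) * Ii, theta += (T - O)
w = [wi + err * ii for wi, ii in zip(w, inputs)]        # [-1.5, -0.7]
theta = theta + err                                     # -2.0
print(w, theta)
```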
The Fall of the Perceptron

Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge, MA.
• Before long researchers had begun to discover the Perceptron’s limitations.
• Unless input categories were “linearly separable”, a perceptron could not learn to discriminate between them.
• Unfortunately, it appeared that many important categories were not linearly separable.
• E.g., those inputs to an XOR gate that give an output of 1 (namely 10 & 01) are not linearly separable from those that do not (00 & 11).
The Fall of the Perceptron

[Figure: "Footballers" and "Academics" plotted by success (Successful vs. Unsuccessful) against hours in the gym (Many vs. Few Hours in the Gym per Week).]

In this example, a perceptron would not be able to discriminate between the footballers and the academics...

...despite the simplicity of their relationship:

Academics = Successful XOR Gym
Multi-Layered Networks
Feed-forward: links can only go in one direction.
Recurrent: arbitrary topologies can be formed from links.
Feedforward Network

[Figure: a feedforward network with inputs I1 and I2, hidden units H3, H4, H5, H6, and output O7. Weights: W13 = -1, W24 = -1, W16 = 1, W25 = 1, W35 = 1, W46 = 1, W57 = 1, W67 = 1; unit thresholds t = -0.5, t = 1.5, and t = 0.5.]
Feedforward Networks

- Arranged in layers.
- Each unit is linked only to units in the next layer.
- No units are linked to units in the same layer, back to previous layers, or skipping a layer.
- Computations can proceed uniformly from input to output units.
- No internal state exists.
Multi-layer networks

- Have one or more layers of hidden units.
- With a hidden layer, it is possible to implement any function.
Recurrent Networks

- The brain is not a feed-forward network.
- Recurrence allows activation to be fed back to previous layers.
- Such networks can become unstable or oscillate.
- They may take a long time to compute a stable output.
- The learning process is much more difficult.
- They can implement more complex designs.
- They can model systems with state.
Backpropagation Training
Multi-layer Networks - the XOR function

XOR can be written:

x XOR y = (x AND NOT y) OR (y AND NOT x)
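A sketch of this decomposition with threshold units (Python; the particular weights and biases are my choices, not from the slides):

```python
def unit(inputs, weights, bias):
    """Threshold unit: 1 if the weighted sum plus bias is >= 0, else 0."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 0 if s < 0 else 1

def x_and_not_y(x, y):
    # fires only for (x, y) = (1, 0): s = x - y - 0.5
    return unit((x, y), (1, -1), -0.5)

def XOR(x, y):
    h1 = x_and_not_y(x, y)             # x AND NOT y
    h2 = x_and_not_y(y, x)             # y AND NOT x
    return unit((h1, h2), (1, 1), -1)  # OR of the two hidden units

for x in (0, 1):
    for y in (0, 1):
        print(x, y, XOR(x, y))  # -> 0, 1, 1, 0
```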
Multi-Layer Networks

- The single-layer perceptron could not solve XOR, because it couldn't 'draw a line' to separate the two classes.
- This multi-layer perceptron 'draws' an extra line: hidden units compute x AND NOT y and y AND NOT x, and the output unit ORs the two.
Decision Boundaries

- Arbitrarily complex decision boundaries can be drawn with multi-layered networks.
- But how do we train them / change the weights?
Backpropagation

How do we assign an error / blame to a neuron hidden in a layer far away from the output nodes?

The trick is to feed the information in, work out the errors at the output nodes, and then propagate the errors backwards through the layers.
New Threshold Function

- The backprop algorithm requires a sensitive measurement of error and a smoothly varying activation function (one with a derivative everywhere).
- We therefore replace the sign function with a smooth function.
- A popular choice is the sigmoid.
Sigmoid Function

Defined by: Oj = 1 / (1 + e^(-Nj)),

where Nj = sum of the (weights × inputs) + bias.
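As a direct transcription in Python (the derivative form Oj(1 - Oj) anticipates the error rules below):

```python
import math

def sigmoid(net):
    """Oj = 1 / (1 + e^(-Nj)); smooth, so it has a derivative everywhere."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_deriv(o):
    """Convenient form used by backprop: dO/dN = O * (1 - O)."""
    return o * (1.0 - o)
```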
Backpropagation

We now use the Generalized Delta Rule for altering weights in the network:

wij(t+1) = wij(t) + Δwij(t), where Δwij(t) = (learning rate) × (err)j × Oi
θj(t+1) = θj(t) + Δθj(t), where Δθj(t) = (learning rate) × (err)j

Two rules for the error terms (with the sigmoid function):

1) (err)j = Oj (1 - Oj)(Tj - Oj) for nodes in the output layer
2) (err)j = Oj (1 - Oj)(Σk (err)k wjk) for nodes in hidden layers
Essentials of BackProp

- The equations for changing the weights are derived by assigning an amount of blame to weights deep in the network.
- The sensitivity of the error at the output layer is calculated with respect to the nodes and weights in the hidden layers.
- In practice, simply show training sets to the input layer, compare the results at the output layer (T - O), and use the two rules for weight adjustment: (1) for weights leading to the output layer, (2) otherwise.
Training Algorithm 1

Step 0: Initialize the weights to small random values.

Step 1: Feed the training sample through the network and determine the final output:
Nj = sum of the (weights × inputs) + threshold (bias)

Step 2: Compute the error for each output unit; for unit j it is:
(err)j = Oj (1 - Oj)(Tj - Oj)
Training Algorithm (cont.)

Step 3: Calculate the weight-correction term for each output unit; for unit j it is:

Δwij = (learning rate) × (err)j × Oi

Here the learning rate is a small constant and Oi is the signal from hidden unit i.
Training Algorithm 3

Step 4: Propagate the delta terms (errors) back through the weights of the hidden units, where the delta for the jth hidden unit is:

(err)j = Oj (1 - Oj)(Σk (err)k wjk)
Training Algorithm 4

Step 5: Calculate the weight-correction term for the hidden units:
Δwij = (lrate) × (err)j × Oi

Step 6: Update the weights:
wij(t+1) = wij(t) + Δwij

Step 7: Test for stopping (maximum cycles, small changes, etc.).
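Putting Steps 0-7 together, a minimal sketch of the whole loop for one hidden layer (Python/NumPy; the network size, learning rate, stopping threshold, and XOR training set are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def train(X, T, n_hidden=2, lrate=0.5, max_cycles=10000):
    n_in, n_out = X.shape[1], T.shape[1]
    # Step 0: small random weights (biases folded in as an extra input of 1)
    V = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
    W = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
    Xb = np.hstack([X, np.ones((len(X), 1))])
    for cycle in range(max_cycles):
        # Step 1: feed forward
        H = sigmoid(Xb @ V)
        Hb = np.hstack([H, np.ones((len(H), 1))])
        O = sigmoid(Hb @ W)
        # Step 2: output-unit errors, (err)j = Oj(1-Oj)(Tj-Oj)
        err_out = O * (1 - O) * (T - O)
        # Step 4: propagate back, (err)j = Oj(1-Oj) * sum_k (err)k wjk
        err_hid = H * (1 - H) * (err_out @ W[:-1].T)
        # Steps 3, 5, 6: weight corrections and updates
        W += lrate * Hb.T @ err_out
        V += lrate * Xb.T @ err_hid
        # Step 7: stop when outputs are close enough to the targets
        if np.mean((T - O) ** 2) < 1e-3:
            break
    return V, W

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets
V, W = train(X, T)   # may take many cycles, as the next slide notes
```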
Options

There are a number of options in the design of a backprop system:

- Initial weights - best to set the initial weights (and all other free parameters) to random numbers inside a small range of values (say -0.5 to 0.5).
- Number of cycles - tends to be quite large for backprop systems.
- Number of neurons in the hidden layer - as few as possible.
The numbers

I1 = 1, I2 = 1; W13 = .1, W14 = -.2, W23 = .3, W24 = .4, W35 = .5, W45 = -.4; θ3 = .2, θ4 = -.3, θ5 = .4.
Output!

Here N = the net input of a node, O = the activation function output, and θ = the threshold value of the node.
Backpropagating!
XOR Architecture

[Figure: a 2-2-1 network. Inputs x and y feed two hidden units (activation f) through weights v11, v21 and v12, v22, with bias weights v31 and v32 from constant inputs of 1; the hidden units feed the output unit (activation f) through weights w11 and w21, with bias weight w31.]
Initial Weights

Randomly assign small weight values:

[Figure: the XOR network with initial weights .21, -.3, .15 on one hidden unit; -.4, .25, .1 on the other; and -.2, -.4, .3 on the output unit.]
Feedforward - 1st Pass

Training case: (0 0)

Activation function f: Oj = 1 / (1 + e^(-sj))

s1 = -.3(1) + .21(0) + .25(0) = -.3, f = .43
s2 = .25(1) - .4(0) + .1(0) = .25, f = .56
s3 = -.4(1) - .2(.43) + .3(.56) = -.318, f = .42 (not 0)
Backpropagate

The two error rules:
1) (errdrv)j = Oj (1 - Oj)(Tj - Oj)
2) (errdrv)j = Oj (1 - Oj)(Σk (errdrv)k wjk)

Output unit (f = .42, target 0):
err3 = .42 (1 - .42)(0 - .42) = -.102

Hidden unit with f = .43:
err_in1 = err3 w13 = -.102(-.2) = .02
err1 = .43 (1 - .43)(.02) = .005

Hidden unit with f = .56:
err_in2 = err3 w23 = -.102(.3) = -.03
err2 = .56 (1 - .56)(-.03) = -.007
Update the Weights - First Pass

wij(t+1) = wij(t) + Δwij(t), where Δwij(t) = (lrate)(errdrv)j Oi

For the weight from the hidden unit with output .56 to the output unit (current value .3), with lrate = .5:

Δw(t) = .5 × (-.102) × .56 ≈ -.029
w(t+1) = .3 + (-.029) ≈ .27
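The full first pass (forward, backpropagate, update) as a Python sketch; the numbers match the slides, with the final update as corrected above:

```python
import math

def f(s):                       # sigmoid activation
    return 1.0 / (1.0 + math.exp(-s))

x, y, T, lrate = 0, 0, 0, 0.5   # training case (0 0), target 0

# forward pass (weight/bias placement follows the slide computations)
o1 = f(-.3 + .21 * x + .25 * y)        # ~0.43
o2 = f(.25 - .4 * x + .1 * y)          # ~0.56
o3 = f(-.4 - .2 * o1 + .3 * o2)        # ~0.42, not the target 0

# backpropagate
err3 = o3 * (1 - o3) * (T - o3)        # ~ -0.102
err1 = o1 * (1 - o1) * (err3 * -.2)    # ~  0.005
err2 = o2 * (1 - o2) * (err3 * .3)     # ~ -0.007

# update the hidden-2 -> output weight (currently .3)
dw = lrate * err3 * o2                 # ~ -0.029
print(.3 + dw)                         # ~  0.27
```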
Applications
Sentinel
GOAL: Initial design and development of a neural-network-based misuse detection system, exploring the use of this technology to identify instances of external attacks on a computer network.
[Figure: Internet traffic over DNS, SMTP, POP, HTTP, and FTP flows into SENTINEL, which issues an intrusion response.]
[Figure: the raw data and destination port of network packets (e.g. HTTP and FTP requests, telnet sessions, and login names such as administrator, supervisor, and anonymous) from a 180-event network stream are fed to a Self-Organizing Map; the neural network output is the result of SOM event classification.]
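For flavour, a minimal sketch of the self-organizing-map update at the heart of such a pipeline (Python/NumPy; the map size, neighbourhood, learning rate, and feature encoding are generic assumptions, not details of Sentinel):

```python
import numpy as np

rng = np.random.default_rng(0)

def som_step(weights, x, lrate=0.1, radius=1.0):
    """One self-organizing-map update: find the best-matching unit
    for input x, then pull it and its grid neighbours toward x."""
    rows, cols, dim = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    for r in range(rows):
        for c in range(cols):
            grid_dist = np.hypot(r - bmu[0], c - bmu[1])
            h = np.exp(-grid_dist**2 / (2 * radius**2))  # neighbourhood weight
            weights[r, c] += lrate * h * (x - weights[r, c])
    return bmu

# e.g. a 10x10 map over hypothetical 16-dimensional packet features
weights = rng.random((10, 10, 16))
x = rng.random(16)        # one encoded packet (hypothetical)
print(som_step(weights, x))
```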
Results

- Tested with data sets containing 6, 12, 18, and 0 "attacks" in each 180-event data set.
- Successfully detected >= 12 "attacks" in the test cases.
- Failed to "alert" at lower numbers of "attacks" (per design).

[Chart: Hybrid NN test results - detection score (0.00 to 1.00) versus FTP attempts in the data set (6, 12, 18, 0).]
An Automatic Screening System for Diabetic Retinopathy

Visual features:

- Nerve: circular yellow and bright area from which vessels emerge
- Macula: dark elliptic red area
- Microaneurysms: small scattered red and dark spots (on the order of 10x10 pixels in our database of images)
- Haemorrhages: larger red blots
- Cotton spots (exudates): yellow blots
Example
Cotton Spots:
Approach

A moving window searches for the matching pattern: an NN trained to recognize nerves reports a miss or a match for each test window.

[Figure: a test window sliding over the retinal image; most positions are misses, one is a match.]
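A sketch of the moving-window search (Python; the window size, stride, and the `classify` routine standing in for the trained nerve-recognition net are all assumptions):

```python
def sliding_window_matches(image, classify, win=32, stride=8):
    """Slide a win x win test window across a 2-D image array and
    collect positions where the (hypothetical) net reports a match."""
    matches = []
    H, W = image.shape
    for top in range(0, H - win + 1, stride):
        for left in range(0, W - win + 1, stride):
            window = image[top:top + win, left:left + win]
            if classify(window):          # miss or match
                matches.append((top, left))
    return matches
```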
How long should you train the net?

The goal is to achieve a balance between correct responses for the training patterns and correct responses for new patterns - that is, a balance between memorization and generalization.

If you train the net for too long, you run the risk of overfitting to the training data.

In general, the network is trained until it reaches an acceptable level of performance (e.g., 95% of patterns classified correctly).

One approach to avoiding overfitting is to break the data into a training set and a test set. The weight adjustments are based on the training set, but at regular intervals the test set is evaluated to see whether its error is still decreasing. When the error on the test set begins to increase, training is terminated.
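That early-stopping procedure, as a sketch (Python; `train_one_cycle` and `error_on` are assumed callables standing in for whatever training and evaluation routines are used):

```python
def train_with_early_stopping(net, train_one_cycle, error_on,
                              train_set, test_set,
                              check_every=10, max_cycles=10000):
    """Stop when error on the held-out test set starts to rise,
    even if error on the training set is still falling."""
    best_error = float("inf")
    for cycle in range(max_cycles):
        train_one_cycle(net, train_set)    # weights adjusted on training set only
        if cycle % check_every == 0:
            err = error_on(net, test_set)  # evaluated, never trained on
            if err > best_error:
                break                      # test error rising: overfitting begins
            best_error = err
    return net
```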