Pattern Recognition: Neural Networks & Other Methods
Charles Tappert
Seidenberg School of CSIS, Pace University
Agenda
Neural Network Definitions
Linear Discriminant Functions
Simple Two-layer Perceptron
Multilayer Neural Networks
Example Multilayer Neural Network Study
Non-Neural-Network Pattern Recognition Methods
Neural Network Definitions
An artificial neural network (ANN) consists of artificial neuron units (threshold logic units) with weighted interconnections
A Perceptron is a term coined by Frank Rosenblatt in the late 1950s for an ANN
Unfortunately, the term Perceptron is often misconstrued to mean only a simple two-layer ANN
Therefore, we use the term “simple Perceptron” when referring to a simple two-layer ANN
Linear Discriminant Functions
Linear functions of the parameters (e.g., features)
The product of an input vector and a weight vector
Hyperplane decision boundaries
Methods of solution
  Simple two-layer Perceptron
    One weight set connecting input to output units
    Some simple problems unsolvable, e.g., XOR
  Solve the linear algebra directly
  Support Vector Machines (SVM)
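A linear discriminant can be sketched in a few lines: the category decision is the sign of the product of a weight vector and an input vector, and the set where that product is zero is the hyperplane boundary. The weight values below are illustrative, not from the slides.

```python
import numpy as np

# Linear discriminant g(x) = w . x + w0: the sign of g(x) decides
# between two categories; g(x) = 0 is the hyperplane boundary.
w = np.array([2.0, -1.0])   # weight vector (illustrative values)
w0 = -0.5                   # bias (threshold) weight

def classify(x):
    g = np.dot(w, x) + w0
    return 1 if g > 0 else 2   # category 1 vs. category 2

print(classify(np.array([1.0, 0.0])))  # g = 1.5 > 0  -> 1
print(classify(np.array([0.0, 2.0])))  # g = -2.5 < 0 -> 2
```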
Simple Perceptron: Two-category case (one output unit)
Simple Perceptron yields a linear decision boundary
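For the two-category case, the classic Perceptron learning rule can be sketched as below on the linearly separable AND function; the data, targets, and epoch count are illustrative. Because the decision boundary is a single hyperplane, the same procedure cannot converge on XOR.

```python
import numpy as np

# Perceptron learning rule, two-category case (one output unit).
# Inputs are augmented with a constant 1 so the bias is learned too.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # AND inputs
t = np.array([-1, -1, -1, 1])      # target +1 only for input (1,1)
w = np.zeros(3)

for epoch in range(20):            # enough passes to converge here
    for x, target in zip(X, t):
        if np.sign(w @ x) != target:    # misclassified?
            w = w + target * x          # move hyperplane toward x

preds = [1 if w @ x > 0 else -1 for x in X]
print(preds)   # -> [-1, -1, -1, 1]: AND is linearly separable
```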
Multilayer Neural Networks
Overcome limitations of two-layer networks
Feedforward networks – backpropagation training
A standard 3-layer neural network has an input layer, a hidden layer, and an output layer, interconnected by modifiable weights represented by links between layers
Benefits
  Simplicity of the learning algorithm
  Ease of model selection
  Incorporation of heuristics/constraints
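Backpropagation training of a 3-layer feedforward network can be sketched as follows. The layer sizes (2-4-1), sigmoid units, squared-error loss, learning rate, and epoch count are illustrative choices, not values from the slides.

```python
import numpy as np

# Minimal backpropagation sketch for a 3-layer feedforward network,
# trained here on XOR (assumed task for illustration).
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])        # XOR targets

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y

lr = 0.5
for _ in range(5000):
    h, y = forward(X)
    # gradients of squared error, propagated backward layer by layer
    d_out = (y - t) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_hid; b1 -= lr * d_hid.sum(0)

_, y = forward(X)
print(np.round(y.ravel(), 2))   # after training; the goal is near [0, 1, 1, 0]
```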
Standard Feedforward Artificial Neural Network (ANN)
The inputs can be raw data or feature data.
3-layer Perceptron solves the XOR problem (red & black dots below)
Note that each hidden unit acts like a simple Perceptron
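The point that each hidden unit acts like a simple Perceptron can be made concrete by hand-wiring a 3-layer network for XOR: one hidden threshold unit computes OR, the other computes AND, and the output unit fires for "OR but not AND". The specific weights below are one of many workable choices, not taken from the slides.

```python
# Hand-wired 3-layer Perceptron for XOR; each hidden unit is itself
# a simple Perceptron (threshold logic unit).

def step(z):
    # threshold logic unit: fires iff the weighted sum exceeds 0
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # hidden unit 1: x1 OR x2
    h2 = step(x1 + x2 - 1.5)        # hidden unit 2: x1 AND x2
    return step(h1 - h2 - 0.5)      # output: OR but not AND = XOR

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```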
Perceptrons
Rosenblatt described many types of Perceptrons in his 1962 book Principles of Neurodynamics
  Standard 3-layer feedforward Perceptrons
    Sufficient to solve any pattern separation problem
  Multi-layered (more than 3 layers) Perceptrons
  Cross-coupled Perceptrons
    Information can flow between units within a layer
  Back-coupled Perceptrons
    Backward information flow (feedback) between some layers
See http://www.dtreg.com/mlfn.htm
Example Neural Network Study: Human Visual System Model
Background
  Line and edge detectors are known to exist in the visual systems of mammals (Hubel & Wiesel, Nobel Prize 1981)
  See http://en.wikipedia.org/wiki/David_H._Hubel
Problem
  Demonstrate on a visual pattern recognition task that an ANN with line detectors is superior to those without line detectors
Hypotheses
  A line-detector ANN is more accurate than a non-line-detector ANN
  A line-detector ANN trains faster than a non-line-detector ANN
  A line-detector ANN requires fewer weights (and especially fewer trainable weights) than a non-line-detector ANN
Example Neural Network Study: Human Visual System Model
Introduction – make a case for the study
  The Visual System
  Biological Simulations of the Visual System
  ANN Approach to Visual Pattern Recognition
  ANNs Using Line and/or Edge Detectors
  Current Study
Methodology
Experimental Results
Conclusions and Future Work
The Visual System
The visual system pathway
  Eye, optic nerve, lateral geniculate nucleus, visual cortex
Hubel and Wiesel – 1981 Nobel Prize for work in the early 1960s on the cat’s visual cortex
  Cats anesthetized, eyes open with the controlling muscles paralyzed to fix the stare in a specific direction
  Thin microelectrodes measured activity in individual cells
  Cells found to be specifically sensitive to a line of light at a specific orientation
Key discovery – line and edge detectors
Biological Simulations of the Visual System
Computational neuroscience
  The Hubel-Wiesel discoveries were instrumental in the creation of what is now called computational neuroscience
  Studies brain function in terms of the information-processing properties of the structures that make up the nervous system
  Creates biologically detailed models of the brain
November 2009 – IBM announced they had created the largest brain simulation to date on the Blue Gene supercomputer
  A billion neurons and trillions of synapses, exceeding those in the cat’s brain
  http://www.popsci.com/technology/article/2009-11/digital-cat-brain-runs-blue-gene-supercomputer
Artificial Neural Network Approach
Machine learning scientists have taken a different approach to visual pattern recognition, using simpler neural network models called ANNs
The most common type of ANN used in pattern recognition is a 3-layer feedforward ANN
  Input layer
  Hidden layer
  Output layer
Standard Feedforward Artificial Neural Network (ANN)
The inputs can be raw data or feature data.
Literature Review of ANNs Using Line/Edge Detectors
GIS images/maps – line and edge detectors in four orientations: 0°, 45°, 90°, and 135°
Synthetic Aperture Radar (SAR) images – line detectors constructed from edge detectors
Line detection can be done using edge techniques such as the Sobel, Prewitt, Laplacian of Gaussian, zero-crossing, and Canny edge detectors
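As a small illustration of the edge-based approach, the Sobel operator responds strongly where image intensity changes across a line. The toy image and plain convolution loop below are illustrative; a real study would use GIS or SAR imagery and a library convolution routine.

```python
import numpy as np

# Sobel-style edge detection sketch: a vertical line in the image
# produces strong positive and negative responses at its two edges.
img = np.zeros((7, 7))
img[:, 3] = 1.0                      # a vertical line of bright pixels

sobel_x = np.array([[-1, 0, 1],      # responds to vertical structure
                    [-2, 0, 2],
                    [-1, 0, 1]], float)

def convolve2d(image, kernel):
    # valid-mode cross-correlation, written out for clarity
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

resp = convolve2d(img, sobel_x)
print(resp[2])   # strong +/- responses flank the line's two edges
```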
Current Visual System Study
Use ANNs to simulate the line detectors known to exist in the human visual cortex
Construct two feedforward ANNs – one with line detectors and one without – and compare their accuracy and efficiency on a character recognition task
Demonstrate superior performance using pre-wired line detectors
Visual System Study – Methodology
Character recognition task – classify straight-line uppercase alphabetic characters
Experiment 1 – ANN without line detectors
Experiment 2 – ANN with line detectors
Compare performance
  Recognition accuracy
  Efficiency – training time & number of weights
Alphabetic Input Patterns: Six Straight-Line Characters (5 x 7 bit patterns)
[5 x 7 asterisk bitmaps of the six characters E, F, H, I, L, and T]
Experiment 1 – ANN without line detectors
An alphabet character can be placed in any position inside the 20x20 retina not adjacent to an edge – 168 (12 x 14) possible positions
Training – choose 40 random non-identical positions for each of the 6 characters (~25% of patterns)
  Total of 240 (40 x 6) input patterns
  Cycle through the sequence E, F, H, I, L, T forty times for one pass (epoch) of the 240 patterns
Testing – choose another 40 random non-identical positions of each character, for a total of 240
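The sampling scheme above can be sketched as follows. The coordinate convention (1-indexed top-left corners) is an assumption, and treating the test positions as disjoint from the training positions is one reading of "another 40 random non-identical positions".

```python
import random

# A 5x7 character may sit at any 20x20-retina position not adjacent
# to an edge: 12 row positions x 14 column positions = 168 placements.
random.seed(0)

rows = range(2, 14)            # top-left row: 2..13 (12 choices)
cols = range(2, 16)            # top-left col: 2..15 (14 choices)
all_positions = [(r, c) for r in rows for c in cols]
assert len(all_positions) == 168

# 40 random non-identical positions per character for training,
# and another 40 (assumed disjoint here) for testing
train = random.sample(all_positions, 40)
test = random.sample([p for p in all_positions if p not in train], 40)
print(len(train), len(test))   # -> 40 40
```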
Input patterns on the retina: E(2,2) and E(12,5)

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Experiment 2 - ANN with line detectors
Simple horizontal and vertical line detectors
[Weight masks of excitatory ‘+’ and inhibitory ‘−’ values: a horizontal row of ‘+’ flanked above and below by ‘−’ for the horizontal detector, and a vertical column of ‘+’ flanked left and right by ‘−’ for the vertical detector]
288 horizontal and 288 vertical line detectors (a total of 576 simple line detectors) cover the internal retinal area
24 complex vertical line detectors and the 12 simple line detectors feeding each
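A fixed-weight (pre-wired, untrainable) line-detector unit can be sketched as a threshold unit over a retinal patch. The mask shape, weight values, and threshold below are illustrative, not the exact masks from the slides.

```python
import numpy as np

# A horizontal line detector: excitatory '+' weights along a row,
# inhibitory '-' weights above and below (illustrative mask).
h_mask = np.array([[-1, -1, -1, -1, -1],
                   [ 1,  1,  1,  1,  1],
                   [-1, -1, -1, -1, -1]], float)

def detector_response(patch, mask, theta=4.0):
    # threshold unit: fires iff excitation outweighs inhibition
    # by at least theta (illustrative threshold)
    return 1 if np.sum(patch * mask) >= theta else 0

patch = np.zeros((3, 5))
patch[1, :] = 1                               # a horizontal line segment
print(detector_response(patch, h_mask))       # -> 1 (line present)
print(detector_response(np.zeros((3, 5)), h_mask))   # -> 0 (blank patch)
```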
Results – No Line Detectors, 10 hidden-layer units: 27.7% average testing accuracy

Epochs    Training Time   Training Accuracy   Testing Accuracy
50        ~2.5 hr         100%                26.7%
100       ~4 hr           100%                28.3%
200       ~8 hr           100%                28.8%
400       ~16 hr          100%                30.4%
800       ~30 hr          100%                28.3%
1600      ~2 days         100%                23.8%
Average                   100%                27.7%
Results – Line Detectors, 10 hidden-layer units: 57.5% average testing accuracy

Epochs    Training Time   Training Accuracy   Testing Accuracy
50        0:37 min        47.5%               37.5%
100       0:26 min        100.0%              63.3%
200       0:51 min        100.0%              68.8%
400       2:28 min        71.3%               50.8%
800       3:37 min        100.0%              67.9%
1600      8:42 min        95.8%               56.7%
Average                   85.8%               57.5%
Line Detector Results, 50 hidden-layer units: 72.1% average testing accuracy

Epochs (set/attained)   Training Time   Training Accuracy   Testing Accuracy
50/8                    41 sec          100%                70.0%
100/9                   45 sec          100%                69.8%
200/10                  48 sec          100%                71.9%
400/10                  49 sec          100%                77.1%
800/8                   41 sec          100%                72.5%
1600/9                  45 sec          100%                71.3%
Average                                 100%                72.1%
Confusion Matrix – Overall Accuracy of 77.1%

In \ Out   E      F     H     I     L      T
E          62.5   20    0     0     5      12.5
F          12.5   80    0     0     2.5    5
H          0      7.5   85    0     7.5    0
I          0      5     0     95    0      0
L          0      15    2.5   5     72.5   5
T          2.5    20    0     10    0      67.5
Example Study Conclusion: Recognition Accuracy
[Bar chart (0–100% scale) comparing testing accuracy: no line detectors with 10 hidden units (27.7%), line detectors with 10 hidden units (57.5%), and line detectors with 50 hidden units (72.1%)]
Example Study Conclusion: Efficiency of Training Time
The ANN with line detectors resulted in a significantly more efficient network – training time decreased by several orders of magnitude
Example Study Conclusion: Efficiency of Number of Weights

Experiment               Fixed Weights   Variable Weights   Total Weights
1  No Line Detectors     0               20,300             20,300
2  Line Detectors        6,912           2,700              9,612
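The weight counts in the table can be reproduced from figures given earlier in the deck, under some assumptions: a 20x20 = 400-pixel retina, 50 hidden units, 6 output classes, no bias weights counted, 12 weights per simple detector (each complex detector is fed by 12 simple ones), and 48 complex detectors in total (24 vertical plus, presumably, 24 horizontal, which the slides do not state explicitly).

```python
# Hedged check of the weight-count table (assumptions noted above).
no_det_variable = 400 * 50 + 50 * 6    # input->hidden + hidden->output
assert no_det_variable == 20_300

det_fixed = 576 * 12                   # 576 pre-wired simple detectors,
                                       # 12 weights each (assumed)
det_variable = 48 * 50 + 50 * 6        # 48 complex detectors (assumed)
                                       # -> hidden -> output
print(det_fixed, det_variable, det_fixed + det_variable)
# -> 6912 2700 9612, matching the table
```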
Example Study: Overall Conclusions
The strength of the study was its simplicity
The weakness was also its simplicity, and that the line detectors appear to be designed specifically for the patterns to be classified
The weaknesses can be corrected in future work
  Add edge detectors
  Extend the alphabet to the full 26 uppercase letters
  Add noise to the patterns
Non-Neural-Network Methods
Stochastic methods
Nonmetric methods
Unsupervised learning (clustering)
Stochastic Methods
Rely on randomness to find model parameters
Used for highly complex problems where gradient descent algorithms are unlikely to work
Methods
  Simulated annealing
  Boltzmann learning
  Genetic algorithms
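Simulated annealing illustrates how randomness helps where gradient descent stalls: worse moves are occasionally accepted, with a probability that shrinks as a "temperature" is cooled, letting the search escape local minima. The objective function, cooling schedule, and parameters below are all illustrative.

```python
import math, random

# Simulated-annealing sketch on a 1-D function with local minima.
random.seed(1)

def f(x):
    return x * x + 10 * math.sin(x)   # global minimum near x = -1.31

x, T = 5.0, 10.0                      # start point and temperature
for _ in range(20000):
    x_new = x + random.uniform(-0.5, 0.5)       # random neighbor
    delta = f(x_new) - f(x)
    # always accept improvements; accept worse moves with
    # probability exp(-delta / T), which vanishes as T cools
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = x_new
    T = max(T * 0.999, 1e-3)                    # cool gradually

print(round(x, 2))   # final solution (the global minimum is near -1.31)
```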
Nonmetric Methods
Nominal data
  No measure of distance between vectors
  No notion of similarity or ordering
Methods
  Decision trees
  Grammatical methods
    e.g., finite state machines
  Rule-based systems
    e.g., propositional logic or first-order logic
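A tiny decision tree shows the nonmetric idea: classification proceeds by equality tests on nominal attributes, with no distance or ordering ever computed. The attributes, values, and rules are invented purely for illustration.

```python
# Decision-tree sketch over nominal data: only equality tests,
# never distances (illustrative attributes and rules).
def classify(color, shape):
    if shape == "round":
        if color == "red":
            return "apple"
        return "orange"
    return "banana"

print(classify("red", "round"))      # -> apple
print(classify("yellow", "long"))    # -> banana
```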
Unsupervised Learning
Often called clustering
The system is not given a set of labeled patterns for training
Instead, the system itself establishes the classes based on the regularities of the patterns
Clustering Separate Clouds
Methods work fine when the clusters form well-separated, compact clouds
They work less well when there are great differences in the number of samples in different clusters
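The well-separated-clouds case can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm). The synthetic two-cloud data and farthest-point initialization are illustrative choices.

```python
import numpy as np

# k-means sketch on two well-separated compact clouds.
rng = np.random.default_rng(0)
cloud_a = rng.normal([0, 0], 0.3, (30, 2))
cloud_b = rng.normal([5, 5], 0.3, (30, 2))
X = np.vstack([cloud_a, cloud_b])

# farthest-point initialization so each cloud seeds one center
c0 = X[0]
c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
centers = np.array([c0, c1])

for _ in range(10):
    # assign each sample to its nearest center, then recenter
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centers))   # centers land near (0, 0) and (5, 5)
```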
Hierarchical Clustering
Sometimes clusters are not disjoint, but may have subclusters, which in turn have sub-subclusters, etc.
Consider partitioning n samples into clusters
  Start with n clusters, each containing exactly one sample
  Then partition into n-1 clusters, then into n-2, etc.
Dendrogram of uppercase A’s, from the DPS dissertation of Dr. Mary Manfredi
[Dendrogram with leaves labeled by writer: england1, norway, netherland1, switzerland1, switzerland3, usa1, usa2, austria2, german3, brazil1, peru1, peru2, columbia1, chile1, ecuador1, canada1]
Pattern Recognition DPS Dissertations (parentheses indicate in progress)
Visual Systems – Rick Bassett, Sheb Bishop, Tom Lombardi
Speech Recognition – Jonathan Law
Handwriting Recognition – Mary Manfredi
Natural Language Processing – Bashir Ahmed, (Ted Markowitz)
Neural Networks – (John Casarella, Robb Zucker)
Keystroke Biometric – Mary Curtin, Mary Villani
Stylometry Biometric – (John Stewart)
Fundamental Research – Kwang Lee, Carl Abrams, Robert Zack [using keystroke data]
Other – Karina Hernandez, Mark Ritzmann [using keystroke data], (John Galatti)