Pattern Recognition: Neural Networks & Other Methods
Charles Tappert
Seidenberg School of CSIS, Pace University
Agenda
Neural Network Definitions
Linear Discriminant Functions
Simple Two-layer Perceptron
Multilayer Neural Networks
Example Multilayer Neural Network Study
Non-Neural-Network Pattern Recognition Methods
Neural Network Definitions
An artificial neural network (ANN) consists of artificial neuron units (threshold logic units) with weighted interconnections
A Perceptron is a term coined by Frank Rosenblatt in the late 1950s for an ANN
Unfortunately, the term Perceptron is often misconstrued to mean only a simple two-layer ANN
Therefore, we use the term “simple Perceptron” when referring to a simple two-layer ANN
Linear Discriminant Functions
Linear functions of the parameters (e.g., features)
The product of an input vector and a weight vector
Hyperplane decision boundaries
Methods of solution
  Simple two-layer Perceptron
    One weight set connecting input to output units
    Some simple problems unsolvable, e.g., XOR
  Solve the linear algebra directly
  Support Vector Machines (SVM)
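A linear discriminant can be sketched in a few lines: the category decision is the sign of the product of a weight vector and an input vector, and the set where that product is zero is the hyperplane boundary. The weight values below are illustrative, not from the slides.

```python
import numpy as np

# Linear discriminant g(x) = w . x + w0: the sign of g(x) decides
# between two categories; g(x) = 0 is the hyperplane boundary.
w = np.array([2.0, -1.0])   # weight vector (illustrative values)
w0 = -0.5                   # bias (threshold) weight

def classify(x):
    g = np.dot(w, x) + w0
    return 1 if g > 0 else 2   # category 1 vs. category 2

print(classify(np.array([1.0, 0.0])))  # g = 1.5 > 0  -> 1
print(classify(np.array([0.0, 2.0])))  # g = -2.5 < 0 -> 2
```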
Simple Perceptron: Two-category case (one output unit)
Simple Perceptron yields a linear decision boundary
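For the two-category case, the classic Perceptron learning rule can be sketched as below on the linearly separable AND function; the data, targets, and epoch count are illustrative. Because the decision boundary is a single hyperplane, the same procedure cannot converge on XOR.

```python
import numpy as np

# Perceptron learning rule, two-category case (one output unit).
# Inputs are augmented with a constant 1 so the bias is learned too.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # AND inputs
t = np.array([-1, -1, -1, 1])      # target +1 only for input (1,1)
w = np.zeros(3)

for epoch in range(20):            # enough passes to converge here
    for x, target in zip(X, t):
        if np.sign(w @ x) != target:    # misclassified?
            w = w + target * x          # move hyperplane toward x

preds = [1 if w @ x > 0 else -1 for x in X]
print(preds)   # -> [-1, -1, -1, 1]: AND is linearly separable
```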
Multilayer Neural Networks
Overcome limitations of two-layer networks
Feedforward networks – backpropagation training
A standard 3-layer neural network has an input layer, a hidden layer, and an output layer, interconnected by modifiable weights represented by links between layers
Benefits
  Simplicity of the learning algorithm
  Ease of model selection
  Incorporation of heuristics/constraints
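Backpropagation training of a 3-layer feedforward network can be sketched as follows. The layer sizes (2-4-1), sigmoid units, squared-error loss, learning rate, and epoch count are illustrative choices, not values from the slides.

```python
import numpy as np

# Minimal backpropagation sketch for a 3-layer feedforward network,
# trained here on XOR (assumed task for illustration).
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])        # XOR targets

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y

lr = 0.5
for _ in range(5000):
    h, y = forward(X)
    # gradients of squared error, propagated backward layer by layer
    d_out = (y - t) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_hid; b1 -= lr * d_hid.sum(0)

_, y = forward(X)
print(np.round(y.ravel(), 2))   # after training; the goal is near [0, 1, 1, 0]
```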
Standard Feedforward Artificial Neural Network (ANN)
The inputs can be raw data or feature data.
3-layer Perceptron solves the XOR problem (red & black dots below)
Note that each hidden unit acts like a simple Perceptron
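The point that each hidden unit acts like a simple Perceptron can be made concrete by hand-wiring a 3-layer network for XOR: one hidden threshold unit computes OR, the other computes AND, and the output unit fires for "OR but not AND". The specific weights below are one of many workable choices, not taken from the slides.

```python
# Hand-wired 3-layer Perceptron for XOR; each hidden unit is itself
# a simple Perceptron (threshold logic unit).

def step(z):
    # threshold logic unit: fires iff the weighted sum exceeds 0
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # hidden unit 1: x1 OR x2
    h2 = step(x1 + x2 - 1.5)        # hidden unit 2: x1 AND x2
    return step(h1 - h2 - 0.5)      # output: OR but not AND = XOR

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```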
Perceptrons
Rosenblatt described many types of Perceptrons in his 1962 book Principles of Neurodynamics
  Standard 3-layer feedforward Perceptrons
    Sufficient to solve any pattern separation problem
  Multi-layered (more than 3 layers) Perceptrons
  Cross-coupled Perceptrons
    Information can flow between units within a layer
  Back-coupled Perceptrons
    Backward information flow (feedback) between some layers
See http://www.dtreg.com/mlfn.htm
Example Neural Network Study: Human Visual System Model
Background
  Line and edge detectors are known to exist in the visual systems of mammals (Hubel & Wiesel, Nobel Prize 1981)
  See http://en.wikipedia.org/wiki/David_H._Hubel
Problem
  Demonstrate on a visual pattern recognition task that an ANN with line detectors is superior to those without line detectors
Hypotheses
  A line-detector ANN is more accurate than a non-line-detector ANN
  A line-detector ANN trains faster than a non-line-detector ANN
  A line-detector ANN requires fewer weights (and especially fewer trainable weights) than a non-line-detector ANN
Example Neural Network Study: Human Visual System Model
Introduction – make a case for the study
  The Visual System
  Biological Simulations of the Visual System
  ANN Approach to Visual Pattern Recognition
  ANNs Using Line and/or Edge Detectors
  Current Study
Methodology
Experimental Results
Conclusions and Future Work
The Visual System
The visual system pathway
  Eye, optic nerve, lateral geniculate nucleus, visual cortex
Hubel and Wiesel – 1981 Nobel Prize for work in the early 1960s on the cat’s visual cortex
  Cats anesthetized, eyes open with the controlling muscles paralyzed to fix the stare in a specific direction
  Thin microelectrodes measured activity in individual cells
  Cells found to be specifically sensitive to a line of light at a specific orientation
Key discovery – line and edge detectors
Biological Simulations of the Visual System
Computational neuroscience
  The Hubel-Wiesel discoveries were instrumental in the creation of what is now called computational neuroscience
  Studies brain function in terms of the information-processing properties of the structures that make up the nervous system
  Creates biologically detailed models of the brain
November 2009 – IBM announced they had created the largest brain simulation to date on the Blue Gene supercomputer
  A billion neurons and trillions of synapses, exceeding those in the cat’s brain
  http://www.popsci.com/technology/article/2009-11/digital-cat-brain-runs-blue-gene-supercomputer
Artificial Neural Network Approach
Machine learning scientists have taken a different approach to visual pattern recognition, using simpler neural network models called ANNs
The most common type of ANN used in pattern recognition is a 3-layer feedforward ANN
  Input layer
  Hidden layer
  Output layer
Standard Feedforward Artificial Neural Network (ANN)
The inputs can be raw data or feature data.
Literature Review of ANNs Using Line/Edge Detectors
GIS images/maps – line and edge detectors in four orientations: 0°, 45°, 90°, and 135°
Synthetic Aperture Radar (SAR) images – line detectors constructed from edge detectors
Line detection can be done using edge techniques such as the Sobel, Prewitt, Laplacian of Gaussian, zero-crossing, and Canny edge detectors
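As a small illustration of the edge-based approach, the Sobel operator responds strongly where image intensity changes across a line. The toy image and plain convolution loop below are illustrative; a real study would use GIS or SAR imagery and a library convolution routine.

```python
import numpy as np

# Sobel-style edge detection sketch: a vertical line in the image
# produces strong positive and negative responses at its two edges.
img = np.zeros((7, 7))
img[:, 3] = 1.0                      # a vertical line of bright pixels

sobel_x = np.array([[-1, 0, 1],      # responds to vertical structure
                    [-2, 0, 2],
                    [-1, 0, 1]], float)

def convolve2d(image, kernel):
    # valid-mode cross-correlation, written out for clarity
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

resp = convolve2d(img, sobel_x)
print(resp[2])   # strong +/- responses flank the line's two edges
```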
Current Visual System Study
Use ANNs to simulate the line detectors known to exist in the human visual cortex
Construct two feedforward ANNs – one with line detectors and one without – and compare their accuracy and efficiency on a character recognition task
Demonstrate superior performance using pre-wired line detectors
Visual System Study – Methodology
Character recognition task – classify straight-line uppercase alphabetic characters
Experiment 1 – ANN without line detectors
Experiment 2 – ANN with line detectors
Compare performance
  Recognition accuracy
  Efficiency – training time & number of weights
Alphabetic Input Patterns: Six Straight-Line Characters (5 x 7 bit patterns)
[5 x 7 asterisk bitmaps of the six characters E, F, H, I, L, and T]
Experiment 1 – ANN without line detectors
An alphabet character can be placed in any position inside the 20x20 retina not adjacent to an edge – 168 (12 x 14) possible positions
Training – choose 40 random non-identical positions for each of the 6 characters (~25% of patterns)
  Total of 240 (40 x 6) input patterns
  Cycle through the sequence E, F, H, I, L, T forty times for one pass (epoch) of the 240 patterns
Testing – choose another 40 random non-identical positions of each character, for a total of 240
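The sampling scheme above can be sketched as follows. The coordinate convention (1-indexed top-left corners) is an assumption, and treating the test positions as disjoint from the training positions is one reading of "another 40 random non-identical positions".

```python
import random

# A 5x7 character may sit at any 20x20-retina position not adjacent
# to an edge: 12 row positions x 14 column positions = 168 placements.
random.seed(0)

rows = range(2, 14)            # top-left row: 2..13 (12 choices)
cols = range(2, 16)            # top-left col: 2..15 (14 choices)
all_positions = [(r, c) for r in rows for c in cols]
assert len(all_positions) == 168

# 40 random non-identical positions per character for training,
# and another 40 (assumed disjoint here) for testing
train = random.sample(all_positions, 40)
test = random.sample([p for p in all_positions if p not in train], 40)
print(len(train), len(test))   # -> 40 40
```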
Input patterns on the retina: E(2,2) and E(12,5)

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Experiment 2 - ANN with line detectors
Simple horizontal and vertical line detectors
[Weight masks of excitatory ‘+’ and inhibitory ‘−’ values: a horizontal row of ‘+’ flanked above and below by ‘−’ for the horizontal detector, and a vertical column of ‘+’ flanked left and right by ‘−’ for the vertical detector]
288 horizontal and 288 vertical line detectors (a total of 576 simple line detectors) cover the internal retinal area
24 complex vertical line detectors and the 12 simple line detectors feeding each
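A fixed-weight (pre-wired, untrainable) line-detector unit can be sketched as a threshold unit over a retinal patch. The mask shape, weight values, and threshold below are illustrative, not the exact masks from the slides.

```python
import numpy as np

# A horizontal line detector: excitatory '+' weights along a row,
# inhibitory '-' weights above and below (illustrative mask).
h_mask = np.array([[-1, -1, -1, -1, -1],
                   [ 1,  1,  1,  1,  1],
                   [-1, -1, -1, -1, -1]], float)

def detector_response(patch, mask, theta=4.0):
    # threshold unit: fires iff excitation outweighs inhibition
    # by at least theta (illustrative threshold)
    return 1 if np.sum(patch * mask) >= theta else 0

patch = np.zeros((3, 5))
patch[1, :] = 1                               # a horizontal line segment
print(detector_response(patch, h_mask))       # -> 1 (line present)
print(detector_response(np.zeros((3, 5)), h_mask))   # -> 0 (blank patch)
```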
Results – No Line Detectors, 10 hidden-layer units: 27.7% average testing accuracy

Epochs    Training Time   Training Accuracy   Testing Accuracy
50        ~2.5 hr         100%                26.7%
100       ~4 hr           100%                28.3%
200       ~8 hr           100%                28.8%
400       ~16 hr          100%                30.4%
800       ~30 hr          100%                28.3%
1600      ~2 days         100%                23.8%
Average                   100%                27.7%
Results – Line Detectors, 10 hidden-layer units: 57.5% average testing accuracy

Epochs    Training Time   Training Accuracy   Testing Accuracy
50        0:37 min        47.5%               37.5%
100       0:26 min        100.0%              63.3%
200       0:51 min        100.0%              68.8%
400       2:28 min        71.3%               50.8%
800       3:37 min        100.0%              67.9%
1600      8:42 min        95.8%               56.7%
Average                   85.8%               57.5%
Line Detector Results, 50 hidden-layer units: 72.1% average testing accuracy

Epochs (set/attained)   Training Time   Training Accuracy   Testing Accuracy
50/8                    41 sec          100%                70.0%
100/9                   45 sec          100%                69.8%
200/10                  48 sec          100%                71.9%
400/10                  49 sec          100%                77.1%
800/8                   41 sec          100%                72.5%
1600/9                  45 sec          100%                71.3%
Average                                 100%                72.1%
Confusion Matrix – Overall Accuracy of 77.1%

In \ Out   E      F     H     I     L      T
E          62.5   20    0     0     5      12.5
F          12.5   80    0     0     2.5    5
H          0      7.5   85    0     7.5    0
I          0      5     0     95    0      0
L          0      15    2.5   5     72.5   5
T          2.5    20    0     10    0      67.5
Example Study Conclusion: Recognition Accuracy
[Bar chart (0–100% scale) comparing testing accuracy: no line detectors with 10 hidden units (27.7%), line detectors with 10 hidden units (57.5%), and line detectors with 50 hidden units (72.1%)]
Example Study Conclusion: Efficiency of Training Time
The ANN with line detectors resulted in a significantly more efficient network – training time decreased by several orders of magnitude
Example Study Conclusion: Efficiency of Number of Weights

Experiment               Fixed Weights   Variable Weights   Total Weights
1  No Line Detectors     0               20,300             20,300
2  Line Detectors        6,912           2,700              9,612
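The weight counts in the table can be reproduced from figures given earlier in the deck, under some assumptions: a 20x20 = 400-pixel retina, 50 hidden units, 6 output classes, no bias weights counted, 12 weights per simple detector (each complex detector is fed by 12 simple ones), and 48 complex detectors in total (24 vertical plus, presumably, 24 horizontal, which the slides do not state explicitly).

```python
# Hedged check of the weight-count table (assumptions noted above).
no_det_variable = 400 * 50 + 50 * 6    # input->hidden + hidden->output
assert no_det_variable == 20_300

det_fixed = 576 * 12                   # 576 pre-wired simple detectors,
                                       # 12 weights each (assumed)
det_variable = 48 * 50 + 50 * 6        # 48 complex detectors (assumed)
                                       # -> hidden -> output
print(det_fixed, det_variable, det_fixed + det_variable)
# -> 6912 2700 9612, matching the table
```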
Example Study: Overall Conclusions
The strength of the study was its simplicity
The weakness was also its simplicity, and that the line detectors appear to be designed specifically for the patterns to be classified
The weaknesses can be corrected in future work
  Add edge detectors
  Extend the alphabet to the full 26 uppercase letters
  Add noise to the patterns
Non-Neural-Network Methods
Stochastic methods
Nonmetric methods
Unsupervised learning (clustering)
Stochastic Methods
Rely on randomness to find model parameters
Used for highly complex problems where gradient descent algorithms are unlikely to work
Methods
  Simulated annealing
  Boltzmann learning
  Genetic algorithms
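Simulated annealing illustrates how randomness helps where gradient descent stalls: worse moves are occasionally accepted, with a probability that shrinks as a "temperature" is cooled, letting the search escape local minima. The objective function, cooling schedule, and parameters below are all illustrative.

```python
import math, random

# Simulated-annealing sketch on a 1-D function with local minima.
random.seed(1)

def f(x):
    return x * x + 10 * math.sin(x)   # global minimum near x = -1.31

x, T = 5.0, 10.0                      # start point and temperature
for _ in range(20000):
    x_new = x + random.uniform(-0.5, 0.5)       # random neighbor
    delta = f(x_new) - f(x)
    # always accept improvements; accept worse moves with
    # probability exp(-delta / T), which vanishes as T cools
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = x_new
    T = max(T * 0.999, 1e-3)                    # cool gradually

print(round(x, 2))   # final solution (the global minimum is near -1.31)
```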
Nonmetric Methods
Nominal data
  No measure of distance between vectors
  No notion of similarity or ordering
Methods
  Decision trees
  Grammatical methods
    e.g., finite state machines
  Rule-based systems
    e.g., propositional logic or first-order logic
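A tiny decision tree shows the nonmetric idea: classification proceeds by equality tests on nominal attributes, with no distance or ordering ever computed. The attributes, values, and rules are invented purely for illustration.

```python
# Decision-tree sketch over nominal data: only equality tests,
# never distances (illustrative attributes and rules).
def classify(color, shape):
    if shape == "round":
        if color == "red":
            return "apple"
        return "orange"
    return "banana"

print(classify("red", "round"))      # -> apple
print(classify("yellow", "long"))    # -> banana
```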
Unsupervised Learning
Often called clustering
The system is not given a set of labeled patterns for training
Instead, the system itself establishes the classes based on the regularities of the patterns
Clustering Separate Clouds
Methods work fine when the clusters form well-separated, compact clouds
They work less well when there are great differences in the number of samples in different clusters
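The well-separated-clouds case can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm). The synthetic two-cloud data and farthest-point initialization are illustrative choices.

```python
import numpy as np

# k-means sketch on two well-separated compact clouds.
rng = np.random.default_rng(0)
cloud_a = rng.normal([0, 0], 0.3, (30, 2))
cloud_b = rng.normal([5, 5], 0.3, (30, 2))
X = np.vstack([cloud_a, cloud_b])

# farthest-point initialization so each cloud seeds one center
c0 = X[0]
c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
centers = np.array([c0, c1])

for _ in range(10):
    # assign each sample to its nearest center, then recenter
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centers))   # centers land near (0, 0) and (5, 5)
```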
Hierarchical Clustering
Sometimes clusters are not disjoint, but may have subclusters, which in turn have sub-subclusters, etc.
Consider partitioning n samples into clusters
  Start with n clusters, each containing exactly one sample
  Then partition into n-1 clusters, then into n-2, etc.
Dendrogram of uppercase A’s, from the DPS dissertation of Dr. Mary Manfredi
[Dendrogram with leaves labeled by writer: england1, norway, netherland1, switzerland1, switzerland3, usa1, usa2, austria2, german3, brazil1, peru1, peru2, columbia1, chile1, ecuador1, canada1]
Pattern Recognition DPS Dissertations (parentheses indicate in progress)
Visual Systems – Rick Bassett, Sheb Bishop, Tom Lombardi
Speech Recognition – Jonathan Law
Handwriting Recognition – Mary Manfredi
Natural Language Processing – Bashir Ahmed, (Ted Markowitz)
Neural Networks – (John Casarella, Robb Zucker)
Keystroke Biometric – Mary Curtin, Mary Villani
Stylometry Biometric – (John Stewart)
Fundamental Research – Kwang Lee, Carl Abrams, Robert Zack [using keystroke data]
Other – Karina Hernandez, Mark Ritzmann [using keystroke data], (John Galatti)