neural networks

Neural Networks (TEC-833)
B.Tech (EC – VIII Sem) – Spring 2012

[email protected]


Mid Term Syllabus

• Introduction:
– Brain and Machine, Biological neurons and its mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.
• Supervised learning – I:
– Pattern space and weight space, Linearly and non-linearly separable classes, decision boundary, Hebbian learning and limitation, Perceptron, Perceptron convergence theorem, Logic Functions implementations
• LMS Algorithm:
– LMS Algorithm
• Supervised Learning – II:
– Multilayer Perceptrons, XOR problem

End Sem Syllabus

• Introduction: Brain and Machine, Biological neurons and its mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.

• Supervised learning – I: Pattern space and weight space, Linearly and non-linearly separable classes, decision boundary, Hebbian learning and limitation, Perceptron, Perceptron convergence theorem, Logic Functions implementations

• LMS Algorithm: Wiener-Hopf equations, Steepest Descent search method, LMS Algorithm, Convergence consideration in mean and mean square, Adaline, Learning curve, Learning rate annealing schedules

• Supervised Learning – II: Multilayer Perceptrons, Backpropagation algorithm, XOR problem, Training modes, Optimum learning, Local minima, Network pruning techniques

• Unsupervised learning: Clustering, Hamming networks, Maxnet, Simple competitive learning, Winner-take-all networks, Learning Vector Quantizers, Counterpropagation Networks, Self Organizing Maps (Kohonen Networks), Adaptive Resonance Theory

• Associative Models: Hopfield Networks (Discrete and Continuous), Storage Capacity, Energy function and minimization, Brain-state-in-a-box neural network

• Applications of ANN and MATLAB Simulation: Character Recognition, Control Applications, Data Compression, Self Organizing Semantic Maps

References

• Neural Networks: A Comprehensive Foundation – Simon Haykin (Pearson Education)
• Neural Networks: A Classroom Approach – Satish Kumar (Tata McGraw Hill)
• Fundamentals of Neural Networks – Laurene Fausett (Pearson Education)
• ftp://ftp.sas.com/pub/neural/FAQ.html
• MATLAB neural network toolbox and related help notes


Inputs to Neural Networks

• Biology
• Graph Theory
• Algorithms
• Artificial Intelligence
• Control Systems
• Signal Theory

Minsky’s challenge (adapted from Minsky, Singh and Sloman (2004))

[Figure: reasoning methods arranged on a grid of Number of Causes (Few to Many) against Number of Effects (Few to Many). Labels in the figure: Easy; Linear, Statistical; Connectionist, Neural Network, Fuzzy Logic; Ordinary Qualitative Reasoning; Classical AI; Analogy Based Reasoning; Symbolic Logical Reasoning; Case based Reasoning; Intractable.]

Who uses Neural Networks

Area | Use
Computer Scientists | To understand properties of non-symbolic information processing; Learning systems
Engineers | In many areas including signal processing and automatic control
Statisticians | As flexible, non-linear regression and classification models
Physicists | To model phenomena in statistical mechanics and other tasks
Cognitive Scientists | To describe models of thinking and consciousness and other high level brain functions
Neuro-physiologists | To describe and explore memory, sensory functions, motor functions and other mid-level brain functions
Biologists | To interpret nucleotide sequences
Philosophers etc. | For their own reasons

Brain vs the computer
(http://scienceblogs.com/developingintelligence/2007/03/why_the_brain_is_not_like_a_co.php)

Brain | Computer
Brains are analogue (neuronal firing rate, asynchronous, leakiness) | Computers are digital
Brain uses content-addressable memory | Computers use byte addressable memory
Brain is a massively parallel machine | Computers are modular and serial
Processing speed is not fixed in the brain; there is no system clock | Processing speed is fixed; there is a system clock
Short-term memory only holds pointers to long term memory | RAM has isomorphic data
No hardware/software distinction can be made with respect to the brain or mind | Computers have a clear distinction between hardware and software
Synapses are far more complex than electrical logic gates | Electrical gates are simpler in function and mechanism
Processing and memory are performed by the same components in the brain | Processing and memory are performed by different components in the computer
The brain is a self-organizing system | Computers are usually not self organizing
Brains have bodies and use them | Computers do not usually use their bodies
The brain capacity is much larger than any computer | Computer capacities though large are still not comparable with those of the brain

Neuro products and application areas

• Academia Research
• Automotive Industry
• Bio Informatics
• Cancer Detection
• Computer Gaming
• Credit Ratings
• Drug Interaction Prediction
• Electrical Load Balancing
• Financial Forecasting
• Fraud Detection
• Human Resources
• Image Recognition
• Industrial Plant Modeling
• Machine Control
• Machine Diagnostics
• Market Segmentation
• Medical Diagnosis
• Meteorological Research
• Optical Character Recognition
• Pattern Recognition
• Predicting Business Expenses
• Real Estate Evaluations
• Robotics
• Sales Forecasting
• Search Engines
• Software Security
• Speech Recognition
• Sports Betting
• Sports Handicap Predictions

Applications of ANN (Sample Examples)

• Non linear statistical data modeling tools
• Function Approximation / Mapping
• Pattern recognition in data
• Noise Cancellation (LMS) in signaling systems
• Time Series Predictions
• Control and Steering of Autonomous Vehicles (Feedforward)
• Protein structure prediction and RNA splice junction identification
• Sonar/radar/image/astronomy/handwriting target recognition/classification
• Call admission control for improving QOS in telecommunications (ATM) networks
• Software engineering project management
• Reinforcement Learning in Robotics (Backpropagation)
• Pattern Completion (Hopfield)
• Object recognition (Hopfield)
• Clustering and Character Recognition (ART)
• Neural Information Retrieval System (Machine parts retrieval at Boeing) (ART)
• Neural Phonetic Typewriter (SOM)
• Control of Robot Arms (SOM)
• Vector Quantization (SOM)
• Radar based classification (SOM)
• Brain Modeling (SOM)
• Feature mapping of language data (SOM)
• Organization of massive document collection (SOM)

Neuroscience basics I

• 100 B (10**11) neurons in brain
• Each neuron has 10K (10**4) synapses on average
• Thus 10**15 connections
• A lifetime of 80 years is 2.5B seconds.
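The figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the year length of 365.25 days is an assumption not stated on the slide, and the variable names are illustrative.

```python
# Order-of-magnitude check of the slide's numbers.
neurons = 10**11                 # about 100 billion neurons
synapses_per_neuron = 10**4      # about 10 thousand synapses each, on average
connections = neurons * synapses_per_neuron

# Assumed year length: 365.25 days (not given on the slide).
seconds_per_year = 365.25 * 24 * 3600
lifetime_seconds = 80 * seconds_per_year

print(connections)               # total number of connections
print(lifetime_seconds)          # seconds in an 80-year lifetime
```

The product gives the 10**15 connections quoted above, and the lifetime comes out close to 2.5 billion seconds.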

Structural organization of levels in brain

Molecules

Synapses

Neural microcircuits

Dendritic Trees

Neurons

Local Circuits (Maps/Networks)

Interregional Circuits (Systems)

Central Nervous System

Structural organization of levels in brain (Churchland)

Neuroscience Basics II

• Brain structures
– Cerebrum
  • Frontal Lobe
  • Temporal Lobe
  • Parietal Lobe
  • Occipital Lobe
  • Central Sulcus
  • Sylvian fissure
– Cerebellum
– Brain Stem
• Corpus callosum
• Thalamus
• Hypothalamus
• Midbrain
• Pons
• Medulla

Brain Anatomy

Brain Areas

Homunculus

Nervous System

Sympathetic and Parasympathetic nerves

Neuron - I

Neuron - II

Sensory and Motor pathways

Wiring the Brain

Synapses (http://www.biologymad.com/nervoussystem/synapses.htm)

Neurotransmitters

• Neurotransmitters are endogenous chemicals which transmit signals from a neuron to a target cell across a synapse.

Excitatory | Inhibitory
Glutamate (memory storage) | Gamma Amino Butyric Acid (GABA) (brain)
Acetylcholine (neuro muscular junction) | Glycine (spinal cord)
Dopamine (brain reward system) | Norepinephrine
Serotonin (regulation of appetite, sleep, memory, learning, mood, behaviour) | Substance P

Action Potential

Types of Neural Networks

Based on Learning Algorithms: Supervised and Unsupervised

Associativity in Supervised Learning: Auto Associative and Hetero Associative

Based on Network Topology: Feed forward and feedback / recurrent

Based on kind of data accepted: Categorical variables, Quantitative variables

Based on transfer function used: Linear, Non-linear

Based on number of layers: Single Layer, Multilayer

ANN Architecture Taxonomy

ANN

Supervised

Feed forward

Linear Hebbian, Perceptron, Adaline, Higher Order, Functional Link

MLP (Multilayer Perceptron) Back Propagation, Cascade Correlation, Quick Prop, RPROP

RBF Networks Orthogonal Least Squares

CMAC Cerebellar Model Articulation Controller

Classification Only LVQ (Learning Vector Quantization), PNN (Probabilistic Neural Network)

Regression Only GRNN (General Regression Neural Network)

Feedback

BAM (Bidirectional Associative Memory)

Boltzmann Machine

Recurrent Time Series

Back Propagation through time, Elman, FIR, Jordan, Real time recurrent network, Recurrent Back propagation, TDNN (Time Delay Neural Nets)

Competitive ARTMAP, Fuzzy ARTMAP, Gaussian ARTMAP, Counter propagation, Neocognitron

Unsupervised

Competitive

Vector Quantization Grossberg, Kohonen, Conscience

Self Organizing Map Kohonen, GTM, Local Linear

Adaptive Resonance Theory ART1, ART2, ART2A, ART3, Fuzzy ART

DCL (Differential Competitive Learning)

Dimension Reduction Hebbian, Oja, Sanger, Differential Hebbian

Auto Association Linear Auto Associator, BSB (Brain State in a Box), Hopfield

Non learning Hopfield, various networks for optimization

Learning Rules

• Error correction learning
• Memory based learning
• Hebbian learning
• Competitive learning
• Boltzmann learning

Error Correction Learning

• Error signal: $e_k(n) = d_k(n) - y_k(n)$
• Control mechanism to apply a series of corrective adjustments
• Index of performance, or instantaneous value of error energy: $E(n) = \frac{1}{2} e_k^2(n)$
• Delta rule or Widrow-Hoff rule
– Thus $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
– And $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
• Using the unit delay operator: $w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$
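The delta rule above can be sketched for a single linear neuron as follows. This is a minimal illustration, not a reference implementation: the function name `delta_rule_step`, the training pattern, and the learning rate 0.1 are all assumptions chosen for the example.

```python
def delta_rule_step(w, x, d, eta):
    """One error-correction update for a single linear neuron.

    w: current weights, x: input vector, d: desired response,
    eta: learning rate. Returns the updated weights and the error.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))               # actual response y(n)
    e = d - y                                               # error signal e(n) = d(n) - y(n)
    w_next = [wi + eta * e * xi for wi, xi in zip(w, x)]    # w(n+1) = w(n) + eta*e(n)*x(n)
    return w_next, e

# Hypothetical single training pattern, presented repeatedly.
w = [0.0, 0.0]
x = [1.0, 2.0]
d = 1.0
errors = []
for n in range(20):
    w, e = delta_rule_step(w, x, d, eta=0.1)
    errors.append(abs(e))

print(errors[0], errors[-1])
```

For this particular pattern and learning rate the error magnitude shrinks by a constant factor at every step, which is the behaviour the corrective adjustments are meant to produce. Too large a learning rate would make the same loop diverge.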

Euclidean Distance

• Ordinary distance between two points that can be measured with a ruler.

• In multi dimensional case it is the distance between two vectors.

Memory based learning

• Binary pattern classification:
– with input-output pairs $\{(x_i, d_i)\}_{i=1}^{N}$
• Nearest Neighbor Rule
– $x_{N'} \in \{x_1, x_2, \ldots, x_N\}$ is the nearest neighbor of $x_{test}$
– if $\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})$
– where $d(x_i, x_{test})$ is the Euclidean distance between the vectors $x_i$ and $x_{test}$
• Cover and Hart (1967): nearest neighbor rule for pattern classification. Assumptions are:
– The classified examples $(x_i, d_i)$ are independently and identically distributed according to the joint probability distribution of the example
– The sample size N is infinitely large
Then the probability of classification error is bounded by twice the Bayes probability of error, i.e. twice the minimum probability of error over all decision rules.
• Radial basis function network for curve fitting (approximation problem in higher dimensional space)
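The nearest neighbor rule can be sketched in a few lines. The stored examples, the test points, and the function name `nearest_neighbor_classify` are hypothetical; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math

def nearest_neighbor_classify(examples, x_test):
    """Return the label d of the stored example whose x is closest to x_test."""
    nearest_x, nearest_d = min(
        examples, key=lambda pair: math.dist(pair[0], x_test)
    )
    return nearest_d

# Hypothetical stored (x_i, d_i) pairs for a two-class problem.
examples = [
    ((0.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((3.0, 3.0), 1),
    ((4.0, 3.0), 1),
]

label_a = nearest_neighbor_classify(examples, (0.5, 0.2))
label_b = nearest_neighbor_classify(examples, (3.2, 2.9))
print(label_a, label_b)
```

No training step is involved: all the examples are simply kept in memory, and the work is done at classification time.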

Hebbian Learning

• Repeated or persistent firing changes synaptic weight due to increased efficiency
• Associative learning at cellular level
– Time dependent mechanism
– Local mechanism
– Interactive mechanism
– Conjunctional or correlational mechanism
– Here $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$
– Hebb’s hypothesis: $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$
– Covariance hypothesis: $\Delta w_{kj}(n) = \eta\, (y_k(n) - y_{av})(x_j(n) - x_{av})$
• Synaptic modifications can be Hebbian, Anti-Hebbian, or non-Hebbian.
• Evidence for Hebbian learning in the Hippocampus, which plays an important role in learning and memory
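The two update rules can be compared with a small sketch. The activity values, the averages of 0.5, and the function names are assumptions made for illustration only.

```python
def hebb_delta(eta, y, x):
    """Hebb's hypothesis: weight change proportional to the product y * x."""
    return eta * y * x

def covariance_delta(eta, y, x, y_av, x_av):
    """Covariance hypothesis: activities are measured relative to their averages."""
    return eta * (y - y_av) * (x - x_av)

eta = 0.1

# Plain Hebbian rule: repeated co-activation keeps increasing the weight.
w_hebb = 0.0
for _ in range(5):
    w_hebb += hebb_delta(eta, y=1.0, x=1.0)

# Covariance rule with assumed average activities of 0.5.
dw_same_side = covariance_delta(eta, y=0.2, x=0.2, y_av=0.5, x_av=0.5)
dw_opposite = covariance_delta(eta, y=0.2, x=0.8, y_av=0.5, x_av=0.5)

print(w_hebb, dw_same_side, dw_opposite)
```

With non-negative activities the plain Hebbian rule can only strengthen a synapse, whereas the covariance rule weakens it when pre- and postsynaptic activities lie on opposite sides of their averages.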

Competitive Learning

• The O/P neurons compete among themselves to become active
• Elements of competitive learning rule (Rumelhart and Zipser (1985))
– Sets of neurons are same except randomly distributed synaptic weights
– Limit on strength of each neuron
– Winner takes all mechanism
• Use as feature detectors
• Has feed forward (excitatory) connections
• Has lateral (inhibitory) connections
• Here $\Delta w_{kj}(n) = \eta (x_j - w_{kj})$ if neuron k wins, and $\Delta w_{kj}(n) = 0$ if neuron k loses
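A single winner-take-all update can be sketched as below. The winner is taken here to be the neuron with the largest weighted sum for the input, which is one common convention and an assumption of this sketch; the weights, input, and learning rate are illustrative.

```python
def competitive_step(W, x, eta):
    """One competitive learning step.

    W: list of weight vectors, one per output neuron.
    The winner (largest weighted sum) moves its weights toward x;
    all other neurons are left unchanged. Returns the winner's index.
    """
    activations = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
    winner = max(range(len(W)), key=lambda k: activations[k])
    W[winner] = [wj + eta * (xj - wj) for wj, xj in zip(W[winner], x)]
    return winner

# Two output neurons, two inputs (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0]]
x = [0.9, 0.1]
winner = competitive_step(W, x, eta=0.5)
print(winner, W)
```

After the step the winning weight vector lies halfway between its old position and the input, which is how the neurons come to act as feature detectors for clusters of inputs.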

Boltzmann Learning

• Stochastic model of a neuron
– x = +1 with probability P(v)
– x = -1 with probability 1 - P(v)
– $P(v) = 1/(1 + \exp(-v/T))$
– T is a pseudo temperature used to control uncertainty in firing (noise level)
• Stochastic learning algorithm derived from statistical mechanics
• Neurons in recurrent structure
• Operate in binary manner
• Energy function
– Here $E = -\frac{1}{2} \sum_j \sum_k w_{kj} x_k x_j$, with $j \neq k$
• Flip a random neuron from state $x_k$ to state $-x_k$ at some temperature T with probability
– $P(x_k \to -x_k) = 1/(1 + \exp(-\Delta E_k / T))$
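The stochastic neuron model and the energy function can be sketched as follows. This covers only the firing probability and the energy of a given state, not the full Boltzmann learning procedure; the two-neuron weight matrix and the function names are assumptions for illustration.

```python
import math
import random

def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v / T)) for induced local field v and pseudo temperature T."""
    return 1.0 / (1.0 + math.exp(-v / T))

def sample_state(v, T, rng):
    """Return +1 with probability P(v), otherwise -1."""
    return 1 if rng.random() < firing_probability(v, T) else -1

def energy(W, x):
    """E = -1/2 * sum over j != k of w_kj * x_k * x_j."""
    n = len(x)
    return -0.5 * sum(
        W[k][j] * x[k] * x[j] for k in range(n) for j in range(n) if j != k
    )

# Low temperature: firing is nearly deterministic. High temperature: nearly random.
p_cold = firing_probability(2.0, 0.1)
p_hot = firing_probability(2.0, 100.0)

# Hypothetical symmetric two-neuron network with a positive coupling.
W = [[0.0, 1.0], [1.0, 0.0]]
e_aligned = energy(W, [1, 1])
e_opposed = energy(W, [1, -1])

rng = random.Random(0)
state = sample_state(2.0, 1.0, rng)
print(p_cold, p_hot, e_aligned, e_opposed, state)
```

With a positive coupling the aligned state has the lower energy, so it is the state that energy minimization favours.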

Credit Assignment Problem in Distributed Systems

• Assignment of credit or blame for overall outcome to internal decisions

• Credit assignment problem has two parts:– Temporal Credit Assignment Problem– Structural Credit Assignment Problem

• Credit Assignment problem becomes more complex in multilayer feed forward neural nets.

Supervised Learning

• Knowledge is represented by a series of input-output examples
• Environment provides training vector to both teacher and Neural Network
• Teacher or Trainer provides Desired response
• Neural Network provides Actual response
• Error Signal = Desired response – Actual response
• Adjustment is carried out iteratively to make the neural network emulate the teacher.
• The mean square error function can be visualized as a multidimensional error-performance surface with the free parameters as coordinates.
• Identification of local or global minimum is done using steepest gradient descent method.
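Steepest descent on an error-performance surface can be sketched with a simple quadratic bowl. The surface E(w1, w2) = (w1 - 1)^2 + 2(w2 + 2)^2, its minimum at (1, -2), the step size, and the function names are all hypothetical choices for this example; a real error surface comes from the training data.

```python
def error_surface(w):
    """Hypothetical error surface with a single (global) minimum at (1, -2)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2

def gradient(w):
    """Gradient of the surface above with respect to the free parameters."""
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]

def steepest_descent(w, eta, steps):
    """Repeatedly move the parameters a small step against the gradient."""
    for _ in range(steps):
        g = gradient(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

w_final = steepest_descent([0.0, 0.0], eta=0.1, steps=200)
print(w_final, error_surface(w_final))
```

On a surface with several minima the same procedure stops at whichever minimum lies downhill from the starting point, which is why the slide distinguishes local from global minima.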

Reinforcement learning / Neuro-dynamic Programming (Learning with a Critic)

• Critic converts a primary reinforcement signal from environment to a heuristic reinforcement signal
• System learns under delayed reinforcement after observation of temporal sequences
• Goal is to minimize the cumulative cost of actions over a sequence of steps
• Problems:
– No teacher to provide desired response
– Learning machine must solve temporal credit assignment problem
• Reinforcement learning is related to Dynamic Programming

Unsupervised Learning (Self Organized Learning)

• No external teacher or critic
• Provision for task independent measure of quality of learning
• Free parameters are optimized with respect to that measure
• Network becomes tuned to statistical regularities in data
• It develops ability to form internal representations for encoding features of input and create new classes automatically
• Competitive Learning rule is used for Unsupervised learning
• Two layers: input layer and competitive layer

Learning Applications

• Pattern Association
• Pattern Recognition
• Function Approximation
• Control
• Filtering
• Beam forming

Pattern Association

• Cognition uses association in distributed memory:
– $x_k \to y_k$; key pattern -> memorized pattern
– Two phases:
  • storage phase (training)
  • recall phase (noisy or distorted version of key pattern presented)
  • $y = y_j$ for $x = x_j$ (perfect recall)
  • $y \neq y_j$ for $x = x_j$ (error)
• Two types:
– Auto associative memory:
  • Output set of patterns is the same as input set: $y_k = x_k$
  • Used for pattern retrieval
  • Input and output spaces have same dimensionality
  • Uses unsupervised learning
– Hetero associative memory:
  • Output set of patterns is different from input set: $y_k \neq x_k$
  • Used in other pattern association tasks
  • Input and output spaces may or may not have same dimensionality
  • Uses supervised learning

Pattern Recognition

• Process whereby a received pattern is assigned to one of a prescribed number of classes (categories)
• Two stages:
– Training Session
– New patterns
• Patterns can be considered as points in multidimensional decision space (MDS)
• MDS is divided into regions, each associated with a class
• Decision boundaries are determined by the training process
• Boundary definition is by a statistical mechanism due to variability between classes
• Machine has two parts:
– Feature Extraction (Unsupervised network)
– Classification (Supervised network)
– m-dimensional observation (data) space -> q-dimensional feature space -> r-dimensional decision space
• Approaches:
– A single multilayer feed forward network using a supervised learning algorithm
– Feature extraction is done in the hidden layer

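Function Approximation

• I/O mapping: $d = f(x)$
• Function f(.) is unknown
• Set of labeled examples is available
– $T = \{(x_i, d_i)\}_{i=1}^{N}$
• Goal: $\|F(x) - f(x)\| < \epsilon$ for all x, where F(.) is the mapping realized by the network and ε is a small positive number
• Used in
– System model identification
– Inverse system model identification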

Control

• Reference signal is compared with feedback signal
• Error signal e is fed to neural network controller (NNC)
• O/P of NNC, u, is fed to plant as input
• Plant output is y (part of which is sent as feedback)
• Jacobian $J = \{\partial y_k / \partial u_j\}$ (partial derivatives of plant outputs with respect to plant inputs)
• Two approaches:
– Indirect Learning
– Direct Learning

Filtering

• To extract information from noisy data
• Filter used for:
– Filtering (for getting current data based on past data)
– Smoothing (for getting data at a given time using data measured both before and after that time)
– Prediction (for forecasting future data based on current and past data)
• In filtering
– Cocktail party problem
– Blind signal separation
– Here $x(n) = A\,u(n)$, where A = mixing matrix
– Need a demixing matrix W to recover the original signal u(n)
• In prediction
– Error correction learning
– x(n) provides the desired response and is used for training
– A form of model building, where network acts as model
– When prediction is non-linear, NNs are a powerful method because non-linear processing units can be used for its construction
– However if dynamic range of the time series is unknown, linear output unit is the most reasonable choice
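The roles of the mixing matrix A and the demixing matrix W in the relation x(n) = A u(n) from the filtering bullets can be illustrated with a 2x2 example. This sketch assumes A is known so that W can simply be its inverse; in true blind signal separation A is unknown and W has to be learned from x(n) alone. The matrix values and helper names are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.5], [0.3, 1.0]]     # assumed mixing matrix
u = [2.0, -1.0]                  # original source samples u(n)
x = mat_vec(A, u)                # observed mixture x(n) = A u(n)

W = inverse_2x2(A)               # demixing matrix (here known, not learned)
u_recovered = mat_vec(W, x)
print(x, u_recovered)
```

The point of the neural approach is to arrive at a usable W without ever being given A.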

Beam forming

• Spatial form of filtering
• To provide attentional selectivity in the presence of noise
• Used in radar and sonar systems
• Detect and track a target of interest in the presence of receiver noise and interfering signals (e.g. from jammers)
• Task is complicated by:
– Target signal can be from an unknown direction
– No prior information about interfering signals
• Generalized Side Lobe Canceller (GSLC) consisting of:
– Array of antenna elements: which samples the observed signals
– A linear combiner: acts as a spatial filter and provides the desired response (i.e. for main lobe)
– A signal blocking matrix: to cancel leakage from side lobes
– A neural network: to accommodate variations in interfering signals
• Neural network adjusts its free parameters and acts as an attentional neurocomputer.

Associative Memory

• Memory is relatively enduring neural alterations induced by the interaction of an organism with its environment.
• Activity must be stored in memory through a learning process
• Memory may be short term or long term
• Associative memory
– Distributed
– Stimulus (key) pattern and response (stored) pattern vectors
– Information is stored in memory by setting up a spatial pattern of neural activities across a large number of neurons
– Information in stimulus also contains storage location and address for retrieval
– High degree of resistance to noise and damage of a diffusive kind
– May be interactions between different patterns stored in memory and thus errors in recall process

Memory and noise

• For a linear network $y_k = W(k)\, x_k$
• Total experience gained $M = \sum_{k=1}^{q} W(k)$
• Memory matrix $M_k = M_{k-1} + W(k)$; $k = 1, \ldots, q$
• Estimate of memory matrix $\hat{M} = \sum_{k=1}^{q} y_k x_k^T$
• Correlation matrix memory $\hat{M} = Y X^T$
• X = key matrix; Y = memorized matrix
• Recall: $y = \hat{M} x_j$
• $y = y_j + v_j$; $v_j$ = noise vector due to cross talk between key vector $x_j$ and all other key vectors stored in memory
• For a linear signal space, the cosine of the angle between vectors $x_j$ and $x_k$ is $\cos(x_k, x_j) = x_k^T x_j / (\|x_k\| \, \|x_j\|)$
• Noise vector $v_j = \sum_{k=1, k \neq j}^{q} \cos(x_k, x_j)\, y_k$ (for key vectors normalized to unit length)
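Storage and recall in a correlation matrix memory can be sketched with plain lists. The key and memorized vectors below are hypothetical, and the keys are chosen with unit length so that the cross-talk term reduces to the cosine-weighted sum given above.

```python
def outer(y, x):
    """Outer product y x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def store(pairs):
    """Build the memory matrix as the sum of outer products y_k x_k^T."""
    keys, values = zip(*pairs)
    M = [[0.0] * len(keys[0]) for _ in values[0]]
    for x, y in pairs:
        M = mat_add(M, outer(y, x))
    return M

y1, y2 = [1.0, 2.0], [3.0, 4.0]

# Orthonormal keys: recall is perfect.
M_ortho = store([([1.0, 0.0, 0.0], y1), ([0.0, 1.0, 0.0], y2)])
recall_ortho = mat_vec(M_ortho, [1.0, 0.0, 0.0])

# Unit-length keys with cosine 0.6 between them: recall is y1 plus cross talk 0.6 * y2.
M_mixed = store([([1.0, 0.0, 0.0], y1), ([0.6, 0.8, 0.0], y2)])
recall_mixed = mat_vec(M_mixed, [1.0, 0.0, 0.0])

print(recall_ortho, recall_mixed)
```

The second recall shows the noise vector directly: the stored response for the other key leaks in, scaled by the cosine of the angle between the two keys.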

Orthogonality, Community and Errors

• The memory associates perfectly (noise vector is zero) when the key vectors form an orthonormal set, i.e. $x_k^T x_j = 1$ when $k = j$ and $0$ when $k \neq j$
• If key patterns are not orthogonal or highly separated, it leads to confusion and errors
• Community of set of patterns $\{x_{key}\}$ can be such that $x_k^T x_j \geq \gamma$ for $k \neq j$
• If the lower bound $\gamma$ is large enough, the memory may fail to distinguish the response y from that of any other key pattern contained in the set $\{x_{key}\}$

Adaptation

• Spatiotemporal nature of learning
• Temporal structure of experience from insects to humans, thus animal can adapt its behavior
• In time-stationary environment,
– supervised learning possible
– synaptic weights can be frozen after learning
– learning system relies on memory
• In non-time-stationary environments
– supervised learning inadequate
– network needs a way to track the statistical variations in environment with time
– desirable for neural network to continually adapt its free parameters to respond in real time
– this requires continuous learning
– Linear adaptive filters perform continuous learning
• Used in radar, sonar, communications, seismology, biomedical signal processing
• In a mature state of development
• Nonlinear adaptive filters: development not yet mature.

Pseudo stationary process

• Neural network requires stable time for computation
• How can it adapt to signals varying in time?
• Many non stationary processes change slowly enough for the process to be considered pseudo stationary over a window of short enough duration.
– Speech signal: 10 – 30 ms
– Radar returns from ocean surface: few seconds
– Long range weather forecasting: few minutes
– Long range stock market trends: few days
• Retrain network at regular intervals, dynamic approach
– Select a window short enough for data to be considered pseudo stationary
– Use the sampled data to train the network
– Keep data samples in a FIFO, add new sample and drop oldest data sample
– Use updated data window to retrain and repeat
• Network undergoes continual training with time ordered examples
• Non linear filter: a generalization of linear adaptive filters
• Resources available must be fast enough to complete the computation in one sampling period.
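The FIFO retraining loop can be sketched with `collections.deque`, whose `maxlen` argument drops the oldest sample automatically. The "model" here is just the window mean, a stand-in for whatever network would actually be retrained; the stream values, window size, and function names are assumptions.

```python
from collections import deque

def retrain(window):
    """Placeholder for network training: the 'model' is simply the window mean."""
    return sum(window) / len(window)

def track(stream, window_size):
    """Slide a fixed-size FIFO window over the stream and retrain on each update."""
    window = deque(maxlen=window_size)   # the oldest sample is dropped when full
    estimates = []
    for sample in stream:
        window.append(sample)
        if len(window) == window_size:
            estimates.append(retrain(window))
    return estimates

# Hypothetical signal whose level changes partway through.
stream = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
estimates = track(stream, window_size=4)
print(estimates)
```

The estimate follows the change in the signal with a lag of one window length, which is the trade-off in choosing the window: short enough to be pseudo stationary, long enough to train on.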

Rosenblatt’s perceptron

• Type: feed forward
• Neuron layers: 1 I/P, 1 O/P
• Input value types: binary
• Activation function: Hard Limiter
• Learning method: Supervised
• Learning Algorithm: Perceptron learning rule (error correction)
• Used in: Simple logic operations; pattern classification

Perceptron weight updates

Perceptron

Perceptron Convergence Theorem

• 1: Initialization: set w(0) = 0
• 2: Activation: at time step n, activate the perceptron by applying continuous valued input vector x(n) and desired response d(n)
• 3: Computation of Actual Response: Compute the actual response of the perceptron
– $y(n) = \mathrm{sgn}(w^T(n)\, x(n))$
• 4: Adaptation of weight vector: Update the weight vector of the perceptron:
– $w(n+1) = w(n) + \eta\, [d(n) - y(n)]\, x(n)$
– where d(n) = +1 if x(n) belongs to class C1, and d(n) = -1 if x(n) belongs to class C2
• 5: Continuation: Increment time step n by one and go back to step 2
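The steps above can be sketched on the logical AND function, which is linearly separable. Several details are assumptions of this sketch rather than part of the slide: the bias is handled as a constant first input of +1, sgn(0) is taken as +1, the learning rate is 0.5, and training stops after the first pass with no errors.

```python
def sgn(v):
    """Hard limiter; sgn(0) is taken as +1 here (an assumption)."""
    return 1 if v >= 0 else -1

def predict(w, x):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

def train_perceptron(samples, eta, max_epochs):
    """Perceptron learning: w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n)."""
    w = [0.0] * len(samples[0][0])           # step 1: w(0) = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:                 # step 2: present x(n), d(n)
            y = predict(w, x)                # step 3: actual response
            if y != d:
                mistakes += 1
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]  # step 4
        if mistakes == 0:                    # no corrections in a full pass
            break
    return w

# AND with bipolar targets; the first component of each x is the bias input.
samples = [
    ([1.0, 0.0, 0.0], -1),
    ([1.0, 0.0, 1.0], -1),
    ([1.0, 1.0, 0.0], -1),
    ([1.0, 1.0, 1.0], 1),
]
w = train_perceptron(samples, eta=0.5, max_epochs=100)
predictions = [predict(w, x) for x, _ in samples]
print(w, predictions)
```

Because the two classes are linearly separable, the loop stops after a finite number of corrections. Running the same code on XOR targets would use up all the epochs without reaching an error-free pass.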

LMS Rule

• Also known as:
– Delta rule
– Adaline rule
– Widrow-Hoff rule

Neural Network Hardware

• Hardware runs orders of magnitude faster than software
• Two approaches:
– General, but probably expensive, system that can be reprogrammed for many kinds of tasks
  • e.g. Adaptive Solutions CNAPS
– Specialized but cheap chip to do one thing very quickly and efficiently.
  • e.g. IBM ZISC
• Number of neurons varies from 10 to 10**6
• Precision is mostly limited to 16 bit fixed point for weights and 8 bit fixed point for outputs
• Recurrent NNs may require output of >16 bits
• Performance is measured in
– number of multiply and accumulate operations in unit time (MCPS: millions of connections per second)
– rate of weight updates (MCUPS: millions of connection updates per second)

NN Hardware categories

• Neurocomputers
– Standard chips
  • Sequential + Accelerator
  • Multiprocessor
– Neuro chips
  • Analog
  • Digital
  • Hybrid

Hardware Implementation (Accelerator Boards)

• Accelerator boards
– Most frequently used commercial neural hardware
  • Relatively cheap
  • Widely available
  • Simple to connect to PCs, workstations
  • Have user friendly software tools
  • However usually specialized for certain tasks and may lack flexibility
– Based on neural network chips
  • IBM ZISC036: 36 neurons; RBF network; RCE (or ROI) algorithm
  • PCI card: 19 chips, 684 prototypes
  • Can process 165,000 patterns per second, where patterns are 64-element vectors of 8-bit values
  • SAIC Sigma-1
  • Neuro Turbo
  • HNC
– Some use just fast DSPs

Hardware Implementation (General Purpose Processors)

• Neuro computers built from general purpose Processors
– BSP400
– COKOS
– RAP (Ring Array Processor)
  • Used for development of connectionist algorithms for speech recognition
  • 4 to 40 TMS320C20 DSPs
  • Connected via ring of Xilinx FPGAs
  • VME bus to connect to host computer
  • 57 MCPS in feed forward mode
  • 13.2 MCPS in back propagation training
