# ECE 517: Reinforcement Learning in Artificial Intelligence, Lecture 13: Artificial Neural Networks

TRANSCRIPT


ECE 517: Reinforcement Learning in Artificial Intelligence

Lecture 13: Artificial Neural Networks – Introduction, Feedforward Neural Networks

Dr. Itamar Arel

College of Engineering, Electrical Engineering and Computer Science Department

The University of Tennessee, Fall 2012

October 30, 2012

ECE 517 - Reinforcement Learning

Final projects - logistics

Projects can be done individually or in pairs

Students are encouraged to propose a topic

Please email me your top three choices for a project along with a preferred date for your presentation

Presentation dates: Nov. 27, 29 and Dec. 4

Format: 17 min presentation + 3 min Q&A (~7 min for background and motivation, ~10 min for description of your work and conclusions)

Written report due: Friday, Dec. 7

Format similar to project report


Final projects - topics

Tetris player using RL (and NN)

Curiosity based TD learning*

States vs. Rewards in RL

Human reinforcement learning

Reinforcement Learning of Local Shape in the Game of Go

Where do rewards come from?

Efficient Skill Learning using Abstraction Selection

AIBO playing on a PC using RL*

AIBO learning to walk within a maze*

Study of value function definitions for TD learning*


Outline

Introduction

Brain vs. Computers

The Perceptron

Multilayer Perceptrons (MLP)

Feedforward Neural Networks and Backpropagation


Pigeons as art experts (Watanabe et al. 1995)

Experiment: A pigeon was placed in a closed box and presented with paintings by two different artists (e.g. Chagall / Van Gogh). It was rewarded for pecking only when a particular artist (e.g. Van Gogh) was presented.

Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on)

Pictures by different artists

Interesting results

Discrimination was still 85% successful for previously unseen paintings by the same artists

Conclusions from the experiment: Pigeons do not simply memorise the pictures. They can extract and recognise patterns (e.g. artistic 'style'). They generalise from the already seen to make predictions.

This is what neural networks (biological and artificial) are good at, unlike conventional computers

Provided further justification for the use of ANNs

"Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination." - attributed to Albert Einstein


The “Von Neumann” architecture vs. Neural Networks

Von Neumann:
- Memory for programs and data
- CPU for math and logic
- Control unit to steer program flow
- Follows rules
- Solution can/must be formally specified
- Cannot generalize
- Not error tolerant

Neural Net:
- Learns from data
- Rules on data are not visible
- Able to generalize
- Copes well with noise


Biological Neuron

Input builds up on receptors (dendrites)

Cell has an input threshold

When the cell's threshold is breached, an activation is fired down the axon

Synapses (i.e. weights) sit at the dendrite (input) interfaces


Connectionism

Connectionist techniques (a.k.a. neural networks) are inspired by the strong interconnectedness of the human brain. Neural networks are loosely modeled after the biological processes involved in cognition:

1. Information processing involves many simple processing elements called neurons.

2. Signals are transmitted between neurons using connecting links.

3. Each link has a weight that modulates (or controls) the strength of its signal.

4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.

Links with positive weights are called excitatory links. Links with negative weights are called inhibitory links.


Some definitions

A Neural Network is an interconnected assembly of simple processing elements, units or nodes. The long-term memory of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.

Biologically inspired learning mechanism


Brain vs. Computer

Performance of the brain tends to degrade gracefully under partial damage. In contrast, most programs and engineered systems are brittle: if you remove some arbitrary parts, very likely the whole will cease to function.

The brain performs massively parallel computations extremely efficiently. For example, complex visual perception occurs within less than 100 ms, that is, about 10 processing steps!


Dimensions of Neural Networks

Various types of neurons

Various network architectures

Various learning algorithms

Various applications

We'll focus mainly on supervised learning based networks

The architecture of a neural network is linked with the learning algorithm used to train it


ANNs – The basics

ANNs incorporate the two fundamental components of biological neural nets:

Neurons – computational nodes

Synapses – weights, or memory storage devices

Neuron vs. Node

The Artificial Neuron

[Figure: the artificial neuron model. Input signals x1, x2, …, xm are scaled by synaptic weights w1, w2, …, wm and combined by a summing function, together with a bias b, into the local field v; an activation function φ(·) then maps v to the output y.]
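A minimal sketch of this neuron model in Python. The choice of a sigmoid for φ(·) is an assumption here (activation functions are discussed later in the lecture):

```python
import math

def neuron(x, w, b):
    """Artificial neuron: local field v = sum(w_j * x_j) + b,
    output y = phi(v), with phi taken to be a sigmoid (an assumption)."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b   # summing function + bias
    return 1.0 / (1.0 + math.exp(-v))              # activation phi(v)

y = neuron([1.0, 0.5], [0.4, -0.2], 0.1)  # some illustrative inputs/weights
```

The output always lies in (0, 1) because the sigmoid squashes the local field.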


Bias as an extra input

Bias is an external parameter of the neuron. It can be modeled by adding an extra (fixed-valued) input.

[Figure: the same neuron model, with the bias absorbed as an extra fixed input x0 = +1 whose weight is w0 = b.]

$$v = \sum_{j=0}^{m} w_j x_j, \qquad x_0 = +1, \; w_0 = b$$
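The two formulations of the local field agree; a quick numeric check (helper names are mine, not from the slides):

```python
def local_field_with_bias(x, w, b):
    # v = sum_{j=1}^{m} w_j x_j + b  (bias as an external parameter)
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def local_field_extra_input(x, w, b):
    # absorb the bias: x_0 = +1, w_0 = b, then v = sum_{j=0}^{m} w_j x_j
    xa, wa = [1.0] + list(x), [b] + list(w)
    return sum(wj * xj for wj, xj in zip(wa, xa))

x, w, b = [1.0, 2.0], [0.5, -1.5], 2.0
assert abs(local_field_with_bias(x, w, b) - local_field_extra_input(x, w, b)) < 1e-12
```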


Face recognition example

90% accuracy at learning head pose and at recognizing 1-of-20 faces


The XOR problem

A single-layer (linear) neural network cannot solve the XOR problem.

Input  Output
 00      0
 01      1
 10      1
 11      0

To see why this is true, we can try to express the problem as a linear equation: aX + bY = Z

a·0 + b·0 = 0
a·0 + b·1 = 1  ->  b = 1
a·1 + b·0 = 1  ->  a = 1
a·1 + b·1 = 0  ->  a = -b

The last constraint contradicts the previous two (a = 1 and b = 1, yet a = -b), so no choice of a and b satisfies all four patterns.
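The same argument can be sketched by brute force: scanning a grid of candidate coefficients (a, b) finds none that satisfy all four constraints.

```python
# All four XOR input/output patterns from the table above.
xor_cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def fits(a, b):
    """True if a*X + b*Y = Z holds for every XOR pattern."""
    return all(a * x + b * y == z for (x, y), z in xor_cases)

# Scan a grid of candidates in [-5, 5]; the derivation above shows that
# any solution would need a = 1, b = 1, and a = -b simultaneously.
candidates = [i / 10 for i in range(-50, 51)]
assert not any(fits(a, b) for a in candidates for b in candidates)
```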


The XOR problem (cont.)

But by adding a third bit the problem can be resolved.

Input  Output
 000     0
 010     1
 100     1
 111     0

(The third bit is the AND of the first two.)

Once again, we express the problem as a linear equation: aX + bY + cZ = W

a·0 + b·0 + c·0 = 0
a·0 + b·1 + c·0 = 1  ->  b = 1
a·1 + b·0 + c·0 = 1  ->  a = 1
a·1 + b·1 + c·1 = 0  ->  a + b + c = 0  ->  1 + 1 + c = 0  ->  c = -2

So the equation X + Y - 2Z = W solves the problem.
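The derived equation can be verified directly: with the extra input Z = X AND Y, the linear combination X + Y - 2Z reproduces XOR on all four patterns.

```python
# Check X + Y - 2Z = XOR(X, Y) where the added third bit is Z = X AND Y.
for x in (0, 1):
    for y in (0, 1):
        z = x & y            # the added third bit
        w = x + y - 2 * z    # the linear combination derived above
        assert w == (x ^ y)  # matches XOR on every pattern
```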


A Multilayer Network for the XOR function

[Figure: a two-layer network of threshold units implementing XOR; the thresholds of the nodes are shown in the diagram.]
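One such network can be sketched with simple threshold units. The specific weights and thresholds below are a common textbook choice (OR and AND hidden units), not necessarily the ones in the lecture figure:

```python
def step(v, theta):
    """Threshold unit: fires 1 when its input exceeds the threshold theta."""
    return 1 if v > theta else 0

def xor_mlp(x, y):
    h1 = step(x + y, 0.5)      # hidden unit computing OR(x, y)
    h2 = step(x + y, 1.5)      # hidden unit computing AND(x, y)
    return step(h1 - h2, 0.5)  # output: OR-and-not-AND, i.e. XOR

for x in (0, 1):
    for y in (0, 1):
        assert xor_mlp(x, y) == (x ^ y)
```

The hidden layer carves the input space with two lines, which is exactly what a single linear unit cannot do.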


Hidden Units

Hidden units are a layer of nodes that are situated between the input nodes and the output nodes

Hidden units allow a network to learn non-linear functions

The hidden units allow the net to represent combinations of the input features

Given too many hidden units, however, a net will simply memorize the input patterns

Given too few hidden units, the network may not be able to represent all of the necessary generalizations


Backpropagation Networks

Backpropagation networks are among the most popular and widely used neural networks because they are relatively simple and powerful

Backpropagation was one of the first general techniques developed to train multilayer networks, which do not have many of the inherent limitations of the earlier, single-layer neural nets criticized by Minsky and Papert

Backpropagation networks use a gradient descent method to minimize the total squared error of the output

A backpropagation net is a multilayer, feedforward network that is trained by backpropagating the errors using the generalized delta rule


The idea behind (error) backpropagation learning

Feedforward training of input patterns: Each input node receives a signal, which is broadcast to all of the hidden units. Each hidden unit computes its activation, which is broadcast to all of the output nodes.

Backpropagation of errors: Each output node compares its activation with the desired output. Based on this difference, the error is propagated back to all previous nodes.

Adjustment of weights: The weights of all links are computed simultaneously based on the errors that were propagated backwards.

[Figure: a Multilayer Perceptron (MLP)]


Activation functionsActivation functions

• Transforms neuron’s input into outputTransforms neuron’s input into output• Features of activation functions:Features of activation functions:

• A squashing effect is requiredA squashing effect is requiredPrevents accelerating growth of activation levels Prevents accelerating growth of activation levels through the networkthrough the network

• Simple and easy to calculateSimple and easy to calculate
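Two common squashing activations illustrate these features (the slide does not name specific functions, so these are examples):

```python
import math

def sigmoid(v):
    """Squashes any real v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def tanh(v):
    """Squashes any real v into (-1, 1)."""
    return math.tanh(v)

# However large the local field grows, the output stays bounded.
for v in (-100.0, -1.0, 0.0, 1.0, 100.0):
    assert 0.0 <= sigmoid(v) <= 1.0
    assert -1.0 <= tanh(v) <= 1.0
```

Both are also cheap to evaluate and smooth, which matters for the gradient-based training described next.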


Backpropagation Learning

We want to train a multi-layer feedforward network by gradient descent to approximate an unknown function, based on some training data consisting of pairs (x, d)

Vector x represents a pattern of input to the network, and the vector d the corresponding target (desired output)

BP is a gradient-descent based scheme. The overall gradient with respect to the entire training set is just the sum of the gradients for each pattern. We will therefore describe how to compute the gradient for just a single training pattern. We will number the units, and denote the weight from unit j to unit i by w_ij
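The forward pass and generalized delta rule can be sketched end to end on the XOR task. This is a minimal illustration: sigmoid activations, a 2-3-1 network, and the learning rate are all assumed choices, not values from the lecture.

```python
import math, random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# 2-3-1 network; each unit's bias is absorbed as an extra +1 input (x0 = +1, w0 = b).
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # hidden layer
W2 = [random.uniform(-1, 1) for _ in range(4)]                      # output unit

data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
eta = 0.5  # learning rate (illustrative)

def forward(x):
    xa = [1.0] + x                                                  # prepend bias input
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xa))) for row in W1]
    ha = [1.0] + h
    y = sigmoid(sum(w * hi for w, hi in zip(W2, ha)))
    return xa, ha, y

def sse():
    """Total squared error over the training set."""
    return sum((d - forward(x)[2]) ** 2 for x, d in data)

before = sse()
for _ in range(5000):
    for x, d in data:
        xa, ha, y = forward(x)
        delta_out = (d - y) * y * (1.0 - y)                  # output-layer delta
        delta_hid = [ha[i + 1] * (1.0 - ha[i + 1]) * delta_out * W2[i + 1]
                     for i in range(3)]                      # back-propagated deltas
        for i in range(4):                                   # update output weights
            W2[i] += eta * delta_out * ha[i]
        for i in range(3):                                   # update hidden weights
            for j in range(3):
                W1[i][j] += eta * delta_hid[i] * xa[j]

assert sse() < before  # gradient descent has reduced the total squared error
```

Note that the hidden deltas are computed from the output delta and the old weights W2 before any weights are updated, matching the "simultaneous" update on the previous slide.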


BP – Forward Pass at Layer 1

BP – Forward Pass at Layer 2

BP – Forward Pass at Layer 3

The last layer produces the network's output

We can now derive an error (the difference between the output and the target)


BP – Back-propagation of error – output layer

We have an error with respect to the target (z). This error signal will be propagated back towards the input layer (layer 1).

Each neuron will forward error information to the neurons feeding it from the previous layer

BP – Back-propagation of error towards the hidden layer

BP – Back-propagation of error towards the input layer

BP – Illustration of Weight Update