# A Tutorial on Training Recurrent Neural Networks


A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach

Herbert Jaeger, Fraunhofer Institute for Autonomous Intelligent Systems (AIS); since 2003: International University Bremen

First published: Oct. 2002. First revision: Feb. 2004. Second revision: March 2005.

Abstract: This tutorial is a worked-out version of a 5-hour course originally held at AIS in September/October 2002. It has two distinct components. First, it contains a mathematically-oriented crash course on traditional training methods for recurrent neural networks, covering back-propagation through time (BPTT), real-time recurrent learning (RTRL), and extended Kalman filtering approaches (EKF). This material is covered in Sections 2-5. The remaining Sections 1 and 6-9 are much more gentle, more detailed, and illustrated with simple examples. They are intended to be useful as a stand-alone tutorial for the echo state network (ESN) approach to recurrent neural network training. The author apologizes for the poor layout of this document: it was transformed from an html file into a Word file...

This manuscript was first printed in October 2002 as: H. Jaeger (2002): Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. GMD Report 159, German National Research Center for Information Technology, 2002 (48 pp.)

Revision history:

- 01/04/2004: several serious typos/errors in Sections 3 and 5
- 03/05/2004: numerous typos
- 21/03/2005: errors in Section 8.1, updated some URLs


Index

1. Recurrent neural networks
   1.1 First impression
   1.2 Supervised training: basic scheme
   1.3 Formal description of RNNs
   1.4 Example: a little timer network
2. Standard training techniques for RNNs
   2.1 Backpropagation revisited
   2.2 Backpropagation through time
3. Real-time recurrent learning
4. Higher-order gradient descent techniques
5. Extended Kalman-filtering approaches
   5.1 The extended Kalman filter
   5.2 Applying EKF to RNN weight estimation
6. Echo state networks
   6.1 Training echo state networks
      6.1.1 First example: a sinewave generator
      6.1.2 Second example: a tuneable sinewave generator
   6.2 Training echo state networks: mathematics of echo states
   6.3 Training echo state networks: algorithm
   6.4 Why echo states?
   6.5 Liquid state machines
7. Short term memory in ESNs
   7.1 First example: training an ESN as a delay line
   7.2 Theoretical insights
8. ESNs with leaky integrator neurons
   8.1 The neuron model
   8.2 Example: slow sinewave generator
9. Tricks of the trade
References


## 1. Recurrent neural networks

### 1.1 First impression

There are two major types of neural networks, feedforward and recurrent. In feedforward networks, activation is "piped" through the network from input units to output units (from left to right in the left drawing in Fig. 1.1):


Figure 1.1: Typical structure of a feedforward network (left) and a recurrent network (right).

Short characterization of feedforward networks:

- typically, activation is fed forward from input to output through "hidden layers" ("Multi-Layer Perceptrons", MLPs), though many other architectures exist
- mathematically, they implement static input-output mappings (functions)
- basic theoretical result: MLPs can approximate arbitrary (the term needs some qualification) nonlinear maps with arbitrary precision ("universal approximation property")
- most popular supervised training algorithm: the backpropagation algorithm
- huge literature; 95% of neural network publications concern feedforward nets (my estimate)
- have proven useful in many practical applications as approximators of nonlinear functions and as pattern classifiers
- are not the topic considered in this tutorial
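The "static input-output mapping" point can be made concrete in a few lines: an MLP's output is a pure function of the current input, with no memory of past inputs. The sketch below (a minimal numpy illustration; the layer sizes and tanh nonlinearity are arbitrary choices for the example, not anything prescribed by the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(10, 3))   # input-to-hidden weights (sizes chosen arbitrarily)
W2 = rng.normal(size=(1, 10))   # hidden-to-output weights

def mlp(u):
    """One-hidden-layer MLP: the output is a static function of the
    current input u alone; the network keeps no internal state."""
    return W2 @ np.tanh(W1 @ u)

u = rng.normal(size=3)
y = mlp(u)  # presenting the same input again yields the same output
```

Because there is no state, the order in which inputs are presented is irrelevant, which is exactly why feedforward nets implement functions rather than dynamical systems.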

By contrast, a recurrent neural network (RNN) has at least one cyclic path of synaptic connections. Basic characteristics:

- all biological neural networks are recurrent
- mathematically, RNNs implement dynamical systems
- basic theoretical result: RNNs can approximate arbitrary (the term needs some qualification) dynamical systems with arbitrary precision ("universal approximation property")
- several types of training algorithms are known, with no clear winner
- theoretical and practical difficulties have by and large prevented practical applications so far


- not covered in most neuroinformatics textbooks, absent from engineering textbooks
- this tutorial is all about them
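The contrast with the static MLP can be sketched in the same style: the cyclic connections give the network an internal state, so its response to an input depends on the input history. The update rule, weight sizes, and scaling below are assumptions chosen for illustration (the tutorial's formal RNN description follows in Section 1.3):

```python
import numpy as np

rng = np.random.default_rng(1)
W = 0.1 * rng.normal(size=(10, 10))  # recurrent (cyclic) connections; scaling chosen arbitrarily
W_in = rng.normal(size=(10, 3))      # input connections

def rnn_step(x, u):
    """One state update: the cyclic weights W feed the previous state x
    back in, so the network implements a dynamical system."""
    return np.tanh(W @ x + W_in @ u)

x0 = np.zeros(10)
u = rng.normal(size=3)
x1 = rnn_step(x0, u)  # response to u from the zero state
x2 = rnn_step(x1, u)  # same input, different state, hence a different response
```

Presenting the identical input twice yields two different states, which is precisely the dynamical-system behavior that feedforward networks lack.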

Because biological neuronal systems are recurrent, RNN models abound in the biological and biocybernetics literature. Standard types of research papers include:

- bottom-up, detailed neurosimulation:
  - compartment models of small (even single-unit) systems
  - complex biological network models (e.g. Freeman's olfactory bulb models)
- top-down, investigation of principles:
  - complete mathematical study of few-unit networks (at AIS: Pasemann, Giannakopoulos)
  - universal properties of dynamical systems as "explanations" for cognitive neurodynamics, e.g. "concept ~ attractor state", "learning ~ parameter change", "jumps in learning and development ~ bifurcations"
  - demonstration of dynamical working principles
  - synaptic learning dynamics and conditioning
  - synfire chains

This tutorial does not enter this vast area. The tutorial is about algorithmical RNNs, intended as blackbox models for engineering and signal processing. The general picture is given in Fig. 1.2:

Figure 1.2: Principal moves in the blackbox modeling game. (Diagram: a physical system is observed, yielding empirical time series data; an RNN model of the system is fit ("learn", "estimate", "identify") such that model-generated data has a distribution similar to the empirical data.)


Types of tasks for which RNNs can, in principle, be used:

- system identification and inverse system identification
- filtering and prediction
- pattern classification
- stochastic sequence modeling
- associative memory
- data compression

Some relevant application areas:

- telecommunication
- control of chemical plants
- control of engines and generators
- fault monitoring, biomedical diagnostics and monitoring
- speech recognition
- robotics, toys and edutainment
- video data analysis
- man-machine interfaces

State of usage in applications: RNNs are only occasionally proposed in technical articles as "in principle promising" solutions for difficult tasks, and demo prototypes exist for simulated or clean laboratory tasks, but RNNs are not yet economically relevant. Why? Because supervised training of RNNs is (or was) extremely difficult. This difficulty is the topic of this tutorial.

### 1.2 Supervised training: basic scheme

There are two basic classes of "learning": supervised and unsupervised (and unclear cases, e.g. reinforcement learning). This tutorial considers only supervised training. In supervised training of RNNs, one starts with teacher data (or training data): empirically observed or artificially constructed input-output time series, which represent examples of the desired model behavior.

Figure 1.3: Supervised training scheme. (Panel A of the figure: training.)
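As a minimal illustration of such teacher data, one can construct an input-output time series pair and quantify how well a model output matches the desired output. The sinewave input, squared-output target, and mean squared error criterion below are all assumptions chosen for this sketch, not the tutorial's own training task:

```python
import numpy as np

# Artificially constructed teacher data: input u(n) and desired output d(n).
n = np.arange(200)
u_teach = np.sin(n / 10.0)             # example input time series
d_teach = 0.5 * np.sin(n / 10.0) ** 2  # example desired output time series

def mse(y_model, d):
    """Mismatch between model-generated output and teacher output,
    measured as mean squared error (a common choice of criterion)."""
    return float(np.mean((y_model - d) ** 2))

perfect = mse(d_teach, d_teach)        # model reproducing the teacher exactly
offset = mse(d_teach + 0.1, d_teach)   # model off by a constant 0.1
```

Training, in whatever algorithmic form the later sections develop, amounts to adjusting the network weights so that such a mismatch measure on the teacher data becomes small.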
