Chapter 5: Recurrent Networks and Temporal Feedforward Networks

National Yunlin University of Science and Technology, Graduate School of Computer Science and Information Engineering
Chuan-Yu Chang, Ph.D.
Office: ES 709; TEL: 05-5342601 ext. 4337; E-mail: [email protected]


Page 1: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Chapter 5Recurrent Networks and Temporal Feedforward Networks

National Yunlin University of Science and Technology, Graduate School of Computer Science and Information Engineering
Chuan-Yu Chang, Ph.D.
Office: ES 709; TEL: 05-5342601 ext. 4337; E-mail: [email protected]

Page 2: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Chuan-Yu Chang Ph.D. 2

Overview of Recurrent Neural Networks

A network that has closed loops in its topological structure is considered a recurrent network.

Feedforward networks:
Implement a fixed-weight mapping from input space to output space.
The state of any neuron is determined solely by the input to the unit, not by the initial and past states of the neuron.

Recurrent neural networks:
Utilize feedback, so the initial and past states of the neurons take part in the (serial) processing.
Fault-tolerant.
These networks can be fully connected.
The connection weights in a recurrent neural network can be symmetric or asymmetric.
In the symmetric case (w_ij = w_ji), the network always converges to a stable point; however, these networks cannot accommodate temporal sequences of patterns.
In the asymmetric case (w_ij ≠ w_ji), the dynamics of the network can exhibit limit cycles and chaos, and with the proper selection of weights, temporal spatial patterns can be generated and stored in the network.

Page 3: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory

Hopfield (1982):
Physical systems consisting of a large number of simple neurons can exhibit collective emergent properties.
A collective property of a system cannot emerge from a single neuron, but it can emerge from local neuron interactions in the system.
Such a system produces a content-addressable memory that can correctly yield an entire memory from partial information.

Page 4: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The standard discrete-time Hopfield neural network:
A kind of recurrent network.
Can be viewed as a nonlinear associative memory, or content-addressable memory, that performs a dynamic mapping function.
Intended to perform the functions of data storage and retrieval.
The network stores information in a dynamically stable environment.
A stored pattern is retrieved in response to an input pattern that is a noisy (or incomplete) version of the stored pattern.

Page 5: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Content-addressable memory (CAM):
An attractor is a state toward which the system evolves in time, starting from a set of initial conditions (the basin of attraction).
If an attractor is a unique point in the state space, it is called a fixed point.
A prototype state ξ_h is represented by a fixed point x_h of the dynamic system; thus, ξ_h is mapped onto the stable point x_h of the network.

Page 6: Chapter 5 Recurrent Networks and Temporal Feedforward Networks


Hopfield Associative Memory (cont.)

Page 7: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Activation function: symmetric hard limiter, so the output can only be +1 or -1.
The output of a neuron is not fed back to itself; therefore, w_ij = 0 for i = j.

Page 8: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The output of the linear combiner is written as

v_i = \mathbf{w}_i^T \mathbf{x} - \theta_i = \sum_{j=1}^{n} w_{ij} x_j - \theta_i, \quad i = 1, 2, \ldots, n \text{ (neurons)}   (5.1)

where \mathbf{x} = [x_1, x_2, \ldots, x_n]^T is the state of the network and θ_i is the external threshold. The state of each neuron is given by

x_i = \operatorname{sgn}(v_i) = \begin{cases} +1 & v_i > 0 \\ -1 & v_i < 0 \end{cases}   (5.2)

If v_i = 0, the value of x_i is defined as its previous state. The vector-matrix form of (5.1) is given by

\mathbf{v} = W\mathbf{x} - \boldsymbol{\theta}   (5.3)

Page 9: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The network weight matrix W is written as

W = \begin{bmatrix} 0 & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & 0 & w_{23} & \cdots & w_{2n} \\ w_{31} & w_{32} & 0 & \cdots & w_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & 0 \end{bmatrix}   (5.4)

Each row in (5.4) is the associated weight vector for one neuron.

The output of the network can be written in vector-matrix form as

\mathbf{x}(k+1) = \operatorname{sgn}[W\mathbf{x}(k) - \boldsymbol{\theta}]   (5.5)

or in scalar form as

x_i(k+1) = \operatorname{sgn}\!\left( \sum_{j=1}^{n} w_{ij} x_j(k) - \theta_i \right)   (5.6)

Page 10: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

There are two basic operational phases associated with the Hopfield network: the storage phase and the recall phase.

During the storage phase, the associative memory is built according to the outer-product rule for correlation matrix memories. Given the set of r prototype memories {ξ_1, ξ_2, ..., ξ_r}, the network weight matrix is computed as

W = \frac{1}{n}\sum_{h=1}^{r} \boldsymbol{\xi}_h \boldsymbol{\xi}_h^T - \frac{r}{n} I   (5.7)

(the term -(r/n)I is included to satisfy w_ij = 0 for i = j).

Recall phase: given a test input vector x', the state of the network x(k) is initialized with the values of the unknown input, i.e., x(0) = x'. Using Eq. (5.6), the elements of the state vector x(k) are updated one at a time until there is no significant change in the elements of the vector. When this condition is reached, the stable state x_e is the network output.

Page 11: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Discrete-time Hopfield network training algorithm:

Step 1 (storage phase): Given a set of prototype memories, the synaptic weights of the network are calculated according to (5.7):

w_{ij} = \begin{cases} \frac{1}{n}\sum_{h=1}^{r} \xi_{h,i}\,\xi_{h,j} & i \ne j \\ 0 & i = j \end{cases}   (5.8)

Step 2 (recall phase): Given an unknown input vector x', the Hopfield network is initialized by setting the state of the network x(k) at time k = 0 to x':

\mathbf{x}(0) = \mathbf{x}'   (5.9)

Step 3: The elements of the state of the network x(k) are updated asynchronously according to (5.6):

x_i(k+1) = \operatorname{sgn}\!\left( \sum_{j=1}^{n} w_{ij} x_j(k) \right)   (5.10)

This iterative process is continued until the elements of the state vector no longer change. When this condition is met, the network outputs the equilibrium state

\mathbf{x} = \mathbf{x}_e   (5.11)
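The storage rule (5.8) and the asynchronous recall rule (5.10) can be sketched in a few lines of Python. This is a minimal illustration; the two bipolar prototype vectors below are arbitrary examples (chosen to be orthogonal), not patterns from the text:

```python
def store(prototypes):
    """Outer-product storage rule (5.8): w_ij = (1/n) * sum_h xi_h[i]*xi_h[j], w_ii = 0."""
    n = len(prototypes[0])
    W = [[0.0] * n for _ in range(n)]
    for xi in prototypes:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += xi[i] * xi[j] / n
    return W

def recall(W, x, max_sweeps=20):
    """Asynchronous recall (5.10): update one neuron at a time until nothing changes."""
    n = len(x)
    x = list(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            v = sum(W[i][j] * x[j] for j in range(n))
            s = 1 if v > 0 else (-1 if v < 0 else x[i])  # v = 0 keeps the previous state
            if s != x[i]:
                x[i] = s
                changed = True
        if not changed:
            break
    return x

# Two hypothetical orthogonal bipolar prototypes (n = 8 neurons).
p1 = [1, -1, 1, -1, 1, -1, 1, -1]
p2 = [1, 1, 1, 1, -1, -1, -1, -1]
W = store([p1, p2])

noisy = list(p1)
noisy[0] = -noisy[0]            # flip one bit of the stored pattern
print(recall(W, noisy) == p1)   # the noisy probe settles back onto the prototype
```

With these two orthogonal patterns the corrupted probe is restored exactly, which is the content-addressable behavior described above.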

Page 12: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The major problem associated with the Hopfield network is spurious equilibrium states: stable equilibrium states that are not part of the design set of prototype memories.

Causes of spurious equilibrium states:
They can result from linear combinations of an odd number of patterns.
When a large number of prototype memories are stored, local minima can exist in the energy landscape.
Spurious attractors can result from the symmetric energy function.

Li et al. proposed a design approach, based on a system of first-order linear ordinary differential equations, that minimizes the number of spurious attractors.

Page 13: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Because the Hopfield network has symmetric weights and no neuron self-loops, an energy function (Lyapunov function) can be defined.

An energy function for the discrete-time Hopfield neural network can be written as

E = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1, j\ne i}^{n} w_{ij} x_i x_j - \sum_{i=1}^{n} x'_i x_i + \sum_{i=1}^{n} \theta_i x_i   (5.12)

where x is the state of the network, x' is an externally applied input presented to the network, and θ is the threshold vector. The change in the energy function is given by

\Delta E = -\left( \sum_{j=1, j\ne i}^{n} w_{ij} x_j + x'_i - \theta_i \right) \Delta x_i   (5.13)

The operation of the Hopfield network leads to a monotonically decreasing energy function, and changes in the state of the network continue until a local minimum of the energy landscape is reached.
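The monotone-decrease property can be checked numerically. The sketch below evaluates the energy of (5.12) with no external input and zero thresholds while running random asynchronous updates on an arbitrary symmetric zero-diagonal weight matrix (the weights here are random illustrations, not a trained memory):

```python
import random

random.seed(0)

n = 6
# Random symmetric weights with zero diagonal, as the Hopfield network requires.
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = random.uniform(-1, 1)

def energy(x):
    """E = -(1/2) * sum_i sum_{j != i} w_ij x_i x_j  (Eq. 5.12 with x' = 0, theta = 0)."""
    return -0.5 * sum(W[i][j] * x[i] * x[j]
                      for i in range(n) for j in range(n) if j != i)

x = [random.choice([-1, 1]) for _ in range(n)]
energies = [energy(x)]
for _ in range(50):                     # random asynchronous updates
    i = random.randrange(n)
    v = sum(W[i][j] * x[j] for j in range(n))
    if v != 0:
        x[i] = 1 if v > 0 else -1
    energies.append(energy(x))

# The recorded energy trace never increases.
print(all(e2 <= e1 + 1e-12 for e1, e2 in zip(energies, energies[1:])))
```

Each flip changes the energy by -v_i Δx_i per (5.13); since a neuron only flips toward the sign of v_i, every accepted change strictly lowers E, which is what the check confirms.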

Page 14: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

For no externally applied inputs, the energy function is given by

E = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1, j\ne i}^{n} w_{ij} x_i x_j   (5.14)

and the energy change is

\Delta E = -\left( \sum_{j=1, j\ne i}^{n} w_{ij} x_j \right) \Delta x_i   (5.15)

The storage capacity (for bipolar patterns) of the Hopfield network is approximately

P_s \approx 0.15\,n   (5.16)

where n is the number of neurons in the network. If most of the prototype memories are to be recalled perfectly, the maximum storage capacity of the network is given by

P_s = \frac{n}{2 \ln n}   (5.17)

If it is required that 99% of the prototype memories be recalled perfectly,

P_s = \frac{n}{4 \ln n}   (5.18)
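As a quick numeric check of the three capacity estimates, take n = 144 neurons (the network size that appears in Example 5.2 below):

```python
import math

n = 144                                # number of neurons in the network
print(round(0.15 * n))                 # Eq. (5.16): about 22 patterns
print(round(n / (2 * math.log(n))))    # Eq. (5.17): 14 patterns, most recalled perfectly
print(round(n / (4 * math.log(n))))    # Eq. (5.18): 7 patterns for 99% perfect recall
```

The estimates drop sharply as the recall requirement tightens: from roughly 22 patterns down to 7 for near-perfect recall.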

Page 15: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Example 5.1:
Consider a network of three neurons with fixed weights and thresholds of 0; there are eight possible (bipolar) input vectors.
The network weights are formed from the two stable vectors (prototype memories) [-1, 1, -1]^T and [1, -1, 1]^T according to (5.7).
All other states transfer to one of these two stable states.

W = \frac{1}{3}\left( \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \end{bmatrix} \right) - \frac{2}{3} I = \begin{bmatrix} 0 & -2/3 & 2/3 \\ -2/3 & 0 & -2/3 \\ 2/3 & -2/3 & 0 \end{bmatrix}
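The weight matrix of Example 5.1 can be reproduced and checked directly (a small verification sketch):

```python
# Build W = (1/3)(xi1 xi1^T + xi2 xi2^T) - (2/3) I for the two prototypes.
xi1 = [-1, 1, -1]
xi2 = [1, -1, 1]
n = 3
W = [[(xi1[i] * xi1[j] + xi2[i] * xi2[j]) / n if i != j else 0.0
      for j in range(n)] for i in range(n)]
print(W[0][1], W[0][2])   # off-diagonal entries are -2/3 and 2/3

def step(x):
    """One pass of x_i = sgn(sum_j w_ij x_j); v = 0 keeps the old state."""
    out = []
    for i in range(n):
        v = sum(W[i][j] * x[j] for j in range(n))
        out.append(1 if v > 0 else (-1 if v < 0 else x[i]))
    return out

# Both prototypes are fixed points of the network dynamics.
print(step(xi1) == xi1, step(xi2) == xi2)
```

Both prototype vectors pass through the update unchanged, confirming they are the stable states of the example.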

Page 16: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The vectors [-1, -1, 1], [1, -1, -1], and [1, 1, 1] all converge to [1, -1, 1].

Using (5.14), the energy function is

E = \frac{2}{3}\left( x_1 x_2 - x_1 x_3 + x_2 x_3 \right)

As Table 5.1 shows, among the eight possible inputs, the two stable states yield the minimum energy.

Page 17: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

Example 5.2:
Each character consists of a 12×12 array of bipolar values (+1 for black, -1 for white), requiring 12×12 = 144 neurons and 144×144 = 20,736 synaptic connections. The thresholds are 0.
Each character is represented as a vector

\mathbf{x}_h = \operatorname{vec}(X_h) \in \mathbb{R}^{144 \times 1}

Page 18: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Hopfield Associative Memory (cont.)

The synaptic weights are computed from the five prototype vectors according to Eq. (5.7), as shown in Fig. 5.7 (the diagonal entries are 0).
Recall is demonstrated on inputs with a 30% bit error rate.

Page 19: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem

Optimization problems:
Finding the best way to do something subject to certain constraints, where the best solution is defined by a specific criterion.
In many cases, optimization problems are described in terms of a cost function.

The Traveling-Salesperson Problem (TSP):
A salesperson must make a circuit through a certain number of cities, visiting each city only once.
The salesperson returns to the starting point at the end of the trip.
The objective is to minimize the total distance traveled.

Page 20: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

Constraints:
Weak constraints are preferences, e.g., minimum distance.
Strong constraints must be satisfied by any solution.

The Hopfield network is guaranteed to converge to a local minimum of the energy function. To use the Hopfield memory for optimization problems, we must find a way to map the problem onto the network architecture. The first step is to develop a representation of the problem's solutions that fits an architecture having a single array of PEs.

Page 21: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

Steps for solving an optimization problem with a Hopfield network:
Determine the design variables of the problem.
Determine the design constraints and the design objective.
Choose the neuron state variables that represent a solution.
Define an energy function (Lyapunov energy function) whose minimum corresponds to the network's optimal solution.
Derive the network connection weights W and the thresholds (bias inputs) from the energy function.
Iterate the network to obtain the answer.

The Lyapunov function has the general form

E = -\frac{1}{2}\sum_{X=1}^{n}\sum_{i=1}^{n}\sum_{Y=1}^{n}\sum_{j=1}^{n} W_{Xi,Yj} V_{Xi} V_{Yj} - \sum_{X=1}^{n}\sum_{i=1}^{n} I_{Xi} V_{Xi}

Page 22: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

An energy function must satisfy the following criteria:
Visit each city only once on the tour.
Visit only one city at each position (time step) on the tour.
Include all n cities.
Achieve the shortest total distance.

[Figure: a 5×5 array of neurons, with rows for cities A-E and columns for tour positions 1-5.]

Page 23: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

The energy equation is

E = \frac{A}{2}\sum_{X=1}^{n}\sum_{i=1}^{n}\sum_{j=1, j\ne i}^{n} V_{Xi} V_{Xj} + \frac{B}{2}\sum_{i=1}^{n}\sum_{X=1}^{n}\sum_{Y=1, Y\ne X}^{n} V_{Xi} V_{Yi} + \frac{C}{2}\left( \sum_{X=1}^{n}\sum_{i=1}^{n} V_{Xi} - n \right)^{2} + \frac{D}{2}\sum_{X=1}^{n}\sum_{Y=1, Y\ne X}^{n}\sum_{i=1}^{n} d_{XY}\, V_{Xi} \left( V_{Y,i+1} + V_{Y,i-1} \right)

The A term enforces visiting each city only once; the B term enforces visiting only one city at a time; the C term enforces including all N cities; the D term minimizes the total length of the tour that starts at some city and returns to it.

Page 24: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

Comparing the cost function with the Lyapunov function of the Hopfield network, the synaptic interconnection strengths and the bias inputs of the network are obtained as

W_{Xi,Yj} = -A\,\delta_{XY}(1 - \delta_{ij}) - B\,\delta_{ij}(1 - \delta_{XY}) - C - D\,d_{XY}\left( \delta_{j,i+1} + \delta_{j,i-1} \right)

I_{Xi} = C\,n

where the Kronecker delta function is defined as

\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}

Page 25: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The Traveling-Salesperson Problem (cont.)

The total input to neuron (X, i) is

net_{Xi} = \sum_{Y=1}^{n}\sum_{j=1}^{n} W_{Xi,Yj} V_{Yj} + I_{Xi}
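The interconnection expression above can be sketched as code. The city count, the coefficients A, B, C, D, and the distance matrix below are illustrative choices, not values from the text; tour positions are taken modulo n so that the tour closes:

```python
def delta(a, b):
    """Kronecker delta."""
    return 1.0 if a == b else 0.0

def tsp_weights(d, A, B, C, D):
    """W[(X,i),(Y,j)] per the expression above; positions wrap modulo n."""
    n = len(d)
    W = {}
    for X in range(n):
        for i in range(n):
            for Y in range(n):
                for j in range(n):
                    W[(X, i), (Y, j)] = (
                        -A * delta(X, Y) * (1 - delta(i, j))       # one visit per city
                        - B * delta(i, j) * (1 - delta(X, Y))      # one city per position
                        - C                                        # include all n cities
                        - D * d[X][Y] * (delta(j, (i + 1) % n)     # distance couples
                                         + delta(j, (i - 1) % n))  # adjacent positions
                    )
    return W

d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # illustrative 3-city distance matrix
W = tsp_weights(d, A=500, B=500, C=200, D=500)
units = [(X, i) for X in range(3) for i in range(3)]
# The resulting weights are symmetric, as a Hopfield network requires.
print(all(W[a, b] == W[b, a] for a in units for b in units))
```

Symmetry holds because every term of the expression is invariant under swapping (X, i) with (Y, j), which is what makes the Lyapunov analysis applicable.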

Page 26: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

A Contextual Hopfield Neural Network for Medical Image Edge Detection

Chuan-Yu Chang, Optical Engineering, Vol. 45, No. 3, pp. 037006-1~037006-9, 2006. (EI, SCI)

Page 27: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Introduction

Edge detection in medical images (such as CT and MRI) is an important step in a medical image understanding system.

Page 28: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Introduction

The proposed CHNN:
The input of the CHNN is the original two-dimensional image, and the output is an edge-based feature map.
It takes each pixel's contextual information into account.
The experimental results are more perceptual than those of the CHEFNN, and the execution time is faster than the CHEFNN's.

Chang's [2000] CHEFNN:
[Figure: the two-layer CHEFNN architecture, with Layer 1 (edge layer, outputs V_{x,i,1}) and Layer 2 (non-edge layer, outputs V_{x,i,2}).]
Advantages: takes each pixel's contextual information into account; adopts the competitive learning rule.
Disadvantages: the parameters A and B must be predetermined by trial and error; the execution time is long (26 seconds or more).

Page 29: Chapter 5 Recurrent Networks and Temporal Feedforward Networks


The Contextual Hopfield Neural Network, CHNN

The architecture of CHNN

Page 30: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

The total input to neuron (x, i) is computed as

Net_{x,i} = \sum_{y=1}^{N}\sum_{j=1}^{N} W_{x,i;y,j} V_{y,j} + I_{x,i}   (1)

The activation function in the network is defined as

V_{x,i} = \begin{cases} 1 & Net_{x,i} > 0 \\ 0 & \text{otherwise} \end{cases}   (2)

Page 31: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

Based on the update equation, the Lyapunov energy function of the two-dimensional Hopfield neural network is

E = -\frac{1}{2}\sum_{x=1}^{N}\sum_{y=1}^{N}\sum_{i=1}^{N}\sum_{j=1}^{N} W_{x,i;y,j} V_{x,i} V_{y,j} - \sum_{x=1}^{N}\sum_{i=1}^{N} I_{x,i} V_{x,i}   (3)

For the CHNN, the corresponding energy accumulates gray-level distances over each neuron's neighborhood:

E = \frac{1}{2}\sum_{x=1}^{N}\sum_{i=1}^{N}\sum_{(y,j)\in N_{p,q}(x,i)} d_{x,i;y,j}\, V_{x,i} V_{y,j}

Page 32: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

The energy function of the CHNN must satisfy the following condition: the gray levels within an area belonging to non-edge points have the minimum Euclidean distance measure,

E = \frac{1}{2}\sum_{x=1}^{N}\sum_{i=1}^{N}\sum_{(y,j)\in N_{p,q}(x,i)} d_{x,i;y,j}\, V_{x,i} V_{y,j}   (4)

where

d_{x,i;y,j} = \frac{g_{x,i} - g_{y,j}}{G_{Max}}   (5)

with g_{x,i} and g_{y,j} the gray levels of the two pixels and G_{Max} the maximum gray value.

Page 33: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

The neighborhood function is

N_{p,q}(x,i) = \{ (y, j) : y = x + m,\ j = i + l,\ -p \le m \le p,\ -q \le l \le q \}   (6)

and the Kronecker delta is

\delta(i,j) = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}   (7)

Page 34: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

The objective function for the CHNN is

E = \frac{1}{2}\sum_{x=1}^{N}\sum_{i=1}^{N}\sum_{(y,j)\in N_{p,q}(x,i)} d_{x,i;y,j}\, V_{x,i} V_{y,j}   (8)

Page 35: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN

Comparing the objective function of the CHNN in Eq. (8) with the Lyapunov function of the CHNN in Eq. (3) gives

W_{x,i;y,j} = -d_{x,i;y,j}, \quad (y,j) \in N_{p,q}(x,i)   (9)

I_{x,i} = 0   (10)

so the total input (1) becomes

Net_{x,i} = -\sum_{(y,j)\in N_{p,q}(x,i)} d_{x,i;y,j}\, V_{y,j}   (11)

Page 36: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN Algorithm

Input: the original image X and the neighborhood parameters p and q.

Output: the stabilized neuron states, representing the classified edge feature map of the original image.

Page 37: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

The CHNN Algorithm

Algorithm:
Step 1) Assign the initial neuron states as 1.
Step 2) Use Eq. (11) to calculate the total input of each neuron.
Step 3) Apply the activation rule given in Eq. (2) to obtain the new output state of each neuron.
Step 4) Repeat Steps 2 and 3 for all neurons and count the number of neurons whose state changes during the update. If any state changed, go to Step 2; otherwise, go to Step 5.
Step 5) Output the final states of the neurons, which indicate the edge detection result.
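A rough sketch of Steps 1-5 follows. It is an illustration only: the tiny image, the neighborhood size, the use of normalized absolute gray-level differences for d, and the hard-threshold activation with parameter theta are all assumptions, not the paper's exact formulation:

```python
def chnn_detect(img, p=1, q=1, g_max=255.0, theta=0.5):
    """Sketch of the CHNN algorithm: Step 1 initializes all states to 1; Steps 2-4
    recompute each neuron's input over its (2p+1) x (2q+1) neighborhood and update
    until no state changes; Step 5 returns the stabilized map."""
    N = len(img)
    V = [[1] * N for _ in range(N)]               # Step 1: initial neuron states
    while True:
        changed = 0
        for x in range(N):
            for i in range(N):
                net = 0.0                          # Step 2: total neuron input
                for y in range(max(0, x - p), min(N, x + p + 1)):
                    for j in range(max(0, i - q), min(N, i + q + 1)):
                        if (y, j) != (x, i):
                            d = abs(img[x][i] - img[y][j]) / g_max
                            net -= d * V[y][j]     # net accumulates -d * V (cf. Eq. 11)
                new_v = 1 if net <= -theta else 0  # Step 3: assumed threshold rule
                if new_v != V[x][i]:
                    V[x][i] = new_v
                    changed += 1
        if changed == 0:                           # Step 4: stop when states are stable
            return V                               # Step 5: stabilized edge feature map

# A 4x4 test image with a vertical step edge between columns 1 and 2.
img = [[0, 0, 255, 255]] * 4
edges = chnn_detect(img)
print(edges[0])   # pixels adjacent to the step are flagged: [0, 1, 1, 0]
```

On this toy image the loop stabilizes after two sweeps, keeping only the pixels on either side of the intensity step switched on.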

Page 38: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

(a) Original phantom image, (b) added noise (K=18), (c) added noise (K=20), (d) added noise (K=23), (e) added noise (K=25), (f) added noise (K=30).

Phantom images

Page 39: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noiseless phantom image: (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 40: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noisy phantom image (K=18): (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 41: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noisy phantom image (K=20): (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 42: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noisy phantom image (K=23): (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 43: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noisy phantom image (K=25): (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 44: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

Noisy phantom image (K=30): (a) Laplacian-based, (b) Marr-Hildreth, (c) wavelet-based, (d) Canny, (e) CHEFNN, (f) the proposed CHNN.

Page 45: Chapter 5 Recurrent Networks and Temporal Feedforward Networks


Experimental Results

Page 46: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

MR image of the knee joint.

Page 47: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Experimental Results

CT image of the skull.

Page 48: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Conclusion

Proposed a new contextual Hopfield neural network, the Contextual Hopfield Neural Network (CHNN), for edge detection.

The CHNN considers the contextual information of pixels.

The experimental results indicate that the CHNN can be applied to various kinds of medical image segmentation, including CT and MRI.

Page 49: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Recommended Reading

Chuan-Yu Chang and Pau-Choo Chung, "Two-layer competitive based Hopfield neural network for medical image edge detection," Optical Engineering, Vol. 39, No. 3, pp. 695-703, March 2000. (SCI)

Chuan-Yu Chang and Pau-Choo Chung, "Medical Image Segmentation Using a Contextual-Constraint Based Hopfield Neural Cube," Image and Vision Computing, Vol. 19, pp. 669-678, 2001. (SCI)

Chuan-Yu Chang, "Spatiotemporal-Hopfield Neural Cube for Diagnosing Recurrent Nasal Papilloma," Medical & Biological Engineering & Computing, Vol. 43, pp. 16-22, 2005. (EI, SCI)

Chuan-Yu Chang, "A Contextual-based Hopfield Neural Network for Medical Image Edge Detection," Optical Engineering, Vol. 45, No. 3, pp. 037006-1~037006-9, 2006. (EI, SCI)

Chuan-Yu Chang, Hung-Jen Wang and Si-Yan Lin, "Simulation Studies of Two-layer Hopfield Neural Networks for Automatic Wafer Defect Inspection," Lecture Notes in Computer Science 4031, pp. 1119-1126, 2006. (SCI)

Page 50: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing

During recall (recalling stored patterns), the Hopfield neural network may fall into a local minimum; in optimization problems, however, we want to reach the global minimum.
Because the Hopfield neural network uses gradient descent, the network can get stuck at a local minimum while converging.
We must therefore inject suitable perturbations into the convergence process to increase the probability of converging to the global minimum.
SA can be used to solve problems such as combinatorial optimization and NP-complete problems.
SA differs from conventional steepest descent in that it adds a perturbation during convergence, allowing the system to jump out of a local minimum and keep searching for the best solution (the global minimum).
The SA process consists of two phases:
Melting the system to be optimized at an effectively high temperature.
Lowering the temperature in slow stages until the system freezes.

Page 51: Chapter 5 Recurrent Networks and Temporal Feedforward Networks


Simulated Annealing (cont.)

Plot of a function of two variables with multiple minima and maxima

Page 52: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

From a statistical viewpoint, the energy function E(x) gives the thermal energy of all elements of the system in state x. The probability of each state x occurring at thermal equilibrium is

P_r(x) = \frac{1}{Z} e^{-E(x)/k_B T}   (5.24)

where k_B is the Boltzmann constant (k_B = 1.3806×10^{-23} J/K), T is the temperature, and Z is the partition function, defined as

Z = \operatorname{Tr}\, e^{-E(x)/k_B T}   (5.25)

where Tr denotes the sum over all possible configurations of the system. Substituting (5.25) into (5.24) gives the Boltzmann-Gibbs distribution

P_r(x) = \frac{e^{-E(x)/k_B T}}{\sum_{x'} e^{-E(x')/k_B T}}   (5.26)

Page 53: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

Define a set of probabilities Pr(x → x_p) for the transition from any state x into another, perturbed state x_p.
These probabilities alone do not guarantee thermal equilibrium, so a sufficient condition must be imposed on Pr(x → x_p) to guarantee the equilibrium state x_p is reached.
From a probabilistic viewpoint, on average the transition rate from x to x_p equals the transition rate from x_p to x:

P_r(x \to x_p)\, P_r(x) = P_r(x_p \to x)\, P_r(x_p)   (5.27)

Page 54: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

Rearranging (5.27) and substituting (5.26) gives

\frac{P_r(x \to x_p)}{P_r(x_p \to x)} = \frac{P_r(x_p)}{P_r(x)} = e^{-\Delta E / k_B T}   (5.28)

where the energy change is ΔE = E(x_p) - E(x).

The Metropolis algorithm uses a Monte Carlo technique with the transition probability

P_r = \begin{cases} 1 & \Delta E \le 0 \\ e^{-\Delta E / k_B T} & \Delta E > 0 \end{cases}   (5.29)

If the energy change is negative, the atom is displaced, and the displaced configuration becomes the starting point of the next step.
If the energy change is positive, whether the configuration is accepted is decided by the probability P_r = e^{-\Delta E / k_B T}.
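The Metropolis acceptance rule (5.29) is a one-liner in code (here k_B is absorbed into the temperature T; the test values are illustrative):

```python
import math
import random

def metropolis_accept(delta_e, T, rng=random.random):
    """Eq. (5.29): accept if dE <= 0; otherwise accept with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng() < math.exp(-delta_e / T)

# A downhill move is always accepted; an uphill move only sometimes.
print(metropolis_accept(-1.0, T=1.0))             # True
random.seed(1)
ups = sum(metropolis_accept(2.0, T=1.0) for _ in range(10000))
print(abs(ups / 10000 - math.exp(-2.0)) < 0.02)   # empirical rate near e^{-2} ~ 0.135
```

Over many trials the empirical acceptance rate of an uphill move of size ΔE = 2 at T = 1 matches e^{-2}, as (5.29) requires.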

Page 55: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

The random part of the algorithm is usually implemented with uniformly distributed random numbers in the range [0, 1].
A value drawn from this distribution is compared with Pr(ΔE):
If the random number < Pr(ΔE), the new configuration is kept.
If the random number ≥ Pr(ΔE), the current configuration is retained and the next step proceeds.

Page 56: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

There are four basic components associated with a simulated-annealing-based global search algorithm:
A concise description of the system configuration.
An objective or cost function.
An exploration process: a random generator of "moves," or rearrangements of the system elements in a configuration.
An annealing schedule of temperatures and defined time periods for which the system is to be evolved.

The basic idea is to go "downhill" most of the time, instead of always going downhill.

Page 57: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

The performance of the SA algorithm depends on the schedule by which the temperature parameter T is decreased. If the temperature drops too fast, convergence is premature and the search easily ends in a local minimum; if it drops too slowly, convergence is very slow.
Geman proposed a temperature schedule that guarantees finding the global minimum, but its convergence is extremely slow, so it has almost no practical value:

T(k) = \frac{T_0}{\log(1 + k)}, \quad k = 1, 2, \ldots   (5.32)

where T_0 is a sufficiently large positive number. Accelerating the SA algorithm is therefore an interesting topic, and there are many suboptimal schedules that speed up convergence, e.g., the geometric schedule

T(k) = \gamma\, T(k-1), \quad k = 1, 2, \ldots   (5.33)

where the decrementing factor γ should be close to unity, typically 0.8-0.99.

Page 58: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated-Annealing-Based Global Search Algorithm

Step 1: Initialize the vector x to a random point in the search set.

Step 2: Select an annealing schedule for the parameter T, and initialize T to a sufficiently large number.

Step 3: Compute x_p = x + Δx.

Step 4: Compute the change in the cost function, Δf = f(x_p) - f(x).

Step 5: Use (5.29), associated with the Metropolis algorithm, to decide whether x_p should become the new state of the system or the current state x should be kept:

P_r(x \to x_p) = \begin{cases} 1 & \Delta f \le 0 \\ e^{-\Delta f / T} & \Delta f > 0 \end{cases}

Page 59: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated-Annealing-Based Global Search Algorithm (cont.)

Step 6: Steps 3 through 5 are repeated until the system reaches equilibrium, which is determined when the number of accepted transitions becomes insignificant. Typically, Steps 3 through 5 are carried out a predetermined number of times.

Step 7: The temperature T is updated according to the annealing schedule specified in Step 2, and Steps 3 through 6 are repeated. The process can be stopped when the temperature T reaches zero or a predetermined small number.
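Steps 1-7 can be assembled into a short routine. The one-dimensional cost function (which has several local minima), the geometric cooling factor, and the move size below are illustrative choices, not values from the text:

```python
import math
import random

def simulated_annealing(f, x0, T0=10.0, gamma=0.95, stages=200, moves=100, step=1.0):
    """Steps 1-7: perturb the state, apply the Metropolis criterion, then cool T."""
    x, T = x0, T0                        # Steps 1-2: initial state and large T
    best = x
    for _ in range(stages):
        for _ in range(moves):           # Step 6: evolve toward equilibrium at T
            xp = x + random.uniform(-step, step)              # Step 3: perturbed state
            df = f(xp) - f(x)                                 # Step 4: change in cost
            if df <= 0 or random.random() < math.exp(-df / T):  # Step 5 (Eq. 5.29)
                x = xp
                if f(x) < f(best):
                    best = x
        T *= gamma                       # Step 7: geometric cooling, as in (5.33)
    return best

random.seed(3)
f = lambda x: 0.1 * x * x - math.cos(3.0 * x)   # several local minima; global one near 0
x_best = simulated_annealing(f, x0=3.0)
print(f(x_best) <= f(3.0))   # the returned state never costs more than the start
```

Because the best-so-far state is tracked, the result is guaranteed no worse than the start; in a typical run it also escapes the local minima and lands near the global minimum at x ≈ 0.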

Page 60: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

Example 5.3: The TSP. The optimization task is to determine the optimum sequence of cities that the salesperson is to follow on the trip.

The steps of applying SA to this problem:
Identify the state space of possible solutions: an ordered list of the cities on the sales trip; the number of different sequences is N!.
Specify the nature of the state perturbation: assume a new solution is obtained by swapping the positions of two cities in the current solution.
Specify the cost function that quantifies the fitness of a proposed solution: the total distance traveled by the salesperson.
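The three ingredients just listed (state = an ordered city list, move = swapping two cities, cost = total tour length) map directly onto the SA loop. The random city coordinates and annealing constants below are illustrative:

```python
import math
import random

def tour_length(tour, cities):
    """Total closed-tour distance: the salesperson returns to the start."""
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

random.seed(7)
cities = [(random.random(), random.random()) for _ in range(20)]  # 20 random cities
tour = list(range(20))                  # initial solution: visit cities in index order
cost = tour_length(tour, cities)
T = 1.0
for _ in range(200):                    # cooling stages
    for _ in range(100):                # perturbations per stage
        a, b = random.sample(range(20), 2)
        cand = list(tour)
        cand[a], cand[b] = cand[b], cand[a]               # swap two cities
        d = tour_length(cand, cities) - cost              # change in cost
        if d <= 0 or random.random() < math.exp(-d / T):  # Metropolis criterion
            tour, cost = cand, cost + d
    T *= 0.98
print(sorted(tour) == list(range(20)))  # the result is still a valid city permutation
```

The swap move preserves the permutation constraint by construction, so only the weak constraint (tour length) needs the energy treatment; in a typical run the final tour is far shorter than the initial index-order tour.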

Page 61: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Simulated Annealing (cont.)

Example 5.3: Twenty cities to visit are chosen at random.

[Figure: the initial solution, the final solution found by SA, and the change in cost as the network converges.]

Page 62: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine

The Boltzmann machine is a parallel constraint-satisfaction network based on simulated annealing that uses stochastic neurons.
A Boltzmann machine can learn the underlying constraints from a set of example patterns.
The network uses extensive internal feedback and stochastic neurons, so it can be viewed as a stochastic recurrent network.

Differences between the Boltzmann machine and the Hopfield network:
The Boltzmann machine allows hidden neurons; the Hopfield network does not.
The Boltzmann machine uses stochastic neurons; the Hopfield network uses McCulloch-Pitts neurons.
The Hopfield network uses unsupervised learning; the Boltzmann machine can use supervised or unsupervised learning.

Page 63: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

What the Boltzmann machine and the Hopfield network have in common:
All connection weights are symmetric.
There is no self-feedback.
Processing units have bipolar states.
The neurons are selected randomly, one at a time, for updating.

Constraints:
Strong constraints must be satisfied by any solution; the strong constraints are the rules.
Weak constraints: the Boltzmann machine is well suited to solving problems with a large number of weak constraints.

Page 64: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

Stochastic neuron: neuron q fires according to a probabilistic rule; that is, whether the neuron fires depends on its activation level.

The neuron output is

y_q = \begin{cases} +1 & \text{with probability } P_r(v_q) \\ -1 & \text{with probability } 1 - P_r(v_q) \end{cases}   (5.34)

where

P_r(v_q) = \frac{1}{1 + e^{-2 v_q / T}}   (5.35)

and v_q = \sum_{j=1, j\ne q}^{n} w_{qj} x_j as in (5.2).

If v_q = 0, then y_q = +1 or -1, each with probability 0.5.

T is a pseudo-temperature that controls the uncertainty of neuron firing; as T approaches zero, the network reduces to the Hopfield network.

Page 65: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

[Figure: probability distribution function for stochastic neuron firing; at T = 0 it reduces to the McCulloch-Pitts activation function.]

As in the Hopfield network, the Boltzmann machine has a symmetric weight structure, w_ij = w_ji, and no self-feedback, w_ij = 0 for i = j.

Page 66: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

Because the state probabilities of the system follow the Boltzmann-Gibbs distribution, the network is called a Boltzmann machine.

The neurons of a Boltzmann machine fall into two classes: n_v visible neurons and n_h hidden neurons, with (n_v + n_h)(n_v + n_h - 1) connections in total.

In unsupervised learning there is no distinction between input and output neurons; only the visible neurons communicate directly with the environment, and the environment controls their states. The hidden neurons are used to capture the constraints hidden in the environmental inputs.

The machine can be used to construct a probability distribution, clamping patterns associated with the environment onto the visible neurons with the appropriate probabilities.

The supervised mode of training may involve a probabilistic correct response pattern for each input pattern.

Page 67: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

The energy of a global network configuration is

E = -\frac{1}{2}\sum_{i}\sum_{j \ne i} w_{ij} x_i x_j + \sum_{i} \theta_i x_i   (5.36)

where x_i denotes the output state of the ith neuron and θ_i is the ith neuron threshold. The energy function can be written in vector form as

E = -\frac{1}{2}\, \mathbf{x}^T W \mathbf{x} + \boldsymbol{\theta}^T \mathbf{x}   (5.37)

The Boltzmann machine learning cycle: a positive phase and a negative phase alternate, followed by synaptic weight adjustments. The state transition probability is given by

\Pr(x_i \to -x_i) = \frac{1}{1 + e^{\Delta E_i / T}}   (5.38)

Page 68: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

Suppose neuron i changes state (x_i → -x_i); the change in the energy function is

\Delta E_i = 2 x_i \sum_{j=1, j\ne i}^{n} w_{ij} x_j = 2 x_i v_i   (5.39)

Substituting (5.39) into (5.38) gives

\Pr(x_i \to -x_i) = \frac{1}{1 + e^{2 x_i v_i / T}}   (5.40)

If the initial state of neuron i is x_i = -1, the probability that the neuron transitions to the other state (x_i = +1) is

\Pr(-1 \to +1) = \frac{1}{1 + e^{-2 v_i / T}}   (5.41)

Page 69: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Boltzmann Machine (cont.)

If the initial state of neuron i is x_i = +1, the probability that the neuron transitions to the other state (x_i = -1) is

\Pr(+1 \to -1) = \frac{1}{1 + e^{2 v_i / T}}   (5.42)

Equation (5.42) can be rewritten as

\Pr(+1 \to -1) = 1 - \frac{1}{1 + e^{-2 v_i / T}}   (5.43)

Equations (5.41) and (5.43) are exactly the probabilities of the general stochastic neuron in (5.34).
The Boltzmann machine has n = n_v + n_h neurons in total; each neuron can take two states (+1/-1), so there are 2^n states in all.
The Boltzmann machine uses simulated annealing, so energy minimization starts with a large T at the beginning of learning, and the value of T is gradually decreased.
The Boltzmann machine learning rule is presented next as a step-by-step algorithm.

Page 70: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Learning Algorithm for the Boltzmann Machine

Loop 1: The goal is to make the probability distribution of the states exhibited by the network's visible neurons match, or approach, the probability distribution of the states of the given environment.

At the outermost loop, the synaptic weights of the network are updated many times, to ensure convergence, according to

\Delta w_{ij} = \eta \left( \langle x_i x_j \rangle_{\text{clamped}} - \langle x_i x_j \rangle_{\text{free}} \right)   (5.45)

where η > 0, and

\langle x_i x_j \rangle_{\text{clamped}} = \text{the average of } x_i x_j \text{ with the visible neurons clamped to the desired patterns}, \quad i, j = 1, 2, \ldots, n;\ i \ne j   (5.46)
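The update (5.45) needs only the two correlation averages. A sketch of estimating them from recorded network states and applying the rule follows; the sample states and learning rate are illustrative, not data from the text:

```python
def mean_correlation(samples, i, j):
    """Time average <x_i x_j> over a list of recorded network states."""
    return sum(x[i] * x[j] for x in samples) / len(samples)

def boltzmann_delta_w(clamped, free, i, j, eta=0.1):
    """Eq. (5.45): dw_ij = eta * (<x_i x_j>_clamped - <x_i x_j>_free)."""
    return eta * (mean_correlation(clamped, i, j) - mean_correlation(free, i, j))

# Illustrative equilibrium samples of a 3-neuron machine.
clamped = [(1, 1, -1), (1, 1, 1), (-1, -1, 1), (1, 1, -1)]   # visible units clamped
free    = [(1, -1, 1), (-1, 1, 1), (1, 1, -1), (-1, -1, -1)] # running freely
print(boltzmann_delta_w(clamped, free, 0, 1))   # units 0 and 1 correlate when clamped
```

Here units 0 and 1 always agree in the clamped samples but not in the free-running ones, so the rule strengthens w_01, which is exactly the direction (5.45) prescribes.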

Page 71: Chapter 5 Recurrent Networks and Temporal Feedforward Networks

Chuan-Yu Chang Ph.D. 71

Learning algorithm for the Boltzmann machine (cont.) Loop 2:

For each iteration in loop 1 <xixj> must be calculated in an unclamped state, and with the visible units clamped in each desired pattern.

To operate the Boltzmann machine, the system must be in thermal equilibrium for some positive temperature T>0.

The state of the system x then fluctuates and the correlations <xix

j> are measured by taking the time average of xixj. To obtain all information that is necessary to compute the synapti

c weight update rule in (5.45), this process must be carried out once with the visible neurons clamped in each of their states for R>0, and once with the neurons unclamped.

The system must repeatedly reach thermal equilibrium before an average can be taken.


Learning algorithm for the Boltzmann machine (cont.)
Loop 3:
For each of the averages in Loop 2, thermal equilibrium must be reached using a simulated annealing temperature schedule {T(k)}: start from a sufficiently large initial temperature T(0), and then gradually decrease the temperature.
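The text does not fix a particular schedule {T(k)}; a common choice (assumed here, not taken from the source) is exponential cooling:

```python
def annealing_schedule(T0, alpha, k):
    """One common (assumed) annealing schedule: exponential cooling
    T(k) = T0 * alpha**k with 0 < alpha < 1, so the temperature starts
    at a large T(0) = T0 and decreases gradually with k."""
    return T0 * alpha ** k
```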


Learning algorithm for the Boltzmann machine (cont.)
Loop 4:
At each of the temperatures in Loop 3, many neurons must be sampled and updated according to the rule

x_i = +1 with probability Pr(v_i)
x_i = -1 with probability 1 - Pr(v_i)    (5.47)

where

Pr(v_i) = 1 / (1 + e^{-2 v_i / T})    (5.48)

and v_i is the activity level of neuron i, that is,

v_i = Σ_{j=1, j≠i}^{n} w_ij x_j    (5.49)
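One such sampling step can be sketched directly from (5.47)-(5.49); the function name and the injectable random source are illustrative assumptions:

```python
import math
import random

def update_neuron(x, w, i, T, rng=random.random):
    """One stochastic update of neuron i following (5.47)-(5.49)."""
    # (5.49): activity level from all other neurons
    v_i = sum(w[i][j] * x[j] for j in range(len(x)) if j != i)
    # (5.48): probability of the +1 state at temperature T
    p = 1.0 / (1.0 + math.exp(-2.0 * v_i / T))
    # (5.47): set the state to +1 with probability p, else -1
    x[i] = 1 if rng() < p else -1
    return x
```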


Overview of Temporal Feedforward Networks
The time delays allow the network to become a dynamic network.
The most common types of temporal networks are:
Time-delay neural network (TDNN)
Finite impulse response (FIR) network
Simple recurrent network (SRN)
Real-time recurrent neural network (RTRNN)
Pipelined recurrent neural network (PRNN)
Nonlinear autoregressive moving average (NARMA) network


Simple Recurrent Network
The simple recurrent network (SRN), also called the Elman network, is a single hidden-layer feedforward neural network.
It has feedback connections from the outputs of the hidden-layer neurons to the input of the network.
It was developed to learn time-varying patterns or temporal sequences.


Simple Recurrent Network (cont.)

The upper portion of the network contains the context units.

The function of these units is to replicate the hidden-layer output signals at the previous time step.

The purpose of the context units is to deal with input pattern dissonance.


Simple Recurrent Network (cont.)
The feedback provides a mechanism within the network to discriminate between patterns occurring at different times that are essentially identical.
The weights of the context units are fixed.
The other network weights can be adjusted in a supervised training mode by using the error backpropagation algorithm with momentum.
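One forward step of an Elman-style SRN can be sketched as follows; the function and weight-matrix names, the tanh hidden activation, and the linear output layer are illustrative assumptions, not details from the text:

```python
import math

def srn_step(x, context, W_in, W_ctx, W_out):
    """One time step of an Elman SRN (sketch).

    The hidden layer sees the current input plus the context units,
    which hold the hidden activations from the previous time step.
    """
    n_hidden = len(W_in)
    hidden = [
        math.tanh(
            sum(W_in[h][i] * x[i] for i in range(len(x)))
            + sum(W_ctx[h][c] * context[c] for c in range(len(context)))
        )
        for h in range(n_hidden)
    ]
    output = [
        sum(W_out[o][h] * hidden[h] for h in range(n_hidden))
        for o in range(len(W_out))
    ]
    return output, hidden  # hidden is fed back as the next step's context
```

Feeding the returned hidden vector back in as the next call's context is what gives the network its memory of the previous time step.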


Time-delay neural network
The TDNN uses time delays to perform temporal processing.
It is a feedforward neural network with the inputs to the network successively delayed in time.
A temporal sequence for the input is established and can be expressed as

X = { x(0), x(1), ..., x(m) }

The total number of weights required for the single neuron is (p+1)n.
This single-neuron model can be extended to a multilayer structure.
The TDNN can be trained using a modified version of the standard backpropagation algorithm.
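A single TDNN neuron at time k combines the current n-dimensional input with its p delayed copies, using the (p+1)n weights mentioned above. A minimal sketch (the names, bias term, and default activation are assumptions):

```python
import math

def tdnn_neuron(x_seq, w, b, k, f=math.tanh):
    """Output of one TDNN neuron at time k (sketch).

    x_seq[t] is the n-dimensional input at time t; w[r][i] weighs
    input i delayed by r steps, giving (p+1)*n weights in total.
    Assumes k >= p so all delayed samples exist.
    """
    p = len(w) - 1
    n = len(w[0])
    v = b + sum(w[r][i] * x_seq[k - r][i]
                for r in range(p + 1) for i in range(n))
    return f(v)
```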


Time-delay neural network (cont.)
Basic TDNN neuron with n inputs and p delays for each input.


Time-delay neural network (cont.)
Three-layered TDNN architecture for the recognition of phonemes.


Distributed Time-Lagged Feedforward Neural Networks
A DTLFNN is distributed in the sense that the element of time is distributed throughout the entire network.


Distributed Time-Lagged Feedforward Neural Networks (cont.)
The output of the linear combiner is given by

v(k) = Σ_{i=1}^{n} v_i(k) = v_1(k) + v_2(k) + ... + v_n(k)    (5.51)

where

v_i(k) = Σ_{r=0}^{p} w_i(r) x_i(k-r)
       = w_i(0) x_i(k) + w_i(1) x_i(k-1) + w_i(2) x_i(k-2) + ... + w_i(p) x_i(k-p)    (5.52)

The sum in (5.52) is referred to as a convolution sum.
In the z-domain we can write, from (5.52),

V_i(z) = w_i(0) X_i(z) + w_i(1) z^{-1} X_i(z) + w_i(2) z^{-2} X_i(z) + ... + w_i(p) z^{-p} X_i(z)    (5.53)
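The convolution sum (5.52) is simply an FIR filter applied to the input history of one synapse. A minimal sketch, assuming k >= p so the history is fully populated (names are illustrative):

```python
def fir_synapse(x_hist, w, k):
    """Convolution sum (5.52): v_i(k) = sum_{r=0}^{p} w_i(r) * x_i(k - r).

    x_hist holds the samples x_i(0), x_i(1), ... of one input;
    w holds the p+1 tap weights w_i(0), ..., w_i(p).
    """
    return sum(w[r] * x_hist[k - r] for r in range(len(w)))
```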


Distributed Time-Lagged Feedforward Neural Networks (cont.)
Or as a transfer function

H_i(z) = V_i(z) / X_i(z) = w_i(0) + w_i(1) z^{-1} + w_i(2) z^{-2} + ... + w_i(p) z^{-p}    (5.54)

or

H_i(z) = V_i(z) / X_i(z) = [ w_i(0) z^p + w_i(1) z^{p-1} + w_i(2) z^{p-2} + ... + w_i(p) ] / z^p    (5.55)

The output of the linear combiner in Fig. 5.19 for the qth neuron of the network is

v_q(k) = Σ_{i=1}^{n} s_qi(k) = s_q1(k) + s_q2(k) + ... + s_qn(k)    (5.56)


Distributed Time-Lagged Feedforward Neural Networks (cont.)
Each filtered input in Fig. 5.19, expressed in the time domain, is given by the convolution sum

s_ji(k) = Σ_{r=0}^{p} w_ji(r) x_i(k-r)    (5.57)

The output of the jth neuron in the network is given by

y_j(k) = f(v_j(k)) = f( Σ_{i=1}^{n} s_ji(k) ) = f( Σ_{i=1}^{n} Σ_{r=0}^{p} w_ji(r) x_i(k-r) )    (5.58)
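Equation (5.58) can be sketched directly: each input i is filtered by its own FIR synapse, and the neuron applies f to the summed result. The names and the tanh default are assumptions:

```python
import math

def dtlfnn_neuron(x_hist, w, k, f=math.tanh):
    """Neuron output (5.58): y_j(k) = f( sum_i sum_r w_ji(r) x_i(k-r) ).

    x_hist[i] is the sample history of input i; w[i][r] is the weight
    for input i at delay r. Assumes k >= p so all samples exist.
    """
    v = sum(
        w[i][r] * x_hist[i][k - r]
        for i in range(len(w))
        for r in range(len(w[i]))
    )
    return f(v)
```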


Distributed Time-Lagged Feedforward Neural Networks (cont.)
A DTLFNN is trained using a supervised learning algorithm, namely a temporal backpropagation algorithm.
This training algorithm is a temporal generalization of the standard backpropagation training algorithm.
The appropriate network weight vector is updated according to

w_ji^(s)(k+1) = w_ji^(s)(k) + η δ_j^(s)(k) x_i^(s-1)(k)    (5.59)


Distributed Time-Lagged Feedforward Neural Networks (cont.)
where

δ_j^(s)(k) = e_j(k) f'(v_j(k))                                         for neuron j in the output layer
δ_j^(s)(k) = f'(v_j(k)) Σ_{h=1}^{q_{s+1}} Δ_h^(s+1)(k)^T w_hj^(s+1)    for neuron j in hidden layer s    (5.60)

In (5.60), e_j(k) is the instantaneous error, and

Δ_h^(s)(k) = [ δ_h^(s)(k), δ_h^(s)(k+1), ..., δ_h^(s)(k+p) ]^T