
Page 1

Chapter 2 Fundamental Neurocomputing Concepts

National Yunlin University of Science and Technology, Graduate Institute of Computer Science and Information Engineering

Dr. Chuan-Yu Chang (張傳育)

Office: EB212
TEL: 05-5342601 ext. 4516
E-mail: [email protected]
HTTP://MIPL.yuntech.edu.tw

Page 2

Basic Models of Artificial neurons

An artificial neuron can be referred to as a processing element, node, or a threshold logic unit.

There are four basic components of a neuron:
A set of synapses with associated synaptic weights.
A summing device; each input is multiplied by its associated synaptic weight and then summed.
An activation function, which serves to limit the amplitude of the neuron output.
A threshold function, externally applied, which lowers the cumulative input to the activation function.

Page 3

Basic Models of Artificial neurons

Page 4

Basic Models of Artificial neurons

The output of the linear combiner is

$$u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x} \tag{2.2}$$

where $\mathbf{w}_q = [w_{q1}, w_{q2}, \ldots, w_{qn}]^T \in \mathbb{R}^n$. The output of the activation function is

$$y_q = f(v_q) = f(u_q - \theta_q) \tag{2.3}$$

and the output of the neuron is given by

$$y_q = f\left(\mathbf{w}_q^T\mathbf{x} - \theta_q\right) \tag{2.4}$$

Page 5

Basic Models of Artificial neurons

The threshold (or bias) is incorporated into the synaptic weight vector $\mathbf{w}_q$ for neuron q.

Page 6

Basic Models of Artificial neurons

The effective internal activation potential is written as

$$v_q = \sum_{j=0}^{n} w_{qj} x_j$$

where the bias enters through the fixed input $x_0$. The output of neuron q is written as

$$y_q = f(v_q)$$

Page 7

Basic Activation Functions

The activation function (also called the transfer function) can be linear or nonlinear.

Linear (identity) activation function:

$$y_q = f_{lin}(v_q) = v_q$$

Page 8

Basic Activation Functions

Hard limiter: a binary (threshold) function with outputs in (0, 1). The output of the binary hard limiter can be written as

$$y_q = f_{hl}(v_q) = \begin{cases} 0 & \text{if } v_q < 0 \\ 1 & \text{if } v_q \ge 0 \end{cases}$$

(Figure: hard limiter activation function.)

Page 9

Basic Activation Functions

Bipolar, symmetric hard limiter with outputs in (-1, 1). The output of the symmetric hard limiter can be written as

$$y_q = f_{shl}(v_q) = \begin{cases} -1 & \text{if } v_q < 0 \\ 0 & \text{if } v_q = 0 \\ 1 & \text{if } v_q > 0 \end{cases}$$

Sometimes referred to as the signum (or sign) function.

(Figure: symmetric hard limiter activation function.)

Page 10

Basic Activation Functions

Saturation linear function (piecewise linear function). The output of the saturation linear function is given by

$$y_q = f_{sl}(v_q) = \begin{cases} 0 & \text{if } v_q < -\tfrac{1}{2} \\ v_q + \tfrac{1}{2} & \text{if } -\tfrac{1}{2} \le v_q \le \tfrac{1}{2} \\ 1 & \text{if } v_q > \tfrac{1}{2} \end{cases}$$

(Figure: saturation linear activation function.)

Page 11

Basic Activation Functions

Symmetric saturation linear function. The output of the symmetric saturation linear function is given by

$$y_q = f_{ssl}(v_q) = \begin{cases} -1 & \text{if } v_q < -1 \\ v_q & \text{if } -1 \le v_q \le 1 \\ 1 & \text{if } v_q > 1 \end{cases}$$

(Figure: symmetric saturation linear activation function.)

Page 12

Basic Activation Functions

Sigmoid function (S-shaped function): binary sigmoid function. The output of the binary sigmoid function is given by

$$y_q = f_{bs}(v_q) = \frac{1}{1 + e^{-\alpha v_q}}$$

where α is the slope parameter of the binary sigmoid function.

(Figure: binary sigmoid function.)

Whereas the hard limiter has no derivative at the origin, the binary sigmoid is a continuous and differentiable function.

Page 13

Basic Activation Functions

Sigmoid function (S-shaped function): bipolar sigmoid function, also called the hyperbolic tangent sigmoid. The output of the bipolar sigmoid function is given by

$$y_q = f_{hts}(v_q) = \tanh(\alpha v_q) = \frac{e^{\alpha v_q} - e^{-\alpha v_q}}{e^{\alpha v_q} + e^{-\alpha v_q}} = \frac{1 - e^{-2\alpha v_q}}{1 + e^{-2\alpha v_q}}$$
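For reference, the sketch below renders the activation functions of the preceding slides in Python/NumPy. It is an illustration added here, not code from the text, and the function names are mine.

```python
import numpy as np

def linear(v):
    # Identity activation: y = v
    return v

def hard_limiter(v):
    # Binary hard limiter: 0 if v < 0, 1 if v >= 0
    return np.where(v >= 0, 1.0, 0.0)

def sym_hard_limiter(v):
    # Symmetric hard limiter (signum): -1, 0, or 1
    return np.sign(v)

def saturating_linear(v):
    # 0 below -1/2, v + 1/2 on [-1/2, 1/2], 1 above 1/2
    return np.clip(v + 0.5, 0.0, 1.0)

def sym_saturating_linear(v):
    # -1 below -1, v on [-1, 1], 1 above 1
    return np.clip(v, -1.0, 1.0)

def binary_sigmoid(v, alpha=1.0):
    # 1 / (1 + exp(-alpha * v)); alpha is the slope parameter
    return 1.0 / (1.0 + np.exp(-alpha * v))

def tanh_sigmoid(v, alpha=1.0):
    # Hyperbolic tangent sigmoid: tanh(alpha * v)
    return np.tanh(alpha * v)
```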

Page 14

Adaline and Madaline

Least-Mean-Square (LMS) Algorithm: also known as the Widrow-Hoff learning rule or the delta rule. The LMS is an adaptive algorithm that computes adjustments of the neuron synaptic weights. The algorithm is based on the method of steepest descent. It adjusts the neuron weights to minimize the mean square error between the inner product of the weight vector with the input vector and the desired output of the neuron.

Adaline (adaptive linear element): a single neuron whose synaptic weights are updated according to the LMS algorithm.

Madaline (Multiple Adaline): a network of Adalines.

Page 15

Simple adaptive linear combiner

$$v(k) = \mathbf{x}^T(k)\,\mathbf{w}(k) = \mathbf{w}^T(k)\,\mathbf{x}(k)$$

The inputs include $x_0 = 1$ with weight $w_0 = \beta$ (the bias).

Page 16

Simple adaptive linear combiner

The difference between the desired response and the network response is

$$e(k) = d(k) - v(k) = d(k) - \mathbf{w}^T(k)\mathbf{x}(k) \tag{2.22}$$

The MSE criterion can be written as

$$J(\mathbf{w}) = \frac{1}{2}E\{e^2(k)\} = \frac{1}{2}E\left\{\left[d(k) - \mathbf{w}^T(k)\mathbf{x}(k)\right]^2\right\} \tag{2.23}$$

Expanding Eq. (2.23),

$$J(\mathbf{w}) = \frac{1}{2}E\{d^2(k)\} - E\{d(k)\mathbf{x}^T(k)\}\mathbf{w} + \frac{1}{2}\mathbf{w}^T E\{\mathbf{x}(k)\mathbf{x}^T(k)\}\mathbf{w} = \frac{1}{2}E\{d^2(k)\} - \mathbf{p}^T\mathbf{w} + \frac{1}{2}\mathbf{w}^T C_x\mathbf{w} \tag{2.24, 2.25}$$

Page 17

Simple adaptive linear combiner

Cross-correlation vector between the desired response and the input patterns:

$$\mathbf{p} = E\{d(k)\mathbf{x}(k)\} \tag{2.26}$$

Covariance matrix for the input patterns:

$$C_x = E\{\mathbf{x}(k)\mathbf{x}^T(k)\} \tag{2.27}$$

The MSE surface of J(w) has a single minimum, so we solve for the weights at which the gradient is zero:

$$\nabla_{\mathbf{w}} J(\mathbf{w}) = \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -\mathbf{p} + C_x\mathbf{w} = 0$$

The optimal weight vector is therefore

$$\mathbf{w}^* = C_x^{-1}\mathbf{p}$$
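As a minimal numerical sketch of (2.26), (2.27), and the closed-form solution $\mathbf{w}^* = C_x^{-1}\mathbf{p}$, the following estimates p and C_x from synthetic samples (the setup loosely mirrors Example 2.1 later in the chapter; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 1000
X = rng.normal(size=(n, m))          # columns are the input vectors x(k)
w_true = np.array([1.0, 0.8, -1.0])
d = w_true @ X                       # desired responses d(k)

p = (X * d).mean(axis=1)             # p   = E{d(k) x(k)}
C_x = (X @ X.T) / m                  # C_x = E{x(k) x(k)^T}
w_star = np.linalg.solve(C_x, p)     # w* = C_x^{-1} p (avoids an explicit inverse)
print(w_star)                        # approximately recovers w_true
```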

Page 18

The LMS Algorithm

Two limitations of the closed-form solution above:
Computing the inverse of the covariance matrix is time-consuming.
It is unsuitable for real-time weight updating because, in most situations, the covariance matrix and cross-correlation vector are not known in advance.

To avoid these problems, Widrow and Hoff proposed the LMS algorithm: to obtain the optimal values of the synaptic weights when J(w) is minimum, search the error surface using a gradient descent method to find the minimum value. We can reach the bottom of the error surface by changing the weights in the direction of the negative gradient of the surface.

Page 19

The LMS Algorithm

(Figure: typical MSE surface of an adaptive linear combiner.)

Page 20

The LMS Algorithm

Because the gradient on the surface cannot be computed without knowledge of the input covariance matrix and the cross-correlation vector, these must be estimated during an iterative procedure. An estimate of the MSE gradient can be obtained by taking the gradient of the instantaneous error surface. The gradient of J(w) is approximated as

$$\nabla_{\mathbf{w}} J(\mathbf{w}) \approx \left.\frac{\partial}{\partial\mathbf{w}}\left[\frac{1}{2}e^2(k)\right]\right|_{\mathbf{w}=\mathbf{w}(k)} = -e(k)\mathbf{x}(k) \tag{2.28}$$

The learning rule for updating the weights using the steepest descent gradient method is

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[-\nabla_{\mathbf{w}} J(\mathbf{w})\right] = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \tag{2.29}$$

The learning rate µ specifies the magnitude of the update step for the weights in the negative gradient direction.

Page 21

The LMS Algorithm

If the value of µ is chosen to be too small, the learning algorithm will modify the weights slowly and a relatively large number of iterations will be required.

If the value of µ is set too large, the learning rule can become numerically unstable leading to the weights not converging.

Page 22

The LMS Algorithm

The scalar form of the LMS algorithm can be written from (2.22) and (2.29):

$$e(k) = d(k) - \sum_{h=1}^{n} w_h(k)x_h(k) \tag{2.30}$$

$$w_i(k+1) = w_i(k) + \mu e(k)x_i(k) \tag{2.31}$$

From (2.29) and (2.31), an upper bound must be placed on the learning rate to maintain the stability of the network (Haykin, 1996):

$$0 < \mu < \frac{2}{\lambda_{max}} \tag{2.32}$$

where $\lambda_{max}$ is the largest eigenvalue of the input covariance matrix $C_x$.

Page 23

The LMS Algorithm

For a minimally acceptable level of stability of the LMS convergence, the learning rate can be restricted to

$$0 < \mu < \frac{2}{\mathrm{trace}(C_x)} \tag{2.33}$$

Equation (2.33) is a reasonable approximation to (2.32) because

$$\mathrm{trace}(C_x) = \sum_{h=1}^{n} c_{hh} = \sum_{h=1}^{n} \lambda_h \ge \lambda_{max} \tag{2.34}$$
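Since trace(C_x) equals the expected squared norm of the input vector, the bound (2.33) can be estimated directly from data without an eigendecomposition. A hedged sketch (the helper name is mine):

```python
import numpy as np

def lms_mu_bound(X):
    """Upper bound 2 / trace(C_x) on the LMS learning rate, Eq. (2.33).

    X holds one input vector x(k) per column; trace(C_x) is estimated
    by the average of ||x(k)||^2 over the sample.
    """
    trace_cx = np.mean(np.sum(X**2, axis=0))
    return 2.0 / trace_cx
```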

Page 24

The LMS Algorithm

From (2.32) and (2.33), determining the learning rate requires at least computing the covariance matrix of the input patterns, which is difficult to do in practical applications. Even when it can be obtained, such a fixed learning rate raises problems with the accuracy of the final result. Robbins and Monro's root-finding algorithm therefore introduced a learning rate that varies with time (stochastic approximation):

$$\mu(k) = \frac{\kappa}{k} \tag{2.35}$$

where κ is a very small constant. Drawback: the learning rate decreases too quickly.

Page 25

The LMS Algorithm

Ideally, the learning rate µ should be relatively large at the start of training and then gradually decrease (schedule-type adjustment).

Darken and Moody's search-then-converge algorithm:
Search phase: µ is relatively large and almost constant.
Converge phase: µ decreases exponentially to zero.

$$\mu(k) = \frac{\mu_0}{1 + k/\tau} \tag{2.36}$$

where µ0 > 0 and τ >> 1, typically 100 ≤ τ ≤ 500. These methods of adjusting the learning rate are commonly called learning rate schedules.

Page 26

The LMS Algorithm

Adaptive normalization approach (non-schedule-type): µ is adjusted according to the input data at every time step:

$$\mu(k) = \frac{\mu_0}{\|\mathbf{x}(k)\|_2^2} \tag{2.37}$$

where µ0 is a fixed constant. Stability is guaranteed if 0 < µ0 < 2; the practical range is 0.1 ≤ µ0 ≤ 1.

Page 27

The LMS Algorithm

(Figure: comparison of two learning rate schedules, the stochastic approximation schedule of Eq. (2.35) and the search-then-converge schedule of Eq. (2.36), together with a constant µ.)

Page 28

Summary of the LMS algorithm

Step 1: Set k = 1, initialize the synaptic weight vector w(k = 1), and select values for µ0 and τ.

Step 2: Compute the learning rate parameter

$$\mu(k) = \frac{\mu_0}{1 + k/\tau}$$

Step 3: Compute the error

$$e(k) = d(k) - \sum_{h=1}^{n} w_h(k)x_h(k)$$

Step 4: Update the synaptic weights

$$w_i(k+1) = w_i(k) + \mu(k)e(k)x_i(k)$$

Step 5: If convergence is achieved, stop; else set k = k + 1 and go to step 2.
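The five steps map directly onto a short training loop. The following sketch is my illustration (the name `lms_train` and the cyclic pattern presentation are assumptions); it uses the search-then-converge schedule (2.36) and the termination test of Example 2.1 below:

```python
import numpy as np

def lms_train(X, d, mu0=0.1, tau=200, tol=1e-8, max_iter=10000, seed=0):
    """LMS with the schedule mu(k) = mu0 / (1 + k/tau).

    X: (n, m) matrix with one input vector per column; d: (m,) desired outputs.
    Stops when J = 0.5 * e(k)^2 <= tol (cf. Example 2.1).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    w = rng.normal(scale=0.1, size=n)      # Step 1: initialize the weights
    for k in range(1, max_iter + 1):
        mu = mu0 / (1.0 + k / tau)         # Step 2: learning rate (2.36)
        x = X[:, (k - 1) % m]              # present the patterns cyclically
        e = d[(k - 1) % m] - w @ x         # Step 3: error (2.30)
        w = w + mu * e * x                 # Step 4: update (2.31)
        if 0.5 * e**2 <= tol:              # Step 5: convergence test
            break
    return w
```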

Page 29

Example 2.1: Parametric system identification. The input data consist of 1000 zero-mean Gaussian random vectors with three components. The bias is set to zero. The variances of the components of x are 5, 1, and 0.5. The assumed linear model is given by b = [1, 0.8, -1]^T. To generate the target values, the 1000 input vectors are used to form a matrix X = [x_1 x_2 ... x_1000], and the desired outputs are computed according to d = b^T X.

The input covariance matrix is estimated as

$$C_x \approx \frac{1}{1000}\sum_{h=1}^{1000}\mathbf{x}_h\mathbf{x}_h^T = \frac{1}{1000}XX^T$$

from which the schedule parameters are set as $\mu_0 = 0.9/\lambda_{max}$ and $\tau = 200$. The figure shows the progress of the learning rate parameter as it is adjusted according to the search-then-converge schedule. The learning process was terminated when

$$J = \frac{1}{2}e^2(k) \le 10^{-8}$$

Page 30

Example 2.1 (cont.): Parametric system identification means estimating a parameter vector associated with a dynamic model of a system, given only input/output data from the system.

(Figure: the root mean square (RMS) value of the performance measure.)

Page 31

Adaline and Madaline

Adaline: an adaptive pattern classification network trained by the LMS algorithm.

(Figure: the Adaline, with x0(k) = 1 feeding an adjustable bias weight. The output is bipolar (+1, -1), although a (0, 1) output is possible with a different activation function. The linear error is $e(k) = d(k) - v(k)$; the quantizer error is $\tilde e(k) = d(k) - y(k)$.)

Page 32

Adaline

Linear error: the difference between the desired output and the output of the linear combiner,

$$e(k) = d(k) - v(k)$$

Quantizer error: the difference between the desired output and the output of the symmetric hard limiter,

$$\tilde e(k) = d(k) - y(k)$$

Page 33

Adaline

The Adaline training procedure:
The input vector x must be presented to the Adaline together with the corresponding desired output d.
The synaptic weights w are adjusted dynamically according to the linear LMS algorithm.
The Adaline does not use the activation function during training (the activation function is used only in the testing phase).
Once the network weights have been properly adjusted, the response of the Adaline can be tested with patterns it was not trained on.
If the Adaline's outputs agree with the test inputs with high accuracy, the network is said to have generalized.

Page 34

Adaline: Linear separability

The Adaline acts as a classifier which separates all possible input patterns into two categories. The output of the linear combiner is given as

$$v(k) = w_1(k)x_1(k) + w_2(k)x_2(k) + w_0(k)$$

Let v(k) = 0; then

$$w_1(k)x_1(k) + w_2(k)x_2(k) + w_0(k) = 0$$

or

$$x_2(k) = -\frac{w_1(k)}{w_2(k)}x_1(k) - \frac{w_0(k)}{w_2(k)}$$

Page 35

Adaline: Linear separability of the Adaline

(Figure.) The Adaline can separate only linearly separable patterns.

Page 36

Adaline: Nonlinear separation problem

(Figure.) Since the separating boundary is not a straight line, the Adaline cannot be used to accomplish this task.

Page 37

Adaline (cont.): Linear error correction rules

There are two basic linear error correction rules for dynamically adjusting the network weights; the weight change is driven by the difference between the actual network output and the desired output.

µ-LMS: the same as (2.22) and (2.29), based on minimizing the MSE surface:

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[-\nabla_{\mathbf{w}}J(\mathbf{w})\right] = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k)$$

α-LMS: a self-normalizing version of the µ-LMS learning rule:

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha e(k)\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} \tag{2.46}$$

The α-LMS algorithm follows the minimal-disturbance principle: when the weights are adjusted to accommodate a new pattern, the responses to previously learned patterns should be disturbed as little as possible.
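As a sketch, one α-LMS step of (2.46) takes a single line of NumPy (illustrative only; the function name is mine):

```python
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    # alpha-LMS (2.46): the step is normalized by ||x||^2, so the error
    # on the current pattern is reduced by the fraction alpha.
    e = d - w @ x
    return w + alpha * e * x / (x @ x)
```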

Page 38

Adaline (cont.)

Consider the change in the error for α-LMS:

$$\Delta e(k) = e(k+1) - e(k) = \left[d(k) - \mathbf{w}^T(k+1)\mathbf{x}(k)\right] - \left[d(k) - \mathbf{w}^T(k)\mathbf{x}(k)\right] = -\alpha e(k)\frac{\mathbf{x}^T(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} = -\alpha e(k) \tag{2.47}$$

From (2.47),

$$\alpha = -\frac{\Delta e(k)}{e(k)} \tag{2.48}$$

The choice of α controls stability and speed of convergence; it is typically set in the range

$$0.1 < \alpha < 1$$

α-LMS is called self-normalizing because the choice of α is independent of the magnitude of the network inputs.

Page 39

Adaline (cont.): Detailed comparison of µ-LMS and α-LMS

From (2.46),

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha e(k)\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} = \mathbf{w}(k) + \alpha\left[d(k) - \mathbf{w}^T(k)\mathbf{x}(k)\right]\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} = \mathbf{w}(k) + \alpha\left[\frac{d(k)}{\|\mathbf{x}(k)\|_2} - \mathbf{w}^T(k)\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2}\right]\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2} \tag{2.49}$$

Define the normalized desired response and normalized training vector

$$\hat d(k) \equiv \frac{d(k)}{\|\mathbf{x}(k)\|_2}, \qquad \hat{\mathbf{x}}(k) \equiv \frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2} \tag{2.50-51}$$

Eq. (2.49) can then be rewritten as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\left[\hat d(k) - \mathbf{w}^T(k)\hat{\mathbf{x}}(k)\right]\hat{\mathbf{x}}(k) \tag{2.52}$$

This has the same form as µ-LMS, so α-LMS amounts to µ-LMS applied to normalized input patterns.

Page 40

Multiple Adaline (Madaline)

A single Adaline cannot solve problems whose decision regions are not linearly separable. Multiple Adalines can be combined instead: a multiple Adaline, or Madaline.

Madaline I: a single-layer network with a single output.
Madaline II: a multilayer network with multiple outputs.

Page 41

(Figure: example of a Madaline I network consisting of three Adalines; the fixed output logic unit may be OR, AND, or MAJ.)

Page 42

Two-layer Madaline II architecture

Page 43

Simple Perceptron

The simple perceptron (single-layer perceptron) is very similar to the Adaline; it was proposed by Frank Rosenblatt (1950s).

Minsky and Papert identified a serious limitation: the perceptron cannot solve the XOR problem.

With an appropriate processing layer, the XOR problem, and parity-function problems generally, can be solved.

The simple perceptron is related to the maximum-likelihood Gaussian classifier of classical pattern classification; both can be viewed as linear classifiers.

Most perceptron training is supervised, but some variants are self-organizing.

Page 44

Simple Perceptron (Rosenblatt, 1962)

The perceptron is the basic processing element:

$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T\mathbf{x}$$

where

$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T, \qquad \mathbf{x} = [1, x_1, \ldots, x_d]^T$$

Page 45

What Does a Perceptron Do?

Regression: y = wx + w0. Classification: y = 1(wx + w0 > 0).

(Figures: the perceptron with input x and bias input x0 = +1, used as a linear fit, as a thresholded classifier, and with a sigmoid output.)

With a sigmoid output,

$$o = \mathbf{w}^T\mathbf{x}, \qquad y = \mathrm{sigmoid}(o) = \frac{1}{1 + \exp\left(-\mathbf{w}^T\mathbf{x}\right)}$$

Page 46

Simple Perceptron: K Outputs

K parallel perceptrons: x_j, j = 0, ..., d are the inputs and y_i, i = 1, ..., K are the outputs; w_ij is the weight of the connection from input x_j to output y_i.

When used for a K-class classification problem, there is a post-processing stage to choose the maximum output, or a softmax if we need the posterior probabilities.

Page 47

K Outputs47

∑=

=

k k

ii

Tii

ooy

o

expexpxw

Classification:there are K perceptrons, each of which has a weight vector wi

where wij is the weight from input xj to output yi . W is the K × (d+ 1) weight matrix of wijWhen used for classification, during testing, we

xy

xw

W=

=+= ∑=

Tii

d

jjiji wxwy 0

1

kk

ii yyC max if choose =

Activation function
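A small sketch of the K-output linear layer with softmax post-processing described above (the names are mine; the max-subtraction is a standard numerical safeguard, not part of the slides):

```python
import numpy as np

def k_perceptron_outputs(W, x):
    """W: (K, d+1) weight matrix; x: (d+1,) input with x[0] = 1 (bias).

    Returns the softmax posteriors y_i = exp(o_i) / sum_k exp(o_k)
    and the index of the chosen (maximum) class.
    """
    o = W @ x                       # o_i = w_i^T x
    o = o - o.max()                 # stabilize the exponentials
    y = np.exp(o) / np.exp(o).sum()
    return y, int(np.argmax(y))
```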

Page 48

Simple Perceptron (cont.)

Original Rosenblatt perceptron: binary inputs, no bias.

Modified perceptron: bipolar inputs and a bias term; output y ∈ {-1, 1}.

Page 49

Simple Perceptron (cont.)

The quantizer error is used to adjust the synaptic weights of the neuron. The adaptive algorithm for adjusting the neuron weights (the perceptron learning rule) is given as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\alpha}{2}\tilde e(k)\mathbf{x}(k) \tag{2.55}$$

where

$$\tilde e(k) = d(k) - \mathrm{sgn}\left[\mathbf{w}^T(k)\mathbf{x}(k)\right] = d(k) - y(k) \tag{2.56}$$

Rosenblatt normally set α to unity. The choice of the learning rate α does not affect the numerical stability of the perceptron learning rule; α can affect the speed of convergence. Compare with (2.46):

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha e(k)\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2}$$

Page 50

Simple Perceptron (cont.)

The perceptron learning rule is considered a nonlinear algorithm. It updates the weights until all the input patterns are classified correctly; the quantizer error is then zero for all training pattern inputs, and no further weight adjustments occur. The weights are not guaranteed to be optimal.

Page 51

Simple Perceptron with a Sigmoid Activation Function

The learning rule is based on the method of steepest descent and attempts to minimize an instantaneous performance function.

Page 52

Simple Perceptron with a Sigmoid Activation Function (cont.)

The learning algorithm can be derived from the MSE criterion. The instantaneous performance function to be minimized is given as

$$J_q(\mathbf{w}) = \frac{1}{2}E\left\{\tilde e_q^2(k)\right\}, \qquad \tilde e_q(k) = d_q(k) - y_q(k) \tag{2.59}$$

$$J_q(\mathbf{w}) = \frac{1}{2}\tilde e_q^2(k) = \frac{1}{2}\left[d_q(k) - y_q(k)\right]^2 = \frac{1}{2}\left[d_q^2(k) - 2d_q(k)y_q(k) + y_q^2(k)\right] \tag{2.60}$$

where

$$y_q(k) = f\left[v_q(k)\right] = f\left[\mathbf{x}^T(k)\mathbf{w}_q(k) + \theta_q\right] \tag{2.61}$$

Page 53

Simple Perceptron with a Sigmoid Activation Function (cont.)

Assume the activation function is the hyperbolic tangent sigmoid, so the neuron output can be written as

$$y_q(k) = f_{hts}\left[v_q(k)\right] = \tanh\left[\alpha v_q(k)\right] \tag{2.62}$$

Following (2.15), the derivative of the hyperbolic tangent sigmoid is

$$g\left[v_q(k)\right] = f'\left[v_q(k)\right] = \alpha\left\{1 - f^2\left[v_q(k)\right]\right\} \tag{2.63}$$

(For the binary sigmoid, $g(v_q) = \frac{df_{bs}(v_q)}{dv_q} = \frac{\alpha e^{-\alpha v_q}}{\left(1 + e^{-\alpha v_q}\right)^2} = \alpha f_{bs}(v_q)\left[1 - f_{bs}(v_q)\right]$.)

The discrete-time learning rule based on steepest descent (cf. Eq. 2.29) is

$$\mathbf{w}_q(k+1) = \mathbf{w}_q(k) - \mu\nabla_{\mathbf{w}_q}J_q(\mathbf{w}) \tag{2.64}$$

Page 54

Simple Perceptron with a Sigmoid Activation Function (cont.)

Computing the gradient in (2.64),

$$\nabla_{\mathbf{w}}J_q(\mathbf{w}) = -\left[d_q(k) - f\left[v_q(k)\right]\right]f'\left[v_q(k)\right]\mathbf{x}(k) = -\tilde e_q(k)f'\left[v_q(k)\right]\mathbf{x}(k) \tag{2.65}$$

Substituting (2.63) into (2.65),

$$\nabla_{\mathbf{w}}J_q(\mathbf{w}) = -\tilde e_q(k)\,\alpha\left\{1 - f^2\left[v_q(k)\right]\right\}\mathbf{x}(k) = -\alpha\tilde e_q(k)\left[1 - y_q^2(k)\right]\mathbf{x}(k) \tag{2.66}$$

Using the gradient in (2.66), the discrete-time learning rule for the simple perceptron can be written as

$$\mathbf{w}_q(k+1) = \mathbf{w}_q(k) + \mu\alpha\tilde e_q(k)\left[1 - y_q^2(k)\right]\mathbf{x}(k) \tag{2.67}$$

Page 55

Simple Perceptron with a Sigmoid Activation Function (cont.)

Equation (2.67) can be rewritten in scalar form as

$$w_{qj}(k+1) = w_{qj}(k) + \mu\alpha\tilde e_q(k)\left[1 - y_q^2(k)\right]x_j(k) \tag{2.68}$$

where

$$\tilde e_q(k) = d_q(k) - y_q(k) \tag{2.69}$$

$$y_q(k) = f\left[v_q(k)\right] = f\left[\sum_{j=1}^{n} x_j(k)w_{qj}(k) + \theta_q\right] \tag{2.70}$$

Equations (2.68), (2.69), and (2.70) are the standard form of the backpropagation training algorithm.
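Equations (2.68)-(2.70) condense to a few lines of code. A sketch assuming the tanh activation of (2.62), with the bias folded into the weight vector (the function name is mine):

```python
import numpy as np

def delta_rule_step(w, x, d, mu=0.25, alpha=1.0):
    # Forward pass (2.70): y = tanh(alpha * v), v = w^T x
    y = np.tanh(alpha * (w @ x))
    e = d - y                                  # error (2.69)
    # Update (2.67)/(2.68): (1 - y^2) is the tanh derivative factor
    return w + mu * alpha * e * (1.0 - y**2) * x
```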

Page 56

Example 2.2

The architecture of Figure 2.30 is applied to learn the character "E". The character image is a 5x5 array, i.e., 25 pixels (stored in column-major order). The learning rule is Eq. (2.67), with α = 1 and µ = 0.25. The desired neuron response is d = 0.5, with an error goal of 10^-8. The initial weights of the neuron were randomized. After 39 presentations of the training pattern, the actual neuron output was y = 0.50009 (see Fig. 2.32).

Page 57

Example 2.2 (cont.)

The single neuron cannot correct for a noisy input: for Fig. 2.31(b), y = 0.5204; for Fig. 2.31(c), y = 0.6805.

To compensate for noise, use a multilayer perceptron or a Hopfield associative memory.

Page 58

Feedforward Multilayer Perceptron

The multilayer perceptron is an artificial neural network structure and is a nonparametric estimator that can be used for classification and regression.

Multilayer perceptron (MLP):
The branches can only broadcast information in one direction.
Synaptic weights can be adjusted according to a defined learning rule.
An h-p-m feedforward MLP neural network.
In general there can be any number of hidden layers in the architecture; however, from a practical perspective, only one or two hidden layers are used.

Page 59

Feedforward Multilayer Perceptron (cont.)

Page 60

Feedforward Multilayer Perceptron (cont.)

The first layer has the weight matrix

$$W^{(1)} = \left[w_{ji}^{(1)}\right] \in \mathbb{R}^{h\times n}$$

The second layer has the weight matrix

$$W^{(2)} = \left[w_{rj}^{(2)}\right] \in \mathbb{R}^{p\times h}$$

The third layer has the weight matrix

$$W^{(3)} = \left[w_{sr}^{(3)}\right] \in \mathbb{R}^{m\times p}$$

Define a diagonal nonlinear operator matrix

$$\mathbf{f}^{(l)}[\cdot] \equiv \mathrm{diag}\left[f^{(l)}[\cdot], f^{(l)}[\cdot], \ldots, f^{(l)}[\cdot]\right] \tag{2.71}$$

Page 61

Feedforward Multilayer Perceptron (cont.)

The output of the first layer can be written as

$$\mathbf{x}_{out,1} = \mathbf{f}^{(1)}\left[\mathbf{v}^{(1)}\right] = \mathbf{f}^{(1)}\left[W^{(1)}\mathbf{x}\right] \tag{2.72}$$

The output of the second layer can be written as

$$\mathbf{x}_{out,2} = \mathbf{f}^{(2)}\left[\mathbf{v}^{(2)}\right] = \mathbf{f}^{(2)}\left[W^{(2)}\mathbf{x}_{out,1}\right] \tag{2.73}$$

The output of the third layer can be written as

$$\mathbf{x}_{out,3} = \mathbf{f}^{(3)}\left[\mathbf{v}^{(3)}\right] = \mathbf{f}^{(3)}\left[W^{(3)}\mathbf{x}_{out,2}\right] \tag{2.74}$$

Substituting (2.72) into (2.73), and the result into (2.74), gives the final output

$$\mathbf{y} = \mathbf{f}^{(3)}\left[W^{(3)}\mathbf{f}^{(2)}\left[W^{(2)}\mathbf{f}^{(1)}\left[W^{(1)}\mathbf{x}\right]\right]\right] \tag{2.75}$$

Here the synaptic weights are fixed; a training process must be carried out a priori to properly adjust the weights.
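The composition (2.75) is just a loop over layers. A sketch under the assumption that every $f^{(l)}$ is the hyperbolic tangent (the slides leave the nonlinearities generic, and the function name is mine):

```python
import numpy as np

def mlp_forward(weights, x):
    """Forward pass y = f3[W3 f2[W2 f1[W1 x]]] of Eq. (2.75).

    weights: list [W1, W2, W3] with shapes (h, n), (p, h), (m, p);
    tanh stands in for the generic layer nonlinearities f(l).
    """
    out = x
    for W in weights:
        out = np.tanh(W @ out)   # x_out,l = f(l)[W(l) x_out,l-1]
    return out

# Example: an h-p-m = 4-3-2 network applied to a 5-dimensional input
rng = np.random.default_rng(0)
Ws = [rng.normal(size=s) for s in [(4, 5), (3, 4), (2, 3)]]
y = mlp_forward(Ws, rng.normal(size=5))
```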

Page 62

Overview of Basic Learning Rules for a Single Neuron

Generalized LMS Learning Rule. Define a performance function (energy function) to be minimized:

$$\xi(\mathbf{w}) = \psi(e) + \frac{\alpha}{2}\|\mathbf{w}\|_2^2 \tag{2.76}$$

$$e = d - \mathbf{w}^T\mathbf{x} \tag{2.77}$$

where $\|\mathbf{w}\|_2$ is the Euclidean norm of the weight vector w, ψ(·) is any differentiable function, e is the linear error, d is the desired output, and x is the input vector.

Page 63

Generalized LMS Learning Rule (cont.)

The general LMS algorithm is obtained with the steepest descent approach.

Continuous-time learning rule (a differential equation in the weight vector):

$$\frac{d\mathbf{w}}{dt} = -\mu\nabla_{\mathbf{w}}\xi(\mathbf{w}) = \mu\left[g(e)\mathbf{x} - \alpha\mathbf{w}\right] \tag{2.78-79}$$

Discrete-time learning rule:

$$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\nabla_{\mathbf{w}}\xi(\mathbf{w}) = \mathbf{w}(k) + \mu\left[g(e(k))\mathbf{x}(k) - \alpha\mathbf{w}(k)\right] \tag{2.81}$$

where µ is the learning rate and α the leakage factor. If ψ(t) = t²/2 and ψ'(t) = g(t) = t, the continuous-time rule becomes

$$\frac{d\mathbf{w}}{dt} = \mu e\mathbf{x} - \mu\alpha\mathbf{w} = \mu e\mathbf{x} - \gamma\mathbf{w} \tag{2.83}$$

Page 64

Generalized LMS Learning Rule (cont.)

Leaky LMS algorithm (0 ≤ γ < 1):

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) - \gamma\mathbf{w}(k) = (1-\gamma)\mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \tag{2.84}$$

Standard LMS algorithm (γ = 0), the same as Eq. (2.29):

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \tag{2.85}$$

The scalar form of the standard LMS algorithm:

$$w_j(k+1) = w_j(k) + \mu e(k)x_j(k) \quad \text{for } j = 0, 1, 2, \ldots, n$$

$$e(k) = d(k) - \sum_{j=1}^{n} w_j(k)x_j(k) \tag{2.86}$$

Page 65

Generalized LMS Learning Rule (cont.)

Standard LMS has two important variants.

1. Momentum is designed to give the weight change extra impetus in the direction of the average downhill force. It is defined in terms of the difference between the current weight w(k) and the previous weight w(k-1):

$$\Delta_\alpha\mathbf{w}(k) = \alpha\left[\mathbf{w}(k) - \mathbf{w}(k-1)\right] \tag{2.87}$$

Thus (2.85) can be rewritten as the standard LMS algorithm with momentum,

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) + \alpha\left[\mathbf{w}(k) - \mathbf{w}(k-1)\right] \tag{2.88}$$

where 0 < α < 1 is the momentum parameter.
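Update (2.88) only requires keeping the previous weight vector around. A minimal sketch (names are mine):

```python
import numpy as np

def lms_momentum_step(w, w_prev, x, d, mu=0.05, alpha=0.9):
    # Standard LMS with momentum (2.88):
    # w(k+1) = w(k) + mu*e(k)*x(k) + alpha*[w(k) - w(k-1)]
    e = d - w @ x
    w_next = w + mu * e * x + alpha * (w - w_prev)
    return w_next, w                 # new weights and shifted history
```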

Page 66

Generalized LMS Learning Rule (cont.)

2. Minimal disturbance principle: the modified normalized LMS. A positive constant is added to the denominator of (2.46) to ensure that the weight update cannot grow without bound:

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\mu e(k)\mathbf{x}(k)}{\alpha + \|\mathbf{x}(k)\|_2^2} \tag{2.98}$$

where

$$\alpha \ge 0, \qquad 0 < \mu < 2, \qquad \text{typically } 0.1 < \mu < 1$$

Page 67

Example 2.3

The same as Example 2.1, but using different LMS algorithms. The same initial weight vector, initial learning rate, and termination criterion are used.

Page 68

Overview of basic learning rules for a single neuron (cont.): Standard Perceptron Learning Rule

The rule can be derived by minimizing the MSE criterion

$$\xi(\mathbf{w}) = \frac{1}{2}e^2 \tag{2.132}$$

where

$$e = d - y \tag{2.133}$$

and the neuron output is

$$y = f(\mathbf{w}^T\mathbf{x}) = f(v)$$

Using the steepest descent approach, the continuous-time learning rule is given by

$$\frac{d\mathbf{w}}{dt} = -\mu\nabla_{\mathbf{w}}\xi(\mathbf{w}) \tag{2.134}$$

Page 69

Overview of basic learning rules for a single neuron (cont.)

The gradient of (2.132) is then

$$\nabla_{\mathbf{w}}\xi(\mathbf{w}) = -\left[d - f(v)\right]\frac{df(v)}{dv}\mathbf{x} = -e\frac{df(v)}{dv}\mathbf{x} = -\ell\mathbf{x} \tag{2.135}$$

where

$$\ell = e\frac{df(v)}{dv} = ef'(v) = eg(v) \tag{2.136}$$

Page 70

Overview of basic learning rules for a single neuron (cont.)

Using (2.134), (2.135), and (2.136), the continuous-time standard perceptron learning rule for a single neuron is

$$\frac{d\mathbf{w}}{dt} = \mu\ell\mathbf{x} \tag{2.137}$$

Equation (2.137) can be rewritten in discrete-time form:

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\ell(k)\mathbf{x}(k) \tag{2.138}$$

The scalar form of (2.138) can be written as

$$w_j(k+1) = w_j(k) + \mu\ell(k)x_j(k) \tag{2.139}$$

Page 71

Overview of basic learning rules for a single neuron (cont.): Generalized Perceptron Learning Rule

Page 72: Fundamental Neurocomputing Conceptsmipl.yuntech.edu.tw/wp-content/uploads/2020/03/Chapter-2... · 2020. 3. 23. · Widrow-Hoff learning rule Delta rule The LMS is an adaptive algorithm

72

Overview of basic learning rules for a single neuron (cont.) Generalized Perceptron Learning Rule

When the energy function is not defined to be the MSE criterion, we can define a general energy function as

其中ψ(.)為可微函數。如果ψ(.)=1/2 e2 則變成standard perceptron learning rule.

其中

)()()( ydew −== ψψξ

wv

vy

ye

eww ∂

∂∂∂

∂∂

∂∂

=∇ψξ )(

)()(')( eeee δψψ

≡=∂

( ) )(vfxwfy T ==

(2.141)

(2.142)

(2.143)

(2.140)

Page 73

Overview of basic learning rules for a single neuron (cont.)

f(·) is a differentiable function, and

$$\frac{dy}{dv} = \frac{df(v)}{dv} = f'(v) \equiv g(v) \tag{2.144}$$

so (2.141) can be written as

$$\nabla_{\mathbf{w}}\xi(\mathbf{w}) = -\delta(e)g(v)\mathbf{x} \tag{2.145}$$

The continuous-time generalized perceptron learning rule is given as

$$\frac{d\mathbf{w}}{dt} = \mu\delta(e)g(v)\mathbf{x} \tag{2.146}$$

If we define the learning signal as

$$\ell \equiv \delta(e)g(v) \tag{2.147}$$

then (2.146) can be written as

$$\frac{d\mathbf{w}}{dt} = \mu\ell\mathbf{x} \tag{2.148}$$

Page 74

Overview of basic learning rules for a single neuron (cont.)

Discrete-time form:

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\ell(k)\mathbf{x}(k) \tag{2.149}$$

Discrete scalar form:

$$w_j(k+1) = w_j(k) + \mu\ell(k)x_j(k) \tag{2.150}$$

Page 75

Data Preprocessing

The performance of a neural network is strongly dependent on the preprocessing that is performed on the training data.

Scaling: the training data can be amplitude-scaled in two ways, so that the values of each pattern lie between -1 and 1, or so that they lie between 0 and 1. This is referred to as min/max scaling. MATLAB: premnmx.

Page 76

Data Preprocessing (cont.)

Another scaling process:
Mean centering, used when the training data contain biases.
Variance scaling, used when the training data are expressed in different units.

Suppose the input vectors are arranged as the columns of a matrix $A \in \mathbb{R}^{n\times m}$ and the target vectors as the columns of a matrix $C \in \mathbb{R}^{p\times m}$.

Mean centering: compute the mean value of each row of A and C, then subtract from each element the mean of its row.

Variance scaling: compute the standard deviation of each row of A and C, then divide each element by the standard deviation of its row. Both operations are sketched below.
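A short NumPy rendering of the two row-wise operations (the slides point to MATLAB; this equivalent is my illustration, with names of my choosing):

```python
import numpy as np

def mean_center(A):
    # Subtract each row's mean (one row per input/target variable,
    # one column per training pattern).
    return A - A.mean(axis=1, keepdims=True)

def variance_scale(A):
    # Divide each row by its standard deviation.
    return A / A.std(axis=1, keepdims=True)
```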

Page 77

Data Preprocessing (cont.)

Transformations: features extracted from certain "raw" signals, used as training inputs, provide better results than the raw signals themselves. A front-end feature extractor can be used to discern salient or distinguishing characteristics of the data.

Four transform methods:
Fourier transform
Principal-component analysis
Partial least-squares regression
Wavelets and wavelet transforms

Page 78

Data Preprocessing (cont.)

Fourier Transform: The FFT can be used to extract the important features of the data, and these dominant characteristic features can then be used to train the neural network.

Page 79

Data Preprocessing (cont.)

(Figure: three signals with the same waveform but different phases; each signal has 1024 sample points.)

Page 80

Data Preprocessing (cont.)

(Figure: the three signals have the same FFT magnitude response, and only 16 magnitude samples are needed; signals with the same waveform but different phases differ only in their FFT phase responses.)

Page 81

Data Preprocessing (cont.)

Principal-Component Analysis: PCA can be used to "compress" the input training data set, reducing the dimension of the inputs by determining the important features of the data according to an assessment of the variance of the data. In MATLAB, prepca is provided to perform PCA on the training data.

Page 82

Data Preprocessing (cont.)

Given a set of training data $A \in \mathbb{R}^{n\times m}$, where it is assumed that m >> n, n denotes the dimension of the input training patterns and m the number of training patterns. Using PCA, an "optimal" orthogonal transformation matrix $W_{pca} \in \mathbb{R}^{h\times n}$ can be determined, where h << n (the degree of dimension reduction). The dimension of the input vectors can be reduced according to the transformation

$$A_r = W_{pca}A \tag{2.151}$$

where $A_r \in \mathbb{R}^{h\times m}$ is the reduced-dimension set of training patterns. The columns of $A_r$ are the principal components for each of the inputs from A.
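One standard way to construct $W_{pca}$, not spelled out in the slides, is from the h leading eigenvectors of the sample covariance of the mean-centered inputs. A hedged sketch:

```python
import numpy as np

def pca_compress(A, h):
    """Reduce n-dimensional patterns (columns of A) to h dimensions, Eq. (2.151).

    The rows of W_pca are taken as the h leading eigenvectors of the
    sample covariance (one common construction; an assumption here).
    """
    A0 = A - A.mean(axis=1, keepdims=True)   # center each input variable
    C = (A0 @ A0.T) / A.shape[1]             # sample covariance, n x n
    eigval, eigvec = np.linalg.eigh(C)       # eigenvalues in ascending order
    W_pca = eigvec[:, ::-1][:, :h].T         # top-h eigenvectors as rows
    return W_pca @ A0                        # A_r = W_pca A, shape (h, m)
```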

Page 83

Data Preprocessing (cont.)

Partial Least-Squares Regression: PLSR can be used to compress the input training data set. It is restricted to use with supervised trained neural networks, and only scalar target values are allowed. The factor analysis in PLSR can determine the degree of compression of the input data. After the optimal number of PLSR factors h has been determined, the weight loading vectors can be used to transform the data, similar to the PCA approach. The optimal number of weight loading vectors form, as its columns, an orthogonal transformation matrix $W_{plsr} \in \mathbb{R}^{n\times h}$. The dimension of the input vectors can be reduced according to the transformation

$$A_r = W_{plsr}^T A \tag{2.152}$$

Page 84

Data Preprocessing (cont.)

(Figure: PCA and PLSR orthogonal transformation vectors used for data compression. PLSR uses both the input data and the target data to generate the weight loading vectors of the orthogonal transformation $W_{plsr}$.)

Page 85

Data Preprocessing (cont.)

Wavelets and Wavelet Transforms:
A wave is an oscillating function of time. Fourier analysis is used for analyzing waves: certain functions can be expanded in terms of sinusoidal waves, showing how much of each frequency component is required to synthesize the signal. It is very useful for analyzing periodic, time-invariant, stationary signals.
A wavelet can be considered a small wave whose energy is concentrated. Wavelets are local waves, useful for analyzing signals that are time-varying, transient, or nonstationary, and they allow simultaneous time and frequency analysis. The wavelet transform can provide a time-frequency description of signals and can be used to compress data for training neural networks.