Gradient descent rule tuning: see pp. 207-210 in the textbook
TRANSCRIPT
Rules
• Consider a rule base with M rules; the rth rule has the form
• IF x1 is T_{r,1} AND ... AND xn is T_{r,n} THEN y is ybar_r (or y is ybar_r + other terms)
• TSK fuzzy system has the mathematical form

$$
f(x) = \frac{\sum_{r=1}^{M} \bar{y}_r \prod_{i=1}^{n} \mu_{r,i}\!\left(x_i;\, c_{r,i}, L_{r,i}, R_{r,i}\right)}
            {\sum_{r=1}^{M} \prod_{i=1}^{n} \mu_{r,i}\!\left(x_i;\, c_{r,i}, L_{r,i}, R_{r,i}\right)}
$$
• Membership function parameters
  – Center, right-width, left-width
• Consequent parameters
• 3-level (layer) structure of f(x)
  – Level (layer) 1: for each rule r, compute all membership values for each term, compute their product, store as z_r
  – Level (layer) 2: compute the product of membership values and consequents and sum: n; sum the membership values: d
  – Level (layer) 3: compute the quotient f = n/d
$$
f(x) = \frac{n(x)}{d(x)}
$$
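The three levels above can be sketched in code; a minimal sketch for a one-input TSK system with Gaussian terms (the rule parameter values in the call are illustrative, not from the text):

```python
import math

# Minimal sketch of the 3-level evaluation of f(x) = n(x)/d(x) for a
# one-input TSK system with Gaussian terms (parameters are illustrative).
def tsk(x, xbar, sigma, ybar):
    # Level 1: per-rule membership value z_r (product is trivial for one input)
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(xbar, sigma)]
    n = sum(yr * zr for yr, zr in zip(ybar, z))  # Level 2: numerator n
    d = sum(z)                                   # Level 2: denominator d
    return n / d                                 # Level 3: quotient f = n/d

# Midway between two symmetric rules, the output is the average consequent
print(tsk(0.5, [0.0, 1.0], [1.0, 1.0], [0.0, 1.0]))
```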
Rule parameters
• Membership function parameters
  – Center, right-width, left-width
• Consequent parameters
• Why not s, z, and triangular membership functions?
• Why Gaussian membership functions?
$$
f(x) = \frac{\sum_{r=1}^{M} \bar{y}_r \prod_{i=1}^{n} e^{-\frac{(x_i - \bar{x}_{i,r})^2}{2\sigma_{i,r}^2}}}
            {\sum_{r=1}^{M} \prod_{i=1}^{n} e^{-\frac{(x_i - \bar{x}_{i,r})^2}{2\sigma_{i,r}^2}}}
     = \frac{\sum_{r=1}^{M} \bar{y}_r\, z_r(x)}{\sum_{r=1}^{M} z_r(x)},
\qquad
z_r(x) = \prod_{i=1}^{n} e^{-\frac{(x_i - \bar{x}_{i,r})^2}{2\sigma_{i,r}^2}}
$$
Gradient Descent
• Choose parameters to minimize the error
• Corresponds to a blind person descending a mountain by finding the steepest descending slope and moving in that direction
• Slope is determined by differentiation (computing the “gradient”)
• Chain rule helps tremendously.
Gradient Descent Math
• Consider a sequence of input/output measurements $(x^o_p, y^o_p)$
• As each input/output measurement pair arrives (and before the next pair arrives), we want to adjust our model parameters to reduce the error $e_p = \left[f(x^o_p) - y^o_p\right]^2 / 2$
• Dropping the sub- and superscripts, $e = [f(x) - y]^2 / 2$
• The gradient descent algorithm for any vector-valued parameter $s$ is

$$
s_{new} = s_{old} - \lambda_s \left.\frac{\partial e}{\partial s}\right|_{s = s_{old}},
\qquad \lambda_s = \text{step size for } s
$$
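The update rule can be illustrated on a toy scalar error; the error function, step size, and iteration count below are assumptions for demonstration, not from the text:

```python
# Minimal sketch of s_new = s_old - lambda_s * de/ds on a toy error
# e(s) = (s - 3)^2 / 2, whose gradient is de/ds = (s - 3).
def descend(s, step=0.1, iters=100):
    for _ in range(iters):
        grad = s - 3.0        # de/ds for the toy error
        s = s - step * grad   # gradient descent update
    return s

print(descend(0.0))  # converges toward the minimizer s = 3
```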
$$
f(x) = \frac{n}{d}
     = \frac{\sum_{r=1}^{M} \bar{y}_r\, z_r(x, \bar{x}, \sigma)}{\sum_{r=1}^{M} z_r(x, \bar{x}, \sigma)},
\qquad
z_r(x, \bar{x}, \sigma) = \prod_{i=1}^{n} e^{-\frac{(x_i - \bar{x}_{i,r})^2}{2\sigma_{i,r}^2}}
$$

$$
s_{new} = s_{old} - \lambda_s \left.\frac{\partial e}{\partial s}\right|_{s = s_{old}},
\qquad \lambda_s = \text{step size for } s
$$
Apply to: $\bar{y}$, $\bar{x}$, $\sigma$

For $\bar{y}$:

$$
\bar{y}_{new} = \bar{y}_{old} - \lambda_{\bar{y}} \left.\frac{\partial e}{\partial \bar{y}}\right|_{\bar{y} = \bar{y}_{old}}
$$
$$
e = \frac{1}{2}\left(f(x; \bar{y}) - y\right)^2
  = \frac{1}{2}\left(\frac{\sum_{r=1}^{M} \bar{y}_r\, z_r(x)}{\sum_{r=1}^{M} z_r(x)} - y\right)^2
$$

$$
\frac{\partial e}{\partial \bar{y}_q}
= \left(f(x; \bar{y}) - y\right)\frac{\partial f(x; \bar{y})}{\partial \bar{y}_q}
= \left(f(x; \bar{y}) - y\right)\frac{z_q(x)}{\sum_{r=1}^{M} z_r(x)},
\qquad q = 1, \ldots, M
$$

In vector form:

$$
\frac{\partial e}{\partial \bar{y}}
= \frac{f(x; \bar{y}) - y}{\sum_{r=1}^{M} z_r(x)}
\begin{bmatrix} z_1(x) \\ z_2(x) \\ \vdots \\ z_M(x) \end{bmatrix}
$$
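This derivative drives the $\bar{y}$ update; a sketch assuming Gaussian memberships, with illustrative parameter and measurement values:

```python
import math

# Sketch of the ybar update: de/dybar_q = (f(x) - y) * z_q(x) / sum_r z_r(x).
# Gaussian memberships assumed; parameter values are illustrative.
def tsk_out(x, xbar, sigma, ybar):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(xbar, sigma)]
    return z, sum(z), sum(yr * zr for yr, zr in zip(ybar, z)) / sum(z)

def ybar_step(x, y, xbar, sigma, ybar, step):
    z, d, f = tsk_out(x, xbar, sigma, ybar)
    # One gradient step on every component of ybar
    return [yr - step * (f - y) * zr / d for yr, zr in zip(ybar, z)]

# One update with measurement (x, y) = (0, 0) pulls f(0) toward 0
xbar, sigma = [-5.0, 0.0, 5.0], [2.0, 2.0, 2.0]
y0 = ybar_step(0.0, 0.0, xbar, sigma, [25.0, 0.0, 25.0], 0.5)
```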
Given $x$ and $y$: modify for $\bar{y}$, modify for $\bar{x}$, modify for $\sigma$.
Gradient Descent
• For a generic parameter $p$:

$$
\frac{de}{dp} = (f - y)\,\frac{df}{dp}
$$

• For $\bar{y}$, see the previous slide: $\dfrac{df}{d\bar{y}_i} = ?$
• For $\bar{x}$: $\dfrac{df}{d\bar{x}_i} = ?$
• For $\sigma$: $\dfrac{df}{d\sigma_i} = ?$
• Abstraction saves work.
One LV Example FL System
• LV X: Term set: Negative, Zero, Positive
• 3 rules
• Antecedent matrix, Consequent matrix
• Gaussian membership functions
• Super membership function
• Fuzzy function parameters
• TSK fuzzy function
• Gradient Descent parameter tuning
One LV Example FL System
• LV X: Negative5, Zero, Positive5
• 3 rules
  – If x is Negative5 then y is 25
  – If x is Zero then y is 0
  – If x is Positive5 then y is 25
• Antecedent matrix and consequent matrix

$$
A = \begin{bmatrix} \text{Negative5} \\ \text{Zero} \\ \text{Positive5} \end{bmatrix}
  = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},
\qquad
C = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{bmatrix}
  = \begin{bmatrix} 25 \\ 0 \\ 25 \end{bmatrix}
$$
One LV Example FL System
• LV X: Negative5, Zero, Positive5
• Gaussian membership functions
$$
\text{Negative5}(x) = e^{-\frac{(x-(-5))^2}{2\sigma_1^2}} = e^{-\frac{(x-\bar{x}_1)^2}{2\sigma_1^2}},
\qquad
\text{Zero}(x) = e^{-\frac{(x-0)^2}{2\sigma_2^2}} = e^{-\frac{(x-\bar{x}_2)^2}{2\sigma_2^2}},
\qquad
\text{Positive5}(x) = e^{-\frac{(x-5)^2}{2\sigma_3^2}} = e^{-\frac{(x-\bar{x}_3)^2}{2\sigma_3^2}}
$$

$$
\begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{bmatrix}
= \begin{bmatrix} -5 \\ 0 \\ 5 \end{bmatrix},
\qquad
\begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}
$$
One LV Example FL System
• Super membership function
$$
\mu(x, LV, T, \bar{x}, \sigma) =
\begin{cases}
\text{Negative5}(x;\, \bar{x}_1, \sigma_1) & T = 1 \\
\text{Zero}(x;\, \bar{x}_2, \sigma_2) & T = 2 \\
\text{Positive5}(x;\, \bar{x}_3, \sigma_3) & T = 3
\end{cases},
\qquad
\bar{x} \in \{-5, 0, 5\} \leftrightarrow T \in \{1, 2, 3\}
$$

LV is X
T is one of Negative5, Zero, Positive5
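The selection by term index can be sketched as follows; the centers come from the example, the widths of 2 for every term are assumed, and the names XBAR and SIGMA are illustrative:

```python
import math

# Sketch of the "super membership function": one parameterized Gaussian
# mu(x; xbar_T, sigma_T) covering all three terms, selected by index T
# (T = 1, 2, 3 <-> centers -5, 0, 5; widths of 2 are assumed).
XBAR = {1: -5.0, 2: 0.0, 3: 5.0}   # term centers from the example
SIGMA = {1: 2.0, 2: 2.0, 3: 2.0}   # term widths (assumed)

def mu(x, T):
    return math.exp(-(x - XBAR[T]) ** 2 / (2 * SIGMA[T] ** 2))

print(mu(-5.0, 1))  # each term peaks at its own center
```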
One LV Example FL System
• TSK fuzzy function
• Gradient Descent parameter tuning

$$
f(x) = \frac{\sum_{r=1}^{3} \bar{y}_r\, \mu(x, \bar{x}_r, \sigma_r)}{\sum_{r=1}^{3} \mu(x, \bar{x}_r, \sigma_r)}
     = \frac{\sum_{r=1}^{3} \bar{y}_r\, z_r(x, \bar{x}_r, \sigma_r)}{\sum_{r=1}^{3} z_r(x, \bar{x}_r, \sigma_r)}
$$
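A sketch of this TSK function with the example's centers and consequents (x̄ = [-5, 0, 5], ȳ = [25, 0, 25]; widths σ = [2, 2, 2] are assumed):

```python
import math

# Sketch of the one-LV example's TSK function (widths assumed to be 2).
XBAR, SIGMA, YBAR = [-5.0, 0.0, 5.0], [2.0, 2.0, 2.0], [25.0, 0.0, 25.0]

def f(x):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(XBAR, SIGMA)]
    return sum(yr * zr for yr, zr in zip(YBAR, z)) / sum(z)

for x in (-5.0, 0.0, 5.0):
    print(x, f(x))  # near each center, the matching rule dominates
```

Note that f(0) is not exactly 0: the neighboring rules' tails pull the output toward 25, which is the normalization effect of the denominator.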
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\bar{y}$

$$
f(x) = \frac{\sum_{r=1}^{3} \bar{y}_r\, z_r(x, \bar{x}_r, \sigma_r)}{\sum_{r=1}^{3} z_r(x, \bar{x}_r, \sigma_r)},
\qquad
e = \frac{1}{2}\left(f(x, \bar{y}, \bar{x}, \sigma) - y\right)^2
$$

$$
\bar{y}_{new} = \bar{y}_{old} - \lambda_{\bar{y}} \left.\frac{de}{d\bar{y}}\right|_{\bar{y} = \bar{y}_{old}},
\qquad \text{new data: } (x^o, y^o)
$$

$$
\frac{de}{d\bar{y}} = \left[f(x, \bar{y}) - y\right]\frac{df(x, \bar{y})}{d\bar{y}},
\qquad
\frac{df(x, \bar{y})}{d\bar{y}_i} = \frac{\mu(x, \bar{x}_i, \sigma_i)}{\sum_{r=1}^{3} \mu(x, \bar{x}_r, \sigma_r)}
$$

$$
\frac{df(x, \bar{y})}{d\bar{y}} = \frac{1}{z_1 + z_2 + z_3}
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}
$$
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\bar{y}$

$$
\bar{y}_{new} = \bar{y}_{old} - \lambda_{\bar{y}} \left[\left(f(x, \bar{y}) - y\right)\frac{df(x, \bar{y})}{d\bar{y}}\right]_{\bar{y} = \bar{y}_{old}},
\qquad \text{new data: } (x^o, y^o)
$$

$$
\bar{y}_{new} = \bar{y}_{old} - \lambda_{\bar{y}} \left(f(x^o, \bar{y}_{old}, \bar{x}, \sigma) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o, \bar{x}, \sigma)}
\begin{bmatrix} z_1(x^o, \bar{x}_1, \sigma_1) \\ z_2(x^o, \bar{x}_2, \sigma_2) \\ z_3(x^o, \bar{x}_3, \sigma_3) \end{bmatrix}
$$
This is the heart and soul of the gradient descent algorithm for tuning $\bar{y}$ with experimental data.
Engineers derive these expressions. Computers evaluate them, often iteratively, to improve designs.
Note the interplay of theory and real-world data.
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\bar{x}$

$$
\bar{x}_{new} = \bar{x}_{old} - \lambda_{\bar{x}} \left.\frac{de}{d\bar{x}}\right|_{\bar{x} = \bar{x}_{old}},
\qquad
\frac{de}{d\bar{x}} = \left[f(x, \bar{x}) - y\right]\frac{df(x, \bar{x})}{d\bar{x}},
\qquad \text{new data: } (x^o, y^o)
$$

By the quotient rule:

$$
\frac{df}{d\bar{x}_i}
= \frac{\left(\sum_{r=1}^{3} z_r\right)\dfrac{d}{d\bar{x}_i}\left(\sum_{r=1}^{3} \bar{y}_r z_r\right)
      - \left(\sum_{r=1}^{3} \bar{y}_r z_r\right)\dfrac{d}{d\bar{x}_i}\left(\sum_{r=1}^{3} z_r\right)}
       {\left(\sum_{r=1}^{3} z_r\right)^2}
$$

With $\dfrac{dz_i}{d\bar{x}_i} = z_i(x)\,\dfrac{2(x - \bar{x}_i)}{2\sigma_i^2}$, this reduces to

$$
\frac{df}{d\bar{x}_i}
= \frac{\left(\bar{y}_i - f(x)\right) z_i(x)\, \dfrac{2(x - \bar{x}_i)}{2\sigma_i^2}}{\sum_{r=1}^{3} z_r(x)}
$$
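A quick numerical sanity check of this closed form against a central finite difference, using the example's parameter values (widths of 2 assumed):

```python
import math

# Check df/dxbar_i = (ybar_i - f(x)) * z_i * (x - xbar_i) / sigma_i^2 / sum_r z_r
# against a central finite difference (example parameter values assumed).
XBAR, SIGMA, YBAR = [-5.0, 0.0, 5.0], [2.0, 2.0, 2.0], [25.0, 0.0, 25.0]

def f(x, xbar):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(xbar, SIGMA)]
    return sum(yr * zr for yr, zr in zip(YBAR, z)) / sum(z)

def df_dxbar(x, i):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(XBAR, SIGMA)]
    fx = sum(yr * zr for yr, zr in zip(YBAR, z)) / sum(z)
    return (YBAR[i] - fx) * z[i] * (x - XBAR[i]) / SIGMA[i] ** 2 / sum(z)

x, i, h = 1.5, 0, 1e-6
plus, minus = XBAR[:], XBAR[:]
plus[i] += h; minus[i] -= h
numeric = (f(x, plus) - f(x, minus)) / (2 * h)
print(df_dxbar(x, i), numeric)  # the two should agree closely
```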
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\bar{x}$

$$
\frac{df}{d\bar{x}} = \frac{1}{\sum_{r=1}^{3} z_r(x)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x)\right) z_1(x)\, \dfrac{2(x - \bar{x}_1)}{2\sigma_1^2} \\[4pt]
\left(\bar{y}_2 - f(x)\right) z_2(x)\, \dfrac{2(x - \bar{x}_2)}{2\sigma_2^2} \\[4pt]
\left(\bar{y}_3 - f(x)\right) z_3(x)\, \dfrac{2(x - \bar{x}_3)}{2\sigma_3^2}
\end{bmatrix}
$$

$$
\bar{x}_{new} = \bar{x}_{old} - \lambda_{\bar{x}} \left[\left(f(x, \bar{x}) - y\right)\frac{df(x, \bar{x})}{d\bar{x}}\right]_{\bar{x} = \bar{x}_{old}},
\qquad \text{new data: } (x^o, y^o)
$$

$$
\bar{x}_{new} = \bar{x}_{old} - \lambda_{\bar{x}}
\left(f(x^o, \bar{y}, \bar{x}_{old}, \sigma) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x^o)\right) z_1(x^o)\, \dfrac{2(x^o - \bar{x}_1)}{2\sigma_1^2} \\[4pt]
\left(\bar{y}_2 - f(x^o)\right) z_2(x^o)\, \dfrac{2(x^o - \bar{x}_2)}{2\sigma_2^2} \\[4pt]
\left(\bar{y}_3 - f(x^o)\right) z_3(x^o)\, \dfrac{2(x^o - \bar{x}_3)}{2\sigma_3^2}
\end{bmatrix}
$$
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\sigma$

$$
\sigma_{new} = \sigma_{old} - \lambda_{\sigma} \left.\frac{de}{d\sigma}\right|_{\sigma = \sigma_{old}},
\qquad
\frac{de}{d\sigma} = \left[f(x, \sigma) - y\right]\frac{df(x, \sigma)}{d\sigma},
\qquad \text{new data: } (x^o, y^o)
$$

By the quotient rule:

$$
\frac{df}{d\sigma_i}
= \frac{\left(\sum_{r=1}^{3} z_r\right)\dfrac{d}{d\sigma_i}\left(\sum_{r=1}^{3} \bar{y}_r z_r\right)
      - \left(\sum_{r=1}^{3} \bar{y}_r z_r\right)\dfrac{d}{d\sigma_i}\left(\sum_{r=1}^{3} z_r\right)}
       {\left(\sum_{r=1}^{3} z_r\right)^2}
$$

With $\dfrac{dz_i}{d\sigma_i} = z_i(x)\,\dfrac{2(x - \bar{x}_i)^2}{2\sigma_i^3}$, this reduces to

$$
\frac{df}{d\sigma_i}
= \frac{\left(\bar{y}_i - f(x)\right) z_i(x)\, \dfrac{2(x - \bar{x}_i)^2}{2\sigma_i^3}}{\sum_{r=1}^{3} z_r(x)}
$$
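The same finite-difference sanity check applies to the σ gradient, again with the example's parameter values (widths of 2 assumed):

```python
import math

# Check df/dsigma_i = (ybar_i - f(x)) * z_i * (x - xbar_i)^2 / sigma_i^3 / sum_r z_r
# against a central finite difference (example parameter values assumed).
XBAR, SIGMA, YBAR = [-5.0, 0.0, 5.0], [2.0, 2.0, 2.0], [25.0, 0.0, 25.0]

def f(x, sigma):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(XBAR, sigma)]
    return sum(yr * zr for yr, zr in zip(YBAR, z)) / sum(z)

def df_dsigma(x, i):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(XBAR, SIGMA)]
    fx = sum(yr * zr for yr, zr in zip(YBAR, z)) / sum(z)
    return (YBAR[i] - fx) * z[i] * (x - XBAR[i]) ** 2 / SIGMA[i] ** 3 / sum(z)

x, i, h = 1.5, 2, 1e-6
plus, minus = SIGMA[:], SIGMA[:]
plus[i] += h; minus[i] -= h
numeric = (f(x, plus) - f(x, minus)) / (2 * h)
print(df_dsigma(x, i), numeric)  # the two should agree closely
```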
One LV Example FL System
• TSK fuzzy function, Gradient Descent parameter tuning: $\sigma$

$$
\frac{df}{d\sigma} = \frac{1}{\sum_{r=1}^{3} z_r(x)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x)\right) z_1(x)\, \dfrac{2(x - \bar{x}_1)^2}{2\sigma_1^3} \\[4pt]
\left(\bar{y}_2 - f(x)\right) z_2(x)\, \dfrac{2(x - \bar{x}_2)^2}{2\sigma_2^3} \\[4pt]
\left(\bar{y}_3 - f(x)\right) z_3(x)\, \dfrac{2(x - \bar{x}_3)^2}{2\sigma_3^3}
\end{bmatrix}
$$

$$
\sigma_{new} = \sigma_{old} - \lambda_{\sigma} \left[\left(f(x, \sigma) - y\right)\frac{df(x, \sigma)}{d\sigma}\right]_{\sigma = \sigma_{old}},
\qquad \text{new data: } (x^o, y^o)
$$

$$
\sigma_{new} = \sigma_{old} - \lambda_{\sigma}
\left(f(x^o, \bar{y}, \bar{x}, \sigma_{old}) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x^o)\right) z_1(x^o)\, \dfrac{2(x^o - \bar{x}_1)^2}{2\sigma_1^3} \\[4pt]
\left(\bar{y}_2 - f(x^o)\right) z_2(x^o)\, \dfrac{2(x^o - \bar{x}_2)^2}{2\sigma_2^3} \\[4pt]
\left(\bar{y}_3 - f(x^o)\right) z_3(x^o)\, \dfrac{2(x^o - \bar{x}_3)^2}{2\sigma_3^3}
\end{bmatrix}
$$
One LV: Gradient Descent Summary
For $\bar{x}$:

$$
\bar{x}_{new} = \bar{x}_{old} - \lambda_{\bar{x}}
\left(f(x^o, \bar{y}, \bar{x}_{old}, \sigma) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x^o)\right) z_1(x^o)\, \dfrac{2(x^o - \bar{x}_1)}{2\sigma_1^2} \\[4pt]
\left(\bar{y}_2 - f(x^o)\right) z_2(x^o)\, \dfrac{2(x^o - \bar{x}_2)}{2\sigma_2^2} \\[4pt]
\left(\bar{y}_3 - f(x^o)\right) z_3(x^o)\, \dfrac{2(x^o - \bar{x}_3)}{2\sigma_3^2}
\end{bmatrix}
$$

For $\sigma$:

$$
\sigma_{new} = \sigma_{old} - \lambda_{\sigma}
\left(f(x^o, \bar{y}, \bar{x}, \sigma_{old}) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o)}
\begin{bmatrix}
\left(\bar{y}_1 - f(x^o)\right) z_1(x^o)\, \dfrac{2(x^o - \bar{x}_1)^2}{2\sigma_1^3} \\[4pt]
\left(\bar{y}_2 - f(x^o)\right) z_2(x^o)\, \dfrac{2(x^o - \bar{x}_2)^2}{2\sigma_2^3} \\[4pt]
\left(\bar{y}_3 - f(x^o)\right) z_3(x^o)\, \dfrac{2(x^o - \bar{x}_3)^2}{2\sigma_3^3}
\end{bmatrix}
$$

For $\bar{y}$:

$$
\bar{y}_{new} = \bar{y}_{old} - \lambda_{\bar{y}}
\left(f(x^o, \bar{y}_{old}, \bar{x}, \sigma) - y^o\right)
\frac{1}{\sum_{r=1}^{3} z_r(x^o)}
\begin{bmatrix} z_1(x^o, \bar{x}_1, \sigma_1) \\ z_2(x^o, \bar{x}_2, \sigma_2) \\ z_3(x^o, \bar{x}_3, \sigma_3) \end{bmatrix}
$$
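The three updates can be combined into one tuning loop. A sketch for the one-LV case; the target function y = x², the step sizes, and the initialization below are assumptions for demonstration, not from the slides:

```python
import math, random

# Illustrative end-to-end tuning loop combining the ybar, xbar, and sigma
# updates above, fit to samples of y = x^2 on [-6, 6] (all values assumed).
random.seed(0)
xbar  = [-4.0, 1.0, 4.0]
sigma = [2.0, 2.0, 2.0]
ybar  = [20.0, 5.0, 20.0]

def forward(x):
    z = [math.exp(-(x - xb) ** 2 / (2 * s ** 2)) for xb, s in zip(xbar, sigma)]
    d = sum(z)
    return z, d, sum(yr * zr for yr, zr in zip(ybar, z)) / d

def sse(points):
    return sum((forward(x)[2] - y) ** 2 for x, y in points)

data = [(x, x * x) for x in (random.uniform(-6, 6) for _ in range(100))]
before = sse(data)
for _ in range(30):                          # passes over the data
    for x, y in data:                        # one measurement pair at a time
        z, d, f = forward(x)
        err = f - y                          # f(x) - y
        for i in range(3):
            g   = err * (ybar[i] - f) * z[i] / d          # shared factor
            dyb = err * z[i] / d                          # de/dybar_i
            dxb = g * (x - xbar[i]) / sigma[i] ** 2       # de/dxbar_i
            dsg = g * (x - xbar[i]) ** 2 / sigma[i] ** 3  # de/dsigma_i
            ybar[i]  -= 0.02 * dyb
            xbar[i]  -= 0.002 * dxb
            sigma[i] -= 0.002 * dsg
after = sse(data)
print(before, after)  # the fit error should shrink as parameters adapt
```

Note the pattern the slides describe: gradients are evaluated at the old parameter values for each arriving pair before any component is updated.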
Two LV Example FL System
• Temperature term set: Cold, Comfortable, Hot
• Humidity term set: Wet, Dry
• 6 rules
• Antecedent matrix, Consequent matrix
• Gaussian membership functions
• Super membership function
• Fuzzy function parameters
• TSK Fuzzy Function
• Gradient descent parameter tuning
Two LV Example FL System
• Temperature term set: Comfortable, Warm, Hot
• Humidity term set: Wet, Dry
• 6 rules
  – If T is Comfortable and H is Wet then HI is
  – If T is Comfortable and H is Dry then HI is
  – If T is Warm and H is Wet then HI is
  – If T is Warm and H is Dry then HI is
  – If T is Hot and H is Wet then HI is
  – If T is Hot and H is Dry then HI is
Two LV Example FL System: Matrices
  – If T is Comfortable and H is Wet then HI is $\bar{y}_{Comfortable,Wet}$
  – If T is Comfortable and H is Dry then HI is $\bar{y}_{Comfortable,Dry}$
  – If T is Warm and H is Wet then HI is $\bar{y}_{Warm,Wet}$
  – If T is Warm and H is Dry then HI is $\bar{y}_{Warm,Dry}$
  – If T is Hot and H is Wet then HI is $\bar{y}_{Hot,Wet}$
  – If T is Hot and H is Dry then HI is $\bar{y}_{Hot,Dry}$

$$
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 2 & 1 \\ 2 & 2 \\ 3 & 1 \\ 3 & 2 \end{bmatrix},
\qquad
C = \begin{bmatrix}
\bar{y}_{Comfortable,Wet} \\ \bar{y}_{Comfortable,Dry} \\
\bar{y}_{Warm,Wet} \\ \bar{y}_{Warm,Dry} \\
\bar{y}_{Hot,Wet} \\ \bar{y}_{Hot,Dry}
\end{bmatrix}
$$
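The matrices can be encoded directly; the tuple encoding and the Python names below are illustrative, and the HI consequents stay symbolic since the slides give no numeric values:

```python
# Sketch of the antecedent and consequent matrices for the 6-rule,
# two-LV system: row r describes rule r, with columns indexing the
# Temperature term (1..3) and the Humidity term (1..2).
A = [(1, 1),   # Comfortable, Wet
     (1, 2),   # Comfortable, Dry
     (2, 1),   # Warm, Wet
     (2, 2),   # Warm, Dry
     (3, 1),   # Hot, Wet
     (3, 2)]   # Hot, Dry
# Consequents left symbolic (no numeric values given on the slides)
C = ["y_Comfortable_Wet", "y_Comfortable_Dry", "y_Warm_Wet",
     "y_Warm_Dry", "y_Hot_Wet", "y_Hot_Dry"]
```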
Two LV Example FL System
• Temperature term set: Cold, Comfortable, Hot
• Humidity term set: Wet, Dry
• Gaussian membership functions
• Super membership function