Definition of univariate B-splines
The B-splines are employed to specify the linguistic terms, and the knots are chosen to be different from each other (periodical model). Visually, the selection of k (the order of the B-splines) determines the following factors of the fuzzy sets for modeling the linguistic terms.
Assume x is a general input variable of a control system that is defined on the universe of discourse $[x_1, x_m]$. Given a sequence of ordered parameters (knots) $x_1, x_2, \ldots$, the i-th B-spline $N_{i,k}$ of order k (degree k − 1) is recursively defined as follows:

$$
N_{i,k}(x) =
\begin{cases}
\begin{cases} 1 & \text{for } x \in [x_i, x_{i+1}) \\ 0 & \text{otherwise} \end{cases} & \text{if } k = 1 \\[2ex]
\dfrac{x - x_i}{x_{i+k-1} - x_i}\, N_{i,k-1}(x) + \dfrac{x_{i+k} - x}{x_{i+k} - x_{i+1}}\, N_{i+1,k-1}(x) & \text{if } k > 1
\end{cases}
\tag{1}
$$

with $i = 1, \ldots, m - k$.
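The recursion in (1) translates almost line by line into code. The following is a minimal sketch, not the lecture's implementation: the function name `bspline_basis` and the choice to pass the knots as a Python list are assumptions made for illustration, and the knots are assumed pairwise distinct so that no denominator vanishes.

```python
def bspline_basis(i, k, x, knots):
    """Evaluate N_{i,k}(x) by recursion (1).

    i is 1-based as in the text; knots[j - 1] holds the knot x_j.
    Assumes pairwise distinct, increasing knots and i + k <= len(knots).
    """
    def xi(j):
        return knots[j - 1]  # 1-based knot access

    if k == 1:
        # Order 1: indicator function of the half-open interval [x_i, x_{i+1})
        return 1.0 if xi(i) <= x < xi(i + 1) else 0.0

    left = (x - xi(i)) / (xi(i + k - 1) - xi(i)) * bspline_basis(i, k - 1, x, knots)
    right = (xi(i + k) - x) / (xi(i + k) - xi(i + 1)) * bspline_basis(i + 1, k - 1, x, knots)
    return left + right


knots = [0.0, 1.0, 2.0, 3.0, 4.0]                 # m = 5 hypothetical knots
hat_value = bspline_basis(1, 2, 0.5, knots)       # order 2: rising edge of a triangle
quad_value = bspline_basis(1, 3, 1.5, knots)      # order 3: peak of a quadratic bump
outside = bspline_basis(1, 3, 3.5, knots)         # outside [x_1, x_4]
```

With these uniform example knots, the order-2 function is a triangle over [x_1, x_3] and the order-3 function vanishes outside [x_1, x_4].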
Angewandte Sensorik, J. Zhang Lernmethoden, W4/2003, 21. Januar 2003 263
Therefore, m knots $x_i$ ($i = 1, \ldots, m$) form $l = m - k$ B-splines (Figure 1).
Figure 1: Nine B-splines of order 3 defined over 12 non-uniformly distributed knots.
Examples of B-splines of order 1, 2, 3 and 4 with their knots are shown in Figure 2.
Figure 2: Non-uniform univariate B-splines of order 1 to 4 defined on a parameter x.
In each interval [xj, xj+1], k non-zero B-splines overlap.
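An example of order 3 is shown in Figure 3. With the convention of (1), order k means degree k − 1, so order 3 gives piecewise quadratic B-splines, while cubic B-splines in the strict sense have order 4.
Figure 3: B-splines (labeled "cubic" on the slide) defined on a parameter x.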
Properties of B-Splines
Recursive definition is one basic feature of B-splines, which enables the generation of B-splines of arbitrary order, with incrementally increasing smoothness, for a given set of knots. The other most important properties of B-splines with respect to modeling and control are:
Partition of unity: $\sum_{i=0}^{l} N_{i,k}(x) = 1$.
Positivity: $N_{i,k}(x) \ge 0$ for all x.
Local support: $N_{i,k}(x) = 0$ for $x \notin [x_i, x_{i+k}]$.
$C^{k-2}$ continuity: If the knots $\{x_i\}$ are pairwise different from each other, then $N_{i,k}(x) \in C^{k-2}$, i.e., $N_{i,k}(x)$ is (k − 2) times continuously differentiable.
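These properties can be checked numerically. The sketch below is illustrative only: the helper `basis`, the knot values, and the sample points are assumptions. Without additional terms at the ends, the sum equals 1 only on the interior interval $[x_k, x_{m-k+1}]$, where k B-splines overlap, so the samples are taken from that interval.

```python
def basis(i, k, x, t):
    """N_{i,k}(x) via the recursive definition; t holds the knots, i is 1-based."""
    if k == 1:
        return 1.0 if t[i - 1] <= x < t[i] else 0.0
    a = (x - t[i - 1]) / (t[i + k - 2] - t[i - 1]) * basis(i, k - 1, x, t)
    b = (t[i + k - 1] - x) / (t[i + k - 1] - t[i]) * basis(i + 1, k - 1, x, t)
    return a + b


knots = [0.0, 0.7, 1.5, 2.1, 3.0, 4.2, 5.0, 5.9]  # m = 8 hypothetical non-uniform knots
k = 3
l = len(knots) - k                                 # l = m - k = 5 B-splines

# Sample points inside [x_k, x_{m-k+1}) = [1.5, 4.2)
samples = [1.5, 2.0, 3.3, 4.1]

# Partition of unity: the basis functions sum to 1 at every sample
sums = [sum(basis(i, k, x, knots) for i in range(1, l + 1)) for x in samples]

# Positivity: no basis function is negative anywhere we look
nonneg = all(basis(i, k, x, knots) >= 0.0 for i in range(1, l + 1) for x in samples)

# Local support: N_{1,3} lives on [x_1, x_4] = [0.0, 2.1], so it vanishes at 2.5
outside_support = basis(1, k, 2.5, knots)
```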
Lattice
Figure 4: The B-spline model, a two-dimensional illustration.
Each n-dimensional rectangle (n > 1) of the lattice is covered by the j-th multivariate B-spline $N^{j}_{k}(\mathbf{x})$, which is formed by taking the tensor product of n univariate B-splines:

$$
N^{j}_{k}(\mathbf{x}) = \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \tag{2}
$$

Therefore the shape of each B-spline, and thus the shape of the multivariate ones (Figure 5), is implicitly set by their order and their given knot distribution on each input interval.
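The tensor product in (2) is a plain product of univariate values, one per input. The following sketch assumes a helper `basis` for the univariate recursion and a hypothetical function `tensor_basis`; the knot vectors and the evaluation point are made up for illustration.

```python
def basis(i, k, x, t):
    """Univariate B-spline N_{i,k}(x); t holds the knots, i is 1-based."""
    if k == 1:
        return 1.0 if t[i - 1] <= x < t[i] else 0.0
    a = (x - t[i - 1]) / (t[i + k - 2] - t[i - 1]) * basis(i, k - 1, x, t)
    b = (t[i + k - 1] - x) / (t[i + k - 1] - t[i]) * basis(i + 1, k - 1, x, t)
    return a + b


def tensor_basis(indices, orders, point, knot_vectors):
    """Multivariate B-spline as in (2): product over all inputs j = 1..n."""
    value = 1.0
    for i_j, k_j, x_j, t_j in zip(indices, orders, point, knot_vectors):
        value *= basis(i_j, k_j, x_j, t_j)
    return value


# Two inputs, each with one order-2 (triangular) B-spline on knots 0, 1, 2
t = [0.0, 1.0, 2.0]
value = tensor_basis(indices=(1, 1), orders=(2, 2), point=(0.5, 1.0), knot_vectors=(t, t))
# The first factor is 0.5 (rising edge), the second is 1.0 (peak), so value is 0.5
```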
(a) Tensor product of two univariate B-splines of order 2.
(b) Tensor product of one order-3 and one order-2 univariate B-spline.
(c) Tensor product of two univariate B-splines of order 3.
Figure 5: Bivariate B-splines formed by taking the tensor product of two univariate B-splines.
Fuzzy Controller of a MISO System - I
Conditions of B-spline Fuzzy Controllers:
• periodical B-spline basis functions as membership functions for inputs,
• fuzzy singletons as membership functions for outputs,
• “product” as fuzzy conjunctions,
• “centroid” as defuzzification method,
• addition of “virtual linguistic terms” at both ends of each input variable and
• extension of the rule base for the “virtual linguistic terms” by copying the output values of the “nearest” neighbourhood.
B-Spline Fuzzy Controller of a MISO System - II
For a MISO system with n inputs $x_1, x_2, \ldots, x_n$, rules with n conjunctive terms in the premise are given in the following form:

$$
\{\text{Rule}(i_1, i_2, \ldots, i_n):\ \text{IF } (x_1 \text{ is } N^{1}_{i_1,k_1}) \text{ and } (x_2 \text{ is } N^{2}_{i_2,k_2}) \text{ and } \ldots \text{ and } (x_n \text{ is } N^{n}_{i_n,k_n}) \text{ THEN } y \text{ is } Y_{i_1 i_2 \ldots i_n}\},
$$

where
• $x_j$: the j-th input (j = 1, . . . , n),
• $k_j$: the order of the B-spline basis functions used for $x_j$,
• $N^{j}_{i_j,k_j}$: the $i_j$-th linguistic term of $x_j$ defined by B-spline basis functions,
• $i_j = 0, \ldots, m_j$, representing how finely the j-th input is fuzzy partitioned,
• $Y_{i_1 i_2 \ldots i_n}$: the control vertex (de Boor point) of Rule($i_1, i_2, \ldots, i_n$).
Fuzzy Controller of a MISO System - III
The output y of a MISO fuzzy controller is:

$$
y = \frac{\displaystyle\sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \left( Y_{i_1,\ldots,i_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \right)}{\displaystyle\sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j)} \tag{3}
$$

$$
= \sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} \left( Y_{i_1,\ldots,i_n} \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \right) \tag{4}
$$

The denominator in (3) equals 1 because the B-spline basis functions of each input form a partition of unity, which gives (4).
This is called a general NUBS hypersurface, which possesses the following properties:
• If B-functions of order $k_1, k_2, \ldots, k_n$ are employed to specify the linguistic terms of the input variables $x_1, x_2, \ldots, x_n$, it can be guaranteed that the output variable y is ($k_j − 2$) times continuously differentiable with respect to the input variable $x_j$, j = 1, . . . , n.
• If the input space is partitioned finely enough and at the correct positions, the interpolation with the B-spline hypersurface can reach a given precision.
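The controller output can be sketched as a weighted sum over all rules. In the sketch below, `miso_output` and its data layout are assumptions for illustration: `memberships[j][i]` is taken to hold the already evaluated membership value of term i for input j, and `vertices` maps an index tuple to its control vertex. It returns both the normalized form (3) and the plain sum (4), which coincide when each input's membership values sum to 1.

```python
from itertools import product
from math import prod


def miso_output(memberships, vertices):
    """Return (y by the quotient form (3), y by the plain sum (4))."""
    numerator = 0.0
    denominator = 0.0
    # Enumerate every rule index tuple (i_1, ..., i_n)
    for idx in product(*(range(len(m)) for m in memberships)):
        # Firing strength: "product" conjunction over all inputs
        weight = prod(memberships[j][i] for j, i in enumerate(idx))
        numerator += vertices[idx] * weight
        denominator += weight
    return numerator / denominator, numerator


# Hypothetical membership values for two inputs with two terms each
memberships = [[0.25, 0.75], [0.5, 0.5]]
vertices = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
y_normalized, y_plain = miso_output(memberships, vertices)
```

Because both membership lists sum to 1, the denominator is 1 and the two returned values agree.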
B-spline Type: SISO Systems
A SISO system with B-functions of order 2 ($X_i(x)$: firing strength of rule i; $y_i$: the contribution of rule i to the output).
MISO Systems - A 2D Example
An example with two input variables (x and y) and an output z. The control vertices of the output are Z1, Z2, Z3, Z4.
The linguistic terms of the inputs:
The linguistic terms of the output:
A 2D Example - The Rule Base
The rule base consists of four rules:
1) IF x is X1 and y is Y1 THEN z is Z1
2) IF x is X1 and y is Y2 THEN z is Z2
3) IF x is X2 and y is Y1 THEN z is Z3
4) IF x is X2 and y is Y2 THEN z is Z4
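The four rules can be evaluated with "product" conjunction and centroid defuzzification over singletons. The sketch below makes assumptions the slides do not state: both inputs are normalized to [0, 1], and each input has two order-2 (linear) terms, X1 = 1 − x and X2 = x, and likewise for y. Under these assumptions the controller reduces to bilinear interpolation between Z1, ..., Z4.

```python
def infer(x, y, Z):
    """Evaluate the four-rule base for inputs x, y in [0, 1].

    Z = (Z1, Z2, Z3, Z4) are the output singletons of rules 1) to 4).
    """
    X = (1.0 - x, x)  # assumed membership values of X1, X2
    Y = (1.0 - y, y)  # assumed membership values of Y1, Y2
    rules = [
        (X[0] * Y[0], Z[0]),  # 1) IF x is X1 and y is Y1 THEN z is Z1
        (X[0] * Y[1], Z[1]),  # 2) IF x is X1 and y is Y2 THEN z is Z2
        (X[1] * Y[0], Z[2]),  # 3) IF x is X2 and y is Y1 THEN z is Z3
        (X[1] * Y[1], Z[3]),  # 4) IF x is X2 and y is Y2 THEN z is Z4
    ]
    total = sum(w for w, _ in rules)
    # Centroid of weighted singletons
    return sum(w * z for w, z in rules) / total


Z = (0.0, 1.0, 2.0, 3.0)          # hypothetical control vertices
z_corner = infer(0.0, 0.0, Z)     # only rule 1 fires, so the result is Z1
z_inside = infer(0.25, 0.5, Z)    # all four rules contribute
```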
A 2D Example - Inference
A 2D Example - Defuzzification
Supervised Learning
Supervised learning assumes that a “teacher” provides the complete desired system output for each input datum.
Based on the complete set of these input/output vectors, B-spline type fuzzy controllers can be trained very rapidly.
Computing the parameters of such a B-spline fuzzy system is divided into two steps: for the IF-part and for the THEN-part.
Considering the granularity of the input space and the maximal point distribution of the control space, if known, the fuzzy sets can be generated using the recursive computation of B-spline basis functions.
The control vertices of the THEN-parts can be obtained automatically through a learning procedure.
Learning algorithm - I
Assume $\{(X, y_d)\}$ is a set of training data, where
• $X = (x_1, x_2, \ldots, x_n)$: the input data vector,
• $y_d$: the desired output for X.
The squared error is computed as:

$$
E = \frac{1}{2}(y_r - y_d)^2, \tag{5}
$$

where $y_r$ is the current real output value during training.
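The parameters to be found are $Y_{i_1,i_2,\ldots,i_n}$, which make the error in (5) as small as possible, i.e.

$$
E = \frac{1}{2}(y_r - y_d)^2 \equiv \text{MIN}. \tag{6}
$$

Each control vertex $Y_{i_1,\ldots,i_n}$ can be modified by using the gradient descent method:

$$
\Delta Y_{i_1,\ldots,i_n} = -\varepsilon \frac{\partial E}{\partial Y_{i_1,\ldots,i_n}} \tag{7}
$$

$$
= \varepsilon\,(y_d - y_r) \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \tag{8}
$$

where $0 < \varepsilon \le 1$. Step (8) follows from (5) and (4): $\partial E / \partial Y_{i_1,\ldots,i_n} = (y_r - y_d)\, \partial y_r / \partial Y_{i_1,\ldots,i_n}$ with $\partial y_r / \partial Y_{i_1,\ldots,i_n} = \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j)$, so each vertex is moved in the direction that reduces the error $y_r - y_d$.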
With the gradient descent method, the learning algorithm converges to the global minimum of the error function, because the second partial derivative of the quadratic error function with respect to $Y_{i_1,i_2,\ldots,i_n}$ does not depend on the control vertices and is non-negative:

$$
\frac{\partial^2 E}{\partial Y_{i_1,\ldots,i_n}^2} = \left( \prod_{j=1}^{n} N^{j}_{i_j,k_j}(x_j) \right)^2 \ge 0. \tag{9}
$$

This means that the error function (5) is convex in the space of the $Y_{i_1,i_2,\ldots,i_n}$, so every local minimum is a global minimum.
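The learning rule can be tried out on a small SISO case. The following is a sketch under stated assumptions, not the lecture's code: it uses order-2 (piecewise linear) membership functions on hypothetical centers, a made-up target function 2x + 1, and the function names `memberships` and `train`. Each update subtracts ε times the gradient of the squared error (5) with respect to the control vertex.

```python
def memberships(x, centers):
    """Order-2 (triangular) membership values of x on sorted, distinct centers."""
    mu = [0.0] * len(centers)
    for j in range(len(centers) - 1):
        if centers[j] <= x <= centers[j + 1]:
            w = (x - centers[j]) / (centers[j + 1] - centers[j])
            mu[j], mu[j + 1] = 1.0 - w, w
            break
    return mu


def train(samples, centers, eps=0.5, epochs=200):
    """Fit one control vertex per center by per-sample gradient descent."""
    Y = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y_d in samples:
            mu = memberships(x, centers)
            y_r = sum(Y_i * m for Y_i, m in zip(Y, mu))   # current output, as in (4)
            for i, m in enumerate(mu):
                # dE/dY_i = (y_r - y_d) * mu_i, so step against it
                Y[i] -= eps * (y_r - y_d) * m
    return Y


centers = [0.0, 1.0, 2.0, 3.0]
# Training data sampled from the hypothetical target y = 2x + 1
samples = [(0.25 * n, 2.0 * (0.25 * n) + 1.0) for n in range(13)]
Y = train(samples, centers)
```

Since a piecewise linear model can represent the target exactly, the learned vertices should approach the target values at the centers, i.e. roughly 1, 3, 5 and 7.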
Immediate learning by self-evaluation
A fuzzy system can learn under supervision.
Such a learning process needs a teacher, i.e. for each input vector, the desired output should be known. Then the fuzzy controller attempts to interpolate these input/output vectors to provide a continuous (hyper-)surface for the whole control space.
In reality, it is not always simple to find the goal function of the output for a complex system. An unsupervised learning approach should therefore be developed.
Based on a B-spline fuzzy controller, the parameters to be learned are still mainly the control vertices of the “THEN” part.
The key problem of unsupervised learning with such a model is then how to modify the control vertices after each learning step, i.e. the change direction (+ or −) and the change magnitude.
Inspiration by Supervised Learning
We first discuss a control system with $(X_1, X_2, \ldots, X_n)$ as input and Y as output. Let us rewrite the modification of the control vertices for supervised learning:

$$
\Delta y_{i_1,\ldots,i_n} = -\varepsilon \frac{\partial E}{\partial y_{i_1,\ldots,i_n}}
= \varepsilon\,(y_d - y_r) \prod_{j=1}^{n} X_{i_j,k_j}(x_j)
= \operatorname{sign}(y_d - y_r)\; \varepsilon\; |y_r - y_d| \prod_{j=1}^{n} X_{i_j,k_j}(x_j) \tag{10}
$$

$\operatorname{sign}(y_d - y_r)$ indicates the direction of the modification of $y_{i_1,\ldots,i_n}$ in each learning step, while the product $\varepsilon \cdot |y_r - y_d| \cdot \prod_{j=1}^{n} X_{i_j,k_j}(x_j)$ determines the magnitude of the modification.
Evaluation Function - I
In unsupervised learning, it is usually possible to define an “evaluation function”. Such an evaluation function should describe how “good” the current system state $((x_1, x_2, \ldots, x_n), y)$ is.
For each input vector, an output is generated. With this output, the system transits to another state. The new state is compared with the old one; an adaptation is performed if necessary.
Assume the evaluation function, denoted by $V(\cdot)$, possesses a bigger value for a better state, i.e. for two states $s_t$ and $s_{t+1}$, if $s_t$ is better than $s_{t+1}$, then $V(s_t) \ge V(s_{t+1})$. The adaptation of the control vertices can be performed with a similar representation as in supervised learning.
Evaluation Function - II
Let us reconsider the modification of the control vertices through equation (10). State $s_t$ transits to $s_{t+1}$ by the output $y_r$. The desired state is $s_d$. We replace $y_r$ in (10) with $V(s_{t+1})$ and $y_d$ with $V(s_d)$.
Assume two system states $s_t$ and $s_{t+1}$, where $s_t$ is better than $s_{t+1}$, i.e. $V(s_t) \ge V(s_{t+1})$, where $V(\cdot)$ is the evaluation function.
We consider those systems for which a function $V(\cdot)$ can be found which fulfills the following condition:
Assume $s_t$ is the current state and y an arbitrary output. With y the system transits to the state $s_{t+1}$. If another output $y'$ fulfills $y \cdot y' \le 0$, and with $y'$ the system transits to $s'_{t+1}$, the following relation of the evaluation functions is valid:

$$
\bigl( V(s_{t+1}) - V(s_t) \bigr) \cdot \bigl( V(s'_{t+1}) - V(s_t) \bigr) \le 0. \tag{11}
$$
Modifying Control Vertices in Reinforcement Learning - I
At the moment t the system has the state $s_t$. The ideal state at the moment t + 1 would be $s_d$.
With the controller output $y_r$ generated at the moment t, the system transits to the state $s_{t+1}$.
Considering the state transition from $s_t$ to $s_{t+1}$, three constellations of $s_t$, $s_{t+1}$ and $s_d$ are distinguished: (a), (b) and (c).
Modifying Control Vertices in Reinforcement Learning - II
(a): The system state becomes worse, i.e. the system acts incorrectly. According to the condition in (11), the change direction is −sign(y).
(b): The system acts in the correct direction. The value of the output should be enlarged. The change direction is then sign(y).
(c): This case is the inverse of case (b). The change direction should be −sign(y).
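These three cases can be synthesized by

$$
S = \operatorname{sign}\bigl(V(s_t) - V(s_{t+1})\bigr) \cdot \operatorname{sign}\bigl(V(s_{t+1}) - V(s_d)\bigr) \cdot \operatorname{sign}(y). \tag{12}
$$

The change of control vertices can finally be written as:

$$
\Delta y_{i_1,\ldots,i_n} = S \cdot \varepsilon \cdot |V(s_{t+1}) - V(s_d)| \cdot \prod_{j=1}^{n} X_{i_j,k_j}(x_j). \tag{13}
$$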
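Equations (12) and (13) can be sketched in a few lines. The function names, the learning rate, and the numeric states below are assumptions for illustration. The mapping of the three cases to orderings of the evaluation values is also an assumption, read off the case descriptions: (a) $V(s_{t+1}) < V(s_t)$, (b) $V(s_t) < V(s_{t+1}) < V(s_d)$, (c) $V(s_{t+1}) > V(s_d)$.

```python
def sign(v):
    """Return -1, 0 or +1."""
    return (v > 0) - (v < 0)


def vertex_update(V_t, V_t1, V_d, y, weight, eps=0.1):
    """Change of one control vertex according to (12) and (13).

    V_t, V_t1, V_d: evaluation values of s_t, s_{t+1} and s_d.
    y: controller output that caused the transition.
    weight: product of the membership values of the rule being adapted.
    """
    S = sign(V_t - V_t1) * sign(V_t1 - V_d) * sign(y)
    return S * eps * abs(V_t1 - V_d) * weight


# Hypothetical evaluation values; larger means better, the desired value is -2
delta_a = vertex_update(V_t=-4.0, V_t1=-6.0, V_d=-2.0, y=1.5, weight=0.5)  # state got worse
delta_b = vertex_update(V_t=-4.0, V_t1=-3.0, V_d=-2.0, y=1.5, weight=0.5)  # right way, too weak
delta_c = vertex_update(V_t=-4.0, V_t1=-1.0, V_d=-2.0, y=1.5, weight=0.5)  # past the desired value
```

For a positive output y, cases (a) and (c) yield a negative change and case (b) a positive one, matching the directions −sign(y), sign(y), −sign(y) listed for the three cases.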
Learning of Cart-Pole Balancing
The pendulum possesses an initial state $(\theta, \dot\theta)$. To be solved is the force f to be exerted, which is able to bring the cart-pole system to the balanced final state $\theta = 0$ and $\dot\theta = 0$.
The inputs of the system are:
• angle: $\theta$ (°) ∈ [−15, +15] and
• angular velocity: $\dot\theta$ (°/s) ∈ [−20, +20].
Each of the two input variables is covered with 7 B-spline basis functions of order 3.
The output of the system is the force f to be exerted on the cart.
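For learning we choose the evaluation function as

$$
V(s_t) = V(\theta, \dot\theta) \stackrel{\text{def}}{=} -|2\theta + \dot\theta|,
$$

and the relation between the evaluation functions of the desired state $s_d$ and the current state $s_t$ as

$$
V(s_d) \stackrel{\text{def}}{=} 0.5 \cdot V(s_t).
$$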
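As a small worked example, the two definitions can be evaluated for a concrete state. The function names are hypothetical; the state (θ = −10, θ̇ = 10) is the starting state used in the validation run.

```python
def evaluate(theta, theta_dot):
    """Evaluation function V(theta, theta_dot) = -|2 * theta + theta_dot|."""
    return -abs(2.0 * theta + theta_dot)


def desired_value(theta, theta_dot):
    """Evaluation value assigned to the desired state: half the current value."""
    return 0.5 * evaluate(theta, theta_dot)


V_t = evaluate(-10.0, 10.0)        # -|2 * (-10) + 10| = -10
V_d = desired_value(-10.0, 10.0)   # 0.5 * (-10) = -5
```

Because V is never positive, halving it moves the value toward 0, so the desired state is always rated at least as good as the current one, and V = 0 is reached on the line 2θ + θ̇ = 0, which contains the balanced state.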
CP-Balancing: Control Surfaces
Control surfaces at the beginning, after 100 learning steps, and after 3000 learning steps.
CP-Balancing - Validation
The motion profiles of the pendulum (angle, angular velocity, and applied force) from the starting state ($\theta = -10$, $\dot\theta = 10$).
Inverted Pendulum: I
Problem: balance the pendulum P by controlling the motor M.
Input: two state variables:
• angle $\theta$;
• angular velocity $\dot\theta$, taken as the difference $\Delta\theta_t = \theta_t - \theta_{t-1}$
Output: one control variable, motor current → motor velocity v
Quantization of the three linguistic variables into seven fuzzy sets (linguistic terms) each:
{NB, NM, NS, Z, PS, PM, PB}
Example: rule (NM, Z; PM)
If the angle θ is in its medium negative range and the angular velocity θ̇ is approximately zero,
then the motor velocity v should be in its medium positive range.
The rule base in table form:
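Rows: ∆θ; columns: θ. Empty cells carry no rule. Only the Z row is complete in the flattened original; the column positions in the other rows are assumed from the usual layout of this rule base.

| ∆θ \ θ | NB | NM | NS | Z | PS | PM | PB |
|---|---|---|---|---|---|---|---|
| NB | | | | PB | | | |
| NM | | | | PM | | | |
| NS | | | | PS | NS | | |
| Z | PB | PM | PS | Z | NS | NM | NB |
| PS | | | PS | NS | | | |
| PM | | | | NM | | | |
| PB | | | | NB | | | |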
KHEPERA Miniature Robot
• Motorola 68331 microcontroller
• 128 KByte RAM, 128 KByte ROM
• Connection to the outside world via a serial cable
• 2 stepper motors, 600 steps/revolution, i.e. one step corresponds to 1/12 mm
• 8 proximity sensors (infrared), Siemens SFH900, maximum sensing range 5 cm
KHEPERA: Sensors
Input for the controller: IR sensors
0: SL85, 1: SL45, mean of 2 and 3: SLR0,
4: SR45, 5: SR85
Sensor readings versus distance:
The Obstacle Avoidance Problem
Output: velocities of the left and right motors ⇒ robot velocity v, steering angle s
Goal: collision avoidance, i.e. driving around obstacles as “smoothly” as possible
Structure of the fuzzy controller:
Membership Functions of the Inputs and Outputs
IR sensor values:
Robot velocity v:
Steering angle s:
The Rules of the System: I
Evasive maneuver in free space when an obstacle is detected from the right:
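Fuzzy input variables: SL85, SL45, SLR0, SR45, SR85. Output variables: speed, steer.

| SL85 | SL45 | SLR0 | SR45 | SR85 | speed | steer |
|---|---|---|---|---|---|---|
| vl | vl | vl | vl | low | high | n |
| vl | vl | vl | low | low | low | nm |
| vl | vl | low | low | low | low | nb |
| vl | low | low | low | low | low | nb |
| vl | vl | vl | vl | high | low | nm |
| vl | vl | vl | low | high | vl | nb |
| vl | vl | low | low | high | vl | nb |
| vl | vl | vl | high | high | vl | nb |
| vl | vl | high | high | high | vl | nb |
| vl | vl | vl | vl | vh | vl | nb |
| vl | vl | vl | low | vh | vl | nb |
| vl | vl | vl | high | vh | vl | nb |
| vl | vl | low | high | vh | vl | nb |
| vl | low | high | high | vh | vl | nb |
| vl | vl | vl | vh | vh | vl | nb |
| vl | vl | low | vh | vh | vl | nb |
| vl | vl | vh | vh | vh | vl | nb |
| vl | low | vh | vh | vh | vl | nb |
| low | high | vh | vh | vh | vl | nb |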
Autonomous Mobile Robots: 1
Goal: goal-directed driving and collision avoidance
Special features:
• Fuzzification of the sensor signals;
(b) Laser range finder
Autonomous Mobile Robots: 2
• Fuzzy rules for realizing behavior patterns (“behaviors”);
GO → SC 1 rule
OP → SC 4 rules
GO → TC 3 rules
“Far” OP → TC 2 rules
“Near” OP → TC 2 rules
“Very close” OP → TC 3 rules
where SC (“speed control”) and TC (“turn control”) are functions of GO (“goal orientation”) and OP (“obstacle proximity”).
Autonomous mobile robots: 3
• Representation of the “goal-tracking” behavior:
• On-board VLSI chip
→ All rules can be processed in 30 µs.
Reinforcement Learning
In every control cycle the robot receives both sensor data and a reinforcement signal; it then performs an action that changes its state.
Reinforcement learning lies between supervised and unsupervised learning.
The robot agent can also learn from a “delayed reinforcement” signal. In that case an action is rewarded even if it led to the goal only indirectly. This can happen when the action had to be carried out so that further actions toward the goal state could be performed.
Skill acquisition by a robot
Skill acquisition: “Improvement of motor or cognitive abilities through training. Reading a set of instructions provides only the initial knowledge, which must then be successively improved and refined.”
(Carbonell et al. 1983)
Illustration of the reinforcement learning problem:
Markov decision process (MDP)
An MDP is given by
• a set $S$ of discrete states,
• a set $A$ of possible actions,
• a reward function $r_t = r(s_t, a_t)$,
• a successor function $s_{t+1} = \delta(s_t, a_t)$.
The functions $r$ and $\delta$ are part of the environment and are not necessarily known to the agent.
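The four components listed above can be written down as a small data structure. The following is only a sketch under stated assumptions: the class name `DeterministicMDP`, the table-based storage, and the three-state corridor example are hypothetical and are not taken from the lecture.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

State = str
Action = str


@dataclass
class DeterministicMDP:
    """Deterministic MDP: states S, actions A, reward r, successor delta."""
    states: FrozenSet[State]
    actions: FrozenSet[Action]
    reward_table: Dict[Tuple[State, Action], float]
    successor_table: Dict[Tuple[State, Action], State]

    def r(self, s: State, a: Action) -> float:
        """Reward function r(s, a)."""
        return self.reward_table[(s, a)]

    def delta(self, s: State, a: Action) -> State:
        """Successor function delta(s, a)."""
        return self.successor_table[(s, a)]


# Hypothetical corridor s0 - s1 - goal; the goal state is absorbing.
corridor = DeterministicMDP(
    states=frozenset({"s0", "s1", "goal"}),
    actions=frozenset({"left", "right"}),
    reward_table={
        ("s0", "left"): 0.0, ("s0", "right"): 0.0,
        ("s1", "left"): 0.0, ("s1", "right"): 100.0,
        ("goal", "left"): 0.0, ("goal", "right"): 0.0,
    },
    successor_table={
        ("s0", "left"): "s0", ("s0", "right"): "s1",
        ("s1", "left"): "s0", ("s1", "right"): "goal",
        ("goal", "left"): "goal", ("goal", "right"): "goal",
    },
)
```

In a learning setting the agent would not read these tables directly; it would only observe the rewards and successor states that result from its own actions.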
Graph of a Markov decision process
The problem of incomplete state information
Such states are also called hidden states.
Example of incomplete state information:
a) b) c) d)
Sequence of steps in the MDP
At every time step $t$ the agent goes through the following steps:
1. Determine the current state $s_t$.
2. Choose an action $a_t$.
3. Execute $a_t$.
4. Receive the reward $r_t = r(s_t, a_t)$.
In response to $a_t$, the environment moves to a new state $s_{t+1} = \delta(s_t, a_t)$.
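The four steps above can be sketched as a simple loop. This is an illustrative example, not the lecture's implementation: the one-dimensional world with cells 0 to 3, the functions `delta`, `r`, and `always_right`, and the reward value 1.0 are all assumptions made for the sketch.

```python
GOAL = 3  # hypothetical goal cell on a line of cells 0..3


def delta(s, a):
    """Successor function: move by a (-1 or +1), clipped to 0..GOAL; the goal is absorbing."""
    if s == GOAL:
        return s
    return min(max(s + a, 0), GOAL)


def r(s, a):
    """Reward function: 1.0 for the step that enters the goal, 0.0 otherwise."""
    return 1.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def always_right(s):
    """A fixed action-selection rule, used here for step 2."""
    return +1


def run(select_action, s0, steps):
    """Run the agent-environment loop and record (s_t, a_t, r_t) for each step."""
    s = s0                        # 1. determine the current state s_t
    trace = []
    for _ in range(steps):
        a = select_action(s)      # 2. choose an action a_t
        reward = r(s, a)          # 3./4. execute a_t and receive r_t = r(s_t, a_t)
        trace.append((s, a, reward))
        s = delta(s, a)           # the environment moves to s_{t+1} = delta(s_t, a_t)
    return trace


trace = run(always_right, s0=0, steps=3)
```

Starting in cell 0, only the third step enters the goal, so only the last entry of `trace` carries a nonzero reward.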
Policy
A function $\pi : S \to A$ is called a policy.
It represents a strategy by which the agent, in a given state $s$, selects an action $a = \pi(s)$.
The task is to learn this function $\pi$.
Cumulative value
The cumulative value $V^{\pi}(s_t)$ is the cumulative reward that the agent obtains when it starts from a state $s_t$ and follows a policy $\pi$.
There are several definitions of $V^{\pi}(s_t)$, which take future rewards into account in different ways.
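Definitions of $V^{\pi}(s_t)$
• “Discounted cumulative reward”: $V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}$
• “Finite horizon reward”: $V^{\pi}(s_t) = \sum_{i=0}^{h} r_{t+i}$
• “Average reward”: $V^{\pi}(s_t) = \lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$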
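To make the difference between the three definitions concrete, the sketch below evaluates them on a short, made-up reward sequence. The function names and the sample rewards are assumptions. The infinite sum and the limit are replaced by their finite counterparts, so the discounted value is exact only if all later rewards are zero, and the average is the expression inside the limit for one fixed $h$.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**i * r_{t+i} over the given (finite) reward sequence."""
    return sum(gamma ** i * reward for i, reward in enumerate(rewards))


def finite_horizon_return(rewards, h):
    """Sum of r_{t+i} for i = 0..h (inclusive, as in the slide's formula)."""
    return sum(rewards[: h + 1])


def average_reward(rewards, h):
    """(1/h) * sum of r_{t+i} for i = 0..h, i.e. the term inside the limit for fixed h."""
    return sum(rewards[: h + 1]) / h


# Hypothetical reward sequence r_t, r_{t+1}, ...: a single reward of 100 at step t+2.
rewards = [0.0, 0.0, 100.0, 0.0]

v_discounted = discounted_return(rewards, gamma=0.9)   # 0.9**2 * 100
v_finite = finite_horizon_return(rewards, h=2)         # 0 + 0 + 100
v_average = average_reward(rewards, h=2)               # (0 + 0 + 100) / 2
```

With discounting, the same reward counts for less the later it arrives, which is how a delayed reward still influences the value of earlier states.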
Optimal policy
A policy that maximizes $V^{\pi}(s)$ for all states $s$ is called an optimal policy $\pi^{*}$:
$$\pi^{*} \equiv \arg\max_{\pi} V^{\pi}(s), \quad \forall s$$
The cumulative value of an optimal policy is also denoted by $V^{*}(s)$:
$$V^{*}(s) \equiv V^{\pi^{*}}(s)$$
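Learning the optimal policy
The definition of $V^{\pi}(s_t)$,
$$V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i},$$
splits into the immediate reward plus the discounted value of the successor state, $V^{\pi}(s_t) = r_t + \gamma V^{\pi}(s_{t+1})$. It follows immediately for $\pi^{*}(s)$:
$$\pi^{*}(s) = \arg\max_{a} \left[ r(s, a) + \gamma V^{*}(\delta(s, a)) \right]$$
That is, the optimal policy can be learned by learning $V^{*}$, provided that $r$ and $\delta$ are known.
But this is often not the case!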
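For the case where $r$ and $\delta$ are known, the following sketch computes $V^{*}$ and then reads off $\pi^{*}$ with the arg-max formula above. It uses value iteration, a standard dynamic-programming scheme that the slide does not spell out, so treat the method, the three-state corridor, and the names `R`, `DELTA`, `value_iteration`, and `greedy_policy` as assumptions.

```python
GAMMA = 0.9
STATES = ["s0", "s1", "goal"]
ACTIONS = ["left", "right"]

# Hypothetical known model: corridor s0 - s1 - goal, with an absorbing goal state.
DELTA = {
    ("s0", "left"): "s0", ("s0", "right"): "s1",
    ("s1", "left"): "s0", ("s1", "right"): "goal",
    ("goal", "left"): "goal", ("goal", "right"): "goal",
}
R = {key: 0.0 for key in DELTA}
R[("s1", "right")] = 100.0  # the only rewarded transition


def value_iteration(sweeps=100):
    """Repeat V(s) <- max_a [r(s, a) + gamma * V(delta(s, a))] for a fixed number of sweeps."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {
            s: max(R[(s, a)] + GAMMA * v[DELTA[(s, a)]] for a in ACTIONS)
            for s in STATES
        }
    return v


def greedy_policy(v):
    """pi*(s) = argmax_a [r(s, a) + gamma * V*(delta(s, a))]."""
    return {
        s: max(ACTIONS, key=lambda a: R[(s, a)] + GAMMA * v[DELTA[(s, a)]])
        for s in STATES
    }


V_STAR = value_iteration()
PI_STAR = greedy_policy(V_STAR)
```

Both functions read `R` and `DELTA` directly, which is exactly the knowledge the agent often lacks.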
Model-based or model-free?
Model-based reinforcement learning:
e.g. with dynamic programming.
Compare with A* search.
Application example: collision-free path planning with a known representation of the environment.
Model-free reinforcement learning:
$r$ and $\delta$ are unknown.
⇒ Q-learning
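The slide only names Q-learning. As a rough illustration of the model-free idea, the sketch below uses the standard tabular update for deterministic environments, $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$. This update rule, the random exploration scheme, the corridor environment, and all names (`step`, `q_learning`, `greedy`) are assumptions for the sketch, not content of the slide. The learner calls `step` as a black box and never inspects $r$ or $\delta$.

```python
import random

GAMMA = 0.9
STATES = ["s0", "s1", "goal"]
ACTIONS = ["left", "right"]


def step(s, a):
    """Hypothetical environment: returns (reward, next_state); unknown to the learner."""
    if s == "goal":
        return 0.0, "goal"
    if s == "s0":
        return (0.0, "s1") if a == "right" else (0.0, "s0")
    # s == "s1"
    return (100.0, "goal") if a == "right" else (0.0, "s0")


def q_learning(episodes=500, max_steps=10, seed=0):
    """Tabular Q-learning with purely random action selection."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(["s0", "s1"])
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)
            reward, s_next = step(s, a)
            # Deterministic update: Q(s, a) <- r + gamma * max_a' Q(s', a')
            q[(s, a)] = reward + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next
            if s == "goal":
                break
    return q


Q = q_learning()
# Greedy policy with respect to the learned Q-values.
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

The greedy policy is read directly from the Q-table, so no model of the environment is needed at decision time either.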