
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 53, NO. 2, APRIL 2006 693

A Floating-Point FPGA-Based Self-Tuning Regulator

Zoran Salcic, Senior Member, IEEE, Jiaying Cao, and Sing Kiong Nguang, Senior Member, IEEE

Abstract: The recursive-least-squares (RLS) algorithm is widely used in many areas, with real-time implementations typically running on digital signal processors. In this paper, the authors present a pure hardware implementation of a self-tuning regulator (STR) that uses a real-time RLS algorithm as the parameter estimator. The STR also contains a controller design circuit and a controller circuit. Because of the computation-precision and dynamic-range requirements of RLS, the hardware implementation uses a floating-point format. The floating-point processing elements presented in this paper use a parameterized design, in which the numbers of exponent and mantissa bits can be changed as the data range and the accuracy of a specific application require. Strategies are presented for overcoming the covariance-matrix asymmetry that arises during hardware computation and for resetting the covariance matrix when the system is poorly excited. The design was verified with real-time experiments using a new testbed, and the experimental results are presented.

Index Terms: Digital signal processor (DSP), estimation algorithm, field-programmable gate array (FPGA), floating-point arithmetic, floating-point format, recursive least squares (RLS), self-tuning regulator (STR).

I. INTRODUCTION

ADAPTIVE-CONTROL strategies are widely used in many areas such as motor controllers, tracking cameras, smart antennas, and filters for crosstalk removal and channel estimation [1]–[6]. Most strategies have been realized using single or multiple digital signal processors (DSPs) and have faced implementation problems in terms of performance and accuracy of computation. A self-tuning regulator (STR) with an online recursive-least-squares (RLS) estimator is a promising adaptive-control strategy for many applications, and its digital implementation naturally lands on DSP platforms. All digital STR systems include an estimation circuit, a controller design circuit, and a controller circuit, as shown in Fig. 1. The block labeled "estimation circuit" estimates the process parameters online. The block labeled "controller design circuit" contains the algorithms required to design a controller with a specified method and a few design parameters that can be chosen externally. The block labeled "controller circuit" is an implementation of the controller whose parameters are obtained from the controller design circuit [7].

In order to overcome the shortcomings of DSP processors in the implementation of the STR (primarily sequential processing due to program execution and insufficient computing power), an STR using a full hardware implementation on a field-programmable gate array (FPGA) with fixed-point arithmetic was proposed and implemented [8], [9]. However, fixed-point arithmetic has been a limiting factor for most applications because of two major problems: 1) the limited dynamic range of computation and 2) the inflexibility to customize the hardware circuit once the features of the application are known. More discussion on this is presented in Section IV.

Manuscript received June 2, 2004; revised August 3, 2004. Abstract published on the Internet January 25, 2006.

The authors are with the Department of Electrical and Computer Engineering, The University of Auckland, Auckland 1020, New Zealand (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIE.2006.871702

FPGAs have become quite common not only as a prototyping technology but also as an implementation technology for digital systems. FPGAs are a competitive alternative for high-performance DSP applications, previously dominated by general-purpose DSP processors. One example is the methodology of scalar-based direct algorithm mapping (SBDAM) onto an FPGA, which was used in the implementation of a Kalman filter [10]. The approach is based on developing matrix equations into scalar form and on using only simple arithmetic operations for the implementation. A similar approach, but extended to floating-point arithmetic, was introduced in [11] and is further developed in this paper for the implementation of the STR. The new STR uses a floating-point parameterized data path, which can easily be customized for a specific application, and a minimal number of arithmetic processing elements (adder, multiplier, multiplier/accumulator, and divider), which are used in a serialized manner. This version of the controller is called the floating-point serial STR (FSSTR). The option of using more such units to exploit the internal parallelism of the STR is left for future studies.

Exponential forgetting is a way to discard old data exponentially. However, exponential forgetting in combination with poor excitation can result in an undesirable effect called covariance windup [12]. To overcome this in the FSSTR, a conditional covariance resetting algorithm is used, which provides the desired system behavior, as will be demonstrated in a number of experiments.

In Section II, we introduce the mathematical models used to describe the STR system. In this paper, consideration is restricted to second-order systems only. Section III shows the scalar form of the RLS algorithm, which is suitable for hardware implementation. Section IV describes the floating-point format used in our design, the FSSTR design, and the design of the individual processing elements (components). The STR has been developed within a flexible design and testbed platform, which is also described in Section IV. The implemented testbed offers a unique environment that fully combines a hardware implementation of an FSSTR with communication with a process model simulated in software. The experiment settings and results are shown in Section V. The discussion, conclusion, and future study are presented in Section VI.

0278-0046/$20.00 © 2006 IEEE



Fig. 1. Block diagram of an STR.

II. BASIC SYSTEM MODEL

In this section, we present the mathematical models of each part of the STR system from Fig. 1.

A. Process Model

It is assumed that the process is described by a single-input single-output (SISO) system

$$A(q^{-1})\,y(t) = B(q^{-1})\,\bigl(u(t) + v(t)\bigr) \tag{1}$$

where $y(t)$ is the output, $u(t)$ is the input, $v(t)$ is the disturbance, and $q^{-1}$ is the delay operator. $A(q^{-1})$ and $B(q^{-1})$ are polynomials in $q^{-1}$. It is assumed that $A(q^{-1})$ and $B(q^{-1})$ do not have any common factors. For simplicity, the disturbance $v(t)$ is assumed to be zero. Thus, process model (1) can be rewritten explicitly as

$$y(t) = -a_1 y(t-1) - a_2 y(t-2) - \cdots - a_n y(t-n) + b_0 u(t-d_0) + \cdots + b_m u(t-d_0-m). \tag{2}$$

Noting that this model is linear in the process parameters, the model is rewritten as

$$y(t) = \varphi^{T}(t-1)\,\theta \tag{3}$$

where

$$\theta^{T} = [a_1, a_2, \ldots, a_n, b_0, \ldots, b_m] \tag{3a}$$

and

$$\varphi^{T}(t-1) = [-y(t-1), \ldots, -y(t-n),\; u(t-d_0), \ldots, u(t-d_0-m)]. \tag{3b}$$

B. RLS

The estimation circuit is the heart of an STR. The RLS algorithm with exponential forgetting is used in this circuit. It recursively estimates the unknown process parameters at each measurement by minimizing the least-squares error. The whole algorithm involves the following matrix computations:

$$\varepsilon(t) = y(t) - \varphi^{T}(t-1)\,\hat{\theta}(t-1) \tag{4a}$$

$$K(t) = P(t-1)\,\varphi(t-1)\,\bigl[\lambda + \varphi^{T}(t-1)\,P(t-1)\,\varphi(t-1)\bigr]^{-1} \tag{4b}$$

$$\hat{\theta}(t) = \hat{\theta}(t-1) + K(t)\,\varepsilon(t) \tag{4c}$$

$$P(t) = \bigl[I - K(t)\,\varphi^{T}(t-1)\bigr]\,P(t-1)/\lambda \tag{4d}$$

where $\hat{\theta}^{T}(t) = [\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_n, \hat{b}_0, \ldots, \hat{b}_m]$ is the estimated process-parameter vector, $\lambda$ is the forgetting factor, $K(t)$ is the Kalman gain, $\varepsilon(t)$ is the error in predicting the signal $y(t)$ one step ahead based on the estimate $\hat{\theta}(t-1)$, and $P(t)$ is the error covariance. In general, the estimation circuit operates at each process-output sample instant as follows.

1) New data $y(t)$ and $\varphi(t)$ are acquired, and the prediction error is computed from the old parameter estimate using (4a).

2) Equation (4b) is used to update the Kalman gain vector $K(t)$.

3) The new parameter estimate $\hat{\theta}(t)$ is calculated using (4c).

4) $P(t)$ is updated by (4d) for the next sample.
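The four estimation steps can be sketched in software as follows. This is a minimal double-precision illustration of (4a)–(4d), not the hardware data path of the paper. The plant coefficients, the initial covariance of 1e4, the forgetting factor of 0.98, and the pseudo-random binary input in `estimate_demo` are assumptions chosen only to give a well-excited, noise-free example.

```python
import random


def rls_update(theta, P, phi, y, lam):
    """One RLS step with exponential forgetting, following (4a)-(4d)."""
    n = len(theta)
    # (4a): prediction error from the previous estimate
    eps = y - sum(phi[k] * theta[k] for k in range(n))
    # (4b): gain K = P*phi / (lam + phi^T * P * phi)
    P_phi = [sum(P[i][k] * phi[k] for k in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    K = [v / denom for v in P_phi]
    # (4c): parameter update
    theta_new = [theta[i] + K[i] * eps for i in range(n)]
    # (4d): P = (I - K*phi^T) * P / lam
    phiT_P = [sum(phi[k] * P[k][j] for k in range(n)) for j in range(n)]
    P_new = [[(P[i][j] - K[i] * phiT_P[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new, eps


def estimate_demo(steps=300, lam=0.98, seed=0):
    """Estimate a hypothetical noise-free second-order plant (d0 = 1, m = 1)."""
    a1, a2, b0, b1 = -1.5, 0.7, 1.0, 0.5          # illustrative true parameters
    rng = random.Random(seed)
    theta = [0.0] * 4
    P = [[1e4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    y1 = y2 = u1 = u2 = 0.0                       # y(t-1), y(t-2), u(t-1), u(t-2)
    for _ in range(steps):
        y = -a1 * y1 - a2 * y2 + b0 * u1 + b1 * u2
        phi = [-y1, -y2, u1, u2]                  # regressor in the form of (3b)
        theta, P, _ = rls_update(theta, P, phi, y, lam)
        u = 1.0 if rng.random() < 0.5 else -1.0   # exciting binary input (assumed)
        y2, y1, u2, u1 = y1, y, u1, u
    return theta, [a1, a2, b0, b1]
```

Calling `estimate_demo()` returns the estimated and the true parameter vectors, which can be compared element by element.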

Using this form of the RLS algorithm in a number of experiments, it was found that the estimated plant parameters converged to their true values and that the RLS algorithm remained stable for hundreds of iterations of the estimator. However, it sometimes becomes unstable and is then unable to estimate the system parameters properly, as shown and discussed in Section V.

C. Covariance p_Matrix Resetting Algorithm

The ability of an adaptive controller to track time-varying parameters is an important issue. To provide it, an exponential forgetting factor is introduced into the RLS algorithm. It is based on the assumption that the least-squares loss function is replaced by a loss function in which old data are discarded exponentially. The comparison of RLS with and without the forgetting factor is shown in Section V-C. From the experiment, it is found that RLS with a forgetting factor has a smaller tracking lag than RLS without one. Thus, the forgetting factor can be decreased to reduce the tracking lag. However, when the command signal is constant over a long period of time, the system is poorly excited, and exponential forgetting then causes covariance windup. In Section V, it is shown that the exponential growth of the covariance $P$ causes the parameter estimates to become very inaccurate. In order to overcome this, we add the covariance p_matrix resetting algorithm to the RLS.

In this algorithm, three signals need to be tracked: the command signal, the plant-output signal, and the estimated parameters. The command signal is tracked to determine the excitation status of the system (5a). The p_matrix is reset when the estimated parameters have converged and the plant output follows the command signal, so the command signal and the plant-output signal must be compared (5b). The most important issue is to track the convergence status of the estimated parameters (5c). The condition for resetting the p_matrix is as follows:

$$u_c(t+1) = u_c(t) \tag{5a}$$

$$y(t) \rightarrow u_c(t) \tag{5b}$$

$$\hat{\theta}(t+1) = \hat{\theta}(t). \tag{5c}$$

To ensure that the p_matrix is reset only when the estimated parameters have converged and the plant output follows the command signal, an iteration counter in the FSSTR counts the number of consecutive iterations for which those conditions hold. If the counter reaches a certain fixed value (e.g., 10), the p_matrix is reset. If any one of those conditions is not satisfied, the counter is reset immediately.
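The counter logic described above might be sketched as follows. The class name `ResetMonitor`, the tolerance `tol` used to compare floating-point values, and the choice to restart the counter after a reset are all assumptions for illustration; the paper specifies only the three conditions and the counter threshold.

```python
class ResetMonitor:
    """Counts consecutive iterations in which the reset conditions hold."""

    def __init__(self, threshold=10, tol=0.0):
        self.threshold = threshold   # e.g., 10 iterations, as in the text
        self.tol = tol               # comparison tolerance (assumed)
        self.count = 0

    def step(self, uc_prev, uc_now, y, theta_prev, theta_now):
        """Return True when the p_matrix should be reset at this iteration."""
        constant_cmd = abs(uc_now - uc_prev) <= self.tol            # (5a)
        output_follows = abs(y - uc_now) <= self.tol                # (5b)
        converged = all(abs(a - b) <= self.tol
                        for a, b in zip(theta_now, theta_prev))     # (5c)
        if constant_cmd and output_follows and converged:
            self.count += 1
        else:
            self.count = 0           # any violated condition clears the counter
        if self.count >= self.threshold:
            self.count = 0           # start counting again after a reset (assumed)
            return True
        return False
```

A caller would reload the covariance memory with its initial value whenever `step` returns True.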

D. Controller Circuit

The controller circuit implements the following equation:

$$R\,u(t) = T\,u_c(t) - S\,y(t) \tag{6}$$

where $R$, $S$, and $T$ are polynomials.

E. Controller Design Circuit

The controller design circuit determines the controller that gives the desired closed-loop poles. The minimum-degree pole-placement (MDPP) algorithm for model following with zero cancellation is used in the controller design circuit. The controller parameters are

$$s_1 = \frac{a_{m2} - a_2}{b_0}, \qquad s_0 = \frac{a_{m1} - a_1}{b_0}, \qquad r_1 = \frac{b_1}{b_0}, \qquad t_0 = \frac{b_{m0}}{b_0} \tag{7}$$

where $a_{m2}$, $a_{m1}$, and $b_{m0}$ are the closed-loop system parameters and $t_0$, $r_1$, $s_0$, and $s_1$ are the controller parameters.
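As a worked illustration of (7), the sketch below computes the controller parameters and closes the loop around a second-order plant. The difference-equation form of the control law assumes the polynomial structure R = q + r1, S = s0 q + s1, and T = t0 q, which is a common choice for this design but is not spelled out in the text. The numerical plant and model values in the usage note are hypothetical.

```python
def mdpp_parameters(a1, a2, b0, b1, am1, am2, bm0):
    """Controller parameters from (7)."""
    r1 = b1 / b0
    s0 = (am1 - a1) / b0
    s1 = (am2 - a2) / b0
    t0 = bm0 / b0
    return r1, s0, s1, t0


def control_law(r1, s0, s1, t0, uc, y, y_prev, u_prev):
    """Equation (6) written as a difference equation (assumed polynomial form)."""
    return -r1 * u_prev + t0 * uc - s0 * y - s1 * y_prev


def simulate_closed_loop(a1, a2, b0, b1, am1, am2, bm0, steps=50):
    """Unit-step response of the closed loop and of the reference model."""
    r1, s0, s1, t0 = mdpp_parameters(a1, a2, b0, b1, am1, am2, bm0)
    y1 = y2 = u1 = u2 = 0.0        # plant history: y(t-1), y(t-2), u(t-1), u(t-2)
    ym1 = ym2 = uc1 = 0.0          # model history: ym(t-1), ym(t-2), uc(t-1)
    ys, yms = [], []
    for _ in range(steps):
        uc = 1.0
        y = -a1 * y1 - a2 * y2 + b0 * u1 + b1 * u2    # plant output
        ym = -am1 * ym1 - am2 * ym2 + bm0 * uc1       # desired model output
        u = control_law(r1, s0, s1, t0, uc, y, y1, u1)
        ys.append(y)
        yms.append(ym)
        y2, y1, u2, u1 = y1, y, u1, u
        ym2, ym1, uc1 = ym1, ym, uc
    return ys, yms
```

For example, `simulate_closed_loop(-1.5, 0.7, 1.0, 0.5, -1.0, 0.3, 0.3)` returns two sequences that coincide, because the plant zero at -b1/b0 is cancelled and the remaining closed-loop poles are those of the model.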

III. SECOND-ORDER-SYSTEM SCALAR FORM OF RLS

In this paper, our discussion is restricted to a second-order system. In order to implement a complex matrix equation in hardware, the equation is decomposed into its scalar form. Then, only simple arithmetic operations such as addition, subtraction, multiplication, and division are used in these scalar equations. The scalar form of the RLS algorithm is derived from (4a)–(4d).

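1) Equation (4a) in the scalar form converts to

$$\varepsilon(t) = y(t) - \bigl(\varphi_1(t-1)\hat{\theta}_1(t-1) + \varphi_2(t-1)\hat{\theta}_2(t-1) + \varphi_3(t-1)\hat{\theta}_3(t-1) + \varphi_4(t-1)\hat{\theta}_4(t-1)\bigr). \tag{8}$$

2) Equation (4b) in the scalar form converts to

$$K^{1}_{i}(t) = P_{i1}(t-1)\varphi_1(t-1) + P_{i2}(t-1)\varphi_2(t-1) + P_{i3}(t-1)\varphi_3(t-1) + P_{i4}(t-1)\varphi_4(t-1) \tag{9}$$

$$A = \lambda + \varphi^{T}(t-1)\,P(t-1)\,\varphi(t-1) = \lambda + K^{1}_{1}(t)\varphi_1(t-1) + K^{1}_{2}(t)\varphi_2(t-1) + K^{1}_{3}(t)\varphi_3(t-1) + K^{1}_{4}(t)\varphi_4(t-1) \tag{10}$$

$$K_i(t) = \frac{K^{1}_{i}(t)}{A}. \tag{11}$$

3) Equation (4c) in the scalar form converts to

$$\hat{\theta}_i(t) = \hat{\theta}_i(t-1) + K_i(t)\,\varepsilon(t). \tag{12}$$

4) Equation (4d) in the scalar form converts to

$$p_{ij}(t) = \frac{p_{ij}(t-1) - K_i(t)\,K^{1}_{j}(t)}{\lambda}. \tag{13}$$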

By counting the number of operations, it can be seen that 44 multiplications, 40 additions or subtractions, and 20 divisions are required for a fully parallel implementation of the estimation block. In order to reduce the number of processing elements, a decision was made to implement the circuit with a single shared copy (component) of each of the processing elements. The same approach was used in our initial fixed-point implementation of the STR presented in [8].
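The operation count can be checked with a small sketch that evaluates the scalar equations (8)–(13) for a four-parameter estimator while tallying each arithmetic operation. The helper class `Ops` and the function name are hypothetical, and the code runs in ordinary double precision rather than the floating-point format of the hardware.

```python
class Ops:
    """Arithmetic helpers that count how often each operation is used."""

    def __init__(self):
        self.mul = 0
        self.addsub = 0
        self.div = 0

    def m(self, a, b):
        self.mul += 1
        return a * b

    def a(self, a, b):
        self.addsub += 1
        return a + b

    def s(self, a, b):
        self.addsub += 1
        return a - b

    def d(self, a, b):
        self.div += 1
        return a / b


def scalar_rls_step(theta, P, phi, y, lam, ops):
    """One estimator iteration using only scalar operations, (8)-(13)."""
    # (8): prediction error
    acc = ops.m(phi[0], theta[0])
    for k in range(1, 4):
        acc = ops.a(acc, ops.m(phi[k], theta[k]))
    eps = ops.s(y, acc)
    # (9): intermediate gains K1_i
    K1 = []
    for i in range(4):
        acc = ops.m(P[i][0], phi[0])
        for k in range(1, 4):
            acc = ops.a(acc, ops.m(P[i][k], phi[k]))
        K1.append(acc)
    # (10): scalar denominator A
    A = lam
    for k in range(4):
        A = ops.a(A, ops.m(K1[k], phi[k]))
    # (11): Kalman gains
    K = [ops.d(K1[i], A) for i in range(4)]
    # (12): parameter update
    theta_new = [ops.a(theta[i], ops.m(K[i], eps)) for i in range(4)]
    # (13): covariance update, all 16 elements
    P_new = [[ops.d(ops.s(P[i][j], ops.m(K[i], K1[j])), lam)
              for j in range(4)] for i in range(4)]
    return theta_new, P_new, eps
```

After one call, the counters in `ops` hold the numbers of multiplications, additions or subtractions, and divisions used by a single iteration.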

IV. FSSTR

The major problem with fixed-point number representation is that it can represent only a limited range of data, which causes inaccuracies in the algorithm implementation. A fixed binary- or radix-point representation allows numbers with a fractional component to be represented, but the dynamic range is limited, causing problems in representing very small and very large numbers. In our early implementation of the STR, a fixed-point format with radix point 0 was used. First, the true plant parameters were chosen, and the STR was implemented in MATLAB. The position of the radix point for the fixed-point STR was then chosen according to the computation data collected from MATLAB. The MATLAB simulation determined the data range of the implementation, which varies with different sets of true plant parameters. Obviously, this is not feasible in real time, as the true plant parameters are not known, which makes an implementation of the STR with fixed-point arithmetic very difficult.




TABLE I
FLOATING-POINT FORMAT

A. Floating-Point Format

In order to avoid the problems of the fixed-point format in our earlier implementation of the STR [8], [9], a floating-point format that combines IEEE Standard 754 with the needs of the STR has been specified and used. It is illustrated in Table I. Here, 8 exponent bits and 16 mantissa bits are chosen.

The floating-point numbers are in the form

$$(-1)^{S} \times M \times 2^{E}$$

where $S$ is the sign bit, $E$ is the exponent field, stored as a 2's-complement number rather than with the bias used in the IEEE standard, and $M$ is the mantissa field, representing numbers in the range between 1 and 2. If the bits of the mantissa from left to right are $M_1, M_2, M_3, \ldots$, then the value of the number is

$$(-1)^{S} \times \bigl(1 + M_1 \times 2^{-1} + M_2 \times 2^{-2} + M_3 \times 2^{-3} + \cdots\bigr) \times 2^{E}.$$

More details on arithmetic processing elements for this number format are given in Section IV-C.
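To make the number format concrete, the following sketch converts between a Python float and the three fields (sign, exponent, mantissa bits) for the 8-bit exponent and 16-bit mantissa chosen here. Truncating the mantissa rather than rounding it, rejecting zero, and the function names are assumptions, since the text does not describe these details.

```python
import math

EXP_BITS = 8     # exponent width used in the paper
MAN_BITS = 16    # mantissa width used in the paper


def encode(x):
    """Return (sign, exponent, mantissa_field) for a nonzero float x."""
    if x == 0.0:
        raise ValueError("zero has no normalized form in this sketch")
    sign = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))     # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    significand = frac * 2.0           # rescale so that 1 <= significand < 2
    exp -= 1
    lo = -(1 << (EXP_BITS - 1))        # 2's-complement exponent range
    hi = (1 << (EXP_BITS - 1)) - 1
    if not lo <= exp <= hi:
        raise OverflowError("exponent does not fit in EXP_BITS")
    mant = int((significand - 1.0) * (1 << MAN_BITS))   # truncate (assumed)
    return sign, exp, mant


def decode(sign, exp, mant):
    """Value of (-1)^S * (1 + mantissa fraction) * 2^E."""
    significand = 1.0 + mant / (1 << MAN_BITS)
    return (-1.0) ** sign * significand * 2.0 ** exp
```

With truncation, the relative representation error of a value that fits the exponent range stays below 2^-16.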

B. FSSTR

The STR implemented in our case is of a serial type, using only a single shared instance of each basic arithmetic element; hence its name, FSSTR. Our goal was an implementation that requires the minimum amount of hardware resources; however, because the processing elements are shared, it is also the lowest speed implementation. The strategy in its design is similar to the one applied in the fixed-point case [8]. The FSSTR global architecture is shown in Fig. 2. The major difference from the previous architecture is the use of five dual-port RAM memories as temporary storage for calculated parameters: p(t)_mem stores the covariance errors; K(t)_mem stores the Kalman gains; φ(t)_mem stores the delay line; θ(t)_mem holds the estimated parameters; and K1(t)_mem stores the intermediate results of computation. The convergence monitor circuit checks the convergence level of the estimated plant parameters and is a part of the control unit. The data_in port is used to input the plant output y(t), the initial values of θ(t), and the initial covariance p(t). The data_out port is used to output u(t) to the plant and to output intermediate results of computation from the memory blocks for analysis in MATLAB, as will be shown in Section V. The address is supplied from an external source (software) through the peripheral-component-interconnect (PCI) bus for fetching data from the memory blocks and bringing them to the external world. The start and finish signals are the handshake signals for communication between the FSSTR hardware and its external world.

C. Building Blocks

1) Floating-Point MAC: The multiply-and-accumulate (MAC) unit is the core computation unit in the FSSTR circuit. Its structure is shown in Fig. 3. It contains a floating-point multiplier, a floating-point adder/subtractor, one register, and four multiplexers (MUXs). Depending on the control signals selected for the MUXs, this circuit can implement a single addition or a multiplication and accumulation.

2) Floating-Point Adder/Subtractor: To perform basic arithmetic operations, a floating-point adder/subtractor, multiplier, and divider have been designed. In floating-point arithmetic, addition and subtraction are more complex than multiplication and division. It is necessary to ensure that both operands have the same exponent before doing the addition, which requires shifting the radix point of one of the operands to achieve alignment. The floating-point adder/subtractor is shown in Fig. 4. The exponent comparator (EXCMP) and the mantissa comparator (MANCMP) are used to determine which operand is larger and by how much. The result controls the mantissa multiplexer (MANMUX), which routes the exponent bits of the larger number to the normalization circuit, the mantissa bits of the larger number to the adder/subtractor, and the mantissa bits of the smaller number to the shift-right register, and selects the sign bit of the larger operand as the final sign-bit output. The normalization step shifts the sum left or right and decrements or increments the exponent accordingly.
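The compare, align, add, and normalize sequence can be illustrated in software on (sign, exponent, mantissa-field) triples with a 16-bit mantissa. This is a behavioral sketch under assumptions: bits shifted out during alignment are simply discarded, exponent overflow is not checked, and an exact cancellation returns `None` because the format as described has no encoding for zero.

```python
MAN_BITS = 16   # mantissa field width used in the paper


def fp_add(a, b):
    """Add two numbers given as (sign, exponent, mantissa_field) triples."""
    sa, ea, ma = a
    sb, eb, mb = b
    sig_a = (1 << MAN_BITS) | ma          # restore the hidden leading one
    sig_b = (1 << MAN_BITS) | mb
    # Comparator stage: make operand a the one with the larger magnitude
    if (ea, sig_a) < (eb, sig_b):
        sa, ea, sig_a, sb, eb, sig_b = sb, eb, sig_b, sa, ea, sig_a
    # Alignment: shift the smaller significand right by the exponent difference
    sig_b >>= (ea - eb)
    # Add for equal signs, subtract otherwise; the larger operand gives the sign
    total = sig_a + sig_b if sa == sb else sig_a - sig_b
    if total == 0:
        return None                        # exact cancellation (no zero encoding)
    exp = ea
    # Normalization: bring the significand back into [1, 2)
    while total >> (MAN_BITS + 1):
        total >>= 1
        exp += 1
    while not (total >> MAN_BITS):
        total <<= 1
        exp -= 1
    return sa, exp, total & ((1 << MAN_BITS) - 1)
```

For instance, adding 1.5 to 1.5 overflows the significand and is normalized with one right shift, whereas subtracting 0.75 from 1.0 needs two left shifts.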

3) Floating-Point Multiplier: Floating-point multiplication is more straightforward than floating-point addition. The multiplication of floating-point numbers $a$ and $b$ can be represented as

$$(a_M \times b_M) \times 2^{(a_E + b_E)}.$$

Fig. 5 illustrates the architecture of the floating-point multiplier. The sign of the result is found simply by XORing the sign bits of the multiplicands. The exponent is the sum of the two exponents, added in 2's-complement format. The multiplication of the fractions is an unsigned integer multiplication of the two mantissas. The normalization is implemented with a 1-bit shift if the multiplication overflows.
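The multiplier steps map directly onto integer operations. The sketch below works on (sign, exponent, mantissa-field) triples with a 16-bit mantissa; truncating the low product bits and ignoring exponent overflow are assumptions, as the text does not specify rounding or overflow handling.

```python
MAN_BITS = 16   # mantissa field width used in the paper


def fp_multiply(a, b):
    """Multiply two numbers given as (sign, exponent, mantissa_field) triples."""
    sa, ea, ma = a
    sb, eb, mb = b
    sign = sa ^ sb                          # XOR of the sign bits
    exp = ea + eb                           # sum of the exponents
    sig_a = (1 << MAN_BITS) | ma            # restore the hidden leading one
    sig_b = (1 << MAN_BITS) | mb
    prod = sig_a * sig_b                    # unsigned product, value in [1, 4)
    if prod >> (2 * MAN_BITS + 1):          # product >= 2: normalize by one bit
        prod >>= 1
        exp += 1
    mant = (prod >> MAN_BITS) & ((1 << MAN_BITS) - 1)   # truncate, drop hidden one
    return sign, exp, mant
```

Multiplying 1.5 by 1.5 gives 2.25, which triggers the 1-bit normalization shift and yields a significand of 1.125 with exponent 1.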

4) Floating-Point Divider: Because the division element has the longest propagation delay, and taking advantage of the need for division by a constant, the division $A/B$ is converted in the STR into the multiplication $A \times (1/B)$ and implemented using a look-up table (LUT), as illustrated in Fig. 6. For the purpose of normalization, 2/mantissa is stored in the LUT instead of 1/mantissa. Since the mantissa field represents numbers in the range between 1 and 2, 1/mantissa is a number less than 1, whereas 2/mantissa is a number in the range between 1 and 2. For the same reason, the exponent field of the result is −1 − (exponent of the input number).
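The reciprocal look-up can be sketched as follows. The table size (indexed by the top 8 mantissa bits) and the use of the interval midpoint for each entry are assumptions made for this illustration; the text states only that 2/mantissa is stored and that the result exponent is −1 − E.

```python
MAN_BITS = 16   # mantissa field width used in the paper
LUT_BITS = 8    # number of mantissa bits used as LUT address (assumed)


def build_reciprocal_lut():
    """Table of mantissa fields for 2/m, one entry per mantissa interval."""
    lut = []
    for idx in range(1 << LUT_BITS):
        m = 1.0 + (idx + 0.5) / (1 << LUT_BITS)   # interval midpoint, 1 < m < 2
        two_over_m = 2.0 / m                      # lies strictly between 1 and 2
        lut.append(int((two_over_m - 1.0) * (1 << MAN_BITS)))
    return lut


def fp_reciprocal(x, lut):
    """Approximate 1/x for x given as (sign, exponent, mantissa_field)."""
    sign, exp, mant = x
    idx = mant >> (MAN_BITS - LUT_BITS)           # top mantissa bits address the LUT
    return sign, -1 - exp, lut[idx]               # 1/(m * 2^E) = (2/m) * 2^(-1-E)


def decode(sign, exp, mant):
    """Value of a (sign, exponent, mantissa_field) triple."""
    return (-1.0) ** sign * (1.0 + mant / (1 << MAN_BITS)) * 2.0 ** exp
```

A division A/B is then one call to `fp_reciprocal` followed by a floating-point multiplication; with an 8-bit address the reciprocal is accurate to roughly 2^-9 in relative terms.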

D. Memory Blocks

There are five dual-port memory blocks in the FSSTR. These memory blocks have an 8-bit address bus and a data width of 32 bits. P(t)_mem stores p(t), the initial covariance p(0), and am2 and am1. The top 16 locations of this memory store p(t), and the next 16 locations store p(0). φ(t)_mem stores the delay line and the reference input uc. K1(t)_mem stores the intermediate data K1(t), A, ε(t), r1, and bm0. θ(t)_mem stores the estimated plant parameters and λ. K(t)_mem stores the Kalman gain. Five dual-port memories are used instead of one memory block because they give faster access to the memory and allow different processing elements to access memory locations simultaneously.

Fig. 2. FSSTR global architecture.

Fig. 3. Data path of the MAC unit.

E. LUTs

There are two LUTs used in the hardware circuit. One is for the divider described in Section IV-C4. The other LUT converts an ordinary matrix P address into the address used to fetch and compute only the upper-triangle elements of the p_matrix. An ordinary 4 × 4 matrix and a symmetric 4 × 4 matrix are compared below. Only ten elements are used because the matrix P is symmetric. It is quite easy to generate the addresses for the ordinary 4 × 4 matrix in the control unit. This LUT creates the symmetric p_matrix address from the ordinary address that is received from the control unit. This LUT is stored in (t)_mem.

Fig. 4. Floating-point adder/subtractor.

Ordinary 4 × 4 matrix:

$$\begin{bmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{bmatrix}$$

Symmetric 4 × 4 matrix:

$$\begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 4 & 5 & 6 \\ 2 & 5 & 7 & 8 \\ 3 & 6 & 8 & 9 \end{bmatrix}$$
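The address translation shown in the two matrices can be generated programmatically. The sketch below numbers the upper-triangle elements row by row and maps every ordinary row-major address onto them; the function name is hypothetical.

```python
def symmetric_address_lut(n=4):
    """Map each ordinary row-major address of an n x n matrix to the
    address of the matching upper-triangle element."""
    upper = {}
    next_addr = 0
    for i in range(n):
        for j in range(i, n):            # upper triangle, row by row
            upper[(i, j)] = next_addr
            next_addr += 1
    lut = []
    for i in range(n):
        for j in range(n):
            key = (i, j) if i <= j else (j, i)   # mirror below-diagonal entries
            lut.append(upper[key])
    return lut
```

For n = 4 the table has 16 entries that refer to only ten distinct storage locations, matching the symmetric matrix above.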




Fig. 5. Floating-point multiplier.

Fig. 6. Floating-point divider.

F. Control Unit

The control unit implements a finite-state machine by using two counters (c_1, c_2), a state generator, an address generator, and a control-signal generator, as shown in Fig. 7. Based on the state of the counters, the state generator circuit generates six global states: compute_K1(t), compute_A, compute_ε(t)_K(t), compute_θ(t), compute_p(t), and compute_controller. The behavior of the control-unit finite-state machine is described by the flowchart in Fig. 8. The address generator circuit generates the addresses of the memory blocks accessed in each individual state. The control-signal generator circuit generates the memory write signal, the data selection signals for the MAC, the multiplier, and the divider, and the enable signal for the convergence monitor circuit to check the value of product (t). The start signal is generated by a program running on the PC and is sent via the PCI bus. The finish signal is sent from the FSSTR to the program on the PC via the PCI bus. The convergence signal is supplied by the convergence monitor circuit.

G. FSSTR Prototype Implementation Platform

The FSSTR has been implemented using an FPGA PCI prototyping board. The board has a number of useful features that make it ideal for testing the hardware implementation. The FPGA device used (Altera Flex 10K200S) provides 10 160 logic elements, 98 304 memory bits, and 200 000 typical gates. This device is well suited to memory functions and to integrating entire systems in a single chip [13].

The PCI core is an interface between the FSSTR hardware circuit and the host PC's PCI bus, as shown in Fig. 9. It is used to exchange data between the host PC software and the FSSTR. The ior_mailbox implements a handshaking protocol between the host software and the FSSTR hardware. The host software communicates with the FSSTR via dual-port memory blocks, enabling easy debugging and experiments using custom programs or standard tools such as MATLAB.

V. EXPERIMENTS AND RESULTS

A testbed has been established for performing various experiments with the hardware-implemented FSSTR and is shown in Fig. 10. The PCI board is the rapid prototyping platform for experiments with STRs. The host PC has three functions in the experiments. First, it provides a PCI interface to the PCI board. Second, it executes a program (written in C) that simulates the plant process, produces the plant output for the FSSTR, and collects various intermediate data from the FSSTR. Finally, it runs MATLAB for data analysis and for plotting graphs that illustrate the behavior of the simulated plant and the controller.

The FSSTR estimates parameters in real time and returns them to the C program and MATLAB for comparison and analysis. As the FSSTR design is parameterized, implementation parameters such as the numbers of mantissa and exponent bits can be chosen. In the experiments presented in this paper, we use a 16-bit mantissa and an 8-bit exponent. Thus, all the data collected from the experiments are in a floating-point format with a 1-bit sign, an 8-bit exponent, and a 16-bit mantissa.

In the experiments, it has to be ensured that the p_matrix remains symmetric during the iterative computation. This is difficult to achieve if all 16 elements of the p_matrix are calculated, because hardware computation errors accumulate owing to the limited mantissa length. Therefore, only ten elements of the p_matrix are used and updated. When pij (i > j) is needed, we use pji instead. The algorithm for the symmetric p_matrix hardware implementation is presented in Section IV-E.

The comparison of results when using symmetric and asymmetric covariance matrices is shown in Sections V-A and V-B. The function of the exponential forgetting factor is discussed in Section V-C. The experimental results of the covariance resetting algorithm are presented in Section V-D.

Fig. 7. Control-unit circuit.

Fig. 8. FSSTR behavior description.

A. Time-Invariant Parameters

Several fixed sets of parameters are used in these experiments. The typical results are shown in Table II. The true plant parameters that are stored in the host PC are in bold. The estimated plant parameters are calculated by the FSSTR. From Table II, it can be seen that the estimated plant parameters are very close to the true plant parameters.

Figs. 11 and 12 show the dynamic behavior with asymmetric and symmetric covariance matrices, respectively, for the first set of parameters in Table II. The command signal is a square wave of unit amplitude with a period of 50 samples. Thus, it is assumed that the system is persistently excited. In both cases the estimated parameters reach their true values, but the FSSTR with an asymmetric covariance matrix behaves incorrectly at around 900 iterations in Fig. 11(a). The error oscillates widely, as shown in Fig. 11(c). The comparison between the plant output and the command-signal input uc is shown in Fig. 11(b). The plant output does not follow the command signal all the way through. In Fig. 11(a), it can be seen that the estimated parameter a0 tracks the true parameter very well at the beginning. This is because we initialize the covariance p_matrix as symmetric before the FSSTR starts running. After hundreds of iterations, the covariance p_matrix is no longer symmetric because of the accumulation of hardware computational errors.

Good results are obtained when the symmetric p_matrix algorithm is used in the FSSTR, as shown in Fig. 12. From these figures, it is clear that the FSSTR without the symmetric p_matrix algorithm is unstable.

Fig. 9. Hardware platform in the Flex 10K200S.

Fig. 10. STR testbed.

TABLE II
PARAMETER ESTIMATION EXPERIMENTAL DATA

Fig. 11. Tracking time-invariant parameters with an asymmetric covariance p_matrix algorithm. (a) Parameter a0. (b) Plant output. (c) Error refers to (4a).

Fig. 12. Tracking time-invariant parameters with a symmetric covariance p_matrix algorithm. (a) Parameter a0. (b) Plant output. (c) Error refers to (4a).

B. Time-Varying Parameters

In this section, an experiment is presented in which the true-parameter values change instantaneously by 5% at the 300th and 600th FSSTR iterations. The same command input signal is used, that is, a square wave of unit amplitude and a period of 50 samples, so the system is again excited. The FSSTR with the asymmetric covariance p_matrix cannot track the change of the true parameters; the plant output does not follow the command signal, and the error is high. This is shown in Fig. 13. In Fig. 13(a), the estimated parameters track their first true value, but they cannot track the second and third values. The covariance p_matrix loses its symmetry only gradually, through the accumulation of hardware computation errors, which is why the error is small at the beginning (iterations 0–250), as shown in Fig. 13(c).

In Fig. 14(a), the estimated parameter a0 tracks its true value, but there is a tracking lag between the true and estimated parameter when the true parameter changes. This situation is discussed in the next experiment. The plant output follows the command signal in Fig. 14(b). The error is small except at the 300th and 600th iterations, where the true parameter changes its value, as shown in Fig. 14(c).

Fig. 13. FSSTR without a symmetric covariance p_matrix algorithm. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Error refers to (4a).

Fig. 14. FSSTR with a symmetric covariance p_matrix algorithm. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Error refers to (4a).

Fig. 15. Exponential forgetting factor is 1. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Error refers to (4a).

C. Exponential Forgetting Factor

In the previous two sections, it was shown that the asymmetric covariance p_matrix yields unstable system behavior. In the rest of the experiments, the symmetric covariance p_matrix algorithm is used in the FSSTR. The tracking lag between the estimated- and true-parameter values appears to be due to the use of the exponential forgetting factor in the FSSTR. For the experiment shown in Fig. 14, the exponential forgetting factor was set to 0.9.

In the experiment shown in this section, the same command signal is used as in the experiment in Section V-B. However, the exponential forgetting factor is 1. The estimated parameter a0 tends toward the true-parameter value, but because old data are not discounted, the estimate adapts too slowly to track the changes of the true parameter in Fig. 15(a). The error becomes high after the 300th iteration in Fig. 15(c).

In Fig. 16, the exponential forgetting factor is 0.95. The estimated parameter a0 tracks the change of the true parameter. However, there is a tracking lag between the estimated and true parameters in Fig. 16(a). The tracking lag is bigger than the one shown in Fig. 14.

In Fig. 17, the exponential forgetting factor is 0.7. The estimated parameter a0 tracks the change of the true parameter very well. The tracking lag is reduced significantly, as shown in Fig. 17(a). The error is small except at the 300th and 600th iterations, when the true parameter instantaneously changes its value.

From the experiments, it can be seen that the function of the exponential forgetting factor is to discard old data exponentially.

Fig. 16. Exponential forgetting factor is 0.95. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Error refers to (4a).

Fig. 17. Exponential forgetting factor is 0.7. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Error refers to (4a).

D. Covariance p_Matrix Resetting

In the previous experiment, the command signal was a square wave with unit amplitude and a period of 50 samples. It was also assumed that the system is well excited, and the major concern was the effect of the exponential forgetting factor.

In this section, the ability of the FSSTR to cope with a poorly excited system, in which the command signal is constant over a long period of time, is discussed. Exponential forgetting is a way to discard old data exponentially. However, exponential forgetting in combination with poor excitation can cause an undesirable effect called covariance windup. In the extreme case, there is no excitation at all, that is, $\varphi = 0$. The equations for the estimate then become

$$\hat{\theta}(t+1) = \hat{\theta}(t), \qquad P(t+1) = \frac{1}{\lambda}\,P(t).$$

The estimate will remain constant, and the p_matrix will grow exponentially if




Fig. 18. FSSTR without covariance resetting. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Covariance p0.

Fig. 19. FSSTR with covariance resetting. (a) Estimated parameter a0 and its true parameter. (b) Plant output y and reference input uc. (c) Covariance p0.

forgetting combined with covariance p_matrix resetting is a way to track time-varying parameters.
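The windup mechanism discussed in this section can be illustrated with a one-parameter RLS estimator that receives no excitation. The scalar estimator, the initial values, and the unconditional reset after a fixed number of iterations are simplifying assumptions for this sketch; the FSSTR applies the reset only when its convergence conditions hold.

```python
def rls_scalar(theta, p, phi, y, lam):
    """One-parameter RLS step with exponential forgetting."""
    eps = y - phi * theta
    k = p * phi / (lam + phi * p * phi)
    return theta + k * eps, (1.0 - k * phi) * p / lam


def run_without_excitation(steps, lam, p0=1.0, theta0=0.5, reset_after=None):
    """Iterate with phi = 0 and optionally reset p to p0 every reset_after steps."""
    theta, p, count = theta0, p0, 0
    p_history = []
    for _ in range(steps):
        theta, p = rls_scalar(theta, p, 0.0, 0.0, lam)   # no excitation: phi = 0
        count += 1
        if reset_after is not None and count >= reset_after:
            p, count = p0, 0                             # covariance reset
        p_history.append(p)
    return theta, p_history
```

Without a reset the covariance is divided by the forgetting factor at every step and grows without bound when the factor is below 1, while the estimate stays where it was. With a periodic reset the covariance remains bounded.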

VI. CONCLUSION ANDFURTHERSTUDY

A full hardware implementation of the FSSTR, which uses covariance p_matrix resetting and exponential forgetting for real-time estimation of plant parameters, is presented in this paper. The FSSTR with the covariance resetting algorithm provides stable computation of the plant parameters and the controller, which is not guaranteed with the common RLS algorithm when the system is poorly excited. The FSSTR is implemented in a single FPGA device, sharing basic arithmetic processing elements that operate in a floating-point format. The implementation uses relatively moderate FPGA resources: 37% of the resources of a low-capacity Altera FLEX10K200 device. The design takes advantage of the distributed small memory blocks that this type of FPGA device provides, which can operate as dual-port memories to store intermediate results of computation. To perform the system parameter estimation and control action, the circuit requires 60 clock cycles. Using a very moderate clock with a 100-ns cycle time, a single control step is performed within 6 μs, allowing a plant-output sampling frequency of 166 000 samples/s. A testbed has been built to enable efficient experiments with the real implementation of the FSSTR and various plants that are simulated by a program running on a PC. This testbed environment enables further development of the concept and its extensions in various directions. Future study will explore the implementation of higher order system control using self-tuning controllers and the exploitation of parallelism in the algorithm to achieve a higher performance FSSTR aimed at a higher plant-output sampling frequency.
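As a quick check of the timing figures quoted in this section, the control-step time follows from the cycle count and the clock period. The symbols $T_{\text{step}}$ and $f_s$ are introduced here for illustration only.

```latex
\[
T_{\text{step}} = 60 \times 100\ \text{ns} = 6\ \mu\text{s},
\qquad
f_s = \frac{1}{T_{\text{step}}} = \frac{1}{6 \times 10^{-6}\ \text{s}} \approx 166\,667\ \text{samples/s}.
\]
```

This agrees with the quoted rate of about 166 000 samples/s.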

Zoran Salcic (S'75–M'76–SM'98) received the B.E., M.E., and Ph.D. degrees from the University of Sarajevo, Sarajevo, Bosnia and Herzegovina, in 1972, 1974, and 1976, respectively, all in electrical engineering.

He is a Professor and the Chair in Computer Systems Engineering, Department of Electrical and Computer Engineering, The University of Auckland, Auckland, New Zealand. His research interests include complex digital-system architectures and design, prototyping and customization of digital systems, and new architectures and formal models for embedded systems for combined data- and control-driven applications. He is the author of more than 150 papers in journals and refereed conferences, seven books, and numerous technical reports. He has led or taken part in several dozen research projects and complex digital and computer-based system designs over 30 years. He established and leads the Embedded Systems Research Group, The University of Auckland.

Jiaying Cao received the M.E. degree in electrical and electronic engineering from The University of Auckland, Auckland, New Zealand, in 1999.

She has worked in the electronics industry for more than ten years. Her major areas of interest are the implementation of complex algorithms in hardware, particularly field-programmable gate arrays, complex digital systems and design techniques, rapid prototyping, and self-tuning controllers.

Sing Kiong Nguang (M'97–SM'00) received the B.E. degree (with first class honors) and the Ph.D. degree in 1992 and 1995, respectively, from the Department of Electrical and Computer Engineering, University of Newcastle, Callaghan, Australia.

He is currently a Senior Lecturer in the Department of Electrical and Computer Engineering, The University of Auckland, Auckland, New Zealand. He is the author of more than 60 journal papers and over 40 conference papers/presentations on nonlinear control design, nonlinear H-infinity control systems, nonlinear time-delay systems, nonlinear sampled-data systems, biomedical systems modeling, fuzzy modeling and control, biological systems modeling and control, and food and bioproduct processing.

Dr. Nguang is currently the Associate Editor of the IEEE Control System Society Conference Editorial Board.
