[ieee 2005 pakistan section multitopic conference - karachi, pakistan (2005.12.24-2005.12.25)] 2005...


Page 1: [IEEE 2005 Pakistan Section Multitopic Conference - Karachi, Pakistan (2005.12.24-2005.12.25)] 2005 Pakistan Section Multitopic Conference - CORDIC: Novel sequential and Pipelined

CORDIC: Novel sequential and Pipelined Architectures and Performance issues

Usman Ali¹, Umair Ali Sheikh²
¹COMSATS Institute of Information Technology, Abbottabad, Pakistan
²Intech Process Automation, Lahore, Pakistan

Abstract
This paper presents a novel and simple sequential design of CORDIC for the calculation of the sine and cosine of an angle. The design reduces the hardware requirement by using a single adder/subtractor module for all the iterations and for both the sine and cosine calculations. The sequential design is implemented and synthesized in MAX+plus II, and its timing diagram is presented. A parallel extension, which would increase the speed, is also discussed. The performance of the proposed design is compared with other general architectures available.

1. Introduction & Related Work

CORDIC (COordinate Rotation DIgital Computer) is an iterative algorithm for calculating the rotation of a two-dimensional vector in linear, circular or hyperbolic coordinate systems, using only add and shift operations [1, 2, 3]. CORDIC was originally developed as a piece of hardware for rotating coordinates in real-time navigational computations. Its current applications are in the fields of digital signal processing, image processing, filtering, and matrix algebra. The CORDIC algorithm is used to implement the following functions: sin(x), cos(x), tan(x), arctan(x/y), 2^x - 1, y·log2(x), y·log2(x + 1).

Hardware implementation of this algorithm is based on datapaths with adders, shifters, registers and a Read Only Memory (containing a number of pre-computed constant factors). A controller enables the realization of all transcendental functions with the same datapath structure, but with a different sequence of basic operations.

An FPGA implementation of CORDIC schemes for fast and silicon-area-efficient computation of the sine and cosine functions is presented in [2], along with an application of CORDIC as a sine and cosine generator in small satellites. An FPGA implementation of a variable-precision CORDIC processor is presented in [4]; this implementation significantly reduces the number of cycles required. The implementation of CORDIC as a virtual component (IP core) in a VHDL simulation environment is discussed in [5]. CORDIC pipelining and parallelization schemes for maximizing throughput are provided in [6]. A pipelined design for differentiable function evaluation using lookup tables, adders, shifters and multipliers is presented in [7], along with a discussion of the tradeoffs involved. This approach can be used to develop efficient implementations of function evaluators. This paper first presents a novel FPGA-based sequential implementation and then proposes a pipelined extension. We then compare the presented architecture with the existing general architectures.

The paper is organized as follows. Section 2 presents the basic CORDIC algorithm. Section 3 discusses general/standard sequential and parallel architectures and presents the novel proposed sequential architecture and its parallel extension. Section 4 discusses the performance parameters. Finally, a conclusion is drawn in Section 5 and references are listed in Section 6.

2. Fundamentals of CORDIC Algorithm:

The CORDIC algorithm has two operating modes, the rotation mode and the vectoring mode. In the rotation mode, a vector (x, y) is rotated by an angle θ to obtain the new vector (x', y'), as shown in figure 1. The coordinate components of a vector and an angle of rotation are given, and the coordinate components of the original vector after rotation through the given angle are computed.

Figure 1: The rotation and vectoring mode of CORDIC algorithm


In every micro-rotation i, a fixed angle of value $\arctan(2^{-i})$, stored in a ROM, is subtracted from or added to the angle remainder $\theta_i$, so that the angle remainder approaches zero.

In the vectoring mode, the length R of a vector (x, y) and its angle α towards the x-axis are computed. For this purpose, the vector is rotated towards the x-axis, so that the y component approaches zero. The sum of all angle rotations is equal to the value of α, while the value of the x component corresponds to the length R of the vector (x, y).
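The vectoring mode described above can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the paper's hardware: the function name `cordic_vectoring`, the default of 16 iterations, and the final division by the accumulated gain (done here with an ordinary division rather than shift-and-add) are all assumptions made for the example. It also assumes x > 0.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Return (R, alpha) for a vector with x > 0 using shift-and-add steps."""
    z = 0.0      # accumulated rotation angle, in radians
    gain = 1.0   # growth of the vector length caused by the micro-rotations
    for i in range(iterations):
        shift = 2.0 ** -i
        # Rotate towards the x-axis: clockwise when y is positive.
        sigma = -1 if y >= 0 else 1
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    # Assumption: the scale factor is removed with one ordinary division.
    return x / gain, z
```

For example, `cordic_vectoring(3.0, 4.0)` returns a length close to 5 and an angle close to `math.atan2(4.0, 3.0)`.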

The mathematical relations for the CORDIC rotations are given below [2]:

$$x_{i+1} = x_i - \sigma_i \, 2^{-i} \, y_i \qquad (1)$$

$$y_{i+1} = y_i + \sigma_i \, 2^{-i} \, x_i \qquad (2)$$

$$z_{i+1} = z_i - \sigma_i \arctan(2^{-i}) \qquad (3)$$

where $\sigma_i$ is either +1 or -1, depending on the sign of $z_i$ (the angle remainder) in the rotation mode or on the sign of the $y_i$ component in the vectoring mode.

The rotation mode of the CORDIC algorithm has three inputs, and its outputs after n iterations are given by the following expressions:

$$x_n = K_n (x_0 \cos z_0 - y_0 \sin z_0) \qquad (4)$$

$$y_n = K_n (y_0 \cos z_0 + x_0 \sin z_0) \qquad (5)$$

$$z_n = 0 \qquad (6)$$

where

$$K_n = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}$$

The constant $K_n$ given by the above equation is referred to as the scale factor and represents the increase in magnitude of the vector during the rotation process. When the number of iterations (micro-rotations) is fixed, the scale factor is a constant, approaching the value 1.647 as the number of iterations goes to infinity.

2.1 Sine, Cosine Calculations

The elementary functions sine and cosine can be computed using the rotation mode of the CORDIC algorithm if the initial vector is of unit length and is aligned with the abscissa. The computation of sin θ and cos θ is based on the above equations with input values $x_0 = 1$, $y_0 = 0$ and $z_0 = \theta$. The outputs after n iterations are given by:

$$x_n = K_n \cos\theta, \qquad y_n = K_n \sin\theta, \qquad z_n = 0$$

2.2 CORDIC C Algorithm

With the iterative equations (1)-(3) above, we can design an algorithm that, given an angle in A, reduces this angle to zero. At each step, it also increments or decrements the X and Y coordinate registers by the appropriate value (i.e. the shifted values of X and Y), thus keeping track of the coordinates while rotating.

    for i = 0 to N
        dx = X / 2^i
        dy = Y / 2^i
        da = atan(1 / 2^i)
        if A >= 0 then
            X = X - dy
            A = A - da
            Y = Y + dx
        else
            X = X + dy
            A = A + da
            Y = Y - dx
        endif
    next
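The pseudocode above translates almost line for line into Python. The sketch below is a floating-point illustration under stated assumptions: the names `cordic_gain` and `cordic_sin_cos` are hypothetical, angles are in radians, and the loop starts from $x_0 = 1/K_n$ (about 0.607) instead of 1 so that the outputs need no final scale correction.

```python
import math


def cordic_gain(iterations):
    """Scale factor K_n = product of sqrt(1 + 2^(-2i)) for i = 0 .. n-1."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_sin_cos(theta, iterations=16):
    """Return (cos(theta), sin(theta)) for |theta| <= pi/2, theta in radians."""
    x = 1.0 / cordic_gain(iterations)  # pre-scaled start value, about 0.607
    y = 0.0
    z = theta                          # angle remainder
    for i in range(iterations):
        sigma = 1 if z >= 0 else -1    # rotation mode: follow the sign of z
        shift = 2.0 ** -i
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * math.atan(shift)
    return x, y
```

Calling `cordic_gain(16)` gives a value close to the limit of 1.647 quoted above, and `cordic_sin_cos(math.pi / 6)` agrees with `math.cos` and `math.sin` to about four decimal places.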

3 CORDIC Architectures

The CORDIC algorithm can be implemented in hardware using three approaches: a sequential approach (the structure is unfolded in time), a parallel approach (the structure is unfolded in space), or a combination of the two. These three approaches and the resulting structures are also referred to in the literature as iterative, cascaded and cascaded fusion, respectively.

3.1 Sequential CORDIC design

A sequential CORDIC design performs one iteration per clock cycle and consists of three n-bit adders/subtractors, two sign-extending shifters, a look-up table (LUT) for the step angle constants and a finite state machine.

3.2 Our Proposed Design

Our novel sequential CORDIC design is shown in figure 2. To use less hardware, we designed our system so that only one shifter and one adder/subtractor are used, at the expense of additional multiplexers. The angle whose sine and cosine values are to be computed enters as an input to "Register A". In the first step, the initial values "0.607" and "0" are loaded into "Register X" and "Register Y", respectively.

Figure 2: Our Novel Sequential CORDIC Design

3.2 Our Design Implementation

The 'for' loop in the C algorithm presented in section 2.2 is not synthesizable in Verilog, so it is implemented using a control RAM. This RAM contains all the signals to the registers (clear, load, etc.), to the muxes (which register to select) and to the shifter (how much to shift). The values of atan(1/2^i) are saved in a separate LUT, which is also controlled by the control RAM. Division by 2^i ends up being a simple shift operation on most architectures.

"Register X" can be loaded with the initial value "0.607" no matter what precision we use; we just truncate it to the number of bits we need. An example with an angle of 30 degrees is tabulated below. At the end, X gives cos θ and Y gives sin θ.

i    X       Y       A (deg)    atan(1/2^i) (deg)
0    0.607   0.000    30.000    45.000
1    0.607   0.607   -15.000    26.565
2    0.910   0.303    11.565    14.036
3    0.835   0.531    -2.471     7.125
4    0.901   0.427     4.654     3.576
5    0.874   0.483     1.077     1.790
6    0.859   0.510    -0.712     0.895
7    0.867   0.497     0.183     0.448
8    0.863   0.504    -0.265     0.224
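The tabulated example can be reproduced with a short floating-point trace. The sketch below is illustrative only: the function name `cordic_trace` is hypothetical, the angle is kept in degrees to match the table, and each row records the register contents before iteration i is applied.

```python
import math


def cordic_trace(angle_deg, iterations=8):
    """Return rows (i, X, Y, A, atan(1/2^i)) holding the state before step i."""
    x, y, a = 0.607, 0.0, angle_deg      # initial register contents
    rows = []
    for i in range(iterations + 1):
        da = math.degrees(math.atan(1.0 / 2 ** i))  # LUT entry in degrees
        rows.append((i, x, y, a, da))
        dx, dy = x / 2 ** i, y / 2 ** i  # shifted copies of X and Y
        if a >= 0:
            x, y, a = x - dy, y + dx, a - da
        else:
            x, y, a = x + dy, y - dx, a + da
    return rows


for i, x, y, a, da in cordic_trace(30.0):
    print(f"{i}  {x:.3f}  {y:.3f}  {a:8.3f}  {da:7.3f}")
```

The printed rows agree with the table to within rounding of the last digit; after eight iterations X is about 0.863 and Y about 0.504, against the exact values cos 30° = 0.866 and sin 30° = 0.5.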

The multiplexer XY is driven by the control RAM. The control signals for the shifter, the multiplexer dXdY, and the input and output enables of all the registers and the LUT are also supplied by the control RAM.

16-bit registers were used to hold the initial, intermediate, and final values of X, Y, A and dX, dY, dA. The most significant bit was kept for the sign. Since the CORDIC algorithm works up to π/2, the resolution of one location came out to be:

$$(\pi/2) / 2^{15} \approx 0.0000479$$

Hence, if we are to represent K = 0.607 using this notation:

$$0.607 / 0.0000479 \approx 12662$$

Similarly, the 16-bit binary representation of 12662 would be

0011000101110110

The timing diagram of the Verilog implementation is provided in figure 3, with A as hex 4000 (π/4 radians), output cos A as hex 39FF (0.711 decimal) and sin A as hex 3930 (0.70 decimal). The exact value of both cos and sin for this angle is 0.7071. In our simulation, computing the required values took about 90 clock cycles for a 16-bit word; when this architecture is implemented in a pipelined form, the time will reduce further.
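The fixed-point arithmetic above can be checked with a small conversion helper. This is a sketch under the stated convention (one count equals (π/2)/2^15, and the same scaling is used for angles and for X/Y values); the names `LSB`, `to_fixed` and `from_fixed` are assumptions for illustration and do not come from the Verilog source.

```python
import math

LSB = (math.pi / 2) / 2 ** 15  # value of one count, about 0.0000479


def to_fixed(value):
    """Quantize a non-negative real value to an integer word (truncating)."""
    return int(value / LSB)


def from_fixed(word):
    """Convert an integer word back to the real value it represents."""
    return word * LSB


k_word = to_fixed(0.607)
print(k_word, format(k_word, "016b"))
print(from_fixed(0x4000), from_fixed(0x39FF), from_fixed(0x3930))
```

With this scaling, 0.607 maps to 12662, hex 4000 decodes to π/4, and the two reported outputs decode to roughly 0.712 and 0.702, which matches the figures quoted from the timing diagram.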


Figure 3: Timing diagram of Our Sequential CORDIC Design

3.3 General Parallel CORDIC designs

A parallel CORDIC design is similar to an array multiplier structure, consisting of rows of adders/subtractors with hardwired shifts and constants. Parallel CORDIC can be implemented as a purely combinational array or can be pipelined, depending on the size of the design and the requested data rate. A combined CORDIC design is based on a sequential structure in which the logic for several successive iterations is cascaded and executed within one clock cycle. The number of "used" successive iteration stages determines the order of a combined CORDIC design. Figure 4 gives the general architecture of a parallel or pipelined CORDIC.

In the design given in figure 4, and in the majority of other general CORDIC architectures, separate adder/subtractor blocks are used for the calculation of cosine and sine.
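The behaviour of such a general pipelined structure can be modelled in software. The sketch below is a simplified cycle-level model, not a description of any specific hardware: the names `make_stage` and `run_pipeline`, the choice of 16 stages, and the pre-scaled start value 1/K_n are assumptions made for the example. Each stage has its shift and angle constant fixed at construction time, and one register sits after every stage.

```python
import math


def make_stage(i):
    """Build stage i with its shift and angle constant fixed ("hardwired")."""
    shift = 2.0 ** -i
    angle = math.atan(shift)

    def stage(state):
        x, y, z = state
        sigma = 1 if z >= 0 else -1
        return (x - sigma * shift * y, y + sigma * shift * x, z - sigma * angle)

    return stage


def run_pipeline(angles, n_stages=16):
    """Stream angles (radians) through the stages; return results and clocks."""
    stages = [make_stage(i) for i in range(n_stages)]
    gain = 1.0
    for i in range(n_stages):
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    regs = [None] * n_stages           # register after each stage; None = empty
    pending = iter(angles)
    results = []
    clock = 0
    while len(results) < len(angles):
        clock += 1
        theta = next(pending, None)    # one new angle enters per clock
        new_input = None if theta is None else (1.0 / gain, 0.0, theta)
        sources = [new_input] + regs[:-1]   # each stage reads last cycle's value
        regs = [None if s is None else stage(s)
                for stage, s in zip(stages, sources)]
        if regs[-1] is not None:
            results.append(regs[-1][:2])    # (cos, sin) leaves the last stage
    return results, clock
```

In this model the first result appears after `n_stages` clocks and one further result follows on every clock, so three angles pushed through 16 stages finish in 18 clocks.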

3.4 Our Parallel CORDIC designs

Figure 5 gives a single block of the multiple-stage CORDIC architecture. In the proposed parallel design, a single adder/subtractor block is used to calculate the values of the sine, the cosine and the remainder angle. This design thus consumes less space than the architecture given above. We get the values of both sine and cosine at the same output, but on different clock cycles; that is, the output carries different information (cosine, sine or remainder angle) on different clock cycles. The time at which the output appears also depends on the number of blocks in the pipeline.

The single block is derived from the sequential design presented in section 3.2. The output of every block is fed to the input of the next block: the remainder angle goes to register A, the cosine value goes to register X, and the sine value goes to register Y. The corresponding LUT value can be stored in a register in each block, which removes the need for a LUT. The contents of the control RAM are the same for all blocks except the first. Because all the required information is passed out on a single output, problems can arise in loading the registers of the next stage. This can be solved using time-division multiplexing.

4. Performance

We make a general comparison of the architecture presented above with some general architectures used for the implementation of CORDIC.

Figure 4: General Pipelined architecture

Figure 5: Single block of the Pipelined CORDIC architecture

4.1 Accuracy

The CORDIC algorithm gives the exact value only when the remainder angle reaches 0 radians, which in general holds only for a very large number of iterations. In other words, the accuracy depends on the number of iterations. Increasing the number of iterations increases the number of stages in the pipelined architecture, which in turn increases its size. Another factor is the binary representation of the angle θ: the representation is more accurate when more binary digits are used.

4.2 Speed

Speed is the biggest weakness of the presented design. A normal pipelined architecture will be approximately two times faster than what we have presented. Most of the speed is lost in calculating all the values with a single adder/subtractor: in the presented design every output comes from the single adder/subtractor, and the next stage requires the outputs of the previous stage, whereas in the general pipelined or parallel CORDIC architecture the outputs come from three different adders/subtractors. The comparison with the bit-parallel iterative CORDIC is not worth mentioning, as that architecture belongs to a completely different category. The bit-serial iterative CORDIC involves a separate calculation for each bit, which gives much faster results as it uses a bit-serial adder.

4.3 Size

Size is the major advantage that the presented architecture has over the other architectures. The bit-parallel unrolled CORDIC requires three adder/subtractor modules for each stage, while in our design a single adder/subtractor module does the job at the expense of a multiplexer. The presented architecture thus saves about two adder/subtractor modules per stage. The bit-serial iterative CORDIC takes far more space than this architecture, as separate stages are implemented for every bit. The size required to implement a pipelined architecture with a very large number of stages might be very large, but this will also improve the accuracy and speed.
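The speed and size trade-off discussed in sections 4.2 and 4.3 can be illustrated with an idealized model in which one adder/subtractor is reused for the X, Y and A updates in turn. The sketch is an assumption-laden illustration: the function name `shared_addsub_cordic` and the operation counter are hypothetical, and the count is a tally of add/subtract operations in this model, not a clock-cycle estimate for the control-RAM schedule of the actual design.

```python
import math


def shared_addsub_cordic(angle_deg, iterations=8):
    """Model one adder/subtractor reused for X, Y and A; count its uses."""
    reg = {"X": 0.607, "Y": 0.0, "A": angle_deg}
    addsub_ops = 0

    def addsub(a, b, subtract):
        """The single shared adder/subtractor module."""
        nonlocal addsub_ops
        addsub_ops += 1
        return a - b if subtract else a + b

    for i in range(iterations):
        # Shifter outputs and LUT value are latched first (dX, dY, dA).
        d_x = reg["X"] / 2 ** i
        d_y = reg["Y"] / 2 ** i
        d_a = math.degrees(math.atan(1.0 / 2 ** i))
        positive = reg["A"] >= 0
        # Three sequential passes through the same module per iteration.
        reg["X"] = addsub(reg["X"], d_y, subtract=positive)
        reg["Y"] = addsub(reg["Y"], d_x, subtract=not positive)
        reg["A"] = addsub(reg["A"], d_a, subtract=positive)
    return reg, addsub_ops
```

For 30 degrees and eight iterations the model makes 24 passes through the shared module, where a design with three separate adders/subtractors would complete the same eight iterations with each module used only eight times. The results are identical; only the hardware count and the number of sequential steps differ.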

5 Conclusion:

This paper presents a new approach to implementing the CORDIC algorithm. The originality lies in using the same output for three purposes, which makes this implementation fairly new. The paper also discussed the timing problems that might be associated with obtaining the output on a single line. A single block was simulated so that it could be used to calculate the cosine and sine values, and the timing analysis was done on the basis of this simulation. The total time taken by the device was calculated, and the total time that would be taken by a pipelined CORDIC architecture was then estimated. The performance was reviewed on the basis of the time and the size involved in the presented architecture. The performance issues were discussed and compared with those of bit-serial and bit-parallel iterative CORDIC architectures. Work is currently underway towards simulation of the pipelined architecture.

6 References

[1] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electronic Computers, Vol. EC-8, No. 3, pp. 330-334, Sept. 1959.
[2] T. Vladimirova, H. Tiggeler, "FPGA Implementation of Sine and Cosine Generators Using the CORDIC Algorithm", Surrey Space Centre, University of Surrey, Guildford, Surrey, GU2 5XH.
[3] M. Kuhlmann, K. Parhi, "A Novel CORDIC Rotation Method for Generalized Coordinate Systems".
[4] E. Saez, J. Villalba, J. Hormigo, F. J. Quiles, J. I. Benavides, E. L. Zapata, "FPGA Implementation of a Variable Precision CORDIC Processor", 13th Conf. on Design of Circuits and Integrated Systems (DCIS'98), Madrid, Spain, pp. 604-609, November 17-20, 1998.
[5] G. Potok, A. Bitniok, "CORDIC Algorithm Implemented as a Virtual Component".
[6] S. Freeman and M. O'Donnell, "A Complex Arithmetic Digital Signal Processor Using Cordic Rotators", IEEE Proceedings of ICASSP 1995, Volume 5, pp. 3191-3193, 1995.
[7] O. Mencer, N. Boullis, W. Luk and H. Styles, "Parameterized Function Evaluation for FPGAs", Field-Programmable Logic and Applications, LNCS 2147, pp. 544-554, 2001.