
International Journal of Engineering Trends and Technology (IJETT) – Volume 4 Issue 6- June 2013

ISSN: 2231-5381 http://www.ijettjournal.org Page 2428

Design of High Performance CPU Datapath Unit

Maroju SaiKumar #1, N.Rajasekhar *2

#,* M.Tech (Electronics), Department of Electronics Engineering,

Pondicherry University (A Central University),

Pondicherry, India-605014

Abstract - This paper discusses the design and implementation of a CPU Datapath Unit on an FPGA. A Datapath Unit exists in every CPU and collectively performs the repeated operations, such as arithmetic, logical, and control operations. We design a General Purpose Datapath Unit and a Custom Datapath Unit, and we compare the two circuits using timing analysis.

Keywords- CPU Datapath Unit, Adder, Multiplier, Register, ALU, Control Unit.

I. INTRODUCTION

The Datapath Unit is a common term in CPU design. Individual arithmetic and logical operations are carried out by adders, multipliers, and similar circuits, but when millions of additions and other operations must be performed continuously, they are carried out through one block called the Datapath Unit. The Datapath Unit is a collection of adders, multipliers, shifters, registers, an ALU, and other combinational and sequential circuits. There are two major types of Datapath Unit: the General Purpose Datapath Unit and the Custom Datapath Unit. A General Purpose Datapath Unit is designed to support all types of operations. It contains a Functional Unit, an ALU, and a Register. The Datapath Unit is operated through control signals sent by the Functional Unit, and it requires an acknowledgment signal from the blocks connected to it.

II. GENERAL DATAPATH UNIT

The diagram below shows the block diagram of the CPU General Purpose Datapath Unit, which contains an ALU, a multiplexer, and a register. The register stores the data obtained from the ALU. Each type of operation places its own requirements on the Datapath Unit:

- Instruction fetch requires the PC, Instruction Memory, and IR
- ALU instructions require the IR, Registers, and ALU
- Load/Store instructions require the IR, Registers, and ALU

Depending on the type of operation, the Datapath control signals select the specific operation to perform. For the General Purpose Datapath Unit to support every ALU operation, the ALU, IR, PC, and other registers must be designed to meet these requirements.

Fig 1: - General Purpose CPU Datapath Unit

ALU2  ALU1  ALU0  Operation
0     0     0     Pass through A
0     0     1     A and B
0     1     0     A or B
0     1     1     Not A
1     0     0     A+B
1     0     1     A-B
1     1     0     A+1
1     1     1     A-1

Fig 2: - Operations selected by the three ALU control lines
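The operation table above can be sketched in software as follows. This is a minimal illustration, not the authors' hardware description: the function name `alu` is hypothetical, and wrapping results to eight bits is an assumption based on the eight-bit bus width stated in the text.

```python
MASK = 0xFF  # assumption: 8-bit buses, as stated in the text


def alu(op: int, a: int, b: int) -> int:
    """Return the 8-bit ALU result for a 3-bit opcode (ALU2 ALU1 ALU0)."""
    operations = {
        0b000: a,       # pass through A
        0b001: a & b,   # A and B
        0b010: a | b,   # A or B
        0b011: ~a,      # not A
        0b100: a + b,   # A + B
        0b101: a - b,   # A - B
        0b110: a + 1,   # A + 1
        0b111: a - 1,   # A - 1
    }
    return operations[op] & MASK  # wrap the result to 8 bits


if __name__ == "__main__":
    print(alu(0b100, 200, 100))  # 300 wraps to 44 on an 8-bit bus
```

Masking with `0xFF` makes overflow and the bitwise complement behave as they would on an eight-bit bus, for example `not 5` yields 250.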

The input to the A operand of the ALU can be either an external input or the constant ‘1’, as selected by the multiplexer select line IE. The B operand of the ALU always comes from the content of the register. The operation of the ALU is determined by the three control lines ALU2, ALU1, and ALU0, as defined in Figure 2. The register provides a load capability for loading the output of the ALU into the register. The register can also be reset to zero by asserting the Clear signal line. The content of the register can be passed to the external output by asserting the output enable line OE of the tri-state buffer. We assume here that the buses for transferring data between components are eight bits wide. All the control lines, of course, are one bit wide.
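The structure just described (multiplexer, ALU, register, tri-state buffer) can be modelled behaviourally as a sketch. The class `SimpleDatapath` and its method `clock` are hypothetical names. The model assumes a register that starts at zero, a Clear that takes effect at the clock edge and wins over Load, and `None` standing in for the high-impedance output.

```python
from typing import Optional

MASK = 0xFF  # assumption: 8-bit buses, as stated in the text


def alu(op: int, a: int, b: int) -> int:
    """3-bit opcode -> 8-bit result, following the ALU operation table."""
    results = [a, a & b, a | b, ~a, a + b, a - b, a + 1, a - 1]
    return results[op] & MASK


class SimpleDatapath:
    """Hypothetical model: mux -> ALU -> register -> tri-state buffer."""

    def __init__(self) -> None:
        self.reg = 0  # assumption: register starts at zero

    def clock(self, ie: int, alu_op: int, load: int, clear: int, oe: int,
              ext_in: int = 0) -> Optional[int]:
        """Apply one control word for one clock cycle; return the output bus."""
        a = ext_in & MASK if ie else 1     # mux: IE=1 external input, IE=0 constant 1
        result = alu(alu_op, a, self.reg)  # B operand always comes from the register
        out = self.reg if oe else None     # None stands for high impedance (Z)
        if clear:                          # assumption: Clear wins over Load
            self.reg = 0
        elif load:
            self.reg = result              # visible only from the next cycle on
        return out


if __name__ == "__main__":
    dp = SimpleDatapath()
    dp.clock(ie=1, alu_op=0b000, load=1, clear=0, oe=0, ext_in=42)  # store 42
    print(dp.clock(ie=0, alu_op=0b000, load=0, clear=0, oe=1))      # read it back
```

The output is sampled before the register is updated, so a value loaded in one cycle can only be read in a later cycle.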

There are seven control lines (numbered 0 to 6) for controlling the operations of this simple Datapath. Various operations can be performed by asserting or de-asserting these control signals at different times. The control lines are grouped together to form what is called a control word. One operation of the Datapath, therefore, is determined by the values set in one control word, and takes one clock cycle to perform. By combining multiple control words in a certain sequence, the Datapath performs the specified operations in the order given. For example, to load a value from the external input to the register, we would set the control word as follows.

Table 1: - The Control Word Instructions

Control Line  IE  ALU         Load  Clear  OE
              6   5-3         2     1      0
Value Set     1   000 (pass)  1     0      0

By setting IE = 1, we select the external input to pass through the mux. From Figure 2, we see that setting the ALU control lines ALU2, ALU1, and ALU0 to 000 selects the pass through operation. Finally, setting Load = 1 loads the value from the output of the ALU into the register. Thus, we have stored the input value into the register. We do not want to output the value from the register, so OE is set to 0. Note that the writing of the register occurs at the next active edge of the clock. Thus, the new value is not available to be read from the register until the next clock cycle. If we had set OE to 1 in the above control word, we would be reading the register in the current clock cycle and thus outputting the original value found in the register rather than the new value that was just entered.
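As an illustration of how the seven control lines form one control word, the sketch below packs and unpacks the fields using the bit positions given in Table 1 (IE in bit 6, ALU in bits 5 to 3, Load in bit 2, Clear in bit 1, OE in bit 0). The names `ControlWord`, `encode`, `decode`, and `LOAD_EXTERNAL` are hypothetical.

```python
from typing import NamedTuple


class ControlWord(NamedTuple):
    ie: int     # bit 6: mux select (1 = external input, 0 = constant 1)
    alu: int    # bits 5-3: ALU operation
    load: int   # bit 2: load the ALU output into the register
    clear: int  # bit 1: reset the register to zero
    oe: int     # bit 0: enable the tri-state output buffer


def encode(cw: ControlWord) -> int:
    """Pack the five fields into a 7-bit control word."""
    return (cw.ie << 6) | (cw.alu << 3) | (cw.load << 2) | (cw.clear << 1) | cw.oe


def decode(word: int) -> ControlWord:
    """Unpack a 7-bit control word into its five fields."""
    return ControlWord(
        ie=(word >> 6) & 1,
        alu=(word >> 3) & 0b111,
        load=(word >> 2) & 1,
        clear=(word >> 1) & 1,
        oe=word & 1,
    )


# The control word from Table 1: load the external input into the register.
LOAD_EXTERNAL = ControlWord(ie=1, alu=0b000, load=1, clear=0, oe=0)

if __name__ == "__main__":
    print(format(encode(LOAD_EXTERNAL), "07b"))  # 1000100
```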

A. Using General Datapath Units

A general Datapath, such as the one described in the previous section, can be used to solve various problems as long as it has all of the required functional units and enough registers for storing all the temporary data. The idea of using a general datapath is that we can use a “ready made” circuit to solve a given problem without having to modify it. The tradeoff is a time versus space issue. On one hand, we do not need the extra time to design a custom or dedicated datapath. On the other hand, the general Datapath may contain more features than the problem requires, so it not only increases the size of the circuit but also consumes more power. The following example shows how we can use the general datapath from the previous section to solve a problem.

Fig 3: - The Complete Block Diagram of Datapath Unit

To see how a datapath is used to perform a computation, let us write the control words for the datapath of Figure 1 to generate and output the numbers from 1 to 10. The algorithm for doing this is shown in Figure 4. To translate this algorithm to control words for our datapath, we need to look at all the instructions in the algorithm that perform data operations (since this is what the datapath is responsible for); namely, lines 1, 3, and 4. Line 2 is not a data operation instruction but rather a control instruction, even though it reads the value of i. The condition is evaluated by the datapath, and a status signal (telling whether the condition is true or false) is generated and sent to the control unit. Depending on this status signal, the control unit decides whether or not to loop again. The control words for the three instructions are shown in Figure 5.

1 i=0
2 while (i<10) {
3   i=i+1
4   Output i
5 }

Fig 4: - Algorithm to generate numbers

Control Word  Instruction  IE  ALU        Load  Clear  OE
                           6   5-3        2     1      0
1             i=0          x   xxx        0     1      0
2             i=i+1        0   100 (add)  1     0      0
3             Output i     x   xxx        0     0      1

Fig 5: - Control Word Instructions for above Algorithm

Control word 1 initializes i to 0. The register in the datapath is used to store the value of i. Since the register has a Clear feature, we can assert this Clear signal to zero the register. The ALU is not needed in this operation, so it doesn’t matter what the inputs to the ALU are or which operation is selected for the ALU to perform. Hence, the four control lines IE (for selecting the input) and ALU2, ALU1, and ALU0 (for selecting the ALU operation) are all set to x’s (“don’t cares”). Load is de-asserted because we don’t need to store the output of the ALU to the register. At this time, we also do not want to output the value from the register, so the output control line OE is also de-asserted.

Control word 2 increments i, so we need to add one to the value that is stored in the register. Although the ALU has an increment operation, we cannot use it because the ALU was designed such that the operation increments the A operand rather than the B operand (see Figure 2), and our datapath is connected such that the output of the register goes to the B operand. We could modify the ALU to have an increment B operation, or we could modify the datapath so that the output of the register can be routed to the A operand of the ALU. However, both of these solutions require modifying the datapath, and this defeats the purpose of using a general datapath. Instead, we can use the ALU add (100) operation to increment the value stored in the register by one. We can get a one to the A operand by setting IE to 0, since the 0 input line of the mux is tied to the constant ‘1’. The B operand will have the register value. Finally, we need to load the result of the ALU back into the register, so the Load line is asserted.

Control word 3 outputs the incremented value. Again, we don’t care about the inputs to the ALU or the operation of the ALU, and there is no new value to load into the register. We definitely do not want to clear the register. We simply want to output the value from the register, so we just assert OE by setting it to 1. Note that control words 2 and 3 must be executed ten times in order to output the ten numbers. The while loop in the algorithm is implemented in the control unit. The simulation trace of the control words is shown in Figure 8.

Notice that two cycles are needed for each count: the first cycle for control word 2 and the second cycle for control word 3. These two cycles are repeated ten times for the ten numbers. For example, at 500ns (the beginning of the first of the two clock cycles), Load = 1 and OE = 0. The current content of the register is 1. Since OE = 0, the output is Z (high impedance). At 700ns (the beginning of the second of the two clock cycles), the register is updated with the value 2. Load is de-asserted and OE is asserted, and the number 2 is output.

The control unit will generate the appropriate control signals for the datapath for each clock cycle. The control unit will also have to determine whether to repeat control words 2 and 3 in the loop, or to terminate. In order for the control unit to know this, we must add a comparator to the output of the register in the datapath to test whether the count is ten or not. The output of this comparator is the status signal that the datapath sends to the control unit.
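The interaction between the control unit, the three control words, and the comparator status signal can be sketched as below. This is an illustrative model only: the function names `step` and `count_to_ten` are hypothetical, only the pass and add operations are modelled, don't-care fields are set to 0, and the register is assumed to be written at the end of each cycle.

```python
MASK = 0xFF  # assumption: 8-bit buses, as stated in the text


def step(reg, ie, alu_op, load, clear, oe, ext_in=0):
    """Run one clock cycle and return (output, next register value)."""
    a = ext_in if ie else 1                       # mux: IE=0 selects the constant 1
    if alu_op == 0b100:
        result = (a + reg) & MASK                 # add: A + B, B is the register
    else:
        result = a & MASK                         # only pass-through modelled otherwise
    out = reg if oe else None                     # None stands for high impedance (Z)
    nxt = 0 if clear else (result if load else reg)
    return out, nxt


def count_to_ten():
    """Control-unit loop: word 1 once, then words 2 and 3 until the count is ten."""
    outputs = []
    _, reg = step(0, ie=0, alu_op=0, load=0, clear=1, oe=0)          # word 1: i=0
    while reg != 10:                                                 # comparator status signal
        _, reg = step(reg, ie=0, alu_op=0b100, load=1, clear=0, oe=0)  # word 2: i=i+1
        out, reg = step(reg, ie=0, alu_op=0, load=0, clear=0, oe=1)    # word 3: output i
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    print(count_to_ten())
```

The loop condition plays the role of the comparator: the control unit repeats control words 2 and 3 until the register holds ten.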

III. CUSTOM DATAPATH UNIT DESIGN

When a particular general datapath does not contain all the functional units and/or registers needed to perform the operations specified in the algorithm that you are trying to solve, you need to select a more complex datapath. When working with a general Datapath, the goal is to find the simplest and smallest one that matches the requirements of the problem as closely as possible. The following example shows the need for selecting a more complex datapath.

The Custom Datapath Unit is explained using the following example. Let us use the simple datapath of Figure 1 to generate and add the numbers from n down to 1, where n is an input number, and output the sum of these numbers. The algorithm for doing this is shown in Figure 6. The algorithm requires two variables: n for the input that counts down to zero, and sum for adding up the total. This means that we need two registers in the datapath, unless we want the user to enter the numbers from n down to 1 manually and use the one register just to store the sum. Thus, we conclude that the datapath of Figure 1 cannot be used to implement this algorithm.

In order to implement the algorithm of Figure 6, we need a slightly more complex datapath that includes at least two registers. One possible datapath is shown in Figure 8. The main difference between this datapath and the previous one is that a register file (RF) with four locations is used instead of just one register. To access a particular port, the enable line for that port must be asserted and the address for the location set up. The designated lines are WE for write enable, RAE for read port A enable, RBE for read port B enable, WA for the write address, RAA for the read port A address, and RBA for the read port B address. The read ports A and B can be read simultaneously, and they are connected to the two input operands A and B of the ALU respectively. The result of the ALU is passed through a shifter whose operations are specified in Figure 10. Although the shifter is not needed by the algorithm of Figure 6, it is available in this datapath. The output of the shifter is routed back to the register file via the mux, or it can be output externally by enabling the output tri-state buffer. The datapath is again assumed to be eight bits wide.
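A register file with one write port and two read ports, controlled by the enable and address lines named above, might be modelled as in the following sketch. The class name `RegisterFile` is hypothetical, and returning `None` from a disabled read port is an assumption, since the text does not say what a disabled port drives.

```python
from typing import List, Optional, Tuple


class RegisterFile:
    """Hypothetical model of a 4-location, 8-bit register file."""

    def __init__(self, size: int = 4, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.regs: List[int] = [0] * size  # assumption: all locations start at zero

    def read(self, rae: int, raa: int, rbe: int,
             rba: int) -> Tuple[Optional[int], Optional[int]]:
        """Read ports A and B simultaneously; a disabled port returns None."""
        a = self.regs[raa] if rae else None
        b = self.regs[rba] if rbe else None
        return a, b

    def write(self, we: int, wa: int, data: int) -> None:
        """Write data to location WA only when WE is asserted."""
        if we:
            self.regs[wa] = data & self.mask


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(we=1, wa=0, data=5)
    rf.write(we=1, wa=1, data=7)
    a, b = rf.read(rae=1, raa=0, rbe=1, rba=1)  # both operands in one access
    rf.write(we=1, wa=0, data=a + b)            # ALU result written back to location 0
    print(rf.regs)
```

Two read ports let both ALU operands come from the register file in the same cycle, which a single register cannot provide.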

1 sum=0
2 input n
3 while (n!=0) {
4   sum=sum+n
5   n=n-1
6 }
7 Output sum

Fig 6: - Algorithm to generate numbers n down to 1
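For reference, the summing algorithm can be written as a short software sketch in which n counts down to zero, as the text describes. The function name `sum_down_from` is hypothetical, and the eight-bit wrap-around is an assumption taken from the stated datapath width.

```python
MASK = 0xFF  # assumption: 8-bit datapath, as stated in the text


def sum_down_from(n: int) -> int:
    """Add the numbers n, n-1, ..., 1 using two 8-bit 'registers'."""
    total = 0          # sum = 0
    n &= MASK          # input n
    while n != 0:      # status signal: n != 0
        total = (total + n) & MASK  # sum = sum + n
        n -= 1                      # n counts down to zero
    return total       # output sum


if __name__ == "__main__":
    print(sum_down_from(10))  # 55
```

Under the eight-bit assumption the sum only fits for n up to 22 (sum 253). For n = 23 the true sum of 276 wraps to 20.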





Fig 7: - Control Words for above Algorithm Implementation

IV. SIMULATION RESULTS

One control word is executed in one clock cycle. In one clock cycle, data from a register is first read, then it passes through functional units and gets modified, and finally it is written back to a register. In the example discussed above, two control words are needed for the addition and the output operations. Control word 2 does the addition and the writing of the result into the register. During the clock cycle for control word 2, the operations start with the constant ‘1’ passing down through the mux, followed by the ALU performing the addition. The resulting value from the addition is written to the register at the beginning of the next clock cycle: the new value gets latched into the flip-flop at the active (rising) edge of the clock. Therefore, the value that is available at the output of the register in the current clock cycle is still the value before the write back, which is the value before the increment. If we assert the OE signal in the same clock cycle to output the register value, as shown in control word 2 of Figure 9, the output value is the value before the increment and not the result after the increment. Performing both a read and a write from/to the same register in the same control word, i.e. the same clock cycle, does not create any signal conflict, because the reading occurs immediately in the current clock cycle and gets the original value that is in the register. The writing occurs at the beginning of the next clock cycle, after the reading.
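The read-before-write behaviour described above can be demonstrated with a small sketch. It models a schedule in which one control word both adds one and asserts OE, repeated after the register has been cleared. The function name `run_two_word_schedule` is hypothetical, and the eight-bit wrap is an assumption from the stated bus width.

```python
MASK = 0xFF  # assumption: 8-bit register


def run_two_word_schedule(repeats: int = 10):
    """Clear once, then repeat a word with add, Load = 1 and OE = 1."""
    reg = 0        # state after the clearing control word
    outputs = []
    for _ in range(repeats):
        outputs.append(reg)       # OE = 1: the current (old) value is read first
        reg = (reg + 1) & MASK    # Load = 1: new value latched at the next active edge
    return outputs


if __name__ == "__main__":
    print(run_two_word_schedule())
```

Because the read sees the value before the increment, this schedule outputs 0 through 9 rather than 1 through 10.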

Fig 8: - Simulation for the three control words shown in fig 5

Control Word  Instruction         IE  ALU        Load  Clear  OE
                                  6   5-3        2     1      0
1             i=0                 x   xxx        0     1      0
2             i=i+1 and output i  0   100 (add)  1     0      1

Fig 9: - Counting algorithm using two control words for the Datapath shown in fig 1

Fig 10: - Simulation results for the data shown in fig 9

Fig 11: - Optimized control words for the counting algorithm using the Datapath shown in fig 1

Fig 12: - Corrected simulation trace for using two control words from fig 9

Fig 13: - Simulation trace for the simulation problems of the control words shown in fig 6

Control Word  Instruction         IE  ALU        Load  Clear  OE
                                  6   5-3        2     1      0
1             i=0                 x   xxx        0     1      0
2             i=i+1               0   100 (add)  1     0      0
3             i=i+1 and output i  0   100 (add)  1     0      1
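The three control words in the table above can be traced with the sketch below. It assumes that control words 1 and 2 each run once and that control word 3 is then repeated once per number to be output. The paper does not state the repeat count, so the `count` parameter and the function name `run_optimized_schedule` are hypothetical.

```python
MASK = 0xFF  # assumption: 8-bit register


def run_optimized_schedule(count: int = 10):
    """Word 1 clears, word 2 increments, word 3 outputs while incrementing."""
    reg = 0                    # word 1: i=0
    reg = (reg + 1) & MASK     # word 2: i=i+1, no output
    outputs = []
    for _ in range(count):     # word 3 repeated: OE = 1 and Load = 1 together
        outputs.append(reg)    # the value before this cycle's increment is output
        reg = (reg + 1) & MASK
    return outputs, reg


if __name__ == "__main__":
    print(run_optimized_schedule())
```

Under these assumptions one number is output per clock cycle, and the register ends one step ahead of the last value output.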





V. CONCLUSION

We have designed both a General Purpose CPU Datapath Unit and a Custom Datapath Unit. The General Purpose Datapath Unit can be used for all operations, whereas the Custom Datapath Unit is a more complex type designed for specific applications such as complex addition and complex multiplication. Both units were designed for FPGA using ModelSim, and each unit was analyzed using timing analysis.

