eecs 470: computer architecture

EECS 470: Computer Architecture

Discussion #2

Friday, September 14, 2007

Administrative

• Homework 1 due right now

• Project 1 due tonight

– Make sure it’s synthesizable

• Homework 2 due a week from Wednesday

• Project 2 due a week from Monday

Setting values – assign statements

• Describe combinational logic

– assign <wire> = <expression>;

• The left-hand side must be a wire; the right-hand side can be any expression

Example

wire [3:0] a;      // 4 bits
wire [3:0] b;      // 4 bits
wire       sel;    // sel == 0 -> a, otherwise b
wire [3:0] result;

assign result = sel ? b : a;

Setting Values – always block

• All outputs (LHS) must be declared as reg

– Much of the time they’ll still become wires after synthesis

• We’ll mostly use two kinds:

– always @* – updated whenever any of the inputs change

• All outputs should become wires

• Use the blocking operator (=)

• Assign every variable on every path through the block

– always @(posedge clock) – updated synchronously at the positive clock edge

• Use a delay on each assignment (#1)

• Use the non-blocking operator (<=)

• Does not need to assign every variable on every path (unassigned registers hold their value)

Setting values – always block examples

Combinational Example

reg x;
…
always @*
begin
  if (en)
    x = a | b;
  else
    x = c;
end


Combinational Example

reg x;
…
always @*
begin
  x = c;
  if (en)
    x = a | b;
end


Sequential Example

always @(posedge clock)
begin
  if (reset)
    x <= #1 1'b0;
  else
  begin
    if (en)
      x <= #1 new_x;
  end
end

Simple Example

4-input AND gate

module AND2(a,x);
  input [1:0] a;
  output x;
  assign x = a[0] & a[1];
endmodule

module AND4(in,out);
  input [3:0] in;
  output out;
  wire [1:0] tmp;
  AND2 left (.a(in[1:0]), .x(tmp[0]));
  AND2 right(.a(in[3:2]), .x(tmp[1]));
  AND2 top  (.a(tmp),     .x(out));
endmodule

Simple Example - Diagram

[Diagram: AND4 built from three AND2 gates (inputs a[0], a[1], output x). in[1:0] and in[3:2] each feed an AND2, producing tmp[0] and tmp[1]; a third AND2 combines tmp into out.]

Array Connections

• Make a simple module and duplicate it a bunch

• Assume we have a module definition:

– one_bit_addr(a,b,cin,sum,cout);

• All ports are 1 bit; the first three are inputs and the last two are outputs

• How do we build an eight-bit adder?

The Error Prone Way

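module eight_bit_addr(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  wire [6:0] carries;

  one_bit_addr a0(a[0],b[0],cin,sum[0],carries[0]);
  one_bit_addr a1(a[1],b[1],carries[0],sum[1],carries[1]);
  one_bit_addr a2(a[2],b[2],carries[1],sum[2],carries[2]);
  one_bit_addr a3(a[3],b[3],carries[2],sum[3],carries[3]);
  one_bit_addr a4(a[4],b[4],carries[3],sum[4],carries[4]);
  one_bit_addr a5(a[5],b[5],carries[4],sum[5],carries[5]);
  one_bit_addr a6(a[6],b[6],carries[5],sum[6],carries[6]);
  one_bit_addr a7(a[7],b[7],carries[6],sum[7],cout);
endmodule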

The Error Prone Way Continued

• Lots of duplicated code

• If you miss replacing one number, it’s hard to find

– Especially if the design were much bigger and had even more connections

– Your tests might not catch the case

• There is a one-line substitute

The Better Way

module eight_bit_addr(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  wire [6:0] carries;

  one_bit_addr addr [7:0]
    (.a(a), .b(b), .cin({carries,cin}), .sum(sum), .cout({cout,carries}));
endmodule

• The one_bit_addr ports are all 1 bit and we instantiate 8 of them, while the eight_bit_addr signals are 8 bits wide, so each instance’s 1-bit port gets one bit of the 8-bit value.

Array Connections Summary

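• If the signal width matches the port width, the whole signal is connected to that port on every instance

• If the signal width equals the port width times the number of instances, each instance gets its own slice

• Note the concatenation operator in the previous example

– It makes the carries signal the correct width (8 bits) and takes care of the boundary conditions: cin feeds bit 0, and bit 7 drives cout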
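As a sketch of both rules, assume a hypothetical 1-bit flip-flop module, dff1, which is not part of the course files. An 8-bit register can then be built from an array of eight dff1 instances. The 1-bit clock and reset match the port width, so they are broadcast to every instance. The 8-bit d and q are split one bit per instance.

```verilog
// Hypothetical 1-bit D flip-flop with a synchronous reset.
module dff1(clock, reset, d, q);
  input  clock, reset, d;
  output q;
  reg    q;

  always @(posedge clock)
  begin
    if (reset)
      q <= #1 1'b0;
    else
      q <= #1 d;
  end
endmodule

// Hypothetical 8-bit register built from an array of eight dff1 instances.
module reg8(clock, reset, d, q);
  input        clock, reset;
  input  [7:0] d;
  output [7:0] q;

  // clock and reset are 1 bit, the same width as the dff1 ports, so every
  // instance gets the whole signal (broadcast).
  // d and q are 8 bits = 1-bit port x 8 instances, so instance bits[i]
  // gets d[i] and drives q[i] (one slice per instance).
  dff1 bits [7:0] (.clock(clock), .reset(reset), .d(d), .q(q));
endmodule
```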

Synthesis

• Translate Verilog to gates

• Optimize the translation to meet certain constraints

• Extremely complex process

• If you follow all the directions we’ve given you, everything will probably work

– I’m not guaranteeing it though

• All your designs will need to synthesize

– That way you’ll know you’re not doing anything that would be hard to implement in gates

• The clock period isn’t perfect

– No global placement and routing

– We fake the capacitance of wires

Hints to Synthesis Tool

• // synopsys sync_set_reset "<signal>"

– Goes right before a synchronous always block

– Tells Design Compiler that <signal> is a synchronous reset

– Helps the synthesis tool choose a synchronous reset

• // synopsys parallel_case

– Placed on the case statement, right after case (<expression>)

– Only one branch of the case can be true at a time

• // synopsys full_case

– Placed on the case statement, right after case (<expression>)

– Any unspecified cases are invalid (their outputs are don’t-cares)

– You can also put a default: in the case for good measure

• // synopsys one_hot "<signal>"

– Placed after the signals are declared

– Only one signal of the group will be 1 at a given time
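Here is a minimal sketch of where the sync_set_reset, parallel_case and full_case directives sit. It uses a hypothetical registered mux with a one-hot select, and the module and signal names are illustrative only. Because the directives are comments, simulators ignore them; only Design Compiler reads them.

```verilog
// Hypothetical registered mux; sel is assumed to be one-hot.
module onehot_mux_reg(clock, reset, sel, a, b, c, q);
  input        clock, reset;
  input  [2:0] sel;        // assumed one-hot: exactly one bit is 1
  input  [7:0] a, b, c;
  output [7:0] q;
  reg    [7:0] q;
  reg    [7:0] mux_out;

  // parallel_case: at most one item can match, so synthesis need not
  // build a priority chain. full_case: unlisted values never occur.
  // The default keeps simulation well defined if sel is not one-hot.
  always @*
  begin
    case (1'b1) // synopsys parallel_case full_case
      sel[0]:  mux_out = a;
      sel[1]:  mux_out = b;
      sel[2]:  mux_out = c;
      default: mux_out = 8'h00;
    endcase
  end

  // synopsys sync_set_reset "reset"
  always @(posedge clock)
  begin
    if (reset)
      q <= #1 8'h00;
    else
      q <= #1 mux_out;
  end
endmodule
```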

Synthesis Scripts

#/***********************************************************/
#/* The following five lines must be updated for every      */
#/* new design                                              */
#/***********************************************************/
read_file -f verilog [list "inout.v"]
set design_name tinout
set clock_name clock
set CLK_PERIOD 6
set reset_name reset
#/***********************************************************/
#/* The rest of this file may be left alone for most small  */
#/* to moderate sized designs. You may need to alter it     */
#/* when synthesizing your final project.                   */
#/***********************************************************/
set SYN_DIR ./
set search_path "/afs/engin.umich.edu/caen/generic/mentor_lib-D.1/public/eecs470/synopsys/"
set target_library "lec25dscc25_TT.db"
…

Synthesis Script

• A bunch of directives that tell Design Compiler what to do

• Minimally, you need to be familiar with the first 5 lines

• read_file -f verilog [list "myfile.v"]

– Read the Verilog file myfile.v

• set design_name mydesign

– Synthesize the module mydesign and all modules it instantiates

• set clock_name clock

– The name of the clock

• set CLK_PERIOD 6

– Set the clock period to 6 ns

• set reset_name reset

– The name of the reset line (here, reset)

More Advanced Synthesis

• As designs get bigger, you may want to break synthesis up into multiple parts

– In this case you compile lower-level modules separately and work your way up

– Although it isn’t strictly necessary in general, you’ll need to do it for the multiplier

– We’ll talk more about it for your final project

• The lowest level will be just like this

• Higher levels read in the lower levels’ compiled output along with the file to synthesize

• Look at the .tcl files in Project 2 to see how the higher level includes the lower level

• You should familiarize yourself with the .tcl files. To look at the documentation for VCS or Design Compiler, run: sold

Synthesis Output

• xxxx_synth.out – the output that scrolls across the screen at high speed

• <designname>.chk – the synthesis tool places warnings in here

• <designname>.rep – timing report

• <designname>.vg – structural Verilog output

• <designname>.db/xg – compiled output for including in other designs

synth.out

• Prints all the lines in the .tcl file as it executes them

• If you have a problem with synthesis, this is a good first place to look

– e.g. *** Presto compilation terminated with 2 errors. ***

• Also contains information about what flip-flops/latches it found

synth.out – Good output

Inferred memory devices in process
        in routine <design_name> line XXX in file
                '<path to file>/<file>.v'.
===============================================================================
|  Register Name  |   Type    | Width | Bus | MB | AR | AS | SR | SS | ST |
===============================================================================
|    state_reg    | Flip-flop |   2   |  Y  | N  | N  | N  | Y  | N  | N  |
===============================================================================

• All the Types are Flip-flop

• Every register we think we should have is listed, along with the correct width

synth.out – Bad output

Inferred memory devices in process
        in routine <design_name> line XXX in file
                '<path to file>/<file>.v'.
===========================================================================
|  Register Name  | Type  | Width | Bus | MB | AR | AS | SR | SS | ST |
===========================================================================
| next_state_reg  | Latch |   2   |  Y  | N  | N  | N  | -  | -  | -  |
===========================================================================

• You should never see a Latch

– It means you have some state in one of your combinational blocks (usually a variable that isn’t assigned on every path)

• The message gives you the line number so you can go find the error
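The sketch below shows the usual cause and fix using hypothetical next-state logic; the module name is illustrative. If an always @* block leaves next_state unassigned on some path, the variable has to remember its old value. Synthesis then infers a latch like next_state_reg above. Assigning a default at the top of the block covers every path, so the block stays purely combinational.

```verilog
// Hypothetical counter-style state machine illustrating the latch fix.
module next_state_logic(clock, reset, go, state);
  input        clock, reset, go;
  output [1:0] state;
  reg    [1:0] state, next_state;

  // Latch-inferring version (do NOT do this):
  //   always @*
  //     if (go)
  //       next_state = state + 2'b01;  // no else: next_state must hold
  //                                    // its old value when go == 0
  // Fixed version: next_state gets a value on every path.
  always @*
  begin
    next_state = state;            // default assignment
    if (go)
      next_state = state + 2'b01;
  end

  always @(posedge clock)
  begin
    if (reset)
      state <= #1 2'b00;
    else
      state <= #1 next_state;
  end
endmodule
```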

<design name>.chk

• Prints warnings that may or may not be a problem

• Good to look at to verify that you don’t have a problem, e.g.

Warning: In design 'icache', port 'proc2Icache_addr[0]' is not connected to any nets. (LINT-28)

– That is fine if you didn’t connect those bits to anything

– Or if they are always 0 because you can’t have an unaligned access

• Will give you places to look if you have problems with your synthesized code

<design name>.rep

• Lists critical paths through your design

• All slacks should be “MET”

– If any are “VIOLATED”, either your clock period is too aggressive or your design is bad

<design name>.rep

Startpoint: state_reg[1]
            (rising edge-triggered flip-flop clocked by clock)
Endpoint: gnt_b (output port clocked by clock)
...
Point                          Fanout   Trans    Incr    Path
---------------------------------------------------------------------
state_reg[1]/CLK (dffcs1)                0.00    0.00    0.00 r
state_reg[1]/QN (dffcs1)                 0.15    0.16    0.16 f
n5 (net)                            1            0.00    0.16 f
state_reg[1]/Q (dffcs1)                  0.59    0.24    0.40 r
gnt_b (net)                         2            0.00    0.40 r
gnt_b (out)                              0.59    0.02    0.42 r
data arrival time                                        0.42

max_delay                                        6.00    6.00
clock uncertainty                               -0.10    5.90
output external delay                           -0.10    5.80
data required time                                       5.80
---------------------------------------------------------------------
data required time                                       5.80
data arrival time                                       -0.42
---------------------------------------------------------------------
slack (MET)                                              5.38

<design name>.rep

• Trans – time for a logic transition to occur

• Incr – time that this element adds to the critical path

• Path – total path delay so far

• Slack needs to be positive: the closer it is to 0, the closer you are to the clock period limit

• Just because you have X ns of slack doesn’t mean that you can’t do better

– If there is a lot of slack, Design Compiler won’t try very hard

– The closer you are to the limit, the harder it will try (and the longer it will take)
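Working the numbers from the sample report above, the required time is max_delay minus the clock uncertainty and the output external delay. The slack is the required time minus the data arrival time. The times are assumed to be in ns, as the 6 ns CLK_PERIOD suggests.

```latex
\begin{aligned}
t_{\text{required}} &= t_{\text{max\_delay}} - t_{\text{uncertainty}} - t_{\text{ext}}
  = 6.00 - 0.10 - 0.10 = 5.80\ \text{ns} \\
\text{slack} &= t_{\text{required}} - t_{\text{arrival}}
  = 5.80 - 0.42 = 5.38\ \text{ns} > 0 \;\Rightarrow\; \text{MET}
\end{aligned}
```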

<design name>.vg

module a1 ( clock, reset, req_a, gnt_a, req_b, gnt_b );
  input clock, reset, req_a, req_b;
  output gnt_a, gnt_b;
  wire N19, N20, N21, n2, n3, n5;
  wire [1:0] next_state;

  hib1s1 U9 ( .Q(n2), .DIN(reset) );
  dffcs2 \state_reg[0] ( .Q(gnt_a), .CLK(clock), .CLRB(next_state[0]), .DIN(n2) );
  dffcs1 \state_reg[1] ( .Q(gnt_b), .QN(n5), .CLK(clock), .CLRB(next_state[1]), .DIN(n3) );
  and2s1 U10 ( .Q(N19), .DIN1(req_a), .DIN2(n5) );
  nor3s1 U11 ( .Q(N21), .DIN1(N19), .DIN2(gnt_a), .DIN3(n3) );
  ib1s1 U12 ( .Q(n3), .DIN(req_b) );
  or4s1 U13 ( .Q(N20), .DIN1(gnt_a), .DIN2(gnt_b), .DIN3(req_a), .DIN4(req_b) );
endmodule

Multiplying by partial products

• Most hardware multipliers compute a number of partial products and then sum them

• Very similar to how you learned to multiply in second grade

– Do one bit at a time, then sum all the partial products to get your answer

Second Grade Way

      000111         7
  *     0101       * 5
  ----------       ---
      000111
     000000
    000111
 + 000000
  ----------       ---
      100011        35

2 bits at a time – partial products

    Bits [1:0]          Bits [3:2]
    000111              xx011100   (mcand << 2)
  *   xx01            *     01xx   (mplier >> 2)
    ------              --------
    000111                011100
  + 000000              + 000000
    ------              --------
    000111                011100

2 bits at a time – add products

    000111     (7)
  + 011100    (28)
    ------
    100011    (35)

2-stage multiplication – 2 bits at a time

Start:   multiplicand 00001011 (11), multiplier 00000111 (7), partial product 00000000 (0)

Stage 1: multiplicand × low 2 bits of multiplier
           00001011 × 00000011 = 00100001 (33)
         partial product: 00000000 (0) + 00100001 (33) = 00100001 (33)

Shift:   multiplicand << 2 = 00101100 (44), multiplier >> 2 = 00000001 (1)

Stage 2: multiplicand × low 2 bits of multiplier
           00101100 × 00000001 = 00101100 (44)
         partial product: 00100001 (33) + 00101100 (44) = 01001101 (77)
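Below is a behavioral sketch of the 2-bits-at-a-time algorithm. It assumes a hypothetical module, mult_2bits, with 8-bit operands and a 16-bit product. The worked example becomes a for loop inside an always @* block, so the result is purely combinational. The pipelined project multiplier instead does one such step per pipeline stage.

```verilog
// Hypothetical combinational multiplier: sums 2-bit partial products.
module mult_2bits(mcand, mplier, product);
  input  [7:0]  mcand, mplier;
  output [15:0] product;
  reg    [15:0] product;
  reg    [15:0] mc;   // multiplicand, shifted left 2 each step
  reg    [7:0]  mp;   // multiplier, shifted right 2 each step
  integer       i;

  always @*
  begin
    product = 16'd0;
    mc      = {8'd0, mcand};
    mp      = mplier;
    for (i = 0; i < 4; i = i + 1)
    begin
      // Partial product: shifted multiplicand times the 2 low bits.
      product = product + mc * mp[1:0];
      mc      = mc << 2;   // the next 2-bit digit is worth 4x as much
      mp      = mp >> 2;   // bring the next 2 bits down to [1:0]
    end
  end
endmodule
```

With mcand = 11 and mplier = 7, as in the worked example, product is 77.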

Project 2 – Part 1

• Supplied with an 8-stage multiplier

• Make a 4-stage multiplier

• Make a 2-stage multiplier

• Synthesize each and answer some questions

– Make sure you set an aggressive clock period

Part 1 – pipe_mult.v

module mult(clock, reset, mplier, mcand, start, product, done);
  input clock, reset, start;
  input [63:0] mcand, mplier;
  output [63:0] product;
  output done;

  wire [63:0] mcand_out, mplier_out;
  wire [(7*64)-1:0] internal_products, internal_mcands, internal_mpliers;
  wire [6:0] internal_dones;

  mult_stage mstage [7:0]
    (.clock(clock),
     .reset(reset),
     .product_in({internal_products,64'h0}),
     .mplier_in({internal_mpliers,mplier}),
     .mcand_in({internal_mcands,mcand}),
     .start({internal_dones,start}),
     .product_out({product,internal_products}),
     .mplier_out({mplier_out,internal_mpliers}),
     .mcand_out({mcand_out,internal_mcands}),
     .done({done,internal_dones})
    );
endmodule

Part 1 – mult_stage.v

module mult_stage(clock, reset, product_in, mplier_in, mcand_in, start,
                  product_out, mplier_out, mcand_out, done);
  ....
  reg [63:0] prod_in_reg, partial_prod_reg;
  wire [63:0] partial_product, next_mplier, next_mcand;

  assign product_out = prod_in_reg + partial_prod_reg;
  assign partial_product = mplier_in[7:0] * mcand_in;
  assign next_mplier = {8'b0,mplier_in[63:8]};
  assign next_mcand = {mcand_in[55:0],8'b0};

  always @(posedge clock)
  begin
    prod_in_reg <= #1 product_in;
    partial_prod_reg <= #1 partial_product;
    mplier_out <= #1 next_mplier;
    mcand_out <= #1 next_mcand;
  end

  always @(posedge clock)
  begin
    if (reset)
      done <= #1 1'b0;
    else
      done <= #1 start;
  end
endmodule

Part 2 – Integer Square Root

• Conceptually it’s a loop

– Propose that the highest untested bit of the answer is set, and square the proposed answer

– If the result is less than or equal to the value, keep the bit set

– Otherwise, clear the bit

– Now try the next most significant bit

• You won’t use a loop primitive to implement it, though
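A behavioral reference model of the loop above can be useful for checking your ISR module in a testbench. This sketch assumes a 64-bit input and a 32-bit result (the 32 bits the state machine tests) and uses a hypothetical module name, isr_model. It deliberately uses a for loop, so it is not the state machine the project asks you to build.

```verilog
// Hypothetical behavioral reference model of the bit-by-bit integer square
// root. For testbench checking only: it uses a for loop.
module isr_model(value, root);
  input  [63:0] value;
  output [31:0] root;
  reg    [31:0] root;

  function [31:0] isr_ref;
    input [63:0] v;
    reg   [31:0] guess;
    reg   [63:0] square;
    integer      i;
    begin
      guess = 32'd0;
      // Try bits from most to least significant.
      for (i = 31; i >= 0; i = i - 1)
      begin
        guess[i] = 1'b1;                            // propose this bit
        square   = {32'd0, guess} * {32'd0, guess}; // full 64-bit square
        if (square > v)
          guess[i] = 1'b0;                          // too big: clear it
        // otherwise square <= v, so the bit stays set
      end
      isr_ref = guess;
    end
  endfunction

  always @*
    root = isr_ref(value);
endmodule
```

The comparison keeps the bit when the square equals the value, so perfect squares such as 16 come out exactly (root 4).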

Part 2 – ISR state machine

• Set the highest bit of the solution

• Start a multiply

• Wait until the multiply completes

• Check the result against the value that you’re computing the ISR of

• If it’s less than or equal, keep the bit; if it’s greater, clear the bit

• Move on to the next most significant bit until you’ve tested all 32 bits

• When done with all 32 bits, raise the done signal for 1 cycle

• If at any time you receive a reset signal, start over

Part 2 – Warnings

• When you’re dealing with 64-bit numbers in Verilog, you need to specify them as 64'hXXXX or 64'dXXXXX

• If you leave off the 64', the constant is unsized (only guaranteed to be 32 bits wide), so you may not get the number you wanted

• Pay attention to how the reset operates

– If your device receives a reset during its calculation, it should start over with the new value

– The reset causes the input value to be flopped (stored by the ISR module)

– The value can change after the reset goes low

• Your testbenches should also test these conditions

• Must not take more than 600 cycles to complete one ISR

– The average is between 300 and 400 cycles

Part 2 – Simple Example

Input: 10101101 (173)

Bit tried   Proposed    Proposed²         Compare       Result
bit 3       1000 (8)    01000000 (64)     64 <= 173     keep  -> 1000 (8)
bit 2       1100 (12)   10010000 (144)    144 <= 173    keep  -> 1100 (12)
bit 1       1110 (14)   11000100 (196)    196 > 173     clear -> 1100 (12)
bit 0       1101 (13)   10101001 (169)    169 <= 173    keep  -> 1101 (13)

Answer: 1101 (13); √173 ≈ 13.153

Part 3 – Synthesize ISR

• Synthesize the ISR module you made in Part 2

• Answer some more questions

Synthesis – mult.tcl

read_file -f ddc [list "mult_stage.ddc"]
set_dont_touch mult_stage
read_file -f verilog [list "pipe_mult.v"]
set design_name mult
set clock_name clock
set reset_name reset
set CLK_PERIOD 10
…
