EECS 470: Computer Architecture Discussion #2 Friday, September 14, 2007


Page 1: EECS 470: Computer Architecture

EECS 470: Computer Architecture

Discussion #2

Friday, September 14, 2007

Page 2: EECS 470: Computer Architecture

Administrative

• Homework 1 due right now

• Project 1 due tonight

– Make sure it's synthesizable

• Homework 2 due a week from Wednesday

• Project 2 due a week from Monday

Page 3: EECS 470: Computer Architecture

Setting values – assign statements

• Describes combinational logic: assign <wire> = <expression>;

• Left-hand side must be a wire; right-hand side can be any expression

Example

wire [3:0] a;       // 4 bits
wire [3:0] b;       // 4 bits
wire       sel;     // sel == 0 -> a, otherwise b
wire [3:0] result;

assign result = sel ? b : a;

Page 4: EECS 470: Computer Architecture

Setting Values – always block

• All outputs (LHS) must be declared as reg

– Much of the time they'll still synthesize to wires

• We'll mostly use two kinds:

– always @* – updated whenever any of its inputs change

• All outputs should become wires

• Use the blocking operator (=)

• Assign every output on all paths through the block

– always @(posedge clock) – updated synchronously at the positive clock edge

• Use a #1 delay on each assignment (simulation only; synthesis ignores delays)

• Use the non-blocking operator (<=)

• Does not need to assign every variable on all paths; unassigned registers keep their value

Page 5: EECS 470: Computer Architecture

Setting values – always block examples

Combinational Example

reg x;

always @*
begin
  if (en)
    x = a | b;
  else
    x = c;
end

Page 6: EECS 470: Computer Architecture

Setting values – always block examples

Combinational Example

reg x;

always @*
begin
  x = c;
  if (en)
    x = a | b;
end
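The two combinational examples above are meant to describe the same logic. A sketch that puts both in one hypothetical module (comb_examples, with assumed outputs x1 and x2 for the two styles) so a simulation can check that they agree on every input:

```verilog
// Hypothetical wrapper: both combinational styles side by side.
module comb_examples(en, a, b, c, x1, x2);
  input  en, a, b, c;
  output x1, x2;
  reg    x1, x2;   // declared reg, but these become wires after synthesis

  // Style 1: every branch of the if/else assigns x1
  always @*
  begin
    if (en)
      x1 = a | b;
    else
      x1 = c;
  end

  // Style 2: default assignment first, then override when en is set
  always @*
  begin
    x2 = c;
    if (en)
      x2 = a | b;
  end
endmodule
```

Both blocks assign their output on every path, which is what keeps synthesis from inferring a latch.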

Page 7: EECS 470: Computer Architecture

Setting values – always block examples

Sequential Example

always @(posedge clock)
begin
  if (reset)
    x <= #1 1'b0;
  else
  begin
    if (en)
      x <= #1 new_x;
  end
end
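The sequential example above is a fragment. As a sketch, here it is wrapped in a hypothetical module (en_reg; the name and port list are assumptions) so it can be simulated: x clears on a synchronous reset, loads new_x when en is high, and otherwise holds its value.

```verilog
// Hypothetical wrapper around the sequential example.
module en_reg(clock, reset, en, new_x, x);
  input  clock, reset, en, new_x;
  output x;
  reg    x;

  always @(posedge clock)
  begin
    if (reset)
      x <= #1 1'b0;        // synchronous reset
    else
    begin
      if (en)
        x <= #1 new_x;     // load only when enabled
    end                    // no else branch: x keeps its old value
  end
endmodule
```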

Page 8: EECS 470: Computer Architecture

Simple Example

4-input AND gate

module AND2(a,x);
  input [1:0] a;
  output x;
  assign x = a[0] & a[1];
endmodule

module AND4(in,out);
  input [3:0] in;
  output out;
  wire [1:0] tmp;

  AND2 left(.a(in[1:0]),.x(tmp[0]));
  AND2 right(.a(in[3:2]),.x(tmp[1]));
  AND2 top(.a(tmp),.x(out));
endmodule

Page 9: EECS 470: Computer Architecture

Simple Example - Diagram

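[Diagram: AND4 built from three AND2 instances. in[1:0] and in[3:2] drive the a[0]/a[1] inputs of two AND2 gates, whose x outputs are tmp[0] and tmp[1]; tmp feeds a third AND2 whose x output is out.]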

Page 10: EECS 470: Computer Architecture

Array Connections

• Make a simple module and duplicate it many times

• Assume we have a module definition:

– one_bit_addr(a,b,cin,sum,cout);

• All ports are 1 bit wide; the first three are inputs, the last two are outputs

• How do we build an eight-bit adder?

Page 11: EECS 470: Computer Architecture

The Error Prone Way

module eight_bit_addr(a,b,cin,sum,cout);
  input  [7:0] a,b;
  input        cin;
  output [7:0] sum;
  output       cout;
  wire   [6:0] carries;

  one_bit_addr a0(a[0],b[0],cin,       sum[0],carries[0]);
  one_bit_addr a1(a[1],b[1],carries[0],sum[1],carries[1]);
  one_bit_addr a2(a[2],b[2],carries[1],sum[2],carries[2]);
  one_bit_addr a3(a[3],b[3],carries[2],sum[3],carries[3]);
  one_bit_addr a4(a[4],b[4],carries[3],sum[4],carries[4]);
  one_bit_addr a5(a[5],b[5],carries[4],sum[5],carries[5]);
  one_bit_addr a6(a[6],b[6],carries[5],sum[6],carries[6]);
  one_bit_addr a7(a[7],b[7],carries[6],sum[7],cout);
endmodule

Page 12: EECS 470: Computer Architecture

The Error Prone Way Continued

• Lots of duplicated code

• If you miss replacing one number, it's hard to find

– Especially if the module were much bigger and had even more connections

– Your tests might not catch the case

• There is a one-line substitute

Page 13: EECS 470: Computer Architecture

The Better Way

module eight_bit_addr(a,b,cin,sum,cout);
  input  [7:0] a,b;
  input        cin;
  output [7:0] sum;
  output       cout;
  wire   [6:0] carries;

  one_bit_addr addr [7:0]
    (.a(a),.b(b),.cin({carries,cin}),.sum(sum),.cout({cout,carries}));
endmodule

• The one_bit_addr ports are all 1 bit wide and we instantiate 8 of them, while each signal connected to them is 8 bits wide, so each instance gets one bit of every 8-bit value: addr[i] connects to a[i], b[i], and sum[i].

Page 14: EECS 470: Computer Architecture

Array Connections Summary

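• If a signal's width matches the port width, the whole signal is connected to that port on every instance; if it is exactly (port width × number of instances) bits wide, it is split into one slice per instance

• Note the concatenation operator in the previous example

– It makes the carry signals 8 bits wide and handles the boundary conditions: cin feeds addr[0], and addr[7] drives cout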
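The slides only give one_bit_addr's port list. The sketch below assumes it is an ordinary full adder (that body is an assumption, not the course's file) and pairs it with the array-of-instances eight_bit_addr from The Better Way, so the bit slicing and carry concatenations can be simulated:

```verilog
// Hypothetical one_bit_addr body: a plain full adder (assumed).
module one_bit_addr(a, b, cin, sum, cout);
  input  a, b, cin;
  output sum, cout;
  assign sum  = a ^ b ^ cin;                       // sum bit
  assign cout = (a & b) | (a & cin) | (b & cin);   // majority = carry out
endmodule

// Eight-bit ripple-carry adder built with an array of instances.
module eight_bit_addr(a, b, cin, sum, cout);
  input  [7:0] a, b;
  input        cin;
  output [7:0] sum;
  output       cout;
  wire   [6:0] carries;

  // addr[i] gets a[i], b[i], sum[i]; the concatenations shift the
  // carry chain so addr[0] sees cin and addr[7] drives cout.
  one_bit_addr addr [7:0] (.a(a), .b(b), .cin({carries, cin}),
                           .sum(sum), .cout({cout, carries}));
endmodule
```

Because carries is 7 bits, {carries,cin} and {cout,carries} are each exactly 8 bits, one bit per instance.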

Page 15: EECS 470: Computer Architecture

Synthesis

• Translate Verilog to gates

• Optimize the translation to meet certain constraints

• Extremely complex process

• If you follow all the directions we've given you, everything will probably work

– I'm not guaranteeing it though

• All your designs will need to synthesize

– That way you'll know you're not doing anything that would be hard to implement in gates

• The clock period isn't perfect

– No global placement and routing

– We fake the capacitance of wires

Page 16: EECS 470: Computer Architecture

Hints to Synthesis Tool

• // synopsys sync_set_reset "<signal>"

– Goes right before a synchronous always block

– Tells Design Compiler that <signal> is a synchronous reset

– Helps the synthesis tool implement it as a synchronous reset

• // synopsys parallel_case

– Placed on the case statement line, right after the case expression

– Tells the tool that only one case item can match at a time

• // synopsys full_case

– Placed on the case statement line, right after the case expression

– Tells the tool that any unlisted case values are invalid

– You can also put a default: in the case for good measure

• // synopsys one_hot "<signals>"

– Placed after the signals are declared

– Only one signal of the group will be 1 at a given time
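A sketch of the two case directives in use, assuming a hypothetical mux (onehot_mux) whose select vector is guaranteed to be one-hot. The case (1'b1) form compares each select bit against 1; parallel_case tells the tool no two bits are set at once, and full_case that some bit always is:

```verilog
// Hypothetical 4:1 mux with an assumed one-hot select vector.
module onehot_mux(sel, d0, d1, d2, d3, y);
  input  [3:0] sel;            // assumed one-hot: exactly one bit set
  input  [7:0] d0, d1, d2, d3;
  output [7:0] y;
  reg    [7:0] y;

  always @*
  begin
    case (1'b1) // synopsys parallel_case full_case
      sel[0]:  y = d0;
      sel[1]:  y = d1;
      sel[2]:  y = d2;
      sel[3]:  y = d3;
      default: y = 8'hxx;      // "for good measure": y assigned on every path
    endcase
  end
endmodule
```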

Page 17: EECS 470: Computer Architecture

Synthesis Scripts

#/***********************************************************/
#/* The following five lines must be updated for every     */
#/* new design                                              */
#/***********************************************************/
read_file -f verilog [list "inout.v"]
set design_name tinout
set clock_name clock
set CLK_PERIOD 6
set reset_name reset
#/***********************************************************/
#/* The rest of this file may be left alone for most small  */
#/* to moderate sized designs. You may need to alter it     */
#/* when synthesizing your final project.                   */
#/***********************************************************/
set SYN_DIR ./
set search_path "/afs/engin.umich.edu/caen/generic/mentor_lib-D.1/public/eecs470/synopsys/"
set target_library "lec25dscc25_TT.db"

Page 18: EECS 470: Computer Architecture

Synthesis Script

• A set of directives telling Design Compiler what to do

• At a minimum you need to be familiar with the first 5 lines

• read_file -f verilog [list "myfile.v"]

– Read the Verilog file myfile.v

• set design_name mydesign

– Synthesize the module mydesign and all modules it instantiates

• set clock_name clock

– The name of the clock signal

• set CLK_PERIOD 6

– Set the clock period to 6 ns

• set reset_name reset

– The name of the reset signal (here, reset)

Page 19: EECS 470: Computer Architecture

More Advanced Synthesis

• As designs get bigger, you may want to break synthesis up into multiple parts

– In this case you compile lower-level modules separately and work your way up

– Although not strictly necessary in general, you'll need to do it for the multiplier

– We'll talk more about it for your final project

• The lowest level will be just like this script

• Higher levels will read in the lower levels' compiled output along with the file to synthesize

• Look at the .tcl files in Project 2 to see how a higher level includes a lower level

• You should familiarize yourself with the tcl files. To look at the documentation for VCS or Design Compiler, run: sold

Page 20: EECS 470: Computer Architecture

Synthesis Output

• xxxx_synth.out – the output that scrolls across the screen at high speed

• <designname>.chk – the synthesis tool places warnings in here

• <designname>.rep – timing report

• <designname>.vg – structural Verilog output

• <designname>.db/xg – compiled output for including in other designs

Page 21: EECS 470: Computer Architecture

synth.out

• Prints each line of the tcl file as it executes it

• If you have a problem with synthesis, this is a good first place to look

– *** Presto compilation terminated with 2 errors. ***

• Also contains information about which flip-flops/latches it found

Page 22: EECS 470: Computer Architecture

synth.out – Good output

Inferred memory devices in process in routine <design_name> line XXX in file '<path to file>/<file>.v'.

===============================================================================

| Register Name | Type | Width | Bus | MB | AR | AS | SR | SS | ST |

===============================================================================

| state_reg | Flip-flop | 2 | Y | N | N | N | Y | N | N |

===============================================================================

• All the Types are Flip-flop

• Every register we expect to have should be listed, with the correct width

Page 23: EECS 470: Computer Architecture

synth.out – Bad output

Inferred memory devices in process in routine <design_name> line XXX in file '<path to file>/<file>.v'.

===========================================================================

| Register Name | Type | Width | Bus | MB | AR | AS | SR | SS | ST |

===========================================================================

| next_state_reg | Latch | 2 | Y | N | N | N | - | - | - |

===========================================================================

• You should never see a Latch

– It means some variable in one of your combinational blocks is not assigned on every path, so it has to hold state

• The report gives you the line number, so you can go find the error
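A minimal sketch of what produces a Latch line like the one above, using a hypothetical module latch_demo: the first block leaves y_latch unassigned when en is 0, so the tool must infer storage; the second uses a default assignment (the style from the combinational examples) and stays purely combinational.

```verilog
// Hypothetical example: one latch-inferring block, one clean block.
module latch_demo(en, a, b, y_latch, y_comb);
  input  en, a, b;
  output y_latch, y_comb;
  reg    y_latch, y_comb;

  // BAD: no assignment when en == 0, so y_latch must remember its
  // previous value; synthesis reports a Latch here.
  always @*
  begin
    if (en)
      y_latch = a | b;
  end

  // GOOD: the default assignment covers every path.
  always @*
  begin
    y_comb = 1'b0;
    if (en)
      y_comb = a | b;
  end
endmodule
```

In simulation the difference shows up as soon as en drops: y_latch keeps its old value while y_comb goes to 0.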

Page 24: EECS 470: Computer Architecture

<design name>.chk

• Prints warnings that may or may not be a problem

• Good to look at to verify that you don't have a problem, e.g.:

Warning: In design 'icache', port 'proc2Icache_addr[0]' is not connected to any nets. (LINT-28)

– That is fine if you intentionally didn't connect those bits to anything

– Or if they are always 0 because you can't have an unaligned access

• Gives you places to look if you have problems with your synthesized code

Page 25: EECS 470: Computer Architecture

<design name>.rep

• Lists the critical paths through your design

• All slacks should be "MET"

– If any are "VIOLATED", either your clock period is too aggressive or the design needs improvement

Page 26: EECS 470: Computer Architecture

<design name>.rep

Startpoint: state_reg[1]

(rising edge-triggered flip-flop clocked by clock)

Endpoint: gnt_b (output port clocked by clock)

...

Point Fanout Trans Incr Path

---------------------------------------------------------------------

state_reg[1]/CLK (dffcs1) 0.00 0.00 0.00 r

state_reg[1]/QN (dffcs1) 0.15 0.16 0.16 f

n5 (net) 1 0.00 0.16 f

state_reg[1]/Q (dffcs1) 0.59 0.24 0.40 r

gnt_b (net) 2 0.00 0.40 r

gnt_b (out) 0.59 0.02 0.42 r

data arrival time 0.42

max_delay 6.00 6.00

clock uncertainty -0.10 5.90

output external delay -0.10 5.80

data required time 5.80

---------------------------------------------------------------------

data required time 5.80

data arrival time -0.42

---------------------------------------------------------------------

slack (MET) 5.38

Page 27: EECS 470: Computer Architecture

<design name>.rep

• Trans – time for a logic transition to occur

• Incr – time this point adds to the path

• Path – total path delay so far

• Slack must be positive; the closer it is to 0, the closer you are to the clock period limit

• Just because you have X ns of slack doesn't mean you can't do better

– If there is a lot of slack, Design Compiler won't try very hard

– The closer to the limit you are, the harder it will try (and the longer it will take)
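The slack in the sample .rep output is plain arithmetic on the listed values. Written out (the symbol names are ours; times in ns):

$$
\begin{aligned}
t_{\text{required}} &= t_{\text{max\_delay}} - t_{\text{uncertainty}} - t_{\text{ext}} = 6.00 - 0.10 - 0.10 = 5.80\\
\text{slack} &= t_{\text{required}} - t_{\text{arrival}} = 5.80 - 0.42 = 5.38
\end{aligned}
$$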

Page 28: EECS 470: Computer Architecture

<design name>.vg

module a1 ( clock, reset, req_a, gnt_a, req_b, gnt_b );
  input clock, reset, req_a, req_b;
  output gnt_a, gnt_b;
  wire N19, N20, N21, n2, n3, n5;
  wire [1:0] next_state;

  hib1s1 U9 ( .Q(n2), .DIN(reset) );
  dffcs2 \state_reg[0] ( .Q(gnt_a), .CLK(clock), .CLRB(next_state[0]), .DIN(n2) );
  dffcs1 \state_reg[1] ( .Q(gnt_b), .QN(n5), .CLK(clock), .CLRB(next_state[1]), .DIN(n3) );
  and2s1 U10 ( .Q(N19), .DIN1(req_a), .DIN2(n5) );
  nor3s1 U11 ( .Q(N21), .DIN1(N19), .DIN2(gnt_a), .DIN3(n3) );
  ib1s1 U12 ( .Q(n3), .DIN(req_b) );
  or4s1 U13 ( .Q(N20), .DIN1(gnt_a), .DIN2(gnt_b), .DIN3(req_a), .DIN4(req_b) );
endmodule

Page 29: EECS 470: Computer Architecture

Multiplying by partial products

• Most hardware multipliers compute a number of partial products and then sum them

• Very similar to how you learned to multiply in second grade

– Multiply by one digit (here, one bit) at a time, then sum all the partial products to get your answer

Page 30: EECS 470: Computer Architecture

Second Grade Way

    000111      7
  *   0101    * 5
  --------
    000111    bit 0 = 1: 7 << 0
    000000    bit 1 = 0: 0
    011100    bit 2 = 1: 7 << 2
  + 000000    bit 3 = 0: 0
  --------
    100011     35
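The second-grade method maps directly onto a loop. A behavioral sketch, assuming a hypothetical module shift_add_mult with 8-bit operands (the project multiplier uses 64-bit ones): each set multiplier bit i contributes the multiplicand shifted left by i, and each clear bit contributes nothing.

```verilog
// Hypothetical behavioral shift-and-add multiplier (8-bit operands).
module shift_add_mult(mcand, mplier, product);
  input  [7:0]  mcand, mplier;
  output [15:0] product;
  reg    [15:0] product;
  integer i;

  always @*
  begin
    product = 16'd0;
    for (i = 0; i < 8; i = i + 1)
      if (mplier[i])                               // a 1 bit adds mcand...
        product = product + ({8'd0, mcand} << i);  // ...shifted left by i
  end
endmodule
```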

Page 31: EECS 470: Computer Architecture

2 bits at a time – partial products

  Low two bits            Next two bits
    000111                  011100    (mcand << 2)
  *   xx01                *   01xx    (mplier >> 2)
    000111                  011100
  + 000000                + 000000
    000111                  011100

Page 32: EECS 470: Computer Architecture

2 bits at a time – add products

000111

+ 011100

100011

Page 33: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00001011 multiplicand: 00001011 (11)

* 00000011 multiplier: 00000111 (7)

partial product: 00000000

Page 34: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00001011 multiplicand: 00001011

* 00000011 multiplier: 00000111

00100001 (33) + partial product: 00000000 (0)

Page 35: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00001011 multiplicand: 00001011

* 00000011 multiplier: 00000111

00100001 partial product: 00100001 (33)

Page 36: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00001011 multiplicand: 00001011 << 2

* 00000011 multiplier: 00000111 >> 2

00100001 partial product: 00100001

Page 37: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00101100 multiplicand: 00101100 (44)

* 00000001 multiplier: 00000001 (1)

partial product: 00100001

Page 38: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00101100 multiplicand: 00101100

* 00000001 multiplier: 00000001

00101100 (44) + partial product: 00100001 (33)

Page 39: EECS 470: Computer Architecture

2-stage multiplication – 2 bits at a time

00101100 multiplicand: 00101100

* 00000001 multiplier: 00000001

00101100 partial product: 01001101 (77)
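The 2-bits-at-a-time walkthrough follows the same pattern mult_stage.v uses with 8 bits: multiply by the low bits of the multiplier, shift the multiplicand left and the multiplier right, and accumulate the partial products. A behavioral sketch with 8-bit operands, assuming a hypothetical module two_bit_mult; a loop stands in for the stages, so this is not a pipelined design.

```verilog
// Hypothetical behavioral sketch: consume two multiplier bits per step.
module two_bit_mult(mcand, mplier, product);
  input  [7:0]  mcand, mplier;
  output [15:0] product;
  reg    [15:0] product, mc;
  reg    [7:0]  mp;
  integer step;

  always @*
  begin
    product = 16'd0;
    mc = {8'd0, mcand};
    mp = mplier;
    for (step = 0; step < 4; step = step + 1) begin
      product = product + mc * mp[1:0];  // partial product for this 2-bit digit
      mc = mc << 2;                      // next digit is worth 4x as much
      mp = mp >> 2;                      // bring the next two bits to the bottom
    end
  end
endmodule
```

With mcand = 11 and mplier = 7 the first step adds 11 × 3 = 33 and the second adds 44 × 1 = 44, giving 77 as in the walkthrough.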

Page 40: EECS 470: Computer Architecture

Project 2 – Part 1

• You are supplied with an 8-stage multiplier

• Make a 4-stage multiplier

• Make a 2-stage multiplier

• Synthesize each and answer some questions

– Make sure you set an aggressive clock period

Page 41: EECS 470: Computer Architecture

Part 1 – pipe_mult.v

module mult(clock, reset, mplier, mcand, start, product, done);
  input clock, reset, start;
  input [63:0] mcand, mplier;
  output [63:0] product;
  output done;

  wire [63:0] mcand_out, mplier_out;
  wire [(7*64)-1:0] internal_products, internal_mcands, internal_mpliers;
  wire [6:0] internal_dones;

  mult_stage mstage [7:0]
    (.clock(clock),
     .reset(reset),
     .product_in({internal_products,64'h0}),
     .mplier_in({internal_mpliers,mplier}),
     .mcand_in({internal_mcands,mcand}),
     .start({internal_dones,start}),
     .product_out({product,internal_products}),
     .mplier_out({mplier_out,internal_mpliers}),
     .mcand_out({mcand_out,internal_mcands}),
     .done({done,internal_dones})
    );
endmodule

Page 42: EECS 470: Computer Architecture

Part 1 – mult_stage.v

module mult_stage(clock, reset, product_in, mplier_in, mcand_in, start,
                  product_out, mplier_out, mcand_out, done);
  ....
  reg [63:0] prod_in_reg, partial_prod_reg;
  wire [63:0] partial_product, next_mplier, next_mcand;

  assign product_out = prod_in_reg + partial_prod_reg;
  assign partial_product = mplier_in[7:0] * mcand_in;
  assign next_mplier = {8'b0,mplier_in[63:8]};
  assign next_mcand = {mcand_in[55:0],8'b0};

  always @(posedge clock)
  begin
    prod_in_reg      <= #1 product_in;
    partial_prod_reg <= #1 partial_product;
    mplier_out       <= #1 next_mplier;
    mcand_out        <= #1 next_mcand;
  end

  always @(posedge clock)
  begin
    if (reset)
      done <= #1 1'b0;
    else
      done <= #1 start;
  end
endmodule

Page 43: EECS 470: Computer Architecture

Part 2 – Integer Square Root

• Conceptually it's a loop

– Tentatively set the highest bit of the answer and square the proposed answer

– If the result is less than or equal to the value, keep the bit set

– Otherwise clear the bit

– Now try the next most significant bit

• You won't use a loop construct to implement it in hardware, though
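The loop above can serve as a reference model in a testbench, where a loop construct is fine even though the hardware itself must be a state machine. A sketch, assuming a hypothetical module isr_model with a 64-bit input and 32-bit result; the comparison is "less than or equal", so exact squares such as 169 come out right.

```verilog
// Hypothetical behavioral ISR reference model (testbench use only).
module isr_model(value, root);
  input  [63:0] value;
  output [31:0] root;
  reg    [31:0] root;
  reg    [31:0] guess;
  integer i;

  always @*
  begin
    root = 32'd0;
    for (i = 31; i >= 0; i = i - 1) begin
      guess = root | (32'd1 << i);                      // propose setting bit i
      if ({32'd0, guess} * {32'd0, guess} <= value)     // 64-bit square
        root = guess;                                   // fits: keep the bit
    end                                                 // else bit stays clear
  end
endmodule
```

For value = 173 this tries 8, 12, 14, and 13 in the same order as the worked example that follows and settles on 13.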

Page 44: EECS 470: Computer Architecture

Part 2 – ISR state machine

• Set the highest bit of the solution

• Start a multiply (square the proposed solution)

• Wait until the multiply completes

• Compare the result against the value whose ISR you're computing

• If less than or equal, keep the bit; if greater, clear the bit

• Move on to the next most significant bit until you've tested all 32 bits

• When done with all 32 bits, raise the done signal for 1 cycle

• If at any time you receive a reset signal, start over

Page 45: EECS 470: Computer Architecture

Part 2 – Warnings

• When you're dealing with 64-bit numbers in Verilog, you need to specify them as 64'hXXXX or 64'dXXXXX

• If you leave off the 64' size, the constant is unsized, which is only guaranteed to be 32 bits wide, so you may not get the number you wanted

• Pay attention to how the reset operates

– If your device receives a reset during its calculation, it should start over with the new value

– The reset causes the input value to be flopped (stored by the ISR module)

– The value can change after the reset goes low

• Your testbenches should also test these conditions

• Must not take more than 600 cycles to complete one ISR

– Average is between 300 and 400 cycles

Page 46: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 0000 (0)

Proposed2 00000000 (0)

Page 47: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1000 (8)

Proposed2 00000000 (0)

Page 48: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1000 (8)

Proposed2 01000000 (64)

Page 49: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1100 (12)

Proposed2 00000000 (0)

Page 50: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1100 (12)

Proposed2 10010000 (144)

Page 51: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1110 (14)

Proposed2 00000000 (0)

Page 52: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1110 (14)

Proposed2 11000100 (196)

Page 53: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1100 (12)

Proposed2 11000100 (196)

Page 54: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1101 (13)

Proposed2 00000000 (0)

Page 55: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1101 (13)

Proposed2 10101001 (169)

Page 56: EECS 470: Computer Architecture

Part 2 – Simple Example

Input 10101101 (173)

Proposed 1101 (13)

Proposed2 10101001 (169)

√173 ≈ 13.153, so the integer square root of 173 is 13 (1101)

Page 57: EECS 470: Computer Architecture

Part 3 – Synthesize ISR

• Synthesize the ISR module you made in Part 2

• Answer some more questions

Page 58: EECS 470: Computer Architecture

Synthesis – mult.tcl

read_file -f ddc [list "mult_stage.ddc"]
set_dont_touch mult_stage
read_file -f verilog [list "pipe_mult.v"]
set design_name mult
set clock_name clock
set reset_name reset
set CLK_PERIOD 10