EECS 470: Computer Architecture
Discussion #2
Friday, September 14, 2007
Administrative
• Homework 1 due right now
• Project 1 due tonight
– Make sure it’s synthesizable
• Homework 2 due a week from Wednesday
• Project 2 due a week from Monday
Setting values – assign statements
• Descriptions of combinational logic
– assign <wire> = <expression>;
• Left-hand side must be a wire; right-hand side can be any expression
Example
wire [3:0] a; // 4 bit
wire [3:0] b; // 4 bit
wire sel; // sel == 0 -> a, otherwise b;
wire [3:0] result;
assign result = sel ? b : a;
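A minimal simulation sketch of the mux above. The module name `mux_demo` and the test values are made up for illustration, and `a`, `b`, and `sel` are declared as `reg` here only so the testbench can drive them.

```verilog
// Hypothetical self-checking testbench for the assign-based mux.
module mux_demo;
  reg  [3:0] a, b;   // driven by the testbench, so reg instead of wire
  reg        sel;
  wire [3:0] result;

  // Continuous assignment from the slide: sel == 0 -> a, otherwise b.
  assign result = sel ? b : a;

  initial begin
    a = 4'h3; b = 4'hC;
    sel = 1'b0; #1;
    if (result !== 4'h3) $display("FAIL: sel=0 gave %h", result);
    sel = 1'b1; #1;
    if (result !== 4'hC) $display("FAIL: sel=1 gave %h", result);
    $display("mux_demo finished");
    $finish;
  end
endmodule
```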
Setting Values – always block
• All outputs (LHS) must be registers
– Much of the time they’ll become wires
• We’ll mostly use two kinds:
– always @* – Updated whenever any of the inputs change
  • All outputs should become wires
  • Use blocking operator (=)
  • Assign variable through all paths
– always @(posedge clock) – Update synchronously at positive clock edge
  • Use delay statements, #
  • Use non-blocking operator (<=)
  • Does not need to assign variable through all paths
Setting values – always block examples
Combinational Example
reg x;
…
always @*
begin
  if (en)
    x = a | b;
  else
    x = c;
end
Setting values – always block examples
Combinational Example
reg x;
…
always @*
begin
  x = c;
  if (en)
    x = a | b;
end
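The two combinational examples describe the same logic: one covers every path with if/else, the other assigns a default first. A small sketch that checks this in simulation follows; the module name `comb_styles_demo` and the signal names `x_ifelse` and `x_default` are invented for this illustration.

```verilog
// Hypothetical testbench comparing the two "assign through all paths" styles.
module comb_styles_demo;
  reg a, b, c, en;
  reg x_ifelse, x_default;
  integer i;

  // Style 1: every branch of the if/else assigns x.
  always @* begin
    if (en) x_ifelse = a | b;
    else    x_ifelse = c;
  end

  // Style 2: default assignment first, then override.
  always @* begin
    x_default = c;
    if (en) x_default = a | b;
  end

  initial begin
    #1;  // let both always blocks start waiting before inputs change
    for (i = 0; i < 16; i = i + 1) begin
      {en, a, b, c} = i;  // low 4 bits of i select the input combination
      #1;
      if (x_ifelse !== x_default)
        $display("MISMATCH at en=%b a=%b b=%b c=%b", en, a, b, c);
    end
    $display("comb_styles_demo finished");
    $finish;
  end
endmodule
```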
Setting values – always block examples
Sequential Example
always @(posedge clock)
begin
  if (reset)
    x <= #1 1'b0;
  else
  begin
    if (en)
      x <= #1 new_x;
  end
end
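To try the sequential example in a simulator, it can be wrapped in a module and driven by a small clocked testbench. This is only a sketch: the module names `en_reg` and `en_reg_tb`, the 10-unit clock, and the stimulus are assumptions, not part of the course files.

```verilog
// Hypothetical wrapper around the sequential example from the slide.
module en_reg(clock, reset, en, new_x, x);
  input  clock, reset, en, new_x;
  output x;
  reg    x;

  always @(posedge clock) begin
    if (reset)
      x <= #1 1'b0;        // synchronous reset
    else if (en)
      x <= #1 new_x;       // no else needed: x simply holds its value
  end
endmodule

module en_reg_tb;
  reg  clock, reset, en, new_x;
  wire x;

  en_reg dut(.clock(clock), .reset(reset), .en(en), .new_x(new_x), .x(x));

  always #5 clock = ~clock;

  initial begin
    clock = 0; reset = 1; en = 0; new_x = 1;
    @(posedge clock); #2;
    if (x !== 1'b0) $display("FAIL: reset did not clear x");
    reset = 0; en = 1;
    @(posedge clock); #2;
    if (x !== 1'b1) $display("FAIL: enabled load did not happen");
    en = 0; new_x = 0;
    @(posedge clock); #2;
    if (x !== 1'b1) $display("FAIL: x changed while en was low");
    $display("en_reg_tb finished");
    $finish;
  end
endmodule
```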
Simple Example
4 Input AND gate
module AND2(a,x);
input [1:0] a;
output x;
assign x=a[0] & a[1];
endmodule
module AND4(in,out);
input [3:0] in;
output out;
wire [1:0] tmp;
AND2 left(.a(in[1:0]),.x(tmp[0]));
AND2 right(.a(in[3:2]),.x(tmp[1]));
AND2 top(.a(tmp),.x(out));
endmodule
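Simple Example - Diagram
[Diagram: AND4 built from three AND2 gates. One AND2 takes in[1:0] and drives tmp[0], one takes in[3:2] and drives tmp[1], and a third takes tmp[0] and tmp[1] on its a[0] and a[1] inputs and drives out.]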
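A quick way to check the AND4 hierarchy is to try all 16 input patterns. The sketch below repeats the AND2 and AND4 modules so it is self-contained; the testbench module `AND4_tb` is an addition for illustration.

```verilog
module AND2(a, x);
  input [1:0] a;
  output x;
  assign x = a[0] & a[1];
endmodule

module AND4(in, out);
  input [3:0] in;
  output out;
  wire [1:0] tmp;
  AND2 left (.a(in[1:0]), .x(tmp[0]));
  AND2 right(.a(in[3:2]), .x(tmp[1]));
  AND2 top  (.a(tmp),     .x(out));
endmodule

// Hypothetical exhaustive testbench: out should equal the AND of all 4 bits.
module AND4_tb;
  reg  [3:0] in;
  wire       out;
  integer    i;

  AND4 dut(.in(in), .out(out));

  initial begin
    for (i = 0; i < 16; i = i + 1) begin
      in = i;
      #1;
      if (out !== (&in))
        $display("FAIL: in=%b out=%b", in, out);
    end
    $display("AND4_tb finished");
    $finish;
  end
endmodule
```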
Array Connections
• Make a simple module and duplicate it a bunch of times
• Assume we have a module definition:
– one_bit_addr(a,b,cin,sum,cout);
• All ports are 1 bit; the first three are inputs, the last two are outputs
• How do we build an eight-bit adder?
The Error Prone Way
module eight_bit_addr(a,b,cin,sum,cout);
input [7:0] a,b;
input cin;
output [7:0] sum;
output cout;
wire [6:0] carries;
one_bit_addr a0(a[0],b[0],cin,sum[0], carries[0]);
one_bit_addr a1(a[1],b[1],carries[0],sum[1], carries[1]);
one_bit_addr a2(a[2],b[2],carries[1],sum[2], carries[2]);
one_bit_addr a3(a[3],b[3],carries[2],sum[3], carries[3]);
one_bit_addr a4(a[4],b[4],carries[3],sum[4], carries[4]);
one_bit_addr a5(a[5],b[5],carries[4],sum[5], carries[5]);
one_bit_addr a6(a[6],b[6],carries[5],sum[6], carries[6]);
one_bit_addr a7(a[7],b[7],carries[6],sum[7], cout);
endmodule
The Error Prone Way Continued
• Lots of duplicated code
• If you miss replacing one number it’s hard to find
– Especially if the design is much bigger and has even more connections
– Your tests might not catch the case
• There is a one-line substitute
The Better Way
module eight_bit_addr(a,b,cin,sum,cout);
input [7:0] a,b;
input cin;
output [7:0] sum;
output cout;
wire [6:0] carries;
one_bit_addr addr [7:0]
(.a(a),.b(b),.cin({carries,cin}),.sum(sum),.cout({cout,carries}));
endmodule
• Since the one_bit_addr ports are all 1 bit wide, we are instantiating 8 of them, and the eight_bit_addr signals are 8 bits wide, each 1-bit port gets one bit of the 8-bit value.
Array Connections Summary
• If the port width matches the wire width, the same wire is connected to that port on every instance
• Note the concatenation operator in the previous example
– It makes the carries the correct width and takes care of the boundary conditions (cin at the bottom, cout at the top)
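The array connection can be exercised in simulation with a sketch like the one below. The slides only give the port list of `one_bit_addr`, so its body here (a plain full adder) is an assumption, as are the testbench name and the test values.

```verilog
// Assumed implementation: the slides only specify the ports of one_bit_addr.
module one_bit_addr(a, b, cin, sum, cout);
  input  a, b, cin;
  output sum, cout;
  assign sum  = a ^ b ^ cin;
  assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

module eight_bit_addr(a, b, cin, sum, cout);
  input  [7:0] a, b;
  input        cin;
  output [7:0] sum;
  output       cout;
  wire   [6:0] carries;

  // Instance addr[0] gets bit 0 of every 8-bit connection, addr[7] gets bit 7.
  // {carries,cin} feeds cin to addr[0]; {cout,carries} takes cout from addr[7].
  one_bit_addr addr [7:0]
    (.a(a), .b(b), .cin({carries, cin}), .sum(sum), .cout({cout, carries}));
endmodule

// Hypothetical testbench: 200 + 100 + 1 = 301, which needs the carry out.
module eight_bit_addr_tb;
  reg  [7:0] a, b;
  reg        cin;
  wire [7:0] sum;
  wire       cout;

  eight_bit_addr dut(.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));

  initial begin
    a = 8'd200; b = 8'd100; cin = 1'b1;
    #1;
    if ({cout, sum} !== 9'd301)
      $display("FAIL: got %0d", {cout, sum});
    else
      $display("PASS");
    $finish;
  end
endmodule
```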
Synthesis
• Translate Verilog to gates
• Optimize the translation to meet certain constraints
• Extremely complex process
• If you follow all the directions we’ve given you, everything will probably work
– I’m not guaranteeing it though
• All your designs will need to synthesize
– That way you’ll know you’re not doing anything that would be hard to implement in gates
• Clock period isn’t perfect
– No global placement and routing
– We fake the capacitance of wires
Hints to Synthesis Tool
• // synopsys sync_set_reset "<signal>"
– Goes right before a synchronous always block
– Tells Design Compiler that the <signal> is a synchronous reset
– Helps the synthesis tool choose a synchronous reset
• // synopsys parallel_case
– Placed before a case statement
– Only one branch of a case can be true at a time
• // synopsys full_case
– Placed before a case statement
– Any unspecified cases are invalid
– You can also put a default: in the case for good measure
• // synopsys one_hot "<signal>"
– Placed after the signal is declared
– Only one signal of the group will be 1 at a given time
Synthesis Scripts
#/***********************************************************/
#/* The following five lines must be updated for every */
#/* new design */
#/***********************************************************/
read_file -f verilog [list "inout.v"]
set design_name tinout
set clock_name clock
set CLK_PERIOD 6
set reset_name reset
#/***********************************************************/
#/* The rest of this file may be left alone for most small */
#/* to moderate sized designs. You may need to alter it */
#/* when synthesizing your final project. */
#/***********************************************************/
set SYN_DIR ./
set search_path "/afs/engin.umich.edu/caen/generic/mentor_lib-D.1/public/eecs470/synopsys/"
set target_library "lec25dscc25_TT.db"
…
Synthesis Script
• A bunch of directives to tell Design Compiler what to do
• At a minimum you need to be familiar with the first 5 lines
• read_file -f verilog [list "myfile.v"]
– Read the Verilog file myfile.v
• set design_name mydesign
– Synthesize the module mydesign and all modules it instantiates
• set clock_name clock
– The name of the clock
• set CLK_PERIOD 6
– Set the clock period to 6ns
• set reset_name reset
– The name of the reset line
More Advanced Synthesis
• As designs get bigger you may want to break up the synthesis into multiple parts
– In this case you may compile lower level modules separately and work your way up
– Although not strictly necessary in general, you’ll need to do it for the multiplier
– We’ll talk more about it for your final project
• The lowest level will be just like this
• Higher levels will include the lower level’s output and the file to synthesize
• Look at the .tcl files in Project 2 to see how the higher level includes the lower level
• You should familiarize yourself with the tcl files. If you would like to look at the documentation for VCS or Design Compiler, execute: sold
Synthesis Output
• xxxx_synth.out – The output that scrolls across the screen at high speed
• <designname>.chk – The synthesis tool places warnings in here
• <designname>.rep – Timing report
• <designname>.vg – Structural Verilog output
• <designname>.db/xg – Compiled output for including in other designs
synth.out
• Prints all the lines in the tcl file as it executes them
• If you have a problem with synthesis this is a good first place to look
– *** Presto compilation terminated with 2 errors. ***
• Also contains information about what flip-flops/latches it found
synth.out – Good output
Inferred memory devices in process
in routine <design_name> line XXX in file
'<path to file>/<file>.v'.
===============================================================================
| Register Name | Type      | Width | Bus | MB | AR | AS | SR | SS | ST |
===============================================================================
| state_reg     | Flip-flop |   2   |  Y  | N  | N  | N  | Y  | N  | N  |
===============================================================================
• All the Types are: Flip-flop
• Every register we think we should have should be listed, along with the correct width
synth.out – Bad output
Inferred memory devices in process
in routine <design_name> line XXX in file
'<path to file>/<file>.v'.
===============================================================================
| Register Name  | Type  | Width | Bus | MB | AR | AS | SR | SS | ST |
===============================================================================
| next_state_reg | Latch |   2   |  Y  | N  | N  | N  | -  | -  | -  |
===============================================================================
• You should never see a Latch
– It means you have some state in one of your combinational blocks
• Gives you the line number to go find the error
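As a sketch of what typically triggers a Latch entry: a combinational always block that does not assign its output on every path has to remember the old value. The module and signal names below are made up for illustration, and the exact report text will depend on the tool.

```verilog
// Hypothetical example: q_latch is expected to infer a latch, q_comb is not.
module latch_vs_comb(en, d, q_latch, q_comb);
  input  en, d;
  output q_latch, q_comb;
  reg    q_latch, q_comb;

  // Incomplete: when en == 0 nothing is assigned, so q_latch must hold
  // its previous value, which requires storage (a latch).
  always @* begin
    if (en)
      q_latch = d;
  end

  // Complete: the default assignment covers the en == 0 path,
  // so this is purely combinational.
  always @* begin
    q_comb = 1'b0;
    if (en)
      q_comb = d;
  end
endmodule
```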
<design name>.chk
• Prints warnings that may or may not be a problem
• Good to look at and verify that you don’t have a problem
Warning: In design 'icache', port 'proc2Icache_addr[0]' is not connected to any nets. (LINT-28)
– That is fine if you didn’t connect those bits to anything
– Or they are always 0 because you can’t have an unaligned access
• Will give you places to look if you have problems with your synthesized code
<design name>.rep
• Lists critical paths through your design
• All slacks should be “MET”
– If any are “VIOLATED” you have too aggressive a clock period or a bad design
<design name>.rep
Startpoint: state_reg[1]
            (rising edge-triggered flip-flop clocked by clock)
Endpoint: gnt_b (output port clocked by clock)
...
Point                        Fanout    Trans     Incr     Path
---------------------------------------------------------------------
state_reg[1]/CLK (dffcs1)              0.00      0.00     0.00 r
state_reg[1]/QN (dffcs1)               0.15      0.16     0.16 f
n5 (net)                       1                 0.00     0.16 f
state_reg[1]/Q (dffcs1)                0.59      0.24     0.40 r
gnt_b (net)                    2                 0.00     0.40 r
gnt_b (out)                            0.59      0.02     0.42 r
data arrival time                                         0.42
max_delay                                        6.00     6.00
clock uncertainty                               -0.10     5.90
output external delay                           -0.10     5.80
data required time                                        5.80
---------------------------------------------------------------------
data required time                                        5.80
data arrival time                                        -0.42
---------------------------------------------------------------------
slack (MET)                                               5.38
<design name>.rep
• Trans – Time for a logic transition to occur
• Incr – Time that is added to the critical path because of it
• Path – Total path delay so far
• Slack needs to be positive: the closer it is to 0, the closer you are to the clock period limit
• Just because you have X ns of slack doesn’t mean that you can’t do better
– If there is a lot of slack the synthesis tool won’t try very hard
– The closer you are to the limit, the harder it will try (and the longer it will take)
<design name>.vg
module a1 ( clock, reset, req_a, gnt_a, req_b, gnt_b );
input clock, reset, req_a, req_b;
output gnt_a, gnt_b;
wire N19, N20, N21, n2, n3, n5;
wire [1:0] next_state;
hib1s1 U9 ( .Q(n2), .DIN(reset) );
dffcs2 \state_reg[0] ( .Q(gnt_a), .CLK(clock), .CLRB(next_state[0]), .DIN( n2) );
dffcs1 \state_reg[1] ( .Q(gnt_b), .QN(n5), .CLK(clock), .CLRB(next_state[.DIN(n3) );
and2s1 U10 ( .Q(N19), .DIN1(req_a), .DIN2(n5) );
nor3s1 U11 ( .Q(N21), .DIN1(N19), .DIN2(gnt_a), .DIN3(n3) );
ib1s1 U12 ( .Q(n3), .DIN(req_b) );
or4s1 U13 ( .Q(N20), .DIN1(gnt_a), .DIN2(gnt_b), .DIN3(req_a), .DIN4(req_b) );
endmodule
Multiplying by partial products
• Most hardware multipliers involve computing a number of partial products and then summing them
• Very similar to how you learned to multiply in second grade
– Do each bit at a time and then sum all the partial products to get your answer
Second Grade Way
     000111        7
   *   0101      * 5
     000111
    0000
   0111
  + 0000
     100011       35
2 bits at a time – partial products
  000111             xx011100  << 2
* xx01             *     01xx  >> 2
  000111               011100
+ 000000           +   000000
  000111               011100
2 bits at a time – add products
  000111
+ 011100
  100011
2-stage multiplication – 2 bits at a time
  00001011            multiplicand:    00001011 (11)
* 00000011            multiplier:      00000111 (7)
                      partial product: 00000000
2-stage multiplication – 2 bits at a time
  00001011            multiplicand:    00001011
* 00000011            multiplier:      00000111
  00100001 (33)     + partial product: 00000000 (0)
2-stage multiplication – 2 bits at a time
  00001011            multiplicand:    00001011
* 00000011            multiplier:      00000111
  00100001            partial product: 00100001 (33)
2-stage multiplication – 2 bits at a time
  00001011            multiplicand:    00001011 << 2
* 00000011            multiplier:      00000111 >> 2
  00100001            partial product: 00100001
2-stage multiplication – 2 bits at a time
  00101100            multiplicand:    00101100 (44)
* 00000001            multiplier:      00000001 (1)
                      partial product: 00100001
2-stage multiplication – 2 bits at a time
  00101100            multiplicand:    00101100
* 00000001            multiplier:      00000001
  00101100 (44)     + partial product: 00100001 (33)
2-stage multiplication – 2 bits at a time
  00101100            multiplicand:    00101100
* 00000001            multiplier:      00000001
  00101100            partial product: 01001101 (77)
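The walkthrough above (11 × 7 = 77, two multiplier bits per step) can be mirrored by a small behavioral loop. This is a simulation-only sketch, not the pipelined hardware from the project; the module name `partial_product_demo` and the 8-bit width are assumptions taken from the example numbers.

```verilog
// Hypothetical behavioral model of "2 bits at a time" multiplication.
module partial_product_demo;
  reg [7:0] mcand, mplier, product;
  integer   stage;

  initial begin
    mcand   = 8'd11;
    mplier  = 8'd7;
    product = 8'd0;
    // 8 multiplier bits, 2 per step -> 4 steps (the last two add zero here).
    for (stage = 0; stage < 4; stage = stage + 1) begin
      // Multiply by the low 2 bits only and accumulate the partial product.
      product = product + (mplier[1:0] * mcand);
      mcand   = mcand  << 2;   // next step is worth 4x as much
      mplier  = mplier >> 2;   // expose the next 2 multiplier bits
      $display("step %0d: product = %0d", stage, product);
    end
    // The result is kept to 8 bits, as in the slide example.
    if (product !== 8'd77) $display("FAIL: got %0d", product);
    else                   $display("PASS");
    $finish;
  end
endmodule
```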
Project 2 – Part 1
• Supplied with an 8-stage multiplier
• Make a 4-stage multiplier
• Make a 2-stage multiplier
• Synthesize each and answer some questions
– Make sure you set an aggressive clock period
Part 1 – pipe_mult.v
module mult(clock, reset, mplier, mcand, start, product, done);
input clock, reset, start;
input [63:0] mcand, mplier;
output [63:0] product;
output done;
wire [63:0] mcand_out, mplier_out;
wire [(7*64)-1:0] internal_products, internal_mcands, internal_mpliers;
wire [6:0] internal_dones;
mult_stage mstage [7:0]
(.clock(clock),
.reset(reset),
.product_in({internal_products,64'h0}),
.mplier_in({internal_mpliers,mplier}),
.mcand_in({internal_mcands,mcand}),
.start({internal_dones,start}),
.product_out({product,internal_products}),
.mplier_out({mplier_out,internal_mpliers}),
.mcand_out({mcand_out,internal_mcands}),
.done({done,internal_dones})
);
endmodule
Part 1 – mult_stage.v
module mult_stage(clock, reset, product_in, mplier_in, mcand_in, start,
product_out, mplier_out, mcand_out, done);
....
reg [63:0] prod_in_reg, partial_prod_reg;
wire [63:0] partial_product, next_mplier, next_mcand;
assign product_out = prod_in_reg + partial_prod_reg;
assign partial_product = mplier_in[7:0] * mcand_in;
assign next_mplier = {8'b0,mplier_in[63:8]};
assign next_mcand = {mcand_in[55:0],8'b0};
always @(posedge clock)
begin
  prod_in_reg <= #1 product_in;
  partial_prod_reg <= #1 partial_product;
  mplier_out <= #1 next_mplier;
  mcand_out <= #1 next_mcand;
end
always @(posedge clock)
begin
  if(reset)
    done <= #1 1'b0;
  else
    done <= #1 start;
end
endmodule
Part 2 – Integer Square Root
• Conceptually it’s a loop
– Propose that the highest bit of the answer is set and square the proposed answer
– If the result <= value, keep the bit set
– Otherwise clear the bit
– Now try the next most significant bit
• You won’t use a loop primitive to implement it though
Part 2 – ISR state machine
• Set the highest bit of the solution
• Start a multiply
• Wait until the multiply completes
• Check the result against the value that you’re computing the ISR of
• If less than or equal, keep the bit; if greater, clear the bit
• Move on to the next most significant bit until you’ve tested all 32 bits
• When done with all 32 bits, raise the done signal for 1 cycle
• If at any time you receive a reset signal, start over
Part 2 – Warnings
• When you’re dealing with 64-bit numbers in Verilog you need to specify them as 64'hXXXX or 64'dXXXXX
• If you leave off the 64' you won’t get the number you wanted
• Pay attention to how the reset operates
– If your device receives a reset during its calculation, it should start over with the new value
– The reset causes the input value to be flopped (stored by the ISR module)
– The value can change after the reset goes low
• Your testbenches should also be testing for these conditions
• Must not take more than 600 cycles to complete one ISR
– Average is between 300 and 400 cycles
Part 2 – Simple Example
Input     10101101 (173)
Proposed  0000 (0)
Proposed² 00000000 (0)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1000 (8)
Proposed² 00000000 (0)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1000 (8)
Proposed² 01000000 (64)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 00000000 (0)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 10010000 (144)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1110 (14)
Proposed² 00000000 (0)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1110 (14)
Proposed² 11000100 (196)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 11000100 (196)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 00000000 (0)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 10101001 (169)
Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 10101001 (169)
√173 = 13.153
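The bit-by-bit search in this example can be written as a behavioral reference model, which may be handy for checking expected values in a testbench. It is only a sketch: it uses a loop, so it is not the state machine the project asks for, and the 8-bit input, the function name `isr8`, and the module name are assumptions sized to match the example rather than the 64-bit project.

```verilog
// Hypothetical testbench-only reference model of the integer square root.
module isr_reference_demo;

  // Try each result bit from most to least significant and keep it
  // only if the square of the proposal does not exceed the input.
  function [3:0] isr8;
    input [7:0] value;
    reg   [3:0] guess;
    reg   [7:0] square;
    integer     i;
    begin
      guess = 4'b0;
      for (i = 3; i >= 0; i = i - 1) begin
        guess[i] = 1'b1;           // propose this bit
        square   = guess * guess;  // 8-bit result, enough for 15 * 15
        if (square > value)
          guess[i] = 1'b0;         // too big: clear the bit
      end
      isr8 = guess;
    end
  endfunction

  initial begin
    if (isr8(8'd173) !== 4'd13) $display("FAIL: isr8(173)");
    if (isr8(8'd169) !== 4'd13) $display("FAIL: isr8(169)");
    if (isr8(8'd0)   !== 4'd0)  $display("FAIL: isr8(0)");
    if (isr8(8'd255) !== 4'd15) $display("FAIL: isr8(255)");
    $display("isr_reference_demo finished");
    $finish;
  end
endmodule
```

The 169 case is the reason the comparison keeps the bit when the square equals the input.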
Part 3 – Synthesize ISR
• Synthesize the ISR module you made in Part 2
• Answer some more questions
Synthesis – mult.tcl
read_file -f ddc [list "mult_stage.ddc"]
set_dont_touch mult_stage
read_file -f verilog [list "pipe_mult.v"]
set design_name mult
set clock_name clock
set reset_name reset
set CLK_PERIOD 10
…