interra confidentialSlide: 1
Synthesis in EDA Flow
by: Saikat Bandyopadhyay
© Interra Systems India Pvt Ltd
Content
• Defining Synthesis
• History
• IC Design Flow
• Synthesis Flow
Analysis and Elaboration
Synthesis
Scheduling and Allocation
Optimization
Technology Mapping
• Synthesis Goals and Constraints
• Synthesizing Big Designs
• Variations in Synthesis
• Q and A
Defining Synthesis
• Conversion of High Level Hardware Description to Gate Level Hardware Description
• Levels of Hardware Description
Gate level
Data Flow level
RTL level
Behavioural level
Gate Level
• Description of the hardware is purely in terms of nets connecting the pins of gate instances and ports
• Example
implements a 2-input mux using gate-level components
  module select(out, s, a, b);
    output out;
    input  s, a, b;
    wire   s_bar, t1, t2;
    INT_NOT  u0 (s_bar, s);      // s_bar = !s
    INT_AND2 u1 (t1, a, s);      // t1 = a & s
    INT_AND2 u2 (t2, b, s_bar);  // t2 = b & s_bar
    INT_OR2  u3 (out, t1, t2);   // out = t1 | t2
  endmodule
Data Flow Level
• Gate level + assign statements
• Normally used to represent combinational circuits
• Can represent sequential circuits if used with instances of latches or flip-flops
• Example: computes absolute value
  module abs (out, in);
    output [7:0] out;
    input  [7:0] in;
    wire   [7:0] twosCIn;
    assign twosCIn = ~in + 1;            // two's complement negation of in
    assign out = in[7] ? twosCIn : in;   // negate only when the sign bit is set
  endmodule
RTL Level
• Explicit clock and state machine
• Technology independent
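• Fixed Architecture
• Synthesizable
• Example:
RTL level description for recognizing the overlapping 101 pattern
• State diagram (S0 = 2'b00, S1 = 2'b01, S2 = 2'b10; edges labelled input/match)
  S0 --0/0--> S0     S0 --1/0--> S1
  S1 --1/0--> S1     S1 --0/0--> S2
  S2 --1/1--> S1     S2 --0/0--> S0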
RTL Level
  module recognize101(match, in, ck);
    input  in, ck;
    output match;
    reg    match;
    reg    [1:0] state;          // S0 = 2'b00, S1 = 2'b01 (seen "1"), S2 = 2'b10 (seen "10")
    always @(posedge ck) begin
      case (state)
        2'b00: begin
          if (in == 1) begin
            state <= 2'b01;
          end
          match <= 1'b0;
        end
        2'b01: begin
          if (in == 0) begin
            state <= 2'b10;
          end
          match <= 1'b0;
        end
        2'b10: begin
          if (in == 1) begin
            state <= 2'b01;      // overlapping: the final "1" starts the next pattern
            match <= 1'b1;
          end else begin
            state <= 2'b00;
            match <= 1'b0;
          end
        end
        default: begin           // unused encoding 2'b11 (and unknown state at power-up)
          state <= 2'b00;
          match <= 1'b0;
        end
      endcase
    end
  endmodule
Behavioural Level
• Implicit clock and scheduling of events
• Architecture independent
• Mostly used for modeling only (not synthesizable)
• Can be synthesized with special behavioural synthesis tools.
• Example:
The following module computes the integer square root
It uses the identity $\sum_{i=0}^{n-1} (2i+1) = n^2$
  module sqrt(in, out);
    input  [7:0] in;
    output [3:0] out;
    reg    [3:0] out;
    reg    [7:0] tmp, odd;       // tmp must be as wide as in
    always @(in) begin
      tmp = in; out = 0; odd = 1;
      while (tmp > 0) begin      // data-dependent loop: not RTL-synthesizable
        if (tmp >= odd) begin
          out = out + 1;         // one more odd number fits
          tmp = tmp - odd;
          odd = odd + 2;         // next odd number
        end else begin
          tmp = 0;               // remainder smaller than the next odd: stop
        end
      end
    end
  endmodule
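A software model is a quick way to check the identity-based algorithm before committing it to hardware. This minimal Python sketch mirrors the behavioural Verilog. The function name `isqrt_odd_sum` is hypothetical. The sketch subtracts the odd numbers 1, 3, 5, ... in turn and counts how many fit.

```python
def isqrt_odd_sum(n: int) -> int:
    """Integer square root via sum_{i=0}^{k-1} (2i+1) = k^2.

    Mirrors the behavioural Verilog: keep subtracting the next odd number
    while it still fits in the remainder; the count is floor(sqrt(n)).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    root, odd, remainder = 0, 1, n
    while remainder >= odd:
        remainder -= odd   # consume the next odd number
        root += 1          # one more term of the sum fits
        odd += 2           # 1, 3, 5, ...
    return root


# Over the module's 8-bit input range, the result fits in 4 bits.
print([isqrt_odd_sum(n) for n in (0, 1, 15, 16, 255)])  # [0, 1, 3, 4, 15]
```

The loop count depends on the input value, which is why RTL synthesis cannot map this module directly.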
History of Synthesis
• Initial IC designs were made by hand at the mask level
Polygon-pushing tools (e.g. Calma®) were used for design.
Simulation was done at this level by simulators like HiLo®.
• Next, tools were developed for automatic generation of operators
Generators produced operators from parameters such as input/output width and architecture (e.g. a 16-bit carry-lookahead adder)
The operators were connected by hand
• Later, schematic entry tools came to market.
Gates or operators could be drawn and connected schematically
Automatic tools would generate the mask from the schematic.
Mentor Graphics Idea Station® integrated schematic entry and simulation
History of Synthesis(cont)
• Next came high-level hardware description languages
Gateway Design Automation came up with the Verilog language
Verilog was essentially developed to model the behavior of electronic circuits, not for synthesis.
Gateway developed the Verilog simulator now called Verilog-XL.
• From High Level Description to Gate Level
Synopsys was earlier called Optimal Solutions Inc. It specialized in gate-level logic optimization.
Synthesis happened as an afterthought. Since this modeling language (Verilog) was available, Synopsys engineers tried to convert various high-level Verilog constructs into gate level wherever possible.
Synthesis as we know it today was born.
IC Design Flow
• Develop and verify the algorithm (C, MATLAB, etc.)
• Hand-convert it to an RTL-level hardware description
• Verify the RTL design by simulation.
• Power and timing estimation tools can also be used at RTL level.
• Synthesis tools are used to convert the description to gate level.
• Simulation or formal verification is done to verify functionality
• Design Flow
[Figure: Algorithm in C/MATLAB (execute and verify the algorithm) → RTL Description (simulate to verify functionality; estimate timing and power) → Synthesis, driven by the tech library and constraints → Gate Description (verify timing and power; verify functionality with simulation or formal verification).]
IC Design Flow (cont)
• A placement tool is now used to assign a place (x, y coordinates) to each gate
• Timing verification is done with better estimates of wire delays
• A routing tool assigns locations to the nets that connect the gate instances.
• Timing verification is done again with further refined wire delays
• The resulting mask is used to prepare the IC
• Design Flow
[Figure: Gate Description, Floor Plan and Physical Library → Placement → Placed Gates (verify timing; verify and correct placement rules) → Routing → Mask (GDSII) (verify timing; verify and correct mask rules) → to IC foundry.]
Synthesis Flow
Translate an RTL-level design description in HDL to a gate-level netlist
Only the synthesizable subset of the HDL is supported for synthesis
Different steps in the synthesis flow
[Figure: RTL Description → Analysis → Elaboration → CDFG Generation → CDFG Traversal (Data Flow Analysis, DFA) → Allocation → Macro Generation → Optimization → Technology Mapping → Writing Netlist → Gate Level Description.]
Synthesis Flow (analysis)
• Analysis
Input: Design description in HDL (Verilog/VHDL file)
Output: Analyzed design units in an intermediate form, either in memory or on disk
Functionality:
• Perform syntax and semantic checks on the design description
• Create a data structure in a language-dependent form (object model)
  module my_mod(z, a, b, c);
    input  [1:0] a, b, c;
    output [1:0] z;
    reg    [1:0] z;
    always @(a or b or c)
      z = a + b - c;
  endmodule
[Figure: object model of my_mod: a module node with its ports and an always node containing the expression.]
Synthesis Flow (elaboration)
• Elaboration
Input: Analyzed design unit list
Output: Elaborated design unit list
Functionality:
• Expand the complete design hierarchy
• Generate a design unit list consisting of distinct design units
• Resolve all parameter values
• Compute all the constant expressions
Before elaboration:
  module top (o, i1, i2);
    input  [7:0] i1, i2;
    output [7:0] o;
    my_mod #(1) u1 (o[1:0], i1[1:0], i2[1:0]);
    my_mod #(3) u2 (o[7:2], i1[7:2], i2[7:2]);
  endmodule
  module my_mod(z, a, b);
    parameter w = 1;
    input  [2*w-1:0] a, b;
    output [2*w-1:0] z;
    assign z = a + b;
  endmodule
After elaboration (one distinct design unit per parameter value):
  module top (o, i1, i2);
    input  [7:0] i1, i2;
    output [7:0] o;
    my_mod_1 u1 (o[1:0], i1[1:0], i2[1:0]);
    my_mod_3 u2 (o[7:2], i1[7:2], i2[7:2]);
  endmodule
  module my_mod_1(z, a, b);
    input  [1:0] a, b;
    output [1:0] z;
    assign z = a + b;
  endmodule
  module my_mod_3(z, a, b);
    input  [5:0] a, b;
    output [5:0] z;
    assign z = a + b;
  endmodule
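Parameter resolution and uniquification can be sketched in Python as follows. This model is hypothetical in several ways:

- It only knows the my_mod example.
- It represents port ranges as (msb, lsb) tuples.
- It uses hypothetical instance names `u1` and `u2`.
- It follows the `<module>_<w>` naming shown on the slide.

A real elaborator works on the full object model.

```python
def elaborate(instances, base="my_mod"):
    """Resolve parameter w of each my_mod instance and create one
    distinct design unit per parameter value.

    instances: list of (instance_name, w) pairs taken from module top.
    Returns (units, bindings): units maps each specialized module name to
    its constant port ranges; bindings maps instance name -> module name.
    """
    units, bindings = {}, {}
    for inst, w in instances:
        name = f"{base}_{w}"
        if name not in units:               # reuse the unit already made for this w
            msb = 2 * w - 1                 # constant-fold the range [2*w-1:0]
            units[name] = {port: (msb, 0) for port in ("a", "b", "z")}
        bindings[inst] = name
    return units, bindings


units, bindings = elaborate([("u1", 1), ("u2", 3)])
print(bindings)           # {'u1': 'my_mod_1', 'u2': 'my_mod_3'}
print(units["my_mod_3"])  # {'a': (5, 0), 'b': (5, 0), 'z': (5, 0)}
```

Two instances with the same parameter value share one design unit, which keeps the elaborated design unit list free of duplicates.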
Synthesis Flow (cdfg)
• Generation of Control and Data Flow Graphs
Input: Elaborated language-dependent data structure
Output: Language-independent Control and Data Flow Graphs (CDFG)
  module my_mod(z, a, b, c, m, n);
    input  [1:0] a, b, c;
    input  m, n;
    output [1:0] z;
    reg    [1:0] z, t;
    always @(a or b or c or m or n) begin
      if (m)      t = a;
      else if (n) t = b;     // no final else: t keeps its value when m = n = 0
      z = t + c;
    end
  endmodule
[Figure: CDFG of my_mod. Control nodes START, IF (m), IF (n), two ENDIFs and END; data nodes t = a, t = b, NOP (no assignment when both m and n are 0) and z = t + c.]
Synthesis Flow (cdfg)
• A distinct component of the synthesis routine:
CDFG Generation
• Populates a language-independent representation of the input design as a Control and Data Flow Graph
• This step is input-language dependent (only its output is language independent)
• Input: in-memory representation of the entire design, created by the analyzer
• Output: language-independent representation of the entire design as a directed graph
• A graph is created for each concurrent block and represents the sequential behaviour of that block
• Each node in the graph is either a control node or a data node
• Each edge in the graph is either a control-flow or a data-flow edge
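To illustrate these node and edge kinds, the CDFG of the my_mod example can be held in a small graph structure. This Python sketch is an assumption about representation. The node ids, labels and the `CDFG` class are hypothetical and are not the tool's internal data structure.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    nodes: dict = field(default_factory=dict)   # node id -> (kind, label)
    edges: list = field(default_factory=list)   # (src, dst, "control" | "data")

    def add_node(self, nid, kind, label):
        self.nodes[nid] = (kind, label)

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))

    def successors(self, nid, kind="control"):
        return [d for s, d, k in self.edges if s == nid and k == kind]

    def predecessors(self, nid, kind="control"):
        return [s for s, d, k in self.edges if d == nid and k == kind]


def my_mod_cdfg():
    """CDFG of: if (m) t = a; else if (n) t = b; z = t + c;"""
    g = CDFG()
    for nid, kind, label in [
        ("start", "control", "START"), ("if_m", "control", "IF m"),
        ("t_a", "data", "t = a"), ("if_n", "control", "IF n"),
        ("t_b", "data", "t = b"), ("nop", "data", "NOP"),
        ("endif_n", "control", "ENDIF"), ("endif_m", "control", "ENDIF"),
        ("z_sum", "data", "z = t + c"), ("end", "control", "END"),
    ]:
        g.add_node(nid, kind, label)
    # control flow: execution order, one edge per branch of each IF
    for src, dst in [("start", "if_m"), ("if_m", "t_a"), ("if_m", "if_n"),
                     ("if_n", "t_b"), ("if_n", "nop"), ("t_b", "endif_n"),
                     ("nop", "endif_n"), ("t_a", "endif_m"),
                     ("endif_n", "endif_m"), ("endif_m", "z_sum"),
                     ("z_sum", "end")]:
        g.add_edge(src, dst, "control")
    # data flow: each definition of t that can reach its use in z = t + c
    for src in ("t_a", "t_b"):
        g.add_edge(src, "z_sum", "data")
    return g


g = my_mod_cdfg()
print(g.successors("if_m"))             # ['t_a', 'if_n']
print(g.predecessors("z_sum", "data"))  # ['t_a', 't_b']
```

On the NOP path, no definition of t reaches z = t + c, so the old value of t must be held. This is what later leads data flow analysis to infer a latch.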
Synthesis Flow (dfa)
Data Flow Analysis and Creating Logic with Generic Gates
• Traverse the CDFG created for each concurrent block
• Calculate the driving logic for each assigned object in each path and store it as a logic equation
• Both data logic and control logic are evaluated
• Realize an abstract structure of the input design
[Figure: the CDFG of my_mod next to the generic logic inferred from it: a MUX choosing between a and b, a LATCH holding t, and an adder computing z = t + c, with inputs a, b, c, m and n.]
Synthesis Flow (dfa)
We analyze the CDFG and store the data in intermediate forms called the path variable array (PVA) and the path variable matrix (PVM)
Path Variable Array (PVA)
• One for each path
• Array of lhs-rhs pairs, e.g. for the path
  p = a + b;
  q = ~en;
  lhs | rhs
  p   | a+b
  q   | ~en
Synthesis Flow (dfa)
Path Variable Matrix (PVM)
• Created each time paths join
• Rows represent lhs (signals getting assigned)
• Columns are paths
• For each column (path) there is an enabling condition
  lhs\cond | m == 1 | m == 2 | m == 3
  p        | a      | b      | a+b
  q        | NULL   | b      | NULL
  r        | m      | NULL   | n
Synthesis Flow (dfa)
Data Flow Analysis
• Each path consists of path segments, and for each path segment the data and control values of every assigned object are evaluated.
• These values are stored in a PVA (Path Variable Array)
• A special construct, the PVM (Path Variable Matrix), is created from the PVAs to hold the values of the objects in different paths.
• Each column in the PVM represents a particular path and each row represents a particular object. Each entry in the matrix is the logic value of a particular object in a particular path.
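This minimal Python sketch builds a PVM from the PVAs of the paths that meet at a join node, using the three-path example (conditions m == 1, m == 2, m == 3). Representing each PVA as a dict from lhs to an rhs string, and NULL as `None`, are assumptions for illustration. The sketch also ignores repeated assignments to the same object within one path.

```python
NULL = None  # marks an object that is not assigned on a path


def build_pvm(paths):
    """paths: list of (enabling_condition, pva), pva being a dict lhs -> rhs.

    Returns (conditions, matrix) where matrix[lhs] holds one entry per path,
    in the same order as conditions, with NULL where the path does not
    assign lhs.
    """
    conditions = [cond for cond, _ in paths]
    rows = []
    for _, pva in paths:                    # collect every assigned object once
        for lhs in pva:
            if lhs not in rows:
                rows.append(lhs)
    matrix = {lhs: [pva.get(lhs, NULL) for _, pva in paths] for lhs in rows}
    return conditions, matrix


conds, pvm = build_pvm([
    ("m == 1", {"p": "a", "r": "m"}),
    ("m == 2", {"p": "b", "q": "b"}),
    ("m == 3", {"p": "a+b", "r": "n"}),
])
for lhs, entries in pvm.items():
    print(lhs, entries)
# p ['a', 'b', 'a+b']
# r ['m', None, 'n']
# q [None, 'b', None]
```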
Synthesis Flow (dfa)
Data Flow Analysis (Example)
[Figure: the CDFG of my_mod annotated with its path structures: root PVA P1 and root PVM M1, PVAs P11, P12 and P121 for the branch path segments, and PVMs M2 and M3.]
Synthesis Flow (dfa)
Data Flow Analysis (Example)
• For each sequential block, one root PVA and one root PVM are allocated (P1, M1)
• Starting from each branch node, a new PVA is created for each path segment (P11 and P12)
• When a join node is hit, a new PVM (M2) is created from the PVAs (P11 and P12)
• This PVM is passed to the allocator to allocate the current data and control logic
• Clock, tristate and hold logic are allocated only from the root PVM (M1)
Synthesis Flow (dfa) Inferring Logic from PVM
• Each row of the PVM is analyzed and logic is inferred.
• For a row in which all columns have values, a one-hot mux is inferred
• For a row in which some columns are empty, a latch is inferred
• Latches, flip-flops and tristates are allocated from the root PVM M1
  lhs\cond | m | ~m
  d        | a | b       → MUX: d = a when m, b when ~m
  lhs\cond | m | ~m
  d        | a | NULL    → LATCH: data a, enable m, output d
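The row rule can be sketched in Python as follows. The assumptions are:

- The row format matches a PVM row: one entry per path condition, with `None` for NULL.
- The returned dictionaries are a hypothetical stand-in for generic MUX and LATCH cells.
- The latch enable is the OR of the conditions of the paths that assign the object.

```python
def infer_row(lhs, conditions, entries):
    """Infer generic logic for one PVM row.

    Every path assigns lhs   -> one-hot MUX selected by the path conditions.
    Some paths leave it NULL -> LATCH enabled only on the assigning paths.
    """
    assigned = [(cond, value) for cond, value in zip(conditions, entries)
                if value is not None]
    if not assigned:
        return None                         # object not driven in this block
    if len(assigned) == len(entries):
        return {"cell": "MUX", "out": lhs, "inputs": assigned}
    enable = " | ".join(cond for cond, _ in assigned)
    return {"cell": "LATCH", "out": lhs, "data": assigned, "enable": enable}


print(infer_row("d", ["m", "~m"], ["a", "b"]))
# {'cell': 'MUX', 'out': 'd', 'inputs': [('m', 'a'), ('~m', 'b')]}
print(infer_row("d", ["m", "~m"], ["a", None]))
# {'cell': 'LATCH', 'out': 'd', 'data': [('m', 'a')], 'enable': 'm'}
```

When several paths assign a latched object, the `data` list would itself become a mux in front of the latch data pin.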
Synthesis Flow(dfa example)
Let's now infer logic for the CDFG that we created
• The initial PVM holds only initial values (NULL)
• At the first join node, PVM M2 is created
• Since it infers a latch, we wait until the root PVM M3
• Since t_1 is not yet allocated, the PVM is divided into a PVM for data logic and a PVM for hold logic
  lhs\cond | n | ~n
  t_1      | b | NULL
  lhs\cond | m | ~m
  t        | a | t_1
Synthesis Flow(dfa example)
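• PVM for data logic
  lhs\cond | m | ~m
  t_2      | a | b
• PVM for hold logic
  lhs\cond | m    | ~m
  t_2      | NULL | ~n
• t_data goes to the data pin and t_hold goes to the hold pin; the output is t
• Finally, logic for z is inferred from the root PVM
[Figure: a MUX selecting a (m) or b (~m) produces t_data; t_hold is derived from m and n; the latched t and c feed an adder producing z.]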
Synthesis Flow(dfa example)
Inferred netlist for the CDFG
[Figure: inferred netlist built from the generic cells RTL_MUX, RTL_LD (latch) and RTL_ADD, with inputs a, b, c, m and n.]
Synthesis Flow (cont.)
• Allocation and Scheduling
Schedule the clock cycle in which each operation is performed
Allocate an actual hardware resource for each logic operation
Bind the allocated resources to the input and output data
Transform the design into netlist form by instantiating cells/macros and connecting them to achieve the functionality
Synthesis Flow (cont.)
• Allocation and Scheduling
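Example of a data flow path for scheduling
Trivial Scheduling
• Assumes infinite resources
• All operations in one clock cycle
• Large clock cycle
• Latency is 0
[Figure: data flow graph of six multiplications, two additions, two subtractions and one comparison, all within a single clock period.]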
Synthesis Flow (cont.)
• Allocation and Scheduling
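ASAP Scheduling
• Each operation takes one clock cycle
• Independent operations are done in parallel
• Operations are scheduled as soon as possible
• Smaller clock period
• Latency is the number of levels
[Figure: the same data flow graph scheduled ASAP into four control steps, T1 to T4.]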
Synthesis Flow (cont.)
• Allocation and Scheduling
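Scheduling under resource constraints
• Resources available
– 1 multiplier
– 1 adder/subtracter
• Small clock period (same as ASAP)
• Small area
• Large latency
[Figure: the same data flow graph scheduled into seven control steps, T1 to T7, using one multiplier and one adder/subtracter.]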
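The three schedules can be reproduced with a short Python sketch. The data flow graph is an assumption. The figure's operation mix (six *, two +, two -, one <) matches the differential-equation example common in high-level synthesis textbooks, so that dependency structure is used here.

- ASAP places each operation one step after its latest predecessor.
- The resource-constrained schedule uses list scheduling, one common heuristic. Ready operations with the longest remaining path go first. This priority rule is a swapped-in choice and not necessarily the tool's method.
- Trivial scheduling would simply put every operation in step 1.

```python
# Assumed data flow graph: op -> (operation type, predecessor ops).
DFG = {
    "v1": ("*", []),            # 3 * x
    "v2": ("*", []),            # u * dx
    "v3": ("*", ["v1", "v2"]),  # v1 * v2
    "v4": ("-", ["v3"]),        # u - v3
    "v5": ("-", ["v4", "v7"]),  # v4 - v7
    "v6": ("*", []),            # 3 * y
    "v7": ("*", ["v6"]),        # v6 * dx
    "v8": ("*", []),            # u * dx
    "v9": ("+", ["v8"]),        # y + v8
    "v10": ("+", []),           # x + dx
    "v11": ("<", ["v10"]),      # v10 < a
}
UNIT = {"*": "mul", "+": "alu", "-": "alu", "<": "alu"}  # add/sub unit also compares


def asap(dfg):
    """Control step of each op: one step after its latest predecessor."""
    step = {}

    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in dfg[op][1]), default=0)
        return step[op]

    for op in dfg:
        visit(op)
    return step


def longest_path_to_sink(dfg):
    """Priority for list scheduling: ops on longer remaining paths go first."""
    succs = {op: [s for s, (_, preds) in dfg.items() if op in preds] for op in dfg}
    memo = {}

    def length(op):
        if op not in memo:
            memo[op] = 1 + max((length(s) for s in succs[op]), default=0)
        return memo[op]

    return {op: length(op) for op in dfg}


def list_schedule(dfg, resources):
    """Resource-constrained list scheduling; resources: unit -> count."""
    prio = longest_path_to_sink(dfg)
    step, t = {}, 0
    while len(step) < len(dfg):
        t += 1
        ready = [op for op in dfg if op not in step
                 and all(p in step and step[p] < t for p in dfg[op][1])]
        ready.sort(key=lambda op: (-prio[op], op))
        used = {unit: 0 for unit in resources}
        for op in ready:
            unit = UNIT[dfg[op][0]]
            if used[unit] < resources[unit]:
                used[unit] += 1
                step[op] = t
    return step


print(max(asap(DFG).values()))                                 # 4
print(max(list_schedule(DFG, {"mul": 1, "alu": 1}).values()))  # 7
```

With enough units (six multipliers, five ALUs) the list scheduler produces exactly the ASAP schedule. This illustrates the area versus latency trade-off shown on the slides.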
Synthesis Flow(cont)
• Macro Generation
Operators in data flow paths, such as adders and multipliers, that are allocated as macros are built from primitive cells
Input: netlist with macro instances
Output: netlist in terms of primitive instances only
Functionality
• Based on the macro (operator type), input width and input type (signed, unsigned), the appropriate operator generator is called.
• The generator replaces the macro with primitive gates such as PRIM_AND and PRIM_XOR.
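This minimal Python sketch shows an operator generator for an unsigned ADD macro. It emits a ripple-carry adder as a list of primitive instances, and a tiny evaluator checks the result.

- PRIM_AND and PRIM_XOR come from the text.
- PRIM_OR, the net names and the (cell, output, inputs) tuple format are assumptions.
- A real generator would also choose among architectures (ripple-carry, carry-lookahead, ...) and handle signed inputs.

```python
def generate_ripple_adder(width):
    """Expand an unsigned ADD macro into primitive gates, LSB first.

    Returns (gates, carry_out) where each gate is (cell, output_net, input_nets)
    and gates are listed in topological order.
    """
    gates, carry = [], None
    for i in range(width):
        a, b, s = f"a[{i}]", f"b[{i}]", f"s[{i}]"
        if carry is None:                       # bit 0: half adder, no carry-in
            gates.append(("PRIM_XOR", s, (a, b)))
            gates.append(("PRIM_AND", f"c{i}", (a, b)))
        else:                                   # full adder
            gates.append(("PRIM_XOR", f"p{i}", (a, b)))             # propagate
            gates.append(("PRIM_XOR", s, (f"p{i}", carry)))         # sum bit
            gates.append(("PRIM_AND", f"g{i}", (a, b)))             # generate
            gates.append(("PRIM_AND", f"k{i}", (f"p{i}", carry)))
            gates.append(("PRIM_OR", f"c{i}", (f"g{i}", f"k{i}")))  # carry out
        carry = f"c{i}"
    return gates, carry


PRIM_FUNCS = {
    "PRIM_AND": lambda x, y: x & y,
    "PRIM_OR": lambda x, y: x | y,
    "PRIM_XOR": lambda x, y: x ^ y,
}


def add_via_netlist(x, y, width):
    """Evaluate the generated netlist on two unsigned integers."""
    gates, carry_out = generate_ripple_adder(width)
    nets = {}
    for i in range(width):
        nets[f"a[{i}]"] = (x >> i) & 1
        nets[f"b[{i}]"] = (y >> i) & 1
    for cell, out, ins in gates:                # topological order: inputs are ready
        nets[out] = PRIM_FUNCS[cell](*(nets[n] for n in ins))
    total = sum(nets[f"s[{i}]"] << i for i in range(width))
    return total | (nets[carry_out] << width)


print(len(generate_ripple_adder(4)[0]))  # 17 primitive gates
print(add_via_netlist(9, 7, 4))          # 16
```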
Synthesis Flow (cont.)
• Optimization
Circuit cost, whether area or speed, is optimized.
Optimization in Concorde is mainly done by SIS
Hanging logic removal, removal of NOT gates connected in series, parallel instance removal, etc. are done by traversing the netlist in Concorde code.
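Two of these netlist clean-ups can be sketched in Python on a toy netlist. The netlist format (a dict from output net to (cell, input nets)) and the cell names NOT, AND and OR are assumptions for illustration. This is not Concorde code.

```python
def collapse_double_not(gates):
    """Rewire every gate input driven by NOT(NOT(x)) to read x directly."""
    def source(net):
        g = gates.get(net)
        if g and g[0] == "NOT":
            inner = gates.get(g[1][0])
            if inner and inner[0] == "NOT":
                return inner[1][0]
        return net
    return {out: (cell, tuple(source(n) for n in ins))
            for out, (cell, ins) in gates.items()}


def remove_hanging_logic(gates, primary_outputs):
    """Keep only gates in the fan-in cone of some primary output."""
    keep, stack = set(), list(primary_outputs)
    while stack:
        net = stack.pop()
        if net in gates and net not in keep:
            keep.add(net)
            stack.extend(gates[net][1])     # walk back through the inputs
    return {out: g for out, g in gates.items() if out in keep}


netlist = {
    "n1": ("NOT", ("a",)),
    "n2": ("NOT", ("n1",)),
    "y": ("AND", ("n2", "b")),
    "dead": ("OR", ("a", "b")),             # drives nothing: hanging logic
}
cleaned = remove_hanging_logic(collapse_double_not(netlist), ["y"])
print(cleaned)  # {'y': ('AND', ('a', 'b'))}
```

After the inverter pair is bypassed, the two NOT gates drive nothing, so hanging-logic removal deletes them together with the unused OR gate. A primary output driven directly by NOT(NOT(x)) is left alone by this sketch.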
Synthesis Flow(cont)
• Logic Optimization
• Let's discuss the algorithm for one such case (expand)
• Function to optimize: F_ON = ab'c' + a'b'c' + a'bc' + a'b'c
• F_DC (don't care) = abc'
• F_OFF can be computed as ab'c + a'bc + abc
• Tabular representation
  F_ON     a b c      F_OFF    a b c
  ab'c'    1 0 0      ab'c     1 0 1
  a'b'c'   0 0 0      a'bc     0 1 1
  a'bc'    0 1 0      abc      1 1 1
  a'b'c    0 0 1
[Figure: cube representation of the function on axes a, b, c.]
Synthesis Flow(cont)
• Expand Algo
  foreach row of F_ON
    foreach column of row
      if (F_ON[row][column] != *)
        F = F_ON
        F[row][column] = *                  // try raising this literal
        if (F ∩ F_OFF == ∅)                 // expanded cube must not cover an OFF minterm
          foreach row2 of F
            if (row != row2 && F[row2] ⊆ F[row])
              erase F[row2]                 // cube is now covered by F[row]
          F_ON = F
Synthesis Flow(cont)
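• Expand Algo: worked example (cube columns a b c, * = don't care)
  Start:            F_ON = {1 0 0, 0 0 0, 0 1 0, 0 0 1}    F_OFF = {1 0 1, 1 1 1, 0 1 1}
  Row 1, raise a:   1 0 0 → * 0 0, disjoint from F_OFF; erase 0 0 0
  Row 1, raise b:   * 0 0 → * * 0, disjoint from F_OFF; erase 0 1 0
  Row 1, raise c:   * * 0 → * * *, overlaps F_OFF; rejected
  Row 2, raise a:   0 0 1 → * 0 1, overlaps 1 0 1; rejected
  Row 2, raise b:   0 0 1 → 0 * 1, overlaps 0 1 1; rejected
  Row 2, raise c:   0 0 1 → 0 0 *, disjoint from F_OFF; accepted
  Result:           F_ON = {* * 0, 0 0 *}, i.e. F = c' + a'b'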
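The Expand procedure and its walkthrough can be sketched in Python. Cubes are written as strings over '0', '1' and '*' in a b c order, and the helper names are hypothetical. The sketch follows the pseudocode:

1. Raise one literal at a time.
2. Keep the change only if the cube stays disjoint from F_OFF.
3. Erase cubes that the expanded cube now covers.

It omits the row and literal ordering heuristics that production tools use.

```python
def intersects(cube1, cube2):
    """Cubes overlap unless some position has 0 in one and 1 in the other."""
    return all(x == "*" or y == "*" or x == y for x, y in zip(cube1, cube2))


def contains(big, small):
    """True if every minterm of cube `small` is also in cube `big`."""
    return all(b == "*" or b == s for b, s in zip(big, small))


def expand(f_on, f_off):
    cubes = list(f_on)
    row = 0
    while row < len(cubes):
        for col in range(len(cubes[row])):
            cube = cubes[row]
            if cube[col] == "*":
                continue
            trial = cube[:col] + "*" + cube[col + 1:]    # raise one literal
            if any(intersects(trial, off) for off in f_off):
                continue                                 # would cover an OFF minterm
            # accept, and erase every other cube the expanded cube now covers
            cubes = [trial if i == row else c
                     for i, c in enumerate(cubes)
                     if i == row or not contains(trial, c)]
            row = cubes.index(trial)                     # index may shift after erasing
        row += 1
    return cubes


print(expand(["100", "000", "010", "001"], ["101", "011", "111"]))  # ['**0', '00*']
```

The result covers the don't-care minterm abc' (110) through * * 0. This is allowed, because only the OFF-set constrains expansion.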
Synthesis Flow(cont)
• Sequential Optimization
• Several kinds of sequential optimization techniques also exist.
• Let's consider one such optimization (retiming)
• The flip-flop or latch position is moved along the path to optimize area and speed
Synthesis Flow (cont.)
• Technology Mapping & Optimization
Map the generic synthesized netlist to customer-specific library cells
Rule Based Mapping
Algorithm Based Mapping
Mapping criteria
• get minimum area
• get minimum delay
Synthesis Flow (cont.)
• Technology Mapping & Optimization
• Let's consider dynamic-programming-based mapping to optimize area
• Library cells are converted to NAND-INV trees based on their logic
• Library cells and their area costs
  INV    2
  NAND   5
  AND    6
  IOR    5
Synthesis Flow (cont.)
• Technology Mapping & Optimization
• The design is also converted to a NAND-INV tree
• Algorithm
The cost of a cell is its area
The cost of an input pin is 0
The cost of a vertex is the cost of the cell whose pattern matches the pattern at the vertex, plus the vertex costs at that pattern's inputs
If multiple cell patterns match at the vertex, we take the cell that results in the minimum vertex cost
Compute the cost of every vertex, from inputs to outputs
Synthesis Flow (cont.)
• Technology Mapping & Optimization
• Cost of V1 = cost(NAND) = 5
• Cost of V2 = min(cost(INV)+cost(V1), cost(AND)) = 6
• Cost of V3 = min(cost(IOR)+cost(V1),cost(NAND)+cost(V2)) = 10
[Figure: the input design with vertices 1, 2, 3 (V1, V2, V3) and its minimum-area implementation.]
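The dynamic-programming mapper can be sketched in Python. The cell areas are the ones listed in the library. The following are assumptions:

- The pattern trees: AND as INV(NAND) and IOR as NAND(INV, INV).
- The small subject tree, chosen so that V1, V2 and V3 get the same costs 5, 6 and 10 as in the example. Here the NAND alternative at V3 also pays 2 for the inverter on input c, so it costs 13 rather than 11.
- Pattern matching ignores NAND input order.
- The design is a tree with no multiple fan-out, which makes the per-vertex minimum exact.

The memoized recursion computes the same vertex costs as an input-to-output sweep.

```python
# Library: cell -> (area, pattern over NAND/INV); "x", "y" are pattern inputs.
LIBRARY = {
    "INV": (2, ("INV", "x")),
    "NAND": (5, ("NAND", "x", "y")),
    "AND": (6, ("INV", ("NAND", "x", "y"))),
    "IOR": (5, ("NAND", ("INV", "x"), ("INV", "y"))),
}

# Subject tree: vertex -> (gate, inputs). Names not in the dict are primary inputs.
DESIGN = {
    "V1": ("NAND", "a", "b"),
    "V2": ("INV", "V1"),
    "V4": ("INV", "c"),
    "V3": ("NAND", "V2", "V4"),
}


def match(pattern, vertex, graph):
    """Return the design vertices bound to the pattern's inputs, or None."""
    if isinstance(pattern, str):            # pattern input: binds any vertex or pin
        return [vertex]
    node = graph.get(vertex)
    if node is None or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    leaves = []
    for sub_pattern, child in zip(pattern[1:], node[1:]):
        sub = match(sub_pattern, child, graph)
        if sub is None:
            return None
        leaves.extend(sub)
    return leaves


def min_area(vertex, graph, memo):
    """(cost, cell) of the cheapest cover of the subtree rooted at vertex."""
    if vertex not in graph:                 # primary input pin costs 0
        return 0, None
    if vertex not in memo:
        best = None
        for cell, (area, pattern) in LIBRARY.items():
            leaves = match(pattern, vertex, graph)
            if leaves is None:
                continue
            cost = area + sum(min_area(v, graph, memo)[0] for v in leaves)
            if best is None or cost < best[0]:
                best = (cost, cell)
        memo[vertex] = best
    return memo[vertex]


memo = {}
print(min_area("V3", DESIGN, memo))  # (10, 'IOR')
print(memo["V2"], memo["V1"])        # (6, 'AND') (5, 'NAND')
```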
Synthesis Flow (cont.)
• Writing Structural Netlist
Write the synthesized netlist in any desired format to output text files
The output netlist is in structural form.
Synthesis Goals and Constraints
• An RTL-level hardware description can be implemented in many ways [at the macro (architectural) or micro (logic) level]
[Figure: architectural choices, three different adder arrangements that all compute a+b+c; logic choices, alternative gate-level implementations of the same function of x, y, z.]
Synthesis Goals and Constraints
• Goals and constraints help the synthesis tool make these choices
• Goals can be to maximize speed or to minimize area or power
• Constraints are more detailed goals
• Constraints at chip level
Minimize area for a given clock speed
Maximize speed as long as the design fits into an FPGA of a specific size
• Constraints at block level are more complex
Constraints at Block Level
• Input delay specifies the data arrival time at each input separately.
• Output delay specifies the extra delay after the output. The current design must make the output data arrive earlier to account for it.
• The clock waveform needs to be specified.
• Specific paths can be given specific delays to meet.
Synthesizing Big Designs
• Big designs take too much memory and time to be synthesized together.
• They are divided into blocks (modules), and the blocks are synthesized separately
• Synthesis is done bottom-up: leaf-level blocks are synthesized first.
• Constraints need to be computed from the top, since the constraint on each block comes from the constraint on the whole chip.
Synthesizing Big Designs
• Designers divide the total chip area into area constraints for each block
• The block constraints can be the total area or the width and height of each block. Pin positions of each block are also determined.
• The synthesis tool only takes in the area. The other constraints (width, height, pin positions) are for placement tools
[Figure: chip layout divided into blocks B1 to B7.]
Synthesizing Big Designs
• Similarly, designers divide the clock period into timing constraints for each block.
• Say the clock period is 20 ns. For B1, flip-flop to output can be 7 ns; for B2, input to output can be 5 ns; for B3, input to flip-flop is 8 ns.
[Figure: design with blocks B1, B2 and B3 (abstract).]
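The budgeting arithmetic can be checked with a short Python sketch. It assumes the three blocks lie on one register-to-register path B1 → B2 → B3. It then turns each block's budget into the input-delay and output-delay constraints used for block-level synthesis. `derive_io_delays` is a hypothetical helper, not a tool command.

```python
BUDGETS_NS = {"B1": 7.0, "B2": 5.0, "B3": 8.0}  # B1: FF->out, B2: in->out, B3: in->FF
ORDER = ["B1", "B2", "B3"]                      # assumed path B1 -> B2 -> B3


def derive_io_delays(clock_period, budgets, order):
    """Turn per-block budgets into input/output delay constraints per block."""
    total = sum(budgets[b] for b in order)
    if total > clock_period:
        raise ValueError(f"budgets ({total} ns) exceed the clock period ({clock_period} ns)")
    constraints, upstream = {}, 0.0
    for b in order:
        downstream = total - upstream - budgets[b]
        constraints[b] = {
            "input_delay": upstream,        # data arrives this late at the block inputs
            "output_delay": downstream,     # time still needed after the block outputs
            "available": clock_period - upstream - downstream,
        }
        upstream += budgets[b]
    return constraints


for block, c in derive_io_delays(20.0, BUDGETS_NS, ORDER).items():
    print(block, c)
# B1 {'input_delay': 0.0, 'output_delay': 13.0, 'available': 7.0}
# B2 {'input_delay': 7.0, 'output_delay': 8.0, 'available': 5.0}
# B3 {'input_delay': 12.0, 'output_delay': 0.0, 'available': 8.0}
```

If a block misses its budget after synthesis, changing `BUDGETS_NS` and re-running corresponds to the re-budgeting iteration.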
Synthesizing Big Designs
• This process of dividing the chip's resources is called budgeting.
• Budgeting is mostly manual, but there are some tools to help with it
• The process is mostly iterative. After synthesis, designers often find blocks that couldn't meet their constraints; they normally redo the budgeting and synthesize again.
Variations in Synthesis
• Higher Level Synthesis
Input is at a higher level than RTL
• Alternate Target Synthesis
Output is not at gate level
• Timing Driven Synthesis
Higher Level Synthesis
• Behavioural Synthesis
Synthesis done from the behavioural level
Output is normally RTL
Unlike RTL synthesis (regular synthesis), architecture selection is done by the tool based on constraints
Scheduling is non-trivial. The clock is used to divide the data paths into different time slots
Resources are shared if they are in different time slots
Higher Level Synthesis
• Protocol Synthesis
Input is in a language specific to describing communication protocols between designs
Output is an RTL description for synthesis
Sometimes also produces a C model for verification
Examples are
• Synopsys’s Protocol Compiler
• Austin Protocol Compiler(APC) of The University of Texas at Austin
• ALFred Protocol Compiler
Higher Level Synthesis
• Example of protocol input in Timed Asynchronous Protocol (TAP)
  process pe
  const Rp: integer=0; Bq: integer=0; tr: integer=10; qe: address
  var sp: integer = 0; sq: array [2] of integer = 0; d, e: integer; initialize: integer = 1
  begin
    act sendrqst in 0; initialize := 0
    timeout sendrqst
      rst.e:=NCR(Bq,2,sq[0],sq[1]); send rqst to qe; act resend in tr;
    rcv rqst from qe
      d:=DCR(Bq,0,rqst.e); e:= DCR(Bq,1,rqt.e);
      if (sp=d)(sp=e) sp:=e; reply.e:= NCR(Bq,1,sp); log(“detected adversary”); fi
    timeout resend
      if sq[0] = sq[1] rqst.e:=NCR(Bq,2,1,sq[1]); send rqst to qe; act resend in tr; skip; fi
    rcv reply from qe
      d:= DCR(Rp,0,reply.e); if sq[1] = d sq[0]:=sq[1]; log(“detected adversary”); fi
  end
Alternate Target Synthesis
• FPGA Synthesis
Special mapping to programmable gates
• e.g. 4-input gates (often called LUTs) that can be programmed to implement any 4-input logic function
Dedicated resources need special care during mapping and cost computation.
• Gates using carry-chain wires have different delays from regular wires that go through switch boxes.
Architecture-specific optimization
[Figure: LUTs interconnected through a switch box.]
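To make the LUT idea concrete, this Python sketch computes the 16 configuration bits of a 4-input LUT for any Boolean function and evaluates the programmed LUT. The bit ordering, with input a as the most significant address bit, is an assumption. Vendors define their own initialization formats.

```python
def lut4_init(func):
    """Return the 16-bit truth table (configuration) of a 4-input LUT for func."""
    init = 0
    for address in range(16):
        a, b, c, d = (address >> 3) & 1, (address >> 2) & 1, (address >> 1) & 1, address & 1
        if func(a, b, c, d):
            init |= 1 << address            # store the output for this input combination
    return init


def lut4_eval(init, a, b, c, d):
    """Read the stored output bit addressed by the four inputs."""
    address = (a << 3) | (b << 2) | (c << 1) | d
    return (init >> address) & 1


and4 = lut4_init(lambda a, b, c, d: a & b & c & d)
mux = lut4_init(lambda a, b, c, d: b if a else c)  # 2:1 mux, d unused
print(hex(and4))                   # 0x8000
print(lut4_eval(mux, 1, 1, 0, 0))  # 1
```

Any 4-input function costs one LUT, so an FPGA mapper counts LUTs (and dedicated resources such as carry chains) rather than the gate areas used for standard cells.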
Alternate Target Synthesis
• Physical Synthesis
Generates placed gates directly
Design convergence is guaranteed
• A constraint that is met in synthesis may not be met after placement, so synthesis normally needs to be redone. Physical synthesis helps avoid this iteration
Timing Driven Synthesis
• Synthesis is done directly to technology gates.
• Synthesis is done from the inputs towards the outputs (light to dark)
• Architectures are selected during synthesis based on the delays
Q & A
• Thank you