
Y. Hu, V. Shih, R. Majumdar and L. He, “Exploiting Symmetries to Speedup SAT-based Boolean Matching for Logic Synthesis of FPGAs”, TCAD 2008.

Y. Hu, V. Shih, R. Majumdar and L. He, “FPGA Area Reduction by Multi-Output Function Based Sequential Resynthesis”, DAC 2008.

Y. Hu, Z. Feng, R. Majumdar and L. He, “Robust FPGA Resynthesis Based on Fault-Tolerant Boolean Matching”, ICCAD 2008.

FPGA Resynthesis for Area and Reliability

Abstract

Resynthesis, a circuit rewriting technique in the FPGA CAD flow, has emerged to cope with the inherent NP-hardness of many CAD tasks and the ever-increasing design complexity and logic capacity of FPGAs. Targeting area and reliability optimization, this project proposes two logic resynthesis algorithms that use an efficient SAT-based Boolean matching as the optimization engine. In contrast to existing resynthesis approaches, our algorithms explore multiple design freedoms and architecture features to achieve better quality.

Motivations

Heuristic FPGA synthesis produces suboptimal results:

A 500X gap exists between optimal and heuristic technology mapping [Cong, FPGA'06]

Growing design complexity and FPGA capacity increase the optimality gap

Resynthesis is needed to improve quality (area/performance/power/reliability):

Rewrite the logic or physical design

Perform iterations for design closure

Resynthesis for Area & SER

SAT-based Boolean Matching

MIMO & Retiming for Area

References

Student: Yu Hu (Ph.D., 2009)

Advisor: Lei He

EDA Lab, Electrical Engineering Department, UCLA

http://eda.ee.ucla.edu


Area-Aware Resynthesis


Resynthesis Framework

Area Reduction Results

Exploring Symmetries in BM

SAT-BM can be much faster if we exploit symmetries in:

The Boolean function, e.g., b and c are symmetric in a(b+c)

The FPGA PLB architecture, e.g., the input pins of a LUT are symmetric
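Here is a minimal sketch of the first kind of symmetry. Two inputs are symmetric when swapping them never changes the output. The code below tests this by enumerating the full truth table. That is only a stand-in for illustration, not the detection method used in the TCAD 2008 work, and the helper name `are_symmetric` is hypothetical.

```python
from itertools import product

def are_symmetric(f, n, i, j):
    """Return True if inputs i and j of the n-input Boolean function f are
    symmetric, i.e., swapping their values never changes the output."""
    for bits in product((0, 1), repeat=n):
        swapped = list(bits)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if f(*bits) != f(*swapped):
            return False
    return True

# The poster's example a(b + c), with inputs ordered (a, b, c).
def g(a, b, c):
    return a & (b | c)

print(are_symmetric(g, 3, 1, 2))  # b and c
print(are_symmetric(g, 3, 0, 1))  # a and b
```

When b and c are symmetric, a matcher only needs to try one of the two ways of assigning them to interchangeable pins. This is the kind of pruning that makes symmetry-aware SAT-BM faster.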


Proposed Resynthesis

Resynthesis based on LUT reconfiguration:

Leverages the inherent flexibility of LUT-based FPGAs

Reduces area without performance degradation

Increases reliability with negligible area overhead

SAT-based Boolean matching is the key:

A formal method ensuring correct-by-construction results

Flexible enough to deal with heterogeneous FPGAs

An efficient implementation for scalability

200X speedup

Multiple iterations of block-based re-mapping

Sequential resynthesis obtains up to 9% area reduction

Factors affecting sequential resynthesis quality:

Sequential structure

PLB templates and the number of iterations

SER model: [Mukherjee, HPCA, 2005]

Assumes a large industrial FPGA with 330,000 LUTs
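The sketch below makes one common simplifying assumption: soft errors in different LUTs are independent and occur at constant rates, so failures are exponentially distributed. Under that assumption, the chip-level failure rate is the sum of the per-LUT rates, and MTTF is its reciprocal. The code applies this to the 330,000-LUT device. The per-LUT FIT value (failures per 10^9 device-hours) is a made-up placeholder. It comes neither from the poster nor from [Mukherjee, HPCA, 2005].

```python
HOURS_PER_YEAR = 24 * 365

def mttf_years(fit_per_lut, num_luts):
    """MTTF in years for num_luts LUTs, each failing independently at
    fit_per_lut FIT (failures per 1e9 device-hours)."""
    total_fit = fit_per_lut * num_luts   # independent constant rates add up
    mttf_hours = 1e9 / total_fit         # MTTF = 1 / total failure rate
    return mttf_hours / HOURS_PER_YEAR

# Hypothetical per-LUT rate, chosen only to illustrate the arithmetic.
print(round(mttf_years(fit_per_lut=0.01, num_luts=330_000), 2))
```

In this simple model, MTTF is inversely proportional to the effective failure rate. Halving the probability that a fault becomes observable at the outputs therefore doubles MTTF.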

Robust PLB Structure

Stochastic Resynthesis

Fault-Tolerant Boolean Matching

MTTF Evaluation

Different synthesis algorithms lead to different area-robustness tradeoffs

Stochastic resynthesis maximizes the yield rate under random faults. It adds no testing overhead and only negligible area/performance overhead.

Figure: resynthesis framework. Resynthesis sits between logic synthesis and physical design in the flow from a high-level circuit description to a bitstream. It also uses the architecture specification, timing info, and fault info.

Boolean matching answers a yes/no question:

Can PLB p implement Boolean function f?

If yes, it also gives the configuration bits for all LUTs in p.
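The sketch below makes the yes/no question concrete. It matches a 3-input function f against a hypothetical two-LUT template: an LUT2 h(x1, x2) whose output feeds a second LUT2 g(h, x3). SAT-BM would encode this problem in CNF and call a SAT solver. This sketch instead enumerates all 16 × 16 configurations. Brute force like this only scales to tiny templates, which is why a SAT formulation is needed. The template and the names `lut` and `boolean_match` are assumptions for illustration.

```python
from itertools import product

def lut(config, *inputs):
    """Evaluate a LUT: config[i] is the output for the input vector whose
    bits (first input = most significant bit) encode the integer i."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

def boolean_match(f):
    """Return (h_config, g_config) such that g(h(x1, x2), x3) == f(x1, x2, x3)
    for every input vector, or None if the template cannot implement f."""
    for h_cfg in product((0, 1), repeat=4):
        for g_cfg in product((0, 1), repeat=4):
            if all(lut(g_cfg, lut(h_cfg, x1, x2), x3) == f(x1, x2, x3)
                   for x1, x2, x3 in product((0, 1), repeat=3)):
                return h_cfg, g_cfg
    return None

# Yes: x3 AND (x1 OR x2) fits, e.g., with h = OR and g = AND.
print(boolean_match(lambda x1, x2, x3: x3 & (x1 | x2)))
# No: 3-input majority would need h to separate three classes of (x1, x2).
print(boolean_match(lambda x1, x2, x3: (x1 & x2) | (x1 & x3) | (x2 & x3)))
```

The answer depends on which inputs are wired to which pins. For example, x1 AND (x2 OR x3) fails with this fixed wiring but matches if x2 and x3 feed h. Pin symmetries help prune exactly this part of the search.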

Figure: PLB template PLB_d (LUT4, LUT4, LUT3).

Figure: SAT-BM runtime (s, log scale) versus the number of inputs of the Boolean function (5 to 9) for Ling'05, Ours, and Ours+Cong'07.

Each re-mapping is based on SAT-BM

Figure: block-based re-mapping of a network with nodes a to e, inputs x1 to x3, and outputs O1 and O2 (before and after).

Retiming breaks register boundaries for resynthesis

Figure: network with nodes f, c, g, and e (inputs x1 to x3, outputs O1 and O2).

The function of O2 has to be preserved, i.e., c and e need to be duplicated. This duplication is not required if a multiple-input multiple-output (MIMO) block is considered.

Figure: network with nodes h and i (inputs x1 to x3, outputs O1 and O2).

Case I: Classic retiming without duplication

Case II: Peripheral retiming without duplication

Case III: Peripheral retiming with duplication
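For reference, classic retiming (Leiserson and Saxe) assigns an integer lag r(v) to every node. An edge u -> v that carries w(e) registers carries w(e) + r(v) - r(u) registers after retiming, and classic retiming requires every result to be non-negative. The sketch below applies that rule to a small hypothetical netlist. Peripheral retiming (Cases II and III) relaxes the non-negativity requirement at the circuit boundary; this sketch checks only the classic condition. The function names are illustrative.

```python
def retime(edges, lag):
    """Apply retiming lags to a register-annotated netlist.

    edges: list of (u, v, w) meaning w registers on the wire u -> v.
    lag:   dict node -> integer lag r(node); nodes not listed get lag 0.
    Returns the retimed edges using w_r = w + r(v) - r(u).
    """
    return [(u, v, w + lag.get(v, 0) - lag.get(u, 0)) for u, v, w in edges]

def is_legal_classic(edges):
    """Classic retiming requires a non-negative register count on every edge."""
    return all(w >= 0 for _, _, w in edges)

# Hypothetical netlist: LUTs a and b each feed LUT c through one FF.
edges = [("a", "c", 1), ("b", "c", 1), ("c", "out", 0)]

# r(c) = -1 moves one FF from each input of c to its output: two FFs become one.
retimed = retime(edges, {"c": -1})
print(retimed, is_legal_classic(retimed))
```

Moving registers the other way, with r(c) = 1, would leave -1 registers on c's output edge. Classic retiming rejects that result.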

Figure: retiming and resynthesis examples for Cases I to III. The panels show 4-LUT and 3-LUT networks with FFs (inputs x1 to x5, outputs out1 and out2, including FFs labeled -1), and a LUT network with inputs x1 to x4 resynthesized into LUT-a to LUT-d.

Area reduction results (LUT count and runtime; Comb = combinational resynthesis, Seq = sequential resynthesis; percentages are relative to ABC)

Circuit    ABC LUT#  Comb LUT#         Seq LUT#          Comb runtime (min)  Seq runtime (min)
bigkey     1261      1261 (0.00%)      1244 (-1.35%)     2709                1898
clma       4210      4167 (-1.02%)     4116 (-2.23%)     2697                3825
di_eq      674       674 (0.00%)       673 (-0.15%)      655                 856
dsip       1554      1330 (-14.41%)    1338 (-13.90%)    705                 1481
elliptic   441       419 (-4.99%)      419 (-4.99%)      32                  370
frisc      2841      2660 (-6.37%)     2595 (-8.66%)     1364                1537
s298       44        41 (-6.82%)       37 (-15.91%)      186                 125
s38417     3134      3105 (-0.93%)     3117 (-0.54%)     3466                6092
s38584     3720      3654 (-1.77%)     3655 (-1.75%)     2867                8363
tseng      946       935 (-1.16%)      934 (-1.27%)      1331                1492
ave        1883      1825 (-3.75%)     1813 (-5.07%)     1601                2604
Ratio                1                 99.3%             1                   1.6X
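The ave and Ratio rows can be recomputed from the per-circuit counts, as sketched below. The ave percentages are means of the per-circuit percentage changes relative to ABC. The Ratio entries appear to divide the rounded Seq averages by the rounded Comb averages. Half-up rounding is an assumption chosen to reproduce the printed values; Python's built-in `round` would turn 1882.5 into 1882.

```python
# (circuit, ABC LUTs, Comb LUTs, Seq LUTs, Comb runtime, Seq runtime),
# copied from the table above.
ROWS = [
    ("bigkey", 1261, 1261, 1244, 2709, 1898),
    ("clma", 4210, 4167, 4116, 2697, 3825),
    ("di_eq", 674, 674, 673, 655, 856),
    ("dsip", 1554, 1330, 1338, 705, 1481),
    ("elliptic", 441, 419, 419, 32, 370),
    ("frisc", 2841, 2660, 2595, 1364, 1537),
    ("s298", 44, 41, 37, 186, 125),
    ("s38417", 3134, 3105, 3117, 3466, 6092),
    ("s38584", 3720, 3654, 3655, 2867, 8363),
    ("tseng", 946, 935, 934, 1331, 1492),
]

def mean(values):
    return sum(values) / len(values)

def pct_change(new, base):
    return 100.0 * (new - base) / base

def round_half_up(x):
    # Valid for non-negative x only; avoids Python's round-half-to-even.
    return int(x + 0.5)

comb_pct = mean([pct_change(c, a) for _, a, c, _, _, _ in ROWS])
seq_pct = mean([pct_change(s, a) for _, a, _, s, _, _ in ROWS])

# Rounded column averages, as printed in the "ave" row.
avg = [round_half_up(mean([row[k] for row in ROWS])) for k in range(1, 6)]
abc_avg, comb_avg, seq_avg, comb_rt, seq_rt = avg

print(f"ave: {abc_avg} {comb_avg} ({comb_pct:.2f}%) {seq_avg} ({seq_pct:.2f}%) {comb_rt} {seq_rt}")
print(f"Ratio: LUTs {seq_avg / comb_avg:.1%}, runtime {seq_rt / comb_rt:.1f}X")
```

Dividing the unrounded means (1812.8 / 1824.6) gives about 99.35%, so the printed 99.3% reflects rounding of the averages.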

Figure: fault model for a logic block. Labels: intermediate logic, fault rate of previous logic, input faults of LB2, faults in configuration bits.

Faults in both LUT configurations and interconnect are considered and modeled as random variables.

Figure: 18 synthesis solutions for MCNC-i10.

Input: PLB template H, Boolean function F, and the fault rates for the inputs and SRAM bits of H

Output: either that F cannot be implemented by H, or the configuration of H that minimizes the probability that faults are observable at the PLB output over all input vectors

Fault-tolerant BM task breakdown:

Find multiple Boolean matchings

Evaluate the stochastic fault rate of each
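The evaluation step can be illustrated on a single k-input LUT. The sketch below assumes a hypothetical fault model: each LUT input flips independently with probability p_in (faults on intermediate wires), and the configuration bit being read flips with probability p_cfg. It averages, over the input vectors that can actually occur, the probability that the output is wrong. It then enumerates the ways to fill the don't-care entries and keeps the most robust configuration. Exact enumeration stands in for the stochastic SAT formulation here, and all names are illustrative.

```python
from itertools import product

def error_prob(cfg, k, care, p_in, p_cfg):
    """Average probability, over the care input vectors, that a k-input LUT
    with configuration cfg (tuple of 2**k bits) produces a wrong output."""
    total = 0.0
    for x in care:
        expected = cfg[x]
        for flips in range(2 ** k):              # bit mask of faulty inputs
            n = bin(flips).count("1")
            p_flips = p_in ** n * (1 - p_in) ** (k - n)
            read = cfg[x ^ flips]                # entry actually addressed
            # Wrong if the addressed bit matches but is itself flipped,
            # or if it differs and is not flipped.
            p_wrong = p_cfg if read == expected else 1 - p_cfg
            total += p_flips * p_wrong
    return total / len(care)

def most_robust(care_values, k, p_in, p_cfg):
    """care_values maps input index -> required output; every other index is
    a don't-care. Returns (config, error probability) with the lowest error."""
    dont_cares = [i for i in range(2 ** k) if i not in care_values]
    best_cfg, best_p = None, None
    for fill in product((0, 1), repeat=len(dont_cares)):
        cfg = [0] * (2 ** k)
        for i, v in care_values.items():
            cfg[i] = v
        for i, v in zip(dont_cares, fill):
            cfg[i] = v
        p = error_prob(tuple(cfg), k, list(care_values), p_in, p_cfg)
        if best_p is None or p < best_p:
            best_cfg, best_p = tuple(cfg), p
    return best_cfg, best_p

# NAND on the care set; input vector 11 (index 3) never occurs, i.e., an SDC.
cfg, p = most_robust({0: 1, 1: 1, 2: 1}, k=2, p_in=1e-3, p_cfg=1e-4)
print(cfg, p)
```

Filling the don't-care with 1 turns the LUT into a constant, so input faults can no longer reach its output and only configuration-bit faults remain. This toy case shows why more don't-cares give fault-tolerant matching more room to mask faults.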

Deterministic SAT vs. Stochastic SAT (SSAT)

Faults in intermediate wires; faults in LUT configurations

Figure: don't-cares in a robust PLB built from LUT4s (inputs X1 to X12, intermediate signals z1 to z3, outputs F1, F2, and G). (a) Satisfiability don't-care: the LUT2 configuration is 00: 1, 01: 1, 10: x, 11: 1. (b) Observability don't-care: every LUT2 entry is x.

The robust PLB structure introduces more potential for don't-cares.

Stochastic resynthesis maximizes don't-cares with FTBM and the robust PLB.

Figure: MTTF (years) of ABC versus stochastic resynthesis: 20.66 vs. 27.15, a 31% MTTF increase.
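The 31% figure follows directly from the two MTTF values:

```latex
\frac{\mathrm{MTTF}_{\text{Stochastic Resynth}}}{\mathrm{MTTF}_{\text{ABC}}}
  = \frac{27.15\ \text{years}}{20.66\ \text{years}} \approx 1.314
```

That is an increase of about 31%.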

LUT utilization is low for Xilinx Virtex-5 FPGAs
