Performance Analysis (Clock Signal)

Upload: octavia-miles

Post on 18-Jan-2016



Unbalanced delays

Logic with unbalanced delays leads to inefficient use of logic:

(Figure labels: long clock period; short clock period)
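A minimal sketch of why unbalanced delays waste logic: the clock period has to cover the slowest stage, so faster stages sit idle for part of every cycle. The stage delays and the function name `stage_utilization` are hypothetical, chosen only for illustration, and flip-flop overheads are ignored.

```python
def stage_utilization(delays_ns):
    """Return (clock period, per-stage utilization) for register-separated stages.

    The period is set by the slowest stage; utilization is the fraction of
    each cycle that a stage actually spends computing.
    """
    period = max(delays_ns)
    return period, [d / period for d in delays_ns]


# Hypothetical stage delays in ns (not from the slides); both total 12 ns.
unbalanced = [2.0, 8.0, 2.0]
balanced = [4.0, 4.0, 4.0]

for name, delays in (("unbalanced", unbalanced), ("balanced", balanced)):
    period, util = stage_utilization(delays)
    print(f"{name}: period = {period} ns, utilization = {util}")
```

With the same total logic delay, the balanced split allows a shorter clock period.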


Flip-flop-based system performance analysis


Flip-flop-based system model

• Clock signal is perfect (no rise/fall), period P
• Clock event on rising edge
• Setup time s
  – Time from arrival of combinational logic event to clock event
• Propagation time p
  – Time for value to go from input to output (tco?)
• Worst-case combinational delay C
  – Time from output of flip-flop to input


Clock period constraint

• P >= p + C + s
(Figure labels: s, p, C)
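As a quick illustration of the constraint, the sketch below computes the smallest period that satisfies P >= p + C + s and the corresponding maximum clock frequency. The numeric values and function names are hypothetical, not taken from the slides.

```python
def min_clock_period(p, c, s):
    """Smallest period P satisfying P >= p + C + s (all in the same time unit)."""
    return p + c + s


def max_frequency_mhz(period_ns):
    """Clock frequency in MHz for a period given in ns."""
    return 1000.0 / period_ns


# Hypothetical timing values in ns.
p, c, s = 0.5, 8.0, 0.3
period = min_clock_period(p, c, s)
print(f"minimum period = {period:.2f} ns")
print(f"maximum frequency = {max_frequency_mhz(period):.1f} MHz")
```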


Clock parameters


Clock with rise/fall

• tr (clock rise time) is large because the clock wire is long and has high capacitance.


Rise/fall clock period constraint

• P >= tr + p + C + s
(Figure labels: s, p, C, tr)


Skew

• Skew: relative delay between events.

• Clock skew: can harm any sequential system.


Clock skew

Clock must arrive at all memory elements in time to load data.


Clock skew in system

(Figure: two D flip-flops (D, Q) and a logic block)


Clock skew and qualified (gated) clocks


Clock skew analysis model

s12 = δ1 – δ2, s21 = δ2 – δ1 (δ1, δ2: clock delays to FF1 and FF2)

• Assume δ1 > δ2 (s12 > 0)

(Figure label: φ)


Skew and clock period

• Assume that each flip-flop operates instantaneously:

• If clock arrives at FF1 after FF2, then there is less time for the signal to propagate through the combinational logic.

• Given clock period, determine allowable skew: P >= Δ2 + s12 (Δ2: combinational logic delay)
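The period constraint above can be turned around to give the skew budget: with instantaneous flip-flops, the logic delay plus the skew s12 must fit in one period. The sketch below assumes exactly that simplified model; the names `logic_delay` and `skew`, and the numbers, are illustrative assumptions.

```python
def max_allowable_skew(period, logic_delay):
    """Largest skew s12 that still satisfies period >= logic_delay + s12."""
    return period - logic_delay


def meets_period(period, logic_delay, skew):
    """True if the logic delay plus the skew fits within one clock period."""
    return period >= logic_delay + skew


# Hypothetical values in ns.
period, logic_delay = 10.0, 8.0
print(f"allowable skew = {max_allowable_skew(period, logic_delay)} ns")
print(meets_period(period, logic_delay, skew=1.5))   # skew within budget
print(meets_period(period, logic_delay, skew=2.5))   # skew exceeds budget
```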


Clock distribution

• Often one of the hardest problems in clock design.
  – Fast edges.
  – Minimum skew.


Clock skew example

(Figure: four D flip-flops with clock-path delays labeled 10 ps, 20 ps, and 30 ps)


Clock H-Tree


Digital Clock Manager (DCM)

• A hard block in FPGAs
  – Gets clock input
  – Generates daughter clocks
• FPGA can have multiple DCMs
• Provides clocks to
  – internal circuitry
  – external devices on board


DCM


DCM Functions

• Jitter removal


DCM Functions
• Frequency synthesis:
  – Clock frequency generated outside ≠ frequency needed in our FPGA
  – DCM multiplies/divides it to generate daughter clocks
• Can even generate other ratios (3/4, 4/5 of original)
  – In Spartan 6: n times or n/2 times (n ∈ [1,16])
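To illustrate frequency synthesis by multiply/divide, the sketch below searches for a multiplier M and divisor D so that f_in × M / D is as close as possible to a target frequency. The function name `best_ratio` and the ranges 1..16 are hypothetical placeholders, not the limits of any particular device; consult the device's clocking guide for the real ones.

```python
from fractions import Fraction


def best_ratio(f_in_mhz, f_target_mhz, multipliers, divisors):
    """Return (M, D, f_out) minimizing |f_in * M / D - f_target|."""
    best = None
    for m in multipliers:
        for d in divisors:
            f_out = Fraction(f_in_mhz) * m / d
            err = abs(f_out - f_target_mhz)
            # Keep the first combination with the smallest error.
            if best is None or err < best[0]:
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, f_out


# Hypothetical case: 50 MHz input, 3/4 of it (37.5 MHz) wanted.
m, d, f_out = best_ratio(50, Fraction(75, 2), range(1, 17), range(1, 17))
print(f"M = {m}, D = {d}, f_out = {float(f_out)} MHz")
```

Exact fractions are used so that ratios such as 3/4 compare without floating-point rounding.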


DCM Functions
• Phase shifting:
  – Some designs need phase-shifted clocks
• Some DCMs allow common values
  – 120°, 240°: for 3-phase clocking scheme
  – 90°, 180°, 270°: for 4-phase clocking scheme
• Some DCMs allow exact values to be set
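A phase shift in degrees corresponds to a time delay of (phase / 360) × period. The sketch below applies that conversion to the common phase values listed above; the 10 ns period is a hypothetical example.

```python
def phase_to_delay_ns(phase_deg, period_ns):
    """Delay in ns equivalent to a phase shift of phase_deg at the given period."""
    return (phase_deg % 360) / 360.0 * period_ns


period_ns = 10.0  # hypothetical 100 MHz clock
for phase in (90, 120, 180, 240, 270):
    print(f"{phase:3d} deg -> {phase_to_delay_ns(phase, period_ns):.3f} ns")
```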


DCM Functions
• Clock deskewing:
  – DCM gets a special input
  – DCM compares the two signals
  – Adds additional delay to the daughter clock to align with the main clock
• Two types:
  – PLL (phase-locked loop): analogue
  – DLL (delay-locked loop): digital


DCM Functions
• Auto skew correction:


DCM

• You can enter DCM parameters in tools:
  – Frequency
  – Duty cycle
  – Phase shift
  – …

• See Xilinx ISE In-Depth Tutorial, UG695 (v 14.1), 2012


Case Study

• 16 x 16 multiplier example.


The FPGA design process

• Xilinx ISE (Integrated Synthesis Environment)
  – Translation from HDL
  – Logic synthesis
  – Placement and routing
  – Configuration generation


Design experiments

• Synthesize with no constraints.
• Synthesize with timing constraint.
  – Tighten timing constraint.
• Synthesize with placement constraints.
• Power:
  – Many tools don’t allow us to directly specify power consumption
  – Some tools allow us to specify power as an objective
  – May need to rewrite our h/w description for better power consumption characteristics.


Commercial Tools

• XST “-power” option reduces dynamic power consumption.

• Xilinx MAP and PAR “-power” option reduces dynamic power
  – But increases runtime and decreases design performance.

• Quartus-II has Power-Driven Synthesis and Place & Route.


Post-translation simulation model

• No timing or area constraints
• HDL model in terms of FPGA primitives.
• Example:

X_LUT4 \p12_Madd__n0015_Mxor_Result_Xo<1>1 (
    .ADR0(x_7_IBUF),
    .ADR1(y_13_IBUF),
    .ADR2(c12[7]),
    .ADR3(row12[8]),
    .O(row13[7])
);


Mapping report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 501 out of 1,024 48%
Logic Distribution:
  Number of occupied Slices: 255 out of 512 49%
  Number of Slices containing only related logic: 255 out of 255 100%
  Number of Slices containing unrelated logic: 0 out of 255 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 501 out of 1,024 48%
Number of bonded IOBs: 64 out of 92 69%

Total equivalent gate count for design: 3,006
Additional JTAG gate count for IOBs: 3,072
Peak Memory Usage: 64 MB
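The utilization percentages in the mapping report can be reproduced from the used/available counts. The sketch below does that arithmetic; the observation that the reported figures match the integer part of each ratio is an inference from these three numbers, not a documented property of the tool.

```python
def utilization_percent(used, available):
    """Resource utilization as a percentage."""
    return 100.0 * used / available


# Counts copied from the mapping report above.
resources = {
    "4 input LUTs": (501, 1024),
    "occupied Slices": (255, 512),
    "bonded IOBs": (64, 92),
}

for name, (used, available) in resources.items():
    pct = utilization_percent(used, available)
    print(f"{name}: {pct:.2f}% (integer part: {int(pct)}%)")
```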


Static timing analysis report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 99.999 uS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)

Maximum delay is 20.916ns.
--------------------------------------------------------------------------------

After Mapping: estimated delays (no information about interconnects)


Static timing report: delays along paths

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Pad to Pad
---------------+---------------+---------+
Source Pad     |Destination Pad|  Delay  |
---------------+---------------+---------+
x<0>           |p<0>           |    5.824|
x<0>           |p<10>          |   10.675|
x<0>           |p<11>          |   11.214|
x<0>           |p<12>          |   11.753|


Routing report

Phase 1: 1975 unrouted; REAL time: 11 secs

Phase 2: 1975 unrouted; REAL time: 11 secs

Phase 3: 619 unrouted; REAL time: 12 secs

Phase 4: 619 unrouted; (0) REAL time: 12 secs

Phase 5: 619 unrouted; (0) REAL time: 12 secs

Phase 6: 619 unrouted; (0) REAL time: 12 secs

Phase 7: 0 unrouted; (0) REAL time: 12 secs

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

• REAL time: Routing algorithm run time.


Static timing after routing

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 99.999 uS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)

Maximum delay is 38.424ns.

-------------------------------------------------------------------

• Larger than the 20.916 ns estimated after mapping because of interconnect delays.
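A rough way to see how much routing contributes is to compare the post-map estimate with the post-route result. The sketch below attributes the whole difference to interconnect, which is a simplifying assumption; the function name `routing_overhead` is illustrative.

```python
def routing_overhead(post_map_ns, post_route_ns):
    """Return (added delay in ns, added delay as a fraction of the routed delay)."""
    added = post_route_ns - post_map_ns
    return added, added / post_route_ns


# Maximum delays reported after mapping and after routing.
added_ns, share = routing_overhead(20.916, 38.424)
print(f"delay added after routing: {added_ns:.3f} ns ({share:.1%} of the routed delay)")
```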


Timing constraint

• Use timing constraint editor:


Post-map static timing report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)

Maximum delay is 20.916ns.

The pad-to-pad maximum delay hasn’t changed, since this design has limited opportunities for logic synthesis to change delays by restructuring logic.


Post-routing static timing report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)

Maximum delay is 31.984ns.

Tools generally try to meet the delay goal as closely as possible to minimize area.


Tighter timing constraints

• Tighten requirement to 25 ns.
• Post-place-route timing report:

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 25 nS ;

20135312 items analyzed, 11 timing errors detected. (11 setup errors, 0 hold errors)

Maximum delay is 31.128ns.


Report on a violated path

Slack: -6.128ns (requirement - data path)
  Source: y<0> (PAD)
  Destination: p<30> (PAD)
  Requirement: 25.000ns
  Data Path Delay: 31.128ns (Levels of Logic = 31)

Modify the logic and/or physical design to improve the delay.
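The slack in the report is simply the requirement minus the data path delay; a negative value means the path violates the constraint. The sketch below recomputes it for the two post-route results reported in this case study (32 ns and 25 ns requirements). The function name `slack_ns` is illustrative.

```python
def slack_ns(requirement_ns, data_path_ns):
    """Slack = requirement - data path delay; negative slack is a violation."""
    return requirement_ns - data_path_ns


# (requirement, maximum post-route delay) pairs from the timing reports.
for requirement, delay in ((32.0, 31.984), (25.0, 31.128)):
    slack = slack_ns(requirement, delay)
    status = "met" if slack >= 0 else "violated"
    print(f"requirement {requirement} ns: slack = {slack:.3f} ns ({status})")
```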


Power report

Power summary:                              I(mA)    P(mW)
----------------------------------------------------------------
Total estimated power consumption:                     333
---
Vccint 1.50V:                                   0        0
Vccaux 3.30V:                                 100      330
Vcco33 3.30V:                                   1        3
---
Inputs:                                         0        0
Logic:                                          0        0
Outputs:
  Vcco33                                        0        0
Signals:                                        0        0
---
Quiescent Vccaux 3.30V:                       100      330
Quiescent Vcco33 3.30V:                         1        3

Thermal summary:
----------------------------------------------------------------
Estimated junction temperature:               36C
Ambient temp:                                 25C
Case temp:                                    35C
Theta J-A:                                    34C/W

Helps us determine whether we need additional cooling.
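The thermal figures are consistent with the usual relation T_junction = T_ambient + Theta J-A × P. The sketch below checks the reported numbers with that relation; it is not a description of how the tool itself computes them.

```python
def junction_temp_c(ambient_c, theta_ja_c_per_w, power_mw):
    """Estimate junction temperature from ambient temperature, Theta J-A, and power."""
    return ambient_c + theta_ja_c_per_w * (power_mw / 1000.0)


# Values from the power report: 25 C ambient, 34 C/W, 333 mW total.
tj = junction_temp_c(25.0, 34.0, 333.0)
print(f"estimated junction temperature = {tj:.1f} C")
```

If the estimate approaches the device's rated junction temperature, a lower Theta J-A (heat sink, airflow) is needed.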


Improving area
• Floorplanner window:
  – Floorplanner: View/Edit Placed Design

(Figure labels: LEs; chip floorplan)

• Green rectangles: components mapped to CLBs


Rat’s nest wiring
• If you click on a component in the design hierarchy window, its rat’s nest is shown.


Routing editor view
• FPGA Editor: View/Edit Routed Design


Editing constraints
• Use constraints editor to place constraints:
  – This tool allows you to constrain
    1. placement of logic
    2. assignment of chip I/Os to IOBs (e.g., useful for PCB design)


Design browser pane


Drag and drop constraints


Change the shape of constraints


Full set of placement constraints

• We place the rows of the multiplier one below the other to create the row structure of the floorplan.


Placement results


New timing report

• After placement constraints:

19742142 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)

Maximum delay is 29.934ns.

• Compares to 31.128 ns for unconstrained placement.