ece 327 slides vhdl verilog digital hardware design

Upload: ysakeun

Post on 12-Sep-2014

E&CE 327: Digital Systems Engineering

Lecture Slides

Mark Aagaard
2011t1–Winter

University of Waterloo
Dept of Electrical and Computer Engineering

Contents

I Lecture Notes

1 VHDL
  1.1 Introduction to VHDL
    1.1.1 Levels of Abstraction
    1.1.2 VHDL Origins and History
    1.1.3 Semantics
    1.1.4 Synthesis of a Simulation-Based Language
    1.1.5 Solution to Synthesis Sanity
    1.1.6 Standard Logic 1164
  1.2 Comparison of VHDL to Other Hardware Description Languages
  1.3 Overview of Syntax
    1.3.1 Syntactic Categories
    1.3.2 Library Units
    1.3.3 Entities and Architecture
    1.3.4 Concurrent Statements
    1.3.5 Component Declaration and Instantiations
    1.3.6 Processes
    1.3.7 Sequential Statements
    1.3.8 A Few More Miscellaneous VHDL Features
  1.4 Concurrent vs Sequential Statements
    1.4.1 Concurrent Assignment vs Process
    1.4.2 Conditional Assignment vs If Statements
    1.4.3 Selected Assignment vs Case Statement
    1.4.4 Coding Style
  1.5 Overview of Processes
    1.5.1 Combinational Process vs Clocked Process
    1.5.2 Latch Inference
  1.6 Details of Process Execution
    1.6.1 Simple Simulation
    1.6.2 Temporal Granularities of Simulation
    1.6.3 Intuition Behind Delta-Cycle Simulation
    1.6.4 Definitions and Algorithm
      1.6.4.1 Process Modes
      1.6.4.2 Simulation Algorithm
      1.6.4.3 Delta-Cycle Definitions
    1.6.5 Example 1: Process Execution (Bamboozle)
    1.6.6 Example 2: Process Execution (Flummox)
    1.6.7 Ex: Need for Provisional Asn
    1.6.8 Delta-Cycle Simulations of Flip-Flops
  1.7 Register-Transfer-Level Simulation
    1.7.1 Overview
    1.7.2 Technique for Register-Transfer Level Simulation
    1.7.3 Examples of RTL Simulation
      1.7.3.1 RTL Simulation Example 1
  1.8 VHDL and Hardware Building Blocks
    1.8.1 Basic Building Blocks
    1.8.2 Deprecated Building Blocks for RTL
    1.8.3 Hardware and Code for Flops
      1.8.3.1 Flops with Waits and Ifs
      1.8.3.2 Flops with Synchronous Reset
      1.8.3.3 Flop with Chip-Enable and Mux on Input
      1.8.3.4 Flops with Chip-Enable, Muxes, and Reset
    1.8.4 An Example Sequential Circuit
  1.9 Arrays and Vectors
  1.10 Arithmetic
    1.10.1 Arithmetic Packages
    1.10.2 Shift and Rotate Operations
    1.10.3 Overloading of Arithmetic
    1.10.4 Different Widths and Arithmetic
    1.10.5 Overloading of Comparisons
    1.10.6 Different Widths and Comparisons
    1.10.7 Type Conversion
  1.11 Synthesizable vs Non-Synthesizable Code
    1.11.1 Unsynthesizable Code
      1.11.1.1 Initial Values
      1.11.1.2 Wait For
      1.11.1.3 Different Wait Conditions
      1.11.1.4 Multiple “if rising edge” in Process
      1.11.1.5 “if rising edge” and “wait” in Same Process
      1.11.1.6 “if rising edge” with “else” Clause
      1.11.1.7 “if rising edge” Inside a “for” Loop
      1.11.1.8 “wait” Inside of a “for loop”
  1.12 Synthesizable VHDL Coding Guidelines

2 RTL Design with VHDL
  2.1 Prelude to Chapter
  2.2 FPGA Background and Coding Guidelines
    2.2.1 Generic FPGA Hardware
      2.2.1.1 Generic FPGA Cell
    2.2.2 Area Estimation
      2.2.2.1 Interconnect for Generic FPGA
      2.2.2.2 Clocks for Generic FPGAs
      2.2.2.3 Special Circuitry in FPGAs
    2.2.3 Generic-FPGA Coding Guidelines
  2.3 Design Flow
  2.4 Algorithms and High-Level Models
  2.5 Finite State Machines in VHDL
    2.5.1 Introduction to State-Machine Design
      2.5.1.1 Mealy vs Moore State Machines
      2.5.1.2 Introduction to State Machines and VHDL
      2.5.1.3 Explicit vs Implicit State Machines
    2.5.2 Implementing a Simple Moore Machine
      2.5.2.1 Implicit Moore State Machine
      2.5.2.2 Explicit Moore with Flopped Output
      2.5.2.3 Explicit Moore with Combinational Outputs
      2.5.2.4 Explicit-Current+Next Moore with Concurrent Assignment
      2.5.2.5 E-C+N Moore with Comb Proc
    2.5.3 Implementing a Simple Mealy Machine
    2.5.4 Reset
    2.5.5 State Encoding
  2.6 Dataflow Diagrams
    2.6.1 Dataflow Diagrams Overview
    2.6.2 Dataflow Diagrams, Hardware, and Behaviour
    2.6.3 Dataflow Diagram Execution
    2.6.4 Performance Estimation
    2.6.5 Area Estimation
    2.6.6 Design Analysis
    2.6.7 Area / Performance Tradeoffs
  2.7 Design Example: Massey
  2.8 Design Example: Vanier
    2.8.1 Requirements
    2.8.2 Algorithm
    2.8.3 Initial Dataflow Diagram
    2.8.4 Reschedule to Meet Requirements
    2.8.5 Optimize Resources
    2.8.6 Assign Names to Registered Values
    2.8.7 Input/Output Allocation
    2.8.8 Tangent: Combinational Outputs
    2.8.9 Register Allocation
    2.8.10 Datapath Allocation
    2.8.11 Hardware Block Diagram and State Machine
      2.8.11.1 Control for Registers
      2.8.11.2 Control for Datapath Components
      2.8.11.3 Control for State
      2.8.11.4 Complete State Machine Table
    2.8.12 VHDL Code with Explicit State Machine
    2.8.13 Peephole Optimizations
    2.8.14 Notes and Observations
  2.9 Pipelining
    2.9.1 Introduction to Pipelining
    2.9.2 Partially Pipelined
    2.9.3 Terminology
  2.10 Design Example: Pipelined Massey
  2.11 Memory Arrays and RTL Design
    2.11.1 Memory Operations
    2.11.2 Memory Arrays in VHDL
    2.11.3 Data Dependencies
    2.11.4 Memory and Dataflow Diagrams
    2.11.5 Ex: Mem Array and Dataflow Diagram
  2.12 Input / Output Protocols
  2.13 Example: Moving Average
    2.13.1 Requirements and Environmental Assumptions
    2.13.2 Algorithm
    2.13.3 Pseudocode and Dataflow Diagrams
    2.13.4 Control Tables and State Machine
    2.13.5 VHDL Code

3 Performance Analysis and Optimization
  3.1 Introduction
  3.2 Defining Performance
  3.3 Comparing Performance
    3.3.1 General Equations
    3.3.2 Example: Performance of Printers
  3.4 Clock Speed, CPI, Program Length, and Performance
    3.4.1 Mathematics
    3.4.2 Example: CISC vs RISC and CPI
    3.4.3 Effect of Instruction Set on Performance
    3.4.4 Effect of Time to Market on Relative Performance
    3.4.5 Summary of Equations
  3.5 Performance Analysis and Dataflow Diagrams
    3.5.1 Dataflow Diagrams, CPI, and Clock Speed
    3.5.2 Examples of Dataflow Diagrams for Two Instructions
      3.5.2.1 Scheduling of Operations for Different Clock Periods
      3.5.2.2 Performance Computation for Different Clock Periods
      3.5.2.3 Example: Two Instructions Taking Similar Time
      3.5.2.4 Example: Same Total Time, Different Order for A
    3.5.3 Example: From Algorithm to Optimized Dataflow
  3.6 General Optimizations
    3.6.1 Strength Reduction
      3.6.1.1 Arithmetic Strength Reduction
      3.6.1.2 Boolean Strength Reduction
    3.6.2 Replication and Sharing
      3.6.2.1 Mux-Pushing
      3.6.2.2 Common Subexpression Elimination
      3.6.2.3 Computation Replication
    3.6.3 Arithmetic
  3.7 Retiming

4 Functional Verification
  4.1 Overview
    4.1.1 Terminology: Validation / Verification / Testing
    4.1.2 The Difficulty of Designing Correct Chips
      4.1.2.1 Notes from Kenn Heinrich (UW E&CE grad)
      4.1.2.2 Notes from Aart de Geus (Chairman and CEO of Synopsys)
  4.2 Test Cases and Coverage
    4.2.1 Coverage
    4.2.2 Floating Point Divider Example
  4.3 Testbenches
    4.3.1 Overview of Test Benches
    4.3.2 Reference Model Style Testbench
    4.3.3 Relational Style Testbench
    4.3.4 Coding Structure of a Testbench
    4.3.5 Datapath vs Control
    4.3.6 Verification Tips
  4.4 Functional Verification for Datapath Circuits
    4.4.1 A Spec-Less Testbench
    4.4.2 Use an Array for Test Vectors
    4.4.3 Build Spec into Stimulus
    4.4.4 Have Separate Specification Entity
    4.4.5 Generate Test Vectors Automatically
    4.4.6 Relational Specification
  4.5 Functional Verification of Control Circuits
    4.5.1 Overview of Queues in Hardware
    4.5.2 VHDL Coding
      4.5.2.1 Package
      4.5.2.2 Other VHDL Coding
    4.5.3 Code Structure for Verification
    4.5.4 Instrumentation Code
    4.5.5 Assertions
    4.5.6 VHDL Coding Tips
    4.5.7 Queue Specification
    4.5.8 Queue Testbench
  4.6 Example: Microwave Oven

5 Timing Analysis
  5.1 Delays and Definitions
    5.1.1 Background Definitions
    5.1.2 Clock-Related Timing Definitions
      5.1.2.1 Clock Skew
      5.1.2.2 Clock Latency
      5.1.2.3 Clock Jitter
    5.1.3 Storage-Related Timing Definitions
      5.1.3.1 Flops and Latches
    5.1.4 Propagation Delays
    5.1.5 Timing Constraints
      5.1.5.1 Minimum Clock Period
      5.1.5.2 Hold Constraint
      5.1.5.3 Example Timing Violations
  5.2 Timing Analysis of Latches and Flip Flops
    5.2.1 Simple Multiplexer Latch
      5.2.1.1 Structure and Behaviour of Multiplexer Latch
      5.2.1.2 Strategy for Timing Analysis of Storage Devices
      5.2.1.3 Clock-to-Q Time of a Multiplexer Latch
      5.2.1.4 Setup Timing of a Multiplexer Latch
      5.2.1.5 Hold Time of a Multiplexer Latch
      5.2.1.6 Example of a Bad Latch
  5.3 Critical Paths and False Paths
    5.3.1 Introduction to Critical and False Paths
      5.3.1.1 Example of Critical Path in Full Adder
      5.3.1.2 Preliminaries for Critical Paths
      5.3.1.3 Longest Path and Critical Path
    5.3.2 Longest Path
    5.3.3 Detecting a False Path
      5.3.3.1 Preliminaries
      5.3.3.2 Almost-Correct Algorithm to Detect a False Path
      5.3.3.3 Examples of Detecting False Paths
    5.3.4 Finding the Next Candidate Path
      5.3.4.1 Algorithm to Find Next Candidate Path
      5.3.4.2 Examples of Finding Next Candidate Path
    5.3.5 Correct Algorithm to Find Critical Path
      5.3.5.1 Rules for Late Side Inputs
      5.3.5.2 Monotone Speedup
      5.3.5.3 Analysis of Side-Input-Causes-Glitch Situation
      5.3.5.4 Complete Algorithm
      5.3.5.5 Complete Examples
    5.3.6 Further Extensions to Critical Path Analysis
    5.3.7 Increasing the Accuracy of Critical Path Analysis
  5.4 Elmore Timing Model
    5.4.1 RC-Networks for Timing Analysis
    5.4.2 Derivation of Analog Timing Model
      5.4.2.1 Example Derivation: Equation for Voltage at Node 3
      5.4.2.2 General Derivation
    5.4.3 Elmore Timing Model
    5.4.4 Examples of Using Elmore Delay
      5.4.4.1 Interconnect with Single Fanout
      5.4.4.2 Interconnect with Multiple Gates in Fanout
  5.5 Practical Usage of Timing Analysis
    5.5.1 Speed Binning
      5.5.1.1 FPGAs, Interconnect, and Synthesis
    5.5.2 Worst Case Timing
      5.5.2.1 Fanout delay
      5.5.2.2 Derating Factors

6 Power Analysis and Power-Aware Design
  6.1 Overview
    6.1.1 Importance of Power and Energy
    6.1.2 Industrial Names and Products
    6.1.3 Power vs Energy
    6.1.4 Batteries, Power and Energy
      6.1.4.1 Do Batteries Store Energy or Power?
      6.1.4.2 Battery Life and Efficiency
      6.1.4.3 Battery Life and Power
  6.2 Power Equations
    6.2.1 Switching Power
    6.2.2 Short-Circuited Power
    6.2.3 Leakage Power
    6.2.4 Glossary
    6.2.5 Note on Power Equations
  6.3 Overview of Power Reduction Techniques
  6.4 Voltage Reduction for Power Reduction
  6.5 Data Encoding for Power Reduction
    6.5.1 How Data Encoding Can Reduce Power
    6.5.2 Example Problem: Sixteen Pulser
      6.5.2.1 Problem Statement
      6.5.2.2 Additional Information
      6.5.2.3 Answer
  6.6 Clock Gating
    6.6.1 Introduction to Clock Gating
    6.6.2 Implementing Clock Gating
    6.6.3 Design Process
    6.6.4 Effectiveness of Clock Gating
    6.6.5 Example: Reduced Activity Factor with Clock Gating
    6.6.6 Clock Gating with Valid-Bit Protocol
      6.6.6.1 Valid-Bit Protocol
      6.6.6.2 How Many Clock Cycles for Module?
      6.6.6.3 Adding Clock-Gating Circuitry
    6.6.7 Example: Pipelined Circuit with Clock-Gating

7 Fault Testing and Testability
  7.1 Faults and Testing
    7.1.1 Overview of Faults and Testing
      7.1.1.1 Faults
      7.1.1.2 Causes of Faults
      7.1.1.3 Testing
      7.1.1.4 Burn In
      7.1.1.5 Bin Sorting
      7.1.1.6 Testing Techniques
      7.1.1.7 Design for Testability (DFT)
    7.1.2 Example Problem: Economics of Testing
    7.1.3 Physical Faults
      7.1.3.1 Types of Physical Faults
      7.1.3.2 Locations of Faults
      7.1.3.3 Layout Affects Locations
      7.1.3.4 Naming Fault Locations
    7.1.4 Detecting a Fault
      7.1.4.1 Which Test Vectors will Detect a Fault?
    7.1.5 Mathematical Models of Faults
      7.1.5.1 Single Stuck-At Fault Model
    7.1.6 Generate Test Vector to Find a Mathematical Fault
      7.1.6.1 Algorithm
      7.1.6.2 Example of Finding a Test Vector
    7.1.7 Undetectable Faults
      7.1.7.1 Redundant Circuitry
      7.1.7.2 Curious Circuitry and Fault Detection
  7.2 Test Generation
    7.2.1 A Small Example
    7.2.2 Choosing Test Vectors
      7.2.2.1 Fault Domination
      7.2.2.2 Fault Equivalence
      7.2.2.3 Gate Collapsing
      7.2.2.4 Node Collapsing
      7.2.2.5 Fault Collapsing Summary
    7.2.3 Fault Coverage
    7.2.4 Test Vector Generation and Fault Detection
    7.2.5 Generate Test Vectors for 100% Coverage
      7.2.5.1 Collapse the Faults
      7.2.5.2 Check for Fault Domination
      7.2.5.3 Required Test Vectors
      7.2.5.4 Faults Not Covered by Required Test Vectors
      7.2.5.5 Order to Run Test Vectors
      7.2.5.6 Summary of Technique to Find and Order Test Vectors
    7.2.6 One Fault Hiding Another
  7.3 Scan Testing in General
    7.3.1 Structure and Behaviour of Scan Testing
    7.3.2 Scan Chains
      7.3.2.1 Circuitry in Normal and Scan Mode
      7.3.2.2 Scan in Operation
      7.3.2.3 Scan in Operation with Example Circuit
    7.3.3 Summary of Scan Testing
    7.3.4 Time to Test a Chip
      7.3.4.1 Example: Time to Test a Chip
  7.4 Boundary Scan and JTAG
    7.4.1 Scan Instructions
  7.5 Built In Self Test
    7.5.1 Block Diagram
      7.5.1.1 Components
      7.5.1.2 Linear Feedback Shift Register (LFSR)
      7.5.1.3 Maximal-Length LFSR
    7.5.2 Test Generator
    7.5.3 Signature Analyzer
    7.5.4 Result Checker
    7.5.5 Arithmetic over Binary Fields
    7.5.6 Shift Registers and Characteristic Polynomials
      7.5.6.1 Circuit Multiplication
    7.5.7 Bit Streams and Characteristic Polynomials
    7.5.8 Division
    7.5.9 Signature Analysis: Math and Circuits
  7.6 Scan vs Self Test

8 Review
  8.1 Overview of the Term
  8.2 VHDL
    8.2.1 VHDL Topics
    8.2.2 VHDL Example Problems
  8.3 RTL Design Techniques
    8.3.1 Design Topics
    8.3.2 Design Example Problems
  8.4 Functional Verification
    8.4.1 Verification Topics
    8.4.2 Verification Example Problems
  8.5 Performance Analysis and Optimization
    8.5.1 Performance Topics
    8.5.2 Performance Example Problems
  8.6 Timing Analysis
    8.6.1 Timing Topics
    8.6.2 Timing Example Problems
  8.7 Power
    8.7.1 Power Topics
    8.7.2 Power Example Problems
  8.8 Testing
    8.8.1 Testing Topics
    8.8.2 Testing Example Problems
  8.9 Formulas to be Given on Final Exam

Part I

Lecture Notes

Chapter 1

VHDL: The Language

1.1 Introduction to VHDL

1.1.1 Levels of Abstraction

Transistor: Signal values and time are continuous (analog). Each transistor is modeled by a resistor-capacitor network.

Switch: Time is continuous, but voltage may be either continuous or discrete. Linear equations are used.

Gate: Transistors are grouped together into gates. Voltages are discrete values such as 0 and 1.

Register transfer level: Hardware is modeled as assignments to registers and combinational signals. Basic unit of time is one clock cycle.

Transaction level: A transaction is an operation such as transferring data across a bus. Building blocks are processors, controllers, etc. Languages: VHDL, SystemC, or SystemVerilog.

Electronic-system level: Looks at an entire electronic system, with both hardware and software.
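As a minimal sketch of the register-transfer level, the VHDL below has one combinational signal and one register that is updated once per clock cycle. The entity name rtl_example and the port names clk, a, b, and q are hypothetical and do not come from the slides. The use of std_logic from the std_logic_1164 package is also an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; all names are chosen for illustration only.
entity rtl_example is
  port (
    clk  : in  std_logic;
    a, b : in  std_logic;
    q    : out std_logic
  );
end entity rtl_example;

architecture main of rtl_example is
  signal d : std_logic;  -- combinational signal
begin
  -- Combinational assignment: d follows a and b with no clock involved.
  d <= a and b;

  -- Register assignment: q takes the value of d on each rising clock edge,
  -- so the basic unit of time for q is one clock cycle.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process reg;
end architecture main;
```

At this level the code says nothing about transistors or gate delays; it only states which values are stored at a clock edge and which are computed between edges.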

1.1.2 VHDL Origins and History

VHDL = VHSIC Hardware Description Language
VHSIC = Very High Speed Integrated Circuit

The VHSIC Hardware Description Language (VHDL) is a formal notation intended for use in all phases of the creation of electronic systems. Because it is both machine readable and human readable, it supports the development, verification, synthesis and testing of hardware designs, the communication of hardware design data, and the maintenance, modification, and procurement of hardware.

Language Reference Manual (IEEE Design Automation Standards Committee, 1993a)

VHDL is a lot more than synthesis of digital hardware


1.1.3 Semantics

The original goal of VHDL was to simulate circuits. The semantics of the language define circuit behaviour.

[Figure: simulation of c <= a AND b; (signals a, b, c)]

But now, VHDL is used in both simulation and synthesis. Synthesis is concerned with the structure of the circuit.

Synthesis: converts one type of description (behavioural) into another, lower-level description (usually a netlist).

[Figure: synthesis of c <= a AND b; (inputs a, b; output c)]


Synthesis

Synthesis is a computer-aided design (CAD) technique that transforms a designer’s concise, high-level description of a circuit into a structural description of a circuit.

[Figure: synthesis of c <= a AND b; (inputs a, b; output c)]


CAD Tools

CAD tools allow designers to automate lower-level design processes in implementing the desired functionality of a system.

NOTE: EDA = Electronic Design Automation. In digital hardware design, EDA = CAD.


Synthesis vs Simulation

For synthesis, we want the code we write to define the structure of the hardware that is generated.

[Figure: synthesis of c <= a AND b; (inputs a, b; output c)]


Synthesis vs Simulation

The VHDL semantics define the behaviour of the hardware that is generated, not the structure of the hardware.

[Figure: c <= a AND b; followed along synthesis and simulation paths. Synthesis may produce circuits with different structure; simulation of each must show the same behaviour on a, b, and c.]

1.1.4 Synthesis of a Simulation-Based Language

This section reserved for your reading pleasure


1.1.5 Solution to Synthesis Sanity

• Pick a high-quality synthesis tool and study its documentation thoroughly

• Learn the idioms of the tool

• Different VHDL code with same behaviour can result in very different circuits

• Be careful if you have to port VHDL code from one tool to another

• KISS: Keep It Simple Stupid

– VHDL examples will illustrate reliable coding techniques for the synthesis tools from Synopsys, Mentor Graphics, Altera, Xilinx, and most other companies as well.

– Follow the coding guidelines and examples from lecture

– As you write VHDL, think about the hardware you expect to get.

Note: If you can’t predict the hardware, then the hardware probably won’t be very good (small, fast, correct, etc)


1.1.6 Standard Logic 1164

std_logic_1164: IEEE standard for signal values in VHDL.

'U'  uninitialized
'X'  strong unknown
'0'  strong 0
'1'  strong 1
'Z'  high impedance
'W'  weak unknown
'L'  weak 0
'H'  weak 1
'-'  don't care

The most common values are: 'U', 'X', '0', '1'.

If you see 'X' in a simulation, it usually means that there is a mistake in your code.
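To see where 'U' and 'X' typically come from, here is a minimal self-checking sketch. The entity and signal names are hypothetical and chosen only for illustration: one signal is never assigned, and another is assigned by two concurrent statements, which is a common coding mistake.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench; all names are illustrative.
entity std_logic_demo is
end std_logic_demo;

architecture sim of std_logic_demo is
  signal never_driven : std_logic;  -- no statement assigns a value
  signal contended    : std_logic;  -- two drivers, see below
begin
  -- Two concurrent assignments create two drivers for the same signal.
  contended <= '0';
  contended <= '1';

  check : process
  begin
    wait for 1 ns;
    -- An unassigned std_logic signal keeps its default value 'U'.
    assert never_driven = 'U' report "expected 'U'" severity failure;
    -- A strong '0' driven against a strong '1' resolves to 'X'.
    assert contended = 'X' report "expected 'X'" severity failure;
    wait;  -- suspend forever: end of test
  end process;
end sim;
```

If neither assertion fires, the simulator has confirmed both values.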

1.2 Comparison of VHDL to Other Hardware Description Languages

This section reserved for your reading pleasure

1.3 Overview of Syntax

1.3.1 Syntactic Categories

This section reserved for your reading pleasure

1.3.2 Library Units

This section reserved for your reading pleasure


1.3.3 Entities and Architecture

Each hardware module is described with an Entity/Architecture pair

[Figure: Entity and Architecture]


Entity

library ieee;

use ieee.std_logic_1164.all;

entity and_or is

port (

a, b, c : in std_logic ;

z : out std_logic

);

end and_or;

Example of an entity


Architecture

architecture main of and_or is

signal x : std_logic;

begin

x <= a AND b;

z <= x OR (a AND c);

end main;

Example of architecture
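A short testbench is one way to check the entity/architecture pair above in simulation. The sketch below is an assumption-laden illustration: the testbench name and_or_tb, the test vectors, and the 10 ns spacing are not from the lecture, and it assumes and_or(main) has been compiled into library work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; assumes and_or(main) is compiled into library work.
entity and_or_tb is
end and_or_tb;

architecture sim of and_or_tb is
  signal a, b, c, z : std_logic;
begin
  -- Direct entity instantiation (VHDL-93 and later).
  dut : entity work.and_or(main)
    port map (a => a, b => b, c => c, z => z);

  stimulus : process
  begin
    a <= '1'; b <= '1'; c <= '0';
    wait for 10 ns;
    assert z = '1' report "a AND b should set z" severity error;

    a <= '1'; b <= '0'; c <= '1';
    wait for 10 ns;
    assert z = '1' report "a AND c should set z" severity error;

    a <= '0'; b <= '1'; c <= '1';
    wait for 10 ns;
    assert z = '0' report "z should be 0 when a is 0" severity error;

    wait;  -- suspend forever: end of test
  end process;
end sim;
```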


1.3.4 Concurrent Statements

• Architectures contain concurrent statements

• Concurrent statements execute in parallel (Figure 1.4)

– Concurrent statements make VHDL fundamentally different from most software languages.

– Hardware (gates) naturally executes in parallel: VHDL mimics the behaviour of real hardware.

– At each infinitesimally small moment of time, each gate:

1. samples its inputs

2. computes the value of its output

3. drives the output


Concurrent Statements

architecture main of bowser is
begin
  x1 <= a AND b;
  x2 <= NOT x1;
  z <= NOT x2;
end main;

architecture main of bowser is
begin
  z <= NOT x2;
  x2 <= NOT x1;
  x1 <= a AND b;
end main;

[Figure: circuit with inputs a and b, internal signals x1 and x2, and output z]

The order of concurrent statements doesn’t matter


Types of Concurrent Statements

conditional assignment: similar to conventional if-then-else
  c <= a+b when sel='1' else a+c when sel='0' else "0000";

selected assignment: similar to conventional case/switch
  with color select d <= "00" when red, "01" when . . .;

component instantiation: use a hardware module/component
  add1 : adder port map( a => f, b => g, s => h, co => i );

for-generate: create multiple pieces of hardware
  bgen: for i in 1 to 7 generate b(i)<=a(7-i); end generate;

if-generate: conditionally create some hardware
  okgen : if optgoal /= fast generate
    result <= ((a and b) or (d and not e)) or g;
  end generate;
  fastgen : if optgoal = fast generate
    result <= '1';
  end generate;

process: description of complex behaviour (Section 1.3.6)
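The one-line fragments above can be put into a complete design unit. The following is a minimal sketch, not lecture code: the entity name conc_demo, its ports, and the chosen operations are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; all names and widths are illustrative.
entity conc_demo is
  port (
    sel  : in  std_logic;
    op   : in  std_logic_vector(1 downto 0);
    a, b : in  std_logic_vector(7 downto 0);
    y    : out std_logic_vector(7 downto 0);  -- conditional assignment
    w    : out std_logic_vector(7 downto 0);  -- selected assignment
    r    : out std_logic_vector(7 downto 0)   -- for-generate
  );
end conc_demo;

architecture main of conc_demo is
begin
  -- Conditional assignment: conditions are tested in order.
  y <= a when sel = '1' else
       b when sel = '0' else
       (others => 'X');

  -- Selected assignment: the choices must cover every value of op.
  with op select
    w <= a and b when "00",
         a or b  when "01",
         a xor b when "10",
         not a   when others;

  -- For-generate: one assignment per bit, reversing the bit order of a.
  rev : for i in 0 to 7 generate
    r(i) <= a(7 - i);
  end generate;
end main;
```

All three statements are concurrent, so their order in the architecture body does not affect the result.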

1.3.5 Component Declaration and Instantiations

This section reserved for your reading pleasure

1.3.6 Processes

• Processes are used to describe complex and potentially unsynthesizable behaviour

• A process is a concurrent statement (Section 1.3.4).

• The body of a process contains sequential statements (Section 1.3.7)

• Processes are the most complex and difficult to understand part of VHDL (Sections 1.5 and 1.6)


Example Process with Sensitivity List

process (a, b, c)

begin

y <= a AND b;

if (a = ’1’) then

z1 <= b AND c;

z2 <= NOT c;

else

z1 <= b OR c;

z2 <= c;

end if;

end process;


Example Process with Wait Statements

process

begin

y <= a AND b;

z <= ’0’;

wait until rising_edge(clk);

if (a = ’1’) then

z <= ’1’;

y <= ’0’;

wait until rising_edge(clk);

else

y <= a OR b;

end if;

end process;


Sensitivity Lists and Wait Statements

• Processes must have either a sensitivity list or at least one wait statement on each execution path through the process.

• Processes cannot have both a sensitivity list and a wait statement.


Sensitivity List

The sensitivity list contains the signals that are read in the process.

A process is executed when a signal in its sensitivity list changes value.

An important coding guideline to ensure consistent synthesis and simulation results is to include all signals that are read in the sensitivity list.

There is one exception to this rule: for a process that implements a flip-flop with an if rising_edge statement, it is acceptable to include only the clock signal in the sensitivity list. Other signals may be included, but are not needed.
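The guideline and its exception can be sketched side by side. The entity name sens_demo and its ports are hypothetical and used only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names are illustrative.
entity sens_demo is
  port (
    clk, a, b : in  std_logic;
    y_comb    : out std_logic;
    q         : out std_logic
  );
end sens_demo;

architecture main of sens_demo is
begin
  -- Combinational process: every signal that is read (a and b) is listed.
  -- If b were omitted, simulation would ignore changes on b, while a
  -- synthesis tool would typically still build an AND gate.
  comb : process (a, b)
  begin
    y_comb <= a and b;
  end process;

  -- Flip-flop: only clk is needed, because a is read only on a rising edge.
  flop : process (clk)
  begin
    if rising_edge(clk) then
      q <= a;
    end if;
  end process;
end main;
```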


1.3.7 Sequential Statements

Used inside processes and functions.

wait                wait until . . . ;
signal assignment   . . . <= . . . ;
if-then-else        if . . . then . . . elsif . . . end if;
case                case . . . is
                      when . . . | . . . => . . . ;
                      when . . . => . . . ;
                    end case;
loop                loop . . . end loop;
while loop          while . . . loop . . . end loop;
for loop            for . . . in . . . loop . . . end loop;
next                next . . . ;

The most commonly used sequential statements

1.3.8 A Few More Miscellaneous VHDL Features

This section reserved for your reading pleasure

1.4 Concurrent vs Sequential Statements

All concurrent assignments can be translated into sequential statements. But, not all sequential statements can be translated into concurrent statements.


1.4.1 Concurrent Assignment vs Process

The two code fragments below have identical behaviour:

architecture main of tiny is

begin

b <= a;

end main;

architecture main of tiny is

begin

process (a) begin

b <= a;

end process;

end main;

1.4.2 Conditional Assignment vs If Statements

The two code fragments below have identical behaviour:

Concurrent Statements

t <= <val1> when <cond>
     else <val2>;

Sequential Statements

if <cond> then
  t <= <val1>;
else
  t <= <val2>;
end if;
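As a concrete instance of the two templates, here is a sketch of a 2-to-1 multiplexer written both ways. The entity name mux2 and its port names are assumptions for illustration; the two outputs are intended to behave identically.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 multiplexer; names are illustrative.
entity mux2 is
  port (
    sel, d0, d1 : in  std_logic;
    t_conc      : out std_logic;
    t_seq       : out std_logic
  );
end mux2;

architecture main of mux2 is
begin
  -- Concurrent form: conditional assignment.
  t_conc <= d1 when sel = '1' else d0;

  -- Sequential form: if statement inside a process.
  process (sel, d0, d1)
  begin
    if sel = '1' then
      t_seq <= d1;
    else
      t_seq <= d0;
    end if;
  end process;
end main;
```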

1.4.3 Selected Assignment vs Case Statement

The two code fragments below have identical behaviour:

Concurrent Statements

with <expr> select
  t <= <val1> when <choices1>,
       <val2> when <choices2>,
       <val3> when <choices3>;

Sequential Statements

case <expr> is
  when <choices1> =>
    t <= <val1>;
  when <choices2> =>
    t <= <val2>;
  when <choices3> =>
    t <= <val3>;
end case;
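The same correspondence, sketched for a 2-to-4 decoder. The entity name dec2to4 and its ports are hypothetical. The "when others" choice is used because a std_logic_vector selector has more possible values than the four binary codes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-4 decoder; names are illustrative.
entity dec2to4 is
  port (
    sel    : in  std_logic_vector(1 downto 0);
    t_conc : out std_logic_vector(3 downto 0);
    t_seq  : out std_logic_vector(3 downto 0)
  );
end dec2to4;

architecture main of dec2to4 is
begin
  -- Concurrent form: selected assignment.
  with sel select
    t_conc <= "0001" when "00",
              "0010" when "01",
              "0100" when "10",
              "1000" when others;

  -- Sequential form: case statement inside a process.
  process (sel)
  begin
    case sel is
      when "00"   => t_seq <= "0001";
      when "01"   => t_seq <= "0010";
      when "10"   => t_seq <= "0100";
      when others => t_seq <= "1000";
    end case;
  end process;
end main;
```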


1.4.4 Coding Style

Code that’s easy to write with sequential statements, but difficult with concurrent:

case <expr> is
  when <choice1> =>
    if <cond> then
      o <= <expr1>;
    else
      o <= <expr2>;
    end if;
  when <choice2> =>
    . . .
end case;


1.5 Overview of Processes

Processes are the most difficult VHDL construct to understand. This section gives an overview of processes. Section 1.6 gives the details of the semantics of processes.

• Within a process, statements are executed almost sequentially

• Among processes, execution is done in parallel

• Remember: a process is a concurrent statement!


Process Semantics

• VHDL mimics hardware

• Hardware (gates) executes in parallel

• Processes execute in parallel with each other

• All possible orders of executing processes must produce the same simulation results (waveforms)

• If a signal is not assigned a value, then it holds its previous value

All orders of executing concurrent statements must produce the same waveforms


Process Semantics

architecture
  procA: process
    stmtA1;
    stmtA2;
    stmtA3;
  end process;

  procB: process
    stmtB1;
    stmtB2;
  end process;

[Figure: three possible execution sequences of A1, A2, A3, B1, B2. Single threaded: procA before procB. Single threaded: procB before procA. Multithreaded: procA and procB in parallel.]


Process Semantics

All execution orders must have same behaviour

1.5.1 Combinational Process vs Clocked Process

Each well-written synthesizable process is either combinational or clocked.

Combinational process:

• Executing the process takes part of one clock cycle

• Target signals are outputs of combinational circuitry

• A combinational process must have a sensitivity list

• A combinational process must not have any wait statements

• A combinational process must not have any rising_edge or falling_edge calls

• The hardware for a combinational process is just combinational circuitry


Clocked process:

• Executing the process takes one (or more) clock cycles

• Target signals are outputs of flops

• Process contains one or more wait or if rising_edge statements

• Hardware contains combinational circuitry and flip flops

Note: Clocked processes are sometimes called “sequential processes”, but this can be easily confused with “sequential statements”, so in E&CE 327 we’ll refer to synthesizable processes as either “combinational” or “clocked”.


Combinational or Clocked Process? (1)

process (a,b,c)

begin

p1 <= a;

if (b = c) then

p2 <= b;

else

p2 <= a;

end if;

end process;


Combinational or Clocked Process? (2)

process

begin

wait until rising_edge(clk);

b <= a;

end process;


Combinational or Clocked Process? (3)

process (clk)

begin

if rising_edge(clk) then

b <= a;

end if;

end process;


Combinational or Clocked Process? (4)

process (clk)

begin

a <= clk;

end process;


Combinational or Clocked Process? (5)

process

begin

wait until rising_edge(a);

c <= b;

end process;


1.5.2 Latch Inference

The semantics of VHDL require that if a signal is assigned a value on some passes through a process and not on other passes, then on a pass through the process when the signal is not assigned a value, it must maintain its value from the previous pass.

process (a, b, c)

begin

if (a = ’1’) then

z1 <= b;

z2 <= b;

else

z1 <= c;

end if;

end process;

[Figure: Example of latch inference (inputs a, b, c; outputs z1, z2)]


Latch Inference

When a signal’s value must be stored, VHDL infers a latch or a flip-flop in the hardware to store the value.

If you want a latch or a flip-flop for the signal, then latch inference is good.

If you want combinational circuitry, then latch inference is bad.
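One common way to avoid an unwanted latch is to give every target signal a default value at the top of the process. The sketch below reworks the latch-inference example under that assumption. The entity name no_latch and the default value '0' are illustrative choices, not part of the lecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names are illustrative.
entity no_latch is
  port (
    a, b, c : in  std_logic;
    z1, z2  : out std_logic
  );
end no_latch;

architecture main of no_latch is
begin
  process (a, b, c)
  begin
    -- Default assignment: z2 now gets a value on every pass through the
    -- process, so nothing needs to be remembered. The value '0' is an
    -- assumption; use whatever the design requires.
    z2 <= '0';
    if a = '1' then
      z1 <= b;
      z2 <= b;   -- overrides the default on this path
    else
      z1 <= c;
    end if;
  end process;
end main;
```

An equivalent alternative is to assign z2 explicitly in the else branch.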


Loop, Latch, Flop

[Figure: three circuits, each with inputs a and b and output z: a combinational loop, a latch (a on EN), and a flip-flop (D, Q)]

Question: Write VHDL code for each of the above circuits


1.6 Details of Process Execution

1.6.1 Simple Simulation

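[Figure: circuit with inputs a and b, internal signals c and d, and output e, with waveforms for a, b, c, d, e marked at 0 ns, 10 ns, 12 ns, and 15 ns]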


Different Programs, Same Behaviour

All three programs below synthesize to the circuit on the previous slide.

The goal of VHDL semantics is that all three programs have the same behaviour.

process (a,b)

begin

c <= a and b;

end process;

process (b,c,d)

begin

d <= not c;

e <= b and d;

end process;

process (a,b,c,d)

begin

c <= a and b;

d <= not c;

e <= b and d;

end process;

process (a,b)

begin

c <= a and b;

end process;

process (c)

begin

d <= not c;

end process;

process (b,d)

begin

e <= b and d;

end process;


1.6.2 Temporal Granularities of Simulation

This section reserved for your reading pleasure

1.6.3 Intuition Behind Delta-Cycle Simulation

In zero-delay simulation, a sequence of dependent events must appear to happen instantaneously (in zero time). In particular, the effect of an event must propagate instantaneously through combinational circuitry.

Two fundamental rules for zero-delay simulation:

1. events appear to propagate through combinational circuitry instantaneously.

2. all of the gates appear to operate in parallel

Intuition for Delta Cycles

To make it appear that events propagate instantaneously, VHDL introduces an artificial unit of time, the delta cycle, to represent an infinitesimally small amount of time. In each delta cycle, every gate in the circuit will sample its inputs, compute its result, and drive its output signal with the result.

Simulators simulate one gate at a time, but the waveforms make it appear that all of the gates were run in parallel. In each delta cycle, the simulator executes all gates whose inputs changed.

To preserve the illusion that the gates ran in parallel, the effect of simulating a gate remains invisible until the end of the delta cycle.
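A small experiment makes the "invisible until the end of the delta cycle" rule visible. In the sketch below (the names, initial values, and 10 ns clock period are assumptions for illustration), two assignments in the same process swap x and y: both read the values from before the clock edge, so neither sees the other's new value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench; all names are illustrative.
entity swap_demo is
end swap_demo;

architecture sim of swap_demo is
  signal clk : std_logic := '0';
  signal x   : std_logic := '0';
  signal y   : std_logic := '1';
begin
  -- Free-running clock: first rising edge at 5 ns.
  -- Run the simulation for a bounded time, since the clock never stops.
  clk <= not clk after 5 ns;

  -- Both assignments read the values from before the clock edge.
  swap : process (clk)
  begin
    if rising_edge(clk) then
      x <= y;
      y <= x;
    end if;
  end process;

  check : process
  begin
    wait until rising_edge(clk);  -- edge at 5 ns
    wait for 1 ns;
    assert x = '1' and y = '0' report "values did not swap" severity failure;
    wait;  -- suspend forever: end of check
  end process;
end sim;
```

If assignments took effect immediately, y would be assigned the new value of x and both signals would end up equal.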


1.6.4 Definitions and Algorithm

1.6.4.1 Process Modes

[Figure: process mode state diagram with modes active, suspended, and postponed, and transitions suspend, resume, and activate]


Suspended

• Nothing to currently execute

• A process stays suspended until the event that it is waiting for occurs: either a change in a signal on its sensitivity list or the condition in a wait statement


Postponed

• Wants to execute, but not currently active

• A process stays postponed until the simulator chooses it from the pool of postponed processes


Active

• Currently executing

• A process stays active until it hits a wait statement or sensitivity list, at which point it suspends


1.6.4.2 Simulation Algorithm

The algorithm presented here is a simplification of the actual algorithm in the VHDL Standard.

This algorithm does not support delayed assignments; for example: (a <= b after 2 ns;).

A somewhat ironic note: only six of the two hundred pages in the VHDL Standard are devoted to the semantics of executing processes.


The Algorithm

Simulations start at step 1 with all processes postponed and all signals with a default value (e.g., 'U' for std_logic).

1. While there are postponed processes:

(a) Pick one or more postponed processes to execute (become active).

(b) Provisionally execute assignments (new values become visible at step 3).

(c) A process executes until it hits its sensitivity list or a wait statement, at which point it suspends.

(d) Processes that become suspended stay suspended until there are no more postponed or active processes.

2. Each process checks its sensitivity list or wait condition to see if it should resume.

3. Update signals with their provisional values.

4. If no postponed processes, then increment simulation time to next event.


Notes on Simulation Algorithm

• At a wait statement, the process will suspend even if the condition is true in the current simulation cycle. The process will resume when the condition changes to true.

• In n-threaded execution, at most n processes are active at a time


1.6.4.3 Delta-Cycle Definitions

Definition simulation step: Executing one sequential assignment or process mode change.

Definition simulation cycle: The operations that occur in one iteration of the simulation algorithm.

Definition delta cycle: A simulation cycle that does not advance simulation time.

Definition simulation round: A sequence of simulation cycles that all have the same simulation time.

1.6.5 Example 1: Process Execution (Bamboozle)

This section reserved for your reading pleasure

1.6.6 Example 2: Process Execution (Flummox)

This example is a variation of the Bamboozle example from section 1.6.5.


[Figure: delta-cycle simulation worksheet starting at 0 ns for the circuit with signals a, b, c, d, e (all initially 'U') and processes proc1, proc2, proc3 (all initially postponed), with rows for delta cycle, sim cycle, and sim round. Legend: simulation step; initial values; visible-assignment value; provisional-assignment value; simulation-step pointer (one per process); process mode (S = suspended, P = postponed, A = active).]

proc1: process (a, b, c) begin
  c <= a AND b;
end process;

proc2: process (b, d) begin
  d <= NOT c;
end process;

e <= b AND d;

proc3: process begin
  a <= '1';
  b <= '0';
  wait for 3 ns;
  b <= '1';
  wait for 99 ns;
end process;


[Figure: delta-cycle simulation worksheet (continued) for signals a, b, c, d, e and processes proc1, proc2, proc3, with rows for delta cycle, sim cycle, and sim round]

proc1: ...(a, b, c)...
  c <= a AND b;
end process;

proc2: ...(b, d)...
  d <= NOT c;
end process;

e <= b AND d;

proc3: process begin
  a <= '1';
  b <= '0';
  wait for 3 ns;
  b <= '1';
  wait for 99 ns;
end process;

Algorithm summary:

1. While there are postponed processes:
(a) Pick process(es) to activate
(b) Execute active processes, record provisional assignments
(c) Suspend at sensitivity list or wait statement
(d) Once suspended, stay suspended

2. Check sensitivity lists and wait conditions for changes

3. Update signals with provisional values

4. If no postponed processes, increment time


From Delta-Time to Real Time

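[Figure: two waveform views of a, b, c, d, e (all initially 'U'). Top: delta-cycle time scale with 0 ns, +1δ, +2δ, +3δ, 3 ns, +1δ, +2δ, +3δ, and 102 ns. Bottom: real time scale with 0 ns, 1 ns, 2 ns, 3 ns, 4 ns, 100 ns, 101 ns, and 102 ns.]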


Note and Questions

Note: If a signal is updated with the same value it had in the previous simulation cycle, then it does not change, and therefore does not trigger processes to resume.

Question: What are the different granularities of time that occur when doing delta-cycle simulation?

Question: What is the order of granularity, from finest to coarsest, amongst the different granularities related to delta-cycle simulation?

1.6.7 Ex: Need for Provisional Asn

architecture main of swindle is

begin

p_c: process (a, b) begin

c <= a AND b;

end process;

p_d: process (a, c) begin

d <= a XOR c;

end process;

end main;

Question: draw the circuit

Circuit to illustrate need for provisional assignments

1. Start with all signals at ’0’ .

2. Simultaneously change to a = ’1’ and b = ’1’ .

With Provisional Assignments, c Before d

If assignments are not visible within the same simulation cycle (correct: i.e. provisional assignments are used)

p_c: process (a, b) begin

c <= a AND b;

end process;

p_d: process (a, c) begin

d <= a XOR c;

end process;

[Figure: delta-cycle worksheet for signals a, b, c, d (all initially 0) with process modes (P, A, S) for p_c and p_d]

If p_c is scheduled before p_d, then d will have a '1' pulse.

With Provisional Assignments, d Before c

If assignments are not visible within the same simulation cycle (correct: i.e. provisional assignments are used)

p_c: process (a, b) begin

c <= a AND b;

end process;

p_d: process (a, c) begin

d <= a XOR c;

end process;

[Figure: delta-cycle worksheet for signals a, b, c, d (all initially 0) with process modes (P, A, S) for p_c and p_d]

If p_d is scheduled before p_c, then d will have a '1' pulse.

Without Provisional Assignments, c Before d

If assignments are visible within the same simulation cycle (incorrect)

p_c: process (a, b) begin

c <= a AND b;

end process;

p_d: process (a, c) begin

d <= a XOR c;

end process;

[Figure: delta-cycle worksheet for signals a, b, c, d (all initially 0) with process modes (P, A, S) for p_c and p_d]

If p_c is scheduled before p_d, then d will stay constant '0'.

Without Provisional Assignments, d Before c

If assignments are visible within the same simulation cycle (incorrect)

p_c: process (a, b) begin

c <= a AND b;

end process;

p_d: process (a, c) begin

d <= a XOR c;

end process;

[Figure: delta-cycle worksheet for signals a, b, c, d (all initially 0) with process modes (P, A, S) for p_c and p_d]

If p_d is scheduled before p_c, then d will have a '1' pulse.


Need for Provisional Assignment

With provisional assignments, both orders of scheduling processes result in the same behaviour on all signals. Without provisional assignments, different scheduling orders result in different behaviour.


1.6.8 Delta-Cycle Simulations of Flip-Flops

p_a : process begin
  a <= '0';
  wait for 15 ns;
  a <= '1';
  wait for 20 ns;
end process;

p_clk : process begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

flop : process ( clk ) begin
  if rising_edge( clk ) then
    q <= a;
  end if;
end process;

[Figure: delta-cycle simulation worksheet starting at 0 ns for signals a, clk, q (all initially 'U') and processes flop, p_a, p_clk (all initially postponed), with rows for sim round, sim cycle, and delta cycle]


Redraw with Normal Time Scale

[Figure: waveforms for a, clk, q on a real time scale from 0 ns to 35 ns in 5 ns steps]


Back-to-Back Flops

p_a : process begin
  a <= '0';
  wait for 15 ns;
  a <= '1';
  wait for 20 ns;
end process;

p_clk : process begin
  clk <= '0';
  wait for 10 ns;
  clk <= '1';
  wait for 10 ns;
end process;

flops : process ( clk ) begin
  if rising_edge( clk ) then
    q1 <= a;
    q2 <= q1;
  end if;
end process;

[Figure: delta-cycle simulation worksheet covering 10 ns, 15 ns, 20 ns, 30 ns, and 35 ns for signals a, clk, q1, q2 and processes flops, p_a, p_clk, with B/E markers in the sim round, sim cycle, and delta cycle rows]


Redraw with Normal Time Scale

[Figure: waveforms for a, clk, q on a real time scale from 0 ns to 35 ns in 5 ns steps]


External Inputs and Flops

Question: Do the signals b1 and b2 have the same behaviour from 20–30 ns?


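architecture mathilde of sauve is
  signal clk, a1, a2, b1, b2 : std_logic;
begin
  -- Clock: starts high, 20 ns period.
  process begin
    clk <= '1';
    wait for 10 ns;
    clk <= '0';
    wait for 10 ns;
  end process;

  -- a1 is driven by a timed process.
  process begin
    wait for 20 ns;
    a1 <= '1';
  end process;

  -- a2 is driven by a clocked process.
  process begin
    wait until rising_edge(clk);
    a2 <= '1';
  end process;

  -- Flops sampling a1 and a2.
  process begin
    wait until rising_edge( clk );
    b1 <= a1;
    b2 <= a2;
  end process;
end mathilde;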


Testbenches and Clock Phases

env : process begin
  a <= '1';
  clk <= '0';
  wait for 10 ns;
  a <= '0';
  clk <= '1';
  wait for 10 ns;
end process;

flop : process ( clk ) begin
  if rising_edge( clk ) then
    q1 <= a;
  end if;
end process;

[Figure: delta-cycle simulation worksheet starting at 0 ns for signals a, clk, q1 and processes flop2, flop1, env, with rows for sim round, sim cycle, and delta cycle]


Redraw with Normal Time Scale

[Figure: waveforms for a, clk, q1 on a real time scale from 0 ns to 20 ns]


Warning

Note: Testbench signals. For consistent results across different simulators, simulation scripts vs test benches, and timing simulation vs zero-delay simulation, do not change signals in your testbench or script at the same time as the clock changes.

[Figure: three sets of waveforms for a, clk, q1 (all initially 'U') from 0 ns to 60 ns. (1) a is the output of a clocked or combinational process. (2) a is the output of a timed process (testbench or environment): POOR DESIGN. (3) a is the output of a timed process (testbench or environment): GOOD DESIGN.]
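One way to follow this warning is to change testbench stimulus on the opposite clock edge from the one that samples it. The sketch below is only one possible arrangement: the entity name, the 20 ns clock period, and the choice of the falling edge are assumptions, not the lecture's prescribed method.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; all names and timings are illustrative.
entity tb_phase_demo is
end tb_phase_demo;

architecture sim of tb_phase_demo is
  signal clk : std_logic := '0';
  signal a   : std_logic := '0';
  signal q1  : std_logic;
begin
  -- 20 ns clock period: rising edges at 10 ns, 30 ns, 50 ns, ...
  -- Run the simulation for a bounded time, since the clock never stops.
  clk <= not clk after 10 ns;

  -- Stimulus changes on falling edges, half a period away from the
  -- rising edge on which the flop samples it.
  env : process
  begin
    wait until falling_edge(clk);
    a <= not a;
  end process;

  flop : process (clk)
  begin
    if rising_edge(clk) then
      q1 <= a;
    end if;
  end process;
end sim;
```

Because a is stable around every rising edge, the value captured in q1 does not depend on how the simulator orders the testbench and flop processes.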


1.7 Register-Transfer-Level Simulation

[Figure: side-by-side comparison of delta-cycle simulation and RTL simulation for signals a, b, c, d, e (all initially 'U') and processes proc1, proc2, proc3. The delta-cycle view shows process modes and signal values at 0 ns and 3 ns with their delta cycles, up to 102 ns; the RTL view shows the same signals on a real time scale of 0 ns, 1 ns, 2 ns, 3 ns, and 102 ns.]

1.7.1 Overview

• Much simpler than delta cycle

• Columns are real time: clock cycles, nanoseconds, etc.

• Can simulate both synthesizable and unsynthesizable code

• Cannot simulate combinational loops

• Same values as delta-cycle at end of simulation round

process begin

a <= ’0’;

wait for 10 ns;

a <= ’1’;

...

end process;

process begin

b <= ’0’;

wait for 10 ns;

b <= a;

...

end process;

Question: In this code, what value should b have at 10 ns?

1.7.2 Technique for Register-Transfer Level Simulation

1. Pre-processing

(a) Separate processes into combinational and non-combinational (clocked and timed)

(b) Decompose each combinational process into separate processes with one target signal per process

(c) Sort processes into topological order based on dependencies

2. For each clock cycle or unit of time:

(a) Run non-combinational processes in any order. Non-combinational assignments read from earlier clock cycle / time step, except that clocked processes read the current value of the clock signal.

(b) Run combinational processes in topological order. Combinational assignments read from current clock cycle / time step.


1.7.3 Examples of RTL Simulation

1.7.3.1 RTL Simulation Example 1

We revisit an earlier example from delta-cycle simulation, but change the code slightly and do register-transfer-level simulation.

proc1: process (a, b, c) begin
  d <= NOT c;
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

proc3: process begin
  a <= '1';
  b <= '0';
  wait for 3 ns;
  b <= '1';
  wait for 99 ns;
end process;

Decompose and sort comb procs

Decomposed:

proc1d: process (c) begin
  d <= NOT c;
end process;

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

Sorted:

proc1c: process (a, b) begin
  c <= a AND b;
end process;

proc1d: process (c) begin
  d <= NOT c;
end process;

proc2: process (b, d) begin
  e <= b AND d;
end process;

Waveforms

[Figure: blank waveform diagram to fill in for signals a, b, c, d and e, each starting at U; time axis 0ns, 1ns, 2ns, 3ns, 102ns.]
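A hand simulation of RTL Simulation Example 1 can be cross-checked in a simulator. The sketch below wraps the three processes of that example in a hypothetical testbench (the entity name and the `check` process are assumptions added for illustration) and asserts the settled values that follow from c = a AND b, d = NOT c and e = b AND d, evaluated in topological order.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench around RTL Simulation Example 1.
entity rtl_sim_example1_tb is
end entity rtl_sim_example1_tb;

architecture sim of rtl_sim_example1_tb is
  signal a, b, c, d, e : std_logic;
begin
  proc1 : process (a, b, c) begin
    d <= not c;
    c <= a and b;
  end process;

  proc2 : process (b, d) begin
    e <= b and d;
  end process;

  proc3 : process begin
    a <= '1';
    b <= '0';
    wait for 3 ns;
    b <= '1';
    wait for 99 ns;
  end process;

  -- Added for illustration: check settled values between input changes.
  check : process begin
    wait for 1 ns;   -- t = 1 ns: a = 1, b = 0
    assert a = '1' and b = '0' and c = '0' and d = '1' and e = '0'
      report "unexpected values at 1 ns" severity error;
    wait for 3 ns;   -- t = 4 ns: b has risen at 3 ns
    assert a = '1' and b = '1' and c = '1' and d = '0' and e = '0'
      report "unexpected values at 4 ns" severity error;
    wait;
  end process;
end architecture sim;
```

Because proc3 loops forever, the simulation should be run for a bounded time (for example 10 ns). The simulator itself works in delta cycles, but the values it reaches at the end of each simulation round are the ones an RTL simulation writes in each column.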

Example: Communicating State Machines


huey: process
begin
  clk <= '1';
  wait for 10 ns;
  clk <= '0';
  wait for 10 ns;
end process;

dewey: process
begin
  a <= to_unsigned(0,4);
  wait until re(clk);
  while (a < 4) loop
    a <= a + 1;
    wait until re(clk);
  end loop;
end process;

louie: process
begin
  wait until re(clk);
  d <= '1';
  if (a >= 2) then
    d <= '0';
    wait until re(clk);
  end if;
end process;

[Figure: blank waveform diagram to fill in for clk, a and d.]


1.8 VHDL and Hardware Building Blocks

1.8.1 Basic Building Blocks

Different classes of building blocks:

• Conditional

• Arithmetic

• Storage

Basic Building Blocks: Boolean

VHDL    Description
and     AND gate
or      OR gate
not     inverter
nand    NAND gate
nor     NOR gate
xor     exclusive-or gate
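As a minimal sketch of how the Boolean operators in the table map onto gates, the hypothetical entity below (its name and port names are assumptions for illustration) uses one concurrent assignment per operator, so each output would be expected to synthesize to the corresponding gate.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: one output per Boolean operator.
entity bool_gates is
  port (
    a, b   : in  std_logic;
    y_and  : out std_logic;
    y_or   : out std_logic;
    y_not  : out std_logic;
    y_nand : out std_logic;
    y_nor  : out std_logic;
    y_xor  : out std_logic
  );
end entity bool_gates;

architecture main of bool_gates is
begin
  y_and  <= a and b;    -- AND gate
  y_or   <= a or b;     -- OR gate
  y_not  <= not a;      -- inverter
  y_nand <= a nand b;   -- NAND gate
  y_nor  <= a nor b;    -- NOR gate
  y_xor  <= a xor b;    -- exclusive-or gate
end architecture main;
```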

Basic Building Blocks: Conditional

VHDL                                        Description
if-then-else, when-else, with-select, case  multiplexer
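The four conditional constructs can all describe the same 2:1 multiplexer. The sketch below is illustrative only: the entity name, the port names and the choice that sel = '1' selects d1 are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2:1 mux written four ways; all outputs should be equal.
entity mux2_styles is
  port (
    sel, d0, d1 : in  std_logic;
    y_when      : out std_logic;
    y_select    : out std_logic;
    y_if        : out std_logic;
    y_case      : out std_logic
  );
end entity mux2_styles;

architecture main of mux2_styles is
begin
  -- when-else (conditional assignment)
  y_when <= d1 when sel = '1' else d0;

  -- with-select (selected assignment)
  with sel select
    y_select <= d1 when '1',
                d0 when others;

  -- if-then-else inside a combinational process
  process (sel, d0, d1)
  begin
    if sel = '1' then
      y_if <= d1;
    else
      y_if <= d0;
    end if;
  end process;

  -- case inside a combinational process
  process (sel, d0, d1)
  begin
    case sel is
      when '1'    => y_case <= d1;
      when others => y_case <= d0;
    end case;
  end process;
end architecture main;
```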

Basic Building Blocks: Arithmetic

VHDL      Description
+         adder
-         subtracter
sla, sll  left shifter
sra, srl  right shifter
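A minimal sketch of the arithmetic building blocks, assuming 8-bit unsigned operands and a fixed shift distance of one bit (the entity name, port names, width and shift amount are all illustrative choices). It uses the shift_left and shift_right functions from numeric_std rather than shift operators, since those functions are defined for unsigned and signed in that package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one output per arithmetic building block.
entity arith_blocks is
  port (
    a, b    : in  unsigned(7 downto 0);
    sum     : out unsigned(7 downto 0);
    diff    : out unsigned(7 downto 0);
    shifted_left  : out unsigned(7 downto 0);
    shifted_right : out unsigned(7 downto 0)
  );
end entity arith_blocks;

architecture main of arith_blocks is
begin
  sum           <= a + b;               -- adder (result wraps at 8 bits)
  diff          <= a - b;               -- subtracter
  shifted_left  <= shift_left(a, 1);    -- left shifter, zero fill
  shifted_right <= shift_right(a, 1);   -- right shifter, zero fill for unsigned
end architecture main;
```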

Basic Building Blocks: Storage

VHDL              Description         Pins shown in schematic
clocked process   flip flop           D, Q, CE, S, R
memory component  single-port memory  WE, A, DI, DO
memory component  dual-port memory    WE, A0, DI0, DO0, A1, DO1


1.8.2 Deprecated Building Blocks for RTL

Some of the common gates you have encountered in previous courses should be avoided when synthesizing register-transfer-level hardware, particularly if FPGAs are the implementation technology.

Latches : Use flops, not latches

T, JK, SR, etc flip-flops : Limit yourself to D-type flip-flops

Tri-State Buffers : Use multiplexers, not tri-state buffers

Note: Unfortunately and surprisingly, PalmChip has been awarded a US patent for using uni-directional busses (i.e. multiplexers) for system-on-chip designs. The patent was filed in 2000, so all fourth-year design projects completed after that date will need to pay royalties to PalmChip.


What is This?

process (a)

begin

if rising_edge(a) then

c <= b;

end if;

end process;


1.8.3 Hardware and Code for Flops

1.8.3.1 Flops with Waits and Ifs

process (clk)

begin

if rising_edge(clk) then

q <= d;

end if;

end process;


VHDL Code for Flip-Flop: Wait-Style

process

begin

wait until rising_edge(clk);

q <= d;

end process;

1.8.3.2 Flops with Synchronous Reset

process (clk)
begin
  if rising_edge(clk) then
    if (reset = '1') then
      q <= '0';
    else
      q <= d;
    end if;
  end if;
end process;

Flop with Synchronous Reset: Wait-Style

process
begin
  wait until rising_edge(clk);
  if (reset = '1') then
    q <= '0';
  else
    q <= d0;
  end if;
end process;

Variation on a Floppy Theme

Question: Synchronous or asynchronous reset?

process (clk, reset)
begin
  if (reset = '1') then
    q <= '0';
  else
    if rising_edge(clk) then
      q <= d;
    end if;
  end if;
end process;

Variated Flop of a Theme

Question: Synchronous or asynchronous reset?

process
begin
  if (reset = '1') then
    q <= '0';
  else
    q <= d0;
  end if;
  wait until rising_edge(clk);
end process;

Flop with Chip-Enable

process (clk)
begin
  if rising_edge(clk) then
    if (ce = '1') then
      q <= d;
    end if;
  end if;
end process;

Wait-style flop with chip-enable included in course notes
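The course-notes version is not reproduced in these slides. The sketch below is one plausible wait-style equivalent, written by analogy with the wait-style flop with synchronous reset. The entity name and port list are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wait-style flop with chip-enable.
entity flop_ce_wait is
  port (
    clk, ce, d : in  std_logic;
    q          : out std_logic
  );
end entity flop_ce_wait;

architecture main of flop_ce_wait is
begin
  process
  begin
    wait until rising_edge(clk);
    if (ce = '1') then
      q <= d;        -- load only when enabled; otherwise q holds its value
    end if;
  end process;
end architecture main;
```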


Q: Flop with a Mux on the Input?

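[Figure: a multiplexer controlled by sel chooses between d0 and d1 and drives the D input of a flip-flop clocked by clk; the flip-flop output is q.]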
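One possible answer, as a sketch: place the if-then-else that forms the multiplexer inside the clocked process. The entity name and the choice that sel = '1' selects d1 are assumptions, since the slide does not say which input each select value picks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical flop with a 2:1 mux on its D input.
entity flop_mux_in is
  port (
    clk, sel, d0, d1 : in  std_logic;
    q                : out std_logic
  );
end entity flop_mux_in;

architecture main of flop_mux_in is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (sel = '1') then   -- mux is evaluated at the clock edge
        q <= d1;
      else
        q <= d0;
      end if;
    end if;
  end process;
end architecture main;
```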


Q: Flops with a Mux on the Output?

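[Figure: two flip-flops clocked by clk register d0 and d1 as q0 and q1; a multiplexer controlled by sel chooses between q0 and q1 to produce q.]

Question: For the circuits with mux-on-input and mux-on-output, does q have the same behaviour in both circuits?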
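A sketch of the mux-on-output circuit, assuming sel = '1' selects q1 (the entity name and that polarity are assumptions). The two flops are inferred by the clocked process and the multiplexer by a concurrent assignment outside it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pair of flops with a 2:1 mux on their outputs.
entity flops_mux_out is
  port (
    clk, sel, d0, d1 : in  std_logic;
    q                : out std_logic
  );
end entity flops_mux_out;

architecture main of flops_mux_out is
  signal q0, q1 : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q0 <= d0;
      q1 <= d1;
    end if;
  end process;

  -- Combinational mux: a change on sel reaches q without waiting for a clock edge.
  q <= q1 when sel = '1' else q0;
end architecture main;
```

In this version sel acts on q combinationally, whereas a mux placed before a single flop only has its sel value sampled at the rising edge of clk. That difference is a useful starting point for the question.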

1.8.3.3 Flop with Chip-Enable and Mux on Input

Hint: Chip Enable

process (clk)
begin
  if rising_edge(clk) then
    if (ce = '1') then
      q <= d;
    end if;
  end if;
end process;

1.8.3.4 Flops with Chip-Enable, Muxes, and Reset

This section reserved for your reading pleasure

1.8.4 An Example Sequential Circuit

This section reserved for your reading pleasure

1.9 Arrays and Vectors

This section reserved for your reading pleasure


1.10 Arithmetic

VHDL includes all of the common arithmetic and logical operators.

Use the VHDL arithmetic operators and let the synthesis tool choose the best implementation for you.

1.10.1 Arithmetic Packages

To do arithmetic with signals, use the numeric_std package. This package defines types signed and unsigned, which are std_logic vectors on which you can do signed or unsigned arithmetic.

numeric_std supersedes earlier arithmetic packages, such as std_logic_arith.

Use only one arithmetic package, otherwise the different definitions will clash and you can get strange error messages.
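A minimal sketch of the recommended setup: numeric_std is the only arithmetic package in the context clause, and the signals are declared as unsigned or signed so that + picks the matching arithmetic. The entity name, port names and 8-bit width are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- the only arithmetic package in use

-- Hypothetical entity showing unsigned and signed addition.
entity add_demo is
  port (
    ua, ub : in  unsigned(7 downto 0);
    sa, sb : in  signed(7 downto 0);
    usum   : out unsigned(7 downto 0);
    ssum   : out signed(7 downto 0)
  );
end entity add_demo;

architecture main of add_demo is
begin
  usum <= ua + ub;   -- unsigned addition
  ssum <= sa + sb;   -- two's-complement signed addition
end architecture main;
```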


1.10.2 Shift and Rotate Operations

This section reserved for your reading pleasure

1.10.3 Overloading of Arithmetic

This section reserved for your reading pleasure

1.10.4 Different Widths and Arithmetic

This section reserved for your reading pleasure

1.10.5 Overloading of Comparisons

This section reserved for your reading pleasure

Overloading of Comparison Operations (=, /=, >=, >, <)

src1/2    src2/1   Result
unsigned  integer  OK
signed    integer  OK
unsigned  signed   fails in analysis
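The rows of the table can be tried out with a small example. In the sketch below (entity and port names are assumptions), the first two comparisons mix a vector type with an integer and should analyze cleanly with numeric_std. The mixed unsigned/signed comparison is left as a comment because, as the table says, it fails in analysis.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity exercising the overloaded comparisons.
entity compare_demo is
  port (
    u      : in  unsigned(3 downto 0);
    s      : in  signed(3 downto 0);
    u_ge_2 : out std_logic;
    s_lt_0 : out std_logic
  );
end entity compare_demo;

architecture main of compare_demo is
begin
  u_ge_2 <= '1' when u >= 2 else '0';   -- unsigned vs integer: OK
  s_lt_0 <= '1' when s < 0  else '0';   -- signed vs integer: OK
  -- bad <= '1' when u > s else '0';    -- unsigned vs signed: fails in analysis
end architecture main;
```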

1.10.6 Different Widths and Comparisons

This section reserved for your reading pleasure


1.10.7 Type Conversion

The functions unsigned, signed, to_integer, to_unsigned and to_signed are used to convert between integers, std_logic vectors, signed vectors and unsigned vectors.

If you convert between two types of the same width, then no additional hardware will be generated.

The listing below summarizes the types of these functions.


Type Conversion

unsigned( val : std_logic_vector ) return unsigned;

signed( val : std_logic_vector ) return signed;

to_integer( val : signed ) return integer;

to_integer( val : unsigned ) return integer;

to_unsigned( val : integer; width : natural) return unsigned;

to_signed( val : integer; width : natural) return signed;

Note: More details in course notes
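A short round trip through the conversion functions listed above, as a sketch. The entity name, signal names and 8-bit width are assumptions. The final std_logic_vector(...) conversion is not in the listing, but it is the standard type conversion back from unsigned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: std_logic_vector -> unsigned -> integer -> unsigned -> std_logic_vector.
entity convert_demo is
  port (
    slv_in  : in  std_logic_vector(7 downto 0);
    slv_out : out std_logic_vector(7 downto 0)
  );
end entity convert_demo;

architecture main of convert_demo is
  signal u : unsigned(7 downto 0);
  signal n : integer range 0 to 255;
begin
  u       <= unsigned(slv_in);                      -- std_logic_vector to unsigned
  n       <= to_integer(u);                         -- unsigned to integer
  slv_out <= std_logic_vector(to_unsigned(n, 8));   -- integer to 8-bit unsigned, then back
end architecture main;
```

All three signals are eight bits wide, so these conversions are expected to be wiring only, in line with the remark that same-width conversions generate no additional hardware.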

1.11 Synthesizable vs Non-Synthesizable Code

Synthesis is done by matching VHDL code against templates or patterns.

It’s important to use idioms that your synthesis tools recognize.

Think like hardware: when you write VHDL, you should know what hardware you expect to be produced by the synthesizer.


1.11.1 Unsynthesizable Code

1.11.1.1 Initial Values

Initial values on signals (UNSYNTHESIZABLE)

signal bad_signal : std_logic := '0';

Reason : At powerup, the values on signals are random (except for some FPGAs).
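A common way to get a known starting value without relying on an initial value is to assign it under a reset, as in the synchronous-reset flop idiom. The sketch below assumes a synchronous, active-high reset; the entity and signal names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: the starting value comes from reset, not from ":=".
entity reset_instead_of_init is
  port (
    clk, reset, d : in  std_logic;
    q             : out std_logic
  );
end entity reset_instead_of_init;

architecture main of reset_instead_of_init is
  signal good_signal : std_logic;   -- no initial value in the declaration
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (reset = '1') then
        good_signal <= '0';         -- known value after reset
      else
        good_signal <= d;
      end if;
    end if;
  end process;

  q <= good_signal;
end architecture main;
```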


1.11.1.2 Wait For

Wait for length of time (UNSYNTHESIZABLE)

wait for 10 ns;

Reason : Delays through circuits are dependent upon both the circuit and its operating environment, particularly supply voltage and temperature. For example, imagine trying to build an AND gate that will have exactly a 2ns delay in all environments.


1.11.1.3 Different Wait Conditions

wait statements with different conditions in a process (UNSYNTHESIZABLE)

-- different clock signals

process

begin

wait until rising_edge(clk1);

x <= a;

wait until rising_edge(clk2);

x <= a;

end process;

Reason : Would require the flip flops to use different clock signals at different times.


Different Wait Conditions

-- different clock edges

process

begin

wait until rising_edge(clk);

x <= a;

wait until falling_edge(clk);

x <= a;

end process;

Reason : Would require flip-flop to be sensitive to different clock edges at different times.

1.11.1.4 Multiple “if rising_edge” in Process

Multiple if rising_edge statements in a process (UNSYNTHESIZABLE)

process (clk)

begin

if rising_edge(clk) then

q0 <= d0;

end if;

if rising_edge(clk) then

q1 <= d1;

end if;

end process;

Reason : The idioms for synthesis tools generally expect just a single if rising_edge statement in each process.

The simpler the VHDL code is, the easier it is to synthesize hardware. Programmers of synthesis tools make idiomatic (idiotic?) restrictions to make their jobs simpler.

1.11.1.5 “if rising_edge” and “wait” in Same Process

An if rising_edge statement and a wait statement in the same process (UNSYNTHESIZABLE)

process (clk)

begin

if rising_edge(clk) then

q0 <= d0;

end if;

wait until rising_edge(clk);

q0 <= d1;

end process;

Reason : The idioms for synthesis tools generally expect just a single type of flop-generating statement in each process.

1.11.1.6 “if rising_edge” with “else” Clause

The if statement has a rising_edge condition and an else clause (UNSYNTHESIZABLE).

process (clk)

begin

if rising_edge(clk) then

q0 <= d0;

else

q0 <= d1;

end if;

end process;

Reason : Generally, an if-then-else statement synthesizes to a multiplexer.

1.11.1.7 “if rising_edge” Inside a “for” Loop

An if rising_edge statement in a for-loop (UNSYNTHESIZABLE-Synopsys)

process (clk) begin

for i in 0 to 7 loop

if rising_edge(clk) then

q(i) <= d;

end if;

end loop;

end process;

Reason : just an idiom of the synthesis tool.

Some loop statements are synthesizable (Rushton Section 8.7). For-loops in general are described in Ashenden.

Synthesizable Alternative

A synthesizable alternative to an if rising_edge statement in a for-loop is to put the if rising_edge outside of the for loop.

process (clk) begin

if rising_edge(clk) then

for i in 0 to 7 loop

q(i) <= d;

end loop;

end if;

end process;

1.11.1.8 “wait” Inside of a “for loop”

wait statements in a for loop (UNSYNTHESIZABLE)

process

begin

for i in 0 to 7 loop

wait until rising_edge(clk);

x <= to_unsigned(i,4);

end loop;

end process;

Reason : Unknown. While-loops with the same behaviour are synthesizable.

Note: Combinational for-loops. Combinational for-loops are usually synthesizable. They are often used to build a combinational circuit for each element of an array.

Note: Clocked for-loops. Clocked for-loops are not synthesizable, but are very useful in simulation, particularly to generate test vectors for test benches.

Synthesizable Alternative to Wait-Inside-For

while loop (synthesizable)

This is the synthesizable alternative to the wait statement in a for loop above.

process

begin

-- output values from 0 to 4 on i

-- sending one value out each clock cycle

i <= to_unsigned(0,4);

wait until rising_edge(clk);

while (4 > i) loop

i <= i + 1;

wait until rising_edge(clk);

end loop;

end process;

1.12 Synthesizable VHDL Coding Guidelines

This section reserved for your reading pleasure

Chapter 2

RTL Design with VHDL: From Requirements to Optimized Code

2.1 Prelude to Chapter

This section reserved for your reading pleasure

2.2 FPGA Background and Coding Guidelines

2.2.1 Generic FPGA Hardware

2.2.1.1 Generic FPGA Cell

“Cell” = “Logic Element” (LE) in Altera
       = “Configurable Logic Block” (CLB) in Xilinx

[Figure: generic FPGA cell containing a combinational block (comb) and a flip-flop (D, Q, CE, S, R); signals data_in, ctrl_in, carry_in, carry_out and data_out.]

Configurable Comb/Flop Connection

[Figure: generic FPGA cell with a combinational block (comb) and a flip-flop (D, Q, CE, S, R); signals comb_data_in, ctrl_in, carry_in, carry_out, comb_data_out, flop_data_in and flop_data_out. The same cell is shown in each of the three configurations below.]

Separate Comb and Flop

Connect Comb and Flop

Flopped and Unflopped Outputs


2.2.2 Area Estimation

To estimate the number of FPGA cells that will be required to implement a circuit, recall that an FPGA lookup-table can implement any function with up to four inputs and one output.

We will describe two methods to estimate the area (number of FPGA cells) required to implement a gate-level circuit:

1. Rough estimate based simply upon the number of flip-flops and primary inputs that are in the fanin of each flip-flop.

2. A more accurate estimate, based upon greedily including as many gates as possible into each FPGA cell.

Lower Bound on Area for Circuit with one Target

Source flops/inputs  Minimum cells
1                    1
2                    1
3                    1
4                    1
5                    2
6                    2
7                    2
8                    3
9                    3
10                   3
11                   4

For a single target signal, this technique gives a lower bound on the number of cells needed.

For multiple target signals, this technique might be an overestimate, because a single cell can drive several other cells.
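The table entries follow a simple pattern that can be written in closed form. The formula below is an observation fitted to the table rather than something stated in the slides: n denotes the number of source flops and inputs, and the reasoning assumes each cell takes at most four signals in and produces one signal out, so each cell reduces the number of live signals by at most three.

```latex
\[
  \text{cells}_{\min}(n) \;=\; \max\!\left(1,\ \left\lceil \frac{n-1}{3} \right\rceil\right)
\]
% Spot checks against the table:
%   n = 4  : max(1, ceil(3/3))  = 1
%   n = 5  : max(1, ceil(4/3))  = 2
%   n = 8  : max(1, ceil(7/3))  = 3
%   n = 11 : max(1, ceil(10/3)) = 4
```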


Question: How many cells are needed to implement a 4:1 mux?


3 Cells for 10:1 Function

Estimate Area for Circuit

For each flip-flop and output: traverse backward through the fanin gathering as much combinational circuitry as possible into the FPGA cell.

Stopping conditions:

• flip-flop

• more than four inputs. However, if a point in the fanin has more than four input signals, the circuit may collapse back to four or fewer signals further back in the fanin.

Question: Map the combinational circuits below onto generic FPGA cells.

[Figure: combinational circuit with inputs a, b, c, d and output z, followed by six blank generic FPGA cells (comb block plus flip-flop with D, Q, CE, S, R) to fill in.]


2.2.2.1 Interconnect for Generic FPGA

This section reserved for your reading pleasure

2.2.2.2 Clocks for Generic FPGAs

Characteristics of clock signals:

• High fanout (drive many gates)

• Long wires (destination gates scattered all over chip)

Characteristics of FPGAs:

• Very few gates that are large (strong) enough to support a high fanout.

• Very few wires that traverse entire chip and can be connected to every flip-flop.

2.2.2.3 Special Circuitry in FPGAs

Memory

For more than five years, FPGAs have had special circuits for RAM and ROM. In Altera FPGAs, these circuits are called ESBs (Embedded System Blocks). These special circuits are possible because many FPGAs are fabricated on the same processes as SRAM chips. So, the FPGAs simply contain small chunks of SRAM.

Microprocessors

A new feature to appear in FPGAs in 2001 and 2002 is hardwired microprocessors on the same chip as programmable hardware.

                       Hard                          Soft
Altera                 Arm 922T with 200 MIPs        Nios with ?? MIPs
Xilinx: Virtex-II Pro  Power PC 405 with 420 D-MIPs  Microblaze with 100 D-MIPs

The Xilinx Virtex-II Pro has 4 Power PCs and enough programmable hardware to implement the first-generation Intel Pentium microprocessor.

Arithmetic Circuitry

A new feature to appear in FPGAs in 2001 and 2002 is hardwired circuits for multipliers and adders.

Altera: Mercury        16×16 at 130MHz
Xilinx: Virtex-II Pro  18×18 at ???MHz

Using these resources can improve significantly both the area and performance of a design.

Input / Output

Recently, high-end FPGAs have started to include special circuits to increase the bandwidth of communication with the outside world.

        Product
Altera  True-LVDS (1 Gbps)
Xilinx  Rocket I/O (3 Gbps)

2.2.3 Generic-FPGA Coding Guidelines

Flip Flops Are Free

• Flip-flops are almost free in FPGAs

reason: In FPGAs, the area consumed by a design is usually determined by the amount of combinational circuitry, not by the number of flip-flops.

Use It or Lose

• Aim for using 80–90% of the cells on a chip.

reason: If you use more than 90% of the cells on a chip, then the place-and-route program might not be able to route the wires to connect the cells.

reason: If you use less than 80% of the cells, then probably:

there are optimizations that will increase performance and still allow the design to fit on the chip;

or you spent too much human effort on optimizing for low area;

or you could use a smaller (cheaper!) chip.

exception: In E&CE 327 (unlike in real life), the mark is based on the actual number of cells used.

Just One Clock

• Use just one clock signal

reason: If all flip-flops use the same clock, then the clock does not impose any constraints on where the place-and-route tool puts flip-flops and gates. If different flip-flops used different clocks, then flip-flops that are near each other would probably be required to use the same clock.

Just One Clock Edge

• Use only one edge of the clock signal

reason: There are two ways to use both rising and falling edges of a clock signal: have rising-edge and falling-edge flip flops, or have two different clock signals that are inverses of each other. Most FPGAs have only rising-edge flip flops. Thus, using both edges of a clock signal is equivalent to having two different clock signals, which is deprecated by the preceding guideline.


2.3 Design Flow

This section reserved for your reading pleasure

2.4 Algorithms and High-Level Models

This section reserved for your reading pleasure


2.5 Finite State Machines in VHDL

2.5.1 Introduction to State-Machine Design

2.5.1.1 Mealy vs Moore State Machines

Moore Machines

• Outputs are dependent upon only the state

• No combinational paths from inputs to outputs

[Figure: state diagram with states s0/0, s1/1, s2/0 and s3/0; the edges leaving s0 are labelled a and !a.]

Mealy Machines

• Outputs are dependent upon both the state and the inputs

• Combinational paths from inputs to outputs

[Figure: state diagram with states s0, s1, s2 and s3; edge labels a/1, !a/0, /0 and /0.]

2.5.1.2 Introduction to State Machines and VHDL

A state machine is generally written as a single clocked process, or as a pair of processes, where one is clocked and one is combinational.

Design Decisions

• Moore vs Mealy (Sections 2.5.2 and 2.5.3)

• Implicit vs Explicit (Section 2.5.1.3)

• State values in explicit state machines: Enumerated type vs constants (Section 2.5.5)

• State values for constants: encoding scheme (binary, gray, one-hot, ...) (Section 2.5.5)

VHDL Constructs for State Machines

The following VHDL control constructs are useful to steer the transition from state to state:

• if ... then ... else

• case

• for ... loop

• while ... loop

• loop

• next

• exit


2.5.1.3 Explicit vs Implicit State Machines

There are two styles of writing state machines in VHDL: explicit and implicit.

Explicit

• State signal appears explicitly in VHDL code

• At most one wait statement per process

• Two sub-categories of explicit state machines

Explicit-Current

– State signal represents current state

– Next-state computation done in a clocked process

Explicit-Current+Next

– Two state signals: current state and next state

– Next-state computation done in a combinational process

– Current-state <= next-state is registered assignment

Implicit

• Use multiple wait statements in a process to describe the state machine implicitly

Implicit State Machines

For the implicit style of writing state machines, the synthesis program adds an implicit register to hold the state signal and combinational circuitry to update the state signal. In Synopsys synthesis tools, the state signal defined by the synthesizer is named multiple_wait_state_reg.

In Mentor Graphics, the state signal is named STATE_VAR.

We can think of the VHDL code for implicit state machines as having zero state signals, explicit-current state machines as having one state signal (state), and explicit-current+next state machines as having two state signals (state and state_next).

State Machine Tradeoffs

Explicit-Current+Next

• Most detailed, closest to hardware

• Greatest opportunity for manual optimization

• Most labour-intensive

• Susceptible to small, subtle, hard-to-find bugs

Explicit-Current

• Almost as much manual optimization as Explicit-Current+Next

• Easier to write than Explicit-Current+Next

• Less susceptible to subtle bugs

Implicit

• Taught infrequently

• Least detailed, furthest from actual hardware

• Rely on synthesis for optimization

• Usually least labour to write, shortest code

• Easiest to write correctly (But must understand VHDL synthesis! )

Limitation of Implicit State Machines

Because implicit state machines are written with loops, if-then-elses, cases, etc. it is difficult to write some state machines with complicated control flows in an implicit style. The following example illustrates the point.

[Figure: state diagram with states s0/0, s1/1, s2/0 and s3/0; edge labels a, !a, !a and a.]

Terminology

Note: The terminology of “explicit” and “implicit” is somewhat standard, in that some descriptions of processes with multiple wait statements describe the processes as having “implicit state machines”.

There is no standard terminology to distinguish between the two explicit styles: explicit-current+next and explicit-current.

2.5.2 Implementing a Simple Moore Machine

[Figure: state diagram with states s0/0, s1/1, s2/0 and s3/0; the edges leaving s0 are labelled a and !a.]

entity simple is

port (

a, clk : in std_logic;

z : out std_logic

);

end simple;


2.5.2.1 Implicit Moore State Machine

architecture moore_implicit_v1a of simple is
begin
  process
  begin
    z <= '0';
    wait until rising_edge(clk);
    if (a = '1') then
      z <= '1';
    else
      z <= '0';
    end if;
    wait until rising_edge(clk);
    z <= '0';
    wait until rising_edge(clk);
  end process;
end moore_implicit_v1a;

Flops:
Gates:
Delay:

Implicit Moore State Machine

[Figure: partially drawn state diagram showing state s2/0 and an edge labelled !a.]

2.5.2.2 Explicit Moore with Flopped Output

architecture moore_explicit_v1 of simple is
  type state_ty is (s0, s1, s2, s3);
  signal state : state_ty;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when s0 =>
          if (a = '1') then
            state <= s1;
            z <= '1';
          else
            state <= s2;
            z <= '0';
          end if;
        when s1 | s2 =>
          state <= s3;
          z <= '0';
        when s3 =>
          state <= s0;
          z <= '1';
      end case;
    end if;
  end process;
end moore_explicit_v1;

Flops:
Gates:
Delay:


Explicit Moore with Flopped Outputs

2.5.2.3 Explicit Moore with Combinational Outputs

architecture moore_explicit_v2 of simple is
  type state_ty is (s0, s1, s2, s3);
  signal state : state_ty;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when s0 =>
          if (a = '1') then
            state <= s1;
          else
            state <= s2;
          end if;
        when s1 | s2 =>
          state <= s3;
        when s3 =>
          state <= s0;
      end case;
    end if;
  end process;
  z <= '1' when (state = s1)
       else '0';
end moore_explicit_v2;

Flops:
Gates:
Delay:


Explicit Moore with Combinational Outputs

2.5.2.4 Explicit-Current+Next Moore with Concurrent Assignment

architecture moore_explicit_v3 of simple is
  type state_ty is (s0, s1, s2, s3);
  signal state, state_nxt : state_ty;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      state <= state_nxt;
    end if;
  end process;
  state_nxt <= s1 when (state = s0) and (a = '1')
          else s2 when (state = s0) and (a = '0')
          else s3 when (state = s1) or (state = s2)
          else s0;
  z <= '1' when (state = s1)
       else '0';
end moore_explicit_v3;

Flops:
Gates:
Delay:

Explicit-Current+Next Moore with Concurrent Assignment

The hardware synthesized from this architecture is the same as that synthesized from moore_explicit_v2, which is written in the explicit-current style.

2.5.2.5 E-C+N Moore with Comb Proc

architecture moore_explicit_v4 of simple is
  type state_ty is (s0, s1, s2, s3);
  signal state, state_nxt : state_ty;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      state <= state_nxt;
    end if;
  end process;
  process (state, a)
  begin
    case state is
      when s0 =>
        if (a = '1') then
          state_nxt <= s1;
        else
          state_nxt <= s2;
        end if;
      when s1 | s2 =>
        state_nxt <= s3;
      when s3 =>
        state_nxt <= s0;
    end case;
  end process;
  z <= '1' when (state = s1)
       else '0';
end moore_explicit_v4;

Change the concurrent assignment to state_nxt into a combinational process using a case statement.

Flops:
Gates:
Delay:

Same hardware as moore_explicit_v2 and v3.

Explicit-Current+Next Moore with Combinational Process

2.5.3 Implementing a Simple Mealy Machine

Mealy machines have a combinational path from inputs to outputs, which often violates good coding guidelines for hardware. Thus, Moore machines are much more common. You should know how to write a Mealy machine if needed, but most of the state machines that you design will be Moore machines.

This section reserved for your reading pleasure


2.5.4 Reset

All circuits should have a reset signal that puts the circuit back into a good initial state. However, not all flip flops within the circuit need to be reset. In a circuit that has a datapath and a state machine, the state machine will probably need to be reset, but the datapath may not need to be reset.

There are standard ways to add a reset signal to both explicit and implicit state machines.

It is important that reset is tested on every clock cycle, otherwise a reset might not be noticed, or your circuit will be slow to react to reset and could generate illegal outputs after reset is asserted.

Reset with Implicit State Machine

• Insert a loop

• Test for reset after each wait

Example from section 2.5.2.1:

architecture moore_implicit of simple is
begin
  process
  begin
    init : loop                        -- outermost loop
      z <= '0';
      wait until rising_edge(clk);
      next init when (reset = '1');    -- test for reset
      if (a = '1') then
        z <= '1';
      else
        z <= '0';
      end if;
      wait until rising_edge(clk);
      next init when (reset = '1');    -- test for reset
      z <= '0';
      wait until rising_edge(clk);
      next init when (reset = '1');    -- test for reset
    end loop init;
  end process;
end moore_implicit;

Reset with Explicit State Machine

Reset is often easier to include in an explicit state machine, because we need only put a test for reset = '1' in the clocked process for the state.

The pattern for an explicit-current style of machine is:

process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      state <= S0;
    else
      if ... then
        state <= ...;
      elsif ... then
        ... -- more tests and assignments to state
      end if;
    end if;
  end if;
end process;

Applying this pattern to the explicit Moore machine from section 2.5.2.3 produces:

architecture moore_explicit_v2 of simple is
  type state_ty is (s0, s1, s2, s3);
  signal state : state_ty;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (reset = '1') then
        state <= s0;
      else
        case state is
          ...
        end case;
      end if;
    end if;
  end process;
  z <= '1' when (state = s1)
       else '0';
end moore_explicit_v2;

Reset with Explicit-Current+Next

The pattern for an explicit-current+next style is:

process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      state_cur <= S0;  -- the reset state
    else
      state_cur <= state_nxt;
    end if;
  end if;
end process;
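As a sketch of how the pattern fits into a complete design, the example below applies it to the current+next Moore machine of section 2.5.2.5. The entity name simple_rst and its reset port are assumptions introduced here, since the slides do not show the entity declaration for simple.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: "simple" extended with a reset port (an assumption).
entity simple_rst is
  port (
    clk, reset, a : in  std_logic;
    z             : out std_logic
  );
end simple_rst;

architecture moore_explicit_v4_rst of simple_rst is
  type state_ty is (s0, s1, s2, s3);
  signal state_cur, state_nxt : state_ty;
begin
  -- clocked process: reset is tested on every rising edge
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state_cur <= s0;
      else
        state_cur <= state_nxt;
      end if;
    end if;
  end process;

  -- combinational next-state logic, untouched by the reset
  process (state_cur, a)
  begin
    case state_cur is
      when s0 =>
        if a = '1' then
          state_nxt <= s1;
        else
          state_nxt <= s2;
        end if;
      when s1 | s2 =>
        state_nxt <= s3;
      when s3 =>
        state_nxt <= s0;
    end case;
  end process;

  z <= '1' when (state_cur = s1) else '0';
end moore_explicit_v4_rst;
```

Only the clocked process changes; keeping the reset out of the combinational process means the next-state logic is identical to the version without reset.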

2.5.5 State Encoding

This section reserved for your reading pleasure

2.6 Dataflow Diagrams

2.6.1 Dataflow Diagrams Overview

• Dataflow diagrams are data-dependency graphs where the computation is divided into clock cycles.

• Purpose:

– Provide a disciplined approach for designing datapath-centric circuits

– Guide the design from algorithm, through high-level models, and finally to register transfer level code for the datapath and control circuitry.

– Estimate area and performance

– Make tradeoffs between different design options

• Background

– Based on techniques from high-level synthesis tools

– Some similarity between high-level synthesis and software compilation

– Each dataflow diagram corresponds to a basic block in software compiler terminology.

Data-Dependency Graph

[Figure: data-dependency graph for z = a + b + c + d + e + f, with inputs a to f, five adders, intermediate signals x1 to x4, and output z]

Dataflow Diagrams

[Figure: dataflow diagram for z = a + b + c + d + e + f]

Clock Cycle Boundaries

[Figure: the same dataflow diagram. Horizontal lines mark clock cycle boundaries]

Latency

[Figure: dataflow diagram with clock cycles numbered 1 to 6. Horizontal lines mark clock cycle boundaries]

Latency = 6 clock cycles

Latency

[Figure: dataflow diagram with clock cycles numbered 1 to 4. Horizontal lines mark clock cycle boundaries]

Latency = 4 clock cycles

Question: Why would a good hardware engineer find this design dissatisfying?

Flip Flops

[Figure: the dataflow diagram for z = a + b + c + d + e + f, annotated step by step on the following slides]

• Horizontal lines mark clock cycle boundaries

• Signals crossing clock boundaries are flip-flops

Registered Inputs and Outputs

• Flops on both inputs and outputs

Registered Inputs, Combinational Outputs

• Flops on inputs, but not outputs (Latency = 5)

Datapath Components

• Blocks in clock cycles are datapath components

Inputs

• Unconnected signal tails are inputs

Outputs

• Unconnected signal heads are outputs

Summary

• Horizontal lines mark clock cycle boundaries

• Unconnected signal tails are inputs

• Signals crossing clock boundaries are flip-flops

• Blocks in clock cycles are datapath components

• Unconnected signal heads are outputs

2.6.2 Dataflow Diagrams, Hardware, and Behaviour

Primary Input

[Figure: three panels (Dataflow Diagram, Hardware, Behaviour) for input i and signal x; behaviour waveforms for clk, i, x]

Register Input

[Figure: three panels (Dataflow Diagram, Hardware, Behaviour) for input i and signal x; behaviour waveforms for clk, i, x]

Register Signal

[Figure: three panels (Dataflow Diagram, Hardware, Behaviour) for x produced by an adder from i1 and i2; behaviour waveforms for clk, i1, i2, x]

Combinational-Component Output

[Figure: three panels (Dataflow Diagram, Hardware, Behaviour) for x produced by an adder from i1 and i2; behaviour waveforms for clk, i1, i2, x]

2.6.3 Dataflow Diagram Execution

Execution with Registers on Both Inputs and Outputs

[Figure: animation stepping the dataflow diagram for z = a + b + c + d + e + f through clock cycles 0 to 6, with waveforms for clk, a, x1, x2, x3, x4, x5, and z]

Execution Without Output Registers

[Figure: the same dataflow diagram executed over clock cycles 0 to 5, with waveforms for clk, a, x1, x2, x3, x4, x5, and z]

2.6.4 Performance Estimation

Performance Equations

$$\text{Performance} \propto \frac{1}{\text{TimeExec}}$$

$$\text{TimeExec} = \text{Latency} \times \text{ClockPeriod}$$

Definition Latency: Number of clock cycles from inputs to outputs. A combinational circuit has a latency of zero. A single register has a latency of one. A chain of n registers has a latency of n.

Performance of Dataflow Diagrams

• Latency: count horizontal lines in diagram

• Min clock period (Max clock speed) limited by longest path in a clock cycle

2.6.5 Area Estimation

• Maximum number of blocks in a clock cycle is total number of that component that are needed

• Maximum number of signals that cross a cycle boundary is total number of registers that are needed

• Maximum number of unconnected signal tails in a clock cycle is total number of inputs that are needed

• Maximum number of unconnected signal heads in a clock cycle is total number of outputs that are needed

These estimates give lower bounds.

Other constraints might force you to use more components.

Implementation-technology factors, such as the relative size of registers, multiplexers, and datapath components, might force you to make tradeoffs that increase the number of datapath components to decrease the overall area of the circuit.

• With some FPGA chips, a 2:1 multiplexer has the same area as an adder.

• With some FPGA chips, a 2:1 multiplexer can be combined with an adder into one FPGA cell per bit.

• In FPGAs, registers are usually “free”, in that the area consumed by a circuit is limited by the amount of combinational logic, not the number of flip-flops.

2.6.6 Design Analysis

[Figure: dataflow diagram with inputs a to f, intermediate signals x1 to x4, and output z]

num inputs:
num outputs:
num registers:
num adders:
min clock period:
latency:

Design Analysis (Cont’d)

[Figure: dataflow diagram with inputs a to f, intermediate signals x1 to x5, and output z]

num inputs:
num outputs:
num registers:
num adders:
min clock period:
latency:

2.6.7 Area / Performance Tradeoffs

[Figure: two dataflow diagrams for z = a + b + c + d + e + f. One add per clock cycle: clock cycles 0 to 6. Two adds per clock cycle: clock cycles 0 to 4]

Note: In the “Two-add” design, half of the last clock cycle is wasted.

Two Adds per Clock Cycle

[Figure: dataflow diagram with two adds per clock cycle over clock cycles 0 to 4, with waveforms for clk, a, x1, x2, x3, x4, x5, and z]

Design Comparison

[Figure: the one-add-per-clock-cycle and two-adds-per-clock-cycle dataflow diagrams side by side]

                One add per clock cycle   Two adds per clock cycle
inputs          6                         6
outputs         1                         1
registers       6                         6
adders          1                         2
clock period    flop + 1 add              flop + 2 add
latency         6                         4

Question: Under what circumstances would each design option be fastest?
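One way to approach a comparison like this is to plug the table into TimeExec = Latency × ClockPeriod. The sketch below assumes that each design runs at its minimum clock period and that the period is exactly the flop overhead plus the adder delays listed in the table. The symbols t_flop and t_add are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
T_{\text{one}} &= 6\,(t_{\text{flop}} + t_{\text{add}}) \\
T_{\text{two}} &= 4\,(t_{\text{flop}} + 2\,t_{\text{add}}) \\
T_{\text{two}} < T_{\text{one}}
  &\iff 4\,t_{\text{flop}} + 8\,t_{\text{add}} < 6\,t_{\text{flop}} + 6\,t_{\text{add}} \\
  &\iff t_{\text{add}} < t_{\text{flop}}
\end{align*}
```

Under these assumptions, the two-add design has the shorter execution time only when an adder is faster than the flop overhead. Other limits on the clock period, such as a system clock shared with slower logic, would change the comparison.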


2.7 Design Example: Massey

This section reserved for your reading pleasure

2.8 Design Example: Vanier

We’ll go through the following artifacts:

1. requirements

2. algorithm

3. dataflow diagram

4. high-level models

5. hardware block diagram

6. RTL code for datapath

7. state machine

8. RTL code for control

Design Process

1. Scheduling (allocate operations to clock cycles)

2. I/O allocation

3. First high-level model

4. Register allocation

5. Datapath allocation

6. Connect datapath components, insert muxes where needed

7. Design implicit state machine

8. Optimize

9. Design explicit-current state machine

10. Optimize

2.8.1 Requirements

• Functional requirements: compute the following formula:

output = (a × d) + c + (d × b) + b

• Performance requirement:

– Max clock period: flop plus (2 adds or 1 multiply)

– Max latency: 4

• Cost requirements

– Maximum of two adders

– Maximum of two multipliers

– Unlimited registers

– Maximum of three inputs and one output

– Maximum of 5000 student-minutes of design effort

• Registered inputs and outputs

2.8.2 Algorithm

output = (a × d) + c + (d × b) + b

Create a data-dependency graph for the algorithm.

[Figure: data-dependency graph with inputs a, d, b, c and output z]

2.8.3 Initial Dataflow Diagram

Schedule operations into clock cycles.

[Figure: initial dataflow diagram with inputs a, d, b, c and output z]

2.8.4 Reschedule to Meet Requirements

[Figure: two dataflow diagrams with inputs a, d, b, c and output z]

Fix Clock Period Violation

[Figure: two dataflow diagrams with inputs a, d, b, c and output z]

2.8.5 Optimize Resources

[Figure: two dataflow diagrams with inputs a, d, b, c and output z]

Analysis

[Figure: dataflow diagram with inputs a, d, b, c and output z]

Question: Should we move the second addition from the third clock cycle to the second?

Define Entity

Having finalized our input/output scheduling, we can write our entity. Note: we will add a reset signal later, when we design the state machine to control the datapath.

entity vanier is
  port (
    clk      : in  std_logic;
    i_1, i_2 : in  std_logic_vector(15 downto 0);
    o_1      : out std_logic_vector(15 downto 0)
  );
end vanier;

2.8.6 Assign Names to Registered Values

[Figure: dataflow diagram with inputs a, d, b, c and output z]

Question: Why do we not need to assign names to combinational signals?

Question: Why do we not need to assign a new name to x1, x2, and x4 the second time they cross a clock cycle boundary?

2.8.7 Input/Output Allocation

[Figure: dataflow diagram with registered values named x1 to x8]

VHDL Code!

architecture hlm_v1 of vanier is
  signal x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8 : unsigned(15 downto 0);
begin
  process begin
    wait until rising_edge(clk);
    x_1 <= unsigned(i_1);
    x_2 <= unsigned(i_2);
    wait until rising_edge(clk);
    x_3 <= unsigned(i_1);
    x_4 <= x_1(7 downto 0) * x_2(7 downto 0);
    x_5 <= unsigned(i_2);
    wait until rising_edge(clk);
    x_6 <= x_3(7 downto 0) * x_1(7 downto 0);
    x_7 <= x_2 + x_5;
    wait until rising_edge(clk);
    x_8 <= x_6 + (x_4 + x_7);
  end process;
  o_1 <= std_logic_vector(x_8);
end hlm_v1;

[Figure: dataflow diagram with inputs i1 and i2, output o1, registered values x1 to x8, and clock cycles 0 to 4, alongside timing charts for x1 to x8, for registers r1 to r5, and for inputs i1 and i2 over clock cycles 0 to 5]

2.8.8 Tangent: Combinational Outputs

architecture hlm_v1c of vanier is
  signal x_1, x_2, x_3, x_4, x_5, x_6, x_7 : unsigned(15 downto 0);
begin
  process begin
    wait until rising_edge(clk);
    x_1 <= unsigned(i_1);
    x_2 <= unsigned(i_2);
    wait until rising_edge(clk);
    x_3 <= unsigned(i_1);
    x_4 <= x_1(7 downto 0) * x_2(7 downto 0);
    x_5 <= unsigned(i_2);
    wait until rising_edge(clk);
    x_6 <= x_3(7 downto 0) * x_1(7 downto 0);
    x_7 <= x_2 + x_5;
  end process;
  o_1 <= std_logic_vector(x_6 + (x_4 + x_7));
end hlm_v1c;

[Figure: dataflow diagram with inputs i1 and i2, registered values x1 to x7, and combinational output o1]

2.8.9 Register Allocation

[Figure: dataflow diagram with inputs i1 and i2, output o1, and registered values x1 to x7]

New VHDL Code!

[Figure: dataflow diagram with registered values allocated to registers: x1 in r1, x2 in r2, x3 in r3, x4 in r4, x5 in r5, x6 in r2, x7 in r5, x8 in r5]

architecture hlm_v2 of vanier is
  signal r_1, r_2, r_3, r_4, r_5 : unsigned(15 downto 0);
begin
  process begin
    wait until rising_edge(clk);
    r_1 <= unsigned(i_1);
    r_2 <= unsigned(i_2);
    wait until rising_edge(clk);
    r_3 <= unsigned(i_1);
    r_4 <= r_1(7 downto 0) * r_2(7 downto 0);
    r_5 <= unsigned(i_2);
    wait until rising_edge(clk);
    r_2 <= r_3(7 downto 0) * r_1(7 downto 0);
    r_5 <= r_2 + r_5;
    wait until rising_edge(clk);
    r_5 <= r_2 + (r_4 + r_5);
  end process;
  o_1 <= std_logic_vector(r_5);
end hlm_v2;

2.8.10 Datapath Allocation

[Figure: dataflow diagram with inputs i1 and i2, output o1, registered values x1 to x8, and registers r1 to r5]

2.8.11 Hardware Block Diagram and State Machine

1. Calculate number of states that are needed

2. Control signals for registers

• Chip enable

• Mux select on input

3. Control signals for datapath components

• Instruction (e.g. add/sub for ALU)

• Mux select on inputs

For our example: Use four states: S0..S3, one for each clock cycle.

2.8.11.1 Control for Registers

Build a table with one row per state, one column per register.

[Figure: dataflow diagram annotated with registers r1 to r5, datapath components m1, a1, and a2, and states S0 to S3]

      r1       r2       r3       r4       r5
      ce  d    ce  d    ce  d    ce  d    ce  d
S0
S1
S2
S3

Optimize chip enables and muxes

      r1       r2       r3       r4       r5
      ce  d    ce  d    ce  d    ce  d    ce  d
S0    1   i1   1   i2   –   –    –   –    –   –
S1    0   –    0   –    1   i1   1   m1   1   i2
S2    –   –    1   m1   –   –    0   –    1   a1
S3    –   –    –   –    –   –    –   –    1   a1

• Chip enable: a register holds a value for multiple clock cycles.

• Mux: a register loads values from multiple sources.
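The two control mechanisms in these bullets can be sketched as one small component. The entity reg_ce_mux, its port names, and the 16-bit width are assumptions chosen for illustration; they are not part of the vanier design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical component; names and widths are assumptions.
entity reg_ce_mux is
  port (
    clk    : in  std_logic;
    ce     : in  std_logic;                      -- chip enable
    sel    : in  std_logic;                      -- mux select
    d0, d1 : in  std_logic_vector(15 downto 0);  -- two data sources
    q      : out std_logic_vector(15 downto 0)
  );
end reg_ce_mux;

architecture main of reg_ce_mux is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then          -- when ce = '0' the register holds its value
        if sel = '0' then       -- the mux chooses which source is loaded
          q <= d0;
        else
          q <= d1;
        end if;
      end if;
    end if;
  end process;
end main;
```

A register that never has to hold its value can drop the ce test, and a register with a single source can drop the mux, which is the simplification applied to the table.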

Optimized Chip Enables and Muxes

      r1=i1   r2        r3=i1   r4=m1   r5
      ce      ce  d             ce      d
S0    1       1   i2            –       –
S1    0       0   –             1       i2
S2    –       1   m1            0       a1
S3    –       –   –             –       a1

2.8.11.2 Control for Datapath Components

• Table for datapath components.

• One row per state.

• One column per datapath component.

• Sub-columns for sources and instructions (e.g. add/sub for ALU).

[Figure: dataflow diagram annotated with registers r1 to r5, datapath components m1, a1, and a2, and states S0 to S3]

      a1            a2            m1
      src1  src2    src1  src2    src1  src2
S0    –     –       –     –       –     –
S1    –     –       –     –       r1    r2
S2    r2    r5      –     –       r3    r1
S3    r2    a2      r4    r5      –     –

Optimize Datapath Control Table

      a1            a2            m1
      src1  src2    src1  src2    src1  src2
S0    –     –       –     –       –     –
S1    –     –       –     –       r1    r2
S2    r2    r5      –     –       r1    r3
S3    r2    a2      r4    r5      –     –

2.8.11.3 Control for State

We need to control the transition from one state to the next. For this example, the transition is very simple: each state transitions to its successor: S0 → S1 → S2 → S3 → S0 ...

2.8.11.4 Complete State Machine Table

      r1 ce   r2 ce   r2 sel   r4 ce   r5 sel   a1 src2 sel   m1 src2 sel   state
S0    1       1       i2       –       –        –             –             S1
S1    0       0       –        1       i2       –             r2            S2
S2    –       1       m1       0       a1       r5            r3            S3
S3    –       –       –        –       a1       a2            –             S0

Question: What values should we use for don’t cares?

“Don’t Cares” Instantiations

      r1 ce   r2 ce   r2 sel   r4 ce   r5 sel   a1 src2 sel   m1 src2 sel   state
S0    1       1       i2       0       a1       a2            r3            S1
S1    0       0       m1       1       i2       a2            r2            S2
S2    1       1       m1       0       a1       r5            r3            S3
S3    1       1       m1       0       a1       a2            r3            S0

2.8.12 VHDL Code with Explicit State Machine

We chose a one-hot encoding of the state, which usually results in small and fast hardware for state machines with sixteen or fewer states.

architecture explicit_v1 of vanier is
  -- "+" and "*" on std_logic_vector assume a suitable package is
  -- visible (for example ieee.numeric_std_unsigned from VHDL-2008)
  signal r_1, r_2, r_3, r_4, r_5 : std_logic_vector(15 downto 0);
  signal a_1, a_2, m_1           : std_logic_vector(15 downto 0);
  signal a1_src2, m1_src2        : std_logic_vector(15 downto 0);
  subtype state_ty is std_logic_vector(3 downto 0);
  constant s0 : state_ty := "0001";
  constant s1 : state_ty := "0010";
  constant s2 : state_ty := "0100";
  constant s3 : state_ty := "1000";
  signal state : state_ty;
begin
  ------------------------ r_1
  process (clk) begin
    if rising_edge(clk) then
      if state /= S1 then
        r_1 <= i_1;
      end if;
    end if;
  end process;
  ------------------------ r_2
  process (clk) begin
    if rising_edge(clk) then
      if state /= S1 then
        if state = S0 then
          r_2 <= i_2;
        else
          r_2 <= m_1;
        end if;
      end if;
    end if;
  end process;
  ------------------------ r_3
  process (clk) begin
    if rising_edge(clk) then
      r_3 <= i_1;
    end if;
  end process;
  ------------------------ r_4
  process (clk) begin
    if rising_edge(clk) then
      if state = S1 then
        r_4 <= m_1;
      end if;
    end if;
  end process;
  ------------------------ r_5
  process (clk) begin
    if rising_edge(clk) then
      if state = S1 then
        r_5 <= i_2;
      else
        r_5 <= a_1;
      end if;
    end if;
  end process;
  ------------------------ combinational datapath
  with state select
    a1_src2 <= r_5 when S2,
               a_2 when others;
  with state select
    m1_src2 <= r_2 when S1,
               r_3 when others;
  a_1 <= r_2 + a1_src2;  -- src1 of a1 is r2 in the datapath control table
  a_2 <= r_4 + r_5;
  m_1 <= r_1(7 downto 0) * m1_src2(7 downto 0);  -- 8x8 multiply, as in hlm_v2
  o_1 <= r_5;
  ------------------------ state machine
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= S0;
      else
        case state is
          when S0     => state <= S1;
          when S1     => state <= S2;
          when S2     => state <= S3;
          when S3     => state <= S0;
          when others => state <= S0;  -- covers the non-one-hot values
        end case;
      end if;
    end if;
  end process;
  ----------------------
end explicit_v1;

Hardware Block Diagram

[Figure: the annotated dataflow diagram (states S0 to S3) next to the hardware block diagram, with inputs i1 and i2, registers r1 to r5, multiplier m1, adders a1 and a2, and output o1]

2.8.13 Peephole Optimizations

-- r_1
process (clk) begin
  if rising_edge(clk) then
    if state /= S1 then
      r_1 <= i_1;
    end if;
  end if;
end process;

-- r_1 (optimized)
process (clk) begin
  if rising_edge(clk) then
    if state(1) = '0' then
      r_1 <= i_1;
    end if;
  end if;
end process;

Peephole Optimizations

-- r_2
process (clk) begin
  if rising_edge(clk) then
    if state /= S1 then
      if state = S0 then
        r_2 <= i_2;
      else
        r_2 <= m_1;
      end if;
    end if;
  end if;
end process;

-- r_2 (optimized)
process (clk) begin
  if rising_edge(clk) then
    if state(1) = '0' then
      if state(0) = '1' then
        r_2 <= i_2;
      else
        r_2 <= m_1;
      end if;
    end if;
  end if;
end process;

Peephole Optimizations

-- state machine
process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      state <= S0;
    else
      case state is
        when S0     => state <= S1;
        when S1     => state <= S2;
        when S2     => state <= S3;
        when S3     => state <= S0;
        when others => state <= S0;  -- covers the non-one-hot values
      end case;
    end if;
  end if;
end process;

-- state machine (optimized)
-- NOTE: "st" = "state"
process (clk) begin
  if rising_edge(clk) then
    if reset = '1' then
      st <= S0;
    else
      for i in 0 to 3 loop
        st( (i+1) mod 4 ) <= st( i );
      end loop;
    end if;
  end if;
end process;

2.8.14 Notes and Observations

Our functional requirements were written as:

output = (a × d) + (d × b) + b + c

Alternatively, we could have achieved exactly the same functionality with the functional requirements written as (the two statements are mathematically equivalent):

output = (a × d) + b + (d × b) + c

Data Dependency Graphs: Clean vs Ugly

The naive data dependency graph for the alternative formulation is much messier than the data dependency graph for the original formulation:

[Figure: two data-dependency graphs side by side, each with inputs a, b, c, d and output z. Original: (a × d) + (d × b) + b + c. Alternative: (a × d) + c + (d × b) + b]

2.9 Pipelining

Pipelining is an optimization that increases performance by overlapping the execution of multiple parcels (instructions). The cost is an increase in area, because we cannot reuse datapath components, registers, inputs, or outputs.

2.9.1 Introduction to Pipelining

Review of unpipelined dataflow diagram

[Figure: unpipelined dataflow diagram with inputs a to f and output z over clock cycles 0 to 5, reusing one adder (add1) and registers r1 and r2, with waveforms for clk, a, r1, and z showing parcel α over clock cycles 0 to 13]

Question: How soon can we start to execute β?

Pipelined dataflow diagram

• Each stage is treated as a separate dataflow diagram.

• Double line denotes boundary between stages.

[Figure: pipelined dataflow diagram with inputs a to f and output z, five stages (stage 1 to stage 5), adders add1 to add5, and registers r1 to r10, with waveforms for clk, a, r1 (stage1), r3 (stage2), r5 (stage3), r7 (stage4), r9 (stage5), and z showing parcel α over clock cycles 0 to 13]

Question: How soon can we start to execute β?

Sequential (Unpipelined) Hardware

[Figure: hardware with inputs i1 and i2, registers r1 and r2, one adder (add1), output o1, and state bits State(0) to State(4) with reset]

Pipelined Hardware

[Figure: five-stage pipeline (stage 1 to stage 5) with inputs i1 to i6, adders add1 to add5, registers r1 to r10, and output o1]

Pipelined VHDL Code

-- stage 1
process begin
  wait until rising_edge(clk);
  r1 <= i1; r2 <= i2;
end process;
-- stage 2
process begin
  wait until rising_edge(clk);
  r3 <= r1 + r2; r4 <= i3;
end process;
-- stage 3
process begin
  wait until rising_edge(clk);
  r5 <= r3 + r4; r6 <= i4;
end process;
-- stage 4
process begin
  wait until rising_edge(clk);
  r7 <= r5 + r6; r8 <= i5;
end process;
-- stage 5
process begin
  wait until rising_edge(clk);
  r9 <= r7 + r8; r10 <= i6;
end process;
-- output
o1 <= r9 + r10;

2.9.2 Partially Pipelined

• Fully pipelined: throughput is one parcel per clock cycle.

• Partially pipelined: throughput is less than one parcel per clock cycle.

• Superscalar: throughput is more than one parcel per clock cycle.

[Figure: partially pipelined dataflow diagram with inputs a to f and output z over clock cycles 0 to 5, three stages (stage 1 to stage 3), adders add1 to add3, and registers r1 to r6, with waveforms for clk, a, r1 (stage1), r3 (stage2), r5 (stage3), and z over clock cycles 0 to 13]

Question: How do we execute α followed by β?

Hardware for Partially Pipelined

[Figure: three-stage hardware (stage 1 to stage 3) with inputs i1 and i2, adders add1 to add3, registers r1 to r6, output o1, and state bits State(0) and State(1) with reset]

2.9.3 Terminology

Definition Depth: The depth of a pipeline is the number of stages on the longest path through the pipeline.

Definition Latency: The latency of a pipeline is measured the same as for an unpipelined circuit: the number of clock cycles from inputs to outputs.

Definition Throughput: The number of parcels consumed or produced per clock cycle.

Definition Upstream/downstream: Because parcels flow through the pipeline analogously to water in a stream, the terms upstream and downstream are used respectively to refer to earlier and later stages in the pipeline. For example, stage1 is upstream from stage2.
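The definitions of latency and throughput can be combined into a rough cycle count for a stream of parcels. The sketch below assumes a fully pipelined circuit (one parcel accepted per clock cycle), no bubbles, and the same latency L for the pipelined and unpipelined versions. The symbols N, L, C_unpipe, and C_pipe are introduced here for illustration.

```latex
\begin{align*}
C_{\text{unpipe}}(N) &= N \cdot L
  && \text{each parcel waits for the previous one to finish} \\
C_{\text{pipe}}(N)   &= L + (N - 1)
  && \text{first result after } L \text{ cycles, then one per cycle} \\
\lim_{N \to \infty} \frac{C_{\text{unpipe}}(N)}{C_{\text{pipe}}(N)} &= L
\end{align*}
```

For a single parcel (N = 1) both counts equal L, so under these assumptions pipelining improves throughput for long streams rather than the latency of one parcel.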

Definition Bubble: When a pipe stage is empty (contains invalid data), it is said to contain a “bubble”.

Question: How do we know whether the output of the pipeline is a bubble or is valid data?

2.10 Design Example: Pipelined Massey

Requirements

Functional requirements:

• Compute the sum of six 8-bit numbers: output = a + b + c + d + e + f

• Registered inputs, combinational outputs

Performance requirements:

• Maximum clock period: unlimited

• Maximum latency: four

Cost requirements:

• Maximum of five adders

• Small miscellaneous hardware (e.g. muxes) is unlimited

• Maximum of six inputs and one output

• Design effort is unlimited


Initial Dataflow Diagrams

[Figure: original dataflow diagram, summing inputs a to f with five adders to produce z.]

[Figure: final unpipelined dataflow diagram for the same sum.]

Dataflow Diagram Exploration

[Figure: variation on the original dataflow.]

[Figure: pipelined dataflow diagram, with i_valid entering alongside the data and o_valid leaving alongside z.]

VHDL Code

-- stage 1
process begin
  wait until rising_edge(clk);
  r1 <= i1; r2 <= i2; r3 <= i3; r4 <= i4; v1 <= i_valid;
end process;
a1 <= r1 + r2; a2 <= r3 + r4;
-- stage 2
process begin
  wait until rising_edge(clk);
  r5 <= a1; r6 <= a2; r7 <= i5; r8 <= i6; v2 <= v1;
end process;
a3 <= r5 + r6; a4 <= r7 + r8;
-- stage 3
process begin
  wait until rising_edge(clk);
  r9 <= a3; r10 <= a4; v3 <= v2;
end process;
a5 <= r9 + r10;
-- outputs
z <= a5;
o_valid <= v3;
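The fragment above omits the entity and the signal declarations. A minimal sketch of one possible surrounding design unit is shown below. The entity name pipe_sum, the use of numeric_std, and the 8-bit unsigned (wrap-around) widths are assumptions made for illustration; the slides do not state the output width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper for the pipelined six-input adder.
entity pipe_sum is
  port (
    clk                    : in  std_logic;
    i_valid                : in  std_logic;
    i1, i2, i3, i4, i5, i6 : in  unsigned(7 downto 0);
    z                      : out unsigned(7 downto 0);  -- assumed width: sum wraps at 8 bits
    o_valid                : out std_logic
  );
end entity pipe_sum;

architecture main of pipe_sum is
  signal r1, r2, r3, r4, r5, r6, r7, r8, r9, r10 : unsigned(7 downto 0);
  signal a1, a2, a3, a4, a5                      : unsigned(7 downto 0);
  signal v1, v2, v3                              : std_logic;
begin
  -- stage 1: register the first four inputs and the valid bit
  process begin
    wait until rising_edge(clk);
    r1 <= i1; r2 <= i2; r3 <= i3; r4 <= i4; v1 <= i_valid;
  end process;
  a1 <= r1 + r2; a2 <= r3 + r4;

  -- stage 2: i5 and i6 are sampled one clock cycle after i1..i4
  process begin
    wait until rising_edge(clk);
    r5 <= a1; r6 <= a2; r7 <= i5; r8 <= i6; v2 <= v1;
  end process;
  a3 <= r5 + r6; a4 <= r7 + r8;

  -- stage 3
  process begin
    wait until rising_edge(clk);
    r9 <= a3; r10 <= a4; v3 <= v2;
  end process;
  a5 <= r9 + r10;

  -- combinational outputs; o_valid marks parcels that are not bubbles
  z       <= a5;
  o_valid <= v3;
end architecture main;
```

Because r7 and r8 are loaded in the stage-2 process, the environment in this sketch has to present i5 and i6 one cycle after the i1..i4 of the same parcel.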

2.11 Memory Arrays and RTL Design

2.11.1 Memory Operations

Read of Memory with Registered Inputs

[Figure: hardware, a memory M with ports WE, A, DI, and DO connected to we, a, and do and clocked by clk; behaviour, a waveform of clk, a, we, M(αa), and do for a read of address αa that returns αd on do.]


Write to Memory with Registered Inputs

[Figure: hardware, a memory M with ports WE, A, DI, and DO connected to we, a, di, and do and clocked by clk; behaviour, a waveform of clk, a, we, di, M(αa), and do for a write of αd to address αa.]

Dual-Port Memory with Registered Inputs

[Figure: dual-port memory M with ports WE, A0, DI0, DO0, A1, and DO1 connected to we, a0, di0, do0, a1, and do1; waveform of a write of αd to address αa on port 0 together with a read of address βa on port 1 that returns βd on do1.]

Sequence of Memory Operations

[Figure: the same dual-port memory; waveform of a sequence of operations on addresses αa, βa, γa, and θa, showing a0, we, di0, do0, a1, do1, and the memory contents M(αa), M(βa), M(γa), and M(θa).]

2.11.2 Memory Arrays in VHDL

This section reserved for your reading pleasure

2.11.3 Data Dependencies

Definition of Three Types of Dependencies

Read after Write (true dependency):    M[i] := ...;   ... := M[i]
Write after Write (load dependency):   M[i] := ...;   M[i] := ...
Write after Read (anti dependency):    ... := M[i];   M[i] := ...

Instructions in a program can be reordered, so long as the data dependencies are preserved.

Purpose of Dependencies

[Figure: a producer W1 (R3 := ...) and its consumer R1 (... := ... R3 ...), with an earlier write W0 (R3 := ...) and a later write W2 (R3 := ...).]

• RAW ordering prevents R1 from happening before W1

• WAW ordering prevents W0 from happening after W1

• WAR ordering prevents W2 from happening before R1

Each of the three types of memory dependencies (RAW, WAW, and WAR) serves a specific purpose in ensuring that producer-consumer relationships are preserved.

Ordering of Memory Operations

Data Dependencies

Initial memory contents: M[3] = 30, M[2] = 20, M[1] = 10, M[0] = 0

Initial Program

M[2] := 21
M[3] := 31
A    := M[2]
B    := M[0]
M[3] := 32
M[0] := 01
C    := M[3]

Data Dependencies (Cont’d)

[Figure: the initial program beside a reordering of its seven statements, labelled “Valid Modification”.]

Data Dependencies (Cont’d)

[Figure: the initial program beside a second reordering of its seven statements, labelled “Valid (or Bad?) Modification”.]

2.11.4 Memory and Dataflow Diagrams

Legend for Dataflow Diagrams

[Figure: legend of dataflow-diagram symbols for an input port, output port, state signal, array read (name(rd)), and array write (name(wr)).]

Basic Memory Operations

Memory Read:  data := mem[addr];
Memory Write: mem[addr] := data;

[Figure: dataflow symbols for the two operations. mem(rd) takes mem and addr and produces data, passing mem onward along an anti-dependency edge; mem(wr) takes mem, data, and addr and produces the updated mem.]

Dataflow Diagrams and Data Dependencies

Read after Write Dependencies

Algo: mem[wr_addr] := data_in;
      data_out := mem[rd_addr];

[Figure: dataflow diagram in which mem(wr), fed by data_in and wr_addr, produces the mem value consumed by mem(rd), which is fed by rd_addr and produces data_out. Caption: Read after Write.]

Read after Write Optimization

Algo: mem[wr_addr] := data_in;
      data_out := mem[rd_addr];

[Figure: dataflow diagram of the same mem(wr) and mem(rd) operations. Caption: Optimization when rd_addr ≠ wr_addr.]

Write after Write Dependencies

Algo: mem[wr1_addr] := data1;
      mem[wr2_addr] := data2;

[Figure: dataflow diagram of two mem(wr) operations in sequence, the first fed by data1 and wr1_addr and the second by data2 and wr2_addr. Caption: Write after Write.]

Write after Write Scheduling Option

Algo: mem[wr1_addr] := data1;
      mem[wr2_addr] := data2;

[Figure: dataflow diagram of the two mem(wr) operations in sequence. Caption: Write after Write.]

[Figure: dataflow diagram of the same two mem(wr) operations. Caption: Scheduling option when wr1_addr ≠ wr2_addr.]

Write after Read Dependencies

Algo: rd_data := mem[rd_addr];
      mem[wr_addr] := wr_data;

[Figure: dataflow diagram in which mem(rd), fed by rd_addr, produces rd_data and is followed by mem(wr), fed by wr_data and wr_addr. Caption: Write after Read.]

Write after Read Optimization

Algo: rd_data := mem[rd_addr];
      mem[wr_addr] := wr_data;

[Figure: dataflow diagram of the same mem(rd) and mem(wr) operations. Caption: Optimization when rd_addr ≠ wr_addr.]

2.11.5 Ex: Mem Array and Dataflow Diagram

1. M[2] := 21
2. M[3] := 31
3. A    := M[2]
4. B    := M[0]
5. M[3] := 32
6. M[0] := 01
7. C    := M[3]

[Figure: the program beside its dataflow diagram. Each statement becomes an M(wr) or M(rd) node with its data and address inputs, numbered 1 to 7 to match the program, with the memory M threaded through the nodes.]

Dependencies for Known Addresses

[Figure: the dataflow diagram for the seven memory operations, annotated with the dependencies for known addresses.]

Anti-Dependencies for Known Addresses

[Figure: the dataflow diagram for the seven memory operations, annotated with the anti-dependencies for known addresses.]

Minimal Dependencies

[Figure: memory array with minimal dependencies.]

Memory Array with Orderings

[Figure: memory array with orderings.]

Place Operations in Clock Cycles

[Figure: the dataflow diagram with its memory operations placed in clock cycles.]

Final Dataflow Diagram

[Figure: final version of DFD.]

2.12 Input / Output Protocols

This section reserved for your reading pleasure


2.13 Example: Moving Average

In this section we will design a circuit that performs a moving average as it receives a stream of data. When each new data item is received, the output is the average of the four most recently received data.

Time     0  1  2  3  4  5  6  7  8  9  10
i_data   2  3  5  6  6  0  2  2  5  3  1
o_avg             4  5  4  3

2.13.1 Requirements and Environmental Assumptions

1. Input data is sent sporadically, with at least 2 clock cycles of bubbles (invalid data) between valid data.

2. When the input data is valid, the signal i_valid is asserted for exactly one clock cycle.

3. Input data will be 8-bit signed numbers.

4. When output data is ready, o_valid shall be asserted.

5. The output data (o_avg) shall be the average of the four most recently received input data. Output numbers shall be truncated to integer values.

2.13.2 Algorithm

Generic equation with input data x_i:

avg_i = (x_{i-3} + x_{i-2} + x_{i-1} + x_i) / 4

Decompose into sum and avg:

sum_i = x_{i-3} + x_{i-2} + x_{i-1} + x_i
avg_i = sum_i / 4

Look for patterns and potential optimizations:

sum_5 = x_2 + (x_3 + x_4 + x_5)
sum_6 = (x_3 + x_4 + x_5) + x_6
      = sum_5 - x_2 + x_6

Generalized recurrence equation:

sum_i = sum_{i-1} - x_{i-4} + x_i
avg_i = sum_i / 4
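As a check on the recurrence, here is a small worked instance using the first seven values of the i_data stream shown at the start of Section 2.13 (x_0 to x_6 = 2, 3, 5, 6, 6, 0, 2). Truncation to an integer follows requirement 5; for these non-negative sums it is the same as taking the floor.

```latex
\begin{align*}
\mathit{sum}_3 &= x_0 + x_1 + x_2 + x_3 = 2 + 3 + 5 + 6 = 16, & \mathit{avg}_3 &= \lfloor 16/4 \rfloor = 4 \\
\mathit{sum}_4 &= \mathit{sum}_3 - x_0 + x_4 = 16 - 2 + 6 = 20, & \mathit{avg}_4 &= \lfloor 20/4 \rfloor = 5 \\
\mathit{sum}_5 &= \mathit{sum}_4 - x_1 + x_5 = 20 - 3 + 0 = 17, & \mathit{avg}_5 &= \lfloor 17/4 \rfloor = 4 \\
\mathit{sum}_6 &= \mathit{sum}_5 - x_2 + x_6 = 17 - 5 + 2 = 14, & \mathit{avg}_6 &= \lfloor 14/4 \rfloor = 3
\end{align*}
```

The four averages (4, 5, 4, 3) match the o_avg values listed with the example stream, and each step after the first costs one subtraction and one addition instead of three additions.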

Summary of Behaviour

1. Define a signal new for the value of i_data each time that i_valid is '1'.

2. Define a memory array M to store a sliding window of the four most recent values of i_data.

3. Define a signal old for the oldest data value from the sliding window.

4. Update sum_i with sum_{i-1} - old_i + new_i


Sliding Window

Two design patterns to choose from: shift register vs circular buffer

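[Figure: a stream of values α, β, γ, δ, ε, ζ, η, ι, κ, λ entering a four-entry window. Shift register: each new value enters at M[0], the stored values shift toward M[3], and old is read from M[3]. Circular buffer: each new value overwrites the oldest entry of M[0..3] in place, and the other entries do not move.]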

For FIFO behaviour, a circular buffer is usually preferred: smaller and lower power.
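For contrast with the circular buffer, the shift-register pattern can be sketched as below. This is only an illustration: the entity name window_shift, the port o_old, and the type name window_t are hypothetical, and the 8-bit signed data width is taken from the requirements in Section 2.13.1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical four-entry sliding window built as a shift register.
entity window_shift is
  port (
    clk     : in  std_logic;
    i_valid : in  std_logic;
    i_data  : in  signed(7 downto 0);
    o_old   : out signed(7 downto 0)   -- oldest value in the window
  );
end entity window_shift;

architecture main of window_shift is
  type window_t is array (3 downto 0) of signed(7 downto 0);
  signal M : window_t;
begin
  process begin
    wait until rising_edge(clk);
    if i_valid = '1' then
      -- every entry moves on each new value: all four registers are written
      M(3 downto 1) <= M(2 downto 0);
      M(0)          <= i_data;
    end if;
  end process;

  -- the oldest value is always at the same position, so no read multiplexer is needed
  o_old <= M(3);
end architecture main;
```

In this sketch all four registers are written for every new value, whereas the circular buffer writes only one entry per new value and keeps track of the position with an index.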


Sliding Window with Registers

[Figure: register array with chip-enables and decoded multiplexer. Four 8-bit registers M[0] to M[3] share the data input d and have chip enables ce[0] to ce[3], derived from we and addr; the select lines idx[0] to idx[3] choose which register drives the 8-bit output q.]

2.13.3 Pseudocode and Dataflow Diagrams

First Pseudocode

Real 3-address pseudocode

new = i_data

old = M[idx]

tmp = sum - old

sum = tmp + new

M[idx] = new

idx = idx rol 1

o_avg = sum/4

[Figure: dataflow diagram for the first pseudocode, with sum, i_data, M, and idx as inputs: a read (Rd) of M producing old, a subtraction producing tmp, an addition with new, a write (Wr) to M, a rotate of idx by 1, and a wired shift producing o_avg.]

Remove intermediate signal old

new = i_data
tmp = sum - M[idx]
sum = tmp + new
M[idx] = new
idx = idx rol 1
o_avg = sum/4

Reading new from memory

tmp = sum - M[idx]
M[idx] = i_data
new = M[idx]
sum = tmp + new
idx = idx rol 1
o_avg = sum/4

Remove intermediate signal new

tmp = sum - M[idx]
M[idx] = i_data
sum = tmp + M[idx]
idx = idx rol 1
o_avg = sum/4

Data-dependency graph after removing new

[Figure: data-dependency graph with inputs sum, M, idx, and i_data: a read (Rd) of M, a subtraction producing tmp, a write (Wr) of i_data to M, a second read (Rd) of M, an addition producing sum, a rotate of idx by 1, and a wired shift producing o_avg.]

Dataflow Diagram

Latency of three clock cycles

[Figure: the dataflow diagram scheduled into states S0, S1, and S2.]

Latency of two clock cycles

[Figure: the dataflow diagram scheduled into states S0 and S1.]

Two clock cycles is potentially preferable for performance, but requires an additional multiplexer.

Latency of two clock cycles with registered address

[Figure: the dataflow diagram scheduled into states S0 and S1, with the address idx registered.]

Removes need for multiplexer on address input to circular buffer

Register and Datapath Allocation

[Figure: the two-cycle dataflow diagram with operations allocated to the datapath components as1 and rol, and values allocated to the registers sum and idx and the memory M.]

2.13.4 Control Tables and State Machine

[Figure: the two-cycle dataflow diagram with register and datapath allocation (as1, rol, sum, idx, M) in states S0 and S1.]

Register control table

      M                 idx         sum
      we   addr   d     ce   d      ce   d
S0    1    idx    x     0    –      1    as1
S1    0    idx    –     1    rol    1    as1

Datapath control table

      as1                   rol
      sub   src1   src2     src1   src2
S0    0     M      sum      –      –
S1    1     sum    M        idx    1

Optimized control table

      M     idx   as1
      we    ce    sub
S0    1     1     0
S1    0     0     1

Static assignments in control table

M.addr = idx
M.d = x
idx.d = rol
sum.d = as1
as1.src1 = sum
as1.src2 = M

Control Table and Bubbles

Almost final control table

       M     idx   sum   as1
       we    ce    ce    sub
S0     1     0     1     0
S1     0     1     1     1
idle   0     0     0     –

Final control table

       M     idx   sum   as1
       we    ce    ce    sub
S0     1     0     1     0
S1     0     1     1     1
idle   0     0     0     0

Static assignments

M.addr = idx
M.d = x
idx.d = rol
sum.d = as1
as1.src1 = sum
as1.src2 = M

State Machine

       i_valid   valid1
S0     1         0
S1     0         1
idle   0         0

Final control table with state encoding

       state               M     idx   sum   as1
       i_valid   valid1    we    ce    ce    sub
S0     1         0         1     0     1     0
S1     0         1         0     1     1     1
idle   0         0         0     0     0     0

M.we = i_valid

idx.ce = valid1

sum.ce = i_valid OR valid1

as1.sub = valid1

2.13.5 VHDL Code

-- valid bits
process begin
  wait until rising_edge(clk);
  valid1 <= i_valid;
  o_valid <= valid1;
end process;

-- idx
process begin
  wait until rising_edge(clk);
  if reset = '1' then
    idx <= "0001";
  else
    if valid1 = '1' then
      idx <= idx rol 1;
    end if;
  end if;
end process;

-- sliding window
process begin
  wait until rising_edge(clk);
  for i in 3 downto 0 loop
    if (i_valid = '1') and (idx(i) = '1') then
      M(i) <= i_data;
    end if;
  end loop;
end process;
mem_out <= M(0) when idx(0) = '1'
      else M(1) when idx(1) = '1'
      else M(2) when idx(2) = '1'
      else M(3);

-- add sub
add_sub <= sum - mem_out when valid1 = '1'
      else sum + mem_out;

-- sum
process begin
  wait until rising_edge(clk);
  if i_valid = '1' or valid1 = '1' then
    sum <= add_sub;
  end if;
end process;

Hardware

[Figure: hardware for the moving average, showing i_data, i_valid, and valid1, the memory M, the idx register, the add/sub unit, the sum register, chip enables (CE) on the registers, a wired shift from sum to o_avg, and o_valid.]

Chapter 3

Performance Analysis and Optimization

3.1 Introduction

Hennessy and Patterson’s Computer Architecture: A Quantitative Approach (textbook for E&CE 429) has good information on performance. We will use some of the same definitions and formulas as Hennessy and Patterson, but we will move away from generic definitions of performance for computer systems and focus on performance for digital circuits.

3.2 Defining Performance

Performance = Work / Time

You can double your performance by:

doing twice the work in the same amount of time

OR doing the same amount of work in half the time

Benchmarking

Performance = Work / Time

Measuring time is easy, but how do we accurately measure work?

The game of benchmarketing is finding a definition of work that makes your system appear to get the most work done in the least amount of time.

Measure of Work      Measure of Performance
clock cycle          MHz
instruction          MIPs
synthetic program    Whetstone, Dhrystone, D-MIPs (Dhrystone MIPs)
real program         SPEC
travel 1/4 mile      drag race

SPEC Benchmarks

The SPEC benchmarks are among the most respected and accurate predictions of real-world performance.

Definition SPEC: Standard Performance Evaluation Corporation. MISSION: “To establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems.” (http://www.spec.org)

The SPEC organization has different benchmarks for integer software, floating-point software, web-serving software, etc.

3.3 Comparing Performance

3.3.1 General Equations

Equation for “Big is n% greater than Small”:

n% = (Big - Small) / Small

Using the “n% greater” formula, the phrase “The performance of A is n% greater than the performance of B” is:

n% = (Performance_A - Performance_B) / Performance_B

Performance is inversely proportional to time:

Performance = 1 / Time

Substituting the above equation into the equation for “the performance of A is n% greater than the performance of B” gives:

n% = (Time_B - Time_A) / Time_A
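The substitution step can be written out explicitly. The derivation below uses only the two formulas already given (the “n% greater” formula and Performance = 1/Time); the last line is a small numeric check with made-up times chosen for illustration.

```latex
\begin{align*}
n\% &= \frac{\mathrm{Performance}_A - \mathrm{Performance}_B}{\mathrm{Performance}_B}
     = \frac{\dfrac{1}{\mathrm{Time}_A} - \dfrac{1}{\mathrm{Time}_B}}{\dfrac{1}{\mathrm{Time}_B}} \\[4pt]
    &= \frac{\mathrm{Time}_B}{\mathrm{Time}_A} - 1
     = \frac{\mathrm{Time}_B - \mathrm{Time}_A}{\mathrm{Time}_A} \\[4pt]
\text{e.g. } \mathrm{Time}_A &= 8\,\mathrm{s},\ \mathrm{Time}_B = 10\,\mathrm{s}
  \;\Rightarrow\; n\% = \frac{10 - 8}{8} = 25\%
\end{align*}
```

Note that the denominator is the time of the faster system, so a system that takes 20% less time is 25% faster, not 20% faster.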

In general, the equation for a fast system to be “n%” faster than a slow system is:

n% = (T_Slow - T_Fast) / T_Fast

Another useful formula is the average time to do one of k different tasks, each of which happens %_i of the time and takes an amount of time T_i each time it is done.

T_Avg = Σ_{i=1}^{k} (%_i)(T_i)

We can measure the performance of practically anything (cars, computers, vacuum cleaners, printers, ...).

3.3.2 Example: Performance of Printers

This section reserved for your reading pleasure

3.4 Clock Speed, CPI, Program Length, and Performance

3.4.1 Mathematics

CPI           Cycles per instruction
NumInsts      Number of instructions
ClockSpeed    Clock speed
ClockPeriod   Clock period

Time = NumInsts × CPI × ClockPeriod

Time = (NumInsts × CPI) / ClockSpeed

3.4.2 Example: CISC vs RISC and CPI

                   Clock Speed   SPECint
AMD Athlon         1.1 GHz       409
Fujitsu SPARC64    675 MHz       443

The AMD Athlon is a CISC microprocessor (it uses the IA-32 instruction set). The Fujitsu SPARC64 is a RISC microprocessor (it uses Sun’s Sparc instruction set). Assume that it requires 20% more instructions to write a program in the Sparc instruction set than the same program requires in IA-32.

SPECint and Performance

Question: Which of the two processors has higher performance?

Relative CPI

Question: What is the ratio between the CPIs of the two microprocessors?
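One way to set up this ratio is sketched below. It assumes that the SPECint score is proportional to 1/Time for the same benchmark on both machines, and it uses the stated assumption that the Sparc program needs 20% more instructions. The subscripts A (Athlon) and S (SPARC64) are shorthand introduced here.

```latex
\begin{align*}
\mathrm{CPI} &= \frac{\mathrm{Time} \times \mathrm{ClockSpeed}}{\mathrm{NumInsts}} \\[4pt]
\frac{\mathrm{CPI}_A}{\mathrm{CPI}_S}
  &= \frac{\mathrm{Time}_A}{\mathrm{Time}_S}
     \times \frac{\mathrm{ClockSpeed}_A}{\mathrm{ClockSpeed}_S}
     \times \frac{\mathrm{NumInsts}_S}{\mathrm{NumInsts}_A} \\[4pt]
  &= \frac{443}{409} \times \frac{1100\,\mathrm{MHz}}{675\,\mathrm{MHz}} \times 1.2
   \approx 1.083 \times 1.630 \times 1.2 \approx 2.12
\end{align*}
```

Under these assumptions the Athlon spends roughly 2.1 times as many clock cycles per instruction as the SPARC64, even though its clock is about 1.6 times faster.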

Absolute CPI

Question: Can you determine the absolute (actual) CPI of either microprocessor?

3.4.3 Effect of Instruction Set on Performance

Your group designs a microprocessor and you are considering adding a fused multiply-accumulate to the instruction set. (A fused multiply-accumulate is a single instruction that does both a multiply and an addition. It is often used in digital signal processing.)

Your studies have shown that, on average, half of the multiply operations are followed by an add instruction that could be done with a fused multiply-add.

Additionally, you know:

         CPI            %
ADD      0.8 CPIavg     15%
MUL      1.2 CPIavg     5%
Other    1.0 CPIavg     80%

Options

You have three options:

option 1 : no change

option 2 : add the MAC instruction, increase the clock period by 20%, and MAC has the same CPI as MUL.

option 3 : add the MAC instruction, keep the clock period the same, and the CPI of a MAC is 50% greater than that of a multiply.

Question: Which option will result in the highest overall performance?
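A possible way to work the comparison is sketched below. It assumes that the percentages are fractions of the instruction count of the original program, and that each fused MAC replaces exactly one MUL and one ADD. Times are for 100 original instructions, in units of CPIavg × (original clock period); the symbols T_1, T_2, and T_3 are introduced here for the three options.

```latex
\begin{align*}
T_1 &= 15(0.8) + 5(1.2) + 80(1.0) = 98 \\
\text{With MAC: } & 12.5\ \text{ADD},\ 2.5\ \text{MUL},\ 2.5\ \text{MAC},\ 80\ \text{Other} \\
T_2 &= \bigl[12.5(0.8) + 2.5(1.2) + 2.5(1.2) + 80(1.0)\bigr] \times 1.2 = 96 \times 1.2 = 115.2 \\
T_3 &= 12.5(0.8) + 2.5(1.2) + 2.5(1.5 \times 1.2) + 80(1.0) = 97.5 \\
\frac{T_1 - T_3}{T_3} &= \frac{98 - 97.5}{97.5} \approx 0.5\%
\end{align*}
```

Under these assumptions option 3 gives the shortest time, but only by about half a percent over making no change, while the longer clock period makes option 2 clearly the slowest.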

3.4.4 Effect of Time to Market on Relative Performance

Assume that performance of the average product in your market segment doubles every 18 months.

You are considering an optimization that will improve the performance of your product by 7%.

Question: If you add the optimization, how much can you allow your schedule to slip before the delay hurts your relative performance compared to not doing the optimization and launching the product according to your current schedule?
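One way to approach this is to model the market as improving exponentially and find the delay at which the market has gained the same 7%. The model P(t) = 2^{t/18} (t in months) is an assumption that follows from “doubles every 18 months”.

```latex
\begin{align*}
P(t) &= 2^{\,t/18} \\
2^{\,t/18} &= 1.07 \\
t &= 18 \log_2(1.07) = 18 \times \frac{\ln 1.07}{\ln 2}
   \approx 18 \times 0.0976 \approx 1.76\ \text{months}
\end{align*}
```

Under this model a slip of more than about 1.8 months (roughly seven and a half weeks) costs more relative performance than the 7% optimization gains.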

3.4.5 Summary of Equations

3.5 Performance Analysis and Dataflow Diagrams

3.5.1 Dataflow Diagrams, CPI, and Clock Speed

• One of the challenges in designing a circuit is to choose the clock speed.

• Choosing a clock period affects many aspects of the design, not just the overall performance.

• Some goals will push you toward a short clock period

• Some goals will push you toward a long clock period

Goal                                        Action     Effect

Minimize area

Increase scheduling flexibility

Decrease percentage of clock cycle spent in flops (overhead: time in flops is not doing useful work)

Decrease time to execute an instruction

Outline to Choose Clock Period

Outline of plan to find optimal latency and clock period for maximum performance:

1. Start with smallest possible clock period.

2. Allocate operations to clock cycles

3. Calculate average time to execute an instruction.

4. If latency > 1, then: increase the clock period until the latency is reduced; return to Step 2. Else (latency = 1): choose the clock period and dataflow diagram that resulted in the highest performance.

5. Optimize dataflow diagram to reduce area.

3.5.2 Examples of Dataflow Diagrams for Two Instructions

• Circuit supports two instructions, A and B

• Each operation occurs 50% of the time.

• The delay through a register is 5ns.

• Find clock period and dataflow diagram to maximize overall performance.

Instruction A: f (30ns), g (50ns), h (20ns), g (50ns)

Instruction B: i (40ns), g (50ns)

3.5.2.1 Scheduling of Operations for Different Clock Periods

Scheduling (1)

55ns Clock Period

[Figure: Instr A (f, g, h, g) and Instr B (i, g) scheduled into 55ns clock cycles.]

Scheduling (2)

Scheduling (3)

3.5.2.2 Performance Computation for Different Clock Periods

Question: Which clock speed will result in the highest overall performance?

Clock Period   CPI_A   CPI_B   T_avg
55ns
75ns
85ns
95ns
155ns
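One way to fill in the table is sketched below. It assumes that the operations of each instruction form a sequential chain in the order listed (A: f, g, h, g; B: i, g), that an operation cannot be split across clock cycles, and that each cycle must fit its operations plus the 5ns register delay. T_avg is the 50/50 average of the two instruction times.

```latex
\[
T_{avg} = 0.5\,(\mathrm{CPI}_A \times \mathrm{Period}) + 0.5\,(\mathrm{CPI}_B \times \mathrm{Period})
\]
\[
\begin{array}{r|c|c|l}
\text{Clock Period} & \mathrm{CPI}_A & \mathrm{CPI}_B & T_{avg} \\ \hline
55\,\mathrm{ns}  & 4 & 2 & 0.5(220) + 0.5(110) = 165\,\mathrm{ns} \\
75\,\mathrm{ns}  & 3 & 2 & 0.5(225) + 0.5(150) = 187.5\,\mathrm{ns} \\
85\,\mathrm{ns}  & 2 & 2 & 0.5(170) + 0.5(170) = 170\,\mathrm{ns} \\
95\,\mathrm{ns}  & 2 & 1 & 0.5(190) + 0.5(95) = 142.5\,\mathrm{ns} \\
155\,\mathrm{ns} & 1 & 1 & 0.5(155) + 0.5(155) = 155\,\mathrm{ns}
\end{array}
\]
```

Under these assumptions the 95ns clock gives the lowest average time: it is the shortest period at which instruction B fits in a single cycle (40 + 50 + 5 = 95ns), while A still needs only two cycles.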

3.5.2.3 Example: Two Instructions Taking Similar Time

Question: For the flow below, which clock speed will result in the highest overall performance?

A       B
30ns    40ns
50ns    50ns
20ns    40ns
50ns

Clock Period   CPI_A   CPI_B   T_avg
ns
ns
ns
ns
ns
ns

3.5.2.4 Example: Same Total Time, Different Order for A

Question: For the flow below, which clock speed will result in the highest overall performance?

A       B
30ns    40ns
20ns    50ns
50ns    40ns
50ns

Clock Period   CPI_A   CPI_B   T_avg
ns
ns
ns
ns

3.5.3 Example: From Algorithm to Optimized Dataflow

This question involves doing some of the design work for a circuit that implements InstP and InstQ using the components described below.

Instruction   Algorithm                          Frequency of Occurrence
InstP         a × b × ((a × b) + (b × d) + e)    75%
InstQ         (i + j + k + l) × m                25%

Component      Delay
2-input Mult   40ns
2-input Add    25ns
Register       5ns

NOTES

• There is a resource limitation of a maximum of 3 input ports. (There are no other resource limitations.)

• You must put registers on your inputs; you do not need to register your outputs.

• The environment will directly connect your outputs (its inputs) to registers.

• Each input value (a, b, c, d, e, i, j, k, l, m) can be input only once. If you need to use a value in multiple clock cycles, you must store it in a register.

Questions

Question: What clock period will result in the best overall performance?

Question: Find a minimal set of resources that will achieve the performance you calculated.

3.6 General Optimizations

3.6.1 Strength Reduction

Strength reduction replaces one operation with another that is simpler.

3.6.1.1 Arithmetic Strength Reduction

Operation                             Replacement
Multiply by a constant power of two   wired shift logical left
Multiply by a power of two            shift logical left
Divide by a constant power of two     wired shift logical right
Divide by a power of two              shift logical right
Multiply by 3                         wired shift and addition
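A few of these reductions can be sketched in VHDL as follows. The entity name strength_demo and its port names are hypothetical, and the sketch assumes unsigned operands from numeric_std; signed division by shifting rounds differently from truncating division and is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical illustration of arithmetic strength reduction on an 8-bit unsigned value.
entity strength_demo is
  port (
    a      : in  unsigned(7 downto 0);
    a_x4   : out unsigned(9 downto 0);   -- a * 4
    a_div4 : out unsigned(7 downto 0);   -- a / 4
    a_x3   : out unsigned(9 downto 0)    -- a * 3
  );
end entity strength_demo;

architecture main of strength_demo is
begin
  -- multiply by a constant power of two: wired shift left (append zeros, no logic)
  a_x4 <= a & "00";

  -- divide by a constant power of two: wired shift right (drop low bits, no logic)
  a_div4 <= "00" & a(7 downto 2);

  -- multiply by 3: (a shifted left by one) + a, i.e. one adder instead of a multiplier
  a_x3 <= resize(a & '0', 10) + resize(a, 10);
end architecture main;
```

The first two assignments are pure wiring; only the multiply by 3 needs an adder.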

3.6.1.2 Boolean Strength Reduction

Boolean tests that can be implemented as wires:

• is odd, is even

• is neg, is pos

By choosing your encodings carefully, you can sometimes reduce a vector comparison to a wire.

For example, if your state uses a one-hot encoding, then the comparison state = S3 reduces to state(3) = '1'. You might expect a reasonable logic-synthesis tool to do this reduction automatically, but most tools do not do this reduction.

When using encodings other than one-hot, Karnaugh maps can be useful tools for optimizing vector comparisons. By carefully choosing our state assignments, when we use a full binary encoding for 8 states, the comparison:

(state = S0 or state = S3 or state = S4) = '1'

can be reduced from looking at 3 bits to looking at just 2 bits. If we have a condition that is true for four states, then we can find an encoding that looks at just 1 bit.
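The one-hot case can be sketched as below. The entity name onehot_demo, the port names, and the particular encoding (S3 = "1000") are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical comparison of a full vector compare with a single-wire test.
entity onehot_demo is
  port (
    state      : in  std_logic_vector(3 downto 0);  -- assumed one-hot: exactly one bit is '1'
    in_s3_cmp  : out std_logic;
    in_s3_wire : out std_logic
  );
end entity onehot_demo;

architecture main of onehot_demo is
  constant S3 : std_logic_vector(3 downto 0) := "1000";  -- assumed encoding of state S3
begin
  -- vector comparison: looks at all four bits
  in_s3_cmp <= '1' when state = S3 else '0';

  -- strength-reduced test: a single wire
  in_s3_wire <= state(3);
end architecture main;
```

The two outputs agree only while state really is one-hot; for an illegal value such as "1100" the wire version reports S3 and the comparison does not.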


3.6.2 Replication and Sharing

3.6.2.1 Mux-Pushing

Pushing multiplexors into the fanin of a signal can reduce area.

Before

  z <= a + b when (w = '1')
       else a + c;

After

  tmp <= b when (w = '1')
         else c;
  z <= a + tmp;

The first circuit will have two adders, while the second will have one adder. Some synthesis tools will perform this optimization automatically, particularly if all of the signals are combinational.

3.6.2.2 Common Subexpression Elimination

Introduce new signals to capture subexpressions that occur multiple places in the code.

Before

  y <= a + b + c when (w = '1')
       else d;
  z <= a + c + d when (w = '1')
       else e;

After

  tmp <= a + c;
  y <= b + tmp when (w = '1')
       else d;
  z <= d + tmp when (w = '1')
       else e;

Subexpression Elimination

Note: Clocked subexpressions. Care must be taken when doing common subexpression elimination in a clocked process. Putting the “temporary” signal in the clocked process will add a clock cycle to the latency of the computation, because the tmp signal will be a flip-flop. The tmp signal must be combinational to preserve the behaviour of the circuit.
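A sketch of the safe arrangement is shown below: the shared subexpression is a concurrent (combinational) assignment outside the clocked process, so y and z keep their original latency. The entity name cse_demo and the 8-bit unsigned widths are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical registered version of the common-subexpression example.
entity cse_demo is
  port (
    clk           : in  std_logic;
    w             : in  std_logic;
    a, b, c, d, e : in  unsigned(7 downto 0);
    y, z          : out unsigned(7 downto 0)
  );
end entity cse_demo;

architecture main of cse_demo is
  signal tmp : unsigned(7 downto 0);
begin
  -- combinational: no flip-flop is inferred for tmp
  tmp <= a + c;

  process begin
    wait until rising_edge(clk);
    if w = '1' then
      y <= b + tmp;
      z <= d + tmp;
    else
      y <= d;
      z <= e;
    end if;
  end process;
end architecture main;
```

If the assignment to tmp were moved inside the process, tmp would become a register, and y and z would be computed from the value of a + c from the previous clock cycle.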

3.6.2.3 Computation Replication

• To improve performance

– If the same result is needed at two very distant locations and wire delays are significant, it might improve performance (increase clock speed) to replicate the hardware.

• To reduce area

– If the same result is needed at two different times that are widely separated, it might be cheaper to reuse the hardware component to repeat the computation than to store the result in a register.

Note: Muxes are not free. Each time a component is reused, multiplexors are added to inputs and/or outputs. Too much sharing of a component can cost more area in additional multiplexors than would be spent in replicating the component.

3.6.3 Arithmetic

VHDL is left-associative. The expression a + b + c + d is interpreted as (((a + b) + c) + d). You can use parentheses to suggest parallelism.

Perform arithmetic on the minimum number of bits needed. If you only need the lower 12 bits of a result, but your input signals are 16 bits wide, trim your inputs to 12 bits. This results in a smaller and faster design than computing all 16 bits of the result and trimming the result to 12 bits.
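Both points can be sketched in a few lines. The entity name arith_demo and its port names are hypothetical; whether a synthesis tool rebalances the first expression on its own depends on the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical illustration of adder ordering and operand trimming.
entity arith_demo is
  port (
    a, b, c, d : in  unsigned(15 downto 0);
    sum_chain  : out unsigned(15 downto 0);
    sum_tree   : out unsigned(15 downto 0);
    low12      : out unsigned(11 downto 0)
  );
end entity arith_demo;

architecture main of arith_demo is
begin
  -- as written, this describes three adders in series: ((a + b) + c) + d
  sum_chain <= a + b + c + d;

  -- parentheses suggest two adders in parallel feeding a third
  sum_tree <= (a + b) + (c + d);

  -- only the low 12 bits are needed, so trim the inputs and use a 12-bit adder
  low12 <= a(11 downto 0) + b(11 downto 0);
end architecture main;
```

Trimming the inputs gives the same low 12 bits as trimming a 16-bit sum, because carries only propagate upward.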


3.7 Retiming

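[Figure: circuit with inputs state, a, b, and c, internal signals sel, x, and y, and output z, with the critical path marked; waveform of state (S0 to S3, repeating), a, b, c, sel, x, y, and z.]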

process begin

wait until rising_edge(clk);

if state = S1 then

z <= a + c;

else

z <= b + c;

end if;

end process;


Retimed Circuit and Waveform

state

a

b

c

sel

x y z

state S0 S1 S2 S3 S0 S1 S2 S3a b c

sel x y z

αβγ

process (state) beginif state = S1 then

sel = ’1’else

sel = ’1’end if;

end process;process begin

wait until rising_edge(clk);if sel = ’1’ then

... -- code for zend if;

end process;

process beginwait until rising_edge(clk);if state = then

sel = ’1’else

sel = ’1’end if;

end process;process begin

wait until rising_edge(clk);if sel = ’1’ then

... -- code for zend if;

end process;


Chapter 4

Functional Verification


4.1 Overview

4.1.1 Terminology: Validation / Verification / Testing

4.1.2 The Difficulty of Designing Correct Chips

4.1.2.1 Notes from Kenn Heinrich (UW E&CE grad)

“Everyone should get a lecture on why their first industrial design won’t work in the field.”

Note: There are six reasons in your notes.

4.1.2.2 Notes from Aart de Geus (Chairman and CEO of Synopsys)

More than 60% of the ASIC designs that are fabricated have at least one error, issue, or problem whose severity forced the design to be reworked.

Note: There is a pretty picture in your notes.


4.2 Test Cases and Coverage

4.2.1 Coverage

To be absolutely certain that an implementation is correct, we must check every combination of values. This includes both input values and internal state (flip-flops).

If we have $n_i$ bits of inputs and $n_s$ bits in flip-flops, we have to test $2^{n_i+n_s}$ different cases when doing functional verification.

Question: If we have $n_c$ combinational signals, why don’t we have to test $2^{n_i+n_s+n_c}$ different cases?

4.2.2 Floating Point Divider Example

This example illustrates the difficulty of achieving significant coverage on realistic circuits.

Consider doing the functional simulation for a double precision (64-bit) floating-point divider.

Given Information

Data width: 64 bits
Number of gates in circuit: 10 000
Number of assembly-language instructions to simulate one gate for one test case: 100
Number of clock cycles required to execute one assembly-language instruction on the computer that is running the simulation: 0.5
Clock speed of computer that is running the simulation: 1 Gigahertz

Number of Cases

Question: How many cases must be considered?

width=64b, gates=10 000, instrs/gate=100, cycles/instr=0.5, cycles/sec=10^9

Simulation Run Time

Question: How long will it take to simulate all of the different possible cases using a single computer?

Coverage

Question: If you can run simulations non-stop for one year on ten computers, what coverage will you achieve?
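The three questions above can be worked through numerically. The sketch below assumes the divider is purely combinational with two 64-bit operands, so that $n_i = 128$ and $n_s = 0$. That assumption is not stated on the slides; a divider with internal state would have more cases.

```latex
\begin{align*}
\text{cases} &= 2^{64+64} = 2^{128} \approx 3.4\times10^{38} \\
\text{cycles per case} &= 10^{4}\ \text{gates} \times 100\ \tfrac{\text{instrs}}{\text{gate}}
                          \times 0.5\ \tfrac{\text{cycles}}{\text{instr}} = 5\times10^{5} \\
\text{time per case} &= \frac{5\times10^{5}\ \text{cycles}}{10^{9}\ \text{cycles/s}} = 5\times10^{-4}\ \text{s} \\
\text{total time} &\approx 3.4\times10^{38} \times 5\times10^{-4}\ \text{s}
                   \approx 1.7\times10^{35}\ \text{s} \approx 5.4\times10^{27}\ \text{years} \\
\text{cases in one year on ten computers} &\approx \frac{10 \times 3.15\times10^{7}\ \text{s}}{5\times10^{-4}\ \text{s}}
                   \approx 6.3\times10^{11} \\
\text{coverage} &\approx \frac{6.3\times10^{11}}{3.4\times10^{38}} \approx 1.9\times10^{-27}
\end{align*}
```

Under these assumptions, exhaustive simulation is out of reach by many orders of magnitude, which is the point of the example.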

Simulation vs the Real World

From Validating the Intel(R) Pentium(R) Microprocessor by Bob Bentley, Design Automation Conference 2001. (Link on E&CE 327 web page.)

• Simulating the Pentium 4 Processor on a Pentium 3 Processor ran at about 15 MHz.

• By tapeout, over 200 billion simulation cycles had been run on a network of computers.

• All of these simulations represent less than two minutes of running a real processor.

4.3 Testbenches

4.3.1 Overview of Test Benches

[Figure: testbench containing stimulus, implementation, specification, and check blocks]

Implementation: Circuit that you’re checking for bugs. Also known as: “design under test” or “unit under test”.

Stimulus: Generates test vectors.

Specification: Describes desired behaviour of implementation.

Check: Checks whether implementation obeys specification.

4.3.2 Reference Model Style Testbench

[Figure: reference model testbench containing stimulus, implementation, and specification blocks]

4.3.3 Relational Style Testbench

[Figure: relational testbench containing stimulus, implementation, and check blocks]

4.3.4 Coding Structure of a Testbench

[Figure: testbench containing stimulus, implementation, specification, and check blocks]

architecture main of athabasca_tb is
  component declaration for implementation;
  other declarations
begin
  implementation instantiation;
  stimulus process;
  specification process (or component instantiation);
  check process;
end main;

4.3.5 Datapath vs Control

Datapath and control circuits tend to use different styles of testbenches.

[Figure: reference model testbench (stimulus, implementation, specification) and relational testbench (stimulus, implementation, check)]

4.3.6 Verification Tips

Suggested order of simulation for functional verification:

1. Write high-level model.

2. Simulate high-level model until it has correct functionality and latency.

3. Write synthesizable model.

4. Use zero-delay simulation (uw-sim) to check behaviour of synthesizable model against high-level model.

5. Optimize the synthesizable model.

6. Use zero-delay simulation (uw-sim) to check behaviour of optimized model against high-level model.

7. Use timing simulation (uw-timsim) to check behaviour of optimized model against high-level model.

Section 4.4 describes a series of testbenches that are particularly useful for debugging datapath circuits in the early phases of the design cycle.

4.4 Functional Verification for Datapath Circuits

In this section we will incrementally develop a testbench for a very simple circuit: an AND gate.

Implementation

library ieee;
use ieee.std_logic_1164.all;

entity and2 is
  port (
    a, b : in std_logic;
    c : out std_logic
  );
end and2;

architecture main of and2 is
begin
  c <= '1' when (a = '1' AND b = '1')
       else '0';
end main;

4.4.1 A Spec-Less Testbench

First, use a waveform viewer to check that the implementation generates reasonable outputs for a small set of inputs.

entity and2_tb is
end and2_tb;

architecture main_tb of and2_tb is
  component and2 ... end component;
  signal ta, tb, tc_impl : std_logic;
  signal ok : boolean;
begin
  ---------------------------------------------
  impl : and2 port map (a => ta, b => tb, c => tc_impl);
  ---------------------------------------------
  stimulus : process
  begin
    ta <= '0'; tb <= '0';
    wait for 10 ns;
    ta <= '1'; tb <= '1';
    wait for 10 ns;
  end process;
  ---------------------------------------------
end main_tb;

4.4.2 Use an Array for Test Vectors

architecture main_tb of and2_tb is
  ...
begin
  ...
  stimulus : process
    type test_datum_ty is record
      ra, rb : std_logic;
    end record;
    type test_vectors_ty is
      array(natural range <>) of test_datum_ty;
    constant test_vectors : test_vectors_ty :=
      --  a    b
      ( ( '0', '0'),
        ( '1', '1')
      );
  begin
    for i in test_vectors'low to test_vectors'high loop
      ta <= test_vectors(i).ra;
      tb <= test_vectors(i).rb;
      wait for 10 ns;
    end loop;
  end process;
end main_tb;

4.4.3 Build Spec into Stimulus

stimulus : process
  type test_datum_ty is record
    ra, rb, rc : std_logic;
  end record;
  type test_vectors_ty is
    array(natural range <>) of test_datum_ty;
  constant test_vectors : test_vectors_ty :=
    -- a, b: inputs
    -- c   : expected output
    --  a    b    c
    ( ( '0', '0', '0'),
      ( '0', '1', '0'),
      ( '1', '1', '1')
    );
begin
  for i in test_vectors'low to test_vectors'high loop
    ta <= test_vectors(i).ra;
    tb <= test_vectors(i).rb;
    tc_spec <= test_vectors(i).rc;
    wait for 10 ns;
  end loop;
end process;

Build Spec into Stimulus (Cont’d)

stimulus : process
  ...
begin
  for i in test_vectors'low to test_vectors'high loop
    ta <= test_vectors(i).ra;
    tb <= test_vectors(i).rb;
    tc_spec <= test_vectors(i).rc;
    wait for 10 ns;
  end loop;
end process;
------------------------------------------
check : process (tc_impl, tc_spec)
begin
  ok <= (tc_impl = tc_spec);
end process;
------------------------------------------
end main_tb;

4.4.4 Have Separate Specification Entity

entity and2_spec is
  ...(same as and2 entity)...
end and2_spec;

architecture spec of and2_spec is
begin
  c <= a AND b;
end spec;

Testbench for Separate Specification

architecture main_tb of and2_tb is
  component and2 ...;
  component and2_spec ...;
  signal ta, tb, tc_impl, tc_spec : std_logic;
  signal ok : boolean;
begin
  ------------------------------------------
  impl : and2 port map (a => ta, b => tb, c => tc_impl);
  spec : and2_spec port map (a => ta, b => tb, c => tc_spec);
  ------------------------------------------
  stimulus process...
  check process...
end main_tb;

Testbench for Separate Spec (Cont’d)

stimulus : process
  ...
  constant test_vectors : test_vectors_ty :=
    --  a    b
    ( ( '0', '0'),
      ( '1', '1')
    );
begin
  for i in test_vectors'low to test_vectors'high loop
    ta <= test_vectors(i).ra;
    tb <= test_vectors(i).rb;
    wait for 10 ns;
  end loop;
end process;
------------------------------------------
check : process (tc_impl, tc_spec)
begin
  ok <= (tc_impl = tc_spec);
end process;
------------------------------------------
end main_tb;

4.4.5 Generate Test Vectors Automatically

architecture main_tb of and2_tb is
  ...
begin
  ...
  stimulus : process
    subtype std_test_ty is std_logic range '0' to '1';
  begin
    for va in std_test_ty'low to std_test_ty'high loop
      for vb in std_test_ty'low to std_test_ty'high loop
        ta <= va;
        tb <= vb;
        wait for 10 ns;
      end loop;
    end loop;
  end process;
  ...
end main_tb;
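The same nested-loop idea extends to multi-bit inputs by looping over integers and converting each value to a vector. The sketch below is an illustration under assumptions: the entity name exhaustive_tb, the constant width, and the bitwise AND used as a stand-in for the implementation are all hypothetical and not part of the course code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity exhaustive_tb is  -- hypothetical testbench name
end exhaustive_tb;

architecture main_tb of exhaustive_tb is
  constant width : natural := 4;  -- hypothetical input width
  signal ta, tb  : std_logic_vector(width - 1 downto 0);
  signal tc_impl : std_logic_vector(width - 1 downto 0);
begin
  -- Stand-in for the implementation under test: a bitwise AND.
  tc_impl <= ta and tb;

  stimulus : process
  begin
    -- Visit every combination of the two inputs: 2**width * 2**width cases.
    for va in 0 to 2**width - 1 loop
      for vb in 0 to 2**width - 1 loop
        ta <= std_logic_vector(to_unsigned(va, width));
        tb <= std_logic_vector(to_unsigned(vb, width));
        wait for 10 ns;
      end loop;
    end loop;
    wait;  -- stop after one pass over all input combinations
  end process;
end main_tb;
```

The number of cases grows as 2 to the power of the total input width, so this style is only practical for narrow inputs.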

4.4.6 Relational Specification

Sometimes we want to check a relationship between the output and the input, rather than check that the output has a specific value.

To do this, we drop the spec process, and put the brains into the check process.

architecture main_tb of and2_tb is
  ...
begin
  ------------------------------------------
  impl : and2 port map (a => ta, b => tb, c => tc_impl);
  ------------------------------------------
  stimulus : process
    ...
  end process;
  ------------------------------------------
  check : process (tc_impl, ta, tb)
  begin
    ok <= NOT (tc_impl = '1' AND (ta = '0' OR tb = '0'));
  end process;
  ------------------------------------------
end main_tb;
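For reference, a self-contained version of the relational testbench might look like the sketch below. It is one possible arrangement, not the course's code: the testbench name and2_rel_tb is hypothetical, the and2 entity is repeated so the example stands alone, and the check is written as an assert inside the stimulus process instead of a separate check process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2 is
  port (a, b : in std_logic; c : out std_logic);
end and2;

architecture main of and2 is
begin
  c <= '1' when (a = '1' and b = '1') else '0';
end main;

library ieee;
use ieee.std_logic_1164.all;

entity and2_rel_tb is  -- hypothetical testbench name
end and2_rel_tb;

architecture main_tb of and2_rel_tb is
  signal ta, tb, tc_impl : std_logic;
begin
  impl : entity work.and2 port map (a => ta, b => tb, c => tc_impl);

  stimulus : process
    subtype std_test_ty is std_logic range '0' to '1';
  begin
    for va in std_test_ty'low to std_test_ty'high loop
      for vb in std_test_ty'low to std_test_ty'high loop
        ta <= va;
        tb <= vb;
        wait for 10 ns;
        -- Relational check, sampled after the output has settled:
        -- the output must not be '1' while either input is '0'.
        assert not (tc_impl = '1' and (ta = '0' or tb = '0'))
          report "error: output is '1' while an input is '0'"
          severity warning;
      end loop;
    end loop;
    wait;  -- all four input combinations have been applied
  end process;
end main_tb;
```

Sampling the relation after the wait avoids reacting to intermediate values while the inputs and output are still updating within the same time step.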

4.5 Functional Verification of Control Circuits

Control circuits are often more challenging to verify than datapath circuits.

In this section, we will explore the functional verification of state machines via a First-In First-Out queue.

4.5.1 Overview of Queues in Hardware

[Figure: Structure of queue, with write and read ends]

[Figure: Write Sequence. Panels: Empty, Write 1, Write 2; contents: A]

[Figure: A Second Example Write. Panels: Write 1, Write 2; contents: B, A]

[Figure: Example Read Sequence. Panels: Read 1, Read 2; contents: B, A]

[Figure: Write Illustrating Index Wrap. Panels: Write 1, Write 2; contents: B to I, then J]

[Figure: Write Illustrating Full Queue. Panels: Write 1, Write 2; contents: B to J, then K]

[Figure: Queue Signals: empty, mem, wr_idx, rd_idx, data_wr, data_rd, do_wr, do_rd]

[Figure: Incomplete Queue Blocks. Signals: empty, mem, wr_idx, rd_idx, data_wr, data_rd, do_wr, do_rd; memory ports: WE, A0, DI0, DO0, A1, DO1. Control circuitry not shown.]

4.5.2 VHDL Coding

4.5.2.1 Package

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package queue_pkg is
  subtype data is std_logic_vector(3 downto 0);
  function to_data(i : integer) return data;
end queue_pkg;

package body queue_pkg is
  function to_data(i : integer) return data is
  begin
    return std_logic_vector(to_unsigned(i, 4));
  end to_data;
end queue_pkg;

4.5.2.2 Other VHDL Coding


This section reserved for your reading pleasure

4.5.3 Code Structure for Verification

Verification things to notice in queue implementation:

1. instrumentation code

2. coverage monitors

3. assertions


Code Structure for Verification

architecture ... is

...

begin

... normal implementation ...

process (clk)

begin

if rising_edge(clk) then

... instrumentation code ...

prev_signame <= signame;

end if;

end process;

... assertions ...

... coverage monitors ...

end;

4.5.4 Instrumentation Code

• Added to implementation to support verification

• Usually keeps track of previous values of signals

• Does not create hardware (Optimized away during synthesis)

• Does not feed any output signals

• Must use synthesizable subset of VHDL

process (clk) begin

if rising_edge(clk) then

prev_rd_idx <= rd_idx;

prev_wr_idx <= wr_idx;

prev_do_rd <= do_rd;

prev_do_wr <= do_wr;

end if;

end process;

Coverage Events for Queue

Question: What events should we monitor to estimate the coverage of our functional tests?

Coverage Monitor Template

process (signals read)
begin
  if (condition) then
    report "coverage: message";
  elsif (condition) then
    report "coverage: message";
  else
    report "error: case fall through on message"
      severity warning;
  end if;
end process;

Coverage Monitor Code

Events related to rd_idx equals wr_idx.

process (prev_rd_idx, prev_wr_idx, rd_idx, wr_idx)

begin

if (rd_idx = wr_idx) then

if ( prev_rd_idx = prev_wr_idx ) then

report "coverage: read = write both moved";

elsif ( rd_idx /= prev_rd_idx ) then

report "coverage: Read caught write";

elsif ( wr_idx /= prev_wr_idx ) then

report "coverage: Write caught read";

else

report "error: case fall through on rd/wr catching"

severity warning;

end if;

end if;

end process;

Coverage Monitor Code

Events related to rd_idx wrapping.

process (rd_idx)

begin

if (rd_idx = low_idx) then

report "coverage: rd mv to low";

elsif (rd_idx = high_idx) then

report "coverage: rd mv to high";

else

report "coverage: rd mv normal";

end if;

end process;

4.5.5 Assertions

Assertions for Queue

1. If rd_idx changes, then it increments or wraps.

2. If rd_idx changes, then do_rd was '1', or reset is '1'.

3. If wr_idx changes, then it increments or wraps.

4. If wr_idx changes, then do_wr was '1', or reset is '1'.

5. And many others....


Assertion Template

process ( signals read) begin

assert ( required condition)

report "error: message" severity warning;

end process;

Assertions: Read Index

process (rd_idx) begin
  assert ((rd_idx > prev_rd_idx) or (rd_idx = low_idx))
    report "error: rd inc" severity warning;
  assert ((prev_do_rd = '1') or (reset = '1'))
    report "error: rd imp do_rd" severity warning;
end process;

Assertions: Write Index

process (wr_idx) begin
  assert ((wr_idx > prev_wr_idx) or (wr_idx = low_idx))
    report "error: wr inc" severity warning;
  assert ((prev_do_wr = '1') or (reset = '1'))
    report "error: wr imp do_wr" severity warning;
end process;
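To see how the implementation, instrumentation code, and assertions fit together, here is a minimal self-contained sketch built around a wrapping write index. It is not the course's queue: the entity name idx_demo, the 3-bit index width, and the guard on wr_idx'event (added so that the assertions are skipped when the process runs once at the start of simulation) are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity idx_demo is  -- hypothetical name, not part of the queue code
  port (
    clk, reset, do_wr : in  std_logic;
    idx_out           : out unsigned(2 downto 0)
  );
end idx_demo;

architecture main of idx_demo is
  constant low_idx  : unsigned(2 downto 0) := "000";
  constant high_idx : unsigned(2 downto 0) := "111";
  signal wr_idx, prev_wr_idx : unsigned(2 downto 0) := low_idx;
  signal prev_do_wr : std_logic := '0';
begin
  -- Normal implementation: increment on write, wrap at high_idx.
  process (clk) begin
    if rising_edge(clk) then
      if reset = '1' then
        wr_idx <= low_idx;
      elsif do_wr = '1' then
        if wr_idx = high_idx then
          wr_idx <= low_idx;
        else
          wr_idx <= wr_idx + 1;
        end if;
      end if;
    end if;
  end process;
  idx_out <= wr_idx;

  -- Instrumentation code: remember the values seen at the previous clock edge.
  process (clk) begin
    if rising_edge(clk) then
      prev_wr_idx <= wr_idx;
      prev_do_wr  <= do_wr;
    end if;
  end process;

  -- Assertions: checked only when wr_idx actually changes.
  process (wr_idx) begin
    if wr_idx'event then  -- false during the initial run at time zero
      assert ((wr_idx > prev_wr_idx) or (wr_idx = low_idx))
        report "error: wr inc" severity warning;
      assert ((prev_do_wr = '1') or (reset = '1'))
        report "error: wr imp do_wr" severity warning;
    end if;
  end process;
end main;
```

Because wr_idx and prev_wr_idx are both updated by the same clock edge, prev_wr_idx already holds the old index by the time the assertion process wakes up on the change to wr_idx.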


4.5.6 VHDL Coding Tips

Vector Type Declaration

type data_array_ty is array(natural range <>) of data;

signal data_array : data_array_ty(7 downto 0);


Functions

function to_idx

(i : natural range data_array’low to data_array’high)

return idx_ty

is

begin

return to_unsigned(i, idx_ty’length);

end to_idx;

Conversion to Index

Without function: rd_idx <= to_unsigned(5, 3);
With function:    rd_idx <= to_idx(5);

The function code is verbose, but is very maintainable, because neither the function itself nor uses of the function need to know the width of the index vector.

Attributes

function inc_idx (idx : idx_ty) return idx_ty is

begin

if idx < data_array’high then

return (idx + 1);

else

return (to_idx(data_array’low));

end if;

end inc_idx;


Feedback Loops, and Functions

Coding guideline: use functions. Don’t use procedures.

inc as function:  wr_idx <= inc_idx(wr_idx);
inc as procedure: inc_idx(wr_idx);

Functions clearly distinguish between reading from a signal and writing to a signal. By examining the use of a procedure, you cannot tell which signals are read from and which are written to. You must examine the declaration or implementation of the procedure to determine modes of signals.

Modifying a signal within a procedure results in a tri-state signal. This is bad.

File I/O (textio package)

TEXTIO defines the read, write, readline, and writeline procedures.

Described in:

• http://www.eng.auburn.edu/department/ee/mgc/vhdl.html#textio

These procedures can be used to read test vectors from a file and write results to a file.
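A minimal sketch of a file-driven stimulus process is shown below. It uses only procedures from std.textio. The file names vectors.txt and results.txt, the testbench name and2_file_tb, the helper to_sl, and the file format (one line per test vector, consisting of two characters that are each 0 or 1) are all assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;

entity and2_file_tb is  -- hypothetical testbench name
end and2_file_tb;

architecture main_tb of and2_file_tb is
  signal ta, tb, tc_impl : std_logic := '0';

  -- Hypothetical helper: convert a character from the file to std_logic.
  function to_sl(ch : character) return std_logic is
  begin
    if ch = '1' then
      return '1';
    else
      return '0';
    end if;
  end to_sl;
begin
  -- Stand-in for the implementation under test.
  tc_impl <= ta and tb;

  stimulus : process
    file vec_file : text open read_mode  is "vectors.txt";  -- assumed file name
    file res_file : text open write_mode is "results.txt";  -- assumed file name
    variable in_line, out_line : line;
    variable ca, cb : character;
  begin
    while not endfile(vec_file) loop
      readline(vec_file, in_line);
      read(in_line, ca);  -- first character: value for a
      read(in_line, cb);  -- second character: value for b
      ta <= to_sl(ca);
      tb <= to_sl(cb);
      wait for 10 ns;
      -- Write the inputs and the observed output to the results file.
      write(out_line, ca);
      write(out_line, cb);
      write(out_line, string'(" -> "));
      if tc_impl = '1' then
        write(out_line, character'('1'));
      else
        write(out_line, character'('0'));
      end if;
      writeline(res_file, out_line);
    end loop;
    wait;  -- end of file reached
  end process;
end main_tb;
```

Keeping the test vectors in a file lets the same testbench be rerun on new vectors without recompiling the VHDL.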

4.5.7 Queue Specification

Most bugs in queues are related to the queue becoming full, becoming empty, and/or wrap of indices.

Specification should be “obviously correct”. Avoid bugs in the specification by making the specification queue larger than the maximum number of writes that we will do in the test suite. Thus, the specification queue will never become full or wrap. However, the implementation queue will become full and wrap.

Write Index Update in Specification

We increment the write index on every write; we never wrap.

process (clk) begin

if rising_edge(clk) then

if (reset = ’1’) then

wr_idx <= 0;

elsif (do_wr = ’1’) then

wr_idx <= wr_idx + 1;

end if;

end if;

end process;

Things to Notice

Things to notice in queue specification:

1. don’t care conditions ('-')

2. uninitialized data (hint: what is the value of rd_data when we do more reads than writes?)

Don’t Care

rd_data <= data_array(rd_idx) when (do_rd = '1')
           else (others => '-');

4.5.8 Queue Testbench

Things to notice in queue testbench:

1. running multiple test sequences

2. uninitialized data 'U'

3. std_match to compare spec and impl data

'0' ∼ '0'
'0' ∼ 'L'
'1' ∼ '1'
'1' ∼ 'H'
'-' ∼ everything
everything else ≁ everything

With equality, '-' ≠ '1', but we want to use '-' to mean “don’t care” in the specification. The solution is to use std_match, rather than =, to check implementation signals against the specification.
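The difference between the two comparisons can be seen in a small sketch. The entity name match_demo and the signal names and values are hypothetical; std_match is taken from ieee.numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- provides std_match

entity match_demo is  -- hypothetical name
end match_demo;

architecture main of match_demo is
  -- Implementation produces a concrete value; the specification
  -- marks the two low bits as don't care.
  signal rd_data_impl : std_logic_vector(3 downto 0) := "1010";
  signal rd_data_spec : std_logic_vector(3 downto 0) := "10--";
  signal ok_eq, ok_match : boolean;
begin
  -- Plain equality compares '-' literally, so this is false.
  ok_eq <= (rd_data_impl = rd_data_spec);

  -- std_match treats '-' as matching any value, so this is true.
  ok_match <= std_match(rd_data_impl, rd_data_spec);
end main;
```

A check process that uses std_match therefore accepts any implementation value wherever the specification says don't care.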

Stimulus Process Structure

The stimulus process runs multiple test vectors in a single simulation run.

stimulus : process
  type test_datum_ty is
    record
      r_reset, ... normal fields ...
    end record;
  type test_vectors_ty is
    array(natural range <>) of test_datum_ty;
  constant test_vectors : test_vectors_ty :=
    ( -- reset ... other signal ...
      ( '1', normal fields), -- test case 1
      ( '0', normal fields),
      ...
      ( '1', normal fields), -- test case 2
      ( '0', normal fields),
      ...
    );
begin
  for i in test_vectors'range loop
    if (test_vectors(i).r_reset = '1') then
      ... reset code ...
    end if;
    reset <= '0';
    ... normal sequence ...
    wait until rising_edge(clk);
  end loop;
end process;

4.6 Example: Microwave Oven

This question concerns the VHDL code microwave, which controls a simple microwave oven; the properties prop1...prop3; and two proposed changes to the VHDL code.

INSTRUCTIONS:

1. Assume that the code as currently written is correct: any change to the code that causes a change to the behaviour of the signals heat or count is a bug.

2. For each of the two proposed code changes, answer whether the code change will cause a bug.

3. If the code change will cause a bug, provide a test case that will exercise the bug and identify all of the given properties (prop1, prop2, and prop3) that will detect the bug with the test case you provide.

4. If none of the three properties can detect the bug, provide a property of your own that will detect the bug with the test case you provide.

Question: For each of the three properties prop1...prop3, answer whether the property is best checked as part of a testbench or assertion. For each property, justify why a testbench or an assertion is the best method to validate that property.

prop1 If start is pushed and the door is closed, then heat remains on for exactly the time specified by the timer when start was pushed, assuming reset remains false and the door remains closed.

prop2 If the door is open, then heat is off.

prop3 If start is not pushed, reset is false, and count is greater than zero, then count is decremented.

Implementation

entity microwave is
  port (
    timer                  -- time input from user
      : in unsigned(7 downto 0);
    reset,                 -- resets microwave
    clk,                   -- clock signal input
    is_open,               -- detects when door is open
    start                  -- start button input from user
      : in std_logic;
    heat : out std_logic   -- 1=on, 0=off
  );
end microwave;

architecture main of microwave is
  signal count : unsigned(7 downto 0); -- internal time count
  signal x_heat : std_logic;
begin

  -- heat process ------------------------------
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        x_heat <= '0';
      elsif (is_open = '0') and (start = '1') and   -- region of
            (timer > 0)                             -- change #1
      then                                          --
        x_heat <= '1';                              --
      elsif (is_open = '0') and (count > 0) then    --
        x_heat <= x_heat;                           --
      else
        x_heat <= '0';
      end if;
    end if;
  end process;

  -- count process ------------------------------
  process (clk)
  begin
    if rising_edge(clk) then
      if (reset = '1') then
        count <= to_unsigned(0, 8);
      elsif (start = '1') then     -- region of
        count <= timer;            -- change #2
      elsif (count > 0) then       --
        count <= count - 1;        --
      end if;
    end if;
  end process;
  heat <= x_heat;
end main;

Properties

prop1 If start is pushed and the door is closed, then heat remains on for exactly the time specified by the timer when start was pushed, assuming reset remains false and the door remains closed.

prop2 If the door is open, then heat is off.

prop3 If start is not pushed, reset is false, and count is greater than zero, then count is decremented.

Change #1

From:

elsif (start = '1') then
  count <= timer;
elsif (count > 0) then
  count <= count - 1;

To:

elsif (count > 0) then
  count <= count - 1;
elsif (start = '1') then
  count <= timer;

Change #2

From:

elsif (is_open = '0') and (start = '1') and (timer > 0)
then x_heat <= '1';
elsif (is_open = '0') and (count > 0)
then x_heat <= x_heat;

To:

elsif (is_open = '0')
  and ((start = '1') or (count > 0))
then x_heat <= '1';
else x_heat <= '0';

Coverage

Question: If msb of src1 is '1' and lsb of src2 is '0' or sum(3) is '1', then result is wrong. What is the minimum coverage needed to detect the bug? What is the minimum coverage needed to guarantee that the bug will be detected?


Chapter 5

Timing Analysis


5.1 Delays and Definitions

In this section we will look at the different timing parameters of circuits. Our focus will be on those parameters that limit the maximum clock speed at which a circuit will work correctly.

5.1.1 Background Definitions

This section reserved for your reading pleasure

5.1.2 Clock-Related Timing Definitions

5.1.2.1 Clock Skew

[Figure: clock skew between clk1, clk2, clk3, and clk4]

Definition Clock Skew: The difference in arrival times for the same clock edge at different flip-flops.

Clock skew is caused by the difference in interconnect delays to different points on the chip.

Clock Tree Design

Clock tree design is critical in high-performance designs to minimize clock skew. Sophisticated synthesis tools put lots of effort into clock tree design, and the techniques for clock tree design still generate PhD theses.

5.1.2.2 Clock Latency

[Figure: clock latency between master clock, intermediate clock, and final clock]

Definition Clock Latency: The difference in arrival times for the same clock edge at different levels of interconnect along the clock tree. (Intuitively, “different points in the clock generation circuitry.”)

Note: Clock latency. Clock latency does not affect the limit on the minimum clock period.

5.1.2.3 Clock Jitter

[Figure: jitter between ideal clock and clock with jitter]

Definition Clock Jitter: Difference between actual clock period and ideal clock period.

Causes of Clock Jitter

Clock jitter is caused by:

• temperature and voltage variations over time

• temperature and voltage variations across different locations on a chip

• manufacturing variations between different parts

5.1.3 Storage-Related Timing Definitions

5.1.3.1 Flops and Latches

[Figure: Flop Behaviour, waveforms of d, clk, and q]

[Figure: Latch Behaviour, waveforms of d, clk, and q]

Storage devices have two modes: load mode and store mode.

Flops are edge sensitive; they are in load mode just before the clock edge.

Latches are level sensitive; they are in load mode while their enable signal is asserted high (low for active-low latches).

Timing Parameters

[Figure: Flip-flop. Waveforms of d, clk, and q with Setup, Hold, and Clock-to-Q marked]

[Figure: Active-high latch. Waveforms of d, clk, and q with Setup, Hold, and Clock-to-Q marked]

[Figure: Active-low latch. Waveforms of d, clk, and q with Setup, Hold, and Clock-to-Q marked]

Setup and hold define the window in which input data are required to be constant in order to guarantee that the storage device will store data correctly.

Clock-to-Q defines the delay from the clock edge to when the output is guaranteed to be stable.

5.1.4 Propagation Delays

Propagation delay: the time it takes a signal to travel from the source (driving) flop to the destination flop

propagation delay = load delay + interconnect delay

Load delay: combinational gates between the flops

Interconnect delay: wires between gates and flops

5.1.5 Timing Constraints

5.1.5.1 Minimum Clock Period

[Figure: flops a and b clocked by clk1 and clk2; waveforms of clk1, clk2, a, and b over one clock period. Legend: signal may change, signal is stable, signal may rise, signal may fall]

ClockPeriod >
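The right-hand side is left blank on the slide. A commonly used textbook form of the constraint, given here as an assumption and not as the lecture's own answer, sums the delays defined in the preceding sections. The numeric values in the second line are hypothetical and serve only to show the arithmetic.

```latex
\begin{align*}
\text{ClockPeriod} &> \text{ClockToQ} + \text{PropDelay} + \text{Setup} + \text{Skew} + \text{Jitter} \\
\text{ClockPeriod} &> 1.0\ \text{ns} + 6.0\ \text{ns} + 0.5\ \text{ns} + 0.3\ \text{ns} + 0.2\ \text{ns} = 8.0\ \text{ns}
  \quad\Rightarrow\quad f_{\max} < 125\ \text{MHz}
\end{align*}
```

Clock latency does not appear in the sum, consistent with the note in section 5.1.2.2.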

5.1.5.2 Hold Constraint

5.1.5.3 Example Timing Violations

[Figure: Good Timing. Circuit with signals a, b, c, d and clk; waveforms with Clock-to-Q, Prop, Setup, and Hold marked]

Setup Violation

[Figure: Setup Violation. Circuit with signals a, b, c, d and clk; waveforms with Clock-to-Q, Prop, and Setup marked, and unknown values on the captured signal]

Hold Violation

[Figure: Hold Violation. Circuit with signals a, b, c, d and clk; waveforms with Clock-to-Q, Prop, and Hold marked, and unknown values on the captured signal]

5.2 Timing Analysis of Latches and Flip-Flops

In this section, we show how to find the clock-to-Q, setup, and hold times for latches, flip-flops, and other storage elements.

5.2.1 Simple Multiplexer Latch

5.2.1.1 Structure and Behaviour of Multiplexer Latch

[Figure: Multiplexer latch with input i, output o, and select clk, shown in loading / pass-through mode and in storage mode.]

Unfold Multiplexer to Simple Gates

[Figure: Multiplexer symbol and its gate-level implementation (inputs a and b, select s, output o), followed by the latch implementation built from it.]

Latch Glitching

[Figure: Gate-level latch with inputs d and clk and output o.]

Note: inverters on clk. Both of the inverters on the clk signal are needed. Together, they prevent a glitch on the OR gate when clk is deasserted. If there were only one inverter, a glitch would occur. For more on this, see section 5.2.1.6.

Loading and Storing Values

[Figure: Gate-level latch annotated with signal values for four cases: loading '0' (d='0', clk='1'), loading '1' (d='1', clk='1'), storing '0' (clk='0', o='0'), and storing '1' (clk='0').]
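The load and store behaviour can be sketched as a zero-delay functional model. This assumes the multiplexer is unfolded into a load AND gate (d AND clk), a store AND gate (q AND NOT clk), and an OR gate, and it ignores the gate delays and inverter pair that the timing analysis is about. The names mux_latch and run are illustrative.

```python
def mux_latch(d, clk, q_prev):
    """Next output of an active-high multiplexer latch (zero-delay model)."""
    load_path = d & clk              # passes d while clk = 1
    store_path = q_prev & (1 - clk)  # recirculates q while clk = 0
    return load_path | store_path


def run(trace, q=0):
    """Apply a sequence of (d, clk) pairs and return the output after each."""
    outputs = []
    for d, clk in trace:
        q = mux_latch(d, clk, q)
        outputs.append(q)
    return outputs
```

Calling run([(1, 1), (0, 0), (0, 1), (1, 0)]) loads '1', stores it while d changes, loads '0', and then stores '0'.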

5.2.1.2 Strategy for Timing Analysis of Storage Devices

The key to calculating setup and hold times of a latch, flop, etc. is to identify:

1. how the data is stored when not connected to the input (often a pair of inverters in a loop)

2. the gate(s) that the clock uses to cause the stored data to drive the output (often a transmission gate or multiplexer)

3. the gate(s) that the clock uses to cause the input to drive the output (often a transmission gate or multiplexer)

5.2.1.3 Clock-to-Q Time of a Multiplexer Latch

[Figure: Six copies of the gate-level multiplexer latch, with signals clk, d, l1, l2, s1, s2, cn, c2, qn, and q.]

5.2.1.4 Setup Timing of a Multiplexer Latch

[Figure: Frame-by-frame signal values in the multiplexer latch as the value α is captured. Frame captions:]

• Circuit is stable in load mode
• t=0: Clk transitions from load to store
• t=1: Clk transitions from load to store
• t=2: s1 propagates to s2, because cn turns on AND gate
• t=3: l2 is set to 0, because c2 turns off AND gate
• t=4: α from store path propagates to q
• t=5: α from store path completes cycle

Setup Violation

[Figure: Frame-by-frame signal values in the multiplexer latch when d changes from ω to α too close to the clock edge. Frame captions:]

• Circuit is stable in load mode with ω
• t=-1: D transitions from ω to α
• t=0: α propagates through inverter; Clk transitions from load to store
• t=1: α propagates through AND; Clk propagates through inverter
• t=2: old ω propagates through AND. Trouble: inconsistent values on load path and store path. Old value (ω) still in store path when store path is enabled.
• t=3: l2 is set to 0, because c2 turns off AND gate
• t=4: ω/α from store path propagates to q
• t=5: ω/α from store path completes cycle
• t=5: Illustrate instability with ω=0, α=1

[Figure: Waveforms of d, l1, l2, qn, q, s1, s2, clk, cn, and c2 from t=-3 to t=6, labelled "setup with negative margin". The stored value oscillates between α and ω.]

We now repeat the analysis of setup violation, but illustrate the minimum violation (input transitions from ω to α 3 time-units before the clock edge).

[Figure: Frame-by-frame signal values in the multiplexer latch for the minimum violation. Frame captions:]

• Circuit is stable in load mode with ω
• t=-3: D transitions from ω to α
• t=-2: α propagates through inverter
• t=-1: α propagates through AND
• t=0: Clk transitions from load to store
• t=1: Clk propagates through inverter
• t=2: old ω propagates through AND. Trouble: inconsistent values on load path and store path. Old value (ω) still in store path when store path is enabled.
• t=3: l2 is set to 0, because c2 turns off AND gate
• t=4: ω/α from store path propagates to q
• t=5: ω/α from store path completes cycle
• t=5: Illustrate instability with ω=0, α=1

[Figure: Waveforms of d, l1, l2, qn, q, s1, s2, clk, cn, and c2 from t=-3 to t=6, labelled "setup with negative margin".]

Minimum Setup Time

[Figure: Gate-level multiplexer latch and waveforms of d, l1, l2, qn, q, s1, s2, clk, cn, and c2 for an input change from ω to α that meets the setup time.]

5.2.1.5 Hold Time of a Multiplexer Latch

[Figure: Gate-level multiplexer latch with signals clk, d, l1, l2, s1, s2, cn, c2, qn, and q.]

Hold Time Behaviour

[Figure: Six copies of the gate-level multiplexer latch.]

5.2.1.6 Example of a Bad Latch

[Figure: Gate-level latch and waveforms of d, l1, l2, qn, q, s1, s2, clk, c2, and cn while d changes from α to β.]

5.3 Critical Paths and False Paths

5.3.1 Introduction to Critical and False Paths

Definition critical path: The slowest path on the chip between flops or flops and pins. The critical path limits the maximum clock speed.

Definition false path: A path along which an edge cannot travel from beginning to end.

Outline

The algorithm that we present comes from McGeer and Brayton in a DAC 198? paper. The algorithm to find the critical path through a circuit is presented in several parts.

1. Section 5.3.2: Find the longest path ignoring the possibility of false paths.

2. Section 5.3.3: Almost-correct algorithm to test whether a candidate critical path is a false path.

3. Section 5.3.4: If a candidate path is a false path, then find the next candidate path, and repeat the false-path detection algorithm.

4. Section 5.3.5: Correct, complete, and complex algorithm to find the critical path in a circuit.

Notes

Note: The analysis of critical paths and false paths assumes that all inputs change values at exactly the same time. Timing differences between inputs are modelled by the skew parameter in timing analysis.

Throughout our discussion of critical paths, we will use the delay values for gates shown in the table below.

gate   delay
NOT    2
AND    4
OR     4
XOR    6

5.3.1.1 Example of Critical Path in Full Adder

Question: Find the critical path through the full-adder circuit shown below.

[Figure: Full-adder circuit with inputs ci, a, and b, internal signals i, j, and k, and outputs s and co.]

Alternative Excitation

Question: Do the input values of ci=0, a=↓, b=1 exercise the critical path?

[Figure: The same full-adder circuit.]

5.3.1.2 Preliminaries for Critical Paths

5.3.1.3 Longest Path and Critical Path

The longest path through the circuit might not be the critical path, because the behaviour of the gates might prevent an edge (0→1 or 1→0) from travelling along the path.

Example False Path

Question: Determine whether the longest path in the circuit below is a false path.

[Figure: Circuit with inputs a and b and output y, repeated for the four excitations a = 0, b = 0→1; a = 0, b = 1→0; a = 1, b = 0→1; and a = 1, b = 1→0.]

Question: How can we determine analytically that this is a false path?

[Figure: The same circuit with inputs a and b and output y.]

Preview of Complete Example

Question: Find the critical path through the circuit below.

[Figure: Circuit with signals a, b, c, d, e, f, and g, shown twice.]

5.3.2 Longest Path

Outline of Algorithm to Find Longest Path

The basic idea is to annotate each signal with the maximum delay from it to an output.

• Start at destination signals and traverse through fanin to source signals.

– Destination signals have a delay of 0.

– At each gate, annotate the inputs by the delay through the gate plus the delay of the output.

– When a signal fans out to multiple gates, annotate the output of the source (driving) gate with the maximum delay of the destination signals.

• The primary input signal with the maximum delay is the start of the longest path. The delay annotation of this signal is the delay of the longest path.

• The longest path is found by working from the source signal to the destination signals, picking the fanout signal with the maximum delay at each step.
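The outline above can be sketched in a few lines. The netlist below is a hypothetical full-adder-style circuit (its structure is an assumption, not copied from the slide figure), listed in topological order so that walking it in reverse visits every gate after all of its fanout. Gate delays are the ones from the table in section 5.3.1.

```python
GATE_DELAY = {"NOT": 2, "AND": 4, "OR": 4, "XOR": 6}

# Hypothetical netlist, topologically ordered: output -> (gate type, inputs)
CIRCUIT = {
    "j": ("XOR", ["a", "b"]),
    "i": ("AND", ["a", "b"]),
    "k": ("AND", ["j", "ci"]),
    "s": ("XOR", ["j", "ci"]),
    "co": ("OR", ["k", "i"]),
}


def annotate(circuit):
    """Annotate each signal with its maximum delay to any output."""
    delay = {}
    for out in reversed(list(circuit)):
        kind, ins = circuit[out]
        delay.setdefault(out, 0)  # a signal with no fanout is a destination
        for sig in ins:
            delay[sig] = max(delay.get(sig, 0), GATE_DELAY[kind] + delay[out])
    return delay


def longest_path(circuit):
    """Return (path, delay) of the longest path, ignoring false paths."""
    delay = annotate(circuit)
    primary = [sig for sig in delay if sig not in circuit]
    path = [max(primary, key=lambda sig: delay[sig])]
    while True:
        here = path[-1]
        # Candidate next signals, scored by gate delay plus remaining delay.
        fanout = [(GATE_DELAY[kind] + delay[out], out)
                  for out, (kind, ins) in circuit.items() if here in ins]
        if not fanout:
            return path, delay[path[0]]
        path.append(max(fanout)[1])
```

On this netlist the inputs a and b are annotated with 14 and ci with 8, and the longest path runs from a through j and k to co.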


5.3.3 Detecting a False Path

5.3.3.1 Preliminaries

The controlling value of a gate is the value such that if one of the inputs has this value, the output can be determined independently of the other inputs.

The controlled output value is the value produced by the controlling input value.

Gate    Controlling Value    Controlled Output
AND
OR
NAND
NOR
XOR
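The table is left blank on the slide. As a sketch, the entries can be derived mechanically from the two definitions by checking, for each candidate value, whether it fixes the output regardless of the other input. Two-input gates are assumed, and the GATES dictionary and the function name are illustrative.

```python
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
}


def controlling(gate):
    """Return (controlling value, controlled output), or None if there is none."""
    func = GATES[gate]
    for value in (0, 1):
        # Outputs seen when one input is held at `value` and the other varies.
        outputs = {func(value, other) for other in (0, 1)}
        if len(outputs) == 1:
            return value, outputs.pop()
    return None
```

Under this model AND and NAND are controlled by 0, OR and NOR by 1, and XOR has no controlling value, because its output always depends on both inputs.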

Path Input, Side Input

Definition path input: For a gate on a path (either a candidate critical path, or a real critical path), the path input is the input signal that is on the path.

Definition side input: For a gate on a path (either a candidate critical path, or a real critical path), the side inputs are the input signals that are not on the path.

Reconvergent Fanout

Definition reconvergent fanout: There are paths from signals in the fanout of a gate that reconverge at another gate.

[Figure: Example circuits with reconvergent fanout, with signals a, b, c, and y, and signals d, e, f, g, h, and z.]

If a candidate path has reconvergent fanout, then the rising or falling edge on the input to the path might cause a side input along the path to have a rising or falling edge, rather than a stable '0' or '1'.

Rules for Propagating an Edge Along a Path

[Figure: Edge-propagation rules for NOT, AND, OR, and XOR gates, showing the side-input values (stable '0' or '1') under which an edge on the path input reaches the gate output.]

Missing Rules?

Question: Why do the rules not have falling edges for AND gates or rising edges for OR gates on the side input?

[Figure: Gate with inputs a and b and output c, with waveforms of a, b, and c.]

Viability Condition of a Path

Definition viability condition: For a path (p) through a circuit, the viability condition is a Boolean expression in terms of the input signals that defines the cases where an edge will propagate along the path.

Based upon the rules for propagating an edge that we have seen so far, the viability condition for a path is: every side input has a non-controlling value.

As always, section 5.3.5 has the complete viability condition.

5.3.3.2 Almost-Correct Algorithm to Detect a False Path

1. Annotate each side input along the path with its non-controlling value. These annotations are the constraints that must be satisfied for the candidate path to be exercised.

2. Propagate the constraints backward from the side inputs of the path to the inputs of the circuit under consideration.

3. If there is a contradiction amongst the constraints, then the candidate path is a false path.

4. If there is no contradiction, then the constraints on the inputs give the conditions under which an edge will traverse along the candidate path from input to output.
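A sketch of the same test follows, with one substitution stated plainly: instead of propagating constraints backward, it enumerates every assignment of the primary inputs and keeps those under which all side inputs along the path are non-controlling. An empty result corresponds to a contradiction in step 3. The three-gate circuit is hypothetical, and, like the almost-correct algorithm, the sketch treats side inputs as stable values.

```python
from itertools import product

NON_CONTROLLING = {"AND": 1, "OR": 0}
EVAL = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}

# Hypothetical circuit, topologically ordered: signal -> (gate type, inputs)
CIRCUIT = {
    "na": ("NOT", ["a"]),
    "n1": ("AND", ["a", "b"]),
    "y": ("AND", ["n1", "na"]),
}
INPUTS = ["a", "b"]


def simulate(assignment):
    """Evaluate every signal for one assignment of the primary inputs."""
    values = dict(assignment)
    for out, (kind, ins) in CIRCUIT.items():
        values[out] = EVAL[kind](*(values[sig] for sig in ins))
    return values


def viable_assignments(path):
    """Input assignments for which every side input on `path` is non-controlling."""
    found = []
    for bits in product((0, 1), repeat=len(INPUTS)):
        values = simulate(zip(INPUTS, bits))
        ok = True
        for prev, gate in zip(path, path[1:]):
            kind, ins = CIRCUIT[gate]
            for side in ins:
                if side != prev and values[side] != NON_CONTROLLING[kind]:
                    ok = False
        if ok:
            found.append(dict(zip(INPUTS, bits)))
    return found
```

For the path b, n1, y the side inputs require a = 1 (at n1) and a = 0 (at y), so no assignment survives and the path is false. Enumeration is exponential in the number of inputs, so it only serves to illustrate small examples.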

5.3.3.3 Examples of Detecting False Paths

False-Path Example 1

Question: Determine if the longest path in the circuit below is a false path.

[Figure: Circuit with signals a through k, each annotated with its maximum delay to the output.]

side input    non-controlling value    constraint

5.3.4 Finding the Next Candidate Path

If the longest path is a false path, we need to find the next longest path in the circuit, which will be our next candidate critical path. If this candidate fails, we continue to find the next longest of the remaining paths, ad infinitum.

5.3.4.1 Algorithm to Find Next Candidate Path

1. Initialize path table with primary inputs, their potential delay, and fanout.

2. Sort path table by potential delay.

3. If the partial path with the max delay has just one unused fanout signal, then extend the partial path with this signal. Otherwise:

(a) Extend path through unused fanout with max delay.

(b) Delete this fanout signal from the list of unused fanout signals.

4. Compute constraint that side input has non-controlling value.

5. If the new constraint does not cause a contradiction, then return to step 3. Otherwise:

(a) Mark this partial path as false.

(b) For each partial path that is a prefix of the false path:

• recalculate potential delay of path

(c) Return to step 2
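The path table can be approximated with a priority queue. The sketch below is a best-first search, not the table-based procedure itself: each partial path is keyed by its potential delay (delay so far plus the maximum remaining delay of its last signal), so complete paths come out from longest to shortest. It omits the constraint check of steps 4 and 5, and the fanout and remaining-delay data describe a hypothetical full-adder-style circuit.

```python
import heapq


def paths_by_delay(fanout, remaining, primary_inputs):
    """Yield (delay, path) for every input-to-output path, longest first.

    fanout[s]    : list of (next signal, gate delay) pairs driven by s
    remaining[s] : maximum delay from s to any output
    """
    heap = [(-remaining[s], 0, [s]) for s in primary_inputs]
    heapq.heapify(heap)
    while heap:
        neg_potential, so_far, path = heapq.heappop(heap)
        last = path[-1]
        if not fanout.get(last):
            # No fanout: the path is complete and its potential is its delay.
            yield -neg_potential, path
            continue
        for nxt, gate_delay in fanout[last]:
            potential = so_far + gate_delay + remaining[nxt]
            heapq.heappush(heap, (-potential, so_far + gate_delay, path + [nxt]))


# Hypothetical circuit data (gate delays: AND/OR = 4, XOR = 6)
FANOUT = {
    "a": [("j", 6), ("i", 4)],
    "b": [("j", 6), ("i", 4)],
    "ci": [("k", 4), ("s", 6)],
    "j": [("k", 4), ("s", 6)],
    "k": [("co", 4)],
    "i": [("co", 4)],
}
REMAINING = {"a": 14, "b": 14, "ci": 8, "j": 8, "k": 4, "i": 4, "s": 0, "co": 0}
```

Iterating over paths_by_delay(FANOUT, REMAINING, ["a", "b", "ci"]) gives the next candidate each time the previous one turns out to be false.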

5.3.4.2 Examples of Finding Next Candidate Path

Next-Path Example 1

Question: Starting from the initial delay calculation and longest path, find the next candidate path and test if it is a false path.

[Figure: Circuit with signals a through k, each annotated with its maximum delay to the output.]

potential delay    unused fanout    path
10                 e                c
12                 h, g             b
16                 d                a

side input    non-controlling value    constraint

5.3.5 Correct Algorithm to Find Critical Path

We now remove the assumption that side inputs always arrive earlier than path inputs.

5.3.5.1 Rules for Late Side Inputs

[Figure: Rules for early and late side inputs, for the four combinations of path input (CTRL or non-ctrl) and side input (CTRL or non-ctrl). The outcomes shown are: path input propagates, side input propagates, path input causes glitch, side input causes glitch, and neither input propagates. Two cases are marked "monotone speedup".]

The complete and correct rule: a path input excites the gate if the side input is non-controlling, or if the side input arrives late and the path input is controlling.
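The rule is a small Boolean expression. A sketch, with illustrative argument names:

```python
def path_input_excites(side_noncontrolling, side_late, path_controlling):
    """True if an edge on the path input excites the gate under the complete rule."""
    return side_noncontrolling or (side_late and path_controlling)
```

The only case added beyond the almost-correct algorithm is a controlling side input that arrives late while the path input is also controlling: the path input then determines the output before the side input arrives.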

5.3.5.2 Monotone Speedup

Definition monotonic: A function (f) is monotonic if increasing its input causes the output to increase or remain the same. Mathematically: x < y ⟹ f(x) ≤ f(y).

Definition monotonous: A lecture is monotonous if increasing the length of the lecture increases the number of people who are asleep.

Definition monotone speedup: The maximum clock speed of a circuit should be monotonic with respect to the speed of any gate or sub-circuit. That is, if we increase the speed of part of the circuit, we should either increase the clock speed of the circuit, or leave it unchanged.

5.3.5.3 Analysis of Side-Input-Causes-Glitch Situation

5.3.5.4 Complete Algorithm

• If we find a contradiction on the path, check for side inputs that are on previously discovered false paths.

• If a gate and its side input are on a previously discovered false path, then the side input defines a prefix of a false path that is a late-arriving side input.

• For each late-arriving prefix, compute its viability (the conditions under which an edge will propagate along the prefix to the late side input).

• To the row of the late-arriving side input in the constraint table, add as a disjunction the constraint that: the path input has a controlling value and at least one of the prefixes is viable.

5.3.5.5 Complete Examples

Complete Example 1

Question: Find the critical path in the circuit below.

[Figure: Circuit with signals a, b, c, d, e, f, and g.]

potential delay    unused fanout    path
false                               a,b,d,e,f,g
10                 g, c             a
10                                  a,c,f,g

side input    non-controlling value    constraint
f[e]          1                        a
g[a]          1                        a

Complete Example 2

Question: Find the critical path in the circuit below.

[Figure: Circuit with signals a through j, each annotated with its maximum delay to the output.]

potential delay    unused fanout    path
false                               b,d,e,g,h,i,j
8                  f                a
12                 h                c
14                 f, g             b,d,e
14                                  b,d,e,g,i,j

side input    non-ctrl value    constraint
h[c]          0                 c
i[h]          0                 cb
j[f]          0                 ab

Complete Example 3

Monotone speedup

• Critical path ⟨a,c,e,f⟩

• Late side input e[d]

• Total delay 10

• Excitation: a = rising edge

[Figure: Circuit with signals a, b, c, d, e, and f, annotated with signal arrival times for three cases: rising edge excitation, falling edge excitation, and fast timing.]

Complete Example 4

Late side inputs sometimes must have an edge.

Find the second-longest path with contradiction using early sides:

[Figure: Circuit with signals a through k, shown three times with delay annotations and signal values.]

Complete Example 5

Late side paths must be viable.

Question: Find the critical path in the circuit below.

[Figure: Circuit with signals a through k, shown twice.]

5.3.6 Further Extensions to Critical Path Analysis

McGeer and Brayton’s paper includes two extensions to the critical path algorithm presented here that we will not cover.

• gates with more than two inputs

• finding all input values that will exercise the critical path

• multiple paths with the same delay to the same gate

5.3.7 Increasing the Accuracy of Critical Path Analysis

When doing critical path calculations, it is often useful to strike a balance between accuracy and effort. In the examples so far, we assumed that all signals had the same wire and load delays. This assumption simplifies calculations, but reduces accuracy. Section 5.4 discusses how the analog world affects timing analysis.


5.4 Elmore Timing Model

5.4.1 RC-Networks for Timing Analysis

[Figure: P-transistor at four levels: transistor level (gate, source, drain), mask level (poly, p-diff, contact), cross-section of the fabricated transistor (poly, p-diff, contact, substrate), and switch level.]

[Figure: N-transistor at the same four levels: transistor level, mask level (poly, n-diff, contact), cross-section of the fabricated transistor, and switch level.]

Different Levels of Abstraction for Inverter

[Figure: Inverter from a to b at gate level, transistor level (VDD, GND), and mask level (poly, n-diff, p-diff, metal, contact), together with RC-network models of P- and N-transistors using Rpu, Rpd, and Cp.]

RC-Network for Timing Analysis

[Figure: RC-network of the inverter from a to b with Rpu, Rpd, Cp, and load capacitance CL between VDD and GND.]

A Pair of Inverters

[Figure: Two inverters in series (a, b, c) at gate level, transistor level, and mask level.]

A Pair of Inverters (Cont’d)

[Figure: Mask level of the inverter pair; RC-network for timing analysis with Rpu, Rpd, Cp, CL, wire capacitance CW, wire resistance RW, and RV; and the trimmed RC-network for timing analysis.]

A Circuit with Fanout

[Figure: Inverter driving two inverters (signals a, b, c, d) at gate level, gate level with physical layout, transistor level, and mask level.]

A Circuit with Fanout (Cont’d)

[Figure: RC-network for timing analysis of the fanout circuit with Rpu, Rpd, Cp, CL, CW1 to CW3, RW1 to RW3, and RV; the trimmed RC-network; and the cleaned-up RC-network with CW1, RW1, CW2, RW2, RV, and two CL loads.]

5.4.2 Derivation of Analog Timing Model

Real Waveforms

[Figure: Input voltage and output voltage versus time, for a slow input and for a fast input.]

Steps Toward Approximation

We begin with two simplifications as steps toward calculating a single delay value for a circuit.

1. Look at the circuit’s response to a step-function input.

2. Measure the delay to go from GND to 65% of VDD and from VDD to 35% of VDD.

Definition Trip Points: A high or '1' trip point is the voltage level where an upwards transition means the signal represents a '1'.

A low or '0' trip point is the voltage level where a downwards transition means the signal represents a '0'.

[Figure: Signals a and b.]

Node Numbering, Initial Conditions

• The source (VDD in our case) and each capacitor is a node. We number the nodes, capacitors, and resistors. Resistors are numbered according to the capacitor to their right. Multiple resistors in series without an intervening capacitor are lumped into a single resistor.

• All nodes except the source start at GND.

• We calculate the voltage at a node when we turn on the P-transistor (connect to VDD).

The process for analyzing a transition from VDD to GND on a node is the dual of the process just described. The source node is GND, all other nodes start at VDD, and we calculate the voltage when we turn on the N-transistor (connect it to GND).

[Figure: The cleaned-up RC-network of the fanout circuit with nodes numbered 0 to 5 (node 0 is the source) and lumped resistors R1 to R5.]

Define: Path and Downstream

Definition path: The path from the source node to a node i is the set of all resistors between the source and i. Example: path(3) = {R1,R2,R3}

Definition down: The set of capacitors downstream from a node is the set of all capacitors where current would flow through the node to charge the capacitor. You can think of this as the set of capacitors that are between the node and ground. Example: down(2) = {C2,C3,C4,C5}. Example: down(3) = {C3,C4}
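Both sets can be computed from a parent map of the RC tree. The topology below is an assumption inferred from the examples and from the resistance sums used later in the derivation (node 5 branches off node 2, node 4 follows node 3). Resistors and capacitors are identified by their node numbers.

```python
# Assumed RC tree: node -> upstream node; node 0 is the source
PARENT = {1: 0, 2: 1, 3: 2, 4: 3, 5: 2}


def path(i):
    """Indices of the resistors between the source and node i."""
    resistors = set()
    while i != 0:
        resistors.add(i)  # resistor i sits just upstream of node i
        i = PARENT[i]
    return resistors


def down(r):
    """Indices of the capacitors charged through resistor (node) r."""
    return {k for k in PARENT if r in path(k)}
```

With this tree, path(3) is {1, 2, 3}, down(2) is {2, 3, 4, 5}, and down(3) is {3, 4}, which agrees with the examples in the definitions.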

5.4.2.1 Example Derivation: Equation for Voltage at Node 3

$$V_3(t) = V_0(t) - (\text{voltage drop from Node 0 to Node 3})$$

The voltage drop is the sum of the voltage drops across the resistors on the path from Node 0 to Node 3:

$$V_3(t) = V_0(t) - \sum_{r \in path(3)} R_r I_r(t) = V_0(t) - \bigl(R_1 I_1(t) + R_2 I_2(t) + R_3 I_3(t)\bigr)$$

The current through a resistor is the sum of the currents through all of the downstream capacitors:

$$I_r(t) = \sum_{c \in down(r)} I_c$$

$$I_1(t) = I_{c1} + I_{c2} + I_{c3} + I_{c4} + I_{c5}$$
$$I_2(t) = I_{c2} + I_{c3} + I_{c4} + I_{c5}$$
$$I_3(t) = I_{c3} + I_{c4}$$

Substitute $I_r$ into the equation for $V_3$:

$$V_3(t) = V_0(t) - \bigl[ R_1 (I_{c1} + I_{c2} + I_{c3} + I_{c4} + I_{c5}) + R_2 (I_{c2} + I_{c3} + I_{c4} + I_{c5}) + R_3 (I_{c3} + I_{c4}) \bigr]$$

Use associativity to group terms by currents:

$$V_3(t) = V_0(t) - \bigl[ I_{c1} R_1 + I_{c2} (R_1 + R_2) + I_{c3} (R_1 + R_2 + R_3) + I_{c4} (R_1 + R_2 + R_3) + I_{c5} (R_1 + R_2) \bigr]$$

Current through a capacitor:

$$I_c(t) = C_c \frac{\partial V_c(t)}{\partial t}$$

Substitute $I_c$ into the equation for $V_3$:

$$V_3(t) = V_0(t) - \Bigl[ R_1 C_{c1} \frac{\partial V_{c1}(t)}{\partial t} + (R_1 + R_2) C_{c2} \frac{\partial V_{c2}(t)}{\partial t} + (R_1 + R_2 + R_3) C_{c3} \frac{\partial V_{c3}(t)}{\partial t} + (R_1 + R_2 + R_3) C_{c4} \frac{\partial V_{c4}(t)}{\partial t} + (R_1 + R_2) C_{c5} \frac{\partial V_{c5}(t)}{\partial t} \Bigr]$$

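$$R_{i,k} = \sum_{r \in path(i) \cap path(k)} R_r$$

$$R_{3,1} = R_1$$
$$R_{3,2} = R_1 + R_2$$
$$R_{3,3} = R_1 + R_2 + R_3$$
$$R_{3,4} = R_1 + R_2 + R_3$$
$$R_{3,5} = R_1 + R_2$$

Substitute $R_{i,k}$ into $V_3$:

$$V_3(t) = V_0(t) - \Bigl[ R_{3,1} C_{c1} \frac{\partial V_{c1}(t)}{\partial t} + R_{3,2} C_{c2} \frac{\partial V_{c2}(t)}{\partial t} + R_{3,3} C_{c3} \frac{\partial V_{c3}(t)}{\partial t} + R_{3,4} C_{c4} \frac{\partial V_{c4}(t)}{\partial t} + R_{3,5} C_{c5} \frac{\partial V_{c5}(t)}{\partial t} \Bigr]$$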

5.4.2.2 General Derivation

$$V_i(t) = V_0(t) - (\text{voltage drop from Node 0 to Node } i)$$

The voltage drop is the sum of the voltage drops across the resistors on the path from Node 0 to Node i:

$$V_i(t) = V_0(t) - \sum_{r \in path(i)} R_r I_r(t)$$

The current through a resistor is the sum of the currents through all of the downstream capacitors:

$$I_r(t) = \sum_{c \in down(r)} I_c$$

Substitute $I_r$ into the equation for $V_i$:

$$V_i(t) = V_0(t) - \sum_{r \in path(i)} R_r \sum_{c \in down(r)} I_c$$

Use associativity to push $R_r$ into the summation over c:

$$V_i(t) = V_0(t) - \sum_{r \in path(i)} \sum_{c \in down(r)} R_r I_c$$

Current through a capacitor:

$$I_c(t) = C_c \frac{\partial V_c(t)}{\partial t}$$

Substitute $I_c$ into the equation for $V_i$:

$$V_i(t) = V_0(t) - \sum_{r \in path(i)} \sum_{c \in down(r)} R_r C_c \frac{\partial V_c(t)}{\partial t}$$

A little bit of handwaving to prepare for Elmore resistance. Resistor r contributes to capacitor k exactly when r lies on both path(i) and path(k), so the double sum can be reordered by node:

$$V_i(t) = V_0(t) - \sum_{k \in Nodes} \Bigl( \sum_{r \in path(i) \cap path(k)} R_r \Bigr) C_k \frac{\partial V_k(t)}{\partial t}$$

Define Elmore resistance $R_{i,k}$:

$$R_{i,k} = \sum_{r \in path(i) \cap path(k)} R_r$$

Substitute $R_{i,k}$ into $V_i$:

$$V_i(t) = V_0(t) - \sum_{k \in Nodes} R_{i,k} C_k \frac{\partial V_k(t)}{\partial t}$$

5.4.3 Elmore Timing Model

• Assume that V0(t) is a step function from 0 to 1 at time 0.

• Derive upper and lower bounds for Vi(t).

• Find RC time constants for upper and lower bounds.

• Elmore delay is guaranteed to be between upper and lower bounds.

[Figure: Voltage versus time curves for the upper and lower bounds, the Elmore model, and the RC-network model, with the times TDi−TRi, TP−TRi, TRi, TDi, and TP marked.]

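Equations for Curves

Time breakpoints: 0, TDi − TRi, TP − TRi, ∞

Upper bound (expressions in order of increasing time):

$$1 + \frac{t - T_{Di}}{T_P}, \qquad 1 - \frac{T_{Ri}}{T_P} e^{(T_{Di} - T_P - t)/T_{Ri}}$$

Elmore:

$$1 - e^{-t/T_{Di}}$$

Lower bound (expressions in order of increasing time):

$$0, \qquad 1 - \frac{T_{Di}}{t + T_{Ri}}, \qquad 1 - \frac{T_{Di}}{T_P} e^{(T_P - T_{Ri} - t)/T_P}$$

Fact: $0 \le T_{Ri} \le T_{Di} \le T_P$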

Definitions of Time Constants

$$T_{Ri} = \frac{\sum_{k \in Nodes} R_{k,i}^2 C_k}{R_{i,i}}$$ Mathematical artifact, no intuitive meaning

$$T_{Di} = \sum_{k \in Nodes} R_{k,i} C_k$$ Elmore delay

$$T_P = \sum_{k \in Nodes} R_{k,k} C_k$$ RC-time constant for lumped network
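The three time constants can be evaluated directly from these definitions. The sketch below assumes the tree topology of the node-numbering example (node 5 branching off node 2) and uses made-up resistor and capacitor values, chosen only to make the arithmetic easy to follow.

```python
# Assumed RC tree: node -> upstream node; node 0 is the source
PARENT = {1: 0, 2: 1, 3: 2, 4: 3, 5: 2}
R = {1: 1.0, 2: 2.0, 3: 1.0, 4: 1.0, 5: 3.0}  # hypothetical resistances
C = {1: 1.0, 2: 1.0, 3: 2.0, 4: 1.0, 5: 1.0}  # hypothetical capacitances


def path(i):
    """Indices of the resistors between the source and node i."""
    resistors = set()
    while i != 0:
        resistors.add(i)
        i = PARENT[i]
    return resistors


def elmore_resistance(i, k):
    """R_{i,k}: total resistance shared by the paths to nodes i and k."""
    return sum(R[r] for r in path(i) & path(k))


def time_constants(i):
    """Return (TR_i, TD_i, TP) for node i."""
    td = sum(elmore_resistance(k, i) * C[k] for k in C)
    tp = sum(elmore_resistance(k, k) * C[k] for k in C)
    tr = sum(elmore_resistance(k, i) ** 2 * C[k] for k in C) / elmore_resistance(i, i)
    return tr, td, tp
```

For node 3 with these values, TD is 19, TP is 23, and TR is 16.75, consistent with the ordering 0 ≤ TRi ≤ TDi ≤ TP.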

Picking the Trip Point

$$V_i(t) = V_{DD} (1 - e^{-t/T_{Di}})$$

Pick a trip point of $V_i(t) = 0.65 V_{DD}$, then solve for t:

$$0.65 V_{DD} = V_{DD} (1 - e^{-t/T_{Di}})$$

$$0.35 = e^{-t/T_{Di}}$$

Take ln of both sides:

$$\ln 0.35 = \ln(e^{-t/T_{Di}})$$

$$\ln 0.35 = -1.05 \approx -1.0$$

$$-1.0 = -t/T_{Di}$$

$$t = T_{Di}$$

By picking a trip point of 0.65 VDD, the time for Vi to reach the trip point is the Elmore delay.
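The approximation can be checked numerically. This sketch solves the single-exponential model for the time to reach an arbitrary trip fraction of VDD; the function name is illustrative.

```python
import math


def time_to_trip(td, trip_fraction=0.65):
    """Time for VDD * (1 - exp(-t / td)) to reach trip_fraction * VDD."""
    return -td * math.log(1.0 - trip_fraction)
```

With the default trip point the result is about 1.05 times TDi, which is the 5% error accepted by rounding ln 0.35 to -1. A trip fraction of 1 − 1/e (about 0.632) would give exactly TDi.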

5.4.4 Examples of Using Elmore Delay

5.4.4.1 Interconnect with Single Fanout

[Figure: Gate G1 driving gate G2 through wire segments and antifuses, shown as a layout sketch and as an RC-network with Rpu, Rpd, Cp, Ra1 to Ra4, Rw1 to Rw3, C1 to C3, and CG2.]

G*     gate
C*     capacitance on wire
Ra*    resistance through antifuse
Rw*    resistance through wire

Question: Calculate delay from gate 1 to gate 2.

[Figure: The RC-network from G1 to G2 with Rpu, Rpd, Cp, Ra1 to Ra4, Rw1 to Rw3, C1 to C3, and CG2.]

Doubling Antifuses

Question: If you double the number of antifuses and wires needed to connect two gates, what will be the approximate effect on the wire delay between the gates?


5.4.4.2 Interconnect with Multiple Gates in Fanout

[Figure: gate G1 driving gates G2 and G3.]

Question: Assuming that wire resistance is much less than antifuse resistance and that all antifuses have equal resistance, calculate the delay from the source inverter (G1) to G2

Delay to G2 vs G3

Question: Assuming all wire segments at same level have roughly the same capacitance, which is greater, the delay to G2 or the delay to G3?

[Figure: G1 driving G2 and G3 through a tree of resistances R1 to R6 and capacitances C1 to C7, and the equivalent RC model with driver resistances Rpu and Rpd, capacitance Cp, input Vi, and nodes n1 to n7.]

5.5 Practical Usage of Timing Analysis

Speed Grading

• Fabs sort chips according to their speed (sorting is known as speed grading or speed binning)

• Faster chips are more expensive

• In FPGAs, sorting is usually based on propagation delay through an FPGA cell. As wires become a larger portion of delay, some analysis of wire delays is also being done.

• Propagation delay is the average of the rising and falling propagation delays.

• Typical speed grades for FPGAs:

Std   standard speed grade
1     15% faster than Std
2     25% faster than Std
3     35% faster than Std

Worst-Case Timing

• Maximum Delay in CMOS. When?


– Minimum voltage

– Maximum temperature

– Slow-slow conditions (the process variation/corner that results in a slow p-channel and a slow n-channel). We could also have fast-fast, slow-fast, and fast-slow process corners

• Increasing temperature increases delay

– ⇑ Temp =⇒ ⇑ atomic vibration

– ⇑ atomic vibration =⇒ ⇑ collisions with electrons carrying current

– ⇑ collisions =⇒ ⇑ resistivity

– ⇑ resistivity =⇒ ⇑ delay

• Increasing supply voltage decreases delay

– ⇑ supply voltage =⇒ ⇑ current

– ⇑ current =⇒ ⇓ load capacitor charge time

– ⇓ load capacitor charge time =⇒ ⇓ total delay

• Derating factor is a number used to adjust timing numbers to account for voltage and temp conditions

• ASIC manufacturers' classes, based on variety of environments:

Class        VDD         Temperature (TA, ambient temp, or TC, case temp)
Commercial   5V ± 5%     0 to +70C
Industrial   5V ± 10%    –40 to +85C
Military     5V ± 10%    –55 to +125C

• What is important is the transistor temperature inside the chip, TJ (junction temperature)

5.5.1 Speed Binning

Speed binning is the process of testing each manufactured part to determine the maximum clock speed at which it will run reliably.

Manufacturers sell chips off of the same manufacturing line at different pricesbased on how fast they will run.

A “speed bin” is the clock speed that chips will be labeled with when sold.

Overclocking: running a chip at a clock speed faster than what it is rated for (and hoping that your software crashes more frequently than your over-stressed hardware will).

5.5.1.1 FPGAs, Interconnect, and Synthesis

On FPGAs 40-60% of clock cycle is consumed by interconnect.

When synthesizing, increasing effort (number of iterations) of place and route can significantly reduce the clock period on large designs.


5.5.2 Worst Case Timing

5.5.2.1 Fanout delay

In Smith’s book, Table 5.2 (Fanout delay) combines two separate parameters:

• capacitive load delay

• interconnect delay

into a single parameter (fanout). This is common, and fine.

But, when reading a table such as this, you need to know whether fanout delay is combining both capacitive load delay and interconnect delay, or is just capacitive load.

5.5.2.2 Derating Factors

Delays are dependent upon supply voltage and temperature.

⇑ Temp =⇒ ⇑ Delay

⇑ Supply voltage =⇒ ⇓ Delay

Temperature

• ⇑ Temp =⇒ ⇑ Delay

– ⇑ Temp =⇒ ⇑ Resistivity of wires

– As temp goes up, atoms vibrate more, and so have greater probability of colliding with electrons flowing with current.

Supply Voltage

• ⇑ Supply voltage =⇒ ⇓ Delay

– ⇑ Supply voltage =⇒ ⇑ current (V = IR)

– ⇑ current =⇒ ⇓ time to charge load capacitors to threshold voltage


Derating Factor Definition

A “derating factor” is a number to adjust timing numbers to account for different temperature and voltage conditions.

Excerpt from table 5.3 in Smith’s book (Actel Act 3 derating factors):

Derating factor   Temp    Vdd
1.17              125C    4.5V
1.00              70C     5.0V
0.63              -55C    5.5V
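A derating factor is applied by multiplying it into the delay quoted at the reference condition (the row with factor 1.00). The sketch below uses only the three rows excerpted above; the dictionary, the function name, and the 10 ns nominal delay are assumptions for illustration.

```python
# (temperature in C, supply voltage in V) -> derating factor, from the excerpt
DERATING = {
    (125, 4.5): 1.17,
    (70, 5.0): 1.00,
    (-55, 5.5): 0.63,
}


def derated_delay(nominal_delay_ns, temp_c, vdd):
    """Scale a delay quoted at 70C / 5.0V to another tabulated condition."""
    return nominal_delay_ns * DERATING[(temp_c, vdd)]


worst_case = derated_delay(10.0, 125, 4.5)   # hot and low voltage: slower
best_case = derated_delay(10.0, -55, 5.5)    # cold and high voltage: faster
print(worst_case, best_case)
```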

Chapter 6

Power Analysis and Power-Aware Design

6.1 Overview

6.1.1 Importance of Power and Energy

• Laptops, PDAs, cell phones, etc: obvious!

• For microprocessors in personal computers, every watt above 40W adds $1 to manufacturing cost

• Approx 25% of operating expense of server farm goes to energy bills

• (Dis)Comfort of Unix labs in E2

• Sandia Labs had to build a special sub-station when they took delivery of the Teraflops massively parallel supercomputer (over 9000 Pentium Pros)

• High-speed microprocessors today can run so hot that they will damage themselves: Athlon reliability problems, Pentium 4 processor thermal throttling

• In 2000, information technology consumed 8% of total power in US.

• Future power viruses: cell phone viruses that cause the cell phone to run in full power mode and consume battery very quickly; PC viruses that cause the CPU to melt down batteries

6.1.2 Industrial Names and Products

Note: Lots of links from E&CE 327 web pages under “Documentation”

6.1.3 Power vs Energy

Most people talk about “power” reduction, but sometimes they mean “power” and sometimes “energy.”

• Power minimization is usually about heat removal

• Energy minimization is usually about battery life or energy costs

Type     Units    Equivalent Types   Equations
Energy   Joules   Work               $= \text{Volts} \times \text{Coulombs} = \frac{1}{2} \times C \times \text{Volts}^2$
Power    Watts    Energy / Time      $= \text{Volts} \times I = \text{Joules}/\text{sec}$

6.1.4 Batteries, Power and Energy

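6.1.4.1 Do Batteries Store Energy or Power?

$\text{Energy} = \text{Volts} \times \text{Coulombs}$

$\text{Power} = \frac{\text{Energy}}{\text{Time}}$

Batteries are rated in Amp-hours at a voltage.

$\text{battery} = \text{Amps} \times \text{Seconds} \times \text{Volts}$

$= \frac{\text{Coulombs}}{\text{Seconds}} \times \text{Seconds} \times \text{Volts}$

$= \text{Coulombs} \times \text{Volts}$

$= \text{Energy}$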

Batteries store energy.


6.1.4.2 Battery Life and Efficiency

To extend battery life, we want to increase the amount of work done and/or decrease energy consumed.

Work and energy are same units, therefore to extend battery life, we truly want to improve efficiency.

“Power efficiency” of microprocessors normally measured in MIPS/Watt. Is this a real measure of efficiency?

$\frac{\text{MIPS}}{\text{Watts}} = \frac{\text{millions of instructions}}{\text{Seconds}} \times \frac{\text{Seconds}}{\text{Energy}}$

$= \frac{\text{millions of instructions}}{\text{Energy}}$

Both instructions executed and energy are measures of work, so MIPS/Watt is a measure of efficiency.

Question: What is the weakness of this analysis?


6.1.4.3 Battery Life and Power

Question: Running a VHDL simulation requires executing an average of 1 million instructions per simulation step. My computer runs at 700MHz, has a CPI of 1.0, and burns 70W of power. My battery is rated at 10V and 2.5AH. Assuming all of my computer’s clock cycles go towards running VHDL simulations, how many simulation steps can I run on one battery charge?

Battery Life and Power

Question: If I use the SpeedStep feature of my computer, my computer runs at 600MHz with 60W of power. With SpeedStep activated, how much longer can I keep the computer running on one battery?

Battery Life and Power

Question: With SpeedStep activated, how many more simulation steps can I run on one battery?
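The three battery questions above reduce to unit conversions. The following is a minimal sketch of one way to work them, assuming an ideal battery that delivers its full rated energy at either power level; the function names are illustrative and not from the course material.

```python
def battery_energy_joules(volts, amp_hours):
    """Amp-hours * 3600 s/hour gives Coulombs; Coulombs * Volts gives Joules."""
    return volts * amp_hours * 3600.0


def runtime_seconds(energy_joules, power_watts):
    return energy_joules / power_watts


def simulation_steps(clock_hz, cpi, seconds, instructions_per_step):
    instructions = clock_hz / cpi * seconds
    return instructions / instructions_per_step


energy = battery_energy_joules(10.0, 2.5)

normal_time = runtime_seconds(energy, 70.0)
normal_steps = simulation_steps(700e6, 1.0, normal_time, 1e6)

speedstep_time = runtime_seconds(energy, 60.0)
speedstep_steps = simulation_steps(600e6, 1.0, speedstep_time, 1e6)

print(normal_time, normal_steps)
print(speedstep_time - normal_time, speedstep_steps - normal_steps)
```

Under these assumptions SpeedStep extends the running time but not the number of simulation steps, because power and clock speed are scaled by the same ratio.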


6.2 Power Equations

$\text{Power} = \underbrace{\text{SwitchPower} + \text{ShortPower}}_{\text{DynamicPower}} + \underbrace{\text{LeakagePower}}_{\text{StaticPower}}$

Dynamic Power dependent upon clock speed

Switching Power useful — charges up transistors

Short Circuit Power not useful — both N and P transistors are on

Static Power independent of clock speed

Leakage Power not useful — leaks around transistor


Dynamic Power

Dynamic power is proportional to how often signals change their value (switch).

• Roughly 20% of signals switch during a clock cycle.

• Need to take glitches into account when calculating activity factor. Glitches increase the activity factor.

• Equations for dynamic power contain clock speed and activity factor.


6.2.1 Switching Power

[Figure: Charging a capacitor: circuit with load CapLoad and signal transitions labelled 1->0 and 0->1.]

[Figure: Discharging a capacitor: circuit with load CapLoad and signal transitions labelled 0->1 and 1->0.]

$\text{energy to (dis)charge capacitor} = \frac{1}{2} \times \text{CapLoad} \times \text{VoltSup}^2$

Switching Power

When a capacitor C is charged to a voltage V, the energy stored in the capacitor is $\frac{1}{2}CV^2$.

The energy required to charge the capacitor from 0 to V is $CV^2$. Half of the energy ($\frac{1}{2}CV^2$) is dissipated as heat through the pullup resistance. Half of the energy is transferred to the capacitor.

When the capacitor discharges from V to 0, the energy stored in the capacitor ($\frac{1}{2}CV^2$) is dissipated as heat through the pulldown resistance.

Switching Power

f′: frequency at which the inverter goes through a complete charge-discharge cycle (eqn 15.4 in Smith).

$\text{average switching power} = f' \times \text{CapLoad} \times \text{VoltSup}^2$

ClockSpeed: clock speed
ActFact: average number of times that a signal switches from 0→1 or from 1→0 during a clock cycle

$\text{average switching power} = \frac{1}{2} \times \text{ActFact} \times \text{ClockSpeed} \times \text{CapLoad} \times \text{VoltSup}^2$
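The two forms of the switching-power equation agree if a complete charge-discharge cycle is counted as two signal switches, so that f′ = ActFact × ClockSpeed / 2. The sketch below checks this numerically; the function names and the example values (20% activity, 100 MHz, 10 fF, 2.5 V) are assumptions chosen for illustration.

```python
import math


def switching_power(act_fact, clock_hz, cap_load_f, volt_sup):
    """Average switching power from activity factor and clock speed."""
    return 0.5 * act_fact * clock_hz * cap_load_f * volt_sup ** 2


def switching_power_from_cycles(cycle_hz, cap_load_f, volt_sup):
    """Average switching power from the charge-discharge cycle frequency f'."""
    return cycle_hz * cap_load_f * volt_sup ** 2


act_fact, clock_hz, cap_load_f, volt_sup = 0.2, 100e6, 10e-15, 2.5
p_activity = switching_power(act_fact, clock_hz, cap_load_f, volt_sup)

# One full cycle (0->1 then 1->0) is two switches, so f' = ActFact * ClockSpeed / 2.
p_cycles = switching_power_from_cycles(act_fact * clock_hz / 2.0, cap_load_f, volt_sup)
print(p_activity, p_cycles)
```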


6.2.2 Short-Circuited Power

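[Figure: inverter with input Vi and output Vo, showing the short-circuit current IShort flowing from VoltSup to GND; plot of gate voltage with levels VoltThresh and VoltSup - VoltThresh, marking where the P-transistor and the N-transistor are on and the interval TimeShort.]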

PwrShort = ActFact×ClockSpeed×TimeShort× IShort×VoltSup


6.2.3 Leakage Power

[Figure: cross section of inverter showing parasitic diode (N-substrate, P and N regions, Vi, Vo).]

[Figure: I-V curve showing the leakage current ILeak through the parasitic diode.]

$\text{PwrLk} = \text{ILeak} \times \text{VoltSup}$

$\text{ILeak} \propto e^{\left(\frac{-q \times \text{VoltThresh}}{k \times T}\right)}$

6.2.4 Glossary

This section reserved for your reading pleasure

6.2.5 Note on Power Equations

This section reserved for your reading pleasure

6.3 Overview of Power Reduction Techniques

We can divide power reduction techniques into two classes: analog and digital.

Analog Parameters

Power reduction parameters at the analog level.

capacitance for example, Silicon on Insulator (SOI)

resistance for example, copper wires

voltage low-voltage circuits


Analog Techniques

Power reduction techniques at the analog level.

dual-VDD Two different supply voltages: high voltage for performance-critical portions of design, low voltage for remainder of circuit. Alternatively, can vary voltage over time: high voltage when running performance-critical software and low voltage when running software that is less sensitive to performance.

dual-Vt Two different threshold voltages: transistors with low threshold voltage for performance-critical portions of design (can switch more quickly, but more leakage power), transistors with high threshold voltage for remainder of circuit (switches more slowly, but reduces leakage power).

exotic circuits Special flops, latches, and combinational circuitry that run at a high frequency while minimizing power

adiabatic circuits Special circuitry that consumes power on 0→1 transitions, but not 1→0 transitions. These sacrifice performance for reduced power.

clock trees Up to 30% of total power can be consumed in clock generation and clock tree

Digital Parameters

Power-reduction parameters at the digital level.

capacitance (number of gates)

activity factor

clock frequency


Digital Techniques

Power-reduction techniques at the digital level.

multiple clocks Put a high speed clock in performance-critical parts of design and a low speed clock for remainder of circuit

clock gating Turn off clock to portions of a chip when it’s not being used

data encoding Gray coding vs one-hot vs fully encoded vs ...

glitch reduction Adjust circuit delays or add redundant circuitry to reduce or eliminate glitches.

asynchronous circuits Get rid of clocks altogether....

Additional low-power design techniques for RTL from a Qualis engineer: http://home.europa.com/~celiac/lowpower.html

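6.4 Voltage Reduction for Power Reduction

If our goal is to reduce power, the most promising approach is to reduce the supply voltage, because, from:

$\text{Power} = (\text{ActFact} \times \text{ClockSpeed} \times \frac{1}{2}\text{CapLoad} \times \text{VoltSup}^2)$
$\quad + (\text{ActFact} \times \text{ClockSpeed} \times \text{TimeShort} \times \text{IShort} \times \text{VoltSup})$
$\quad + (\text{ILeak} \times \text{VoltSup})$

we observe:

$\text{Power} \propto \text{VoltSup}^2$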
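To see where the quadratic dependence comes from, the three terms can be evaluated separately. This is a sketch only: it holds every parameter except VoltSup fixed, which is an idealization, and all names and example values are assumptions for illustration.

```python
import math


def power_terms(act_fact, clock_hz, cap_load, volt_sup, time_short, i_short, i_leak):
    """Return (switching, short-circuit, leakage) power from the equation above."""
    switching = act_fact * clock_hz * 0.5 * cap_load * volt_sup ** 2
    short = act_fact * clock_hz * time_short * i_short * volt_sup
    leakage = i_leak * volt_sup
    return switching, short, leakage


params = dict(act_fact=0.2, clock_hz=100e6, cap_load=10e-15,
              time_short=50e-12, i_short=1e-6, i_leak=1e-9)

full = power_terms(volt_sup=5.0, **params)
half = power_terms(volt_sup=2.5, **params)

# Ratio of each term when the supply voltage is halved.
ratios = [h / f for h, f in zip(half, full)]
print(ratios)
```

With the other parameters held fixed, halving the supply voltage quarters the switching term and halves the short-circuit and leakage terms.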


Reducing Difference Between Supply and Threshold Voltage

As the supply voltage decreases, it takes longer to charge up the capacitive load, which increases the load delay of a circuit.

In the chapter on timing analysis, we saw that increasing the supply voltage will decrease the delay through a circuit. (From V = IR, increasing V causes an increase in I, which causes the capacitive load to charge more quickly.) However, it is more accurate to take into account both the value of the supply voltage, and the difference between the supply voltage and the threshold voltage.

$\text{MaxClockSpeed} \propto \frac{(\text{VoltSup} - \text{VoltThresh})^2}{\text{VoltSup}}$

Effect of Decreasing Supply Voltage on Delay

Question: If the delay along the critical path of a circuit is 20 ns, the supply voltage is 2.8 V, and the threshold voltage is 0.7 V, calculate the critical path delay if the supply voltage is dropped to 2.2 V.
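One way to approach this question is to apply the MaxClockSpeed proportionality at both supply voltages and scale the delay by the ratio. The sketch below assumes the critical path delay is inversely proportional to MaxClockSpeed and that the threshold voltage stays at 0.7 V; the function names are illustrative.

```python
def relative_max_clock_speed(volt_sup, volt_thresh):
    """Quantity that MaxClockSpeed is proportional to."""
    return (volt_sup - volt_thresh) ** 2 / volt_sup


def scaled_delay(delay_ns, v_old, v_new, volt_thresh):
    """Scale a delay from supply v_old to supply v_new (delay ~ 1 / speed)."""
    speed_old = relative_max_clock_speed(v_old, volt_thresh)
    speed_new = relative_max_clock_speed(v_new, volt_thresh)
    return delay_ns * speed_old / speed_new


new_delay = scaled_delay(20.0, v_old=2.8, v_new=2.2, volt_thresh=0.7)
print(new_delay)
```

Under these assumptions the delay grows by a factor of about 1.54, to roughly 30.8 ns.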


Reducing Threshold Voltage Increases Leakage Current

If we reduce the supply voltage, we want to also reduce the threshold voltage, so that we do not increase the delay through the circuit. However, as threshold voltage drops, leakage current increases:

$\text{ILeak} \propto e^{\left(\frac{-q \times \text{VoltThresh}}{k \times T}\right)}$

And increasing the leakage current increases the power:

$\text{Power} \propto \text{ILeak}$

So, we need to strike a balance between reducing VoltSup (which has a quadratic effect on reducing power), and increasing ILeak, which has a linear effect on increasing power.
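The exponential makes leakage very sensitive to the threshold voltage. The sketch below evaluates the proportionality exactly as written above, using the standard values of the electron charge q and Boltzmann's constant k; the function name and the example of a 0.1 V reduction at 300 K are assumptions for illustration, and a real process may be less sensitive than this idealized expression suggests.

```python
import math

Q_ELECTRON = 1.602176634e-19   # Coulombs
K_BOLTZMANN = 1.380649e-23     # Joules per Kelvin


def leakage_ratio(vt_old, vt_new, temp_kelvin):
    """ILeak(vt_new) / ILeak(vt_old) using ILeak ~ exp(-q * Vt / (k * T))."""
    return math.exp(Q_ELECTRON * (vt_old - vt_new) / (K_BOLTZMANN * temp_kelvin))


ratio = leakage_ratio(0.7, 0.6, 300.0)
print(ratio)
```

With this expression, lowering the threshold voltage by 0.1 V at 300 K multiplies the leakage current by several tens.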


6.5 Data Encoding for Power Reduction

6.5.1 How Data Encoding Can Reduce Power

Data encoding is a technique that chooses data values so that normal execution will have a low activity factor.

The most common example is “Gray coding” where exactly one bit changes value each clock cycle when counting.

Decimal   Gray   Binary
0         0000   0000
1         0001   0001
2         0011   0010
3         0010   0011
4         0110   0100
5         0111   0101
6         0101   0110
7         0100   0111
8         1100   1000
9         1101   1001
10        1111   1010
11        1110   1011
12        1010   1100
13        1011   1101
14        1001   1110
15        1000   1111

8-bit Counter

Question: For an eight-bit counter, how much more power will a binary counter consume than a Gray-code counter?
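A starting point for this question is to count how many counter bits toggle over one full pass through all 256 values. The sketch below counts only the counter's output bits and ignores the next-state logic, so it is an estimate of relative activity rather than a full power figure; the function names are illustrative.

```python
def to_gray(n):
    """Standard reflected Gray code of n."""
    return n ^ (n >> 1)


def toggles_per_full_count(width, encode):
    """Total bit toggles over one full wrap-around count of a width-bit counter."""
    size = 1 << width
    total = 0
    for n in range(size):
        current = encode(n)
        following = encode((n + 1) % size)
        total += bin(current ^ following).count("1")
    return total


binary_toggles = toggles_per_full_count(8, lambda n: n)
gray_toggles = toggles_per_full_count(8, to_gray)
print(binary_toggles, gray_toggles, binary_toggles / gray_toggles)
```

Counting this way, the binary counter toggles 510 bits per 256 cycles against 256 for the Gray-code counter, so its activity is almost twice as high.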


Random Data

Question: For completely random eight-bit data, how much more power will a binary circuit consume than a Gray-code circuit?

6.5.2 Example Problem: Sixteen Pulser

6.5.2.1 Problem Statement

Your task is to do the power analysis for a circuit that should send out a one-clock-cycle pulse on the done signal once every 16 clock cycles. (That is, done is ’0’ for 15 clock cycles, then ’1’ for one cycle, then repeat with 15 cycles of ’0’ followed by a ’1’, etc.)

[Figure: Required behaviour: waveforms of clk and done, with clock cycles 1 to 33 marked.]

You have been asked to consider three different types of counters: a binary counter, a Gray-code counter, and a one-hot counter. (The table below shows the values from 0 to 15 for the different encodings.)

Question: What is the relative amount of power consumption for the different options?

6.5.2.2 Additional Information

Your implementation technology is an FPGA where each cell has a programmable combinational circuit and a flip-flop. The combinational circuit has 4 inputs and 1 output. The capacitive load of the combinational circuit is twice that of the flip-flop.

[Figure: FPGA cell containing a PLA and a flip-flop.]

1. You may neglect power associated with clocks.

2. You may assume that all counters:

(a) are implemented on the same fabrication process

(b) run at the same clock speed

(c) have negligible leakage and short-circuit currents


Data Encoding

Decimal   Gray   One-Hot            Binary
0         0000   0000000000000001   0000
1         0001   0000000000000010   0001
2         0011   0000000000000100   0010
3         0010   0000000000001000   0011
4         0110   0000000000010000   0100
5         0111   0000000000100000   0101
6         0101   0000000001000000   0110
7         0100   0000000010000000   0111
8         1100   0000000100000000   1000
9         1101   0000001000000000   1001
10        1111   0000010000000000   1010
11        1110   0000100000000000   1011
12        1010   0001000000000000   1100
13        1011   0010000000000000   1101
14        1001   0100000000000000   1110
15        1000   1000000000000000   1111

6.5.2.3 Answer

Sketch the Circuitry

Name the output “done” and the count digits “d()”.


Capacitance

                        cap   number   subtotal cap
Gray     d()    PLAs
                Flops
         done   PLAs
                Flops
1-Hot    d()    PLAs
                Flops
         done   PLAs
                Flops
Binary   d()    PLAs
                Flops
         done   PLAs
                Flops

Activity Factors

Gray Coding Activity Factor

[Figure: Gray coding: waveforms of d(0), d(1), d(2), d(3), done, and clk, with activity-factor labels 4/16, 2/16, 2/16, 2/16, and 8/16.]

One-Hot Activity Factor

[Figure: One-hot coding: waveforms of d(0), d(1), d(2), done, and clk, with activity-factor labels 2/16, 2/16, 2/16, 2/16, and 2/16.]

Binary Coding Activity Factor

[Figure: Binary coding: waveforms of d(0), d(1), d(2), d(3), done, and clk, with activity-factor labels 8/16, 4/16, 2/16, 2/16, and 16/16.]
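The per-signal activity factors for the three encodings can be cross-checked by counting how often each counter bit changes over one 16-cycle period, including the wrap from 15 back to 0. This sketch is an illustration with made-up function names; it counts toggles of the counter bits only, and the done signal (one pulse per period) toggles twice in every encoding.

```python
def to_gray(n):
    return n ^ (n >> 1)


def to_one_hot(n):
    return 1 << n


def toggles_per_bit(encode, width, period=16):
    """Number of toggles of each bit (index 0 = d(0)) over one counting period."""
    counts = [0] * width
    for n in range(period):
        diff = encode(n) ^ encode((n + 1) % period)
        for bit in range(width):
            counts[bit] += (diff >> bit) & 1
    return counts


gray = toggles_per_bit(to_gray, 4)
one_hot = toggles_per_bit(to_one_hot, 16)
binary = toggles_per_bit(lambda n: n, 4)

for name, counts in (("Gray", gray), ("One-hot", one_hot), ("Binary", binary)):
    print(name, [f"{c}/16" for c in counts])
```

Dividing each count by 16 gives the toggles per clock cycle for that bit.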


Putting it all Together

                        subtotal cap   act fact   power
Gray     d()    PLAs
                Flops
         done   PLAs
                Flops
         Total
1-Hot    d()    PLAs
                Flops
         done   PLAs
                Flops
         Total
Binary   d()    PLAs
                Flops
         done   PLAs
                Flops
         Total

6.6 Clock Gating

The basic idea of clock gating is to reduce power by turning off the clock when a circuit isn’t needed. This reduces the activity factor.

6.6.1 Introduction to Clock Gating

Examples of Clock Gating

Condition                                           Circuitry turned off
O/S in standby mode                                 Everything except “core” state (PC, registers, caches, etc)
No floating point instructions for k clock cycles   floating point circuitry
Instruction cache miss                              Instruction decode circuitry
No instruction in pipe stage i                      Pipe stage i

6.6.2 Implementing Clock Gating

Clock gating is implemented by adding a component that disables the clock when the circuit isn’t needed.

[Figure: Without clock gating: circuit with inputs i_data, i_valid, and clk and outputs o_data and o_valid.]

[Figure: With clock gating: a clock enable state machine with inputs clk and i_wakeup produces clk_en, which gates clk to give cool_clk for the circuit with i_data, i_valid, o_data, and o_valid.]

6.6.3 Design Process

6.6.4 Effectiveness of Clock Gating

Parameters to characterize effectiveness of clock gating:

Eff = effectiveness of clock gating
PctValid = percentage of clock cycles with valid data in the circuit (the clock must be toggling)
PctClk = percentage of clock cycles that the clock toggles

Effectiveness measures the percentage of clock cycles with invalid data in which the clock is turned off. Equation for effectiveness of clock gating:

$\text{Eff} = \frac{\text{PctClkOff}}{\text{PctInvalid}} = \frac{1 - \text{PctClk}}{1 - \text{PctValid}}$

Clock Gating Effectiveness Questions

Question: What is the effectiveness if the clock toggles only when there is valid data?

Question: What is the effectiveness of a clock that always toggles?


Clock Gating Effectiveness Questions

Question: What does it mean for a clock gating scheme to be 75% effective?

Question: What happens if PctClk < PctValid?
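The effectiveness questions can be explored by plugging values into the effectiveness equation. The sketch below is one possible way to do that; the function name and the example percentages (expressed as fractions between 0 and 1) are assumptions for illustration.

```python
def clock_gating_effectiveness(pct_clk, pct_valid):
    """Eff = (1 - PctClk) / (1 - PctValid), with percentages given as fractions."""
    if pct_valid >= 1.0:
        raise ValueError("effectiveness is undefined when data is always valid")
    return (1.0 - pct_clk) / (1.0 - pct_valid)


# Clock toggles only when there is valid data (PctClk equals PctValid).
eff_ideal = clock_gating_effectiveness(pct_clk=0.5, pct_valid=0.5)

# Clock always toggles.
eff_none = clock_gating_effectiveness(pct_clk=1.0, pct_valid=0.5)

# Clock toggles less often than there is valid data.
eff_over = clock_gating_effectiveness(pct_clk=0.25, pct_valid=0.5)

print(eff_ideal, eff_none, eff_over)
```

An effectiveness above 1 means the clock is off during some cycles that hold valid data, which loses data rather than saving power safely.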


Effect of Effectiveness

We can see the effect of the effectiveness of a clock-gating scheme on the activity factor:

[Figure: plot of the new activity factor A′ against Eff, falling from A at Eff = 0 to PctValid × A at Eff = 1.]

The new activity factor with a clock gating scheme is:

$A' = A - (1 - \text{PctValid}) \times \text{Eff} \times A$

6.6.5 Example: Reduced Activity Factor with Clock Gating

Question: How much power will be saved in the following clock-gating scheme?

• 70% of the time the main circuit has valid data

• clock gating circuit is 90% effective (90% of the time that the circuit has invalid data, the clock is off)

• clock gating circuit has 10% of the area of the main circuit

• clock gating circuit has same activity factor as main circuit

• neglect short-circuiting and leakage power
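One way to set up this calculation is sketched below. It relies on two assumptions that the problem statement does not spell out: capacitance is taken as proportional to area, and the clock gating circuit is taken to run on the ungated clock, so it keeps the original activity factor A. The function name is illustrative.

```python
def gated_activity_factor(a, pct_valid, eff):
    """New activity factor A' = A - (1 - PctValid) * Eff * A."""
    return a - (1.0 - pct_valid) * eff * a


# Normalize the main circuit: capacitance 1.0, original activity factor 1.0.
main_cap = 1.0
gating_cap = 0.10 * main_cap
a_original = 1.0

a_gated = gated_activity_factor(a_original, pct_valid=0.70, eff=0.90)

power_before = a_original * main_cap
power_after = a_gated * main_cap + a_original * gating_cap
saving = 1.0 - power_after / power_before
print(a_gated, power_after, saving)
```

Under these assumptions the main circuit's activity factor drops to 0.73 of its original value, and after paying for the gating circuit the net saving is about 17%.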


6.6.6 Clock Gating with Valid-Bit Protocol

6.6.6.1 Valid-Bit Protocol

Need a mechanism to tell circuit when to pay attention to data inputs

[Figure: circuit with inputs clk, i_valid, and i_data and outputs o_data and o_valid; waveforms of clk, i_valid, and i_data carrying parcels α, β, γ.]

Valid-Bit Protocol

[Figure: circuit with inputs clk, i_valid, and i_data and outputs o_data and o_valid; waveforms of clk, i_valid, i_data, o_data, and o_valid, with parcels α, β, γ on both input and output.]

i_valid: high when i_data has valid data. Signifies whether the circuit should pay attention to or ignore the data.

o_valid: high when o_data has valid data. Signifies whether the environment should pay attention to the output of the circuit.

For more on circuit protocols, see section 2.12.

Microscopic Analysis

Which clock edges are needed?

[Figure: circuit with i_valid, clk, and o_valid; waveforms of clk, i_valid, and o_valid.]

6.6.6.2 How Many Clock Cycles for Module?

Given a module with latency Lat, if the module receives a stream of NumPcls consecutive valid parcels, for how many clock cycles must the clock-enable signal be asserted?

[Figure: table with columns Latency, NumPcls, and NumClkEn, each row showing waveforms of i_valid, o_valid, and clk_en.]

6.6.6.3 Adding Clock-Gating Circuitry

Before Clock Gating

[Figure: circuit with inputs data_in, valid_in, and clk and outputs data_out and valid_out; waveforms of clk, data_in (α, β, γ, δ), valid_in, data_out (α, β, γ), and valid_out, with don’t care and uninitialized regions marked.]

After Clock Gating: Circuitry

[Figure: clock enable state machine with inputs hot_clk and wakeup_in and output clk_en; clk_en gates hot_clk to produce cool_clk for the main circuit, which has data_in, valid_in, data_out, valid_out, and wakeup_out.]

• hot_clk: clock that always toggles

• cool_clk: gated clock; sometimes toggles, sometimes stays low

• wakeup: alerts circuit that valid data will be arriving soon

• clk_en: turns on cool_clk

After Clock Gating: New Signals

[Figure: waveforms of hot_clk, wakeup_in, data_in (α, β, γ, δ), valid_in, clk_en, cool_clk, data_out (α, β, γ), valid_out, and wakeup_out.]

6.6.7 Example: Pipelined Circuit with Clock-Gating

Design a “clock enable state machine” for the pipelined component described below.

• capacitance of pipelined component = 200

• latency varies from 5 to 10 clock cycles, even distribution of latencies

• contains a maximum of 6 instructions (parcels of data).

• 60% of incoming parcels are valid

• average length of continuous sequence of valid parcels is 80

• use input and output valid bits for wakeup

• leakage current is negligible

• short-circuit current is negligible

• LUTs have a capacitance of 1, flops have a capacitance of 2


Waveforms for Parcel Count

[Figure: waveforms of i_valid, o_valid, parcel_count, and parcel_clk_en over clock cycles 1 to 24.]

Waveforms for Cycle Count

[Figure: waveforms of i_valid, o_valid, cycle_count, and cycle_clk_en over clock cycles 1 to 24.]

Summary of Design Process

Outline:

1. sketch out circuitry for parcel count and cycle count state machine

2. estimate capacitance of each state machine

3. estimate activity factor of main circuit, based on behaviour


Parcel Count Design

Need to count (0..6) parcels, therefore need 3 bits for counter.

Counter must be able to increment and decrement.

Equations for counter action (increment/decrement/no-change):

i_valid   o_valid   action
0         0         no change
0         1         decrement
1         0         increment
1         1         no change
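The action table can be modelled in software before it is written as hardware. The sketch below is a behavioural illustration only, with an invented function name and input trace; it covers the counter update and does not attempt to show how the clock-enable output is derived from the count.

```python
def next_parcel_count(count, i_valid, o_valid):
    """Apply the increment/decrement/no-change table to the parcel count."""
    if i_valid and not o_valid:
        return count + 1   # a parcel entered, none left
    if o_valid and not i_valid:
        return count - 1   # a parcel left, none entered
    return count           # nothing moved, or one in and one out


# Example trace of (i_valid, o_valid) pairs, one per clock cycle.
trace = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
count = 0
history = []
for i_valid, o_valid in trace:
    count = next_parcel_count(count, i_valid, o_valid)
    history.append(count)
print(history)

# Counting 0..6 parcels needs 3 bits.
bits_needed = (6).bit_length()
```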


Chapter 7

Fault Testing and Testability


7.1 Faults and Testing

7.1.1 Overview of Faults and Testing

7.1.1.1 Faults

During manufacturing, faults can occur that make the physical product behaveincorrectly.

Definition: A fault is a manufacturing defect that causes a wire, poly, diffusion, or via to either break or connect to something it shouldn’t.

[Figure: good wires, shorted wires, open wire.]

7.1.1.2 Causes of Faults

• Fabrication process (initial construction is bad)

– chemical mix, impurities, dust

• Manufacturing process (damage during construction)

– handling: probing, cutting, mounting

– materials: corrosion, adhesion failure, cracking, peeling

7.1.1.3 Testing

Definition: Testing is the process of checking that the manufactured wafer/chip/board/system has the same functionality as the simulations.

7.1.1.4 Burn In

Definition: Burn-in is the process of subjecting chips to extreme conditions (high and low temps, high and low voltages, high and low clock speeds) before and during testing.

[Figure: soon-to-break wire.]

7.1.1.5 Bin Sorting

Each chip (or wafer) is run at a variety of clock speeds. The chips are grouped and labeled (binned) by the maximum clock frequency at which they will work reliably.

For example, chips coming off of the same production line might be labelled as 800MHz, 900MHz, and 1000MHz.

7.1.1.6 Testing Techniques

7.1.1.7 Design for Testability (DFT)

7.1.2 Example Problem: Economics of Testing

Note: There is a tradeoff between the amount of money spent on testing chips vs dealing with (e.g. replacing) faulty chips. Usually the best tradeoff is to ship chips with a small, but non-zero probability that the chip has a fault.

7.1.3 Physical Faults

7.1.3.1 Types of Physical Faults

[Figure: a good circuit with signals a, b, c, and d, and bad circuits showing each type of fault: open, wired-AND bridging short, wired-OR bridging short, stronger-wins bridging short (b is stronger), short to VDD, and short to GND.]

7.1.3.2 Locations of Faults

Each segment of wire, poly, diffusion, via, etc is a potential fault location.

Different segments affect different gates in the fanout.

A potential fault location is a segment or segments where a fault at any position affects the same set of gates in the same way.


7.1.3.3 Layout Affects Locations

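[Figure: a circuit with signals a to i, followed by two layouts of signal b fanning out to e, g, and h. The first layout has fault locations L1 to L4; the second has fault locations L1 to L5.]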

7.1.3.4 Naming Fault Locations

Two ways to name a fault location:

pin-fault model Faults are modelled as occurring on input and output pins of gates.

net-fault model Faults are modelled as occurring on segments of wires.

In E&CE 327, we’ll use the net-fault model, because it is simpler to work with and is closer to what actually happens in hardware.


7.1.4 Detecting a Fault

To detect a fault, we compare the actual output of the circuit against the expected value.

7.1.4.1 Which Test Vectors will Detect a Fault?

Question: For the good circuit and faulty circuit shown below, which test vectors will detect the fault?

[Figure: good circuit and faulty circuit, each drawn with signals a, b, c, d, and e.]


Answer:

a b c   good  faulty
0 0 0    0     0
0 0 1    1     1
0 1 0    0     0
0 1 1    1     1
1 0 0    0     0
1 0 1    1     1
1 1 0    1     0
1 1 1    1     1

Sometimes multiple test vectors will catch the same fault.

Sometimes a single test vector can catch multiple faults.


[Figure: the good circuit and the same circuit with another fault (signals a, b, c, d, e).]

Another fault

a b c   good  faulty
1 1 0    1     0     ←

The test vector 110 can catch both this fault and the previous one.

Note: Detect vs. diagnose Testing detects faults. Testing does not diagnose which fault occurred.


7.1.5 Mathematical Models of Faults

Goal: develop a reliable and predictable technique for detecting faults in circuits.

Observations:

• The possible faults in a circuit are dependent upon the physical layout of the circuit.

• There is a very wide variety of possible faults

• A single test vector can catch many different faults

Need: a mathematical model for faults that is abstracted from the complexities of circuit layout and the plethora of possible faults, yet still detects most or all possible faults.


7.1.5.1 Single Stuck-At Fault Model

Two simplifying assumptions:

1. A maximum of one fault per tested circuit (hence “single”)

2. All faults are either:

(a) stuck-at 1: short to VDD

(b) stuck-at 0: short to GND

hence, “stuck at”


Example of Stuck-At Faults

[Figure: example circuit with signals a, b, c, d, and i.]

Question: If we consider all possible stuck-at faults, how many faulty circuits would we need to test for?

Question: If we consider only single-stuck-at faults, how many faulty circuits would we need to test for?
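
As a rough guide to the two questions above, the following sketch counts faulty circuits for a circuit with a given number of fault locations. The function name and the value 8 are illustrative assumptions; the count for the pictured circuit depends on how many fault locations its layout has.

```python
def count_faulty_circuits(num_locations: int) -> tuple[int, int]:
    """Return (multiple_stuck_at, single_stuck_at) counts of faulty circuits."""
    # Each location is independently good, stuck-at 0, or stuck-at 1;
    # subtract the one combination in which every location is good.
    multiple = 3 ** num_locations - 1
    # Exactly one location is faulty, and it is stuck at either 0 or 1.
    single = 2 * num_locations
    return multiple, single


# Hypothetical circuit with 8 fault locations.
print(count_faulty_circuits(8))
```

The gap between the two counts is the reason the single stuck-at assumption makes fault analysis tractable.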


7.1.6 Generate Test Vector to Find a Mathematical Fault

Faults are detected by stimulating circuits (real, manufactured circuit, not a simulation!) with test-vectors and checking that the real circuit gives the correct output.

7.1.6.1 Algorithm

1. compute Karnaugh map for correct circuit

2. compute Karnaugh map for faulty circuit

3. find region of disagreement

4. any assignment in region of disagreement is a test vector that will detect fault

5. any assignment outside of region of disagreement will result in same output on both correct and faulty circuit
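
The algorithm can be sketched in a few lines of Python by comparing truth tables instead of drawing Karnaugh maps. The two functions below are assumptions read off the truth table in 7.1.4.1 (good output ab + c, faulty output c); the function names are illustrative.

```python
from itertools import product


def good(a, b, c):
    # Assumed from the truth table in 7.1.4.1: output = ab + c
    return (a & b) | c


def faulty(a, b, c):
    # Assumed from the same truth table: the faulty output equals c
    return c


def region_of_disagreement(f_good, f_faulty, num_inputs=3):
    """Return every input assignment on which the two circuits differ."""
    return [bits for bits in product((0, 1), repeat=num_inputs)
            if f_good(*bits) != f_faulty(*bits)]


test_vectors = region_of_disagreement(good, faulty)
print(test_vectors)  # [(1, 1, 0)]
```

Any assignment in the returned list is a test vector for the fault; an empty list would mean the fault is undetectable.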


7.1.6.2 Example of Finding a Test Vector

[Figure: good circuit and faulty circuit (signals a, b, c, d, e), each with a Karnaugh map over ab and c.]

Good circuit    Faulty circuit

Question: Find a test vector that will detect the faulty circuit


7.1.7 Undetectable Faults

Not all faults are detectable.

1. If a circuit is irredundant then all single stuck-at faults can be detected.

A redundant circuit is one where one or more gates can be removed without affecting the functional behaviour.

2. If not trying to find all of the faults in a circuit, then a fault that you aren’t looking for can mask a fault that you are looking for.

7.1.7.1 Redundant Circuitry

Some faults are undetectable. Undetectable stuck-at faults are located in redundant parts of a circuit.


Timing Hazards

Static hazard
Dynamic hazard

Timing hazards are often removed by adding redundant circuitry.

Redundant Circuitry

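[Figure: irredundant circuit with inputs a, b, c, internal signals d, e, f, and output g, annotated with signal transitions; the output annotation is 1,0,1.]

Irredundant circuit

[Figure: timing diagram for signals a to g.]

Illustration of timing hazard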

Glitch on g is caused because the AND gate for e turns off before f turns on.


Redundant Circuitry

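Question: Add one or more gates to the circuit so that the static hazard is guaranteed to be prevented, independent of the delay values through the gates

[Figure: Karnaugh map over ab and c, and the circuit with inputs a, b, c, internal signals d, e, f, and output g, annotated with signal transitions.]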

Redundant Circuitry

Question: Has the redundant circuitry introduced any undetectable faults? If so, identify an undetectable fault.


7.1.7.2 Curious Circuitry and Fault Detection

Curiously, the stuck-at fault at L1 is undetectable, but faults at either L2 or L3 are detectable.

[Figure: circuit with inputs a, b, c, output z, and fault locations L1, L2, L3, shown with a second circuit on inputs a and c and a Karnaugh map over ab and c.]

fault   eqn            K-map   diff w/ ckt
L2@0    a ⊕ (b ⊕ c)
L2@1    a ⊕ (b ⊕ c)


7.2 Test Generation

7.2.1 A Small Example

[Figure: circuit with inputs a, b, c, output z, and fault locations L2, L4, L5, with the Karnaugh map for ab + bc.]

fault       eqn   K-map   diff w/ ckt   test vectors
1) L2@1
2) L4@1
3) L5@1


7.2.2 Choosing Test Vectors

The goal of test vector generation is to find the smallest set of test vectors that will detect the faults of interest.

Test vector generation requires analyzing the faults.

We can simplify the task of fault analysis by reducing the number of faults that we have to analyze.

Smith has examples of this in Figures 14.13 and 14.14.


7.2.2.1 Fault Domination

fault       eqn      K-map   Diff w/ ckt   test vectors
1) L5@1     ab+c                           101, 001
2) L6@1     1                              101, 001, 100, 010, 000

Definition dominates: f1 dominates f2: any test vector that detects f1 will also detect f2.

When choosing test vectors, we can ignore the dominated fault, but must keep the dominant fault.

Question: To detect both L5@1 and L6@1, can we ignore one of the faults?

Question: What would happen if we ignored the “wrong” fault?
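
The domination check can be sketched as a subset test on sets of detecting test vectors. This sketch assumes the good circuit is z = ab + bc (the circuit used elsewhere in this section) and takes the faulty equations from the eqn column above; the helper names are illustrative.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=3))  # each vector is (a, b, c)


def good(a, b, c):
    return (a & b) | (b & c)      # assumed good circuit: ab + bc


def l5_at_1(a, b, c):
    return (a & b) | c            # eqn column for L5@1


def l6_at_1(a, b, c):
    return 1                      # eqn column for L6@1


def detecting_vectors(f_faulty):
    """Test vectors on which the faulty circuit differs from the good one."""
    return {v for v in VECTORS if good(*v) != f_faulty(*v)}


def dominates(f1, f2):
    """Slide definition: every vector that detects f1 also detects f2."""
    return detecting_vectors(f1) <= detecting_vectors(f2)


print(dominates(l5_at_1, l6_at_1))  # True
print(dominates(l6_at_1, l5_at_1))  # False
```

Under the definition used in these slides, the fault with the smaller detecting set is the dominant one, which is why it is the one that must be kept.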


7.2.2.2 Fault Equivalence

fault       eqn   K-map   Diff w/ ckt
1) L1@1     b
2) L3@1     b

Definition fault equivalence: f1 is equivalent to f2: f1 and f2 are detected by exactly the same set of test vectors. That is, all of the test vectors that detect f1 will also detect f2, and vice versa.

When choosing test vectors we can ignore one of the faults and just include the other.


7.2.2.3 Gate Collapsing

A stuck-at-1 fault on the input to an OR gate is equivalent to a stuck-at-1 fault on the output of the OR gate.

Definition Gate collapsing: The technique of looking at the functionality of a gate and finding equivalent faults between inputs and outputs.

Sets of collapsible faults for common gates

AND: @0 on either input, @0 on the output

OR: @1 on either input, @1 on the output
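
A small sketch can confirm these two sets by brute force. It models a two-input gate as a Python function and checks whether a stuck-at fault on either input always produces the same output as the same stuck-at fault on the output; the function names are illustrative.

```python
from itertools import product


def and_gate(x, y):
    return x & y


def or_gate(x, y):
    return x | y


def collapsible(gate, stuck_value):
    """True if stuck_value on either input behaves like stuck_value on the output."""
    for x, y in product((0, 1), repeat=2):
        output_fault = stuck_value           # output forced to stuck_value
        input0_fault = gate(stuck_value, y)  # first input forced
        input1_fault = gate(x, stuck_value)  # second input forced
        if not (output_fault == input0_fault == input1_fault):
            return False
    return True


print(collapsible(and_gate, 0), collapsible(or_gate, 1))  # True True
print(collapsible(and_gate, 1), collapsible(or_gate, 0))  # False False
```

The same check can be applied to any other two-input gate by passing a different function.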

Question What is the set of collapsible faults for a NAND gate?

NAND


7.2.2.4 Node Collapsing

Note: Node collapsing is relevant only for the pin-fault model

7.2.2.5 Fault Collapsing Summary

When calculating the test-vectors to detect a set of faults, apply the fault collapsing techniques of:

• gate collapsing

• node collapsing (if using pin-fault model)

• general fault equivalence (intelligent collapsing)

• fault domination

to reduce the number of faults that you must examine.


7.2.3 Fault Coverage

Definition Fault coverage: percentage of detectable faults that are detected by a set of test vectors.

$$\text{FaultCoverage} = \frac{\text{DetectedFaults}}{\text{DetectableFaults}}$$

Some people’s definition of fault coverage has a denominator of AllPossibleFaults, not just those that are detectable.
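
As a quick illustration of the definition, the sketch below computes coverage as a percentage. The function name and the numbers are made up for the example and do not come from any circuit in these notes.

```python
def fault_coverage(detected_faults: int, detectable_faults: int) -> float:
    """Fault coverage as a percentage of the detectable faults."""
    if detectable_faults <= 0:
        raise ValueError("need at least one detectable fault")
    return 100.0 * detected_faults / detectable_faults


# Hypothetical test suite that detects 15 of 16 detectable faults.
print(fault_coverage(15, 16))  # 93.75
```

Switching the denominator to all possible faults can only lower the reported percentage, so it matters which definition a tool or datasheet uses.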


7.2.4 Test Vector Generation and Fault Detection

There are two ways to generate vectors and check results: built-in tests and scan testing.

Both require:

• generate test vectors

• override normal datapath to send test-vectors, rather than normal inputs, as inputs to flops

• compare outputs of flops to expected result


7.2.5 Generate Test Vectors for 100% Coverage

In this section we will find the test vectors to achieve 100% coverage of single stuck-at faults for the circuit of the day.

We will use a simple algorithm; there are much more sophisticated algorithms that are more efficient.

The problem of test vector generation is often called Automatic Test Pattern Generation (ATPG) and continues to be an active area of research.

[Figure: circuit with inputs a, b, c, output z, and fault locations L1 to L8, with the Karnaugh map for ab + bc.]

Example Circuit with Fault Locations and Karnaugh Map


7.2.5.1 Collapse the Faults

Initial circuit with potential faults:

[Figure: circuit with inputs a, b, c and output z; every fault location L1 to L8 is annotated with both potential faults, @0 and @1.]


Gate Collapsing

gate faults kept fault

For each set of equivalent faults, we will keep the fault shown in bold and eliminate the other faults. A good heuristic for choosing which fault to keep: keep the fault closest to the output. The closer a fault is to the output, the easier it is to analyze its behaviour, because the equation for the output will be simpler.


Intelligent Collapsing

1. delete faults that we previously decided could be ignored

2. by intelligent analysis of circuit, find equivalent faults

[Figure: circuit with inputs a, b, c and output z; fault locations L1 to L8 are annotated with potential faults @0 and @1.]


7.2.5.2 Check for Fault Domination

fault       eqn      K-map   Diff w/ ckt
1) L2@1     a+c
2) L3@1     b
3) L4@1     a+bc
4) L5@1     ab+c
5) L6@0     bc
6) L7@0     ab
7) L8@0     0
8) L8@1     1
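
The "Diff w/ ckt" column and the domination check can be reproduced mechanically. The sketch below takes the good circuit to be ab + bc and the faulty equations from the eqn column; it then flags every fault whose set of detecting vectors strictly contains the detecting set of another fault, which is a dominated fault under the definition in 7.2.2.1. The dictionary and helper names are illustrative.

```python
from itertools import product

VECTORS = ["".join(map(str, v)) for v in product((0, 1), repeat=3)]  # "abc"


def good(a, b, c):
    return (a & b) | (b & c)  # ab + bc


# Faulty equations copied from the eqn column of the table.
FAULTS = {
    "L2@1": lambda a, b, c: a | c,
    "L3@1": lambda a, b, c: b,
    "L4@1": lambda a, b, c: a | (b & c),
    "L5@1": lambda a, b, c: (a & b) | c,
    "L6@0": lambda a, b, c: b & c,
    "L7@0": lambda a, b, c: a & b,
    "L8@0": lambda a, b, c: 0,
    "L8@1": lambda a, b, c: 1,
}


def detect_set(faulty):
    """Vectors (as 'abc' strings) on which the faulty circuit disagrees with the good one."""
    return {v for v in VECTORS
            if good(*map(int, v)) != faulty(*map(int, v))}


DETECT = {name: detect_set(f) for name, f in FAULTS.items()}

# A fault is dominated if another fault's detecting set is a proper subset of its own.
dominated = sorted(f2 for f2 in DETECT
                   if any(DETECT[f1] < DETECT[f2] for f1 in DETECT if f1 != f2))

for name in FAULTS:
    print(name, sorted(DETECT[name]))
print("dominated:", dominated)
```

Dropping the dominated faults leaves a shorter list to carry into the search for required test vectors.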


Remove dominated faults

Current faults:

[Figure: circuit with inputs a, b, c and output z; fault locations L1 to L8 are annotated with potential faults @0 and @1.]

Dominated faults:


7.2.5.3 Required Test Vectors

Definition required test vector: A test vector tv is required if there is a fault for which tv is the only test vector that will detect the fault.

fault       eqn      K-map   Diff w/ ckt
1) L3@1     b
2) L4@1     a+bc
3) L5@1     ab+c
4) L6@0     bc
5) L7@0     ab
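
Required test vectors can be found by looking for faults that are detected by exactly one vector. The sketch below again assumes the good circuit is ab + bc and uses the five faulty equations from the table; the names are illustrative.

```python
from itertools import product

VECTORS = ["".join(map(str, v)) for v in product((0, 1), repeat=3)]  # "abc"


def good(a, b, c):
    return (a & b) | (b & c)  # ab + bc


FAULTS = {
    "L3@1": lambda a, b, c: b,
    "L4@1": lambda a, b, c: a | (b & c),
    "L5@1": lambda a, b, c: (a & b) | c,
    "L6@0": lambda a, b, c: b & c,
    "L7@0": lambda a, b, c: a & b,
}

DETECT = {
    name: {v for v in VECTORS if good(*map(int, v)) != f(*map(int, v))}
    for name, f in FAULTS.items()
}

# A vector is required if it is the only vector that detects some fault.
required = sorted({next(iter(s)) for s in DETECT.values() if len(s) == 1})

# Faults that none of the required vectors detect still need a vector.
uncovered = sorted(name for name, s in DETECT.items()
                   if not s & set(required))

print("required:", required)
print("not covered by required vectors:", uncovered)
```

Any vector in the intersection of the detecting sets of the uncovered faults would catch all of them with a single additional test.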


7.2.5.4 Faults Not Covered by Required Test Vectors

fault       eqn      K-map   Diff w/ ckt
1) L4@1     a+bc
2) L5@1     ab+c

Test vector(s) required to catch these faults:


7.2.5.5 Order to Run Test Vectors

The order in which the test vectors are run is important because it can affect how long a faulty chip stays in the tester before the chip’s fault is detected.

The first vector to run should be the one that detects the most faults.

Build a table for which faults each test vector will detect.


                       Test Vector
fault              110   010   011   101
 1) L1@0            1
 2) L1@1                  1
 3) L2@0            1           1
 4) L2@1                              1
 5) L3@0                        1
 6) L3@1                  1
 7) L4@0            1
 8) L4@1                              1
 9) L5@0                        1
10) L5@1                              1
11) L6@0            1
12) L6@1                  1           1
13) L7@0                        1
14) L7@1                  1           1
15) L8@0            1           1
16) L8@1                  1           1
Faults detected     5     5     5     6


7.2.5.6 Summary of Technique to Find and Order Test Vectors

1. identify all possible faults

2. gate collapsing

3. node collapsing

4. intelligent collapsing

5. fault domination

6. determine required test vectors

7. choose minimal set of test vectors to detect remaining faults

8. order test vectors based on number of faults detected (NOTE: when iterating through this step, need to take into account faults detected by earlier test vectors)
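
The ordering step can be sketched as a greedy loop over a fault simulator. The netlist in `simulate` is an assumption inferred from the fault equations in this section (L1 = a, L2 = stem of b, L3 = c, L4 and L5 = branches of b into the two AND gates, L6 and L7 = AND outputs, L8 = z); the function names and the tie-breaking rule (first vector in the list wins) are also illustrative choices.

```python
def simulate(a, b, c, fault=None):
    """Evaluate z = ab + bc; fault is (net_name, stuck_value) or None."""
    def net(name, value):
        return fault[1] if fault and fault[0] == name else value

    l1 = net("L1", a)
    l2 = net("L2", b)          # stem of b (assumed)
    l3 = net("L3", c)
    l4 = net("L4", l2)         # branch of b into the first AND gate (assumed)
    l5 = net("L5", l2)         # branch of b into the second AND gate (assumed)
    l6 = net("L6", l1 & l4)
    l7 = net("L7", l5 & l3)
    return net("L8", l6 | l7)


FAULTS = [(f"L{i}", v) for i in range(1, 9) for v in (0, 1)]
TEST_VECTORS = ["110", "010", "011", "101"]  # vectors are written as abc


def detects(vector, fault):
    bits = tuple(int(ch) for ch in vector)
    return simulate(*bits) != simulate(*bits, fault=fault)


def order_vectors(vectors, faults):
    """Greedy order: always run next the vector that detects the most new faults."""
    remaining, undetected, order = list(vectors), set(faults), []
    while remaining:
        best = max(remaining,
                   key=lambda v: sum(detects(v, f) for f in undetected))
        order.append(best)
        undetected -= {f for f in undetected if detects(best, f)}
        remaining.remove(best)
    return order


print(order_vectors(TEST_VECTORS, FAULTS))
```

Recounting against the still-undetected faults on every pass is what the note in step 8 asks for: a vector that looks strong in isolation may add little once an earlier vector has already caught most of its faults.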


7.2.6 One Fault Hiding Another

[Figure: circuit with inputs a, b, c, output z, and fault locations L1 to L8.]

Assume that we are not trying to detect all faults: L1 is viewed as not being at risk for faults, but L3 is at risk for faults.

[Figure: two copies of the circuit with only fault locations L1 and L3 marked.]


Fault Hiding

[Figure: two copies of the circuit with inputs a, b, c, output z, and fault locations L1 and L3 marked.]

Problem: If L1 is stuck-at 1, the test vectors that normally detect L3@0 will not detect L3@0.

In the presence of other faults, the set of test vectors to detect a fault will change.

fault(s)        eqn   K-map   Diff w/ ckt
L3@0            ab
L1@1, L3@0      b


7.3 Scan Testing in General

7.3.1 Structure and Behaviour of Scan Testing

[Figure: circuit under test with inputs data_in(0) to data_in(3) and zeta_in(0) to zeta_in(3), placed between another circuit #0 and another circuit #1.]

Normal Circuit


[Figure: circuit under test between another circuit and yet another circuit, with scan chain 0 (mode0, scan_in0, scan_out0) and scan chain 1 (mode1, scan_in1, scan_out1).]

Circuit with Scan Chains Added


7.3.2 Scan Chains

[Figure: circuit under test with inputs data_in(0) to data_in(3) and zeta_in(0) to zeta_in(3), between another circuit #0 and another circuit #1.]

Normal Circuit

[Figure: the same circuit with scan chains added: mode0, scan_in0, scan_out0 and mode1, scan_in1, scan_out1.]

Circuit with Scan Chains Added


7.3.2.1 Circuitry in Normal and Scan Mode

[Figure: circuit under test with scan chains (mode0, scan_in0, scan_out0; mode1, scan_in1, scan_out1), active paths in normal mode.]

Normal Mode

[Figure: the same circuit, active paths in scan mode.]

Scan Mode


7.3.2.2 Scan in Operation

[Figure: circuit under test between another circuit and yet another circuit, with scan chains (mode0, scan_in0, scan_out0; mode1, scan_in1, scan_out1).]

Circuit under test with scan chains

[Figure: timing diagram of clk, scan_in0, mode0, scan_out1, scan_out0, scan_in1, showing current vector0 being loaded and current results1 being unloaded.]

Sequence of load; test; unload

[Figure: three snapshots of the circuit with scan chains, one per step:]

1. Load Test Vector (1 cycle per bit)

2. Run Test Vector Through Circuit

3. Unload Result (1 cycle per bit)


Unload and Load at Same Time

[Figure: three snapshots of the circuit with scan chains, one per step:]

1. Unload Prev Result, Load Cur Test Vector (1 cycle per bit)

2. Run Cur Test Vector Through Circuit

3. Unload Cur Result, Load New Test Vector (1 cycle per bit)

[Figure: timing diagram of clk, scan_in0, mode0, scan_out0, scan_in1, scan_out1, showing previous results, current vector, current results, and next test vector overlapping on the scan pins.]

Sequence of load; run; unload


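7.3.2.3 Scan in Operation with Example Circuit

[Figure: example circuit with inputs a, b, c, d and outputs z, y.]

Circuit under test

[Figure: the same circuit with scan circuitry: mode0, scan_in0, scan_out0 and mode1, scan_in1, scan_out1.]

Circuit under test with scan test circuitry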


[Figure: four snapshots of the example circuit with scan circuitry, each with a clk and mode0 timing diagram, showing the test vector bits shifting along the input scan chain:]

1. Start Loading Test Vector (Load δ)

2. Load γ

3. Load β

4. Load α


[Figure: four snapshots of the example circuit with scan circuitry, each with a clk and mode0 timing diagram, showing α, β, γ, δ propagating through the logic and the results being captured:]

1. Run Test Vector

2. Test Values Propagate

3. Flop-In Result, Start (Un)loading Test Vector

4. Continue (Un)loading Test Vector


[Figure: three snapshots of the example circuit with scan circuitry, each with a clk and mode0 timing diagram, showing the results shifting out while the next vector (α’, β’, γ’, δ’) shifts in:]

1. Continue (Un)loading Test Vector

2. Finish (Un)loading Test Vector

3. Run Next Test Vector


7.3.3 Summary of Scan Testing

• Adding scan circuitry

1. Registers around circuit to be tested are grouped into scan chains

2. Replace each flop with mux + flop

3. Flops and muxes wired together into scan chains

4. Each scan chain is connected to dedicated I/O pins for loading and unloading test vectors

• Running test vectors

1. Put scan chain in “scan” mode

2. Load in test vector (one element of vector per clock cycle)

3. Put scan chain in “normal” mode

4. Run circuit for one clock cycle — load result of test into flops

5. Unload results of current test vector while simultaneously loading in next test vector (one element of vector per clock cycle)
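
The procedure above can be sketched as a small behavioural model. This is a simplification under stated assumptions: a single scan chain both holds the test vector and captures the result (the slides use separate input and output chains), and `example_logic` is a made-up combinational circuit. The class, function, and attribute names are all illustrative.

```python
class ScanChain:
    """Chain of mux+flop cells; index 0 is next to scan_in, the last cell drives scan_out."""

    def __init__(self, length):
        self.flops = [0] * length
        self.cycles = 0

    def shift(self, scan_in):
        """One clock in scan mode: shift by one cell, return the bit seen on scan_out."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        self.cycles += 1
        return scan_out

    def capture(self, logic):
        """One clock in normal mode: the flops load the outputs of the logic."""
        self.flops = list(logic(self.flops))
        self.cycles += 1


def run_tests(chain, vectors, logic):
    """Load each vector, run it for one cycle, and unload while loading the next."""
    n = len(chain.flops)
    results = []
    # A final dummy vector flushes the last result out of the chain.
    for k, vec in enumerate(list(vectors) + [[0] * n]):
        unloaded = [chain.shift(bit) for bit in reversed(vec)]
        if k > 0:
            results.append(unloaded[::-1])
        if k < len(vectors):
            chain.capture(logic)
    return results


def example_logic(q):
    # Hypothetical circuit under test with three inputs and three outputs.
    return [q[0] & q[1], q[1] | q[2], q[0] ^ q[2]]


chain = ScanChain(3)
results = run_tests(chain, [[1, 1, 0], [0, 1, 1]], example_logic)
print(results, chain.cycles)
```

Comparing `results` against the outputs of a fault-free simulation is the step that actually flags a faulty chip.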


7.3.4 Time to Test a Chip

If the length (number of flops) of a scan chain is n, then it takes 2n+1 clock cycles to run a single test: n clock cycles to scan in the test vector, 1 clock cycle to execute the test vector, and n cycles to scan out the results. Once the results are scanned out, they can be compared to the expected results for a correctly working circuit.

If we run 2 or more tests (and chips generally are subjected to hundreds of thousands of tests), then we speed things up by scanning in the next test vector while we scan out the previous result.

ScanLength = number of flip flops in a scan chain
NumVectors = number of test vectors in test suite
TimeScan = number of clock cycles to run test suite

$$\text{TimeScan} = \text{NumVectors} \times (\text{ScanLength} + 1) + \text{ScanLength}$$
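
The formula translates directly into code. In the sketch below the function names and the sample numbers are illustrative assumptions; converting cycles to seconds additionally assumes a fixed test clock frequency.

```python
def scan_test_cycles(num_vectors: int, scan_length: int) -> int:
    """Clock cycles to run a test suite through one scan chain (TimeScan)."""
    return num_vectors * (scan_length + 1) + scan_length


def scan_test_seconds(num_vectors: int, scan_length: int, clock_hz: float) -> float:
    """Wall-clock test time when the tester runs at clock_hz."""
    return scan_test_cycles(num_vectors, scan_length) / clock_hz


# A single test takes 2n + 1 cycles, matching the first paragraph above.
print(scan_test_cycles(1, 100))        # 201

# Hypothetical suite: 1000 vectors, 100-flop chain, 1 MHz test clock.
print(scan_test_cycles(1000, 100))     # 101100
print(scan_test_seconds(1000, 100, 1e6))
```

For large suites the NumVectors × ScanLength term dominates, so test time grows roughly with the product of suite size and chain length.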


7.3.4.1 Example: Time to Test a Chip

An 800 MHz chip has scan chains of length 20,000 bits, 18,000 bits, 21,000 bits, 22,000 bits, and two of 15,000 bits.

500,000 test vectors are used for each scan chain.

The tests are run at 80% of full speed.

Question: Calculate the total test time.


7.4 Boundary Scan and JTAG

Boundary scan originated as a technique to test wires on printed circuit boards (PCBs).

The goal was to replace “bed-of-nails” style testing with a technique that would work for high-density PCBs (lots of small wires close together)

Now used to test both boards and chip internals.

Used both on boundaries (I/O pins) and internal flops.


Boundary Scan with JTAG

Standardized by IEEE (1149) and previously by JTAG:

• 4 required signals (Scan Pins: TDI, TDO, TCK, TMS)

• 1 optional signal (Scan Pin: TRST)

• protocol to connect circuit under test to tester and other circuits

• state machine to drive test circuitry on chip

• Boundary Scan Description Language (BSDL): structural language used to describe which features of JTAG a circuit supports

JTAG circuitry is now commonly built into FPGAs and ASICs, or part of a cell-library. Rarely is a JTAG circuit custom-built as part of a larger part. So, you’ll probably be choosing and using JTAG circuits, not constructing new ones.

Using JTAG circuitry is usually done by giving a description of your printed circuit board (PCB) and the JTAG components on each chip (in BSDL) to test generation software. The software then generates a sequence of JTAG commands and data that can be used to test the wires on the circuit board for opens and shorts.


JTAG Structure

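[Figure: chip containing the circuit under test, scan registers between the normal input pins and normal output pins, and control logic driven by TDI, TDO, TCK, and TMS.]

High-level view

[Figure: chip containing the circuit under test surrounded by boundary scan cells (BSC) forming the BSR, plus the BR, IR, and IDCODE registers, the instruction decoder, and the TAP controller, connected to TDI, TDO, TCK, and TMS.]

Detailed view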


7.4.1 Scan Instructions

This is the set of required instructions; other instructions are optional.

EXTEST    Test board-level interconnect. Drive output pins of chip with hard-coded test vector. Sample results on inputs.
SAMPLE    Sample result data
PRELOAD   Load test vector
BYPASS    Directly connect TDI to TDO. This is used when several chips are daisy chained together to skip loading data into some chips.
IDCODE    Output manufacturer and part number


7.5 Built In Self Test

7.5.1 Block Diagram

[Figure: circuit under test with a test generator on inputs i_data(0) to i_data(3) (driving d(0) to d(3)), a mode signal, and a result checker on outputs o_data(0) to o_data(2) producing all_ok.]

Circuit in Normal Mode

[Figure: the same block diagram with the test paths active.]

Circuit in Test Mode


Circuit w/ BIST in Normal Mode

[Figure: circuit under test with the test generator (test gen LFSR) on inputs i_data(0) to i_data(3) driving d(0) to d(3), signature analyzers 0 to 2 on outputs o_data(0) to o_data(2) producing ok(0) to ok(2), and the result checker producing all_ok; mode selects normal operation.]


Circuit w/ BIST in Test Mode

[Figure: the same BIST block diagram (test generator, circuit under test, signature analyzers 0 to 2, result checker) with mode selecting test operation.]


7.5.1.1 Components

Test Generator

[Figure: BIST block diagram (test generator, circuit under test, signature analyzers 0 to 2, result checker).]

• generates a pseudo-random set of test vectors

• for n output bits, generates all vectors from 1 to 2^n − 1 in a pseudo-random order

• built with a linear-feedback shift register (shift-register portion is the input flops)


Test Generator

[Figure: test generator LFSR with outputs q0, q1, q2.]

Question: Why not just use a counter to generate 1..2^n − 1?


Signature Analyzer

[Figure: BIST block diagram (test generator, circuit under test, signature analyzers 0 to 2, result checker).]

• checks that the output it is examining has the correct results for the complete set of tests that are run

• only has a meaningful result at the end of the entire test sequence.

• built with a linear-feedback shift register

• similar to a hash function or a lossy compression function

• if there are no faults, the signature analyzer will definitely say “ok” (no false negatives)

• if there is a fault, the signature analyzer might say “ok” or might say “bad” (false positives are possible)

• design tradeoff: more accurate signature analyzers require more hardware
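
The hash-like behaviour can be seen in a small sketch. It assumes a 2-flop internal-XOR LFSR with an external input, feedback taps on both flops, and a reset state of 00; the expected output stream is a made-up example, and the function name is illustrative.

```python
from itertools import product


def signature(bits, taps=(1, 1)):
    """Feed a serial bit stream through a 2-flop internal-XOR LFSR; return (q0, q1)."""
    q0 = q1 = 0                      # flops reset to 0
    for i in bits:
        fb = q1                      # feedback from the last flop
        q0, q1 = i ^ (fb & taps[0]), q0 ^ (fb & taps[1])
    return q0, q1


# Hypothetical fault-free output stream of the circuit under test.
expected = [1, 0, 1, 1, 0, 1, 0, 0]
good_sig = signature(expected)

# Every other 8-bit stream is a possible faulty response; count those
# that produce the same signature and would therefore be reported "ok".
aliases = [s for s in product((0, 1), repeat=len(expected))
           if list(s) != expected and signature(s) == good_sig]

print(good_sig, len(aliases))
```

With only two flops there are just four possible signatures, so many distinct faulty streams must share the good signature; adding flops shrinks that fraction at the cost of more hardware.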


Result Checker

[Figure: BIST block diagram (test generator, circuit under test, signature analyzers 0 to 2, result checker).]

• signature analyzers output “ok”/“bad” on every clock cycle, but the result is only meaningful at the end of running the complete set of test vectors

• the result checker looks at test vector inputs to detect the end of the test suite and outputs “all ok” if all signature analyzers report “ok” at that moment

• implemented as an AND gate


7.5.1.2 Linear Feedback Shift Register (LFSR)

Basically, a shift register (sequence of flip-flops) with the output of the last flip-flop fed back into some of the earlier flip-flops with XOR gates.

Design parameters:

• number of flip-flops

• external or internal XOR

• feedback taps (coefficients)

• external-input or self-contained

• reset or set

[Figure: three-flop LFSR with input i, reset, and flop signals d0/q0, d1/q1, d2/q2.]

LFSR Example


Example LFSRs

[Figure: four three-flop LFSRs (flop signals d0/q0, d1/q1, d2/q2), one per configuration:]

• External-XOR, input, reset

• External-XOR, no input, set

• Internal-XOR, input, set

• Internal-XOR, input, reset

In E&CE 327, we use internal-XOR LFSRs, because the circuitry matches the mathematics of Galois fields.

External-XOR LFSRs work just fine, but they are more difficult to analyze, because their behaviour can’t be treated as Galois fields.


7.5.1.3 Maximal-Length LFSR

Definition maximal-length linear feedback shift register: An LFSR that outputs a pseudo-random sequence of all representable bit-vectors except 0...00.

Definition pseudo random: The same elements in the same order every time, but the relationship between consecutive elements is apparently random.

Maximal-length linear feedback shift registers are used to generate test vectors for built-in self test.
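
A minimal simulation shows what "maximal-length" means for three flops. The tap choice below (the output of the last flop feeds d0 and is XORed into d1) is one assumed internal-XOR configuration and is not claimed to match any particular figure in these notes; the function names are illustrative.

```python
def lfsr_step(state):
    """One clock of a 3-flop internal-XOR LFSR; state is (q0, q1, q2)."""
    q0, q1, q2 = state
    # q2 feeds back into d0 and is XORed into d1 (assumed tap choice).
    return (q2, q0 ^ q2, q1)


def lfsr_sequence(seed=(1, 1, 1)):
    """Return the states visited, starting at seed, until the seed recurs."""
    seq, state = [], seed
    while True:
        seq.append(state)
        state = lfsr_step(state)
        if state == seed:
            return seq


seq = lfsr_sequence()
print(len(seq), seq)
print(lfsr_step((0, 0, 0)))
```

Starting from all 1s, this configuration visits every non-zero 3-bit state once before repeating, while the all-zero state maps only to itself.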


Maximal-Length LFSR Circuits

The figures below illustrate the two maximal-length internal-XOR linear feedback shift registers that can be constructed with 3 flops.

[Figure: first maximal-length internal-XOR LFSR with set and flop signals d0/q0, d1/q1, d2/q2.]

Maximal-length internal-XOR LFSR

[Figure: second maximal-length internal-XOR LFSR with set and flop signals d0/q0, d1/q1, d2/q2.]

Maximal-length internal-XOR LFSR

Question: Why do maximal-length LFSRs not generate the test vector 0...00?


Maximal Length LFSR Characteristics

Maximal-length LFSRs:

• set to all 1s initially

• self contained (no external i input)

[Figure: timing diagram over clock cycles 1 to 8 showing clk, reset, d0, q0, d1, q1, q2, and the value val of the LFSR state in each cycle.]

Timing diagram for a 3-flop maximal-length LFSR


7.5.2 Test Generator

[Figure: BIST block diagram (test generator, circuit under test, signature analyzers 0 to 2, result checker).]

The test generator component is a maximal-length LFSR ...

[Figure: three-flop maximal-length LFSR with set and flop signals d0/q0, d1/q1, d2/q2.]


Test Generator

The test generator component is a maximal-length LFSR with multiplexors on the inputs to each flip-flop. In test mode, the data input on each flip flop is connected to the output of the previous flip flop. In normal mode, the input of each flip flop is connected to the environment.
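The multiplexing described above can be modelled in a few lines of Python. This is only a behavioural sketch: the function name, the list representation of the flops, the string values for mode, and the choice of feedback taps (corresponding to x^3 + x + 1) are all assumptions for illustration.

```python
def generator_step(q, mode, i_d, taps=(0, 1)):
    """One clock of a test generator built from an internal-XOR LFSR.

    q     -- flop outputs [q0, q1, ...]
    mode  -- "test" selects the LFSR path, "normal" selects the environment
    i_d   -- environment inputs, one per flop
    taps  -- flops whose input is XORed with the feedback line
             (assumed here: flops 0 and 1, i.e. x^3 + x + 1)
    """
    if mode == "normal":
        return list(i_d)               # each mux passes the environment input
    feedback = q[-1]                   # output of the last flop
    shifted = [0] + q[:-1]             # flop k takes the output of flop k-1
    return [bit ^ (feedback if k in taps else 0)
            for k, bit in enumerate(shifted)]


if __name__ == "__main__":
    state = [1, 1, 1]                  # set to all 1s initially
    for _ in range(7):
        print(state)
        state = generator_step(state, "test", [0, 0, 0])
```

In normal mode the flops simply register the environment inputs, so the LFSR wiring has no effect on the circuit's ordinary operation.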

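[Figure: test generator LFSR with a multiplexor on each flop input, controlled by mode (signals set, d0, q0, d1, q1, d2, q2; environment inputs i_d(0), i_d(1), i_d(2); outputs q0, q1, q2)]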


Test Generator

[Figure: A test generator, reset not shown (inputs mode, i_d(0), i_d(1), i_d(2); flop inputs d0, d1, d2; outputs q0, q1, q2)]


7.5.3 Signature Analyzer

There are four things that change between different signature analyzers:

• number of flops (⇑ flops ⇒ ⇑ area, ⇑ accuracy)

• choice of feedback taps: a good choice can improve accuracy (more isn’t necessarily better)

• bubbles on input to AND gate for “ok”: determined by expected result from simulating test sequence through circuit under test and LFSR of analyzer.

[Figure: BIST block diagram. Blocks: test generator (test gen LFSR) with inputs mode and i_data(0..3) and outputs d(0..3); circuit under test with outputs o_data(0..2); signature analyzers 0 to 2 with outputs ok(0..2); result checker with output all_ok.]


Signature Analyzer

This circuit:

• Two flops; most analyzers use more (the HP boards in the 1970s used 37 flops!)

• Feedback taps on both flops. Different signature analyzers have different configurations of feedback taps.

• Also contains “ok” tester (AND gate). Expected output of LFSR at end of test sequence is: q0=1 and q1=1, or 01. (We know this because of bubble on AND gate. To see why this is the expected output of the signature analyzer, we would need to know the correct sequence of outputs of the circuit under test.)

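[Figure: two-flop signature analyzer with input i, reset, feedback taps on both flops (signals d0, q0, d1, q1), and an AND gate producing ok]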


Signature Analyzer

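[Figure: signature analyzer timing diagram, values not yet filled in (signals clk, reset, i, d0, q0, d1, q1). After reset, q0 = 0 and q1 = 0; the input stream on i is i6, i5, i4, i3, i2, i1, i0.]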


Signature Analyzer Timing

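[Figure: signature analyzer timing diagram, shown here as a table. After reset, q0 = 0 and q1 = 0; the input stream on i is i6, i5, ..., i0.]

Shorthand: 356 = i3⊕i5⊕i6, 2356 = i2⊕i3⊕i5⊕i6, etc.

cycle   i    d0      d1      q0      q1
1       i6   i6      0       0       0
2       i5   i5      i6      i6      0
3       i4   i4⊕i6   i5⊕i6   i5      i6
4       i3   356     i4⊕i5   i4⊕i6   i5⊕i6
5       i2   245     346     356     i4⊕i5
6       i1   1346    2356    245     346
7       i0   02356   1245    1346    2356
8       -    -       -       02356   1245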


7.5.4 Result Checker

[Figure: BIST block diagram. Blocks: test generator (test gen LFSR) with inputs mode and i_data(0..3) and outputs d(0..3); circuit under test with outputs o_data(0..2); signature analyzers 0 to 2 with outputs ok(0..2); result checker with output all_ok.]

The purpose of the result checker is to check the “ok” circuit at the end of the test sequence.

[Figure: result checker circuit (signals q0, q1, q2, ok, reset, all_ok)]


7.5.5 Arithmetic over Binary Fields

• Galois Fields!

• Two operations: “+” and “×”

• Two values: 0 and 1

• Bit vectors and shift-registers are written as polynomials in terms of x.

Addition: + represents XOR

expression    result
0 + 0         0
0 + 1         1
1 + 0         1
1 + 1         0
x + x         0

Multiplication: × represents concatenating shift registers

expression          result
$x^4 \times 1$      $x^4$
$x^2 \times x^3$    $x^5$


Example

Calculate $(x^3 + x^2 + 1) \times (x^2 + x)$

$x^2 \times (x^3 + x^2 + 1) = x^5 + x^4 + x^2$

$x \times (x^3 + x^2 + 1) = x^4 + x^3 + x$

Sum: $x^5 + x^3 + x^2 + x$
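The example can be checked mechanically. The sketch below stores a polynomial over the binary field as a Python integer whose bit k is the coefficient of x^k; that encoding and the function names are assumptions made for illustration.

```python
def gf2_add(a: int, b: int) -> int:
    """Addition over the binary field is a bitwise XOR of coefficients."""
    return a ^ b


def gf2_mul(a: int, b: int) -> int:
    """Multiply two polynomials: XOR together one shifted copy of `a`
    for every non-zero term of `b`."""
    result = 0
    shift = 0
    while b >> shift:
        if b >> shift & 1:
            result = gf2_add(result, a << shift)
        shift += 1
    return result


def poly_str(p: int) -> str:
    """Render an integer-encoded polynomial, highest power first."""
    terms = []
    for k in range(p.bit_length() - 1, -1, -1):
        if p >> k & 1:
            terms.append("1" if k == 0 else "x" if k == 1 else f"x^{k}")
    return " + ".join(terms) or "0"


if __name__ == "__main__":
    a = 0b1101                         # x^3 + x^2 + 1
    b = 0b0110                         # x^2 + x
    print(poly_str(gf2_mul(a, b)))     # x^5 + x^3 + x^2 + x
```

The two x^4 terms cancel because 1 + 1 = 0, which is why the product has no x^4 term.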


7.5.6 Shift Registers and Characteristic Polynomials

Each linear feedback shift register has a corresponding characteristic polynomial.

From polynomials to hardware:

• The maximum exponent denotes the number of flops

• The other exponents denote the flops that tap off of feedback line from last flop

• From the characteristic polynomial, we cannot determine whether the shift register has an external input. Stated another way, two shift registers that are identical except that one has an external input and the other does not will have the same characteristic polynomial.
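The first two rules above can be written as a small helper. This is a sketch under one interpretation: an exponent k below the maximum is taken to mean that the input of flop k (d_k) is XORed with the feedback line. The function name and the list-of-exponents representation are assumptions.

```python
def lfsr_structure(exponents):
    """Map a characteristic polynomial to LFSR structure.

    exponents -- exponents with non-zero coefficient, e.g. [4, 3, 1, 0]
                 for x^4 + x^3 + x + 1
    Returns (number of flops, flops that tap the feedback line).
    """
    num_flops = max(exponents)                      # maximum exponent
    taps = sorted(e for e in exponents if e != num_flops)
    return num_flops, taps


if __name__ == "__main__":
    print(lfsr_structure([3]))            # x^3: 3 flops, no feedback taps
    print(lfsr_structure([3, 1, 0]))      # x^3 + x + 1
    print(lfsr_structure([4, 3, 1, 0]))   # x^4 + x^3 + x + 1
```

Note that the result says nothing about an external input, in line with the third rule.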


Shift Regs and Polynomials

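[Figure: three-flop shift register with input i and reset (signals d0, q0, q1, q2)]

$p(x) = x^3$

[Figure: three-flop shift register with input i and reset (signals d0, q0, d1, q1, q2; positions labelled x0, x1, x2, x3)]

$p(x) = x^3 + x$

[Figure: three-flop shift register with input i and reset (signals d0, q0, q1, q2; positions labelled x0, x1, x2, x3)]

$p(x) = x^3 + 1$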


Shift Regs and Polynomials

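[Figure: three-flop shift register with input i and reset (signals d0, q0, d1, q1, q2; positions labelled x0, x1, x2, x3)]

$p(x) = x^3 + x + 1$

[Figure: three-flop shift register with input i and reset (signals d0, q0, d1, q1, d2, q2; positions labelled x0, x1, x2, x3)]

$p(x) = x^3 + x^2 + x + 1$

[Figure: four-flop shift register with input i and reset (signals d0, q0, d1, q1, q2, d3, q3; positions labelled x0, x1, x2, x3, x4)]

$p(x) = x^4 + x^3 + x + 1$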


7.5.6.1 Circuit Multiplication

Redoing the multiplication example $(x^2 + x) \times (x^3 + x^2 + 1)$ as circuits:

[Figure: circuits for $x^2 + x$ and $x^3 + x^2 + 1$]

$(x^2 + x) \times (x^3 + x^2 + 1)$

$= x \times (x^3 + x^2 + 1) + x^2 \times (x^3 + x^2 + 1)$

$= x^5 + x^3 + x^2 + x$


7.5.7 Bit Streams and Characteristic Polynomials

A bit stream, or bit sequence, can be represented as a polynomial.

The oldest (first) bit in a sequence of n bits is represented by $x^{n-1}$ and the youngest (last) bit is $x^0$.

The bit sequence 1010011 can be represented as $x^6 + x^4 + x + 1$:

1 0 1 0 0 1 1

$= 1x^6 + 0x^5 + 1x^4 + 0x^3 + 0x^2 + 1x^1 + 1x^0$

$= x^6 + x^4 + x + 1$
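The mapping from a bit sequence to its polynomial is mechanical, as this short Python sketch shows. The function name and the use of a string of '0' and '1' characters (oldest bit first) are assumptions for illustration.

```python
def bits_to_exponents(bits: str):
    """Return the exponents of the non-zero terms of a bit sequence.

    The first (oldest) character is the coefficient of x^(n-1) and the
    last (youngest) is the coefficient of x^0.
    """
    n = len(bits)
    return [n - 1 - pos for pos, bit in enumerate(bits) if bit == "1"]


if __name__ == "__main__":
    print(bits_to_exponents("1010011"))   # [6, 4, 1, 0]
```

The exponents 6, 4, 1, 0 correspond to the terms x^6, x^4, x, and 1.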


7.5.8 Division

With rules for multiplication and addition, we can define division.

A fundamental theorem of division defines q and r to be the quotient and remainder, respectively, of m ÷ p iff the degree of r(x) is less than the degree of p(x) and:

$m(x) = q(x) \times p(x) + r(x)$


Long Division

In Galois fields, we do division just as with long division in elementary school.

Given:

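$m(x) = x^6 + x^4 + x^3$

$p(x) = x^4 + x$

Calculate the quotient, q(x), and remainder, r(x), for m(x) ÷ p(x):

$$
\begin{array}{r|ccccccc}
        &     &         &         &         & x^2     &         & +\,1 \\ \hline
x^4 + x & x^6 & +\,0x^5 & +\,1x^4 & +\,1x^3 & +\,0x^2 & +\,0x^1 & +\,0x^0 \\
        & x^6 &         &         & +\,1x^3 &         &         & \\ \hline
        &     &         & 1x^4    &         &         &         & \\
        &     &         & 1x^4    &         &         & +\,x    & \\ \hline
        &     &         &         &         &         & x       &
\end{array}
$$

Quotient $q(x) = x^2 + 1$

Remainder $r(x) = x$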


Long Division (Check)

Check result:

$m(x) = q(x) \times p(x) + r(x)$

$= (x^2 + 1) \times (x^4 + x) + x$

$= x^6 + x^3 + x^4 + x + x$

$= x^6 + x^4 + x^3$
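The long division and its check can be reproduced with a short Python sketch. Polynomials are encoded as integers with bit k holding the coefficient of x^k; the encoding and the function names are assumptions made for this example.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two polynomials over the binary field."""
    result = 0
    shift = 0
    while b >> shift:
        if b >> shift & 1:
            result ^= a << shift
        shift += 1
    return result


def gf2_divmod(m: int, p: int):
    """Long division over the binary field: returns (quotient, remainder)."""
    if p == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    deg_p = p.bit_length() - 1
    q, r = 0, m
    while r.bit_length() - 1 >= deg_p:
        shift = (r.bit_length() - 1) - deg_p   # align leading terms
        q ^= 1 << shift                        # record the quotient term
        r ^= p << shift                        # subtract (XOR) the shifted divisor
    return q, r


if __name__ == "__main__":
    m = 0b1011000                      # x^6 + x^4 + x^3
    p = 0b0010010                      # x^4 + x
    q, r = gf2_divmod(m, p)
    print(bin(q), bin(r))              # 0b101 (x^2 + 1) and 0b10 (x)
    assert gf2_mul(q, p) ^ r == m      # m(x) = q(x) * p(x) + r(x)
```

Subtraction and addition are the same operation here, so each division step is a single XOR.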


7.5.9 Signature Analysis: Math and Circuits

The input to the signature analyzer is a “message”, m(x), which is a sequence of n bits represented as a polynomial.

After n shifts through an LFSR with l flops:

• The sequence of output bits forms a quotient, q(x), of length n − l

• The flops in the analyzer form a remainder, r(x), of length l

$m(x) = q(x) \times p(x) + r(x)$

The remainder is the signature.
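To connect the circuit view with the mathematics, the sketch below clocks a message through a model of an internal-XOR LFSR and reads off the output bits and the final flop contents. The integer encoding (bit k is flop q_k and the coefficient of x^k), the function name, and the assumption that flops start at 0 are choices made for this illustration.

```python
def signature_analyzer(bits, poly):
    """Shift a message, oldest bit first, through an internal-XOR LFSR.

    Returns (output bits of the last flop, final flop contents).
    """
    n_flops = poly.bit_length() - 1
    state = 0                           # flops reset to 0
    outputs = []
    for bit in bits:
        state = (state << 1) | bit      # shift, new bit enters flop 0
        out = state >> n_flops & 1      # bit leaving the last flop
        if out:
            state ^= poly               # feedback XORed into tapped flops
        outputs.append(out)
    return outputs, state


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 0]     # x^6 + x^4 + x^3
    poly = 0b10010                      # x^4 + x
    outputs, remainder = signature_analyzer(message, poly)
    print(outputs[-3:], bin(remainder)) # quotient bits and signature
```

For the message and divisor of the long-division example, the last n − l = 3 output bits are 1, 0, 1 (that is, x^2 + 1) and the flops hold x, matching the quotient and remainder computed by hand.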


Test Generation: Math and Circuits

The mathematics for an LFSR without an input i:

• same polynomial as if the circuit had an input

• input sequence is all 0s


Input Streams and Error Polynomials

An input stream with an error can be represented as m(x) + e(x)

• e(x) is the error polynomial

• bits in the message that are flipped have a coefficient of 1 in e(x)

$m(x) + e(x) = q'(x) \times p(x) + r'(x)$


Input Streams and Error Polynomials

The error e(x) will be detected if it results in a different signature (remainder).

m(x) and m(x) + e(x) will have the same remainder iff

$e(x) \bmod p(x) = 0$

That is, e(x) must be a multiple of p(x) for the error to go undetected.

The larger p(x) is (the more flops in the analyzer), the smaller the chances that e(x) will be a multiple of p(x).
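The condition can be tried out numerically. In this sketch polynomials are integers with bit k as the coefficient of x^k, and an error is applied by XORing e(x) into the message; the function names and the example values are assumptions for illustration.

```python
def gf2_mod(m: int, p: int) -> int:
    """Remainder of m(x) divided by p(x) over the binary field."""
    deg_p = p.bit_length() - 1
    while m.bit_length() - 1 >= deg_p:
        m ^= p << (m.bit_length() - 1 - deg_p)
    return m


def error_detected(m: int, e: int, p: int) -> bool:
    """An error is detected when it changes the signature (remainder)."""
    return gf2_mod(m, p) != gf2_mod(m ^ e, p)


if __name__ == "__main__":
    p = 0b111                          # x^2 + x + 1
    m = 0b1011000                      # x^6 + x^4 + x^3
    print(error_detected(m, 0b0001, p))   # single flipped bit
    print(error_detected(m, 0b1110, p))   # e(x) = x * p(x)
```

The single-bit error changes the remainder and is caught, while e(x) = x × p(x) is a multiple of p(x) and slips through.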


BIST for a Simple Circuit

Outline of steps to see if a fault will be detected by BIST:

1. Output sequence from test generator

2. Output sequence from correct circuit

3. Remainder for signature analyzer with correct output sequence

4. Output sequence from faulty circuit

5. Remainder for signature analyzer with faulty output sequence

6. Compare the correct and faulty remainders; if they differ, the fault is detected
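The six steps can be strung together in a short Python sketch. The circuit equations (ab+bc for the correct circuit, a+c for the faulty one) are those of the worked example in these slides; everything else is an assumption for illustration: the function names, the use of x^3 + x + 1 for both the test generator and the signature analyzer, and the mapping a = t0, b = t1, c = t2. Because the wiring is assumed, the intermediate bit sequences need not match the tables on the slides.

```python
def lfsr_states(poly, count):
    """Step 1: successive states [t0, t1, ...] of an input-less LFSR from all 1s."""
    n = poly.bit_length() - 1
    state = (1 << n) - 1
    for _ in range(count):
        yield [state >> k & 1 for k in range(n)]
        state <<= 1
        if state >> n & 1:
            state ^= poly


def signature(bits, poly):
    """Steps 3 and 5: remainder left in the analyzer after shifting in all bits."""
    n = poly.bit_length() - 1
    state = 0
    for bit in bits:
        state = (state << 1) | bit
        if state >> n & 1:
            state ^= poly
    return state


def correct(a, b, c):
    return (a & b) | (b & c)           # ab + bc


def faulty(a, b, c):
    return a | c                       # a + c


def bist_detects(gen_poly=0b1011, sig_poly=0b1011, vectors=7):
    tests = list(lfsr_states(gen_poly, vectors))      # step 1
    good = [correct(*t) for t in tests]               # step 2
    good_sig = signature(good, sig_poly)              # step 3
    bad = [faulty(*t) for t in tests]                 # step 4
    bad_sig = signature(bad, sig_poly)                # step 5
    return good_sig != bad_sig                        # step 6


if __name__ == "__main__":
    print(bist_detects())
```

Under these assumed polynomials the two signatures differ, so this particular fault would be reported; a different choice of analyzer polynomial could in principle mask it.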


Components

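[Figure: components of the example. Circuit under test with inputs a, b, c, output z, and internal lines L1 to L8; three-flop test generator with outputs t0, t1, t2; three-flop signature analyzer with flops r0, r1, r2 and input z.]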


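[Table templates, values not filled in: test generation sequence (columns t0, t1, t2); circuit outputs (columns t0/a, t1/b, t2/c, correct z, faulty z); signature analyzer sequences for the correct and faulty circuits (columns z, r0, r1, r2)]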


Question: Determine if L2@1 will be detected

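Test Generation Sequence

[Table: test generation sequence, columns t0, t1, t2, starting from 1 1 1; annotations: “initial values” on the first row and “final values are repeat of initial values” on the last row. The intermediate bit values are not recoverable from the extraction.]

Technique is to shift; then compute result of XORs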

Equation for correct circuit: ab+bc

Equation for faulty circuit: a+ c

Output sequences for correct and faulty circuits

[Table: columns t0/a, t1/b, t2/c (vectors from test generation sequence), correct z, and faulty z (output sequences from circuits). The correct output sequence reads 1110000; the remaining bit values are fragmented in the extraction.]


Signature analyzer sequence for correct circuit

[Table: columns z, r0, r1, r2. The z column is the output sequence from the correct circuit (1110000); the initial values of r0, r1, r2 are 0; the final row is the remainder. The intermediate bit values are not recoverable from the extraction.]

Signature analyzer sequence for faulty circuit

[Table: columns z, r0, r1, r2. The z column is the circuit's output sequence; the initial values of r0, r1, r2 are 0; the final row is the remainder. The bit values are not recoverable from the extraction.]


7.6 Scan vs Self Test

Scan

⇑ less hardware

⇓ slower

⇑ well defined coverage

⇑ test vectors are easy to modify

Self Test

⇓ more hardware

⇑ faster

⇓ ill defined coverage

⇓ test vectors are hard to modify


Chapter 8

Review

This chapter lists the major topics of the term. The “Topics List” section for each major area is meant to be relatively complete.


8.1 Overview of the Term

• The purely digital world

– VHDL

– design and optimization methods

– functional verification

– performance analysis

• Analog effects in the digital world

– timing analysis

– power

– faults and testing


8.2 VHDL

8.2.1 VHDL Topics

• simple syntax and semantics: things that you should know simply by having done the labs and project

• behavioural semantics of VHDL

• synthesis semantics of VHDL

• synthesizable and unsynthesizable code


8.2.2 VHDL Example Problems

• identify whether a particular signal will be the output of combinational circuitry or a flop

• identify whether a particular process is combinational or clocked

• legal, synthesizable, and good code

• perform delta-cycle simulation of VHDL

• perform RTL simulation of VHDL

• identify whether two VHDL fragments have same behaviour

• match VHDL code with waveforms

• match VHDL code with hardware

• choose the VHDL fragment that generates smaller or faster hardware


8.3 RTL Design Techniques

8.3.1 Design Topics

• coding guidelines

• generic FPGA hardware

• area estimation

• finite state machines

– implicit

– explicit-current

– explicit-current+next

• from algorithm to hardware

– dependency graph

– dataflow diagram

– scheduling

– input/output allocation

– register allocation

– datapath allocation

– hardware block diagram

– state machine

• memory dependencies

• memory arrays and dataflow diagrams


8.3.2 Design Example Problems

• choose design guidelines to follow in different situations

• estimate area to implement a circuit in an FPGA

• calculate resource usage for a dataflow diagram

• calculate performance data for a dataflow diagram

• given an algorithm, design a dataflow diagram

• given a dataflow diagram, design the datapath and finite state machine

• optimize a dataflow diagram to improve performance or reduce resource usage

• given a dataflow diagram, calculate the clock period that will result in the maximum performance


8.4 Functional Verification

8.4.1 Verification Topics

• test cases

• measuring coverage

• time for verification

• test benches

• assertions

• coverage monitors

• relational specification

• functional specification

• boundary conditions / corner cases


8.4.2 Verification Example Problems

• choose first cases to test

• identify corner cases

• choose technique to detect bug (test case, assertion/test bench)

• determine whether a code change will cause a bug

• identify a test case and either assertion or test bench to catch a bug


8.5 Performance Analysis and Optimization

8.5.1 Performance Topics

• time to execute a program

• definition of performance

• speedup

• n% faster

• calculating performance of different tasks and of average task

• choosing which task to optimize to best improve overall performance

• performance increase over time

• design tradeoffs (CPI vs NumInsts vs ClockSpeed vs time-to-market)

• CPI calculations

• MIPS calculations

• Clock speed vs. performance

• Optimality — performance / area tradeoffs


8.5.2 Performance Example Problems

• calculate performance / area tradeoffs

• calculate performance / time tradeoffs

• compare performance data between products

• evaluate performance criteria


8.6 Timing Analysis

8.6.1 Timing Topics

• circuit parameters that affect delay

– clock period

– clock skew

– clock jitter

– propagation delay

– load delay

– setup time

– hold time

– clock-to-Q time

• timing analysis of latch

• timing analysis of master-slave flip-flop

• timing analysis of hierarchical storage device

• critical path and false path

– algorithm to find critical path

– algorithm to determine if path is false or critical

– signal assignment to exercise critical path

• Elmore timing model

• derating factors


8.6.2 Timing Example Problems

• timing parameters for minimum clock period

• timing parameters for hold constraint

• find the critical path and assignment to exercise it

• compute Elmore delay constant

• compare accuracy of different timing models

• determine if a storage device will work correctly

• compute timing parameters of storage device

• identify timing violation, suggest remedy

• suggest design change to increase clock speed


8.7 Power

8.7.1 Power Topics

• power vs energy

• equations for power

– dynamic power

– static power

– switching power

– short circuit power

– leakage power

– activity factor

– leakage current

– threshold voltage

– supply voltage

• analog power reduction techniques

• RTL power reduction techniques

– data encoding

– clock gating


8.7.2 Power Example Problems

• predict effect of new fabrication process on power

• predict effect of environment change (temp, supply voltage, etc) on power consumption

• predict effect of design change on power consumption (capacitance, activity factor)

• design data-encoding scheme for a circuit, predict effect on power consumption

• design clock gating scheme for a circuit, predict effect on power consumption

• assess validity of various power- or energy-consumption metrics


8.8 Testing

8.8.1 Testing Topics

• causes of faults

• locations of faults

• physical faults

• single stuck-at fault model

• testable / untestable fault

• economics of testing

• fault coverage

• test vector generation

• order test vectors to reduce test time

• behaviour of a scan chain

• time to run a scan test

• JTAG

• built-in self-test

• linear feedback shift register

• signature analyzer

• Galois fields

• process and time to run a BIST test


8.8.2 Testing Example Problems

• compute optimal amount of testing to maximize profits

• compute coverage for a given set of test vectors

• find test vectors to catch a set of faults, choose order to run test vectors

• determine if a fault is detectable

• choose an LFSR to use for BIST test generation

• choose an LFSR to use for BIST signature analysis

• determine if a given BIST will catch a given fault

• determine probability that a given BIST technique will report that a faulty circuit is correct

• determine if a given fault-testing scheme will detect a physical fault

• match LFSR to characteristic polynomial

• match BIST hardware to Galois mathematics

• perform Galois field mathematics, compare to waveforms


8.9 Formulas to be Given on Final Exam

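$T = \dfrac{\mathit{Ins} \times C}{F}$

$\mathit{Pf} = \dfrac{W}{T}$

$S = \dfrac{T_1}{T_2}$

$M = \dfrac{F / 10^6}{\sum_{i=0}^{n} \mathit{PI}_i \times C_i}$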


Formulas II

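$P = \frac{1}{2}(A \times C_L \times V^2 \times F) + (\tau \times A \times V \times I_{Sh} \times F) + (V \times I_L)$

$q = 1.60218 \times 10^{-19}\,\mathrm{C}$

$k = 1.38066 \times 10^{-23}\,\mathrm{J/K}$

$F \propto \dfrac{(V - V_{Th})^2}{V}$

$I_L \propto e^{\frac{-q \times V_{Th}}{k \times T}}$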
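The power formulas can be explored numerically with a sketch like the following. The split of P into switching, short-circuit, and leakage terms follows the power topics listed earlier; the function names, argument names, and sample values are assumptions, and the two proportionalities are returned as relative quantities with the unknown constant of proportionality taken as 1.

```python
import math

Q = 1.60218e-19    # electron charge, coulombs
K = 1.38066e-23    # Boltzmann constant, joules per kelvin


def total_power(a, c_load, v, f, tau, i_short, i_leak):
    """P = 1/2 (A*CL*V^2*F) + (tau*A*V*ISh*F) + (V*IL)"""
    switching = 0.5 * a * c_load * v ** 2 * f
    short_circuit = tau * a * v * i_short * f
    leakage = v * i_leak
    return switching + short_circuit + leakage


def relative_frequency(v, v_th):
    """F is proportional to (V - VTh)^2 / V"""
    return (v - v_th) ** 2 / v


def relative_leakage(v_th, temp_k):
    """IL is proportional to exp(-q*VTh / (k*T))"""
    return math.exp(-Q * v_th / (K * temp_k))


if __name__ == "__main__":
    # Lowering the threshold voltage raises the achievable clock speed ...
    print(relative_frequency(1.0, 0.3) / relative_frequency(1.0, 0.4))
    # ... but also raises the leakage current.
    print(relative_leakage(0.3, 300.0) / relative_leakage(0.4, 300.0))
```

Comparing ratios rather than absolute values sidesteps the missing proportionality constants while still showing the direction and rough size of each effect.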