In-System Design Verification of Processors


Upload: bennett-dennis

Post on 08-Jan-2018



Introduction

Design Hierarchy
  Macro instruction level: simulator (behavioral); general-purpose registers, memory
  Micro-code level: verifier; adds the internal bus
  Verilog: hardware model; adds a clock-cycle-accurate description
  [Figure labels: SUB, ADD; macroSUB ... end; ISS; cycle-based Verilog (HDL)]

What is ISV?
  ISV = In-System Verification
  When is ISV required?
  1) Design refinement down along the design hierarchy
     Comparison between design levels
       C1: ISS (Instruction Set Simulator)
       C2: Cycle-based model
       C3: RTL model
     [Figure: the specification compared against C1, C2, C3, ..., Cn]

What is ISV? (cont'd)
  2) In-system operation: confirm correct behavior in the system environment
     (a) simulation
     (b) all-software
     (c) emulation
     (d) Virtual Chip
     [Figure: system and chip realized in SW or HW in each case; labels: SW, HW, HW (FPGA), I/F, HW (slowed)]

Simulation
  Consistency check between models of different abstraction levels
    Instruction Set Simulator (behavioral)
    RTL model (structural)
  Test vectors
    Test pattern
    Random pattern
    Test program
    Application program
  [Figure labels: SW; stimulus at the I/F]

Various Levels of Design Verification (Test Vectors in Simulation)
Test pattern
  Advantage: high efficiency (efficiency = number of bugs detected / size of test vector)
  Disadvantage: confined to the designer's understanding
Random pattern
  Advantage: covers rare cases; automatic generation
  Disadvantage: coverage not reliable
Test program
  Advantage: available as benchmarks; good compromise between coverage and efficiency
  Disadvantage: requires many programs to obtain sufficient coverage
Application program
  Advantage: simulates real situations; high coverage
  Disadvantage: excessive verification; low efficiency

All-Software Approach
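The efficiency measure from the test-vector comparison (bugs detected divided by test-vector size) can be made concrete with a small calculation. The Python sketch below is illustrative only: the function name and every number in it are invented for the example and do not come from the slides.

```python
def efficiency(bugs_detected: int, vector_size: int) -> float:
    """Efficiency as defined on the slide: bugs detected per unit of test-vector size."""
    if vector_size <= 0:
        raise ValueError("vector_size must be positive")
    return bugs_detected / vector_size

# Hypothetical numbers, chosen only to show the trade-off:
# a short hand-written test pattern versus a long application run.
hand_written = efficiency(bugs_detected=8, vector_size=2_000)
application = efficiency(bugs_detected=20, vector_size=50_000_000)

print(hand_written > application)
```

With numbers like these, the application run finds more bugs in absolute terms yet scores far lower on efficiency, which is the tension the comparison describes.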
Modeling the system part in software
  Test vectors: system software (BIOS, OS) and application programs, for compatible processor design
  Helps detect bugs
    when the situation is difficult to reproduce with random patterns (e.g., an instruction whose sensible behavior requires some pre-setting)
    when instruction behavior is complex, e.g., CISC instructions
  Modeling system parts is difficult when no source code for the application programs is available
  [Figure: SW (chip) connected to SW (system)]

Emulation
  Mapping the gate-level model into an FPGA-based system
  Fast ISV
  [Figure labels: HW in simulation speed, in design stage; slowed-down system HW; chip in FPGA]
  [Chart: verification time (seconds to months) versus speed-up factor (10^1 to 10^7) for actual hardware, logic emulation, and software simulation, marking the verification gap]

Concurrent Verification
Without emulation: sequential verification
  System SW: design, code, debug, integration
  Hardware: HW design, build, debug, integration
  Chip: design, fab, debug
  [Figure: the three tracks run one after another along the time axis, with back annotation]
With emulation: concurrent verification
  SW: design, code
  HW: design, build
  Chip: design, fab
  [Figure: HW emulation allows early integration with the HW; chip & HW debug and system integration & SW debug overlap, followed by final integration and market; back annotation]

Bridge between SW and HW: Virtual Chip
  Validates the functionality and evaluates the performance of an algorithm in real situations, i.e., with real-world vectors and a real hardware environment.
  Verifies the algorithm in the early design stage.

Concept of Virtual Chip
  [Figure: SW side: functional model of the processor, described in a programming language (C, C++); HW side: system, described in HDL (Verilog...); the bus bridge between SW and HW is an FPGA (pin signal generator)]

Virtual Chip [DAC98]
  Example: simulating the ISS with a real target system
  ISV with application programs in the early design stage
  [Figure: host computer running the chip model, connected by cable to a PSG (pin signal generator) daughter board on the target board]
  [DAC98] Virtual Chip: Making Functional Models Work on Real Target System

Why Virtual Chip?
  No need to model the external system in software, as in the all-software approach
  Inexpensive solution compared to an emulator
    small number of FPGAs
  HW slow-down is not necessary
    no need to modify the target system for emulation
  [Figure: hardware emulation (bus model, target board; slowed) versus Virtual Chip (bus model, buffer, target board; labels: slowed, normal)]

Benefit in Design Time
  [Figure: conventional design flow (architectural model, RTL, gate level, H/W emulation, verification with H/W; board design and application S/W idle until the H/W prototype or H/W emulation is available) versus Virtual-Chip-based design flow]
  Design time is drastically reduced.

x86-compatible Microprocessor Design
1. HK386: the first step to x86 (1994)
   300,000 transistors
   5 V, 0.8 µm DLM CMOS technology
   Die size: 1 cm x 1 cm
2. K486: the attempt at full custom (1997)
   1,000,000 transistors
   8 KB on-chip cache: full-custom design
   Die size: 1.5 cm x 1.5 cm
3. Marcia: superscalar architecture (1997)
   3,000,000 transistors
   3 V, 0.6 µm TLM CMOS technology
   Die size: 1.2 cm x 1.2 cm

Overall Functional Verification Flow
[Flowchart: architecture definition; microcode description (checked with the microcode verifier); RTL description in Verilog HDL (RTL simulation); synthesis; gate-level simulation; hardware emulation; verification completed. One path is labeled "for version control".]

Design Verification Methodology
  CPU models, from abstract to more refined:
    Instruction behavior in C (Polaris)
    Micro-architecture in C (MCV)
    RT level in Verilog
    Gate level
  Peripherals:
    Virtual PC in C language (VPC)
    Real motherboard H/W
  [Figure labels for the CPU-to-peripheral connections: C language, HDL, using PLI, FlexPC, Virtual Chip]
  MCV: Microcode Verifier
  PLI: Programming Language Interface

Polaris: ISS (Instruction Set Simulator)
ISS for x86 processors: Polaris
  A standard reference model for the design of x86 processors
  About 10,000 lines of code written in C
  Polaris can execute all programs that run on real PCs
  Polaris is used to verify the functionality of each instruction
  Polaris supports microcode design and debugging by serving as the verified reference model

MCV (Micro-Code Verifier)
Behavior simulation at the micro-operation level
Debugging features
  Trace the result of each micro-operation
  Backward operation
  Source code trace
[Figure: MCV debugging environment, showing the DOS simulation window, the internal states (registers and buses), and the symbolic microcode in execution; the states before executing this microcode can be restored]

StreC (Structural Level C Model)
RTL model written in C
  A cycle is levelized into 4 phases
  Static scheduling of logic behavior
  No timing delay
Cycle-based simulator
  High simulation speed (1.4 kHz)
Structural analysis of the design
  Signal flow graph
  Static timing verification
  Resource estimation at RTL
  RTL floorplan

Call order within one cycle (phases P1_EDGE, P1_LEVEL, P2_EDGE, P2_LEVEL):
  P1_EDGE:  CP1_EDGE(); DP1_EDGE(); FP1_EDGE(); SP1_EDGE(); KPP1_EDGE(); BP1_EDGE(); XP1_EDGE();
  P1_LEVEL: CP1_LEVEL(); DP1_LEVEL(); FP1_LEVEL(); SP1_LEVEL(); KPP1_LEVEL(); BP1_LEVEL(); XP1_LEVEL();
  P2_EDGE:  CP2_EDGE(); DP2_EDGE(); FP2_EDGE(); SP2_EDGE(); KPP2_EDGE(); BP2_EDGE(); XP2_EDGE();
  P2_LEVEL: XP2_LEVEL_1(); KPP2_LEVEL_1(); CP2_LEVEL_1(); DP2_LEVEL_1(); CP2_LEVEL_2(); FP2_LEVEL_1(); DP2_LEVEL_2(); SP2_LEVEL_1(); FP2_LEVEL_2(); KPP2_LEVEL_2(); BP2_LEVEL_1(); XP2_LEVEL_2();

RTL C Model (StreC™)
  RTL description in C
  Functional verification
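A cycle-based model with a levelized, statically scheduled cycle can be sketched in a few lines. The Python below is a minimal illustration of the idea, not StreC itself: the two-bit counter design and the block names (`counter_edge`, `increment_level`, `compare_level`) are hypothetical, and only the four phase names are taken from the slide.

```python
PHASES = ("P1_EDGE", "P1_LEVEL", "P2_EDGE", "P2_LEVEL")

def make_model():
    state = {"count": 0, "next_count": 0, "is_max": False}

    # Edge functions latch registers; level functions evaluate combinational logic.
    def counter_edge():
        state["count"] = state["next_count"]

    def increment_level():
        state["next_count"] = (state["count"] + 1) % 4

    def compare_level():
        state["is_max"] = state["count"] == 3

    # Static schedule: the call order per phase is fixed before simulation starts,
    # so the simulator needs no event queue and models no gate delays.
    schedule = {
        "P1_EDGE": [counter_edge],
        "P1_LEVEL": [compare_level, increment_level],
        "P2_EDGE": [],
        "P2_LEVEL": [],
    }
    return state, schedule

def run(cycles):
    """Simulate the given number of cycles and record (count, is_max) per cycle."""
    state, schedule = make_model()
    trace = []
    for _ in range(cycles):
        for phase in PHASES:
            for block in schedule[phase]:
                block()
        trace.append((state["count"], state["is_max"]))
    return trace

print(run(5))
```

Each simulated cycle is just a fixed sequence of function calls, which is where a cycle-based simulator gets its speed relative to an event-driven one.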
  Cycle-based simulation: about 100 times speed-up compared to VCS
  Translated to a Verilog RTL model
  Reduces total simulation time

Model     Speed      Time (Windows 3.1 running time)
Polaris   210 kHz    20 min
MCV       50 kHz     50 min
StreC     1.4 kHz    2 days
VCS       17 Hz      120 days
Chip      33 MHz     12 s

[Figure: time comparison of the conventional method (C model for functional verification, conversion to working Verilog code, then Verilog simulation for functional + timing verification) with StreC (C code, StreC simulation for functional verification, static timing verification)]

VPC (Virtual PC) library
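The figures in the model speed comparison can be cross-checked with a few lines of arithmetic. The Python sketch below takes the values as read from the slide, assuming the speeds and times are listed in the same order as the model names; the dictionary layout and function names are mine.

```python
MODELS = {
    # name: (simulation speed in Hz, Windows 3.1 running time in seconds)
    "Polaris": (210e3, 20 * 60),
    "MCV": (50e3, 50 * 60),
    "StreC": (1.4e3, 2 * 24 * 3600),
    "VCS": (17.0, 120 * 24 * 3600),
    "Chip": (33e6, 12),
}

def speed_ratio(a, b):
    """How many times faster model `a` runs than model `b`."""
    return MODELS[a][0] / MODELS[b][0]

def time_ratio(a, b):
    """How many times longer model `a` takes than model `b`."""
    return MODELS[a][1] / MODELS[b][1]

print(f"StreC vs VCS speed: {speed_ratio('StreC', 'VCS'):.0f}x")
print(f"VCS vs StreC running time: {time_ratio('VCS', 'StreC'):.0f}x")
```

Under these assumptions the clock-rate ratio comes out near 82 and the running-time ratio at 60, the same order of magnitude as the "about 100 times" quoted for StreC against VCS.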
Library of PC chipset models
  Software model of a PC board
  Can interface to a CPU model of any level
  Provides interfaces to the workstation platform
    keyboard, graphics card: X Window
    floppy disk, hard disk: UNIX file system
  About 20,000 lines of C code
BIOS code
  Consists mostly of x86 assembly
  Speed-critical parts are implemented as C functions
    disk and graphics routines
    register values are transferred via I/O ports

VPC (Virtual PC) Environment
  [Figure: the Virtual PC contains the PC model (chipset model, memory, Intel i386 BIOS in assembly and C routines) with debugging features for simulation & debugging; the CPU model attaches through the x86 interface; interface routines attach through the platform interface to the platform (X Window, keyboard with Xlib, UNIX file system)]

HK386 Design Specification
  Compatibility: instruction-level and pin-to-pin compatible with the i386
  Performance: similar to the i386
  Operation speed: 40 MHz
  Process: 0.8 µm DLM CMOS
  Test programs: MS-DOS 6.0, Windows 3.1, Office 4.0, CAD tools, games, etc.
  [Screenshots: MS Windows 3.1, MS Office, MaxPlus II]

HK387 Design Specification
  Compatibility: instruction-level and pin-to-pin compatible with the i387
  Operation speed: 33 MHz
  Process: 0.8 µm 2LM CMOS
  Performance
    [Chart: PC Magazine coprocessor benchmark (ops/sec) for Intel 387, ULSI 387, HK387, and Cyrix 387]
  Applications: AutoCAD R11, Mathematica 3.0, Design Center, ...

Simulation Input Vector
Off-the-shelf test vectors
  Regression test
  Intensive instruction test programs: more than 500 programs
Random test vector generator (Pandora)
  Template based
  Improves the test coverage
Real applications
  DOS, Windows
[Figure: Pandora; labels: determine type of instructions, sequence of testing, processor status]

Saver with Modify and Restart Capability
Conventional saver
  Dumps all running information at arbitrary time points.
  Any modification forces the simulation to be rewound to the beginning.
Proposed saver
  Finds the nearest suitable points to save a snapshot, then saves only the internal states rather than the whole simulation context.
  Can be restarted at any save point by triggering a signal, in spite of design modification.
  The save point is actively adjusted to a stable point.
[Figure: save points, conventional versus proposed]

Reduction of Simulation Time
  [Timeline labels: simulation started; bug detected; signal dump generation for debugging; debugging; resimulation from the beginning]

                  Time to bug detection   Timing overhead for a bug-fix   Size of debugging loop for failure of bug-fix
  Without saver   TBD                     TBD + TDBG                      TBD +
  Conventional    TBD                     TSD + TDBG                      TBD +
  Proposed saver  TBD                     TSD + TDBG                      TSD +

x86 Emulation Configuration
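The idea behind the proposed saver, resuming from a saved snapshot of internal state instead of rerunning from the beginning, can be sketched as below. This is a toy Python model under stated assumptions: the "design" is a simple accumulator with an injected bug, snapshots are taken at a fixed interval (the adjustment of save points to stable points is not modeled), the fix is assumed not to change the layout of the saved state, and all names are hypothetical.

```python
import copy

def step(state, fixed):
    """One simulated cycle of a toy design. `fixed` selects the bug-fixed version."""
    state["cycle"] += 1
    # Injected bug: the original design corrupts the accumulator at cycle 75.
    if state["cycle"] == 75 and not fixed:
        state["acc"] -= 1000
    else:
        state["acc"] += state["cycle"]

def simulate(end_cycle, fixed=False, start=None, save_every=20):
    """Run to end_cycle; return final state, snapshots, and cycles simulated."""
    state = copy.deepcopy(start) if start is not None else {"cycle": 0, "acc": 0}
    snapshots, simulated = {}, 0
    while state["cycle"] < end_cycle:
        if state["cycle"] % save_every == 0:
            # Save only the internal state, not the whole simulation context.
            snapshots[state["cycle"]] = copy.deepcopy(state)
        step(state, fixed)
        simulated += 1
    return state, snapshots, simulated

# First run with the buggy design; snapshots are taken every 20 cycles.
_, snaps, _ = simulate(100)
restart = snaps[max(c for c in snaps if c < 75)]

# After the fix, resume from the nearest snapshot before the bug
# and compare against a full rerun from cycle 0.
resumed, _, cost_resume = simulate(100, fixed=True, start=restart)
full, _, cost_full = simulate(100, fixed=True)
print(resumed == full, cost_resume, cost_full)
```

In this toy case the resumed run reaches the same final state as the full rerun while simulating 40 cycles instead of 100; the saving grows with the distance between the start of simulation and the bug.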
[Figure: Quickturn hardware emulator connected through the probe module and the target interface board to a slowed-down PC]

Debugging Progress Traces
  [Chart: debugging progress from HDL simulation (HDL saver attached) to hardware emulation (setup version, update 1, update 2, update 3), running DOS and Windows]

Caught-Bug Categories
  1. Test programs and random test vectors are verified concurrently.
  2. Exceptional cases of complex instructions are hard to verify fully with test vectors alone.

Conclusions
  ISV (In-System Verification) is a must for ensuring that application programs work correctly on the whole system, and for reducing time-to-market.
  We have presented various approaches for in-system verification of microprocessors and DSP processors.

ASIP (Application-Specific Instruction Set Processor) Design

References
  J.H. Yang et al., MetaCore: An Application-Specific DSP Development System, 1998 DAC Proceedings, pp
  J.H. Yang et al., MetaCore: An Application-Specific Programmable DSP Development System, IEEE Trans. VLSI Systems, vol. 8, April 2000, pp
  B.W. Kim et al., MDSP-II: 16-bit DSP with Mobile Communication Accelerator, IEEE JSSC, vol. 34, March 1999, pp

Part I: ASIP in general
  An ASIP is a compromise between a GPP (General-Purpose Processor), which can be used anywhere but with low performance, and a full-custom ASIC, which fits only a specific application but with very high performance.
  GPP, DSP, ASIP, FPGA, ASIC (sea of gates), CBIC (standard-cell-based IC), and full-custom ASIC are listed in order of increasing performance and decreasing adaptability.
  Recently, ASICs as well as FPGAs contain processor cores.

Cost, Performance, Programmability, and TTM (Time-to-Market)
ASIP (Application-Specific Instruction set Processor)
  An ASIP is a trade-off between the advantages of a general-purpose processor (flexibility, short development time) and those of an ASIC (fast execution time).
  [Figure: general-purpose processor, ASIP, and ASIC compared along four axes: execution time, rigidity, cost (NRE + chip area; depends on volume of product), and development time]

Comparison of Typical Development Time
  MetaCore (ASIP)
    Chip manufacturer time: 20 months (MetaCore development)
    Customer time: 3 months (core generation + application code development)
  General-purpose processor
    Chip manufacturer time: 20 months (core generation)
    Customer time: 2 months (application code development)
  ASIC
    10 months

Issues in ASIP Design
  For high execution speed, flexibility, and small chip area:
    an optimal selection of micro-architecture and instruction set is required, based on diverse exploration of the design space.
  For short design turnaround time:
    an efficient means of transforming a higher-level specification into a lower-level implementation is required.
  For friendly support of application program development:
    fast development of a suite of supporting software, including a compiler and an ISS (Instruction Set Simulator), is necessary.

Various ASIP Development Systems
RISC-like micro-architecture (register-based operation)
  System                     Year   Selection from predefined superset   User-defined instructions   Application programming level
  PEAS-I (Univ. Toyohashi)   1991   Yes                                  No                          C language
  ASIA (USC)                 1993   (generates a proper instruction set based on a predefined datapath)   C language
DSP-oriented micro-architecture (memory-based operation)
  System                     Year   Selection from predefined superset   User-defined instructions   Application programming level
  EPICS (Philips)            1993   Yes                                  No                          assembly
  CD2450 (Clarkspur)         1995   Yes                                  No                          assembly
  MetaCore (KAIST)           1997   Yes                                  Yes                         C language

Part II: MetaCore System
Verification with co-generated compiler and ISS
MetaCore system
  ASIP development environment
  Reconfigurable fixed-point DSP architecture
  Retargetable system software: C compiler, ISS, assembler
MDSP-II: a 16-bit DSP targeted at GSM applications

The Goal of MetaCore System
  Supports an efficient design methodology for ASIPs targeted at the DSP application field.
    Diverse design exploration
    Performance/cost-efficient design
    Automatic design generation
    Short chip/core design turnaround time
    In-situ generation of application program development tools

Overview: How to Obtain a DSP Core from MetaCore System
Inputs
  Instructions
    Primitive class: add, and, or, sub
    Optional class: mac, max, min
  Architecture template: bus structure, data-path structure, pipeline model
  Functional blocks: adder, multiplier, shifter
Flow
  1. Select instructions, architectural parameters, and functional blocks.
  2. Simulate with the benchmark programs.
  3. If the result is not OK: modify the architecture, add or delete instructions, or add or delete functional blocks, then simulate again.
  4. If OK: HDL code generation, then logic synthesis.

System Library & Generator Set: Key Components of MetaCore System
  Generator set
    Compiler generator (produces the C compiler)
    ISS generator (produces the ISS)
    HDL generator (produces synthesizable HDL code)
  System library
    Architecture template: bus structure, pipeline model, data-path structure
    Set of instructions: instruction definition, related functional block
    Set of functional blocks: parameterized HDL code, I/O port information, gate count
  [Figure: the processor specification and the benchmark programs are simulated with the generated C compiler and ISS; after evaluation the specification is modified (with additions to the library where needed) until it is accepted]

Processor Specification (example)
The specification of the target core
  defines the instruction set and the hardware configuration;
  is easy for the designer to use and modify, owing to its high-level abstraction.

[Listing fragments, as far as they survive in the transcript:]
  //Specification of EM1
  (hardware ACC AR pmem k, [2047: 0] . )          [label: hardware configuration]
  (def_inst ADD (operandtype2) (ACC a0)
  pc=L1
      clr a1      ; a1=0
  L1: add a1, a0  ; a1=a1+a0
  [labels: application-specific instruction; frequent sequence of contiguous instructions]

HDL Code Generator
  Input: processor specification. Output: synthesizable HDL code.
  Connectivity synthesis
    Connects the I/O and control ports of each functional block to buses and control signals
  Control-path synthesis
    Generates decoder logic for each pipeline stage
  Macro-block generation
    Instantiates the parameter variables of each functional block
  [Figure: target core with controller (decoder logic), ALU, multiplier, shifter, BMU, register file, AGUs, data memories, program memory, and peripherals (timer, SIO); annotations: memory size, address space; bit-width of functional blocks]

Design Example (MDSP-II)
Target application: GSM (Global System for Mobile communication)
Benchmark programs
  C programs (each algorithm constituting GSM)
Procedure of design refinement
  Remove infrequent instructions based on instruction usage count
  Turn frequent sequences of contiguous instructions into a new instruction
[Figure: EM0 to EM1 to EM2 (MDSP-II)]
  EM0: initial design containing all predefined instructions
  EM2 (MDSP-II): final design containing application-specific instructions

Evolution of MDSP-II Core from the Initial Machine
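The two refinement steps in the design-refinement procedure (dropping rarely used instructions, then fusing frequent contiguous sequences) can be approximated on an instruction trace. The Python sketch below is hypothetical throughout: the trace, the threshold, and the function names are invented for illustration, and the actual MetaCore flow works from compiled benchmark programs rather than a hand-written list.

```python
from collections import Counter

def infrequent(trace, instruction_set, min_count):
    """Instructions in the set that occur fewer than min_count times in the trace."""
    counts = Counter(trace)
    return sorted(op for op in instruction_set if counts[op] < min_count)

def frequent_pairs(trace, top=1):
    """Most common pairs of contiguous instructions: candidates for a fused instruction."""
    pairs = Counter(zip(trace, trace[1:]))
    return pairs.most_common(top)

# Made-up trace and instruction set, for illustration only.
trace = ["mul", "add", "mul", "add", "shl", "mul", "add", "sub"]
isa = {"add", "sub", "mul", "shl", "max", "min"}

print(infrequent(trace, isa, min_count=1))   # candidates for removal
print(frequent_pairs(trace))                 # candidates for fusion
```

On this made-up trace the removal candidates are the two unused instructions, and the top fusion candidate is a multiply followed by an add, the kind of sequence a multiply-accumulate instruction would replace.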
Machine              Number of clock cycles (for 1 s of voice data processing)   Gate count
EM0 (initial)        53.0 million                                                18.1K
EM1 (intermediate)   53.1 million                                                15.0K
EM2 (MDSP-II)        27.5 million                                                19.3K

[Chart: number of clock cycles (10M to 50M) versus gate count (5K to 20K) for EM0, EM1, and EM2 (MDSP-II)]

Design Turnaround Time (MDSP-II)
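The trade-off between cycle count and gate count across EM0, EM1, and EM2 can be quantified directly. The Python sketch below uses only the three rows given on the slide; the helper name and the data layout are mine.

```python
MACHINES = {
    # name: (clock cycles in millions for 1 s of voice data, gate count in thousands)
    "EM0": (53.0, 18.1),
    "EM1": (53.1, 15.0),
    "EM2": (27.5, 19.3),
}

def percent_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

cycles = percent_change(MACHINES["EM2"][0], MACHINES["EM0"][0])
gates = percent_change(MACHINES["EM2"][1], MACHINES["EM0"][1])
print(f"EM2 vs EM0: cycles {cycles:+.1f}%, gates {gates:+.1f}%")
```

Relative to the initial machine, the final design needs roughly 48% fewer cycles for the same workload at a cost of roughly 7% more gates.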
  Design turnaround time is significantly reduced because HDL design and functional simulation take less time.
  Only the hardware blocks for application-specific instructions, if any, need to be designed by the user.
  Design progress with MetaCore (time axis in months, 1 to 3)
    Application analysis: 5 weeks
    HDL design, functional simulation: 1 week
    Layout, timing simulation: 7 weeks
    Tape-out

Overview of EM2 (MDSP-II)
  16-bit fixed-point DSP
  Optimized for GSM
  0.6 µm CMOS (TLM), 9.7 mm x 9.8 mm
  55
  [Block diagram: MCAU, program memory, data memory, PU (SIO, timer), DALU, PCU, AGU]
  MCAU (Mobile Comm. Acceleration Unit)
    Consists of functional blocks for application-specific instructions
    16x16 multiplier
    32-bit adder
  DALU (Data Arithmetic Logic Unit)
    16x16 multiplier
    16-bit barrel shifter
    32-bit adder
    Data switch network
  PCU (Program Control Unit)
  AGU (Address Generation Unit)
    Supports linear, modulo, and bit-reverse addressing modes
  PU (Peripheral Unit)
    Serial I/O
    Timer

Conclusions
  MetaCore, an effective ASIP design methodology for DSP, is proposed.
  1) A benchmark-driven, high-level abstraction of the processor specification enables performance/cost-effective design.
  2) The generator set with the system library enables short design turnaround time.