In-System Design Verification of Processors

Introduction

Design Hierarchy
- Macro instruction level simulator (behavioral): general-purpose registers, memory
- Micro-code level verifier: + internal bus
- Verilog hardware model: + clock-cycle-accurate description
[Figure: macro instructions (SUB, ADD) and a microcode routine (macroSUB ... end); model labels: ISS, cycle-based, Verilog (HDL)]

What is ISV?
ISV = In-System Verification
When is ISV required?
1) Design refinement down along the design hierarchy
   Comparison between design levels:
   - C1: ISS (Instruction Set Simulator)
   - C2: Cycle-based model
   - C3: RTL model
   [Figure: specification checked against each level Cn; C1 vs. C2 vs. C3]

What is ISV? (contd)
2) In-system operation: confirm correct behavior in the system environment
   (a) simulation
   (b) all-software
   (c) emulation
   (d) Virtual Chip
   [Figure: chip and system each realized in SW or HW; labels: chip HW (FPGA), chip I/F, system HW (slowed)]

Simulation
Consistency check between models of different abstraction levels:
- Instruction Set Simulator (behavioral)
- RTL model (structural)
Test vector:
- Test pattern
- Random pattern
- Test program
- Application program
SW stimulus at the I/F
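
The consistency check between a behavioral model and a more refined model can be pictured as running both on the same test vector and comparing architectural state after every instruction. The sketch below is only an illustration under stated assumptions: the two-instruction machine, the type names and the function names are hypothetical and are not taken from the slides, and the "refined" model is a plain C stand-in for an RTL description.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-instruction machine, used only for illustration. */
typedef enum { OP_ADD, OP_SUB } Opcode;
typedef struct { Opcode op; uint16_t operand; } Insn;
typedef struct { uint16_t acc; } State;      /* architectural state to compare */
typedef void (*StepFn)(State *s, Insn i);

/* Reference model (ISS style): the behavior is written directly. */
void iss_step(State *s, Insn i) {
    if (i.op == OP_ADD) s->acc = (uint16_t)(s->acc + i.operand);
    else                s->acc = (uint16_t)(s->acc - i.operand);
}

/* Refined model (stand-in for RTL): SUB is built from the adder as
   acc + ~operand + 1, the way a datapath would compute it. */
void rtl_step(State *s, Insn i) {
    uint16_t b  = (i.op == OP_SUB) ? (uint16_t)~i.operand : i.operand;
    uint16_t ci = (i.op == OP_SUB) ? 1u : 0u;
    s->acc = (uint16_t)(s->acc + b + ci);
}

/* Same datapath with the carry-in forgotten: a deliberately seeded bug. */
void rtl_step_buggy(State *s, Insn i) {
    uint16_t b = (i.op == OP_SUB) ? (uint16_t)~i.operand : i.operand;
    s->acc = (uint16_t)(s->acc + b);
}

/* Run both models in lockstep from reset. Returns the index of the first
   instruction after which the states differ, or -1 if they always agree. */
long first_mismatch(StepFn ref, StepFn dut, const Insn *prog, size_t n) {
    State a = {0}, b = {0};
    for (size_t k = 0; k < n; k++) {
        ref(&a, prog[k]);
        dut(&b, prog[k]);
        if (a.acc != b.acc) return (long)k;
    }
    return -1;
}
```

Calling `first_mismatch(iss_step, rtl_step_buggy, prog, n)` on a program that contains a SUB points at the first diverging instruction, which is the information a designer needs to start debugging the refined model.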

Various Levels of Design Verification (Test Vectors in Simulation)
Test pattern
- Advantage: high efficiency (efficiency = # of bugs detected / size of test vector)
- Disadvantage: confined to the designer's understanding
Random
- Advantage: covers rare cases; automatic generation
- Disadvantage: coverage not reliable
Program
- Advantage: available; good compromise between coverage & efficiency; usable as benchmark
- Disadvantage: requires many programs to obtain sufficient coverage
Application
- Advantage: simulates real situations; high coverage
- Disadvantage: excessive verification; low efficiency

All-Software Approach
Modeling System Part in Software
- Test vector: system software (BIOS, OS), application programs
- Compatible processor design
- Helps detect bugs:
  - when the situation is difficult to reproduce with random patterns (i.e., sensible behavior of the instructions requires some pre-setting)
  - when instruction behavior is complex, i.e., CISC instructions
- Modeling system parts is difficult when no source code for the application programs is available
[Figure: SW (chip) connected to SW (system)]

Emulation
Mapping Gate-level Model in FPGA-based System
- Fast in simulation speed
- ISV in design stage
- HW in FPGA
- Slowed-down system
[Figure: running time (seconds to months) against speed-up factor (10^1 to 10^7) for actual hardware, logic emulation and software simulation; the difference is labeled "Verification Gap"]

Concurrent Verification
Without Emulation: Sequential Verification
[Figure: timeline. System SW: Design, Code, Debug, Integration. Hardware: HW Design, Build, Debug, Integration. CHIP: Design, Fab, Debug. Back annotation.]

With Emulation: Concurrent Verification
[Figure: timeline. SW Design, Code; HW Design, Build; CHIP Design, Fab; HW emulation; Chip & HW Debug; Sys integration & SW Debug; Back annotation. Milestones: "Early to HW integration", "Final Integration", "Market!!"]

Bridge between SW and HW: Virtual Chip
- Validate the functionality and performance evaluation of algorithm in real situations, i.e., with real-world vectors and real hardware environment.
- Verify the algorithm in the early design stage.

Concept of Virtual Chip
[Figure: the Virtual Chip as a bridge between SW and HW. SW side: functional model of the processor, described in a programming language (C, C++). HW side: bus description in HDL (Verilog...) in an FPGA (pin signal generator), connected to the system.]

Virtual Chip [DAC98]
[DAC98] Virtual Chip: Making Functional Models Work on Real Target System
- Example: simulating ISS with real target system
- ISV with application program in early design stage
[Figure: host computer (chip model), cable, PSG (pin signal generator) daughter board, target board]

Why Virtual Chip?
- No need to model external system in software as in all-software approach
- Inexpensive solution compared to emulator: small number of FPGAs
- HW slow-down is not necessary: no need to modify target system for emulation
[Figure: hardware emulation (bus model slowed, target board slowed) versus Virtual Chip (bus model slowed, buffer, target board normal)]

Benefit in Design Time
[Figure: conventional design flow versus Virtual-Chip-based design flow. Labels: architectural model, RTL, gate-level, H/W emulation, verification w/ H/W, H/W prototype (H/W emulation), board design, application S/W, idle, (Virtual Chip)]
Design time is drastically reduced.

x86-compatible Microprocessor Design
1. HK386: The first step to x86 (1994)
   - 300,000 Tr. count
   - 5V, 0.8um DLM CMOS technology
   - die size: 1cm x 1cm
2. K486: The attempt to full custom (1997)
   - 1,000,000 Tr. count
   - 8KB on-chip cache: full-custom design
   - die size: 1.5cm x 1.5cm
3. Marcia: Superscalar architecture (1997)
   - 3,000,000 Tr. count
   - 3V, 0.6um TLM CMOS technology
   - die size: 1.2cm x 1.2cm

Overall Functional Verification Flow
[Figure: flow chart. Labels: Architecture Define, Microcode Description, RTL Description (Verilog HDL), Synthesis, RTL Simulation, Microcode Verifier, Gate Level Simulation, "For version control", Hardware Emulation, Verification Completed]

Design Verification Methodology
[Figure: CPU models in order of refinement ("more refined model"): instruction behavior in C (Polaris), micro-architecture in C (MCV), RT-level in Verilog, gate-level, real H/W. System models: Virtual PC in C language (VPC) with C-language peripherals, HDL using PLI, FlexPC, Virtual Chip, real mother-board.]
- MCV: Microcode Verifier
- PLI: Programming Language Interface

Polaris: ISS (Instruction Set Simulator)
ISS for x86 processors: Polaris
- A standard reference model for the design of x86 processors
- About 10,000 lines of code written in C
- Polaris can execute all the programs that run on real PCs
- Polaris is used for verifying the functionality of each instruction
- Polaris helps microcode design and debugging by serving as the verified reference model

MCV (Micro-Code Verifier)
- Behavior simulation at micro-operation level
- Debugging features:
  - trace each micro-operation result
  - operation backward
  - source code trace
  - internal states (registers and buses)
  - symbolic microcode in execution
  - states before executing this microcode can be restored
[Figure: MCV debugging environment with DOS simulation window]

StreC (Structural Level C Model)
RTL model using C language
- A cycle is levelized into 4 phases
- Static scheduling of logic behavior
- No timing delay
Cycle-based simulator
- High simulation speed (1.4KHz)
Structural analysis of design
- Signal flow graph
- Static timing verification
- Resource estimation at RTL
- RTL floorplan
[Figure: clock P1 with the four phases P1_EDGE, P1_LEVEL, P2_EDGE, P2_LEVEL]

    CP1_EDGE(); DP1_EDGE(); FP1_EDGE(); SP1_EDGE(); KPP1_EDGE(); BP1_EDGE(); XP1_EDGE();
    CP1_LEVEL(); DP1_LEVEL(); FP1_LEVEL(); SP1_LEVEL(); KPP1_LEVEL(); BP1_LEVEL(); XP1_LEVEL();
    CP2_EDGE(); DP2_EDGE(); FP2_EDGE(); SP2_EDGE(); KPP2_EDGE(); BP2_EDGE(); XP2_EDGE();
    XP2_LEVEL_1(); KPP2_LEVEL_1(); CP2_LEVEL_1(); DP2_LEVEL_1(); CP2_LEVEL_2(); FP2_LEVEL_1();
    DP2_LEVEL_2(); SP2_LEVEL_1(); FP2_LEVEL_2(); KPP2_LEVEL_2(); BP2_LEVEL_1(); XP2_LEVEL_2();

RTL C Model (StreC(TM))
- RTL description in C
- Functional verification
- Cycle-based simulation: about 100 times speed-up compared to VCS
- Translated to Verilog RTL model

Reducing total simulation time

    Model     Speed     Time
    Polaris   210KHz    20min.
    MCV       50KHz     50min.
    StreC     1.4KHz    2days
    VCS       17Hz      120days
    Chip      33MHz     12sec.
    (time: Windows 3.1 running time)

[Figure: time spent by the conventional method (Verilog simulation, functional + timing, working Verilog code) versus the StreC method (C model for functional simulation, static timing verification, conversion)]
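
The four-phase, statically scheduled style of the StreC model can be illustrated with a very small cycle-based simulator. This is a minimal sketch under assumptions: the `Design` structure, its fields and the phase functions below are hypothetical and far simpler than the per-block functions (CP1_EDGE() and so on) listed above. The point it shows is that each cycle is a fixed sequence of function calls, with no event queue and no timing delays.

```c
#include <stdint.h>

/* Hypothetical design state; names are illustrative, not from StreC. */
typedef struct {
    uint8_t count;       /* register loaded on the P1 edge            */
    uint8_t next_count;  /* combinational value settled in P1 level   */
    uint8_t shadow;      /* register loaded on the P2 edge            */
    int     wrapped;     /* combinational flag settled in P2 level    */
} Design;

static void p1_edge(Design *d)  { d->count = d->next_count; }
static void p1_level(Design *d) { d->next_count = (uint8_t)(d->count + 1); }
static void p2_edge(Design *d)  { d->shadow = d->count; }
static void p2_level(Design *d) { d->wrapped = (d->shadow == 0); }

/* One clock cycle: the four phases run in a fixed order chosen
   ahead of time (static scheduling). */
void simulate_cycle(Design *d) {
    p1_edge(d);
    p1_level(d);
    p2_edge(d);
    p2_level(d);
}

/* Cycle-based simulation loop: no events, no delays, just cycles. */
void simulate(Design *d, unsigned long cycles) {
    for (unsigned long i = 0; i < cycles; i++) {
        simulate_cycle(d);
    }
}
```

Because the evaluation order is decided once, the inner loop reduces to straight-line calls, which is one plausible reason a model of this kind runs much faster than an event-driven HDL simulator.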
Timing verification StreC Simulation timing C code VPC (Virtual PC)
library
Library of PC chipset model
- Software model of PC board
- Capable of interfacing to a CPU model of any level
- Provides interfaces for workstation platform
  - keyboard, graphic card: X Windows
  - floppy disk, hard disk: UNIX file system
- C code of 20,000 lines
BIOS code
- Mostly consists of x86 assembly program
- Speed-critical part is implemented with C functions: disk, graphic routine
- Register values are transferred via I/O port

VPC (Virtual PC) Environment
[Figure: Virtual PC. PC model: CPU model (intel i386), chipset model, memory, BIOS (assembly and C routines), debugging feature (simulation & debugging), x86 interface. Platform: platform interface, interface routines, X window, keyboard with Xlib, UNIX file system.]

HK386 Design Specification
- Compatibility: instruction level, pin-to-pin compatible with i386
- Performance: similar to i386
- Operation speed: 40 MHz
- Process: 0.8um DLM CMOS
Test Programs
- MS DOS 6.0, Windows 3.1, Office 4.0
- CAD tools, games, etc.
[Figure: screenshots of MS Win. 3.1, MS Office, MaxPlus II]

HK387 Design Specification
- Compatibility: instruction level, pin-to-pin compatible with i387
- Operation speed: 33 MHz
- Process: 0.8um 2LM CMOS
- Performance: PC Magazine coprocessor benchmark [ops/sec]
[Figure: benchmark results for intel 387, ULSI 387, HK 387, Cyrix 387]
- AutoCAD R11, Mathematica 3.0, Design Center ...

Simulation Input Vector
- Off-the-shelf test vector
  - Regression test
  - Intensive instruction test programs: more than 500 programs
- Random Test Vector Generator (Pandora)
  - Template based
  - Improves the test coverage
- Real applications
  - DOS, Windows
[Figure: Pandora. Labels: determine type of instructions, sequence of testing, processor status]

Saver with Modify and Restart Capability
Conventional saver
- Dumps all running information at arbitrary time points.
- Any modification forces the simulation to be rewound to the beginning.
Proposed saver
- Finds the nearest suitable point to save a snapshot, then saves only the internal states rather than the whole simulation context.
- Can be restarted at any save point by triggering a signal, in spite of design modification.
- The save point is actively adjusted to a stable point.
[Figure: save points of the conventional and the proposed saver]
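
The idea behind the proposed saver (store only internal state at a stable point, then restart from there after a change) can be sketched as follows. Everything here is an assumption made for illustration: the `CoreState` layout, the rule that a cycle count divisible by 8 is "stable", and the toy `step` function whose increment stands in for logic that a bug fix might modify. The real saver works on an HDL simulation and is not described in this much detail in the slides.

```c
#include <stdint.h>

/* Hypothetical internal state of the simulated design. */
typedef struct {
    uint32_t      pc;
    uint32_t      regs[4];
    unsigned long cycle;
} CoreState;

/* A snapshot holds internal state only, not the simulator context. */
typedef struct {
    CoreState state;
    int       valid;
} Snapshot;

/* Toy design: the increment stands in for logic a bug fix may change. */
void step(CoreState *s, uint32_t increment) {
    s->regs[0] += increment;
    s->pc      += 4;
    s->cycle++;
}

/* Assumption for this sketch: every 8th cycle is a stable point. */
int is_stable_point(const CoreState *s) {
    return s->cycle % 8 == 0;
}

/* Run up to cycle `until`, keeping a snapshot of the latest stable point. */
void run_with_saver(CoreState *s, unsigned long until,
                    uint32_t increment, Snapshot *snap) {
    while (s->cycle < until) {
        if (is_stable_point(s)) {
            snap->state = *s;
            snap->valid = 1;
        }
        step(s, increment);
    }
}

/* Restart from the snapshot instead of from cycle 0.
   Returns 1 on success, 0 if no snapshot has been taken. */
int restart_from(CoreState *s, const Snapshot *snap) {
    if (!snap->valid) return 0;
    *s = snap->state;
    return 1;
}
```

After a modification, only the cycles between the save point and the failure are simulated again, which is where the saving in resimulation time comes from.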

Reduction of Simulation Time
[Figure: timeline from "Simulation Started" to "Bug Detected". Labels: timing overhead for a bug-fix, size of debugging loop for failure of bug-fix, signal dump generation for debugging, resimulation from the beginning, debugging]

    Without Saver     TBD   TBD+   TDBG   TBD+
    Conventional      TBD   TSD+   TDBG   TBD+
    Proposed Saver    TBD   TSD+   TDBG   TSD+

x86 Emulation Configuration
[Figure: Quickturn hardware emulator, probe module, target interface board, slow-down PC]

Debugging Progress Traces
[Figure: progress of DOS and Windows runs over time. Labels: HDL simulation (HDL saver attached), hardware emulation, setup, version update 1, update 2, update 3]

Caught-Bug Categories
1. Test programs and random test vectors are verified concurrently.
2. Exceptional cases of complex instructions are hard to verify fully with test vectors alone.

Conclusions
- ISV (In-System Verification) is a MUST for assuring that APPLICATION programs work successfully on the WHOLE SYSTEM, and for reducing time-to-market.
- We have presented various approaches for in-system verification of microprocessors and DSP processors.

ASIP (Application-Specific Instruction Set Processor) Design

Reference
- J.H.Yang et al, MetaCore: An Application-Specific DSP Development System, 1998 DAC Proceedings, pp
- J.H.Yang et al, MetaCore: An Application-Specific Programmable DSP Development System, IEEE Trans. VLSI Systems, vol 8, April 2000, pp
- B.W.Kim et al, MDSP-II: 16-bit DSP with Mobile Communication Accelerator, IEEE JSSC, vol 34, March 1999, pp

Part I : ASIP in general
- ASIP is a compromise between the GPP (General-Purpose Processor), which can be used anywhere but with low performance, and the full-custom ASIC, which fits only a specific application but with very high performance.
- In order of increasing performance and decreasing adaptability: GPP, DSP, ASIP, FPGA, ASIC (sea of gates), CBIC (standard cell-based IC), full-custom ASIC.
- Recently, ASICs as well as FPGAs contain processor cores.
- Cost, performance, programmability, and TTM (Time-to-Market)

ASIP (Application-Specific Instruction set Processor)
ASIP is a tradeoff between the advantages of a general-purpose processor (flexibility, short development time) and those of an ASIC (fast execution time).
[Figure: general-purpose processor, ASIP and ASIC compared on execution time, rigidity, development time and cost (NRE + chip area; depends on volume of product)]

Comparison of Typical Development Time
(first figure: chip manufacturer time; second figure: customer time)
- MetaCore (ASIP): 20 months (MetaCore development); 3 months (core generation + application code development)
- General-purpose processor: 20 months (core generation); 2 months (application code development)
- ASIC: 10 months

Issues in ASIP Design
- For high execution speed, flexibility and small chip area: an optimal selection of micro-architecture and instruction set is required, based on diverse exploration of the design space.
- For short design turnaround time: an efficient means of transforming a higher-level specification into a lower-level implementation is required.
- For friendly support of application program development: fast development of a suite of supporting software, including a compiler and an ISS (Instruction Set Simulator), is necessary.

Various ASIP Development Systems
Columns: system, year, instruction set customization (selection from predefined super set / user-defined instructions), application programming level.

RISC-like micro-architecture (register-based operation)
- PEAS-I (Univ. Toyohashi), 1991: selection from predefined super set: Yes; user-defined instructions: No; C-language
- ASIA (USC), 1993: generates proper instruction set based on predefined datapath; C-language
DSP-oriented micro-architecture (memory-based operation)
- EPICS (Philips), 1993: selection from predefined super set: Yes; user-defined instructions: No; assembly
- CD2450 (Clarkspur), 1995: selection from predefined super set: Yes; user-defined instructions: No; assembly
- MetaCore (KAIST), 1997: selection from predefined super set: Yes; user-defined instructions: Yes; C-language

Part II : MetaCore System
- Verification with co-generated compiler and ISS
- MetaCore system: ASIP development environment
  - Re-configurable fixed-point DSP architecture
  - Retargetable system software: C-compiler, ISS, assembler
- MDSP-II: a 16-bit DSP targeted for GSM applications

The Goal of MetaCore System
Supports efficient design methodology for ASIP targeted for DSP application field.
- Diverse design exploration: performance/cost efficient design
- Automatic design generation: short chip/core design turnaround time
- In-situ generation of application program development tools

Overview: How to Obtain a DSP Core from MetaCore System
[Figure: flow chart.
 Inputs: instructions (primitive class: add, and, or, sub; optional class: mac, max, min), architecture template (bus structure, data-path structure, pipeline model), functional blocks (adder, multiplier, shifter).
 Steps: select instructions, select architectural parameter, select functional blocks; simulation with benchmark programs; OK?
 If No: modify architecture, add or delete instructions, add or delete functional blocks.
 If Yes: HDL code generation, logic synthesis.]

System Library & Generator Set: Key Components of MetaCore System
[Figure: processor specification and benchmark programs feed the generator set (compiler generator, ISS generator, HDL generator); the generated C compiler and ISS are used for simulation and evaluation; the result is either "accept" or "modify specification" (modify, add); the HDL generator produces synthesizable HDL code]
System library:
- Architecture template: bus structure, pipeline model, data-path structure
- Set of instructions: instructions definition, related func. block
- Set of functional blocks: parameterized HDL code, I/O port information, gate count

Processor Specification (example)
Specification of target core
- defines instruction set & hardware configuration.
- is easy for designer to use & modify due to high-level abstraction.

Hardware configuration:
    //Specification of EM1
    (hardware ACC AR pmem k, [2047: 0] . )
Instruction definition:
    (def_instADD (operandtype2) (ACC a0)
Application-specific instruction (frequent sequence of contiguous instructions):
    pc=L1
        clr a1      ; a1=0
    L1: add a1, a0  ; a1=a1+a0

HDL Code Generator
Processor specification to synthesizable HDL code:
- Connectivity synthesis: connects I/O and control ports of each functional block to buses and control signals
- Control-path synthesis: generates decoder logic for each pipeline stage
- Macro-block generation: instantiates the parameter variables of each functional block (memory size, address space, bit-width of functional blocks)
[Figure: target core. Labels: controller, decoder logic, ALU, multiplier, shifter, BMU, register file, AGU1, data memory1, memory0, program memory, peripherals (timer, SIO)]

Design Example (MDSP-II)
- GSM (Global System for Mobile communication)
- Benchmark programs: C programs (each algorithm constructing GSM)
- Procedure of design refinement:
  - Remove infrequent instructions based on instruction usage count
  - Turn frequent sequence of contiguous instructions into a new instruction
[Figure: EM0 (initial design containing all predefined instructions), EM1, EM2 (MDSP-II) (final design containing application-specific instructions)]
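
The two refinement steps (dropping rarely used instructions, then looking for a frequent pair of contiguous instructions that could become one new instruction) can be sketched on an instruction trace. This is a hedged illustration only: the opcode set, the threshold and the function names are hypothetical, the trace is assumed to contain only valid opcodes, and MetaCore's actual analysis is not described at this level in the slides.

```c
#include <stddef.h>

/* Hypothetical opcode set for the sketch. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_SHL, OP_MAX, OP_MIN, NUM_OPS };

/* Count how often each opcode occurs in an executed-instruction trace. */
void usage_count(const int *trace, size_t n, unsigned long counts[NUM_OPS]) {
    for (int op = 0; op < NUM_OPS; op++) counts[op] = 0;
    for (size_t i = 0; i < n; i++) counts[trace[i]]++;
}

/* Step 1: keep an opcode only if it is used at least `threshold` times.
   Returns the number of opcodes marked for removal. */
int prune_infrequent(const unsigned long counts[NUM_OPS],
                     unsigned long threshold, int keep[NUM_OPS]) {
    int removed = 0;
    for (int op = 0; op < NUM_OPS; op++) {
        keep[op] = counts[op] >= threshold;
        if (!keep[op]) removed++;
    }
    return removed;
}

/* Step 2: find the most frequent pair of contiguous opcodes, a candidate
   for a new application-specific instruction. Returns its count and
   writes the pair to *first and *second (-1 if the trace is too short). */
unsigned long most_frequent_pair(const int *trace, size_t n,
                                 int *first, int *second) {
    unsigned long pair[NUM_OPS][NUM_OPS] = {{0}};
    unsigned long best = 0;
    *first = *second = -1;
    for (size_t i = 0; i + 1 < n; i++) pair[trace[i]][trace[i + 1]]++;
    for (int a = 0; a < NUM_OPS; a++) {
        for (int b = 0; b < NUM_OPS; b++) {
            if (pair[a][b] > best) {
                best = pair[a][b];
                *first = a;
                *second = b;
            }
        }
    }
    return best;
}
```

On a trace dominated by multiply followed by add, the second step reports that pair, which corresponds to the kind of fused operation (such as mac) that appears among the optional instructions.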

Evolution of MDSP-II Core from The Initial Machine

    Machine               Number of clock cycles                   Gate count
                          (for 1 sec. voice data processing)
    EM0 (initial)         53.0 Millions                            18.1K
    EM1 (intermediate)    53.1 Millions                            15.0K
    EM2 (MDSP-II)         27.5 Millions                            19.3K

[Figure: number of clock cycles (10M to 50M) against gate count (5K to 20K) for EM0, EM1 and EM2 (MDSP-II)]

Design Turnaround Time (MDSP-II)
- Design turnaround is significantly reduced due to the reduction of HDL design & functional simulation time.
- Only hardware blocks for application-specific instructions, if any, need to be designed by the user.
[Figure: design progress with MetaCore over time (months 1, 2, 3): application analysis, 5 weeks; HDL design and functional simulation, 1 week; layout and timing simulation, 7 weeks; tape-out]

Overview of EM2 (MDSP-II)
- 16-bit fixed-point DSP
- Optimized for GSM
- 0.6um CMOS (TLM), 9.7mm x 9.8mm 55
[Figure: block diagram. Labels: MCAU, program memory, data memory, PU (SIO, timer), DALU, PCU, AGU]
- MCAU (Mobile Comm. Acceleration Unit): consists of functional blocks for application-specific instructions
  - 16x16 multiplier
  - 32-bit adder
- DALU (Data Arithmetic Logic Unit)
  - 16x16 multiplier
  - 16-bit barrel shifter
  - 32-bit adder
  - Data switch network
- PCU (Program Control Unit)
- AGU (Address Generation Unit): supports linear, modulo and bit-reverse addressing modes
- PU (Peripheral Unit)
  - Serial I/O
  - Timer

Conclusions
MetaCore, an effective ASIP design methodology for DSP, is proposed.
1) Benchmark-driven design & high-level abstraction of the processor specification enable performance/cost-effective design.
2) The generator set with the system library enables short design turnaround time.
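
The three AGU addressing modes named in the EM2 overview (linear, modulo and bit-reverse) can be illustrated with a few address-update functions. This is a generic sketch of what these modes usually mean in a DSP, not a description of the MDSP-II hardware: the function names, the 16-bit address width and the buffer parameters are assumptions, and the modulo version expects a non-zero buffer length.

```c
#include <stdint.h>

/* Linear: the address simply advances by the step. */
uint16_t agu_linear(uint16_t addr, int16_t step) {
    return (uint16_t)(addr + step);
}

/* Modulo: the address wraps inside the circular buffer
   [base, base + len), as used for delay lines and FIR filters. */
uint16_t agu_modulo(uint16_t addr, int16_t step,
                    uint16_t base, uint16_t len) {
    int32_t offset = ((int32_t)addr - (int32_t)base + step) % (int32_t)len;
    if (offset < 0) offset += len;   /* C's % can yield a negative value */
    return (uint16_t)(base + offset);
}

/* Bit-reverse: reverse the low `bits` bits of an index, the access
   order needed to reorder data for a radix-2 FFT. */
uint16_t agu_bit_reverse(uint16_t index, unsigned bits) {
    uint16_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (uint16_t)((r << 1) | ((index >> i) & 1u));
    }
    return r;
}
```

For an 8-point FFT, `agu_bit_reverse(i, 3)` maps indices 0..7 to 0, 4, 2, 6, 1, 5, 3, 7; for a 16-entry circular buffer at 0x100, `agu_modulo` steps from 0x10F back to 0x100 instead of running past the end.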