VLSI DESIGN 1998 TUTORIAL, Part 1
Core Building Blocks and Building Systems using Cores
Rajesh K. Gupta
University of California, Irvine.
• What are cores?
• Building systems using cores
• Challenges in using cores
©1998 R. Gupta 2
Available “Core” Building Blocks
[Figure: core size (in thousands of gates, 0 to 250) plotted against drawn gate length L drawn (0.18, 0.35, 0.6, 0.7) and number of metal levels (2 to 7); labeled cores include ARM810, PPC401, and 68030.]
What Is A Core Cell?
• Working definition
• at least 5K gates
• pre-designed
• pre-verified
• “re-usable”
• Examples:
– Processor: LSI Logic CW4001/4010/4100, ARM7TDMI, ARM810, NEC 85x, Motorola 680x0, IBM PPC
– DSP cores: TI TMS320C54X, Pine, Oak
– Encryption: PKuP, DES
– Controllers: USB, PCI, UART
– Multimedia: JPEG comp., MPEG decoder, DAC
– Networking: ATM SAR, Ethernet
Core Types
• Soft cores (“code”)
– HDL description
– flexible, i.e., can be changed to suit an application
– technology independent: may be resynthesized across processes
– significant IP protection risks
• Firm cores (“code+structure”)
– gate-level netlist to be placed and routed
– technology sampled
• Hard cores (“physical”)
– ready for “drop in”
– include layout and timing (technology dependent)
– IP is easily protected
– mostly processors and memory
– functional test vectors or ATPG vectors available.
Core Types and Their Use
[Figure: design flow from system specification (behavioral HDL; bus functional and ISA models), through system design (RTL HDL, “synthesizable RTL”; RTL functional, timing, and power models; scheduling, binding) and logic design (gate netlist; gate functional model; control generation, FSM synthesis), to physical design (mask data; fault coverage; floorplanning, placement, routing). “Soft” cores correspond to the HDL levels, “firm” cores to the gate netlist, and “hard” cores to the mask data.]
Technology: ASIC or FPGA
Core Portability
Public Domain Formats | Proprietary Formats
C, C++, VHDL, Verilog | Synthesizable “subsets”
ASCII | BC/DC scripts, VCD, WGL
EDIF, SPICE | DEF, SPEF, ITL, NLDM, TLF, MMF
SPICE | LEF, SPEF, GDSII
DEF = Design Exchange Format (Cadence)
SPEF = Standard Parasitic Extended Format (Cadence)
GDSII = Layout format (Cadence)
ITL = Interpolated Table Lookup cell-level timing model (Mentor)
LEF = Library Exchange Format (Cadence)
MMF = Motive Modeling Format (Viewlogic)
NLDM = Non-linear Delay Model (Synopsys)
TLF = Table Lookup Format (Cadence)
VCD = Value Change Dump (Cadence)
WGL = Waveform Generation Language (TSSI)
• Determined by technology independence and data format.
– Technology independence based on the type of core
– both open and proprietary data formats are currently in use.
Timing Information in Firm and Hard Cores
• Timing behavior can be generated from SPICE inputs
• However, it is not always possible for big cores
– static timing information is necessary
• Basic delay model
– propagation delay model from inputs to outputs
– slew model (as a function of load and input slew)
– input/output capacitances
– setup and hold constraints on inputs.
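The basic delay model above can be illustrated with a table-lookup sketch, in the spirit of the table-based timing formats listed earlier. This is a minimal illustration, not any vendor's format: the function name `interpolate`, the axis values, and the delay numbers are all made up for the example. It looks up a propagation delay as a function of input slew and output load by bilinear interpolation.

```python
from bisect import bisect_right

def interpolate(table, slews, loads, slew, load):
    """Bilinear interpolation in a 2-D table indexed by (input slew, load)."""
    def bracket(axis, x):
        # Index of the lower grid point, clamped so that index + 1 is valid.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = bracket(slews, slew), bracket(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j]
            + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j]
            + u * v * table[i + 1][j + 1])

# Hypothetical characterization data (not from the tutorial).
SLEWS_NS = [0.1, 0.5, 1.0]          # input slew axis
LOADS_PF = [0.05, 0.2, 0.5]         # output load axis
DELAY_NS = [[0.20, 0.35, 0.65],     # rows follow SLEWS_NS, columns LOADS_PF
            [0.30, 0.45, 0.75],
            [0.42, 0.58, 0.90]]

delay = interpolate(DELAY_NS, SLEWS_NS, LOADS_PF, slew=0.3, load=0.125)
```

An output slew table would be queried the same way. Points outside the characterized grid are linearly extrapolated here, which a real flow would normally flag.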
• What are cores?
• Building systems using cores
• Challenges in using cores
Building Systems-On-A-Chip Using Cores
[Figure: hub architecture connecting memory (cache/SRAM or even DRAM), a processor core, a DSP processor core, graphics, video, VRAM, motion, encryption/decryption, SCSI, EISA interface, PCI interface, I/O interface, LAN interface, and glue logic.]
Commodity Software:
- encryption/decryption
- device drivers
- legacy code
- operating/runtime system
Commodity Hardware:
- compression
- encryption
- modem
- signal proc.
- image proc.
SOC is a SM of LSI Logic Corporation.
S-O-C Application Classes
Class | Application | Processor | Requirements
Data flow | laser printers, X-terminals, routers, bridges, image processing | R4600, I960, 29k, Coldfire, PPC (403, 605) | Processes data and passes it on. High memory bw, high throughput.
Interactive video & portable | set-top boxes, video games, PDAs, portable info appliances | R3900, R4100/4300/4600, ARM 6xx/7xx, V851, SH1/2/3 | Interactive, low cost, low power, high throughput.
Classic embedded | controllers, disk controllers, automotive, industrial control | Piranha, ARM, MIPS, Cores | mix of CPU power, low cost, low power, peripherals
Time-constrained computing systems.
[Figure: application examples: set-top VOD+, games, HQ graphics, video conferencing, MPEG1 encoding, audio & video bridging, MPEG2 encoding, high-end set-top, PDA derivatives.]
Systems-On-A-Chip (SOCs)
Two Types:
• Technology-Driven
– Developed in-house, maximum leverage of technology “crown-jewels”
– Close cooperation between module developers and system designers
– or wide-ranging cross-licensing agreements between partners
• Component-Driven
– Core cells as IP carriers
» IP encapsulated into “usable” products
» design “reuse” is critical to IP products
Component-Driven SOC
• Core supplier different from core user
– “Third party IP providers”
• Significant technology packaging without importing it
– The IP provider wants to sell a product and not the technology behind the product
• Enormous technical and legal challenges
– can it be done successfully?
– who guarantees that a SOC works as required?
– who is liable in case the end product does not perform?
ASIC Cores Availability
One-Stop Shops
• LSI Logic CoreWare
• IBM Microelectronics
• Motorola FlexWare
• Lucent
Other vendors
• 3Soft: uC, DSP, LAN, SCSI, PI
• ARM: uC, uP
• Plessey: per. controllers, DSP
• Scenix: uC, PCI, DMA
• Western Digital Center: uC
• TI: DSP; NEC: DSP, uC
• Symbios: ARM7 TC
• VAutomation: uP, controllers
• CAST: 2910A, IDT49C410, DMAc
• Butterfly DSP: DSP, FFT, DFT, ADSL, OFDM
• Int. Sil. Systems: ADPCM, FIR
• Analog Devices: DSP
• DSP Group: Pine, Oak
• Digital Design & Dev: MIDI
• Hitachi: MPEG, PCI, SCSI, uC
• Palmchip: MPEG, UART, ECC
• Silicon Engg.: micro VGA
• Eureka: PCI; Virtual Chips: PCI, USB
• Logic Innovations: PCI, ATM
• OKI: PCI, PCMCIA, DMA, UART
• Sand: USB, PCI
• Sierra: ATM SAR, Ether, R3000
• LogicVision: BIST, JTAG
• ROHM: UART, SIO, PIO, FIFOc, Add, Mpy, ALU
• Synopsys: DesignWare, ISA, Intel uC
• Chip Express: FIFO, RAM, ROM
• VLSI Libraries: Memory, Mpy
• Focus Semi: PLL, VCXO
• VLSI Cores: Encryption, DES
• ASIC Intl: DES
NOT EXHAUSTIVE.
FPGA/CPLD Cores Availability
• Capacity constrained cores
– do not include wide/high performance PCI, ATM SAR, or Microprocessors
• Altera
– 8-bit 6502
– DMAC 8237
• Xilinx
– PCI
• Actel
– System Programmable Gate Array (SPGA)
» combine FPGA with customer ASIC
» ASIC examples: PCI, Router, DMA controller.
Current Core Market Models
Three ways:
• 1. A design house licenses design and tools
– DSP Group (Pine and Oak cores), 3Soft, ARM (RISC)
– offering includes HDL simulation model, tool and/or an emulator
– customer does the design, fab.
• 2. Core vendor designs and fabs ICs
– TI, Motorola, Lucent
– VLSI, SSI, Cirrus, Adaptec
• 3. Core vendor sells cores, takes customer designs and fabs ICs
– LSI Logic, TI, Lucent
[Figure labels: Licensable; Foundry Captive]
Foundry-captive cores do not have to reveal the internal design and layout of the core. The foundry provides a bounding box.
Core Trends: 1997 Survey of Designers
[Figure: bar chart of survey responses (vertical axis 0 to 40) versus months to completion (0 to 60). Source: Integrated System Design]
• 74% hardware designers.
• 26% plan to purchase core for next design:
– 40% hard, 68% soft, 32% firm
Application Needs
MEMORY: SRAM 82%, ROM 57%, High speed 46%, Multiport 44%, Low power 38%, DRAM 38%, Cache 20%, Video 16%
PROCESSORS: 16-bit uC 40%, 8-bit uC 30%, RISC 28%, 32-bit uC 20%, x86 20%, 68xx 18%, DSP 16%, PPC 14%, Fixed 56K 4%, Floating 4%
INTERFACE etc.: PCI 49%, Comm. 40%, UART 32%, Video 18%, USB 16%, Graphics 14%, Firmware 14%, PCMCIA 12%
ANALOG: ADC/DAC 44%, PLL 44%, Analog 32%, RAMDAC 10%, Rambus 4%
GENERICS: Accumulator 34%, Adder 30%, Mpy 30%, Shifter 28%
Source: Integrated System Design
Using Cores: PCI
[Figure: block diagram with CPU, host bus, PCI controller, primary PCI bus, PCI/IDE/ISA bridge, ISA bus, IDE, and ASIC.]
• Class of interface cores such as
– USB, UART, SCSI, PCI, 1394 etc.
• Identify target technology
– ASIC, FPGA
• PCI (Peripheral Component Interconnect)
– processor independent CPU interface to peripherals
– multi-master, peer-to-peer protocol
– synchronous: 8-33 MHz (132 MB/s)
– arbitration: central, access oriented, “hidden”
– variable length bursting on reads and writes
– (I/O, Mem) x (Read, Write) and IACK commands
PCI Cores
• VHDL/Verilog synthesizable cores with options:
– PCI-Host, PCI-Satellite
– 32-bit (33 MHz) or 64-bit (66 MHz)
– FIFO or register data storage
– Synchronous or Asynchronous host interface
• Core components
– Master/Target Read/Write FIFOs,
– Master/Target State Machines
– Configuration registers
• Timing requirements
– input setup time = 7ns; clock to output delay = 11ns
• DC Specs: input pin caps: 10 pF, clk pin 12 pF, ID Sel 8pF
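The bus width, clock, and timing figures quoted for PCI can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope aid; the function names are made up for this example. It computes the peak transfer rate for a given bus width and clock, and the part of the clock period left for bus propagation and clock skew once clock-to-output delay and input setup time are subtracted.

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz):
    """Peak transfer rate: one bus-wide transfer per clock cycle."""
    return bus_bits / 8 * clock_mhz

def remaining_budget_ns(clock_mhz, clk_to_out_ns, setup_ns):
    """Clock period minus driver clock-to-output delay and receiver setup time."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - clk_to_out_ns - setup_ns

bw_32 = peak_bandwidth_mb_s(32, 33)   # 32-bit bus at 33 MHz
bw_64 = peak_bandwidth_mb_s(64, 66)   # 64-bit bus at 66 MHz
budget = remaining_budget_ns(33, clk_to_out_ns=11, setup_ns=7)
```

With the numbers from the slides, the 32-bit, 33 MHz case gives the 132 MB/s quoted earlier, and roughly 12 ns of each 33 MHz cycle remain for signal propagation and skew.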
User Experience
Source: EE Times
• Hughes Network Systems:
– DirecPC ASIC in a satellite receiver card
– 80K gates device on Chip Express process
• DirecPC consists of
– IDT R3041 RISC controller
– Memory, Demodulator, Error-check, PCI core
• PCI core from Virtual Chips
– 17K gates including asynchronous FIFOs
– Guesstimate: 4K extra gates due to the core (5%)
• Comments:
“Their test vectors assume you have direct access to the internal interface of the core. I looked through their test vectors and tried to do the same things using my back end.”
“They were kind of giving us a reference documentation. It wasn’t turnkey.”
Using Cores: DSPs
• 16-bit fixed point processors are most commonly used.
• DSPs
– simple: Clarkspur Design CD2450 (variable data width)
– compatible: DSPGroup, TI, SGS-T: 320C5x
– clone:
• Options
– memory, mem controller, interrupt controller, host port, serial port
• Criticals
– power consumption as most DSP applications go into portable products
Design using DSP Cores
• Core vendors often supply a development chip or core version of the COTS processor
– board-level prototyping fairly common
– followed by single-chip solution
• To avoid board-level prototyping, a full-functional simulation model is a must, particularly for foundry captive cores.
• Software tools provided
– assembler, linker, instruction set simulator, debugger, (high-level language compiler?)
DSP Sample Points
Not exhaustive, only a representative sample.
• TI TEC320C52
– 16-bit fixed-point TMS320C52
» 1Kx16 data RAM, 4Kx16 program RAM
» 2 serial ports, 1 16-bit timer
– and 0.8 micron 15,000-gate gate array
• Motorola 7-Day CSIC
– 8-16 MHz HC08, DMA, MMU, ..
• SGS-Thomson ST18932, ST18950
– 16-bit fixed-point DSPs, 0.5 u, 3.3 volt CMOS, 80MHz
– has no off-the-shelf DSP IC
– used in PC sound cards, 950 has a better assembly
Third Party DSP Cores
• DSPGroup Pine
– 16-bit fixed-point, 0.8u CMOS, 5.0/3.3 V, 40 MHz
– 36-bit ALU, 16-bit MPY, 2Kx16 RAM/ROM, (prog mem is outside core)
– used in pagers and answering machines
• DSPGroup Oak
– same as Pine, plus includes a bit manipulation unit
– Viterbi decoding support instructions (min, max)
– used in digital cellular telephony
• Clarkspur CD2400, CD2450
– 16-bit fixed-point
– 24-bit ALU, MPY, Acc, 2x 256x16 data RAM/450 makes it 48 bits
– used in fax-modem
One-Stop Shops: LSI Logic CoreWare
• Cores for building ASIC for most embedded applications:
– laser printer, ATM, PDA, Set-top, Router, Graphics accelerators, etc.
• CPU cores: miniRISC CW4K, Oak DSP
– miniRISC compatible with MIPS R4000
– 0.5u CMOS, 2mW/MHz, 60MHz, 3-stage pipeline
– 32-bit address/data bus
– full scan: 99% fault coverage, gate-level timing model
• Interface: PCI, Fibre Channel, SerialLink
• Networking: Ethernet, ATM (SAR), Viterbi, RS
• Compression etc: MPEG, JPEG, DAC/ADC.
Core Examples
• Only a representative sample of cores. Not exhaustive or even comparative.
• Processor cores
– LSI Logic CW4001, CW4010
– ARM (7) processors
– Motorola FlexCore
• Memory cores
– 16M/18M Rambus DRAM
• Multimedia cores
– CompCore CD2
• Networking
– Media Access Controller (MAC)
• Encryption cores
– VLSI Cores, ASIC International.
LSI Logic: CW4001 Core
[Figure: CW4001 block diagram showing CP0, ALU, shifter, register file, FlexLink, and CBus. Courtesy: S. Dey, ICCAD’96, LSI Logic.]
• Behavioral Verilog/VHDL model
• Gate-level timing accurate model
• Specifications
– 60 MHz, 60 MIPS (45 MIPS average), 3 stage pipeline
– 0.5 micron CMOS process, 4 sq. mm., 2mW/MHz
– Full-scan with 99% fault coverage.
• Interfaces:
– CBUS, Computational Bolt-On (CBO), Co-processor, MMU
• Customizability:
– BIU, cache controller, MDU, MMU, DRAM/SRAM controllers, timers, caches (<16K), RAM/ROM, DMAc
– Up to 3 co-processors (FPU, Graphics, Compression, Network Protocol), MPY/DIV unit, CRC, direct access to CPU GPRs
Using CW4001
• Co-processor has its own instruction set including
– read data bus for instruction, rd/wr to external mem.
– read/write to CPU registers, stall and interrupt CPU
• CW delivers [0:5] and [26:31] opc fields to co-processor instr. decoder
• Co-processor executes in lockstep with CPU pipeline stages.
[Figure: CW4001 with its co-processor interface, FlexLink interface, and CPU bus interface; attached blocks include BIU and cache controller, write buffer, DRAM controller, timer, DMA controller, extended BIU (XC), cache, MMU, RAM/ROM, mult/div unit, and a coprocessor, connected over the CPU bus, BBus, and XBus. Courtesy: S. Dey, ICCAD’96, LSI Logic.]
CW4010 CPU Core
• Verilog/VHDL model with gate-level timing
• 80MHz, 160 MIPS (110 MIPS average), 6 stage pipeline
• 0.5 micron CMOS, 9 sq. mm., 5 mW/MHz
• Integrated cache controllers with separate I and D caches
– cache size from 2-16 KB
• 64-bit memory and cache interface
• Up to 3 co-processors
• Full-scan with 99% fault coverage.
Advanced RISC Machines (ARM)
• A family of 32-bit RISC processor cores
• ARM6, ARM7: MPU with Cache, MMU, Write Buffer and JTAG
• ARM7TDMI :ARM7 with Thumb ISA, ICE, Debug & MPY
• ARM8 : cached, low power, 5-stage pipe (vs 3 in others)
• StrongARM1, StrongARM2: available as Digital SA-110 (21285)
• Piccolo: DSP co-processor for ARM, shares system bus (AMBA)
– support for Viterbi, bit manipulation operations
– four nestable zero-overhead hardware loop constructs
– splittable ALU, 1 cycle dual 16-bit operations
– saturation arithmetic
– 1024 point in place complex radix 2 FFT in 33,331 cycles
• Manufacturing partnerships and/or licensing with
– Cirrus Logic, GEC Plessey, Sharp, TI and VLSI Tech.
ARM Processor Cores
• Enhancements: ARM7D, ARM7DM, ARM7DMI
M = 64-bit result hardware multiplier running at 8 bits/cycle
D = 2 boundary scan chains for basic debug
I = Embedded ICE debug
– Thumb instruction set
Family | Xstrs | Process | Size (mm2) | Clk (MHz) | Vcc (V) | Power (mW) | MIPS
ARM6 | 33K | 0.6u | 5.6 | 45 | 5 | 225 | 40
ARM7 | 36K | 0.6u | 5.6 | 45/28 | 5/3 | 225/46 | 40/25
ARM7TDM | 59K | 0.6u | 4.26 | 40/27 | 5/3 | 200/44 | 38/..
ARM8 | NA | 0.5u | NA | 120 (est) | 3 | 500 | 80
Source: ARM Inc.
ARM Enhancements: Embedded ICE
[Figure: an ASIC containing an ARM core and an EmbeddedICE cell, connected through the boundary scan pins to an ICE unit and a debug host running ARMsd; 40 KB/s software download. Source: ARM Inc.]
• The EmbeddedICE core cell allows debugging of an ARM core embedded within an ASIC:
– real time address and data-dependent breakpoints
– full access and control of the CPU
– can be reduced for size savings once the part goes into production.
ARM Enhancements: Thumb ISA
[Figure: expansion of the 16-bit Thumb instruction ADD Rd, #constant (fields: major opcode 001, minor opcode 10, Rd, constant) into a 32-bit ARM instruction (fields: 1110 = always, 001, 01001, 0 Rd, 0 Rd, 0000, constant); the destination and source register fields are zero extended.]
• 8- or 16-bit external, 32-bit internal
• Thumb instruction set is a subset of 32-bit ARM instruction set
– 16-bit instructions
– expanded into 32-bit ARM instructions at run time without any penalty
• Up to 65-70% smaller code size compared to ARM
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory
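The expansion of a 16-bit Thumb instruction into a 32-bit ARM instruction can be sketched for the single case shown on this slide, ADD Rd, #constant. The field layout below follows the slide's figure (condition 1110, then 001, then 01001, the zero-extended register twice, 0000, and the 8-bit constant); the function name is made up, and real hardware does this in the decoder, not in software.

```python
def expand_thumb_add_imm(thumb):
    """Expand a 16-bit Thumb 'ADD Rd, #imm8' into the 32-bit ARM equivalent."""
    if (thumb >> 11) != 0b00110:          # major opcode 001, minor opcode 10
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0x7               # 3-bit register number
    imm8 = thumb & 0xFF                   # 8-bit constant
    return ((0b1110 << 28)                # condition: always
            | (0b001 << 25)               # immediate data-processing class
            | (0b01001 << 20)             # ADD opcode with flag update
            | (rd << 16)                  # source register, zero extended to 4 bits
            | (rd << 12)                  # destination register, zero extended
            | imm8)                       # rotate field 0000 plus constant

arm_word = expand_thumb_add_imm(0x3105)   # Thumb: ADD R1, #5
```

For the Thumb halfword 0x3105 this yields the ARM word 0xE2911005, which uses twice the code space for the same operation.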
Courtesy: S. Dey, ICCAD’96
ARM Applications
• Widely used in a variety of applications
– low cost 16-bit applications
» mobile phones, modems, fax machines, pagers
» hard disk and CD drive controllers
» engine management
– low cost 32-bit applications
» smart cards
» ATM and ethernet network interfaces
» low power, on-chip application code
– high performance 32-bit applications
» digital cameras
» set top boxes, network switches, laser printers
» external memory system (RAM, ROMs)
Motorola FlexCore
• CPU cores based on 680x0 family
– EC000, EC020, EC030
– all with static operation, 5/3.3 volt supplies
– performance:
» EC000: 2.7 MIPS @16.67MHz, 33 mW
» EC020: 7.4 MIPS @25 MHz, 150 mW
» EC030: 11.8 MIPS @33 MHz, 258 mW
• Serial I/O cores: 68681UART, MBus, SPI
• RT clock, Dual timer cores
• SCSI, Parallel I/O, 8051 interfaces
• DRAM, Interrupt, JTAG controllers
• PLA, PLL, oscillators, power management cells.
Memory Core Example
• Virtual Chips 16M/18M bit Rambus DRAM
• Verilog/VHDL simulation model
• Organization
– two banks, 512 pages per bank, 72x256 per page
– dual internal banks, 2K byte cache per bank
• Programmable ack, write, read delays through control registers
• Synchronous protocol for fast block oriented xfrs.
• Modes of operation
– reset, stand-by, power-down, active
• Deliverable: VHDL, Verilog source, test bench, test vectors, documentation.
• Others: Sand DRAM, VRAM verilog models.
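The organization figures for the Rambus DRAM core can be cross-checked against the quoted 18M bit capacity. The sketch below is plain arithmetic with a made-up function name. The 64x256 page used for the 16M variant is an assumption for illustration, as is reading the "2K byte" cache as 2K nine-bit bytes; neither is stated on the slide.

```python
MBIT = 1 << 20

def capacity_bits(banks, pages_per_bank, page_bits):
    """Total storage: banks x pages per bank x bits per page."""
    return banks * pages_per_bank * page_bits

page_bits_18m = 72 * 256      # page size from the slide
page_bits_16m = 64 * 256      # assumed page size without the extra bits

total_18m = capacity_bits(2, 512, page_bits_18m)
total_16m = capacity_bits(2, 512, page_bits_16m)
page_in_9bit_bytes = page_bits_18m // 9
```

Under these assumptions the totals come out at 18 Mbit and 16 Mbit, and one page holds 2048 nine-bit bytes, which is consistent with a per-bank cache that holds one page.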
Multimedia Cores
[Figure: CD2 decoder block diagram. An MPEG input feeds audio and video decoders that produce an audio stream and a video stream; other blocks are the microcontroller interface, synchronization, virtual and physical memory controllers, internal SRAMs, and an external 1Mx16 SDRAM. Source: CompCore]
• JPEG compression, MPEG decoding, Video DAC, etc.
• IBM Microelectronics, LSI Logic, PalmChip, Silicon Engineering, Mentor Graphics, CompCore, Intrinsix VGA
• Example: MPEG-2 decoder from CompCore
– 70K-80K gates
– 18K bits of internal SRAM
– 16Mbit SDRAM (external)
» bitstream buffering, frames
– 54MHz, 16-bit external mem. bus
Other Core Categories
Networking
• Protocol choices:
– switched Ether, s. TR, ATM155, ATM25
• Example: SYM1000 from Symbios
– HDL code, 3.3 V, 0.5u
– CSMA/CD ethernet
– programmable inter-packet gap.
– Optional CRC insertion, and check
– MII interface to physical layer device
– Host bus interface
• LSI Logic: ATMizer
Encryption
• VLSI Cores
– PKuP encryption core
» implements modular exponentiation
» synthesizable HDL core
– DES core as a synthesizable Verilog model
» two models: 8 bytes/8 cycle, 8 bytes/16 cycles
• ASIC International
– DES cores
– Exponentiator Engine
– Hash function cores
• What are cores?
• Building systems using cores
• Challenges in using cores
Challenges in Using Cores
• A core cell is not a single product
– a PCI cell consists of 25 separate Verilog files
» plus as many synthesis scripts
– immature interface abstraction
» e.g., there is no direct access to the core from the end product. Access must be created.
• A core is not an end product
– a core cell is design + know-how to use it for a particular process, tools and even application
• Testability and testing is a challenge
– as opposed to design, testing is not a hierarchical problem
» using 90% testable cores does not give 90% system testability
» tests are core-specific, not applicable from primary IO
What is an efficient design methodology using cores?
SOC Design Problem Components
[Figure: SOC containing a processor, ASIC logic, memory, interfaces, analog I/O, and DMA, annotated with the four problem components below.]
1. Design environment, co-simulation, constraint analysis.
2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis.
3. Software synthesis, optimization, retargetable code gen., debugging & programming environ.
4. Test issues, test access, isolation, ATPG.
Processor cores introduce software part of system design.
Co-Design Components
• Specification, Modeling and Analysis
– How to capture designer intent efficiently in a design language?
» HDL optimizations
» Constraint modeling and analysis
• System Validation
– How to use description in building a (computational) prototype capable of running actual applications?
» Co-simulation, Formal Verification
• System Design and Synthesis
– Delayed partitioning of hardware and software
– Software synthesis and optimizations
– Interface design and optimizations.
System Specification: Goals & Characteristics
• Main purpose: provide a clear and unambiguous description of the system function, and provide documentation of the initial design process
• Support
– diverse models of computation
– allow the application of computer-aided design tools for
» design space exploration
» partitioning
» software-hardware synthesis
» validation (verification, simulation)
» testing
• Should not constrain the implementation options.
– diverse implementation technologies.
Embedded System Modeling
• Reactive and time-constrained interactions
• Consist of structural and behavioral components.
• Hierarchically organized components.
• Synchronous and asynchronous communications.
• Locally or globally clocked.
• Idealized as Synchronous Reactive Systems.
Synchronous Reactive Modeling
• Zero computation time
• System outputs produced in synchrony with inputs
• Instantaneous broadcast communications
• Deterministic behavior:
– a given sequence of inputs always produces the same output sequence.
• Example languages using this model
– ESTEREL, LUSTRE.
– More later.
Example: Esterel
• Reactivity and atomicity of reactions
– “watching” implements a generalized watchdog
– Time as discrete “instants”
– Easily translated into a transducer (FSM generation)
– Perfect synchrony hypothesis
• Instantaneous broadcast
– Implicit communication architecture.
– Using signals which are present or absent and may carry a value.
– Pure signals do not carry a value.
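The synchronous reactive style can be imitated in ordinary code. The sketch below is Python, not Esterel, and only loosely mirrors a "watching" watchdog: time advances in discrete instants, each instant is one atomic reaction from the set of present input signals to the set of emitted output signals, and pure signals are modeled by set membership. The class name, the signal names ACK and TIMEOUT, and the limit are all made up for the example.

```python
class Watchdog:
    """Emit TIMEOUT once ACK has been absent for `limit` consecutive instants."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def react(self, present):
        """One atomic reaction: present input signals -> emitted output signals."""
        if "ACK" in present:
            self.count = 0            # watchdog is reset in the same instant
            return set()
        self.count += 1
        if self.count >= self.limit:
            self.count = 0
            return {"TIMEOUT"}
        return set()

def run(machine, trace):
    """Feed one set of present signals per instant; collect the outputs."""
    return [machine.react(inputs) for inputs in trace]

trace = [set(), {"ACK"}, set(), set(), set(), {"ACK"}]
outputs = run(Watchdog(limit=3), trace)
```

Because a reaction depends only on the current inputs and the stored state, replaying the same input trace on a fresh machine gives the same output trace, which is the determinism property listed for this model.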
Constraint and Interface Modeling
• Source of timing constraints
– Time-constrained interactions between system components and environment
– Specified using statement tags on HDL descriptions.
• Types of constraints
– Delay and interval constraints (latency-type)
– Rate constraints (throughput-type)
• Constraint satisfiability
– Are constraints satisfied for a given implementation?
– Given an implementation, resynthesize to satisfy a given set of constraints.
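The two constraint types can be made concrete with a small checker over observed event times. This is a sketch under simple assumptions: times are integer microseconds taken from a simulation trace, and the function names are hypothetical. A latency-type constraint bounds the separation of two events; a throughput-type constraint bounds the gap between successive occurrences of the same event.

```python
def check_delay(t_src_us, t_dst_us, min_us=None, max_us=None):
    """Latency-type constraint: min_us <= t_dst - t_src <= max_us."""
    d = t_dst_us - t_src_us
    return (min_us is None or d >= min_us) and (max_us is None or d <= max_us)

def check_rate(timestamps_us, min_rate_hz):
    """Throughput-type constraint: events occur at least min_rate_hz times per second."""
    max_gap_us = 1_000_000 / min_rate_hz
    return all(b - a <= max_gap_us
               for a, b in zip(timestamps_us, timestamps_us[1:]))

delay_ok = check_delay(0, 800, max_us=1000)            # response within 1 ms
rate_ok = check_rate([0, 1000, 2000, 2900], 1000)      # 1000 events/sec sustained
rate_bad = check_rate([0, 1000, 2500], 1000)           # one gap of 1.5 ms
```

Checking answers the satisfiability question for one implementation; resynthesis to meet violated constraints is a separate step that this sketch does not attempt.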
Example
[Figure: vehicle cruise controller built from a runtime system and routines (display info, calibration, get info, clock, state, maintenance), with signals CurFuel, RotClk, brake, gear, valve, speed, ave_speed, consumption, InstVel, AveVel, SecClk, and SecPulse. Annotated constraints: DATA-RATE (1/sec, 1000/sec) and OP-DELAY (<= 1 ms), derived from events at system interfaces.]
Interface Modeling using Constraints
• Interface described using events.
• Events are instances of actions.
• Most common interface action is a signal transition on a wire.
• Temporal relationship between events:
– Propagation delays:
– Bounds on event separation intervals: min, max, linear
– Absolute versus relative rate constraints.
Binary Delay Constraints
[Figure: the three binary delay constraint types (linear, max, min), each drawn on events i, j, and k.]
Interface Delay Timing Constraints
• Three types: (McMillan & Dill)
– Given events $i$ and $j$ with time stamps $t_i$ and $t_j$ respectively, and $d_{ij}$ as the delay from event $i$ to event $j$, such that $l_{ij} \le d_{ij} \le u_{ij}$:
» min constraints: $t_j = \min_{i<j} (t_i + d_{ij})$
» max constraints: $t_j = \max_{i<j} (t_i + d_{ij})$
» linear constraints: $t_j - t_i \le s_{ij}$, where $s_{ij}$ is the maximum achievable separation between $i$ and $j$.
• Constraint graph:
– nodes <=> events; edges <=> constraints.
• Synthesis: find maximum achievable separation between pairs of events (minimum separation depends upon operation delays.)
• Rate constraint analysis and “debugging.”
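For the linear constraints alone, consistency of a constraint graph can be checked with a standard shortest-path argument. The sketch below is not the McMillan and Dill algorithm and does not handle min or max constraints; it applies Bellman-Ford relaxation to constraints of the form t[j] - t[i] <= s, which have a solution exactly when the graph has no negative-weight cycle. The function name and the example events are made up.

```python
def solve_linear_constraints(n, constraints):
    """constraints: (i, j, s) triples meaning t[j] - t[i] <= s.

    Returns one consistent list of event times, or None if the
    constraints contradict each other (negative-weight cycle).
    """
    t = [0] * n                      # as if a virtual source reached every event
    for _ in range(n):
        changed = False
        for i, j, s in constraints:
            if t[i] + s < t[j]:      # relax edge i -> j with weight s
                t[j] = t[i] + s
                changed = True
        if not changed:
            return t
    return None                      # still changing after n passes

# Event 1 follows event 0 by 3 to 10 time units, event 2 follows event 1 by 2 to 5.
# A lower bound t[j] - t[i] >= l is written as t[i] - t[j] <= -l.
base = [(0, 1, 10), (1, 0, -3), (1, 2, 5), (2, 1, -2)]
times = solve_linear_constraints(3, base)
conflict = solve_linear_constraints(3, base + [(0, 2, 4)])   # demands t2 - t0 <= 4
```

The added constraint in the second call cannot hold, since events 0 and 2 are at least 5 units apart, so the solver reports a conflict. Localizing such conflicts is one simple form of constraint "debugging".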
Hardware Modeling As A Programming Activity
• Programming languages are often used for constructing system models
• Core based designs assume that all new designs originate as an “HDL” model
• Hardware
– concurrency in operations
– I/O ports and interconnection of blocks
– exact event timing is important: open computation
• Software
– typically sequential execution
– structural information is less important
– exact event timing is not important: closed computation.
HDL Semantic Necessities
• Abstraction
– provide a mechanism for building larger systems by composing smaller ones
• Reactive programming
– provide mechanisms to model non-terminating interaction with other components
– watching (signal) and waiting (condition)
» must be separate (else one is an implementation of the other)
– exception handling
• Determinism
– provide a “predictable” simulation behavior
• Simultaneity
– model hardware parallelism, multiple clocks
HDL Pragmatics
• Data types
– simple (bit/Boolean): HardwareC, Verilog
– complex (records): VHDL
• Interface abstraction
– provide an external view independent of implementation
» Classes (packages) in C++, VHDL
» Entity interfaces or Tasks: VHDL, ADA
Pragmatics (contd.)
• Communication
– shared variables using explicit communication architectures
– synchronous handshaking using implicit communications (ADA task entry call)
– instantaneous broadcast (Esterel)
– asynchronous message passing using explicit communication architectures
• Time
– global, multiple clocks, logics.
Going from HLL to HDL
[Figure: a (restricted) HLL description is turned into an HDL description along two paths. Control: add reactivity, clock(s), waiting & watching. Data: refine data types (bit true, fixed point, saturation arithmetic).]
HLL Restrictions
• Classes for synthesis target do not use
– unions, floating, pointers (only interface with lib)
– type casts
– virtual functions (restricted to only library classes)
– policy of use on shared variables
• Suggestions:
– explicit initialization blocks
– use “defines” instead of conditional process enables for statically determined conditions
Adding Reactivity
• Reactivity can be added in one of three ways:
1. use annotations, comments
» commonly used in “home-grown” C-based HDLs
» sometimes use “semantic overloads”, that is, associating an alternative interpretation with existing constructs.
2. use library assists
» additional library elements that can be used by the programmer in modeling hardware.
» example: additional classes in C++
3. use additional language constructs
» new constructs require a specific language front-end, new debugging tools.
» example: divide operations across cycles using next()
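Dividing operations across cycles with a next()-style construct can be mimicked with a generator. The sketch below is a Python analogy, not the syntax of any of the C-based HDLs mentioned: each `yield` stands in for next() and marks a clock boundary, so the work between two yields is what happens in one cycle. The function names are made up.

```python
def gcd_process(a, b):
    """Multi-cycle GCD; each `yield` plays the role of next() (a clock edge)."""
    while b != 0:
        a, b = b, a % b
        yield None            # one loop iteration per clock cycle
    yield a                   # result is presented in the final cycle

def simulate(process):
    """Step the process one cycle at a time; return (result, cycles used)."""
    cycles = 0
    result = None
    for value in process:
        cycles += 1
        if value is not None:
            result = value
    return result, cycles

result, cycles = simulate(gcd_process(48, 18))
```

For the inputs 48 and 18 the process takes three iteration cycles plus one output cycle. Moving or removing a yield changes the cycle count without changing the computed value, which is the kind of control a next() construct gives the modeler.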
Adding Data Types
• Identify signals
– storage elements, structured memory blocks
• Type variables : signed, unsigned, std_logic
• Size state variables on instantiation
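Sizing and typing a variable has visible consequences for arithmetic. As a small illustration of the bit-true, fixed-point, saturating refinement, the sketch below clamps results to the range of a signed or unsigned word of a given width. The helper names and the Q15 format used in `to_fixed` are choices made for this example, not something the slides prescribe.

```python
def saturate(value, bits, signed=True):
    """Clamp an integer to the range representable in `bits` bits."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))

def sat_add(a, b, bits=16, signed=True):
    """Addition that saturates instead of wrapping around."""
    return saturate(a + b, bits, signed)

def to_fixed(x, frac_bits=15, bits=16):
    """Convert a real number to a signed fixed-point integer (Q15 by default)."""
    return saturate(round(x * (1 << frac_bits)), bits)

pos = sat_add(30000, 10000)                     # would overflow a 16-bit word
neg = sat_add(-30000, -10000)
uns = sat_add(60000, 10000, signed=False)
half = to_fixed(0.5)
one = to_fixed(1.0)                             # 1.0 is not representable in Q15
```

A model written this way produces the same values as the sized hardware would, which is what makes a high-level description bit true.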
Language Comparisons
Language | Orientation | Data Types | Assignments | Processes | Structure | Delays
Verilog | DES model (gate level) | event, real, int, no, weak typing | inertial, immediate, high priority preemptive | concurrent, initial, always, task, cont. assig | components | timing & functionality
VHDL | DES model | user defined | preemptive | sequential, guarded | components | delayed assignments
Esterel | Reactive models | int, bool, triv abstract | atomic reaction | nested actions | modules | perfect synchrony
Scenic | Synch. synthesis | signed, unsigned, 2d-arrays | next cycle | clock synchronized | Processes TBD | synchronous
• Verilog, VHDL: compiler produces inputs to run a DES simulator.
• Esterel: compiler produces a single deterministic FSM.
• Scenic: compiler produces (synthesizable) processes and a simulator.
From HDL to Circuit/System: Compilation & Synthesis
• Compilation spans programming language theory, architecture and algorithms
• Synthesis spans concurrency, finite automata, switching theory and algorithms
• In practice, the two tasks are inter-related.
• Compilation and synthesis tasks are done in three steps:
– front-end, intermediate optimizations, back-end.
Compilation
• Program compilation for software target
– Front-end parsing into intermediate form
– Optimization over the intermediate form
– Back-end code-generation for a given processor
• HDL compilation for hardware target
– Front-end parsing into intermediate form
– Optimization over the intermediate form
– Back-end architecture, logic and physical synthesis.
Synthesis and Optimization
• Substantial growth in last twenty years
• Industry-standard tools in
– Logic synthesis
– Physical synthesis
• Behavioral synthesis just becoming commercial.
• Substantial room for growth when considered together with software compilation.
Behavioral to RTL
• Basic transformations needed
– 1. Operation scheduling
– 2. Resource binding
– 3. Control generation: central or distributed.
• Evolutionary growth to synthesis tools
– Designer expertise today lies in the RTL coding
– Synthesis tools are strongly dependent upon design methodology.
• Generate a structure suitable for synchronous and single-phase circuits
– resource performance in terms of execution delay
– in number of clock cycles
• Design space:
– area, cycle time, latency, throughput
Synthesis Tasks
• Operation scheduling, resource binding, control generation
• Scheduling determines operation start times
– minimize latency
• Resource binding: resource selection, allocation
– minimize area (maximize sharing)
• Control synthesis:
– data-path = “connectivity synthesis”
» detailed resource connections
» steering logic
» connection to the interface
– control synthesis
» synthesize controller that provides operations/resource enables, operation synchronization, resource arbitration
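The first two tasks can be sketched on a tiny data-flow graph. The code below uses as-soon-as-possible (ASAP) scheduling, which minimizes latency when resources are unlimited, followed by a greedy binding that reuses a functional unit of the right kind whenever one is free. Both are textbook simplifications chosen for illustration, not the method of any particular tool; control generation is not shown, and all names and delays are made up.

```python
def asap_schedule(ops, deps, delay):
    """ops: topologically ordered names; deps: op -> predecessors; delay: op -> cycles."""
    start = {}
    for op in ops:
        start[op] = max((start[p] + delay[p] for p in deps.get(op, [])), default=0)
    return start

def bind(ops, start, delay, kind):
    """Greedy binding: share a unit of the same kind once it is free again."""
    free_at = {}                       # kind -> list of cycles at which units free up
    binding = {}
    for op in sorted(ops, key=lambda o: start[o]):
        pool = free_at.setdefault(kind[op], [])
        for idx, t in enumerate(pool):
            if t <= start[op]:         # unit idx is idle: reuse it
                pool[idx] = start[op] + delay[op]
                binding[op] = (kind[op], idx)
                break
        else:                          # no idle unit: allocate a new one
            pool.append(start[op] + delay[op])
            binding[op] = (kind[op], len(pool) - 1)
    return binding

# z = (a*b + c*d) - e, with 2-cycle multipliers and a 1-cycle ALU.
ops = ["m1", "m2", "add", "sub"]
deps = {"add": ["m1", "m2"], "sub": ["add"]}
delay = {"m1": 2, "m2": 2, "add": 1, "sub": 1}
kind = {"m1": "mul", "m2": "mul", "add": "alu", "sub": "alu"}

start = asap_schedule(ops, deps, delay)
binding = bind(ops, start, delay, kind)
latency = max(start[o] + delay[o] for o in ops)
```

Here the two multiplications run in parallel on two multipliers, while the addition and subtraction share one ALU. Delaying one multiplication would trade latency for area by needing only one multiplier, which is the scheduling and binding trade-off named in the design space.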
A CAD Methodology for SW
• Automated software synthesis from specs.
– Synthesis tools generate implementation
– Global optimization of the program.
• Optimization used to achieve design goals.
• Analysis and verification tools for feedback.
• Compilation for embeddable software
• Software Optimizations
– Code compression
– Optimization for power
– Instruction-set generation
– Static memory allocation
Compression
• Block-based compression
– Program compressed in small blocks to preserve random-access properties (e.g., cache line blocks)
• Transparent code compression
– ISA unchanged. Compression uses compiler output.
– Decompression performed by cache refill engine.
– Processor sees only uncompressed code.
– Techniques: Huffman coding.
• Key issue: code location in memory after compression?
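The code-location issue can be illustrated with a block address table. In the sketch below each cache-line-sized block is compressed independently, and a table records where every compressed block starts, so a refill engine can fetch and expand any block without touching the others. `zlib` is used only as a convenient stand-in for the Huffman coder mentioned above, the function names are made up, and on blocks this small zlib will often not shrink the data at all.

```python
import zlib

def compress_blocks(code, block_size=32):
    """Compress each block separately; return (image, table of (offset, length))."""
    blobs, table, offset = [], [], 0
    for start in range(0, len(code), block_size):
        blob = zlib.compress(code[start:start + block_size])
        table.append((offset, len(blob)))   # where this block lives after compression
        blobs.append(blob)
        offset += len(blob)
    return b"".join(blobs), table

def fetch_block(image, table, block_index):
    """What a cache refill would do: locate one block and decompress it."""
    offset, length = table[block_index]
    return zlib.decompress(image[offset:offset + length])

program = bytes(range(256))                 # stand-in for compiler output
image, table = compress_blocks(program, block_size=32)
line3 = fetch_block(image, table, 3)        # original bytes 96..127
```

The processor still issues uncompressed addresses; the table translates an address's block number into a position in the compressed image, at the cost of storing the table itself.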
Compilation: What is New?
• Machine description
– in terms of architecture -> programming
– in terms of organization -> hardware
• Retargetable code generation has traditionally addressed the problem of compilation for an architecture.
• SOCs also need input about machine organization in order to perform timing analysis on generated code
– Two approaches:
» describe detailed machine
» extract ISA from machine organization
Co-Design Framework
[Figure: co-design framework linking application development (C code, compiler, assembly), a machine definition, a compiler generator with code generator algorithm(s), and hardware design & synthesis (EDA).]
Test Strategy for Firm/Hard Cores
• System-level test strategy
– build test sets for cores
» generate functional vectors
» fault grade for interconnects
– prepare cores for test application from primary inputs through access/isolation, Scan/DFT
– if BIST, schedule BIST application and signature analysis.
• System-level DFT
– goal is to reduce testing cost
– increase accessibility of the internal nodes
» controllability: ability to establish a specific signal value at each node from primary inputs (PIs)
» observability: determine signal value by controlling PIs and observing primary outputs
» tradeoffs: area, I/O pins, performance, yield, TTM
DFT Techniques
• Commonly used approach is to modify a sequential circuit into a combinational one during test.
– Automatic test generation is much easier for combinational circuits
• Current monitoring techniques.
• For sequential circuits, scan techniques are often used
– link memory elements into a shift register
– serially load and read out
– boundary scan is commonly used to test board-level devices
• Built-In Self Test
– minimal external support, high fault coverage, easy access requirements, protect IP
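The scan technique described above can be sketched as a shift register wrapped around a block of combinational logic. The model below is deliberately simplified: a single chain, one capture clock, and a hypothetical three-input logic function. It shows the three phases of a scan test: serially load a pattern, capture the logic's response in the same memory elements, and serially read the response out.

```python
class ScanChain:
    """Memory elements linked into one shift register for test."""

    def __init__(self, length):
        self.flops = [0] * length

    def shift(self, bit_in):
        """One scan clock: shift by one position, return the bit leaving the chain."""
        bit_out = self.flops[-1]
        self.flops = [bit_in] + self.flops[:-1]
        return bit_out

    def load(self, pattern):
        """Serially load so that pattern[k] ends up in flop k."""
        for bit in reversed(pattern):
            self.shift(bit)

    def capture(self, logic):
        """One functional clock: flops capture the combinational response."""
        self.flops = logic(self.flops)

    def unload(self):
        """Serially read out the chain; result[k] is the former value of flop k."""
        out = [self.shift(0) for _ in range(len(self.flops))]
        return out[::-1]

def logic(s):
    # Hypothetical next-state logic of the circuit under test.
    return [s[0] ^ s[1], s[1] & s[2], s[2] ^ 1]

chain = ScanChain(3)
chain.load([1, 1, 0])
chain.capture(logic)
response = chain.unload()
```

Because every memory element is directly loadable and readable, the test generator only has to target the combinational logic, at the cost of one scan clock per element for each load and unload.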
Test Access for Cores
• Peripheral access techniques
– parallel access, serial access or functional access
• Parallel access
– add MUXs to connect core IOs, high routing overhead, pin limitations may prevent parallel access
• Serial access
– most common is ring approach, during test core I/Os are connected via a scan chain, low overhead, delay penalty, easy to test user-defined logic, long test application time
• Functional access
– sensitize path through cores, low hardware cost, parallel test pattern translation possible.
• Also need isolation mechanisms for cores.
Summary of Part I
• Core cells present a new market opportunity
– core cells are breathing life into many “old” designs (6502)
– a new class of “third-party vendors” who bridge the gap between design houses and EDA vendors.
• Productization of cores faces many challenges
– portability of cores versus design reuse
– socketing standards (portability and reuse)
– IP protection: encryption, product versus technology
– design and test methodologies
• Research outlook is aligned with industry expectations
– all new designs start with HDL description
– immediate focus on validation, testability issues
– long term focus on software optimization, complexity management.