Lecture 1 - Introduction
Arto Perttula
TIE-50206 Logic Synthesis
Tampere University of Technology
2015-2016
Lecture Contents
1. Course organization
2. Introduction to implementing digital systems
19.10.2015 Arto Perttula 2
Course Goals
• Get to know practical digital system design
• Become aware of the challenges of digital system design
• Design for portability
– Device independency, software independency
– RTL design, parameterization
• Design for large scale
– Large module, large system, overall development process
– Design reuse
• Design for efficiency
Course Description
• Web: http://www.tkt.cs.tut.fi/kurssit/50200
– Note: This course is as POP-free as possible
• Lectures, starting on Mon 19.10.2015
– Monday 12-14 TC133 (three times, 19.10, 26.10, 2.11)
– Tuesday 10-12 TC133
• Exercises in TC221, starting in the first week
– Sakari Lahti and Arto Perttula
– Wed 12-14
– Thu 10-12
– Fri 12-14
Course Description (2)
• Course requirements:
– Regular exam or two midterm exams (own notes are allowed)
– Successful exercises/exercise work
• Course primarily based on the book:
– RTL Hardware Design Using VHDL: Coding for Efficiency, Portability,
and Scalability. Chu, Pong P. (2006)
• Available at the TUT library
– Lectures and lecture notes should be enough for passing the course
Expected Background Knowledge
1. Boolean algebra
2. Karnaugh maps
3. Basic gates and D flip-flop, critical path
4. Finite state machine (FSM), Mealy and Moore
5. Control and data paths, hierarchical design
6. Timing analysis (at cycle level)
Course Contents
I. VHDL language
– Very High Speed Integrated Circuit Hardware Description Language
= VHSIC HDL = VHDL
– Get familiar with the language constructs
II. Testbenches and simulators, synthesis, guidelines for re-use
III. FPGA circuits, designing for them
IV. Advanced topics: multiple clock domains, clock synchronization,
system design challenges
Preliminary Schedule, Fall 2015
Fig. DE2 development board; block diagram of the synthesizer
Mandatory Exercise Work
• Simple audio synthesizer implemented on an FPGA development board
– Each of the four buttons produces a different tone
– Sounds are heard from the external speakers
During the Exercises, You’ll Learn
1. To describe, synthesize, and verify digital systems using
VHDL
– De facto standard in the European microelectronics industry
2. To read data sheets
3. To use the I2C bus developed by Philips
– Serial bus used, e.g., in the car industry
4. To operate the Wolfson audio codec chip
– Also used, e.g., in some iPods
Exercises in Practice
• Weekly exercises in TC221 (Windows class)
– Done in groups of two (alone only in special cases)
– Three guidance sessions per week
• Presence is not required
– Returning the exercises is mandatory
– The deadline is within two weeks (due Sunday 23:59)
• The first exercise is on Wed 21.10.2015
• Possibility to gain 6 bonus points for a passed exam by completing separate bonus tasks
• You must report (average) hours per person for each exercise
Reserve Enough Time
• Exercises take 3-4 h/week on average, but
– Verification is harder than you think
– There are large variations between groups
• Start early!
Fig. Hour reporting: time spent on exercises [hours] for exercises 1-15; series avg (all) and min
Getting the Development
Boards and Software
• Students may borrow an FPGA kit to do the exercises
and their own hobby projects
– You may keep the kit if you write a BSc/MSc thesis for the
Department of Pervasive Computing
– Info and pickups from room TH210 (Sakari Lahti)
– http://www.tkt.cs.tut.fi/Opetus/Fpga_board
• Students may install the needed EDA tools on their own
computer
– http://www.tkt.cs.tut.fi/tools/public/tutorials/mentor/licensing/licensing.html
Action Points
1. Register to one of the exercise groups
2. Access rights granted in spring 2015 are still valid
– Otherwise, fill in and sign the Access application and confidentiality
agreement
– http://www.tkt.cs.tut.fi/kurssit/STUDENTS_CONFIDENTIALITY_AGREEMENT.pdf
– Return the form to Jari Salo in room TE207
3. Optional: You may install the needed SW tools on your home
computer
4. Optional: You may borrow an Altera DE2 FPGA board
HW-oriented Courses at TUT
• The most direct follow-up is TIE-50506 System Design, which combines
HW and SW on a single chip
– CPU, application software, drivers-OS
– HW accelerator, system architecture, buses
Check the details in the study guide
1. INTRODUCTION
Acknowledgements
• Prof. Pong P. Chu provided the ”official” slides for
the book, which is gratefully acknowledged
– See also: http://academic.csuohio.edu/chu_p/
• Most slides were originally made by Ari Kulmala
– and other previous lecturers (Teemu Pitkänen, Konsta
Punkka, Mikko Alho, Erno Salminen…)
Digital Circuits
• Nowadays found everywhere – from washing machines to space shuttles
• Digital circuits are typically integrated circuits (IC)
– Minimize the number of discrete components
• Typical digital systems, such as cellular phones, contain
– (Several) Processors and co-processors
– Application specific hardware
– An on-chip interconnection between the components
– Memory
• RAM, FLASH, even hard disks
– RF/Analog IC
• Out of the scope of this course
How to Implement a Digital System
• No two applications are identical, and every one needs a certain amount of
customization
• Basic methods for customization
1. ”General-purpose hardware” with custom software
– General purpose processor (GPP): e.g., performance-oriented processor (e.g., Pentium),
cost-oriented processor (e.g., AVR micro-controller)
– Special purpose processor: architecture with a specific set of functions: e.g., DSP
processor (efficient multiply-add), network processor (to do buffering and routing), GPU (to
do 3D rendering)
2. Custom HW platform (CPU+other hardware) with custom SW (requires hardware-
software co-design)
3. Custom hardware only (no software)
How to Implement a Digital System (2)
• Trade-off between flexibility,
programmability, design effort, cost,
performance, and power consumption
• A complex application contains many
different tasks and uses more than one
customization method
1. INTRODUCTION
1a. Device Technologies
What Does an IC Look Like?
Fig. Intel Penryn dual core: the package and the IC
http://www.intel.com/pressroom/kits/45nm/photos.htm
http://www.namedevelopment.com/blog/archives/Intel-penryn.gif
What Does an IC Look Like? (2)
• 45 nm, quad-core
• Note the symmetry
• Two dual-cores integrated
Structure of an IC
• Transistors and connections are made from many layers (typically 10 to 15 in CMOS)
built on top of one another
– The number of layers keeps increasing (although more layers also mean more cost)
• Each layer has a special pattern defined by a mask
• One important aspect of an IC is the length of the smallest feature that can be
fabricated
– Feature may stand for the length of the transistor or the width of a wire (or something completely
different…)
– The unit is the micrometer (µm, 10⁻⁶ m) or the nanometer (nm, 10⁻⁹ m)
– E.g., we may say that an IC is built with a 0.35 µm process
– The process continues to improve (Moore’s law) even in the deep sub-micron era
– The state-of-the-art commercial process (in 2015) is 14 nm, and 10 nm is expected in 2016 or 2017
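Feature size matters because of how area scales with it. In an idealized shrink, every layout dimension scales linearly with the feature size, so the area of the same circuit scales with the square of the feature size. The sketch below assumes exactly that idealized scaling. Real processes only approximate it, so treat the result as an upper bound on the density gain.

```python
def ideal_density_gain(old_feature_nm: float, new_feature_nm: float) -> float:
    """Idealized gain in transistors per unit area when the feature size shrinks.

    Assumes all layout dimensions scale linearly with the feature size, so the
    area of the same circuit scales with the square of the feature size.
    """
    if old_feature_nm <= 0 or new_feature_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_feature_nm / new_feature_nm) ** 2


# 0.35 um (= 350 nm) process versus a 14 nm process
print(ideal_density_gain(350, 14))  # 625.0
```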
Structure of an IC (2)
Structure of an IC (3)
• Several metal layers, e.g., M1-M10
– Less routing congestion
• Alternate layers route wires in the X and Y
directions
• Hierarchical scaling
– Wires on top levels are wider and
taller than on lower levels
• Top layers for
– Power supply
– Clock
– Global signals
Fabrication of an IC
1. Purified silicon ingot (cylinder) is sliced into wafers
(e.g., 12-inch diameter)
2. Wafer is coated with photoresist
3. Light shines through the mask
4. Photoresist not hit by light is washed away (for a negative resist; with a positive resist, the exposed areas are removed)
5. New layers (n-well, dielectric, copper wire, via etc.)
are created on top of the silicon
6. Finally, the rectangular dies (chips, e.g., 1-200 mm²)
are sawed from the wafer, tested, and packaged
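The number of dies a wafer yields depends on the wafer diameter and the die area. A commonly used first-order estimate starts from the wafer area divided by the die area. It then subtracts an edge-loss term for the partial dies along the circumference: N ≈ π(d/2)²/S − πd/√(2S).

The sketch below uses that approximation and assumes 300 mm as the nominal diameter of a "12-inch" wafer. It ignores scribe lines and defects (yield), so it gives only a gross die count.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross die estimate: wafer area / die area minus partial dies at the edge."""
    if wafer_diameter_mm <= 0 or die_area_mm2 <= 0:
        raise ValueError("dimensions must be positive")
    radius = wafer_diameter_mm / 2
    whole_wafer = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(whole_wafer - edge_loss))


# Die sizes spanning the 1-200 mm^2 range mentioned above
for area in (1, 100, 200):
    print(f"{area} mm^2 die: about {dies_per_wafer(300, area)} gross dies")
```

Good dies are fewer still once defect density is taken into account, which is one reason large dies are disproportionately expensive.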
Example Lithography Machine
[K.M. Palmer, An extremely fine line, IEEE Spectrum, Jan. 2012, pp. 47-50]
Classification: Where HW
Customization Is Done
a) In a fabrication facility: ASIC
– Full-custom, Standard cell, and Gate array ASIC
(Application Specific IC)
b) In the ”field”: non-ASIC
– Simple/Complex field programmable logic device
– Off-the-shelf SSI/MSI (Small/Medium Scaled IC)
components
Full-Custom ASIC
• All aspects (e.g., size of a transistor) of a circuit are
tailored for a particular application
• Circuit fully optimized
• Design extremely complex
• Very time-consuming design (typically only feasible for
small components)
• Masks needed for all layers
– Very expensive
– Fabrication time up to months
• Example: Intel, AMD, and IBM processors are (partly)
full-custom
Fig. Silicon layout editor
Standard-Cell ASIC
• Circuit made using a set of pre-defined logic components, known as
standard cells
– E.g., basic logic gates, 1-bit adder, D-FF
– The library cannot be altered, although some basic parameters can (e.g., fan-out)
– The height of a cell is pre-determined
• Layout of the complete circuit is customized
1. The location and type of the standard cells
2. Connections between cells
• Layout created with special EDA tools
• Masks needed for all layers
– Same fabrication cost as with full-custom
• E.g., mobile phone digital ICs
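The fixed cell height is what allows standard cells to be laid out in rows. Cells of different widths are packed side by side along a row, and shared power and ground lines run horizontally through it.

The sketch below is a deliberately naive greedy row filler. The cell names and widths are hypothetical. Real placement tools also optimize wire length, congestion, and timing.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    width: int  # in placement sites; all cells share the same (fixed) height


def fill_rows(cells: list[Cell], row_width: int) -> list[list[tuple[str, int]]]:
    """Greedily packs cells left to right; returns (name, x position) pairs per row."""
    rows: list[list[tuple[str, int]]] = [[]]
    x = 0
    for cell in cells:
        if cell.width > row_width:
            raise ValueError(f"{cell.name} is wider than a row")
        if x + cell.width > row_width:  # current row is full: start the next one
            rows.append([])
            x = 0
        rows[-1].append((cell.name, x))
        x += cell.width
    return rows


# Hypothetical cells from a netlist
cells = [Cell("nand2_0", 3), Cell("inv_0", 2), Cell("dff_0", 4),
         Cell("inv_1", 1), Cell("fa_0", 5)]
for y, row in enumerate(fill_rows(cells, row_width=6)):
    print(f"row {y}: {row}")
```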
Fig. An SC-ASIC has fixed-height rows of std cells.
Closer look at 4 standard cell rows: power and ground lines
run horizontally inside the cells.
Gate array ASIC
• Circuit is built from an array of a single type of cell (known as base cell)
• Base cells are pre-arranged and placed in fixed positions, aligned as a one- or two-dimensional array
– Connections customized by the designer
• More sophisticated components (macro cells) can be constructed from base
cells
• Masks needed only for metal layers (connection wires)
– Cheaper than full-custom or standard-cell
• Also known as channelless array or sea-of-gates array
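Building macro cells from a single base-cell type can be illustrated at the logic level. The sketch below assumes, hypothetically, that the base cell behaves as a 2-input NAND. This is a common textbook choice because NAND is functionally complete. Real gate-array base cells are typically a small group of uncommitted transistors that the metal layers wire into gates. The sketch wires NAND cells into an inverter, AND, OR, XOR, and finally a half adder.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Hypothetical base cell: a 2-input NAND on 0/1 values."""
    return 1 - (a & b)


# Macro cells, built only by connecting base cells
def inv(a: int) -> int:
    return nand(a, a)


def and2(a: int, b: int) -> int:
    return inv(nand(a, b))


def or2(a: int, b: int) -> int:
    return nand(inv(a), inv(b))


def xor2(a: int, b: int) -> int:
    shared = nand(a, b)  # four NAND base cells in total
    return nand(nand(a, shared), nand(b, shared))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Returns (sum, carry) using only NAND-based macro cells."""
    return xor2(a, b), and2(a, b)


for a, b in product((0, 1), repeat=2):
    print(a, b, half_adder(a, b))
```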
Complex Field Programmable
Logic Device
• Device consists of an array of generic logic cells and a general
interconnect structure
• Logic cells and interconnect can be ”programmed” by utilizing
”semiconductor fuses” or ”switches”
• Customization is done ”in the field”
• Two categories:
1. CPLD (Complex Programmable Logic Device)
– PAL-like sum-of-products (AND-OR) macrocells to implement logic
2. FPGA (Field Programmable Gate Array)
– Look-up tables to implement logic
• No custom mask needed
• Used, e.g., in Cisco 2600 series routers and in this course
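A look-up table implements logic by storing the function's truth table. A k-input LUT holds 2^k configuration bits, and the input values form the address that selects which bit appears at the output.

The sketch below models this in software. The class name and the way the configuration is generated from a Python function are assumptions for illustration; this is not any vendor's tool flow.

```python
from typing import Callable


class Lut:
    """Software model of a k-input look-up table (LUT)."""

    def __init__(self, k: int, config: int) -> None:
        if not 0 <= config < (1 << (1 << k)):
            raise ValueError("config must fit in 2**k bits")
        self.k = k
        self.config = config  # bit i = output value for input combination i

    @classmethod
    def from_function(cls, k: int, fn: Callable[..., int]) -> "Lut":
        """'Programs' the LUT by evaluating fn for every input combination."""
        config = 0
        for index in range(1 << k):
            bits = [(index >> i) & 1 for i in range(k)]
            if fn(*bits):
                config |= 1 << index
        return cls(k, config)

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))  # inputs form the address
        return (self.config >> index) & 1


# The same 4-input LUT structure, configured as two different functions
and4 = Lut.from_function(4, lambda a, b, c, d: a & b & c & d)
parity4 = Lut.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.config), parity4(1, 0, 1, 1))  # 0x8000 1
```

Changing the function only changes the stored bits, not the structure. That is why reconfiguring an FPGA needs no new mask.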
Simple Field Programmable
Logic Device (PLD)
• Programmable device with simple internal
structure
• E.g., PROM (Programmable Read Only
Memory), PAL (Programmable Array Logic)
• No custom mask needed
• Outdated technology
• Replaced by CPLD/FPGA
Fig. 1 Example PAL (AND-OR net)
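A PAL realizes sum-of-products logic in two stages:
1. A programmable AND array forms product terms from the inputs and their complements.
2. A fixed OR array combines a set of product terms for each output.

The sketch below encodes each product term as a mapping from input index to required value. This encoding is an assumption chosen for readability. The example configures it as a full adder.

```python
from itertools import product

# A product term maps an input index to the value it requires (1 = true literal,
# 0 = complemented literal). Inputs not mentioned are "don't care" for that term.
ProductTerm = dict[int, int]


def and_plane(inputs: tuple[int, ...], term: ProductTerm) -> int:
    """Programmable AND array: 1 if every literal in the term is satisfied."""
    return int(all(inputs[i] == value for i, value in term.items()))


def or_plane(inputs: tuple[int, ...], terms: list[ProductTerm]) -> int:
    """Fixed OR array: the output is the OR of its product terms."""
    return int(any(and_plane(inputs, t) for t in terms))


# Full-adder outputs for inputs (a, b, cin) = indices (0, 1, 2)
SUM_TERMS = [
    {0: 1, 1: 0, 2: 0}, {0: 0, 1: 1, 2: 0},
    {0: 0, 1: 0, 2: 1}, {0: 1, 1: 1, 2: 1},
]
CARRY_TERMS = [{0: 1, 1: 1}, {0: 1, 2: 1}, {1: 1, 2: 1}]

for bits in product((0, 1), repeat=3):
    print(bits, or_plane(bits, SUM_TERMS), or_plane(bits, CARRY_TERMS))
```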
SSI/MSI Components
• Small discrete parts with fixed, limited
functionality
– E.g., a few AND gates on a printed circuit board (PCB)
• E.g., the 7400 TTL series has more than 100 parts
• Resources (e.g., power, board area,
manufacturing cost etc.) are consumed by the package
rather than the silicon
• No longer a viable option except for hobby
projects
Fig. 2 TTL clock with 7400s.
Rather hackish, ehh.
Fig.1 Example component
Major Trend: Integration
• Minimize the number of discrete components
• Integrate into a single chip/package
– several CPUs
– memories
– HW accelerators
– on-chip network
– also passive, RF, and MEMS components
• Fig: [J. Blau, Talk is cheap, IEEE Spectrum, vol. 43, iss. 10, Oct 2006, pp. 10-11.]
Fig. System-on-chip
Major Trend: Integration (2)
• Actel Fusion Mixed-signal FPGA
1. Integrated Analog-to-Digital Converter (ADC)
2. Fusion Supports Low Power, synchronization
3. Embedded Flash Memory
4. Advanced I/O Standards
5. Charge Pumps
6. Analog Quads
7. Flash FPGA VersaTile
8. SRAM and FIFOs
9. Integrated Oscillators – Crystal and RC
10. Routing Structure
11. JTAG
http://www.actel.com/documents/Fusion_PIB.pdf
Trend: Shannon’s Law > Moore’s Law > Productivity
Fig. Relative performance; the marked major problem areas include memory speed and designer productivity
Fig: [J.M. Rabaey - Silicon Architectures for Wireless Systems - Part 01, Tutorial, HotChips, 2001]
http://bwrc.eecs.berkeley.edu/People/Faculty/jan/presentations/hotchips1.pdf
• Wirth’s (or Reiser’s) law: ”Software is slowing faster than hardware is accelerating”
• Unknown: ”What Grove giveth, Gates taketh away”
Principles of Modern SoC Design
• Adopt system-level design
– Use models prior to implementation
– Seek global optimum instead of local
• Extensive reuse (of IP components)
– Use pre-designed and pre-verified components instead of implementing from scratch
– Leads to shorter time-to-market
• Hardware/Software co-design
– Software can be tested with simulation/emulation before HW has been implemented
• Need changes in SW programming paradigm, languages and tools
– SW must be designed as concurrent instead of sequential
• Need change in education
– Tenhunen’s law: ”The number of courses needed to understand digital systems doubles every decade”
• Use formal models more (not in this course, though)
1. INTRODUCTION
1b. Comparing the Technologies
Comparison Criteria
• Area (size, silicon real-estate)
– [mm²], [equivalent gates]
• Speed (performance)
– [MHz], operations/second [op/s]
– Time required to perform a task, [s]
• Power consumption, [mW]
• Cost, [€]
• Design effort, [person-month]
• Life-cycle of COTS components [years]
– Commercial off-the-shelf
Standard-Cell ASIC versus FPGA
1. Area [1]
– ASIC is smaller since the cells and interconnect are customized
– FPGA has overhead for programmability, and its capacity cannot be completely utilized
– Roughly: FPGA area is approximately 35x that of an ASIC when only LUT-based logic elements are used
• However, FPGA end users do not see this directly; the high volume of FPGA chips compensates for some of the cost ($$)
2. Performance [1]
– Roughly: ASIC reaches a 3.4-4.6x higher clock frequency than FPGA
3. Power [2]
– ASIC is better; the ratio is ~10x
[1] I. Kuon and J. Rose, "Measuring the Gap between FPGAs and ASICs," IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007, pp. 203-215.
[2] J. Blyler, "Navigating the Silicon Jungle: FPGA or ASIC?," Chip Design Magazine, June/July 2005, [online]:
http://chipdesignmag.com/display.php?articleId=115&issueId=11
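The ratios above can be turned into a rough first estimate of what an FPGA design might look like as an ASIC. The sketch below simply scales by the averages quoted on the slide: 35x area for LUT-only logic and 3.4-4.6x frequency from [1], and ~10x power from [2]. The FPGA figures in the example are hypothetical. The real gap depends heavily on the design and on which FPGA hard blocks it uses.

```python
from dataclasses import dataclass

# Average ratios quoted on this slide; real designs vary widely.
AREA_RATIO = 35.0        # FPGA area / ASIC area (LUT-only logic) [1]
FREQ_RATIO = (3.4, 4.6)  # ASIC clock frequency / FPGA clock frequency [1]
POWER_RATIO = 10.0       # FPGA power / ASIC power [2]


@dataclass
class AsicEstimate:
    area_mm2: float
    fmax_mhz_range: tuple[float, float]
    power_mw: float


def estimate_asic(fpga_area_mm2: float, fpga_fmax_mhz: float,
                  fpga_power_mw: float) -> AsicEstimate:
    """Scales FPGA figures by the average FPGA/ASIC gap (a crude first guess)."""
    low, high = FREQ_RATIO
    return AsicEstimate(
        area_mm2=fpga_area_mm2 / AREA_RATIO,
        fmax_mhz_range=(fpga_fmax_mhz * low, fpga_fmax_mhz * high),
        power_mw=fpga_power_mw / POWER_RATIO,
    )


# Hypothetical design: 70 mm^2 of FPGA silicon, 100 MHz clock, 500 mW
print(estimate_asic(70.0, 100.0, 500.0))
```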
Cost of Integrated Circuits
• Types of cost:
1. Chip costs
– NRE (Non-Recurring Engineering) cost: one-time, per-design cost
– Part cost: per-unit cost
2. Indirect design costs
– Lead time: time to get the chip out of the factory
– Time-to-market ”cost”: loss of revenue
• Standard-cell: high NRE, small part cost and large lead time
• FPGA: low NRE, large part cost and small lead time
Cost of Integrated Circuits (2)
• For ASIC, first-time-right is necessary
• FPGA has lower NRE, but higher RE (recurring, per-unit cost)
– Suitable for low volumes
– The break-even volume keeps getting bigger
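The NRE/part-cost trade-off can be written as total cost = NRE + volume × part cost. The break-even volume is where the ASIC and FPGA totals are equal. All cost figures below are hypothetical placeholders, and the sketch ignores lead time and time-to-market costs.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    nre: float        # one-time, per-design cost
    part_cost: float  # recurring, per-unit cost

    def total(self, volume: int) -> float:
        return self.nre + volume * self.part_cost


def break_even_volume(asic: CostModel, fpga: CostModel) -> float:
    """Volume above which the ASIC becomes cheaper than the FPGA."""
    if fpga.part_cost <= asic.part_cost:
        raise ValueError("FPGA part cost must exceed ASIC part cost")
    return (asic.nre - fpga.nre) / (fpga.part_cost - asic.part_cost)


# Hypothetical numbers, for illustration only
asic = CostModel(nre=2_000_000, part_cost=5)
fpga = CostModel(nre=50_000, part_cost=30)
volume = break_even_volume(asic, fpga)
print(volume)  # 78000.0
print(asic.total(int(volume)), fpga.total(int(volume)))  # 2390000 2390000
```

Both a higher ASIC NRE and a cheaper FPGA part push the break-even volume up. This matches the trend noted on the slide.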
Summary of Technologies
• Trade-off between optimal use of hardware resource and design effort/cost
• No single best technology
Architecture Choice Makes
Big Difference
Heinrich Meyr, Future Wireless Communication Systems…, VTC, 2005.
(Figure data by T. Noll, RWTH Aachen)
http://www.ieeevtc.org/vtc2005spring/presentations/2020_presentations/HMeyr.pdf
CONCLUSIONS
Conclusions
• Two viable implementation technologies: standard-cell ASIC and
FPGA
– ASICs are smaller in area and faster than FPGA
– ASICs have low unit cost but high NRE, FPGA vice versa
– ASICs used in high volume products, FPGAs in tailorable products
• FPGA is a ”programmable ASIC” (custom IC, actually)
– I.e., someone (Altera, Xilinx, etc.) designs an IC whose application is the FPGA
– Extra resources needed to provide in-field configuration
• Many chips include several programmable processors
FOR SELFSTUDY
Multiprocessor Is
Mainstream in SoC
[Herzen, Lerer, Grand Challenges…, Design,
Applications, Integration and Software, 2006]
[Turley, Survey says: software tools more
important than chips, Embedded Systems
Design, 04/11/05]
32-bit processors are the most popular (> 55%)
#CPUs has increased since 2005
SoC frequencies are much lower than high-end CPUs
Manycore Chips Are Here Today
• Increase the number of processors
on chip
• On-chip parallel computer
• On market: 2, 4, 8 CPU cores per
chip
– Not counting GPUs!
• Coming: tens or hundreds of CPUs per
chip
• Demo chips with 38, 48, and 80 cores from
Intel; 64-core and 100-core chips
from Tilera on the market
iPhone 4(S) Teardown
• iPhone 4S
Sources: http://www.appleinsider.com/print/11/10/13/teardown_of_apples_iphone_4s_reveals_larger_battery_new_baseband_chip.html
iPhone 4S SoCs
Apple A5: 45 nm, 122 mm², 800 MHz-1 GHz, est. 15M tran, 50 mm²,
includes dual-core ARM Cortex A9 with NEON SIMD accelerator
and PowerVR graphics processor, (power <1 W??)
+533 MHz 512 MiB DDR2 in the same package
Figure: http://www.electronics-lab.com/blog/?p=10110
http://en.wikipedia.org/wiki/Apple_A45
Qualcomm MDM6600, 45 nm, 512 MHz, ARM1136JS
32+32 KB L1, 256 KB L2; 147 MHz application DSP,
162 MHz modem DSP, 160 MHz 16-b DDR interface
(power ~ tens of mW?)
http://www.scribd.com/doc/54154049/80-Vr001-1-Mdm6200-and-Mdm6600-Mobile-Data-Modem-Device-Specification-Advance-Information
iPhone 4 SoCs
A4: 45 nm, 800 MHz-1 GHz, est. 15M tran, 50 mm², includes dual-issue
superscalar ARM Cortex-A8 and PowerVR graphics processor
Figure: http://techon.nikkeibp.co.jp/article/HONSHI/20100727/184585/?P=2
http://en.wikipedia.org/wiki/Apple_A4, http://en.wikipedia.org/wiki/Apple_Ax
http://en.wikipedia.org/wiki/Samsung_Hummingbird
http://en.wikipedia.org/wiki/ARM_Cortex-A8, http://en.wikipedia.org/wiki/PowerVR
http://pdadb.net/index.php?m=cpu&id=a40000&c=samsung-intrinsity_apple_a4_s5pc110a01
XMM 6180 baseband platform, 65 nm, ARM1176 @ 416 MHz,
supports HSDPA/HSUPA, WCDMA, EDGE, speech
(aka Infineon 337S3394 WEDGE baseband, marked SP836175
G0822, nowadays probably called Intel XMM 6180)
http://www.theiphonewiki.com/wiki/index.php?title=XMM_6180
http://www.infineon.com/cms/en/corporate/press/news/releases/2008/INFCOM200805-068.html
Check out also the newer chip SDR20: [U. Ramacher et al.,
Architecture and implementation of a Software-Defined Radio
baseband processor, ISCAS, 2011] and
http://www.teknologisk.dk/_root/media/34851_3_SDR_Infineon.
Table for iPhone 4
iPhone 4 Cost Teardown
• Most profit is made with top models
– Together with mindless hype and (annoying) bundle sales
• E.g., consider Flash memory costs
– 1.2 $/GB as a chip
– 4-6 $/GB inside iPhone
• Development, SW, etc. costs are not included in the table
Sources:
http://www.isuppli.com/Teardowns/News/Pages/iPhone-4S-Carries-BOM-of-$188,-IHS-iSuppli-Teardown-Analysis-Reveals.aspx
http://www.isuppli.com/Teardowns/News/Pages/iPhone-4-Carries-Bill-of-Materials-of-187-51-According-to-iSuppli.aspx
Cost category                        16 GB   32 GB   64 GB
Retail price w/ contract [$]           199     299     399
Total BOM cost [$]                     188     207     245
  of which NAND Flash [$]               19      38      77
Manufacturing cost [$]                   8       8       8
Price - BOM - manufacturing [$]          3      84     146
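The table supports two quick calculations:
- The last row is simply retail price minus BOM minus manufacturing cost.
- Dividing the NAND Flash cost by the capacity gives the per-gigabyte chip price.

The sketch below recomputes both from the table values. Like the table, it ignores development, software, and other costs.

```python
# Values from the cost table, in dollars:
# capacity [GB] -> (retail price w/ contract, total BOM, NAND Flash, manufacturing)
COSTS: dict[int, tuple[int, int, int, int]] = {
    16: (199, 188, 19, 8),
    32: (299, 207, 38, 8),
    64: (399, 245, 77, 8),
}


def margin(retail: int, bom: int, manufacturing: int) -> int:
    """Price - BOM - manufacturing, i.e., the last row of the table."""
    return retail - bom - manufacturing


def flash_cost_per_gb(flash: int, capacity_gb: int) -> float:
    return flash / capacity_gb


for capacity, (retail, bom, flash, manufacturing) in COSTS.items():
    print(f"{capacity} GB: ${margin(retail, bom, manufacturing)} left, "
          f"NAND Flash {flash_cost_per_gb(flash, capacity):.2f} $/GB")
```

The Flash cost works out to roughly 1.2 $/GB for every model, which is consistent with the chip price given on the slide.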