EE3A1 Computer Hardware and Digital Design
Lecture 10
Hardware Design Flows
Introduction
We want to turn a customer requirement into an electronic system.
2 approaches:
Hardwired algorithms. Customised hardware: solves only one problem. Application Specific Integrated Circuit (ASIC).
Computation in software. General purpose hardware (microprocessor): can solve any problem. Customise through software.
Computing in hardware and software
A simple example: Shopping list
Item     Price (P)   Quantity (Q)
Item 1   2           5
Item 2   7           2
Item 3   4           3
Total bill = P1 x Q1 + P2 x Q2 + P3 x Q3
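As a quick check of the formula, the total bill for the shopping list can be worked out in a few lines of Python. This is only a sketch; the function name `total_bill` is made up for illustration.

```python
# Prices and quantities taken from the shopping list table.
prices = [2, 7, 4]
quantities = [5, 2, 3]


def total_bill(prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


print(total_bill(prices, quantities))  # 2*5 + 7*2 + 4*3 = 36
```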
Turn our problem into silicon
How? What are the choices?
What types of silicon chip?
Microprocessor: serial operation.
ASIC (Application Specific Integrated Circuit): special purpose silicon chip, parallel operation.
Solve in ASIC
Build special purpose circuit.
Needs: 3 multipliers, 1 adder, 1 time step.
Calculated in one go. Calculated in parallel.
[Figure: three multipliers, one per (P, Q) pair, feed a single adder. In the example shown, the products 21, 20 and 18 sum to 59.]
Solve on a microprocessor
Break problem into sequence of simple steps (program):
Step 1 Multiply P1 and Q1
Step 2 Multiply P2 and Q2
Step 3 Multiply P3 and Q3
Step 4 Add all the results
Micro performs steps one after the other. Slow, but can solve any problem.
For ( i=1 to 3 ) {
    result = result + pi * qi
}
Uses simple circuit (arithmetic logic unit). Does only one simple operation at a time.
[Figure: processor (CPU), storage and memory, and input and output devices, connected by address bus, data bus and control bus. Memory holds the values 7, 3, 5, 4, 2, 6 and the instruction P1xQ1.]
Issue address of instruction (read). Instruction P1xQ1 is returned.
Issue address of P1 (read). Value of P1 is returned: P1xQ1 becomes 7xQ1.
And so on … Takes many cycles even for one instruction.
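The read sequence above can be mimicked with a toy model that counts bus reads. This is a rough sketch, not a real processor: the class `ToyBus`, the addresses 0, 10 and 11, and the `("MUL", ...)` instruction encoding are all invented for illustration, and Q1 = 3 is an assumed value.

```python
class ToyBus:
    """Toy memory that counts every read made over the bus."""

    def __init__(self, memory):
        self.memory = memory
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.memory[address]


def run_one_multiply(bus, instruction_address):
    # Read 1: issue address of instruction, instruction is returned.
    op, addr_a, addr_b = bus.read(instruction_address)
    if op != "MUL":
        raise ValueError("this toy model only knows MUL")
    # Read 2: issue address of first operand, value is returned.
    a = bus.read(addr_a)
    # Read 3: issue address of second operand, value is returned.
    b = bus.read(addr_b)
    return a * b


# Hypothetical memory layout: instruction at address 0, P1 at 10, Q1 at 11.
memory = {0: ("MUL", 10, 11), 10: 7, 11: 3}
bus = ToyBus(memory)
product = run_one_multiply(bus, 0)
print(product, bus.reads)  # 21 3
```

Even in this simplified model, one multiply needs three separate trips over the bus before the arithmetic logic unit can do any work.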
What if our problem is large?
Suppose we change from 3 items to 300 items.
Microprocessor needs: the same sized circuit, 100 times more time.
ASIC needs: the same length of time (very fast), 100 times bigger circuit (needs many transistors).
Can we make a circuit that big?
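The trade-off can be put into a rough scaling model. The sketch below assumes one multiply-accumulate per time step on the microprocessor and one multiplier per item on the ASIC; it ignores the adder and all other overheads, and the function names are illustrative only.

```python
def microprocessor_cost(n_items):
    """Serial: one fixed circuit, one time step per item."""
    return {"time_steps": n_items, "multipliers": 1}


def asic_cost(n_items):
    """Parallel: one time step, one multiplier per item."""
    return {"time_steps": 1, "multipliers": n_items}


for n in (3, 300):
    print(n, microprocessor_cost(n), asic_cost(n))

# Going from 3 to 300 items:
time_ratio = microprocessor_cost(300)["time_steps"] / microprocessor_cost(3)["time_steps"]
size_ratio = asic_cost(300)["multipliers"] / asic_cost(3)["multipliers"]
print(time_ratio, size_ratio)  # 100.0 100.0
```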
Moore’s law
[Figure: number of transistors on a single chip (log scale, 1,000 to 10,000,000) against year (1970 to 2000).]
Until recently, such big circuits were not possible. Now they are possible. ASICs have become a big business.
What if our problem changes?
Suppose we add a fourth item to our shopping list.
Micro: Re-write program. Quick & cheap to do.
Old program:
For ( i=1 to 3 ) {
    result = result + pi * qi
}
New program:
For ( i=1 to n ) {
    result = result + pi * qi
}
ASIC: Must build completely new circuit. Slow & expensive to do.
[Figure: the fixed ASIC circuit with inputs P1, Q1, P2, Q2, P3, Q3.]
What if we find a bug?
Micro: download patch to customers.
ASIC: disaster - product recall.
What types of silicon chip?
Microprocessor is: programmable (easy to upgrade, bug fix); flexible: can be modified to do anything; slow.
ASIC (Application Specific Integrated Circuit) is: special purpose silicon chip; fast (because parallel); fixed purpose - cannot be modified.
Must choose flexibility or speed.
Reconfigurable hardware
FPGA (Field Programmable Gate Array): new type of hardware.
Function is programmed by sending bits to it.
Function is easily and quickly modified.
Can be modified just like software: can fix bugs, can update function during product lifetime.
Design flows
[Figure: design flow from specification to silicon for ASIC, PLD and FPGA.]
Levels: Statement of what system must do; Register Transfer Level (VHDL or Verilog code); Gates (or other building blocks); Physical level.
Physical level: Transistors and Silicon (ASIC), Fuse Map (PLD), Configuration Bit Stream (FPGA).
Steps: Logic synthesis, then Physical synthesis.
Frontend: normally done by skilled human.
Backend: normally done by computer program.
ASIC
VHDL is converted to gates. Each gate has a mask design (standard cell).
[Figure: standard cell mask layouts.]
CAD tool stitches together gate definitions to give mask definition of whole chip.
Programmable Logic Devices
Sum-of-products devices, i.e. OR of ANDs.
Customize by setting state of switch boxes. List of switch box states is fuse map.
Various families: PEEL, PLA, PAL.
All are: cheap, very quick to design, inflexible and slow.
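To make the idea of a fuse map concrete, here is a small sketch of a sum-of-products device in Python. The fuse encoding (1 = use input, 0 = use inverted input, None = not connected) and the function name `pld_output` are assumptions made for this example, not the format of any real PLD family.

```python
def pld_output(inputs, and_fuses, or_fuses):
    """Evaluate one output of a toy sum-of-products device.

    inputs:    tuple of 0/1 input values.
    and_fuses: one row per product term; each entry is 1 (input used as is),
               0 (input used inverted) or None (input not connected).
    or_fuses:  one 0/1 flag per product term: does it feed the OR gate?
    """
    terms = []
    for row in and_fuses:
        term = 1
        for value, fuse in zip(inputs, row):
            if fuse is not None:
                term &= value if fuse == 1 else 1 - value
        terms.append(term)
    return int(any(t for t, used in zip(terms, or_fuses) if used))


# Hypothetical fuse map for XOR: (NOT a AND b) OR (a AND NOT b).
XOR_AND_FUSES = [(0, 1), (1, 0)]
XOR_OR_FUSES = [1, 1]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pld_output((a, b), XOR_AND_FUSES, XOR_OR_FUSES))
```

Changing the function means changing only the two fuse lists; the evaluation circuit stays the same.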
Complex PLDs
CPLD: improves flexibility by using many PLDs with programmable connections between them.
[Figure: four PLDs connected through a programmable routing resource, with inputs and outputs.]
FPGA architecture
[Figure: FPGA layout showing IO blocks, wiring and logic.]
Configurable logic gates. Configurable wiring.
Change chip’s function by changing config data.
Reconfigurable logic gates
[Figure: a MUX with select inputs In1 and In2 picks one of four memory bits as Out.]
Gate function is determined by data stored in memory. One bit is selected by inputs.
OR gate: memory bits 0 to 3 hold 0, 1, 1, 1.
If In1=0 and In2=0, 0th memory bit is output (Out = 0)
If In1=0 and In2=1, 1st memory bit is output (Out = 1)
If In1=1 and In2=0, 2nd memory bit is output (Out = 1)
If In1=1 and In2=1, 3rd memory bit is output (Out = 1)
Change gate function by changing stored memory bits. Truth table of required function is stored in memory.
NOR gate: memory bits 0 to 3 hold 1, 0, 0, 0.
XOR gate: memory bits 0 to 3 hold 0, 1, 1, 0.
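The memory-plus-MUX gate can be modelled as a table lookup. A minimal sketch, assuming In1 is the high select bit and In2 the low one (which matches the "0th, 1st, 2nd, 3rd memory bit" ordering above); the name `lut2` is invented here.

```python
def lut2(memory_bits, in1, in2):
    """Two-input reconfigurable gate: the inputs select one stored bit."""
    index = 2 * in1 + in2  # In1=1, In2=0 selects the 2nd memory bit
    return memory_bits[index]


# Truth tables stored in memory, listed from the 0th bit to the 3rd bit.
OR_BITS = (0, 1, 1, 1)
NOR_BITS = (1, 0, 0, 0)
XOR_BITS = (0, 1, 1, 0)

for name, bits in (("OR", OR_BITS), ("NOR", NOR_BITS), ("XOR", XOR_BITS)):
    outputs = [lut2(bits, a, b) for a in (0, 1) for b in (0, 1)]
    print(name, outputs)
```

The same `lut2` function acts as any two-input gate; only the four stored bits differ.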
Field Programmable Gate Arrays
Better gate: output can be registered if required.
Configuration data is scanned in during boot or reset.
Configuration data 100000: AND gate, no flip-flop at output.
Change function by giving new configuration data.
Configuration data 100011: AND gate with flip-flop at output.
Example
Carry unit of full-adder. Synthesise VHDL to basic logic gates.
[Figure: carry circuit with inputs x, y, cin, gates g3 to g6, internal nets n2, n3, n4 and output cout.]
Technology mapping: transform to equivalent circuit that uses only resources that we have available.
Output of synthesis tool may use resources we don’t have available.
Our simple logic gates have only 2 inputs.
[Figure: mapped circuit with an extra gate g7 and an extra net n5.]
Compute values for truth tables: 0001, 0001, 0001, 0111, 0111.
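A quick way to convince ourselves that the mapped circuit is still a carry unit is to test all eight input combinations. The sketch below assumes a particular wiring (three 2-input ANDs on the input pairs, then two 2-input ORs in a chain); the transcript does not say which gate drives which net, so the net names inside `carry_mapped` are illustrative.

```python
from itertools import product

AND2 = (0, 0, 0, 1)  # truth table 0001
OR2 = (0, 1, 1, 1)   # truth table 0111


def lut2(memory_bits, in1, in2):
    """Two-input gate built from four memory bits."""
    return memory_bits[2 * in1 + in2]


def carry_reference(x, y, cin):
    """Carry out of a full adder: 1 when at least two inputs are 1."""
    return int(x + y + cin >= 2)


def carry_mapped(x, y, cin):
    """Carry built only from 2-input gates (assumed wiring)."""
    n2 = lut2(AND2, x, y)
    n3 = lut2(AND2, x, cin)
    n4 = lut2(AND2, y, cin)
    n5 = lut2(OR2, n2, n3)    # first half of the split 3-input OR
    return lut2(OR2, n5, n4)  # second half gives cout


mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if carry_mapped(*bits) != carry_reference(*bits)
]
print(mismatches)  # an empty list means the two circuits agree
```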
Put the function into the FPGA
[Figure: FPGA fabric of six MUX-based logic cells (inputs In1, In2, output Out), each with switch boxes, shown next to the mapped carry circuit.]
AND gates g3, g4, g5: memory bits 0001.
OR gates g6, g7: memory bits 0111.
Configure switch boxes to give required wiring (inputs x, y, cin; output cout).
FPGA Configuration
[Figure: the six logic cells with their memory bits (g3, g4, g5 = 0001; g6, g7 = 0111) and the serial bitstream 1000XXXX0001011110001110 being scanned in.]
Scan in serial bitstream at boot or reset time to give chip its function.
Bitstream would also need to scan in configuration data for switch boxes.
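Serial scan-in can be pictured as one long shift register running through every cell's memory. The following sketch is a simplified model under stated assumptions: the chain order, the four bits per cell, and the use of 0000 for the unused cell are all made up here, and real devices define their own bitstream layout. Switch box data is left out.

```python
def make_bitstream(cell_tables):
    """Build the serial stream for the desired cell contents.

    The first bit scanned in travels furthest along the chain, so the
    stream is the concatenated contents in reverse order.
    """
    return "".join(cell_tables)[::-1]


def scan_in(bitstream, n_cells, bits_per_cell=4):
    """Shift the stream into the chain one bit per clock tick."""
    chain = ["0"] * (n_cells * bits_per_cell)
    for bit in bitstream:
        chain = [bit] + chain[:-1]  # every stored bit moves one place along
    return [
        "".join(chain[i:i + bits_per_cell])
        for i in range(0, len(chain), bits_per_cell)
    ]


# Three AND gates, two OR gates, one unused cell (assumed to hold 0000).
cells = ["0001", "0001", "0001", "0111", "0111", "0000"]
stream = make_bitstream(cells)
loaded = scan_in(stream, n_cells=len(cells))
print(loaded == cells)  # True
```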
Configuration bitstream
Bitstream determines: gate function, switch box routing.
Routing is slow and inefficient:
Many more wire segments available than any one design will actually use. Wastes space.
Passing signal through switch boxes is slow.
Better CLB
If the FPGA has a bigger CLB (more inputs, more memory) the design is more efficient (uses less wiring).
Example: Whole carry unit fits (easily) into one 4-input CLB.
[Figure: carry circuit with inputs x, y, cin, gates g3 to g6, nets n2 to n4 and output cout.]
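To see why the carry unit fits easily, note that it has only three inputs, so its whole truth table is 8 bits, while a 4-input lookup table stores 16. A small sketch, with the input ordering (x as the high bit, cin as the low bit) chosen arbitrarily for this example:

```python
from itertools import product


def carry(x, y, cin):
    """Carry out of a full adder."""
    return int(x + y + cin >= 2)


# One memory bit per input combination, in the order 000, 001, ..., 111.
carry_table = [carry(x, y, cin) for x, y, cin in product((0, 1), repeat=3)]
print("".join(str(bit) for bit in carry_table))  # 00010111


def lut3(memory_bits, x, y, cin):
    """Three-input lookup: the inputs select one of eight stored bits."""
    return memory_bits[4 * x + 2 * y + cin]


print(len(carry_table) <= 2 ** 4)  # True: 8 bits fit in a 16-bit table
```

One lookup replaces five 2-input gates and the wiring between them.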
Intellectual Property (IP) Cores
Designs for sub-systems. Sold to chip designers. Exist as intellectual property. Makes FPGA design very easy. Also available for ASICs.
Two types:
Commonly used sub-systems (e.g. multipliers, FIFOs, …)
Complicated and high value sub-systems (e.g. video decoder, network interface)
Same business model as software. Individual can set up small company to produce cores. Very low start-up costs. No manufacturing costs. Upgrades and bug-fixes can be downloaded across the Web. (But piracy is a problem.)
Protection of IP
[Figure: the configured FPGA, with an encrypted bitstream (shown as ?????) passing through a Decrypt block as it enters the chip.]
Can we encrypt this bitstream for distribution, and then decrypt it as it enters the chip?
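The encrypt-then-decrypt idea can be illustrated with a toy example. This is only a sketch of the principle, not a secure design and not how any particular FPGA vendor does it: the keystream construction (SHA-256 over key plus counter), the key value and the function names are all assumptions made for the illustration.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: hash the key with a running counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


# Hypothetical bitstream and key; the key would be stored inside the chip.
bitstream = b"000100010001011101110000"
chip_key = b"example key held on chip"

encrypted = xor_with_keystream(bitstream, chip_key)  # what gets distributed
decrypted = xor_with_keystream(encrypted, chip_key)  # done as it enters the chip
print(decrypted == bitstream)  # True
```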
Evaluation of Hardware Technologies
ASICs are very high performance, but very costly to prototype.
Mask set costs £500,000. If I sell 1,000 chips, cost price is £500 each. If I sell 1,000,000 chips, cost price is 50p each.
ASICs are only suitable for huge production runs: consumer mass market.
ASICs cannot be bug-fixed or upgraded.
PLDs are very cheap and quick, but slow and inflexible.
FPGAs are very good, but quite expensive.
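The mask set arithmetic can be checked with a one-line function. This sketch spreads only the mask set cost over the production run and ignores the cost of manufacturing each chip, as the figures above do; the function name is illustrative.

```python
def mask_cost_per_chip(mask_set_cost_pounds, chips_sold):
    """Share of the one-off mask set cost carried by each chip sold."""
    return mask_set_cost_pounds / chips_sold


print(mask_cost_per_chip(500_000, 1_000))      # 500.0 pounds each
print(mask_cost_per_chip(500_000, 1_000_000))  # 0.5 pounds, i.e. 50p each
```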