System LSI and Architecture Technology (Part III: Inter-Chip Connection Technology) ·...
![Page 1: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/1.jpg)
Reconfigurable Architectures
AMANO, Hideharu
hunga@am.ics.keio.ac.jp
![Page 2: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/2.jpg)
Reconfigurable System
(Custom Computing Machine)
A target algorithm is executed directly in hardware on SRAM-based FPGAs/PLDs.
Offers the high performance of special-purpose machines.
Offers the high degree of flexibility of general-purpose machines.
The execution mechanism is completely different from that of stored-program computers.
![Page 3: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/3.jpg)
PLD(Programmable Logic Device)
An integrated circuit whose logic function can be defined by the user.
Standard IC, ASIC (Application Specific IC)
SPLD (Simple PLD) / PLA (Programmable Logic Array): small-scale IC with an AND-OR array
CPLD (Complex PLD): middle-scale IC with an AND-OR array
FPGA (Field Programmable Gate Array): large-scale IC with LUTs
Caution! These terms are not well defined!
![Page 4: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/4.jpg)
Rapid development of PLDs
[Figure: gate count (10K to 10M) versus year (1980 to 2000). Device families shown: fuse PLA, EEPROM SPLD, CPLD, SRAM FPGA, anti-fuse FPGA. Later trends: hierarchical structure, embedded cores, low voltage.]
Increasing performance, 1991-2000:
Number of gates: x45
Speed: x12
Cost: 1/100
![Page 5: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/5.jpg)
SPLD(Simple PLD:
AND-OR/Product-term)
[Figure: NOT, AND, and OR planes]
Arbitrary logic is realized by changing the AND-OR connections.
![Page 6: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/6.jpg)
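AND/OR connection example
[Figure: inputs A, B, C, D pass through the NOT and AND planes to form the product terms A & B and C & D; the OR plane combines them into A&B | C&D.]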
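The connection example above can be sketched in software. This is a minimal illustration, not a model of any particular device: the dictionary encoding of product terms and the names `PRODUCT_TERMS` and `and_or_array` are assumptions made for this example.

```python
from itertools import product

# Hypothetical encoding: each product term maps an input name to the value it
# must have (True = used as-is, False = taken through the NOT plane).
PRODUCT_TERMS = [
    {"A": True, "B": True},  # A & B
    {"C": True, "D": True},  # C & D
]


def and_or_array(inputs, terms=PRODUCT_TERMS):
    """OR together all product terms; each term ANDs its selected literals."""
    return any(
        all(inputs[name] == wanted for name, wanted in term.items())
        for term in terms
    )


if __name__ == "__main__":
    # Print the full truth table of A&B | C&D.
    for a, b, c, d in product([False, True], repeat=4):
        z = and_or_array({"A": a, "B": b, "C": c, "D": d})
        print(int(a), int(b), int(c), int(d), "->", int(z))
```

Changing the entries of `PRODUCT_TERMS` plays the role of reprogramming the AND-OR connections.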
![Page 7: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/7.jpg)
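LUT: Look Up Table
The address of a ROM/RAM serves as the logic input, and the stored data as the output.
A simple ROM/RAM can be used as random logic.

| ABC | Z |
|-----|---|
| 000 | 0 |
| 001 | 0 |
| 010 | 0 |
| 011 | 1 |
| 100 | 0 |
| 101 | 0 |
| 110 | 0 |
| 111 | 1 |

A combination of memory and multiplexers is commonly used.
[Figure: the same Z column stored in memory cells and selected by multiplexers controlled by C, B, and A.]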
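As a rough sketch of the two views above, the same 3-input table can be read either as a memory indexed by the address ABC or through a tree of 2-to-1 multiplexers over the stored bits. The function names and the order in which the select signals are applied are assumptions for illustration only.

```python
# Memory contents for the table on the slide, indexed by address ABC
# (A is the most significant bit): Z = 1 only for ABC = 011 and 111.
LUT_CONTENTS = [0, 0, 0, 1, 0, 0, 0, 1]


def lut3(a, b, c, contents=LUT_CONTENTS):
    """ROM/RAM view: the inputs form the address, the stored bit is the output."""
    address = (a << 2) | (b << 1) | c
    return contents[address]


def mux2(sel, in0, in1):
    """2-to-1 multiplexer."""
    return in1 if sel else in0


def lut3_mux_tree(a, b, c, contents=LUT_CONTENTS):
    """Memory-plus-multiplexer view: select one of the 8 stored bits."""
    level1 = [mux2(c, contents[i], contents[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux2(b, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux2(a, level2[0], level2[1])


if __name__ == "__main__":
    for address in range(8):
        a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
        print(a, b, c, "->", lut3(a, b, c), lut3_mux_tree(a, b, c))
```

Rewriting `LUT_CONTENTS` changes the logic function without changing the structure, which is the point of the LUT approach.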
![Page 8: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/8.jpg)
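An example using LUT: Look Up Table

| ABC | Z |
|-----|---|
| 000 | 0 |
| 001 | 0 |
| 010 | 0 |
| 011 | 1 |
| 100 | 0 |
| 101 | 0 |
| 110 | 0 |
| 111 | 1 |

[Figure: the multiplexers select one stored bit. With C = 1, B = 1, A = 0 (ABC = 011), the output is Z = 1.]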
![Page 9: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/9.jpg)
AND-OR array vs. LUT
AND-OR array(product-term)
Efficient for logic with multiple outputs
Some logic functions cannot be realized within the available product terms.
Suitable for EEPROM and Flash-ROM
LUT
Any logic can be realized.
Efficient for logic with a single output
Suitable for Flash-ROM, Anti-fuse, and SRAM.
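One well-known case that is awkward for a product-term array is parity (n-input XOR): no two of its true input patterns differ in only one input, so none can be merged, and the number of product terms grows as 2^(n-1). The sketch below checks this by enumeration and compares it with the 2^n bits a LUT needs. It is an illustration chosen for this note, not an example taken from the slides, and the function names are hypothetical.

```python
from itertools import product


def parity_minterms(n):
    """All n-bit input patterns for which the XOR of the inputs is 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]


def hamming(u, v):
    """Number of positions in which two patterns differ."""
    return sum(x != y for x, y in zip(u, v))


def product_terms_needed_for_parity(n):
    terms = parity_minterms(n)
    # No two true patterns are adjacent, so each needs its own product term.
    assert all(
        hamming(u, v) >= 2 for i, u in enumerate(terms) for v in terms[i + 1:]
    )
    return len(terms)


def lut_bits(n):
    """An n-input LUT stores one bit per input pattern, whatever the function."""
    return 2 ** n


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, "inputs:", product_terms_needed_for_parity(n),
              "product terms vs", lut_bits(n), "LUT bits")
```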
![Page 10: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/10.jpg)
Sequential circuits
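[Figure: an AND-OR array or LUT whose outputs feed D flip-flops (D, Q); the flip-flop outputs drive the circuit outputs and are fed back to the inputs.]
A sequential circuit (state machine) can be built by attaching flip-flops and feedback loops.
[Figure: output module. A D flip-flop is fed from the AND/OR array; Q drives the output and a feedback path.]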
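The structure above (combinational logic plus flip-flops plus feedback) can be sketched as a 2-bit counter with an enable input. The counter itself is an assumed example; `next_state_logic` stands in for whatever the AND-OR array or LUT would hold.

```python
class DFlipFlop:
    """Stores one bit; the stored value changes only when clock() is called."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d


def next_state_logic(q1, q0, enable):
    """Combinational part: computes the D inputs from the fed-back state."""
    d0 = q0 ^ enable
    d1 = q1 ^ (q0 & enable)
    return d1, d0


class Counter2:
    """2-bit counter built from the logic above and two flip-flops."""

    def __init__(self):
        self.ff1 = DFlipFlop()
        self.ff0 = DFlipFlop()

    def tick(self, enable):
        # Feedback: the current Q outputs are inputs to the logic.
        d1, d0 = next_state_logic(self.ff1.q, self.ff0.q, enable)
        self.ff1.clock(d1)
        self.ff0.clock(d0)
        return (self.ff1.q << 1) | self.ff0.q


if __name__ == "__main__":
    counter = Counter2()
    print([counter.tick(1) for _ in range(6)])
```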
![Page 11: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/11.jpg)
CPLD (Complex PLD)
[Figure: a 2-dimensional array (matrix) of SPLDs connected through a programmable switch. Example: Altera’s MAX]
![Page 12: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/12.jpg)
FPGA(Field Programmable Gate Array)
[Figure: island-style array of Configurable Logic Blocks (each with a LUT and F.F.), Connection Blocks, and Switch Blocks, with IOBs around them.]
The LUT contents and the interconnections are determined by configuration data.
![Page 13: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/13.jpg)
Device for flexibility(1)
Anti-fuse type
Programmed by destroying the insulation with high voltage
High speed, but one-time programmable
ACTEL, QuickLogic
EEPROM / Flash-ROM
Switches for connections are realized by floating gates.
Re-programmable
Lattice, Altera’s MAX series
![Page 14: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/14.jpg)
Device for flexibility(2)
SRAM
Data in SRAM represents the look-up tables and wire connections.
ISP (In-System Programming) is available.
The configuration data is lost when the power is turned off.
Suitable for large-scale FPGAs; this type has advanced rapidly in recent years.
Xilinx XC, Altera FLEX, Lucent ORCA
The advanced series: Xilinx Virtex, Altera Stratix
Others
Magnetic memory
DRAM
![Page 15: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/15.jpg)
Architectures and devices
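[Figure: architectures (SPLD, CPLD, FPGA) versus programming devices (anti-fuse, EEPROM, Flash-ROM, SRAM)]
Anti-fuse: high speed, middle size, one-time programmable (ACTEL, QuickLogic)
SRAM: large scale, rapid development (Xilinx, Altera)
EEPROM / Flash-ROM: high speed, small/middle size, re-programmable, predictable delay (Lattice, Altera, Xilinx)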
![Page 16: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/16.jpg)
Recent PLDs
High-end: large-scale chips with hierarchical structure
Xilinx’s Virtex II, Virtex-4/LX, Altera’s Stratix-3
System on Programmable Device
Provides DLL, CPU, DSP, ROM, RAM, multipliers, high-speed links, and other hard IPs.
Xilinx’s Virtex-4/EX, FX, Altera’s Stratix-3
Low cost, specialized for mass production: Xilinx’s Spartan, Altera’s Cyclone
Low voltage, multiple voltages, and low power consumption
![Page 17: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/17.jpg)
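Process and parameters (Xilinx co.)

| Process | Product | Name | LUTs (slices where noted) | Supply voltage |
|---------|---------|------|---------------------------|----------------|
| 350nm | XC4000 | XC4085KLA | 7448 | 3.3V |
| 250nm | XC4000 | XC40250KV | 20102 | 2.5V |
| 220nm | Virtex | XCV1000 | 27648 | 2.5V |
| 180nm | Virtex-E | XCV2000E | 43200 | 1.8V |
| 150nm | Virtex-II | XC2V8000 | 104882 | 1.5V |
| 130nm | Virtex-II Pro | XC2VP125 | 125136 | 1.5V |
| 90nm | Virtex-4 | XC4VLX200 | 200488 | 1.2V |
| 65nm | Virtex-5 | XC5VLX330 | 51840 slices | 1.0V |
| 40nm | Virtex-6 | XC6VLX760 | 118560 slices | 1.0V |
| 28nm | Virtex-7 | XC7VX1140T | 1139200 slices | 0.9V |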
![Page 18: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/18.jpg)
Xilinx Virtex II
[Figure: chip layout with programmable IOs (IOB), configurable logic, DCM, RAM, multipliers, and global clock MUX.]
Slice: 2 x (LUT, carry, D flip-flop)
Slice x 2 → CLB (Configurable Logic Block)
100000 CLBs
3Mbit
![Page 19: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/19.jpg)
Slice structure of Virtex-6
[Figure: a slice contains four identical elements, each with a LUT (6-input x 1 or 5-input x 2), carry logic, flip-flops (FF), and multiplexers (MUX). Source: Virtex-6 manual]
![Page 20: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/20.jpg)
Virtex-6 CLBs
[Figure: each CLB holds a pair of slices, labelled by X/Y coordinates such as X0Y0 and X1Y0; every slice has a carry input (CIN) and a carry output (COUT). Source: Virtex-6 manual]
![Page 21: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/21.jpg)
Altera Stratix II
[Figure: chip layout with Mega RAM blocks, M4K RAM blocks, M512 RAM blocks, PLLs, DSP blocks, and hierarchical interconnect.]
LAB: Logic Array Block, consisting of 10 LEs (4-input LUT and F.F.)
![Page 22: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/22.jpg)
Stratix-IV ALM Structure
[Figure: two 6-input LUTs, two adders (adder1, adder2), data MUXes, and flip-flops, linked by the shared_arith, carry, and reg_carry chains. Source: Stratix-IV manual]
Available LUT modes:
4-in LUT X 2
5-in LUT + 3-in LUT
5-in LUT + 4-in LUT, 1 input shared
5-in LUT + 5-in LUT, 2 inputs shared
6-in LUT
6-in LUT + 6-in LUT, 4 inputs shared
![Page 23: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/23.jpg)
Stratix-IV LAB structure
[Figure: ALMs connected by the LAB local interconnect and the MLAB local interconnect.]
From the Stratix-IV manual
![Page 24: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/24.jpg)
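Technologies vs. Product
[Figure: product families plotted against process node (90nm, 65nm, 60nm, 45nm, 40nm, 28nm) in three tiers: high-end, middle range, low-cost.]

Generations up to 40nm:

| Product | Capacity |
|---------|----------|
| Virtex-4 LX/FX/SX | 200000 LC |
| Virtex-5 LX/LXT/SXT/FXT/TXT | 330000 LC |
| Virtex-6 LXT/SXT/HXT/CXT | 760000 LC |
| Spartan-3A N/DSP | 53000 LC |
| Spartan-6 LX/LXT | 150000 LC |
| Stratix-II /GX | 179400 LE |
| Stratix-III /L/E | 338000 LE |
| Stratix-IV /E/GX/GT | 531200 LE |
| Cyclone II | 68416 LE |
| Cyclone III /LS | 119088 LE |
| Cyclone IV /E/GX | 149760 LE |

28nm:

| Product | Capacity |
|---------|----------|
| Virtex-7 T/XT/HT | 2000000 LC |
| Kintex-7 | 480000 LC |
| Artix-7 | 360000 LC |
| Stratix-V /E/GX/GS/GT | 359200 ALM |
| Arria, Arria-II, Arria-IV | 174000 LE |
| Cyclone V /E/GX/GS/GT | 301000 LE |

X1.5-X2.5/generation
High-end/Low-cost: X3-X5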
![Page 25: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/25.jpg)
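Technology vs. Products (Cont.)
[Figure: product families plotted against process node (28nm, 20nm, 16nm, 10nm).]

28nm:

| Product | Capacity |
|---------|----------|
| Virtex-7 | 2000000 LC |
| Kintex-7 | 480000 LC |
| Artix-7 | 360000 LC |
| Stratix-V /E/GX/GS/GT | 359200 ALM |
| Arria-IV | 174000 LE |
| Cyclone V /E/GX/GS/GT | 301000 LE |

Later generations:

| Product | Capacity / features |
|---------|---------------------|
| Virtex-Ultrascale | 5541000 LC |
| Virtex-Ultrascale+ | 3780000 LC |
| Kintex-Ultrascale | 1451000 LC |
| Kintex-Ultrascale+ | 1143000 LC |
| Arria-10 | ARM+FPU |
| Stratix-10 | ARM+FPU |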
![Page 26: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/26.jpg)
Partial Reconfiguration
Part of the configuration of an FPGA can be replaced during operation.
Efficient use of FPGA area
Virtex-IIPro → Column by column
Virtex-4 → Rectangular regions
Virtex-6 → A Dynamic Reconfiguration Port (DRP) is provided.
Partial reconfiguration is controlled by logic inside the FPGA.
Operating systems for FPGAs have been developed.
![Page 27: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/27.jpg)
Partial Reconfiguration
[Figure: a PRM placed within reconfiguration frames, bounded by a clock region boundary.]
A PRM (Partial Reconfigurable Module) is placed on a fixed frame.
The problem is the interconnection between the partially reconfigurable module and the static part.
![Page 28: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/28.jpg)
Partial Reconfiguration region
The case of Altera(Intel)’s Stratix V
![Page 29: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/29.jpg)
Replacing one accelerator with another
The main switching fabric keeps working without stopping.
![Page 30: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/30.jpg)
Zynq All Programmable SoC
ARM Cortex-A9 dual-core CPU (PS part) +
28nm Artix-7/Kintex-7 based FPGA (PL part)
![Page 31: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/31.jpg)
Zynq-7000 All Programmable SoC
[Figure: block diagram]
MIO
Processing System (PS)
APU: 2 x ARM Cortex-A9, each with 32/32 KB I/D cache; Snoop Control Unit; 512Kbyte L2 cache; 256Kbyte on-chip memory
DRAM controller
Flash controller (NOR, NAND, SRAM, QSPI)
Peripherals: 2 SPI, 2 SD/SDIO, 2 GigE, 2 USB, GPIO, 2 I2C, 2 CAN, 2 UART
AXI Interconnect
Programmable Logic (PL)
Connected to the PS through general-purpose AXI ports, AXI ACP, and high-performance AXI ports
![Page 32: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/32.jpg)
ZedBoard
Zynq-7000 All Programmable SoC
XC7Z020-CLG484-1
DDR3 512MB
SD card with Linaro Ubuntu installed
Ethernet
JTAG
![Page 33: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/33.jpg)
2.5D FPGA (http://www.xilix.com)
![Page 34: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/34.jpg)
QuickLogic
Lattice GAL
![Page 35: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/35.jpg)
Altera FLEX10K
![Page 36: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/36.jpg)
QuickLogic
Xilinx Virtex
![Page 37: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/37.jpg)
Design of PLDs
Mostly designed with common HDLs (Verilog-HDL, VHDL)
C-level entry has come into use recently: Impulse-C, Vivado (Xilinx), OpenCL (Altera)
Synthesis, optimization, and place and route are done automatically by vendors’ tools.
Tools from various vendors have recently been integrated and combined.
For large circuits, a long time is required, especially for place and route.
Using IPs, clock/DLL adjustment is done manually.
Optimization techniques differ between vendors and products.
![Page 38: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/38.jpg)
Reconfigurable System
(Custom Computing Machine)
A target algorithm is executed directly in hardware on SRAM-based FPGAs/PLDs.
Offers the high performance of special-purpose machines.
Offers the high degree of flexibility of general-purpose machines.
The execution mechanism is completely different from that of stored-program computers.
![Page 39: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/39.jpg)
[Figure: flexibility versus performance. A CPU running software (for example the loop "for i=0; i<K; i++ X[i]=X[i+j]") gives high flexibility; ASICs (Design A to Design D) give high performance; FPGAs give both high performance and flexibility.]
Reconfigurable Systems
![Page 40: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/40.jpg)
How is performance enhanced?
Hardware execution itself enhances performance by removing:
The overhead of software execution (instruction fetch, data loads to registers, etc.)
The overhead of using fixed-size data
The overhead of using only two-way branches
However, these benefits are not so large, because embedded CPUs and DSPs are highly optimized.
The key to performance improvement is parallel processing.
![Page 41: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/41.jpg)
Parallel processing in reconfigurable
systems
Various techniques can be used
SIMD execution
Pipelined structure
Systolic algorithm
Data driven control
Parallel execution other than calculation
Parallel data access using internal memory units
Parallel data transfer including I/O accesses
![Page 42: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/42.jpg)
SIMD (Single Instruction-stream / Multiple Data-stream)-like calculation
[Figure: stream data in → several processing parts, each with an internal memory module → stream data out]
The same instruction is applied to different data streams.
In reconfigurable systems, the operations are not required to be the same
(SIMD-like calculation).
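The difference between SIMD and SIMD-like calculation can be sketched as follows. This is a sequential software model only (real hardware lanes would run at the same time), and the names `simd_like`, `scale`, and `offset` are hypothetical.

```python
def simd_like(streams, operations):
    """Apply operations[i] to every element of streams[i].

    Each lane is independent, so in hardware the lanes could run in parallel.
    """
    if len(streams) != len(operations):
        raise ValueError("one operation per stream is required")
    return [[op(x) for x in stream] for stream, op in zip(streams, operations)]


def scale(x):
    return 2 * x


def offset(x):
    return x + 10


# Pure SIMD: every lane performs the same operation.
same_op = simd_like([[1, 2, 3], [4, 5, 6]], [scale, scale])

# SIMD-like: each lane may be configured with its own operation.
mixed_ops = simd_like([[1, 2, 3], [4, 5, 6]], [scale, offset])

if __name__ == "__main__":
    print(same_op)
    print(mixed_ops)
```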
![Page 43: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/43.jpg)
Pipelined structure
[Figure: Stream Data 1 to Stream Data 5 flow in succession through a chain of processing parts, each with an internal memory module.]
The stream is divided and inserted periodically.
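A cycle-by-cycle sketch of such a pipeline is shown below, assuming one register after each stage and one new data item inserted per cycle. The function `run_pipeline` and the three example stages are assumptions for illustration; `None` is reserved to mark an empty stage, so the stream itself must not contain `None`.

```python
def run_pipeline(stages, stream):
    """Simulate a linear pipeline; returns (outputs, cycles used).

    regs[i] holds the result of stage i for the item currently in that stage,
    or None when the stage is empty.
    """
    regs = [None] * len(stages)
    pending = list(stream)
    outputs = []
    cycles = 0
    while pending or any(r is not None for r in regs[:-1]):
        # Advance from the last stage backwards so each item moves one stage.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](pending.pop(0)) if pending else None
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    results, cycles = run_pipeline(stages, [1, 2, 3, 4, 5])
    print(results, "in", cycles, "cycles")
```

In this model, N items through S stages finish in N + S - 1 cycles rather than N x S, because all stages work on different items at once.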
![Page 44: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/44.jpg)
Systolic Algorithm
[Figure: data streams x and y enter a computational array]
Data streams x and y are inserted at a certain interval.
When two streams meet each other, a calculation is executed.
→ Systolic: like the beat of a heart
![Page 45: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/45.jpg)
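Band matrix multiply y=Ax

$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 \\
0 & a_{32} & a_{33} & a_{34} \\
0 & 0 & a_{43} & a_{44}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
$$

Each cell (X+) receives a, x, and y_i and outputs

$$
y_o = a x + y_i
$$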
![Page 46: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/46.jpg)
Band matrix multiply y=Ax
[Figure: the multiply-add cell (X+) with x1 and a11 entering, followed by a12 a21, a22, and a23 a32; the band matrix A is shown for reference.]
![Page 47: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/47.jpg)
Band matrix multiply y=Ax
[Figure: multiply-add cells (X+) with x1 and x2 in the array; a12 a21, a22, a23 a32, and a33 follow.]
y1=a11 x1
![Page 48: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/48.jpg)
Band matrix multiply y=Ax
[Figure: multiply-add cells (X+) with x1, x2, and x3 in the array; a22, a23 a32, a33, and a34 a43 follow.]
y1=a11 x1+a12 x2
y2=a21 x1
![Page 49: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/49.jpg)
Band matrix multiply y=Ax
[Figure: multiply-add cells (X+) with x2 and x3 in the array; a23 a32, a33, a34 a43, and a44 follow.]
y2=a21 x1+a22 x2
![Page 50: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/50.jpg)
Band matrix multiply y=Ax
[Figure: multiply-add cells (X+) with x2 and x3 in the array; a33, a34 a43, and a44 follow.]
y2=a21 x1+a22 x2+a23 x3
y3=a32 x2
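The arithmetic that the cells accumulate over these steps can be checked with a short sketch. It applies the cell operation yo = a·x + yi along each row of the band only; it is a sequential check of the result, not a cycle-accurate model of the systolic timing. The function names and the numeric values are assumptions for illustration.

```python
def mac_cell(a, x, y_in):
    """The cell operation from the slides: yo = a*x + yi."""
    return a * x + y_in


def band_matvec(A, x, lower=1, upper=1):
    """Compute y = A x, touching only entries inside the band."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(max(0, i - lower), min(n, i + upper + 1)):
            y[i] = mac_cell(A[i][j], x[j], y[i])
    return y


# Example values placed in the same band pattern as a11 ... a44 on the slides.
A = [
    [1, 2, 0, 0],
    [3, 4, 5, 0],
    [0, 6, 7, 8],
    [0, 0, 9, 10],
]
x = [1, 2, 3, 4]

if __name__ == "__main__":
    print(band_matvec(A, x))
```

Only 10 of the 16 matrix entries are ever multiplied, which is why a small array of multiply-add cells is enough for a band matrix.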
![Page 51: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/51.jpg)
Data flow algorithm
[Figure: data flow graph with two adders (+) and two multipliers (x) and inputs a, b, c, d, e, computing (a+b)x(c+(dxe))]
A process is activated when its tokens (data) become available.
The overhead of synchronization is large.
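Token-driven activation can be sketched for the expression (a+b)x(c+(dxe)). The graph encoding, the node names, and the function `run_dataflow` are assumptions for this illustration: a node fires as soon as all of its input tokens exist, regardless of the order in which the nodes are listed.

```python
import operator

# Each node: (operation, names of the tokens it waits for).
GRAPH = {
    "result": (operator.mul, ("sum_ab", "sum_c_de")),
    "sum_c_de": (operator.add, ("c", "prod_de")),
    "sum_ab": (operator.add, ("a", "b")),
    "prod_de": (operator.mul, ("d", "e")),
}


def run_dataflow(graph, initial_tokens):
    """Fire every node whose input tokens are available until nothing fires."""
    tokens = dict(initial_tokens)
    firing_order = []
    fired = True
    while fired:
        fired = False
        for node, (op, inputs) in graph.items():
            if node not in tokens and all(name in tokens for name in inputs):
                tokens[node] = op(*(tokens[name] for name in inputs))
                firing_order.append(node)
                fired = True
    return tokens, firing_order


if __name__ == "__main__":
    tokens, order = run_dataflow(GRAPH, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
    print(tokens["result"], order)
```

The repeated availability check in the loop is a software stand-in for the synchronization that the slide identifies as the main overhead.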
![Page 52: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/52.jpg)
Data flow analysis and hardware generation
[Figure: Data Flow Language → Data Flow Graph → Graph Decomposition → HDL Description → Configuration Data]
Suitable for automatic generation of hardware
![Page 53: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/53.jpg)
Applications
No flexible program change
No IEEE standard floating point
Not memory bounded
Image processing, analysis, pattern matching,
Logic simulation, Fault simulation.
Neural network simulation.
Encryption /Decryption
Queuing models, Markov analysis
Electric power flow
Sensor processing
Efficient use of on-the-fly processing
Communication control, protocol control
Software radio
![Page 54: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/54.jpg)
Large Scale Reconfigurable Platforms
[Figure: microprocessors (μP) and reconfigurable units (RU) attached to an interconnection / shared memory, in three organizations]
Hetero nodes using homo cores (nodes of μPs and nodes of RUs): SRC6, SGI RASC
Homo nodes using hetero cores (each node combines a μP and an RU): Cray XD-1, XT4 (XR-1)
Stand-alone (RUs only, with a host μP): SPLASH, RASH, BEE2
![Page 55: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/55.jpg)
Splash-2 (Arnold et al., 1992)
String matching, image processing, DNA matching
330 times faster than the Cray-II supercomputer
Systolic algorithm
VHDL, Parallel C
Annapolis MicroSystems (WILDFIRE)
![Page 56: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/56.jpg)
CRAY-XD1
• AMD Opteron
• One board consists of 2 CPUs + FPGA (Virtex II Pro)
• One rack holds 6 boards
• A high-speed network called Rapid Array is used
• FPGAs can be interconnected with Rocket I/O
![Page 57: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/57.jpg)
SGI RASC
• Accelerator for SGI’s NUMA Altix
• Virtex II XC2V6000 and another Virtex for control
• Directly connected to the controller with NUMAlink4
![Page 58: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/58.jpg)
Microsoft’s Catapult
[Figure: eight CPU + FPGA pairs; the FPGAs form the pipeline FE → FFE0 → FFE1 → Compress → MLS0 → MLS1 → MLS2]
Rank computation for Web search on Bing.
Task-level macro-pipelining (MISD)
FE: Feature Extraction
FFE: Free Form Expression (synthesis of feature values)
MLS: Machine Learning Scoring
A 2-dimensional mesh (8x6) is formed for one cluster.
FPGA: Altera’s Stratix V
![Page 59: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/59.jpg)
Recent trend of reconfigurable platforms
A sufficient amount of logic can be mounted on a single chip.
The ML605 board (Virtex-6) or other boards can be a good platform for reconfigurable computing.
Combination with an embedded ARM core.
Zynq (Xilinx), Arria (Altera)
Large platforms have been developed.
BEE3 (Berkeley), etc.
Maxeler Technology’s success in business.
Targets: oil, gas, financial analytics
Selling solutions that use FPGAs
![Page 60: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/60.jpg)
Dynamically Reconfigurable Processors
Coarse Grained Reconfigurable Array (CGRA)
Parallel processing using a lot of PEs
Dedicated for stream processing
Distributed memory
High speed dynamic reconfiguration
Multicontext
Multicast/Broadcast of configuration data inside the chip
On-line Configuration
C-based design
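The multicontext idea can be sketched as a PE that stores several configurations and switches between them by changing an index instead of loading new configuration data. The class `MultiContextPE` and its four contexts are hypothetical and do not model any specific product.

```python
import operator


class MultiContextPE:
    """A processing element holding several stored configurations (contexts)."""

    def __init__(self, contexts):
        self.contexts = list(contexts)  # one operation per context
        self.active = 0

    def switch(self, index):
        """Dynamic reconfiguration: only the active context index changes."""
        if not 0 <= index < len(self.contexts):
            raise IndexError("no such context")
        self.active = index

    def execute(self, a, b):
        return self.contexts[self.active](a, b)


if __name__ == "__main__":
    pe = MultiContextPE([operator.add, operator.sub, operator.mul, max])
    for context in range(4):
        pe.switch(context)
        print(context, pe.execute(6, 3))
```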
![Page 61: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/61.jpg)
Short history of Dynamically Reconfigurable Processors
[Figure: timeline from 1990 to 2005, divided into the 1st generation and the 2nd generation (a lot of commercial systems), with two lineages: FPGAs with dynamic reconfiguration, and processors with reconfigurable instructions.]
Systems shown: MPLD (Fujitsu), WASMII (Keio), Time Multiplexed FPGA (Xilinx), DRL (NEC), DISC (Brigham Young Univ.), GARP (UCB), CHIMAERA (Northwestern Univ.), PipeRench (CMU), Xpp (PACT), CS2112 (Chameleon), DRP (NEC elec.), DAPDNA/2 (IPFlex), DFabric (Elixcent), Kilocore (Rapport), X-bridge (NEC ele.), DAPDNA/IMX (IPFlex), S-5 (Stretch), S-6 (Stretch), FE-GA (Hitachi)
![Page 62: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/62.jpg)
Coarse Grain Structure of PE
[Figure: PE structures of the Chameleon CS2112 and Kress Array II. Labelled parts: instruction, routing MUXes, register & mask units, OP, registers, barrel shifter.]
![Page 63: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/63.jpg)
An example of PE array
[Figure: MuCCRA-1 (ASSCC2007). A grid of PEs, each paired with an SE, together with MEM modules (also with SEs) and FUNC units.]
![Page 64: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/64.jpg)
| Product | Vendor | Context | Data width (bits) | PE |
|---------|--------|---------|-------------------|----|
| D-Fabric | Panasonic | Deliver | 4 | Homo |
| Xpp | PACT | Deliver | 24 | Homo |
| S5/S6 engine | Stretch | Deliver | 4/8 | Hetero |
| CS2112 | Chameleon | Multi-C(8) | 16/32 | Homo |
| DAPDNA-2 | IPFlex | Multi-C(4) | 32 | Hetero |
| DRP-1 | NEC electronics | Multi-C(16) | 8 | Homo |
| STP-engine | NEC electronics | Multi-C(32) | 8 | Homo |
| Kilocore | Rapport | Multi-C | 8 | Homo |
| ADRES | IMEC | Multi-C(32) | 16 | Homo |
| FE-GA | Hitachi | Multi-C | 16 | Hetero |
| For Car-tuners | SANYO | Multi-C(4) | 24 | Homo |
| FlexSword(SAKE) | Toshiba | Multi-C(4/16) | 16 | Homo |
| Cluster | Fujitsu | Multi-C | 16 | Hetero |

Most Japanese semiconductor companies have their own projects! (2009 ASP-DAC Panel)
![Page 65: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/65.jpg)
| Product | Vendor | Context | Data width (bits) | PE |
|---------|--------|---------|-------------------|----|
| D-Fabric | Panasonic | Deliver | 4 | Homo |
| Xpp | PACT | Deliver | 24 | Homo |
| S5/S6 engine | Stretch | Deliver | 4/8 | Hetero |
| CS2112 | Chameleon | Multi-C(8) | 16/32 | Homo |
| DAPDNA-2 | IPFlex → Tokyo Keiki | Multi-C(4) | 32 | Hetero |
| DRP-1 | NEC electronics | Multi-C(16) | 8 | Homo |
| STP-engine(DRP-2) | Renesas electronics | Multi-C(32) | 8 | Homo |
| Kilocore | Rapport | Multi-C | 8 | Homo |
| ADRES/SRP | IMEC → used in Samsung’s smart phones | Multi-C(32) | 16 | Homo |
| FE-GA | Hitachi | Multi-C | 16 | Hetero |
| For Car-tuners | SANYO | Multi-C(4) | 24 | Homo |
| FlexSword(SAKE) | Toshiba | Multi-C(4/16) | 16 | Homo |
| Cluster | Fujitsu | Multi-C | 16 | Hetero |

Most of them have now disappeared.
Shonan Meeting [2012]
![Page 66: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/66.jpg)
SRP (Samsung Reconfigurable Processor)
Samsung announced that it will be used widely in their
smart phones (ICFPT2012, Seoul)
Based on IMEC’s ADRES
High-performance architecture with a 400MHz-1GHz clock
Can also be used as a VLIW processor
PEs are configurable for each application
Sophisticated design tools
![Page 67: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/67.jpg)
Base of Samsung Reconfigurable Processor
(IMEC ADRES)
[Figure: ADRES block diagram. Labels: Reconfigurable Array View (a grid of FU+RF cells), VLIW View (a row of FUs with a shared RF), Instruction Fetch, Instruction Dispatch, Instruction Decode, Data Cache.]
![Page 68: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/68.jpg)
STP engine: Renesas electronics
[Figure: chip block diagram. Labels: MIPS CPU with I-cache, D-cache and JTAG; UART ×2; CSIG; PIO; INTC; General Port (8b×4); DMA; SPL; Work RAM (1kB); PCIexp HB/EP (1-lane) ×2; Periph I/F; 10/100 Ether MAC; PCI Host/Target; DDR2 SDRAM CTR; STP Engine; Nconnect; 64bit on-chip bus (266MHz); 64bit Memory Switch (266MHz).]
Dynamically Reconfigurable Core: 512 PEs (8bit), 32 contexts
Provides the virtual hardware mechanism
The DMA controller hides the communication overhead
From an invited talk at Design Gaia 2008
![Page 69: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/69.jpg)
DRP (Dynamically Reconfigurable Processor)
The core of the STP engine
[Figure: the DRP core, with one Tile labeled.]
![Page 70: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/70.jpg)
DRP Tile and PE structure
[Figure: one tile, consisting of an 8 × 8 array of PEs, a State Transition Controller, VMEMs with their VMEM controllers, and HMEMs.]
VMEM (2-port memory): 8bit × 256 entries
HMEM (1-port memory): 8bit × 8192 entries
![Page 71: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/71.jpg)
Context control for DRP
[Figure: numbered blocks 0 to 5 between Data input and Data output, annotated with the three items below.]
1. Context switching
2. Parallel processing in a context
3. Serial execution in a context
Description in BDL
The DRP compiler controls this 3-dimensional assignment
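The three dimensions named on this slide can be sketched in a few lines of Python. This is only an illustrative model, not BDL and not the DRP compiler: the class `MultiContextArray`, the helper `run`, the `next_context` table and the two example operations are all hypothetical names introduced here. A context is modeled as a set of lanes that run in parallel, each lane is a chain of operations executed serially, and a transition table plays the role of the state transition controller that switches contexts every cycle. Values are masked to 8 bits to mimic 8bit PEs.

```python
from typing import Callable, Dict, List, Tuple

Op = Callable[[int], int]
Lane = List[Op]        # operations chained serially inside one context
Context = List[Lane]   # lanes that work in parallel inside one context


class MultiContextArray:
    """Toy model of a multi-context array (hypothetical, for illustration)."""

    def __init__(self, contexts: List[Context], next_context: Dict[int, int]) -> None:
        self.contexts = contexts
        self.next_context = next_context  # stands in for the state transition controller
        self.current = 0

    def step(self, data: List[int]) -> List[int]:
        """Execute the current context once, then switch to the next context."""
        out = []
        for lane, value in zip(self.contexts[self.current], data):  # parallel lanes
            for op in lane:                                         # serial chain
                value = op(value) & 0xFF                            # 8-bit datapath
            out.append(value)
        self.current = self.next_context[self.current]              # context switch
        return out


def run(array: MultiContextArray, data: List[int], cycles: int) -> Tuple[List[int], List[int]]:
    """Run for a number of cycles and record which context was active in each."""
    trace = []
    for _ in range(cycles):
        trace.append(array.current)
        data = array.step(data)
    return data, trace


inc = lambda v: v + 1
dbl = lambda v: v * 2

contexts = [
    [[inc, dbl]] * 4,  # context 0: (v + 1) * 2 on four lanes
    [[dbl]] * 4,       # context 1: v * 2 on four lanes
]
array = MultiContextArray(contexts, {0: 1, 1: 0})
result, trace = run(array, [1, 2, 3, 4], 3)
print(result, trace)  # [18, 26, 34, 42] [0, 1, 0]
```

In this model the same four lanes are reused by both contexts, which is the sense in which context switching provides more virtual hardware than is physically present.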
![Page 72: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/72.jpg)
Main Advantage: Low power consumption
Why low power?
1. No redundant hardware
There is no instruction fetch mechanism, cache, TLB, etc.
→ Of course, it cannot be a general-purpose engine, but it is enough for an accelerator.
A bare datapath works only for computation.
2. Parallel execution with a number of PEs
A much lower clock frequency can be used to achieve the same
performance as other architectures.
The main problem is leakage power, but it can be suppressed by power gating techniques.
10X more energy efficient than DSPs.
5-50X more energy efficient than FPGAs.
Sometimes similar to that of hardwired logic.
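The second point can be made concrete with the usual first-order model of dynamic CMOS power. The slide does not state this model, so the symbols below are assumptions: $\alpha$ is the switching activity, $C$ the switched capacitance of one processing unit, $V_{dd}$ the supply voltage and $f$ the clock frequency.

$$P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f$$

If $N$ PEs share the work, the same throughput needs only $f/N$, while the switched capacitance grows to roughly $N C$. The lower frequency in turn allows a lower supply voltage $V_{dd}'$:

$$P_{\mathrm{dyn}}' = \alpha \, (N C) \, (V_{dd}')^{2} \, \frac{f}{N} = \alpha \, C \, (V_{dd}')^{2} \, f = P_{\mathrm{dyn}} \left( \frac{V_{dd}'}{V_{dd}} \right)^{2}$$

Under these assumptions parallelism alone does not reduce dynamic power. The saving comes from the voltage reduction that the lower clock frequency permits: for example, $V_{dd}' = 0.7\,V_{dd}$ gives about half the dynamic power. Leakage is not covered by this model and grows with the number of PEs, which is why power gating matters.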
![Page 73: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/73.jpg)
Dynamically Reconfigurable Processors
Coarse-grain architecture: in some ways like on-chip multiprocessors, in other ways like FPGAs.
Rapid development since 2001
No killer application has been found yet (Chameleon’s failure)
High-level language development environments have not been well established.
A lot of competitors:
High performance embedded processors
Chip multiprocessors
Application Specific Configurable Processors
DSP
Standard FPGA/CPLD
System On Chip
![Page 74: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/74.jpg)
Open Problems
What is the difference between a program and configuration data?
Reconfigurable Processor Array = a VLIW machine with extremely large instructions (configuration data)
How frequently should the configuration change?
Context switching on every clock is not advantageous from the viewpoint of power consumption.
However, if the configuration is rarely switched, the dynamic reconfiguration function is useless.
How should the grain size of the Processing Element be decided?
Are 8-32bit calculators the correct solution?
Or are they only a way to escape Xilinx’s patents?
What is the right balance between calculators and controllers?
Since DRP focuses on calculators, it is difficult to implement complicated control.
Is the node balance of ACM correct?
![Page 75: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/75.jpg)
Summary
A computing system different from stored-program
computers.
Not a complete replacement for stored-program
computers.
Advances in semiconductor technology
directly enhance the performance.
Many open problems and research subjects remain.
![Page 76: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/76.jpg)
Historical flow of computer systems
ENIAC
EDVAC, EDSAC
IBM machines
RISC, Intel’s microprocessors
Reconfigurable Machine
![Page 77: システムLSIとアーキテクチャ技術 (part III:チップ間接続技術) · 2017-07-14 · Process Products Name LUT Power 350nm XC4000 XC4085KLA 7448 3.3V 250nm XC4000](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0564037e708231d412b9e9/html5/thumbnails/77.jpg)
Exercise
Consider a systolic array that multiplies an 8 × 8
tri-diagonal matrix A by a vector x of size 8.
Compute the number of clock cycles needed for the
multiplication. Here, the time when the first element
of x reaches the left-most element of the array is assumed
to be time 0.