networks -on-chip and noc synthesissi2.epfl.ch/~amaru/dtis/slides_presentations/dt9.pdf · networks...
TRANSCRIPT
NETWORKS-ON-CHIPAND NOC SYNTHESIS
Federico ANGIOLINILSI EPFL2014
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 1
CORE-BASED DESIGNFederico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 2
A WIDENING PRODUCTIVITY GAP
ITRS
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 3
TWO TRENDS: CMPS AND MPSOCS
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 4
Chip MultiProcessor(CMP)
MultiProcessorSystem-on-Chip
(MPSoC)
Intel, Tilera, STM Qualcomm, Samsung, TI, Apple
Blurred boundaries…
CMPS VS. MPSOCS
CMP MPSoC
Fully homogeneous Very heterogeneous
Fully programmable cores Mostly fixed-function cores
Tile-based Subsystem-based
Regular chip layout Heavily customized layout
Emphasis on ease of programming
Emphasis on power/performance/cost
For: general processing, programmable accelerators
For: embedded applications, optimized accelerators
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 5
MPSOC SYSTEMS BECOME VERY COMPLEX
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 6
TI OMAP 4460, 2011
INTEGRATION ISSUE ARISES
❖Many cores
❖If MPSoC, heterogeneous cores➢Different pinouts➢Different clocking (+DVFS)➢Different physical sizes➢Different power/temperature budgets➢Different communication needs
❖How to integrate, verify?
❖System interconnect
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 7
ON-CHIP INTERCONNECTSFederico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 8
TRADITIONAL ANSWER: BUSES
❖ Shared bus topology
❖ Aimed at simple, cost-effective integration
Bus
Slave 1
Master 1
Slave 3
Master 2
Slave 2
Master 3
❖ Typical example: ARM AMBA AHB➢ Arbitration among multiple masters➢ Single outstanding transaction allowed➢ If wait states are needed, everybody waits➢ Slow!
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 9
BUS EVOLUTION OCCURRED
Bus
Bus
Bus
Bus
Bus
Bus
Bus
Cros
sbar
Bus++ v2.0
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 10
Protocolevolution
Topologyevolution
STILL OUTSTANDING PROBLEMS
❖Performance scalability
❖ Ease of design and verification➢ ~2 years time to market (89% late!)
❖Physical design issues➢ Routing congestion➢ Wire propagation delay
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 11
Numetrics Management Systems
AND OVERALL, THE DESIGN LOOKS LIKE…
Chip for mobile multimedia apps (under NDA)
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 12
THE RISE OF NETWORKS-ON-CHIP (NOCS)
❖ Packet-based communication
❖ NIs packetize transactions
❖ Switches route transactions across
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 13
Benini-De Micheli, 2002
Dally, 2001
NOC OPERATION EXAMPLE
CPU
I/O
Net
wor
k In
terf
ace
switch
Net
wor
k In
terf
ace1. CPU request (AHB, OCP, ... pinout)
2. Packetization and transmission
3. Routing
4. Receipt and unpacketization (AHB, OCP, ... pinout)
5. Device response (if needed)
6. Packetization and transmission
7. Routing
8. Receipt and unpacketization
switch
switch
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 14
NOC HERITAGE: A LOGICAL STEP
❖ On-chip version of large area networks➢ WAN, MAN, LAN, PAN, …
❖ Next logical step in interconnect evolution
❖ Scalable, robust, efficient, well-understood
❖ Different trade-offs compared to large area networks➢ Wire parallelism is cheap vs. cables➢ Latency, power, area must be orders of magnitude lower➢ Buffers must be extremely small
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 15
NOC ADOPTION TRENDS
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 16
Sonics Whitepaper, 2012
HOW TO DESIGN NOCSFederico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 17
NOCS HAVE MANY DESIGN AXES
❖ How to connect things: topology
❖ How to inject and eject messages: network interface architecture (incl. packetizationpolicy)
❖ How to pass packets along: router architecture (incl. switching policy, flow control policy)
❖ Where to send packets: routing policy
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 18
NoC
NETWORK INTERFACE ARCHITECTURE (XPIPES)
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 19
ni_request
ni_response
ni_receive
ni_resendMas
ter
Slav
e
Initiator NI Target NI
Hea
der R
egPa
yloa
d Re
gH
eade
r Reg
Payl
oad
Reg
Hea
der R
egPa
yloa
d Re
gH
eade
r Reg
Payl
oad
Reg
Pending Trans Reg
FSM
Routing LUT
Pending Trans Reg
FSM
Routing LUT
FSM
FSM
Buffer Buffer
BufferBuffer
OCP
AHB
AXI
OCP
AHB
AXI
OCP
AHB
AXI
OCP
AHB
AXI
Source routing
Parametric link width
Protocol interoperab.
PACKET TRANSMISSION
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 20
Header
Header
Header
Payload
Payload 1Payload 2Payload 3Payload 4…
(1) Read request
(2) Single write request or single read response
(3) Burst write request or burst read response
Initi
ator
Targ
et
DATABYTE
ENABLES
DATA RESP
ATTRIBU
TES
BURSTLENGTH
BURST
SEQU
ENC
E
BURST
PRECISE
BYTEEN
ABLES
CO
MM
AN
D
ADDRESS SENDER ROUTE
BURST
INC
REMEN
T
ID
SEND
ER WID
TH
ROUTE
TYPEID SENDER
SEND
ER WID
TH
request header
request payload
response header
response payload
INTEROPERABLE PACKET FORMAT
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 21
ROUTER ARCHITECTURE (XPIPES)
❖ Input and/or output buffering
❖ Wormhole switching
❖ Supports multiple flow control policiesDESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 22
Crossbar
AllocatorArbiter
RoutingFlow Control
DEADLOCKS❖ Routing level
❖ Message level
Initiator Target
Request
Response
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS13-Oct-14 23
APPLICATION-SPECIFICNOC DESIGN
Federico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 24
CMP VS. MPSOC INTERCONNECTS
CMP MPSoC
Regular interconnect (mesh, torus) Custom topology
Unpredictable workload Mostly predictable workloads
Synthetic workloads may be representative
All usage scenarios must be checked
Emphasis on routing, bisection bandwidth
Emphasis on latency, simulated performance
Homogeneous NoC architecture Locally optimized architecture
Requires cache coherence Requires core interoperability
Regular, short wiring Irregular, long wiring
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 25
MANY CHALLENGES TO TACKLE
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 26
Sonics Whitepaper, 2012
…WHICH IS EXPENSIVE
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 27
Sonics Whitepaper, 2012
MPSoC architects spend ~28% of time defining the NoC
Company A: € 76 M (profitable)Company B: € 56 M (maybe bankrupt)
3 months
STANDARD DESIGN FLOW
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 28
DesignRequirements
HardwareDescription
SynthesisPlacement
Routing
ProcessorsMemoriesI/ONoC Inter-connect
■ Designer picks NoC with expertise + guesswork■ NoC is iterated on multiple times based on simulation and
synthesis feedback■ Electronic Design Automation (EDA) Tools exist but most
difficult work is manual
AN AUTOMATED DESIGN FLOW
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 29
NoC RTL
NoC
Models
Comm
Specs
IP Core
RTL
Back-End
Flow
Floorplanwith NoC
ArchSpecs
NoC
Library
Chip
Topology
Generation,
Floorplanning
Existing know-how, IP and tools
NoC extensions
Floorplan(optional)
Simulation
Design PointSelection
REFERENCE ALGORITHM
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 30
Partitioning (core-to-switch
assignment)
Partitioning (core-to-switch
assignment)
Design point generation
Partitioning (core-to-switch
assignment)
Switch-to-switch connectivity
establishment
Post-processing (trimming,
simulation, …)
go to next design point
Dump output
1
2
3
4
5
Murali, DAC 2004Murali, ASPDAC 2005
Bertozzi, IEEE TPDS 2005
Other seminal works:Pinto, ICCD 2003
Srinivasan, ICCD 2004
1 - DESIGN POINT GENERATION
❖ Identify NoC parameter permutations➢ NoC flit width➢ Switch count in each V/f island➢ Potentially any others: buffering, VCs…
❖ Sweep grain impacts runtime, exploration thoroughness
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 31
EXAMPLE DESIGN POINTS
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 32
Core 2
Core 3
Core 4
Core 5
Core 6
Core 7
Core 1
Core 8
Core 18
Core 9
Core 17
Core 10
Core 16
Core 15
Core 14
Core 13
Core 12
Core 11
Core 2
Core 3
Core 4
Core 5
Core 6
Core 7
Core 1
Core 8
Core 18
Core 9
Core 17
Core 10
Core 16
Core 15
Core 14
Core 13
Core 12
Core 11
Core 2
Core 3
Core 4
Core 5
Core 6
Core 7
Core 1
Core 8
Core 18
Core 9
Core 17
Core 10
Core 16
Core 15
Core 14
Core 13
Core 12
Core 11
Core 2
Core 3
Core 4
Core 5
Core 6
Core 7
Core 1
Core 8
Core 18
Core 9
Core 17
Core 10
Core 16
Core 15
Core 14
Core 13
Core 12
Core 11
Core to switch assignment such that communication constraints can be met
2 - DETERMINE SWITCHES IN V ISLANDS❖ Operating frequency bounds maximum switch
radix, thus max radix different in each VI
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 33
f = 600 MHz → Switch = 9x9
f = 850 MHz → Switch = 3x3 f = 700 MHz →
Switch = 7x7
MEM
ARM DSP
DMA
RAM
MEM DSP COPROC ACCEL
IMPROC
FIFO
VI 0
VI 1
VI 2
MEM
ARM DSP
DMA
RAM
MEM DSP COPROC ACCEL
IMPROC
FIFO
VI 0
VI 1
VI 2
EXAMPLE TOPOLOGY AT END OF STEP 2
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 34
Core 2 Core 3 Core 4 Core 5 Core 6 Core 7
Core 1 Core 8
Core 18 Core 9
Core 17 Core 10
Core 16 Core 15 Core 14 Core 13 Core 12 Core 11
S1
S8 S7S6
S3S2
S5
S4
3 – FLOW PATH FINDING❖ Establish physical links, consider➢ Cost function: marginal power (to be minimized)➢ Constraints: max switch radix, minimum bandwidth, max latency
(power/latency auto-tuning)
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 35
Many paths available !
MEM
ARM DSP
DMA
RAM
MEM DSP COPROC ACCEL
IMPROC
FIFO
VI 0
VI 1
VI 2
3 – DEADLOCK AVOIDANCE❖ Routing deadlocks removed during synthesis❖ Methods guarantee availability of paths❖ No special hardware/virtual channels needed
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 36
MEM
ARM DSP
DMA
RAM
MEM DSP COPROC ACCEL
IMPROC
FIFO
VI 0
VI 1
VI 2
MEM
ARM DSP
DMA
RAM
MEM DSP COPROC ACCEL
IMPROC
FIFO
VI 0
VI 1
VI 2
VI 4
3 x 3
9 x 9
7 x 7
Frequency converters modeled and cost included in path computation
3 – DVFS-AWARE FLOW ROUTING❖ Inter-island routing options:➢ Directly from source VI to destination VI, or across intermediate islands
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 37
EXAMPLE TOPOLOGY AT END OF STEP 3
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 38
Core 2 Core 3 Core 4 Core 5 Core 6 Core 7
Core 1 Core 8
Core 18 Core 9
Core 17 Core 10
Core 16 Core 15 Core 14 Core 13 Core 12 Core 11
S1
S8 S7S6
S3S2
S5
S4
4, 5 – DESIGN POINT FINALIZATION❖ Topology is now defined
❖Optional optimizations and checks:➢ Trim 1x1 switches➢ Can trim flit width where excessive for requirements➢ Can refine buffer sizing to optimize
performance/area/power➢ Can simulate for performance verification and iterate
❖Dump topology to disk13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 39
EXAMPLE TOPOLOGY AT END OF STEP 5
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 40
Core 2 Core 3 Core 4 Core 5 Core 6 Core 7
Core 1 Core 8
Core 18 Core 9
Core 17 Core 10
Core 16 Core 15 Core 14 Core 13 Core 12 Core 11
S1
S8 S7
S3S2
S5
S4
NOC FLOORPLANNINGFederico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 41
ROUTING CONGESTION IS AN ISSUE
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS17-Oct-14 42
Arteris Whitepaper
WIRING DELAY IS AN ISSUE
DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS17-Oct-14 43
ITRS 2004
TRAVERSABLE WIRE LENGTH IS GOING DOWN
❖ In 40 nm, at same frequency, wires must be 20-25% shorter than in 65 nm
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 44
THEREFORE, FLOORPLAN INPUT IS ESSENTIAL
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 45
ACCOUNTING FOR FLOORPLAN
❖ Optional input
❖ Not to be reshuffled➢ Too many constraints (thermal, pins, analog macros…)
❖ Insert NoC in the floorplan➢ Figure out ideal NoC component positions➢ Identify links still too long, pipeline them➢ Estimate power according to library model
❖ Designer can always tweak to taste
❖ Output floorplan can be exported towards P&R tools
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 46
OPTIMIZED FLOORPLAN EFFECT❖ Collaboration with Teklatech
❖ Different 40nm reference floorplans
❖ NoCs inserted by iNoCs with optimizations
❖ 15-20% savings in demanding use cases
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 47
IMPACT ON SYNTHESIS ALGORITHM
❖ Floorplan is updated simultaneously with topology synthesis (wires accounted in exploration)
❖ Two floorplanning steps:
➢ doInitialFloorplanning(): places blocks in ideal positions. Can be iterated and refined during topology building.
➢ doFloorplanning(): moves blocks from ideal positions to suitable locations (empty areas). Can be executed once as final pass.
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 48
EXAMPLE: INPUT FLOORPLAN
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 49
OUTPUT FLOORPLAN (NO OVERLAPS)
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 50
OUTPUT FLOORPLAN (OVERLAPS)
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 51
WIRE LENGTH ESTIMATION
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 52
LINK PIPELINING
❖ Wire segmentation by topology design➢ Put more switches, or place them closer
❖ Wire segmentation by pipeline insertion➢ Flops/relay stations to break links
❖ Take advantage of pre-existing converters➢ If e.g. frequency converter is instantiated along link, space it
evenly to help with segmenting
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 53
Sender Receiver
VALID
STALL
VALID
STALL
VALID
STALL
EXAMPLE: LINK LENGTH DISTRIBUTION
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 54
Original benchmarkmax f = 400 MHz
Overclocked benchmarkmax f = 800 MHz
CONCLUSIONSFederico ANGIOLINILSI EPFL2014
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 55
CONCLUSIONS
❖ IP-based design becoming prevalent➢ CMPs, MPSoCs
❖ System interconnect is evolving towards NoCs➢ Scalable performance, better physical design properties
❖ EDA tooling needed to help with NoC design➢ Devise best interconnect from high-level specifications➢ Accounting for floorplan is important
13-Oct-14 DESIGN TECHNOLOGIES FOR INTEGRATED SYSTEMS 56