Express Cube Topologies for On-chip Interconnects
Boris Grot
J. Hestness, S. W. Keckler, O. Mutlu†
The University of Texas at Austin
†Carnegie Mellon University
‡Part of this work was performed at Microsoft Research
Feb 17, 2009
HPCA '09
The Era of Many-core
UTCS 2HPCA ‘09
Intel Larrabee
• 16+ cores
• Bidirectional ring interconnect
UT TRIPS
• 2x16 exec tiles
• 16 NUCA tiles
• Multiple networks
Intel Polaris
• 80 tiles
• 8x10 mesh
Tilera Tile
• 64 cores
• 5 mesh networks
Networks on a Chip (NOCs)
On-chip advantages
• No pin constraints
• Rich wiring resources
On-chip limitations
• 2D substrates limit implementable topologies
• Logic area constrains use of wiring resources
• Energy/power budget caps
Focus
• Topologies for tomorrow's many-core CMPs
Outline
• Introduction
• Existing topologies
• Multidrop Express Channels (MECS)
• Evaluation
• Generalized Express Cubes
• Summary
2-D Mesh
Pros
• Low design & layout complexity
• Simple, fast routers
Cons
• Large diameter
• Energy & latency impact
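The "large diameter" drawback can be made concrete with a short sketch. It assumes a k x k mesh with minimal routing, so the hop count between two routers is their Manhattan distance; the function names are illustrative and do not come from the talk.

```python
def mesh_hops(src, dst):
    """Router-to-router hops between (x, y) coordinates under minimal routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_diameter(k):
    """Longest minimal path in a k x k mesh: one corner to the opposite corner."""
    return mesh_hops((0, 0), (k - 1, k - 1))


for k in (4, 8, 16):
    print(f"{k}x{k} mesh: diameter = {mesh_diameter(k)} hops")
```

Under these assumptions the diameter grows as 2(k - 1), which is why latency and energy per packet rise with network size.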
2-D Mesh
Pros
• Multiple terminals attached to a router node
• Fast nearest-neighbor communication via the crossbar
• Hop count reduction proportional to concentration degree
Cons
• Benefits limited by crossbar complexity
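As a rough illustration of how concentration shortens paths, the sketch below assumes the terminals are shared evenly among routers arranged in a square mesh. The helper name `concentrated_mesh` and its parameters are hypothetical and chosen only for this example.

```python
import math


def concentrated_mesh(terminals, concentration):
    """Return (radix k, diameter) of a square mesh of routers.

    Each router serves `concentration` terminals; the router grid is k x k.
    """
    routers = terminals // concentration
    k = math.isqrt(routers)
    if k * k * concentration != terminals:
        raise ValueError("terminals / concentration must be a perfect square")
    return k, 2 * (k - 1)


for terminals, c in ((64, 1), (64, 4), (256, 1), (256, 4)):
    k, diameter = concentrated_mesh(terminals, c)
    print(f"{terminals} terminals, concentration {c}: {k}x{k} routers, diameter {diameter}")
```

With 64 terminals, a concentration of 4 shrinks the router grid from 8x8 to 4x4 and the diameter from 14 to 6 hops.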
Concentration (Balfour & Dally, ICS ‘06)
Concentration
Side-effects
• Fewer channels
• Greater channel width
Replication
CMesh-X2
Benefits
• Restores bisection channel count
• Restores channel width
• Reduced crossbar complexity
Flattened Butterfly (Kim et al., Micro ‘07)
Objectives:
• Improve connectivity
• Exploit the wire budget
Flattened Butterfly (Kim et al., Micro ‘07)
Pros
• Excellent connectivity
• Low diameter: 2 hops
Cons
• High channel count: k²/2 per row/column
• Low channel utilization
• Increased control (arbitration) complexity
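The k²/2 figure can be reproduced by enumerating channels. This is a minimal sketch under two assumptions that the slide does not spell out: every router in a row has a dedicated unidirectional channel to every other router in that row, and "channel count" means channels crossing the midpoint (bisection) of the row.

```python
def fbfly_row_channels(k):
    """All unidirectional point-to-point channels in one fully connected row."""
    return [(src, dst) for src in range(k) for dst in range(k) if src != dst]


def crossing_bisection(channels, k):
    """Count channels whose endpoints lie on opposite halves of the row."""
    half = k // 2
    return sum(1 for src, dst in channels if (src < half) != (dst < half))


for k in (4, 8):
    channels = fbfly_row_channels(k)
    print(f"k={k}: {len(channels)} channels in the row, "
          f"{crossing_bisection(channels, k)} cross the bisection (k^2/2 = {k * k // 2})")
```

Because the count grows quadratically with k, a fixed wire budget leaves each channel narrower as the network scales.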
Multidrop Express Channels (MECS)
Objectives:
• Connectivity
• More scalable channel count
• Better channel utilization
Multidrop Express Channels (MECS)
Pros
• One-to-many topology
• Low diameter: 2 hops
• k channels per row/column
• Asymmetric
Cons
• Asymmetric
• Increased control (arbitration) complexity
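The channel count and the input/output asymmetry can be sketched for a single row. The model is an assumption, since the transcript does not preserve the slide figures: each router drives at most one multidrop channel per direction, and that channel has a drop-off point at every router it passes. All names are illustrative.

```python
def mecs_row_channels(k):
    """Return (source, drop-off routers) for every multidrop channel in a row."""
    channels = []
    for src in range(k):
        east = list(range(src + 1, k))        # routers reached going east
        west = list(range(src - 1, -1, -1))   # routers reached going west
        for drops in (east, west):
            if drops:  # edge routers have no channel pointing off the chip
                channels.append((src, drops))
    return channels


def crossing_bisection(channels, k):
    """Count channels that reach at least one router in the opposite half."""
    half = k // 2
    return sum(1 for src, drops in channels
               if any((src < half) != (d < half) for d in drops))


def row_inputs(channels, router):
    """Number of channels in the row that drop off at `router`."""
    return sum(1 for _, drops in channels if router in drops)


def row_outputs(channels, router):
    """Number of channels in the row driven by `router`."""
    return sum(1 for src, _ in channels if src == router)


for k in (4, 8):
    ch = mecs_row_channels(k)
    print(f"k={k}: bisection channels = {crossing_bisection(ch, k)}, "
          f"inputs per router = {row_inputs(ch, 1)}, outputs per router = {row_outputs(ch, 1)}")
```

Doubling the per-row figures to cover both rows and columns gives 2(k - 1) network inputs but only 4 outputs for an interior router, which is the asymmetry listed above.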
Analytical Comparison

                        CMesh          FBfly          MECS
Network size            64     256     64     256     64     256
Radix (concentrated)    4      8       4      8       4      8
Diameter                6      14      2      2       2      2
Channel count           2      2       8      32      4      8
Channel width           576    1152    144    72      288    288
Router inputs           4      4       6      14      6      14
Router outputs          4      4       6      14      4      4
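One way to read the table is that all three topologies are compared under the same wire budget: channel count times channel width is identical within each network size. The sketch below checks that reading and the closed forms from the earlier slides against the table values. The closed forms for CMesh and the constant-budget interpretation are inferences from the numbers, not statements made in the talk.

```python
# Values copied from the Analytical Comparison table:
# (topology, network size) -> (radix k, diameter, channel count, channel width)
TABLE = {
    ("CMesh", 64): (4, 6, 2, 576),
    ("CMesh", 256): (8, 14, 2, 1152),
    ("FBfly", 64): (4, 2, 8, 144),
    ("FBfly", 256): (8, 2, 32, 72),
    ("MECS", 64): (4, 2, 4, 288),
    ("MECS", 256): (8, 2, 8, 288),
}

# Assumed closed forms as functions of the radix k.
CHANNEL_COUNT = {
    "CMesh": lambda k: 2,
    "FBfly": lambda k: k * k // 2,
    "MECS": lambda k: k,
}
DIAMETER = {
    "CMesh": lambda k: 2 * (k - 1),
    "FBfly": lambda k: 2,
    "MECS": lambda k: 2,
}


def wire_budget(topology, size):
    """Aggregate width of the counted channels: channel count x channel width."""
    _, _, count, width = TABLE[(topology, size)]
    return count * width


def check_table():
    """Return True if every table entry matches the assumed closed forms."""
    for (topology, _size), (k, diameter, count, _width) in TABLE.items():
        if CHANNEL_COUNT[topology](k) != count:
            return False
        if DIAMETER[topology](k) != diameter:
            return False
    return True


for size in (64, 256):
    budgets = {t: wire_budget(t, size) for t in ("CMesh", "FBfly", "MECS")}
    print(size, budgets)
print("closed forms match table:", check_table())
```

Seen this way, the flattened butterfly spends the budget on many narrow channels, while MECS keeps a constant 288-wide channel at both network sizes.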
Experimental Methodology

Topologies            Mesh, CMesh, CMesh-X2, FBfly, MECS, MECS-X2
Network sizes         64 & 256 terminals
Routing               DOR, adaptive
Messages              64 & 576 bits
Synthetic traffic     Uniform random, bit complement, transpose, self-similar
PARSEC benchmarks     Blackscholes, Bodytrack, Canneal, Ferret, Fluidanimate, Freqmine, Vips, x264
Full-system config    M5 simulator, Alpha ISA, 64 OOO cores
Energy evaluation     Orion + CACTI 6
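The talk does not define the synthetic traffic patterns. The sketch below uses the conventional definitions from the interconnection-network literature, applied to node indices of `bits` bits (6 bits for 64 nodes). Self-similar traffic is omitted because it needs a stochastic model that the slides do not describe. Function names are illustrative.

```python
import random


def bit_complement(src, bits):
    """Destination is the bitwise complement of the source index."""
    return ~src & ((1 << bits) - 1)


def transpose(src, bits):
    """Swap the upper and lower halves of the index, i.e. (x, y) -> (y, x)."""
    half = bits // 2
    mask = (1 << half) - 1
    return ((src & mask) << half) | (src >> half)


def uniform_random(src, bits, rng):
    """Destination drawn uniformly from all nodes (may equal the source)."""
    return rng.randrange(1 << bits)


rng = random.Random(0)  # fixed seed so runs are repeatable
for src in (0, 5, 63):
    print(src, bit_complement(src, 6), transpose(src, 6), uniform_random(src, 6, rng))
```

For node 5 in a 64-node network, bit complement sends to node 58 and transpose sends to node 40.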
64 nodes: Uniform Random
[Figure: latency (cycles) vs. injection rate (%) for mesh, cmesh, cmesh-x2, fbfly, mecs, and mecs-x2]
256 nodes: Uniform Random
[Figure: latency (cycles) vs. injection rate (%) for mesh, cmesh-x2, fbfly, mecs, and mecs-x2]
Energy (100K pkts, Uniform Random)
[Figure: average packet energy (nJ), split into link energy and router energy, for 64 nodes and 256 nodes]
64 Nodes: PARSEC
[Figure: total network energy (J), split into router energy and link energy, and average packet latency (cycles) for Blackscholes, Canneal, Vips, and x264]
Generalized Express Cubes
Low-dimensional k-ary n-cube
• n = {1,2}
• Good fit for planar silicon
Express channels
• Improve connectivity
• MECS for better wire utilization
Multiple networks
• Improve throughput
• Reduce crossbar area & energy overhead
Hierarchical scaling
Partitioning: a GEC Example
[Figure panels: MECS, MECS-X2, Flattened Butterfly, Partitioned MECS]
Summary
MECS
• A novel one-to-many topology
• Good fit for planar substrates
• Excellent connectivity
• Effective wire utilization
Generalized Express Cubes
• Framework & taxonomy for NOC topologies
• Extension of the k-ary n-cube model
• Useful for understanding and exploring on-chip interconnect options
• Future: expand & formalize