physical aware system level design for tiled hierarchical ... · physical‐aware system‐level...
TRANSCRIPT
Physical‐Aware System‐Level Design for Tiled Hierarchical
Chip Multiprocessors
Jordi Cortadella, Javier de San Pedro,Nikita Nikitin and Jordi Petit
Universitat Politècnica de Catalunya (Barcelona)
Project funded by Intel Corp.
Designing a Chip Multiprocessor
ISPD 2013 Tiled CMPs 2
CMP Off‐ChipMemory
DSP
Graphics
Data Mining
Bioinformatics
⁞
• How many cores?• How much L2/L3 on-chip cache?• Interconnect: mesh/ring/bus?• How many memory controllers?
What is architectural exploration?
ISPD 2013 Tiled CMPs
MC
MC
MC
MC
MC
MC
MC
MC
C1
C1
C1
R
C2
L2L2
L2
NIL3
L2Bu
s
C2
L2C2L2
C2L2
L3
Bus
C2L2
C2L2
NIR
C1
L2
‐ 6x4 mesh, 24 clusters‐ total 144 cores‐ 6 cores/cluster‐ 1 C1, 128K L1, 256K L2‐ 5 C2, 64K L1, 96K L2‐ 146 Mb total shared L3
Throughput = 85.71 IPC
‐ 5x5 mesh, 25 clusters‐ total 100 cores‐ 4 cores/cluster‐ 3 C1, 128K L1, 1M L2‐ 1 C2, 128K L1, 1M L2‐ 130 Mb total shared L3
Throughput = 103.26 IPC
3
Library of models
ISPD 2013 Tiled CMPs
Parameter Value
Chip areaMesh dimensionsMem. Cntrl. latencyInterconnectLink widthWorkload MPIWorkload MLP
350 mm2
2x2 to 16x16200 cyclesBus, uni‐ / bi‐ring256 – 1024 bits0.51.25
1.5
1.75
2
2.25
2.5
2.75
0.75 1 1.25 1.5 1.75 2 2.25
IPC
Area (mm2)
C1 (IO)
C2 (OoO)
C3 (OoO)
0
0.2
0.4
0.6
0.8
0 1 2 3 4
Miss Ra
tio
Cache Size (Mb)
CacheSize
Area(mm2)
Latency(cycles)
64Kb 0.063 2
128Kb 0.125 3
256Kb 0.25 4
… … …
8Mb 8.0 9
Miss rates forSPEC CPU2006
Core library
CMP configuration Cache models
4
Physical planning for tiled CMPs
ISPD 2013 Tiled CMPs
N
S
EW
5
Outline
• Architectural exploration– The cost of exploration– Exploring with metaheuristics– Analytical models
• Physical planning for tiled CMPs
• Current work: regular floorplanning
Tiled CMPsISPD 2013 6
Exploration engines
Tiled CMPsISPD 2013
1
10
100
1000
10000
1E+05
1E+06
1E+07
1E+08
1E+09
1E+10
1E+11
1E+12
1E+13
Simulation(full system)
Simulation(probabilistic)
Analytical(exhaustive)
Analytical(metaheuristic)
Exploration runtim
e (sec)
Design space: 109 configurations300 centuries
300 years
100 days
100 seconds
7
Scalable exploration
ISPD 2013 Tiled CMPs
Simulation
AnalyticalModeling
8
Architecturalconfigurations
Promisingconfigurations
Automated exploration
Models(performance/power)
Cores
On‐chip caches
Off‐chip memories
Interconnect fabrics
Cache protocol
Workloads
Architectural configuration
Number of cores
Cluster size
L2/L3 size
Intra‐cluster interconnect
Inter‐cluster interconnect
Memory controllers
Exploration tool
Constraints
Area
Throughput
Power
ISPD 2013 Tiled CMPs
Physical info
Cores
Caches
Interconnects
9
Exploration engine: metaheuristics• Explore huge design spaces efficiently• Our proposal:
– Simulated Annealing (Kirkpatrick et al., 1983)– Extremal Optimization (Boettcher et al., 1999)
Tiled CMPsISPD 2013
Exploration toolSimulationBest configuration
Models
Constraints
Partial generation of configurations
search direction
Analytical modeling
10
Generation of configurations• Generate neighbors by applying transformations
– Increase/Decrease• mesh dimensions• core count per cluster• L1, L2 size
– Change interconnect type (bus/uni‐ring/bi‐ring)– Complex updates (increase mesh/decrease core count)
• Example: Increase_X(mesh 4x4) => mesh 5x4
ISPD 2013 Tiled CMPs 11
Increase_X
Analytical performance model for CMPs
λ L Throughput
Core
Core
Core
…
Liλi
Memory subsystem
ISPD 2013 Tiled CMPs
L L •••
Queueing model:Umit Ogras et al.
IEEE TCAD, Dec 2010
0
10
20
30
40
50
0 0.05 0.1 0.15 0.2
L, average latency (cycles)
λ, average traffic rate (flits/cycle)
L(λ)λ (L)
Characteristic of the ICCharacteristic of
the cores/workload
Hop‐count latency
Nonlinear analytical models
Throughput model:
Traffic model:
Latency model:
12
Analytical model vs. simulation
ISPD 2013 Tiled CMPs
Simulation
Analyticalmodeling
0
10
20
30
40
50
60
70
801 55 109
163
217
271
325
379
433
487
541
595
649
703
757
811
865
919
973
1027
1081
1135
1189
1243
1297
1351
1405
1459
1513
1567
1621
1675
1729
1783
1837
1891
1945
1999
2053
2107
Throughp
ut (IPC
)
Configurations sorted in descending order of throughput
Modeling
Simulation
13
Case Study: Power‐performance exploration
Tiled CMPsISPD 2013
70
80
90
100
110
120
130
120 140 160 180 200 220 240
Throughp
ut (IPC
)
Power (W)
6x5, Bi‐Ring, 4C25x4, Bi‐Ring, 6C2
5x3, Bi‐Ring, 8C2
4x3, Bi‐Ring, 10C24x2, Bi‐Ring, 15C2
3x2, Bi‐Ring, 20C2
6x5, Bus, 4C27x4, Bus, 3C2+1C3
7x4, Bus, 4C2
6x5, Bus, 2C1+2C26x4, Bus, 3C1+2C2
Power‐performance trade‐off(Search space: 1.5 · 109 configurations)
14
Outline
• Architectural exploration
• Physical planning for tiled CMPs– Impact of physical planning– Floorplanning– Wire planning
• Current work: regular floorplanning
Tiled CMPsISPD 2013 15
Physical planning
ISPD 2013 Tiled CMPs 16
C
L2
C
L2
C
L2
C
L2
r r
r r
r
r
R
L3
NSWE
The impact of physical planning
ISPD 2013 Tiled CMPs 17
C
L2
C
L2
C
L2
C
L2
r r
r r
r
r
R
L3
NSWE
Physical planning for tiles
ISPD 2013 Tiled CMPs
N
S
EW
18
Link width: how many wires?
ISPD 2013 Tiled CMPs 19
Router
Router
Cntrl Addr Cache line64 512
Cntrl Addr Cache line64 512
> 1K wires 100 m
3D Wire Planning
ISPD 2013 Tiled CMPs
m1
m2
m3
m4
m5
m6
MemoryRouterCore
In systems where memory bandwidth is the bottleneck,the physical resources providing the bandwidth are critical
20
FEOL
Exploration without physical planning
Tiled CMPsISPD 2013
SimulationBest configuration
Models
Constraints
Generation of configurations
search direction
Analytical modeling
Validation
Architectural exploration
21
Exploration with physical planning
Tiled CMPsISPD 2013
SimulationBest configuration
Generation of configurations
search direction
Analytical modeling
Wire Planning
Floor‐planning
Physical planningValidation
Architectural exploration
22
Phys. Info
Models
Constraints
Physical planning
ISPD 2013 Tiled CMPs
Simulation
AnalyticalModeling L3 R
Local IC
L2CC L2
L3
23
PhysicalPlanning
Estimations:• Area• Wirelength• Routability
Physical planning technology• Floorplanning
– Slicing structures & Simulated Annealing– Lightweight 3D maze router– Constraints:
• Adjacency (Core L2)• Balanced links (rings)
• Wire planning– SAT‐based 3D global routing– Boolean constraints
ISPD 2013 Tiled CMPs 24
Slicing structures
ISPD 2013 Tiled CMPs 25
13
24 5
1
2
3
5
4
V
H H
V
H
V V
H
1 2 34 5
1
4 3
2 5
D.F. Wong and C.L. Liu,“A New Algorithm for Floorplan Design”DAC, 1986, pages 101-107.
Bounding curves
ISPD 2013 Tiled CMPs 26
Memory
L. Stockmeyer, 1983,Optimal Orientation of Cells in Slicing Floorplan Designs
Wire planner• SAT‐based approach for gridded routing• Grid unit: link width ( 500 ‐ 1000 wires)• Support for floating terminals• Customizable for any type of Boolean‐encoded constraints (symmetry, 1D/2D routing, …)
ISPD 2013 Tiled CMPs
Top view
Cross-section view
27
Wire planner
• Concurrent routing: all nets simultaneously• Using Euler’s theory to find legal routes• SAT: a route is always found if it exists• ILP‐based route optimization
ISPD 2013 Tiled CMPs 28
RouterW EW E
Design space
ISPD 2013 Tiled CMPs 29
Design spaceWire
length [1
06μm
]
ISPD 2013 Tiled CMPs 30
Filtering floorplansArea [m
m2]
ISPD 2013 Tiled CMPs 31
Filtering floorplansArea [m
m2]
ISPD 2013 Tiled CMPs 32
After physical planning
ISPD 2013 Tiled CMPs 33
After physical planning
ISPD 2013 Tiled CMPs 34
After physical planning
ISPD 2013 Tiled CMPs 35
Outline
• Architectural exploration
• Physical planning for tiled CMPs
• Current work: regular floorplanning– Memory floorplanning– Regularity extraction
Tiled CMPsISPD 2013 36
Min‐area floorplan
ISPD 2013 Tiled CMPs 37
C
L2
C
L2
C
L2
C
L2
r r
r r
r
r
R
L3
NSWE
L3
CL2
L2
CL2
r
CR
r
rr r
L2C
r
R
Integrated memory floorplanner
ISPD 2013 Tiled CMPs 38
1Mb1Mb 1Mb
512Kb
512Kb
512Kb 512KbL-shape
256Kb 512
Kb
256Kb
T-shape
Regular floorplan
ISPD 2013 Tiled CMPs 39
C
L2
C
L2
C
L2
C
L2
r r
r r
r
r
R
L3
NSWE
L3
r
C
r
C
r
rC
r
C
r
R
L2
L2
L2
L2
Regular floorplan
ISPD 2013 Tiled CMPs 40
L3
r
C
r
C
r
rC
r
C
r
R
L2
L2
L2
L2
L3
CL2
L2
CL2
r
CR
r
rr r
L2C
r
Regularity:• Smaller design effort• Efficient timing closure• Choppability
Regular floorplan
ISPD 2013 Tiled CMPs 41
L3
r
C
r
rC
r
R
L3
CL2
L2
CL2
r
CR
r
rr r
L2C
r L2
L2
Regularity:• Smaller design effort• Efficient timing closure• Choppability
Exploration:• Graph based knowledge discovery• Hierarchical slicing structures• Simulated Annealing
Conclusions• Physical information is essential for the architectural exploration of tiled CMPs
• Approach: divide & conquer– 1‐cluster floorplan + abutment constraints– Build tiled CMPs (scalable and choppable)
• Ongoing work:– Multi‐module memory floorplanning– Regularity
ISPD 2013 Tiled CMPs 42