cstalks-polymorphic heterogeneous multicore systems-17aug
blog.nus.edu.sg/cstalks
Polymorphic Heterogeneous
Multi-Core Systems
Mihai Pricopi
CSTalks
August 17, 2011
Motivation
Single-core performance (complexity) increase
Motivation
Instruction-level parallelism (ILP)
1: e = a + b
2: f = c + d
3: g = e * f
4: h = f * 2
[Figure: dependency graph of instructions 1 to 4; 3 depends on 1 and 2, and 4 depends on 2]
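The four-instruction example can be checked mechanically. The sketch below is illustrative only and is not from the talk: it assumes unit latency and unlimited issue width, and the names `instructions` and `dependency_levels` are made up for this example. It derives the earliest cycle in which each instruction could issue from its register dependencies.

```python
# Each entry: instruction number -> (destination, source operands).
# Numbering and operands follow the slide; the constant 2 in "h = f * 2"
# is not a register, so it is not listed as a source.
instructions = {
    1: ("e", ["a", "b"]),
    2: ("f", ["c", "d"]),
    3: ("g", ["e", "f"]),
    4: ("h", ["f"]),
}


def dependency_levels(instrs):
    """Return {instruction: earliest issue cycle}.

    Assumes unit latency and unlimited issue width (hypothetical model).
    """
    # Map each destination register to the instruction that produces it.
    producer = {dest: n for n, (dest, _) in instrs.items()}
    level = {}
    for n in sorted(instrs):  # program order: producers precede consumers
        _, sources = instrs[n]
        deps = [producer[s] for s in sources if s in producer and producer[s] < n]
        # An instruction issues one cycle after its latest producer.
        level[n] = 1 + max((level[d] for d in deps), default=0)
    return level


levels = dependency_levels(instructions)
print(levels)
```

Under these assumptions, instructions 1 and 2 share the first cycle and instructions 3 and 4 share the second, which is the parallelism the dependency graph shows.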
Motivation
[Figure labels: 2006, 2007]
Motivation
Thread-level parallelism (TLP)
Multi-threaded applications
Multi-programmed jobs
[Figure: cores P0 and P1 running a single process, and cores P0 and P1 running Process0 and Process1]
Motivation
nVidia Tesla many-core: up to 960 simple and identical cores.
Massively exploiting the TLP.
Sequential programs suffer from limited ILP exploitation.
A gap between TLP and ILP.
Solution: heterogeneous systems to accommodate the gap between TLP and ILP.
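One standard way to quantify why sequential code benefits little from many simple cores is Amdahl's law. It is not cited in the talk; the sketch below is an illustration under that assumption, and the parallel fractions used are hypothetical values, not measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions, evaluated for 960 identical cores.
for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p}: speedup {amdahl_speedup(p, 960):.1f}")
```

Even with 960 cores, a program that is half serial cannot reach a speedup of 2 in this model, which is the gap that a stronger core for the serial part is meant to address.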
Heterogeneous Chip Multi-processors
Multi-core systems that use cores with different performance parameters.
Existing results show that heterogeneous systems are more efficient than homogeneous ones in terms of performance, power, area and delay.
Heterogeneity can be achieved by using:
◦ Asymmetric chip multi-processors (ACMPs)
◦ Multiprocessor system-on-chip (MPSoC)
◦ Architectures that dynamically reconfigure their internal structure to adapt to different software requirements (polymorphic)
Heterogeneous Chip Multi-processors
Asymmetric chip multi-processors (ACMPs)
[Figure: example ACMP core layouts, with cores labeled P0 to P4]
Heterogeneous Chip Multi-processors
Multiprocessor system-on-chip (MPSoC)
[Figure: example MPSoC blocks: ARM, DSP, memory controller, video accelerator, bridges]
Program Phase Behavior - gzip
Program Phase Behavior - gcc
Polymorphic Heterogeneous Multi-Core Systems
• General-purpose applications
• Novel architecture that can be tailored according to the software requirements
• Base system: homogeneous processor
• Reconfigurable capabilities
• Internal structure adaptation
• Core-coalition
• Memory
[Figure: 16 cores, P0 to P15, with two reconfigurable fabric (RF) blocks]
Polymorphic Heterogeneous Multi-Core Systems – Reconfigurable Fabric
• Reconfigurable hardware shared by different processors
• RF implements custom instructions
• Dynamic reconfiguration at runtime – speedup
1: e = a + b
2: f = c + d
3: g = e * f
4: h = f * 2
[Figure: dependency graph of instructions 1 to 4, with a custom instruction implemented on the RF shared by P0 and P1]
Polymorphic Heterogeneous Multi-Core Systems – Reconfigurable Fabric
• Challenging problems:
• The amount of RF is limited.
• Deciding when to reconfigure the RF (scheduling).
• Finding the set of custom instructions that gives the highest speedup.
• Overhead of the dynamic reconfiguration.
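Choosing custom instructions under a limited amount of RF can be framed as a 0/1 knapsack problem. The sketch below is one possible framing, not the method used in the talk: the function name, the candidate names and their area and gain values are all hypothetical, and reconfiguration overhead and scheduling are ignored.

```python
def select_custom_instructions(candidates, rf_capacity):
    """0/1 knapsack: maximise total gain within rf_capacity area units.

    candidates: list of (name, area, gain) with integer area.
    Returns (best total gain, list of chosen names).
    """
    # best[c] holds the best (gain, names) achievable with at most c area units.
    best = [(0, [])] * (rf_capacity + 1)
    for name, area, gain in candidates:
        # Iterate capacity downwards so each candidate is used at most once.
        for c in range(rf_capacity, area - 1, -1):
            new_gain = best[c - area][0] + gain
            if new_gain > best[c][0]:
                best[c] = (new_gain, best[c - area][1] + [name])
    return best[rf_capacity]


# Hypothetical candidates: (name, RF area units, cycles saved).
candidates = [("ci_a", 4, 10), ("ci_b", 3, 7), ("ci_c", 2, 5), ("ci_d", 5, 13)]
total_gain, chosen = select_custom_instructions(candidates, rf_capacity=7)
print(total_gain, chosen)
```

A runtime version would also have to weigh the reconfiguration overhead against the gain, which this static selection does not model.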
Polymorphic Heterogeneous Multi-Core Systems – Core Structure Adaptation
• Similar performance can be achieved by using smaller processor internal units.
• Instruction fetch window size, issue width, instruction window size and frequency can be changed dynamically.
• Power and thermal concerns.
Polymorphic Heterogeneous Multi-Core Systems – Core-Coalition
• Coalition helps create “stronger” cores from the existing light cores:
• accelerates serial applications by extracting more ILP (if available).
• uses a limited amount of hardware shared between cores.
• coalitions of up to 4 cores can be formed.
[Figure: 2-core coalition: P0 (2-way) + P1 (2-way) ≡ P (4-way)]
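The "if available" caveat can be illustrated with a toy scheduler. The sketch below is not the coalition execution model from the talk: it assumes unit latency, ideal dependency tracking and no coalition overhead, and `cycles_needed` and both example graphs are hypothetical. It compares a 2-way core with a 4-way core on code with plenty of ILP and on a fully serial chain.

```python
def cycles_needed(deps, issue_width):
    """Greedy list scheduling with unit latency.

    deps: {instruction: set of instructions it depends on}
    Returns the number of cycles needed when at most issue_width
    ready instructions can issue per cycle.
    """
    done = set()
    cycles = 0
    while len(done) < len(deps):
        # Ready: not yet issued and all dependencies finished in earlier cycles.
        ready = [i for i in sorted(deps) if i not in done and deps[i] <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        done.update(ready[:issue_width])
        cycles += 1
    return cycles


independent = {i: set() for i in range(8)}       # eight independent instructions
chain = {0: set(), 1: {0}, 2: {1}, 3: {2}}       # each depends on the previous one

for name, graph in (("independent", independent), ("chain", chain)):
    print(name, cycles_needed(graph, 2), cycles_needed(graph, 4))
```

In this model the wider core halves the cycle count for the independent instructions but gives no benefit on the dependency chain, since there is no extra ILP to extract.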
Polymorphic Heterogeneous Multi-Core Systems – Core-Coalition Execution Model
Mihai Pricopi 18 National University of Singapore
[Figure: control-flow graph (CFG) with basic blocks B0 to B4, and a timeline of blocks B0, B1, B3 and B4 passing through the SF, RF, EX and CM stages on Core 0 and Core 1]
SF: Sentinel instruction fetch and global renaming
RF: Regular instruction fetch, decode and renaming
EX: Regular instruction execution
CM: Regular instruction commit
Experimental Results - Speedup
Experimental Results – Load Balance
Proposed directions
Next steps:
◦ Implement Coalition on FPGA.
◦ Further study of the overhead and power consumption introduced by the shared resources.
◦ Implement a dynamic scheduler for Coalition.
?
Next Week’s Talk
A Unified Framework for Recommendations in the Social Network by Chen Wei
Join us next Wednesday!
Wednesday, 31 August, 2011