CAP: Criticality Analysis for Power-Efficient Speculative Multithreading
James Tuck, Wei Liu, Josep Torrellas
University of Illinois at Urbana-Champaign
International Conference on Computer Design (ICCD), October 2007.

Motivation
Speculative Multithreading (SM) is important for CMPs
− It can speed up hard-to-parallelize programs
Power inefficiency of SM is a serious concern

Proposal
We can improve power efficiency using criticality analysis
− Some threads matter more for performance than others
Dynamically construct a graph of SM execution and calculate criticality
Schedule tasks on a CMP using criticality
− DVFS per-core for power-efficiency
− Schedule critical tasks to higher V-f cores, non-critical to lower V-f

Contributions
Novel, widely-applicable task critical model for Speculative Multithreading
CAP architecture for a CMP that implements our proposed model
Evaluation of SPECint2000
− We reduce average power by Geo.Mean of 22%
− Average slow down of 2.6%
− ED^2 reduced on average 15%
Characterize task criticality composition of different applications
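
As a rough sanity check on how these three figures relate, ED^2 is energy times delay squared, and energy is power times delay. The sketch below plugs in the average power reduction and slowdown quoted on this slide. Treating per-application averages as if they composed directly is an assumption, so the result only approximates the reported ED^2 reduction; the function name is illustrative.

```python
def normalized_ed2(power_ratio: float, delay_ratio: float) -> float:
    """Return ED^2 relative to a baseline, given power and delay ratios.

    Energy = power * delay, so ED^2 = (power * delay) * delay**2.
    """
    energy_ratio = power_ratio * delay_ratio
    return energy_ratio * delay_ratio ** 2


# Figures from the slide: 22% lower power, 2.6% slowdown.
# Assumption: averages are applied directly, which is only an approximation.
ratio = normalized_ed2(power_ratio=1 - 0.22, delay_ratio=1 + 0.026)
print(f"normalized ED^2 = {ratio:.3f}")
```

Under that assumption the normalized ED^2 comes out near 0.84, a reduction of just under 16%, which is in the same range as the 15% stated above.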

Task-Level Criticality Model
Model execution at the level of task events
− Events of interest: spawn, commit, squash, ...
− Keeps overhead low compared to instruction-level schemes
Seamlessly handle a variety of SM systems
− In-order vs. out-of-order spawns
− Scheduling mechanisms
    Round robin
    First available core
Efficient hardware implementation

Lifetime of SM Task

Criticality Graph Summary
Nodes
− Stages of the task's execution
    Start, Execute, Finish/Spawn, Commit
Edges
− Transitioning between states in a single task
− Events between tasks
    Spawn, squash, commit, freeing a resource, wait to become safe
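
The node and edge description above can be sketched as a small data structure. This is a minimal software illustration, not the hardware design: the class names, the latency weights, and the choice of which stage each inter-task edge attaches to are all assumptions made for the example.

```python
from collections import defaultdict
from enum import Enum


class Stage(Enum):
    START = "Start"
    EXECUTE = "Execute"
    FINISH_SPAWN = "Finish/Spawn"
    COMMIT = "Commit"


class CriticalityGraph:
    """Nodes are (task, Stage) pairs; edges carry a kind and a latency."""

    def __init__(self):
        # incoming[dst] -> list of (src, kind, latency)
        self.incoming = defaultdict(list)

    def add_edge(self, src, dst, kind, latency):
        self.incoming[dst].append((src, kind, latency))

    def add_task(self, task, latencies):
        """Chain the four stages of one task with intra-task edges.

        latencies: three hypothetical cycle counts, one per transition.
        """
        stages = list(Stage)
        for (src, dst), latency in zip(zip(stages, stages[1:]), latencies):
            self.add_edge((task, src), (task, dst), "intra", latency)


graph = CriticalityGraph()
graph.add_task("A", [1, 10, 1])
graph.add_task("B", [1, 20, 1])
# Assumed attachment points for the inter-task events.
graph.add_edge(("A", Stage.FINISH_SPAWN), ("B", Stage.START), "spawn", 2)
graph.add_edge(("A", Stage.COMMIT), ("B", Stage.COMMIT), "commit", 1)
```

Storing incoming edges per node keeps the later backward traversal simple, since each node can directly enumerate the events it had to wait for.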

CAP Architecture
Build criticality graph in hardware using our model
Dynamically analyze critical path of graph
Make predictions and schedule tasks

CAP in a Multiprocessor System
Task Controller
− Tracks running tasks and their context
Novel components of CAP
− Critical path builder
    Builds path and analyzes graph
− Critical path predictor

CAP Overview
To collect the graph
− TC sends summary of task execution to builder after task commits
− Summary contains summary of important edges
    Who spawned it, who squashed it, etc.
The CPB creates a node in the graph and adds the edges
Analyzing the graph
− Store nodes such that critical path calculation is easy
− Walk graph in reverse to find critical path
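
One way to picture the reverse walk is the sketch below. It assumes nodes are kept in an order where every edge points forward, that each node remembers its last-arriving incoming edge, and that the final node in the list is the last commit. The node names, edge kinds, and latencies are hypothetical; the slides do not specify how the CPB stores or weights edges.

```python
def critical_path(nodes, incoming):
    """Find the critical path by recording last-arriving edges, then walking back.

    nodes: node names ordered so that every edge goes from earlier to later.
    incoming: dict mapping a node to a list of (src, kind, latency) tuples.
    Returns (list of (src, dst, kind) edges on the path, total length).
    """
    arrival = {}
    last_edge = {}
    for node in nodes:
        best = None
        for src, kind, latency in incoming.get(node, []):
            t = arrival[src] + latency
            if best is None or t > best[0]:
                best = (t, src, kind)
        arrival[node] = best[0] if best else 0
        last_edge[node] = best

    # Reverse walk from the final node, following last-arriving edges.
    path = []
    node = nodes[-1]
    while last_edge[node] is not None:
        _, src, kind = last_edge[node]
        path.append((src, node, kind))
        node = src
    path.reverse()
    return path, arrival[nodes[-1]]


# Hypothetical two-task example: A spawns B, and B commits after A.
nodes = ["A.start", "A.execute", "A.finish", "A.commit",
         "B.start", "B.execute", "B.finish", "B.commit"]
incoming = {
    "A.execute": [("A.start", "intra", 1)],
    "A.finish": [("A.execute", "intra", 10)],
    "A.commit": [("A.finish", "intra", 1)],
    "B.start": [("A.finish", "spawn", 2)],
    "B.execute": [("B.start", "intra", 1)],
    "B.finish": [("B.execute", "intra", 20)],
    "B.commit": [("B.finish", "intra", 1), ("A.commit", "commit", 1)],
}
path, length = critical_path(nodes, incoming)
```

In this example the spawn edge from A to B lies on the critical path while the commit edge does not, which is the kind of per-edge outcome a predictor could be trained on.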

Critical Path Predictor
Train using calculated critical path
Record edge-centric information
− Spawn edges
− Squash edges
Use strongly biased edges to control scheduling decisions. For example:
− When Task(A) spawns Task(B), B is likely critical
− When Task(A) squashes Task(C), C becomes critical
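
The idea of acting only on strongly biased edges can be illustrated with saturating counters. This is a sketch under assumptions: the slides do not say how edges are identified or how bias is measured, so the `(kind, site)` key, the 2-bit counter width, and the class name are all hypothetical.

```python
class EdgePredictor:
    """Per-edge saturating counters trained from critical-path outcomes."""

    def __init__(self, bits=2):
        self.max_count = (1 << bits) - 1
        self.table = {}  # (kind, site) -> counter value

    def train(self, kind, site, was_critical):
        """Move the counter toward the observed outcome, saturating at the ends."""
        key = (kind, site)
        count = self.table.get(key, 0)
        if was_critical:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.table[key] = count

    def predicts_critical(self, kind, site):
        """Predict critical only when the counter is saturated (strong bias)."""
        return self.table.get((kind, site), 0) == self.max_count


predictor = EdgePredictor()
for _ in range(3):
    predictor.train("spawn", site=0x400, was_critical=True)
strong = predictor.predicts_critical("spawn", 0x400)

predictor.train("spawn", site=0x400, was_critical=False)
weakened = predictor.predicts_critical("spawn", 0x400)
```

Requiring saturation means a single contrary observation withdraws the prediction, which keeps scheduling decisions tied to edges with a consistent history.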

Scheduling on a CMP
Assume DVFS per core on a CMP
CMP is statically configured among Voltage-frequency (V-f) pairs
Promote critical tasks to high V-f cores
Demote non-critical tasks to low V-f cores
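
The promote and demote policy can be sketched as a core-selection function. The four-core layout, the `"high"`/`"low"` labels, and the fallback to any free core when the preferred kind is busy are assumptions for illustration; the slides state only the promotion and demotion rules.

```python
def pick_core(cores, busy, is_critical):
    """Choose a core for a new task based on its predicted criticality.

    cores: dict mapping core id to "high" or "low" (its static V-f setting).
    busy: set of core ids that are currently occupied.
    Returns a core id, or None if every core is busy.
    """
    preferred = "high" if is_critical else "low"
    free = [core for core in sorted(cores) if core not in busy]
    for core in free:
        if cores[core] == preferred:
            return core
    # Assumed fallback: take any free core rather than stall the task.
    return free[0] if free else None


# Hypothetical CMP with two high V-f cores and two low V-f cores.
cores = {0: "high", 1: "high", 2: "low", 3: "low"}
critical_choice = pick_core(cores, busy=set(), is_critical=True)
noncritical_choice = pick_core(cores, busy=set(), is_critical=False)
fallback_choice = pick_core(cores, busy={0, 1}, is_critical=True)
```

Here a critical task lands on core 0, a non-critical task on core 2, and a critical task arriving while both fast cores are busy falls back to core 2.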

Evaluation Setup
SPECint2000 applications
− Optimized for SM using POSH compiler [PPoPP'06]
Two V-f settings
3 Static CMP configurations
− 3-Crit, 2-Crit, 1-Crit

Normalized Execution Time
Moving to fewer fast cores has small impact on performance
− Only 2.2% for 2-Crit!

Normalized E-D-Squared
Best ED^2 is obtained for 2-Crit
− Average reduction of 16.2%
− Max reduction of 57.5%

Conclusions
SM can be power efficient
Efficiently modeled task-level criticality in hardware
Criticality analysis successfully schedules tasks for power efficiency
− Average performance loss of only 2.2%
− ED^2 reduction of 16.6% on average