constraint-based register allocation and instruction ... · global register allocation with all...
TRANSCRIPT
Constraint-based Register Allocation and
Instruction Scheduling
Roberto Castaneda Lozano – SICS
Mats Carlsson – SICS
Frej Drejhammar – SICS
Christian Schulte – KTH, SICS
CP 2012
Outline
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
2 / 22
Problems in Traditional Register Allocation
Traditional compiler:
front-endinstructionselection
instructionscheduling
registerallocation
sourceprogram
assemblyprogram
Problems:
Interdependencies: staging is suboptimal
NP-hardness: sub-optimal, complex heuristic algorithms
“Lord knows how GCC does register allocationright now”. (Anonymous, GCC Wiki)
3 / 22
Can We Do Better?
1 Potentially optimal code: integration, optimization
2 Simplicity, flexibility: separation of modeling and solving
. . . this sounds like something for CP
previous CP approaches:
scheduling only (Malik et al., 2008)
integrated code generation
scheduling, assignment (Kuchcinski, 2003)
selection, scheduling, allocation (Leupers et al., 1997)
→ limitation: local (cannot handle control flow)
4 / 22
Our Approach
Constraint model that unifies
global register allocation with all essential aspects
register assignment, spilling, coalescing, . . .
instruction scheduling
Based on a novel program representation
Robust code generator based on a problem decomposition
Current code quality: on par with LLVM (state of the art)
5 / 22
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
6 / 22
Liveness and Interference
A temp is live while it might still be used:
t0 ← add . . .
......
jr t0
Two temps interfere if they are live simultaneously
non-interfering temps can share registers
7 / 22
Linear Static Single Assignment Form (LSSA)t0 is global: live in multiple blocks
How to model interference of global temps?
LSSA: decompose global temps into multiple local temps
t1t1 ← add . . .
t2... t3
...
t4jr t4
t1 ≡ t2 t1 ≡ t3
t2 ≡ t4 t3 ≡ t4
Invariant: all temps are local → simple interference model8 / 22
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
9 / 22
Register Assignment as Rectangle Packing
Register Assignment Rectangle Packing
temp live ranges rectangles
temp size rectangle width
interfering temps cannot share registers rectangles cannot overlap
→ based on (Pereira et al., 2008)cycle
0
1
2
3
v0 v1 a0 a1 . . ....
... . . .
t0
t1
t2
t3
11 / 22
Spilling and CoalescingSpilling: saving a temp in memory
Requires copying temps from/to memory
Introduce optional copy instructions:
t1 ← {null, sw, move} t0
which operation implements each copy?
if a copy is inactive (null), its temps are coalesced
cycle
0
1
2
3
v0 v1 a0 a1 . . . ra m1 m2 . . .
...
processor space memory space
. . .
. . .
...
... . . .
t0
t1
null
12 / 22
Global Register Allocation
Link between blocks: congruences
Congruent temps must be assigned the same register:
v0 v1 a0 a1 . . .
cycle
0
1
2
3
t0
t1
t2
v0 v1 a0 a1 . . .
cycle
0
1
2
3
4
t3t4
t5t6
v0 v1 a0 a1 . . .
cycle
0
1
2
t7
t0 ≡ t3t1 ≡ t4
t1 ≡ t7
t6 ≡ t7
t3 ≡ t5t4 ≡ t6
13 / 22
Instruction Scheduling
in which cycle is each instruction issued?
Connection to register allocation: live ranges
Classic scheduling model with:
precedencesresource constraints
14 / 22
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
15 / 22
LSSA Decomposition
Only link between blocks: congruences
instruction cycles?copy operations?register of t0?register of t1?register of t2? instruction cycles?
copy operations?register of t3?register of t4?register of t5?register of t6?
instruction cycles?copy operations?register of t7?
t0 ≡ t3t1 ≡ t4
t1 ≡ t7
t6 ≡ t7
t3 ≡ t5t4 ≡ t6
16 / 22
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
17 / 22
Experiment Setup
86 functions from bzip2 (SPECint 2006 suite)
Selected MIPS32 instructions with LLVM 3.0
Implementation with Gecode 3.7.3
Sequential search on standard desktop machine
18 / 22
Quality of Generated Code vs. LLVM
1001011021031041051061071081091010
1001011021031041051061071081091010
estimated
cycles
(codegenerator)
estimated cycles (LLVM)
b
b
b
b
b
b
b
bb
bb
b
b
b
b
bb
bb
b
bb
b
b
b
b
bb
b
bb
b
b
b
b
b
b
b
bb
b
b
b
b
bb
bb
b
b
b
bb
b
bb
b
bbb
b
b
bb
b
b
b bbb
b
bb
b
bbbbbbb
bb
b
bb
19 / 22
Solving Time
101
102
103
104
105
106
107
108
101 102 103 104
averagesolvingtime(m
s)
instructions
n2
n
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
bb
b b
b
b
b
b
b
b
b
bb
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
bb
b
b
b
b
b
b
b
bb
bb
b
b
b
bb
b
b
b
b
b
b
b
b
b
bb
b
b
b
b
bb
20 / 22
1 Introduction
2 Program Representation
3 Constraint Model
4 Decomposition
5 Evaluation
6 Conclusion
21 / 22
Conclusion
1 Model that unifies:
all essential aspects of register allocationinstruction scheduling
→ state-of-the-art code quality
2 Problem decomposition
→ robust generator for thousands of instructions
Key: tailored problem representation (LSSA form)
Lots of future work:
search heuristics, implied constraints . . .integration with instruction selectionevaluate for other processors and benchmarks
22 / 22