graph-based procedural abstraction
Programming Systems Group, Computer Science Department 2, University of Erlangen-Nuremberg, Germany
www2.cs.fau.de
Graph-Based Procedural Abstraction
A. Dreweke, M. Wörlein, D. Schell,
T. Meinl, I. Fischer, M. Philippsen
© Alexander Dreweke, Computer Science Department 2 – Programming Systems Group, University of Erlangen-Nuremberg, Germany
embedded systems
• cost and energy consumption depend on the size of the built-in memory, so the amount of memory is limited
• more and more functionality is packed on embedded systems, so memory must be used more efficiently
procedural abstraction reduces code size by extracting duplicate code segments
procedural abstraction
post link-time optimization of static binaries:
+ whole program code, including all libraries
+ function prolog and epilog
+ constant address calculations
- precise control flow must be reconstructed
  – offset tables
  – register indirect jumps
[Figure: processing pipeline from binary to optimized binary with the stages preprocessor, duplicate search, candidate selection, extraction, postprocessor; duplicate search and candidate selection appear twice in the figure]
procedural abstraction (suffix tree)
• textual matching of instruction sequences
• frequent instruction sequences are taken from the suffix tree
• various optimizations:
  – special treatment for labels, jumps, …
  – fingerprinting
  – canonic register mapping
  – …
but fundamental suffix tree matching problem persists
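The textual matching described above can be sketched as follows. This is only an illustration: it enumerates all instruction windows instead of building a suffix tree (same repeats, worse running time, overlapping occurrences are not filtered), and the names `repeated_sequences`, `block_a` and `block_b` are made up for this example. The instruction strings are taken from the example listing used in these slides.

```python
from collections import defaultdict

def repeated_sequences(instructions, min_len=2):
    """Map every instruction sequence that occurs at least twice to its start indices.

    Stand-in for a suffix tree: all windows of all lengths are enumerated,
    which finds the same textual repeats but is much slower.
    """
    positions = defaultdict(list)
    n = len(instructions)
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            window = tuple(instructions[start:start + length])
            positions[window].append(start)
    return {seq: starts for seq, starts in positions.items() if len(starts) >= 2}

# Same six instructions in two different (but equivalent) orders.
block_a = ["sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
           "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4"]
block_b = ["sub r2, r2, r3", "load r3, 0x10710", "add r4, r2, 0x4",
           "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x1071c"]

program = (["add r2, r1, 0x42"] + block_a +
           ["mul r2, r1, 0x5"] + block_a +
           ["sub r3, r2, 0x42"] + block_b)

repeats = repeated_sequences(program)
longest = max(repeats, key=len)
print(len(longest), repeats[longest])
```

The longest repeat is `block_a`, found at its two textual occurrences; the reordered `block_b` is not recognized as a duplicate, which is the matching problem the last line of the slide refers to.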
duplicate search (suffix tree)
...
2000: add r2, r1, 0x42
2004: sub r2, r2, r3
2008: add r4, r2, 0x4
200c: load r3, 0x10710
2010: sub r2, r2, r3
2014: load r3, 0x1071c
2018: add r4, r2, 0x4
...
2504: mul r2, r1, 0x5
2508: sub r2, r2, r3
250c: add r4, r2, 0x4
2510: load r3, 0x10710
2514: sub r2, r2, r3
2518: load r3, 0x1071c
251c: add r4, r2, 0x4
...
...
3118: div r3, r2, r1
311c: sub r2, r2, r3
3120: add r4, r2, 0x4
3124: load r3, 0x10710
3128: sub r2, r2, r3
312c: load r3, 0x1071c
3130: add r4, r2, 0x4
...
400c: sub r3, r2, 0x42
4010: sub r2, r2, r3
4014: load r3, 0x10710
4018: add r4, r2, 0x4
401c: sub r2, r2, r3
4020: add r4, r2, 0x4
4024: load r3, 0x1071c
...
extraction (suffix tree)
...
2000: add r2, r1, 0x42
2004: call 0x5070
...
2504: mul r2, r1, 0x5
2508: call 0x5070
...
3118: div r3, r2, r1
311c: call 0x5070
...
400c: sub r3, r2, 0x42
4010: sub r2, r2, r3
4014: load r3, 0x10710
4018: add r4, r2, 0x4
401c: sub r2, r2, r3
4020: add r4, r2, 0x4
4024: load r3, 0x1071c
...
5070: sub r2, r2, r3
5074: load r3, 0x10710
5078: add r4, r2, 0x4
507c: sub r2, r2, r3
5080: add r4, r2, 0x4
5084: load r3, 0x1071c
5088: return
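A minimal sketch of the extraction step, assuming the program is a flat list of instruction strings and ignoring addresses, register liveness and control flow. The function name `extract` and the list representation are assumptions for illustration; only the call target `0x5070` and the instructions come from the listing.

```python
def extract(program, fragment, target="0x5070"):
    """Replace each non-overlapping textual occurrence of `fragment` by a call.

    Returns the rewritten code and the new procedure (fragment plus return).
    """
    k = len(fragment)
    rewritten, i = [], 0
    while i < len(program):
        if program[i:i + k] == fragment:
            rewritten.append(f"call {target}")
            i += k                      # skip the replaced instructions
        else:
            rewritten.append(program[i])
            i += 1
    procedure = list(fragment) + ["return"]
    return rewritten, procedure

fragment = ["sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
            "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4"]
program = (["add r2, r1, 0x42"] + fragment +
           ["mul r2, r1, 0x5"] + fragment +
           ["div r3, r2, r1"] + fragment)

main, procedure = extract(program, fragment)
saved = len(program) - (len(main) + len(procedure))
print(main)
print(saved)
```

With three occurrences of a six-instruction fragment, the 21 original instructions shrink to 6 in the main code plus 7 in the procedure.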
candidate selection (iterative greedy)
[Figure: example of 21 instructions made up of fragments of 4, 3, 3, 4, 4 and 3 instructions; candidates of 3, 4 and 7 instructions]
extraction benefit: L · (N – 1) – (N + 1) > 0
L: code length
N: # of occurrences
(N + 1): N call instructions plus one ret
• step 1, 7-instruction candidate: 7 · (2 – 1) – (2 + 1) = 4 > 0, code size 21 → 17
• step 2, 4-instruction candidate: 4 · (2 – 1) – (2 + 1) = 1 > 0, code size 17 → 16
• step 3, 3-instruction candidate: 3 · (2 – 1) – (2 + 1) = 0, no benefit, code size stays at 16
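The benefit formula and the three greedy steps can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name `extraction_benefit` is an assumption, and each candidate is taken to occur twice (N = 2) as in the example.

```python
def extraction_benefit(length, occurrences):
    """Instructions saved by extracting a fragment of `length` instructions
    that occurs `occurrences` times: L * (N - 1) - (N + 1).
    The (N + 1) term is the cost of N calls plus one ret."""
    return length * (occurrences - 1) - (occurrences + 1)

size = 21                      # instructions in the example
history = [size]
for length in (7, 4, 3):       # candidates in the order the slide extracts them
    gain = extraction_benefit(length, 2)
    if gain > 0:               # only extract when it actually saves code
        size -= gain
    history.append(size)

print(history)
```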
saved instructions (absolute values)
[Bar chart: # instructions saved (axis 0 to 100) per program: bitcnts, crc, dijkstra, patricia, qsort, rijndael, search, sha]
really small input binaries: gcc -Os, dietlibc linked
MiBench programs on ARM
saved instructions (relative values)
[Bar chart: % improvement (axis 0.0 to 4.5) per program: bitcnts, crc, dijkstra, patricia, qsort, rijndael, search, sha]
really small input binaries: gcc -Os, dietlibc linked
MiBench programs on ARM
good savings, still not optimal
procedural abstraction (graph-based)
• transform instruction sequences into minimal data flow graphs (DFG)
• search for frequent subgraphs in DFGs
sub r2, r2, r3
add r4, r2, 0x4
load r3, 0x10710
sub r2, r2, r3
load r3, 0x1071c
add r4, r2, 0x4
[Figure: data flow graph (DFG) of the instruction sequence above; nodes are labeled add, sub and load]
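A rough sketch of why a data-flow view is insensitive to instruction order. It assumes instructions of the form "op dst, src, ..." with register names starting with "r", tracks only true (read-after-write) dependencies, and compares multisets of per-instruction signatures instead of running a real subgraph-isomorphism test. The function name `dfg_signatures` and the third block `block_c` are assumptions made for this illustration.

```python
from collections import Counter

def dfg_signatures(instructions):
    """Return a multiset with one data-flow signature per instruction.

    A signature is (opcode, inputs), where each register input is replaced by
    the signature of the instruction that produced it. Memory and
    anti-dependencies are ignored in this simplified model.
    """
    last_def = {}                       # register -> signature of its last writer
    signatures = []
    for text in instructions:
        op, rest = text.split(None, 1)
        dst, *srcs = [field.strip() for field in rest.split(",")]
        inputs = []
        for src in srcs:
            if src.startswith("r"):     # register operand
                inputs.append(last_def.get(src, "in:" + src))
            else:                       # immediate or address
                inputs.append(src)
        sig = (op, tuple(inputs))
        last_def[dst] = sig
        signatures.append(sig)
    return Counter(signatures)

block_a = ["sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x10710",
           "sub r2, r2, r3", "load r3, 0x1071c", "add r4, r2, 0x4"]
block_b = ["sub r2, r2, r3", "load r3, 0x10710", "add r4, r2, 0x4",
           "sub r2, r2, r3", "add r4, r2, 0x4", "load r3, 0x1071c"]
# Hypothetical third block: the first load is moved in front of the first sub,
# which changes the value the sub consumes.
block_c = [block_a[2], block_a[0], block_a[1]] + block_a[3:]

same_ab = dfg_signatures(block_a) == dfg_signatures(block_b)
same_ac = dfg_signatures(block_a) == dfg_signatures(block_c)
print(same_ab, same_ac)
```

Equal signature multisets show that both orderings perform the same computations; they are not a full proof of equivalence, since the final register contents are not compared.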
duplicate search (graph-based)
...
2000: add r2, r1, 0x42
2004: sub r2, r2, r3
2008: add r4, r2, 0x4
200c: load r3, 0x10710
2010: sub r2, r2, r3
2014: load r3, 0x1071c
2018: add r4, r2, 0x4
...
2504: mul r2, r1, 0x5
2508: sub r2, r2, r3
250c: add r4, r2, 0x4
2510: load r3, 0x10710
2514: sub r2, r2, r3
2518: load r3, 0x1071c
251c: add r4, r2, 0x4
...
...
3118: div r3, r2, r1
311c: sub r2, r2, r3
3120: add r4, r2, 0x4
3124: load r3, 0x10710
3128: sub r2, r2, r3
312c: load r3, 0x1071c
3130: add r4, r2, 0x4
...
400c: sub r3, r2, 0x42
4010: sub r2, r2, r3
4014: load r3, 0x10710
4018: add r4, r2, 0x4
401c: sub r2, r2, r3
4020: add r4, r2, 0x4
4024: load r3, 0x1071c
...
extraction (graph-based)
...
5070: sub r2, r2, r3
5074: load r3, 0x10710
5078: add r4, r2, 0x4
507c: sub r2, r2, r3
5080: add r4, r2, 0x4
5084: load r3, 0x1071c
5088: return
...
2000: add r2, r1, 0x42
2004: call 0x5070
...
2504: mul r2, r1, 0x5
2508: call 0x5070
...
3118: div r3, r2, r1
311c: call 0x5070
...
400c: sub r3, r2, 0x42
4010: call 0x5070
...
search lattice
[Figure: search lattice rooted at *, in which fragments built from sub, add and load nodes grow node by node]
graph miner (procedural abstraction extensions)
• pruning necessary because of the size of the search lattice
• number of occurrences must not increase with growing subgraph size (required for frequency-based pruning)
• calculate the maximal independent set (MIS) of subgraphs to make pruning possible again
[Figure: fragments built from load, sub and add nodes, annotated with #occurrences: 1, #occurrences: 2, #occurrences: 1]
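One way to count only non-overlapping occurrences is to pick an independent set of embeddings, sketched below. The slides do not say how the authors compute the MIS; this greedy version yields a maximal (not necessarily maximum) set, and the node ids in the example are hypothetical.

```python
def independent_occurrences(embeddings):
    """Greedily select embeddings (sets of DFG node ids) that share no node.

    The result is a maximal independent set: no further embedding can be
    added without overlapping an already chosen one.
    """
    chosen, used = [], set()
    for nodes in sorted(embeddings, key=sorted):   # deterministic order
        if used.isdisjoint(nodes):
            chosen.append(nodes)
            used |= nodes
    return chosen

# Hypothetical embeddings of a "sub -> add" fragment: the first two share
# the sub node 0, so they may only be counted once.
embeddings = [frozenset({0, 1}), frozenset({0, 2}), frozenset({3, 4})]
chosen = independent_occurrences(embeddings)
print(len(embeddings), len(chosen))
```

Counting the independent embeddings (2) instead of all embeddings (3) keeps the occurrence count from growing when a fragment is extended, which is what frequency-based pruning relies on.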
graph miner (procedural abstraction extensions)
• invalid subgraph pruning during candidate selection
[Figure: example DFG with add, sub and load nodes, and a fragment with load, add and call nodes]
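candidate selection (optimal)
[Figure: the 21-instruction example (fragments of 4, 3, 3, 4, 4 and 3 instructions) under both selection strategies]
• greedy iterative: =16
• optimum: =15 (six calls plus two procedures of 4 and 3 instructions, each ending in ret)
• collisions between candidates: shown in the figure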
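The optimum of 15 can be reproduced with a small brute-force search. This sketch makes several assumptions: the example is modeled as the block string "ABBAAB" (A = 4 instructions, B = 3 instructions), the candidates are "A", "B" and "AB", chosen occurrences must not overlap (no collisions), and nested calls between extracted procedures are not modeled. All names are made up for this illustration.

```python
from itertools import combinations

BLOCK_SIZE = {"A": 4, "B": 3}          # instructions per block, as in the figure

def occurrences(sequence, pattern):
    """All (pattern, start, end) positions of `pattern` in `sequence`."""
    k = len(pattern)
    return [(pattern, i, i + k) for i in range(len(sequence) - k + 1)
            if sequence[i:i + k] == pattern]

def saving(selection):
    """Sum of L * (N - 1) - (N + 1) over all patterns used in the selection."""
    total = 0
    for pattern in {p for p, _, _ in selection}:
        n = sum(1 for p, _, _ in selection if p == pattern)
        length = sum(BLOCK_SIZE[b] for b in pattern)
        total += length * (n - 1) - (n + 1)
    return total

def overlaps(selection):
    """True if two chosen occurrences collide (cover the same block)."""
    spans = sorted((start, end) for _, start, end in selection)
    return any(prev_end > start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))

def best_selection(sequence, patterns):
    """Try every subset of occurrences and keep the collision-free best one."""
    all_occ = [o for p in patterns for o in occurrences(sequence, p)]
    best, best_saving = (), 0
    for r in range(1, len(all_occ) + 1):
        for selection in combinations(all_occ, r):
            if not overlaps(selection) and saving(selection) > best_saving:
                best, best_saving = selection, saving(selection)
    return best, best_saving

sequence = "ABBAAB"
total = sum(BLOCK_SIZE[b] for b in sequence)
best, best_saving = best_selection(sequence, ["A", "B", "AB"])
print(total, best_saving, total - best_saving)
```

Under these assumptions the best choice extracts A and B three times each and saves 6 instructions. The greedy result of 16 on the slide relies on a nested call, which this flat model does not cover, and exhaustive enumeration is only feasible for tiny examples.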
procedural abstraction (graph-based)
Pro
• no special treatment of branches and labels
• resistant to instruction reordering
• can be used to extract general code fragments, not limited to basic blocks or single-entry single-exit regions
Con
• subgraph-isomorphism test is NP-complete
• extremely large search lattice (exponential in time and memory usage)
saved instructions (absolute values)
[Bar chart: # instructions saved (axis 0 to 300), suffix tree vs. graph based, per program: bitcnts, crc, dijkstra, patricia, qsort, rijndael, search, sha]
really small input binaries: gcc -Os, dietlibc linked
MiBench programs on ARM
saved instructions (relative values)
[Bar chart: % improvement (axis 0.0 to 4.5), suffix tree vs. graph based, per program: bitcnts, crc, dijkstra, patricia, qsort, rijndael, search, sha]
really small input binaries: gcc -Os, dietlibc linked
MiBench programs on ARM
optimization time (sec.)
[Bar chart: time (sec.) (axis 0 to 140), suffix tree vs. graph based, per program: bitcnts, crc, dijkstra, patricia, qsort, rijndael, search, sha; one off-scale bar is labeled 4h 20m]
really small input binaries: gcc -Os, dietlibc linked
MiBench programs on ARM
future work
• increase number of identified duplicate candidates
  – extend search areas from basic blocks to function and whole program
  – canonic register mapping
• speed up duplicate search
  – further parallelize graph search
  – more procedural abstraction specific pruning rules to limit search lattice
summary
• procedural abstraction with DFGs results in more compact code:
  – graph-based mining saves up to 2.6 times more instructions than the traditional approaches
• interesting for embedded systems (huge volumes)
  – long optimization times affordable because of price per piece
  – overnight or over the weekend optimization of code during the development process
  – every saved bit counts