Framework and Design Methodology of a Compiler that
Compresses Code using Echo Instructions
Philip Brisk, Majid Sarrafzadeh
Embedded and Reconfigurable Systems Lab, Computer Science Department
University of California, Los Angeles
Outline
• Introduction
• Echo Instructions
• Compiler Framework
• Experimental Results
• Conclusion
Introductory Example: The HP DeskJet 820C Digital Controller
• Total chip area is 81 mm²
• ROM consumes 14% of total die area
• Reduce Code Size
→ Reduce ROM Size
→ Reduce Chip Area
→ Reduce Heat Dissipation and Power Consumption
“… the foremost consideration … was the final cost to the buyer.”
[McWilliams, 1997]
LZ77 Compression [Ziv and Lempel, 1977]
• Replace Repeated Substrings with Pointers of the form (Offset, Length)
Example: ABCDCABCDBABCAA becomes
ABCDC(5, 4)B(7, 3)AA
Echo instructions [Fraser, 2002] offer ISA support for executing LZ77-compressed programs.
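The LZ77 example on this slide can be reproduced with a minimal greedy sketch. One assumption matters: following the echo semantics, each (Offset, Length) pointer counts backwards in the compressed stream from the pointer's own position, which is why (7, 3) reaches back to the leading ABC. The function name `echo_compress`, the greedy longest-match policy, and the rule that a pointer may copy only literals (never other pointers) are choices made for this illustration, not necessarily those of Fraser's scheme.

```python
def echo_compress(text, min_len=2):
    """Greedy LZ77-style compression in which each (offset, length) pointer
    refers back into the already-emitted compressed stream."""
    out = []  # literals (str) and pointers (tuple)
    i = 0
    while i < len(text):
        best_start, best_len = 0, 0
        for start in range(len(out)):
            n = 0
            # Extend the match over literals only; pointers are never copied.
            while (start + n < len(out) and i + n < len(text)
                   and isinstance(out[start + n], str)
                   and out[start + n] == text[i + n]):
                n += 1
            if n > best_len:
                best_start, best_len = start, n
        if best_len >= min_len:
            # The offset is measured from the pointer's own slot in the output.
            out.append((len(out) - best_start, best_len))
            i += best_len
        else:
            out.append(text[i])
            i += 1
    return out


print(echo_compress("ABCDCABCDBABCAA"))
# ['A', 'B', 'C', 'D', 'C', (5, 4), 'B', (7, 3), 'A', 'A']
```

Greedy matching is not guaranteed to give the smallest output; it is used here only because it is short and happens to reproduce the slide's result.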
LZ77 Compression and Echo Instructions
Echo Instructions
Echo(Offset, Length)
1. Branch to PC – Offset; Save PC+1 in register R.
2. Execute the next Length instructions, starting at PC – Offset.
3. Branch to the address in register R
• Replaces Repeated Code Segments in a Program Instruction Stream
• Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism.
• Does not Incur the Overhead Associated with Procedure Calls.
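The three steps can be modeled with a small interpreter. This is a sketch under stated assumptions: offsets are counted in instruction slots rather than bytes, the down-counter `remaining` is a hypothetical stand-in for however the hardware tracks how many echoed instructions are left, and an echo inside an echoed region is rejected rather than modeled. `run_echo` is an illustrative name.

```python
def run_echo(program):
    """Execute literal instructions (str) and Echo(offset, length) tuples,
    returning the literal instructions in the order they are executed."""
    executed = []
    pc = 0
    r = None        # link register R: return address saved by an echo
    remaining = 0   # hypothetical counter: echoed instructions still to run
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, tuple):
            offset, length = instr
            if r is not None:
                raise ValueError("an echo inside an echoed region is not modeled")
            if not 1 <= offset <= pc or length < 1:
                raise ValueError("echo must point backwards and copy at least one instruction")
            r = pc + 1          # step 1: save PC+1 in R ...
            pc -= offset        #         ... and branch to PC - Offset
            remaining = length  # step 2: run the next Length instructions
            continue
        executed.append(instr)
        pc += 1
        if r is not None:
            remaining -= 1
            if remaining == 0:
                pc, r = r, None  # step 3: branch to the address in R
    return executed


stream = ["A", "B", "C", "D", "C", (5, 4), "B", (7, 3), "A", "A"]
print("".join(run_echo(stream)))  # ABCDCABCDBABCAA
```

Run on the compressed stream from the LZ77 example, the interpreter executes exactly the original sequence, with no stack frame and no argument passing: only the return address in R.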
An Example
Three identical copies of a five-instruction sequence, at addresses 100–116, 340–356, and 404–420 (other code, elided as …, lies between them):
100 / 340 / 404: $1 ← $2 + $3
104 / 344 / 408: $11 ← $7 * $8
108 / 348 / 412: $8 ← $7 * $1
112 / 352 / 416: $1 ← $11 / $8
116 / 356 / 420: $1 ← $8 + 1
• Repeating code sequences are replaced with echo instructions.
• Echo instructions are more space-efficient than procedure calls:
• No parameters
• No stack frame
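After compression, the first copy is kept and the other two are each replaced by a single echo (offsets are in bytes; each instruction occupies 4 bytes):
100–116: $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
340: Echo(240, 5), which executes the 5 instructions starting at 340 − 240 = 100
404: Echo(304, 5), which executes the 5 instructions starting at 404 − 304 = 100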
Procedural Abstraction
• Techniques Predate Echo Instructions by 20+ Years
• Replace Repeated Instruction Sequences with Procedure Calls
• Substring Matching [Fraser, 1984]
• Reschedule/Rename [Cooper, 1999] [Lau, 2003]
• Our Approach: Subgraph Isomorphism
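Substring matching looks for instruction sequences that recur verbatim. Production implementations typically use suffix trees; the sketch below swaps in a naive sorted-suffix search for the single longest repeat, which is simpler but quadratic, and it ignores the practical requirement that the chosen occurrences must not overlap. Instructions are plain strings, and the `j L1` / `j L2` filler jumps are hypothetical placeholders.

```python
def longest_repeat(code):
    """Longest run of instructions that occurs at least twice in `code`.
    Suffixes that share a prefix end up adjacent after sorting, so the longest
    repeat is the longest common prefix of some neighbouring pair."""
    suffixes = sorted(range(len(code)), key=lambda i: code[i:])
    best_start, best_len = 0, 0
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while a + n < len(code) and b + n < len(code) and code[a + n] == code[b + n]:
            n += 1
        if n > best_len:
            best_start, best_len = a, n
    return code[best_start:best_start + best_len]


seq = ["$1 = $2 + $3", "$11 = $7 * $8", "$8 = $7 * $1", "$1 = $11 / $8", "$1 = $8 + 1"]
renamed = ["$10 = $5 + $4", "$11 = $9 * $6", "$6 = $9 * $10", "$10 = $11 / $6", "$10 = $6 + 1"]
program = seq + ["j L1"] + renamed + ["j L2"] + seq   # jumps are placeholders
print(longest_repeat(program) == seq)  # True: only the verbatim copies match
```

The renamed copy computes the same thing but shares no instruction text with the others, which is exactly the case that motivates rename, reschedule, and graph-based matching.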
Substring Matching and Reschedule/Rename
Substring matching finds the three identical copies at 100–116, 340–356, and 404–420. It misses variants that differ only in register names (340–356) or in instruction order (404–420):
100–116: $1 ← $2 + $3; $11 ← $7 * $8; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
340–356: $10 ← $5 + $4; $11 ← $9 * $6; $6 ← $9 * $10; $10 ← $11 / $6; $10 ← $6 + 1
404–420: $11 ← $7 * $8; $1 ← $2 + $3; $8 ← $7 * $1; $1 ← $11 / $8; $1 ← $8 + 1
Rename (second copy : first copy): $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11
Reschedule: swapping the first two (independent) instructions of the third copy restores the original order.
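Renaming can be checked by putting each sequence into a canonical form in which registers are renumbered in order of first appearance: two sequences match under a consistent register renaming exactly when their canonical forms are equal. This is a minimal sketch that assumes instructions are (dest, op, src1, src2) tuples; real renaming must additionally respect which registers are live into and out of each copy. `canonical` and the `rN` names are illustrative.

```python
def canonical(seq):
    """Rename registers in order of first appearance, so sequences that differ
    only by a consistent register renaming map to the same form."""
    names = {}

    def rename(operand):
        if isinstance(operand, int):       # constants are left alone
            return operand
        if operand not in names:
            names[operand] = f"r{len(names)}"
        return names[operand]

    out = []
    for dest, op, a, b in seq:
        src_a, src_b = rename(a), rename(b)    # sources are read before dest is written
        out.append((rename(dest), op, src_a, src_b))
    return out


copy1 = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"), ("$8", "*", "$7", "$1"),
         ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]
copy2 = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"), ("$6", "*", "$9", "$10"),
         ("$10", "/", "$11", "$6"), ("$10", "+", "$6", 1)]
copy3 = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"), ("$8", "*", "$7", "$1"),
         ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]

print(canonical(copy1) == canonical(copy2))  # True: renaming suffices
print(canonical(copy1) == canonical(copy3))  # False: copy 3 also needs rescheduling
```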
Subgraph Isomorphism
[Figure: the common data flow graph, with two + nodes, two * nodes, and one / node.]
All 3 Code Sequences have the same Data Flow Graph Representation
Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001].
Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
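A sketch of why all three copies look the same once viewed as graphs: each instruction becomes a node, and each operand becomes an edge from the instruction that last defined it (or a live-in register, or a constant). The check below tests whether two whole DFGs are isomorphic, the simplest special case; finding pattern instances inside a larger DFG, the problem the slide cites [Kastner, 2001] for, is harder. Assumptions of this sketch: (dest, op, src1, src2) tuples, operand order always significant (commutativity is ignored), and illustrative names `build_dfg` and `same_dfg`.

```python
def build_dfg(seq):
    """One node per instruction: (op, operands). An operand is ("node", i) for
    the value produced by instruction i, ("in", reg) for a register live into
    the sequence, or ("const", c)."""
    last_def = {}
    nodes = []
    for dest, op, *srcs in seq:
        operands = []
        for src in srcs:
            if isinstance(src, int):
                operands.append(("const", src))
            elif src in last_def:
                operands.append(("node", last_def[src]))
            else:
                operands.append(("in", src))
        last_def[dest] = len(nodes)
        nodes.append((op, tuple(operands)))
    return nodes


def same_dfg(g1, g2):
    """Backtracking isomorphism test. g1 is visited in program (topological)
    order, so every producer is matched before its consumers."""
    if len(g1) != len(g2):
        return False
    node_map, used = {}, set()

    def match_operands(ops1, ops2, in_map):
        in_map = dict(in_map)                  # live-in register bijection
        for (k1, v1), (k2, v2) in zip(ops1, ops2):
            if k1 != k2:
                return None
            if k1 == "node" and node_map[v1] != v2:
                return None
            if k1 == "const" and v1 != v2:
                return None
            if k1 == "in":
                if v1 in in_map:
                    if in_map[v1] != v2:
                        return None
                elif v2 in in_map.values():
                    return None
                else:
                    in_map[v1] = v2
        return in_map

    def search(i, in_map):
        if i == len(g1):
            return True
        op1, ops1 = g1[i]
        for j, (op2, ops2) in enumerate(g2):
            if j in used or op1 != op2 or len(ops1) != len(ops2):
                continue
            new_map = match_operands(ops1, ops2, in_map)
            if new_map is None:
                continue
            node_map[i] = j
            used.add(j)
            if search(i + 1, new_map):
                return True
            del node_map[i]
            used.discard(j)
        return False

    return search(0, {})


copy1 = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"), ("$8", "*", "$7", "$1"),
         ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]
copy2 = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"), ("$6", "*", "$9", "$10"),
         ("$10", "/", "$11", "$6"), ("$10", "+", "$6", 1)]
copy3 = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"), ("$8", "*", "$7", "$1"),
         ("$1", "/", "$11", "$8"), ("$1", "+", "$8", 1)]

g1 = build_dfg(copy1)
print(copy1 == copy3)                    # False: the text differs
print(same_dfg(g1, build_dfg(copy2)))    # True
print(same_dfg(g1, build_dfg(copy3)))    # True
```

Finding the match is only half the problem: as the slide notes, registers and schedule must then be chosen so that every instance can actually share one instruction sequence.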
Example: 3 DFGs
[Figure: three data flow graphs, G1, G2, and G3, built from +, *, -, >>, and << operations, with nodes numbered within each graph. A pattern of four + operations recurs four times across the three graphs.]
Compression Example: 3 DFGs
[Figure, shown over several animation frames: the recurring + patterns are identified one instance at a time. In the final frame, three instances in G2 and G3 are replaced by echo instructions (E) that refer to the instance kept in G1.]
• Both patterns reference the same instruction sequence.
• Schedule of operations and register usage must be identical.
Register Allocation by Example
• Data dependencies are maintained between patterns
• Spilling values to memory is inevitable where register pressure is high.
• Shuffle (register-to-register move) or spill code reduces the effectiveness of compression.
[Figure: DFG G3 with its pattern instances; the labeled values (A–F, X–Z) are assigned temporary registers T1–T8, drawn from an infinite supply of temporary registers.]
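When several instances share one instruction sequence, every input value must sit in the register the shared code reads it from. If an instance's values live elsewhere, the compiler must first emit shuffle code. The sketch below sequentializes such a parallel copy, a standard technique rather than one taken from these slides; the value-to-register mapping format, the `shuffle_moves` name, and the scratch register `$t` are assumptions. Each returned move is an extra instruction that eats into the savings the echo was meant to deliver.

```python
def shuffle_moves(current, required, temp="$t"):
    """Sequentialize the parallel copy that moves each value from the register
    it occupies (`current`) to the register the shared pattern expects
    (`required`). Returns (dst, src) register moves; each cycle costs one
    extra move through `temp`, which must not appear in either mapping."""
    pending = {required[v]: current[v] for v in required if required[v] != current[v]}
    moves = []
    while pending:
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if ready:
            # Safe: no pending move still needs the old contents of d.
            d = ready[0]
            moves.append((d, pending.pop(d)))
        else:
            # Only cycles remain: park one destination's value in temp
            # and redirect the move that reads it.
            d = next(iter(pending))
            moves.append((temp, d))
            for k, s in list(pending.items()):
                if s == d:
                    pending[k] = temp
    return moves


# The pattern expects x in $1 and y in $2, but this instance has them swapped.
print(shuffle_moves({"x": "$2", "y": "$1"}, {"x": "$1", "y": "$2"}))
# [('$t', '$1'), ('$1', '$2'), ('$2', '$t')]
```

A swap costs three moves, which for a five-instruction pattern already erases most of the benefit of replacing it; this is why the allocator should assign registers with pattern re-use in mind rather than patch mismatches afterwards.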
Compiler Framework
• Challenge
Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions.
• Optimization Strategy
1. Minimize code size.
2. Select the lowest cost memory from a library.
3. Apply performance-enhancing transformations as long as Code Size < Memory Capacity.
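The three-step strategy amounts to a small search, sketched here with loudly hypothetical inputs: the memory names, capacities, and costs, and the (transformation, size increase) pairs, are invented for illustration, and accepting transformations greedily in a fixed order is a simplification, not necessarily the authors' policy. The strict inequality mirrors Code Size < Memory Capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    capacity: int   # bytes
    cost: float     # relative cost (hypothetical units)


def select_memory(code_size, library):
    """Step 2: the cheapest memory whose capacity exceeds the code size."""
    fitting = [m for m in library if code_size < m.capacity]
    if not fitting:
        raise ValueError("no memory in the library can hold the program")
    return min(fitting, key=lambda m: m.cost)


def apply_transforms(code_size, memory, transforms):
    """Step 3: accept each (name, size_increase) transformation in turn,
    as long as Code Size < Memory Capacity still holds afterwards."""
    applied = []
    for name, growth in transforms:
        if code_size + growth < memory.capacity:
            code_size += growth
            applied.append(name)
    return code_size, applied


library = [Memory("rom_8k", 8192, 1.0), Memory("rom_16k", 16384, 1.7)]
mem = select_memory(7000, library)     # suppose step 1 produced 7000 bytes
print(mem.name)                         # rom_8k
print(apply_transforms(7000, mem, [("unroll", 900), ("inline", 500)]))
# (7900, ['unroll'])
```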
Design Overview
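[Figure: compiler flow from IR to Assembly Code through seven numbered stages: Target Independent Optimization, Instruction Selection, Compression Step, Register Allocation, Instruction Scheduling, Memory Selection (drawing on a Memory Library), and Performance Optimization, after which the code is emitted.]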
Implementation Status
• Algorithms Integrated into the Machine SUIF Compiler
• Retargetable: Current Implementation Targets x86 and Alpha
• Alpha selected as our Target
• Instruction Selection via do_gen pass (Machine SUIF)
• Compression Engine implemented successfully.
• Register Allocation and Scheduling are under construction.
• Optimization and Memory Selection will be implemented later.
Compilation Procedure
• Compile a source program to SUIFvm.
• Perform instruction selection for Alpha using the do_gen pass.
• Convert the SUIF IR (a linear list of instructions) to a control/data flow graph (CDFG).
• Compress the CDFG.
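Converting a linear instruction list into a CDFG starts by finding basic blocks. Below is a generic sketch of the control-flow half using the textbook leader rule; it is not Machine SUIF's code, the `Instr` record is a hypothetical stand-in for SUIF instructions, and building each block's data flow graph is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str
    label: Optional[str] = None    # label attached to this instruction
    target: Optional[str] = None   # branch target, if this is a branch
    conditional: bool = False      # a conditional branch can also fall through


def build_cfg(code):
    """Split a non-empty linear instruction list into basic blocks and
    return (blocks, edges), where edges are (from_block, to_block) pairs."""
    label_at = {ins.label: i for i, ins in enumerate(code) if ins.label}
    leaders = {0}                                # the first instruction leads a block
    for i, ins in enumerate(code):
        if ins.target is not None:
            leaders.add(label_at[ins.target])    # a branch target leads a block
            if i + 1 < len(code):
                leaders.add(i + 1)               # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    blocks = [code[s:e] for s, e in zip(starts, ends)]
    block_of = {s: b for b, s in enumerate(starts)}
    edges = set()
    for b, block in enumerate(blocks):
        last = block[-1]
        if last.target is not None:
            edges.add((b, block_of[label_at[last.target]]))
        if (last.target is None or last.conditional) and b + 1 < len(blocks):
            edges.add((b, b + 1))                # fall-through edge
    return blocks, edges


code = [
    Instr("i = 0"),
    Instr("t = i < n", label="loop"),
    Instr("beq t, 0, done", target="done", conditional=True),
    Instr("i = i + 1"),
    Instr("j loop", target="loop"),
    Instr("ret", label="done"),
]
blocks, edges = build_cfg(code)
print([[ins.text for ins in blk] for blk in blocks])
# [['i = 0'], ['t = i < n', 'beq t, 0, done'], ['i = i + 1', 'j loop'], ['ret']]
print(sorted(edges))  # [(0, 1), (1, 2), (1, 3), (2, 1)]
```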
$$\text{Compression Ratio} = \frac{\text{Compressed Code Size}}{\text{Original Code Size}}$$
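A worked reading of the formula: because the compressed size is divided by the original size, a lower ratio means better compression, and the fraction of code removed is one minus the ratio. For Mesa (50.93%) and Adpcm (72.35%), the best and worst results in the charts that follow:

$$1 - 0.5093 = 0.4907 \qquad\qquad 1 - 0.7235 = 0.2765$$

That is, compression removes roughly 49% of Mesa's code and roughly 28% of Adpcm's.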
Compression Results
[Figure: bar chart of initial vs. compressed code size for each benchmark; y-axis: Code Size, 0–16000.]
Benchmark    Compression Ratio
Adpcm        72.35%
Epic         64.60%
G721         71.58%
Gsm          56.23%
Rasta        61.03%
Compilation Time
Benchmark    Compilation Time
Adpcm        0.47 s
Epic         5.68 s
G721         6.21 s
Gsm          62.77 s
Rasta        11.18 s
Compression Results
[Figure: bar chart of initial vs. compressed code size for each benchmark; y-axis: Code Size, 0–180000.]
Benchmark    Compression Ratio
JPEG         60.94%
Mesa         50.93%
Mpeg2        59.21%
Pegwit       59.71%
Pgp          60.29%
Compilation Time
Benchmark    Compilation Time
JPEG         62.92 s
Mesa         402.35 s
Mpeg2        49.33 s
Pegwit       87.21 s
Pgp          57.05 s
Conclusion
• Echo Instructions
• Hardware support for runtime execution of compressed programs.
• Compiler Framework
• Compress IR instead of assembly code
• Compression ratios ranging from 50.93% to 72.35% (lower is better) for 10 MediaBench applications.
• Results do not account for register allocation.
References
• Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999.
• Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984.
• Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002.
• Kastner, R., et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.
• Lau, J., et al. Reducing Code Size with Echo Instructions. CASES, 2003.
• Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997.
• Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis, University of Uppsala, March 2000.
• Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.