Framework and Design Methodology of a Compiler that Compresses Code using Echo Instructions

Philip Brisk, Majid Sarrafzadeh

Embedded and Reconfigurable Systems Lab

Computer Science Department

University of California, Los Angeles

Outline

• Introduction

• Echo Instructions

• Compiler Framework

• Experimental Results

• Conclusion

Introductory Example: The HP DeskJet 820C Digital Controller

• Total chip area is 81 mm²

• ROM consumes 14% of total die area

• Reduce Code Size → Reduce ROM Size → Reduce Chip Area → Reduce Heat Dissipation and Power Consumption

“… the foremost consideration … was the final cost to the buyer.”

[McWilliams, 1997]

LZ77 Compression [Ziv and Lempel, 1977]

• Replace Repeated Substrings with Pointers

Example: ABCDCABCDBABCAA becomes

ABCDC(5, 4)B(7, 3)AA
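A minimal sketch of how the pointers in this example expand. It assumes each (offset, length) pair counts positions backward in the compressed stream, which is the reading under which the example is consistent and which matches how an echo instruction addresses code. The token list and the `expand` helper are illustrative names, not part of the original.

```python
# Compressed stream: literals are 1-character strings, pointers are (offset, length) tuples.
compressed = ["A", "B", "C", "D", "C", (5, 4), "B", (7, 3), "A", "A"]

def expand(stream):
    """Expand pointers whose offsets count tokens backward in the compressed stream."""
    out = []
    for pos, tok in enumerate(stream):
        if isinstance(tok, tuple):
            offset, length = tok
            start = pos - offset
            # Assumption: a pointer's target holds only literals (true for this example).
            out.extend(stream[start:start + length])
        else:
            out.append(tok)
    return "".join(out)

print(expand(compressed))  # ABCDCABCDBABCAA
```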

Echo Instructions [Fraser, 2002] offer ISA support for Execution of LZ77-compressed programs

LZ77 Compression and Echo Instructions

Echo Instructions

Echo(Offset, Length)

1. Branch to PC – Offset; Save PC+1 in register R.

2. Execute the next Length Instructions

3. Branch to the address in register R
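The three steps can be sketched as a toy interpreter. This is an illustration only: the tuple encoding, the word-indexed program counter, and the `run` helper are assumptions, and echoed regions are assumed to contain no nested echo or branch instructions.

```python
def run(program):
    """Execute a program and return the names of the ordinary instructions executed."""
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "echo":
            _, offset, length = instr
            r = pc + 1               # step 1: save PC+1 in register R
            target = pc - offset     # step 1: branch to PC - Offset
            for i in range(target, target + length):
                trace.append(program[i][1])   # step 2: execute the next Length instructions
            pc = r                   # step 3: branch to the address in R
        else:
            trace.append(instr[1])
            pc += 1
    return trace

program = [("op", "a"), ("op", "b"), ("op", "c"), ("op", "x"), ("echo", 4, 3), ("op", "y")]
print(run(program))  # ['a', 'b', 'c', 'x', 'a', 'b', 'c', 'y']
```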

• Replaces Repeated Code Segments in a Program Instruction Stream

• Augments a MIPS Jump-and-Link (JAL) Instruction with a Parameterized Procedure Return Mechanism.

• Does not Incur the Overhead Associated with Procedure Calls.

An Example

100: $1 ← $2 + $3
104: $11 ← $7 * $8
108: $8 ← $7 * $1
112: $1 ← $11 / $8
116: $1 ← $8 + 1
…
340: $1 ← $2 + $3
344: $11 ← $7 * $8
348: $8 ← $7 * $1
352: $1 ← $11 / $8
356: $1 ← $8 + 1
…
404: $1 ← $2 + $3
408: $11 ← $7 * $8
412: $8 ← $7 * $1
416: $1 ← $11 / $8
420: $1 ← $8 + 1

• Repeating code sequences are replaced with echo instructions.

• Echo instructions are more space efficient than procedure calls

• No parameters

• No stack frame

100: $1 ← $2 + $3
104: $11 ← $7 * $8
108: $8 ← $7 * $1
112: $1 ← $11 / $8
116: $1 ← $8 + 1
…
340: Echo(240, 5)
…
404: Echo(304, 5)
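The offsets in this example are address differences between each repeat and the first copy: 340 − 100 = 240 and 404 − 100 = 304. The sketch below recomputes them. It assumes 4-byte instructions and, as in the example, addresses that are not renumbered after compression; `find_echoes` and the `code` dictionary are hypothetical names.

```python
SEQ = ["$1 <- $2 + $3", "$11 <- $7 * $8", "$8 <- $7 * $1",
       "$1 <- $11 / $8", "$1 <- $8 + 1"]

# address -> instruction text; gaps between blocks stand for the elided code
code = {}
for base in (100, 340, 404):
    for i, text in enumerate(SEQ):
        code[base + 4 * i] = text

def find_echoes(code, first, length, width=4):
    """Return {address: (offset, length)} for later copies of the sequence at `first`."""
    pattern = [code[first + width * i] for i in range(length)]
    echoes = {}
    for addr in sorted(code):
        if addr <= first:
            continue
        window = [code.get(addr + width * i) for i in range(length)]
        if window == pattern:
            echoes[addr] = (addr - first, length)
    return echoes

print(find_echoes(code, first=100, length=5))  # {340: (240, 5), 404: (304, 5)}
```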

Procedural Abstraction

• Techniques Predate Echo Instructions by 20+ Years

• Replace Repeated Instruction Sequences with Procedure Calls

• Substring Matching [Fraser, 1984]

• Reschedule/Rename [Cooper, 1999] [Lau, 2003]

• Our Approach: Subgraph Isomorphism

Substring Matching and Reschedule/Rename

100: $1 ← $2 + $3
104: $11 ← $7 * $8
108: $8 ← $7 * $1
112: $1 ← $11 / $8
116: $1 ← $8 + 1
…
340: $1 ← $2 + $3
344: $11 ← $7 * $8
348: $8 ← $7 * $1
352: $1 ← $11 / $8
356: $1 ← $8 + 1
…
404: $1 ← $2 + $3
408: $11 ← $7 * $8
412: $8 ← $7 * $1
416: $1 ← $11 / $8
420: $1 ← $8 + 1

100: $1 ← $2 + $3
104: $11 ← $7 * $8
108: $8 ← $7 * $1
112: $1 ← $11 / $8
116: $1 ← $8 + 1
…
340: $10 ← $5 + $4
344: $11 ← $9 * $6
348: $6 ← $9 * $10
352: $10 ← $11 / $6
356: $10 ← $6 + 10
…
404: $11 ← $7 * $8
408: $1 ← $2 + $3
412: $8 ← $7 * $1
416: $1 ← $11 / $8
420: $1 ← $8 + 1

Rename (block at 340): $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11

Reschedule (block at 404)

Subgraph Isomorphism

+

*

*

+

/

All 3 Code Sequences have the same Data Flow Graph Representation

Subgraph Isomorphism Techniques Identify Repeated Pattern Instances [Kastner, 2001].

Register Allocation and Scheduling must be reformulated to Optimize Pattern Re-Use.
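A rough sketch of why the three code sequences share one data flow graph. It compares operator structure only: live-in registers and immediates are treated as anonymous leaves, so register names, immediate values, and instruction order drop out. This is a simplification for illustration, not the subgraph isomorphism technique cited above; `dfg_signature` and the `=` notation are assumptions.

```python
import re

BLOCKS = {
    100: ["$1 = $2 + $3", "$11 = $7 * $8", "$8 = $7 * $1", "$1 = $11 / $8", "$1 = $8 + 1"],
    340: ["$10 = $5 + $4", "$11 = $9 * $6", "$6 = $9 * $10", "$10 = $11 / $6", "$10 = $6 + 10"],
    404: ["$11 = $7 * $8", "$1 = $2 + $3", "$8 = $7 * $1", "$1 = $11 / $8", "$1 = $8 + 1"],
}

def dfg_signature(block):
    """Return a sorted list holding one structural signature per operation."""
    producer = {}  # register -> signature of the operation that last wrote it
    sigs = []
    for line in block:
        dest, a, op, b = re.fullmatch(r"(\S+) = (\S+) (\S) (\S+)", line).groups()
        # Operands not produced inside the block become anonymous "in" leaves.
        left = producer.get(a, "in")
        right = producer.get(b, "in")
        sig = f"{op}({left},{right})"
        producer[dest] = sig
        sigs.append(sig)
    return sorted(sigs)

signatures = {addr: dfg_signature(block) for addr, block in BLOCKS.items()}
print(signatures[100] == signatures[340] == signatures[404])  # True
```

Comparing sorted signature lists ignores how results are shared between operations, so a real pattern matcher needs a full graph comparison.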

100: $1 ← $2 + $3
104: $11 ← $7 * $8
108: $8 ← $7 * $1
112: $1 ← $11 / $8
116: $1 ← $8 + 1
…
340: $10 ← $5 + $4
344: $11 ← $9 * $6
348: $6 ← $9 * $10
352: $10 ← $11 / $6
356: $10 ← $6 + 10
…
404: $11 ← $7 * $8
408: $1 ← $2 + $3
412: $8 ← $7 * $1
416: $1 ← $11 / $8
420: $1 ← $8 + 1

Example: 3 DFGs

[Figure: three data flow graphs, G1, G2, and G3, built from +, *, -, >> and << operations with numbered nodes.]

Compression Example: 3 DFGs

[Figure sequence: the same three DFGs, G1, G2, and G3. Recurring patterns are identified and repeated instances are replaced by echo nodes (E).]

• Both patterns reference the same instruction sequence.

• Schedule of operations and register usage must be identical.

Register Allocation by Example

• Data dependencies are maintained between patterns

• Spilling values to memory is inevitable where register pressure is high.

• Shuffle or spill code reduces the effectiveness of compression

[Figure: register allocation for G3. Value labels: A, B, C, D, E, F, X, Y, Z. Pattern operations use temporary registers T1 to T8. Temporary Registers (Infinite Supply).]

Compiler Framework

• Challenge

Design a Compiler that Minimizes Code Size for Architectures Augmented with Echo Instructions.

• Optimization Strategy

1. Minimize code size.

2. Select the lowest cost memory from a library.

3. Apply performance enhancing transformations as long as: Code Size < Memory Capacity.
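Steps 2 and 3 of the strategy can be sketched as follows. Everything specific here is hypothetical: the memory library entries, their costs, the transformation names, and the code growth each one causes. The greedy order is a design choice of the sketch, not something the original specifies.

```python
def select_memory(code_size, library):
    """Pick the cheapest memory with Code Size < Capacity, or None if none fits."""
    fitting = [m for m in library if code_size < m["capacity"]]
    return min(fitting, key=lambda m: m["cost"]) if fitting else None

def apply_transforms(code_size, capacity, transforms):
    """Apply each transformation only if Code Size < Memory Capacity still holds."""
    applied = []
    for name, growth in transforms:
        if code_size + growth < capacity:
            code_size += growth
            applied.append(name)
    return code_size, applied

# Hypothetical memory library (capacity in bytes, relative cost).
library = [
    {"name": "rom_8k", "capacity": 8192, "cost": 1.0},
    {"name": "rom_16k", "capacity": 16384, "cost": 1.8},
]
# Hypothetical transformations with the code growth (bytes) each would cause.
transforms = [("unroll_loop_a", 900), ("inline_call_b", 400), ("unroll_loop_c", 150)]

memory = select_memory(7000, library)   # step 2, after step 1 minimized code size
final_size, applied = apply_transforms(7000, memory["capacity"], transforms)  # step 3
print(memory["name"], final_size, applied)
```

With these made-up numbers the 8 KB ROM is selected, and the second transformation is skipped because it would push the code past the capacity.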

Design Overview

Performance Optimization

Target Independent Optimization

Instruction Selection

Compression Step

Register Allocation

Instruction Scheduling

Memory Selection

IR

MemoryLibrary

Assembly Code

1

2

3

4

5

6

7emit

Implementation Status

• Algorithms Integrated into the Machine SUIF Compiler

• Retargetable: Current Implementation Targets x86 and Alpha

• Alpha selected as our Target

• Instruction Selection via do_gen pass (Machine SUIF)

• Compression Engine implemented successfully.

• Register Allocation and Scheduling are under construction.

• Optimization and Memory Selection will be implemented later.

Compilation Procedure

• Compile a source program to SUIFvm.

• Perform instruction selection for Alpha using the do_gen pass.

• Convert the SUIF IR (a linear list of instructions) to CDFG.

• Compress the CDFG.

Compression Ratio = (Compressed Code Size) / (Original Code Size)

Compression Results

[Chart: initial vs. compressed code size (y-axis: Code Size, 0 to 16000) for Adpcm, Epic, G721, Gsm, and Rasta. Bar labels: 72.35%, 64.60%, 71.58%, 56.23%, 61.03%.]

Compilation Time

[Chart: initial vs. compressed code size (y-axis: Code Size, 0 to 16000) for Adpcm, Epic, G721, Gsm, and Rasta. Bar labels: 0.47s, 5.68s, 6.21s, 62.77s, 11.18s.]

Compression Results

[Chart: initial vs. compressed code size (y-axis: Code Size, 0 to 180000) for JPEG, Mesa, Mpeg2, Pegwit, and Pgp. Bar labels: 60.94%, 50.93%, 59.21%, 59.71%, 60.29%.]

Compilation Time

[Chart: initial vs. compressed code size (y-axis: Code Size, 0 to 180000) for JPEG, Mesa, Mpeg2, Pegwit, and Pgp. Bar labels: 62.92s, 402.35s, 49.33s, 87.21s, 57.05s.]

Conclusion

• Echo Instructions

• Hardware support for runtime execution of compressed programs.

• Compiler Framework

• Compress IR instead of assembly code

• Compression ratios ranging from 72.35% to 50.93% for 10 MediaBench applications.

• Results do not account for register allocation.

References

• Cooper, K. and McIntosh, N. Enhanced Code Compression for Embedded RISC Processors. PLDI, 1999.

• Fraser, C. W., Myers, E. W., and Wendt, A. Analyzing and Compressing Assembly Code. SCC, 1984.

• Fraser, C. W. An Instruction for Direct Interpretation of LZ77-compressed Programs. Microsoft Tech. Report, 2002.

• Kastner, R. et al. Instruction Generation for Hybrid-Reconfigurable Systems. ICCAD, 2001.

• Lau, J. et al. Reducing Code Size with Echo Instructions. CASES, 2003.

• Lee, C., Potkonjak, M., and Mangione-Smith, W. H. MediaBench: A Tool for Evaluating Multimedia and Communication Systems. MICRO, 1997.

• Runeson, J. Code Compression through Procedural Abstraction before Register Allocation. Master’s Thesis. University of Uppsala, March, 2000.

• Ziv, J. and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Information Theory, May 1977.
