OPTIMIZING DSP SCHEDULING VIA ADDRESS ASSIGNMENT WITH ARRAY AND LOOP TRANSFORMATION

Chun Xue, Zili Shao, Ying Chen, Edwin H.-M. Sha

Department of Computer Science
University of Texas at Dallas

Post on 18-Jan-2016

Background

• DSP processors generally provide dedicated address generation units (AGUs)

• AGUs can reduce address arithmetic instructions by modifying an address register in parallel with the current instruction

– Three modes: auto-increment, auto-decrement, and using a modify register

Goal

• Unlike traditional compilers, DSP compilers should carefully determine the relative location of data in memory to achieve compact object code and improved performance.

• We propose a scheme to exploit Address Assignment and scheduling for loops.

Contribution

• The proposed algorithm, AIRLS, uses rotation scheduling, address assignment, and array transformation to minimize both address instructions and schedule length.

• Compared to list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and an average reduction of 38.3% in address instructions.

AGU Example

To calculate: C = A + B

Memory layout: A, B, and C in adjacent locations, with AR0 pointing to A

Assembly code without AGU:

    Load *(AR0)
    Adar AR0, 1
    Add *(AR0)
    Adar AR0, 1
    Stor *(AR0)

Assembly code with AGU:

    Load *(AR0)+
    Add *(AR0)+
    Stor *(AR0)

• The AGU reduces address arithmetic instructions by modifying the address register in parallel with the current instruction.
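
The two listings above can be compared with a toy interpreter. This is only a sketch: the machine model, the `run` function, and the sample values A=3, B=4 are assumptions made for illustration, not a real DSP instruction set.

```python
# Toy model of the example above (an illustrative assumption, not a real DSP ISA).
def run(program, memory, ar0=0):
    """Execute (opcode, operand) pairs; return (memory, explicit address instructions)."""
    acc = 0
    address_instrs = 0
    for op, arg in program:
        post_inc = arg == "*(AR0)+"      # AGU auto-increment addressing
        if op == "Load":
            acc = memory[ar0]
        elif op == "Add":
            acc += memory[ar0]
        elif op == "Stor":
            memory[ar0] = acc
        elif op == "Adar":               # explicit address arithmetic
            ar0 += arg
            address_instrs += 1
        if post_inc:
            ar0 += 1                     # done by the AGU in parallel, no extra instruction
    return memory, address_instrs

without_agu = [("Load", "*(AR0)"), ("Adar", 1), ("Add", "*(AR0)"),
               ("Adar", 1), ("Stor", "*(AR0)")]
with_agu = [("Load", "*(AR0)+"), ("Add", "*(AR0)+"), ("Stor", "*(AR0)")]

mem_a, n_a = run(without_agu, [3, 4, 0])   # memory holds A, B, C
mem_b, n_b = run(with_agu, [3, 4, 0])
```

Both programs leave the same value in C, but the version with the AGU needs three instructions instead of five because the two `Adar` instructions disappear.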

A LOOP WITH DFG AND SCHEDULE

SCHEDULE I

SCHEDULE II

PROCESSOR MODEL

• Each functional unit of a multiple-functional-unit processor has its own accumulator and address register.

• Memory access can only occur indirectly via address registers, AR0 through ARk.

• Indirect addressing supports post-increment and post-decrement.

ADDRESS ASSIGNMENT

• With careful placement of variables in memory:

– The total number of address instructions can be reduced

– Both code size and timing performance are improved

• Address assignment: the optimization of the memory layout of program variables

– For single-functional-unit processors, this problem has been studied extensively.

– However, little research has been done for multiple-functional-unit architectures such as TI C6x VLIW processors.

PREVIOUS WORK

• Address assignment was first studied by Bartley and Liao.

• They modeled address assignment as a graph-theoretic optimization problem.

• The problem has been proved to be NP-hard.

• An efficient algorithm is used to find the Maximum Weighted Path Covering.
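
The slides do not spell out the path-covering algorithm. The sketch below shows one commonly described greedy approach for a single address register, under the assumption that the program is summarized as an access sequence and that edge weights count back-to-back accesses. The names `access_graph` and `greedy_path_cover` are hypothetical.

```python
from collections import Counter

def access_graph(sequence):
    """Edge weight = number of times two distinct variables are accessed back to back."""
    weights = Counter()
    for a, b in zip(sequence, sequence[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights

def greedy_path_cover(sequence):
    """Pick heavy edges first, keeping every vertex degree <= 2 and avoiding cycles."""
    weights = access_graph(sequence)
    degree = Counter()
    parent = {v: v for v in sequence}          # union-find for cycle detection

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    # Sort by descending weight; tie-break on names so the result is deterministic.
    for edge, w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            chosen.append((a, b))
            degree[a] += 1
            degree[b] += 1
            parent[find(a)] = find(b)
    # Uncovered edges are transitions that auto-increment/decrement cannot serve.
    cost = sum(weights.values()) - sum(weights[frozenset(e)] for e in chosen)
    return chosen, cost

chosen, cost = greedy_path_cover(list("abcab"))
```

For the access sequence a, b, c, a, b the chosen edges put `a` between `b` and `c` in memory, so only the b-to-c transition still needs an explicit address instruction.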

LOOP SCHEDULING

• DFG

– Data Flow Graph G = (V, E, OP, d)

• Static schedule

– Repeated pattern of the execution of one iteration

• Unfolding

– A schedule with unfolding factor f can be obtained by unfolding G f times.

• Rotation

– A scheduling technique used to optimize a loop schedule under resource constraints. It iteratively transforms a schedule into a more compact one.
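
Unfolding can be sketched with the standard construction on edge delays. The slides do not give this construction, so the edge-list representation and the `unfold` function below are assumptions made for illustration.

```python
def unfold(edges, f):
    """edges: list of (u, v, delay). Returns the edges of the f-unfolded graph.

    Copy i of node u is named (u, i). An edge u -> v with d delays becomes
    (u, i) -> (v, (i + d) % f) with (i + d) // f delays, for i = 0..f-1.
    """
    unfolded = []
    for u, v, d in edges:
        for i in range(f):
            unfolded.append(((u, i), (v, (i + d) % f), (i + d) // f))
    return unfolded

# A two-node loop: A feeds B in the same iteration, B feeds A one iteration later.
loop = [("A", "B", 0), ("B", "A", 1)]
unfolded = unfold(loop, 2)
```

With f = 2 the loop body contains two copies of each node, and the total number of delays in the graph stays the same.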

Algorithm Step 1

• Put all nodes in the first row of Schedule S into a set R

• Delete the first row of schedule S

• Shift S up by 1 control step

• Use L to record the schedule length of S
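
Step 1 can be sketched as follows, assuming a schedule is stored as a list of rows (one row per control step) and that L is recorded after the first row is removed, following the order of the bullets. Both the representation and the function name `remove_first_row` are assumptions.

```python
def remove_first_row(schedule):
    """schedule: list of rows, each row a list of node names in one control step."""
    R = set(schedule[0])   # nodes in the first row
    S = schedule[1:]       # deleting row 0 shifts every remaining row up by one step
    L = len(S)             # schedule length of S (assumed: measured after the shift)
    return R, S, L

R, S, L = remove_first_row([["A", "B"], ["C"], ["D"]])
```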

Algorithm Step 2

• Retime each node u in set R: r(u) = r(u) + 1

• Update each node with its new computation, e.g. B[i] = A[i] + 5 is transformed into B[i+1] = A[i+1] + 5
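
A minimal sketch of Step 2 is given below. It assumes the usual retiming convention in which an edge u -> v with d delays ends up with d + r(u) - r(v) delays, and it treats the index update as a textual rewrite of subscripts. The function names are hypothetical and the slides do not confirm either detail.

```python
import re

def retime_nodes(r, R):
    """Apply r(u) = r(u) + 1 to every node u in R; other nodes keep their value."""
    return {u: value + 1 if u in R else value for u, value in r.items()}

def retimed_delays(delays, r):
    """delays: {(u, v): d}. Assumed convention: d_r(u -> v) = d + r(u) - r(v)."""
    return {(u, v): d + r[u] - r[v] for (u, v), d in delays.items()}

def shift_index(statement, var="i"):
    """Rewrite every subscript [i+k] as [i+k+1] in a statement string."""
    def bump(match):
        offset = int(match.group(1) or 0) + 1
        return f"[{var}]" if offset == 0 else f"[{var}{offset:+d}]"
    return re.sub(rf"\[{var}([+-]\d+)?\]", bump, statement)

r = retime_nodes({"A": 0, "B": 0}, {"A"})
new_delays = retimed_delays({("A", "B"): 0, ("B", "A"): 1}, r)
new_statement = shift_index("B[i] = A[i] + 5")
```

Retiming node A moves one delay from its incoming edge to its outgoing edge, and the rewritten statement matches the transformation shown in the bullet above.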

Algorithm Step 3

• Generate a new array transformation assignment based on the new computations from Step 2.

Algorithm Step 4

• Rotate each node u in set R by putting u into the location with the minimum number of address operations among all available locations.

• Function to calculate the number of address operations:

– AD(x,y) = 0 if x, y are the same

– AD(x,y) = 1 if x, y are next to each other

– AD(x,y) = 2 otherwise
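
The AD function and the placement rule can be sketched as below. AD follows the definition above, with addresses taken as integers. Everything else is an assumption: each functional unit is modeled as a column of addresses (None marks a free slot), data dependencies and loop wrap-around are ignored, and `column_cost` and `best_location` are hypothetical names.

```python
def AD(x, y):
    """Address operations to move an address register from address x to address y."""
    if x == y:
        return 0
    if abs(x - y) == 1:
        return 1
    return 2

def column_cost(column):
    """Total address operations along one functional unit's access sequence."""
    used = [a for a in column if a is not None]
    return sum(AD(x, y) for x, y in zip(used, used[1:]))

def best_location(addr_u, unit_columns):
    """Return (extra address operations, unit, step) for the cheapest free slot."""
    candidates = []
    for fu, column in sorted(unit_columns.items()):
        for step, slot in enumerate(column):
            if slot is None:
                trial = column[:step] + [addr_u] + column[step + 1:]
                candidates.append((column_cost(trial) - column_cost(column), fu, step))
    return min(candidates) if candidates else None

columns = {"FU0": [10, None, 11], "FU1": [20, None, 30]}
choice = best_location(11, columns)
```

Here the node that accesses address 11 goes to the free slot on FU0, where it adds no address operations, rather than to FU1, where it would add two.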

EXPERIMENT SETTING

• Experiments conducted using filters

• Performed on a PC running Linux

• All running times within 10 seconds

EXPERIMENT RESULT 1

EXPERIMENT RESULT 2

EXPERIMENT RESULT 3

RESULT SUMMARY

• Compared to list scheduling, the reduction in schedule length becomes 45.3% and the reduction in address instructions becomes 40.7%.

• Compared to rotation scheduling, the reduction in schedule length becomes 24.8% and the reduction in address instructions becomes 40.7%.

FURTHER IMPROVEMENT

• When we apply the unfolding technique with an unfolding factor of 2, the average reductions in schedule length and in the number of address instructions both increase.

• A higher unfolding factor gives a greater reduction in both schedule length and address instructions.

CONCLUSION

• We propose an algorithm, AIRLS

• It uses array transformation, address assignment, and rotation scheduling techniques to reduce schedule length and address operations for loops on multiple-functional-unit DSPs

• It can significantly reduce schedule length and address instructions compared to previous work.

FUTURE WORK

• Consider multiple address registers per functional unit

• Consider shared address registers

• Consider minimum number of address registers needed

Introduction

Motivating Example

Basic Concepts And Models

Algorithm

Experiments

Conclusion