Topics on Compilers – Spring Semester 2011 Christine Wagner – 2011/06/08


Page 1: Topics on Compilers – Spring Semester 2011aces.snu.ac.kr/~bernhard/teaching/4541.775/presentations/... · 2016. 12. 13. · Edge-centric Scheduling 2011/06/08 22 Routing cost calculation


Introduction

Modulo Scheduling Challenges

Core Concepts

Implementation

Experimental Results

Conclusion

2011/06/08 2Edge-centric Scheduling


Embedded computing systems in today’s portable devices demand high performance and energy efficiency

Traditional application-specific hardware: ASICs

Different functionalities on a single device (voice/data communication, high-definition video, digital photography)

High non-recurring costs for designing ASICs

Programmable hardware solutions


Coarse-Grained Reconfigurable Architectures (CGRAs)

offer high computation throughput, scalability, low cost and energy efficiency

consist of an array of functional units (FUs) and register files, often organized as a two-dimensional grid

need a compiler that efficiently maps compute-intensive loops onto the array and exploits all available resources


Challenge: sparse connectivity and distributed register files

Values must be explicitly routed between producing and consuming operations

No dedicated routing resources

An FU serves either as a compute resource or as a routing resource

Approach of this paper: edge-centric modulo scheduling (EMS)


Modulo scheduling exposes parallelism by overlapping successive iterations of a loop

Goal: Find a valid schedule with minimal initiation interval (II)

Factors that complicate CGRA scheduling:

1. Explicit routing

VLIW: routing implicitly guaranteed by storing intermediate values in a multi-ported, centralized register file

CGRA: sparse connectivity and distributed register files


2. Intelligent routing

FUs are used for both computation and routing

Scheduling can easily fail due to poor routing choices

The number of FUs spent on routing should be minimized

3. Heterogeneous nodes

Inexpensive and expensive nodes

Avoid scheduling inexpensive operations on expensive nodes


4. Modulo constraint

Resources are used in periodic fashion as the loop kernel repeats every II cycles

Not possible to guarantee routability by extending the schedule

Scheduling can easily fail due to previously scheduled operations
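The modulo constraint can be illustrated with a small reservation table. This is a minimal sketch, not code from the paper: the class name `ModuloReservationTable` and its methods are hypothetical, and the only idea taken from the slide is that a resource used in cycle t is also busy in every cycle t + k·II.

```python
class ModuloReservationTable:
    """Tracks which (FU, cycle mod II) slots are occupied (hypothetical helper)."""

    def __init__(self, num_fus: int, ii: int) -> None:
        self.ii = ii
        # One row per FU, one column per cycle of the loop kernel.
        self.table = [[None] * ii for _ in range(num_fus)]

    def is_free(self, fu: int, cycle: int) -> bool:
        # The kernel repeats every II cycles, so only cycle mod II matters.
        return self.table[fu][cycle % self.ii] is None

    def reserve(self, fu: int, cycle: int, op: str) -> bool:
        if not self.is_free(fu, cycle):
            return False
        self.table[fu][cycle % self.ii] = op
        return True


mrt = ModuloReservationTable(num_fus=2, ii=2)
first = mrt.reserve(fu=0, cycle=1, op="A")
clash = mrt.reserve(fu=0, cycle=3, op="B")  # 3 mod 2 == 1, same slot as A
other = mrt.reserve(fu=1, cycle=3, op="B")  # a different FU is still free
```

Moving operation B to a later cycle on the same FU does not help when that cycle maps to the same column, which is why extending the schedule cannot guarantee routability.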


CGRA scheduling consists of two tasks:

• Placement of operations into computation slots (FU and time)

• Routing of operands

Node-centric scheduling:

• Operations are placed first and then the routing is done

• Slots are visited one by one until a solution is found

• The scheduler does not consider routing information when placing operations


Unnecessary visits to empty slots

Redundant routings


Example:

Assumption: C can only be placed in (4,2) and (2,4)

(3,1): only remaining memory access slot

Difficult to find the right slot for placing an operation


Edge-centric scheduling:

• Operation placement is integrated into the routing function

• The scheduler starts with routing the edge instead of placing the operation up front

• When an empty slot is found, the scheduler places the operation temporarily and checks whether other edges connected to the consumer exist

• If so, those edges are routed recursively

• If this routing fails, the routing resumes from the current slot and not from the starting slot
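The control flow described above can be sketched on a toy model. Everything here is an assumption made for illustration: slots are `(FU, cycle)` pairs, a value moves to the same or a neighbouring FU once per cycle, and the names `find_path` and `route_and_place` are hypothetical. The modulo constraint and routing costs are left out to keep the sketch short.

```python
from collections import deque

def find_path(src, dst, occupied, neighbors):
    """Route from slot src to a known slot dst; return the routing slots or None."""
    frontier = deque([(src, [])])
    while frontier:
        (fu, t), path = frontier.popleft()
        for nxt_fu in [fu] + neighbors[fu]:
            nxt = (nxt_fu, t + 1)
            if nxt == dst:
                return path
            if nxt[1] < dst[1] and nxt not in occupied:
                frontier.append((nxt, path + [nxt]))
    return None

def route_and_place(src, other_producers, occupied, neighbors, max_cycle):
    """Route outward from src; the consumer is placed where all its edges can be routed."""
    frontier = deque([(src, [])])
    while frontier:
        (fu, t), path = frontier.popleft()
        if t + 1 > max_cycle:
            continue
        for nxt_fu in [fu] + neighbors[fu]:
            nxt = (nxt_fu, t + 1)
            if nxt in occupied:
                continue
            # Tentatively place the consumer in this empty slot ...
            trial = occupied | set(path) | {nxt}
            extra, ok = [], True
            # ... and route the consumer's remaining incoming edges.
            for producer in other_producers:
                sub = find_path(producer, nxt, trial, neighbors)
                if sub is None:
                    ok = False
                    break
                trial |= set(sub)
                extra += sub
            if ok:
                return nxt, path + extra
            # Failure: undo the placement and keep routing from the current slot.
            frontier.append((nxt, path + [nxt]))
    return None

# Three FUs in a row; A sits at (0, 0), B at (2, 1); C consumes both values.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
occupied = {(0, 0), (2, 1)}
result = route_and_place((0, 0), [(2, 1)], occupied, neighbors, max_cycle=4)
```

In this toy run the first empty slots fail because B's value cannot reach them in time, so the search continues from those slots and settles on `(1, 2)` with `(0, 1)` used as a routing slot.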


Only one routing call is required

Cost assignment to slots to avoid wasting expensive nodes

Shorter compile time and better schedules


Final schedule formed by calling a routing function for each edge of the data flow graph (DFG)

Order in which the router visits each slot determined by a routing cost assigned to each slot

Two main objectives when routing a single edge:

• Minimizing number of routing resources used

• Proactively avoiding routing failure: avoid using resources that will block future routes and reserve slots for expensive operations


Recurrence edges:

Edges in a recurrence cycle

Schedule them ahead of other operations, especially when II is close to the length of the recurrence

Edges with the highest priority


Simple edges:

Outgoing edge of an operation that has only one consumer

High-fanout edges:

Outgoing edge of an operation with multiple consumers

Priority to simple edges over high-fanout edges
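The edge categories lend themselves to a small ordering sketch. The function names, the `RANK` table and the example graph are hypothetical; the only rules taken from the slides are that recurrence edges come first and that simple edges come before high-fanout edges. An edge is treated as a recurrence edge when both endpoints lie in a given recurrence cycle, which is a simplifying assumption.

```python
from collections import Counter

# Lower rank means the edge is scheduled earlier.
RANK = {"recurrence": 0, "simple": 1, "high-fanout": 2}

def classify(edge, fanout, recurrence_nodes):
    src, dst = edge
    if src in recurrence_nodes and dst in recurrence_nodes:
        return "recurrence"
    # A producer with exactly one consumer has a simple outgoing edge.
    return "simple" if fanout[src] == 1 else "high-fanout"

def prioritize(edges, recurrence_nodes):
    fanout = Counter(src for src, _ in edges)
    # sorted() is stable, so edges of equal rank keep their input order.
    return sorted(edges, key=lambda e: RANK[classify(e, fanout, recurrence_nodes)])

# Hypothetical DFG: node 0 feeds two consumers, nodes 5, 6, 8 form a cycle.
edges = [(0, 1), (0, 2), (1, 5), (5, 6), (6, 8), (8, 5), (2, 6)]
ordered = prioritize(edges, recurrence_nodes={5, 6, 8})
```

The critical-path criterion is not modelled here; it would add another component to the sort key.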


Non-critical and critical edges:

Multiple disjoint paths between two nodes in the DFG

Dependencies between edges in different paths

Edges on critical path are scheduled first

Example:

Recurrence cycle (5, 6, 8) scheduled first, then 0


[Figure: example DFG with the non-critical path and the critical path marked]


Generation of reduced DFG

• Conversion of DFG into reduced form by collapsing nodes

• An operation is collapsible if it is inexpensive and has only one producer and one consumer

• Remove the node and draw an edge from producer to consumer

• New edge annotated with number of collapsed nodes

Clustering of reduced DFG by ignoring high-fanout edges

Prioritize edges
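The collapsing step can be sketched as follows. The function `reduce_dfg`, the edge-dictionary representation and the operation names are assumptions for illustration; the rule itself (inexpensive, one producer, one consumer, edge annotated with the number of collapsed nodes) comes from the slide. Collapses that would create a self-loop or a parallel edge are skipped, which is a simplification of this sketch.

```python
def reduce_dfg(edges, inexpensive):
    """Return {(producer, consumer): number of collapsed nodes on that edge}."""
    reduced = {edge: 0 for edge in edges}
    changed = True
    while changed:
        changed = False
        for node in sorted(inexpensive):
            ins = [e for e in reduced if e[1] == node]
            outs = [e for e in reduced if e[0] == node]
            if len(ins) != 1 or len(outs) != 1:
                continue  # not exactly one producer and one consumer
            producer, consumer = ins[0][0], outs[0][1]
            if producer == node or producer == consumer or (producer, consumer) in reduced:
                continue  # would create a self-loop or a parallel edge
            # Remove the node and connect producer to consumer directly.
            count = reduced.pop(ins[0]) + reduced.pop(outs[0]) + 1
            reduced[(producer, consumer)] = count
            changed = True
    return reduced

# Hypothetical chain ld -> add -> shl -> mul, plus a second input ld2 -> mul.
edges = [("ld", "add"), ("add", "shl"), ("shl", "mul"), ("ld2", "mul")]
reduced = reduce_dfg(edges, inexpensive={"add", "shl"})
```

The annotation on the new edge tells the router how many FUs the path must later provide when the collapsed nodes are expanded.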


Operation scheduling by calling either placement or routing function

• Placement function only called if target operation has no placed producers or consumers

• Routing function: decision which edge to route first

• Decision based on factors like schedule time, state-changeability of producers or consumers and how many routing options are available

• Forward or backward routing


Routing cost calculation

• Routing cost for each available slot

• Used by router to determine the order in which to explore slots

• Three primary components:

1. Static cost: fixed cost assigned to each slot

2. Affinity cost: based on a slot’s distance from already placed producers; affinity is also assigned between two operations that have common consumers

3. Probability cost: probability of a slot to be required in the future
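A rough sketch of how the three components could order the slots. The slide does not say how the components are combined, so the plain sum is an assumption, as are the `Slot` type, the Manhattan distance used for affinity and the example numbers. The affinity between operations with common consumers is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    row: int
    col: int
    static_cost: float       # fixed per slot, e.g. higher for scarce FUs
    probability_cost: float  # estimate that a later operation needs this slot

def affinity_cost(slot, placed_producers):
    # Assumed metric: Manhattan distance to every already placed producer.
    return sum(abs(slot.row - r) + abs(slot.col - c) for r, c in placed_producers)

def routing_cost(slot, placed_producers):
    # Assumed combination: a plain sum of the three components.
    return slot.static_cost + affinity_cost(slot, placed_producers) + slot.probability_cost

def exploration_order(slots, placed_producers):
    """Cheapest slots are explored first by the router."""
    return sorted(slots, key=lambda s: routing_cost(s, placed_producers))

slots = [
    Slot(0, 0, static_cost=5.0, probability_cost=0.5),  # expensive FU, likely needed
    Slot(3, 3, static_cost=1.0, probability_cost=0.0),  # cheap but far away
    Slot(0, 1, static_cost=1.0, probability_cost=0.0),  # cheap and close
]
order = exploration_order(slots, placed_producers=[(0, 0)])
```

With a producer at (0, 0) the cheap neighbouring slot is tried first, the expensive slot second and the distant slot last.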


Finding a target

• After updating all routing costs, the router starts finding a path from the source to the target operation

• Router visits neighboring slots in order of their assigned costs

• When routing collapsed edges, the path goes through at least as many FUs as the number of collapsed nodes, so that they can be expanded later without problems

• After a slot is found, the scheduler checks for other edges connected to the target and recurses to route those edges
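The cost-ordered search with a minimum path length can be sketched as below. This is an illustrative simplification, not the router from the paper: the FU names, the `route` function and the cost table are hypothetical, time is ignored, and paths are enumerated in order of accumulated cost, which is only practical for very small arrays.

```python
import heapq

def route(src, dst, neighbors, cost, min_hops=0):
    """Cheapest path from src to dst that passes through at least min_hops FUs."""
    heap = [(0, [src])]
    while heap:
        total, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            if len(path) - 2 >= min_hops:
                return path
            continue  # too short to host the collapsed nodes later
        for nxt in neighbors[node]:
            if nxt in path:
                continue  # do not route through the same FU twice
            # Only FUs used for routing add cost; the target itself does not.
            step = 0 if nxt == dst else cost[nxt]
            heapq.heappush(heap, (total + step, path + [nxt]))
    return None

# Producer P and target T are connected through routing FUs X (cheap) and Y.
neighbors = {"P": ["X", "Y"], "X": ["P", "T", "Y"], "Y": ["P", "X", "T"], "T": ["X", "Y"]}
cost = {"P": 0, "X": 1, "Y": 3, "T": 0}
plain = route("P", "T", neighbors, cost)                  # ordinary edge
collapsed = route("P", "T", neighbors, cost, min_hops=2)  # edge with 2 collapsed nodes
```

For the ordinary edge the router takes the cheap FU X; for the collapsed edge it accepts the longer path through X and Y so that both collapsed operations have an FU to expand onto.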


After finding a legal schedule, collapsed nodes are expanded onto the found FU slots

Generation of configuration memories for each component (e.g. control bits)

If scheduling fails, the scheduler increases II and repeats scheduling
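The retry loop around the scheduler can be sketched in a few lines. The driver `schedule_with_increasing_ii` and the stand-in `toy_scheduler` are hypothetical; the toy attempt only checks that there are enough (FU, cycle) slots and does no routing at all.

```python
def schedule_with_increasing_ii(try_schedule, min_ii, max_ii):
    """Call try_schedule(ii) for growing II; return (ii, schedule) or None."""
    for ii in range(min_ii, max_ii + 1):
        schedule = try_schedule(ii)
        if schedule is not None:
            return ii, schedule
    return None

def toy_scheduler(num_ops, num_fus):
    """Stand-in for the real scheduler: succeeds once enough slots exist."""
    def attempt(ii):
        if num_ops > num_fus * ii:
            return None  # more operations than (FU, cycle) slots in the kernel
        # Fill the FUs cycle by cycle; each operation gets its own slot.
        return {op: (op % num_fus, op // num_fus) for op in range(num_ops)}
    return attempt

result = schedule_with_increasing_ii(toy_scheduler(num_ops=10, num_fus=4), min_ii=1, max_ii=8)
```

Ten operations on four FUs do not fit for II = 1 or II = 2, so the loop stops at II = 3.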


Benchmarks: media applications from embedded domain (H.264 encoder, 3D graphics, AAC decoder, MP3 decoder)

CGRA architecture: 4x4 heterogeneous array with 4 MEM and 6 MULT FUs, a central RF, and a local RF for each FU

Loops of varying size mapped onto different configurations

Comparison with traditional, node-centric and simulated-annealing-based modulo scheduling


Performance improvement of 25% over traditional modulo scheduling

10-13% higher performance and 27-46% shorter compile time compared to node-centric scheduling

Simulated annealing is the most effective strategy, but its high schedule quality comes at the cost of slow compile time (EMS compiles 18x faster)

EMS achieved performance competitive with simulated annealing


Edge-centric modulo scheduling for CGRAs

Focus on the routing process, with operation placement as a by-product

Performance improvement of 25% over traditional modulo scheduling

Reduced compilation time (18x faster than simulated annealing)

Performance heavily depends on characteristics of loop structure and underlying CGRA architecture


Thank you for listening!

Please feel free to ask questions!


Park, H., Fan, K., Mahlke, S., Oh, T., Kim, H., Kim, H.: Edge-centric Modulo Scheduling for Coarse-Grained Reconfigurable Architectures. Proceedings of PACT ’08, ACM, New York, pp. 166–176.
