instrumenting and analyzing platform - independent...

44
1 SIGIL A tool for assisting acceleration selection Siddharth Nilakantan , Parnian Mokri , Mike Lui and Mark Hempstead Drexel University

Upload: dotram

Post on 04-Mar-2018

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 2: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Outline 2

Accelerator Selection Problem Example

Sigil Overview

Sigil Methodology for Accelerator Selection

Partitioning Example

Page 3: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

What is accelerator selection? 3

Which functions to accelerate?

Page 4: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

What is accelerator selection? 4

Which functions to accelerate? What are limiting factors for selection?

Page 5: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

What is accelerator selection? 5

Which functions to accelerate? What are limiting factors for selection?

Page 6: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

What is accelerator selection? 6

Which functions to accelerate? What are limiting factors for selection?

Page 7: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Accelerator selection 7

Time

CPU time Comm. time Accel. time

Page 8: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Accelerator selection 8

Time

CPU time Comm. time Accel. time

Page 9: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Accelerator selection 9

Time

CPU time Comm. time Accel. time

Page 10: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Platform-independent metrics 10

Accelerator time and communication time are implementation-dependent! Large design space for implementations

Early stage design approach: Capture platform-

independent metrics as proxy Accelerator time Compute operations Communication Time I/O set of bytes for each function

Page 11: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Capturing Input/Output Set 11

Input/Output set: NOT all memory reads and writes, only unique ones

Biggest challenge: Measuring unique communication

I/O bytes

Compute

Rd Addr

Function A Function B

Rd Addr

Compute

Wr Addr

Compute

Bytes added to I/O set

Bytes marked as reuse

A B

Page 12: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Enter Sigil 12

Novelty: Sigil measures these metrics *automatically* Classifying communication (unique and total bytes) Compute operations for each function Produces control data flow graph (CDFG) representations

Revisit the Q: Which functions to accelerate? Apply HW/SW partitioning algorithm to graphs! Goals of algorithm Minimize unique communication Maximizing coverage in HW

Page 13: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Outline 13

Accelerator selection problem

Sigil Overview

Sigil Methodology for Accelerator Selection

Partitioning Example

Page 14: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Sigil Implementation 14

Implemented on top of Callgrind Works on binary, no source changes Can be implemented on any framework. Requires

Functions Load/Store addresses

• Cache simulation • Branch prediction

• Control data flow graph • Unique and local communication costs

and edges

• Dynamic binary instrumentation • VEX IR generation

Page 15: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 15

Page 16: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 16

Write Addr. 1

Read Addr. 1

FunctionA Function B

Monitor

Shadow Memory

1

Page 17: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 17

Write Addr. 1

Read Addr. 1

FunctionA Function B

Monitor

Shadow Memory Update

last writer

1

Page 18: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 18

Write Addr. 1

Read Addr. 1

FunctionA Function B

Monitor

Shadow Memory

1

2

Page 19: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 19

Write Addr. 1

Read Addr. 1

FunctionA Function B

Monitor

Shadow Memory

1

2

Last Writer 3

Page 20: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Tracking unique communication 20

Write Addr. 1

Read Addr. 1

FunctionA Function B

Monitor

Shadow Memory

1

2

Last Writer 3

Func. B Data struct

Unique/non-unique bytes

4

Page 21: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Inside Shadow Memory 21

Shadow Obj

Secondary Maps

Addr[15:0] Shadow Obj

0 0 ….

Primary Map Addr[34:16]

Last Writer = Func A

Last Reader = None

Last Reader Call = 0

ST Addr, Register in Function A . . LD Register, Addr in Function B

A B

Page 22: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Inside Shadow Memory 22

Shadow Obj

Secondary Maps

Addr[15:0] Shadow Obj

0 0 ….

Primary Map Addr[34:16]

Last Writer = Func A

Last Reader = Func B

Last Reader Call = 1

22

ST Addr, Register in Function A . . LD Register, Addr in Function B

A B

Page 23: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Outline 23

Accelerator selection problem

Sigil Overview

Sigil Methodology for Accelerator Selection Control Data flow graphs Partitioning process

Partitioning Example

Page 24: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Control Data Flow graphs 24

Function calltrees…. Hierarchical representation of functions in application Obtained via Callgrind

main

A B

C E D1 D2

Page 25: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Control Data Flow graphs 25

Function calltrees annotated with unique communication flow Obtained via Sigil

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

Page 26: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Control Data Flow graphs 26

Function calltrees annotated with unique communication flow Add computation costs in as well Also obtained via Sigil

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

700

400 200

50 300 50 50

Page 27: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

HW/SW partitioning process 27

How to pick accelerator candidates in hierarchical CDFG?

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

Page 28: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

HW/SW partitioning process 28

How to pick accelerator candidates? Leaf nodes are self contained – Natural candidates If coverage of work too low?

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

Software

Accelerator Candidates

Page 29: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

HW/SW partitioning process 29

How to pick accelerator candidates? Leaf nodes are self contained – Natural candidates Non-leaf nodes? Include functionality of sub-calltree

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

Software

Accelerator Candidates

Page 30: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Calculate inclusive costs 30

Non-leaf nodes: Merge sub-calltree Inclusive computation costs – Add up operations Inclusive communication costs – Edges crossing the box

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

Software

Accelerator Candidates

Page 31: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Outline 31

Accelerator selection problem

Sigil Overview

Sigil methodology for accelerator selection

Partitioning examples In-depth look: 456.Hmmer Results: Multiple benchmarks

Page 32: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning algorithm 32

Employ any partitioning algorithm Existing algorithms Intuitive: Computation to Communication ratio State-of-the-art: Simulated Annealing, Genetic algorithms

We use a demonstrative algorithm utilizing: software time – from Callgrind communication time – from Sigil compute time – from Sigil

Does not indicate amenability of functions HLS tools show amenability

Page 33: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning example: Spec 456.Hmmer 33

main

Random Seq.

FChoose

Hmm caliber

Gauss Random

Digitize Seq.

P7 Viterbi

Other functions

After partitioning Call edges shown Communication edges

not shown Candidates in box

Assumptions For SW time –

2.5GHz CPU For Comm. Time –

16GB/s transfer rate

Page 34: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning example: Spec 456.Hmmer 34

main

Random Seq.

FChoose

Hmm caliber

Gauss Random

Digitize Seq.

P7 Viterbi

Cycles 14% OPs 10% Comm. 0.39%

1

4 3 2

Cycles 0.05% Ops 0.01% Comm. 1.82%

Cycles 10% Ops 4.7% Comm. 49%

Cycles 74% Ops 83% Comm. 4.62%

Rank statistics S/W, Flops/Iops and

Communication bytes coverage %

Comm. % Communication cost of merged function/Total communicated bytes in program

Other functions

Page 35: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning Results - PARSEC 35

Source: [1] C. Bienia et al. “The PARSEC Benchmark Suite: Characterization and Architectural Implications”, Princeton Technical Report

Rank Blackscholes Freqmine Dedup

1 String to float sort sha1_block_data_order

2 ieee754_exp FP_Array_scan2* sha1_block_data_order

3 ieee754_expf sort compress2*

4 ieee754_logf FP_Array_scan2* write_file*

Functions from demonstrative partitioning for PARSEC benchmarks

ieee_754/mul – IEEE “math” library functions sha1_block_data_order – core of SHA1 calculation FP_Array_scan2 – Builds “prefix-tree” for frequent pattern mining [1] * merged function

Page 36: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning Results - PARSEC 36

S/W Coverage with accelerator candidates

Page 37: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Looking forward 37

We plan on releasing results from SPEC, PARSEC, BioBench and more

Commonality of functions between applications Area may be free, design and verification are not

Need more applications!

Run Sigil on your workload and tell us what you find

Page 38: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Getting Sigil 38

Available open source git clone https://github.com/snilakan/Sigil Documentation included

Tested and validated in Linux

Officially supported distros: CentOS6, Ubuntu 12.04 LTS, Ubuntu 14.04 LTS

Theoretically, supported by any system supported by Valgrind (3.10.1)

Page 39: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Building Sigil 39

Automated script Checks dependencies and builds Valgrind with

Sigil/Callgrind Configures post processing scripts

Manual build process

Typical autotools flow (./configure, make, etc ) Will have to modify post processing scripts See documentation

Page 40: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Running Sigil 40

Compile <user_program> with debug flags

Generate CDFGs ./run_sigil.sh <user_program>

Partitioning the graph

./aggregate_costs.py –help Can plug in your own partitioning

algorithm!

Page 41: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Sigil Release 41

Official Website git clone https://github.com/snilakan/Sigil Or download zip from link

Contact: [email protected]

Related Publications “Platform-independent Analysis of Function-level

Communication in Workloads”, Siddharth Nilakantan and Mark Hempstead, IISWC 2013

“Metrics for Early-Stage Modeling of Many-Accelerator Architectures”, Siddharth Nilakantan, Steven Battle and Mark Hempstead, CAL July-Dec 2012

Page 42: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

42

BACKUP SLIDES

Page 43: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Partitioning steps 43

First, use a metric to compare nodes against parents Merge nodes when parents make better candidates

Second, rank leaf nodes by same metric

main

A B

C E

4

4

12

4 8 4

8 8

D1

16

D2

16

main

A B

D2 E

12/16

4

12

4 4

16 8

Page 44: instrumenting and analyzing platform - independent …accelerator.eecs.harvard.edu/.../hpca2015-tutorial-sigil.pdf · 1 SIGIL A tool for assisting acceleration selection . Siddharth

Metric for merging & ranking 44

Breakeven-speedup Minimum factor of computational acceleration, given

communication For calculation of communication; we can plug in a

transfer rate

Breakeven-speedup = A’

tsw

A

I/P O/P

tcomm:ip:accel

tcomm:op:accel