

Introduction

Performance profiling is a key tool for parallel computing. PGAS languages treat performance as being as important as productivity, and sometimes more so. Chapel [1], as a newer PGAS language, needs a fully supported profiler that presents performance information in an intuitive way, such as a data-centric view, so that programmers can investigate opportunities to improve the performance of their programs.

This poster presents an early snapshot of our ongoing work, which aims to give Chapel programmers a deeper understanding of the bottlenecks in their code and to give language developers insights for future improvements.

Motivation

Finding performance problems is critical in any parallel program. However, the traditional code-centric view of performance data cannot reveal problems tied to how specific lines of code access different variables. Data-centric approaches, which relate performance to data structures rather than code, are especially important for HPC/PGAS applications because memory allocation and data movement are often the bottleneck for overall performance. A profiling tool that can attribute inefficiencies to memory regions is therefore highly desirable.

Method

This work builds on "Blame" [2], an earlier data-centric profiler for parallel programs running on clusters. Our tool aggregates the performance data associated with the same variable or memory block across all nodes in the system. Profiling a Chapel program consists of four steps (see Figure 2): static analysis, execution measurement, post processing, and GUI display.
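The cross-node aggregation described in the Method text can be sketched as follows. This is a minimal illustration, not the tool's implementation: the per-node record layout, the node names, and the sample counts in `NODE_PROFILES` are all hypothetical. Because one sample may be blamed on several variables, percentages are taken against the total number of samples and need not sum to 100.

```python
from collections import Counter

# Hypothetical per-node output of post processing:
# node name -> (samples taken on that node, {variable: samples blamed on it}).
NODE_PROFILES = {
    "node1": (200, {"A": 200, "B": 100, "C": 100}),
    "node2": (100, {"A": 100, "B": 60, "C": 40}),
}


def aggregate(node_profiles):
    """Merge per-node variable profiles into one system-wide profile.

    Returns {variable: percentage of all samples blamed on that variable}.
    """
    total_samples = 0
    blamed = Counter()
    for samples, per_variable in node_profiles.values():
        total_samples += samples
        blamed.update(per_variable)  # adds the counts variable by variable
    if total_samples == 0:
        return {}
    return {var: 100.0 * count / total_samples for var, count in blamed.items()}


if __name__ == "__main__":
    for var, pct in sorted(aggregate(NODE_PROFILES).items()):
        print(f"{var}: {pct:.1f}% of samples")
```

Keying the merge on the variable name is the simplest choice for this sketch; a real tool would also need to match memory blocks that have no source-level name.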

Micro-benchmarks

The following two representative programs illustrate how the tool handles multiple function calls (Listing 1) and complex data types (Listing 2).

Conclusion

We extended the Chapel compiler so that our tool's static analysis works with the LLVM backend, and added hooks to instrument Chapel's implicit multithreading. Even these simple experiments show insights that only data-centric profiling can provide, because the sampled time is attributed to individual variables and record fields rather than to functions alone. We believe that this tool, together with our future work on supporting combined data parallelism and task parallelism in multi-locale environments, will give developers and researchers a better understanding of Chapel and of PGAS models in general.

References

[1] B. Chamberlain, D. Callahan, and H. Zima, "Parallel programmability and the Chapel language," Int. J. High Perform. Comput. Appl., vol. 21, no. 3, pp. 291–312, Aug. 2007. [Online]. Available: http://dx.doi.org/10.1177/1094342007078442

[2] N. Rutar and J. K. Hollingsworth, "Data centric techniques for mapping performance data to program variables," Parallel Computing, vol. 38, no. 1, pp. 2–14, 2012.

Acknowledgment

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program under Award Numbers ER26143 and ER26054.

Toward a Data-Centric Profiler for PGAS Applications

Hui Zhang, Jeffrey K. Hollingsworth {hzhang86, hollings}@cs.umd.edu

Department of Computer Science, University of Maryland, College Park

Results

Intra-procedural Static Analysis
• Complex Types Analysis: container resolution
• Data-flow Analysis: reads/writes/alias
• Control-flow Analysis: branches/loops

Binary Execution
With augmented Chapel compiler:
• Debug-info generation for LLVM backend
• User code instrumentation
• Multi-thread sampling

Post Processing (per node)
• Analyze the context-sensitive samples
• Compute variable profiles

Aggregate and Display (GUI)
• Nodes shown in the diagram: Node 1, 2, 3, 4, 5, 6, 7, 9, 10

Figure 2. Our Profiling Framework
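The per-node post-processing stage (analyze context-sensitive samples, then compute variable profiles) might look roughly like the sketch below. It assumes a simplified model in which every frame of a sampled call stack blames the variables of that function whose related-line set contains the frame's line number. The line sets are copied from Table 1; the sampled call stacks in `SAMPLES` are hypothetical, and the actual tool may resolve blame differently.

```python
from collections import Counter

# (function, variable) -> related source lines, copied from Table 1.
RELATED_LINES = {
    ("main", "foo"): {3, 4, 5, 10, 11, 12, 13, 18, 19},
    ("child", "val1"): {3, 4, 5, 10, 11, 12, 13},
    ("child", "mid1"): {3, 4, 5, 10},
    ("grandChild", "mid2"): {3, 4, 5},
}

# Hypothetical context-sensitive samples: each is a call stack of
# (function, line) frames, outermost frame first.
SAMPLES = [
    [("main", 19), ("child", 10), ("grandChild", 5)],
    [("main", 19), ("child", 10), ("grandChild", 5)],
    [("main", 19), ("child", 10), ("grandChild", 5)],
    [("main", 19), ("child", 13)],
]


def blamed_variables(call_stack):
    """Return the variables blamed for one sample."""
    blamed = set()
    for function, line in call_stack:
        for (owner, variable), lines in RELATED_LINES.items():
            if owner == function and line in lines:
                blamed.add(f"{owner}.{variable}")
    return blamed


def variable_profile(samples):
    """Percentage of samples blamed on each variable."""
    counts = Counter()
    for stack in samples:
        counts.update(blamed_variables(stack))  # one count per blamed variable
    return {var: 100.0 * n / len(samples) for var, n in counts.items()}


if __name__ == "__main__":
    for var, pct in sorted(variable_profile(SAMPLES).items()):
        print(f"{var}: {pct:.0f}%")
```

Walking every frame rather than only the innermost one is what lets time spent in grandChild show up against foo in main.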

Source code:

    int busy(int *x) {
        *x = complex();  // consumes the most time
        return *x;
    }
    int main() {
        for (i = 0; i < n; i++) {
            A[i] = busy(&B[i]) + busy(&C[i]);
        }
    }

Code-centric profiling:

    main:    100% latency
    busy:    100% latency
    complex: 100% latency

Data-centric profiling:

    Array A: 100% latency (main)
    Array B:  50% latency (main.busy.complex)
    Array C:  50% latency (main.busy.complex)

Figure 1. Code-centric profiling aggregates metrics by function based on the sampled lines, while data-centric profiling can distinguish these metrics by variable.
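To make the contrast in Figure 1 concrete, the sketch below computes both views from one set of samples. The sample list is hypothetical: it assumes the time in complex() splits evenly between the two busy() calls, and that each sample is blamed both on the array written through x and on A, since the result flows into A[i].

```python
from collections import Counter

# Each hypothetical sample: (call stack when sampled, arrays blamed for it).
SAMPLES = (
    [(("main", "busy", "complex"), {"A", "B"})] * 50
    + [(("main", "busy", "complex"), {"A", "C"})] * 50
)


def code_centric(samples):
    """Inclusive share of samples for every function on the call stack."""
    counts = Counter()
    for stack, _variables in samples:
        counts.update(set(stack))  # set() avoids double counting recursion
    return {func: 100.0 * n / len(samples) for func, n in counts.items()}


def data_centric(samples):
    """Share of samples blamed on every variable."""
    counts = Counter()
    for _stack, variables in samples:
        counts.update(variables)
    return {var: 100.0 * n / len(samples) for var, n in counts.items()}


if __name__ == "__main__":
    print("code-centric:", code_centric(SAMPLES))
    print("data-centric:", data_centric(SAMPLES))
```

Under these assumptions the code-centric view reports 100% for all three functions, while the data-centric view separates B and C at 50% each, as in the figure.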

Table 1. Related source lines for each important local variable in Listing 1.

Table 2. Related source lines for each important local variable in Listing 2.

 1: proc grandChild(y_in: real) : real
 2: {
 3:   var mid2 = y_in;
 4:   for i in 1..LARGE {          // LARGE: iteration count, assumed defined elsewhere
 5:     mid2 = (mid2**2 + 10)/7;
 6:   }
 7:   return mid2;
 8: }
 9: proc child(x_in: real) : real {
10:   var mid1 = grandChild(x_in);
11:   var val1: real;              // explicit type so val1 starts at 0.0
12:   for i in 1..LARGE {
13:     val1 = (val1*3+11)/7 + mid1;
14:   }
15:   return val1;
16: }
17: proc main() {
18:   var bar = 1.1;
19:   var foo = child(bar);
20: }

Listing 1. Function Hierarchy

 1: record Birthday {
 2:   var year: int;
 3:   var month: int;
 4:   var day: int;
 5: }
 6:
 7: record Actor {
 8:   var name: string;
 9:   var bd: Birthday;
10: }
11: proc main() {
12:   var ActorA = new Actor();
13:   var ActorB = new Actor();
14:   var localVar = 0.1, mid = 1;
15:   for i2 in 1..LARGE {         // LARGE: iteration count, assumed defined elsewhere
16:     mid = i2%8;
17:     ActorA.bd.month = (mid**3+6)%12 + 1;
18:     ActorA.bd.day = (mid**2+8)%6;
19:     localVar = localVar**2;
20:   }
21:   for i in 1..LARGE/2 {
22:     ActorB.bd.year = (ActorB.bd.year**2+8)%6;
23:   }
24: }

Listing 2. Data Hierarchy

Static Analysis

Name              Function   Line numbers (Listing 2)
ActorA            main       12, 14-18
ActorA.bd         main       12, 14-18
ActorA.bd.month   main       12, 14-17
ActorA.bd.day     main       12, 14-16, 18
ActorB            main       13, 21, 22
ActorB.bd         main       13, 21, 22
ActorB.bd.year    main       13, 21, 22
mid               main       14, 15, 16
localVar          main       14, 15, 19
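The container rows for Listing 2 are consistent with a simple roll-up in which each enclosing record receives the union of the line sets of its fields. The sketch below reproduces the ActorA and ActorB rows that way. It is an assumed model of the container resolution step, not the tool's documented algorithm; only the field-level line sets are taken from the table.

```python
# Field-level line sets copied from the table for Listing 2 (function main).
FIELD_LINES = {
    "ActorA.bd.month": {12, 14, 15, 16, 17},
    "ActorA.bd.day": {12, 14, 15, 16, 18},
    "ActorB.bd.year": {13, 21, 22},
}


def roll_up(field_lines):
    """Give every enclosing container the union of its fields' line sets."""
    result = {}
    for name, lines in field_lines.items():
        parts = name.split(".")
        # "ActorA.bd.month" contributes to "ActorA", "ActorA.bd", and itself.
        for depth in range(1, len(parts) + 1):
            prefix = ".".join(parts[:depth])
            result.setdefault(prefix, set()).update(lines)
    return result


if __name__ == "__main__":
    for name, lines in sorted(roll_up(FIELD_LINES).items()):
        print(f"{name}: {sorted(lines)}")
```

With this roll-up, ActorA and ActorA.bd both end up with lines 12 and 14 through 18, matching the table.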

Name   Function     Line numbers (Listing 1)
foo    main         3-5, 10-13, 18, 19
val1   child        3, 4, 5, 10, 11, 12, 13
mid1   child        3, 4, 5, 10
mid2   grandChild   3, 4, 5
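The rows for Listing 1 can be reproduced by a transitive closure over data dependencies: each variable keeps the lines that touch it directly and inherits the lines of every variable its value depends on, including values returned from callees. The direct line sets and dependency edges below are one reading of Listing 1, chosen for illustration; they are assumptions, not output of the tool.

```python
# Lines that declare, assign, or control the update of each variable
# (an assumed reading of Listing 1).
DIRECT_LINES = {
    "mid2": {3, 4, 5},
    "mid1": {10},
    "val1": {11, 12, 13},
    "bar": {18},
    "foo": {19},
}

# Assumed data dependencies between variables.
DEPENDS_ON = {
    "mid1": {"mid2"},        # mid1 = grandChild(x_in), which returns mid2
    "val1": {"mid1"},        # val1 is updated from mid1
    "foo": {"val1", "bar"},  # foo = child(bar), which returns val1
}


def related_lines(variable, seen=None):
    """Union of a variable's direct lines and those of its dependencies."""
    seen = set() if seen is None else seen
    if variable in seen:  # guard against cyclic dependencies
        return set()
    seen.add(variable)
    lines = set(DIRECT_LINES.get(variable, ()))
    for dependency in DEPENDS_ON.get(variable, ()):
        lines |= related_lines(dependency, seen)
    return lines


if __name__ == "__main__":
    for var in ("foo", "val1", "mid1", "mid2"):
        print(f"{var}: {sorted(related_lines(var))}")
```

The variable bar appears only as an input to foo here, since the table lists only the variables the poster considers important.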

Figure 3. GUI output for Listings 1 and 2, annotated with two "Hot Spot" labels.