
Page 1: Performance Models for Application Optimization Walid Abu-Sufah abusufah@illinois.edu Visiting Scholar, University of Illinois Associate Professor, University

Performance Models for Application Optimization

Walid Abu-Sufah

abusufah@illinois.edu

Visiting Scholar, University of Illinois

Associate Professor, University of Jordan


Outline

1. Objective

2. Overview

   2.1 Roofline model

   2.2 Capacity model

3. Relating the Roofline and Capacity models

4. Open Issues

5. Discussion: How could PMUs (performance monitoring units) help?

www.upcrc.illinois.edu


1. Objective


Explore how a performance model of a target architecture could be used for application tuning (perhaps inside a compiler?).


2.1 Roofline Model

• For applications in which off-chip memory bandwidth is the constraining resource (the limit) on system performance.

• Relates processor performance to off-chip memory traffic.

• A bound-and-bottleneck model: good enough to indicate which optimizations to try next to reach the next level of performance.

• So far, it has been demonstrated on several HPC dwarfs and multicore systems.


Bounds

• $B_p$ = peak processing bandwidth (MFLOP/sec)

• $B_m$ = peak DRAM bandwidth (MBytes/sec)

• "Operational Intensity": the average number of floating-point operations per byte of DRAM traffic (FLOPs/Byte)
   – Varies by multicore design (cache organization) and by dwarf
   – Characterize each dwarf for a particular multicore design
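The operational intensity of a simple kernel can be estimated by counting floating-point operations and DRAM bytes per loop iteration. A minimal sketch, assuming a STREAM-triad-style loop `a[i] = b[i] + s * c[i]` over 8-byte doubles, with each array element crossing the DRAM interface exactly once and write-allocate traffic ignored; the helper `operational_intensity` is hypothetical:

```python
def operational_intensity(flops_per_iter: float, bytes_per_iter: float) -> float:
    """FLOPs per byte of DRAM traffic (the Roofline x-axis)."""
    if bytes_per_iter <= 0:
        raise ValueError("bytes_per_iter must be positive")
    return flops_per_iter / bytes_per_iter


# Triad a[i] = b[i] + s * c[i]: one multiply and one add per iteration.
TRIAD_FLOPS = 2
# Assumed DRAM traffic per iteration: read b[i], read c[i], write a[i],
# 8 bytes each (write-allocate traffic ignored).
TRIAD_BYTES = 3 * 8

oi = operational_intensity(TRIAD_FLOPS, TRIAD_BYTES)
print(f"triad operational intensity = {oi:.4f} FLOPs/Byte")
```

Under a write-allocate cache, the store to `a[i]` would also read the line first, giving 32 bytes per iteration and an intensity of 1/16 FLOPs/Byte; this sensitivity to cache behavior is one reason operational intensity has to be characterized for a particular multicore design.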


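Performance Model Graph

The y-axis is GFLOPs/sec.

The x-axis is FLOPs/Byte (i.e., operational intensity).

Peak DRAM BW can be plotted on the same graph, since (GFLOPs/sec) / (FLOPs/Byte) = GBytes/sec.

[Figure: the "Roofline", formed by the peak processing bandwidth $B_p$ and the peak DRAM bandwidth $B_m$]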
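In the formulation of Williams, Waterman, and Patterson, attainable performance is the lower of the two bounds. Writing $B_p$ for the peak processing bandwidth, $B_m$ for the peak DRAM bandwidth, $I$ for operational intensity, and $P(I)$ (a label introduced here) for attainable GFLOP/s, the roofline and its unit check are:

```latex
% I = operational intensity (FLOPs/Byte); P(I) = attainable GFLOP/s
P(I) = \min\left(B_p,\; B_m \cdot I\right)
\qquad
\left[\frac{\text{GFLOP/s}}{\text{FLOP/Byte}} = \text{GByte/s}
\;\Longleftrightarrow\;
\text{GByte/s} \times \text{FLOP/Byte} = \text{GFLOP/s}\right]
```

On log-log axes, $\log P = \log B_m + \log I$ in the memory-bound region, so the DRAM bound is a line of slope 1; it meets the horizontal line $P = B_p$ at $I = B_p / B_m$.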


Roofline Visual Performance Model

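• "Ridge Point": the minimum operational intensity needed to reach peak performance

• Compute bound: operational intensity to the right of the ridge point

• Memory bound: operational intensity to the left of the ridge point

[Figure: Roofline with the ridge point marked]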
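The bound and the ridge point can be combined into a small calculator. A sketch under stated assumptions: the `Machine` class and its numbers (16 GFLOP/s, 8 GB/s) are hypothetical and are not the Opteron figures, and an intensity exactly at the ridge point is classified as compute bound by convention (both bounds coincide there):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine described only by its two Roofline bounds."""
    peak_gflops: float  # B_p: peak processing bandwidth (GFLOP/s)
    peak_gbytes: float  # B_m: peak DRAM bandwidth (GB/s)

    def ridge_point(self) -> float:
        """Minimum operational intensity (FLOPs/Byte) that can reach peak GFLOP/s."""
        return self.peak_gflops / self.peak_gbytes

    def attainable_gflops(self, oi: float) -> float:
        """Roofline bound: min(B_p, B_m * operational intensity)."""
        return min(self.peak_gflops, self.peak_gbytes * oi)

    def bound(self, oi: float) -> str:
        """Which roof limits a kernel with this operational intensity."""
        # At exactly the ridge point both bounds are equal; call it compute bound.
        return "compute bound" if oi >= self.ridge_point() else "memory bound"


m = Machine(peak_gflops=16.0, peak_gbytes=8.0)
for oi in (0.25, 2.0, 8.0):
    print(oi, m.attainable_gflops(oi), m.bound(oi))
```

Keeping the machine as two numbers mirrors the model's point: no other hardware detail is needed to decide whether to attack memory traffic or in-core throughput first.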


Roofline model for AMD Opteron X2


Roofline model for Opteron X2 vs. Opteron X4


Roofline model with ceilings for Opteron X2



What is next for Roofline

• Non-floating-point kernels would be interesting, e.g., Sort (potential exchanges/sec vs. GB/s) and Graph Traversal (nodes traversed/sec vs. GB/s)

• Opportunities for others to help investigate: many kernels, multicores, metrics, …


2.2 Capacity Model

• HW is represented as nodes, each with a "peak" BW. In this talk, and for illustration purposes, we assume only two nodes, a memory node and a processing node, with peak BWs $B_m$ and $B_p$ respectively.

• The system is represented as a graph of HW nodes.


Performance depends on:

A. System characteristics
   1. Peak BWs of nodes
   2. Memory hierarchy (cache) organization/size
   3. Operational overlap

B. Application characteristics
   1. Relative demands on BWs
   2. Overheads


Definitions

• Ratio of peak BWs: $B_p / B_m$

• BW-used per node: $B_p^u$, $B_m^u$

• Ratio of BWs-used: $B_p^u / B_m^u$

• Ratio of BW-used per node to system bandwidth-used, e.g., for the processing node: $\dfrac{B_p^u}{B_p^u + B_m^u} = \dfrac{1}{1 + B_m^u / B_p^u}$


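Capacity of a Node

Average node BW utilized by an application, a function of:

• Application characteristics

• Node BW

$$C_p = \begin{cases} B_p^u & \text{if } B_p^u \le B_p \\ B_p & \text{if } B_p^u > B_p \end{cases} \qquad C_m = \begin{cases} B_m^u & \text{if } B_m^u \le B_m \\ B_m & \text{if } B_m^u > B_m \end{cases}$$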
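The two-case definition is the BW-used capped at the node's peak BW, i.e. $\min(B^u, B)$. A minimal sketch with hypothetical helper names `node_capacity` and `is_saturated`; treating a node as saturated when its BW-used reaches its peak is an assumption made here for illustration, and the bandwidth numbers are made up:

```python
def node_capacity(bw_used: float, bw_peak: float) -> float:
    """C = B^u if B^u <= B, else B (the node's peak BW)."""
    if bw_peak <= 0:
        raise ValueError("bw_peak must be positive")
    return bw_used if bw_used <= bw_peak else bw_peak


def is_saturated(bw_used: float, bw_peak: float) -> bool:
    """Assumed reading: a node is saturated once its BW-used reaches its peak."""
    return bw_used >= bw_peak


# Hypothetical two-node system (illustrative numbers, not measured data).
B_p, B_pu = 16.0, 10.0  # processing node: peak BW and BW-used
B_m, B_mu = 8.0, 12.0   # memory node: peak BW and BW-used

C_p = node_capacity(B_pu, B_p)  # below peak, so capacity equals BW-used
C_m = node_capacity(B_mu, B_m)  # demand exceeds peak, so capacity is the peak
print(C_p, C_m, is_saturated(B_pu, B_p), is_saturated(B_mu, B_m))
```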


Saturated Node Capacity

• Assume that at least one of the nodes is saturated; then the processor capacity, $C_p$, is given by

A similar expression applies for the memory capacity, $C_m$.

System capacity: $C_s = C_p + C_m$

A similar argument holds for an unsaturated node pair.


Saturated Node Capacity Expression – Example

• For $\alpha_{p,m} = 1/2$


Processor, Memory, and System Capacity Curves ($\alpha_{p,m} = 1/2$)


3. Relating Roofline/Capacity

• A processing optimization ceiling, x, in Roofline corresponds to a used processing BW, $B_p^x$.

• A memory optimization ceiling, y, in Roofline corresponds to a used memory BW, $B_m^y$.

• If an application is optimized using optimizations x and y, then its ratio of BWs-used is $B_p^x / B_m^y$, and $\dfrac{B_p^x}{B_p^x + B_m^y} = \dfrac{1}{1 + B_m^y / B_p^x}$
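One way to read the ratio $B_p^x / B_m^y$ in Roofline terms, sketched here as an illustration rather than as the talk's own derivation: the processing ceiling x is a horizontal line at $B_p^x$, the memory ceiling y is the line $B_m^y \cdot I$, and the two meet at an operational intensity $I^{*}$ (a symbol introduced here):

```latex
% Processing ceiling x: horizontal line at B_p^x (FLOP/s)
% Memory ceiling y: sloped line B_m^y * I, with I in FLOPs/Byte
B_m^y \, I^{*} = B_p^x
\quad\Longrightarrow\quad
I^{*} = \frac{B_p^x}{B_m^y}
\qquad
\left[\frac{\text{FLOP/s}}{\text{Byte/s}} = \text{FLOP/Byte}\right]
```

Under these two ceilings, an application with operational intensity below $I^{*}$ is limited by ceiling y, and one above it by ceiling x.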


Roofline model with ceilings for Opteron X2

[Figure annotations: peak BWs $B_p$ and $B_m$; processing ceiling $B_p^{1}$ (SIMD or ILP); memory ceilings $B_m^{4,5}$; ratio $B_p^{1} / B_m^{4,5}$]


4. Open Issues

• Modeling with different performance-limiting factors, e.g., cache-resident client applications (i.e., applications for which memory BW is not the limit)

• Introducing additional bounds: network BW and I/O BW

• Developing model-based tools for use in application optimization


5. Discussion: How could PMUs help?


References: Roofline Model

• S. Williams, A. Waterman, D. Patterson, "Roofline: an insightful visual performance model for multicore architectures," Communications of the ACM, Vol. 52, Issue 4 (April 2009), pp. 65-76.

• D. Patterson, "The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem?", lecture in the Parallel@Illinois Distinguished Lecture Series, April 8, 2009 (http://www.parallel.illinois.edu/dls_archive.html).


References: Capacity Model

• D. J. Kuck, "Computer System Capacity Fundamentals," National Bureau of Standards, Technical Note 851, Oct. 1974.

• D. J. Kuck, B. Kumar, "A system model for computer performance evaluation," SIGMETRICS '76: Proceedings of the 1976 ACM SIGMETRICS Conference on Computer Performance Modeling, Measurement and Evaluation, March 1976.

• D. J. Kuck, The Structure of Computers and Computations, Vol. I, John Wiley & Sons, 1978.


• D. J. Kuck, "Capacity-based Codesign of Computer HW and SW," lecture in the Parallel@Illinois Distinguished Lecture Series, January 26, 2009 (http://www.parallel.illinois.edu/dls_archive.html).
