center for scalable application development softwarecscads.rice.edu/vuduc_et_al--welcome.pdf ·...

12
Automatic tuning for petascale systems Keith Cooper (Rice) Richard Vuduc (Georgia Tech) Kathy Yelick (UCB/LBNL), Jack Dongarra (UTK) Center for Scalable Application Development Software 1

Upload: others

Post on 24-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Automatic tuning for petascale systemsKeith Cooper (Rice)

Richard Vuduc (Georgia Tech)

Kathy Yelick (UCB/LBNL), Jack Dongarra (UTK)

Center for Scalable Application Development Software

1

Page 2: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Proverb

A movementbegins as a vision,runs as a business, andends as a racket.

Question: In what stage are we?

2

Page 3: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Poly-algorithms: John R. Rice (Purdue)

(1969) “A polyalgorithm for the automatic solution of nonlinear equations”

(1976) “The algorithm selection problem”

Profiling and feedback-directed compilation

(1971) Knuth, “An empirical study of FORTRAN programs”

(1982) Graham, et al., gprof

(1987) Massalin, “superoptimizer”

(1991) Chang, Mahlke, Hwu: “Using profile information to assist classic code optimizations”

Code generation from high-level representations

(1989) J. Johnson, R.W. Johnson, D. Rodriguez, R. Tolimieri: “A methodology for designing, modifying, and implementing Fourier Transform algorithms on various architectures.”

(1992) M. Covell, C. Myers, A. Oppenheim: “Computer-aided algorithm design and arrangement” (1992)

“Automatic tuning” – Early seedlings

3

Page 4: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Identify + generate a space of implementations

Search space to find “best,” using models + experiments

A notion of autotuning:

m0

n0

k0 = 1

Mflop/s

Source: PHiPAC Project at UC Berkeley (1997)

Platform: Sun Ultra IIi

16 double regs

667 Mflop/s peak

Unrolled, pipelined inner-kernel

Sun cc v5.0 compiler

4

Page 5: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Identify + generate a space of implementations

“Identify” – What goes into the space?

“Generate” – IRs? Infrastructures?

How much is automatable? What is composable?

Search space to find “best,” using models + experiments

Static vs. dynamic?

Limits of models? Composability?

How much and what to measure?

What is “best?” (Metrics of success?)

A notion of autotuning:

5

Page 6: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

CScADS Goals

Conduct research leading to software tools and systems that help apps scale to petascale and beyond

Catalyze activities in computer science

Enable interactions among vendors, developers

Sponsor workshops, create “visions”

Foster development of new software through support of common software infrastructures and standards

6

Page 7: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

CScADS Participants

Funded by DOE SciDAC Program

Rice U. (lead): Mellor-Crummey & Cooper

Argonne: Beckman (site dir.), Gropp, Lusk

Berkeley: Yelick

U. Tenn. Knoxville: Dongarra

U. Wisconsin: Miller

Acknowledgements: Staff at Rice (Darnell Price), ANL (Lori Swift), Snowbird (Kelly Wilkins)

7

Page 8: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Format

“Meeting of minds”

Architects, compiler writers, library developers

Industry, labs, academia

Discussion and debate

Topic questions, but make up your own

No holds barred – push buttons!

Community building

Day 1: “Guests,” industry, libraries

Day 2, 3 (half): Compilers, libraries, run-time

8

Page 9: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Example: DARPA AACE

“Architecture-Aware Compiler Environment”

Build a self-assembling, self-tuning compiler that generates code with peak performance in zero-compilation time on any architecture, including one “it” has never seen before

Proposition: Compilers will never do this.

9

Page 10: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Today’s autotuning work does/doesn’t address the challenges of petascale.

How do we measure success for tuning? Performance? Productivity?

What architectures/platforms should we target?

“Parameter tuning” is the wrong focus for our area, as it suggests only incremental improvements.

Self-tuned libraries will always outperform compiler-generated code.

What improvements should we expect from autotuning? From compilers? libraries?

Simple performance models (e.g., cache-oblivious, simple cores) will be the right models in the future, obviating “search.”

Traditional boundaries between apps, libs, compilers, and OSes are too rigid.

What issues are we as a community ignoring?

Common infrastructures?

10

Page 11: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Today’s autotuning work does/doesn’t address the challenges of petascale.

How do we measure success for tuning? Performance? Productivity?

What architectures/platforms should we target?

“Parameter tuning” is the wrong focus for our area, as it suggests only incremental improvements.

Self-tuned libraries will always outperform compiler-generated code.

What improvements should we expect from autotuning? From compilers? libraries?

Simple performance models (e.g., cache-oblivious, simple cores) will be the right models in the future, obviating “search.”

Traditional boundaries between apps, libs, compilers, and OSes are too rigid.

What issues are we as a community ignoring?

Common infrastructures?

11

Page 12: Center for Scalable Application Development Softwarecscads.rice.edu/Vuduc_et_al--Welcome.pdf · (1976) “The algorithm selection problem” Profiling and feedback-directed compilation

Many recent “autotuning” meetings

SIAM Parallel Processing ’08 special sessions

DOE High Perf. Comp. Sci. Weekhttp://www.hpcsw.org/presentations/workshops/autotuning/

iWAPT (‘06–’08), in Japan

Others?

Who is not here?

12