
Page 1: HPC Challenge Benchmark Suite 2006 SPEC Workshop January 23, 2006 Austin, TX Jack Dongarra Piotr Łuszczek

HPC Challenge Benchmark Suite

2006 SPEC Workshop, January 23, 2006
Austin, TX

Jack Dongarra, Piotr Łuszczek

http://icl.cs.utk.edu/hpcc/

Page 2

Jan 26, 2006 2006 SPEC Workshop, Austin, TX 2/20

High Productivity Computing Systems

Impact:

Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X

Programmability (idea-to-first-solution): reduce cost and time of developing application solutions

Portability (transparency): insulate research and operational application software from the system

Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors

Goal: Provide a generation of economically viable high productivity computing systems for the national security and industrial user community (2010)

Fill the critical technology and capability gap: today (late-80's HPC technology) ... to ... future (quantum/bio computing)

Applications:Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology

HPCS Program Focus Areas

[Diagram: Analysis & Assessment, Performance Characterization & Prediction, System Architecture, Software Technology, Hardware Technology, Programming Models, Industry R&D]

Page 3

HPCC Motivation and Design

1. Augment TOP500

● Do not rely on a single number

● Provide detailed system description

2. Span locality space

3. Test various hardware components

[Figure: HPCC benchmarks placed in the locality plane. Axes: spatial locality (low to high) and temporal locality (low to high). DGEMM and HPL sit at high temporal and high spatial locality; PTRANS and STREAM at high spatial but low temporal locality; FFT between them; RandomAccess at low spatial and low temporal locality. Mission partner applications span the whole space. Hardware components exercised: CPU, memory, interconnect.]

Page 4

HPCC Components

1. HPL (Hi-Perf LINPACK)

2. STREAM

3. PTRANS (A ← Aᵀ + B)

4. RandomAccess

5. FFT

6. Matrix-matrix multiply

7. b_eff (effective bandwidth/latency)
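Item 3's parallel transpose can be sketched in a few lines. This is not the official HPCC code (which distributes A and B over an MPI process grid); it only shows the operation a single node would perform, in plain Python:

```python
# Sketch of the PTRANS kernel on one node: A <- A^T + B.
# The real benchmark performs this on block-distributed matrices,
# stressing the interconnect with the all-to-all transpose.
def ptrans_local(A, B):
    n = len(A)
    return [[A[j][i] + B[i][j] for j in range(n)] for i in range(n)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 1.0], [1.0, 1.0]]
C = ptrans_local(A, B)   # [[2.0, 4.0], [3.0, 5.0]]
```

Because every element of A ends up on a different process after the transpose, the communication pattern, not the arithmetic, dominates at scale.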

HPL solves a dense linear system: Ax = b,  A ∈ ℝ^(n×n),  x, b ∈ ℝ^n
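What HPL measures can be illustrated with a toy direct solver. The sketch below uses Gaussian elimination with partial pivoting on a tiny system; the real benchmark does the same factorization in parallel on a huge matrix, counting roughly 2/3·n³ + 2·n² flops:

```python
# Toy dense solver in the spirit of HPL: LU-style elimination
# with partial pivoting, then back substitution.
def solve(A, b):
    n = len(b)
    A = [row[:] for row in A]          # work on copies
    b = b[:]
    for k in range(n):                 # forward elimination
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

x = solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])  # x = [1/11, 7/11]
```

HPL additionally verifies the answer via a scaled residual of Ax − b before a run is accepted.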

-------------------------------------------------------
name    kernel                  bytes/iter  FLOPS/iter
-------------------------------------------------------
COPY:   a(i) = b(i)             16          0
SCALE:  a(i) = q*b(i)           16          1
SUM:    a(i) = b(i) + c(i)      24          1
TRIAD:  a(i) = b(i) + q*c(i)    24          2
-------------------------------------------------------
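The four STREAM kernels are simple vector loops; the table above gives the bytes moved and flops performed per iteration. A plain-Python sketch of TRIAD (the actual benchmark is C over arrays far larger than any cache, so it measures memory bandwidth, not compute):

```python
# STREAM TRIAD sketch: per iteration, 24 bytes of traffic
# (read b[i], read c[i], write a[i]) and 2 flops (* and +).
def stream_triad(a, b, c, q):
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

a = [0.0] * 4
b = [1.0, 2.0, 3.0, 4.0]
c = [1.0, 1.0, 1.0, 1.0]
stream_triad(a, b, c, 2.0)   # a becomes [3.0, 4.0, 5.0, 6.0]
```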

FFT computes a discrete Fourier transform:

f_k = Σ_{j=1}^{m} t_j e^(−2πijk/m),  1 ≤ k ≤ m;  f, t ∈ ℂ^m

RandomAccess performs 64-bit updates to a large table T: T[k] ← T[k] ⊕ a_i
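The RandomAccess update loop can be sketched as below. The address-stream recurrence shown (shift left by one, XOR in POLY = 7 when the high bit is set) is my recollection of the reference generator, so treat it as an assumption; the point is the access pattern, not the exact stream:

```python
# RandomAccess sketch: XOR pseudo-random 64-bit values into
# random table locations. Performance is reported in GUPS
# (giga updates per second).
POLY, MASK = 0x7, (1 << 64) - 1   # recurrence constants (assumed)

def random_access(log2_size, n_updates, seed=1):
    T = list(range(1 << log2_size))   # table initialized T[i] = i
    a = seed
    for _ in range(n_updates):
        a = ((a << 1) & MASK) ^ (POLY if a & (1 << 63) else 0)
        T[a & ((1 << log2_size) - 1)] ^= a   # the update T[k] ^= a_i
    return T

T = random_access(4, 100)
```

Every update touches an essentially random address, so caches and prefetchers get no help; the benchmark stresses memory and interconnect latency.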

Page 5

HPCC Test Variants

1. Local

2. Embarrassingly parallel

3. Global

4. Network only

[Diagram: three copies of a system sketch, each showing nodes of memory (M) and processors (P) joined by a network, highlighting which parts of the system each test variant exercises.]

Page 6

Official HPCC Submission Process

1. Download

2. Install

3. Run

4. Upload results

5. Confirm via @email@

6. Tune

7. Run

8. Upload results

9. Confirm via @email@

● Only some routines can be replaced
● Data layout needs to be preserved
● Multiple languages can be used

Provide detailed installation and execution environment

Results are immediately available on the web site:
● Interactive HTML
● XML
● MS Excel
● Kiviat charts (radar plots)

Optional

Prerequisites:
● C compiler
● BLAS
● MPI

Page 7

Measuring Locality in Code

HPC Challenge Benchmarks

Select Applications

[Scatter plot: temporal locality vs. spatial locality, both axes from 0.0 to 1.0, with points for the HPC Challenge benchmarks (HPL, RandomAccess, STREAM) and select applications (Test3D, CG, Overflow, Gamess, AVUS, OOCore, RFCTH2, HYCOM).]

• Spatial and temporal data locality here is for one node/processor — i.e., locally or “in the small”


Generated by PMaC @ SDSC

Page 8

HPCC Awards: SC|05 BOF

Class 1: Best Performance

● Figure of merit: raw system performance

● Submission must be valid HPCC database entry

Side effect: populate HPCC database

● 4 categories: HPCC components

HPL STREAM RandomAccess FFT

● Award certificates

4x $500 from HPCwire

Class 2: Most Productivity

● Figure of merit: performance and elegance

Highly subjective; based on committee vote

● Submission must implement at least 2 out of 4 Class 1 tests

The more tests the better

● Performance numbers are a plus

● The submission process:

Source code, “marketing brochure”, SC|05 BOF presentation

● Award certificate

$1500 from HPCwire

HPCwire contribution:
● press coverage
● $3500 awards

Page 9

HPCC Awards Class 2 Detailed Results

Language HPL RandomAccess STREAM FFT

Python MPI √ √

pMatlab √ √ √ √

Cray MTA C √ √

MPT C √ √

UPC x 3 √ √ √

Cilk √ √ √ √

OpenMP C++ √ √

StarP √ √

Parallel Matlab √ √ √ √

HPF √ √

Page 10

Timeline: HPL Submission Stats

[Chart: HPL submissions from Jun 28, 2003 to Mar 24, 2006, log scale from 0.1 to 1000 Tflop/s. Submissions range from 110 Gflop/s up to 259 Tflop/s; SC04 and SC|05 marked. The HPCS goal of 2000 Tflop/s is about 7x the current best; for comparison, the TOP500 #1 is 280 Tflop/s.]

1. IBM BG/L 259 Tflop/s (LLNL)
2. IBM BG/L 67 Tflop/s (Watson)
3. IBM Power5 58 Tflop/s (LLNL)

TOP500 systems in the HPCC database: #1, #2, #3, #4, #10, #14, #17, #35, #37, #71, #80

Page 11

Timeline: STREAM Submission Stats

[Chart: STREAM submissions from Jun 28, 2003 to Mar 24, 2006, log scale from 10 GB/s to 1,000,000 GB/s. Submissions range from 27 GB/s up to 160 TB/s; SC04 and SC|05 marked. The HPCS goal of 6500 TB/s is about 40x the current best.]

1. IBM BG/L 160 TB/s (LLNL)
2. IBM Power5 55 TB/s (LLNL)
3. IBM BG/L 40 TB/s (Watson)

Page 12

Timeline: FFT Submission Stats

[Chart: FFT submissions from Jun 28, 2003 to Mar 24, 2006, log scale from 1 to 10,000 Gflop/s. Submissions range from 4 Gflop/s up to 2311 Gflop/s; SC04 and SC|05 marked. The HPCS goal of 500 Tflop/s is about 200x the current best.]

1. IBM BG/L 2.3 Tflop/s (LLNL)
2. IBM BG/L 1.1 Tflop/s (Watson)
3. IBM Power5 1.0 Tflop/s (LLNL)

Page 13

Timeline: RandomAccess Submission Stats

[Chart: RandomAccess submissions from Jun 28, 2003 to Mar 24, 2006, log scale from 1 to 100 GUPS. Submissions range from 0.01 GUPS up to 35 GUPS; SC04 and SC|05 marked. The HPCS goal of 64000 GUPS is about 1800x the current best.]

1. IBM BG/L 35 GUPS (LLNL)
2. IBM BG/L 17 GUPS (Watson)
3. Cray X1E 8 GUPS (ORNL)

Page 14

Kiviat Charts: Multi-network Example

AMD Opteron clusters

● 2.2 GHz

● 64-processor cluster

Interconnects

1. GigE

2. Commodity

3. Vendor

Cannot be differentiated based on:

● HPL

● Matrix-matrix multiply

Available on HPCC website

● http://icl.cs.utk.edu/hpcc/

Kiviat chart (radar plot)

Page 15

HPCC Data Analysis: Normalize

Example: divide by peak flop/s

System       HPL     RandomAccess  STREAM   FFT
Cray XT3     81.4%   0.031         1168.8   38.3
Cray X1E     67.3%   0.422         696.1    13.4
IBM Power5   53.5%   0.003         703.5    15.5
IBM BG/L     70.6%   0.089         435.7    6.1
SGI Altix    71.9%   0.003         308.7    3.5
NEC SX-8     86.9%   0.002         2555.9   17.5
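The normalization itself is a one-line calculation; the sketch below shows the HPL case (measured rate over theoretical peak, expressed as a percentage). The numbers passed in here are hypothetical, for illustration only:

```python
# Normalize a measured HPL number by the machine's theoretical peak
# to obtain an efficiency percentage (as in the HPL column above).
def hpl_efficiency(hpl_tflops, peak_tflops):
    return 100.0 * hpl_tflops / peak_tflops

# Hypothetical machine: 20.5 Tflop/s measured on a 25.2 Tflop/s peak.
eff = hpl_efficiency(20.5, 25.2)
```

The same division by peak can be applied to each column so that machines of very different sizes become comparable.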

Page 16

HPCC Data Analysis: Correlate

HPL versus Theoretical Peak

[Scatter plot: theoretical peak (Tflop/s) vs. HPL (Tflop/s), both axes from 0 to about 25-30 Tflop/s, with Cray XT3, NEC SX-8, and SGI Altix labeled.]

Is HPL an effective peak or just a peak?

Page 17

HPCC Data Analysis: Correlate More

Can I just run DGEMM (local matrix-matrix multiply) instead of HPL?

DGEMM alone overestimates HPL performance

Note the 1000x difference in scales: Tera vs. Giga

HPL versus DGEMM

[Scatter plot: DGEMM (Gflop/s) vs. HPL (Tflop/s), with Cray XT3, NEC SX-8, and SGI Altix labeled.]
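The scale gap arises because DGEMM here is a per-processor (Gflop/s) figure while HPL is whole-system (Tflop/s). Either rate comes from the same arithmetic: an n×n matrix multiply performs 2·n³ flops, so a timed run converts directly to a rate, as this small sketch shows:

```python
# Convert a timed n x n matrix multiply into a flop rate:
# DGEMM performs 2 * n^3 floating-point operations.
def dgemm_gflops(n, seconds):
    return 2.0 * n**3 / seconds / 1e9

# Hypothetical timing: n = 1000 done in 0.5 s -> 4.0 Gflop/s.
rate = dgemm_gflops(1000, 0.5)
```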

Page 18

HPCC Data Analysis: Correlate Yet More

HPL versus G-RandomAccess

[Scatter plot: G-RandomAccess (GUPS, 0 to 2) vs. HPL (Tflop/s, 0 to 25), with Cray XT3, NEC SX-8, SGI Altix, Cray X1E/opt, IBM BG/L, and Rackable systems labeled.]

Page 19

Future Directions

Reduce execution time

● Preserve relevance of existing results

Add new tests without duplicating existing effort

● Sparse matrix operations

● I/O

● Smith-Waterman (sequence alignment)

Porting

● Cell/PS3

● Languages

Co-Array Fortran; HPCS languages: Chapel, Fortress, X10

● Environments

● Paradigms

Page 20

Collaborators

David Bailey

● NERSC/LBL

Jeremy Kepner

● MIT Lincoln Lab

David Koester

● MITRE

Bob Lucas

● ISI/USC

Rusty Lusk

● ANL

John McCalpin

● IBM Austin / AMD

Rolf Rabenseifner

● HLRS Stuttgart

Daisuke Takahashi

● Tsukuba, Japan