supercomputer benchmarking

Post on 14-Apr-2018


  • 7/27/2019 Supercomputer Benchmarking

    1/18

    Supercomputer Benchmarking

    By: John Dorfner, Wesley Jones, and Eric Ng

    [Photos: Cray-1, CDC 1604, Origin 2000, RS/6000 SP]


    Overview

    Definition of Benchmark

    Introduction to Benchmark Suites

    SPEChpc96 Suite

    Livermore Loops

    The Linpack Benchmark

    The Top 8 Supercomputers

    HPC Challenge Benchmark

    Cray-1A vs. IBM Cluster 1600

    Inside the IBM Cluster 1600

    Conclusion


    Benchmark Definition

    A measurement or standard that serves as a point of reference by which process performance is measured. Benchmarking is a structured approach for identifying the best practices from industry and government, and comparing and adapting them to the organization's operations. Such an approach is aimed at identifying more efficient and effective processes for achieving intended results, and suggesting ambitious goals for program output, product/service quality, and process improvement.

    Source: www.ichnet.org


    Supercomputer Benchmarking

    The number and type of benchmark suites used to study supercomputer performance have varied widely over the years. In early studies, an ad hoc collection of programs was typically used to measure the performance of a given system relative to a known performance benchmark. Eventually, this practice evolved into groups of programs explicitly designed as supercomputer benchmark suites. The most widely used benchmarks for performance on supercomputing clusters are the SPEChpc96 suite, the Livermore Loops, and, for scientific machines, the Linpack Kernels.

    Some general examples of individual computer benchmarks:

    - Dhrystone: integer benchmark for UNIX systems
    - Whetstone: floating-point benchmark for minicomputers
    - I/O benchmarks
    - MIPS
    - Synthetic benchmarks
    - Kernel benchmarks
    - SPECint / SPECfp
    - Summarizing


    SPEChpc96 Suite

    In 1995, the Standard Performance Evaluation Corp. (SPEC) announced the release of SPEChpc96, the first standard benchmark suite specifically designed for measuring the performance of high-performance computing systems. SPEChpc96 was developed by SPEC's High Performance Group (HPG), which includes several leading high-performance computer vendors, systems integrators, and major universities and research institutes.

    SPEChpc96 allows users and vendors of high-end computers to make objective performance comparisons across different hardware platforms.

    Specific scientific and industrial applications are represented within the SPEChpc96 benchmark suite. The first two SPEChpc96 benchmarks are:

    - SPECseis96, a seismic processing application
    - SPECchem96, a computational chemistry application

    Since SPECseis96 and SPECchem96 can be run in both serial and parallel modes, the SPEChpc96 suite can be used for general performance comparisons over a broad range of high-performance computing systems, including multiprocessor systems, workstation clusters, distributed-memory parallel systems, and traditional vector and vector-parallel supercomputers.


    SPEChpc96 Suite: Metrics

    SPECseis96 and SPECchem96 each generate four metrics. Each metric corresponds to a different problem size (small, medium, large, and extra large), so together they characterize the scalability of the application as well as of the entire system. The SPEChpc96 metrics are as follows:

    - SPECseis96_SM, SPECseis96_MD, SPECseis96_LG, SPECseis96_XL
    - SPECchem96_SM, SPECchem96_MD, SPECchem96_LG, SPECchem96_XL

    The metrics are unitless. They are derived as follows:

        metric = (86400 seconds) / (elapsed time of benchmark in seconds)

    Because 86400 seconds is one day, the metric can be read as the number of times the benchmark could be run in one day; higher is better.

    Since these benchmarks are both compute-intensive and data-intensive, the metrics reflect the performance of the entire system, including the processors, memory access, I/O bandwidth, interconnect topology, etc. For example, SPECseis96_XL requires processing 100 GB of data.
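The SPEChpc96 metric formula is simple enough to check by hand; the Python sketch below just encodes it so the "runs per day" reading is explicit. The function name `spechpc96_metric` is a hypothetical illustration, not part of any SPEC tool.

```python
SECONDS_PER_DAY = 86400  # one day, the numerator of the SPEChpc96 metric


def spechpc96_metric(elapsed_seconds: float) -> float:
    """Return the unitless metric 86400 / elapsed time (runs per day).

    Hypothetical helper for illustration; not part of the SPEC tools.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


if __name__ == "__main__":
    # A run that takes 2 hours (7200 s) scores 12: it could run 12 times a day.
    print(spechpc96_metric(7200))  # 12.0
```

Because the metric is inversely proportional to elapsed time, a system that finishes a given problem size twice as fast scores twice as high.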


    Livermore Loops

    The Livermore Loops are a set of kernels consisting of loops taken from real Fortran programs.

    Introduced in 1970, this supercomputer benchmark initially consisted of 14 kernels of numerically intensive applications written in Fortran. The number of kernels was increased to 24 in the 1980s. Performance is measured in millions of floating-point operations per second (MFLOPS), and the program also checks the results for computational accuracy. A main aim of the Livermore design was to avoid producing single-number performance comparisons. The 24 kernels can be executed three times each, at a range of do-loop spans, to produce short, medium, and long vector performance measurements. In this mode, if overall averages are quoted, the geometric mean may be interpreted as a characteristic rate of computation for the suite. However, it is more realistic to retain the full range of statistics: the geometric, harmonic, and arithmetic means, the minimum, and the maximum.
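The range statistics recommended above can be computed from a list of per-kernel rates. The following is a minimal Python sketch, assuming the rates are already measured in MFLOPS; the function name `range_statistics` is hypothetical and the sample rates are made up.

```python
import math
from statistics import median


def range_statistics(rates):
    """Summarize per-kernel MFLOPS rates with the full range of statistics.

    `rates` is a non-empty sequence of positive rates (hypothetical input).
    """
    if not rates or min(rates) <= 0:
        raise ValueError("rates must be a non-empty sequence of positive numbers")
    n = len(rates)
    return {
        "maximum": max(rates),
        "arithmetic_mean": sum(rates) / n,
        # Geometric mean via logs avoids overflow from multiplying many rates.
        "geometric_mean": math.exp(sum(math.log(r) for r in rates) / n),
        "median": median(rates),
        # Harmonic mean: the overall rate when every kernel does the same amount of work.
        "harmonic_mean": n / sum(1.0 / r for r in rates),
        "minimum": min(rates),
    }


if __name__ == "__main__":
    # Made-up rates for illustration only.
    for name, value in range_statistics([1.0, 4.0, 16.0]).items():
        print(f"{name:16s} {value:8.3f}")
```

For any positive rates, harmonic mean <= geometric mean <= arithmetic mean. The harmonic mean is pulled down by the slowest kernels, which is why in the Cray sample output that follows it sits far below the average rate.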


    Livermore Loops: Kernels

    Kernel 1: an excerpt from a hydrodynamic code.
    Kernel 2: an excerpt from an Incomplete Cholesky-Conjugate Gradient code.
    Kernel 3: the standard Inner Product function of linear algebra.
    Kernel 4: an excerpt from a Banded Linear Equations routine.
    Kernel 5: an excerpt from a Tridiagonal Elimination routine.
    Kernel 6: an example of a general linear recurrence equation.
    Kernel 7: an Equation of State fragment.
    Kernel 8: an excerpt of an Alternating Direction, Implicit Integration code.
    Kernel 9: an Integrate Predictor code.
    Kernel 10: a Difference Predictor code.
    Kernel 11: a First Sum.
    Kernel 12: a First Difference.
    Kernel 13: an excerpt from a 2-D Particle-in-Cell code.
    Kernel 14: an excerpt of a 1-D Particle-in-Cell code.
    Kernel 15: a sample of how casually FORTRAN can be written.
    Kernel 16: a search loop from a Monte Carlo code.
    Kernel 17: an example of an implicit conditional computation.
    Kernel 18: an excerpt from a 2-D Explicit Hydrodynamic code.
    Kernel 19: a general Linear Recurrence Equation.
    Kernel 20: an excerpt from a Discrete Ordinate Transport program.
    Kernel 21: a matrix × matrix product calculation.
    Kernel 22: a Planckian Distribution procedure.
    Kernel 23: an excerpt from 2-D Implicit Hydrodynamics.
    Kernel 24: finds the location of the first minimum in X.
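To make the kernel descriptions concrete, here are four of the simpler loops (kernels 1, 3, 12, and 24) transcribed into Python. This is a sketch following the widely circulated C/Fortran versions of the Livermore kernels, rewritten with zero-based indexing; the function names and test data are hypothetical, and a pure-Python loop says nothing about a real machine's MFLOPS rate.

```python
def kernel1_hydro(x, y, z, q, r, t, n):
    """Kernel 1 (hydro fragment): x[k] = q + y[k] * (r*z[k+10] + t*z[k+11]).

    5 floating-point operations per iteration; z needs at least n + 11 entries.
    """
    for k in range(n):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])


def kernel3_inner_product(x, z, n):
    """Kernel 3 (inner product): sum of z[k] * x[k]; 2 operations per iteration."""
    q = 0.0
    for k in range(n):
        q += z[k] * x[k]
    return q


def kernel12_first_difference(x, y, n):
    """Kernel 12 (first difference): x[k] = y[k+1] - y[k]; y needs n + 1 entries."""
    for k in range(n):
        x[k] = y[k + 1] - y[k]


def kernel24_first_minimum(x, n):
    """Kernel 24: index of the first minimum in x[0:n]."""
    m = 0
    for k in range(1, n):
        if x[k] < x[m]:  # strict '<' keeps the first location on ties
            m = k
    return m


if __name__ == "__main__":
    print(kernel3_inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 3))
    print(kernel24_first_minimum([3.0, 1.0, 2.0, 1.0], 4))
```

A harness would time many repetitions of each loop and divide (operations per iteration × iterations) by (seconds × 10^6) to get MFLOPS.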


    Livermore Loops: Kernel Output

    ********************************************
    THE LIVERMORE FORTRAN KERNELS: * SUMMARY *
    ********************************************
    Computer : CRAY-YMP C90 (240 MHz)
    System   : UNICOS 7.C, loaded
    Compiler : CFT77 5.0.1.17
    Date     : 92.02.18
    Testor   : Charles Grassl, CRI

    MFLOPS RANGE:            REPORT ALL RANGE STATISTICS:
    Mean DO Span   =  167
    Code Samples   =   72
    Maximum Rate   = 826.0859 Mega-Flops/Sec.
    Average Rate   = 190.5636 Mega-Flops/Sec.
    GEOMETRIC MEAN =  86.2649 Mega-Flops/Sec.
    Median     Q2  =  83.5138 Mega-Flops/Sec.
    Harmonic Mean  =  40.7302 Mega-Flops/Sec.
    Minimum Rate   =   6.7925 Mega-Flops/Sec.

    Mean Precision =  11.07 Decimal Digits
    < BOTTOM-LINE: 72 SAMPLES LFK TEST RESULTS SUMMARY. >
    < USE RANGE STATISTICS ABOVE FOR OFFICIAL QUOTATIONS. >


    The Linpack Benchmark

    The Linpack Benchmark measures a computer's floating-point rate of execution, in Mflop/s, by running a program that solves a dense system of linear equations. The characteristics of the benchmark have changed over the years; today, the Linpack Benchmark report in fact includes three benchmarks.

    The Linpack Benchmark grew out of the Linpack software project. It was originally intended to give end users an indication of how long it would take to solve certain matrix problems.

    The three benchmarks in the Linpack Benchmark report are:

    - Linpack Fortran n = 100 benchmark
    - Linpack n = 1000 benchmark
    - Linpack's Highly Parallel Computing benchmark

    Mflop/s (millions of floating-point operations per second) refers to the execution rate of 64-bit floating-point operations, either additions or multiplications. Gflop/s are billions of floating-point operations per second, and Tflop/s are trillions of floating-point operations per second.
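The Linpack idea, timing the solution of a dense linear system and converting that time into a flop rate, can be sketched in a few lines of Python. This is not the Linpack code itself: it uses plain Gaussian elimination with partial pivoting, assumes the operation count 2n^3/3 + 2n^2 used by the classic LINPACK driver, and builds the right-hand side so that the exact solution is all ones, which lets the accuracy be checked. The function names are hypothetical, and an interpreted loop will report rates far below what the hardware can do.

```python
import random
import time


def solve_dense(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    `a` is an n-by-n list of lists and `b` a list of length n; both are copied.
    """
    n = len(a)
    a = [[float(v) for v in row] for row in a]
    b = [float(v) for v in b]
    for k in range(n):
        # Partial pivoting: move the row with the largest |a[i][k]| to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_flops(n):
    """Operation count assumed for an n-by-n solve: 2n^3/3 + 2n^2."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def mflops(n, seconds):
    """Convert a solve time into millions of floating-point operations per second."""
    return linpack_flops(n) / (seconds * 1.0e6)


def run(n=100, seed=0):
    """Time one random n-by-n solve; return (Mflop/s, max error vs. the all-ones solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # b = A * (1, ..., 1), so the exact x is all ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    return mflops(n, elapsed), max(abs(xi - 1.0) for xi in x)


if __name__ == "__main__":
    rate, err = run(100)
    print(f"n = 100: {rate:.2f} Mflop/s, max error {err:.1e}")
```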


    Linpack: Performance Example

    Measured Gflop/s: Peak measured rate of execution, in billions of floating-point operations per second.
    Size of Problem: The matrix size at which the measured performance was observed.
    Size of 1/2 Perf.: The problem size needed to achieve half of the measured peak performance.
    Theoretical Peak Gflop/s: The theoretical peak performance of the computer.


    The Top 8 Supercomputers


    The Top 8 Supercomputers

    Table Key

    Rank: Position within the TOP500 ranking
    Manufacturer: Manufacturer or vendor
    Computer: Model type indicated by manufacturer or vendor
    Installation Site: Customer
    Location: Location and country
    Year: Year of installation/last major update
    Installation Area: Field of application
    Processors: Number of processors
    Rmax: Maximum LINPACK performance achieved
    Rpeak: Theoretical peak performance
    Nmax: Problem size for achieving Rmax
    N1/2: Problem size for achieving half of Rmax
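The Rmax, Nmax, and N1/2 columns of the table key follow directly from a series of Linpack runs at increasing problem sizes. The Python sketch below derives them from a list of (problem size, Gflop/s) measurements and also reports Rmax/Rpeak, a common efficiency figure that is not itself one of the table's columns. The function name and the sample numbers are hypothetical.

```python
def top500_fields(measurements, rpeak):
    """Derive Rmax, Nmax, N1/2 and Rmax/Rpeak from (n, gflops) measurements.

    `measurements` is a hypothetical list of (problem size, measured Gflop/s)
    pairs; `rpeak` is the theoretical peak in Gflop/s.
    """
    if not measurements or rpeak <= 0:
        raise ValueError("need at least one measurement and a positive Rpeak")
    ordered = sorted(measurements)  # ascending problem size
    nmax, rmax = max(ordered, key=lambda m: m[1])  # best observed rate and its size
    # N1/2: smallest measured problem size reaching at least half of Rmax.
    n_half = next(n for n, rate in ordered if rate >= rmax / 2.0)
    return {"Rmax": rmax, "Nmax": nmax, "N1/2": n_half, "Rmax/Rpeak": rmax / rpeak}


if __name__ == "__main__":
    runs = [(1000, 2.0), (5000, 6.0), (10000, 9.5), (20000, 10.0)]  # made-up data
    print(top500_fields(runs, rpeak=16.0))
```

Here N1/2 is resolved only to the problem sizes that were actually sampled; finer sampling near the half-performance point gives a more precise value.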


    HPC Challenge Benchmark

    Under the direction of the High Productivity Computing Systems program of the Defense Advanced Research Projects Agency (DARPA), a group of 20 top researchers has initiated a program to redefine the benchmarks used to measure high-performance systems. The effort is designed to broaden the Linpack benchmark beyond raw floating-point operations per second (flops). The group has set a target date of 2006 for releasing a new benchmark.

    The HPC Challenge benchmark consists of five hardware performance metrics:

    - HPL: the Linpack TPP benchmark, which measures the floating-point rate of execution for solving a linear system of equations
    - STREAM: a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for simple vector kernels
    - RandomAccess: measures the rate of integer random updates of memory
    - PTRANS (parallel matrix transpose): exercises communications in which pairs of processors communicate with each other simultaneously; it is a useful test of the total communications capacity of the network
    - b_eff (effective bandwidth benchmark): a set of tests that measure the latency and bandwidth of a number of simultaneous communication patterns
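Two of these metrics are built around very small kernels. The Python sketch below shows the STREAM Triad loop (a[i] = b[i] + q*c[i]) with STREAM's byte accounting of three 8-byte words per element, and a RandomAccess-style loop that XORs pseudo-random 64-bit values into random table slots, using a shift-and-XOR generator with feedback polynomial 0x7. It assumes these simplified forms; it is not the official STREAM or HPC Challenge code, and an interpreted loop measures interpreter overhead rather than memory bandwidth. The function names are hypothetical.

```python
import time
from array import array

POLY = 0x7              # feedback polynomial assumed for the RandomAccess-style generator
MASK64 = (1 << 64) - 1  # keep values to 64 bits, as unsigned 64-bit integers would


def stream_triad(n=1_000_000, q=3.0):
    """Run one Triad pass a[i] = b[i] + q*c[i]; return (a, apparent GB/s)."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + q * c[i]
    elapsed = time.perf_counter() - start
    # STREAM counts 24 bytes per Triad element: read b[i], read c[i], write a[i].
    return a, 3 * 8 * n / elapsed / 1.0e9


def next_random(ran):
    """One 64-bit shift-and-XOR step: shift left, XOR in POLY if the top bit was set."""
    top_bit = ran >> 63
    return ((ran << 1) & MASK64) ^ (POLY if top_bit else 0)


def random_updates(table, n_updates, seed=1):
    """XOR a stream of pseudo-random values into random slots of `table` (in place)."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    ran = seed
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran  # low bits of the value pick the slot


if __name__ == "__main__":
    _, gbs = stream_triad(100_000)
    print(f"Triad: {gbs:.3f} GB/s (interpreter-bound)")
    t = list(range(1024))
    random_updates(t, 4096)
    random_updates(t, 4096)  # XOR is self-inverse: replaying the updates undoes them
    print("table restored:", t == list(range(1024)))
```

The self-inverse property of XOR makes this kind of update stream easy to check: applying the same sequence of updates a second time restores the original table.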


    Cray-1A (1978) vs. IBM Cluster 1600 (2002)


    Inside the IBM Cluster 1600

    [Figure: schematic view of the two-cluster configuration]

    [Figure: configuration of a single cluster]


    Conclusion

    Benchmarking refers to a measurement standard that serves as a point of reference by which process performance is measured.

    Three of the more popular suites for benchmarking supercomputers are the SPEChpc96 suite, the Livermore Loops, and, for scientific machines, the Linpack Kernels.

    On important HPC features, the performance of today's supercomputers differs from that of past machines by very large ratios.

    As the high-performance computing industry grows, the benchmarks used on supercomputers must also grow in order to provide a yardstick by which these systems can be measured.

