
GPU Performance Prediction

GreenLight Education & Outreach Summer Workshop, UCSD. La Jolla, California. July 1 – 2, 2009.

Javier Delgado, Gabriel Gazolla, Constantinos Menelaou, Lixi Wang, Mark Joselli

Outline

Motivation

Role in Energy Efficiency

Performance Modeling

GPU Programming for Weather Modeling

GPU Programming for BLAST

Model Testing

Conclusion


Benefits


GPU Performance Improvement Over Time


Source: nVidia.com

Sample Speedups


Source: nVidia.com



Role in Energy Efficiency

Idle GPU = wasted energy

Maximally loaded GPU = a lot of power consumption

For example:

Nvidia 8800 GTX consumes 137 W at max load

Intel Xeon LS5400 consumes 50 W at max load
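The two power figures above imply a simple break-even condition. Since energy is power multiplied by time, a GPU drawing 137 W uses less energy than a 50 W CPU only if it finishes the job more than 137 / 50 = 2.74 times faster. The following is a minimal sketch of that arithmetic, assuming both devices sit at max load for the whole job and that no other component's power draw changes. The function names are illustrative and not part of the original material.

```python
# Power figures at max load, taken from the slide above (watts).
GPU_WATTS = 137.0  # Nvidia 8800 GTX
CPU_WATTS = 50.0   # Intel Xeon LS5400


def energy_joules(power_watts: float, runtime_s: float) -> float:
    """Energy used by a device drawing constant power for runtime_s seconds."""
    return power_watts * runtime_s


def break_even_speedup(gpu_watts: float, cpu_watts: float) -> float:
    """Speedup at which the GPU run uses exactly as much energy as the CPU run."""
    return gpu_watts / cpu_watts


def gpu_saves_energy(cpu_runtime_s: float, speedup: float) -> bool:
    """True if the GPU run uses less energy than the CPU run.

    Assumption (not from the slides): both devices run at max load for the
    entire job, and host/idle power is ignored.
    """
    gpu_runtime_s = cpu_runtime_s / speedup
    gpu_energy = energy_joules(GPU_WATTS, gpu_runtime_s)
    cpu_energy = energy_joules(CPU_WATTS, cpu_runtime_s)
    return gpu_energy < cpu_energy


if __name__ == "__main__":
    print(f"break-even speedup: {break_even_speedup(GPU_WATTS, CPU_WATTS):.2f}x")
    for s in (2.0, 3.0, 10.0):
        print(f"speedup {s:>4.1f}x -> GPU saves energy: {gpu_saves_energy(100.0, s)}")
```

Under these assumptions, a port that yields less than about a 2.74x speedup would cost more energy than the CPU run, which is one reason predicting the achievable speedup matters.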


Source: http://mark.zoomcities.com/images/gfx/GFXpowerchartby3d.png (which is derived from data from http://www.xbitlabs.com)

Power Consumption


http://www.xbitlabs.com/articles/video/display/gf8800gts320MB-roundup_8.html#sect0

http://www.xbitlabs.com/articles/video/display/xfx-gf-gtx285-gtx295_16.html

GPU Role in Energy Efficiency

But...


Source: John Michalakes and Manish Vachharajani

• And ...


Outline

Motivation

Role in Energy Efficiency

Hurricane Mitigation Overview

Performance Modeling

GPU Programming for BLAST

Model Testing

Conclusion


Motivation

Hurricanes inflict financial and personal damage on coastal regions

Damage can be mitigated, but:

Impact area prediction is inaccurate

Simulation using commodity computers is not precise

Alarming Statistics

40% of small-to-medium-sized companies shut down within 36 months if forced to close for 3 or more days after a hurricane

Local communities lose jobs and hundreds of millions of dollars from their economy

If 5% of businesses in South Florida recover one week earlier, we can prevent $219,300,000 in non-property economic losses

Hurricane Andrew, Florida 1992

Katrina, New Orleans 2005

Ike, Cuba 2008


Motivation for application profiling and performance prediction

Optimal usage of grid resources through “smarter” meta-scheduling

Many users overestimate job requirements

Reduced idle time for compute resources

Save utility and energy costs

Optimal resource selection for most expedient job return time
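To make the resource-selection idea concrete, here is a minimal sketch of how a meta-scheduler might use predicted execution times. It assumes a performance model has already produced a runtime estimate per resource. The `Resource` fields, the selection functions, and all numbers are hypothetical and chosen only for illustration; they are not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """A candidate compute resource (all fields are illustrative)."""
    name: str
    queue_wait_s: float         # estimated wait before the job starts
    predicted_runtime_s: float  # estimate from a performance model
    power_watts: float          # average draw while running the job

    @property
    def return_time_s(self) -> float:
        """Time until results come back: queue wait plus predicted runtime."""
        return self.queue_wait_s + self.predicted_runtime_s

    @property
    def energy_joules(self) -> float:
        """Energy spent on the run itself (queue wait is not counted)."""
        return self.power_watts * self.predicted_runtime_s


def pick_fastest(resources: List[Resource]) -> Resource:
    """Resource with the earliest predicted job return time."""
    return min(resources, key=lambda r: r.return_time_s)


def pick_most_efficient(resources: List[Resource],
                        deadline_s: float) -> Optional[Resource]:
    """Lowest-energy resource that still meets the deadline, or None."""
    feasible = [r for r in resources if r.return_time_s <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.energy_joules)


# Made-up candidates, for illustration only.
candidates = [
    Resource("cluster-a", queue_wait_s=600, predicted_runtime_s=1200, power_watts=4000),
    Resource("cluster-b", queue_wait_s=60, predicted_runtime_s=2400, power_watts=90),
    Resource("gpu-node", queue_wait_s=900, predicted_runtime_s=300, power_watts=800),
]

print("fastest:", pick_fastest(candidates).name)
print("most efficient within 3000 s:", pick_most_efficient(candidates, 3000).name)
```

Both selections are only as good as the runtime predictions fed into them, which is why overestimated job requirements lead to idle resources.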


Process


Typical Results on Large Clusters

Input: MareNostrum

– 8, 16, and 32 nodes

– 1 process per node

Output: MareNostrum

– 8, 16, 32, 64, 96, and 128 nodes

[Figure: actual vs. predicted execution time (s) plotted against number of nodes (0 to 140); the execution-time axis spans 0 to 1200 s]
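The setup above (profile on 8, 16, and 32 nodes, then predict up to 128 nodes) can be illustrated with a very simple scaling model. The sketch below assumes execution time follows T(n) = a + b / n, a fixed serial part plus a perfectly parallel part. This is a stand-in chosen for illustration, not the model used in the project, and the training timings are synthetic values generated from known coefficients rather than MareNostrum measurements.

```python
def fit_scaling_model(nodes, times):
    """Least-squares fit of T(n) = a + b / n; returns (a, b).

    This is ordinary simple linear regression of time against x = 1 / n.
    """
    xs = [1.0 / n for n in nodes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_t = sum(times) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


def predict(a, b, n):
    """Predicted execution time in seconds on n nodes."""
    return a + b / n


# Synthetic training data generated from known coefficients (NOT measurements).
TRUE_A, TRUE_B = 50.0, 8000.0
train_nodes = [8, 16, 32]
train_times = [TRUE_A + TRUE_B / n for n in train_nodes]

a, b = fit_scaling_model(train_nodes, train_times)
for n in (64, 96, 128):
    print(f"{n:>3} nodes: predicted {predict(a, b, n):.1f} s")
```

Real applications also incur communication costs that grow with node count, so a practical model would likely need additional terms.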

Future Modeling Plans

Model execution time with different GPU configurations

Current GPU project objective: learn how to model GPU performance by porting WRF kernels to CUDA

Test with different cards

Test with different processor configurations

Test with different numbers of nodes


Overview of GPU Benchmarking Project


Understand source code of existing CUDA-ported code

Understand old source code (Fortran)

Learn CUDA

Port another module

Benchmark

Learn WRF

Learn WRF

Learn CUDA

Learn Fortran

Status

Code has been compiled and executed

Regions of similarity are being identified

– Fortran Program: 1729 lines

– CUDA (C) Program: 1329 lines (incl. init)

Currently figuring out necessary code logic of existing ported kernel

Preliminary documentation/report of findings



Purpose

BLAST is used extensively for sequence analysis

Provides a different kind of application for testing GPU performance improvements

Further improve our GPU programming and performance modeling knowledge


Status

Literature review concerning other sequence analysis work with GPUs

Learning how BLAST works


Long-running, Fault-tolerant Weather Prediction

Slight inaccuracies in initial conditions of domain can cause significant inaccuracies later

Third component of this project: account for this using perturbation analysis

The effects of perturbation on runtime must also be modeled
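The sensitivity to initial conditions described above can be demonstrated on a toy system. The sketch below uses the logistic map with r = 4, a standard textbook example of chaotic behavior. It is a stand-in for illustration only and has nothing to do with the WRF model itself. It counts how many steps pass before a baseline run and a slightly perturbed run differ by more than a chosen threshold; the function names and the threshold are assumptions.

```python
def logistic_step(x: float, r: float = 4.0) -> float:
    """One iteration of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def steps_until_divergence(x0: float, perturbation: float,
                           threshold: float = 0.1, max_steps: int = 200):
    """Steps until the baseline and perturbed runs differ by more than threshold.

    Returns None if the two runs stay within threshold for max_steps steps.
    """
    base = x0
    perturbed = x0 + perturbation
    for step in range(1, max_steps + 1):
        base = logistic_step(base)
        perturbed = logistic_step(perturbed)
        if abs(base - perturbed) > threshold:
            return step
    return None


if __name__ == "__main__":
    for eps in (1e-3, 1e-6, 1e-9):
        steps = steps_until_divergence(0.2, eps)
        print(f"initial error {eps:.0e}: runs diverge after {steps} steps")
```

Perturbation analysis typically means running many such perturbed variants, so total runtime scales with the number of variants unless they run in parallel, which is why the effect on runtime must be modeled as well.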


Conclusion

GPUs promise much faster job execution for different applications

In order to maximize resource utilization, application execution time should be predictable

Especially for time-critical applications that take a long time to execute


Thank You

Questions?

