acceleration of cfd using graphics hardwaregp10006/research/pullan_gpus_jan08.pdf · 2008-01-29 ·...

51
Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve added some notes that weren’t on the original slides to help readers of the online pdf version.

Upload: others

Post on 03-Mar-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Acceleration of CFD using graphics hardware

Graham Pullan29 January 2008

I’ve added some notes that weren’t on the original slides to help readers of the online pdfversion.

Page 2: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Part 1: CPUs and GPUs

Page 3: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Moore’s Law

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue.”Gordon Moore (Intel), 1965

Page 4: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Moore’s Law

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue.”Gordon Moore (Intel), 1965

“OK, maybe a factor of two every two years.”Gordon Moore (Intel), 1975 [paraphrased]

Page 6: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Was Moore right?

Source: Intel

Page 7: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Was Moore right?

Source: Intel

Page 9: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Clock speed

Source: http://www.tomshardware.com/2005/11/21/the_mother_of_all_cpu_charts_2005/index.html

Page 10: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Power – the Clock speed limiter?

• 1 GHz CPU requires ≈ 25 W• 3 GHz CPU requires ≈ 100 W

Page 11: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Power – the Clock speed limiter?

• 1 GHz CPU requires ≈ 25 W• 3 GHz CPU requires ≈ 100 W

“The total of electricity consumed by major search engines in 2006 approaches 5 GW.” – Wired / AMD

Source: http://www.hotchips.org/hc19/docs/keynote2.pdf

Page 12: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

What to do with all these transistors?

Page 13: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Parallel computing

Multi-core chips are either:

– Instruction parallel(Multipile Instruction, Multiple Data) – MIMD

or

– Data parallel(Single Instruction, Multiple Data) – SIMD

Page 14: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Today’s commodity MIMD chips: CPUs

Intel Core 2 Quad• 4 cores• 2.4 GHz• 65nm features• 582 million transistors• 8MB on chip memory

Page 15: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Today’s commodity SIMD chips: GPUs

NVIDIA 8800 GTX• 128 cores• 1.35 GHz• 90nm features• 681 million transistors• 128kB on chip memory• 768MB on board memory

Page 16: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

CPUs vs GPUs

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf

Page 17: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

CPUs vs GPUs

Transistor usage:

Source: NVIDIA CUDA SDK documentation

Page 18: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Graphics pipeline

Source: ftp://download.nvidia.com/developer/presentations/2004/Perfect_Kitchen_Art/English_Evolution_of_GPUs.pdf

Page 19: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Graphics pipeline

Page 20: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

GPUs and scientific computing

GPUs are designed to apply the same shading function

to many pixels simultaneously

Page 21: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

GPUs and scientific computing

GPUs are designed to apply the same function

to many data simultaneously

This is what most scientific computing needs!

Page 22: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Part 2: Application to CFD

Page 23: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

CFD basics

Body-fitted mesh

Page 24: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

CFD basics

Body-fitted mesh For each cell, conserve:

• mass

• momentum

• energy

and update flow properties

Page 25: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

GPGPU programming models

Two options (of many) :

• Treat GPU as a streaming machine (Brook - Stanford)

• Treat GPU as a set of multi-processors (CUDA - NVIDIA)

Page 26: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

GPGPU programming models

Two options (of many) :

• Treat GPU as a streaming machine (Brook - Stanford)– Example: 2D flow solver

• Treat GPU as a set of multi-processors (CUDA - NVIDIA)– Example: 3D flow solver

Page 27: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

The GPU as a streaming machine (Brook)

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2007b_DRAFT.pdf

Page 28: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid

Page 29: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid

These slides show that four nodes forming a cell are a long way apart in CPU memory but close together in the (approximated) 2D texture cache of a GPU

Page 30: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – CPU memory

Page 31: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – CPU memory

Page 32: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – CPU memory

Page 33: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – GPU 2D memory

Page 34: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – GPU 2D memory

Page 35: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – GPU 2D memory

Page 36: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Example Brook code

kernel void calc_changes(float deltat<>, float area<>,float4 iflux[ ][ ], float4 jflux[ ][ ], out float4 delta<>){

float2 up ={0, 1};float2 right = {1, 0};float2 index = indexof delta.xy;delta =(iflux[index] - iflux[index + right]+ jflux[index] - jflux[index + up])

*deltat/area;}

Page 37: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Boundary problems

• To avoid branching – apply same kernel to all nodes (including boundaries)

• Boundary treatment is often different:

Page 38: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Store indices:

kernel void dist_delta(Index4 index<>, float4 primary_old<>, float4 delta[ ][ ], out float4 primary<>) {

primary = primary_old +0.25f*(delta[index.i1] + delta[index.i2] +

delta[index.i3] + delta[index.i4]);}

Page 39: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

CPU / GPU code split

Page 40: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

2D results

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf

Page 41: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

2D performance

Page 42: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

The GPU as a multi-processor (CUDA)Divide 128 cores into

8 Multiprocessors (MPs)

Each MP has 16kB shared memory (no caching)

Source: NVIDIA CUDA SDK documentation

Page 43: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – CUDA partitioning

Page 44: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Typical grid – CUDA partitioning

Each colour to a different multiprocessor

Page 45: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

3D results

Source: http://www.eng.cam.ac.uk/~gp10006/research/Brandvik_Pullan_2008a_DRAFT.pdf

Page 46: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

2D/3D performance at 300k points

Page 47: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Conclusions

• Transistor counts continue to increase (exponentially)• Current CPUs are multi-core (≈4) MIMD• Current GPUs are many-core (≈100) SIMD

Page 48: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Conclusions

• Transistor counts continue to increase (exponentially)• Current CPUs are multi-core (≈4) MIMD• Current GPUs are many-core (≈100) SIMD

• Many-core SIMD is perfect for scientific computations• Speed-ups for CFD of 29x (2D) and 16x (3D) obtained

Page 49: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Conclusions

• Transistor counts continue to increase (exponentially)• Current CPUs are multi-core (≈4) MIMD• Current GPUs are many-core (≈100) SIMD

• Many-core SIMD is perfect for scientific computations• Speed-ups for CFD of 29x (2D) and 16x (3D) obtained

• Future CPUs will have heterogeneous cores• MIMD and SIMD on one chip – the best of both worlds!

Page 50: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Challenges

The sands are still shifting:• Software

– There are many ways to program GPUs• Hardware

– There are several options for future hardware• GPUs• CPUs with accelerator boards / chips• CPU-GPU combined on a single chip

Developers need a framework to support new software

Page 51: Acceleration of CFD using graphics hardwaregp10006/research/Pullan_GPUs_Jan08.pdf · 2008-01-29 · Acceleration of CFD using graphics hardware Graham Pullan 29 January 2008 I’ve

Acknowledgements

• Research student: Tobias Brandvik • Donation of GPU hardware: NVIDIA