acceleration of the longwave rapid radiative transfer module using gpgpu

1
Abstract This poster presents Weather Research and Forecast (WRF) model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has came from increasing processor speed rather than increased parallelism. We present an alternative method of scaling model performance by exploiting emerging architectures like GPGPU using the fine-grain parallelism. We claim to get much more than 23.71x, performance gain by using asynchronous data transfer, use of texture memory and the techniques like loop unrolling. Introduction WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Longwave RRTM (Rapid Radiative Transport Model) is an optional model that computes the energy transfer through the atmosphere due to electromagnetic radiation. Uses look-up tables for efficiency. Separates calculation into 16 spectral bands. RRTM Overview RRTM Present Status RRTM Previous work RRTM proposed as benchmark kernel http://www.mmm.ucar.edu/wrf/WG2/GPU/ Contains only CPU code. Nvidia claim to get 10X speed up but with different porting approach. ( No implementation avaialble) RRTM Sequential Approach Implementation Data Layout RRTM Scale-up after Porting Results References [1] Online http://www.mmm.ucar.edu/wrf/WG2/GPU/ [3] NVIDIA Corporation, Compute Unified Device Architecture (CUDA), http://developer.nvidia.com/object/cuda.html. Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU Mahesh Khadtare 1 , Pragati Dharmale 2 , Prakalp Somwanshi 3 1 I2IT, Pune, IN; 2 SNHU, NH, US; 3 CRL, Pune, IN Ozone mixing Atmospheric Profiling Coefficient for Radiative Transfer Gaseous optical depths and Planck functions for 16 spectral bands Radiative Transfer for clear and cloudy column INIRAD MM5ATM SETCOEF GASABS RTRN 11.4 % 6.8% 9.8% 40% 32% INIRAD MM5ATM SETCOEF GASABS RTRN k j i Physics Predominantly vertical, “column physics” Perfectly parallel in the horizontal Many and varied dependencies in the vertical dimension The original code and input data for the RRTM benchmark were obtained from the kernel Benchmark Page at NCAR. The code contains a driver for both CPU and GPU versions, as well as a serial CPU implementation of RRTM. The input data for this benchmark is a mesh of (x; z; y) = (73; 27; 60) elements. For any point in this three-dimensional mesh, radiation transfer only depends on data in the same vertical column. This independence between vertical columns was exploited for parallelism in developing the GPU version. Z- Dependencies Vector Dimensions RRTM only depends on data in the same vertical column. GPU exploits this parallelism. j i k Outer loops over (i,j) Extract 1D arrays in k and send to RRTM CPU Layout j i k GPU Layout All routines launch ‘nx*nythreads each thread calculates one column in ‘nz’ dimension. Reorder to A(i,j,k) after transfer A(i,k,j) INIRAD MM5ATM SETCOEF GASABS RTRN 11.4% 6.8% 9.8% 40% 32% DATA Transfer to GPU DATA Transfer to Host Single Thread Sequential Implementation Profile Scale-up using GPU 51.2x 16x 44x 9.81x 7.7x CPU / GPU Core Details Time (sec) Speedup AMD Athlon X2 Dual Core 1 1067 1x Tesla GeForce GTX 275 192 165 6.4x Fermi C2050 448 98 10.88x Fermi S2050 4x448 45 23.71x Input data is on a (nx,nz,ny) mesh with nx = 73, nz = 28, ny = 60 CONTACT NAME Pragati Dharmale: [email protected] POSTER P5144 CATEGORY: ASTRONOMY & ASTROPHYSICS - AA01

Upload: maheshkha

Post on 20-Mar-2017

13 views

Category:

Engineering


0 download

TRANSCRIPT

Page 1: Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU

Abstract This poster presents Weather Research and Forecast (WRF) model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has came from increasing processor speed rather than increased parallelism. We present an alternative method of scaling model performance by exploiting emerging architectures like GPGPU using the fine-grain parallelism. We claim to get much more than 23.71x, performance gain by using asynchronous data transfer, use of texture memory and the techniques like loop unrolling.

Introduction WRF offers multiple physics options, one of which is the Long-Wave Rapid

Radiative Transfer Model (RRTM). Longwave RRTM (Rapid Radiative Transport Model) is an optional model that

computes the energy transfer through the atmosphere due to electromagnetic radiation. Uses look-up tables for efficiency. Separates calculation into 16 spectral bands.

RRTM Overview

RRTM Present Status RRTM – Previous work

RRTM proposed as benchmark kernel http://www.mmm.ucar.edu/wrf/WG2/GPU/ Contains only CPU code. Nvidia claim to get 10X speed up but with different porting approach. ( No implementation avaialble)

RRTM Sequential Approach

Implementation

Data Layout

RRTM Scale-up after Porting

Results

References [1] Online http://www.mmm.ucar.edu/wrf/WG2/GPU/ [3] NVIDIA Corporation, Compute Unified Device Architecture (CUDA), http://developer.nvidia.com/object/cuda.html.

Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU

Mahesh Khadtare1, Pragati Dharmale2, Prakalp Somwanshi3 1 – I2IT, Pune, IN; 2 – SNHU, NH, US; 3 – CRL, Pune, IN

Ozone mixing

Atmospheric Profiling

Coefficient for Radiative

Transfer

Gaseous optical depths and Planck functions for 16 spectral bands

Radiative Transfer for clear

and cloudy column

INIRAD

MM5ATM

SETCOEF

GASABS

RTRN

11.4%

6.8%

9.8%

40%

32%

INIRAD

MM5ATM

SETCOEF

GASABS

RTRN k

j

i

Physics • Predominantly vertical, “column physics” • Perfectly parallel in the horizontal • Many and varied dependencies in the vertical dimension

The original code and input data for the RRTM benchmark were obtained from the kernel Benchmark Page at NCAR. The code contains a driver for both CPU and GPU versions, as well as a serial CPU implementation of RRTM.

The input data for this benchmark is a mesh of (x; z; y) = (73; 27; 60) elements. For any point in this three-dimensional mesh, radiation transfer only depends on data in the same vertical column. This independence between vertical columns was exploited for parallelism in developing the GPU version.

Z- D

epen

denc

ies

Vector Dimensions

RRTM only depends on data in the same vertical column.

GPU exploits this parallelism.

j

i

k

Outer loops over (i,j)

Extract 1D arrays in k and send to RRTM

CPU Layout

j

i

k

GPU Layout

All routines launch ‘nx*ny’ threads each thread calculates one column in ‘nz’ dimension.

Reorder to A(i,j,k) after transfer A(i,k,j)

INIR

AD

MM

5ATM

SE

TCO

EF

GA

SA

BS

RTR

N

11.4% 6.8% 9.8% 40% 32%

DAT

A Tr

ansf

er

to G

PU

DAT

A Tr

ansf

er

to H

ost

Single Thread Sequential

Implementation Profile

Scale-up using GPU 51.2x 16x 44x 9.81x 7.7x

CPU / GPU Core Details

Time (sec) Speedup

AMD Athlon X2 Dual Core 1 1067 1x

Tesla GeForce GTX 275 192 165 6.4x

Fermi C2050 448 98 10.88x

Fermi S2050 4x448 45 23.71x

Input data is on a (nx,nz,ny) mesh with nx = 73, nz = 28, ny = 60

contact name

Pragati Dharmale: [email protected]

P5144

category: Astronomy & AstroPhysics - AA01