TRANSCRIPT

Page 1: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Roar Skålin

Norwegian Meteorological Institute

CAS 2001, Annecy, 31.10.2001

Contributions by: Dr. D. Bjørge, Dr. O. Vignes, Dr. E. Berge and T. Bø


Page 2: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

DNMI Atmospheric Models

• Weather Forecasting
  – HIRLAM (HIgh Resolution Limited Area Model)
  – 3D VAR, hydrostatic, semi-implicit, semi-Lagrangian
  – Parallelisation by SHMEM and MPI
  – Resolutions: 50 km -> 20 km, 10 km, 5 km

• Air Quality Forecasting (Clean City Air):
  – HIRLAM: 10 km
  – MM5: 3 km and 1 km
  – AirQUIS: Emission database, Eulerian dispersion model, sub-grid treatment of line and point sources, receptor point calculations

Page 3: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

DNMI Operational Computers

• Gridur – SGI O3800, 220 PE / 220 GB, MIPS 400 MHz, Trix OS, compute server
• Cluster – Scali TeraRack, 20 PE / 5 GB, Intel PIII 800 MHz, Linux OS, compute server
• 500 km link: peak 100 Mbit/s, ftp 55 Mbit/s, scp 20 Mbit/s
• Monsoon – SGI O2000, 4 PE / 2 GB, Irix OS, System Monitoring and Scheduling (SMS)
• 2 m link: peak 100 Mbit/s, ftp 44 Mbit/s

Page 4: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

DNMI Operational Schedule

• Monsoon: SMS
• Met. workstation: Hirlam 50 02:30, Hirlam 10 03:30, MM5 05:00, AirQUIS 06:00
• NT systems: AirQUIS 05:50
• Cluster: MM5 05:00
• Gridur: Hirlam 50 02:30, Hirlam 20 03:15, Hirlam 10 03:30
• EC frames: 01:20
• Observations: 02:15

Page 5: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Cray T3E vs. SGI Origin 3800

• HIRLAM 50 on Cray T3E:
  – Version 2.6 of HIRLAM
  – DNMI specific data assimilation and I/O
  – 188 x 152 x 31 grid points
  – Run on 84 EV5 300 MHz processors

• HIRLAM 20 on SGI Origin 3800:
  – Version 4.2 of HIRLAM
  – 3D VAR and GRIB I/O
  – 468 x 378 x 40 grid points
  – Run on 210 MIPS R14K 400 MHz processors
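
These two runs put quite different loads on each processor. As a rough orientation only, a minimal back-of-the-envelope sketch in C follows; the grid sizes and PE counts are taken from the bullets above, while the 15 x 14 layout of the 210 Origin processors and the even block split are assumptions made for this illustration, not the actual HIRLAM decomposition:

    /* Back-of-the-envelope workload comparison for the two runs above.
       Grid sizes and processor counts are from this slide; the 15 x 14
       processor layout is a hypothetical example. */
    #include <stdio.h>

    int main(void)
    {
        /* HIRLAM 50 on the T3E: 188 x 152 x 31 points on 84 PEs */
        long t3e = 188L * 152 * 31;
        /* HIRLAM 20 on the Origin 3800: 468 x 378 x 40 points on 210 PEs */
        long o3k = 468L * 378 * 40;

        printf("T3E:   %ld points, about %ld per PE\n", t3e, t3e / 84);
        printf("O3800: %ld points, about %ld per PE\n", o3k, o3k / 210);

        /* With a hypothetical 15 x 14 processor grid, each Origin PE
           would own roughly a 32 x 27 x 40 block of grid points. */
        printf("O3800 block: %d x %d x 40 per PE (15 x 14 layout)\n",
               (468 + 14) / 15, 378 / 14);
        return 0;
    }

If these figures are roughly right, each Origin 3800 processor handles about three times as many grid points as each T3E processor, which is worth keeping in mind when reading the timing comparisons on the next two pages.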

Page 6: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Cray T3E vs. SGI Origin 3800

[Bar chart (y-axis 0–60 %): share of time in Dynamics, Physics, Diffusion and Init for T3E and O3800. HIRLAM 50 on Cray T3E, 84 PEs vs. HIRLAM 20 on SGI Origin 3800, 210 PEs.]

Page 7: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Cray T3E vs. SGI Origin 3800

[Bar chart (y-axis 0–90 %): "One or more processors communicate or wait" vs. "All processors compute", for T3E and O3800. HIRLAM 50 on Cray T3E, 84 PEs vs. HIRLAM 20 on SGI Origin 3800, 210 PEs.]

Page 8: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

O3800 Algorithmic Challenges

• Reduce the number of messages and synchronisation points
  – Use of buffers in nearest neighbour communication (sketched after this list)
  – Develop new algorithms for data transposition
  – Remove unnecessary statistics

• Parallel I/O
  – Asynchronous I/O on a dedicated set of processors

• Dynamic load balancing

• Single node optimisation
  – Currently far less important than on the Cray T3E
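
The slides do not show HIRLAM's actual communication routines, so the following is only a minimal sketch of the kind of buffered, non-blocking nearest-neighbour exchange the first bullet refers to. It is written in C with MPI; the 1-D decomposition, the single halo row, and all variable names are assumptions made for the illustration:

    /* Illustrative halo exchange, not HIRLAM code. Each task owns
       local_rows interior rows of row_len points, with one halo row on
       each side stored at the ends of the same array. Posting both
       receives and both sends before a single MPI_Waitall gives one
       message per neighbour per call and avoids extra synchronisation
       points. */
    #include <mpi.h>

    void halo_exchange(double *field, int local_rows, int row_len,
                       MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int north = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;
        int south = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;

        /* Rows are contiguous here, so the boundary rows act as their own
           send buffers; with a 2-D decomposition the east/west boundaries
           would first be packed into buffers, which is the "use of
           buffers" the slide refers to. */
        double *send_south = field + 1 * row_len;                /* first interior row */
        double *send_north = field + local_rows * row_len;       /* last interior row  */
        double *recv_south = field;                              /* south halo row     */
        double *recv_north = field + (local_rows + 1) * row_len; /* north halo row     */

        MPI_Request req[4];
        MPI_Irecv(recv_south, row_len, MPI_DOUBLE, south, 0, comm, &req[0]);
        MPI_Irecv(recv_north, row_len, MPI_DOUBLE, north, 1, comm, &req[1]);
        MPI_Isend(send_north, row_len, MPI_DOUBLE, north, 0, comm, &req[2]);
        MPI_Isend(send_south, row_len, MPI_DOUBLE, south, 1, comm, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }

A call like this would typically sit once per time step in the dynamics loop. The dedicated set of I/O processors mentioned in the second bullet would similarly rest on splitting the tasks into compute and I/O groups (for example with MPI_Comm_split) so that compute tasks can hand fields off without blocking, but that part is not sketched here.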

Page 9: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

O3800 System Challenges

• Interference from other users
  – CPU: Must suspend all other jobs, even if we run on a subset of the system
  – Memory: Global swapping under TRIX/IRIX
  – Interactive processes: Cannot be suspended

• Security
  – Scp substantially slower than ftp
  – TRIX is not a problem

• Communication on a system level
  – Memory: Use local memory if possible
  – I/O: CXFS, NFS, directly mounted disks

Page 10: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Clean City Air

• Collaborative effort of:
  – The Norwegian Public Road Administration
  – The Municipality of Oslo
  – The Norwegian Meteorological Institute
  – Norwegian Institute for Air Research

Page 11: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Main Aims

• Reduce the undesired effects of wintertime air pollution in Norwegian cities

• Components: NO2, PM10 (PM2.5)

• Develop a standardised and science-based forecast system for air pollution in Norwegian cities

• Develop a basis for decision makers who want to control emissions on winter days with high concentration levels

Page 12: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Modelling Domains

Page 13: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

AirQUIS Output Domain Oslo

Page 14: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Scali TeraRack

• 10 Dual Nodes:
  – Two 800 MHz Pentium III
  – 512 MByte RAM
  – 30 GB IDE disk
  – Dolphin Interconnect

• Software:
  – RedHat Linux 6.2
  – Scali MPI implementation
  – PGI Compilers
  – OpenPBS queuing system

Page 15: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

MM5 on the TeraRack

[Bar chart (y-axis 0–160, presumably minutes): MM5 timings for MPI, MPI and OpenMP, and Inlining configurations.]

Target: 90 minutes to complete a 3 and 1 km run for the Oslo area

Page 16: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

MM5 on the TeraRack

• Modifications to MM5:
  – No changes to the source code
  – Changes to configuration files
  – Inlined eight routines: DCPL3D, BDYTEN, SOLVE, EQUATE, DM_BCAST, EXCHANJ, ADDRX1C, SINTY

• Struggled with one bug in the PGI runtime environment and a few Scali bugs

Page 17: Operational Forecasting on the SGI Origin 3800 and Linux Clusters

Conclusions

• Shared Memory (SM) vs. Distributed Memory (DM):
  – Performance of communication algorithms may differ significantly
  – DM systems best for single user (peak), SM better for multi-user systems (throughput)
  – SM easy to use for ”new” users of parallel systems, DM easier for ”experienced” users

• Linux Clusters:
  – So inexpensive that you can’t afford to optimise code
  – So inexpensive that you can afford to buy a backup system
  – Main limitations: Interconnect and I/O