CCSM4 - A Flexible New Infrastructure for Earth System Modeling
Mariana Vertenstein, NCAR CCSM Software Engineering Group
Major Infrastructure Changes since CCSM3
CCSM4/CPL7 development could not have occurred without the following collaborators:
- DOE/SciDAC: Oak Ridge National Laboratory (ORNL), Argonne National Laboratory (ANL), Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL)
- NCAR/CISL
- ESMF
Outline
- What are the software requirements of a community earth system model?
- Overview of the current CCSM4
- How does CCSM4 address those requirements?
  - Flexibility permits greater efficiency, throughput, ease of porting, and model development
- How is CCSM4 being used in new ways?
  - Interactive ensembles: extending the traditional definition of a component
  - Extending CCSM to ultra-high resolutions
- What are CCSM4 scalability and performance?
- Upcoming releases and new CCSM4 scripts
CESM General Software Requirements
Specific High Resolution Requirements
- Capability to use both MPI and OpenMP effectively to address the requirements of new multi-core architectures
- Scalable and flexible coupling infrastructure
- Parallel I/O throughout the model system (for both scalable memory and performance)
- Scalable memory (minimal global arrays) for each component
CCSM4 Overview
- Consists of a set of 4 (5 for CESM) geophysical component models on potentially different grids that exchange boundary data with each other only via communication with a coupler (hub-and-spoke architecture)
  - New science is resulting in a sharply increasing number of fields being communicated between components
- Large code base: >1M lines
  - Fortran 90 (mostly)
  - Developed over 20+ years
  - 200-300K lines are critically important --> no computational kernels, need good compilers
- Collaborations are critical
  - DOE/SciDAC, University Community, NSF (PetaApps), ESMF
What are the CCSM Components?
- Atmosphere component: CAM, DATM (WRF)
  - CAM modes: multiple dycores, multiple chemistry options, WACCM, single column
  - Data-ATM: multiple forcing/physics modes
- Land component: CLM, DLND (VIC)
  - CLM modes: no BGC, BGC, Dynamic Vegetation, BGC-DV, Prescribed Vegetation, Urban
  - Data-LND: multiple forcing/physics modes
- Ice component: CICE, DICE
  - CICE modes: fully prognostic, prescribed
  - Data-ICE: multiple forcing/physics modes
- Ocean component: POP, DOCN (SOM/DOM), (ROMS)
  - POP modes: ecosystem, fully coupled, ocean-only, multiple physics options
  - Data-OCN: multiple forcing/physics modes (SOM/DOM)
- New land ice component
- Coupler: regridding, merging, calculation of ATM/OCN fluxes, conservation diagnostics
CCSM Component Grids
- Ocean and sea ice must run on the same grid
  - displaced pole, tripole
- Atmosphere and land can now run on different grids
  - these are in general different from the ocean/ice grid
  - lat/lon, but also the new cubed sphere for CAM
- Globally, grids span low resolution (3 degree) to ultra-high
  - 0.25° ATM/LND [1152 x 768]
  - 0.50° ATM/LND [576 x 384]
  - 0.1° OCN/ICE [3600 x 2400]
- Regridding
  - done in parallel at runtime using mapping files that are generated offline using SCRIP
  - in the past, grids have been global and logically rectangular, but now they can be single point, regional, cubed sphere, ...
  - regridding issues are rapidly becoming a higher priority
CCSM Component Parallelism
- MPI/OpenMP
  - CAM, CLM, CICE, and POP have hybrid MPI/OpenMP capability
  - the coupler has MPI capability only
  - the data models have MPI capability only
- Parallel I/O (use of the PIO library)
  - CAM, CICE, POP, CPL, and the data models all have PIO capability
New CCSM4 Architecture
[Figure: processor-versus-time layouts. In the new single-executable CCSM4 architecture (cpl7), a driver controls the time evolution of CAM, CLM, CICE, POP, and CPL (regridding, merging), which can be arranged in a fully sequential layout or in hybrid sequential/concurrent layouts. In the original multiple-executable CCSM3 architecture (cpl6), CAM, CLM, CICE, POP, and CPL each ran concurrently on separate processor sets.]
Advantages of CPL7 Design
- New flexible coupling strategy
  - design targets a wide range of architectures: massively parallel peta-scale hardware, smaller Linux clusters, and even single laptop computers
  - provides efficient support of varying levels of parallelism via simple run-time configuration of the processor layout
- New CCSM4 scripts provide one simple xml file to specify the processor layout of the entire system, plus automated timing information to simplify load balancing (see the sketch below)
- Scientific unification
  - ALL model development done with one code base: elimination of the separate stand-alone component code bases (CAM, CLM)
- Code reuse and maintainability
  - lowers the cost of support/maintenance
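As a minimal sketch of that run-time configuration, assuming the xmlchange utility and the NTASKS/NTHRDS/ROOTPE naming convention of the CCSM4 case xml files (treat the exact file and variable ids as assumptions to check against the scripts User's Guide):

    # Illustrative layout: CAM on 128 MPI tasks with 2 OpenMP threads each,
    # and POP started at processor 128 so the ocean runs concurrently
    # with the other components.
    ./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val 128
    ./xmlchange -file env_mach_pes.xml -id NTHRDS_ATM -val 2
    ./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val 64
    ./xmlchange -file env_mach_pes.xml -id ROOTPE_OCN -val 128

The same single executable reads whatever layout the xml file specifies, which is what makes load-balancing experiments a configuration change rather than a code change.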
More CPL7 Advantages
- Simplicity
  - easier to debug: much easier to understand the time flow
  - easier to port: ported to IBM p6 (NCAR), Cray XT4/XT5 (NICS, ORNL, NERSC), BGP (Argonne), BGL (LLNL), Linux clusters (NCAR, NERSC, CCSM4-alpha users)
  - easier to run: new xml-based scripts give users a friendly capability to create "out-of-box" experiments
- Performance (throughput and efficiency)
  - much greater flexibility to achieve an optimal load balance for different choices of resolution, component combinations, and component physics
  - automatically generated timing tables provide users with immediate feedback on both performance and efficiency
CCSM4 Provides a Seamless End-to-End Cycle of Model Development, Integration and Prediction with One Unified Model Code Base
New Frontiers for CCSM
- Using the coupling infrastructure in novel ways
  - implementation of interactive ensembles
- Pushing the limits of high resolution
  - capability to really exercise the scalability and performance of the system
CCSM4 and PetaApps
- CCSM4/CPL7 is an integral piece of an NSF PetaApps award
  - funded 3-year effort aimed at advancing climate science capability for petascale systems
  - NCAR, COLA, NERSC, U. Miami
  - interactive ensembles using CCSM4/CPL7 involve both computational and scientific challenges
    - used to understand how oceanic, sea-ice, and atmospheric noise impacts climate variability
    - can also scale out to tens of thousands of processors
  - also examines the use of a PGAS language in CCSM
Interactive Ensembles and CPL7
[Figure: processor-versus-time layouts for interactive-ensemble runs, e.g. multiple CAM instances (or multiple POP instances) running alongside single CLM, CICE, CPL, and driver components]
- All ensemble members run concurrently on non-overlapping processor sets
- Communication with the coupler takes place serially over the ensemble members
- Setting a new number of ensemble members requires editing 1 line of an xml file (see the sketch below)
- 35M CPU hours on TeraGrid [2nd largest allocation]
- Currently being used to perform ocean data assimilation (using DART) for POP2
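The one-line xml change referred to above might look like the following; the variable id NINST_ATM is a hypothetical name used purely for illustration, not a confirmed CCSM4 setting:

    # Hypothetical: raise the number of atmosphere ensemble instances to 8.
    ./xmlchange -file env_mach_pes.xml -id NINST_ATM -val 8

The driver then lays the instances out concurrently on non-overlapping processor sets, as in the figure above.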
CCSM4 and Ultra High Resolution
- DOE/LLNL Grand Challenge Simulation
  - 0.25° atmosphere/land and 0.1° ocean/ice
  - multi-institutional collaboration (ANL, LANL, LLNL, NCAR, ORNL)
  - first ever U.S. multi-decadal global climate simulation with an eddy-resolving ocean and a high-resolution atmosphere
  - 0.42 sypd on 4048 cpus (Atlas LLNL cluster)
  - 20 years completed
  - 100 GB per simulated month
Ultra High Resolution (cont)
- NSF/PetaApps Control Simulation (IE baseline)
  - John Dennis (CISL) has carried this out
  - 0.5° atmosphere/land and 0.1° ocean/ice
  - control run in production @ NICS (TeraGrid)
    - 1.9 sypd on 5848 quad-core XT5 cpus (4-5 months of continuous simulation)
    - 155 years completed
    - 100 TB of data generated (0.5-1 TB per wall-clock day)
    - 18M CPU hours used
  - output transferred from NICS to NCAR (100-180 MB/sec sustained) and archived on HPSS
  - data analysis using 55 TB of project space at NCAR
Next Steps at High Resolution
- Future work
  - use the OpenMP capability in all components effectively to take advantage of multi-core architectures (Cray XT5 hex-core and BG/P)
  - improve disk I/O performance [currently using 10-25% of run time]
  - improve memory footprint scalability
- Future simulations
  - 0.25° atm / 0.1° ocean
  - T341 atm / 0.1° ocean (effect of the Eulerian dycore)
  - 1/8° atm (HOMME) / 0.25° land / 0.1° ocean
CCSM4 Scalability and Performance
New Parallel I/O Library (PIO)
- Interface between the model and the I/O library. Supports:
  - binary
  - NetCDF3 (serial netcdf)
  - Parallel NetCDF (pnetcdf, MPI-IO)
  - NetCDF4
- Users have enormous flexibility to choose what works best for their needs
  - can read one format and write another
- Rearranges data from the model decomposition to an I/O-friendly decomposition (the rearranger is framework independent); model tasks and I/O tasks can be independent
PIO in CCSM
- PIO is implemented in CAM, CICE and POP
- Usage is critical for high-resolution, high processor count simulations
  - serial I/O is one of the largest sources of global memory in CCSM and will eventually always run out of memory
  - serial I/O results in a serious performance penalty at higher processor counts
- Performance benefit is noticed even with serial netcdf (model output decomposed onto the output I/O tasks)
CPL Scalability
- Scales much better than the previous version, both in memory and in throughput
- Inherently involves a lot of communication relative to flops
- The new coupler has not been a bottleneck in any configuration we have tested so far; other issues such as load balance and the scaling of other processes have dominated
- Minor impact at 1800 cores (kraken peta-apps control)
CCSM4 Cray XT Scalability (courtesy of John Dennis)
[Figure: processor-versus-time layout for the control run, with POP on 4028 cores, CAM on 1664, CICE on 1800, and CPL on 1800, achieving 1.9 sypd with I/O on 5844 cores of the kraken quad-core XT5]
CAM/HOMME Dycore
- The cubed-sphere grid overcomes dynamical core scalability problems inherent in the lat/lon grid
- Work of Mark Taylor (SciDAC), Jim Edwards (IBM), Brian Eaton (CSEG)
- PIO library used for all I/O (the work COULD NOT have been done without PIO)
- BGP (4 cores/node): excellent scalability down to 1 element per processor (86,200 processors at 0.25 degree resolution)
- JaguarPF (12 cores/node): 2-3x faster per core than BGP, but scaling is not as good; the 1/8 degree run loses scalability at 4 elements per processor
CAM/HOMME Real Planet: 1/8° Simulations
- CCSM4 - CAM4 physics configuration with cyclical year-2000 ocean forcing data sets
  - CAM-HOMME 1/8°, 86400 cores
  - CLM2 on lat/lon 1/4°, 512 cores
  - data ocean/ice, 1°, 512 cores
  - coupler, 8640 cores
- JaguarPF simulation
  - excellent scalability: 1/8 degree running at 3 SYPD on Jaguar
  - large-scale features agree well with the Eulerian and FV dycores
- Runs confirm that the scalability of the dynamical core is preserved by CAM, and the scalability of CAM is preserved by the CCSM real-planet configuration
How will CCSM4 be released?
- Leverages the Subversion revision control system
- Source code and input data are obtained from Subversion servers (not tar files)
- Output data of control runs comes from ESG
- Advantages:
  - easier for CSEG to produce frequent updates
  - flexible way for users to obtain new updates of the source code (and bug fixes)
  - users can leverage Subversion to merge new updates into their "sandbox" with their modifications
Obtaining the Code and Updates
Subversion source code repository (public): https://svn-ccsm-release.cgd.ucar.edu
[Figure: workflow. Use svn co to obtain the ccsm4.0 code; make your own modifications in your sandbox; later use svn merge to obtain new code updates and bug fixes, which Subversion merges with your own changes.]
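A minimal sketch of that workflow, using the repository URL above; the model_versions/ccsm4_0 tag path and the update tag name are assumptions chosen for illustration, not confirmed paths:

    # Obtain the ccsm4.0 code (the tag path is an assumed example)
    svn co https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0 ccsm4
    cd ccsm4
    # ... make your own modifications in this sandbox ...
    # Pull a later release into the sandbox; Subversion merges the new code
    # and bug fixes with your local changes (the update tag is hypothetical)
    svn merge https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0 \
              https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0_a01 .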
Creating an Experimental Case
- New CCSM4 scripts simplify:
  - porting CCSM4 to your machine
  - creating your experiment and obtaining the necessary input data for it
  - load balancing your experiment
  - debugging your experiment: if something goes wrong during the simulation (never happens, of course), it is simpler to determine what it is
Porting to Your Machine
- CCSM4 scripts contain a set of supported machines; users can run out of the box
- CCSM4 scripts also support a set of "generic" machines (e.g. Linux clusters with a variety of compilers)
  - the user still needs to determine which generic machine most closely resembles their own and to customize the Makefile macros for their machine
  - user feedback will be leveraged to continuously upgrade the generic machine capability post-release
Obtaining Input Data
- Input data is now in a Subversion repository
- The entire input data set is about 900 GB and growing
- CCSM4 scripts permit the user to automatically obtain only the input data needed for a given experimental configuration
Accessing Input Data for Your Experiment
Subversion input data repository (public): https://svn-ccsm-inputdata.cgd.ucar.edu
[Figure: workflow]
1. Set up the experiment with create_newcase (component set, resolution, machine)
2. Determine the local root directory where all input data will go (DIN_LOC_ROOT)
3. Use check_input_data to see if the required datasets are present in DIN_LOC_ROOT
4. Use check_input_data -export to automatically obtain ONLY the required datasets for the experiment into DIN_LOC_ROOT
5. Load balance your experimental configuration (use the timing files)
6. Run the experiment
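A minimal sketch of steps 1-4, with the case name, component set, resolution, and machine chosen purely for illustration (flag names follow the CCSM4 scripts but should be checked against the scripts User's Guide):

    # 1. Set up the experiment (arguments are illustrative)
    ./create_newcase -case mytest -compset B -res f19_g16 -mach bluefire
    cd mytest
    # 2-3. Check whether the required datasets already exist in DIN_LOC_ROOT
    ./check_input_data
    # 4. Automatically obtain ONLY the missing datasets from the repository
    ./check_input_data -export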
Load Balancing Your Experiment
- The load balancing exercise must be done before starting an experiment
- Repeat short experiments (20 days) without I/O and adjust the processor layout to
  - optimize throughput
  - minimize idle time (maximize efficiency)
- Detailed timing results are produced with each run
- This makes the load balancing exercise much simpler than in CCSM3 (a sketch of one short test follows)
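As a minimal sketch of such a short test, assuming the STOP_OPTION/STOP_N run-control variables in env_run.xml (the exact ids are assumptions to verify against the scripts User's Guide):

    # Configure a 20-day throughput test; keep history output off or minimal
    # so I/O does not skew the timings.
    ./xmlchange -file env_run.xml -id STOP_OPTION -val ndays
    ./xmlchange -file env_run.xml -id STOP_N -val 20
    # After the run, read the automatically generated timing table, shift
    # cores between components (e.g. give POP more tasks), and repeat
    # until idle time is minimized.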
Load Balancing CCSM Example
[Figure: processor-versus-time layouts for CAM, CLM, CICE, POP, and CPL7, annotated with idle time/cores. Starting from a 1664-core layout, increasing the core count for POP (layouts at 3136 and 4028 total cores) reduces idle time and raises throughput from 1.53 SYPD to 2.23 SYPD.]
CCSM4 Releases and Timelines
- January 15, 2010: CCSM4.0 alpha release - to a subset of users and vendors, with minimal documentation (except for the scripts User's Guide)
- April 1, 2010: CCSM4.0 release - full documentation, including User's Guide, model reference documents, and experimental data
- June 1, 2010: CESM1.0 release
  - ocean ecosystem, CAM-AP, interactive chemistry, WACCM
- New CCSM output data web design underway (including comprehensive diagnostics)
CCSM4.0 Alpha Release
- Extensive CCSM4 User's Guide already in place
- Apply for alpha user access at www.ccsm.ucar.edu/models/ccsm4.0
Upcoming Challenges
- This year
  - carry out IPCC simulations
  - release CCSM4 and CESM1 and updates
  - resolve performance and memory issues with the ultra-high resolution configuration on Cray XT5 and BG/P
  - create a user-friendly validation process for porting to new machines
- On the horizon
  - support regional grids
  - nested regional modeling in CPL7
  - migration to and optimization for GPUs
Big Interdisciplinary Team!
Contributors: D. Bader (ORNL), D. Bailey (NCAR), C. Bitz (U Washington), F. Bryan (NCAR), T. Craig (NCAR), A. St. Cyr (NCAR), J. Dennis (NCAR), B. Eaton (NCAR), J. Edwards (IBM), B. Fox-Kemper (MIT, CU), N. Hearn (NCAR), E. Hunke (LANL), D. Ivanova (LLNL), R. Jacob (ANL), E. Jedlicka (ANL), E. Jessup (CU), P. Jones (LANL), B. Kadlec (CU), B. Kauffman (NCAR), J. Kinter (COLA), E. Kluzek (NCAR), K. Lindsay (NCAR), W. Lipscomb (LANL), R. Loft (NCAR), R. Loy (ANL), A. Mai (NCAR), M. Maltrud (LANL), J. McClean (LLNL), J. Michalakes (NCAR), A. Mirin (LLNL), S. Mishra (NCAR), R. Nair (NCAR), M. Norman (NCSU), N. Norton (NCAR), S. Peacock (NCAR), T. Qian (NCAR), M. Rothstein (NCAR), C. Stan (COLA), M. Taylor (SNL), H. Tufo (NCAR), M. Vertenstein (NCAR), J. Wolfe (NCAR), P. Worley (ORNL), M. Zhang (SUNYSB)

Funding:
- DOE-BER CCPP Program Grants: DE-FC03-97ER62402, DE-PS02-07ER07-06, DE-FC02-07ER64340, B&R KP1206000
- DOE-ASCR: B&R KJ0101030
- NSF Cooperative Grant NSF01
- NSF PetaApps Award

Computer Time:
- Blue Gene/L time: NSF MRI Grant, NCAR, University of Colorado, IBM (SUR) program, BGW Consortium Days, IBM Research (Watson), LLNL, Stony Brook & BNL
- Cray XT time: NICS/ORNL, NERSC, Sandia
Thanks! Questions?
CCSM4.0 alpha release page: www.ccsm.ucar.edu/models/ccsm4.0