parallel io 2 update - cesm® · 2016-08-14 · •is a straight average across all decomps the...

16
Parallel IO 2 Update Kate Thayer-Calder & Jim Edwards NCAR CSEG

Upload: others

Post on 08-May-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Parallel IO 2 UpdateKate Thayer-Calder & Jim Edwards

NCAR CSEG

Page 2: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Overview

• What is Parallel IO (PIO) and why do we need it?

• What’s New in PIO2?

• Does PIO2 improve performance?

• Current status, To-Do’s, and CESM2

Page 3: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

What is PIO and why do we need it?• A library to organize data from its structures in memory to the

desired structure on disk.

1 2 3 4 5 6 7 8 9 10 11Of course, it’s not

that simple…

Page 4: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

• Computational data is distributed across multiple processors, and processes are load balanced to make efficient use of resources.

What is PIO and why do we need it?

1

1

1

1

2 2

22

3

45

3

4

5

3

4

5

3

45

P3 - Columns 1,2,3,4,5…

P1 - Columns 1,2,3,4,5… P2 - Columns 1,2,3,4,5…

P4 - Columns 1,2,3,4,5…

Page 5: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

• PIO acts as a translator that takes in distributed multi-processor computational data and spits out simple easy-to-write arrays for the NetCDF libraries (and vice-versa for reading in data).

What is PIO and why do we need it?

P3 - Columns 1,2,3,4,5…P1 - Columns 1,2,3,4,5… P2 - Columns 1,2,3,4,5…

P4 - Columns 1,2,3,4,5…

Decomposition Map

Page 6: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

• And you can imagine how much more complicated this story is at higher and higher resolutions…

What is PIO and why do we need it?

Page 7: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

What’s new in PIO2?

1) Library rewritten in C to improve portability and flexibility with different software products.

Page 8: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

What’s new in PIO2?2) PIO2 supports an extra method for handling the computational data translation.

Old (PIO1) Method: BOXP1 P2 P3 P4 P5 P6

IO Task 1 IO Task 2

NetCDF library

*Higher MPI overhead, but fastest netcdf operations. Good for large data, smaller

processor counts.

Page 9: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

What’s new in PIO2?2) PIO2 supports an extra method for handling the computational data translation.

New (PIO2) Method: SUBSETP1 P2 P3 P4 P5 P6

IO Task 1 IO Task 2

NetCDF library

*Less MPI overhead, but more netcdf operations.

Good for medium data, large processor counts.

Page 10: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

What’s new in PIO2?3) PIO is now hosted on GitHub with doxygen-generated documentation and automated nightly tests on CDash.

https://github.com/NCAR/ParallelIO

http://ncar.github.io/ParallelIO/

http://my.cdash.org/index.php?project=PIO

Page 11: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Does PIO2 improve performance?IO WRITES

IO READS

Page 12: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Does PIO2 improve performance?Can we tune IO in CESM to get better

performance from PIO2?

• Default B1850 PIO1 run for 1 year: # simulated years / cmp-day = 6.254

• Default B1850 PIO2 run for 1 year: # simulated years / cpm-day = 6.210

• Change CPL from default 40 IO Tasks to 5 IO Tasks: # simulated years / cpm-day = 6.314 (1% improve)

All CPL Decomps

Avg All CPL Decomps

Page 13: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

• Is a straight average across all decomps the correct way to look at this?

• What if most model IO time is spent on a single large or complex IO operation - it would be best to optimize only that one.

• Tune by decomp size or complexity?

Does PIO2 improve performance?

• Currently adding custom IO timers to CAM to see which decomps are related to the highest IO cost.

B1850 CAM Decomp Performance Box Reads

Page 14: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Current Status PIO2 and CESM2

• PIO2 is an option in the latest versions of CIME, but PIO1 is still the default.

• Fixed a few problems that were causing alpha & beta tests to fail with PIO2 but still have some outstanding issues with CLM.

• Planned to be included in CESM2.

• If you’re interested in doing some experiments in PIO2, contact me and I can help you get started: [email protected]

Page 15: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

Thank you! Questions?

Page 16: Parallel IO 2 Update - CESM® · 2016-08-14 · •Is a straight average across all decomps the correct way to look at this? • What if most model IO time is spent on a single large

How to tune PIO2• PIO_VERSION=2 (env_build.xml)

• {ATM|CPL|GLC|ICE|LND|OCN|ROF|WAV} = ###

• ###_PIO_NUMTASKS

• ###_PIO_REARRANGER (1=box, 2=subset)

• ###_PIO_ROOT (root PE)

• ###_PIO_STRIDE (num tasks btn IO tasks)

• ###_PIO_TYPENAME (netcdf, pnetcdf, netcdf4p, netcdf4c)