TRANSCRIPT
Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss

DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
Carlos Osuna, MeteoSwiss
V. Clement, O. Fuhrer, S. Moosbrugger, X. Lapillonne, P. Spoerri, T. Schulthess, F. Thuering, H. Vogt, T. Wicky
MPI Seminar, 21st November 2017
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
COSMO 2km ensembles (21 members) / COSMO 1km
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
Carlos Osuna, MPI Seminar, 21 November 2017
Multi-core / GPU / MIC
Top 500 overview: Sunway TaihuLight, Tianhe-2, Titan, Sequoia, K Computer, Piz Daint, Trinity, Cori, Oakforest-PACS, Gyoukou
Disruptive emergence of accelerators in HPC. High memory bandwidth has sped up our models. On NVIDIA GPUs:
• dynamical cores: 2-3x
• physical parametrizations: 3-7x
• spectral transforms: > 10x
How to achieve portability for our GFD models?
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
COSMO global 1km
Discussion paper under review: https://www.geosci-model-dev-discuss.net/gmd-2017-230/
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
A domain specific language (DSL) for GFDs:
• abstracts away all unnecessary details of a programming language (Fortran) implementation
• concise syntax: only those keywords needed to express the numerical problem

The GridTools Ecosystem:
• Set of C++ APIs / libraries
• DSL for PDEs
• Large class of problems
• Performance portability
• Separation of concerns
• Interface to Fortran
• Open source license
DSL for Weather Codes on Unstructured Grids
DSLs are a successful approach to deal with the software "productivity gap". ESCAPE develops a DSL based on GridTools: library tools for solving PDEs on multiple grids. The DSL syntax requires only the grid-point operation:
• Tiled loop templates and parallelization are abstracted
• Use of fast on-chip memory is also abstracted
• The grid is abstracted (in combination with Atlas)

lap = -4*u + u(i+1) + u(i-1) + u(j+1) + u(j-1)
Writing Operators using DSL
Global Grids
• New DSL constructs for stencils on global grids
• Expose an unstructured grid API
• Leverage grid structure for performance
Icosahedral / Cubed sphere / Octahedral
Grid Abstraction
Icosahedral / Octahedral / Dual-grid / Cubed sphere
constexpr auto offsets = connectivity<edges, vertices, Color>::offsets();
for (auto off : offsets) {
    eval(flux) += eval(pD(off));
}

// equivalent, using the built-in reduction:
eval(flux) = eval(sum_on_vertices(0.0, pD));

• The DSL syntax elements are grid independent.
• The same code using the DSL can be compiled for multiple grids.
• GridTools will support efficient native layouts for octahedral/icosahedral grids.
• GridTools will use Atlas in order to support any grid.
Example of GridTools DSL: MPDATA
Full MPDATA implemented using the GridTools DSL:
• upwind fluxes
• minmax limiter
• compute fluxes
• limit fluxes
• flux solution
• rho correction

m_upwind_fluxes = make_computation<gpu>(domain_uwf, grid_,
    make_multistage(execute<forward>(),
        make_stage<upwind_flux, octahedral_topology_t, edges>(p_flux(), p_pD(), p_vn()),
        make_stage<upwind_fluz, octahedral_topology_t, vertices>(p_fluz(), p_pD(), p_wn()),
        make_stage<fluxzdiv, octahedral_topology_t, vertices>(p_divVD(), p_flux(), p_fluz(), p_dual_volumes(), p_edges_sign()),
        make_stage<advance_solution, octahedral_topology_t, vertices>(p_pD(), p_divVD(), p_rho())));
Composition of multiple MPDATA operators
Grid Abstraction: Performance Portability
Thanks to the abstraction of the mesh we can implement any partitioning:
• O32 mesh: equal partitioner implemented with Atlas
• Structured partitioner implemented with Atlas
Atlas: a library for NWP and climate modelling (Deconinck et al. 2017)
Willem Deconinck
Grid Abstraction: Performance Portability
The abstraction of the mesh lets us implement any indexing layout, and indexing has a large impact on performance (NVIDIA P100, 128x128x80 domain, bandwidth in GB/s):

Layout | Bandwidth (GB/s)
SN DA  | 211
SN IA  | 270
UN IA  | 130
HN IA  | 256
GridTools and Atlas:
• solely rely on a C++ compiler → no external tools
• provide a high level of abstraction → performance portability
But writing in this new programming model:
• is hard (C++ expertise) → decreased productivity
• is unsafe → no protection from violating the parallel model
• often requires expert knowledge to produce efficient code
DSLs and abstractions: state of the art
What is the future of GFD models in a complex HPC environment?
1. Need to increase model development productivity
2. Need to decrease maintenance cost of models
3. Need to extend tools to a large set of models, methods, domains
4. Need to maximize efficiency
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
Need to increase model development productivity
High-level DSLs with a concise, safe and intuitive syntax that remove boilerplate from embedded GridTools
Need to decrease maintenance cost of models
• Check for parallel errors
• Auto-generation of boundary conditions and halo exchanges
• Check for out-of-bounds accesses
• Safe generation of loop data dependencies
Need to extend tools to a large set of models, methods, domains.
GFD models: ICON-atmos, IFS, COSMO, LFRic, ICON-Ocean
DSLs and programming models: GridTools, Liszt, PsyClone, HybridFortran, Firedrake, CLAW
Options:
• Unify all tools in a single DSL software stack → maintenance issue
• One DSL (programming model) per model → currently leading to an explosion of DSL solutions
• ESCAPE2
In the ESCAPE2 DNA3:
• Develop programming languages for GFD models that can increase productivity and lower maintenance
• Standardization, avoid an explosion of tools
• Support multiple domains (FV weather, FV ocean, structured, unstructured, FE) in the standardization
• Multiple domain languages or customizations, single tool
• Language to be defined together with scientists
ESCAPE2 base technology: DAWN
GTCLANG
Modular compiler toolchain (based on modern compilers: LLVM)
ESCAPE2 base technology: DAWN
Multiple variants of a language:
• Structured: a[i+1]
• Unstructured: on_cells(+, a)
Plugins: ICON Ocean plugin, IFS collocated FE plugin
Fabian Thüring
Example – Horizontal Diffusion
stencil_function laplacian {
    storage phi;
    Do {
        return 4.0 * phi - phi[i+1] - phi[i-1] - phi[j+1] - phi[j-1];
    }
};

stencil hd_type2 {
    storage hd, in;
    temporary_storage lap;
    Do {
        vertical_region(k_start, k_end) {
            lap = laplacian(in);
            hd = laplacian(lap);
        }
    }
};
Example – Horizontal Diffusion
stencil hd_type2 {
    storage u, utens;
    temporary_storage uavg;
    Do {
        vertical_region(k_start, k_end) {
            uavg = (u[i+1] + u[i-1]) * 0.5;
            utens = uavg[k+1];
        }
    }
};
Safety First
stencil hd_type2 {
    storage u, utens;
    temporary_storage uavg;
    Do {
        vertical_region(k_start, k_end) {
            uavg = (u[i+1] + u[i-1]) * 0.5;
            utens = uavg[j+1] + uavg[j-1];
        }
    }
};
Safety First
Interoperability: Escaping the language

DSL code:

stencil_function laplacian {
    storage phi;
    Do {
        return 4.0 * phi - phi[i+1] - phi[i-1] - phi[j+1] - phi[j-1];
    }
};

stencil hd_type2 {
    storage hd, in;
    temporary_storage lap;
    Do {
        vertical_region(k_start, k_end) {
            lap = laplacian(in);
            hd = laplacian(lap);
        }
    }
};

Standard C++ code (+OpenMP, +OpenACC, +CUDA):

#pragma omp parallel for nowait
#pragma acc …
for (int i = 0; i < isize; ++i) {
    for (int j = 0; j < jsize; ++j) {
        for (int k = 0; k < ksize; ++k) {
            udiff(i,j,k) = hd(i+1,j,k) + hd(i-1,j,k);
        }
    }
}
Standardization: SIR

b = a(i+1) + a(i-1)

{ "filename": "/code/access_test.cpp",
  "stencils": [ {
      "name": "hori_diff",
      "loc": { "line": 24, "column": 8 },
      "ast": { "block_stmt": { "statements": { "expr_stmt": { "assignment_expr": {
          "left":  { "field_access_expr": { "name": "b", "offset": [] } },
          "right": { "binary_op": {
              "left":  { "field_access_expr": { "name": "a", "offset": [1,0,0] } },
              "right": { "field_access_expr": { "name": "a", "offset": [-1,0,0] } } } }
      } } } } } } ] }

Multiple DSL (thin) frontends can generate the standard SIR scheme: GTCLANG (C++), T4Py (Python), CLAW (Fortran), PsyClone (Fortran), ESCAPE2 / Ocean, GridTools
DAWN: Compiler optimization passes
COSMO dycore evaluation of the DSL toolchain
[Charts: Horizontal Diffusion on P100 for the Type 2 u, v, w, pp and Smagorinsky operators; Advection (metric and time terms, P100)]
Conclusions
• The future of GFD models (high resolution, multi-decade simulations) will bring serious computational challenges: efficiency, production cost, and maintenance cost.
• We need to find the right programming models so that adaptation to new architectures does not hinder scientific development → high productivity.
• Single efforts are not a valid model anymore; we need collaborations.
• ESCAPE2 is a great opportunity to define the new language together with scientists; we have the technology in place.
BACKUPS
Internal presentation, Valentin Clément
MeteoSvizzera, Via ai Monti 146, CH-6605 Locarno-Monti, T +41 58 460 92 22, www.meteosvizzera.ch
MétéoSuisse, 7bis, av. de la Paix, CH-1211 Genève 2, T +41 58 460 98 88, www.meteosuisse.ch
MétéoSuisse, Chemin de l'Aérologie, CH-1530 Payerne, T +41 58 460 94 44, www.meteosuisse.ch
MeteoSwiss, Operation Center 1, CH-8058 Zurich-Airport, T +41 58 460 91 11, www.meteoswiss.ch