european extreme data and computing initiative - …...production/analysis applications big data and...

18
ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE Centres of Excellence view on: “What are the important research goals for the next 5 years”

Upload: others

Post on 19-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Centres of Excellence view on:

“What are the important research goals for the next 5 years”

Page 2: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

EoCoE - Energy oriented Centre of Excellence for computer

BioExcel - Centre of Excellence for Biomolecular Research

NoMaD - The Novel Materials Discovery Laboratory

MaX - Materials design at the eXascale

ESiWACE - Excellence in SImulation of Weather and Climate in Europe

E-CAM - An e-infrastructure for software, training and consultancy in simulation and modelling

POP - Performance Optimisation and Productivity

COEGSS - Center of Excellence for Global Systems Science

CompBioMed - A Centre of Excellence in Computational Biomedicine

1st Generation of CoE

Page 3: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

• Collaborate with PRACE on: best practices software environment organizational requirements for science

domains training

• PRACE HLST• Use PRACE resources for benchmarking etc.• Use center competence for co-design and

exascaling (all tiers) • Help communities accessing PRACE

resources

• Collaborate with FET research projects to: harden and integrate results to

production quality provide real-world test cases

• Provide for ESDs: provide requirements optimize software for targeted

ESDs – will also be important for smaller scales

co-design

Ecosystem: cPPP

Page 4: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Goals of CoE

Ensure European competitiveness in the application of HPC to address scientific, industrial or societal challenges:

• Excellence in research, development and service;• User-driven, with the application users and owners playing a decisive role in governance; • Integrated: encompassing not only HPC software but also relevant aspects of hardware, algorithms, data, workflows, connectivity,

security, etc., covering the full scientific/industrial workflow and addressing full-scale, widely used applications with proven impact,

not just application kernels or mini-apps;• Multidisciplinary: with teams that combine application domain and user expertise together with HPC system, software and

algorithm expertise; • Distributed with a possible central hub, federating capabilities around Europe, exploiting available competences, and ensuring

synergies with national/local programmes; • Showing a clear path towards Exascale covering both computing and extreme data• Committed to a culture of collaboration and sharing of best practice among CoEs on cross-cutting topics, such as co-design and

Exascale technologies, software sustainability, and potential business models; • Committed to collaborate with the upcoming Extreme Scale Demonstrators, providing them with a high performance and

scalable application base; • Actively contributing to economic and societal benefit, by facilitating high-impact research and collaborating with industry

and SMEs.

Page 5: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Present CoE

Vertical (domain) CoEs, which provide services to well defined communities:• Software development, support & consultancy, training• Frontier developments towards exascale requirements in Material Sciences, Weather&Climate or Bio may be very different

Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:• Software development, support & consultancy, training• Performance analysis potentially complex interaction matrix

Page 6: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Future CoE

Vertical (domain) CoEs, which provide services to well defined communities:• Current themes+ Engineering?+ Fundamental physics?+ Environmental sciences (adaptation to environmental change)?Cover entire value chain (from science to industry)Compute and data (from exaflop to exabyte)

Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:• Current themes+ High-performance data analytics?+ Software sustainability?+ Mathematics and algorithms? Enhance culture of collaboration (CoE matrix)

Page 7: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Future CoE: Considerations

• Potentially merging of existing CoE (cf. Flagships):• smaller number of domains that have shown a clear path to exascale/co-design or larger variety?• may provide more efficient x-benefit for several communities towards exascale but effectiveness

may suffer (weak links affect wider area)• providing support for merged communities will be challenging

• Importance of Exascale:• addressing key science challenges is most important ( Exaflop)• technological developments (e.g. co-design EsD top 3 in 2022) run away from applications• only few CoE have really big data tasks (e.g. ESiWACE)

• Stimulation of SME:• smaller dedicated efforts may be more effective (e.g. SHAPE, Fortissimo)• unified software development vs specialized SME niches

• Sustainability:• assimilating cutting-edge FET developments• business model needs simple/realistic approach (and public funding for some time to come)

Manage expectations of CoE what will be able to achieve & sustain – given the available funding

Page 8: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE in SRA-2

• Section 8 EsD introduction, page 67: “These EsDs should provide platforms deployed by HPC centres and used by CoEs for their production of new and relevant applications.”

• Section 8.2, Proposal of ETP4HPC for EsD Calls: “This and other hardware characteristics (energy efficiency, I/O bandwidth, resiliency, etc.) will be detailed in the 2017 release of the SRA, also taking into account results from the FETHPC projects and requirements from the CoEs.”

• Section 9.3, Centres of Excellence for Computing Applications: “The Centres of Excellence in Computing Applications (CoEs) form one of the three pillars of the European HPC Ecosystem and represent the European Application expertise […]. ETP4HPC will be working to include the CoEs in the processes of the contractual Public-Private Partnership for HPC and synchronise their efforts with those of the other two pillars of the European HPC Ecosystem (i.e. ETP4HPC and the FETHPC projects, PRACE).

Page 9: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Target for addressing key science challenges in weather & climate prediction: Global 1-km Earth system simulations @ ~1 year / day rate

Example: ESiWACE Science Challenge

10 km

1 km

Page 10: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Example: NOMAD Science and Data Handling Challenges

Discovering interpretable patterns and correlations in this data will

• create knowledge• advance materials science, • identify new scientific phenomena, and• support industrial applications.

The NOMAD Archive10.000

100

1

# G

eo

met

ries

per

Co

mp

osi

tio

n

1 Mio

1.000

1

# C

om

po

siti

on

s

Transparent metals

Photovoltaics

Superconductors

Thermal-barrier materials

Descriptor A

Des

crip

tor

B

The NOMAD challenge: Build a mapand fill the existing white spots

NOMAD supports all important codes in computational materials science. The code-independent Archive contains data from manymillion calculations (billions of CPU hours).

Data is the raw materials of the 21st century

Page 11: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Agent Based Modelling for Synthetic Information System Simulation

• Higher Scalability Extreme communication overheads• Big Data Outputs Need for enhanced HPDA capabilities

Example: CoeGSS Science Challenge

Page 12: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

E183

E226

D259

E236

R0

R1

R2

R3

K4

R5

VSD

A

B

Molecular Dynamics on the exascale

• Understanding proteins and drugs

• A 1 μs simulation: 10 exaflop

• Many structural transition: many simulations needed

• Study effect of several bound drugs

• Study effect of mutations

• All this multiplies to >> zettaflop

• Question: how far can we parallelize?

Example: ion channel in a nerve cell.Opens and closes during signalling.Affected by e.g. alcohol and drugs.200 000 atoms

Page 13: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE and SRA-3Domain CoE: What are the key scientific challenges and what does it take to achieve them?

BioExcel CompBioMed MaX ESiWACE E-CAM

HPC System Architecture and Components

Large width vector units, low-latency networks, high-bandwidth and large memory; fast CPU<->accelerator transfer rates, Heterogeneous acceleration, floating-point

Hybrid systems, GPUs, high-bandwidth memory

Scientific challanges: accurate computations requiring more powerful HPC systems,heterogeneous systems designed to optimize end-to-end workflows

High-bandwidthmemory, networks, NVRAM

High memory bandwidth, large RAM

System Software and Management

Dynamic (task) scheduling, Support for workflows

Dynamic scheduling, urgent computing

Virtual Machine model supported Dynamic scheduling, compilers

Cross compiling, archiving tools

Programming Environment

Standardization, portability, task parallelism, fast code driven by Python interfaces

Portability, ease of scale out, MPI extensions

support new MPI and OpenMPstandards + new development tools like python

Fast standardization, DSL

Interactive testing, OpenCL support, sustainable support of standards

Energy and Resiliency

Distributed computing techniques to handle resiliency/fault tolerance

Reduced cost, improved fault tolerance

Energy optimized workflows: HPC systems including energy monitoring and profiling.

Fault handling, less precision

Energy aware algorithms

Page 14: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE and SRA-3

Domain CoE: What are the key scientific challenges and what does it take to achieve them?

POP CoEGSS NOMAD EoCoE

HPC System Architecture and Components

Throughput oriented devices (vectors), memory architectures and how to use them, architectural support for runtimes, mechanisms to monitor progress and notify runtimes in cases of resource preemptions

Convergence between HPC & HPDA, NVRAM, Fast networks

For extreme scale simulations: powerful compute nodes, high bandwidth memory, low-latency networks.For data analytics: convergence between HPC & HPDA

accelerator with shared memoryFPGAProcessor in memoryNVRAM, large memory

System Software and Management

Dynamic, interactive use of available resources, tight and bidirectional communication/cooperation between job schedulers and runtimes

Dynamically scaling jobs, Integration of HPC & HPDA, Visualization (in situ), Data analytics (in situ) Programming Environment

Containers, scalable data bases, interactive access to subsets of resources, remote visualization, synchronization of big data sets between different computer centres in different countries and continents

complex, light and flexible workflow (notebooks based on python may be a solution comparing to other heavy solutions)Allowing an easy access to large HPC ressourcesRuntimesHeterogenous job handling for coupled codes/libraries

Programming Environment

Programming model and runtime support for malleability, asynchrony/out of order task execution, hide heterogeneity and tolerate latency and variability, powerful performance analytics in tools, tools for task dependencies and memory access patterns, programmers mindset from bottom-up latency dominated tothroughput oriented mentality

Well-defined standards /and tools that implement the standards) for agent-based modeling, Compilers, Debuggers

MPI, OpenMP, GPU programming, Python, tools for profiling and optimization of scalable data bases

Task programmingautomatic optimization (as BOAST)domain specific language to test new algorithms more easilyPerformances analysis toolsAutomated, continual performance monitoring of production

Energy and Resiliency

Better integration between algorithmic based fault detection techniques and mechanism in the infrastructure from detected errors

Not important for CoeGSS Energy aware system operation, algorithms and workflows

Fault toleranceAnalysis in situ

Page 15: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE and SRA-3 (cont’d)Domain CoE: What are the key scientific challenges and what does it take to achieve them?

BioExcel CompBioMed MaX ESiWACE E-CAM

Balance Compute, I/O and Storage Performance

Post-processing on the fly, Data-focused workflows, handling lots of small files in bioinformatics

Post-processing on the fly, easy transfer between storage tiers

HT material science workload becomes quickly memory and I/O bound: systems with high IOPS and post posix data objects are required.

Post-processing on the fly, multi-tier software

Fast access of archive data, multi-platform workflows, multi-threaded applications for hybrid production/analysis applications

Big Data and HPC usage Models

Proximity of data generation and analysis/visualization resources, workflows, machine learning for analyzing simulation data, high-throughput sampling

Analytics of simulation outputs, visualisation

Workflows, intelligent data analytics

Recomputing, dataanalytics

Fast access to data bases, data mining

Mathematics and algorithms for extreme scale HPC systems

Multi-scale algorithms, task-parallel algorithms, Electrostatics solvers, ensemble sampling & clustering theory, ensemble simulations

Novel time stepping algorithms, automated implementation of multiscale computing patterns

New algorithms avoiding synchronous (unnecessary) data dependency and exploiting unreducible data dependency tree (nesting), to improve concurrency and locality

Disruptive numerical methods (discretization), data placement

Memory/cache aware algorithms, asynchronous algorithms, efficient handling of long range/collective correlations

Page 16: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE and SRA-3 (cont’d)Domain CoE: What are the key scientific challenges and what does it take to achieve them?

POP CoEGSS Nomad EoCoE

Balance Compute, I/O and Storage Performance

Integration of asynchronous I/O interface in programming model, better integration of programming models/languages and persistent storage

Converged systems, Live data analytics, Strong data movement capabilities

Data-centric workflows, high I/O bandwidth, handling huge number of files

In situ data analysis coupled to dedicated I/O systemHigh-band large capacity storage

Big Data and HPC usage Models

Better integration between programming model and storage interface, more dynamic, interactive supercomputing practices, make users/programmers aware of cost/benefit of each individual data and computation for better resource/storage scheduling

HPDA Platform support, Algorithms / Models for efficient HPDA

High performance data analytics of simulation outputsand derived data, faster analysis of complex and Big Data structures (map reduce, faceted search), fast and efficient data mining. Significant advancements in hard and software for handling Big Data more efficiently

Techniques of data analysis coming from Big Data (robust solution with fault tolerance)Reproducibility and ease of calculations based on notebooksHigh-throughput computing

Mathematics and algorithms for extreme scale HPC systems

Algorithm complexity (computation/communication). asynchrony and variability tolerance, algorithm based fault detection

Algorithms for efficient data analytics

Novel scalable algorithms for extreme scale simulations and for extreme scale data analytics

parallel dynamics (molecular or for climate)multiscale modelling so better coupling between codes as QM/MMlinear algebraParallel in time algorithms

Page 17: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

Top 3 Challenges – VERY EARLY DRAFT

• HPC System Architecture and Components– Efficient use of memory and I/O hierarchies - links to Balance

Compute, I/O and Storage Performance

– Efficient interaction between “fat” and “thin” (GPU) cores

• System Software and Management – Software standards (C++17 and Fortran 2015 in particular, but also

OpenMP 4.5, MPI 3.1, OpenCL 2.2,...)

• Programming Environment– (Dynamic) environments for task parallelism.

18

Page 18: European eXtreme Data and Computing Initiative - …...production/analysis applications Big Data and HPC usage Models Proximity of data generation and analysis/visualization resources,

ETP4HPC SRA-3 Kick-off meeting IBM IOT, Munich, March 20th 2017 Peter Bauer & Erwin Laure 4CoE

CoE and SRA-3/EsD

Assumption: EsD will be precursors of European exascale HPC facilities hosting novel technologies and software stack

EsD are FLOP focused, CoE application areas may not be!Extreme scale data analytics not covered yet!

If the key role of CoE is to provide wider (domain) user community with expertise, software etc.:1. How is transition from ‘novel’ to ‘applicable’ be managed? What is the role of FET

projects in this context? (there is also no certainty for sustainability of FET projects per domain)

2. How are key applications, to be run on EsD in operational mode, adapted to novel architecture, stack, programming models etc.? By whom? With what funding?

3. If future CoE become wider (less domain specific):a. how can the above be realized?b. will the horizontal CoE do the job?