www.egi.eu
www.ui.sav.sk
EGI vision for Data and Distributed
Computing e-Infrastructures for Open
Science
Tiziana Ferrari, EGI.eu; Ladislav Hluchý, Institute of Informatics SAS
10/10/2016
Challenge and scope
EINFRA-12 (A) Meeting, 13-09-2016
Impact
EINFRA-12 Challenge
Compute/Storage/Data/Security
Federation Services
Thematic Services
Training and outreach
“Integrate at European level the geographically and disciplinary dispersed resources to achieve economies of scale and efficiency gains in providing the best data and computing capacity and services to the research and education communities.”
Issues:
- Geographical and disciplinary fragmentation
- Lack of economies of scale
Impact and Excellence
• Concerned platforms and services are based on systems
and technologies that have reached at least TRL 8 (“system
complete and qualified”) before the start of the project.
• Quality and Quantity of services in a joint service
catalogue: “The extent to which the Service Activities
(Trans-national and/or Virtual Access Activities) will offer
access to state-of-the-art infrastructures and high quality
services”
• Potential to enhance capacity for innovation and
production of new knowledge
Exploitation of services for excellent science and industry/SMEs
Requirements/1
• Integration of computing, software and storage resources
– EGI services: Compute/Storage/Data management/AppDB/UMD and CMD
distribution
– External services: EUDAT and other e-Infrastructures, RIs
• Exposing them through a dynamic registry and catalogue of services
supporting European research and education communities in their
tasks related with data and computing intensive science
– EGI new services: marketplace (from EGI-Engage), Cloud services for data
science educational activities
• This integration should be done by means of open and flexible
architectures and include institutional, regional, national and European
capabilities, packaging them in the optics of end-user needs
– Support and training for federated service management including service portfolio
management
– Extend federation to regional infrastructures e.g. funded through ESIF
Requirements/2
• Seamless operation of highly scalable and agile data and computing
platforms and services dedicated to analytics including hardware and
software components, database, compilers, analytics software,
supported by easy user entry points for the community of users
– EGI services: federation fabric (aka Core Infrastructure Platform) and
federated service management tools and activities. Includes new core
infrastructure services of general interest.
– New services: Data Hub/INDIGO-DataCloud, EOSCpilot successful PoCs
– Activities:
• Operation of the EGI Core Infrastructure Platform and federated operations
• Operation of EGI community platforms (for research and industry), Integration of new
ones at TRL 8
• Integration with thematic data infrastructures (RIs, national data infrastructures) and
generic ones (EUDAT)
Requirements/3
• Reliably address the aspects of privacy, cybersecurity and
information assurance supporting multiple compartments
with private, public or industrial corpus of data, protected
from unauthorized access by secure interfaces
– Security coordination and policy development
– Incident response
– Security training
– Security certification in compliance with data protection regulations
– AAI services (credential translation, attribute translation, IdP/SP
proxy services: “EGI CheckIn”)
Requirements/4
• Adoption of standards-based common interfaces, open
source components enabling access and processing of
underlying data collected/stored in different platforms and
formats.
• Empowering users to customise application and services
tailoring them to specific requirements, which will differ
across disciplines, applications etc.
– Support to open standards
– Maintenance of critical middleware components
– Technical integration of new community platforms according to
EGI participants’ priorities and EC priorities (e.g. Copernicus/DIAS)
through competence centres
Requirements/5
• Work closely with user communities (from different disciplines) to
foster the use of digital infrastructures
– Outreach, training
• Promote the values of open science and support their data
management plans
– Distributed storage infrastructure for European research collaborations
and long tail of science for depositing data
– Open access to data management planning tools available at RI/EGI
participant level, cooperation with EUDAT/OpenAIRE and relevant national
authorities/EIROs (e.g. Zenodo/CERN and others)
• Engage and train users (researchers, educators and students) to
contribute to the dynamic registry and catalogue of services
improving quality of data, software and computing infrastructure that
become available for re-use
– Promotion of marketplace with new service providers (public/private)
Requirements/6
• Foster interoperability of pan-European
thematic/community-driven e-infrastructures providing
cost-effective and interoperable solutions for data
management. The data and computing e-infrastructure
should be able to interoperate with resources based on
different technologies which are operated/owned by
public and/or private organisations
– Establish a commercial cloud supplier group and define technical
interoperability requirements and a roadmap for the European Open
Science Cloud (leveraging HNSciCloud results)
– Establish EGI as cross-border lead procurer (in collaboration with
GEANT FPA initiatives)?
Requirements/7
• Support the preservation and curation of data and
associated software so that the reproducibility and
accuracy of the data can be verified
– EGI services: integrate AppDB and Cloud Compute with
preservation infrastructures (e.g. Zenodo, B2SHARE…) for
preservation of VM images and linking to data and
Cloud Compute
– Liaise with EINFRA-12 (B) proposal (OpenAIRE)
– Repurpose community data preservation and data
management instruments of general interest
Requirements/8
• Enable seamless transition and e-infrastructure upgrades,
exploiting economies of scale and promoting
interoperability with similar infrastructures across and
beyond Europe and operate user-friendly and
comprehensive repositories of software components for
research and education
– EGI services: UMD and CMD software distributions, AppDB, quality
verification of services for marketplace
– International cooperation for data-driven research collaborations
• HBP/Astronomy and Astrophysics/Life Science…
EGI Services/NGI-EIRO services
Compute
High-throughput
GPU
Cloud
Cloud Container
Cloud GPU
Storage
Online
Archival
Data
Data transfer
Public Data
Security
User attributes
management
- Providers
  - NGIs and EIROs
  - (Commercial suppliers)
- Funding
  - 100% National funding agencies
  - Private funds
- EINFRA-12 funding
  - EC (cost for pooling and supporting new user groups) + National/Private funds establishing a distributed operations coordination and technical support team
EGI Services/federation services
Helpdesk
GGUS
Technical support
Security
Coordination, CSIRT and
policies
VO and user registration
IDP and IDP Proxy
Credential translation
Attribute Management
Accounting
Repository
Portal
Scientific Applications
and cloud VM library
Operations tools
Messaging infrastructure
Monitoring
Service registry
(GOCDB)
Operations Portal
Collaboration tools
Unified Middleware Distribution
Quality assurance
UMD infrastructure
Coordination
Technical
Operations
User communities
- Providers
  - NGIs and EIROs
  - EGI Foundation
- Currently
  - 50% National funding agencies + 50% Fees
- EINFRA-12 funding
  - EC contribution (devops) + EGI in-kind pledged contributions
EGI Services/Thematic services
HEP, Astro and Astroparticle
WLCG, CTA, etc.
Structural biology
WeNMR services
Biomedicine and Bioinformatics
Life Science Grid
Community
BILS (Sweden)
CHIPSTER
Hydrology
DRIHM
Fresh water and marine
resource conservation
iMARINE
Environmental Science
ESA Thematic Exploitation Platforms and
Data and Info Access Service Copernicus
Art and humanities
Musicology (Peachnote)
DARIAH
Scientific applications on
demand
Bioinformatics
Engineering
…
ETC
- Providers
  - Research organizations/universities (MoUs)
  - NGIs
  - Industry
- Funding
  - 100% National funding agencies
  - Private funds
- EINFRA-12 funding
  - EC (devops, technical support) + National funding agencies or private funds
Cloud Services for
Nanotechnology
Nanotechnology Requirements
• Nanotechnology can easily exhaust any
available computing resources
– An increase in computational detail also requires a
significant increase in computing time
– Outputs are measured in hundreds of GB
– Monte Carlo methods are quite popular
• Many different software packages
– VASP (Vienna Ab-initio Simulation Package, DFT),
CPMD (Car-Parrinello Molecular Dynamics, DFT),
SPR-SKKR (Spin-Polarized Relativistic Screened Korringa-Kohn-Rostoker),
Q-espresso (Quantum ESPRESSO, DFT)…
Tech Summit Bratislava & GadgetEXPO - Bratislava, 11.-12.05.2016
Experiment and Simulation
• Nanotechnology can be broadly divided into
experimental and theoretical (simulation-based)
• These two fields work together, share ideas,
exchange data and results
High performance or high scalability?
• Theoretical (simulation-based) nanotechnology
currently relies on HPC
– Reduced availability for smaller research teams
– Often cumbersome setup of software and various
constraints (security, multi-user sharing of resources…)
• A shift towards HSC (high-scalability computing) would be beneficial
– Some widely used methods (like Quantum Monte
Carlo) work very well in an HSC environment
– Cloud can work as HSC -> excellent on-demand
availability, price scales with requirements
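The reason Monte Carlo methods suit a high-scalability setting can be illustrated with a toy example. The sketch below is not Quantum Monte Carlo; it is a plain Monte Carlo estimate of pi, chosen only because it shows the same structure: independent chunks of samples, each with its own random seed, whose results are merged at the end. The function names (`run_chunk`, `combine`, `estimate_pi`) are made up for this illustration, and local threads stand in for what would be separate cloud workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed: int, n_samples: int) -> tuple[int, int]:
    """One independent work unit: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # private generator, so chunks never share state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, n_samples


def combine(results) -> float:
    """Merge per-chunk counts into a single estimate of pi."""
    results = list(results)
    total_hits = sum(h for h, _ in results)
    total_samples = sum(n for _, n in results)
    return 4.0 * total_hits / total_samples


def estimate_pi(n_chunks: int = 8, samples_per_chunk: int = 20_000, workers: int = 4) -> float:
    seeds = range(n_chunks)
    sizes = [samples_per_chunk] * n_chunks
    # Threads are only a stand-in here (assumption): each chunk could run on its own VM.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return combine(pool.map(run_chunk, seeds, sizes))


if __name__ == "__main__":
    print(estimate_pi())
```

Because the chunks exchange no data until the final merge, adding workers only changes how long the run takes, not the result. That is the property that lets such workloads grow and shrink with on-demand cloud capacity.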
Different models of large-scale computing
• Cloud can be very useful; cloud with HPC
resources available is even better
Storage requirements
• Usually only processed data (graphs, pictures of
atomic structures, movies of trajectories) is
exchanged
– Raw data is too big to send via e-mail or store for
longer periods of time
• Availability of raw data from previous runs would
free up considerable resources
– Simulations would not need to be repeated by different
teams
– Often the raw data contains aspects not present in the
final, processed data – these just get lost and need to
be computed from scratch if needed
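One way to let teams find out whether raw data for a given simulation already exists is to index archived runs by a fingerprint of their input parameters. The sketch below is only an illustration of that idea under stated assumptions: the class `RawDataCatalogue`, the function `run_key`, the parameter names and the `archive://` location are all hypothetical, and a real service would need persistent, shared storage rather than an in-memory dictionary.

```python
import hashlib
import json


def run_key(params: dict) -> str:
    """Stable fingerprint of a simulation's input parameters."""
    # Sorting keys makes the fingerprint independent of the order parameters were given in.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RawDataCatalogue:
    """In-memory stand-in for a shared index of archived raw outputs (hypothetical)."""

    def __init__(self):
        self._index = {}

    def register(self, params: dict, location: str) -> str:
        """Record where the raw output of a finished run is stored."""
        key = run_key(params)
        self._index[key] = location
        return key

    def lookup(self, params: dict):
        """Return the archived location, or None if no matching run is registered."""
        return self._index.get(run_key(params))


if __name__ == "__main__":
    catalogue = RawDataCatalogue()
    # Parameter names and the location below are made up for the example.
    catalogue.register(
        {"code": "VASP", "cutoff_eV": 400, "kpoints": [4, 4, 4]},
        "archive://example/run-0001",
    )
    print(catalogue.lookup({"kpoints": [4, 4, 4], "code": "VASP", "cutoff_eV": 400}))
```

A team would call `lookup` before submitting a job and only compute when it returns `None`. The fingerprint is sensitive to value types (for instance `400` and `400.0` produce different keys), so a shared convention for writing parameters would be needed.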
Modern Virtual Research Environment for
Nanotechnology - Consortium
• Based on close cooperation with several
important nanotechnology centers
– Karlsruher Institut für Technologie
– Technische Universität Wien
– Universität Regensburg
– Universität Tübingen
– King’s College London
– Justus-Liebig-Universität Giessen
– Institut Català de Nanociència i Nanotecnologia
– North Carolina State University
– Department of Applied Physics, Graduate School of Engineering,
Osaka University
Modern Virtual Research Environment for
Nanotechnology - Motivation and goals
• Weakest points of current technology:
– Need for frequent and cumbersome porting and moving of application
codes onto and between the different HPC platforms
– Need for expert performance tuning on the platforms to which the
application codes have been ported
– Slow and cumbersome data exchange/flow between the different research
participants
• A new type of cloud-based VRE:
– making increased use of the HSC computing paradigm to grant easy,
comfortable, and secure access to and use of even larger computing
facilities
– meeting the need for huge data exchange/sharing among the diverse
research participants (theory/simulation, experiment)
– long-term data storage
– improved convenience and quality of graphics and data animation
Modern Virtual Research Environment for
Nanotechnology - Objectives
• A collaborative, nanotechnology-specific virtual
hub
• Extensive semantic support for data and services
• Modern distributed infrastructure with a state-of-the-art
architecture and deployed technologies
• Comprehensive integration of resources
• A platform built for real users and with their close
involvement
High-level Architecture
Adopted Technology – WS-PGRADE
• WS-PGRADE/gUSE
– MTA SZTAKI (Hungarian Academy of Sciences)
• Scientific Gateway Based User Support (SCI-BUS)
– Based on WS-PGRADE
– Is widely used
EGI and the European Open Science Cloud
Open Science:
a Complex Resource System
• Shared resources
– Integrated, easy and fair access
• Engaged communities
– Participating in the process
– Culture of sharing
– Collaborating in the management and
stewardship
• Governance
– Rules to access
– Rules to resolve conflicts
– Rules to balance quality vs. openness
• Financial support
– For long-term availability
Digital services and applications
Knowledge & Expertise
Instruments
Research Data
Open Science Commons:
When implemented…
Researchers from all disciplines
have easy, integrated and open access
to the advanced digital services, scientific instruments,
data, knowledge and expertise
they need to collaborate and achieve
excellence in science, research and innovation.
They feel engaged in governing, managing and preserving
these resources for everyone’s benefit, with the support of
all stakeholders.
Open Science Commons adopted in the EU Council Conclusions, May 2015
A multi-stakeholder endeavor
(EU perspective)
Digital services and applications
Knowledge & Expertise
Instruments
Research data
Centres of Excellence
Innovation Centres
Research Infrastructures
Virt. Research Env. providers
Commons
Institutionalised community governance of the production
and/or sharing of a particular type of resource (from natural
to intellectual)
Constructing Genome Commons
GÉANT: European Communications
Commons
e-Infrastructure Commons
Linux
Wikipedia …
Internet
EOSC principles?
EOSC Research Objects Hub
Storage/data management
Cloud compute
Cloud container compute
HTC and HPC
Research outputs
Thematic services (data products, pipelines,
software, virtual appliances..)
Hub-specific service management processes,
business processes, policies
Federation services and processes
Research objects libraries
Research Object Indexing and discovery services
EOSC federation services and activities (examples)
Marketplaces
Standards and policies
Federation services/processes (accounting, monitoring, ...)
Business processes and channels
Knowledge and training
Federated IdP, Auth, Authz
Thank you for your attention
Questions?