building an international infrastructure for research data - jisc digital festival 2015
TRANSCRIPT
Building an international infrastructure for Research Data
Jisc, CSC, SurfSara, EUDAT
Research Data infrastructure
Jisc, UK
CSC, Finland
15/04/2023
Jisc Digital Festival, 9-10 March 2015, ICC Birmingham
» Research Integrity and transparency
» Re-use, new research and innovation
» Research Funder policies
» Changes - culture, organisation, technical
» Services and support for data across the lifecycle
Research data
UK e-Infrastructure
Data infrastructure
Data infrastructure
BBSRC Goblet (Bioinformatics Training)
Digging Into Data (AHRC, ESRC, NSF, SSHRC together)
NSF/BIO lead agency pilot in data driven biology and systems biology
EU DARIAH (Digital Research Infrastructure for Arts and Humanities)
Genomic Alliance
Square Kilometre Array
EU T0 (Tier Zero)
Human Brain Project
Cross European Cohorts and initiative e.g. SHARE, CHICOS etc.
International cohort families e.g. the HRG ageing family
DASISH European Initiative for European Data Humanities Infrastructure
PRACE
World LHC Computing Grid
LSST Large Synoptic Survey Telescope
ISBE ESFRI project – preparatory – in Systems Biology
BBSRC iPlant
GEANT
EBI ELIXIR
European Bioinformatics Institute (EMBL-EBI, funded via EMBL subscriptions for EU nations and Australia)
EUDAT (EC project)
Research Data Alliance for Standards and Communities of practice
Catalogue of data catalogues: re3data, Databib merger
EC projects in data curation e.g. SCIDIP – ES – SCAPE (RC involvement)
PANDATA consortium + EU projects (photon + neutron community) European facilities
Funding opportunities – FP7 – Newton – H2020
Science Europe Working Groups
ELIXIR (ESFRI project) European Life Science data Infrastructure (10 members + lead)
Data beyond/ without boundaries
Biomed Bridges
ERICs / Transnational Platforms
MIBBI initiative Journals
Researcher identity schemes such as ORCID
CESSDA (Consortium of European Social Science Data Archives)
» Janet Network:
› A core across the UK of 200Gbit/s
› 19 regional distribution areas
› Resilient
› ~900 organisations connected
› IP and circuit connection services
› ~1Tbit/s external connectivity
› Overall availability: >99.9%
Data movement
An environment for housing infrastructure
» Shared data centre:
› £900K HEFCE investment
› Anchor tenants: Crick, KCL, LSE, QMUL, Sanger, UCL
– jisc.ac.uk/shared-data-centre
› Requirements for a second data centre being gathered
Research Data infrastructure
Data
Cloud
Librarians, research managers & IT have interlocking services, to support researcher needs and institutional policies
Researchers have a cohesive suite of research data management, publication and discovery services
Research data management and planning services
Research data storage and archival services
Research data discovery services
Data Data
UKDA, BADC
ICSU / WDS
EBI / GenBank
Research data management applications
Journal & funder policy registries
Research data registry / Cross repository discovery service
DMPonline
DMP Registry
SWORD +
Disciplinary data repositories (National and International)
Institutional data catalogues
Disciplinary research data Discovery services
Metadata exchange between journals, archives, repositories
Data identifiers, metadata schema, metrics
Support for Research data lifecycle
Storage
Infrastructure components that underpin all functions & services
Researcher identifiers
Organisation identifiers
Registries
Data identifiers
Research data management applications
Network / Janet - Security/UK access management federation
Key
Jisc supported:
Other supported:
Advice, guidance & training is also needed
Research at Risk
Shared infrastructure
Shared Infrastructure
Research data discovery service
Usage statistics for research data
Journal research data policy registry
Research data management plan registry
Research information aggregation
Medical data sharing tools
Shared Infrastructure
Technical standards – common metadata & protocols
Experiments & prototypes for new solutions
Other shared services?
Frameworks, national agreements and access to tools
Preservation & storage services
Advice, guidance, policy, advocacy
Working with:
» RCUK
» Funding bodies
» Knowledge Exchange partners
» EC
» Research Data Alliance
» UK Open Research Data Forum
» Finnish Ministry of Education and Culture initiative for the promotion of information availability and open science
» Outputs include: Open Science and Research Handbook, Data management guide
» Services:
› Etsin research data finder
› IDA research data storage service
› AVAA open data publishing platform
› Aila data service portal
› Language Bank of Finland
› Doria & Theseus publication archives
› FINTO ontology service
Open Science and Research Initiative in Finland
Open Science and Research Initiative in Finland
openscience.fi/services
Planning
Implementation
Storage
Publishing
Discovery
Reuse
Open Science and Research Handbook
Data management guide and checklist
EUDAT
A pan-European e-infrastructure solution
EUDAT Consortium
Bridging National and European solutions
» Research and infrastructures are still funded at national levels so we need to make sure that the solutions remain interoperable
» EUDAT provides a European gateway to national centers and a European extension to national solutions
» Making national resources more available and visible
› Access to European resources through national catalogues
› Making visible valuable national collections through EUDAT
Research Infrastructures – Where is it going?
» Research Infrastructure trends:
› Internationalisation
› Diversification
› Increasingly relying on ICT
› Data deluge is a common challenge
» European RIs:
› Around 500
› € 100 billion investment
» The worst case scenario: 500 RIs with 500 incompatible self-made ICT and data management solutions
» What can we do to promote collaboration and re-use of e-infrastructure?
EUDAT needs to promote synergy
» Think about the users... and all these acronyms!
› Users should have a “right” to seamless access to network, data, and computing resources funded by public money
› It is our role to make it as easy as possible for users. No one should care about e-Infrastructures as such
» Think global!
› Solutions must also be thought through at a global level (RDA)
› Cross-continent collaboration is a must (e.g. NDS, ANDS, etc.)
E-Infrastructure commons
Service-Oriented
» Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT services address the full lifecycle of research data
» Using EUDAT services: finding and accessing data, for instance, or storing smaller data sets by interacting with one of the Collaborative Data Infrastructure (CDI) public front-end services
vs
» Joining the CDI: implies a tighter integration with at least one of the EUDAT centres and a partnership between legal entities relying on OLAs and SLAs
Federated and Distributed
Jisc’s Role: Governance model
» Leading the EUDAT work on governance models for the Collaborative Data Infrastructure
› Review and draw on the existing arrangements within national governments and pan-European research communities
› Two case studies:
– (Working together at national and European levels) how national infrastructures can work with EUDAT through clear examples from the UK, the Netherlands, Finland, and Germany
– (Working together with research infrastructures) will investigate the requirements from the research infrastructures in terms of service provisioning, contractual obligations, and governance, starting with the RIs represented in the project
Jisc’s Role: Governance model
» With DCC, lead EUDAT’s work on data management planning
› bring expertise on data management plans (DMPs)
› customize the DMPonline tool for EUDAT (https://dmponline.dcc.ac.uk/) in order to provide a seamless data management environment and workflow and support EC data requirements
› deliver DMP training support
» EUDAT Slides courtesy of Kimmo Koski, Managing Director CSC - IT Center for Science, Finland & EUDAT Co-ordinator
Thank You!
SURFSARA – AN EXAMPLE OF RESEARCH DATA INFRASTRUCTURE FROM THE NETHERLANDS
Jisc Digital Festival - Building an international infrastructure for research data
9 March 2015
Jan Bot
About SURF
SURF research service portfolio
Overview
COMPUTE: high-end solutions 1000 times more powerful than your PC
DATA SERVICES: easily accessible storage on disk or tape
VISUALISATION: advanced solutions and support to create visualisations
CONNECTIVITY: fast end-to-end connections tailored to your research need
COLLABORATION INFRA: single sign-on access to many services
INTEGRATION SUPPORT: dedicated integration support by experienced scientists
MARKET: reseller of content, cloud solutions, software and hardware (via partner shops)
Connecting Dutch national and European e-infrastructures
Area | National | International
High Performance Computing | National super-computer | www.prace-ri.eu, www.eesi-project.eu
Grid & Cloud Computing | SURFsara, Nikhef, RUG-CIT | www.egi.eu
Network | SURFnet | www.geant.net
Data services | SURFsara, DANS, 3TU.datacenter, TARGET | www.eudat.eu, www.pidconsortium.eu
Data centric infrastructure services
Example projects
LOFAR
Large Hadron Collider
Genome of the Netherlands
Data volumes @SURFsara
Community Volume
LOFAR 5.8PB
LHC/ATLAS 3.9PB
LHC/LHCb 1.4PB
BBMRI 115TB
MOLEPI 78TB
LHC/ALICE 74TB
ILDG 43TB
DANS 18TB
OTHERS 50TB
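As a rough cross-check of the community volumes listed above, the figures can be converted to a common unit and summed. This is only a sketch: the `to_terabytes` helper is hypothetical, and it assumes decimal units (1 PB = 1000 TB), which the slide does not state. The result can be compared with the 12PB (2014) figure on the same slide.

```python
from decimal import Decimal

# Assumption: decimal (SI) units, 1 PB = 1000 TB. The slide does not say
# which convention was used.
TB_PER_PB = Decimal(1000)

# Community volumes as listed on the slide.
community_volumes = {
    "LOFAR": "5.8PB",
    "LHC/ATLAS": "3.9PB",
    "LHC/LHCb": "1.4PB",
    "BBMRI": "115TB",
    "MOLEPI": "78TB",
    "LHC/ALICE": "74TB",
    "ILDG": "43TB",
    "DANS": "18TB",
    "OTHERS": "50TB",
}


def to_terabytes(volume: str) -> Decimal:
    """Convert a string such as '5.8PB' or '115TB' to terabytes.

    A decimal comma ('5,8PB') is tolerated as well as a decimal point.
    """
    text = volume.strip().upper().replace(",", ".")
    if text.endswith("PB"):
        return Decimal(text[:-2]) * TB_PER_PB
    if text.endswith("TB"):
        return Decimal(text[:-2])
    raise ValueError(f"Unrecognised unit in {volume!r}")


total_tb = sum((to_terabytes(v) for v in community_volumes.values()), Decimal(0))
total_pb = total_tb / TB_PER_PB
print(f"Total community volume: {total_tb} TB ({total_pb} PB)")
```

Using `Decimal` rather than floats keeps the sum exact for the one-decimal figures quoted on the slide.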
Projects Volume
COSMO GRID (2009) 105TB
RUMC (2013) 90TB
OWLS (2006) 70TB
ESSENCE (2006) 41TB
ENTRAIN (2011) 26TB
ITAMOC (2011) 25TB
EAGLE (2013) 2.5TB (600TB)
TITAN (2014) (1PB)
HPC ARCHIVE (1993) 1172TB
Start LHC
LOFAR
12PB (2014)
Happy Few or.. the SURF Value Chain
SUPPORT
# RESEARCHERS: 0 to 20,000
Appliances for the happy few
Lot of support needed
Tailor-made implementations
Blueprints & good practices developed
Long tail
Generic/minimal support
Commodity
ICT support request
Resource selection, orchestration and support
The scientist
Local ICT support staff
SURF support4research: find the best solution and get started!
Train-the-trainer
Advice, training, hands-on support
Advice, training, hands-on support
Mastering the data life cycle with SURF services
giving access to data
ingest, store, preserve, share data
transfer data
process data
visualize data
integrate data
Lightpaths
Bandwidth on Demand
NetherLight
Long/Short term, Disk/Tape
Trusted Digital Repository
Beehub/SURFdrive
B2SAFE/B2SHARE/PID services (EUDAT)
DANS/3TU datacentrum
Communities
eScience center: Integration support
Remote clusters & GPU
Collaboratorium
Support
Authentication
Authorization
(3rd party) collaboration tools (e.g. FileSender)
Supers
GRID
HPC cloud
HADOOP
supporting the Research Data Life Cycle
Research Timeline
Before During After
Central Archive
GRID SE
Trusted Digital Repository
Research Data Storage
BEEHUB
B2DROP / SURFdrive
Data Ingest Service
EPIC PID
B2SHARE
B2FIND
B2SAFE
B2STAGE
Developments in The Netherlands
» Funding agencies require scientists to add a ‘data paragraph’ to project proposals
» Fraud in scientific publications (Stapel, 2011) raised awareness of the importance of data provenance
» Decreased (direct) funding of national e-infrastructure changes the role of resource providers
RDNL
Research Data Netherlands
RDNL: 3 Organisations
Three organisations serving different research disciplines.
Annemiek van der Kuil |PhotoA.nl
researchdata.nl/en
DANS Data Archiving and Networked Services
Institute of Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
First predecessor dates back to 1964 (Steinmetz Foundation), Historical Data Archive 1989
Mission: promote and provide permanent access to digital research information
dans.knaw.nl/en
3TU.Datacenter
3TU.Datacentrum offers the knowledge, experience and the tools to archive research data in a standardized, secure and well-documented manner. It provides the research community with:
• A long-term archive for storing scientific research data
• Permanent access to, and tools for reuse of research data
• Advice and support on data management
3TU.Datacentrum currently hosts thousands of datasets. To see examples please visit: http://data.3tu.nl.
Federated infrastructure for research data
Front-office Back-office model
U2Connect: linking EUDAT services to research institutes in the Netherlands
Use and build upon the EUDAT services to facilitate researchers to manage research data and to collaborate on research data across universities/research institutes
u2connect.eu
From researcher to infrastructure provider
scientist
Local ICT
support staff
Find out more…
Contact…
Matthew Dovey
Principal consultant (Research e-Infrastructures)