

Page 1: EUDAT CDI Its Origins and Evolution · B2 services (e.g. B2SHARE, B2FIND, PID) Further integration with B2ACCESS EUDAT2020 Who Users and Communities with Significant Computational

www.eudat.eu

EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065

EUDAT - How to manage data in the Collaborative Data Infrastructure: a general overview of EUDAT services

Giovanni Morelli

Page 2:

Outline

What kind of problems we want (try) to solve
  Different management systems for different communities
  Quality of data sets
  Classes of users

What about our solutions (B2<services>)
  B2DROP, B2SHARE, B2SAFE, B2STAGE, B2HANDLE, B2ACCESS, …

B2<service> integration

Project and Service Enabling
  Community / EUDAT interaction

Practical use cases

Page 3:

Where Does EUDAT Fit In? (in a Data quality view)

Community repositories

Institute repositories

Scientists' personal data

Homeless scientists

Citizen scientists

Page 4:

Where Does EUDAT Fit In? (in a multilayer view of Data Management)

[Diagram: layered view of data management; side labels: Trust, Data Curation]

Users, Data Generators: User functionalities, data capture & transfer, virtual research environments
Community Support Services: Data discovery & navigation, workflow generation, annotation, interpretability
Common Data Services: Persistent storage, identification, authenticity, workflow execution, mining

Page 5:

Who can use EUDAT services

Single researcher: Upload and download
Team: Upload, add metadata, share
Community: Periodic transfers, quality checks …

Different strategies for different usage scenarios

Page 6:

Community-Driven Solutions

PHYSICAL SCIENCES & ENGINEERING

MATERIALS & ANALYTICAL FACILITIES

MAPPER

BIOMEDICAL & MEDICAL SCIENCES

EUDAT services are designed, built and implemented based on user community requirements.

Page 7:

EUDAT Collaborative Data Infrastructure (A general CDI architecture overview)

Community Repositories (thematic data centres)

EUDAT generic data service provider: storage, workflows, processing, archive

Page 8:

EUDAT Collaborative Data Infrastructure (Using vs. joining)

Community “use” EUDAT

Page 9:

EUDAT Collaborative Data Infrastructure (Using vs. joining)

Community “join” EUDAT

Page 10:

If there are hundreds of Research Infrastructures, how many different data management systems can be sustained?

Page 11:

B2 Service (modular) Suite

B2ACCESS

B2HANDLE

Page 12:

B2DROP

EUDAT2020
Further integration with EUDAT CDI (e.g. B2SHARE)
Integration with B2ACCESS to enable access by many different Identity Providers
Cloud Storage Federation, collaboration with GEANT in OpenCloudMesh
Assess B2DROP as workspace area to computing facilities

Who
Citizen scientists and small teams

What
Store and exchange data
Synchronize multiple versions
Ensure automatic desktop synchronization

Why
Ease of Use
Trusted European Service

Page 13:

B2SHARE

EUDAT2020
Further integration with EUDAT CDI (e.g. B2DROP, B2SAFE)
Integration with B2ACCESS (incl. eduGAIN), focus on authorization
Embargo period
Editing of metadata
Data versioning and annotation
Extended HTTP RESTful API
Easily installable software package

Who
Small to Medium Teams

What
Store data (incl. software) and add domain metadata
Share registered research data worldwide
Preserve (small-scale) research data for the long term

Why
Register Data for Publications
Make data known to a wider community

Page 14:

Collection of official RDA documents

Page 15:

Service Integration

Bidirectional Integration

Page 16:

B2SAFE

EUDAT2020
Support iRODS v4
Support metadata
Optimize and extend policies to support data curation and provenance
Further integration with B2ACCESS
Support authorization on the basis of community access rules
Assess B2SAFE as workspace area to computing facilities

Who
Community Data Managers
‘Sophisticated’ Organisations

What
Provide an abstraction layer which virtualizes large-scale data resources
Guard against data loss in long-term archiving and preservation
Optimize access for users from different regions
Bring data closer to powerful computers

Why
Performance
Replication between trusted sites
Data Preservation

Page 17:

Data Policy Manager
Data policies are centrally managed
Policy rules are implemented and enforced by site-local rule engines
Policies are described in an abstract language
Community data managers must authenticate to provide trust
Support policies for data replication and integrity checking
Central logging for auditable data policies to monitor execution
Active collaboration with the RDA Practical Policy WG

EUDAT2020
Handover to operations
Extend number of policies supported
Focus on data curation and provenance policies
Integrate with B2ACCESS
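
The slide mentions policies for data replication and integrity checking without showing what such a check involves. A minimal sketch of the idea, assuming replicas are compared by SHA-256 checksum against a reference copy; the function names and site names are hypothetical and are not part of the Data Policy Manager or any EUDAT API:

```python
import hashlib
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(reference: bytes, replicas: Dict[str, bytes]) -> List[str]:
    """Return the (hypothetical) site names whose replica differs from the reference."""
    expected = sha256_of(reference)
    # A replica passes the integrity check only if its checksum matches exactly.
    return sorted(
        site for site, data in replicas.items() if sha256_of(data) != expected
    )


if __name__ == "__main__":
    original = b"temperature,42\n"
    replicas = {"site-a": original, "site-b": b"temperature,41\n"}
    print(find_corrupt_replicas(original, replicas))
```

In a real deployment the checksums would be computed where the data is stored and only the digests compared, so the data itself never has to travel for the check.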

Page 18:

B2STAGE

EUDAT2020
Further develop HTTP to a mature interface and extend functionality to metadata
Native support for PIDs within GridFTP transfers
Extend EUDAT client API library to other B2 services (e.g. B2SHARE, B2FIND, PID)
Further integration with B2ACCESS

Who
Users and Communities with Significant Computational Needs

What
Transfer large data collections from EUDAT storage to external HPC facilities for processing
Copy large data sets, ingesting them onto EUDAT storage resources

Why
Integration/Collaboration with PRACE
Simplify Data Transfer

Page 19:

B2FIND

EUDAT2020
Harvesting of metadata stored in B2SAFE
Community customizations
Annotation of datasets
Further assess RDF and Linked Data
Further assess scalability and performance

Who
Anyone

What
Find collections of scientific data quickly and easily, irrespective of their origin, discipline or community
Get quick overviews of available data
Browse through collections using standardized facets

Why
Unique collection
Ease of Searching

Page 20:

EUDAT 6M EC Review, 28th October 2015, Brussels
EUDAT M6 Review - Services and Operations

B2HANDLE

Development plan
Develop the policies for the B2HANDLE service (e.g. PID namespace management)
Migrate service from Handle v7 to v8
Define PID Information Types for data, metadata, collection records
Integrate with Data Type Registry service
Consolidate B2HANDLE API library with EUDAT API library

Who
Groups or Communities who want to make their data citable

What
Follows policies to register data and make it referable and citable in the long term
Reliability through mutual PID mirroring
Provides an abstraction layer between a globally unique persistent identifier and the physical location of data objects
Machine readable via HTTP RESTful API

Why
Simple integration
Technology Agnostic
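
The abstraction layer between a persistent identifier and the physical location of a data object can be illustrated with a toy resolver. This is only a sketch under stated assumptions: `PidRegistry`, its methods, the `EXAMPLE` prefix and the URLs are all hypothetical, kept in memory, and do not reflect the B2HANDLE library or the Handle System API.

```python
from typing import Dict


class PidRegistry:
    """Toy in-memory PID resolver (hypothetical; not the B2HANDLE API)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._locations: Dict[str, str] = {}
        self._counter = 0

    def register(self, location: str) -> str:
        """Mint a new identifier under the prefix and bind it to a location."""
        self._counter += 1
        pid = f"{self.prefix}/{self._counter:06d}"
        self._locations[pid] = location
        return pid

    def update(self, pid: str, new_location: str) -> None:
        """Point an existing identifier at a new location; the PID itself is unchanged."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid: str) -> str:
        """Return the current location bound to the identifier."""
        return self._locations[pid]


if __name__ == "__main__":
    registry = PidRegistry("EXAMPLE")  # placeholder prefix, not a real Handle prefix
    pid = registry.register("https://site-a.example/data/run1.nc")
    registry.update(pid, "https://site-b.example/archive/run1.nc")
    print(pid, "->", registry.resolve(pid))
```

The point of the indirection is that a citation keeps using the same identifier after the data moves; only the registry entry changes.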

Page 21:

B2ACCESS

EUDAT2020
Integration with operational and all B2 services: B2SHARE, B2DROP, B2STAGE, B2SAFE, B2HANDLE, DPM, CREG, TTS
Integration with community IdP domains and portal environments
Enabling access via eduGAIN, social IDs
Enabling access via ORCID, CLARIN IdPs
Focus on authorization
Collaborate on cross e-infrastructure access (e.g. PRACE, EGI)
Extend European collaboration via AARC (e.g. GEANT, TERENA)

Who
Anyone wanting to use the B2 Services

What
Complies with community ownerships and access rights, basis of trust
Credential conversion approach (e.g. SAML, OpenID, X.509, Username/password)
Identity provider for citizen scientists

Why
Use your own ID in a federated environment

Page 22:

Page 23:

Operational tools & Central Services

creg.eudat.eu: CDI Config DB (Sites, Service Comp.)
cmon.eudat.eu: Monitoring (cmon), to be replaced: A&R M.
rct.eudat.eu: RCT (Project Coord.), to be replaced by DPCP
http://eudat.eu/support-request, helpdesk.eudat.eu: Helpdesk, TTS
EUDAT Wiki, JIRA, CROWD (AAI), SVN
Service Hosting Framework

Page 24:

Understanding the enabling process: all the actors

[Diagram of the enabling process. Phases: Presale, Deploy, Production. Elements shown: Data pilot document (WP4); Data Project Coordination Portal; Service Portfolio (WP2); Small/Large Customization (WP5); Service & Resource Provisioning (WP6 – T6.2); Data Projects X, Y, Z; Service X, Y, Z Enabling Teams (WP6 – T6.3); TTS; Community; GOCDB; Interface; Production; User Support; Monitoring]

Page 25:

Understanding the enabling: Deploy actors

[Diagram of the Deploy phase. Elements shown: Data Project X; Service X Enabling Team (WP6 – T6.3); Project Enabler; Service Integrator; TTS; Service integration into community]

Page 26:

Understanding the enabling: Project Lifecycle and relationship with Project Enablers and Service Integrators

Planned: data project/service enabling still under discussion
Enabling (repos): service enabling at community side (repository) only; EUDAT provider selected, but storage service not yet provided
Enabling: service enabling at community and EUDAT side
Pre-Production: service is operational, but there are still some issues, e.g. initial data transfer not complete, security or quality assessment pending, community or provider has not confirmed production readiness
Production: service deployed and integrated across all participating project partners (community repository and EUDAT nodes); community confirmed production readiness

[Diagram labels: Service Integrator(s), Project Enabler(s), Documentation, User Documentation]
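
The five lifecycle stages named on this slide can be sketched as an ordered enumeration. The stage names come from the slide; the strictly linear progression and the helper `next_stage` are assumptions made for illustration, not a description of how EUDAT tracks projects.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Lifecycle stages of a data project, in the order the slide lists them."""

    PLANNED = "Planned"
    ENABLING_REPOS = "Enabling (repos)"
    ENABLING = "Enabling"
    PRE_PRODUCTION = "Pre-Production"
    PRODUCTION = "Production"


# Enum iteration follows definition order, which gives the stage sequence.
_ORDER = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following stage, or None once Production is reached.

    Assumes a strictly linear progression (an assumption, not stated on the slide).
    """
    i = _ORDER.index(stage)
    return _ORDER[i + 1] if i + 1 < len(_ORDER) else None


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.PLANNED
    while stage is not None:
        print(stage.value)
        stage = next_stage(stage)
```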

Page 27:

Data pilots overview

23 data pilots selected for enabling in EUDAT2020

[Chart: Scientific domain. Categories: Biomedical and life sciences; Earth sciences, energy and environment; Physical Sciences and Engineering; Social Sciences and Humanities; Other]

[Chart: Applicant Community. Categories: Research Community; Research Infrastructure]

Page 28:

Data pilots overview

[Chart: Reference sites for storage]

[Chart: Requested EUDAT services. Categories: Data synchronization and exchange; Data repository and data sharing; Data replication and preservation; Data staging for analysis and processing; Data discovery and search; Data typing & visualization; New services or tools for Big Data; New services or tools for Semantic web]

Total storage request 1220-4300 TB

Page 29:

Questions…