Globus as a platform for research data management

Vas Vasiliadis, University of Chicago (vas@uchicago.edu)

Best Practices in Data Infrastructure, May 17, 2016



Page 2

Globus delivers…

Big data transfer, sharing, publication, and discovery…

…directly from your own storage systems…

…via software-as-a-service

Page 3

Globus as SaaS

Stages: Transfer, Share, Publish, Discover

Diagram labels: Instrument, Compute Facility, Personal Computer, Publication Repository

1. Researcher initiates transfer request; or requested automatically by script, science gateway

2. Globus transfers files reliably, securely

3. Researcher selects files to share, selects user or group, and sets access permissions

4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!

5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus

6. Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)

7. Curator reviews and approves; data set published on campus or other system

8. Peers, collaborators search and discover datasets; transfer and share using Globus

• SaaS: web access; low operational costs

• Use storage system of your choice

• Access using your existing credentials
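The metadata step on this slide (describing a data set using Dublin Core and domain-specific fields) can be illustrated with a minimal sketch. The element names are the fifteen elements of the Dublin Core Metadata Element Set. The helper name `describe_dataset`, the `domain` section, and all sample values are hypothetical and are not taken from Globus.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def describe_dataset(dublin_core, domain=None):
    """Return a metadata record, rejecting keys that are not Dublin Core.

    Hypothetical helper for illustration; not part of any Globus API.
    """
    unknown = set(dublin_core) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    if "title" not in dublin_core or "creator" not in dublin_core:
        raise ValueError("this sketch requires at least title and creator")
    # Domain-specific fields are kept separate and are not validated here.
    return {"dublin_core": dict(dublin_core), "domain": dict(domain or {})}


# Sample values are invented for the example.
record = describe_dataset(
    {"title": "Example imaging run", "creator": "A. Researcher", "date": "2016-05-17"},
    domain={"beamline": "example-beamline", "detector": "example-detector"},
)
```

Keeping the general-purpose and domain-specific fields in separate sections is one possible design; it lets a curator check the common fields without knowing every domain vocabulary.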

Page 4

Globus as bridging technology to…

• Supercomputing resources: NCSA, NERSC, XSEDE

• Campus HPC facilities

• Clouds: Jetstream, AWS, Google

• Instruments

• Lab clusters, servers, laptops, etc.

Page 5

Scaling up analysis

Move datasets to campus HPC, supercomputer, national facility

Move results to (…)

Page 6

Bridging to instruments: APS

Dynamic imaging: >200 TB per dataset

Courtesy of Francesco De Carlo, Argonne National Laboratory (2016)

Page 7

APS DMagic

• Simple commands to automate the majority of beamline data management tasks

• Toolbox supports APS Imaging Group; can be easily adapted to any APS beamline

• Given an experiment date, retrieves users from APS scheduling system and automatically sends e-mail with link to the data

• Monitors a directory and copies any new files to a personal or remote server endpoint

• Data can be shared directly from the beamline machine or from a Globus server endpoint
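The directory-monitoring bullet can be illustrated with a short, standard-library-only sketch. It is not DMagic's code: the function name `copy_new_files` is hypothetical, and a plain local directory stands in for the personal or remote server endpoint, where a real deployment would submit a Globus transfer instead.

```python
import shutil
from pathlib import Path


def copy_new_files(watch_dir, dest_dir, seen):
    """Copy files in watch_dir that are not yet in `seen` to dest_dir.

    `seen` is a set of file names that is updated in place, so calling
    this function repeatedly (for example from a polling loop) copies
    each file only once. Returns the sorted names copied by this call.
    """
    watch_dir, dest_dir = Path(watch_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            # Stand-in for a transfer to an endpoint (assumption).
            shutil.copy2(path, dest_dir / path.name)
            seen.add(path.name)
            copied.append(path.name)
    return copied
```

A caller would invoke `copy_new_files` in a loop with a short sleep between polls. The sketch tracks files by name only, so it does not notice a file that is still being written or that is later modified.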


Page 8

Data Distribution: NGS

EC2

Page 9

Ad Hoc Sharing: NIH

helix.nih.gov

Page 10

Repositories: Compute Canada

National Research Data Repository (Phase 1)

Diagram labels: CC Storage (three sites), Globus Connect (three instances), Globus Publication, Archivematica, Compute Canada Cloud, Regional Repository, Institutional Repository, Metadata, Index

Courtesy of Todd Trann, Compute Canada, 2016

Page 11

NRDP Features

• Federated Storage Model: Storage and repositories distributed, and owned and operated by organizations / institutions

• National Data Discovery: Single search to discover data, regardless of location

• Suitable for broad range of data types

• Archivematica: preservation packages

• Automatic geographic data replication

Adapted from Todd Trann, Compute Canada, 2016
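The combination of federated storage and a single national search can be sketched as a central index that stores only metadata plus a pointer to the repository holding the data. The `DiscoveryIndex` class, its methods, and the sample repository names are hypothetical; the slide does not describe how the actual index is implemented.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Central keyword index over datasets held in separate repositories."""

    def __init__(self):
        self._by_word = defaultdict(set)  # word -> {(repository, dataset_id)}

    def register(self, repository, dataset_id, title):
        """Index a dataset's title; the data itself stays in its repository."""
        for word in title.lower().split():
            self._by_word[word].add((repository, dataset_id))

    def search(self, query):
        """Return sorted (repository, dataset_id) pairs matching every word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._by_word.get(w, set()) for w in words))
        return sorted(hits)


# Repository names and titles are invented for the example.
index = DiscoveryIndex()
index.register("institutional-repo-a", "ds-1", "Arctic ice core samples")
index.register("regional-repo-b", "ds-2", "Arctic ocean temperature")
```

A search for a word shared by both titles returns entries from both repositories, which is the "regardless of location" property in miniature.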

Page 12

Globus serves as…

A platform for building science gateways, portals and other web applications in support of research and education

Page 13

Globus as PaaS

Diagram labels: Globus APIs; Globus Connect; Data Publication & Discovery; File Sharing; File Transfer & Replication; Identity/Authentication, Group Management; …; Globus Toolkit

Enable existing institutional ID systems to be used in external web applications

Integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.

Page 14

Data Archive: NCAR

Page 15

Serving a global community

• 17+ PB virtual processing

• 45,000+ custom orders, 4,000 users, 380 TB served in 2014

Fully automated delivery via portal using Globus PaaS

Courtesy of Thomas Cram, NCAR (2014)

Page 16

PaaS-enabled automated workflow

• User logs in with NCAR or other campus identity

• Selected dataset copied to staging area (shared endpoint)

• Read permission granted to user to access shared endpoint

• User receives email with link to access files

• ACLs deleted after five days
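The workflow above can be modelled in a few lines to show the life cycle of an access rule. This is a sketch under assumptions, not NCAR's portal code or the Globus API: the `StagingArea` class, its method names, and the in-memory rule store are hypothetical; only the read-only grant and the five-day lifetime come from the slide.

```python
from datetime import datetime, timedelta

ACL_LIFETIME = timedelta(days=5)  # "ACLs deleted after five days"


class StagingArea:
    """In-memory stand-in for a shared endpoint with per-user ACLs."""

    def __init__(self):
        self._rules = {}  # (user, path) -> time the rule was granted

    def grant_read(self, user, path, now):
        """Record a read-only rule for `user` on `path`."""
        self._rules[(user, path)] = now

    def can_read(self, user, path, now):
        """True while a rule exists and is younger than ACL_LIFETIME."""
        granted = self._rules.get((user, path))
        return granted is not None and now - granted < ACL_LIFETIME

    def purge_expired(self, now):
        """Delete rules that have reached ACL_LIFETIME; return how many."""
        expired = [k for k, t in self._rules.items() if now - t >= ACL_LIFETIME]
        for key in expired:
            del self._rules[key]
        return len(expired)


# Example identity and path are invented.
area = StagingArea()
t0 = datetime(2016, 5, 17, 12, 0)
area.grant_read("user@example.edu", "/staging/order-1/", t0)
```

Passing `now` explicitly keeps the sketch deterministic; a periodic job calling `purge_expired` would play the role of the cleanup step.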

Page 17

Analysis portal: Sanger

Page 18

Compute Access: OSG

Page 19

Data “dropbox”: BBFC

Studios upload movies for rating

• Authenticate to BBFC IdP; issued unique ID

• Automatically provision “dropbox”, set ACLs

• Auto activate shared endpoint using SSO

• Initiate transfer

/distributor/paramount/32534

/distributor/wb/65346
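The provisioning step can be sketched as a small function that derives the dropbox path and an access rule from the distributor name and the unique ID issued at login. The path layout follows the two examples on the slide; the function name `provision_dropbox`, the rule fields, and the choice of read-write permission are assumptions for illustration and do not describe the BBFC system.

```python
import posixpath
import re

_SAFE_NAME = re.compile(r"[a-z0-9_-]+")


def provision_dropbox(distributor, unique_id):
    """Return (path, acl_rule) for a distributor's upload folder.

    The layout /distributor/<name>/<id> mirrors the slide's examples.
    """
    name = distributor.lower()
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe distributor name: {distributor!r}")
    if not isinstance(unique_id, int) or isinstance(unique_id, bool) or unique_id < 0:
        raise ValueError("unique_id must be a non-negative integer")
    path = posixpath.join("/distributor", name, str(unique_id))
    # Hypothetical rule shape: one identity, one path, one permission.
    acl_rule = {"principal": f"{name}:{unique_id}", "path": path + "/", "permissions": "rw"}
    return path, acl_rule
```

Validating the distributor name before building the path keeps one studio from provisioning a folder outside its own subtree.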

Page 20

Globus today…

5 major services

13 national labs use Globus

160 PB transferred

10,000+ active endpoints

27 billion files processed

~450 active daily users

40,000 registered users

99.9% uptime

50+ institutional subscribers

1 PB largest single transfer to date

3 months longest continuously managed transfer

130+ federated campus identities

Page 21

Thank you to our sponsors!

U.S. Department of Energy

Page 22

Users, usage continue steady growth…

[Chart: Active Users; y-axis: Number of Users, 0 to 3000]

Page 23

…but freemium gap is widening

[Chart: Active Endpoints, Free vs. Subscribed; y-axis: Number of Endpoints, 0 to 3000]

Page 24

Globus Subscriptions

• Globus Provider Plan

– Shared endpoints

– Data publication

– Peer-to-peer transfer/sharing

– Management console

– Usage reports

– Priority support

– Application integration

• Branded Web Site

• Alternate Identity Provider (InCommon is standard)

• Premium Storage Connectors (S3, HPSS, Spectra; Google Drive coming soon)

globus.org/provider-plans

Page 25

We hope you will join us…

• Sign up and transfer files: globus.org/login

• Create endpoints: globus.org/globus-connect-server

• Documentation: docs.globus.org

• Need help? support.globus.org

• Subscribe to help us make Globus self-sustaining: globus.org/provider-plans

• Follow us: @globusonline