20150209_globus_poster-v11


Upload: ben-blaiszik

Post on 18-Aug-2015

Background: Globus started as an experiment to explore the feasibility of using SaaS approaches to radically simplify research data transfer. Globus is now used by dozens of research institutions and is the recommended file transfer service for UChicago RCC, NCSA, XSEDE, NERSC, and many other cyberinfrastructure providers nationwide. It implements methods for managing the secure transfer, sharing, and publication of large data sets.

Objective: To provide frictionless access to sophisticated cyberinfrastructure for managing large datasets throughout their lifecycle, at a modest cost to ensure sustainability.

Globus: Research Data Management-as-a-Service

CONTRIBUTORS: B. Allen, R. Ananthakrishnan, R. Aydt, J. Bester, B. Blaiszik, E. Blau, K. Chard, D. Collins, V. Cuplinskas, P. Dave, I. Foster, R. Kettimuthu, J. Kordas, L. Lacinski, M. Lidman, L. Liming, M. Link, R. Madduri, S. Martin, B. Mccollam, D. Morgan, JP Navarro, C. Pickett, J. Pruyne, A. Rodriguez, G. Rohder, S. Rosen, D. Shifflett, D. Sulakhe, T. Sutton, S. Tuecke, V. Vasiliadis

Project Impact
• Deliver advanced capabilities for moving, sharing, annotating, and publishing research datasets
• Accelerate discovery and expand scope of inquiry
• Enable PIs to spend more time doing research
• Significantly increase data processing rates
• Expand usage and simplify access to advanced computation across the US research and education system
• Expand use of Software-as-a-Service approaches across all areas of research

Transfer files reliably, securely, and fast
[Figure: transfer between two secure endpoints, A and B (e.g. laptop and midway)]
• You submit a transfer request
• Globus moves the data for you
• Globus notifies you once the transfer is complete

Globus by the Numbers
• 4 years in operation
• 78,157,072 gigabytes moved
• 2,940,000,000 files transferred
• 20,882 registered users

Share files directly from existing storage
[Figure: user A shares from a data source; user B accesses the shared files]
• A user selects file(s) to share and sets access permissions for individuals or groups
• Globus tracks shared files; no need to move files to cloud storage
• User B logs in to Globus to access shared file(s)
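The sharing model above (permissions granted to individuals or to groups, with the files staying where they are) can be pictured with a minimal sketch. This is not Globus code: the class `SharedPath`, its fields, and the user and group names are hypothetical, chosen only to illustrate the idea of an access check against a per-path permission list.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPath:
    """A path shared in place, with access granted to users and groups.

    Hypothetical structure for illustration; not the Globus data model.
    """
    path: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)

    def can_read(self, user, user_groups):
        """Return True if the user, or any group the user belongs to, was granted access."""
        if user in self.allowed_users:
            return True
        return bool(self.allowed_groups & set(user_groups))


# Placeholder path, user, and group names.
share = SharedPath(
    path="/projects/example/",
    allowed_users={"userB"},
    allowed_groups={"example-team"},
)
```

In this sketch only the permission record changes when access is granted or revoked; the shared data itself is never copied.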

Publish files with metadata for discovery
[Figure: researcher, curator, and published data store]
• Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
• Curator reviews and approves; data set published on campus or other
• Peers and public search and discover data sets; access and transfer using
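As a rough illustration of describing a data set with both Dublin Core and domain-specific metadata, the sketch below separates a flat record into the two groups. The element names are the fifteen elements of the Dublin Core Metadata Element Set; the function `split_metadata`, the field `polymer_class`, and every value in the record are hypothetical placeholders, not taken from the poster or from the Globus publication service.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def split_metadata(record):
    """Split a flat metadata record into Dublin Core and domain-specific fields."""
    core = {k: v for k, v in record.items() if k in DUBLIN_CORE_ELEMENTS}
    domain = {k: v for k, v in record.items() if k not in DUBLIN_CORE_ELEMENTS}
    return core, domain


# Hypothetical record; every value is a placeholder.
record = {
    "title": "Example polymer data set",
    "creator": "A. Researcher",
    "date": "2015-02-09",
    "polymer_class": "example-class",  # domain-specific field (hypothetical)
}
core, domain = split_metadata(record)
```

A curator-facing form could present the `core` fields uniformly across collections while the `domain` fields vary per collection schema.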

A user view of Globus
• Globus Connect Personal on laptops
• Globus Connect Server on clusters
• Data transfer nodes in Science DMZ

• Fire-and-forget data movement
• Simple sharing off existing storage
• Access using existing credentials
• Expert operations and monitoring
• Priority support (with subscription)

Command Line Interface
    $ ls alcf#dtn:/
    $ scp alcf#dtn:/myfile \
          nersc#dtn:/myfile_V0

HTTP REST Interface
    POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
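The CLI and REST lines above describe the same operation: copy a path on one endpoint to a path on another. The sketch below shows one way the `<transfer-doc>` body might be assembled from the CLI-style `endpoint:/path` arguments. The helper names `parse_endpoint_path` and `build_transfer_doc` are hypothetical, the `submission_id` value is a placeholder, and the JSON field names are an assumption modeled on the Transfer API's transfer document; check them against the API reference before relying on them. No request is sent.

```python
import json


def parse_endpoint_path(spec):
    """Split 'endpoint#name:/path' into ('endpoint#name', '/path')."""
    endpoint, sep, path = spec.partition(":")
    if not sep or not endpoint or not path.startswith("/"):
        raise ValueError(f"expected 'endpoint:/path', got {spec!r}")
    return endpoint, path


def build_transfer_doc(source, destination, submission_id):
    """Build a transfer document for a single-file copy.

    Field names are assumed, not confirmed by the poster.
    """
    src_endpoint, src_path = parse_endpoint_path(source)
    dst_endpoint, dst_path = parse_endpoint_path(destination)
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": src_endpoint,
        "destination_endpoint": dst_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src_path,
                "destination_path": dst_path,
                "recursive": False,
            }
        ],
    }


# Same endpoints and paths as the scp example; the ID is a placeholder.
doc = build_transfer_doc(
    "alcf#dtn:/myfile", "nersc#dtn:/myfile_V0", submission_id="PLACEHOLDER-ID"
)
print(json.dumps(doc, indent=2))
```

The resulting JSON would be the body of the POST request; authentication and obtaining a real submission ID are outside the scope of this sketch.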

Publish and discover data
• Deployed SaaS data publication service (below left)
• Designed to handle Big Data
• Built to support managed collections with configurable policies: storage location, persistent identifier type (e.g. DOI), metadata schemas and input forms, and fine-grained ACLs (right)
• Designed to support rich data discovery functionality (below right, in development)

Catalog data and build workflows
• Automate metadata ingestion from instrumentation
• Build real-time metadata-driven feedback loops and workflows to support complex experiments (below)
• Query for and act on resulting datasets as units
• Catalog and visualize dataset provenance
• Construct and share typed tag definitions
• Access via web GUI (right), REST, Python API client, CLI
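The typed tag definitions mentioned among the catalog features above can be illustrated with a small sketch: a catalog holds named tag definitions, each with a type, and tag values on a dataset are checked against them. Everything here is hypothetical (the type names in `TAG_TYPES`, the class `TagDefinition`, the function `validate_tags`, and the example tags); the poster does not describe how the catalog implements this.

```python
# Hypothetical mapping from tag type names to Python types.
TAG_TYPES = {"text": str, "int": int, "float": float}


class TagDefinition:
    """A named tag with a declared type (illustrative only)."""

    def __init__(self, name, type_name):
        if type_name not in TAG_TYPES:
            raise ValueError(f"unknown tag type: {type_name!r}")
        self.name = name
        self.type_name = type_name

    def check(self, value):
        """Raise TypeError if the value does not match the declared type."""
        expected = TAG_TYPES[self.type_name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"tag {self.name!r} expects {self.type_name}")


def validate_tags(definitions, tags):
    """Check a dataset's tag values against a catalog's tag definitions."""
    by_name = {d.name: d for d in definitions}
    for name, value in tags.items():
        if name not in by_name:
            raise KeyError(f"tag {name!r} is not defined in this catalog")
        by_name[name].check(value)
    return True


# Placeholder tag definitions for one catalog.
catalog_tags = [TagDefinition("sample_id", "text"), TagDefinition("temperature_k", "float")]
```

Querying "datasets as units" then amounts to filtering on validated tag values, for example all datasets whose `temperature_k` exceeds a threshold.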

Figure panels:
• Search for data
• Define typed tags per catalog
• Share data with fine-grained ACLs
• NexPy workflow with Swift and Catalog
• Data publication and discovery architecture
• Data publication landing page
• Data discovery supports complex queries

Labels in the data publication and discovery architecture diagram:
• Materials Data Facility: Publish, Discover
• Collections (Polymer Collection, Comp. Materials Collection, ...), each with Configuration and Policies, Submission Workflows, Metadata & forms, ACLs, Workflows
• Storage Repositories: NIST, ANL, UIUC, S3
• Persistent Identifiers: Handle System, DOI
• Users & Groups: Globus Nexus
• Globus Nexus Authentication
• Metadata Catalog

(Wozniak, Wilde, Blaiszik)