20150209_globus_poster-v11
Background: Globus started as an experiment to explore the feasibility of using SaaS approaches to radically simplify research data transfer. Globus is now used by dozens of research institutions and is the recommended file transfer service for UChicago RCC, NCSA, XSEDE, NERSC, and many other cyberinfrastructure providers nationwide. It implements methods for managing the secure transfer, sharing, and publication of large data sets.
Objective: To provide frictionless access to sophisticated cyberinfrastructure for managing large datasets throughout their lifecycle, at a modest cost to ensure sustainability.
Globus: Research Data Management-as-a-Service
CONTRIBUTORS: B. Allen, R. Ananthakrishnan, R. Aydt, J. Bester, B. Blaiszik, E. Blau, K. Chard, D. Collins, V. Cuplinskas, P. Dave, I. Foster, R. Kettimuthu, J. Kordas, L. Lacinski, M. Lidman, L. Liming, M. Link, R. Madduri, S. Martin, B. Mccollam, D. Morgan, JP Navarro, C. Pickett, J. Pruyne, A. Rodriguez, G. Rohder, S. Rosen, D. Shifflett, D. Sulakhe, T. Sutton, S. Tuecke, V. Vasiliadis
Project Impact
• Deliver advanced capabilities for moving, sharing, annotating, and publishing research datasets
• Accelerate discovery and expand scope of inquiry
• Enable PIs to spend more time doing research
• Significantly increase data processing rates
• Expand usage and simplify access to advanced computation across US research and education system
• Expand use of Software-as-a-Service approaches across all areas of research
[Figure: file transfer between two secure endpoints, labeled A and B (e.g. a laptop and midway)]
You submit a transfer request
Globus moves the data for you
Globus notifies you once the transfer is complete
Globus by the Numbers
4 years in operation
78,157,072 gigabytes moved
2,940,000,000 files transferred
20,882 registered users
Transfer files reliably, securely, and fast
[Figure: sharing from a data source between users A and B]
A user selects file(s) to share and sets access permissions for individuals or groups
Globus tracks shared files – no need to move files to cloud storage
User B logs in to Globus to access shared file(s)
Share files directly from existing storage
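The sharing model above (a user grants access to individuals or groups, and the files stay on the existing storage) can be illustrated with a minimal sketch. The names `AccessRule` and `can_access`, the permission strings, and the example paths are all hypothetical; this is not the Globus sharing API, only one way such rules could be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical access rule: who may do what under a path prefix."""
    principal: str     # a user name or a group name
    path_prefix: str   # shared folder on the existing storage
    permissions: str   # "r" for read-only, "rw" for read/write


def can_access(rules, user, groups, path, mode="r"):
    """Return True if any rule grants `mode` on `path` to the user or one of their groups."""
    principals = {user, *groups}
    for rule in rules:
        if (rule.principal in principals
                and path.startswith(rule.path_prefix)
                and mode in rule.permissions):
            return True
    return False


# Illustrative rules: one for an individual, one for a group.
rules = [
    AccessRule("user_b", "/projects/run42/", "r"),
    AccessRule("beamline-team", "/projects/run42/", "rw"),
]

print(can_access(rules, "user_b", [], "/projects/run42/scan.h5"))       # read via the individual rule
print(can_access(rules, "user_b", [], "/projects/run42/scan.h5", "w"))  # user_b has no write rule
```

Because the rules only reference paths, nothing is copied: the data source keeps the files and the rules decide who can reach them.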
[Figure: publication workflow between researcher, curator, and published data store]
Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
Curator reviews and approves; data set published on campus or other
Peers and public search and discover data sets; access and transfer using
Publish files with metadata for discovery
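The publication steps in the figure (researcher describes the data set, curator reviews and approves, data set becomes published) can be sketched as a small state machine. The class `Submission`, its state names, and the choice of required fields are assumptions made for illustration; only the field names `title`, `creator`, and `date` come from the Dublin Core element set.

```python
from dataclasses import dataclass, field

# Three of the 15 Dublin Core elements; which ones a collection requires is an assumption.
REQUIRED_DC = ("title", "creator", "date")


@dataclass
class Submission:
    """Hypothetical model of a data set moving through the publication workflow."""
    files: list
    metadata: dict = field(default_factory=dict)
    state: str = "assembling"  # assembling -> submitted -> published

    def submit(self):
        """Researcher step: the data set must be described before a curator sees it."""
        missing = [key for key in REQUIRED_DC if not self.metadata.get(key)]
        if missing:
            raise ValueError(f"missing Dublin Core fields: {missing}")
        self.state = "submitted"

    def review(self, approved):
        """Curator step: approve to publish, or send the data set back for changes."""
        if self.state != "submitted":
            raise RuntimeError("only submitted data sets can be reviewed")
        self.state = "published" if approved else "assembling"


# Illustrative values only.
s = Submission(
    files=["scan_001.h5"],
    metadata={"title": "Example scan", "creator": "A. Researcher", "date": "2015-02-09"},
)
s.submit()
s.review(approved=True)
print(s.state)
```

Domain-specific metadata would sit alongside the Dublin Core fields in the same dictionary; the sketch leaves that out to keep the workflow visible.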
Globus Connect Personal on laptops
• Fire-and-forget data movement
• Simple sharing off existing storage
• Access using existing credentials
• Expert operations and monitoring
• Priority support (with subscription)
Globus Connect Server on clusters
Command Line Interface
$ ls alcf#dtn:/
$ scp alcf#dtn:/myfile \
    nersc#dtn:/myfile_V0
Data transfer nodes in Science DMZ
HTTP REST Interface
POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
A user view of Globus
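As a rough illustration of what the `<transfer-doc>` in the REST call might contain, the sketch below builds a transfer document for the same copy shown in the CLI example (`alcf#dtn:/myfile` to `nersc#dtn:/myfile_V0`). The field names follow the v0.10 Transfer API as best recalled and should be checked against the API reference; the helper `build_transfer_doc` is hypothetical, the submission ID is a placeholder, and nothing is sent over the network.

```python
import json


def build_transfer_doc(submission_id, source, destination, items):
    """Assemble a transfer document as a plain dict (hypothetical helper; nothing is sent)."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source,
        "destination_endpoint": destination,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": False,  # single files, not directories
            }
            for src, dst in items
        ],
    }


doc = build_transfer_doc(
    submission_id="<id-obtained-from-the-service>",  # placeholder, not a real ID
    source="alcf#dtn",
    destination="nersc#dtn",
    items=[("/myfile", "/myfile_V0")],
)
print(json.dumps(doc, indent=2))
```

The resulting JSON would be the body of the POST request, sent with the user's existing credentials.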
Catalog data and build workflows
• Automate metadata ingestion from instrumentation
• Build real-time metadata-driven feedback loops and workflows to support complex experiments (below)
• Query for and act on resulting datasets as units
• Catalog and visualize dataset provenance
• Construct and share typed tag definitions
• Access via web GUI (right), REST, Python API client, CLI

Publish and discover data
• Deployed SaaS data publication service (below left)
• Designed to handle Big Data
• Built to support managed collections with configurable policies
  - storage location, persistent identifier type (e.g. DOI), metadata schemas and input forms, and fine-grained ACLs (right)
• Designed to support rich data discovery functionality (below right - in development)
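The configurable policies listed for managed collections (storage location, persistent identifier type, metadata schemas, and ACLs) could be captured per collection roughly as follows. Everything here is a hypothetical sketch: the class `CollectionPolicy`, its field names, the storage URL, and the user names are invented for illustration, and the allowed identifier types are assumed from the two systems named on the poster (DOI and Handle).

```python
from dataclasses import dataclass, field

# Assumption: the two identifier systems named on the poster.
ALLOWED_ID_TYPES = {"DOI", "Handle"}


@dataclass
class CollectionPolicy:
    """Hypothetical per-collection configuration for a publication service."""
    name: str
    storage_location: str
    identifier_type: str
    metadata_schemas: list = field(default_factory=lambda: ["dublin_core"])
    curators: set = field(default_factory=set)    # who may approve submissions
    submitters: set = field(default_factory=set)  # who may submit data sets

    def __post_init__(self):
        # Reject configurations that name an unknown identifier system.
        if self.identifier_type not in ALLOWED_ID_TYPES:
            raise ValueError(f"unsupported identifier type: {self.identifier_type}")

    def may_submit(self, user):
        return user in self.submitters

    def may_approve(self, user):
        return user in self.curators


polymer = CollectionPolicy(
    name="Polymer Collection",
    storage_location="s3://example-bucket/polymer",  # illustrative location
    identifier_type="DOI",
    curators={"curator1"},
    submitters={"researcher1"},
)
print(polymer.may_submit("researcher1"), polymer.may_approve("researcher1"))
```

Keeping the policy as data rather than code is what lets each collection choose its own storage, identifiers, and forms without changes to the service.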
Search for data
Define typed tags per catalog
Share data with fine-grained ACLs
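The typed tag definitions and dataset queries mentioned for the catalog can be sketched with a small in-memory model. The class `Catalog`, its methods, and the tag and dataset names are hypothetical and do not reflect the catalog service's actual REST or Python client interface; the sketch only shows the idea of checking a tag value against its declared type and then querying datasets as units.

```python
# Assumption: a small set of supported tag types, mapped to Python types.
TAG_TYPES = {"text": str, "int": int, "float": float}


class Catalog:
    """Hypothetical in-memory catalog with typed tag definitions."""

    def __init__(self):
        self.tag_defs = {}   # tag name -> type name
        self.datasets = {}   # dataset name -> {tag name: value}

    def define_tag(self, name, type_name):
        """Declare a tag and the type its values must have."""
        if type_name not in TAG_TYPES:
            raise ValueError(f"unknown tag type: {type_name}")
        self.tag_defs[name] = type_name

    def tag(self, dataset, name, value):
        """Attach a tag value to a dataset; undefined tags raise KeyError."""
        expected = TAG_TYPES[self.tag_defs[name]]
        if not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self.datasets.setdefault(dataset, {})[name] = value

    def query(self, name, predicate):
        """Return the sorted names of datasets whose tag value satisfies the predicate."""
        return sorted(
            ds for ds, tags in self.datasets.items()
            if name in tags and predicate(tags[name])
        )


cat = Catalog()
cat.define_tag("temperature_k", "float")
cat.tag("scan_001", "temperature_k", 300.0)
cat.tag("scan_002", "temperature_k", 77.0)
print(cat.query("temperature_k", lambda t: t < 100))  # ['scan_002']
```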
NexPy workflow with Swift and Catalog
Data publication and discovery architecture
Data publication landing page
Data discovery supports complex queries
[Figure: data publication and discovery architecture for the Materials Data Facility. Components shown: Publish and Discover services; Globus Nexus authentication (users and groups); metadata catalog; storage repositories (NIST, ANL, UIUC, S3); persistent identifiers (Handle System, DOI). Each collection (Polymer Collection, Comp. Materials Collection, ...) has its own configuration and policies, submission workflows, metadata and forms, and ACLs.]
(Wozniak, Wilde, Blaiszik)