2013 06-21-computing-for-light-sources

Globus Online for Managing Tomography Data at APS

Rachana Ananthakrishnan, Francesco De Carlo

Argonne National Lab

Upload: rachana-ananthakrishnan

Post on 28-Aug-2014


DESCRIPTION

Presented at the Computing for Light and Neutron Sources Technical Forum. Discusses Globus Online transfer, sharing and metadata management in the context of collaboration with Advanced Photon Source.

TRANSCRIPT

Page 1: 2013 06-21-computing-for-light-sources

globus online

Globus Online for Managing Tomography Data at APS

Rachana Ananthakrishnan, Francesco De Carlo

Argonne National Lab

Page 2: 2013 06-21-computing-for-light-sources

We started with reliable, secure, high-performance file transfer …

Data Source → Data Destination

1. User initiates transfer request

2. Globus Online moves and syncs files

3. Globus Online notifies user
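The three steps above can be illustrated with a small local sketch. This is not how Globus Online is implemented and it does not use any Globus API; it only mimics the idea of "move and sync" with the Python standard library, assuming that "sync" means copying only files that are missing or differ at the destination. The function names `checksum`, `sync`, and `demo` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing or differ at the destination.

    The returned list of relative paths stands in for the
    notification sent to the user in step 3.
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists() and checksum(dst) == checksum(src):
            continue  # already in sync, nothing to move
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        # Verify the copy before reporting it as transferred.
        if checksum(dst) != checksum(src):
            raise IOError(f"checksum mismatch after copying {rel}")
        copied.append(rel.as_posix())
    return copied


def demo() -> tuple[list[str], list[str]]:
    """Run two sync passes over temporary directories."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "destination")
        (src / "scan01").mkdir(parents=True)
        (src / "scan01" / "proj_0001.raw").write_bytes(b"\x00" * 1024)
        (src / "notes.txt").write_text("sample A")
        first = sync(src, dst)   # steps 1 and 2: request, then move
        second = sync(src, dst)  # nothing changed, so nothing moves
        return first, second
```

Calling `demo()` runs the sync twice: the first pass copies both files, and the second pass copies nothing because the checksums already match.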

Page 3: 2013 06-21-computing-for-light-sources

… and then made it simple to share big data off existing storage systems

Data Source

1. User A selects file(s) to share, selects user or group, and sets permissions

2. Globus Online tracks shared files; no need to move files to cloud storage!

3. User B logs in to Globus Online and accesses shared file
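The sharing flow above can be modeled as access rules kept next to data that never moves. The class below is a hypothetical illustration, not the Globus Online sharing service or its API; the names `SharedEndpoint`, `share`, and `can` are assumptions, as is the rule that a grant on a directory covers everything beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEndpoint:
    """Files stay on the owner's storage; only access rules are tracked."""

    owner: str
    # Maps (path, user) -> set of permissions such as {"read"}.
    rules: dict = field(default_factory=dict)

    def share(self, path: str, user: str, permissions: set) -> None:
        """Step 1: the owner picks a path, a user, and permissions."""
        self.rules[(path, user)] = set(permissions)

    def can(self, user: str, path: str, permission: str) -> bool:
        """Step 3: check whether a user may access a path."""
        if user == self.owner:
            return True
        for (shared, grantee), perms in self.rules.items():
            if grantee != user or permission not in perms:
                continue
            # A grant covers the shared path itself and anything under it.
            prefix = shared.rstrip("/") + "/"
            if path == shared or path.startswith(prefix):
                return True
        return False
```

For example, after `endpoint.share("/data/scan01", "userB", {"read"})`, user B can read `/data/scan01/recon.h5` but cannot write it, and a sibling directory such as `/data/scan010` is not covered.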

Page 4: 2013 06-21-computing-for-light-sources

Transforming data acquisition

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 5: 2013 06-21-computing-for-light-sources

Transforming data acquisition

Envisaged

• Experimental parameters optimized automatically

• Collected data available to optimization programs

• Data are automatically reconstructed, reduced, and shared with local and remote participants

• User team leaves the APS with reduced data

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 6: 2013 06-21-computing-for-light-sources

Facility data acquisition

Globus Online as enabler

Globus Online transfer service

Reduced data

Analysis/Sharing

Globus Online sharing service

Globus Online dataset service*

* In development

Page 7: 2013 06-21-computing-for-light-sources

Credit: Kerstin Kleese-van Dam

Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL

Page 8: 2013 06-21-computing-for-light-sources

Looking at how researchers use data

• A single research question often requires the integration of many data elements that are:
  – In different locations
  – In different formats (Excel, text, CDF, HDF, …)
  – Described in different ways

• Best grouping can vary during investigation
  – Longitudinal, vertical, cross-cutting

• But always needs to be operated on as a unit
  – Share, annotate, process, copy, archive, …

Page 9: 2013 06-21-computing-for-light-sources

How do we manage data today?

• Often, a curious mix of ad hoc methods
  – Organize in directories using file and directory naming conventions
  – Capture status in README files, spreadsheets, notebooks
  – Even PowerPoint!

• Time-consuming, complex, error prone

Why can’t we manage our data like we manage our pictures and music?

Page 10: 2013 06-21-computing-for-light-sources

Introducing the dataset

• Group data based on use, not location
  – Logical grouping to organize, reorganize, search, and describe usage

• Tag with characteristics that reflect content …
  – Capture as much existing information as we can

• … or to reflect current status in investigation
  – Stage of processing, provenance, validation, …

• Share datasets for collaboration
  – Control access to data and metadata

• Operate on datasets as units
  – Copy, export, analyze, tag, archive, …
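As a rough illustration of the dataset idea above, the sketch below groups file references by use rather than location, attaches tags, and applies one operation to the whole group. It is a hypothetical model, not the Globus Online dataset service (which the slides describe as in development); the class name, method names, and example URIs are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A logical grouping of data elements, independent of where they live."""

    name: str
    members: set = field(default_factory=set)  # references to files anywhere
    tags: dict = field(default_factory=dict)   # content or status descriptors

    def add(self, *uris: str) -> None:
        """Add file references; they may point to different locations."""
        self.members.update(uris)

    def tag(self, name: str, value) -> None:
        """Record a characteristic such as stage of processing."""
        self.tags[name] = value

    def apply(self, operation) -> dict:
        """Operate on the dataset as a unit: run one operation per member."""
        return {uri: operation(uri) for uri in sorted(self.members)}


# Members come from different (made-up) locations and formats.
sample = Dataset("sample-A")
sample.add("siteA:/raw/scan01.hdf", "siteB:/notes/scan01.txt")
sample.tag("stage", "reconstructed")
archived = sample.apply(lambda uri: "archive:" + uri.split(":", 1)[1])
```

Here `archived` maps each member to a new location computed by the same operation, which is the "copy, export, archive" pattern applied to the group instead of to individual files.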

Page 11: 2013 06-21-computing-for-light-sources

Expanding Globus Online services

• Ingest and publication
  – Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts

• Cataloging
  – Virtual views of data based on user-defined and/or automatically extracted metadata

• Integration with computation
  – Associate computational procedures, orchestrate application, catalog results, record provenance
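The ingest idea above (a drop folder that also extracts metadata and catalogs it) might look roughly like the following. This is a sketch under stated assumptions, not the planned service: the metadata fields, the extension-to-format table `FORMATS`, and the functions `extract_metadata`, `ingest`, and `demo` are all hypothetical, and a real extractor would read format-specific headers rather than rely on file extensions.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumed mapping from file extension to the formats named in the slides.
FORMATS = {".h5": "HDF", ".hdf": "HDF", ".cdf": "CDF",
           ".txt": "text", ".xlsx": "Excel"}


def extract_metadata(path: Path) -> dict:
    """Derive a few descriptive fields automatically from a file."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "format": FORMATS.get(path.suffix.lower(), "unknown"),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def ingest(folder: Path, catalog: list) -> list:
    """Append one metadata record per file in the drop folder."""
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        catalog.append(extract_metadata(path))
    return catalog


def demo() -> list:
    """Ingest two small files from a temporary drop folder."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "scan01.h5").write_bytes(b"\x89HDF" + b"\x00" * 12)
        (folder / "README.txt").write_text("sample A, 2-BM")
        return ingest(folder, [])
```

The resulting records could then feed a catalog so that views are built from metadata instead of directory layout.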

Page 12: 2013 06-21-computing-for-light-sources

Builds on catalog as a service

Approach

• Hosted user-defined catalogs

• Based on tag model <subject, name, value>

• Optional schema constraints

• Integrated with other Globus services

Three REST APIs

• /query/: Retrieve subjects

• /tags/: Create, delete, retrieve tags

• /tagdef/: Create, delete, retrieve tag definitions

Builds on USC Tagfiler project (C. Kesselman et al.)
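The tag model and the three API families above can be mirrored in a small in-memory sketch. It does not call the catalog service or Tagfiler, and the method names are assumptions chosen to echo /tagdef/, /tags/, and /query/. It also assumes that a tag definition is an optional type constraint and that tags without a definition are accepted, which the slides do not specify.

```python
class TagCatalog:
    """In-memory stand-in for a catalog of <subject, name, value> tags."""

    def __init__(self):
        self.tagdefs = {}  # tag name -> required value type, or None
        self.tags = set()  # (subject, name, value) triples

    def create_tagdef(self, name, value_type=None):
        """Echoes /tagdef/: declare a tag, optionally constraining its type."""
        self.tagdefs[name] = value_type

    def create_tag(self, subject, name, value):
        """Echoes /tags/: attach a tag, enforcing the schema if one exists."""
        expected = self.tagdefs.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self.tags.add((subject, name, value))

    def delete_tag(self, subject, name, value):
        """Echoes /tags/: remove a tag if it is present."""
        self.tags.discard((subject, name, value))

    def retrieve_tags(self, subject):
        """Echoes /tags/: list (name, value) pairs for one subject."""
        pairs = [(n, v) for s, n, v in self.tags if s == subject]
        return sorted(pairs, key=lambda pair: (pair[0], repr(pair[1])))

    def query(self, **criteria):
        """Echoes /query/: subjects carrying every requested name=value tag."""
        subjects = {s for s, _, _ in self.tags}
        return sorted(
            s for s in subjects
            if all((s, n, v) in self.tags for n, v in criteria.items())
        )
```

A query such as `catalog.query(beamline="2-BM", stage="reconstructed")` then returns a virtual view of matching subjects regardless of where the underlying files are stored.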

Page 13: 2013 06-21-computing-for-light-sources

Exemplar: APS Beamlines 32-ID & 2-BM

X-Ray imaging, tomography, ~few µm to 30 nm resolution

Currently can generate up to 100 TB per day

< 1GB/s data rate; ~3-5GB/s in 5-10 years
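To relate the per-second and per-day figures above, a quick conversion helps. The sketch assumes decimal units (1 TB = 1000 GB) and a rate sustained for a full 24 hours, neither of which the slide states; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day(rate_gb_per_s: float) -> float:
    """Daily volume in TB for a rate sustained over 24 hours."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


def gb_per_s_for(volume_tb_per_day: float) -> float:
    """Sustained rate in GB/s needed to produce a given daily volume."""
    return volume_tb_per_day * 1000 / SECONDS_PER_DAY


today = tb_per_day(1)                     # 1 GB/s sustained
future = (tb_per_day(3), tb_per_day(5))   # 3 to 5 GB/s sustained
needed = gb_per_s_for(100)                # rate behind 100 TB per day
```

Under these assumptions, 1 GB/s sustained corresponds to 86.4 TB per day, and 100 TB per day corresponds to roughly 1.16 GB/s, so the two figures on the slide are of the same order and are presumably rounded or peak values. The projected 3 to 5 GB/s would correspond to about 259 to 432 TB per day.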

Page 14: 2013 06-21-computing-for-light-sources

Multi-scale 3D imaging data fusion at APS

[Diagram. Recoverable labels follow.]

Beamlines:
• Beamline 2-BM: ~1.5 µm resolution
• Beamline 32-ID-C: 20-50 nm resolution

Acquisition labels:
• Up to 100 fps, 2K x 2K, 16 bits, 11 GB raw data
• 1,500 fps, 2K x 2K, 16 bits, 1 min readout, 11 GB raw data

Processing stages shown:
• Storage
• Image processing (normalization, etc.) → Tomographic reconstruction → Visual inspection → Selection
• Image processing (alignment, etc.) → Tomographic reconstruction → Visual inspection → Selection
• Selection → Multi-scale image fusion → Visual inspection
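The detector figures on this slide can be sanity-checked with simple arithmetic. The sketch assumes that "2K x 2K" means 2048 x 2048 pixels, that each 16-bit pixel is stored uncompressed in 2 bytes, and that "11 GB" is decimal; none of this is stated in the slides, and the function names are hypothetical.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_bytes_per_s(frame_size: int, frames_per_s: int) -> int:
    """Raw data rate for a detector running at a fixed frame rate."""
    return frame_size * frames_per_s


def frames_in(volume_bytes: int, frame_size: int) -> int:
    """Whole frames that fit in a given raw data volume."""
    return volume_bytes // frame_size


frame = frame_bytes(2048, 2048, 16)             # 8,388,608 bytes per frame
rate_100fps = stream_bytes_per_s(frame, 100)    # 838,860,800 bytes per second
frames_11gb = frames_in(11_000_000_000, frame)  # 1,311 frames
```

Under these assumptions a frame is 8 MiB, 100 fps produces about 0.84 GB/s (in line with the "< 1GB/s" rate quoted for these beamlines), and 11 GB of raw data holds about 1,300 frames.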

Page 15: 2013 06-21-computing-for-light-sources

Argonne Collaborations

[Diagram. Recoverable labels follow.]

Groups:
• APS Imaging Group
• APS Software Service Group
• Mathematics & Computer Science/Computation Institute
• Mathematics & Computer Science

Activities:
• Instrument & Data Collection
• Data Management Services
• System integration
• Multi-scale image fusion

Projects:
• Infrastructure LDRD
• Tao of Fusion LDRD

Results: Google Earth-style zoom-in data navigation

Page 22: 2013 06-21-computing-for-light-sources

Timelines

• July:
  – Alpha service available

• August:
  – Pilot with two groups at APS

• Fall of this year:
  – Pilot with a few other groups at APS
  – Early beta

Page 23: 2013 06-21-computing-for-light-sources

Thank You

• Interested in working with us on dataset service:
  – Email: [email protected]

• Contact: [email protected]

• Website: www.globusonline.org