
GLAST Collaboration Meeting, March 2008

GLAST Large Area Telescope

Data Access

Tony Johnson
Stanford Linear Accelerator Center
[email protected]

Gamma-ray Large Area Space Telescope

http://glast-ground.slac.stanford.edu/


Outline

• Topics Covered
  – xrootd
  – LAT Data Catalog
    • Features
    • Web Interface
    • Tools
  – Download Manager
  – Skimmer
  – WIRED
  – Astro Server
  – Miscellaneous


xrootd

• xrootd
  – System developed at SLAC to manage large datasets
  – Distributes files across disks
    • Maximizes throughput
    • Minimizes manual disk management
    • Automates archiving datasets to (and restoring from) tape
    • Provides more reliability and scalability than NFS
    • Supports access control based on GLAST collaborator list
• Has been used for OpsSim2 and “Big MC Run”
  – Mostly working smoothly
    • Miscellaneous idiosyncrasies that need to be understood
    • Timeout problems when reading files


LAT Data Catalog

• Data catalog is a database designed for tracking LAT datasets
  – Can be used with
    • Disk files in AFS, NFS, or XROOTD servers, or tape archives
    • Data created inside or outside of processing pipeline
    • Data created/stored at SLAC or elsewhere
    • One or more locations per dataset
  – Simplifies access to data by providing a uniform view of files irrespective of their physical location
  – Allows data to be organized into a tree of “virtual” folders
    • Folders don’t have to correspond to physical location of data
  – Allows data to have associated “meta-data”
    • Some meta-data is required and verified by catalog
      – size, location, run range, creation date
    • Other meta-data is user-defined and arbitrarily extensible
  – Data can be
    • Browsed using virtual folders and “groups”
      – Folders contain arbitrary sub-folders, datasets and groups
      – Groups contain homogeneous list of datasets
    • Searched using meta-data
      – E.g. DatasetType=MC && RunMin > 50 && RunMin < 100
  – Data crawler
    • As new datasets are registered, crawler validates files and extracts meta-data (file size, number of events, etc.)
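The browse-and-search model described on this slide can be sketched in a few lines of Python. This is an illustration only, not the data catalog's implementation: the `Dataset` class, the `search` helper, and all sample paths and values are hypothetical, and the filter is written as a Python predicate rather than in the catalog's own query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One catalog entry: a virtual path, physical locations, and meta-data."""
    path: str        # virtual folder path, independent of where the file lives
    locations: list  # one or more physical locations
    meta: dict = field(default_factory=dict)


def search(datasets, folder, predicate):
    """Return datasets under a virtual folder whose meta-data satisfies predicate."""
    prefix = folder.rstrip("/") + "/"
    return [d for d in datasets if d.path.startswith(prefix) and predicate(d.meta)]


# Hypothetical sample entries (names and values are made up for illustration).
CATALOG = [
    Dataset("/MC-Tasks/demo/run-040", ["root://host-a//data/run-040.root"],
            {"DatasetType": "MC", "RunMin": 40}),
    Dataset("/MC-Tasks/demo/run-075",
            ["root://host-a//data/run-075.root", "/nfs/mirror/run-075.root"],
            {"DatasetType": "MC", "RunMin": 75}),
    Dataset("/Data/demo/run-080", ["/afs/demo/run-080.root"],
            {"DatasetType": "DATA", "RunMin": 80}),
]

# Python equivalent of: DatasetType=MC && RunMin > 50 && RunMin < 100
hits = search(
    CATALOG,
    "/MC-Tasks",
    lambda m: m.get("DatasetType") == "MC" and 50 < m.get("RunMin", -1) < 100,
)
print([d.path for d in hits])
```

Keeping the virtual path separate from the list of locations is what lets one dataset appear once in the tree while living on several servers.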


LAT Data Catalog - Web Interface

• http://glast-ground.slac.stanford.edu/DataCatalog/
• Screenshot callouts:
  – Browsable tree of datasets
  – Dataset description
  – Events, file size, run range automatically set by “crawler”
  – Meta-data added by creator
  – Access/authentication handled by web
  – Supports mirroring at multiple sites


LAT Data Catalog - Tools

• Pipeline Tools
  – From within “Pipeline Scriptlet” datasets can be
    • registered together with meta-data and multiple locations
    • located using meta-data and passed to subsequent processing stages
• Command Line Tools
  – Available now
    • registerDataset
      – Wildcards supported for registering many datasets at once
    • find
      – List/search for files
    • addLocation
    • addMetadata
  – Coming soon
    • remove
    • move
• Java API
  – Programmatic access to full functionality
• More Info
  – Data catalog User’s Guide
  – http://confluence.slac.stanford.edu/display/ds/Data+Catalog+Users+Guide


Recent Improvements

• Line-mode client find command
  – datacat find -G merit /MC-Tasks/OpsSim/opssim2-GR-v13r9/runs -s RunMin
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000002-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000003-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000004-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000005-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000006-merit.root
  – datacat find --recurse --search-groups -F 'DataType=="MERIT"&&nMetStart>=257731200 && nMetStart<=257731202' -S SLAC_XROOT -s TaskName -s Name /MC-Tasks/OpsSim/
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-HEAD1-1041-2-6/merit/opssim2-GR-HEAD1-1041-2-6-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9/merit/opssim2-GR-v13r9-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p1/merit/opssim2-GR-v13r9p1-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2/merit/opssim2-GR-v13r9p2-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p2-np/merit/opssim2-GR-v13r9p2-np-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-GR-v13r9p3/merit/opssim2-GR-v13r9p3-000001-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000000-merit.root
      root://glast-rdr//glast/mc/OpsSim/opssim2-nocel/merit/opssim2-nocel-000001-merit.root
  – Available now in DEV, feedback encouraged
    • Dan is preparing an addition to the data catalog user’s guide
• Enhancements to data catalog access in pipeline
  – Access meta-data from search results
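When the line-mode client is driven from a script, the filter expression contains quotes, `&&`, and comparison operators that a shell would otherwise interpret. The sketch below assembles the second command shown above as an argument list and quotes it for a POSIX shell. It uses only the options that appear on this slide and does not run `datacat`. The helper `build_find_command` and its parameter names are hypothetical, and they reflect an assumed reading of the options (`-F` as filter, `-S` as site, `-s` as sort key) that the slide does not spell out.

```python
import shlex


def build_find_command(folder, filter_expr=None, site=None, sort_keys=(),
                       recurse=False, search_groups=False):
    """Assemble an argv list for `datacat find` using only options shown above.

    Parameter names are assumptions about what each option means.
    """
    argv = ["datacat", "find"]
    if recurse:
        argv.append("--recurse")
    if search_groups:
        argv.append("--search-groups")
    if filter_expr:
        argv += ["-F", filter_expr]
    if site:
        argv += ["-S", site]
    for key in sort_keys:
        argv += ["-s", key]
    argv.append(folder)
    return argv


argv = build_find_command(
    "/MC-Tasks/OpsSim/",
    filter_expr='DataType=="MERIT"&&nMetStart>=257731200 && nMetStart<=257731202',
    site="SLAC_XROOT",
    sort_keys=("TaskName", "Name"),
    recurse=True,
    search_groups=True,
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
command_line = shlex.join(argv)
print(command_line)
```

Passing the list form (`argv`) to a process launcher avoids shell quoting altogether; the joined string is only needed for logging or copy-and-paste.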


Recent Improvements

• New faster crawler
  – Original crawler was not able to keep up with MC running at full throttle
  – New crawler processes files in parallel and can easily keep up
  – During Ops Sim2 problems discovered with files >2GB in length
    • Now fixed
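The idea of crawling files in parallel can be illustrated with a short sketch. The slides do not describe how the real crawler is built, so everything here is an assumption: the function names are hypothetical, SHA-256 stands in for whatever checksum the catalog uses, and counting events is left out because it would require reading ROOT files.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def crawl_one(path, chunk_size=1 << 20):
    """Extract basic meta-data for one file, reading in chunks so that
    files larger than 2 GB never have to fit in memory."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
    return {"path": path, "size": size, "sha256": digest.hexdigest()}


def crawl(paths, workers=4):
    """Process newly registered files in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crawl_one, paths))


# Demo on small temporary files standing in for newly registered datasets.
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = []
    for i in range(3):
        path = os.path.join(tmp, f"demo-{i}.dat")
        with open(path, "wb") as out:
            out.write(b"x" * ((i + 1) * 1000))
        demo_paths.append(path)
    results = crawl(demo_paths)

print([r["size"] for r in results])
```

Threads suit this workload because the time is dominated by file I/O rather than computation.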


Status/Problems/Plans

• Problems
  – Can be painfully slow (with 5,000,000 datasets)
    • New Oracle database being tested now
    • Karen working on adding “materialized views”
    • Further optimization of queries needed
    • Sensible pagination of large datasets
  – Web interface needs to allow selection of data based on
    • Run number range
    • Time range
    • Meta-data search (cf. line-mode client)
  – File versions
    • As of Ops Sim 2, L1Proc registers multiple versions of files
      – r0257998848_v001_merit.root
      – r0257998848_v002_merit.root
    • Data catalog does not know these are multiple versions of the same file
      – Sends them both to the skimmer, which results in duplicate events
    • Propose to add versioning to data catalog (show only latest by default)
  – Need Custom Views of data
    • E.g. All ASP products for run nnn source abc
• Plan
  – Fix problems
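The proposed "show only latest by default" behaviour can be sketched as follows. This is not the catalog's versioning scheme, which the slides only propose; the regular expression is inferred from the two example file names above, the function name is hypothetical, and the third file name in the demo is made up.

```python
import re

# Pattern inferred from the two example names above; other naming schemes
# would need a different expression.
VERSIONED = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)_(?P<rest>.+)$")


def latest_versions(names):
    """Keep only the highest-numbered version of each logical file."""
    best = {}
    for name in names:
        match = VERSIONED.match(name)
        if match is None:
            # Unversioned names are treated as their own logical file.
            best[(name, "")] = (0, name)
            continue
        key = (match["stem"], match["rest"])
        version = int(match["version"])
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())


registered = [
    "r0257998848_v001_merit.root",
    "r0257998848_v002_merit.root",
    "r0257998849_v001_merit.root",  # hypothetical second run
]
print(latest_versions(registered))
```

Filtering this way before handing files to the skimmer would avoid reading the same events twice.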


Download Manager

• One-click download of multiple files
• Inherits authorization from web login
  – Note: no anonymous FTP in future; a SLAC account will be required for data access
• Works with ftp:, http: and root:
  – Validates files (length, checksum) against data catalog
• Supports simultaneous download of multiple files
• Does not download files which already exist in target dir
  – So it is easy to fetch recently added files
• Can resume download of partially downloaded files
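The skip, resume, and validate behaviour listed above amounts to a small decision per file. The sketch below is one way to express it and is not the Download Manager's code: the function names are hypothetical, and SHA-256 is a stand-in because the slides do not say which checksum the catalog stores.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files are never read in one piece."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_download(target_dir, name, expected_size, expected_checksum):
    """Decide what to do with one catalog entry: 'skip', 'resume', or 'download'."""
    local = os.path.join(target_dir, name)
    if not os.path.exists(local):
        return "download"
    size = os.path.getsize(local)
    if size < expected_size:
        return "resume"    # partial file: fetch only the missing tail
    if size == expected_size and sha256_of(local) == expected_checksum:
        return "skip"      # already present and validated against the catalog
    return "download"      # wrong length or checksum: start over


payload = b"abc" * 100
checksum = hashlib.sha256(payload).hexdigest()

with tempfile.TemporaryDirectory() as target:
    with open(os.path.join(target, "complete.dat"), "wb") as out:
        out.write(payload)
    with open(os.path.join(target, "partial.dat"), "wb") as out:
        out.write(payload[:100])
    actions = {
        name: plan_download(target, name, len(payload), checksum)
        for name in ("complete.dat", "partial.dat", "missing.dat")
    }

print(actions)
```

A resumed file would still need the full length and checksum validation once the transfer completes.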


Status/Problems/Plans

• Several problems discovered during Ops Sim 2
  – 100% CPU usage after file recovery (fixed)
  – Bad error message if checksum inconsistent (fixed)
  – Problems downloading files >2GB (almost fixed)
• New feature
  – Start/Pause download requested (now available)
• Feature requests pending
  – Ability to download select run/time ranges
    • This will work automatically once this feature is added to data catalog web application
  – Non-GUI version for automated download/sync of data
  – Ability to select files to download from GUI (without web)


LAT Data Skimmer

• Allows data to be selected using “TCut” on tuple columns
  – Can output either Root or Fits (FT1) files
  – Uses Pipeline II for data processing
    • Allows parallel processing for large tasks
  – Output available for download for 10 days
  – Complete skim history maintained for later reuse


3 Ways to Access Data Skimmer

• Directly from Data Portal
  – http://glast-ground.slac.stanford.edu/DataPortal/
  – Click on “Simple Skimmer”
• Data Processing Page(s)
• From the Data Catalog


LAT Data Skimmer


Status/Problems/Plans

• Problems
  – Backend/root crashes
    • New (compiled) backend available soon
  – E-mail notification should include data dir even if failed
    • Need to be able to navigate from pipeline to data dirs
• Skimmer improvements in progress
  – Ability to skim more types of files
    • “svac”, “cal” and “gcr” added by David Chamont
      – Web interface needs to catch up
  – Ability to output more event types
    • Full Recon, Digi, MC trees
    • “Extended Event” (intermediate between FT1 and Merit)
    • Event Lists
      – CompositeEventLists (CEL) files
  – Access to more “expert” options


Event Display (WIRED)

• WIRED allows a quick look at detector response; it can be installed directly from the Web with no additional GLAST software required
  – Uses “HepRep” interchange format/infrastructure (shared with FRED)


Event Display (WIRED)


Status/Problems/Plans

• According to rumour, doesn’t work outside my office
  – Actually it doesn’t work in my office either
  – But it did work fine for DC2 data
    • Invariant under spatial translations/rotations
• Now being hooked up to data catalog/xrootd
  – Issue related to CEL files in gleam being investigated
  – Should be working again in next few days
  – “Event Display” link will appear in data catalog
    • Will support browsing events or selection of specific events


Astro Data Server

• Similar to skimmer, allows events to be selected using cuts
  – Cuts can only be on position in the sky, energy, time, and event category
  – Works much faster than Skimmer
  – Currently loaded with DC2 data
• Currently being refurbished for use with Service Challenge data and beyond
  – Will load all events as soon as they are produced by L1Proc
    • User will be able to select
      – all data including partial runs
      – only “complete” runs
    • Loose event cuts CTBClassLevel>1
      – User can select CTBClassLevel category
    • Able to output FT1, FT2, Extended event files, Merit root files
  – API for programmatic event selection
    • Will be used by ASDC tools
  – Closer integration with data catalog, skimmer
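The four kinds of cut listed above (sky position, energy, time, event category) can be sketched as a single selection function. This is an illustration under stated assumptions, not the Astro Data Server's API: the `Event` fields, their units, the `select` signature, and the sample events are all hypothetical. Only the default class cut, `CTBClassLevel>1`, comes from the slide.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    ra: float         # right ascension, degrees (assumed unit)
    dec: float        # declination, degrees (assumed unit)
    energy: float     # assumed to be in MeV
    time: float       # assumed to be seconds on some mission clock
    class_level: int  # stands in for CTBClassLevel


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def select(events, ra, dec, radius, e_min, e_max, t_min, t_max, class_above=1):
    """Apply cone, energy, time, and class cuts; class_above=1 means level > 1."""
    return [
        ev for ev in events
        if angular_separation(ev.ra, ev.dec, ra, dec) <= radius
        and e_min <= ev.energy <= e_max
        and t_min <= ev.time <= t_max
        and ev.class_level > class_above
    ]


# Made-up events: only the first passes every cut.
events = [
    Event(83.6, 22.0, 500.0, 100.0, 2),
    Event(83.6, 22.0, 500.0, 100.0, 1),    # fails class cut
    Event(200.0, -30.0, 500.0, 100.0, 3),  # fails position cut
    Event(83.7, 22.1, 50.0, 100.0, 3),     # fails energy cut
]
selected = select(events, ra=83.63, dec=22.01, radius=1.0,
                  e_min=100.0, e_max=10000.0, t_min=0.0, t_max=1000.0)
print(len(selected))
```

Restricting cuts to a few fixed quantities, as the slide describes, is what allows a server to index them and respond faster than a general-purpose skim.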


Astro Data Server

• Astro data server will remember the last set of parameters you used
• Astro Server also has a “Favorites” page
  – Keeps a list of your “favorite” search parameters


Status/Problems/Plans

• Was used for SC2 55 day run
• Not used in Ops Sim 2
• Still plan to
  – Load data from L1Proc
  – Add programmatic interface for use by ASP/ASDC tools
  – Better integration with Data Portal
• Bottom of priority list


Miscellaneous

• Data Access Restrictions
  – Starting very soon (this week hopefully) you will need to be a “glast collaborator” to access files from xrootd
  – You will need to log in to access data catalog/download manager
• Need to define standard skims
  – Automate their production
    • Part of RSP?
  – Automate their registration in data catalog
• Access to ASP/RSP data has not been discussed here
  – But is in the plan
• Feedback from Ops Sim2 has been very useful
  – Not all digested yet
• Need more/better documentation
  – Data Access frequently asked questions
    • http://confluence.slac.stanford.edu/x/zgAz
    • Please suggest more FAQs
• More feedback welcome
  – http://glast-ground.slac.stanford.edu/DataPortal/