
Page 1: Using the Grid for Astronomical Data Roy Williams, Caltech

Using the Grid for Astronomical Data

Roy Williams, Caltech

Page 2: Using the Grid for Astronomical Data Roy Williams, Caltech

Palomar-Quest Survey: Caltech, NCSA, Yale

P48 Telescope

Caltech Yale

NCSA

Transient pipeline: computing reservation at sunrise for immediate follow-up of transients

Synoptic survey: massive resampling (Atlasmaker) for ultra-faint detection

TG

NCSA and Caltech and Yale run different pipelines on the same data

50 Gbyte/night

5 Tbyte

ALERT

Page 3: Using the Grid for Astronomical Data Roy Williams, Caltech

Wide-area Mosaicking (Hyperatlas): an NVO-TeraGrid project (Caltech)

High quality: flux-preserving, spatially accurate

Stackable: Hyperatlas

Edge-free: pyramid weight

Mining AND Outreach

DPOSS 15º

Griffith Observatory "Big Picture"

Page 4: Using the Grid for Astronomical Data Roy Williams, Caltech

Synoptic Image Stack

Page 5: Using the Grid for Astronomical Data Roy Williams, Caltech

PQ Pipeline

Computing an observation night: 28 columns x 4 filters, up to 70 Gbyte

real-time

next day

cleaned frames, hyperatlas pages

coadd

VOEventNet

quasars at z > 4

Page 6: Using the Grid for Astronomical Data Roy Williams, Caltech

Mosaicking service

Logical SIAP

NVO Registry

Physical SIAP

Computing

Portal

Security

Request

Sandbox

http

Page 7: Using the Grid for Astronomical Data Roy Williams, Caltech

Transient from PQ

from catalog pipeline

Page 8: Using the Grid for Astronomical Data Roy Williams, Caltech

Event Synthesis Engine

Pairitel

Palomar 60”

Raptor

PQ next-day pipelines

catalog

Palomar-Quest

known variables

known asteroids

SDSS, 2MASS

PQ Event Factory

remote archives

baseline sky

eStar

VOEventNet

VOEventNet: a Rapid-Response Telescope Grid

GRB satellites

VOEvent database

Page 9: Using the Grid for Astronomical Data Roy Williams, Caltech

Correlation of the mass distribution (SDSS) with the CMB (ISW effect): statistical significance through an ensemble of simulated universes

Connolly and Scranton, U Pittsburgh

ISW Effect

Page 10: Using the Grid for Astronomical Data Roy Williams, Caltech

Analysis of data from AMANDA (Antarctic Muon and Neutrino Detector Array)

Barwick and Silvestri, UC Irvine

Amanda analysis

Page 11: Using the Grid for Astronomical Data Roy Williams, Caltech

Quasar Science: an NVO-TeraGrid project (Penn State, CMU, Caltech)

• 60,000 quasar spectra from the Sloan Digital Sky Survey
• Each is 1 CPU-hour: submit to the grid queue
• Fits a complex model (173 parameters)

– derive black hole mass from line widths

clusters

globusrun

manager

NVO data services

Page 12: Using the Grid for Astronomical Data Roy Williams, Caltech

N-point galaxy correlation: an NVO-TeraGrid project (Pitt, CMU)

Finding triple correlation in 3D SDSS galaxy catalog (RA/Dec/z)

Lots of large parallel jobs

kd-tree algorithms

Page 13: Using the Grid for Astronomical Data Roy Williams, Caltech

TeraGrid

Page 14: Using the Grid for Astronomical Data Roy Williams, Caltech

TeraGrid Wide Area Network

Page 15: Using the Grid for Astronomical Data Roy Williams, Caltech

TeraGrid Components

• Compute hardware: Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster, …

• Large-scale storage systems: hundreds of terabytes for secondary storage

• Very high-speed network backbone: bandwidth for rich interaction and tight coupling

• Grid middleware: Globus, data management, …

• Next-generation applications

Page 16: Using the Grid for Astronomical Data Roy Williams, Caltech

Overview of Distributed TeraGrid Resources

Diagram: distributed TeraGrid resources. Site resources at NCSA/PACI (10.3 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, each with archival storage (HPSS, UniTree) and external network connections.

Page 17: Using the Grid for Astronomical Data Roy Williams, Caltech

Cluster Supercomputer

Diagram: the user logs in to a login node for job submission and queueing (Condor, PBS, ...); 100s of compute nodes; storage includes a purged /scratch, a backed-up /home, and a parallel file system with a metadata node and parallel I/O; plus a VO service.

Page 18: Using the Grid for Astronomical Data Roy Williams, Caltech

TeraGrid Allocations Policies

• Any US researcher can request an allocation
  – Policies/procedures posted at http://www.paci.org/Allocations.html
  – Online proposal submission: https://pops-submit.paci.org/

• NVO has an account on TeraGrid (just ask RW)

Page 19: Using the Grid for Astronomical Data Roy Williams, Caltech

Data storage

Page 20: Using the Grid for Astronomical Data Roy Williams, Caltech

Logical and Physical names

• Logical name: application context
  – e.g. frame_20050828.012.fits

• Physical name: storage context
  – e.g. /home/roy/data/frame_20050828.012.fits
  – e.g. file:///envoy4/raid3/frames/20050825/012.fits
  – e.g. http://nvo.caltech.edu/vostore/6ab7c828fe73.fits.gz

Page 21: Using the Grid for Astronomical Data Roy Williams, Caltech

Logical and Physical Names

• Allows
  – replication of data
  – movement/optimization of storage
  – transition to a database (lname -> key)
  – heterogeneous/extensible storage hardware: /envoy2/raid2, /pvfs/nvo/, etc.
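As an illustration of the logical-to-physical mapping, here is a minimal Python sketch in the same style as the other snippets in this talk; the REPLICAS table and the resolve() helper are invented for illustration. Each logical name maps to one or more physical replicas, so storage can be reorganized or replicated without touching application code.

# A toy logical-name table: each lname maps to one or more physical replicas.
# The entries below are illustrative only.
REPLICAS = {
    "frame_20050828.012.fits": [
        "file:///envoy4/raid3/frames/20050825/012.fits",
        "http://nvo.caltech.edu/vostore/6ab7c828fe73.fits.gz",
    ],
}

def resolve(lname):
    # return the list of physical names registered for a logical name
    if lname not in REPLICAS:
        raise KeyError("no physical copy registered for " + lname)
    return REPLICAS[lname]

print resolve("frame_20050828.012.fits")[0]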

Page 22: Using the Grid for Astronomical Data Roy Williams, Caltech

Physical Name

• Suggested URI form: protocol://identifier
  – if you know the protocol, you can interpret the identifier

• Examples: file:// ftp:// srb:// uberftp://

• Transition to services: http://server/MadeToOrder?frame=012&a=2&b=3
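A short sketch of the "if you know the protocol, you can interpret the identifier" idea, using Python 2's urlparse module; the handler table is hypothetical and only returns a description of what a real client would do for each protocol.

import urlparse

def handler_for(physical_name):
    # split protocol://identifier and choose an access method by protocol
    scheme = urlparse.urlparse(physical_name)[0]
    handlers = {
        "file":    "open local file",
        "ftp":     "FTP fetch",
        "uberftp": "GridFTP fetch",
        "srb":     "SRB get",
        "http":    "HTTP GET (possibly a made-to-order service)",
    }
    return handlers.get(scheme, "unknown protocol: " + scheme)

print handler_for("http://server/MadeToOrder?frame=012&a=2&b=3")
print handler_for("file:///envoy4/raid3/frames/20050825/012.fits")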

Page 23: Using the Grid for Astronomical Data Roy Williams, Caltech

Typical types of HPC storage needs

Type | Typical size | Use | Requirements (bandwidth / latency tolerance)
1 | 1-10 TB | Home filesystem | A lot of small files, high metadata rates, interactive use.
2 (optional) | 100s of GB (per CPU) | Local scratch space | High-bandwidth data cache.
3 | 10-100 TB | Global filesystem | High aggregate bandwidth; concurrent access to data; moderate latency tolerated.
4 | 100 TB-PB | Archival storage | Large storage pools with low cost; used for long-term storage of results.

Page 24: Using the Grid for Astronomical Data Roy Williams, Caltech

Disk Farms (datawulf)

Large files striped over disks

Management node for file creation, access, ls, etc.

• Homogeneous disk farm (= parallel file system)

parallel file system, metadata node

parallel I/O

Page 25: Using the Grid for Astronomical Data Roy Williams, Caltech

Parallel File System

• Large files are striped: very fast parallel access

• Medium files are distributed: stripes do not all start in the same place

• Small files choke the PFS manager: either containerize or use blobs in a database
  – not a file system anymore: a pool of 10^8 blobs with lnames
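One way to realize the "blobs in a database" option is sketched below, assuming SQLite; the table layout and the put/get helpers are invented for illustration. Small files live as blobs keyed by logical name instead of as individual files on the parallel file system.

import sqlite3

db = sqlite3.connect("blobstore.db")
db.execute("CREATE TABLE IF NOT EXISTS blobs (lname TEXT PRIMARY KEY, data BLOB)")

def put(lname, payload):
    # store (or replace) one small file under its logical name
    db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)",
               (lname, sqlite3.Binary(payload)))
    db.commit()

def get(lname):
    # fetch the bytes back by logical name, or None if absent
    row = db.execute("SELECT data FROM blobs WHERE lname=?", (lname,)).fetchone()
    if row is None:
        return None
    return str(row[0])

put("frame_20050828.012.fits", open("012.fits", "rb").read())   # 012.fits is a stand-in file
print len(get("frame_20050828.012.fits")), "bytes retrieved"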

Page 26: Using the Grid for Astronomical Data Roy Williams, Caltech

Containerizing

• Shared metadata
• Easier for bulk movement

container; file in container

Page 27: Using the Grid for Astronomical Data Roy Williams, Caltech

Extraction from Container

• tar container: slow extraction (reads the whole container)

• zip container: indexed for fast partial extraction
  – 2 Gbyte limit on container size
  – used for the fast-access 2MASS image service at Caltech
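A small sketch of the zip-container idea using Python's zipfile module (the file names are made up): the zip central directory acts as an index, so one member can be read back without scanning the whole container.

import zipfile

# build a container from a few small files
zc = zipfile.ZipFile("frames_20050828.zip", "w", zipfile.ZIP_STORED)
for name in ["012.fits", "013.fits", "014.fits"]:
    zc.write(name)
zc.close()

# later: fast partial extraction of a single member
zc = zipfile.ZipFile("frames_20050828.zip", "r")
data = zc.read("013.fits")     # seeks straight to this member via the index
print "extracted", len(data), "bytes"
zc.close()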

Page 28: Using the Grid for Astronomical Data Roy Williams, Caltech

Storage Resource Broker (SRB)

• Single logical namespace while accessing distributed archival storage resources
• Effectively infinite storage (first to 1 TB wins a t-shirt)
• Data replication
• Parallel transfers
• Interfaces: command line, API, web/portal

Page 29: Using the Grid for Astronomical Data Roy Williams, Caltech

Storage Resource Broker (SRB): Virtual Resources, Replication

Diagram: an SRB client (command line or API) on a workstation sees virtual resources replicated across sites, e.g. hpss-sdsc and sfs-tape-sdsc at SDSC, hpss-caltech at Caltech, and resources at NCSA.

Page 30: Using the Grid for Astronomical Data Roy Williams, Caltech

Running jobs

Page 31: Using the Grid for Astronomical Data Roy Williams, Caltech

3 Ways to Submit a Job

1. Directly to the PBS batch scheduler
   – Simple; scripts are portable among PBS TeraGrid clusters

2. Globus common batch script syntax
   – Scripts are portable among other grids using Globus

3. Condor-G
   – Nice interface atop Globus; monitoring of all jobs submitted via Condor-G
   – Higher-level tools like DAGMan

Page 32: Using the Grid for Astronomical Data Roy Williams, Caltech

PBS Batch Submission

• Single executables run on a single remote machine: log in to a head node, submit to the queue

• Direct, interactive execution: mpirun -np 16 ./a.out

• Through a batch job manager: qsub my_script
  – where my_script describes the executable location, runtime duration, redirection of stdout/err, mpirun specification, …

• Example session: ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
  – qsub flatten.sh -v "FILE=f544"
  – qstat or showq
  – ls *.dat
  – pbs.out, pbs.err files

Page 33: Using the Grid for Astronomical Data Roy Williams, Caltech

Remote submission

• Through Globus: globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script
  – where my_rsl_script describes the same details as in the qsub my_script!

• Through Condor-G: condor_submit my_condor_script
  – where my_condor_script describes the same details as the globus my_rsl_script!

Page 34: Using the Grid for Astronomical Data Roy Williams, Caltech

globus-job-submit

• For running batch/offline jobs
  – globus-job-submit: submit a job (same interface as globus-job-run; returns immediately)
  – globus-job-status: check job status
  – globus-job-cancel: cancel a job
  – globus-job-get-output: get job stdout/err
  – globus-job-clean: clean up after a job

Page 35: Using the Grid for Astronomical Data Roy Williams, Caltech

Condor-G

A Grid-enabled version of Condor that provides robust job management for Globus clients.

– Robust replacement for globusrun
– Provides extensive fault tolerance
– Can provide scheduling across multiple Globus sites
– Brings Condor's job management features to Globus jobs

Page 36: Using the Grid for Astronomical Data Roy Williams, Caltech

Condor DAGMan

• Manages workflow interdependencies
• Each task is a Condor description file
• A DAG file controls the order in which the Condor files are run (see the sketch below)
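For illustration, a two-step workflow written out as a DAGMan input file from Python; the job names and the submit files flatten.sub and coadd.sub are hypothetical. DAGMan starts Coadd only after Flatten has finished.

# write a minimal DAG: two Condor description files with one dependency
dag = """\
JOB  Flatten  flatten.sub
JOB  Coadd    coadd.sub
PARENT Flatten CHILD Coadd
"""
open("pipeline.dag", "w").write(dag)
# then, on the submit host:  condor_submit_dag pipeline.dag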

Page 37: Using the Grid for Astronomical Data Roy Williams, Caltech

Data intensive computing with NVO services

Page 38: Using the Grid for Astronomical Data Roy Williams, Caltech

Two Key Ideas for Fault-Tolerance

• Transactions: no partial completion, either all or nothing
  – e.g. copy to a tmp filename, then mv to the correct file name

• Idempotent: "acting as if done only once, even if used multiple times"
  – the script can be run repeatedly until finished
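A minimal sketch of both ideas together, with invented path names: write to a temporary name and rename into place (all or nothing), and skip any target that already exists, so the script can be rerun until everything is done.

import os

def make_target(src, tgt):
    if os.path.exists(tgt):
        return                         # idempotent: already done, do nothing
    tmp = tgt + ".part"
    out = open(tmp, "wb")
    out.write(open(src, "rb").read())  # stands in for the real processing step
    out.close()
    os.rename(tmp, tgt)                # atomic on the same filesystem: no partial file

make_target("source/f544.fits", "target/f544.fits")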

Page 39: Using the Grid for Astronomical Data Roy Williams, Caltech

DPOSS flattening

2650 x 1.1 Gbyte files

Cropping borders

Quadratic fit and subtract

Virtual data

Source Target

Page 40: Using the Grid for Astronomical Data Roy Williams, Caltech

Driving the Queues

import os

def filetime(path):
    # modification time, used to compare source and target ages
    return os.path.getmtime(path)

for f in os.listdir(inputDirectory):
    # if the target exists, has the right size, and is newer than the source, keep it
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            time_tgt = filetime(ofile)
            time_src = filetime(ifile)
            if time_tgt < time_src:
                print " -- target older than source, remaking"
            else:
                print " -- already have target file"
                continue
    # strip the .fits extension: the PBS script appends it to ${FILE}
    base = os.path.splitext(f)[0]
    cmd = "qsub flat.sh -v \"FILE=" + base + "\""
    print " -- submitting batch job:", cmd
    os.system(cmd)

Here is the driver that makes and submits jobs

Page 41: Using the Grid for Astronomical Data Roy Williams, Caltech

PBS script

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00

cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000

A PBS script. It can be submitted with: qsub script.sh -v "FILE=f345"

Page 42: Using the Grid for Astronomical Data Roy Williams, Caltech

Hyperatlas

Standard naming for atlases and pages, e.g. TM-5-SIN-20, page 1589

Standard scales: scale s means 2^(20-s) arcseconds per pixel

SIN projection

TAN projection

TM-5 layout

HV-4 layout

Standard Projections

Standard Layout
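A quick check of the scale rule: scale s corresponds to 2^(20-s) arcseconds per pixel, so the TM-5-SIN-20 atlas (s = 20) has 1 arcsec/pixel, which is the 2.77777778E-4 degrees per pixel reported by the getChart service on the next slide.

s = 20
arcsec_per_pixel = 2.0 ** (20 - s)
deg_per_pixel = arcsec_per_pixel / 3600.0
print "scale %d: %g arcsec/pixel = %.8E deg/pixel" % (s, arcsec_per_pixel, deg_per_pixel)
# scale 20: 1 arcsec/pixel = 2.77777778E-04 deg/pixel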

Page 43: Using the Grid for Astronomical Data Roy Williams, Caltech

Hyperatlas is a Service

All Pages: <baseURL>/getChart?atlas=TM-5-SIN-20 (and no other arguments)

0     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -90.0
1     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -85.0
2     2.77777778E-4  'RA---SIN'  'DEC--SIN'   36.0  -85.0
...
1731  2.77777778E-4  'RA---SIN'  'DEC--SIN'  288.0   85.0
1732  2.77777778E-4  'RA---SIN'  'DEC--SIN'  324.0   85.0
1733  2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0   90.0

Sky to Page: page=1603&RA=182&Dec=62 --> page, scale, ctype, RA, Dec, x, y
1603  2.777777777777778E-4  'RA---TAN'  'DEC--TAN'  175.3  60.0  -11180.1  7773.7

Best Page: RA=182&Dec=62 --> page, scale, ctype, RA, Dec, x, y
1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0  4422.4  7292.1

Page WCS: page=1604 --> page, scale, ctype, RA, Dec
1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0

Replicated implementations: baseURL = http://nvo.caltech.edu:8080/hyperatlas (try the services)

Page 44: Using the Grid for Astronomical Data Roy Williams, Caltech

Hyperatlas Service

Page to Sky: page=1603&x=200&y=500 --> RA, Dec, nx, ny, nz

184.5 60.1 -0.496 -0.039 0.867

Relevant pages from a sky region: tilesize=4096&ramin=200.0&ramax=202.0&decmin=11.0&decmax=12.0 --> page and tile indices

1015  -1   1
1015  -1   2
1015  -2   1
1015  -2   2
1015   0   1
1015   0   2

Implementation: baseURL = http://nvo.caltech.edu:8080/hyperatlas (try the services)

page 1015, reference point RA=200, Dec=10

Page 45: Using the Grid for Astronomical Data Roy Williams, Caltech

GET services from Python

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas

self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])

This code uses a service to find the best hyperatlas page for a given sky location

Page 46: Using the Grid for Astronomical Data Roy Williams, Caltech

VOTable parser in Python

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping each column's UCD to its column index
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]

From a SIAP URL, we get the XML, and extract the columns that have the image references, image format, and image RA/Dec

Page 47: Using the Grid for Astronomical Data Roy Williams, Caltech

VOTable parser in Python

import xml.dom.minidom

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)

Table is a list of rows, and each row is a list of table cells
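Putting the two snippets together (a sketch, assuming the usual SIAP format string image/fits): the column indices found from the FIELD UCDs pick out the access URL, format, and position in each row of the table.

for row in table:
    url = row[urlColumn]
    fmt = row[formatColumn]
    if fmt == "image/fits":
        print "FITS image at RA,Dec", row[raColumn], row[deColumn], ":", url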

Page 48: Using the Grid for Astronomical Data Roy Williams, Caltech

Science Gateways

Page 49: Using the Grid for Astronomical Data Roy Williams, Caltech

Grid Impediments

Learn Globus
Learn MPI
Learn PBS
Port code to Itanium
Get certificate
Get logged in
Wait 3 months for account
Write proposal

and now do some science....

Page 50: Using the Grid for Astronomical Data Roy Williams, Caltech

A better way: Graduated Security for Science Gateways

Web form (anonymous) --> some science ...

Register (logging and reporting) --> more science ...

Authenticate with X.509 (browser or command line) --> big-iron computing ...

Write proposal (own account) --> power user

Page 51: Using the Grid for Astronomical Data Roy Williams, Caltech

2MASS Mosaicking Portal: an NVO-TeraGrid project (Caltech IPAC)

Page 52: Using the Grid for Astronomical Data Roy Williams, Caltech

Three Types of Science Gateways

• Web-based portals
  – User interacts with a community-deployed web interface
  – Runs community-deployed codes
  – Service requests are forwarded to grid resources

• Scripted service call
  – User writes code to submit and monitor jobs

• Grid-enabled applications
  – Application programs on users' machines (e.g. IRAF)
  – Also run programs on grid resources

Page 53: Using the Grid for Astronomical Data Roy Williams, Caltech

Secure Web services for Teragrid Access