The HENP Grand Challenge Project and initial use in the RHIC Mock Data Challenge 1 D. Olson DM Workshop SLAC, 20-22 Oct 1998

Page 1: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

The HENP Grand Challenge Project and initial use in the RHIC Mock Data Challenge 1

D. Olson

DM Workshop

SLAC, 20-22 Oct 1998

Page 2: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

21 Oct. 1998 HENP-GC, D. Olson, SLAC DM Workshop

Outline

• Overview
• The problem being addressed
• Experiences from Mock Data Challenge

Page 3: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

The HENP GCA

• 3 year project: FY97, FY98, FY99
• Funding from DOE/MICS, collaboration with DOE/NP, HEP
• Focus on RHIC data access

Page 4: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Who - the workers

• Henrik Nordberg, NERSC/LBNL
• Luis Bernardo, NERSC/LBNL
• Alex Sim, NERSC/LBNL
• Dave Malon, ATLAS/ANL
• Dave Stampf, RCF/BNL
• Jeff Porter, STAR/LBNL
• Dave Zimmerman, STAR/LBNL
• Jie Yang, STAR/LBNL-UCLA-Beijing
• Mark Pollack, PHENIX/BNL

Page 5: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Who - the others

• Doug Olson - STAR/LBNL
• Arie Shoshani, Doron Rotem - NERSC/LBNL (Data Mgmt Grp)
• Craig Tull - NERSC/LBNL (HENP)
• Bruce Gibbard, Shigeki Misawa - RCF/BNL
• Torre Wenaus - STAR/BNL
• Ed May - ATLAS/ANL

Page 6: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Relativistic Heavy Ion Collider

• Brookhaven National Laboratory on Long Island
• An accelerator for high-energy nuclear physics
• Begins operating in June 1999 (10+ year life)

Page 7: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

2 “Large”, 2 “Small” Experiments (www.rhic.bnl.gov)

• “Large” experiments: using Objectivity/DB (www.objectivity.com)
• “Small” experiments: using ROOT (root.cern.ch)

Page 8: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Characteristics

                                  BRAHMS   PHENIX   PHOBOS     STAR
# Scientists (approx.)                50      400       70      350
# Institutions                        13       45       12       36
M events/year                      3,600      965    2,880       17
Size/raw event (KB)                   10      300       18   12,000
Total Data/Year (TB)                  62      496      204      264
Req'd CPU Capacity (SPECint95)       960   17,518    6,196    8,818
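As a rough cross-check of the table, the raw data volume per year can be estimated as events per year times raw event size. The sketch below assumes 1 KB = 10^3 bytes and 1 TB = 10^12 bytes, which the slide does not specify. For every experiment the raw estimate comes out below the quoted "Total Data/Year", which would be consistent with the totals counting more than raw events, though the slide does not say what else is included.

```python
# Event counts (millions/year) and raw event sizes (KB) copied from the table.
# Assumption (not stated on the slide): 1 KB = 10**3 bytes, 1 TB = 10**12 bytes.
EXPERIMENTS = {
    "BRAHMS": (3600, 10),
    "PHENIX": (965, 300),
    "PHOBOS": (2880, 18),
    "STAR": (17, 12000),
}


def raw_tb_per_year(million_events, kb_per_event):
    """Raw volume in TB/year = (events/year) * (bytes/event) / 10**12."""
    return million_events * 1e6 * kb_per_event * 1e3 / 1e12


RAW_TB = {name: raw_tb_per_year(*vals) for name, vals in EXPERIMENTS.items()}

for name, tb in RAW_TB.items():
    print(f"{name}: about {tb:.1f} TB/year of raw events")
```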

Page 9: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Event (data) structure for STAR

http://www.rhic.bnl.gov/STAR/html/comp_l/dataproc/EventStructure.pdf

Event components (bulky data)

Tags (index)

Different event components stored in different files.

Page 10: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Data Characteristics (STAR example)

[Chart: size per event (MB), volume (TB/yr), and # users, on a logarithmic axis spanning 0.01 to 1000.]

http://www.rhic.bnl.gov/STAR/html/comp_l/ofl/reqmts9708/report/CompReqReport.ps

Every user is also a software developer.

Page 11: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Likelihood of implementation w/ Objectivity/DB

[Figure legend: Certainly, Probably, Possibly, Doubtful]

Page 12: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Transport through the storage hierarchy

Processor

Cache memory mgr

“I/O” software (Objectivity, Zebra, …)

HPSS, pftp (optimization with HENP GC)

“Hope for” HPSS

Page 13: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

The Goal

• Optimize access to tape-resident files
• Based upon selections of objects of interest to the application (components of physics events)
• Utilizing disk-resident index

Page 14: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

RHIC Analysis Architecture

[Architecture diagram; the surviving labels read “MDC2”.]

Page 15: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

HENP-GC software features

• Index event component objects
• Query attributes of events (tags)
• Order-optimized iteration over events
• Coordinate file caching across multiple simultaneous queries
• Policies to control resource usage
• Parallel query execution (analysis)

Page 16: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Opportunities for optimization

• Prevent / eliminate unwanted queries
  => query estimation (fast index)
• Read all events (qualified for a query) from a file at the same time, without reading all events in the file
  => exact index over all properties
• Share files brought into cache by multiple queries
  => look ahead for files needed and cache management
• Match data storage to access patterns
  => clustering on tape
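The second point, reading all qualifying events from a file in one visit, can be illustrated with a small sketch. It is not the project's iterator; the function name, the (event_id, file_id) pair layout, and the file names are hypothetical, and the staging order is simply passed in as a list.

```python
from collections import defaultdict


def order_optimized(qualified, staged_order):
    """Yield (file_id, [event_ids]) one file at a time.

    qualified:    (event_id, file_id) pairs selected by the index.
    staged_order: file ids in the order they become available on disk.
    """
    by_file = defaultdict(list)
    for event_id, file_id in qualified:
        by_file[file_id].append(event_id)
    for file_id in staged_order:
        if file_id in by_file:
            # Every qualifying event of this file is handed out together,
            # so the file never has to be brought back for this query.
            yield file_id, sorted(by_file[file_id])


# Hypothetical selection: five qualifying events spread over three files.
qualified = [(7, "f2"), (1, "f1"), (9, "f2"), (4, "f3"), (2, "f1")]
batches = list(order_optimized(qualified, ["f2", "f1", "f3"]))
print(batches)
```

Grouping by file means the iteration order follows file availability rather than event number, which is acceptable whenever the analysis does not depend on event order.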

Page 17: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Data access s/w (simple view)

• key developers
  – Henrik Nordberg (NERSC): query estimator
  – Alex Sim (NERSC): query monitor
  – Luis Bernardo (NERSC): cache manager
  – Jeff Porter (LBL-STAR): query object
  – Dave Malon (ANL): order-optimized iterator & gcaResources API
  – Dave Zimmerman (LBL-STAR), Mark Pollack (BNL-PHENIX): tagDB
  – Jie Yang (UCLA, LBL, Beijing): testing

QueryInterface

StorageManager

Event Data (Objectivity)

Sample User Code

GC system components

Expt. code & data

Page 18: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Process Flow

Components: Query Estimator, Event Iterators, Query Monitor, Policy Module, Cache Manager

Steps:
1 execute
2 request
3 whichFileToCache
4 FileIDToCache
5 stage
6 staged
7 retrieve
8 release
9 purge
10 purged
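The stage, retrieve, release, and purge steps suggest a cache that tracks which queries still hold each file. The sketch below is one way such bookkeeping could work, under the assumption that a file may be purged only once no query holds it. The class and method names are hypothetical and are not taken from the GC software.

```python
class CacheManager:
    """Toy disk cache that tracks which queries hold each staged file."""

    def __init__(self):
        self.holders = {}  # file_id -> set of query ids currently using it

    def stage(self, file_id):
        # Steps 5-6: bring the file to disk and report it as staged.
        self.holders.setdefault(file_id, set())
        return "staged"

    def retrieve(self, file_id, query_id):
        # Step 7: a query starts reading events from the staged file.
        self.holders[file_id].add(query_id)

    def release(self, file_id, query_id):
        # Step 8: the query is done with the file.
        self.holders[file_id].discard(query_id)

    def purge(self, file_id):
        # Steps 9-10: remove the file only if no query still holds it.
        if self.holders[file_id]:
            return False
        del self.holders[file_id]
        return True


cache = CacheManager()
cache.stage("f1")
cache.retrieve("f1", "q1")
cache.retrieve("f1", "q2")
cache.release("f1", "q1")
blocked = cache.purge("f1")  # q2 still holds the file
cache.release("f1", "q2")
purged = cache.purge("f1")   # nobody holds it any more
print(blocked, purged)
```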

Page 19: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

MDC1 setup

pftp pftp

STAR Objectivity database files in STAR COS, 2 tapes, 32 GB, 240 files

STAR Objectivity database files in PHENIX COS, 2 tapes

Objy db files on local disk

Storage manager and analysis codes on rmds03

Page 20: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Legend

Axes: File ID, Time

• Start of query
• Start of pftp request
• File staged to HPSS disk
• File staged to local disk
• Event IDs retrieved by iterator
• File released by queries
• File purged from disk
• End of query
• Symbol color identifies query

Page 21: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

3 queries

3 queries with some shared files, with a time delay between each query; then the same 3 queries are repeated simultaneously. The cache was large enough to hold all files, so the second time all queries run at processing speed rather than I/O speed.

Plot labels: q1, q2, q3, q1,2,3
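The effect described on this slide can be reproduced with a toy model that only counts tape-to-disk transfers. It assumes, as on the slide, a cache large enough to hold every file; the query-to-file assignments are invented for illustration.

```python
def run_queries(queries, cache):
    """Return how many files had to be staged from tape.

    queries: dict of query name -> list of file ids the query needs.
    cache:   set of file ids already on disk; updated in place.
    """
    staged = 0
    for files in queries.values():
        for file_id in files:
            if file_id not in cache:
                cache.add(file_id)  # tape -> disk transfer, the slow part
                staged += 1
    return staged


# Hypothetical queries sharing some files.
queries = {
    "q1": ["a", "b", "c"],
    "q2": ["b", "c", "d"],
    "q3": ["c", "d", "e"],
}
cache = set()
first_pass = run_queries(queries, cache)   # every distinct file staged once
second_pass = run_queries(queries, cache)  # everything already on disk
print(first_pass, second_pass)
```

In this model the first pass stages each distinct file exactly once even though files are shared, and the repeat stages nothing, which is why the repeated queries are limited by processing rather than I/O.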

Page 22: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Shared access policy

• No caching policy
• With caching policy - shared files, ordering by # events
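The slide names the policy only briefly. One plausible reading, offered purely as an assumption, is that files wanted by more queries are staged first, with ties broken by the number of qualifying events in the file. The function and data below are hypothetical.

```python
def staging_order(requests):
    """Order files for staging: most-shared first, then most events.

    requests: dict of query name -> {file_id: number of qualifying events}.
    """
    sharers = {}  # file_id -> number of queries that want the file
    events = {}   # file_id -> total qualifying events across queries
    for files in requests.values():
        for file_id, n_events in files.items():
            sharers[file_id] = sharers.get(file_id, 0) + 1
            events[file_id] = events.get(file_id, 0) + n_events
    # Negative keys give descending order; the file id makes ties stable.
    return sorted(sharers, key=lambda f: (-sharers[f], -events[f], f))


# Hypothetical requests from three concurrent queries.
requests = {
    "q1": {"a": 10, "b": 40},
    "q2": {"b": 5, "c": 30},
    "q3": {"c": 20, "d": 90},
}
order = staging_order(requests)
print(order)
```

Here "b" and "c" are each wanted by two queries, so they are staged before "d" even though "d" holds the most events.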

Page 23: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Detail

• Green means pftp failed
• HPSS recovered & pftp succeeds again

Page 24: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Opportunities for optimization

• Prevent / eliminate unwanted queries
  => query estimation (fast index)
  – Query Estimator
• Read all events (qualified for a query) from a file at the same time, without reading all events in the file
  => exact index over all properties
  – Order Optimized Iterator
• Share files brought into cache by multiple queries
  => look ahead for files needed and cache management
  – Query Monitor
• Match data storage to access patterns
  => clustering on tape
  – Clustering Analyzer and Dynamic Reorganizer

Implementation

Page 25: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Things not discussed

• Indices
• Cluster analysis
• Reorganization
• Parallel query execution
• Cray T3E production of simulated data

Page 26: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

References

• http://www-rnc.lbl.gov/GC/
• http://gizmo.lbl.gov/sm/
• http://www.rhic.bnl.gov/RCF/
• http://www.rhic.bnl.gov/STAR/

Page 27: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

The End

Page 28: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Where

• Massive simulation data generation
  – NERSC Cray T3E (www.nersc.gov)
  – Pittsburgh SC Cray T3E (www.psc.edu)
• Software development & testing
  – NERSC HPSS, PDSF (recently upgraded from SSC vintage)
• Installation & operations
  – RHIC Computing Facility
  – STAR regional facility at NERSC/PDSF

Page 29: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

When

• Started March ’97
• Architecture November ’97
• RHIC Objectivity decision November ’97
• Prototype components May ’98
• RHIC MDC1 September ’98
• RHIC MDC2 early ’99
• RHIC operations start November ’99

Page 30: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

Features (MDC1)

• Extract tag parameters for index
  – base attributes & computed values
• Query estimation
  – # events, # files (disk, tape), # seconds
• Query execution
  – order optimization (sort OIDs by file)
  – return OIDs as files are staged
• Disk cache management
  – pre-fetch files
  – coordinate multiple queries
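The query-estimation outputs listed above (# events, # files on disk and on tape, # seconds) can be sketched as a single pass over a tag index. This is an illustrative sketch rather than the Query Estimator itself: the index layout, the tag name `n_tracks`, and the fixed 60 seconds per tape-resident file are all assumptions.

```python
def estimate(index, predicate, on_disk, tape_seconds_per_file=60.0):
    """Estimate the cost of a query without touching any event data.

    index:     list of (tags, file_id) entries, one per event.
    predicate: function taking a tags dict and returning True to select.
    on_disk:   set of file ids already in the disk cache.
    """
    files = set()
    n_events = 0
    for tags, file_id in index:
        if predicate(tags):
            n_events += 1
            files.add(file_id)
    tape_files = files - on_disk
    return {
        "events": n_events,
        "files_disk": len(files & on_disk),
        "files_tape": len(tape_files),
        # Assumed cost model: a flat staging time per tape-resident file.
        "seconds": len(tape_files) * tape_seconds_per_file,
    }


# Hypothetical index of four events in three files.
index = [
    ({"n_tracks": 1200}, "f1"),
    ({"n_tracks": 300}, "f1"),
    ({"n_tracks": 2500}, "f2"),
    ({"n_tracks": 1800}, "f3"),
]
result = estimate(index, lambda t: t["n_tracks"] > 1000, on_disk={"f1"})
print(result)
```

A user who sees a large tape file count or time estimate can refine the selection before executing, which is the "prevent / eliminate unwanted queries" idea from the optimization slide.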

Page 31: The HENP Grand Challenge Project and initial use in the  RHIC Mock Data Challenge 1

FY99

• Multi-component event implementation (MDC2)
• Performance measurements
• Monitoring
• Tuning with policy module parameters
• GUIs for
  – user query builder
  – administration