
Post on 28-Mar-2015


Page 1: Grid Data Management

Tony Doyle - University of Glasgow

UK

Grid Data Management

Introduction: Physics Analysis, Data Hierarchy, GRID Services, Virtual Data Scenario

GRID Data Management: Service Graph

Development Tools: Unified Modelling Language, Compiler Efficiency, Database Access Benchmark, System Monitoring Prototype

From Files To Objects (diagram labels):
bPEvent
bPEventObjVector
bPeventObj
bPsiDetector
bPSiDigit
bPMDT_Detector
bPMDT_Digit
bPcaloRegion
bPcaloDigit
bPtruthVertex
bPtruthTrack

Page 2: Physics Analysis

Diagram elements:
Raw Data
Calibration Data
ESD: Data or Monte Carlo
Event Tags
Event Selection
Analysis, Skims
Analysis Object Data (AOD)
Physics Analysis
Physics Objects

Tiers:
Tier 0, 1: Collaboration wide
Tier 2: Analysis Groups
Tier 3, 4: Physicists

Arrow label: INCREASING DATA FLOW

Page 3: Data Hierarchy

“RAW, ESD, AOD, TAG”

RAW: Recorded by DAQ. Triggered events. Detector digitisation. ~2 MB/event

ESD: Pseudo-physical information: clusters, track candidates (electrons, muons), etc. Reconstructed information. ~100 kB/event

AOD: Physical information: transverse momentum, association of particles, jets, (best) id of particles. Physical info for relevant “objects”. Selected information. ~10 kB/event

TAG: Analysis information. Relevant information for fast event selection. ~1 kB/event
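To make the per-event sizes above concrete, the following minimal sketch scales them to a whole dataset. The per-event sizes come from the slide; the sample of 10^9 events, the decimal byte units and the function names are assumptions made for illustration only.

```python
# Per-event sizes quoted on the slide, in bytes (decimal units assumed).
EVENT_SIZE_BYTES = {
    "RAW": 2_000_000,  # ~2 MB/event
    "ESD": 100_000,    # ~100 kB/event
    "AOD": 10_000,     # ~10 kB/event
    "TAG": 1_000,      # ~1 kB/event
}


def dataset_size_bytes(n_events):
    """Total size of each data tier for n_events events."""
    return {tier: size * n_events for tier, size in EVENT_SIZE_BYTES.items()}


def reduction_factors():
    """How many times smaller each tier is than RAW."""
    raw = EVENT_SIZE_BYTES["RAW"]
    return {tier: raw / size for tier, size in EVENT_SIZE_BYTES.items()}


if __name__ == "__main__":
    n_events = 10**9  # hypothetical sample size, not a figure from the talk
    for tier, size in dataset_size_bytes(n_events).items():
        print(f"{tier}: {size / 1e12:.0f} TB")
```

With these assumptions the TAG tier is 2000 times smaller than RAW, which is why it is the natural input for fast event selection.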

Page 4: GRID Services

Grid Services: Resource Discovery, Scheduling, Security, Monitoring, Data Access, Policy

Athena/Gaudi Services: Application manager, “Job Options” service, Event persistency service, Detector persistency, Histogram service, User interfaces, Visualization

Database: Event model, Object federations

Extensible interfaces and protocols being specified and developed:

Tools: 1. UML 2. Java

Protocols: 1. XML 2. MySQL 3. LDAP

(grouped by a brace labelled "DataGRID Toolkit")

Page 5: Virtual Data Scenario

Example analysis scenario:

Physicist issues a query from Athena for a Monte Carlo dataset
Issues: How expressive is this query? What is the nature of the query: declarative? Creating new queries and language
Algorithms are already available in local shared libraries
An Athena service consults an ATLAS Virtual Data Catalog

Consider possibilities:

TAG file exists on local machine (e.g. Glasgow): analyze it
ESD file exists in a remote store (e.g. Edinburgh): access relevant event files, then analyze that
RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyze!

GRID Data Management
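The three possibilities above amount to a fallback chain, which the sketch below spells out. It is illustrative only: the catalogue is modelled as a plain dictionary from data tier to the sites holding a copy, and `plan_access` is a hypothetical name, not part of Athena or of any ATLAS Virtual Data Catalog interface.

```python
def plan_access(catalog, local_site):
    """Return the ordered steps needed to satisfy an analysis request.

    catalog maps a tier name ("TAG", "ESD", "RAW") to the list of sites
    that hold a copy; a missing or empty entry means no copy exists.
    """
    # Case 1: a TAG file is already on the local machine.
    if local_site in catalog.get("TAG", []):
        return ["analyze local TAG file"]

    # Case 2: an ESD file exists somewhere, possibly in a remote store.
    esd_sites = catalog.get("ESD", [])
    if esd_sites:
        return [
            f"access relevant ESD event files at {esd_sites[0]}",
            "analyze",
        ]

    # Case 3: nothing usable survives, so the data must be recreated.
    return ["regenerate", "re-reconstruct", "re-analyze"]


if __name__ == "__main__":
    print(plan_access({"TAG": ["Glasgow"]}, "Glasgow"))
    print(plan_access({"ESD": ["Edinburgh"]}, "Glasgow"))
    print(plan_access({}, "Glasgow"))
```

The sketch always takes the first ESD site it finds; choosing between several replicas is a query optimisation question.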

Page 6: GRID Data Management

Goal: develop middleware infrastructure to manage petabyte-scale data

Components (diagram):
Replica Manager
Data Mover
Data Accessor
Storage Manager
Castor, HPSS
Data Locator
Meta Data Manager
Local Filesystem
Query Optimisation & Access Pattern Management
Secure Region

Service levels (reasonably well defined):
High Level Services
Medium Level Services
Core Services

Identify Key Areas Within Software Structure

Page 7: Identifying Key Areas

5 areas for development:

Data Accessor - hides specific storage system requirements. Mass Storage Management group.
Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
Security - ensuring consistent levels of security for data and meta data.
Query optimisation - “cost” minimisation based on response time and throughput. Monitoring Services group.

Identifiable UK Contributions: RAL
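As an illustration of "cost" minimisation based on response time and throughput, the sketch below picks the replica with the lowest estimated access time. The cost formula (start-up latency plus size divided by throughput), the `Replica` type and all numbers are assumptions for the example, not the WP2 cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str               # hypothetical site label
    latency_s: float        # response time before the first byte arrives
    throughput_mb_s: float  # sustained transfer rate


def transfer_cost_s(replica, size_mb):
    """Estimated time to obtain size_mb megabytes from this replica."""
    return replica.latency_s + size_mb / replica.throughput_mb_s


def cheapest_replica(replicas, size_mb):
    """Replica with the lowest estimated cost for the requested size."""
    return min(replicas, key=lambda r: transfer_cost_s(r, size_mb))


if __name__ == "__main__":
    replicas = [
        Replica("local-tape", latency_s=120.0, throughput_mb_s=20.0),
        Replica("remote-disk", latency_s=1.0, throughput_mb_s=2.0),
    ]
    for size_mb in (10, 1000):
        best = cheapest_replica(replicas, size_mb)
        print(size_mb, "MB ->", best.site)
```

Even this simple model shows why both quantities matter: the low-latency replica wins for small requests, while the high-throughput one wins for large transfers.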

Page 8: Identifying Key Areas

4 tasks defined in current UK WP2:

Service Discovery - locate grid services (Wolfgang Hoschek, Gavin McCance +...)
SQL Database Service - store, query and retrieve metadata (Wolfgang Hoschek, Gavin McCance +...)
Query Optimisation - “cost” model (Kurt Stockinger +…)
Data Mining - semi-automatic discovery of event patterns, associations and anomalies: Grid metadata and HEP applications

UK + CERN = UK++

Page 9: Service Graph

Nodes (diagram):
sds.cern.ch
sds.anl.gov
sds.infn.it
sds.ral.uk
sds.padova-infn.it
sds.trieste-infn.it
sds.bologna-infn.it

Hierarchical Model

All nodes “Grid Aware”

Optimisation? - combine all info on nodes from e.g. ScotGRID locally and advertise via Globus

Allowed?

Page 10: Unified Modelling Language

•Standard method to define the architecture = UML

•Standard tool = TogetherSoft?

Free for academic use. Runs under linux.

“I tried to generate an import/export module for MySQL under linux by copying the db2 .config file and replacing the various column types by the ones that are available in MySQL. This works apart from the fact that the primary key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary key generation is fine for MySQL. I have seen that Access uses a specialized DB import/export class. How can I generate one for MySQL?”

DB Driver for MySQL under linux?

Determine correct tools by testing..

Page 11: Compiler Efficiency

Numerically intensive simulations: Minimal input and output data

ATLAS Monte Carlo (gg → H → bb): 228 sec per 3.5 Mb event on 800 MHz linux box

Compiler Tests: LINPACK

Compiler         Speed (MFlops)
Fortran (g77)    27
C (gcc)          43
Java (jdk)       41

Industry Standard Compilers + OO Methods
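The figures above translate directly into throughput estimates, sketched below. The 228 seconds per event and the LINPACK numbers are from the slide; the one million events, the 30-day window and the function names are hypothetical and serve only as an example.

```python
import math

SECONDS_PER_EVENT = 228.0  # ATLAS Monte Carlo figure quoted on the slide
SECONDS_PER_DAY = 86_400

# LINPACK results quoted on the slide, in MFlops.
LINPACK_MFLOPS = {"Fortran (g77)": 27, "C (gcc)": 43, "Java (jdk)": 41}


def events_per_cpu_day(seconds_per_event=SECONDS_PER_EVENT):
    """Events one CPU can simulate in 24 hours."""
    return SECONDS_PER_DAY / seconds_per_event


def cpus_needed(n_events, days, seconds_per_event=SECONDS_PER_EVENT):
    """Whole number of CPUs required to simulate n_events within days."""
    return math.ceil(n_events * seconds_per_event / (days * SECONDS_PER_DAY))


def relative_speed(baseline="Fortran (g77)"):
    """LINPACK speed of each compiler relative to the baseline."""
    base = LINPACK_MFLOPS[baseline]
    return {name: mflops / base for name, mflops in LINPACK_MFLOPS.items()}


if __name__ == "__main__":
    print(f"{events_per_cpu_day():.0f} events per CPU-day")
    print(cpus_needed(1_000_000, 30), "CPUs for 1M events in 30 days")
    print(relative_speed())
```

On these numbers a single 800 MHz box simulates fewer than 400 events per day, so large Monte Carlo samples need many CPUs.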

Page 12: System Monitoring Prototype

Tools:
1. Linux Kernel Info = /proc/stat
2. Enquire = Java client-server
3. Histograms = Java Analysis Studio
4. TCP/IP = Local WAN

Instantaneous CPU Usage

Scalable Architecture

Individual Node Info.

http://ppewww.ph.gla.ac.uk/~skilli/grid1.html

Page 13: Industrial Partnership

Diagram: ping service and ping monitor, over WAN and LAN

Adoption of OPEN Industry Standards + OO Methods

Industry, Research Council

Monitoring Tools Exist. Standard?

Page 14: System Monitoring Prototype

Input from /proc/stat: instantaneous CPU, disk, memory

Individual Node Info. is input to single Grid node

Sample /proc/stat output (cpu columns: user, nice, system, idle):

cpu 469607 1593 823764 6044637
disk 51306 0 0 0
disk_rio 11002 0 0 0
disk_wio 40304 0 0 0
disk_rblk 87872 0 0 0
disk_wblk 322378 0 0 0
page 29693 49417
swap 33 1447
intr 18916942 7339601 27941 0 2 2 0 3 0 1 0 9331361 0 869060 1 619454 729516 0 0
ctxt 62664003
btime 984922120
processes 107015

Combined Info into e.g. distributed MySQL database

“Why start here?” Need well-understood simple system to start tests and calibrate commercially available solutions.
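The CPU counters in /proc/stat are cumulative since boot, so an "instantaneous" usage figure has to come from the difference between two samples. The sketch below shows one way to do that in Python (the prototype itself is described as a Java client-server). The function names are hypothetical, and only the first four cpu fields are read, as in the sample above; newer kernels append further fields, which this sketch ignores.

```python
# The aggregate cpu line from the slide's sample, used here as test input.
SAMPLE = "cpu 469607 1593 823764 6044637\ndisk 51306 0 0 0\n"

CPU_FIELDS = ("user", "nice", "system", "idle")


def parse_cpu_line(stat_text):
    """Extract the first four counters of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return dict(zip(CPU_FIELDS, (int(x) for x in fields[1:5])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_fraction(before, after):
    """Fraction of time the CPU was not idle between two samples."""
    deltas = {k: after[k] - before[k] for k in CPU_FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return 0.0  # no ticks elapsed between the samples
    return 1.0 - deltas["idle"] / total


def read_cpu(path="/proc/stat"):
    """Read the live counters (Linux only)."""
    with open(path) as handle:
        return parse_cpu_line(handle.read())


if __name__ == "__main__":
    zero = dict.fromkeys(CPU_FIELDS, 0)
    since_boot = cpu_busy_fraction(zero, parse_cpu_line(SAMPLE))
    print(f"busy since boot in the sample: {since_boot:.1%}")
```

For a live measurement, call `read_cpu()` twice a second or so apart and pass both results to `cpu_busy_fraction`.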

Page 15: Database Access Benchmark

Many applications require database functionality

e.g. MySQL database daemon

Basic 'crash-me' and associated tests

Access times for basic insert, modify, delete, update database operations, e.g. (on 256 Mbyte, 800 MHz Red Hat 6.2 linux box):

350k data insert operations: 149 seconds
10k query operations: 97 seconds

Currently favoured HEP DataBase application e.g. BaBar, ZEUS software
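The timings above correspond to roughly 2350 inserts per second and about 100 queries per second. The sketch below shows how such a measurement might be set up. It uses Python's built-in sqlite3 module as a stand-in, not MySQL, and the table layout, function names and single-transaction bulk insert are assumptions; the original test procedure is not described in the slides, so timings from this sketch are not comparable with the figures quoted.

```python
import sqlite3
import time


def ops_per_second(n_ops, seconds):
    """Average operation rate."""
    return n_ops / seconds


def time_inserts(conn, n_rows):
    """Insert n_rows rows in one transaction; return elapsed seconds."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bench "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )
    start = time.perf_counter()
    with conn:  # commits on success
        conn.executemany(
            "INSERT INTO bench (payload) VALUES (?)",
            ((f"row-{i}",) for i in range(n_rows)),
        )
    return time.perf_counter() - start


def time_queries(conn, n_queries):
    """Run n_queries primary-key lookups; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute(
            "SELECT payload FROM bench WHERE id = ?", (i + 1,)
        ).fetchone()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Rates implied by the figures quoted on the slide.
    print(f"{ops_per_second(350_000, 149):.0f} inserts/s")
    print(f"{ops_per_second(10_000, 97):.0f} queries/s")

    conn = sqlite3.connect(":memory:")
    print(f"inserts: {time_inserts(conn, 10_000):.3f} s")
    print(f"queries: {time_queries(conn, 1_000):.3f} s")
```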

Page 16: WP2 - Open Issues

Many… Early Days

Working Standards?

Scope Of UK Contribution: Service Discovery, SQL Database Service, Query Optimisation, Data Mining

Development Tools?
UML: TogetherSoft
Database: MySQL
GDMP
System Monitoring
Standard Grid-Enabled Files Objects..

Input/Contributions welcome….

From Files To Objects (diagram labels):
bGridEvent
bGridEventObj Vector
bGridEventObj
bGridNetwork
bNetworkDigit
bGridCPU
bCPUDigit
bGridDisk
bDiskDigit
bGridMemory
bMemoryDigit

Teamwork