Tony Doyle - University of Glasgow
UK
Grid Data Management
Introduction: Physics Analysis, Data Hierarchy, GRID Services, Virtual Data Scenario
GRID Data Management: Service Graph
Development Tools: Unified Modelling Language, Compiler Efficiency, Database Access Benchmark, System Monitoring Prototype
[Figure: "From Files To Objects" class diagram. Classes: bPEvent, bPEventObj Vector, bPeventObj, bPsiDetector, bPSiDigit, bPMDT_Detector, bPMDT_Digit, bPcalo Region, bPcalo Digit, bPtruth Vertex, bPtruth Track]
Physics Analysis
[Figure: analysis chain diagram with an arrow labelled "INCREASING DATA FLOW". Boxes: Raw Data; Calibration Data; ESD: Data or Monte Carlo; Event Tags; Event Selection; Analysis Object Data (AOD); Analysis, Skims; Physics Objects; Physics Analysis]
Tier 0,1: Collaboration wide
Tier 2: Analysis Groups
Tier 3, 4: Physicists
Data Hierarchy
“RAW, ESD, AOD, TAG”
RAW: Recorded by DAQ. Triggered events. Detector digitisation. ~2 MB/event
ESD: Pseudo-physical information: clusters, track candidates (electrons, muons), etc. Reconstructed information. ~100 kB/event
AOD: Physical information: transverse momentum, association of particles, jets, (best) id of particles. Physical info for relevant “objects”. Selected information. ~10 kB/event
TAG: Analysis information. Relevant information for fast event selection. ~1 kB/event
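The per-event sizes above differ by a factor of 2000 between RAW and TAG, which is what makes TAG-based selection attractive. A minimal sketch of the arithmetic follows; the sample of 10^9 events is a hypothetical figure chosen for illustration, not a number from the talk, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
# Per-event sizes quoted on the slide, in bytes (decimal units assumed).
EVENT_SIZE_BYTES = {
    "RAW": 2_000_000,  # ~2 MB/event
    "ESD": 100_000,    # ~100 kB/event
    "AOD": 10_000,     # ~10 kB/event
    "TAG": 1_000,      # ~1 kB/event
}


def total_terabytes(n_events: int) -> dict[str, float]:
    """Return the total data volume in TB at each level for n_events events."""
    return {
        level: size * n_events / 1e12
        for level, size in EVENT_SIZE_BYTES.items()
    }


if __name__ == "__main__":
    n_events = 10**9  # hypothetical sample size, for illustration only
    for level, terabytes in total_terabytes(n_events).items():
        print(f"{level}: {terabytes:,.0f} TB")
```

With this assumed sample, RAW occupies 2000 TB while TAG occupies 1 TB, so a fast selection pass can run on a single site.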
GRID Services
Grid Services: Resource Discovery, Scheduling, Security, Monitoring, Data Access, Policy
Athena/Gaudi Services: Application manager, “Job Options” service, Event persistency service, Detector persistency, Histogram service, User interfaces, Visualization
Database: Event model, Object federations
Extensible interfaces and protocols being specified and developed:
Tools: 1. UML 2. Java
Protocols: 1. XML 2. MySQL 3. LDAP
DataGRID Toolkit
Virtual Data Scenario
Example analysis scenario:
Physicist issues a query from Athena for a Monte Carlo dataset
Issues: How expressive is this query? What is the nature of the query: declarative? Creating new queries and language
Algorithms are already available in local shared libraries
An Athena service consults an ATLAS Virtual Data Catalog
Consider possibilities:
TAG file exists on local machine (e.g. Glasgow): analyze it
ESD file exists in a remote store (e.g. Edinburgh): access relevant event files, then analyze them
RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyze!
GRID Data Management
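The three possibilities above amount to a decision procedure driven by the catalogue lookup. The following is a minimal sketch under stated assumptions: `CatalogEntry` and `plan_analysis` are hypothetical names introduced here, not part of Athena or of any ATLAS Virtual Data Catalog interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalogue record: the site holding each data level, if any."""
    tag_site: Optional[str] = None
    esd_site: Optional[str] = None


def plan_analysis(entry: CatalogEntry, local_site: str) -> list[str]:
    """Return the ordered steps needed to satisfy a query for this dataset."""
    if entry.tag_site == local_site:
        # Cheapest case: the TAG file is already on the local machine.
        return ["analyze local TAG"]
    if entry.esd_site is not None:
        # ESD exists in a remote store: fetch only the relevant events.
        return [f"access relevant ESD events at {entry.esd_site}", "analyze"]
    # Nothing is materialised any more: derive the data again from scratch.
    return ["regenerate", "re-reconstruct", "re-analyze"]


if __name__ == "__main__":
    print(plan_analysis(CatalogEntry(tag_site="Glasgow"), "Glasgow"))
    print(plan_analysis(CatalogEntry(esd_site="Edinburgh"), "Glasgow"))
    print(plan_analysis(CatalogEntry(), "Glasgow"))
```

The ordering of the checks encodes the preference for the smallest, closest representation of the data.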
GRID Data Management
Goal: develop middleware infrastructure to manage petabyte-scale data
[Figure: service architecture diagram. Components: Replica Manager, Data Mover, Data Accessor, Storage Manager, Data Locator, Meta Data Manager, Query Optimisation & Access Pattern Management, Secure Region. Storage systems: Castor, HPSS, Local Filesystem. Levels: High Level Services, Medium Level Services, Core Services]
Service levels reasonably well defined
Identify Key Areas Within Software Structure
5 areas for development
• Data Accessor - hides specific storage system requirements. Mass Storage Management group.
• Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
• Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
• Security - ensuring consistent levels of security for data and meta data.
• Query optimisation - “cost” minimisation based on response time and throughput. Monitoring Services group.
Identifiable UK Contributions: RAL
Identifying Key Areas
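The Data Accessor idea, hiding storage-specific requirements behind one interface, can be sketched as an abstract class with one implementation per back end. All class and function names below are hypothetical illustrations; the in-memory class merely stands in for a mass storage system and does not reflect any real Castor or HPSS API.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DataAccessor(ABC):
    """Uniform read interface; each subclass hides one storage system."""

    @abstractmethod
    def read(self, logical_name: str) -> bytes:
        """Return the contents stored under a logical file name."""


class LocalFileAccessor(DataAccessor):
    """Back end for files on a local filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read(self, logical_name: str) -> bytes:
        return (self.root / logical_name).read_bytes()


class InMemoryAccessor(DataAccessor):
    """Stand-in for a mass storage back end (illustration only)."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        self.blobs = blobs

    def read(self, logical_name: str) -> bytes:
        return self.blobs[logical_name]


def stored_size(accessor: DataAccessor, logical_name: str) -> int:
    """Client code depends only on the abstract interface."""
    return len(accessor.read(logical_name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "run1.tag").write_bytes(b"tag-data")
        print(stored_size(LocalFileAccessor(Path(tmp)), "run1.tag"))
    print(stored_size(InMemoryAccessor({"run1.tag": b"tag-data"}), "run1.tag"))
```

Because `stored_size` never names a concrete back end, a new storage system can be added without touching analysis code.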
4 tasks defined in current UK WP2
• Service Discovery - locate grid services (Wolfgang Hoschek, Gavin McCance +...)
• SQL Database Service - store, query and retrieve metadata (Wolfgang Hoschek, Gavin McCance +...)
• Query Optimisation - “cost” model (Kurt Stockinger +…)
• Data Mining - semi-automatic discovery of event patterns, associations and anomalies: Grid metadata and HEP applications
UK + CERN = UK++
Identifying Key Areas
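To make the “cost” model for query optimisation concrete, here is a minimal sketch that picks the replica with the lowest estimated transfer time. The formula (latency plus size divided by throughput) is an assumed simple model, and the sites' latency and throughput figures are invented for illustration; none of this is taken from the WP2 design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    latency_s: float        # delay before the first byte arrives
    throughput_mb_s: float  # sustained transfer rate


def estimated_cost(replica: Replica, size_mb: float) -> float:
    """Estimated seconds to deliver size_mb from this replica (assumed model)."""
    return replica.latency_s + size_mb / replica.throughput_mb_s


def cheapest(replicas: list[Replica], size_mb: float) -> Replica:
    """Return the replica with the lowest estimated cost for this request."""
    return min(replicas, key=lambda r: estimated_cost(r, size_mb))


if __name__ == "__main__":
    replicas = [
        Replica("Glasgow", latency_s=0.01, throughput_mb_s=10.0),  # hypothetical disk copy
        Replica("RAL", latency_s=5.0, throughput_mb_s=50.0),       # hypothetical tape copy
    ]
    for size_mb in (1.0, 1000.0):
        print(size_mb, "MB ->", cheapest(replicas, size_mb).site)
```

In this toy setup a small request favours the low-latency copy and a large request favours the high-throughput copy, which is the trade-off between response time and throughput that the cost model has to capture.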
Service Graph
[Figure: graph of service discovery nodes: sds.cern.ch, sds.anl.gov, sds.infn.it, sds.ral.uk, sds.padova-infn.it, sds.trieste-infn.it, sds.bologna-infn.it]
Optimisation? - combine all info on nodes from e.g. ScotGRID locally and advertise via Globus
All nodes “Grid Aware”
Allowed?
Hierarchical Model
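The hierarchical model, in which a site combines information on its nodes locally and advertises only the summary, can be sketched as a tree. The host names come from the slide, but the parent-child layout and the CPU counts below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    host: str
    local_cpus: int = 0  # hypothetical resource count held at this node
    children: list["ServiceNode"] = field(default_factory=list)

    def advertised_cpus(self) -> int:
        """Summary advertised upward: own resources plus all descendants'."""
        return self.local_cpus + sum(c.advertised_cpus() for c in self.children)


def example_graph() -> ServiceNode:
    """Build an assumed hierarchy from the host names on the slide."""
    infn = ServiceNode("sds.infn.it", 0, [
        ServiceNode("sds.padova-infn.it", 10),
        ServiceNode("sds.trieste-infn.it", 20),
        ServiceNode("sds.bologna-infn.it", 30),
    ])
    return ServiceNode("sds.cern.ch", 100, [
        ServiceNode("sds.anl.gov", 40),
        infn,
        ServiceNode("sds.ral.uk", 50),
    ])


if __name__ == "__main__":
    root = example_graph()
    for child in root.children:
        print(child.host, child.advertised_cpus())
    print(root.host, root.advertised_cpus())
```

A client querying the top node sees one aggregated figure per site instead of contacting every node, which is the optimisation the slide raises.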
Unified Modelling Language
• Standard method to define the architecture = UML
• Standard tool = TogetherSoft?
Free for academic use. Runs under Linux.
“I tried to generate an import/export module for MySQL under linux by copying the db2 .config file and replacing the various column types by the ones that are available in MySQL. This works apart from the fact that the primary key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary key generation is fine for MySQL. I have seen that Access uses a specialized DB import/export class. How can I generate one for MySQL?”
DB Driver for MySQL under linux?
Determine correct tools by testing.
Compiler Efficiency
Numerically intensive simulations: minimal input and output data
ATLAS Monte Carlo (gg → H → bb): 228 sec per 3.5 Mb event on 800 MHz Linux box
Compiler Tests: LINPACK
Compiler         Speed (MFlops)
Fortran (g77)    27
C (gcc)          43
Java (jdk)       41
Industry Standard Compilers + OO Methods
System Monitoring Prototype
Tools:
1. Linux Kernel Info = /proc/stat
2. Enquire = Java client-server
3. Histograms = Java Analysis Studio
4. TCP/IP = Local WAN
Instantaneous CPU Usage
Scalable Architecture
Individual Node Info.
http://ppewww.ph.gla.ac.uk/~skilli/grid1.html
Industrial Partnership
[Figure: ping service and ping monitor connected over WAN and LAN]
Adoption of OPEN Industry Standards + OO Methods
Industry, Research Council
Monitoring Tools Exist. Standard?
System Monitoring Prototype
Input from /proc/stat: instantaneous CPU, disk, memory
Individual node info is input to a single Grid node
Sample /proc/stat contents:
     user nice system idle
cpu 469607 1593 823764 6044637
disk 51306 0 0 0
disk_rio 11002 0 0 0
disk_wio 40304 0 0 0
disk_rblk 87872 0 0 0
disk_wblk 322378 0 0 0
page 29693 49417
swap 33 1447
intr 18916942 7339601 27941 0 2 2 0 3 0 1 0 9331361 0 869060 1 619454 729516 0 0
ctxt 62664003
btime 984922120
processes 107015
Combined info into e.g. distributed MySQL database
“Why start here?” Need well-understood simple system to start tests and calibrate commercially available solutions.
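The counters in /proc/stat are cumulative since boot, so an instantaneous CPU figure needs the difference between two samples. A minimal sketch follows, assuming the four-column `cpu` layout shown in the sample (user, nice, system, idle); the function names are hypothetical, and newer kernels append further columns that this sketch ignores.

```python
CPU_FIELDS = ("user", "nice", "system", "idle")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line")
    # Only the first four counters are used, matching the layout on the slide.
    return dict(zip(CPU_FIELDS, map(int, parts[1:5])))


def busy_fraction(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of elapsed time the CPU was not idle between two samples."""
    delta = {name: after[name] - before[name] for name in CPU_FIELDS}
    total = sum(delta.values())
    return 0.0 if total == 0 else 1.0 - delta["idle"] / total


if __name__ == "__main__":
    sample = parse_cpu_line("cpu 469607 1593 823764 6044637")  # line from the slide
    since_boot = busy_fraction(dict.fromkeys(CPU_FIELDS, 0), sample)
    print(f"busy since boot: {since_boot:.1%}")
```

On a live node the two samples would be read from /proc/stat a short interval apart, and the resulting fraction is what each node would report to the Grid node that combines them.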
Database Access Benchmark
Many applications require database functionality
e.g. MySQL database daemon
Basic 'crash-me' and associated tests
Access times for basic insert, modify, delete, update database operations, e.g. (on 256 MByte, 800 MHz Red Hat 6.2 Linux box):
350k data insert operations: 149 seconds
10k query operations: 97 seconds
Currently favoured HEP database application, e.g. BaBar, ZEUS software
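The quoted timings correspond to roughly 2,350 inserts per second and about 103 queries per second. A timing harness of the same shape can be sketched as below. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in, not MySQL, so its numbers are not comparable with the figures above; the table name and schema are hypothetical.

```python
import sqlite3
import time


def run_benchmark(n_inserts: int, n_queries: int) -> dict[str, float]:
    """Time bulk inserts and single-row lookups on an in-memory database."""
    if n_inserts <= 0:
        raise ValueError("n_inserts must be positive")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, energy REAL)")

    start = time.perf_counter()
    with conn:  # commit all inserts as a single transaction
        conn.executemany(
            "INSERT INTO events (id, energy) VALUES (?, ?)",
            ((i, i * 0.5) for i in range(n_inserts)),
        )
    insert_s = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute(
            "SELECT energy FROM events WHERE id = ?", (i % n_inserts,)
        ).fetchone()
    query_s = time.perf_counter() - start

    rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return {"rows": rows, "insert_s": insert_s, "query_s": query_s}


if __name__ == "__main__":
    result = run_benchmark(n_inserts=350_000, n_queries=10_000)
    print(result)
```

Running the same harness against each candidate database on identical hardware is what makes the comparison meaningful.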
WP2 - Open Issues
Many… Early Days
Working Standards?
Scope Of UK Contribution: Service Discovery, SQL Database Service, Query Optimisation, Data Mining
Development Tools? UML TogetherSoft, Database MySQL, GDMP, System Monitoring, Standard Grid-Enabled Files, Objects..
Input/Contributions welcome….
[Figure: "From Files To Objects" class diagram. Classes: bGridEvent, bGridEventObj Vector, bGridEventObj, bGridNetwork, bNetworkDigit, bGridCPU, bCPU Digit, bGrid Disk, bDisk Digit, bGridMemory, bMemoryDigit]
Teamwork