The ALICE Grid: The beat of a different drum. L.Betev, P.Buncic, A.Peters, P.Saiz, S.Bagnasco, P.Mendez-Lorenzo, C.Cistoiu, C.Grigoras. Presented by F.Carminati, April 23, 2007, ACAT - Amsterdam

Page 1: The ALICE Grid The beat of a different drum

The ALICE Grid
The beat of a different drum

L.Betev, P.Buncic, A.Peters, P.Saiz, S.Bagnasco, P.Mendez-Lorenzo, C.Cistoiu, C.Grigoras
Presented by F.Carminati

April 23, 2007
ACAT - Amsterdam

Page 2: The ALICE Grid The beat of a different drum

23/04/07 fca @ ACAT07

level 0 - special hardware: 8 kHz (160 GB/sec)
level 1 - embedded processors: 200 Hz (4 GB/sec)
level 2 - PCs: 30 Hz (2.5 GB/sec)
data recording & offline analysis: 30 Hz (1.25 GB/sec)

Total weight 10,000 t
Overall diameter 16.00 m
Overall length 25 m
Magnetic Field 0.4 Tesla

ALICE Collaboration: ~1/2 ATLAS, CMS, ~2x LHCb; ~1000 people, 30 countries, ~80 Institutes

Page 3: The ALICE Grid The beat of a different drum


Page 4: The ALICE Grid The beat of a different drum

The ALICE Grid (AliEn)

Timeline: 2001 2002 2003 2004 2005 2006 2007

• Functionality + Simulation
• Interoperability + Reconstruction
• Performance, Scalability, Standards + Analysis

• Start
• First production (distributed simulation)
• Physics Performance Report (mixing & reconstruction)
• 10% Data Challenge
• WLCG integration
• 20% Data Challenge

There are millions of lines of code in OS dealing with GRID issues. Why not use them to build the minimal GRID that does the job?

• Fast development of a prototype, can restart from scratch, etc.
• Hundreds of users and developers
• Immediate adoption of emerging standards

AliEn by ALICE (5% of code developed, 95% imported)

Page 5: The ALICE Grid The beat of a different drum

Middleware Services in AliEn

Diagram labels: gLite Middleware Services; GAPI; WM, DM; TQ; PM; FTQ; ACE, FC; CE, JW (JA), SE; CR (LSF, ..); LJC, SRM; LRC; API

GAS Grid Access Service
WM Workload Mgmt
DM Data Mgmt
RB Resource Broker
TQ Task Queue
FPS File Placement Service
FTQ File Transfer Queue
PM Package Manager
ACE AliEn CE (pull)
FC File Catalogue
JW Job Wrapper
JA Job Agent
LRC Local Replica Catalogue
LJC Local Job Catalogue
SE Storage Element
CE Computing Element
SRM Storage Resource Mgr
CR Computing Resource (LSF, PBS,…)

EDG, AliEn

Exp specific services

LCG, AliEn arch + LCG code

EGEE

Exp specific services (AliEn for ALICE)

EGEE, ARC, OSG…

Page 6: The ALICE Grid The beat of a different drum

Design criteria

• Minimize intrusiveness
  – Limit the impact on the host computer centres
• Use delegation
  – Where possible acquire “capability” to perform operation, no need to verify operation mode at each step
• Centralise information
  – Minimise the need to “synchronise” information sources
• Decentralise decisions
  – Minimise interactions and avoid bottlenecks
• Virtualise resources
• Automatise operations
• Provide extensive monitoring

Page 7: The ALICE Grid The beat of a different drum

Job submission in LCG

User: Submits job

ALICE central services
• ALICE Job Catalogue
  – Job 1: lfn1, lfn2, lfn3, lfn4
  – Job 2: lfn1, lfn2, lfn3, lfn4
  – Job 3: lfn1, lfn2, lfn3
• Optimizer
  – Job 1.1: lfn1
  – Job 1.2: lfn2
  – Job 1.3: lfn3, lfn4
  – Job 2.1: lfn1, lfn3
  – Job 2.2: lfn2, lfn4
  – Job 3.1: lfn1, lfn3
  – Job 3.2: lfn2
• ALICE File Catalogue: lfn, guid, {se’s} (one row per file)
• ALICE catalogues

Site
• VO-Box: Computing Agent, packman
• LCG: RB, CE, WN

Arrow labels: Submits job agent; Sends job agent to site; Env OK? (No: Die with grace; Yes: Execs agent); Asks work-load; Matchmaking (Close SE’s & Software); Receives work-load; Retrieves workload; User Job; Sends job result; Updates TQ; Registers output

Page 8: The ALICE Grid The beat of a different drum


VO-Box monitoring

• The status of the VO-Box, ALICE and WLCG services is monitored through ML (MonALISA)

• Sites are encouraged to check the status through these pages

• Alarm system established

• Standard SAM tests to check LCG services availability are incorporated in the VO-box

• Available to Grid Support and ALICE (via ML)

Page 9: The ALICE Grid The beat of a different drum

Job submission

• Minimize intrusiveness
  – Job submission is realised using existing Grid MW if possible or directly to CE otherwise
• Centralise information
  – Jobs are held in a single central queue handling priorities and quotas
• Decentralise decisions
  – Sites decide which jobs to “pull”
• Virtualise resources
  – Job agents are run to provide a standard environment (job wrapper) across different systems
• Automatise operations
• Provide extensive monitoring
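The pull model in the bullets above (one central queue, sites choosing what to take) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the class `TaskQueue`, its methods `submit` and `pull`, the "lower number means higher priority" rule, and the requirement strings are all hypothetical and are not AliEn's actual interfaces. Quotas are left out.

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical central queue: jobs wait here until a site pulls them."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority

    def submit(self, job_id, priority, requirements):
        # Assumption for this sketch: a lower number means a higher priority.
        entry = (priority, next(self._order), job_id, frozenset(requirements))
        heapq.heappush(self._heap, entry)

    def pull(self, site_capabilities):
        """Return the best waiting job the site can run, or None."""
        capabilities = set(site_capabilities)
        skipped = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[3] <= capabilities:  # all requirements are satisfied
                chosen = entry[2]
                break
            skipped.append(entry)
        for entry in skipped:  # jobs the site cannot run stay queued
            heapq.heappush(self._heap, entry)
        return chosen


tq = TaskQueue()
tq.submit("job-1", priority=2, requirements={"AliRoot"})
tq.submit("job-2", priority=1, requirements={"AliRoot", "close-SE"})
print(tq.pull({"AliRoot"}))  # the site only gets a job it can actually run
```

The decision stays with the site: the queue never pushes work, it only answers a request that carries the site's own description of what it can offer.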

Page 10: The ALICE Grid The beat of a different drum

The AliEn FC

• Hierarchical structure (like a UNIX File system)
• Designed in 2001
  – Provides mapping from LFN to PFN
  – Built on top of several distributed databases
    • Possible to add another database
  – Possible to move directories to another table
    • Transparent for the end user
  – Metadata catalogue on the LFN
  – Triggers
  – GUID to PFN mapping in the central catalogue
    • No “local catalogue”
  – Possibility of automatic PFN construction
    • Store only the GUID and Storage Index and the SE builds the PFN from the GUID
  – Two independent catalogues: LFN->GUID and GUID->PFN
    • Possible to add databases to one or the other
    • We could drop LFN->GUID mapping if not used anymore
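A minimal Python sketch of the two-step lookup (LFN->GUID, then GUID->PFN) with automatic PFN construction is given here. Everything named in it is an assumption made for illustration: the class `FileCatalogue`, the function `build_pfn`, the two-character fan-out directory layout, and the example storage URL are hypothetical and do not describe the real AliEn schema.

```python
import uuid


def build_pfn(se_root, guid):
    # Hypothetical layout: the SE derives the path from the GUID alone,
    # so the catalogue only needs to store the GUID and a storage index.
    return f"{se_root}/{guid[:2]}/{guid}"


class FileCatalogue:
    """Two independent maps: LFN -> GUID and GUID -> storage indices."""

    def __init__(self, se_roots):
        self.se_roots = se_roots  # storage index -> URL prefix
        self.lfn_to_guid = {}
        self.guid_to_se = {}

    def register(self, lfn, se_index, guid=None):
        guid = guid or str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.guid_to_se.setdefault(guid, []).append(se_index)
        return guid

    def add_replica(self, guid, se_index):
        self.guid_to_se[guid].append(se_index)

    def pfns(self, lfn):
        guid = self.lfn_to_guid[lfn]
        return [build_pfn(self.se_roots[i], guid) for i in self.guid_to_se[guid]]


fc = FileCatalogue({0: "root://se0.example.org//alice", 1: "root://se1.example.org//alice"})
g = fc.register("/alice/user/p/psaiz/hits.root", se_index=0)
fc.add_replica(g, se_index=1)
print(fc.pfns("/alice/user/p/psaiz/hits.root"))
```

Because the GUID map never refers back to an LFN, either map can be moved to another database or extended on its own, which is the property the last bullet relies on.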

Page 11: The ALICE Grid The beat of a different drum

Benchmarks

Reading           AliEn v2-12            AliEn v2-13
                  No cache    cache      No cache    cache
List LFN          23          2.8        20          2
List LFN (10 )    23          1.5        20          1
LFN->GUID         24          3          20          2.5
LFN->PFN          106*        30*        70          5.5
GUID->PFN         143*        51*        52          2

• Tests done on:
  – Dual Pentium CPU 3.4 GHz
  – 3.2 GB RAM
• DB, writers, reader and soap servers running on the same machine

Insertion

• Users:
  – Register their files in their home directories
• PackMan
  – Definition of the packages (VO & user)
• Production user
  – Register data
• AliEn TaskQueue:
  – Register the output of the jobs

Page 12: The ALICE Grid The beat of a different drum

Other features

• Size
  – LFN tables: 130 bytes/entry
  – GUID: 300 (InnoDB), 210 (MyISAM), 120 (no PFN)
  – Binary log files: 1000 bytes/entry!
    • Needed for database replication
    • Automatically cleaned by MySQL
  – The current database could contain 7.5 billion entries!
• Two QoS for SE
  – Custodial: File has low probability of disappearing
  – Replica: File has high probability of disappearing
  – User specifies QoS when registering a file
• Still to do: quotas
• Entries in the LFN catalogue can have expiration time
  – The entry will disappear regardless of QoS of SE and is removed from storage
  – A GUID not referenced by any LFN will also disappear
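The expiration rule in the last bullets can be illustrated with a small Python sketch. It assumes in-memory dictionaries in place of the catalogue tables; the function name `sweep` and the data layout are hypothetical, chosen only to show the two-stage clean-up (expired LFNs first, then GUIDs that nothing references any more).

```python
def sweep(lfn_entries, guid_entries, now):
    """Drop expired LFNs, then GUIDs that no remaining LFN references.

    lfn_entries:  dict lfn -> (guid, expires_at or None)
    guid_entries: dict guid -> list of PFNs
    Returns the PFNs whose files should be removed from storage.
    """
    expired = [lfn for lfn, (_, expires_at) in lfn_entries.items()
               if expires_at is not None and expires_at <= now]
    for lfn in expired:
        del lfn_entries[lfn]

    referenced = {guid for guid, _ in lfn_entries.values()}
    to_delete = []
    for guid in [g for g in guid_entries if g not in referenced]:
        to_delete.extend(guid_entries.pop(guid))
    return to_delete


lfns = {"/alice/a": ("g1", 100), "/alice/b": ("g1", None), "/alice/c": ("g2", 50)}
guids = {"g1": ["pfn1"], "g2": ["pfn2a", "pfn2b"]}
print(sweep(lfns, guids, now=60))  # only g2 loses its last LFN
```

Note that the sketch ignores the QoS of the SE on purpose, matching the statement that an expired entry disappears regardless of it.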

Page 13: The ALICE Grid The beat of a different drum

File Catalogue v2-13

LFN Catalogue
  Index (LFN->GUID):
    /
    /alice
    /alice/user/p/psaiz
    /alice/simulation/2006

GUID Catalogue
  Index (GUID->PFN):
    1-JAN-1970
    1-JAN-2006
    14-FEB-2007
    23-AUG-2008
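One possible reading of the two indexes is that each LFN is routed to a table by its longest matching directory prefix, and each GUID is routed to a table by the date range its creation time falls into. The Python sketch below follows that reading; it is an assumption, not something the slide states, and the table names (`L0`, `G0`, ...) and function names are hypothetical.

```python
import bisect
from datetime import date

# Index entries taken from the slide; the table names are invented.
LFN_INDEX = {
    "/": "L0",
    "/alice": "L1",
    "/alice/user/p/psaiz": "L2",
    "/alice/simulation/2006": "L3",
}
GUID_STARTS = [date(1970, 1, 1), date(2006, 1, 1), date(2007, 2, 14), date(2008, 8, 23)]
GUID_TABLES = ["G0", "G1", "G2", "G3"]


def lfn_table(lfn):
    """Pick the table of the longest index prefix that contains the LFN."""
    matches = [p for p in LFN_INDEX
               if lfn == p or lfn.startswith(p.rstrip("/") + "/")]
    return LFN_INDEX[max(matches, key=len)]


def guid_table(created):
    """Pick the table whose start date is the latest one not after `created`."""
    i = bisect.bisect_right(GUID_STARTS, created) - 1
    if i < 0:
        raise ValueError("creation date precedes the first index entry")
    return GUID_TABLES[i]


print(lfn_table("/alice/user/p/psaiz/hits.root"))
print(guid_table(date(2007, 4, 23)))
```

Under this reading, moving a directory to another table only changes one index row, which would explain why the operation is transparent for the end user.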

Page 14: The ALICE Grid The beat of a different drum

Storage strategy

WN, VOBOX::SA, xrootd (manager)

• xrootd (worker): Disk
• xrootd (worker): DPM
• xrootd (worker): Castor
• xrootd emulation (worker): dCache

Other diagram labels: SRM (four times), MSS (twice)

DPM, CASTOR, dCache are LCG-developed SEs

Status labels: Available; Being deployed; Prototype being validated; Being deployed

Page 15: The ALICE Grid The beat of a different drum

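Xrootd architecture

Client; Redirector (Head Node); Data Servers A, B, C (Cluster)

1. Client to Redirector: open file X
2. Redirector to Data Servers: Who has file X?
3. Data Server C to Redirector: I have
4. Redirector to Client: go to C
5. Client to Data Server C: open file X

2nd open X: go to C (Redirectors cache file location)

Client sees all servers as xrootd data servers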
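The redirect-and-cache exchange on this slide can be mimicked with a toy Python model. It is not the xrootd protocol or its API: the classes `DataServer` and `Redirector` are hypothetical, and the servers are asked one after another here purely to keep the sketch short.

```python
class DataServer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, path):
        return path in self.files


class Redirector:
    """Toy head node: asks the data servers once, then answers from its cache."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.location_cache = {}
        self.queries = 0  # number of "Who has file X?" questions sent

    def locate(self, path):
        if path in self.location_cache:
            return self.location_cache[path]
        for server in self.servers:
            self.queries += 1
            if server.has(path):
                self.location_cache[path] = server.name
                return server.name
        raise FileNotFoundError(path)


head = Redirector([DataServer("A", []), DataServer("B", []), DataServer("C", ["X"])])
print(head.locate("X"), head.queries)  # first open: the servers are queried
print(head.locate("X"), head.queries)  # 2nd open: answered from the cache
```

The client in this model only ever talks to `locate` and then to the named server, which is the sense in which it sees all servers as xrootd data servers.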

Page 16: The ALICE Grid The beat of a different drum

xrootd serving several VO’s

Diagram labels: priv key; proxy; sec env; client; proxy; sec env; pub key; ALICE catalogue; xrootd server; GSI auth; Catalogue auth

Page 17: The ALICE Grid The beat of a different drum

Tag architecture

• Reconstruction produces tag entries: ev#guid with Tag1, tag2, tag3…
• Index builder: Bitmap Index (When available)
• Selection: List of ev#guid’s
• Analysis job
  – Interactive: GRID/PROOF#1, GRID/PROOF#2, GRID/PROOF#3, … GRID/PROOF#N
  – Batch: Job#1, Job#2, Job#3, … Job#N
  – Each works on guid#{ev1…evn}
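The step from per-event tags (ev#guid) to per-file event lists (guid#{ev1…evn}) can be sketched in Python as below. The function `select_events`, the tag name `multiplicity`, and the row layout are hypothetical; the sketch scans the tags linearly and does not model the bitmap index.

```python
from collections import defaultdict


def select_events(tag_rows, predicate):
    """Turn per-event tag rows into a per-file selection.

    tag_rows:  iterable of (event_number, guid, tags) with tags a dict
    predicate: function deciding from the tags whether an event is kept
    Returns a dict guid -> list of selected event numbers.
    """
    selection = defaultdict(list)
    for event_number, guid, tags in tag_rows:
        if predicate(tags):
            selection[guid].append(event_number)
    return dict(selection)


rows = [
    (1, "guid-A", {"multiplicity": 120}),
    (2, "guid-A", {"multiplicity": 30}),
    (3, "guid-B", {"multiplicity": 200}),
]
print(select_events(rows, lambda tags: tags["multiplicity"] > 100))
```

Grouping by GUID is what lets each analysis job open a file once and read only the selected events from it.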

Page 18: The ALICE Grid The beat of a different drum

How to select data

• A dataset list is created via queries to the metadata
  – Key/value pairs
  – Run, file, and tag MD
• Run Meta Data
  – Stored as (Directory) Meta Data in the File Catalogue
  – Contains parameters describing conditions during the run
• File Meta Data
  – No physics information
  – Sanity, permission & location of Files

Page 19: The ALICE Grid The beat of a different drum

Distributed analysis

1. User job (many events)
2. File Catalogue query: Data set (ESD’s, AOD’s)
3. Job Optimizer: Grouped by SE files location, giving Sub-job 1, Sub-job 2, … Sub-job n
4. Job Broker: Submit to CE with closest SE
5. CE and SE processing (one per sub-job): Output file 1, Output file 2, … Output file n
6. File merging job: Job output
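The split-by-location and merge steps of the distributed analysis flow can be illustrated with a short Python sketch. The functions `split_by_se` and `merge`, the rule of using the first listed SE as the file's location, and the counter-style outputs are assumptions made for the example, not the behaviour of the real Job Optimizer.

```python
from collections import defaultdict


def split_by_se(dataset):
    """Group input files into sub-jobs by the SE that holds them.

    dataset: dict lfn -> list of SE names holding a replica.
    Assumption: the first listed SE is taken as the file's location.
    """
    groups = defaultdict(list)
    for lfn, storage_elements in dataset.items():
        groups[storage_elements[0]].append(lfn)
    return [{"se": se, "files": sorted(files)} for se, files in sorted(groups.items())]


def merge(outputs):
    """Stand-in for the file merging job: add up per-sub-job counters."""
    total = {}
    for output in outputs:
        for key, value in output.items():
            total[key] = total.get(key, 0) + value
    return total


dataset = {"lfn1": ["SE_A"], "lfn2": ["SE_B"], "lfn3": ["SE_A", "SE_B"]}
sub_jobs = split_by_se(dataset)
outputs = [{"files_processed": len(job["files"])} for job in sub_jobs]
print(sub_jobs)
print(merge(outputs))
```

Each sub-job carries the SE name so that a broker could send it to a CE close to that SE, as in step 4.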

Page 20: The ALICE Grid The beat of a different drum

Grid data challenge - PDC’06

• The longest running Data Challenge in ALICE
  – A comprehensive test of the ALICE Computing model
  – Running already for 9 months non-stop: approaching data taking regime of operation
  – Participating: 55 computing centres on 4 continents: 6 Tier 1s, 49 T2s
  – 7 MSI2k·hours
  – 1500 CPUs running continuously
  – 685K Grid jobs total
    • 530K production
    • 53K DAQ
    • 102K user !!!
  – 40M evts, 0.5PB generated, reconstructed and stored
  – User analysis ongoing
• 43% T1s, 57% T2s
• T1 sites: CNAF, CCIN2P3, GridKa, RAL, SARA
• FTS tests T0->T1 Sep-Dec
  – Design goal 300MB/s reached but not maintained
  – 0.7PB DAQ data registered

Page 21: The ALICE Grid The beat of a different drum

Monitoring, monitoring, monitoring…

http://pcalimonitor.cern.ch:8889/

Diagram: each service embeds ApMon and reports to MonALISA.
• LCG Site (MonALISA @Site): AliEn Job Agents, AliEn CE, AliEn SE, ClusterMonitor, LCG Tools
• CERN (MonALISA @CERN): AliEn TQ, AliEn IS, AliEn Optimizers, AliEn Brokers, MySQL Servers, Castor Grid Scripts, API Services
• MonALISA Repository: Aggregated Data, Long History DB

Monitored parameters: rss, vsz, cpu time, run time, job slots, free space, nr. of files, open files, Queued JobAgents, cpu ksi2k, job status, disk used, processes, load, net In/out, jobs status, sockets, migrated mbytes, active sessions, MyProxy status

Page 22: The ALICE Grid The beat of a different drum

Back to the future…

IBM-VM 360 mainframe, 1988

• Once upon a time…
  – Static linking, running in a VM
  – Perfect isolation!
• Then, things changed…
  – Unix, PC, commodity computing, shared libraries, dynamic linking, plugins
  – Fuzzy application boundary!
• But now…
  – Memory and disk space are cheap
  – Virtual Machines running on commodity hardware on Open Source OS are promising to deliver what we lost some time ago
• Why?
  – The infrastructure can evolve independently from the application
  – Now we can Start, Stop, Pause, Migrate VM
  – Software running inside a VM cannot affect the environment
  – Perfect process and file sandboxing
  – (Re)use a lot of code which was previously in the system/kernel domain

Page 23: The ALICE Grid The beat of a different drum

Virtual Appliances

• Virtual Software Appliance = Application + Virtual Machine + Simple UI that combines
  – Minimal operating environment
  – Specialized application functionality
• Designed to run under various virtualization technologies
  – VMware, Xen, Parallels, Microsoft Virtual PC, QEMU, User mode Linux, CoLinux, Virtual Iron…
• Alleviate the deployment in a traditional server environment
  – Complex configuration
  – Maintenance

Example: rPath, Software Appliance Company

Page 24: The ALICE Grid The beat of a different drum


Practical exercise: AliEn Appliance

External Dependencies

AliEn

busybox (system tools)

ggbox

System devices

Kernel

Grid Appliance

++

=

Page 25: The ALICE Grid The beat of a different drum

AliEnX

• AliEn Linux – minimal guest OS capable of running AliEn services and hosting Grid applications
  – http://alien.cern.ch/twiki/bin/view/AliEnX
  – http://alien.rpath.org
• Built using rPath tools (rBuilder and Conary package manager)
• AliEn Appliance Version 0.4
  – x86 Mountable Filesystem (Xen Virtual Appliance)
  – x86_64 Mountable Filesystem (Xen Virtual Appliance)
  – x86 VMware (R) ESX Server Virtual Appliance
  – x86 Installable CD/DVD
  – x86_64 Parallels, QEMU (Raw Hard Disk)
  – x86 Parallels, QEMU (Raw Hard Disk)
• Already usable as User Interface
  – Generic, can be customized for other purposes
  – To do: Run Grid Jobs in VM

Benchmark (3 GHz Pentium D, 1GB RAM, AliRoot):

        Xen 3.0.3    Native
Simu    193 s        191.5 s
Reco    52 s         51 s
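The relative cost of running under Xen follows directly from the timings above. The small Python helper below (the name `overhead_percent` is ours) computes it as the extra time divided by the native time.

```python
def overhead_percent(virtualized_s, native_s):
    """Extra run time under virtualization, as a percentage of native time."""
    return 100.0 * (virtualized_s - native_s) / native_s


for label, xen_s, native_s in [("Simu", 193.0, 191.5), ("Reco", 52.0, 51.0)]:
    print(f"{label}: {overhead_percent(xen_s, native_s):.2f}% slower under Xen")
```

With the numbers from the table this gives roughly 0.8% for simulation and 2% for reconstruction, based on a single measurement of each.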

Page 26: The ALICE Grid The beat of a different drum

Use cases for Virtual Machines?

• Grid
  – Sandbox environment for job execution on WN
  – Enhanced site security
• VO box
  – Enhanced Scalability
• User Interfaces
  – Separation of Grid and system environment
  – Reducing Grid initiation threshold
• Specialized environments
  – PROOF/CAF
    • process migration
    • kernel modules to enable fancy user space file systems
    • P2P like object sharing and caching
• Training setups
  – Make sure that everyone has the same environment when they walk into the training room
• Testing environments
  – Easy to set up, saving time and money

Page 27: The ALICE Grid The beat of a different drum


A cloud over the Grid?

http://www.rpath.com/corp/amazon.html

Page 28: The ALICE Grid The beat of a different drum

Conclusions

• AliEn has allowed ALICE to exploit its distributed computing resources while achieving different, potentially contradictory, objectives
  – Make maximum usage of the existing Grid MW
  – A stable and uniform environment for processing and analysing ALICE data
  – A lean environment for development and test of new technologies
• The AliEn MW has been tested in production and we are confident it provides a solid framework for ALICE computing
• A promising area that we are exploring now with AliEn is VM
  – Coming back as viable technology
  – Potential benefits for users and resource providers
  – Technology and business model are catching up fast
  – They may not solve all our problems, but they can make solutions faster and easier

Page 29: The ALICE Grid The beat of a different drum
