CHEP – Mumbai, February 2006
State of Readiness of LHC Computing Infrastructure
Jamie Shiers, CERN


  • CHEP Mumbai, February 2006: State of Readiness of LHC Computing Infrastructure

    Jamie Shiers, CERN


  • Introduction: Some attempts to define what readiness could mean

    How we (will) actually measure it

    Where we stand today

    What we have left to do or can do in the time remaining

    Timeline to First Data

    Related Talks

    Summary & Conclusions


  • What are the requirements? Since the last CHEP, we have seen:

    The LHC Computing Model documents and Technical Design Reports; the associated LCG Technical Design Report; the finalisation of the LCG Memorandum of Understanding (MoU)

    Together, these define not only the functionality required (Use Cases), but also the requirements in terms of Computing, Storage (disk & tape) and Network

    But not necessarily in a site-accessible format

    We also have close-to-agreement on the Services that must be run at each participating site: Tier0, Tier1, Tier2, VO variations (few) and specific requirements

    We also have close-to-agreement on the roll-out of Service upgrades to address critical missing functionality

    We have an on-going programme to ensure that the service delivered meets the requirements, including the essential validation by the experiments themselves


  • How do we measure success? By measuring the service we deliver against the MoU targets:

    Data transfer rates; service availability and time to resolve problems; resources provisioned across the sites, as well as measured usage (a sketch of the availability check appears at the end of this slide)

    By the challenge established at CHEP 2004:

    [ The service ] should not limit the ability of physicists to exploit the performance of the detectors nor the LHC's physics potential, whilst being stable, reliable and easy to use

    Preferably both

    Equally important is our state of readiness for startup / commissioning, which we know will be anything but steady state

    [ Oh yes, and that favourite metric I've been saving ]
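
    To make the availability metric concrete, here is a minimal sketch (illustrative outage records and target, not the actual LCG accounting code) that computes measured availability over a year and compares it with an MoU target:

```python
from datetime import datetime, timedelta

# Hypothetical downtime records for one service: (start, end) of each outage.
downtimes = [
    (datetime(2006, 3, 1, 8, 0), datetime(2006, 3, 1, 14, 0)),     # 6 h outage
    (datetime(2006, 7, 12, 22, 0), datetime(2006, 7, 13, 10, 0)),  # 12 h outage
]

period_start = datetime(2006, 1, 1)
period_end = datetime(2007, 1, 1)
period = period_end - period_start

# Measured availability = fraction of the period during which the service was up.
lost = sum((end - start for start, end in downtimes), timedelta())
availability = 1.0 - lost / period

mou_target = 0.99  # e.g. a "Critical" class service (see the service level table later)

print(f"measured availability: {availability:.4%}")
print("meets MoU target" if availability >= mou_target else "BELOW MoU target")
```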


  • LHC Startup: Startup schedule expected to be confirmed around March 2006

    Working hypothesis remains Summer 2007

    Lower than design luminosity & energy expected initially

    But triggers will be opened so that data rate = nominal

    Machine efficiency is still an open question; look at previous machines?

    Current targets: pilot production services from June 2006; full production services from October 2006; ramp up in capacity & throughput to TWICE NOMINAL by April 2007


  • LHC Commissioning: Expect to be characterised by:

    Poorly understood detectors, calibration, software, triggers etc.

    Most likely no AOD or TAG from first pass but ESD will be larger?

    The pressure will be on to produce some results as soon as possible!

    There will not be sufficient resources at CERN to handle the load

    We need a fully functional distributed system, aka Grid

    There are many Use Cases we have not yet clearly identified

    Nor indeed tested: this remains to be done in the coming 9 months!


  • LCG Service Hierarchy (diagram, Les Robertson): Tier-2s, ~100 centres in ~40 countries, for simulation and end-user analysis, batch and interactive


  • The Dashboard: Sounds like a conventional problem for a dashboard

    But there is not one single viewpoint

    Funding agency: how well are the resources provided being used? VO manager: how well is my production proceeding? Site administrator: are my services up and running? MoU targets? Operations team: are there any alarms? LHCC referee: how is the overall preparation progressing? Areas of concern?

    Nevertheless, much of the information that would need to be collected is common

    So separate the collection from the presentation (views), as well as from the discussion on metrics (a minimal sketch of this split follows below)
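
    To illustrate the collect-once, present-many-views idea, here is a minimal sketch (hypothetical field names and data, not the actual LCG dashboard code) in which one pool of collected measurements feeds several role-specific views:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    site: str       # e.g. "CERN", "RAL"
    vo: str         # e.g. "ATLAS", "CMS"
    metric: str     # e.g. "jobs_done", "cpu_used_ksi2k", "service_up"
    value: float

# Common collection: one pool of measurements, gathered once.
pool = [
    Measurement("CERN", "ATLAS", "cpu_used_ksi2k", 1200.0),
    Measurement("RAL", "ATLAS", "cpu_used_ksi2k", 300.0),
    Measurement("RAL", "CMS", "jobs_done", 5400.0),
    Measurement("RAL", "CMS", "service_up", 0.0),   # 0 = down -> alarm
]

# Separate presentation: each viewpoint is just a different query over the pool.
def funding_agency_view(pool):
    """Total CPU used per site (resource usage)."""
    usage = {}
    for m in pool:
        if m.metric == "cpu_used_ksi2k":
            usage[m.site] = usage.get(m.site, 0.0) + m.value
    return usage

def vo_manager_view(pool, vo):
    """Production progress for one VO."""
    return sum(m.value for m in pool if m.vo == vo and m.metric == "jobs_done")

def operations_view(pool):
    """Alarms: services currently down."""
    return [(m.site, m.vo) for m in pool if m.metric == "service_up" and m.value == 0.0]

print(funding_agency_view(pool))
print(vo_manager_view(pool, "CMS"))
print(operations_view(pool))
```

    Each view is only a query; adding a new viewpoint (for example one for an LHCC referee) would not require any new collection.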


  • The Requirements: Resource requirements, e.g. ramp-up in TierN CPU, disk, tape and network. Look at the Computing TDRs; look at the resources pledged by the sites (MoU etc.); look at the plans submitted by the sites regarding acquisition, installation and commissioning; measure what is currently (and historically) available; signal anomalies (a sketch of this pledged-versus-measured check follows at the end of this slide).

    Functional requirements, in terms of services and service levels, including operations, problem resolution and support. Implicit / explicit requirements in Computing Models; agreements from Baseline Services Working Group and Task Forces; Service Level definitions in MoU; measure what is currently (and historically) delivered; signal anomalies.

    Data transfer rates: the TierX-TierY matrix. Understand Use Cases; measure.

    And test extensively, with both dteam and other VOs
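
    As a concrete illustration of measuring what is available and signalling anomalies, here is a minimal sketch (invented sites and numbers, not the real accounting system) that compares measured capacity against MoU pledges and flags shortfalls:

```python
# MoU-pledged capacity per site (hypothetical numbers), in kSI2K of CPU and TB of disk.
pledged = {
    "Tier1-A": {"cpu_ksi2k": 2000, "disk_tb": 800},
    "Tier1-B": {"cpu_ksi2k": 1500, "disk_tb": 600},
}

# Capacity actually measured/installed right now (hypothetical numbers).
measured = {
    "Tier1-A": {"cpu_ksi2k": 1900, "disk_tb": 810},
    "Tier1-B": {"cpu_ksi2k": 900,  "disk_tb": 580},
}

TOLERANCE = 0.10  # flag anything more than 10% below pledge

def anomalies(pledged, measured, tolerance=TOLERANCE):
    """Return (site, resource, pledged, measured) tuples where the shortfall exceeds the tolerance."""
    out = []
    for site, resources in pledged.items():
        for resource, target in resources.items():
            actual = measured.get(site, {}).get(resource, 0)
            if actual < target * (1 - tolerance):
                out.append((site, resource, target, actual))
    return out

for site, resource, target, actual in anomalies(pledged, measured):
    print(f"ANOMALY: {site} {resource}: measured {actual} vs pledged {target}")
```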



  • Resource Deployment and Usage: Resource Requirements for 2008 [table]


  • ATLAS Resource Ramp-Up Needs [chart]: Tier-0, Tier-1s, Tier-2s, CERN Analysis Facility


  • Site Planning Coordination: Site plans coordinated by LCG Planning Officer, Alberto Aimar

    Plans are now collected in a standard format, updated quarterly

    These allow tracking of progress towards agreed targets (a minimal sketch of such tracking follows at the end of this slide):

    Capacity ramp-up to MoU deliverables; installation and testing of key services; preparation for milestones, such as LCG Service Challenges
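
    For example, a site's quarterly plan could be tracked against milestone dates roughly as follows (a sketch with invented milestones and dates, not the format actually collected by the LCG Planning Officer):

```python
from datetime import date

# Hypothetical quarterly plan for one site: milestone -> planned completion date.
plan = {
    "CPU ramp-up to 2008 MoU pledge": date(2007, 3, 31),
    "Managed storage service in production": date(2006, 9, 30),
    "Participation in Service Challenge throughput tests": date(2006, 6, 1),
}

# Completion dates reported so far (None = not yet done).
reported_done = {
    "Participation in Service Challenge throughput tests": date(2006, 6, 15),
    "Managed storage service in production": None,
    "CPU ramp-up to 2008 MoU pledge": None,
}

today = date(2006, 10, 15)

for milestone, planned in plan.items():
    done = reported_done.get(milestone)
    if done is not None:
        status = "done (late)" if done > planned else "done on time"
    elif today > planned:
        status = "OVERDUE"
    else:
        status = "in progress"
    print(f"{milestone}: planned {planned}, status: {status}")
```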


  • Measured Delivered Capacity: Various accounting summaries:

    LHC View: http://goc.grid-support.ac.uk/gridsite/accounting/tree/treeview.php (data aggregation across countries)

    EGEE View: http://www2.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php (data aggregation across EGEE ROCs)

    GridPP View: http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridppview.php (a specific view for GridPP accounting summaries for Tier-2s)
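
    To show what data aggregation across countries amounts to, here is a minimal sketch (invented sites and numbers, not the GOC accounting code) that rolls per-site accounting records up into per-country totals:

```python
from collections import defaultdict

# Hypothetical per-site accounting records: (site, country, CPU usage in kSI2K-hours).
records = [
    ("RAL",      "UK",      350_000),
    ("Imperial", "UK",       80_000),
    ("IN2P3",    "France",  300_000),
    ("GRIF",     "France",   60_000),
    ("FZK",      "Germany", 280_000),
]

# Roll up to the country level, as in the tree views of the accounting portals.
by_country = defaultdict(int)
for site, country, cpu in records:
    by_country[country] += cpu

for country, cpu in sorted(by_country.items(), key=lambda kv: -kv[1]):
    print(f"{country:8s} {cpu:>10,d} kSI2K-hours")
```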


  • The Requirements: Resource requirements, e.g. ramp-up in TierN CPU, disk, tape and network. Look at the Computing TDRs; look at the resources pledged by the sites (MoU etc.); look at the plans submitted by the sites regarding acquisition, installation and commissioning; measure what is currently (and historically) available.

    Functional requirements, in terms of services and service levels, including operations, problem resolution and support. Implicit / explicit requirements in Computing Models; agreements from Baseline Services Working Group and Task Forces; Service Level definitions in MoU; measure what is currently (and historically) delivered; signal anomalies.

    Data transfer rates: the TierX-TierY matrix. Understand Use Cases; measure.

    And test extensively, with both dteam and other VOs


  • Reaching the MoU Service Targets: These define the (high level) services that must be provided by the different Tiers

    They also define average availability targets and intervention / resolution times for downtime & degradation

    These differ from TierN to TierN+1 (less stringent as N increases) but refer to the compound services, such as acceptance of raw data from the Tier0 during accelerator operation

    Thus they depend on the availability of specific components: managed storage, reliable file transfer service, database services, ...

    Can only be addressed through a combination of: appropriate hardware, middleware and procedures; careful planning & preparation; well understood operational & support procedures & staffing


    (Same material presented at COD-6, Barcelona)

    Service Monitoring - Introduction: Service Availability Monitoring Environment (SAME) - a uniform platform for monitoring all core services, based on SFT experience. Two main end users (and use cases): project management - overall metrics; operators - alarms, detailed info for debugging, problem tracking. A lot of work has already been done: SFT and GStat are monitoring CEs and Site-BDIIs; the data schema (R-GMA) is established; basic displays are in place (SFT report, CIC-on-duty dashboard, GStat) and can be reused
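
    To make the one-set-of-results, two-use-cases point concrete, here is a minimal sketch (invented test names and results, not the actual SAME/SFT code) that derives both a management-level metric and an operator alarm list from the same records:

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    site: str
    service: str   # e.g. "CE", "Site-BDII"
    test: str      # e.g. "job-submit", "ldap-query"
    passed: bool

# One common pool of results from the monitoring framework (hypothetical data).
results = [
    TestResult("SiteA", "CE", "job-submit", True),
    TestResult("SiteA", "Site-BDII", "ldap-query", True),
    TestResult("SiteB", "CE", "job-submit", False),
    TestResult("SiteB", "Site-BDII", "ldap-query", True),
]

# Use case 1: project management - an overall metric (fraction of tests passing).
overall = sum(r.passed for r in results) / len(results)
print(f"overall test pass rate: {overall:.0%}")

# Use case 2: operators - alarms with enough detail to start debugging.
alarms = [(r.site, r.service, r.test) for r in results if not r.passed]
for site, service, test in alarms:
    print(f"ALARM: {site}/{service} failed test '{test}'")
```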


  • Service Level Definitions: Tier0 services: C/H; Tier1 services: H/M; Tier2 services: M/L

    Class  Description  Downtime   Reduced    Degraded   Availability
    C      Critical     1 hour     1 hour     4 hours    99%
    H      High         4 hours    6 hours    6 hours    99%
    M      Medium       6 hours    6 hours    12 hours   99%
    L      Low          12 hours   24 hours   48 hours   98%
    U      Unmanaged    None       None       None       None

    MoU targets per service: maximum delay in responding to operational problems (service interruption / degradation by more than 50% / degradation by more than 20%) and average availability measured on an annual basis (during accelerator operation / at all other times):

    Acceptance of data from the Tier-0 Centre during accelerator operation: 12 hours / 12 hours / 24 hours; availability 99% / n/a

    Networking service to the Tier-0 Centre during accelerator operation: 12 hours / 24 hours / 48 hours; availability 98% / n/a

    Data-intensive analysis services, including networking to Tier-0, Tier-1 Centres, outside accelerator operation: 24 hours / 48 hours / 48 hours; availability n/a
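
    A minimal sketch of how these class definitions might be applied in practice (the thresholds copy the class table above; the incident data are invented):

```python
# Service-class targets from the table above: maximum downtime, reduced and degraded
# operation (hours) and the annual availability target. Class U (Unmanaged) has no targets.
service_classes = {
    "C": {"downtime_h": 1,  "reduced_h": 1,  "degraded_h": 4,  "availability": 0.99},
    "H": {"downtime_h": 4,  "reduced_h": 6,  "degraded_h": 6,  "availability": 0.99},
    "M": {"downtime_h": 6,  "reduced_h": 6,  "degraded_h": 12, "availability": 0.99},
    "L": {"downtime_h": 12, "reduced_h": 24, "degraded_h": 48, "availability": 0.98},
}

def check_incident(service_class, outage_hours):
    """Was an outage resolved within the maximum downtime allowed for this class?"""
    target = service_classes[service_class]["downtime_h"]
    return outage_hours <= target

# Hypothetical incident: a 'High'-class service was down for 5 hours.
incident = {"class": "H", "outage_hours": 5}
ok = check_incident(incident["class"], incident["outage_hours"])
print("within MoU response target" if ok else "MoU response target exceeded")
```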
