Tier0 Status

CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/
Tony Cass, LCG-LHCC Referees Meeting, 16th February 2009

Page 1: Tier0 Status

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

Tier0 Status

Tony Cass

LCG-LHCC Referees Meeting
16th February 2009

Page 2: Tier0 Status

Agenda
• Resources
• CASTOR status and performance
• Progress with new data centre project

Page 4: Tier0 Status

Status of 2009 procurements

November Status

• CPU
  – First batch
    • Ordered out in late August
    • Delivery before November
    • Production in December or early 2009
  – Second batch
    • Received the tender answers
    • Target FC approval in December
    • Delivery before March 2009
    • Production in March – April 2009
• Disk
  – First batch
    • FC approval last week
    • Delivery in December
    • Production January 2009
  – Second batch
    • Received the tender answers
    • Target FC approval in December
    • Delivery before March 2009
    • Production in March – April 2009
• Tape
  – Media availability not a problem but exact procurement schedule depends on progress with new repack service between now and beginning of 2009

2 of 3 batches already on site

FC approval not required, but delivery schedule unchanged (installation depends on readiness of racks)

January

February

70 Sun T10KB drives ordered (1TB/cartridge)

T10KA drives to be phased out as repack advances.

No orders issued following December statement on likely schedule

Page 5: Tier0 Status

Procurements: 2009 Status & 2010 outlook
• CPU & Disk
  – ~60% of foreseen 2009 pledges available in April
  – (Additional ATLAS request not included)
  – Balance to be operational in October
    • Tight schedule, but agreed with Purchasing dept.
    • Exploring options to purchase iSCSI disk storage
      – Greater cost/TB, but avoids interruption to CASTOR service due to disk server failure (#1 cause of incidents; disk failures are handled transparently)
  – 2010 procurement planning underway
    • Tenders issued in June; adjudication in ~November.
• Tape
  – Expect ~20PB spare capacity by October.
  – Will purchase “high density” IBM robot in autumn
    • 14,000 slots, 14PB
  – Can convert an existing IBM robot to “high density” version in 2010 (with no service interruption) if additional capacity required.
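
As a rough check of the tape figures above, the arithmetic can be sketched in Python. This assumes about 1 TB per slot, which is what 14,000 slots for 14PB implies, and decimal units (1 PB = 1000 TB). The function name and default are illustrative only and do not come from the slides.

```python
def robot_capacity_pb(slots: int, tb_per_cartridge: float = 1.0) -> float:
    """Capacity of a fully populated tape robot in PB (decimal units).

    tb_per_cartridge is an assumption: 1 TB per slot is what the
    quoted "14,000 slots, 14PB" figure implies.
    """
    return slots * tb_per_cartridge / 1000.0


high_density_pb = robot_capacity_pb(14_000)
print(f"High density robot: {high_density_pb:.0f} PB")
```

Changing `tb_per_cartridge` shows how the same number of slots scales with denser media.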

Page 6: Tier0 Status

Resource Usage Efficiency
• CPU/Wall ratio has long been a concern:
• But utilisation of the public LXBATCH cluster is generally high:
• Still, we see many jobs waiting for tape recalls
  – New “backfill” option introduced to schedule short jobs when long waits for tape expected.
  – Nice improvement seen:
  – Need to review settings and publicise to improve impact.
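
To make the two ideas on this slide concrete, here is a minimal Python sketch. It assumes the CPU/Wall ratio is total CPU time divided by total wall-clock time, and it models backfill as a simple greedy fill of the expected tape wait with short jobs. The job names, numbers and the greedy rule are hypothetical; this is not the batch system's actual backfill algorithm or configuration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_seconds: float
    wall_seconds: float


def cpu_wall_ratio(jobs):
    """Total CPU time over total wall time; 0.0 if no wall time was used."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return 0.0
    return sum(j.cpu_seconds for j in jobs) / total_wall


def pick_backfill(queued_runtimes, expected_wait):
    """Greedy sketch: take queued short jobs, in queue order, while
    their combined runtime still fits inside the expected tape wait."""
    chosen, used = [], 0.0
    for runtime in queued_runtimes:
        if used + runtime <= expected_wait:
            chosen.append(runtime)
            used += runtime
    return chosen


# Hypothetical jobs: one CPU-bound, one mostly waiting for a tape recall.
jobs = [Job("cpu-bound", 3000, 3600), Job("tape-bound", 600, 3600)]
print(cpu_wall_ratio(jobs))

# Hypothetical queue of short-job runtimes (seconds) and a 1500 s tape wait.
print(pick_backfill([300, 900, 1200, 200], 1500))
```

A job stuck on a tape recall adds wall time but no CPU time, which is why filling the wait with short jobs raises the ratio for the slot as a whole.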

Page 7: Tier0 Status

SLC5 Migration
• Migration of batch resources underway
  – All new capacity introduced will be SLC5 based
  – Existing capacity migrated progressively.
• Migration of LXPLUS alias is an issue:
  – Principle is easy: switch when majority of batch capacity is SLC5. But measured where?
    • @ CERN: switch early
    • on grid: switch late.
  – No clear/obvious solution yet.
    • [Rapid migration of other grid sites would help. And is maybe sensible before September anyway?]
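
The "measured where?" problem can be illustrated with a short Python sketch of the majority rule. The site names and capacity numbers below are invented for illustration; only the rule itself (switch when more than half of the batch capacity is SLC5) comes from the slide.

```python
def slc5_fraction(capacity_by_site):
    """Fraction of batch capacity running SLC5.

    capacity_by_site maps site name -> (slc4_capacity, slc5_capacity).
    """
    total = sum(slc4 + slc5 for slc4, slc5 in capacity_by_site.values())
    if total == 0:
        return 0.0
    return sum(slc5 for _, slc5 in capacity_by_site.values()) / total


def should_switch_alias(capacity_by_site, scope):
    """Apply the majority rule to the sites named in scope only."""
    in_scope = {s: c for s, c in capacity_by_site.items() if s in scope}
    return slc5_fraction(in_scope) > 0.5


# Hypothetical capacities: CERN migrates quickly, other sites lag.
capacity = {
    "CERN": (400, 600),
    "SiteA": (900, 100),
    "SiteB": (800, 200),
}

print(should_switch_alias(capacity, {"CERN"}))
print(should_switch_alias(capacity, set(capacity)))
```

With these made-up numbers the CERN-only measure says switch while the grid-wide measure says wait, which is the tension the slide describes.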

Page 9: Tier0 Status

Agenda
• Resources
• CASTOR status and performance
  – Upstream services (SRM, FTS)
  – CASTOR status & plans
  – Metrics
• Progress with new data centre project

Page 10: Tier0 Status

SRM & FTS

November Status

• SRM 2.7 release is delayed
  – Originally foreseen in June but has still not passed testing/certification
  – Continue with 1.3 until LHC shutdown
    • SLC3 – hardware running out of warranty, retire/replace
    • Cannot be deployed in a fully redundant configuration
    • Built with an old CASTOR client, which constrains the stager deployment
• FTS 2.1 passed certification too close to LHC startup
  – Continue with 2.0 service (SLC3)
  – Setting up an independent 2.1 production service (SLC4) in parallel, allowing VOs to move when convenient

Pre-production clusters in service for all LHC VOs

Production deployment before end-2008

FTS 2.1 production service available

Still being “tested” by experiments but most production transfers already with this version

Page 11: Tier0 Status

CASTOR Status & Plans
• Status
  – Generally quiet/good...
  – ... except for tape repack
• BUT we are reasonably confident about our ability to support production; user analysis is the concern and there is no major load.
  – CASTOR 2.1.8, with integrated xrootd redirector, should deliver improvements for analysis
    • LSF bypass & reduced latency, but also improved scalability as xrootd daemon has smaller footprint than rfio (to be deprecated?)
    • Also delivers
      – end-to-end checksumming for rfio
      – User space accounting (required for later deployment of quotas)
      – operational improvements (notably automatic draining of disk servers)
      – fixes to problems identified by repack (main reason for deployment delays)
    • Schedule: end-Feb release, in production on c2cernt3 end-March, deployment for experiment instances in April.

Page 12: Tier0 Status

Performance metrics

November Status

• Metrics have been implemented and deployed on preproduction cluster
  – Data collected in lemon
  – RRD graphs not yet implemented
• Production deployment delayed for several reasons
  – New metrics imply several changes to exception/alarms and automated actions used in production
  – An unexpected technical dependency on the late SRM 2.7 version
    • Ongoing work to back-port the implementation

All still true

Much progress, but little visible; considering how best to group metrics for display
• e.g. group cache hits and garbage collection activity? However...

Page 16: Tier0 Status

Agenda
• Resources
• CASTOR status and performance
• Progress with new data centre project

Page 17: Tier0 Status

New data centre project
• Reminder: the selected strategy is to do a single tender for an overall solution
• Four-phase process developed:
  1. Request (many) conceptual designs
  2. Commission 3-4 companies submitting conceptual designs to develop an outline design
  3. In-house, turn a selected outline design into plans and documents enabling
  4. Single tender for overall construction.

Page 18: Tier0 Status

Outline Design Phase

November Status

• Deadline: 28th November
  – Contacts with all 4 companies during design phase
  – All 4 companies say deadline will be met
• Meetings to review proposed designs scheduled in week of December 8th.
• Market Survey in preparation as first stage in selection of company for detailed design & construction.
• Discussions in Oslo on 28th November to further investigate possible remote server installation in 2011 (and beyond)
  – RAL also have power available in 2011, but not as much and for a shorter period.

Page 19: Tier0 Status

Current Status
• Four designs reviewed
  – No clear winner, but consensus on leading design.
• New Management supports project. Good, but…
  – New requirements: “Green” & Prévessin heat recovery option
  – New organisation brings new players to brief
• “Single Contract for construction” agreed
• Agreement to work with one company to deliver fully acceptable design with modifications for new requirements.
  – Will lead to ~6 month delay.
  – [Personal view] Plan to continue with only one company should be agreed by Directorate now to avoid potential hiccups later. Frédéric Hemmer discussing with Sergio Bertolucci.
• Will need to revisit option to install equipment at University of Oslo.

Page 20: Tier0 Status

Questions?

Comments?