tony.cass@ cern.ch 1 issues hardware management –where are my boxes? and what are they? hardware...

6

1 Tony.Cass@CERN .ch Issues Hardware Management – Where are my boxes? and what are they? Hardware Failure – #boxes MTBF + Manual Intervention = Problem! Software Consistency – Operating system and managed components – Experiment software State Management – Evolve configuration with high level directives, not low level actions. Maintain service despite failures – or, at least, avoid dropping catastrophically below expected service level.

Upload: nigel-paul

Post on 14-Dec-2015

212 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

1 [email protected]

Issues Hardware Management

– Where are my boxes? and what are they? Hardware Failure

– #boxes MTBF + Manual Intervention = Problem! Software Consistency

– Operating system and managed components– Experiment software

State Management– Evolve configuration with high level directives, not

low level actions. Maintain service despite failures

– or, at least, avoid dropping catastrophically below expected service level.

Page 2: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

2 [email protected]

Hardware Management We are not used to handling boxes on this

scale.– Essential databases were designed in the ’90s for

handling a few systems at a time.» 2FTE-weeks to enter 450 systems!

– Chain of people involved» prepare racks, prepare allocations, physical install, logical

install» and people make mistakes…

Developing Hardware Management System to track systems– 1st benefit has been to understand what we do!– Being used to track systems as we migrate to our

new machine room.– Would now like SOAP interfaces to all databases.

Page 3: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

3 [email protected]

Hardware Failure MTBF is high, but so is the box count.

– 2400 disks @ CERN today: 3.5106 disk-hours/week

» 1 disk failure per week

Worse, these problems need human intervention.

Another role for the Hardware Management System– Manage list of systems needing local

intervention.» Expect this to be prime shift activity only; maintain list

overnight and present for action in the morning.

– Track systems scheduled for vendor repair» Ensure vendors meet contractual obligations for intervention» Feedback subsequent system changes (e.g. new disk, new

MAC address) into configuration databases.

Page 4: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

4 [email protected]

Keeping nodes in order

NodeConfiguration

SystemMonitoring

System

InstallationSystem

Fault MgmtSystem

Page 5: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

5 [email protected]

State Management Clusters are not static

– OS upgrades– reconfiguration for special assignments

» c.f. Higgs analysis for LEP

– load dependent reconfiguration» but best handled by common configuration!

Today:– Human identification of nodes to be moved,

manual tracking of nodes through required steps. Tomorrow:

– Give me 200, any 200. Make them like this. By then.

– A State Management System.» Development starting now.» Again, needs tight coupling to monitoring &

configuration systems.

Page 6: Tony.Cass@ CERN.ch 1 Issues Hardware Management –Where are my boxes? and what are they? Hardware Failure –#boxes MTBF + Manual Intervention = Problem!

6 [email protected]

Grace under Pressure The pressure:

– Hardware failures– Software failure

» 1 mirror failure per day» 1% of CPU server nodes fail per day

– Infrastructure failure» e.g. AFS servers

We need a Fault Tolerance System– Repair simple local failures

» and tell the monitoring system…

– Recognise failures with wide impact and take action» e.g. temporarily suspend job submission

– Complete system would be highly complex, but we are starting to address simple cases.

Planning the LCG Fabric at CERN openlab TCO Workshop November 11 th 2003 Tony.Cass@ CERN.ch

The CERN Computer Centres October 14 th 2005 Tony.Cass@ CERN.ch

Batch System Operation & Interaction with the Grid LCG/EGEE Operations Workshop May 25 th 2005 Tony.Cass@ CERN.ch

Die Cut Patterns Ballot Boxes Bin Boxes Display Boxes Gift Boxes Mailers Gift Boxes Mailers Folding Cartons Setup Boxes Software Packaging

Electric Hardware Identification Interactive Quiz Boxes Interactive Quiz Boxes

Shutters & Hardware • Flower Boxes • Interior & Exterior Doors · Shutters, Flower Boxes, Carriage House Doors, Windows, Entry Doors, Bi-folding Wall Systems, Exterior and Interior

ANCHORING - OHM International Line Hardware Maclean Power Systems Performance Line ... A.B. CHANCE - Pole Line Hardware, Anchoring A.B. CONTROLS - Junction Boxes A.O. SMITH - …

HEPiX Virtualisation Working Group Status, June 22 nd 2010 [email protected]

Using Low-cost Cryptographic Hardware to “Rob a …rnc1/talks/040311-BankRobbery.pdf · Using Low-cost Cryptographic Hardware to “Rob a Bank ... Some S-Boxes Have Structure •

2001 Summer Student Lectures Computing at CERN Lecture 4 — The Grid & Software Tony Cass — [email protected]

Physical Infrastructure Issues In A Large Centre July 8 th 2003 Tony.Cass@ CERN .ch

Trinity | A Full Service Packaging Supply Company · 2015-07-25 · Gift Boxes Moving Boxes Printers Boxes Easy Fold Mailers Tall Boxes a Bulk Cargo Boxes White Corrugated Boxes Insulated

boxes€¦ · Title: boxes Created Date: 20131114230207Z

Grandview Screen DYNAMIQUE R3 MANUAL - Cloudinary · 2021. 2. 25. · Step 1: check for boxes and spare parts; total of five boxes including hardware kit. Step 2: Check for Decorative

Designing Graphical User Interfaces Components of GUIs – icons, cursors, menus, dialogue boxes, form fillin, error messages Interface hardware - keyboards,

The rise of configurable I/O · 2016. 10. 11. · 12,500 I/O, using 48-CHARM boxes instead of 96-CHARM boxes adds about $785,000, or about 10% to the DeltaV DCS’s hardware costs,

Embedded Systems : Hardware or Software?jazi.staff.ugm.ac.id/Jazi- Embedded_systems.pdf · – Real-time video, set-top boxes, DVD players, medical equipment ... – Baik hardware

Hardware Cabinet Part 2 - WordPress.com · Hardware Cabinet Part 2 The original hardware cabinet is made entirely from old crating and cigar boxes. Most of its various parts have

Computer Centre Upgrade Status & Plans Post-C5, June 27 th 2003 Tony.Cass@ CERN.ch

LCG Status GridPP 8 September 22 nd 2003 Tony.Cass@ CERN.ch

COMPUTER MAIN PARTS. HARDWARE AND SOFTWARE HARDWARE It refers to all physical parts of a computer system. They are cables, cabinets or boxes, peripherals

1999 Summer Student Lectures Computing at CERN Lecture 2 — Looking at Data Tony Cass — [email protected]

FIO Services and Projects Post-C5 February 22 nd 2002 Tony.Cass@ CERN .ch

CONDUCTIVE BOXES + SMD-BOXES 8

FIO Services and Projects Post-C5 February 22 nd 2002 Tony.Cass@ CERN.ch

SMALL BUSINESS MANAGEMENT Chapter 11 Human Resources Management SF Hardware Family caseFamily case Family Case Cali Restaurant Boxes R US

Chapter 1 WIRING - pcmcindia.gov.in · material including hardware, binding wire, fish wire; accessories such as deep / long neck PVC junction boxes, PVC / MS junction / draw-in boxes,

Indexes, Index Boxes, Security Seals and Hardware · Indexes, Index Boxes, Security Seals and Hardware ... (e.g. 5 circle index ... Gearing Ratio Desired– Found by dividing the

Fabric Infrastructure LCG Review November 18 th 2003 Tony.Cass@ CERN.ch

Simly Smart • March 2018 Simply Smart...including food packaging, bakery boxes, medicine boxes, perfume boxes, kraft boxes, e-flute boxes, duplex and food board printed boxes, PE

IBM Research Wireless Security Initiatives - IBM WWW … · IBM Research Wireless Security Initiatives ... Smart Card Hardware Hardware ... Mobile phones PCMCIA cards Standalone boxes

2001 Summer Student Lectures Computing at CERN Lecture 1 — Looking Around Tony Cass — [email protected]

CASES - ENCLOSURES - FEET - HARDWARE & · PDF filec o n t e n t s potting boxes david use photo sd0010 cases - enclosures - feet - hardware & fittings abs boxes 2046 - 2048

Pizza Boxes | Custom Pizza Boxes