pep-ii reliability and uptime

PEP-II Reliability and Uptime Roger Erickson 10 October 2003 With thanks to C.W. Allen, W. Colocho, P. Schuh, M. Stanek, and the Operations staff members who collected the data.


Page 1: PEP-II Reliability and Uptime

PEP-II Reliability and Uptime

Roger Erickson
10 October 2003

With thanks to C.W. Allen, W. Colocho, P. Schuh, M. Stanek, and the Operations staff members who collected the data.

Page 2: PEP-II Reliability and Uptime

Excludes “long” downtimes and holiday shut-downs.

Page 3: PEP-II Reliability and Uptime

Statistics: Causes of Unscheduled Down Time

• 3 PEP-II running periods considered: January 2000 through June 2003.

• 22,936 total scheduled operating hours.

• 2994 hours unscheduled down time.

• 5469 reported malfunctions (“events”).

• 1317 events directly tied to lost hours.

We can sort the data by area of the machine (HER, linac, etc.), by system categories (RF, vacuum, etc.), by date, and by details of resolution.

Page 4: PEP-II Reliability and Uptime

Accelerator Performance Statistics

Definitions:

Revealed failures: malfunctions resulting in lost beam time. Also called “events”.

Unscheduled down time: hours lost from scheduled program due to malfunctions.

Mean Time to Fail:

MTTF = Scheduled beam time / Events

Mean Time to Repair:

MTTR = Unscheduled down time / Events

Availability = 1 - (Unscheduled down time / Scheduled beam time)

NOTE: PEP-II aborts are not counted as downtime, unless the event is reported; i.e., unless we stop to fix something and make a database entry.
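As a rough illustration, the three definitions above can be applied to the combined totals quoted on the statistics slide. This is a minimal sketch: the function name is hypothetical, and it assumes that the 1317 events tied to lost hours are the "events" meant in the definitions.

```python
def reliability_metrics(scheduled_hours, downtime_hours, events):
    """Return (MTTF, MTTR, availability) as defined on this slide."""
    mttf = scheduled_hours / events          # mean time to fail, hours
    mttr = downtime_hours / events           # mean time to repair, hours
    availability = 1.0 - downtime_hours / scheduled_hours
    return mttf, mttr, availability

# Combined totals, January 2000 through June 2003 (from the statistics slide).
# Assumption: the 1317 events tied to lost hours are the relevant event count.
mttf, mttr, avail = reliability_metrics(22936, 2994, 1317)
print(f"MTTF = {mttf:.2f} h, MTTR = {mttr:.2f} h, availability = {avail:.1%}")
# MTTF = 17.42 h, MTTR = 2.27 h, availability = 86.9%
```

These combined values fall between the per-run figures, as expected for an average over all three runs.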

Page 5: PEP-II Reliability and Uptime
Page 6: PEP-II Reliability and Uptime
Page 7: PEP-II Reliability and Uptime
Page 8: PEP-II Reliability and Uptime

PEP-II Run Totals

Run 1: 1/12/00 – 10/31/00
Run 2: 2/4/01 – 6/30/02
Run 3: 11/15/02 – 6/30/03

Long annual downtimes and holiday shut-downs are not included.

Page 9: PEP-II Reliability and Uptime

Hardware Availability by Run

         MTTF      MTTR      Availability
         (hours)   (hours)   (percent)

Run 1    18.57     2.39      87.1

Run 2    17.88     2.02      88.7

Run 3    15.28     2.63      82.8

MTTF has been getting shorter (worse) each run.
MTTR improved from Run 1 to Run 2, but got worse during Run 3.
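Because MTTF and MTTR share the same event count in their definitions, availability can be recovered as 1 - MTTR/MTTF. The sketch below checks that the three columns of the table are mutually consistent; the variable and function names are illustrative only.

```python
# (MTTF hours, MTTR hours, quoted availability percent) from the table above.
runs = {
    "Run 1": (18.57, 2.39, 87.1),
    "Run 2": (17.88, 2.02, 88.7),
    "Run 3": (15.28, 2.63, 82.8),
}

def availability_percent(mttf, mttr):
    """Availability = 1 - downtime/scheduled = 1 - MTTR/MTTF (events cancel)."""
    return 100.0 * (1.0 - mttr / mttf)

derived = {name: availability_percent(mttf, mttr)
           for name, (mttf, mttr, _) in runs.items()}

for name, (_, _, quoted) in runs.items():
    print(f"{name}: derived {derived[name]:.1f}%, quoted {quoted:.1f}%")
```

The derived and quoted values agree to within rounding for all three runs.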

Page 10: PEP-II Reliability and Uptime

Unscheduled Downtime by Major System

System          Run 1    Run 2    Run 3

Injection         5.6      5.0      4.2

PEP Rings         6.8      4.6     10.7

BaBar             0.3      1.2      0.8

PG&E              0.2      0.5      1.5

Availability     87.1     88.7     82.8

Total           100.0    100.0    100.0

Unscheduled down time (percentage), sorted by responsible system.
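One way to read this table is to ask what share of each run's unscheduled downtime belongs to each system. The sketch below, with hypothetical names, first checks that every column adds up to 100 percent and then normalizes the downtime rows.

```python
# Percent of scheduled time lost per system, for (Run 1, Run 2, Run 3).
downtime_pct = {
    "Injection": (5.6, 5.0, 4.2),
    "PEP Rings": (6.8, 4.6, 10.7),
    "BaBar":     (0.3, 1.2, 0.8),
    "PG&E":      (0.2, 0.5, 1.5),
}
availability_pct = (87.1, 88.7, 82.8)

def downtime_shares(run_index):
    """Each system's fraction of the total unscheduled downtime in one run."""
    total = sum(values[run_index] for values in downtime_pct.values())
    return {name: values[run_index] / total
            for name, values in downtime_pct.items()}

# Column totals: downtime rows plus availability should give 100 percent.
column_totals = [
    sum(values[i] for values in downtime_pct.values()) + availability_pct[i]
    for i in range(3)
]

run3_shares = downtime_shares(2)
for name, share in run3_shares.items():
    print(f"Run 3 {name}: {share:.0%} of unscheduled downtime")
```

On these numbers the PEP rings account for roughly 62 percent of the Run 3 unscheduled downtime.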

Page 11: PEP-II Reliability and Uptime
Page 12: PEP-II Reliability and Uptime

MTTR : PEP-II Rings

                  MTTR (hours)              Run 1            Run 2            Run 3
System            Run 1   Run 2   Run 3     Events  DT hrs   Events  DT hrs   Events  DT hrs

Power Supplies     2.37    1.52    1.50       61    144.7      97    147        83    124.9

Magnets            3.05    2.50    4.80        2      6.1       3      7.5       3     14.4

RF                 2.47    1.80    2.71       55    135.8      58    104.2      47    127.6

Vacuum            10.58    3.82   28.68        5     52.9      26     99.4       6    172.1

Utilities          3.29    1.93    1.88       14     46        28     53.9      12     22.6

Controls           1.39    1.45    1.69       42     58.5      63     91.3      32     54.0

Safety             0.70                        1      0.7

Other              2.85    1.69    4.13        2      5.7       8     13.5       6     24.8

Totals                                       182    450.4     283    516.8     189    540.4
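Each MTTR entry in this table is the downtime hours divided by the number of events for that system and run. A minimal sketch using the Run 3 columns (the dictionary and variable names are illustrative):

```python
# Run 3, PEP-II rings: system -> (events, downtime hours), from the table above.
run3 = {
    "Power Supplies": (83, 124.9),
    "Magnets":        (3, 14.4),
    "RF":             (47, 127.6),
    "Vacuum":         (6, 172.1),
    "Utilities":      (12, 22.6),
    "Controls":       (32, 54.0),
    "Other":          (6, 24.8),
}

# MTTR per system = downtime hours / events.
mttr = {system: hours / events for system, (events, hours) in run3.items()}

total_events = sum(events for events, _ in run3.values())
total_hours = sum(hours for _, hours in run3.values())

# Rank systems by total downtime hours, largest first.
by_downtime = sorted(run3, key=lambda s: run3[s][1], reverse=True)

for system in by_downtime:
    events, hours = run3[system]
    print(f"{system:15s} {events:3d} events {hours:6.1f} h  MTTR {mttr[system]:5.2f} h")
```

Sorting by downtime hours rather than event count puts vacuum first: only 6 events, but the longest repairs.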

Page 13: PEP-II Reliability and Uptime

Time Required for Repairs

Beam time lost         Events   Percent of      Hours     % of
                                total events    down      total DT

> 0 to 1.0 hours         641      48.7%          383.4     12.8%

> 1.0 to 2.0 hours       286      21.7%          463.6     15.5%

> 2.0 to 4.0 hours       241      18.3%          723.0     24.1%

> 4.0 to 8.0 hours        85       6.5%          485.8     16.2%

> 8.0 to 24.0 hours       56       4.3%          686.0     22.9%

> 24.0 hours               8       0.6%          252.7      8.4%

Total                   1317     100.0%         2994.5    100.0%

Combined data set from all three runs.
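The table can also be summarized cumulatively: what fraction of events took longer than a given time to repair, and how much of the total downtime they account for. The sketch below does this for repairs longer than 2 hours; the names are hypothetical.

```python
# (bin label, events, hours down) for all three runs combined.
bins = [
    ("> 0 to 1.0 hours",    641, 383.4),
    ("> 1.0 to 2.0 hours",  286, 463.6),
    ("> 2.0 to 4.0 hours",  241, 723.0),
    ("> 4.0 to 8.0 hours",   85, 485.8),
    ("> 8.0 to 24.0 hours",  56, 686.0),
    ("> 24.0 hours",          8, 252.7),
]

def tail_share(bins, first_bin):
    """Fraction of events and of downtime hours in bins[first_bin:]."""
    total_events = sum(events for _, events, _ in bins)
    total_hours = sum(hours for _, _, hours in bins)
    tail_events = sum(events for _, events, _ in bins[first_bin:])
    tail_hours = sum(hours for _, _, hours in bins[first_bin:])
    return tail_events / total_events, tail_hours / total_hours

# Index 2 is the first bin above 2 hours.
event_share, hour_share = tail_share(bins, 2)
print(f"Repairs > 2 h: {event_share:.1%} of events, {hour_share:.1%} of downtime")
```

For the combined data set, about 30 percent of events took more than 2 hours to repair and account for about 72 percent of the downtime.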

Page 14: PEP-II Reliability and Uptime

PEP Rings Events Requiring > 2 hours to Repair

Run 3 Data:

33% of PEP ring events require > 2 hours to repair.

These account for 81% of PEP ring down time.

Page 15: PEP-II Reliability and Uptime

Problems Requiring > 24 hours to Fix

January 2000 – June 2003:

• 5 vacuum chamber failures in PEP rings. Some known vulnerabilities were already receiving attention. Vacuum task force is studying options for upgrading some chambers.

• 2 site-wide electrical power outages. These were outside SLAC’s control.

• SLTR quadrupoles overheated when cooling water pump stopped, but power remained on.

Page 16: PEP-II Reliability and Uptime

Recent Problems Requiring > 24 hours to Fix

August 20, 2003:

VVS transformer failure in linac.

• Failure occurred during E158; no impact on PEP. Two days for full recovery.

• Failure was in the only dry-type transformer among 16 VVS’s. Oil-filled, fixed-ratio replacement options being investigated.

September 12, 2003:

Site-wide power failure when tree grew too close to 230 kV line. Time lost to PEP program > 47 hours.

• Tree trimming had not been done on established schedule.

• SLAC now has new contract with tree-trimmer company, with option to renew for five years.

Page 17: PEP-II Reliability and Uptime

Underlying Problems Sometimes Cross Technical and Jurisdictional Boundaries

• Seasonal high ambient temperatures cause drift, jitter, timing-shifts, spurious trips, and sometimes component failures in power supplies and sensitive electronics.

• Plan to air-condition the electronics alcove at Linac Sector 0, which houses the master oscillator and electronics critical to accelerator timing. A contract has been awarded.

• Several PEP support buildings have temperature control problems on hot days. More needs to be done to identify cost-effective improvements.

An example of a problem not easily identified by counting malfunction reports.

Page 18: PEP-II Reliability and Uptime

Injection and Tuning

Normal top-off:

Typically 4 to 5 minutes to fill at intervals of 40 to 50 min. Approx. 10% of scheduled run time.

Why is 21% spent injecting and tuning?

Beam aborts require fill from scratch; typically 15 to 25 minutes each time.
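The 10% figure for normal top-off follows from the fill time and fill interval quoted above. This is a rough sketch only: it assumes the 40 to 50 minute interval is measured from the start of one fill to the start of the next, which the slide does not state.

```python
def top_off_fraction(fill_minutes, interval_minutes):
    """Fraction of run time spent filling, if one fill starts every interval.

    Assumption: interval_minutes is measured start-to-start.
    """
    return fill_minutes / interval_minutes

best = top_off_fraction(4, 50)       # shortest fill, longest interval
typical = top_off_fraction(4.5, 45)  # midpoints of both ranges
worst = top_off_fraction(5, 40)      # longest fill, shortest interval
print(f"{best:.1%} to {worst:.1%}, typical {typical:.1%}")
# 8.0% to 12.5%, typical 10.0%
```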

Page 19: PEP-II Reliability and Uptime

Beware of Double counting: An abort in one ring usually leads to an abort in the other.

Page 20: PEP-II Reliability and Uptime

HER RF Aborts

Station Run 2 Run 3

– 12-1:   0.33    1.1 aborts/day
– 12-3:   0.50    0.34
– 8-1:    0.22    0.57
– 8-3:    0.50    0.68
– 8-5:    0.51    0.66
– 12-6:           1.65*

Total =   2.1     5.0 aborts/day

– All stations were worse in 2003, except 12-3.

* 12-6 fault accounting only available since 10-May-2003.
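The totals and the remark about station 12-3 can be checked directly from the per-station rates. The sketch below uses hypothetical names and treats the missing Run 2 entry for station 12-6 as unavailable rather than zero.

```python
# HER RF aborts per day: station -> (Run 2, Run 3). None means no data.
her_aborts = {
    "12-1": (0.33, 1.1),
    "12-3": (0.50, 0.34),
    "8-1":  (0.22, 0.57),
    "8-3":  (0.50, 0.68),
    "8-5":  (0.51, 0.66),
    "12-6": (None, 1.65),  # fault accounting only since 10-May-2003
}

run2_total = sum(r2 for r2, _ in her_aborts.values() if r2 is not None)
run3_total = sum(r3 for _, r3 in her_aborts.values())

# Stations whose abort rate went down from Run 2 to Run 3.
improved = [station for station, (r2, r3) in her_aborts.items()
            if r2 is not None and r3 < r2]

print(f"Run 2 total: {run2_total:.1f}/day, Run 3 total: {run3_total:.1f}/day")
print("Improved stations:", improved)
```

Note that part of the rise in the Run 3 total comes from station 12-6, which has no Run 2 entry to compare against.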

Page 21: PEP-II Reliability and Uptime

LER RF Aborts

Station Run 3

– 4-3: 0.88 aborts/day*
– 4-4: 0.55 (was 0.56 in 2002)
– 4-5: 0.55 (was 0.53 in 2002)

Total = 2 aborts per day

* 4-3 fault accounting only available since 10-May-2003.

Page 22: PEP-II Reliability and Uptime

BaBar Radiation Aborts

3-year trend, based on data latched by accelerator control system:

– 2000: 5.6 aborts/day
– 2001: 4.1
– 2002: 3.6
– 2002/3: 2.8

Page 23: PEP-II Reliability and Uptime

Injection and Tuning Summary

Percentages of scheduled operating hours:

• Normal top-offs: 10%

Fill from scratch following:

• RF aborts: 6.3%

• BaBar radiation aborts: 3.5%

Approximate total: 20%

Trickle charging could have significant beneficial impact!
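The abort percentages can be related to the abort rates and refill times given on earlier slides. The sketch below is an estimate under stated assumptions, not the author's calculation: it takes the 2002/3 BaBar rate of 2.8 aborts/day, assumes every abort costs one fill from scratch, and spreads the cost over a 24-hour day. All names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60

def refill_fraction(aborts_per_day, refill_minutes):
    """Fraction of the day spent refilling from scratch after aborts."""
    return aborts_per_day * refill_minutes / MINUTES_PER_DAY

def implied_refill_minutes(fraction, aborts_per_day):
    """Refill time per abort that would produce the given time fraction."""
    return fraction * MINUTES_PER_DAY / aborts_per_day

babar_rate = 2.8  # aborts/day, 2002/3 figure from the BaBar slide

low = refill_fraction(babar_rate, 15)    # 15-minute refills
high = refill_fraction(babar_rate, 25)   # 25-minute refills
implied = implied_refill_minutes(0.035, babar_rate)

# Sum of the three contributions listed on this slide, in percent.
total_pct = 10 + 6.3 + 3.5

print(f"BaBar aborts: {low:.1%} to {high:.1%} of time")
print(f"3.5% corresponds to about {implied:.0f} min per refill")
print(f"Sum of listed contributions: {total_pct:.1f}%")
```

Under these assumptions the quoted 3.5% sits inside the range implied by 15 to 25 minute refills, and the three contributions add up to 19.8%, matching the approximate total of 20%.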

Page 24: PEP-II Reliability and Uptime

Scheduled Off Time

• No routine scheduled maintenance days.

• Repair Opportunity Days (“RODs”) are launched when needed for show-stoppers or upgrade projects (typically 1/month).

• As many ROD and SML jobs as possible are completed during program interruption (typically 50 to 100 identified jobs).

Page 25: PEP-II Reliability and Uptime

Personnel Protection System (PPS) Testing

• Formerly required approx 3 months of beam-off, most of which was folded into long downtimes, but “verifications” were required at 6-month intervals.

• Net impact on PEP program depended on interval between long downtimes. Typically about 2 weeks/year.

• New policies and procedures have reduced testing to about 3 weeks once each year to coincide with long downtimes, plus operator interlock checks.

Page 26: PEP-II Reliability and Uptime

Opportunities for Further PPS Testing Improvements

• Add switches and indicators to further decouple zones/subsections/systems for testing purposes.

• Further streamline test procedures (much progress made last year).

• Train/authorize more staff members, so that testing can be done 24 hours/day when opportunities arise.

Additional uptime to be gained?
Possibly 1 week/year, depending on long downtime schedule and “opportunistic” down days.

Long-range proposal: Replace linac and BSY PPS with modern system to facilitate testing and minimize downtime for diagnosing problems.

Page 27: PEP-II Reliability and Uptime

How to Increase PEP-II Up Time: Challenges to Ourselves

• Allocate resources among hardware projects to achieve optimal improvement in MTTF.

• Identify common-mode or infrastructure projects that will improve overall uptime and stability.

• Find ways to reduce frequency of aborts.

• Minimize scheduled off time through policy and procedure changes and aggressive scheduling.

• Reduce MTTR with improved procedures, diagnostic tools, and organizational efficiency.