PREPARATORY COMMISSION. Disclaimer: The views expressed on this poster are those of the author and do not necessarily reflect the view of the CTBTO.
SnT2015 Poster No. T4.1-P02: Failure Mode and Effect Analysis (FMEA) of the IMS radionuclide stations in the years 2013 and 2014
Bernd Wernsperger, Matthias Auer, Theo Juurlink
PTS, IMS Division, Engineering and Development Section, Vienna International Center, A-1400 Vienna
Failure Modes in the
IMS RADIONUCLIDE PARTICULATE NETWORK

Failure mode and effect analysis was carried out for the IMS particulate network for the period 2013/10/01 to 2014/10/31. In this period 63 radionuclide particulate stations were under certified operation, achieving a network data availability of 85.5%: out of 24760 days of expected data, the particulate stations suffered a total downtime of 3598 days. The main contribution to the reduction of data availability results from problems related to station equipment (11.2%); the other downtime categories contribute no more than 1% each to the total downtime (see Figure 1). Among the station equipment failures, the major part of downtime (72.6%) is caused by problems related to the detection system: detector-related problems alone reduced the particulate network data availability by 8.7% in the considered period. The next largest contributions to network downtime come from problems related to station power (reducing data availability by 0.9%) and the computer system (reducing data availability by 0.8%).
Figure 1: Available and non-available data (%) for the IMS particulate network, 2013/10/01-2014/10/31
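The availability figures quoted above follow directly from expected station-days and downtime. A minimal sketch of the computation, using the totals stated on the poster (the function name is illustrative, not from the poster):

```python
def availability_pct(expected_days: float, downtime_days: float) -> float:
    """Network data availability in percent: 100 * (1 - downtime / expected)."""
    return round(100.0 * (1.0 - downtime_days / expected_days), 1)

# Particulate network, 1 Oct 2013 - 31 Oct 2014: 24760 expected days, 3598 days downtime.
print(availability_pct(24760, 3598))   # -> 85.5
# SAUNA network, 1 Jan 2013 - 31 Jan 2015: 7194 expected days, 715.5 days downtime.
print(availability_pct(7194, 715.5))   # -> 90.1
```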
Failure Modes in the
IMS NOBLE GAS NETWORK – SPALAX Stations
In the years 2013 and 2014 the SPALAX noble gas network grew from two to nine certified stations. In the period between 2013/01/01 and 2015/01/31 there were 4965 days of data expected from the SPALAX network. 1311 days of downtime reduced the network data availability by 26.7%, with the major contribution (23.9%) resulting from failures related to station equipment. Of the equipment-related downtime, 46% is related to the HPGe detector system and 32.6% to organisational issues with detector replacement (shipping time and quality, setup of new detectors). It should be noted that for the SPALAX network, detector-related downtime reflects single incidents at single stations with excessively long downtimes rather than a statistical distribution among many stations. Nevertheless, the HPGe detection systems show the same problems as in the radionuclide particulate network, and the same three areas of improvement must be focused on.
[Chart: SPALAX data availability, 1 Jan 2013 - 31 Jan 2015; 9 certified SPALAX systems; 4965 days of data expected; total downtime: 1311 days. Data available 73.7%; station equipment 23.9%; power outage 1.7%; planned activity 0.3%; other 0.5%.]
[Chart: Distribution of equipment failures causing downtime, SPALAX, 1 Jan 2013 - 31 Jan 2015; total downtime by station equipment failures: 1187 days. Detection system 46.0%; organisational issues related to detector replacement 32.6%; hardware/control electronics 9%; station infrastructure 7.1%; sample processing system 5.4%; air sampling 4.2%; software 1.7%.]
[Chart: Distribution of station-equipment related failures (%), 1 Oct 2013 - 31 Oct 2014; 63 stations; total downtime related to station equipment: 2775 days. Detection system 77.8%; station power 7.8%; computer 6.9%; data flow 5.2%; sampler 1.2%; filter management 1.1%.]
[Chart: Available and non-available data (%), 1 Oct 2013 - 31 Oct 2014; 63 radionuclide stations; 24760 days of data expected; total downtime: 3598 days. Data available 85.5%; equipment failure 11.2%; environment 1.0%; human error 0.6%; planned activity 0.5%; power outage 0.4%; unspecified 0.4%; third party 0.2%; GCI/ISN 0.2%.]
FAILURE MODES OF HPGE DETECTION SYSTEMS

High-purity germanium (HPGe) detection systems are used at IMS particulate stations and by SPALAX noble gas systems. Failure mode and effect analysis of IMS stations in the years 2013 and 2014 has shown that the HPGe detection systems are the most sensitive component of IMS radionuclide stations. Therefore, a specific analysis of failure modes of detection systems was carried out for the operational period 2013/01/01 - 2014/10/31, when 72 IMS detection systems were in certified operation at 63 particulate stations and 9 SPALAX noble gas stations. In this period 101 incidents related to the detection systems and 14 infrastructure-related problems caused a total of 115 incidents, resulting in detector downtime of 3146 days (out of a total of 45953 days, i.e. 6.8%).
Figure 7 shows the distribution of downtime over the problem categories related to the detection system. Besides equipment-related problems, a significant contribution to detector downtime (15%) in the considered period was added by delays in detector setup procedures (i.e. issues with calibration, background measurements, data format, communication, …) and by third parties causing additional downtime (24.2%). The latter was related mainly to detectors broken during shipment or to organisational delays in the provision of detectors by the suppliers. A significant contribution to downtime (18.4%) was also identified from failures of station infrastructure which resulted in broken detection systems; the main reasons for infrastructure failures were failing power systems (357 days) and air-conditioning systems (215 days). Vacuum problems (13.0%) and broken preamplifiers (12.8%) still created significant downtimes, as these problems required replacement of detectors. Broken FETs (3.9%) occurred rarely. Problems with MCAs and cooling systems (6.4% each) occurred often, but downtimes were kept low because these parts were available as on-site spares. Basically, due to cost considerations only selected components of the detection system (such as MCAs, specific electrical coolers, and cooler controllers) are stored as on-site spares at the station. Full back-up detection systems are available only at very few IMS stations, where this is justified by the remoteness of the site. Figures 7a and 7b show the distribution of downtimes and incidents for Ortec detection systems. Here, the main contributions to downtime result from preceding infrastructure failures, delayed setup times and third-party delays, as well as from vacuum problems requiring replacement of the whole detector. Notably, many incidents with MCAs and coolers result in comparably low downtime because spare parts were available at the station. Figures 7c and 7d show the respective distributions for Canberra detectors. Again, the main contributions to downtime come from the same problem categories (infrastructure, setup times, third party, and preamplifier problems requiring full detector replacement).
Figure 8 shows the averaged downtime per incident, grouped according to the problem categories. Components that are stocked at the station (cooling equipment, MCAs) cause much less downtime when they break than components whose replacement requires shipping procedures. This is underlined in Table 1 below, where the averaged downtime for an incident with an on-site spare part is 7.9 days, whereas an average of 111.8 days of downtime was experienced when spare parts and/or the detector had to be shipped to the station.
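The Table 1 comparison is plain division of total downtime by incident count. A sketch using the poster's totals (the function name is illustrative, not from the poster):

```python
def avg_downtime_per_incident(total_downtime_days: float, n_incidents: int) -> float:
    """Average downtime per incident in days, rounded to one decimal."""
    return round(total_downtime_days / n_incidents, 1)

# Incidents requiring spare-part/detector shipment: 1900 days over 17 incidents.
print(avg_downtime_per_incident(1900, 17))    # -> 111.8
# Incidents with an on-site spare: 667 days over 84 incidents.
print(avg_downtime_per_incident(667, 84))     # -> 7.9
# All detection-system incidents: 2567 days over 101 incidents.
print(avg_downtime_per_incident(2567, 101))   # -> 25.4
```

The roughly 14:1 ratio between the shipped and on-site cases is the quantitative argument behind the sparing-policy strategy below.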
Failure Modes in the
IMS NOBLE GAS NETWORK – SAUNA Stations
The SAUNA noble gas network consisted of 10 stations throughout the years 2013 to 2014. The SAUNA network reached a data availability of 90.1% from 2013/01/01 to 2015/01/31. SAUNA's data availability is reduced mainly by failures of station equipment (7.0%). The equipment-related downtime is distributed fairly evenly among the different system components, although downtime of a given component (for example, infrastructure) is often driven by single incidents at single stations. Downtime related to certain system components has already been investigated thoroughly with a view to upgrading the system, and improvement of the SAUNA system is already underway through the installation of upgrades. The SAUNA upgrade consists of further developed components in the detection system (memory-free beta cells, digital signal processing in the detector electronics), improved sampling traps, new PCs, an improved design of the processing system, replacement of the gas chromatograph, and a sample archive extended to 7 days. Furthermore, the motor of the QC source, which sometimes caused downtime, is being re-engineered.
[Chart: SAUNA data availability, 1 Jan 2013 - 31 Jan 2015; 10 SAUNA stations; 7194 days of data expected; total downtime: 715.5 days. Data available 90.1%; station equipment 7.0%; other/unspecified 1.5%; planned activity 1.3%; GCI/ISN 0.1%.]
[Chart: Distribution of downtime causes related to equipment failures, SAUNA, 1 Jan 2013 - 31 Jan 2015; 10 stations; total downtime related to station equipment failures: 507 days. Gas quantification system 21.2%; system power management 19.5%; station infrastructure 18.4%; nuclear detection system 14.5%; air sampling 12.0%; software 11.2%; hardware/control electronics 2.3%; sample processing system 0.8%.]
[Bar chart: averaged downtime per incident (days, axis 0-180) by detection-system problem category.]
Total days of detection-system related downtime: 2567 days
Total number of detection-system related incidents: 101
Average downtime per incident: 25.4 days
Total downtime for incidents requiring spare-part shipment: 1900 days
Number of incidents requiring spare-part shipment: 17
Average downtime per incident requiring spare-part shipment: 111.8 days
Total downtime for incidents when spare part was on-site: 667 days
Number of incidents when spare part was on-site: 84
Average downtime per incident when spare part was on-site: 7.9 days
Figure 2: Distribution of equipment related downtime for IMS particulate network 2013/10/01-2014/10/31
Figure 3: Data availability and downtime contribution for IMS SPALAX noble gas network 2013/01/01-2015/01/31
Figure 4: Distribution of equipment related downtime for SPALAX noble gas network 2013/01/01-2015/01/31
Figure 5: Data availability and downtime contribution for IMS SAUNA noble gas network 2013/01/01-2015/01/31
Figure 6: Distribution of equipment related downtime for SAUNA noble gas network 2013/01/01-2015/01/31
Figure 7: Distribution of downtime related to HPGe detection system components (2013/01/01-2014/10/31)
STRATEGIES FOR IMPROVING DETECTION SYSTEM UPTIME
1. SPARING POLICY: In order to improve the detector supply, more regional depots should be established to shorten shipping distances and increase the flexibility of supply. The number of stations holding on-site spare detectors could be increased. For on-site spares, appropriate quality control and assurance measures must be in place to avoid the operating and spare systems failing at the same time.
2. STREAMLINING AND ENFORCING OF DETECTOR SETUP PROCEDURES: A process of streamlining the detector setup procedure at the PTS has recently been put in place. Related to this, communication and instructions to the station operators must be enforced, and station operators must follow them in a timely manner.
3. TECHNOLOGY IMPROVEMENT IN COOPERATION WITH SUPPLIERS: Intensified cooperation with detector suppliers to improve detector reliability started in the second half of 2014. In parallel, liquid nitrogen (LN2) cooled detectors are still being explored as an option; LN2 generators could make LN2 available independently of third-party suppliers. Such equipment is currently being tested.
4. IMPROVED SHIPMENT PROCEDURES: The PTS has started to cooperate with detector suppliers on improvements in this area; specifically, the packaging of detection systems will be further ruggedized and standardized for the IMS network.
5. IMPROVEMENT OF STATION INFRASTRUCTURE: Stable station air conditioning and power systems are a key factor for the operation of HPGe detection systems. Failures in the power system and the air conditioning can result in significant downtimes, as was seen in the past period. Engineering projects to improve station power have therefore already started for several stations. Station air conditioning receives special focus as well, because the quality of the station climate control is directly linked to the detector systems.
Figure 8: Averaged downtime per incident, grouped according to detection-system problem category
Table 1: Comparison of averaged downtimes per incident depending on on-site availability of spare parts
Figure 7a: Downtimes of Ortec detection systems
Figure 7b: Incidents causing downtime of Ortec detection systems
Figure 7c: Downtimes of Canberra detection systems
Figure 7d: Incidents causing downtime of Canberra detection systems
[Chart (Figure 7d): Distribution of incidents with downtime of Canberra detector components. Electronics - MCA 20.0%; electronics - preamplifier 20.0%; DT caused by third party 15.0%; infrastructure failure causing detector DT 15.0%; temperature sensor 10.0%; cooling system 5.0%; electronics - FET 5.0%; vacuum problem (cryostat, sieves…) 5.0%; delayed detector setup time 5.0%.]
[Chart (Figure 7c): Distribution of downtime of Canberra detector components. DT caused by third party 43.5%; electronics - preamplifier 27.3%; delayed detector setup time 13.3%; infrastructure failure causing detector DT 9.6%; vacuum problem (cryostat, sieves…) 3.4%; electronics - MCA 1.6%; temperature sensor 0.5%; electronics - FET 0.4%; cooling system 0.3%.]
[Chart (Figure 7b): Distribution of incidents with downtime of Ortec detector components. Electronics - MCA 45.3%; cooling system 26.3%; infrastructure failure causing detector DT 11.6%; vacuum problem (cryostat, sieves…) 4.2%; electronics - FET 3.2%; electronics - preamplifier 3.2%; delayed detector setup time 3.2%; detailed info missing/not documented 2%; DT caused by third party 1.1%.]
[Chart (Figure 7a): Distribution of downtime of Ortec detector components. Infrastructure failure causing detector DT 22.8%; vacuum problem (cryostat, sieves…) 17.9%; delayed detector setup time 12.1%; detailed info missing/not documented 10.1%; cooling system 9.5%; electronics - MCA 8.8%; DT caused by third party 7.7%; electronics - FET 5.7%; electronics - preamplifier 5.5%.]
[Chart (Figure 7): Distribution of downtime related to problems with the HPGe detection system; affected system components in the period 2013/01/01 - 2014/10/31; total downtime related to detection systems: 3146 days (115 incidents). DT caused by third party 24.2%; infrastructure failure causing detector DT 18.4%; vacuum problem (cryostat, sieves…) 13.0%; electronics - preamplifier 12.8%; delayed detector setup time 12.5%; detailed info missing/not documented 6.7%; cooling system 6.4%; electronics - MCA 6.4%; electronics - FET 3.9%; temperature sensor 0.2%. Further legend categories without labelled values: Ge technology (leakage current); electronics - cooler controller; electronics - high voltage; electronics - HV filter; mechanical issue (contact…); human error (window broken, mistreatment…); other.]