10.1007_978-3-642-24553-4_37


  • 7/28/2019 10.1007_978-3-642-24553-4_37


    Reliability of Standby Systems

    Salvatore Distefano

    Dipartimento di Matematica, Università di Messina, C.da Di Dio, 98166 Messina, Italy


    Dipartimento di Elettronica e Informazione, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy


    Abstract. Reliability theory is based on the concept of Boolean components, i.e. components that are either up (operating) or down (failed). Often, however, such an assumption is not adequate for modeling specific behaviors of components, units, subsystems and systems. It cannot capture, for example, the different operating conditions of components due to dependencies on other components or to environmental variations.

    The aim of this paper is to investigate a specific dynamic behavior, the standby phenomenon, in reliability contexts, starting from its characterization from both the internal and the external perspectives. The formal specification of the problem is obtained through dynamic reliability theory, which provides its analytical formulation.

    Keywords: Standby, redundancy, standby redundant systems, k-out-of-n standby redundancy.

    1 Introduction and Motivations

    Standby is a hot topic in reliability, as the literature also highlights. Several techniques have been used specifically to evaluate the reliability of standby systems. For example, in [1] and [2] renewal theory and semi-Markov models are exploited to evaluate specific case studies such as three-state systems, systems with mixed constant repair time, systems with multi-phase repair, systems with non-regenerative states, two-component systems with cold standby and maintenance, and so on. The method of supplementary variables and the Laplace transform are instead used in [3,4] to evaluate the stationary availability of n-unit parallel redundant systems with correlated failures and a single repair facility.

    However, to the best of our knowledge, the specific literature only partially addresses, or lacks, some aspects that arise when the problem is evaluated from different perspectives. In particular, as introduced above, the concept of standby is always mixed with that of redundancy. This can lead to misunderstandings, or even to approximations that may prove wrong and dangerous in system design.

    The main aim of this paper is to fill these gaps by focusing on standby and investigating in depth the related phenomena in reliability contexts from both the internal and the external viewpoints. In the former case, the unit is observed in isolation, without taking into account its interactions with the external environment, in order to characterize and evaluate its internal behaviour. Then the unit behaviour is evaluated also considering such interactions, in a larger system context. With the support of dynamic reliability theory, the characterization thus specified is formalized in terms of specific equations derived from the conservation of reliability principle.

    D.-S. Huang et al. (Eds.): ICIC 2011, LNBI 6840, pp. 267-275, 2012. © Springer-Verlag Berlin Heidelberg 2012

    To achieve these goals, the remainder of the paper is organized as follows: section 2 provides background on standby behaviour and related concepts; section 3 characterizes standby from both the internal and the external points of view, also introducing standby redundant systems; section 4 specifies the standby behaviour in analytical terms. Finally, section 5 summarizes and concludes the paper.

    2 Preliminary Concepts

    Standby systems are characterized by a dual operating mode: active and sleep. In active mode a standby system is fully operating and able to provide its services. In sleep mode, by contrast, the standby system provides no services until a specific external call, signal or input switches it from the sleep to the active mode.

    Standby systems are widely used in modern technologies because of their ability to reduce costs and environmental impact, to improve system reliability and availability, to manage redundant resources adequately, and so on.

    Usually the concept of standby, in technological contexts, refers to the energy or power consumption of the system. In fact, standby devices such as standby generators, standby batteries and standby power systems are increasingly used in designs, projects, schemes, data sheets and technical notes. Confirming this, several technical glossaries now include standby and related terms, such as sleep mode or standby mode, defined in [10] as "a mode in which electronic appliances are turned off but under power and ready to activate on command".

    The attention attracted by standby and related issues has consolidated a research trend on the topic, especially in recent times, in which sensitivity to environmental, pollution and energy-related problems has grown strongly, giving rise to many governmental [7,8,9,11] and non-governmental [5,6,12] initiatives and projects.

    This has affected the design approach, which now favours low-power devices managed through standby policies, with the aim of reducing power consumption and optimizing costs, performance and reliability. In the ICT context, green computing [16] was born to achieve such goals.

    A good definition of standby is proposed in [11], in particular because it highlights the relationship between standby and energy/power, thus characterizing the active and sleep modes in terms of the load applied to the standby unit: in the active mode a full load is applied, while sleep modes are characterized by partial or phantom loads. According to this viewpoint, hot and warm standby represent fully and partially powered sleep modes, respectively, while in cold sleep mode the system is not powered.

    3 Standby Characterization

    Standby, in reliability contexts, is usually considered a specific redundancy policy. But, as discussed in section 1, it can be interpreted as a more general and complex concept that has to be investigated at a higher level of abstraction, separately from redundancy.

    To this end, the standby behaviour is studied in depth below from two different points of view: internal, by observing the effects of standby from the inside; and external (system), by taking into account the interactions from a system reliability viewpoint, thus considering the standby unit as a component.

    3.1 Inside a Standby Unit

    The goal of this section is to observe a standby unit from the inside, in order to identify the effects of standby on the unit and to characterize them as specific states of the standby unit state machine. According to (static) system reliability theory, a component or unit can assume two states: up and down. A unit is therefore Boolean, i.e. it can only be either operating or failed, respectively. Such a classification cannot adequately represent standby, since the sleep mode cannot be clearly identified as either an up or a down state. It is thus necessary to revise this classification to take the standby behaviour into account.

    A good starting point is the definition provided in [13]:

    Up - Pertaining to a system or component that is operable and in service. It can be:
        Operating - Pertaining to a system or component that is operable, in service, and in use.
        Idle - Pertaining to a system or component that is operable and in service, but not in use.

    Down - Pertaining to a system or component that is not operable or has been taken out of service.

    In this way, three features characterizing the states of a system can be identified:

    operable - if the system is ready for use, for example if it is physically intact;
    serviceable - if the system is ready to provide its service to the environment;
    in use - if the system is performing its service.

    The serviceable property mainly concerns the interaction between the unit and the external environment, and it is therefore better considered and evaluated in the next subsection.


    Table 1. Standby unit state machine

    STATE        Operable   Serviceable   In use
    operating    Yes        Yes           Yes
    idle         Yes        Yes           No
    dormant      Yes        No            No
    failed       No         No            No

    From this classification, four meaningful states can be obtained, as reported in Table 1. In this way, the state classification of [13] into operating, idle and failed is enriched by a new state, the dormant one. The latter describes a condition in which an operable unit is not in service due to a particular condition or constraint applied to the unit, for example an external input switching the unit into sleep mode, as occurs in a standby unit.

    Fig. 1. State machine of a standby unit from inside (states: Operating, Standby, Failed; events: failure, resume, sleep, repair)

    From an internal viewpoint, the dormant state cannot be distinguished from the idle one since, as discussed above, the serviceable property is not taken into account from an internal point of view. Therefore the dormant state has to be considered an up state. As a consequence, from the internal perspective, the idle and dormant states can be merged into a single standby state, as shown in Fig. 1.

    Even though the standby state is an up state like the operating one, it differs from the latter because it is characterized, as introduced in section 2, by a different (partial or phantom) load that can significantly affect its behaviour. This distinction is particularly meaningful for cold and warm standby, but it is meaningless for hot standby, since the load characterizing the standby state is then the same as that characterizing the operating state. Therefore hot standby, from the internal point of view, is indistinguishable from the operating state and can be considered as such.
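    The merge of idle and dormant described above can be checked mechanically. In the minimal Python sketch below (all names are hypothetical, not from the paper), each state of Table 1 is encoded as a triple of features, and the internal view is modeled as a projection that drops the serviceable property; states with the same projection are indistinguishable from inside the unit.

```python
from typing import NamedTuple


class Features(NamedTuple):
    """The three features characterizing a state (Table 1)."""
    operable: bool
    serviceable: bool
    in_use: bool


# Table 1 encoded as a mapping from state name to its feature triple.
TABLE_1 = {
    "operating": Features(operable=True, serviceable=True, in_use=True),
    "idle": Features(operable=True, serviceable=True, in_use=False),
    "dormant": Features(operable=True, serviceable=False, in_use=False),
    "failed": Features(operable=False, serviceable=False, in_use=False),
}


def internal_view(f: Features) -> tuple[bool, bool]:
    """Project a state onto what is observable from inside the unit.

    The serviceable property concerns the interaction with the external
    environment, so the internal view ignores it.
    """
    return (f.operable, f.in_use)


def internal_classes() -> dict[tuple[bool, bool], list[str]]:
    """Group the Table 1 states that become indistinguishable internally."""
    classes: dict[tuple[bool, bool], list[str]] = {}
    for name, features in TABLE_1.items():
        classes.setdefault(internal_view(features), []).append(name)
    return classes


if __name__ == "__main__":
    for key, states in internal_classes().items():
        print(key, states)
```

    Running it groups idle and dormant under the same internal key `(True, False)`, which corresponds to the single standby state of Fig. 1, while operating and failed remain distinct.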

    The resulting dynamics of the standby unit are regulated by four main events: failure, resume, sleep and repair. The failure event leads to the failed state. Since in general a unit can fail from both the operating and the standby states, failures from both states are possible. The only exception is cold standby, from which the unit cannot fail, so no failure event can be specified from it. The repair event, on the other hand, switches the unit from the failed state to the active or the standby state. The resume event represents the transition from the standby to the operating state, while sleep represents the reverse transition, thus implementing the sleep-active cycles of a standby unit.
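    The event-driven dynamics of Fig. 1 can be sketched as a transition table. This is an illustrative Python model, not part of the original paper: the hypothetical `cold` flag removes the failure transition from the standby state, as prescribed above for cold standby, and hot standby is not modeled separately because internally it is treated as operating.

```python
from enum import Enum


class State(Enum):
    OPERATING = "operating"
    STANDBY = "standby"
    FAILED = "failed"


class Event(Enum):
    FAILURE = "failure"
    RESUME = "resume"
    SLEEP = "sleep"
    REPAIR = "repair"


def transitions(cold: bool) -> dict[tuple[State, Event], set[State]]:
    """Internal state machine of a standby unit (Fig. 1).

    Repair may lead to either the active or the standby state, so each
    (state, event) pair maps to the set of admissible target states.
    """
    table = {
        (State.OPERATING, Event.FAILURE): {State.FAILED},
        (State.OPERATING, Event.SLEEP): {State.STANDBY},
        (State.STANDBY, Event.RESUME): {State.OPERATING},
        (State.FAILED, Event.REPAIR): {State.OPERATING, State.STANDBY},
    }
    if not cold:
        # Warm standby (hot is treated internally as operating):
        # the unit can also fail while sleeping.
        table[(State.STANDBY, Event.FAILURE)] = {State.FAILED}
    return table


def step(state: State, event: Event, target: State, cold: bool) -> State:
    """Apply an event, raising ValueError if the transition is not allowed."""
    allowed = transitions(cold).get((state, event), set())
    if target not in allowed:
        raise ValueError(f"{event.value} cannot move {state.value} to {target.value}")
    return target
```

    For instance, `step(State.STANDBY, Event.FAILURE, State.FAILED, cold=True)` raises `ValueError`, reflecting that no failure event is specified from cold standby.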

    3.2 The System Perspective

    Having characterized the internal behaviour of a standby unit, the focus now moves to the system, observing the unit from the outside. The standby unit is therefore considered as part of the system, taking into account its relationships with the other components and with the external environment. In this way standby can be characterized as a dynamic-dependent behaviour involving two parts: the driver (trigger) side, which drives the standby, and the standby unit, which reacts to the inputs coming from the driver.

    From the system reliability point of view, the characterization discussed above and summarized in Table 1 has to be revised. First of all, it is necessary to take into account the serviceable property identified above and so far neglected, since it is strictly related to the external viewpoint. This means that the state characterization of Fig. 1 has to be modified and, in particular, that it is no longer possible to merge the idle and dormant states into a single standby state. This requires further explanation.

    Since the dormant state represents the condition in which the standby unit depends on an external event able to switch it into service (i.e. to make it serviceable), it can no longer be considered an up state as above; more properly, it has to be evaluated as a down state, out of service. In this way, from a system reliability perspective, both the dormant and the failed states are identified as down states.

    But there is an important difference between the dormant and the failed states: in case of failure, an external time-consuming action, such as a replacement or a repair, is required to restore the operating condition of the unit, while in the dormant state the standby unit is not physically failed and merely waits for a driver input that can immediately switch it into service.

    This further justifies the fact that the single-standby-state characterization of Fig. 1 does not adequately represent the behaviour of the standby component. It becomes necessary to split the single standby state into two different states, corresponding to the idle and the dormant modes, as shown in Fig. 2. In this way, a unit can transition to the dormant state when repaired, or from both the active and the idle states (the disable event); vice versa, a unit can be enabled from the dormant state, transitioning to either the idle or the active state (the enable event), or it can fail.

    Fig. 2. State machine of a standby unit from outside (states: Operating, Idle, Dormant, Failed; events: enable, disable, resume, sleep, pause, failure, repair)

    Thus, according to the standby unit characterization of Table 1, from an external point of view operating and idle are identified as up states, while dormant and failed are down states.
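    The external state machine and its up/down classification can be sketched in the same way. This hypothetical Python model includes only the transitions described in the text: failure from the operating, idle and dormant states; repair to the operating, idle or dormant state; disable and enable between dormant and the active/idle states; and, as an assumption carried over from Fig. 1, sleep and resume between operating and idle. The `pause` label of Fig. 2 is not explained in the text and is therefore left out.

```python
from enum import Enum


class ExtState(Enum):
    OPERATING = "operating"
    IDLE = "idle"
    DORMANT = "dormant"
    FAILED = "failed"


# Admissible (source, target) pairs of the external model, keyed by event.
# Assumption: sleep/resume connect operating and idle, as in Fig. 1.
TRANSITIONS: dict[str, set[tuple[ExtState, ExtState]]] = {
    "failure": {
        (ExtState.OPERATING, ExtState.FAILED),
        (ExtState.IDLE, ExtState.FAILED),
        (ExtState.DORMANT, ExtState.FAILED),
    },
    "repair": {
        (ExtState.FAILED, ExtState.OPERATING),
        (ExtState.FAILED, ExtState.IDLE),
        (ExtState.FAILED, ExtState.DORMANT),
    },
    "disable": {
        (ExtState.OPERATING, ExtState.DORMANT),
        (ExtState.IDLE, ExtState.DORMANT),
    },
    "enable": {
        (ExtState.DORMANT, ExtState.OPERATING),
        (ExtState.DORMANT, ExtState.IDLE),
    },
    "sleep": {(ExtState.OPERATING, ExtState.IDLE)},
    "resume": {(ExtState.IDLE, ExtState.OPERATING)},
}

# External classification: only operating and idle are up states.
UP_STATES = frozenset({ExtState.OPERATING, ExtState.IDLE})


def is_up(state: ExtState) -> bool:
    return state in UP_STATES


def events_between(src: ExtState, dst: ExtState) -> list[str]:
    """Return the (sorted) events that can move the unit from src to dst."""
    return sorted(e for e, pairs in TRANSITIONS.items() if (src, dst) in pairs)
```

    Note that `events_between(ExtState.DORMANT, ExtState.OPERATING)` returns `['enable']`: the unit moves from a down state to an up state without any repair, which is precisely what distinguishes dormant from failed.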

    4 Formal Aspects

    In section 2, starting from [11], a classification into hot, warm and cold standby is given, while in section 3 a characterization of the standby behaviour in specific states is proposed. To translate this characterization quantitatively into reliability terms, one can rely on the heuristic rule that relates the load applied to a generic standby system (subsystem or unit) to the system (un)reliability, as introduced above. According to this rule, a greater load applied to the system corresponds to lower reliability or, equivalently, higher unreliability. This is because, under a greater load, the system does more work and, consequently, its reliability decreases faster, i.e. its failure rate increases. The heuristic is particularly apt when the standby system is subjected to a phantom load, as in the cold standby case, since the unit does not work in standby and therefore cannot fail on its own, but only if external causes arise.

    Following this reasoning, it is possible to provide a reliability characterization of the standby state based on the relationship between reliability and load. From a high-level point of view, the standby states of a standby system (standby, idle and dormant), as specified in section 3, can be further characterized as [14,15]:

    Cold - if the unit cannot fail autonomously;

    Warm - if the unit can fail autonomously, but with a lower failure rate or a greater reliability distribution (in the statistical sense) than that of the operating state;

    Hot - if the unit can fail autonomously as in the operating state, with the same failure rate or reliability distribution.
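    Under the simplifying assumption of constant failure rates (exponential lifetimes), which the paper does not require, the three classes reduce to a comparison between the standby failure rate and the operating one. The function `classify_standby` below is a hypothetical illustration of that special case.

```python
def classify_standby(rate_operating: float, rate_standby: float) -> str:
    """Classify a standby state as cold, warm or hot.

    Assumes constant failure rates (failures per unit time), so that a
    "greater reliability distribution" becomes a lower failure rate.
    """
    if rate_operating <= 0:
        raise ValueError("the operating failure rate must be positive")
    if rate_standby < 0:
        raise ValueError("failure rates cannot be negative")
    if rate_standby == 0:
        return "cold"  # the unit cannot fail autonomously in standby
    if rate_standby < rate_operating:
        return "warm"  # autonomous failures, but at a lower rate
    if rate_standby == rate_operating:
        return "hot"   # same failure behaviour as in the operating state
    raise ValueError("a standby rate above the operating rate is not covered")
```

    The equality test for hot standby is exact; with estimated rates, a tolerance would be more appropriate.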

    A unit in cold standby is (intrinsically) reliable during its sleep mode. Otherwise (warm and hot standby), the unit can also fail from the sleep mode. From the system point of view, as discussed in section 3.2, a standby state can be either an up state, if it is classified as idle, or a down state, if it is identified as dormant.

    However, in general a standby state, both idle and dormant, can be characterized by its own reliability function that quantifies the impact of standby on the system. From the probability theory point of view, a standby system SS can be characterized by at least two reliability functions: $R^O_{SS}(t)$ and $R^S_{SS}(t)$. The former models the behaviour of the unit in the fully operating mode, while the latter characterizes the standby (sleep) mode. Assuming that the unit is fully operating at time $t = 0$ and switches to the sleep mode at time $t = x > 0$, the standby system reliability function $R_{SS}(t, x)$ can be specified as follows:

    $$
    R_{SS}(t, x) =
    \begin{cases}
      R^O_{SS}(t) & t \le x \\
      R^S_{SS}(t) & t > x
    \end{cases}
    \tag{1}
    $$

    where x is a realization of the trigger-event random variable X driving the standby. Thus, following the classification into cold, warm and hot standby, cold and warm standby can be specified more formally by eq. (1), in which at change point x the reliability function changes from $R^O_{SS}(t)$ to $R^S_{SS}(t)$, with $R^O_{SS}(t) \le R^S_{SS}(t)\ \forall t \ge 0$, while in the hot standby case $R^O_{SS}(t) = R^S_{SS}(t)$.

    As stated above, the active-sleep cycles of the standby system are triggered by an external event, i.e., in probability terms, the standby system reliability $R_{SS}$ depends on two random variables, as in eq. (1): the standby unit lifetime T and the trigger event X. Assuming that $R^O_{SS}(t)$ and $R^S_{SS}(t)$ are known, or equivalently the corresponding unreliability CDFs $F^O_{SS}(t)$ and $F^S_{SS}(t)$, together with the distribution $F_X(x)$ of the conditioning event X, the aim is to obtain the reliability $R_{SS}(t)$ of the standby system of eq. (1), removing its dependency on x.
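    The dependency on the two random variables T and X can be made concrete with a small Monte Carlo sketch. It assumes, purely for illustration, exponential lifetimes with rates `lam_o` (operating) and `lam_s` (sleep) and an exponential trigger X with rate `mu`; thanks to the memoryless property, the residual life after the switch is simply exponential with rate `lam_s`. All names are hypothetical, and `lam_s = 0` is allowed to model cold standby.

```python
import math
import random


def simulate_unreliability(t: float, lam_o: float, lam_s: float, mu: float,
                           n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of F_SS(t) = Pr(T <= t).

    Assumes exponential lifetimes: rate lam_o > 0 while operating, rate
    lam_s >= 0 while sleeping (lam_s == 0 models cold standby) and an
    exponential trigger X with rate mu > 0, all in the time unit of t.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        x = rng.expovariate(mu)                 # trigger time X
        life_operating = rng.expovariate(lam_o)
        if life_operating <= x:
            lifetime = life_operating           # fails before the switch
        elif lam_s > 0:
            # Memoryless: having survived to x, the unit ages at the sleep rate.
            lifetime = x + rng.expovariate(lam_s)
        else:
            lifetime = math.inf                 # cold standby never fails asleep
        if lifetime <= t:
            failures += 1
    return failures / n
```

    Averaging over both T and X in this way is exactly what the analytical derivation that follows does through the theorem of total probability.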

    Thus, exploiting the theorem of total probability, $F_{SS}(t) = 1 - R_{SS}(t)$ can be obtained as follows:

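    $$
    \begin{aligned}
    F_{SS}(t) &= \int_0^{+\infty} \Pr(T \le t \mid X = x)\, f_X(x)\, dx \\
              &= \int_0^{t} \Pr(T \le t \mid X = x)\, f_X(x)\, dx + \int_t^{+\infty} \Pr(T \le t \mid X = x)\, f_X(x)\, dx \\
              &= \int_0^{t} \bigl(1 - \Pr(T > t \mid X = x)\bigr) f_X(x)\, dx + F^O_{SS}(t) \Bigl. F_X(x) \Bigr|_t^{+\infty}
    \end{aligned}
    \tag{2}
    $$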

    where $F^O_{SS}(t) = 1 - R^O_{SS}(t)$, $F^S_{SS}(t) = 1 - R^S_{SS}(t)$ and $f_X(x) = F'_X(x)$. In order to evaluate $\Pr(T > t \mid X = x)$, it can be observed that the dependent component follows $F^O_{SS}(t)$ for $t \le x$ and then, at $t = x$, switches into the sleep mode state characterized by $F^S_{SS}(t)$.

    It is therefore necessary to understand what happens at change point x. Starting from the conservation of reliability principle [17], also known as the Markov additive property [18], the effect of switching between the two distributions can be quantified in terms of time through $\tau$, thus obtaining the equivalent time $\tau$ such that, at change point x:

    $$
    R^O_{SS}(x) = R^S_{SS}(x + \tau) \;\Rightarrow\; \tau = R^{S(-1)}_{SS}\bigl(R^O_{SS}(x)\bigr) - x
    \tag{3}
    $$

    assuming that $R^S_{SS}(\cdot)$, whose inverse appears in eq. (3), is strictly decreasing and therefore invertible.
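    Eq. (3) can be evaluated directly whenever $R^S_{SS}$ has a closed-form inverse. The sketch below assumes Weibull reliability functions $R(t) = \exp(-(t/\eta)^\beta)$ for both modes (an illustrative choice, not prescribed by the paper; all names and parameters are hypothetical) and checks that the switch preserves the reliability reached at the change point.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Weibull:
    """Weibull reliability R(t) = exp(-(t / eta) ** beta), t >= 0."""
    eta: float   # scale parameter, same time unit as t
    beta: float  # shape parameter

    def reliability(self, t: float) -> float:
        return math.exp(-((t / self.eta) ** self.beta))

    def inverse(self, r: float) -> float:
        """Age at which the reliability equals r, for 0 < r <= 1."""
        return self.eta * (-math.log(r)) ** (1.0 / self.beta)


def equivalent_time(x: float, operating: Weibull, sleep: Weibull) -> float:
    """Shift tau of eq. (3), solving R^O(x) = R^S(x + tau)."""
    return sleep.inverse(operating.reliability(x)) - x


def conditional_survival(t: float, x: float, operating: Weibull, sleep: Weibull) -> float:
    """Pr(T > t | X = x): operating law up to x, shifted sleep law afterwards."""
    if t <= x:
        return operating.reliability(t)
    return sleep.reliability(t + equivalent_time(x, operating, sleep))
```

    For example, with `Weibull(1000.0, 2.0)` for the operating mode and `Weibull(4000.0, 2.0)` for the sleep mode, the operating age x = 300 maps to the equivalent sleep age 1200, i.e. τ = 900: the more reliable sleep law must be evaluated at a later age to preserve the reliability reached at the switch. With identical laws (hot standby), τ = 0.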

    In this way, since $x \le t$,

    $$
    \Pr(T > t \mid X = x) = R^S_{SS}(t + \tau) = R^S_{SS}\Bigl(t + R^{S(-1)}_{SS}\bigl(R^O_{SS}(x)\bigr) - x\Bigr)
    $$

    and, substituting it in eq. (2):

    $$
    F_{SS}(t) = F^O_{SS}(t)\bigl(1 - F_X(t)\bigr) + \int_0^t F^S_{SS}\Bigl(t + R^{S(-1)}_{SS}\bigl(R^O_{SS}(x)\bigr) - x\Bigr) f_X(x)\, dx
    \tag{4}
    $$

    Eq. (4) thus quantifies the unreliability of a standby system switching from the operating to the sleep mode when triggered by an external event stochastically represented by X.
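    Eq. (4) generally requires numerical integration. The following sketch evaluates it with composite Simpson's rule (a swapped-in numerical choice; the paper prescribes none) for arbitrary reliability functions passed as callables; the helper names are hypothetical. In the exponential case the equivalent-time shift reduces to the usual memoryless switch, so the result can be checked against the closed-form solution of a two-state Markov chain.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def simpson(g: Fn, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def standby_unreliability(t: float, r_op: Fn, r_sb: Fn, r_sb_inv: Fn,
                          cdf_x: Fn, pdf_x: Fn, n: int = 2000) -> float:
    """Evaluate eq. (4): F_SS(t) for a unit switched to sleep at X ~ F_X.

    r_op, r_sb: reliability functions R^O_SS and R^S_SS.
    r_sb_inv: inverse of R^S_SS (assumed strictly decreasing).
    cdf_x, pdf_x: distribution F_X and density f_X of the trigger X.
    """
    # Trigger after t: the unit is still operating at time t.
    no_switch = (1.0 - r_op(t)) * (1.0 - cdf_x(t))

    def integrand(x: float) -> float:
        tau = r_sb_inv(r_op(x)) - x          # equivalent time, eq. (3)
        return (1.0 - r_sb(t + tau)) * pdf_x(x)

    return no_switch + simpson(integrand, 0.0, t, n)


def exponential_case(t: float, lam_o: float, lam_s: float, mu: float) -> float:
    """Eq. (4) with exponential lifetimes (rates lam_o, lam_s > 0) and trigger rate mu."""
    return standby_unreliability(
        t,
        r_op=lambda u: math.exp(-lam_o * u),
        r_sb=lambda u: math.exp(-lam_s * u),
        r_sb_inv=lambda r: -math.log(r) / lam_s,
        cdf_x=lambda u: 1.0 - math.exp(-mu * u),
        pdf_x=lambda u: mu * math.exp(-mu * u),
    )
```

    Setting `lam_s == lam_o` (hot standby) gives τ = 0 and recovers $F_{SS}(t) = F^O_{SS}(t)$, as expected from the classification above.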


    5 Conclusions

    Standby systems are of strategic importance in current technologies, being a way to reduce environmental impact and costs by extending system lifetime. Focusing on reliability, this paper studies standby behaviour in depth from complementary perspectives: the intrinsic one, investigating a standby unit from the inside, and the external (operational) one, considering reliability interactions and dynamics among the standby components of a system.

    Starting from this characterization, the behaviour of a generic standby system is investigated analytically, providing the corresponding formal relationships and equations. Moreover, standby redundancy is formally evaluated, starting from a 2-unit standby redundant system.

    References

    1. Limnios, N., Oprisan, G.: Semi-Markov Processes and Reliability. Statistics for Industry and Technology. Birkhäuser, Boston (2001)

    2. Janssen, J., Manca, R.: Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer, Heidelberg (2007)

    3. Itoi, T., Nishida, T., Kodama, M., Ohi, F.: N-unit Parallel Redundant System with Correlated Failure and Single Repair Facility. Microelectronics Reliability 17(2), 279-285 (1978)

    4. Nikolov, A.V.: N-unit Parallel Redundant System with Multiple Correlated Failures. Microelectronics and Reliability 26(1), 31-34 (1986)

    5. International Energy Agency (IEA): IEA Standby Power Initiative. Task Force 1: Definitions and Terminology of Standby Power, November 17-18, Washington, USA (1999)

    6. International Electrotechnical Commission (IEC): IEC 62301 standard: Household electrical appliances - Measurement of standby power. Edition 2.0

    7. Australian Government, Department of Environment, Water, Heritage and the Arts: Australian standby power program (September 2009), http://www.energyrating.gov.au/standby.html

    8. U.S. Environmental Protection Agency and U.S. Department of Energy: ENERGY STAR program

    9. The European Commission: The Directive 2005/32/EC on the Eco-Design of Energy-using Products (EuP)

    10. Meier, A.: Standby Power Use - Definitions and Terminology. In: First Workshop on Reducing Standby Losses, Paris, France (January 1999)

    11. Alliance for Telecommunications Industry Solutions (ATIS): American National Standard ATIS Telecom Glossary (2007)

    12. Institute of Electrical and Electronics Engineers (IEEE): IEEE Std 446-1995 - IEEE Recommended Practice for Emergency and Standby Power Systems for Industrial and Commercial Applications

    13. Institute of Electrical and Electronics Engineers (IEEE): IEEE 610-1991 - IEEE Standard Computer Dictionary. A Compilation of IEEE Standard Computer Glossaries (1991) ISBN: 1559370793


    14. Dugan, J.B., Bavuso, S., Boyd, M.: Dynamic Fault Tree Models for Fault-Tolerant Computer Systems. IEEE Trans. Reliability 41(3), 363-377 (1992)

    15. Distefano, S., Puliafito, A.: Dependability evaluation with dynamic reliability block diagrams and dynamic fault trees. IEEE Transactions on Dependable and Secure Computing 6(1), 4-17 (2009)

    16. Murugesan, S.: Harnessing Green IT: Principles and Practices. IT Professional 10(1), 24-33 (2008), doi:10.1109/MITP.2008.10

    17. Kececioglu, D.: Reliability Engineering Handbook, vol. 1 & 2. DEStech Publications (1991) ISBN Volume 1: 1932078002, ISBN Volume 2: 1932078010

    18. Finkelstein, M.S.: Wearing-out of components in a variable environment. Reliability Engineering & System Safety 66(3), 235-242 (1999)