SRC Forum: Message Replies
Forum: Reliability & Maintainability Questions and Answers
Topic: Reliability & Maintainability Questions and Answers
Topic Posted by: Reliability & Maintainability Forum
(src_forum@alionscience.com)
Organization: System Reliability Center
Date Posted: Mon Aug 31 12:47:36 US/Eastern 1998
Original Message:
Posted by: Andrew Comons
Date posted: Mon Mar 4 20:47:32 US/Eastern 2002
Subject: MTBF and Redundant Systems
Message: I am trying to write a system specification for equipment that will be fielded in pairs, with the second system acting as a backup. Each system will be required to carry an onboard spare parts package, and the full-time attending operator will be required to use the built-in diagnostics to correct faults within 5 minutes. If a fault can't be corrected within 5 minutes, the decision is to switch to the backup system. Mission duration is 120 hours. The reliability goal is 95%.
I need to determine the MTBF of the system, including the redundant unit, but I do not know how to account for a failure that is correctable within the 5-minute fault detection/correction requirement.
Sample calcs, references or advice available?
Thanks in advance.
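For reference, here is a quick translation of the stated goal into a system-level MTBF. This is a sketch, not part of the original question. It assumes the fielded pair can be treated as a single system with a constant failure rate, meaning the time to failure is exponentially distributed. Under that assumption, 95% reliability over a 120-hour mission requires roughly:

$$R(t) = e^{-t/\mathrm{MTBF}} \quad\Rightarrow\quad \mathrm{MTBF}_{\min} = \frac{t}{-\ln R} = \frac{120\ \text{h}}{-\ln 0.95} \approx \frac{120\ \text{h}}{0.0513} \approx 2{,}340\ \text{h}$$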
Reply:
Subject: Repairable system reliability approximation
Reply Posted by: Larry George
(pstlarry@attbi.com)
Organization: Problem Solving Tools
Date Posted: Sat Mar 9 12:49:30 US/Eastern 2002
Message: K.I.S.S. The following Markov chain model may be an adequate approximation for specifying reliability and MTBF. Build a spreadsheet to analyze the Markov chain shown in the transition diagram below for reliability and MTBF.
This model assumes failure and repair rates don't depend on time or age and that the reserve system won't fail while on standby.
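[Sys 1 up] --lambda--> [Sys 1 dn]
[Sys 1 dn] --mu--> [Sys 1 up]
[Sys 1 dn] --(1 - mu)--> [Sys 2 up]
[Sys 2 up] --lambda--> [Sys 2 dn]
[Sys 2 dn] --mu--> [Sys 2 up]
[Sys 2 dn] --(1 - mu)--> [Failure]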
lambda is the failure rate of either system
mu is the repair rate of either system. Use the probability that repair is completed within 5 minutes.
Let me know if you would like help with the spreadsheet model.
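In place of a spreadsheet, the Markov chain in Larry's transition diagram can be evaluated with a short script. This is a minimal sketch under assumptions that are not in the original reply. Time advances in fixed 5-minute steps. An up unit fails during a step with probability 1 - exp(-lambda * step). A "dn" state lasts exactly one step and ends either in repair (probability mu) or in switchover (probability 1 - mu). As in the reply, the standby unit cannot fail while waiting. The lambda and mu values in the usage lines are hypothetical. The mean time to system failure comes from the standard absorbing-chain result (I - Q) t = 1, where Q holds the transition probabilities among the four non-failure states. The mission reliability is obtained by stepping the state probabilities forward through the 120 hours.

```python
import math

# Transient states, in order: 0 = Sys 1 up, 1 = Sys 1 dn, 2 = Sys 2 up, 3 = Sys 2 dn.
# "Failure" is absorbing; it appears only as probability that leaves Q.


def transient_matrix(lam_per_hour, mu, step_hours):
    """One-step transition probabilities among the four transient states.

    Assumptions (not from the original post): fixed steps of step_hours; an up
    unit fails in a step with probability 1 - exp(-lam_per_hour * step_hours);
    a down state lasts one step, ending in repair (mu) or switchover (1 - mu);
    the standby unit cannot fail while waiting.
    """
    p = 1.0 - math.exp(-lam_per_hour * step_hours)
    return [
        [1.0 - p, p, 0.0, 0.0],     # Sys 1 up: stays up or fails
        [mu, 0.0, 1.0 - mu, 0.0],   # Sys 1 dn: repaired, or switch to Sys 2
        [0.0, 0.0, 1.0 - p, p],     # Sys 2 up: stays up or fails
        [0.0, 0.0, mu, 0.0],        # Sys 2 dn: repaired; the other 1 - mu goes to Failure
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mean_time_to_failure(lam_per_hour, mu, step_hours):
    """Expected hours from 'Sys 1 up' to 'Failure', via (I - Q) t = 1."""
    q = transient_matrix(lam_per_hour, mu, step_hours)
    n = len(q)
    i_minus_q = [[(1.0 if r == c else 0.0) - q[r][c] for c in range(n)] for r in range(n)]
    steps = solve(i_minus_q, [1.0] * n)
    return steps[0] * step_hours


def mission_reliability(lam_per_hour, mu, step_hours, mission_hours):
    """Probability that 'Failure' has not been reached after mission_hours."""
    q = transient_matrix(lam_per_hour, mu, step_hours)
    n = len(q)
    dist = [1.0, 0.0, 0.0, 0.0]  # the mission starts with Sys 1 up
    for _ in range(round(mission_hours / step_hours)):
        dist = [sum(dist[r] * q[r][c] for r in range(n)) for c in range(n)]
    return sum(dist)


step = 5.0 / 60.0    # 5-minute step (assumed)
lam = 1.0 / 500.0    # hypothetical unit failure rate, per hour
mu = 0.9             # hypothetical probability that a repair finishes within 5 minutes
print(f"MTBF ~ {mean_time_to_failure(lam, mu, step):,.0f} hours")
print(f"R(120 h) ~ {mission_reliability(lam, mu, step, 120.0):.4f}")
```

Under these assumptions, each unit spends a geometric number of up/repair cycles in service. The mean can therefore be checked by hand as 2(1/p + 1)/(1 - mu) steps, where p is the per-step failure probability.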
Reply:
Subject: MTBF with Redundancy (and repair)
Reply Posted by: Andrew Comons
(metalace86@netzero.net)
Date Posted: Sun Mar 10 17:17:27 US/Eastern 2002
Message: Yes Larry, I would be all too happy to see details of the spreadsheet. Please email me directly if you prefer. Markov methods are new to me, and I am seeking all the information I can find; practical information seems hard to locate.
Thanks and V/R,
Andrew
Reply:
Subject: MTBF and Redundant Systems
Reply Posted by: Gary Sunada
(gsunada@alionscience.com)
Organization: Reliability Analysis Center
Date Posted: Tue Mar 12 10:00:03 US/Eastern 2002
Message: Hello, perhaps this may be of help:
First, you'll need the failure rate of one field unit.
One of our books, Reliability Toolkit: Commercial Practices Edition, contains a redundancy-with-repair equation (n active units, one offline on standby, with immediate repair):
$$\lambda_{\text{standby}} = \frac{n\,[\,n\lambda + (1-P)\,\mu\,]\,\lambda}{\mu + n\,(P+1)\,\lambda}$$
where
n = number of active online units (in this case, = 1)
lambda = failure rate of one unit
mu = repair rate = 1 / (mean corrective maintenance time in hours) = 1 / 5 min = 1 / 0.0833 h = 12.0 repairs per hour (assuming that the repair will take 5 minutes, no more, no less; you can adjust this depending on the nature of the failure modes and their respective repairs)
P = probability of successful switching over to backup (in this case, with a live human operator, assume 100%) = 1
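For the case in this thread (n = 1, P = 1), the Toolkit expression simplifies as shown below. This is only algebra on the formula quoted above and is not part of the original reply:

$$\lambda_{\text{standby}} = \frac{1\cdot[\,1\cdot\lambda + (1-1)\,\mu\,]\,\lambda}{\mu + 1\cdot(1+1)\,\lambda} = \frac{\lambda^2}{\mu + 2\lambda}, \qquad \mathrm{MTBF} = \frac{1}{\lambda_{\text{standby}}} = \frac{2}{\lambda} + \frac{\mu}{\lambda^2}$$

The second term, which carries the repair rate, exceeds the first whenever lambda < mu/2 (6 failures per hour for mu = 12.0). In that regime, the short repair time is what drives the system MTBF.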
Plug in the values, and you get lambda(standby). Assuming the system's time to failure is exponentially distributed (constant failure rate), 1 / lambda(standby) gives you the MTBF.
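The "plug in the values" step can be scripted as follows. This is a minimal sketch that transcribes the Toolkit expression as quoted in this reply. The unit failure rate of 1 per 500 hours is hypothetical and does not come from the thread. The mission reliability line assumes a constant system failure rate, as the reply does.

```python
import math


def standby_failure_rate(lam, mu, n=1, p=1.0):
    """Equivalent system failure rate: n active units, one standby, with repair.

    Transcription of the Toolkit expression quoted in the reply:
        n * (n*lam + (1 - p)*mu) * lam / (mu + n*(p + 1)*lam)
    lam: failure rate of one unit (per hour)
    mu:  repair rate (per hour)
    p:   probability that switchover to the standby succeeds
    """
    return n * (n * lam + (1.0 - p) * mu) * lam / (mu + n * (p + 1.0) * lam)


def mission_reliability(lam_sys, mission_hours):
    """Reliability over the mission, assuming a constant system failure rate."""
    return math.exp(-lam_sys * mission_hours)


lam = 1.0 / 500.0         # hypothetical unit failure rate (not from the thread)
mu = 1.0 / (5.0 / 60.0)   # 5-minute repair -> 12 repairs per hour
lam_sys = standby_failure_rate(lam, mu, n=1, p=1.0)
print(f"lambda(standby) = {lam_sys:.3e} per hour")
print(f"MTBF = {1.0 / lam_sys:,.0f} hours")
print(f"R(120 h) = {mission_reliability(lam_sys, 120.0):.5f} (goal 0.95)")
```

Replacing the hypothetical lam with the predicted failure rate of one field unit gives the numbers to compare against the 95% goal.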
Reply:
Subject: MTBF and Redundant Systems
Reply Posted by: Gary Sunada
(gsunada@alionscience.com)
Organization: Reliability Analysis Center
Date Posted: Tue Mar 12 10:04:17 US/Eastern 2002
Message: I apologize for the horrible formatting; let me try that again.
$$\lambda_{\text{standby}} = \frac{n\,[\,n\lambda + (1-P)\,\mu\,]\,\lambda}{\mu + n\,(P+1)\,\lambda}$$
