Chapter 2: Reliability and Fault Tolerance
Alan Burns and Andy Wellings
Real-Time Systems and Programming Languages: Alan Burns and Andy Wellings

Aims
- To understand the factors which affect the reliability of a system and introduce how software design faults can be tolerated
- To introduce:
  - Safety and Dependability
  - Reliability, failure and faults
  - Failure modes
  - Fault prevention and fault tolerance
  - N-Version programming
  - Dynamic Redundancy

Scope
Four sources of faults which can result in system failure:
- Inadequate specification (not covered)
- Design errors in software (covered now)
- Processor failure (not covered)
- Interference on the communication subsystem (not covered)

Safety and Reliability
- Safety: freedom from those conditions that can cause death, injury, occupational illness, damage to (or loss of) equipment (or property), or environmental harm
  - By this definition, most systems which have an element of risk associated with their use are unsafe
- Reliability: a measure of the success with which a system conforms to some authoritative specification of its behaviour
- Safety is the probability that conditions that can lead to mishaps do not occur, whether or not the intended function is performed

Safety
- E.g., measures which increase the likelihood of a weapon firing when required may well increase the possibility of its accidental detonation
- In many ways, the only safe airplane is one that never takes off; however, it is not very reliable
- As with reliability, to ensure the safety requirements of an embedded system, system safety analysis must be performed throughout all stages of its life cycle development

Aspects of Dependability
Dependability:
- Available: readiness for usage
- Reliable: continuity of service delivery
- Safe: non-occurrence of catastrophic consequences
- Confidential: non-occurrence of unauthorized disclosure of information
- Integral: non-occurrence of improper alteration of information
- Maintainable: aptitude to undergo repairs and evolutions

Dependability Terminology
Dependability:
- Attributes: Availability, Confidentiality, Reliability, Safety, Integrity, Maintainability
- Means: Fault Prevention, Fault Tolerance, Fault Removal, Fault Forecasting
- Impairments: Faults, Errors, Failures

Reliability, Failure and Faults
- The reliability of a system is a measure of the success with which it conforms to an authoritative specification of its behaviour
- When the behaviour of a system deviates from that which is specified for it, this is called a failure
- Failures result from unexpected problems internal to the system that eventually manifest themselves in the system's external behaviour
- These problems are called errors, and their mechanical or algorithmic causes are termed faults
- Systems are composed of components which are themselves systems: hence failure -> fault -> error -> failure -> fault

Fault Types
- A transient fault starts at a particular time, remains in the system for some period and then disappears
- E.g. hardware components which have an adverse reaction to radioactivity
- Many faults in communication systems are transient
- Permanent faults remain in the system until they are repaired; e.g., a broken wire or a software design error
- Intermittent faults are transient faults that occur from time to time
- E.g. a hardware component that is heat sensitive: it works for a time, stops working, cools down and then starts to work again
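
The distinction between transient and permanent faults matters for recovery: repeating an operation can mask a transient fault, but never a permanent one. The following is a minimal sketch of that idea and is not taken from the slides; `TransientFault`, `PermanentFault`, `call_with_retry` and `FlakyLink` are hypothetical names introduced only for illustration.

```python
class TransientFault(Exception):
    """Hypothetical: a fault that is expected to disappear on its own."""


class PermanentFault(Exception):
    """Hypothetical: a fault that remains until it is repaired."""


def call_with_retry(operation, max_attempts=3):
    """Run operation, retrying only when it raises TransientFault."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except TransientFault as err:
            last_error = err  # the fault may have cleared by the next attempt
    # The fault outlived every retry, so stop treating it as transient.
    raise PermanentFault("fault persisted across retries") from last_error


class FlakyLink:
    """Hypothetical communication link whose first few sends are disturbed."""

    def __init__(self, bad_sends):
        self.bad_sends = bad_sends
        self.calls = 0

    def send(self):
        self.calls += 1
        if self.calls <= self.bad_sends:
            raise TransientFault("interference on the link")
        return "delivered"


link = FlakyLink(bad_sends=2)
result = call_with_retry(link.send, max_attempts=3)
```

With two disturbed sends and three attempts, the third send goes through. Retrying alone cannot tell an intermittent fault from a transient one, so a real system would probably also record how often retries are needed.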

Software Faults
- Called bugs
  - Bohrbugs: reproducible, identifiable
  - Heisenbugs: only active under rare conditions, e.g. race conditions
- Software doesn't deteriorate with age: it is either correct or incorrect, but
- Faults can remain dormant for long periods
  - Usually related to resource usage, e.g. memory leaks

Failure Modes
Failure mode classification (diagram):
- Value domain: constraint error, value error
- Timing domain: early, omission, late
- Arbitrary (fail uncontrolled)
- Fail silent, fail stop, fail controlled
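
One way to read the value-domain and timing-domain branches of the classification is as checks on a single service delivery. The sketch below assumes the usual reading of the terms: a constraint error is a value outside its permitted range, a value error is in range but wrong, and early, late and omission describe delivery relative to a time window. The slide itself only names the categories. `ServiceSpec` and `classify` are hypothetical names, and knowing the correct value in advance is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceSpec:
    low: float       # smallest permitted value
    high: float      # largest permitted value
    correct: float   # value the specification calls for (assumed known here)
    earliest: float  # start of the permitted delivery window
    deadline: float  # end of the permitted delivery window


def classify(spec: ServiceSpec, value: float,
             delivered_at: Optional[float]) -> List[str]:
    """Return the failure modes shown by one service delivery."""
    if delivered_at is None:
        return ["omission"]  # the service was never delivered
    modes = []
    # Timing domain: compare the delivery time with the window.
    if delivered_at < spec.earliest:
        modes.append("early")
    elif delivered_at > spec.deadline:
        modes.append("late")
    # Value domain: check the range first, then correctness.
    if not (spec.low <= value <= spec.high):
        modes.append("constraint error")
    elif value != spec.correct:
        modes.append("value error")
    return modes  # an empty list means no failure was observed


spec = ServiceSpec(low=0, high=100, correct=42, earliest=1.0, deadline=2.0)
late_and_wrong = classify(spec, 41, 2.5)
```

Here `late_and_wrong` holds one timing-domain entry and one value-domain entry, since a single delivery can fail in both domains at once.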

Approaches to Achieving Reliable Systems
- Fault prevention attempts to eliminate any possibility of faults creeping into a system before it goes operational
- Fault tolerance enables a system to continue functioning even in the presence of faults
- Both approaches attempt to produce systems which have well-defined failure modes