Reliability program plan
Many tasks, methods, and tools can be used to achieve reliability. Every
system requires a different level of reliability. A commercial
airliner must operate under a wide range of conditions. The consequences of failure are
grave, but there is a correspondingly higher budget. A pencil sharpener may be
more reliable than an airliner, but has a much different set of operational
conditions, insignificant consequences of failure, and a much lower budget.
A reliability program plan is used to document exactly what tasks, methods,
tools, analyses, and tests are required for a particular system. For complex
systems, the reliability program plan is a separate document. For simple
systems, it may be combined with the systems engineering management plan. The
reliability program plan is
essential for a successful reliability program and is developed early during
system development. It specifies not only what the reliability engineer does,
but also the tasks performed by others. The reliability program plan is approved
by top program management.
Reliability requirements
For any system, one of the first tasks of reliability engineering is to
adequately specify the reliability requirements. Reliability requirements
address the system itself, test and assessment requirements, and associated
tasks and documentation. Reliability requirements are included in the
appropriate system/subsystem requirements specifications, test plans, and
contract statements.
System reliability parameters
Requirements are specified using reliability parameters. The most common
reliability parameter is the mean time between failures (MTBF), which can also
be specified as the failure rate or the number of failures during a given
period. These parameters are very useful for systems that are operated on a
regular basis, such as most vehicles, machinery, and electronic equipment.
Reliability increases as the MTBF increases. The MTBF is usually specified in
hours, but can also be expressed in other units of measurement such as miles or
cycles.
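As a minimal sketch of how these parameters relate, the following Python
snippet converts an MTBF into a failure rate, an expected number of failures,
and a probability of surviving a given duration. It assumes a constant failure
rate (an exponential model), which the text above does not require, and the
function names are illustrative only.

```python
import math


def failure_rate(mtbf: float) -> float:
    """Failure rate per unit (hour, mile, cycle), assuming it is constant."""
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf


def expected_failures(mtbf: float, period: float) -> float:
    """Expected number of failures over a period given in the MTBF's unit."""
    return period * failure_rate(mtbf)


def survival_probability(mtbf: float, duration: float) -> float:
    """Probability of no failure within `duration` (exponential assumption)."""
    return math.exp(-duration * failure_rate(mtbf))
```

Under this assumption, a longer MTBF always gives a lower failure rate and a
higher survival probability for the same duration.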
In other cases, reliability is specified as the probability of mission
success. For example, reliability of a scheduled aircraft flight can be
specified as a dimensionless probability or a percentage.
A special case of mission success is the single-shot device or system. These
are devices or systems that remain relatively dormant and only operate once.
Examples include automobile airbags, thermal batteries, and missiles.
Single-shot reliability is specified as a probability of success,
or is subsumed into a related parameter. Single-shot missile reliability may be
incorporated into a requirement for the probability of hit.
For such systems, the probability of failure on demand (PFD) is the reliability
measure. For non-repairable systems, the PFD is derived from the failure rate
and the mission time. For repairable systems, it is obtained from the failure
rate, the mean time to repair (MTTR), and the test interval. This measure may
not be unique for a given system, because it depends on the kind of demand. In
addition to system-level requirements, reliability requirements may be
specified for critical subsystems. In all cases, reliability parameters are
specified with appropriate statistical confidence intervals.
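The two PFD cases can be sketched as follows. This is only an illustration
under stated assumptions: a constant failure rate, and for the repairable case
the widely used low-demand approximation in which failures are revealed only by
periodic tests and the product of failure rate and test interval is much
smaller than one. The function names are hypothetical.

```python
import math


def pfd_non_repairable(failure_rate: float, mission_time: float) -> float:
    """Probability that the item has failed by the end of the mission time."""
    return 1.0 - math.exp(-failure_rate * mission_time)


def pfd_avg_repairable(failure_rate: float, mttr: float,
                       test_interval: float) -> float:
    """Approximate average PFD for a periodically tested, repairable item.

    On average a failure stays hidden for half a test interval and then
    takes MTTR to repair (assumption: failure_rate * test_interval << 1).
    """
    return failure_rate * (test_interval / 2.0 + mttr)
```

Both functions expect the failure rate and the times in consistent units, for
example failures per hour and hours.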
Reliability modelling
Reliability modelling is the process of predicting or understanding the
reliability of a component or system. Two separate fields of investigation
are common. The physics of failure approach uses an understanding of the
failure mechanisms involved, such as crack propagation or chemical corrosion.
The parts stress modelling approach is an empirical method for prediction based
on counting the number and type of components of the system, and the stress they
undergo during operation.
For systems with a clearly defined failure time (which is sometimes not given
for systems with a drifting parameter), the empirical distribution function of
these failure times can be determined. This is generally done in an accelerated
experiment with increased stress. These experiments can be divided into two
main categories:
Early failure rate studies determine the distribution with a decreasing
failure rate over the first part of the bathtub curve. Here, generally only
moderate stress is necessary. The stress is applied for a limited period of
time in what is called a censored test. Therefore, only the part of the
distribution with early failures can be determined.
In so-called zero-defect experiments, only limited information about the
failure distribution is acquired. Here the stress, stress time, or the sample
size is so low that not a single failure occurs. Due to the insufficient sample
size, only an upper limit of the early failure rate can be determined. In any
case, a test without failures looks good to the customer.
In a study of the intrinsic failure distribution, which is often a material
property, higher stresses are necessary to obtain failures in a reasonable
period of time. Several levels of stress have to be applied to determine an
acceleration model. The empirical failure distribution is often parametrised
with a Weibull or a log-normal model.
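To make the Weibull parametrisation concrete, here is a small sketch of the
two-parameter Weibull hazard (instantaneous failure rate) and reliability
functions. The parameter names `shape` and `scale` are illustrative. A shape
below 1 gives the decreasing failure rate of the early part of the bathtub
curve, and a shape of exactly 1 reduces to a constant failure rate.

```python
import math


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate h(t) of a two-parameter Weibull model."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving beyond time t under the same model."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale positive")
    return math.exp(-((t / scale) ** shape))
```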
It is general practice to model the early failure rate with an exponential
distribution. This less complex model for the failure distribution has only one
parameter: the constant failure rate. In such cases, the chi-square
distribution can be used to find the goodness of fit for the estimated failure
rate. Compared to a model with a decreasing failure rate, this is quite
pessimistic. Combined with a zero-defect experiment, this becomes even more
pessimistic. The effort is greatly reduced in this case: one does not have to
determine a second model parameter (e.g. the shape parameter of a Weibull
distribution) or its confidence interval (e.g. by a maximum likelihood (MLE)
approach), and the sample size is much smaller.
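For a zero-defect experiment under the exponential model, the upper limit of
the failure rate has a simple closed form: it is the chi-square bound with two
degrees of freedom divided by twice the accumulated test time. The sketch below
assumes a constant failure rate and that every unit is tested for the same
time. The function and parameter names are illustrative.

```python
import math


def zero_failure_rate_upper_bound(sample_size: int, test_hours: float,
                                  confidence: float) -> float:
    """One-sided upper confidence bound on a constant failure rate.

    Valid only when no failure occurred during the test. With zero failures
    the chi-square quantile (2 degrees of freedom) equals
    -2 * ln(1 - confidence), so the bound is -ln(1 - confidence) / T.
    """
    if sample_size <= 0 or test_hours <= 0:
        raise ValueError("sample size and test time must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    total_device_hours = sample_size * test_hours
    return -math.log(1.0 - confidence) / total_device_hours
```

The bound shrinks only in proportion to the accumulated device-hours, which is
why a small sample in a zero-defect experiment yields a pessimistic estimate.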
Reliability test requirements
Because reliability is a probability, even highly reliable systems have some
chance of failure. However, testing reliability requirements is problematic for
several reasons. A single test is insufficient to generate enough statistical
data. Multiple tests or long-duration tests are usually very expensive. Some
tests are simply impractical. Reliability engineering is used to design a
realistic and affordable test program that provides enough evidence that the
system meets its requirement. Statistical confidence levels are used to address
some of these concerns. A certain parameter is expressed along with a
corresponding confidence level: for example, an MTBF of 1000 hours at a 90%
confidence level. From this specification, the reliability engineer can design
a test with explicit criteria for the number of hours and number of failures
until the requirement is met or failed.
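One way such a test could be sized is sketched below. It is an assumption of
this example, not a method prescribed by the text: a time-terminated test under
a constant failure rate, where the requirement is considered demonstrated if no
more than an allowed number of failures occurs in the accumulated test time.
The required time follows from the chi-square bound with 2r + 2 degrees of
freedom, computed here through the equivalent Poisson sum so that only the
standard library is needed.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(N <= k) for a Poisson-distributed count N with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def required_test_hours(target_mtbf: float, confidence: float,
                        allowed_failures: int = 0) -> float:
    """Accumulated test time needed to demonstrate the target MTBF."""
    if target_mtbf <= 0 or allowed_failures < 0:
        raise ValueError("invalid MTBF or number of allowed failures")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    risk = 1.0 - confidence
    low, high = 0.0, 1.0
    # Expand the bracket until passing by luck is unlikely enough.
    while poisson_cdf(allowed_failures, high) > risk:
        high *= 2.0
    # Bisection on the expected number of failures at the target MTBF.
    for _ in range(200):
        mid = (low + high) / 2.0
        if poisson_cdf(allowed_failures, mid) > risk:
            low = mid
        else:
            high = mid
    return target_mtbf * high
```

Under these assumptions, demonstrating an MTBF of 1000 hours at 90% confidence
with no failures allowed takes about 2300 test hours; allowing one failure
raises this to about 3890 hours.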
The combination of reliability parameter value and confidence level greatly
affects the development cost and the risk to both the customer and producer.
Care is needed to select the best combination of requirements. Reliability
testing may be performed at various levels, such as component, subsystem, and
system. Also, many factors must be addressed during testing, such as extreme
temperature and humidity, shock, vibration, and heat. Reliability engineering
determines an effective test strategy so that all parts are exercised in
relevant environments. For systems that must last many years, reliability
engineering may be used to design an accelerated life test.
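As an illustration of how an accelerated life test shortens test time, the
sketch below computes an acceleration factor with the Arrhenius model for
temperature stress. The choice of the Arrhenius model and any activation energy
passed in are assumptions of this example; the text does not name a specific
acceleration model, and the right one depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev: float,
                                  use_temp_c: float,
                                  stress_temp_c: float) -> float:
    """Ratio of time-to-failure at use temperature to that at stress."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    if use_k <= 0 or stress_k <= 0:
        raise ValueError("temperatures must be above absolute zero")
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / use_k - 1.0 / stress_k
    )
    return math.exp(exponent)


def equivalent_use_hours(stress_hours: float, acceleration_factor: float) -> float:
    """Use-condition hours represented by a given time under stress."""
    return stress_hours * acceleration_factor
```

For instance, with a hypothetical activation energy of 0.7 eV, each hour at
125 °C stands in for many hours at 55 °C, so years of field life can be
represented by a much shorter test.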
Requirements for reliability tasks
Reliability engineering must also address requirements for various
reliability tasks and documentation during system development, test, production,
and operation. These requirements are generally specified in the contract
statement of work and depend on how much leeway the customer wishes to provide
to the contractor. Reliability tasks include various analyses, planning, and
failure reporting. Task selection depends on the criticality of the system as
well as cost. A critical system may require a formal failure reporting and
review process throughout development, whereas a non-critical system may rely on
final test reports. The most common reliability program tasks are documented in
reliability program standards, such as MIL-STD-785 and IEEE 1332.