Reliability engineering
Accelerated testing
The purpose of accelerated life testing is to induce field failure in the
laboratory at a much faster rate by providing a harsher, but nonetheless
representative, environment. In such a test the product is expected to fail in
the lab just as it would have failed in the field, but in much less time.
The main objective of an accelerated test is one of the following:
- To discover failure modes
- To predict the normal field life from the high stress lab life
Accelerated testing needs planning, as follows:
- Define objective and scope of the test
- Collect required information about the product
- Identify the stress(es)
- Determine level of stress(es)
- Conduct the accelerated test and analyse the resulting data
Common models for determining a life-stress relationship are:
- Arrhenius Model
- Eyring Model
- Inverse Power Law Model
- Temperature-Humidity Model
- Temperature Non-thermal Model
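As a minimal sketch of how a life-stress relationship turns high-stress lab
life into a field-life prediction, the Python below computes acceleration
factors for two of the models listed above: the Arrhenius model (temperature
stress) and the inverse power law (non-thermal stress such as voltage). The
function names and the example values (0.7 eV activation energy, 55 C use and
125 C test temperature, exponent of 3) are illustrative assumptions, not values
taken from this article or from any standard.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Life at use temperature divided by life at stress temperature.

    Assumes life follows L(T) = C * exp(Ea / (k * T)) with T in kelvin.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def inverse_power_law_acceleration_factor(use_stress, test_stress, exponent):
    """Life at use stress divided by life at test stress.

    Assumes life follows L(V) = 1 / (K * V**n), so the ratio is (V_test / V_use)**n.
    """
    return (test_stress / use_stress) ** exponent


def predicted_field_life(lab_life_hours, acceleration_factor):
    """Scale the observed lab life by the acceleration factor."""
    return lab_life_hours * acceleration_factor


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    af = arrhenius_acceleration_factor(0.7, use_temp_c=55.0, stress_temp_c=125.0)
    print("Arrhenius acceleration factor:", round(af, 1))
    print("Predicted field life (h):", round(predicted_field_life(500.0, af)))
```

The prediction is only as good as the assumed model and its parameters. The
activation energy or power-law exponent normally has to be estimated from tests
run at more than one stress level, and the stress must stay representative so
that the lab failure mode matches the field failure mode.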
Software reliability
Software reliability is a special aspect of reliability engineering. System
reliability, by definition, includes all parts of the system, including
hardware, software, operators and procedures. Traditionally, reliability
engineering has focused on critical hardware parts of the system. Since the
widespread adoption of digital integrated circuit technology, software has
become an increasingly critical part of most electronics and, hence, of nearly
all present-day systems. There are significant differences, however, in how
software and hardware behave. Most hardware unreliability is the result of a
component or material failure that leaves the system unable to perform its
intended function. Repairing or replacing the hardware component restores the
system to its original unfailed state. Software, however, does not fail in the
same sense that hardware fails. Instead, software unreliability is the result
of unanticipated results of software operations. Even relatively small programs
can have astronomically large numbers of input and state combinations that are
infeasible to test exhaustively. Restoring software to its original state only
works until the same combination of inputs and states produces the same
unintended result. Software reliability engineering must take this into
account.
Despite this difference in the source of failure between software and
hardware (software doesn't wear out), some in the software reliability
engineering community believe statistical models used in hardware reliability
are nevertheless useful as a measure of software reliability, describing what we
experience with software: the longer you run software, the higher the
probability you'll eventually use it in an untested manner and find a latent
defect that results in a failure (Shooman 1987), (Musa 2005), (Denney 2005).
As with hardware, software reliability depends on good requirements, design
and implementation. Software reliability engineering relies heavily on a
disciplined software engineering process to anticipate and design against
unintended consequences. There is more overlap between software quality
engineering and software reliability engineering than between hardware quality
and reliability. A good software development plan is a key aspect of the
software reliability program. The software development plan describes the
design and coding standards, peer reviews, unit tests, configuration
management, software metrics and software models to be used during software
development.
A common reliability metric is the number of software faults, usually
expressed as faults per thousand lines of code. This metric, along with software
execution time, is key to most software reliability models and estimates. The
theory is that the software reliability increases as the number of faults (or
fault density) goes down. Establishing a direct connection between fault density
and mean-time-between-failure is difficult, however, because of the way software
faults are distributed in the code, their severity, and the probability of the
combination of inputs necessary to encounter the fault. Nevertheless, fault
density serves as a useful indicator for the reliability engineer. Other
software metrics, such as complexity, are also used.
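The sketch below illustrates both points from the paragraph above: how fault
density is computed, and why it does not map directly onto
mean-time-between-failure. It makes a strong simplifying assumption that is not
from this article: each fault is triggered independently at a constant rate per
execution hour, so the overall failure rate is the sum of the individual rates.
The function names and the example numbers are hypothetical.

```python
def fault_density(fault_count, lines_of_code):
    """Faults per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return fault_count / (lines_of_code / 1000.0)


def mtbf_hours_from_trigger_rates(trigger_rates_per_hour):
    """MTBF under the assumption of independent, constant-rate faults.

    Each entry is the assumed rate (per execution hour) at which the input
    and state combination that exposes one fault occurs.
    """
    total_rate = sum(trigger_rates_per_hour)
    if total_rate <= 0:
        raise ValueError("at least one fault must have a positive trigger rate")
    return 1.0 / total_rate


if __name__ == "__main__":
    # Two hypothetical programs with the same size and the same fault count.
    lines = 1000
    program_a = [0.01, 0.01]      # faults on frequently exercised paths
    program_b = [0.0001, 0.0001]  # faults on rarely exercised paths

    for name, rates in (("A", program_a), ("B", program_b)):
        print(name,
              "fault density:", fault_density(len(rates), lines),
              "MTBF (h):", mtbf_hours_from_trigger_rates(rates))
```

Both programs have the same fault density, yet their estimated
mean-time-between-failure differs by a factor of about one hundred, because the
faults in the second program sit on paths that are rarely executed.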
Testing is even more important for software than for hardware. Even the best
software development process results in some software faults that are nearly
undetectable until tested. As with hardware, software is tested at several
levels, starting with individual units, through integration and full-up system
testing. Unlike hardware, it is inadvisable to skip levels of software testing.
During all phases of testing, software faults are discovered, corrected, and
re-tested. Reliability estimates are updated based on the fault density and
other metrics. At system level, mean-time-between-failure data is collected and
used to estimate reliability. Unlike hardware, performing the exact same test on
the exact same software configuration does not provide increased statistical
confidence. Instead, software reliability uses different metrics such as test
coverage.
Eventually, the software is integrated with the hardware in the top-level
system, and software reliability is subsumed by system reliability. The Software
Engineering Institute's Capability Maturity Model is a common means of
assessing the overall software development process for reliability and quality
purposes.
Reliability operational assessment
After a system is produced, reliability engineering during the system
operation phase monitors, assesses, and corrects deficiencies. Data collection
and analysis are the primary tools used. When possible, system failures and
corrective actions are reported to the reliability engineering organization. The
data is constantly analyzed using statistical techniques, such as Weibull
analysis and linear regression, to ensure the system reliability meets the
specification. Reliability data and estimates are also key inputs for system
logistics.
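One common way to combine the two techniques named above is median rank
regression: failure times are plotted on Weibull probability axes and a straight
line is fitted by least squares. The Python below is a minimal sketch of that
approach, not a definitive implementation. It assumes complete data (every unit
ran to failure, no censoring) and uses Bernard's approximation for the median
ranks; the function names and the failure times in the example are hypothetical.

```python
import math


def weibull_fit_median_rank(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Uses the linearization ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    and fits it by ordinary least squares. Assumes uncensored data.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")

    xs, ys = [], []
    for rank, t in enumerate(times, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("failure times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                    # slope of the fitted line
    intercept = mean_y - beta * mean_x  # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


def weibull_reliability(t, beta, eta):
    """Probability of surviving beyond time t under the fitted model."""
    return math.exp(-((t / eta) ** beta))


if __name__ == "__main__":
    # Hypothetical field failure times in operating hours.
    hours = [410.0, 620.0, 790.0, 950.0, 1120.0, 1300.0, 1540.0, 1890.0]
    beta, eta = weibull_fit_median_rank(hours)
    print("shape beta:", round(beta, 2), "scale eta (h):", round(eta))
    print("R(500 h):", round(weibull_reliability(500.0, beta, eta), 3))
```

The fitted reliability at the specified mission time can then be compared with
the specification. Field data usually includes units that have not yet failed,
so a production analysis would need a method that handles censored
observations.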
Data collection is highly dependent on the nature of the system. Most large
organizations have quality control groups that collect failure data on
vehicles, equipment, and machinery. Consumer product failures are often tracked
by the number of returns.
For systems in dormant storage or on standby, it is necessary to establish a
formal surveillance program to inspect and test random samples. Any changes to
the system, such as field upgrades or recall repairs, require additional
reliability testing to ensure the reliability of the modification.
Reliability organizations
Systems of any significant complexity are developed by organizations of
people, such as a commercial company or a government agency. The reliability
engineering organization must be consistent with the company's organizational
structure. For small, non-critical systems, reliability
engineering may be informal. As complexity grows, the need arises for a formal
reliability function. Because reliability is important to the customer, the
customer may even specify certain aspects of the reliability organization.
There are several common types of reliability organizations. The project
manager or chief engineer may employ one or more reliability engineers
directly. In larger organizations, there is usually a product assurance or
specialty engineering organization, which may include reliability,
maintainability, quality, safety, human factors, logistics, etc. In such cases,
the reliability engineer reports to the product assurance manager or specialty
engineering manager.
In some cases, a company may wish to establish an independent reliability
organization. This is desirable to ensure that reliability work, which is
often expensive and time-consuming, is not unduly slighted because of budget
and schedule pressures. In such cases, the reliability engineer works for the
project on a day-to-day basis, but is actually employed and paid by a separate
organization within the company.
Because reliability engineering is critical to early system design, it has
become common for reliability engineers, however the organization is structured,
to work as part of an integrated product team.
Certification
The American Society for Quality offers a Certified Reliability Engineer (CRE)
program. Certification is based on education, experience, and a certification
test; periodic recertification is required. The body of knowledge for the test
includes reliability management, design evaluation, product safety, statistical
tools, design and development, modeling, reliability testing, and collecting
and using data, among other topics.
Reliability engineering education
Some universities offer graduate degrees in reliability engineering (e.g., the
University of Maryland). Other reliability engineers typically have an
engineering degree, which can be in any field of engineering, from an
accredited university or college program. Many engineering programs offer
reliability courses, and some universities have entire reliability engineering
programs. A reliability engineer may be registered as a Professional Engineer
by the state, but this is not required by most employers. There are many
professional conferences and industry training programs available for
reliability engineers. Several professional organizations exist for reliability
engineers, including the IEEE Reliability Society, the American Society for
Quality (ASQ), and the Society of Reliability Engineers (SRE).