__The business of System Reliability__: Defining the characteristics of our best target market with a simple cost model.
Supply chain managers have developed tools to monitor the supply of components for large-scale systems. Their analyses include parameters such as lead time, single source of supply, potential replacement parts, price and Incoterms. They usually work with component engineers to gather the data needed for their management tools. Reliability has historically been factored into the design (life duration) and the maintenance program, mostly for deterministic ageing effects that cause a slow drift in performance. Another type of reliability issue is the unexpected, random failure, such as SEEs (single-event effects), which can occur at any time, anywhere in the system, and can potentially cause catastrophic failures.
Because SEEs are difficult to model and predict, they are not always well analyzed; in large systems they can be like a needle in a haystack. When components show this type of reliability issue in the field, the information is supposed to be fed back to the supply chain’s monitoring system, which organizes repairs, recalls and, if needed, design change requests. These operations often come at a large cost to the system vendor, both in internal costs and in contract penalties.

A simplified cost estimation model of this risk and its associated costs is shown below:

The total cost is:

$$C = C_1 + P_1 \cdot C_3 + C_4 \qquad (1)$$

Where: C1 represents the cost associated with development, implementation, fabrication, sales, etc. P1 is the probability of a failure in the field. C3 represents the cost of repair or recall of the product. C4 represents the cost of maintenance.

Adding the capability of assessing and correcting SEEs before shipment drastically lowers the risk of failure in the field. It modifies the cost structure as shown below:

The total cost would then be:

$$C_R = C_1 + C_2 + P_2 \cdot C_3 + C_4 \qquad (2)$$

Where: C2 is the overall cost of improving the reliability (analysis, test, mitigation, ...). P2 is the probability of a failure given the result (and recommendations) of the reliability audit. Statistically speaking, P2 is a conditional probability in the Bayesian sense.

The difference of cost between the two approaches would then be:

$$\Delta C = C_R - C = C_2 + C_3 \cdot (P_2 - P_1) \qquad (3)$$

ΔC needs to be negative for a reliability audit to make business sense and generate a positive return on investment. Note that if the reliability analysis is performed well, the repair cost C3 should be much lower than in the previous case. For the sake of simplicity, we'll assume that C3 is the same (worst case).

$$\Delta C < 0$$
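As a short worked step, the inequality can be rearranged using only the symbols defined above. P1 is read here as the probability of a failure in the field without the audit, which is our interpretation of the model rather than an explicit definition:

```latex
\begin{aligned}
\Delta C < 0
  &\iff C_2 + C_3\,(P_2 - P_1) < 0 \\
  &\iff C_2 < C_3\,(P_1 - P_2)
\end{aligned}
```

In words, the audit pays for itself when its cost C2 is smaller than the expected failure cost it removes, C3 multiplied by the reduction in failure probability. Since C2 and C3 are positive, this can only hold if P2 is smaller than P1.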

The conditions for this to happen are clear from (3), and define our target markets and our offering:

- C3 is large. This corresponds to certain applications and industries. In our experience, this cost can be prohibitive in aerospace, medical devices and cloud infrastructure.

- P2 << P1: this condition is achieved mainly with accurate analysis tools, deep knowledge and expertise in the field, and effective mitigation strategies.

- C2 is as small as possible: C2 has two components, the cost of analysis and the cost of mitigation. C2 depends on the stage at which the problem is audited: the earlier, the cheaper.
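To make these conditions concrete, here is a minimal Python sketch of the cost model. The class name `CostModel`, its method names and every number in the example are hypothetical and chosen only for illustration; they are not data from a real project. P1 is assumed to be the probability of a field failure without the audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Inputs of the simplified cost model (all example values are hypothetical)."""
    c1: float  # development, implementation, fabrication, sales, etc.
    c2: float  # reliability audit: analysis, test, mitigation
    c3: float  # repair or recall of the product
    c4: float  # maintenance
    p1: float  # assumed: probability of a field failure without the audit
    p2: float  # probability of a field failure given the audit results

    def total_without_audit(self) -> float:
        # Equation (1): C = C1 + P1*C3 + C4
        return self.c1 + self.p1 * self.c3 + self.c4

    def total_with_audit(self) -> float:
        # Equation (2): C_R = C1 + C2 + P2*C3 + C4
        return self.c1 + self.c2 + self.p2 * self.c3 + self.c4

    def delta(self) -> float:
        # Equation (3): delta C = C2 + C3*(P2 - P1)
        return self.c2 + self.c3 * (self.p2 - self.p1)

    def break_even_audit_cost(self) -> float:
        # Largest C2 for which the audit still pays off: C3*(P1 - P2)
        return self.c3 * (self.p1 - self.p2)


# Hypothetical scenario: expensive recall, audit cuts failure risk from 5% to 1%.
example = CostModel(c1=1_000_000, c2=50_000, c3=5_000_000,
                    c4=200_000, p1=0.05, p2=0.01)

print(f"Cost without audit: {example.total_without_audit():,.0f}")
print(f"Cost with audit:    {example.total_with_audit():,.0f}")
print(f"Delta C:            {example.delta():,.0f}")
print(f"Audit pays off:     {example.delta() < 0}")
```

With these made-up inputs, ΔC is negative because C3 is large and P2 is well below P1. Raising `c2` above `break_even_audit_cost()` flips the sign, which is the third condition in the list above.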

Therefore, for the target markets in which the cost of failure is prohibitive, we have to bring enough expertise to significantly lower the probability of failure through accurate analysis and effective mitigation, and we should intervene as early as possible in the design phase. These statements are very helpful when defining our product portfolio and the type of data and engagement that we'll be seeking.

If you find these thoughts interesting, or you'd like to react to this blog, let us know! Your comments are welcome as usual!
