Tuesday, March 25, 2014



The Business of System Reliability: Defining the characteristics of our best target market with a simple cost model

Supply chain managers have developed tools to monitor the supply of components for large-scale systems. Their analyses include parameters such as lead time, single source of supply, potential replacement parts, price and Incoterms, and they usually work with component engineers to gather the data these tools need. Reliability has historically been factored into the design (life duration) and the maintenance program, mostly for deterministic ageing effects that cause a slow drift in performance. Another type of reliability issue is the unexpected, random failure, such as a single-event effect (SEE), which can occur at any time and anywhere in the system and can lead to catastrophic failure. Because SEEs are difficult to model and predict, they are not always well analyzed; in a large system they can be like a needle in a haystack. When components show this type of reliability issue in the field, the information is supposed to be fed back to the supply chain's monitoring system, which organizes repairs, recalls and, if needed, design change requests. These operations often come at a large cost to the system vendor, both in internal costs and in contract penalties.
This risk and its associated costs can be captured in a simplified cost model.



The total cost is:
                                    C = C1 + P1*C3 + C4                              (1)

Where: C1 represents the cost associated with development, implementation, fabrication, sales, etc. P1 is the probability of a failure in the field. C3 represents the cost of repair or recall of the product. C4 represents the cost of maintenance.

Adding the capability to assess and correct SEEs before shipment substantially lowers the risk of failure in the field, and it changes the cost structure.


The total cost would then be:
                              CR = C1 + C2 + P2*C3 + C4                         (2)
Where: C2 is the overall cost of improving the reliability (analysis, test, mitigation, ...). P2 is the probability of a failure in the field given the results (and recommendations) of the reliability audit. Statistically speaking, it is a conditional probability in the Bayesian sense.
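To make equations (1) and (2) concrete, here is a minimal Python sketch that evaluates both totals. The function names and every figure in it are hypothetical, chosen only for illustration (arbitrary currency units); they are not data from an actual project.

```python
def baseline_cost(c1, c3, c4, p1):
    """Equation (1): total cost without a reliability audit."""
    return c1 + p1 * c3 + c4


def audited_cost(c1, c2, c3, c4, p2):
    """Equation (2): total cost when a reliability audit is added."""
    return c1 + c2 + p2 * c3 + c4


# Hypothetical inputs, for illustration only.
c1 = 1_000_000   # development, fabrication, sales, ...
c2 = 50_000      # reliability analysis, test and mitigation
c3 = 5_000_000   # repair or recall
c4 = 200_000     # maintenance
p1 = 0.05        # probability of field failure without the audit
p2 = 0.005       # probability of field failure after the audit

c = baseline_cost(c1, c3, c4, p1)
cr = audited_cost(c1, c2, c3, c4, p2)

print(f"C  (no audit)   = {c:,.0f}")
print(f"CR (with audit) = {cr:,.0f}")
```

With these assumed inputs the audited total comes out lower than the baseline, because the expected repair cost P*C3 shrinks by more than the audit costs.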
The difference in cost between the two approaches is then:
                          ΔC = CR - C = C2 + C3*(P2 - P1)                     (3)
ΔC needs to be negative for the reliability audit to make business sense and generate a positive return on investment. Note that if the reliability analysis is performed well, the repair cost C3 should be much lower than in the previous case. For the sake of simplicity, we assume that C3 is the same in both cases (worst case).

                                                    ΔC < 0
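As a quick check of the sign of equation (3), the following Python sketch computes ΔC for two hypothetical cases: one where a failure is very expensive and one where it is not. The function names and all numbers are assumptions made for illustration.

```python
def delta_cost(c2, c3, p1, p2):
    """Equation (3): cost with audit minus cost without audit."""
    return c2 + c3 * (p2 - p1)


def audit_pays_off(c2, c3, p1, p2):
    """The audit makes business sense only when delta_cost is negative."""
    return delta_cost(c2, c3, p1, p2) < 0


# Hypothetical case A: expensive field failure (large C3).
high_c3 = delta_cost(c2=50_000, c3=5_000_000, p1=0.05, p2=0.005)

# Hypothetical case B: same audit, but a cheap field failure (small C3).
low_c3 = delta_cost(c2=50_000, c3=200_000, p1=0.05, p2=0.005)

print(f"Large C3: delta = {high_c3:,.0f}")
print(f"Small C3: delta = {low_c3:,.0f}")
```

In case A the audit saves money (negative ΔC); in case B the same audit costs more than the risk it removes, so ΔC is positive.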

The conditions for ΔC to be negative follow directly from (3), and they define our target markets and our offering:

· C3 is large. This is the case for certain applications and industries; in our experience, the cost of failure can be prohibitive in aerospace, medical devices and cloud infrastructure.

· P2 << P1. This condition is achieved mainly through accurate analysis tools, deep knowledge and expertise in the field, and effective mitigation strategies.

· C2 is as small as possible. C2 has two components: the cost of analysis and the cost of mitigation. It depends on the stage at which the problem is audited: the earlier, the cheaper.
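The three conditions can be read as a break-even rule: rearranging (3), the audit is worthwhile when C2 < C3*(P1 - P2). The Python sketch below, under purely hypothetical scenario names and figures, shows how the largest affordable audit budget moves with C3 and with the risk reduction P1 - P2.

```python
def max_audit_budget(c3, p1, p2):
    """Largest C2 for which equation (3) stays negative: C3 * (P1 - P2)."""
    return c3 * (p1 - p2)


def min_risk_reduction(c2, c3):
    """Smallest P1 - P2 that an audit costing C2 must deliver to break even."""
    return c2 / c3


# Hypothetical scenarios: (C3, P1, P2). None of these are measured values.
scenarios = {
    "large C3, strong mitigation": (5_000_000, 0.05, 0.005),
    "small C3, strong mitigation": (200_000, 0.05, 0.005),
    "large C3, weak mitigation": (5_000_000, 0.05, 0.045),
}

budgets = {name: max_audit_budget(*args) for name, args in scenarios.items()}

for name, budget in budgets.items():
    print(f"{name:30s} break-even C2 = {budget:,.0f}")
```

Under these assumptions, only the first scenario leaves room for a substantial audit budget, which is consistent with the three conditions listed above.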

Therefore, in target markets where the cost of failure is prohibitive, we have to bring enough expertise to significantly lower the probability of failure through accurate analysis and effective mitigation, and we should intervene as early as possible in the design phase. These statements are very helpful when defining our product portfolio and the type of data and engagement that we'll be seeking.


If you find these thoughts interesting, or you'd like to react to this blog, let us know! Your comments are welcome, as usual!