Why is failure prevention not the main RCM (Reliability) focus?

In maintenance, our usual target is to prevent failure. In Reliability-Centered Maintenance (RCM), this focus needs to shift to eliminating or reducing the consequences of failure. RCM even encourages letting a failure happen when its consequences cost less than the planned maintenance action, or when the equipment fails regardless of whether you adhere to the maintenance program. Many factors affect the failure or unsatisfactory performance of equipment, and many of these factors, along with our perceptions of failure, are not directly related to the equipment itself. Let us demystify it bit by bit.
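The rule of thumb above (let a failure happen when it costs less than the planned task, or when the task does not prevent it) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `FailureMode` record, its fields, the strategy labels, and the yearly cost comparison are all hypothetical, an effective task is assumed to prevent the failure completely, and the "redesign or mitigate" branch for ineffective tasks with safety consequences is an added assumption rather than something stated in this post.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Hypothetical record for one failure mode; all fields are illustrative."""
    name: str
    failure_cost: float         # expected cost of one failure (repair + lost production)
    failures_per_year: float    # expected failures per year with no planned task
    task_cost_per_year: float   # yearly cost of the planned maintenance task
    safety_or_env_risk: bool    # True if a failure can harm people or the environment
    task_is_effective: bool     # True if the task actually reduces this failure


def choose_strategy(fm: FailureMode) -> str:
    """Return 'planned maintenance', 'run to failure', or 'redesign or mitigate'."""
    if not fm.task_is_effective:
        # The equipment fails whether or not the program is followed,
        # so the planned task only adds cost.
        return "redesign or mitigate" if fm.safety_or_env_risk else "run to failure"
    if fm.safety_or_env_risk:
        # Health, safety, or environmental consequences override the cost comparison.
        return "planned maintenance"
    yearly_failure_cost = fm.failure_cost * fm.failures_per_year
    if yearly_failure_cost < fm.task_cost_per_year:
        return "run to failure"
    return "planned maintenance"
```

For example, an indicator lamp that costs little to replace and fails rarely would come out as "run to failure", while a pump whose failure stops production would stay under planned maintenance.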

What is failure prevention?

To prevent a failure, you need to take a restoration action before the failure occurs. This means you need to know when the equipment will fail. Then you schedule maintenance on the latest possible day before that date to restore the equipment to its original condition, either by replacing the part or by overhauling it. But how do we know when the equipment will fail? There are two ways to do this:

Either we compare the maintenance cycle, expressed in time or operations, to one of:
- actual operation hours
- calendar hours
- number of operations
Or we base the decision on the equipment condition, i.e. the workplace adopts a condition monitoring maintenance program, whether the tools and analysis are in-house or externally contracted. Each condition monitoring task needs to be analyzed to confirm that it adds value to the maintenance operation.
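The two ways of deciding when to act can be sketched as two small checks. This is a minimal illustration, not a description of any particular maintenance system: the `Interval` and `Usage` records, the function names, and the idea of a single alarm level are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    """Hypothetical maintenance cycle limits; None means 'not specified'."""
    operating_hours: Optional[float] = None
    calendar_hours: Optional[float] = None
    operations: Optional[int] = None


@dataclass
class Usage:
    """Usage accumulated since the last restoration."""
    operating_hours: float = 0.0
    calendar_hours: float = 0.0
    operations: int = 0


def due_by_cycle(limit: Interval, used: Usage) -> bool:
    """Cycle-based trigger: due as soon as any specified limit is reached."""
    checks = [
        (limit.operating_hours, used.operating_hours),
        (limit.calendar_hours, used.calendar_hours),
        (limit.operations, used.operations),
    ]
    return any(lim is not None and value >= lim for lim, value in checks)


def due_by_condition(measured: float, alarm_level: float) -> bool:
    """Condition-based trigger: due when a monitored value reaches its alarm level."""
    return measured >= alarm_level
```

Note that `due_by_cycle` never looks at the health of the equipment; it only counts usage. `due_by_condition` does the opposite, which is why each monitored value has to be worth its measurement cost.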

The first way implicitly assumes that the equipment ages with time. It is true that equipment ages, but what is the relation between how old the equipment is and its wear and tear?

Equipment Aging and maintenance cycle

There are three (3) main sources for maintenance intervals:

First, the Operation and Maintenance manual:
- It either sets a fixed time or number of operations.
- Or it gives an indication or measurement value at which to take action. This indication can be the pressure across a filter or the level or condition of the coolant.
- By default, the maintenance team tends to follow the fixed time or number of operations. This may also be linked to the equipment warranty: the supplier expects you to follow the number of operations or operation hours strictly, regardless of how healthy the equipment is.
  Strict adherence is fully justified when the consequence of failure involves:
  - Health, safety, or environmental risks
  - Or high costs relative to the cost of the part itself.
Second, the team's experience from previous jobs.
Third, workplace experience, which builds up after some years of operation.

However, aging is not always the magic answer

As we previously explained, the aviation industry pioneered the reliability-centered maintenance process, which is standardized under SAE JA1011. To reach this level, it conducted many studies to upgrade the maintenance patterns inherited from other industries to suit aviation safety and cost needs. In the 1970s, those studies revealed some astonishing results that are now well known in other industries. However, the results from aviation might not match what other industries would find if they performed the same studies. Why? For many reasons, the most important being that not every industry has equipment as complex and technologically advanced as aviation's. Some industries have simple or old equipment that follows the traditional wear patterns. Still, we can use the aviation results as a guide.

The studies led by the aviation industry identified six failure patterns. Three of them are age related. Unfortunately, these account for only 11% of failures in the aviation industry and 23% in the naval industry. The remaining three patterns, which account for roughly 70% to 90% of failures, are random in nature. Random here means that they are influenced by factors external to the equipment itself. If we manage to mitigate these external agents, the equipment could continue to work for an indefinite time. Those mitigation efforts are the maintenance activities derived from the reliability analysis.
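To see why a fixed replacement interval does little against a random pattern, here is a small numerical sketch. It assumes a Weibull life model, which is a common modeling choice and not something taken from the studies above: a shape of 1 gives a constant failure rate (used here as a stand-in for "random"), while a shape above 1 gives a wear-out pattern. The shape, scale, and interval values are hypothetical.

```python
import math


def survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function: probability of running past age t."""
    return math.exp(-((t / scale) ** shape))


def failure_prob_next_interval(age: float, interval: float,
                               shape: float, scale: float) -> float:
    """Probability of failing within `interval`, given survival up to `age`."""
    return 1.0 - survival(age + interval, shape, scale) / survival(age, shape, scale)


if __name__ == "__main__":
    # Illustrative parameters only: scale of 1000 hours, look-ahead of 100 hours.
    for shape, label in [(1.0, "random (constant rate)"), (3.0, "age related (wear-out)")]:
        probs = [failure_prob_next_interval(age, 100.0, shape, 1000.0)
                 for age in (0.0, 500.0, 1000.0)]
        print(label, [round(p, 3) for p in probs])
```

With shape 1, the chance of failing in the next 100 hours is the same for a new part and an old one, so swapping the old part for a new one buys nothing. With shape 3, the chance rises with age, and that is the only case where a time-based replacement pays off.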

Moreover, adding a new maintenance activity is not usually the answer

In some organizations, when a piece of equipment fails, the easy countermeasure is to add some sort of maintenance activity: a new item in a checklist, a new point on the condition monitoring route, or a repair action added to the maintenance program. Yet the equipment fails again, and again we look for an action or a valid excuse in order to close the failure report, and so on. Unless you have a solid reason for adding a maintenance activity, you will only overload the team or introduce more causes of failure.

Once upon a time, a room filled with electronic cards suffered multiple failures. The quick fix was to clean the dust off the cards with an air blower. Blowing air on the cards while they were in their racks drove the dust into the cards and caused more failures. Taking the cards out and blowing them individually took a lot of time, and reinstalling them caused a wave of infant mortality failures, as if the system were going through a new startup. The solution was to seal the room against dust and prevent natural ventilation. In addition, an alarm is now generated if the door is left open. The card cleaning maintenance item was canceled.

Why does Reliability focus on how equipment fails?

Because Reliability-Centered Maintenance (RCM) wants to broaden our view of failure, as below:

There are many reasons or motivators for the same failure. Inhibiting those motivators early enough will save the equipment from failing. The external motivators that induce a failure, and that need a root cause analysis to catch, include:
- Loosened screw
- Cracked base
- Induced failure due to:
  - A failing nearby equipment
  - Nearby liquid or gas Leakage
  - Fire
- Human mistakes
- Ventilation
- Temperature
- Humidity
- Electronic sudden death due to the failure of internal components on electronic cards. The failure pattern of electronic cards starts with infant mortality; once the cards stabilize, failure is totally random unless one of the above factors occurs.
The criteria for considering a condition a failure do not require that the equipment has failed totally. (More in our operation & Reliability post.) Losing some secondary functions can discourage the operator or user from using the equipment. A simple example: the main function of a plane is to carry passengers through the air to their destination, while serving meals and providing toilets are secondary functions. For a passenger, an unavailable toilet or an unsuitable cabin condition can count as a failure even if they reach their destination safely.
No maintenance task can increase the original capability of the equipment.

In Conclusion,

From the above points, we can see that there are many motivators for the failure or partial failure of equipment. Reducing those motivators and reducing the consequences of failure are more important than trying to eliminate a randomly occurring failure. Some of those motivators include wrong perceptions about the equipment's capabilities. A failure will not be prevented just by adding another calendar-based preventive task or a new point in the predictive inspection cycle without knowing what we are really trying to prevent.

