by Ricky Smith, CMRP
Most companies have reduced costs during these tough economic times in order to survive – shutting down equipment, laying off or furloughing staff, extending time between rebuilds and preventive maintenance – without thinking about the preparation they should be making to take advantage of the economic turnaround. There will be winners and losers once the world economy turns around. Companies that are prepared will win by shutting down their competition and demonstrating even higher profits than before the economic downturn. This can be accomplished by focusing on “optimizing asset integrity at optimal cost”.
Do You Want to be a Leader or a Laggard When the Economy Turns Around?
Many industries have advanced maintenance and reliability in their operation and have taken great strides toward managing asset integrity. By applying known maintenance and reliability best practices, they have been able to optimize asset integrity and reduce total cost while, at the same time, reducing overall risk. But most organizations are still trapped in the old way of thinking and do not follow the true definition of asset integrity. One good way to know where your organization stands is to answer these questions:
1. Have your assets been ranked based on criticality?
The definition of critical equipment may vary from organization to organization. In fact, if it is not formalized, there may be several interpretations of equipment criticality within a single organization. The assumptions used to assess what equipment is critical are not technically based. As a result, when different individuals are asked to identify their critical equipment, they will likely select different pieces of equipment. Often we are told, “all of our equipment is critical!” Selections are based on individual opinions, lacking consensus or even basis in functional relevance to achieving business goals. The potential for equipment failure having significant safety, environmental or economic consequences may be overlooked.
A consistent definition for equipment criticality needs to be adopted. The definition used in the context of this article is: Critical equipment is that equipment whose failure has the highest potential impact on the business goals of the company.
Each asset is ranked based on environmental impact, customer impact, cost, and more. Remember, 20% of your assets utilize 80% of your resources. Criticality ranking is influenced by all key function leaders in an organization, and their decisions determine which equipment to focus on with a strong maintenance strategy, equipment replacement strategy, PM/PdM compliance and planned and scheduled maintenance work.
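As a rough illustration of how such a ranking might be put together, the sketch below scores each asset on a few impact criteria and sorts by the weighted total. The criteria names, the 1 to 5 scale, the weights and the asset names are all hypothetical; in practice they would be agreed on by the key function leaders.

```python
from dataclasses import dataclass

# Hypothetical weights; each organization would set its own by consensus.
WEIGHTS = {"safety": 0.3, "environmental": 0.2, "customer": 0.3, "cost": 0.2}


@dataclass
class Asset:
    name: str
    scores: dict  # criterion -> score on an assumed 1 (low) to 5 (high) scale


def criticality(asset):
    """Weighted sum of the impact scores; higher means more critical."""
    return sum(WEIGHTS[c] * asset.scores[c] for c in WEIGHTS)


def rank_assets(assets):
    """Return the assets sorted from most to least critical."""
    return sorted(assets, key=criticality, reverse=True)


# Made-up example assets and scores.
assets = [
    Asset("Cooling water pump", {"safety": 2, "environmental": 1, "customer": 3, "cost": 2}),
    Asset("Main extruder", {"safety": 3, "environmental": 2, "customer": 5, "cost": 5}),
    Asset("Office HVAC", {"safety": 1, "environmental": 1, "customer": 1, "cost": 1}),
]

for asset in rank_assets(assets):
    print(f"{asset.name}: {criticality(asset):.2f}")
```

A formal, shared scoring scheme like this is one way to avoid the "all of our equipment is critical" answer, because every asset is judged against the same criteria.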
2. Can you measure Mean Time Between Failure (MTBF) of your critical assets?
This is a simple metric: operating time divided by the number of failures, which can be counted as emergency or urgent work orders. If your maintenance department cannot provide this information in a matter of minutes for the total operation or for a critical asset, then there are two problems:
A) The maintenance software is not being used to its fullest potential, even though you may have spent millions to install and maintain it
B) The maintenance department has not been educated in basic maintenance management principles
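To show how little is needed to produce this number, here is a minimal sketch that treats each emergency or urgent work order as one failure. The function name, the work order IDs and the dates are made up, and it assumes the asset was expected to run for the whole period; a real calculation would use actual operating hours from your maintenance software.

```python
from datetime import datetime


def mtbf_hours(period_start, period_end, failure_work_orders):
    """MTBF = hours in the period / number of emergency or urgent work orders.

    Simplifying assumption: the asset was expected to run the whole period.
    Returns None when no failures were recorded, since MTBF is then undefined.
    """
    period_hours = (period_end - period_start).total_seconds() / 3600
    failures = len(failure_work_orders)
    if failures == 0:
        return None
    return period_hours / failures


# Hypothetical data: one critical asset over a 30-day period.
start = datetime(2024, 1, 1)
end = datetime(2024, 1, 31)
work_orders = ["WO-1001", "WO-1017", "WO-1042"]

print(mtbf_hours(start, end, work_orders))  # 720 hours / 3 failures = 240.0
```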
3. Do you have 100% PM compliance and continue to have equipment breakdown?
It’s not too hard to figure this one out. Take a sample of your current PM inspection sheets. If the tasks on them do not prevent a failure mode, do not detect the presence of a failure mode, and are not regulatory in nature, then you have a problem.
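The check described above can be sketched as a simple audit of a PM task sample. The task descriptions and the way each one is flagged are hypothetical; the point is only that every task should pass at least one of the three tests.

```python
from dataclasses import dataclass


@dataclass
class PMTask:
    description: str
    prevents_failure_mode: bool = False
    detects_failure_mode: bool = False
    regulatory: bool = False


def has_purpose(task):
    """A task is justified if it passes at least one of the three tests."""
    return task.prevents_failure_mode or task.detects_failure_mode or task.regulatory


def audit(tasks):
    """Return (tasks with no clear purpose, percent of tasks that are justified)."""
    questionable = [t for t in tasks if not has_purpose(t)]
    justified_pct = 100 * (len(tasks) - len(questionable)) / len(tasks) if tasks else 0.0
    return questionable, justified_pct


# Hypothetical sample taken from PM inspection sheets.
tasks = [
    PMTask("Replace gearbox breather filter", prevents_failure_mode=True),
    PMTask("Vibration route on pump bearings", detects_failure_mode=True),
    PMTask("Annual pressure vessel inspection", regulatory=True),
    PMTask("Check motor"),
]

questionable, justified_pct = audit(tasks)
print(f"{justified_pct:.0f}% of sampled tasks are justified")
for task in questionable:
    print("Review or remove:", task.description)
```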
4. Are your total maintenance costs going down or are they flat?
If your maintenance costs continue to go up and you are adding staff or contractors, you are in what’s known as reactive maintenance mode. Education in, and implementation of, best practices is the solution. Knowledge of common best practices will likely increase how often your maintenance staff and senior leadership choose to apply them in a given scenario. The maintenance process is like a puzzle; if you want to see the whole picture, then all elements must be in place and applied at the right time.
In this article, we will discuss what are known as best practices in Asset Integrity and how they are applied to prepare an organization for the economic turnaround.
Let’s look at Shell’s Asset Integrity and Risk Management Statement:
We are committed to preventing incidents that put our people, neighbors, the environment and our facilities at risk. Asset integrity and process safety means making sure our facilities are well designed, safely operated and properly maintained.
Webster’s Dictionary contains the following definitions: Asset: anything of value that a corporation owns; and Integrity: firm adherence to a code. So, it follows that the definition of Asset Integrity is: anything of value that a corporation owns that is held in firm adherence to a code.
The Codes can be defined as ISO, ASQ, etc. The question is: to what level is your company adhering to these codes? Of course, you can say we satisfy environmental or safety codes by meeting PM compliance every month. However, is this the intent of a true PM Program? The purpose of a preventive maintenance program is to prevent or predict a specific failure mode. All PM compliance tells us is that the PM is completed within some schedule, but it does not tell us if the PM is effective or not. If something is of value, then a corporation should want to ensure it is well protected from failure, especially if the failure results in unacceptable costs.
Let’s look at how this entire Asset Integrity process works. Asset Integrity impacts safety, cost, and service to the customer. It also impacts your competition. If we look at the Market Survivor Model in Figure 1, it tells us a lot.
Any company will own a certain portion of market share. The market price drives profits and your future. Company “A” has high market share, low costs, and high Asset Integrity; thus, it is prepared for the economic turnaround. Company “B” has a large part of market share; however, its cost is higher than Company “A”, and this puts it at some risk. Its Asset Integrity is not optimized and thus its cost, along with risk, is higher. Company “C” has the smallest market share and the largest cost. A good example of a Company “C” would be Alcatel Telecommunications in 1999. The fiber optic cable business was growing at a rapid rate. By assessing the current level of their Asset Integrity program, we found key areas of improvement and closed the gap quickly. They went from being a “C” Company to an “A” Company within 6 months with a Return on Investment (ROI) of 8:1. This involved the application of known best practices in maintenance and reliability. Best practices are measurable and do impact the customer’s need in quality, delivery and price as seen in Figure 2.
Some of the Known Best Practices are:
Using the right metrics or Key Performance Indicators (KPIs) to manage asset health effectively and efficiently – such as:
1. Mean Time Between Failures: Most companies don’t measure Mean Time Between Failures (MTBF), even though it’s the most basic measurement that quantifies reliability. MTBF is the average time an asset functions before it fails. MTBF should be used for:
• Overall Operation
• Area
• Asset Type
So, why don’t they measure MTBF? We will discuss those reasons a little later.
2. Percentage of Assets with No Identifiable Defects – This means you can identify a component defect early enough in the failure cycle that work can be planned and scheduled effectively and without interrupting operations or customers.
In this case everyone must understand the P-F Curve and how it impacts asset reliability (Figure 3). “P” on the P-F Curve is the point at which failure begins on a specific component which will lead to catastrophic failure of an asset. Once a failure begins, we call this a defect and the severity of the defect and criticality of the asset determines how quickly we respond to the problem. If the defect severity is low and asset criticality is low, then there is no panic.
Tracking the percent of assets with “NO Identifiable Defect” is key to knowing the current health of your assets. When the percentage of Assets with No Identifiable Defect is over 80%, there is no longer a need to track Mean Time Between Failure (MTBF) because we are now in a proactive, not reactive mode. An asset that has an identifiable defect is said to be in a condition RED. An asset that does not have an identifiable defect is said to be in condition GREEN. That is it. It is that simple! There are no other “but’s”, “what if’s” or “if then’s”. Yellow is an unknown and must be determined to be either Green or Red within 72 hours. Figure 4 is an example of this metric.
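A minimal sketch of this metric follows. The asset tags and timestamps are made up, and counting YELLOW assets in the denominator (so they lower the percentage until resolved) is an assumption, not something the definition above spells out.

```python
from datetime import datetime, timedelta

YELLOW_LIMIT = timedelta(hours=72)  # an unknown must be resolved within 72 hours


def percent_green(conditions):
    """Percent of all assets in condition GREEN (no identifiable defect)."""
    if not conditions:
        return 0.0
    green = sum(1 for state in conditions.values() if state == "GREEN")
    return 100 * green / len(conditions)


def overdue_yellow(yellow_since, now):
    """Assets that have stayed YELLOW for more than 72 hours."""
    return [tag for tag, since in yellow_since.items() if now - since > YELLOW_LIMIT]


# Hypothetical plant with ten assets: eight GREEN, one RED, one YELLOW.
conditions = {f"P-{n}": "GREEN" for n in range(101, 109)}
conditions["P-109"] = "RED"
conditions["P-110"] = "YELLOW"

yellow_since = {"P-110": datetime(2024, 3, 1, 8, 0)}
now = datetime(2024, 3, 5, 8, 0)

print(percent_green(conditions))          # 8 of 10 assets -> 80.0
print(overdue_yellow(yellow_since, now))  # 96 hours as YELLOW -> ['P-110']
```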
Applying the right maintenance strategy at the right time – When an organization is focused on preventing and predicting failure modes, they can “Optimize Asset Integrity at an Optimal Cost” along with managing associated risks. My question to you is…
“Do you know the failure modes of your critical assets, and if so, are you applying the right maintenance strategy at the right time?”
An example of a failure mode would be pump bearing failure which could have been caused by poor alignment practices. If an inspection checklist for alignment had been followed, the problem would have been found. A failure modes analysis would identify Vibration Analysis as the most cost effective maintenance strategy for detecting alignment problems. Vibration Analysis will indicate if the shafts are in alignment, if a bolt is loose on the motor, if the pump has bearing stress, etc.
Failure of critical assets is unacceptable, as is spending too much money on reliability. The focus should be on applying the maintenance strategy that predicts or prevents a failure mode in the most cost effective manner.
Identifying the Most Dominant Failure Pattern in your Operation – Many organizations today are focusing their resources on the most dominant failure pattern in their operation instead of reacting to problems. Identifying the most dominant failure pattern allows a company to focus on the common thread which has the largest impact on Asset Integrity. The US Navy conducted a study of their assets and found the most dominant failure pattern was infant mortality and considered the findings to be unacceptable. They put forth an effort to reduce infant mortality of their assets. Focusing on the dominant failure pattern allowed them to identify commonalities between different types of assets. They were successful in doing so, as the percentage of Infant Failures dropped from 69% to 6%, which had a significant impact on their overall Asset Integrity. The failure patterns shown in Figure 5 come from studies conducted back in the 1960s by United Airlines. These failure patterns have been found to be the same across different industry verticals.
The causes of Infant Mortality have been found to be the same in most industries. Here are a few examples of what causes Infant Mortality:
Number 1: Lack of effective preventive, corrective and lubrication work procedures.
Number 2: Employees fail to follow the steps for a given procedure because either they do not agree with them or there is a lack of leadership follow-up.
Number 3: No contamination control – personnel apply grease into perfectly good bearings with contaminated grease on the end of lube fittings on the motors or the lube guns.
Number 4: Not removing the relief plug on large motors to allow old grease to purge out of the motor. Walk out to a 30-100 HP motor and look for a plug under the motor bearing and see if the plug has ever been removed. My guess is that it hasn’t.
Number 5: Welding on equipment without attaching the ground lead within 6” of the weld. Most construction and maintenance personnel attach the ground for welding as close to the welding machine as possible, thus allowing current to flow to the path of least resistance (aluminum conduit, etc.). We do not want arcing between bearings, motor stators, electrical circuits, etc. If the welding lead and ground lead on your (or your contractor’s) welding machine are not the same length, then it will cause either infant mortality or random failures.
This list could go on and on. Just remember that if you can reduce the occurrence of this failure pattern by 50%, you will make a large impact on both asset reliability and cost. The ultimate goal is to extend the life of the equipment without seeing “P” on the P-F Curve. You can accomplish this by applying Precision Maintenance, which includes training craft personnel, effective work procedures, etc. Look at the I-P Curve in Figure 7, and decide on which curve your maintenance staff is currently working. This is exactly what the US Navy did, and their results speak for themselves.
I am sure that some people will say, “we do not have infant mortality”. Well, I hope that after reading this article, you actually investigate for yourself. If you would like a longer list, send me an e-mail and tell me which industry you are in, and I will share industry specific issues which cause infant mortality. The long I-P will never be obtained as long as the focus is not on the reduction of infant mortality by everyone from Project Engineering to Operators.
What a Successful Asset Integrity Manager Does
A successful Asset Integrity Manager focuses on providing reliability to an operation at the rate which its customers demand. However, we know this is not easy. Many organizations have spent millions of dollars performing Reliability Centered Maintenance (RCM) on their assets with what seems like very little Return on the Investment (ROI).
Doug Plucknette, a noted RCM expert and author, states in his book Reliability Centered Maintenance using RCM Blitz, “The key to RCM was abandoning the philosophy of ‘preserve-equipment’ in favor of ‘preserve-function’. Simply put, equipment became the means to an end, not the end in itself.” Doug has performed RCM on thousands of assets worldwide and is noted among the best in the business.
Past studies have concluded that a maintenance policy based on operating age would have little, if any, impact on failure rates. Thus, applying time-based maintenance on equipment which has no “wear-out” pattern is futile. This conclusion forced a change in philosophy from, “It wasn’t broken, but we fixed it anyway” to “If it isn’t broken, don’t fix it”. These studies also concluded that:
1. Time-based maintenance works only for a small percentage of components, and then only when there is solid information on their “wear-out” characteristics.
2. Condition Based Maintenance (CBM) is the preferred option. That means monitoring, observing and taking non-intrusive actions, such as lubricating and cleaning, until a condition signals that corrective action is necessary. This means striking the balance between PM and Condition Monitoring (CBM) / Predictive Maintenance (PdM). We know it to be a fact that the cost to repair increases the longer we wait to correct a known defect, and that the time in the x-axis on the P-F Curve is an unknown. We can’t determine how long that interval is, nor can we tell when something is going to fail (Figure 8).
If you hear people say, “I think it will last a little longer,” when a defect has been identified using one of the Condition Monitoring technologies (IR, UE, Vibe, etc.), then you are taking a great risk if you leave that equipment in service. You should be using asset criticality and defect severity to determine when to make the repair. Have you ever seen a large pump fail after the bearing defect was identified 6 months earlier? It is all about risk and the consequence of that risk.
3. Run-to-failure is a viable tactic in situations where there is little economic and no safety or environmental impact.
4. In a significant number of situations, the very act of maintenance itself causes subsequent failure of the equipment. This should be more clearly defined. This suggests that even if we do maintenance correctly, we still cause problems.
5. Non-intrusive maintenance tasks should be used instead of intrusive maintenance whenever possible. In other words, don’t do any maintenance, except monitoring and non-intrusive sustaining actions, until condition directs intrusive corrective action. A simple example is v-belt tension. The PM states “Check tension of Belt”. The equipment is stopped, the belt checked and found to be “loose”, so the mechanic tensions the belt using his hand as the tension gauge. It is highly probable he has over-tensioned the belt, because if he had under-tensioned it, you would hear the belt squeal when it returns to full speed and load. Over-tensioning of the belt will cause bearing failure. Why not use Infrared Inspection and look for belt slippage while the equipment is running (Figure 9)?
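The earlier point about using asset criticality and defect severity to decide when to make a repair can be sketched as a simple priority score. Multiplying the two scores is only one possible convention, and the 1 to 5 scales, the function name and the defect list are all hypothetical.

```python
def repair_priority(asset_criticality, defect_severity):
    """Combine two scores on an assumed 1 (low) to 5 (high) scale.

    Returns their product (1 to 25); a higher number means repair sooner.
    """
    for value in (asset_criticality, defect_severity):
        if not 1 <= value <= 5:
            raise ValueError("scores must be between 1 and 5")
    return asset_criticality * defect_severity


# Hypothetical open defects: (description, asset criticality, defect severity).
defects = [
    ("Pump P-101 bearing defect", 5, 4),
    ("Fan F-12 belt slippage", 2, 3),
    ("Conveyor C-3 gearbox noise", 4, 2),
]

# Work the list from the highest risk down, rather than waiting to see
# whether a known defect "will last a little longer".
queue = sorted(defects, key=lambda d: repair_priority(d[1], d[2]), reverse=True)
for description, crit, sev in queue:
    print(repair_priority(crit, sev), description)
```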
You must move from being a PM Centric organization to a truly PdM Centric organization if you ever want to obtain “Optimal Asset Integrity at Optimal Cost”. PdM is the most non-intrusive and precise method for conducting inspections while the machine is running under full load.
In his book, Plant Engineer’s Handbook, Keith Mobley links the following benefits to PdM:
• Maintenance costs - down by 50%
• Unexpected failures - reduced by 55%
• Repair and overhaul time - down by 60%
• Spare parts inventory - reduced by 30%
• 30% increase in machinery Mean Time Between Failures (MTBF)
• 30% increase in uptime
Now these numbers may seem high. But even if you take only a fraction of these benefits the financial impact of an effective PdM program at most organizations can easily reach into millions of dollars.
Despite what you may have heard, the foundation of a successful, comprehensive inspection program is simple: a detailed equipment list.
Why? Because your equipment list is the foundation for all of the key steps that follow. For example, a good list is essential for:
• Identifying how your equipment can fail (identifying failure modes)
• Choosing the right PdM technologies to apply to the asset
• Determining the ideal amount of PdM coverage for your operation
• Ranking the criticality of each piece of equipment
• Building databases for each PdM technology
• Determining PdM staffing levels
So if your list is incomplete or incorrect, everything that’s built from it will be flawed. Any shortcuts or inaccuracies will be exposed as big problems later. Figure 10 illustrates a sample of recommended technologies by equipment type for a specified environment.
In the last 40 years, no better method than RCM has been found for determining what maintenance should be performed to increase asset integrity. Four statistically significant studies have confirmed the validity of RCM.
In a survey conducted by Reliabilityweb.com in 2005, many companies offered the following excuses for the failure of their Reliability Centered Maintenance Implementations.
“[RCM] is too time and resource intense.”
“100% reliability is extremely expensive, difficult to attain, and not necessarily the right answer.”
“RCM is misunderstood to be software.”
“In the beginning, it was hard. And, it is still a challenge to steer the mind-set toward more condition-based maintenance than time-based.”
“We always ran into the problem with implementation. In the few places where we implemented it successfully, it was at the maintenance level. And recognition for it was non-existent.”
“The system is very strong but too high level.”
The truth is, there are many pitfalls in RCM. But few get revealed when an RCM project fails. You see, nobody wants to write an article or present a paper at a conference that reveals how money was wasted or about great visions that were never realized.
In order to ensure RCM works in an organization, you must focus on the systems that will give you the best Return on Investment (ROI). Simply put, RCM is a slam dunk when it comes to ROI for critical assets.
Begin your RCM effort by identifying the top 10% of your most critical assets. Once this list has been identified, begin to measure reliability on these assets. Perform RCM analysis on those critical assets that have equipment-based operational, speed and throughput losses; this reduces the total cost of maintaining these assets by reducing contractors, overtime, maintenance parts, and fines. If you have selected a critical asset, your implemented RCM maintenance strategy will show measurable improvements, with added improvements in Health, Safety and Environmental performance as well.
As a general rule, the success of your first RCM analysis will build the business case to complete RCM analysis on the remainder of your critical assets. In the simplest terms, RCM is a decision-making process which calls for answers to questions such as:
- What is this system supposed to do?
- How can it fail to do that?
- What causes it to fail?
- What happens when it does fail?
- Can we predict or prevent that from happening?
Getting the Tools in place - Funding, Staffing, Practices, and Reporting
Step 1 – Do not accept excuses for why you cannot transition an organization to the optimal maintenance model. Stephen Covey once stated the following: “Your sphere of influence is very small, your area of concern is very large, focus on your sphere of influence and not on your area of concern and you will find your sphere of influence will grow.”
When an organization transitions from its current maintenance model to a more proactive one, it will find that staffing is not a problem, costs are lower, and emergencies become a rare occurrence. However, you must take it one step at a time and follow the industry proven best practices. There are no excuses for following the wrong path.
Step 2 – Develop a business case for the reliability improvement process. Identify costs and Return on Investment based on known data. Involve senior leadership and the financial team in the business case development. You must have everyone sold on this idea of change or it will never happen. Too many organizations have had many false starts. You should never start a journey without everyone knowing success is the only option.
Step 3 – Develop a master plan that focuses on results and on moving an organization to a balanced PM and PdM program using the Proactive Work Flow Model (Figure 11). In this plan you want financial targets and metrics to ensure you are receiving the benefits expected. You want short term results with long term gains. Note the Work Flow includes Management of Change (MOC) procedures to ensure consistency of processes and data. In addition, a Failure Reporting, Analysis, and Corrective Action System (FRACAS) is identified for ensuring failure data is analyzed and used for decision making. This means implementing known best practices at the right time and not trying “new ideas” which are not proven.
Step 4 – Ensure you have identified Key Performance Indicators (KPIs) and financial targets. Make sure you have leading KPIs and lagging KPIs developed which equal the expected financial Return on Investment (ROI). These metrics and financial targets must be aligned with the process changes being made. In financial targets we want “hard dollar” savings and not “soft dollars”. Here is an example of what I mean: While working at the Pentagon with the US Army, I was in a meeting where a Lean Six Sigma Director was reporting to his General officer (a 3-star General). He stated that he had exceeded their expected goal of $70 million in savings. The General looked at his comptroller and asked if he had seen the savings and, if so, said he wanted to use this money for a critical project. The comptroller stated he had not seen any monetary savings. Needless to say, the General was not too happy about this situation. Remember, only “hard dollar”, verifiable savings can be reported.
Figure 12 shows some known benchmarks in the chemical processing industry.
Benchmarking can be deceiving, but it is a great way to know where you stand in relation to other companies. The metric is the amount of money spent annually maintaining assets, divided by the Replacement Asset Value (RAV) of the assets being maintained, expressed as a percentage. This metric allows one to compare the expenditures for maintenance with other plants of varying size and value, as well as to benchmarks. The RAV as the denominator is used to normalize the measurement given that different plants vary in size and value. Here is what we know:
• Companies who are leaders have annual maintenance costs of less than 3.5% of RAV
• Companies who are in a reactive state have annual maintenance costs between 3.6% and 14% of RAV
Where does your organization stack up?
RAV has been defined by the Society for Maintenance and Reliability Professionals (SMRP) after years of research. Here is SMRP’s definition:
Guidelines provide additional information or further clarification on component terms used in SMRP Best Practice Metrics. This guideline is for Replacement Asset Value (RAV).
A. Definition: The Replacement Asset Value (RAV) is defined as the cost that would be incurred, in today’s dollars, to replace the facility and equipment in its current configuration. It is intended to represent the realistic value to replace the existing assets at new value.
B. Purpose: RAV is used as the denominator in a number of calculations to normalize cost performance of facilities of various sizes within a given industry. These calculations are used to determine the performance of the maintenance and reliability function relative to other facilities within its industry.
C. Inclusions:
• Building envelope
• All physical assets (equipment) that must be maintained on an ongoing basis
• The value of improvements to grounds (provided these must be maintained on an ongoing basis)
• Capitalized engineering costs
D. Exclusions:
• Value of land on which the facility is situated
• The value of working capital:
– Raw material inventory
– Work-in-process inventory
– Finished goods inventory
– Spare parts inventory
• Capitalized interest
• Pre-operational expense
• Investments included in construction of the facility that are not part of the facility assets
• Mine development
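Replacement Asset Value Formula:
Annual Maintenance Cost as a Percent of RAV = (Annual Maintenance Cost ÷ Replacement Asset Value) × 100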
Best Practice Leaders typically have an Annual Maintenance Cost per RAV in the 2%-3.5% range, while Reactive Organizations tend to run anywhere from 4%-14%.
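For readers who want to check their own numbers, here is a small sketch of the calculation as described above: annual maintenance spend divided by RAV, expressed as a percentage. The function names and dollar figures are made up, and the 3.5% cut-off is simply the upper end of the leader range quoted in this article.

```python
LEADER_MAX_PCT = 3.5  # upper end of the best practice range quoted in the article


def maintenance_cost_pct_of_rav(annual_maintenance_cost, replacement_asset_value):
    """Annual maintenance cost divided by RAV, expressed as a percentage."""
    if replacement_asset_value <= 0:
        raise ValueError("replacement asset value must be positive")
    return 100 * annual_maintenance_cost / replacement_asset_value


def benchmark(pct_of_rav):
    """Compare a result against the leader range quoted in the article."""
    if pct_of_rav <= LEADER_MAX_PCT:
        return "within best practice range"
    return "above best practice range"


# Hypothetical plants, each with a $100 million replacement asset value.
plant_a = maintenance_cost_pct_of_rav(2_500_000, 100_000_000)
plant_b = maintenance_cost_pct_of_rav(6_000_000, 100_000_000)

print(plant_a, benchmark(plant_a))  # 2.5 within best practice range
print(plant_b, benchmark(plant_b))  # 6.0 above best practice range
```

Because RAV is the denominator, the same calculation lets plants of very different size and value be compared on one scale.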
Look at this example of the cost a company incurs by not following codes and standards identified to optimize Asset Integrity.
This does not include increased equipment performance and output, which can be 2-10 times these losses.
Step 5 – Educate everyone from top leadership to operators on your plan and what your organization’s future state looks like. You want operations to understand they own reliability integrity as an equal partner with maintenance and, of course, engineering. This is a three-legged stool which will fall if not supported properly.
Step 6 – Execute the plan and always meet targets and goals set by the plan. Ensure everyone sees the KPIs and financial targets which were established, and thus always knows the “score in the game”.
In summary, organizations can be successful if they apply best practices at the right time and tie all changes to “hard savings”. Remember, the ultimate goal is “optimal asset integrity at optimal cost” by following best practices.
Ricky has been involved with maintenance for over 30 years as a maintenance manager, maintenance supervisor, maintenance engineer, maintenance training specialist and maintenance consultant, and is a well known published author. Today he is a Principal Advisor at GPAllied, LLC. Ricky has worked with maintenance organizations in hundreds of facilities, industrial plants, etc., worldwide in developing reliability, maintenance and technical training strategies. He has worked for Exxon Company USA, Alumax (this plant was rated the best in the world for over 18 years), Kendall Company, and Hercules Chemical, providing the foundation for his reliability and maintenance experience. Ricky Smith is Vice Chairman of the Oil, Gas, and Petrochemical SIG for the Society for Maintenance and Reliability Professionals (SMRP), is the Reliability Engineering Discipline Manager for PetroSkills and is well known as a Reliability Advocate in Asset Reliability and Process Integrity in the Middle East and around the world. Ricky is the co-author of “Rules of Thumb for Maintenance and Reliability Engineers”, “Lean Maintenance”, “Industrial Repair, Best Maintenance Repair Practices” and “Planning and Scheduling Made Simple”. Ricky holds certification as a Certified Maintenance and Reliability Professional (CMRP) from the Society for Maintenance and Reliability Professionals (SMRP) as well as a Certified Plant Maintenance Manager (CPMM) from the Association for Facilities Engineering (AFE) and is also a Six Sigma Black Belt. Ricky lives in Charleston, SC with his wife. Aside from spending time with his 3 children and 3 grandchildren, Ricky enjoys kayaking, fishing, hiking and archaeology. He can be reached at rsmith@gpallied.com
Positioning For An Economic Recovery.pdf