Run-to-Failure (RTF) maintenance, also called run-to-fail or “fix it when it breaks,” is a deliberate maintenance strategy in which an asset is allowed to operate until it fails before any maintenance action is taken. (It should not be confused with failure-finding maintenance, which in RCM terminology means scheduled checks for hidden failures.) Unlike reactive maintenance, which responds to unexpected failures, RTF is a conscious, planned decision to accept failure as the trigger for maintenance on specific assets. It is a legitimate maintenance strategy when applied to the right assets under the right conditions, and a costly mistake when applied to the wrong ones.
RTF is one of several strategies within a complete maintenance program. The others (time-based PM, condition-based maintenance, and predictive maintenance) all involve intervening before failure occurs. RTF deliberately does not intervene. The strategic question is not whether RTF is good or bad in principle, but whether it is the right strategy for a specific asset given its failure consequences, failure predictability, and the relative cost of prevention versus restoration.
Reliability-Centered Maintenance (RCM) analysis consistently finds that RTF is the appropriate maintenance strategy for a significant proportion of assets in most facilities — typically those where failure has no safety consequence, no significant production impact, and where the cost of restoration after failure is less than or equal to the cost of preventing it. Applying proactive maintenance to every asset regardless of consequence is not reliability best practice — it is maintenance resource misallocation.
Why RTF Strategy Selection Matters
The decision to apply RTF to an asset is a risk acceptance decision. It accepts that the asset will fail, that the timing of the failure cannot be scheduled, and that the consequences of that failure are acceptable given the cost of preventing it. When that risk assessment is correct (the asset is truly non-critical, truly redundant, or truly cheaper to restore after failure than to maintain preventively), RTF is the economically rational choice, and applying proactive maintenance to it wastes resources that could be directed to assets where prevention delivers real value.
When the risk assessment is incorrect — when an asset classified as non-critical is actually a single point of failure, or when failure consequences include safety risks that were not fully evaluated — RTF produces outcomes that proactive maintenance would have prevented. The cost of misapplied RTF is not just the repair cost; it is the full consequence of the failure event, including production loss, safety incidents, regulatory exposure, and secondary equipment damage from the failure cascade.
RTF strategy selection is therefore not a maintenance department decision made informally — it should be the output of a structured Asset Criticality Ranking process that evaluates failure consequences systematically across the asset population and assigns maintenance strategies based on documented risk assessment rather than historical practice or convenience.
How RTF Works in Practice
RTF vs. Reactive Maintenance
RTF and reactive maintenance both result in maintenance action after failure — but they differ fundamentally in intent and preparation. Reactive maintenance is unplanned: the failure was unexpected, no preparation was made for it, and the response is improvised. RTF is planned: the failure is anticipated, spare parts are stocked, repair procedures are documented, and the response is pre-organized. The failure event itself may look identical from the outside, but the response speed, cost, and production impact are significantly different.
This distinction matters because RTF, properly applied, should not produce the extended downtime and parts scramble that characterize reactive maintenance. An asset on a deliberate RTF strategy should have its failure response pre-planned: critical spare parts on the shelf, repair procedures documented and available, technician skill requirements identified, and production impact mitigation plans in place. RTF without this preparation is not a strategy; it is reactive maintenance with a different label.
When RTF Is Appropriate
RTF is the right maintenance strategy when all of the following conditions are met:
No safety consequence: Failure of the asset does not create a risk of injury, environmental release, or regulatory compliance breach. Safety-critical assets — pressure boundaries, emergency shutdown systems, load-bearing structural elements — are never candidates for RTF regardless of their production impact.
No significant production consequence: The asset is either non-critical to production continuity or has sufficient redundancy that its failure does not interrupt production. A pump with a fully operational standby pump on auto-start is a legitimate RTF candidate; the same pump without redundancy on a single-train process is not.
Restoration cost is less than or equal to prevention cost: The total cost of allowing failure and restoring the asset — including labor, parts, and any residual production impact — is less than or equal to the cost of the preventive maintenance program that would be required to prevent it. For assets where failure is infrequent and restoration is simple, this condition is often met.
Failure mode does not cause secondary damage: When the asset fails, it fails in a contained way that does not damage adjacent equipment, contaminate product, or cascade into a larger system failure. Bearing failures that produce metal contamination in lubrication systems, for example, can damage downstream equipment and disqualify an otherwise acceptable RTF candidate.
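The four conditions above can be expressed as a simple screening check. The following is a minimal sketch, not a substitute for a formal criticality assessment; the class, field names, asset tags, and cost figures are all hypothetical and would need to be mapped to your own assessment data.

```python
from dataclasses import dataclass

@dataclass
class AssetAssessment:
    # Hypothetical fields; each one mirrors one of the four RTF conditions.
    name: str
    has_safety_consequence: bool
    interrupts_production: bool
    causes_secondary_damage: bool
    restoration_cost: float  # expected cost of failing and restoring, per year
    prevention_cost: float   # cost of the PM program that would prevent it, per year

def rtf_blockers(asset: AssetAssessment) -> list[str]:
    """Return the conditions the asset fails. An empty list means RTF candidate."""
    blockers = []
    if asset.has_safety_consequence:
        blockers.append("safety consequence")
    if asset.interrupts_production:
        blockers.append("production consequence")
    if asset.restoration_cost > asset.prevention_cost:
        blockers.append("restoration costs more than prevention")
    if asset.causes_secondary_damage:
        blockers.append("secondary damage")
    return blockers

# Illustrative values only: the same pump with and without a standby unit.
standby_backed_pump = AssetAssessment("P-101A", False, False, False, 1200.0, 3000.0)
single_train_pump = AssetAssessment("P-200", False, True, False, 1200.0, 3000.0)

print(rtf_blockers(standby_backed_pump))
print(rtf_blockers(single_train_pump))
```

Returning the list of failed conditions, rather than a bare yes/no, keeps the reason for each decision available for documentation.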
RTF in Practice — Examples
Light bulbs and minor consumables: The most straightforward RTF application. A general-purpose light bulb in a non-production area has no safety consequence when it fails, no production impact, and a restoration cost (replacing the bulb) that is trivially less than any preventive replacement program. RTF is the correct strategy — maintain a spare on the shelf and replace on failure.
Redundant equipment in large fleets: In a mining operation running 50 haul trucks, individual truck availability is less critical than fleet availability. A single truck failure does not stop production because the remaining fleet absorbs the load. RTF may be appropriate for certain truck components where the cost of preventive replacement across 50 units exceeds the cost of on-failure replacement for the small number of units that actually fail. The single rock crusher at the same operation is not an RTF candidate: its failure stops all production regardless of how many trucks are available.
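The fleet comparison comes down to simple arithmetic. The sketch below uses made-up figures purely for illustration (the per-unit costs and the expected failure count are assumptions, not data from any real operation), and it simplifies by assuming both costs cover the same period and that scheduled replacement prevents all failures.

```python
# All figures are invented illustration values.
FLEET_SIZE = 50
PLANNED_REPLACEMENT_COST = 2_000.0  # per unit: parts + labor, replaced on schedule
ON_FAILURE_COST = 5_000.0           # per failure: parts + labor + residual impact
EXPECTED_FAILURES = 6               # units expected to fail over the same period

def preventive_cost(fleet_size: int, cost_per_unit: float) -> float:
    """Cost of replacing the component on every unit before it fails."""
    return fleet_size * cost_per_unit

def on_failure_cost(expected_failures: int, cost_per_failure: float) -> float:
    """Cost of replacing the component only on the units that actually fail."""
    return expected_failures * cost_per_failure

pm_total = preventive_cost(FLEET_SIZE, PLANNED_REPLACEMENT_COST)
rtf_total = on_failure_cost(EXPECTED_FAILURES, ON_FAILURE_COST)

print(f"Preventive: {pm_total:,.0f}  Run-to-failure: {rtf_total:,.0f}")
print("RTF is cheaper" if rtf_total <= pm_total else "Prevention is cheaper")
```

With these assumed numbers, each failure costs more than a planned replacement, yet RTF is still cheaper overall because only a few units fail. The result reverses if the failure count or the per-failure cost rises far enough.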
Non-production support equipment: Office HVAC units, non-critical lighting circuits, administrative facility equipment, and support infrastructure that does not affect production continuity or safety are typically RTF candidates. The maintenance resources saved by applying RTF to genuinely non-critical support assets can be redirected to production-critical equipment where preventive maintenance delivers measurable reliability value.
What RTF Requires
RTF is not a decision to ignore an asset. It is a decision to manage its failure response proactively rather than preventing the failure. Effective RTF implementation requires:
- Critical spare parts identified and stocked
- Repair procedures documented and accessible
- Technicians with the required skills identified and available
- Production impact mitigation plans in place (temporary workarounds, standby equipment activation, production schedule adjustments)
- Each failure event recorded in the CMMS for cost tracking and strategy validation
The CMMS role in RTF is to accumulate restoration cost history against each RTF asset, enabling periodic validation that the RTF strategy remains economically justified. If restoration costs are rising — because failure is becoming more frequent as the asset ages, or because parts costs have increased — the RTF strategy should be re-evaluated against the current cost of preventive alternatives.
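As a rough illustration of that validation step, the sketch below totals restoration cost per year for one asset and flags it for review when the latest year exceeds the cost of a preventive alternative. The record layout, the asset tag, and every figure are hypothetical; a real CMMS export will look different.

```python
from collections import defaultdict

# Hypothetical failure events: (asset_id, year, labor, parts, downtime_cost)
events = [
    ("AHU-07", 2022, 300.0, 150.0, 0.0),
    ("AHU-07", 2023, 320.0, 180.0, 0.0),
    ("AHU-07", 2024, 340.0, 400.0, 100.0),
    ("AHU-07", 2024, 340.0, 420.0, 100.0),
]

def annual_restoration_cost(events, asset_id):
    """Sum labor, parts, and downtime cost per year for one asset."""
    totals = defaultdict(float)
    for aid, year, labor, parts, downtime in events:
        if aid == asset_id:
            totals[year] += labor + parts + downtime
    return dict(sorted(totals.items()))

def needs_review(costs_by_year, annual_pm_cost):
    """Flag the asset when the latest year's restoration cost exceeds the PM cost."""
    latest_year = max(costs_by_year)
    return costs_by_year[latest_year] > annual_pm_cost

costs = annual_restoration_cost(events, "AHU-07")
print(costs)
print("Re-evaluate RTF:", needs_review(costs, annual_pm_cost=1200.0))
```

Comparing only the latest year is a deliberate simplification. A rolling average over several years would be less sensitive to a single bad year.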
RTF by Industry
Manufacturing: RTF in manufacturing is most commonly applied to non-production support equipment, redundant utilities, and minor components where failure does not affect line output or product quality. For production equipment on a single-train line, RTF is rarely appropriate — failure of any production asset stops the line. Manufacturing operations with mature maintenance programs use RCM or simplified criticality analysis to formally identify RTF candidates rather than allowing RTF to be the default strategy for under-resourced assets.
Mining: RTF is more commonly applied in mining than in most other industries because of the high degree of fleet redundancy in mobile equipment operations. Individual unit failures in large fleets are tolerable in ways that single-asset failures in manufacturing are not. The primary RTF risk in mining is a component failure that cascades, such as a failing wheel motor that damages the final drive if it is not caught early. Even RTF assets therefore need to be monitored for early failure indicators that trigger an expedited response before secondary damage occurs.
Oil and Gas: RTF application in oil and gas is constrained by safety and regulatory requirements — a much higher proportion of the asset population has either safety consequence or regulatory inspection requirements that preclude RTF. Utility systems, non-process instrumentation, and support infrastructure with no process safety connection may be RTF candidates. Any asset classified as a safety-critical element under the facility’s process safety management program is excluded from RTF consideration regardless of production impact.
Crane and Rigging: RTF is essentially inapplicable to load-bearing crane components — structural members, hooks, wire rope assemblies, and braking systems — where failure during operation creates immediate risk of dropped load and personnel injury. ASME B30 and OSHA standards require inspection and replacement criteria for these components that are incompatible with RTF. Administrative and non-load-bearing support systems on crane assets may be RTF candidates, but the safety-critical nature of lifting equipment means that RTF application requires careful case-by-case evaluation against regulatory requirements.
Common RTF Application Failures
RTF as default rather than deliberate strategy: The most common RTF failure is not a strategic decision at all — it is the absence of a proactive maintenance program on an asset that simply never got added to the PM schedule. Assets maintained reactively because no one made an explicit maintenance strategy decision are not on RTF — they are unmanaged. True RTF requires a documented decision, a criticality assessment, and a failure response plan.
Applying RTF to assets with hidden safety consequences: Failure consequence assessments that focus on production impact and miss safety implications produce RTF decisions that should never have been made. A valve that fails open may have no immediate production consequence but significant process safety consequence depending on what it controls. Safety consequence evaluation must be thorough and must include failure mode analysis, not just asset classification.
No spare parts staged for RTF assets: RTF without pre-positioned spare parts is reactive maintenance. If the failure response requires an emergency parts order with a multi-day lead time, the RTF strategy has produced extended unplanned downtime that a basic spare parts plan would have prevented. Every asset on RTF should have its critical restoration parts identified and stocked.
No periodic strategy re-validation: An asset that was a legitimate RTF candidate five years ago may not be today — if it has aged into higher failure frequency, if a redundant asset has been decommissioned, or if its operational context has changed. RTF strategies should be reviewed on a defined cycle as part of the broader maintenance strategy review program.
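The re-validation triggers described above can be checked mechanically on each review pass. This is a minimal sketch under assumed inputs: the function name, parameters, two-year review cycle, and failure counts are all hypothetical.

```python
from datetime import date

def revalidation_reasons(last_review: date, today: date, review_cycle_days: int,
                         redundancy_still_in_place: bool,
                         failures_last_12_months: int,
                         failures_at_last_review: int) -> list[str]:
    """Return the reasons an RTF decision should be revisited (empty list = none)."""
    reasons = []
    if (today - last_review).days >= review_cycle_days:
        reasons.append("review cycle elapsed")
    if not redundancy_still_in_place:
        reasons.append("redundancy removed")
    if failures_last_12_months > failures_at_last_review:
        reasons.append("failure frequency increased")
    return reasons

# Illustrative case: classified five years ago, standby unit since decommissioned.
reasons = revalidation_reasons(
    last_review=date(2020, 3, 1),
    today=date(2025, 3, 1),
    review_cycle_days=730,  # assumed two-year review cycle
    redundancy_still_in_place=False,
    failures_last_12_months=4,
    failures_at_last_review=1,
)
print(reasons)
```

Changes in operational context are harder to capture in a flag and usually still need a person to judge them during the review.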
RTF vs. Other Maintenance Strategies
- Run-to-Failure (RTF): Planned acceptance of failure as the maintenance trigger. Applied to non-critical, non-safety assets where restoration cost is less than or equal to prevention cost. Failure response is pre-planned.
- Reactive Maintenance: Unplanned response to unexpected failure. Not a deliberate strategy — the absence of a strategy. Distinguished from RTF by lack of preparation and anticipation.
- Preventive Maintenance (PM): Time-based or usage-based maintenance performed before failure occurs. Applied to assets where scheduled intervention reduces failure frequency or prevents failure consequences. See: Preventive Maintenance (PM).
- Predictive Maintenance (PdM): Condition-based maintenance triggered by asset condition indicators rather than fixed schedules. Applied to assets where failure can be detected in advance through condition monitoring. See: Predictive Maintenance (PdM).
- Reliability-Centered Maintenance (RCM): The structured analysis process used to determine the appropriate maintenance strategy — including RTF — for each asset based on failure modes and consequences. RTF is one of several strategies RCM may prescribe. See: Reliability-Centered Maintenance (RCM).
Frequently Asked Questions
What is run-to-failure maintenance?
Run-to-failure (RTF) maintenance is a deliberate maintenance strategy in which an asset is allowed to operate until it fails before any maintenance action is taken. Unlike reactive maintenance — which responds to unexpected failures — RTF is a planned decision to accept failure as the maintenance trigger for specific assets that meet defined criteria: no safety consequence, no significant production impact, and restoration cost less than or equal to prevention cost. RTF is a legitimate reliability strategy when applied correctly, not a default for under-resourced maintenance programs.
What is the difference between RTF and reactive maintenance?
RTF is planned; reactive maintenance is unplanned. RTF involves a deliberate decision to allow failure, with spare parts pre-positioned, repair procedures documented, and failure response pre-organized. Reactive maintenance responds to unexpected failures without preparation — the failure was not anticipated, parts may not be available, and the response is improvised. Both result in maintenance action after failure, but RTF produces faster, lower-cost failure response because the response was planned in advance.
When should RTF be used?
RTF is appropriate when all of the following conditions are met: the asset has no safety consequence on failure, the asset has no significant production consequence on failure (either non-critical or fully redundant), the total cost of restoration after failure is less than or equal to the cost of preventive maintenance, and failure does not cause secondary damage to adjacent equipment. These conditions should be evaluated through a formal criticality assessment rather than informal judgment, and the RTF decision should be documented with a pre-planned failure response.
How does a CMMS support RTF maintenance?
A CMMS supports RTF by tracking restoration cost history against each RTF asset — accumulating labor, parts, and downtime costs from every failure event — enabling periodic validation that the RTF strategy remains economically justified. It also manages spare parts inventory for RTF assets, ensuring critical restoration parts are stocked and available when failures occur. When restoration costs begin rising due to increasing failure frequency or escalating parts costs, the CMMS data provides the evidence base for reconsidering the RTF strategy in favor of a proactive alternative.
Related Terms
- Preventive Maintenance (PM)
- Predictive Maintenance (PdM)
- Reliability-Centered Maintenance (RCM)
- Asset Criticality Ranking (ACR)
- Corrective Maintenance (CM)
- Mean Time Between Failures (MTBF)
- Total Cost of Ownership (TCO)