Failure Mode and Effects Analysis (FMEA) is a structured, systematic process for identifying the ways a piece of equipment, process, or system can fail, analyzing the effect each failure mode has on operations, and prioritizing corrective actions based on risk. The output of an FMEA is a ranked list of failure modes with associated risk scores — giving maintenance and reliability teams a defensible, data-driven basis for allocating PM resources, defining inspection intervals, and targeting condition monitoring investment.
FMEA is a proactive reliability tool, not a troubleshooting method. It is typically performed during equipment commissioning, after a significant failure event (to anticipate the failure modes that remain or could recur), or as part of a reliability-centered maintenance program review. The goal is to identify which failure modes carry the highest combination of severity, likelihood, and difficulty of detection before those modes produce unplanned downtime, safety incidents, or quality escapes.
In asset-intensive operations, FMEA is most valuable as an input to maintenance strategy development. An asset with ten identified failure modes does not need ten separate PM tasks — it needs a maintenance strategy matched to the actual risk profile of each mode. FMEA provides that profile.
Why FMEA Matters
Most maintenance programs are built on history and intuition — tasks inherited from previous programs, OEM recommendations applied uniformly regardless of operating context, and interval decisions made by experienced technicians who are no longer with the organization. FMEA replaces that informal foundation with documented analysis.
The practical consequence is better allocation of finite maintenance resources. In any operation, maintenance capacity is limited. FMEA identifies which failure modes carry the highest risk and ensures that PM effort, condition monitoring investment, and spare parts stocking are concentrated where failure consequences are greatest — not distributed uniformly across all assets regardless of criticality.
FMEA also provides regulatory and warranty documentation. In oil and gas, aerospace, and heavy industrial contexts, demonstrating that a systematic failure analysis was performed and that maintenance strategies were derived from that analysis can help satisfy audit requirements and can support warranty claims when failures occur despite documented preventive action.
How FMEA Works in Practice
Types of FMEA
Three FMEA types are used in industrial maintenance and reliability contexts:
- Process FMEA: Analyzes failure modes in a manufacturing or operational process — where the process itself can deviate from standard in ways that produce defects, safety incidents, or equipment damage. Used in manufacturing quality programs and process safety management.
- Design FMEA: Analyzes failure modes inherent in equipment design before the asset enters service. Used by engineers during equipment specification and procurement to identify design weaknesses before they become operational problems.
- Functional FMEA: Analyzes failure modes based on the functions an asset must perform, rather than its physical components. Used when component-level detail is unavailable or when analyzing complex systems where function is better defined than hardware.
In maintenance and reliability programs, Functional FMEA and Process FMEA are most commonly applied to operating assets. Design FMEA is most relevant during capital procurement and equipment specification phases.
The FMEA Process
A structured FMEA follows a defined sequence:
1. Define the scope. Identify the asset, system, or process being analyzed and set the boundary of the analysis. A crusher FMEA might scope to the crusher and its drive system, excluding the feed conveyor and discharge screen as separate analyses.
2. Identify asset functions. For each component or subsystem in scope, define what function it must perform. A bearing must support the shaft load and allow rotation with minimal friction. A seal must prevent contamination ingress and lubricant loss.
3. Identify failure modes. For each function, define the ways it can fail to perform that function. A bearing can fail through fatigue spalling, corrosion, overloading, contamination, or inadequate lubrication. Each failure mode is listed separately.
4. Analyze failure effects. For each failure mode, describe what happens when the failure occurs — at the component level, the asset level, and the system level. A bearing failure on a crusher main shaft may cause shaft seizure, crusher shutdown, and production circuit stoppage.
5. Assign severity (S). Rate the consequence of the failure effect on a scale of 1 to 10. A failure mode with no safety or production impact scores low. A failure mode that causes a safety incident or extended production loss scores high.
6. Identify failure causes. For each failure mode, identify the root causes that could produce it. Bearing fatigue may result from overloading, incorrect installation, or interval-based replacement that does not account for actual load cycles.
7. Assess occurrence (O). Rate how frequently the failure mode is expected to occur on a scale of 1 to 10, based on historical data, OEM information, or engineering judgment. A failure mode observed multiple times per year scores higher than one observed once in ten years of fleet history.
8. Evaluate current controls. Document what is currently in place to prevent the failure mode or detect it before it produces the failure effect. A vibration monitoring program that detects bearing defects before failure is a detection control. A lubrication PM that prevents contamination ingress is a prevention control.
9. Assign detection score (D). Rate how difficult it is to detect the failure mode before it causes the failure effect, on a scale of 1 to 10. A failure mode with no current detection method scores 10. One detected reliably by an existing sensor scores low.
10. Calculate Risk Priority Number (RPN). RPN = Severity × Occurrence × Detection. The RPN ranks failure modes by combined risk. High-RPN failure modes receive priority attention for maintenance strategy improvement: reducing occurrence through better PM, improving detection through condition monitoring, or accepting the risk with documented rationale.
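The scoring and ranking steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `FailureMode` class, its field names, and every score in the sample worksheet are assumptions made up for the example, loosely based on the crusher bearing and seal mentioned in the steps.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    component: str
    function: str
    failure_mode: str
    effect: str
    cause: str
    current_controls: str
    severity: int    # S, rated 1-10
    occurrence: int  # O, rated 1-10
    detection: int   # D, rated 1-10 (10 = no current detection method)

    def __post_init__(self):
        # Reject scores outside the 1-10 scale used in the steps above.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(rows):
    """Return worksheet rows sorted from highest to lowest RPN."""
    return sorted(rows, key=lambda row: row.rpn, reverse=True)


# Illustrative scores only; not taken from a real analysis.
worksheet = [
    FailureMode("Main shaft bearing", "Support shaft load and allow rotation",
                "Fatigue spalling", "Shaft seizure, crusher shutdown",
                "Overloading", "Vibration monitoring", 8, 4, 3),
    FailureMode("Main shaft bearing", "Support shaft load and allow rotation",
                "Inadequate lubrication", "Overheating, accelerated wear",
                "Missed lubrication PM", "None", 7, 5, 10),
    FailureMode("Shaft seal", "Prevent contamination ingress and lubricant loss",
                "Seal wear", "Lubricant contamination",
                "Abrasive dust", "Visual inspection", 5, 6, 6),
]

for row in rank_by_rpn(worksheet):
    print(f"{row.rpn:>4}  {row.component}: {row.failure_mode}")
```

With these assumed scores, the lubrication failure mode ranks first because it has no detection control (D = 10), even though its severity is lower than that of fatigue spalling.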
FMEA vs. FMECA
FMECA (Failure Mode, Effects, and Criticality Analysis) extends FMEA by adding a formal criticality analysis that ranks failure modes by probability of occurrence and consequence severity, typically using quantitative failure data rather than relative scores. FMECA is more analytically rigorous and is standard in defense, aerospace, and nuclear applications where quantitative risk assessment is required. FMEA with RPN scoring is more common in industrial maintenance programs, where quantitative probability data is often unavailable or not required. Both methods produce the same kind of output, a prioritized list of failure modes with recommended maintenance actions, but FMECA requires more data and analytical resources to execute.
FMEA by Industry
Manufacturing: FMEA is embedded in manufacturing quality systems through standards like IATF 16949 (automotive) and AS9100 (aerospace). Process FMEAs are used to identify failure modes in production processes that could produce defective output. Equipment FMEAs drive PM strategy for production line assets where unplanned downtime directly reduces throughput. In TPM programs, FMEA outputs feed directly into planned maintenance task lists and operator inspection routines.
Mining: FMEA is used in mining to develop maintenance strategies for high-value, high-consequence assets — haul truck powertrains, crusher components, conveyor drive systems. The combination of extreme operating conditions, high component replacement costs, and severe production consequences when assets fail makes structured failure mode analysis a justifiable investment. Mining FMEAs frequently identify lubrication failure modes as the highest-RPN category, which drives investment in lubrication management programs.
Oil and Gas: Process safety management (PSM) regulations require systematic hazard analysis of covered processes, and FMEA is one of the accepted methods for meeting this requirement, particularly for pressure-containing equipment, rotating machinery, and safety instrumented systems. FMEA outputs in oil and gas feed directly into inspection and testing intervals, spare parts strategies, and operator response procedures for identified failure scenarios.
Crane and Rigging: FMEA on load-bearing components — wire rope, hooks, sheaves, hydraulic cylinders — identifies failure modes with direct safety consequences. In crane operations, a structural failure under load has catastrophic potential. FMEA provides the analytical basis for inspection intervals, load testing requirements, and mandatory replacement criteria for safety-critical components, supporting both regulatory compliance and liability documentation.
Common FMEA Program Failures
FMEA performed once and never updated: An FMEA reflects the failure knowledge available at the time it was performed. When operating conditions change, new failure modes are observed, or maintenance strategies are modified, the FMEA must be updated to remain valid. Organizations that treat FMEA as a one-time documentation exercise rather than a living analysis lose its value within a few years.
RPN used as the sole prioritization criterion: RPN multiplication can produce counterintuitive rankings — a moderate severity failure mode that is common and hard to detect can outscore a catastrophic failure mode that is rare and easily detected. High-severity failure modes should receive attention regardless of RPN score. Severity should always be reviewed independently before deferring to RPN ranking.
FMEA performed without technician input: FMEAs developed exclusively by engineers from documentation and OEM data miss the failure modes that only become visible through years of hands-on maintenance experience. The technicians who maintain the equipment know failure patterns that do not appear in any manual. Their input is not optional — it is the most valuable data source in the analysis.
No connection to maintenance strategy: An FMEA that produces a list of failure modes and RPN scores but does not result in defined maintenance task changes, interval adjustments, or condition monitoring additions has not delivered its value. The output of every FMEA should be a set of specific maintenance strategy decisions with documented rationale.
Scope too broad: FMEAs that attempt to analyze an entire facility or production system in a single exercise become unmanageable. Effective FMEAs are scoped to a specific asset, subsystem, or process with defined boundaries. Starting with the highest-criticality assets and expanding from there produces actionable results faster than attempting comprehensive coverage immediately.
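To illustrate the RPN pitfall described in this section, the sketch below scores two hypothetical failure modes and then applies an independent severity review before ranking by RPN. The scores and the severity cut-off of 9 are assumptions chosen for the example, not values from any standard.

```python
def prioritize(modes, severity_threshold=9):
    """Split failure modes into a severity-flagged group and an RPN-ranked remainder.

    `modes` is a list of (name, severity, occurrence, detection) tuples.
    The default threshold of 9 is an assumed cut-off, not a standard value.
    """
    def rpn(mode):
        _, s, o, d = mode
        return s * o * d

    flagged = [m for m in modes if m[1] >= severity_threshold]
    remainder = [m for m in modes if m[1] < severity_threshold]
    # Within each group, higher RPN still comes first.
    return (sorted(flagged, key=rpn, reverse=True),
            sorted(remainder, key=rpn, reverse=True))


modes = [
    ("Moderate, common, hard to detect", 5, 7, 8),      # RPN = 280
    ("Catastrophic, rare, easily detected", 10, 2, 2),  # RPN = 40
]

flagged, remainder = prioritize(modes)
print("Review regardless of RPN:", [m[0] for m in flagged])
print("Ranked by RPN:", [m[0] for m in remainder])
```

Ranked on RPN alone, the moderate failure mode (280) would sit far above the catastrophic one (40). Pulling high-severity modes into their own group first keeps them visible no matter how the multiplication works out.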
FMEA vs. Related Reliability Tools
- FMEA (Failure Mode and Effects Analysis): Identifies failure modes, analyzes their effects, and prioritizes them by risk score. A proactive planning tool used to develop maintenance strategy.
- FMECA (Failure Mode, Effects, and Criticality Analysis): Extends FMEA with quantitative criticality analysis. More rigorous, more data-intensive, standard in regulated industries.
- RCM (Reliability-Centered Maintenance): A broader maintenance strategy development methodology that uses FMEA as one of its core analytical tools. RCM defines what maintenance strategy is appropriate for each failure mode identified in the FMEA. See: Reliability-Centered Maintenance (RCM).
- Root Cause Failure Analysis (RCFA): A reactive analysis performed after a failure occurs to identify why it happened. FMEA is proactive — performed before failures to prevent them. RCFA findings should feed back into FMEA updates. See: Root Cause Failure Analysis (RCFA).
- Asset Criticality Ranking (ACR): Prioritizes assets by failure consequence and likelihood. ACR identifies which assets warrant FMEA investment. FMEA then provides the detailed failure mode analysis for those assets. See: Asset Criticality Ranking (ACR).
Frequently Asked Questions
What is a Risk Priority Number (RPN) in FMEA?
A Risk Priority Number (RPN) is a composite risk score calculated by multiplying three ratings: Severity (how serious the failure effect is), Occurrence (how frequently the failure mode is expected to occur), and Detection (how difficult the failure mode is to detect before it causes the failure effect). Each factor is rated on a scale of 1 to 10, giving RPN values ranging from 1 to 1,000. High-RPN failure modes receive priority attention for maintenance strategy improvement. RPN should not be used as the sole prioritization criterion — failure modes with high Severity scores warrant attention regardless of their overall RPN.
How does FMEA relate to RCM?
FMEA is the analytical engine inside a Reliability-Centered Maintenance (RCM) program. RCM uses FMEA to identify and analyze failure modes, then applies a decision logic to determine the most appropriate maintenance strategy for each mode — preventive, condition-based, run-to-failure, or redesign. An RCM program without FMEA is making maintenance strategy decisions without a systematic understanding of what can fail and why. FMEA without RCM produces a risk ranking but no defined process for translating that ranking into maintenance actions.
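The idea of a decision logic that maps each failure mode to a strategy can be sketched as follows. This is a deliberately simplified, hypothetical version: the three yes/no inputs and the order in which they are checked are assumptions for illustration and do not reproduce the full decision diagram used in formal RCM programs.

```python
from enum import Enum


class Strategy(Enum):
    CONDITION_BASED = "condition-based"
    PREVENTIVE = "preventive"
    RUN_TO_FAILURE = "run-to-failure"
    REDESIGN = "redesign"


def select_strategy(detectable_before_failure: bool,
                    age_related: bool,
                    consequence_tolerable: bool) -> Strategy:
    """Simplified, hypothetical decision logic for a single failure mode."""
    if detectable_before_failure:
        # A warning condition can be monitored, so act on condition.
        return Strategy.CONDITION_BASED
    if age_related:
        # Failure likelihood rises with age or usage, so a scheduled task can help.
        return Strategy.PREVENTIVE
    if consequence_tolerable:
        # No effective task exists and the failure can be tolerated.
        return Strategy.RUN_TO_FAILURE
    # No effective task exists and the consequence is unacceptable.
    return Strategy.REDESIGN


# Example: a failure mode that vibration monitoring can detect early.
print(select_strategy(True, False, False).value)
# Example: a random, undetectable failure with unacceptable consequences.
print(select_strategy(False, False, False).value)
```

In a real program, the answers to these questions would come from the FMEA worksheet (current controls, failure causes, and severity), which is why the two methods are normally used together.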
How often should an FMEA be updated?
An FMEA should be reviewed and updated when operating conditions change significantly, when a failure mode occurs that was not identified in the existing analysis, when maintenance strategies are modified, or on a defined review cycle — typically every two to three years for stable assets. An FMEA that has not been reviewed since initial development reflects the failure knowledge of the team at that time, not current operational reality. Treat FMEA as a living document with ownership and a review schedule, not a one-time deliverable.
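The review triggers above can be expressed as a simple check. The function name, its parameters, and the three-year default cycle are assumptions for this sketch; a team would substitute its own review policy.

```python
from datetime import date


def review_due(last_review: date, today: date, cycle_years: int = 3,
               conditions_changed: bool = False,
               unlisted_failure_observed: bool = False,
               strategy_modified: bool = False) -> bool:
    """Return True if a hypothetical FMEA record should be reviewed."""
    # Any event-based trigger forces a review regardless of elapsed time.
    if conditions_changed or unlisted_failure_observed or strategy_modified:
        return True
    # Otherwise compare today against the end of the review cycle.
    try:
        deadline = last_review.replace(year=last_review.year + cycle_years)
    except ValueError:
        # last_review fell on 29 February and the target year is not a leap year.
        deadline = last_review.replace(year=last_review.year + cycle_years, day=28)
    return today >= deadline


# A stable asset reviewed two years ago is not yet due on a three-year cycle.
print(review_due(date(2022, 6, 1), date(2024, 6, 1)))
# The same asset becomes due once a previously unlisted failure mode is observed.
print(review_due(date(2022, 6, 1), date(2024, 6, 1),
                 unlisted_failure_observed=True))
```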
Where should a maintenance team start with FMEA?
Start with the highest-criticality assets identified through an Asset Criticality Ranking process. These are the assets where failure consequence is highest — production-critical equipment, safety-critical systems, assets with long lead times for spare parts. Performing FMEA on a Tier 1 critical asset first produces the highest return on analytical effort and generates the most significant maintenance strategy improvements. Expand to lower-criticality assets as resources allow.
Related Terms
- Asset Criticality Ranking (ACR)
- Reliability-Centered Maintenance (RCM)
- Root Cause Failure Analysis (RCFA)
- Preventive Maintenance (PM)
- Condition-Based Maintenance (CBM)
- Mean Time Between Failures (MTBF)
- Predictive Maintenance (PdM)