Mean Time Between Failures (MTBF) is a reliability metric that measures the average operating time between one failure and the next for a repairable asset or component. It is usually expressed in operating hours and is calculated by dividing total operating time by the number of failures in a given period. A higher MTBF indicates a more reliable asset, one that operates longer between failures. A declining MTBF over time is one of the clearest signals that a maintenance program is losing ground against equipment degradation.
MTBF applies specifically to repairable assets — equipment that is restored to service after a failure. For non-repairable components that are replaced rather than repaired, the equivalent metric is Mean Time To Failure (MTTF).
Why MTBF Matters
MTBF is one of the few reliability metrics that bridges the gap between maintenance execution and business performance. A maintenance team that improves MTBF on a critical asset is directly extending production run time, reducing emergency labor costs, and lowering the frequency of unplanned downtime events.
Tracking MTBF over time reveals whether a maintenance program is improving equipment reliability or simply responding to failures as they occur. An MTBF that holds steady or increases confirms that PM intervals, lubrication programs, and inspection routines are working. An MTBF that declines signals that something in the maintenance program — interval, task quality, parts quality, or operating conditions — needs to change.
MTBF also supports capital planning decisions. When an asset’s MTBF falls to the point where the cost of maintaining it exceeds the amortized cost of replacing it, the data justifies a capital replacement request. This converts a maintenance observation into a financial argument that operations and finance leadership can act on.
How to Calculate MTBF
The Formula
MTBF is calculated by dividing total operating time by the number of failures in the measurement period:
MTBF = Total Operating Time / Number of Failures
Operating time counts only the hours the asset was actually running. It excludes planned downtime, scheduled maintenance windows, time the asset was intentionally offline, and the downtime spent repairing failures. Only unplanned failures count in the denominator.
Calculation Example
A centrifugal pump operated for 4,200 hours over a 12-month period and experienced 3 unplanned failures during that time.
MTBF = 4,200 hours / 3 failures = 1,400 hours per failure
This means the pump fails on average once every 1,400 operating hours. If the pump runs 16 hours per day, that translates to a failure approximately every 87 days. This figure can now be compared against the previous year’s MTBF, against OEM reliability specifications, and against similar assets in the fleet to identify whether this pump is performing within expectations or is an outlier requiring investigation.
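The pump example can be reproduced with a short Python sketch. The function names are illustrative only, and the 16-hour operating day is the same assumption used in the example above.

```python
def mtbf(total_operating_hours: float, failures: int) -> float:
    """MTBF: total operating hours divided by the number of unplanned failures."""
    if failures <= 0:
        # With zero failures there is no average to report, only a lower bound.
        raise ValueError("MTBF is undefined when no failures were recorded")
    return total_operating_hours / failures


def calendar_days_between_failures(mtbf_hours: float, hours_per_day: float) -> float:
    """Convert an MTBF in operating hours into calendar days at a given daily run time."""
    return mtbf_hours / hours_per_day


pump_mtbf = mtbf(4_200, 3)
print(pump_mtbf)                                      # 1400.0
print(calendar_days_between_failures(pump_mtbf, 16))  # 87.5
```

Raising an error for zero failures is a design choice in this sketch: a failure-free period says only that MTBF exceeds the hours observed, so reporting a number would be misleading.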
What Counts as a Failure
Consistent failure counting is critical to MTBF accuracy. A failure should be defined as any unplanned event that causes the asset to stop performing its intended function — regardless of repair time. Teams that only count major failures and exclude minor stoppages will systematically overstate MTBF. Define failure criteria clearly and apply them uniformly across assets and recording periods.
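One way to enforce a single failure definition is to encode it once and apply it to every recorded event. The sketch below assumes a hypothetical event record with `planned`, `lost_function`, and `repair_minutes` fields; real CMMS data will use different names. It also shows how counting only long repairs overstates MTBF for the same hypothetical 1,000 operating hours.

```python
from dataclasses import dataclass


@dataclass
class Event:
    asset_id: str
    planned: bool          # True for scheduled PM, planned shutdowns, and similar work
    lost_function: bool    # True if the asset stopped performing its intended function
    repair_minutes: float  # recorded for reference; deliberately ignored by is_failure()


def is_failure(event: Event) -> bool:
    """Shared failure definition: any unplanned loss of function, regardless of repair time."""
    return not event.planned and event.lost_function


events = [
    Event("P-101", planned=False, lost_function=True, repair_minutes=15),   # minor stoppage
    Event("P-101", planned=False, lost_function=True, repair_minutes=480),  # major breakdown
    Event("P-101", planned=True, lost_function=True, repair_minutes=240),   # scheduled PM
    Event("P-101", planned=False, lost_function=False, repair_minutes=30),  # adjustment while running
]
operating_hours = 1_000  # hypothetical operating time for the same period

failures = [e for e in events if is_failure(e)]
print(operating_hours / len(failures))    # 500.0

# Counting only long repairs as failures overstates MTBF.
major_only = [e for e in failures if e.repair_minutes >= 240]
print(operating_hours / len(major_only))  # 1000.0
```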
MTBF and MTTR
MTBF is most useful when paired with Mean Time To Repair (MTTR), the average time required to restore an asset to service after a failure. Together they describe the full failure cycle: MTBF measures how long the asset runs between failures, and MTTR measures how long each failure takes to resolve. A high MTBF with a low MTTR indicates a reliable asset with efficient repair processes. A low MTBF with a high MTTR indicates an asset that fails frequently and takes a long time to fix, which is the worst combination for production availability.
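Both halves of the failure cycle can be derived from the same failure log. This sketch uses a hypothetical 30-day window and three hypothetical failures, each recorded as a failure time and a restored-to-service time. It assumes the asset was meant to run continuously, so the only downtime in the window is repair time.

```python
from datetime import datetime, timedelta

# Hypothetical observation window and failure log: (failure time, restored-to-service time).
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)
failure_log = [
    (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 12, 0)),    # 4 h repair
    (datetime(2024, 1, 14, 22, 0), datetime(2024, 1, 15, 4, 0)),  # 6 h repair
    (datetime(2024, 1, 25, 10, 0), datetime(2024, 1, 25, 12, 0)), # 2 h repair
]

HOUR = timedelta(hours=1)
repair_hours = [(restored - failed) / HOUR for failed, restored in failure_log]

# Assumes no planned downtime in the window: uptime is the window minus repair time.
uptime_hours = (window_end - window_start) / HOUR - sum(repair_hours)

mtbf_hours = uptime_hours / len(failure_log)       # time running between failures
mttr_hours = sum(repair_hours) / len(failure_log)  # time to restore after each failure
print(mtbf_hours, mttr_hours)  # 236.0 4.0
```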
MTBF by Industry
Manufacturing: Production lines track MTBF at the machine level to identify bottleneck assets that are limiting Overall Equipment Effectiveness (OEE). A single press, mixer, or conveyor with a declining MTBF can constrain an entire line. Manufacturing reliability teams use MTBF trend data to justify PM interval changes, lubrication program upgrades, and component replacements before failures impact production schedules.
Mining: Haul truck engines, crusher bearings, and conveyor drives are tracked by MTBF to manage fleet availability and plan component change-out schedules. In mining, MTBF is often calculated at the component level — wheel motor MTBF, crusher liner MTBF, conveyor belt MTBF — rather than the asset level, because component-level data drives more precise maintenance decisions on complex, high-value equipment.
Oil and Gas: Compressors, pumps, and rotating equipment in upstream and midstream operations are managed against MTBF targets tied to production commitments. Remote operations where repair logistics are expensive place particular emphasis on improving MTBF — each failure in a remote location carries travel, mobilization, and lost production costs far beyond the repair itself.
Crane and Rigging: MTBF tracking on hoists, wire rope systems, and hydraulic components supports both maintenance planning and the inspection documentation requirements that crane operations must maintain. A declining MTBF on a load-bearing component is both a reliability signal and a safety signal that warrants accelerated inspection intervals.
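Component-level MTBF, as described for mining above, only requires tagging each failure with the component that failed. The haul truck hours and failure records below are hypothetical, and the sketch assumes every component runs whenever the truck runs, so all components share the truck's operating hours.

```python
from collections import Counter

# Hypothetical failure work orders for one haul truck, tagged with the failed component.
truck_operating_hours = 6_000
failed_components = [
    "wheel motor", "wheel motor", "wheel motor",
    "engine",
    "hydraulic pump", "wheel motor",
]

# Asset-level MTBF hides which component drives the failures.
asset_mtbf = truck_operating_hours / len(failed_components)

# Component-level MTBF, assuming each component runs whenever the truck runs.
component_mtbf = {
    component: truck_operating_hours / count
    for component, count in Counter(failed_components).items()
}

print(asset_mtbf)  # 1000.0
for component, hours in sorted(component_mtbf.items(), key=lambda item: item[1]):
    print(f"{component}: {hours:.0f} h")
```

The asset-level figure of 1,000 hours looks uniform, while the component breakdown points at the wheel motors as the part to investigate.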
Common MTBF Measurement Failures
Inconsistent failure definitions: If one technician records every minor stoppage as a failure and another records only major breakdowns, MTBF figures across assets and time periods are not comparable. Define failure criteria once and enforce them consistently across all recording.
Including planned downtime in operating time: Scheduled maintenance windows, planned shutdowns, and intentional offline periods should not count toward operating time. Including them inflates MTBF and masks the true failure rate. Operating time means the asset was running and available to produce.
Measuring the wrong asset boundary: Calculating MTBF at the system level (the entire pump skid) when the failures are concentrated in one component (the mechanical seal) obscures the actual reliability problem. Component-level MTBF is more actionable than system-level MTBF for high-complexity assets.
Treating MTBF as a static target: MTBF should be tracked as a trend, not evaluated as a single number. A current MTBF of 800 hours means nothing without context — is it improving, declining, or holding steady compared to the previous six months? Trend direction is more important than the absolute value.
No connection to maintenance decisions: MTBF data that lives in a spreadsheet and is never reviewed in maintenance planning meetings produces no reliability improvement. MTBF should be a standing agenda item in reliability reviews, with declining assets triggering specific investigation and corrective action.
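The planned-downtime error is easy to see with numbers. All figures in this sketch are hypothetical: a 720-hour month with 120 hours of planned downtime, 24 hours lost to repairs, and 4 unplanned failures.

```python
# Hypothetical month for one asset (hours).
calendar_hours = 720
planned_downtime = 120     # scheduled PM windows and a planned shutdown
repair_downtime = 24       # time lost to the unplanned failures themselves
unplanned_failures = 4

# Wrong: treats every calendar hour, including downtime, as operating time.
inflated = calendar_hours / unplanned_failures

# Right: only the hours the asset was actually running count.
operating_hours = calendar_hours - planned_downtime - repair_downtime
correct = operating_hours / unplanned_failures

print(inflated, correct)  # 180.0 144.0
```

In this example the inflated figure overstates MTBF by 25 percent without a single failure being miscounted.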
MTBF vs. Related Reliability Metrics
- MTBF (Mean Time Between Failures): Average operating time between failures for repairable assets. Measures reliability.
- MTTF (Mean Time To Failure): Average operating time before failure for non-repairable components. Used for parts that are replaced, not repaired. See: Mean Time To Failure (MTTF).
- MTTR (Mean Time To Repair): Average time to restore an asset to service after a failure. Measures maintainability, not reliability. Paired with MTBF to calculate asset availability.
- Asset Availability: The percentage of time an asset is available to operate. Calculated from MTBF and MTTR: Availability = MTBF / (MTBF + MTTR). See: Asset Availability.
- OEE (Overall Equipment Effectiveness): Combines availability, performance, and quality into a single production efficiency metric. MTBF improvements feed directly into the availability component of OEE.
Frequently Asked Questions
What is a good MTBF?
There is no universal benchmark — a good MTBF depends on the asset type, operating conditions, and industry. The more useful question is whether MTBF is improving over time. For context, OEM reliability specifications often state an expected MTBF for equipment under defined operating conditions. Compare your actual MTBF against the OEM figure, against your own historical trend, and against similar assets in your fleet. Consistent improvement over 6 to 12 month periods indicates a maintenance program that is working.
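Comparing an asset against its own history can be as simple as computing MTBF per period and checking the direction. The three six-month periods below are hypothetical, and "declining" here means each period is lower than the one before it; a real review would also look at the size of the change.

```python
# Hypothetical six-month periods for one asset: (operating hours, unplanned failures).
periods = {
    "H1 2023": (2_100, 3),
    "H2 2023": (2_150, 4),
    "H1 2024": (2_050, 5),
}

mtbf_by_period = {name: hours / failures for name, (hours, failures) in periods.items()}
for name, value in mtbf_by_period.items():
    print(f"{name}: {value:.0f} h")

# Flag a decline only if every period is lower than the one before it.
values = list(mtbf_by_period.values())
declining = all(later < earlier for earlier, later in zip(values, values[1:]))
print("declining" if declining else "not consistently declining")
```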
How do you improve MTBF?
MTBF improves by addressing the root causes of failures rather than simply repairing them. Start with Root Cause Failure Analysis (RCFA) on recurring failures to identify whether the cause is inadequate lubrication, incorrect installation, operating outside design parameters, or PM interval errors. The most common MTBF improvements come from upgrading lubrication programs, tightening PM compliance, improving operator inspection routines, and correcting installation and alignment practices.
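A common way to decide which recurring failures to send to RCFA first is a Pareto count of failure codes, ranking causes by how often they occur. The failure codes below are hypothetical; the sketch simply sorts them by frequency and shows each code's cumulative share of failures.

```python
from collections import Counter

# Hypothetical failure codes from one asset class's unplanned work orders.
failure_codes = [
    "LUBRICATION", "MISALIGNMENT", "LUBRICATION", "SEAL_LEAK",
    "LUBRICATION", "MISALIGNMENT", "ELECTRICAL", "LUBRICATION",
]

total = len(failure_codes)
cumulative = 0
for code, count in Counter(failure_codes).most_common():
    cumulative += count
    print(f"{code:<13} {count:>2}  cumulative {cumulative / total:.0%}")
```

With this data, lubrication accounts for half of all failures, which would make it the first RCFA candidate.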
How do MTBF and MTTR work together?
MTBF and MTTR together determine asset availability. A pump with an MTBF of 1,000 hours and an MTTR of 8 hours has an availability of 1,000 / (1,000 + 8) = 99.2 percent. Improving MTBF increases the time between failures. Reducing MTTR reduces the time lost when failures do occur. Both levers improve availability — which one to prioritize depends on whether the constraint is failure frequency or repair efficiency.
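The availability arithmetic above, as a small Python sketch. The 25 percent improvements used to compare the two levers are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability from MTBF and MTTR: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(1_000, 8)
print(f"{baseline:.1%}")                # 99.2%

# Compare the two levers with hypothetical 25% improvements.
print(f"{availability(1_250, 8):.2%}")  # 99.36% (MTBF raised 25%)
print(f"{availability(1_000, 6):.2%}")  # 99.40% (MTTR cut 25%)
```

Because availability depends only on the ratio of MTTR to MTBF, a 25 percent cut in MTTR helps slightly more here than a 25 percent gain in MTBF. Which lever to pull in practice still depends on which is cheaper and more achievable for the asset.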
How is MTBF tracked in a CMMS?
A CMMS tracks MTBF by recording failure events as work orders with failure codes, logging repair completion times, and calculating operating hours from asset runtime data. The system automatically computes MTBF from these inputs and surfaces trend data in reliability dashboards. The key requirement is consistent failure coding — every unplanned failure must be recorded as a work order with a failure type designation for MTBF calculations to be accurate and comparable over time.
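The CMMS calculation described above can be sketched with work orders and a runtime meter. The `WorkOrder` fields, meter readings, and dates below are hypothetical and do not reflect any particular CMMS schema. The sketch counts unplanned work orders in the period as failures, uses the meter difference as operating hours, and flags unplanned work orders that are missing a failure code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkOrder:
    asset_id: str
    opened: date
    unplanned: bool
    failure_code: Optional[str]  # blank codes undermine comparability over time


# Hypothetical runtime meter readings (hours) at the start and end of the period.
meter_start, meter_end = 12_400.0, 14_560.0
period_start, period_end = date(2024, 1, 1), date(2024, 6, 30)

work_orders = [
    WorkOrder("C-07", date(2024, 2, 3), True, "BEARING"),
    WorkOrder("C-07", date(2024, 3, 15), False, None),     # planned PM
    WorkOrder("C-07", date(2024, 4, 22), True, "SEAL_LEAK"),
    WorkOrder("C-07", date(2024, 6, 9), True, None),       # unplanned but uncoded
]

in_period = [wo for wo in work_orders if period_start <= wo.opened <= period_end]
failures = [wo for wo in in_period if wo.unplanned]
uncoded = [wo for wo in failures if wo.failure_code is None]

operating_hours = meter_end - meter_start
print(operating_hours / len(failures))  # 720.0
print(f"{len(uncoded)} unplanned work order(s) missing a failure code")
```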
Related Terms
- Mean Time To Failure (MTTF)
- Asset Availability
- Preventive Maintenance (PM)
- Condition-Based Maintenance (CBM)
- Asset Criticality Ranking (ACR)
- Root Cause Failure Analysis (RCFA)
- Failure Mode and Effects Analysis (FMEA)
Track and Improve MTBF With Redlist
Redlist captures failure data, calculates MTBF trends, and surfaces declining assets before they become emergencies — giving reliability teams the data they need to make maintenance decisions that compound over time.