Gradual degradation vs. sudden failure: two problems, two models

Early in the project, we tried to build a single model that would detect and predict all charger failures. This was a mistake, and understanding why reveals how the problem is actually structured.

Two failure modes, two data signatures

Gradual degradation and sudden hardware failure are not points on a spectrum. They are mechanistically different events with different data signatures, different time horizons, and — critically — different data requirements.

Gradual degradation shows up in OCPP data. It has a temporal signature: power delivery efficiency trending down over days, error code frequency creeping up, session completion rates declining. The signal is in the stream, and it accumulates. By the time a charger fails from gradual degradation, the evidence was there — often weeks earlier — for any model patient enough to watch for it.
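To make "the signal accumulates" concrete, here is a minimal sketch of how slow trends like these could be turned into features. It is not our production feature pipeline. The per-day summary keys (`efficiency`, `sessions`, `errors`, `completed`) are hypothetical names chosen for illustration, and the trend measure is a plain least-squares slope over the window.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values against their index (change per step)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def degradation_features(daily):
    """Summarize one charger's daily OCPP aggregates, oldest day first.

    Each entry is a dict with hypothetical keys:
    efficiency (0..1), sessions, errors, completed.
    """
    error_rate = [d["errors"] / max(d["sessions"], 1) for d in daily]
    completion = [d["completed"] / max(d["sessions"], 1) for d in daily]
    return {
        "efficiency_slope": trend_slope([d["efficiency"] for d in daily]),
        "error_rate_slope": trend_slope(error_rate),
        "completion_slope": trend_slope(completion),
    }


if __name__ == "__main__":
    # Synthetic week: efficiency drifts down while errors creep up.
    week = [
        {"efficiency": 0.95 - 0.01 * i, "sessions": 20, "errors": i, "completed": 20 - i}
        for i in range(7)
    ]
    print(degradation_features(week))
```

A negative efficiency or completion slope alongside a positive error-rate slope is the pattern described above. Any single day can look normal, which is why the window matters more than the individual reading.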

Sudden hardware failure (a connector relay that burns out, a cooling fan that seizes, a power electronics component that fails outright) typically has no strong OCPP precursor. The charger was reporting fine, and then it wasn't. To predict this class of failure, you need data that OCPP doesn't carry: temperature inside the housing, vibration, electrical component telemetry. This is a sensor fusion problem, not a message stream problem.

Why a unified model fails

When we trained a single model on both failure classes, it learned the gradual degradation signature well but effectively ignored sudden failures, because failures with no OCPP precursor gave it no learnable signal. Its strong performance on gradual degradation pulled up the aggregate metrics, masking the fact that it was doing nothing useful for the harder class.
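The masking effect is easy to reproduce with a per-class breakdown. The sketch below uses made-up counts chosen only to illustrate the arithmetic. They are not results from our model, and the class labels `gradual` and `sudden` are illustrative names.

```python
from collections import defaultdict


def recall_by_class(records):
    """Compute overall and per-class recall.

    records: iterable of (failure_class, was_predicted) pairs, one per
    real failure. was_predicted is True if the model flagged it in advance.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for failure_class, was_predicted in records:
        totals[failure_class] += 1
        hits[failure_class] += int(was_predicted)
    per_class = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_class


if __name__ == "__main__":
    # Illustrative counts only: 90 gradual failures (81 caught),
    # 10 sudden failures (none caught).
    records = (
        [("gradual", True)] * 81
        + [("gradual", False)] * 9
        + [("sudden", False)] * 10
    )
    overall, per_class = recall_by_class(records)
    print(overall, per_class)
```

With these counts the overall recall is 0.81 while recall on the sudden class is 0.0. The aggregate number looks respectable because the majority class dominates it, which is the trap described above.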

The honest thing to do was split the problem. We now have:

A gradual degradation model — an LSTM sequence model plus a gradient-boosted classifier, operating on normalized OCPP streams. This model has real predictive value and is deployed in the pilot.
A sudden failure research track — where we're working with hardware partners to understand what telemetry is available, at what cost, and whether it's sufficient to build a useful classifier. This is not yet in production.

What this means for operators

The practical implication is that our current system gives you real, actionable predictions for one class of failure and honest uncertainty for another. We think that's more useful than a black-box model that claims to handle everything but provides no way to understand its confidence or limitations.

Operators who work with us know: if we flag CHG-042 for scheduled inspection, that signal is coming from a model trained on real degradation signatures. If we don't flag a charger, we're not guaranteeing it won't fail suddenly — we're saying there's no detectable degradation trend in its OCPP stream.
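One way to keep that distinction from getting lost downstream is to encode it in the output type itself. This is a minimal sketch, not our actual API: the `Assessment` enum, the `assess` function, and the 0.7 threshold are all hypothetical.

```python
from enum import Enum


class Assessment(Enum):
    # Deliberately no HEALTHY value: the absence of a flag is not a
    # guarantee against sudden hardware failure.
    INSPECT = "degradation trend detected; schedule inspection"
    NO_TREND = "no detectable degradation trend in OCPP stream"


def assess(degradation_score, threshold=0.7):
    """Map a degradation score in [0, 1] to an operator-facing assessment.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if degradation_score >= threshold:
        return Assessment.INSPECT
    return Assessment.NO_TREND


if __name__ == "__main__":
    print(assess(0.85).value)
    print(assess(0.10).value)
```

The design choice is in what is missing: a dashboard built on this type cannot display "healthy", only "no trend detected".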

The roadmap

Getting to sudden-failure prediction requires hardware integration. We're in conversations with telemetry providers who can give us access to the sensor data we need. The architecture for the fusion model is designed — combining OCPP stream features with time-series sensor data through a multimodal embedding that can handle asynchronous update frequencies.
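To illustrate what "asynchronous update frequencies" means in practice, the sketch below shows a much simpler technique than the multimodal embedding described above: an as-of join that pairs each OCPP event with the most recent sensor reading at or before it. It is not the fusion architecture itself, only the alignment problem that architecture has to deal with. The function name, the `max_age` staleness limit, and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def asof_align(event_times, sensor_times, sensor_values, max_age):
    """Pair each event with the latest sensor reading at or before it.

    sensor_times must be sorted ascending. Returns one value per event,
    or None when no reading exists or the latest one is older than max_age
    (in the same time units as the timestamps).
    """
    aligned = []
    for t in event_times:
        i = bisect_right(sensor_times, t) - 1  # last reading with time <= t
        if i >= 0 and t - sensor_times[i] <= max_age:
            aligned.append(sensor_values[i])
        else:
            aligned.append(None)
    return aligned


if __name__ == "__main__":
    # Timestamps in seconds; housing temperature in degrees Celsius (made up).
    ocpp_event_times = [10, 20, 65]
    temp_times = [0, 15, 18]
    temp_values = [40.0, 41.5, 42.0]
    print(asof_align(ocpp_event_times, temp_times, temp_values, max_age=30))
```

Returning None for stale readings, rather than carrying the last value forward indefinitely, keeps a silent sensor from looking like a stable one.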

The research track is real, not vaporware. But we're not going to claim we've solved it before we have. The gradual degradation model works. The rest is in progress.