What 32 million OCPP messages taught us about charger reliability

When we started ingesting OCPP streams from the pilot fleet, the immediate temptation was to look for dramatic signals — voltage spikes, error codes that scream "hardware failure," connector states that flip in impossible sequences. Those exist. But they're not what's interesting.

What's interesting is the noise that isn't noise.

The signal before the signal

In the months of data we analyzed, the most reliable predictor of charger failure wasn't an error code. It was a subtle, consistent drift in the cadence of MeterValues messages. Healthy chargers reported meter readings at a steady interval, typically every 60 seconds, as set in each charger's configuration. Chargers that failed in the following 48–72 hours showed increasing variance in that interval. Not a lot. A few seconds. But consistent.
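To make "variance in the interval" concrete, here is a minimal sketch of one way to measure it: take the gaps between consecutive MeterValues timestamps and compute their standard deviation over a sliding window. The function name, the window size of 10, and the synthetic timestamps are assumptions for illustration, not values from the pilot fleet.

```python
from statistics import pstdev

def interval_jitter(timestamps, window=10):
    """Standard deviation of inter-message gaps (seconds) per sliding window.

    `timestamps` are message arrival times in epoch seconds, in order.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [pstdev(gaps[i:i + window]) for i in range(len(gaps) - window + 1)]

# Synthetic healthy charger: one message exactly every 60 s.
healthy = [60.0 * i for i in range(21)]

# Synthetic degrading charger: gaps wobble around 60 s by a growing amount.
drifting = [0.0]
for i in range(20):
    offset = 0.25 * i * (1 if i % 2 == 0 else -1)
    drifting.append(drifting[-1] + 60.0 + offset)

healthy_jitter = interval_jitter(healthy)
drifting_jitter = interval_jitter(drifting)
```

On this synthetic data the healthy series has zero jitter in every window, while the drifting series ends with more jitter than it started with, even though no single gap is more than a few seconds off.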

This kind of drift doesn't show up in threshold-based monitoring. You won't catch it by checking if a value exceeds a limit. You catch it by learning what "normal" looks like for each individual charger and watching for deviation from that baseline.
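A small sketch of why a per-charger baseline catches what a fixed threshold misses: the same observed jitter is scored against each charger's own history. The charger IDs, the history values, and the z-score cutoff of 3 are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean, stdev

def deviation_from_baseline(history, observed):
    """z-score of `observed` relative to this charger's own history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma

# Hypothetical interval jitter (seconds) seen during known-good operation.
baselines = {
    "highway-01": [2.0, 2.4, 1.8, 2.2, 2.1],    # noisy link, always jittery
    "garage-07": [0.3, 0.4, 0.2, 0.3, 0.35],    # normally very steady
}

observed_jitter = 2.3  # the same reading arrives from both chargers
z_scores = {
    charger: deviation_from_baseline(history, observed_jitter)
    for charger, history in baselines.items()
}
flagged = [charger for charger, z in z_scores.items() if abs(z) > 3]
```

A jitter of 2.3 seconds is routine for the highway unit and far outside anything the garage unit has ever produced, so only the garage unit is flagged. A single fleet-wide limit would either miss it or alarm constantly on the highway charger.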

Why per-charger baselines matter

Fleet-wide averages are misleading. A charger installed at a highway rest stop handles different load profiles than one in a downtown parking garage. Temperature exposure, session frequency, hardware generation, firmware version — all of these shape what normal looks like for a given unit. We train a per-charger model, not a single fleet-wide model, specifically because of this.

The Isolation Forest approach we use for anomaly detection is well-suited to this problem. It doesn't need labeled failure examples. It learns the structure of normal behavior from unlabeled data and flags samples that are unusually easy to isolate from the majority, meaning a few random splits are enough to separate them from everything else. In a fleet where actual failure events are rare (which is the entire point of maintenance), unsupervised methods that don't require labeled data are essential.
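As an illustration, here is a minimal sketch using scikit-learn's IsolationForest on one charger's data. The two features (mean interval and interval standard deviation per window), the synthetic training data, and the hyperparameters are assumptions for the example and not a description of our production feature set.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" windows for one charger:
# column 0 = mean MeterValues interval (s), column 1 = interval std dev (s).
normal = np.column_stack([
    rng.normal(60.0, 0.2, size=500),
    np.abs(rng.normal(0.3, 0.05, size=500)),
])

# One model per charger, trained only on that charger's unlabeled history.
model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
model.fit(normal)

new_windows = np.array([
    [60.1, 0.31],   # resembles the baseline
    [63.5, 4.20],   # stretched, jittery interval
])
labels = model.predict(new_windows)         # +1 = inlier, -1 = anomaly
scores = model.score_samples(new_windows)   # lower = more anomalous
```

The second window sits far from anything in the training data, so the trees isolate it in very few splits and it receives the lower score. No failure labels were needed at any point.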

What 32 million messages actually looks like

At this scale, the first engineering challenge is schema. OCPP is a protocol, not a data model. Different hardware vendors implement it differently. Some send StatusNotification messages on every minor state change; others batch them. Some include StopTransactionReason; others omit it. Before we could analyze anything, we had to build a normalization layer that made heterogeneous hardware look consistent to everything downstream.
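A rough sketch of what two pieces of such a normalization layer might look like: collapsing repeated status reports into real transitions, and mapping a stop-transaction payload onto one record shape whether or not the vendor sent a reason. The payload keys (`transactionId`, `meterStop`, `reason`) follow OCPP 1.6 StopTransaction as we understand it. The class name, function names, and charger ID are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NormalizedStop:
    charger_id: str
    transaction_id: int
    meter_stop_wh: int
    reason: Optional[str]  # None when the vendor omitted the field

def normalize_stop(charger_id, payload):
    """Map a vendor stop-transaction payload onto one consistent record."""
    reason = payload.get("reason")
    return NormalizedStop(
        charger_id=charger_id,
        transaction_id=int(payload["transactionId"]),
        meter_stop_wh=int(payload["meterStop"]),
        reason=reason if reason else None,
    )

def collapse_status(events):
    """Keep only real state transitions from (timestamp, status) pairs.

    Chatty vendors repeat the same status; dropping consecutive repeats
    makes their stream comparable to vendors that report less often.
    """
    transitions = []
    for ts, status in sorted(events):
        if not transitions or transitions[-1][1] != status:
            transitions.append((ts, status))
    return transitions

with_reason = normalize_stop(
    "cp-1", {"transactionId": 7, "meterStop": 1200, "reason": "EVDisconnected"}
)
without_reason = normalize_stop("cp-1", {"transactionId": "8", "meterStop": "900"})
transitions = collapse_status([
    (1, "Available"), (2, "Available"), (3, "Charging"),
    (4, "Charging"), (5, "Available"),
])
```

Both payloads come out as the same record type with integer fields, and the five raw status events reduce to three transitions.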

Once normalized, the message stream tells a surprisingly rich story. Each session leaves a fingerprint — the ramp-up behavior when power delivery starts, the plateau profile, the teardown sequence when the vehicle disconnects. Deviations from a charger's typical session fingerprint are often more informative than any single message in isolation.
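One simple way to turn a session into a fingerprint is to reduce its power curve to a few numbers: how long the ramp-up took, what the plateau looked like, and how long the teardown lasted. The sketch below assumes evenly spaced power samples and defines the plateau as anything at or above 90% of peak. Both choices, and the function name, are illustrative assumptions.

```python
def session_fingerprint(power_kw, sample_s=60, plateau_frac=0.9):
    """Summarize a non-empty, evenly sampled session power curve."""
    peak = max(power_kw)
    threshold = plateau_frac * peak
    on_plateau = [i for i, p in enumerate(power_kw) if p >= threshold]
    first, last = on_plateau[0], on_plateau[-1]
    plateau_values = [power_kw[i] for i in on_plateau]
    return {
        "ramp_up_s": first * sample_s,                        # start to plateau
        "plateau_kw": sum(plateau_values) / len(plateau_values),
        "plateau_share": len(on_plateau) / len(power_kw),
        "teardown_s": (len(power_kw) - 1 - last) * sample_s,  # plateau to end
    }

# Synthetic session sampled once a minute.
session = [0, 10, 30, 50, 50, 50, 50, 20, 0]
fingerprint = session_fingerprint(session)
```

Vectors like this, collected per charger over many sessions, can be compared against that charger's typical values or used as features for the anomaly model.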

What we're still figuring out

The honest answer is that 32 million messages across one fleet over a few months is a start, not a conclusion. We can predict some of the failure modes we've identified with high confidence. Others we can only detect retrospectively: the historical data shows the signal was there, but we haven't yet built a real-time model that would have caught it in time.

That gap between retrospective analysis and real-time prediction is where most of the hard work lives. We're working on it. The data is there; the question is whether our models are sensitive enough and fast enough to surface it before operators need to respond.