Managing sensor unreliability

By Pilgrim - January 31, 2019

Connected products are far from 100% reliable. With customers across commercial and consumer markets, DevicePilot has unique insight into typical performance levels, the underlying reasons for failures, and how to mitigate them.

The facts

If you're like most connected product companies today, you'll struggle to achieve "one nine" of uptime (working 90% of the time). This is a much worse experience than for typical unconnected products. To move towards two nines (99% uptime), you must reduce your downtime by a factor of 10.
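
To make the "nines" arithmetic concrete, here is a minimal sketch that converts an uptime fraction into hours of downtime per year. The function name and the 365-day year are assumptions chosen for illustration, not DevicePilot definitions.

```python
def downtime_hours_per_year(uptime_fraction: float) -> float:
    """Hours of downtime in a 365-day year for a given uptime fraction."""
    hours_per_year = 365 * 24  # 8760 hours, ignoring leap years
    return (1.0 - uptime_fraction) * hours_per_year


if __name__ == "__main__":
    # Each extra "nine" cuts the downtime by a factor of 10.
    for label, uptime in [("one nine", 0.90), ("two nines", 0.99)]:
        print(f"{label}: {downtime_hours_per_year(uptime):.1f} hours/year")
```

On these assumptions, one nine leaves room for roughly 876 hours of downtime a year, and two nines for roughly 88.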

Why are IoT sensors unreliable (and how do you fix that)?

Here are the main failure modes and ways to mitigate them:

Failure mode

How to mitigate

Sensor application crashes

○ Sending "heartbeats" allows remote management

○ A "watchdog timer" reboots sensor automatically

Sensor battery runs out

○ Track battery state remotely:

• automatically ship batteries just-in-time

• flag any sudden increases in consumption and root-cause (new software version? hardware problem? pathological application state?)

Sensor "falls off" network

○ Rigorous testing of network management code (notoriously complex)

○ Collect diagnostics from the network layer (e.g. cellular)

○ Store-and-forward data in the sensor

○ Don't use wireless unless you have to

○ Provide more than one comms link (e.g. cellular plus LoRa, or use meshing)

Sensor unplugged or damaged

○ Design hardware to detect the condition, for example a broken/missing temperature sensor mustn't report a plausible temperature

In general

○ Implement a "black box", logging to non-volatile memory on the sensor, so that in the worst case individual units can be diagnosed by R&D engineers

○ Add some redundancy in the data you send. Repeat any important "state" information regularly even if it hasn't changed.
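
As an illustration of the "heartbeats" mitigation listed above, here is a minimal cloud-side sketch that flags a sensor as down once it has missed several expected heartbeats. The class name, the 60-second interval and the three-missed-beats threshold are all hypothetical choices, not part of any DevicePilot API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat per device (hypothetical helper)."""
    interval_s: float            # expected seconds between heartbeats (assumed)
    missed_allowed: int = 3      # missed heartbeats tolerated before "down"
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, device_id: str, timestamp: float) -> None:
        """Store the time at which a heartbeat arrived."""
        self.last_seen[device_id] = timestamp

    def status(self, device_id: str, now: float) -> str:
        """Return "up", "down", or "unknown" if no heartbeat was ever seen."""
        last = self.last_seen.get(device_id)
        if last is None:
            return "unknown"
        if now - last > self.interval_s * self.missed_allowed:
            return "down"
        return "up"


if __name__ == "__main__":
    monitor = HeartbeatMonitor(interval_s=60)
    monitor.record("sensor-a", timestamp=0)
    print(monitor.status("sensor-a", now=100))  # within tolerance
    print(monitor.status("sensor-a", now=181))  # more than 3 intervals late
```

Returning "unknown" for a device that has never reported keeps the monitor from inventing a state it has no evidence for.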

Missing data

As data trickles through an IoT system from sensors to cloud database, it's inevitable that some of it will go missing. Here are some rules of thumb to make sure you can cope well with that reality:

  1. Design your application to cope with missing data. For example, asking for the average temperature of a million sensors when only 100 are missing shouldn't produce an "error": the average of the remaining sensors is a reasonable answer, albeit with a caveat.

  2. Don't ever "invent" data.

    1. If you're plotting a graph and some of the data is missing, don't smooth over the gap - its existence is important information to show the user.

    2. Likewise, as you chain together pieces of analytics within your application, you might like to pass a "confidence" value with each result, so you never lose track of the quality of the result.

  3. Ultimately, your application may sometimes have to say "I don't know" because the input data is too patchy or too old to allow a high-confidence answer.
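
The rules above can be sketched as a single function that averages only the readings that actually arrived, reports how much of the fleet they cover, and declines to answer when coverage is too low. The function name, the use of None for a missing reading and the 50% coverage threshold are assumptions for illustration; coverage is only one possible stand-in for a "confidence" value.

```python
from typing import Optional, Sequence, Tuple


def average_with_confidence(
    readings: Sequence[Optional[float]],
    min_coverage: float = 0.5,  # assumed threshold below which we refuse to answer
) -> Tuple[Optional[float], float]:
    """Return (average, coverage); the average is None when data is too patchy.

    Missing readings are represented as None and are never replaced
    with invented values.
    """
    present = [r for r in readings if r is not None]
    coverage = len(present) / len(readings) if readings else 0.0
    if not present or coverage < min_coverage:
        return None, coverage  # "I don't know"
    return sum(present) / len(present), coverage


if __name__ == "__main__":
    print(average_with_confidence([20.0, None, 22.0, 24.0]))
    print(average_with_confidence([None, None, 21.0]))
```

With three of four readings present, the first call returns an average of 22.0 with coverage 0.75. The second call has only one of three readings, so it returns None for the average rather than a low-confidence number.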

DevicePilot's Cohort Analysis page is a good example of these principles in action: the colour-density of each bar on the chart shows the number of devices making up that sample, making statistical significance intuitive.

[Figure: Uptime Percentage by Signal Strength]

How good can you get?

Be aware of some "laws of physics" limitations. For example, if your device is using cellular connectivity, is deployed indoors, and you have no control over its exact placement, then you will be lucky to achieve 92% network availability.

If you're deploying a lot of connected products, you can't ignore the challenge of reliability. DevicePilot provides a great way to get the big picture and to identify, measure, analyze and resolve your smart product challenges.
