Recently, Amazon S3 had an outage which caused lots of web services to go down, including IFTTT, which is often used to link IoT devices together (e.g. connecting your Alexa to some Philips Hue bulbs).

Nest security cameras stopped working, TP-Link smart switches refused to turn on, and, apparently, someone couldn't even change their mouse sensitivity during the outage because the setting syncs with the cloud.

In a smart home with a few Philips Hue bulbs, an Amazon Echo and some smart switches, I'd like to try and avoid issues like that so my house doesn't 'go down' along with the cloud services.

How can I figure out if my devices rely on one single service and avoid it if possible?

1 Answer

As a consumer

Your options are often quite limited as a consumer, but you can minimise your risks in a few ways through carefully selecting the products you use and how you connect them.

Check what happens when your device loses Internet connectivity

Usually, you can just do a quick Google search to see what happens when a certain device disconnects from the Internet. Some devices will simply fail completely if their connection to a remote cloud server is lost, like the Amazon Echo:

Your Echo requires an active Wi-Fi connection to speak, process your commands, and stream media.

Sometimes, there's a good reason (for example, the Echo has to stream commands to the cloud to process your instructions, as stated in 'Is the Amazon Echo 'always listening' and sending data to the cloud?'), but for others, it may just be an oversight or design flaw in your product.

If you physically have the device, you could try unplugging your router to see what happens. This isn't a great test, because in a real outage it's more likely that a remote server breaks while your local connections keep working, but it's something to try.

With enough time to spare, you could potentially sniff packets from your devices, then apply a router-level block to certain domains. This way, you'd know what would happen if mydeviceserver.com went down completely. Of course, this would take a long time, so it might not be practical to test every device in a large home with lots of 'smart' devices.
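If you do manage to record which hostnames each device talks to (for example, from your router's DNS log), a few lines of Python can show which hostnames are shared by several devices and which devices you'd expect to be hit if one hostname were blocked. This is only a rough sketch: the device names and `.test` hostnames are made up for illustration, and how you collect the data depends entirely on your router.

```python
from collections import defaultdict

# Hypothetical observations: hostnames each device was seen contacting.
OBSERVED = {
    "hue-bridge": ["bridge.example-lighting.test", "storage.example-cloud.test"],
    "smart-plug": ["api.example-plug.test", "storage.example-cloud.test"],
    "echo": ["voice.example-assistant.test"],
}


def devices_by_host(observed):
    """Invert the mapping: hostname -> set of devices that contacted it."""
    by_host = defaultdict(set)
    for device, hosts in observed.items():
        for host in hosts:
            # Normalise case and any trailing dot so duplicates collapse.
            by_host[host.lower().rstrip(".")].add(device)
    return by_host


def shared_dependencies(observed):
    """Hostnames contacted by more than one device."""
    return {
        host: sorted(devices)
        for host, devices in devices_by_host(observed).items()
        if len(devices) > 1
    }


def affected_by_block(observed, blocked_host):
    """Devices that were seen contacting the host you plan to block."""
    key = blocked_host.lower().rstrip(".")
    return sorted(devices_by_host(observed).get(key, set()))


if __name__ == "__main__":
    print(shared_dependencies(OBSERVED))
    print(affected_by_block(OBSERVED, "storage.example-cloud.test"))
```

This only tells you which devices talk to a given hostname, not whether they cope without it, so you'd still need to apply the block and watch what happens. It also can't see shared infrastructure hiding behind different hostnames.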

Use local connectivity

If you're just turning your lights on from your smart switch, you might not need to route all the traffic through the Internet, into a cloud server thousands of miles away, and back to your lightbulb—you might just be able to route the command through local devices instead. A lot of the time, these devices will use a protocol like ZigBee or Z-Wave, so you might need a hub to co-ordinate the traffic (see 'Why do I need hubs for some devices when automating my home?').

As a developer

For developers of IoT devices, careful design can prevent problems like the recent S3 outage from affecting consumers. Of course, IoT designers aren't always known for careful design, but if you're reading this, you're probably not in that group.

Design services to be redundant

For Amazon S3's recent outage in particular, there may not have been much you could do. There are some reports that cross-region replication could have potentially prevented services from going down, as explained in this question on DevOps Stack Exchange, but it's debated whether that's really true or just poor advice.

If feasible, having some sort of redundancy or backup would be ideal. The costs are greater, but the additional reliability is badly needed; otherwise, people's lights stop responding, power switches refuse to work, and so on.
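As a rough illustration of the idea (and not a claim about what would have helped during the S3 outage), a client can try a list of endpoints in order and move on to the next when one is unreachable. The endpoint names and the `fetch` callable are hypothetical; in a real device, `fetch` would be whatever function actually makes the network request.

```python
def fetch_with_failover(endpoints, fetch):
    """Try each endpoint in order; return (endpoint, result) from the first that works."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except OSError as exc:  # ConnectionError and socket timeouts are OSErrors
            errors.append((endpoint, exc))
    raise ConnectionError(f"all {len(endpoints)} endpoints failed: {errors}")


def fake_fetch(endpoint):
    """Stand-in for a real network call: the primary is simulated as down."""
    if endpoint == "primary.example.test":
        raise ConnectionError("primary is down")
    return f"config from {endpoint}"


if __name__ == "__main__":
    used, result = fetch_with_failover(
        ["primary.example.test", "backup.example.test"], fake_fetch
    )
    print(used, result)
```

Failover like this only helps if the backup doesn't depend on the same underlying service as the primary, which is exactly the hard (and expensive) part.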

Add better support for scenarios without Internet connections

I listed 'Use local connectivity' under the ways that a consumer could avoid this issue, but it's a losing battle. The devices often don't support connecting in any other way than through their approved web service, and manufacturers are reluctant to spend developer time on this. If local support were better, there would be less reliance on cloud services, which benefits the manufacturer too, because they wouldn't need to pay for so much server capacity.
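One way to picture better offline support is a device that always acts on a local command first and treats reporting to the cloud as best-effort. The sketch below is a toy model, not any manufacturer's firmware: `SmartSwitch`, `FlakyCloud`, and their methods are names invented for this example.

```python
class SmartSwitch:
    """Toy switch: local control always works; cloud sync is best-effort."""

    def __init__(self, cloud_sync):
        self.on = False
        self.cloud_sync = cloud_sync  # callable that reports one state change
        self.pending = []  # state changes not yet reported to the cloud

    def set_state(self, on):
        self.on = on  # act locally first, whatever the cloud is doing
        self.pending.append(on)
        self.flush()

    def flush(self):
        """Report queued changes in order; stop quietly if the cloud is unreachable."""
        while self.pending:
            try:
                self.cloud_sync(self.pending[0])
            except OSError:
                return False  # keep the queue and try again later
            self.pending.pop(0)
        return True


class FlakyCloud:
    """Stand-in for a remote service that can be switched off for testing."""

    def __init__(self):
        self.online = False
        self.received = []

    def report(self, state):
        if not self.online:
            raise ConnectionError("cloud unreachable")
        self.received.append(state)


if __name__ == "__main__":
    cloud = FlakyCloud()
    switch = SmartSwitch(cloud.report)
    switch.set_state(True)  # the light still turns on during the 'outage'
    print(switch.on, switch.pending)
    cloud.online = True
    print(switch.flush(), cloud.received)
```

The design choice is simply the ordering: the physical action never waits on the network, and anything the cloud missed is replayed once it comes back.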

With all these options, why were so many devices affected?

Because no-one wants to spend the time—designing any sort of reliable system takes a lot of time and effort, and it's often far more complex than the comparable 'dumb' solution (e.g. simple electrical switches).

Why isn't software as reliable as a car? Because the software has so much more complexity, yet isn't tested nearly as rigorously as a car would be. The same issue seems to apply with IoT—controlling devices through a network is far more complex, so things can go wrong far more easily, as we've seen with the recent S3 incident.

  • this one seems to be very informative! Commented Mar 4, 2017 at 17:00
  • Maybe worth noting that redundancy probably doubles the ongoing cost of a service, and this will have a huge impact on the business model for anyone other than a major provider. Commented Mar 4, 2017 at 20:33
  • Your last question will get laughed at by the automobile industry. Software isn't as reliable because it's not even half as rigorously tested as cars are—at least cars in western countries. Essentially every half-assed software can get released. Cars, not so much. Btw the cross-region replication wouldn't have helped last week. The replication is not usually set up for redundancy purposes, but for fast global accessibility. The Netflix blog linked in that answer targets a completely different problem. An ELB outage. ELB is no storage.
    – Helmar
    Commented Mar 5, 2017 at 16:08
  • @Helmar I did a little bit more research on the cross-region replication issue and there isn't really much evidence either way: some people say it would, some say it wouldn't. I'll edit that bit though. As for the Netflix bit, that's not the part I intended to reference, it's just attached to the same answer.
    – Aurora0001
    Commented Mar 5, 2017 at 16:31
