A key part of building reliable software is handling the unexpected gracefully, be it invalid input, failing services, or infrastructure failures like power outages. Most bad situations can be simulated with unit tests, fault injection, or integration tests, but how do you test that your system preserves data and recovers cleanly when the power goes out? Forcibly killing your service doesn't really do it, because the operating system and the server's drives stay powered and can still flush their buffers. The only way to really test how you fare in a power outage is to actually cut power to the server. But how do you do that programmatically without combining Twilio and an intern?
One simple, cost-effective way is with Belkin WeMo Insight switches ($59 from Amazon). Each switch exposes a set of UPnP services over Wi-Fi that can turn the switch on and off, measure power usage, and query usage history. With our simple wrapper library, this means you can simulate a power failure during a sensitive operation, examine the impact of periodic failures on a workload, and measure power draw during demanding workloads.
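To make the idea concrete, here is a rough Python sketch of what toggling a switch over UPnP might look like. This is not the wrapper library mentioned above. The control path, service type, and `SetBinaryState` action reflect commonly reported WeMo behavior and should be treated as assumptions, as should the port, which is known to vary by device and firmware. The function names (`build_set_state_request`, `set_power`, `power_cycle`) are made up for illustration.

```python
import time
import urllib.request

# Assumptions (not from this article): commonly reported WeMo UPnP details.
# Verify them against your own device before relying on this sketch.
CONTROL_PATH = "/upnp/control/basicevent1"
SERVICE_TYPE = "urn:Belkin:service:basicevent:1"


def build_set_state_request(host, port, on):
    """Return (url, headers, body) for a SOAP SetBinaryState call."""
    state = 1 if on else 0
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        "<s:Body>"
        f'<u:SetBinaryState xmlns:u="{SERVICE_TYPE}">'
        f"<BinaryState>{state}</BinaryState>"
        "</u:SetBinaryState>"
        "</s:Body></s:Envelope>"
    ).encode("utf-8")
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPACTION": f'"{SERVICE_TYPE}#SetBinaryState"',
    }
    url = f"http://{host}:{port}{CONTROL_PATH}"
    return url, headers, body


def set_power(host, port, on, timeout=5.0):
    """POST the SOAP request to the switch; True if it answered HTTP 200."""
    url, headers, body = build_set_state_request(host, port, on)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status == 200


def power_cycle(host, port, off_seconds, send=set_power, sleep=time.sleep):
    """Cut power, wait off_seconds, then restore power.

    send and sleep are injectable so the sequence can be exercised
    without a real switch on the network.
    """
    send(host, port, False)
    sleep(off_seconds)
    send(host, port, True)
```

With something like this in place, a test harness could start a sensitive operation on the server under test, call `power_cycle` with the switch's address partway through, and then check what survived once the machine boots again.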