UPSes suck and need to be disrupted
Warning: this is a rant.
I use a UPS (Uninterruptible Power Supply) to protect the Great Beast of Malvern from power outages and lightning strikes. Every once in a while I have to buy a replacement UPS and am reminded of how horribly this entire product category sucks. Consumer-grade UPSes suck, SOHO UPSes suck, and I am reliably informed by my friends who run datacenters that no, you cannot ascend into a blissful upland of winnitude by shelling out for expensive “enterprise-grade” UPSes – they all suck too.
The lossage is extra annoying because designing a UPS that doesn’t suck would be neither difficult nor expensive. These are not complicated devices – they’re way simpler than, say, printers or scanners. This whole category begs to be disrupted by an open-hardware design that could be assembled cheaply in a makerspace from off-the-shelf components, an Arduino-class microcontroller, and a PROM.
How badly do UPSes suck? Let me count the ways…
I know people who hook up car batteries to salvaged UPS electronics and get 10 years of life out of the rig. UPSes could be designed with the kind of deep-cycle gel batteries used for marine applications like trolling motors to last even longer and be even more reliable. But noooooo. Buy a UPS and the vendor (even one of the relatively good ones) will sell you the crappiest, lowest-cost power cell that might, with a squint and a following tailwind, possibly achieve the dwell time printed on the packaging; it will in fact be a piece of shite with so little deep-cycle endurance that it will crap out in usually less than three years.
(This isn’t just needlessly expensive, it’s bad for the environment. Dead batteries are nasty things to put in the waste stream. Doing that as seldom as possible is real care for the ecosphere, not mere idiot virtue signaling like, say, ‘recycling’ paper and plastic.)
Yeah, and that dwell time will always be at least what Mark Twain called a “stretcher” and from less scrupulous vendors an outright lie. Vendors commonly measure it with tiny monitors and half the other peripherals slept; you’ll be lucky to get 50% of the rated value on a system in real use.
Next: automobiles nowadays are equipped with intelligent battery-current sensors that measure not just output voltage but discharge current and battery temperature. This is enough to do accurate state-of-charge and state-of-battery calculations, so you (a) know how much dwell time to expect during an outage, and (b) get warning that your battery has entered the bad end of its bathtub curve well before it craps out.
But are these intelligent current sensors deployed in UPSes? Why, no! That might add a couple of cents to the BOM and of course we can’t have that. Far better to inflict unexpected battery death on the customers who, you know, were paying money exactly so that sort of thing wouldn’t happen to them.
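To make the point concrete, here is a minimal sketch of the simplest calculation such a sensor enables: coulomb counting, which integrates measured discharge current over time to track remaining charge. Everything in it is an assumption for illustration. The class and field names are made up, the capacity is taken as known, and it ignores the voltage and temperature corrections a real battery gauge would apply.

```python
from dataclasses import dataclass


@dataclass
class BatteryGauge:
    """Toy coulomb-counting gauge. Names and numbers are illustrative only."""
    capacity_ah: float  # assumed usable capacity in amp-hours
    charge_ah: float    # current estimate of stored charge in amp-hours

    def update(self, current_a: float, dt_s: float) -> None:
        """Fold in one sensor sample; positive current means discharge."""
        self.charge_ah -= current_a * dt_s / 3600.0
        # Clamp the estimate to the physically possible range.
        self.charge_ah = min(max(self.charge_ah, 0.0), self.capacity_ah)

    def state_of_charge(self) -> float:
        """Fraction of capacity remaining, between 0.0 and 1.0."""
        return self.charge_ah / self.capacity_ah

    def minutes_remaining(self, load_a: float) -> float:
        """Projected dwell time at a constant load current."""
        if load_a <= 0:
            return float("inf")
        return self.charge_ah / load_a * 60.0
```

Feed it a sample every second or so and you get a running dwell-time estimate instead of a number printed on a box. Tracking how the real capacity drifts away from the rated one over months is what would give you the early warning about a dying battery, and that part is not shown here.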
Now we get to what actually triggered today’s rant: the terrible user experience produced by the vendors’ grim determination to pump out least-possible-cost designs that ignore what users actually need. I was awakened from a sound sleep at oh dark thirty yesterday morning by the alarm on my UPS. Upon examining it, I was greeted by a flashing idiot light.
What’s missing from this picture? A text error-message display, like you’d see even on a rather low-end printer or scanner, to tell me “Scheduled periodic dwell test failed – your battery is dead”. To get a clue that this is what that particular alarm tone and flash pattern meant I would have had to lay hands on the device documentation, which of course you can always do instantly when awakened to deal with an emergency alarm in the middle of the fscking night.
It gets better. The tech support drones at my vendor’s call center couldn’t tell me what that alarm behavior meant either. Eventually they issued an RMA for a new battery anyway, but I couldn’t reconstruct what had actually gone down until discussing the whole sorry mess with AD&D regular John D. Bell the next day – he runs a university data center and has seen incidents like this before much more often than I have.
(It took some struggle to get the vendor to issue an RMA, because although I bought the device just three months ago a serial number lookup reveals that it was manufactured five years ago – the battery spent almost all of its service life sitting in a succession of warehouses. To be fair, this part is not the vendor’s fault – it’s the kind of risk you take when you buy from an electronics retailer that likes to stack other people’s overstock goods in the front of the store with an eye-catching discount on the pricetag.)
Back to what is absolutely the vendor’s fault: all but the lowest-end UPSes have monitoring ports (usually USB these days) that, in theory, should let you get useful status information from the device – perhaps even hook it up to a monitoring daemon on your computer that can do a clean shutdown when the device has been on battery for more than a transient brownout’s worth of duration.
In theory. In practice, this sort of thing is such a pain in the ass to set up that despite having contributed to two different UPS-monitoring daemons (NUT and apcupsd) I gave up on this capability years ago. The problem divides into at least two parts:
1. The wire protocols these things speak are generally undocumented. The vendors think it’s sufficient to provide a Windows binary blob that gives you a fixed-function monitor GUI. Which is almost invariably so badly designed and poorly documented that it might look kinda purty but you can get little actual use out of it.
2. That first problem might be surmountable if you could watch the port yourself and reverse-engineer the protocol by watching what datagrams come up the wire. And if it had been designed by anyone with a clue about how to do application protocols right they’d be in some self-describing metaprotocol like JSON or XML or at least NMEA0183-like text sentences.
But no. This never happens. UPS protocols are invariably cryptic, half-assed crap designed by an EE in a hurry who thinks every “unnecessary” byte transmission is a sin against nature. Fields have no names. Numbers, if they’re not binary-encoded in unspecified endianness, have no units. Opaque status codes abound. The protocol grammar is full of defect-attracting corner cases. The device never IDs itself or provides a protocol version. Discoverability: what’s that?
If this sounds like the same sort of mess that afflicts GPS reporting protocols, that’s because it is. Actually it’s worse here, because there’s no equivalent of NMEA0183 to even begin to address the discoverability problem. Relentless vendor cheapness is at the root of both messes – given a choice between spending NRE on a decent design and shipping inadequate crap that piles hidden long-term costs on customers, UPS vendors unfailingly opt for the latter.
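For contrast, here is a rough sketch of what a self-describing status report could look like if a device emitted one JSON object per line. This is not any real vendor's protocol. The function name, the message class, the device ID and every field name are hypothetical, chosen only to show named fields, units carried in the names, a device identifier and a protocol version.

```python
import json


def make_report(battery_v: float, load_w: float, temp_c: float,
                on_battery: bool) -> str:
    """Build one newline-terminated JSON status report (hypothetical format)."""
    report = {
        "class": "UPS_STATUS",         # message type, so a reader can dispatch
        "proto": "1.0",                # protocol version, sent in every report
        "device": "example-ups-0001",  # made-up device ID
        "on_battery": on_battery,
        "battery_voltage_v": battery_v,  # units live in the field names
        "load_w": load_w,
        "battery_temp_c": temp_c,
    }
    # Sorted keys keep the output stable and easy to diff by eye.
    return json.dumps(report, sort_keys=True) + "\n"
```

Anyone watching the port with a terminal program could work out what a line like that means without a manual, which is the whole point. The extra bytes cost nothing that matters on a status link that reports once a second.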
This whole product category begs to be disrupted by a maker design – because it’s possible, and otherwise the incentives on the vendors won’t change. Without disruption, the whole category could stay trapped in a nasty crab-bucket equilibrium that never rewards the risk of spending a few more pennies than one’s competitors to field a decent design.
Let’s invert the above gripe list to specify what a decent design would look like:
0. Open hardware design, open-source firmware design, open-source device-monitor code.
1. Designed to be used with deep-cycle marine gel batteries that will last next to forever, for minimum long-term cost and least environmental impact.
2. Uses EV-style intelligent battery-current sensors to enable accurate projection of battery performance.
3. Has a textual alert status display in addition to alarms.
4. Has a USB monitoring port that speaks a decently-designed and fully documented wire protocol, probably JSON datagrams a la GPSD.
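With item 4 in place, the host side of the clean-shutdown job gets almost trivially small. Here is a minimal sketch of the decision logic, assuming the device sends one JSON object per line with a boolean `on_battery` field. That field name, the class name and the 30-second grace period are all my assumptions for illustration, not part of any existing daemon.

```python
import json


class ShutdownPolicy:
    """Decide when an outage has outlasted a transient brownout (sketch)."""

    def __init__(self, grace_s: float = 30.0):
        self.grace_s = grace_s         # how long on battery before acting
        self.on_battery_since = None   # timestamp when the outage began

    def feed(self, line: str, now_s: float) -> bool:
        """Process one report; return True once shutdown should start."""
        msg = json.loads(line)
        if not msg.get("on_battery", False):
            # Mains is back, so forget any outage in progress.
            self.on_battery_since = None
            return False
        if self.on_battery_since is None:
            self.on_battery_since = now_s
        return now_s - self.on_battery_since >= self.grace_s
```

A real monitor would read lines from the port, pass in a monotonic clock reading as `now_s`, and invoke the system's shutdown command when `feed` returns True. Taking the time as a parameter keeps the policy testable without any hardware attached.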
I could write the firmware, but I don’t have the chops to design the hardware. Anybody game?
Eric S. Raymond's Blog
