The Obama campaign's technologists were tense and tired. It was game day and everything was going wrong.
Josh Thayer, the lead engineer of Narwhal, had just been informed that they'd lost another one of the services powering their software. That was bad: Narwhal was the code name for the data platform that underpinned the campaign and let it track voters and volunteers. If it broke, so would everything else.
They were talking with people at Amazon Web Services, but all they knew was that they had packet loss. Earlier that day, they had lost their databases, their East Coast servers, and their memcache clusters. Thayer was ready to kill Nick Hatch, a DevOps engineer who was the official bearer of bad news. Another of their vendors, StallionDB, was fixing the databases but needed to rebuild the replicas. It was going to take time, Hatch said. They didn't have time.
They'd been working 14-hour days, six or seven days a week, trying to reelect the president, and now everything had been broken at just the wrong time. It was like someone had written a Murphy's Law algorithm and deployed it at scale.