Amazon Crash Lesson: Innovate but Verify

As the federal government continues its push into cloud computing and other innovative technologies, last week's Amazon Elastic Compute Cloud (EC2) crash reminds us of the need to "innovate but verify" as we move to the next realm of technologies.

Today, Amazon issued a 5,700-word explanation of what went wrong. The verdict -- the crash was "caused by several root causes interacting with one another." In other words, things didn't work right. A quick lowdown, in Twitter-esque wording, of what happened on April 21:

  • Amazon tries to upgrade capacity in northern Virginia regional network storage, traffic rerouted
  • Oops. Goes to backup instead of main network
  • Too much traffic, causes clogging and cutoff
  • Amazon fixes but storage area tries to back up data
  • "Re-mirroring storm" equals dreaded computer "hourglass of doom," we all know
  • Chaos spreads
  • Affected users get 10 free days

Amazon is now assessing the EC2 structure, saying "this event has taught us that we must make further investments" to realize a better design goal.

In the end, sometimes a crash means having to say you're sorry. The lesson for the government: Innovation demands that "new" technology be tested and verified before it can be called truly cutting edge and serve as a trusted tool in government-citizen interactions.