Amazon Crash Lesson: Innovate but Verify

By Jessica Herrera-Flanigan | April 29, 2011

While the federal government continues its efforts to expand into cloud and innovative technologies, the Amazon Elastic Compute Cloud (EC2) crash of last week reminds us of the need to "innovate but verify" as we move to the next realm of technologies.

Today, Amazon issued a 5,700-word explanation of what went wrong. The verdict -- the crash was "caused by several root causes interacting with one another." In other words, things didn't work right. A quick lowdown, in Twitter-esque wording, of what happened on April 21:

Amazon tries to upgrade capacity in northern Virginia regional network storage, traffic rerouted
Oops. Goes to backup instead of main network
Too much traffic, causes clogging and cutoff
Amazon fixes but storage area tries to back up data
"Re-mirroring storm" equals dreaded computer "hourglass of doom," we all know
Chaos spreads
Affected users get 10 free days

Amazon is now assessing the EC2 structure, saying "this event has taught us that we must make further investments" to realize a better design goal.

In the end, sometimes a crash means having to say you're sorry. The lesson for the government: Innovation demands that "new" technology be tested and verified to truly be cutting edge and a trusted tool in government-citizen interactions.
