Amazon Admits a Typo Shut Down Much of the Internet on Tuesday


You may have noticed that a lot of the Internet was acting up on Tuesday. Tons of websites were down. Maybe you came across a website that worked, but some of its embedded content wouldn’t load. Or your interoffice chat system wasn’t working properly. The overarching reason was known immediately: S3, Amazon’s storage platform and part of its Amazon Web Services suite of cloud computing tools, saw a number of servers go down. Since S3 is relied upon by much of the Internet and is supposed to be nearly unsinkable, this was a very bad thing. On Thursday, once everything was restored, Amazon put up a post on the AWS website explaining what happened: A typo wreaked all of Tuesday’s havoc.

Yes, really.

“We’d like to give you some additional information about the service disruption that occurred in the Northern Virginia (US-EAST-1) Region on the morning of February 28th,” the post reads. While the team was attempting to fix an issue with the S3 billing system, “an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process.” There was just one problem: “Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region.”

Well, that answers that. The post closed with an apology:

“Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.”

If there’s a moral to this story, it’s probably that even the best and most reliable can screw up in epic fashion.

