With its latest outage, Amazon Web Services (AWS) gives business leaders a stark reminder: the public cloud is not infallible, it does not guarantee high availability, and when it goes down, it goes down spectacularly. That is why Hybrid IT is so valuable.
It happened again.
Amazon Web Services (AWS) went down yesterday for hours, bringing down a huge chunk of the internet with it. I didn’t realize at the time that AWS was the reason I could not play the latest episode of Supergirl on my Apple TV. And it was not just iCloud that was affected. It was not just the small sites. Big players were hit big time, including Apple, Adobe, Docker’s Registry Hub, GitHub, GitLab, Quora, Medium, Signal, Slack, Imgur, Twitch.tv…and many more.
This is not the first time AWS has gone down for hours, bringing everyone down with it. And it won’t be the last time.
Yesterday afternoon, Amazon Web Services (AWS) experienced a significant and prolonged outage that brought a number of popular websites and services down. While Amazon is more readily known for its online retail business, its cloud services division has quickly become a huge money maker for the Jeff Bezos-led company. What’s more, AWS provides the backbone for many well-known sites, including Netflix and Quora.
“We are investigating increased error rates for Amazon S3 requests in the US-EAST-1 Region,” Amazon said yesterday amidst a flurry of confusion and frustration.
The problem was eventually resolved, but not before a number of services from Apple were affected. For a brief while yesterday, iOS users experienced difficulties accessing the App Store, Apple Music, iCloud backups, iWork and other cloud-based services.
Yesterday Amazon Web Services had a bad day. And when AWS has a bad day, so do a lot of other sites.
Vendor Apica is a website monitoring service that keeps a close eye on some of the top retail websites around the country. All in all, the retail websites Apica tracks had trouble dealing with the elevated error rates AWS reported in S3 starting around mid-day Eastern Time.
While Amazon Web Services hasn’t yet issued an apology via its social media channels regarding big problems today with its Simple Storage Service (S3), the company’s customers have turned to Twitter and Facebook to apologize to their own customers — while pointing the finger at AWS.
AWS, via its @awscloud Twitter account, did alert customers that “S3 is experiencing high error rates. We are working hard on recovering.” That was posted a bit after 2pm EST and Amazon has since posted a few updates, including a note about the status dashboard recovering.
In the wee hours of Sunday morning something went very wrong in an Amazon Web Services data center.
At 6 AM ET, error rates for the company’s massive NoSQL database, DynamoDB, began skyrocketing in AWS’s US-East Virginia region – the oldest and largest of its nine global regions. By 7:52 AM ET, AWS had determined the cause of the problems: the way the database manages metadata had gone awry, impacting the service’s partitions and tables.