With such a large power grid and complex internet infrastructure, outages are bound to happen. The Google Cloud Platform is the latest outage victim. On the afternoon of Nov. 16, many people saw 404 errors when they tried to reach websites served through Google Cloud.
The Google Cloud outage disrupted several popular sites, including Etsy, Discord, Snapchat, and Spotify. Because all of these sites started experiencing issues at the same time, the disruption pointed to a shared, underlying problem.
What happened during Google's Cloud outage?
According to Google’s updated status on the issue, the company experienced a problem with Google Cloud load balancing. Users weren't able to access any webpage served by the Google External Proxy Load Balancer. The disrupted Google services were Firebase, Apigee, Google App Engine, App Engine Flex, Cloud Run, Cloud Functions, and Cloud Networking. Many people couldn't access Snapchat, and an estimated 50,000 couldn't use Spotify.
Cloud load balancing is the process of distributing computing resources and workloads across a cloud environment. Google Cloud's load balancing lets Google Cloud Platform users spread traffic to their applications across many servers and can handle more than 1 million requests per second. Its most important purpose is to keep any single server from becoming overloaded and failing.
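To make the idea concrete, here is a minimal round-robin sketch in Python. It is a toy illustration of the general technique only, not a description of how Google's External Proxy Load Balancer works, and the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands each incoming request to the next backend in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # cycle() repeats the backend list forever, in order.
        self._backends = cycle(backends)

    def route(self, request):
        # Round-robin ignores the request's contents and simply picks the
        # next backend, so no single server receives every request.
        return next(self._backends)


# Hypothetical backend names, for illustration only.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.route(i) for i in range(6)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```

Production load balancers typically also take backend health checks, capacity, and location into account; round-robin is shown here only because it is the simplest policy to follow.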
For Google App Engine, the company observed an 80 percent drop in traffic in the U.S. and Europe, along with a 25 percent drop in traffic for Google Cloud Run. Just four days earlier, several Google systems faced disruptions that may have foreshadowed the outage on Nov. 16. On Nov. 12, a number of services were down, including Google App Engine, Google Cloud Composer, Google Cloud Networking, and Google Cloud infrastructure components, all of which were restored the following day.
The cause of the issue and Google's remedy
Google says the issue is partially resolved, and users won't be able to make changes to load balancers until it is completely fixed. Google responded quickly to complaints and apologized to affected users. While many people regained access within two hours of the outage, some users may still experience problems.
The issue is reminiscent of Fastly’s major outage in June, which was caused by a software bug triggered by a customer configuration change. However, it isn't clear whether a configuration bug is behind Google's issues. Google’s outage also came shortly after Meta’s second Facebook outage, during which users were unable to access Facebook, Messenger, or Instagram for roughly three to six hours.
According to Google’s Cloud Status Dashboard at 12:50 a.m. ET on Nov. 17, every issue was resolved except Cloud Networking. Cloud Run and Google App Engine, while not completely down, are still facing disruptions. Spotify, Discord, and Etsy users have since regained access, but Google hasn't released an estimate for when Cloud Functions will be fully operational. Google plans to post a full analysis of what happened on its dashboard as soon as possible.