Google Workspace Status Dashboard

This page provides status information on the services that are part of Google Workspace. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://workspace.google.com/. For incidents related to Google Analytics, visit the Google Ads Status Dashboard.

Incident affecting Google Groups

Incident began at 2021-11-12 08:30 and ended at 2021-11-12 10:26 (times are in Coordinated Universal Time (UTC)).

Date Time Description
Dec 2, 2021 11:37 PM UTC

INCIDENT REPORT

DATE/TIME OF THE ISSUE (US/Pacific time)

Friday, 12 November 2021 00:30 - Friday, 12 November 2021 02:26

Duration: 1 hour, 56 minutes
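The UTC times in the dashboard header and the US/Pacific times in this report describe the same window. The following is a minimal sketch of that conversion, assuming a fixed UTC-8 offset (Pacific Standard Time, which applied on this date); the function name is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific was on standard time (UTC-8) on 12 November 2021.
PST = timezone(timedelta(hours=-8), "PST")


def to_pacific(utc_string):
    """Parse a 'YYYY-MM-DD HH:MM' UTC timestamp and convert it to PST."""
    utc_time = datetime.strptime(utc_string, "%Y-%m-%d %H:%M")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(PST)


# Start and end times as published in UTC on the dashboard.
start = to_pacific("2021-11-12 08:30")
end = to_pacific("2021-11-12 10:26")
duration = end - start

print(start.strftime("%H:%M"), end.strftime("%H:%M"), duration)
# prints: 00:30 02:26 1:56:00
```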

Summary

On November 12, 2021, the Google Cloud Load Balancing (GCLB) service experienced failures resulting in impact to several downstream Google Cloud services in Europe for a duration of 1 hour, 56 minutes. We understand that this issue has impacted our valued customers and users, and we apologize to those who were affected.

Background

Google Cloud Load Balancing is a collection of software and services that load balance traffic across Google properties. There are two main components: a control plane and a data plane. The control plane provides programming to the data plane on how to handle requests. A key component of the data plane is the Google Front End (GFE).

The GFE is an HTTP/TCP reverse proxy, which is used to serve requests to Google properties including Search, Ads, Workspace (Gmail, Chat, Meet, Docs, Drive, etc.), Cloud External HTTP(S) Load Balancing, Proxy/SSL Load Balancing, and many Cloud APIs. Updates are regularly rolled out to GFEs, typically via configuration flags, starting with canary GFEs and gradually expanding to production globally.
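The staged rollout described above can be pictured with a small sketch. This is not Google's rollout tooling: the stage names, server names, and health check are all hypothetical, and the only behavior modeled is "expand stage by stage, and roll everything back when a health check fails".

```python
def staged_rollout(stages, is_healthy):
    """Apply a change stage by stage and roll back if a stage is unhealthy.

    stages: ordered list of (stage_name, server_names) pairs.
    is_healthy: callable that receives the servers updated so far.
    """
    updated = []
    for stage_name, servers in stages:
        updated.extend(servers)
        if not is_healthy(updated):
            # Roll back every server touched so far, not only this stage.
            return {"succeeded": False, "failed_stage": stage_name, "updated": []}
    return {"succeeded": True, "failed_stage": None, "updated": updated}


# Hypothetical stages, from a small canary set to a global rollout.
STAGES = [
    ("canary", ["gfe-canary-1"]),
    ("regional", ["gfe-eu-1", "gfe-eu-2"]),
    ("global", ["gfe-us-1", "gfe-asia-1"]),
]


def health_check(updated_servers):
    # Hypothetical failure: the change breaks once it reaches gfe-eu-2.
    return "gfe-eu-2" not in updated_servers


result = staged_rollout(STAGES, health_check)
print(result)
```

In this sketch the rollback only reverts server configuration. It does not touch state that has already been handed to clients, and that gap is what mattered in this incident.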

GFEs support and terminate QUIC [1] connections before connecting to downstream backend services. QUIC is a general-purpose transport layer network protocol. Upon first connection, a QUIC server supplies the client with a source address token. The client presents this token when it resumes a connection later, which proves that it has previously used a given address.
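To illustrate the idea of a source address token, here is a minimal sketch that binds a token to a client address with an HMAC. It is not the GFE token format, which this report does not describe; the secret, lifetime, token layout, and function names are assumptions made for the example.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret and validity window (assumptions).
SERVER_SECRET = b"example-secret-not-for-production"
TOKEN_LIFETIME_SECONDS = 3600


def issue_token(client_ip, now=None):
    """Return a token proving that client_ip was seen at time `now`."""
    issued_at = int(now if now is not None else time.time())
    payload = f"{client_ip}|{issued_at}".encode()
    tag = hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()
    return f"{issued_at}.{tag}"


def validate_token(token, client_ip, now=None):
    """Check that the token is well formed, fresh, and bound to client_ip."""
    now = int(now if now is not None else time.time())
    try:
        issued_at_text, tag = token.split(".", 1)
        issued_at = int(issued_at_text)
    except ValueError:
        return False  # malformed token: reject it instead of crashing
    if not 0 <= now - issued_at <= TOKEN_LIFETIME_SECONDS:
        return False
    payload = f"{client_ip}|{issued_at}".encode()
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag.encode(), expected.encode())


token = issue_token("203.0.113.7", now=1_000)
print(validate_token(token, "203.0.113.7", now=1_500))   # True: same address
print(validate_token(token, "198.51.100.9", now=1_500))  # False: other address
print(validate_token("garbage", "203.0.113.7", now=1_500))  # False: malformed
```

Because any server holding the secret can validate the token, a client can resume against a different server than the one that issued it.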

Root Cause

On Friday, 12 November 2021, at 00:27 US/Pacific, a configuration change modifying the format of the source address token provided to QUIC clients was rolled out to a small set of GFEs. This change resulted in a misconfigured token that could crash GFEs that had not yet received this update. Shortly thereafter, the monitoring service automatically detected a problem with GFEs using this flag and rolled back the change within four minutes. However, clients that had connected to a GFE with the updated configuration during that period had received a misconfigured token, which they subsequently presented to other GFEs during reconnection. As a result, despite the rollback, impact remained until additional mitigations were put in place.
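The failure mode above, where client-held state outlives a server-side rollback, can be reproduced in a toy simulation. Everything here is a simplification for illustration: the `Server` class, the `v1`/`v2` format labels, and the exception that stands in for a GFE crash are hypothetical.

```python
class TokenFormatError(Exception):
    """Raised when a server cannot parse a token (stands in for a crash)."""


class Server:
    def __init__(self, name, token_format="v1"):
        self.name = name
        self.token_format = token_format  # format this server issues and parses

    def issue_token(self):
        return f"{self.token_format}:address-proof"

    def accept(self, token):
        version = token.split(":", 1)[0]
        if version != self.token_format:
            raise TokenFormatError(f"{self.name} cannot parse {version} token")
        return True


canary = Server("canary")
others = [Server("prod-1"), Server("prod-2")]

canary.token_format = "v2"           # the change reaches a canary server
cached_token = canary.issue_token()  # a client connects during that window
canary.token_format = "v1"           # the automatic rollback reverts the server

# The client keeps its cached token and presents it on every reconnection.
failures = []
for server in [canary] + others:
    try:
        server.accept(cached_token)
    except TokenFormatError:
        failures.append(server.name)

print(failures)  # ['canary', 'prod-1', 'prod-2']
```

The rollback restores every server to the old format, but the token already cached by the client is untouched, so each reconnection still delivers the new-format token to a server that cannot handle it.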

[1] - https://cloud.google.com/blog/products/gcp/introducing-quic-support-https-load-balancing

Remediation and Prevention

Google engineers were alerted to the issue via automated alerting on Friday, 12 November 2021, at 00:30 US/Pacific and immediately started an investigation. At 00:31, the configuration change was automatically rolled back. However, by 00:42, it was clear the impact remained widespread, and our engineering team continued further investigation. Mitigation began at 01:38, when traffic was redirected away from the impacted GFEs. At 02:12, a flag change was pushed to temporarily disable QUIC support on GFEs, which mitigated all impact by 02:26.
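To make the gaps between these events easier to see, the sketch below computes the minutes elapsed since the first alert. The times are taken from the paragraph above; the event labels are paraphrased and the function name is illustrative.

```python
from datetime import datetime

# Times (US/Pacific, 12 November 2021) from the remediation timeline.
TIMELINE = [
    ("00:30", "automated alert; investigation starts"),
    ("00:31", "configuration change automatically rolled back"),
    ("00:42", "impact confirmed as still widespread"),
    ("01:38", "traffic redirected away from impacted GFEs"),
    ("02:12", "flag pushed to temporarily disable QUIC"),
    ("02:26", "all impact mitigated"),
]


def minutes_since_first_event(timeline):
    """Return (elapsed_minutes, label) pairs relative to the first entry."""
    first = datetime.strptime(timeline[0][0], "%H:%M")
    elapsed = []
    for clock, label in timeline:
        delta = datetime.strptime(clock, "%H:%M") - first
        elapsed.append((int(delta.total_seconds() // 60), label))
    return elapsed


for minutes, label in minutes_since_first_event(TIMELINE):
    print(f"+{minutes:3d} min  {label}")
```

The rollback landed one minute after the alert, while the first effective mitigation began 68 minutes in and full mitigation took 116 minutes.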

In order to prevent this type of outage from happening again we are pursuing the following:

  • QUIC support has since been re-enabled, and the relevant bug fixed.
  • Improve mitigation playbooks for GFE rollbacks to reduce mitigation time in future.
  • Improve GFE regionalized monitoring to reduce investigation time in the future.
  • Further narrow the scope of GFE canaries to limit impact of future issues.
  • Improve collection of debugging details about active handshakes upon fatal GFE errors.

We apologize for the length and severity of this incident, and to everyone whose service or application was affected. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to prevent recurrence and to improve the platform’s performance and availability.

Detailed Description of Impact

On Friday, 12 November 2021, at 00:30 US/Pacific, the GCLB service experienced failures resulting in impact to several downstream Google Cloud services for 1 hour, 56 minutes. Some customers in Europe were unable to access web and mobile clients for services including Gmail, Groups, Calendar, Tasks, and Chat.

Google Gmail

Affected customers were unable to access web and mobile clients. This resulted in a ~2% drop in traffic for Gmail services and mostly affected customers in Europe. The period of impact was between 00:30 and 02:53.

Google Groups

Affected customers in Europe were unable to access web and mobile clients. The period of impact was between 01:28 and 03:06, during which affected customers had issues loading the Groups UI.

Google Tasks

Google Tasks experienced error rates of up to ~0.2% in Europe. Affected customers were unable to access web and mobile clients. The period of impact was between 00:30 and 02:10.

Google Calendar

Google Calendar experienced error rates of up to ~0.5% in Europe. Affected customers were unable to access web and mobile clients. The period of impact was between 00:30 and 02:10.

Google Chat

14.5% of Chat users could not connect, which impaired functionality in their clients. This affected mostly European users, both web and mobile. The period of impact was between 00:30 and 02:20.

Nov 13, 2021 1:04 AM UTC

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 12 November 2021 00:30

Incident End: 12 November 2021 02:14

Duration: 1 hour 44 minutes

Affected Services and Features:

  • Google Gmail
  • Google Groups
  • Google Calendar and Tasks
  • Google Chat

Regions/Zones: Europe

Description:

Google’s Front End load balancing service experienced failures resulting in impact to several downstream Google Cloud services in Europe. From preliminary analysis, the issue was caused by a new infrastructure feature triggering a latent issue within internal network load balancer code.

Customer Impact:

  • Google Groups - Affected customers in Europe were unable to access web and mobile clients.
  • Google Calendar and Tasks - Affected customers in Europe were unable to access web and mobile clients.
  • Google Chat - Affected customers in Europe were unable to access web and mobile clients.
  • Google Gmail - Affected customers in Europe were unable to access web and mobile clients.

Additional details:

The error was caught within 4 minutes by automated safety systems, and further spread was slowed at this point. The issue was fully mitigated approximately 1 hour 44 minutes later, when our engineering team completed a rollout to disable the vulnerable code path. The issue will be fully prevented going forward via a root cause fix, which will complete rollout by 12 November 2021 21:00 US/Pacific.

Nov 12, 2021 10:55 AM UTC

The problem with Google Groups has been resolved. We apologize for the inconvenience and thank you for your patience and continued support.

Nov 12, 2021 10:39 AM UTC

Our team is continuing to investigate this issue. We will provide an update by Nov 12, 2021, 11:00 AM UTC with more information about this problem. Thank you for your patience. The affected users are unable to access Google Groups.

Some users in Europe may experience issues when attempting to access services.

Nov 12, 2021 10:05 AM UTC

We're investigating reports of an issue with Google Groups. We will provide more information shortly. The affected users are unable to access Google Groups.

We are investigating an issue that is affecting the ability of some users in Europe to access some services.