Allow configuring the customer encryption key validation time interval
Currently, Atlas checks access to remotely stored customer-managed encryption keys every 15 minutes for the custom encryption at rest feature.
This interval cannot be changed, which causes a problem: access to encryption keys can be lost for longer than 15 minutes without the key having been invalidated by the customer.
A perfect example of this was the recent Azure AD global outage, which prevented all authentication to the Azure platform for about 3 hours (complete outage description: https://app.azure.com/h/SM79-F88/691d78). Although the key used for one of our clusters was not invalidated and its credentials were not rotated, the whole cluster went down simply because the authentication mechanism was failing. This impacted our customers and could easily have been mitigated if the key access check interval had been longer.
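To make the request concrete, here is a minimal sketch of the behavior we have in mind. It is not how Atlas works internally. The class, the result categories, and the 4-hour grace period are all hypothetical names and values chosen for illustration. The idea is that an explicit denial from the key provider still shuts the cluster down right away, while an unreachable provider (for example, an authentication outage) is tolerated for a configurable period.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CheckResult(Enum):
    OK = auto()           # provider answered and the key is usable
    DENIED = auto()       # provider answered: key disabled, deleted, or access revoked
    UNREACHABLE = auto()  # no usable answer: auth outage, timeout, network error


@dataclass
class KeyAccessMonitor:
    """Hypothetical monitor; names and defaults are illustrative, not Atlas settings."""

    grace_period_s: float = 4 * 60 * 60  # assumed value: tolerate outages up to 4 hours
    first_unreachable_at: Optional[float] = None

    def should_shut_down(self, result: CheckResult, now: float) -> bool:
        """Decide, after one periodic check, whether the cluster must stop."""
        if result is CheckResult.OK:
            # Access confirmed: forget any outage that was in progress.
            self.first_unreachable_at = None
            return False
        if result is CheckResult.DENIED:
            # The customer really invalidated the key: stop immediately.
            return True
        # UNREACHABLE: start (or continue) the outage clock and stop only
        # once the outage has lasted at least the grace period.
        if self.first_unreachable_at is None:
            self.first_unreachable_at = now
        return now - self.first_unreachable_at >= self.grace_period_s


# Simulate a 3-hour provider outage checked every 15 minutes (times in seconds).
monitor = KeyAccessMonitor()
interval_s = 15 * 60
outage = [monitor.should_shut_down(CheckResult.UNREACHABLE, i * interval_s)
          for i in range(13)]  # checks at 0, 15, ..., 180 minutes
```

With these assumed settings, none of the checks during the 3-hour outage would request a shutdown, while a single `DENIED` result still would.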
-
Guido, Victor,
Please accept our apologies for the availability consequences of the Azure outage you mentioned. You have my commitment that we are making changes on our side so that an outage like this does *not* lead to Atlas cluster shutdown in the future--we will instead treat transient errors like this differently.
-Andrew (VP Cloud Products)
-
Victor commented
I second this. Being able to configure it, or even to temporarily pause this check, could be really useful to prevent downtime when the vault is unavailable.