More detail and lead time around maintenance notifications
When we get email notifications of upcoming maintenance, they don't specify whether the update is "urgent" or "low" priority. Our last notice arrived 3 days out. If it's urgent, we understand the short time frame. However, for a lower-priority update, I would hope we could configure how far in advance we are notified, or at a minimum get notified 2 weeks prior so we can communicate with our end users.
Additional information about what the maintenance is for would be appreciated as well, something along the lines of "Patching to 4.0.14" or "OS-level updates". In general, we are in the dark about what is being updated.
This is a pain-point for us as well.
As a customer who recently moved to Atlas, we only discovered after our migration that the drivers in some parts of our stack do not tolerate this downtime, and a permanent fix is going to take time. I have no doubt that other Atlas migrants will run into similar issues, so maintenance is far from zero-impact for some clients.
Additionally, we have zero risk tolerance during key commercial holidays of the year (e.g. Black Friday), and these maintenance tasks need to be rescheduled. Because they typically come with such short notice, it's quite possible to miss one, fail to click the "defer 1 week" button, and as a result apply a change that carries some level of risk during a risk-averse period.
While we do not plan to introduce more than a 72-hour heads-up, you can always defer maintenance for a week, up to two times. You can also select your preferred hour of the week for maintenance, and our heads-up goes out 72 hours ahead of it. If Monday is sub-optimal, why not move your window back in the week? This gives you the power to decide what the optimal time is for you.
Any update on this? Even a week's notice would be great, especially if we have a maintenance window on Monday. Otherwise, with 3 days' notice, we only know about it one business day beforehand.
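To make the arithmetic concrete, here is a rough sketch of how much working-day notice a 72-hour heads-up leaves for different window days, plus the latest start if both one-week deferrals are used. The 02:00 window time, the example dates, and the function names are assumptions chosen for illustration, and weekends are treated as non-business days.

```python
from datetime import datetime, timedelta

NOTICE_LEAD = timedelta(hours=72)   # heads-up lead time stated in this thread
DEFERRAL = timedelta(weeks=1)       # each deferral pushes maintenance one week
MAX_DEFERRALS = 2                   # "up to two times", per the reply above


def notice_sent_at(window_start: datetime) -> datetime:
    """When the heads-up would go out for a given maintenance window."""
    return window_start - NOTICE_LEAD


def business_days_of_notice(window_start: datetime) -> int:
    """Count Mon-Fri calendar days from the notice date up to, but not
    including, the day of the maintenance window."""
    day = notice_sent_at(window_start).date()
    count = 0
    while day < window_start.date():
        if day.weekday() < 5:       # Monday=0 ... Friday=4
            count += 1
        day += timedelta(days=1)
    return count


def latest_deferred_start(window_start: datetime) -> datetime:
    """Latest possible start if every allowed deferral is used."""
    return window_start + MAX_DEFERRALS * DEFERRAL


# Hypothetical windows: Monday 2024-01-08 and Thursday 2024-01-11, both 02:00.
monday_window = datetime(2024, 1, 8, 2, 0)
thursday_window = datetime(2024, 1, 11, 2, 0)

print(business_days_of_notice(monday_window))    # 1 (notice lands on Friday)
print(business_days_of_notice(thursday_window))  # 3 (notice lands on Monday)
print(latest_deferred_start(monday_window))      # 2024-01-22 02:00:00
```

Under these assumptions, a Monday window leaves a single business day to react, while a mid-week window leaves three.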
I agree with you, Andrew, but that assumes all applications are set up correctly and that we have control over those applications to make sure they are up to snuff. In the past, the business has not been okay with that short a notice for any change in the environment, especially for production workloads.
I'm just asking for more notice and more insight into what is changing.
Thanks for reaching out. Importantly, Atlas maintenance is performed in a rolling manner and should only manifest as a replica-set-level election from the client's perspective. This means that an application with built-in retry logic (MongoDB has offered retryable writes and retryable reads since 4.2) generally need not worry about maintenance.
We have the "Test Failover" capability in the cluster's "..." menu in the Atlas UI for testing your resilience posture.
We introduced maintenance windows so that customers who are less election-tolerant could have more control over the preferred time at which maintenance occurs. You should not need to schedule downtime with your end customers, and if maintenance is causing you downtime, we would like to assist you in making sure there isn't a driver bug at play.
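For readers wondering what "built-in retry logic" might look like at the application level, here is a minimal sketch. It is not the driver's own retry mechanism: `TransientError`, `with_retry`, and `FlakyPrimary` are hypothetical names, with `TransientError` standing in for whatever exception your driver raises while a new primary is being elected (in PyMongo, for example, `pymongo.errors.AutoReconnect`).

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver error raised during an election."""


def with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    Re-raises the last error once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...


class FlakyPrimary:
    """Simulates a cluster that rejects the first few calls, as it might
    while a replica-set election is in progress."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def insert(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError("no primary available")
        return "ok"


# Two simulated failures, then success on the third call.
cluster = FlakyPrimary(failures=2)
result = with_retry(cluster.insert, sleep=lambda seconds: None)  # skip real sleeps
print(result, cluster.calls)  # ok 3
```

A wrapper like this is only safe for idempotent operations, since a blind retry of a write that actually succeeded could apply it twice. That is one reason to prefer the driver-level retryable writes and reads mentioned above where they are available.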