What problem are you trying to solve?
Focus on the what and why of the need you have, not the how you'd like it solved.
Our service operates under strict operational policies: quarterly maintenance cycles, no scheduled downtime, and an expectation of fully uninterrupted service. Other managed services, such as Amazon RDS (Aurora MySQL) and Elastic, let customers skip minor patches and apply only major updates in a controlled, non-disruptive way. In Atlas, however, system-initiated maintenance can still trigger instance restarts or minor version updates that we cannot defer or skip. Although these events are designed to be non-disruptive, the resulting failovers and alerts create operational workload, internal escalations, and concerns about service reliability.
What would you like to see happen?
Describe the desired outcome or enhancement.
A customer-controlled option to fully skip, defer, or disable maintenance tasks, including minor version patches and system-initiated host updates.
Version-level control, allowing customers to skip specific minor versions or to apply updates only during explicitly approved windows.
A strict no-downtime mode, similar to the optional OS/security patch controls in Amazon Aurora, allowing customers to opt out of non-critical maintenance.
Why is this important to you or your team?
Explain how the request adds value or solves a business need.
We run a B2C service that is extremely sensitive to even momentary failovers. Although Atlas maintenance is designed to avoid service interruption, the failovers and the CPU spikes during primary elections still trigger alerts at night, which require manual checks and follow-up actions.
This adds unnecessary operational cost and raises reliability concerns for our service teams.
If maintenance that we cannot control continues, it may become a barrier to expanding Atlas adoption to more services, despite our willingness to grow usage.
What steps, if any, are you taking today to manage this problem?
Currently, we manage the issue by:
Conducting service health checks after every maintenance event
Responding to nighttime alerts caused by instance restarts
Manually verifying stability after minor version changes
Controlling maintenance timing and enforcing no-downtime operation policies internally
My comment has been deleted. Could you explain why this happened?
The ideal approach, in our view, would be not to apply non-critical updates by default unless the user has explicitly opted in.
If that isn't feasible, a deferral period of at least 3 months (one quarter) would be highly appreciated.
This would let maintenance tasks accumulate and be applied in a single batch, reducing how often our service is affected and, ultimately, the number of maintenance-related disruptions.
As for the other items you asked about, they are already configured and working as expected on our side.
Hi there, thank you for the feedback. We need more clarity on this request, and it would be very helpful if you could answer these questions: How long would you like to defer non-critical maintenance? Do you currently have maintenance windows or protected hours configured? And why does the existing defer option not work for you?
This is a critical matter that significantly affects our credibility with users. It also imposes a substantial operational burden, because it requires follow-up actions from our Developer, DBA, Product, and CS teams and drains additional operational resources.