Allow alerts to be disabled during Atlas-performed maintenance
When Atlas performs maintenance, such as setting changes and OS reboots, alerts are triggered. It would be helpful to be able to disable these alerts so that they don't cause false alarms, or so much noise that they get ignored.
Hi Tomas - Thanks for the feedback. We are evaluating a way to silence alerts that fire during maintenance. We will touch base shortly.
Our clusters keep sending loads of alerts during maintenance about hosts being down and similar. That is logical, and I understand that these are rolling updates, so the functionality of the cluster is not affected. However, it is unfortunate that hosts are down for quite a while (sometimes 20-30 minutes), and because we don't want unnecessary noise from Atlas monitoring, we had to raise the delay on our Slack and PagerDuty notifications so that they fire only after 30+ minutes. This covers maintenance, but outside of maintenance it is ridiculously high, and we would like to know about a possible issue far sooner than that.
Since Atlas itself performs all the maintenance work, a better solution would seem to be for Atlas to disable such alerts when maintenance starts and re-enable them automatically when it finishes.
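To make the requested behaviour concrete, here is a rough sketch of the suppression logic. It is only an illustration of the idea, not an Atlas feature or API: `MaintenanceWindow` and `should_notify` are hypothetical names, and the window times are assumed to be supplied by whoever knows the maintenance schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record of one maintenance period (start inclusive, end exclusive)."""
    start: datetime
    end: datetime


def should_notify(
    alert_time: datetime,
    windows: Iterable[MaintenanceWindow],
    grace: timedelta = timedelta(0),
) -> bool:
    """Return False if the alert fired during a maintenance window.

    `grace` extends each window's end, to cover hosts that are still
    recovering just after maintenance has officially finished.
    """
    for window in windows:
        if window.start <= alert_time < window.end + grace:
            return False  # suppress: maintenance explains this alert
    return True  # outside maintenance: notify immediately
```

With something like this in place, the notification delay outside of maintenance could stay short, because alerts raised inside a window are filtered out instead of being waited out.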
We had a similar issue just last night. What happens is that we get an alert that Disk I/O % utilization on Data Partition has gone above 90, and if it continues we can get a primary election. My guess is that the primary node gets so bogged down that an election occurs. So it would be nice to be able to tell whether I/O usage took the node down or maintenance did.
We had an issue a few months back where one of our applications was having a problem we could not quite figure out. After looking at the database, we noticed it had been failing over every 10 minutes to an hour or so for more than a day. The timing of the failovers led us to the self-inflicted problem.
Since the failovers led us to the problem, we decided to enable the alert for all of our projects. We have not hit the issue again, so we have not seen the benefit, but we have seen the downside with every update/upgrade.
Hi Jason, Thank you.
For what it's worth, that is a non-default alert in Atlas that we do not recommend most customers enable. We do not want customers to feel significant anxiety over the routine elections that should generally be expected in MongoDB Atlas. We do offer the "Test Failover" capability to help ensure that apps are resilient to elections.
Did you enable that alert because it adds value for you?
The alert we get when the failovers happen is "Replica set elected a new primary".
Out of curiosity, do you receive alerts every time Atlas performs maintenance? Also, do you happen to know whether those alerts were defaults that Atlas had pre-baked, or ones that you set up yourselves?
I definitely hear you on not wanting to deal with noise around maintenance windows and would like to figure something out.