Our clusters keep sending lots of alerts during maintenance windows, such as "host is down" and similar. That is logical, and I understand these are rolling updates, so the functionality of the cluster is not affected. However, it is unfortunate that hosts can be down for quite a while (sometimes 20 to 30 minutes). Because we don't want unnecessary noise from Atlas monitoring, we had to raise the delay on our Slack and PagerDuty notifications so they fire only after 30+ minutes. This covers maintenance, but outside of maintenance that delay is ridiculously high, and we would like to know about a possible issue much sooner than that.
Since it is Atlas that performs all the maintenance work, a better solution would be to let Atlas disable such alerts when maintenance starts and automatically re-enable them when maintenance is finished.