"Deviation from Norm" and "Frog Boil" type alerting
Currently Atlas only alerts us when CPU reaches a critical threshold such as 90%. We would like to see additional types of alerts to detect issues sooner, including the following.
- “Deviation from norm” - A given metric is X% worse than its average over the same Y-hour window across the past Z days (e.g. “CPU at 4am today is much worse than the 3am to 4am average over the past 7 days”).
- “Frog boil” - A given metric gets progressively worse, by X% over Y hours / days / weeks (e.g. “CPU usage today is on average 10% higher than it was 1 week ago”).
Such alert criteria would allow us to detect and respond to critical issues earlier, i.e. before we hit 90% CPU.
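To make the two criteria concrete, here is a minimal sketch of how they might be evaluated. It is not an Atlas API: the function names, parameters, and sample values are all hypothetical. It assumes "worse" means "higher" (as with CPU) and that X% is a relative increase, not percentage points.

```python
from statistics import mean


def pct_increase(current, baseline):
    """Relative increase of current over baseline, in percent."""
    if baseline == 0:
        # Assumption: any positive value over a zero baseline counts as unbounded growth.
        return float("inf") if current > 0 else 0.0
    return (current - baseline) * 100 / baseline


def deviation_from_norm(current, same_window_history, threshold_pct):
    """Hypothetical check: is the current reading more than threshold_pct (X)
    above the average of the same Y-hour window over the past Z days?

    same_window_history holds one average per past day for that window.
    """
    return pct_increase(current, mean(same_window_history)) > threshold_pct


def frog_boil(recent_samples, earlier_samples, threshold_pct):
    """Hypothetical check: is the recent average at least threshold_pct (X)
    above the average from an earlier period (e.g. the same day 1 week ago)?
    """
    return pct_increase(mean(recent_samples), mean(earlier_samples)) >= threshold_pct


# Illustrative (made-up) CPU percentages.
past_7_days_3am_to_4am = [40, 42, 38, 41, 39, 40, 40]  # averages 40
spike_alert = deviation_from_norm(60, past_7_days_3am_to_4am, threshold_pct=25)

today = [56, 57, 58]         # averages 57
one_week_ago = [50, 50, 50]  # averages 50
drift_alert = frog_boil(today, one_week_ago, threshold_pct=10)
```

In this sketch both alerts would fire while CPU is still well below a fixed 90% threshold, which is the early warning the request is after.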
6 votes