Add metrics to monitor CPU credits for burstable performance Atlas clusters
Add metrics to Atlas for tracking burstable CPU credit spend on M10 and M20 cluster tier instances. Additionally, add support for creating alerts based on these metrics.
We use M10/M20 instances (AWS backend), which accumulate CPU credits over time. The problem is that when Atlas nodes run out of CPU credits, performance drops and we see high CPU steal. This is a common issue with burstable AWS EC2 instances. We need a way to monitor the CPU credit balance in Atlas so that we can plan ahead, before the problem happens.
The goal is to identify when the CPU and network credits are exhausted, or getting close to it, so we know in advance that the instance is about to be throttled down to its baseline performance.
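To illustrate the kind of alert we have in mind, here is a minimal sketch, assuming Atlas exposed a time series of CPU credit balance samples (it does not today; that is what this request asks for). The names `CreditSample`, `minutes_until_exhausted` and `should_alert` are hypothetical, and the linear extrapolation between the first and last sample is just one simple way to estimate time to exhaustion. The alert fires either when the balance is already below a floor or when it is projected to run out within a warning window.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CreditSample:
    # Hypothetical metric sample: Atlas does not currently expose this data.
    minutes: float  # sample time, in minutes since an arbitrary reference point
    balance: float  # CPU credit balance at that time


def minutes_until_exhausted(samples: Sequence[CreditSample]) -> Optional[float]:
    """Linearly extrapolate from the first and last sample.

    Returns None if there is not enough data or the balance is not falling.
    """
    if len(samples) < 2:
        return None
    first, last = samples[0], samples[-1]
    elapsed = last.minutes - first.minutes
    if elapsed <= 0:
        return None
    # Net credits lost per minute (spend minus accrual) over the window.
    rate = (first.balance - last.balance) / elapsed
    if rate <= 0:
        return None  # balance is flat or growing, so no exhaustion forecast
    return last.balance / rate


def should_alert(samples: Sequence[CreditSample], min_balance: float,
                 warn_minutes: float) -> bool:
    """Alert if the balance is already low or is projected to hit zero soon."""
    if not samples:
        return False
    if samples[-1].balance <= min_balance:
        return True
    eta = minutes_until_exhausted(samples)
    return eta is not None and eta <= warn_minutes
```

For example, a balance that falls from 120 to 90 credits over 60 minutes is losing 0.5 credits per minute, so the remaining 90 credits last about 180 minutes. With a 240-minute warning window that triggers an alert; with a 120-minute window it does not.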