Welcome to the new MongoDB Feedback Portal!
We’ve upgraded our system to better capture and act on your feedback.
What problem are you trying to solve? Focus on the what and why of the need you have, not the how you'd like it solved.
Our customer operates a large, business‑critical Atlas footprint and standardizes all alerting on Prometheus and Alertmanager. Today, Prometheus receives no direct, machine‑readable signal for upcoming or ongoing Atlas maintenance events. As a result, the customer cannot easily distinguish planned maintenance from genuine incidents in their central monitoring stack, and must configure and maintain separate alerts in the Atlas UI, which in turn requires granting broader Atlas access to non‑admin team members.
What would you like to see happen? Describe the desired outcome or enhancement.
We would like Atlas’s Prometheus integration to expose a dedicated metric for maintenance, e.g. a …
Why is this important to you or your team? Explain how the request adds value or solves a business need.
The customer has strict reliability requirements and wants a single, trusted observability stack. Having maintenance visibility only in the Atlas UI or via ad‑hoc API/webhook integrations fragments their alerting story and makes it harder to coordinate operational response. A first‑class Prometheus metric would:
The customer explicitly called out the next World Cup as a high‑risk period with expected peak traffic, and would like unified monitoring of Atlas maintenance in place before then to meet their internal reliability objectives. The need is not limited to that period, however: they want this capability for the long term.
What steps, if any, are you taking today to manage this problem? |
Today, the customer uses an indirect workaround: they alert on the availability of cluster members via Prometheus and infer that maintenance is happening when replicas briefly go down during a failover. This helps, but it is noisy and provides no forward‑looking insight into upcoming maintenance. We also discussed alternatives such as consuming maintenance events from the Atlas Activity Feed via the Admin API or webhooks, but these require the customer to build and operate custom integrations and/or manage separate alert rules in Atlas, which they consider less desirable than a native Prometheus metric.
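For reference, the inference behind the current workaround can be sketched in Python as follows, assuming per-member availability has already been reduced to time-sorted `(timestamp, members_up)` samples for a single replica set. The names `find_dips` and `classify`, the 120-second threshold, and the majority rule are hypothetical choices for illustration, not how the customer's actual alert rules are written.

```python
from dataclasses import dataclass

# Hypothetical sketch of the current workaround: infer maintenance from
# replica-set member availability. The 120 s threshold and majority rule are
# illustrative assumptions, not Atlas or Prometheus behavior.


@dataclass(frozen=True)
class Dip:
    start: float  # Unix seconds of the first sample with a member down
    end: float    # Unix seconds when all members were seen up again
    min_up: int   # fewest members seen up during the dip

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_dips(samples: list[tuple[float, int]], total_members: int) -> list[Dip]:
    """Group consecutive time-sorted (timestamp, members_up) samples in which
    fewer than total_members are up. A dip still open at the last sample is
    closed at that sample's timestamp."""
    dips: list[Dip] = []
    start = None
    min_up = total_members
    for ts, up in samples:
        if up < total_members:
            if start is None:
                start, min_up = ts, up
            else:
                min_up = min(min_up, up)
        elif start is not None:
            dips.append(Dip(start, ts, min_up))
            start = None
    if start is not None:
        dips.append(Dip(start, samples[-1][0], min_up))
    return dips


def classify(dip: Dip, total_members: int, max_failover_seconds: float = 120.0) -> str:
    """Guess what a dip was. A short dip in which a majority stayed up looks
    like a rolling maintenance failover, but a genuine single-member failure
    of the same length is indistinguishable from it."""
    majority = total_members // 2 + 1
    if dip.duration <= max_failover_seconds and dip.min_up >= majority:
        return "likely-maintenance"
    return "possible-incident"
```

The sketch makes the limitation concrete: `classify` can only run after members have already gone down, and a short single-member outage caused by a real fault receives the same label as a maintenance failover.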