Atlas
-
Add connection pooling metrics for sharded clusters
We recently ran into an issue where we hit the internal mongoS -> mongoD connection pool limit when reading from secondaries, which required Atlas support to increase the value of ShardingTaskExecutorPoolMaxSize.
As a result, it would be great to be able to monitor internal mongoS -> mongoD connection pool usage and set up alarms if it gets near the limit.
2 votes -
Export Aggregation Results as Metrics to Prometheus
Add support for exporting MongoDB aggregation results as Prometheus metrics. This would allow users to track custom queries and dynamic data, enabling more granular and meaningful monitoring and alerting in Prometheus and Grafana.
6 votes -
Expose hourly cost data as a metric for monitoring cluster cost
The hourly cost of a cluster is already available in the Atlas UI. Expose this same data as a metric for monitoring cluster cost. We understand it may not include data transfer and some other costs, but monitoring the spikes and valleys over time for a given cluster is helpful when autoscaling is turned on. We would then also be able to set an alarm on the metric.
2 votes -
Premium Monitoring Granularity for lower tier clusters
CURRENT STATE
Premium Monitoring Granularity (10 second metrics) is only available on M40 clusters or higher.
IMPACT
Lower tiered environments (such as testing and staging) cannot have 10 second metrics granularity. Some customers export metrics to third parties such as Datadog, which only handle homogeneous granularity of metrics.
When Datadog receives different granularities, e.g. 10 second granularity for PROD environments (M40+) and lower granularity for STAGE environments (lower than M40), it leads to poor data integration and dashboards failing to load data properly.
Customer does not have a reliable view into their data since some environments send 10 second…
2 votes -
Collection size metrics
Hi,
From time to time we have Atlas auto-scale up our clusters' disks. We then need to start analyzing why. In some cases it is organic growth of the data we store, but in some cases we are missing TTLs or they are misconfigured and we accumulate data we do not need.
In both cases, working out what causes the disk increase is a very tedious process, as some clusters have thousands of collections.
To overcome this, we started running a small utility that gathers some data over all our collections. It iterates on all the organizations, all the…
2 votes -
Add application connection details
It would be really helpful to be able to see application connection details in Atlas monitoring. The "real time" view only shows active collections. I want to see which applications are connecting, which are active, what queries they are running, and so on. At the moment, we don't have that in the Atlas web console. Please consider adding this feature in the near future.
1 vote -
Show Mongos Connection Number in Overview.
Currently, the connection number shown in Overview is the shard connection count, that is, connections from Mongos to Mongod, which should usually be lower than the Mongos connection count.
The MongoDB connection limit refers to Mongos connections, so these should be shown in Overview; otherwise the figure is very misleading when we receive a connection % alert.
1 vote -
Integrate the Atlas Replication lag alert into the Prometheus API
We request that metrics be included in the Prometheus API so that a Replication lag alarm can be implemented.
6 votes -
index stat
Merge index stats from all replicas in the index page UI, and enable resetting stats from the UI.
Preferably, merge stats from all shards and show whether each comes from a secondary or a primary replica. The current index stats in the UI only show primary stats, which is not useful.
1 vote -
Notification "Failed maintenance"
It would be helpful to have a "Failed maintenance" notification. While we cannot resolve the issue ourselves in such cases and need to open an Atlas support ticket, it is still important for us to receive a notification so that we are informed about the problem and can respond accordingly.
1 vote -
Adding balancer activity from sharding Statistics into Atlas UI
Customers often experience performance issues in different clusters related to chunk migrations, and in each case they struggle to determine that chunk migrations were occurring and had started at the same time as the performance issue. Customers are limited in the observability available to identify this problem via the Atlas UI:
Using db.serverStatus().shardingStatistics on each shard can provide what customers need.
In particular, the key metrics below provide good insight into balancer activity:
For donor:
- db.serverStatus().shardingStatistics.countDonorMoveChunkStarted: The total number of times that MongoDB starts the moveChunk command or moveRange command…
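Since the donor counter named above is described as a running total, one way to spot balancer activity from it is to sample `db.serverStatus().shardingStatistics` twice on each shard and compare the values. The following is a minimal Python sketch under that assumption. It works on plain dictionaries so it does not depend on any driver; the function name `counter_deltas` and the sample values are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def counter_deltas(
    before: Mapping[str, int],
    after: Mapping[str, int],
    names: Iterable[str],
) -> Dict[str, int]:
    """Return how much each named cumulative counter grew between two samples."""
    deltas: Dict[str, int] = {}
    for name in names:
        old = before.get(name, 0)
        new = after.get(name, 0)
        # A value lower than the previous sample suggests the process
        # restarted and the counter reset, so count from zero instead.
        deltas[name] = new - old if new >= old else new
    return deltas


# Hypothetical samples of shardingStatistics taken a few minutes apart.
earlier = {"countDonorMoveChunkStarted": 12}
later = {"countDonorMoveChunkStarted": 15}
print(counter_deltas(earlier, later, ["countDonorMoveChunkStarted"]))
# prints {'countDonorMoveChunkStarted': 3}
```

A non-zero delta in a sampling window that overlaps a performance incident would be a hint that chunk migrations started during that window.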
2 votes -
Query profiler drilldown by number of write conflicts
One sort of issue that can lead to elevated CPU utilization but is otherwise hard to identify is queries with high numbers of write conflicts. Having some way to drill down to queries that exhibit these symptoms would be quite convenient.
1 vote -
Atlas metrics granularity after 48 hours
For metrics older than 48 hours, the data is presented in 1-hour intervals. This level of granularity is often too coarse for a thorough examination of past events and trends. Such a broad view can obscure smaller yet significant details critical for understanding and resolving performance issues that occurred in the past.
Suggested Improvement:
Offer a finer granularity for historical metrics beyond the 48-hour timeframe. Providing data in smaller intervals would greatly enhance our ability to conduct in-depth analyses and diagnose past performance issues accurately. This would be particularly beneficial for conducting detailed investigations of historical data and identifying…
5 votes -
Forward MongoDB Atlas logs to Securonix
This is a feature request to integrate the forwarding of MongoDB Atlas logs to Securonix.
1 vote -
Ability to "Mass Kill" slow running queries
Currently, Atlas has a "Kill Op" option which is useful to kill single long-running queries.
When upgrading to MongoDB 7.0, we faced a situation where the Slot-Based Query Engine (SBE) was causing 1000s of queries to execute slowly. We wanted to kill them all, but that was more than a human could do by clicking "Kill Op" one by one. A "Mass Kill" feature that kills queries running longer than X seconds (with X configurable) would have helped us greatly in an outage scenario. We ultimately rebooted our cluster to kill the queries, then manually implemented a script which did this…
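The poster's script is not shown, so the following is only a rough Python sketch of the selection step such a "Mass Kill" might use. It assumes the list of operations has already been fetched in the shape MongoDB reports for current operations, using the `opid`, `active`, `op`, and `secs_running` fields. The function name `select_ops_to_kill`, the default operation types, and the sample data are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def select_ops_to_kill(
    ops: Iterable[Mapping[str, Any]],
    min_secs: int,
    op_types: Sequence[str] = ("query", "getmore"),
) -> List[Any]:
    """Return the opids of active operations running for at least min_secs."""
    selected: List[Any] = []
    for op in ops:
        if not op.get("active"):
            continue  # idle connections have nothing to kill
        if op.get("op") not in op_types:
            continue  # leave writes and internal commands alone by default
        if op.get("secs_running", 0) < min_secs:
            continue
        selected.append(op["opid"])
    return selected


# Hypothetical sample of current operations.
sample_ops = [
    {"opid": 101, "active": True, "op": "query", "secs_running": 45},
    {"opid": 102, "active": True, "op": "query", "secs_running": 2},
    {"opid": 103, "active": True, "op": "update", "secs_running": 90},
    {"opid": 104, "active": False, "op": "none"},
]
print(select_ops_to_kill(sample_ops, min_secs=30))
# prints [101]

# Assumption: with a driver such as PyMongo, each selected opid could then be
# passed to the killOp admin command, e.g. client.admin.command("killOp", op=opid).
```

Keeping the selection separate from the kill step makes it possible to review the list of opids before anything is terminated.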
7 votes -
Properly Formatted Prometheus Integration (mongodb_ metrics)
The (mongodb*) metrics collected by the integration lack the metric type, and the description is extremely vague (mongodbcatalogStats_views has "catalogStats." as description). It would be easier to set up dashboards and queries if the type (e.g., gauge, counter) were properly set and each metric provided a proper description.
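For reference, the Prometheus text exposition format carries the type and description in `# TYPE` and `# HELP` comment lines ahead of the samples. The following is a minimal Python sketch of what a fully annotated metric could look like. The function name `render_metric`, the metric name `example_catalog_views`, its description, and the label are all hypothetical and are not taken from the Atlas integration.

```python
from typing import Mapping, Optional

# Only the simple single-sample types are handled in this sketch.
SIMPLE_TYPES = {"counter", "gauge", "untyped"}


def render_metric(
    name: str,
    metric_type: str,
    help_text: str,
    value: float,
    labels: Optional[Mapping[str, str]] = None,
) -> str:
    """Render one sample with HELP and TYPE lines in the text exposition format."""
    if metric_type not in SIMPLE_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    # HELP text escapes backslashes and line feeds.
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    label_str = ""
    if labels:
        pairs = []
        for key, val in sorted(labels.items()):
            # Label values additionally escape double quotes.
            escaped = val.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            pairs.append(f'{key}="{escaped}"')
        label_str = "{" + ",".join(pairs) + "}"
    lines = [
        f"# HELP {name} {help_escaped}",
        f"# TYPE {name} {metric_type}",
        f"{name}{label_str} {value}",
    ]
    return "\n".join(lines) + "\n"


print(render_metric(
    "example_catalog_views",
    "gauge",
    "Number of views in the catalog.",
    3,
    {"cluster": "demo"},
), end="")
```

With the `# TYPE` line present, dashboards and query tools can tell a gauge from a counter without guessing from the metric name.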
1 vote -
Prometheus database and collection metrics
We checked the Prometheus metrics provided by MongoDB Atlas and didn't find metrics for the following:
Database size
Collection storage size
Records per collection
Indexes per collection
Index size
We would like to have these kinds of metrics to add to dashboards.
25 votes -
Add document count to Datadog metrics
We'd like to monitor the number of documents in a collection via Datadog.
For On-Premise MongoDB, the stats are already reported via mongodb.collection.count, mongodb.collection.size and mongodb.collection.avgobjsize.
If the same metrics could be made available for Atlas (e.g. mongodb.atlas.stats.collection.count), that would really help in monitoring.
For example, spikes in different parts of the application could be tied to the number of documents at a glance. Without that metric available, it is hard to pinpoint whether a recent change had a negative impact on performance.
If the metrics can't be made available on the collection but only on the database level, this…
8 votes -
Change Streams Monitoring and Alerting
Change streams can cause performance issues if not used properly. In some cases, administrators of multi-tenant dbs have no control (and shouldn't) over how various clients create change streams.
I think it is important that we accommodate these use-cases and provide useful metrics in the OM/Atlas metrics pages, and alerts on those metrics. Some potential metrics:
1. Number of change streams open
2. Average change stream lifetime
3. Query targeting ratios for change streams
4. Avg time between consecutive polls of the change stream (and other statistics)
--thought here is that change streams that are polled infrequently will result in…
9 votes -
Metric Grouping
A huge improvement when it comes to metrics would be the ability to query by grouping (e.g. by database access user). That way, if you used a specific database user for each service connection, you could see how much load that specific service is putting on the database.
Any form of implementation would be helpful; one example could be adding labels to the Prometheus metrics per user, replica/shard, etc.
2 votes