Change Streams Monitoring and Alerting
Change streams can cause performance issues if not used properly. In some cases, administrators of multi-tenant databases have no control (and shouldn't have) over how various clients create change streams.
I think it is important that we accommodate these use cases and provide useful metrics in the Ops Manager/Atlas metrics pages, along with alerts on those metrics. Some potential metrics:
1. Number of change streams open
2. Average change stream lifetime
3. Query targeting ratios for change streams
4. Average time between consecutive polls of the change stream (and other statistics)
-- the thought here is that change streams that are polled infrequently will result in less performant reads against the oplog
5. Number of documents read from change streams
6. Difference between the timestamp of the most recently consumed change stream event and the end of the oplog
7. Difference between the timestamp of the most recently consumed change stream event and the beginning of the oplog
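To make items 6 and 7 concrete, here is a rough sketch of the arithmetic, assuming all three timestamps are available as seconds since the epoch. The names OplogWindow, stream_position and min_headroom_seconds are hypothetical and only for illustration; nothing here is an existing server or Ops Manager API.

```python
from dataclasses import dataclass


@dataclass
class OplogWindow:
    """Hypothetical holder for the oplog boundaries, in seconds since epoch."""
    first_ts: int  # timestamp of the oldest entry still in the oplog
    last_ts: int   # timestamp of the newest entry in the oplog


def stream_position(last_consumed_ts: int, window: OplogWindow,
                    min_headroom_seconds: int = 0) -> dict:
    """Compute the two proposed differences for one change stream.

    lag_seconds      -> item 6: how far the stream is behind the end of the oplog
    headroom_seconds -> item 7: how far the stream is ahead of the beginning of
                        the oplog, i.e. how close it is to rolling off
    """
    lag = window.last_ts - last_consumed_ts
    headroom = last_consumed_ts - window.first_ts
    return {
        "lag_seconds": lag,
        "headroom_seconds": headroom,
        # Assumed alert rule: flag the stream once headroom drops to the threshold.
        "at_risk": headroom <= min_headroom_seconds,
    }


window = OplogWindow(first_ts=1000, last_ts=5000)
position = stream_position(last_consumed_ts=4000, window=window)
```

An alert could fire on a large lag_seconds (slow consumer) or a small headroom_seconds (the stream is about to lose its resume point); the threshold is an assumption and would need tuning per deployment.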
I realize that some of these are probably unrealistic to implement once the details are considered, but I'm interested in any useful metrics we can add regarding change streams. Currently the only way to retrieve some of this info is from the logs or via db.currentOp().
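As an illustration of what can be pieced together today, the sketch below summarizes change stream cursors from currentOp-style documents. It assumes each document carries a cursor sub-document with originatingCommand, createdDate, lastAccessDate and nDocsReturned, as I understand idle cursors to be reported; the function names are hypothetical and the exact field shape should be checked against the server version in use.

```python
from datetime import datetime


def is_change_stream(op: dict) -> bool:
    """True if the cursor's originating pipeline starts with $changeStream."""
    pipeline = (op.get("cursor", {})
                  .get("originatingCommand", {})
                  .get("pipeline", []))
    return bool(pipeline) and "$changeStream" in pipeline[0]


def summarize_change_streams(ops: list, now: datetime) -> dict:
    """Aggregate a few of the proposed metrics over currentOp-style documents."""
    cursors = [op["cursor"] for op in ops if is_change_stream(op)]
    if not cursors:
        return {"open": 0, "avg_lifetime_seconds": 0.0,
                "max_idle_seconds": 0.0, "docs_returned": 0}
    lifetimes = [(now - c["createdDate"]).total_seconds() for c in cursors]
    idle = [(now - c["lastAccessDate"]).total_seconds() for c in cursors]
    return {
        "open": len(cursors),                                    # item 1
        "avg_lifetime_seconds": sum(lifetimes) / len(cursors),   # item 2
        "max_idle_seconds": max(idle),                           # rough proxy for item 4
        "docs_returned": sum(c.get("nDocsReturned", 0) for c in cursors),  # item 5
    }


# Hand-written sample documents, not real server output.
now = datetime(2024, 1, 1, 0, 10, 0)
sample_ops = [
    {"cursor": {"originatingCommand": {"pipeline": [{"$changeStream": {}}]},
                "createdDate": datetime(2024, 1, 1, 0, 0, 0),
                "lastAccessDate": datetime(2024, 1, 1, 0, 9, 0),
                "nDocsReturned": 50}},
    {"cursor": {"originatingCommand": {"pipeline": [{"$changeStream": {}}]},
                "createdDate": datetime(2024, 1, 1, 0, 5, 0),
                "lastAccessDate": datetime(2024, 1, 1, 0, 5, 0),
                "nDocsReturned": 0}},
    {"cursor": {"originatingCommand": {"pipeline": [{"$match": {}}]},
                "createdDate": datetime(2024, 1, 1, 0, 1, 0),
                "lastAccessDate": datetime(2024, 1, 1, 0, 2, 0),
                "nDocsReturned": 7}},
]
summary = summarize_change_streams(sample_ops, now)
```

In practice the input would come from something like the $currentOp aggregation stage with idle cursors included. This only gives a point-in-time snapshot, which is part of why first-class metrics would be more useful.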
Such metrics will assist in managing and monitoring the change streams. An immediate use case is for Ops Manager admins to pre-empt any rogue change streams that may affect the MongoDB cluster operation.