Improved Metrics | Memory usage by collection
The Profiler is a great tool, but it does have limitations. Its metrics don't really tell us what's driving I/O. For example, we might have these two queries:
- a query triggered by active users that scans 10,000 documents, where the scanned documents are the same for every user
- a query triggered by active users that scans 100 documents, where the scanned documents are different for every user
If you have thousands or more active users, the working set (and therefore memory usage and I/O) is driven by the second query. It probably looks better than the first query in the Profiler, but it will ultimately be the one slowing the database down.
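A quick back-of-the-envelope sketch of that working-set math, assuming a hypothetical 2,000 active users and equally sized documents:

```python
# Back-of-the-envelope working-set comparison (illustrative numbers only).
ACTIVE_USERS = 2_000  # hypothetical active-user count, not from real data

# Query 1: scans 10,000 documents that are identical for every user,
# so the cache only ever needs one copy of those documents.
query1_working_set_docs = 10_000

# Query 2: scans 100 documents that are different for every user,
# so each active user's documents add to the working set.
query2_working_set_docs = 100 * ACTIVE_USERS

print(query1_working_set_docs)  # 10000
print(query2_working_set_docs)  # 200000
```

Even though query 2 touches 100x fewer documents per execution, its contribution to the working set is 20x larger here, and it grows linearly with active users.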
It'd be great if there were a way to see memory usage by collection. I think some really telling metrics would be:
- A periodic snapshot of pages in memory, broken down by collection/index
- Counts of page loads/evictions, broken down by collection/index
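Some of this can be approximated today from the per-collection WiredTiger counters that `collStats` already exposes. A minimal sketch, assuming the stock `wiredTiger.cache` field names from a `collStats` result (the sample values below are made up; on a live server you'd fetch the real document with something like `db.command("collStats", "mycoll")` via PyMongo):

```python
def cache_summary(coll_stats):
    """Extract the cache-related counters from a collStats result."""
    cache = coll_stats["wiredTiger"]["cache"]
    return {
        "bytes_in_cache": cache["bytes currently in the cache"],
        "pages_read_into_cache": cache["pages read into cache"],
        # WiredTiger reports modified and unmodified evictions separately.
        "pages_evicted": (cache["unmodified pages evicted"]
                          + cache["modified pages evicted"]),
    }

# Illustrative sample shaped like a real collStats document (values invented).
sample_stats = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 52_428_800,
            "pages read into cache": 1_200,
            "unmodified pages evicted": 300,
            "modified pages evicted": 45,
        }
    }
}

summary = cache_summary(sample_stats)
print(summary["bytes_in_cache"])  # 52428800
print(summary["pages_evicted"])   # 345
```

Note these counters are cumulative since the collection's file was opened, so building the "periodic snapshot" above would mean sampling each collection on an interval and diffing successive readings.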
My intuition from seeing high I/O CPU is that our working set has grown too large, and we need to figure out what's driving the usage. The Profiler is a decent proxy (page loads are slow, so they show up as slow queries), it's just not a direct way to measure it.