Run aggregation pipelines on dedicated resources
When dealing with large collections(my usecase is 2billion+) it would make sense to run these aggregation's on dedicated resources like Azure Functions(my guess is that currently the aggregation pipelines are run(and somehow throttled) on the same resources as the cluster runs.
Not sure how much are the aggregations internally tied to how the data is stored(maybe they are somehow taking advantage of that) but if its straight up pipe with filters and aggregations on top of it, this would be a huge step to allow using Mongo as Big Data Platform, since it would allow reasonable scaling of the pipeline execution.
-
Hi Jan,
The great thing about aggregations is that you're pushing down to the database engine to do the heavy lifting. However where you wish to have workload isolation so that your operational workload is not fighting for resources with your analytical/heavy aggregations, we recommend you explore Atlas Analytics Nodes -- these are special replicas that you target for isolated queries using a read preference tag. Learn more here: https://docs.atlas.mongodb.com/workload-isolation/#analytics-nodes-for-workload-isolation
Cheers
-Andrew