Allow configuration of 100mb memory limit per aggregation pipeline stage
In this old thread from 2016 (https://groups.google.com/forum/#!topic/mongodb-user/LCeFZZRz5EY) it was asked whether there was a way to increase the 100mb in memory limit of each stage of an aggregation pipeline. The responses centered around two points:
- If too much memory is used per aggregation pipeline stage then it will reduce performance for the overall MongoDB database, impacting other queries negatively.
- You can set allowDiskUse: true and revert to performing these pipeline stages on disk when they exceed 100mb.
I believe this subject needs to be revisited for the following reasons:
- “Too much memory” is very subjective, and the 100mb number is arbitrary. The hardware for a typical MongoDB database setup in 2016 is not the same as in 2020. And what is the MongoDB use case for one company isn’t the use case for another company. Company A might have top of the line hardware they are running MongoDB on and only have a few concurrent users, and need large in memory aggregation queries to run fast so want to set a limit of 200mb or even higher. Company B might have lesser hardware and many concurrent users, and want to set the limit to 50mb or less.
- The allowDiskUse: true option is nice to have, but it obviously comes at a huge performance hit once your aggregation pipeline stages exceed the 100mb limit, even on high performance SSD drives.
This really comes down to who knows best what the limit should be, the company’s developers and database / server ops teams or the MongoDB developers? The MongoDB developers obviously have much more knowledge on the inner workings of Mongo, but that advice can still be taken into account by having a default limit of 100mb. I believe very strongly that today in 2020 this limit should be configurable so that we can customize the behavior for our own specific situation. The documentation can of course provide a warning that changing the setting can impact performance (although everyone already knows that.)
If you want to get into the details of why my company needs to increase this limit, and why it has been holding us back from switching from SQL Server to MongoDB for the last year (despite spending 6 months developing a working Mongo solution) then we can get into specifics. But really that will distract away from the central issue, which is that different customers have different scenarios, and a “one size fits all” solution isn’t appropriate and needs to be customizable.
There's already an internal parameter that is available to change this value but I want to make it clear that the 100MBs limit only applies to in-memory sorts and in-memory groups (which are basically like in-memory sorts). To make aggregations faster, making sure that large sorts are supported by an index or can be avoided entirely would be preferable to changing this parameter limit.
More discussion of the limit is here: https://developer.mongodb.com/community/forums/t/mongodb-meaning-of-pipeline-aggregation-100mb-limit/99553/2