Implement $bucket and $group on indexed values with sub-linear runtime

We noticed that sum $bucket and $group aggregations such as $min, $max, $count are unexpectedly slow even when fully covered by an index, (partially) because the DB scans through the entire index rather than employing optimization approaches such as binary search.

An example pipeline that should return instantaneous but scans through the entire index (confirmed on v4.4 and v5):
[
{
$match: {
status: "DELIVERED",
},
},
{
$group: {
id: {
status: "$status",
},
min: {
$min: "$modifytime",
},
},
},
]
with an index { status: 1, modify_time: 1}

Another example is $bucket (same index):
[
{
$match: {
status: "DELIVERED",
},
},
{
$bucket: {
groupBy: "$creation_ts",
boundaries: [
Date(0),
ISODate("2020-01-01T00:00:30Z"),
ISODate("2022-01-01T00:00:30Z"),
],
default: "default",
output: {
count: {
$sum: 1,
},
status: {
$first: "$status",
},
},
},
},
]

6 votes

Ferdinand shared this idea · Jan 9, 2023 · Report… · Admin →

An error occurred while saving the comment

Database: Performance

Give feedback

How can we improve the MongoDB Database?

Implement $bucket and $group on indexed values with sub-linear runtime

Feedback

Database: Performance

Feedback and Knowledge Base

Searching…

Give feedback

Implement $bucket and $group on indexed values with sub-linear runtime

We're glad you're here

We're glad you're here

We're glad you're here

We're glad you're here

We're glad you're here

Database: Performance

Categories

Searching…

Give feedback