Implement $bucket and $group on indexed values with sub-linear runtime
We noticed that sum $bucket and $group aggregations such as $min, $max, $count are unexpectedly slow even when fully covered by an index, (partially) because the DB scans through the entire index rather than employing optimization approaches such as binary search.
An example pipeline that should return instantaneous but scans through the entire index (confirmed on v4.4 and v5):
[
{
$match: {
status: "DELIVERED",
},
},
{
$group: {
id: {
status: "$status",
},
min: {
$min: "$modifytime",
},
},
},
]
with an index { status: 1, modify_time: 1}
Another example is $bucket (same index):
[
{
$match: {
status: "DELIVERED",
},
},
{
$bucket: {
groupBy: "$creation_ts",
boundaries: [
Date(0),
ISODate("2020-01-01T00:00:30Z"),
ISODate("2022-01-01T00:00:30Z"),
],
default: "default",
output: {
count: {
$sum: 1,
},
status: {
$first: "$status",
},
},
},
},
]