Database

MongoDB Feedback Engine

To report bugs, please use our SERVER JIRA project.

How can we improve the MongoDB Database?

It should be possible to configure the profiling output destination

When database profiling is enabled, its output is sent to both the system.profile collection and the system logs. Logging to a capped collection is fine, but flooding the on-disk logs is not.

We need to react quickly to changing query patterns, so we have profiling enabled in production on a busy system and analyze the system.profile collection in real time. This works well and the performance hit is acceptable, but our on-disk system logs grow very large.

Please make it possible to configure whether profiler output goes to the on-disk log, the collection, or both, ideally with a separate level for each.

9 votes

0 comments · Performance

Implement $bucket and $group on indexed values with sub-linear runtime

We noticed that some $bucket and $group aggregations, with accumulators such as $min, $max, and $count, are unexpectedly slow even when fully covered by an index, at least partly because the database scans the entire index instead of using an optimization such as binary search.

An example pipeline that should return instantly but scans the entire index (confirmed on v4.4 and v5):
[
  {
    $match: {
      status: "DELIVERED",
    },
  },
  {
    $group: {
      _id: {
        status: "$status",
      },
      min: {
        $min: "$modify_time",
      },
    },
  },
]
with the index { status: 1, modify_time: 1 }
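The following minimal Python sketch illustrates why a sub-linear plan is possible for this query shape. It models the index { status: 1, modify_time: 1 } as a sorted list of (status, modify_time) keys, which is a simplification (a real storage engine uses B-trees), and the names INDEX, status_range, group_min_max, and group_count are hypothetical. Because all keys for one status are contiguous, two binary searches locate that range: the $min is its first key, the $max is its last key, and the count is its length.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for the index { status: 1, modify_time: 1 }:
# (status, modify_time) keys in ascending order, as an index would store them.
# modify_time is a plain number here to keep the sketch short.
INDEX = sorted([
    ("CANCELLED", 5),
    ("DELIVERED", 42),
    ("DELIVERED", 17),
    ("DELIVERED", 99),
    ("PENDING", 3),
])

def status_range(keys, status):
    """Return [lo, hi) covering every key with this status, using two O(log n) seeks."""
    lo = bisect_left(keys, (status,))            # (status,) sorts before any (status, x)
    hi = bisect_right(keys, (status, math.inf))  # (status, inf) sorts after any (status, x)
    return lo, hi

def group_min_max(keys, status):
    """$min and $max of modify_time for one status, read from the range ends only."""
    lo, hi = status_range(keys, status)
    if lo == hi:
        return None  # no documents with this status
    return keys[lo][1], keys[hi - 1][1]

def group_count(keys, status):
    """Number of keys for one status, computed from positions instead of a scan."""
    lo, hi = status_range(keys, status)
    return hi - lo
```

For example, group_min_max(INDEX, "DELIVERED") returns (17, 99) after touching only the two ends of the range. A real server would also have to deal with mixed BSON types and multikey indexes, which this sketch ignores.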

Another example is $bucket (same index):
[
  {
    $match: {
      status: "DELIVERED",
    },
  },
  {
    $bucket: {
      groupBy: "$creation_ts",
      boundaries: [
        new Date(0),
        ISODate("2020-01-01T00:00:30Z"),
        ISODate("2022-01-01T00:00:30Z"),
      ],
      default: "default",
      output: {
        count: {
          $sum: 1,
        },
        status: {
          $first: "$status",
        },
      },
    },
  },
]
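$bucket could in principle be answered the same way if the bucketed field were the second key of the index. The example groups by creation_ts, which the { status: 1, modify_time: 1 } index does not contain, so this sketch assumes a hypothetical { status: 1, creation_ts: 1 } index instead, modeled as a sorted Python list. One binary search per boundary gives the position where each bucket starts; the per-bucket counts are differences between neighboring positions, and documents outside the boundaries go to the default bucket, as $bucket specifies. Only the count output is modeled, and the names TS_INDEX and bucket_counts are illustrative.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone

UTC = timezone.utc
MAX_TS = datetime.max.replace(tzinfo=UTC)  # sorts after every real timestamp

def ts(*args):
    return datetime(*args, tzinfo=UTC)

# Hypothetical index { status: 1, creation_ts: 1 }, modeled as sorted keys.
TS_INDEX = sorted([
    ("DELIVERED", ts(2019, 6, 1)),
    ("DELIVERED", ts(2020, 3, 1)),
    ("DELIVERED", ts(2021, 12, 31)),
    ("DELIVERED", ts(2023, 1, 1)),
    ("PENDING", ts(2020, 5, 5)),
])

# Same boundaries as the pipeline above.
BOUNDARIES = [ts(1970, 1, 1), ts(2020, 1, 1, 0, 0, 30), ts(2022, 1, 1, 0, 0, 30)]

def bucket_counts(keys, status, boundaries):
    """Per-bucket counts for one status using len(boundaries) + 2 binary searches.

    Bucket i holds boundaries[i] <= ts < boundaries[i + 1] and is keyed by its
    lower bound; values outside [boundaries[0], boundaries[-1]) go to "default".
    """
    lo = bisect_left(keys, (status,))
    hi = bisect_right(keys, (status, MAX_TS))
    cuts = [bisect_left(keys, (status, b), lo, hi) for b in boundaries]
    counts = {boundaries[i]: cuts[i + 1] - cuts[i] for i in range(len(boundaries) - 1)}
    counts["default"] = (cuts[0] - lo) + (hi - cuts[-1])
    return counts
```

The work grows with the number of boundaries and the logarithm of the index size, not with the number of matching documents.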

6 votes

0 comments · Performance

Allow NVMe clusters to autoscale up based on CPU and memory

Provide autoscaling for MongoDB NVMe clusters based on CPU utilization and memory metrics. This feature would automatically provision additional resources when predefined thresholds are reached, maintaining performance during usage spikes without manual intervention.

Currently, MongoDB Atlas NVMe-based clusters require manual vertical scaling, which leads to:
- Performance degradation during unexpected high-load periods
- Inefficient resource allocation requiring constant monitoring
- Potential application downtime during scaling operations
- DevOps overhead for continuous capacity planning
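The requested behavior amounts to a threshold rule evaluated over recent metrics. Below is a minimal Python sketch of such a rule; the tier names, the thresholds (75% CPU, 80% memory), and the three-sample window are hypothetical values chosen for illustration, not Atlas settings. Requiring the threshold to be breached for several consecutive samples keeps a single spike from triggering a resize.

```python
from collections import deque

class ScaleUpPolicy:
    """Hypothetical scale-up rule: move up one tier when CPU or memory
    utilization stays above its threshold for `window` consecutive samples."""

    def __init__(self, tiers, cpu_threshold=0.75, mem_threshold=0.80, window=3):
        self.tiers = tiers                  # ordered smallest to largest (illustrative names)
        self.current = 0                    # index of the tier in use
        self.cpu_threshold = cpu_threshold  # hypothetical default
        self.mem_threshold = mem_threshold  # hypothetical default
        self.recent = deque(maxlen=window)  # True where a sample breached a threshold

    def observe(self, cpu, mem):
        """Record one sample (fractions 0.0 to 1.0); return the new tier if scaling up, else None."""
        self.recent.append(cpu > self.cpu_threshold or mem > self.mem_threshold)
        sustained = len(self.recent) == self.recent.maxlen and all(self.recent)
        if sustained and self.current + 1 < len(self.tiers):
            self.current += 1
            self.recent.clear()  # require a fresh window of evidence on the new tier
            return self.tiers[self.current]
        return None
```

A production version would also need a cooldown period and a scale-down rule; both are omitted to keep the sketch focused on the trigger.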

3 votes

0 comments · Performance

Add support for all the join types Postgres has, and improve join performance

$lookup is a performance killer. Joins are a crucial part of every OLTP system. $lookup is the equivalent of a SQL join, but it is slow and does not support hash joins or the other efficient join algorithms that Postgres, for example, implements.

If MongoDB does not add support, it seems its database will fall behind Postgres.
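For reference, a $lookup with localField and foreignField behaves like an equality left outer join: every input document gets an array, possibly empty, of matching foreign documents. A hash join builds a hash table over the foreign side once and then probes it for each input document, so the cost grows roughly with the sum of the two input sizes rather than their product. The minimal in-memory Python sketch below shows that technique on plain dicts; it is not MongoDB's implementation, handles scalar join keys only, and does not model $lookup's matching against array-valued fields. The function name lookup_hash_join is hypothetical.

```python
from collections import defaultdict

def lookup_hash_join(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Hash join with $lookup-style output: one array of matches per local document.

    Missing fields are treated as None on both sides, loosely mirroring how
    $lookup treats a missing field as null. Join keys must be hashable scalars.
    """
    # Build phase: a single pass over the foreign side, bucketed by join key.
    table = defaultdict(list)
    for doc in foreign_docs:
        table[doc.get(foreign_field)].append(doc)

    # Probe phase: one average O(1) hash lookup per local document.
    out = []
    for doc in local_docs:
        matches = table.get(doc.get(local_field), [])  # .get does not insert new keys
        out.append({**doc, as_field: list(matches)})
    return out
```

Usage: lookup_hash_join(orders, inventory, "item", "sku", "stock") attaches to each order the inventory documents whose sku equals its item, which is the shape a $lookup on those fields produces.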

3 votes

1 comment · Performance

Provide a dictionary for matched/modified/deleted counts for bulk writes

Current bulk write operations provide only aggregated counts that look like this:
{ acknowledged: true, insertedCount: 0, insertedIds: {}, matchedCount: 2, modifiedCount: 2, deletedCount: 0, upsertedCount: 0, upsertedIds: {} }
The server processes each of these operations individually, even when they are unordered, which means the current MongoDB code aggregates the individual counts to produce the structure above.

This aggregation, however, makes it impossible to use bulk writes the way single updates and deletes are used (where the caller can check the outcome of each operation), so mass updates that need per-operation outcomes are significantly slower.

Consider updates of versioned documents: one can issue 100 updates, each with the expected version as part of its filter. If updates were tracked the same way inserts and upserts are (by operation index rather than by document ID, which would be overkill), the bulk result object would contain a dictionary keyed by bulk operation index, with the number of matched and modified (or deleted) documents as the value.

This way, callers could tell that, say, 90 of the 100 versioned updates succeeded and 10 did not, and then use a client-side mapping from bulk operations back to the original structures to determine which ones failed, so the failed documents could be reported as needed (for example, as stale versions).
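A minimal Python sketch of the client side of this proposal. The per-operation dictionary it consumes, {operation_index: {"matched": ..., "modified": ...}}, is the hypothetical result shape requested in this idea, not part of the current bulk write result shown above; VersionedUpdate and stale_updates are likewise illustrative names.

```python
from dataclasses import dataclass

@dataclass
class VersionedUpdate:
    """One bulk operation; the filter would be {"_id": doc_id, "version": expected_version}."""
    doc_id: str
    expected_version: int

def stale_updates(ops, per_op_counts):
    """Return the operations that matched no document.

    ops is the list in the order it was sent, so list positions equal bulk
    operation indexes. per_op_counts is the HYPOTHETICAL per-operation result
    {operation_index: {"matched": int, "modified": int}}; an index missing from
    it is treated as "matched nothing".
    """
    return [
        op
        for i, op in enumerate(ops)
        if per_op_counts.get(i, {"matched": 0})["matched"] == 0
    ]
```

With such a dictionary, reporting stale versions becomes a single pass over the results instead of one round trip per update.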

1 vote

0 comments · Performance

Execute $group on shardPart instead of mergerPart

This is essentially what is described here: https://www.mongodb.com/community/forums/t/how-to-enforce-mongodb-to-execute-group-on-shardpart-of-the-execution-plan/267560

When running covered count queries that could be aggregated independently on each shard, there is still a lot of overhead because the shards have to send the documents (with their _id) to mongos.

This appears to happen for count queries that use a $limit stage, but not for count queries without one.
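The split this idea asks for is the classic two-phase aggregation: each shard reduces its own documents to small partial results, and the merger only combines them. A minimal Python sketch for a count per group key follows; shard_part and merger_part mirror the shardPart/mergerPart terminology but are hypothetical functions, and documents are plain dicts.

```python
from collections import Counter

def shard_part(docs, key):
    """Runs on each shard: reduce local documents to {group_key: count}."""
    return Counter(doc[key] for doc in docs)

def merger_part(partials):
    """Runs on mongos: add up the per-shard partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

Only one small Counter per shard crosses the network instead of every matching document. One caveat, offered as a plausible explanation rather than a confirmed one: when a $limit precedes the $group, the limit applies to the combined stream from all shards, so counting each shard's documents independently would not in general give the same answer, which may be why the $group stays on the merger in that case.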

1 vote

0 comments · Performance

Parallelize $unionWith

Today the $unionWith aggregation stage is executed sequentially: for example, collection A is queried first, then collection B, and only then are the results combined.
The process should be parallelized so that the sub-queries run concurrently while the union is done as a best-effort tree merge, to reduce the overall elapsed time of the query.
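A minimal Python sketch of the idea, using threads to stand in for the per-collection sub-queries. run_query is a hypothetical placeholder that sleeps to mimic I/O latency. The results are concatenated in source order, so the output matches sequential execution while the elapsed time approaches that of the slowest sub-query rather than the sum of all of them. The tree merge mentioned above is not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(rows, delay):
    """Hypothetical stand-in for reading one collection; sleeps to mimic I/O latency."""
    time.sleep(delay)
    return list(rows)

def parallel_union(sources):
    """Run every (rows, delay) sub-query concurrently, then concatenate in source order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(run_query, rows, delay) for rows, delay in sources]
        results = [f.result() for f in futures]  # blocks until each sub-query finishes
    merged = []
    for part in results:
        merged.extend(part)
    return merged
```

With two sub-queries of 0.2 s each, the sequential version takes about 0.4 s and this version about 0.2 s.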

1 vote

0 comments · Performance

An aggregation stage that loads only the requested fields into DRAM

When we use a $project stage, the whole document is loaded from disk into memory (if it is not already in the working set). Because of this, when we design a data model, we have to create a separate collection for fields that are not needed for frequent data access. Creating a view is an option, but what if $project itself (or $project with some argument), or a new pipeline stage or operator, fetched only the specified fields from disk instead of loading the whole document?

With memory-mapped files, retrieving only the specified fields would not be straightforward, but it is a possibility that could be really useful in some cases.

1 vote

0 comments · Performance

Avoid truncating the query on the Atlas profiler or system.profile collection

Slow-running queries captured in the system.profile collection or on the Atlas profiler page are truncated if the query is too long. As an application DBA, it is difficult to analyze a query without seeing the actual query. The current limit for the command document is 50 KB. Please consider relaxing this limit so that queries are not truncated.

1 vote

0 comments · Performance

Throttle sessions that use too many resources

We have different types of applications:
1. Writers, which load data into MongoDB from different data sources
2. Readers, which read data and display it to end users

Normally there is a strict SLA for readers, but no SLA (or a less strict one) for writers. We want to make sure that writers do not impact readers when, for some reason, a lot of data arrives from external sources, so we would like to slow down writers for the sake of readers.

Writers can saturate CPU and I/O, which is why we want an option to leave some headroom for readers. This is normally hard to enforce at the application level, because the applications can be deployed on many different hosts and may not communicate with each other.
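One common way to express "slow down writers, leave room for readers" is a token bucket per workload class. The minimal Python sketch below models that technique; it is not an existing MongoDB feature, and the class name, the rate and capacity values, and the injectable clock parameter are all hypothetical.

```python
import time

class TokenBucket:
    """Hypothetical writer throttle: about `rate` operations per second,
    with bursts of up to `capacity` operations."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.clock = clock          # injectable so the behavior can be tested
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return False when the caller should back off."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Readers would bypass the bucket entirely, while each writer session would call try_acquire() before a write and delay when it returns False. Enforcing this on the server, as the idea asks, is what would let writers on independent hosts share one budget.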

1 vote

0 comments · Performance

Improve sorting performance

Sorting always ends up doing a collection scan when the index selected for the find/match does not meet the sort requirement. The sort makes performance 15-25 times worse for a "matched" dataset of tens of thousands (not millions) of documents.

1 vote

0 comments · Performance

Build MongoDB with PGO

I would like to see support for PGO (profile-guided optimization), and even LLVM BOLT, upstream. It would be great if MongoDB distributed PGO-optimized binaries, so users could get an additional performance boost "for free". At the very least, the documentation could describe how users can apply PGO to get a boost for their own workloads.

1 vote

1 comment · Performance

Boost the performance of bioinformatic annotation queries

The documents to be selected look something like this:

{
  "_id": {
    "$oid": "6272c580d4400d8cb10d5406"
  },
  "#CHROM": 1,
  "POS": 286747,
  "ID": "rs369556846",
  "REF": "A",
  "ALT": "G",
  "QUAL": ".",
  "FILTER": ".",
  "INFO": [
    {
      "RS": 369556846,
      "RSPOS": 286747,
      "dbSNPBuildID": 138,
      "SSR": 0,
      "SAO": 0,
      "VP": "0x050100000005150026000100",
      "WGT": 1,
      "VC": "SNV",
      "CAF": [
        { "$numberDecimal": "0.9381" },
        { "$numberDecimal": "0.0619" }
      ],
      "COMMON": 1,
      "TOPMED": [
        { "$numberDecimal": "0.88411856523955147" },
        { "$numberDecimal": "0.11588143476044852" }
      ]
    },
    ["SLO", "ASP", "VLD", "G5", "KGPhase3"]
  ]
}

For a basic annotation scenario (https://en.wikipedia.org/wiki/SNP_annotation), we need a query like this:

{'ID': {'$in': ['rs369556846', 'rs2185539', 'rs2519062', 'rs149363311', 'rs55745762', <...>]}}

where <...> stands for hundreds or thousands of values.

This query runs in a few seconds.

More complex annotation queries:

{'$or': [{'#CHROM': 1, 'POS': 1499125}, {'#CHROM': 1, 'POS': 1680158}, {'#CHROM': 1, 'POS': 1749174}, {'#CHROM': 1, 'POS': 3061224}, {'#CHROM': 1, 'POS': 3589337}, <...>]}

{'$or': [{'ID': 'rs149434212', 'REF': 'C', 'ALT': 'T'}, {'ID': 'rs72901712', 'REF': 'G', 'ALT': 'A'}, {'ID': 'rs145474533', 'REF': 'G', 'ALT': 'C'}, {'ID': 'rs12096573', 'REF': 'G', 'ALT': 'T'}, {'ID': 'rs10909978', 'REF': 'G', 'ALT': 'A'}, <...>]}

Even though they use an IXSCAN, these queries run for many hours.

Please test the queries above thoroughly and improve their execution performance. This will help science!
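While the server-side work is pending, one client-side rewrite that may be worth benchmarking is collapsing the (#CHROM, POS) $or into one $in per chromosome. The rewritten filter matches the same documents as the original $or, but whether it runs faster depends on the query planner and on the available indexes (a compound index on { "#CHROM": 1, "POS": 1 } is assumed here); it has not been measured on this data set. The function name or_pairs_to_in is hypothetical, and the ID/REF/ALT query cannot be collapsed this way because REF and ALT must stay paired with each ID.

```python
from collections import defaultdict

def or_pairs_to_in(pairs):
    """Turn [(chrom, pos), ...] into a filter with one POS $in list per chromosome.

    Matches the same documents as
    {'$or': [{'#CHROM': chrom, 'POS': pos}, ...]} built from the same pairs.
    """
    if not pairs:
        # MongoDB rejects $or with an empty array, so refuse to build one.
        raise ValueError("need at least one (chrom, pos) pair")
    positions = defaultdict(set)  # chromosome -> distinct positions
    for chrom, pos in pairs:
        positions[chrom].add(pos)
    clauses = [
        {'#CHROM': chrom, 'POS': {'$in': sorted(ps)}}
        for chrom, ps in positions.items()
    ]
    # A single chromosome needs no $or at all.
    return clauses[0] if len(clauses) == 1 else {'$or': clauses}
```

The resulting dict can be passed as the filter to a pymongo find() call in place of the long $or.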

1 vote

0 comments · Performance

There is a specific collection that needs more performance than the others. Is there a way to assign more RAM (memory) to a specific collection?


1 vote

0 comments · Performance
