Data Federation and Data Lake

How can we improve Data Federation and Data Lake?

Add support for cross region Private Endpoints in AWS

Private Endpoints for Data Federation currently support only a limited set of AWS regions, so I would like them to support cross-region endpoint usage.

5 votes

1 comment · Connectors

Add Support for Private Endpoints on US East 2 (Ohio) for AWS

Since MongoDB Atlas Private Endpoints don't support cross-region endpoints, I would like Private Endpoints for Data Federation to be available in the region where I operate.

3 votes

0 comments · Connectors

Support Azure Data Federation private endpoint

Now that Azure Blob Storage is supported for Data Federation, it would be great to have a private endpoint connection to the storage account.

21 votes

started · 3 comments · Connectors

Support for Iceberg Partitions

Apache Iceberg uses URL-encoded partition paths; see:

https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L218

Atlas Data Federation S3/Parquet does not currently support URL-encoded partitions, which is a potential blocker to using Data Federation with Iceberg.
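
To make the issue concrete, here is a minimal sketch of what a URL-encoded partition path can look like. The helper names, field names and values are hypothetical, and `encodeURIComponent` is only a stand-in: Iceberg is written in Java, and Java's `URLEncoder` treats a few characters differently (for example, it writes a space as `+`).

```javascript
// Hypothetical helper: builds one "name=value" partition segment with both
// parts URL-encoded. encodeURIComponent stands in for Iceberg's Java encoder.
function partitionSegment(name, value) {
  return `${encodeURIComponent(name)}=${encodeURIComponent(String(value))}`;
}

// Joins the encoded segments into a directory-style partition path.
function partitionPath(fields) {
  return fields.map(([name, value]) => partitionSegment(name, value)).join("/");
}

// Illustrative field names and values only.
const examplePath = partitionPath([
  ["event_ts_hour", "2024-05-13T12:00"],
  ["source", "web/app"],
]);

console.log(examplePath);
// → event_ts_hour=2024-05-13T12%3A00/source=web%2Fapp
```

A reader that matches partition values against the raw path text would see `web%2Fapp` rather than `web/app`, so it would need to decode each segment (for example with `decodeURIComponent`) before comparing.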

2 votes

0 comments · File Formats

Add eu-west-3 as an option for AWS private endpoint

Currently, Data Lake doesn't support setting up a private endpoint in eu-west-3 (Paris, France). It would be great to support eu-west-3 as well.

18 votes

2 comments · Infrastructure Options

Support scoped permissions to online archive

Currently, it is not possible to add a MongoDB user with "scoped" permissions to an Online Archive instance. This should be supported for tighter access control.

Details here: https://support.mongodb.com/case/01376416

1 vote

0 comments

Simplified JSON support for $out to S3

The ability to $out to S3 from a federated database instance is a game-changer for those working with their own data warehouses and data lakes.

One improvement would be to support simplified JSON for JSON exports. Currently, $out uses Extended JSON v2, which may not be compatible with systems that read from the destination S3 bucket and require simplified JSON (the format used by other tools such as the Kafka source connector). Technically, it is possible to make this conversion yourself with clever use of the $toString aggregation operator in stages preceding $out. However, there are several challenges:
+ Increased computation time
+ The more general the solution needs to be (i.e., in cases where you don't know or cannot make assumptions about the schema), the more complex the aggregation stages become. One such solution would be to $objectToArray the document, $map over the resulting array, convert the v field conditionally, then $arrayToObject back and $replaceRoot to recompose the document. This is already complex enough for most MongoDB users; handling nested arrays and objects makes it vastly more complex.
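
For illustration, the general approach described above can be sketched as a single stage built in JavaScript. This is a rough sketch, not a tested recipe for Data Federation: the helper name and the list of BSON types to stringify are assumptions, and it handles top-level fields only (nested documents and arrays are left untouched, which is exactly the hard part mentioned above).

```javascript
// BSON type names (as reported by the $type operator) to convert to strings.
// This list is an assumption for illustration; adjust it to your data.
const TYPES_TO_STRINGIFY = ["objectId", "date", "decimal"];

// Hypothetical helper: returns one $replaceRoot stage that rebuilds the
// document from its top-level key/value pairs, stringifying selected types.
function simplifyTopLevelStage(types = TYPES_TO_STRINGIFY) {
  return {
    $replaceRoot: {
      newRoot: {
        $arrayToObject: {
          $map: {
            input: { $objectToArray: "$$ROOT" },
            as: "field",
            in: {
              k: "$$field.k",
              v: {
                // Convert only values whose BSON type is in the list.
                $cond: [
                  { $in: [{ $type: "$$field.v" }, types] },
                  { $toString: "$$field.v" },
                  "$$field.v",
                ],
              },
            },
          },
        },
      },
    },
  };
}

const simplifyStage = simplifyTopLevelStage();
console.log(JSON.stringify(simplifyStage, null, 2));
```

The stage would be placed immediately before the existing $out stage in the pipeline. Extending it to nested structures would require repeating the same mapping at each level, which is why built-in support would be simpler.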

6 votes

1 comment · File Formats

A GUI for setting SQL query sampling size (like in the BI Connector Atlas console)

Provide the ability to set the SQL query sampling size in the UI (as in the BI Connector Atlas console). This would allow our business customers who use Power BI or Tableau to easily set and manage sampling without having to run a CLI command (i.e., sqlGenerateSchema) whenever a new document is added to the database.

5 votes

under review · 1 comment · Query Functionality

Write a "_SUCCESS" file when data export finishes

We use MongoDB to store time-series data and export it incrementally, on a daily basis, to S3 as Parquet via Data Federation. The data is relatively large, and the export duration varies from day to day, so it is hard for downstream services to know when an export has completed. Sometimes a downstream service starts reading the Parquet files while MongoDB is still writing, which causes partial extraction. Normally, a big data job creates a flag file, such as _SUCCESS, to indicate that the job has finished writing the dataset. This file serves as a marker that all tasks associated with the job finished successfully and that the data files in the directory are complete and consistent. Could you consider adding such a feature?

// bucketName, fileName, collName, matchStage and db are assumed to be defined elsewhere.
const outStage = {
  "$out": {
    "s3": {
      "bucket": bucketName,
      "filename": fileName,
      "format": {
        "name": "parquet",
        "maxFileSize": "1GB",
        "maxRowGroupSize": "128MB"
      }
    }
  }
};
const coll = db.collection(collName);
// background: true submits the export without waiting for it to finish.
await coll.aggregate([
  matchStage,
  outStage
], { background: true }).toArray();
console.log("Job Submitted");
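
Until such a feature exists, one possible client-side workaround is to run the export in the foreground and write the marker yourself. The sketch below only illustrates the _SUCCESS convention: `runExport` and `putObject` are hypothetical callbacks supplied by the caller (not MongoDB or AWS APIs), and it assumes the foreground call returns only after all files have been written.

```javascript
// Hypothetical sketch: wait for the export, then write an empty marker object.
async function exportWithSuccessMarker({ runExport, putObject, prefix }) {
  // If the export throws, the function rejects and no marker is written.
  await runExport();
  // Strip trailing slashes so the key never contains "//".
  const markerKey = `${prefix.replace(/\/+$/, "")}/_SUCCESS`;
  // An empty body is enough: downstream jobs only check that the key exists.
  await putObject(markerKey, "");
  return markerKey;
}

// Demo with in-memory stand-ins for the real export and the real S3 upload.
const writtenKeys = [];
const demoRun = exportWithSuccessMarker({
  runExport: async () => {
    // In real use this would run the aggregation ending in $out, in the foreground.
  },
  putObject: async (key, body) => {
    writtenKeys.push(key);
  },
  prefix: "exports/2024-05-13/",
});

demoRun.then((key) => console.log("marker written:", key));
```

Downstream services would then read a directory only once its _SUCCESS key exists. This does not help when the job is submitted with background: true, since the client has no completion signal in that case.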

1 vote

0 comments · Automation

Allow a single timestamp field to be split by Year Month Day and Hour for folders instead of just one field like Year in filepath for Azure

I checked internally, and it has been confirmed that an attribute can only appear once in a template. If Atlas Data Federation (ADF) has a template like the one you are using, it wouldn't know what value to assign to StatusDatetime because it's being assigned multiple values. Unfortunately, ADF doesn't support defining a single field value across multiple segments of the path. Instead, each of those segments should be different attributes.

{
  "path": "/HistoryCollection/{StatusDatetime isodate:Year}/{StatusDatetime isodate:Month}/{StatusDatetime isodate:Day}/{StatusDatetime isodate:Hour}/{RecordSource string}/{Status string}/*",
  "storeName": "sampledatabase"
}

We would like the store we are creating as an archive to be queryable by StatusDatetime, RecordSource and Status, so that it matches the queries we run against the live collections under Federation, instead of extracting Year, Month, Day and Hour fields, which don't exist in the live collection.
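
For comparison, the currently supported shape (one distinct attribute per path segment) might look like the sketch below. The attribute names Year, Month, Day and Hour, the `int` type annotations and the helper function are assumptions for illustration; they show the extra fields that have to be derived from StatusDatetime today.

```javascript
// Hypothetical helper: derives the separate partition attributes from a
// single timestamp, using UTC so the result does not depend on local time.
function partitionAttributes(statusDatetime) {
  const d = new Date(statusDatetime);
  return {
    Year: d.getUTCFullYear(),
    Month: d.getUTCMonth() + 1, // getUTCMonth() is zero-based
    Day: d.getUTCDate(),
    Hour: d.getUTCHours(),
  };
}

// A path template in which every segment is a different attribute.
const pathTemplate =
  "/HistoryCollection/{Year int}/{Month int}/{Day int}/{Hour int}/{RecordSource string}/{Status string}/*";

console.log(partitionAttributes("2024-05-13T12:25:50Z"));
// → { Year: 2024, Month: 5, Day: 13, Hour: 12 }
```

Queries against such a store have to filter on Year, Month, Day and Hour rather than on StatusDatetime, which is the mismatch with the live collections that this request asks to remove.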

1 vote

0 comments · Query Functionality

Background aggregation queries return a query ID or correlation ID to be able to quickly poll for completion

1 vote

0 comments · Query Functionality

Support for Superset and other Python DB-API / SQLAlchemy connections to SQL Atlas

Superset uses SQLAlchemy and/or Python DB-API drivers, not JDBC or ODBC drivers. Superset is the most popular open-source Apache visualization tool.

Others have made it work like this: https://preset.io/blog/building-database-connector/

8 votes

0 comments · Connectors

Include support for Federated Database Instance with Data API services.

1 vote

0 comments

sqlGetSchema Sampling

There is currently no way to know what the current sampling size is for a collection. I would recommend adding this to the sqlGetSchema output.

3 votes

0 comments · Query Functionality

Support to export backups to Azure Blob Storage in Atlas

I would like the capability to export my cloud snapshots to Azure blob storage.

8 votes

0 comments

AWS IAM AuthN for Atlas SQL

Support AWS IAM Authentication mechanism in JDBC and ODBC drivers (Atlas SQL)

1 vote

0 comments · Connectors

Create a read/write Data Federation connection string

Some customers need a single connection string that covers both the cluster and Online Archive, with the ability to write to the cluster only.

So far, the only option is to use more than one connection string in the application.

1 vote

0 comments · Query Functionality

Implement a feature to track data download volume per DB user

In order to enhance data security and prevent unauthorized data exfiltration, our team proposes the implementation of a metric within MongoDB Atlas that allows administrators to monitor and measure the amount of data downloaded by each database user over a specified period. This feature would provide critical insights into user behavior, helping to identify unusual data access patterns or potential data breaches. By tracking network data usage at the user level, we can more effectively audit data access and transfer, ensuring that data is used appropriately and in compliance with organizational data governance policies. This granularity in monitoring would be a significant step forward in data management and security within MongoDB Atlas, offering a proactive tool for administrators in safeguarding sensitive data.

1 vote

0 comments · Reporting

Support AWS IAM for Data Federation Authentication

We would like to be able to connect to the Federated Database Instance using AWS IAM for Authentication just like you can for Atlas Clusters.

5 votes

2 comments · Infrastructure Options

Add support for a South Korea region for federated database instances

6 votes

0 comments · Infrastructure Options
