
Data Federation and Data Lake


13 results found

  1. Allow GridFS to use Atlas and object storage (via ADL) when connecting to the cloud MDB

    Many MongoDB users store metadata in MongoDB and keep PDFs and other files in object storage. Since GridFS is already built into the drivers, a natural enhancement would be to let ADL federate GridFS functionality across Atlas and the files in object storage.
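
    As a sketch of what "built into drivers" means in practice, here is the driver-side GridFS flow with PyMongo (connection string and file names are illustrative, not part of this idea):

    # Store and fetch a PDF through the driver's GridFS support.
    from pymongo import MongoClient
    import gridfs

    client = MongoClient("mongodb://localhost:27017")
    fs = gridfs.GridFS(client["docs"])

    # GridFS chunks the file into fs.chunks and records metadata in fs.files.
    with open("report.pdf", "rb") as f:
        file_id = fs.put(f, filename="report.pdf")

    data = fs.get(file_id).read()  # chunks are reassembled transparently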

    12 votes
  2. Add Incremental Materialized Views

    Add the ability to create a view where the result is pre-computed and is updated incrementally as more data becomes available.
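
    For contrast, a sketch of the manual approach available today, which this request would automate and make incremental: an on-demand materialized view refreshed with $merge (PyMongo; database, collection, and field names are illustrative):

    # Recompute a pre-aggregated collection on demand with $merge.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["sales"]

    def refresh_monthly_totals():
        db.orders.aggregate([
            {"$group": {"_id": "$month", "total": {"$sum": "$amount"}}},
            # $merge upserts each recomputed group into the view collection
            {"$merge": {"into": "monthly_totals", "whenMatched": "replace"}},
        ])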

    7 votes
  3. A GUI for setting SQL query sampling size (like in the BI Connector Atlas console)

    Provide the ability to set the SQL query sampling size (as in the BI Connector Atlas console). This would allow our business customers who use Power BI/Tableau to easily set and manage sampling without having to run a CLI command (i.e., sqlGenerateSchema) whenever a new document is added to the database.
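
    For context, a sketch of the manual step such a GUI would replace: regenerating the SQL schema from sampled documents (PyMongo; the endpoint and namespace are placeholders, and the command fields follow the Atlas SQL docs as I recall them):

    # Regenerate and persist the SQL schema for a namespace.
    from pymongo import MongoClient

    db = MongoClient("mongodb://<federated-endpoint>/?tls=true")["admin"]
    db.command({
        "sqlGenerateSchema": 1,
        "sampleNamespaces": ["sales.orders"],
        "setSchemas": True,  # persist the generated schema
    })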

    5 votes
  4. sqlGetSchema Sampling

    There is currently no way to know what the current sampling size is for a collection. I would recommend adding this to the sqlGetSchema output.

    3 votes
  5. Support Geo Queries on Object Storage

    I'd like to be able to use the geo functionality of the MongoDB Query Language to query data stored in object storage.

    Maybe using a format like: https://github.com/opengeospatial/geoparquet
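
    For illustration, this is the kind of MQL geo query the request would enable over object storage, sketched against an ordinary collection (assumes a 2dsphere index; names and coordinates are illustrative):

    # Find places within 5 km of a point using MQL geo operators.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["geo"]
    db.places.create_index([("location", "2dsphere")])
    nearby = db.places.find({
        "location": {"$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.97, 40.77]},
            "$maxDistance": 5000,  # meters
        }}
    })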

    3 votes
  6. Atlas Data Explorer to support using Aggregation Builder against Atlas Data Lake

    You can use the Atlas Data Explorer and Aggregation Builder in the MongoDB Atlas web dashboard on regular collections and views. Unfortunately there appears to be no way to use them against a Data Lake within the web dashboard, either directly or while constructing new Data Sources for Charts. Attempting to use Aggregation Builder on a Data Lake while defining a Data Source forwards to a URL that returns 404.

    It would be great if the same functionality were available for Data Lake as well.

    3 votes
  7. Ability to use GUID field as a partition field for online archive

    Hi,

    Today there is no way to partition the archive data based on a field of type GUID (legacy GUID). For example, I tried selecting a field that had Binary('0TfYLb3Qg0WT2mZu0wbq8Q==', 3) as its value, but I got an error saying the field is not supported as a partition field. Supporting this makes sense because archived data is usually old, and at the time it was written most people were using legacy GUIDs rather than ObjectIds.
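
    For context, the legacy GUID mentioned above is BSON Binary subtype 3; a minimal sketch of producing one with PyMongo's bson package:

    import uuid
    from bson.binary import Binary, UuidRepresentation

    # PYTHON_LEGACY yields Binary subtype 3, the encoding Online Archive
    # currently rejects as a partition field.
    legacy = Binary.from_uuid(uuid.uuid4(), UuidRepresentation.PYTHON_LEGACY)
    print(legacy.subtype)  # 3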

    3 votes
  8. Simplify interface for query commands

    Provide user-friendly data filtering and simpler query commands for updating or deleting data in collections.

    3 votes
  9. The "Date field to archive on" option under Archiving Rule tab should also accept date in timestamp format.

    The "Date field to archive on" option under Archiving Rule tab in Online Archive should also accept date field having timestamp format instead of only having date format.

    2 votes
  10. Allow a single timestamp field to be split into Year, Month, Day, and Hour folder segments in the Azure file path, instead of only one segment such as Year

    I checked internally, and it has been confirmed that an attribute can only appear once in a template. If Atlas Data Federation (ADF) has a template like the one you are using, it wouldn't know what value to assign to StatusDatetime because it's being assigned multiple values. Unfortunately, ADF doesn't support defining a single field value across multiple segments of the path. Instead, each of those segments should be different attributes.

    {
      "path": "/HistoryCollection/{StatusDatetime isodate:Year}/{StatusDatetime isodate:Month}/{StatusDatetime isodate:Day}/{StatusDatetime isodate:Hour}/{RecordSource string}/{Status string}/*",
      "storeName": "sampledatabase"
    }
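
    By contrast, a template along these lines would be valid under that rule, assuming hypothetical attributes (StatusYear, StatusMonth, and so on) that each appear only once in the path:

    {
      "path": "/HistoryCollection/{StatusYear int}/{StatusMonth int}/{StatusDay int}/{StatusHour int}/{RecordSource string}/{Status string}/*",
      "storeName": "sampledatabase"
    }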

    We would like to have the store we are creating as an archive be queried by StatusDatetime…

    1 vote
  12. Create a read/write Data Federation connection string

    Some customers need a single connection string that reaches both the cluster and Online Archive, with writes going to the cluster only.

    So far, the only option is to use more than one connection string in the application.
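
    As a sketch of that workaround, the application ends up holding two clients (hostnames are illustrative):

    from pymongo import MongoClient

    cluster = MongoClient("mongodb+srv://cluster.example.mongodb.net")    # read/write to the cluster
    federated = MongoClient("mongodb://federated.example.mongodb.net")    # read-only: cluster + Online Archive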

    1 vote
  13. Schema inference

    Schemaless design is flexible, but it has a big impact on downstream consumers, especially for data exchange and DW/AI.

    Deriving and inferring the schema from the actual documents is a must-have, so that we can understand, track, evolve, and translate the document schema.

    https://www.mongodb.com/blog/post/engblog-implementing-online-parquet-shredder is a great article.

    I'd like to propose an additional feature in ADL/ADF to make schema inference a first-class citizen, with faster turnaround and lower operational cost.

    After the $out operation of ADL/ADF, please collect the Parquet schema from each data file and union/unify them into a single schema. This schema will be stored in a .schema.json…
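
    As a sketch of the proposed post-$out step, assuming the Parquet files are readable and pyarrow is available (the file layout and .schema.json shape are illustrative, not existing ADL/ADF behavior):

    # Union the footer schema of every Parquet file into one schema.
    import glob, json
    import pyarrow as pa
    import pyarrow.parquet as pq

    schemas = [pq.read_schema(p) for p in glob.glob("out/*.parquet")]
    unified = pa.unify_schemas(schemas)  # merges fields by name, promoting types

    with open("out/.schema.json", "w") as fh:
        json.dump(dict(zip(unified.names, map(str, unified.types))), fh, indent=2)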

    1 vote
