Data Federation and Data Lake

How can we improve Data Federation and Data Lake?

Cross Project Access to Atlas Clusters from Data Lake

I would like my Data Lake in Project A to be able to query data in a cluster in Project B.

18 votes

4 comments · Storage Configuration

S3 alternative provider support

Many storage providers expose the same API as AWS S3, so I think it would be simple to integrate them.

9 votes

0 comments · Storage Configuration

Ability to "rehydrate" Atlas cluster from online archive

Consider an archive scenario in which a user of a given app has not logged into the app for [x] weeks or months, so all their data is moved to Online Archive. Once they log back into the app, their "cold" data should be considered "hot" again and be moved back into Atlas. While we can use $out to copy data back to Atlas, there is currently no way to remove the "rehydrated" data from S3 once it has been copied back to Atlas.
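
The copy-back step described above could look roughly like the pipeline built below. This is a minimal sketch, assuming the `$out`-to-Atlas stage form that Atlas Data Federation documents (`projectId`, `clusterName`, `db`, `coll`); the `userId` field, the function name, and all IDs passed in are hypothetical placeholders. It only builds the pipeline and does nothing about the archived copy in S3, which is the gap this idea describes.

```python
from typing import Any


def build_rehydrate_pipeline(
    user_id: str,
    project_id: str,
    cluster_name: str,
    db: str,
    coll: str,
) -> list[dict[str, Any]]:
    """Build a pipeline that copies one user's archived documents to Atlas."""
    return [
        # Select only the returning user's documents.
        # "userId" is an assumed field name, not taken from the original post.
        {"$match": {"userId": user_id}},
        # Write the matching documents to a collection on an Atlas cluster.
        {
            "$out": {
                "atlas": {
                    "projectId": project_id,
                    "clusterName": cluster_name,
                    "db": db,
                    "coll": coll,
                }
            }
        },
    ]
```

The resulting list would be passed to an `aggregate` call on the archive's federated connection; the removal from S3 afterwards still has no equivalent step.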

5 votes

1 comment · Storage Configuration

Add last modified timestamp to Data Federation Provenance for S3

It would be great to have the last modified timestamp of a file in S3 returned with the provenance functionality in Atlas Data Federation.

2 votes

0 comments · Storage Configuration

Filtered Data Lake Ingestions

Our applications are multi-tenant, so our immediate need is to create tenant-specific data lakes by setting constraints in the ingestion configuration (e.g., only ingest documents with tenantId = 'specificTenantId').
Filtered data lake ingestion would be useful in other ways too. For example, ingestion could be limited to documents with archived=false, documents with status=ACTIVE, and so on.
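
To make the requested behaviour concrete, here is a minimal sketch of the intended semantics, limited to equality constraints. The `constraints` mapping and both function names are hypothetical; they illustrate the idea and are not an existing Atlas ingestion setting.

```python
from typing import Any, Iterable, Mapping


def matches(doc: Mapping[str, Any], constraints: Mapping[str, Any]) -> bool:
    """Return True when every constrained field equals the required value."""
    return all(doc.get(field) == value for field, value in constraints.items())


def filter_for_ingestion(
    docs: Iterable[Mapping[str, Any]],
    constraints: Mapping[str, Any],
) -> list[Mapping[str, Any]]:
    """Keep only the documents that a filtered ingestion would accept."""
    return [doc for doc in docs if matches(doc, constraints)]


# Example: a tenant-specific lake that also skips archived documents.
docs = [
    {"_id": 1, "tenantId": "specificTenantId", "archived": False},
    {"_id": 2, "tenantId": "otherTenant", "archived": False},
    {"_id": 3, "tenantId": "specificTenantId", "archived": True},
]
kept = filter_for_ingestion(
    docs, {"tenantId": "specificTenantId", "archived": False}
)
```

In this example only the document with `_id` 1 satisfies both constraints.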

2 votes

0 comments · Storage Configuration

Combine data lake snapshots into a single federated collection

A common use case for data analytics is to analyse how your data evolves over time.
For example, imagine you have an e-commerce database in which product prices change every day. You may store only the current price in your database, but you'd like to build a chart that shows how product prices evolve over time (price on the y-axis, time on the x-axis).

This is possible today by combining Data Lake and Data Federation, but the Storage Configuration JSON needs to be updated manually, like this:

{
  "databases": [
    {
      "collections": [
        {
          "name": "collectionName",
          "dataSources": [
            {
              "datasetName": "v1$atlas$snapshot$Cluster0$env$collectionName$20230814T050424Z",
              "provenanceFieldName": "provenance",
              "storeName": "..."
            },
            {
              "datasetName": "v1$atlas$snapshot$Cluster0$env$collectionName$20230813T050415Z",
              "provenanceFieldName": "provenance",
              "storeName": "..."
            },
            {
              "datasetName": "v1$atlas$snapshot$Cluster0$env$collectionName$20230812T050424Z",
              "provenanceFieldName": "provenance",
              "storeName": "..."
            },
            ...
          ]
        }
      ]
    }
  ],
  ...
}

The Data Federation configuration JSON will grow to thousands of lines and needs to be maintained daily, perhaps with a script and the API (3 lines of JSON per snapshot entry * 365 snapshots a year * 20 collections = 21,900, so roughly 22,000 lines of JSON a year).
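
The "script" option mentioned above could be sketched as follows. This is an illustration only: `build_collection` is a hypothetical helper that regenerates one collection entry from a list of snapshot dataset names, it assumes you already have those names and the store name, and it does not call any Atlas API. The constants simply restate the arithmetic from the paragraph above.

```python
from typing import Any

# Restating the estimate: 3 JSON lines per snapshot entry.
LINES_PER_ENTRY = 3
SNAPSHOTS_PER_YEAR = 365
COLLECTIONS = 20
lines_per_year = LINES_PER_ENTRY * SNAPSHOTS_PER_YEAR * COLLECTIONS  # 21900


def build_collection(
    name: str,
    dataset_names: list[str],
    store_name: str,
    provenance_field: str = "provenance",
) -> dict[str, Any]:
    """Build one federated collection entry with one data source per snapshot."""
    return {
        "name": name,
        "dataSources": [
            {
                "datasetName": dataset_name,
                "provenanceFieldName": provenance_field,
                "storeName": store_name,
            }
            # Newest snapshot first: the timestamp suffix sorts lexicographically.
            for dataset_name in sorted(dataset_names, reverse=True)
        ],
    }
```

Serialising the result with `json.dumps(..., indent=2)` would give the block to place under `collections`, but it would still have to be regenerated and pushed after every new snapshot.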

One idea could be to use a simple wildcard instead of the timestamp like this:

{
  "databases": [
    {
      "collections": [
        {
          "name": "collectionName",
          "dataSources": [
            {
              "datasetName": "v1$atlas$snapshot$Cluster0$env$collectionName$*",
              "provenanceFieldName": "provenance",
              "storeName": "..."
            }
          ]
        }
      ]
    }
  ]
}
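
For illustration, the wildcard could be resolved the way shell-style globbing works. The sketch below uses Python's standard `fnmatch` module to show which dataset names such a pattern would select; `expand_wildcard` is a hypothetical helper, and this is an assumption about the desired matching rule, not a description of how Atlas would implement it.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern: str, dataset_names: list[str]) -> list[str]:
    """Return the dataset names selected by a glob-style pattern, sorted."""
    # fnmatchcase is case-sensitive and treats "$" as a literal character.
    return sorted(name for name in dataset_names if fnmatchcase(name, pattern))


available = [
    "v1$atlas$snapshot$Cluster0$env$collectionName$20230814T050424Z",
    "v1$atlas$snapshot$Cluster0$env$collectionName$20230813T050415Z",
    "v1$atlas$snapshot$Cluster0$env$otherCollection$20230814T050424Z",
]
selected = expand_wildcard(
    "v1$atlas$snapshot$Cluster0$env$collectionName$*", available
)
```

Here `selected` holds the two `collectionName` snapshots and leaves out the `otherCollection` one.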

P.S.: I know that a time series collection could be useful in the specific example I just gave. But sometimes you may want to analyse various properties over time, and that's where a Data Lake solution makes sense.
1 vote

0 comments · Storage Configuration
