Data Federation and Data Lake
6 results found
Cross Project Access to Atlas Clusters from Data Lake
I would like my Data Lake in Project A to be able to query data in a cluster in Project B.
17 votes
S3 alternative provider support
Many providers support the same API as AWS, so I think it would be simple to integrate them.
9 votes
Ability to "rehydrate" Atlas cluster from online archive
Consider an archive scenario where a user of a given app has not logged into the app for [x] weeks/months, so all their data is moved to Online Archive. Once they log back into the app, their "cold" data should be considered "hot" again and be moved back into Atlas. While we can use $out to copy data back to Atlas, there is currently no way to remove the "rehydrated" data from S3 once it has been copied back to Atlas.
5 votes
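The copy-back step mentioned in this idea can be sketched as an aggregation pipeline built in Python. This is only an illustration under stated assumptions: the `userId` field, the function name, and all IDs and collection names are hypothetical, and the `$out` shape for an Atlas target should be checked against the Atlas Data Federation documentation before use. Because `$out` normally replaces the target collection, the sketch writes to a hypothetical staging collection rather than the live one. It does not address the actual request, which is removing the copied data from S3 afterwards.

```python
from typing import Any


def build_rehydrate_pipeline(
    user_id: str,
    project_id: str,
    cluster_name: str,
    db: str,
    staging_coll: str,
) -> list[dict[str, Any]]:
    """Build a pipeline that copies one user's archived documents to Atlas.

    All field names and identifiers here are hypothetical placeholders.
    """
    return [
        # Select only the returning user's "cold" documents (assumed field name).
        {"$match": {"userId": user_id}},
        # Write the matched documents to a staging collection on the cluster.
        # Assumed $out shape for an Atlas target in Data Federation.
        {
            "$out": {
                "atlas": {
                    "projectId": project_id,
                    "clusterName": cluster_name,
                    "db": db,
                    "coll": staging_coll,
                }
            }
        },
    ]


pipeline = build_rehydrate_pipeline(
    user_id="user-123",
    project_id="<project-id>",
    cluster_name="<cluster-name>",
    db="app",
    staging_coll="users_rehydrated",
)
```

The resulting list would be passed to an `aggregate` call on a connection to the federated database instance; merging the staging collection into the live one is left out of the sketch.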
Add last modified timestamp to Data Federation Provenance for S3
It would be great to have the last modified timestamp of a file in S3 returned with the provenance functionality in Atlas Data Federation.
2 votes
Filtered Data Lake Ingestions
Our immediate need is that our applications are multi-tenant, so it would be very useful if we could create tenant-specific data lakes by setting particular constraints in the ingestion configuration (e.g., only ingest the documents with tenantId = 'specificTenantId').
However, the usefulness of filtered data lake ingestions can be multifaceted. The ingestion could be done only for archived=false documents, documents with status=ACTIVE, etc.
2 votes
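The kind of constraint described in this idea amounts to an equality filter applied before ingestion. The sketch below is not an Atlas API (no such ingestion option is implied to exist); it only illustrates the intended semantics in plain Python, with hypothetical function names and sample documents.

```python
from typing import Any, Iterable, Mapping


def matches_constraints(doc: Mapping[str, Any], constraints: Mapping[str, Any]) -> bool:
    """Return True when every constrained field is present and equal."""
    return all(key in doc and doc[key] == value for key, value in constraints.items())


def select_for_ingestion(
    docs: Iterable[Mapping[str, Any]], constraints: Mapping[str, Any]
) -> list[Mapping[str, Any]]:
    """Keep only the documents that a filtered ingestion would pick up."""
    return [doc for doc in docs if matches_constraints(doc, constraints)]


# Hypothetical sample documents from a multi-tenant collection.
sample_docs = [
    {"_id": 1, "tenantId": "specificTenantId", "archived": False, "status": "ACTIVE"},
    {"_id": 2, "tenantId": "otherTenant", "archived": False, "status": "ACTIVE"},
    {"_id": 3, "tenantId": "specificTenantId", "archived": True, "status": "INACTIVE"},
]

tenant_docs = select_for_ingestion(sample_docs, {"tenantId": "specificTenantId"})
active_tenant_docs = select_for_ingestion(
    sample_docs, {"tenantId": "specificTenantId", "archived": False}
)
```

Combining several constraints, as in the second call, corresponds to the archived=false and status=ACTIVE cases mentioned above.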
Combine data lake snapshots into a single federated collection
A common use case for data analytics is to analyse how your data evolves over time.
For example, imagine you have an e-commerce database and your product prices change every day. You may only store the price in your database, but you'd like to make a chart that shows the evolution of your product prices over time (price on the y axis and time on the x axis). It is possible today to make this happen with the combination of Data Lake and Data Federation, but the Storage Configuration JSON needs to be manually updated like this:
…{ "databases": [
1 vote