
Data Federation and Data Lake


13 results found

  1. Allow GridFS to use Atlas and object storage (via ADL) when connecting to the cloud MDB

    Many MongoDB users store metadata in MongoDB and keep PDFs and other files in object storage. Since GridFS is already built into the drivers, a natural enhancement would be to let ADL federate GridFS functionality across Atlas and the files in object storage.
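
    As a sketch of what "built into drivers" means in practice, here is the driver-side GridFS flow with PyMongo (connection string and file names are illustrative, not part of this idea):

    # Store and fetch a PDF through the driver's GridFS support.
    from pymongo import MongoClient
    import gridfs

    client = MongoClient("mongodb://localhost:27017")
    fs = gridfs.GridFS(client["docs"])

    # GridFS chunks the file into fs.chunks and records metadata in fs.files.
    with open("report.pdf", "rb") as f:
        file_id = fs.put(f, filename="report.pdf")

    data = fs.get(file_id).read()  # chunks are reassembled transparently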

    12 votes
  2. Add Incremental Materialized Views

    Add the ability to create a view where the result is pre-computed and is updated incrementally as more data becomes available.
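
    For contrast, a sketch of the manual approach available today, which this request would automate and make incremental: an on-demand materialized view refreshed with $merge (PyMongo; database, collection, and field names are illustrative):

    # Recompute a pre-aggregated collection on demand with $merge.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["sales"]

    def refresh_monthly_totals():
        db.orders.aggregate([
            {"$group": {"_id": "$month", "total": {"$sum": "$amount"}}},
            # $merge upserts each recomputed group into the view collection
            {"$merge": {"into": "monthly_totals", "whenMatched": "replace"}},
        ])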

    7 votes
  3. A GUI for setting SQL query sampling size (like in the BI Connector Atlas console)

    Provide the ability to set the SQL query sampling size (as in the BI Connector Atlas console). This would allow our business customers who use Power BI/Tableau to easily set and manage sampling without having to run a CLI command (i.e., sqlGenerateSchema) whenever a new document is added to the database.
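
    For context, a sketch of the manual step such a GUI would replace: regenerating the SQL schema from sampled documents (PyMongo; the endpoint and namespace are placeholders, and the command fields follow the Atlas SQL docs as I recall them):

    # Regenerate and persist the SQL schema for a namespace.
    from pymongo import MongoClient

    db = MongoClient("mongodb://<federated-endpoint>/?tls=true")["admin"]
    db.command({
        "sqlGenerateSchema": 1,
        "sampleNamespaces": ["sales.orders"],
        "setSchemas": True,  # persist the generated schema
    })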

    5 votes
  4. sqlGetSchema Sampling

    There is currently no way to know what the current sampling size is for a collection. I would recommend adding this to the sqlGetSchema output.

    3 votes
  5. Support Geo Queries on Object Storage

    I'd like to be able to use the geo functionality of the MongoDB Query Language to query data stored in object storage.

    Maybe using a format like: https://github.com/opengeospatial/geoparquet
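
    For illustration, this is the kind of MQL geo query the request would enable over object storage, sketched against an ordinary collection (assumes a 2dsphere index; names and coordinates are illustrative):

    # Find places within 5 km of a point using MQL geo operators.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["geo"]
    db.places.create_index([("location", "2dsphere")])
    nearby = db.places.find({
        "location": {"$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.97, 40.77]},
            "$maxDistance": 5000,  # meters
        }}
    })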

    3 votes
  6. Atlas Data Explorer to support using Aggregation Builder against Atlas Data Lake

    You can use the Atlas Data Explorer and Aggregation Builder in the MongoDB Atlas web dashboard on regular collections and views. Unfortunately there appears to be no way to use them against a Data Lake within the web dashboard, either directly or while constructing new Data Sources for Charts. Attempting to use Aggregation Builder on a Data Lake while defining a Data Source forwards to a URL that returns 404.

    It would be great if the same functionality were available for Data Lake as well.

    3 votes
  7. Ability to use GUID field as a partition field for online archive

    Hi,

    Today there is no way to partition the archive data based on a field of type GUID (legacy GUID). For example, I tried selecting a field that had Binary('0TfYLb3Qg0WT2mZu0wbq8Q==', 3) as its value, but I got an error saying the field is not supported as a partition field. Supporting this makes sense because archived data is usually old, and at the time it was written most people were using legacy GUIDs rather than ObjectIds.
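
    For context, the legacy GUID mentioned above is BSON Binary subtype 3; a minimal sketch of producing one with PyMongo's bson package:

    import uuid
    from bson.binary import Binary, UuidRepresentation

    # PYTHON_LEGACY yields Binary subtype 3, the encoding Online Archive
    # currently rejects as a partition field.
    legacy = Binary.from_uuid(uuid.uuid4(), UuidRepresentation.PYTHON_LEGACY)
    print(legacy.subtype)  # 3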

    3 votes
  8. Simplify interface for query commands

    Provide user-friendly data filtering and simpler query commands for updating or deleting data in collections.

    3 votes
  9. The "Date field to archive on" option under Archiving Rule tab should also accept date in timestamp format.

    The "Date field to archive on" option under Archiving Rule tab in Online Archive should also accept date field having timestamp format instead of only having date format.

    2 votes
  10. Allow a single timestamp field to be split into Year, Month, Day, and Hour folder segments in the Azure file path, instead of only one segment such as Year

    I checked internally, and it has been confirmed that an attribute can only appear once in a template. If Atlas Data Federation (ADF) has a template like the one you are using, it wouldn't know what value to assign to StatusDatetime because it's being assigned multiple values. Unfortunately, ADF doesn't support defining a single field value across multiple segments of the path. Instead, each of those segments should be different attributes.

    {
      "path": "/HistoryCollection/{StatusDatetime isodate:Year}/{StatusDatetime isodate:Month}/{StatusDatetime isodate:Day}/{StatusDatetime isodate:Hour}/{RecordSource string}/{Status string}/*",
      "storeName": "sampledatabase"
    }
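
    By contrast, a template along these lines would be valid under that rule, assuming hypothetical attributes (StatusYear, StatusMonth, and so on) that each appear only once in the path:

    {
      "path": "/HistoryCollection/{StatusYear int}/{StatusMonth int}/{StatusDay int}/{StatusHour int}/{RecordSource string}/{Status string}/*",
      "storeName": "sampledatabase"
    }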

    We would like to have the store we are creating as an archive be queried by StatusDatetime…

    1 vote
  12. Create a read/write Data Federation connection string

    Some customers need a single connection string that reaches both the cluster and Online Archive, with writes going to the cluster only.

    So far, the only option is to use more than one connection string in the application.
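
    As a sketch of that workaround, the application ends up holding two clients (hostnames are illustrative):

    from pymongo import MongoClient

    cluster = MongoClient("mongodb+srv://cluster.example.mongodb.net")    # read/write to the cluster
    federated = MongoClient("mongodb://federated.example.mongodb.net")    # read-only: cluster + Online Archive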

    1 vote
  13. Schema inference

    Schemaless design is flexible, but it has a big impact on downstream consumers, especially for data exchange and DW/AI.

    Deriving and inferring the schema from the actual documents is a must-have, so that we can understand, track, evolve, and translate the document schema.

    https://www.mongodb.com/blog/post/engblog-implementing-online-parquet-shredder is a great article.

    I'd like to propose an additional feature in ADL/ADF to make schema inference a first-class citizen, with faster turnaround and lower operational cost.

    After the $out operation of ADL/ADF, please collect the Parquet schema from each data file and union/unify them into a single schema. This schema will be stored in a .schema.json…
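
    As a sketch of the proposed post-$out step, assuming the Parquet files are readable and pyarrow is available (the file layout and .schema.json shape are illustrative, not existing ADL/ADF behavior):

    # Union the footer schema of every Parquet file into one schema.
    import glob, json
    import pyarrow as pa
    import pyarrow.parquet as pq

    schemas = [pq.read_schema(p) for p in glob.glob("out/*.parquet")]
    unified = pa.unify_schemas(schemas)  # merges fields by name, promoting types

    with open("out/.schema.json", "w") as fh:
        json.dump(dict(zip(unified.names, map(str, unified.types))), fh, indent=2)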

    1 vote
