Data Federation and Data Lake
-
Schema inference
Schemaless storage is flexible, but it has a big impact on downstream consumers, especially data exchange and DW/AI workloads.
Deriving and inferring the schema from the actual documents is a must, so that we can understand, track, evolve, and translate the document schema.
The article at https://www.mongodb.com/blog/post/engblog-implementing-online-parquet-shredder is a great reference.
I'd like to propose an additional feature in ADL/ADF that makes schema inference a first-class citizen, with faster turnaround and lower operational cost.
After the $out operation of ADL/ADF, please collect the Parquet schema from each data file and union/unify them into a single schema. This schema will be stored in a .schema.json…
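To illustrate the union step requested above, here is a minimal sketch. It assumes each data file's schema has already been read into a plain mapping of field name to type name; the function name `unify_schemas`, the sample fields, and the output layout are hypothetical and are not an ADL/ADF API.

```python
import json


def unify_schemas(schemas):
    """Union per-file schemas into one mapping: field -> sorted list of observed types."""
    unified = {}
    for schema in schemas:
        for field, type_name in schema.items():
            # A field seen with different types across files keeps every type.
            unified.setdefault(field, set()).add(type_name)
    return {field: sorted(types) for field, types in sorted(unified.items())}


# Hypothetical schemas collected from two output files.
file_schemas = [
    {"_id": "string", "price": "int32"},
    {"_id": "string", "price": "double", "tags": "list<string>"},
]

unified = unify_schemas(file_schemas)
print(json.dumps(unified, indent=2))
```

Keeping every observed type per field, rather than picking one, leaves the decision about type promotion to whoever consumes the unified schema.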
1 vote -
The data uploading process is a little difficult for new users. Please upload a demo video of uploading.
Overall, I found it to be interesting and user-friendly software.
1 vote -
Add support to $out to S3 for Standard JSON
I'd like to be able to use $out but output to Standard JSON instead of Extended JSON as the tool I'm using needs to consume standard JSON.
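As a stopgap, a post-processing step along these lines might work. This is only a sketch: it covers a handful of Extended JSON wrappers ($oid, $date, $numberInt, $numberLong), and the function name `to_standard_json` and the sample document are my own assumptions, not part of any MongoDB tool.

```python
import json


def to_standard_json(value):
    """Recursively replace a few Extended JSON wrappers with plain JSON values."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key == "$oid":
                return inner  # ObjectId becomes its hex string
            if key == "$date":
                # Relaxed form is an ISO-8601 string; canonical form wraps
                # milliseconds since the epoch in {"$numberLong": "..."}.
                return to_standard_json(inner)
            if key in ("$numberInt", "$numberLong"):
                return int(inner)
        return {k: to_standard_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_standard_json(v) for v in value]
    return value


# Hypothetical Extended JSON document as it might appear in an output file.
doc = {
    "_id": {"$oid": "507f1f77bcf86cd799439011"},
    "qty": {"$numberLong": "42"},
    "created": {"$date": "2023-01-01T00:00:00Z"},
    "items": [{"n": {"$numberInt": "1"}}],
}

plain = to_standard_json(doc)
print(json.dumps(plain))
```

Other wrappers (for example $numberDecimal or $binary) pass through unchanged here and would need their own handling.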
1 vote