Connectors (BI, Kafka, Spark)
Proposal for an Optimized Load Mechanism in MongoDB Atlas via Spark + Databricks
I currently use the Spark connector in Databricks jobs to load data into MongoDB Atlas. To handle large volumes and minimize the impact of writing to active collections, I developed a mechanism that significantly accelerates ingestion while maintaining consistency and query performance.
The strategy involves (see the sketch after the list):
- Creating a temporary collection, cloning the structure of the original collection without indexes.
- Inserting data directly into the temporary collection, avoiding the write overhead caused by indexes.
- Recreating the indexes on the temporary collection after the load completes.
- Swapping collections, promoting the new collection as the “hot” one and deactivating the previous version (via…
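A minimal sketch of this flow, assuming a PySpark job with pymongo available and the v10.x MongoDB Spark connector; the URI, database, collection, and source-path names are placeholders, not details from the original post:

```python
# Hedged sketch: temp-collection load, index rebuild, then swap.
# All names and the source path are assumptions for illustration only.
from pymongo import MongoClient
from pyspark.sql import SparkSession

MONGO_URI = "mongodb+srv://user:pass@cluster.example.mongodb.net"  # placeholder
DB, HOT, TMP = "analytics", "events", "events_tmp"

spark = SparkSession.builder.getOrCreate()
client = MongoClient(MONGO_URI)
db = client[DB]

# 1. Create the temporary collection with no secondary indexes.
db.drop_collection(TMP)
db.create_collection(TMP)

# 2. Bulk-insert via the Spark connector; no index maintenance during the write.
df = spark.read.parquet("/mnt/staging/events")  # hypothetical source
(df.write
   .format("mongodb")
   .option("connection.uri", MONGO_URI)
   .option("database", DB)
   .option("collection", TMP)
   .mode("append")
   .save())

# 3. Recreate the hot collection's indexes on the temporary collection.
for name, spec in db[HOT].index_information().items():
    if name != "_id_":
        db[TMP].create_index(spec["key"])

# 4. Swap: rename the temp collection over the hot one (needs admin privileges).
client.admin.command("renameCollection", f"{DB}.{TMP}",
                     to=f"{DB}.{HOT}", dropTarget=True)
```

Because index maintenance is deferred until after the bulk insert, the write path stays cheap, and the rename makes the promotion of the new collection effectively atomic for readers.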
4 votes
Mongo Spark Connector Option to refresh the Schema
This is with regard to the ticket we raised: https://support.mongodb.com/case/01352011
In the current Spark connector, automatic schema inference is enabled by setting the option "stream.publish.full.document.only" to "true". Once this is configured, no explicit schema needs to be passed; the connector infers the schema from the first document it streams and applies that schema to every subsequent document from the collection.
The issue is that when new fields are added to the source collection, the stream does not pick up the change and keeps using the old schema.
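To make the current behaviour and the usual workaround concrete, here is a hedged sketch assuming the v10.x connector (where the documented option name is "change.stream.publish.full.document.only", which appears to be the option referred to above); the URI, database, collection, and field names are placeholders:

```python
# Hedged sketch of the behaviour described above; names are placeholders and
# the exact option spelling may differ by connector version.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# With full-document publishing enabled and no explicit schema, the connector
# infers the schema from the first streamed document and keeps applying it,
# so fields added to the source collection later are dropped from the stream.
inferred = (spark.readStream
    .format("mongodb")
    .option("connection.uri", "mongodb+srv://cluster.example.net")  # placeholder
    .option("database", "sales")
    .option("collection", "orders")
    .option("change.stream.publish.full.document.only", "true")
    .load())

# Current workaround: pass an explicit schema that already includes the fields
# expected to appear later, instead of relying on inference.
explicit_schema = StructType([
    StructField("_id", StringType()),
    StructField("status", StringType()),
    StructField("new_field", StringType()),  # hypothetical field added later
])

explicit = (spark.readStream
    .format("mongodb")
    .schema(explicit_schema)
    .option("connection.uri", "mongodb+srv://cluster.example.net")
    .option("database", "sales")
    .option("collection", "orders")
    .option("change.stream.publish.full.document.only", "true")
    .load())
```

Maintaining such explicit schemas by hand is exactly the burden the requested refresh option would remove.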
We should either design…
4 votes