Connectors (BI, Kafka, Spark)
-
Kafka Sink Connector ObjectId Support
The sink connector should support ObjectIds for the document.id.strategy. In other words, if an ObjectId hex string is provided for _id, the sink connector should be able to convert it to an ObjectId (a workaround sketch is given below). I've scoured the documentation and the rest of the internet and have not been able to determine how to do this. I see an open PR on the GitHub repo, but there has been no movement in almost 3 years. It's astonishing that I cannot use an ObjectId...
1 vote -
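For the request above, a minimal workaround sketch assuming the sink connector's pluggable IdStrategy extension point (the package and class name of the custom strategy are illustrative): it turns a 24-character hex _id string in the record value into a real ObjectId.

```java
package com.example.sink;                                    // illustrative package name

import java.util.Map;
import java.util.Optional;

import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.bson.BsonDocument;
import org.bson.BsonObjectId;
import org.bson.BsonValue;
import org.bson.types.ObjectId;

import com.mongodb.kafka.connect.sink.converter.SinkDocument;
import com.mongodb.kafka.connect.sink.processor.id.strategy.IdStrategy;

// Sketch of a custom id strategy: if the record value carries _id as a 24-char
// ObjectId hex string, emit a real BsonObjectId instead of a BsonString.
public class HexStringObjectIdStrategy implements IdStrategy {

  @Override
  public BsonValue generateId(final SinkDocument doc, final SinkRecord orig) {
    Optional<BsonDocument> value = doc.getValueDoc();
    if (value.isPresent() && value.get().containsKey("_id")) {
      BsonValue id = value.get().get("_id");
      if (id.isString() && ObjectId.isValid(id.asString().getValue())) {
        return new BsonObjectId(new ObjectId(id.asString().getValue()));
      }
      return id;                                              // pass through non-hex ids unchanged
    }
    throw new DataException("Sink record has no value or no _id field");
  }

  public void configure(final Map<String, ?> configs) {
    // no extra configuration needed for this sketch
  }
}
```

With the jar on the connector's plugin path, the strategy would be registered via document.id.strategy=com.example.sink.HexStringObjectIdStrategy (class name illustrative).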
Retry/reconnect mechanism for MongoDB Source Connectors on MongoTimeoutException
The MongoDB Kafka Source and Sink connectors for data streaming work seamlessly until an error occurs in the Kafka Source connectors.
If an error occurs, the connectors do not recover from the timeout exceptions and the connectors need to be reposted/restarted.
Exception:
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches
3 votes -
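Until there is a built-in retry/reconnect, one interim workaround for the behaviour described above is to poll the Kafka Connect REST API and restart failed tasks automatically. A rough sketch, assuming a Connect worker at http://localhost:8083 and a connector named mongo-source (both illustrative):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: check the connector status and restart a failed task (e.g. after a
// MongoTimeoutException). A real implementation should parse the status JSON,
// iterate over all failed task ids, and add backoff/alerting.
public class RestartFailedConnectorTask {

  private static final String WORKER = "http://localhost:8083";  // illustrative worker URL
  private static final String CONNECTOR = "mongo-source";        // illustrative connector name

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();

    HttpRequest status = HttpRequest.newBuilder()
        .uri(URI.create(WORKER + "/connectors/" + CONNECTOR + "/status"))
        .GET().build();
    String body = client.send(status, HttpResponse.BodyHandlers.ofString()).body();

    if (body.contains("\"state\":\"FAILED\"")) {                 // naive check; parse JSON in practice
      HttpRequest restart = HttpRequest.newBuilder()
          .uri(URI.create(WORKER + "/connectors/" + CONNECTOR + "/tasks/0/restart"))
          .POST(HttpRequest.BodyPublishers.noBody()).build();
      int code = client.send(restart, HttpResponse.BodyHandlers.ofString()).statusCode();
      System.out.println("Restarted task 0, HTTP " + code);      // task id 0 for brevity
    }
  }
}
```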
Support regex in topic.namespace.map (Kafka source connector)
Currently, topic.namespace.map supports the wildcard (*). It would be helpful if it supported regex as well.
Or, since that could be a breaking change, it would be nice if a new mapping option were introduced that supports a regex format. The workaround is to use a custom mapper (see the sketch below).
However, if there were a built-in version, the wheel could be invented once and used by many people.
1 vote -
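A sketch of that custom-mapper workaround, assuming the source connector's pluggable topic.mapper interface (a TopicMapper that maps each change stream document to a topic name); the rules are hardcoded for brevity and the exact configuration hook should be checked against the connector version in use:

```java
package com.example.source;                                     // illustrative package name

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.bson.BsonDocument;
import org.bson.BsonString;

import com.mongodb.kafka.connect.source.topic.mapping.TopicMapper;

// Sketch of a regex-capable namespace-to-topic mapper: the first pattern that
// matches the "db.collection" namespace of a change event picks the topic.
// A production version would load the rules from the connector configuration.
public class RegexTopicMapper implements TopicMapper {

  private final Map<Pattern, String> rules = new LinkedHashMap<>();

  public RegexTopicMapper() {
    rules.put(Pattern.compile("^orders\\.archive_.*$"), "orders-archive");   // illustrative rules
    rules.put(Pattern.compile("^orders\\..*$"), "orders");
  }

  @Override
  public String getTopic(final BsonDocument changeStreamDocument) {
    BsonDocument ns = changeStreamDocument.getDocument("ns", new BsonDocument());
    String namespace = ns.getString("db", new BsonString("")).getValue()
        + "." + ns.getString("coll", new BsonString("")).getValue();

    return rules.entrySet().stream()
        .filter(e -> e.getKey().matcher(namespace).matches())
        .map(Map.Entry::getValue)
        .findFirst()
        .orElse(namespace.replace('.', '-'));                   // fallback topic name
  }
}
```

It would be wired in with topic.mapper=com.example.source.RegexTopicMapper (class name illustrative), which is exactly the wheel a built-in regex option would save everyone from reinventing.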
CSFLE support for Kafka connector
Using MongoDB CSFLE, the data needs to be passed downstream using the Kafka connector, where it is consumed by a downstream data lake for further processing. When data is pushed into Kafka it remains encrypted, and the same encrypted data lands in the data lake, since encryption and decryption happen at the driver level.
If the Kafka connector supported CSFLE encryption/decryption, this would work seamlessly.
4 votes -
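For context on where the boundary sits today: decryption only happens in a CSFLE-configured driver, so anything that merely reads the Kafka topic sees ciphertext. A minimal sketch of such a client (the local KMS provider and key vault namespace are illustrative); the request above is essentially for the connector to do the equivalent internally.

```java
import java.util.HashMap;
import java.util.Map;

import com.mongodb.AutoEncryptionSettings;
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

// Sketch: today, CSFLE encryption/decryption happens in the driver. A client
// configured like this reads decrypted documents; anything that only sees the
// Kafka topic (e.g. the downstream data lake) sees ciphertext instead.
public class CsfleClientSketch {

  public static MongoClient build(final byte[] localMasterKey) {
    Map<String, Object> localKms = new HashMap<>();
    localKms.put("key", localMasterKey);                        // 96-byte local master key

    Map<String, Map<String, Object>> kmsProviders = new HashMap<>();
    kmsProviders.put("local", localKms);                        // "local" KMS for illustration only

    AutoEncryptionSettings autoEncryption = AutoEncryptionSettings.builder()
        .keyVaultNamespace("encryption.__keyVault")             // illustrative key vault namespace
        .kmsProviders(kmsProviders)
        .build();

    return MongoClients.create(MongoClientSettings.builder()
        .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
        .autoEncryptionSettings(autoEncryption)
        .build());
  }
}
```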
Dynamic Topic Mapping on the basis of message content
Currently the MongoDB source connector only supports dynamic topic mapping on the basis of collections/databases. Can we extend it to support routing on the basis of message content?
Why is it important?
We currently have a connector set up for a very large collection, but since we wanted to route the data for different sections to different topics, we had to set up a separate connector for each and define the filter in the pipeline section (it runs as an aggregation query on change streams to filter out the relevant data). This obviously created performance concerns, as a large number of collscan queries started to run on these…
1 vote -
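A rough sketch of what content-based routing could look like through the same pluggable topic.mapper hook, so a single connector could fan out one large collection instead of running one connector-plus-pipeline per section (field and topic names are illustrative, and updates need change.stream.full.document=updateLookup so fullDocument is populated):

```java
package com.example.source;                                     // illustrative package name

import org.bson.BsonDocument;

import com.mongodb.kafka.connect.source.topic.mapping.TopicMapper;

// Sketch: route each change event to a topic derived from a field in the
// document itself, so a single connector can fan out a large collection.
public class SectionTopicMapper implements TopicMapper {

  @Override
  public String getTopic(final BsonDocument changeStreamDocument) {
    BsonDocument fullDocument =
        changeStreamDocument.getDocument("fullDocument", new BsonDocument());
    if (fullDocument.containsKey("section") && fullDocument.get("section").isString()) {
      // e.g. section = "billing" -> topic "events.billing" (naming is illustrative)
      return "events." + fullDocument.getString("section").getValue();
    }
    return "events.unrouted";                                   // fallback topic
  }
}
```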
Change stream total ordering within a transaction
Right now, a change stream flattens a transaction into individual operations, i.e., if we do two operations within a transaction, the change stream generates two events.
However, there is no total ordering of the events that happen within the transaction. This matters if we do two updates on the same object within a transaction: since both operations share the same optime (clusterTime), there is no way to establish the ordering between these two events.
It would be helpful to provide an ordering number on every event from the same transaction for this purpose.
1 vote -
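A small sketch that makes the gap visible, assuming the Java driver: it prints the metadata currently available for ordering, and two updates to the same document inside one transaction show up as two events with the same clusterTime.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.changestream.ChangeStreamDocument;

// Sketch: watch a collection and print the ordering-related metadata of each
// event. Two updates to the same document inside one transaction arrive as two
// events that share the same clusterTime (and lsid/txnNumber), hence the ask
// for an explicit per-transaction sequence number.
public class TransactionOrderingDemo {
  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      client.getDatabase("test").getCollection("orders").watch()
          .forEach((ChangeStreamDocument<Document> event) ->
              System.out.printf("op=%s clusterTime=%s txnNumber=%s resumeToken=%s%n",
                  event.getOperationType(),
                  event.getClusterTime(),
                  event.getTxnNumber(),
                  event.getResumeToken()));
    }
  }
}
```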
Kafka connector to support Kafka Schema Registry
One of the issues our team has been discussing is that when getting data from MongoDB via a Kafka connector and sending it through to Kafka, we try to enforce schemas in Kafka, but those schemas are not enforced on the MongoDB data. This leads to developers needing to make sure they let the Data Engineering team know when their schema evolves so we can accommodate that change in the Avro schema. Our thought is to potentially have the developers use the Confluent Schema Registry to serialize their data to Avro prior to writing it to MongoDB. This would…
2 votes -
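A minimal sketch of that idea, assuming plain Avro: validate the record against the registry-managed schema before it is ever written to MongoDB, so the MongoDB data cannot drift from what Kafka expects. The schema is inlined here; in the proposed setup it would come from the Confluent Schema Registry.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Sketch: validate a record against the Avro schema agreed with Kafka before
// writing it to MongoDB, so schema evolution is caught at write time instead
// of breaking downstream consumers.
public class AvroPreWriteValidation {

  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");           // illustrative schema

  public static boolean isValid(final String name, final int age) {
    GenericRecord record = new GenericData.Record(SCHEMA);
    record.put("name", name);
    record.put("age", age);
    return GenericData.get().validate(SCHEMA, record);          // structural check only
  }
}
```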
Built-in CDC to Kafka
Hi,
it is still hard to set up a MongoDB oplog CDC connection to Kafka to publish changes from e.g. a microservice-local MongoDB. You typically have to use Kafka Connect and either the official MongoDB Atlas Connector, or the Debezium Open Source Connector.
One of the databases competing with MongoDB, CockroachDB, has a built-in feature to publish "change feeds" to Kafka (see https://www.cockroachlabs.com/docs/stable/stream-data-out-of-cockroachdb-using-changefeeds.html).
I'd love to see a similar feature for MongoDB, since this would allow us to keep MongoDB and Kafka in sync much more easily and conveniently, without having to care about yet another (probably centralized)…
1 vote -
Ignore heartbeats-mongodb topic by default
As per KAFKA-208, SMTs can't be applied to the heartbeats-mongodb topic. Users should not have to configure each connector to ignore this topic. Please either ignore this topic by default or provide a command-line switch so it can be ignored.
4 votes -
Get schema validation "feedback" in Kafka Mongo Sink Connector
Objective:
We want to be able to validate that data matches some requirements. We would like to perform this data validation by adding a JSON schema in Mongo (as described here: https://docs.mongodb.com/manual/core/schema-validation/).
The problem is that the current implementation of the MongoDB Kafka Sink connector does not implement the required elements to benefit from the features brought by this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-610%3A+Error+Reporting+in+Sink+Connectors
So if we define such a validation in Mongo and a message has a value that does not match the definition, it will not go to the dead letter queue, and the…
4 votes -
MongoDB Sink Connector CDC default handler
I would like to have a default CDC handler that can process data produced by the MongoDB Source Connector without Debezium: https://docs.mongodb.com/kafka-connector/master/kafka-sink-cdc#cdc-handler-configuration
3 votes -
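A sketch of such a handler, assuming the sink connector's CdcHandler extension point (a constructor taking the sink topic config and a handle(SinkDocument) method returning an optional write model, as in the shipped CDC handlers); newer connector releases may already include an equivalent built-in handler, so it is worth checking the current docs before rolling your own.

```java
package com.example.sink;                                       // illustrative package name

import java.util.Optional;

import org.bson.BsonDocument;
import org.bson.BsonString;

import com.mongodb.client.model.DeleteOneModel;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import com.mongodb.client.model.WriteModel;
import com.mongodb.kafka.connect.sink.MongoSinkTopicConfig;
import com.mongodb.kafka.connect.sink.cdc.CdcHandler;
import com.mongodb.kafka.connect.sink.converter.SinkDocument;

// Sketch of a CDC handler for events produced by the MongoDB Source Connector
// itself (no Debezium): upsert on insert/replace/update-with-fullDocument,
// delete on delete. Other operation types are ignored.
public class ChangeStreamCdcHandler extends CdcHandler {

  public ChangeStreamCdcHandler(final MongoSinkTopicConfig config) {
    super(config);
  }

  @Override
  public Optional<WriteModel<BsonDocument>> handle(final SinkDocument doc) {
    BsonDocument event = doc.getValueDoc().orElse(new BsonDocument());
    BsonDocument filter = event.getDocument("documentKey", new BsonDocument());

    switch (event.getString("operationType", new BsonString("")).getValue()) {
      case "insert":
      case "replace":
      case "update":
        // assumes the source was configured with change.stream.full.document=updateLookup
        return Optional.of(new ReplaceOneModel<>(
            filter, event.getDocument("fullDocument"), new ReplaceOptions().upsert(true)));
      case "delete":
        return Optional.of(new DeleteOneModel<>(filter));
      default:
        return Optional.empty();
    }
  }
}
```

It would be enabled per topic with change.data.capture.handler=com.example.sink.ChangeStreamCdcHandler (class name illustrative).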
Kafka source connector exactly-once semantics
Added as a support case here: https://support.mongodb.com/case/00634630
This is about using the connector as a Source, i.e. we capture change streams from the source MongoDB and stream them to a Kafka endpoint.
Imagine these are updates on financial transactions in MongoDB and they are NOT tolerant to
1) missed data and
2) duplicated data
in that order. So we need to make sure that the change streams we are observing (matching) on are delivered once and exactly once to the Kafka pipeline. (Blog on the same: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/). If exactly-once semantics are enabled, it makes commits transactional by default.
…
4 votes
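For reference, a hand-rolled sketch of the moving parts exactly-once publication from a change stream implies (this is not how the connector does it): a transactional Kafka producer that commits each change event together with its resume token, so a restart can resume from the last committed token without gaps or duplicates. Connection strings, topic names, and serialization are illustrative; error handling, resume-token reloading, and batching are omitted.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.bson.Document;

import com.mongodb.client.MongoChangeStreamCursor;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.changestream.ChangeStreamDocument;

// Sketch: publish change stream events with a transactional producer and commit
// the resume token in the same Kafka transaction, so after a crash the stream
// can be resumed from the last committed token without losing or duplicating
// events.
public class ExactlyOnceChangeStreamPublisher {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "mongo-cdc-publisher");   // illustrative id
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
         MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor =
             mongo.getDatabase("bank").getCollection("transactions").watch().cursor()) {

      producer.initTransactions();
      while (cursor.hasNext()) {
        ChangeStreamDocument<Document> event = cursor.next();
        producer.beginTransaction();
        producer.send(new ProducerRecord<>("bank.transactions", event.getDocumentKey().toJson(),
            event.getFullDocument() == null ? null : event.getFullDocument().toJson()));
        producer.send(new ProducerRecord<>("bank.transactions.offsets", "resumeToken",
            cursor.getResumeToken().toJson()));
        producer.commitTransaction();                           // event + token succeed or fail together
      }
    }
  }
}
```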