Add Event Stream Features (Apache Kafka, NATS Streaming)
I think it would be great if MongoDB supported an event streaming (event bus) feature.
Popular event streaming services (AWS Kinesis, NATS Streaming, Apache Kafka) can persist data, which makes them somewhat like a database.
That is great for debugging and for logging data for later uses such as machine learning, since I can see the full flow of the data and its changes.
But there are two downsides with most event streaming services.
1. It's very difficult to query the data.
2. An "eventual consistency" issue when dealing with programmatic errors (bugs). Most event streams have a nice feature that keeps sending the same event to a subscriber until the subscriber acknowledges that the event has been processed. This is really great for temporary issues (network problems, service restarts due to deployment, etc.). But if the cause of the failure is a bug that needs fixing by a developer, the event bus should not keep sending the same event again and again.
To solve this, I built a microservice that subscribes to all events and stores them in MongoDB, so the event data lives in both MongoDB and the stream service.
Now it's very easy to query: the "event chain" is a tree structure, and for every child event I store the parent event ID.
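To make the parent-ID idea concrete, here is a minimal sketch of walking an event chain from a root event. The field names `_id` and `parent_id` and the sample event types are my assumptions for illustration, not the actual schema. It works on plain dictionaries in memory; against a real collection, an aggregation stage such as `$graphLookup` could probably do the same traversal on the server.

```python
from collections import defaultdict


def build_children_index(events):
    """Map each parent event ID to the list of its direct child events."""
    children = defaultdict(list)
    for event in events:
        children[event.get("parent_id")].append(event)
    return children


def event_chain(events, root_id):
    """Return the IDs of the root event and all its descendants, depth-first."""
    children = build_children_index(events)
    chain, stack = [], [root_id]
    while stack:
        current = stack.pop()
        chain.append(current)
        # Push children in reverse so they are visited in stored order.
        stack.extend(child["_id"] for child in reversed(children[current]))
    return chain


# Hypothetical sample data: one root event with two branches.
events = [
    {"_id": "e1", "parent_id": None, "type": "order.created"},
    {"_id": "e2", "parent_id": "e1", "type": "payment.requested"},
    {"_id": "e3", "parent_id": "e1", "type": "stock.reserved"},
    {"_id": "e4", "parent_id": "e2", "type": "payment.completed"},
]

print(event_chain(events, "e1"))
```

Because every child points at its parent, the whole flow that followed a single root event can be reconstructed from one stored field.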
I also store event errors, not just events, each with its parent event ID and error stack trace. On top of this I built a small feature that counts duplicate errors. When the number of duplicate errors exceeds a certain threshold, we can assume it's not a temporary error, so the event is programmatically acknowledged on the event bus.
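The duplicate-error counting could look roughly like the sketch below. The threshold value, the class and function names, and the choice to treat two errors as duplicates when they share the same parent event ID and stack trace are all assumptions made for illustration. The counter here lives in memory, whereas my service keeps the errors in MongoDB.

```python
import hashlib
from collections import Counter

MAX_DUPLICATES = 5  # hypothetical threshold; tune for your own system


def error_fingerprint(parent_event_id, stack_trace):
    """Build a stable key identifying 'the same error for the same event'."""
    raw = f"{parent_event_id}\n{stack_trace}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class DuplicateErrorTracker:
    """Counts repeated failures and signals when to stop redelivery."""

    def __init__(self, max_duplicates=MAX_DUPLICATES):
        self.max_duplicates = max_duplicates
        self.counts = Counter()

    def record(self, parent_event_id, stack_trace):
        """Record one failure; return True once the count exceeds the threshold,
        meaning the event should be acknowledged instead of retried."""
        key = error_fingerprint(parent_event_id, stack_trace)
        self.counts[key] += 1
        return self.counts[key] > self.max_duplicates


tracker = DuplicateErrorTracker(max_duplicates=3)
for attempt in range(1, 5):
    give_up = tracker.record("e2", "ValueError: amount must be positive")
    print(attempt, give_up)
```

When `record` returns `True`, the subscriber would acknowledge the event so the bus stops redelivering it, and the stored error is left for a developer to look at.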
After fixing the issue, the developer can relaunch the same event from what is stored in MongoDB.
One major objection people might raise to this solution is that it generates a lot of data in MongoDB that is not queried regularly.
In that case I think I could make use of MongoDB Data Lake. For example, I could keep only the most recent 7 days of events in MongoDB, build a data pipeline that exports older data to S3, and use MongoDB Data Lake when I need to query it.
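As a rough sketch of the 7-day retention rule, the function below splits events into those to keep in MongoDB and those to hand to the export pipeline. The `created_at` field and the function name are assumptions for illustration, and the actual export to S3 is left out. In a real deployment a TTL index on the timestamp field could probably handle expiry once the data has been exported.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # the "recent 7 days" window from the post


def split_by_retention(events, now, retention=RETENTION):
    """Return (keep, export): events newer than the cutoff stay in the
    database, strictly older ones are candidates for export to cold storage."""
    cutoff = now - retention
    keep, export = [], []
    for event in events:
        if event["created_at"] < cutoff:
            export.append(event)
        else:
            keep.append(event)
    return keep, export


# Hypothetical sample data with a fixed "now" so the result is reproducible.
now = datetime(2021, 6, 15, tzinfo=timezone.utc)
events = [
    {"_id": "e1", "created_at": now - timedelta(days=10)},
    {"_id": "e2", "created_at": now - timedelta(days=7)},
    {"_id": "e3", "created_at": now - timedelta(hours=3)},
]

keep, export = split_by_retention(events, now)
print([e["_id"] for e in keep], [e["_id"] for e in export])
```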
After building this service and duplicating the events from the event stream into MongoDB, I thought it would be great if MongoDB supported this as a built-in feature. It would suit MongoDB especially well compared to relational databases, because events are generally objects (documents).
MongoDB provides a connector integrating with Apache Kafka.
MongoDB Kafka Connector