Support for delayed replicas
It would be great if Atlas provided support for hidden, priority 0 delayed replicas. For workloads that do not need real-time data, or for situations where data recovery is needed, a delayed replica would be the fastest way to handle both scenarios.
-
Snir commented
As Atlas backups can't be generated for a specific collection, recovering a single collection is almost impossible in large data sets.
E.g., with a 5TB data set, the backup covers all of it, and the entire backup must be downloaded just to restore a collection that may be 1GB.
A delayed replica (with a 12-hour delay, for example) used for fast disaster recovery would be useful in such cases.
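For reference, here is a minimal sketch of what such a member looks like in a self-managed replica set (not Atlas, which does not expose these settings). It only builds the configuration document in plain Python. The `priority`, `hidden`, and `secondaryDelaySecs` fields are the standard replica set member settings in MongoDB 5.0+ (older versions use `slaveDelay` instead). The hostnames, the set name `rs0`, and the helper `make_delayed_hidden` are made up for illustration.

```python
from copy import deepcopy


def make_delayed_hidden(config, member_id, delay_hours):
    """Return a copy of a replica set config in which one member is a
    hidden, priority 0 secondary delayed by `delay_hours`.

    `config` is assumed to have the shape returned by the
    `replSetGetConfig` command (its "config" sub-document).
    """
    new_config = deepcopy(config)
    for member in new_config["members"]:
        if member["_id"] == member_id:
            member["priority"] = 0   # can never be elected primary
            member["hidden"] = True  # invisible to client applications
            member["secondaryDelaySecs"] = int(delay_hours * 3600)
            break
    else:
        raise ValueError(f"no member with _id {member_id}")
    # A reconfig must carry a higher version number than the current config.
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical three-member set; hostnames are placeholders.
current = {
    "_id": "rs0",
    "version": 1,
    "members": [
        {"_id": 0, "host": "db0.example.net:27017"},
        {"_id": 1, "host": "db1.example.net:27017"},
        {"_id": 2, "host": "db2.example.net:27017"},
    ],
}

delayed = make_delayed_hidden(current, member_id=2, delay_hours=12)
print(delayed["members"][2])
```

On a self-managed deployment, the resulting document would then be applied with the `replSetReconfig` admin command (for example `client.admin.command("replSetReconfig", delayed)` in PyMongo, or `rs.reconfig()` in the shell). A 12-hour delay only helps if the oplog window is comfortably longer than 12 hours.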
-
Bradley commented
Having these delayed replicas would be helpful for our staging and testing environments. In practice, our prod environment has a replication lag of 1 to 3 seconds, but our staging and dev environments have a replication lag near zero seconds all the time.
This difference recently caused a bug that we did not catch until production. We now use delayed replicas in our local dev environments, but it would be useful to have delayed replicas in our live staging and dev environments as well, which use MongoDB Atlas. This would keep our dev and staging environments as similar to prod as possible and would prevent bugs of this type from occurring again.
-
Brian commented
Thanks, Andrew. Unfortunately, a delayed replica would still be the fastest way to recover from accidental data deletion, since no restore process is necessary: the data is already live in the cluster. This feature is widely used in self-hosted deployments. Is it not on the horizon for Atlas replica sets? Also, are you guys considering adding zero-priority and hidden secondaries at some point?
-
Hi Brian,
Atlas addresses the use case of returning to a point in time differently, through our Cloud Continuous Backups. These allow for up-to-the-minute restores and leverage some of the same primitives as the replication system (namely the internal oplog). This has the added benefit of letting you select the point in time to restore to, rather than being tied to a specific delay period.
-Andrew