Dynamically switch primary between multi-region clusters based on client request locality
Consider an application deployed across two regions in an active/passive configuration. We'll call the regions "A" and "B" and assume they sit on opposite ends of the continental US. The application also uses a multi-region Atlas cluster with a single node in each of regions A and B. Both the application and the cluster's primary node are operating out of region A.
If something caused the application to fail over from region A to region B without a complete regional outage, the Atlas cluster's primary would still be running in region A. This is suboptimal: every database call would now have to cross the entire continental US, introducing substantial latency.
The Atlas cluster can see where its client calls come from (via each incoming request's IP address, which can be correlated to a geographic location). In the scenario above, it would be ideal if the cluster recognized that all of its client calls were suddenly originating from region B rather than region A, where its primary is still located. Once a predefined threshold of such calls was met, Atlas would automatically change the priorities of its regions so that B ranked higher than A, causing the primary to move to B. This would automatically mitigate the latency introduced by the application failover.
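The threshold mechanism proposed above could look roughly like the following Python sketch. It is not an existing Atlas feature: `LocalityMonitor` is a hypothetical name, the IP-to-region lookup is assumed to happen elsewhere (each request arrives already tagged with a region), and the window size, threshold, and minimum request count are illustrative values only.

```python
from collections import Counter, deque


class LocalityMonitor:
    """Hypothetical sketch: track which region recent client requests come from
    and report when a region other than the primary's region dominates."""

    def __init__(self, primary_region, window_size=1000, threshold=0.9, min_requests=500):
        self.primary_region = primary_region
        self.window = deque(maxlen=window_size)  # most recent client regions only
        self.threshold = threshold  # fraction of requests needed to trigger a switch
        self.min_requests = min_requests  # ignore decisions on too little data

    def record(self, client_region):
        """Record one request's (already geo-resolved) client region.
        Returns the region that should become primary, or None."""
        self.window.append(client_region)
        if len(self.window) < self.min_requests:
            return None
        region, count = Counter(self.window).most_common(1)[0]
        if region != self.primary_region and count / len(self.window) >= self.threshold:
            # Track the new primary locally; actually changing Atlas priorities is out of scope.
            self.primary_region = region
            self.window.clear()  # start a fresh window after a switch
            return region
        return None
```

Requiring both a minimum sample size and a high dominance fraction is one way to avoid flapping the primary on brief or mixed traffic. Recounting the whole window on every request is O(window size); a production version would keep running counts instead.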
-
Hi John,
One option to consider is building an API call into the app-tier failover orchestration layer that changes the Atlas cluster's preferred region upon failover.
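The orchestration-layer approach, where the failover tooling itself tells Atlas which region to prefer, might be sketched as below. This is a hedged illustration: the `regionsConfig`/`priority` shape is assumed to follow the Atlas Admin API's cluster `replicationSpecs` (highest-priority region at 7, each following region one less), and the exact request format should be checked against current Atlas documentation. `reprioritize`, `on_app_failover`, and the injected `send_patch` callable (which would perform the authenticated PATCH) are hypothetical names, and the sketch assumes every region holds electable nodes.

```python
import copy


def reprioritize(regions_config, preferred_region):
    """Return a copy of an (assumed) regionsConfig mapping in which
    preferred_region gets priority 7 and the other regions keep their
    relative order with descending priorities (6, 5, ...)."""
    if preferred_region not in regions_config:
        raise ValueError(f"unknown region: {preferred_region}")
    # Order the remaining regions by their current priority, highest first.
    others = sorted(
        (r for r in regions_config if r != preferred_region),
        key=lambda r: regions_config[r].get("priority", 0),
        reverse=True,
    )
    updated = copy.deepcopy(regions_config)  # leave the caller's config untouched
    for i, region in enumerate([preferred_region] + others):
        updated[region]["priority"] = 7 - i
    return updated


def on_app_failover(new_app_region, regions_config, send_patch):
    """Hypothetical hook for the failover orchestration layer: build the update
    body and hand it to send_patch, which is responsible for calling Atlas."""
    body = {"replicationSpecs": [{"regionsConfig": reprioritize(regions_config, new_app_region)}]}
    return send_patch(body)
```

Keeping the HTTP call behind `send_patch` lets the orchestration layer reuse whatever authenticated client it already has, and makes the priority logic testable without touching a live cluster.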
I definitely think the idea of automatically deducing the preferred region is cool, but there are situations where such a switch might not be intended, so getting the exact mechanics right is nontrivial.
-Andrew