Define current limits of Kubernetes Operator
1.) What is the limiting factor of the Operator? Is it number of Pods, number of Custom Resources (e.g. MongoDB, MongoDBUser) or something else?
2.) What does the number of "Clusters" refer to? Does it differ for Standalone, ReplicaSet and ShardedCluster?
3.) How many instances of the "Clusters" in 2.) are supported per MongoDB Operator?
Is there any way to add this sort of data to our documentation?
Thanks

Current scale recommendations are 20-50 deployments (ReplicaSet/ShardedCluster/Standalone).
While the Operator can handle hundreds, the limiting factor is API calls to Ops Manager/Cloud Manager. Updating one deployment at a time is fine. The issue arises when making concurrent changes to many deployments, for example during a DR scenario, where reconciliation will be slow for those later in the queue.
We have work planned for later in 2023 to start removing Ops Manager as a prerequisite for many of the basic operations, and we expect that to greatly improve these limits.
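To make the queueing effect above concrete, here is a minimal back-of-the-envelope sketch. It assumes every reconciliation takes the same amount of time and that a fixed number of them run in parallel. The function name, the 60-second figure, and the worker counts are hypothetical values chosen for illustration, not measurements or Operator settings.

```python
def estimated_wait_seconds(position, seconds_per_reconcile, workers=1):
    """Rough wait before the deployment at 1-based queue `position` starts reconciling.

    Simplified model (an assumption, not the Operator's actual scheduler):
    every reconciliation takes `seconds_per_reconcile`, and `workers`
    reconciliations run in parallel.
    """
    if position < 1 or workers < 1:
        raise ValueError("position and workers must be >= 1")
    # Deployments ahead of this one are worked off in batches of `workers`.
    batches_ahead = (position - 1) // workers
    return batches_ahead * seconds_per_reconcile


if __name__ == "__main__":
    # Hypothetical DR scenario: 50 deployments changed at once, 60 s each.
    for workers in (1, 4):
        wait = estimated_wait_seconds(50, 60, workers)
        print(f"workers={workers}: last deployment waits about {wait} s")
```

Under these assumed numbers, the 50th deployment waits about 2940 seconds with one reconciliation at a time and about 720 seconds with four in parallel, which is why a change that is fine for one deployment becomes slow when applied to many at once.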
-
Martin commented
There is some useful information regarding limits and sizing on the Production Notes page: https://docs.mongodb.com/kubernetes-operator/stable/reference/production-notes/
-
AdminAndrey (Admin, MongoDB) commented
1.) What is the limiting factor of the Operator? Is it number of Pods, number of Custom Resources (e.g. MongoDB, MongoDBUser) or something else?
AB: It largely depends on the size of each cluster. The theoretical limit is thousands of clusters, but it depends heavily on K8S API performance, Ops Manager performance, etc. We are planning to add more load tests later this summer to provide more of a
2.) What does the number of "Clusters" refer to? Does it differ for Standalone, ReplicaSet and ShardedCluster?
A cluster refers to a single logical deployment represented by a connection string (Standalone, ReplicaSet or ShardedCluster).
3.) How many instances of the "Clusters" in 2.) are supported per MongoDB Operator?
Our Operator processes each cluster type in its own "loop". At some point, K8S and Ops Manager will max out on API server performance and the loop will start to run slowly. However, the limit should be in the hundreds. Usually, we don't see more than 100 clusters per namespace/operator.
Is there any way to add this sort of data to our documentation?
Thanks