Configurable sample size in schema analyzer
A larger or smaller sample size than the default of 1000 is sometimes needed, either because queries take too long or because the sample is too small to be statistically useful.
This could be part of progressive loading/sampling.
-
Ante commented
To add to the other comments, "all" should also be an option, not just an integer sample count. Often an exhaustive output schema is the whole point of running the analysis.
This may change how it works in the first place: with "all" it wouldn't use $sample at all and should instead iterate a cursor over every document.
I've personally been using this extremely handy utility, https://github.com/mongoeye/mongoeye, for my analysis, but it uses an outdated driver that no longer works with newer MongoDB versions.
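
To make the "all" option above a bit more concrete, here is a rough Python sketch of how the choice between sampling and a full scan might look. This is not how the schema analyzer is actually implemented; the names build_sampling_pipeline, infer_field_types and DEFAULT_SAMPLE_SIZE are made up for illustration, and the type inference only looks at top-level fields.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Union

DEFAULT_SAMPLE_SIZE = 1000  # the default mentioned in this request


def build_sampling_pipeline(
    sample_size: Union[int, str] = DEFAULT_SAMPLE_SIZE,
) -> List[Dict[str, Any]]:
    """Return aggregation stages for the requested sample size (hypothetical helper).

    "all" yields an empty pipeline, so no $sample stage runs and every
    document is read. A positive integer yields a single $sample stage.
    """
    if sample_size == "all":
        return []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(sample_size, bool) or not isinstance(sample_size, int) or sample_size < 1:
        raise ValueError('sample_size must be a positive integer or "all"')
    return [{"$sample": {"size": sample_size}}]


def infer_field_types(docs: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Collect the Python type names seen for each top-level field (hypothetical helper)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)
```

Assuming a PyMongo collection object, the result of collection.aggregate(build_sampling_pipeline("all")) could be passed straight to infer_field_types, since an empty pipeline returns a cursor over all documents. On a large collection that full scan will of course be much slower than a sample, which is why it should be opt-in.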
-
Noam Chamovitz commented
This is critical and should have been addressed years ago. Any update on this?
-
Luis Mauricio commented
I think this could be very useful.
I have a db running on my computer with 20M documents, and the default sample size is not very useful...