AdminHenry (Admin, MongoDB)
AdminHenry (Admin, MongoDB) commented
Hey Tom,
Thanks so much for the detailed response.
I should clarify: when Ben said “consistent,” he meant consistent between queries, not consistent with other operators.
I recognize that moving from text-search relevance to vector search for a similar purpose means using a different operator, one with different compatibility with other features, and we will definitely consider improving this in the future. Sort, for example, seems like a very natural fit within the $search + knnBeta stage, since users may want to page through results ordered by a metadata field. You can produce the same behavior with a different syntax by adding a subsequent $sort stage, which holds the returned documents in memory and sorts them. If you exceed the memory limit, which could happen for large values of k, the stage can also spill to disk. More information on that stage is available here: https://www.mongodb.com/docs/manual/reference/operator/aggregation/sort/
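As a rough sketch of the $sort workaround described above, the pipeline below runs knnBeta inside $search and then sorts the k returned documents by a metadata field before paging. The index name, the field names (embedding, publishedAt), and the helper build_knn_sorted_pipeline are hypothetical, and the commented-out call assumes PyMongo; adjust all of them to your own deployment.

```python
from typing import Any, Dict, List


def build_knn_sorted_pipeline(
    query_vector: List[float],
    k: int,
    sort_field: str,
    page: int = 0,
    page_size: int = 10,
) -> List[Dict[str, Any]]:
    """Build a $search/knnBeta pipeline followed by a separate $sort stage."""
    return [
        {
            "$search": {
                "index": "default",  # hypothetical index name
                "knnBeta": {
                    "vector": query_vector,
                    "path": "embedding",  # hypothetical vector field
                    "k": k,  # number of nearest neighbors to return
                },
            }
        },
        # Re-order the k nearest neighbors by a metadata field.
        {"$sort": {sort_field: 1}},
        # Page through the sorted results.
        {"$skip": page * page_size},
        {"$limit": page_size},
    ]


pipeline = build_knn_sorted_pipeline(
    [0.1, 0.2, 0.3], k=100, sort_field="publishedAt", page=2
)

# Assuming PyMongo and an existing `collection` handle, allowDiskUse lets
# the $sort stage spill to disk if it exceeds the in-memory limit:
# results = collection.aggregate(pipeline, allowDiskUse=True)
```

Note that the sort only reorders the k documents that knnBeta returned; it does not change which documents are selected.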
In general, there are a lot of features that make sense for full-text search (such as analyzers) that might not make sense to carry over into knnBeta, so we are not necessarily optimizing for 1:1 syntax parity. The “should” example you provided makes a bit less sense from our perspective, since knnBeta is used to rank results by vector proximity, not to assess whether a vector meets some other criterion. Where it makes sense we will try to add support, but we cannot guarantee that you will be able to simply replace an autocomplete operator with knnBeta in the future.
It's possible, if you have a filter that is not very selective, that the approximate search cycles for a while: each greedy search through the HNSW layers leads to a candidate at the bottommost layer that does not meet the prefilter, resulting in wasted similarity comparisons.
Can you share more about how many unique values exist for fieldA?
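If it helps, one way to answer that is with a pair of small aggregations, sketched here under the assumption that fieldA is a top-level field. The helper names are hypothetical and the commented-out calls assume PyMongo.

```python
from typing import Any, Dict, List


def build_cardinality_pipeline(field: str) -> List[Dict[str, Any]]:
    """Count how many distinct values exist for `field`."""
    return [
        {"$group": {"_id": f"${field}"}},  # one document per distinct value
        {"$count": "uniqueValues"},
    ]


def build_distribution_pipeline(field: str) -> List[Dict[str, Any]]:
    """List each distinct value with its document count, most common first."""
    return [{"$sortByCount": f"${field}"}]


cardinality = build_cardinality_pipeline("fieldA")
distribution = build_distribution_pipeline("fieldA")

# Assuming PyMongo and an existing `collection` handle:
# print(list(collection.aggregate(cardinality)))
# print(list(collection.aggregate(distribution)))
```

The second pipeline also shows how evenly documents are spread across those values, which indicates how many documents a filter on fieldA is likely to match.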