Usage of vectorSearch in more complex use cases

I am thinking about how to implement semantic search in existing, more complex applications. For example, I have an application with a search input field for regular full-text search and dynamic filters that are generated from facets. The results are displayed in a paginated, sortable table.

All of this is possible with the $search and $searchMeta stages. But how would you implement semantic search for the search input field instead of, or together with, regular full-text search?

I see that MongoDB is implementing a stable version of knnBeta, which is good and solves many of my problems (https://www.mongodb.com/docs/atlas/atlas-search/operators-collectors/vectorSearch/?interface=driver&language=nodejs&vector-search-type=enn&prefilter-type=compound). But due to the limitations of the new vectorSearch, you cannot easily include features like faceting, sorting and pagination. There are some workarounds, but they are harder to integrate into an existing codebase and will probably be slower.

For example, to generate facets for a vectorSearch query you could perform two queries: first, a vectorSearch with returnStoredSource: true to fetch the _id values of all matching documents, and then a $searchMeta query with an in operator over those _id values to fetch the facets.
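A rough sketch of that two-query workaround for the Node.js driver is below. The option names are my reading of the linked documentation and should be checked against it; the index name "default", the embedding path, the numCandidates multiplier and the helper names (buildVectorIdPipeline, buildFacetPipeline, facetsForVectorQuery) are placeholders I made up for illustration. It also assumes _id is covered by the search index so that the in operator can match on it.

```javascript
// Sketch only: field and option names are assumptions, verify against the Atlas Search docs.

// Step 1: vector query that only needs to return the _id of each hit.
function buildVectorIdPipeline(queryVector, limit = 100) {
  return [
    {
      $search: {
        index: "default", // placeholder index name
        vectorSearch: {
          path: "titleEmbedding", // placeholder embedding field
          queryVector,
          numCandidates: limit * 10, // arbitrary oversampling factor
          limit,
        },
        returnStoredSource: true, // avoid the full document lookup
      },
    },
    { $project: { _id: 1 } },
  ];
}

// Step 2: facet counts restricted to the _id values found in step 1.
function buildFacetPipeline(ids) {
  return [
    {
      $searchMeta: {
        index: "default",
        facet: {
          operator: { in: { path: "_id", value: ids } },
          facets: {
            categoryFacet: { type: "string", path: "category" },
            brandFacet: { type: "string", path: "brand" },
          },
        },
      },
    },
  ];
}

// Runs both queries; `collection` is a driver collection
// (or anything exposing aggregate(pipeline).toArray()).
async function facetsForVectorQuery(collection, queryVector, limit = 100) {
  const hits = await collection
    .aggregate(buildVectorIdPipeline(queryVector, limit))
    .toArray();
  const ids = hits.map((doc) => doc._id);
  if (ids.length === 0) return null; // no hits, so there is nothing to facet
  const [meta] = await collection.aggregate(buildFacetPipeline(ids)).toArray();
  return meta;
}
```

This shows why the workaround is awkward: it needs two round trips, and the list of _id values passed to the second query grows with the vectorSearch limit.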

At the moment it seems that vectorSearch dominates the entire query. So I would like to have either more examples of complex use cases in the official documentation, or a new search operator that treats vector search as a "regular" operator that just calculates a score, like text, autocomplete, near, ...

The new operator (maybe vectorDistance?) could behave similarly to the current vectorSearch operator, but without the filter and limit options. Instead, it would calculate the vector distance for each document and expose it as a score between 0 and 1. It would not have the limitations of vectorSearch and could be used inside a facet or compound. This way you are far more flexible and can still include additional filters, like this:

```
// Proposed syntax: vectorDistance is a hypothetical operator that does not exist today.
db.products.aggregate([
  {
    $search: {
      facet: {
        operator: {
          compound: {
            should: [
              {
                text: {
                  query: "office lighting",
                  path: ["title", "description"],
                  score: { boost: { value: 2 } }
                }
              },
              {
                // Would score each document by vector distance, with no filter or limit of its own.
                vectorDistance: {
                  queryVector: embeddingVector,
                  path: "titleEmbedding"
                }
              }
            ],
            minimumShouldMatch: 1,
            filter: [
              { range: { path: "price", lte: 500 } },
              { equals: { path: "availability.status", value: "in stock" } }
            ]
          }
        },
        facets: {
          categoryFacet: { type: "string", path: "category" },
          brandFacet: { type: "string", path: "brand" },
          priceFacet: {
            type: "number",
            path: "price",
            boundaries: [50, 100, 200, 300, 500, 1000]
          }
        }
      },
      sort: { score: { $meta: "searchScore" } } // support $search sort
    }
  },
  { $limit: 20 },
  {
    $project: {
      title: 1,
      price: 1,
      category: 1,
      brand: 1,
      score: { $meta: "searchScore" }
    }
  }
]);
```
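
How the score between 0 and 1 would be computed is left open here. One option, assuming cosine similarity, would be to map the cosine from the range [-1, 1] onto [0, 1], which as far as I know matches how Atlas Vector Search normalizes cosine scores. The function name below is made up and only illustrates the idea:

```javascript
// Hypothetical scoring for the proposed operator: cosine similarity mapped to [0, 1].
function vectorDistanceScore(queryVector, docVector) {
  if (queryVector.length !== docVector.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normQ = 0;
  let normD = 0;
  for (let i = 0; i < queryVector.length; i++) {
    dot += queryVector[i] * docVector[i];
    normQ += queryVector[i] * queryVector[i];
    normD += docVector[i] * docVector[i];
  }
  if (normQ === 0 || normD === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  const cosine = dot / (Math.sqrt(normQ) * Math.sqrt(normD));
  return (1 + cosine) / 2; // 1 = same direction, 0.5 = orthogonal, 0 = opposite
}
```

A score in that range could then be combined with the text score inside compound in the same way as the scores of the other operators.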
