Provide straightforward syntax for 1-to-1 joins in aggregation
The syntax for joins that bring back multiple documents from foreign collections is very straightforward and yields exactly what one would expect, but simple joins that are bread and butter in SQL require very convoluted and expensive to run syntax.
Consider a product database that has products
, categories
and reviews
collections. Each product has a unique category and may have multiple reviews. Getting all reviews in an aggregation is very straightforward (top stage), but getting categories, similar to SQL, is as convoluted as it gets (bottom stage).
db.products.aggregate(
[
//
// Document aggregates naturally aggregate foreign documents
// into the primary document.
//
{$lookup: {
from: "reviews",
localField: "_id",
foreignField: "product_id",
as: "reviews"
}},
//
// A simple join that is supposed to bring a couple of fields
// into the primary document translates into this convoluted
// syntax just to get those two fields.
//
{$lookup: {
from: "categories",
let: {category_id: "$category"},
pipeline: [
{$match: {
$expr: {$eq: ["$_id", "$$category_id"]}
}},
{$project: {
_id: 0,
cat_name: "$category",
cat_description: "$description"
}}
],
as: "category_details"
}},
{$replaceRoot: {
newRoot: {
$mergeObjects: [
"$$ROOT", {$arrayElemAt: ["$category_details", 0]}
]
}
}},
{$project: {
category_details: 0
}}
])
Notice that keeping category.category
and category.description
in products
, the noSQL-way, isn't an option because updating 10000's of products when category or description change is just not a good idea any way one looks at it. Denormalization is useful in many noSQL scenarios, but this isn't one of them.
Having some simple join syntax that would identify the joined collection as a 1-to-1 join and would bring field names and values as if they existed in the primary document, possibly aliased via a projection-like syntax, would be extremely helpful in keeping aggregations readable, keeping data reasonably normalized, and would run much faster because it would be just multiple parallel BSON append
calls when constructing aggregation results documents.
Data for the query:
products:
{"_id":{"$oid":"6140e6c6dfe7b9d6e597d07e"},"name":"Product A","category":1,"price":111}
{"_id":{"$oid":"6140e6c6dfe7b9d6e597d07f"},"name":"Product B","category":2,"price":222}
{"_id":{"$oid":"6140e6c6dfe7b9d6e597d080"},"name":"Product C","category":1,"price":333}
categories:
{"_id":1,"category":"Clothing","description":"All kinds of clothing"}
{"_id":2,"category":"Electronics","description":"Cool gadgets"}
reviews:
{"_id":{"$oid":"6140e7cadfe7b9d6e597d08a"},"product_id":{"$oid":"6140e6c6dfe7b9d6e597d07e"},"review":"Product A is great","rating":4}
{"_id":{"$oid":"6140e813dfe7b9d6e597d090"},"product_id":{"$oid":"6140e6c6dfe7b9d6e597d07f"},"review":"This product is horrible","rating":1}
{"_id":{"$oid":"6140e8eff4ff76778b48ba88"},"product_id":{"$oid":"6140e6c6dfe7b9d6e597d07f"},"review":"This product is not the best, but usable","rating":2}
-
Andre M commented
Just wanted to add that in case if it wasn't very clear from my response, the original suggestion *does* use the `localField`/`foreignField` form where it's suitable, which only works for aggregating array sub-objects, such post comments or reviews.
What the suggestion does outline is that for 1-to-1 joins, which just need a few fields from the foreign collection and is a very common use case, the syntax becomes nightmarish and if Mongo DB would came up with a straightforward 1-to-1 join syntax, it would work very well for many Mongo DB users who want to use simple references, like we do in SQL, rather than having to update tens of thousands of denormalized documents, like we have to do today.
PS Can somebody please look at the broken Markdown formatting of these comments and the ridiculous 5-line window size on this site?
-
Andre M commented
Thanks for the response, Katya.
`$match` in my example isn't used as a replacement of the form with `localField`/`foreignField`, but because I need only those fields from the linked collection and this is the only way not to pull the entire linked object, but rather just fields required for this join.
In reality, only simple aggregation pipelines use the `localField`/`foreignField` form and in most cases the pipeline form would be used to control what's returned and how it is named via the projection.
The ask really is to recognize 1-to-1 joins or allow people to specify them explicitly, so one doesn't have to jump through the hoops manipulating all these sub-objects - it makes queries quite convoluted and also cannot be good for the server that needs multiple BSON operations to perform all those merges and unwinds.
-
Thank you for the feedback!
While as of today $lookup still always produces an array, even if there's only one document found in the 'from' collection, starting MongoDB 5.0 you can use `localField` and `foreignField` with the `pipeline`, so $match is not necessary
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax
[
{
"$lookup": {
"from": "categories",
"localField": "category",
"foreignField": "_id",
"pipeline": [
{
"$project": {
"_id": 0,
"cat_name": "$category",
"cat_description": "$description"
}
}
],
"as": "category_details"
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$$ROOT",
{
"$first": "$category_details"
}
]
}
}
},
{
"$project": {
"category_details": 0
}
}
]