Boost the performance of bioinformatic annotation queries

The documents to be selected look something like this:

{
  "_id": {
    "$oid": "6272c580d4400d8cb10d5406"
  },
  "#CHROM": 1,
  "POS": 286747,
  "ID": "rs369556846",
  "REF": "A",
  "ALT": "G",
  "QUAL": ".",
  "FILTER": ".",
  "INFO": [
    {
      "RS": 369556846,
      "RSPOS": 286747,
      "dbSNPBuildID": 138,
      "SSR": 0,
      "SAO": 0,
      "VP": "0x050100000005150026000100",
      "WGT": 1,
      "VC": "SNV",
      "CAF": [
        {"$numberDecimal": "0.9381"},
        {"$numberDecimal": "0.0619"}
      ],
      "COMMON": 1,
      "TOPMED": [
        {"$numberDecimal": "0.88411856523955147"},
        {"$numberDecimal": "0.11588143476044852"}
      ]
    },
    ["SLO", "ASP", "VLD", "G5", "KGPhase3"]
  ]
}

For a basic annotation (https://en.wikipedia.org/wiki/SNP_annotation) scenario, we need a query like this:

{'ID': {'$in': ['rs369556846', 'rs2185539', 'rs2519062', 'rs149363311', 'rs55745762', <...>]}}
Here <...> stands for hundreds or thousands of values.

This query completes in a few seconds.

More complex annotation queries look like this:

{'$or': [{'#CHROM': 1, 'POS': 1499125}, {'#CHROM': 1, 'POS': 1680158}, {'#CHROM': 1, 'POS': 1749174}, {'#CHROM': 1, 'POS': 3061224}, {'#CHROM': 1, 'POS': 3589337}, <...>]}

{'$or': [{'ID': 'rs149434212', 'REF': 'C', 'ALT': 'T'}, {'ID': 'rs72901712', 'REF': 'G', 'ALT': 'A'}, {'ID': 'rs145474533', 'REF': 'G', 'ALT': 'C'}, {'ID': 'rs12096573', 'REF': 'G', 'ALT': 'T'}, {'ID': 'rs10909978', 'REF': 'G', 'ALT': 'A'}, <...>]}

Although these queries use an index scan (IXSCAN), they run for many hours.
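The post reports that the `ID` `$in` query finishes in seconds, so one workaround worth testing is to reshape the slow queries around `$in`. The Python sketch below is an assumption, not something verified on this dataset, and `group_chrom_pos`, `id_prefilter` and `keep_exact_matches` are hypothetical helper names. It collapses the `#CHROM`/`POS` `$or` into one clause per chromosome with `$in` on `POS` (the same set of matching documents), and replaces the `ID`/`REF`/`ALT` `$or` with an `ID` `$in` prefilter followed by an exact `REF`/`ALT` check on the client. The filters are plain dicts in the same form as the queries above and could be passed to PyMongo's `collection.find`:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (ID, REF, ALT)


def group_chrom_pos(pairs: Iterable[Tuple[int, int]]) -> dict:
    """Rewrite (#CHROM, POS) pairs as one $or clause per chromosome.

    Matches the same documents as {'$or': [{'#CHROM': c, 'POS': p}, ...]},
    but uses an $in on POS instead of one clause per variant.
    """
    positions_by_chrom: Dict[int, Set[int]] = defaultdict(set)
    for chrom, pos in pairs:
        positions_by_chrom[chrom].add(pos)
    if not positions_by_chrom:
        raise ValueError("at least one (#CHROM, POS) pair is required")
    # Chromosomes keep their first-seen order; positions are sorted and deduplicated.
    clauses = [
        {'#CHROM': chrom, 'POS': {'$in': sorted(positions)}}
        for chrom, positions in positions_by_chrom.items()
    ]
    # A single chromosome needs no $or wrapper.
    return clauses[0] if len(clauses) == 1 else {'$or': clauses}


def id_prefilter(triples: Iterable[Triple]) -> dict:
    """Build the fast {'ID': {'$in': [...]}} shape covering every requested rsID."""
    return {'ID': {'$in': list(dict.fromkeys(rsid for rsid, _, _ in triples))}}


def keep_exact_matches(docs: Iterable[dict], triples: Iterable[Triple]) -> List[dict]:
    """Keep documents whose (ID, REF, ALT) is one of the requested triples.

    Assumes REF and ALT are plain strings, as in the sample document.
    """
    wanted = set(triples)
    return [d for d in docs if (d.get('ID'), d.get('REF'), d.get('ALT')) in wanted]


if __name__ == "__main__":
    print(group_chrom_pos([(1, 1499125), (1, 1680158), (1, 1749174)]))
    requested = [('rs149434212', 'C', 'T'), ('rs72901712', 'G', 'A')]
    print(id_prefilter(requested))
    # Stand-in for documents that collection.find(id_prefilter(...)) might return.
    candidates = [
        {'ID': 'rs149434212', 'REF': 'C', 'ALT': 'T'},
        {'ID': 'rs149434212', 'REF': 'C', 'ALT': 'G'},
        {'ID': 'rs72901712', 'REF': 'G', 'ALT': 'A'},
    ]
    print(keep_exact_matches(candidates, requested))
```

Whether either rewrite actually beats the original `$or` depends on the server version, indexes and data, so it would need to be measured, for example by comparing `explain()` output for both forms.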

Please test the aforementioned queries thoroughly and improve their execution performance. This will help science!

1 vote

Platon shared this idea · Jun 18, 2022

Database: Performance