
Status: Future Consideration
Categories: Atlas Search
Created by: Chaoyang Zhu
Created on: Nov 20, 2025

How to tokenize an email field?

What problem are you trying to solve?

Focus on the what and why of the need you have, not the how you'd like it solved.

I want to tokenize an email address. For example, the address jack.ma@gail-abc.ws.com could be tokenized as below:

jack

jack.ma

jack.m

ma

jack.ma@gail

jack.ma@gail-abc.ws.com

gail-abc.ws.com

abc.ws.com

ws.com

com

abc.ws.co

With tokens such as these, I could search for the email using any form of input.

What would you like to see happen?

Describe the desired outcome or enhancement.


Why is this important to you or your team?

Explain how the request adds value or solves a business need.


What steps, if any, are you taking today to manage this problem?



  • Admin
    Coral Parmar
    Dec 9, 2025

    Hello,

    Thank you for the detailed examples! Achieving that level of flexibility for email search—where you need to match partials like jack.m as well as specific parts like ws.com—is a very common requirement.

    To support all the token examples you listed, the best approach is to use a Multi-Field mapping in Atlas Search.

    This simply means we will index the email field in two different ways simultaneously:

    1. As Text: To capture whole words (handling jack, ma, ws, com).

    2. As Autocomplete: To capture partial characters as the user types (handling jack.m, abc.ws.co).

    Here is the JSON configuration to paste into your Atlas Search Index editor. This maps the field emailAddress to use both the Standard Analyzer and the Autocomplete type.

    JSON

    {
      "mappings": {
        "dynamic": false,
        "fields": {
          "emailAddress": [
            {
              "type": "string",
              "analyzer": "lucene.standard"
            },
            {
              "type": "autocomplete",
              "tokenization": "nGram",
              "minGrams": 2,
              "maxGrams": 15,
              "foldDiacritics": true
            }
          ]
        }
      }
    }
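
    If you prefer to create the index from mongosh rather than pasting the JSON into the Atlas UI, the same definition can be passed to createSearchIndex. The snippet below is a minimal sketch: it assumes a deployment recent enough that db.collection.createSearchIndex() is available, and a collection named users (a placeholder), so adjust the collection and index names to your setup.

    JavaScript

    // Minimal sketch (assumptions: an Atlas deployment where
    // db.collection.createSearchIndex() is available, and a collection
    // named "users"; change both names to match your deployment).
    db.users.createSearchIndex("default", {
      mappings: {
        dynamic: false,
        fields: {
          emailAddress: [
            { type: "string", analyzer: "lucene.standard" },
            {
              type: "autocomplete",
              tokenization: "nGram",
              minGrams: 2,
              maxGrams: 15,
              foldDiacritics: true
            }
          ]
        }
      }
    });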

    To search across both of these strategies at once, you can use the compound operator in your aggregation pipeline.

    JavaScript

    [
      {
        "$search": {
          "index": "default",
          "compound": {
            "should": [
              {
                "text": {
                  "query": "jack.m",
                  "path": "emailAddress"
                }
              },
              {
                "autocomplete": {
                  "query": "jack.m",
                  "path": "emailAddress"
                }
              }
            ],
            "minimumShouldMatch": 1
          }
        }
      }
    ]
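
    To run this from mongosh, pass the pipeline to aggregate(). The sketch below assumes the documents live in a collection named users (a placeholder name) and that the index above was saved as "default"; the $project stage simply surfaces the relevance score so you can see what matched.

    JavaScript

    // Usage sketch: run the compound $search pipeline from mongosh.
    // Assumes a collection named "users" and an index named "default".
    db.users.aggregate([
      {
        $search: {
          index: "default",
          compound: {
            should: [
              { text: { query: "jack.m", path: "emailAddress" } },
              { autocomplete: { query: "jack.m", path: "emailAddress" } }
            ],
            minimumShouldMatch: 1
          }
        }
      },
      // Keep the output small: show the email and its search score.
      { $project: { emailAddress: 1, score: { $meta: "searchScore" } } },
      { $limit: 10 }
    ]);
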
    • Input jack, ma, ws, or com: The standard analyzer splits the email on separators such as "@" and "-", and the nGram autocomplete tokens cover short fragments like ws and com, so searching for any of these works.

    • Input jack.m or abc.ws: The autocomplete definition uses nGram tokenization, which indexes partial substrings of each token, allowing these partial inputs to succeed.

    • Input jack.ma@gail...: The search will match the tokens found within the full string (a scoring tweak that ranks these full-string matches first is sketched after this list).
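
    If you want those full-string or whole-token matches to rank above partial n-gram hits, one optional tweak (a suggestion rather than a requirement) is to boost the text clause inside the same compound query. The value 3 below is an arbitrary example; tune it for your data.

    JavaScript

    // Sketch: boost the standard-analyzer clause so whole-token matches
    // score higher than partial autocomplete matches. The boost value (3)
    // is arbitrary and should be tuned against your own data.
    {
      "$search": {
        "index": "default",
        "compound": {
          "should": [
            {
              "text": {
                "query": "jack.ma@gail-abc.ws.com",
                "path": "emailAddress",
                "score": { "boost": { "value": 3 } }
              }
            },
            {
              "autocomplete": {
                "query": "jack.ma@gail-abc.ws.com",
                "path": "emailAddress"
              }
            }
          ],
          "minimumShouldMatch": 1
        }
      }
    }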

    Helpful Resources

    For a deeper dive into these configurations, here is the relevant documentation:

    I hope this unblocks you! Please let me know if you run into any issues applying this JSON to your index.