
Status: Future Consideration
Categories: Atlas Search
Created by: Chaoyang Zhu
Created on: Nov 20, 2025

How to tokenize an email field?

What problem are you trying to solve?

Focus on the what and why of the need you have, not the how you'd like it solved.

I want to tokenize an email address. For example, the address jack.ma@gail-abc.ws.com could be tokenized as below:

jack

jack.ma

jack.m

ma

jack.ma@gail

jack.ma@gail-abc.ws.com

gail-abc.ws.com

abc.ws.com

ws.com

com

abc.ws.co

With tokens such as these, I could search for the email using any form of input.

What would you like to see happen?

Describe the desired outcome or enhancement.


Why is this important to you or your team?

Explain how the request adds value or solves a business need.


What steps, if any, are you taking today to manage this problem?



  • Admin
    Coral Parmar
    Dec 9, 2025

    Hello,

    Thank you for the detailed examples! Achieving that level of flexibility for email search—where you need to match partials like jack.m as well as specific parts like ws.com—is a very common requirement.

    To support all the token examples you listed, the best approach is to use a Multi-Field mapping in Atlas Search.

    This simply means we will index the email field in two different ways simultaneously:

    1. As Text: To capture whole words (handling jack, ma, ws, com).

    2. As Autocomplete: To capture partial characters as the user types (handling jack.m, abc.ws.co).

    Here is the JSON configuration to paste into your Atlas Search Index editor. This maps the field emailAddress to use both the Standard Analyzer and the Autocomplete type.

    JSON

    {
      "mappings": {
        "dynamic": false,
        "fields": {
          "emailAddress": [
            {
              "type": "string",
              "analyzer": "lucene.standard"
            },
            {
              "type": "autocomplete",
              "tokenization": "nGram",
              "minGrams": 2,
              "maxGrams": 15,
              "foldDiacritics": true
            }
          ]
        }
      }
    }
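
    If you prefer to create the index from mongosh rather than pasting the JSON into the Atlas UI, the same definition can be passed to createSearchIndex. The snippet below is a minimal sketch: it assumes a deployment recent enough that db.collection.createSearchIndex() is available, and a collection named users (a placeholder), so adjust the collection and index names to your setup.

    JavaScript

    // Minimal sketch (assumptions: an Atlas deployment where
    // db.collection.createSearchIndex() is available, and a collection
    // named "users"; change both names to match your deployment).
    db.users.createSearchIndex("default", {
      mappings: {
        dynamic: false,
        fields: {
          emailAddress: [
            { type: "string", analyzer: "lucene.standard" },
            {
              type: "autocomplete",
              tokenization: "nGram",
              minGrams: 2,
              maxGrams: 15,
              foldDiacritics: true
            }
          ]
        }
      }
    });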

    To search across both of these strategies at once, you can use the compound operator in your aggregation pipeline.

    JavaScript

    [
      {
        "$search": {
          "index": "default",
          "compound": {
            "should": [
              {
                "text": {
                  "query": "jack.m",
                  "path": "emailAddress"
                }
              },
              {
                "autocomplete": {
                  "query": "jack.m",
                  "path": "emailAddress"
                }
              }
            ],
            "minimumShouldMatch": 1
          }
        }
      }
    ]
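
    To run this from mongosh, pass the pipeline to aggregate(). The sketch below assumes the documents live in a collection named users (a placeholder name) and that the index above was saved as "default"; the $project stage simply surfaces the relevance score so you can see what matched.

    JavaScript

    // Usage sketch: run the compound $search pipeline from mongosh.
    // Assumes a collection named "users" and an index named "default".
    db.users.aggregate([
      {
        $search: {
          index: "default",
          compound: {
            should: [
              { text: { query: "jack.m", path: "emailAddress" } },
              { autocomplete: { query: "jack.m", path: "emailAddress" } }
            ],
            minimumShouldMatch: 1
          }
        }
      },
      // Keep the output small: show the email and its search score.
      { $project: { emailAddress: 1, score: { $meta: "searchScore" } } },
      { $limit: 10 }
    ]);
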
    • Input jack, ma, ws, or com: The standard analyzer splits the email on separators such as "@" and "-", and the nGram autocomplete tokens cover short fragments like ws and com, so searching for any of these works.

    • Input jack.m or abc.ws: The autocomplete definition uses nGram tokenization, which indexes partial substrings of each token, allowing these partial inputs to succeed.

    • Input jack.ma@gail...: The search will match the tokens found within the full string (a scoring tweak that ranks these full-string matches first is sketched after this list).
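
    If you want those full-string or whole-token matches to rank above partial n-gram hits, one optional tweak (a suggestion rather than a requirement) is to boost the text clause inside the same compound query. The value 3 below is an arbitrary example; tune it for your data.

    JavaScript

    // Sketch: boost the standard-analyzer clause so whole-token matches
    // score higher than partial autocomplete matches. The boost value (3)
    // is arbitrary and should be tuned against your own data.
    {
      "$search": {
        "index": "default",
        "compound": {
          "should": [
            {
              "text": {
                "query": "jack.ma@gail-abc.ws.com",
                "path": "emailAddress",
                "score": { "boost": { "value": 3 } }
              }
            },
            {
              "autocomplete": {
                "query": "jack.ma@gail-abc.ws.com",
                "path": "emailAddress"
              }
            }
          ],
          "minimumShouldMatch": 1
        }
      }
    }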

    Helpful Resources

    For a deeper dive into these configurations, here is the relevant documentation:

    I hope this unblocks you! Please let me know if you run into any issues applying this JSON to your index.