
Term-level queries

Term-level queries search an index for documents that contain an exact search term. Documents returned by a term-level query are not sorted by their relevance scores.

When working with text data, use term-level queries for fields mapped as keyword only.

Term-level queries are not suited for searching analyzed text fields. To search analyzed text fields, use a full-text query.
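The following Python sketch illustrates why an exact term lookup behaves differently on keyword and analyzed text fields. The `simple_analyze` function is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters; real analyzers do more, so treat this as an approximation rather than the engine's behavior.

```python
import re

def simple_analyze(text):
    """Hypothetical stand-in for a text analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def term_matches(indexed_terms, query_term):
    """A term-level lookup compares the query term to the indexed terms exactly."""
    return query_term in indexed_terms

value = "KING HENRY IV"
keyword_terms = [value]              # keyword field: the whole value is one term
text_terms = simple_analyze(value)   # text field: the value is split into tokens

print(term_matches(keyword_terms, "KING HENRY IV"))  # True
print(term_matches(text_terms, "KING HENRY IV"))     # False
print(term_matches(text_terms, "henry"))             # True
```

In this sketch, the exact value is found only in the keyword field, while the analyzed field holds the lowercase tokens instead.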

Term-level query types

The following table lists all term-level query types.

| Query type | Description |
| :--- | :--- |
| term | Searches for documents with an exact term in a specific field. |
| terms | Searches for documents with one or more terms in a specific field. |
| terms_set | Searches for documents that match a minimum number of terms in a specific field. |
| ids | Searches for documents by document ID. |
| range | Searches for documents with field values in a specific range. |
| prefix | Searches for documents with terms that begin with a specific prefix. |
| exists | Searches for documents with any indexed value in a specific field. |
| fuzzy | Searches for documents with terms that are similar to the search term within the maximum allowed Levenshtein distance. The Levenshtein distance measures the number of one-character changes needed to change one term to another term. |
| wildcard | Searches for documents with terms that match a wildcard pattern. |
| regexp | Searches for documents with terms that match a regular expression. |

Term

Use the term query to search for an exact term in a field.

GET shakespeare/_search
{
  "query": {
    "term": {
      "line_id": {
        "value": "61809"
      }
    }
  }
}

{% include copy-curl.html %}

Terms

Use the terms query to search for multiple terms in the same field.

GET shakespeare/_search
{
  "query": {
    "terms": {
      "line_id": [
        "61809",
        "61810"
      ]
    }
  }
}

{% include copy-curl.html %}

The query returns documents that match any of the specified terms.

Terms set

With a terms set query, you can search for documents that match a minimum number of exact terms in a specified field. The terms_set query is similar to the terms query, but you can specify the minimum number of matching terms that are required to return a document. You can specify this number either in a field in the index or with a script.

As an example, consider an index that contains students with classes they have taken. When setting up the mapping for this index, you need to provide a numeric field that specifies the minimum number of matching terms that are required to return a document:

PUT students
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "classes": {
        "type": "keyword"
      },
      "min_required": {
        "type": "integer"
      }
    }
  }
}

{% include copy-curl.html %}

Next, index two documents that correspond to students:

PUT students/_doc/1
{
  "name": "Mary Major",
  "classes": [ "CS101", "CS102", "MATH101" ],
  "min_required": 2
}

{% include copy-curl.html %}

PUT students/_doc/2
{
  "name": "John Doe",
  "classes": [ "CS101", "MATH101", "ENG101" ],
  "min_required": 2
}

{% include copy-curl.html %}

Now search for students who have taken at least two of the following classes: CS101, CS102, MATH101:

GET students/_search
{
  "query": {
    "terms_set": {
      "classes": {
        "terms": [ "CS101", "CS102", "MATH101" ],
        "minimum_should_match_field": "min_required"
      }
    }
  }
}

{% include copy-curl.html %}

The response contains both students:

{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4544616,
    "hits" : [
      {
        "_index" : "students",
        "_id" : "1",
        "_score" : 1.4544616,
        "_source" : {
          "name" : "Mary Major",
          "classes" : [
            "CS101",
            "CS102",
            "MATH101"
          ],
          "min_required" : 2
        }
      },
      {
        "_index" : "students",
        "_id" : "2",
        "_score" : 0.5013843,
        "_source" : {
          "name" : "John Doe",
          "classes" : [
            "CS101",
            "MATH101",
            "ENG101"
          ],
          "min_required" : 2
        }
      }
    ]
  }
}
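The matching rule behind this result can be sketched in Python. This is a minimal model of the counting logic only, using the two documents indexed earlier; the `terms_set_matches` helper is a hypothetical name, and the sketch does not model relevance scoring.

```python
def terms_set_matches(doc, field, terms, min_field):
    """Return True if the document contains at least doc[min_field] of the query terms."""
    matched = set(doc[field]) & set(terms)
    return len(matched) >= doc[min_field]

students = [
    {"name": "Mary Major", "classes": ["CS101", "CS102", "MATH101"], "min_required": 2},
    {"name": "John Doe", "classes": ["CS101", "MATH101", "ENG101"], "min_required": 2},
]
query_terms = ["CS101", "CS102", "MATH101"]

hits = [s["name"] for s in students
        if terms_set_matches(s, "classes", query_terms, "min_required")]
print(hits)  # ['Mary Major', 'John Doe']
```

Mary Major matches three of the query terms and John Doe matches two, so both meet the required minimum of two.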

To specify the minimum number of terms a document should match with a script, provide the script in the minimum_should_match_script field:

GET students/_search
{
  "query": {
    "terms_set": {
      "classes": {
        "terms": [ "CS101", "CS102", "MATH101" ],
        "minimum_should_match_script": {
          "source": "Math.min(params.num_terms, doc['min_required'].value)"
        }
      }
    }
  }
}

{% include copy-curl.html %}
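As a rough Python illustration of what the script above computes: it caps the required number of matches at the number of terms in the query, so a document whose min_required value exceeds the term count can still match. The helper names and the min_required value of 5 are hypothetical and used only for illustration.

```python
def required_matches(num_terms, min_required):
    """Mirror of the script: never require more matches than there are query terms."""
    return min(num_terms, min_required)

def matches(doc_classes, doc_min_required, terms):
    """Count the query terms present in the document and compare to the capped minimum."""
    matched = len(set(doc_classes) & set(terms))
    return matched >= required_matches(len(terms), doc_min_required)

terms = ["CS101", "CS102", "MATH101"]

print(required_matches(len(terms), 2))  # 2
print(required_matches(len(terms), 5))  # 3 (capped at the number of query terms)
print(matches(["CS101", "CS102", "MATH101"], 5, terms))  # True
print(matches(["CS101", "MATH101", "ENG101"], 5, terms))  # False
```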

IDs

Use the ids query to search for one or more document ID values.

GET shakespeare/_search
{
  "query": {
    "ids": {
      "values": [
        34229,
        91296
      ]
    }
  }
}

{% include copy-curl.html %}

Range

You can search for a range of values in a field with the range query.

To search for documents where the line_id value is >= 10 and <= 20:

GET shakespeare/_search
{
  "query": {
    "range": {
      "line_id": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}

{% include copy-curl.html %}

| Parameter | Behavior |
| :--- | :--- |
| gte | Greater than or equal to. |
| gt | Greater than. |
| lte | Less than or equal to. |
| lt | Less than. |
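The four parameters correspond to ordinary comparison operators. The following Python sketch shows how a set of bounds combines; the `in_range` helper is hypothetical and only models the comparison, not how the search engine evaluates range queries internally.

```python
import operator

# Map each range parameter to the comparison it represents.
RANGE_OPS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}

def in_range(value, bounds):
    """Return True if the value satisfies every bound in the dictionary."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())

bounds = {"gte": 10, "lte": 20}
print([v for v in (9, 10, 15, 20, 21) if in_range(v, bounds)])  # [10, 15, 20]
print(in_range(10, {"gt": 10, "lt": 20}))  # False: gt excludes the boundary value
```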

In addition to the range query parameters, you can provide date formats or relation operators such as "contains" or "within." To see the supported field types for range queries, see Range query optional parameters. To see all date formats, see Formats.
{: .tip }

Assume that you have a products index and you want to find all the products that were added in the year 2019:

GET products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2019/01/01",
        "lte": "2019/12/31"
      }
    }
  }
}

{% include copy-curl.html %}

Specify relative dates by using date math.

To subtract 1 year and 1 day from the specified date, use the following query:

GET products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2019/01/01||-1y-1d"
      }
    }
  }
}

{% include copy-curl.html %}

The date specified first is the anchor date, which is the starting point for the date math. Follow the anchor date with two pipe symbols (||) and then a math expression, for example, +1d to add one day or -2w to subtract two weeks. The math expression is evaluated relative to the anchor date.
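To make the arithmetic concrete, the following Python sketch evaluates an anchor date followed by a math expression. It is a simplified model, not the engine's parser: the `apply_date_math` helper is hypothetical, it assumes the yyyy/MM/dd format used in the example, and it handles only the y, w, and d units.

```python
import re
from datetime import datetime, timedelta

def apply_date_math(expression, date_format="%Y/%m/%d"):
    """Evaluate 'ANCHOR||<math>' for the y, w, and d units only (simplified sketch)."""
    anchor_text, _, math = expression.partition("||")
    result = datetime.strptime(anchor_text, date_format)
    # Each step is a sign, an amount, and a unit, applied left to right.
    for sign, amount, unit in re.findall(r"([+-])(\d+)([ywd])", math):
        n = int(amount) if sign == "+" else -int(amount)
        if unit == "y":
            # Note: this sketch does not handle February 29 in non-leap target years.
            result = result.replace(year=result.year + n)
        elif unit == "w":
            result += timedelta(weeks=n)
        else:
            result += timedelta(days=n)
    return result

print(apply_date_math("2019/01/01||-1y-1d").strftime("%Y/%m/%d"))  # 2017/12/31
print(apply_date_math("2019/01/01||+1d").strftime("%Y/%m/%d"))     # 2019/01/02
print(apply_date_math("2019/01/01||-2w").strftime("%Y/%m/%d"))     # 2018/12/18
```

Under these assumptions, the query above selects documents created on or after December 31, 2017.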

You can also round dates by appending a forward slash followed by a date or time unit.

To find products added in the last year, with the start of the range rounded down to the beginning of the month:

GET products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "now-1y/M"
      }
    }
  }
}

{% include copy-curl.html %}

The keyword now refers to the current date and time.
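As an illustration, the following Python sketch computes the lower bound that now-1y/M resolves to for a fixed, hypothetical current time. It assumes that rounding for gte goes down to the start of the month, and the helper name is made up for this example.

```python
from datetime import datetime

def last_year_rounded_to_month(now):
    """Subtract one year, then round down to the first instant of that month."""
    # Note: this sketch does not handle a "now" that falls on February 29.
    one_year_ago = now.replace(year=now.year - 1)
    return one_year_ago.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

now = datetime(2024, 5, 17, 14, 30)  # fixed, hypothetical value standing in for "now"
print(last_year_rounded_to_month(now))  # 2023-05-01 00:00:00
```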

Prefix

Use the prefix query to search for terms that begin with a specific prefix.

GET shakespeare/_search
{
  "query": {
    "prefix": {
      "speaker": "KING"
    }
  }
}

{% include copy-curl.html %}

Exists

Use the exists query to search for documents that contain a specific field.

GET shakespeare/_search
{
  "query": {
    "exists": {
      "field": "speaker"
    }
  }
}

{% include copy-curl.html %}

Fuzzy

A fuzzy query searches for documents with terms that are similar to the search term within the maximum allowed Levenshtein distance. The Levenshtein distance measures the number of one-character changes needed to change one term to another term. These changes include:

  • Replacements: cat to bat
  • Insertions: cat to cats
  • Deletions: cat to at
  • Transpositions: cat to act

A fuzzy query creates a list of all possible expansions of the search term that fall within the Levenshtein distance. You can specify the maximum number of such expansions in the max_expansions field. Then it searches for documents that match any of the expansions.
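The edit distance described above can be computed with a small dynamic programming routine. The following Python sketch counts replacements, insertions, deletions, and adjacent transpositions as one change each. It is an illustrative implementation only; the search engine does not necessarily compute distances this way.

```python
def edit_distance(a, b):
    """Edit distance where replacements, insertions, deletions,
    and adjacent transpositions each cost one change."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            # Adjacent transposition, for example "ca" -> "ac".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

for target in ("bat", "cats", "at", "act"):
    print("cat ->", target, edit_distance("cat", target))  # each distance is 1
print(edit_distance("HALET", "HAMLET"))  # 1
```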

The following example query searches for the speaker HALET (misspelled HAMLET). The maximum edit distance is not specified, so the default AUTO edit distance is used:

GET shakespeare/_search
{
  "query": {
    "fuzzy": {
      "speaker": {
        "value": "HALET"
      }
    }
  }
}

{% include copy-curl.html %}

The response contains all documents where HAMLET is the speaker.

The following example query searches for the speaker HALET with advanced parameters:

GET shakespeare/_search
{
  "query": {
    "fuzzy": {
      "speaker": {
        "value": "HALET",
        "fuzziness": "2",
        "max_expansions": 40,
        "prefix_length": 0,
        "transpositions": true,
        "rewrite": "constant_score"
      }
    }
  }
}

{% include copy-curl.html %}

Wildcard

Use wildcard queries to search for terms that match a wildcard pattern.

| Feature | Behavior |
| :--- | :--- |
| * | Matches zero or more characters. |
| ? | Matches exactly one character. |

To search for terms that start with H and end with Y:

GET shakespeare/_search
{
  "query": {
    "wildcard": {
      "speaker": {
        "value": "H*Y"
      }
    }
  }
}

{% include copy-curl.html %}

If you change * to ?, the query returns no matches because ? matches exactly one character.
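The difference between the two wildcard characters can be sketched in Python by translating a pattern into a regular expression. The `wildcard_match` helper and the sample terms are hypothetical; the sketch models only the matching rule, not how the search engine executes wildcard queries.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (zero or more characters) and ? (exactly one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(term, pattern):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

sample_terms = ["HENRY", "HAMLET", "HORATIO"]  # hypothetical sample of indexed terms

print([t for t in sample_terms if wildcard_match(t, "H*Y")])  # ['HENRY']
print([t for t in sample_terms if wildcard_match(t, "H?Y")])  # []
print(wildcard_match("HAY", "H?Y"))  # True: exactly one character between H and Y
```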

Wildcard queries tend to be slow because they need to iterate over a lot of terms. Avoid placing wildcard characters at the beginning of a query because it could be a very expensive operation in terms of both resources and time.

Regexp

Use the regexp query to search for terms that match a regular expression.

The following regular expression matches any single uppercase or lowercase letter followed by amlet:

GET shakespeare/_search
{
  "query": {
    "regexp": {
      "play_name": "[a-zA-Z]amlet"
    }
  }
}

{% include copy-curl.html %}

A few important notes:

  • Regular expressions are applied to the terms in the field (that is, tokens), not the entire field.
  • Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see the Lucene documentation.
  • regexp queries can be expensive operations and require the search.allow_expensive_queries setting to be set to true. Before making frequent regexp queries, test their impact on cluster performance and examine alternative queries for achieving similar results.
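The following Python sketch approximates how the example pattern is applied to individual terms. It uses Python's re module, whose syntax differs from Lucene's in general, so this works only because the example pattern is valid in both. The sketch uses `re.fullmatch` on the assumption that a Lucene pattern must match an entire term, and the sample terms are hypothetical.

```python
import re

PATTERN = re.compile(r"[a-zA-Z]amlet")

def regexp_matches(term):
    """Return True only if the pattern matches the whole term."""
    return PATTERN.fullmatch(term) is not None

sample_terms = ["Hamlet", "hamlet", "amlet", "Hamlets"]  # hypothetical indexed terms

for term in sample_terms:
    print(term, regexp_matches(term))
# Hamlet True
# hamlet True
# amlet False   (no leading letter)
# Hamlets False (trailing character is not covered by the pattern)
```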