layout	title	nav_order	parent	grand_parent
default	Intervals	80	Full-text queries	Query DSL

Intervals query

The intervals query matches documents based on the proximity and order of matching terms. It applies a set of matching rules to terms contained in the specified field. The query generates sequences of minimal intervals that span terms in the text. You can combine the intervals and filter them by parent sources.

Consider an index containing the following documents:

PUT /testindex/_doc/1
{
  "title": "key-value pairs are efficiently stored in a hash table"
}

{% include copy-curl.html %}

PUT /testindex/_doc/2
{
  "title": "store key-value pairs in a hash map"
}

{% include copy-curl.html %}

For example, the following query searches for documents containing the phrase key-value pairs (with no gap separating the terms) followed by either hash table or hash map:

GET /testindex/_search
{
  "query": {
    "intervals": {
      "title": {
        "all_of": {
          "ordered": true,
          "intervals": [
            {
              "match": {
                "query": "key-value pairs",
                "max_gaps": 0,
                "ordered": true
              }
            },
            {
              "any_of": {
                "intervals": [
                  {
                    "match": {
                      "query": "hash table"
                    }
                  },
                  {
                    "match": {
                      "query": "hash map"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

{% include copy-curl.html %}

The query returns both documents:

Response

{: .text-delta}

{
  "took": 1011,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.25,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.25,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      },
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.14285713,
        "_source": {
          "title": "key-value pairs are efficiently stored in a hash table"
        }
      }
    ]
  }
}

Parameters

The query accepts the name of the field (<field>) as a top-level parameter:

GET _search
{
  "query": {
    "intervals": {
      "<field>": {
        ... 
      }
    }
  }
}

{% include copy-curl.html %}

The <field> accepts the following rule objects that are used to match documents based on terms, order, and proximity.

Rule	Description
`match`	Matches analyzed text.
`prefix`	Matches terms that start with a specified set of characters.
`wildcard`	Matches terms using a wildcard pattern.
`fuzzy`	Matches terms that are similar to the provided term within a specified edit distance.
`all_of`	Combines multiple rules using a conjunction (`AND`).
`any_of`	Combines multiple rules using a disjunction (`OR`).

The `match` rule

The match rule matches analyzed text. The following table lists all parameters the match rule supports.

Parameter	Required/Optional	Data type	Description
`query`	Required	String	Text for which to search.
`analyzer`	Optional	String	The analyzer used to analyze the `query` text. Default is the analyzer specified for the `<field>`.
`filter`	Optional	Interval filter rule object	A rule used to filter returned intervals.
`max_gaps`	Optional	Integer	The maximum allowed number of positions between the matching terms. Terms further apart than `max_gaps` are not considered matches. If `max_gaps` is not specified or is set to `-1`, terms are considered matches regardless of their position. If `max_gaps` is set to `0`, matching terms must appear next to each other. Default is `-1`.
`ordered`	Optional	Boolean	Specifies whether matching terms must appear in their specified order. Default is `false`.
`use_field`	Optional	String	Specifies to search this field instead of the top-level `<field>`. Terms are analyzed using the search analyzer specified for this field. By specifying `use_field`, you can search across multiple fields as if they were all the same field. For example, if you index the same text into stemmed and unstemmed fields, you can search for stemmed tokens that are near unstemmed ones.
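To make `max_gaps` and `ordered` concrete, the following Python sketch models them on token positions using the two sample titles. It is a simplified illustration, not the engine's algorithm: `tokenize` is a rough stand-in for the default analyzer, `match_spans` is a hypothetical helper, and gaps are assumed to be the unmatched positions inside the span.

```python
import re
from itertools import product

def tokenize(text):
    # Rough stand-in for the default analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def match_spans(text, query, max_gaps=-1, ordered=False):
    """Return (start, end, gaps) for every way the query terms can be placed."""
    tokens = tokenize(text)
    terms = tokenize(query)
    # Candidate positions for each query term.
    candidates = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    spans = set()
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # each query term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # terms must appear in query order
        start, end = min(combo), max(combo)
        gaps = (end - start + 1) - len(combo)  # unmatched positions inside the span
        if max_gaps == -1 or gaps <= max_gaps:
            spans.add((start, end, gaps))
    return sorted(spans)

doc1 = "key-value pairs are efficiently stored in a hash table"
doc2 = "store key-value pairs in a hash map"

print(match_spans(doc2, "key-value pairs", max_gaps=0, ordered=True))  # [(1, 3, 0)]
print(match_spans(doc1, "pairs hash", max_gaps=5))                      # [(2, 8, 5)]
print(match_spans(doc1, "pairs hash", max_gaps=4))                      # []
print(match_spans(doc2, "hash pairs", ordered=True))                    # []
```

In this model, lowering `max_gaps` from 5 to 4 drops the first title because five terms separate pairs and hash there, and setting `ordered` to `true` rejects a query whose terms appear in the opposite order in the text.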

The `prefix` rule

The prefix rule matches terms that start with a specified set of characters (prefix). The prefix can expand to match at most 128 terms. If the prefix matches more than 128 terms, an error is returned. The following table lists all parameters the prefix rule supports.

Parameter	Required/Optional	Data type	Description
`prefix`	Required	String	The prefix used to match terms.
`analyzer`	Optional	String	The analyzer used to normalize the `prefix`. Default is the analyzer specified for the `<field>`.
`use_field`	Optional	String	Specifies to search this field instead of the top-level `<field>`. The `prefix` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.
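The expansion limit can be pictured with a short Python sketch. This is only a model of the behavior described above: `expand_prefix` is a hypothetical helper, the term list stands in for the terms indexed in a field, and `ValueError` is an arbitrary choice for the error.

```python
MAX_EXPANSIONS = 128  # limit stated in the text above

def expand_prefix(prefix, indexed_terms, limit=MAX_EXPANSIONS):
    """Return the indexed terms that start with prefix, or raise if there are too many."""
    matches = sorted(t for t in set(indexed_terms) if t.startswith(prefix))
    if len(matches) > limit:
        raise ValueError(
            f"prefix '{prefix}' expands to {len(matches)} terms; limit is {limit}"
        )
    return matches

terms = ["hash", "hashed", "has", "map", "table"]
print(expand_prefix("has", terms))  # ['has', 'hash', 'hashed']

# A prefix shared by 200 distinct terms exceeds the limit and raises an error.
many_terms = [f"key{i}" for i in range(200)]
try:
    expand_prefix("key", many_terms)
except ValueError as err:
    print(err)
```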

The `wildcard` rule

The wildcard rule matches terms using a wildcard pattern. The wildcard pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the wildcard rule supports.

Parameter	Required/Optional	Data type	Description
`pattern`	Required	String	The wildcard pattern used to match terms. Specify `?` to match any single character or `*` to match zero or more characters.
`analyzer`	Optional	String	The analyzer used to normalize the `pattern`. Default is the analyzer specified for the `<field>`.
`use_field`	Optional	String	Specifies to search this field instead of the top-level `<field>`. The `pattern` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.

Patterns that start with * or ? can hinder search performance because they increase the number of iterations required to match terms.
{: .important}
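The following Python sketch suggests why a leading wildcard costs more. It assumes a sorted list of indexed terms and uses the literal characters before the first wildcard to narrow the scan, which is a simplification and not how the engine is actually implemented. The `expand_wildcard` and `literal_prefix` names are hypothetical, and `fnmatchcase` is used only because it shares the `?` and `*` semantics (it also treats `[` specially, which the wildcard rule is not assumed to do).

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

def literal_prefix(pattern):
    # Characters before the first wildcard can be used to narrow the scan.
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern

def expand_wildcard(pattern, sorted_terms):
    """Return (matching terms, number of terms examined)."""
    prefix = literal_prefix(pattern)
    i = bisect_left(sorted_terms, prefix)  # jump to the first term >= prefix
    matches, examined = [], 0
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        examined += 1
        if fnmatchcase(sorted_terms[i], pattern):
            matches.append(sorted_terms[i])
        i += 1
    return matches, examined

terms = sorted(["hash", "hashed", "heap", "map", "mash", "table", "trash"])

print(expand_wildcard("ha*", terms))   # (['hash', 'hashed'], 2)
print(expand_wildcard("ha?h", terms))  # (['hash'], 2)
print(expand_wildcard("*ash", terms))  # (['hash', 'mash', 'trash'], 7)
```

With a literal prefix, only two of the seven terms are examined. With a leading `*`, there is no prefix to narrow the range, so every term is examined.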

The `fuzzy` rule

The fuzzy rule matches terms that are similar to the provided term within a specified edit distance. The fuzzy pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the fuzzy rule supports.

Parameter	Required/Optional	Data type	Description
`term`	Required	String	The term to match.
`analyzer`	Optional	String	The analyzer used to normalize the `term`. Default is the analyzer specified for the `<field>`.
`fuzziness`	Optional	String	The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`transpositions`	Optional	Boolean	Setting `transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `transpositions` is `false`, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`prefix_length`	Optional	Integer	The number of beginning characters left unchanged for fuzzy matching. Default is 0.
`use_field`	Optional	String	Specifies to search this field instead of the top-level `<field>`. The `term` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.
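The distances quoted for `fuzziness` and `transpositions` can be checked with a small Python sketch. It uses the restricted Damerau-Levenshtein distance (optimal string alignment) as a stand-in, so treat it as an illustration of the counting rules rather than the engine's own implementation. The `edit_distance` name is hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance with optional adjacent-character swaps (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap adjacent characters
    return d[-1][-1]

print(edit_distance("wined", "wind"))                        # 1
print(edit_distance("wind", "wnid", transpositions=True))    # 1
print(edit_distance("wind", "wnid", transpositions=False))   # 2
print(edit_distance("wind", "rewind", transpositions=False)) # 2
```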

The `all_of` rule

The all_of rule combines multiple rules using a conjunction (AND). The following table lists all parameters the all_of rule supports.

Parameter	Required/Optional	Data type	Description
`intervals`	Required	Array of rule objects	An array of rules to combine. A document must match all rules in order to be returned in the results.
`filter`	Optional	Interval filter rule object	A rule used to filter returned intervals.
`max_gaps`	Optional	Integer	The maximum allowed number of positions between the matching terms. Terms further apart than `max_gaps` are not considered matches. If `max_gaps` is not specified or is set to `-1`, terms are considered matches regardless of their position. If `max_gaps` is set to `0`, matching terms must appear next to each other. Default is `-1`.
`ordered`	Optional	Boolean	If `true`, intervals generated by the rules should appear in the specified order. Default is `false`.

The `any_of` rule

The any_of rule combines multiple rules using a disjunction (OR). The following table lists all parameters the any_of rule supports.

Parameter	Required/Optional	Data type	Description
`intervals`	Required	Array of rule objects	An array of rules to combine. A document must match at least one rule in order to be returned in the results.
`filter`	Optional	Interval filter rule object	A rule used to filter returned intervals.
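As a rough model of how `all_of` and `any_of` combine intervals, the Python sketch below replays the query from the beginning of this page on token positions. All helper names are hypothetical. For brevity, every inner `match` rule is modeled as an exact phrase, which is stricter than the unordered default used by the `hash table` and `hash map` rules in that query.

```python
import re

def tokenize(text):
    # Rough stand-in for the default analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase(tokens, query):
    """Intervals (start, end) where the query terms appear adjacent and in order."""
    terms = tokenize(query)
    n = len(terms)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms]

def any_of(*interval_lists):
    # Disjunction: an interval produced by any rule qualifies.
    return sorted(set().union(*interval_lists))

def all_of_ordered(first, second):
    # Conjunction with "ordered": true: the first interval must end before the second starts.
    return sorted((a[0], b[1]) for a in first for b in second if a[1] < b[0])

def search(title):
    tokens = tokenize(title)
    tail = any_of(phrase(tokens, "hash table"), phrase(tokens, "hash map"))
    return all_of_ordered(phrase(tokens, "key-value pairs"), tail)

print(search("key-value pairs are efficiently stored in a hash table"))  # [(0, 9)]
print(search("store key-value pairs in a hash map"))                      # [(1, 7)]
print(search("a hash map stores key-value pairs"))                        # []
```

The last title contains both parts, but in the wrong order, so the ordered conjunction produces no interval.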

The `filter` rule

The filter rule is used to restrict the results. The following table lists all parameters the filter rule supports.

Parameter	Required/Optional	Data type	Description
`after`	Optional	Query object	A query used to return intervals that follow an interval specified in the filter rule.
`before`	Optional	Query object	A query used to return intervals that are before an interval specified in the filter rule.
`contained_by`	Optional	Query object	A query used to return intervals contained by an interval specified in the filter rule.
`containing`	Optional	Query object	A query used to return intervals that contain an interval specified in the filter rule.
`not_contained_by`	Optional	Query object	A query used to return intervals that are not contained by an interval specified in the filter rule.
`not_containing`	Optional	Query object	A query used to return intervals that do not contain an interval specified in the filter rule.
`not_overlapping`	Optional	Query object	A query used to return intervals that do not overlap with an interval specified in the filter rule.
`overlapping`	Optional	Query object	A query used to return intervals that overlap with an interval specified in the filter rule.
`script`	Optional	Script object	A script used to match documents. This script must return `true` or `false`.

Example: Filters

The following query searches for documents containing the words pairs and hash that are within five positions of each other and don't contain the word efficiently between them:

POST /testindex/_search
{
  "query": {
    "intervals" : {
      "title" : {
        "match" : {
          "query" : "pairs hash",
          "max_gaps" : 5,
          "filter" : {
            "not_containing" : {
              "match" : {
                "query" : "efficiently"
              }
            }
          }
        }
      }
    }
  }
}

{% include copy-curl.html %}

The response contains only document 2:

Response

{: .text-delta}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.25,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.25,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      }
    ]
  }
}
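The filter relations can be read as simple comparisons on interval boundaries. The Python sketch below is an assumption-laden model, with hypothetical helper names and intervals written as `(start, end)` position pairs, that replays the `not_containing` example on the positions of the sample titles.

```python
def contains(outer, inner):
    # outer spans every position of inner
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def before(a, b):
    # a ends before b starts
    return a[1] < b[0]

def after(a, b):
    # a starts after b ends
    return a[0] > b[1]

def overlapping(a, b):
    # a and b share at least one position
    return a[0] <= b[1] and b[0] <= a[1]

def not_containing(intervals, filter_intervals):
    # Keep intervals that contain none of the filter intervals.
    return [iv for iv in intervals if not any(contains(iv, f) for f in filter_intervals)]

# Positions counted from 0 in the sample titles:
# document 1: pairs=2, efficiently=4, hash=8; document 2: pairs=3, hash=6.
doc1 = not_containing([(2, 8)], [(4, 4)])  # efficiently sits inside the interval
doc2 = not_containing([(3, 6)], [])        # document 2 has no efficiently
print(doc1)  # []
print(doc2)  # [(3, 6)]
```

Only document 2 keeps an interval, which agrees with the response shown above.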

Example: Script filters

Alternatively, you can write your own script filter to include with the intervals query using the following variables:

interval.start: The position (term number) where the interval starts.
interval.end: The position (term number) where the interval ends.
interval.gaps: The number of words between the terms.

For example, the following query searches for the words map and hash appearing next to each other within a specified range of positions. Terms are numbered starting with 0, so in the text store key-value pairs in a hash map, store is at position 0, key is at position 1, and so on. The interval must start after a (position 5) and end no later than the last term, map (position 7):

POST /testindex/_search
{
  "query": {
    "intervals" : {
      "title" : {
        "match" : {
          "query" : "map hash",
          "filter" : {
            "script" : {
              "source" : "interval.start > 5 && interval.end < 8 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}

{% include copy-curl.html %}

The response contains document 2:

Response

{: .text-delta}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.5,
        "_source": {
          "title": "store key-value pairs in a hash map"
        }
      }
    ]
  }
}
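To see why document 2 passes the script, the following Python sketch computes the three script variables for the words hash and map in that title and evaluates the same condition. The `Interval` class and `script_filter` function are hypothetical stand-ins for the values the script receives, and gaps are assumed to be the unmatched positions inside the interval.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    start: int  # position of the first term in the interval
    end: int    # position of the last term in the interval
    gaps: int   # unmatched positions inside the interval

def script_filter(interval):
    # Python translation of the script source used in the query above.
    return interval.start > 5 and interval.end < 8 and interval.gaps == 0

tokens = ["store", "key", "value", "pairs", "in", "a", "hash", "map"]
positions = sorted(tokens.index(term) for term in ["map", "hash"])
start, end = positions[0], positions[-1]
interval = Interval(start, end, (end - start + 1) - len(positions))

print(interval)                 # Interval(start=6, end=7, gaps=0)
print(script_filter(interval))  # True
```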

Interval minimization

To ensure that queries run in linear time, the intervals query minimizes the intervals. For example, consider a document containing the text a b c d c. You can use the following query to search for d that is contained by a and c:

POST /testindex/_search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "d",
          "filter" : {
            "contained_by" : {
              "match" : {
                "query" : "a c"
              }
            }
          }
        }
      }
    }
  }
}

{% include copy-curl.html %}

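The query returns no results because the interval for a c is minimized to the first a and the first c (a b c), and there is no d between these terms. The wider interval a b c d c does contain d, but it is never considered because it contains the smaller match.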
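The effect of minimization on this example can be reproduced with a brute-force Python sketch. It is a model of the outcome only (the `minimal_intervals` helper is hypothetical and does not run in linear time): it lists every unordered match and then discards any interval that contains a smaller match.

```python
from itertools import product

def minimal_intervals(tokens, terms):
    """Unordered matches of all terms, keeping only intervals that contain no smaller match."""
    candidates = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    spans = {(min(c), max(c)) for c in product(*candidates)}
    return sorted(
        s for s in spans
        if not any(o != s and s[0] <= o[0] and o[1] <= s[1] for o in spans)
    )

tokens = "a b c d c".split()
outer = minimal_intervals(tokens, ["a", "c"])  # (0, 4) is dropped because it contains (0, 2)
inner = minimal_intervals(tokens, ["d"])
# contained_by: keep inner intervals that lie inside some outer interval.
hits = [i for i in inner if any(o[0] <= i[0] and i[1] <= o[1] for o in outer)]

print(outer, inner, hits)  # [(0, 2)] [(3, 3)] []
```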
