---
layout: default
title: Simple query string
parent: Full-text queries
grand_parent: Query DSL
nav_order: 70
---

Simple query string query

Use the simple_query_string query to specify a search expression, made up of terms and special operators, directly in the query string. The simple query string syntax is less strict than the query string syntax: the parser discards any invalid portions of the string and does not return errors for invalid syntax.

This query uses a simple syntax to parse the query string based on special operators and split the string into terms. After parsing, the query analyzes each term independently and then returns matching documents.

The following query performs fuzzy search on the title field:

GET _search
{
  "query": {
    "simple_query_string": {
      "query": "\"rises wind the\"~4 | *ising~2",
      "fields": ["title"]
    }
  }
}

{% include copy-curl.html %}

Simple query string syntax

A query string consists of terms and operators. A term is a single word (for example, in the query wind rises, the terms are wind and rises). If several terms are surrounded by quotation marks, they are treated as one phrase in which words are matched in the order they appear (for example, "wind rises"). Operators such as +, |, and - specify the Boolean logic used to interpret text in the query string.

Operators

Simple query string syntax supports the following operators.

| Operator | Description |
| :--- | :--- |
| `+` | Acts as the `AND` operator. |
| `\|` | Acts as the `OR` operator. |
| `*` | When used at the end of a term, signifies a prefix query. |
| `"` | Wraps several terms into a phrase (for example, `"wind rises"`). |
| `(`, `)` | Wrap a clause for precedence (for example, `wind + (rises \| rising)`). |
| `~n` | When used after a term (for example, `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`. |
| `-` | Negates the term. |

All of the preceding operators are reserved characters. To refer to them as raw characters and not operators, escape any of them with a backslash. When sending a JSON request, use \\ to escape reserved characters (because the backslash character is itself reserved, you must escape the backslash with another backslash).
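
As a rough illustration of the double escaping described above, the following Python sketch escapes reserved characters in user input and then serializes a request body. The helper name `escape_simple_query` is hypothetical (it is not part of any OpenSearch client), and it escapes every reserved character it finds, which may be more than the parser strictly requires.

```python
import json

# Reserved characters from the operator list above, plus the backslash itself.
RESERVED = set('+|*"()~-\\')


def escape_simple_query(text: str) -> str:
    """Prefix every reserved character with a single backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


escaped = escape_simple_query("C++ (draft)")
body = {
    "query": {
        "simple_query_string": {"query": escaped, "fields": ["title"]}
    }
}

print(escaped)           # one backslash before each reserved character
print(json.dumps(body))  # JSON serialization doubles each backslash
```

The query string seen by the parser contains a single backslash per reserved character. The doubled backslashes appear only in the JSON text sent over the wire.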

Default operator

The default operator is OR (unless you set the default_operator to AND). The default operator dictates the overall query behavior. For example, consider an index containing the following documents:

PUT /customers/_doc/1
{
  "first_name":"Amber",
  "last_name":"Duke",
  "address":"880 Holmes Lane"
}

{% include copy-curl.html %}

PUT /customers/_doc/2
{
  "first_name":"Hattie",
  "last_name":"Bond",
  "address":"671 Bristol Street"
}

{% include copy-curl.html %}

PUT /customers/_doc/3
{
  "first_name":"Nanette",
  "last_name":"Bates",
  "address":"789 Madison St"
}

{% include copy-curl.html %}

PUT /customers/_doc/4
{
  "first_name":"Dale",
  "last_name":"Amber",
  "address":"467 Hutchinson Court"
}

{% include copy-curl.html %}

The following query attempts to find documents in which the address contains the words street or st and does not contain the word madison:

GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "street st -madison"
    }
  }
}

{% include copy-curl.html %}

However, the results include not only the expected document, but all four documents:

Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 2.2039728,
    "hits": [
      {
        "_index": "customers",
        "_id": "2",
        "_score": 2.2039728,
        "_source": {
          "first_name": "Hattie",
          "last_name": "Bond",
          "address": "671 Bristol Street"
        }
      },
      {
        "_index": "customers",
        "_id": "3",
        "_score": 1.2039728,
        "_source": {
          "first_name": "Nanette",
          "last_name": "Bates",
          "address": "789 Madison St"
        }
      },
      {
        "_index": "customers",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "Amber",
          "last_name": "Duke",
          "address": "880 Holmes Lane"
        }
      },
      {
        "_index": "customers",
        "_id": "4",
        "_score": 1,
        "_source": {
          "first_name": "Dale",
          "last_name": "Amber",
          "address": "467 Hutchinson Court"
        }
      }
    ]
  }
}

Because the default operator is OR, this query includes documents that contain the words street or st (documents 2 and 3) and documents that do not contain the word madison (documents 1 and 4).

To express the query intent correctly, precede -madison with +:

GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "street st +-madison"
    }
  }
}

{% include copy-curl.html %}

Alternatively, specify AND as the default operator and use disjunction for the words street and st:

GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "st|street -madison",
      "default_operator": "AND"
    }
  }
}

{% include copy-curl.html %}

The preceding query returns document 2:

Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 2.2039728,
    "hits": [
      {
        "_index": "customers",
        "_id": "2",
        "_score": 2.2039728,
        "_source": {
          "first_name": "Hattie",
          "last_name": "Bond",
          "address": "671 Bristol Street"
        }
      }
    ]
  }
}
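
The Boolean logic behind the two result sets can be checked outside OpenSearch. The following Python sketch is only a simulation under simple assumptions: it lowercases each address and splits it on white space as a stand-in for text analysis, then evaluates the two interpretations by hand. It does not call OpenSearch and does not reproduce scoring.

```python
# Addresses of the four example documents, keyed by document ID.
docs = {
    1: "880 Holmes Lane",
    2: "671 Bristol Street",
    3: "789 Madison St",
    4: "467 Hutchinson Court",
}


def tokens(address: str) -> set:
    """Rough stand-in for analysis: lowercase and split on white space."""
    return set(address.lower().split())


# Default operator OR: street OR st OR (NOT madison)
or_hits = [
    doc_id for doc_id, address in docs.items()
    if "street" in tokens(address)
    or "st" in tokens(address)
    or "madison" not in tokens(address)
]

# Default operator AND with "st|street -madison": (st OR street) AND (NOT madison)
and_hits = [
    doc_id for doc_id, address in docs.items()
    if ("st" in tokens(address) or "street" in tokens(address))
    and "madison" not in tokens(address)
]

print(or_hits)   # [1, 2, 3, 4]
print(and_hits)  # [2]
```

With OR, the negated clause alone is enough for a document to match, which is why documents 1 and 4 appear in the first response.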

Limit operators

To limit the supported operators for the simple query string parser, include the operators that you want to support, separated by |, in the flags parameter. For example, the following query enables only OR, AND, and FUZZY operators:

GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "bristol | madison +stre~2",
      "flags": "OR|AND|FUZZY"
    }
  }
}

{% include copy-curl.html %}

The following table lists all available operator flags.

| Flag | Description |
| :--- | :--- |
| `ALL` (default) | Enables all operators. |
| `AND` | Enables the `+` (`AND`) operator. |
| `ESCAPE` | Enables `\` as the escape character. |
| `FUZZY` | Enables the `~n` operator after a word, where `n` is an integer denoting the allowed edit distance for matching. |
| `NEAR` | Enables the `~n` operator after a phrase, where `n` is the maximum number of positions allowed between matching tokens. Same as `SLOP`. |
| `NONE` | Disables all operators. |
| `NOT` | Enables the `-` (`NOT`) operator. |
| `OR` | Enables the `\|` (`OR`) operator. |
| `PHRASE` | Enables the `"` (quotation marks) operator for phrase search. |
| `PRECEDENCE` | Enables the `(` and `)` (parentheses) operators for operator precedence. |
| `PREFIX` | Enables the `*` (prefix) operator. |
| `SLOP` | Enables the `~n` operator after a phrase, where `n` is the maximum number of positions allowed between matching tokens. Same as `NEAR`. |
| `WHITESPACE` | Enables white space characters as characters on which the text is split. |
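
When the flags value is assembled in application code, a small guard against misspelled flag names can help. The following Python sketch is one possible approach. The helper name `build_flags` is hypothetical, and the set of names is copied from the table above rather than read from OpenSearch.

```python
# Flag names copied from the table above.
KNOWN_FLAGS = {
    "ALL", "AND", "ESCAPE", "FUZZY", "NEAR", "NONE", "NOT",
    "OR", "PHRASE", "PRECEDENCE", "PREFIX", "SLOP", "WHITESPACE",
}


def build_flags(*names: str) -> str:
    """Join flag names with | after checking each one against the table."""
    unknown = [name for name in names if name not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"Unknown flags: {unknown}")
    return "|".join(names)


body = {
    "query": {
        "simple_query_string": {
            "fields": ["address"],
            "query": "bristol | madison +stre~2",
            "flags": build_flags("OR", "AND", "FUZZY"),
        }
    }
}

print(body["query"]["simple_query_string"]["flags"])  # OR|AND|FUZZY
```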

Wildcard expressions

You can specify wildcard expressions using the * special character, which replaces zero or more characters. For example, the following query searches in all fields that end with name:

GET /customers/_search
{
  "query": {
    "simple_query_string" : {
      "query":    "Amber Bond",
      "fields": [ "*name" ] 
    }
  }
}

{% include copy-curl.html %}

Boosting

Use the caret (^) boost operator to boost the relevance score of a field by a multiplier. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is 1.

For example, the following query searches the first_name and last_name fields and boosts matches from the first_name field by a factor of 2:

GET /customers/_search
{
  "query": {
    "simple_query_string" : {
      "query":    "Amber",
      "fields": [ "first_name^2", "last_name" ] 
    }
  }
}

{% include copy-curl.html %}
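
If boosts are kept in application configuration, the fields array can be generated from them. The following Python sketch assumes a plain mapping of field name to multiplier. The helper name `boosted_fields` is hypothetical, and it omits the caret suffix when the boost equals the default of 1.

```python
def boosted_fields(boosts: dict) -> list:
    """Turn {field: boost} into the "field^boost" strings used in fields."""
    fields = []
    for name, boost in boosts.items():
        # A boost of 1 is the default, so the suffix can be left out.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


fields = boosted_fields({"first_name": 2, "last_name": 1})
print(fields)  # ['first_name^2', 'last_name']
```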

Multi-position tokens

For multi-position tokens, simple query string creates a match phrase query. Thus, if you specify ml, machine learning as synonyms and search for ml, OpenSearch searches for ml OR "machine learning".

Alternatively, you can match multi-position tokens using conjunctions. If you set auto_generate_synonyms_phrase_query to false, OpenSearch searches for ml OR (machine AND learning).

For example, the following query searches for the text ml models and specifies not to auto-generate a match phrase query for each synonym:

GET /testindex/_search
{
  "query": {
    "simple_query_string": {
      "fields": ["title"],
      "query": "ml models",
      "auto_generate_synonyms_phrase_query": false
    }
  }
}

{% include copy-curl.html %}

For this query, OpenSearch creates the following Boolean query: (ml OR (machine AND learning)) models.
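
The difference between the phrase form and the conjunction form of the synonym clause can be seen in a small simulation. The following Python sketch makes several assumptions: the three titles are invented for illustration, analysis is reduced to lowercasing and splitting on white space, and only the synonym clause is evaluated (the models term is left out). It does not call OpenSearch.

```python
def tokens(text: str) -> list:
    """Rough stand-in for analysis: lowercase and split on white space."""
    return text.lower().split()


def has_phrase(text: str, phrase: str) -> bool:
    """True if the words of phrase appear adjacent and in order in text."""
    words, target = tokens(text), tokens(phrase)
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


def has_all(text: str, terms: list) -> bool:
    """True if every term appears somewhere in text, in any order."""
    return set(terms) <= set(tokens(text))


# Hypothetical titles, not taken from any real index.
titles = [
    "ml in practice",
    "machine learning basics",
    "learning to repair a machine",
]

# auto_generate_synonyms_phrase_query = true: ml OR "machine learning"
phrase_hits = [
    t for t in titles
    if "ml" in tokens(t) or has_phrase(t, "machine learning")
]

# auto_generate_synonyms_phrase_query = false: ml OR (machine AND learning)
conjunction_hits = [
    t for t in titles
    if "ml" in tokens(t) or has_all(t, ["machine", "learning"])
]

print(phrase_hits)       # the first two titles
print(conjunction_hits)  # all three titles
```

The conjunction form is looser: it accepts the third title, in which machine and learning both occur but not as an adjacent phrase.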

Parameters

The following table lists the top-level parameters that the simple_query_string query supports. All parameters except query are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `query` | String | The text that may contain expressions in the simple query string syntax to use for search. Required. |
| `analyze_wildcard` | Boolean | Specifies whether OpenSearch should attempt to analyze wildcard terms. Default is `false`. |
| `analyzer` | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the analyzer is the default analyzer for the index. |
| `auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create `match_phrase` queries automatically for multi-term synonyms. Default is `true`. |
| `default_operator` | String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br>Default is `OR`. |
| `fields` | String array | The list of fields to search (for example, `"fields": ["title^4", "description"]`). Supports wildcards. If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`. The maximum number of fields that can be searched at the same time is defined by `indices.query.bool.max_clause_count`, which is 1,024 by default. |
| `flags` | String | A `\|`-delimited string of operator flags to enable (see Limit operators). Default is `ALL`. |
| `fuzzy_max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`. |
| `fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is `true` (swap "n" and "i") and 2 if it is `false` (delete "n", insert "n"). If `fuzzy_transpositions` is `false`, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases. |
| `fuzzy_prefix_length` | Integer | The number of beginning characters left unchanged for fuzzy matching. Default is `0`. |
| `lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`. |
| `minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `OR` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, `wind often rising` does not match `The Wind Rises`. If `minimum_should_match` is 1, it matches. For details, see Minimum should match. |
| `quote_field_suffix` | String | This option supports searching for exact matches (surrounded with quotation marks) using a different analysis method than non-exact matches use. For example, if `quote_field_suffix` is `.exact` and you search for `\"lightly\"` in the `title` field, OpenSearch searches for the word `lightly` in the `title.exact` field. This second field might use a different type (for example, `keyword` rather than `text`) or a different analyzer. |
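
The distances quoted for fuzzy_transpositions can be reproduced with a short edit-distance routine. The following Python sketch implements the optimal string alignment variant of edit distance, in which a swap of two adjacent characters optionally counts as one edit. It is an illustration of the arithmetic only and is not the code OpenSearch uses internally. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance with insert, delete, substitute, and optional adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute or keep
            )
            if (
                transpositions
                and i > 1 and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("wind", "wnid"))                          # 1
print(edit_distance("wind", "wnid", transpositions=False))    # 2
print(edit_distance("wind", "rewind", transpositions=False))  # 2
```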