This commit removes some parameters deprecated in 6.x (or 5.x):
`use_dismax`, `split_on_whitespace`, `all_fields` and `lowercase_expanded_terms`.
Closes#25551
* Clarifies how the query_string splits textual part to build a query
Whitespaces are not considered as operators anymore in 6x but the documentation is not clear about it.
This commit changes the example in the documentation and adds a note regarding whitespaces and operators.
Closes#28719
Add fuzzy_transpositions parameter to multi_match and query_string queries.
Add fuzzy_transpositions, fuzzy_prefix_length and fuzzy_max_expansions
parameters to simple_query_string query.
* Limit the number of expanded fields it query_string and simple_query_string
This limits the number of automatically expanded fields for the "all fields"
mode (`"default_field": "*"`) for the `query_string` and `simple_query_string`
queries to 1024 fields.
Resolves#25105
* Add blurb about limit to the docs
Adding a check to QueryStringQueryBuilderTests that checks the override
behaviour of `quote_analyzer`, also adding documentation explaining the use of
this parameter in `query_string` query.
Closes#25417
* Remove the _all metadata field
This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string
This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi terms synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)
Note how the multi terms synonym "new york" produces a phrase query.
This commit changes the way we handle field expansion in `match`, `multi_match` and `query_string` query.
The main changes are:
- For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.
- For partial field names (with `*` suffix), the expansion is done only on `keyword`, `text`, `date`, `ip` and `number` field types. Other field types are simply ignored.
- For all fields (`*`), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.
- The `*` notation can also be used to set `default_field` option on`query_string` query. This should replace the needs for the extra option `use_all_fields` which is deprecated in this change.
This commit also rewrites simple `*` query to matchalldocs query when all fields are requested (Fixes#25556).
The same change should be done on `simple_query_string` for completeness.
`use_all_fields` option in `query_string` is also deprecated in this change, `default_field` should be set to `*` instead.
Relates #25551
This change refactors the query_string query to analyze the query text around logical operators of the query string the same way than a match_query/multi_match_query.
It also adds a type parameter that can be used to change the way multi fields query are built the same way than a multi_match query does.
Now that these queries share the same behavior regarding text analysis, some parameters are obsolete and have been deprecated:
split_on_whitespace: This setting is now ignored with a deprecation notice
if it is used explicitely. With this PR The query_string always splits on logical operator.
It simplifies the understanding of the other parameters that can have different meanings
depending on the value of split_on_whitespace.
auto_generate_phrase_queries: This setting is now ignored with a deprecation notice
if it is used explicitely. This setting only makes sense when the parser splits on whitespace.
use_dismax: This setting is now ignored with a deprecation notice
if it is used explicitely. The tie_breaker parameter is sufficient to handle best_fields/most_fields.
Fixes#25574
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
This commit introduces a new execution mode for the query_string query, which
is intended down the road to be a replacement for the current _all field.
It now does auto-field-expansion and auto-leniency when the following criteria
are ALL met:
The _all field is disabled
No default_field has been set in the index settings
No default_field has been set in the request
No fields are specified in the request
Additionally, a user can force the "all-like" execution by setting the
all_fields parameter to true.
When executing in all field mode, the query_string query will look at all the
fields in the mapping that are not metafields and can be searched, and
automatically expand the list of fields that are going to be queried.
Relates to #19784
Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply
only character-level normalization such as lowercasing or accent folding, which
is exactly what is needed to process queries that operate on partial terms such
as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the
`lowercase_expanded_terms` option is not necessary anymore. Furthermore, the
`locale` option was only needed in order to know how to perform the lowercasing,
so this one can be removed as well.
Closes#9978
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
* A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined.
* Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false.
* Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.