
[float]
[[breaking_70_search_changes]]
=== Search and Query DSL changes

[float]
==== Changes to queries
*   The default value of the `transpositions` parameter of the `fuzzy` query
    has been changed to `true`.

*   The `query_string` options `use_dismax`, `split_on_whitespace`,
    `all_fields`, `locale`, `auto_generate_phrase_query` and
    `lowercase_expanded_terms`, deprecated in 6.x, have been removed.

*   Purely negative queries (only MUST_NOT clauses) now return a score of `0`
    rather than `1`.

*   The boundary specified using geohashes in the `geo_bounding_box` query
    now includes the entire geohash cell, instead of just the geohash center.

*   Attempts to generate multi-term phrase queries against non-text fields
    with a custom analyzer will now throw an exception.

*   An `envelope` crossing the dateline in a `geo_shape` query is now processed
    correctly when specified using the REST API, instead of having its left and
    right corners flipped.
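
As a minimal sketch of the first item above, a `fuzzy` query can set
`transpositions` explicitly to keep the pre-7.0 behavior. The index name
`my_index`, the field `user` and the search term are placeholders for
illustration only.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "query": {
        "fuzzy": {
            "user": {
                "value": "ki",
                "transpositions": false
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE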

[float]
==== Adaptive replica selection enabled by default

Adaptive replica selection has been enabled by default. If you wish to return to
the older round robin of search requests, you can use the
`cluster.routing.use_adaptive_replica_selection` setting:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": false
    }
}
--------------------------------------------------
// CONSOLE

[float]
==== Search API returns `400` for invalid requests

The Search API now returns `400 - Bad request` where it previously returned
`500 - Internal Server Error` in the following cases of invalid request:

*   the result window is too large
*   sort is used in combination with rescore
*   the rescore window is too large
*   the number of slices is too large
*   keep alive for scroll is too large
*   number of filters in the adjacency matrix aggregation is too large
*   script compilation errors
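
For illustration, the request below asks for hits beyond the result window.
Assuming a hypothetical index `my_index` that keeps the default
`index.max_result_window` of 10000, it should be rejected with
`400 - Bad request`, because `from` plus `size` exceeds that window.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "from": 10000,
    "size": 10
}
--------------------------------------------------
// NOTCONSOLE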

[float]
==== Scroll queries cannot use the `request_cache` anymore

Setting `request_cache:true` on a query that creates a scroll (`scroll=1m`)
was deprecated in 6.x and now returns a `400 - Bad request`.
Scroll queries are not meant to be cached.
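
The following request is a sketch of what is now rejected. The index name
`my_index` is a placeholder. Dropping the `request_cache=true` parameter should
make the request valid again.

[source,js]
--------------------------------------------------
GET /my_index/_search?scroll=1m&request_cache=true
{
    "query": { "match_all": {} }
}
--------------------------------------------------
// NOTCONSOLE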

[float]
==== Scroll queries cannot use `rescore` anymore

Including a rescore clause on a query that creates a scroll (`scroll=1m`) was
deprecated in 6.5 and now returns a `400 - Bad request`. Allowing
rescore on scroll queries would break the scroll sort. In the 6.x line, the
rescore clause was silently ignored for scroll queries; in the 5.x line it was
allowed.
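
As an example of a request that is now considered invalid, the sketch below
combines `scroll` with a `rescore` clause. The index `my_index`, the field
`message` and the query text are assumptions made for illustration.

[source,js]
--------------------------------------------------
GET /my_index/_search?scroll=1m
{
    "query": {
        "match": { "message": "quick brown fox" }
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query": {
                "match_phrase": { "message": "quick brown fox" }
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE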

[float]
==== Term Suggesters supported distance algorithms

The following string distance algorithms were given additional names in 6.2 and
their existing names were deprecated. The deprecated names have now been
removed.

*   `levenstein` - replaced by `levenshtein`
*   `jarowinkler` - replaced by `jaro_winkler`
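
A term suggester request using one of the replacement names might look like
the sketch below. The index `my_index`, the field `message`, the suggestion
name and the input text are placeholders.

[source,js]
--------------------------------------------------
POST /my_index/_search
{
    "suggest": {
        "my_suggestion": {
            "text": "serach",
            "term": {
                "field": "message",
                "string_distance": "levenshtein"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE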

[float]
==== `popular` mode for Suggesters

The `popular` mode for Suggesters (`term` and `phrase`) now uses the doc frequency
(instead of the sum of the doc frequency) of the input terms to compute the frequency
threshold for candidate suggestions.

[float]
==== Limiting the number of terms that can be used in a Terms Query request

Executing a Terms Query with a large number of terms may degrade cluster performance,
as each additional term demands extra processing and memory.
To safeguard against this, the maximum number of terms that can be used in a
Terms Query request has been limited to 65536. This default maximum can be changed
for a particular index with the index setting `index.max_terms_count`.
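
For instance, the limit could be raised on a single index as sketched below.
The index name `my_index` and the value `100000` are illustrative assumptions,
not recommendations.

[source,js]
--------------------------------------------------
PUT /my_index/_settings
{
    "index.max_terms_count": 100000
}
--------------------------------------------------
// NOTCONSOLE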

[float]
==== Limiting the length of regex that can be used in a Regexp Query request

Executing a Regexp Query with a long regex string may degrade search performance.
To safeguard against this, the maximum length of regex that can be used in a
Regexp Query request has been limited to 1000. This default maximum can be changed
for a particular index with the index setting `index.max_regex_length`.
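
A sketch of adjusting this limit for one index follows. The index name
`my_index` and the value `2000` are placeholders chosen for illustration.

[source,js]
--------------------------------------------------
PUT /my_index/_settings
{
    "index.max_regex_length": 2000
}
--------------------------------------------------
// NOTCONSOLE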

[float]
==== Limiting the number of auto-expanded fields

Executing queries that use automatic expansion of fields (e.g. `query_string`, `simple_query_string`
or `multi_match`) can have performance issues for indices with a large number of fields.
To safeguard against this, a hard limit of 1024 fields has been introduced for queries
using the "all fields" mode (`"default_field": "*"`) or other fieldname expansions (e.g. `"foo*"`).
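
One way to stay clear of the limit is to name the fields to search instead of
relying on expansion. The sketch below assumes a hypothetical index `my_index`
with `title` and `body` fields.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "query": {
        "query_string": {
            "query": "quick brown fox",
            "fields": ["title", "body"]
        }
    }
}
--------------------------------------------------
// NOTCONSOLE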

[float]
==== Invalid `_search` request body

Search requests with extra content after the main object will no longer be accepted
by the `_search` endpoint. A parsing exception will be thrown instead.
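
As an illustration, the body below contains a second object after the main
search object, so it should now fail with a parsing exception. The index name
`my_index` is a placeholder.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "query": { "match_all": {} }
}
{
    "size": 1
}
--------------------------------------------------
// NOTCONSOLE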

[float]
==== Context Completion Suggester

The ability to query and index context-enabled suggestions without context,
deprecated in 6.x, has been removed. Context-enabled suggestion queries
without contexts have to visit every suggestion, which degrades search performance
considerably.

For geo context the value of the `path` parameter is now validated against the mapping,
and the context is only accepted if `path` points to a field with `geo_point` type.
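
A suggestion query against a context-enabled completion field therefore needs
a `contexts` clause, as in the sketch below. The index `place`, the field
`suggest`, the context name `place_type` and its values are assumptions made
for illustration.

[source,js]
--------------------------------------------------
POST /place/_search
{
    "suggest": {
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "contexts": {
                    "place_type": ["cafe", "restaurants"]
                }
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE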

[float]
==== Semantics changed for `max_concurrent_shard_requests`

`max_concurrent_shard_requests` used to limit the total number of concurrent shard
requests a single high-level search request can execute. In 7.0 it instead limits the
maximum number of concurrent shard requests per node. The default is now `5`.
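
The parameter is passed on the search request itself. In the sketch below the
index name `my_index` and the value `3` are placeholders.

[source,js]
--------------------------------------------------
GET /my_index/_search?max_concurrent_shard_requests=3
{
    "query": { "match_all": {} }
}
--------------------------------------------------
// NOTCONSOLE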

[float]
==== `max_score` set to `null` when scores are not tracked

`max_score` used to be set to `0` whenever scores were not tracked. `null` is now used
instead, which is a more appropriate value for a scenario where scores are not available.
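
One case where scores are not tracked is a request that returns no hits. For
the sketch below, run against a hypothetical index `my_index`, the
`hits.max_score` field of the response should be `null` rather than `0`.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "size": 0,
    "query": { "match_all": {} }
}
--------------------------------------------------
// NOTCONSOLE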

[float]
==== Negative boosts are not allowed

Setting a negative `boost` in a query, deprecated in 6.x, is not allowed in this version.
To deboost a specific query you can use a `boost` between 0 and 1.
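
As a sketch, the second clause below is deboosted relative to the first by
giving it a `boost` of `0.2`. The index `my_index`, the fields `title` and
`body`, and the boost values are illustrative assumptions.

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": { "query": "search", "boost": 1.0 } } },
                { "match": { "body": { "query": "search", "boost": 0.2 } } }
            ]
        }
    }
}
--------------------------------------------------
// NOTCONSOLE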