OpenSearch/docs/reference/migration/migrate_7_0/search.asciidoc

[[breaking_70_search_changes]]
=== Search and Query DSL changes

==== Changes to queries
*   The default value for the `transpositions` parameter of the `fuzzy` query
    has been changed to `true`.

*   The `query_string` options `use_dismax`, `split_on_whitespace`,
    `all_fields`, `locale`, `auto_generate_phrase_query` and
    `lowercase_expanded_terms`, deprecated in 6.x, have been removed.

*   Purely negative queries (only MUST_NOT clauses) now return a score of `0`
    rather than `1`.

*   The boundary specified using geohashes in the `geo_bounding_box` query
    now includes the entire geohash cell, instead of just the geohash center.

*   Attempts to generate multi-term phrase queries against non-text fields
    with a custom analyzer will now throw an exception.
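
For example, a request that relies on the pre-7.0 matching behaviour could set
`transpositions` explicitly instead of depending on the default. This is only a
minimal sketch; the `user` field and the search value are hypothetical:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "fuzzy": {
            "user": {
                "value": "ki",
                "transpositions": false
            }
        }
    }
}
--------------------------------------------------
// CONSOLE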

==== Adaptive replica selection enabled by default

Adaptive replica selection has been enabled by default. If you wish to return to
the older round robin of search requests, you can use the
`cluster.routing.use_adaptive_replica_selection` setting:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": false
    }
}
--------------------------------------------------
// CONSOLE

==== Search API returns `400` for invalid requests

The Search API now returns `400 - Bad request` where it previously returned
`500 - Internal Server Error` for the following kinds of invalid request:

*   the result window is too large
*   sort is used in combination with rescore
*   the rescore window is too large
*   the number of slices is too large
*   keep alive for scroll is too large
*   number of filters in the adjacency matrix aggregation is too large
*   script compilation errors
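
As an illustration of the first case, and assuming it runs against an index that
keeps the default `index.max_result_window` of `10000`, the following request asks
for a window of 10001 hits and is expected to be rejected with `400 - Bad request`:

[source,js]
--------------------------------------------------
GET /_search
{
    "from": 10000,
    "size": 1
}
--------------------------------------------------
// NOTCONSOLE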

==== Scroll queries cannot use the `request_cache` anymore

Setting `request_cache:true` on a query that creates a scroll (`scroll=1m`)
was deprecated in 6.x and now returns a `400 - Bad request`.
Scroll queries are not meant to be cached.
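
For instance, a request of the following shape is now rejected; dropping the
`request_cache` parameter makes it valid again. The `my_index` index name is
hypothetical:

[source,js]
--------------------------------------------------
GET /my_index/_search?scroll=1m&request_cache=true
{
    "query": {
        "match_all": {}
    }
}
--------------------------------------------------
// NOTCONSOLE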

==== Term Suggesters supported distance algorithms

The following string distance algorithms were given additional names in 6.2 and
their existing names were deprecated. The deprecated names have now been
removed.

*   `levenstein` - replaced by `levenshtein`
*   `jarowinkler` - replaced by `jaro_winkler`
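
A term suggester request using one of the new names might look like the sketch
below. The `my_index` index, the `message` field, the suggestion name and the
suggest text are all hypothetical:

[source,js]
--------------------------------------------------
POST /my_index/_search
{
    "suggest": {
        "my-suggestion": {
            "text": "serch engine",
            "term": {
                "field": "message",
                "string_distance": "jaro_winkler"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE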


==== Limiting the number of terms that can be used in a Terms Query request

Executing a Terms Query with a lot of terms may degrade the cluster performance,
as each additional term demands extra processing and memory.
To safeguard against this, the maximum number of terms that can be used in a
Terms Query request has been limited to 65536. This default maximum can be changed
for a particular index with the index setting `index.max_terms_count`.
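
If you need a different limit, one option is to set it when the index is created.
In this sketch the `my_index` name and the value `100000` are arbitrary choices
for illustration:

[source,js]
--------------------------------------------------
PUT /my_index
{
    "settings": {
        "index.max_terms_count": 100000
    }
}
--------------------------------------------------
// CONSOLE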


==== Limiting the length of regex that can be used in a Regexp Query request

Executing a Regexp Query with a long regex string may degrade search performance.
To safeguard against this, the maximum length of regex that can be used in a
Regexp Query request has been limited to 1000 characters. This default maximum can be changed
for a particular index with the index setting `index.max_regex_length`.
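
The limit can be raised for an index that is known to need longer expressions,
for example at index creation time. The `my_regex_index` name and the value
`2000` below are purely illustrative:

[source,js]
--------------------------------------------------
PUT /my_regex_index
{
    "settings": {
        "index.max_regex_length": 2000
    }
}
--------------------------------------------------
// CONSOLE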

==== Invalid `_search` request body

Search requests with extra content after the main object will no longer be accepted
by the `_search` endpoint. A parsing exception will be thrown instead.
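
As a sketch of what is now rejected, the second object in the body below counts
as extra content after the main object, so this request is expected to fail with
a parsing exception:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match_all": {}
    }
}
{
    "size": 10
}
--------------------------------------------------
// NOTCONSOLE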

==== Context Completion Suggester

The ability to query and index context-enabled suggestions without context,
deprecated in 6.x, has been removed. Context-enabled suggestion queries
without contexts have to visit every suggestion, which degrades search performance
considerably.

For geo context the value of the `path` parameter is now validated against the mapping,
and the context is only accepted if `path` points to a field with `geo_point` type.
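
A mapping that should pass this validation could look like the following sketch,
in which `path` refers to a `geo_point` field in the same mapping. The index name,
field names and `precision` value are hypothetical, and the example assumes the
typeless mapping format:

[source,js]
--------------------------------------------------
PUT /place
{
    "mappings": {
        "properties": {
            "suggest": {
                "type": "completion",
                "contexts": [
                    {
                        "name": "location",
                        "type": "geo",
                        "precision": 4,
                        "path": "loc"
                    }
                ]
            },
            "loc": {
                "type": "geo_point"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE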

==== Semantics changed for `max_concurrent_shard_requests`

`max_concurrent_shard_requests` used to limit the total number of concurrent shard
requests a single high-level search request can execute. In 7.0 this changed to be the
maximum number of concurrent shard requests per node. The default is now `5`.
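
The value can be set per request as a URL parameter. In the sketch below the
value `3` is arbitrary; under the 7.0 semantics it limits how many shard requests
this search runs concurrently on each node, not across the whole cluster:

[source,js]
--------------------------------------------------
GET /_search?max_concurrent_shard_requests=3
{
    "query": {
        "match_all": {}
    }
}
--------------------------------------------------
// CONSOLE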

==== `max_score` set to `null` when scores are not tracked

`max_score` used to be set to `0` whenever scores were not tracked. `null` is now used
instead, which is a more appropriate value when scores are not available.
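
One case where scores are not tracked is a request with `size` set to `0`. For a
request like the sketch below, the `hits.max_score` field of the response is
expected to be `null` rather than `0`:

[source,js]
--------------------------------------------------
GET /_search
{
    "size": 0,
    "query": {
        "match_all": {}
    }
}
--------------------------------------------------
// CONSOLE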