[float] [[breaking_70_search_changes]] === Search and Query DSL changes //NOTE: The notable-breaking-changes tagged regions are re-used in the //Installation and Upgrade Guide //tag::notable-breaking-changes[] // end::notable-breaking-changes[] [float] ==== Off-heap terms index The terms dictionary is the part of the inverted index that records all terms that occur within a segment in sorted order. In order to provide fast retrieval, terms dictionaries come with a small terms index that allows for efficient random access by term. Until now this terms index had always been loaded on-heap. As of 7.0, the terms index is loaded on-heap for fields that only have unique values such as `_id` fields, and off-heap otherwise - likely most other fields. This is expected to reduce memory requirements but might slow down search requests if both below conditions are met: * The size of the data directory on each node is significantly larger than the amount of memory that is available to the filesystem cache. * The number of matches of the query is not several orders of magnitude greater than the number of terms that the query tries to match, either explicitly via `term` or `terms` queries, or implicitly via multi-term queries such as `prefix`, `wildcard` or `fuzzy` queries. This change affects both existing indices created with Elasticsearch 6.x and new indices created with Elasticsearch 7.x. [float] ==== Changes to queries * The default value for `transpositions` parameter of `fuzzy` query has been changed to `true`. * The `query_string` options `use_dismax`, `split_on_whitespace`, `all_fields`, `locale`, `auto_generate_phrase_query` and `lowercase_expanded_terms` deprecated in 6.x have been removed. * Purely negative queries (only MUST_NOT clauses) now return a score of `0` rather than `1`. * The boundary specified using geohashes in the `geo_bounding_box` query now include entire geohash cell, instead of just geohash center. * Attempts to generate multi-term phrase queries against non-text fields with a custom analyzer will now throw an exception. * An `envelope` crossing the dateline in a `geo_shape `query is now processed correctly when specified using REST API instead of having its left and right corners flipped. * Attempts to set `boost` on inner span queries will now throw a parsing exception. [float] ==== Adaptive replica selection enabled by default Adaptive replica selection has been enabled by default. If you wish to return to the older round robin of search requests, you can use the `cluster.routing.use_adaptive_replica_selection` setting: [source,js] -------------------------------------------------- PUT /_cluster/settings { "transient": { "cluster.routing.use_adaptive_replica_selection": false } } -------------------------------------------------- // CONSOLE [float] ==== Search API returns `400` for invalid requests The Search API returns `400 - Bad request` while it would previously return `500 - Internal Server Error` in the following cases of invalid request: * the result window is too large * sort is used in combination with rescore * the rescore window is too large * the number of slices is too large * keep alive for scroll is too large * number of filters in the adjacency matrix aggregation is too large * script compilation errors [float] ==== Scroll queries cannot use the `request_cache` anymore Setting `request_cache:true` on a query that creates a scroll (`scroll=1m`) has been deprecated in 6 and will now return a `400 - Bad request`. Scroll queries are not meant to be cached. [float] ==== Scroll queries cannot use `rescore` anymore Including a rescore clause on a query that creates a scroll (`scroll=1m`) has been deprecated in 6.5 and will now return a `400 - Bad request`. Allowing rescore on scroll queries would break the scroll sort. In the 6.x line, the rescore clause was silently ignored (for scroll queries), and it was allowed in the 5.x line. [float] ==== Term Suggesters supported distance algorithms The following string distance algorithms were given additional names in 6.2 and their existing names were deprecated. The deprecated names have now been removed. * `levenstein` - replaced by `levenshtein` * `jarowinkler` - replaced by `jaro_winkler` [float] ==== `popular` mode for Suggesters The `popular` mode for Suggesters (`term` and `phrase`) now uses the doc frequency (instead of the sum of the doc frequency) of the input terms to compute the frequency threshold for candidate suggestions. [float] ==== Limiting the number of terms that can be used in a Terms Query request Executing a Terms Query with a lot of terms may degrade the cluster performance, as each additional term demands extra processing and memory. To safeguard against this, the maximum number of terms that can be used in a Terms Query request has been limited to 65536. This default maximum can be changed for a particular index with the index setting `index.max_terms_count`. [float] ==== Limiting the length of regex that can be used in a Regexp Query request Executing a Regexp Query with a long regex string may degrade search performance. To safeguard against this, the maximum length of regex that can be used in a Regexp Query request has been limited to 1000. This default maximum can be changed for a particular index with the index setting `index.max_regex_length`. [float] ==== Limiting the number of auto-expanded fields Executing queries that use automatic expansion of fields (e.g. `query_string`, `simple_query_string` or `multi_match`) can have performance issues for indices with a large numbers of fields. To safeguard against this, a hard limit of 1024 fields has been introduced for queries using the "all fields" mode ("default_field": "*") or other fieldname expansions (e.g. "foo*"). [float] ==== Invalid `_search` request body Search requests with extra content after the main object will no longer be accepted by the `_search` endpoint. A parsing exception will be thrown instead. [float] ==== Doc-value fields default format The format of doc-value fields is changing to be the same as what could be obtained in 6.x with the special `use_field_mapping` format. This is mostly a change for date fields, which are now formatted based on the format that is configured in the mappings by default. This behavior can be changed by specifying a <> within the doc-value field. [float] ==== Context Completion Suggester The ability to query and index context enabled suggestions without context, deprecated in 6.x, has been removed. Context enabled suggestion queries without contexts have to visit every suggestion, which degrades the search performance considerably. For geo context the value of the `path` parameter is now validated against the mapping, and the context is only accepted if `path` points to a field with `geo_point` type. [float] ==== Semantics changed for `max_concurrent_shard_requests` `max_concurrent_shard_requests` used to limit the total number of concurrent shard requests a single high level search request can execute. In 7.0 this changed to be the max number of concurrent shard requests per node. The default is now `5`. [float] ==== `max_score` set to `null` when scores are not tracked `max_score` used to be set to `0` whenever scores are not tracked. `null` is now used instead which is a more appropriate value for a scenario where scores are not available. [float] ==== Negative boosts are not allowed Setting a negative `boost` for a query or a field, deprecated in 6x, is not allowed in this version. To deboost a specific query or field you can use a `boost` comprise between 0 and 1. [float] ==== Negative scores are not allowed in Function Score Query Negative scores in the Function Score Query are deprecated in 6.x, and are not allowed in this version. If a negative score is produced as a result of computation (e.g. in `script_score` or `field_value_factor` functions), an error will be thrown. [float] ==== The filter context has been removed The `filter` context has been removed from Elasticsearch's query builders, the distinction between queries and filters is now decided in Lucene depending on whether queries need to access score or not. As a result `bool` queries with `should` clauses that don't need to access the score will no longer set their `minimum_should_match` to 1. This behavior has been deprecated in the previous major version. //tag::notable-breaking-changes[] [float] ==== `hits.total` is now an object in the search response The total hits that match the search request is now returned as an object with a `value` and a `relation`. `value` indicates the number of hits that match and `relation` indicates whether the value is accurate (`eq`) or a lower bound (`gte`): [source,js] -------------------------------------------------- { "hits": { "total": { "value": 1000, "relation": "eq" }, ... } } -------------------------------------------------- // NOTCONSOLE The `total` object in the response indicates that the query matches exactly 1000 documents ("eq"). The `value` is always accurate (`"relation": "eq"`) when `track_total_hits` is set to true in the request. You can also retrieve `hits.total` as a number in the rest response by adding `rest_total_hits_as_int=true` in the request parameter of the search request. This parameter has been added to ease the transition to the new format and will be removed in the next major version (8.0). //end::notable-breaking-changes[] [float] ==== `hits.total` is omitted in the response if `track_total_hits` is disabled (false) If `track_total_hits` is set to `false` in the search request the search response will set `hits.total` to null and the object will not be displayed in the rest layer. You can add `rest_total_hits_as_int=true` in the search request parameters to get the old format back (`"total": -1`). //tag::notable-breaking-changes[] [float] ==== `track_total_hits` defaults to 10,000 By default search request will count the total hits accurately up to `10,000` documents. If the total number of hits that match the query is greater than this value, the response will indicate that the returned value is a lower bound: [source,js] -------------------------------------------------- { "_shards": ... "timed_out": false, "took": 100, "hits": { "max_score": 1.0, "total" : { "value": 10000, <1> "relation": "gte" <2> }, "hits": ... } } -------------------------------------------------- // NOTCONSOLE <1> There are at least 10000 documents that match the query <2> This is a lower bound (`"gte"`). You can force the count to always be accurate by setting `"track_total_hits` to true explicitly in the search request. //end::notable-breaking-changes[] [float] ==== Limitations on Similarities Lucene 8 introduced more constraints on similarities, in particular: - scores must not be negative, - scores must not decrease when term freq increases, - scores must not increase when norm (interpreted as an unsigned long) increases. [float] ==== Weights in Function Score must be positive Negative `weight` parameters in the `function_score` are no longer allowed. [float] ==== Query string and Simple query string limit expansion of fields to 1024 The number of automatically expanded fields for the "all fields" mode (`"default_field": "*"`) for the query_string and simple_query_string queries is now 1024 fields.