[float]
[[breaking_70_search_changes]]
=== Search and Query DSL changes

[float]
==== Off-heap terms index

The terms dictionary is the part of the inverted index that records all terms
that occur within a segment in sorted order. In order to provide fast retrieval,
terms dictionaries come with a small terms index that allows for efficient
random access by term. Until now this terms index had always been loaded
on-heap.

As of 7.0, the terms index is loaded on-heap for fields that only have unique
values such as `_id` fields, and off-heap otherwise, which is likely to be the
case for most other fields. This is expected to reduce memory requirements but
might slow down search requests if both of the following conditions are met:

* The size of the data directory on each node is significantly larger than the
  amount of memory that is available to the filesystem cache.

* The number of matches of the query is not several orders of magnitude greater
  than the number of terms that the query tries to match, either explicitly via
  `term` or `terms` queries, or implicitly via multi-term queries such as
  `prefix`, `wildcard` or `fuzzy` queries.

This change affects both existing indices created with Elasticsearch 6.x and new
indices created with Elasticsearch 7.x.

[float]
==== Changes to queries
* The default value for the `transpositions` parameter of the `fuzzy` query
  has been changed to `true`.

* The `query_string` options `use_dismax`, `split_on_whitespace`,
  `all_fields`, `locale`, `auto_generate_phrase_query` and
  `lowercase_expanded_terms`, deprecated in 6.x, have been removed.

* Purely negative queries (only MUST_NOT clauses) now return a score of `0`
  rather than `1`.

* The boundary specified using geohashes in the `geo_bounding_box` query
  now includes the entire geohash cell, instead of just the geohash center.

* Attempts to generate multi-term phrase queries against non-text fields
  with a custom analyzer will now throw an exception.

* An `envelope` crossing the dateline in a `geo_shape` query is now processed
  correctly when specified using the REST API instead of having its left and
  right corners flipped.

* Attempts to set `boost` on inner span queries will now throw a parsing exception.

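A `fuzzy` query that should keep treating a transposition as two edits can set
`transpositions` explicitly instead of relying on the default. The request below
is only a sketch: the index name `my-index`, the field `title` and the search
value are hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search
{
    "query": {
        "fuzzy": {
            "title": {
                "value": "serach",
                "transpositions": false
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
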
[float]
==== Adaptive replica selection enabled by default

Adaptive replica selection has been enabled by default. If you wish to return to
the older round-robin distribution of search requests, you can use the
`cluster.routing.use_adaptive_replica_selection` setting:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": false
    }
}
--------------------------------------------------
// CONSOLE

[float]
==== Search API returns `400` for invalid requests

The Search API now returns `400 - Bad request` where it previously returned
`500 - Internal Server Error` for the following kinds of invalid request:

* the result window is too large
* sort is used in combination with rescore
* the rescore window is too large
* the number of slices is too large
* keep alive for scroll is too large
* number of filters in the adjacency matrix aggregation is too large
* script compilation errors

[float]
==== Scroll queries cannot use the `request_cache` anymore

Setting `request_cache:true` on a query that creates a scroll (`scroll=1m`)
was deprecated in 6.x and now returns a `400 - Bad request`.
Scroll queries are not meant to be cached.

[float]
==== Scroll queries cannot use `rescore` anymore

Including a rescore clause on a query that creates a scroll (`scroll=1m`) was
deprecated in 6.5 and now returns a `400 - Bad request`. Allowing
rescore on scroll queries would break the scroll sort. In the 6.x line, the
rescore clause was silently ignored for scroll queries, whereas the 5.x line
allowed it.

[float]
==== Term Suggesters supported distance algorithms

The following string distance algorithms were given additional names in 6.2 and
their existing names were deprecated. The deprecated names have now been
removed.

* `levenstein` - replaced by `levenshtein`
* `jarowinkler` - replaced by `jaro_winkler`

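A term suggester request that still uses one of the removed names only needs
the new spelling in its `string_distance` parameter. In the sketch below, the
index name `my-index`, the suggestion name `my-suggestion`, the field `title`
and the input text are hypothetical.

[source,js]
--------------------------------------------------
POST /my-index/_search
{
    "suggest": {
        "my-suggestion": {
            "text": "serach",
            "term": {
                "field": "title",
                "string_distance": "jaro_winkler"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
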
[float]
==== `popular` mode for Suggesters

The `popular` mode for Suggesters (`term` and `phrase`) now uses the doc frequency
(instead of the sum of the doc frequency) of the input terms to compute the frequency
threshold for candidate suggestions.

[float]
==== Limiting the number of terms that can be used in a Terms Query request

Executing a Terms Query with a large number of terms may degrade cluster performance,
as each additional term demands extra processing and memory.
To safeguard against this, the maximum number of terms that can be used in a
Terms Query request has been limited to 65536. This default maximum can be changed
for a particular index with the index setting `index.max_terms_count`.

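If a particular index needs a different limit, the setting can be changed
through the update index settings API. The index name `my-index` and the value
`100000` below are illustrative assumptions, not recommendations.

[source,js]
--------------------------------------------------
PUT /my-index/_settings
{
    "index.max_terms_count": 100000
}
--------------------------------------------------
// NOTCONSOLE
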
[float]
==== Limiting the length of regex that can be used in a Regexp Query request

Executing a Regexp Query with a long regex string may degrade search performance.
To safeguard against this, the maximum length of the regex that can be used in a
Regexp Query request has been limited to 1000 characters. This default maximum can be changed
for a particular index with the index setting `index.max_regex_length`.

[float]
==== Limiting the number of auto-expanded fields

Executing queries that use automatic expansion of fields (e.g. `query_string`, `simple_query_string`
or `multi_match`) can cause performance issues for indices with a large number of fields.
To safeguard against this, a hard limit of 1024 fields has been introduced for queries
using the "all fields" mode (`"default_field": "*"`) or other field name expansions (e.g. `"foo*"`).

[float]
==== Invalid `_search` request body

Search requests with extra content after the main object will no longer be accepted
by the `_search` endpoint. A parsing exception will be thrown instead.

[float]
==== Doc-value fields default format

The format of doc-value fields is changing to be the same as what could be
obtained in 6.x with the special `use_field_mapping` format. This mostly
affects date fields, which are now formatted by default according to the format
that is configured in the mappings. This behavior can be changed by
specifying a <<search-request-docvalue-fields,`format`>> within the doc-value
field.

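To get a date doc-value field back in a specific format rather than the
mapping's format, a `format` can be given next to the field name. This is a
sketch only: the index name `my-index` and the date field `created_at` are
hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search
{
    "query": { "match_all": {} },
    "docvalue_fields": [
        { "field": "created_at", "format": "epoch_millis" }
    ]
}
--------------------------------------------------
// NOTCONSOLE
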
[float]
==== Context Completion Suggester

The ability to query and index context-enabled suggestions without contexts,
deprecated in 6.x, has been removed. Context-enabled suggestion queries
without contexts have to visit every suggestion, which degrades search performance
considerably.

For geo contexts, the value of the `path` parameter is now validated against the mapping,
and the context is only accepted if `path` points to a field with `geo_point` type.

[float]
==== Semantics changed for `max_concurrent_shard_requests`

`max_concurrent_shard_requests` used to limit the total number of concurrent shard
requests a single high-level search request can execute. In 7.0 it limits the
maximum number of concurrent shard requests per node instead. The default is now `5`.

[float]
==== `max_score` set to `null` when scores are not tracked

`max_score` used to be set to `0` whenever scores were not tracked. `null` is now used
instead, which is a more appropriate value when scores are not available.

[float]
==== Negative boosts are not allowed

Setting a negative `boost` for a query or a field, deprecated in 6.x, is not allowed in this version.
To deboost a specific query or field you can use a `boost` between 0 and 1.

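As an illustration of deboosting with a fractional value, the sketch below
gives matches on one field a lower weight than matches on another. The index
name `my-index`, the fields `title` and `body`, and the boost value `0.2` are
hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": { "query": "elasticsearch" } } },
                { "match": { "body": { "query": "elasticsearch", "boost": 0.2 } } }
            ]
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
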
[float]
==== Negative scores are not allowed in Function Score Query

Negative scores in the Function Score Query were deprecated in 6.x and are
not allowed in this version. If a negative score is produced as a result
of computation (e.g. in `script_score` or `field_value_factor` functions),
an error will be thrown.

[float]
==== The filter context has been removed

The `filter` context has been removed from Elasticsearch's query builders.
The distinction between queries and filters is now decided in Lucene, depending
on whether queries need to access the score or not. As a result, `bool` queries with
`should` clauses that don't need to access the score will no longer set their
`minimum_should_match` to 1. This behavior was deprecated in the previous
major version.

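A query that relied on the implicit value can state it explicitly. The sketch
below assumes a non-scoring `bool` clause that combines `must` and `should`;
the index name `my-index` and the fields `status` and `tag` are hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search
{
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": { "term": { "status": "published" } },
                    "should": [
                        { "term": { "tag": "search" } },
                        { "term": { "tag": "lucene" } }
                    ],
                    "minimum_should_match": 1
                }
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE

With `minimum_should_match` set, at least one of the two `should` clauses has
to match, regardless of how the default is derived.
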
[float]
==== `hits.total` is now an object in the search response

The total number of hits that match the search request is now returned as an object
with a `value` and a `relation`. `value` indicates the number of hits that
match and `relation` indicates whether the value is accurate (`eq`) or a lower bound
(`gte`):

[source,js]
--------------------------------------------------
{
    "hits": {
        "total": { <1>
            "value": 1000,
            "relation": "eq"
        },
        ...
    }
}
--------------------------------------------------
// NOTCONSOLE

<1> The `total` object indicates that the query matches exactly 1000
documents (`eq`).

The `value` is always accurate (`"relation": "eq"`) when
`track_total_hits` is set to `true` in the request.
You can also retrieve `hits.total` as a number in the REST response by adding
`rest_total_hits_as_int=true` as a request parameter of the search request.
This parameter has been added to ease the transition to the new format and
will be removed in the next major version (8.0).

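A client that has not yet been updated to read the object form can ask for the
numeric form during the transition. In this sketch the index name `my-index` is
hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search?rest_total_hits_as_int=true
{
    "query": { "match_all": {} }
}
--------------------------------------------------
// NOTCONSOLE
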
[float]
==== `hits.total` is omitted in the response if `track_total_hits` is disabled (false)

If `track_total_hits` is set to `false` in the search request, the search response
will set `hits.total` to null and the object will not be displayed in the REST
layer. You can add `rest_total_hits_as_int=true` in the search request parameters
to get the old format back (`"total": -1`).

[float]
==== `track_total_hits` defaults to 10,000

By default, search requests count the total hits accurately up to `10,000`
documents. If the total number of hits that match the query is greater than this
value, the response will indicate that the returned value is a lower bound:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "timed_out": false,
    "took": 100,
    "hits": {
        "max_score": 1.0,
        "total" : {
            "value": 10000,    <1>
            "relation": "gte"  <2>
        },
        "hits": ...
    }
}
--------------------------------------------------
// NOTCONSOLE

<1> There are at least 10000 documents that match the query.
<2> This is a lower bound (`"gte"`).

You can force the count to always be accurate by setting `track_total_hits`
to `true` explicitly in the search request.

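A request that needs the exact count can set the option in its body. The index
name `my-index`, the field `title` and the query text below are hypothetical.

[source,js]
--------------------------------------------------
GET /my-index/_search
{
    "track_total_hits": true,
    "query": {
        "match": { "title": "elasticsearch" }
    }
}
--------------------------------------------------
// NOTCONSOLE
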