mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-08 14:05:27 +00:00
The docs/reference/redirects.asciidoc file stores a list of relocated or deleted pages for the Elasticsearch Reference documentation. This prunes several older redirects that are no longer needed.
315 lines
12 KiB
Plaintext
315 lines
12 KiB
Plaintext
[float]
|
|
[[breaking_70_search_changes]]
|
|
=== Search and Query DSL changes
|
|
|
|
//NOTE: The notable-breaking-changes tagged regions are re-used in the
|
|
//Installation and Upgrade Guide
|
|
|
|
//tag::notable-breaking-changes[]
|
|
|
|
// end::notable-breaking-changes[]
|
|
|
|
[float]
|
|
==== Off-heap terms index
|
|
|
|
The terms dictionary is the part of the inverted index that records all terms
|
|
that occur within a segment in sorted order. In order to provide fast retrieval,
|
|
terms dictionaries come with a small terms index that allows for efficient
|
|
random access by term. Until now this terms index had always been loaded
|
|
on-heap.
|
|
|
|
As of 7.0, the terms index is loaded on-heap for fields that only have unique
|
|
values such as `_id` fields, and off-heap otherwise - likely most other fields.
|
|
This is expected to reduce memory requirements but might slow down search
|
|
requests if both below conditions are met:
|
|
|
|
* The size of the data directory on each node is significantly larger than the
|
|
amount of memory that is available to the filesystem cache.
|
|
|
|
* The number of matches of the query is not several orders of magnitude greater
|
|
than the number of terms that the query tries to match, either explicitly via
|
|
`term` or `terms` queries, or implicitly via multi-term queries such as
|
|
`prefix`, `wildcard` or `fuzzy` queries.
|
|
|
|
This change affects both existing indices created with Elasticsearch 6.x and new
|
|
indices created with Elasticsearch 7.x.
|
|
|
|
[float]
|
|
==== Changes to queries
|
|
* The default value for `transpositions` parameter of `fuzzy` query
|
|
has been changed to `true`.
|
|
|
|
* The `query_string` options `use_dismax`, `split_on_whitespace`,
|
|
`all_fields`, `locale`, `auto_generate_phrase_query` and
|
|
`lowercase_expanded_terms` deprecated in 6.x have been removed.
|
|
|
|
* Purely negative queries (only MUST_NOT clauses) now return a score of `0`
|
|
rather than `1`.
|
|
|
|
* The boundary specified using geohashes in the `geo_bounding_box` query
|
|
now include entire geohash cell, instead of just geohash center.
|
|
|
|
* Attempts to generate multi-term phrase queries against non-text fields
|
|
with a custom analyzer will now throw an exception.
|
|
|
|
* An `envelope` crossing the dateline in a `geo_shape `query is now processed
|
|
correctly when specified using REST API instead of having its left and
|
|
right corners flipped.
|
|
|
|
* Attempts to set `boost` on inner span queries will now throw a parsing exception.
|
|
|
|
[float]
|
|
==== Adaptive replica selection enabled by default
|
|
|
|
Adaptive replica selection has been enabled by default. If you wish to return to
|
|
the older round robin of search requests, you can use the
|
|
`cluster.routing.use_adaptive_replica_selection` setting:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /_cluster/settings
|
|
{
|
|
"transient": {
|
|
"cluster.routing.use_adaptive_replica_selection": false
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
|
|
[float]
|
|
[[search-api-returns-400-invalid-requests]]
|
|
==== Search API returns `400` for invalid requests
|
|
|
|
The Search API returns `400 - Bad request` while it would previously return
|
|
`500 - Internal Server Error` in the following cases of invalid request:
|
|
|
|
* the result window is too large
|
|
* sort is used in combination with rescore
|
|
* the rescore window is too large
|
|
* the number of slices is too large
|
|
* keep alive for scroll is too large
|
|
* number of filters in the adjacency matrix aggregation is too large
|
|
* script compilation errors
|
|
|
|
[float]
|
|
[[scroll-queries-cannot-use-request-cache]]
|
|
==== Scroll queries cannot use the `request_cache` anymore
|
|
|
|
Setting `request_cache:true` on a query that creates a scroll (`scroll=1m`)
|
|
has been deprecated in 6 and will now return a `400 - Bad request`.
|
|
Scroll queries are not meant to be cached.
|
|
|
|
[float]
|
|
[[scroll-queries-cannot-use-rescore]]
|
|
==== Scroll queries cannot use `rescore` anymore
|
|
|
|
Including a rescore clause on a query that creates a scroll (`scroll=1m`) has
|
|
been deprecated in 6.5 and will now return a `400 - Bad request`. Allowing
|
|
rescore on scroll queries would break the scroll sort. In the 6.x line, the
|
|
rescore clause was silently ignored (for scroll queries), and it was allowed in
|
|
the 5.x line.
|
|
|
|
[float]
|
|
==== Term Suggesters supported distance algorithms
|
|
|
|
The following string distance algorithms were given additional names in 6.2 and
|
|
their existing names were deprecated. The deprecated names have now been
|
|
removed.
|
|
|
|
* `levenstein` - replaced by `levenshtein`
|
|
* `jarowinkler` - replaced by `jaro_winkler`
|
|
|
|
[float]
|
|
[[popular-mode-suggesters]]
|
|
==== `popular` mode for Suggesters
|
|
|
|
The `popular` mode for Suggesters (`term` and `phrase`) now uses the doc frequency
|
|
(instead of the sum of the doc frequency) of the input terms to compute the frequency
|
|
threshold for candidate suggestions.
|
|
|
|
[float]
|
|
==== Limiting the number of terms that can be used in a Terms Query request
|
|
|
|
Executing a Terms Query with a lot of terms may degrade the cluster performance,
|
|
as each additional term demands extra processing and memory.
|
|
To safeguard against this, the maximum number of terms that can be used in a
|
|
Terms Query request has been limited to 65536. This default maximum can be changed
|
|
for a particular index with the index setting `index.max_terms_count`.
|
|
|
|
[float]
|
|
==== Limiting the length of regex that can be used in a Regexp Query request
|
|
|
|
Executing a Regexp Query with a long regex string may degrade search performance.
|
|
To safeguard against this, the maximum length of regex that can be used in a
|
|
Regexp Query request has been limited to 1000. This default maximum can be changed
|
|
for a particular index with the index setting `index.max_regex_length`.
|
|
|
|
[float]
|
|
==== Limiting the number of auto-expanded fields
|
|
|
|
Executing queries that use automatic expansion of fields (e.g. `query_string`, `simple_query_string`
|
|
or `multi_match`) can have performance issues for indices with a large numbers of fields.
|
|
To safeguard against this, a hard limit of 1024 fields has been introduced for queries
|
|
using the "all fields" mode ("default_field": "*") or other fieldname expansions (e.g. "foo*").
|
|
|
|
[float]
|
|
[[invalid-search-request-body]]
|
|
==== Invalid `_search` request body
|
|
|
|
Search requests with extra content after the main object will no longer be accepted
|
|
by the `_search` endpoint. A parsing exception will be thrown instead.
|
|
|
|
[float]
|
|
==== Doc-value fields default format
|
|
|
|
The format of doc-value fields is changing to be the same as what could be
|
|
obtained in 6.x with the special `use_field_mapping` format. This is mostly a
|
|
change for date fields, which are now formatted based on the format that is
|
|
configured in the mappings by default. This behavior can be changed by
|
|
specifying a <<request-body-search-docvalue-fields,`format`>> within the doc-value
|
|
field.
|
|
|
|
[float]
|
|
==== Context Completion Suggester
|
|
|
|
The ability to query and index context enabled suggestions without context,
|
|
deprecated in 6.x, has been removed. Context enabled suggestion queries
|
|
without contexts have to visit every suggestion, which degrades the search performance
|
|
considerably.
|
|
|
|
For geo context the value of the `path` parameter is now validated against the mapping,
|
|
and the context is only accepted if `path` points to a field with `geo_point` type.
|
|
|
|
[float]
|
|
[[semantics-changed-max-concurrent-shared-requests]]
|
|
==== Semantics changed for `max_concurrent_shard_requests`
|
|
|
|
`max_concurrent_shard_requests` used to limit the total number of concurrent shard
|
|
requests a single high level search request can execute. In 7.0 this changed to be the
|
|
max number of concurrent shard requests per node. The default is now `5`.
|
|
|
|
[float]
|
|
[[max-score-set-to-null-when-untracked]]
|
|
==== `max_score` set to `null` when scores are not tracked
|
|
|
|
`max_score` used to be set to `0` whenever scores are not tracked. `null` is now used
|
|
instead which is a more appropriate value for a scenario where scores are not available.
|
|
|
|
[float]
|
|
==== Negative boosts are not allowed
|
|
|
|
Setting a negative `boost` for a query or a field, deprecated in 6x, is not allowed in this version.
|
|
To deboost a specific query or field you can use a `boost` comprise between 0 and 1.
|
|
|
|
[float]
|
|
==== Negative scores are not allowed in Function Score Query
|
|
|
|
Negative scores in the Function Score Query are deprecated in 6.x, and are
|
|
not allowed in this version. If a negative score is produced as a result
|
|
of computation (e.g. in `script_score` or `field_value_factor` functions),
|
|
an error will be thrown.
|
|
|
|
[float]
|
|
==== The filter context has been removed
|
|
|
|
The `filter` context has been removed from Elasticsearch's query builders,
|
|
the distinction between queries and filters is now decided in Lucene depending
|
|
on whether queries need to access score or not. As a result `bool` queries with
|
|
`should` clauses that don't need to access the score will no longer set their
|
|
`minimum_should_match` to 1. This behavior has been deprecated in the previous
|
|
major version.
|
|
|
|
//tag::notable-breaking-changes[]
|
|
[float]
|
|
[[hits-total-now-object-search-response]]
|
|
==== `hits.total` is now an object in the search response
|
|
|
|
The total hits that match the search request is now returned as an object
|
|
with a `value` and a `relation`. `value` indicates the number of hits that
|
|
match and `relation` indicates whether the value is accurate (`eq`) or a lower bound
|
|
(`gte`):
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"hits": {
|
|
"total": {
|
|
"value": 1000,
|
|
"relation": "eq"
|
|
},
|
|
...
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
The `total` object in the response indicates that the query matches exactly 1000
|
|
documents ("eq"). The `value` is always accurate (`"relation": "eq"`) when
|
|
`track_total_hits` is set to true in the request.
|
|
You can also retrieve `hits.total` as a number in the rest response by adding
|
|
`rest_total_hits_as_int=true` in the request parameter of the search request.
|
|
This parameter has been added to ease the transition to the new format and
|
|
will be removed in the next major version (8.0).
|
|
//end::notable-breaking-changes[]
|
|
|
|
[float]
|
|
[[hits-total-omitted-if-disabled]]
|
|
==== `hits.total` is omitted in the response if `track_total_hits` is disabled (false)
|
|
|
|
If `track_total_hits` is set to `false` in the search request the search response
|
|
will set `hits.total` to null and the object will not be displayed in the rest
|
|
layer. You can add `rest_total_hits_as_int=true` in the search request parameters
|
|
to get the old format back (`"total": -1`).
|
|
|
|
//tag::notable-breaking-changes[]
|
|
[float]
|
|
[[track-total-hits-10000-default]]
|
|
==== `track_total_hits` defaults to 10,000
|
|
|
|
By default search request will count the total hits accurately up to `10,000`
|
|
documents. If the total number of hits that match the query is greater than this
|
|
value, the response will indicate that the returned value is a lower bound:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_shards": ...
|
|
"timed_out": false,
|
|
"took": 100,
|
|
"hits": {
|
|
"max_score": 1.0,
|
|
"total" : {
|
|
"value": 10000, <1>
|
|
"relation": "gte" <2>
|
|
},
|
|
"hits": ...
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
<1> There are at least 10000 documents that match the query
|
|
<2> This is a lower bound (`"gte"`).
|
|
|
|
You can force the count to always be accurate by setting `"track_total_hits`
|
|
to true explicitly in the search request.
|
|
//end::notable-breaking-changes[]
|
|
|
|
[float]
|
|
==== Limitations on Similarities
|
|
Lucene 8 introduced more constraints on similarities, in particular:
|
|
|
|
- scores must not be negative,
|
|
- scores must not decrease when term freq increases,
|
|
- scores must not increase when norm (interpreted as an unsigned long) increases.
|
|
|
|
[float]
|
|
==== Weights in Function Score must be positive
|
|
Negative `weight` parameters in the `function_score` are no longer allowed.
|
|
|
|
[float]
|
|
==== Query string and Simple query string limit expansion of fields to 1024
|
|
The number of automatically expanded fields for the "all fields"
|
|
mode (`"default_field": "*"`) for the query_string and simple_query_string
|
|
queries is now 1024 fields.
|