Warn about the fact that the terms index is moving off-heap. (#39918)

Lucene 8.0 includes a [change](https://issues.apache.org/jira/browse/LUCENE-8635)
that moves the terms index off-heap for all fields except ID fields. I'm
including this in the migration notes so that users whose queries match
lots of terms won't be surprised if those queries slow down.
Adrien Grand 2019-03-12 10:16:41 +01:00 committed by GitHub
parent 675489d54c
commit 66b3a3a546
1 changed file with 26 additions and 1 deletion

@@ -2,6 +2,31 @@
[[breaking_70_search_changes]]
=== Search and Query DSL changes
[float]
==== Off-heap terms index
The terms dictionary is the part of the inverted index that records all terms
that occur within a segment in sorted order. In order to provide fast retrieval,
terms dictionaries come with a small terms index that allows for efficient
random access by term. Until now this terms index had always been loaded
on-heap.
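
As a rough illustration of the relationship described above, the following sketch models a terms dictionary as a sorted list split into fixed-size blocks, with a small index holding only the first term of each block. This is a deliberately simplified toy model: the class name, the `block_size` parameter, and the block layout are assumptions made for illustration and do not reflect how Lucene actually encodes its terms index.

```python
import bisect


class TermsDictionary:
    """Toy model: sorted terms stored in blocks, plus a small index over the blocks."""

    def __init__(self, terms, block_size=4):
        # The terms dictionary: every distinct term, in sorted order.
        self.terms = sorted(set(terms))
        self.block_size = block_size
        # The "terms index": only the first term of each block.
        # It is much smaller than the dictionary itself.
        self.index = self.terms[::block_size]

    def lookup(self, term):
        # Step 1: use the small index to find the only block that can hold the term.
        block = bisect.bisect_right(self.index, term) - 1
        if block < 0:
            return False  # term sorts before every indexed term
        # Step 2: scan that single block of the dictionary.
        start = block * self.block_size
        return term in self.terms[start:start + self.block_size]


dictionary = TermsDictionary(
    ["apple", "banana", "cherry", "date", "elder", "fig", "grape", "honey", "kiwi"]
)
print(dictionary.lookup("fig"))
print(dictionary.lookup("coconut"))
```

In this model, every lookup touches the small index first. When that index is no longer held on-heap, each access to it may have to go through the filesystem cache or the disk, which is why queries that look up many terms are the ones most exposed to the change.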
As of 7.0, the terms index is loaded on-heap for fields that only have unique
values such as `_id` fields, and off-heap otherwise, which is likely the case
for most other fields. This is expected to reduce memory requirements but might
slow down search requests if both of the following conditions are met:
* The size of the data directory on each node is significantly larger than the
amount of memory that is available to the filesystem cache.
* The number of matches of the query is not several orders of magnitude greater
than the number of terms that the query tries to match, either explicitly via
`term` or `terms` queries, or implicitly via multi-term queries such as
`prefix`, `wildcard` or `fuzzy` queries.
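
The two conditions above can be sketched as a simple check. The notes do not quantify "significantly larger" or "several orders of magnitude", so the thresholds `size_factor` and `match_factor` below are hypothetical placeholders, as are the function and parameter names. Treat this as a way to reason about the conditions, not as a rule used by Elasticsearch.

```python
def may_slow_down(data_dir_bytes, fs_cache_bytes, query_matches, query_terms,
                  size_factor=2.0, match_factor=1000.0):
    """Return True when both conditions from the migration note hold.

    size_factor and match_factor are assumed thresholds; the note itself
    gives no concrete numbers.
    """
    # Condition 1: the data directory is significantly larger than the
    # memory available to the filesystem cache.
    data_exceeds_cache = data_dir_bytes > size_factor * fs_cache_bytes
    # Condition 2: the number of matches is NOT several orders of magnitude
    # greater than the number of terms the query tries to match.
    few_matches_per_term = query_matches < match_factor * query_terms
    return data_exceeds_cache and few_matches_per_term


# A node with 1 TB of data and 64 GB of filesystem cache, running a
# wildcard query that expands to 5,000 terms and matches 20,000 documents.
print(may_slow_down(10**12, 64 * 10**9, query_matches=20_000, query_terms=5_000))
```

A query that matches very many documents per term spends most of its time on postings rather than on term lookups, so the cost of reading the terms index off-heap is comparatively small for it.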
This change affects both existing indices created with Elasticsearch 6.x and new
indices created with Elasticsearch 7.x.
[float]
==== Changes to queries
* The default value for the `transpositions` parameter of `fuzzy` query
@@ -245,4 +270,4 @@ documents. If the total number of hits that match the query is greater than this
<2> This is a lower bound (`"gte"`).
You can force the count to always be accurate by setting `"track_total_hits"`
to true explicitly in the search request.
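
A minimal sketch of a search request body with accurate hit counting enabled is shown below. The helper name `search_body` and the `title` field are hypothetical and used only for illustration; the `track_total_hits` key is the setting named in the text above.

```python
import json


def search_body(query, track_total_hits=True):
    """Build a search request body; the helper name is illustrative only."""
    return {"track_total_hits": track_total_hits, "query": query}


# "title" is a hypothetical field name.
body = search_body({"match": {"title": "elasticsearch"}})
print(json.dumps(body, indent=2))
```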