From 66b3a3a546b71d485d723c678af1d50d918855aa Mon Sep 17 00:00:00 2001
From: Adrien Grand
Date: Tue, 12 Mar 2019 10:16:41 +0100
Subject: [PATCH] Warn about the fact that the terms index is moving off-heap. (#39918)

Lucene 8.0 includes a [change](https://issues.apache.org/jira/browse/LUCENE-8635)
that moves the terms index off-heap for all fields but ID fields. I'm including
this in the migration notes so that users who have queries that match lots of
terms won't be surprised by a slowdown.
---
 .../migration/migrate_7_0/search.asciidoc | 27 ++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/reference/migration/migrate_7_0/search.asciidoc b/docs/reference/migration/migrate_7_0/search.asciidoc
index afe96fd8851..cf3342ea7ee 100644
--- a/docs/reference/migration/migrate_7_0/search.asciidoc
+++ b/docs/reference/migration/migrate_7_0/search.asciidoc
@@ -2,6 +2,31 @@
 [[breaking_70_search_changes]]
 === Search and Query DSL changes
 
+[float]
+==== Off-heap terms index
+
+The terms dictionary is the part of the inverted index that records all terms
+that occur within a segment in sorted order. In order to provide fast retrieval,
+terms dictionaries come with a small terms index that allows for efficient
+random access by term. Until now, this terms index had always been loaded
+on-heap.
+
+As of 7.0, the terms index is loaded on-heap for fields that only have unique
+values, such as `_id` fields, and off-heap otherwise, which likely covers most
+other fields. This is expected to reduce memory requirements but might slow
+down search requests if both of the following conditions are met:
+
+* The size of the data directory on each node is significantly larger than the
+  amount of memory that is available to the filesystem cache.
+
+* The number of matches of the query is not several orders of magnitude greater
+  than the number of terms that the query tries to match, either explicitly via
+  `term` or `terms` queries, or implicitly via multi-term queries such as
+  `prefix`, `wildcard` or `fuzzy` queries.
+
+This change affects both existing indices created with Elasticsearch 6.x and new
+indices created with Elasticsearch 7.x.
+
 [float]
 ==== Changes to queries
 * The default value for `transpositions` parameter of `fuzzy` query
@@ -245,4 +270,4 @@ documents. If the total number of hits that match the query is greater than this
 <2> This is a lower bound (`"gte"`).
 
 You can force the count to always be accurate by setting `"track_total_hits`
-to true explicitly in the search request.
\ No newline at end of file
+to true explicitly in the search request.
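As a rough illustration of the two conditions in the migration note, the Python sketch below flags a query as "may slow down" only when both hold. It is not an Elasticsearch or Lucene API: the names `NodeStats`, `QueryStats` and `may_slow_down` are hypothetical, and the numeric thresholds are assumptions, since the note only says "significantly larger" and "several orders of magnitude" without giving numbers.

```python
from dataclasses import dataclass

# ASSUMED thresholds: the migration note gives no numbers, so these are
# illustrative stand-ins for "significantly larger" and "several orders of magnitude".
DATA_TO_CACHE_RATIO = 2.0      # data dir counts as "significantly larger" above 2x the cache
MATCHES_TO_TERMS_ORDERS = 3    # "several orders of magnitude" taken as a factor of 10**3


@dataclass
class NodeStats:
    data_dir_bytes: int   # size of the data directory on the node
    fs_cache_bytes: int   # memory available to the filesystem cache


@dataclass
class QueryStats:
    num_matches: int      # number of documents the query matches
    num_terms: int        # terms the query tries to match, explicitly or via expansion


def may_slow_down(node: NodeStats, query: QueryStats) -> bool:
    """Return True only if both conditions from the 7.0 migration note hold."""
    # Condition 1: the data does not fit comfortably in the filesystem cache,
    # so reading the off-heap terms index may have to go to disk.
    cache_too_small = node.data_dir_bytes > DATA_TO_CACHE_RATIO * node.fs_cache_bytes
    # Condition 2: matches do not outnumber looked-up terms by several orders
    # of magnitude, so per-term lookup cost is not amortized over many hits.
    few_matches_per_term = (
        query.num_matches < query.num_terms * 10 ** MATCHES_TO_TERMS_ORDERS
    )
    return cache_too_small and few_matches_per_term


if __name__ == "__main__":
    node = NodeStats(data_dir_bytes=500 * 2**30, fs_cache_bytes=32 * 2**30)
    # A `terms` query over 10,000 values that matches about 10,000 documents.
    print(may_slow_down(node, QueryStats(num_matches=10_000, num_terms=10_000)))  # True
    # A `term` query on one common value that matches 5 million documents.
    print(may_slow_down(node, QueryStats(num_matches=5_000_000, num_terms=1)))  # False
```

The check is deliberately conservative in shape: it mirrors the note's "both conditions" wording with a logical AND, so a node whose data fits in the filesystem cache is never flagged regardless of the query.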