Warn about the fact that the terms index is moving off-heap. (#39918)

Lucene 8.0 includes a [change](https://issues.apache.org/jira/browse/LUCENE-8635) that moves the terms index off-heap for all fields except ID fields. I'm including this in the migration notes so that users whose queries match many terms are not surprised if those queries slow down.
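
The migration note added by this commit says a slowdown is possible only when two conditions hold together: the data directory is much larger than the filesystem cache, and the query does not match many more documents than the number of terms it looks up. As a rough illustration only, the check could be sketched as below. The function name, its parameters, and both thresholds (`size_ratio`, `min_orders_of_magnitude`) are assumptions made for this sketch; neither Elasticsearch nor Lucene defines them, and the note itself gives no exact numbers.

```python
def may_be_slowed_by_off_heap_terms_index(
    data_dir_bytes,
    fs_cache_bytes,
    query_matches,
    query_terms,
    size_ratio=2.0,               # assumed meaning of "significantly larger"
    min_orders_of_magnitude=2.0,  # assumed meaning of "several orders of magnitude"
):
    """Return True when both conditions from the migration note hold.

    This is a hypothetical heuristic, not an Elasticsearch or Lucene API.
    """
    if fs_cache_bytes <= 0 or query_terms <= 0:
        raise ValueError("fs_cache_bytes and query_terms must be positive")

    # Condition 1: the data directory does not fit in the filesystem cache,
    # so off-heap terms index lookups may have to go to disk.
    data_exceeds_cache = data_dir_bytes > size_ratio * fs_cache_bytes

    # Condition 2: each looked-up term yields few matches, so the cost of
    # term lookups is not dwarfed by the cost of collecting hits.
    matches_per_term = query_matches / query_terms
    few_matches_per_term = matches_per_term < 10 ** min_orders_of_magnitude

    return data_exceeds_cache and few_matches_per_term
```

For example, a node with 1 TB of data and 32 GB of filesystem cache running a `terms` query over 1,000 terms that matches 500 documents would satisfy both conditions under these assumed thresholds, while the same query on a node whose data fits in the cache would not.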
2025-02-26 14:54:56 +00:00 · 2019-03-12 10:16:41 +01:00 · 2019-03-12 10:16:41 +01:00 · 66b3a3a546
commit 66b3a3a546
parent 675489d54c
1 changed file with 26 additions and 1 deletion
--- a/docs/reference/migration/migrate_7_0/search.asciidoc
+++ b/docs/reference/migration/migrate_7_0/search.asciidoc
@@ -2,6 +2,31 @@
 [[breaking_70_search_changes]]
 === Search and Query DSL changes

+[float]
+==== Off-heap terms index
+
+The terms dictionary is the part of the inverted index that records all terms
+that occur within a segment in sorted order. In order to provide fast retrieval,
+terms dictionaries come with a small terms index that allows for efficient
+random access by term. Until now this terms index had always been loaded
+on-heap.
+
+As of 7.0, the terms index is loaded on-heap only for fields whose values are
+all unique, such as `_id`, and off-heap for all other fields, which is likely
+most of them. This is expected to reduce memory requirements but might slow
+down search requests if both of the following conditions are met:
+
+* The size of the data directory on each node is significantly larger than the
+  amount of memory that is available to the filesystem cache.
+
+* The number of matches of the query is not several orders of magnitude greater
+  than the number of terms that the query tries to match, either explicitly via
+  `term` or `terms` queries, or implicitly via multi-term queries such as
+  `prefix`, `wildcard` or `fuzzy` queries.
+
+This change affects both existing indices created with Elasticsearch 6.x and new
+indices created with Elasticsearch 7.x.
+
 [float]
 ==== Changes to queries
 *   The default value for `transpositions` parameter of `fuzzy` query
@@ -245,4 +270,4 @@ documents. If the total number of hits that match the query is greater than this
 <2> This is a lower bound (`"gte"`).

 You can force the count to always be accurate by setting `"track_total_hits`
-to true explicitly in the search request.
+to true explicitly in the search request.