Warn about the fact that the terms index is moving off-heap. (#39918)

Lucene 8.0 includes a [change](https://issues.apache.org/jira/browse/LUCENE-8635) that moves the terms index off-heap for all fields except ID fields. I'm including this in the migration notes so that users whose queries match lots of terms won't be surprised if they see a slowdown.
2019-03-12 10:16:41 +01:00 · 66b3a3a546
parent 675489d54c
commit 66b3a3a546
1 changed files with 26 additions and 1 deletions
--- a/docs/reference/migration/migrate_7_0/search.asciidoc
+++ b/docs/reference/migration/migrate_7_0/search.asciidoc
@@ -2,6 +2,31 @@
 [[breaking_70_search_changes]]
 === Search and Query DSL changes

+[float]
+==== Off-heap terms index
+
+The terms dictionary is the part of the inverted index that records all terms
+that occur within a segment in sorted order. In order to provide fast retrieval,
+terms dictionaries come with a small terms index that allows for efficient
+random access by term. Until now, this terms index was always loaded
+on-heap.
+
+As of 7.0, the terms index is loaded on-heap only for fields whose values are
+unique, such as `_id`, and off-heap for all other fields, which is likely most
+of them. This is expected to reduce memory requirements but might slow down
+search requests if both of the following conditions are met:
+
+* The size of the data directory on each node is significantly larger than the
+  amount of memory that is available to the filesystem cache.
+
+* The number of documents that the query matches is not several orders of
+  magnitude greater than the number of terms that the query tries to match,
+  either explicitly via `term` or `terms` queries, or implicitly via
+  multi-term queries such as `prefix`, `wildcard` or `fuzzy` queries.
+
+This change affects both existing indices created with Elasticsearch 6.x and new
+indices created with Elasticsearch 7.x.
+
 [float]
 ==== Changes to queries
 *   The default value for `transpositions` parameter of `fuzzy` query