Mention the cost of tracking live docs in scrolls (#41375)

Relates #41337, in which a heap dump shows hundreds of MBs allocated on the
heap for tracking the live docs for each scroll.
David Turner 2019-04-23 15:26:14 +01:00
parent e2f8ffdde8
commit 411994b489
1 changed files with 20 additions and 10 deletions

@@ -103,6 +103,12 @@ GET /_search?scroll=1m
[[scroll-search-context]]
==== Keeping the search context alive
+A scroll returns all the documents which matched the search at the time of the
+initial search request. It ignores any subsequent changes to these documents.
+The `scroll_id` identifies a _search context_ which keeps track of everything
+that {es} needs to return the correct documents. The search context is created
+by the initial request and kept alive by subsequent requests.
The `scroll` parameter (passed to the `search` request and to every `scroll`
request) tells Elasticsearch how long it should keep the search context alive.
Its value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
@@ -112,17 +118,21 @@ new expiry time. If a `scroll` request doesn't pass in the `scroll`
parameter, then the search context will be freed as part of _that_ `scroll`
request.
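The keep-alive rule described above can be sketched as a toy model. This is not how Elasticsearch implements it; `SearchContext`, `ScrollRegistry`, and the explicit `now` argument are hypothetical names chosen for illustration. It only shows that each request resets the expiry, and that omitting the `scroll` parameter frees the context as part of that request.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SearchContext:
    """Hypothetical stand-in for a server-side search context."""
    expires_at: float  # absolute time (seconds) after which the context may be freed


class ScrollRegistry:
    """Toy registry of open search contexts, keyed by scroll_id."""

    def __init__(self) -> None:
        self.contexts: Dict[str, SearchContext] = {}

    def open(self, scroll_id: str, keep_alive: float, now: float) -> None:
        # The initial search request creates the context and sets its first expiry.
        self.contexts[scroll_id] = SearchContext(expires_at=now + keep_alive)

    def scroll(self, scroll_id: str, keep_alive: Optional[float], now: float) -> bool:
        """Serve one batch; return False if the context no longer exists."""
        ctx = self.contexts.get(scroll_id)
        if ctx is None or now > ctx.expires_at:
            # Unknown or expired context: drop it if present and report failure.
            self.contexts.pop(scroll_id, None)
            return False
        if keep_alive is None:
            # No `scroll` parameter: the context is freed as part of this request.
            del self.contexts[scroll_id]
        else:
            # Each request sets a new expiry time measured from now.
            ctx.expires_at = now + keep_alive
        return True
```

In this model the keep-alive only has to cover the gap between two consecutive requests, not the whole scroll, because every call to `scroll` pushes the expiry forward.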
-Normally, the background merge process optimizes the
-index by merging together smaller segments to create new bigger segments, at
-which time the smaller segments are deleted. This process continues during
-scrolling, but an open search context prevents the old segments from being
-deleted while they are still in use. This is how Elasticsearch is able to
-return the results of the initial search request, regardless of subsequent
-changes to documents.
+Normally, the background merge process optimizes the index by merging together
+smaller segments to create new, bigger segments. Once the smaller segments are
+no longer needed they are deleted. This process continues during scrolling, but
+an open search context prevents the old segments from being deleted since they
+are still in use.
-TIP: Keeping older segments alive means that more file handles are needed.
-Ensure that you have configured your nodes to have ample free file handles.
-See <<file-descriptors>>.
+TIP: Keeping older segments alive means that more disk space and file handles
+are needed. Ensure that you have configured your nodes to have ample free file
+handles. See <<file-descriptors>>.
+Additionally, if a segment contains deleted or updated documents then the
+search context must keep track of whether each document in the segment was live
+at the time of the initial search request. Ensure that your nodes have
+sufficient heap space if you have many open scrolls on an index that is subject
+to ongoing deletes or updates.
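To get a feel for the heap cost mentioned above, here is a back-of-envelope sketch. It assumes one bit of live-docs state per document in each affected segment and that every open scroll holds its own copy; both are simplifying assumptions for illustration rather than a description of the actual implementation, and `live_docs_heap_bytes` is a hypothetical helper.

```python
def live_docs_heap_bytes(docs_per_segment, open_scrolls):
    """Rough bytes needed to track live docs, assuming one bit per document.

    docs_per_segment: document counts of the segments that contain deletes or updates.
    open_scrolls: number of open search contexts, each assumed to hold its own copy.
    """
    # Convert bits to bytes per segment, rounding up to a whole byte.
    per_scroll = sum((docs + 7) // 8 for docs in docs_per_segment)
    return per_scroll * open_scrolls


# Example: two affected segments (8M and 1M docs) and 100 open scrolls.
estimate = live_docs_heap_bytes([8_000_000, 1_000_000], 100)
print(f"{estimate / 1_000_000:.1f} MB")
```

Under these assumptions the example comes to about 112.5 MB, which is the same order of magnitude as the heap usage reported in #41337.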
NOTE: To prevent issues caused by having too many scrolls open, the
user is not allowed to open scrolls past a certain limit. By default, the