117 lines
4.7 KiB
Plaintext
117 lines
4.7 KiB
Plaintext
|
[role="xpack"]
|
||
|
[testenv="basic"]
|
||
|
[[point-in-time]]
|
||
|
==== Point in time
|
||
|
|
||
|
A search request by default executes against the most recent visible data of
|
||
|
the target indices, which is called point in time. Elasticsearch pit (point in time)
|
||
|
is a lightweight view into the state of the data as it existed when initiated.
|
||
|
In some cases, it's preferred to perform multiple search requests using
|
||
|
the same point in time. For example, if <<indices-refresh,refreshes>> happen between
|
||
|
search_after requests, then the results of those requests might not be consistent as
|
||
|
changes happening between searches are only visible to the more recent point in time.
|
||
|
|
||
|
A point in time must be opened explicitly before being used in search requests. The
|
||
|
keep_alive parameter tells Elasticsearch how long it should keep a point in time alive,
|
||
|
e.g. `?keep_alive=5m`.
|
||
|
|
||
|
[source,console]
|
||
|
--------------------------------------------------
|
||
|
POST /my-index-000001/_pit?keep_alive=1m
|
||
|
--------------------------------------------------
|
||
|
// TEST[setup:my_index]
|
||
|
|
||
|
The result from the above request includes a `id`, which should
|
||
|
be passed to the `id` of the `pit` parameter of a search request.
|
||
|
|
||
|
[source,console]
|
||
|
--------------------------------------------------
|
||
|
POST /_search <1>
|
||
|
{
|
||
|
"size": 100,
|
||
|
"query": {
|
||
|
"match" : {
|
||
|
"title" : "elasticsearch"
|
||
|
}
|
||
|
},
|
||
|
"pit": {
|
||
|
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <2>
|
||
|
"keep_alive": "1m" <3>
|
||
|
}
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
// TEST[catch:missing]
|
||
|
|
||
|
<1> A search request with the `pit` parameter must not specify `index`, `routing`,
|
||
|
and {ref}/search-request-body.html#request-body-search-preference[`preference`]
|
||
|
as these parameters are copied from the point in time.
|
||
|
<2> The `id` parameter tells Elasticsearch to execute the request using contexts
|
||
|
from this point int time.
|
||
|
<3> The `keep_alive` parameter tells Elasticsearch how long it should extend
|
||
|
the time to live of the point in time.
|
||
|
|
||
|
IMPORTANT: The open point in time request and each subsequent search request can
|
||
|
return different `id`; thus always use the most recently received `id` for the
|
||
|
next search request.
|
||
|
|
||
|
[[point-in-time-keep-alive]]
|
||
|
===== Keeping point in time alive
|
||
|
The `keep_alive` parameter, which is passed to a open point in time request and
|
||
|
search request, extends the time to live of the corresponding point in time.
|
||
|
The value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
|
||
|
process all data -- it just needs to be long enough for the next request.
|
||
|
|
||
|
Normally, the background merge process optimizes the index by merging together
|
||
|
smaller segments to create new, bigger segments. Once the smaller segments are
|
||
|
no longer needed they are deleted. However, open point-in-times prevent the
|
||
|
old segments from being deleted since they are still in use.
|
||
|
|
||
|
TIP: Keeping older segments alive means that more disk space and file handles
|
||
|
are needed. Ensure that you have configured your nodes to have ample free file
|
||
|
handles. See <<file-descriptors>>.
|
||
|
|
||
|
Additionally, if a segment contains deleted or updated documents then the
|
||
|
point in time must keep track of whether each document in the segment was live at
|
||
|
the time of the initial search request. Ensure that your nodes have sufficient heap
|
||
|
space if you have many open point-in-times on an index that is subject to ongoing
|
||
|
deletes or updates.
|
||
|
|
||
|
You can check how many point-in-times (i.e, search contexts) are open with the
|
||
|
<<cluster-nodes-stats,nodes stats API>>:
|
||
|
|
||
|
[source,console]
|
||
|
---------------------------------------
|
||
|
GET /_nodes/stats/indices/search
|
||
|
---------------------------------------
|
||
|
|
||
|
===== Close point in time API
|
||
|
|
||
|
Point-in-time is automatically closed when its `keep_alive` has
|
||
|
been elapsed. However keeping point-in-times has a cost, as discussed in the
|
||
|
<<point-in-time-keep-alive,previous section>>. Point-in-times should be closed
|
||
|
as soon as they are no longer used in search requests.
|
||
|
|
||
|
[source,console]
|
||
|
---------------------------------------
|
||
|
DELETE /_pit
|
||
|
{
|
||
|
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
|
||
|
}
|
||
|
---------------------------------------
|
||
|
// TEST[catch:missing]
|
||
|
|
||
|
The API returns the following response:
|
||
|
|
||
|
[source,console-result]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"succeeded": true, <1>
|
||
|
"num_freed": 3 <2>
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
// TESTRESPONSE[s/"succeeded": true/"succeeded": $body.succeeded/]
|
||
|
// TESTRESPONSE[s/"num_freed": 3/"num_freed": $body.num_freed/]
|
||
|
|
||
|
<1> If true, all search contexts associated with the point-in-time id are successfully closed
|
||
|
<2> The number of search contexts have been successfully closed
|