parent
95766da345
commit
af13c9802d
|
@ -14,6 +14,7 @@ exception of the <<search-explain,explain API>>.
|
||||||
* <<search-search>>
|
* <<search-search>>
|
||||||
* <<search-multi-search>>
|
* <<search-multi-search>>
|
||||||
* <<async-search>>
|
* <<async-search>>
|
||||||
|
* <<point-in-time-api>>
|
||||||
* <<scroll-api>>
|
* <<scroll-api>>
|
||||||
* <<clear-scroll-api>>
|
* <<clear-scroll-api>>
|
||||||
* <<search-suggesters>>
|
* <<search-suggesters>>
|
||||||
|
@ -51,6 +52,8 @@ include::search/search.asciidoc[]
|
||||||
|
|
||||||
include::search/async-search.asciidoc[]
|
include::search/async-search.asciidoc[]
|
||||||
|
|
||||||
|
include::search/point-in-time-api.asciidoc[]
|
||||||
|
|
||||||
include::search/scroll-api.asciidoc[]
|
include::search/scroll-api.asciidoc[]
|
||||||
|
|
||||||
include::search/clear-scroll-api.asciidoc[]
|
include::search/clear-scroll-api.asciidoc[]
|
||||||
|
|
|
@ -0,0 +1,120 @@
|
||||||
|
[role="xpack"]
|
||||||
|
[testenv="basic"]
|
||||||
|
[[point-in-time-api]]
|
||||||
|
=== Point in time API
|
||||||
|
++++
|
||||||
|
<titleabbrev>Point in time</titleabbrev>
|
||||||
|
++++
|
||||||
|
|
||||||
|
A search request by default executes against the most recent visible data of
|
||||||
|
the target indices, which is called a point in time. An Elasticsearch point in time (PIT)
|
||||||
|
is a lightweight view into the state of the data as it existed when initiated.
|
||||||
|
In some cases, it's preferred to perform multiple search requests using
|
||||||
|
the same point in time. For example, if <<indices-refresh,refreshes>> happen between
|
||||||
|
search_after requests, then the results of those requests might not be consistent as
|
||||||
|
changes happening between searches are only visible to the more recent point in time.
|
||||||
|
|
||||||
|
A point in time must be opened explicitly before being used in search requests. The
|
||||||
|
keep_alive parameter tells Elasticsearch how long it should keep a point in time alive,
|
||||||
|
e.g. `?keep_alive=5m`.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
--------------------------------------------------
|
||||||
|
POST /my-index-000001/_pit?keep_alive=1m
|
||||||
|
--------------------------------------------------
|
||||||
|
// TEST[setup:my_index]
|
||||||
|
|
||||||
|
The result from the above request includes an `id`, which should
|
||||||
|
be passed to the `id` of the `pit` parameter of a search request.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
--------------------------------------------------
|
||||||
|
POST /_search <1>
|
||||||
|
{
|
||||||
|
"size": 100,
|
||||||
|
"query": {
|
||||||
|
"match" : {
|
||||||
|
"title" : "elasticsearch"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"pit": {
|
||||||
|
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <2>
|
||||||
|
"keep_alive": "1m" <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// TEST[catch:missing]
|
||||||
|
|
||||||
|
<1> A search request with the `pit` parameter must not specify `index`, `routing`,
|
||||||
|
and {ref}/search-request-body.html#request-body-search-preference[`preference`]
|
||||||
|
as these parameters are copied from the point in time.
|
||||||
|
<2> The `id` parameter tells Elasticsearch to execute the request using contexts
|
||||||
|
from this point in time.
|
||||||
|
<3> The `keep_alive` parameter tells Elasticsearch how long it should extend
|
||||||
|
the time to live of the point in time.
|
||||||
|
|
||||||
|
IMPORTANT: The open point in time request and each subsequent search request can
|
||||||
|
return a different `id`, so always use the most recently received `id` for the
|
||||||
|
next search request.
|
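As a rough illustration of the rules above, the following Python sketch builds a search body for an open point in time and picks the ID to use on the next request. The helper names `build_pit_search_body` and `latest_pit_id` are hypothetical and not part of any Elasticsearch client; the sketch assumes the search response carries the updated ID in a `pit_id` field.

```python
from typing import Any, Dict


def build_pit_search_body(
    pit_id: str,
    query: Dict[str, Any],
    keep_alive: str = "1m",
    size: int = 100,
) -> Dict[str, Any]:
    """Build a search body that runs against an open point in time."""
    return {
        "size": size,
        "query": query,
        # The index, routing, and preference are copied from the point in
        # time, so neither the body nor the request path repeats them.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }


def latest_pit_id(response: Dict[str, Any], current_id: str) -> str:
    """Return the ID to send with the next search request."""
    # Assumption: the updated ID is returned under "pit_id". If it is
    # absent, keep using the ID we already have.
    return response.get("pit_id", current_id)
```

The body returned here would be sent to `POST /_search` without an index in the path.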
||||||
|
|
||||||
|
[[point-in-time-keep-alive]]
|
||||||
|
==== Keeping point in time alive
|
||||||
|
The `keep_alive` parameter, which is passed to an open point in time request and
|
||||||
|
search request, extends the time to live of the corresponding point in time.
|
||||||
|
The value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
|
||||||
|
process all data -- it just needs to be long enough for the next request.
|
||||||
|
|
||||||
|
Normally, the background merge process optimizes the index by merging together
|
||||||
|
smaller segments to create new, bigger segments. Once the smaller segments are
|
||||||
|
no longer needed they are deleted. However, open point-in-times prevent the
|
||||||
|
old segments from being deleted since they are still in use.
|
||||||
|
|
||||||
|
TIP: Keeping older segments alive means that more disk space and file handles
|
||||||
|
are needed. Ensure that you have configured your nodes to have ample free file
|
||||||
|
handles. See <<file-descriptors>>.
|
||||||
|
|
||||||
|
Additionally, if a segment contains deleted or updated documents then the
|
||||||
|
point in time must keep track of whether each document in the segment was live at
|
||||||
|
the time of the initial search request. Ensure that your nodes have sufficient heap
|
||||||
|
space if you have many open point-in-times on an index that is subject to ongoing
|
||||||
|
deletes or updates.
|
||||||
|
|
||||||
|
You can check how many point-in-times (i.e., search contexts) are open with the
|
||||||
|
<<cluster-nodes-stats,nodes stats API>>:
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
---------------------------------------
|
||||||
|
GET /_nodes/stats/indices/search
|
||||||
|
---------------------------------------
|
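One way to turn that response into a single number is sketched below. It assumes the nodes stats response nests a per-node counter at `nodes.<node_id>.indices.search.open_contexts`; that layout and the helper name `count_open_contexts` are assumptions for illustration, so check them against the response your cluster actually returns.

```python
from typing import Any, Dict


def count_open_contexts(nodes_stats: Dict[str, Any]) -> int:
    """Sum the open search contexts reported by every node."""
    total = 0
    for node in nodes_stats.get("nodes", {}).values():
        # Assumed layout: indices -> search -> open_contexts.
        search_stats = node.get("indices", {}).get("search", {})
        total += search_stats.get("open_contexts", 0)
    return total
```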
||||||
|
|
||||||
|
[[close-point-in-time-api]]
|
||||||
|
==== Close point in time API
|
||||||
|
|
||||||
|
A point in time is automatically closed when its `keep_alive` has
|
||||||
|
elapsed. However, keeping point-in-times open has a cost, as discussed in the
|
||||||
|
<<point-in-time-keep-alive,previous section>>. Point-in-times should be closed
|
||||||
|
as soon as they are no longer used in search requests.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
---------------------------------------
|
||||||
|
DELETE /_pit
|
||||||
|
{
|
||||||
|
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
|
||||||
|
}
|
||||||
|
---------------------------------------
|
||||||
|
// TEST[catch:missing]
|
||||||
|
|
||||||
|
The API returns the following response:
|
||||||
|
|
||||||
|
[source,console-result]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"succeeded": true, <1>
|
||||||
|
"num_freed": 3 <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// TESTRESPONSE[s/"succeeded": true/"succeeded": $body.succeeded/]
|
||||||
|
// TESTRESPONSE[s/"num_freed": 3/"num_freed": $body.num_freed/]
|
||||||
|
|
||||||
|
<1> If true, all search contexts associated with the point-in-time id are successfully closed
|
||||||
|
<2> The number of search contexts that were successfully closed
|
|
@ -4,6 +4,10 @@
|
||||||
<titleabbrev>Scroll</titleabbrev>
|
<titleabbrev>Scroll</titleabbrev>
|
||||||
++++
|
++++
|
||||||
|
|
||||||
|
IMPORTANT: We no longer recommend using the scroll API for deep pagination. If
|
||||||
|
you need to preserve the index state while paging through more than 10,000 hits,
|
||||||
|
use the <<search-after,`search_after`>> parameter with a point in time (PIT).
|
||||||
|
|
||||||
Retrieves the next batch of results for a <<scroll-search-results,scrolling
|
Retrieves the next batch of results for a <<scroll-search-results,scrolling
|
||||||
search>>.
|
search>>.
|
||||||
|
|
||||||
|
|
|
@ -1,18 +1,11 @@
|
||||||
[[paginate-search-results]]
|
[[paginate-search-results]]
|
||||||
== Paginate search results
|
== Paginate search results
|
||||||
|
|
||||||
By default, the <<search-search,search API>> returns the top 10 matching documents.
|
By default, searches return the top 10 matching hits. To page through a larger
|
||||||
|
set of results, you can use the <<search-search,search API>>'s `from` and `size`
|
||||||
To paginate through a larger set of results, you can use the search API's `size`
|
parameters. The `from` parameter defines the number of hits to skip, defaulting
|
||||||
and `from` parameters. The `size` parameter is the number of matching documents
|
to `0`. The `size` parameter is the maximum number of hits to return. Together,
|
||||||
to return. The `from` parameter is a zero-indexed offset from the beginning of
|
these two parameters define a page of results.
|
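The arithmetic behind `from` and `size` can be sketched in Python as follows. The helper `page_params` is hypothetical; it assumes zero-based page numbers and the default `index.max_result_window` of 10,000, and it rejects any page whose `from + size` would exceed that window.

```python
from typing import Dict

# Default value of the index.max_result_window index setting.
MAX_RESULT_WINDOW = 10_000


def page_params(
    page: int, size: int = 10, max_window: int = MAX_RESULT_WINDOW
) -> Dict[str, int]:
    """Return the `from` and `size` values for a zero-based page number."""
    if page < 0 or size < 0:
        raise ValueError("page and size must be non-negative")
    start = page * size  # number of hits to skip
    if start + size > max_window:
        raise ValueError(
            f"from + size ({start + size}) exceeds the result window "
            f"({max_window}); use search_after instead"
        )
    return {"from": start, "size": size}
```

For example, `page_params(2, size=20)` skips the first 40 hits and returns up to 20 more.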
||||||
the complete result set that indicates the document you want to start with.
|
|
||||||
|
|
||||||
The following search API request sets the `from` offset to `5`, meaning the
|
|
||||||
request offsets, or skips, the first five matching documents.
|
|
||||||
|
|
||||||
The `size` parameter is `20`, meaning the request can return up to 20 documents,
|
|
||||||
starting at the offset.
|
|
||||||
|
|
||||||
[source,console]
|
[source,console]
|
||||||
----
|
----
|
||||||
|
@ -28,29 +21,177 @@ GET /_search
|
||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
|
||||||
By default, you cannot page through more than 10,000 documents using the `from`
|
Avoid using `from` and `size` to page too deeply or request too many results at
|
||||||
and `size` parameters. This limit is set using the
|
once. Search requests usually span multiple shards. Each shard must load its
|
||||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
requested hits and the hits for any previous pages into memory. For deep pages
|
||||||
|
or large sets of results, these operations can significantly increase memory and
|
||||||
|
CPU usage, resulting in degraded performance or node failures.
|
||||||
|
|
||||||
Deep paging or requesting many results at once can result in slow searches.
|
By default, you cannot use `from` and `size` to page through more than 10,000
|
||||||
Results are sorted before being returned. Because search requests usually span
|
hits. This limit is a safeguard set by the
|
||||||
multiple shards, each shard must generate its own sorted results. These separate
|
<<index-max-result-window,`index.max_result_window`>> index setting. If you need
|
||||||
results must then be combined and sorted to ensure that the overall sort order
|
to page through more than 10,000 hits, use the <<search-after,`search_after`>>
|
||||||
is correct.
|
parameter instead.
|
||||||
|
|
||||||
As an alternative to deep paging, we recommend using
|
WARNING: {es} uses Lucene's internal doc IDs as tie-breakers. These internal doc
|
||||||
<<scroll-search-results,scroll>> or the
|
IDs can be completely different across replicas of the same data. When paging
|
||||||
<<search-after,`search_after`>> parameter.
|
search hits, you might occasionally see that documents with the same sort values
|
||||||
|
are not ordered consistently.
|
||||||
|
|
||||||
|
[discrete]
|
||||||
|
[[search-after]]
|
||||||
|
=== Search after
|
||||||
|
|
||||||
|
You can use the `search_after` parameter to retrieve the next page of hits
|
||||||
|
using a set of <<sort-search-results,sort values>> from the previous page.
|
||||||
|
|
||||||
|
Using `search_after` requires multiple search requests with the same `query` and
|
||||||
|
`sort` values. If a <<near-real-time,refresh>> occurs between these requests,
|
||||||
|
the order of your results may change, causing inconsistent results across pages. To
|
||||||
|
prevent this, you can create a <<point-in-time-api,point in time (PIT)>> to
|
||||||
|
preserve the current index state over your searches.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
POST /my-index-000001/_pit?keep_alive=1m
|
||||||
|
----
|
||||||
|
// TEST[setup:my_index]
|
||||||
|
|
||||||
|
The API returns a PIT ID.
|
||||||
|
|
||||||
|
[source,console-result]
|
||||||
|
----
|
||||||
|
{
|
||||||
|
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
|
||||||
|
}
|
||||||
|
----
|
||||||
|
// TESTRESPONSE[s/"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="/"id": $body.id/]
|
||||||
|
|
||||||
|
To get the first page of results, submit a search request with a `sort`
|
||||||
|
argument. If using a PIT, specify the PIT ID in the `pit.id` parameter and omit
|
||||||
|
the target data stream or index from the request path.
|
||||||
|
|
||||||
|
IMPORTANT: We recommend you include a tiebreaker field in your `sort`. This
|
||||||
|
tiebreaker field should contain a unique value for each document. If you don't
|
||||||
|
include a tiebreaker field, your paged results could miss or duplicate hits.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
GET /_search
|
||||||
|
{
|
||||||
|
"size": 10000,
|
||||||
|
"query": {
|
||||||
|
"match" : {
|
||||||
|
"user.id" : "elkbee"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"pit": {
|
||||||
|
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", <1>
|
||||||
|
"keep_alive": "1m"
|
||||||
|
},
|
||||||
|
"sort": [ <2>
|
||||||
|
{"@timestamp": "asc"},
|
||||||
|
{"tie_breaker_id": "asc"}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
----
|
||||||
|
// TEST[catch:missing]
|
||||||
|
|
||||||
|
<1> PIT ID for the search.
|
||||||
|
<2> Sorts hits for the search.
|
||||||
|
|
||||||
|
The search response includes an array of `sort` values for each hit. If you used
|
||||||
|
a PIT, the response's `pit_id` parameter contains an updated PIT ID.
|
||||||
|
|
||||||
|
[source,console-result]
|
||||||
|
----
|
||||||
|
{
|
||||||
|
"pit_id" : "46ToAwEPbXktaW5kZXgtMDAwMDAxFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAFldicVdzOFFtVHZTZDFoWWowTGkwS0EAAAAAAAAAAAQURzZzcUszUUJ5U1NMX3Jyak5ET0wBFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAAA==", <1>
|
||||||
|
"took" : 17,
|
||||||
|
"timed_out" : false,
|
||||||
|
"_shards" : ...,
|
||||||
|
"hits" : {
|
||||||
|
"total" : ...,
|
||||||
|
"max_score" : null,
|
||||||
|
"hits" : [
|
||||||
|
...
|
||||||
|
{
|
||||||
|
"_index" : "my-index-000001",
|
||||||
|
"_id" : "FaslK3QBySSL_rrj9zM5",
|
||||||
|
"_score" : null,
|
||||||
|
"_source" : ...,
|
||||||
|
"sort" : [ <2>
|
||||||
|
4098435132000,
|
||||||
|
"FaslK3QBySSL_rrj9zM5"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
----
|
||||||
|
// TESTRESPONSE[skip: unable to access PIT ID]
|
||||||
|
|
||||||
|
<1> Updated `id` for the point in time.
|
||||||
|
<2> Sort values for the last returned hit.
|
||||||
|
|
||||||
|
To get the next page of results, rerun the previous search using the last hit's
|
||||||
|
sort values as the `search_after` argument. If using a PIT, use the latest PIT
|
||||||
|
ID in the `pit.id` parameter. The search's `query` and `sort` arguments must
|
||||||
|
remain unchanged. If provided, the `from` argument must be `0` (default) or `-1`.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
GET /_search
|
||||||
|
{
|
||||||
|
"size": 10000,
|
||||||
|
"query": {
|
||||||
|
"match" : {
|
||||||
|
"user.id" : "elkbee"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"pit": {
|
||||||
|
"id": "46ToAwEPbXktaW5kZXgtMDAwMDAxFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAFldicVdzOFFtVHZTZDFoWWowTGkwS0EAAAAAAAAAAAQURzZzcUszUUJ5U1NMX3Jyak5ET0wBFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAAA==", <1>
|
||||||
|
"keep_alive": "1m"
|
||||||
|
},
|
||||||
|
"sort": [
|
||||||
|
{"@timestamp": "asc"},
|
||||||
|
{"tie_breaker_id": "asc"}
|
||||||
|
],
|
||||||
|
"search_after": [ <2>
|
||||||
|
4098435132000,
|
||||||
|
"FaslK3QBySSL_rrj9zM5"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
----
|
||||||
|
// TEST[catch:missing]
|
||||||
|
|
||||||
|
<1> PIT ID returned by the previous search.
|
||||||
|
<2> Sort values from the previous search's last hit.
|
||||||
|
|
||||||
|
You can repeat this process to get additional pages of results. If using a PIT,
|
||||||
|
you can extend the PIT's retention period using the
|
||||||
|
`keep_alive` parameter of each search request.
|
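The whole loop can be sketched in Python as shown below. This is a minimal illustration, not a client API: `iter_hits` is a hypothetical helper, and `search` stands for any callable you supply that sends a body to `GET /_search` and returns the parsed JSON response. The sketch assumes each hit carries its `sort` values and that the response may include an updated `pit_id`.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits(
    search: SearchFn,
    pit_id: str,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for a query by paging with search_after and a PIT."""
    search_after: Optional[List[Any]] = None
    while True:
        # The query and sort stay the same on every page.
        body: Dict[str, Any] = {
            "size": size,
            "query": query,
            "sort": sort,
            # Each request also extends the PIT's time to live.
            "pit": {"id": pit_id, "keep_alive": keep_alive},
        }
        if search_after is not None:
            body["search_after"] = search_after
        response = search(body)
        # Always carry the most recently received PIT ID forward.
        pit_id = response.get("pit_id", pit_id)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The last hit's sort values mark where the next page starts.
        search_after = hits[-1]["sort"]
```

The generator stops when a page comes back empty. Opening and deleting the PIT is left to the caller.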
||||||
|
|
||||||
|
When you're finished, you should delete your PIT.
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
DELETE /_pit
|
||||||
|
{
|
||||||
|
"id" : "46ToAwEPbXktaW5kZXgtMDAwMDAxFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAFldicVdzOFFtVHZTZDFoWWowTGkwS0EAAAAAAAAAAAQURzZzcUszUUJ5U1NMX3Jyak5ET0wBFnVzaTVuenpUVGQ2TFNheUxVUG5LVVEAAA=="
|
||||||
|
}
|
||||||
|
----
|
||||||
|
// TEST[catch:missing]
|
||||||
|
|
||||||
WARNING: {es} uses Lucene's internal doc IDs as tie-breakers. These internal
|
|
||||||
doc IDs can be completely different across replicas of the same
|
|
||||||
data. When paginating, you might occasionally see that documents with the same
|
|
||||||
sort values are not ordered consistently.
|
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[scroll-search-results]]
|
[[scroll-search-results]]
|
||||||
=== Scroll search results
|
=== Scroll search results
|
||||||
|
|
||||||
|
IMPORTANT: We no longer recommend using the scroll API for deep pagination. If
|
||||||
|
you need to preserve the index state while paging through more than 10,000 hits,
|
||||||
|
use the <<search-after,`search_after`>> parameter with a point in time (PIT).
|
||||||
|
|
||||||
While a `search` request returns a single ``page'' of results, the `scroll`
|
While a `search` request returns a single ``page'' of results, the `scroll`
|
||||||
API can be used to retrieve large numbers of results (or even all results)
|
API can be used to retrieve large numbers of results (or even all results)
|
||||||
from a single search request, in much the same way as you would use a cursor
|
from a single search request, in much the same way as you would use a cursor
|
||||||
|
@ -125,13 +266,13 @@ POST /_search/scroll
|
||||||
for another `1m`.
|
for another `1m`.
|
||||||
<3> The `scroll_id` parameter
|
<3> The `scroll_id` parameter
|
||||||
|
|
||||||
The `size` parameter allows you to configure the maximum number of hits to be
|
The `size` parameter allows you to configure the maximum number of hits to be
|
||||||
returned with each batch of results. Each call to the `scroll` API returns the
|
returned with each batch of results. Each call to the `scroll` API returns the
|
||||||
next batch of results until there are no more results left to return, i.e., the
|
next batch of results until there are no more results left to return, i.e., the
|
||||||
`hits` array is empty.
|
`hits` array is empty.
|
||||||
|
|
||||||
IMPORTANT: The initial search request and each subsequent scroll request each
|
IMPORTANT: The initial search request and each subsequent scroll request each
|
||||||
return a `_scroll_id`. While the `_scroll_id` may change between requests, it doesn’t
|
return a `_scroll_id`. While the `_scroll_id` may change between requests, it doesn’t
|
||||||
always change — in any case, only the most recently received `_scroll_id` should be used.
|
always change — in any case, only the most recently received `_scroll_id` should be used.
|
||||||
|
|
||||||
NOTE: If the request specifies aggregations, only the initial search response
|
NOTE: If the request specifies aggregations, only the initial search response
|
||||||
|
@ -340,85 +481,3 @@ For append only time-based indices, the `timestamp` field can be used safely.
|
||||||
|
|
||||||
NOTE: By default the maximum number of slices allowed per scroll is limited to 1024.
|
NOTE: By default the maximum number of slices allowed per scroll is limited to 1024.
|
||||||
You can update the `index.max_slices_per_scroll` index setting to bypass this limit.
|
You can update the `index.max_slices_per_scroll` index setting to bypass this limit.
|
||||||
|
|
||||||
[discrete]
|
|
||||||
[[search-after]]
|
|
||||||
=== Search after
|
|
||||||
|
|
||||||
Pagination of results can be done by using the `from` and `size` but the cost becomes prohibitive when the deep pagination is reached.
|
|
||||||
The `index.max_result_window` which defaults to 10,000 is a safeguard, search requests take heap memory and time proportional to `from + size`.
|
|
||||||
The <<scroll-search-results,scroll>> API is recommended for efficient deep scrolling but scroll contexts are costly and it is not
|
|
||||||
recommended to use it for real time user requests.
|
|
||||||
The `search_after` parameter circumvents this problem by providing a live cursor.
|
|
||||||
The idea is to use the results from the previous page to help the retrieval of the next page.
|
|
||||||
|
|
||||||
Suppose that the query to retrieve the first page looks like this:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
--------------------------------------------------
|
|
||||||
GET my-index-000001/_search
|
|
||||||
{
|
|
||||||
"size": 10,
|
|
||||||
"query": {
|
|
||||||
"match" : {
|
|
||||||
"message" : "foo"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"sort": [
|
|
||||||
{"@timestamp": "asc"},
|
|
||||||
{"tie_breaker_id": "asc"} <1>
|
|
||||||
]
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[setup:my_index]
|
|
||||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
|
||||||
|
|
||||||
<1> A copy of the `_id` field with `doc_values` enabled
|
|
||||||
|
|
||||||
[IMPORTANT]
|
|
||||||
A field with one unique value per document should be used as the tiebreaker
|
|
||||||
of the sort specification. Otherwise the sort order for documents that have
|
|
||||||
the same sort values would be undefined and could lead to missing or duplicate
|
|
||||||
results. The <<mapping-id-field,`_id` field>> has a unique value per document
|
|
||||||
but it is not recommended to use it as a tiebreaker directly.
|
|
||||||
Beware that `search_after` looks for the first document which fully or partially
|
|
||||||
matches tiebreaker's provided value. Therefore if a document has a tiebreaker value of
|
|
||||||
`"654323"` and you `search_after` for `"654"` it would still match that document
|
|
||||||
and return results found after it.
|
|
||||||
<<doc-values,doc value>> are disabled on this field so sorting on it requires
|
|
||||||
to load a lot of data in memory. Instead it is advised to duplicate (client side
|
|
||||||
or with a <<ingest-processors,set ingest processor>>) the content
|
|
||||||
of the <<mapping-id-field,`_id` field>> in another field that has
|
|
||||||
<<doc-values,doc value>> enabled and to use this new field as the tiebreaker
|
|
||||||
for the sort.
|
|
||||||
|
|
||||||
The result from the above request includes an array of `sort values` for each document.
|
|
||||||
These `sort values` can be used in conjunction with the `search_after` parameter to start returning results "after" any
|
|
||||||
document in the result list.
|
|
||||||
For instance we can use the `sort values` of the last document and pass it to `search_after` to retrieve the next page of results:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
--------------------------------------------------
|
|
||||||
GET my-index-000001/_search
|
|
||||||
{
|
|
||||||
"size": 10,
|
|
||||||
"query": {
|
|
||||||
"match" : {
|
|
||||||
"message" : "foo"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"search_after": [1463538857, "654323"],
|
|
||||||
"sort": [
|
|
||||||
{"@timestamp": "asc"},
|
|
||||||
{"tie_breaker_id": "asc"}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[setup:my_index]
|
|
||||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
|
||||||
|
|
||||||
NOTE: The parameter `from` must be set to 0 (or -1) when `search_after` is used.
|
|
||||||
|
|
||||||
`search_after` is not a solution to jump freely to a random page but rather to scroll many queries in parallel.
|
|
||||||
It is very similar to the `scroll` API but unlike it, the `search_after` parameter is stateless, it is always resolved against the latest
|
|
||||||
version of the searcher. For this reason the sort order may change during a walk depending on the updates and deletes of your index.
|
|
||||||
|
|
|
@ -89,21 +89,9 @@ computation as part of a hit. Defaults to `false`.
|
||||||
|
|
||||||
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
|
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
|
||||||
+
|
+
|
||||||
--
|
By default, you cannot page through more than 10,000 hits using the `from` and
|
||||||
By default, you cannot page through more than 10,000 documents using the `from`
|
`size` parameters. To page through more hits, use the
|
||||||
and `size` parameters. This limit is set using the
|
|
||||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
|
||||||
|
|
||||||
Deep paging or requesting many results at once can result in slow searches.
|
|
||||||
Results are sorted before being returned. Because search requests usually span
|
|
||||||
multiple shards, each shard must generate its own sorted results. These separate
|
|
||||||
results must then be combined and sorted to ensure that the overall order is
|
|
||||||
correct.
|
|
||||||
|
|
||||||
As an alternative to deep paging, we recommend using
|
|
||||||
<<scroll-search-results,scroll>> or the
|
|
||||||
<<search-after,`search_after`>> parameter.
|
<<search-after,`search_after`>> parameter.
|
||||||
--
|
|
||||||
|
|
||||||
`ignore_throttled`::
|
`ignore_throttled`::
|
||||||
(Optional, boolean) If `true`, concrete, expanded or aliased indices will be
|
(Optional, boolean) If `true`, concrete, expanded or aliased indices will be
|
||||||
|
@ -229,25 +217,10 @@ last modification of each hit. See <<optimistic-concurrency-control>>.
|
||||||
`size`::
|
`size`::
|
||||||
(Optional, integer) Defines the number of hits to return. Defaults to `10`.
|
(Optional, integer) Defines the number of hits to return. Defaults to `10`.
|
||||||
+
|
+
|
||||||
--
|
By default, you cannot page through more than 10,000 hits using the `from` and
|
||||||
By default, you cannot page through more than 10,000 documents using the `from`
|
`size` parameters. To page through more hits, use the
|
||||||
and `size` parameters. This limit is set using the
|
|
||||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
|
||||||
|
|
||||||
Deep paging or requesting many results at once can result in slow searches.
|
|
||||||
Results are sorted before being returned. Because search requests usually span
|
|
||||||
multiple shards, each shard must generate its own sorted results. These separate
|
|
||||||
results must then be combined and sorted to ensure that the overall order is
|
|
||||||
correct.
|
|
||||||
|
|
||||||
As an alternative to deep paging, we recommend using
|
|
||||||
<<scroll-search-results,scroll>> or the
|
|
||||||
<<search-after,`search_after`>> parameter.
|
<<search-after,`search_after`>> parameter.
|
||||||
|
|
||||||
If the <<search-api-scroll-query-param,`scroll` parameter>> is specified, this
|
|
||||||
value cannot be `0`.
|
|
||||||
--
|
|
||||||
|
|
||||||
`sort`::
|
`sort`::
|
||||||
(Optional, string) A comma-separated list of <field>:<direction> pairs.
|
(Optional, string) A comma-separated list of <field>:<direction> pairs.
|
||||||
|
|
||||||
|
@ -366,21 +339,9 @@ computation as part of a hit. Defaults to `false`.
|
||||||
|
|
||||||
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
|
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
|
||||||
+
|
+
|
||||||
--
|
By default, you cannot page through more than 10,000 hits using the `from` and
|
||||||
By default, you cannot page through more than 10,000 documents using the `from`
|
`size` parameters. To page through more hits, use the
|
||||||
and `size` parameters. This limit is set using the
|
|
||||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
|
||||||
|
|
||||||
Deep paging or requesting many results at once can result in slow searches.
|
|
||||||
Results are sorted before being returned. Because search requests usually span
|
|
||||||
multiple shards, each shard must generate its own sorted results. These separate
|
|
||||||
results must then be combined and sorted to ensure that the overall order is
|
|
||||||
correct.
|
|
||||||
|
|
||||||
As an alternative to deep paging, we recommend using
|
|
||||||
<<scroll-search-results,scroll>> or the
|
|
||||||
<<search-after,`search_after`>> parameter.
|
<<search-after,`search_after`>> parameter.
|
||||||
--
|
|
||||||
|
|
||||||
`indices_boost`::
|
`indices_boost`::
|
||||||
(Optional, array of objects)
|
(Optional, array of objects)
|
||||||
|
@ -419,25 +380,10 @@ last modification of each hit. See <<optimistic-concurrency-control>>.
|
||||||
`size`::
|
`size`::
|
||||||
(Optional, integer) The number of hits to return. Defaults to `10`.
|
(Optional, integer) The number of hits to return. Defaults to `10`.
|
||||||
+
|
+
|
||||||
--
|
By default, you cannot page through more than 10,000 hits using the `from` and
|
||||||
By default, you cannot page through more than 10,000 documents using the `from`
|
`size` parameters. To page through more hits, use the
|
||||||
and `size` parameters. This limit is set using the
|
|
||||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
|
||||||
|
|
||||||
Deep paging or requesting many results at once can result in slow searches.
|
|
||||||
Results are sorted before being returned. Because search requests usually span
|
|
||||||
multiple shards, each shard must generate its own sorted results. These separate
|
|
||||||
results must then be combined and sorted to ensure that the overall order is
|
|
||||||
correct.
|
|
||||||
|
|
||||||
As an alternative to deep paging, we recommend using
|
|
||||||
<<scroll-search-results,scroll>> or the
|
|
||||||
<<search-after,`search_after`>> parameter.
|
<<search-after,`search_after`>> parameter.
|
||||||
|
|
||||||
If the <<search-api-scroll-query-param,`scroll` parameter>> is specified, this
|
|
||||||
value cannot be `0`.
|
|
||||||
--
|
|
||||||
|
|
||||||
`_source`::
|
`_source`::
|
||||||
(Optional)
|
(Optional)
|
||||||
Indicates which <<mapping-source-field,source fields>> are returned for matching
|
Indicates which <<mapping-source-field,source fields>> are returned for matching
|
||||||
|
|