[[request-body-search-search-after]]
==== Search After

Results can be paginated with the `from` and `size` parameters, but the cost becomes prohibitive for deep pagination.
The `index.max_result_window` setting, which defaults to 10,000, is a safeguard: search requests take heap memory and time proportional to `from + size`.
The <<request-body-search-scroll,Scroll>> API is recommended for efficient deep scrolling, but scroll contexts are costly and are not
recommended for real-time user requests.
The `search_after` parameter circumvents this problem by providing a live cursor.
The idea is to use the results from the previous page to help retrieve the next page.
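
As a rough illustration of the `from + size` window described above, the following Python sketch computes the window for a given page and checks it against the default `index.max_result_window` of 10,000. The helper names `page_window` and `fits_result_window` are hypothetical and are not part of any Elasticsearch client.

[source,python]
--------------------------------------------------
# Default value of index.max_result_window, as stated in the text.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def page_window(page, size):
    """Return the (from, size) pair for a zero-based page number."""
    return page * size, size


def fits_result_window(from_, size, limit=DEFAULT_MAX_RESULT_WINDOW):
    """Return True when from + size stays within the result window."""
    return from_ + size <= limit


if __name__ == "__main__":
    for page in (0, 999, 1000):
        from_, size = page_window(page, 10)
        # Print the page number, the window it needs, and whether it fits.
        print(page, from_ + size, fits_result_window(from_, size))
--------------------------------------------------

With a page size of 10, the window grows by 10 for every further page, which is why deep pages become expensive with `from` and `size`.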

Suppose that the query to retrieve the first page looks like this:

[source,console]
--------------------------------------------------
GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "sort": [
        {"date": "asc"},
        {"tie_breaker_id": "asc"} <1>
    ]
}
--------------------------------------------------
// TEST[setup:twitter]
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]

<1> A copy of the `_id` field with `doc_values` enabled

[IMPORTANT]
A field with one unique value per document should be used as the tiebreaker
of the sort specification. Otherwise the sort order for documents that have
the same sort values would be undefined and could lead to missing or duplicate
results. The <<mapping-id-field,`_id` field>> has a unique value per document
but it is not recommended to use it as a tiebreaker directly:
<<doc-values,doc values>> are disabled on this field, so sorting on it requires
loading a lot of data in memory. Instead, it is advised to duplicate (client side
or with a <<ingest-processors,set ingest processor>>) the content
of the <<mapping-id-field,`_id` field>> in another field that has
<<doc-values,doc values>> enabled and to use this new field as the tiebreaker
for the sort.
Beware that `search_after` looks for the first document which fully or partially
matches the tiebreaker's provided value. Therefore, if a document has a tiebreaker value of
`"654323"` and you `search_after` for `"654"`, it would still match that document
and return the results found after it.
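
To make the need for a unique tiebreaker concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not how Elasticsearch resolves `search_after` internally: the functions `page_after` and `walk` and the sample documents are hypothetical, and the model assumes a page simply contains the documents whose sort values are strictly greater than the cursor.

[source,python]
--------------------------------------------------
def page_after(docs, sort_keys, after, size):
    """Return the first `size` docs whose sort values are strictly greater
    than `after` (a tuple of sort values, or None for the first page)."""
    def sort_values(doc):
        return tuple(doc[key] for key in sort_keys)

    ordered = sorted(docs, key=sort_values)
    if after is not None:
        ordered = [doc for doc in ordered if sort_values(doc) > tuple(after)]
    return ordered[:size]


def walk(docs, sort_keys, size):
    """Page through all docs and return the ids in the order they were seen."""
    seen, after = [], None
    while True:
        page = page_after(docs, sort_keys, after, size)
        if not page:
            return seen
        seen.extend(doc["tie_breaker_id"] for doc in page)
        # The sort values of the last document become the next cursor.
        after = tuple(page[-1][key] for key in sort_keys)


# Hypothetical documents: three of them share the same `date`.
DOCS = [
    {"date": 1, "tie_breaker_id": "a"},
    {"date": 2, "tie_breaker_id": "b"},
    {"date": 2, "tie_breaker_id": "c"},
    {"date": 2, "tie_breaker_id": "d"},
    {"date": 3, "tie_breaker_id": "e"},
]

if __name__ == "__main__":
    # Sorting on `date` alone: in this model, documents that share the
    # date of the last hit of a page are never returned.
    print(walk(DOCS, ["date"], size=2))
    # With the unique tiebreaker every document has a distinct position.
    print(walk(DOCS, ["date", "tie_breaker_id"], size=2))
--------------------------------------------------

In this model the walk that sorts on `date` alone loses the documents that tie with the last hit of a page, while the walk that adds the unique field visits every document exactly once.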

The result from the above request includes an array of `sort values` for each document.
These `sort values` can be used in conjunction with the `search_after` parameter to start returning results "after" any
document in the result list.
For instance, we can use the `sort values` of the last document and pass them to `search_after` to retrieve the next page of results:

[source,console]
--------------------------------------------------
GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "search_after": [1463538857, "654323"],
    "sort": [
        {"date": "asc"},
        {"tie_breaker_id": "asc"}
    ]
}
--------------------------------------------------
// TEST[setup:twitter]
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]

NOTE: The parameter `from` must be set to 0 (or -1) when `search_after` is used.
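
A client typically derives the second request from the first one. The Python sketch below shows one possible way to do that with plain dictionaries. The helper `next_page_body` is hypothetical, and the sample response is hand-written and trimmed to the `sort` array of each hit; it is not the output of a real cluster.

[source,python]
--------------------------------------------------
import copy
import json


def next_page_body(body, response):
    """Return a copy of `body` whose `search_after` holds the sort values of
    the last hit in `response`, or None when the response has no hits."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    next_body = copy.deepcopy(body)
    # `from` is dropped so that it falls back to its default of 0.
    next_body.pop("from", None)
    next_body["search_after"] = list(hits[-1]["sort"])
    return next_body


FIRST_PAGE = {
    "size": 10,
    "query": {"match": {"title": "elasticsearch"}},
    "sort": [{"date": "asc"}, {"tie_breaker_id": "asc"}],
}

# Hand-written stand-in for a search response (assumption, heavily trimmed).
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {"sort": [1463538000, "654000"]},
            {"sort": [1463538857, "654323"]},
        ]
    }
}

if __name__ == "__main__":
    print(json.dumps(next_page_body(FIRST_PAGE, SAMPLE_RESPONSE), indent=2))
--------------------------------------------------

The query and the sort specification are copied unchanged, so every page is resolved with the same ordering as the first one.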

`search_after` is not a solution for jumping freely to a random page, but rather for scrolling many queries in parallel.
It is very similar to the `scroll` API but, unlike it, the `search_after` parameter is stateless: it is always resolved against the latest
version of the searcher. For this reason, the sort order may change during a walk, depending on the updates and deletes of your index.
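
The stateless behaviour can be pictured with another toy in-memory model in Python. It is a sketch under simplifying assumptions, not Elasticsearch's implementation: the function `search_after` below is a hypothetical stand-in that re-sorts the current content of a list on every call, and each tuple stands for the sort values `(date, tie_breaker_id)` of one document.

[source,python]
--------------------------------------------------
import bisect


def search_after(index, after, size):
    """Return up to `size` sort-value tuples strictly after `after`,
    evaluated against the current content of `index`."""
    ordered = sorted(index)
    start = 0 if after is None else bisect.bisect_right(ordered, after)
    return ordered[start:start + size]


index = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

page1 = search_after(index, None, 2)
cursor = page1[-1]  # sort values of the last hit of the first page

# The index changes between the two requests.
index.remove((3, "c"))   # a delete that falls after the cursor
index.append((0, "z"))   # a new document that sorts before the cursor
index.append((5, "e"))   # a new document that sorts after the cursor

page2 = search_after(index, cursor, 2)

if __name__ == "__main__":
    print(page1)
    print(page2)
--------------------------------------------------

In this model the second page reflects the index as it is when the second request runs: the deleted document is gone, the document added after the cursor shows up, and the document added before the cursor is never visited by this walk.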