Do not recommend using the _id field in search_after docs (#35370)

The documentation of `search_after` recommends using the `_id`
field as a tiebreaker for the sort without warning about
the additional memory this requires. This change updates the recommendation
to use a copy of the `_id` field with doc_values enabled.
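As a rough illustration of the client-side way to make that copy, the Python sketch below builds a `_bulk` request body in which each document's `_id` is also written to a `tie_breaker_id` field, the field name the updated examples sort on. The helper `bulk_body`, the constant `TIE_BREAKER_MAPPING` and the sample data are hypothetical. The mapping fragment only shows the field definition (`keyword` fields have doc values enabled by default); where it nests in a create-index request depends on the Elasticsearch version, and the action line assumes a version that does not require a `_type`.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# Hypothetical field definition for the copy of _id; keyword fields have
# doc_values enabled by default. Its exact nesting depends on the ES version.
TIE_BREAKER_MAPPING = {"properties": {"tie_breaker_id": {"type": "keyword"}}}


def bulk_body(index: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build an NDJSON body for the _bulk API (hypothetical helper).

    Each document's _id is also written to `tie_breaker_id`, so the sort can
    use a doc-values field instead of sorting on `_id` itself.
    """
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: explicit _id chosen by the client.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        body = dict(source)              # copy so the caller's dict is untouched
        body["tie_breaker_id"] = doc_id  # client-side duplicate of _id
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"       # _bulk bodies must end with a newline


print(bulk_body("twitter", [("654323", {"date": 1463538857})]), end="")
# {"index": {"_index": "twitter", "_id": "654323"}}
# {"date": 1463538857, "tie_breaker_id": "654323"}
```

Copying on the client keeps the indexing path simple; the diff also mentions a `set` ingest processor as a server-side alternative, which is not shown here.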
Jim Ferenczi 2018-11-14 10:50:31 +01:00 committed by GitHub
parent 0ce4649e88
commit 72504c2512
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 18 additions and 5 deletions


@@ -21,16 +21,28 @@ GET twitter/_search
 },
 "sort": [
 {"date": "asc"},
-{"_id": "desc"}
+{"tie_breaker_id": "asc"} <1>
 ]
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
+// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
 
-NOTE: A field with one unique value per document should be used as the tiebreaker of the sort specification.
-Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use
-the field `_id` which is certain to contain one unique value for each document.
+<1> A copy of the `_id` field with `doc_values` enabled
+
+[IMPORTANT]
+A field with one unique value per document should be used as the tiebreaker
+of the sort specification. Otherwise the sort order for documents that have
+the same sort values would be undefined and could lead to missing or duplicate
+results. The <<mapping-id-field,`_id` field>> has a unique value per document
+but it is not recommended to use it as a tiebreaker directly.
+<<doc-values,doc value>> are disabled on this field so sorting on it requires
+to load a lot of data in memory. Instead it is advised to duplicate (client side
+or with a <<ingest-processors,set ingest processor>>) the content
+of the <<mapping-id-field,`_id` field>> in another field that has
+<<doc-values,doc value>> enabled and to use this new field as the tiebreaker
+for the sort.
 
 The result from the above request includes an array of `sort values` for each document.
 These `sort values` can be used in conjunction with the `search_after` parameter to start returning results "after" any
@@ -50,12 +62,13 @@ GET twitter/_search
 "search_after": [1463538857, "654323"],
 "sort": [
 {"date": "asc"},
-{"_id": "desc"}
+{"tie_breaker_id": "asc"}
 ]
 }
 --------------------------------------------------
 // CONSOLE
 // TEST[setup:twitter]
+// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
 
 NOTE: The parameter `from` must be set to 0 (or -1) when `search_after` is used.
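To show why the [IMPORTANT] note in the diff warns about missing results without a unique tiebreaker, here is a simplified Python model of `search_after` paging. It is an illustration, not Elasticsearch's implementation: it assumes hits are sorted ascending on the given fields and that each page keeps only hits whose sort values are strictly greater than the `search_after` values taken from the last hit of the previous page. The function names and sample documents are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Doc = Dict[str, Any]


def sort_values(doc: Doc, fields: Sequence[str]) -> Tuple:
    """The per-hit sort values, as a tuple so they compare field by field."""
    return tuple(doc[f] for f in fields)


def search_page(docs: List[Doc], fields: Sequence[str], size: int,
                after: Optional[Tuple] = None) -> List[Doc]:
    """One simulated request: ascending sort on `fields`, then keep only hits
    whose sort values are strictly greater than `after` (the search_after values)."""
    hits = sorted(docs, key=lambda d: sort_values(d, fields))
    if after is not None:
        hits = [d for d in hits if sort_values(d, fields) > after]
    return hits[:size]


def paginate(docs: List[Doc], fields: Sequence[str], size: int) -> List[str]:
    """Walk all pages, feeding the last hit's sort values into the next request."""
    returned: List[str] = []
    after: Optional[Tuple] = None
    while True:
        page = search_page(docs, fields, size, after)
        if not page:
            return returned
        returned.extend(str(d["id"]) for d in page)
        after = sort_values(page[-1], fields)


docs = [
    {"id": "a", "date": 1, "tie_breaker_id": "a"},
    {"id": "b", "date": 1, "tie_breaker_id": "b"},
    {"id": "c", "date": 1, "tie_breaker_id": "c"},
    {"id": "d", "date": 2, "tie_breaker_id": "d"},
]

print(paginate(docs, ["date"], size=2))                    # ['a', 'b', 'd']
print(paginate(docs, ["date", "tie_breaker_id"], size=2))  # ['a', 'b', 'c', 'd']
```

Sorting on `date` alone, the second request resumes strictly after `date = 1`, so document `c`, tied with the last hit of the first page, is never returned. Adding the unique `tie_breaker_id` makes every sort tuple distinct, so each page resumes exactly where the previous one stopped.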