[[index-modules-index-sorting]]
== Index Sorting
experimental[]

When creating a new index in Elasticsearch, it is possible to configure how the segments
inside each shard will be sorted. By default, Lucene does not apply any sort.
The `index.sort.*` settings define which fields should be used to sort the documents inside each segment.

[WARNING]
Nested fields are not compatible with index sorting because they rely on the assumption
that nested documents are stored in contiguous doc IDs, which can be broken by index sorting.
An error will be thrown if index sorting is activated on an index that contains nested fields.

For instance, the following example shows how to define a sort on a single field:

[source,js]
--------------------------------------------------
PUT twitter
{
    "settings" : {
        "index" : {
            "sort.field" : "date", <1>
            "sort.order" : "desc" <2>
        }
    },
    "mappings": {
        "tweet": {
            "properties": {
                "date": {
                    "type": "date"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

<1> This index is sorted by the `date` field
<2> ... in descending order.

It is also possible to sort the index by more than one field:

[source,js]
--------------------------------------------------
PUT twitter
{
    "settings" : {
        "index" : {
            "sort.field" : ["username", "date"], <1>
            "sort.order" : ["asc", "desc"] <2>
        }
    },
    "mappings": {
        "tweet": {
            "properties": {
                "username": {
                    "type": "keyword",
                    "doc_values": true
                },
                "date": {
                    "type": "date"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

<1> This index is sorted by `username` first, then by `date`
<2> ... in ascending order for the `username` field and in descending order for the `date` field.

Index sorting supports the following settings:

`index.sort.field`::
The list of fields used to sort the index.
Only `boolean`, `numeric`, `date` and `keyword` fields with `doc_values` are allowed here.

`index.sort.order`::
The sort order to use for each field.
The order option can have the following values:
* `asc`: For ascending order.
* `desc`: For descending order.

`index.sort.mode`::
Elasticsearch supports sorting by multi-valued fields.
The mode option controls which value is picked to sort the document.
The mode option can have the following values:
* `min`: Pick the lowest value.
* `max`: Pick the highest value.

`index.sort.missing`::
The missing parameter specifies how documents that are missing the field should be treated.
The missing option can have the following values:
* `_last`: Documents without a value for the field are sorted last.
* `_first`: Documents without a value for the field are sorted first.

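
As a sketch of how these settings combine, the following request sorts a hypothetical `events` index on a multi-valued numeric field. The index name, the `event` type and the `response_times` field are assumptions made for illustration and do not appear elsewhere in this page:

[source,js]
--------------------------------------------------
PUT events
{
    "settings" : {
        "index" : {
            "sort.field" : "response_times", <1>
            "sort.order" : "asc",
            "sort.mode" : "min", <2>
            "sort.missing" : "_last" <3>
        }
    },
    "mappings": {
        "event": {
            "properties": {
                "response_times": {
                    "type": "long"
                }
            }
        }
    }
}
--------------------------------------------------
<1> The (hypothetical) field may hold several values per document
<2> ... so the lowest value of each document is used for sorting
<3> ... and documents without any value for the field are sorted last.
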
[WARNING]
Index sorting can be defined only once, at index creation. It is not possible to add or update
a sort on an existing index.

// TODO: Also document how index sorting can be used to early-terminate
// sorted search requests when the total number of matches is not needed
[[index-modules-index-sorting-conjunctions]]
=== Use index sorting to speed up conjunctions

Index sorting can be useful in order to organize Lucene doc IDs (not to be
conflated with `_id`) in a way that makes conjunctions (a AND b AND ...) more
efficient. In order to be efficient, conjunctions rely on the fact that if any
clause does not match, then the entire conjunction does not match. By using
index sorting, we can group documents that do not match next to each other,
which helps skip efficiently over large ranges of doc IDs that do not match
the conjunction.

This trick only works with low-cardinality fields. A rule of thumb is that
you should sort first on fields that both have a low cardinality and are
frequently used for filtering. The sort order (`asc` or `desc`) does not
matter as we only care about putting values that would match the same clauses
close to each other.

For instance, if you were indexing cars for sale, it might be interesting to
sort by fuel type, body type, make, year of registration and finally mileage.
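
A minimal sketch of such an index could look like the following. The index name `cars`, the `car` type, and all field names and types are assumptions chosen for illustration. Every field is sorted in ascending order here only because the order does not matter for this use case:

[source,js]
--------------------------------------------------
PUT cars
{
    "settings" : {
        "index" : {
            "sort.field" : ["fuel_type", "body_type", "make", "registration_year", "mileage"],
            "sort.order" : ["asc", "asc", "asc", "asc", "asc"]
        }
    },
    "mappings": {
        "car": {
            "properties": {
                "fuel_type": {
                    "type": "keyword"
                },
                "body_type": {
                    "type": "keyword"
                },
                "make": {
                    "type": "keyword"
                },
                "registration_year": {
                    "type": "integer"
                },
                "mileage": {
                    "type": "integer"
                }
            }
        }
    }
}
--------------------------------------------------

A conjunction of filters on the leading sort fields, such as the hypothetical search below, is the kind of request that may benefit from this layout, since documents sharing the same fuel type and body type end up next to each other in each segment:

[source,js]
--------------------------------------------------
GET cars/_search
{
    "query" : {
        "bool" : {
            "filter" : [
                { "term" : { "fuel_type" : "diesel" } },
                { "term" : { "body_type" : "estate" } }
            ]
        }
    }
}
--------------------------------------------------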