132 lines
4.3 KiB
Plaintext
132 lines
4.3 KiB
Plaintext
[[index-modules-index-sorting]]
|
|
== Index Sorting
|
|
|
|
experimental[]
|
|
|
|
When creating a new index in elasticsearch it is possible to configure how the Segments
|
|
inside each Shard will be sorted. By default Lucene does not apply any sort.
|
|
The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.
|
|
|
|
[WARNING]
|
|
nested fields are not compatible with index sorting because they rely on the assumption
|
|
that nested documents are stored in contiguous doc ids, which can be broken by index sorting.
|
|
An error will be thrown if index sorting is activated on an index that contains nested fields.
|
|
|
|
For instance the following example shows how to define a sort on a single field:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter
|
|
{
|
|
"settings" : {
|
|
"index" : {
|
|
"sort.field" : "date", <1>
|
|
"sort.order" : "desc" <2>
|
|
}
|
|
},
|
|
"mappings": {
|
|
"tweet": {
|
|
"properties": {
|
|
"date": {
|
|
"type": "date"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<1> This index is sorted by the `date` field
|
|
<2> ... in descending order.
|
|
|
|
It is also possible to sort the index by more than one field:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter
|
|
{
|
|
"settings" : {
|
|
"index" : {
|
|
"sort.field" : ["username", "date"], <1>
|
|
"sort.order" : ["asc", "desc"] <2>
|
|
}
|
|
},
|
|
"mappings": {
|
|
"tweet": {
|
|
"properties": {
|
|
"username": {
|
|
"type": "keyword",
|
|
"doc_values": true
|
|
},
|
|
"date": {
|
|
"type": "date"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<1> This index is sorted by `username` first then by `date`
|
|
<2> ... in ascending order for the `username` field and in descending order for the `date` field.
|
|
|
|
|
|
Index sorting supports the following settings:
|
|
|
|
`index.sort.field`::
|
|
|
|
The list of fields used to sort the index.
|
|
Only `boolean`, `numeric`, `date` and `keyword` fields with `doc_values` are allowed here.
|
|
|
|
`index.sort.order`::
|
|
|
|
The sort order to use for each field.
|
|
The order option can have the following values:
|
|
* `asc`: For ascending order
|
|
* `desc`: For descending order.
|
|
|
|
`index.sort.mode`::
|
|
|
|
Elasticsearch supports sorting by multi-valued fields.
|
|
The mode option controls what value is picked to sort the document.
|
|
The mode option can have the following values:
|
|
* `min`: Pick the lowest value.
|
|
* `max`: Pick the highest value.
|
|
|
|
`index.sort.missing`::
|
|
|
|
The missing parameter specifies how docs which are missing the field should be treated.
|
|
The missing value can have the following values:
|
|
* `_last`: Documents without value for the field are sorted last.
|
|
* `_first`: Documents without value for the field are sorted first.
|
|
|
|
[WARNING]
|
|
Index sorting can be defined only once at index creation. It is not allowed to add or update
|
|
a sort on an existing index.
|
|
|
|
// TODO: Also document how index sorting can be used to early-terminate
|
|
// sorted search requests when the total number of matches is not needed
|
|
|
|
[[index-modules-index-sorting-conjunctions]]
|
|
=== Use index sorting to speed up conjunctions
|
|
|
|
Index sorting can be useful in order to organize Lucene doc ids (not to be
|
|
conflated with `_id`) in a way that makes conjunctions (a AND b AND ...) more
|
|
efficient. In order to be efficient, conjunctions rely on the fact that if any
|
|
clause does not match, then the entire conjunction does not match. By using
|
|
index sorting, we can put documents that do not match together, which will
|
|
help skip efficiently over large ranges of doc IDs that do not match the
|
|
conjunction.
|
|
|
|
This trick only works with low-cardinality fields. A rule of thumb is that
|
|
you should sort first on fields that both have a low cardinality and are
|
|
frequently used for filtering. The sort order (`asc` or `desc`) does not
|
|
matter as we only care about putting values that would match the same clauses
|
|
close to each other.
|
|
|
|
For instance if you were indexing cars for sale, it might be interesting to
|
|
sort by fuel type, body type, make, year of registration and finally mileage.
|
|
|