641 lines
16 KiB
Plaintext
641 lines
16 KiB
Plaintext
[[sort-search-results]]
|
|
== Sort search results
|
|
|
|
Allows you to add one or more sorts on specific fields. Each sort can be
|
|
reversed as well. The sort is defined on a per field level, with special
|
|
field name for `_score` to sort by score, and `_doc` to sort by index order.
|
|
|
|
Assuming the following index mapping:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /my-index-000001
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"post_date": { "type": "date" },
|
|
"user": {
|
|
"type": "keyword"
|
|
},
|
|
"name": {
|
|
"type": "keyword"
|
|
},
|
|
"age": { "type": "integer" }
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /my-index-000001/_search
|
|
{
|
|
"sort" : [
|
|
{ "post_date" : {"order" : "asc"}},
|
|
"user",
|
|
{ "name" : "desc" },
|
|
{ "age" : "desc" },
|
|
"_score"
|
|
],
|
|
"query" : {
|
|
"term" : { "user" : "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
NOTE: `_doc` has no real use-case besides being the most efficient sort order.
|
|
So if you don't care about the order in which documents are returned, then you
|
|
should sort by `_doc`. This especially helps when <<scroll-search-results,scrolling>>.
|
|
|
|
[discrete]
|
|
=== Sort Values
|
|
|
|
The sort values for each document returned are also returned as part of
|
|
the response.
|
|
|
|
[discrete]
|
|
=== Sort Order
|
|
|
|
The `order` option can have the following values:
|
|
|
|
[horizontal]
|
|
`asc`:: Sort in ascending order
|
|
`desc`:: Sort in descending order
|
|
|
|
The order defaults to `desc` when sorting on the `_score`, and defaults
|
|
to `asc` when sorting on anything else.
|
|
|
|
[discrete]
|
|
=== Sort mode option
|
|
|
|
Elasticsearch supports sorting by array or multi-valued fields. The `mode` option
|
|
controls what array value is picked for sorting the document it belongs
|
|
to. The `mode` option can have the following values:
|
|
|
|
[horizontal]
|
|
`min`:: Pick the lowest value.
|
|
`max`:: Pick the highest value.
|
|
`sum`:: Use the sum of all values as sort value. Only applicable for
|
|
number based array fields.
|
|
`avg`:: Use the average of all values as sort value. Only applicable
|
|
for number based array fields.
|
|
`median`:: Use the median of all values as sort value. Only applicable
|
|
for number based array fields.
|
|
|
|
The default sort mode in the ascending sort order is `min` -- the lowest value
|
|
is picked. The default sort mode in the descending order is `max` --
|
|
the highest value is picked.
|
|
|
|
[discrete]
|
|
==== Sort mode example usage
|
|
|
|
In the example below the field price has multiple prices per document.
|
|
In this case the result hits will be sorted by price ascending based on
|
|
the average price per document.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /my-index-000001/_doc/1?refresh
|
|
{
|
|
"product": "chocolate",
|
|
"price": [20, 4]
|
|
}
|
|
|
|
POST /_search
|
|
{
|
|
"query" : {
|
|
"term" : { "product" : "chocolate" }
|
|
},
|
|
"sort" : [
|
|
{"price" : {"order" : "asc", "mode" : "avg"}}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
=== Sorting numeric fields
|
|
|
|
For numeric fields it is also possible to cast the values from one type
|
|
to another using the `numeric_type` option.
|
|
This option accepts the following values: [`"double", "long", "date", "date_nanos"`]
|
|
and can be useful for searches across multiple data streams or indices where the sort field is mapped differently.
|
|
|
|
Consider for instance these two indices:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /index_double
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"field": { "type": "double" }
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /index_long
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"field": { "type": "long" }
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
Since `field` is mapped as a `double` in the first index and as a `long`
|
|
in the second index, it is not possible to use this field to sort requests
|
|
that query both indices by default. However you can force the type to one
|
|
or the other with the `numeric_type` option in order to force a specific
|
|
type for all indices:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /index_long,index_double/_search
|
|
{
|
|
"sort" : [
|
|
{
|
|
"field" : {
|
|
"numeric_type" : "double"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
In the example above, values for the `index_long` index are casted to
|
|
a double in order to be compatible with the values produced by the
|
|
`index_double` index.
|
|
It is also possible to transform a floating point field into a `long`
|
|
but note that in this case floating points are replaced by the largest
|
|
value that is less than or equal (greater than or equal if the value
|
|
is negative) to the argument and is equal to a mathematical integer.
|
|
|
|
This option can also be used to convert a `date` field that uses millisecond
|
|
resolution to a `date_nanos` field with nanosecond resolution.
|
|
Consider for instance these two indices:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /index_double
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"field": { "type": "date" }
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /index_long
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"field": { "type": "date_nanos" }
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
Values in these indices are stored with different resolutions so sorting on these
|
|
fields will always sort the `date` before the `date_nanos` (ascending order).
|
|
With the `numeric_type` type option it is possible to set a single resolution for
|
|
the sort, setting to `date` will convert the `date_nanos` to the millisecond resolution
|
|
while `date_nanos` will convert the values in the `date` field to the nanoseconds resolution:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /index_long,index_double/_search
|
|
{
|
|
"sort" : [
|
|
{
|
|
"field" : {
|
|
"numeric_type" : "date_nanos"
|
|
}
|
|
}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
[WARNING]
|
|
To avoid overflow, the conversion to `date_nanos` cannot be applied on dates before
|
|
1970 and after 2262 as nanoseconds are represented as longs.
|
|
|
|
[discrete]
|
|
[[nested-sorting]]
|
|
=== Sorting within nested objects.
|
|
|
|
Elasticsearch also supports sorting by
|
|
fields that are inside one or more nested objects. The sorting by nested
|
|
field support has a `nested` sort option with the following properties:
|
|
|
|
`path`::
|
|
Defines on which nested object to sort. The actual
|
|
sort field must be a direct field inside this nested object.
|
|
When sorting by nested field, this field is mandatory.
|
|
|
|
`filter`::
|
|
A filter that the inner objects inside the nested path
|
|
should match with in order for its field values to be taken into account
|
|
by sorting. Common case is to repeat the query / filter inside the
|
|
nested filter or query. By default no `nested_filter` is active.
|
|
`max_children`::
|
|
The maximum number of children to consider per root document
|
|
when picking the sort value. Defaults to unlimited.
|
|
`nested`::
|
|
Same as top-level `nested` but applies to another nested path within the
|
|
current nested object.
|
|
|
|
[WARNING]
|
|
.Nested sort options before Elasticsearch 6.1
|
|
============================================
|
|
|
|
The `nested_path` and `nested_filter` options have been deprecated in
|
|
favor of the options documented above.
|
|
|
|
============================================
|
|
|
|
[discrete]
|
|
==== Nested sorting examples
|
|
|
|
In the below example `offer` is a field of type `nested`.
|
|
The nested `path` needs to be specified; otherwise, Elasticsearch doesn't know on what nested level sort values need to be captured.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /_search
|
|
{
|
|
"query" : {
|
|
"term" : { "product" : "chocolate" }
|
|
},
|
|
"sort" : [
|
|
{
|
|
"offer.price" : {
|
|
"mode" : "avg",
|
|
"order" : "asc",
|
|
"nested": {
|
|
"path": "offer",
|
|
"filter": {
|
|
"term" : { "offer.color" : "blue" }
|
|
}
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
|
|
In the below example `parent` and `child` fields are of type `nested`.
|
|
The `nested_path` needs to be specified at each level; otherwise, Elasticsearch doesn't know on what nested level sort values need to be captured.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /_search
|
|
{
|
|
"query": {
|
|
"nested": {
|
|
"path": "parent",
|
|
"query": {
|
|
"bool": {
|
|
"must": {"range": {"parent.age": {"gte": 21}}},
|
|
"filter": {
|
|
"nested": {
|
|
"path": "parent.child",
|
|
"query": {"match": {"parent.child.name": "matt"}}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"sort" : [
|
|
{
|
|
"parent.child.age" : {
|
|
"mode" : "min",
|
|
"order" : "asc",
|
|
"nested": {
|
|
"path": "parent",
|
|
"filter": {
|
|
"range": {"parent.age": {"gte": 21}}
|
|
},
|
|
"nested": {
|
|
"path": "parent.child",
|
|
"filter": {
|
|
"match": {"parent.child.name": "matt"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Nested sorting is also supported when sorting by
|
|
scripts and sorting by geo distance.
|
|
|
|
[discrete]
|
|
=== Missing Values
|
|
|
|
The `missing` parameter specifies how docs which are missing
|
|
the sort field should be treated: The `missing` value can be
|
|
set to `_last`, `_first`, or a custom value (that
|
|
will be used for missing docs as the sort value).
|
|
The default is `_last`.
|
|
|
|
For example:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort" : [
|
|
{ "price" : {"missing" : "_last"} }
|
|
],
|
|
"query" : {
|
|
"term" : { "product" : "chocolate" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
NOTE: If a nested inner object doesn't match with
|
|
the `nested_filter` then a missing value is used.
|
|
|
|
[discrete]
|
|
=== Ignoring Unmapped Fields
|
|
|
|
By default, the search request will fail if there is no mapping
|
|
associated with a field. The `unmapped_type` option allows you to ignore
|
|
fields that have no mapping and not sort by them. The value of this
|
|
parameter is used to determine what sort values to emit. Here is an
|
|
example of how it can be used:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort" : [
|
|
{ "price" : {"unmapped_type" : "long"} }
|
|
],
|
|
"query" : {
|
|
"term" : { "product" : "chocolate" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
If any of the indices that are queried doesn't have a mapping for `price`
|
|
then Elasticsearch will handle it as if there was a mapping of type
|
|
`long`, with all documents in this index having no value for this field.
|
|
|
|
[discrete]
|
|
[[geo-sorting]]
|
|
=== Geo Distance Sorting
|
|
|
|
Allow to sort by `_geo_distance`. Here is an example, assuming `pin.location` is a field of type `geo_point`:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort" : [
|
|
{
|
|
"_geo_distance" : {
|
|
"pin.location" : [-70, 40],
|
|
"order" : "asc",
|
|
"unit" : "km",
|
|
"mode" : "min",
|
|
"distance_type" : "arc",
|
|
"ignore_unmapped": true
|
|
}
|
|
}
|
|
],
|
|
"query" : {
|
|
"term" : { "user" : "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
`distance_type`::
|
|
|
|
How to compute the distance. Can either be `arc` (default), or `plane` (faster, but inaccurate on long distances and close to the poles).
|
|
|
|
`mode`::
|
|
|
|
What to do in case a field has several geo points. By default, the shortest
|
|
distance is taken into account when sorting in ascending order and the
|
|
longest distance when sorting in descending order. Supported values are
|
|
`min`, `max`, `median` and `avg`.
|
|
|
|
`unit`::
|
|
|
|
The unit to use when computing sort values. The default is `m` (meters).
|
|
|
|
|
|
`ignore_unmapped`::
|
|
|
|
Indicates if the unmapped field should be treated as a missing value. Setting it to `true` is equivalent to specifying
|
|
an `unmapped_type` in the field sort. The default is `false` (unmapped field cause the search to fail).
|
|
|
|
NOTE: geo distance sorting does not support configurable missing values: the
|
|
distance will always be considered equal to +Infinity+ when a document does not
|
|
have values for the field that is used for distance computation.
|
|
|
|
The following formats are supported in providing the coordinates:
|
|
|
|
[discrete]
|
|
==== Lat Lon as Properties
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort" : [
|
|
{
|
|
"_geo_distance" : {
|
|
"pin.location" : {
|
|
"lat" : 40,
|
|
"lon" : -70
|
|
},
|
|
"order" : "asc",
|
|
"unit" : "km"
|
|
}
|
|
}
|
|
],
|
|
"query" : {
|
|
"term" : { "user" : "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
==== Lat Lon as String
|
|
|
|
Format in `lat,lon`.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort": [
|
|
{
|
|
"_geo_distance": {
|
|
"pin.location": "40,-70",
|
|
"order": "asc",
|
|
"unit": "km"
|
|
}
|
|
}
|
|
],
|
|
"query": {
|
|
"term": { "user": "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
==== Geohash
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort": [
|
|
{
|
|
"_geo_distance": {
|
|
"pin.location": "drm3btev3e86",
|
|
"order": "asc",
|
|
"unit": "km"
|
|
}
|
|
}
|
|
],
|
|
"query": {
|
|
"term": { "user": "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
==== Lat Lon as Array
|
|
|
|
Format in `[lon, lat]`, note, the order of lon/lat here in order to
|
|
conform with http://geojson.org/[GeoJSON].
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort": [
|
|
{
|
|
"_geo_distance": {
|
|
"pin.location": [ -70, 40 ],
|
|
"order": "asc",
|
|
"unit": "km"
|
|
}
|
|
}
|
|
],
|
|
"query": {
|
|
"term": { "user": "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
=== Multiple reference points
|
|
|
|
Multiple geo points can be passed as an array containing any `geo_point` format, for example
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"sort": [
|
|
{
|
|
"_geo_distance": {
|
|
"pin.location": [ [ -70, 40 ], [ -71, 42 ] ],
|
|
"order": "asc",
|
|
"unit": "km"
|
|
}
|
|
}
|
|
],
|
|
"query": {
|
|
"term": { "user": "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
and so forth.
|
|
|
|
The final distance for a document will then be `min`/`max`/`avg` (defined via `mode`) distance of all points contained in the document to all points given in the sort request.
|
|
|
|
|
|
[discrete]
|
|
=== Script Based Sorting
|
|
|
|
Allow to sort based on custom scripts, here is an example:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"term": { "user": "kimchy" }
|
|
},
|
|
"sort": {
|
|
"_script": {
|
|
"type": "number",
|
|
"script": {
|
|
"lang": "painless",
|
|
"source": "doc['field_name'].value * params.factor",
|
|
"params": {
|
|
"factor": 1.1
|
|
}
|
|
},
|
|
"order": "asc"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
=== Track Scores
|
|
|
|
When sorting on a field, scores are not computed. By setting
|
|
`track_scores` to true, scores will still be computed and tracked.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"track_scores": true,
|
|
"sort" : [
|
|
{ "post_date" : {"order" : "desc"} },
|
|
{ "name" : "desc" },
|
|
{ "age" : "desc" }
|
|
],
|
|
"query" : {
|
|
"term" : { "user" : "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[discrete]
|
|
=== Memory Considerations
|
|
|
|
When sorting, the relevant sorted field values are loaded into memory.
|
|
This means that per shard, there should be enough memory to contain
|
|
them. For string based types, the field sorted on should not be analyzed
|
|
/ tokenized. For numeric types, if possible, it is recommended to
|
|
explicitly set the type to narrower types (like `short`, `integer` and
|
|
`float`).
|