[[collapse-search-results]]
== Collapse search results

You can use the `collapse` parameter to collapse search results based
on field values. Collapsing selects only the top-sorted
document for each collapse key.

For example, the following search collapses results by `user.id` and sorts them
by `http.response.bytes`.

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "user.id"                <1>
  },
  "sort": [ "http.response.bytes" ],  <2>
  "from": 10                          <3>
}
--------------------------------------------------
// TEST[setup:my_index]

<1> Collapse the result set using the `user.id` field.
<2> Sort the results by `http.response.bytes`.
<3> Define the offset of the first collapsed result.

WARNING: The total number of hits in the response indicates the number of matching documents without collapsing.
The total number of distinct groups is unknown.

The field used for collapsing must be a single-valued <<keyword, `keyword`>> or <<number, `numeric`>> field with <<doc-values, `doc_values`>> enabled.
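
As a minimal sketch of a mapping that meets this requirement, the following request creates an index with `user.id` mapped as a `keyword` field. The index name `my-index-000002` is hypothetical and used only for illustration. `keyword` fields have `doc_values` enabled by default, so no extra setting should be needed.

[source,console]
--------------------------------------------------
PUT /my-index-000002
{
  "mappings": {
    "properties": {
      "user": {
        "properties": {
          "id": { "type": "keyword" }   <1>
        }
      }
    }
  }
}
--------------------------------------------------

<1> A single-valued `keyword` field that can be used as a collapse key. The index name in this request is an assumption made for the example.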

NOTE: Collapsing is applied only to the top hits and does not affect aggregations.
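
Because aggregations run over all matching documents rather than the collapsed hits, one possible way to estimate the number of distinct groups is to add a `cardinality` aggregation on the collapse field. The following request is a sketch of that idea, not a built-in feature of `collapse`. The aggregation name `distinct_users` is hypothetical, and the count returned by `cardinality` is approximate.

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "user.id"
  },
  "sort": [ "http.response.bytes" ],
  "aggs": {
    "distinct_users": {                         <1>
      "cardinality": { "field": "user.id" }     <2>
    }
  }
}
--------------------------------------------------
// TEST[setup:my_index]

<1> Hypothetical aggregation name chosen for this example.
<2> Approximate count of distinct `user.id` values across all matching documents.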

[discrete]
[[expand-collapse-results]]
=== Expand collapse results

You can also expand each collapsed top hit with the `inner_hits` option.

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "user.id",                       <1>
    "inner_hits": {
      "name": "most_recent",                  <2>
      "size": 5,                              <3>
      "sort": [ { "@timestamp": "desc" } ]    <4>
    },
    "max_concurrent_group_searches": 4        <5>
  },
  "sort": [ "http.response.bytes" ]
}
--------------------------------------------------
// TEST[setup:my_index]

<1> Collapse the result set using the `user.id` field.
<2> The name used for the inner hits section in the response.
<3> The number of `inner_hits` to retrieve per collapse key.
<4> How to sort the documents inside each group.
<5> The number of concurrent requests allowed to retrieve the `inner_hits` per group.

See <<inner-hits, inner hits>> for the complete list of supported options and the format of the response.

You can also request multiple `inner_hits` for each collapsed hit. This is useful when you want
multiple representations of the collapsed hits.

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "user.id",                   <1>
    "inner_hits": [
      {
        "name": "largest_responses",      <2>
        "size": 3,
        "sort": [ { "http.response.bytes": "desc" } ]
      },
      {
        "name": "most_recent",            <3>
        "size": 3,
        "sort": [ { "@timestamp": "desc" } ]
      }
    ]
  },
  "sort": [ "http.response.bytes" ]
}
--------------------------------------------------
// TEST[setup:my_index]

<1> Collapse the result set using the `user.id` field.
<2> Return the three largest HTTP responses for the user.
<3> Return the three most recent HTTP responses for the user.

Each group is expanded by sending an additional query for each
`inner_hits` request for each collapsed hit returned in the response. This can significantly slow down the search
if you have too many groups or `inner_hits` requests.

The `max_concurrent_group_searches` request parameter controls
the maximum number of concurrent searches allowed in this phase.
The default is based on the number of data nodes and the default search thread pool size.

WARNING: `collapse` cannot be used in conjunction with <<scroll-search-results, scroll>>,
<<rescore, rescore>> or <<search-after, search after>>.

[discrete]
[[second-level-of-collapsing]]
=== Second level of collapsing

A second level of collapsing is also supported and is applied to `inner_hits`.

For example, the following search collapses results by `geo.country_name`.
Within each `geo.country_name`, inner hits are collapsed by `user.id`.

[source,js]
--------------------------------------------------
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "message": "GET /search"
    }
  },
  "collapse": {
    "field": "geo.country_name",
    "inner_hits": {
      "name": "by_location",
      "collapse": { "field": "user.id" },
      "size": 3
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

Response:

[source,js]
--------------------------------------------------
{
  ...
  "hits": [
    {
      "_index": "my-index-000001",
      "_type": "_doc",
      "_id": "9",
      "_score": ...,
      "_source": {...},
      "fields": { "geo": { "country_name": [ "UK" ] }},
      "inner_hits": {
        "by_location": {
          "hits": {
            ...,
            "hits": [
              {
                ...
                "fields": { "user": { "id": [ "user124" ] }}
              },
              {
                ...
                "fields": { "user": { "id": [ "user589" ] }}
              },
              {
                ...
                "fields": { "user": { "id": [ "user001" ] }}
              }
            ]
          }
        }
      }
    },
    {
      "_index": "my-index-000001",
      "_type": "_doc",
      "_id": "1",
      "_score": ...,
      "_source": {...},
      "fields": { "geo": { "country_name": [ "Canada" ] }},
      "inner_hits": {
        "by_location": {
          "hits": {
            ...,
            "hits": [
              {
                ...
                "fields": { "user": { "id": [ "user444" ] }}
              },
              {
                ...
                "fields": { "user": { "id": [ "user1111" ] }}
              },
              {
                ...
                "fields": { "user": { "id": [ "user999" ] }}
              }
            ]
          }
        }
      }
    },
    ...
  ]
}
--------------------------------------------------
// NOTCONSOLE

NOTE: The second level of collapsing doesn't allow `inner_hits`.