220 lines
6.5 KiB
Plaintext
220 lines
6.5 KiB
Plaintext
[[request-body-search-collapse]]
|
|
==== Field Collapsing
|
|
|
|
Allows to collapse search results based on field values.
|
|
The collapsing is done by selecting only the top sorted document per collapse key.
|
|
For instance the query below retrieves the best tweet for each user and sorts them by number of likes.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"message": "elasticsearch"
|
|
}
|
|
},
|
|
"collapse" : {
|
|
"field" : "user" <1>
|
|
},
|
|
"sort": ["likes"], <2>
|
|
"from": 10 <3>
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:twitter]
|
|
|
|
<1> collapse the result set using the "user" field
|
|
<2> sort the top docs by number of likes
|
|
<3> define the offset of the first collapsed result
|
|
|
|
WARNING: The total number of hits in the response indicates the number of matching documents without collapsing.
|
|
The total number of distinct group is unknown.
|
|
|
|
The field used for collapsing must be a single valued <<keyword, `keyword`>> or <<number, `numeric`>> field with <<doc-values, `doc_values`>> activated
|
|
|
|
NOTE: The collapsing is applied to the top hits only and does not affect aggregations.
|
|
|
|
|
|
===== Expand collapse results
|
|
|
|
It is also possible to expand each collapsed top hits with the `inner_hits` option.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"message": "elasticsearch"
|
|
}
|
|
},
|
|
"collapse" : {
|
|
"field" : "user", <1>
|
|
"inner_hits": {
|
|
"name": "last_tweets", <2>
|
|
"size": 5, <3>
|
|
"sort": [{ "date": "asc" }] <4>
|
|
},
|
|
"max_concurrent_group_searches": 4 <5>
|
|
},
|
|
"sort": ["likes"]
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:twitter]
|
|
|
|
<1> collapse the result set using the "user" field
|
|
<2> the name used for the inner hit section in the response
|
|
<3> the number of inner_hits to retrieve per collapse key
|
|
<4> how to sort the document inside each group
|
|
<5> the number of concurrent requests allowed to retrieve the inner_hits` per group
|
|
|
|
See <<request-body-search-inner-hits, inner hits>> for the complete list of supported options and the format of the response.
|
|
|
|
It is also possible to request multiple `inner_hits` for each collapsed hit. This can be useful when you want to get
|
|
multiple representations of the collapsed hits.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"message": "elasticsearch"
|
|
}
|
|
},
|
|
"collapse" : {
|
|
"field" : "user", <1>
|
|
"inner_hits": [
|
|
{
|
|
"name": "most_liked", <2>
|
|
"size": 3,
|
|
"sort": ["likes"]
|
|
},
|
|
{
|
|
"name": "most_recent", <3>
|
|
"size": 3,
|
|
"sort": [{ "date": "asc" }]
|
|
}
|
|
]
|
|
},
|
|
"sort": ["likes"]
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:twitter]
|
|
|
|
<1> collapse the result set using the "user" field
|
|
<2> return the three most liked tweets for the user
|
|
<3> return the three most recent tweets for the user
|
|
|
|
The expansion of the group is done by sending an additional query for each
|
|
`inner_hit` request for each collapsed hit returned in the response. This can significantly slow things down
|
|
if you have too many groups and/or `inner_hit` requests.
|
|
|
|
The `max_concurrent_group_searches` request parameter can be used to control
|
|
the maximum number of concurrent searches allowed in this phase.
|
|
The default is based on the number of data nodes and the default search thread pool size.
|
|
|
|
WARNING: `collapse` cannot be used in conjunction with <<request-body-search-scroll, scroll>>,
|
|
<<request-body-search-rescore, rescore>> or <<request-body-search-search-after, search after>>.
|
|
|
|
===== Second level of collapsing
|
|
|
|
Second level of collapsing is also supported and is applied to `inner_hits`.
|
|
For example, the following request finds the top scored tweets for
|
|
each country, and within each country finds the top scored tweets
|
|
for each user.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"message": "elasticsearch"
|
|
}
|
|
},
|
|
"collapse" : {
|
|
"field" : "country",
|
|
"inner_hits" : {
|
|
"name": "by_location",
|
|
"collapse" : {"field" : "user"},
|
|
"size": 3
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
|
|
Response:
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"hits": [
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "9",
|
|
"_score": ...,
|
|
"_source": {...},
|
|
"fields": {"country": ["UK"]},
|
|
"inner_hits":{
|
|
"by_location": {
|
|
"hits": {
|
|
...,
|
|
"hits": [
|
|
{
|
|
...
|
|
"fields": {"user" : ["user124"]}
|
|
},
|
|
{
|
|
...
|
|
"fields": {"user" : ["user589"]}
|
|
},
|
|
{
|
|
...
|
|
"fields": {"user" : ["user001"]}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "1",
|
|
"_score": ..,
|
|
"_source": {...},
|
|
"fields": {"country": ["Canada"]},
|
|
"inner_hits":{
|
|
"by_location": {
|
|
"hits": {
|
|
...,
|
|
"hits": [
|
|
{
|
|
...
|
|
"fields": {"user" : ["user444"]}
|
|
},
|
|
{
|
|
...
|
|
"fields": {"user" : ["user1111"]}
|
|
},
|
|
{
|
|
...
|
|
"fields": {"user" : ["user999"]}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
|
|
},
|
|
....
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
NOTE: Second level of collapsing doesn't allow `inner_hits`. |