[[search-aggregations-metrics-top-hits-aggregation]]
=== Top Hits Aggregation

A `top_hits` metric aggregator keeps track of the most relevant documents being aggregated. This aggregator is intended
to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

The `top_hits` aggregator can effectively be used to group result sets by certain fields via a bucket aggregator.
One or more bucket aggregators determine the properties by which a result set is sliced.

==== Options

* `from` - The offset from the first result you want to fetch.
* `size` - The maximum number of top matching hits to return per bucket. By default the top three matching hits are returned.
* `sort` - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.

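
As a rough sketch of how these options combine, the hypothetical Python helper below builds the body of a `top_hits`
aggregation as a plain dictionary. The helper name `top_hits_body` and the `date` sort field are assumptions made for
illustration; only `from`, `size` and `sort` come from the list above.

[source,python]
--------------------------------------------------
def top_hits_body(offset=0, size=3, sort=None):
    """Build the body of a top_hits aggregation as a plain dict.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    body = {"from": offset, "size": size}
    # Leaving "sort" out keeps the default: hits sorted by the main query's score.
    if sort is not None:
        body["sort"] = sort
    return {"top_hits": body}


# Skip the first two hits of each bucket and return the next two, newest first.
# The "date" field is an assumption about the index.
page_two = top_hits_body(offset=2, size=2, sort=[{"date": {"order": "desc"}}])
--------------------------------------------------

The resulting dictionary can be placed under any bucket aggregation's `aggs` key and serialised to JSON.
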
==== Supported per hit features

The `top_hits` aggregation returns regular search hits, so many per-hit features are supported:

* <<search-request-highlighting,Highlighting>>
* <<search-request-explain,Explain>>
* <<search-request-named-queries-and-filters,Named filters and queries>>
* <<search-request-source-filtering,Source filtering>>
* <<search-request-stored-fields,Stored fields>>
* <<search-request-script-fields,Script fields>>
* <<search-request-docvalue-fields,Doc value fields>>
* <<search-request-version,Include versions>>
* <<search-request-seq-no-primary-term,Include Sequence Numbers and Primary Terms>>

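
To make the list more concrete, here is a minimal sketch of a request body that turns on a few of these features inside
a `top_hits` aggregation, written as a Python dictionary and serialised to JSON. The aggregation names `by_type` and
`top_docs` and the fields `type`, `body` and `price` are assumptions chosen for illustration.

[source,python]
--------------------------------------------------
import json

# Sketch of a search request body. The per-hit options sit directly inside
# the top_hits body, next to "size". Field names are assumed, not prescribed.
request_body = {
    "size": 0,
    "aggs": {
        "by_type": {
            "terms": {"field": "type"},
            "aggs": {
                "top_docs": {
                    "top_hits": {
                        "size": 1,
                        # Source filtering: return only the "price" field.
                        "_source": {"includes": ["price"]},
                        # Highlighting on the assumed "body" text field.
                        "highlight": {"fields": {"body": {}}},
                        # Explain how each hit's score was computed.
                        "explain": True,
                        # Include the document version in each hit.
                        "version": True,
                        # Include sequence number and primary term in each hit.
                        "seq_no_primary_term": True,
                    }
                }
            },
        }
    },
}

# The JSON text that would be sent as the body of a search request.
payload = json.dumps(request_body)
--------------------------------------------------
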
==== Example

In the following example we group the sales by type and, for each type, show the most recent sale.
For each sale only the `date` and `price` fields are included in the source.

[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "type",
                "size": 3
            },
            "aggs": {
                "top_sales_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "date": {
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [ "date", "price" ]
                        },
                        "size" : 1
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]

Possible response:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "top_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "hat",
                    "doc_count": 3,
                    "top_sales_hits": {
                        "hits": {
                            "total" : {
                                "value": 3,
                                "relation": "eq"
                            },
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "sales",
                                    "_type": "_doc",
                                    "_id": "AVnNBmauCQpcRyxw6ChK",
                                    "_source": {
                                        "date": "2015/03/01 00:00:00",
                                        "price": 200
                                    },
                                    "sort": [
                                        1425168000000
                                    ],
                                    "_score": null
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "t-shirt",
                    "doc_count": 3,
                    "top_sales_hits": {
                        "hits": {
                            "total" : {
                                "value": 3,
                                "relation": "eq"
                            },
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "sales",
                                    "_type": "_doc",
                                    "_id": "AVnNBmauCQpcRyxw6ChL",
                                    "_source": {
                                        "date": "2015/03/01 00:00:00",
                                        "price": 175
                                    },
                                    "sort": [
                                        1425168000000
                                    ],
                                    "_score": null
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "bag",
                    "doc_count": 1,
                    "top_sales_hits": {
                        "hits": {
                            "total" : {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": null,
                            "hits": [
                                {
                                    "_index": "sales",
                                    "_type": "_doc",
                                    "_id": "AVnNBmatCQpcRyxw6ChH",
                                    "_source": {
                                        "date": "2015/01/01 00:00:00",
                                        "price": 150
                                    },
                                    "sort": [
                                        1420070400000
                                    ],
                                    "_score": null
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
// TESTRESPONSE[s/AVnNBmauCQpcRyxw6ChK/$body.aggregations.top_tags.buckets.0.top_sales_hits.hits.hits.0._id/]
// TESTRESPONSE[s/AVnNBmauCQpcRyxw6ChL/$body.aggregations.top_tags.buckets.1.top_sales_hits.hits.hits.0._id/]
// TESTRESPONSE[s/AVnNBmatCQpcRyxw6ChH/$body.aggregations.top_tags.buckets.2.top_sales_hits.hits.hits.0._id/]

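
On the client side, the hits sit three levels below each bucket (`<aggregation name>.hits.hits`). The following is a
minimal sketch, assuming the response has already been decoded into a Python dictionary, of how the last sale per type
could be pulled out of a response shaped like the one above. The function name `top_hits_by_bucket` is hypothetical,
and the sample response is trimmed to the fields the function reads.

[source,python]
--------------------------------------------------
def top_hits_by_bucket(response, bucket_agg, hits_agg):
    """Map each bucket key to the list of `_source` dicts of its top hits."""
    buckets = response["aggregations"][bucket_agg]["buckets"]
    return {
        bucket["key"]: [hit["_source"] for hit in bucket[hits_agg]["hits"]["hits"]]
        for bucket in buckets
    }


# Trimmed copy of two buckets from the response above.
response = {
    "aggregations": {
        "top_tags": {
            "buckets": [
                {
                    "key": "hat",
                    "top_sales_hits": {"hits": {"hits": [
                        {"_source": {"date": "2015/03/01 00:00:00", "price": 200}}
                    ]}},
                },
                {
                    "key": "bag",
                    "top_sales_hits": {"hits": {"hits": [
                        {"_source": {"date": "2015/01/01 00:00:00", "price": 150}}
                    ]}},
                },
            ]
        }
    }
}

last_sales = top_hits_by_bucket(response, "top_tags", "top_sales_hits")
--------------------------------------------------

Because `size` was set to `1` in the request, each list holds a single entry, for example `last_sales["hat"][0]["price"]`.
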
==== Field collapse example

Field collapsing or result grouping is a feature that logically groups a result set and returns the top documents
for each group. The ordering of the groups is determined by the relevancy of the first document in a group. In
Elasticsearch this can be implemented via a bucket aggregator that wraps a `top_hits` aggregator as sub-aggregator.

In the example below we search across crawled webpages. For each webpage we store the body and the domain the webpage
belongs to. By defining a `terms` aggregator on the `domain` field we group the result set of webpages by domain. The
`top_hits` aggregator is then defined as sub-aggregator, so that the top matching hits are collected per bucket.

A `max` aggregator is also defined; the `terms` aggregator's `order` feature uses it to return the buckets ordered by
the relevancy of the most relevant document in each bucket.

[source,js]
--------------------------------------------------
POST /sales/_search
{
    "query": {
        "match": {
            "body": "elections"
        }
    },
    "aggs": {
        "top_sites": {
            "terms": {
                "field": "domain",
                "order": {
                    "top_hit": "desc"
                }
            },
            "aggs": {
                "top_tags_hits": {
                    "top_hits": {}
                },
                "top_hit" : {
                    "max": {
                        "script": {
                            "source": "_score"
                        }
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]

At the moment the `max` (or `min`) aggregator is needed to make sure the buckets from the `terms` aggregator are
ordered according to the score of the most relevant webpage per domain. Unfortunately the `top_hits` aggregator
can't be used in the `order` option of the `terms` aggregator yet.

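
The grouping logic itself can be illustrated outside Elasticsearch. The sketch below is a plain Python model of field
collapsing, not a description of how Elasticsearch implements it: it groups already-scored hits by a field, keeps the
best `size` hits per group (the role of `top_hits`), and orders the groups by the score of their best hit (the role of
the `max` aggregator used in `order`). The function name `collapse`, the domains and the scores are all made up.

[source,python]
--------------------------------------------------
from collections import defaultdict


def collapse(hits, field, size=3):
    """Group scored hits by `field`, keep the `size` best hits per group and
    order the groups by the score of their best hit, highest first."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[field]].append(hit)

    collapsed = []
    for key, members in groups.items():
        # Best hit first within each group.
        members.sort(key=lambda h: h["score"], reverse=True)
        collapsed.append({
            "key": key,
            "max_score": members[0]["score"],
            "top_hits": members[:size],
        })

    # Order the groups by their most relevant document.
    collapsed.sort(key=lambda g: g["max_score"], reverse=True)
    return collapsed


# Hypothetical scored webpages.
pages = [
    {"domain": "a.example", "score": 0.4},
    {"domain": "b.example", "score": 0.9},
    {"domain": "a.example", "score": 0.7},
    {"domain": "b.example", "score": 0.1},
]
result = collapse(pages, "domain", size=1)
--------------------------------------------------

Here `b.example` comes first because its best page scores higher than the best page of `a.example`.
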
==== top_hits support in a nested or reverse_nested aggregator

If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are returned.
Nested hits are in a sense hidden mini documents that are part of a regular document for which a nested field type
has been configured in the mapping. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested`
or `reverse_nested` aggregator. Read more about nested in the <<nested,nested type mapping>>.

If a nested type has been configured, a single document is actually indexed as multiple Lucene documents that share
the same id. The id alone is therefore not enough to identify a nested hit, which is why
nested hits also include their nested identity. The nested identity is kept under the `_nested` field in the search hit
and includes the array field and the offset in the array field the nested hit belongs to. The offset is zero based.

Let's see how it works with a real sample. Consider the following mapping:

[source,js]
--------------------------------------------------
PUT /sales
{
    "mappings": {
        "properties" : {
            "tags" : { "type" : "keyword" },
            "comments" : { <1>
                "type" : "nested",
                "properties" : {
                    "username" : { "type" : "keyword" },
                    "comment" : { "type" : "text" }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
<1> `comments` is an array that holds nested documents.

And some documents:

[source,js]
--------------------------------------------------
PUT /sales/_doc/1?refresh
{
    "tags": ["car", "auto"],
    "comments": [
        {"username": "baddriver007", "comment": "This car could have better brakes"},
        {"username": "dr_who", "comment": "Where's the autopilot? Can't find it"},
        {"username": "ilovemotorbikes", "comment": "This car has two extra wheels"}
    ]
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

It's now possible to execute the following `top_hits` aggregation (wrapped in a `nested` aggregation):

[source,js]
--------------------------------------------------
POST /sales/_search
{
    "query": {
        "term": { "tags": "car" }
    },
    "aggs": {
        "by_sale": {
            "nested" : {
                "path" : "comments"
            },
            "aggs": {
                "by_user": {
                    "terms": {
                        "field": "comments.username",
                        "size": 1
                    },
                    "aggs": {
                        "by_nested": {
                            "top_hits":{}
                        }
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[s/_search/_search\?filter_path=aggregations.by_sale.by_user.buckets/]

Top hits response snippet with a nested hit, which resides in the first slot of array field `comments`:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "by_sale": {
            "by_user": {
                "buckets": [
                    {
                        "key": "baddriver007",
                        "doc_count": 1,
                        "by_nested": {
                            "hits": {
                                "total" : {
                                    "value": 1,
                                    "relation": "eq"
                                },
                                "max_score": 0.3616575,
                                "hits": [
                                    {
                                        "_index": "sales",
                                        "_type" : "_doc",
                                        "_id": "1",
                                        "_nested": {
                                            "field": "comments", <1>
                                            "offset": 0 <2>
                                        },
                                        "_score": 0.3616575,
                                        "_source": {
                                            "comment": "This car could have better brakes", <3>
                                            "username": "baddriver007"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                    ...
                ]
            }
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
<1> Name of the array field containing the nested hit
<2> Position of the nested hit in the containing array
<3> Source of the nested hit

If `_source` is requested then just the part of the source of the nested object is returned, not the entire source of the document.
Also stored fields on the *nested* inner object level are accessible via a `top_hits` aggregator residing in a `nested` or `reverse_nested` aggregator.

Only nested hits will have a `_nested` field in the hit; non-nested (regular) hits will not have a `_nested` field.

The information in `_nested` can also be used to parse the original source somewhere else if `_source` isn't enabled.

If there are multiple levels of nested object types defined in mappings then the `_nested` information can also be hierarchical
in order to express the identity of nested hits that are two layers deep or more.

In the example below a nested hit resides in the first slot of the field `nested_grand_child_field` which then resides in
the second slot of the `nested_child_field` field:

[source,js]
--------------------------------------------------
...
"hits": {
    "total" : {
        "value": 2565,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
        {
            "_index": "a",
            "_type": "b",
            "_id": "1",
            "_score": 1,
            "_nested" : {
                "field" : "nested_child_field",
                "offset" : 1,
                "_nested" : {
                    "field" : "nested_grand_child_field",
                    "offset" : 0
                }
            },
            "_source": ...
        },
        ...
    ]
}
...
--------------------------------------------------
// NOTCONSOLE
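
If the original document is kept somewhere else, the `_nested` identity is enough to locate the nested object in it.
The following is a minimal sketch of that lookup in Python. It assumes that each level's `field` names a key relative
to its parent object, as in the example above, and that `offset` indexes into the array stored under that key. The
function name `resolve_nested` and the sample document are hypothetical.

[source,python]
--------------------------------------------------
def resolve_nested(source, nested):
    """Follow a (possibly hierarchical) `_nested` identity into a document's
    original source and return the nested object it points at."""
    current = source
    while nested is not None:
        # "field" names the array, "offset" is the zero-based position in it.
        current = current[nested["field"]][nested["offset"]]
        # Descend one level if the identity is hierarchical.
        nested = nested.get("_nested")
    return current


# Made-up original document with two levels of nesting.
original = {
    "nested_child_field": [
        {"nested_grand_child_field": [{"value": "a"}]},
        {"nested_grand_child_field": [{"value": "b"}, {"value": "c"}]},
    ]
}

# Identity as it appears under `_nested` in the hit above.
identity = {
    "field": "nested_child_field",
    "offset": 1,
    "_nested": {"field": "nested_grand_child_field", "offset": 0},
}

nested_object = resolve_nested(original, identity)
--------------------------------------------------

The same function handles the single-level case, where the identity has no inner `_nested` entry and the loop runs once.
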