326 lines
9.5 KiB
Plaintext
326 lines
9.5 KiB
Plaintext
[[search-aggregations-bucket-histogram-aggregation]]
|
|
=== Histogram
|
|
|
|
A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents.
|
|
It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field
|
|
that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval `5`
|
|
(in case of price it may represent $5). When the aggregation executes, the price field of every document will be
|
|
evaluated and will be rounded down to its closest bucket - for example, if the price is `32` and the bucket size is `5`
|
|
then the rounding will yield `30` and thus the document will "fall" into the bucket that is associated withe the key `30`.
|
|
To make this more formal, here is the rounding function that is used:
|
|
|
|
[source,java]
|
|
--------------------------------------------------
|
|
rem = value % interval
|
|
if (rem < 0) {
|
|
rem += interval
|
|
}
|
|
bucket_key = value - rem
|
|
--------------------------------------------------
|
|
|
|
The following snippet "buckets" the products based on their `price` by interval of `50`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
And the following may be the response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggregations": {
|
|
"prices" : {
|
|
"buckets": [
|
|
{
|
|
"key": 0,
|
|
"doc_count": 2
|
|
},
|
|
{
|
|
"key": 50,
|
|
"doc_count": 4
|
|
},
|
|
{
|
|
"key": 150,
|
|
"doc_count": 3
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The response above shows that none of the aggregated products has a price that falls within the range of `[100 - 150)`.
|
|
By default, the response will only contain those buckets with a `doc_count` greater than 0. It is possible change that
|
|
and request buckets with either a higher minimum count or even 0 (in which case elasticsearch will "fill in the gaps"
|
|
and create buckets with zero documents). This can be configured using the `min_doc_count` setting:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"min_doc_count" : 0
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggregations": {
|
|
"prices" : {
|
|
"buckets": [
|
|
{
|
|
"key": 0,
|
|
"doc_count": 2
|
|
},
|
|
{
|
|
"key": 50,
|
|
"doc_count": 4
|
|
},
|
|
{
|
|
"key" : 100,
|
|
"doc_count" : 0 <1>
|
|
},
|
|
{
|
|
"key": 150,
|
|
"doc_count": 3
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
<1> No documents were found that belong in this bucket, yet it is still returned with zero `doc_count`.
|
|
|
|
==== Order
|
|
|
|
By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controled
|
|
using the `order` setting.
|
|
|
|
Ordering the buckets by their key - descending:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"order" : { "_key" : "desc" }
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Ordering the buckets by their `doc_count` - ascending:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"order" : { "_count" : "asc" }
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
If the histogram aggregation has a direct metrics sub-aggregation, the latter can determine the order of the buckets:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"order" : { "price_stats.min" : "asc" } <1>
|
|
},
|
|
"aggs" : {
|
|
"price_stats" : { "stats" : {} } <2>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
<1> The `{ "price_stats.min" : asc" }` will sort the buckets based on `min` value of their their `price_stats` sub-aggregation.
|
|
|
|
<2> There is no need to configure the `price` field for the `price_stats` aggregation as it will inherit it by default from its parent histogram aggregation.
|
|
|
|
It is also possible to order the buckets based on a "deeper" aggregation in the hierarchy. This is supported as long
|
|
as the aggregations path are of a single-bucket type, where the last aggregation in the path may either by a single-bucket
|
|
one or a metrics one. If it's a single-bucket type, the order will be defined by the number of docs in the bucket (i.e. `doc_count`),
|
|
in case it's a metrics one, the same rules as above apply (where the path must indicate the metric name to sort by in case of
|
|
a multi-value metrics aggregation, and in case of a single-value metrics aggregation the sort will be applied on that value).
|
|
|
|
The path must be defined in the following form:
|
|
|
|
--------------------------------------------------
|
|
AGG_SEPARATOR := '>'
|
|
METRIC_SEPARATOR := '.'
|
|
AGG_NAME := <the name of the aggregation>
|
|
METRIC := <the name of the metric (in case of multi-value metrics aggregation)>
|
|
PATH := <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR><METRIC>]
|
|
--------------------------------------------------
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"order" : { "promoted_products>rating_stats.avg" : "desc" } <1>
|
|
},
|
|
"aggs" : {
|
|
"promoted_products" : {
|
|
"filter" : { "term" : { "promoted" : true }},
|
|
"aggs" : {
|
|
"rating_stats" : { "stats" : { "field" : "rating" }}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The above will sort the buckets based on the avg rating among the promoted products
|
|
|
|
|
|
==== Minimum document count
|
|
|
|
It is possible to only return buckets that have a document count that is greater than or equal to a configured
|
|
limit through the `min_doc_count` option.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"min_doc_count": 10
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The above aggregation would only return buckets that contain 10 documents or more. Default value is `1`.
|
|
|
|
NOTE: The special value `0` can be used to add empty buckets to the response between the minimum and the maximum buckets.
|
|
Here is an example of what the response could look like:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggregations": {
|
|
"prices": {
|
|
"buckets": {
|
|
"0": {
|
|
"key": 0,
|
|
"doc_count": 2
|
|
},
|
|
"50": {
|
|
"key": 50,
|
|
"doc_count": 0
|
|
},
|
|
"150": {
|
|
"key": 150,
|
|
"doc_count": 3
|
|
},
|
|
"200": {
|
|
"key": 150,
|
|
"doc_count": 0
|
|
},
|
|
"250": {
|
|
"key": 150,
|
|
"doc_count": 0
|
|
},
|
|
"300": {
|
|
"key": 150,
|
|
"doc_count": 1
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
==== Response Format
|
|
|
|
By default, the buckets are returned as an ordered array. It is also possible to request the response as a hash
|
|
instead keyed by the buckets keys:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggs" : {
|
|
"prices" : {
|
|
"histogram" : {
|
|
"field" : "price",
|
|
"interval" : 50,
|
|
"keyed" : true
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"aggregations": {
|
|
"prices": {
|
|
"buckets": {
|
|
"0": {
|
|
"key": 0,
|
|
"doc_count": 2
|
|
},
|
|
"50": {
|
|
"key": 50,
|
|
"doc_count": 4
|
|
},
|
|
"150": {
|
|
"key": 150,
|
|
"doc_count": 3
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|