parent 5ba543fd6c
commit 5bdf25320a
@@ -67,3 +67,4 @@ include::bucket/significanttext-aggregation.asciidoc[]

 include::bucket/terms-aggregation.asciidoc[]

+include::bucket/range-field-note.asciidoc[]

@@ -3,7 +3,7 @@

 This multi-bucket aggregation is similar to the normal
 <<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
-only be used with date values. Because dates are represented internally in
+only be used with date or date range values. Because dates are represented internally in
 Elasticsearch as long values, it is possible, but not as accurate, to use the
 normal `histogram` on dates as well. The main difference in the two APIs is
 that here the interval can be specified using date/time expressions. Time-based

@@ -1,12 +1,13 @@
 [[search-aggregations-bucket-histogram-aggregation]]
 === Histogram Aggregation

-A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents.
-It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field
-that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval `5`
-(in case of price it may represent $5). When the aggregation executes, the price field of every document will be
-evaluated and will be rounded down to its closest bucket - for example, if the price is `32` and the bucket size is `5`
-then the rounding will yield `30` and thus the document will "fall" into the bucket that is associated with the key `30`.
+A multi-bucket values source based aggregation that can be applied on numeric values or numeric range values extracted
+from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the
+documents have a field that holds a price (numeric), we can configure this aggregation to dynamically build buckets with
+interval `5` (in case of price it may represent $5). When the aggregation executes, the price field of every document
+will be evaluated and will be rounded down to its closest bucket - for example, if the price is `32` and the bucket size
+is `5` then the rounding will yield `30` and thus the document will "fall" into the bucket that is associated with the
+key `30`.
 To make this more formal, here is the rounding function that is used:

 [source,java]
@@ -14,6 +15,10 @@ To make this more formal, here is the rounding function that is used:
 bucket_key = Math.floor((value - offset) / interval) * interval + offset
 --------------------------------------------------

+For range values, a document can fall into multiple buckets. The first bucket is computed from the lower
+bound of the range in the same way as a bucket for a single value is computed. The final bucket is computed in the same
+way from the upper bound of the range, and the range is counted in all buckets in between and including those two.
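
As a rough illustration of the rule above, the following Python sketch computes the bucket key for a single value and the list of bucket keys for a range. The function names `bucket_key` and `range_bucket_keys` are hypothetical; the code only mirrors the rounding formula and is not Elasticsearch's implementation.

```python
import math


def bucket_key(value, interval, offset=0):
    """Round a value down to its bucket key, mirroring the formula above."""
    return math.floor((value - offset) / interval) * interval + offset


def range_bucket_keys(lower, upper, interval, offset=0):
    """Return every bucket key a closed range [lower, upper] is counted in.

    Hypothetical helper: the first key comes from the lower bound, the last
    from the upper bound, and every bucket in between is included.
    """
    first = bucket_key(lower, interval, offset)
    last = bucket_key(upper, interval, offset)
    count = round((last - first) / interval) + 1
    return [first + i * interval for i in range(count)]


print(bucket_key(32, 5))             # 30
print(range_bucket_keys(10, 20, 5))  # [10, 15, 20]
```

A range narrower than the interval can still land in a single bucket: `range_bucket_keys(12, 13, 5)` yields only `[10]`.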
+
 The `interval` must be a positive decimal, while the `offset` must be a decimal in `[0, interval)`
 (a decimal greater than or equal to `0` and less than `interval`)

@@ -175,6 +180,14 @@ POST /sales/_search?size=0
 --------------------------------------------------
 // TEST[setup:sales]

+When aggregating ranges, buckets are based on the values of the returned documents. This means the response may include
+buckets outside of a query's range. For example, if your query looks for values greater than 100, and you have a range
+covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it's
+best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the
+aggregation buckets those documents without regard to how they were selected.
+See <<search-aggregations-bucket-range-field-note,note on bucketing range
+fields>> for more information and an example.
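
The independence of the two steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the in-memory `docs` list, the helper names, and the integer interval. It models the 50 to 150 example from the paragraph above and is not how Elasticsearch executes a search.

```python
import math

# Hypothetical in-memory documents; each holds a closed numeric range.
docs = [
    {"id": 1, "range": (50, 150)},
    {"id": 2, "range": (10, 40)},
]


def matches_greater_than(doc, threshold):
    """Query step: a range matches if any part of it lies above the threshold."""
    return doc["range"][1] > threshold


def histogram(selected, interval):
    """Aggregation step: count each document in every bucket its range touches."""
    counts = {}
    for doc in selected:
        lower, upper = doc["range"]
        first = math.floor(lower / interval) * interval
        last = math.floor(upper / interval) * interval
        for key in range(first, last + 1, interval):
            counts[key] = counts.get(key, 0) + 1
    return counts


selected = [d for d in docs if matches_greater_than(d, 100)]
buckets = histogram(selected, 50)
print(buckets)  # {50: 1, 100: 1, 150: 1}
```

The aggregation step never sees the threshold of `100`, which is why the `50` bucket appears even though the query asked for larger values.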
+
 ==== Order

 By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using

@@ -0,0 +1,181 @@
+[[search-aggregations-bucket-range-field-note]]
+=== Subtleties of bucketing range fields
+
+==== Documents are counted for each bucket they land in
+
+Since a range represents multiple values, running a bucket aggregation over a
+range field can result in the same document landing in multiple buckets. This
+can lead to surprising behavior, such as the sum of bucket counts being higher
+than the number of matched documents. For example, consider the following
+index:
+[source, console]
+--------------------------------------------------
+PUT range_index
+{
+  "settings": {
+    "number_of_shards": 2
+  },
+  "mappings": {
+    "properties": {
+      "expected_attendees": {
+        "type": "integer_range"
+      },
+      "time_frame": {
+        "type": "date_range",
+        "format": "yyyy-MM-dd||epoch_millis"
+      }
+    }
+  }
+}
+
+PUT range_index/_doc/1?refresh
+{
+  "expected_attendees" : {
+    "gte" : 10,
+    "lte" : 20
+  },
+  "time_frame" : {
+    "gte" : "2019-10-28",
+    "lte" : "2019-11-04"
+  }
+}
+--------------------------------------------------
+// TESTSETUP
+
+The range is wider than the interval in the following aggregation, and thus the
+document will land in multiple buckets.
+
+[source, console]
+--------------------------------------------------
+POST /range_index/_search?size=0
+{
+  "aggs" : {
+    "range_histo" : {
+      "histogram" : {
+        "field" : "expected_attendees",
+        "interval" : 5
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+Since the interval is `5` (and the offset is `0` by default), we expect buckets `10`,
+`15`, and `20`. Our range document will fall in all three of these buckets.
+
+[source, console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "range_histo" : {
+      "buckets" : [
+        {
+          "key" : 10.0,
+          "doc_count" : 1
+        },
+        {
+          "key" : 15.0,
+          "doc_count" : 1
+        },
+        {
+          "key" : 20.0,
+          "doc_count" : 1
+        }
+      ]
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+
+A document cannot exist partially in a bucket; for example, the above document
+cannot count as one-third in each of the above three buckets. In this example,
+since the document's range landed in multiple buckets, the full value of that
+document is also counted in the sub-aggregations of each bucket.
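
To make the counting concrete, here is a small Python sketch that buckets one document shaped like the example above and sums a numeric field per bucket. The `budget` field and its value are hypothetical (the example index has no such field), and the loop is only a model of the counting rule, not Elasticsearch code.

```python
import math

# Hypothetical document: a range field plus a numeric field for a sub-aggregation.
docs = [{"expected_attendees": (10, 20), "budget": 300}]
interval = 5

buckets = {}
for doc in docs:
    lower, upper = doc["expected_attendees"]
    first = math.floor(lower / interval) * interval
    last = math.floor(upper / interval) * interval
    for key in range(first, last + 1, interval):
        bucket = buckets.setdefault(key, {"doc_count": 0, "budget_sum": 0})
        bucket["doc_count"] += 1               # the whole document is counted
        bucket["budget_sum"] += doc["budget"]  # the full value, not one-third

total_count = sum(b["doc_count"] for b in buckets.values())
print(buckets[10])             # {'doc_count': 1, 'budget_sum': 300}
print(total_count, len(docs))  # 3 1
```

The sum of the bucket counts is 3 although only one document matched, and each bucket reports the full `budget` of 300.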
+
+==== Query bounds are not aggregation filters
+
+Another unexpected behavior can arise when a query is used to filter on the
+field being aggregated. In this case, a document could match the query but
+still have one or both of the endpoints of the range outside the query.
+Consider the following aggregation on the above document:
+
+[source, console]
+--------------------------------------------------
+POST /range_index/_search?size=0
+{
+  "query": {
+    "range": {
+      "time_frame": {
+        "gte": "2019-11-01",
+        "format": "yyyy-MM-dd"
+      }
+    }
+  },
+  "aggs" : {
+    "november_data" : {
+      "date_histogram" : {
+        "field" : "time_frame",
+        "calendar_interval" : "day"
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+Even though the query only considers days in November, the aggregation
+generates 8 buckets (4 in October, 4 in November) because the aggregation is
+calculated over the ranges of all matching documents.
+
+[source, console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "november_data" : {
+      "buckets" : [
+        {
+          "key" : 1572220800000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572307200000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572393600000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572480000000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572566400000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572652800000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572739200000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572825600000,
+          "doc_count" : 1
+        }
+      ]
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
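
The eight keys in this response can be reproduced with a short Python sketch that walks the document's `time_frame` one day at a time. It assumes UTC day boundaries, which matches the keys shown; the helper name `day_bucket_keys` is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def day_bucket_keys(start, end):
    """Yield the epoch-millisecond key of every UTC day from start to end, inclusive."""
    day = start
    while day <= end:
        midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        yield int(midnight.timestamp() * 1000)
        day += timedelta(days=1)


keys = list(day_bucket_keys(date(2019, 10, 28), date(2019, 11, 4)))
print(len(keys))  # 8
print(keys[0])    # 1572220800000
print(keys[-1])   # 1572825600000
```

The first four keys are the October days (28 through 31) that lie before the query's lower bound of `2019-11-01`.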
+
+Depending on the use case, a query with the `WITHIN` relation could limit the
+documents to only those that fall entirely in the queried range. In this
+example, the one document would not be included and the aggregation would be
+empty. Filtering the buckets after the aggregation is also an option, for use
+cases where the document should be counted but the out-of-bounds data can be
+safely ignored.
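
Filtering the buckets after the aggregation could look like the following client-side Python sketch. It assumes the bucket list has already been read from the `november_data` aggregation of the response above, and the variable names are hypothetical; it is one possible approach, not a built-in Elasticsearch feature.

```python
from datetime import datetime, timezone

# Buckets copied from the example response (epoch-millisecond keys, UTC days).
buckets = [
    {"key": 1572220800000, "doc_count": 1},
    {"key": 1572307200000, "doc_count": 1},
    {"key": 1572393600000, "doc_count": 1},
    {"key": 1572480000000, "doc_count": 1},
    {"key": 1572566400000, "doc_count": 1},
    {"key": 1572652800000, "doc_count": 1},
    {"key": 1572739200000, "doc_count": 1},
    {"key": 1572825600000, "doc_count": 1},
]

# Lower bound of the query (2019-11-01) expressed in epoch milliseconds.
lower_bound = int(datetime(2019, 11, 1, tzinfo=timezone.utc).timestamp() * 1000)

# Keep only the buckets that fall inside the queried range.
kept = [b for b in buckets if b["key"] >= lower_bound]
print(len(kept))       # 4
print(kept[0]["key"])  # 1572566400000
```

The document is still counted once in each remaining bucket; only the October buckets are discarded.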