458 lines
13 KiB
Plaintext
458 lines
13 KiB
Plaintext
[[search-aggregations-bucket-datehistogram-aggregation]]
|
|
=== Date Histogram Aggregation
|
|
|
|
A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except it can
|
|
only be applied on date values. Since dates are represented in Elasticsearch internally as long values, it is possible
|
|
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason for this is in the fact
|
|
that time based intervals are not fixed (think of leap years and on the number of days in a month). For this reason,
|
|
we need special support for time based data. From a functionality perspective, this histogram supports the same features
|
|
as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.
|
|
|
|
Requesting bucket intervals of a month.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"interval" : "month"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
|
|
Available expressions for interval: `year` (`1y`), `quarter` (`1q`), `month` (`1M`), `week` (`1w`),
|
|
`day` (`1d`), `hour` (`1h`), `minute` (`1m`), `second` (`1s`)
|
|
|
|
Time values can also be specified via abbreviations supported by <<time-units,time units>> parsing.
|
|
Note that fractional time values are not supported, but you can address this by shifting to another
|
|
time unit (e.g., `1.5h` could instead be specified as `90m`). Also note that time intervals larger than
|
|
days do not support arbitrary values but can only be one unit large (e.g. `1y` is valid, `2y` is not).
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"interval" : "90m"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
|
|
==== Keys
|
|
|
|
Internally, a date is represented as a 64 bit number representing a timestamp
|
|
in milliseconds-since-the-epoch. These timestamps are returned as the bucket
|
|
++key++s. The `key_as_string` is the same timestamp converted to a formatted
|
|
date string using the format specified with the `format` parameter:
|
|
|
|
TIP: If no `format` is specified, then it will use the first date
|
|
<<mapping-date-format,format>> specified in the field mapping.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"interval" : "1M",
|
|
"format" : "yyyy-MM-dd" <1>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
|
|
<1> Supports expressive date <<date-format-pattern,format pattern>>
|
|
|
|
Response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"sales_over_time": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-01-01",
|
|
"key": 1420070400000,
|
|
"doc_count": 3
|
|
},
|
|
{
|
|
"key_as_string": "2015-02-01",
|
|
"key": 1422748800000,
|
|
"doc_count": 2
|
|
},
|
|
{
|
|
"key_as_string": "2015-03-01",
|
|
"key": 1425168000000,
|
|
"doc_count": 2
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
==== Time Zone
|
|
|
|
Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
|
|
rounding is also done in UTC. The `time_zone` parameter can be used to indicate
|
|
that bucketing should use a different time zone.
|
|
|
|
Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
|
|
`-08:00`) or as a timezone id, an identifier used in the TZ database like
|
|
`America/Los_Angeles`.
|
|
|
|
Consider the following example:
|
|
|
|
[source,js]
|
|
---------------------------------
|
|
PUT my_index/_doc/1?refresh
|
|
{
|
|
"date": "2015-10-01T00:30:00Z"
|
|
}
|
|
|
|
PUT my_index/_doc/2?refresh
|
|
{
|
|
"date": "2015-10-01T01:30:00Z"
|
|
}
|
|
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"interval": "day"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// CONSOLE
|
|
|
|
UTC is used if no time zone is specified, which would result in both of these
|
|
documents being placed into the same day bucket, which starts at midnight UTC
|
|
on 1 October 2015:
|
|
|
|
[source,js]
|
|
---------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-10-01T00:00:00.000Z",
|
|
"key": 1443657600000,
|
|
"doc_count": 2
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
If a `time_zone` of `-01:00` is specified, then midnight starts at one hour before
|
|
midnight UTC:
|
|
|
|
[source,js]
|
|
---------------------------------
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"interval": "day",
|
|
"time_zone": "-01:00"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Now the first document falls into the bucket for 30 September 2015, while the
|
|
second document falls into the bucket for 1 October 2015:
|
|
|
|
[source,js]
|
|
---------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
|
|
"key": 1443574800000,
|
|
"doc_count": 1
|
|
},
|
|
{
|
|
"key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
|
|
"key": 1443661200000,
|
|
"doc_count": 1
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
<1> The `key_as_string` value represents midnight on each day
|
|
in the specified time zone.
|
|
|
|
WARNING: When using time zones that follow DST (daylight savings time) changes,
|
|
buckets close to the moment when those changes happen can have slightly different
|
|
sizes than would be expected from the used `interval`.
|
|
For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
|
|
clocks were turned forward 1 hour to 3am local time. When using `day` as `interval`,
|
|
the bucket covering that day will only hold data for 23 hours instead of the usual
|
|
24 hours for other buckets. The same is true for shorter intervals like e.g. 12h.
|
|
Here, we will have only a 11h bucket on the morning of 27 March when the DST shift
|
|
happens.
|
|
|
|
|
|
==== Offset
|
|
|
|
The `offset` parameter is used to change the start value of each bucket by the
|
|
specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
|
|
an hour, or `1d` for a day. See <<time-units>> for more possible time
|
|
duration options.
|
|
|
|
For instance, when using an interval of `day`, each bucket runs from midnight
|
|
to midnight. Setting the `offset` parameter to `+6h` would change each bucket
|
|
to run from 6am to 6am:
|
|
|
|
[source,js]
|
|
-----------------------------
|
|
PUT my_index/_doc/1?refresh
|
|
{
|
|
"date": "2015-10-01T05:30:00Z"
|
|
}
|
|
|
|
PUT my_index/_doc/2?refresh
|
|
{
|
|
"date": "2015-10-01T06:30:00Z"
|
|
}
|
|
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"interval": "day",
|
|
"offset": "+6h"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------------
|
|
// CONSOLE
|
|
|
|
Instead of a single bucket starting at midnight, the above request groups the
|
|
documents into buckets starting at 6am:
|
|
|
|
[source,js]
|
|
-----------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-09-30T06:00:00.000Z",
|
|
"key": 1443592800000,
|
|
"doc_count": 1
|
|
},
|
|
{
|
|
"key_as_string": "2015-10-01T06:00:00.000Z",
|
|
"key": 1443679200000,
|
|
"doc_count": 1
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
-----------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
NOTE: The start `offset` of each bucket is calculated after the `time_zone`
|
|
adjustments have been made.
|
|
|
|
==== Keyed Response
|
|
|
|
Setting the `keyed` flag to `true` will associate a unique string key with each bucket and return the ranges as a hash rather than an array:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"interval" : "1M",
|
|
"format" : "yyyy-MM-dd",
|
|
"keyed": true
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
|
|
Response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"sales_over_time": {
|
|
"buckets": {
|
|
"2015-01-01": {
|
|
"key_as_string": "2015-01-01",
|
|
"key": 1420070400000,
|
|
"doc_count": 3
|
|
},
|
|
"2015-02-01": {
|
|
"key_as_string": "2015-02-01",
|
|
"key": 1422748800000,
|
|
"doc_count": 2
|
|
},
|
|
"2015-03-01": {
|
|
"key_as_string": "2015-03-01",
|
|
"key": 1425168000000,
|
|
"doc_count": 2
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
==== Scripts
|
|
|
|
Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
|
|
value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
|
|
settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first
|
|
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
|
|
setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
|
|
do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
|
|
|
|
==== Missing value
|
|
|
|
The `missing` parameter defines how documents that are missing a value should be treated.
|
|
By default they will be ignored but it is also possible to treat them as if they
|
|
had a value.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sale_date" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"interval": "year",
|
|
"missing": "2000/01/01" <1>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
|
|
<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.
|
|
|
|
==== Order
|
|
|
|
By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using
|
|
the `order` setting. Supports the same `order` functionality as the <<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
|
|
|
|
deprecated[6.0.0, Use `_key` instead of `_time` to order buckets by their dates/keys]
|
|
|
|
==== Use of a script to aggregate by day of the week
|
|
|
|
There are some cases where date histogram can't help us, like for example, when we need
|
|
to aggregate the results by day of the week.
|
|
In this case to overcome the problem, we can use a script that returns the day of the week:
|
|
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"dayOfWeek": {
|
|
"terms": {
|
|
"script": {
|
|
"lang": "painless",
|
|
"source": "doc['date'].value.dayOfWeek"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:sales]
|
|
// TEST[warning:The joda time api for doc values is deprecated. Use -Des.scripting.use_java_time=true to use the java time api for date field doc values]
|
|
|
|
Response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"dayOfWeek": {
|
|
"doc_count_error_upper_bound": 0,
|
|
"sum_other_doc_count": 0,
|
|
"buckets": [
|
|
{
|
|
"key": "7",
|
|
"doc_count": 4
|
|
},
|
|
{
|
|
"key": "4",
|
|
"doc_count": 3
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
The response will contain all the buckets having as key the relative day of
|
|
the week: 1 for Monday, 2 for Tuesday... 7 for Sunday.
|