679 lines
20 KiB
Plaintext
679 lines
20 KiB
Plaintext
[[search-aggregations-bucket-datehistogram-aggregation]]
|
|
=== Date Histogram Aggregation
|
|
|
|
This multi-bucket aggregation is similar to the normal
|
|
<<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
|
|
only be used with date values. Because dates are represented internally in
|
|
Elasticsearch as long values, it is possible, but not as accurate, to use the
|
|
normal `histogram` on dates as well. The main difference in the two APIs is
|
|
that here the interval can be specified using date/time expressions. Time-based
|
|
data requires special support because time-based intervals are not always a
|
|
fixed length.
|
|
|
|
==== Calendar and Fixed intervals
|
|
|
|
When configuring a date histogram aggregation, the interval can be specified
|
|
in two manners: calendar-aware time intervals, and fixed time intervals.
|
|
|
|
Calendar-aware intervals understand that daylight savings changes the length
|
|
of specific days, months have different amounts of days, and leap seconds can
|
|
be tacked onto a particular year.
|
|
|
|
Fixed intervals are, by contrast, always multiples of SI units and do not change
|
|
based on calendaring context.
|
|
|
|
[NOTE]
|
|
.Combined `interval` field is deprecated
|
|
==================================
|
|
deprecated[7.2, `interval` field is deprecated] Historically both calendar and fixed
|
|
intervals were configured in a single `interval` field, which led to confusing
|
|
semantics. Specifying `1d` would be assumed as a calendar-aware time,
|
|
whereas `2d` would be interpreted as fixed time. To get "one day" of fixed time,
|
|
the user would need to specify the next smaller unit (in this case, `24h`).
|
|
|
|
This combined behavior was often unknown to users, and even when knowledgeable about
|
|
the behavior it was difficult to use and confusing.
|
|
|
|
This behavior has been deprecated in favor of two new, explicit fields: `calendar_interval`
|
|
and `fixed_interval`.
|
|
|
|
By forcing a choice between calendar and intervals up front, the semantics of the interval
|
|
are clear to the user immediately and there is no ambiguity. The old `interval` field
|
|
will be removed in the future.
|
|
==================================
|
|
|
|
===== Calendar Intervals
|
|
|
|
Calendar-aware intervals are configured with the `calendar_interval` parameter.
|
|
Calendar intervals can only be specified in "singular" quantities of the unit
|
|
(`1d`, `1M`, etc). Multiples, such as `2d`, are not supported and will throw an exception.
|
|
|
|
The accepted units for calendar intervals are:
|
|
|
|
minute (`m`, `1m`) ::
|
|
All minutes begin at 00 seconds.
|
|
|
|
One minute is the interval between 00 seconds of the first minute and 00
|
|
seconds of the following minute in the specified timezone, compensating for any
|
|
intervening leap seconds, so that the number of minutes and seconds past the
|
|
hour is the same at the start and end.
|
|
|
|
hours (`h`, `1h`) ::
|
|
All hours begin at 00 minutes and 00 seconds.
|
|
|
|
One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
|
|
minutes of the following hour in the specified timezone, compensating for any
|
|
intervening leap seconds, so that the number of minutes and seconds past the hour
|
|
is the same at the start and end.
|
|
|
|
|
|
days (`d`, `1d`) ::
|
|
All days begin at the earliest possible time, which is usually 00:00:00
|
|
(midnight).
|
|
|
|
One day (1d) is the interval between the start of the day and the start of
|
|
of the following day in the specified timezone, compensating for any intervening
|
|
time changes.
|
|
|
|
week (`w`, `1w`) ::
|
|
|
|
One week is the interval between the start day_of_week:hour:minute:second
|
|
and the same day of the week and time of the following week in the specified
|
|
timezone.
|
|
|
|
month (`M`, `1M`) ::
|
|
|
|
One month is the interval between the start day of the month and time of
|
|
day and the same day of the month and time of the following month in the specified
|
|
timezone, so that the day of the month and time of day are the same at the start
|
|
and end.
|
|
|
|
quarter (`q`, `1q`) ::
|
|
|
|
One quarter (1q) is the interval between the start day of the month and
|
|
time of day and the same day of the month and time of day three months later,
|
|
so that the day of the month and time of day are the same at the start and end. +
|
|
|
|
year (`y`, `1y`) ::
|
|
|
|
One year (1y) is the interval between the start day of the month and time of
|
|
day and the same day of the month and time of day the following year in the
|
|
specified timezone, so that the date and time are the same at the start and end. +
|
|
|
|
===== Calendar Interval Examples
|
|
As an example, here is an aggregation requesting bucket intervals of a month in calendar time:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"calendar_interval" : "month"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
If you attempt to use multiples of calendar units, the aggregation will fail because only
|
|
singular calendar units are supported:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"calendar_interval" : "2d"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
// TEST[catch:bad_request]
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"error" : {
|
|
"root_cause" : [...],
|
|
"type" : "x_content_parse_exception",
|
|
"reason" : "[1:82] [date_histogram] failed to parse field [calendar_interval]",
|
|
"caused_by" : {
|
|
"type" : "illegal_argument_exception",
|
|
"reason" : "The supplied interval [2d] could not be parsed as a calendar interval.",
|
|
"stack_trace" : "java.lang.IllegalArgumentException: The supplied interval [2d] could not be parsed as a calendar interval."
|
|
}
|
|
}
|
|
}
|
|
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
===== Fixed Intervals
|
|
|
|
Fixed intervals are configured with the `fixed_interval` parameter.
|
|
|
|
In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI
|
|
units and never deviate, regardless of where they fall on the calendar. One second
|
|
is always composed of 1000ms. This allows fixed intervals to be specified in
|
|
any multiple of the supported units.
|
|
|
|
However, it means fixed intervals cannot express other units such as months,
|
|
since the duration of a month is not a fixed quantity. Attempting to specify
|
|
a calendar interval like month or quarter will throw an exception.
|
|
|
|
The accepted units for fixed intervals are:
|
|
|
|
milliseconds (ms) ::
|
|
|
|
seconds (s) ::
|
|
Defined as 1000 milliseconds each
|
|
|
|
minutes (m) ::
|
|
All minutes begin at 00 seconds.
|
|
|
|
Defined as 60 seconds each (60,000 milliseconds)
|
|
|
|
hours (h) ::
|
|
All hours begin at 00 minutes and 00 seconds.
|
|
Defined as 60 minutes each (3,600,000 milliseconds)
|
|
|
|
days (d) ::
|
|
All days begin at the earliest possible time, which is usually 00:00:00
|
|
(midnight).
|
|
|
|
Defined as 24 hours (86,400,000 milliseconds)
|
|
|
|
===== Fixed Interval Examples
|
|
|
|
If we try to recreate the "month" `calendar_interval` from earlier, we can approximate that with
|
|
30 fixed days:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"fixed_interval" : "30d"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
But if we try to use a calendar unit that is not supported, such as weeks, we'll get an exception:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"fixed_interval" : "2w"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
// TEST[catch:bad_request]
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"error" : {
|
|
"root_cause" : [...],
|
|
"type" : "x_content_parse_exception",
|
|
"reason" : "[1:82] [date_histogram] failed to parse field [fixed_interval]",
|
|
"caused_by" : {
|
|
"type" : "illegal_argument_exception",
|
|
"reason" : "failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized",
|
|
"stack_trace" : "java.lang.IllegalArgumentException: failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized"
|
|
}
|
|
}
|
|
}
|
|
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
===== Notes
|
|
|
|
In all cases, when the specified end time does not exist, the actual end time is
|
|
the closest available time after the specified end.
|
|
|
|
Widely distributed applications must also consider vagaries such as countries that
|
|
start and stop daylight savings time at 12:01 A.M., so end up with one minute of
|
|
Sunday followed by an additional 59 minutes of Saturday once a year, and countries
|
|
that decide to move across the international date line. Situations like
|
|
that can make irregular timezone offsets seem easy.
|
|
|
|
As always, rigorous testing, especially around time-change events, will ensure
|
|
that your time interval specification is
|
|
what you intend it to be.
|
|
|
|
WARNING:
|
|
To avoid unexpected results, all connected servers and clients must sync to a
|
|
reliable network time service.
|
|
|
|
NOTE: fractional time values are not supported, but you can address this by
|
|
shifting to another time unit (e.g., `1.5h` could instead be specified as `90m`).
|
|
|
|
NOTE: You can also specify time values using abbreviations supported by
|
|
<<time-units,time units>> parsing.
|
|
|
|
===== Keys
|
|
|
|
Internally, a date is represented as a 64 bit number representing a timestamp
|
|
in milliseconds-since-the-epoch (01/01/1970 midnight UTC). These timestamps are
|
|
returned as the ++key++ name of the bucket. The `key_as_string` is the same
|
|
timestamp converted to a formatted
|
|
date string using the `format` parameter specification:
|
|
|
|
TIP: If you don't specify `format`, the first date
|
|
<<mapping-date-format,format>> specified in the field mapping is used.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"calendar_interval" : "1M",
|
|
"format" : "yyyy-MM-dd" <1>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
<1> Supports expressive date <<date-format-pattern,format pattern>>
|
|
|
|
Response:
|
|
|
|
[source,console-result]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"sales_over_time": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-01-01",
|
|
"key": 1420070400000,
|
|
"doc_count": 3
|
|
},
|
|
{
|
|
"key_as_string": "2015-02-01",
|
|
"key": 1422748800000,
|
|
"doc_count": 2
|
|
},
|
|
{
|
|
"key_as_string": "2015-03-01",
|
|
"key": 1425168000000,
|
|
"doc_count": 2
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
===== Timezone
|
|
|
|
Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
|
|
rounding is also done in UTC. Use the `time_zone` parameter to indicate
|
|
that bucketing should use a different timezone.
|
|
|
|
You can specify timezones as either an ISO 8601 UTC offset (e.g. `+01:00` or
|
|
`-08:00`) or as a timezone ID as specified in the IANA timezone database,
|
|
such as`America/Los_Angeles`.
|
|
|
|
Consider the following example:
|
|
|
|
[source,console]
|
|
---------------------------------
|
|
PUT my_index/_doc/1?refresh
|
|
{
|
|
"date": "2015-10-01T00:30:00Z"
|
|
}
|
|
|
|
PUT my_index/_doc/2?refresh
|
|
{
|
|
"date": "2015-10-01T01:30:00Z"
|
|
}
|
|
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"calendar_interval": "day"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
|
|
If you don't specify a timezone, UTC is used. This would result in both of these
|
|
documents being placed into the same day bucket, which starts at midnight UTC
|
|
on 1 October 2015:
|
|
|
|
[source,console-result]
|
|
---------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-10-01T00:00:00.000Z",
|
|
"key": 1443657600000,
|
|
"doc_count": 2
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
If you specify a `time_zone` of `-01:00`, midnight in that timezone is one hour
|
|
before midnight UTC:
|
|
|
|
[source,console]
|
|
---------------------------------
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"calendar_interval": "day",
|
|
"time_zone": "-01:00"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// TEST[continued]
|
|
|
|
Now the first document falls into the bucket for 30 September 2015, while the
|
|
second document falls into the bucket for 1 October 2015:
|
|
|
|
[source,console-result]
|
|
---------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
|
|
"key": 1443574800000,
|
|
"doc_count": 1
|
|
},
|
|
{
|
|
"key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
|
|
"key": 1443661200000,
|
|
"doc_count": 1
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
---------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
<1> The `key_as_string` value represents midnight on each day
|
|
in the specified timezone.
|
|
|
|
WARNING: When using time zones that follow DST (daylight savings time) changes,
|
|
buckets close to the moment when those changes happen can have slightly different
|
|
sizes than you would expect from the used `interval`.
|
|
For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
|
|
clocks were turned forward 1 hour to 3am local time. If you use `day` as `interval`,
|
|
the bucket covering that day will only hold data for 23 hours instead of the usual
|
|
24 hours for other buckets. The same is true for shorter intervals, like 12h,
|
|
where you'll have only a 11h bucket on the morning of 27 March when the DST shift
|
|
happens.
|
|
|
|
===== Offset
|
|
|
|
Use the `offset` parameter to change the start value of each bucket by the
|
|
specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
|
|
an hour, or `1d` for a day. See <<time-units>> for more possible time
|
|
duration options.
|
|
|
|
For example, when using an interval of `day`, each bucket runs from midnight
|
|
to midnight. Setting the `offset` parameter to `+6h` changes each bucket
|
|
to run from 6am to 6am:
|
|
|
|
[source,console]
|
|
-----------------------------
|
|
PUT my_index/_doc/1?refresh
|
|
{
|
|
"date": "2015-10-01T05:30:00Z"
|
|
}
|
|
|
|
PUT my_index/_doc/2?refresh
|
|
{
|
|
"date": "2015-10-01T06:30:00Z"
|
|
}
|
|
|
|
GET my_index/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"by_day": {
|
|
"date_histogram": {
|
|
"field": "date",
|
|
"calendar_interval": "day",
|
|
"offset": "+6h"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
-----------------------------
|
|
|
|
Instead of a single bucket starting at midnight, the above request groups the
|
|
documents into buckets starting at 6am:
|
|
|
|
[source,console-result]
|
|
-----------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"by_day": {
|
|
"buckets": [
|
|
{
|
|
"key_as_string": "2015-09-30T06:00:00.000Z",
|
|
"key": 1443592800000,
|
|
"doc_count": 1
|
|
},
|
|
{
|
|
"key_as_string": "2015-10-01T06:00:00.000Z",
|
|
"key": 1443679200000,
|
|
"doc_count": 1
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
-----------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
NOTE: The start `offset` of each bucket is calculated after `time_zone`
|
|
adjustments have been made.
|
|
|
|
===== Keyed Response
|
|
|
|
Setting the `keyed` flag to `true` associates a unique string key with each
|
|
bucket and returns the ranges as a hash rather than an array:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sales_over_time" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"calendar_interval" : "1M",
|
|
"format" : "yyyy-MM-dd",
|
|
"keyed": true
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
Response:
|
|
|
|
[source,console-result]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"sales_over_time": {
|
|
"buckets": {
|
|
"2015-01-01": {
|
|
"key_as_string": "2015-01-01",
|
|
"key": 1420070400000,
|
|
"doc_count": 3
|
|
},
|
|
"2015-02-01": {
|
|
"key_as_string": "2015-02-01",
|
|
"key": 1422748800000,
|
|
"doc_count": 2
|
|
},
|
|
"2015-03-01": {
|
|
"key_as_string": "2015-03-01",
|
|
"key": 1425168000000,
|
|
"doc_count": 2
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
===== Scripts
|
|
|
|
As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>,
|
|
both document-level scripts and
|
|
value-level scripts are supported. You can control the order of the returned
|
|
buckets using the `order`
|
|
settings and filter the returned buckets based on a `min_doc_count` setting
|
|
(by default all buckets between the first
|
|
bucket that matches documents and the last one are returned). This histogram
|
|
also supports the `extended_bounds`
|
|
setting, which enables extending the bounds of the histogram beyond the data
|
|
itself. For more information, see
|
|
<<search-aggregations-bucket-histogram-aggregation-extended-bounds,`Extended Bounds`>>.
|
|
|
|
===== Missing value
|
|
|
|
The `missing` parameter defines how to treat documents that are missing a value.
|
|
By default, they are ignored, but it is also possible to treat them as if they
|
|
have a value.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs" : {
|
|
"sale_date" : {
|
|
"date_histogram" : {
|
|
"field" : "date",
|
|
"calendar_interval": "year",
|
|
"missing": "2000/01/01" <1>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
<1> Documents without a value in the `publish_date` field will fall into the
|
|
same bucket as documents that have the value `2000-01-01`.
|
|
|
|
===== Order
|
|
|
|
By default the returned buckets are sorted by their `key` ascending, but you can
|
|
control the order using
|
|
the `order` setting. This setting supports the same `order` functionality as
|
|
<<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
|
|
|
|
===== Using a script to aggregate by day of the week
|
|
|
|
When you need to aggregate the results by day of the week, use a script that
|
|
returns the day of the week:
|
|
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST /sales/_search?size=0
|
|
{
|
|
"aggs": {
|
|
"dayOfWeek": {
|
|
"terms": {
|
|
"script": {
|
|
"lang": "painless",
|
|
"source": "doc['date'].value.dayOfWeekEnum.value"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sales]
|
|
|
|
Response:
|
|
|
|
[source,console-result]
|
|
--------------------------------------------------
|
|
{
|
|
...
|
|
"aggregations": {
|
|
"dayOfWeek": {
|
|
"doc_count_error_upper_bound": 0,
|
|
"sum_other_doc_count": 0,
|
|
"buckets": [
|
|
{
|
|
"key": "7",
|
|
"doc_count": 4
|
|
},
|
|
{
|
|
"key": "4",
|
|
"doc_count": 3
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
|
|
|
|
The response will contain all the buckets having the relative day of
|
|
the week as key : 1 for Monday, 2 for Tuesday... 7 for Sunday.
|