[[search-aggregations-bucket-datehistogram-aggregation]]
=== Date Histogram Aggregation

A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>>, except that it can
only be applied to date values. Since dates are represented internally in Elasticsearch as long values, it is possible
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is that
time-based intervals are not fixed in length (think of leap years and of the varying number of days in a month). For this reason,
time-based data needs special support. From a functionality perspective, this histogram supports the same features
as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.

Requesting bucket intervals of a month:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
--------------------------------------------------

Fractional values are allowed, for example 1.5 hours:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1.5h"
            }
        }
    }
}
--------------------------------------------------

Available expressions for interval: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`.

==== Time Zone

By default, times are stored as UTC milliseconds since the epoch. Thus, all computation and "bucketing" / "rounding" is
done in UTC. It is possible to provide a time zone value (both pre rounding and post rounding), which causes all
computations to take the relevant zone into account. The time returned for each bucket/entry is milliseconds since the
epoch of the provided time zone.

The parameters are `pre_zone` (applied before rounding based on the interval) and `post_zone` (applied after rounding
based on the interval). The `time_zone` parameter simply sets the `pre_zone` parameter. By default, both are set to `UTC`.

The zone value accepts a numeric value for the hours offset, for example: `"time_zone" : -2`. It also accepts a
format of hours and minutes, like `"time_zone" : "-02:30"`. Another option is to provide a time zone accepted as one of
the values listed here.

Let's take an example: the time `2012-04-01T04:15:30Z` with a `pre_zone` of `-08:00`. For a day interval, applying the
time zone and rounding places the time under `2012-03-31`, so the returned value will be (in millis) that of
`2012-03-31T00:00:00Z` (UTC). For an hour interval, applying the time zone results in `2012-03-31T20:15:30`, and rounding it
results in `2012-03-31T20:00:00`. Since we want to return it in UTC (`post_zone` is not set), we convert it back to
UTC: `2012-04-01T04:00:00Z`. Note that the results are consistent: the rounded value is always returned in UTC.
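
As an illustration of the hour-interval case above, a request along the following lines would apply a `pre_zone` of `-08:00` before rounding. This is only a sketch: the `date` field and the `articles_over_time` name are carried over from the earlier examples, and the bucket keys actually returned depend on your data.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "hour",
                "pre_zone" : "-08:00" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Illustrative offset, taken from the worked example; `post_zone` is left unset, so keys are returned in UTC
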

`post_zone` simply takes the result and adds the relevant offset.

Sometimes, we want to apply the same conversion to UTC that we did above for the hour interval to day (and larger) intervals as well. We can
set `pre_zone_adjust_large_interval` to `true`, which applies the same conversion done for the hour interval in the
example to day and larger intervals (it can be set regardless of the interval, but only kicks in when using day and
higher intervals).
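
For instance, the day-interval case could be requested roughly as follows. This is a sketch that reuses the `date` field and the `-08:00` offset from the example above; nothing here is specific to a real index.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_zone" : "-08:00",
                "pre_zone_adjust_large_interval" : true <1>
            }
        }
    }
}
--------------------------------------------------

<1> Only takes effect because the interval is `day` or larger
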

==== Factor

The date histogram works on numeric values (since time is stored in milliseconds since the epoch in UTC). However,
some systems store a different resolution (like seconds since the epoch) in a numeric field. The `factor`
parameter can be used to convert the field value to milliseconds before the relevant rounding is done, and is then
applied again to get back to the original unit. For example, when a numeric field stores values with seconds resolution, the
factor can be set to 1000.
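
A minimal sketch of the seconds-resolution case, assuming a hypothetical numeric field named `timestamp_seconds` (this field name is not part of the examples above and is used purely for illustration):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "timestamp_seconds", <1>
                "interval" : "day",
                "factor" : 1000 <2>
            }
        }
    }
}
--------------------------------------------------

<1> Hypothetical field holding seconds since the epoch
<2> Converts seconds to milliseconds for rounding
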

==== Pre/Post Offset

Specific offsets can be provided for pre rounding and post rounding: `pre_offset` for pre rounding, and
`post_offset` for post rounding. The format is the date time format (`1h`, `1d`, etc.).
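
As a rough sketch, both offsets can be set in the same request. The offset values below are illustrative only and simply follow the format mentioned above; the `date` field is carried over from the earlier examples.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_offset" : "1h", <1>
                "post_offset" : "1h" <2>
            }
        }
    }
}
--------------------------------------------------

<1> Illustrative offset applied before rounding
<2> Illustrative offset applied after rounding
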

==== Keys

Since dates are represented internally as 64-bit numbers, these numbers are returned as the bucket keys (each key
representing a date as milliseconds since the epoch). It is also possible to define a date format, which results in
returning the dates as formatted strings next to the numeric key values:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1M",
                "format" : "yyyy-MM-dd" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Supports expressive date <<date-format-pattern,format pattern>>

Response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "articles_over_time": {
            "buckets": [
                {
                    "key_as_string": "2012-02-02",
                    "key": 1328140800000,
                    "doc_count": 1
                },
                {
                    "key_as_string": "2012-03-02",
                    "key": 1330646400000,
                    "doc_count": 2
                },
                ...
            ]
        }
    }
}
--------------------------------------------------

Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
setting and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets with
`min_doc_count > 0` are returned). This histogram also supports the `extended_bounds` setting, which enables extending
the bounds of the histogram beyond the data itself (to read more on why you'd want to do that, please refer to the
explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
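
The settings mentioned above can be combined in one request. The following is a sketch only: the bound values are illustrative and assume they are given in a form the `date` field can parse, and the `date` field itself is carried over from the earlier examples.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "order" : { "_key" : "desc" },
                "min_doc_count" : 0, <1>
                "extended_bounds" : { <2>
                    "min" : "2014-01-01",
                    "max" : "2014-12-31"
                }
            }
        }
    }
}
--------------------------------------------------

<1> Requests empty buckets as well
<2> Illustrative bounds; they extend the range of buckets but do not filter documents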