[[search-aggregations-bucket-datehistogram-aggregation]]
=== Date Histogram Aggregation

A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>>, except that it can
only be applied to date values. Since Elasticsearch represents dates internally as long values, it is possible
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is
that time-based intervals are not of fixed length (think of leap years and of the varying number of days in a month). For this reason,
time-based data needs special support. In terms of functionality, this histogram supports the same features
as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.

Requesting bucket intervals of a month:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
--------------------------------------------------

Available expressions for interval: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`


Fractional values are allowed for seconds, minutes, hours, days and weeks. For example, 1.5 hours:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1.5h"
            }
        }
    }
}
--------------------------------------------------

See <<time-units>> for accepted abbreviations.

==== Time Zone

By default, times are stored as UTC milliseconds since the epoch, so all computation, bucketing and rounding is
done in UTC. It is possible to provide a time zone value, which causes all bucket
computations to take place in the specified zone. The time returned for each bucket is still milliseconds since the
epoch in UTC. The parameter is called `time_zone`. It accepts a numeric value for the hours offset, for example
`"time_zone" : -2`, or a string in hours-and-minutes format, like `"time_zone" : "-02:30"`.
Another option is to provide one of the accepted time zone identifier values.

Let's take an example: the timestamp `2012-04-01T04:15:30Z` (UTC) with a `time_zone` of `"-08:00"`. For a day interval, applying the time zone
gives the local time `2012-03-31T20:15:30`, which rounds down to the day `2012-03-31`, so the returned key is (in milliseconds) that of
`2012-03-31T08:00:00Z` (UTC), that is, local midnight converted back to UTC. For an hour interval, applying the time zone likewise results in `2012-03-31T20:15:30`; rounding it
in the time zone results in `2012-03-31T20:00:00`, and, to be consistent, that rounded value is returned converted back to UTC as
`2012-04-01T04:00:00Z` (UTC).
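
As a minimal sketch, a request for the day-interval case described above might look like the following. The aggregation name `articles_over_time` and the `date` field are reused from the earlier examples, and the `time_zone` value is the one from the worked example; this is an illustration, not a request taken from a real deployment.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "time_zone" : "-08:00" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Illustrative value, taken from the worked example; the numeric form and the other accepted forms would be used the same way.

With such a request, a document dated `2012-04-01T04:15:30Z` would be expected to land in the bucket whose key corresponds to `2012-03-31T08:00:00Z`.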

==== Offset

The `offset` option shifts the boundaries of the date bucket intervals after any other shifts caused by
time zones have been applied. This makes it possible, for example, for daily buckets to run from 6AM to 6AM the next day instead of starting at 12AM,
or for monthly buckets to run from the 10th of the month to the 10th of the next month instead of starting on the 1st.

The `offset` option accepts positive or negative time durations, like `1h` for an hour or `1M` for a month. See <<time-units>> for more
possible time duration options.
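
For illustration, the 6AM-to-6AM daily buckets mentioned above could be requested roughly as follows. The field and aggregation names are carried over from the earlier examples, and the `+6h` spelling of a positive six-hour duration is an assumption made for this sketch.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "offset" : "+6h" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Assumed value for this sketch; a negative duration such as `-1h` would shift the bucket boundaries in the other direction.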

==== Keys

Since dates are represented internally as 64-bit numbers, these numbers are returned as the bucket keys (each key
representing a date as milliseconds since the epoch). It is also possible to define a date format, which results in
the dates being returned as formatted strings next to the numeric key values:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1M",
                "format" : "yyyy-MM-dd" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Supports expressive date <<date-format-pattern,format pattern>>

Response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "articles_over_time": {
            "buckets": [
                {
                    "key_as_string": "2012-02-02",
                    "key": 1328140800000,
                    "doc_count": 1
                },
                {
                    "key_as_string": "2012-03-02",
                    "key": 1330646400000,
                    "doc_count": 2
                },
                ...
            ]
        }
    }
}
--------------------------------------------------

As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document-level scripts and
value-level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
setting and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets between the first
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
setting, which enables extending the bounds of the histogram beyond the data itself (for more on why you would want to
do that, see the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
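
The following sketch combines these settings in a single request. It assumes that `order` and `extended_bounds` take the same shape as in the normal histogram aggregation, and that the bounds may be written using the configured `format`; the dates themselves are made up for illustration.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "format" : "yyyy-MM-dd",
                "order" : { "_key" : "desc" }, <1>
                "min_doc_count" : 0, <2>
                "extended_bounds" : { <3>
                    "min" : "2012-01-01",
                    "max" : "2012-12-31"
                }
            }
        }
    }
}
--------------------------------------------------

<1> Assumed shape: returns the buckets with the most recent key first.
<2> Keeps empty buckets in the response.
<3> Hypothetical bounds: asks for buckets covering the whole of 2012, even for months without matching documents.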