Docs: Improved the date histogram docs for time_zone and offset

2015-09-07 19:53:50 +02:00 · 2015-09-07 19:53:50 +02:00 · 8aba6ce93a
parent 11c87106ce
commit 8aba6ce93a
1 changed files with 173 additions and 28 deletions
--- a/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
@ -45,36 +45,15 @@ Fractional values are allowed for seconds, minutes, hours, days and weeks. For e
 See <<time-units>> for accepted abbreviations.
 ==== Time Zone
 By default, times are stored as UTC milliseconds since the epoch. Thus, all computation and "bucketing" / "rounding" is
 done on UTC. It is possible to provide a time zone value, which will cause all bucket
 computations to take place in the specified zone. The time returned for each bucket/entry is milliseconds since the
 epoch in UTC. The parameter is called `time_zone`. It accepts either an ISO 8601 UTC offset or a timezone id.
 A UTC offset has the form of a `+` or `-`, followed by a two-digit hour, followed by `:`, followed by two-digit minutes.
 For example, `+01:00` represents 1 hour ahead of UTC. A timezone id is an identifier from the TZ database. For example,
 Pacific time is represented as `America/Los_Angeles`.
 Let's take an example: `2012-04-01T04:15:30Z` (UTC) with a `time_zone` of `"-08:00"`. For a day interval, applying
 the time zone and rounding places the time under `2012-03-31`, so the returned value will be (in millis)
 `2012-03-31T08:00:00Z` (UTC). For an hour interval, applying the time zone internally results in `2012-03-31T20:15:30`, and rounding it
 in the time zone results in `2012-03-31T20:00:00`; to be consistent, that rounded value is returned converted back to UTC as
 `2012-04-01T04:00:00Z` (UTC).
 ==== Offset
 The `offset` option can be provided for shifting the date bucket interval boundaries after any other shifts because of
 time zones are applied. For example, this makes it possible for daily buckets to run from 6AM to 6AM the next day instead of starting at 12AM,
 or for monthly buckets to run from the 10th of the month to the 10th of the next month instead of from the 1st.
 The `offset` option accepts positive or negative time durations like "1h" for an hour or "1M" for a Month. See <<time-units>> for more
 possible time duration options.
 ==== Keys
-Since internally, dates are represented as 64bit numbers, these numbers are returned as the bucket keys (each key
+Internally, a date is represented as a 64 bit number representing a timestamp
-representing a date - milliseconds since the epoch). It is also possible to define a date format, which will result in
+in milliseconds-since-the-epoch. These timestamps are returned as the bucket
-returning the dates as formatted strings next to the numeric key values:
++key++s. The `key_as_string` is the same timestamp converted to a formatted
 date string using the format specified with the `format` parameter:
 TIP: If no `format` is specified, then it will use the first date
 <<mapping-date-format,format>> specified in the field mapping.
 [source,js]
 --------------------------------------------------
@ -118,6 +97,172 @@ Response:
 }
 --------------------------------------------------
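The relationship between `key` and `key_as_string` can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: the helper name `key_as_string` is hypothetical, and the output pattern is assumed to be an ISO 8601 UTC date-time with milliseconds.

```python
from datetime import datetime, timedelta, timezone


def key_as_string(key_millis: int) -> str:
    """Render a milliseconds-since-the-epoch key as an ISO 8601 UTC string.

    Hypothetical helper for illustration; the fixed output pattern is an
    assumption, not something read from a field mapping.
    """
    # Use integer arithmetic so the millisecond part is never rounded.
    seconds, millis = divmod(key_millis, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"


print(key_as_string(1443657600000))  # 2015-10-01T00:00:00.000Z
```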
 ==== Time Zone
 Date-times are stored in Elasticsearch in UTC.  By default, all bucketing and
 rounding is also done in UTC. The `time_zone` parameter can be used to indicate
 that bucketing should use a different time zone.
 Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
 `-08:00`) or as a timezone id, an identifier used in the TZ database like
 `America/Los_Angeles`.
 Consider the following example:
 [source,js]
 ---------------------------------
 PUT my_index/log/1
 {
  "date": "2015-10-01T00:30:00Z"
 }
 PUT my_index/log/2
 {
  "date": "2015-10-01T01:30:00Z"
 }
 GET my_index/_search?size=0
 {
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field":     "date",
        "interval":  "day"
      }
    }
  }
 }
 ---------------------------------
 If no time zone is specified, UTC is used. Both of these documents are then
 placed into the same day bucket, which starts at midnight UTC on
 1 October 2015:
 [source,js]
 ---------------------------------
 "aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-10-01T00:00:00.000Z",
        "key":           1443657600000,
        "doc_count":     2
      }
    ]
  }
 }
 ---------------------------------
 If a `time_zone` of `-01:00` is specified, then midnight in that time zone
 falls one hour after midnight UTC (at 01:00 UTC):
 [source,js]
 ---------------------------------
 GET my_index/_search?size=0
 {
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field":     "date",
        "interval":  "day",
        "time_zone": "-01:00"
      }
    }
  }
 }
 ---------------------------------
 Now the first document falls into the bucket for 30 September 2015, while the
 second document falls into the bucket for 1 October 2015:
 [source,js]
 ---------------------------------
 "aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
         "key": 1443574800000,
        "doc_count": 1
      },
      {
        "key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
         "key": 1443661200000,
        "doc_count": 1
      }
    ]
  }
 }
 ---------------------------------
 <1> The `key_as_string` value represents midnight on each day
    in the specified time zone.
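The day bucketing described in this section can be reproduced with plain date arithmetic. The following is a minimal Python sketch under stated assumptions: it models a fixed UTC offset only (no daylight saving rules), the helper name `day_bucket_key` is hypothetical, and it is not how Elasticsearch implements rounding internally.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_bucket_key(iso_utc: str, offset_hours: int = 0) -> int:
    """Return the epoch-millis key of the day bucket containing a timestamp.

    Hypothetical helper: `offset_hours` stands in for a fixed `time_zone`
    offset such as -01:00.
    """
    tz = timezone(timedelta(hours=offset_hours))
    instant = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Move to the requested zone, then round down to local midnight.
    local_midnight = instant.astimezone(tz).replace(hour=0, minute=0, second=0, microsecond=0)
    # The key is the instant of local midnight, in UTC milliseconds.
    return (local_midnight - EPOCH) // timedelta(milliseconds=1)


# Without a time zone, both documents share the 1 October bucket.
print(day_bucket_key("2015-10-01T00:30:00Z"))      # 1443657600000
print(day_bucket_key("2015-10-01T01:30:00Z"))      # 1443657600000

# With -01:00, they land on either side of local midnight.
print(day_bucket_key("2015-10-01T00:30:00Z", -1))  # 1443574800000
print(day_bucket_key("2015-10-01T01:30:00Z", -1))  # 1443661200000
```

The key is the UTC instant of midnight in the chosen zone, so with `-01:00` each key is one hour (3,600,000 ms) larger than the key for midnight UTC on the same calendar date.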
 ==== Offset
 The `offset` parameter is used to change the start value of each bucket by the
 specified positive (`+`) or negative (`-`) offset duration, such as `1h` for
 an hour, or `1M` for a month. See <<time-units>> for more possible time
 duration options.
 For instance, when using an interval of `day`, each bucket runs from midnight
 to midnight.  Setting the `offset` parameter to `+6h` would change each bucket
 to run from 6am to 6am:
 [source,js]
 -----------------------------
 PUT my_index/log/1
 {
  "date": "2015-10-01T05:30:00Z"
 }
 PUT my_index/log/2
 {
  "date": "2015-10-01T06:30:00Z"
 }
 GET my_index/_search?size=0
 {
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field":     "date",
        "interval":  "day",
        "offset":    "+6h"
      }
    }
  }
 }
 -----------------------------
 Instead of a single bucket starting at midnight, the above request groups the
 documents into buckets starting at 6am:
 [source,js]
 -----------------------------
 "aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-09-30T06:00:00.000Z",
        "key": 1443592800000,
        "doc_count": 1
      },
      {
        "key_as_string": "2015-10-01T06:00:00.000Z",
        "key": 1443679200000,
        "doc_count": 1
      }
    ]
  }
 }
 -----------------------------
 NOTE: The start `offset` of each bucket is calculated after the `time_zone`
 adjustments have been made.
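The effect of `offset` on day buckets can be modeled as "shift back, round down, shift forward". The following is a minimal Python sketch of that idea in UTC only; the helper name `offset_day_bucket_key` is hypothetical, and the approach is an illustration rather than the Elasticsearch implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def offset_day_bucket_key(iso_utc: str, offset: timedelta) -> int:
    """Return the epoch-millis key of the offset day bucket for a timestamp.

    Hypothetical helper: buckets run from midnight + offset to the next
    midnight + offset, all in UTC.
    """
    instant = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Shift back by the offset so that bucket starts align with midnight.
    shifted = instant - offset
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    # Shift forward again to get the real bucket start.
    start = midnight + offset
    return (start - EPOCH) // timedelta(milliseconds=1)


six_hours = timedelta(hours=6)
print(offset_day_bucket_key("2015-10-01T05:30:00Z", six_hours))  # 1443592800000
print(offset_day_bucket_key("2015-10-01T06:30:00Z", six_hours))  # 1443679200000
```

The two results correspond to `2015-09-30T06:00:00.000Z` and `2015-10-01T06:00:00.000Z`, the bucket keys shown in the response above.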
 ==== Scripts
 Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
 value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
 settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first