---
id: granularities
title: "Query granularities"
sidebar_label: "Granularities"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native language. For information about time functions available in SQL, refer to the
[SQL documentation](sql-scalar.md#date-and-time-functions).
:::

Granularity determines how Druid buckets data across the time dimension, for example by minute, hour, or day.

For example, use time granularities in [native queries](querying.md) to bucket results by time, and in the [`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec) section of the `dataSchema` in ingestion specifications to segment incoming data.

You can specify a time period as a [simple](#simple-granularities) string, as a [duration](#duration-granularities) in milliseconds, or as an arbitrary ISO 8601 [period](#period-granularities).

### Simple Granularities

Simple granularities are specified as a string and bucket timestamps by their UTC time (e.g., days start at 00:00 UTC).

Druid supports the following granularity strings:

- `all`
- `none`
- `second`
- `minute`
- `five_minute`
- `ten_minute`
- `fifteen_minute`
- `thirty_minute`
- `hour`
- `six_hour`
- `eight_hour`
- `day`
- `week`*
- `month`
- `quarter`
- `year`

The minimum and maximum granularities are `none` and `all`, described as follows:

* `all` buckets everything into a single bucket.
* `none` does not mean zero bucketing. It buckets data to millisecond granularity, which is the granularity of the internal index. You can think of `none` as equivalent to `millisecond`.

:::info
Do not use `none` in a [timeseries query](../querying/timeseriesquery.md); Druid fills empty interior time buckets with zeroes, meaning the output will contain results for every single millisecond in the requested interval.
:::

*Avoid using the `week` granularity for partitioning at ingestion time, because weeks don't align neatly with months and years, making it difficult to partition by coarser granularities later.

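To get a feel for why `none` is a poor fit for timeseries queries, the following Python sketch counts how many millisecond buckets a query interval contains. It is only an illustration of the arithmetic: the helper name `millisecond_buckets` is hypothetical, and it assumes that every millisecond in the interval would produce one zero-filled result row.

```python
from datetime import datetime, timezone


def millisecond_buckets(start: datetime, end: datetime) -> int:
    """Return the number of 1 ms buckets in the half-open interval [start, end)."""
    delta = end - start
    # timedelta stores days, seconds, and microseconds separately.
    return delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000


# A one-day interval already contains 86.4 million millisecond buckets.
start = datetime(2013, 8, 31, tzinfo=timezone.utc)
end = datetime(2013, 9, 1, tzinfo=timezone.utc)
print(millisecond_buckets(start, end))  # 86400000
```
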
#### Example:

Suppose you have the following data stored in Apache Druid with millisecond ingestion granularity:

``` json
{"timestamp": "2013-08-31T01:02:33Z", "page": "AAA", "language" : "en"}
{"timestamp": "2013-09-01T01:02:33Z", "page": "BBB", "language" : "en"}
{"timestamp": "2013-09-02T23:32:45Z", "page": "CCC", "language" : "en"}
{"timestamp": "2013-09-03T03:32:45Z", "page": "DDD", "language" : "en"}
```

Submit a groupBy query with `hour` granularity:

``` json
{
  "queryType":"groupBy",
  "dataSource":"my_dataSource",
  "granularity":"hour",
  "dimensions":[
    "language"
  ],
  "aggregations":[
    {
      "type":"count",
      "name":"count"
    }
  ],
  "intervals":[
    "2000-01-01T00:00Z/3000-01-01T00:00Z"
  ]
}
```

Druid returns one row for each hourly bucket that contains data:

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-31T01:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T01:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T23:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-03T03:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

Note that all the empty buckets are discarded.

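The hourly bucketing in this example can be approximated outside Druid with a few lines of Python. This is a rough sketch, not Druid's implementation: `parse_utc` and `floor_to_hour` are hypothetical helpers, and the counter only tracks buckets that actually receive a row, which mirrors how the empty buckets are absent from the result.

```python
from collections import Counter
from datetime import datetime, timezone

# Timestamps of the four sample rows from the example data.
ROWS = [
    "2013-08-31T01:02:33Z",
    "2013-09-01T01:02:33Z",
    "2013-09-02T23:32:45Z",
    "2013-09-03T03:32:45Z",
]


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2013-08-31T01:02:33Z as an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def floor_to_hour(t: datetime) -> datetime:
    """Truncate a datetime to the start of its UTC hour."""
    return t.replace(minute=0, second=0, microsecond=0)


# Count rows per hourly bucket; buckets without rows never appear.
counts = Counter(floor_to_hour(parse_utc(ts)) for ts in ROWS)
for bucket in sorted(counts):
    print(bucket.isoformat(), counts[bucket])
```

Swapping `floor_to_hour` for a function that also zeroes the hour would give the `day` buckets.
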
If you change the granularity to `day`, you will get

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-31T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-03T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

If you change the granularity to `none`, you will get the same results as setting it to the ingestion granularity.

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-31T01:02:33.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T01:02:33.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T23:32:45.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-03T03:32:45.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

A query-time `granularity` that is finer than the `queryGranularity` parameter set at
[ingestion time](../ingestion/ingestion-spec.md#granularityspec) cannot reveal any additional detail, because information about that
finer granularity is not present in the indexed data. So, if the query-time granularity is finer than the ingestion-time
query granularity, Druid produces results that are equivalent to having set `granularity` to `queryGranularity`.

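The reason a finer query-time granularity adds nothing can be shown with a small Python sketch. It assumes, purely for illustration, that timestamps were truncated to the hour at ingestion; the `floor_to` helper is hypothetical and is not part of any Druid API.

```python
from datetime import datetime, timezone


def floor_to(t: datetime, unit: str) -> datetime:
    """Truncate a datetime to the start of its minute or hour."""
    if unit == "minute":
        return t.replace(second=0, microsecond=0)
    if unit == "hour":
        return t.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


event = datetime(2013, 8, 31, 1, 2, 33, tzinfo=timezone.utc)

# Assumed ingestion-time queryGranularity of one hour: only the truncated value is stored.
stored = floor_to(event, "hour")

# Querying the stored value by minute or by hour yields the same bucket,
# because the minute-level detail was discarded at ingestion.
print(floor_to(stored, "minute") == floor_to(stored, "hour"))  # True
```
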
If you change the granularity to `all`, Druid aggregates everything into a single bucket:

``` json
[ {
  "version" : "v1",
  "timestamp" : "2000-01-01T00:00:00.000Z",
  "event" : {
    "count" : 4,
    "language" : "en"
  }
} ]
```

### Duration Granularities

Duration granularities are specified as an exact duration in milliseconds, and timestamps are returned in UTC.

They also support an optional origin, which defines where to start counting time buckets from (defaults to 1970-01-01T00:00:00Z).

```javascript
{"type": "duration", "duration": 7200000}
```

This buckets data into 2-hour chunks.

```javascript
{"type": "duration", "duration": 3600000, "origin": "2012-01-01T00:30:00Z"}
```

This buckets data into 1-hour chunks that start on the half-hour.

#### Example:

Reusing the data from the previous example, submit a groupBy query with a 24-hour duration:

``` json
{
  "queryType":"groupBy",
  "dataSource":"my_dataSource",
  "granularity":{"type": "duration", "duration": 86400000},
  "dimensions":[
    "language"
  ],
  "aggregations":[
    {
      "type":"count",
      "name":"count"
    }
  ],
  "intervals":[
    "2000-01-01T00:00Z/3000-01-01T00:00Z"
  ]
}
```

Druid returns the following results:

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-31T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-03T00:00:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

If you set the origin for the granularity to `2012-01-01T00:30:00Z`,

``` javascript
"granularity":{"type": "duration", "duration": 86400000, "origin":"2012-01-01T00:30:00Z"}
```

you will get

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-31T00:30:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T00:30:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T00:30:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-03T00:30:00.000Z",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

Note that the timestamp for each bucket starts at the 30th minute.

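One way to reason about duration buckets is as fixed-size steps counted from the origin. The Python sketch below illustrates that idea under the assumption that a bucket start is the origin plus a whole number of durations; `duration_bucket` is a hypothetical helper, not Druid's actual code.

```python
from datetime import datetime, timedelta, timezone

# Default origin described above: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def duration_bucket(t: datetime, duration_ms: int, origin: datetime = EPOCH) -> datetime:
    """Return the start of the fixed-size bucket that contains t."""
    step = timedelta(milliseconds=duration_ms)
    # Floor division of two timedeltas gives the number of whole steps since the origin.
    steps = (t - origin) // step
    return origin + steps * step


row = datetime(2013, 8, 31, 1, 2, 33, tzinfo=timezone.utc)
origin = datetime(2012, 1, 1, 0, 30, tzinfo=timezone.utc)

print(duration_bucket(row, 86_400_000).isoformat())          # 2013-08-31T00:00:00+00:00
print(duration_bucket(row, 86_400_000, origin).isoformat())  # 2013-08-31T00:30:00+00:00
```

Because the step is an exact number of milliseconds, this kind of bucketing ignores calendar concepts such as months or daylight saving time.
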
### Period Granularities

Period granularities are specified as arbitrary period combinations of years, months, weeks, days, hours, minutes, and seconds (e.g. P2W, P3M, PT1H30M, PT0.750S) in [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) format. They support specifying a time zone, which determines where period boundaries start as well as the time zone of the returned timestamps. By default, years start on the first of January, months start on the first of the month, and weeks start on Mondays unless an origin is specified.

Time zone is optional (defaults to UTC). Origin is optional (defaults to 1970-01-01T00:00:00 in the given time zone).

```javascript
{"type": "period", "period": "P2D", "timeZone": "America/Los_Angeles"}
```

This buckets data into two-day chunks in the Pacific time zone.

```javascript
{"type": "period", "period": "P3M", "timeZone": "America/Los_Angeles", "origin": "2012-02-01T00:00:00-08:00"}
```

This buckets data into three-month chunks in the Pacific time zone, with the three-month quarters defined as starting from February.

#### Example

Reusing the data from the previous example, submit a groupBy query with a 1-day period in the Pacific time zone:

``` json
{
  "queryType":"groupBy",
  "dataSource":"my_dataSource",
  "granularity":{"type": "period", "period": "P1D", "timeZone": "America/Los_Angeles"},
  "dimensions":[
    "language"
  ],
  "aggregations":[
    {
      "type":"count",
      "name":"count"
    }
  ],
  "intervals":[
    "1999-12-31T16:00:00.000-08:00/2999-12-31T16:00:00.000-08:00"
  ]
}
```

Druid returns the following results:

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-30T00:00:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-08-31T00:00:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T00:00:00.000-07:00",
  "event" : {
    "count" : 2,
    "language" : "en"
  }
} ]
```

Note that the timestamp for each bucket has been converted to Pacific time. Rows `{"timestamp": "2013-09-02T23:32:45Z", "page": "CCC", "language" : "en"}` and
`{"timestamp": "2013-09-03T03:32:45Z", "page": "DDD", "language" : "en"}` are put in the same bucket because they fall on the same day in Pacific time.

Also note that Druid does not convert the `intervals` in the groupBy query to the specified time zone. The time zone specified in the granularity applies only to the
query results.

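The effect of the time zone on day boundaries can be reproduced with a short Python sketch. To stay self-contained it uses a fixed `-07:00` offset, which is an assumption that only holds because Pacific Daylight Time was in effect for all four sample rows; a real implementation would need a full time zone database. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed -07:00 offset (Pacific Daylight Time) for the sample dates.
PDT = timezone(timedelta(hours=-7))

ROWS = [
    "2013-08-31T01:02:33Z",
    "2013-09-01T01:02:33Z",
    "2013-09-02T23:32:45Z",
    "2013-09-03T03:32:45Z",
]


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2013-08-31T01:02:33Z as an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def local_day_bucket(ts: str, tz: timezone) -> datetime:
    """Convert to the given zone, then truncate to local midnight."""
    local = parse_utc(ts).astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


counts = Counter(local_day_bucket(ts, PDT) for ts in ROWS)
for bucket in sorted(counts):
    print(bucket.isoformat(), counts[bucket])
```

The last two rows land in the same local day, 2013-09-02, even though their UTC dates differ.
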
If you set the origin for the granularity to `1970-01-01T20:30:00-08:00`,

``` javascript
"granularity":{"type": "period", "period": "P1D", "timeZone": "America/Los_Angeles", "origin": "1970-01-01T20:30:00-08:00"}
```

you will get

``` json
[ {
  "version" : "v1",
  "timestamp" : "2013-08-29T20:30:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-08-30T20:30:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-01T20:30:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2013-09-02T20:30:00.000-07:00",
  "event" : {
    "count" : 1,
    "language" : "en"
  }
} ]
```

Note that the `origin` you specify has nothing to do with the time zone. It only serves as a starting point for locating the very first granularity bucket.
In this case, rows `{"timestamp": "2013-09-02T23:32:45Z", "page": "CCC", "language" : "en"}` and `{"timestamp": "2013-09-03T03:32:45Z", "page": "DDD", "language" : "en"}`
are not in the same bucket.

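A simplified way to picture a one-day period with this origin is a day that starts at 20:30 local time instead of midnight. The Python sketch below makes several assumptions that are not from Druid itself: it handles only a one-day period, it uses a fixed `-07:00` offset in place of a real time zone database, and `daily_bucket_with_origin` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: fixed -07:00 offset (Pacific Daylight Time) for the sample dates.
PDT = timezone(timedelta(hours=-7))


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2013-08-31T01:02:33Z as an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def daily_bucket_with_origin(ts: str, tz: timezone, day_start: time) -> datetime:
    """Return the start of the one-day bucket whose boundary is day_start local time."""
    local = parse_utc(ts).astimezone(tz)
    start = local.replace(hour=day_start.hour, minute=day_start.minute,
                          second=0, microsecond=0)
    # Before today's boundary, the row belongs to the bucket that began yesterday.
    if local < start:
        start -= timedelta(days=1)
    return start


for ts in ("2013-09-02T23:32:45Z", "2013-09-03T03:32:45Z"):
    print(daily_bucket_with_origin(ts, PDT, time(20, 30)).isoformat())
```

The two rows that shared a bucket with the default origin now fall on opposite sides of the 20:30 boundary.
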
#### Supported Time Zones

Time zone support is provided by the [Joda Time library](http://www.joda.org), which uses the standard IANA time zones. For the full list, see the [Joda Time supported time zones](http://joda-time.sourceforge.net/timezones.html).