mirror of https://github.com/apache/druid.git
Advise against using WEEK granularity for Native Batch and MSQ (#14341)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit is contained in:
parent 2086ff88bc
commit 70c06fc0e1
@@ -317,7 +317,7 @@ A `granularitySpec` can have the following components:

 | Field | Description | Default |
 |-------|-------------|---------|
 | type |`uniform`| `uniform` |
-| segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk, which can optionally be further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity. | `day` |
+| segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk, which can optionally be further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br /><br />Avoid `WEEK` granularity for data partitioning because weeks don't align neatly with months and years, making it difficult to change partitioning by coarser granularity. Instead, opt for other partitioning options such as `DAY` or `MONTH`, which offer more flexibility. | `day` |
 | queryGranularity | The resolution of timestamp storage within each segment. This must be equal to or finer than `segmentGranularity`. This is the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. For example, a value of `minute` means that records are stored at minutely granularity and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc.).<br /><br />Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` is applied if it is set even when the `queryGranularity` is set to `none`. | `none` |
 | rollup | Whether to use ingestion-time [rollup](./rollup.md) or not. Note that rollup is still effective even when `queryGranularity` is set to `none`. Your data is rolled up if rows have exactly the same timestamp. | `true` |
 | intervals | A list of intervals defining time chunks for segments. Specify interval values using ISO8601 format. For example, `["2021-12-06T21:27:10+00:00/2021-12-07T00:00:00+00:00"]`. If you omit the time, the time defaults to "00:00:00".<br /><br />Druid breaks the list up and rounds off the list values based on the `segmentGranularity`.<br /><br />If `null` or not provided, batch ingestion tasks generally determine which time chunks to output based on the timestamps found in the input data.<br /><br />If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks throw away any records with timestamps outside of the specified intervals.<br /><br />Ignored for any form of streaming ingestion. | `null` |
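For context, the table above documents the `granularitySpec` block that nests under `dataSchema` in a native batch ingestion spec. A minimal sketch that follows the new advice, choosing `month` rather than `week`; all values below are illustrative rather than taken from the commit:

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "month",
    "queryGranularity": "none",
    "rollup": true,
    "intervals": ["2021-12-01T00:00:00+00:00/2022-01-01T00:00:00+00:00"]
  }
}
```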
@@ -192,12 +192,13 @@ The following ISO 8601 periods are supported for `TIME_FLOOR` and the string con

 - PT1H
 - PT6H
 - P1D
-- P1W
+- P1W*
 - P1M
 - P3M
 - P1Y

-For more information about partitioning, see [Partitioning](concepts.md#partitioning-by-time).
+For more information about partitioning, see [Partitioning](concepts.md#partitioning-by-time). <br /><br />
+*Avoid partitioning by week, `P1W`, because weeks don't align neatly with months and years, making it difficult to partition by coarser granularities later.

 ### `CLUSTERED BY`
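On the MSQ side, the new footnote steers users toward coarser periods. A hypothetical statement using the supported `P1M` period with `TIME_FLOOR`; the table name, input source, and columns are made up for illustration:

```sql
-- Hypothetical MSQ ingestion; the table name and input source are illustrative.
REPLACE INTO "wiki_edits" OVERWRITE ALL
SELECT
  TIME_PARSE("timestamp") AS __time,
  "page"
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://example.com/edits.json"]}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"page","type":"string"}]'
  )
)
-- Partition by month (P1M) rather than the discouraged P1W.
PARTITIONED BY TIME_FLOOR(__time, 'P1M')
```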
@@ -51,7 +51,7 @@ Druid supports the following granularity strings:

 - `six_hour`
 - `eight_hour`
 - `day`
-- `week`
+- `week`*
 - `month`
 - `quarter`
 - `year`
@@ -61,6 +61,8 @@ The minimum and maximum granularities are `none` and `all`, described as follows

 * `none` does not mean zero bucketing. It buckets data to millisecond granularity—the granularity of the internal index. You can think of `none` as equivalent to `millisecond`.
 > Do not use `none` in a [timeseries query](../querying/timeseriesquery.md); Druid fills empty interior time buckets with zeroes, meaning the output will contain results for every single millisecond in the requested interval.

+*Avoid using the `week` granularity for partitioning at ingestion time, because weeks don't align neatly with months and years, making it difficult to partition by coarser granularities later.
+
 #### Example:

 Suppose you have data below stored in Apache Druid with millisecond ingestion granularity,
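As a usage note, these granularity strings appear in the `granularity` field of native queries, where the week caveat above does not apply to querying itself. A sketch of a timeseries query bucketing by `month`; the datasource and aggregator names are hypothetical:

```json
{
  "queryType": "timeseries",
  "dataSource": "wiki_edits",
  "intervals": ["2021-01-01/2022-01-01"],
  "granularity": "month",
  "aggregations": [{ "type": "count", "name": "edits" }]
}
```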