mirror of https://github.com/apache/druid.git
Add a note to the documentation about pre-built HLLSketches (#13088)
* add a note to the documentation about pre-built HLLSketches

Druid actually supports ingesting a pre-generated sketch column by using the HLLSketchMerge aggregator. However, this functionality was previously not made clear in the documentation.

* copyedit from the King's English to American English
* add suggested style changes

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
parent c8f4d72fb1
commit 0d7bf66578
@@ -59,6 +59,9 @@ druid.extensions.loadList=["druid-datasketches"]
}
```

The `HLLSketchBuild` aggregator builds an HLL sketch object from the specified input column. When used during ingestion, Druid stores pre-generated HLL sketch objects in the datasource instead of the raw data from the input column.
When applied at query time on an existing dimension, the resulting column can be used as an intermediate dimension by the [post-aggregators](#post-aggregators).
> It is very common to use `HLLSketchBuild` in combination with [rollup](../../ingestion/rollup.md) to create a [metric](../../ingestion/ingestion-spec.html#metricsspec) on high-cardinality columns. In this example, a metric called `userid_hll` is included in the `metricsSpec`. This builds an HLL sketch on the `userid` field at ingestion time, which allows fast approximate `COUNT DISTINCT` queries and improves rollup ratios when `userid` is left out of the `dimensionsSpec`.
>
> ```
@@ -89,6 +92,8 @@ druid.extensions.loadList=["druid-datasketches"]
}
```

You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `metricsSpec`.
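
As a rough sketch of the `metricsSpec` entry this note describes, the fragment below assumes the input rows carry the Base64-encoded sketch in a column named `userid_hll_base64` and that the resulting metric is called `userid_hll`. Both names are hypothetical and chosen only for illustration. The `lgK` and `tgtHllType` properties are optional and are shown here at their default values.

```json
{
  "metricsSpec": [
    {
      "type": "HLLSketchMerge",
      "name": "userid_hll",
      "fieldName": "userid_hll_base64",
      "lgK": 12,
      "tgtHllType": "HLL_4"
    }
  ]
}
```

With an entry along these lines, the raw values behind the sketch never need to reach Druid; only the serialized sketch column is read at ingestion time.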
### Post Aggregators
#### Estimate