---
layout: doc_page
title: "T-Digest Quantiles Sketch module"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

# T-Digest Quantiles Sketch module
This module provides Apache Druid (incubating) approximate sketch aggregators based on T-Digest.
T-Digest (https://github.com/tdunning/t-digest) is a popular data structure for accurate on-line accumulation of
rank-based statistics such as quantiles and trimmed means.
The data structure is also designed for parallel programming use cases such as distributed aggregations or map-reduce jobs, because combining two intermediate t-digests is easy and efficient.

This module offers two T-Digest sketch aggregators and one post aggregator in Apache Druid (incubating):

1. buildTDigestSketch - used for building T-Digest sketches from raw numeric values. It generally makes sense to
use this aggregator when ingesting raw data into Druid. It can also be used at query time to
generate sketches, but the sketches are then rebuilt on every query execution instead of being built
once during ingestion.
2. mergeTDigestSketch - used for merging pre-built T-Digest sketches. This aggregator is generally used at
query time to combine sketches generated by the buildTDigestSketch aggregator.
3. quantilesFromTDigestSketch - used for generating quantiles from T-Digest sketches. This post aggregator is generally used
at query time to generate quantiles from sketches built using the above two sketch-generating aggregators.

To use these aggregators, make sure you [include](../../operations/including-extensions.html) the extension in your config file:

```
druid.extensions.loadList=["druid-tdigestsketch"]
```
### Aggregator

The result of the aggregation is a T-Digest sketch that is built by ingesting numeric values from the raw data.

```json
{
  "type" : "buildTDigestSketch",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "compression": <parameter that controls size and accuracy>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "buildTDigestSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field containing raw numeric values.|yes|
|compression|Parameter that determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|no, defaults to 100|

Example:
```json
{
  "type": "buildTDigestSketch",
  "name": "sketch",
  "fieldName": "session_duration",
  "compression": 200
}
```
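As a sketch of the ingestion-time use described earlier, a `buildTDigestSketch` aggregator can be listed in the `metricsSpec` of an ingestion spec's `dataSchema`, so that each rolled-up row stores a sketch rather than raw values. The column names `session_duration` and `ingested_sketch` and the accompanying `count` metric are illustrative assumptions, and the rest of the ingestion spec is omitted. A column built this way is what a `mergeTDigestSketch` aggregator would later read through its `fieldName`.

```json
{
  "metricsSpec": [
    { "type": "count", "name": "count" },
    {
      "type": "buildTDigestSketch",
      "name": "ingested_sketch",
      "fieldName": "session_duration",
      "compression": 200
    }
  ]
}
```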
The result of the aggregation is a T-Digest sketch that is built by merging pre-built T-Digest sketches.

```json
{
  "type" : "mergeTDigestSketch",
  "name" : <output_name>,
  "fieldName" : <metric_name>,
  "compression": <parameter that controls size and accuracy>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "mergeTDigestSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field containing pre-built T-Digest sketches.|yes|
|compression|Parameter that determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|no, defaults to 100|

Example:
```json
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [{
    "type": "mergeTDigestSketch",
    "name": "merged_sketch",
    "fieldName": "ingested_sketch",
    "compression": 200
  }],
  "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
```
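Because sketches merge cheaply, the same aggregator can also produce one merged sketch per time bucket. The following is a minimal sketch, assuming the same hypothetical `test_datasource` and `ingested_sketch` column as in the example above, that uses a `timeseries` query with daily granularity. The output name `daily_sketch` is made up for illustration, and the query is an untested illustration rather than a reference.

```json
{
  "queryType": "timeseries",
  "dataSource": "test_datasource",
  "granularity": "day",
  "aggregations": [{
    "type": "mergeTDigestSketch",
    "name": "daily_sketch",
    "fieldName": "ingested_sketch",
    "compression": 200
  }],
  "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
```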
### Post Aggregators

#### Quantiles

This returns an array of quantiles corresponding to a given array of fractions.

```json
{
  "type" : "quantilesFromTDigestSketch",
  "name": <output name>,
  "field" : <post aggregator that refers to a TDigestSketch (fieldAccess or another post aggregator)>,
  "fractions" : <array of fractions>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "quantilesFromTDigestSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|field|A post aggregator that refers to a T-Digest sketch (fieldAccess or another post aggregator).|yes|
|fractions|Non-empty array of fractions between 0 and 1|yes|

Example:
```json
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [{
    "type": "mergeTDigestSketch",
    "name": "merged_sketch",
    "fieldName": "ingested_sketch",
    "compression": 200
  }],
  "postAggregations": [{
    "type": "quantilesFromTDigestSketch",
    "name": "quantiles",
    "fractions": [0, 0.5, 1],
    "field": {
      "type": "fieldAccess",
      "fieldName": "merged_sketch"
    }
  }],
  "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
```
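Sketches can also be built at query time directly from raw values, at the cost of rebuilding them on every query. The following is a minimal sketch, assuming `test_datasource` has a raw numeric column named `session_duration` (a hypothetical name reused from the aggregator example). The output name `query_time_sketch` and the chosen fractions are arbitrary.

```json
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [{
    "type": "buildTDigestSketch",
    "name": "query_time_sketch",
    "fieldName": "session_duration",
    "compression": 200
  }],
  "postAggregations": [{
    "type": "quantilesFromTDigestSketch",
    "name": "quantiles",
    "fractions": [0.5, 0.95, 0.99],
    "field": {
      "type": "fieldAccess",
      "fieldName": "query_time_sketch"
    }
  }],
  "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
```

The `quantiles` output should then hold one value per requested fraction. If the same quantiles are queried repeatedly, building the sketch once at ingestion and merging it at query time avoids the repeated build cost.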