Fixing a few typos and style issues (#11883)

* grammar and format work
* light writing touchup

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

parent 3abca73ee8
commit 02b578a3dd
@@ -27,10 +27,11 @@ title: "Aggregations"
 > language. For information about aggregators available in SQL, refer to the
 > [SQL documentation](sql.md#aggregation-functions).
 
-Aggregations can be provided at ingestion time as part of the ingestion spec as a way of summarizing data before it enters Apache Druid.
-Aggregations can also be specified as part of many queries at query time.
+You can use aggregations:
+- in the ingestion spec during ingestion to summarize data before it enters Apache Druid.
+- at query time to summarize result data.
 
-Available aggregations are:
+The following sections list the available aggregate functions. Unless otherwise noted, aggregations are available at both ingestion and query time.
 
 ### Count aggregator
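To make the ingestion-time usage concrete, here is a minimal sketch of how aggregations appear under `metricsSpec` in an ingestion spec; the metric and field names are hypothetical:

```json
"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "longSum", "name": "bytesSum", "fieldName": "bytes" }
]
```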
@@ -49,7 +50,7 @@ query time.
 
 #### `longSum` aggregator
 
-computes the sum of values as a 64-bit, signed integer
+Computes the sum of values as a 64-bit, signed integer.
 
 ```json
 { "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -60,7 +61,7 @@ computes the sum of values as a 64-bit, signed integer
 
 #### `doubleSum` aggregator
 
-Computes and stores the sum of values as 64-bit floating point value. Similar to `longSum`
+Computes and stores the sum of values as a 64-bit floating point value. Similar to `longSum`.
 
 ```json
 { "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -68,7 +69,7 @@ Computes and stores the sum of values as 64-bit floating point value. Similar to
 
 #### `floatSum` aggregator
 
-Computes and stores the sum of values as 32-bit floating point value. Similar to `longSum` and `doubleSum`
+Computes and stores the sum of values as a 32-bit floating point value. Similar to `longSum` and `doubleSum`.
 
 ```json
 { "type" : "floatSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -78,7 +79,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `doubleMin` aggregator
 
-`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY
+`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY.
 
 ```json
 { "type" : "doubleMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -86,7 +87,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `doubleMax` aggregator
 
-`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY
+`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY.
 
 ```json
 { "type" : "doubleMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -94,7 +95,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `floatMin` aggregator
 
-`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY
+`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY.
 
 ```json
 { "type" : "floatMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -102,7 +103,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `floatMax` aggregator
 
-`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY
+`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY.
 
 ```json
 { "type" : "floatMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -110,7 +111,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `longMin` aggregator
 
-`longMin` computes the minimum of all metric values and Long.MAX_VALUE
+`longMin` computes the minimum of all metric values and Long.MAX_VALUE.
 
 ```json
 { "type" : "longMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -118,7 +119,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
 
 #### `longMax` aggregator
 
-`longMax` computes the maximum of all metric values and Long.MIN_VALUE
+`longMax` computes the maximum of all metric values and Long.MIN_VALUE.
 
 ```json
 { "type" : "longMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -136,13 +137,13 @@ To accomplish mean aggregation on ingestion, refer to the [Quantiles aggregator]
 
 ### First / Last aggregator
 
-(Double/Float/Long) First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
+(Double/Float/Long) Do not use First and Last aggregators in an ingestion spec. They are only supported for queries.
 
-Note that queries with first/last aggregators on a segment created with rollup enabled will return the rolled up value, and not the last value within the raw ingested data.
+Note that queries with first/last aggregators on a segment created with rollup enabled return the rolled up value, and not the last value within the raw ingested data.
 
 #### `doubleFirst` aggregator
 
-`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -154,7 +155,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `doubleLast` aggregator
 
-`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -166,7 +167,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `floatFirst` aggregator
 
-`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -178,7 +179,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `floatLast` aggregator
 
-`floatLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`floatLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -190,7 +191,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `longFirst` aggregator
 
-`longFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`longFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -202,7 +203,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `longLast` aggregator
 
-`longLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`longLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
 
 ```json
 {
@@ -214,7 +215,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `stringFirst` aggregator
 
-`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exist
+`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exists.
 
 ```json
 {
@@ -229,7 +230,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 #### `stringLast` aggregator
 
-`stringLast` computes the metric value with the maximum timestamp or `null` if no row exist
+`stringLast` computes the metric value with the maximum timestamp or `null` if no row exists.
 
 ```json
 {
@@ -248,7 +249,7 @@ Returns any value including null. This aggregator can simplify and optimize the
 
 #### `doubleAny` aggregator
 
-`doubleAny` returns any double metric value
+`doubleAny` returns any double metric value.
 
 ```json
 {
@@ -260,7 +261,7 @@ Returns any value including null. This aggregator can simplify and optimize the
 
 #### `floatAny` aggregator
 
-`floatAny` returns any float metric value
+`floatAny` returns any float metric value.
 
 ```json
 {
@@ -272,7 +273,7 @@ Returns any value including null. This aggregator can simplify and optimize the
 
 #### `longAny` aggregator
 
-`longAny` returns any long metric value
+`longAny` returns any long metric value.
 
 ```json
 {
@@ -284,7 +285,7 @@ Returns any value including null. This aggregator can simplify and optimize the
 
 #### `stringAny` aggregator
 
-`stringAny` returns any string metric value
+`stringAny` returns any string metric value.
 
 ```json
 {
@@ -434,8 +435,9 @@ This makes it possible to compute the results of a filtered and an unfiltered ag
 A grouping aggregator can only be used as part of GroupBy queries which have a subtotal spec. It returns a number for
 each output row that lets you infer whether a particular dimension is included in the sub-grouping used for that row. You can pass
 a *non-empty* list of dimensions to this aggregator which *must* be a subset of dimensions that you are grouping on.
-E.g if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals,
-following can be the possible output of the aggregator
+
+For example, if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals, the
+possible output of the aggregator is:
 
 | subtotal used in query | Output | (bits representation) |
 |------------------------|--------|-----------------------|
@@ -444,9 +446,8 @@ following can be the possible output of the aggregator
 | `["dim2"]` | 2 | (10) |
 | `[]` | 3 | (11) |
 
-As illustrated in above example, output number can be thought of as an unsigned n bit number where n is the number of dimensions passed to the aggregator.
-The bit at position X is set in this number to 0 if a dimension at position X in input to aggregators is included in the sub-grouping. Otherwise, this bit
-is set to 1.
+As the example illustrates, you can think of the output number as an unsigned _n_ bit number where _n_ is the number of dimensions passed to the aggregator.
+Druid sets the bit at position X for the number to 0 if the sub-grouping includes a dimension at position X in the aggregator input. Otherwise, Druid sets this bit to 1.
 
 ```json
 { "type" : "grouping", "name" : <output_name>, "groupings" : [<dimension>] }
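To make the grouping aggregator concrete, here is a hedged sketch of a groupBy query that pairs it with a subtotals spec; the datasource, interval, and dimension names are hypothetical:

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_data",
  "intervals": ["2021-01-01/2021-02-01"],
  "granularity": "all",
  "dimensions": ["dim1", "dim2"],
  "aggregations": [
    { "type": "grouping", "name": "whichSubtotal", "groupings": ["dim1", "dim2"] }
  ],
  "subtotalsSpec": [["dim1", "dim2"], ["dim1"], ["dim2"], []]
}
```

Each result row then carries `whichSubtotal`, whose value (0 through 3) identifies the subtotal grouping per the table above.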
@@ -32,7 +32,7 @@ The following JSON fields can be used in a query to operate on dimension values.
 
 ## DimensionSpec
 
-`DimensionSpec`s define how dimension values get transformed prior to aggregation.
+A `DimensionSpec` defines how to transform dimension values prior to aggregation.
 
 ### Default DimensionSpec
 
@@ -47,9 +47,9 @@ Returns dimension values as is and optionally renames the dimension.
 }
 ```
 
-When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. If left unspecified, the `outputType` defaults to STRING.
+When specifying a `DimensionSpec` on a numeric column, you should include the type of the column in the `outputType` field. The `outputType` defaults to STRING when not specified.
 
-Please refer to the [Output Types](#output-types) section for more details.
+See [Output Types](#output-types) for more details.
 
 ### Extraction DimensionSpec
 
@@ -65,32 +65,35 @@ Returns dimension values transformed using the given [extraction function](#extr
 }
 ```
 
-`outputType` may also be specified in an ExtractionDimensionSpec to apply type conversion to results before merging. If left unspecified, the `outputType` defaults to STRING.
+You can specify an `outputType` in an `ExtractionDimensionSpec` to apply type conversion to results before merging. The `outputType` defaults to STRING when not specified.
 
 Please refer to the [Output Types](#output-types) section for more details.
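For illustration, a sketch of an `ExtractionDimensionSpec` with an explicit `outputType`; the dimension name and the substring extraction function parameters are hypothetical:

```json
{
  "type": "extraction",
  "dimension": "statusCode",
  "outputName": "statusClass",
  "outputType": "STRING",
  "extractionFn": { "type": "substring", "index": 0, "length": 1 }
}
```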
 ### Filtered DimensionSpecs
 
-These are only useful for multi-value dimensions. If you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.md) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a [query filter](filters.md) for a value of "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
 
-It happens because "query filter" is internally used on the bitmaps and only used to match the row to be included in the query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which will match the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
-Then groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting 3 rows for "v1", "v2" and "v3" each.
+This happens because Druid uses the "query filter" internally on bitmaps to match the row to include in query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which matches the row with dimension value ["v1", "v2", "v3"].
+
+See the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
+
+Then the groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting 3 rows for "v1", "v2" and "v3" each.
 
-In addition to "query filter" which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimensionSpecs take a delegate DimensionSpec and a filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
+In addition to "query filter", which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimension specs take a delegate `DimensionSpec` and a filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
 
-The following filtered dimension spec acts as a whitelist or blacklist for values as per the "isWhitelist" attribute value.
+The following filtered dimension spec defines the values to include or exclude as per the `isWhitelist` attribute value.
 
 ```json
 { "type" : "listFiltered", "delegate" : <dimensionSpec>, "values": <array of strings>, "isWhitelist": <optional attribute for true/false, default is true> }
 ```
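For example, assuming a hypothetical multi-value dimension named `tags`, a concrete `listFiltered` spec that keeps only "v1" and "v3" might look like:

```json
{
  "type": "listFiltered",
  "delegate": { "type": "default", "dimension": "tags", "outputName": "filteredTags" },
  "values": ["v1", "v3"],
  "isWhitelist": true
}
```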
 
-Following filtered dimension spec retains only the values matching regex. Note that `listFiltered` is faster than this and one should use that for whitelist or blacklist use case.
+The following filtered dimension spec retains only the values matching a regex. You should use the `listFiltered` function for inclusion and exclusion use cases because it is faster.
 
 ```json
 { "type" : "regexFiltered", "delegate" : <dimensionSpec>, "pattern": <java regex pattern> }
 ```
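Continuing the hypothetical `tags` example, a `regexFiltered` spec that keeps only values matching `v[0-9]+` might look like:

```json
{
  "type": "regexFiltered",
  "delegate": { "type": "default", "dimension": "tags", "outputName": "filteredTags" },
  "pattern": "v[0-9]+"
}
```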
 
-Following filtered dimension spec retains only the values starting with the same prefix.
+The following filtered dimension spec retains only the values starting with the same prefix.
 
 ```json
 { "type" : "prefixFiltered", "delegate" : <dimensionSpec>, "prefix": <prefix string> }
@@ -102,8 +105,8 @@ For more details and examples, see [multi-value dimensions](multi-value-dimensio
 
 > Lookups are an [experimental](../development/experimental.md) feature.
 
-Lookup DimensionSpecs can be used to define directly a lookup implementation as dimension spec.
-Generally speaking there is two different kind of lookups implementations.
+You can use lookup dimension specs to define a lookup implementation as a dimension spec directly.
+Generally, there are two kinds of lookup implementations.
 The first kind is passed at the query time like `map` implementation.
 
 ```json
@@ -78,7 +78,9 @@ Enums
 FirehoseFactory
 FlattenSpec
 Float.NEGATIVE_INFINITY
+Float.NEGATIVE_INFINITY.
 Float.POSITIVE_INFINITY
+Float.POSITIVE_INFINITY.
 ForwardedRequestCustomizer
 GC
 GPG
@@ -137,7 +139,9 @@ LZ4
 LZO
 LimitSpec
 Long.MAX_VALUE
+Long.MAX_VALUE.
 Long.MIN_VALUE
+Long.MIN_VALUE.
 Lucene
 MapBD
 MapDB
@@ -1939,4 +1943,4 @@ PiB
 protobuf
 Golang
 multiValueHandling
-
+_n_