Fixing a few typos and style issues (#11883)

* grammar and format work

* light writing touchup

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit is contained in:
sthetland 2021-11-16 10:13:35 -08:00 committed by GitHub
parent 3abca73ee8
commit 02b578a3dd
3 changed files with 53 additions and 45 deletions


@ -27,10 +27,11 @@ title: "Aggregations"
> language. For information about aggregators available in SQL, refer to the
> [SQL documentation](sql.md#aggregation-functions).
Aggregations can be provided at ingestion time as part of the ingestion spec as a way of summarizing data before it enters Apache Druid.
Aggregations can also be specified as part of many queries at query time.
You can use aggregations:
- in the ingestion spec during ingestion to summarize data before it enters Apache Druid.
- at query time to summarize result data.
Available aggregations are:
The following sections list the available aggregate functions. Unless otherwise noted, aggregations are available at both ingestion and query time.
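For instance, here is a hedged sketch of how a `longSum` aggregation might appear in each context; the column and metric names (`bytes`, `bytes_total`) are hypothetical. At ingestion time, aggregators go in the `metricsSpec` of the ingestion spec:

```json
"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "longSum", "name": "bytes_total", "fieldName": "bytes" }
]
```

At query time, the same aggregation can be applied to the stored metric in a native query's `aggregations` list:

```json
"aggregations": [
  { "type": "longSum", "name": "bytes_total", "fieldName": "bytes_total" }
]
```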
### Count aggregator
@ -49,7 +50,7 @@ query time.
#### `longSum` aggregator
computes the sum of values as a 64-bit, signed integer
Computes the sum of values as a 64-bit, signed integer.
```json
{ "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
@ -60,7 +61,7 @@ computes the sum of values as a 64-bit, signed integer
#### `doubleSum` aggregator
Computes and stores the sum of values as 64-bit floating point value. Similar to `longSum`
Computes and stores the sum of values as a 64-bit floating point value. Similar to `longSum`.
```json
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
@ -68,7 +69,7 @@ Computes and stores the sum of values as 64-bit floating point value. Similar to
#### `floatSum` aggregator
Computes and stores the sum of values as 32-bit floating point value. Similar to `longSum` and `doubleSum`
Computes and stores the sum of values as a 32-bit floating point value. Similar to `longSum` and `doubleSum`.
```json
{ "type" : "floatSum", "name" : <output_name>, "fieldName" : <metric_name> }
@ -78,7 +79,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `doubleMin` aggregator
`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY
`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY.
```json
{ "type" : "doubleMin", "name" : <output_name>, "fieldName" : <metric_name> }
@ -86,7 +87,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `doubleMax` aggregator
`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY
`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY.
```json
{ "type" : "doubleMax", "name" : <output_name>, "fieldName" : <metric_name> }
@ -94,7 +95,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `floatMin` aggregator
`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY
`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY.
```json
{ "type" : "floatMin", "name" : <output_name>, "fieldName" : <metric_name> }
@ -102,7 +103,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `floatMax` aggregator
`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY
`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY.
```json
{ "type" : "floatMax", "name" : <output_name>, "fieldName" : <metric_name> }
@ -110,7 +111,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `longMin` aggregator
`longMin` computes the minimum of all metric values and Long.MAX_VALUE
`longMin` computes the minimum of all metric values and Long.MAX_VALUE.
```json
{ "type" : "longMin", "name" : <output_name>, "fieldName" : <metric_name> }
@ -118,7 +119,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to
#### `longMax` aggregator
`longMax` computes the maximum of all metric values and Long.MIN_VALUE
`longMax` computes the maximum of all metric values and Long.MIN_VALUE.
```json
{ "type" : "longMax", "name" : <output_name>, "fieldName" : <metric_name> }
@ -136,13 +137,13 @@ To accomplish mean aggregation on ingestion, refer to the [Quantiles aggregator]
### First / Last aggregator
(Double/Float/Long) First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
(Double/Float/Long) Do not use First and Last aggregators in an ingestion spec. They are only supported for queries.
Note that queries with first/last aggregators on a segment created with rollup enabled will return the rolled up value, and not the last value within the raw ingested data.
Note that queries with first/last aggregators on a segment created with rollup enabled return the rolled up value, and not the last value within the raw ingested data.
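For example, here is a hedged sketch of a `doubleLast` aggregation applied at query time in a native timeseries query; the datasource, interval, and column names are hypothetical:

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "intervals": [ "2021-01-01/2021-02-01" ],
  "aggregations": [
    { "type": "doubleLast", "name": "last_price", "fieldName": "price" }
  ]
}
```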
#### `doubleFirst` aggregator
`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -154,7 +155,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `doubleLast` aggregator
`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -166,7 +167,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `floatFirst` aggregator
`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -178,7 +179,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `floatLast` aggregator
`floatLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`floatLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -190,7 +191,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `longFirst` aggregator
`longFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`longFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -202,7 +203,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `longLast` aggregator
`longLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
`longLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.
```json
{
@ -214,7 +215,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `stringFirst` aggregator
`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exist
`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exists.
```json
{
@ -229,7 +230,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
#### `stringLast` aggregator
`stringLast` computes the metric value with the maximum timestamp or `null` if no row exist
`stringLast` computes the metric value with the maximum timestamp or `null` if no row exists.
```json
{
@ -248,7 +249,7 @@ Returns any value including null. This aggregator can simplify and optimize the
#### `doubleAny` aggregator
`doubleAny` returns any double metric value
`doubleAny` returns any double metric value.
```json
{
@ -260,7 +261,7 @@ Returns any value including null. This aggregator can simplify and optimize the
#### `floatAny` aggregator
`floatAny` returns any float metric value
`floatAny` returns any float metric value.
```json
{
@ -272,7 +273,7 @@ Returns any value including null. This aggregator can simplify and optimize the
#### `longAny` aggregator
`longAny` returns any long metric value
`longAny` returns any long metric value.
```json
{
@ -284,7 +285,7 @@ Returns any value including null. This aggregator can simplify and optimize the
#### `stringAny` aggregator
`stringAny` returns any string metric value
`stringAny` returns any string metric value.
```json
{
@ -434,8 +435,9 @@ This makes it possible to compute the results of a filtered and an unfiltered ag
A grouping aggregator can only be used as part of GroupBy queries which have a subtotal spec. It returns a number for
each output row that lets you infer whether a particular dimension is included in the sub-grouping used for that row. You can pass
a *non-empty* list of dimensions to this aggregator which *must* be a subset of dimensions that you are grouping on.
E.g if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals,
following can be the possible output of the aggregator
For example, if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals, the
possible output of the aggregator is:
| subtotal used in query | Output | (bits representation) |
|------------------------|--------|-----------------------|
@ -444,9 +446,8 @@ following can be the possible output of the aggregator
| `["dim2"]` | 2 | (10) |
| `[]` | 3 | (11) |
As illustrated in above example, output number can be thought of as an unsigned n bit number where n is the number of dimensions passed to the aggregator.
The bit at position X is set in this number to 0 if a dimension at position X in input to aggregators is included in the sub-grouping. Otherwise, this bit
is set to 1.
As the example illustrates, you can think of the output number as an unsigned _n_ bit number where _n_ is the number of dimensions passed to the aggregator.
Druid sets the bit at position X in this number to 0 if the sub-grouping includes the dimension at position X in the aggregator input. Otherwise, Druid sets this bit to 1.
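For example, here is a hedged sketch of a complete groupBy query that pairs the grouping aggregator with the subtotals from the example above; the datasource and interval are hypothetical. The generic form of the aggregator follows below.

```json
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "intervals": [ "2021-01-01/2021-02-01" ],
  "dimensions": [ "dim1", "dim2" ],
  "subtotalsSpec": [ ["dim1", "dim2"], ["dim1"], ["dim2"], [] ],
  "aggregations": [
    { "type": "grouping", "name": "group_id", "groupings": [ "dim1", "dim2" ] }
  ]
}
```

Each result row then carries a `group_id` value of 0, 1, 2, or 3, matching the table above.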
```json
{ "type" : "grouping", "name" : <output_name>, "groupings" : [<dimension>] }


@ -32,7 +32,7 @@ The following JSON fields can be used in a query to operate on dimension values.
## DimensionSpec
`DimensionSpec`s define how dimension values get transformed prior to aggregation.
A `DimensionSpec` defines how to transform dimension values prior to aggregation.
### Default DimensionSpec
@ -47,9 +47,9 @@ Returns dimension values as is and optionally renames the dimension.
}
```
When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. If left unspecified, the `outputType` defaults to STRING.
When specifying a `DimensionSpec` on a numeric column, you should include the type of the column in the `outputType` field. The `outputType` defaults to STRING when not specified.
Please refer to the [Output Types](#output-types) section for more details.
See [Output Types](#output-types) for more details.
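For example, here is a hedged sketch of a default dimension spec over a hypothetical numeric column `added`, returned as a LONG:

```json
{
  "type": "default",
  "dimension": "added",
  "outputName": "added",
  "outputType": "LONG"
}
```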
### Extraction DimensionSpec
@ -65,32 +65,35 @@ Returns dimension values transformed using the given [extraction function](#extr
}
```
`outputType` may also be specified in an ExtractionDimensionSpec to apply type conversion to results before merging. If left unspecified, the `outputType` defaults to STRING.
You can specify an `outputType` in an `ExtractionDimensionSpec` to apply type conversion to results before merging. The `outputType` defaults to STRING when not specified.
Please refer to the [Output Types](#output-types) section for more details.
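For example, here is a hedged sketch of an extraction dimension spec that pulls the hour of day out of `__time` with a `timeFormat` extraction function and converts the result to a LONG; the output name is hypothetical:

```json
{
  "type": "extraction",
  "dimension": "__time",
  "outputName": "hour_of_day",
  "outputType": "LONG",
  "extractionFn": { "type": "timeFormat", "format": "HH" }
}
```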
### Filtered DimensionSpecs
These are only useful for multi-value dimensions. If you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.md) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a [query filter](filters.md) for a value of "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
It happens because "query filter" is internally used on the bitmaps and only used to match the row to be included in the query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which will match the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
Then groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting 3 rows for "v1", "v2" and "v3" each.
This happens because Druid uses the "query filter" internally on bitmaps to match the row to include in query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which matches the row with dimension value ["v1", "v2", "v3"].
In addition to "query filter" which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimensionSpecs take a delegate DimensionSpec and a filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
See the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
The following filtered dimension spec acts as a whitelist or blacklist for values as per the "isWhitelist" attribute value.
Then the groupBy/topN processing pipeline "explodes" all multi-value dimensions, resulting in three rows: one each for "v1", "v2", and "v3".
In addition to "query filter", which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimension specs take a delegate `DimensionSpec` and filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
The following filtered dimension spec defines the values to include or exclude as per the `isWhitelist` attribute value.
```json
{ "type" : "listFiltered", "delegate" : <dimensionSpec>, "values": <array of strings>, "isWhitelist": <optional attribute for true/false, default is true> }
```
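For instance, here is a hedged sketch of a `listFiltered` spec that keeps only the value "v1" from a hypothetical multi-value dimension named `tags`, matching the scenario described above:

```json
{
  "type": "listFiltered",
  "delegate": { "type": "default", "dimension": "tags", "outputName": "tags" },
  "values": [ "v1" ],
  "isWhitelist": true
}
```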
Following filtered dimension spec retains only the values matching regex. Note that `listFiltered` is faster than this and one should use that for whitelist or blacklist use case.
The following filtered dimension spec retains only the values matching a regex. You should use the `listFiltered` function for inclusion and exclusion use cases because it is faster.
```json
{ "type" : "regexFiltered", "delegate" : <dimensionSpec>, "pattern": <java regex pattern> }
```
Following filtered dimension spec retains only the values starting with the same prefix.
The following filtered dimension spec retains only the values that start with the specified prefix.
```json
{ "type" : "prefixFiltered", "delegate" : <dimensionSpec>, "prefix": <prefix string> }
@ -102,8 +105,8 @@ For more details and examples, see [multi-value dimensions](multi-value-dimensio
> Lookups are an [experimental](../development/experimental.md) feature.
Lookup DimensionSpecs can be used to define directly a lookup implementation as dimension spec.
Generally speaking there is two different kind of lookups implementations.
You can use lookup dimension specs to define a lookup implementation as a dimension spec directly.
Generally, there are two kinds of lookup implementations.
The first kind is passed at query time, like the `map` implementation.
```json


@ -78,7 +78,9 @@ Enums
FirehoseFactory
FlattenSpec
Float.NEGATIVE_INFINITY
Float.NEGATIVE_INFINITY.
Float.POSITIVE_INFINITY
Float.POSITIVE_INFINITY.
ForwardedRequestCustomizer
GC
GPG
@ -137,7 +139,9 @@ LZ4
LZO
LimitSpec
Long.MAX_VALUE
Long.MAX_VALUE.
Long.MIN_VALUE
Long.MIN_VALUE.
Lucene
MapBD
MapDB
@ -1939,4 +1943,4 @@ PiB
protobuf
Golang
multiValueHandling
_n_