From 02b578a3dd0161366788ce5738fdadcdc528d89a Mon Sep 17 00:00:00 2001
From: sthetland
Date: Tue, 16 Nov 2021 10:13:35 -0800
Subject: [PATCH] Fixing a few typos and style issues (#11883)

* grammar and format work

* light writing touchup

Co-authored-by: Charles Smith
---
 docs/querying/aggregations.md   | 63 +++++++++++++++++----------------
 docs/querying/dimensionspecs.md | 29 ++++++++-------
 website/.spelling               |  6 +++-
 3 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/docs/querying/aggregations.md b/docs/querying/aggregations.md
index f5cf05d8056..cf592d9086b 100644
--- a/docs/querying/aggregations.md
+++ b/docs/querying/aggregations.md
@@ -27,10 +27,11 @@ title: "Aggregations"
 > language. For information about aggregators available in SQL, refer to the
 > [SQL documentation](sql.md#aggregation-functions).

-Aggregations can be provided at ingestion time as part of the ingestion spec as a way of summarizing data before it enters Apache Druid.
-Aggregations can also be specified as part of many queries at query time.
+You can use aggregations:
+- in the ingestion spec during ingestion to summarize data before it enters Apache Druid.
+- at query time to summarize result data.

-Available aggregations are:
+The following sections list the available aggregate functions. Unless otherwise noted, aggregations are available at both ingestion and query time.

 ### Count aggregator

@@ -49,7 +50,7 @@ query time.

 #### `longSum` aggregator

-computes the sum of values as a 64-bit, signed integer
+Computes the sum of values as a 64-bit, signed integer.

 ```json
 { "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -60,7 +61,7 @@

 #### `doubleSum` aggregator

-Computes and stores the sum of values as 64-bit floating point value. Similar to `longSum`
+Computes and stores the sum of values as a 64-bit floating point value. Similar to `longSum`.

 ```json
 { "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -68,7 +69,7 @@

 #### `floatSum` aggregator

-Computes and stores the sum of values as 32-bit floating point value. Similar to `longSum` and `doubleSum`
+Computes and stores the sum of values as a 32-bit floating point value. Similar to `longSum` and `doubleSum`.

 ```json
 { "type" : "floatSum", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -78,7 +79,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `doubleMin` aggregator

-`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY
+`doubleMin` computes the minimum of all metric values and Double.POSITIVE_INFINITY.

 ```json
 { "type" : "doubleMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -86,7 +87,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `doubleMax` aggregator

-`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY
+`doubleMax` computes the maximum of all metric values and Double.NEGATIVE_INFINITY.

 ```json
 { "type" : "doubleMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -94,7 +95,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `floatMin` aggregator

-`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY
+`floatMin` computes the minimum of all metric values and Float.POSITIVE_INFINITY.

 ```json
 { "type" : "floatMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -102,7 +103,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `floatMax` aggregator

-`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY
+`floatMax` computes the maximum of all metric values and Float.NEGATIVE_INFINITY.

 ```json
 { "type" : "floatMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -110,7 +111,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `longMin` aggregator

-`longMin` computes the minimum of all metric values and Long.MAX_VALUE
+`longMin` computes the minimum of all metric values and Long.MAX_VALUE.

 ```json
 { "type" : "longMin", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -118,7 +119,7 @@ Computes and stores the sum of values as 32-bit floating point value. Similar to

 #### `longMax` aggregator

-`longMax` computes the maximum of all metric values and Long.MIN_VALUE
+`longMax` computes the maximum of all metric values and Long.MIN_VALUE.

 ```json
 { "type" : "longMax", "name" : <output_name>, "fieldName" : <metric_name> }
@@ -136,13 +137,13 @@ To accomplish mean aggregation on ingestion, refer to the [Quantiles aggregator]

 ### First / Last aggregator

-(Double/Float/Long) First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
+(Double/Float/Long) Do not use First and Last aggregators in an ingestion spec. They are only supported for queries.

-Note that queries with first/last aggregators on a segment created with rollup enabled will return the rolled up value, and not the last value within the raw ingested data.
+Note that queries with first/last aggregators on a segment created with rollup enabled return the rolled up value, and not the last value within the raw ingested data.

 #### `doubleFirst` aggregator

-`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -154,7 +155,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `doubleLast` aggregator

-`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`doubleLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -166,7 +167,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `floatFirst` aggregator

-`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`floatFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -178,7 +179,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `floatLast` aggregator

-`floatLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`floatLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -190,7 +191,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `longFirst` aggregator

-`longFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`longFirst` computes the metric value with the minimum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -202,7 +203,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `longLast` aggregator

-`longLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist
+`longLast` computes the metric value with the maximum timestamp or 0 in default mode, or `null` in SQL-compatible mode if no row exists.

 ```json
 {
@@ -214,7 +215,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `stringFirst` aggregator

-`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exist
+`stringFirst` computes the metric value with the minimum timestamp or `null` if no row exists.

 ```json
 {
@@ -229,7 +230,7 @@ Note that queries with first/last aggregators on a segment created with rollup e

 #### `stringLast` aggregator

-`stringLast` computes the metric value with the maximum timestamp or `null` if no row exist
+`stringLast` computes the metric value with the maximum timestamp or `null` if no row exists.

 ```json
 {
@@ -248,7 +249,7 @@ Returns any value including null. This aggregator can simplify and optimize the

 #### `doubleAny` aggregator

-`doubleAny` returns any double metric value
+`doubleAny` returns any double metric value.

 ```json
 {
@@ -260,7 +261,7 @@ Returns any value including null. This aggregator can simplify and optimize the

 #### `floatAny` aggregator

-`floatAny` returns any float metric value
+`floatAny` returns any float metric value.

 ```json
 {
@@ -272,7 +273,7 @@ Returns any value including null. This aggregator can simplify and optimize the

 #### `longAny` aggregator

-`longAny` returns any long metric value
+`longAny` returns any long metric value.

 ```json
 {
@@ -284,7 +285,7 @@ Returns any value including null. This aggregator can simplify and optimize the

 #### `stringAny` aggregator

-`stringAny` returns any string metric value
+`stringAny` returns any string metric value.

 ```json
 {
@@ -434,8 +435,9 @@ This makes it possible to compute the results of a filtered and an unfiltered ag
 A grouping aggregator can only be used as part of GroupBy queries which have a subtotal spec. It returns a number for each output row that lets you infer whether a particular dimension is included in the sub-grouping used for that row. You can pass a *non-empty* list of dimensions to this aggregator which *must* be a subset of dimensions that you are grouping on.
-E.g if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals,
-following can be the possible output of the aggregator
+
+For example, if the aggregator has `["dim1", "dim2"]` as input dimensions and `[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals, the
+possible output of the aggregator is:

 | subtotal used in query | Output | (bits representation) |
 |------------------------|--------|-----------------------|
@@ -444,9 +446,8 @@ following can be the possible output of the aggregator
 | `["dim1", "dim2"]` | 0 | (00) |
 | `["dim1"]` | 1 | (01) |
 | `["dim2"]` | 2 | (10) |
 | `[]` | 3 | (11) |

-As illustrated in above example, output number can be thought of as an unsigned n bit number where n is the number of dimensions passed to the aggregator.
-The bit at position X is set in this number to 0 if a dimension at position X in input to aggregators is included in the sub-grouping. Otherwise, this bit
-is set to 1.
+As the example illustrates, you can think of the output number as an unsigned _n_ bit number where _n_ is the number of dimensions passed to the aggregator.
+Druid sets the bit at position X for the number to 0 if the sub-grouping includes a dimension at position X in the aggregator input. Otherwise, Druid sets this bit to 1.

 ```json
 { "type" : "grouping", "name" : , "groupings" : [] }
diff --git a/docs/querying/dimensionspecs.md b/docs/querying/dimensionspecs.md
index b2ad5f5599b..4b0cdb7a836 100644
--- a/docs/querying/dimensionspecs.md
+++ b/docs/querying/dimensionspecs.md
@@ -32,7 +32,7 @@ The following JSON fields can be used in a query to operate on dimension values.

 ## DimensionSpec

-`DimensionSpec`s define how dimension values get transformed prior to aggregation.
+A `DimensionSpec` defines how to transform dimension values prior to aggregation.

 ### Default DimensionSpec

@@ -47,9 +47,9 @@ Returns dimension values as is and optionally renames the dimension.
 }
 ```

-When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. If left unspecified, the `outputType` defaults to STRING.
+When specifying a `DimensionSpec` on a numeric column, you should include the type of the column in the `outputType` field. The `outputType` defaults to STRING when not specified.

-Please refer to the [Output Types](#output-types) section for more details.
+See [Output Types](#output-types) for more details.

 ### Extraction DimensionSpec

@@ -65,32 +65,35 @@ Returns dimension values transformed using the given [extraction function](#extr
 }
 ```

-`outputType` may also be specified in an ExtractionDimensionSpec to apply type conversion to results before merging. If left unspecified, the `outputType` defaults to STRING.
+You can specify an `outputType` in an `ExtractionDimensionSpec` to apply type conversion to results before merging. The `outputType` defaults to STRING when not specified.

 Please refer to the [Output Types](#output-types) section for more details.

 ### Filtered DimensionSpecs

-These are only useful for multi-value dimensions. If you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.md) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
+A filtered `DimensionSpec` is only useful for multi-value dimensions. Say you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a [query filter](filters.md) for a value of "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.

-It happens because "query filter" is internally used on the bitmaps and only used to match the row to be included in the query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which will match the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
-Then groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting 3 rows for "v1", "v2" and "v3" each.
+This happens because Druid uses the "query filter" internally on bitmaps to match the row to include in query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which matches the row with dimension value ["v1", "v2", "v3"].
+See the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
+Then the groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting in 3 rows for "v1", "v2" and "v3" each.

-In addition to "query filter" which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimensionSpecs take a delegate DimensionSpec and a filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
+In addition to "query filter", which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimension specs take a delegate `DimensionSpec` and filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.

-The following filtered dimension spec acts as a whitelist or blacklist for values as per the "isWhitelist" attribute value.
+The following filtered dimension spec defines the values to include or exclude as per the `isWhitelist` attribute value.

 ```json
 { "type" : "listFiltered", "delegate" : , "values": , "isWhitelist": }
 ```

-Following filtered dimension spec retains only the values matching regex. Note that `listFiltered` is faster than this and one should use that for whitelist or blacklist use case.
+The following filtered dimension spec retains only the values matching a regex. You should use the `listFiltered` function for inclusion and exclusion use cases because it is faster.

 ```json
 { "type" : "regexFiltered", "delegate" : , "pattern": }
 ```

-Following filtered dimension spec retains only the values starting with the same prefix.
+The following filtered dimension spec retains only the values starting with the same prefix.

 ```json
 { "type" : "prefixFiltered", "delegate" : , "prefix": }
 ```

@@ -102,8 +105,8 @@ For more details and examples, see [multi-value dimensions](multi-value-dimensio

 > Lookups are an [experimental](../development/experimental.md) feature.

-Lookup DimensionSpecs can be used to define directly a lookup implementation as dimension spec.
-Generally speaking there is two different kind of lookups implementations.
+You can use lookup dimension specs to define a lookup implementation as a dimension spec directly.
+Generally, there are two kinds of lookup implementations.
 The first kind is passed at the query time like `map` implementation.

 ```json
diff --git a/website/.spelling b/website/.spelling
index 9d474f7f6ee..01e9ff3cfa4 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -78,7 +79,9 @@ Enums
 FirehoseFactory
 FlattenSpec
 Float.NEGATIVE_INFINITY
+Float.NEGATIVE_INFINITY.
 Float.POSITIVE_INFINITY
+Float.POSITIVE_INFINITY.
 ForwardedRequestCustomizer
 GC
 GPG
@@ -137,7 +139,9 @@ LZ4
 LZO
 LimitSpec
 Long.MAX_VALUE
+Long.MAX_VALUE.
 Long.MIN_VALUE
+Long.MIN_VALUE.
 Lucene
 MapBD
 MapDB
@@ -1939,4 +1943,4 @@ PiB
 protobuf
 Golang
 multiValueHandling
-
+_n_
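To make the filtered dimension spec behavior documented in the patch above concrete, here is a minimal sketch of a native groupBy query that filters rows on the value "v1" and then uses `listFiltered` to keep only "v1" from the exploded multi-value dimension. The datasource name, dimension name, and interval are illustrative placeholders and are not part of the patch:

```json
{
  "queryType": "groupBy",
  "dataSource": "example_datasource",
  "intervals": ["2021-01-01/2021-02-01"],
  "granularity": "all",
  "filter": { "type": "selector", "dimension": "tags", "value": "v1" },
  "dimensions": [
    {
      "type": "listFiltered",
      "delegate": { "type": "default", "dimension": "tags", "outputName": "tags" },
      "values": ["v1"],
      "isWhitelist": true
    }
  ],
  "aggregations": [
    { "type": "count", "name": "count" }
  ]
}
```

Without the `listFiltered` wrapper, this query would return one group per value of the matched multi-value rows ("v1", "v2", and "v3"); with it, only the "v1" group remains in the result.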