diff --git a/docs/content/development/extensions-contrib/distinctcount.md b/docs/content/development/extensions-contrib/distinctcount.md
index a39236052f4..7cf67b5f869 100644
--- a/docs/content/development/extensions-contrib/distinctcount.md
+++ b/docs/content/development/extensions-contrib/distinctcount.md
@@ -28,8 +28,8 @@ To use this Apache Druid (incubating) extension, make sure to [include](../../op
 
 Additionally, follow these steps:
 
-(1) First, use a single dimension hash-based partition spec to partition data by a single dimension. For example visitor_id. This to make sure all rows with a particular value for that dimension will go into the same segment, or this might over count.
-(2) Second, use distinctCount to calculate the distinct count, make sure queryGranularity is divided exactly by segmentGranularity or else the result will be wrong.
+1. First, use a single dimension hash-based partition spec to partition data by a single dimension. For example visitor_id. This to make sure all rows with a particular value for that dimension will go into the same segment, or this might over count.
+2. Second, use distinctCount to calculate the distinct count, make sure queryGranularity is divided exactly by segmentGranularity or else the result will be wrong.
 
 There are some limitations, when used with groupBy, the groupBy keys' numbers should not exceed maxIntermediateRows in every segment. If exceeded the result will be wrong. When used with topN, numValuesPerPass should not be too big. If too big the distinctCount will use a lot of memory and might cause the JVM to go our of memory.
 
diff --git a/docs/content/development/extensions-contrib/influx.md b/docs/content/development/extensions-contrib/influx.md
index c5c071bba48..62e036be725 100644
--- a/docs/content/development/extensions-contrib/influx.md
+++ b/docs/content/development/extensions-contrib/influx.md
@@ -35,6 +35,7 @@ A typical line looks like this:
 ```cpu,application=dbhost=prdb123,region=us-east-1 usage_idle=99.24,usage_user=0.55 1520722030000000000```
 
 which contains four parts:
+
 - measurement: A string indicating the name of the measurement represented (e.g. cpu, network, web_requests)
 - tags: zero or more key-value pairs (i.e. dimensions)
 - measurements: one or more key-value pairs; values can be numeric, boolean, or string
@@ -43,6 +44,7 @@ which contains four parts:
 The parser extracts these fields into a map, giving the measurement the key `measurement` and the timestamp the key `_ts`. The tag and measurement keys are copied verbatim, so users should take care to avoid name collisions. It is up to the ingestion spec to decide which fields should be treated as dimensions and which should be treated as metrics (typically tags correspond to dimensions and measurements correspond to metrics).
 
 The parser is configured like so:
+
 ```json
 "parser": {
   "type": "string",
diff --git a/docs/content/development/extensions-contrib/materialized-view.md b/docs/content/development/extensions-contrib/materialized-view.md
index 95bfde9a4eb..963a9446c90 100644
--- a/docs/content/development/extensions-contrib/materialized-view.md
+++ b/docs/content/development/extensions-contrib/materialized-view.md
@@ -33,6 +33,7 @@ In materialized-view-maintenance, dataSouces user ingested are called "base-data
 The `derivativeDataSource` supervisor is used to keep the timeline of derived-dataSource consistent with base-dataSource. Each `derivativeDataSource` supervisor is responsible for one derived-dataSource.
 
 A sample derivativeDataSource supervisor spec is shown below:
+
 ```json
 {
   "type": "derivativeDataSource",
@@ -90,6 +91,7 @@ A sample derivativeDataSource supervisor spec is shown below:
 In materialized-view-selection, we implement a new query type `view`. When we request a view query, Druid will try its best to optimize the query based on query dataSource and intervals.
 
 A sample view query spec is shown below:
+
 ```json
 {
   "queryType": "view",
@@ -124,6 +126,7 @@ A sample view query spec is shown below:
   }
 }
 ```
+
 There are 2 parts in a view query:
 
 |Field|Description|Required|
diff --git a/docs/content/development/extensions-contrib/momentsketch-quantiles.md b/docs/content/development/extensions-contrib/momentsketch-quantiles.md
index 966caa2fb29..3eeadaf87cc 100644
--- a/docs/content/development/extensions-contrib/momentsketch-quantiles.md
+++ b/docs/content/development/extensions-contrib/momentsketch-quantiles.md
@@ -38,6 +38,7 @@ druid.extensions.loadList=["druid-momentsketch"]
 The result of the aggregation is a momentsketch that is the union of all sketches either built from raw data or read from the segments.
 
 The `momentSketch` aggregator operates over raw data while the `momentSketchMerge` aggregator should be used when aggregating pre-computed sketches.
+
 ```json
 {
   "type" : ,
@@ -59,6 +60,7 @@ The `momentSketch` aggregator operates over raw data while the `momentSketchMerg
 ### Post Aggregators
 
 Users can query for a set of quantiles using the `momentSketchSolveQuantiles` post-aggregator on the sketches created by the `momentSketch` or `momentSketchMerge` aggregators.
+
 ```json
 {
   "type" : "momentSketchSolveQuantiles",
@@ -69,6 +71,7 @@ Users can query for a set of quantiles using the `momentSketchSolveQuantiles` po
 ```
 
 Users can also query for the min/max of a distribution:
+
 ```json
 {
   "type" : "momentSketchMin" | "momentSketchMax",
@@ -79,6 +82,7 @@ Users can also query for the min/max of a distribution:
 
 ### Example
 As an example of a query with sketches pre-aggregated at ingestion time, one could set up the following aggregator at ingest:
+
 ```json
 {
   "type": "momentSketch",
@@ -88,7 +92,9 @@ As an example of a query with sketches pre-aggregated at ingestion time, one cou
   "compress": true,
 }
 ```
+
 and make queries using the following aggregator + post-aggregator:
+
 ```json
 {
   "aggregations": [{
diff --git a/docs/content/development/extensions-contrib/moving-average-query.md b/docs/content/development/extensions-contrib/moving-average-query.md
index 5fc72688236..7e028cc5a42 100644
--- a/docs/content/development/extensions-contrib/moving-average-query.md
+++ b/docs/content/development/extensions-contrib/moving-average-query.md
@@ -33,6 +33,7 @@ These Aggregate Window Functions consume standard Druid Aggregators and outputs
 Moving Average encapsulates the [groupBy query](../../querying/groupbyquery.html) (Or [timeseries](../../querying/timeseriesquery.html) in case of no dimensions) in order to rely on the maturity of these query types.
 
 It runs the query in two main phases:
+
 1. Runs an inner [groupBy](../../querying/groupbyquery.html) or [timeseries](../../querying/timeseriesquery.html) query to compute Aggregators (i.e. daily count of events).
 2. Passes over aggregated results in Broker, in order to compute Averagers (i.e. moving 7 day average of the daily count).
 
@@ -110,6 +111,7 @@ These are properties which are common to all Averagers:
 #### Standard averagers
 
 These averagers offer four functions:
+
 * Mean (Average)
 * MeanNoNulls (Ignores empty buckets).
 * Max
@@ -121,6 +123,7 @@ In that case, the first records will ignore missing buckets and average won't be
 However, this also means that empty days in a sparse dataset will also be ignored.
 
 Example of usage:
+
 ```json
 { "type" : "doubleMean", "name" : , "fieldName": }
 ```
@@ -130,6 +133,7 @@ This optional parameter is used to calculate over a single bucket within each cy
 A prime example would be weekly buckets, resulting in a Day of Week calculation. (Other examples: Month of year, Hour of day).
 
 I.e. when using these parameters:
+
 * *granularity*: period=P1D (daily)
 * *buckets*: 28
 * *cycleSize*: 7
@@ -146,6 +150,7 @@ All examples are based on the Wikipedia dataset provided in the Druid [tutorials
 Calculating a 7-buckets moving average for Wikipedia edit deltas.
 
 Query syntax:
+
 ```json
 {
   "queryType": "movingAverage",
@@ -176,6 +181,7 @@ Query syntax:
 ```
 
 Result:
+
 ```json
 [ {
   "version" : "v1",
@@ -217,6 +223,7 @@ Result:
 Calculating a 7-buckets moving average for Wikipedia edit deltas, plus a ratio between the current period and the moving average.
 
 Query syntax:
+
 ```json
 {
   "queryType": "movingAverage",
@@ -264,6 +271,7 @@ Query syntax:
 ```
 
 Result:
+
 ```json
 [ {
   "version" : "v1",
@@ -306,6 +314,7 @@ Result:
 Calculating an average of every first 10-minutes of the last 3 hours:
 
 Query syntax:
+
 ```json
 {
   "queryType": "movingAverage",
diff --git a/docs/content/development/extensions-contrib/tdigestsketch-quantiles.md b/docs/content/development/extensions-contrib/tdigestsketch-quantiles.md
index 9947e017697..3fc07cf2982 100644
--- a/docs/content/development/extensions-contrib/tdigestsketch-quantiles.md
+++ b/docs/content/development/extensions-contrib/tdigestsketch-quantiles.md
@@ -58,7 +58,9 @@ The result of the aggregation is a T-Digest sketch that is built ingesting numer
   "compression": 
 }
 ```
+
 Example:
+
 ```json
 {
   "type": "buildTDigestSketch",
@@ -95,6 +97,7 @@ The result of the aggregation is a T-Digest sketch that is built by merging pre-
 |compression|Parameter that determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|no, defaults to 100|
 
 Example:
+
 ```json
 {
   "queryType": "groupBy",
@@ -110,6 +113,7 @@ Example:
   "intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
 }
 ```
+
 ### Post Aggregators
 
 #### Quantiles
@@ -133,6 +137,7 @@ This returns an array of quantiles corresponding to a given array of fractions.
 |fractions|Non-empty array of fractions between 0 and 1|yes|
 
 Example:
+
 ```json
 {
   "queryType": "groupBy",
diff --git a/docs/content/development/extensions-core/druid-basic-security.md b/docs/content/development/extensions-core/druid-basic-security.md
index 4282f910db3..e067fdf4ed1 100644
--- a/docs/content/development/extensions-core/druid-basic-security.md
+++ b/docs/content/development/extensions-core/druid-basic-security.md
@@ -173,6 +173,7 @@ Return a list of all user names.
 Return the name and role information of the user with name {userName}
 
 Example output:
+
 ```json
 {
   "name": "druid2",
@@ -183,9 +184,11 @@ Example output:
 ```
 
 This API supports the following flags:
+
 - `?full`: The response will also include the full information for each role currently assigned to the user.
 
 Example output:
+
 ```json
 {
   "name": "druid2",
@@ -268,6 +271,7 @@ Return a list of all role names.
 Return name and permissions for the role named {roleName}.
 
 Example output:
+
 ```json
 {
   "name": "druidRole2",
@@ -299,6 +303,7 @@ This API supports the following flags:
 - `?simplifyPermissions`: The permissions in the output will contain only a list of `resourceAction` objects, without the extraneous `resourceNamePattern` field. The `users` field will be null when `?full` is not specified.
 
 Example output:
+
 ```json
 {
   "name": "druidRole2",
diff --git a/docs/content/development/extensions-core/druid-lookups.md b/docs/content/development/extensions-core/druid-lookups.md
index 53476eb8106..9f5798e6a43 100644
--- a/docs/content/development/extensions-core/druid-lookups.md
+++ b/docs/content/development/extensions-core/druid-lookups.md
@@ -75,6 +75,7 @@ Same for Loading cache, developer can implement a new type of loading cache by i
 ##### Example of Polling On-heap Lookup
 
 This example demonstrates a polling cache that will update its on-heap cache every 10 minutes
+
 ```json
 {
   "type":"pollingLookup",
diff --git a/docs/content/development/extensions-core/orc.md b/docs/content/development/extensions-core/orc.md
index af7a3151d84..791531d9378 100644
--- a/docs/content/development/extensions-core/orc.md
+++ b/docs/content/development/extensions-core/orc.md
@@ -269,6 +269,7 @@ This extension, first available in version 0.15.0, replaces the previous 'contri
 ingestion task is *incompatible*, and will need modified to work with the newer 'core' extension.
 
 To migrate to 0.15.0+:
+
 * In `inputSpec` of `ioConfig`, `inputFormat` must be changed from `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"` to
 `"org.apache.orc.mapreduce.OrcInputFormat"`
 * The 'contrib' extension supported a `typeString` property, which provided the schema of the
@@ -276,6 +277,7 @@ ORC file, of which was essentially required to have the types correct, but notab
 facilitated column renaming. In the 'core' extension, column renaming can be achieved with
 [`flattenSpec` expressions](../../ingestion/flatten-json.html). For example, `"typeString":"struct"`
 with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid schema would need replaced with:
+
 ```json
 "flattenSpec": {
   "fields": [
@@ -293,10 +295,12 @@ with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid sc
   ...
 }
 ```
+
 * The 'contrib' extension supported a `mapFieldNameFormat` property, which provided a way to specify a dimension to
 flatten `OrcMap` columns with primitive types. This functionality has also been replaced with
 [`flattenSpec` expressions](../../ingestion/flatten-json.html). For example: `"mapFieldNameFormat": "_"`
 for a dimension `nestedData_dim1`, to preserve Druid schema could be replaced with
+
 ```json
 "flattenSpec": {
   "fields": [
diff --git a/docs/content/querying/filters.md b/docs/content/querying/filters.md
index 2f9b23a19c9..53e0853d234 100644
--- a/docs/content/querying/filters.md
+++ b/docs/content/querying/filters.md
@@ -282,6 +282,7 @@ greater than, less than, greater than or equal to, less than or equal to, and "b
 Bound filters support the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
 
 The following bound filter expresses the condition `21 <= age <= 31`:
+
 ```json
 {
   "type": "bound",
@@ -293,6 +294,7 @@ The following bound filter expresses the condition `21 <= age <= 31`:
 ```
 
 This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order.
+
 ```json
 {
   "type": "bound",
@@ -303,6 +305,7 @@ This filter expresses the condition `foo <= name <= hoo`, using the default lexi
 ```
 
 Using strict bounds, this filter expresses the condition `21 < age < 31`
+
 ```json
 {
   "type": "bound",
@@ -316,6 +319,7 @@ Using strict bounds, this filter expresses the condition `21 < age < 31`
 ```
 
 The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`.
+
 ```json
 {
   "type": "bound",
@@ -327,6 +331,7 @@ The user can also specify a one-sided bound by omitting "upper" or "lower". This
 ```
 
 Likewise, this filter expresses `age >= 18`
+
 ```json
 {
   "type": "bound",
@@ -355,6 +360,7 @@ The interval filter supports the use of extraction functions, see [Filtering wit
 If an extraction function is used with this filter, the extraction function should output values that are parseable as long milliseconds.
 
 The following example filters on the time ranges of October 1-7, 2014 and November 15-16, 2014.
+
 ```json
 {
   "type" : "interval",