Remove SQL experimental banner and other doc adjustments. (#7591)

* Remove SQL experimental banner and other doc adjustments.

Also,

- Adjust the ToC and other docs a bit so SQL and native queries are
  presented on more equal footing.
- De-emphasize querying historicals and peons directly in the
  native query docs. This is a really niche thing and may have been
  confusing to include prominently in the very first paragraph.
- Remove DataSketches and Kafka indexing service from the experimental
  features ToC. They are not experimental any longer and were there in
  error.

* More notes.

* Slight tweak.

* Remove extra extra word.

* Remove RT node from ToC.
Gian Merlino 2019-05-06 12:31:51 -07:00 committed by Fangjin Yang
parent f7bfe8f269
commit 727b65c7e5
8 changed files with 99 additions and 78 deletions

docs/content/development/experimental.md

@@ -24,16 +24,15 @@ title: "Experimental Features"
# Experimental Features
-Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. We would love feedback on any of these features, whether they are bug reports, suggestions for improvement, or letting us know they work as intended.
+Features often start out in "experimental" status that indicates they are still evolving.
+This can mean any of the following things:
-<div class="note caution">
-APIs for experimental features may change in backwards incompatible ways.
-</div>
+1. The feature's API may change even in minor releases or patch releases.
+2. The feature may have known "missing" pieces that will be added later.
+3. The feature may or may not have received full battle-testing in production environments.
-To enable experimental features, include their artifacts in the configuration runtime.properties file, e.g.,
+All experimental features are optional.
-```
-druid.extensions.loadList=["druid-histogram"]
-```
-The configuration files for all the Apache Druid (incubating) processes need to be updated with this.
+Note that not all of these points apply to every experimental feature. Some have been battle-tested in terms of
+implementation, but are still marked experimental due to an evolving API. Please check the documentation for each
+feature for full details.

docs/content/development/router.md

@@ -24,6 +24,11 @@ title: "Router Process"
# Router Process
+<div class="note info">
+The Router is an optional and <a href="../development/experimental.html">experimental</a> feature due to the fact that its recommended place in the Druid cluster architecture is still evolving.
+However, it has been battle-tested in production, and it hosts the powerful <a href="../operations/management-uis.html#druid-console">Druid Console</a>, so you should feel safe deploying it.
+</div>
The Apache Druid (incubating) Router process can be used to route queries to different Broker processes. By default, the Router routes queries based on how [Rules](../operations/rule-configuration.html) are set up. For example, if 1 month of recent data is loaded into a `hot` cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This setup provides query isolation such that queries for more important data are not impacted by queries for less important data.
For query routing purposes, you should only ever need the Router process if you have a Druid cluster well into the terabyte range.
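
For illustration, tier-based routing like the hot/cold example above is configured on the Router through runtime properties. A minimal sketch, assuming a `hot` tier exists and using placeholder broker service names:

```
druid.router.defaultBrokerServiceName=druid/broker
druid.router.tierToBrokerMap={"hot":"druid/broker-hot","_default_tier":"druid/broker"}
```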

docs/content/querying/aggregations.md

@@ -279,22 +279,20 @@ The [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.ht
Compared to the Theta sketch, the HLL sketch does not support set operations and has slightly slower update and merge speed, but requires significantly less space.
-#### Cardinality/HyperUnique (Deprecated)
+#### Cardinality, hyperUnique
-<div class="note caution">
-The Cardinality and HyperUnique aggregators are deprecated.
+<div class="note info">
For new use cases, we recommend evaluating <a href="../development/extensions-core/datasketches-theta.html">DataSketches Theta Sketch</a> or <a href="../development/extensions-core/datasketches-hll.html">DataSketches HLL Sketch</a> instead.
-For existing users, we recommend evaluating the newer DataSketches aggregators and migrating if possible.
+The DataSketches aggregators are generally able to offer more flexibility and better accuracy than the classic Druid `cardinality` and `hyperUnique` aggregators.
</div>
The [Cardinality and HyperUnique](../querying/hll-old.html) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer DataSketches Theta and HLL extension-provided aggregators described above have superior accuracy and performance and are recommended instead.
-The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we have deprecated Druid's original HLL aggregator.
+The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we are recommending using them in preference to Druid's original HLL-based aggregators.
+However, to ensure backwards compatibility, we will continue to support the classic aggregators.
Please note that `hyperUnique` aggregators are not mutually compatible with DataSketches HLL or Theta sketches.
-Although deprecated, we will continue to support the older Cardinality/HyperUnique aggregators for backwards compatibility.
##### Multi-column handling
Note the DataSketches Theta and HLL aggregators currently only support single-column inputs. If you were previously using the Cardinality aggregator with multiple-column inputs, equivalent operations using Theta or HLL sketches are described below:
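
For illustration, one common pattern for the multi-column case (the details are elided from this hunk) is to concatenate the input columns with a virtual column and feed the result to a sketch aggregator. A minimal sketch of the idea, assuming hypothetical columns `dimA` and `dimB` and the DataSketches extension loaded:

```json
{
  "virtualColumns": [
    {
      "type": "expression",
      "name": "v0",
      "expression": "concat(\"dimA\", '|', \"dimB\")",
      "outputType": "STRING"
    }
  ],
  "aggregations": [
    { "type": "thetaSketch", "name": "distinct_pairs", "fieldName": "v0" }
  ]
}
```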
@@ -326,10 +324,11 @@ The fixed buckets histogram can perform well when the distribution of the input
We do not recommend the fixed buckets histogram for general use, as its usefulness is extremely data dependent. However, it is made available for users that have already identified use cases where a fixed buckets histogram is suitable.
-#### Approximate Histogram (Deprecated)
+#### Approximate Histogram (deprecated)
<div class="note caution">
The Approximate Histogram aggregator is deprecated.
+There are a number of other quantile estimation algorithms that offer better performance, accuracy, and memory footprint.
We recommend using <a href="../development/extensions-core/datasketches-quantiles.html">DataSketches Quantiles</a> instead.
</div>

docs/content/querying/lookups.md

@@ -55,6 +55,17 @@ Other lookup types are available as extensions, including:
- Globally cached lookups from local files, remote URIs, or JDBC through [lookups-cached-global](../development/extensions-core/lookups-cached-global.html).
- Globally cached lookups from a Kafka topic through [kafka-extraction-namespace](../development/extensions-core/kafka-extraction-namespace.html).
+Query Syntax
+------------
+In [Druid SQL](sql.html), lookups can be queried using the `LOOKUP` function, for example:
+```
+SELECT LOOKUP(column_name, 'lookup-name'), COUNT(*) FROM datasource GROUP BY 1
+```
+In native queries, lookups can be queried with [dimension specs or extraction functions](dimensionspecs.html).
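
For illustration, a minimal sketch of the native-query equivalent — a dimension spec with a `registeredLookup` extraction function, reusing the names from the SQL example above:

```json
{
  "type": "extraction",
  "dimension": "column_name",
  "outputName": "looked_up_value",
  "extractionFn": {
    "type": "registeredLookup",
    "lookup": "lookup-name"
  }
}
```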
Query Execution
---------------
When executing an aggregation query involving lookups, Druid can decide to apply lookups either while scanning and

docs/content/querying/querying.md

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Querying"
+title: "Native queries"
---
<!--
@@ -22,26 +22,28 @@ title: "Querying"
~ under the License.
-->
-# Querying
+# Native queries
-Apache Druid (incubating) queries are made using an HTTP REST style request to queryable processes ([Broker](../design/broker.html),
-[Historical](../design/historical.html). [Peons](../design/peons.html)) that are running stream ingestion tasks can also accept queries. The
-query is expressed in JSON and each of these process types expose the same
-REST query interface. For normal Druid operations, queries should be issued to the Broker processes. Queries can be posted
-to the queryable processes like this -
+<div class="note info">
+Apache Druid (incubating) supports two query languages: <a href="sql.html">Druid SQL</a> and native queries, which SQL queries
+are planned into, and which end users can also issue directly. This document describes the native query language.
+</div>
-```bash
-curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
-```
+Native queries in Druid are JSON objects and are typically issued to the Broker or Router processes. Queries can be
+posted like this:
+```bash
+curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
+```
Druid's native query language is JSON over HTTP, although many members of the community have contributed different
[client libraries](../development/libraries.html) in other languages to query Druid.
The Content-Type/Accept Headers can also take 'application/x-jackson-smile'.
-```bash
-curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>
-```
+```bash
+curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>
+```
Note: If an Accept header is not provided, it defaults to the value of the 'Content-Type' header.
@@ -49,6 +51,11 @@ Druid's native query is relatively low level, mapping closely to how computation
are designed to be lightweight and complete very quickly. This means that for more complex analysis, or to build
more complex visualizations, multiple Druid queries may be required.
+Even though queries are typically made to Brokers or Routers, they can also be accepted by
+[Historical](../design/historical.html) processes and by [Peons (task JVMs)](../design/peons.html) that are running
+stream ingestion tasks. This may be valuable if you want to query results for specific segments that are served by
+specific processes.
## Available Queries
Druid has numerous query types for various use cases. Queries are composed of various JSON properties, and the documentation for each query type describes all the JSON properties that can be set.
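
For illustration, a minimal sketch of what a `<query_json_file>` from the curl examples above might contain — a timeseries query counting rows per day, with placeholder datasource and interval:

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "aggregations": [
    { "type": "count", "name": "rows" }
  ],
  "intervals": [ "2019-01-01/2019-02-01" ]
}
```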

docs/content/querying/select-query.md

@@ -24,7 +24,15 @@ title: "Select Queries"
# Select Queries
-Select queries return raw Apache Druid (incubating) rows and support pagination.
+<div class="note caution">
+We encourage you to use the <a href="../querying/scan-query.html">Scan query</a> type rather than Select whenever possible.
+In situations involving larger numbers of segments, the Select query can have very high memory and performance overhead.
+The Scan query does not have this issue.
+The major difference between the two is that the Scan query does not support pagination.
+However, the Scan query type is able to return a virtually unlimited number of results even without pagination, making pagination unnecessary in many cases.
+</div>
+Select queries return raw Druid rows and support pagination.
```json
{
@@ -41,12 +49,6 @@ Select queries return raw Apache Druid (incubating) rows and support pagination.
}
```
-<div class="note info">
-Consider using the [Scan query](../querying/scan-query.html) instead of the Select query if you don't need pagination.
-The Scan query returns results without pagination but is significantly more efficient in terms of both processing time
-and memory requirements. It is also capable of returning a virtually unlimited number of results.
-</div>
There are several main parts to a select query:
|property|description|required?|
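
For comparison, a minimal sketch of the recommended Scan query, with placeholder datasource, interval, and column names:

```json
{
  "queryType": "scan",
  "dataSource": "sample_datasource",
  "intervals": [ "2019-01-01/2019-02-01" ],
  "columns": [ "__time", "page", "user" ],
  "limit": 100
}
```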

docs/content/querying/sql.md

@@ -31,12 +31,12 @@ title: "SQL"
# SQL
-<div class="note caution">
-Built-in SQL is an <a href="../development/experimental.html">experimental</a> feature. The API described here is
-subject to change.
+<div class="note info">
+Apache Druid (incubating) supports two query languages: Druid SQL and <a href="querying.html">native queries</a>, which SQL queries
+are planned into, and which end users can also issue directly. This document describes the SQL language.
</div>
-Apache Druid (incubating) SQL is a built-in SQL layer and an alternative to Druid's native JSON-based query language, and is powered by a
+Druid SQL is a built-in SQL layer and an alternative to Druid's native JSON-based query language, and is powered by a
parser and planner based on [Apache Calcite](https://calcite.apache.org/). Druid SQL translates SQL into native Druid
queries on the query Broker (the first process you query), which are then passed down to data processes as native Druid
queries. Other than the (slight) overhead of translating SQL on the Broker, there isn't an additional performance
@@ -125,7 +125,7 @@ Only the COUNT aggregation can accept DISTINCT.
|`MIN(expr)`|Takes the minimum of numbers.|
|`MAX(expr)`|Takes the maximum of numbers.|
|`AVG(expr)`|Averages numbers.|
-|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`.|
+|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". This uses Druid's builtin "cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`.|
|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct values of expr, which can be a regular column or an [HLL sketch](../development/extensions-core/datasketches-hll.html) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.html) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.html#approximate-histogram-aggregator) exprs. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation. Higher resolutions will give more precise results but also have higher overhead. If not provided, the default resolution is 50. The [approximate histogram extension](../development/extensions-core/approximate-histograms.html) must be loaded to use this function.|
@@ -133,6 +133,8 @@ Only the COUNT aggregation can accept DISTINCT.
|`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric or [fixed buckets histogram](../development/extensions-core/approximate-histograms.html#fixed-buckets-histogram) exprs. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate histogram extension](../development/extensions-core/approximate-histograms.html) must be loaded to use this function.|
|`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced by `expr`, with `numEntries` maximum number of distinct values before false positive rate increases. See [bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.|
+For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.html#approx).
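
As an aside, a quick sketch of one of these functions in use, issued over Druid's SQL HTTP API (`/druid/v2/sql`); the host, datasource, and column names are placeholders:

```bash
curl -X POST '<broker_host>:<port>/druid/v2/sql' \
  -H 'Content-Type:application/json' \
  -d '{"query":"SELECT APPROX_COUNT_DISTINCT(user_col) FROM sample_datasource"}'
```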
### Numeric functions
Numeric functions will return 64 bit integers or 64 bit floats, depending on their inputs.

docs/content/toc.md

@@ -70,32 +70,34 @@ layout: toc
* [Misc. Tasks](/docs/VERSION/ingestion/misc-tasks.html)
## Querying
-* [Overview](/docs/VERSION/querying/querying.html)
-* [Timeseries](/docs/VERSION/querying/timeseriesquery.html)
-* [TopN](/docs/VERSION/querying/topnquery.html)
-* [GroupBy](/docs/VERSION/querying/groupbyquery.html)
-* [Time Boundary](/docs/VERSION/querying/timeboundaryquery.html)
-* [Segment Metadata](/docs/VERSION/querying/segmentmetadataquery.html)
-* [DataSource Metadata](/docs/VERSION/querying/datasourcemetadataquery.html)
-* [Search](/docs/VERSION/querying/searchquery.html)
-* [Select](/docs/VERSION/querying/select-query.html)
-* [Scan](/docs/VERSION/querying/scan-query.html)
-* Components
-  * [Datasources](/docs/VERSION/querying/datasource.html)
-  * [Filters](/docs/VERSION/querying/filters.html)
-  * [Aggregations](/docs/VERSION/querying/aggregations.html)
-  * [Post Aggregations](/docs/VERSION/querying/post-aggregations.html)
-  * [Granularities](/docs/VERSION/querying/granularities.html)
-  * [DimensionSpecs](/docs/VERSION/querying/dimensionspecs.html)
-  * [Context](/docs/VERSION/querying/query-context.html)
-* [Multi-value dimensions](/docs/VERSION/querying/multi-value-dimensions.html)
-* [SQL](/docs/VERSION/querying/sql.html)
-* [Lookups](/docs/VERSION/querying/lookups.html)
-* [Joins](/docs/VERSION/querying/joins.html)
-* [Multitenancy](/docs/VERSION/querying/multitenancy.html)
-* [Caching](/docs/VERSION/querying/caching.html)
-* [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
-* [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
+* [Druid SQL](/docs/VERSION/querying/sql.html)
+* [Native queries](/docs/VERSION/querying/querying.html)
+  * [Timeseries](/docs/VERSION/querying/timeseriesquery.html)
+  * [TopN](/docs/VERSION/querying/topnquery.html)
+  * [GroupBy](/docs/VERSION/querying/groupbyquery.html)
+  * [Time Boundary](/docs/VERSION/querying/timeboundaryquery.html)
+  * [Segment Metadata](/docs/VERSION/querying/segmentmetadataquery.html)
+  * [DataSource Metadata](/docs/VERSION/querying/datasourcemetadataquery.html)
+  * [Search](/docs/VERSION/querying/searchquery.html)
+  * [Scan](/docs/VERSION/querying/scan-query.html)
+  * [Select](/docs/VERSION/querying/select-query.html)
+  * Components
+    * [Datasources](/docs/VERSION/querying/datasource.html)
+    * [Filters](/docs/VERSION/querying/filters.html)
+    * [Aggregations](/docs/VERSION/querying/aggregations.html)
+    * [Post Aggregations](/docs/VERSION/querying/post-aggregations.html)
+    * [Granularities](/docs/VERSION/querying/granularities.html)
+    * [DimensionSpecs](/docs/VERSION/querying/dimensionspecs.html)
+    * [Sorting Orders](/docs/VERSION/querying/sorting-orders.html)
+    * [Virtual Columns](/docs/VERSION/querying/virtual-columns.html)
+    * [Context](/docs/VERSION/querying/query-context.html)
+* Concepts
+  * [Multi-value dimensions](/docs/VERSION/querying/multi-value-dimensions.html)
+  * [Lookups](/docs/VERSION/querying/lookups.html)
+  * [Joins](/docs/VERSION/querying/joins.html)
+  * [Multitenancy](/docs/VERSION/querying/multitenancy.html)
+  * [Caching](/docs/VERSION/querying/caching.html)
+  * [Geographic Queries](/docs/VERSION/development/geo.html) (experimental)
## Design
* [Overview](/docs/VERSION/design/index.html)
@@ -108,7 +110,7 @@ layout: toc
* [Historical](/docs/VERSION/design/historical.html)
* [MiddleManager](/docs/VERSION/design/middlemanager.html)
* [Peons](/docs/VERSION/design/peons.html)
-* [Realtime (Deprecated)](/docs/VERSION/design/realtime.html)
+* [Router](/docs/VERSION/development/router.html) (optional; experimental)
* Dependencies
* [Deep Storage](/docs/VERSION/dependencies/deep-storage.html)
* [Metadata Storage](/docs/VERSION/dependencies/metadata-storage.html)
@@ -161,13 +163,7 @@ layout: toc
* [Build From Source](/docs/VERSION/development/build.html)
* [Versioning](/docs/VERSION/development/versioning.html)
* [Integration](/docs/VERSION/development/integrating-druid-with-other-technologies.html)
-* Experimental Features
-  * [Overview](/docs/VERSION/development/experimental.html)
-  * [Approximate Histograms and Quantiles](/docs/VERSION/development/extensions-core/approximate-histograms.html)
-  * [Datasketches](/docs/VERSION/development/extensions-core/datasketches-extension.html)
-  * [Geographic Queries](/docs/VERSION/development/geo.html)
-  * [Router](/docs/VERSION/development/router.html)
-  * [Kafka Indexing Service](/docs/VERSION/development/extensions-core/kafka-ingestion.html)
+* [Experimental Features](/docs/VERSION/development/experimental.html)
## Misc
* [Druid Expressions Language](/docs/VERSION/misc/math-expr.html)