update links datasketches.github.io to datasketches.apache.org (#10107)

* update links datasketches.github.io to datasketches.apache.org

* now with more apache

* oops

* oops
This commit is contained in:
Clint Wylie 2020-07-01 14:56:17 -07:00 committed by GitHub
parent 3e92cdf1cf
commit 477335abb4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
13 changed files with 25 additions and 25 deletions

View File

@ -23,7 +23,7 @@ title: "DataSketches extension"
-->
Apache Druid aggregators based on [datasketches](https://datasketches.github.io/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.
Apache Druid aggregators based on [Apache DataSketches](https://datasketches.apache.org/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.
To use the datasketches aggregators, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:

View File

@ -23,7 +23,7 @@ title: "DataSketches HLL Sketch module"
-->
This module provides Apache Druid aggregators for distinct counting based on HLL sketch from [datasketches](https://datasketches.github.io/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
This module provides Apache Druid aggregators for distinct counting based on HLL sketch from [Apache DataSketches](https://datasketches.apache.org/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
You can use the HLL sketch aggregator on columns of any identifiers. It will return estimated cardinality of the column.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:

View File

@ -23,7 +23,7 @@ title: "DataSketches Quantiles Sketch module"
-->
This module provides Apache Druid aggregators based on numeric quantiles DoublesSketch from [datasketches](https://datasketches.github.io/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cumulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.github.io/docs/Quantiles/QuantilesOverview.html).
This module provides Apache Druid aggregators based on numeric quantiles DoublesSketch from [Apache DataSketches](https://datasketches.apache.org/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cumulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.apache.org/docs/Quantiles/QuantilesOverview.html).
There are three major modes of operation:
@ -55,7 +55,7 @@ The result of the aggregation is a DoublesSketch that is the union of all sketch
|type|This String should always be "quantilesDoublesSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field (can contain sketches or raw numeric values).|yes|
|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See the [Quantiles Accuracy](https://datasketches.github.io/docs/Quantiles/QuantilesAccuracy.html) for details. |no, defaults to 128|
|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See the [Quantiles Accuracy](https://datasketches.apache.org/docs/Quantiles/QuantilesAccuracy.html) for details. |no, defaults to 128|
### Post Aggregators

View File

@ -23,7 +23,7 @@ title: "DataSketches Theta Sketch module"
-->
This module provides Apache Druid aggregators based on Theta sketch from [datasketches](https://datasketches.apache.org/) library. Note that sketch algorithms are approximate; see details in the "Accuracy" section of the datasketches doc.
This module provides Apache Druid aggregators based on Theta sketch from [Apache DataSketches](https://datasketches.apache.org/) library. Note that sketch algorithms are approximate; see details in the "Accuracy" section of the datasketches doc.
At ingestion time, this aggregator creates the Theta sketch objects which get stored in Druid segments. Logically speaking, a Theta sketch object can be thought of as a Set data structure. At query time, sketches are read and aggregated (set unioned) together. In the end, by default, you receive the estimate of the number of unique entries in the sketch object. Also, you can use post aggregators to do union, intersection or difference on sketch columns in the same row.
Note that you can use `thetaSketch` aggregator on columns which were not ingested using the same. It will return estimated cardinality of the column. It is recommended to use it at ingestion time as well to make querying faster.
@ -51,7 +51,7 @@ druid.extensions.loadList=["druid-datasketches"]
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the aggregator used at ingestion time.|yes|
|isInputThetaSketch|This should only be used at indexing time if your input data contains theta sketch objects. This would be the case if you use datasketches library outside of Druid, say with Pig/Hive, to produce the data that you are ingesting into Druid |no, defaults to false|
|size|Must be a power of 2. Internally, size refers to the maximum number of entries sketch object will retain. Higher size means higher accuracy but more space to store sketches. Note that after you index with a particular size, druid will persist sketch in segments and you will use size greater or equal to that at query time. See the [DataSketches site](https://datasketches.github.io/docs/Theta/ThetaSize.html) for details. In general, We recommend just sticking to default size. |no, defaults to 16384|
|size|Must be a power of 2. Internally, size refers to the maximum number of entries sketch object will retain. Higher size means higher accuracy but more space to store sketches. Note that after you index with a particular size, druid will persist sketch in segments and you will use size greater or equal to that at query time. See the [DataSketches site](https://datasketches.apache.org/docs/Theta/ThetaSize.html) for details. In general, We recommend just sticking to default size. |no, defaults to 16384|
### Post Aggregators

View File

@ -23,7 +23,7 @@ title: "DataSketches Tuple Sketch module"
-->
This module provides Apache Druid aggregators based on Tuple sketch from [datasketches](https://datasketches.github.io/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
This module provides Apache Druid aggregators based on Tuple sketch from [Apache DataSketches](https://datasketches.apache.org/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
@ -49,7 +49,7 @@ druid.extensions.loadList=["druid-datasketches"]
|type|This String should always be "arrayOfDoublesSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field.|yes|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.github.io/docs/Theta/ThetaErrorTable.html) for details. |no, defaults to 16384|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable.html) for details. |no, defaults to 16384|
|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1|
|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array|

View File

@ -40,17 +40,17 @@ Core extensions are maintained by Druid committers.
|druid-azure-extensions|Microsoft Azure deep storage.|[link](../development/extensions-core/azure.md)|
|druid-basic-security|Support for Basic HTTP authentication and role-based access control.|[link](../development/extensions-core/druid-basic-security.md)|
|druid-bloom-filter|Support for providing Bloom filters in druid queries.|[link](../development/extensions-core/bloom-filter.md)|
|druid-datasketches|Support for approximate counts and set operations with [DataSketches](https://datasketches.github.io/).|[link](../development/extensions-core/datasketches-extension.md)|
|druid-datasketches|Support for approximate counts and set operations with [Apache DataSketches](https://datasketches.apache.org/).|[link](../development/extensions-core/datasketches-extension.md)|
|druid-google-extensions|Google Cloud Storage deep storage.|[link](../development/extensions-core/google.md)|
|druid-hdfs-storage|HDFS deep storage.|[link](../development/extensions-core/hdfs.md)|
|druid-histogram|Approximate histograms and quantiles aggregator. Deprecated, please use the [DataSketches quantiles aggregator](../development/extensions-core/datasketches-quantiles.md) from the `druid-datasketches` extension instead.|[link](../development/extensions-core/approximate-histograms.md)|
|druid-kafka-extraction-namespace|Kafka-based namespaced lookup. Requires namespace lookup extension.|[link](../development/extensions-core/kafka-extraction-namespace.md)|
|druid-kafka-indexing-service|Supervised exactly-once Kafka ingestion for the indexing service.|[link](../development/extensions-core/kafka-ingestion.md)|
|druid-kafka-extraction-namespace|Apache Kafka-based namespaced lookup. Requires namespace lookup extension.|[link](../development/extensions-core/kafka-extraction-namespace.md)|
|druid-kafka-indexing-service|Supervised exactly-once Apache Kafka ingestion for the indexing service.|[link](../development/extensions-core/kafka-ingestion.md)|
|druid-kinesis-indexing-service|Supervised exactly-once Kinesis ingestion for the indexing service.|[link](../development/extensions-core/kinesis-ingestion.md)|
|druid-kerberos|Kerberos authentication for druid processes.|[link](../development/extensions-core/druid-kerberos.md)|
|druid-lookups-cached-global|A module for [lookups](../querying/lookups.md) providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.|[link](../development/extensions-core/lookups-cached-global.md)|
|druid-lookups-cached-single| Per lookup caching module to support the use cases where a lookup need to be isolated from the global pool of lookups |[link](../development/extensions-core/druid-lookups.md)|
|druid-orc-extensions|Support for data in Apache Orc data format.|[link](../development/extensions-core/orc.md)|
|druid-orc-extensions|Support for data in Apache ORC data format.|[link](../development/extensions-core/orc.md)|
|druid-parquet-extensions|Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.|[link](../development/extensions-core/parquet.md)|
|druid-protobuf-extensions| Support for data in Protobuf data format.|[link](../development/extensions-core/protobuf.md)|
|druid-ranger-security|Support for access control through Apache Ranger.|[link](../development/extensions-core/druid-ranger-security.md)|

View File

@ -332,11 +332,11 @@ JavaScript functions are expected to return floating-point values.
### Count distinct
#### DataSketches Theta Sketch
#### Apache DataSketches Theta Sketch
The [DataSketches Theta Sketch](../development/extensions-core/datasketches-theta.md) extension-provided aggregator gives distinct count estimates with support for set union, intersection, and difference post-aggregators, using Theta sketches from the [datasketches](https://datasketches.github.io/) library.
The [DataSketches Theta Sketch](../development/extensions-core/datasketches-theta.md) extension-provided aggregator gives distinct count estimates with support for set union, intersection, and difference post-aggregators, using Theta sketches from the [Apache DataSketches](https://datasketches.apache.org/) library.
#### DataSketches HLL Sketch
#### Apache DataSketches HLL Sketch
The [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.md) extension-provided aggregator gives distinct count estimates using the HyperLogLog algorithm.
@ -349,7 +349,7 @@ Compared to the Theta sketch, the HLL sketch does not support set operations and
The [Cardinality and HyperUnique](../querying/hll-old.md) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer DataSketches Theta and HLL extension-provided aggregators described above have superior accuracy and performance and are recommended instead.
The DataSketches team has published a [comparison study](https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we are recommending using them in preference to Druid's original HLL-based aggregators.
The DataSketches team has published a [comparison study](https://datasketches.apache.org/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html) between Druid's original HLL algorithm and the DataSketches HLL algorithm. Based on the demonstrated advantages of the DataSketches implementation, we are recommending using them in preference to Druid's original HLL-based aggregators.
However, to ensure backwards compatibility, we will continue to support the classic aggregators.
Please note that `hyperUnique` aggregators are not mutually compatible with Datasketches HLL or Theta sketches.
@ -365,7 +365,7 @@ Note the DataSketches Theta and HLL aggregators currently only support single-co
#### DataSketches Quantiles Sketch
The [DataSketches Quantiles Sketch](../development/extensions-core/datasketches-quantiles.md) extension-provided aggregator provides quantile estimates and histogram approximations using the numeric quantiles DoublesSketch from the [datasketches](https://datasketches.github.io/) library.
The [DataSketches Quantiles Sketch](../development/extensions-core/datasketches-quantiles.md) extension-provided aggregator provides quantile estimates and histogram approximations using the numeric quantiles DoublesSketch from the [datasketches](https://datasketches.apache.org/) library.
We recommend this aggregator in general for quantiles/histogram use cases, as it provides formal error bounds and has distribution-independent accuracy.
@ -395,7 +395,7 @@ The [Approximate Histogram](../development/extensions-core/approximate-histogram
The algorithm used by this deprecated aggregator is highly distribution-dependent and its output is subject to serious distortions when the input does not fit within the algorithm's limitations.
A [study published by the DataSketches team](https://datasketches.github.io/docs/Quantiles/DruidApproxHistogramStudy.html) demonstrates some of the known failure modes of this algorithm:
A [study published by the DataSketches team](https://datasketches.apache.org/docs/Quantiles/DruidApproxHistogramStudy.html) demonstrates some of the known failure modes of this algorithm:
- The algorithm's quantile calculations can fail to provide results for a large range of rank values (all ranks less than 0.89 in the example used in the study), returning all zeroes instead.
- The algorithm can completely fail to record spikes in the tail ends of the distribution

View File

@ -17,7 +17,7 @@
~ under the License.
-->
This module provides Druid aggregators based on https://datasketches.github.io/.
This module provides Druid aggregators based on https://datasketches.apache.org/.
Credits: This module is a result of feedback and work done by following people.

View File

@ -25,7 +25,7 @@
<groupId>org.apache.druid.extensions</groupId>
<artifactId>druid-datasketches</artifactId>
<name>druid-datasketches</name>
<description>Druid Aggregators based on datasketches lib https://datasketches.github.io/</description>
<description>Druid Aggregators based on datasketches lib https://datasketches.apache.org/</description>
<parent>
<groupId>org.apache.druid</groupId>

View File

@ -40,7 +40,7 @@ import java.util.List;
/**
* This module is to support count-distinct operations using {@link HllSketch}.
* See <a href="https://datasketches.github.io/docs/HLL/HLL.html">HyperLogLog Sketch documentation</a>
* See <a href="https://datasketches.apache.org/docs/HLL/HLL.html">HyperLogLog Sketch documentation</a>
*/
public class HllSketchModule implements DruidModule
{

View File

@ -34,7 +34,7 @@ import java.util.List;
* This module is to support numeric Tuple sketches, which extend the functionality of the count-distinct
* Theta sketches by adding arrays of double values associated with unique keys.
*
* See <a href=https://datasketches.github.io/docs/Tuple/TupleOverview.html>Tuple Sketch Overview</a>
* See <a href=https://datasketches.apache.org/docs/Tuple/TupleOverview.html>Tuple Sketch Overview</a>
*/
public class ArrayOfDoublesSketchModule implements DruidModule
{

View File

@ -41,7 +41,7 @@ import java.util.Objects;
* The column number is optional (the default is 1).
* The parameter k is optional (the default is defined in the sketch library).
* The result is a quantiles sketch.
* See <a href=https://datasketches.github.io/docs/Quantiles/QuantilesOverview.html>Quantiles Sketch Overview</a>
* See <a href=https://datasketches.apache.org/docs/Quantiles/QuantilesOverview.html>Quantiles Sketch Overview</a>
*/
public class ArrayOfDoublesSketchToQuantilesSketchPostAggregator extends ArrayOfDoublesSketchUnaryPostAggregator
{

View File

@ -778,7 +778,7 @@ const METRIC_SPEC_FORM_FIELDS: Field<MetricSpec>[] = [
</p>
<p>
See the{' '}
<ExternalLink href="https://datasketches.github.io/docs/Theta/ThetaSize.html">
<ExternalLink href="https://datasketches.apache.org/docs/Theta/ThetaSize.html">
DataSketches site
</ExternalLink>{' '}
for details.
@ -843,7 +843,7 @@ const METRIC_SPEC_FORM_FIELDS: Field<MetricSpec>[] = [
</p>
<p>
Must be a power of 2 from 2 to 32768. See the{' '}
<ExternalLink href="https://datasketches.github.io/docs/Quantiles/QuantilesAccuracy.html">
<ExternalLink href="https://datasketches.apache.org/docs/Quantiles/QuantilesAccuracy.html">
Quantiles Accuracy
</ExternalLink>{' '}
for details.