From d9817461420de4be034895e208c5292c7aaf7c19 Mon Sep 17 00:00:00 2001
From: Zachary Tong
Date: Mon, 22 Oct 2018 13:15:45 -0400
Subject: [PATCH] [Docs] clarification about cardinality accuracy (#34616)

Adds a bit more clarification about how accuracy is dependent on the
dataset in question.

Closes #18231
---
 .../metrics/cardinality-aggregation.asciidoc | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc b/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
index 96822f6ea9c..a451c6da0db 100644
--- a/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
+++ b/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
@@ -150,10 +150,18 @@ public static void main(String[] args) {
 
 image:images/cardinality_error.png[]
 
-For all 3 thresholds, counts have been accurate up to the configured threshold
-(although not guaranteed, this is likely to be the case). Please also note that
-even with a threshold as low as 100, the error remains very low, even when
-counting millions of items.
+For all 3 thresholds, counts have been accurate up to the configured threshold.
+Although not guaranteed, this is likely to be the case. Accuracy in practice depends
+on the dataset in question. In general, most datasets show consistently good
+accuracy. Also note that even with a threshold as low as 100, the error
+remains very low (1-6% as seen in the above graph) even when counting millions of items.
+
+The HyperLogLog++ algorithm depends on the leading zeros of hashed
+values, the exact distributions of hashes in a dataset can affect the
+accuracy of the cardinality.
+
+Please also note that even with a threshold as low as 100, the error remains
+very low, even when counting millions of items.
 
 ==== Pre-computed hashes
 
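
The added paragraph says the estimate depends on the leading zeros of hashed values. The sketch below illustrates that idea with plain HyperLogLog plus a linear-counting fallback for small sets. It is not Elasticsearch's HyperLogLog++ implementation. The names `hash64`, `hll_estimate`, and `p` are hypothetical, and truncated SHA-1 is used only to keep the example deterministic.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Hash a string to a 64-bit integer (truncated SHA-1, an arbitrary choice)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, p: int = 12) -> float:
    """Estimate the number of distinct values with plain HyperLogLog.

    p is the number of hash bits used to pick a register, so there are
    2**p registers (an illustrative parameter, not an Elasticsearch setting).
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = hash64(v)
        idx = h >> (64 - p)                  # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # Rank = number of leading zeros in `rest`, plus one.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Few registers touched: linear counting is more accurate here.
        return m * math.log(m / zeros)
    return raw
```

Each register only remembers the longest run of leading zeros it has seen, so the result depends on how the hashes of a particular dataset happen to be distributed. That is why the error varies from one dataset to another.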