[Docs] clarification about cardinality accuracy (#34616)

Adds a bit more clarification about how accuracy is dependent on the dataset in question. Closes #18231
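
To illustrate why accuracy depends on the dataset, here is a minimal sketch of the "leading zeros" quantity that HyperLogLog-style estimators track. It is not Elasticsearch's implementation: the function names `rank_of_hash`, `rank`, and `max_rank` are hypothetical, SHA-256 truncated to 32 bits is an assumed stand-in hash, and a real HyperLogLog++ keeps many registers and applies bias correction.

```python
import hashlib


def rank_of_hash(h: int, bits: int = 32) -> int:
    """Return leading zeros + 1 for a `bits`-wide hash value.

    An all-zero hash yields bits + 1, because int.bit_length() is 0 for 0.
    """
    return bits - h.bit_length() + 1


def rank(value: str, bits: int = 32) -> int:
    """Hash a string (SHA-256, assumed for illustration) and return its rank."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    h = int.from_bytes(digest[: bits // 8], "big")
    return rank_of_hash(h, bits)


def max_rank(values, bits: int = 32) -> int:
    """Largest rank seen across a collection of values.

    Duplicates hash identically, so they never change the result.
    """
    return max(rank(v, bits) for v in values)
```

Because the estimate is driven by which hash bit patterns happen to occur, two datasets with the same true cardinality can produce different maximum ranks, and therefore slightly different estimates.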
2018-10-22 13:15:45 -04:00 · 2018-10-22 13:15:45 -04:00 · d981746142
parent c344293aed
commit d981746142
1 changed files with 12 additions and 4 deletions
--- a/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
+++ b/docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
@@ -150,10 +150,18 @@ public static void main(String[] args) {

 image:images/cardinality_error.png[]

-For all 3 thresholds, counts have been accurate up to the configured threshold
-(although not guaranteed, this is likely to be the case). Please also note that
-even with a threshold as low as 100, the error remains very low, even when
-counting millions of items.
+For all 3 thresholds, counts have been accurate up to the configured threshold.
+Although not guaranteed, this is likely to be the case. Accuracy in practice depends
+on the dataset in question. In general, most datasets show consistently good
+accuracy. Also note that even with a threshold as low as 100, the error
+remains very low (1-6% as seen in the above graph) even when counting millions of items.
+
+Because the HyperLogLog++ algorithm depends on the leading zeros of hashed
+values, the exact distribution of hashes in a dataset can affect the
+accuracy of the cardinality estimate.
+
+Please also note that even with a threshold as low as 100, the error remains
+very low, even when counting millions of items.

 ==== Pre-computed hashes