[Docs] clarification about cardinality accuracy (#34616)
Adds a bit more clarification about how accuracy depends on the dataset in question. Closes #18231.
parent c344293aed
commit d981746142
@@ -150,10 +150,18 @@ public static void main(String[] args) {
 
 image:images/cardinality_error.png[]
 
-For all 3 thresholds, counts have been accurate up to the configured threshold
-(although not guaranteed, this is likely to be the case). Please also note that
-even with a threshold as low as 100, the error remains very low, even when
-counting millions of items.
+For all 3 thresholds, counts have been accurate up to the configured threshold.
+Although not guaranteed, this is likely to be the case. Accuracy in practice depends
+on the dataset in question. In general, most datasets show consistently good
+accuracy. Also note that even with a threshold as low as 100, the error
+remains very low (1-6% as seen in the above graph) even when counting millions of items.
+
+The HyperLogLog++ algorithm depends on the leading zeros of hashed
+values, the exact distributions of hashes in a dataset can affect the
+accuracy of the cardinality.
+
+Please also note that even with a threshold as low as 100, the error remains
+very low, even when counting millions of items.
 
 ==== Pre-computed hashes
 
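The added paragraph about leading zeros can be illustrated with a small sketch. The code below is a simplified, classic HyperLogLog estimator, not the HyperLogLog++ implementation used by the cardinality aggregation: it has no sparse encoding and no empirical bias correction. All names (`LeadingZeroSketch`, `rank`, `mix64`, `estimateDistinct`) are hypothetical and chosen only for this example. It feeds the same 100,000 distinct values through a well-mixing hash and through a deliberately poor one (the identity function) to show how the distribution of hash bits drives the estimate.

```java
import java.util.function.LongUnaryOperator;

/** Simplified HyperLogLog sketch for illustration; all names are hypothetical. */
final class LeadingZeroSketch {
    private final int p;             // number of hash bits used to pick a bucket
    private final byte[] registers;  // largest rank seen per bucket

    LeadingZeroSketch(int p) {
        // The alpha constant used in estimate() assumes at least 128 buckets.
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be between 7 and 16");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Rank = leading zeros in the bits left after the bucket index, plus one. */
    static int rank(long hash, int p) {
        long rest = hash << p; // discard the p index bits
        // An all-zero remainder is capped at (64 - p) zeros.
        return Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
    }

    /** SplitMix64-style bit mixer, standing in for a real hash function. */
    static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void addHash(long hash) {
        int bucket = (int) (hash >>> (64 - p)); // top p bits select the bucket
        byte r = (byte) rank(hash, p);
        if (r > registers[bucket]) {
            registers[bucket] = r;
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int empty = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                empty++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * m && empty > 0) {
            return m * Math.log((double) m / empty);
        }
        return raw;
    }

    /** Adds hash(1), hash(2), ..., hash(n) and returns the cardinality estimate. */
    static double estimateDistinct(long n, LongUnaryOperator hash, int p) {
        LeadingZeroSketch sketch = new LeadingZeroSketch(p);
        for (long i = 1; i <= n; i++) {
            sketch.addHash(hash.applyAsLong(i));
        }
        return sketch.estimate();
    }

    public static void main(String[] args) {
        long n = 100_000;
        double mixed = estimateDistinct(n, LeadingZeroSketch::mix64, 12);
        double unmixed = estimateDistinct(n, LongUnaryOperator.identity(), 12);
        System.out.printf("well-mixed hash: %.0f%n", mixed);
        System.out.printf("identity hash:   %.0f%n", unmixed);
    }
}
```

With the identity function, the top 12 bits of every value are zero, so all 100,000 items fall into a single bucket and the estimate collapses to roughly 1. With the mixing function, the estimate should land close to the true count; the expected relative error of classic HyperLogLog is about 1.04/sqrt(m), or roughly 1.6% for the 4,096 buckets used here.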