Correct errors in min_hash filter documentation

Related to #39671
Mayya Sharipova 2019-03-08 16:16:03 -05:00
parent be9c37fc76
commit 671a209ed9
1 changed file with 2 additions and 2 deletions


@@ -30,7 +30,7 @@ occurring in a document is low. At the same time, as
 internally each shingle is hashed into to 128-bit hash, you should choose
 `k` small enough so that all possible
 different k-words shingles can be hashed to 128-bit hash with
-minimal collision. 5-word shingles typically work well.
+minimal collision.
 * choosing the right settings for `hash_count`, `bucket_count` and
 `hash_set_size` needs some experimentation.
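The hunk above concerns choosing the shingle size `k` so that distinct k-word shingles rarely collide once hashed to 128 bits. The following Python sketch illustrates that idea under stated assumptions: whitespace tokenization, and BLAKE2b with a 16-byte digest standing in for the 128-bit hash Elasticsearch uses internally. The helpers `shingles`, `hash128` and `collision_bound` are hypothetical; `collision_bound` gives the union-bound estimate n(n-1)/2 / 2^128 for the chance that any two of n distinct shingles share a hash.

```python
import hashlib


def shingles(text: str, k: int) -> list[str]:
    """Return the k-word shingles of `text`, assuming whitespace tokenization."""
    words = text.split()
    # A text with fewer than k words yields no shingles (the range is empty).
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def hash128(shingle: str) -> int:
    """Map a shingle to a 128-bit integer.

    BLAKE2b with a 16-byte digest is a stand-in for illustration only; it is
    not the hash function Elasticsearch uses.
    """
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


def collision_bound(n: int) -> float:
    """Union-bound estimate of P(any collision) among n distinct shingles,
    assuming the 128-bit hash behaves like a uniform random function."""
    return n * (n - 1) / 2 / 2**128


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    grams = shingles(doc, 3)
    print(len(grams))  # 7: a 9-word text has 9 - 3 + 1 three-word shingles
    print(len({hash128(g) for g in grams}) == len(set(grams)))  # True
    print(collision_bound(10**12) < 1e-12)  # True
```

Even a trillion distinct shingles keeps the bound far below one in a trillion, which is why the text can ask only that `k` stay "small enough" rather than tiny.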
@@ -39,7 +39,7 @@ minimal collision. 5-word shingles typically work well.
 will provide a higher guarantee that different tokens are
 indexed to different buckets.
 ** to improve the recall,
-you should increase `hash_token` parameter. For example,
+you should increase `hash_count` parameter. For example,
 setting `hash_count=2`, will make each token to be hashed in
 two different ways, thus increasing the number of potential
 candidates for search.
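The parameter advice above can be made concrete with a small MinHash-style sketch in Python. It is an illustration under assumptions, not the Lucene `MinHashFilter` implementation: BLAKE2b salted with the function index stands in for the hash functions, a hash lands in bucket `hash % bucket_count`, and each (function, bucket) pair keeps its `hash_set_size` smallest hashes. Only the parameter names `hash_count`, `bucket_count` and `hash_set_size` come from the text; `seeded_hash`, `min_hash` and `overlap` are hypothetical helpers.

```python
import hashlib
import heapq


def seeded_hash(token: str, fn: int) -> int:
    """64-bit hash of `token` under hash function number `fn`.

    Salting BLAKE2b with the function index is an assumption of this sketch.
    """
    salt = fn.to_bytes(16, "big")  # blake2b accepts salts of up to 16 bytes
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def min_hash(tokens, hash_count=1, bucket_count=512, hash_set_size=1):
    """Return a signature: the `hash_set_size` smallest hashes per (function, bucket).

    Each element is a (function index, bucket, hash) tuple, so hashes produced
    by different functions or buckets never match each other.
    """
    buckets: dict[tuple[int, int], set[int]] = {}
    for token in set(tokens):
        # hash_count > 1 hashes every token several independent ways.
        for fn in range(hash_count):
            h = seeded_hash(token, fn)
            # Assumed bucketing rule: hash modulo bucket_count.
            buckets.setdefault((fn, h % bucket_count), set()).add(h)
    signature = set()
    for (fn, bucket), hashes in buckets.items():
        for h in heapq.nsmallest(hash_set_size, hashes):
            signature.add((fn, bucket, h))
    return signature


def overlap(a: set, b: set) -> int:
    """Number of signature elements two documents share (candidate matches)."""
    return len(a & b)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "the quick brown fox leaps over a sleepy dog".split()
    for count in (1, 2):
        sig_a = min_hash(doc_a, hash_count=count, bucket_count=4)
        sig_b = min_hash(doc_b, hash_count=count, bucket_count=4)
        print(count, overlap(sig_a, sig_b))
```

In this sketch the function-0 slots are identical whichever `hash_count` is used, so raising `hash_count` can only add shared elements, mirroring the point that hashing each token two ways yields more potential candidates. A larger `bucket_count` spreads tokens over more buckets, so fewer distinct tokens compete for the same kept slot.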