parent be9c37fc76
commit 671a209ed9
@@ -30,7 +30,7 @@ occurring in a document is low. At the same time, as
 internally each shingle is hashed into to 128-bit hash, you should choose
 `k` small enough so that all possible
 different k-words shingles can be hashed to 128-bit hash with
-minimal collision. 5-word shingles typically work well.
+minimal collision.
 
 * choosing the right settings for `hash_count`, `bucket_count` and
 `hash_set_size` needs some experimentation.
@@ -39,7 +39,7 @@ minimal collision. 5-word shingles typically work well.
 will provide a higher guarantee that different tokens are
 indexed to different buckets.
 ** to improve the recall,
-you should increase `hash_token` parameter. For example,
+you should increase `hash_count` parameter. For example,
 setting `hash_count=2`, will make each token to be hashed in
 two different ways, thus increasing the number of potential
 candidates for search.
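
The effect of `hash_count` described in the changed text can be illustrated with a toy sketch. This is not the filter's actual implementation: the function names `token_hash` and `min_hash_signature`, the use of `hashlib.blake2b` as the 128-bit hash, and the modulo-based bucket assignment are all assumptions made for illustration. Only the parameter names `hash_count`, `bucket_count` and `hash_set_size` come from the documentation being edited.

```python
import hashlib


def token_hash(token: str, seed: int) -> int:
    """Hash a token to a 128-bit integer; `seed` selects the hash function.

    blake2b is an illustrative stand-in, not the hash the filter uses.
    """
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=16,                    # 16 bytes = 128 bits
        salt=seed.to_bytes(8, "little"),   # a different salt per hash function
    ).digest()
    return int.from_bytes(digest, "big")


def min_hash_signature(tokens, hash_count=1, bucket_count=512, hash_set_size=1):
    """Return a set of (hash_function_index, hash) pairs for a token stream.

    Assumed simplified scheme: every token is hashed `hash_count` times,
    each hash falls into one of `bucket_count` buckets, and each bucket
    keeps only its `hash_set_size` smallest hashes.
    """
    buckets = {}
    for seed in range(hash_count):
        for token in set(tokens):
            h = token_hash(token, seed)
            buckets.setdefault((seed, h % bucket_count), set()).add(h)

    signature = set()
    for (seed, _bucket), hashes in buckets.items():
        for h in sorted(hashes)[:hash_set_size]:
            signature.add((seed, h))
    return signature


# Usage: the same shingles hashed one way versus two ways.
shingles = ["the quick brown", "quick brown fox", "brown fox jumps"]
one_way = min_hash_signature(shingles, hash_count=1)
two_ways = min_hash_signature(shingles, hash_count=2)
print(len(one_way), len(two_ways))
```

In this sketch the signature produced with `hash_count=2` contains every entry of the `hash_count=1` signature plus the entries from the second hash function, which is the sense in which a higher `hash_count` yields more potential candidates for a match.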