mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-09 14:34:43 +00:00
parent
be9c37fc76
commit
671a209ed9
@@ -30,7 +30,7 @@ occurring in a document is low. At the same time, as
 internally each shingle is hashed into to 128-bit hash, you should choose
 `k` small enough so that all possible
 different k-words shingles can be hashed to 128-bit hash with
-minimal collision. 5-word shingles typically work well.
+minimal collision.
 
 * choosing the right settings for `hash_count`, `bucket_count` and
 `hash_set_size` needs some experimentation.
@@ -39,7 +39,7 @@ minimal collision. 5-word shingles typically work well.
 will provide a higher guarantee that different tokens are
 indexed to different buckets.
 ** to improve the recall,
-you should increase `hash_token` parameter. For example,
+you should increase `hash_count` parameter. For example,
 setting `hash_count=2`, will make each token to be hashed in
 two different ways, thus increasing the number of potential
 candidates for search.
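
The effect of `hash_count` described in this hunk can be illustrated with a small Python sketch. It is not the filter's actual implementation: `token_hash` and `min_hash_signature` are hypothetical names, BLAKE2b stands in for whatever 128-bit hash is used internally, and the bucketing rule and default values are assumptions made for illustration only.

```python
import hashlib


def token_hash(token: str, seed: int) -> int:
    """Return a 128-bit hash of `token`; `seed` picks one of several hash functions.

    Assumption: BLAKE2b with a per-seed salt is only a stand-in hash.
    """
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=16,                    # 16 bytes = 128 bits
        salt=seed.to_bytes(8, "little"),   # different seed, different hash function
    ).digest()
    return int.from_bytes(digest, "big")


def min_hash_signature(tokens, hash_count=1, bucket_count=512, hash_set_size=1):
    """Build a set of (seed, bucket, hash) entries for a collection of tokens.

    Assumed scheme: the 128-bit hash space is split into `bucket_count` equal
    ranges, and the `hash_set_size` smallest hashes are kept in each range.
    """
    bucket_width = (1 << 128) // bucket_count
    signature = set()
    for seed in range(hash_count):
        buckets = {}
        for token in set(tokens):
            h = token_hash(token, seed)
            # Clamp so a hash at the very top of the range stays in the last bucket.
            b = min(h // bucket_width, bucket_count - 1)
            buckets.setdefault(b, []).append(h)
        for b, hashes in buckets.items():
            for h in sorted(hashes)[:hash_set_size]:
                signature.add((seed, b, h))
    return signature


shingles = ["the quick brown", "quick brown fox", "brown fox jumps"]
one_way = min_hash_signature(shingles, hash_count=1)
two_ways = min_hash_signature(shingles, hash_count=2)
print(len(one_way), len(two_ways))
```

In this sketch, every entry produced with `hash_count=1` is also present with `hash_count=2`, and the second hash function only adds entries. That is the sense in which a higher `hash_count` gives a search more potential candidates to match against.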