Merge pull request #15405 from alexg-dev/patch-1
More detailed explanation of some similarity types
parent 6144457f01
commit f20f41e02e
@@ -112,7 +112,10 @@ Type name: `DFR`
 ==== IB similarity.
 
 http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
-based model] . This similarity has the following options:
+based model]. The algorithm is based on the concept that the information content in any symbolic distribution
+sequence is primarily determined by the repetitive usage of its basic elements.
+For written texts, this corresponds to comparing the writing styles of different authors.
+This similarity has the following options:
 
 [horizontal]
 `distribution`:: Possible values: `ll` and `spl`.
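To make the `distribution` option more concrete, here is a minimal Python sketch of the two per-term scoring functions behind `ll` (log-logistic) and `spl` (smoothed power law). It is modeled on the formulas in Lucene's `DistributionLL` and `DistributionSPL` classes as we understand them, and assumes the document-frequency estimate of lambda used by Lucene's `LambdaDF`. Term-frequency normalization is skipped entirely, so `tfn` is simply an already-normalized frequency passed in by the caller. The function names are hypothetical; this is an illustration, not Elasticsearch's or Lucene's actual code.

```python
import math


def lambda_df(doc_freq: int, num_docs: int) -> float:
    """Estimate lambda from document frequency, assumed to match Lucene's
    LambdaDF: (df + 1) / (N + 1). Rare terms get a small lambda."""
    return (doc_freq + 1) / (num_docs + 1)


def score_ll(tfn: float, lam: float) -> float:
    """Log-logistic distribution (`ll`): -log(lam / (tfn + lam))."""
    return -math.log(lam / (tfn + lam))


def score_spl(tfn: float, lam: float) -> float:
    """Smoothed power-law distribution (`spl`)."""
    if lam == 1.0:
        # lambda == 1 would divide by zero below; nudge it to 0.99 instead.
        lam = 0.99
    return -math.log((lam ** (tfn / (tfn + 1)) - lam) / (1 - lam))


if __name__ == "__main__":
    # Hypothetical index of 999 documents.
    rare = lambda_df(doc_freq=9, num_docs=999)      # lambda = 0.01
    common = lambda_df(doc_freq=499, num_docs=999)  # lambda = 0.5
    for tfn in (1, 2, 3):
        print(
            f"tfn={tfn}  ll: rare={score_ll(tfn, rare):.3f} common={score_ll(tfn, common):.3f}"
            f"  spl: rare={score_spl(tfn, rare):.3f} common={score_spl(tfn, common):.3f}"
        )
```

In both distributions a term with zero frequency scores 0, a term that is rare in the collection (small lambda) contributes more per occurrence than a common one, and repeated occurrences keep raising the score by progressively smaller amounts.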
@@ -138,11 +141,11 @@ Type name: `LMDirichlet`
 ==== LM Jelinek Mercer similarity.
 
 http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
-Jelinek Mercer similarity] . This similarity has the following options:
+Jelinek Mercer similarity]. The algorithm attempts to capture important patterns in the text while leaving out noise. This similarity has the following options:
 
 [horizontal]
 `lambda`:: The optimal value depends on both the collection and the query. The optimal value is around `0.1`
-for title queries and `0.7` for long queries. Default to `0.1`.
+for title queries and `0.7` for long queries. Defaults to `0.1`. When the value approaches `0`, documents that match more query terms will be ranked higher than those that match fewer terms.
 
 Type name: `LMJelinekMercer`
 
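The effect of small `lambda` values can be checked with a toy calculation. The sketch below assumes the per-term formula used by Lucene's `LMJelinekMercerSimilarity`, `log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C)))`, which comes from linearly interpolating each document's language model with a collection-wide model `P(t|C)` weighted by `lambda`. The collection probabilities, document lengths, term counts, and function names are all made up for illustration.

```python
import math
from typing import Dict


def jm_term_score(tf: int, doc_len: int, coll_prob: float, lam: float) -> float:
    """Per-term score, assumed to follow Lucene's LMJelinekMercerSimilarity:
    log(1 + ((1 - lam) * tf / doc_len) / (lam * coll_prob))."""
    return math.log(1 + ((1 - lam) * tf / doc_len) / (lam * coll_prob))


def jm_doc_score(
    tfs: Dict[str, int], doc_len: int, coll_probs: Dict[str, float], lam: float
) -> float:
    """Sum per-term scores over the query terms the document actually contains.
    Terms with tf == 0 contribute nothing."""
    return sum(
        jm_term_score(tf, doc_len, coll_probs[term], lam)
        for term, tf in tfs.items()
        if tf > 0
    )


if __name__ == "__main__":
    # Hypothetical two-term query whose terms are equally rare in the collection.
    coll_probs = {"quick": 0.001, "fox": 0.001}
    doc_a = {"quick": 1, "fox": 1}   # matches both query terms once
    doc_b = {"quick": 20, "fox": 0}  # matches one query term twenty times
    for lam in (0.1, 0.7):
        a = jm_doc_score(doc_a, 100, coll_probs, lam)
        b = jm_doc_score(doc_b, 100, coll_probs, lam)
        winner = "A" if a > b else "B"
        print(f"lambda={lam}: A={a:.2f}  B={b:.2f}  -> {winner} ranks first")
```

With `lambda = 0.1`, document A (both terms once) outranks document B (one term twenty times); with `lambda = 0.7` the order flips. A small `lambda` makes every matched term contribute a large amount, so the number of matched terms dominates, while a large `lambda` lets raw term frequency carry more weight.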