Merge pull request #15405 from alexg-dev/patch-1
More detailed explanation of some similarity types
parent
6144457f01
commit
f20f41e02e
|
@@ -112,7 +112,10 @@ Type name: `DFR`
|
||||||
==== IB similarity.
|
==== IB similarity.
|
||||||
|
|
||||||
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
|
||||||
based model] . This similarity has the following options:
|
based model] . The algorithm is based on the concept that the information content in any symbolic 'distribution'
|
||||||
|
sequence is primarily determined by the repetitive usage of its basic elements.
|
||||||
|
For written texts this challenge would correspond to comparing the writing styles of different authors.
|
||||||
|
This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
`distribution`:: Possible values: `ll` and `spl`.
|
`distribution`:: Possible values: `ll` and `spl`.
|
||||||
|
@@ -138,11 +141,11 @@ Type name: `LMDirichlet`
|
||||||
==== LM Jelinek Mercer similarity.
|
==== LM Jelinek Mercer similarity.
|
||||||
|
|
||||||
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
|
||||||
Jelinek Mercer similarity] . This similarity has the following options:
|
Jelinek Mercer similarity] . The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
`lambda`:: The optimal value depends on both the collection and the query. The optimal value is around `0.1`
|
`lambda`:: The optimal value depends on both the collection and the query. The optimal value is around `0.1`
|
||||||
for title queries and `0.7` for long queries. Defaults to `0.1`.
|
for title queries and `0.7` for long queries. Defaults to `0.1`. When the value approaches `0`, documents that match more query terms will be ranked higher than those that match fewer terms.
|
||||||
|
|
||||||
Type name: `LMJelinekMercer`
|
Type name: `LMJelinekMercer`
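
The effect of `lambda` described above can be illustrated with a minimal sketch of textbook Jelinek-Mercer smoothing. This is not Lucene's actual implementation: the function name `jm_score`, the toy documents, and the collection probabilities are all assumptions made up for illustration. Each query term is scored as `log((1 - lambda) * tf/doc_len + lambda * p(term|collection))`, so a small `lambda` makes a missing query term very costly.

```python
import math

def jm_score(query_terms, doc_tf, doc_len, collection_prob, lam):
    """Query log-likelihood under Jelinek-Mercer smoothing (textbook form).

    Hypothetical helper for illustration only; requires 0 < lam < 1 and a
    non-zero collection probability for every query term.
    """
    score = 0.0
    for term in query_terms:
        p_doc = doc_tf.get(term, 0) / doc_len   # maximum-likelihood estimate in the document
        p_coll = collection_prob[term]          # background (collection) language model
        score += math.log((1.0 - lam) * p_doc + lam * p_coll)
    return score

# Toy data (assumed): two documents of equal length and a two-term query.
query = ["x", "y"]
collection_prob = {"x": 0.01, "y": 0.01}
doc_a = {"x": 1, "y": 1}   # matches both query terms once
doc_b = {"x": 5}           # matches only one term, but five times
doc_len = 10

for lam in (0.01, 0.9):
    a = jm_score(query, doc_a, doc_len, collection_prob, lam)
    b = jm_score(query, doc_b, doc_len, collection_prob, lam)
    winner = "doc_a" if a > b else "doc_b"
    print(f"lambda={lam}: doc_a={a:.3f} doc_b={b:.3f} -> {winner} ranks higher")
```

With these toy numbers, the document matching both terms wins at `lambda=0.01`, while the document with many occurrences of a single term wins at `lambda=0.9`. Real rankings depend on the collection statistics and on Lucene's own scoring formula.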
|
||||||
|
|
||||||