Merge pull request #15405 from alexg-dev/patch-1

More detailed explanation of some similarity types
Clinton Gormley 2015-12-14 14:27:40 +01:00
parent 6144457f01
commit f20f41e02e
1 changed files with 6 additions and 3 deletions


@@ -112,7 +112,10 @@ Type name: `DFR`
==== IB similarity.
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
- based model] . This similarity has the following options:
+ based model]. The algorithm is based on the concept that the information content in any symbolic 'distribution'
+ sequence is primarily determined by the repetitive usage of its basic elements.
+ For written texts, this corresponds to comparing the writing styles of different authors.
+ This similarity has the following options:
[horizontal]
`distribution`:: Possible values: `ll` and `spl`.
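A minimal Python sketch of how an information-based score can be computed, assuming the formulas of Lucene's `IBSimilarity` combined with `LambdaDF` and `NormalizationH2` (c = 1). The helper names are hypothetical, and this diff does not show how Elasticsearch selects lambda or the normalization. The score is the information content -log P(X >= tfn), where X follows the chosen `ll` (log-logistic) or `spl` (smoothed power-law) distribution with rate lambda, so a term that occurs in a document more often than its collection-wide rate predicts carries more information.

```python
import math


def tfn_h2(tf, doc_len, avg_doc_len, c=1.0):
    # Assumed normalization H2: scale tf by log2(1 + c * avgdl / dl)
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def lambda_df(doc_freq, num_docs):
    # Assumed LambdaDF: (df + 1) / (N + 1)
    return (doc_freq + 1.0) / (num_docs + 1.0)


def dist_ll(tfn, lam):
    # Log-logistic: P(X >= tfn) = lam / (tfn + lam)
    return -math.log(lam / (tfn + lam))


def dist_spl(tfn, lam):
    # Smoothed power-law: P(X >= tfn) = (lam^(tfn/(tfn+1)) - lam) / (1 - lam)
    if lam == 1.0:
        lam = 0.99  # avoid division by zero when the term is in every document
    return -math.log((lam ** (tfn / (tfn + 1.0)) - lam) / (1.0 - lam))


def ib_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, distribution="ll"):
    """Information content of one term in one document (query boost omitted)."""
    dists = {"ll": dist_ll, "spl": dist_spl}
    if distribution not in dists:
        raise ValueError("distribution must be 'll' or 'spl'")
    tfn = tfn_h2(tf, doc_len, avg_doc_len)
    lam = lambda_df(doc_freq, num_docs)
    return dists[distribution](tfn, lam)
```

For example, with these hypothetical inputs `ib_score(3, 100, 100, 5, 1000)` is larger than `ib_score(1, 100, 100, 5, 1000)` (more occurrences), and `ib_score(3, 100, 100, 1, 1000)` is larger still (a rarer term).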
@@ -138,11 +141,11 @@ Type name: `LMDirichlet`
==== LM Jelinek Mercer similarity.
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
- Jelinek Mercer similarity] . This similarity has the following options:
+ Jelinek Mercer similarity]. The algorithm attempts to capture important patterns in the text while leaving out noise. This similarity has the following options:
[horizontal]
`lambda`:: The optimal value depends on both the collection and the query. The optimal value is around `0.1`
- for title queries and `0.7` for long queries. Default to `0.1`.
+ for title queries and `0.7` for long queries. Defaults to `0.1`. As the value approaches `0`, documents that match more query terms are ranked higher than those that match fewer terms.
Type name: `LMJelinekMercer`
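A minimal Python sketch of the per-term Jelinek-Mercer score, assuming Lucene's `LMJelinekMercerSimilarity` formula log(1 + ((1 - lambda) * tf / dl) / (lambda * P(t|C))), with the query boost omitted and the collection probability P(t|C) passed in directly. The helper names and the two example documents are hypothetical. It illustrates the `lambda` behavior described above: a small lambda gives little weight to the collection model, so a query term missing from a document costs a lot compared with one that matches.

```python
import math


def jm_term_score(tf, doc_len, p_collection, lam):
    """Per-term Jelinek-Mercer score, modeled on Lucene's formula (boost omitted)."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lambda must be in (0, 1]")
    return math.log(1.0 + ((1.0 - lam) * tf / doc_len) / (lam * p_collection))


def jm_score(term_freqs, doc_len, p_collection, lam):
    # Sum over query terms; a term absent from the doc (tf == 0) contributes log(1) == 0.
    return sum(jm_term_score(tf, doc_len, p_collection, lam) for tf in term_freqs)


# Hypothetical two-term query; both terms get collection probability 0.01.
doc_a = [1, 1]   # matches both query terms once
doc_b = [10, 0]  # matches only the first term, ten times

for lam in (0.01, 0.1, 0.7):
    a = jm_score(doc_a, 100, 0.01, lam)
    b = jm_score(doc_b, 100, 0.01, lam)
    print(f"lambda={lam}: doc_a={a:.3f} doc_b={b:.3f}")
```

With these made-up numbers, doc_a (both terms once) outranks doc_b (one term ten times) at lambda=0.01, and the order flips at lambda=0.7.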