Merge pull request #15405 from alexg-dev/patch-1

More detailed explanation of some similarity types
2025-03-06 19:09:14 +00:00 · 2015-12-14 14:27:40 +01:00 · f20f41e02e
commit f20f41e02e
parent 6144457f01
1 changed file with 6 additions and 3 deletions
--- a/docs/reference/index-modules/similarity.asciidoc
+++ b/docs/reference/index-modules/similarity.asciidoc
@@ -112,7 +112,10 @@ Type name: `DFR`
 ==== IB similarity.

 http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
-based model] . This similarity has the following options:
+based model] . The algorithm is based on the concept that the information content in any symbolic 'distribution'
+sequence is primarily determined by the repetitive usage of its basic elements.
+For written texts, this would correspond to comparing the writing styles of different authors.
+This similarity has the following options:

 [horizontal]
 `distribution`::  Possible values: `ll` and `spl`.
@@ -138,11 +141,11 @@ Type name: `LMDirichlet`
 ==== LM Jelinek Mercer similarity.

 http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
-Jelinek Mercer similarity] . This similarity has the following options:
+Jelinek Mercer similarity] . The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:

 [horizontal]
 `lambda`::  The optimal value depends on both the collection and the query. The optimal value is around `0.1`
-for title queries and `0.7` for long queries. Default to `0.1`.
+for title queries and `0.7` for long queries. Default to `0.1`. As the value approaches `0`, documents that match more query terms will be ranked higher than those that match fewer terms.

 Type name: `LMJelinekMercer`
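
The effect of `lambda` described in the second hunk can be illustrated with a small sketch. It is not part of the patch. It assumes the per-term score has the form `log(1 + ((1 - lambda) * freq / docLen) / (lambda * collectionProb))`, summed over the query terms that occur in the document; this form is an assumption about how Jelinek-Mercer smoothing is scored, and the function names, documents, and probabilities below are hypothetical.

```python
import math


def jm_term_score(freq, doc_len, collection_prob, lam):
    """Score of one matching term under the assumed Jelinek-Mercer form."""
    return math.log(1.0 + ((1.0 - lam) * freq / doc_len) / (lam * collection_prob))


def jm_score(term_freqs, doc_len, collection_probs, lam):
    """Sum the per-term scores over query terms that occur in the document."""
    return sum(
        jm_term_score(freq, doc_len, collection_probs[term], lam)
        for term, freq in term_freqs.items()
        if freq > 0
    )


# Hypothetical collection statistics for a two-term query ("a", "b").
collection_probs = {"a": 0.01, "b": 0.01}

# Two hypothetical documents of 10 tokens each:
# one repeats a single query term, the other matches both terms once.
doc_one_term = {"a": 5, "b": 0}
doc_two_terms = {"a": 1, "b": 1}

for lam in (0.1, 0.9):
    one = jm_score(doc_one_term, 10, collection_probs, lam)
    two = jm_score(doc_two_terms, 10, collection_probs, lam)
    print(f"lambda={lam}: one-term doc={one:.3f}, two-term doc={two:.3f}")
```

Under these assumptions, the document matching both query terms scores higher at `lambda = 0.1`, while the document repeating a single term scores higher at `lambda = 0.9`, which is consistent with the sentence added to the `lambda` description.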