Improve similarity docs. (#29089)
This adds links to the relevant Lucene javadocs and warnings regarding similarities that might return 0 as a score. Close #29015
parent 08c530907a
commit 1d6ed824c7
@@ -97,22 +97,38 @@ similarity has the following option:
 Type name: `classic`

 [float]
-[[drf]]
+[[dfr]]
 ==== DFR similarity

 Similarity that implements the
-http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
 from randomness] framework. This similarity has the following options:

 [horizontal]
 `basic_model`::
-Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.
+Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelBE.html[`be`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelD.html[`d`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelG.html[`g`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIF.html[`if`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIn.html[`in`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIne.html[`ine`] and
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelP.html[`p`].
+
+`be`, `d` and `p` should be avoided in practice as they might return scores that
+are equal to 0 or infinite with terms that do not meet the expected random
+distribution.

 `after_effect`::
-Possible values: `no`, `b` and `l`.
+Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffect.NoAfterEffect.html[`no`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectB.html[`b`] and
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectL.html[`l`].

 `normalization`::
-Possible values: `no`, `h1`, `h2`, `h3` and `z`.
+Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/Normalization.NoNormalization.html[`no`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH1.html[`h1`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH2.html[`h2`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH3.html[`h3`] and
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationZ.html[`z`].

 All options but the first option need a normalization value.

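To illustrate how the DFR options in the hunk above combine, here is a minimal Python sketch that builds an index settings body. It is not part of the commit. The similarity name `my_dfr` and the helper `dfr_settings` are hypothetical, the values `g`, `l` and `h2` are just one valid combination, and the setting keys `normalization.<name>.c` (and `normalization.z.z` for `z`) are assumed to be how the normalization value is passed; check the reference for your version.

```python
import json

# Basic models the docs advise against: they may score 0 or infinity.
RISKY_BASIC_MODELS = {"be", "d", "p"}


def dfr_settings(name, basic_model, after_effect, normalization, value=None):
    """Build an index settings body declaring a custom DFR similarity.

    `name` and this helper are illustrative only, not an official API.
    """
    if basic_model in RISKY_BASIC_MODELS:
        raise ValueError(
            f"basic_model {basic_model!r} may return 0 or infinite scores"
        )
    similarity = {
        "type": "DFR",
        "basic_model": basic_model,
        "after_effect": after_effect,
        "normalization": normalization,
    }
    # Every normalization except `no` takes a value (assumed key names).
    if normalization != "no" and value is not None:
        suffix = "z" if normalization == "z" else "c"
        similarity[f"normalization.{normalization}.{suffix}"] = str(value)
    return {"settings": {"index": {"similarity": {name: similarity}}}}


body = dfr_settings("my_dfr", "g", "l", "h2", value=3.0)
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of an index creation request; rejecting `be`, `d` and `p` up front mirrors the warning added by this commit.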
@@ -127,7 +143,14 @@ model.
 This similarity has the following options:

 [horizontal]
-`independence_measure`:: Possible values `standardized`, `saturated`, `chisquared`.
+`independence_measure`:: Possible values
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceStandardized.html[`standardized`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceSaturated.html[`saturated`],
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceChiSquared.html[`chisquared`].
+
+When using this similarity, it is highly recommended to remove stop words to get
+good relevance. Also beware that terms whose frequency is less than the expected
+frequency will get a score equal to 0.

 Type name: `DFI`

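The zero-score caveat for DFI in the hunk above can be sketched in Python. This is a simplified model under stated assumptions, not Lucene's code: the expected frequency is assumed to be the term's collection-wide frequency scaled by the document's share of all field tokens, the `standardized` measure is assumed to be `(freq - expected) / sqrt(expected)`, and all function and parameter names are hypothetical.

```python
import math


def expected_frequency(total_term_freq, doc_len, num_field_tokens):
    """Occurrences expected in a document if the term were spread evenly."""
    return total_term_freq * doc_len / num_field_tokens


def dfi_score(freq, total_term_freq, doc_len, num_field_tokens):
    """Simplified DFI-style score with an assumed `standardized` measure."""
    expected = expected_frequency(total_term_freq, doc_len, num_field_tokens)
    # Terms that occur no more often than expected contribute nothing.
    if freq <= expected:
        return 0.0
    measure = (freq - expected) / math.sqrt(expected)
    return math.log2(measure + 1.0)


# A very common word: expected 5 occurrences in this document, only 3 seen.
common = dfi_score(freq=3, total_term_freq=50_000, doc_len=100,
                   num_field_tokens=1_000_000)
# A rare word seen twice where far less than one occurrence is expected.
rare = dfi_score(freq=2, total_term_freq=10, doc_len=100,
                 num_field_tokens=1_000_000)
print(common, rare)
```

Under these assumptions a frequent word easily falls below its expected frequency and scores 0, which is consistent with the recommendation to remove stop words.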
@@ -135,15 +158,19 @@ Type name: `DFI`
 [[ib]]
 ==== IB similarity.

-http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/IBSimilarity.html[Information
 based model] . The algorithm is based on the concept that the information content in any symbolic 'distribution'
 sequence is primarily determined by the repetitive usage of its basic elements.
 For written texts this challenge would correspond to comparing the writing styles of different authors.
 This similarity has the following options:

 [horizontal]
-`distribution`:: Possible values: `ll` and `spl`.
-`lambda`:: Possible values: `df` and `ttf`.
+`distribution`:: Possible values:
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionLL.html[`ll`] and
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionSPL.html[`spl`].
+`lambda`:: Possible values:
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaDF.html[`df`] and
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaTTF.html[`ttf`].
 `normalization`:: Same as in `DFR` similarity.

 Type name: `IB`
@@ -152,19 +179,23 @@ Type name: `IB`
 [[lm_dirichlet]]
 ==== LM Dirichlet similarity.

-http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
 Dirichlet similarity] . This similarity has the following options:

 [horizontal]
 `mu`:: Default to `2000`.

+The scoring formula in the paper assigns negative scores to terms that have
+fewer occurrences than predicted by the language model, which is illegal to
+Lucene, so such terms get a score of 0.
+
 Type name: `LMDirichlet`

 [float]
 [[lm_jelinek_mercer]]
 ==== LM Jelinek Mercer similarity.

-http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
+{lucene-core-javadoc}/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
 Jelinek Mercer similarity] . The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:

 [horizontal]
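The clamping described in the LM Dirichlet paragraph of the hunk above can be sketched in Python. This is an assumption-laden illustration rather than Lucene's implementation: it uses the usual Dirichlet-smoothed form `log(1 + freq / (mu * p)) + log(mu / (doc_len + mu))`, omits query boosts, and the function and parameter names are hypothetical. `collection_prob` stands for the probability of the term in the whole collection.

```python
import math


def lm_dirichlet_score(freq, doc_len, collection_prob, mu=2000.0):
    """Dirichlet-smoothed language model term score, clamped at 0."""
    raw = (math.log(1.0 + freq / (mu * collection_prob))
           + math.log(mu / (doc_len + mu)))
    # The raw value is negative when the term occurs less often than the
    # language model predicts; such terms are given a score of 0 instead.
    return max(raw, 0.0)


# One occurrence of a fairly common term in a long document: clamped to 0.
below_model = lm_dirichlet_score(freq=1, doc_len=1000, collection_prob=0.01)
# Ten occurrences of a rare term in a short document: positive score.
above_model = lm_dirichlet_score(freq=10, doc_len=100, collection_prob=0.0001)
print(below_model, above_model)
```

The default `mu=2000.0` mirrors the documented default for the `mu` option; larger values smooth more heavily toward the collection model.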