[DOCS] Document the `string_distance` parameter for term suggestor
Commit 50d184066f (parent 11f5a2ebaf)
@ -9,70 +9,70 @@ suggest text is analyzed before terms are suggested. The suggested terms
are provided per analyzed suggest text token. The `term` suggester
does not take into account the query that is part of the request.

==== Common suggest options:

[horizontal]
`text`::
The suggest text. The suggest text is a required option that
needs to be set globally or per suggestion.

`field`::
The field to fetch the candidate suggestions from. This is
a required option that needs to be set either globally or per
suggestion.

`analyzer`::
The analyzer used to analyze the suggest text. Defaults
to the search analyzer of the suggest field.

`size`::
The maximum number of corrections to be returned per suggest text
token.

`sort`::
Defines how suggestions are sorted per suggest text
term. Two possible values:
+
** `score`: Sort by score first, then document frequency and
then the term itself.
** `frequency`: Sort by document frequency first, then similarity
score and then the term itself.

`suggest_mode`::
The suggest mode controls which suggestions are included, that is,
for which suggest text terms suggestions are made. Three possible
values can be specified:
+
** `missing`: Only provide suggestions for suggest text terms that are
not in the index. This is the default.
** `popular`: Only suggest terms that occur in more documents than
the original suggest text term.
** `always`: Suggest any matching suggestions based on terms in the
suggest text.
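The `sort`, `suggest_mode` and `size` descriptions combine into a simple per-token pipeline: filter by mode, order, then truncate. The following is a minimal Python sketch of that documented behaviour, not Elasticsearch's implementation. `Candidate`, `apply_suggest_mode`, `sort_candidates` and `suggest` are hypothetical names, candidate scores and document frequencies are taken as given, and breaking final ties by ascending term text is an assumption (the text above only says "then the term itself").

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    term: str     # candidate correction
    score: float  # similarity to the suggest text token (higher = closer)
    freq: int     # document frequency of the candidate


def apply_suggest_mode(candidates: List[Candidate], token_freq: int,
                       mode: str = "missing") -> List[Candidate]:
    """Filter candidates for one suggest text token, per the suggest_mode text."""
    if mode == "missing":
        # Suggest only when the token itself is not in the index.
        return list(candidates) if token_freq == 0 else []
    if mode == "popular":
        # Keep only candidates found in more documents than the token.
        return [c for c in candidates if c.freq > token_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode!r}")


def sort_candidates(candidates: List[Candidate], sort: str) -> List[Candidate]:
    """Order candidates per the sort text; final ties use ascending term (assumed)."""
    if sort == "score":
        key = lambda c: (-c.score, -c.freq, c.term)
    elif sort == "frequency":
        key = lambda c: (-c.freq, -c.score, c.term)
    else:
        raise ValueError(f"unknown sort: {sort!r}")
    return sorted(candidates, key=key)


def suggest(candidates: List[Candidate], token_freq: int, *, size: int,
            sort: str, mode: str = "missing") -> List[Candidate]:
    """Filter, order, then keep at most `size` corrections for one token."""
    return sort_candidates(apply_suggest_mode(candidates, token_freq, mode), sort)[:size]


if __name__ == "__main__":
    cands = [Candidate("test", 0.75, 10), Candidate("text", 0.75, 30),
             Candidate("tent", 0.5, 50)]
    print([c.term for c in suggest(cands, 0, size=2, sort="score")])      # ['text', 'test']
    print([c.term for c in suggest(cands, 0, size=2, sort="frequency")])  # ['tent', 'text']
```

With the default `missing` mode, a token that already occurs in the index (`token_freq > 0`) gets no suggestions at all, regardless of `sort` or `size`.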

==== Other term suggest options:

[horizontal]
`lowercase_terms`::
Lowercases the suggest text terms after text analysis.

`max_edits`::
The maximum edit distance candidate suggestions can
have in order to be considered as a suggestion. Can only be a value
between 1 and 2. Any other value results in a bad request error being
thrown. Defaults to 2.

`prefix_length`::
The minimum number of prefix characters that must
match in order to be a candidate suggestion. Defaults to 1. Increasing
this number improves spellcheck performance. Misspellings usually don't
occur at the beginning of terms. (The old name "prefix_len" is deprecated.)

`min_word_length`::
The minimum length a suggest text term must have in
order to be included. Defaults to 4. (The old name "min_word_len" is deprecated.)

`shard_size`::
Sets the maximum number of suggestions to be retrieved
from each individual shard. During the reduce phase, only the top N
suggestions are returned based on the `size` option. Defaults to the
@ -81,24 +81,24 @@ doesn't take the query into account that is part of request.
corrections at the cost of performance. Because terms are
partitioned among shards, the shard-level document frequencies of
spelling corrections may not be precise. Increasing this makes these
document frequencies more precise.

`max_inspections`::
A factor that is multiplied by
`shard_size` in order to inspect more candidate spelling corrections at
the shard level. Can improve accuracy at the cost of performance.
Defaults to 5.

`min_doc_freq`::
The minimum threshold in number of documents a
suggestion should appear in. This can be specified as an absolute number
or as a relative percentage of the number of documents. This can improve
quality by only suggesting high frequency terms. Defaults to 0f, which
disables it. If a value higher than 1 is specified, then the number
cannot be fractional. The shard-level document frequencies are used for
this option.

`max_term_freq`::
The maximum threshold in number of documents a
suggest text token can exist in order to be included. Can be a relative
percentage number (e.g., 0.4) or an absolute number to represent document
@ -108,3 +108,15 @@ doesn't take the query into account that is part of request.
usually spelled correctly. On top of this, this option also improves the
spellcheck performance. The shard-level document frequencies are used for
this option.

`string_distance`::
Which string distance implementation to use for comparing how similar
suggested terms are. Five possible values can be specified:
+
** `internal`: The default, based on Damerau-Levenshtein but highly optimized
for comparing string distances for terms inside the index.
** `damerau_levenshtein`: String distance algorithm based on the
Damerau-Levenshtein algorithm.
** `levenstein`: String distance algorithm based on the Levenshtein edit
distance algorithm.
** `jarowinkler`: String distance algorithm based on the Jaro-Winkler algorithm.
** `ngram`: String distance algorithm based on character n-grams.
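To make the difference between two of these distance families concrete, here is a minimal Python sketch of textbook Levenshtein and the "optimal string alignment" variant of Damerau-Levenshtein, which also counts a swap of two adjacent characters as one edit. It is not the Elasticsearch or Lucene code: these functions return raw edit counts, and the implementations behind `string_distance` may normalize or score differently. The hypothetical `is_candidate` helper shows how `max_edits`, `prefix_length` and `min_word_length`, with the defaults documented above, could gate candidates; which distance the real candidate search uses for `max_edits` is left to its `distance` argument as an assumption.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Insertions, deletions and substitutions each cost one edit."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def damerau_levenshtein_osa(a: str, b: str) -> int:
    """Levenshtein plus one-edit swaps of adjacent characters
    (the "optimal string alignment" variant of Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def is_candidate(token: str, term: str, *, max_edits: int = 2, prefix_length: int = 1,
                 min_word_length: int = 4,
                 distance: Callable[[str, str], int] = levenshtein) -> bool:
    """Hypothetical gate applying max_edits, prefix_length and min_word_length."""
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")  # the docs: bad request error
    if len(token) < min_word_length:
        return False  # token too short to be corrected
    if token[:prefix_length] != term[:prefix_length]:
        return False  # leading characters must match exactly
    # Excluding the token itself (distance 0) is an assumption of this sketch.
    return 0 < distance(token, term) <= max_edits


if __name__ == "__main__":
    print(levenshtein("tset", "test"), damerau_levenshtein_osa("tset", "test"))  # 2 1
    print(is_candidate("tset", "test", max_edits=1))  # False
    print(is_candidate("tset", "test", max_edits=1, distance=damerau_levenshtein_osa))  # True
```

The transposed typo "tset" is two edits away from "test" under plain Levenshtein but only one under Damerau-Levenshtein, so with `max_edits` set to 1 the choice of distance decides whether "test" is offered at all.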