Mirror of https://github.com/honeymoose/OpenSearch.git, synced 2025-02-19 19:35:02 +00:00

Merge pull request #15758 from nik9000/docs_suggest

suggest_mode is per shard

This change is contained in commit c5917870d8.
@@ -97,20 +97,20 @@ can contain misspellings (See parameter descriptions below).
language model, the suggester will use this field to gain statistics to
score corrections. This field is mandatory.

`gram_size`::
Sets the maximum size of the n-grams (shingles) in the `field`.
If the field doesn't contain n-grams (shingles) this should be omitted
or set to `1`. Note that Elasticsearch tries to detect the gram size
based on the specified `field`. If the field uses a `shingle` filter the
`gram_size` is set to the `max_shingle_size` if not explicitly set.

`real_word_error_likelihood`::
The likelihood of a term being
misspelled even if the term exists in the dictionary. The default is
`0.95`, corresponding to 5% of the real words being misspelled.

`confidence`::
The confidence level defines a factor applied to the
input phrase's score, which is used as a threshold for other suggest
candidates. Only candidates that score higher than the threshold will be
@@ -118,7 +118,7 @@ can contain misspellings (See parameter descriptions below).
only return suggestions that score higher than the input phrase. If set
to `0.0` the top N candidates are returned. The default is `1.0`.

`max_errors`::
The maximum percentage of the terms that can at most be
considered to be misspellings in order to form a correction. This method
accepts a float value in the range `[0..1)` as a fraction of the actual
@@ -126,39 +126,39 @@ can contain misspellings (See parameter descriptions below).
default is set to `1.0`, which means that only corrections with
at most one misspelled term are returned. Note that setting this too high
can negatively impact performance. Low values like `1` or `2` are recommended;
otherwise the time spent in suggest calls might exceed the time spent in
query execution.

`separator`::
The separator that is used to separate terms in the
bigram field. If not set, the whitespace character is used as a
separator.

`size`::
The number of candidates that are generated for each
individual query term. Low numbers like `3` or `5` typically produce good
results. Raising this can bring up terms with higher edit distances. The
default is `5`.

`analyzer`::
Sets the analyzer to analyse the suggest text with.
Defaults to the search analyzer of the suggest field passed via `field`.

`shard_size`::
Sets the maximum number of suggested terms to be
retrieved from each individual shard. During the reduce phase, only the
top N suggestions are returned based on the `size` option. Defaults to
`5`.

`text`::
Sets the text / query to provide suggestions for.

`highlight`::
Sets up suggestion highlighting. If not provided, then
no `highlighted` field is returned. If provided, it must
contain exactly `pre_tag` and `post_tag`, which are
wrapped around the changed tokens. If multiple tokens
in a row are changed, the entire phrase of changed tokens
is wrapped rather than each token.

`collate`::
@@ -217,21 +217,21 @@ curl -XPOST 'localhost:9200/_search' -d {

The `phrase` suggester supports multiple smoothing models to balance
weight between infrequent grams (grams (shingles) that do not exist in
the index) and frequent grams (that appear at least once in the index).

[horizontal]
`stupid_backoff`::
A simple backoff model that backs off to lower
order n-gram models if the higher order count is `0` and discounts the
lower order n-gram model by a constant factor. The default `discount` is
`0.4`. Stupid Backoff is the default model.

`laplace`::
A smoothing model that uses additive smoothing where a
constant (typically `1.0` or smaller) is added to all counts to balance
weights. The default `alpha` is `0.5`.

`linear_interpolation`::
A smoothing model that takes the weighted
mean of the unigrams, bigrams and trigrams based on user-supplied
weights (lambdas). Linear Interpolation doesn't have any default values.
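As an illustration of how the options above fit together, the sketch below selects the `laplace` smoothing model via the `smoothing` option of a `phrase` suggestion. This is not one of the documentation's own listings: the field name `title.trigram` and the suggest text are placeholder assumptions, and the remaining parameters are the ones described in this section.

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "noble prize",
    "simple_phrase" : {
      "phrase" : {
        "field" : "title.trigram",
        "gram_size" : 3,
        "confidence" : 1.0,
        "max_errors" : 0.5,
        "highlight" : {
          "pre_tag" : "<em>",
          "post_tag" : "</em>"
        },
        "smoothing" : {
          "laplace" : {
            "alpha" : 0.7
          }
        }
      }
    }
  }
}'
--------------------------------------------------

The `stupid_backoff` and `linear_interpolation` models are selected the same way, each with its own parameters (the `discount` constant and the lambda weights, respectively).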
@@ -244,7 +244,7 @@ The `phrase` suggester uses candidate generators to produce a list of
possible terms per term in the given text. A single candidate generator
is similar to a `term` suggester called for each individual term in the
text. The output of the generators is subsequently scored in combination
with the candidates from the other terms to form suggestion candidates.

Currently only one type of candidate generator is supported, the
`direct_generator`. The phrase suggest API accepts a list of generators
@@ -256,26 +256,30 @@ called per term in the original text.
The direct generators support the following parameters:

[horizontal]
`field`::
The field to fetch the candidate suggestions from. This is
a required option that either needs to be set globally or per
suggestion.

`size`::
The maximum corrections to be returned per suggest text token.

`suggest_mode`::
-The suggest mode controls what suggestions are
-included or controls for what suggest text terms, suggestions should be
-suggested. Three possible values can be specified:
-** `missing`: Only suggest terms in the suggest text that aren't in the
-index. This is the default.
-** `popular`: Only suggest suggestions that occur in more docs then the
-original suggest text term.
+The suggest mode controls which suggestions are included in the suggestions
+generated on each shard. All values other than `always` can be thought of
+as an optimization to generate fewer suggestions to test on each shard and
+are not rechecked when combining the suggestions generated on each
+shard. Thus `missing` will generate suggestions for terms on shards that do
+not contain them even if other shards do contain them. Those should be
+filtered out using `confidence`. Three possible values can be specified:
+** `missing`: Only generate suggestions for terms that are not in the
+shard. This is the default.
+** `popular`: Only suggest terms that occur in more docs on the shard than
+the original term.
** `always`: Suggest any matching suggestions based on terms in the
suggest text.

`max_edits`::
The maximum edit distance candidate suggestions can have
in order to be considered as a suggestion. Can only be a value between 1
and 2. Any other value results in a bad request error being thrown.
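To make the per-shard behaviour of `suggest_mode` concrete, here is a minimal sketch (not taken from the documentation) of a generator that only suggests terms which are more popular on the local shard, combined with a top-level `confidence` so that shard-local suggestions that do not score higher than the input phrase are dropped during the reduce phase. The field names are placeholder assumptions.

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "noble prize",
    "simple_phrase" : {
      "phrase" : {
        "field" : "title.trigram",
        "confidence" : 1.0,
        "direct_generator" : [ {
          "field" : "title",
          "suggest_mode" : "popular",
          "max_edits" : 2
        } ]
      }
    }
  }
}'
--------------------------------------------------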
@@ -287,11 +291,11 @@ The direct generators support the following parameters:
this number improves spellcheck performance. Usually misspellings don't
occur at the beginning of terms. (The old name "prefix_len" is deprecated.)

`min_word_length`::
The minimum length a suggest text term must have in
order to be included. Defaults to 4. (The old name "min_word_len" is deprecated.)

`max_inspections`::
A factor that is used to multiply with the
`shard_size` in order to inspect more candidate spell corrections on
the shard level. Can improve accuracy at the cost of performance.
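The generator options above are typically combined to trade accuracy against performance. The following is an illustrative sketch only, not a recommended configuration: the field names are placeholders and the values are arbitrary. It restricts candidates to longer terms, skips the first character when generating candidates, and inspects more shard-level candidates than `shard_size` alone would allow.

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "noble prize",
    "simple_phrase" : {
      "phrase" : {
        "field" : "title.trigram",
        "shard_size" : 10,
        "direct_generator" : [ {
          "field" : "title",
          "prefix_length" : 1,
          "min_word_length" : 4,
          "max_inspections" : 5
        } ]
      }
    }
  }
}'
--------------------------------------------------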
@@ -306,7 +310,7 @@ The direct generators support the following parameters:
cannot be fractional. The shard-level document frequencies are used for
this option.

`max_term_freq`::
The maximum threshold in number of documents in which a
suggest text token can exist in order to be included. Can be a relative
percentage number (e.g. 0.4) or an absolute number to represent document
@@ -322,16 +326,16 @@ The direct generators support the following parameters:
tokens passed to this candidate generator. This filter is applied to the
original token before candidates are generated.

`post_filter`::
A filter (analyzer) that is applied to each of the
generated tokens before they are passed to the actual phrase scorer.

The following example shows a `phrase` suggest call with two generators:
the first one uses a field containing ordinary indexed terms, and the
second one uses a field whose terms are indexed with a `reverse` filter
(tokens are indexed in reverse order). This is used to overcome the limitation
of the direct generators to require a constant prefix to provide
high-performance suggestions. The `pre_filter` and `post_filter` options
accept ordinary analyzer names.

[source,js]
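An illustrative sketch of such a two-generator call, not the documentation's own listing, might look like the following. The field names (`body`, `body.bigram`, `body.reversed`) and the analyzer name `reverse_analyzer` are assumptions, with `body.reversed` standing for a sub-field indexed through an analyzer that ends in a `reverse` token filter.

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "obel prize",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body.bigram",
        "size" : 1,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_length" : 1
        }, {
          "field" : "body.reversed",
          "suggest_mode" : "always",
          "min_word_length" : 1,
          "pre_filter" : "reverse_analyzer",
          "post_filter" : "reverse_analyzer"
        } ]
      }
    }
  }
}'
--------------------------------------------------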