Docs: Note on shard vs. index level doc frequencies.
Relates to #10154 and #10150

Adds link to additional information on how document frequencies are treated across shards to the cutoff_frequency parameter documentation.

Closes #10451
parent 3c52bc1098
commit 60bb65c4d9
@@ -4,6 +4,7 @@
 :version: 1.5.0
 :branch: 1.5
 :jdk: 1.8.0_25
+:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current

 include::getting-started.asciidoc[]

@@ -53,7 +53,9 @@ in this case a high enough value should probably be used.

 Terms are allocated to the high or low frequency groups based on the
 `cutoff_frequency`, which can be specified as an absolute frequency
-(`>=1`) or as a relative frequency (`0.0 .. 1.0`).
+(`>=1`) or as a relative frequency (`0.0 .. 1.0`). (Remember that document
+frequencies are computed on a per shard level as explained in the blog post
+{defguide}/relevance-is-broken.html[Relevance is broken].)

 Perhaps the most interesting property of this query is that it adapts to
 domain specific stopwords automatically. For example, on a video hosting

@@ -94,8 +94,8 @@ the query terms are above the given `cutoff_frequency` the query is
 automatically transformed into a pure conjunction (`and`) query to
 ensure fast execution.

-The `cutoff_frequency` can either be relative to the number of documents
-in the index if in the range `[0..1)` or absolute if greater or equal to
+The `cutoff_frequency` can either be relative to the total number of
+documents if in the range `[0..1)` or absolute if greater or equal to
 `1.0`.

 Here is an example showing a query composed of stopwords exclusively:

@@ -112,6 +112,11 @@ Here is an example showing a query composed of stopwords exclusively:
 }
 --------------------------------------------------

+
+IMPORTANT: The `cutoff_frequency` option operates on a per-shard level. This means
+that when trying it out on test indexes with low document numbers you
+should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
+
 [float]
 ===== phrase

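To illustrate the per-shard note added in the last hunk, here is a minimal Python sketch of how a term might be assigned to the high or low frequency group on each shard. It is not Elasticsearch code: the function names, the shard statistics, and the strict greater-than comparison against a rounded-up threshold are all assumptions made for illustration.

```python
import math


def is_high_frequency(doc_freq, shard_doc_count, cutoff_frequency):
    """Return True if a term counts as high frequency on one shard.

    Assumption: a cutoff >= 1 is an absolute document count, while a
    cutoff in [0, 1) is a fraction of the documents on this shard.
    """
    if cutoff_frequency >= 1:
        threshold = cutoff_frequency
    else:
        threshold = math.ceil(cutoff_frequency * shard_doc_count)
    return doc_freq > threshold


def split_terms(term_doc_freqs, shard_doc_count, cutoff_frequency):
    """Split terms into (low, high) frequency groups for one shard."""
    low, high = [], []
    for term, doc_freq in term_doc_freqs.items():
        if is_high_frequency(doc_freq, shard_doc_count, cutoff_frequency):
            high.append(term)
        else:
            low.append(term)
    return low, high


# Hypothetical statistics: one large shard and one tiny test shard.
large_shard = split_terms({"the": 900, "video": 40}, 1000, 0.25)
tiny_shard = split_terms({"the": 4, "video": 2}, 4, 0.25)
```

With a relative cutoff of `0.25`, `video` lands in the low frequency group on the 1000-document shard but in the high frequency group on the 4-document shard. The same query can therefore behave differently on a small test index than on a production-sized one.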