Docs: Note on shard vs. index level doc frequencies.
Relates to #10154 and #10150. Adds a link to additional information on how document frequencies are treated across shards to the `cutoff_frequency` parameter documentation. Closes #10451
parent
3c52bc1098
commit
60bb65c4d9
|
@ -1,9 +1,10 @@
|
||||||
[[elasticsearch-reference]]
|
[[elasticsearch-reference]]
|
||||||
= Reference
|
= Reference
|
||||||
|
|
||||||
:version: 1.5.0
|
:version: 1.5.0
|
||||||
:branch: 1.5
|
:branch: 1.5
|
||||||
:jdk: 1.8.0_25
|
:jdk: 1.8.0_25
|
||||||
|
:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current
|
||||||
|
|
||||||
include::getting-started.asciidoc[]
|
include::getting-started.asciidoc[]
|
||||||
|
|
||||||
|
|
|
@ -53,7 +53,9 @@ in this case a high enough value should probably be used.
|
||||||
|
|
||||||
Terms are allocated to the high or low frequency groups based on the
|
Terms are allocated to the high or low frequency groups based on the
|
||||||
`cutoff_frequency`, which can be specified as an absolute frequency
|
`cutoff_frequency`, which can be specified as an absolute frequency
|
||||||
(`>=1`) or as a relative frequency (`0.0 .. 1.0`).
|
(`>=1`) or as a relative frequency (`0.0 .. 1.0`). (Remember that document
|
||||||
|
frequencies are computed on a per shard level as explained in the blog post
|
||||||
|
{defguide}/relevance-is-broken.html[Relevance is broken].)
|
||||||
|
|
||||||
Perhaps the most interesting property of this query is that it adapts to
|
Perhaps the most interesting property of this query is that it adapts to
|
||||||
domain specific stopwords automatically. For example, on a video hosting
|
domain specific stopwords automatically. For example, on a video hosting
|
||||||
|
|
|
@ -94,8 +94,8 @@ the query terms are above the given `cutoff_frequency` the query is
|
||||||
automatically transformed into a pure conjunction (`and`) query to
|
automatically transformed into a pure conjunction (`and`) query to
|
||||||
ensure fast execution.
|
ensure fast execution.
|
||||||
|
|
||||||
The `cutoff_frequency` can either be relative to the number of documents
|
The `cutoff_frequency` can either be relative to the total number of
|
||||||
in the index if in the range `[0..1)` or absolute if greater or equal to
|
documents if in the range `[0..1)` or absolute if greater or equal to
|
||||||
`1.0`.
|
`1.0`.
|
||||||
|
|
||||||
Here is an example showing a query composed of stopwords exclusively:
|
Here is an example showing a query composed of stopwords exclusively:
|
||||||
|
@ -112,6 +112,11 @@ Here is an example showing a query composed of stopwords exclusively:
|
||||||
}
|
}
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
IMPORTANT: The `cutoff_frequency` option operates at the per-shard level. This means
|
||||||
|
that when trying it out on test indexes with low document numbers you
|
||||||
|
should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
===== phrase
|
===== phrase
|
||||||
|
|
||||||
|
|
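Note on the per-shard behaviour described in this change: one rough way to see why it matters on small test indexes is to classify a single term against each shard's own statistics and then against the index-wide totals. The sketch below is only an illustration. `is_high_frequency` is a hypothetical helper, the shard counts are made up, and the strict greater-than comparison against a rounded-up threshold is an assumption, not a copy of the Elasticsearch or Lucene implementation.

```python
import math


def is_high_frequency(doc_freq, total_docs, cutoff_frequency):
    """Hypothetical helper: decide whether a term falls in the high frequency group.

    Assumption: a cutoff >= 1 is treated as an absolute document count, and a
    cutoff below 1 as a fraction of total_docs, rounded up.
    """
    if cutoff_frequency >= 1:
        threshold = cutoff_frequency
    else:
        threshold = math.ceil(cutoff_frequency * total_docs)
    return doc_freq > threshold


# Made-up statistics for one term in a tiny two-shard test index:
# (documents containing the term, documents on the shard)
shards = [(3, 4), (0, 6)]
cutoff = 0.5

# Each shard classifies the term using only its own counts.
per_shard = [is_high_frequency(df, n, cutoff) for df, n in shards]

# The same term classified against the index-wide totals.
index_wide = is_high_frequency(
    sum(df for df, _ in shards),
    sum(n for _, n in shards),
    cutoff,
)

print(per_shard, index_wide)
```

With these made-up numbers the term lands in the high frequency group on the first shard only, even though it would be a low frequency term when measured against the whole index. On larger indexes with evenly distributed documents the per-shard and index-wide ratios tend to converge, which is why the effect mostly shows up in small tests.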