Docs: Note on shard vs. index level doc frequencies.

Relates to #10154 and #10150

Adds link to additional information on how document frequencies are treated across shards to the cutoff_frequency parameter documentation.

Closes #10451
Isabel Drost-Fromm 2015-04-07 10:12:39 +02:00 committed by Clinton Gormley
parent 3c52bc1098
commit 60bb65c4d9
3 changed files with 14 additions and 6 deletions
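
To illustrate why the per-shard note matters, here is a minimal Python sketch of how a term could be sorted into the high or low frequency group on a single shard. It is not the Elasticsearch or Lucene implementation: the function names `is_high_frequency` and `split_terms`, the rounding with `math.ceil`, and the strict `>` comparison are assumptions made for illustration. Only the rule that values `>= 1` are absolute and values below `1` are relative comes from the documentation touched by this commit.

```python
import math


def is_high_frequency(doc_freq, shard_doc_count, cutoff_frequency):
    """Return True if a term counts as high frequency on one shard.

    Hypothetical helper: the exact comparison and rounding are assumptions.
    """
    if cutoff_frequency >= 1:
        # Absolute cutoff: compare against a fixed document count.
        threshold = cutoff_frequency
    else:
        # Relative cutoff: scale by the number of documents on THIS shard,
        # not by the number of documents in the whole index.
        threshold = math.ceil(cutoff_frequency * shard_doc_count)
    return doc_freq > threshold


def split_terms(term_doc_freqs, shard_doc_count, cutoff_frequency):
    """Split terms into (high, low) frequency lists for one shard."""
    high, low = [], []
    for term, doc_freq in term_doc_freqs.items():
        if is_high_frequency(doc_freq, shard_doc_count, cutoff_frequency):
            high.append(term)
        else:
            low.append(term)
    return high, low


# Same query terms, same cutoff, two shards of very different size.
large_shard = split_terms({"the": 9, "quick": 1}, shard_doc_count=10, cutoff_frequency=0.5)
small_shard = split_terms({"the": 1, "quick": 1}, shard_doc_count=2, cutoff_frequency=0.5)
```

With these made-up numbers, "the" lands in the high frequency group on the larger shard but in the low frequency group on the two-document shard. Small test indexes spread over several shards can behave unexpectedly for the same reason.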


@ -1,9 +1,10 @@
[[elasticsearch-reference]]
= Reference
:version: 1.5.0
:branch: 1.5
:jdk: 1.8.0_25
:version: 1.5.0
:branch: 1.5
:jdk: 1.8.0_25
:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current
include::getting-started.asciidoc[]


@ -53,7 +53,9 @@ in this case a high enough value should probably be used.
Terms are allocated to the high or low frequency groups based on the
`cutoff_frequency`, which can be specified as an absolute frequency
(`>=1`) or as a relative frequency (`0.0 .. 1.0`).
(`>=1`) or as a relative frequency (`0.0 .. 1.0`). (Remember that document
frequencies are computed on a per shard level as explained in the blog post
{defguide}/relevance-is-broken.html[Relevance is broken].)
Perhaps the most interesting property of this query is that it adapts to
domain specific stopwords automatically. For example, on a video hosting


@ -94,8 +94,8 @@ the query terms are above the given `cutoff_frequency` the query is
automatically transformed into a pure conjunction (`and`) query to
ensure fast execution.
The `cutoff_frequency` can either be relative to the number of documents
in the index if in the range `[0..1)` or absolute if greater or equal to
The `cutoff_frequency` can either be relative to the total number of
documents if in the range `[0..1)` or absolute if greater or equal to
`1.0`.
Here is an example showing a query composed of stopwords exclusively:
@ -112,6 +112,11 @@ Here is an example showing a query composed of stopwords exclusively:
}
--------------------------------------------------
IMPORTANT: The `cutoff_frequency` option operates at the per-shard level. This means
that when trying it out on test indexes with low document numbers you
should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
[float]
===== phrase