Docs: Note on shard vs. index level doc frequencies.

Relates to #10154 and #10150

Adds link to additional information on how document frequencies are treated across shards to the cutoff_frequency parameter documentation.

Closes #10451
Isabel Drost-Fromm 2015-04-07 10:12:39 +02:00 committed by Clinton Gormley
parent 3c52bc1098
commit 60bb65c4d9
3 changed files with 14 additions and 6 deletions

@@ -1,9 +1,10 @@
 [[elasticsearch-reference]]
 = Reference
 :version: 1.5.0
 :branch: 1.5
 :jdk: 1.8.0_25
+:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current
 include::getting-started.asciidoc[]

@@ -53,7 +53,9 @@ in this case a high enough value should probably be used.
 Terms are allocated to the high or low frequency groups based on the
 `cutoff_frequency`, which can be specified as an absolute frequency
-(`>=1`) or as a relative frequency (`0.0 .. 1.0`).
+(`>=1`) or as a relative frequency (`0.0 .. 1.0`). (Remember that document
+frequencies are computed on a per shard level as explained in the blog post
+{defguide}/relevance-is-broken.html[Relevance is broken].)
 Perhaps the most interesting property of this query is that it adapts to
 domain specific stopwords automatically. For example, on a video hosting

@@ -94,8 +94,8 @@ the query terms are above the given `cutoff_frequency` the query is
 automatically transformed into a pure conjunction (`and`) query to
 ensure fast execution.
-The `cutoff_frequency` can either be relative to the number of documents
-in the index if in the range `[0..1)` or absolute if greater or equal to
+The `cutoff_frequency` can either be relative to the total number of
+documents if in the range `[0..1)` or absolute if greater or equal to
 `1.0`.
 Here is an example showing a query composed of stopwords exclusively:
@@ -112,6 +112,11 @@ Here is an example showing a query composed of stopwords exclusively:
 }
 --------------------------------------------------
+IMPORTANT: The `cutoff_frequency` option operates on a per-shard level. This means
+that when trying it out on test indexes with low document numbers you
+should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
 [float]
 ===== phrase
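
As a rough illustration of the per-shard note this commit adds, the Python sketch below shows why a relative `cutoff_frequency` can behave differently on a tiny test shard than on a large one. The helper name `is_high_frequency`, its parameters, and the round-up rule for the relative threshold are assumptions made for this example; it is not Elasticsearch's actual implementation.

```python
import math


def is_high_frequency(doc_freq, shard_doc_count, cutoff_frequency):
    """Hypothetical helper: classify one term on one shard."""
    if cutoff_frequency >= 1:
        # Absolute cutoff: compare against a document count directly.
        threshold = cutoff_frequency
    else:
        # Relative cutoff: a fraction of the documents on this shard
        # (rounding up is an assumption of this sketch).
        threshold = math.ceil(cutoff_frequency * shard_doc_count)
    return doc_freq > threshold


# Same term (3 matching documents on the shard), relative cutoff of 0.001:
small_shard = is_high_frequency(doc_freq=3, shard_doc_count=5, cutoff_frequency=0.001)
large_shard = is_high_frequency(doc_freq=3, shard_doc_count=1_000_000, cutoff_frequency=0.001)
print(small_shard, large_shard)  # prints: True False
```

Under these assumptions the term counts as high frequency on the 5-document shard but as low frequency on the million-document shard, which is the kind of discrepancy the added `IMPORTANT` note warns about for small test indexes.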