From 60bb65c4d98b430d3b634dc0f2c840ca7a0f89a1 Mon Sep 17 00:00:00 2001
From: Isabel Drost-Fromm
Date: Tue, 7 Apr 2015 10:12:39 +0200
Subject: [PATCH] Docs: Note on shard vs. index level doc frequencies.

Relates to #10154 and #10150

Adds link to additional information on how document frequencies are
treated across shards to the cutoff_frequency parameter documentation.

Closes #10451
---
 docs/reference/index.asciidoc                             | 7 ++++---
 .../query-dsl/queries/common-terms-query.asciidoc         | 4 +++-
 docs/reference/query-dsl/queries/match-query.asciidoc     | 9 +++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/docs/reference/index.asciidoc b/docs/reference/index.asciidoc
index 1c26d8d7673..3a5945d9931 100644
--- a/docs/reference/index.asciidoc
+++ b/docs/reference/index.asciidoc
@@ -1,9 +1,10 @@
 [[elasticsearch-reference]]
 = Reference
 
-:version: 1.5.0
-:branch: 1.5
-:jdk: 1.8.0_25
+:version: 1.5.0
+:branch: 1.5
+:jdk: 1.8.0_25
+:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current
 
 include::getting-started.asciidoc[]
 
diff --git a/docs/reference/query-dsl/queries/common-terms-query.asciidoc b/docs/reference/query-dsl/queries/common-terms-query.asciidoc
index 256d9bb74af..3e9a73e31dc 100644
--- a/docs/reference/query-dsl/queries/common-terms-query.asciidoc
+++ b/docs/reference/query-dsl/queries/common-terms-query.asciidoc
@@ -53,7 +53,9 @@ in this case a high enough value should probably be used.
 
 Terms are allocated to the high or low frequency groups based on the
 `cutoff_frequency`, which can be specified as an absolute frequency
-(`>=1`) or as a relative frequency (`0.0 .. 1.0`).
+(`>=1`) or as a relative frequency (`0.0 .. 1.0`). (Remember that document
+frequencies are computed on a per shard level as explained in the blog post
+{defguide}/relevance-is-broken.html[Relevence is broken].)
 
 Perhaps the most interesting property of this query is that it adapts to
 domain specific stopwords automatically. For example, on a video hosting
diff --git a/docs/reference/query-dsl/queries/match-query.asciidoc b/docs/reference/query-dsl/queries/match-query.asciidoc
index 674bf17b630..e2a8178135c 100644
--- a/docs/reference/query-dsl/queries/match-query.asciidoc
+++ b/docs/reference/query-dsl/queries/match-query.asciidoc
@@ -94,8 +94,8 @@ the query terms are above the given `cutoff_frequency` the query is
 automatically transformed into a pure conjunction (`and`) query to
 ensure fast execution.
 
-The `cutoff_frequency` can either be relative to the number of documents
-in the index if in the range `[0..1)` or absolute if greater or equal to
+The `cutoff_frequency` can either be relative to the total number of
+documents if in the range `[0..1)` or absolute if greater or equal to
 `1.0`.
 
 Here is an example showing a query composed of stopwords exclusivly:
@@ -112,6 +112,11 @@ Here is an example showing a query composed of stopwords exclusivly:
 }
 --------------------------------------------------
+
+IMPORTANT: The `cutoff_frequency` option operates on a per-shard-level. This means
+that when trying it out on test indexes with low document numbers you
+should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
+
 [float]
 ===== phrase
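
Note on the per-shard behaviour described in this patch: the sketch below is a toy model, not Elasticsearch or Lucene code. It assumes a term counts as high frequency when its document frequency on a shard is above the cutoff, with the cutoff read as a fraction of that shard's documents when it is below `1.0` and as an absolute count otherwise. The function names, the shard layout, and the counts are all hypothetical and chosen only to show why a small test index can give surprising results.

```python
def is_high_frequency(doc_freq, doc_count, cutoff_frequency):
    """Toy classification of one term against cutoff_frequency.

    Assumption (not taken from the Elasticsearch source): a cutoff below
    1.0 is a fraction of doc_count, a cutoff of 1.0 or more is an
    absolute document count, and "high frequency" means strictly above it.
    """
    if cutoff_frequency >= 1.0:
        threshold = cutoff_frequency              # absolute frequency
    else:
        threshold = cutoff_frequency * doc_count  # relative frequency
    return doc_freq > threshold


def classify_per_shard(term, shards, cutoff_frequency):
    """Apply the toy rule to each shard separately."""
    return {
        name: is_high_frequency(
            shard["doc_freq"].get(term, 0), shard["doc_count"], cutoff_frequency
        )
        for name, shard in shards.items()
    }


# Hypothetical test index: 10 documents spread unevenly over two shards.
shards = {
    "shard_0": {"doc_count": 8, "doc_freq": {"the": 2}},
    "shard_1": {"doc_count": 2, "doc_freq": {"the": 2}},
}

per_shard = classify_per_shard("the", shards, 0.5)

# The same term judged against index-wide totals (4 of 10 documents).
index_wide = is_high_frequency(4, 10, 0.5)
```

In this made-up layout the term is low frequency on `shard_0`, high frequency on `shard_1`, and low frequency when judged against the whole index, which is the kind of discrepancy the linked "Relevance is broken" advice is meant to help with on small test indexes.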