OpenSearch/docs/reference
Britta Weber 7944369fd1 Add `shard_min_doc_count` parameter for significant terms similar to `shard_size`
The significant terms aggregation internally maintains a priority queue per shard whose size may be
lower than the number of terms. This queue uses the score as the criterion for deciding whether
a bucket is kept. If many terms with a low subsetDF score very high
but `min_doc_count` is set high, this might result in no terms being
returned at all, because the priority queue is filled with low-frequency terms that are all filtered
out in the end.
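A minimal Python sketch of this failure mode follows. It assumes a single shard and a hypothetical table of per-term `(subsetDF, score)` pairs. The scores are invented and stand in for whatever significance heuristic is configured. `shard_top_terms` and `final_buckets` are illustrative names, not Elasticsearch APIs. The queue is a plain `heapq` min-heap bounded at `shard_size`, and `min_doc_count` is applied only after the queue has been filled.

```python
import heapq

# Hypothetical per-shard statistics: term -> (subset_df, score).
# Scores are invented; real scores come from the configured significance heuristic.
SHARD_TERMS = {
    "rare_a": (1, 9.0),
    "rare_b": (1, 8.5),
    "rare_c": (2, 8.0),
    "common": (40, 3.0),  # the high-frequency term we actually want
}


def shard_top_terms(terms, shard_size):
    """Keep the shard_size highest-scoring terms, like a bounded priority queue."""
    heap = []  # min-heap of (score, term): the weakest candidate sits at heap[0]
    for term, (_subset_df, score) in terms.items():
        if len(heap) < shard_size:
            heapq.heappush(heap, (score, term))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, term))  # evict the weakest candidate
    return {term for _score, term in heap}


def final_buckets(terms, shard_size, min_doc_count):
    """min_doc_count is only applied after the queue has already been filled."""
    kept = shard_top_terms(terms, shard_size)
    return sorted(t for t in kept if terms[t][0] >= min_doc_count)


print(final_buckets(SHARD_TERMS, shard_size=3, min_doc_count=10))  # []
```

The three rare terms occupy every slot, so `common` is never queued and the `min_doc_count` filter then removes everything. With `shard_size=4` the same call returns `['common']`.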

This can be avoided by increasing the `shard_size` parameter.
However, it is not immediately clear what value this parameter must be set to,
because we cannot know how many low-frequency terms score higher than
the high-frequency terms we are actually interested in.
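The difficulty can be made concrete. In a score-ordered top-k queue, a term survives only if `shard_size` is larger than the number of terms that outscore it. The hypothetical helper below computes that bound, but only because it is handed every term's score in advance. The term table is invented for illustration, and score ties are ignored.

```python
# Hypothetical per-shard statistics: term -> (subset_df, score); scores are invented.
SHARD_TERMS = {
    "rare_a": (1, 9.0),
    "rare_b": (1, 8.5),
    "rare_c": (2, 8.0),
    "common": (40, 3.0),
}


def min_shard_size_for(terms, target):
    """Smallest queue size that keeps `target`, ignoring score ties.

    It needs every term's score up front, which is exactly what is not
    known when shard_size has to be chosen.
    """
    target_score = terms[target][1]
    outscoring = sum(
        1 for term, (_df, score) in terms.items()
        if term != target and score > target_score
    )
    return outscoring + 1


print(min_shard_size_for(SHARD_TERMS, "common"))  # 4
```

The answer depends entirely on how many rare, high-scoring terms a shard happens to hold, so no single `shard_size` is safe in advance.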

On the other hand, if no routing of documents to shards is involved, we can perhaps
assume that the documents of each class, and the terms within them, are distributed evenly
across shards. In that case it is simpler not to add terms to the priority queue whose
subsetDF <= `shard_min_doc_count`, which can be set to something like
`min_doc_count`/number of shards. The reasoning is that, even after summing
the subsetDF across shards, such a term would not be expected to reach `min_doc_count`.
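The sketch below shows the same bounded queue with the pre-insertion filter added. It is a minimal illustration that assumes a single shard's view, an invented term table, five evenly populated shards, and integer division for `min_doc_count`/number of shards. The `<=` comparison follows the wording above. The shipped implementation's exact comparison is not shown here. `shard_top_terms` is a hypothetical name.

```python
import heapq

# Hypothetical per-shard statistics: term -> (subset_df, score); scores are invented.
SHARD_TERMS = {
    "rare_a": (1, 9.0),
    "rare_b": (1, 8.5),
    "rare_c": (2, 8.0),
    "common": (40, 3.0),
}


def shard_top_terms(terms, shard_size, shard_min_doc_count):
    """Bounded score-ordered queue that rejects low-subsetDF terms before insertion."""
    heap = []  # min-heap of (score, term)
    for term, (subset_df, score) in terms.items():
        # Terms with subset_df <= shard_min_doc_count never enter the queue,
        # so they cannot crowd out frequent terms.
        if subset_df <= shard_min_doc_count:
            continue
        if len(heap) < shard_size:
            heapq.heappush(heap, (score, term))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, term))
    return {term for _score, term in heap}


min_doc_count = 10
num_shards = 5  # assumed shard count, with documents spread evenly
shard_min_doc_count = min_doc_count // num_shards  # 2

print(sorted(shard_top_terms(SHARD_TERMS, 3, shard_min_doc_count)))  # ['common']
```

With the filter in place, the rare terms are rejected up front, and `common` keeps its slot even though `shard_size` is still 3.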

closes #5998
closes #6041
2014-05-07 18:02:56 +02:00
analysis Update keyword-tokenizer.asciidoc 2014-05-07 15:04:07 +02:00
cat Added type, max, min, queueSize & keepAlive to _cat/thread_pool 2014-04-28 12:00:27 +02:00
cluster fix field data stats doc 2014-05-06 15:57:00 +02:00
docs [DOCS] Fixed get asciidoc missing section warning 2014-04-28 11:39:12 +02:00
images [Doc] Add a chart about the relative error of the percentiles aggregation. 2014-03-14 12:23:23 +01:00
index-modules [DOC] Fix default values for filter cache size and field data circuit breaker. 2014-05-06 10:13:05 +02:00
indices [DOCS] Updated the mapping and field mapping docs to use the new format 2014-05-06 17:21:09 +02:00
mapping Removed mention of Spatial4J and JTS requirement 2014-05-06 14:49:48 +02:00
migration [DOCS] Included the `_percolator` index breaking change to migration docs. 2014-02-20 16:43:06 +01:00
modules Correcting gramma 2014-05-06 18:00:19 +02:00
query-dsl s/boost_factor/boost in custom_filters_score doc 2014-05-06 16:15:36 +02:00
search Add `shard_min_doc_count` parameter for significant terms similar to `shard_size` 2014-05-07 18:02:56 +02:00
setup Update JNA to latest version 2014-05-02 11:52:57 +02:00
testing [TEST] Randomized number of shards used for indices created during tests 2014-03-10 13:01:52 +01:00
analysis.asciidoc Add more anchor links to documentation 2013-09-30 13:13:16 -06:00
api-conventions.asciidoc [DOCS] rewrite -> fuzzy_rewrite in match query 2014-04-23 21:05:14 +02:00
cat.asciidoc Add _cat/plugins endpoint 2014-03-16 12:16:09 +01:00
cluster.asciidoc [DOCS] Fix HTTP endpoints after stats API changes 2014-01-09 11:30:28 +01:00
docs.asciidoc [DOCS] Moved termvector and mtermvectors from search to docs 2014-01-22 14:10:26 +01:00
getting-started.asciidoc Update getting-started.asciidoc 2014-05-06 16:32:33 +02:00
glossary.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00
index-modules.asciidoc Removed 0.90.* deprecation and addition notifications 2014-02-07 20:52:49 +01:00
index.asciidoc [DOCS] getting started tutorial 2014-04-22 13:33:03 -04:00
indices.asciidoc [DOCS] Removed leftover indices status link 2014-04-28 11:39:12 +02:00
mapping.asciidoc [DOCS] Moved multi fields documentation into the core-types page 2014-01-22 10:05:58 +01:00
modules.asciidoc [DOCS] Fixed link to tribe.asciidoc 2014-01-13 22:01:12 +01:00
query-dsl.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00
search.asciidoc Benchmark documentation 2014-04-14 14:08:41 -07:00
setup.asciidoc Author: Sean Gallagher 2014-04-07 14:43:35 -04:00
testing.asciidoc [DOCS] Test framework documentation 2013-12-02 18:01:45 +01:00