OpenSearch/docs/reference/search
Britta Weber 7944369fd1 Add `shard_min_doc_count` parameter for significant terms similar to `shard_size`
The significant terms aggregation internally maintains a priority queue per shard whose size
(`shard_size`) may be lower than the number of candidate terms. The queue uses the score as the
criterion for deciding whether a bucket is kept. If many terms with a low subsetDF score very high
but `min_doc_count` is set high, no terms may be returned at all: the priority queue fills up with
low-frequency terms, which are then all filtered out by `min_doc_count` in the end.
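To make the failure mode concrete, here is a minimal single-shard sketch. It assumes scores are already computed, models the bounded per-shard priority queue with `heapq.nlargest`, and uses hypothetical names (`Term`, `shard_top_terms`, `reduce_and_filter`); it is an illustration, not the Elasticsearch implementation.

```python
import heapq
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical per-shard candidate term (not the real Elasticsearch class)."""
    name: str
    subset_df: int   # docs in the foreground subset containing the term
    score: float     # significance score, assumed precomputed


def shard_top_terms(terms, shard_size):
    """Keep the shard_size highest-scoring terms, as a bounded priority queue would."""
    return heapq.nlargest(shard_size, terms, key=lambda t: t.score)


def reduce_and_filter(kept, min_doc_count):
    """Final step: drop buckets whose subset_df is below min_doc_count."""
    return [t for t in kept if t.subset_df >= min_doc_count]


# Many rare terms score very high; the frequent term we care about scores lower.
terms = [Term(f"rare{i}", subset_df=1, score=10.0 - i * 0.1) for i in range(20)]
terms.append(Term("frequent", subset_df=50, score=2.0))

kept = shard_top_terms(terms, shard_size=10)
print(reduce_and_filter(kept, min_doc_count=10))  # prints []
```

The ten queue slots all go to rare terms, so the frequent term never survives to the `min_doc_count` check.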

This can be avoided by increasing the `shard_size` parameter. However, it is not immediately
clear what value this parameter must be set to, because we cannot know in advance how many
low-frequency terms score higher than the high-frequency terms we are actually interested in.
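The difficulty of choosing `shard_size` can be illustrated with a hypothetical helper, `min_shard_size_needed`, under the simplifying assumption of a single shard whose scores are already known. The smallest sufficient `shard_size` is one more than the number of terms that outscore the best term meeting `min_doc_count`, and that count depends entirely on the data.

```python
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical per-shard candidate term (not the real Elasticsearch class)."""
    name: str
    subset_df: int
    score: float


def min_shard_size_needed(terms, min_doc_count):
    """Smallest shard_size that keeps at least one term with subset_df >= min_doc_count.

    This is 1 + the number of terms that outscore the best qualifying term,
    which is only known after every term on the shard has been scored.
    """
    ranked = sorted(terms, key=lambda t: t.score, reverse=True)
    for rank, term in enumerate(ranked, start=1):
        if term.subset_df >= min_doc_count:
            return rank
    return None  # no term on this shard can reach min_doc_count


rare = [Term(f"rare{i}", 1, 10.0 - i * 0.1) for i in range(20)]
frequent = Term("frequent", 50, 2.0)

print(min_shard_size_needed(rare + [frequent], 10))      # prints 21
print(min_shard_size_needed(rare[:5] + [frequent], 10))  # prints 6
```

The same frequent term needs a `shard_size` of 21 or of 6 depending only on how many rare terms happen to outscore it.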

On the other hand, if no routing of documents to shards is involved, we can perhaps assume that
the documents of each class, and the terms they contain, are distributed evenly across shards.
In that case it may be simpler to not add terms to the priority queue at all if their
subsetDF <= `shard_min_doc_count`, where `shard_min_doc_count` can be set to something like
`min_doc_count`/number of shards. Under the even-distribution assumption, such terms would not
reach `min_doc_count` even after their subsetDF is summed across shards.
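A minimal sketch of the proposed shard-level filter, assuming precomputed scores and an even distribution of terms across shards. Terms with subsetDF <= `shard_min_doc_count` are dropped before the bounded queue is filled, so they cannot crowd out frequent terms. `Term` and `shard_top_terms` are hypothetical names, and flooring `min_doc_count / number of shards` to an integer is an assumption of this sketch.

```python
import heapq
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical per-shard candidate term (not the real Elasticsearch class)."""
    name: str
    subset_df: int   # docs in the foreground subset containing the term, on this shard
    score: float     # significance score, assumed precomputed


def shard_top_terms(terms, shard_size, shard_min_doc_count):
    """Return the shard_size best-scoring terms, skipping any term whose
    subset_df <= shard_min_doc_count before it can occupy a queue slot."""
    candidates = (t for t in terms if t.subset_df > shard_min_doc_count)
    return heapq.nlargest(shard_size, candidates, key=lambda t: t.score)


min_doc_count = 10
number_of_shards = 5
# Rule of thumb from the text, floored here (assumption): min_doc_count / number of shards.
shard_min = min_doc_count // number_of_shards  # 2

# Many high-scoring rare terms and one frequent term on this shard.
terms = [Term(f"rare{i}", subset_df=1, score=10.0 - i * 0.1) for i in range(20)]
terms.append(Term("frequent", subset_df=10, score=2.0))

kept = shard_top_terms(terms, shard_size=10, shard_min_doc_count=shard_min)
print([t.name for t in kept])  # prints ['frequent']
```

Filtering before the queue, rather than after the reduce, is what frees the slots; the cost is that a term spread very thinly across many shards could be dropped even though its total would have reached `min_doc_count`.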

closes #5998
closes #6041
2014-05-07 18:02:56 +02:00
aggregations Add `shard_min_doc_count` parameter for significant terms similar to `shard_size` 2014-05-07 18:02:56 +02:00
facets [Doc] doc updates for date histogram interval 2014-03-14 18:55:32 +01:00
request [DOCS] Add /_search_shards documentation 2014-04-22 08:54:32 -06:00
suggesters [DOCS] Update phrase-suggest.asciidoc 2014-05-06 10:28:13 +02:00
aggregations.asciidoc [DOCS] Various aggregation doc fixes 2014-03-13 09:05:25 +01:00
benchmark.asciidoc Update benchmark.asciidoc 2014-04-22 14:16:10 +02:00
count.asciidoc [DOCS] fixed count docs, it now requires a top-level query object, same as other apis 2014-02-13 13:36:20 +01:00
explain.asciidoc [DOCS] updated json responses after #4310 and #4480 2014-01-16 12:01:39 +01:00
facets.asciidoc Cleanup comments and class names s/ElasticSearch/Elasticsearch 2014-01-07 11:21:51 +01:00
more-like-this.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00
multi-search.asciidoc [DOCS] Documented rest.action.multi.allow_explicit_index 2013-11-27 17:33:09 +01:00
percolate.asciidoc Update percolate.asciidoc 2014-04-15 16:01:44 +02:00
request-body.asciidoc [DOCS] Add /_search_shards documentation 2014-04-22 08:54:32 -06:00
search-template.asciidoc Update Documentation Feature Flags [1.1.0] 2014-03-25 17:51:30 +01:00
search.asciidoc [DOCS] Reorganised common API conventions 2013-10-13 16:46:56 +02:00
suggesters.asciidoc Fix some typos in documentation. 2014-03-31 13:48:17 +02:00
uri-request.asciidoc Cleanup comments and class names s/ElasticSearch/Elasticsearch 2014-01-07 11:21:51 +01:00
validate.asciidoc [DOCS] fixed count and validate query docs, they now require a top-level query object, same as other apis 2014-02-13 11:42:04 +01:00