Commit Graph

38 Commits

Author SHA1 Message Date
Flavio Pompermaier f1bab2fa89 [DOCS] Correct sum_other_doc_count value in terms agg example (#45028)
Closes issue #41902
2019-07-31 14:10:36 -04:00
Antonio Matarrese 79c7a57737 Use the breadth first collection mode for significant terms aggs. (#29042)
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.

Closes #28652.
2019-04-11 15:56:02 -07:00
Samuel Cifuentes García ca83408542 Improved Terms Aggregation documentation (#38892)
Added a note after the first query example talking about fielddata.
2019-03-05 10:44:59 -05:00
Christoph Büscher 25aac4f77f
Remove `include_type_name` in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Luca Cavanna 42ea644903
Remove single shard optimization when suggesting shard_size (#37041)
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.

Such optimization requires us access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we will introduce multiple reduction steps (one per cluster) each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster
in which case the current code would optimize number of terms causing
a loss of precision.

While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.

This commit removes the single shard optimization, meaning that we will
always have heuristic enabled on how many number of buckets to collect
on the shards, even when searching against a single shard.

This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.

Relates to #32125
2019-01-02 17:45:49 +01:00
Serge Populov 13af5d5d7f Docs: Fix typo in field name in aggregations (#34223) 2018-10-02 10:54:29 -04:00
Colm O'Shea 97b379e0d4 fix no=>not typo (#32463)
Found a tiny typo while reading the docs
2018-07-31 13:33:23 +01:00
Clinton Gormley 45c1e37740 Add defined ID to terms agg size header 2018-02-02 13:43:20 +01:00
Jim Ferenczi 65184d0b5b
Adds a note in the `terms` aggregation docs regarding pagination (#28360)
This change adds a note in the `terms` aggregation that explains how to retrieve **all**
terms (or all combinations of terms in a nested agg) using the `composite` aggregation.
2018-01-25 08:59:41 +01:00
Christoph Büscher 0d11b9fe34
[Docs] Unify spelling of Elasticsearch (#27567)
Removes occurences of "elasticsearch" or "ElasticSearch" in favour of
"Elasticsearch" where appropriate.
2017-11-29 09:44:25 +01:00
Tanguy Leroux 643eb286dc [Docs] Convert remaining code snippets in docs (#26422)
This commit converts the last remaining code snippets so that they are
now testable.
2017-08-30 12:11:10 +02:00
Jim Ferenczi 977dcfe789 Deprecate global_ordinals_hash and global_ordinals_low_cardinality (#26173)
* Deprecate global_ordinals_hash and global_ordinals_low_cardinality

This change deprecates the `global_ordinals_hash` and `global_ordinals_low_cardinality` and
makes the `global_ordinals` execution hint choose internally if global ords should be remapped or use the segment ord directly.
These hints are too sensitive and expert to be exposed and we should be able to take the right decision internally based on the agg tree.
2017-08-21 19:12:27 +02:00
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
Ryan Ernst 463fe2f4d4 Scripting: Remove file scripts (#24627)
This commit removes file scripts, which were deprecated in 5.5.

closes #21798
2017-05-17 14:42:25 -07:00
qwerty4030 e7d352b489 Compound order for histogram aggregations. (#22343)
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.

Relates to #20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).

Relates to #14771: histogram and date_histogram aggregation order will now be validated at reduce time.

Closes #23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
2017-05-11 18:06:26 +01:00
Jim Ferenczi 433c822d4f Promote longs to doubles when a terms agg mixes decimal and non-decimal numbers (#22449)
* Promote longs to doubles when a terms agg mixes decimal and non-decimal number

This change makes the terms aggregation work when the buckets coming from different indices are a mix of decimal numbers and non-decimal numbers. In this case non-decimal number (longs) are promoted to decimal (double) which can result in a loss of precision for big numbers.

Fixes #22232
2017-01-10 11:50:56 +01:00
Colin Goodheart-Smithe 8006b105f3 Update order examples to use max instead of avg (#22032)
The use of the avg aggregation for sorting the terms aggregation is not encouraged since it has unbounded error. This changes the examples to use the max aggregation which does not suffer the same issues
2016-12-07 16:00:24 +00:00
markharwood aa60e5cc07 Aggregations - support for partitioning set of terms used in aggregations so that multiple requests can be done without trying to compute everything in one request.
Closes #21487
2016-11-24 15:10:46 +00:00
Pascal Borreli fcb01deb34 Fixed typos (#20843) 2016-10-10 14:51:47 -06:00
Nik Everett 5cff2a046d Remove most of the need for `// NOTCONSOLE`
and be much more stingy about what we consider a console candidate.

* Add `// CONSOLE` to check-running
* Fix version in some snippets
* Mark groovy snippets as groovy
* Fix versions in plugins
* Fix language marker errors
* Fix language parsing in snippets

  This adds support for snippets who's language is written like
  `[source, txt]` and `["source","js",subs="attributes,callouts"]`.

  This also makes language required for snippets which is nice because
  then we can be sure we can grep for snippets in a particular language.
2016-09-06 10:32:54 -04:00
Clinton Gormley de208cf78c Fied bad asciidoc 2016-08-18 14:08:58 +02:00
Clinton Gormley 31e5e0b17f Document that pipeline aggs cannot be used for sorting
Closes #20037
2016-08-18 13:52:45 +02:00
Adrien Grand dcc598c414 Make the heuristic to compute the default shard size less aggressive.
The current heuristic to compute a default shard size is pretty aggressive,
it returns `max(10, number_of_shards * size)` as a value for the shard size.
I think making it less aggressive has the benefit that it would reduce the
likelyness of running into OOME when there are many shards (yearly
aggregations with time-based indices can make numbers of shards in the
thousands) and make the use of breadth-first more likely/efficient.

This commit replaces the heuristic with `size * 1.5 + 10`, which is enough
to have good accuracy on zipfian distributions.
2016-07-29 09:59:29 +02:00
Jared McQueen d97b3fd817 [docs] missing a comma in the terms aggregation example 2016-07-27 12:59:38 -04:00
Leon Weidauer 1297a707da non-binary gender option in term aggr. example (#19188)
* non-binary gender option in term aggr. example

* replace gender with music genre for term aggregation docs
2016-07-01 14:59:03 +02:00
Jim Ferenczi fb2a48d0f0 Revert "Remove support for sorting terms aggregation by ascending count"
This is delayed after alpha4 since Kibana relies on it.
2016-06-17 17:14:01 +02:00
Jim Ferenczi 755721953b Remove support for sorting terms aggregation by ascending count
closes #17614
2016-06-17 15:06:49 +02:00
Jim Ferenczi ad232aebbe Set collection mode to breadth_first in the terms aggregation when the cardinality of the field is unknown or smaller than the requested size.
closes #9825
2016-06-16 11:33:40 +02:00
Colin Goodheart-Smithe cfd3356ee3 Remove size 0 options in aggregations
This removes the ability to set `size: 0` in the `terms`, `significant_terms` and `geohash_grid` aggregations for the reasons described in https://github.com/elastic/elasticsearch/issues/18838

Closes #18838
2016-06-14 13:07:02 +01:00
Robert Muir c5532d3df0 add a rest test for this that seems to work, fix the documentation. thanks @s1monw 2016-05-11 16:07:08 -04:00
Jim Ferenczi 052191f2a2 Add the ability to use the breadth_first mode with nested aggregations (such as `top_hits`) which require access to score information.
The score is recomputed lazily for each document belonging to a top bucket.
Relates to #9825
2016-05-04 15:35:45 +02:00
Sergii Golubev 434a563fe0 terms-aggregation.asciidoc tiny edit 2016-04-13 16:51:47 -06:00
Adrien Grand 1d0239c125 Add a warning about the impact of sorting terms aggregations on the accuracy of doc counts. 2016-04-07 16:57:44 +02:00
Colin Goodheart-Smithe 35a58d874e Scripting: Unify script and template requests across codebase
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs

Closes #11091
2015-05-29 16:52:04 +01:00
Adrien Grand 32e23b9100 Aggs: Make it possible to configure missing values.
Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to consider when a
field does not have a value. This can be handy if you eg. want a terms
aggregation to handle the same way documents that have "N/A" or no value
for a `tag` field.

This works in a very similar way to the `missing` option on the `sort`
element.

One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.

Related to #5324
2015-05-15 16:26:58 +02:00
Zachary Tong e3ae1df6f0 [DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00