192 Commits

Author SHA1 Message Date
Jim Ferenczi 2acafd4b15
Optimize composite aggregation based on index sorting (#48399) (#50272)
Co-authored-by: Daniel Huang <danielhuang@tencent.com>

This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification.
In such cases, the composite aggregation can use the natural order of the index sort to terminate collection early once it reaches a composite key that is greater than the bottom of the queue.
The optimization also applies when a query other than match_all is provided. However, it is deactivated for sources that match the index sort in the following cases:
  * Multi-valued sources, in which case early termination is not possible.
  * missing_bucket is set to true
2019-12-20 12:32:37 +01:00
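The early-termination rule described in the commit above can be sketched in Python. This is a minimal illustration, not the Elasticsearch implementation, and it assumes that documents arrive as single-valued composite keys already sorted in ascending order (standing in for the index sort) and that `missing_bucket` is not involved. The function name `collect_composite` is hypothetical. Because the input is sorted, the "bottom" of the queue is simply the last key admitted, so the first new key seen after the queue fills proves that no later document can qualify.

```python
def collect_composite(sorted_keys, size):
    """Count documents per composite key, keeping the `size` smallest keys.

    Assumes `sorted_keys` is ascending (the hypothetical index sort), so the
    queue's bottom is simply the last key that was admitted.
    """
    buckets = {}  # composite key -> doc count; dicts keep insertion order
    visited = 0
    for key in sorted_keys:
        if key not in buckets and len(buckets) == size:
            # The queue is full and this new key is greater than its bottom.
            # Every remaining key is at least as large, so stop collecting.
            break
        buckets[key] = buckets.get(key, 0) + 1
        visited += 1
    return list(buckets.items()), visited


docs = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("c", 1), ("c", 2)]
top, visited = collect_composite(docs, size=2)
print(top)      # [(('a', 1), 2), (('a', 2), 1)]
print(visited)  # 3
```

Only three of the six documents are visited. With a multi-valued source, a single document can contribute keys out of sorted order, which would invalidate the `break`; that is consistent with the commit disabling the optimization in that case.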
Lisa Cawley 30d66828ae [DOCS] Move transform resource definitions into APIs (#50108) 2019-12-17 12:31:31 -08:00
Lisa Cawley ca895d3ad5 [DOCS] Merge rollup config details into API (#49412) 2019-11-22 08:39:49 -08:00
James Rodewig 852622d970 [DOCS] Remove binary gendered language (#48362) 2019-10-23 09:37:12 -05:00
Mark Tozzi e404f7ea80
DocValueFormat implementation for date range fields (#47472) (#47605) 2019-10-04 17:21:17 -04:00
Mark Tozzi 5bdf25320a
Documentation notes for Range field histograms (#46890) (#47366) 2019-10-01 10:58:44 -04:00
Javier Ruiz a5661ac03a [DOCS] Fix calendar interval typos for date histo agg (#46911) 2019-09-20 15:22:41 -04:00
James Rodewig 99130114de
[DOCS] Correct several [source,console-result] snippets (#46930) (#46937) 2019-09-20 12:20:12 -04:00
James Rodewig 043471c643
[DOCS] Minor improvement to the nested aggregation docs (#46475) (#46604)
* Minor improvement to the nested aggregation docs

* The attributes name and resellers.name were rather confusing,
  especially since the first one was dynamically mapped and not shown
  in the documentation (you had to read the test to see it). This
  change introduces a unique name for the nested attribute and adds
  the example document to the documentation.
* Change the index name from "index" to something more meaningful.

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-11 12:06:42 -04:00
James Rodewig f04573f8e8
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
James Rodewig bb7bff5e30
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result]" (#46295) (#46418) 2019-09-06 09:22:08 -04:00
markharwood 323ec022be
Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394)
Following performance optimisations to the adjacency_matrix aggregation, this setting is no longer required. It is marked as deprecated and due for removal in 8.0.

Related #46324
2019-09-06 13:59:47 +01:00
James Rodewig 1f36c4e50c
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159) (#46332) 2019-09-05 10:11:25 -04:00
LHearen 8f86faca5c [DOCS] Correct conditional clause in histogram agg docs (#45643) 2019-08-19 10:09:46 -04:00
LHearen da0a785685 [DOCS] Fix a 'value' -> 'values' typo in histogram aggregation docs (#45642) 2019-08-19 10:02:59 -04:00
Flavio Pompermaier f1bab2fa89 [DOCS] Correct sum_other_doc_count value in terms agg example (#45028)
Closes issue #41902
2019-07-31 14:10:36 -04:00
Sandeep Kanabar 8f1a3ab70a [Docs] Update daterange-aggregation.asciidoc (#44730)
Correcting the value to be the same as that specified for "missing".
2019-07-29 12:50:33 +02:00
James Rodewig d46545f729 [DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:18:23 -04:00
Zachary Tong 3fa677ce79 Document that pipeline aggs are not compatible with composite agg (#44180) 2019-07-12 12:35:18 -04:00
Zachary Tong ea1794832f Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long tail of keywords, e.g. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again.  If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter.  If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
2019-07-01 10:30:02 -04:00
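The map-plus-filter scheme described above can be sketched as follows. This is a simplified illustration under a stated assumption: an ordinary Python `set` stands in for the cuckoo filter. A real cuckoo filter is probabilistic and can report false positives (so a genuinely rare term could occasionally be treated as common), while the set here is exact. The names `rare_terms` and `threshold` are hypothetical.

```python
def rare_terms(terms, threshold=1):
    """Return {term: count} for terms seen at most `threshold` times."""
    counts = {}     # candidate "rare" terms and their counters
    common = set()  # exact stand-in for the probabilistic cuckoo filter
    for term in terms:
        if term in common:
            # Previously evicted from the map: assume it is "common".
            continue
        counts[term] = counts.get(term, 0) + 1
        if counts[term] > threshold:
            # Counter surpassed the threshold: move the term to the filter.
            del counts[term]
            common.add(term)
    return counts


print(rare_terms(["a", "b", "a", "c", "a", "d", "d"]))  # {'b': 1, 'c': 1}
```

Memory for the map stays proportional to the number of rare candidates, and common terms cost only a filter entry, which is the trade-off the commit message describes against a terms aggregation with a huge `size`.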
Paul Sanwald 8578aba654
[backport] Adds a minimum interval to `auto_date_histogram`. (#42814) (#43285)
Backports minimum interval to date histogram
2019-06-19 07:06:45 -04:00
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to a confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (i.e. the next smallest unit). This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and Java clients.
2019-05-20 12:07:29 -04:00
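The ambiguity and its fix can be illustrated with a small Python sketch. The helpers below are hypothetical and only mirror the parsing order the commit describes: the legacy `interval` tries calendar units first and falls back to fixed, while the new style requires exactly one of `calendar_interval` or `fixed_interval`. The unit tables are an assumption and cover only short forms; Elasticsearch also accepts word forms such as `day` for calendar intervals.

```python
# Assumed subset of units; not an exhaustive copy of Elasticsearch's parser.
CALENDAR_UNITS = {"1m": "minute", "1h": "hour", "1d": "day", "1w": "week",
                  "1M": "month", "1q": "quarter", "1y": "year"}
FIXED_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_fixed_ms(interval):
    """Parse a fixed interval such as "90m" or "2d" into milliseconds."""
    for suffix in ("ms", "s", "m", "h", "d"):  # "ms" first so it is not read as "m"
        number = interval[: -len(suffix)]
        if interval.endswith(suffix) and number.isdigit():
            return int(number) * FIXED_UNIT_MS[suffix]
    raise ValueError(f"not a fixed interval: {interval!r}")


def legacy_interval(interval):
    """Old dual-purpose `interval`: try calendar first, then fall back to fixed."""
    if interval in CALENDAR_UNITS:
        return ("calendar", CALENDAR_UNITS[interval])
    return ("fixed", parse_fixed_ms(interval))


def explicit_interval(calendar_interval=None, fixed_interval=None):
    """New style: exactly one parameter, each with a single meaning."""
    if (calendar_interval is None) == (fixed_interval is None):
        raise ValueError("set exactly one of calendar_interval or fixed_interval")
    if calendar_interval is not None:
        if calendar_interval not in CALENDAR_UNITS:
            raise ValueError(f"not a calendar interval: {calendar_interval!r}")
        return ("calendar", CALENDAR_UNITS[calendar_interval])
    return ("fixed", parse_fixed_ms(fixed_interval))


print(legacy_interval("1d"))                   # ('calendar', 'day')
print(legacy_interval("2d"))                   # ('fixed', 172800000)
print(explicit_interval(fixed_interval="1d"))  # ('fixed', 86400000), same as "24h"
```

The legacy function shows why `1d` and `2d` land in different interval families, while the explicit version lets `1d` mean exactly 24 hours of fixed time.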
James Rodewig 53702efddd [DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:20:17 -04:00
Zachary Tong ec5dd0594f Disallow null/empty or duplicate composite sources (#41359)
Adds some validation to prevent duplicate source names from being
used in the composite agg.

Also refactored to use a ConstructingObjectParser and removed the
private ctor and setter for sources, making it mandatory.
2019-04-24 13:23:31 -04:00
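The validation described above amounts to two checks on the `sources` list. The Python sketch below is an illustration only, not the Java `ConstructingObjectParser` code; the function name and error messages are hypothetical. It assumes each source is a single-key object of the form `{name: {source_type: {...}}}`, as in composite aggregation requests.

```python
def validate_sources(sources):
    """Reject null/empty source lists and duplicate source names."""
    if not sources:
        raise ValueError("composite [sources] cannot be null or empty")
    names = []
    for source in sources:
        (name,) = source.keys()  # each source is {name: {type: {...}}}
        if name in names:
            raise ValueError(f"duplicate composite source name [{name}]")
        names.append(name)
    return names


sources = [
    {"date": {"date_histogram": {"field": "ts", "calendar_interval": "1d"}}},
    {"product": {"terms": {"field": "product"}}},
]
print(validate_sources(sources))  # ['date', 'product']
```

Source names double as keys in each bucket's composite `key` object, so a duplicate name would make the bucket keys ambiguous.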
Jason Tedor 454148eee6
Fix intervals section of auto date-histogram docs (#41203)
This section should be at the same sub-level as other sections in the
auto date-histogram docs; otherwise it is rendered onto another page
and it is hard for users to understand what it refers to.
2019-04-15 11:28:12 -04:00
Antonio Matarrese 79c7a57737 Use the breadth first collection mode for significant terms aggs. (#29042)
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.

Closes #28652.
2019-04-11 15:56:02 -07:00
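Why breadth-first collection saves memory can be seen in a generic two-pass sketch of a terms-style aggregation with a `sum` sub-aggregation. This is a simplified, hypothetical illustration: it buffers `(term, value)` pairs in a list, which is not how Elasticsearch defers documents internally. The point is that sub-aggregation state is created only for the buckets that survive pruning, instead of for every bucket as in depth-first collection.

```python
from collections import Counter, defaultdict


def breadth_first(docs, size):
    """Top-`size` terms by doc count, with a sum sub-aggregation per term.

    Pass 1 counts every term but defers sub-aggregation work; pass 2 replays
    the deferred docs only for the surviving buckets.
    """
    counts = Counter()
    deferred = []  # docs buffered for the second pass
    for term, value in docs:
        counts[term] += 1
        deferred.append((term, value))
    survivors = {term for term, _ in counts.most_common(size)}
    sums = defaultdict(int)  # sub-aggregation state exists only for survivors
    for term, value in deferred:
        if term in survivors:
            sums[term] += value
    return dict(sums)


docs = [("x", 5), ("y", 1), ("x", 2), ("z", 7), ("x", 1), ("y", 3)]
print(breadth_first(docs, size=2))  # {'x': 8, 'y': 4}
```

A depth-first pass would also have built sub-aggregation state for `z`, which is then discarded; with deep sub-aggregation trees and many pruned buckets, that wasted state is what drives the memory issues the commit mentions.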
Lisa Cawley e120deb08f [DOCS] Fixes callout for Asciidoctor migration (#41127) 2019-04-11 12:06:10 -07:00
Ian 95409d3a7e Correct date in daterange-aggregation.asciidoc (#39727) 2019-03-06 11:29:32 +01:00
Samuel Cifuentes García ca83408542 Improved Terms Aggregation documentation (#38892)
Added a note after the first query example talking about fielddata.
2019-03-05 10:44:59 -05:00
Hannes Van De Vreken 27cf7e27e7 Fix typo in DateRange docs (yyy → yyyy) (#38883) 2019-02-15 10:19:54 -05:00
Alexander Reelsen 8e5e48319e
Add documentation about breaking java time changes (#38886)
In addition, remove Joda-Time mentions across the docs and make
sure links are updated to the java.time Javadocs.

Forward port of #38720
2019-02-14 10:18:12 +01:00
Yuri Astrakhan f3cde06a1d
geotile_grid implementation (#37842)
Implements `geotile_grid` aggregation

This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240

This code uses the same base classes as `geohash_grid` agg, but uses a different hashing
algorithm to allow zoom consistency.  Each grid bucket is aligned to Web Mercator tiles.
2019-01-31 19:11:30 -05:00
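The Web Mercator alignment mentioned above can be illustrated with the standard slippy-map tile formula. The sketch below is not the `geotile_grid` source: the edge clamping and rounding are assumptions that may differ from Elasticsearch. The `zoom/x/y` key format matches the bucket keys `geotile_grid` returns. Zoom consistency follows from the formula, because the tile at zoom `z + 1` always lies inside the tile `(x // 2, y // 2)` at zoom `z`.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator cannot represent the poles


def geotile_key(lat, lon, zoom):
    """Return the "zoom/x/y" key of the Web Mercator tile containing a point."""
    tiles = 1 << zoom  # 2**zoom tiles along each axis
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * tiles)
    # Points exactly on the east edge or the latitude limit can land on `tiles`.
    x = min(max(x, 0), tiles - 1)
    y = min(max(y, 0), tiles - 1)
    return f"{zoom}/{x}/{y}"


print(geotile_key(0.0, 0.0, 1))  # 1/1/1
```

A geohash cell, by contrast, alternates longitude and latitude bits, so its shape changes between precision levels instead of nesting cleanly into map tiles.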
Jim Ferenczi cb451edb01
Allow nested fields in the composite aggregation (#37178)
This change adds support for `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as the parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.

Closes #28611
2019-01-25 14:00:39 +01:00
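A request body following the pattern described above, with a `nested` aggregation as the parent of a `composite` aggregation, might look like the Python dict below. The mapping is hypothetical: it assumes `resellers` is a `nested` field with a keyword sub-field `resellers.name`, and the aggregation names are illustrative.

```python
import json

# Hypothetical mapping: "resellers" is `nested`, with a keyword "resellers.name".
request_body = {
    "size": 0,
    "aggs": {
        "resellers": {
            "nested": {"path": "resellers"},  # parent switches to the nested docs
            "aggs": {
                "by_reseller": {
                    "composite": {
                        "sources": [
                            {"reseller": {"terms": {"field": "resellers.name"}}}
                        ]
                    }
                }
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The `composite` sources can then refer to `resellers.*` fields because the `nested` parent has already moved collection onto the nested documents.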
Christoph Büscher 967de04257
Uppercasing some docs section title (#37781)
Section titles are mostly uppercase; only in the few cases where query DSL parameters
or Java method names are used as the title should they be lowercased.
2019-01-24 22:54:55 +01:00
Christoph Büscher 95a6951f78
Use new bulk API endpoint in the docs (#37698)
This change switches to using the typeless bulk API endpoint in the
documentation snippets where possible
2019-01-23 09:46:28 +01:00
Christoph Büscher 3a96608b3f
Remove more include_type_name and types from docs (#37601) 2019-01-18 14:11:18 +01:00
Christoph Büscher 25aac4f77f
Remove `include_type_name` in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
like they made use of the "_doc" type for mappings. This is mostly the case,
e.g., in the analysis docs, where index creation often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same approach
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Igor Motov d6acd8e15f
Docs: add clarification about geohash use in geohashgrid agg (#36901)
Adds an example of translating geohashes returned by the geohashgrid
agg as bucket keys into geo bounding box filters in Elasticsearch as well
as in 3rd-party applications.

Closes #36413
2019-01-03 15:40:48 -05:00
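For a third-party application that needs explicit coordinates, a geohash bucket key can be decoded into its cell's corners with the standard geohash bisection algorithm, then wrapped in a `geo_bounding_box` filter. The sketch below is illustrative: the helper names and the default field name `location` are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_bbox(geohash):
    """Decode a geohash into the cell's top_left/bottom_right corners."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return {"top_left": {"lat": lat_hi, "lon": lon_lo},
            "bottom_right": {"lat": lat_lo, "lon": lon_hi}}


def bucket_filter(geohash, field="location"):
    """Build a geo_bounding_box filter for a geohash grid bucket key."""
    return {"geo_bounding_box": {field: geohash_bbox(geohash)}}


print(bucket_filter("s"))
```

For the one-character key `s`, the cell spans latitudes 0 to 45 and longitudes 0 to 45.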
Luca Cavanna 42ea644903
Remove single shard optimization when suggesting shard_size (#37041)
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.

Such an optimization requires access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we introduce multiple reduction steps (one per cluster), each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster,
in which case the current code would optimize the number of terms, causing
a loss of precision.

While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.

This commit removes the single shard optimization, meaning that we will
always apply the heuristic that determines how many buckets to collect
on the shards, even when searching against a single shard.

This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.

Relates to #32125
2019-01-02 17:45:49 +01:00
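The effect of removing the single-shard shortcut can be made concrete. The sketch below assumes the heuristic is the `size * 1.5 + 10` default documented for the terms aggregation's `shard_size`. The helper names are hypothetical, and the sketch ignores any further adjustments Elasticsearch may make to an explicit `shard_size`.

```python
def suggest_shard_size(size):
    """Buckets collected per shard for a requested `size` (size * 1.5 + 10)."""
    return int(size * 1.5 + 10)


def effective_shard_size(size, shard_size=None):
    """An explicit shard_size wins; otherwise the heuristic always applies."""
    return suggest_shard_size(size) if shard_size is None else shard_size


print(effective_shard_size(10))                 # 25, even on a single shard
print(effective_shard_size(10, shard_size=10))  # 10, the suggested workaround
```

Before this change, a single-shard search with `size: 10` would have collected only 10 buckets; afterwards it collects 25 unless `shard_size` is set equal to `size`.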
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is `eq`)
or a lower bound of the total (in which case it is `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out of this change (retrieve the total hits as a number in the REST response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow-up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
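A client that must handle both response shapes described above could normalize `hits.total` with a helper like the one below. The function name is hypothetical. It treats the legacy integer form (from `rest_total_hits_as_int=true`) as carrying no relation, and it returns `None` when `hits.total` is absent, as with `track_total_hits: false`.

```python
def total_hits(response):
    """Return (value, relation) for hits.total, or None if it is absent.

    `relation` is "eq" or "gte" for the object form and None for the legacy
    integer form returned with rest_total_hits_as_int=true.
    """
    total = response["hits"].get("total")
    if total is None:
        return None
    if isinstance(total, int):
        return total, None
    return total["value"], total["relation"]


print(total_hits({"hits": {"total": {"value": 42, "relation": "eq"}}}))  # (42, 'eq')
```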
Jeff Hajewski 49087f16f5 Adds deprecation logging to ScriptDocValues#getValues. (#34279)
`ScriptDocValues#getValues` was added for backwards compatibility but is no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.

Closes #22919
2018-11-27 14:30:13 -05:00
William Desportes a204d1cdff [Docs] Fix typo in datehistogram-aggregation.asciidoc (#35855) 2018-11-23 15:16:53 +01:00
Jim Ferenczi d96202a282 [DOCS] Fix missing callouts 2018-11-08 15:40:01 +01:00
Dominik Stadler d351422215 Add parent-aggregation to parent-join module (#34210)
Add `parent` aggregation, a special single bucket aggregation that joins child documents to their parent.
2018-11-08 14:13:00 +01:00
Sue Gallagher 1ce3c92a2d [DOCS] Add info on calendar vs fixed interval. (#31638)
Extensive edit to add additional information on the difference between calendar intervals and fixed-length intervals.
2018-10-31 10:16:36 -04:00
Julie Tibshirani f854330e06
Make sure to use the type _doc in the REST documentation. (#34662)
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
2018-10-22 11:54:04 -07:00
markharwood fe623acf66
Docs - removed experimental/beta markers from adjacency matrix aggregation (#34599) 2018-10-19 09:33:59 +01:00
markharwood 2a413abb0b
Docs - remove experimental marker from significant_text aggregation (#34598) 2018-10-19 09:32:02 +01:00
Jim Ferenczi 36557469f6
[DOCS] Removes beta label from composite aggregation (#34329) 2018-10-05 19:46:20 +02:00