OpenSearch

Commit Graph

Author	SHA1	Message	Date
Tal Levy	9ee2e11181	[7.x] Adds support for geo-bounds filtering in geogrid aggregations (#50996) * Adds support for geo-bounds filtering in geogrid aggregations (#50002) It is fairly common to filter the geo point candidates in geohash_grid and geotile_grid aggregations according to some viewable bounding box. This change introduces the option of specifying this filter directly in the tiling aggregation. This is even more relevant to `geo_shape`, where the bounds will restrict the shape to be within the bounds. This optional `bounds` parameter is parsed in an equivalent fashion to the bounds specified in the geo_bounding_box query.	2020-01-14 11:18:46 -08:00
Nik Everett	1d8e51f89d	Support offset in composite aggs (#50609 ) (#50808 ) Adds support for the `offset` parameter to the `date_histogram` source of composite aggs. The `offset` parameter is supported by the normal `date_histogram` aggregation and is useful for folks that need to measure things from, say, 6am one day to 6am the next day. This is implemented by creating a new `Rounding` that knows how to handle offsets and delegates to other rounding implementations. That implementation doesn't fully implement the `Rounding` contract, namely `nextRoundingValue`. That method isn't used by composite aggs so I can't be sure that any implementation that I add will be correct. I propose to leave it throwing `UnsupportedOperationException` until I need it. Closes #48757	2020-01-09 14:11:24 -05:00
James Rodewig	1299dda437	[DOCS] Warn about using `geo_centroid` as sub-agg to `geohash_grid` (#50038) If `geo_point` fields are multi-valued, using `geo_centroid` as a sub-agg to `geohash_grid` could result in centroids outside of bucket boundaries. This adds a related warning to the geo_centroid agg docs.	2020-01-06 07:47:54 -06:00
Nik Everett	55107ce8ae	Docs: Refine note about `after_key` (#50475 ) * Docs: Refine note about `after_key` I was curious about composite aggregations, specifically I wanted to know how to write a composite aggregation that had all of its buckets filtered out so you had to use the `after_key`. Then I saw that we've declared composite aggregations not to work with pipelines in #44180. So I'm not sure you can do that any more. Which makes the note about `after_key` inaccurate. This rejiggers that section of the docs a little so it is more obvious that you send the `after_key` back to us. And so it is more obvious that you should only use the `after_key` that we give you rather than try to work it out for yourself. * Apply suggestions from code review Co-Authored-By: James Rodewig <james.rodewig@elastic.co> Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-01-02 10:03:23 -05:00
Gilad Gal	9fdfb075bb	Deleted 'a' before plural 'messages'	2019-12-30 21:25:15 +02:00
James Rodewig	694b119f0a	[DOCS] Percentile aggs are non-deterministic (#50468 ) Percentile aggregations are non-deterministic. A percentile aggregation can produce different results even when using the same data. Based on [this discuss post][0], the non-deterministic property stems from processes in Lucene that can affect the order in which docs are provided to the aggregation. This adds a warning stating that the aggregation is non-deterministic and what that means. [0]: https://discuss.elastic.co/t/different-results-for-same-query/111757	2019-12-23 13:13:34 -05:00
Florian Kelbert	af8bed13d3	[DOCS] Fix typo in bucket sum aggregation docs (#50431 )	2019-12-20 08:48:25 -05:00
Jim Ferenczi	2acafd4b15	Optimize composite aggregation based on index sorting (#48399 ) (#50272 ) Co-authored-by: Daniel Huang <danielhuang@tencent.com> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true	2019-12-20 12:32:37 +01:00
Lisa Cawley	30d66828ae	[DOCS] Move transform resource definitions into APIs (#50108 )	2019-12-17 12:31:31 -08:00
James Rodewig	364eb2d34c	[DOCS] Correct percentile rank agg example response (#50052 ) The example snippets in the percentile rank agg docs use a test dataset named `latency`, which is generated from docs/gradle.build. At some point the dataset and example snippets were updated, but the text surrounding the snippets was not. This means the text and the example snippets shown no longer match up. This corrects that by changing the snippets using /TESTRESPONSE magic comments.	2019-12-12 09:06:41 -05:00
Ignacio Vera	326fe7566e	New Histogram field mapper that supports percentiles aggregations. (#48580 ) (#49683 ) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.	2019-11-28 15:06:26 +01:00
Lisa Cawley	ca895d3ad5	[DOCS] Merge rollup config details into API (#49412 )	2019-11-22 08:39:49 -08:00
Christos Soulios	d9f0245b10	[7.x] Implement stats aggregation for string terms (#49097) Backport of #47468 to 7.x. This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following: min_length: the length of the shortest term; max_length: the length of the longest term; avg_length: the average length of all terms; distribution: the probability distribution of all characters appearing in all terms; entropy: the total Shannon entropy value calculated for all terms. This aggregation has been implemented as an analytics plugin.	2019-11-15 14:36:21 +02:00
James Rodewig	852622d970	[DOCS] Remove binary gendered language (#48362 )	2019-10-23 09:37:12 -05:00
Ian Danforth	28c1677341	[DOCS] Fix typo in percentile rank aggregation docs (#47247 )	2019-10-15 15:56:45 -04:00
Mark Tozzi	e404f7ea80	DocValueFormat implementation for date range fields (#47472 ) (#47605 )	2019-10-04 17:21:17 -04:00
Mark Tozzi	5bdf25320a	Documentation notes for Range field histograms (#46890 ) (#47366 )	2019-10-01 10:58:44 -04:00
Javier Ruiz	a5661ac03a	[DOCS] Fix calendar interval typos for date histo agg (#46911 )	2019-09-20 15:22:41 -04:00
James Rodewig	99130114de	[DOCS] Correct several [source,console-result] snippets (#46930 ) (#46937 )	2019-09-20 12:20:12 -04:00
James Rodewig	2831535cf9	[DOCS] Replace "// CONSOLE" comments with [source,console] (#46679 )	2019-09-13 11:44:54 -04:00
James Rodewig	043471c643	[DOCS] Minor improvement to the nested aggregation docs (#46475 ) (#46604 ) * Minor improvement to the nested aggregation docs * The attributes name and resellers.name were rather confusing, especially since the first one was dynamically mapped and not shown in the documentation (you had to read the test to see it). This change introduces a unique name for the nested attribute and adds the example document to the documentation. * Change the index name from "index" to something more speaking. * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co>	2019-09-11 12:06:42 -04:00
James Rodewig	f04573f8e8	[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449 ) (#46459 )	2019-09-06 16:09:09 -04:00
James Rodewig	bb7bff5e30	[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295 ) (#46418 )	2019-09-06 09:22:08 -04:00
markharwood	323ec022be	Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394 ) Following performance optimisations to the adjacency_matrix aggregation we no longer require this setting. Marked as deprecated and due for removal in 8.0 Related #46324	2019-09-06 13:59:47 +01:00
James Rodewig	1f36c4e50c	[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159 ) (#46332 )	2019-09-05 10:11:25 -04:00
Zachary Tong	85e2e41de7	Add CumulativeCard pipeline agg to pipeline index (#46279 ) The Cumulative Cardinality docs weren't linked from the pipeline index page	2019-09-03 12:11:04 -04:00
Zachary Tong	943a016bb2	Add Cumulative Cardinality agg (and Data Science plugin) (#45990 ) This adds a pipeline aggregation that calculates the cumulative cardinality of a field. It does this by iteratively merging in the HLL sketch from consecutive buckets and emitting the cardinality up to that point. This is useful for things like finding the total "new" users that have visited a website (as opposed to "repeat" visitors). This is a Basic+ aggregation and adds a new Data Science plugin to house it and future advanced analytics/data science aggregations.	2019-08-26 16:19:55 -04:00
LHearen	8f86faca5c	[DOCS] Correct conditional clause in histogram agg docs (#45643 )	2019-08-19 10:09:46 -04:00
LHearen	da0a785685	[DOCS] Fix a 'value' -> 'values' typo in histogram aggregation docs (#45642 )	2019-08-19 10:02:59 -04:00
Zachary Tong	3df1c76f9b	Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179) This adjusts the `buckets_path` parser so that pipeline aggs can select specific buckets (via their bucket keys) instead of fetching the entire set of buckets. This is useful for bucket_script in particular, which might want specific buckets for calculations. It's possible to work around this with `filter` aggs, but the workaround is hacky and probably less performant. - Adjusts documentation - Adds a barebones AggregatorTestCase for bucket_script - Tweaks AggTestCase to use getMockScriptService() for reductions and pipelines. Previously pipelines could just pass in a script service for testing, but this didn't work for regular aggs. The new getMockScriptService() method fixes that issue, but needs to be used for pipelines too. This had a knock-on effect of touching MovFn, AvgBucket and ScriptedMetric	2019-08-05 12:18:40 -04:00
Zachary Tong	e5079ac288	[7.x backport] Add more flexibility to MovingFunction window alignment (#45159) Introduce shift field to MovingFunction aggregation. By default, shift = 0. Behavior, in this case, is the same as before. Increasing shift by 1 moves starting window position by 1 to the right. To simply include the current bucket in the window, use shift = 1. For center alignment (n/2 values before and after the current bucket), use shift = window / 2. For right alignment (n values after the current bucket), use shift = window.	2019-08-05 11:56:52 -04:00
Zachary Tong	ffbe047c32	Revert "Add more flexibility to MovingFunction window alignment (#44360 )" This reverts commit `1a58a487f0`.	2019-08-02 15:16:04 -04:00
Nikita Glashenko	1a58a487f0	Add more flexibility to MovingFunction window alignment (#44360) Introduce shift field to MovingFunction aggregation. By default, shift = 0. Behavior, in this case, is the same as before. Increasing shift by 1 moves starting window position by 1 to the right. To simply include the current bucket in the window, use shift = 1. For center alignment (n/2 values before and after the current bucket), use shift = window / 2. For right alignment (n values after the current bucket), use shift = window.	2019-08-02 15:10:21 -04:00
Flavio Pompermaier	f1bab2fa89	[DOCS] Correct sum_other_doc_count value in terms agg example (#45028 ) Closes issue #41902	2019-07-31 14:10:36 -04:00
Sandeep Kanabar	8f1a3ab70a	[Docs] Update daterange-aggregation.asciidoc (#44730 ) Correcting the value to be the same as that specified for "missing".	2019-07-29 12:50:33 +02:00
James Rodewig	d46545f729	[DOCS] Update anchors and links for Elasticsearch API relocation (#44500 )	2019-07-19 09:18:23 -04:00
Zachary Tong	3fa677ce79	Document that pipeline aggs are not compatible with composite agg (#44180 )	2019-07-12 12:35:18 -04:00
Zachary Tong	f8fd4321f8	Link rare_terms docs from index page (#43882 ) Docs for rare_terms were added in #35718, but neglected to link it from the bucket index page	2019-07-03 09:32:01 -04:00
Zachary Tong	ea1794832f	Add RareTerms aggregation (#35718 ) This adds a `rare_terms` aggregation. It is an aggregation designed to identify the long-tail of keywords, e.g. terms that are "rare" or have low doc counts. This aggregation is designed to be more memory efficient than the alternative, which is setting a terms aggregation to size: LONG_MAX (or worse, ordering a terms agg by count ascending, which has unbounded error). This aggregation works by maintaining a map of terms that have been seen. A counter associated with each value is incremented when we see the term again. If the counter surpasses a predefined threshold, the term is removed from the map and inserted into a cuckoo filter. If a future term is found in the cuckoo filter we assume it was previously removed from the map and is "common". The map keys are the "rare" terms after collection is done.	2019-07-01 10:30:02 -04:00
Paul Sanwald	8578aba654	[backport] Adds a minimum interval to `auto_date_histogram`. (#42814 ) (#43285 ) Backports minimum interval to date histogram	2019-06-19 07:06:45 -04:00
James Rodewig	0a37dd7a86	[DOCS] Remove unneeded `ifdef::asciidoctor[]` conditionals (#42758 ) Several `ifdef::asciidoctor` conditionals were added so that AsciiDoc and Asciidoctor doc builds rendered consistently. With https://github.com/elastic/docs/pull/827, Elasticsearch Reference documentation migrated completely to Asciidoctor. We no longer need to support AsciiDoc so we can remove these conditionals. Resolves #41722	2019-05-31 11:08:54 -04:00
James Rodewig	31d2bdca37	[DOCS] Fix Moving Avg Aggregation `deprecated` macro for Asciidoctor (#42405 )	2019-05-28 08:56:50 -04:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (e.g. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and java clients.	2019-05-20 12:07:29 -04:00
James Rodewig	53702efddd	[DOCS] Add anchors for Asciidoctor migration (#41648 )	2019-04-30 10:20:17 -04:00
Ignacio Vera	d119abdf96	Improve accuracy for Geo Centroid Aggregation (#41514 ) keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.	2019-04-25 15:25:48 +02:00
Zachary Tong	ec5dd0594f	Disallow null/empty or duplicate composite sources (#41359 ) Adds some validation to prevent duplicate source names from being used in the composite agg. Also refactored to use a ConstructingObjectParser and removed the private ctor and setter for sources, making it mandatory.	2019-04-24 13:23:31 -04:00
Jason Tedor	454148eee6	Fix intervals section of auto date-histogram docs (#41203 ) This section should be at the same sub-level as other sections in the auto date-histogram docs, otherwise it is rendered on to another page and is confusing for users to understand what it's in reference to.	2019-04-15 11:28:12 -04:00
Antonio Matarrese	79c7a57737	Use the breadth first collection mode for significant terms aggs. (#29042 ) This helps avoid memory issues when computing deep sub-aggregations. Because it should be rare to use sub-aggregations with significant terms, we opted to always choose breadth first as opposed to exposing a `collect_mode` option. Closes #28652.	2019-04-11 15:56:02 -07:00
Lisa Cawley	e120deb08f	[DOCS] Fixes callout for Asciidoctor migration (#41127 )	2019-04-11 12:06:10 -07:00
Ian	95409d3a7e	Correct date in daterange-aggregation.asciidoc (#39727 )	2019-03-06 11:29:32 +01:00


353 Commits