OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-08 14:05:27 +00:00

Author	SHA1	Message	Date
Christos Soulios	3868bcc7b8	[7.x] Histogram integration on Histogram field type (#59431 ) Backports #58930 to 7.x Implements histogram aggregation over histogram fields as requested in #53285.	2020-07-13 19:36:33 +03:00
Nik Everett	eb169ae226	Fix lookup support in adjacency matrix (backport of #59099 ) (#59108 ) This request: ``` POST /_search { "aggs": { "a": { "adjacency_matrix": { "filters": { "1": { "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } } } } } } } } ``` Would fail with a 500 error and a message like: ``` { "error": { "root_cause": [ { "type": "illegal_state_exception", "reason":"async actions are left after rewrite" } ] } } ``` This fixes that by moving the query rewrite phase from a synchronous call on the data nodes into the standard aggregation rewrite phase which can properly handle the asynchronous actions.	2020-07-07 10:28:20 -04:00
Nik Everett	40850a780d	Fail variable_width_histogram that collects from many (#58619 ) (#58780 ) Adds an explicit check to `variable_width_histogram` to stop it from trying to collect from many buckets because it can't. I tried to make it do so but that is more than an afternoon's project, sadly. So for now we just disallow it. Relates to #42035	2020-06-30 18:26:45 -04:00
Nik Everett	d22a242613	Docs: Mark variable_width_histogram experimental (#58574 ) We're tracking this aggregation's experimental-progress in #58573. We'd like a little time to be able to make backwards incompatible changes to the aggregation because we're not 100% sure about the request and response format yet.	2020-06-25 16:54:57 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Tal Levy	11086d5c7d	add geo_shape documentation for supported aggregations (#58284 ) (#58354 ) This commit adds documentation for geo_shape fields in aggregations Closes #55495.	2020-06-18 12:36:24 -07:00
Benjamin Trent	1aea9d5f49	Adding transform docs for geotile_grid (#57000 ) (#57474 ) transforms and composite aggs support geotile_grid as a source. This adds documentation explaining that support.	2020-06-01 15:46:37 -04:00
Nik Everett	07c76f2894	Update date_histogram docs (#56922 ) (#57387 ) * Make it more clear that you can use `month` or `1M`. * Explain rounding rules * Consistently use "time zone" instead of "timezone". It looks like both are right but I see "time zone" much more. And the parameter in elasticsearch is `time_zone` so we may as well line up. Closes #56760 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-05-29 17:40:40 -04:00
Gabriel Petrovay	cb4d5f5042	Fixed calendar intervals documentation (#56666 ) - the 1-letter intervals are not parseable (`m`, `h`, `d`, `w`, `M`, `q`, `y`) - fixed formatting broken by new lines	2020-05-15 16:55:57 -04:00
Gabriel Petrovay	ca586f2a8d	[Docs] Correct formatting in datehistogram-aggregation.asciidoc (#56664 )	2020-05-13 12:01:42 +02:00
James Rodewig	e4e02e133e	[DOCS] Remove approximate document counts example from term agg docs (#55442 ) Removes an example from the "Document counts are approximate" section of the terms agg documentation. As #52377 details, the example was no longer accurate in 7.x or 6.8. Document counts were more precise than the example presented. We've opened issue #56025 to discuss re-adding an example later. Co-authored-by: James Rodewig <james.rodewig@elastic.co> Co-authored-by: AB Prashanth <panuradh@buffalo.edu>	2020-04-30 10:11:50 -04:00
Zachary Tong	715c90bf7d	Aggs must specify a `field` or `script` (or both) (#52226 ) This adds a validation to VSParserHelper to ensure that a field or script or both are specified by the user. This is technically required today already, but throws an exception much deeper in the agg framework and has a very unintuitive error for the user (as well as eating more resources instead of failing early)	2020-04-23 19:23:41 -04:00
Anton Dollmaier	35c8226419	[DOCS] Fix parameter formatting for GeoHash grid agg docs (#53032 ) Adds missing colon (`:`) to the parameter definition list.	2020-03-09 08:16:40 -04:00
Tal Levy	9ee2e11181	[7.x] Adds support for geo-bounds filtering in geogrid aggregations (#50996 ) * Adds support for geo-bounds filtering in geogrid aggregations (#50002) It is fairly common to filter the geo point candidates in geohash_grid and geotile_grid aggregations according to some viewable bounding box. This change introduces the option of specifying this filter directly in the tiling aggregation. This is even more relevant to `geo_shape` where the bounds will restrict the shape to be within the bounds. This optional `bounds` parameter is parsed in an equivalent fashion to the bounds specified in the geo_bounding_box query.	2020-01-14 11:18:46 -08:00
Nik Everett	1d8e51f89d	Support offset in composite aggs (#50609 ) (#50808 ) Adds support for the `offset` parameter to the `date_histogram` source of composite aggs. The `offset` parameter is supported by the normal `date_histogram` aggregation and is useful for folks that need to measure things from, say, 6am one day to 6am the next day. This is implemented by creating a new `Rounding` that knows how to handle offsets and delegates to other rounding implementations. That implementation doesn't fully implement the `Rounding` contract, namely `nextRoundingValue`. That method isn't used by composite aggs so I can't be sure that any implementation that I add will be correct. I propose to leave it throwing `UnsupportedOperationException` until I need it. Closes #48757	2020-01-09 14:11:24 -05:00
Nik Everett	55107ce8ae	Docs: Refine note about `after_key` (#50475 ) * Docs: Refine note about `after_key` I was curious about composite aggregations, specifically I wanted to know how to write a composite aggregation that had all of its buckets filtered out so you had to use the `after_key`. Then I saw that we've declared composite aggregations not to work with pipelines in #44180. So I'm not sure you can do that any more. Which makes the note about `after_key` inaccurate. This rejiggers that section of the docs a little so it is more obvious that you send the `after_key` back to us. And so it is more obvious that you should only use the `after_key` that we give you rather than try to work it out for yourself. * Apply suggestions from code review Co-Authored-By: James Rodewig <james.rodewig@elastic.co> Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-01-02 10:03:23 -05:00
Jim Ferenczi	2acafd4b15	Optimize composite aggregation based on index sorting (#48399 ) (#50272 ) Co-authored-by: Daniel Huang <danielhuang@tencent.com> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued source, in such case early termination is not possible. * missing_bucket is set to true	2019-12-20 12:32:37 +01:00
Lisa Cawley	30d66828ae	[DOCS] Move transform resource definitions into APIs (#50108 )	2019-12-17 12:31:31 -08:00
Lisa Cawley	ca895d3ad5	[DOCS] Merge rollup config details into API (#49412 )	2019-11-22 08:39:49 -08:00
James Rodewig	852622d970	[DOCS] Remove binary gendered language (#48362 )	2019-10-23 09:37:12 -05:00
Mark Tozzi	e404f7ea80	DocValueFormat implementation for date range fields (#47472 ) (#47605 )	2019-10-04 17:21:17 -04:00
Mark Tozzi	5bdf25320a	Documentation notes for Range field histograms (#46890 ) (#47366 )	2019-10-01 10:58:44 -04:00
Javier Ruiz	a5661ac03a	[DOCS] Fix calendar interval typos for date histo agg (#46911 )	2019-09-20 15:22:41 -04:00
James Rodewig	99130114de	[DOCS] Correct several [source,console-result] snippets (#46930 ) (#46937 )	2019-09-20 12:20:12 -04:00
James Rodewig	043471c643	[DOCS] Minor improvement to the nested aggregation docs (#46475 ) (#46604 ) * Minor improvement to the nested aggregation docs * The attributes name and resellers.name were rather confusing, especially since the first one was dynamically mapped and not shown in the documentation (you had to read the test to see it). This change introduces a unique name for the nested attribute and adds the example document to the documentation. * Change the index name from "index" to something more speaking. * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co> * Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co>	2019-09-11 12:06:42 -04:00
James Rodewig	f04573f8e8	[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449 ) (#46459 )	2019-09-06 16:09:09 -04:00
James Rodewig	bb7bff5e30	[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295 ) (#46418 )	2019-09-06 09:22:08 -04:00
markharwood	323ec022be	Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394 ) Following performance optimisations to the adjacency_matrix aggregation we no longer require this setting. Marked as deprecated and due for removal in 8.0 Related #46324	2019-09-06 13:59:47 +01:00
James Rodewig	1f36c4e50c	[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159 ) (#46332 )	2019-09-05 10:11:25 -04:00
LHearen	8f86faca5c	[DOCS] Correct conditional clause in histogram agg docs (#45643 )	2019-08-19 10:09:46 -04:00
LHearen	da0a785685	[DOCS] Fix a 'value' -> 'values' typo in histogram aggregation docs (#45642 )	2019-08-19 10:02:59 -04:00
Flavio Pompermaier	f1bab2fa89	[DOCS] Correct sum_other_doc_count value in terms agg example (#45028 ) Closes issue #41902	2019-07-31 14:10:36 -04:00
Sandeep Kanabar	8f1a3ab70a	[Docs] Update daterange-aggregation.asciidoc (#44730 ) Correcting the value to be the same as that specified for "missing".	2019-07-29 12:50:33 +02:00
James Rodewig	d46545f729	[DOCS] Update anchors and links for Elasticsearch API relocation (#44500 )	2019-07-19 09:18:23 -04:00
Zachary Tong	3fa677ce79	Document that pipeline aggs are not compatible with composite agg (#44180 )	2019-07-12 12:35:18 -04:00
Zachary Tong	ea1794832f	Add RareTerms aggregation (#35718 ) This adds a `rare_terms` aggregation. It is an aggregation designed to identify the long-tail of keywords, e.g. terms that are "rare" or have low doc counts. This aggregation is designed to be more memory efficient than the alternative, which is setting a terms aggregation to size: LONG_MAX (or worse, ordering a terms agg by count ascending, which has unbounded error). This aggregation works by maintaining a map of terms that have been seen. A counter associated with each value is incremented when we see the term again. If the counter surpasses a predefined threshold, the term is removed from the map and inserted into a cuckoo filter. If a future term is found in the cuckoo filter we assume it was previously removed from the map and is "common". The map keys are the "rare" terms after collection is done.	2019-07-01 10:30:02 -04:00
Paul Sanwald	8578aba654	[backport] Adds a minimum interval to `auto_date_histogram`. (#42814 ) (#43285 ) Backports minimum interval to date histogram	2019-06-19 07:06:45 -04:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to a confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (e.g. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and java clients.	2019-05-20 12:07:29 -04:00
James Rodewig	53702efddd	[DOCS] Add anchors for Asciidoctor migration (#41648 )	2019-04-30 10:20:17 -04:00
Zachary Tong	ec5dd0594f	Disallow null/empty or duplicate composite sources (#41359 ) Adds some validation to prevent duplicate source names from being used in the composite agg. Also refactored to use a ConstructingObjectParser and removed the private ctor and setter for sources, making it mandatory.	2019-04-24 13:23:31 -04:00
Jason Tedor	454148eee6	Fix intervals section of auto date-histogram docs (#41203 ) This section should be at the same sub-level as other sections in the auto date-histogram docs, otherwise it is rendered on to another page and is confusing for users to understand what it's in reference to.	2019-04-15 11:28:12 -04:00
Antonio Matarrese	79c7a57737	Use the breadth first collection mode for significant terms aggs. (#29042 ) This helps avoid memory issues when computing deep sub-aggregations. Because it should be rare to use sub-aggregations with significant terms, we opted to always choose breadth first as opposed to exposing a `collect_mode` option. Closes #28652.	2019-04-11 15:56:02 -07:00
Lisa Cawley	e120deb08f	[DOCS] Fixes callout for Asciidoctor migration (#41127 )	2019-04-11 12:06:10 -07:00
Ian	95409d3a7e	Correct date in daterange-aggregation.asciidoc (#39727 )	2019-03-06 11:29:32 +01:00
Samuel Cifuentes García	ca83408542	Improved Terms Aggregation documentation (#38892 ) Added a note after the first query example talking about fielddata.	2019-03-05 10:44:59 -05:00
Hannes Van De Vreken	27cf7e27e7	Fix typo in DateRange docs (yyy → yyyy) (#38883 )	2019-02-15 10:19:54 -05:00
Alexander Reelsen	8e5e48319e	Add documentation about breaking java time changes (#38886 ) In addition remove joda time mentions across the docs, make sure links are updated to java time javadocs. Forward port of #38720	2019-02-14 10:18:12 +01:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Jim Ferenczi	cb451edb01	Allow nested fields in the composite aggregation (#37178 ) This change adds support for handling `nested` fields in the `composite` aggregation. A `nested` aggregation can be used as the parent of a `composite` aggregation in order to target `nested` fields in the `sources`. Closes #28611	2019-01-25 14:00:39 +01:00
Christoph Büscher	967de04257	Uppercasing some docs section title (#37781 ) Section titles are mostly uppercase, only a few cases where query DSL parameters or Java method names are used as the title they should be lowercased.	2019-01-24 22:54:55 +01:00
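
The reduce step described in commit 03e6d1b535 (`variable_width_histogram`) can be illustrated with a small Python sketch. This is not the Elasticsearch implementation: the `Bucket` class and the functions `merge`, `reduce_buckets`, and `bucket_keys` are hypothetical names, and weighting the merged centroid by document count is an assumption. It only shows the idea of merging the closest centroids until a target bucket count is reached, then applying the midpoint rule `min[large] = (min[large] + max[small]) / 2` to overlapping neighbours.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical shard-level cluster: centroid, bounds, and size."""
    centroid: float
    min: float
    max: float
    doc_count: int


def merge(a: Bucket, b: Bucket) -> Bucket:
    """Merge two buckets; the doc-count-weighted centroid is an assumption."""
    n = a.doc_count + b.doc_count
    centroid = (a.centroid * a.doc_count + b.centroid * b.doc_count) / n
    return Bucket(centroid, min(a.min, b.min), max(a.max, b.max), n)


def reduce_buckets(buckets: list[Bucket], target: int) -> list[Bucket]:
    """Repeatedly merge the two buckets with the closest centroids."""
    bs = sorted(buckets, key=lambda b: b.centroid)
    while len(bs) > target:
        # In a centroid-sorted list the closest pair is always adjacent.
        i = min(range(len(bs) - 1),
                key=lambda j: bs[j + 1].centroid - bs[j].centroid)
        bs[i:i + 2] = [merge(bs[i], bs[i + 1])]
    return bs


def bucket_keys(bs: list[Bucket]) -> list[float]:
    """Use each bucket's min as its key, with the midpoint rule on overlap."""
    keys = []
    for i, b in enumerate(bs):
        key = b.min
        if i > 0 and b.min < bs[i - 1].max:
            # Overlap with the previous (smaller-centroid) bucket.
            key = (b.min + bs[i - 1].max) / 2
        keys.append(key)
    return keys


if __name__ == "__main__":
    shard_buckets = [
        Bucket(1.0, 0.0, 2.0, 2),
        Bucket(2.0, 1.5, 3.0, 2),
        Bucket(10.0, 8.0, 12.0, 1),
        Bucket(11.0, 9.0, 13.0, 1),
    ]
    reduced = reduce_buckets(shard_buckets, target=2)
    print(bucket_keys(reduced))
```

Because merging uses centroids rather than bounds, two reduced buckets can still overlap, which is why the key adjustment is a separate pass.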

208 Commits
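
The counting scheme described in commit ea1794832f (`rare_terms`) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `rare_terms` and the `threshold` parameter are hypothetical, and a plain `set` stands in for the cuckoo filter, so unlike a real cuckoo filter it never produces false positives and saves no memory.

```python
def rare_terms(terms, threshold=1):
    """Return {term: count} for terms seen at most `threshold` times.

    `counts` plays the role of the map of terms seen so far; `common` is a
    stand-in for the cuckoo filter (assumption: an exact set, not a
    probabilistic filter).
    """
    counts = {}
    common = set()
    for term in terms:
        if term in common:
            # Previously evicted from the map, so it is treated as "common".
            continue
        count = counts.get(term, 0) + 1
        if count > threshold:
            # Counter surpassed the threshold: move the term to the filter.
            counts.pop(term, None)
            common.add(term)
        else:
            counts[term] = count
    # Whatever remains in the map is "rare" once collection is done.
    return counts


if __name__ == "__main__":
    print(rare_terms(["a", "b", "a", "c", "a", "d", "d"], threshold=1))
```

Evicting frequent terms keeps the map bounded by the number of rare terms rather than by the total number of distinct terms, which is the memory argument the commit message makes.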