OpenSearch

Commit Graph

Author	SHA1	Message	Date
James Rodewig	0c9791798d	[7.x] [DOCS] Reformat snippets to use two-space indents (#60080 )	2020-07-22 15:57:49 -04:00
Howard	466e947b0e	[DOCS] Fix missing punctuation in agg docs (#59823 )	2020-07-21 10:19:29 -04:00
James Rodewig	ff8a042580	[DOCS] Reformat agg snippets to use two-space indents (#59912 ) (#59922 )	2020-07-20 15:59:00 -04:00
James Rodewig	24fec52447	[DOCS] Add performance warning for scripts (#59890 ) (#59913 )	2020-07-20 15:05:33 -04:00
Igor Motov	96a5284484	Add hard_bounds documentation (#59809 ) (#59883 ) Fixes #59774	2020-07-20 10:51:23 -04:00
Nik Everett	514b2f3414	Clean up a few of vwh's rough edges (#59341 ) (#59807 ) This cleans up a few rough edges in the `variable_width_histogram`, mostly found by @wwang500: 1. Setting its tuning parameters in an unexpected order could cause the request to fail. 2. We checked that the maximum number of buckets was both less than 50000 and MAX_BUCKETS. This drops the 50000. 3. Fixes a divide by 0 that can occur if the `shard_size` is 1. 4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows a signed int. 5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less than `buckets` we will very consistently return fewer buckets than requested. For the most part we expect folks to leave it at the default. If they change it, we expect it to be much bigger than `buckets`. 6. Allocate a smaller `mergeMap` when initially bucketing requests that don't use the entire `shard_size * 3 / 4`. It's just a waste. 7. Default `shard_size` to `10 * buckets` rather than `100`. It looks like that was our intention the whole time. And it feels like it'd keep the algorithm humming along more smoothly. 8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like we've documented it rather than `5000`. Like the point above, this feels like the right thing to do to keep the algorithm happy. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-17 15:16:09 -04:00
James Rodewig	a672a2a2d4	[DOCS] Move highlighting docs to separate page (#59768 ) (#59781 ) Moves the highlighting docs from the deprecated 'Request Body Search' chapter to the new subpage of the 'Run a search chapter' section. No substantive changes were made to the content.	2020-07-17 10:57:00 -04:00
István Zoltán Szabó	35512a9284	[DOCS] Adds security privilege info to inference bucket aggregation (#59604 )	2020-07-16 18:03:19 +02:00
Adam Locke	aa260636e5	Indicating that the size parameter defaults to 10. (#59438 ) (#59461 )	2020-07-13 16:27:20 -04:00
Christos Soulios	3868bcc7b8	[7.x] Histogram integration on Histogram field type (#59431 ) Backports #58930 to 7.x Implements histogram aggregation over histogram fields as requested in #53285.	2020-07-13 19:36:33 +03:00
David Kyle	b87cef6fe7	Include the ml inference aggregation doc (#59219 ) (#59226 ) Add to the list of pipeline aggregations	2020-07-08 14:35:08 +01:00
Nik Everett	eb169ae226	Fix lookup support in adjacency matrix (backport of #59099 ) (#59108 ) This request: ``` POST /_search { "aggs": { "a": { "adjacency_matrix": { "filters": { "1": { "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } } } } } } } } ``` Would fail with a 500 error and a message like: ``` { "error": { "root_cause": [ { "type": "illegal_state_exception", "reason":"async actions are left after rewrite" } ] } } ``` This fixes that by moving the query rewrite phase from a synchronous call on the data nodes into the standard aggregation rewrite phase which can properly handle the asynchronous actions.	2020-07-07 10:28:20 -04:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Nik Everett	40850a780d	Fail variable_width_histogram that collects from many (#58619 ) (#58780 ) Adds an explicit check to `variable_width_histogram` to stop it from trying to collect from many buckets because it can't. I tried to make it do so but that is more than an afternoon's project, sadly. So for now we just disallow it. Relates to #42035	2020-06-30 18:26:45 -04:00
Nik Everett	d22a242613	Docs: Mark variable_width_histogram experimental (#58574 ) We're tracking this aggregation's experimental-progress in #58573. We'd like a little time to be able to make backwards incompatible changes to the aggregation because we're not 100% sure about the request and response format yet.	2020-06-25 16:54:57 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Tal Levy	11086d5c7d	add geo_shape documentation for supported aggregations (#58284 ) (#58354 ) This commit adds documentation for geo_shape fields in aggregations Closes #55495.	2020-06-18 12:36:24 -07:00
James Rodewig	d534862d41	[DOCS] Move search API's `docvalue_fields` examples (#57760 ) (#57989 ) Changes: * Condenses and relocates the `docvalue_fields` example to the 'Run a search' page. * Adds docs for the `docvalue_fields` request body parameter. * Updates several related xrefs. Co-authored-by: debadair <debadair@elastic.co>	2020-06-11 11:25:04 -04:00
Igor Motov	947573f309	Added standard deviation / variance sampling to extended stats (#49782 ) (#57947 ) Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface. Closes #49554 Co-authored-by: Igor Motov <igor@motovs.org> Co-authored-by: andrewjohnson2 <aj114114@gmail.com>	2020-06-11 09:19:44 -04:00
James Rodewig	b03a83a69d	[DOCS] Fix source filtering xrefs (#57720 ) (#57725 )	2020-06-05 09:05:30 -04:00
Igor Motov	8d7f389f3a	Increase search.max_buckets to 65,535 (#57042 ) Increases the default search.max_buckets limit to 65,535, and only counts buckets during reduce phase. Closes #51731	2020-06-03 15:35:41 -04:00
Benjamin Trent	1aea9d5f49	Adding transform docs for geotile_grid (#57000 ) (#57474 ) transforms and composite aggs support geotile_grid as a source. This adds documentation explaining that support.	2020-06-01 15:46:37 -04:00
Nik Everett	07c76f2894	Update date_histogram docs (#56922 ) (#57387 ) * Make it more clear that you can use `month` or `1M`. * Explain rounding rules * Consistently use "time zone" instead of "timezone". It looks like both are right but I see "time zone" much more. And the parameter in elasticsearch is `time_zone` so we may as well line up. Closes #56760 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-05-29 17:40:40 -04:00
James Rodewig	e492c23944	[DOCS] Sort metric and pipeline agg docs (#56613 ) (#56846 ) Co-authored-by: Gil Raphaelli <gil@elastic.co>	2020-05-15 17:15:53 -04:00
Gabriel Petrovay	cb4d5f5042	Fixed calendar intervals documentation (#56666 ) - the 1-letter intervals are not parseable (`m`, `h`, `d`, `w`, `M`, `q`, `y`) - fixed formatting broken by new lines	2020-05-15 16:55:57 -04:00
Tal Levy	5e90ff32f7	Add Normalize Pipeline Aggregation (#56399 ) (#56792 ) This aggregation will perform normalizations of metrics for a given series of data in the form of bucket values. The aggregations supports the following normalizations - rescale 0-1 - rescale 0-100 - percentage of sum - mean normalization - z-score normalization - softmax normalization To specify which normalization is to be used, it can be specified in the normalize agg's `normalizer` field. For example: ``` { "normalize": { "buckets_path": <>, "normalizer": "percent" } } ```	2020-05-14 17:40:15 -07:00
Gabriel Petrovay	ca586f2a8d	[Docs] Correct formatting in datehistogram-aggregation.asciidoc (#56664 )	2020-05-13 12:01:42 +02:00
Ignacio Vera	222ee721ec	Add moving percentiles pipeline aggregation (#55441 ) (#56575 ) Similar to what the moving function aggregation does, except merging windows of percentiles sketches together instead of cumulatively merging final metrics	2020-05-12 11:35:23 +02:00
James Rodewig	ba67ab3b64	[DOCS] Add reference docs for `search.max_buckets` setting (#56449 ) (#56511 ) Adds reference-style setting documentation for the `search.max_buckets` setting. This setting was previously only documented on the [bucket aggregations][0] page. [0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket.html	2020-05-11 09:45:09 -04:00
Christos Soulios	c65f828cb7	[7.x] Histogram field type support for ValueCount and Avg aggregations (#56099 ) Backports #55933 to 7.x Implements value_count and avg aggregations over Histogram fields as discussed in #53285 - value_count returns the sum of all counts array of the histograms - avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array	2020-05-04 13:23:02 +03:00
James Rodewig	e4e02e133e	[DOCS] Remove approximate document counts example from term agg docs (#55442 ) Removes an example from the "Document counts are approximate" section of the terms agg documentation. As #52377 details, the example was no longer accurate in 7.x or 6.8. Document counts were more precise than the example presented. We've opened issue #56025 to discuss re-adding an example later. Co-authored-by: James Rodewig <james.rodewig@elastic.co> Co-authored-by: AB Prashanth <panuradh@buffalo.edu>	2020-04-30 10:11:50 -04:00
Christos Soulios	02bf0c586a	[7.x] Histogram field type support for Sum aggregation (#55916 ) Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285 Backports #55681 to 7.x	2020-04-29 15:06:12 +03:00
Zachary Tong	715c90bf7d	Aggs must specify a `field` or `script` (or both) (#52226 ) This adds a validation to VSParserHelper to ensure that a field or script or both are specified by the user. This is technically required today already, but throws an exception much deeper in the agg framework and has a very unintuitive error for the user (as well as eating more resources instead of failing early)	2020-04-23 19:23:41 -04:00
Igor Motov	51c6f69e02	[7.x] Add support for filters to T-Test aggregation (#54980 ) (#55066 ) Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692	2020-04-13 12:28:58 -04:00
Igor Motov	2794572a35	[7.x] Add Student's t-test aggregation support (#54469 ) (#54737 ) Adds t_test metric aggregation that can perform paired and unpaired two-sample t-tests. In this PR support for filters in unpaired is still missing. It will be added in a follow-up PR. Relates to #53692	2020-04-06 11:36:47 -04:00
Gil Raphaelli	2984a54b7f	[DOCS] Fix typos in top metrics agg docs (#54299 )	2020-03-27 10:49:21 -04:00
Paweł Krześniak	c0534f4157	[DOCS] link fix (#53973 ) Fix bad link in top_metrics.	2020-03-23 14:20:54 -04:00
Lisa Cawley	278e3fce50	[DOCS] Add anchors for scripted metric aggregations (#53618 )	2020-03-16 12:20:41 -07:00
Nik Everett	f7482f794a	Improve top_metrics docs (#53521 ) (#53619 ) * Removes experimental. * Replaces `"v"` (for value) with `"m"` (for metric). * Move the note about tiebreaking into the list of limitations of the sort. * Explain how you ask for `metrics`. * Clean up some wording. * Link to the docs from `top_metrics`. Closes #51813	2020-03-16 13:47:43 -04:00
Nik Everett	9dcd64c110	Preserve metric types in top_metrics (backport of #53288 ) (#53440 ) This changes the `top_metrics` aggregation to return metrics in their original type. Since it only supports numerics, that means that dates, longs, and doubles will come back as stored, with their appropriate formatter applied.	2020-03-12 17:17:09 -04:00
Anton Dollmaier	35c8226419	[DOCS] Fix parameter formatting for GeoHash grid agg docs (#53032 ) Adds missing colon (`:`) to the parameter definition list.	2020-03-09 08:16:40 -04:00
Nik Everett	28df7ae5ed	Support multiple metrics in `top_metrics` agg (backport of #52965 ) (#53163 ) This adds support for returning multiple metrics to the `top_metrics` agg. It looks like: ``` POST /test/_search?filter_path=aggregations { "aggs": { "tm": { "top_metrics": { "metrics": [ {"field": "v"}, {"field": "m"} ], "sort": {"s": "desc"} } } } } ```	2020-03-05 08:12:01 -05:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
István Zoltán Szabó	1d895118dd	[DOCS] Links transforms in aggregation docs (#52563 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-02-21 08:23:34 +01:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Igor Motov	a66988281f	Add histogram field type support to boxplot aggs (#52265 ) Add support for the histogram field type to boxplot aggs. Closes #52233 Relates to #33112	2020-02-13 18:09:26 -05:00
Igor Motov	667e1a5225	Add Boxplot Aggregation (#52174 ) Adds a `boxplot` aggregation that calculates min, max, median and the first and the third quartiles of the given data set. Closes #33112	2020-02-11 09:38:17 -05:00
Igor Motov	08e9c673e5	Fix leftover mentions of method parameter in Percentile Aggs (#51272 ) The method parameter is not used in the percentile aggs, instead the method is determined by the presence of `hdr` or `tdigest` objects. Relates to #8324	2020-01-22 10:03:35 -05:00
Tal Levy	9ee2e11181	[7.x] Adds support for geo-bounds filtering in geogrid aggregations (#50996 ) * Adds support for geo-bounds filtering in geogrid aggregations (#50002) It is fairly common to filter the geo point candidates in geohash_grid and geotile_grid aggregations according to some viewable bounding box. This change introduces the option of specifying this filter directly in the tiling aggregation. This is even more relevant to `geo_shape`, where the bounds will restrict the shape to be within the bounds. This optional `bounds` parameter is parsed in an equivalent fashion to the bounds specified in the geo_bounding_box query.	2020-01-14 11:18:46 -08:00
Nik Everett	1d8e51f89d	Support offset in composite aggs (#50609 ) (#50808 ) Adds support for the `offset` parameter to the `date_histogram` source of composite aggs. The `offset` parameter is supported by the normal `date_histogram` aggregation and is useful for folks that need to measure things from, say, 6am one day to 6am the next day. This is implemented by creating a new `Rounding` that knows how to handle offsets and delegates to other rounding implementations. That implementation doesn't fully implement the `Rounding` contract, namely `nextRoundingValue`. That method isn't used by composite aggs so I can't be sure that any implementation that I add will be correct. I propose to leave it throwing `UnsupportedOperationException` until I need it. Closes #48757	2020-01-09 14:11:24 -05:00
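
Note on the histogram field commits (c65f828cb7 and 02bf0c586a): the messages describe `value_count` as the sum of the `counts` array, `sum` as each value multiplied by its count and then added up, and `avg` as the count-weighted average of the `values` array. The arithmetic can be sketched as below. This is a minimal illustration only, not the Elasticsearch implementation; the function names are hypothetical, and returning `None` for an empty histogram is an assumption made for this sketch.

```python
from typing import Optional, Sequence


def histogram_value_count(counts: Sequence[int]) -> int:
    """value_count: the sum of all entries in the counts array."""
    return sum(counts)


def histogram_sum(values: Sequence[float], counts: Sequence[int]) -> float:
    """sum: each bucket value multiplied by its count, added together."""
    return sum(v * c for v, c in zip(values, counts))


def histogram_avg(values: Sequence[float], counts: Sequence[int]) -> Optional[float]:
    """avg: count-weighted average of the values array.

    Returns None when the histogram holds no observations
    (an assumption of this sketch, not documented behaviour).
    """
    total = histogram_value_count(counts)
    if total == 0:
        return None
    return histogram_sum(values, counts) / total


# One pre-aggregated histogram: two observations near 1.0,
# one near 2.0 and one near 4.0.
values = [1.0, 2.0, 4.0]
counts = [2, 1, 1]

count_result = histogram_value_count(counts)
sum_result = histogram_sum(values, counts)
avg_result = histogram_avg(values, counts)
```

With the sample data, `count_result` is 4, `sum_result` is 8.0 and `avg_result` is 2.0.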


401 Commits