OpenSearch

Commit Graph

Author	SHA1	Message	Date
Christos Soulios	c65f828cb7	[7.x] Histogram field type support for ValueCount and Avg aggregations (#56099 ) Backports #55933 to 7.x Implements value_count and avg aggregations over Histogram fields as discussed in #53285 - value_count returns the sum of all counts array of the histograms - avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array	2020-05-04 13:23:02 +03:00
Christos Soulios	02bf0c586a	[7.x] Histogram field type support for Sum aggregation (#55916 ) Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285 Backports #55681 to 7.x	2020-04-29 15:06:12 +03:00
Igor Motov	51c6f69e02	[7.x] Add support for filters to T-Test aggregation (#54980 ) (#55066 ) Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692	2020-04-13 12:28:58 -04:00
Igor Motov	2794572a35	[7.x] Add Student's t-test aggregation support (#54469 ) (#54737 ) Adds t_test metric aggregation that can perform paired and unpaired two-sample t-tests. In this PR support for filters in unpaired is still missing. It will be added in a follow-up PR. Relates to #53692	2020-04-06 11:36:47 -04:00
Gil Raphaelli	2984a54b7f	[DOCS] Fix typos in top metrics agg docs (#54299 )	2020-03-27 10:49:21 -04:00
Paweł Krześniak	c0534f4157	[DOCS] link fix (#53973 ) Fix bad link in top_metrics.	2020-03-23 14:20:54 -04:00
Lisa Cawley	278e3fce50	[DOCS] Add anchors for scripted metric aggregations (#53618 )	2020-03-16 12:20:41 -07:00
Nik Everett	f7482f794a	Improve top_metrics docs (#53521 ) (#53619 ) * Removes experimental. * Replaces `"v"` (for value) with `"m"` (for metric). * Move the note about tiebreaking into the list of limitations of the sort. * Explain how you ask for `metrics`. * Clean up some wording. * Link to the docs from `top_metrics`. Closes #51813	2020-03-16 13:47:43 -04:00
Nik Everett	9dcd64c110	Preserve metric types in top_metrics (backport of #53288 ) (#53440 ) This changes the `top_metrics` aggregation to return metrics in their original type. Since it only supports numerics, that means that dates, longs, and doubles will come back as stored, with their appropriate formatter applied.	2020-03-12 17:17:09 -04:00
Nik Everett	28df7ae5ed	Support multiple metrics in `top_metrics` agg (backport of #52965 ) (#53163 ) This adds support for returning multiple metrics to the `top_metrics` agg. It looks like: ``` POST /test/_search?filter_path=aggregations { "aggs": { "tm": { "top_metrics": { "metrics": [ {"field": "v"}, {"field": "m"} ], "sort": {"s": "desc"} } } } } ```	2020-03-05 08:12:01 -05:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Igor Motov	a66988281f	Add histogram field type support to boxplot aggs (#52265 ) Add support for the histogram field type to boxplot aggs. Closes #52233 Relates to #33112	2020-02-13 18:09:26 -05:00
Igor Motov	667e1a5225	Add Boxplot Aggregation (#52174 ) Adds a `boxplot` aggregation that calculates min, max, medium and the first and the third quartiles of the given data set. Closes #33112	2020-02-11 09:38:17 -05:00
Igor Motov	08e9c673e5	Fix leftover mentions of method parameter in Percentile Aggs (#51272 ) The method parameter is not used in the percentile aggs, instead the method is determined by the presence of `hdr` or `tdigest` objects. Relates to #8324	2020-01-22 10:03:35 -05:00
James Rodewig	1299dda437	[DOCS] Warn about using `geo_centroid` as sub-agg to `geohash_grid` (#50038 ) If `geo_point fields` are multi-valued, using `geo_centroid` as a sub-agg to `geohash_grid` could result in centroids outside of bucket boundaries. This adds a related warning to the geo_centroid agg docs.	2020-01-06 07:47:54 -06:00
Gilad Gal	9fdfb075bb	Deleted 'a' before plural 'messages' Deleted 'a' before plural 'messages'	2019-12-30 21:25:15 +02:00
James Rodewig	694b119f0a	[DOCS] Percentile aggs are non-deterministic (#50468 ) Percentile aggregations are non-deterministic. A percentile aggregation can produce different results even when using the same data. Based on [this discuss post][0], the non-deterministic property stems from processes in Lucene that can affect the order in which docs are provided to the aggregation. This adds a warning stating that the aggregation is non-deterministic and what that means. [0]: https://discuss.elastic.co/t/different-results-for-same-query/111757	2019-12-23 13:13:34 -05:00
James Rodewig	364eb2d34c	[DOCS] Correct percentile rank agg example response (#50052 ) The example snippets in the percentile rank agg docs use a test dataset named `latency`, which is generated from docs/gradle.build. At some point the dataset and example snippets were updated, but the text surrounding the snippets was not. This means the text and the example snippets shown no longer match up. This corrects that by changing the snippets using /TESTRESPONSE magic comments.	2019-12-12 09:06:41 -05:00
Ignacio Vera	326fe7566e	New Histogram field mapper that supports percentiles aggregations. (#48580 ) (#49683 ) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.	2019-11-28 15:06:26 +01:00
Christos Soulios	d9f0245b10	[7.x] Implement stats aggregation for string terms (#49097 ) Backport of #47468 to 7.x This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following: min_length: The length of the shortest term max_length: The length of the longest term avg_length: The average length of all terms distribution: The probability distribution of all characters appearing in all terms entropy: The total Shannon entropy value calculated for all terms This aggregation has been implemented as an analytics plugin.	2019-11-15 14:36:21 +02:00
Ian Danforth	28c1677341	[DOCS] Fix typo in percentile rank aggregation docs (#47247 )	2019-10-15 15:56:45 -04:00
James Rodewig	f04573f8e8	[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449 ) (#46459 )	2019-09-06 16:09:09 -04:00
James Rodewig	1f36c4e50c	[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159 ) (#46332 )	2019-09-05 10:11:25 -04:00
James Rodewig	d46545f729	[DOCS] Update anchors and links for Elasticsearch API relocation (#44500 )	2019-07-19 09:18:23 -04:00
James Rodewig	53702efddd	[DOCS] Add anchors for Asciidoctor migration (#41648 )	2019-04-30 10:20:17 -04:00
Ignacio Vera	d119abdf96	Improve accuracy for Geo Centroid Aggregation (#41514 ) keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.	2019-04-25 15:25:48 +02:00
Julie Tibshirani	9ca26b7e63	Remove more references to type in docs. (#37946 ) * Update the top-level 'getting started' guide. * Remove custom types from the painless getting started documentation. * Fix an incorrect references to '_doc' in the cardinality query docs. * Update the _update docs to use the typeless API format.	2019-01-29 10:51:07 -08:00
Christoph Büscher	95a6951f78	Use new bulk API endpoint in the docs (#37698 ) This change switches to using the typeless bulk API endpoint in the documentation snippets where possible	2019-01-23 09:46:28 +01:00
Boaz Leskes	52ba407931	Expose sequence number and primary terms in search responses (#37639 ) Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.	2019-01-23 09:01:58 +01:00
Christoph Büscher	34f2d2ec91	Remove remaining occurances of "include_type_name=true" in docs (#37646 )	2019-01-22 15:13:52 +01:00
Christoph Büscher	3a96608b3f	Remove more include_type_name and types from docs (#37601 )	2019-01-18 14:11:18 +01:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same appraoch as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equals to `eq`) or a lower bound of the total (in which case it is equals to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Andy Bristol	b8280ea7cc	median absolute deviation agg (#34482 ) This commit adds a new single value metric aggregation that calculates the statistic called median absolute deviation, which is a measure of variability that works on more types of data than standard deviation Our calculation of MAD is approximated using t-digests. In the collect phase, we collect each value visited into a t-digest. In the reduce phase, we merge all value t-digests, then create a t-digest of deviations using the first t-digest's median and centroids	2018-10-30 07:22:52 -07:00
Gordon Brown	794d4fa879	Label required scripts in Scripted Metric Agg docs (#35051 ) When combine_script and reduce_script were made into required parameters for Scripted Metric aggregations in #33452, the docs were not updated to reflect that. This marks those parameters as required in the documentation.	2018-10-29 15:13:14 -06:00
Julie Tibshirani	f854330e06	Make sure to use the type _doc in the REST documentation. (#34662 ) * Replace custom type names with _doc in REST examples. * Avoid using two mapping types in the percolator docs. * Rename doc -> _doc in the main repository README. * Also replace some custom type names in the HLRC docs.	2018-10-22 11:54:04 -07:00
Zachary Tong	d981746142	[Docs] clarification about cardinality accuracy (#34616 ) Adds a bit more clarification about how accuracy is dependent on the dataset in question. Closes #18231	2018-10-22 13:15:45 -04:00
ben5556	012b9c7539	Corrected aggregation name to match the example (#33786 )	2018-09-17 18:24:43 -07:00
Jim Ferenczi	7ad71f906a	Upgrade to a Lucene 8 snapshot (#33310 ) The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly. Some comments about the change: * Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309 Closes #32899	2018-09-06 14:42:06 +02:00
Sandeep Kanabar	7ad16ffd84	Docs: Correcting a typo in tophits (#32359 )	2018-07-26 13:30:01 -04:00
Zachary Tong	6ba144ae31	Add WeightedAvg metric aggregation (#31037 ) Adds a new single-value metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values can be extracted from specific numeric fields in the documents. When calculating a regular average, each datapoint has an equal "weight"; it contributes equally to the final value. In contrast, weighted averages scale each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document, or provided by a script. As a formula, a weighted average is the `∑(value * weight) / ∑(weight)` A regular average can be thought of as a weighted average where every value has an implicit weight of `1`. Closes #15731	2018-07-23 18:33:15 -04:00
Peter Evers	ea15284230	Docs: Match the examples in the description (#31710 ) Prose drifted from snippet.	2018-07-02 14:12:49 -04:00
Peter Evers	050fbc8f3d	Docs: Fix description of percentile ranks example example (#31652 )	2018-06-28 09:29:56 -04:00
Jonathan Little	8e4768890a	Migrate scripted metric aggregation scripts to ScriptContext design (#30111 ) * Migrate scripted metric aggregation scripts to ScriptContext design #29328 * Rename new script context container class and add clarifying comments to remaining references to params._agg(s) * Misc cleanup: make mock metric agg script inner classes static * Move _score to an accessor rather than an arg for scripted metric agg scripts This causes the score to be evaluated only when it's used. * Documentation changes for params._agg -> agg * Migration doc addition for scripted metric aggs _agg object change * Rename "agg" Scripted Metric Aggregation script context variable to "state" * Rename a private base class from ...Agg to ...State that I missed in my last commit * Clean up imports after merge	2018-06-25 12:01:33 +01:00
Colin Goodheart-Smithe	58e9446e00	Removes experimental tag from scripted_metric aggregation (#31298 )	2018-06-13 17:24:32 +01:00
Christoph Büscher	4777d8a2df	[Docs] Fix typo in Min Aggregation reference (#30899 )	2018-05-31 15:05:03 +02:00
Piotr Prądzyński	cefbd29db3	top_hits doc example description update (#30676 ) Example description does not fit example code.	2018-05-17 15:21:25 +01:00
Jason Tedor	4a4e3d70d5	Default to one shard (#30539 ) This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard with the split API in place they no longer need to resort only to reindexing. Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).	2018-05-14 12:22:35 -04:00
Karim Frenn	3acca0b35c	[Docs] Fix typo in cardinality-aggregation.asciidoc (#30434 )	2018-05-08 16:12:36 +02:00

1 2 3

113 Commits