Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.
Relates to #53692
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
Backport of #47468 to 7.x
This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following:
min_length: The length of the shortest term
max_length: The length of the longest term
avg_length: The average length of all terms
distribution: The probability distribution of all characters appearing in all terms
entropy: The total Shannon entropy value calculated for all terms
This aggregation has been implemented as an analytics plugin.
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation
Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids
Adds a new single-value metrics aggregation that computes the weighted
average of numeric values that are extracted from the aggregated
documents. These values can be extracted from specific numeric
fields in the documents.
When calculating a regular average, each datapoint has an equal "weight"; it
contributes equally to the final value. In contrast, weighted averages
scale each datapoint differently. The amount that each datapoint contributes
to the final value is extracted from the document, or provided by a script.
As a formula, a weighted average is the `∑(value * weight) / ∑(weight)`
A regular average can be thought of as a weighted average where every value has
an implicit weight of `1`.
Closes#15731
This commit adds a new metric aggregator for computing the geo_centroid over a set of geo_point fields. This can be combined with other aggregators (e.g., geohash_grid, significant_terms) for computing the geospatial centroid based on the document sets from other aggregation results.