OpenSearch/docs/reference/aggregations/pipeline/percentiles-bucket-aggregation.asciidoc
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to a confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (i.e. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and Java clients.
2019-05-20 12:07:29 -04:00


[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles Bucket Aggregation
A sibling pipeline aggregation that calculates percentiles across all buckets of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
==== Syntax
A `percentiles_bucket` aggregation looks like this in isolation:
[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}
--------------------------------------------------
// NOTCONSOLE
[[percentiles-bucket-params]]
.`percentiles_bucket` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to find the percentiles for (see <<buckets-path-syntax>> for more
details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
details)|Optional | `skip`
|`format` |Format to apply to the output value of this aggregation |Optional | `null`
|`percents` |The list of percentiles to calculate |Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]`
|`keyed` |Flag that returns the percentiles as a hash instead of an array of key-value pairs |Optional | `true`
|===
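
The parameters in the table can be combined in one aggregation body. The following Python sketch is illustrative only: `percentiles_bucket` is a hypothetical helper, not part of any client library, and its 0 to 100 range check on `percents` is an assumed sanity check. It emits only the options that were explicitly set, so omitted parameters fall back to the defaults listed above.

```python
import json


def percentiles_bucket(buckets_path, percents=None, gap_policy=None,
                       fmt=None, keyed=None):
    """Build a percentiles_bucket body (hypothetical helper).

    Only buckets_path is required; options left as None are omitted.
    """
    if percents is not None:
        for p in percents:
            # Assumed sanity check: a percentile lies between 0 and 100.
            if not 0 <= p <= 100:
                raise ValueError(f"percent out of range: {p}")
    body = {"buckets_path": buckets_path}
    options = {
        "percents": percents,
        "gap_policy": gap_policy,
        "format": fmt,
        "keyed": keyed,
    }
    body.update({k: v for k, v in options.items() if v is not None})
    return {"percentiles_bucket": body}


agg = percentiles_bucket(
    "sales_per_month>sales",
    percents=[25.0, 50.0, 75.0],
    gap_policy="insert_zeros",
    keyed=False,
)
print(json.dumps(agg, indent=2))
```

The printed JSON can be placed under an aggregation name in the `aggs` section of a search request.
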
The following snippet calculates the percentiles for the total monthly `sales` buckets:
[source,js]
--------------------------------------------------
POST /sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "calendar_interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales", <1>
                "percents": [ 25.0, 50.0, 75.0 ] <2>
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
<1> `buckets_path` instructs this percentiles_bucket aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.
And the following may be the response:
[source,js]
--------------------------------------------------
{
    "took": 11,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {
                    "key_as_string": "2015/01/01 00:00:00",
                    "key": 1420070400000,
                    "doc_count": 3,
                    "sales": {
                        "value": 550.0
                    }
                },
                {
                    "key_as_string": "2015/02/01 00:00:00",
                    "key": 1422748800000,
                    "doc_count": 2,
                    "sales": {
                        "value": 60.0
                    }
                },
                {
                    "key_as_string": "2015/03/01 00:00:00",
                    "key": 1425168000000,
                    "doc_count": 2,
                    "sales": {
                        "value": 375.0
                    }
                }
            ]
        },
        "percentiles_monthly_sales": {
            "values" : {
                "25.0": 375.0,
                "50.0": 375.0,
                "75.0": 550.0
            }
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
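
To make the role of `buckets_path` concrete, the Python sketch below collects the series that `sales_per_month>sales` points at from the response above. `resolve_buckets_path` is a hypothetical helper and handles only the simple two-segment `aggregation>metric` form, not the full `buckets_path` syntax.

```python
def resolve_buckets_path(aggregations, buckets_path):
    """Collect one metric value per bucket for a simple 'agg>metric' path."""
    agg_name, metric_name = buckets_path.split(">")
    buckets = aggregations[agg_name]["buckets"]
    return [bucket[metric_name]["value"] for bucket in buckets]


# The "aggregations" part of the sample response, trimmed to the fields used here.
aggregations = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2015/01/01 00:00:00", "sales": {"value": 550.0}},
            {"key_as_string": "2015/02/01 00:00:00", "sales": {"value": 60.0}},
            {"key_as_string": "2015/03/01 00:00:00", "sales": {"value": 375.0}},
        ]
    }
}

values = resolve_buckets_path(aggregations, "sales_per_month>sales")
print(values)  # [550.0, 60.0, 375.0]
```

These three monthly sums are the data points from which the percentiles in `percentiles_monthly_sales` are selected.
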
==== Percentiles_bucket implementation
The `percentiles_bucket` aggregation returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.
The percentiles are calculated exactly and are not an approximation (unlike the Percentiles metric aggregation). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles, before discarding the
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data points in a single `percentiles_bucket`.
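
The approach described above (sort every value in memory, then pick an existing data point without interpolating) can be sketched in Python as follows. This is not the actual implementation: `percentiles_bucket_values` is a hypothetical function, and its index rule (round half up over `n - 1` positions) is an assumption chosen because it reproduces the sample response shown earlier. The real aggregation may select the index differently.

```python
import math


def percentiles_bucket_values(values, percents):
    """Pick exact, non-interpolated percentiles from a list of bucket values."""
    # The whole series is sorted and held in memory, which is why very
    # large inputs can cause memory pressure.
    data = sorted(values)
    result = {}
    for p in percents:
        if not data:
            # Assumption: report no value when there are no buckets.
            result[str(p)] = None
            continue
        # Assumed index rule: round half up to the nearest sorted position.
        index = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[str(p)] = data[index]
    return result


# Monthly sums from the sample response: 550.0, 60.0 and 375.0.
result = percentiles_bucket_values([550.0, 60.0, 375.0], [25.0, 50.0, 75.0])
print(result)  # {'25.0': 375.0, '50.0': 375.0, '75.0': 550.0}
```

Every returned value is one of the input data points, so with only three buckets the 25th and 50th percentiles both land on `375.0`.
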