[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles Bucket Aggregation

experimental[]

A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

==== Syntax

A `percentiles_bucket` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}
--------------------------------------------------

.`percentiles_bucket` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to calculate percentiles for (see <<buckets-path-syntax>> for more details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more details) |Optional | `skip`
|`format` |Format to apply to the output value of this aggregation |Optional | `null`
|`percents` |The list of percentiles to calculate |Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]`
|===

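As an illustration of how these parameters combine, the following Python sketch builds a `percentiles_bucket` clause as a plain dictionary. The helper name `percentiles_bucket_clause` and its keyword arguments are hypothetical and not part of any client library. Optional parameters that are not supplied are simply omitted, so the defaults listed in the table apply.

```python
import json


def percentiles_bucket_clause(buckets_path, percents=None, gap_policy=None, fmt=None):
    """Build a percentiles_bucket clause (hypothetical helper, not a client API)."""
    body = {"buckets_path": buckets_path}  # the only required parameter
    # Optional parameters are added only when the caller supplies them.
    if percents is not None:
        body["percents"] = list(percents)
    if gap_policy is not None:
        body["gap_policy"] = gap_policy
    if fmt is not None:
        body["format"] = fmt
    return {"percentiles_bucket": body}


clause = percentiles_bucket_clause("sales_per_month>sales", percents=[25.0, 50.0, 75.0])
print(json.dumps(clause, indent=2))
```

The resulting dictionary can be serialized to JSON and placed under an aggregation name next to the multi-bucket aggregation it reads from.
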
The following snippet calculates the percentiles for the total monthly `sales` buckets:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales", <1>
                "percents": [ 25.0, 50.0, 75.0 ] <2>
            }
        }
    }
}
--------------------------------------------------
<1> `buckets_path` instructs this percentiles_bucket aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.

And the following may be the response:

[source,js]
--------------------------------------------------
{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375
               }
            }
         ]
      },
      "percentiles_monthly_sales": {
         "values" : {
            "25.0": 60,
            "50.0": 375,
            "75.0": 550
         }
      }
   }
}
--------------------------------------------------

==== Percentiles_bucket implementation

The Percentile Bucket returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.

The percentiles are calculated exactly and are not an approximation (unlike the Percentiles Metric). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles, before discarding the
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data-points in a single `percentiles_bucket`.
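
The behaviour described in this section can be sketched in Python. This is an illustration only, not the actual implementation: the function name `exact_percentiles` and the index rule (truncating `percent / 100 * n` and clamping it to the last element) are assumptions, chosen because they reproduce the example response shown earlier.

```python
def exact_percentiles(values, percents):
    """Pick exact percentiles from a sorted copy of the data, without interpolation.

    Assumption: the index is the truncated value of percent / 100 * n,
    clamped to the last element. The real aggregation may use a different rule.
    """
    data = sorted(values)  # the whole input is held in memory, sorted
    n = len(data)
    result = {}
    for p in percents:
        if n == 0:
            result[float(p)] = None  # no buckets, so no data point to return
            continue
        index = min(int(p / 100.0 * n), n - 1)
        result[float(p)] = data[index]  # always an actual input value
    return result


monthly_sales = [550, 60, 375]  # the `sales` values from the example response
print(exact_percentiles(monthly_sales, [25.0, 50.0, 75.0]))
# prints {25.0: 60, 50.0: 375, 75.0: 550}
```

Because every returned value is one of the inputs, the cost is a full sorted copy of the bucket values, which is why very large bucket counts can cause memory pressure.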