[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles Bucket Aggregation

experimental[]

A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

==== Syntax

A `percentiles_bucket` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}
--------------------------------------------------

.`percentiles_bucket` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to calculate percentiles for (see <<buckets-path-syntax>> for more
details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
details) |Optional |`skip`
|`format` |Format to apply to the output value of this aggregation |Optional |`null`
|`percents` |The list of percentiles to calculate |Optional |`[ 1, 5, 25, 50, 75, 95, 99 ]`
|===
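
For illustration, the optional parameters sit alongside `buckets_path` in the same object. The snippet below is a hypothetical sketch (the `format` pattern and the particular parameter combination are made up for illustration, not taken from the example that follows):

[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum",
        "gap_policy": "insert_zeros",
        "format": "#,##0.00",
        "percents": [ 50.0, 99.0 ]
    }
}
--------------------------------------------------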

The following snippet calculates percentiles for the total monthly `sales` buckets:

[source,js]
--------------------------------------------------
POST /sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales", <1>
                "percents": [ 25.0, 50.0, 75.0 ] <2>
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
<1> `buckets_path` instructs this percentiles_bucket aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.

And the following may be the response:

[source,js]
--------------------------------------------------
{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               }
            }
         ]
      },
      "percentiles_monthly_sales": {
         "values" : {
            "25.0": 60.0,
            "50.0": 375.0,
            "75.0": 550.0
         }
      }
   }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]

==== Percentiles_bucket implementation

The Percentile Bucket returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.

The percentiles are calculated exactly, not approximated (unlike the Percentiles Metric aggregation). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles before discarding the
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data points in a single `percentiles_bucket`.
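
To make the nearest-rank behaviour concrete, here is a minimal JavaScript sketch (an illustration only, not the actual implementation) applied to the three monthly sums from the example response above:

[source,js]
--------------------------------------------------
// Minimal nearest-rank sketch (illustration only, not the actual
// Elasticsearch implementation).
function percentilesBucket(values, percents) {
    // Keep an in-memory, sorted copy of the bucket values.
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var results = {};
    percents.forEach(function (p) {
        // Select the nearest data point not greater than the requested
        // rank; no interpolation between adjacent points is performed.
        var index = Math.min(Math.floor((p / 100) * sorted.length),
                             sorted.length - 1);
        results[p.toFixed(1)] = sorted[index];
    });
    return results;
}

// The three monthly sums from the example response:
percentilesBucket([550.0, 60.0, 375.0], [25.0, 50.0, 75.0]);
// => { "25.0": 60, "50.0": 375, "75.0": 550 }
--------------------------------------------------

Note how each returned value is one of the input data points, matching the `percentiles_monthly_sales` values in the response above exactly, because no interpolation takes place.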