[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles Bucket Aggregation

experimental[]

A sibling pipeline aggregation that calculates percentiles across all buckets of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

==== Syntax

A `percentiles_bucket` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}
--------------------------------------------------

.`percentiles_bucket` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to calculate percentiles for (see <<buckets-path-syntax>> for more
 details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
 details) |Optional | `skip`
|`format` |The format to apply to the output value of this aggregation |Optional | `null`
|`percents` |The list of percentiles to calculate |Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]`
|===
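
As a rough illustration of how the parameters in the table fit together, the following Python sketch assembles a `percentiles_bucket` body as a plain dictionary. The helper name `percentiles_bucket_body` and its argument names are hypothetical and are not part of any client library. Only `buckets_path` is always emitted; the optional parameters are left out unless given, so that the server-side defaults apply.

```python
import json


def percentiles_bucket_body(buckets_path, percents=None, gap_policy=None, fmt=None):
    """Build a percentiles_bucket aggregation body (hypothetical helper)."""
    # buckets_path is the only required parameter.
    body = {"buckets_path": buckets_path}
    # Optional parameters are omitted when unset so the defaults apply.
    if percents is not None:
        body["percents"] = list(percents)
    if gap_policy is not None:
        body["gap_policy"] = gap_policy
    if fmt is not None:
        body["format"] = fmt
    return {"percentiles_bucket": body}


if __name__ == "__main__":
    request = percentiles_bucket_body(
        "sales_per_month>sales",
        percents=[25.0, 50.0, 75.0],
        gap_policy="skip",
    )
    print(json.dumps(request, indent=4))
```

The resulting dictionary can be serialized with `json.dumps` and placed under an aggregation name in the `aggs` section of a search request.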

The following snippet calculates percentiles for the total monthly `sales` buckets:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales", <1>
                "percents": [ 25.0, 50.0, 75.0 ] <2>
            }
        }
    }
}
--------------------------------------------------
<1> `buckets_path` instructs this `percentiles_bucket` aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.

The response might look like this:

[source,js]
--------------------------------------------------
{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375
               }
            }
         ]
      },
      "percentiles_monthly_sales": {
         "values": {
            "25.0": 60,
            "50.0": 375,
            "75.0": 550
         }
      }
   }
}
--------------------------------------------------


==== Percentiles_bucket implementation

The `percentiles_bucket` aggregation returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.
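
To make the lack of interpolation concrete, here is a minimal Python sketch of one selection rule that always returns an existing data point. The index rule used here (truncating `percent / 100 * n` and clamping to the last element) is an assumption chosen for illustration, not a statement of what the server does internally. The function name `percentiles_bucket_sketch` is likewise hypothetical.

```python
def percentiles_bucket_sketch(values, percents):
    """Select non-interpolated percentiles from a list of bucket values."""
    # The whole series is copied and sorted in memory.
    data = sorted(values)
    result = {}
    for p in percents:
        if not data:
            # No buckets: there is nothing to select.
            result[p] = None
            continue
        # Assumed rule: truncate p/100 * n, clamped so that p = 100 stays in range.
        index = min(int(p / 100.0 * len(data)), len(data) - 1)
        # Always an actual input value, never a blend of two neighbours.
        result[p] = data[index]
    return result


if __name__ == "__main__":
    monthly_sales = [550, 60, 375]
    print(percentiles_bucket_sketch(monthly_sales, [25.0, 50.0, 75.0]))
    # prints {25.0: 60, 50.0: 375, 75.0: 550}
```

For the three monthly `sales` values in the response shown earlier (550, 60 and 375), this rule yields 60, 375 and 550 for the 25th, 50th and 75th percentiles. An interpolating method could instead return values that do not appear in any bucket.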

The percentiles are calculated exactly and are not an approximation (unlike the Percentiles metric aggregation). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles, before discarding the
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data points in a single `percentiles_bucket`.