[role="xpack"]
[testenv="basic"]
[[search-aggregations-metrics-boxplot-aggregation]]
=== Boxplot Aggregation

The `boxplot` metrics aggregation computes a box plot of the numeric values extracted from the aggregated documents.
These values can be generated by a provided script or extracted from specific numeric or <> in the documents.

The `boxplot` aggregation returns the essential information for making a {wikipedia}/Box_plot[box plot]:
the minimum, maximum, median, first quartile (25th percentile), and third quartile (75th percentile) values.

==== Syntax

A `boxplot` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "boxplot": {
    "field": "load_time"
  }
}
--------------------------------------------------
// NOTCONSOLE

Let's look at a boxplot representing load time:

[source,console]
--------------------------------------------------
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time" <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:latency]

<1> The field `load_time` must be a numeric field.

The response will look like this:

[source,console-result]
--------------------------------------------------
{
  ...

  "aggregations": {
    "load_time_boxplot": {
      "min": 0.0,
      "max": 990.0,
      "q1": 165.0,
      "q2": 445.0,
      "q3": 725.0
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

==== Script

The boxplot metric supports scripting.
For example, if our load times are in milliseconds but we want values calculated in seconds,
we could use a script to convert them on the fly:

[source,console]
--------------------------------------------------
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "script": {
          "lang": "painless",
          "source": "doc['load_time'].value / params.timeUnit", <1>
          "params": {
            "timeUnit": 1000 <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:latency]

<1> The `field` parameter is replaced with a `script` parameter, which uses the script to generate the values on which the boxplot is calculated.
<2> Scripting supports parameterized input just like any other script.

This interprets the `script` parameter as an `inline` script written in the `painless` script language.
To use a stored script, use the following syntax:

[source,console]
--------------------------------------------------
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "script": {
          "id": "my_script",
          "params": {
            "field": "load_time"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:latency,stored_example_script]

[[search-aggregations-metrics-boxplot-aggregation-approximation]]
==== Boxplot values are (usually) approximate

The algorithm used by the `boxplot` metric is called TDigest (introduced by Ted Dunning in
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[Computing Accurate Quantiles using T-Digests]).

[WARNING]
====
Like other percentile aggregations, `boxplot` is
{wikipedia}/Nondeterministic_algorithm[non-deterministic].
This means you can get slightly different results using the same data.
====

[[search-aggregations-metrics-boxplot-aggregation-compression]]
==== Compression

Approximate algorithms must balance memory utilization with estimation accuracy.
This balance can be controlled using a `compression` parameter:

[source,console]
--------------------------------------------------
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time",
        "compression": 200 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:latency]

<1> Compression controls memory usage and approximation error.

include::percentile-aggregation.asciidoc[tags=t-digest]

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but they can also be treated as if they had a value.

[source,console]
--------------------------------------------------
GET latency/_search
{
  "size": 0,
  "aggs": {
    "grade_boxplot": {
      "boxplot": {
        "field": "grade",
        "missing": 10 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:latency]

<1> Documents without a value in the `grade` field are treated as if they had the value `10`.
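
As a rough illustration of the two behaviours described above, the following Python sketch computes an exact five-number summary over a small in-memory list, first ignoring absent values and then substituting a default for them.
It is not how Elasticsearch computes the aggregation: it uses the standard library's `statistics.quantiles` rather than TDigest, so the quartiles may differ from what the `boxplot` aggregation returns for the same data.
The `boxplot_summary` helper and the `grades` sample are hypothetical names introduced only for this example.

```python
import statistics


def boxplot_summary(values, missing=None):
    """Exact five-number summary of `values`.

    `None` entries stand in for documents that have no value.
    With missing=None they are skipped; otherwise they are
    replaced by `missing` before the summary is computed.
    """
    if missing is None:
        present = [v for v in values if v is not None]
    else:
        present = [missing if v is None else v for v in values]

    # Exact quartiles with linear interpolation (assumption: this is
    # only an illustration, not the TDigest estimate Elasticsearch uses).
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "min": min(present),
        "max": max(present),
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


# Hypothetical sample: two "documents" have no grade.
grades = [2, None, 8, 4, None, 6]

ignored = boxplot_summary(grades)             # default: skip absent values
filled = boxplot_summary(grades, missing=10)  # treat absent values as 10

print(ignored)
print(filled)
```

In this sample, substituting `10` for the absent values raises both the maximum and the median, which is the kind of shift to expect when `missing` is set to a value outside the range of the existing data.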