OpenSearch/docs/reference/aggregations/pipeline/derivative-aggregation.asciidoc
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00

315 lines
9.1 KiB
Plaintext

[[search-aggregations-pipeline-derivative-aggregation]]
=== Derivative Aggregation
A parent pipeline aggregation which calculates the derivative of a specified metric in a parent histogram (or date_histogram)
aggregation. The specified metric must be numeric and the enclosing histogram must have `min_doc_count` set to `0` (default
for `histogram` aggregations).
==== Syntax
A `derivative` aggregation looks like this in isolation:
[source,js]
--------------------------------------------------
"derivative": {
"buckets_path": "the_sum"
}
--------------------------------------------------
// NOTCONSOLE
.`derivative` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to find the derivative for (see <<buckets-path-syntax>> for more
details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
details)|Optional |`skip`
|`format` |format to apply to the output value of this aggregation |Optional | `null`
|===
==== First Order Derivative
The following snippet calculates the derivative of the total monthly `sales`:
[source,js]
--------------------------------------------------
POST /sales/_search
{
"size": 0,
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"sales_deriv": {
"derivative": {
"buckets_path": "sales" <1>
}
}
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
<1> `buckets_path` instructs this derivative aggregation to use the output of the `sales` aggregation for the derivative
And the following may be the response:
[source,js]
--------------------------------------------------
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
} <1>
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
},
"sales_deriv": {
"value": -490.0 <2>
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2, <3>
"sales": {
"value": 375.0
},
"sales_deriv": {
"value": 315.0
}
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
<1> No derivative for the first bucket since we need at least 2 data points to calculate the derivative
<2> Derivative value units are implicitly defined by the `sales` aggregation and the parent histogram so in this case the units
would be $/month assuming the `price` field has units of $.
<3> The number of documents in the bucket are represented by the `doc_count`
==== Second Order Derivative
A second order derivative can be calculated by chaining the derivative pipeline aggregation onto the result of another derivative
pipeline aggregation as in the following example which will calculate both the first and the second order derivative of the total
monthly sales:
[source,js]
--------------------------------------------------
POST /sales/_search
{
"size": 0,
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"sales_deriv": {
"derivative": {
"buckets_path": "sales"
}
},
"sales_2nd_deriv": {
"derivative": {
"buckets_path": "sales_deriv" <1>
}
}
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
<1> `buckets_path` for the second derivative points to the name of the first derivative
And the following may be the response:
[source,js]
--------------------------------------------------
{
"took": 50,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
} <1>
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
},
"sales_deriv": {
"value": -490.0
} <1>
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
},
"sales_deriv": {
"value": 315.0
},
"sales_2nd_deriv": {
"value": 805.0
}
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 50/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
<1> No second derivative for the first two buckets since we need at least 2 data points from the first derivative to calculate the
second derivative
==== Units
The derivative aggregation allows the units of the derivative values to be specified. This returns an extra field in the response
`normalized_value` which reports the derivative value in the desired x-axis units. In the below example we calculate the derivative
of the total sales per month but ask for the derivative of the sales as in the units of sales per day:
[source,js]
--------------------------------------------------
POST /sales/_search
{
"size": 0,
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"sales_deriv": {
"derivative": {
"buckets_path": "sales",
"unit": "day" <1>
}
}
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
<1> `unit` specifies what unit to use for the x-axis of the derivative calculation
And the following may be the response:
[source,js]
--------------------------------------------------
{
"took": 50,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
} <1>
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
},
"sales_deriv": {
"value": -490.0, <1>
"normalized_value": -15.806451612903226 <2>
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
},
"sales_deriv": {
"value": 315.0,
"normalized_value": 11.25
}
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 50/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
<1> `value` is reported in the original units of 'per month'
<2> `normalized_value` is reported in the desired units of 'per day'