diff --git a/docs/reference/search/aggregations/reducers/images/double_0.2beta.png b/docs/reference/search/aggregations/reducers/images/double_0.2beta.png new file mode 100644 index 00000000000..64499b98342 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/double_0.2beta.png differ diff --git a/docs/reference/search/aggregations/reducers/images/double_0.7beta.png b/docs/reference/search/aggregations/reducers/images/double_0.7beta.png new file mode 100644 index 00000000000..b9f530227d9 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/double_0.7beta.png differ diff --git a/docs/reference/search/aggregations/reducers/images/double_prediction_global.png b/docs/reference/search/aggregations/reducers/images/double_prediction_global.png new file mode 100644 index 00000000000..faee6d22bc2 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/double_prediction_global.png differ diff --git a/docs/reference/search/aggregations/reducers/images/double_prediction_local.png b/docs/reference/search/aggregations/reducers/images/double_prediction_local.png new file mode 100644 index 00000000000..930a5cfde9b Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/double_prediction_local.png differ diff --git a/docs/reference/search/aggregations/reducers/images/linear_100window.png b/docs/reference/search/aggregations/reducers/images/linear_100window.png new file mode 100644 index 00000000000..3a4d51ae956 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/linear_100window.png differ diff --git a/docs/reference/search/aggregations/reducers/images/linear_10window.png b/docs/reference/search/aggregations/reducers/images/linear_10window.png new file mode 100644 index 00000000000..1407ded8791 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/linear_10window.png differ diff --git a/docs/reference/search/aggregations/reducers/images/movavg_100window.png b/docs/reference/search/aggregations/reducers/images/movavg_100window.png new file mode 100644 index 00000000000..45094ec2681 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/movavg_100window.png differ diff --git a/docs/reference/search/aggregations/reducers/images/movavg_10window.png b/docs/reference/search/aggregations/reducers/images/movavg_10window.png new file mode 100644 index 00000000000..1e9f543385f Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/movavg_10window.png differ diff --git a/docs/reference/search/aggregations/reducers/images/simple_prediction.png b/docs/reference/search/aggregations/reducers/images/simple_prediction.png new file mode 100644 index 00000000000..d74724e1546 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/simple_prediction.png differ diff --git a/docs/reference/search/aggregations/reducers/images/single_0.2alpha.png b/docs/reference/search/aggregations/reducers/images/single_0.2alpha.png new file mode 100644 index 00000000000..d96cf771743 Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/single_0.2alpha.png differ diff --git a/docs/reference/search/aggregations/reducers/images/single_0.7alpha.png b/docs/reference/search/aggregations/reducers/images/single_0.7alpha.png new file mode 100644 index 00000000000..bf7bdd1752e Binary files /dev/null and b/docs/reference/search/aggregations/reducers/images/single_0.7alpha.png differ diff --git a/docs/reference/search/aggregations/reducers/movavg-reducer.asciidoc b/docs/reference/search/aggregations/reducers/movavg-reducer.asciidoc new file mode 100644 index 00000000000..d9759629b75 --- /dev/null +++ b/docs/reference/search/aggregations/reducers/movavg-reducer.asciidoc @@ -0,0 +1,296 @@ +[[search-aggregations-reducers-movavg-reducer]] +=== Moving Average Aggregation + +Given an ordered series of data, the Moving Average aggregation will slide a window across the data and emit the average +value of that window. For example, given the data `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`, we can calculate a simple moving +average with windows size of `5` as follows: + +- (1 + 2 + 3 + 4 + 5) / 5 = 3 +- (2 + 3 + 4 + 5 + 6) / 5 = 4 +- (3 + 4 + 5 + 6 + 7) / 5 = 5 +- etc + +Moving averages are a simple method to smooth sequential data. Moving averages are typically applied to time-based data, +such as stock prices or server metrics. The smoothing can be used to eliminate high frequency fluctuations or random noise, +which allows the lower frequency trends to be more easily visualized, such as seasonality. + +==== Syntax + +A `moving_avg` aggregation looks like this in isolation: + +[source,js] +-------------------------------------------------- +{ + "movavg": { + "buckets_path": "the_sum", + "model": "double_exp", + "window": 5, + "gap_policy": "insert_zero", + "settings": { + "alpha": 0.8 + } + } +} +-------------------------------------------------- + +.`moving_avg` Parameters +|=== +|Parameter Name |Description |Required |Default + +|`buckets_path` |The path to the metric that we wish to calculate a moving average for |Required | +|`model` |The moving average weighting model that we wish to use |Optional |`simple` +|`gap_policy` |Determines what should happen when a gap in the data is encountered. |Optional |`insert_zero` +|`window` |The size of window to "slide" across the histogram. |Optional |`5` +|`settings` |Model-specific settings, contents which differ depending on the model specified. |Optional | +|=== + + +`moving_avg` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be +embedded like any other metric aggregation: + +[source,js] +-------------------------------------------------- +{ + "my_date_histo":{ <1> + "date_histogram":{ + "field":"timestamp", + "interval":"day", + "min_doc_count": 0 <2> + }, + "aggs":{ + "the_sum":{ + "sum":{ "field": "lemmings" } <3> + }, + "the_movavg":{ + "moving_avg":{ "buckets_path": "the_sum" } <4> + } + } + } +} +-------------------------------------------------- +<1> A `date_histogram` named "my_date_histo" is constructed on the "timestamp" field, with one-day intervals +<2> We must specify "min_doc_count: 0" in our date histogram that all buckets are returned, even if they are empty. +<3> A `sum` metric is used to calculate the sum of a field. This could be any metric (sum, min, max, etc) +<4> Finally, we specify a `moving_avg` aggregation which uses "the_sum" metric as it's input. + +Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally +add normal metrics, such as a `sum`, inside of that histogram. Finally, the `moving_avg` is embedded inside the histogram. +The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram. + +A moving average can also be calculated on the document count of each bucket, instead of a metric: + +[source,js] +-------------------------------------------------- +{ + "my_date_histo":{ + "date_histogram":{ + "field":"timestamp", + "interval":"day", + "min_doc_count": 0 + }, + "aggs":{ + "the_movavg":{ + "moving_avg":{ "buckets_path": "_count" } <1> + } + } + } +} +-------------------------------------------------- +<1> By using `_count` instead of a metric name, we can calculate the moving average of document counts in the histogram + +==== Models + +The `moving_avg` aggregation includes four different moving average "models". The main difference is how the values in the +window are weighted. As data-points become "older" in the window, they may be weighted differently. This will +affect the final average for that window. + +Models are specified using the `model` parameter. Some models may have optional configurations which are specified inside +the `settings` parameter. + +===== Simple + +The `simple` model calculates the sum of all values in the window, then divides by the size of the window. It is effectively +a simple arithmetic mean of the window. The simple model does not perform any time-dependent weighting, which means +the values from a `simple` moving average tend to "lag" behind the real data. + +[source,js] +-------------------------------------------------- +{ + "the_movavg":{ + "moving_avg":{ + "buckets_path": "the_sum", + "model" : "simple" + } +} +-------------------------------------------------- + +A `simple` model has no special settings to configure + +The window size can change the behavior of the moving average. For example, a small window (`"window": 10`) will closely +track the data and only smooth out small scale fluctuations: + +[[movavg_10window]] +.Moving average with window of size 10 +image::images/movavg_10window.png[] + +In contrast, a `simple` moving average with larger window (`"window": 100`) will smooth out all higher-frequency fluctuations, +leaving only low-frequency, long term trends. It also tends to "lag" behind the actual data by a substantial amount: + +[[movavg_100window]] +.Moving average with window of size 100 +image::images/movavg_100window.png[] + + +==== Linear + +The `linear` model assigns a linear weighting to points in the series, such that "older" datapoints (e.g. those at +the beginning of the window) contribute a linearly less amount to the total average. The linear weighting helps reduce +the "lag" behind the data's mean, since older points have less influence. + +[source,js] +-------------------------------------------------- +{ + "the_movavg":{ + "moving_avg":{ + "buckets_path": "the_sum", + "model" : "linear" + } +} +-------------------------------------------------- + +A `linear` model has no special settings to configure + +Like the `simple` model, window size can change the behavior of the moving average. For example, a small window (`"window": 10`) +will closely track the data and only smooth out small scale fluctuations: + +[[linear_10window]] +.Linear moving average with window of size 10 +image::images/linear_10window.png[] + +In contrast, a `linear` moving average with larger window (`"window": 100`) will smooth out all higher-frequency fluctuations, +leaving only low-frequency, long term trends. It also tends to "lag" behind the actual data by a substantial amount, +although typically less than the `simple` model: + +[[linear_100window]] +.Linear moving average with window of size 100 +image::images/linear_100window.png[] + +==== Single Exponential + +The `single_exp` model is similar to the `linear` model, except older data-points become exponentially less important, +rather than linearly less important. The speed at which the importance decays can be controlled with an `alpha` +setting. Small values make the weight decay slowly, which provides greater smoothing and takes into account a larger +portion of the window. Larger valuers make the weight decay quickly, which reduces the impact of older values on the +moving average. This tends to make the moving average track the data more closely but with less smoothing. + +The default value of `alpha` is `0.5`, and the setting accepts any float from 0-1 inclusive. + +[source,js] +-------------------------------------------------- +{ + "the_movavg":{ + "moving_avg":{ + "buckets_path": "the_sum", + "model" : "single_exp", + "settings" : { + "alpha" : 0.5 + } + } +} +-------------------------------------------------- + + + +[[single_0.2alpha]] +.Single Exponential moving average with window of size 10, alpha = 0.2 +image::images/single_0.2alpha.png[] + +[[single_0.7alpha]] +.Single Exponential moving average with window of size 10, alpha = 0.7 +image::images/single_0.7alpha.png[] + +==== Double Exponential + +The `double_exp` model, sometimes called "Holt's Linear Trend" model, incorporates a second exponential term which +tracks the data's trend. Single exponential does not perform well when the data has an underlying linear trend. The +double exponential model calculates two values internally: a "level" and a "trend". + +The level calculation is similar to `single_exp`, and is an exponentially weighted view of the data. The difference is +that the previously smoothed value is used instead of the raw value, which allows it to stay close to the original series. +The trend calculation looks at the difference between the current and last value (e.g. the slope, or trend, of the +smoothed data). The trend value is also exponentially weighted. + +Values are produced by multiplying the level and trend components. + +The default value of `alpha` and `beta` is `0.5`, and the settings accept any float from 0-1 inclusive. + +[source,js] +-------------------------------------------------- +{ + "the_movavg":{ + "moving_avg":{ + "buckets_path": "the_sum", + "model" : "double_exp", + "settings" : { + "alpha" : 0.5, + "beta" : 0.5 + } + } +} +-------------------------------------------------- + +In practice, the `alpha` value behaves very similarly in `double_exp` as `single_exp`: small values produce more smoothing +and more lag, while larger values produce closer tracking and less lag. The value of `beta` is often difficult +to see. Small values emphasize long-term trends (such as a constant linear trend in the whole series), while larger +values emphasize short-term trends. This will become more apparently when you are predicting values. + +[[double_0.2beta]] +.Double Exponential moving average with window of size 100, alpha = 0.5, beta = 0.2 +image::images/double_0.2beta.png[] + +[[double_0.7beta]] +.Double Exponential moving average with window of size 100, alpha = 0.5, beta = 0.7 +image::images/double_0.7beta.png[] + +=== Prediction + +All the moving average model support a "prediction" mode, which will attempt to extrapolate into the future given the +current smoothed, moving average. Depending on the model and parameter, these predictions may or may not be accurate. + +Predictions are enabled by adding a `predict` parameter to any moving average aggregation, specifying the nubmer of +predictions you would like appended to the end of the series. These predictions will be spaced out at the same interval +as your buckets: + +[source,js] +-------------------------------------------------- +{ + "the_movavg":{ + "moving_avg":{ + "buckets_path": "the_sum", + "model" : "simple", + "predict" 10 + } +} +-------------------------------------------------- + +The `simple`, `linear` and `single_exp` models all produce "flat" predictions: they essentially converge on the mean +of the last value in the series, producing a flat: + +[[simple_prediction]] +.Simple moving average with window of size 10, predict = 50 +image::images/simple_prediction.png[] + +In contrast, the `double_exp` model can extrapolate based on local or global constant trends. If we set a high `beta` +value, we can extrapolate based on local constant trends (in this case the predictions head down, because the data at the end +of the series was heading in a downward direction): + +[[double_prediction_local]] +.Double Exponential moving average with window of size 100, predict = 20, alpha = 0.5, beta = 0.8 +image::images/double_prediction_local.png[] + +In contrast, if we choose a small `beta`, the predictions are based on the global constant trend. In this series, the +global trend is slightly positive, so the prediction makes a sharp u-turn and begins a positive slope: + +[[double_prediction_global]] +.Double Exponential moving average with window of size 100, predict = 20, alpha = 0.5, beta = 0.1 +image::images/double_prediction_global.png[] \ No newline at end of file