[DOCS] Add forecasting overview (elastic/x-pack-elasticsearch#3263)

* [DOCS] Restructure ML overview
* [DOCS] Added forecasting limitations
* [DOCS] Merged changes to ML overview
* [DOCS] Added forecasting screenshot
* [DOCS] Removed incorrect results info from forecast API
* [DOCS] Addressed feedback about forecasts
* [DOCS] Clarified default forecast duration

Original commit: elastic/x-pack-elasticsearch@1403f2cd2e
2017-12-21 08:14:52 -08:00 · 2017-12-21 08:14:52 -08:00 · b35f1909cc
parent 01e3db3740
commit b35f1909cc
7 changed files with 81 additions and 17 deletions
--- a/docs/en/ml/forecasting.asciidoc
+++ b/docs/en/ml/forecasting.asciidoc
@@ -0,0 +1,68 @@
+[float]
+[[ml-forecasting]]
+=== Forecasting the Future
+
+After the {xpackml} features create baselines of normal behavior for your data,
+you can use that information to extrapolate future behavior.
+
+You can use a forecast to estimate a time series value at a specific future date.
+For example, you might want to determine how many users you can expect to visit
+your website next Sunday at 0900.
+
+You can also use it to estimate the probability of a time series value occurring
+at a future date. For example, you might want to determine how likely it is that
+your disk utilization will reach 100% before the end of next week.
+
+Each forecast has a unique ID, which you can use to distinguish between forecasts
+that you created at different times. You can create a forecast by using the
+{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:
+
+
+[role="screenshot"]
+image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
+
+//For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.
+
+The yellow line in the chart represents the predicted data values. The
+shaded yellow area represents the bounds for the predicted values, which also
+gives an indication of the confidence of the predictions.
+
+When you create a forecast, you specify its _duration_, which indicates how far
+the forecast extends beyond the last record that was processed. By default, the
+duration is 1 day. Typically, the farther into the future that you forecast, the
+lower the confidence levels become (that is to say, the bounds increase).
+Eventually, if the confidence levels are too low, the forecast stops.
+
+You can also optionally specify when the forecast expires. By default, it
+expires in 14 days and is deleted automatically thereafter. You can specify a
+different expiration period by using the `expires_in` parameter in the {ref}/ml-forecast.html[Forecast Jobs API].
+
+//Add examples of forecast_request_stats and forecast documents?
+
+There are some limitations that affect your ability to create a forecast:
+
+* You can generate only three forecasts concurrently. There is no limit to the
+number of forecasts that you retain. Existing forecasts are not overwritten when
+you create new forecasts. Rather, they are automatically deleted when they expire.
+* If you use an `over_field_name` property in your job (that is to say, it's a
+_population job_), you cannot create a forecast.
+* If you use any of the following analytical functions in your job, you
+cannot create a forecast:
+** `lat_long`
+** `rare` and `freq_rare`
+** `time_of_day` and `time_of_week`
+
+--
+For more information about any of these functions, see <<ml-functions>>.
+--
+* Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
+analysis does not stop while forecasts are generated. Forecasts can have an
+impact on {ml} jobs, however, especially in terms of memory usage. For this
+reason, forecasts run only if the model memory status is acceptable and the
+snapshot models for the forecast do not require more than 20 MB. If these memory
+limits are reached, consider splitting the job into multiple smaller jobs and
+creating forecasts for these.
+* The job must be open when you create a forecast. Otherwise, an error occurs.
+* If there is insufficient data to generate any meaningful predictions, an
+error occurs. In general, forecasts that are created early in the learning phase
+of the data analysis are less accurate.
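The configuration-related limitations listed in `forecasting.asciidoc` above (population jobs and the unsupported functions) could be checked before a forecast is requested. The following Python sketch is illustrative only and is not part of this commit. It assumes the job configuration is available as a dictionary with detectors under `analysis_config.detectors`, each carrying a `function` and optionally an `over_field_name`; the helper name `forecast_blockers` is hypothetical.

```python
# Illustrative sketch only; not part of the commit.
# Functions that the forecasting overview lists as incompatible with forecasts.
UNSUPPORTED_FUNCTIONS = {"lat_long", "rare", "freq_rare", "time_of_day", "time_of_week"}


def forecast_blockers(job_config):
    """Return human-readable reasons why a job cannot be forecast.

    Assumes (hypothetically) that job_config is a dict shaped like
    {"analysis_config": {"detectors": [{"function": ..., "over_field_name": ...}]}}.
    An empty list means none of the documented configuration limits apply.
    """
    reasons = []
    detectors = job_config.get("analysis_config", {}).get("detectors", [])
    for index, detector in enumerate(detectors):
        # A detector with over_field_name makes the job a population job.
        if detector.get("over_field_name"):
            reasons.append(f"detector {index}: population job (over_field_name is set)")
        function = detector.get("function")
        if function in UNSUPPORTED_FUNCTIONS:
            reasons.append(f"detector {index}: function {function!r} is not supported")
    return reasons


if __name__ == "__main__":
    example = {
        "analysis_config": {
            "detectors": [
                {"function": "mean", "field_name": "responsetime"},
                {"function": "rare", "by_field_name": "status"},
            ]
        }
    }
    for reason in forecast_blockers(example):
        print(reason)
```

This check covers only the job configuration. The remaining limitations (the job must be open, the model memory status must be acceptable, and enough data must have been analyzed) depend on runtime state and are not modeled here.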
--- a/docs/en/ml/functions/geo.asciidoc
+++ b/docs/en/ml/functions/geo.asciidoc
@@ -6,6 +6,8 @@ input data.

 The {xpackml} features include the following geographic function: `lat_long`.

+NOTE: You cannot create forecasts for jobs that contain geographic functions. 
+
 [float]
 [[ml-lat-long]]
 ==== Lat_long
--- a/docs/en/ml/functions/rare.asciidoc
+++ b/docs/en/ml/functions/rare.asciidoc
@@ -12,6 +12,8 @@ number of times (frequency) rare values occur.
 ====
 * The `rare` and `freq_rare` functions should not be used in conjunction with
 `exclude_frequent`.
+* You cannot create forecasts for jobs that contain `rare` or `freq_rare`
+functions. 
 * Shorter bucket spans (less than 1 hour, for example) are recommended when
 looking for rare events. The functions model whether something happens in a
 bucket at least once. With longer bucket spans, it is more likely that
--- a/docs/en/ml/functions/time.asciidoc
+++ b/docs/en/ml/functions/time.asciidoc
@@ -13,6 +13,7 @@ The {xpackml} features include the following time functions:

 [NOTE]
 ====
+* You cannot create forecasts for jobs that contain time functions.
 * The `time_of_day` function is not aware of the difference between days, for instance
 work days and weekends. When modeling different days, use the `time_of_week` function.
 In general, the `time_of_week` function is more suited to modeling the behavior of people
--- a/docs/en/ml/images/ml-gs-job-forecast.jpg
+++ b/docs/en/ml/images/ml-gs-job-forecast.jpg
--- a/docs/en/ml/overview.asciidoc
+++ b/docs/en/ml/overview.asciidoc
@@ -2,6 +2,7 @@
 == Overview

 include::analyzing.asciidoc[]
+include::forecasting.asciidoc[]

 [[ml-concepts]]
 === Basic Machine Learning Terms
--- a/docs/en/rest-api/ml/forecast.asciidoc
+++ b/docs/en/rest-api/ml/forecast.asciidoc
@@ -15,17 +15,7 @@ a time series.

 ==== Description

-You can use the API to estimate a time series value at a specific future date.
-For example, you might want to determine how many users you can expect to visit
-your website next Sunday at 0900.
-
-You can also use it to estimate the probability of a time series value occurring
-at a future date. For example, you might want to determine how likely it is that
-your disk utilization will reach 100% before the end of next week.
-
-Each time you call the API, it generates a new forecast and returns a unique ID.
-Existing forecasts for the same job are not overwritten. You can use the forecast
-ID to distinguish between forecasts that you generated at different times.
+See {xpack-ref}/ml-forecasting.html[Forecasting the Future].

 [NOTE]
 ===============================
@@ -45,9 +35,9 @@ forecast. For more information about this property, see <<ml-job-resource>>.

 `duration`::
  (time units) A period of time that indicates how far into the future to
-  forecast. For example, `30d` corresponds to 30 days. The forecast starts at the
-  last record that was processed. For more information about time units, see
-  <<time-units>>.
+  forecast. For example, `30d` corresponds to 30 days. The default value is 1
+  day. The forecast starts at the last record that was processed. For more
+  information about time units, see <<time-units>>.

 `expires_in`::
  (time units) The period of time that forecast results are retained.
@@ -84,6 +74,6 @@ When the forecast is created, you receive the following results:
 }
 ----

-You can subsequently see the forecast in the *Single Metric Viewer* in {kib}
-and in the results that you retrieve by using {ml} APIs such as the
-<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.
+You can subsequently see the forecast in the *Single Metric Viewer* in {kib}.
+//and in the results that you retrieve by using {ml} APIs such as the
+//<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.
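For illustration, a forecast request that sets the `duration` and `expires_in` parameters documented above might be assembled as follows. This Python sketch is not part of the commit. The helper name `build_forecast_request`, the job ID `total-requests`, and the endpoint path `_xpack/ml/anomaly_detectors/<job_id>/_forecast` are assumptions that should be checked against the Forecast Jobs API reference for your version. The function only builds the request; it sends nothing.

```python
import json


# Illustrative sketch only; not part of the commit.
def build_forecast_request(job_id, duration=None, expires_in=None):
    """Build (method, path, body) for a forecast request.

    The path is an assumption about the endpoint layout. Parameters left as
    None are omitted so that the server applies its defaults (per the docs in
    this commit: a 1 day duration and a 14 day expiry).
    """
    if not job_id:
        raise ValueError("job_id is required")
    body = {}
    if duration is not None:
        body["duration"] = duration      # time units, for example "30d"
    if expires_in is not None:
        body["expires_in"] = expires_in  # how long forecast results are retained
    path = f"_xpack/ml/anomaly_detectors/{job_id}/_forecast"
    return "POST", path, body


if __name__ == "__main__":
    method, path, body = build_forecast_request(
        "total-requests", duration="10d", expires_in="30d"
    )
    print(method, path)
    print(json.dumps(body, indent=2))
```

Leaving both parameters unset produces an empty body, which relies on the documented defaults rather than restating them on the client side.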