[DOCS] Add forecasting overview (elastic/x-pack-elasticsearch#3263)

* [DOCS] Restructure ML overview
* [DOCS] Added forecasting limitations
* [DOCS] Merged changes to ML overview
* [DOCS] Added forecasting screenshot
* [DOCS] Removed incorrect results info from forecast API
* [DOCS] Addressed feedback about forecasts
* [DOCS] Clarified default forecast duration

Original commit: elastic/x-pack-elasticsearch@1403f2cd2e
2025-03-01 16:39:11 +00:00 · 2017-12-21 08:14:52 -08:00 · 2017-12-21 08:14:52 -08:00 · b35f1909cc
commit b35f1909cc
parent 01e3db3740
7 changed files with 81 additions and 17 deletions
--- a/docs/en/ml/forecasting.asciidoc
+++ b/docs/en/ml/forecasting.asciidoc
@@ -0,0 +1,68 @@
 [float]
 [[ml-forecasting]]
 === Forecasting the Future
 After the {xpackml} features create baselines of normal behavior for your data,
 you can use that information to extrapolate future behavior.
 You can use a forecast to estimate a time series value at a specific future date.
 For example, you might want to determine how many users you can expect to visit
 your website next Sunday at 0900.
 You can also use it to estimate the probability of a time series value occurring
 at a future date. For example, you might want to determine how likely it is that
 your disk utilization will reach 100% before the end of next week.
 Each forecast has a unique ID, which you can use to distinguish between forecasts
 that you created at different times. You can create a forecast by using the
 {ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:
 [role="screenshot"]
 image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
 //For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.
 The yellow line in the chart represents the predicted data values. The
 shaded yellow area represents the bounds for the predicted values, which also
 gives an indication of the confidence of the predictions.
 When you create a forecast, you specify its _duration_, which indicates how far
 the forecast extends beyond the last record that was processed. By default, the
 duration is 1 day. Typically, the farther into the future you forecast, the
 lower the confidence levels become (that is to say, the bounds increase).
 Eventually, if the confidence levels are too low, the forecast stops.
 You can also specify when the forecast expires. By default, it expires in 14
 days and is deleted automatically thereafter. You can specify a different
 expiration period by using the `expires_in` parameter in the
 {xpack-ref}/ml-forecast.html[Forecast Jobs API].
 //Add examples of forecast_request_stats and forecast documents?
 There are some limitations that affect your ability to create a forecast:
 * You can generate only three forecasts concurrently. There is no limit to the
 number of forecasts that you retain. Existing forecasts are not overwritten when
 you create new forecasts. Rather, they are automatically deleted when they expire.
 * If you use an `over_field_name` property in your job (that is to say, it's a
 _population job_), you cannot create a forecast.
 * If you use any of the following analytical functions in your job, you
 cannot create a forecast:
 ** `lat_long`
 ** `rare` and `freq_rare`
 ** `time_of_day` and `time_of_week`
 +
 --
 For more information about any of these functions, see <<ml-functions>>.
 --
 * Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
 analysis does not stop while forecasts are generated. Forecasts can have an
 impact on {ml} jobs, however, especially in terms of memory usage. For this
 reason, forecasts run only if the model memory status is acceptable and the
 snapshot models for the forecast do not require more than 20 MB. If these memory
 limits are reached, consider splitting the job into multiple smaller jobs and
 creating forecasts for these.
 * The job must be open when you create a forecast. Otherwise, an error occurs.
 * If there is insufficient data to generate any meaningful predictions, an
 error occurs. In general, forecasts that are created early in the learning phase
 of the data analysis are less accurate.
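
Review note: the configuration-level limitations listed in forecasting.asciidoc above can be sketched as a pre-flight check. This is a minimal illustration and not part of the commit. It assumes the job configuration is available as a dictionary with its detectors under `analysis_config.detectors`; the function name `forecast_blockers` and the `job_state` argument are hypothetical. It does not cover the concurrency, memory, or insufficient-data limits, which depend on runtime state.

```python
# Functions that the limitations list says cannot be used with forecasts.
UNSUPPORTED_FUNCTIONS = {
    "lat_long",
    "rare",
    "freq_rare",
    "time_of_day",
    "time_of_week",
}


def forecast_blockers(job_config, job_state="opened"):
    """Return the reasons a forecast cannot be created (empty list if none found).

    job_config is assumed to be a dict shaped like a job configuration;
    job_state is assumed to be the job's current state as a string.
    """
    reasons = []

    # The job must be open when the forecast is created.
    if job_state != "opened":
        reasons.append("job is not open")

    detectors = job_config.get("analysis_config", {}).get("detectors", [])
    for index, detector in enumerate(detectors):
        # A detector with over_field_name makes this a population job.
        if detector.get("over_field_name"):
            reasons.append(f"detector {index} uses over_field_name (population job)")
        function = detector.get("function")
        if function in UNSUPPORTED_FUNCTIONS:
            reasons.append(f"detector {index} uses unsupported function {function}")

    return reasons
```

An empty list means only that none of these configuration checks failed; the forecast request itself can still return an error for the remaining limitations.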
--- a/docs/en/ml/functions/geo.asciidoc
+++ b/docs/en/ml/functions/geo.asciidoc
@@ -6,6 +6,8 @@ input data.
 The {xpackml} features include the following geographic function: `lat_long`.
 NOTE: You cannot create forecasts for jobs that contain geographic functions. 
 [float]
 [[ml-lat-long]]
 ==== Lat_long
--- a/docs/en/ml/functions/rare.asciidoc
+++ b/docs/en/ml/functions/rare.asciidoc
@@ -12,6 +12,8 @@ number of times (frequency) rare values occur.
 ====
 * The `rare` and `freq_rare` functions should not be used in conjunction with
 `exclude_frequent`.
 * You cannot create forecasts for jobs that contain `rare` or `freq_rare`
 functions. 
 * Shorter bucket spans (less than 1 hour, for example) are recommended when
 looking for rare events. The functions model whether something happens in a
 bucket at least once. With longer bucket spans, it is more likely that
--- a/docs/en/ml/functions/time.asciidoc
+++ b/docs/en/ml/functions/time.asciidoc
@@ -13,6 +13,7 @@ The {xpackml} features include the following time functions:
 [NOTE]
 ====
 * You cannot create forecasts for jobs that contain time functions.
 * The `time_of_day` function is not aware of the difference between days, for instance
 work days and weekends. When modeling different days, use the `time_of_week` function.
 In general, the `time_of_week` function is more suited to modeling the behavior of people
--- a/docs/en/ml/images/ml-gs-job-forecast.jpg
+++ b/docs/en/ml/images/ml-gs-job-forecast.jpg
--- a/docs/en/ml/overview.asciidoc
+++ b/docs/en/ml/overview.asciidoc
@@ -2,6 +2,7 @@
 == Overview
 include::analyzing.asciidoc[]
 include::forecasting.asciidoc[]
 [[ml-concepts]]
 === Basic Machine Learning Terms
--- a/docs/en/rest-api/ml/forecast.asciidoc
+++ b/docs/en/rest-api/ml/forecast.asciidoc
@@ -15,17 +15,7 @@ a time series.
 ==== Description
-You can use the API to estimate a time series value at a specific future date.
+See {xpack-ref}/ml-forecasting.html[Forecasting the Future].
-For example, you might want to determine how many users you can expect to visit
-your website next Sunday at 0900.
-You can also use it to estimate the probability of a time series value occurring
-at a future date. For example, you might want to determine how likely it is that
-your disk utilization will reach 100% before the end of next week.
-Each time you call the API, it generates a new forecast and returns a unique ID.
-Existing forecasts for the same job are not overwritten. You can use the forecast
-ID to distinguish between forecasts that you generated at different times.
 [NOTE]
 ===============================
@@ -45,9 +35,9 @@ forecast. For more information about this property, see <<ml-job-resource>>.
 `duration`::
  (time units) A period of time that indicates how far into the future to
-  forecast. For example, `30d` corresponds to 30 days. The forecast starts at the
-  last record that was processed. For more information about time units, see
-  <<time-units>>.
+  forecast. For example, `30d` corresponds to 30 days. The default value is 1
+  day. The forecast starts at the last record that was processed. For more
+  information about time units, see <<time-units>>.
 `expires_in`::
  (time units) The period of time that forecast results are retained.
@@ -84,6 +74,6 @@ When the forecast is created, you receive the following results:
 }
 ----
-You can subsequently see the forecast in the *Single Metric Viewer* in {kib}
-and in the results that you retrieve by using {ml} APIs such as the
-<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.
+You can subsequently see the forecast in the *Single Metric Viewer* in {kib}.
+//and in the results that you retrieve by using {ml} APIs such as the
+//<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.
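
Review note: to make the `duration` and `expires_in` parameters concrete, here is a minimal sketch of how a forecast request might be assembled. It is not part of the commit. The endpoint path `_xpack/ml/anomaly_detectors/<job_id>/_forecast` is an assumption that is not stated in this diff and should be checked against the Forecast Jobs API page; the helper name `build_forecast_request` and the job ID `total-requests` are hypothetical. The sketch only builds the request and does not send it.

```python
import json
from urllib.parse import quote


def build_forecast_request(job_id, duration=None, expires_in=None):
    """Return (method, path, body) for a forecast request.

    Parameters left as None are omitted, so the documented defaults apply
    (a duration of 1 day and an expiry of 14 days).
    """
    body = {}
    if duration is not None:
        body["duration"] = duration      # time units, for example "30d"
    if expires_in is not None:
        body["expires_in"] = expires_in  # time units, for example "7d"

    # Assumed endpoint path; verify against the Forecast Jobs API reference.
    path = f"/_xpack/ml/anomaly_detectors/{quote(job_id, safe='')}/_forecast"
    return "POST", path, json.dumps(body, sort_keys=True)


method, path, body = build_forecast_request(
    "total-requests", duration="30d", expires_in="7d"
)
print(method, path)
print(body)
```

Omitting both parameters produces an empty JSON object, which leaves the choice of duration and expiry to the documented defaults.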