[DOCS] Add forecasting overview (elastic/x-pack-elasticsearch#3263)

* [DOCS] Restructure ML overview

* [DOCS] Added forecasting limitations

* [DOCS] Merged changes to ML overview

* [DOCS] Added forecasting screenshot

* [DOCS] Removed incorrect results info from forecast API

* [DOCS] Addressed feedback about forecasts

* [DOCS] Clarified default forecast duration

Original commit: elastic/x-pack-elasticsearch@1403f2cd2e
Lisa Cawley 2017-12-21 08:14:52 -08:00 committed by GitHub
parent 01e3db3740
commit b35f1909cc
7 changed files with 81 additions and 17 deletions

@@ -0,0 +1,68 @@
[float]
[[ml-forecasting]]
=== Forecasting the Future
After the {xpackml} features create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.
You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.
You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.
Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:
[role="screenshot"]
image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
//For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.
The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.
When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels are too low, the forecast stops.
You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the {ref}/ml-forecast.html[Forecast Jobs API].
//Add examples of forecast_request_stats and forecast documents?
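As a rough illustration of the `duration` and `expires_in` settings described above, the following Python sketch assembles a forecast request without sending it. The endpoint path `_xpack/ml/anomaly_detectors/<job_id>/_forecast` and the job ID `total-requests` are assumptions that do not appear in this section; check the Forecast Jobs API reference for the exact form in your version.

```python
import json


def build_forecast_request(job_id, duration=None, expires_in=None):
    """Return (method, path, body) for a forecast request.

    Only the parameters named in this section are handled. Omitted
    parameters are left out of the body so the server defaults apply
    (1 day for duration, 14 days for expires_in, per the text above).
    """
    if not job_id:
        raise ValueError("job_id is required")

    body = {}
    if duration is not None:
        body["duration"] = duration      # time units, for example "3d"
    if expires_in is not None:
        body["expires_in"] = expires_in  # time units, for example "30d"

    # Assumed endpoint layout; not taken from this section.
    path = "_xpack/ml/anomaly_detectors/{}/_forecast".format(job_id)
    return "POST", path, json.dumps(body, sort_keys=True)


# "total-requests" is a hypothetical job ID used only for illustration.
method, path, body = build_forecast_request(
    "total-requests", duration="3d", expires_in="30d"
)
print(method, path)
print(body)
```

The resulting method, path, and body can be passed to any HTTP client. Leaving both parameters unset produces an empty body, which requests the default duration and expiration.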
There are some limitations that affect your ability to create a forecast:
* You can generate only three forecasts concurrently. There is no limit to the
number of forecasts that you retain. Existing forecasts are not overwritten when
you create new forecasts. Rather, they are automatically deleted when they expire.
* If you use an `over_field_name` property in your job (that is to say, it's a
_population job_), you cannot create a forecast.
* If you use any of the following analytical functions in your job, you
cannot create a forecast:
** `lat_long`
** `rare` and `freq_rare`
** `time_of_day` and `time_of_week`
+
--
For more information about any of these functions, see <<ml-functions>>.
--
* Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
analysis does not stop while forecasts are generated. Forecasts can have an
impact on {ml} jobs, however, especially in terms of memory usage. For this
reason, forecasts run only if the model memory status is acceptable and the
snapshot models for the forecast do not require more than 20 MB. If these memory
limits are reached, consider splitting the job into multiple smaller jobs and
creating forecasts for these.
* The job must be open when you create a forecast. Otherwise, an error occurs.
* If there is insufficient data to generate any meaningful predictions, an
error occurs. In general, forecasts that are created early in the learning phase
of the data analysis are less accurate.
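The configuration-related limitations above can be checked before you request a forecast. The following Python sketch is one possible pre-flight check, assuming that the job configuration is available as a dictionary with detectors under `analysis_config.detectors`; that layout and the helper name `forecast_blockers` are assumptions for illustration. It does not cover the memory, job state, or insufficient data conditions, which only the server can evaluate.

```python
# Functions that this section lists as incompatible with forecasting.
UNSUPPORTED_FUNCTIONS = {
    "lat_long", "rare", "freq_rare", "time_of_day", "time_of_week",
}


def forecast_blockers(job_config):
    """Return a list of reasons why a job cannot be forecast.

    An empty list means that none of the configuration limitations
    apply. The job_config layout is an assumed structure.
    """
    reasons = []
    detectors = job_config.get("analysis_config", {}).get("detectors", [])
    for index, detector in enumerate(detectors):
        if detector.get("over_field_name"):
            reasons.append(
                "detector {}: population job (over_field_name is set)".format(index)
            )
        function = detector.get("function")
        if function in UNSUPPORTED_FUNCTIONS:
            reasons.append(
                "detector {}: function '{}' is not supported".format(index, function)
            )
    return reasons


# Hypothetical job configurations used only for illustration.
ok_job = {"analysis_config": {"detectors": [{"function": "mean"}]}}
bad_job = {
    "analysis_config": {
        "detectors": [
            {"function": "rare", "by_field_name": "status"},
            {"function": "count", "over_field_name": "clientip"},
        ]
    }
}
print(forecast_blockers(ok_job))
print(forecast_blockers(bad_job))
```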

@@ -6,6 +6,8 @@ input data.
The {xpackml} features include the following geographic function: `lat_long`.
NOTE: You cannot create forecasts for jobs that contain geographic functions.
[float]
[[ml-lat-long]]
==== Lat_long

@@ -12,6 +12,8 @@ number of times (frequency) rare values occur.
====
* The `rare` and `freq_rare` functions should not be used in conjunction with
`exclude_frequent`.
* You cannot create forecasts for jobs that contain `rare` or `freq_rare`
functions.
* Shorter bucket spans (less than 1 hour, for example) are recommended when
looking for rare events. The functions model whether something happens in a
bucket at least once. With longer bucket spans, it is more likely that

@@ -13,6 +13,7 @@ The {xpackml} features include the following time functions:
[NOTE]
====
* You cannot create forecasts for jobs that contain time functions.
* The `time_of_day` function is not aware of the difference between days, for instance
work days and weekends. When modeling different days, use the `time_of_week` function.
In general, the `time_of_week` function is more suited to modeling the behavior of people

Binary file not shown (262 KiB).

@@ -2,6 +2,7 @@
== Overview
include::analyzing.asciidoc[]
include::forecasting.asciidoc[]
[[ml-concepts]]
=== Basic Machine Learning Terms

@@ -15,17 +15,7 @@ a time series.
==== Description
You can use the API to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.
You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.
Each time you call the API, it generates a new forecast and returns a unique ID.
Existing forecasts for the same job are not overwritten. You can use the forecast
ID to distinguish between forecasts that you generated at different times.
See {xpack-ref}/ml-forecasting.html[Forecasting the Future].
[NOTE]
===============================
@@ -45,9 +35,9 @@ forecast. For more information about this property, see <<ml-job-resource>>.
`duration`::
(time units) A period of time that indicates how far into the future to
forecast. For example, `30d` corresponds to 30 days. The forecast starts at the
last record that was processed. For more information about time units, see
<<time-units>>.
forecast. For example, `30d` corresponds to 30 days. The default value is 1
day. The forecast starts at the last record that was processed. For more
information about time units, see <<time-units>>.
`expires_in`::
(time units) The period of time that forecast results are retained.
@@ -84,6 +74,6 @@ When the forecast is created, you receive the following results:
}
----
You can subsequently see the forecast in the *Single Metric Viewer* in {kib}
and in the results that you retrieve by using {ml} APIs such as the
<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.
You can subsequently see the forecast in the *Single Metric Viewer* in {kib}.
//and in the results that you retrieve by using {ml} APIs such as the
//<<ml-get-bucket,get bucket API>> and <<ml-get-record,get records API>>.