OpenSearch/x-pack/docs/en/ml/forecasting.asciidoc
Hendrik Muhs 6c313a9871 This implementation lazily (on 1st forecast request) checks for available
diskspace and creates a subfolder for storing data outside of Lucene
indexes, but as part of the ES data paths.

Details:
 - tmp storage is managed and does not allow allocation if disk space is
   below a threshold (5GB at the moment)
 - tmp storage is supposed to be managed by the native component but in
   case this fails cleanup is provided:
    - on job close
    - on process crash
    - after node crash, on restart
 - available space is re-checked for every forecast call (the native
   component has to check again before writing)

Note: The 1st path that has enough space is chosen on job open (job
close/reopen triggers a new search)
2018-05-18 14:04:09 +02:00

67 lines
3.2 KiB
Plaintext

[float]
[[ml-forecasting]]
=== Forecasting the Future
After the {xpackml} features create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.
You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.
You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.
Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:
[role="screenshot"]
image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
//For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.
The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.
When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually if the confidence levels are too low, the forecast stops.
You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[Forecast Jobs API].
//Add examples of forecast_request_stats and forecast documents?
There are some limitations that affect your ability to create a forecast:
* You can generate only three forecasts concurrently. There is no limit to the
number of forecasts that you retain. Existing forecasts are not overwritten when
you create new forecasts. Rather, they are automatically deleted when they expire.
* If you use an `over_field_name` property in your job (that is to say, it's a
_population job_), you cannot create a forecast.
* If you use any of the following analytical functions in your job, you
cannot create a forecast:
** `lat_long`
** `rare` and `freq_rare`
** `time_of_day` and `time_of_week`
+
--
For more information about any of these functions, see <<ml-functions>>.
--
* Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
analysis does not stop while forecasts are generated. Forecasts can have an
impact on {ml} jobs, however, especially in terms of memory usage. For this
reason, forecasts run only if the model memory status is acceptable.
* The job must be open when you create a forecast. Otherwise, an error occurs.
* If there is insufficient data to generate any meaningful predictions, an
error occurs. In general, forecasts that are created early in the learning phase
of the data analysis are less accurate.