mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-27 10:28:28 +00:00
[DOCS] Moves machine learning overview to stack-docs
This commit is contained in:
parent
16d1f05045
commit
7e565797e7
@ -1,29 +0,0 @@
[float]
[[ml-analyzing]]
=== Analyzing the Past and Present

The {xpackml} features automate the analysis of time-series data by creating
accurate baselines of normal behavior in the data and identifying anomalous
patterns in that data. You can submit your data for analysis in batches or
continuously in real time by using {dfeeds}.

Using proprietary {ml} algorithms, the following circumstances are detected,
scored, and linked with statistically significant influencers in the data:

* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population

Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data science-related
configurations in order to get the benefits of {ml}.

You can view the {ml} results in {kib} where, for example, charts illustrate the
actual data values, the bounds for the expected values, and the anomalies that
occur outside these bounds.

[role="screenshot"]
image::images/ml-gs-job-analysis.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

For a more detailed walk-through of {xpackml} features, see
<<ml-getting-started>>.
@ -1,10 +0,0 @@
[float]
[[ml-nodes]]
=== Machine learning nodes

A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpackml}
features, there must be at least one {ml} node in your cluster. For more
information about this setting, see
{ref}/ml-settings.html[{ml} settings in {es}].
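For example, a dedicated {ml} node can be configured in `elasticsearch.yml` along
these lines. This is a sketch, not a required configuration; whether you disable
the master, data, and ingest roles depends on your cluster topology:

[source,yaml]
----
# Dedicated machine learning node (example configuration)
xpack.ml.enabled: true
node.ml: true
node.master: false
node.data: false
node.ingest: false
----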
@ -1,26 +0,0 @@
[[ml-buckets]]
=== Buckets
++++
<titleabbrev>Buckets</titleabbrev>
++++

The {xpackml} features use the concept of a _bucket_ to divide the time series
into batches for processing.

The _bucket span_ is part of the configuration information for a job. It defines
the time interval that is used to summarize and model the data. This is
typically between 5 minutes and 1 hour, depending on the characteristics of
your data. When you set the bucket span, take into account the granularity at
which you want to analyze, the frequency of the input data, the typical duration
of the anomalies, and the frequency at which alerting is required.

When you view your {ml} results, each bucket has an anomaly score. This score is
a statistically aggregated and normalized view of the combined anomalousness of
all the record results in the bucket. If you have more than one job, you can
also obtain overall bucket results, which combine and correlate anomalies from
multiple jobs into an overall score. When you view the results for job groups
in {kib}, it provides the overall bucket scores.

For more information, see
{ref}/ml-results-resource.html[Results Resources] and
{ref}/ml-get-overall-buckets.html[Get Overall Buckets API].
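For example, the bucket span is set in the `analysis_config` when you create a
job. The job ID and field names below are hypothetical placeholders:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "15m", <1>
    "detectors": [
      { "function": "mean", "field_name": "responsetime" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----
<1> Summarize and model the data in 15 minute intervals.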
@ -1,40 +0,0 @@
[[ml-calendars]]
=== Calendars and Scheduled Events

Sometimes there are periods when you expect unusual activity to take place,
such as bank holidays, "Black Friday", or planned system outages. If you
identify these events in advance, no anomalies are generated during that period.
The {ml} model is not adversely affected and you do not receive spurious results.

You can create calendars and scheduled events in the **Settings** pane on the
**Machine Learning** page in {kib} or by using {ref}/ml-apis.html[{ml} APIs].

A scheduled event must have a start time, end time, and description. In general,
scheduled events are short in duration (typically lasting from a few hours to a
day) and occur infrequently. If you have regularly occurring events, such as
weekly maintenance periods, you do not need to create scheduled events for these
circumstances; they are already handled by the {ml} analytics.

You can identify zero or more scheduled events in a calendar. Jobs can then
subscribe to calendars and the {ml} analytics handle all subsequent scheduled
events appropriately.

If you want to add multiple scheduled events at once, you can import an
iCalendar (`.ics`) file in {kib} or a JSON file in the
{ref}/ml-post-calendar-event.html[add events to calendar API].

[NOTE]
--

* You must identify scheduled events before your job analyzes the data for that
time period. Machine learning results are not updated retroactively.
* If your iCalendar file contains recurring events, only the first occurrence is
imported.
* Bucket results are generated during scheduled events but they have an
anomaly score of zero. For more information about bucket results, see
{ref}/ml-results-resource.html[Results Resources].
* If you use long or frequent scheduled events, it might take longer for the
{ml} analytics to learn to model your data and some anomalous behavior might be
missed.

--
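For example, a scheduled event can be added to an existing calendar through the
API. The calendar ID and times below are hypothetical; `start_time` and
`end_time` are given in milliseconds since the epoch:

[source,js]
----
POST _xpack/ml/calendars/planned-outages/events
{
  "events": [
    {
      "description": "quarterly maintenance window",
      "start_time": 1514160000000,
      "end_time": 1514246400000
    }
  ]
}
----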
@ -1,40 +0,0 @@
[[ml-dfeeds]]
=== {dfeeds-cap}

Machine learning jobs can analyze data that is stored in {es} or data that is
sent from some other source via an API. _{dfeeds-cap}_ retrieve data from {es}
for analysis, which is the simpler and more common scenario.

If you create jobs in {kib}, you must use {dfeeds}. When you create a job, you
select an index pattern and {kib} configures the {dfeed} for you under the
covers. If you use {ml} APIs instead, you can create a {dfeed} by using the
{ref}/ml-put-datafeed.html[create {dfeeds} API] after you create a job. You can
associate only one {dfeed} with each job.

For a description of all the {dfeed} properties, see
{ref}/ml-datafeed-resource.html[Datafeed Resources].

To start retrieving data from {es}, you must start the {dfeed}. When you start
it, you can optionally specify start and end times. If you do not specify an
end time, the {dfeed} runs continuously. You can start and stop {dfeeds} in
{kib} or use the {ref}/ml-start-datafeed.html[start {dfeeds}] and
{ref}/ml-stop-datafeed.html[stop {dfeeds}] APIs. A {dfeed} can be started and
stopped multiple times throughout its lifecycle.

[IMPORTANT]
--
When {security} is enabled, a {dfeed} stores the roles of the user who created
or updated the {dfeed} at that time. If the definitions of those roles are
subsequently updated, the {dfeed} runs with the new permissions that are
associated with the roles. However, if the user is assigned different roles
after creating or updating the {dfeed}, the {dfeed} continues to run with the
permissions that are associated with the original roles.

One way to update the roles that are stored within the {dfeed} without changing
any other settings is to submit an empty JSON document (`{}`) to the
{ref}/ml-update-datafeed.html[update {dfeed} API].
--

If the data that you want to analyze is not stored in {es}, you cannot use
{dfeeds}. You can, however, send batches of data directly to the job by using
the {ref}/ml-post-data.html[post data to jobs API].
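For example, a minimal {dfeed} that reads from a single index can be created and
started as follows. The job ID, {dfeed} ID, and index name are hypothetical
placeholders:

[source,js]
----
PUT _xpack/ml/datafeeds/datafeed-total-requests
{
  "job_id": "total-requests",
  "indices": ["server-metrics"]
}

POST _xpack/ml/datafeeds/datafeed-total-requests/_start
----

To refresh the stored roles without changing any other settings, submit an empty
document to the update API:

[source,js]
----
POST _xpack/ml/datafeeds/datafeed-total-requests/_update
{}
----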
@ -1,66 +0,0 @@
[float]
[[ml-forecasting]]
=== Forecasting the Future

After the {xpackml} features create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.

You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.

You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.

Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:

[role="screenshot"]
image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]

//For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.

The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.

When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels become too low, the forecast stops.

You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[Forecast Jobs API].

//Add examples of forecast_request_stats and forecast documents?

There are some limitations that affect your ability to create a forecast:

* You can generate only three forecasts concurrently. There is no limit to the
number of forecasts that you retain. Existing forecasts are not overwritten when
you create new forecasts. Rather, they are automatically deleted when they expire.
* If you use an `over_field_name` property in your job (that is to say, it's a
_population job_), you cannot create a forecast.
* If you use any of the following analytical functions in your job, you
cannot create a forecast:
** `lat_long`
** `rare` and `freq_rare`
** `time_of_day` and `time_of_week`
+
--
For more information about any of these functions, see <<ml-functions>>.
--
* Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
analysis does not stop while forecasts are generated. Forecasts can have an
impact on {ml} jobs, however, especially in terms of memory usage. For this
reason, forecasts run only if the model memory status is acceptable.
* The job must be open when you create a forecast. Otherwise, an error occurs.
* If there is insufficient data to generate any meaningful predictions, an
error occurs. In general, forecasts that are created early in the learning phase
of the data analysis are less accurate.
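For example, a forecast with an explicit duration and expiration can be created
for an open job. The job ID below is a hypothetical placeholder:

[source,js]
----
POST _xpack/ml/anomaly_detectors/total-requests/_forecast
{
  "duration": "3d", <1>
  "expires_in": "30d" <2>
}
----
<1> Extend the forecast three days beyond the last record that was processed.
<2> Retain the forecast results for 30 days instead of the default 14.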
@ -1,36 +0,0 @@
[[xpack-ml]]
= Machine Learning in the Elastic Stack

[partintro]
--
Machine learning is tightly integrated with the Elastic Stack. Data is pulled
from {es} for analysis and anomaly results are displayed in {kib} dashboards.

* <<ml-overview>>
* <<ml-getting-started>>
* <<ml-configuring>>
* <<stopping-ml>>
* <<ml-troubleshooting, Troubleshooting Machine Learning>>
* <<ml-api-quickref>>
* <<ml-functions>>

--

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/overview.asciidoc
include::overview.asciidoc[]

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started.asciidoc
include::getting-started.asciidoc[]

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/configuring.asciidoc
include::configuring.asciidoc[]

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/stopping-ml.asciidoc
include::stopping-ml.asciidoc[]

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/api-quickref.asciidoc
include::api-quickref.asciidoc[]

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/functions.asciidoc
include::functions.asciidoc[]
@ -1,33 +0,0 @@
[[ml-jobs]]
=== Machine Learning Jobs
++++
<titleabbrev>Jobs</titleabbrev>
++++

Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task.

Each job has one or more _detectors_. A detector applies an analytical function
to specific fields in your data. For more information about the types of
analysis you can perform, see <<ml-functions>>.

A job can also contain properties that affect which types of entities or events
are considered anomalous. For example, you can specify whether entities are
analyzed relative to their own previous behavior or relative to other entities
in a population. There are also multiple options for splitting the data into
categories and partitions. Some of these more advanced job configurations
are described in <<ml-configuring>>.

For a description of all the job properties, see
{ref}/ml-job-resource.html[Job Resources].

In {kib}, there are wizards that help you create specific types of jobs, such
as _single metric_, _multi-metric_, and _population_ jobs. A single metric job
is simply a job with a single detector and limited job properties. To have
access to all of the job properties in {kib}, you must choose the _advanced_
job wizard. If you want to try creating single metric and multi-metric jobs in
{kib} with sample data, see <<ml-getting-started>>.

You can also optionally assign jobs to one or more _job groups_. You can use
job groups to view the results from multiple jobs more easily and to expedite
administrative tasks by opening or closing multiple jobs at once.
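For example, a detector that analyzes entities relative to a population uses the
`over_field_name` property. The job ID and field names below are hypothetical
placeholders:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example-population-job
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "count",
        "over_field_name": "client_ip" <1>
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----
<1> Analyzing over `client_ip` makes this a population job: each client is
compared to the collective behavior of all clients.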
@ -1,21 +0,0 @@
[[ml-overview]]
== Overview

include::analyzing.asciidoc[]
include::forecasting.asciidoc[]
include::jobs.asciidoc[]
include::datafeeds.asciidoc[]
include::buckets.asciidoc[]
include::calendars.asciidoc[]

[[ml-concepts]]
=== Basic Machine Learning Terms
++++
<titleabbrev>Basic Terms</titleabbrev>
++++

There are a few concepts that are core to {ml} in {xpack}. Understanding these
concepts from the outset will greatly ease the learning process.

:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/architecture.asciidoc
include::architecture.asciidoc[]