[DOCS] Moves machine learning overview to stack-docs

This commit is contained in:
lcawl 2018-05-31 11:44:30 -07:00
parent 16d1f05045
commit 7e565797e7
11 changed files with 0 additions and 301 deletions

View File

@ -1,29 +0,0 @@
[float]
[[ml-analyzing]]
=== Analyzing the Past and Present
The {xpackml} features automate the analysis of time-series data by creating
accurate baselines of normal behavior in the data and identifying anomalous
patterns in that data. You can submit your data for analysis in batches or
continuously by using real-time {dfeeds}.

Using proprietary {ml} algorithms, the following circumstances are detected,
scored, and linked with statistically significant influencers in the data:

* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population

Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data science-related
configurations in order to get the benefits of {ml}.

You can view the {ml} results in {kib} where, for example, charts illustrate the
actual data values, the bounds for the expected values, and the anomalies that
occur outside these bounds.

[role="screenshot"]
image::images/ml-gs-job-analysis.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
For a more detailed walk-through of {xpackml} features, see
<<ml-getting-started>>.

View File

@ -1,10 +0,0 @@
[float]
[[ml-nodes]]
=== Machine learning nodes
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpackml}
features, there must be at least one {ml} node in your cluster. For more
information about this setting, see
{ref}/ml-settings.html[{ml} settings in {es}].
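
For example, a minimal `elasticsearch.yml` sketch for a dedicated {ml} node
might look like this (whether you disable the other node roles depends on your
cluster design):

[source,yaml]
--------------------------------------------------
# Illustrative settings for a dedicated machine learning node
node.master: false      # not master-eligible
node.data: false        # does not hold shard data
node.ingest: false      # does not run ingest pipelines
node.ml: true           # can run machine learning jobs (default)
xpack.ml.enabled: true  # machine learning APIs enabled (default)
--------------------------------------------------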

View File

@ -1,26 +0,0 @@
[[ml-buckets]]
=== Buckets
++++
<titleabbrev>Buckets</titleabbrev>
++++
The {xpackml} features use the concept of a _bucket_ to divide the time series
into batches for processing.

The _bucket span_ is part of the configuration information for a job. It defines
the time interval that is used to summarize and model the data. This is
typically between 5 minutes and 1 hour, and it depends on the characteristics of
your data. When you set the bucket span, take into account the granularity at
which you want to analyze, the frequency of the input data, the typical duration
of the anomalies, and the frequency at which alerting is required.
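
For example, a minimal job configuration sketch that counts events in 15-minute
buckets (the job ID and time field are hypothetical):

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/web-traffic
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------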

When you view your {ml} results, each bucket has an anomaly score. This score is
a statistically aggregated and normalized view of the combined anomalousness of
all the record results in the bucket. If you have more than one job, you can
also obtain overall bucket results, which combine and correlate anomalies from
multiple jobs into an overall score. When you view the results for job groups
in {kib}, it provides the overall bucket scores.

For more information, see
{ref}/ml-results-resource.html[Results Resources] and
{ref}/ml-get-overall-buckets.html[Get Overall Buckets API].
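
For example, a sketch of retrieving overall buckets for a hypothetical set of
jobs whose IDs match `web-*`, keeping only buckets with an overall score of at
least 75:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/web-*/results/overall_buckets
{
  "overall_score": 75,
  "top_n": 2
}
--------------------------------------------------

Here `top_n` controls how many of the highest per-job bucket scores are used to
calculate the overall score.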

View File

@ -1,40 +0,0 @@
[[ml-calendars]]
=== Calendars and Scheduled Events
Sometimes there are periods when you expect unusual activity to take place,
such as bank holidays, "Black Friday", or planned system outages. If you
identify these events in advance, no anomalies are generated during that period.
The {ml} model is not adversely affected and you do not receive spurious results.

You can create calendars and scheduled events in the **Settings** pane on the
**Machine Learning** page in {kib} or by using {ref}/ml-apis.html[{ml} APIs].

A scheduled event must have a start time, end time, and description. In general,
scheduled events are short in duration (typically lasting from a few hours to a
day) and occur infrequently. If you have regularly occurring events, such as
weekly maintenance periods, you do not need to create scheduled events for these
circumstances; they are already handled by the {ml} analytics.

You can identify zero or more scheduled events in a calendar. Jobs can then
subscribe to calendars and the {ml} analytics handle all subsequent scheduled
events appropriately.

If you want to add multiple scheduled events at once, you can import an
iCalendar (`.ics`) file in {kib} or a JSON file in the
{ref}/ml-post-calendar-event.html[add events to calendar API].
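
For example, a sketch of the API calls that create a calendar, assign a
hypothetical job to it, and add a one-day scheduled event (the calendar ID, job
ID, and epoch-millisecond timestamps are illustrative):

[source,js]
--------------------------------------------------
PUT _xpack/ml/calendars/planned-outages

PUT _xpack/ml/calendars/planned-outages/jobs/web-traffic

POST _xpack/ml/calendars/planned-outages/events
{
  "events": [
    {
      "description": "quarterly maintenance window",
      "start_time": 1527811200000,
      "end_time": 1527897600000
    }
  ]
}
--------------------------------------------------
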
[NOTE]
--
* You must identify scheduled events before your job analyzes the data for that
time period. Machine learning results are not updated retroactively.
* If your iCalendar file contains recurring events, only the first occurrence is
imported.
* Bucket results are generated during scheduled events but they have an
anomaly score of zero. For more information about bucket results, see
{ref}/ml-results-resource.html[Results Resources].
* If you use long or frequent scheduled events, it might take longer for the
{ml} analytics to learn to model your data and some anomalous behavior might be
missed.
--

View File

@ -1,40 +0,0 @@
[[ml-dfeeds]]
=== {dfeeds-cap}
Machine learning jobs can analyze data that is stored in {es} or data that is
sent from some other source via an API. _{dfeeds-cap}_ retrieve data from {es}
for analysis, which is the simpler and more common scenario.

If you create jobs in {kib}, you must use {dfeeds}. When you create a job, you
select an index pattern and {kib} configures the {dfeed} for you under the
covers. If you use {ml} APIs instead, you can create a {dfeed} by using the
{ref}/ml-put-datafeed.html[create {dfeeds} API] after you create a job. You can
associate only one {dfeed} with each job.

For a description of all the {dfeed} properties, see
{ref}/ml-datafeed-resource.html[Datafeed Resources].
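
For example, a sketch of creating a {dfeed} for a hypothetical job, reading
from a hypothetical index pattern:

[source,js]
--------------------------------------------------
PUT _xpack/ml/datafeeds/datafeed-web-traffic
{
  "job_id": "web-traffic",
  "indices": ["server-metrics-*"],
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
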
To start retrieving data from {es}, you must start the {dfeed}. When you start
it, you can optionally specify start and end times. If you do not specify an
end time, the {dfeed} runs continuously. You can start and stop {dfeeds} in
{kib} or use the {ref}/ml-start-datafeed.html[start {dfeeds}] and
{ref}/ml-stop-datafeed.html[stop {dfeeds}] APIs. A {dfeed} can be started and
stopped multiple times throughout its lifecycle.
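
For example, a sketch of starting a {dfeed} from a given time onward and later
stopping it (the {dfeed} ID and timestamp are hypothetical):

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-web-traffic/_start
{
  "start": "2018-06-01T00:00:00Z"
}

POST _xpack/ml/datafeeds/datafeed-web-traffic/_stop
--------------------------------------------------
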
[IMPORTANT]
--
When {security} is enabled, a {dfeed} stores the roles of the user who created
or updated the {dfeed} at that time. If those role definitions are later
modified, the {dfeed} subsequently runs with the new permissions that are
associated with the roles. However, if the user is assigned a different set of
roles after the {dfeed} was created or updated, the {dfeed} continues to run
with the permissions that are associated with the original roles.

One way to update the roles that are stored within the {dfeed} without changing
any other settings is to submit an empty JSON document (`{}`) to the
{ref}/ml-update-datafeed.html[update {dfeed} API], as sketched below.
--
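
A sketch of such an empty update (the {dfeed} ID is hypothetical):

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-web-traffic/_update
{}
--------------------------------------------------
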
If the data that you want to analyze is not stored in {es}, you cannot use
{dfeeds}. You can, however, send batches of data directly to the job by using
the {ref}/ml-post-data.html[post data to jobs API].
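
For example, a sketch of posting a small batch of newline-delimited JSON
documents to a hypothetical job (field names and values are illustrative):

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/web-traffic/_data
{"timestamp": 1527811200000, "bytes": 12450}
{"timestamp": 1527811260000, "bytes": 13010}
--------------------------------------------------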

View File

@ -1,66 +0,0 @@
[float]
[[ml-forecasting]]
=== Forecasting the Future
After the {xpackml} features create baselines of normal behavior for your data,
you can use that information to extrapolate future behavior.

You can use a forecast to estimate a time series value at a specific future date.
For example, you might want to determine how many users you can expect to visit
your website next Sunday at 0900.

You can also use it to estimate the probability of a time series value occurring
at a future date. For example, you might want to determine how likely it is that
your disk utilization will reach 100% before the end of next week.

Each forecast has a unique ID, which you can use to distinguish between forecasts
that you created at different times. You can create a forecast by using the
{ref}/ml-forecast.html[Forecast Jobs API] or by using {kib}. For example:

[role="screenshot"]
image::images/ml-gs-job-forecast.jpg["Example screenshot from the Machine Learning Single Metric Viewer in Kibana"]
//For a more detailed walk-through of {xpackml} features, see <<ml-getting-started>>.

The yellow line in the chart represents the predicted data values. The
shaded yellow area represents the bounds for the predicted values, which also
gives an indication of the confidence of the predictions.

When you create a forecast, you specify its _duration_, which indicates how far
the forecast extends beyond the last record that was processed. By default, the
duration is 1 day. Typically, the farther into the future that you forecast, the
lower the confidence levels become (that is to say, the bounds increase).
Eventually, if the confidence levels are too low, the forecast stops.

You can also optionally specify when the forecast expires. By default, it
expires in 14 days and is deleted automatically thereafter. You can specify a
different expiration period by using the `expires_in` parameter in the
{ref}/ml-forecast.html[Forecast Jobs API].
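
For example, a sketch of requesting a one-week forecast that expires after 30
days (the job ID is hypothetical):

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/web-traffic/_forecast
{
  "duration": "7d",
  "expires_in": "30d"
}
--------------------------------------------------
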
//Add examples of forecast_request_stats and forecast documents?

There are some limitations that affect your ability to create a forecast:

* You can generate only three forecasts concurrently. There is no limit to the
number of forecasts that you retain. Existing forecasts are not overwritten when
you create new forecasts. Rather, they are automatically deleted when they expire.
* If you use an `over_field_name` property in your job (that is to say, it's a
_population job_), you cannot create a forecast.
* If you use any of the following analytical functions in your job, you
cannot create a forecast:
** `lat_long`
** `rare` and `freq_rare`
** `time_of_day` and `time_of_week`
+
--
For more information about any of these functions, see <<ml-functions>>.
--
* Forecasts run concurrently with real-time {ml} analysis. That is to say, {ml}
analysis does not stop while forecasts are generated. Forecasts can have an
impact on {ml} jobs, however, especially in terms of memory usage. For this
reason, forecasts run only if the model memory status is acceptable.
* The job must be open when you create a forecast. Otherwise, an error occurs.
* If there is insufficient data to generate any meaningful predictions, an
error occurs. In general, forecasts that are created early in the learning phase
of the data analysis are less accurate.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 262 KiB

View File

@ -1,36 +0,0 @@
[[xpack-ml]]
= Machine Learning in the Elastic Stack
[partintro]
--
Machine learning is tightly integrated with the Elastic Stack. Data is pulled
from {es} for analysis and anomaly results are displayed in {kib} dashboards.

* <<ml-overview>>
* <<ml-getting-started>>
* <<ml-configuring>>
* <<stopping-ml>>
* <<ml-troubleshooting, Troubleshooting Machine Learning>>
* <<ml-api-quickref>>
* <<ml-functions>>
--
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/overview.asciidoc
include::overview.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started.asciidoc
include::getting-started.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/configuring.asciidoc
include::configuring.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/stopping-ml.asciidoc
include::stopping-ml.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/api-quickref.asciidoc
include::api-quickref.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/functions.asciidoc
include::functions.asciidoc[]

View File

@ -1,33 +0,0 @@
[[ml-jobs]]
=== Machine Learning Jobs
++++
<titleabbrev>Jobs</titleabbrev>
++++
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task.

Each job has one or more _detectors_. A detector applies an analytical function
to specific fields in your data. For more information about the types of
analysis you can perform, see <<ml-functions>>.

A job can also contain properties that affect which types of entities or events
are considered anomalous. For example, you can specify whether entities are
analyzed relative to their own previous behavior or relative to other entities
in a population. There are also multiple options for splitting the data into
categories and partitions. Some of these more advanced job configurations
are described in <<ml-configuring>>.

For a description of all the job properties, see
{ref}/ml-job-resource.html[Job Resources].
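
For example, a sketch of a job with one detector that models the mean of a
hypothetical `responsetime` field, analyzed separately for each value of an
`airline` field (the job ID and field names are illustrative):

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/flight-response-times
{
  "description": "Mean response time by airline",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "detector_description": "mean(responsetime) by airline",
        "function": "mean",
        "field_name": "responsetime",
        "by_field_name": "airline"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------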

In {kib}, there are wizards that help you create specific types of jobs, such
as _single metric_, _multi-metric_, and _population_ jobs. A single metric job
is simply a job with a single detector and limited job properties. To have
access to all of the job properties in {kib}, you must choose the _advanced_ job
wizard. If you want to try creating single and multi-metric jobs in {kib} with
sample data, see <<ml-getting-started>>.

You can also optionally assign jobs to one or more _job groups_. You can use
job groups to view the results from multiple jobs more easily and to expedite
administrative tasks by opening or closing multiple jobs at once.

View File

@ -1,21 +0,0 @@
[[ml-overview]]
== Overview
include::analyzing.asciidoc[]
include::forecasting.asciidoc[]
include::jobs.asciidoc[]
include::datafeeds.asciidoc[]
include::buckets.asciidoc[]
include::calendars.asciidoc[]
[[ml-concepts]]
=== Basic Machine Learning Terms
++++
<titleabbrev>Basic Terms</titleabbrev>
++++
There are a few concepts that are core to {ml} in {xpack}. Understanding these
concepts from the outset will greatly ease the learning process.
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/architecture.asciidoc
include::architecture.asciidoc[]