[DOCS] ML 5.4 docs final tuning (elastic/x-pack-elasticsearch#1265)
Original commit: elastic/x-pack-elasticsearch@91e4af140d
parent bec3102e06
commit a615532866
@ -21,8 +21,8 @@ will hopefully be inspired to use it to detect anomalies in your own data.

You might also be interested in these video tutorials:

* Getting started with machine learning (single metric)
* Getting started with machine learning (multiple metric)
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]

[float]

@ -3,20 +3,74 @@

[partintro]
--
Data stored in {es} contains valuable insights into the behavior and
performance of your business and systems. However, the following questions can
be difficult to answer:

The {xpack} {ml} features automate the analysis of time-series data by creating
accurate baselines of normal behaviors in the data and identifying anomalous
patterns in that data.

Using proprietary {ml} algorithms, the following circumstances are detected,
scored, and linked with statistically significant influencers in the data:

* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population
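
Each of these maps naturally onto a detector configuration. The following is an
illustrative fragment only (it is not part of the original docs, and the field
names `responsetime`, `status_code`, `uri`, and `clientip` are hypothetical
placeholders):

[source,js]
--------------------------------------------------
"analysis_config": {
  "bucket_span": "10m",
  "detectors": [
    { "function": "mean", "field_name": "responsetime" },                          <1>
    { "function": "rare", "by_field_name": "status_code" },                        <2>
    { "function": "rare", "by_field_name": "uri", "over_field_name": "clientip" }  <3>
  ]
}
--------------------------------------------------
<1> A temporal deviation in a metric value.
<2> Statistical rarity of a category over time.
<3> Behavior that is unusual for one member of a population (`clientip`).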

Automated periodicity detection and quick adaptation to changing data ensure
that you don’t need to specify algorithms, models, or other data science-related
configurations in order to get the benefits of {ml}.

[float]
[[ml-intro]]
== Integration with the Elastic Stack

Machine learning is tightly integrated with the Elastic Stack. Data is pulled
from {es} for analysis and anomaly results are displayed in {kib} dashboards.
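
Results are also stored in {es} and can be retrieved through the {ml} APIs. As a
rough sketch (the job ID `my_job` is a hypothetical placeholder), the bucket
results for a job might be fetched like this:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/my_job/results/buckets
{
  "sort": "anomaly_score",
  "desc": true
}
--------------------------------------------------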

[float]
[[ml-concepts]]
== Basic Concepts

There are a few concepts that are core to {ml} in {xpack}. Understanding these
concepts from the outset will greatly ease the learning process.

Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>. A minimal example of
creating a job and its data feed is sketched after this list.

Data feeds::
Jobs can either analyze a one-off batch of data or run continuously in real time.
Data feeds retrieve data from {es} for analysis. Alternatively you can
<<ml-post-data,POST data>> from any source directly to an API.

Detectors::
As part of the configuration information that is associated with a job,
detectors define the type of analysis that needs to be done. They also specify
which fields to analyze. You can have more than one detector in a job, which
is more efficient than running multiple jobs against the same data. For a list
of the properties associated with detectors,
see <<ml-detectorconfig, Detector Configuration Objects>>.

Buckets::
The {xpack} {ml} features use the concept of a bucket to divide the time
series into batches for processing. The _bucket span_ is part of the
configuration information for a job. It defines the time interval that is used
to summarize and model the data. This is typically between 5 minutes and 1 hour
and it depends on your data characteristics. When you set the bucket span,
take into account the granularity at which you want to analyze, the frequency
of the input data, the typical duration of the anomalies, and the frequency at
which alerting is required.

Machine learning nodes::
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpack} {ml}
features, there must be at least one {ml} node in your cluster. For more
information about this setting, see <<ml-settings>>. A sample node
configuration is sketched after this list.
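
The following is a minimal sketch of how these pieces fit together, using the
{ml} APIs to create a job with one detector and a data feed that pulls from an
index. It is illustrative only: the job ID `my_job`, the index `server-metrics`,
and the field names are hypothetical placeholders, and the exact property
formats should be checked against <<ml-job-resource, Job Resources>>.

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/my_job
{
  "description": "Mean response time",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}

PUT _xpack/ml/datafeeds/datafeed-my_job
{
  "job_id": "my_job",
  "indexes": ["server-metrics"],
  "types": ["metric"]
}
--------------------------------------------------

Instead of using a data feed, you can also send batches of data straight to the
job with the <<ml-post-data,post data API>>.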
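
For the node settings mentioned above, here is a sketch of what might go in
`elasticsearch.yml` on a node that should answer {ml} API requests but not run
jobs. Both settings default to `true`, so this is only needed to opt a node out
of running jobs:

[source,yaml]
--------------------------------------------------
xpack.ml.enabled: true
node.ml: false
--------------------------------------------------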

* Is the response time of my website unusual?
* Are users exfiltrating data unusually?

The good news is that the {xpack} machine learning capabilities enable you to
easily answer these types of questions.
--

include::introduction.asciidoc[]
include::getting-started.asciidoc[]
// include::ml-scenarios.asciidoc[]
include::api-quickref.asciidoc[]

//include::troubleshooting.asciidoc[] Referenced from x-pack/docs/public/xpack-troubleshooting.asciidoc

@ -1,81 +0,0 @@

[[ml-introduction]]
== Introduction

Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.

Driven by proprietary machine learning algorithms, anomalies related to
temporal deviations in values/counts/frequencies, statistical rarity, and unusual
behaviors for a member of a population are detected, scored and linked with
statistically significant influencers in the data.

Automated periodicity detection and quick adaptation to changing data ensure
that you don’t need to specify algorithms, models, or other data
science-related configurations in order to get the benefits of {ml}.
//image::images/graph-network.jpg["Graph network"]

[float]
=== Integration with the Elastic Stack

Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in {kib}
dashboards.

[float]
[[ml-concepts]]
=== Basic Concepts

There are a few concepts that are core to {ml} in {xpack}.
Understanding these concepts from the outset will tremendously help ease the
learning process.

Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.

Data feeds::
Jobs can analyze either a one-off batch of data or continuously in real time.
Data feeds retrieve data from {es} for analysis. Alternatively you can
<<ml-post-data,POST data>> from any source directly to an API.

Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>>.

Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval used to summarize and model the data. This is typically
between 5 minutes to 1 hour, and it depends on your data characteristics. When setting the
bucket span, take into account the granularity at which you want to analyze,
the frequency of the input data, the typical duration of the anomalies
and the frequency at which alerting is required.

Machine learning nodes::
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpack} {ml}
features, there must be at least one {ml} node in your cluster.
For more information about this setting, see <<ml-settings>>.

//[float]
//== Where to Go Next

//<<ml-getting-started, Getting Started>> :: Enable machine learning and start
//discovering anomalies in your data.

//[float]
//== Have Comments, Questions, or Feedback?

//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
//suggestions.
|
@ -1,104 +0,0 @@

[[ml-scenarios]]
== Use Cases

TBD

////
Enterprises, government organizations and cloud based service providers daily
process volumes of machine data so massive as to make real-time human
analysis impossible. Changing behaviors hidden in this data provide the
information needed to quickly resolve massive service outage, detect security
breaches before they result in the theft of millions of credit records or
identify the next big trend in consumer patterns. Current search and analysis,
performance management and cyber security tools are unable to find these
anomalies without significant human work in the form of thresholds, rules,
signatures and data models.

By using advanced anomaly detection techniques that learn normal behavior
patterns represented by the data and identify and cross-correlate anomalies,
performance, security and operational anomalies and their cause can be
identified as they develop, so they can be acted on before they impact business.

Whilst anomaly detection is applicable to any type of data, we focus on machine
data scenarios. Enterprise application developers, cloud service providers and
technology vendors need to harness the power of machine learning based anomaly
detection analytics to better manage complex on-line services, detect the
earliest signs of advanced security threats and gain insight to business
opportunities and risks represented by changing behaviors hidden in their
massive data sets. Here are some real-world examples.

=== Eliminating noise generated by threshold-based alerts

Modern IT systems are highly instrumented and can generate TBs of machine data
a day. Traditional methods for analyzing data involves alerting when metric
values exceed a known value (static thresholds), or looking for simple statistical deviations (dynamic thresholds).

Setting accurate thresholds for each metric at different times of day is
practically impossible. It results in static thresholds generating large volumes
of false positives (threshold set too low) and false negatives (threshold set too high).

The {ml} features in {xpack} automatically learn and calculate the probability
of a value being anomalous based on its historical behavior.
This enables accurate alerting and highlights only the subset of relevant metrics
that have changed. These alerts provide actionable insight into what is a growing
mountain of data.

=== Reducing troubleshooting times and subject matter expert (SME) involvement

It is said that 75 percent of troubleshooting time is spent mining data to try
and identify the root cause of an incident. The {ml} features in {xpack}
automatically analyze data and boil down the massive volume of information
to the few metrics or log messages that have changed behavior.
This enables the subject matter experts (SMEs) to focus on the subset of
information that is relevant to an issue, which greatly reduces triage time.

//In a major credit services provider, within a month of deployment, the company
//reported that its overall time to triage was reduced by 70 percent and the use of
//outside SMEs’ time to troubleshoot was decreased by 80 percent.

=== Finding and fixing issues before they impact the end user

Large-scale systems, such as online banking, typically require complex
infrastructures involving hundreds of different interdependent applications.
Just accessing an account summary page might involve dozens of different
databases, systems and applications.

Because of their importance to the business, these systems are typically highly
resilient and a critical problem will not be allowed to re-occur.
If a problem happens, it is likely to be complicated and be the result of a
causal sequence of events that span multiple interacting resources.
Troubleshooting would require the analysis of large volumes of data with a wide
range of characteristics and data types. A variety of experts from multiple
disciplines would need to participate in time consuming “war rooms” to mine
the data for answers.

By using {ml} in real-time, large volumes of data can be analyzed to provide
alerts to early indicators of problems and highlight the events that were likely
to have contributed to the problem.

=== Finding rare events that may be symptomatic of a security issue

With several hundred servers under management, the presence of new processes
running might indicate a security breach.

Using typical operational management techniques, each server would require a
period of baselining in order to identify which processes are considered standard.
Ideally a baseline would be created for each server (or server group)
and would be periodically updated, making this a large management overhead.

By using {ml} features in {xpack}, baselines are automatically built based
upon normal behavior patterns for each host and alerts are generated when rare
events occur.

=== Finding anomalies in periodic data

For data that has periodicity it is difficult for standard monitoring tools to
accurately tell whether a change in data is due to a service outage, or is a
result of usual time schedules. Daily and weekly trends in data along with
peak and off-peak hours, make it difficult to identify anomalies using standard
threshold-based methods. A min and max threshold for SMS text activity at 2am
would be very different than the thresholds that would be effective during the day.

By using {ml}, time-related trends are automatically identified and smoothed,
leaving the residual to be analyzed for anomalies.
////
@ -1,4 +0,0 @@

[[ml-troubleshooting]]
== Machine Learning Troubleshooting

TBD
@ -27,7 +27,8 @@ The following properties can be updated after the job is created:
(object) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular,
the following properties can be updated: `categorization_filters`,
`detector_description`, TBD.
`detector_description`.
//TBD: Full list of properties that can be updated?

`analysis_limits`::
(object) Specifies runtime limits for the job.
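
As a rough sketch of what an update request might look like (the job ID
`it_ops_new_logs` and the values are hypothetical placeholders, and the exact
set of updatable properties is still being finalized, as the TBD above notes):

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/it_ops_new_logs/_update
{
  "description": "Analyzes IT operations logs",
  "analysis_limits": {
    "model_memory_limit": 4096
  }
}
--------------------------------------------------

Here `model_memory_limit` is assumed to be a plain number of megabytes, which
matches the 5.x job resource format; confirm the exact format for your version.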