[DOCS] ML 5.4 docs final tuning (elastic/x-pack-elasticsearch#1265)

Original commit: elastic/x-pack-elasticsearch@91e4af140d
This commit is contained in:
Lisa Cawley 2017-05-01 11:27:48 -07:00 committed by lcawley
parent bec3102e06
commit a615532866
6 changed files with 67 additions and 201 deletions

View File

@ -21,8 +21,8 @@ will hopefully be inspired to use it to detect anomalies in your own data.
You might also be interested in these video tutorials:
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
[float]

View File

@ -3,20 +3,74 @@
[partintro]
--
The {xpack} {ml} features automate the analysis of time-series data by creating
accurate baselines of normal behaviors in the data and identifying anomalous
patterns in that data.
Using proprietary {ml} algorithms, {xpack} detects, scores, and links the
following circumstances with statistically significant influencers in the data:
* Anomalies related to temporal deviations in values, counts, or frequencies
* Statistical rarity
* Unusual behaviors for a member of a population
Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data science-related
configurations to get the benefits of {ml}.
[float]
[[ml-intro]]
== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack. Data is pulled
from {es} for analysis and anomaly results are displayed in {kib} dashboards.
[float]
[[ml-concepts]]
== Basic Concepts
There are a few concepts that are core to {ml} in {xpack}. Understanding these
concepts from the outset greatly eases the learning process.
Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.
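For example, a job might be created with the create job API. The following is a
minimal sketch: the job ID, field names, and `bucket_span` value are
illustrative, and the exact request syntax can vary by version:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it_ops_response
{
  "description": "Mean response time of IT ops requests",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "responsetime"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------

The `data_description` object tells the job which field contains the timestamp
and how that field is formatted.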
Data feeds::
Jobs can analyze a one-off batch of data or run continuously on data that
arrives in real time.
Data feeds retrieve data from {es} for analysis. Alternatively you can
<<ml-post-data,POST data>> from any source directly to an API.
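For example, a datafeed that feeds the hypothetical job above might look like
the following sketch; the index and type names are illustrative, and the
property names (such as `indexes` and `types`) reflect the 5.4-era API:

[source,js]
--------------------------------------------------
PUT _xpack/ml/datafeeds/datafeed-it_ops_response
{
  "job_id": "it_ops_response",
  "indexes": ["it-ops-metrics"],
  "types": ["doc"],
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------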
Detectors::
As part of the configuration information that is associated with a job,
detectors define the type of analysis that needs to be done. They also specify
which fields to analyze. You can have more than one detector in a job, which
is more efficient than running multiple jobs against the same data. For a list
of the properties associated with detectors,
see <<ml-detectorconfig, Detector Configuration Objects>>.
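For example, a detector that models the mean of a `responsetime` field for each
`airline` might look like the following sketch (the field names are
illustrative); other functions include `count`, `max`, and `rare`:

[source,js]
--------------------------------------------------
{
  "function": "mean",
  "field_name": "responsetime",
  "by_field_name": "airline",
  "detector_description": "mean(responsetime) by airline"
}
--------------------------------------------------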
Buckets::
The {xpack} {ml} features use the concept of a bucket to divide the time
series into batches for processing. The _bucket span_ is part of the
configuration information for a job. It defines the time interval that is used
to summarize and model the data. This is typically between 5 minutes and
1 hour, and it depends on your data characteristics. When you set the bucket span,
take into account the granularity at which you want to analyze, the frequency
of the input data, the typical duration of the anomalies, and the frequency at
which alerting is required.
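For example, if data arrives every minute and the anomalies of interest last
tens of minutes, a 15-minute bucket span might be a reasonable starting point.
The following fragment of a job's `analysis_config` is illustrative:

[source,js]
--------------------------------------------------
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    { "function": "count" }
  ]
}
--------------------------------------------------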
Machine learning nodes::
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpack} {ml}
features, there must be at least one {ml} node in your cluster. For more
information about this setting, see <<ml-settings>>.
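For example, the following `elasticsearch.yml` settings create a node that can
service {ml} API requests but never runs jobs:

[source,yaml]
--------------------------------------------------
# Machine learning APIs remain available on this node (the default).
xpack.ml.enabled: true

# This node does not run jobs; omit this setting or set it to true
# (the default) on nodes that should run them.
node.ml: false
--------------------------------------------------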
--
include::introduction.asciidoc[]
include::getting-started.asciidoc[]
// include::ml-scenarios.asciidoc[]
include::api-quickref.asciidoc[]
//include::troubleshooting.asciidoc[] Referenced from x-pack/docs/public/xpack-troubleshooting.asciidoc

View File

@ -1,81 +0,0 @@
[[ml-introduction]]
== Introduction
Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.
Driven by proprietary machine learning algorithms, anomalies related to
temporal deviations in values, counts, or frequencies; statistical rarity; and
unusual behaviors for a member of a population are detected, scored, and linked
with statistically significant influencers in the data.
Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data
science-related configurations to get the benefits of {ml}.
//image::images/graph-network.jpg["Graph network"]
[float]
=== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in {kib}
dashboards.
[float]
[[ml-concepts]]
=== Basic Concepts
There are a few concepts that are core to {ml} in {xpack}.
Understanding these concepts from the outset greatly eases the learning
process.
Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.
Data feeds::
Jobs can analyze a one-off batch of data or run continuously on data that
arrives in real time.
Data feeds retrieve data from {es} for analysis. Alternatively you can
<<ml-post-data,POST data>> from any source directly to an API.
Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>>.
Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval used to summarize and model the data. This is typically
between 5 minutes and 1 hour, and it depends on your data characteristics. When setting the
bucket span, take into account the granularity at which you want to analyze,
the frequency of the input data, the typical duration of the anomalies,
and the frequency at which alerting is required.
Machine learning nodes::
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests but it cannot run jobs. If you want to use {xpack} {ml}
features, there must be at least one {ml} node in your cluster.
For more information about this setting, see <<ml-settings>>.
//[float]
//== Where to Go Next
//<<ml-getting-started, Getting Started>> :: Enable machine learning and start
//discovering anomalies in your data.
//[float]
//== Have Comments, Questions, or Feedback?
//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
//suggestions.

View File

@ -1,104 +0,0 @@
[[ml-scenarios]]
== Use Cases
TBD
////
Enterprises, government organizations, and cloud-based service providers daily
process volumes of machine data so massive that real-time human analysis is
impossible. Changing behaviors hidden in this data provide the information
needed to quickly resolve a massive service outage, detect security breaches
before they result in the theft of millions of credit records, or identify the
next big trend in consumer patterns. Current search and analysis, performance
management, and cyber security tools are unable to find these anomalies without
significant human work in the form of thresholds, rules, signatures, and data
models.
By using advanced anomaly detection techniques that learn the normal behavior
patterns represented by the data, performance, security, and operational
anomalies and their causes can be identified and cross-correlated as they
develop, so they can be acted on before they impact the business.
Whilst anomaly detection is applicable to any type of data, we focus on machine
data scenarios. Enterprise application developers, cloud service providers, and
technology vendors need to harness the power of machine-learning-based anomaly
detection analytics to better manage complex online services, detect the
earliest signs of advanced security threats, and gain insight into business
opportunities and risks represented by changing behaviors hidden in their
massive data sets. Here are some real-world examples.
=== Eliminating noise generated by threshold-based alerts
Modern IT systems are highly instrumented and can generate TBs of machine data
a day. Traditional methods for analyzing data involve alerting when metric
values exceed a known value (static thresholds) or looking for simple
statistical deviations (dynamic thresholds).
Setting accurate thresholds for each metric at different times of day is
practically impossible. As a result, static thresholds generate large volumes
of false positives (threshold set too low) and false negatives (threshold set
too high).
The {ml} features in {xpack} automatically learn and calculate the probability
of a value being anomalous based on its historical behavior.
This enables accurate alerting and highlights only the subset of relevant metrics
that have changed. These alerts provide actionable insight into what is a growing
mountain of data.
=== Reducing troubleshooting times and subject matter expert (SME) involvement
It is said that 75 percent of troubleshooting time is spent mining data to try
to identify the root cause of an incident. The {ml} features in {xpack}
automatically analyze data and boil down the massive volume of information
to the few metrics or log messages that have changed behavior.
This enables the subject matter experts (SMEs) to focus on the subset of
information that is relevant to an issue, which greatly reduces triage time.
//In a major credit services provider, within a month of deployment, the company
//reported that its overall time to triage was reduced by 70 percent and the use of
//outside SMEs time to troubleshoot was decreased by 80 percent.
=== Finding and fixing issues before they impact the end user
Large-scale systems, such as online banking, typically require complex
infrastructures involving hundreds of different interdependent applications.
Just accessing an account summary page might involve dozens of different
databases, systems, and applications.
Because of their importance to the business, these systems are typically highly
resilient, and a critical problem will not be allowed to recur.
If a problem happens, it is likely to be complicated and the result of a
causal sequence of events that spans multiple interacting resources.
Troubleshooting would require the analysis of large volumes of data with a wide
range of characteristics and data types. A variety of experts from multiple
disciplines would need to participate in time-consuming “war rooms” to mine
the data for answers.
By using {ml} in real time, large volumes of data can be analyzed to provide
alerts on early indicators of problems and to highlight the events that likely
contributed to the problem.
=== Finding rare events that may be symptomatic of a security issue
With several hundred servers under management, the presence of new processes
running might indicate a security breach.
Using typical operational management techniques, each server would require a
period of baselining to identify which processes are considered standard.
Ideally, a baseline would be created for each server (or server group)
and periodically updated, which makes this a large management overhead.
By using the {ml} features in {xpack}, baselines are automatically built based
on normal behavior patterns for each host, and alerts are generated when rare
events occur.
=== Finding anomalies in periodic data
For data that has periodicity, it is difficult for standard monitoring tools to
accurately tell whether a change in data is due to a service outage or is the
result of normal time-based patterns. Daily and weekly trends in data, along
with peak and off-peak hours, make it difficult to identify anomalies using
standard threshold-based methods. Minimum and maximum thresholds for SMS text
activity at 2am would be very different from the thresholds that are effective
during the day.
By using {ml}, time-related trends are automatically identified and smoothed,
leaving the residual to be analyzed for anomalies.
////

View File

@ -1,4 +0,0 @@
[[ml-troubleshooting]]
== Machine Learning Troubleshooting
TBD

View File

@ -27,7 +27,8 @@ The following properties can be updated after the job is created:
(object) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular,
the following properties can be updated: `categorization_filters`,
`detector_description`.
//TBD: Full list of properties that can be updated?
`analysis_limits`::
(object) Specifies runtime limits for the job.
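For example, the following is a sketch of an update request; the job ID and the
memory limit value (a number of megabytes, per the 5.x convention) are
illustrative:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/it_ops_response/_update
{
  "analysis_limits": {
    "model_memory_limit": 2048
  }
}
--------------------------------------------------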