[[ml-introduction]]
== Introduction
Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.
Proprietary {ml} algorithms detect anomalies related to temporal deviations in
values, counts, or frequencies, statistical rarity, and unusual behaviors for a
member of a population. They score these anomalies and link them with
statistically significant influencers in the data.
Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data
science-related configurations to get the benefits of {ml}.
//image::images/graph-network.jpg["Graph network"]
[float]
=== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis, and anomaly results are displayed in {kb}
dashboards.
[float]
[[ml-concepts]]
=== Basic Concepts
A few concepts are core to {ml} in {xpack}.
Understanding them from the outset makes the features much easier to learn.
Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.
Data feeds::
Jobs can analyze either a one-off batch of data or a continuous stream in real
time. Data feeds retrieve data from {es} for analysis. Alternatively, you can
<<ml-post-data,POST data>> from any source directly to an API.
Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>>.
Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval used to summarize and model the data. The bucket span
is typically between 5 minutes and 1 hour, depending on your data
characteristics. When setting the bucket span, take into account the granularity
at which you want to analyze, the frequency of the input data, the typical
duration of the anomalies, and the frequency at which alerting is required.
Machine learning nodes::
A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
which is the default behavior. If you set `node.ml` to `false`, the node can
service API requests, but it cannot run jobs. If you want to use {xpack} {ml}
features, there must be at least one {ml} node in your cluster.
For more information about these settings, see <<ml-settings>>.
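
To make the relationship between jobs, detectors, and the bucket span more
concrete, the following Python sketch assembles a job body with two detectors
that share one bucket span. It also shows how a timestamp could be mapped to
the start of its bucket. Treat it as an illustration only: the property names
(`analysis_config`, `detectors`, `data_description`) are assumed to match the
job resource format, the field names `responsetime`, `status`, and `timestamp`
are hypothetical, and `bucket_start` is a simplified stand-in for how data is
grouped into intervals, not the actual {ml} implementation.

```python
import json


def build_job_config(bucket_span="5m"):
    """Build an illustrative job body: two detectors, one shared bucket span.

    Property names are assumed to follow the job resource format; the field
    names (responsetime, status, timestamp) are hypothetical.
    """
    return {
        "description": "Hypothetical example job",
        "analysis_config": {
            # Time interval used to summarize and model the data.
            "bucket_span": bucket_span,
            # Several detectors in one job, rather than one job per analysis.
            "detectors": [
                {"function": "max", "field_name": "responsetime"},
                {"function": "rare", "by_field_name": "status"},
            ],
        },
        "data_description": {"time_field": "timestamp"},
    }


def bucket_start(epoch_seconds, span_seconds):
    """Return the start of the fixed-width bucket containing a timestamp."""
    return epoch_seconds - (epoch_seconds % span_seconds)


if __name__ == "__main__":
    print(json.dumps(build_job_config("15m"), indent=2))
    # With a 300-second (5 minute) span, second 1000 falls in the bucket
    # that starts at second 900.
    print(bucket_start(1000, 300))
```

Placing both detectors in one job follows the guidance above that multiple
detectors in a single job are more efficient than multiple jobs run against the
same data.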
//[float]
//== Where to Go Next
//<<ml-getting-started, Getting Started>> :: Enable machine learning and start
//discovering anomalies in your data.
//[float]
//== Have Comments, Questions, or Feedback?
//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
//suggestions.