* [DOCS] Overall review

* [DOCS] General review

* [DOCS] typo

* [DOCS] Fix for processed_record_count with aggs

* [DOCS] Added latency tbd

Original commit: elastic/x-pack-elasticsearch@9e8cf664c1
Sophie Chang 2017-04-27 18:51:48 +01:00 committed by lcawley
parent 642b1f7c19
commit ffb3bb6493
8 changed files with 77 additions and 69 deletions

View File

@@ -10,11 +10,11 @@ All {ml} endpoints have the following base:
 The main {ml} resources can be accessed with a variety of endpoints:
-* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
-* <<ml-api-datafeeds,+/datafeeds/+>>: Update data to be analyzed.
-* <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
-* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
-* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.
+* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs
+* <<ml-api-datafeeds,+/datafeeds/+>>: Select data from {es} to be analyzed
+* <<ml-api-results,+/results/+>>: Access the results of a {ml} job
+* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots
+* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations
 [float]
 [[ml-api-jobs]]

View File

@@ -19,8 +19,8 @@ science-related configurations in order to get the benefits of {ml}.
 === Integration with the Elastic Stack
 Machine learning is tightly integrated with the Elastic Stack.
-Data is pulled from {es} for analysis and anomaly results are displayed in
-{kb} dashboards.
+Data is pulled from {es} for analysis and anomaly results are displayed in {kb}
+dashboards.
 [float]
 [[ml-concepts]]
@@ -36,23 +36,25 @@ Jobs::
 with a job, see <<ml-job-resource, Job Resources>>.
 Data feeds::
-Jobs can analyze either a batch of data from a data store or a stream of data
-in real-time. The latter involves data that is retrieved from {es} and is
-referred to as a data feed.
+Jobs can analyze either a one-off batch of data or continuously in real-time.
+Data feeds retrieve data from {es} for analysis. Alternatively, you can
+<<ml-post-data,POST data>> from any source directly to an API.
 Detectors::
 Part of the configuration information associated with a job, detectors define
 the type of analysis that needs to be done (for example, max, average, rare).
 They also specify which fields to analyze. You can have more than one detector
 in a job, which is more efficient than running multiple jobs against the same
-data stream. For a list of the properties associated with detectors, see
+data. For a list of the properties associated with detectors, see
 <<ml-detectorconfig, Detector Configuration Objects>>.
 Buckets::
 Part of the configuration information associated with a job, the _bucket span_
-defines the time interval across which the job analyzes. When setting the
+defines the time interval used to summarize and model the data. This is typically
+between 5 minutes and 1 hour, and it depends on your data characteristics. When setting the
 bucket span, take into account the granularity at which you want to analyze,
-the frequency of the input data, and the frequency at which alerting is required.
+the frequency of the input data, the typical duration of the anomalies,
+and the frequency at which alerting is required.
 Machine learning nodes::
 A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
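As a minimal sketch of those two node settings, an `elasticsearch.yml` entry for a {ml}-capable node could look like this (an illustration only; both settings default to `true` when {xpack} is installed):

[source,yaml]
--------------------------------------------------
# Allow this node to run machine learning jobs (both settings default to true)
xpack.ml.enabled: true
node.ml: true
--------------------------------------------------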

View File

@@ -12,14 +12,14 @@ Use machine learning to detect anomalies in time series data.
 [[ml-api-datafeed-endpoint]]
 === Data Feeds
-* <<ml-put-datafeed,Create data feeds>>
-* <<ml-delete-datafeed,Delete data feeds>>
-* <<ml-get-datafeed,Get data feeds>>
+* <<ml-put-datafeed,Create data feed>>
+* <<ml-delete-datafeed,Delete data feed>>
+* <<ml-get-datafeed,Get data feed info>>
 * <<ml-get-datafeed-stats,Get data feed statistics>>
-* <<ml-preview-datafeed,Preview data feeds>>
-* <<ml-start-datafeed,Start data feeds>>
-* <<ml-stop-datafeed,Stop data feeds>>
-* <<ml-update-datafeed,Update data feeds>>
+* <<ml-preview-datafeed,Preview data feed>>
+* <<ml-start-datafeed,Start data feed>>
+* <<ml-stop-datafeed,Stop data feed>>
+* <<ml-update-datafeed,Update data feed>>
 include::ml/put-datafeed.asciidoc[]
 include::ml/delete-datafeed.asciidoc[]
@@ -35,15 +35,15 @@ include::ml/update-datafeed.asciidoc[]
 You can use APIs to perform the following activities:
-* <<ml-close-job,Close jobs>>
-* <<ml-put-job,Create jobs>>
-* <<ml-delete-job,Delete jobs>>
-* <<ml-get-job,Get jobs>>
+* <<ml-close-job,Close job>>
+* <<ml-put-job,Create job>>
+* <<ml-delete-job,Delete job>>
+* <<ml-get-job,Get job info>>
 * <<ml-get-job-stats,Get job statistics>>
-* <<ml-flush-job,Flush jobs>>
-* <<ml-open-job,Open jobs>>
-* <<ml-post-data,Post data to jobs>>
-* <<ml-update-job,Update jobs>>
+* <<ml-flush-job,Flush job>>
+* <<ml-open-job,Open job>>
+* <<ml-post-data,Post data to job>>
+* <<ml-update-job,Update job>>
 * <<ml-valid-detector,Validate detectors>>
 * <<ml-valid-job,Validate job>>
@@ -62,10 +62,10 @@ include::ml/validate-job.asciidoc[]
 [[ml-api-snapshot-endpoint]]
 === Model Snapshots
-* <<ml-delete-snapshot,Delete model snapshots>>
-* <<ml-get-snapshot,Get model snapshots>>
-* <<ml-revert-snapshot,Revert model snapshots>>
-* <<ml-update-snapshot,Update model snapshots>>
+* <<ml-delete-snapshot,Delete model snapshot>>
+* <<ml-get-snapshot,Get model snapshot info>>
+* <<ml-revert-snapshot,Revert model snapshot>>
+* <<ml-update-snapshot,Update model snapshot>>
 include::ml/delete-snapshot.asciidoc[]
 include::ml/get-snapshot.asciidoc[]
@@ -91,7 +91,7 @@ include::ml/get-record.asciidoc[]
 * <<ml-datafeed-resource,Data feeds>>
 * <<ml-datafeed-counts,Data feed counts>>
 * <<ml-job-resource,Jobs>>
-* <<ml-jobstats,Job Stats>>
+* <<ml-jobstats,Job statistics>>
 * <<ml-snapshot-resource,Model snapshots>>
 * <<ml-results-resource,Results>>

View File

@@ -7,16 +7,18 @@ A data feed resource has the following properties:
 `aggregations`::
 (object) If set, the data feed performs aggregation searches.
 For syntax information, see {ref}/search-aggregations.html[Aggregations].
-Support for aggregations is limited: TBD.
+Support for aggregations is limited and should only be used with
+low cardinality data:
 For example:
 `{"@timestamp": {"histogram": {"field": "@timestamp",
 "interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
 "min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
 "field": "events_per_min"}}}}}`.
+//TBD link to a Working with aggregations page
 `chunking_config`::
-(object) The chunking configuration, which specifies how data searches are
-chunked. See <<ml-datafeed-chunking-config>>.
+(object) Specifies how data searches are split into time chunks.
+See <<ml-datafeed-chunking-config>>.
 For example: {"mode": "manual", "time_span": "3h"}
 `datafeed_id`::
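For readability, the inline aggregation example quoted in the hunk above corresponds to a datafeed `aggregations` object roughly like the following (a sketch only; the `@timestamp` histogram and the `events_per_min` sum field are the illustrative names used in the text):

[source,js]
--------------------------------------------------
{
  "aggregations": {
    "@timestamp": {
      "histogram": {
        "field": "@timestamp",
        "interval": 30000,
        "offset": 0,
        "order": { "_key": "asc" },
        "keyed": false,
        "min_doc_count": 0
      },
      "aggregations": {
        "events_per_min": {
          "sum": { "field": "events_per_min" }
        }
      }
    }
  }
}
--------------------------------------------------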
@@ -39,14 +41,12 @@ A data feed resource has the following properties:
 corresponds to the query object in an Elasticsearch search POST body. All the
 options that are supported by Elasticsearch can be used, as this object is
 passed verbatim to Elasticsearch. By default, this property has the following
-value: `{"match_all": {"boost": 1}}`. If this property is not specified, the
-default value is `“match_all”: {}`.
+value: `{"match_all": {"boost": 1}}`.
 `query_delay`::
 (time units) The number of seconds behind real-time that data is queried. For
 example, if data from 10:04 a.m. might not be searchable in Elasticsearch
-until 10:06 a.m., set this property to 120 seconds. The default value is 60
-seconds. For example: "60s".
+until 10:06 a.m., set this property to 120 seconds. The default value is `60s`.
 `scroll_size`::
 (unsigned integer) The `size` parameter that is used in Elasticsearch searches.
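Put together, the `query`, `query_delay`, and `scroll_size` properties described above might appear in a datafeed resource along these lines (a minimal sketch showing only these three properties, with the default `match_all` query and an illustrative scroll size):

[source,js]
--------------------------------------------------
{
  "query": { "match_all": { "boost": 1 } },
  "query_delay": "60s",
  "scroll_size": 1000
}
--------------------------------------------------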
@@ -59,11 +59,17 @@ A data feed resource has the following properties:
 [[ml-datafeed-chunking-config]]
 ===== Chunking Configuration Objects
+Data feeds may be required to search over long time periods, for several months
+or years. This search is split into time chunks in order to ensure the load
+on {es} is managed. Chunking configuration controls how the size of these time
+chunks is calculated and is an advanced configuration option.
 A chunking configuration object has the following properties:
 `mode` (required)::
 There are three available modes: +
-`auto`::: The chunk size will be dynamically calculated.
+`auto`::: The chunk size will be dynamically calculated. This is the default
+and recommended value.
 `manual`::: Chunking will be applied according to the specified `time_span`.
 `off`::: No chunking will be applied.
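To make the modes above concrete, the `{"mode": "manual", "time_span": "3h"}` example mentioned earlier in the page corresponds to a datafeed fragment like the following sketch, which forces each search to cover a three-hour window:

[source,js]
--------------------------------------------------
{
  "chunking_config": {
    "mode": "manual",
    "time_span": "3h"
  }
}
--------------------------------------------------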
@@ -79,20 +85,20 @@ A chunking configuration object has the following properties:
 The get data feed statistics API provides information about the operational
 progress of a data feed. For example:
-`assigment_explanation`::
-TBD. For example: " "
+`assignment_explanation`::
+(string) For started data feeds only, contains messages relating to the selection
+of a node.
 `datafeed_id`::
 (string) A numerical character string that uniquely identifies the data feed.
 `node`::
-(object) TBD
-The node that is running the query?
-`id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw".
-`name`::: TBD. For example, "0-o0tOo".
-`ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
-`transport_address`::: TBD. For example, "127.0.0.1:9300".
-`attributes`::: TBD. For example, {"max_running_jobs": "10"}.
+(object) The node upon which the data feed is started. The data feed and job will be on the same node.
+`id`::: The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw".
+`name`::: The node name. For example, "0-o0tOo".
+`ephemeral_id`::: The node ephemeral id.
+`transport_address`::: The host and port where transport HTTP connections are accepted. For example, "127.0.0.1:9300".
+`attributes`::: For example, {"max_running_jobs": "10"}.
 `state`::
 (string) The status of the data feed, which can be one of the following values: +

View File

@@ -118,14 +118,8 @@ necessarily a cause for concern.
 This value includes records with missing fields, since they are nonetheless
 analyzed. +
 If you use data feeds and have aggregations in your search query,
-the `processed_record_count` differs from the `input_record_count`. +
-If you use the <<ml-post-data,post data API>> to provide data to the job,
-the following records are not processed: +
-+
---
-* Records not in chronological order and outside the latency window
-* Records with invalid timestamp
---
+the `processed_record_count` will be the number of aggregated records
+processed, not the number of {es} documents.
 `sparse_bucket_count`::
 (long) The number of buckets that contained few data points compared to the
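As a hypothetical illustration of the new wording above: if a datafeed summarizes one hour of data containing 100,000 documents into 30-second histogram buckets, `processed_record_count` increases by roughly 120 (one per aggregated bucket), not by 100,000.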
@@ -167,12 +161,12 @@ The `model_size_stats` object has the following properties:
 (string) For internal use. The type of result.
 `total_by_field_count`::
-(long) The number of `by` field values that were analyzed by the models.
+(long) The number of `by` field values that were analyzed by the models.+
 NOTE: The `by` field values are counted separately for each detector and partition.
 `total_over_field_count`::
-(long) The number of `over` field values that were analyzed by the models.
+(long) The number of `over` field values that were analyzed by the models.+
 NOTE: The `over` field values are counted separately for each detector and partition.
@@ -196,12 +190,10 @@ This information is available only for open jobs.
 (string) The node name.
 `ephemeral_id`::
 (string) The ephemeral id of the node.
 `transport_address`::
 (string) The host and port where transport HTTP connections are accepted.
 `attributes`::
-(object) {ml} attributes.
-`max_running_jobs`::: The maximum number of concurrently open jobs that are
-allowed per node.
+(object) For example, {"max_running_jobs": "10"}.

View File

@@ -15,9 +15,17 @@ The job must have been opened prior to sending data.
 File sizes are limited to 100 Mb, so if your file is larger,
 then split it into multiple files and upload each one separately in sequential time order.
-When running in real-time, it is generally recommended to arrange to perform
+When running in real-time, it is generally recommended to perform
 many small uploads, rather than queueing data to upload larger files.
+When uploading data, check the <<ml-datacounts,job data counts>> for progress.
+The following records will not be processed:
+* Records not in chronological order and outside the latency window
+* Records with an invalid timestamp
+//TBD link to Working with Out of Order timeseries concept doc
 IMPORTANT: Data can only be accepted from a single connection.
 Use a single connection synchronously to send data, close, flush, or delete a single job.
 It is not currently possible to post data to multiple jobs using wildcards
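For example, streaming a couple of newline-delimited JSON records into an open job could look roughly like this sketch (the job name and the `time`/`events_per_min` fields are illustrative, and the endpoint prefix assumes the 5.x `_xpack/ml` API):

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/example-job/_data
{"time": "2017-04-27T10:00:00Z", "events_per_min": 42}
{"time": "2017-04-27T10:01:00Z", "events_per_min": 47}
--------------------------------------------------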

View File

@@ -14,7 +14,6 @@ When choosing a new value, consider the following:
 * Persistence enables snapshots to be reverted.
 * The time taken to persist a job is proportional to the size of the model in memory.
 //* The smallest allowed value is 3600 (1 hour).
-////
 A model snapshot resource has the following properties:
@@ -34,7 +33,8 @@ A model snapshot resource has the following properties:
 (object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.
 `retain`::
-(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
+(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots
+older than `model_snapshot_retention_days`.
 However, this snapshot will be deleted when the job is deleted.
 The default value is false.
@@ -89,4 +89,4 @@ The `model_size_stats` object has the following properties:
 `total_partition_field_count`::
 (long) The number of _partition_ field values analyzed.
-////

View File

@@ -1,6 +1,6 @@
 [[ml-settings]]
 == Machine Learning Settings
-You do not need to configure any settings to use {ml}.
+You do not need to configure any settings to use {ml}. It is enabled by default.
 [float]
 [[general-ml-settings]]
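Conversely, if you do want to turn {ml} off on a node, a minimal `elasticsearch.yml` entry would look like the following sketch (the setting name comes from the overview section earlier in this commit; `false` simply overrides the enabled-by-default behavior):

[source,yaml]
--------------------------------------------------
# Disable machine learning on this node
xpack.ml.enabled: false
--------------------------------------------------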