[DOCS] Make ILM documentation data stream aware (#58035) (#58110)

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
(cherry picked from commit 25cbbe56dd29fbee2efe8040e9c8b92d168cb670)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Andrei Dan 2020-06-15 15:16:14 +01:00 committed by GitHub
parent 03dd73dc0d
commit 3635bd741c
6 changed files with 301 additions and 78 deletions


@ -268,9 +268,14 @@ shard will never be started on the same node as its primary shard.
--
// tag::rollover-def[]
// tag::rollover-def-short[]
Creates a new index for a rollover target when the existing index reaches a certain size, number of docs, or age.
A rollover target can be either an <<indices-aliases, index alias>> or a <<data-streams, data stream>>.
// end::rollover-def-short[]
The new index is automatically configured according to any matching
<<glossary-index-template,index templates>>, or, if the rollover target is a
<<data-streams, data stream>>, the matching <<indices-templates,composable index template>>.
For example, if you're indexing log data, you might use rollover to create daily or weekly indices.
See the {ref}/indices-rollover-index.html[rollover index API].
// end::rollover-def[]


@ -4,7 +4,7 @@
Phases allowed: hot.
Rolls over a target to a new index when the existing index meets one of the rollover conditions.
IMPORTANT: If the rollover action is used on a <<ccr-put-follow,follower index>>,
policy execution waits until the leader index rolls over (or is
@ -12,7 +12,12 @@ policy execution waits until the leader index rolls over (or is
then converts the follower index into a regular index with the
<<ilm-unfollow-action, Unfollow action>>.
A rollover target can be a <<data-streams, data stream>> or an <<indices-aliases, index alias>>.
When targeting a data stream, the new index becomes the data stream's
<<data-stream-write-index,write index>> and its generation is incremented.
To roll over an <<indices-aliases, index alias>>, the alias and its write index
must meet the following conditions:
* The index name must match the pattern `^.*-\\d+$`, for example `my_index-000001`.
* The `index.lifecycle.rollover_alias` must be configured as the alias to roll over.
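For orientation, a minimal policy that uses the rollover action in the hot phase might look like
the following sketch (the policy name `my_policy` and the `max_size`/`max_age` thresholds are
illustrative only, not values this page prescribes):
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
--------------------------------------------------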


@ -6,6 +6,18 @@ Phases allowed: cold.
Takes a snapshot of the managed index in the configured repository
and mounts it as a searchable snapshot.
If the managed index is part of a <<data-streams, data stream>>,
the mounted index replaces the original index in the data stream.
[NOTE]
This action cannot be performed on a data stream's write index. Attempts to do
so will fail. To convert the index to a searchable snapshot, first
<<manually-roll-over-a-data-stream,manually roll over>> the data stream. This
creates a new write index. Because the index is no longer the stream's write
index, the action can then convert it to a searchable snapshot.
Using a policy that includes the <<ilm-rollover, rollover>> action
in the hot phase avoids this situation and the need for a manual rollover for future
managed indices.
By default, this snapshot is deleted by the <<ilm-delete-action, delete action>> in the delete phase.
To keep the snapshot, set `delete_searchable_snapshot` to `false` in the delete action.
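As a rough sketch, assuming a snapshot repository named `backing_repo` has already been registered,
a cold phase that mounts the index as a searchable snapshot might be configured like this:
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "backing_repo"
          }
        }
      }
    }
  }
}
--------------------------------------------------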


@ -21,6 +21,19 @@ policy execution waits until the leader index rolls over (or is
then converts the follower index into a regular index with the
<<ilm-unfollow-action,the Unfollow action>> before performing the shrink operation.
If the managed index is part of a <<data-streams, data stream>>,
the shrunken index replaces the original index in the data stream.
[NOTE]
This action cannot be performed on a data stream's write index. Attempts to do
so will fail. To shrink the index, first
<<manually-roll-over-a-data-stream,manually roll over>> the data stream. This
creates a new write index. Because the index is no longer the stream's write
index, the action can resume shrinking it.
Using a policy that includes the <<ilm-rollover, rollover>> action
in the hot phase avoids this situation and the need for a manual rollover for future
managed indices.
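For illustration only (the target of one shard is an example value), a warm phase that shrinks
the managed index might look like this sketch:
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------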
[[ilm-shrink-options]]
==== Shrink options
`number_of_shards`::


@ -11,17 +11,26 @@ This tutorial demonstrates how to use {ilm}
({ilm-init}) to manage indices that contain time-series data.
When you continuously index timestamped documents into {es},
you typically use a <<data-streams, data stream>> so you can periodically roll over to a
new index.
This enables you to implement a hot-warm-cold architecture to meet your performance
requirements for your newest data, control costs over time, enforce retention policies,
and still get the most out of your data.
TIP: Data streams are best suited for
<<data-streams-append-only,append-only>> use cases. If you need to frequently
update or delete existing documents across multiple indices, we recommend
using an index alias and index template instead. You can still use {ilm-init} to
manage and roll over the alias's indices. Skip to
<<manage-time-series-data-without-data-streams>>.
To automate rollover and management of a data stream with {ilm-init}, you:
. <<ilm-gs-create-policy, Create a lifecycle policy>> that defines the appropriate
phases and actions.
. <<ilm-gs-apply-policy, Create a composable index template>> to define the data stream and
apply the {ilm-init} policy, index settings, and mappings to its backing
indices.
. <<ilm-gs-check-progress, Verify indices are moving through the lifecycle phases>>
as expected.
@ -29,7 +38,7 @@ For an introduction to rolling indices, see <<index-rollover>>.
IMPORTANT: When you enable {ilm} for {beats} or the {ls} {es} output plugin,
lifecycle policies are set up automatically.
You do not need to take any other actions.
You can modify the default policies through
{kibana-ref}/example-using-index-lifecycle-policy.html[{kib} Management]
or the {ilm-init} APIs.
@ -89,7 +98,211 @@ For the complete list of actions that {ilm} can perform, see <<ilm-actions>>.
[discrete]
[[ilm-gs-apply-policy]]
=== Create a composable template to create the data stream and apply the lifecycle policy
To set up a data stream, first create a composable template to specify the lifecycle policy. Because
the template is for a data stream, it must also include a `data_stream` definition.
For example, you might create a `timeseries_template` to use for a future data stream
named `timeseries`.
To enable the {ilm-init} to manage the data stream, the template configures one {ilm-init} setting:
* `index.lifecycle.name` specifies the name of the lifecycle policy to apply to the data stream.
You can use the {kib} Create template wizard to add the template.
This wizard invokes the `PUT _index_template` API to create the <<indices-templates,composable index template>>
with the options you specify.
The underlying request looks like this:
[source,console]
-----------------------
PUT _index_template/timeseries_template
{
  "index_patterns": ["timeseries"],               <1>
  "data_stream": {
    "timestamp_field": "@timestamp"               <2>
  },
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "timeseries_policy" <3>
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"                          <4>
        }
      }
    }
  }
}
-----------------------
// TEST[continued]
<1> Apply the template when a document is indexed into the `timeseries` target.
<2> Identifies the timestamp field for the data stream. This field must be present
in all documents indexed into the `timeseries` data stream.
<3> The name of the {ilm-init} policy used to manage the data stream.
<4> A <<date,`date`>> or <<date_nanos,`date_nanos`>> field mapping for the
timestamp field specified in the `timestamp_field` property.
You can also invoke this API directly to add templates.
[discrete]
[[ilm-gs-create-the-data-stream]]
=== Create the data stream
To get things started, index a document into a target that matches the name or wildcard pattern
defined in the `index_patterns` of the <<indices-templates,composable index template>>. As long
as an existing data stream, index, or index alias does not already use that name, the index
request automatically creates a corresponding data stream with a single backing index.
{es} automatically indexes the request's documents into this backing index, which also
acts as the stream's <<data-stream-write-index,write index>>.
For example, the following request creates the `timeseries` data stream and the first generation
backing index called `.ds-timeseries-000001`.
[source,console]
-----------------------
POST timeseries/_doc
{
  "message": "logged the request",
  "@timestamp": "2020-06-11T15:50:11.000Z"
}
-----------------------
// TEST[continued]
When a rollover condition in the lifecycle policy is met, the `rollover` action:
* Creates the second generation backing index, named `.ds-timeseries-000002`.
Because it is a backing index of the `timeseries` data stream, the configuration from the `timeseries_template` composable template is applied to the new index.
* As it is the latest generation index of the `timeseries` data stream, the newly created
backing index `.ds-timeseries-000002` becomes the data stream's write index.
This process repeats each time a rollover condition is met.
You can search across all of the data stream's backing indices, managed by the `timeseries_policy`,
with the `timeseries` data stream name.
Write operations are routed to the current write index. Read operations are handled by all
backing indices.
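For example, the following sketch searches the `timeseries` data stream by name (the `message`
query is illustrative):
[source,console]
-----------------------
GET timeseries/_search
{
  "query": {
    "match": {
      "message": "request"
    }
  }
}
-----------------------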
[discrete]
[[ilm-gs-check-progress]]
=== Check lifecycle progress
To get status information for managed indices, you use the {ilm-init} explain API.
This lets you find out things like:
* What phase an index is in and when it entered that phase.
* The current action and what step is being performed.
* If any errors have occurred or progress is blocked.
For example, the following request gets information about the `timeseries` data stream's
backing indices:
[source,console]
--------------------------------------------------
GET .ds-timeseries-*/_ilm/explain
--------------------------------------------------
// TEST[continued]
The following response shows the data stream's first generation backing index is waiting for the `hot`
phase's `rollover` action.
It remains in this state and {ilm-init} continues to call `check-rollover-ready` until a rollover condition
is met.
// [[36818c6d9f434d387819c30bd9addb14]]
[source,console-result]
--------------------------------------------------
{
  "indices": {
    ".ds-timeseries-000001": {
      "index": ".ds-timeseries-000001",
      "managed": true,
      "policy": "timeseries_policy",             <1>
      "lifecycle_date_millis": 1538475653281,
      "age": "30s",                              <2>
      "phase": "hot",
      "phase_time_millis": 1538475653317,
      "action": "rollover",
      "action_time_millis": 1538475653317,
      "step": "check-rollover-ready",            <3>
      "step_time_millis": 1538475653317,
      "phase_execution": {
        "policy": "timeseries_policy",
        "phase_definition": {                    <4>
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "50gb",
              "max_age": "30d"
            }
          }
        },
        "version": 1,
        "modified_date_in_millis": 1539609701576
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]
<1> The policy used to manage the index
<2> The age of the index
<3> The step {ilm-init} is performing on the index
<4> The definition of the current phase (the `hot` phase)
//////////////////////////
[source,console]
--------------------------------------------------
DELETE /_data_stream/timeseries
--------------------------------------------------
// TEST[continued]
//////////////////////////
//////////////////////////
[source,console]
--------------------------------------------------
DELETE /_index_template/timeseries_template
--------------------------------------------------
// TEST[continued]
//////////////////////////
[discrete]
[[manage-time-series-data-without-data-streams]]
=== Manage time-series data without data streams
Even though <<data-streams, data streams>> are a convenient way to scale
and manage time-series data, they are designed to be append-only. We recognize there
are use cases where data needs to be updated or deleted in place. Because
data streams don't support delete and update requests directly, the index APIs
would need to be used on the data stream's backing indices.
In these cases, you can use an index alias to manage indices that contain the time-series data
and periodically roll over to a new index.
To automate rollover and management of time-series indices with {ilm-init} using an index
alias, you:
. Create a lifecycle policy that defines the appropriate phases and actions.
See <<ilm-gs-create-policy, Create a lifecycle policy>> above.
. <<ilm-gs-alias-apply-policy, Create an index template>> to apply the policy to each new index.
. <<ilm-gs-alias-bootstrap, Bootstrap an index>> as the initial write index.
. <<ilm-gs-alias-check-progress, Verify indices are moving through the lifecycle phases>>
as expected.
[discrete]
[[ilm-gs-alias-apply-policy]]
=== Create a legacy index template to apply the lifecycle policy
To automatically apply a lifecycle policy to the new write index on rollover,
specify the policy in the index template used to create new indices.
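For context, a legacy template for alias-managed indices might look like the following sketch.
The pattern `timeseries-*`, the `timeseries` rollover alias, and the settings shown are assumptions
that mirror the composable template above, not necessarily the exact body used in the original docs:
[source,console]
-----------------------
PUT _template/timeseries_template
{
  "index_patterns": ["timeseries-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.lifecycle.name": "timeseries_policy",
    "index.lifecycle.rollover_alias": "timeseries"
  }
}
-----------------------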
@ -143,8 +356,8 @@ DELETE /_template/timeseries_template
//////////////////////////
[discrete]
[[ilm-gs-alias-bootstrap]]
=== Bootstrap the initial time-series index with a write index alias
To get things started, you need to bootstrap an initial index and
designate it as the write index for the rollover alias specified in your index template.
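A minimal bootstrap request, assuming the rollover alias is named `timeseries`, might look like
this sketch:
[source,console]
-----------------------
PUT timeseries-000001
{
  "aliases": {
    "timeseries": {
      "is_write_index": true
    }
  }
}
-----------------------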
@ -178,17 +391,13 @@ You can search across all of the indices managed by the `timeseries_policy` with
Write operations are routed to the current write index.
[discrete]
[[ilm-gs-alias-check-progress]]
=== Check lifecycle progress
Retrieving the status information for the managed indices is very similar to the data stream case.
See the data stream <<ilm-gs-check-progress, check progress section>> for more information.
The only difference is the index namespace, so retrieving the progress entails the following
API call:
[source,console]
--------------------------------------------------
@ -196,48 +405,11 @@ GET timeseries-*/_ilm/explain
--------------------------------------------------
// TEST[continued]
//////////////////////////
[source,console]
--------------------------------------------------
DELETE /timeseries-000001
--------------------------------------------------
// TEST[continued]
//////////////////////////


@ -12,7 +12,23 @@ Using rolling indices enables you to:
* Shift older, less frequently accessed data to less expensive _cold_ nodes,
* Delete data according to your retention policies by removing entire indices.
We recommend using <<indices-create-data-stream, data streams>> to manage time-series
data. Data streams automatically track the write index while keeping configuration to a minimum.
Each data stream requires a <<indices-templates,composable index template>> that contains:
* A name or wildcard (`*`) pattern for the data stream.
* The data stream's timestamp field. This field must be mapped as a
<<date,`date`>> or <<date_nanos,`date_nanos`>> field datatype and must be
included in every document indexed to the data stream.
* The mappings and settings applied to each backing index when it's created.
Data streams are designed for append-only data, where the data stream name
can be used as the target for read, write, rollover, and shrink operations.
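For illustration only (`my-data-stream` is a placeholder name), both a manual rollover and a
search can target the data stream name directly:
[source,console]
-----------------------
POST my-data-stream/_rollover

GET my-data-stream/_search
-----------------------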
If your use case requires data to be updated in place, you can instead manage your time-series data
using <<indices-aliases, index aliases>>. However, this requires a few more configuration steps and
concepts:
* An _index template_ that specifies the settings for each new index in the series.
You optimize this configuration for ingestion, typically using as many shards as you have hot nodes.