[DOCS] Reformat data streams intro and overview (#57954) (#57993)

Changes:

* Updates 'Data streams' intro page to focus on problem solution and
  benefits.

* Adds 'Data streams overview' page to cover conceptual information,
  based on existing content in the 'Data streams' intro.

* Adds diagrams for data streams and search/indexing request examples.

* Moves API jump list and API docs to a new 'Data streams APIs' section.
  Links to these APIs will be available through tutorials.

* Adds xrefs to existing docs for concepts like generation, write index,
  and append-only.
James Rodewig 2020-06-11 11:32:09 -04:00 committed by GitHub
parent 4e738f60f8
commit 6fc8317f07
10 changed files with 208 additions and 129 deletions


@ -0,0 +1,14 @@
[[data-stream-apis]]
== Data stream APIs
The following APIs are available for managing data streams:

* To get information about data streams, use the <<indices-get-data-stream, get data stream API>>.
* To delete data streams, use the <<indices-delete-data-stream, delete data stream API>>.
* To manually create a data stream, use the <<indices-create-data-stream, create data stream API>>.
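
As a sketch of the manual path (the `my_logs` stream name is illustrative, and a
matching composable template must already exist), creating and then inspecting a
data stream looks like:

[source,console]
----
PUT /_data_stream/my_logs

GET /_data_stream/my_logs
----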
include::{es-repo-dir}/indices/create-data-stream.asciidoc[]
include::{es-repo-dir}/indices/get-data-stream.asciidoc[]
include::{es-repo-dir}/indices/delete-data-stream.asciidoc[]


@ -0,0 +1,142 @@
[[data-streams-overview]]
== Data streams overview
++++
<titleabbrev>Overview</titleabbrev>
++++
A data stream consists of one or more _backing indices_. Backing indices are
<<index-hidden,hidden>>, automatically-generated indices used to store a
stream's documents.
image::images/data-streams/data-streams-diagram.svg[align="center"]
The creation of a data stream requires an associated
<<indices-templates,composable template>>. This template acts as a blueprint for
the stream's backing indices. It contains:

* A name or wildcard (`*`) pattern for the data stream.
* The data stream's _timestamp field_. This field must be mapped as a
<<date,`date`>> or <<date_nanos,`date_nanos`>> field and must be
included in every document indexed to the data stream.
* The mappings and settings applied to each backing index when it's created.
The same composable template can be used to create multiple data streams.
See <<set-up-a-data-stream>>.
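
For example, a composable template that could back a hypothetical
`web_server_logs` stream might look like the following sketch (the template
name, wildcard pattern, and stream name are illustrative):

[source,console]
----
PUT /_index_template/web_server_logs_template
{
  "index_patterns": ["web_server_logs*"],
  "data_stream": {
    "timestamp_field": "@timestamp"
  }
}
----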
[discrete]
[[data-streams-generation]]
=== Generation
Each data stream tracks its _generation_: a six-digit, zero-padded integer
that acts as a cumulative count of the data stream's backing indices. This count
includes any deleted indices for the stream. The generation is incremented
whenever a new backing index is added to the stream.
When a backing index is created, the index is named using the following
convention:
[source,text]
----
.ds-<data-stream>-<generation>
----
.*Example*
[%collapsible]
====
The `web_server_logs` data stream has a generation of `34`. The most recently
created backing index for this data stream is named
`.ds-web_server_logs-000034`.
====
Because the generation increments with each new backing index, backing indices
with a higher generation contain more recent data. Backing indices with a lower
generation contain older data.
A backing index's name can change after its creation due to a
<<indices-shrink-index,shrink>>, <<snapshots-restore-snapshot,restore>>, or
other operations.
[discrete]
[[data-stream-write-index]]
=== Write index
When a read request is sent to a data stream, it routes the request to all its
backing indices. For example, a search request sent to a data stream would query
all its backing indices.
image::images/data-streams/data-streams-search-request.svg[align="center"]
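
For instance, a search request sent to the stream itself (stream and field
names are illustrative) runs against every backing index:

[source,console]
----
GET /web_server_logs/_search
{
  "query": {
    "match": { "message": "login failed" }
  }
}
----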
However, the most recently created backing index is the data stream's only
_write index_. The data stream routes all indexing requests for new documents to
this index.
image::images/data-streams/data-streams-index-request.svg[align="center"]
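
For example, the following indexing request targets the stream by name (names
and values are illustrative); the stream routes the document to its current
write index. `@timestamp` is the stream's required timestamp field:

[source,console]
----
POST /web_server_logs/_doc
{
  "@timestamp": "2050-11-15T14:12:12",
  "message": "login attempt failed"
}
----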
You cannot add new documents to a stream's other backing indices, even by
sending requests directly to the index. This means you cannot submit the
following requests directly to any backing index except the write index:

* An <<docs-index_,Index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `create`. The `op_type` parameter
defaults to `create` when adding new documents.
* A <<docs-bulk,Bulk API>> request using a `create` action.
Because it's the only index capable of ingesting new documents, you cannot
perform operations on a write index that might hinder indexing. These
prohibited operations include:

* <<indices-close,Closing the write index>>
* <<indices-delete-index,Deleting the write index>>
* <<freeze-index-api,Freezing the write index>>
* <<indices-shrink-index,Shrinking the write index>>
[discrete]
[[data-streams-rollover]]
=== Rollover
When a data stream is created, one backing index is automatically created.
Because this single index is also the most recently created backing index, it
acts as the stream's write index.
A <<indices-rollover-index,rollover>> creates a new backing index for a data
stream. This new backing index becomes the stream's write index, replacing
the current one, and increments the stream's generation.
In most cases, we recommend using <<index-lifecycle-management,{ilm}
({ilm-init})>> to automate rollovers for data streams. This lets you
automatically roll over the current write index when it meets specified
criteria, such as a maximum age or size.
However, you can also use the <<indices-rollover-index,rollover API>> to
manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
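
As a sketch, a manual rollover is a single request addressed to the stream
(stream name illustrative). On success, a new backing index becomes the
stream's write index and the generation increments:

[source,console]
----
POST /web_server_logs/_rollover
----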
[discrete]
[[data-streams-append-only]]
=== Append-only
For most time-series use cases, existing data is rarely, if ever, updated.
Because of this, data streams are designed to be append-only. This means you can
send indexing requests for new documents directly to a data stream. However, you
cannot send update or deletion requests for existing documents to a data stream.
To update or delete specific documents in a data stream, submit one of the
following requests to the backing index containing the document:

* An <<docs-index_,Index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `index`.
These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
and `if_primary_term`>> arguments.
* A <<docs-bulk,Bulk API>> request using the `delete`, `index`, or `update`
action. If the action type is `index`, the action must include valid
<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
arguments.
* A <<docs-delete,Delete API>> request.
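
For example, to update a document in place you would target its backing index
directly, supplying the concurrency-control values (the index name, document
ID, and sequence numbers here are all illustrative):

[source,console]
----
PUT /.ds-web_server_logs-000034/_doc/W0tpsmIBdwcYyG50zbta?if_seq_no=0&if_primary_term=1
{
  "@timestamp": "2050-11-15T14:12:12",
  "message": "login attempt failed (corrected)"
}
----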
TIP: If you need to frequently update or delete existing documents across
multiple indices, we recommend using an <<indices-add-alias,index alias>> and
<<indices-templates,index template>> instead of a data stream. You can still
use <<index-lifecycle-management,{ilm-init}>> to manage the indices.


@ -1,130 +1,60 @@
[[data-streams]]
= Data streams
++++
<titleabbrev>Data streams</titleabbrev>
++++
[partintro]
--
A _data stream_ is a convenient, scalable way to ingest, search, and manage
continuously generated time-series data.
Time-series data, such as logs, tends to grow over time. While storing an entire
time series in a single {es} index is simpler, it is often more efficient and
cost-effective to store large volumes of data across multiple, time-based
indices. Multiple indices let you move indices containing older, less frequently
queried data to less expensive hardware and delete indices when they're no
longer needed, reducing overhead and storage costs.
A data stream is designed to give you the best of both worlds:

* The simplicity of a single, named resource you can use for requests
* The storage, scalability, and cost-saving benefits of multiple indices
You can submit indexing and search requests directly to a data stream. The
stream automatically routes the requests to a collection of hidden,
auto-generated indices that store the stream's data.
You can use a <<indices-templates,composable template>> and
<<index-lifecycle-management,{ilm} ({ilm-init})>> to automate the management of
these hidden indices. You can use {ilm-init} to spin up new indices, allocate
indices to different hardware, delete old indices, and take other automatic
actions based on age or size criteria you set. This lets you seamlessly scale
your data storage based on your budget, performance, resiliency, and retention
needs.
[discrete]
[[when-to-use-data-streams]]
== When to use data streams
We recommend using data streams if you:

* Use {es} to ingest, search, and manage large volumes of time-series data
* Want to scale and reduce costs by using {ilm-init} to automate the management
of your indices
* Index large volumes of time-series data in {es} but rarely delete or update
individual documents
[discrete]
[[data-streams-toc]]
== In this section
* <<data-streams-overview>>
* <<set-up-a-data-stream>>
* <<use-a-data-stream>>
--
include::data-streams-overview.asciidoc[]
include::set-up-a-data-stream.asciidoc[]
include::use-a-data-stream.asciidoc[]


@ -22,11 +22,11 @@ TIP: Data streams work well with most common log formats. While no schema is
required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
(ECS)].
* Data streams are designed to be <<data-streams-append-only,append-only>>.
While you can index new documents directly to a data stream, you cannot use a
data stream to directly update or delete individual documents. To update or
delete specific documents in a data stream, submit a <<docs-delete,delete>> or
<<docs-update,update>> API request to the backing index containing the document.
[discrete]
@ -57,8 +57,9 @@ The following <<ilm-put-lifecycle,create lifecycle policy API>> request
configures the `logs_policy` lifecycle policy.
The `logs_policy` policy uses the <<ilm-rollover,`rollover` action>> to create a
new <<data-stream-write-index,write index>> for the data stream when the current
one reaches 25GB in size. The policy also deletes backing indices 30 days after
their rollover.
[source,console]
----


@ -144,7 +144,8 @@ GET /logs/_search
=== Manually roll over a data stream
A rollover creates a new backing index for a data stream. This new backing index
becomes the stream's <<data-stream-write-index,write index>> and increments
the stream's <<data-streams-generation,generation>>.
In most cases, we recommend using <<index-lifecycle-management,{ilm-init}>> to
automate rollovers for data streams. This lets you automatically roll over the

(Diffs suppressed for three added SVG diagrams: 34 KiB, 43 KiB, and 45 KiB.)


@ -2,7 +2,7 @@
== Index APIs
Index APIs are used to manage individual indices,
index settings, aliases, mappings, and index templates.
[float]
[[index-management]]
@ -31,13 +31,6 @@ index settings, data streams, aliases, mappings, and index templates.
* <<indices-get-field-mapping>>
* <<indices-types-exists>>
[float]
[[alias-management]]
=== Alias management:
@ -165,9 +158,3 @@ include::indices/apis/unfreeze.asciidoc[]
include::indices/aliases.asciidoc[]
include::indices/update-settings.asciidoc[]


@ -47,6 +47,7 @@ endif::[]
include::{es-repo-dir}/cat.asciidoc[]
include::{es-repo-dir}/cluster.asciidoc[]
include::{es-repo-dir}/ccr/apis/ccr-apis.asciidoc[]
include::{es-repo-dir}/data-streams/data-stream-apis.asciidoc[]
include::{es-repo-dir}/docs.asciidoc[]
include::{es-repo-dir}/ingest/apis/enrich/index.asciidoc[]
include::{es-repo-dir}/graph/explore.asciidoc[]