[DOCS] Reformat data streams intro and overview (#57954) (#57993)

Changes:

* Updates 'Data streams' intro page to focus on problem solution and
  benefits.

* Adds 'Data streams overview' page to cover conceptual information,
  based on existing content in the 'Data streams' intro.

* Adds diagrams for data streams and search/indexing request examples.

* Moves API jump list and API docs to a new 'Data streams APIs' section.
  Links to these APIs will be available through tutorials.

* Adds xrefs to existing docs for concepts like generation, write index,
  and append-only.
James Rodewig 2020-06-11 11:32:09 -04:00 committed by GitHub
parent 4e738f60f8
commit 6fc8317f07
10 changed files with 208 additions and 129 deletions


@ -0,0 +1,14 @@
[[data-stream-apis]]
== Data stream APIs
The following APIs are available for managing data streams:

* To get information about data streams, use the <<indices-get-data-stream, get data stream API>>.
* To delete data streams, use the <<indices-delete-data-stream, delete data stream API>>.
* To manually create a data stream, use the <<indices-create-data-stream, create data stream API>>.
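
As a sketch of the manual path (the `my_logs` stream name is illustrative, and a
matching composable template must already exist), creating and then inspecting a
data stream looks like:

[source,console]
----
PUT /_data_stream/my_logs

GET /_data_stream/my_logs
----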
include::{es-repo-dir}/indices/create-data-stream.asciidoc[]
include::{es-repo-dir}/indices/get-data-stream.asciidoc[]
include::{es-repo-dir}/indices/delete-data-stream.asciidoc[]


@ -0,0 +1,142 @@
[[data-streams-overview]]
== Data streams overview
++++
<titleabbrev>Overview</titleabbrev>
++++
A data stream consists of one or more _backing indices_. Backing indices are
<<index-hidden,hidden>>, automatically-generated indices used to store a
stream's documents.
image::images/data-streams/data-streams-diagram.svg[align="center"]
The creation of a data stream requires an associated
<<indices-templates,composable template>>. This template acts as a blueprint for
the stream's backing indices. It contains:

* A name or wildcard (`*`) pattern for the data stream.
* The data stream's _timestamp field_. This field must be mapped as a
<<date,`date`>> or <<date_nanos,`date_nanos`>> field and must be
included in every document indexed to the data stream.
* The mappings and settings applied to each backing index when it's created.
The same composable template can be used to create multiple data streams.
See <<set-up-a-data-stream>>.
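
For example, a composable template that could back a hypothetical
`web_server_logs` stream might look like the following sketch (the template
name, wildcard pattern, and stream name are illustrative):

[source,console]
----
PUT /_index_template/web_server_logs_template
{
  "index_patterns": ["web_server_logs*"],
  "data_stream": {
    "timestamp_field": "@timestamp"
  }
}
----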
[discrete]
[[data-streams-generation]]
=== Generation
Each data stream tracks its _generation_: a six-digit, zero-padded integer
that acts as a cumulative count of the data stream's backing indices. This count
includes any deleted indices for the stream. The generation is incremented
whenever a new backing index is added to the stream.
When a backing index is created, the index is named using the following
convention:
[source,text]
----
.ds-<data-stream>-<generation>
----
.*Example*
[%collapsible]
====
The `web_server_logs` data stream has a generation of `34`. The most recently
created backing index for this data stream is named
`.ds-web_server_logs-000034`.
====
Because the generation increments with each new backing index, backing indices
with a higher generation contain more recent data. Backing indices with a lower
generation contain older data.
A backing index's name can change after its creation due to a
<<indices-shrink-index,shrink>>, <<snapshots-restore-snapshot,restore>>, or
other operations.
[discrete]
[[data-stream-write-index]]
=== Write index
When a read request is sent to a data stream, it routes the request to all its
backing indices. For example, a search request sent to a data stream would query
all its backing indices.
image::images/data-streams/data-streams-search-request.svg[align="center"]
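
For instance, a search request sent to the stream itself (stream and field
names are illustrative) runs against every backing index:

[source,console]
----
GET /web_server_logs/_search
{
  "query": {
    "match": { "message": "login failed" }
  }
}
----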
However, the most recently created backing index is the data stream's only
_write index_. The data stream routes all indexing requests for new documents to
this index.
image::images/data-streams/data-streams-index-request.svg[align="center"]
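
For example, the following indexing request targets the stream by name (names
and values are illustrative); the stream routes the document to its current
write index. `@timestamp` is the stream's required timestamp field:

[source,console]
----
POST /web_server_logs/_doc
{
  "@timestamp": "2050-11-15T14:12:12",
  "message": "login attempt failed"
}
----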
You cannot add new documents to a stream's other backing indices, even by
sending requests directly to the index. This means you cannot submit the
following requests directly to any backing index except the write index:

* An <<docs-index_,Index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `create`. The `op_type` parameter
defaults to `create` when adding new documents.
* A <<docs-bulk,Bulk API>> request using a `create` action.
Because it's the only index capable of ingesting new documents, you cannot
perform operations on a write index that might hinder indexing. These
prohibited operations include:

* <<indices-close,Closing the write index>>
* <<indices-delete-index,Deleting the write index>>
* <<freeze-index-api,Freezing the write index>>
* <<indices-shrink-index,Shrinking the write index>>
[discrete]
[[data-streams-rollover]]
=== Rollover
When a data stream is created, one backing index is automatically created.
Because this single index is also the most recently created backing index, it
acts as the stream's write index.
A <<indices-rollover-index,rollover>> creates a new backing index for a data
stream. This new backing index becomes the stream's write index, replacing
the current one, and increments the stream's generation.
In most cases, we recommend using <<index-lifecycle-management,{ilm}
({ilm-init})>> to automate rollovers for data streams. This lets you
automatically roll over the current write index when it meets specified
criteria, such as a maximum age or size.
However, you can also use the <<indices-rollover-index,rollover API>> to
manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
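
As a sketch, a manual rollover is a single request addressed to the stream
(stream name illustrative). On success, a new backing index becomes the
stream's write index and the generation increments:

[source,console]
----
POST /web_server_logs/_rollover
----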
[discrete]
[[data-streams-append-only]]
=== Append-only
For most time-series use cases, existing data is rarely, if ever, updated.
Because of this, data streams are designed to be append-only. This means you can
send indexing requests for new documents directly to a data stream. However, you
cannot send update or deletion requests for existing documents to a data stream.
To update or delete specific documents in a data stream, submit one of the
following requests to the backing index containing the document:

* An <<docs-index_,Index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `index`.
These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
and `if_primary_term`>> arguments.
* A <<docs-bulk,Bulk API>> request using the `delete`, `index`, or `update`
action. If the action type is `index`, the action must include valid
<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
arguments.
* A <<docs-delete,Delete API>> request.
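
For example, to update a document in place you would target its backing index
directly, supplying the concurrency-control values (the index name, document
ID, and sequence numbers here are all illustrative):

[source,console]
----
PUT /.ds-web_server_logs-000034/_doc/W0tpsmIBdwcYyG50zbta?if_seq_no=0&if_primary_term=1
{
  "@timestamp": "2050-11-15T14:12:12",
  "message": "login attempt failed (corrected)"
}
----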
TIP: If you need to frequently update or delete existing documents across
multiple indices, we recommend using an <<indices-add-alias,index alias>> and
<<indices-templates,index template>> instead of a data stream. You can still
use <<index-lifecycle-management,{ilm-init}>> to manage the indices.


@ -1,130 +1,60 @@
[[data-streams]]
= Data streams
++++
<titleabbrev>Data streams</titleabbrev>
++++
[partintro]
--
A _data stream_ is a convenient, scalable way to ingest, search, and manage
continuously generated time-series data.
Time-series data, such as logs, tends to grow over time. While storing an entire
time series in a single {es} index is simpler, it is often more efficient and
cost-effective to store large volumes of data across multiple, time-based
indices. Multiple indices let you move indices containing older, less frequently
queried data to less expensive hardware and delete indices when they're no
longer needed, reducing overhead and storage costs.
A data stream is designed to give you the best of both worlds:

* The simplicity of a single, named resource you can use for requests
* The storage, scalability, and cost-saving benefits of multiple indices
You can submit indexing and search requests directly to a data stream. The
stream automatically routes the requests to a collection of hidden,
auto-generated indices that store the stream's data.
You can use a <<indices-templates,composable template>> and
<<index-lifecycle-management,{ilm} ({ilm-init})>> to automate the management of
these hidden indices. You can use {ilm-init} to spin up new indices, allocate
indices to different hardware, delete old indices, and take other automatic
actions based on age or size criteria you set. This lets you seamlessly scale
your data storage based on your budget, performance, resiliency, and retention
needs.
[discrete]
[[when-to-use-data-streams]]
== When to use data streams
We recommend using data streams if you:

* Use {es} to ingest, search, and manage large volumes of time-series data
* Want to scale and reduce costs by using {ilm-init} to automate the management
of your indices
* Index large volumes of time-series data in {es} but rarely delete or update
individual documents
[discrete]
[[data-streams-toc]]
== In this section
* <<data-streams-overview>>
* <<set-up-a-data-stream>>
* <<use-a-data-stream>>
--
include::data-streams-overview.asciidoc[]
include::set-up-a-data-stream.asciidoc[]
include::use-a-data-stream.asciidoc[]


@ -22,11 +22,11 @@ TIP: Data streams work well with most common log formats. While no schema is
required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
(ECS)].
* Data streams are designed to be <<data-streams-append-only,append-only>>.
While you can index new documents directly to a data stream, you cannot use a
data stream to directly update or delete individual documents. To update or
delete specific documents in a data stream, submit a <<docs-delete,delete>> or
<<docs-update,update>> API request to the backing index containing the document.
[discrete]
@ -57,8 +57,9 @@ The following <<ilm-put-lifecycle,create lifecycle policy API>> request
configures the `logs_policy` lifecycle policy.
The `logs_policy` policy uses the <<ilm-rollover,`rollover` action>> to create a
new <<data-stream-write-index,write index>> for the data stream when the current
one reaches 25GB in size. The policy also deletes backing indices 30 days after
their rollover.
[source,console]
----


@ -144,7 +144,8 @@ GET /logs/_search
=== Manually roll over a data stream
A rollover creates a new backing index for a data stream. This new backing index
becomes the stream's <<data-stream-write-index,write index>> and increments
the stream's <<data-streams-generation,generation>>.
In most cases, we recommend using <<index-lifecycle-management,{ilm-init}>> to
automate rollovers for data streams. This lets you automatically roll over the

(Diffs suppressed for three added SVG diagrams: 34 KiB, 43 KiB, and 45 KiB.)


@ -2,7 +2,7 @@
== Index APIs
Index APIs are used to manage individual indices,
index settings, aliases, mappings, and index templates.
[float]
[[index-management]]
@ -31,13 +31,6 @@ index settings, data streams, aliases, mappings, and index templates.
* <<indices-get-field-mapping>>
* <<indices-types-exists>>
[float]
[[alias-management]]
=== Alias management:
@ -165,9 +158,3 @@ include::indices/apis/unfreeze.asciidoc[]
include::indices/aliases.asciidoc[]
include::indices/update-settings.asciidoc[]


@ -47,6 +47,7 @@ endif::[]
include::{es-repo-dir}/cat.asciidoc[]
include::{es-repo-dir}/cluster.asciidoc[]
include::{es-repo-dir}/ccr/apis/ccr-apis.asciidoc[]
include::{es-repo-dir}/data-streams/data-stream-apis.asciidoc[]
include::{es-repo-dir}/docs.asciidoc[]
include::{es-repo-dir}/ingest/apis/enrich/index.asciidoc[]
include::{es-repo-dir}/graph/explore.asciidoc[]