OpenSearch/docs/reference/data-streams/data-streams-overview.asciidoc

148 lines
5.4 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[data-streams-overview]]
== Data streams overview
++++
<titleabbrev>Overview</titleabbrev>
++++
A data stream consists of one or more _backing indices_. Backing indices are
<<index-hidden,hidden>>, auto-generated indices used to store a stream's
documents.
image::images/data-streams/data-streams-diagram.svg[align="center"]
The creation of a data stream requires a matching
<<indices-templates,index template>>. This template acts as a blueprint for
the stream's backing indices. It contains:
* A name or wildcard (`*`) pattern for the data stream.
* The data stream's _timestamp field_. This field must be mapped as a
<<date,`date`>> or <<date_nanos,`date_nanos`>> field datatype and must be
included in every document indexed to the data stream.
* The mappings and settings applied to each backing index when it's created.
The same index template can be used to create multiple data streams.
See <<set-up-a-data-stream>>.
[discrete]
[[data-streams-generation]]
=== Generation
Each data stream tracks its _generation_: a six-digit, zero-padded integer
that acts as a cumulative count of the data stream's backing indices. This count
includes any deleted indices for the stream. The generation is incremented
whenever a new backing index is added to the stream.
When a backing index is created, the index is named using the following
convention:
[source,text]
----
.ds-<data-stream>-<generation>
----
.*Example*
[%collapsible]
====
The `web_server_logs` data stream has a generation of `34`. The most recently
created backing index for this data stream is named
`.ds-web_server_logs-000034`.
====
Because the generation increments with each new backing index, backing indices
with a higher generation contain more recent data. Backing indices with a lower
generation contain older data.
A backing index's name can change after its creation due to a
<<indices-shrink-index,shrink>>, <<snapshots-restore-snapshot,restore>>, or
other operations.
[discrete]
[[data-stream-write-index]]
=== Write index
When a read request is sent to a data stream, it routes the request to all its
backing indices. For example, a search request sent to a data stream would query
all its backing indices.
image::images/data-streams/data-streams-search-request.svg[align="center"]
However, the most recently created backing index is the data streams only
_write index_. The data stream routes all indexing requests for new documents to
this index.
image::images/data-streams/data-streams-index-request.svg[align="center"]
You cannot add new documents to a stream's other backing indices, even by
sending requests directly to the index. This means you cannot submit the
following requests directly to any backing index except the write index:
* An <<docs-index_,index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `create`. The `op_type` parameter
defaults to `create` when adding new documents.
* A <<docs-bulk,bulk API>> request using a `create` action
Because it's the only index capable of ingesting new documents, you cannot
perform operations on a write index that might hinder indexing. These
prohibited operations include:
* <<indices-clone-index,Clone>>
* <<indices-close,Close>>
* <<indices-delete-index,Delete>>
* <<freeze-index-api,Freeze>>
* <<indices-shrink-index,Shrink>>
* <<indices-split-index,Split>>
[discrete]
[[data-streams-rollover]]
=== Rollover
When a data stream is created, one backing index is automatically created.
Because this single index is also the most recently created backing index, it
acts as the stream's write index.
A <<indices-rollover-index,rollover>> creates a new backing index for a data
stream. This new backing index becomes the stream's write index, replacing
the current one, and increments the stream's generation.
In most cases, we recommend using <<index-lifecycle-management,{ilm}
({ilm-init})>> to automate rollovers for data streams. This lets you
automatically roll over the current write index when it meets specified
criteria, such as a maximum age or size.
However, you can also use the <<indices-rollover-index,rollover API>> to
manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
[discrete]
[[data-streams-append-only]]
=== Append-only
For most time-series use cases, existing data is rarely, if ever, updated.
Because of this, data streams are designed to be append-only.
You can send <<add-documents-to-a-data-stream,indexing requests for new
documents>> directly to a data stream. However, you cannot send the following
requests for existing documents directly to a data stream:
* An <<docs-index_,index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
defaults to `index` for existing documents.
* A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
action.
* A <<docs-delete,delete API>> request
Instead, you can use the <<docs-update-by-query,update by query>> and
<<docs-delete-by-query,delete by query>> APIs to update or delete existing
documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
Alternatively, you can update or delete a document by submitting requests to the
backing index containing the document. See
<<update-delete-docs-in-a-backing-index>>.
TIP: If you frequently update or delete existing documents,
we recommend using an <<indices-add-alias,index alias>> and
<<indices-templates,index template>> instead of a data stream. You can still
use <<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.