[DOCS] Tighten data streams copy (#64085) (#64111)

James Rodewig 2020-10-24 14:39:13 -04:00, committed by GitHub
parent f19f170811
commit 361fa021fa
4 changed files with 188 additions and 809 deletions


@ -471,7 +471,7 @@ PUT /_index_template/new-data-stream-template
create the new data stream. The name of the data stream must match the index
pattern defined in the new template's `index_patterns` property.
+
We do not recommend <<create-a-data-stream,indexing new data
to create this data stream>>. Later, you will reindex older data from an
existing data stream into this new stream. This could result in one or more
backing indices that contain a mix of new and old data.


@ -5,111 +5,49 @@
<titleabbrev>Data streams</titleabbrev>
++++
A {ref}/data-streams.html[data stream] lets you store append-only time series
data across multiple indices while giving you a single named resource for
requests. Data streams are well-suited for logs, events, metrics, and other
continuously generated data.
You can submit indexing and search requests directly to a data stream. The
stream automatically routes the request to backing indices that store the
stream's data. You can use <<index-lifecycle-management,{ilm} ({ilm-init})>> to
automate the management of these backing indices. For example, you can use
{ilm-init} to automatically move older backing indices to less expensive
hardware and delete unneeded indices. {ilm-init} can help you reduce costs and
overhead as your data grows.
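For example, the following sketch of such a policy uses the `rollover` action
to create a new write index when the current one reaches 25GB, then deletes
backing indices 30 days after their rollover. It mirrors the
`my-data-stream-policy` example in <<set-up-a-data-stream>>; the exact
thresholds are illustrative:

[source,console]
----
PUT /_ilm/policy/my-data-stream-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": { }
        }
      }
    }
  }
}
----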
[discrete]
[[backing-indices]]
== Backing indices
A data stream consists of one or more <<index-hidden,hidden>>, auto-generated
backing indices.
image::images/data-streams/data-streams-diagram.svg[align="center"]
To configure its backing indices, each data stream requires a matching
<<indices-templates,index template>>. This template contains:
* An index pattern matching the stream's name.
* Mappings and settings for the stream's backing indices.
Every document indexed to a data stream must contain a `@timestamp` field,
mapped as a <<date,`date`>> or <<date_nanos,`date_nanos`>> field type. If the
index template doesn't specify a mapping for the `@timestamp` field, {es} maps
`@timestamp` as a `date` field with default options.
The same index template can be used for multiple data streams. You cannot
delete an index template in use by a data stream.
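For reference, a minimal template with this structure might look like the
following sketch, which mirrors the `my-data-stream-template` example in
<<set-up-a-data-stream>>. The empty `data_stream` object marks the template as
used exclusively for data streams, and the {ilm-init} setting is optional:

[source,console]
----
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  }
}
----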
[discrete]
[[data-stream-read-requests]]
== Read requests
When you submit a read request to a data stream, the stream routes the request
to all its backing indices.
image::images/data-streams/data-streams-search-request.svg[align="center"]
@ -117,18 +55,16 @@ image::images/data-streams/data-streams-search-request.svg[align="center"]
[[data-stream-write-index]]
== Write index
The most recently created backing index is the data stream's write index.
The stream adds new documents to this index only.
image::images/data-streams/data-streams-index-request.svg[align="center"]
You cannot add new documents to other backing indices, even by sending requests
directly to the index.
You also cannot perform operations on a write index that may hinder indexing,
such as:
* <<indices-clone-index,Clone>>
* <<indices-close,Close>>
@ -141,45 +77,57 @@ prohibited operations include:
[[data-streams-rollover]]
== Rollover
When you create a data stream, {es} automatically creates a backing index for
the stream. This index also acts as the stream's first write index. A
<<indices-rollover-index,rollover>> creates a new backing index that becomes the
stream's new write index.
We recommend using <<index-lifecycle-management,{ilm-init}>> to automatically
roll over data streams when the write index reaches a specified age or size.
If needed, you can also <<manually-roll-over-a-data-stream,manually roll over>>
a data stream.
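A manual rollover is a single request. For example, the following request,
covered in <<manually-roll-over-a-data-stream>>, rolls over `my-data-stream`:

[source,console]
----
POST /my-data-stream/_rollover/
----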
[discrete]
[[data-streams-generation]]
== Generation
Each data stream tracks its generation: a six-digit, zero-padded integer that
acts as a cumulative count of the stream's rollovers, starting at `000001`.
When a backing index is created, the index is named using the following
convention:
[source,text]
----
.ds-<data-stream>-<generation>
----
Backing indices with a higher generation contain more recent data. For example,
the `web-server-logs` data stream has a generation of `34`. The stream's most
recent backing index is named `.ds-web-server-logs-000034`.
Some operations, such as a <<indices-shrink-index,shrink>> or
<<snapshots-restore-snapshot,restore>>, can change a backing index's name.
These name changes do not remove a backing index from its data stream.
[discrete]
[[data-streams-append-only]]
== Append-only
Data streams are designed for use cases where existing data is rarely,
if ever, updated. You cannot send update or deletion requests for existing
documents directly to a data stream. Instead, use the
<<update-docs-in-a-data-stream-by-query,update by query>> and
<<delete-docs-in-a-data-stream-by-query,delete by query>> APIs.
If needed, you can <<update-delete-docs-in-a-backing-index,update or delete
documents>> by submitting requests directly to the document's backing index.
TIP: If you frequently update or delete existing documents, use an
<<indices-add-alias,index alias>> and <<indices-templates,index template>>
instead of a data stream. You can still use
<<index-lifecycle-management,{ilm-init}>> to manage indices for the alias.
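As a rough sketch of that alternative, the following hypothetical setup
creates a template for a `my-logs-*` pattern and bootstraps a first index
behind a `my-logs` write alias. The `index.lifecycle.rollover_alias` setting
tells {ilm-init} which alias to roll over; all names here are illustrative:

[source,console]
----
PUT /_index_template/my-logs-template
{
  "index_patterns": [ "my-logs-*" ],
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy",
      "index.lifecycle.rollover_alias": "my-logs"
    }
  }
}

PUT /my-logs-000001
{
  "aliases": {
    "my-logs": {
      "is_write_index": true
    }
  }
}
----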
include::set-up-a-data-stream.asciidoc[]
include::use-a-data-stream.asciidoc[]


@ -4,69 +4,29 @@
To set up a data stream, follow these steps:
. <<configure-a-data-stream-ilm-policy>>.
. <<create-a-data-stream-template>>.
. <<create-a-data-stream>>.
. <<secure-a-data-stream>>.
After you set up a data stream, you can <<use-a-data-stream, use the data
stream>> for indexing, searches, and other supported operations.
If you no longer need it, you can <<delete-a-data-stream,delete a data stream>>
and its backing indices.
[discrete]
[[configure-a-data-stream-ilm-policy]]
=== Optional: Configure an {ilm-init} lifecycle policy
While optional, we recommend you configure an <<set-up-lifecycle-policy,{ilm}
({ilm-init}) policy>> to automate the management of your data stream's backing
indices.
In {kib}, open the menu and go to *Stack Management > Index Lifecycle Policies*.
Click *Index Lifecycle Policies*.
[role="screenshot"]
image::images/ilm/create-policy.png[Index Lifecycle Policies page]
[%collapsible]
.API example
====
Use the <<ilm-put-lifecycle,create lifecycle policy API>> to configure a
policy. The following request creates the `my-data-stream-policy` lifecycle
policy. The policy uses the <<ilm-rollover,`rollover` action>> to create a new
<<data-stream-write-index,write index>> for the data stream when the current
one reaches 25GB in size. The policy also deletes backing indices 30 days
after their rollover:
[source,console]
----
@ -91,73 +51,59 @@ PUT /_ilm/policy/my-data-stream-policy
}
}
----
====
[discrete]
[[create-a-data-stream-template]]
=== Create an index template
. In {kib}, open the menu and go to *Stack Management > Index Management*.
. In the *Index Templates* tab, click *Create template*.
. In the Create template wizard, use the *Data stream* toggle to indicate the
template is used for data streams.
. Use the wizard to finish defining your template. Specify:
* One or more index patterns that match the data stream's name.
* Mappings and settings for the stream's backing indices.
* A priority for the index template.
+
[IMPORTANT]
====
{es} has built-in index templates for the `metrics-*-*`, `logs-*-*`, and
`synthetics-*-*` index patterns, each with a priority of `100`.
{ingest-guide}/ingest-management-overview.html[{agent}] uses these templates to
create data streams.
If you use {agent}, assign your index templates a priority lower than `100` to
avoid overriding the built-in templates. Otherwise, use a non-overlapping index
pattern or assign templates with an overlapping pattern a `priority` higher than
`100`.
For example, if you don't use {agent} and want to create a template for the
`logs-*` index pattern, assign your template a priority of `200`. This ensures
your template is applied instead of the built-in template for `logs-*-*`.
====
Every document indexed to a data stream must have a `@timestamp` field, mapped
as a <<date,`date`>> or <<date_nanos,`date_nanos`>> field data type by the
stream's index template. This mapping can include other
<<mapping-params,mapping parameters>>, such as <<mapping-date-format,`format`>>.
If the index template doesn't specify a mapping for the `@timestamp` field,
{es} maps `@timestamp` as a `date` field with default options.
If using {ilm-init}, specify your lifecycle policy in the `index.lifecycle.name`
setting.
TIP: Carefully consider your template's mappings and settings. Later changes may
require reindexing. See <<data-streams-change-mappings-and-settings>>.
[role="screenshot"]
image::images/data-streams/create-index-template.png[Create template page]
[%collapsible]
.API example
====
Use the <<indices-put-template,put index template API>> to create an index
template. The template must include an empty `data_stream` object, indicating
it's used for data streams. Because the following request specifies no mapping
for the `@timestamp` field, `@timestamp` uses the `date` field data type by
default:
[source,console]
----
@ -174,64 +120,15 @@ PUT /_index_template/my-data-stream-template
}
----
// TEST[continued]
Alternatively, the following template maps `@timestamp` as a `date_nanos` field.
[source,console]
----
PUT /_index_template/my-data-stream-template
{
"index_patterns": [ "my-data-stream*" ],
"data_stream": { },
"priority": 200,
"template": {
"mappings": {
"properties": {
"@timestamp": { "type": "date_nanos" } <1>
}
},
"settings": {
"index.lifecycle.name": "my-data-stream-policy"
}
}
}
----
// TEST[continued]
<1> Maps `@timestamp` as a `date_nanos` field. You can include other supported
mapping parameters in this field mapping.
NOTE: You cannot delete an index template that's in use by a data stream.
This would prevent the data stream from creating new backing indices.
====
[discrete]
[[create-a-data-stream]]
=== Create the data stream
To automatically create the data stream, submit an
<<add-documents-to-a-data-stream,indexing request>> to the stream. The stream's
name must match one of your template's index patterns.
[source,console]
----
@ -246,41 +143,9 @@ POST /my-data-stream/_doc/
----
// TEST[continued]
The API returns the following response. Note the `_index` property contains
`.ds-my-data-stream-000001`, indicating the document was indexed to the write
index of the new data stream.
[source,console-result]
----
{
"_index": ".ds-my-data-stream-000001",
"_id": "qecQmXIBT4jB8tq1nG0j",
"_type": "_doc",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
----
// TESTRESPONSE[s/"_id": "qecQmXIBT4jB8tq1nG0j"/"_id": $body._id/]
You can also use the <<indices-create-data-stream,create data stream API>> to
manually create the data stream. The stream's name must match one of your
template's index patterns.
[source,console]
----
@ -288,30 +153,28 @@ PUT /_data_stream/my-data-stream-alt
----
// TEST[continued]
[discrete]
[[secure-a-data-stream]]
=== Secure the data stream
To control access to the data stream and its
data, use <<data-stream-privileges,{es}'s {security-features}>>.
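For example, the following sketch uses the create role API to grant a
hypothetical `my_data_stream_role` role `read` and `write` privileges on
`my-data-stream`. Privileges granted on a data stream also apply to its
backing indices; see <<data-stream-privileges>>:

[source,console]
----
POST /_security/role/my_data_stream_role
{
  "indices": [
    {
      "names": [ "my-data-stream" ],
      "privileges": [ "read", "write" ]
    }
  ]
}
----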
[discrete]
[[get-info-about-a-data-stream]]
=== Get information about a data stream
In {kib}, open the menu and go to *Stack Management > Index Management*. In the
*Data Streams* tab, click the data stream's name.
[role="screenshot"]
image::images/data-streams/data-streams-list.png[Data Streams tab]
[%collapsible]
.API example
====
Use the <<indices-get-data-stream,get data stream API>> to retrieve the
following information about one or more data streams:

* The current backing indices, returned as an array. The last item in the
array contains information about the stream's current write index.
* The current generation
* The data stream's health status
* The index template used to create the stream's backing indices
* The current {ilm-init} lifecycle policy in the stream's matching index
template

The following request retrieves information about `my-data-stream`:
////
[source,console]
@ -326,75 +189,31 @@ POST /my-data-stream/_rollover/
GET /_data_stream/my-data-stream
----
// TEST[continued]
The API returns the following response. Note the `indices` property contains an
array of the stream's current backing indices. The last item in this array
contains information about the stream's write index, `.ds-my-data-stream-000002`.
[source,console-result]
----
{
"data_streams": [
{
"name": "my-data-stream",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-my-data-stream-000001",
"index_uuid": "krR78LfvTOe6gr5dj2_1xQ"
},
{
"index_name": ".ds-my-data-stream-000002", <1>
"index_uuid": "C6LWyNJHQWmA08aQGvqRkA"
}
],
"generation": 2,
"status": "GREEN",
"template": "my-data-stream-template",
"ilm_policy": "my-data-stream-policy"
}
]
}
----
// TESTRESPONSE[s/"index_uuid": "krR78LfvTOe6gr5dj2_1xQ"/"index_uuid": $body.data_streams.0.indices.0.index_uuid/]
// TESTRESPONSE[s/"index_uuid": "C6LWyNJHQWmA08aQGvqRkA"/"index_uuid": $body.data_streams.0.indices.1.index_uuid/]
// TESTRESPONSE[s/"status": "GREEN"/"status": "YELLOW"/]
<1> Last item in the `indices` array for `my-data-stream`. This
item contains information about the stream's current write index,
`.ds-my-data-stream-000002`.
====
[discrete]
[[delete-a-data-stream]]
=== Delete a data stream
To delete a data stream and its backing indices, open the {kib} menu and go to
*Stack Management > Index Management*. In the *Data Streams* tab, click the
trash can icon.
[role="screenshot"]
image::images/data-streams/data-streams-list.png[Data Streams tab]
[%collapsible]
.API example
====
Use the <<indices-delete-data-stream,delete data stream API>> to delete a data
stream and its backing indices:
[source,console]
----
DELETE /_data_stream/my-data-stream
----
// TEST[continued]
====
////
[source,console]


@ -56,35 +56,8 @@ DELETE /_index_template/*
[[add-documents-to-a-data-stream]]
=== Add documents to a data stream
To add an individual document, use the <<docs-index_,index API>>.
<<ingest,Ingest pipelines>> are supported.
[source,console]
----
@ -98,23 +71,13 @@ POST /my-data-stream/_doc/
}
----
You cannot add new documents to a data stream using the index API's `PUT
/<target>/_doc/<_id>` request format. To specify a document ID, use the `PUT
/<target>/_create/<_id>` format instead. Only an
<<docs-index-api-op_type,`op_type`>> of `create` is supported.
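For example, the following request, with a hypothetical document ID of
`my-doc-id`, uses the `_create` format to index a document with an explicit
ID:

[source,console]
----
PUT /my-data-stream/_create/my-doc-id
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  },
  "message": "Login successful"
}
----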
To add multiple documents with a single request, use the <<docs-bulk,bulk API>>.
Only `create` actions are supported.
[source,console]
----
@ -127,65 +90,6 @@ PUT /my-data-stream/_bulk?refresh
{ "@timestamp": "2020-12-09T11:07:08.000Z", "user": { "id": "l7gk7f82" }, "message": "Logout successful" }
----
[discrete]
[[data-streams-index-with-an-ingest-pipeline]]
==== Index with an ingest pipeline
You can use an <<ingest,ingest pipeline>> with an indexing request to
pre-process data before it's indexed to a data stream.
The following <<put-pipeline-api,put pipeline API>> request creates the
`lowercase_message_field` ingest pipeline. The pipeline uses the
<<lowercase-processor,`lowercase` ingest processor>> to change the `message`
field value to lowercase before indexing.
[source,console]
----
PUT /_ingest/pipeline/lowercase_message_field
{
"description" : "Lowercases the message field value",
"processors" : [
{
"lowercase" : {
"field" : "message"
}
}
]
}
----
// TEST[continued]
The following index API request adds a new document to `my-data-stream`.
The request includes a `?pipeline=lowercase_message_field` query parameter.
This parameter indicates {es} should use the `lowercase_message_field` pipeline
to pre-process the document before indexing it.
During pre-processing, the pipeline changes the letter case of the document's
`message` field value from `LOGIN Successful` to `login successful`.
[source,console]
----
POST /my-data-stream/_doc?pipeline=lowercase_message_field
{
"@timestamp": "2020-12-08T11:12:01.000Z",
"user": {
"id": "I1YBEOxJ"
},
"message": "LOGIN Successful"
}
----
// TEST[continued]
////
[source,console]
----
DELETE /_ingest/pipeline/lowercase_message_field
----
// TEST[continued]
////
[discrete]
[[search-a-data-stream]]
=== Search a data stream
@ -198,157 +102,24 @@ The following search APIs support data streams:
* <<search-field-caps, Field capabilities>>
* <<eql-search-api, EQL search>>
The following <<search-search,search API>> request searches `my-data-stream`
for documents with a timestamp between yesterday and today that also have a
`message` value of `login successful`.
[source,console]
----
GET /my-data-stream/_search
{
"query": {
"bool": {
"must": {
"range": {
"@timestamp": {
"gte": "now-1d/d",
"lt": "now/d"
}
}
},
"should": {
"match": {
"message": "login successful"
}
}
}
}
}
----
You can use a comma-separated list to search
multiple data streams, indices, and index aliases in the same request.
The following request searches `my-data-stream` and `my-data-stream-alt`,
which are specified as a comma-separated list in the request path.
[source,console]
----
GET /my-data-stream,my-data-stream-alt/_search
{
"query": {
"match": {
"user.id": "8a4f500d"
}
}
}
----
Index patterns are also supported.
The following request uses the `my-data-stream*` index pattern to search any
data stream, index, or index alias beginning with `my-data-stream`.
[source,console]
----
GET /my-data-stream*/_search
{
"query": {
"match": {
"user.id": "vlb44hny"
}
}
}
----
The following search request omits a target in the request path. The request
searches all data streams and indices in the cluster.
[source,console]
----
GET /_search
{
"query": {
"match": {
"user.id": "l7gk7f82"
}
}
}
----
[discrete]
[[get-stats-for-a-data-stream]]
=== Get statistics for a data stream
Use the <<data-stream-stats-api,data stream stats API>> to get statistics for
one or more data streams. These statistics include:

* A count of the stream's backing indices
* The total store size of all shards for the stream's backing indices
* The highest `@timestamp` value for the stream

The following request retrieves statistics for `my-data-stream`:
[source,console]
----
GET /_data_stream/my-data-stream/_stats?human=true
----
The API returns the following response.
[source,console-result]
----
{
"_shards": {
"total": 6,
"successful": 3,
"failed": 0
},
"data_stream_count": 1,
"backing_indices": 3,
"total_store_size": "624b",
"total_store_size_bytes": 624,
"data_streams": [
{
"data_stream": "my-data-stream",
"backing_indices": 3,
"store_size": "624b",
"store_size_bytes": 624,
"maximum_timestamp": 1607339167000
}
]
}
----
// TESTRESPONSE[s/"total_store_size": "624b"/"total_store_size": $body.total_store_size/]
// TESTRESPONSE[s/"total_store_size_bytes": 624/"total_store_size_bytes": $body.total_store_size_bytes/]
// TESTRESPONSE[s/"store_size": "624b"/"store_size": $body.data_streams.0.store_size/]
// TESTRESPONSE[s/"store_size_bytes": 624/"store_size_bytes": $body.data_streams.0.store_size_bytes/]
[discrete]
[[manually-roll-over-a-data-stream]]
=== Manually roll over a data stream
A manual rollover can be useful if you want to
<<data-streams-change-mappings-and-settings,apply mapping or setting changes>>
to the stream's write index after updating a data stream's template.

Use the <<indices-rollover-index,rollover API>> to manually
<<data-streams-rollover,roll over>> a data stream:
[source,console]
----
@ -359,112 +130,35 @@ POST /my-data-stream/_rollover/
[[open-closed-backing-indices]]
=== Open closed backing indices
A data stream's backing indices may be <<indices-close,closed>> as part of the
stream's {ilm-init} lifecycle or another workflow. You cannot search a closed
backing index, even by searching its data stream. You also cannot
<<update-docs-in-a-data-stream-by-query,update>> or
<<delete-docs-in-a-data-stream-by-query,delete>> documents in a closed index.
////
[source,console]
----
POST /.ds-my-data-stream-000001,.ds-my-data-stream-000002/_close/
----
////
To re-open a closed backing index, submit an <<indices-open-close,open
index API request>> directly to the index:
[source,console]
----
POST /.ds-my-data-stream-000001/_open/
----
// TEST[continued]
To re-open all closed backing indices for a data stream, submit an open index
API request to the stream:
[source,console]
----
POST /my-data-stream/_open/
----
// TEST[continued]
To verify `.ds-my-data-stream-000001` and `.ds-my-data-stream-000002` were
re-opened, use the <<cat-indices,cat indices API>>:
[source,console]
----
GET /_cat/indices/my-data-stream?v&s=index&h=index,status
----
// TEST[continued]
The API returns the following response.
[source,txt]
----
index status
.ds-my-data-stream-000001 open
.ds-my-data-stream-000002 open
.ds-my-data-stream-000003 open
----
// TESTRESPONSE[non_json]
[discrete]
[[reindex-with-a-data-stream]]
=== Reindex with a data stream
Use the <<docs-reindex,reindex API>> to copy documents from an existing index,
index alias, or data stream to a data stream. The source and destination must
differ; you cannot reindex a data stream into itself.

Because data streams are <<data-streams-append-only,append-only>>, a reindex
into a data stream must use an `op_type` of `create`. A reindex cannot update
existing documents in a data stream.

A reindex can be used to:

* Convert an existing index alias and collection of time-based indices into a
data stream.
* Apply a new or updated <<create-a-data-stream-template,index template>> by
reindexing an existing data stream into a new one. See
<<data-streams-use-reindex-to-change-mappings-settings>>.

TIP: If you only want to update the mappings or settings of a data stream's
write index, we recommend you update the <<create-a-data-stream-template,data
stream's template>> and perform a <<manually-roll-over-a-data-stream,rollover>>.

The following reindex request copies documents from the `archive` index alias
to `my-data-stream`. Because the destination is a data stream, the request's
`op_type` is `create`.
////
[source,console]
@ -504,48 +198,12 @@ POST /_reindex
----
// TEST[continued]
You can also reindex documents from a data stream to an index, index
alias, or data stream.
The following reindex request copies documents from `my-data-stream`
to the existing `archive` index alias. Because the destination is not a
data stream, the `op_type` does not need to be specified.
[source,console]
----
POST /_reindex
{
"source": {
"index": "my-data-stream"
},
"dest": {
"index": "archive"
}
}
----
// TEST[continued]
[discrete]
[[update-docs-in-a-data-stream-by-query]]
=== Update documents in a data stream by query
You cannot send indexing or update requests for existing documents directly to
a data stream. Instead, use the <<docs-update-by-query,update by query API>> to
update documents in a data stream that match a provided query.

The following update by query request updates documents in `my-data-stream`
with a `user.id` of `l7gk7f82`. The request uses a
<<modules-scripting-using,script>> to assign matching documents a new `user.id`
value of `XgdX0NoX`:
[source,console]
----
@ -569,18 +227,8 @@ POST /my-data-stream/_update_by_query
[[delete-docs-in-a-data-stream-by-query]]
=== Delete documents in a data stream by query
You cannot send document deletion requests directly to a data stream. Instead,
use the <<docs-delete-by-query,delete by query API>> to delete documents in a
data stream that match a provided query.

The following delete by query request deletes documents in `my-data-stream`
with a `user.id` of `vlb44hny`:
[source,console]
----
@ -598,26 +246,15 @@ POST /my-data-stream/_delete_by_query
[[update-delete-docs-in-a-backing-index]]
=== Update or delete documents in a backing index
If needed, you can update or delete documents in a data stream by sending
requests to the backing index containing the document. You'll need:
* The <<mapping-id-field,document ID>>
* The name of the backing index containing the document
* If updating the document, its <<optimistic-concurrency-control,sequence number
and primary term>>
To get this information, use a <<search-a-data-stream,search request>>. The
following request retrieves documents in `my-data-stream` with a `user.id` of
`yWIumJd7`. The `"seq_no_primary_term": true` argument means the search also
returns the sequence number and primary term for any matching documents:
[source,console]
----
@ -632,8 +269,7 @@ GET /my-data-stream/_search
}
----
The API returns the following response. The `hits.hits` property contains
information for any documents matching the search:
[source,console-result]
----
@ -681,17 +317,8 @@ information for any documents matching the search.
<3> Current sequence number for the document
<4> Primary term for the document
To update the document, use an <<docs-index_,index API>> request with valid
`if_seq_no` and `if_primary_term` arguments. To prevent an accidental
overwrite, these must match the document's current sequence number and primary
term:
[source,console]
----
@ -705,32 +332,17 @@ PUT /.ds-my-data-stream-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=0&if_primary_
}
----
To delete the document, use the <<docs-delete,delete API>>. Deletion requests
don't require a sequence number or primary term:
[source,console]
----
DELETE /.ds-my-data-stream-000003/_doc/bfspvnIBr7VVZlfp2lqX
----
To delete or update multiple documents with a single request, use the
<<docs-bulk,bulk API>>'s `delete`, `index`, and `update` actions. For `index`
actions, include valid <<bulk-optimistic-concurrency-control,`if_seq_no` and
`if_primary_term`>> arguments. The following request uses an `index` action to
update an existing document in `my-data-stream`:
[source,console]
----