[7.x] [DOCS] Document delete/update by query for data streams (#58679) (#58706)

James Rodewig 2020-06-30 08:35:13 -04:00 committed by GitHub
parent a9677efb56
commit d33764583c
5 changed files with 156 additions and 54 deletions


@@ -119,26 +119,28 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.

=== Append-only

For most time-series use cases, existing data is rarely, if ever, updated.
Because of this, data streams are designed to be append-only.

You can send <<add-documents-to-a-data-stream,indexing requests for new
documents>> directly to a data stream. However, you cannot send the following
requests for existing documents directly to a data stream:

* An <<docs-index_,index API>> request with an
<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
defaults to `index` for existing documents.

* A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
action.

* A <<docs-delete,delete API>> request

Instead, you can use the <<docs-update-by-query,update by query>> and
<<docs-delete-by-query,delete by query>> APIs to update or delete existing
documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
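
For instance, a minimal by-query sketch (the data stream name and `user.id`
value here are placeholders, not from this page's examples):

[source,console]
----
// Hypothetical data stream name and user.id value
POST /my-data-stream/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "l7gk7f82"
    }
  }
}
----
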
Alternatively, you can update or delete a document by submitting requests to the
backing index containing the document. See
<<update-delete-docs-in-a-backing-index>>.

TIP: If you frequently update or delete existing documents,
we recommend using an <<indices-add-alias,index alias>> and


@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
(ECS)].

* Data streams are best suited for time-based,
<<data-streams-append-only,append-only>> use cases. If you frequently need to
update or delete existing documents, we recommend using an index alias and an
index template instead.

[discrete]


@@ -9,6 +9,7 @@ the following:

* <<manually-roll-over-a-data-stream>>
* <<reindex-with-a-data-stream>>
* <<update-delete-docs-in-a-data-stream>>
* <<update-delete-docs-in-a-backing-index>>

////
[source,console]
@@ -66,6 +67,10 @@ POST /logs/_doc/
----
// TEST[continued]
====

IMPORTANT: You cannot add new documents to a data stream using the index API's
`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
format instead.
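
A minimal sketch of the `_create` request format (the document ID below is a
placeholder; the `logs` data stream and field values follow this page's other
examples):

[source,console]
----
// Hypothetical document ID
PUT /logs/_create/my-doc-id-1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  }
}
----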

--

* A <<docs-bulk,bulk API>> request using the `create` action. Specify the data
@@ -348,12 +353,96 @@ POST /_reindex

[[update-delete-docs-in-a-data-stream]]
=== Update or delete documents in a data stream

You can update or delete documents in a data stream using the following
requests:

* An <<docs-update-by-query,update by query API>> request
+
.*Example*
[%collapsible]
====
The following update by query API request updates documents in the `logs` data
stream with a `user.id` of `i96BP1mA`. The request uses a
<<modules-scripting-using,script>> to assign matching documents a new `user.id`
value of `XgdX0NoX`.
////
[source,console]
----
PUT /logs/_create/2?refresh=wait_for
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "i96BP1mA"
  }
}
----
// TEST[continued]
////
[source,console]
----
POST /logs/_update_by_query
{
  "query": {
    "match": {
      "user.id": "i96BP1mA"
    }
  },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "params": {
      "new_id": "XgdX0NoX"
    }
  }
}
----
// TEST[continued]
====
* A <<docs-delete-by-query,delete by query API>> request
+
.*Example*
[%collapsible]
====
The following delete by query API request deletes documents in the `logs` data
stream with a `user.id` of `zVZMamUM`.
////
[source,console]
----
PUT /logs/_create/1?refresh=wait_for
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "zVZMamUM"
  }
}
----
// TEST[continued]
////
[source,console]
----
POST /logs/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "zVZMamUM"
    }
  }
}
----
// TEST[continued]
====

[discrete]
[[update-delete-docs-in-a-backing-index]]
=== Update or delete documents in a backing index

Alternatively, you can update or delete documents in a data stream by sending
the update or deletion request to the backing index containing the document. To
do this, you first need to get:
* The <<mapping-id-field,document ID>> * The <<mapping-id-field,document ID>>
* The name of the backing index that contains the document * The name of the backing index that contains the document
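
In most cases, a search request can retrieve this information, as in the
following sketch (it assumes the `logs` data stream and `user.id` value used in
this page's examples; `seq_no_primary_term` also returns the sequence number
and primary term needed for the index and bulk requests below):

[source,console]
----
GET /logs/_search
{
  "seq_no_primary_term": true,
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----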

@@ -429,7 +518,7 @@ information for any documents matching the search.
        "_index": ".ds-logs-000002",          <1>
        "_type": "_doc",
        "_id": "bfspvnIBr7VVZlfp2lqX",        <2>
        "_seq_no": 8,                         <3>
        "_primary_term": 1,                   <4>
        "_score": 0.2876821,
        "_source": {
@@ -445,6 +534,8 @@ information for any documents matching the search.
}
----
// TESTRESPONSE[s/"took": 20/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]

<1> Backing index containing the matching document
<2> Document ID for the document

@@ -469,7 +560,7 @@ contains a new JSON source for the document.

[source,console]
----
PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {

@@ -534,7 +625,7 @@ parameters.

[source,console]
----
PUT /_bulk?refresh
{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
{ "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
----
// TEST[continued]


@@ -47,7 +47,7 @@ POST /twitter/_delete_by_query

[[docs-delete-by-query-api-request]]
==== {api-request-title}

`POST /<target>/_delete_by_query`

[[docs-delete-by-query-api-desc]]
==== {api-description-title}

@@ -55,7 +55,7 @@ POST /twitter/_delete_by_query
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

When you submit a delete by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and deletes matching documents using
`internal` versioning. If a document changes between the time that the
snapshot is taken and the delete operation is processed, it results in a version
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
delete process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
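
For instance, a manually sliced sketch (the `twitter` index and `likes` field
follow this page's examples; the slice count is arbitrary):

[source,console]
----
POST /twitter/_delete_by_query?slices=5&refresh
{
  "query": {
    "range": {
      "likes": {
        "lt": 10
      }
    }
  }
}
----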

@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.

[[docs-delete-by-query-api-path-params]]
==== {api-path-parms-title}

`<target>`::
(Optional, string)
A comma-separated list of data streams, indices, and index aliases to search.
Wildcard (`*`) expressions are supported. To search all data streams or indices
in a cluster, omit this parameter or use `_all` or `*`.

[[docs-delete-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

`scroll`::
(Optional, <<time-units,time value>>)
Period to retain the <<scroll-search-context,search context>> for scrolling. See
<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
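
A usage sketch for the `scroll` parameter described above (the ten-minute
period and the query are illustrative only):

[source,console]
----
POST /twitter/_delete_by_query?scroll=10m
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}
----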

@@ -343,7 +348,7 @@ version conflicts.

[[docs-delete-by-query-api-example]]
==== {api-examples-title}

Delete all tweets from the `twitter` data stream or index:

[source,console]
--------------------------------------------------

@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
--------------------------------------------------
// TEST[setup:twitter]

Delete documents from multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:

Setting `slices` to `auto` will let {es} choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
there are multiple source data streams or indices, it will choose the number of slices based
on the index or backing index with the smallest number of shards.

Adding `slices` to `_delete_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being deleted.

* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]


@@ -5,7 +5,7 @@
++++

Updates documents that match the specified query.
If no query is specified, performs an update on every document in the data stream or index without
modifying the source, which is useful for picking up mapping changes.

[source,console]
@@ -44,7 +44,7 @@ POST twitter/_update_by_query?conflicts=proceed

[[docs-update-by-query-api-request]]
==== {api-request-title}

`POST /<target>/_update_by_query`

[[docs-update-by-query-api-desc]]
==== {api-description-title}
@@ -52,7 +52,7 @@ POST twitter/_update_by_query?conflicts=proceed
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

When you submit an update by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and updates matching documents using
`internal` versioning.
When the versions match, the document is updated and the version number is incremented.
@@ -75,7 +75,7 @@ Any update requests that completed successfully still stick, they are not rolled

===== Refreshing shards

Specifying the `refresh` parameter refreshes all shards once the request completes.
This is different than the update API's `refresh` parameter, which causes just the shard
that received the request to be refreshed. Unlike the update API, it does not support
`wait_for`.
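
For example, a sketch that refreshes all affected shards when the request
completes (reusing the `twitter` example index; omitting the request body
updates every document without changing its source):

[source,console]
----
POST /twitter/_update_by_query?refresh&conflicts=proceed
----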

@@ -129,12 +129,12 @@ Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
update process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
@@ -148,9 +148,11 @@ documents being reindexed and cluster resources.

[[docs-update-by-query-api-path-params]]
==== {api-path-parms-title}

`<target>`::
(Optional, string)
A comma-separated list of data streams, indices, and index aliases to search.
Wildcard (`*`) expressions are supported. To search all data streams or indices
in a cluster, omit this parameter or use `_all` or `*`.

[[docs-update-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -197,7 +199,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

`scroll`::
(Optional, <<time-units,time value>>)
Period to retain the <<scroll-search-context,search context>> for scrolling. See
<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
@@ -290,7 +295,7 @@ version conflicts.

==== {api-examples-title}

The simplest usage of `_update_by_query` just performs an update on every
document in the data stream or index without changing the source. This is useful to
<<picking-up-a-new-property,pick up a new property>> or some other online
mapping change.
@@ -313,7 +318,7 @@ POST twitter/_update_by_query?conflicts=proceed
way as the <<search-search,Search API>>. You can also use the `q`
parameter in the same way as the search API.

Update documents in multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -617,8 +622,8 @@ Which results in a sensible `total` like this one:

Setting `slices` to `auto` will let Elasticsearch choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
there are multiple source data streams or indices, it will choose the number of slices based
on the index or backing index with the smallest number of shards.

Adding `slices` to `_update_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
@@ -641,7 +646,7 @@ be larger than others. Expect larger slices to have a more even distribution.
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being updated.

* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]