[7.x] [DOCS] Document delete/update by query for data streams (#58679) (#58706)

James Rodewig authored on 2020-06-30 08:35:13 -04:00 (committed by GitHub)
parent a9677efb56
commit d33764583c
5 changed files with 156 additions and 54 deletions


@@ -119,26 +119,28 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
=== Append-only

For most time-series use cases, existing data is rarely, if ever, updated.
-Because of this, data streams are designed to be append-only. This means you can
-send indexing requests for new documents directly to a data stream. However, you
-cannot send update or deletion requests for existing documents to a data stream.
+Because of this, data streams are designed to be append-only.

-To update or delete specific documents in a data stream, submit one of the
-following requests to the backing index containing the document:
+You can send <<add-documents-to-a-data-stream,indexing requests for new
+documents>> directly to a data stream. However, you cannot send the following
+requests for existing documents directly to a data stream:

* An <<docs-index_,index API>> request with an
-<<docs-index-api-op_type,`op_type`>> of `index`.
-These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
-and `if_primary_term`>> arguments.
+<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
+defaults to `index` for existing documents.

* A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
-action. If the action type is `index`, the action must include valid
-<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
-arguments.
+action.

* A <<docs-delete,delete API>> request

-See <<update-delete-docs-in-a-data-stream>>.
+Instead, you can use the <<docs-update-by-query,update by query>> and
+<<docs-delete-by-query,delete by query>> APIs to update or delete existing
+documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
+
+Alternatively, you can update or delete a document by submitting requests to the
+backing index containing the document. See
+<<update-delete-docs-in-a-backing-index>>.

TIP: If you frequently update or delete existing documents,
we recommend using an <<indices-add-alias,index alias>> and
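
For illustration, a sketch of what the append-only restriction means in practice. The data stream name and document ID below are hypothetical, and the exact error text may vary by version; the point is that a by-ID delete addressed to the stream itself is rejected, while delete by query through the stream works:

[source,console]
----
# Hypothetical example: deleting by ID directly against a data stream fails.
DELETE /my-data-stream/_doc/bfspvnIBr7VVZlfp2lqX

# Removing matching documents through the stream works via delete by query.
POST /my-data-stream/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----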


@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
(ECS)].

-* Data streams are designed to be <<data-streams-append-only,append-only>>.
-While you can index new documents directly to a data stream, you cannot use a
-data stream to directly update or delete individual documents. To update or
-delete specific documents in a data stream, submit a <<docs-delete,delete>> or
-<<docs-update,update>> API request to the backing index containing the document.
+* Data streams are best suited for time-based,
+<<data-streams-append-only,append-only>> use cases. If you frequently need to
+update or delete existing documents, we recommend using an index alias and an
+index template instead.

[discrete]
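
A minimal sketch of the recommended alias alternative. The `my-index-000001` and `my-app` names are illustrative, not part of this commit; the idea is that writes go through the alias, and documents remain freely updatable and deletable:

[source,console]
----
# Hypothetical example: create an index whose alias accepts writes.
PUT /my-index-000001
{
  "aliases": {
    "my-app": {
      "is_write_index": true
    }
  }
}

# Unlike a data stream, the alias target allows direct deletes by ID.
DELETE /my-app/_doc/1
----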


@@ -9,6 +9,7 @@ the following:
* <<manually-roll-over-a-data-stream>>
* <<reindex-with-a-data-stream>>
* <<update-delete-docs-in-a-data-stream>>
+* <<update-delete-docs-in-a-backing-index>>

////
[source,console]
@@ -66,6 +67,10 @@ POST /logs/_doc/
----
// TEST[continued]
====
+
+IMPORTANT: You cannot add new documents to a data stream using the index API's
+`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
+format instead.
--

* A <<docs-bulk,bulk API>> request using the `create` action. Specify the data
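
To make the IMPORTANT note above concrete, a sketch with hypothetical names (`my-data-stream`, `my-doc-id`), mirroring the `_create` requests used elsewhere in this commit:

[source,console]
----
# Hypothetical example: the _create format is accepted by a data stream.
PUT /my-data-stream/_create/my-doc-id
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  }
}

# The plain _doc format with an explicit ID would be rejected:
# PUT /my-data-stream/_doc/my-doc-id
----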
@@ -348,12 +353,96 @@ POST /_reindex
[[update-delete-docs-in-a-data-stream]]
=== Update or delete documents in a data stream

-Data streams are designed to be <<data-streams-append-only,append-only>>. This
-means you cannot send update or deletion requests for existing documents to a
-data stream. However, you can send update or deletion requests to the backing
-index containing the document.
+You can update or delete documents in a data stream using the following
+requests:

-To delete or update a document in a data stream, you first need to get:
+* An <<docs-update-by-query,update by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following update by query API request updates documents in the `logs` data
+stream with a `user.id` of `i96BP1mA`. The request uses a
+<<modules-scripting-using,script>> to assign matching documents a new `user.id`
+value of `XgdX0NoX`.
+
+////
+[source,console]
+----
+PUT /logs/_create/2?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "i96BP1mA"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_update_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "i96BP1mA"
+    }
+  },
+  "script": {
+    "source": "ctx._source.user.id = params.new_id",
+    "params": {
+      "new_id": "XgdX0NoX"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+* A <<docs-delete-by-query,delete by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following delete by query API request deletes documents in the `logs` data
+stream with a `user.id` of `zVZMamUM`.
+
+////
+[source,console]
+----
+PUT /logs/_create/1?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "zVZMamUM"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_delete_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "zVZMamUM"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+[discrete]
+[[update-delete-docs-in-a-backing-index]]
+=== Update or delete documents in a backing index
+
+Alternatively, you can update or delete documents in a data stream by sending
+the update or deletion request to the backing index containing the document. To
+do this, you first need to get:

* The <<mapping-id-field,document ID>>
* The name of the backing index that contains the document
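
One way to retrieve both values is a search with `seq_no_primary_term` enabled, which also returns the sequence number and primary term needed later for concurrency control. A sketch against the `logs` data stream used in this commit's examples (the `user.id` value is illustrative):

[source,console]
----
GET /logs/_search
{
  "seq_no_primary_term": true,
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----

The hits include the backing index name (`_index`), the document ID (`_id`), and the `_seq_no`/`_primary_term` pair, as shown in the response below.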
@@ -429,7 +518,7 @@ information for any documents matching the search.
        "_index": ".ds-logs-000002",      <1>
        "_type": "_doc",
        "_id": "bfspvnIBr7VVZlfp2lqX",    <2>
-       "_seq_no": 4,                     <3>
+       "_seq_no": 8,                     <3>
        "_primary_term": 1,               <4>
        "_score": 0.2876821,
        "_source": {
@@ -445,6 +534,8 @@ information for any documents matching the search.
}
----
// TESTRESPONSE[s/"took": 20/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]

<1> Backing index containing the matching document
<2> Document ID for the document
@@ -469,7 +560,7 @@ contains a new JSON source for the document.
[source,console]
----
-PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=4&if_primary_term=1
+PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
@@ -534,7 +625,7 @@ parameters.
[source,console]
----
PUT /_bulk?refresh
-{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 4, "if_primary_term": 1 } }
+{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
{ "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
----
// TEST[continued]
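
For comparison, a bulk `delete` action against the backing index should not need the concurrency-control parameters, since a deletion cannot silently overwrite a newer document version; a sketch reusing the IDs above:

[source,console]
----
# Assumes the backing index and document ID from the example above.
PUT /_bulk?refresh
{ "delete": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX" } }
----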


@@ -47,7 +47,7 @@ POST /twitter/_delete_by_query
[[docs-delete-by-query-api-request]]
==== {api-request-title}

-`POST /<index>/_delete_by_query`
+`POST /<target>/_delete_by_query`

[[docs-delete-by-query-api-desc]]
==== {api-description-title}
@@ -55,7 +55,7 @@ POST /twitter/_delete_by_query
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

-When you submit a delete by query request, {es} gets a snapshot of the index
+When you submit a delete by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and deletes matching documents using
`internal` versioning. If a document changes between the time that the
snapshot is taken and the delete operation is processed, it results in a version
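
To count version conflicts rather than fail the whole request, pass `conflicts=proceed`; a sketch (the stream name and query are illustrative):

[source,console]
----
# Hypothetical target; conflicts=proceed tallies conflicts instead of aborting.
POST /my-data-stream/_delete_by_query?conflicts=proceed
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----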
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
delete process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
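
Manual slicing, for reference, uses a `slice` object in the request body; a sketch splitting a delete by query into two parallel parts (the target and query are illustrative):

[source,console]
----
# First of two slices; a second request with "id": 1 covers the rest.
POST /my-index-000001/_delete_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----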
@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.
[[docs-delete-by-query-api-path-params]]
==== {api-path-parms-title}

-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
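
Under the new `<target>` wording, a single request can mix data streams, indices, and wildcard patterns; an illustrative sketch (names hypothetical):

[source,console]
----
# Targets every data stream or index matching logs-* plus one named index.
POST /logs-*,my-index-000001/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----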
[[docs-delete-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
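
The `scroll` parameter accepts a time value; for example, keeping the search context alive for ten minutes (target name illustrative):

[source,console]
----
# Retains the scroll search context for 10 minutes.
POST /my-index-000001/_delete_by_query?scroll=10m
{
  "query": {
    "match_all": {}
  }
}
----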
@@ -343,7 +348,7 @@ version conflicts.
[[docs-delete-by-query-api-example]]
==== {api-examples-title}

-Delete all tweets from the `twitter` index:
+Delete all tweets from the `twitter` data stream or index:

[source,console]
--------------------------------------------------
@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
--------------------------------------------------
// TEST[setup:twitter]

-Delete documents from multiple indices:
+Delete documents from multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:
Setting `slices` to `auto` will let {es} choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.

Adding `slices` to `_delete_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
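
Letting {es} pick the slice count is a one-parameter change; a sketch against the `twitter` index used in this page's examples:

[source,console]
----
# slices=auto uses one slice per shard, up to a limit.
POST /twitter/_delete_by_query?slices=auto&refresh
{
  "query": {
    "match_all": {}
  }
}
----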
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being deleted.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]


@@ -5,7 +5,7 @@
++++

Updates documents that match the specified query.

-If no query is specified, performs an update on every document in the index without
+If no query is specified, performs an update on every document in the data stream or index without
modifying the source, which is useful for picking up mapping changes.

[source,console]
@@ -44,7 +44,7 @@ POST twitter/_update_by_query?conflicts=proceed
[[docs-update-by-query-api-request]]
==== {api-request-title}

-`POST /<index>/_update_by_query`
+`POST /<target>/_update_by_query`

[[docs-update-by-query-api-desc]]
==== {api-description-title}
@@ -52,7 +52,7 @@ POST twitter/_update_by_query?conflicts=proceed
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

-When you submit an update by query request, {es} gets a snapshot of the index
+When you submit an update by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and updates matching documents using
`internal` versioning.

When the versions match, the document is updated and the version number is incremented.
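
As with delete by query, `conflicts=proceed` tells the request to count version conflicts instead of aborting; a sketch (stream name illustrative, and the body may be omitted entirely):

[source,console]
----
# Hypothetical target; updates every document without changing the source.
POST /my-data-stream/_update_by_query?conflicts=proceed
----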
@@ -75,7 +75,7 @@ Any update requests that completed successfully still stick, they are not rolled
===== Refreshing shards

Specifying the `refresh` parameter refreshes all shards once the request completes.
-This is different than the update API’s `refresh` parameter, which causes just the shard
+This is different than the update API's `refresh` parameter, which causes just the shard
that received the request to be refreshed. Unlike the update API, it does not support
`wait_for`.
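
For instance (index name illustrative), this request refreshes all shards involved once the update by query completes:

[source,console]
----
# refresh makes the updates visible to search when the request finishes.
POST /my-index-000001/_update_by_query?refresh&conflicts=proceed
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----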
@@ -129,12 +129,12 @@ Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
update process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
@@ -148,9 +148,11 @@ documents being reindexed and cluster resources.
[[docs-update-by-query-api-path-params]]
==== {api-path-parms-title}

-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
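
As with delete by query, the `<target>` value can now name a data stream; a sketch mirroring this commit's update-by-query example (names and IDs illustrative):

[source,console]
----
# Hypothetical data stream target; rewrites user.id on matching documents.
POST /my-data-stream/_update_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "params": {
      "new_id": "l7gk7f82"
    }
  }
}
----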
[[docs-update-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -197,7 +199,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
@@ -290,7 +295,7 @@ version conflicts.
==== {api-examples-title}

The simplest usage of `_update_by_query` just performs an update on every
-document in the index without changing the source. This is useful to
+document in the data stream or index without changing the source. This is useful to
<<picking-up-a-new-property,pick up a new property>> or some other online
mapping change.
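
The simplest form therefore needs no request body at all; picking up a mapping change on the `twitter` index used in this page's examples looks like this:

[source,console]
----
# No query or script: every document is updated in place.
POST /twitter/_update_by_query?conflicts=proceed
----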
@@ -313,7 +318,7 @@ POST twitter/_update_by_query?conflicts=proceed
way as the <<search-search,Search API>>. You can also use the `q`
parameter in the same way as the search API.

-Update documents in multiple indices:
+Update documents in multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -617,8 +622,8 @@ Which results in a sensible `total` like this one:
Setting `slices` to `auto` will let Elasticsearch choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.

Adding `slices` to `_update_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
@@ -641,7 +646,7 @@ be larger than others. Expect larger slices to have a more even distribution.
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being updated.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]