[7.x] [DOCS] Document delete/update by query for data streams (#58679) (#58706)

James Rodewig authored on 2020-06-30 08:35:13 -04:00 (committed by GitHub)
parent a9677efb56
commit d33764583c
5 changed files with 156 additions and 54 deletions


@@ -119,26 +119,28 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
=== Append-only

For most time-series use cases, existing data is rarely, if ever, updated.
-Because of this, data streams are designed to be append-only. This means you can
-send indexing requests for new documents directly to a data stream. However, you
-cannot send update or deletion requests for existing documents to a data stream.
+Because of this, data streams are designed to be append-only.

-To update or delete specific documents in a data stream, submit one of the
-following requests to the backing index containing the document:
+You can send <<add-documents-to-a-data-stream,indexing requests for new
+documents>> directly to a data stream. However, you cannot send the following
+requests for existing documents directly to a data stream:

* An <<docs-index_,index API>> request with an
-<<docs-index-api-op_type,`op_type`>> of `index`.
-These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
-and `if_primary_term`>> arguments.
+<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
+defaults to `index` for existing documents.

* A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
-action. If the action type is `index`, the action must include valid
-<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
-arguments.
+action.

* A <<docs-delete,delete API>> request

-See <<update-delete-docs-in-a-data-stream>>.
+Instead, you can use the <<docs-update-by-query,update by query>> and
+<<docs-delete-by-query,delete by query>> APIs to update or delete existing
+documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
+
+Alternatively, you can update or delete a document by submitting requests to the
+backing index containing the document. See
+<<update-delete-docs-in-a-backing-index>>.

TIP: If you frequently update or delete existing documents,
we recommend using an <<indices-add-alias,index alias>> and
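
For illustration, a sketch of what the append-only restriction means in practice. The data stream name and document ID below are hypothetical, and the exact error text may vary by version; the point is that a by-ID delete addressed to the stream itself is rejected, while delete by query through the stream works:

[source,console]
----
# Hypothetical example: deleting by ID directly against a data stream fails.
DELETE /my-data-stream/_doc/bfspvnIBr7VVZlfp2lqX

# Removing matching documents through the stream works via delete by query.
POST /my-data-stream/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----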


@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
(ECS)].

-* Data streams are designed to be <<data-streams-append-only,append-only>>.
-While you can index new documents directly to a data stream, you cannot use a
-data stream to directly update or delete individual documents. To update or
-delete specific documents in a data stream, submit a <<docs-delete,delete>> or
-<<docs-update,update>> API request to the backing index containing the document.
+* Data streams are best suited for time-based,
+<<data-streams-append-only,append-only>> use cases. If you frequently need to
+update or delete existing documents, we recommend using an index alias and an
+index template instead.

[discrete]
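
A minimal sketch of the recommended alias alternative. The `my-index-000001` and `my-app` names are illustrative, not part of this commit; the idea is that writes go through the alias, and documents remain freely updatable and deletable:

[source,console]
----
# Hypothetical example: create an index whose alias accepts writes.
PUT /my-index-000001
{
  "aliases": {
    "my-app": {
      "is_write_index": true
    }
  }
}

# Unlike a data stream, the alias target allows direct deletes by ID.
DELETE /my-app/_doc/1
----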


@@ -9,6 +9,7 @@ the following:
* <<manually-roll-over-a-data-stream>>
* <<reindex-with-a-data-stream>>
* <<update-delete-docs-in-a-data-stream>>
+* <<update-delete-docs-in-a-backing-index>>

////
[source,console]
@@ -66,6 +67,10 @@ POST /logs/_doc/
----
// TEST[continued]
====
+
+IMPORTANT: You cannot add new documents to a data stream using the index API's
+`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
+format instead.
--

* A <<docs-bulk,bulk API>> request using the `create` action. Specify the data
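
To make the IMPORTANT note above concrete, a sketch with hypothetical names (`my-data-stream`, `my-doc-id`), mirroring the `_create` requests used elsewhere in this commit:

[source,console]
----
# Hypothetical example: the _create format is accepted by a data stream.
PUT /my-data-stream/_create/my-doc-id
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
    "id": "8a4f500d"
  }
}

# The plain _doc format with an explicit ID would be rejected:
# PUT /my-data-stream/_doc/my-doc-id
----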
@@ -348,12 +353,96 @@ POST /_reindex
[[update-delete-docs-in-a-data-stream]]
=== Update or delete documents in a data stream

-Data streams are designed to be <<data-streams-append-only,append-only>>. This
-means you cannot send update or deletion requests for existing documents to a
-data stream. However, you can send update or deletion requests to the backing
-index containing the document.
+You can update or delete documents in a data stream using the following
+requests:

-To delete or update a document in a data stream, you first need to get:
+* An <<docs-update-by-query,update by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following update by query API request updates documents in the `logs` data
+stream with a `user.id` of `i96BP1mA`. The request uses a
+<<modules-scripting-using,script>> to assign matching documents a new `user.id`
+value of `XgdX0NoX`.
+
+////
+[source,console]
+----
+PUT /logs/_create/2?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "i96BP1mA"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_update_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "i96BP1mA"
+    }
+  },
+  "script": {
+    "source": "ctx._source.user.id = params.new_id",
+    "params": {
+      "new_id": "XgdX0NoX"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+* A <<docs-delete-by-query,delete by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following delete by query API request deletes documents in the `logs` data
+stream with a `user.id` of `zVZMamUM`.
+
+////
+[source,console]
+----
+PUT /logs/_create/1?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "zVZMamUM"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_delete_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "zVZMamUM"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+[discrete]
+[[update-delete-docs-in-a-backing-index]]
+=== Update or delete documents in a backing index
+
+Alternatively, you can update or delete documents in a data stream by sending
+the update or deletion request to the backing index containing the document. To
+do this, you first need to get:

* The <<mapping-id-field,document ID>>
* The name of the backing index that contains the document
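
One way to retrieve both values is a search with `seq_no_primary_term` enabled, which also returns the sequence number and primary term needed later for concurrency control. A sketch against the `logs` data stream used in this commit's examples (the `user.id` value is illustrative):

[source,console]
----
GET /logs/_search
{
  "seq_no_primary_term": true,
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----

The hits include the backing index name (`_index`), the document ID (`_id`), and the `_seq_no`/`_primary_term` pair, as shown in the response below.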
@@ -429,7 +518,7 @@ information for any documents matching the search.
        "_index": ".ds-logs-000002",      <1>
        "_type": "_doc",
        "_id": "bfspvnIBr7VVZlfp2lqX",    <2>
-       "_seq_no": 4,                     <3>
+       "_seq_no": 8,                     <3>
        "_primary_term": 1,               <4>
        "_score": 0.2876821,
        "_source": {
@@ -445,6 +534,8 @@ information for any documents matching the search.
}
----
// TESTRESPONSE[s/"took": 20/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]

<1> Backing index containing the matching document
<2> Document ID for the document
@@ -469,7 +560,7 @@ contains a new JSON source for the document.
[source,console]
----
-PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=4&if_primary_term=1
+PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "user": {
@@ -534,7 +625,7 @@ parameters.
[source,console]
----
PUT /_bulk?refresh
-{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 4, "if_primary_term": 1 } }
+{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
{ "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
----
// TEST[continued]
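
For comparison, a bulk `delete` action against the backing index should not need the concurrency-control parameters, since a deletion cannot silently overwrite a newer document version; a sketch reusing the IDs above:

[source,console]
----
# Assumes the backing index and document ID from the example above.
PUT /_bulk?refresh
{ "delete": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX" } }
----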


@@ -47,7 +47,7 @@ POST /twitter/_delete_by_query
[[docs-delete-by-query-api-request]]
==== {api-request-title}

-`POST /<index>/_delete_by_query`
+`POST /<target>/_delete_by_query`

[[docs-delete-by-query-api-desc]]
==== {api-description-title}
@@ -55,7 +55,7 @@ POST /twitter/_delete_by_query
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

-When you submit a delete by query request, {es} gets a snapshot of the index
+When you submit a delete by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and deletes matching documents using
`internal` versioning. If a document changes between the time that the
snapshot is taken and the delete operation is processed, it results in a version
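
To count version conflicts rather than fail the whole request, pass `conflicts=proceed`; a sketch (the stream name and query are illustrative):

[source,console]
----
# Hypothetical target; conflicts=proceed tallies conflicts instead of aborting.
POST /my-data-stream/_delete_by_query?conflicts=proceed
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----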
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
delete process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
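
Manual slicing, for reference, uses a `slice` object in the request body; a sketch splitting a delete by query into two parallel parts (the target and query are illustrative):

[source,console]
----
# First of two slices; a second request with "id": 1 covers the rest.
POST /my-index-000001/_delete_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----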
@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.
[[docs-delete-by-query-api-path-params]]
==== {api-path-parms-title}

-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
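
Under the new `<target>` wording, a single request can mix data streams, indices, and wildcard patterns; an illustrative sketch (names hypothetical):

[source,console]
----
# Targets every data stream or index matching logs-* plus one named index.
POST /logs-*,my-index-000001/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----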
[[docs-delete-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
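
The `scroll` parameter accepts a time value; for example, keeping the search context alive for ten minutes (target name illustrative):

[source,console]
----
# Retains the scroll search context for 10 minutes.
POST /my-index-000001/_delete_by_query?scroll=10m
{
  "query": {
    "match_all": {}
  }
}
----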
@@ -343,7 +348,7 @@ version conflicts.
[[docs-delete-by-query-api-example]]
==== {api-examples-title}

-Delete all tweets from the `twitter` index:
+Delete all tweets from the `twitter` data stream or index:

[source,console]
--------------------------------------------------
@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
--------------------------------------------------
// TEST[setup:twitter]

-Delete documents from multiple indices:
+Delete documents from multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:
Setting `slices` to `auto` will let {es} choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.

Adding `slices` to `_delete_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
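
Letting {es} pick the slice count is a one-parameter change; a sketch against the `twitter` index used in this page's examples:

[source,console]
----
# slices=auto uses one slice per shard, up to a limit.
POST /twitter/_delete_by_query?slices=auto&refresh
{
  "query": {
    "match_all": {}
  }
}
----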
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being deleted.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]


@@ -5,7 +5,7 @@
++++

Updates documents that match the specified query.

-If no query is specified, performs an update on every document in the index without
+If no query is specified, performs an update on every document in the data stream or index without
modifying the source, which is useful for picking up mapping changes.

[source,console]
@@ -44,7 +44,7 @@ POST twitter/_update_by_query?conflicts=proceed
[[docs-update-by-query-api-request]]
==== {api-request-title}

-`POST /<index>/_update_by_query`
+`POST /<target>/_update_by_query`

[[docs-update-by-query-api-desc]]
==== {api-description-title}
@@ -52,7 +52,7 @@ POST twitter/_update_by_query?conflicts=proceed
You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

-When you submit an update by query request, {es} gets a snapshot of the index
+When you submit an update by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and updates matching documents using
`internal` versioning.

When the versions match, the document is updated and the version number is incremented.
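
As with delete by query, `conflicts=proceed` tells the request to count version conflicts instead of aborting; a sketch (stream name illustrative, and the body may be omitted entirely):

[source,console]
----
# Hypothetical target; updates every document without changing the source.
POST /my-data-stream/_update_by_query?conflicts=proceed
----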
@@ -75,7 +75,7 @@ Any update requests that completed successfully still stick, they are not rolled
===== Refreshing shards

Specifying the `refresh` parameter refreshes all shards once the request completes.
-This is different than the update API’s `refresh` parameter, which causes just the shard
+This is different than the update API's `refresh` parameter, which causes just the shard
that received the request to be refreshed. Unlike the update API, it does not support
`wait_for`.
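
For instance (index name illustrative), this request refreshes all shards involved once the update by query completes:

[source,console]
----
# refresh makes the updates visible to search when the request finishes.
POST /my-index-000001/_update_by_query?refresh&conflicts=proceed
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  }
}
----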
@@ -129,12 +129,12 @@ Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
update process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
@@ -148,9 +148,11 @@ documents being reindexed and cluster resources.
[[docs-update-by-query-api-path-params]]
==== {api-path-parms-title}

-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
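
As with delete by query, the `<target>` value can now name a data stream; a sketch mirroring this commit's update-by-query example (names and IDs illustrative):

[source,console]
----
# Hypothetical data stream target; rewrites user.id on matching documents.
POST /my-data-stream/_update_by_query
{
  "query": {
    "match": {
      "user.id": "8a4f500d"
    }
  },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "params": {
      "new_id": "l7gk7f82"
    }
  }
}
----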
[[docs-update-by-query-api-query-params]]
==== {api-query-parms-title}
@@ -197,7 +199,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
@@ -290,7 +295,7 @@ version conflicts.
==== {api-examples-title}

The simplest usage of `_update_by_query` just performs an update on every
-document in the index without changing the source. This is useful to
+document in the data stream or index without changing the source. This is useful to
<<picking-up-a-new-property,pick up a new property>> or some other online
mapping change.
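
The simplest form therefore needs no request body at all; picking up a mapping change on the `twitter` index used in this page's examples looks like this:

[source,console]
----
# No query or script: every document is updated in place.
POST /twitter/_update_by_query?conflicts=proceed
----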
@@ -313,7 +318,7 @@ POST twitter/_update_by_query?conflicts=proceed
way as the <<search-search,Search API>>. You can also use the `q`
parameter in the same way as the search API.

-Update documents in multiple indices:
+Update documents in multiple data streams or indices:

[source,console]
--------------------------------------------------
@@ -617,8 +622,8 @@ Which results in a sensible `total` like this one:
Setting `slices` to `auto` will let Elasticsearch choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.

Adding `slices` to `_update_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:
@@ -641,7 +646,7 @@ be larger than others. Expect larger slices to have a more even distribution.
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being updated.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
though these are all taken at approximately the same time.

[float]