commit d33764583c (parent a9677efb56)

@@ -119,26 +119,28 @@ manually perform a rollover. See <<manually-roll-over-a-data-stream>>.
 === Append-only
 
 For most time-series use cases, existing data is rarely, if ever, updated.
-Because of this, data streams are designed to be append-only. This means you can
-send indexing requests for new documents directly to a data stream. However, you
-cannot send update or deletion requests for existing documents to a data stream.
+Because of this, data streams are designed to be append-only.
 
-To update or delete specific documents in a data stream, submit one of the
-following requests to the backing index containing the document:
+You can send <<add-documents-to-a-data-stream,indexing requests for new
+documents>> directly to a data stream. However, you cannot send the following
+requests for existing documents directly to a data stream:
 
 * An <<docs-index_,index API>> request with an
-<<docs-index-api-op_type,`op_type`>> of `index`.
-These requests must include valid <<optimistic-concurrency-control,`if_seq_no`
-and `if_primary_term`>> arguments.
+<<docs-index-api-op_type,`op_type`>> of `index`. The `op_type` parameter
+defaults to `index` for existing documents.
 
 * A <<docs-bulk,bulk API>> request using the `delete`, `index`, or `update`
-action. If the action type is `index`, the action must include valid
-<<bulk-optimistic-concurrency-control,`if_seq_no` and `if_primary_term`>>
-arguments.
+action.
 
 * A <<docs-delete,delete API>> request
 
-See <<update-delete-docs-in-a-data-stream>>.
+Instead, you can use the <<docs-update-by-query,update by query>> and
+<<docs-delete-by-query,delete by query>> APIs to update or delete existing
+documents in a data stream. See <<update-delete-docs-in-a-data-stream>>.
+
+Alternatively, you can update or delete a document by submitting requests to the
+backing index containing the document. See
+<<update-delete-docs-in-a-backing-index>>.
 
 TIP: If you frequently update or delete existing documents,
 we recommend using an <<indices-add-alias,index alias>> and

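A minimal sketch of the append-only behavior this hunk describes, assuming an illustrative data stream named `my-data-stream`: an auto-ID indexing request such as the following uses the `create` op type and succeeds, whereas `PUT /my-data-stream/_doc/1`, which implies an `op_type` of `index`, is rejected.

[source,console]
----
POST /my-data-stream/_doc/ <1>
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "message": "Login attempt failed"
}
----
<1> `my-data-stream` is an illustrative name, not taken from this commit.
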
@@ -26,11 +26,10 @@ TIP: Data streams work well with most common log formats. While no schema is
 required to use data streams, we recommend the {ecs-ref}[Elastic Common Schema
 (ECS)].
 
-* Data streams are designed to be <<data-streams-append-only,append-only>>.
-While you can index new documents directly to a data stream, you cannot use a
-data stream to directly update or delete individual documents. To update or
-delete specific documents in a data stream, submit a <<docs-delete,delete>> or
-<<docs-update,update>> API request to the backing index containing the document.
+* Data streams are best suited for time-based,
+<<data-streams-append-only,append-only>> use cases. If you frequently need to
+update or delete existing documents, we recommend using an index alias and an
+index template instead.
 
 
 [discrete]

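One way to set up the recommended alias-plus-template pattern is to create the first index in the series with a write alias; a minimal sketch, assuming illustrative names `my-index-000001` and `my-alias`:

[source,console]
----
PUT /my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true <1>
    }
  }
}
----
<1> Marks this index as the write index for `my-alias`; both names are assumptions for illustration.
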
@@ -9,6 +9,7 @@ the following:
 * <<manually-roll-over-a-data-stream>>
 * <<reindex-with-a-data-stream>>
+* <<update-delete-docs-in-a-data-stream>>
 * <<update-delete-docs-in-a-backing-index>>
 
 ////
 [source,console]

@@ -66,6 +67,10 @@ POST /logs/_doc/
 ----
 // TEST[continued]
 ====
+
+IMPORTANT: You cannot add new documents to a data stream using the index API's
+`PUT /<target>/_doc/<_id>` request format. Use the `PUT /<target>/_create/<_id>`
+format instead.
 --
 
 * A <<docs-bulk,bulk API>> request using the `create` action. Specify the data

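A minimal sketch of the `_create` request format the new IMPORTANT note prescribes; the target name and document ID are illustrative:

[source,console]
----
PUT /my-data-stream/_create/my-doc-id <1>
{
  "@timestamp": "2020-12-07T11:06:07.000Z",
  "message": "Login successful"
}
----
<1> `my-data-stream` and `my-doc-id` are illustrative values, not taken from this commit.
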
@@ -348,12 +353,96 @@ POST /_reindex
 [[update-delete-docs-in-a-data-stream]]
 === Update or delete documents in a data stream
 
-Data streams are designed to be <<data-streams-append-only,append-only>>. This
-means you cannot send update or deletion requests for existing documents to a
-data stream. However, you can send update or deletion requests to the backing
-index containing the document.
+You can update or delete documents in a data stream using the following
+requests:
 
-To delete or update a document in a data stream, you first need to get:
+* An <<docs-update-by-query,update by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following update by query API request updates documents in the `logs` data
+stream with a `user.id` of `i96BP1mA`. The request uses a
+<<modules-scripting-using,script>> to assign matching documents a new `user.id`
+value of `XgdX0NoX`.
+
+////
+[source,console]
+----
+PUT /logs/_create/2?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "i96BP1mA"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_update_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "i96BP1mA"
+    }
+  },
+  "script": {
+    "source": "ctx._source.user.id = params.new_id",
+    "params": {
+      "new_id": "XgdX0NoX"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+* A <<docs-delete-by-query,delete by query API>> request
++
+.*Example*
+[%collapsible]
+====
+The following delete by query API request deletes documents in the `logs` data
+stream with a `user.id` of `zVZMamUM`.
+
+////
+[source,console]
+----
+PUT /logs/_create/1?refresh=wait_for
+{
+  "@timestamp": "2020-12-07T11:06:07.000Z",
+  "user": {
+    "id": "zVZMamUM"
+  }
+}
+----
+// TEST[continued]
+////
+
+[source,console]
+----
+POST /logs/_delete_by_query
+{
+  "query": {
+    "match": {
+      "user.id": "zVZMamUM"
+    }
+  }
+}
+----
+// TEST[continued]
+====
+
+[discrete]
+[[update-delete-docs-in-a-backing-index]]
+=== Update or delete documents in a backing index
+
+Alternatively, you can update or delete documents in a data stream by sending
+the update or deletion request to the backing index containing the document. To
+do this, you first need to get:
 
 * The <<mapping-id-field,document ID>>
 * The name of the backing index that contains the document

@@ -429,7 +518,7 @@ information for any documents matching the search.
         "_index": ".ds-logs-000002", <1>
         "_type": "_doc",
         "_id": "bfspvnIBr7VVZlfp2lqX", <2>
-        "_seq_no": 4, <3>
+        "_seq_no": 8, <3>
         "_primary_term": 1, <4>
         "_score": 0.2876821,
         "_source": {

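The `_seq_no` and `_primary_term` fields shown in this response can be requested with the search option `seq_no_primary_term`; a sketch, with an assumed query:

[source,console]
----
GET /logs/_search
{
  "seq_no_primary_term": true, <1>
  "query": {
    "match": {
      "message": "login"
    }
  }
}
----
<1> Asks {es} to return each hit's sequence number and primary term; the `match` query is an assumption for illustration.
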
@@ -445,6 +534,8 @@ information for any documents matching the search.
 }
 ----
 // TESTRESPONSE[s/"took": 20/"took": $body.took/]
+// TESTRESPONSE[s/"max_score": 0.2876821/"max_score": $body.hits.max_score/]
+// TESTRESPONSE[s/"_score": 0.2876821/"_score": $body.hits.hits.0._score/]
 
 <1> Backing index containing the matching document
 <2> Document ID for the document

@@ -469,7 +560,7 @@ contains a new JSON source for the document.
 
 [source,console]
 ----
-PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=4&if_primary_term=1
+PUT /.ds-logs-000002/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=8&if_primary_term=1
 {
   "@timestamp": "2020-12-07T11:06:07.000Z",
   "user": {

@@ -534,7 +625,7 @@ parameters.
 [source,console]
 ----
 PUT /_bulk?refresh
-{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 4, "if_primary_term": 1 } }
+{ "index": { "_index": ".ds-logs-000002", "_id": "bfspvnIBr7VVZlfp2lqX", "if_seq_no": 8, "if_primary_term": 1 } }
 { "@timestamp": "2020-12-07T11:06:07.000Z", "user": { "id": "8a4f500d" }, "message": "Login successful" }
 ----
 // TEST[continued]

@@ -47,7 +47,7 @@ POST /twitter/_delete_by_query
 [[docs-delete-by-query-api-request]]
 ==== {api-request-title}
 
-`POST /<index>/_delete_by_query`
+`POST /<target>/_delete_by_query`
 
 [[docs-delete-by-query-api-desc]]
 ==== {api-description-title}

@@ -55,7 +55,7 @@ POST /twitter/_delete_by_query
 You can specify the query criteria in the request URI or the request body
 using the same syntax as the <<search-search,Search API>>.
 
-When you submit a delete by query request, {es} gets a snapshot of the index
+When you submit a delete by query request, {es} gets a snapshot of the data stream or index
 when it begins processing the request and deletes matching documents using
 `internal` versioning. If a document changes between the time that the
 snapshot is taken and the delete operation is processed, it results in a version

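As a sketch of handling those version conflicts: passing `conflicts=proceed` counts conflicts instead of aborting the request. The target and query here are illustrative.

[source,console]
----
POST /my-data-stream/_delete_by_query?conflicts=proceed <1>
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
<1> `my-data-stream` and the `user.id` value are assumptions for illustration.
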
@@ -134,12 +134,12 @@ Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the
 delete process. This can improve efficiency and provide a
 convenient way to break the request down into smaller parts.
 
-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
 If you're slicing manually or otherwise tuning automatic slicing, keep in mind
 that:
 
 * Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
 500), choose a lower number as too many `slices` hurts performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.

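A sketch of manual slicing as discussed above: this sub-request processes slice `0` of `2`. The index name and query are illustrative.

[source,console]
----
POST /my-index-000001/_delete_by_query
{
  "slice": {
    "id": 0, <1>
    "max": 2
  },
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-30d"
      }
    }
  }
}
----
<1> A second request with `"id": 1` would cover the other slice; the names and query are assumptions for illustration.
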
@@ -153,9 +153,11 @@ documents being reindexed and cluster resources.
 [[docs-delete-by-query-api-path-params]]
 ==== {api-path-parms-title}
 
-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
 
 [[docs-delete-by-query-api-query-params]]
 ==== {api-query-parms-title}

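A sketch of the new `<target>` parameter in use, combining a wildcard pattern and a concrete index (both names illustrative):

[source,console]
----
POST /logs-*,my-index-000001/_delete_by_query <1>
{
  "query": {
    "match": {
      "user.id": "vlb44hny"
    }
  }
}
----
<1> `logs-*` and `my-index-000001` are illustrative targets, not taken from this commit.
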
@@ -200,7 +202,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
 

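A sketch of the `scroll` parameter documented above, retaining the search context for ten minutes; the target and query are illustrative.

[source,console]
----
POST /my-index-000001/_delete_by_query?scroll=10m <1>
{
  "query": {
    "match_all": {}
  }
}
----
<1> `10m` keeps the scroll search context alive for ten minutes; the target name is an assumption.
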
@@ -343,7 +348,7 @@ version conflicts.
 [[docs-delete-by-query-api-example]]
 ==== {api-examples-title}
 
-Delete all tweets from the `twitter` index:
+Delete all tweets from the `twitter` data stream or index:
 
 [source,console]
 --------------------------------------------------

@@ -356,7 +361,7 @@ POST twitter/_delete_by_query?conflicts=proceed
 --------------------------------------------------
 // TEST[setup:twitter]
 
-Delete documents from multiple indices:
+Delete documents from multiple data streams or indices:
 
 [source,console]
 --------------------------------------------------

@@ -531,8 +536,8 @@ Which results in a sensible `total` like this one:
 
 Setting `slices` to `auto` will let {es} choose the number of slices
 to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.
 
 Adding `slices` to `_delete_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:

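A sketch of automatic slicing on the `twitter` example used nearby; the query values are assumptions.

[source,console]
----
POST /twitter/_delete_by_query?slices=auto <1>
{
  "query": {
    "range": {
      "likes": {
        "lt": 10
      }
    }
  }
}
----
<1> `slices=auto` lets {es} pick one slice per shard, up to the limit described above.
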
@@ -555,7 +560,7 @@ slices` are distributed proportionally to each sub-request. Combine that with
 the point above about distribution being uneven and you should conclude that
 using `max_docs` with `slices` might not result in exactly `max_docs` documents
 being deleted.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
 though these are all taken at approximately the same time.
 
 [float]

@@ -5,7 +5,7 @@
 ++++
 
 Updates documents that match the specified query.
-If no query is specified, performs an update on every document in the index without
+If no query is specified, performs an update on every document in the data stream or index without
 modifying the source, which is useful for picking up mapping changes.
 
 [source,console]

@@ -44,7 +44,7 @@ POST twitter/_update_by_query?conflicts=proceed
 [[docs-update-by-query-api-request]]
 ==== {api-request-title}
 
-`POST /<index>/_update_by_query`
+`POST /<target>/_update_by_query`
 
 [[docs-update-by-query-api-desc]]
 ==== {api-description-title}

@@ -52,7 +52,7 @@ POST twitter/_update_by_query?conflicts=proceed
 You can specify the query criteria in the request URI or the request body
 using the same syntax as the <<search-search,Search API>>.
 
-When you submit an update by query request, {es} gets a snapshot of the index
+When you submit an update by query request, {es} gets a snapshot of the data stream or index
 when it begins processing the request and updates matching documents using
 `internal` versioning.
 When the versions match, the document is updated and the version number is incremented.

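A sketch tying the snapshot-and-versioning behavior above to a concrete request; `conflicts=proceed` counts version conflicts rather than aborting. The target, query, and script values are illustrative.

[source,console]
----
POST /my-data-stream/_update_by_query?conflicts=proceed <1>
{
  "query": {
    "match": {
      "user.id": "l7gk7f82"
    }
  },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "params": {
      "new_id": "XgdX0NoX"
    }
  }
}
----
<1> The target name, query, and script parameters are assumptions for illustration.
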
@@ -75,7 +75,7 @@ Any update requests that completed successfully still stick, they are not rolled
 ===== Refreshing shards
 
 Specifying the `refresh` parameter refreshes all shards once the request completes.
-This is different than the update API&#8217;s `refresh` parameter, which causes just the shard
+This is different than the update API's `refresh` parameter, which causes just the shard
 that received the request to be refreshed. Unlike the update API, it does not support
 `wait_for`.
 

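A sketch of the `refresh` parameter on an update by query request; as the text notes, `wait_for` is not supported here. The target and query are illustrative.

[source,console]
----
POST /my-index-000001/_update_by_query?refresh&conflicts=proceed <1>
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
<1> Bare `refresh` refreshes all affected shards when the request completes; the names are assumptions.
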
@@ -129,12 +129,12 @@ Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
 update process. This can improve efficiency and provide a
 convenient way to break the request down into smaller parts.
 
-Setting `slices` to `auto` chooses a reasonable number for most indices.
+Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
 If you're slicing manually or otherwise tuning automatic slicing, keep in mind
 that:
 
 * Query performance is most efficient when the number of `slices` is equal to
-the number of shards in the index. If that number is large (for example,
+the number of shards in the index or backing index. If that number is large (for example,
 500), choose a lower number as too many `slices` hurts performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.

@@ -148,9 +148,11 @@ documents being reindexed and cluster resources.
 [[docs-update-by-query-api-path-params]]
 ==== {api-path-parms-title}
 
-`<index>`::
-(Optional, string) A comma-separated list of index names to search. Use `_all`
-or omit to search all indices.
+`<target>`::
+(Optional, string)
+A comma-separated list of data streams, indices, and index aliases to search.
+Wildcard (`*`) expressions are supported. To search all data streams or indices
+in a cluster, omit this parameter or use `_all` or `*`.
 
 [[docs-update-by-query-api-query-params]]
 ==== {api-query-parms-title}

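A sketch of the `<target>` parameter for update by query, again with illustrative names:

[source,console]
----
POST /logs-*,my-index-000001/_update_by_query?conflicts=proceed <1>
{
  "query": {
    "match": {
      "user.id": "vlb44hny"
    }
  }
}
----
<1> `logs-*` and `my-index-000001` are illustrative targets.
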
@@ -197,7 +199,10 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+`scroll`::
+(Optional, <<time-units,time value>>)
+Period to retain the <<scroll-search-context,search context>> for scrolling. See
+<<request-body-search-scroll>>.
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
 

@@ -290,7 +295,7 @@ version conflicts.
 ==== {api-examples-title}
 
 The simplest usage of `_update_by_query` just performs an update on every
-document in the index without changing the source. This is useful to
+document in the data stream or index without changing the source. This is useful to
 <<picking-up-a-new-property,pick up a new property>> or some other online
 mapping change.
 

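A sketch of that simplest usage: with no query or script, every document in the target is updated in place, picking up mapping changes. The target name is an assumption.

[source,console]
----
POST /my-index-000001/_update_by_query?conflicts=proceed <1>
----
<1> `my-index-000001` is an illustrative target.
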
@@ -313,7 +318,7 @@ POST twitter/_update_by_query?conflicts=proceed
 way as the <<search-search,Search API>>. You can also use the `q`
 parameter in the same way as the search API.
 
-Update documents in multiple indices:
+Update documents in multiple data streams or indices:
 
 [source,console]
 --------------------------------------------------

@@ -617,8 +622,8 @@ Which results in a sensible `total` like this one:
 
 Setting `slices` to `auto` will let Elasticsearch choose the number of slices
 to use. This setting will use one slice per shard, up to a certain limit. If
-there are multiple source indices, it will choose the number of slices based
-on the index with the smallest number of shards.
+there are multiple source data streams or indices, it will choose the number of slices based
+on the index or backing index with the smallest number of shards.
 
 Adding `slices` to `_update_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:

@@ -641,7 +646,7 @@ be larger than others. Expect larger slices to have a more even distribution.
 the point above about distribution being uneven and you should conclude that
 using `max_docs` with `slices` might not result in exactly `max_docs` documents
 being updated.
-* Each sub-request gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source data stream or index
 though these are all taken at approximately the same time.
 
 [float]