Co-authored-by: debadair <debadair@elastic.co>
parent 84513c7539
commit 8341ebc061
delete-by-query.asciidoc
@@ -19,11 +19,6 @@ POST /twitter/_delete_by_query
 --------------------------------------------------
 // TEST[setup:big_twitter]
 
-[[docs-delete-by-query-api-request]]
-==== {api-request-title}
-
-`POST /<index>/_delete_by_query`
-
 ////
 
 [source,console-result]
@@ -49,6 +44,11 @@ POST /twitter/_delete_by_query
 // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
 ////
 
+[[docs-delete-by-query-api-request]]
+==== {api-request-title}
+
+`POST /<index>/_delete_by_query`
+
 [[docs-delete-by-query-api-desc]]
 ==== {api-description-title}
@@ -89,8 +89,7 @@ request to be refreshed. Unlike the delete API, it does not support
 
 If the request contains `wait_for_completion=false`, {es}
 performs some preflight checks, launches the request, and returns a
-<<docs-delete-by-query-task-api,`task`>>
-you can use to cancel or get the status of the task. {es} creates a
+<<tasks,`task`>> you can use to cancel or get the status of the task. {es} creates a
 record of this task as a document at `.tasks/task/${taskId}`. When you are
 done with a task, you should delete the task document so {es} can reclaim the
 space.
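
For illustration only (not part of this commit), checking on and then cleaning up such a task might look like this:

[source,console]
--------------------------------------------------
# the task ID below is hypothetical
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619

DELETE /.tasks/task/r1A2WoRbTwKZ516z6NEs5A:36619
--------------------------------------------------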
@@ -227,9 +226,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeout]
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=version]
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeout]
-
-include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
 
 [[docs-delete-by-query-api-request-body]]
 ==== {api-request-body-title}
@@ -239,7 +236,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
 using the <<query-dsl,Query DSL>>.
 
 
-[[docs-delete-by-quer-api-response-body]]
+[[docs-delete-by-query-api-response-body]]
 ==== Response body
 
 //////////////////////////
@@ -330,7 +327,7 @@ The number of requests per second effectively executed during the delete by query.
 `throttled_until_millis`::
 
 This field should always be equal to zero in a `_delete_by_query` response. It only
-has meaning when using the <<docs-delete-by-query-task-api, Task API>>, where it
+has meaning when using the <<tasks, Task API>>, where it
 indicates the next time (in milliseconds since epoch) a throttled request will be
 executed again in order to conform to `requests_per_second`.
@@ -541,7 +538,7 @@ Adding `slices` to `_delete_by_query` just automates the manual process used in
 the section above, creating sub-requests which means it has some quirks:
 
 * You can see these requests in the
-<<docs-delete-by-query-task-api,Tasks APIs>>. These sub-requests are "child"
+<<tasks,Tasks APIs>>. These sub-requests are "child"
 tasks of the task for the request with `slices`.
 * Fetching the status of the task for the request with `slices` only contains
 the status of completed slices.
@@ -655,7 +652,7 @@ you to delete that document.
 
 [float]
 [[docs-delete-by-query-cancel-task-api]]
-==== Cancel a delete by query operation
+===== Cancel a delete by query operation
 
 Any delete by query can be canceled using the <<tasks,task cancel API>>:
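
For illustration only (not part of this commit), such a cancellation request might look like this:

[source,console]
--------------------------------------------------
# the task ID below is hypothetical
POST /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
--------------------------------------------------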
update-by-query.asciidoc
@@ -1,10 +1,12 @@
 [[docs-update-by-query]]
 === Update By Query API
+++++
+<titleabbrev>Update by query</titleabbrev>
+++++
 
-The simplest usage of `_update_by_query` just performs an update on every
-document in the index without changing the source. This is useful to
-<<picking-up-a-new-property,pick up a new property>> or some other online
-mapping change. Here is the API:
+Updates documents that match the specified query.
+If no query is specified, performs an update on every document in the index without
+modifying the source, which is useful for picking up mapping changes.
 
 [source,console]
 --------------------------------------------------
@@ -12,7 +14,7 @@ POST twitter/_update_by_query?conflicts=proceed
 --------------------------------------------------
 // TEST[setup:big_twitter]
 
-That will return something like this:
+////
 
 [source,console-result]
 --------------------------------------------------
@@ -37,42 +39,262 @@ That will return something like this:
 --------------------------------------------------
 // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
 
-`_update_by_query` gets a snapshot of the index when it starts and indexes what
-it finds using `internal` versioning. That means you'll get a version
-conflict if the document changes between the time when the snapshot was taken
-and when the index request is processed. When the versions match, the document
-is updated and the version number is incremented.
+////
 
-NOTE: Since `internal` versioning does not support the value 0 as a valid
-version number, documents with version equal to zero cannot be updated using
-`_update_by_query` and will fail the request.
+[[docs-update-by-query-api-request]]
+==== {api-request-title}
 
-All update and query failures cause the `_update_by_query` to abort and are
-returned in the `failures` of the response. The updates that have been
-performed still stick. In other words, the process is not rolled back, only
-aborted. While the first failure causes the abort, all failures that are
-returned by the failing bulk request are returned in the `failures` element; therefore
-it's possible for there to be quite a few failed entities.
+`POST /<index>/_update_by_query`
 
-If you want to simply count version conflicts, and not cause the `_update_by_query`
-to abort, you can set `conflicts=proceed` on the url or `"conflicts": "proceed"`
-in the request body. The first example does this because it is just trying to
-pick up an online mapping change, and a version conflict simply means that the
-conflicting document was updated between the start of the `_update_by_query`
-and the time when it attempted to update the document. This is fine because
-that update will have picked up the online mapping update.
+[[docs-update-by-query-api-desc]]
+==== {api-description-title}
 
-Back to the API format, this will update tweets from the `twitter` index:
+You can specify the query criteria in the request URI or the request body
+using the same syntax as the <<search-search,Search API>>.
 
-[source,console]
+When you submit an update by query request, {es} gets a snapshot of the index
+when it begins processing the request and updates matching documents using
+`internal` versioning.
+When the versions match, the document is updated and the version number is incremented.
+If a document changes between the time that the snapshot is taken and
+the update operation is processed, it results in a version conflict and the operation fails.
+You can opt to count version conflicts instead of halting and returning by
+setting `conflicts` to `proceed`.
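
For illustration only (not part of this commit), `conflicts` can also be set in the request body:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query
{
  "conflicts": "proceed"
}
--------------------------------------------------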
 
+NOTE: Documents with a version equal to 0 cannot be updated using update by
+query because `internal` versioning does not support 0 as a valid
+version number.
+
+While processing an update by query request, {es} performs multiple search
+requests sequentially to find all of the matching documents.
+A bulk update request is performed for each batch of matching documents.
+Any query or update failures cause the update by query request to fail and
+the failures are shown in the response.
+Any update requests that completed successfully still stick; they are not rolled back.
+
+===== Refreshing shards
+
+Specifying the `refresh` parameter refreshes all shards once the request completes.
+This is different than the update API's `refresh` parameter, which causes just the shard
+that received the request to be refreshed. Unlike the update API, it does not support
+`wait_for`.
+
+[[docs-update-by-query-task-api]]
+===== Running update by query asynchronously
+
+If the request contains `wait_for_completion=false`, {es}
+performs some preflight checks, launches the request, and returns a
+<<tasks,`task`>> you can use to cancel or get the status of the task.
+{es} creates a record of this task as a document at `.tasks/task/${taskId}`.
+When you are done with a task, you should delete the task document so
+{es} can reclaim the space.
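
For illustration only (not part of this commit), an asynchronous request might look like this:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query?conflicts=proceed&wait_for_completion=false
--------------------------------------------------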
 
+===== Waiting for active shards
+
+`wait_for_active_shards` controls how many copies of a shard must be active
+before proceeding with the request. See <<index-wait-for-active-shards>>
+for details. `timeout` controls how long each write request waits for unavailable
+shards to become available. Both work exactly the way they work in the
+<<docs-bulk,Bulk API>>. Update by query uses scrolled searches, so you can also
+specify the `scroll` parameter to control how long it keeps the search context
+alive, for example `?scroll=10m`. The default is 5 minutes.
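
For illustration only (not part of this commit), these parameters might be combined like this:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query?wait_for_active_shards=2&timeout=90s&scroll=10m
--------------------------------------------------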
 
+===== Throttling update requests
+
+To control the rate at which update by query issues batches of update operations,
+you can set `requests_per_second` to any positive decimal number. This pads each
+batch with a wait time to throttle the rate. Set `requests_per_second` to `-1`
+to disable throttling.
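
For illustration only (not part of this commit), throttling a request to 500 operations per second might look like this:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query?requests_per_second=500
--------------------------------------------------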
 
+Throttling uses a wait time between batches so that the internal scroll requests
+can be given a timeout that takes the request padding into account. The padding
+time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
-POST twitter/_update_by_query?conflicts=proceed
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
-// TEST[setup:twitter]
 
-You can also limit `_update_by_query` using the
-<<query-dsl,Query DSL>>. This will update all documents from the
-`twitter` index for the user `kimchy`:
+Since the batch is issued as a single `_bulk` request, large batch sizes
+cause {es} to create many requests and wait before starting the next set.
+This is "bursty" instead of "smooth".
+
+[[docs-update-by-query-slice]]
+===== Slicing
+
+Update by query supports <<sliced-scroll, sliced scroll>> to parallelize the
+update process. This can improve efficiency and provide a
+convenient way to break the request down into smaller parts.
+
+Setting `slices` to `auto` chooses a reasonable number for most indices.
+If you're slicing manually or otherwise tuning automatic slicing, keep in mind
+that:
+
+* Query performance is most efficient when the number of `slices` is equal to
+the number of shards in the index. If that number is large (for example,
+500), choose a lower number as too many `slices` hurts performance. Setting
+`slices` higher than the number of shards generally does not improve efficiency
+and adds overhead.
+
+* Update performance scales linearly across available resources with the
+number of slices.
+
+Whether query or update performance dominates the runtime depends on the
+documents being reindexed and cluster resources.
+
+[[docs-update-by-query-api-path-params]]
+==== {api-path-parms-title}
+
+`<index>`::
+(Optional, string) A comma-separated list of index names to search. Use `_all`
+or omit to search all indices.
+
+[[docs-update-by-query-api-query-params]]
+==== {api-query-parms-title}
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=allow-no-indices]
++
+Defaults to `true`.
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=analyzer]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=analyze_wildcard]
+
+`conflicts`::
+(Optional, string) What to do if update by query hits version conflicts:
+`abort` or `proceed`. Defaults to `abort`.
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=default_operator]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=df]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]
++
+Defaults to `open`.
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailable]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=lenient]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=max_docs]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=pipeline]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=preference]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search-q]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=request_cache]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=refresh]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_type]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_timeout]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=slices]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sort]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_excludes]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_includes]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=stats]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeout]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=version]
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
+
+[[docs-update-by-query-api-request-body]]
+==== {api-request-body-title}
+
+`query`::
+(Optional, <<query-dsl,query object>>) Specifies the documents to update
+using the <<query-dsl,Query DSL>>.
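
For illustration only (not part of this commit), a request-body query might look like this:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------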
 
+[[docs-update-by-query-api-response-body]]
+==== Response body
+
+`took`::
+The number of milliseconds from start to end of the whole operation.
+
+`timed_out`::
+This flag is set to `true` if any of the requests executed during the
+update by query execution has timed out.
+
+`total`::
+The number of documents that were successfully processed.
+
+`updated`::
+The number of documents that were successfully updated.
+
+`deleted`::
+The number of documents that were successfully deleted.
+
+`batches`::
+The number of scroll responses pulled back by the update by query.
+
+`version_conflicts`::
+The number of version conflicts that the update by query hit.
+
+`noops`::
+The number of documents that were ignored because the script used for
+the update by query returned a `noop` value for `ctx.op`.
+
+`retries`::
+The number of retries attempted by update by query. `bulk` is the number of bulk
+actions retried, and `search` is the number of search actions retried.
+
+`throttled_millis`::
+Number of milliseconds the request slept to conform to `requests_per_second`.
+
+`requests_per_second`::
+The number of requests per second effectively executed during the update by query.
+
+`throttled_until_millis`::
+This field should always be equal to zero in an `_update_by_query` response. It only
+has meaning when using the <<docs-update-by-query-task-api, Task API>>, where it
+indicates the next time (in milliseconds since epoch) a throttled request will be
+executed again in order to conform to `requests_per_second`.
+
+`failures`::
+Array of failures if there were any unrecoverable errors during the process. If
+this is non-empty then the request aborted because of those failures.
+Update by query is implemented using batches. Any failure causes the entire
+process to abort, but all failures in the current batch are collected into the
+array. You can use the `conflicts` option to prevent update by query from aborting on
+version conflicts.
+
+[[docs-update-by-query-api-example]]
+==== {api-examples-title}
+
+The simplest usage of `_update_by_query` just performs an update on every
+document in the index without changing the source. This is useful to
+<<picking-up-a-new-property,pick up a new property>> or some other online
+mapping change.
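
For illustration only (not part of this commit), that simplest form is the request shown at the top of this page:

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query?conflicts=proceed
--------------------------------------------------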
 
+To update selected documents, specify a query in the request body:
 
 [source,console]
 --------------------------------------------------
@@ -91,11 +313,36 @@ POST twitter/_update_by_query?conflicts=proceed
 way as the <<search-search,Search API>>. You can also use the `q`
 parameter in the same way as the search API.
 
-So far we've only been updating documents without changing their source. That
-is genuinely useful for things like
-<<picking-up-a-new-property,picking up new properties>> but it's only half the
-fun. `_update_by_query` <<modules-scripting-using,supports scripts>> to update
-the document. This will increment the `likes` field on all of kimchy's tweets:
+Update documents in multiple indices:
+
+[source,console]
+--------------------------------------------------
+POST twitter,blog/_update_by_query
+--------------------------------------------------
+// TEST[s/^/PUT twitter\nPUT blog\n/]
+
+Limit the update by query operation to shards that match a particular routing value:
+
+[source,console]
+--------------------------------------------------
+POST twitter/_update_by_query?routing=1
+--------------------------------------------------
+// TEST[setup:twitter]
+
+By default update by query uses scroll batches of 1000.
+You can change the batch size with the `scroll_size` parameter:
+
+[source,console]
+--------------------------------------------------
+POST twitter/_update_by_query?scroll_size=100
+--------------------------------------------------
+// TEST[setup:twitter]
+
+[[docs-update-by-query-api-source]]
+===== Update the document source
+
+Update by query supports scripts to update the document source.
+For example, the following request increments the `likes` field for all of kimchy's tweets:
 
 [source,console]
 --------------------------------------------------
@@ -114,62 +361,29 @@ POST twitter/_update_by_query
 --------------------------------------------------
 // TEST[setup:twitter]
 
-Just as in <<docs-update,Update API>> you can set `ctx.op` to change the
-operation that is executed:
+Note that `conflicts=proceed` is not specified in this example. In this case, a
+version conflict should halt the process so you can handle the failure.
+
+As with the <<docs-update,Update API>>, you can set `ctx.op` to change the
+operation that is performed:
 
 [horizontal]
 `noop`::
-
-Set `ctx.op = "noop"` if your script decides that it doesn't have to make any
-changes. That will cause `_update_by_query` to omit that document from its updates.
-This no operation will be reported in the `noop` counter in the
-<<docs-update-by-query-response-body, response body>>.
+Set `ctx.op = "noop"` if your script decides that it doesn't have to make any changes.
+The update by query operation skips updating the document and increments the `noop` counter.
 
 `delete`::
+Set `ctx.op = "delete"` if your script decides that the document should be deleted.
+The update by query operation deletes the document and increments the `deleted` counter.
-
-Set `ctx.op = "delete"` if your script decides that the document must be
-deleted. The deletion will be reported in the `deleted` counter in the
-<<docs-update-by-query-response-body, response body>>.
+
+Update by query only supports `update`, `noop`, and `delete`.
+Setting `ctx.op` to anything else is an error. Setting any other field in `ctx` is an error.
+This API only enables you to modify the source of matching documents; you cannot move them.
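
For illustration only (not part of this commit), a script that deletes matching documents by setting `ctx.op` might look like this (the `likes` threshold is hypothetical):

[source,console]
--------------------------------------------------
POST /twitter/_update_by_query
{
  "script": {
    "source": "if (ctx._source.likes < 0) { ctx.op = 'delete' }",
    "lang": "painless"
  }
}
--------------------------------------------------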
 
-Setting `ctx.op` to anything else is an error. Setting any
-other field in `ctx` is an error.
+[[docs-update-by-query-api-ingest-pipeline]]
+===== Update documents using an ingest pipeline
 
-Note that we stopped specifying `conflicts=proceed`. In this case we want a
-version conflict to abort the process so we can handle the failure.
-
-This API doesn't allow you to move the documents it touches, just modify their
-source. This is intentional! We've made no provisions for removing the document
-from its original location.
-
-It's also possible to do this whole thing on multiple indexes at once, just
-like the search API:
-
-[source,console]
---------------------------------------------------
-POST twitter,blog/_update_by_query
---------------------------------------------------
-// TEST[s/^/PUT twitter\nPUT blog\n/]
-
-If you provide `routing` then the routing is copied to the scroll query,
-limiting the process to the shards that match that routing value:
-
-[source,console]
---------------------------------------------------
-POST twitter/_update_by_query?routing=1
---------------------------------------------------
-// TEST[setup:twitter]
-
-By default `_update_by_query` uses scroll batches of 1000. You can change the
-batch size with the `scroll_size` URL parameter:
-
-[source,console]
---------------------------------------------------
-POST twitter/_update_by_query?scroll_size=100
---------------------------------------------------
-// TEST[setup:twitter]
-
-`_update_by_query` can also use the <<ingest>> feature by
-specifying a `pipeline` like this:
+Update by query can use the <<ingest>> feature by specifying a `pipeline`:
 
 [source,console]
 --------------------------------------------------
@@ -187,162 +401,10 @@ POST twitter/_update_by_query?pipeline=set-foo
 --------------------------------------------------
 // TEST[setup:twitter]
 
-[float]
-==== URL parameters
-
-In addition to the standard parameters like `pretty`, the Update By Query API
-also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`,
-and `scroll`.
-
-Sending the `refresh` will update all shards in the index being updated when
-the request completes. This is different than the Update API's `refresh`
-parameter, which causes just the shard that received the new data to be indexed.
-Also unlike the Update API it does not support `wait_for`.
-
-If the request contains `wait_for_completion=false` then Elasticsearch will
-perform some preflight checks, launch the request, and then return a `task`
-which can be used with <<docs-update-by-query-task-api,Tasks APIs>>
-to cancel or get the status of the task. Elasticsearch will also create a
-record of this task as a document at `.tasks/task/${taskId}`. This is yours
-to keep or remove as you see fit. When you are done with it, delete it so
-Elasticsearch can reclaim the space it uses.
-
-`wait_for_active_shards` controls how many copies of a shard must be active
-before proceeding with the request. See <<index-wait-for-active-shards,here>>
-for details. `timeout` controls how long each write request waits for unavailable
-shards to become available. Both work exactly how they work in the
-<<docs-bulk,Bulk API>>. Because `_update_by_query` uses scroll search, you can also specify
-the `scroll` parameter to control how long it keeps the "search context" alive,
-e.g. `?scroll=10m`. By default it's 5 minutes.
-
-`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc.) and throttles the rate at which `_update_by_query` issues batches of
-index operations by padding each batch with a wait time. The throttling can be
-disabled by setting `requests_per_second` to `-1`.
-
-The throttling is done by waiting between batches so that scroll that
-`_update_by_query` uses internally can be given a timeout that takes into
-account the padding. The padding time is the difference between the batch size
-divided by the `requests_per_second` and the time spent writing. By default the
-batch size is `1000`, so if the `requests_per_second` is set to `500`:
-
-[source,txt]
---------------------------------------------------
-target_time = 1000 / 500 per second = 2 seconds
-wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
---------------------------------------------------
-
-Since the batch is issued as a single `_bulk` request, large batch sizes will
-cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
-
-[float]
-[[docs-update-by-query-response-body]]
-==== Response body
-
-//////////////////////////
-[source,console]
---------------------------------------------------
-POST /twitter/_update_by_query?conflicts=proceed
---------------------------------------------------
-// TEST[setup:twitter]
-
-//////////////////////////
-
-The JSON response looks like this:
-
-[source,console-result]
---------------------------------------------------
-{
-  "took" : 147,
-  "timed_out": false,
-  "total": 5,
-  "updated": 5,
-  "deleted": 0,
-  "batches": 1,
-  "version_conflicts": 0,
-  "noops": 0,
-  "retries": {
-    "bulk": 0,
-    "search": 0
-  },
-  "throttled_millis": 0,
-  "requests_per_second": -1.0,
-  "throttled_until_millis": 0,
-  "failures" : [ ]
-}
---------------------------------------------------
-// TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
-
-[horizontal]
-`took`::
-
-The number of milliseconds from start to end of the whole operation.
-
-`timed_out`::
-
-This flag is set to `true` if any of the requests executed during the
-update by query execution has timed out.
-
-`total`::
-
-The number of documents that were successfully processed.
-
-`updated`::
-
-The number of documents that were successfully updated.
-
-`deleted`::
-
-The number of documents that were successfully deleted.
-
-`batches`::
-
-The number of scroll responses pulled back by the update by query.
-
-`version_conflicts`::
-
-The number of version conflicts that the update by query hit.
-
-`noops`::
-
-The number of documents that were ignored because the script used for
-the update by query returned a `noop` value for `ctx.op`.
-
-`retries`::
-
-The number of retries attempted by update by query. `bulk` is the number of bulk
-actions retried, and `search` is the number of search actions retried.
-
-`throttled_millis`::
-
-Number of milliseconds the request slept to conform to `requests_per_second`.
-
-`requests_per_second`::
-
-The number of requests per second effectively executed during the update by query.
-
-`throttled_until_millis`::
-
-This field should always be equal to zero in an `_update_by_query` response. It only
-has meaning when using the <<docs-update-by-query-task-api, Task API>>, where it
-indicates the next time (in milliseconds since epoch) a throttled request will be
-executed again in order to conform to `requests_per_second`.
-
-`failures`::
-
-Array of failures if there were any unrecoverable errors during the process. If
-this is non-empty then the request aborted because of those failures.
-Update by query is implemented using batches. Any failure causes the entire
-process to abort, but all failures in the current batch are collected into the
-array. You can use the `conflicts` option to prevent reindex from aborting on
-version conflicts.
-
-[float]
-[[docs-update-by-query-task-api]]
-==== Works with the Task API
+[[docs-update-by-query-fetch-tasks]]
+===== Get the status of update by query operations
 
 You can fetch the status of all running update by query requests with the
 <<tasks,Task API>>:
@@ -421,7 +483,7 @@ you to delete that document.
 
 [float]
 [[docs-update-by-query-cancel-task-api]]
-==== Works with the Cancel Task API
+===== Cancel an update by query operation
 
 Any update by query can be cancelled using the <<tasks,Task Cancel API>>:
@@ -439,7 +501,7 @@ that it has been cancelled and terminates itself.
 
 [float]
 [[docs-update-by-query-rethrottle]]
-==== Rethrottling
+===== Change throttling for a request
 
 The value of `requests_per_second` can be changed on a running update by query
 using the `_rethrottle` API:
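
For illustration only (not part of this commit), a rethrottle request might look like this:

[source,console]
--------------------------------------------------
# the task ID below is hypothetical
POST /_update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
--------------------------------------------------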
@@ -458,17 +520,9 @@ query takes effect immediately, but rethrottling that slows down the query will
 take effect after completing the current batch. This prevents scroll
 timeouts.
 
-[float]
-[[docs-update-by-query-slice]]
-==== Slicing
-
-Update by query supports <<sliced-scroll>> to parallelize the updating process.
-This parallelization can improve efficiency and provide a convenient way to
-break the request down into smaller parts.
-
-[float]
 [[docs-update-by-query-manual-slice]]
-===== Manual slicing
+===== Slice manually
 
 Slice an update by query manually by providing a slice id and total number of
 slices to each request:
@@ -522,7 +576,7 @@ Which results in a sensible `total` like this one:
 
 [float]
 [[docs-update-by-query-automatic-slice]]
-===== Automatic slicing
+===== Use automatic slicing
 
 You can also let update by query automatically parallelize using
 <<sliced-scroll>> to slice on `_id`. Use `slices` to specify the number of
@@ -590,29 +644,9 @@ being updated.
 * Each sub-request gets a slightly different snapshot of the source index
 though these are all taken at approximately the same time.
 
-[float]
-[[docs-update-by-query-picking-slices]]
-====== Picking the number of slices
-
-If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If you're slicing manually or otherwise tuning
-automatic slicing, use these guidelines.
-
-Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
-`slices` higher than the number of shards generally does not improve efficiency
-and adds overhead.
-
-Update performance scales linearly across available resources with the
-number of slices.
-
-Whether query or update performance dominates the runtime depends on the
-documents being reindexed and cluster resources.
-
-[float]
 [[picking-up-a-new-property]]
-==== Pick up a new property
+===== Pick up a new property
 
 Say you created an index without dynamic mapping, filled it with data, and then
 added a mapping value to pick up more fields from the data: