167 lines
7.2 KiB
Plaintext
167 lines
7.2 KiB
Plaintext
[[java-docs-update-by-query]]
|
|
=== Update By Query API
|
|
|
|
The simplest usage of `updateByQuery` updates each
|
|
document in an index without changing the source. This usage enables
|
|
picking up a new property or another online mapping change.
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query]
|
|
--------------------------------------------------
|
|
|
|
Calls to the `updateByQuery` API start by getting a snapshot of the index, indexing
|
|
any documents found using the `internal` versioning.
|
|
|
|
NOTE: Version conflicts happen when a document changes between the time of the
|
|
snapshot and the time the index request processes.
|
|
|
|
When the versions match, `updateByQuery` updates the document
|
|
and increments the version number.
|
|
|
|
All update and query failures cause `updateByQuery` to abort. These failures are
|
|
available from the `BulkByScrollResponse#getIndexingFailures` method. Any
|
|
successful updates remain and are not rolled back. While the first failure
|
|
causes the abort, the response contains all of the failures generated by the
|
|
failed bulk request.
|
|
|
|
To prevent version conflicts from causing `updateByQuery` to abort, set
|
|
`abortOnVersionConflict(false)`. The first example does this because it is
|
|
trying to pick up an online mapping change and a version conflict means that
|
|
the conflicting document was updated between the start of the `updateByQuery`
|
|
and the time when it attempted to update the document. This is fine because
|
|
that update will have picked up the online mapping update.
|
|
|
|
The `UpdateByQueryRequestBuilder` API supports filtering the updated documents,
|
|
limiting the total number of documents to update, and updating documents
|
|
with a script:
|
|
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-filter]
|
|
--------------------------------------------------
|
|
|
|
`UpdateByQueryRequestBuilder` also enables direct access to the query used
|
|
to select the documents. You can use this access to change the default scroll size or
|
|
otherwise modify the request for matching documents.
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-size]
|
|
--------------------------------------------------
|
|
|
|
You can also combine `maxDocs` with sorting to limit the documents updated:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-sort]
|
|
--------------------------------------------------
|
|
|
|
In addition to changing the `_source` field for the document, you can use a
|
|
script to change the action, similar to the Update API:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-script]
|
|
--------------------------------------------------
|
|
|
|
As in the <<java-docs-update,Update API>>, you can set the value of `ctx.op` to change the
|
|
operation that executes:
|
|
|
|
`noop`::
|
|
|
|
Set `ctx.op = "noop"` if your script doesn't make any
|
|
changes. The `updateByQuery` operation then omits that document from the updates.
|
|
This behavior increments the `noop` counter in the response body.
|
|
|
|
`delete`::
|
|
|
|
Set `ctx.op = "delete"` if your script decides that the document must be
|
|
deleted. The deletion will be reported in the `deleted` counter in the
|
|
response body.
|
|
|
|
Setting `ctx.op` to any other value generates an error. Setting any
|
|
other field in `ctx` generates an error.
|
|
|
|
This API doesn't allow you to move the documents it touches, just modify their
|
|
source. This is intentional! We've made no provisions for removing the document
|
|
from its original location.
|
|
|
|
You can also perform these operations on multiple indices at once, similar to the search API:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-multi-index]
|
|
--------------------------------------------------
|
|
|
|
If you provide a `routing` value then the process copies the routing value to the scroll query,
|
|
limiting the process to the shards that match that routing value:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-routing]
|
|
--------------------------------------------------
|
|
|
|
`updateByQuery` can also use the ingest node by
|
|
specifying a `pipeline` like this:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-pipeline]
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[java-docs-update-by-query-task-api]]
|
|
=== Works with the Task API
|
|
|
|
You can fetch the status of all running update-by-query requests with the Task API:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-list-tasks]
|
|
--------------------------------------------------
|
|
|
|
With the `TaskId` shown above you can look up the task directly:
|
|
|
|
// provide API Example
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-get-task]
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[java-docs-update-by-query-cancel-task-api]]
|
|
=== Works with the Cancel Task API
|
|
|
|
Any Update By Query can be canceled using the Task Cancel API:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-cancel-task]
|
|
--------------------------------------------------
|
|
|
|
Use the `list tasks` API to find the value of `taskId`.
|
|
|
|
Cancelling a request is typically a very fast process but can take up to a few seconds.
|
|
The task status API continues to list the task until the cancellation is complete.
|
|
|
|
[float]
|
|
[[java-docs-update-by-query-rethrottle]]
|
|
=== Rethrottling
|
|
|
|
Use the `_rethrottle` API to change the value of `requests_per_second` on a running update:
|
|
|
|
["source","java",subs="attributes,callouts,macros"]
|
|
--------------------------------------------------
|
|
include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-rethrottle]
|
|
--------------------------------------------------
|
|
|
|
Use the `list tasks` API to find the value of `taskId`.
|
|
|
|
As with the `updateByQuery` API, the value of `requests_per_second`
|
|
can be any positive float value to set the level of the throttle, or `Float.POSITIVE_INFINITY` to disable throttling.
|
|
A value of `requests_per_second` that speeds up the process takes
|
|
effect immediately. `requests_per_second` values that slow the query take effect
|
|
after completing the current batch in order to prevent scroll timeouts.
|