OpenSearch/docs/java-rest/high-level/document/update-by-query.asciidoc

182 lines
8.1 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[java-rest-high-document-update-by-query]]
=== Update By Query API
[[java-rest-high-document-update-by-query-request]]
==== Update By Query Request
A `UpdateByQueryRequest` can be used to update documents in an index.
It requires an existing index (or a set of indices) on which the update is to be performed.
The simplest form of a `UpdateByQueryRequest` looks like follows:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request]
--------------------------------------------------
<1> Creates the `UpdateByQueryRequest` on a set of indices.
By default version conflicts abort the `UpdateByQueryRequest` process but you can just count them by settings it to
`proceed` in the request body
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-conflicts]
--------------------------------------------------
<1> Set `proceed` on version conflict
You can limit the documents by adding a type to the source or by adding a query.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-typeOrQuery]
--------------------------------------------------
<1> Only copy `doc` type
<2> Only copy documents which have field `user` set to `kimchy`
Its also possible to limit the number of processed documents by setting size.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-size]
--------------------------------------------------
<1> Only copy 10 documents
By default `UpdateByQueryRequest` uses batches of 1000. You can change the batch size with `setBatchSize`.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-scrollSize]
--------------------------------------------------
<1> Use batches of 100 documents
Update by query can also use the ingest feature by specifying a `pipeline`.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-pipeline]
--------------------------------------------------
<1> set pipeline to `my_pipeline`
`UpdateByQueryRequest` also supports a `script` that modifies the document. The following example illustrates that.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-script]
--------------------------------------------------
<1> `setScript` to increment the `likes` field on all documents with user `kimchy`.
`UpdateByQueryRequest` also helps in automatically parallelizing using `sliced-scroll` to
slice on `_uid`. Use `setSlices` to specify the number of slices to use.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-slices]
--------------------------------------------------
<1> set number of slices to use
`UpdateByQueryRequest` uses the `scroll` parameter to control how long it keeps the "search context" alive.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-scroll]
--------------------------------------------------
<1> set scroll time
If you provide routing then the routing is copied to the scroll query, limiting the process to the shards that match
that routing value.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-routing]
--------------------------------------------------
<1> set routing
==== Optional arguments
In addition to the options above the following arguments can optionally be also provided:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-timeout]
--------------------------------------------------
<1> Timeout to wait for the update by query request to be performed as a `TimeValue`
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-refresh]
--------------------------------------------------
<1> Refresh index after calling update by query
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-request-indicesOptions]
--------------------------------------------------
<1> Set indices options
[[java-rest-high-document-update-by-query-sync]]
==== Synchronous Execution
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-execute]
--------------------------------------------------
[[java-rest-high-document-update-by-query-async]]
==== Asynchronous Execution
The asynchronous execution of an update by query request requires both the `UpdateByQueryRequest`
instance and an `ActionListener` instance to be passed to the asynchronous
method:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-execute-async]
--------------------------------------------------
<1> The `UpdateByQueryRequest` to execute and the `ActionListener` to use when
the execution completes
The asynchronous method does not block and returns immediately. Once it is
completed the `ActionListener` is called back using the `onResponse` method
if the execution successfully completed or using the `onFailure` method if
it failed.
A typical listener for `BulkByScrollResponse` looks like:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-execute-listener]
--------------------------------------------------
<1> Called when the execution is successfully completed. The response is
provided as an argument and contains a list of individual results for each
operation that was executed. Note that one or more operations might have
failed while the others have been successfully executed.
<2> Called when the whole `UpdateByQueryRequest` fails. In this case the raised
exception is provided as an argument and no operation has been executed.
[[java-rest-high-document-update-by-query-execute-listener-response]]
==== Update By Query Response
The returned `BulkByScrollResponse` contains information about the executed operations and
allows to iterate over each result as follows:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests}/CRUDDocumentationIT.java[update-by-query-response]
--------------------------------------------------
<1> Get total time taken
<2> Check if the request timed out
<3> Get total number of docs processed
<4> Number of docs that were updated
<5> Number of docs that were deleted
<6> Number of batches that were executed
<7> Number of skipped docs
<8> Number of version conflicts
<9> Number of times request had to retry bulk index operations
<10> Number of times request had to retry search operations
<11> The total time this request has throttled itself not including the current throttle time if it is currently sleeping
<12> Remaining delay of any current throttle sleep or 0 if not sleeping
<13> Failures during search phase
<14> Failures during bulk index operation