[[docs-update-by-query]]
=== Update By Query API
++++
<titleabbrev>Update by query</titleabbrev>
++++

Updates documents that match the specified query.
If no query is specified, performs an update on every document in the data stream or index without
modifying the source, which is useful for picking up mapping changes.

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?conflicts=proceed
--------------------------------------------------
// TEST[setup:my_index_big]

////

[source,console-result]
--------------------------------------------------
{
  "took" : 147,
  "timed_out": false,
  "updated": 120,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "total": 120,
  "failures" : [ ]
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]

////

[[docs-update-by-query-api-request]]
==== {api-request-title}

`POST /<target>/_update_by_query`

[[docs-update-by-query-api-desc]]
==== {api-description-title}

You can specify the query criteria in the request URI or the request body
using the same syntax as the <<search-search,Search API>>.

When you submit an update by query request, {es} gets a snapshot of the data stream or index
when it begins processing the request and updates matching documents using
`internal` versioning.
When the versions match, the document is updated and the version number is incremented.
If a document changes between the time that the snapshot is taken and
the update operation is processed, it results in a version conflict and the operation fails.
You can opt to count version conflicts instead of halting and returning by
setting `conflicts` to `proceed`.
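
You can also specify `conflicts` in the request body instead of as a query
parameter. For example, this sketch (reusing the example index, with a
`match_all` query chosen purely for illustration) proceeds past any version
conflicts it hits:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query
{
  "conflicts": "proceed",
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// TEST[setup:my_index]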

NOTE: Documents with a version equal to 0 cannot be updated using update by
query because `internal` versioning does not support 0 as a valid
version number.

While processing an update by query request, {es} performs multiple search
requests sequentially to find all of the matching documents.
A bulk update request is performed for each batch of matching documents.
Any query or update failures cause the update by query request to fail and
the failures are shown in the response.
Any update requests that completed successfully still stick; they are not rolled back.

===== Refreshing shards

Specifying the `refresh` parameter refreshes all shards once the request completes.
This is different from the update API's `refresh` parameter, which causes just the shard
that received the request to be refreshed. Unlike the update API, it does not support
`wait_for`.
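
For example, this sketch updates every document and refreshes all shards of
`my-index-000001` when the request finishes:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?refresh&conflicts=proceed
--------------------------------------------------
// TEST[setup:my_index]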

[[docs-update-by-query-task-api]]
===== Running update by query asynchronously

If the request contains `wait_for_completion=false`, {es}
performs some preflight checks, launches the request, and returns a
<<tasks,`task`>> you can use to cancel or get the status of the task.
{es} creates a record of this task as a document at `.tasks/task/${taskId}`.
When you are done with a task, you should delete the task document so
{es} can reclaim the space.
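
For example, this sketch launches the request in the background and
immediately returns a task ID that you can use with the task APIs described
below:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?conflicts=proceed&wait_for_completion=false
--------------------------------------------------
// TEST[setup:my_index]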

===== Waiting for active shards

`wait_for_active_shards` controls how many copies of a shard must be active
before proceeding with the request. See <<index-wait-for-active-shards>>
for details. `timeout` controls how long each write request waits for unavailable
shards to become available. Both work exactly the way they work in the
<<docs-bulk,Bulk API>>. Update by query uses scrolled searches, so you can also
specify the `scroll` parameter to control how long it keeps the search context
alive, for example `?scroll=10m`. The default is 5 minutes.
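
Combining these, a sketch that requires two active copies of each shard,
waits up to one minute for unavailable shards, and keeps the search context
alive for ten minutes (the values here are illustrative, not recommendations):

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?wait_for_active_shards=2&timeout=1m&scroll=10m
--------------------------------------------------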

===== Throttling update requests

To control the rate at which update by query issues batches of update operations,
you can set `requests_per_second` to any positive decimal number. This pads each
batch with a wait time to throttle the rate. Set `requests_per_second` to `-1`
to disable throttling.

Throttling uses a wait time between batches so that the internal scroll requests
can be given a timeout that takes the request padding into account. The padding
time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
`1000`, so if `requests_per_second` is set to `500`:

[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------

Since the batch is issued as a single `_bulk` request, large batch sizes
cause {es} to create many requests and wait before starting the next set.
This is "bursty" instead of "smooth".
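
For example, this sketch throttles the request to `500` requests per second:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?requests_per_second=500&conflicts=proceed
--------------------------------------------------
// TEST[setup:my_index]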

[[docs-update-by-query-slice]]
===== Slicing

Update by query supports <<slice-scroll, sliced scroll>> to parallelize the
update process. This can improve efficiency and provide a
convenient way to break the request down into smaller parts.

Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind
that:

* Query performance is most efficient when the number of `slices` is equal to
the number of shards in the index or backing index. If that number is large (for example,
500), choose a lower number as too many `slices` hurts performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.

* Update performance scales linearly across available resources with the
number of slices.

Whether query or update performance dominates the runtime depends on the
documents being reindexed and cluster resources.

[[docs-update-by-query-api-path-params]]
==== {api-path-parms-title}

`<target>`::
(Optional, string)
Comma-separated list of data streams, indices, and index aliases to search.
Wildcard (`*`) expressions are supported.
+
To search all data streams or indices in a cluster, omit this parameter or use
`_all` or `*`.

[[docs-update-by-query-api-query-params]]
==== {api-query-parms-title}

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=allow-no-indices]
+
Defaults to `true`.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=analyzer]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=analyze_wildcard]

`conflicts`::
(Optional, string) What to do if update by query hits version conflicts:
`abort` or `proceed`. Defaults to `abort`.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=default_operator]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=df]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]
+
Defaults to `open`.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailable]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=lenient]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=max_docs]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=pipeline]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=preference]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search-q]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=request_cache]

`refresh`::
(Optional, Boolean)
If `true`, {es} refreshes affected shards to make the operation visible to
search. Defaults to `false`.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=requests_per_second]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

`scroll`::
(Optional, <<time-units,time value>>)
Period to retain the <<scroll-search-context,search context>> for scrolling. See
<<scroll-search-results>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=scroll_size]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_type]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_timeout]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=slices]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sort]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_excludes]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=source_includes]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=stats]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=timeout]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=version]

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]

[[docs-update-by-query-api-request-body]]
==== {api-request-body-title}

`query`::
(Optional, <<query-dsl,query object>>) Specifies the documents to update
using the <<query-dsl,Query DSL>>.

[[docs-update-by-query-api-response-body]]
==== Response body

`took`::
The number of milliseconds from start to end of the whole operation.

`timed_out`::
This flag is set to `true` if any of the requests executed during the
update by query timed out.

`total`::
The number of documents that were successfully processed.

`updated`::
The number of documents that were successfully updated.

`deleted`::
The number of documents that were successfully deleted.

`batches`::
The number of scroll responses pulled back by the update by query.

`version_conflicts`::
The number of version conflicts that the update by query hit.

`noops`::
The number of documents that were ignored because the script used for
the update by query returned a `noop` value for `ctx.op`.

`retries`::
The number of retries attempted by update by query. `bulk` is the number of bulk
actions retried, and `search` is the number of search actions retried.

`throttled_millis`::
Number of milliseconds the request slept to conform to `requests_per_second`.

`requests_per_second`::
The number of requests per second effectively executed during the update by query.

`throttled_until_millis`::
This field should always be equal to zero in an `_update_by_query` response. It only
has meaning when using the <<docs-update-by-query-task-api, Task API>>, where it
indicates the next time (in milliseconds since epoch) a throttled request will be
executed again in order to conform to `requests_per_second`.

`failures`::
Array of failures if there were any unrecoverable errors during the process. If
this is non-empty then the request aborted because of those failures.
Update by query is implemented using batches. Any failure causes the entire
process to abort, but all failures in the current batch are collected into the
array. You can use the `conflicts` option to prevent update by query from aborting on
version conflicts.

[[docs-update-by-query-api-example]]
==== {api-examples-title}

The simplest usage of `_update_by_query` just performs an update on every
document in the data stream or index without changing the source. This is useful to
<<picking-up-a-new-property,pick up a new property>> or some other online
mapping change.

To update selected documents, specify a query in the request body:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?conflicts=proceed
{
  "query": { <1>
    "term": {
      "user.id": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[setup:my_index]

<1> The query must be passed as a value to the `query` key, in the same
way as the <<search-search,Search API>>. You can also use the `q`
parameter in the same way as the search API.
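
For example, this sketch expresses the same term query through the `q`
parameter:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?q=user.id:kimchy&conflicts=proceed
--------------------------------------------------
// TEST[setup:my_index]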

Update documents in multiple data streams or indices:

[source,console]
--------------------------------------------------
POST my-index-000001,my-index-000002/_update_by_query
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]

Limit the update by query operation to shards that match a particular routing value:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?routing=1
--------------------------------------------------
// TEST[setup:my_index]

By default update by query uses scroll batches of 1000.
You can change the batch size with the `scroll_size` parameter:

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query?scroll_size=100
--------------------------------------------------
// TEST[setup:my_index]

[[docs-update-by-query-api-source]]
===== Update the document source

Update by query supports scripts to update the document source.
For example, the following request increments the `count` field for all
documents with a `user.id` of `kimchy` in `my-index-000001`:

////
[source,console]
----
PUT my-index-000001/_create/1
{
  "user": {
    "id": "kimchy"
  },
  "count": 1
}
----
////

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]

Note that `conflicts=proceed` is not specified in this example. In this case, a
version conflict should halt the process so you can handle the failure.

As with the <<docs-update,Update API>>, you can set `ctx.op` to change the
operation that is performed:

[horizontal]
`noop`::
Set `ctx.op = "noop"` if your script decides that it doesn't have to make any changes.
The update by query operation skips updating the document and increments the `noop` counter.

`delete`::
Set `ctx.op = "delete"` if your script decides that the document should be deleted.
The update by query operation deletes the document and increments the `deleted` counter.

Update by query only supports `update`, `noop`, and `delete`.
Setting `ctx.op` to anything else is an error. Setting any other field in `ctx` is an error.
This API only enables you to modify the source of matching documents; you cannot move them.
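
Continuing the `count` example above, here is a sketch of a script that
deletes a document when its `count` has reached zero and skips it otherwise
(the threshold is illustrative):

[source,console]
--------------------------------------------------
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "if (ctx._source.count <= 0) { ctx.op = 'delete' } else { ctx.op = 'noop' }",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]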

[[docs-update-by-query-api-ingest-pipeline]]
===== Update documents using an ingest pipeline

Update by query can use the <<ingest>> feature by specifying a `pipeline`:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/set-foo
{
  "description" : "sets foo",
  "processors" : [ {
    "set" : {
      "field": "foo",
      "value": "bar"
    }
  } ]
}
POST my-index-000001/_update_by_query?pipeline=set-foo
--------------------------------------------------
// TEST[setup:my_index]


[discrete]
[[docs-update-by-query-fetch-tasks]]
===== Get the status of update by query operations

You can fetch the status of all running update by query requests with the
<<tasks,Task API>>:

[source,console]
--------------------------------------------------
GET _tasks?detailed=true&actions=*byquery
--------------------------------------------------
// TEST[skip:No tasks to retrieve]

The response looks like:

[source,console-result]
--------------------------------------------------
{
  "nodes" : {
    "r1A2WoRbTwKZ516z6NEs5A" : {
      "name" : "r1A2WoR",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "attributes" : {
        "testattr" : "test",
        "portsfile" : "true"
      },
      "tasks" : {
        "r1A2WoRbTwKZ516z6NEs5A:36619" : {
          "node" : "r1A2WoRbTwKZ516z6NEs5A",
          "id" : 36619,
          "type" : "transport",
          "action" : "indices:data/write/update/byquery",
          "status" : { <1>
            "total" : 6154,
            "updated" : 3500,
            "created" : 0,
            "deleted" : 0,
            "batches" : 4,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0
          },
          "description" : ""
        }
      }
    }
  }
}
--------------------------------------------------

<1> This object contains the actual status. It is just like the response JSON
with the important addition of the `total` field. `total` is the total number
of operations that the update by query expects to perform. You can estimate the
progress by adding the `updated`, `created`, and `deleted` fields. The request
will finish when their sum is equal to the `total` field.

With the task id you can look up the task directly. The following example
retrieves information about task `r1A2WoRbTwKZ516z6NEs5A:36619`:

[source,console]
--------------------------------------------------
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
--------------------------------------------------
// TEST[catch:missing]

The advantage of this API is that it integrates with `wait_for_completion=false`
to transparently return the status of completed tasks. If the task is completed
and `wait_for_completion=false` was set on it, then it'll come back with a
`results` or an `error` field. The cost of this feature is the document that
`wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
you to delete that document.


[discrete]
[[docs-update-by-query-cancel-task-api]]
===== Cancel an update by query operation

Any update by query can be cancelled using the <<tasks,Task Cancel API>>:

[source,console]
--------------------------------------------------
POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
--------------------------------------------------

The task ID can be found using the <<tasks,tasks API>>.

Cancellation should happen quickly but might take a few seconds. The task status
API above will continue to list the update by query task until this task checks
that it has been cancelled and terminates itself.


[discrete]
[[docs-update-by-query-rethrottle]]
===== Change throttling for a request

The value of `requests_per_second` can be changed on a running update by query
using the `_rethrottle` API:

[source,console]
--------------------------------------------------
POST _update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
--------------------------------------------------

The task ID can be found using the <<tasks, tasks API>>.

Just like when setting it on the `_update_by_query` API, `requests_per_second`
can be either `-1` to disable throttling or any decimal number
like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
query takes effect immediately, but rethrottling that slows down the query will
take effect after completing the current batch. This prevents scroll
timeouts.

[discrete]
[[docs-update-by-query-manual-slice]]
===== Slice manually

Slice an update by query manually by providing a slice id and total number of
slices to each request:

[source,console]
----------------------------------------------------------------
POST my-index-000001/_update_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}
POST my-index-000001/_update_by_query
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}
----------------------------------------------------------------
// TEST[setup:my_index_big]

Which you can verify works with:

[source,console]
----------------------------------------------------------------
GET _refresh
POST my-index-000001/_search?size=0&q=extra:test&filter_path=hits.total
----------------------------------------------------------------
// TEST[continued]

Which results in a sensible `total` like this one:

[source,console-result]
----------------------------------------------------------------
{
  "hits": {
    "total": {
      "value": 120,
      "relation": "eq"
    }
  }
}
----------------------------------------------------------------

[discrete]
[[docs-update-by-query-automatic-slice]]
===== Use automatic slicing

You can also let update by query automatically parallelize using
<<slice-scroll>> to slice on `_id`. Use `slices` to specify the number of
slices to use:

[source,console]
----------------------------------------------------------------
POST my-index-000001/_update_by_query?refresh&slices=5
{
  "script": {
    "source": "ctx._source['extra'] = 'test'"
  }
}
----------------------------------------------------------------
// TEST[setup:my_index_big]

Which you also can verify works with:

[source,console]
----------------------------------------------------------------
POST my-index-000001/_search?size=0&q=extra:test&filter_path=hits.total
----------------------------------------------------------------
// TEST[continued]

Which results in a sensible `total` like this one:

[source,console-result]
----------------------------------------------------------------
{
  "hits": {
    "total": {
      "value": 120,
      "relation": "eq"
    }
  }
}
----------------------------------------------------------------

Setting `slices` to `auto` will let {es} choose the number of slices
to use. This setting will use one slice per shard, up to a certain limit. If
there are multiple source data streams or indices, it will choose the number of slices based
on the index or backing index with the smallest number of shards.

Adding `slices` to `_update_by_query` just automates the manual process used in
the section above, creating sub-requests which means it has some quirks:

* You can see these requests in the
<<docs-update-by-query-task-api,Tasks APIs>>. These sub-requests are "child"
tasks of the task for the request with `slices`.
* Fetching the status of the task for the request with `slices` only contains
the status of completed slices.
* These sub-requests are individually addressable for things like cancellation
and rethrottling.
* Rethrottling the request with `slices` will rethrottle the unfinished
sub-requests proportionally.
* Canceling the request with `slices` will cancel each sub-request.
* Due to the nature of `slices` each sub-request won't get a perfectly even
portion of the documents. All documents will be addressed, but some slices may
be larger than others. Expect larger slices to have a more even distribution.
* Parameters like `requests_per_second` and `max_docs` on a request with
`slices` are distributed proportionally to each sub-request. Combine that with
the point above about distribution being uneven and you should conclude that
using `max_docs` with `slices` might not result in exactly `max_docs` documents
being updated.
* Each sub-request gets a slightly different snapshot of the source data stream or index,
though these are all taken at approximately the same time.

[discrete]
[[picking-up-a-new-property]]
===== Pick up a new property

Say you created an index without dynamic mapping, filled it with data, and then
added a mapping value to pick up more fields from the data:

[source,console]
--------------------------------------------------
PUT test
{
  "mappings": {
    "dynamic": false, <1>
    "properties": {
      "text": {"type": "text"}
    }
  }
}

POST test/_doc?refresh
{
  "text": "words words",
  "flag": "bar"
}
POST test/_doc?refresh
{
  "text": "words words",
  "flag": "foo"
}
PUT test/_mapping <2>
{
  "properties": {
    "text": {"type": "text"},
    "flag": {"type": "text", "analyzer": "keyword"}
  }
}
--------------------------------------------------

<1> This means that new fields won't be indexed, just stored in `_source`.

<2> This updates the mapping to add the new `flag` field. To pick up the new
field you have to reindex all documents with it.

Searching for the data won't find anything:

[source,console]
--------------------------------------------------
POST test/_search?filter_path=hits.total
{
  "query": {
    "match": {
      "flag": "foo"
    }
  }
}
--------------------------------------------------
// TEST[continued]

[source,console-result]
--------------------------------------------------
{
  "hits" : {
    "total": {
      "value": 0,
      "relation": "eq"
    }
  }
}
--------------------------------------------------

But you can issue an `_update_by_query` request to pick up the new mapping:

[source,console]
--------------------------------------------------
POST test/_update_by_query?refresh&conflicts=proceed
POST test/_search?filter_path=hits.total
{
  "query": {
    "match": {
      "flag": "foo"
    }
  }
}
--------------------------------------------------
// TEST[continued]

[source,console-result]
--------------------------------------------------
{
  "hits" : {
    "total": {
      "value": 1,
      "relation": "eq"
    }
  }
}
--------------------------------------------------

You can do the exact same thing when adding a field to a multifield.