Edits to text in Delete By Query API doc (#39017)
This commit is contained in:
parent 1037aa0665
commit 2f520c663c
@@ -2,7 +2,7 @@
 == Delete By Query API
 
 The simplest usage of `_delete_by_query` just performs a deletion on every
-document that match a query. Here is the API:
+document that matches a query. Here is the API:
 
 [source,js]
 --------------------------------------------------
@@ -20,7 +20,7 @@ POST twitter/_delete_by_query
 
 <1> The query must be passed as a value to the `query` key, in the same
 way as the <<search-search,Search API>>. You can also use the `q`
-parameter in the same way as the search api.
+parameter in the same way as the search API.
 
 That will return something like this:
 
@@ -68,7 +68,7 @@ these documents. In case a search or bulk request got rejected, `_delete_by_quer
 failures that are returned by the failing bulk request are returned in the `failures`
 element; therefore it's possible for there to be quite a few failed entities.
 
-If you'd like to count version conflicts rather than cause them to abort then
+If you'd like to count version conflicts rather than cause them to abort, then
 set `conflicts=proceed` on the url or `"conflicts": "proceed"` in the request body.
 
 Back to the API format, this will delete tweets from the `twitter` index:
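The `conflicts` behavior the hunk above describes can be sketched in a few lines. This is illustrative pseudologic only, not Elasticsearch's implementation: `handle_batch`, the shape of `bulk_results`, and the use of HTTP status 409 as the conflict marker are assumptions for the sketch.

```python
def handle_batch(bulk_results, conflicts="abort"):
    """Sketch: with conflicts="proceed", version conflicts are
    counted and skipped; otherwise the first one aborts the run.
    Not Elasticsearch code -- an illustration of the documented
    semantics only."""
    version_conflicts = 0
    for result in bulk_results:
        if result.get("status") == 409:  # 409 = version conflict
            if conflicts == "proceed":
                version_conflicts += 1   # record it, keep deleting
            else:
                raise RuntimeError("version conflict; request aborts")
    return version_conflicts
```

With `conflicts="proceed"` a batch containing one conflict returns a count of 1; with the default, the same batch raises and the request stops.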
@@ -140,14 +140,14 @@ POST twitter/_delete_by_query?scroll_size=5000
 [float]
 === URL Parameters
 
-In addition to the standard parameters like `pretty`, the Delete By Query API
-also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`
+In addition to the standard parameters like `pretty`, the delete by query API
+also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`,
 and `scroll`.
 
 Sending the `refresh` will refresh all shards involved in the delete by query
-once the request completes. This is different than the Delete API's `refresh`
+once the request completes. This is different than the delete API's `refresh`
 parameter which causes just the shard that received the delete request
-to be refreshed. Also unlike the Delete API it does not support `wait_for`.
+to be refreshed. Also unlike the delete API it does not support `wait_for`.
 
 If the request contains `wait_for_completion=false` then Elasticsearch will
 perform some preflight checks, launch the request, and then return a `task`
@@ -163,10 +163,10 @@ for details. `timeout` controls how long each write request waits for unavailabl
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>. As `_delete_by_query` uses scroll search, you can also specify
 the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+e.g. `?scroll=10m`. By default it's 5 minutes.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which `_delete_by_query` issues batches of
+`1000`, etc.) and throttles the rate at which delete by query issues batches of
 delete operations by padding each batch with a wait time. The throttling can be
 disabled by setting `requests_per_second` to `-1`.
 
@@ -182,7 +182,7 @@ target_time = 1000 / 500 per second = 2 seconds
 wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
 --------------------------------------------------
 
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
 cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
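The `target_time` / `wait_time` formula in the hunk above can be checked with a small sketch. The function name and the clamp to zero are assumptions for illustration; only the arithmetic comes from the documented example.

```python
def throttle_wait(batch_size, requests_per_second, write_time):
    """Sketch of the per-batch padding described by the docs'
    target_time / wait_time formula. -1 disables throttling.
    Clamping at zero (when the write already took longer than the
    target) is an assumption of this sketch."""
    if requests_per_second == -1:
        return 0.0
    target_time = batch_size / requests_per_second  # seconds the batch should span
    return max(target_time - write_time, 0.0)

# Worked example from the docs: 1000 docs at 500 req/s with a
# 0.5 s write -> 2.0 s target, so 1.5 s of padding.
```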
@@ -259,13 +259,13 @@ The number of version conflicts that the delete by query hit.
 `noops`::
 
 This field is always equal to zero for delete by query. It only exists
-so that delete by query, update by query and reindex APIs return responses
+so that delete by query, update by query, and reindex APIs return responses
 with the same structure.
 
 `retries`::
 
 The number of retries attempted by delete by query. `bulk` is the number
-of bulk actions retried and `search` is the number of search actions retried.
+of bulk actions retried, and `search` is the number of search actions retried.
 
 `throttled_millis`::
 
@@ -286,7 +286,7 @@ executed again in order to conform to `requests_per_second`.
 
 Array of failures if there were any unrecoverable errors during the process. If
 this is non-empty then the request aborted because of those failures.
-Delete-by-query is implemented using batches and any failure causes the entire
+Delete by query is implemented using batches, and any failure causes the entire
 process to abort but all failures in the current batch are collected into the
 array. You can use the `conflicts` option to prevent reindex from aborting on
 version conflicts.
@@ -296,7 +296,7 @@ version conflicts.
 [[docs-delete-by-query-task-api]]
 === Works with the Task API
 
-You can fetch the status of any running delete-by-query requests with the
+You can fetch the status of any running delete by query requests with the
 <<tasks,Task API>>:
 
 [source,js]
@@ -306,7 +306,7 @@ GET _tasks?detailed=true&actions=*/delete/byquery
 // CONSOLE
 // TEST[skip:No tasks to retrieve]
 
-The responses looks like:
+The response looks like:
 
 [source,js]
 --------------------------------------------------
@@ -346,7 +346,7 @@ The responses looks like:
 }
 --------------------------------------------------
 // TESTRESPONSE
-<1> this object contains the actual status. It is just like the response json
+<1> This object contains the actual status. It is just like the response JSON
 with the important addition of the `total` field. `total` is the total number
 of operations that the reindex expects to perform. You can estimate the
 progress by adding the `updated`, `created`, and `deleted` fields. The request
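The progress estimate the callout describes is a one-line calculation. The function name and the sample numbers below are hypothetical; only the field names (`updated`, `created`, `deleted`, `total`) come from the documented status object.

```python
def estimate_progress(status):
    """Sketch of the progress estimate the docs suggest: the sum of
    updated + created + deleted over total. `status` stands in for
    the task status object returned by the tasks API."""
    done = status["updated"] + status["created"] + status["deleted"]
    return done / status["total"]
```

For a hypothetical status of `{"updated": 0, "created": 0, "deleted": 3500, "total": 6154}`, this reports roughly 57% complete.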
@@ -373,7 +373,7 @@ you to delete that document.
 [[docs-delete-by-query-cancel-task-api]]
 === Works with the Cancel Task API
 
-Any Delete By Query can be canceled using the <<tasks,task cancel API>>:
+Any delete by query can be canceled using the <<tasks,task cancel API>>:
 
 [source,js]
 --------------------------------------------------
@@ -403,26 +403,26 @@ POST _delete_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_seco
 
 The task ID can be found using the <<tasks,tasks API>>.
 
-Just like when setting it on the `_delete_by_query` API `requests_per_second`
+Just like when setting it on the delete by query API, `requests_per_second`
 can be either `-1` to disable throttling or any decimal number
 like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
 query takes effect immediately but rethrotting that slows down the query will
-take effect on after completing the current batch. This prevents scroll
+take effect after completing the current batch. This prevents scroll
 timeouts.
 
 [float]
 [[docs-delete-by-query-slice]]
 === Slicing
 
-Delete-by-query supports <<sliced-scroll>> to parallelize the deleting process.
+Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the deleting process.
 This parallelization can improve efficiency and provide a convenient way to
 break the request down into smaller parts.
 
 [float]
 [[docs-delete-by-query-manual-slice]]
-==== Manually slicing
+==== Manual slicing
 
-Slice a delete-by-query manually by providing a slice id and total number of
+Slice a delete by query manually by providing a slice id and total number of
 slices to each request:
 
 [source,js]
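The idea behind manual slicing, as the hunk above puts it, is that each request carries a slice id and a total slice count and sees only its share of the documents. The stand-in below is an assumption-laden sketch: real sliced scroll partitions on internal values, not a CRC of the id, and `in_slice` is a name invented here.

```python
import zlib

def in_slice(doc_id, slice_id, max_slices):
    # Stand-in partitioning (not how Elasticsearch slices): a stable
    # hash of the id, reduced modulo the slice count, decides which
    # sub-request owns the document.
    return zlib.crc32(doc_id.encode()) % max_slices == slice_id

docs = [f"doc-{i}" for i in range(100)]
slices = [[d for d in docs if in_slice(d, s, 3)] for s in range(3)]
# Every document lands in exactly one slice, so the three
# sub-requests together cover the whole set without overlap.
```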
@@ -498,7 +498,7 @@ Which results in a sensible `total` like this one:
 ==== Automatic slicing
 
 You can also let delete-by-query automatically parallelize using
-<<sliced-scroll>> to slice on `_id`. Use `slices` to specify the number of
+<<sliced-scroll, sliced scroll>> to slice on `_id`. Use `slices` to specify the number of
 slices to use:
 
 [source,js]
@@ -575,8 +575,8 @@ be larger than others. Expect larger slices to have a more even distribution.
 are distributed proportionally to each sub-request. Combine that with the point
 above about distribution being uneven and you should conclude that the using
 `size` with `slices` might not result in exactly `size` documents being
-`_delete_by_query`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+deleted.
+* Each sub-request gets a slightly different snapshot of the source index
 though these are all taken at approximately the same time.
 
 [float]
@@ -588,8 +588,8 @@ number for most indices. If you're slicing manually or otherwise tuning
 automatic slicing, use these guidelines.
 
 Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (for example,
+500), choose a lower number as too many `slices` will hurt performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
 
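The guideline in the final hunk (match the shard count, but back off when it is very large) can be written as a tiny helper. Both the function and the `ceiling` value are assumptions for illustration; the docs give 500 as an example of "large" but name no specific cutoff.

```python
def choose_slices(shard_count, ceiling=128):
    """Sketch of the documented guideline: use one slice per shard,
    but fall back to a lower number for indices with very many
    shards. The ceiling of 128 is an arbitrary illustrative value,
    not an Elasticsearch setting."""
    return min(shard_count, ceiling)
```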