Edits to text in Delete By Query API doc (#39017)

Darren Meiss 2019-02-20 04:39:00 -05:00 committed by Daniel Mitterdorfer
parent 1037aa0665
commit 2f520c663c
1 changed file with 27 additions and 27 deletions


@@ -2,7 +2,7 @@
== Delete By Query API
The simplest usage of `_delete_by_query` just performs a deletion on every
-document that match a query. Here is the API:
+document that matches a query. Here is the API:
[source,js]
--------------------------------------------------
@@ -20,7 +20,7 @@ POST twitter/_delete_by_query
<1> The query must be passed as a value to the `query` key, in the same
way as the <<search-search,Search API>>. You can also use the `q`
-parameter in the same way as the search api.
+parameter in the same way as the search API.
That will return something like this:
@@ -68,7 +68,7 @@ these documents. In case a search or bulk request got rejected, `_delete_by_quer
failures that are returned by the failing bulk request are returned in the `failures`
element; therefore it's possible for there to be quite a few failed entities.
-If you'd like to count version conflicts rather than cause them to abort then
+If you'd like to count version conflicts rather than cause them to abort, then
set `conflicts=proceed` on the URL or `"conflicts": "proceed"` in the request body.
Back to the API format, this will delete tweets from the `twitter` index:
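[source,js]
--------------------------------------------------
// A sketch of the snippet elided between hunks: delete every matching
// document, proceeding past version conflicts instead of aborting.
POST twitter/_delete_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// CONSOLE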
@@ -140,14 +140,14 @@ POST twitter/_delete_by_query?scroll_size=5000
[float]
=== URL Parameters
-In addition to the standard parameters like `pretty`, the Delete By Query API
-also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`
+In addition to the standard parameters like `pretty`, the delete by query API
+also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`,
and `scroll`.
Sending the `refresh` will refresh all shards involved in the delete by query
-once the request completes. This is different than the Delete API's `refresh`
+once the request completes. This is different than the delete API's `refresh`
parameter which causes just the shard that received the delete request
-to be refreshed. Also unlike the Delete API it does not support `wait_for`.
+to be refreshed. Also unlike the delete API it does not support `wait_for`.
If the request contains `wait_for_completion=false` then Elasticsearch will
perform some preflight checks, launch the request, and then return a `task`
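As a sketch, launching the request asynchronously looks like this (the
response carries a `task` ID that you can poll, cancel, or rethrottle):
[source,js]
--------------------------------------------------
POST twitter/_delete_by_query?wait_for_completion=false
{
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// CONSOLE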
@@ -163,10 +163,10 @@ for details. `timeout` controls how long each write request waits for unavailabl
shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>. As `_delete_by_query` uses scroll search, you can also specify
the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+e.g. `?scroll=10m`. By default it's 5 minutes.
`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which `_delete_by_query` issues batches of
+`1000`, etc.) and throttles the rate at which delete by query issues batches of
delete operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.
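For example, a sketch combining both knobs (the parameter values here are
illustrative, not recommendations):
[source,js]
--------------------------------------------------
POST twitter/_delete_by_query?requests_per_second=500&scroll=10m
{
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// CONSOLE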
@@ -182,7 +182,7 @@ target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
@@ -259,13 +259,13 @@ The number of version conflicts that the delete by query hit.
`noops`::
This field is always equal to zero for delete by query. It only exists
-so that delete by query, update by query and reindex APIs return responses
+so that delete by query, update by query, and reindex APIs return responses
with the same structure.
`retries`::
The number of retries attempted by delete by query. `bulk` is the number
-of bulk actions retried and `search` is the number of search actions retried.
+of bulk actions retried, and `search` is the number of search actions retried.
`throttled_millis`::
@@ -286,7 +286,7 @@ executed again in order to conform to `requests_per_second`.
Array of failures if there were any unrecoverable errors during the process. If
this is non-empty then the request aborted because of those failures.
-Delete-by-query is implemented using batches and any failure causes the entire
+Delete by query is implemented using batches, and any failure causes the entire
process to abort but all failures in the current batch are collected into the
array. You can use the `conflicts` option to prevent reindex from aborting on
version conflicts.
@@ -296,7 +296,7 @@ version conflicts.
[[docs-delete-by-query-task-api]]
=== Works with the Task API
-You can fetch the status of any running delete-by-query requests with the
+You can fetch the status of any running delete by query requests with the
<<tasks,Task API>>:
[source,js]
@@ -306,7 +306,7 @@ GET _tasks?detailed=true&actions=*/delete/byquery
// CONSOLE
// TEST[skip:No tasks to retrieve]
-The responses looks like:
+The response looks like:
[source,js]
--------------------------------------------------
@@ -346,7 +346,7 @@ The responses looks like:
}
--------------------------------------------------
// TESTRESPONSE
-<1> this object contains the actual status. It is just like the response json
+<1> This object contains the actual status. It is just like the response JSON
with the important addition of the `total` field. `total` is the total number
of operations that the reindex expects to perform. You can estimate the
progress by adding the `updated`, `created`, and `deleted` fields. The request
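You can also look a single task up directly by its ID; a sketch, reusing the
illustrative task ID from the rethrottle example below:
[source,js]
--------------------------------------------------
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
--------------------------------------------------
// CONSOLE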
@@ -373,7 +373,7 @@ you to delete that document.
[[docs-delete-by-query-cancel-task-api]]
=== Works with the Cancel Task API
-Any Delete By Query can be canceled using the <<tasks,task cancel API>>:
+Any delete by query can be canceled using the <<tasks,task cancel API>>:
[source,js]
--------------------------------------------------
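// A sketch of the elided snippet: cancel the running delete by query via its
// task ID (the same illustrative ID used by the rethrottle call below).
POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
--------------------------------------------------
// CONSOLE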
@@ -403,26 +403,26 @@ POST _delete_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_seco
The task ID can be found using the <<tasks,tasks API>>.
-Just like when setting it on the `_delete_by_query` API `requests_per_second`
+Just like when setting it on the delete by query API, `requests_per_second`
can be either `-1` to disable throttling or any decimal number
like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
query takes effect immediately but rethrottling that slows down the query will
-take effect on after completing the current batch. This prevents scroll
+take effect after completing the current batch. This prevents scroll
timeouts.
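Spelled out in full, the rethrottle call from the hunk header above might look
like this sketch (`-1` removes the throttle entirely):
[source,js]
--------------------------------------------------
POST _delete_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
--------------------------------------------------
// CONSOLE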
[float]
[[docs-delete-by-query-slice]]
=== Slicing
-Delete-by-query supports <<sliced-scroll>> to parallelize the deleting process.
+Delete by query supports <<sliced-scroll, sliced scroll>> to parallelize the deleting process.
This parallelization can improve efficiency and provide a convenient way to
break the request down into smaller parts.
[float]
[[docs-delete-by-query-manual-slice]]
-==== Manually slicing
+==== Manual slicing
-Slice a delete-by-query manually by providing a slice id and total number of
+Slice a delete by query manually by providing a slice id and total number of
slices to each request:
[source,js]
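--------------------------------------------------
// A sketch of the elided snippet: this request handles slice 0 of 2;
// a second request with "id": 1 would cover the other half.
POST twitter/_delete_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "range": {
      "likes": {
        "lt": 10
      }
    }
  }
}
--------------------------------------------------
// CONSOLE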
@@ -498,7 +498,7 @@ Which results in a sensible `total` like this one:
==== Automatic slicing
You can also let delete-by-query automatically parallelize using
-<<sliced-scroll>> to slice on `_id`. Use `slices` to specify the number of
+<<sliced-scroll, sliced scroll>> to slice on `_id`. Use `slices` to specify the number of
slices to use:
[source,js]
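--------------------------------------------------
// A sketch of the elided snippet: five slices run in parallel, and
// `refresh` makes the deletions visible once the request finishes.
POST twitter/_delete_by_query?slices=5&refresh
{
  "query": {
    "range": {
      "likes": {
        "lt": 10
      }
    }
  }
}
--------------------------------------------------
// CONSOLE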
@@ -575,8 +575,8 @@ be larger than others. Expect larger slices to have a more even distribution.
are distributed proportionally to each sub-request. Combine that with the point
above about distribution being uneven and you should conclude that using
`size` with `slices` might not result in exactly `size` documents being
-`_delete_by_query`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+deleted.
+* Each sub-request gets a slightly different snapshot of the source index
though these are all taken at approximately the same time.
[float]
@@ -588,8 +588,8 @@ number for most indices. If you're slicing manually or otherwise tuning
automatic slicing, use these guidelines.
Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (for example,
+500), choose a lower number as too many `slices` will hurt performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
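If you'd rather not pick a number yourself, `slices` can also be set to
`auto` to let Elasticsearch choose a reasonable value, as in this sketch:
[source,js]
--------------------------------------------------
POST twitter/_delete_by_query?slices=auto
{
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// CONSOLE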