Further improve docs for requests_per_second

In #26185 we made the description of `requests_per_second` sane
for reindex. This improves on the description by using some more
common vocabulary ("batch size", etc) and improving the formatting
of the example calculation so it stands out and doesn't require
scrolling.
Nik Everett 2017-08-15 15:53:29 -04:00
parent dd4f7eee22
commit 5ea6f90968
3 changed files with 53 additions and 27 deletions


@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.
`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of requests per second that the delete-by-query
issues or it can be set to `-1` to disabled throttling. The throttling is done
waiting between bulk batches so that it can manipulate the scroll timeout. The
wait time is the difference between the time it took the batch to complete and
the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
broken into multiple bulk requests large batch sizes will cause Elasticsearch
to create many requests and then wait for a while before starting the next set.
This is "bursty" instead of "smooth". The default is `-1`.
`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches of
delete operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.
The throttling is done by waiting between batches so that the scroll that
`_delete_by_query` uses internally can be given a timeout that takes into
account the padding. The padding time is the difference between the batch size
divided by the `requests_per_second` and the time spent writing. By default the
batch size is `1000`, so if the `requests_per_second` is set to `500`:
[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------
Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
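
The example calculation above can also be written as a small function. This is only a sketch of the arithmetic described in the text, not the Elasticsearch implementation: the name `padding_seconds` and its parameters are hypothetical, and clamping the wait at zero when the write takes longer than the target time is an assumption.

```python
def padding_seconds(batch_size, requests_per_second, write_seconds):
    """Return the wait time (in seconds) added after one batch.

    Hypothetical helper that mirrors the documented formula:
        target_time = batch_size / requests_per_second
        wait_time   = target_time - write_time
    """
    # -1 disables throttling, so no padding is added.
    if requests_per_second == -1:
        return 0.0
    target_seconds = batch_size / requests_per_second
    # Assumption: a batch that already took longer than the target
    # is not padded at all (the wait never goes negative).
    return max(0.0, target_seconds - write_seconds)


# Default batch size of 1000, throttled to 500 requests per second,
# with half a second spent writing the batch.
wait = padding_seconds(1000, 500, 0.5)
print(wait)
```

With the values from the example (`1000`, `500`, `.5 seconds`) this gives the same `1.5 seconds` of padding.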
[float]
=== Response body


@@ -534,20 +534,24 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.
`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of batches that the reindex issues by
padding each batch with a wait time. The throttling can be disabled by
setting `requests_per_second` to `-1`.
`1000`, etc) and throttles the rate at which reindex issues batches of index
operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.
The throttling is done waiting between bulk batches so that it can manipulate the
scroll timeout. The wait time is the difference between the request scroll search
size divided by the `requests_per_second` and the `batch_write_time`. By default
the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
The throttling is done by waiting between batches so that the scroll that reindex
uses internally can be given a timeout that takes into account the padding.
The padding time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
`1000`, so if the `requests_per_second` is set to `500`:
`target_total_time` = `1000` / `500 per second` = `2 seconds` +
`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------
Since the batch isn't broken into multiple bulk requests large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
[float]


@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>.
`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
`1000`, etc) and throttles the number of requests per second that the update-by-query
issues or it can be set to `-1` to disabled throttling. The throttling is done
waiting between bulk batches so that it can manipulate the scroll timeout. The
wait time is the difference between the time it took the batch to complete and
the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
broken into multiple bulk requests large batch sizes will cause Elasticsearch
to create many requests and then wait for a while before starting the next set.
This is "bursty" instead of "smooth". The default is `-1`.
`1000`, etc) and throttles the rate at which `_update_by_query` issues batches of
index operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.
The throttling is done by waiting between batches so that the scroll that
`_update_by_query` uses internally can be given a timeout that takes into
account the padding. The padding time is the difference between the batch size
divided by the `requests_per_second` and the time spent writing. By default the
batch size is `1000`, so if the `requests_per_second` is set to `500`:
[source,txt]
--------------------------------------------------
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------
Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
[float]
[[docs-update-by-query-response-body]]