From 5ea6f90968512da90862c41f42b96a15a60ce198 Mon Sep 17 00:00:00 2001
From: Nik Everett
Date: Tue, 15 Aug 2017 15:53:29 -0400
Subject: [PATCH] Further improve docs for requests_per_second

In #26185 we made the description of `requests_per_second` sane for
reindex. This improves on the description by using some more common
vocabulary ("batch size", etc) and improving the formatting of the
example calculation so it stands out and doesn't require scrolling.
---
 docs/reference/docs/delete-by-query.asciidoc | 27 ++++++++++++++------
 docs/reference/docs/reindex.asciidoc         | 26 +++++++++++--------
 docs/reference/docs/update-by-query.asciidoc | 27 ++++++++++++++------
 3 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/docs/reference/docs/delete-by-query.asciidoc b/docs/reference/docs/delete-by-query.asciidoc
index b2a59231d34..6db27698245 100644
--- a/docs/reference/docs/delete-by-query.asciidoc
+++ b/docs/reference/docs/delete-by-query.asciidoc
@@ -164,14 +164,25 @@ shards to become available. Both work exactly how they work in the
 <>.

 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the delete-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_delete_by_query` issues batches of
+delete operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_delete_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.

 [float]
 === Response body

diff --git a/docs/reference/docs/reindex.asciidoc b/docs/reference/docs/reindex.asciidoc
index a6e00e2d100..817c676a72c 100644
--- a/docs/reference/docs/reindex.asciidoc
+++ b/docs/reference/docs/reindex.asciidoc
@@ -534,20 +534,24 @@ shards to become available. Both work exactly how they work in the
 <>.

 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of batches that the reindex issues by
-padding each batch with a wait time. The throttling can be disabled by
-setting `requests_per_second` to `-1`.
+`1000`, etc) and throttles the rate at which reindex issues batches of index
+operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.

-The throttling is done waiting between bulk batches so that it can manipulate the
-scroll timeout. The wait time is the difference between the request scroll search
-size divided by the `requests_per_second` and the `batch_write_time`. By default
-the scroll batch size is `1000`, so if the `requests_per_second` is set to `500`:
+The throttling is done by waiting between batches so that the scroll that reindex
+uses internally can be given a timeout that takes into account the padding.
+The padding time is the difference between the batch size divided by the
+`requests_per_second` and the time spent writing. By default the batch size is
+`1000`, so if the `requests_per_second` is set to `500`:

-`target_total_time` = `1000` / `500 per second` = `2 seconds` +
-`wait_time` = `target_total_time` - `batch_write_time` = `2 seconds` - `.5 seconds` = `1.5 seconds`
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------

-Since the batch isn't broken into multiple bulk requests large batch sizes will
-cause Elasticsearch to create many requests and then wait for a while before
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.

 [float]

diff --git a/docs/reference/docs/update-by-query.asciidoc b/docs/reference/docs/update-by-query.asciidoc
index 2597fd28cb8..6b25b693f10 100644
--- a/docs/reference/docs/update-by-query.asciidoc
+++ b/docs/reference/docs/update-by-query.asciidoc
@@ -221,14 +221,25 @@ shards to become available. Both work exactly how they work in the
 <>.

 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles the number of requests per second that the update-by-query
-issues or it can be set to `-1` to disabled throttling. The throttling is done
-waiting between bulk batches so that it can manipulate the scroll timeout. The
-wait time is the difference between the time it took the batch to complete and
-the time `requests_per_second * requests_in_the_batch`. Since the batch isn't
-broken into multiple bulk requests large batch sizes will cause Elasticsearch
-to create many requests and then wait for a while before starting the next set.
-This is "bursty" instead of "smooth". The default is `-1`.
+`1000`, etc) and throttles the rate at which `_update_by_query` issues batches of
+index operations by padding each batch with a wait time. The throttling can be
+disabled by setting `requests_per_second` to `-1`.
+
+The throttling is done by waiting between batches so that the scroll that
+`_update_by_query` uses internally can be given a timeout that takes into
+account the padding. The padding time is the difference between the batch size
+divided by the `requests_per_second` and the time spent writing. By default the
+batch size is `1000`, so if the `requests_per_second` is set to `500`:
+
+[source,txt]
+--------------------------------------------------
+target_time = 1000 / 500 per second = 2 seconds
+wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
+--------------------------------------------------
+
+Since the batch is issued as a single `_bulk` request, large batch sizes will
+cause Elasticsearch to create many requests and then wait for a while before
+starting the next set. This is "bursty" instead of "smooth". The default is `-1`.

 [float]
 [[docs-update-by-query-response-body]]
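Note for reviewers: the wait-time arithmetic described in all three files can be sketched in a few lines of Python. This is only an illustration of the formula in the docs; the function and variable names (`throttle_wait`, `write_time`, etc) are made up for this sketch and are not Elasticsearch API identifiers.

```python
def throttle_wait(batch_size, requests_per_second, write_time):
    """Seconds to pad after a batch, per the docs' formula.

    A non-positive requests_per_second (e.g. -1) disables throttling.
    """
    if requests_per_second <= 0:
        return 0.0
    # target_time is how long the batch *should* take at the requested rate.
    target_time = batch_size / requests_per_second
    # Pad only by the time left over after the actual write; never wait a
    # negative amount if writing already took longer than the target.
    return max(target_time - write_time, 0.0)

# Worked example from the docs: batch size 1000, requests_per_second=500,
# the batch took .5 seconds to write -> pad with 1.5 seconds.
print(throttle_wait(1000, 500, 0.5))  # -> 1.5
```

Because the whole batch is sent as one `_bulk` request and the padding happens between batches, the sketch also makes the "bursty" behavior visible: a large `batch_size` produces a long `target_time` and therefore one big burst of work followed by one long wait.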