[Docs] Spelling and grammar changes to reindex.asciidoc (#29232)

Andrew Banchich 2018-03-27 06:16:18 -04:00 committed by Christoph Büscher
parent 0ac89a32cc
commit d2baf4b191
1 changed file with 62 additions and 62 deletions


@@ -136,7 +136,7 @@ POST _reindex
// TEST[setup:twitter]
You can limit the documents by adding a type to the `source` or by adding a
-query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
+query. This will only copy tweets made by `kimchy` into `new_twitter`:
[source,js]
--------------------------------------------------
@@ -161,11 +161,13 @@ POST _reindex
`index` and `type` in `source` can both be lists, allowing you to copy from
lots of sources in one request. This will copy documents from the `_doc` and
-`post` types in the `twitter` and `blog` index. It'd include the `post` type in
-the `twitter` index and the `_doc` type in the `blog` index. If you want to be
-more specific you'll need to use the `query`. It also makes no effort to handle
-ID collisions. The target index will remain valid but it's not easy to predict
-which document will survive because the iteration order isn't well defined.
+`post` types in the `twitter` and `blog` index. The copied documents would include the
+`post` type in the `twitter` index and the `_doc` type in the `blog` index. To be more
+specific, use a `query`.
+The Reindex API makes no effort to handle ID collisions. If IDs collide, the target index
+will remain valid, but it's not easy to predict which document will survive because
+the iteration order isn't well defined.
[source,js]
--------------------------------------------------
@@ -203,8 +205,8 @@ POST _reindex
// CONSOLE
// TEST[setup:twitter]
-If you want a particular set of documents from the twitter index you'll
-need to sort. Sorting makes the scroll less efficient but in some contexts
+If you want a particular set of documents from the `twitter` index you'll
+need to use `sort`. Sorting makes the scroll less efficient but in some contexts
it's worth it. If possible, prefer a more selective query to `size` and `sort`.
This will copy 10000 documents from `twitter` into `new_twitter`:
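A minimal sketch of that request (the body itself falls outside this hunk; the `date` sort field is an assumption):
[source,js]
--------------------------------------------------
POST _reindex
{
  "size": 10000,
  "source": {
    "index": "twitter",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "new_twitter"
  }
}
--------------------------------------------------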
@@ -226,8 +228,8 @@ POST _reindex
// TEST[setup:twitter]
The `source` section supports all the elements that are supported in a
-<<search-request-body,search request>>. For instance only a subset of the
-fields from the original documents can be reindexed using source filtering
+<<search-request-body,search request>>. For instance, only a subset of the
+fields from the original documents can be reindexed using `source` filtering
as follows:
[source,js]
@@ -286,10 +288,10 @@ Set `ctx.op = "delete"` if your script decides that the document must be
deleted from the destination index. The deletion will be reported in the
`deleted` counter in the <<docs-reindex-response-body, response body>>.
-Setting `ctx.op` to anything else is an error. Setting any
-other field in `ctx` is an error.
+Setting `ctx.op` to anything else will return an error, as will setting any
+other field in `ctx`.
-Think of the possibilities! Just be careful! With great power.... You can
+Think of the possibilities! Just be careful; you are able to
change:
* `_id`
@@ -299,7 +301,7 @@ change:
* `_routing`
Setting `_version` to `null` or clearing it from the `ctx` map is just like not
-sending the version in an indexing request. It will cause that document to be
+sending the version in an indexing request; it will cause the document to be
overwritten in the target index regardless of the version on the target or the
version type you use in the `_reindex` request.
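To make the script hooks concrete, here is a sketch (not from this commit; the `user`-field check is hypothetical) of a reindex that deletes documents missing a field:
[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.user == null) { ctx.op = 'delete' }"
  }
}
--------------------------------------------------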
@@ -310,11 +312,11 @@ preserved unless it's changed by the script. You can set `routing` on the
`keep`::
Sets the routing on the bulk request sent for each match to the routing on
-the match. The default.
+the match. This is the default value.
`discard`::
-Sets the routing on the bulk request sent for each match to null.
+Sets the routing on the bulk request sent for each match to `null`.
`=<some text>`::
@@ -422,7 +424,7 @@ POST _reindex
The `host` parameter must contain a scheme, host, and port (e.g.
`https://otherhost:9200`). The `username` and `password` parameters are
-optional and when they are present reindex will connect to the remote
+optional, and when they are present `_reindex` will connect to the remote
Elasticsearch node using basic auth. Be sure to use `https` when using
basic auth or the password will be sent in plain text.
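A minimal sketch of such a remote reindex (credentials and index names here are placeholders):
[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}
--------------------------------------------------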
@@ -446,7 +448,7 @@ NOTE: Reindexing from remote clusters does not support
Reindexing from a remote server uses an on-heap buffer that defaults to a
maximum size of 100mb. If the remote index includes very large documents you'll
-need to use a smaller batch size. The example below sets the batch size `10`
+need to use a smaller batch size. The example below sets the batch size to `10`
which is very, very small.
[source,js]
@@ -477,8 +479,8 @@ POST _reindex
It is also possible to set the socket read timeout on the remote connection
with the `socket_timeout` field and the connection timeout with the
-`connect_timeout` field. Both default to thirty seconds. This example
-sets the socket read timeout to one minute and the connection timeout to ten
+`connect_timeout` field. Both default to 30 seconds. This example
+sets the socket read timeout to one minute and the connection timeout to 10
seconds:
[source,js]
@@ -533,14 +535,14 @@ for details. `timeout` controls how long each write request waits for unavailabl
shards to become available. Both work exactly how they work in the
<<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+(e.g. `?scroll=10m`). The default value is 5 minutes.
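As an illustration (not part of this commit; the index names reuse the doc's running example), these URL parameters combine like so:
[source,js]
--------------------------------------------------
POST _reindex?scroll=10m&wait_for_completion=false
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
--------------------------------------------------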
`requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which reindex issues batches of index
+`1000`, etc) and throttles the rate at which `_reindex` issues batches of index
operations by padding each batch with a wait time. The throttling can be
disabled by setting `requests_per_second` to `-1`.
-The throttling is done by waiting between batches so that scroll that reindex
+The throttling is done by waiting between batches so that the `scroll` which `_reindex`
uses internally can be given a timeout that takes into account the padding.
The padding time is the difference between the batch size divided by the
`requests_per_second` and the time spent writing. By default the batch size is
@@ -552,9 +554,9 @@ target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
--------------------------------------------------
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
+starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
[float]
[[docs-reindex-response-body]]
@@ -606,12 +608,12 @@ The JSON response looks like this:
`took`::
-The number of milliseconds from start to end of the whole operation.
+The total milliseconds the entire operation took.
`timed_out`::
This flag is set to `true` if any of the requests executed during the
-reindex has timed out.
+reindex timed out.
`total`::
@@ -657,7 +659,7 @@ The number of requests per second effectively executed during the reindex.
`throttled_until_millis`::
-This field should always be equal to zero in a delete by query response. It only
+This field should always be equal to zero in a `_reindex` response. It only
has meaning when using the <<docs-reindex-task-api, Task API>>, where it
indicates the next time (in milliseconds since epoch) a throttled request will be
executed again in order to conform to `requests_per_second`.
@@ -681,7 +683,7 @@ GET _tasks?detailed=true&actions=*reindex
--------------------------------------------------
// CONSOLE
-The responses looks like:
+The response looks like:
[source,js]
--------------------------------------------------
@@ -726,9 +728,9 @@ The responses looks like:
// NOTCONSOLE
// We can't test tasks output
-<1> this object contains the actual status. It is just like the response json
-with the important addition of the `total` field. `total` is the total number
-of operations that the reindex expects to perform. You can estimate the
+<1> this object contains the actual status. It is identical to the response JSON
+except for the important addition of the `total` field. `total` is the total number
+of operations that `_reindex` expects to perform. You can estimate the
progress by adding the `updated`, `created`, and `deleted` fields. The request
will finish when their sum is equal to the `total` field.
@@ -743,7 +745,7 @@ GET /_tasks/taskId:1
The advantage of this API is that it integrates with `wait_for_completion=false`
to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set on it them it'll come back with a
+and `wait_for_completion=false` was set, it will return a
`results` or an `error` field. The cost of this feature is the document that
`wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
you to delete that document.
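For example, removing the saved result for a made-up task id is an ordinary document delete (a sketch; substitute the real id):
[source,js]
--------------------------------------------------
DELETE .tasks/task/r1A2WoRbTwKZ516z6NEs5A:36619
--------------------------------------------------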
@@ -761,10 +763,10 @@ POST _tasks/task_id:1/_cancel
--------------------------------------------------
// CONSOLE
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API.
-Cancelation should happen quickly but might take a few seconds. The task status
-API above will continue to list the task until it is wakes to cancel itself.
+Cancelation should happen quickly but might take a few seconds. The Tasks
+API will continue to list the task until it wakes to cancel itself.
[float]
@@ -780,9 +782,9 @@ POST _reindex/task_id:1/_rethrottle?requests_per_second=-1
--------------------------------------------------
// CONSOLE
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API above.
-Just like when setting it on the `_reindex` API `requests_per_second`
+Just like when setting it on the Reindex API, `requests_per_second`
can be either `-1` to disable throttling or any decimal number
like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
query takes effect immediately but rethrottling that slows down the query will
@@ -806,7 +808,7 @@ POST test/_doc/1?refresh
--------------------------------------------------
// CONSOLE
-But you don't like the name `flag` and want to replace it with `tag`.
+but you don't like the name `flag` and want to replace it with `tag`.
`_reindex` can create the other index for you:
[source,js]
@@ -836,7 +838,7 @@ GET test2/_doc/1
// CONSOLE
// TEST[continued]
-and it'll look like:
+which will return:
[source,js]
--------------------------------------------------
@@ -854,8 +856,6 @@ and it'll look like:
--------------------------------------------------
// TESTRESPONSE
-Or you can search by `tag` or whatever you want.
[float]
[[docs-reindex-slice]]
=== Slicing
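As a sketch of the manual form (one of the parallel requests; each one picks a slice `id` from `0` up to `max - 1`):
[source,js]
----------------------------------------------------------------
POST _reindex
{
  "source": {
    "index": "twitter",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}
----------------------------------------------------------------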
@@ -902,7 +902,7 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]
-Which you can verify works with:
+You can verify this works with:
[source,js]
----------------------------------------------------------------
@@ -912,7 +912,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
[source,js]
----------------------------------------------------------------
@@ -928,7 +928,7 @@ Which results in a sensible `total` like this one:
[[docs-reindex-automatic-slice]]
==== Automatic slicing
-You can also let reindex automatically parallelize using <<sliced-scroll>> to
+You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
slice on `_uid`. Use `slices` to specify the number of slices to use:
[source,js]
@@ -946,7 +946,7 @@ POST _reindex?slices=5&refresh
// CONSOLE
// TEST[setup:big_twitter]
-Which you also can verify works with:
+You can also verify this works with:
[source,js]
----------------------------------------------------------------
@@ -955,7 +955,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
// CONSOLE
// TEST[continued]
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
[source,js]
----------------------------------------------------------------
@@ -979,7 +979,7 @@ section above, creating sub-requests which means it has some quirks:
sub-requests are "child" tasks of the task for the request with `slices`.
* Fetching the status of the task for the request with `slices` only contains
the status of completed slices.
-* These sub-requests are individually addressable for things like cancellation
+* These sub-requests are individually addressable for things like cancelation
and rethrottling.
* Rethrottling the request with `slices` will rethrottle the unfinished
sub-request proportionally.
@@ -992,7 +992,7 @@ are distributed proportionally to each sub-request. Combine that with the point
above about distribution being uneven and you should conclude that using
`size` with `slices` might not result in exactly `size` documents being
`_reindex`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source index,
though these are all taken at approximately the same time.
[float]
@@ -1000,12 +1000,12 @@ though these are all taken at approximately the same time.
===== Picking the number of slices
If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If you're slicing manually or otherwise tuning
+number for most indices. If slicing manually or otherwise tuning
automatic slicing, use these guidelines.
Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (e.g. 500),
+choose a lower number as too many `slices` will hurt performance. Setting
`slices` higher than the number of shards generally does not improve efficiency
and adds overhead.
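As a sketch, automatic slicing is requested on the URL (the index names reuse the doc's running example):
[source,js]
----------------------------------------------------------------
POST _reindex?slices=auto&refresh
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
----------------------------------------------------------------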
@@ -1019,9 +1019,9 @@ documents being reindexed and cluster resources.
=== Reindex daily indices
You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
to reindex daily indices to apply a new template to the existing documents.
-Assuming you have indices consisting of documents as following:
+Assuming you have indices consisting of documents as follows:
[source,js]
----------------------------------------------------------------
@@ -1032,12 +1032,12 @@ PUT metricbeat-2016.05.31/_doc/1?refresh
----------------------------------------------------------------
// CONSOLE
-The new template for the `metricbeat-*` indices is already loaded into Elasticsearch
+The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
but it applies only to the newly created indices. Painless can be used to reindex
the existing documents and apply the new template.
The script below extracts the date from the index name and creates a new index
-with `-1` appended. All data from `metricbeat-2016.05.31` will be reindex
+with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
into `metricbeat-2016.05.31-1`.
[source,js]
@@ -1059,7 +1059,7 @@ POST _reindex
// CONSOLE
// TEST[continued]
-All documents from the previous metricbeat indices now can be found in the `*-1` indices.
+All documents from the previous metricbeat indices can now be found in the `*-1` indices.
[source,js]
----------------------------------------------------------------
@@ -1069,13 +1069,13 @@ GET metricbeat-2016.05.31-1/_doc/1
// CONSOLE
// TEST[continued]
-The previous method can also be used in combination with <<docs-reindex-change-name, change the name of a field>>
-to only load the existing data into the new index, but also rename fields if needed.
+The previous method can also be used in conjunction with <<docs-reindex-change-name, change the name of a field>>
+to load only the existing data into the new index and rename any fields if needed.
[float]
=== Extracting a random subset of an index
-Reindex can be used to extract a random subset of an index for testing:
+`_reindex` can be used to extract a random subset of an index for testing:
[source,js]
----------------------------------------------------------------
@@ -1100,5 +1100,5 @@ POST _reindex
// CONSOLE
// TEST[setup:big_twitter]
-<1> Reindex defaults to sorting by `_doc` so `random_score` won't have any
+<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.