[Docs] Spelling and grammar changes to reindex.asciidoc (#29232)

parent 0ac89a32cc
commit d2baf4b191

@@ -136,7 +136,7 @@ POST _reindex
 // TEST[setup:twitter]
 
 You can limit the documents by adding a type to the `source` or by adding a
-query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
+query. This will only copy tweets made by `kimchy` into `new_twitter`:
 
 [source,js]
 --------------------------------------------------
@@ -161,11 +161,13 @@ POST _reindex
 
 `index` and `type` in `source` can both be lists, allowing you to copy from
 lots of sources in one request. This will copy documents from the `_doc` and
-`post` types in the `twitter` and `blog` index. It'd include the `post` type in
-the `twitter` index and the `_doc` type in the `blog` index. If you want to be
-more specific you'll need to use the `query`. It also makes no effort to handle
-ID collisions. The target index will remain valid but it's not easy to predict
-which document will survive because the iteration order isn't well defined.
+`post` types in the `twitter` and `blog` index. The copied documents would include the
+`post` type in the `twitter` index and the `_doc` type in the `blog` index. For more
+specific parameters, you can use `query`.
+
+The Reindex API makes no effort to handle ID collisions. For such issues, the target index
+will remain valid, but it's not easy to predict which document will survive because
+the iteration order isn't well defined.
 
 [source,js]
 --------------------------------------------------
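
For illustration, a minimal sketch of the multi-index request whose body this hunk elides (the destination name `all_together` is an assumption, not taken from the commit):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": ["twitter", "blog"],
    "type": ["_doc", "post"]
  },
  "dest": {
    "index": "all_together"
  }
}
--------------------------------------------------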
@@ -203,8 +205,8 @@ POST _reindex
 // CONSOLE
 // TEST[setup:twitter]
 
-If you want a particular set of documents from the twitter index you'll
-need to sort. Sorting makes the scroll less efficient but in some contexts
+If you want a particular set of documents from the `twitter` index you'll
+need to use `sort`. Sorting makes the scroll less efficient but in some contexts
 it's worth it. If possible, prefer a more selective query to `size` and `sort`.
 This will copy 10000 documents from `twitter` into `new_twitter`:
 
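
For illustration, a hedged sketch of the `size` and `sort` request described above (the `date` sort field is an assumption based on the twitter sample data):

[source,js]
--------------------------------------------------
POST _reindex
{
  "size": 10000,
  "source": {
    "index": "twitter",
    "sort": { "date": "desc" }
  },
  "dest": {
    "index": "new_twitter"
  }
}
--------------------------------------------------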
@@ -226,8 +228,8 @@ POST _reindex
 // TEST[setup:twitter]
 
 The `source` section supports all the elements that are supported in a
-<<search-request-body,search request>>. For instance only a subset of the
-fields from the original documents can be reindexed using source filtering
+<<search-request-body,search request>>. For instance, only a subset of the
+fields from the original documents can be reindexed using `source` filtering
 as follows:
 
 [source,js]
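
For illustration, a minimal sketch of `source` filtering as described above (the field names are assumptions based on the twitter sample data):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "twitter",
    "_source": ["user", "date"]
  },
  "dest": {
    "index": "new_twitter"
  }
}
--------------------------------------------------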
@@ -286,10 +288,10 @@ Set `ctx.op = "delete"` if your script decides that the document must be
 deleted from the destination index. The deletion will be reported in the
 `deleted` counter in the <<docs-reindex-response-body, response body>>.
 
-Setting `ctx.op` to anything else is an error. Setting any
-other field in `ctx` is an error.
+Setting `ctx.op` to anything else will return an error, as will setting any
+other field in `ctx`.
 
-Think of the possibilities! Just be careful! With great power.... You can
+Think of the possibilities! Just be careful; you are able to
 change:
 
 * `_id`
@@ -299,7 +301,7 @@ change:
 * `_routing`
 
 Setting `_version` to `null` or clearing it from the `ctx` map is just like not
-sending the version in an indexing request. It will cause that document to be
+sending the version in an indexing request; it will cause the document to be
 overwritten in the target index regardless of the version on the target or the
 version type you use in the `_reindex` request.
 
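
For illustration, a hedged sketch of a script that sets `ctx.op` during a reindex (the condition and index names are assumptions based on the surrounding examples):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.user == 'kimchy') { ctx.op = 'noop' }"
  }
}
--------------------------------------------------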
@@ -310,11 +312,11 @@ preserved unless it's changed by the script. You can set `routing` on the
 `keep`::
 
 Sets the routing on the bulk request sent for each match to the routing on
-the match. The default.
+the match. This is the default value.
 
 `discard`::
 
-Sets the routing on the bulk request sent for each match to null.
+Sets the routing on the bulk request sent for each match to `null`.
 
 `=<some text>`::
 
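
For illustration, a hedged sketch of forcing a routing value on the destination, per the `=<some text>` option above (the index names and the `cat` value are assumptions):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "match": { "company": "cat" }
    }
  },
  "dest": {
    "index": "dest",
    "routing": "=cat"
  }
}
--------------------------------------------------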
@@ -422,7 +424,7 @@ POST _reindex
 
 The `host` parameter must contain a scheme, host, and port (e.g.
 `https://otherhost:9200`). The `username` and `password` parameters are
-optional and when they are present reindex will connect to the remote
+optional, and when they are present `_reindex` will connect to the remote
 Elasticsearch node using basic auth. Be sure to use `https` when using
 basic auth or the password will be sent in plain text.
 
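
For illustration, a hedged sketch of a remote reindex with basic auth (the host comes from the prose above; `user`, `pass`, and the index names are placeholders):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}
--------------------------------------------------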
@@ -446,7 +448,7 @@ NOTE: Reindexing from remote clusters does not support
 
 Reindexing from a remote server uses an on-heap buffer that defaults to a
 maximum size of 100mb. If the remote index includes very large documents you'll
-need to use a smaller batch size. The example below sets the batch size `10`
+need to use a smaller batch size. The example below sets the batch size to `10`
 which is very, very small.
 
 [source,js]
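
For illustration, a hedged sketch of the small-batch request whose body this hunk elides (`size` here is the per-batch size; the host and index names are placeholders):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200"
    },
    "index": "source",
    "size": 10
  },
  "dest": {
    "index": "dest"
  }
}
--------------------------------------------------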
@@ -477,8 +479,8 @@ POST _reindex
 
 It is also possible to set the socket read timeout on the remote connection
 with the `socket_timeout` field and the connection timeout with the
-`connect_timeout` field. Both default to thirty seconds. This example
-sets the socket read timeout to one minute and the connection timeout to ten
+`connect_timeout` field. Both default to 30 seconds. This example
+sets the socket read timeout to one minute and the connection timeout to 10
 seconds:
 
 [source,js]
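
For illustration, a hedged sketch of the timeout settings described above (the host and index names are placeholders; the timeout values follow the prose):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}
--------------------------------------------------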
@@ -533,14 +535,14 @@ for details. `timeout` controls how long each write request waits for unavailabl
 shards to become available. Both work exactly how they work in the
 <<docs-bulk,Bulk API>>. As `_reindex` uses scroll search, you can also specify
 the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+(e.g. `?scroll=10m`). The default value is 5 minutes.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which reindex issues batches of index
+`1000`, etc) and throttles the rate at which `_reindex` issues batches of index
 operations by padding each batch with a wait time. The throttling can be
 disabled by setting `requests_per_second` to `-1`.
 
-The throttling is done by waiting between batches so that scroll that reindex
+The throttling is done by waiting between batches so that the `scroll` which `_reindex`
 uses internally can be given a timeout that takes into account the padding.
 The padding time is the difference between the batch size divided by the
 `requests_per_second` and the time spent writing. By default the batch size is
@@ -552,9 +554,9 @@ target_time = 1000 / 500 per second = 2 seconds
 wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
 --------------------------------------------------
 
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
 cause Elasticsearch to create many requests and then wait for a while before
-starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
+starting the next set. This is "bursty" instead of "smooth". The default value is `-1`.
 
 [float]
 [[docs-reindex-response-body]]
@@ -606,12 +608,12 @@ The JSON response looks like this:
 
 `took`::
 
-The number of milliseconds from start to end of the whole operation.
+The total milliseconds the entire operation took.
 
 `timed_out`::
 
 This flag is set to `true` if any of the requests executed during the
-reindex has timed out.
+reindex timed out.
 
 `total`::
 
@@ -657,7 +659,7 @@ The number of requests per second effectively executed during the reindex.
 
 `throttled_until_millis`::
 
-This field should always be equal to zero in a delete by query response. It only
+This field should always be equal to zero in a `_delete_by_query` response. It only
 has meaning when using the <<docs-reindex-task-api, Task API>>, where it
 indicates the next time (in milliseconds since epoch) a throttled request will be
 executed again in order to conform to `requests_per_second`.
@@ -681,7 +683,7 @@ GET _tasks?detailed=true&actions=*reindex
 --------------------------------------------------
 // CONSOLE
 
-The responses looks like:
+The response looks like:
 
 [source,js]
 --------------------------------------------------
@@ -726,9 +728,9 @@ The responses looks like:
 // NOTCONSOLE
 // We can't test tasks output
 
-<1> this object contains the actual status. It is just like the response json
-with the important addition of the `total` field. `total` is the total number
-of operations that the reindex expects to perform. You can estimate the
+<1> this object contains the actual status. It is identical to the response JSON
+except for the important addition of the `total` field. `total` is the total number
+of operations that the `_reindex` expects to perform. You can estimate the
 progress by adding the `updated`, `created`, and `deleted` fields. The request
 will finish when their sum is equal to the `total` field.
 
@@ -743,7 +745,7 @@ GET /_tasks/taskId:1
 
 The advantage of this API is that it integrates with `wait_for_completion=false`
 to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set on it them it'll come back with a
+and `wait_for_completion=false` was set, it will return a
 `results` or an `error` field. The cost of this feature is the document that
 `wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
 you to delete that document.
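
For illustration, a hedged sketch of deleting that document once you are done with it (the task id is a made-up placeholder; the path follows the `.tasks/task/${taskId}` pattern given above):

[source,js]
--------------------------------------------------
DELETE .tasks/task/r1A2WoRbTwKZ516z6NEs5A:36619
--------------------------------------------------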
@@ -761,10 +763,10 @@ POST _tasks/task_id:1/_cancel
 --------------------------------------------------
 // CONSOLE
 
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API.
 
-Cancelation should happen quickly but might take a few seconds. The task status
-API above will continue to list the task until it is wakes to cancel itself.
+Cancelation should happen quickly but might take a few seconds. The Tasks
+API will continue to list the task until it wakes to cancel itself.
 
 
 [float]
@@ -780,9 +782,9 @@ POST _reindex/task_id:1/_rethrottle?requests_per_second=-1
 --------------------------------------------------
 // CONSOLE
 
-The `task_id` can be found using the tasks API above.
+The `task_id` can be found using the Tasks API above.
 
-Just like when setting it on the `_reindex` API `requests_per_second`
+Just like when setting it on the Reindex API, `requests_per_second`
 can be either `-1` to disable throttling or any decimal number
 like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
 query takes effect immediately but rethrottling that slows down the query will
@@ -806,7 +808,7 @@ POST test/_doc/1?refresh
 --------------------------------------------------
 // CONSOLE
 
-But you don't like the name `flag` and want to replace it with `tag`.
+but you don't like the name `flag` and want to replace it with `tag`.
 `_reindex` can create the other index for you:
 
 [source,js]
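
For illustration, a hedged sketch of the rename request whose body this hunk elides (`test2` matches the `GET test2/_doc/1` in the next hunk; the exact script source is an assumption):

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
  }
}
--------------------------------------------------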
@@ -836,7 +838,7 @@ GET test2/_doc/1
 // CONSOLE
 // TEST[continued]
 
-and it'll look like:
+which will return:
 
 [source,js]
 --------------------------------------------------
@@ -854,8 +856,6 @@ and it'll look like:
 --------------------------------------------------
 // TESTRESPONSE
 
-Or you can search by `tag` or whatever you want.
-
 [float]
 [[docs-reindex-slice]]
 === Slicing
@@ -902,7 +902,7 @@ POST _reindex
 // CONSOLE
 // TEST[setup:big_twitter]
 
-Which you can verify works with:
+You can verify this works by:
 
 [source,js]
 ----------------------------------------------------------------
@@ -912,7 +912,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
 // CONSOLE
 // TEST[continued]
 
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
 
 [source,js]
 ----------------------------------------------------------------
@@ -928,7 +928,7 @@ Which results in a sensible `total` like this one:
 [[docs-reindex-automatic-slice]]
 ==== Automatic slicing
 
-You can also let reindex automatically parallelize using <<sliced-scroll>> to
+You can also let `_reindex` automatically parallelize using <<sliced-scroll>> to
 slice on `_uid`. Use `slices` to specify the number of slices to use:
 
 [source,js]
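
For illustration, a hedged sketch of the sliced request whose body this hunk elides (the first line is visible in the next hunk header; the body is an assumption based on the earlier examples):

[source,js]
----------------------------------------------------------------
POST _reindex?slices=5&refresh
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
----------------------------------------------------------------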
@@ -946,7 +946,7 @@ POST _reindex?slices=5&refresh
 // CONSOLE
 // TEST[setup:big_twitter]
 
-Which you also can verify works with:
+You can also verify this works by:
 
 [source,js]
 ----------------------------------------------------------------
@@ -955,7 +955,7 @@ POST new_twitter/_search?size=0&filter_path=hits.total
 // CONSOLE
 // TEST[continued]
 
-Which results in a sensible `total` like this one:
+which results in a sensible `total` like this one:
 
 [source,js]
 ----------------------------------------------------------------
@@ -979,7 +979,7 @@ section above, creating sub-requests which means it has some quirks:
 sub-requests are "child" tasks of the task for the request with `slices`.
 * Fetching the status of the task for the request with `slices` only contains
 the status of completed slices.
-* These sub-requests are individually addressable for things like cancellation
+* These sub-requests are individually addressable for things like cancelation
 and rethrottling.
 * Rethrottling the request with `slices` will rethrottle the unfinished
 sub-request proportionally.
@@ -992,7 +992,7 @@ are distributed proportionally to each sub-request. Combine that with the point
 above about distribution being uneven and you should conclude that using
 `size` with `slices` might not result in exactly `size` documents being
 `_reindex`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+* Each sub-request gets a slightly different snapshot of the source index,
 though these are all taken at approximately the same time.
 
 [float]
@@ -1000,12 +1000,12 @@ though these are all taken at approximately the same time.
 ===== Picking the number of slices
 
 If slicing automatically, setting `slices` to `auto` will choose a reasonable
-number for most indices. If you're slicing manually or otherwise tuning
+number for most indices. If slicing manually or otherwise tuning
 automatic slicing, use these guidelines.
 
 Query performance is most efficient when the number of `slices` is equal to the
-number of shards in the index. If that number is large, (for example,
-500) choose a lower number as too many `slices` will hurt performance. Setting
+number of shards in the index. If that number is large (e.g. 500),
+choose a lower number as too many `slices` will hurt performance. Setting
 `slices` higher than the number of shards generally does not improve efficiency
 and adds overhead.
 
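
For illustration, a hedged sketch of the `slices=auto` form mentioned above (the index names follow the earlier examples and are assumptions):

[source,js]
----------------------------------------------------------------
POST _reindex?slices=auto&refresh
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
----------------------------------------------------------------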
@@ -1018,10 +1018,10 @@ documents being reindexed and cluster resources.
 [float]
 === Reindex daily indices
 
-You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
-to reindex daily indices to apply a new template to the existing documents.
+You can use `_reindex` in combination with <<modules-scripting-painless, Painless>>
+to reindex daily indices to apply a new template to the existing documents.
 
-Assuming you have indices consisting of documents as following:
+Assuming you have indices consisting of documents as follows:
 
 [source,js]
 ----------------------------------------------------------------
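
For illustration, a hedged sketch of the sample documents whose bodies this hunk elides (the first index name matches the next hunk header; the metric field and values are assumptions):

[source,js]
----------------------------------------------------------------
PUT metricbeat-2016.05.31/_doc/1?refresh
{"system.cpu.idle.pct": 0.908}

PUT metricbeat-2016.06.01/_doc/1?refresh
{"system.cpu.idle.pct": 0.105}
----------------------------------------------------------------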
@@ -1032,12 +1032,12 @@ PUT metricbeat-2016.05.31/_doc/1?refresh
 ----------------------------------------------------------------
 // CONSOLE
 
-The new template for the `metricbeat-*` indices is already loaded into Elasticsearch
+The new template for the `metricbeat-*` indices is already loaded into Elasticsearch,
 but it applies only to the newly created indices. Painless can be used to reindex
 the existing documents and apply the new template.
 
 The script below extracts the date from the index name and creates a new index
-with `-1` appended. All data from `metricbeat-2016.05.31` will be reindex
+with `-1` appended. All data from `metricbeat-2016.05.31` will be reindexed
 into `metricbeat-2016.05.31-1`.
 
 [source,js]
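
For illustration, a hedged sketch of such a Painless script (the exact script source is an assumption, not taken from the commit):

[source,js]
----------------------------------------------------------------
POST _reindex
{
  "source": {
    "index": "metricbeat-*"
  },
  "dest": {
    "index": "metricbeat"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
  }
}
----------------------------------------------------------------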
@@ -1059,7 +1059,7 @@ POST _reindex
 // CONSOLE
 // TEST[continued]
 
-All documents from the previous metricbeat indices now can be found in the `*-1` indices.
+All documents from the previous metricbeat indices can now be found in the `*-1` indices.
 
 [source,js]
 ----------------------------------------------------------------
@@ -1069,13 +1069,13 @@ GET metricbeat-2016.05.31-1/_doc/1
 // CONSOLE
 // TEST[continued]
 
-The previous method can also be used in combination with <<docs-reindex-change-name, change the name of a field>>
-to only load the existing data into the new index, but also rename fields if needed.
+The previous method can also be used in conjunction with <<docs-reindex-change-name, change the name of a field>>
+to load only the existing data into the new index and rename any fields if needed.
 
 [float]
 === Extracting a random subset of an index
 
-Reindex can be used to extract a random subset of an index for testing:
+`_reindex` can be used to extract a random subset of an index for testing:
 
 [source,js]
 ----------------------------------------------------------------
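
For illustration, a hedged sketch of the random-subset request whose body this hunk elides; the `<1>` callout in the next hunk refers to the `sort` override (the `size` value and index names are assumptions):

[source,js]
----------------------------------------------------------------
POST _reindex
{
  "size": 10,
  "source": {
    "index": "twitter",
    "query": {
      "function_score": {
        "query": { "match_all": {} },
        "random_score": {}
      }
    },
    "sort": "_score" <1>
  },
  "dest": {
    "index": "random_twitter"
  }
}
----------------------------------------------------------------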
@@ -1100,5 +1100,5 @@ POST _reindex
 // CONSOLE
 // TEST[setup:big_twitter]
 
-<1> Reindex defaults to sorting by `_doc` so `random_score` won't have any
+<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
 effect unless you override the sort to `_score`.