Adds support for `?slices=N` to reindex which automatically
parallelizes the process using parallel scrolls on `_uid`. Performance
testing sees a 3x performance improvement for simple docs
on decent hardware, maybe 30% performance improvement
for more complex docs. Still compelling, especially because
clusters should be able to get closer to the 3x than the 30%
number.
Closes#20624
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
Surprise! You can use sliced scroll to easily parallelize reindex
and friend. They support it because they use the same infrastructure
as a regular search to parse the search request. While we would like
to make an "automatic" option for parallelizing reindex, this manual
option works right now and is pretty convenient!
Funny node names have been removed in #19456 and replaced by UUID. This commit removes these obsolete node names and replace them by real UUIDs in the documentation.
closes#20065
Update-By-Query and Delete-By-Query use internal versioning to update/delete documents. But documents can have a version number equal to zero using the external versioning... making the UBQ/DBQ request fail because zero is not a valid version number and they only support internal versioning for now. Sequence numbers might help to solve this issue in the future.
This adds a get task API that supports GET /_tasks/${taskId} and
removes that responsibility from the list tasks API. The get task
API supports wait_for_complation just as the list tasks API does
but doesn't support any of the list task API's filters. In exchange,
it supports falling back to the .results index when the task isn't
running any more. Like any good GET API it 404s when it doesn't
find the task.
Then we change reindex, update-by-query, and delete-by-query to
persist the task result when wait_for_completion=false. The leads
to the neat behavior that, once you start a reindex with
wait_for_completion=false, you can fetch the result of the task by
using the get task API and see the result when it has finished.
Also rename the .results index to .tasks.
The header indicates to how many shard copies (primary and replicas shards) a write was supposed to go to, to how many
shard copies to write succeeded and potentially captures shard failures if writing into a replica shard fails.
For async writes it also includes the number of shards a write is still pending.
Closes#7994
- Removed "ok": true from response examples
- Added "created" flag to index response examples
- Replaced exists flag with found in delete response examples