We keep track of the current stage of recovery using an instance of RecoveryState which is stored on the relevant IndexShard. At the moment changes to this object are made in many places of the code, which are charged of doing it in the right order, keeping track of timers and many more. Also the changes to shard state are decoupled from the recovery stages which caused #9503.
This PR refactors this and brings all of the changes into IndexShard. It also makes all recovery follow the exact same stages and shortcut some. This is in order to keep things simple and always the same (those shortcuts didn't add anything, we ended doing it all anyway).
Also, all timer management is now folded into RecoveryState and unit tests are added.
This closes#9503 by moving the shard to post recovery only once the recovery is done (before they were decoupled), meaning that master promotion of the target shard to started can not cancel the recovery.
Closes#9902
#9760 was a fix for translog leaking due to measing a delete flag. This is not needed here as we have a better solution to not loose the flag. This commit takes the changes from 1x in order to keep the code base similar and enjoy the extra tests.
Closes#9760
While the parser allowed changing field type settings, these would never
have been serialized. So this change simply removes parsing using
parseField. Backcompat will still work if a user uploads old settings
(they just would never have worked anyways, so we continue ignoring
them with 1.x, and 2.x will now error).
see #8143closes#9914
This setting is used by the release script to run rest tests against
the version being released. It used to work only for tests using
the global cluster. Now it supercedes both SUITE and TEST scope
test clusters.
closes#9916
Closes#9915.
Squashed commit of the following:
commit cfa59f5a3f03d9d1b432980dcee6495447c1e7ea
Author: Robert Muir <rmuir@apache.org>
Date: Fri Feb 27 12:10:16 2015 -0500
add missing null check
commit 62fe5403068c730c0e0b6fd1ab1a0246eeef6220
Author: Robert Muir <rmuir@apache.org>
Date: Fri Feb 27 11:31:53 2015 -0500
Disable ExtrasFS for now, until we hook all this in properly in a separate issue.
commit 822795c57c5cf846423fad443c2327c4ed0094ac
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Feb 27 10:12:02 2015 +0100
Fix PercolatorTests.
commit 98b2a0a7d8298648125c9a367cb7e31b3ec7d51b
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Feb 27 09:27:11 2015 +0100
Fix ChildrenQueryTests.
commit 9b99656fc56bbd01c9afe22baffae3c37bb48a71
Author: Robert Muir <rmuir@apache.org>
Date: Thu Feb 26 20:50:02 2015 -0500
cutover apis, no work on test failures yet.
Some tests failures are seen when a node attempts to use a port that is already bound
by some other process on the test machine. This commit adds a bind to test port availability
and iterates over the port range until an available port is found. This reduces the likelihood
of a test node failing to start up due to the port already being bound.
Today we have two ways of getting a setting, either with the full settings key or with only
the last part of the key where the prefix is implicit depending on the package the class is in via
component settings. this is trappy as well as confusing for users and can break easily if a class is moved
to a new package since the prefix then implicitly changes.
This commit removes the component settings from the codebase.
Almost all of our meta fields that allow enabling/disabling have an `enabled`
setting. However, _field_names is enabled by default, and disabling
requires setting `index=no`. This change adds a flag similar to that
with other meta fields.
closes#9893
Random geo shape testing periodically fails on a known issue within Spatial4j core. A simple patch in ES will fix the issue. For now this random test will be disabled until the patch can be applied.
The request tracer logs in TRACE level under the `transport.tracer` log and is dynamically configurable with include and exclude arrays to filter out unneeded info. By default all requests are logged with the exception of fault detection pings (fired every second).
add the notion of tracers in the MockTransportService for testing purposes
Closes#9286
Currently rounding in DateMathParser This always done in UTC, even
when another time zone is specified. This is fixed by passing the time zone
down to the rounding logic when it is specified.
Closes#9814Closes#9885
To support the `_recovery` API, the recovery process keeps track of current progress in a class called RecoveryState. This class currently have some issues, mostly around concurrency (see #6644 ). This PR cleans it up as well as other issues around it:
- Make the Index subsection API cleaner:
- remove redundant information - all calculation is done based on the underlying file map
- clearer definition of what is what: total files, vs reused files (local files that match the source) vs recovered files (copied over). % based progress is reported based on recovered files only.
- cleaned up json response to match other API (sadly this breaks the structure). We now properly report human values for dates and other units.
- Add more robust unit testing
- Detail flag was passed along as state (it's now a ToXContent param)
- State lookup during reporting is now always done via the IndexShard , no more fall backs to many other classes.
- Cleanup APIs around time and move the little computations to the state class as opposed to doing them out of the API
I also improved error messages out of the REST testing infra for things I run into.
Closes#6644Closes#9811
Together with #8782 it should help in the situations simliar to #8887 by adding an ability to get information about currently running snapshot without accessing the repository itself.
Closes#8887
The number of current pending tasks is useful to detect and overloaded master. This commit adds it to the cluster health API. The complete list can be retrieved from the dedicated pending tasks API.
It also adds rest tests for the cluster health variants.
Closes#9877
Today we fail the shard if we need to upgrade a replica to a primary on shadow replicas
on shared filesystem. Yet, this commit allows promotion by re-initializing on the master preventing
reallocation of all replicas.
We try to lock all shards when an index is deleted but likely not
succeeding since shards are still active. To ensure that shards
that used to be allocated on that node get cleaned up as well we have
to retry or block on the delete until we get the locks. This is not desirable
since the delete happens on the cluster state processing thread. Instead of blocking
this commit schedules a pending delete for the index just like if we can't delete shards.
Otherwise the fs repository metadata that points to non-existing location is stored in the old index cluster state, which causes warnings during OldIndexBackwardsCompatibilityTests.
There are two implications to this change.
First, percolator now uses _uid internally, extracting the id portion
when needed. Second, sorting on _id is no longer possible, since you
can no longer index _id. However, _uid can still be used to sort, and
is better anyways as indexing _id just to make it available to
fielddata for sorting is wasteful.
see #8143closes#9842
Today if we delete files from the index directory we never acquire the
write lock. Yet, for safety reasons we should get the write lock before
we modify / delete any files. Where we can we should leave the deletion
to the index writer and only delete that are necessary to delete ourself.