really bad happens
If the `onFailure()` method is called on the cluster state task then
something bad happened that we can't really deal with so this change
throws an ElasticsearchException in that case and the step will be
re-executed when the policy is next triggered. If we can't submit a
cluster state task then we can't move to the error state so there isn't
really anything else we can do here. If the cluster state task fails
like this there are probably bigger issues witht he cluster anyway.
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/ExecuteStepsUpdateTask.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTask.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToNextStepUpdateTask.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/ClusterStateUpdateStepTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/ExecuteStepsUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToNextStepUpdateTaskTests.java
* Adds infrastructure for dealing with errors in step execution
This change adds a new ErrorStep which a step can move to if it
encounters an error it cannot automatically recover from by retrying on
the next execution. The error step is special in that it cannot
complete. The intention is that the user will need to call an API to man
ually mocve the step in order to progress the index's lifecycle. The
error step retains the phase and action names of the step before it but
with the step name set to `ERROR`. For this reason no ordinary step can
have this name. `AbstractStepTestCase.testStepNameNotError()` ensures
that no step uses `ERROR` as the step name for either its stepKey or
its nextStepKey.
The new `index.lifecycle.failed_step` setting is used
to store the name of the failed step so the user can know in which step
the error occured. More error information will be added shortly.
The async steps will now move to the error step if listener.onFailure()
is called.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/LifecycleSettings.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/ErrorStep.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunner.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTask.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistry.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/ErrorStepTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunnerTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToNextStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistryTests.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/LifecycleSettings.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/AbstractStepTestCase.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStepTests.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunner.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTask.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistry.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunnerTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToNextStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistryTests.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/LifecycleSettings.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/AbstractStepTestCase.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStepTests.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunner.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTask.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistry.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunnerTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToErrorStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/MoveToNextStepUpdateTaskTests.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/PolicyStepsRegistryTests.java
* Addresses review comments
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStep.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ErrorStepTests.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunner.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunnerTests.java
The current implementation starts/stops watcher using an executor. This
can result in our of order operations.
This commit reduces those executor calls to an absolute minimum in order
to be able to do state changes within the cluster state listener method,
which runs in sequence.
When a state change occurs that forces the watcher service to pause
(like no watcher index, no master node, no local shards), the service is
now in a paused state.
Pausing is a super lightweight operation, which marks the
ExecutionService as paused and waits for the currently executing watches
to finish in the background via an executor. The same applies for
stopping, the potentially long running operation is outsourced in to an
executor, as waiting for executed watches is decoupled from the current
state.
The only other long running operation is starting, where watches need to
be loaded. This is also done via an executor, but has an additional
protection by checking the cluster state version it was started with. If
another cluster state version was trying to load the watches, then this
loading will not take effect.
This PR also cleans up some unused states, like the a simple boolean in
the HistoryStore/TriggeredWatchStore marking it as started or stopped,
as this can now be caught in the execution service.
Another advantage of this approach is the fact, that now only triggered
watches are not getting executed, while watches that are run via the
Execute Watch API will still be executed regardless if watcher is
stopped or not.
Lastly the TickerScheduleTriggerEngine thread now only starts on data nodes.
This commit is a follow up to #30135. It updates the stream
compatibility versions in the start_trial requests and responses to
reflect that fact that this work has been backported to 6.3.
Necessary changes so that the licensing functionality can be
used in a JVM in FIPS 140 approved mode.
* Uses adequate salt length in encryption
* Changes key derivation to PBKDF2WithHmacSHA512 from a custom
approach with SHA512 and manual key stretching
* Removes redundant manual padding
Other relevant changes:
* Uses the SAH512 hash instead of the encrypted key bytes as the
key fingerprint to be included in the license specification
* Removes the explicit verification check of the encryption key
as this is implicitly checked in signature verification.
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.
closes#12792
* SQL: Reduce number of ranges generated for comparisons
Rewrote optimization rule for combining ranges by improving the
detection of binary comparisons in a tree to better combine
them in a range, regardless of their place inside an expression.
Additionally, improve the comparisons of Numbers of different types
Also, improve reassembly of conjunction/disjunction into balanced
trees.
Do not promote BinaryComparisons to Ranges since it introduces NULL
boundaries and thus a corner-case that needs too much handling
Compare BinaryComparisons directly between themselves and to Ranges
Fix#30017
This commit adds a tombstone document into Lucene for every No-op.
With this change, Lucene index is expected to have a complete history
of operations like Translog. In fact, this guarantee is subjected to the
soft-deletes retention merge policy.
Relates #29530
* master:
Fix message content in users tool (#30293)
[DOCS] Fixes links to breaking changes
[DOCS] Adds new installation package details (#29590)
Revert "Build: Move gradle wrapper jar to a dot dir (#30146)"
The elasticsearch-users utility had various messages that were
outdated or incorrect. This commit updates the output from this
command to reflect current terminology and configuration.
The variadic constructor was only used in a few places and the
RepositoriesMetaData class is backed by a List anyway, so just using a
List will make it simpler to instantiate it.
Cause the CLI to ignore commands that are empty or consist only of
newlines. This is a fairly standard thing for SQL CLIs to do.
It looks like:
```
sql> ;
sql>
|
| ;
sql> exit;
Bye!
```
I think I *could* have implemented this with a `CliCommand` that throws
out empty string but it felt simpler to bake it in to the `CliRepl`.
Closes#30000
xpack core contains a fork of `Cron` from quartz who's javadoc has a
`<table>` with non-html5 compatible stuff. This html5ifies the table and
switches the `:x-pack:plugin:core` project to building javadoc with
HTML5.
* Adds ClusterState to AsyncActionStep.performAction
This is needed so a new step can be created for the shrink action which
can select a node to allocate to based on the current routing rules and
the node attributes on teh discovery nodes.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/AsyncActionStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/DeleteStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ForceMergeStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/RolloverStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkSetAliasStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/UpdateSettingsStep.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/DeleteStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ForceMergeStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/RolloverStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkSetAliasStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/UpdateSettingsStepTests.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunner.java
x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleService.java
x-pack/plugin/index-lifecycle/src/test/java/org/elasticsearch/xpack/inde
xlifecycle/IndexLifecycleRunnerTests.java
* Adds single node allocation to shrink
This change adds two new steps as the first steps of the shrink action:
1. A `SetSingleNodeAllocateStep` which:
1. Determines which of the active nodes match the existing index
allocation rules
2. Randomly (using Randomness so its deterministic for testing)
picks one of the matching nodes
3. Updates the index settings to add a require allocation rule for
the node that was picked (using the
`index.routing.allocation.require._name` setting)
2. An `AllocationRoutedStep` which ensures that at least one copy of
each shard is allocated according to the new allocation rules
Note that this change also modifies the `AllocationRoutedStep` to add a
boolean field which determines whether the allocation is complete when
at least one copy of each shard matches the allocation rulees or if it
needs to wait for all shard copies to be allocated according to the
rules.
Lastly, a `randomStepKey()` method is added to `AbstractStepTestCase`
for convenience.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/AllocateAction.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/AllocationRoutedStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/SetSingleNodeAllocateStep.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkAction.java
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkSetAliasStep.java
x-pack/plugin/core/src/test/java/org/elasticsearch/action/admin/indices/
settings/put/UpdateSettingsTestHelper.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/AbstractStepTestCase.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/AllocationRoutedStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/SetSingleNodeAllocateStepTests.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/ShrinkActionTests.java
* Fixes AllocationRoutedStep when `waitForAllShardCopies=false`
This change fixes `AllocationRoutedStep` so that when
`waitForAllShardCopies=false` we wait for any shard copy of each shard
to be allocated according to the allocation rules rather than
specifically the primary of each shard.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/AllocationRoutedStep.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/AllocationRoutedStepTests.java
* Corrects Licence headers and typo
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifec
ycle/SetSingleNodeAllocateStep.java
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/indexlifec
ycle/SetSingleNodeAllocateStepTests.java
This commit refactors the DataStreamDiagnostics class
achieving the following advantages:
- simpler code; by encapsulating the moving bucket histogram
into its own class
- better performance; by using an array to store the buckets
instead of a map
- explicit handling of gap buckets; in preparation of fixing #30080
Starting with the refactoring in https://github.com/elastic/elasticsearch/pull/22778 (released in 5.3) we may fail to properly replicate operation when a mapping update on master fails. If a bulk
operations needs a mapping update half way, it will send a request to the master before continuing
to index the operations. If that request times out or isn't acked (i.e., even one node in the cluster
didn't process it within 30s), we end up throwing the exception and aborting the entire bulk. This is
a problem because all operations that were processed so far are not replicated any more to the
replicas. Although these operations were never "acked" to the user (we threw an error) it cause the
local checkpoint on the replicas to lag (on 6.x) and the primary and replica to diverge.
This PR does a couple of things:
1) Most importantly, treat *any* mapping update failure as a document level failure, meaning only
the relevant indexing operation will fail.
2) Removes the mapping update callbacks from `IndexShard.applyIndexOperationOnPrimary` and
similar methods for simpler execution. We don't use exceptions any more when a mapping
update was successful.
I think we need to do more work here (the fact that a single slow node can prevent those mappings
updates from being acked and thus fail operations is bad), but I want to keep this as small as I can
(it is already too big).
The overall NOTICE file for the ML X-Pack module should
include the notices from the 3rd party C++ components as
well as the 3rd party Java components.
Currently, the only way to get the REST response for the `/_cluster/state`
call to return the `cluster_uuid` is to request the `metadata` metrics,
which is one of the most expensive response structures. However, external
monitoring agents will likely want the `cluster_uuid` to correlate the
response with other API responses whether or not they want cluster
metadata.
We had a number of awaitsFix links that weren't updated after the xpack
merge.
Where possible I changed the links to the new locations, but in some
circumstances the original ticket was closed (suggesting the awaitsfix
should be removed) or was otherwise unclear the status.
* master: (24 commits)
Watcher: Ensure mail message ids are unique per watch action (#30112)
REST: Remove GET support for clear cache indices (#29525)
SQL: Correct error message (#30138)
Require acknowledgement to start_trial license (#30135)
Fix a bug in FieldCapabilitiesRequest#equals and hashCode. (#30181)
SQL: Add BinaryMathProcessor to named writeables list (#30127)
Tests: Use buildDir as base for generated-resources (#30191)
Fix SliceBuilderTests#testRandom failures
Build: Fix deb version to use tilde with prerelease versions (#29000)
Fix edge cases in CompositeKeyExtractorTests (#30175)
Document time unit limitations for date histograms (#30177)
Add support for field capabilities to the high-level REST client. (#29664)
Remove licenses missed by the migration (#30128)
[DOCS] Updates docker installation package details (#30110)
Fix TermsSetQueryBuilder.doEquals() method (#29629)
[Monitoring] Remove unhelpful Monitoring tests (#30144)
[Test] Fix RenameProcessorTests.testRenameExistingFieldNullValue() (#29655)
add copyright/scope configuration for intellij to Contributing Guide (#29688)
[test] include oss tar in packaging tests (#30155)
TEST: Update settings should go through cluster state (#29682)
...
Email message IDs are supposed to be unique. In order to guarantee this,
we need to take the action id of a watch action into account as well,
not just the watch id from the watch execution context. This prevents
that two actions from the same watch execution end up with the same
message id.
This is related to #30134. It modifies the start_trial action to require
an acknowledgement parameter in the rest request to actually start the
trial license. There are backwards compatibility issues as prior ES
versions did not support this parameter. To handle this, it is assumed
that a request coming from a node prior to 6.3 is acknowledged. And
attempts to write a non-acknowledged request to a prior to 6.3 node will
throw an exception.
Additionally this PR adds messages about the trial license the user is
generating.
Currently the test picks random java.util.TimeZone ids in some places.
Internally we still need to convert back to joda DateTimeZone by id
occassionally (e.g. when serializing to pre 6.3 versions). There are
some deprecated "SystemV/*" time zones that Jodas DateTimeZone refuses
to convert. This change excludes those rare cases from the set of
allowed random time zones. It would be quiet odd for them to appear in
practice.
Closes#30156
A few of the old style license got kept around because their comment
string did not start with a space. This caused the license check to not
see it as a license and skip it. This commit cleans it up.
The purpose of this change is to reflect what is the future way of configuring aliases in
the Rollover action for ILM. With this change, users will be able to edit all desired rollover-related
information in the index's template, so that the index-lifecycle policy does not need to be dependent on it.
* es/master:
Watcher: Fold two smoke test projects into smoke-test-watcher (#30137)
In the field capabilities API, deprecate support for providing fields in the request body. (#30157)
Set JAVA_HOME before forking setup commands (#29647)
Remove animal sniffer from low-level REST client (#29646)
Cleanup .gitignore (#30145)
Do not add noop from local translog to translog again (#29637)
Build: Assert jar LICENSE and NOTICE files match
Correct transport compression algorithm in docs (#29645)
[Test] Fix docs check for DEB package in packaging tests (#30126)
Painless: Docs Clean Up (#29592)
Fixes Eclipse build for sql jdbc project (#30114)
Remove reference to `not_analyzed`.
[Docs] Add community analysis plugin (#29612)
Adds tasks that check that the all jars that we build have LICENSE.txt
and NOTICE.txt files and that the files are correct. Sets check to
depend on these task.
This is mostly there for extra parnoia because we automatically
configure all Jar tasks to include the LICENSE.txt and NOTICE.txt
files anyway. But it is quite possible to add configuration to those
tasks that would override either file.
This causes check to depend on several more things than it used to.
Take, for example, javadoc:
check depends on the new verifyJavadocJarNotice which depends on
extractJavadocJar which depends on javadocJar which depends on
javadoc, this check now depends on javadoc.
The bundled configuration isn't recognised by eclipse so these
dependencies are missed when it imports the `x-pack:plugin:sql:jdbc`
project. This change makes these dependencies compile dependencies if
the build is running for Eclipse.
Tests need to wait for changes to the job's established memory usage to
propagate and an over enthusiastic optimisation meant jobs were updated
from stale state causing recent change to be lost.
This commit fixes the classpath for the SQL CLI tool on Windows. As the
x-pack bin folder was collapsed into the distribution bin folder, the
location of the classpath here needed to no longer contain the old
plugins directory.
This commit adds the distribution type to the startup scripts so that we
can discern from log output and the main response the type of the
distribution (deb/rpm/tar/zip).
With the move of X-Pack to a module, the classpath for the scripts needs
to be adjusted. This was done on Unix, but not for Windows. This commit
addresses Windows.
This commit adds the distribution flavor (default versus oss) to the
build process which is passed through the startup scripts to
Elasticsearch. This change will be used to customize the message on
attempting to install/remove x-pack based on the distribution flavor.
This commit makes x-pack a module and adds it to the default
distrubtion. It also creates distributions for zip, tar, deb and rpm
which contain only oss code.
The follow index api completely reuses CCS infrastructure that was exposed via:
https://github.com/elastic/elasticsearch/pull/29495
This means that the leader index parameter support the same ccs index
to indicate that an index resides in a different cluster.
I also added a qa module that smoke tests the cross cluster nature of ccr.
The idea is that this test just verifies that ccr can read data from a
remote leader index and that is it, no crazy randomization or indirectly
testing other features.
Previously, phase X's `after` step had `X` as its associated phase. This causes confusion because
we have only entered phase `X` once the `after` step is complete. Therefore, this refactor
pushes the after's phase to be associated with the previous phase. This first phase is an exception.
The first phase's `after` step is associated with the first phase (not some non-existent prior phase).
- Renames EnoughShardsWaitStep to ShrunkShardsAllocatedStep
- Changes ShrunkShardsAllocatedStep to check the shards of the shrunken index rather than the current one
- shrink index prefix is now passed into the steps of the shrink aciton
- Related Test Changes
Adds some missing tests including checking the hashcode and equals methods of `DeleteStep`, `StepKey`, and `TerminalPolicyStep` as well as adding a test for `DeleteAction.toSteps()`
- added ShrinkStep/Tests
- AsyncActionStep now passes in IndexMetaData instead of Index
- Delete usage of ClusterStateActionStep
- with ClusterStateActionStep gone, InitializePolicyContextStep
is the only other ClusterState-nonWait step
- Migrate setting-updates to UpdateSettingsStep
Also renames EnoughShardsWaitStep to ReplicasAllocatedStep, removes it from the allocate action and adds a check that th number of replicas in the cluster state is correct to it.
Various classes had some code that was not used and is not going to be needed so this change cleans up those classes so we don’t have dead code hanging around
The force-merge is an a TODO state due to the
unresolved issue around best_compression.
- updated ReadOnlyStep with tests
- implemented an update to the ForceMergeAction
- added UpdateBestCompressionSettingsStep
- added tests for SegmentCountStep
keep track of shard follow stats inside shard follow stats' node task instead of persistent task status.
By maintaining the shard follow stats inside its node task the stats update is quicker as
no cluster state update is required. The stats are now transient; meaning if the task
is going to run a different node then the stats are gone too. Currently only the processed
global checkpoint is being tracked and this is being restored when a shard follow node task
starts via the indices stats api (the reason of the first change of this change). Other stats
that we may add in the future (like fetch_time, see: https://gist.github.com/s1monw/dba13daf8493bf48431b72365e110717)
it is ok if we start from zero in case a shard follow task moves to another node.
This limit is based on the number of estimate bytes in each translog
operation that fall between the minimum and maximum request sequence number.
If this limit is met then the shard follow task executor will make sure
that a subsequent shard changes request will be performed to fetch the
remaining translog operations.
This limit is needed in order to protect against returning too many
translog operations in a single shard changes response.
Relates to #2436
We check for the existence of both leader and follower index, then properly
report to the caller. However, we do not return after reporting failure. This
causes the caller receive exception twice: IllegalArgumentException then
NullPointerException. This commit makes sure to stop the action after reporting
failure.
Looks like we need to split out the tests of core classes to core
and index-lifecycle ones stay in index-lifecycle.
I believe I got everything, although I may have missed at least one thing
checked status with
$ ./gradlew :x-pack-elasticsearch:plugin:index-lifecycle:check -Dtests.seed=39838421912001B4
$ ./gradlew :x-pack-elasticsearch:plugin:core:check -Dtests.seed=39838421912001B4
other things done in this PR:
- removal of a few unused variables/thrown exceptions/imports
- fix TimeseriesLifecycleTypeTests
- an all null AllocateAction was created
- fix AllocateActionTests
- woops. -Dtests.seed=39838421912001B4 resulted in two `null`s and an emptyMap.
this resulted in a test failure.
Specifically this change makes it optional to:
* Specify `includes`, `excludes` and `requires`maps in the allocate action as long as at least one fo the options is specified and is not an empty map
* Specify an `after` parameter on a phase. If no `after` value is specified `TimeValue.ZERO` is used and the phase will be moved to as soon as the previous phase reports `ACTIONS COMPLETED`. `after` is always non-null when we are serialising the Phase.
* Specify a `type` for a LifecyclePolicy. If no `type` is specified `TimeSeriesLifecycleType.INSTANCE` is used since this is currently the only production `type`. `type` is always non-null when we are serialising the LifecyclePolicy.
This commit enables the run task for ccr by specifying that the ccr
project not be evaluated until after core is evaluated. This is
important since ccr is alphabetically before core and thus Gradle
evaluates it first.
Relates #3665
This PR adds a new setting called `index.lifecycle.date` that
the ShrinkAction will be responsible for populating in the newly created index.
This way, we can continue to know when we should be executing the next phase
relative to the original index creation date, and not that of the shrunken index.
A node can stop being the master node whilst it is running, e.g. if it can’t access `minimumMasterNodes` number of master eligible nodes. Because of this we need logic in `IndexLifecycleService` that cancels the scheduled job if the node is no longer master and re-adds the job if the node becomes master again.
This does the following in sequential service polls
1. sets the index to read-only and runs shrink with a modified `index.lifecycle.name` setting set to `null`.
2. checks to see if shrink is complete, if it is...
b. set target index's `index.lifecycle.*` settings to the original index's values.
3. if not complete, just wait till next iteration
4. if operating on shrunken index, delete old index and add it as an alias to shrunken index
* Adds Allocate lifcycle action
* Addresses review comments
Still need to make a change in core for the FilterAllocationDecider to make the execute logic simpler
* Addresses more review comments
* Adds randomMap method to AllocateActionTests
* Addresses further review comments
The checkpoints in the assertion message that the follower checkpoint is
less than the leader checkpoint are backwards. This commit fixes this
message.
* Improves handling of exceptions in Index Lifecycle
This change improves a few different aspects:
* If an exception occurs executing the lifecycle of one index it is caught, logged and other indexes are still processed
* If the lifecycle policy specified in the settings does not exist an error is logged
* Fixes the exception when the delete action is run which occurs because Phase attempts to update the phase and action settings for the deleted index. A `LifecycleAction.indexSurvives()` method is introduced which defaults to `true` but can be overridden to indicate whether the index survives following completion of the action.
* Adds test
* Fix InternalIndexLifecycleContext to update state in memory
The internal and the mock index-lifecycle-context implementations differed
in that the InternalIndexLifecycleContext assumed no one would be using it after
it mutated state. This is not the case. We assume that the current context is updated after
a `#setAction` is called so that the listener can then appropriately use the newly modified
cluster state. since idxMeta was not being updated, any call to `context.getAction` was stale and
either returning null or the previous action, not the next action that was updated by `#setAction`.
Same goes for `setPhase`.
This PR should fix this so that the Mock and Internal implementations are more in line.
The shard follow task executor determines the range of translog operations
between the leader shard's global checkpoint and the last know processed
seqno by the current shard follow task that are missing.
Then the chunks coordinator can then chunk this range up in smaller ranges
if the requested range is above the configured max chunk size. If it is
smaller than the entire range then the chunk coordinator has just one
chuck to coordinate.
Each chunk is added to a queue and is processed by the ChunkProcessor,
that reads the translog ops from the leader shard and then indexes
these translog ops into the follow shard. After that a new chuck is polled
from the queue and the ChunkProcessor performs the same actions until
there are no more chunks in the queue to process. After that the shard
follow task executor will determine a new range of translog operations
to process.
This change changes the chunk coordinator to start polling from the chunk
queue with multiple threads at the same time to handle dealing with a higher
indexing load on the leader side better.
`IndexLifecycleInitialisationIT.testMasterFailover()` intermittently failed because the timeout of 10 seconds to check if the index had been deleted was not long enough sometimes with the poll interval set to 3 seconds. This change sets the poll interval to 1 seconds for the test so that the lifecycle is more responsive. This also means the default value for the poll interval can be safely changed without affecting the test.
This change moves the Action classes and referenced data model classes to the new :x-pack-elasticsearch:plugin:core project in preparation for splitting the x-pack features into their own gradle modules.
Note that the TransportAction classes had to be promoted to their own class file (rather than being inner classes to their Action) so they can remain in the plugin project (and will late be move to the `index-lifecycle` project when its created.
To clean up the parsing of the LifecyclePolicy this change moves the LifecycleType to its own class so it can be created in the normal parsing of LifecyclePolicy rather than having to parse to an intermediary object first. The LifecycleType is an interface which can be implemented for different lifecycle types. These types shiould be singletons and are register with the NamedXContentRegistry and NamedWriteableRegistry only so they are available when reading from a stream or parsing.
Removes the poll-interval from the IndexLifecycleMetadata and introduces it in
the form of a cluster setting that is configurable. Changes to this poll interval setting
will reflect in the Lifecycle Scheduler.
This includes changing NOCOMMIT comments to be NORELEASE comments so the build passes with them. We have tasks inGH for all these NORELEASE comments so they should be caught before merging to master
This action will rollover an index when executed if the provided conditions are met.
Users may specify the maximum age, maximum index size in bytes or maximum index size in number of documents as conditions for rollover.
When the action executes it firsts checks the local cluster state to find out if the alias exists on the index. If the alias does not exist then the index was either rolled over by a previous run or something else has rolled over the index so the action can be marked as completed. If the index still has the alias set the action will make a rollover index request using the Client. When that request returns and the listener is called the action will only be marked as complete if the response indicates the index was rolled over. If the index was not rolled over (because the conditions are not yet met) the action is not marked as complete and will be re-evaluated on the next call to execute.
* Fixed a small issue where each batch would fetch / index the previous batch last operation
* Made batch size a request param on the follow existing index api request.
This makes is easy to tune this param when running tests from scripts.
* Changed default batch size from 256 to 1024.
I forgot to configure a mapping in the follow shard shard, which caused
a dynamic update (due to type auto creation), but this was ignored.
Subsequent searches in follow index then failed due to a mapping missing.
(The _id couldn't be fetched during fetch phase, because the mapping was missing)
We should at a later stage investigate how to best solve this, but for
know to avoid confusion just fail if a dynamic update happens in a
follow shard.
This commit adds a bulk action for apply translog operations in bulk to
an index. This action is then used in the persistent task for CCR to
apply shard changes from a leader shard.
Relates #3147
This test was broken by an upstream change that no longer guarantees we
see the operations from the upstream translog in the order they appear
in that translog. As such, the assertions in this test were too strong
so this commit relaxes them.
Relates #3153
This means that the result of the action can now be async and we can then implement moving immediately to the next action if the current one is complete
The client and index metadata have now been abstracted away from the Lifecycle classes behind IndexLifecycleContext. This allow us to test the state machine without having to worry about how the state is persisted and read. It also makes the classes much easier to read and reason about.
The lifecycles are stored as custom metadata objects in the cluster state. This change also cleans up the parsing of the lifecycle state so that it can be parsed properly
Operations from a leader shard will be indexed into the engine with the
origin set to primary. The problem is here is that then we have primary
semantics in the engine such as assertions about sequence numbers being
unassigned, and we do not have correct semantics for out-of-order
delivery of operations (as we should on a following engine, whether or
not it is primary since the ordering is determined from the
leader). This commit handles this by always using the replica plan for
indexing into a following engine, whether or not the engine is for a
primary shard.
Relates #3000
A following engine even for a primary shard needs to maintain order of
operations semantics as if it were behaving like a replica. That is,
rather than assuming that the order of operations presented to the
engine is the de facto order of operations as is the case for a leader
engine for a primary shard, a following engine must behave like all
replicas behave which is that they resolve order of operations based on
sequence numbers. This commit causes this to be the case for following
engines.
Relates #2931
This test verifies that we have sufficient failover code so that
a newly elected master re-registers old schedules and fires them off.
All times are relative to the index creation date.
This commit is a first step towards a following engine
implementation. Future work will build on this by using this engine to
execute operations on a following engine from another engine (typically
a remote leader engine) that has already assigned sequence numbers to
such operations.
Relates #2776
This commit sets an index setting for the size of a translog generation
and increases the number of documents indexed to increase the chance of
multiple generations being present when testing getting operations
between two sequence numbers.
This commit fixes an off-by-one error in the shard changes action test
for getting operations between two sequence numbers. The off-by-one
error arises because sequence numbers are indexed from zero, so if N
documents are indexed then the maximum sequence number starting from
zero would be N - 1.
* xdcr: Add an internal api to read translog operations between a sequence number range.
This api will be used later by the persistent task for the following index to pull data from the leader index.
The persistent task can fetch the global checkpoint from the shard stats for each primary shard of the leader index.
Based on the global checkpoint of the primary shards of the following index, the persistent task can send several
calls to the internal api added in this commit to replicate changes from follow index to leader index in a batched manner.
Feature consists of a shell of a persistant task which will later be used to inspect the index settings and apply curator like changes to the index (move from hot to warm, rollover, shrink etc.)
This commit utilizes the pluggable engine factory feature in core to
introduce a pluggable engine factory for XDCR. For now this is only a
skeleton implementation to proof out the pluggable engine factory
concept. Future work will implement a genuine following engine for XDCR.
Relates #2655
This commit introduces the container class for CCR functionality. Future
work will expose more specific CCR functionality to the X-Pack plugin
through this class.
Relates #2704