The checkpoints in the assertion message that the follower checkpoint is
less than the leader checkpoint are backwards. This commit fixes this
message.
* Improves handling of exceptions in Index Lifecycle
This change improves a few different aspects:
* If an exception occurs executing the lifecycle of one index it is caught, logged and other indexes are still processed
* If the lifecycle policy specified in the settings does not exist an error is logged
* Fixes the exception when the delete action is run which occurs because Phase attempts to update the phase and action settings for the deleted index. A `LifecycleAction.indexSurvives()` method is introduced which defaults to `true` but can be overridden to indicate whether the index survives following completion of the action.
* Adds test
* Fix InternalIndexLifecycleContext to update state in memory
The internal and the mock index-lifecycle-context implementations differed
in that the InternalIndexLifecycleContext assumed no one would be using it after
it mutated state. This is not the case. We assume that the current context is updated after
a `#setAction` is called so that the listener can then appropriately use the newly modified
cluster state. since idxMeta was not being updated, any call to `context.getAction` was stale and
either returning null or the previous action, not the next action that was updated by `#setAction`.
Same goes for `setPhase`.
This PR should fix this so that the Mock and Internal implementations are more in line.
The shard follow task executor determines the range of translog operations
between the leader shard's global checkpoint and the last know processed
seqno by the current shard follow task that are missing.
Then the chunks coordinator can then chunk this range up in smaller ranges
if the requested range is above the configured max chunk size. If it is
smaller than the entire range then the chunk coordinator has just one
chuck to coordinate.
Each chunk is added to a queue and is processed by the ChunkProcessor,
that reads the translog ops from the leader shard and then indexes
these translog ops into the follow shard. After that a new chuck is polled
from the queue and the ChunkProcessor performs the same actions until
there are no more chunks in the queue to process. After that the shard
follow task executor will determine a new range of translog operations
to process.
This change changes the chunk coordinator to start polling from the chunk
queue with multiple threads at the same time to handle dealing with a higher
indexing load on the leader side better.
`IndexLifecycleInitialisationIT.testMasterFailover()` intermittently failed because the timeout of 10 seconds to check if the index had been deleted was not long enough sometimes with the poll interval set to 3 seconds. This change sets the poll interval to 1 seconds for the test so that the lifecycle is more responsive. This also means the default value for the poll interval can be safely changed without affecting the test.
This change moves the Action classes and referenced data model classes to the new :x-pack-elasticsearch:plugin:core project in preparation for splitting the x-pack features into their own gradle modules.
Note that the TransportAction classes had to be promoted to their own class file (rather than being inner classes to their Action) so they can remain in the plugin project (and will late be move to the `index-lifecycle` project when its created.
To clean up the parsing of the LifecyclePolicy this change moves the LifecycleType to its own class so it can be created in the normal parsing of LifecyclePolicy rather than having to parse to an intermediary object first. The LifecycleType is an interface which can be implemented for different lifecycle types. These types shiould be singletons and are register with the NamedXContentRegistry and NamedWriteableRegistry only so they are available when reading from a stream or parsing.
Removes the poll-interval from the IndexLifecycleMetadata and introduces it in
the form of a cluster setting that is configurable. Changes to this poll interval setting
will reflect in the Lifecycle Scheduler.
This includes changing NOCOMMIT comments to be NORELEASE comments so the build passes with them. We have tasks inGH for all these NORELEASE comments so they should be caught before merging to master
This action will rollover an index when executed if the provided conditions are met.
Users may specify the maximum age, maximum index size in bytes or maximum index size in number of documents as conditions for rollover.
When the action executes it firsts checks the local cluster state to find out if the alias exists on the index. If the alias does not exist then the index was either rolled over by a previous run or something else has rolled over the index so the action can be marked as completed. If the index still has the alias set the action will make a rollover index request using the Client. When that request returns and the listener is called the action will only be marked as complete if the response indicates the index was rolled over. If the index was not rolled over (because the conditions are not yet met) the action is not marked as complete and will be re-evaluated on the next call to execute.
* Fixed a small issue where each batch would fetch / index the previous batch last operation
* Made batch size a request param on the follow existing index api request.
This makes is easy to tune this param when running tests from scripts.
* Changed default batch size from 256 to 1024.
I forgot to configure a mapping in the follow shard shard, which caused
a dynamic update (due to type auto creation), but this was ignored.
Subsequent searches in follow index then failed due to a mapping missing.
(The _id couldn't be fetched during fetch phase, because the mapping was missing)
We should at a later stage investigate how to best solve this, but for
know to avoid confusion just fail if a dynamic update happens in a
follow shard.
This commit adds a bulk action for apply translog operations in bulk to
an index. This action is then used in the persistent task for CCR to
apply shard changes from a leader shard.
Relates #3147
This test was broken by an upstream change that no longer guarantees we
see the operations from the upstream translog in the order they appear
in that translog. As such, the assertions in this test were too strong
so this commit relaxes them.
Relates #3153