This change adds an extra check that verifies that all primary shards
have been started of an index that is about to be auto followed.
If not all primary shards have been started for an index
then the next auto follow run will try to follow to auto follow
this index again.
Closes#35480
When there is no persistent tasks metadata we could hit a null pointer
exception when executing a follower stats request. This is because we
inspect the persistent tasks metadata. Yet, if no tasks have been
registered, this is null (as opposed to empty). We need to avoid
de-referencing the persistent tasks metadata in this case. That is what
this commit does, and we add a test for this situation.
Currently there is a common NPE in the IndexFollowingIT that does not
indicate the test failing. This is when a cluster state listener is
called and certain index metadata is not yet available.
This commit checks that the metadata is not null before performing the
logic that depends on the metadata.
Removed extending of AbstractComponent and changed logger usage to
explicit declaration. Abstract classes still have logger
declaration using this.getClass() in order to show implementation class
name in its logs.
See #34488
avoid the assertions that check the log files, because that does not work on Windows.
The rest of the test is still useful and should work on Windows CI.
Currently on Windows CI this qa module fails because there is just one test and
that test si ignored if OS is Windows.
Validate remote cluster license as part of put auto follow pattern api call
in addition of validation that when auto follow coordinator starts auto
following indices in the leader cluster.
Also added qa module that tests what happens to ccr after downgrading to basic license.
Existing active follow indices should remain to follow,
but the auto follow feature should not pickup new leader indices.
Adjust list of dynamic index settings that should be replicated
and added a test that verifies whether builtin dynamic index settings
are classified as replicated or non replicated (whitelisted).
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
An auto follow pattern:
* cannot start with `_`
* cannot contain a `,`
* can be encoded in UTF-8
* the length of UTF-8 encoded bytes is no longer than 255 bytes
In order to start shard follow tasks, the resume follow api already
needs execute N requests to the elected master node.
The pause follow API is also a master node action, which would make
how both APIs execute more consistent.
Error was thrown if leader index had no soft deletes enabled, but it then continued creating the follower index.
The test caught this bug, but very rarely due to timing issue.
Build failure instance:
```
1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-templat
e, one_shard_index_template], shards [2]/[0], mappings []
1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-
index][0]] ...]).
1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0]
1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFoll
owerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index
FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES!
> Throwable #1: java.lang.AssertionError:
> Expected: <false>
> but: was <true>
> at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83)
> at java.lang.Thread.run(Thread.java:748)
```
Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console
The suite FollowerFailOverIT is failing because some documents are not
replicated to the follower. Maybe the FollowTask is not working as
expected or the background indexers eat all resources while the follower
cluster is trying to reform after a failover; then CI is not fast enough
to replicate all the indexed docs within 60 seconds (sometimes I see 80k
docs on the leader).
This commit limits the number of documents to be indexed into the leader
index by the background threads so that we can eliminate the latter
case. This change also replaces a docCount assertion with a docIds
assertion so we can have more information if these tests fail again.
Relates #33337
Only the response classes of get auto follow pattern, the follow and stats APIs
were moved away from Streamable. The other APIs use `AcknowledgedResponse`
or `BaseTasksResponse` as response class and
moving that class away from Streamable is a bigger change.
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
Only the follow stats request couldn't be changed to use Writeable serialization,
because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
Index shard stats for the follower shard are fetched, when a shard follow task is started.
This is needed in order to bootstap the shard follow task with the follower global checkpoint.
Sometimes index shard stats are not available (e.g. during a restart) and
we fail now, while it is very likely that these stats will be available some time later.
This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit.
Renamed existing `max_write_buffer_size` to ``max_write_buffer_count` to indicate that limit is count based.
Closes#34705
We should not create a follower index and abort a follow request if the
leader does not have soft-deletes. Moreover, we also should not
auto-follow an index if it does not have soft-deletes.
With this change, we apply the common test config automatically to all
newly created tasks instead of opting in specifically.
For plugin authors using the plugin externally this means that the
configuration will be applied to their RandomizedTestingTasks as well.
The purpose of the task is to simplify setup and make it easier to
change projects that use the `test` task but actually run integration
tests to use a task called `integTest` for clarity, but also because
we may want to configure and run them differently.
E.x. using different levels of concurrency.
Per #31717 this commit changes the defaults to the following:
Batch size of 5120 ops.
Maximum of 12 concurrent read requests.
Maximum of 9 concurrent write requests.
This is not necessarily our final values but it's good to have these as defaults for the purposes of initial testing.
* Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction`
instead of `HandledAction`, this removes a sync cluster state api call.
* Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and
`TransportResumeFollowAction`.
* Changed `PauseFollowAction.Request` to not use `readFrom()`.
Both testFollowIndexAndCloseNode and testFailOverOnFollower failed
because they responded to the FollowTask a TransportService closed
exception which is currently considered as a fatal error. This behavior
is not desirable since a closing node can throw that exception, and we
should retry in that case.
This change adds TransportService closed error to the list of retryable
errors.
Closes#34694