This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that holds the retention leases in memory.
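To make the approach concrete, here is a minimal sketch of the
expire-on-read idea; all names are illustrative, not the actual
ReplicationTracker API:
```
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative sketch: expired leases are pruned at the moment the current
// leases are requested, rather than by a background expiration task.
class LeaseMap {
    record Lease(String id, long retainingSeqNo, long timestampMillis, String source) {}

    private final Map<String, Lease> leases = new HashMap<>(); // in-memory lease map
    private final LongSupplier clock;       // current time source
    private final long leaseLifespanMillis; // leases expire if not renewed within this window

    LeaseMap(LongSupplier clock, long leaseLifespanMillis) {
        this.clock = clock;
        this.leaseLifespanMillis = leaseLifespanMillis;
    }

    synchronized Collection<Lease> getLeases() {
        final long now = clock.getAsLong();
        // clean the in-memory map as we read it
        leases.values().removeIf(l -> now - l.timestampMillis() > leaseLifespanMillis);
        return Collections.unmodifiableCollection(leases.values());
    }
}
```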
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.
Shard history retention leases are aimed at preventing shard history
consumers from having to fall back to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication and local shard
recoveries. A future consumer will be the changes API.
Further, index lifecycle management requires coordinating with some of
these consumers; otherwise, it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.
Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.
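As a hedged sketch of the lease state and how it would feed the soft
delete retention policy (field and method names are illustrative):
```
import java.util.Collection;

// A retention lease as described above: an identifier, a retaining
// sequence number, an acquisition/renewal timestamp, and the source.
record RetentionLease(String id, long retainingSequenceNumber, long timestampMillis, String source) {}

final class SoftDeletePolicySketch {
    // Operations above the minimum of all retaining sequence numbers must
    // survive merges that would otherwise reclaim soft-deleted documents.
    static long minimumRetainingSequenceNumber(Collection<RetentionLease> leases) {
        return leases.stream()
                .mapToLong(RetentionLease::retainingSequenceNumber)
                .min()
                .orElse(Long.MAX_VALUE); // no leases means nothing is pinned by leases
    }
}
```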
This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
- timestamps
- expiration
- persistence to and recovery from Lucene
- handoff during primary relocation
- sharing retention leases with replicas
- exposing leases in shard-level statistics
- integration with cross-cluster replication
These will occur individually in follow-up commits.
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer, this option computes a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough matches have been collected.
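For illustration, a hedged sketch of setting the option through
SearchSourceBuilder, assuming it exposes both the boolean and the
integer forms described above:
```
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

class TrackTotalHitsExamples {
    // Boolean form: request the exact total hit count.
    static SearchSourceBuilder exact() {
        return new SearchSourceBuilder()
                .query(QueryBuilders.matchAllQuery())
                .trackTotalHits(true);
    }

    // Integer form: count accurately up to 10,000 matches, then report a
    // lower bound and let Lucene skip non-competitive hits.
    static SearchSourceBuilder bounded() {
        return new SearchSourceBuilder()
                .query(QueryBuilders.matchAllQuery())
                .trackTotalHitsUpTo(10_000);
    }
}
```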
Relates #33028
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
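An illustrative analogue of the change (the real SourceToParse
constructor takes more arguments; the point is that routing becomes a
required constructor parameter instead of a post-construction setter):
```
// Sketch only: routing is fixed at construction time, so callers cannot
// forget to set it (null means "no routing").
final class SourceToParseSketch {
    private final String index;
    private final String id;
    private final byte[] source;
    private final String routing;

    SourceToParseSketch(String index, String id, byte[] source, String routing) {
        this.index = index;
        this.id = id;
        this.source = source;
        this.routing = routing; // no setter exists, so it cannot be forgotten
    }

    String routing() {
        return routing;
    }
}
```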
Relates #36921
The AutoFollowCoordinator should be resilient to the follower index
having already been created; in that case it should only update the
auto follow metadata to record that the follower index was created.
Relates to #33007
Currently, users of the auto follow stats are unable to see whether an
auto follow error was recent or old. The new timestamp field will help
users distinguish between old and new errors.
Both index following and auto following should be resilient against
missing remote connections. This can happen when a user accidentally
removes them. When this happens, auto following and index following will
retry instead of failing with unrecoverable exceptions.
Both the put follow and put auto follow APIs validate whether the
remote cluster connection exists. The logic added in this change only
covers the case where the remote connection gets removed during the
lifetime of a follower index or auto follow pattern. This retry behavior
is similar to how CCR deals with authorization errors.
Closes #36667
Closes #36255
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
The ccr clean_session action is implemented using this client.
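A hedged sketch of what such an interface could look like (the method
name is an assumption, not necessarily the actual signature):
```
import org.elasticsearch.cluster.node.DiscoveryNode;

// Sketch: a request implementing this interface tells the remote cluster
// aware client which remote node it prefers. If no direct connection to
// that node is available, the client sends the request as a proxy action
// that is forwarded to the requested node.
interface RemoteClusterAwareRequest {
    DiscoveryNode getPreferredTargetNode();
}
```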
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
restored files is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored.
This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in the auto follow stats. Marking the index as auto followed is important, because otherwise the auto follower will continuously try to auto follow it and fail.
Relates to #33007
This commit is related to #36127. It adds a CcrRestoreSourceService to
track the Engine.IndexCommitRef needed for in-process file restores. When a
follower starts restoring a shard through the CcrRepository, it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.
Once the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
Currently, the CcrRepositoryManager only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
For each remote cluster, the auto follow coordinator starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since the last auto follow is
reported per remote cluster and gives insight into whether the auto
follow process is alive.
Relates to #33007
Originates from #35895
If a primary promotion happens in the test testAddRemoveShardOnLeader, the
max_seq_no_of_updates_or_deletes on a new primary might be higher than the
max_seq_no_of_updates_or_deletes on the replicas or copies of the follower.
Relates #36607
This commit adds support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
in the delete and index transport actions and requests. A follow-up will
add sequence numbers to the update and get results.
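A hedged sketch of how a caller might use this (treat the exact method
names as assumptions):
```
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

class OccExample {
    // Only apply the write if the document's last modification still matches
    // the sequence number and primary term observed when it was read;
    // otherwise the operation is rejected with a conflict.
    static IndexRequest guardedWrite(long observedSeqNo, long observedPrimaryTerm) {
        return new IndexRequest("index", "_doc", "1")
                .source("{\"field\":\"value\"}", XContentType.JSON)
                .setIfSeqNo(observedSeqNo)
                .setIfPrimaryTerm(observedPrimaryTerm);
    }
}
```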
Relates #36148
Relates #10708
For class fields of type collection whose order is not important
and for which duplicates are not permitted, we declare them as `Set`s.
Usually the definition is a `HashSet`, but in this case `TreeSet` is used
instead to aid testing.
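For example (illustrative field, not a specific class from this change):
```
import java.util.Set;
import java.util.TreeSet;

class Example {
    // TreeSet instead of HashSet: still no duplicates and order-agnostic for
    // callers, but the sorted iteration order makes test assertions deterministic.
    private final Set<String> followedIndexUUIDs = new TreeSet<>();
}
```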
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
* Parallel restore from two different snapshots
* Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
* Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
testFailLeaderReplicaShard periodically fails because we concurrently
index to the leader group and close one of its replicas. If a
replication request hits a closing shard, we will fail that shard;
however, failing a shard is not supported by the test framework, and
this makes the test fail.
This commit adds support to engine operations for resolving and verifying the sequence number and
primary term of the last modification to a document before performing an operation. This is
infrastructure to move our [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) API to use sequence numbers instead of internal versioning.
Relates #36148
Relates #10708
Changed the AutoFollowCoordinator to make use of the wait_for_metadata_version
feature in the cluster state API and removed the hardcoded poll interval.
Originates from #35895
Relates to #33007
The auto follow coordinator keeps track of the UUIDs of indices that it has followed. The index UUID strings need to be cleaned up in the case that these indices are removed in the remote cluster.
Relates to #33007
1. CCR tests work without any changes.
2. `testDanglingIndices` requires changes to the source code (added TODO).
3. `testIndexDeletionWhenNodeRejoins`: because it uses just two nodes,
adding the node to exclusions is needed on restart.
4. `testCorruptTranslogTruncationOfReplica` starts a dedicated master
node, because otherwise the cluster does not form if nodes are stopped
and one node is started back.
5. `testResolvePath` needs a TEST cluster, because all nodes are stopped
at the end of the test and it's not possible to perform the checks
needed by a SUITE cluster.
6. `SnapshotDisruptionIT`: without changes, the test fails because Zen2
retries snapshot creation as soon as the network partition heals. This
results in a race between creating the snapshot and the test cleanup
logic (deleting the index). Zen1, on the other hand, also schedules a
retry, but it takes some time after the network partition heals, so the
cleanup logic executes later and the test passes. A check that the
snapshot is eventually created has been added to the end of the test.
Renamed the follow qa modules:
`multi-cluster-downgraded-to-basic-license` to `downgraded-to-basic-license`
`multi-cluster-with-non-compliant-license` to `non-compliant-license`
`multi-cluster-with-security` to `security`
Moved the `chain` module into the `multi-cluster` module and
changed `multi-cluster` to start 3 clusters.
Followup from #36031
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with the Lucene index. However,
we reset the LocalCheckpointTracker in testDedupByPrimaryTerm, causing this
assertion to be violated.
This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.
Relates #34474
This commit moves back to use explicit dependsOn for test tasks on
check. Not all tasks extending RandomizedTestingTask should be run by
check directly.
This commit changes the format of `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equal to `eq`)
or a lower bound of the total (in which case it is equal to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out of this change (retrieve the total hits as a number in the REST response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow-up (to allow numbers to be passed to `track_total_hits`).
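On the Java side, the new object maps onto Lucene's TotalHits; a hedged
sketch of consuming it from a search response:
```
import org.apache.lucene.search.TotalHits;
import org.elasticsearch.action.search.SearchResponse;

class TotalHitsExample {
    static String describe(SearchResponse response) {
        TotalHits totalHits = response.getHits().getTotalHits();
        // relation EQUAL_TO corresponds to "eq", GREATER_THAN_OR_EQUAL_TO to "gte"
        boolean accurate = totalHits.relation == TotalHits.Relation.EQUAL_TO;
        return totalHits.value + (accurate ? " (exact)" : "+ (lower bound)");
    }
}
```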
Relates #33028
This is a follow-up to #36086. It renames the internal repository
actions to be prefixed by "internal". This allows the system user to
execute the actions.
Additionally, this PR stops casting Client to NodeClient. The client we
have is a NodeClient so executing the actions will be local.
Replaced the poll interval setting with a hardcoded poll interval.
The hardcoded interval will be removed in a follow-up change to make
use of the cluster state API's wait_for_metadata_version.
Previously, auto following was bootstrapped from the thread pool scheduler,
but now auto followers for new remote clusters are bootstrapped when
a new cluster state is published.
Originates from #35895
Relates to #33007
Closes #35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
The current response format is:
```
{
  "pattern1": {
    ...
  },
  "pattern2": {
    ...
  }
}
```
The new format is:
```
{
  "patterns": [
    {
      "name": "pattern1",
      "pattern": {
        ...
      }
    },
    {
      "name": "pattern2",
      "pattern": {
        ...
      }
    }
  ]
}
```
This format is more structured and more friendly for parsing and generating specs.
This is a breaking change, but it is better to do this now while ccr
is still a beta feature than later.
Follow up from #36049
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
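A hedged sketch of the plugin hook (parameter lists and names are
assumptions; the point is that internal repositories come from node-local
factories rather than from the cluster state):
```
import java.util.Collections;
import java.util.Map;

interface RepositoryPluginSketch {
    interface Repository {}                          // stand-in for the real Repository type
    interface Factory { Repository create(String name); }

    // Factories for "internal repositories": node-local, not registrable
    // through the external repository api, not stored in the cluster state.
    default Map<String, Factory> getInternalRepositories() {
        return Collections.emptyMap();
    }
}
```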
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` as far as Lucene is concerned; otherwise, e.g.,
merging those shards is going to be considered illegal. At the moment, we
use the latest Lucene version when creating a shard, which could cause
shards to have different created versions, e.g. in case of forced
allocation. This commit makes sure to reuse the appropriate Lucene
version in order to avoid such issues.
Closes #33826
The AutoFollowCoordinator should take into account that, after auto
following an index and while marking that leader index as followed, the
auto follow pattern may have been removed via the delete auto follow
patterns api.
Also fixed a bug so that the auto follow coordinator does not die when
it tries to get a remote client for a cluster whose remote connection
has been removed.
Closes #35480
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
Some tests kill nodes, and otherwise it would take 60s by default
for replicas to be allocated, which is longer than we wait
to reach a green state in tests.
Relates to #35403
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
This change adds an extra check that verifies that all primary shards
of an index that is about to be auto followed have been started.
If not all primary shards have been started for an index,
then the next auto follow run will try to auto follow
this index again.
Closes #35480
When there is no persistent tasks metadata we could hit a null pointer
exception when executing a follower stats request. This is because we
inspect the persistent tasks metadata. Yet, if no tasks have been
registered, this is null (as opposed to empty). We need to avoid
de-referencing the persistent tasks metadata in this case. That is what
this commit does, and we add a test for this situation.
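The fix amounts to a null check before dereferencing; a sketch under the
assumption that the 6.x PersistentTasksCustomMetaData type is used:
```
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.persistent.PersistentTasksCustomMetaData;

class FollowerStatsGuard {
    // The custom metadata is null (not empty) when no persistent task has
    // ever been registered, so guard before touching it.
    static boolean hasPersistentTasks(ClusterState state) {
        PersistentTasksCustomMetaData tasks =
                state.metaData().custom(PersistentTasksCustomMetaData.TYPE);
        return tasks != null && tasks.tasks().isEmpty() == false;
    }
}
```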
Currently there is a common NPE in IndexFollowingIT that does not
indicate an actual test failure. It happens when a cluster state listener
is called and certain index metadata is not yet available.
This commit checks that the metadata is not null before performing the
logic that depends on the metadata.
Removed extending of AbstractComponent and changed logger usage to
explicit declaration. Abstract classes still declare a logger
using this.getClass() in order to show the implementation class
name in their logs.
See #34488
Avoid the assertions that check the log files, because that does not work on Windows.
The rest of the test is still useful and should work on Windows CI.
Currently on Windows CI this qa module fails because there is just one test and
that test is ignored if the OS is Windows.
Validate the remote cluster license as part of the put auto follow
pattern api call, in addition to the validation that occurs when the
auto follow coordinator starts auto following indices in the leader
cluster.
Also added a qa module that tests what happens to ccr after downgrading
to a basic license. Existing active follow indices should continue to
follow, but the auto follow feature should not pick up new leader
indices.
Adjusted the list of dynamic index settings that should be replicated
and added a test that verifies whether builtin dynamic index settings
are classified as replicated or non-replicated (whitelisted).
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
An auto follow pattern:
* cannot start with `_`
* cannot contain a `,`
* can be encoded in UTF-8
* the length of UTF-8 encoded bytes is no longer than 255 bytes
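A minimal sketch of these checks (method and message strings are
hypothetical):
```
import java.nio.charset.StandardCharsets;

class AutoFollowPatternNameValidation {
    static void validate(String name) {
        if (name.startsWith("_")) {
            throw new IllegalArgumentException("pattern must not start with '_'");
        }
        if (name.contains(",")) {
            throw new IllegalArgumentException("pattern must not contain ','");
        }
        // any Java string is UTF-8 encodable; only the encoded length is bounded
        if (name.getBytes(StandardCharsets.UTF_8).length > 255) {
            throw new IllegalArgumentException("pattern must not exceed 255 UTF-8 encoded bytes");
        }
    }
}
```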
In order to start shard follow tasks, the resume follow api already
needs to execute N requests to the elected master node.
The pause follow API is now also a master node action, which makes
how both APIs execute more consistent.
An error was thrown if the leader index did not have soft deletes
enabled, but the follower index was then created anyway.
The test caught this bug, but only very rarely, due to a timing issue.
Build failure instance:
```
1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-template, one_shard_index_template], shards [2]/[0], mappings []
1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-index][0]] ...]).
1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0]
1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index
FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES!
> Throwable #1: java.lang.AssertionError:
> Expected: <false>
> but: was <true>
> at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83)
> at java.lang.Thread.run(Thread.java:748)
```
Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console
The suite FollowerFailOverIT is failing because some documents are not
replicated to the follower. Maybe the FollowTask is not working as
expected or the background indexers eat all resources while the follower
cluster is trying to reform after a failover; then CI is not fast enough
to replicate all the indexed docs within 60 seconds (sometimes I see 80k
docs on the leader).
This commit limits the number of documents to be indexed into the leader
index by the background threads so that we can eliminate the latter
case. This change also replaces a docCount assertion with a docIds
assertion so we can have more information if these tests fail again.
Relates #33337
Only the response classes of get auto follow pattern, the follow and stats APIs
were moved away from Streamable. The other APIs use `AcknowledgedResponse`
or `BaseTasksResponse` as response class and
moving that class away from Streamable is a bigger change.
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failures caused by the `settings` removal
and drop any now unused `settings` instances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
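As an illustration of the mechanical shape of items 1-3 (class name
hypothetical):
```
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Before: extends AbstractComponent and calls super(settings), which
// provided the inherited logger.
//
//   class FooService extends AbstractComponent {
//       FooService(Settings settings) {
//           super(settings);
//       }
//   }
//
// After: no AbstractComponent; classes that log declare their own logger.
class FooService {
    private static final Logger logger = LogManager.getLogger(FooService.class);
}
```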
Relates to #34488
Only the follow stats request couldn't be changed to use Writeable serialization,
because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` to `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
Index shard stats for the follower shard are fetched when a shard follow task is started.
This is needed in order to bootstrap the shard follow task with the follower global checkpoint.
Sometimes index shard stats are not available (e.g. during a restart) and
we currently fail, while it is very likely that these stats will be available some time later.