Prior to this commit, if during fetch leader / follower GCP
a fatal error occurred, then the shard follow task was removed.
This is unexpected, because if such an error occurs during the lifetime of shard follow task then replication is stopped and the fatal error flag is set. This allows the ccr stats api to report the fatal exception that has occurred (instead of the user grepping through the elasticsearch logs).
This issue was found by a rare failure of the `FollowStatsIT#testFollowStatsApiIncludeShardFollowStatsWithRemovedFollowerIndex` test.
Closes#38779
* During fetching remote mapping if remote client is missing then
`NoSuchRemoteClusterException` was not handled.
* When adding remote connection, check that it is really connected
before continue-ing to run the tests.
Relates to #38695
This commit is the first step in integrating shard history retention
leases with CCR. In this commit we integrate shard history retention
leases with recovery from remote. Before we start transferring files, we
take out a retention lease on the primary. Then during the file copy
phase, we repeatedly renew the retention lease. Finally, when recovery
from remote is complete, we disable the background renewing of the
retention lease.
This commit adds a `ListenerTimeouts` class that will wrap a
`ActionListener` in a listener with a timeout scheduled on the generic
thread pool. If the timeout expires before the listener is completed,
`onFailure` will be called with an `ElasticsearchTimeoutException`.
Timeouts for the get ccr file chunk action are implemented using this
functionality. Additionally, this commit attempts to fix#38027 by also
blocking proxied get ccr file chunk actions. This test being un-muted is
useful to verify the timeout functionality.
Today when processing an operation on a replica engine (or the
following engine), we first add it to Lucene, then add it to translog,
then finally marks its seq_no as completed. If a flush occurs after step1,
but before step-3, the max_seq_no in the commit's user_data will be
smaller than the seq_no of some documents in the Lucene commit.
We verify seq_no_stats is aligned between copies at the end of some
disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due
to a lagged global checkpoint on replicas. The global checkpoint on
replicas is lagged because we sync the global checkpoint 30 seconds (by
default) after the last replication operation. This change reduces the
global checkpoint sync-internal to 1s in the disruption tests.
Closes#38318Closes#36789
The CCR REST tests that rely on these assertions are flaky. They are
flaky since the introduction of recovery from the remote.
The underlying problem is this: these tests are making assertions about
the number of operations read by the shard following task. However, with
recovery from remote, we no longer have guarantees that the assumptions
these tests were relying on hold. Namely, these tests were assuming that
the only way that a document could land in the follower index is via the
shard following task. With recovery from remote, there is another way,
which is via the files that are copied over during the recovery
phase. Most of the time this will not be a problem because with the
small number of documents that we are indexing in these tests, it is
usally not the case that a flush would occur and so there would not be
any documents in the files copied over. However, a flush can occur any
time at which point all of the indexed documents could end up in a safe
commit and copied over during recovery from remote. This commit modifies
these assertions to ones that are not prone to this issue, yet still
validate the health of the follower shard.
The previous logic for concurrent file chunk fetching did not allow for multiple chunks from the same
file to be fetched in parallel. The parallelism only allowed to fetch chunks from different files in
parallel. This required complex logic on the follower to be aware from which file it was already
fetching information, in order to ensure that chunks for the same file would be fetched in sequential
order. During benchmarking, this exhibited throughput issues when recovery came towards the end,
where it would only be sequentially fetching chunks for the same largest segment file, with
throughput considerably going down in a high-latency network as there was no parallelism anymore.
The new logic here follows the peer recovery model more closely, and sends multiple requests for
the same file in parallel, and then reorders the results as necessary. Benchmarks show that this
leads to better overall throughput and the implementation is also simpler.
The should fix the following NPE:
```
[2019-02-11T23:27:48,452][WARN ][o.e.p.PersistentTasksNodeService] [node_s_0] task kD8YzUhHTK6uKNBNQI-1ZQ-0 failed with an exception
1> java.lang.NullPointerException: null
1> at org.elasticsearch.xpack.ccr.action.ShardFollowTasksExecutor.lambda$fetchFollowerShardInfo$7(ShardFollowTasksExecutor.java:305) ~[main/:?]
1> at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:68) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:383) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeResponse(TransportBroadcastByNodeAction.java:352) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:324) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:314) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1108) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1189) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1169) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:417) [elasticsearch-8.0.0-SNAP
SHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:391) [elasticsearch-8.0.0-SNAP
SHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:687) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
```
Relates to #38779
* Add rolling upgrade multi cluster test module (#38277)
This test starts 2 clusters, each with 3 nodes.
First the leader cluster is started and tests are run against it and
then the follower cluster is started and tests execute against this two cluster.
Then the follower cluster is upgraded, one node at a time.
After that the leader cluster is upgraded, one node at a time.
Every time a node is upgraded tests are ran while both clusters are online.
(and either leader cluster has mixed node versions or the follower cluster)
This commit only tests CCR index following, but could be used for CCS tests as well.
In particular for CCR, unidirectional index following is tested during a rolling upgrade.
During the test several indices are created and followed in the leader cluster before or
while the follower cluster is being upgraded.
This tests also verifies that attempting to follow an index in the upgraded cluster
from the not upgraded cluster fails. After both clusters are upgraded following the
index that previously failed should succeed.
Relates to #37231 and #38037
* Filter out upgraded version index settings when starting index following (#38838)
The `index.version.upgraded` and `index.version.upgraded_string` are likely
to be different between leader and follower index. In the event that
a follower index gets restored on a upgraded node while the leader index
is still on non-upgraded nodes.
Closes#38835
Currently we index documents concurrently to attempt to ensure that we
update mappings during the restore process. However, this does not
actually test that the mapping will be correct and is dangerous as it
can lead to a misalignment between the max sequence number and the local
checkpoint. If these are not aligned, peer recovery cannot be completed
without initiating following which this test does not do. That causes
teardown assertions to fail.
This commit removes the concurrent indexing and flushes after the
documents are indexed. Additionally it modifies the mapping specific
test to ensure that there is a mapping update when the restore session
is initiated. This mapping update is picked up at the end of the restore
by the follower.
There were two documents (seq=2 and seq=103) missing on the follower in
one of the failures of `testFailOverOnFollower`. I spent several hours
on that failure but could not figure out the reason. I adjust log and
unmute this test so we can collect more information.
Relates #38633
The Close Index API has been refactored in 6.7.0 and it now performs
pre-closing sanity checks on shards before an index is closed: the maximum
sequence number must be equals to the global checkpoint. While this is a
strong requirement for regular shards, we identified the need to relax this
check in the case of CCR following shards.
The following shards are not in charge of managing the max sequence
number or global checkpoint, which are pulled from a leader shard. They
also fetch and process batches of operations from the leader in an unordered
way, potentially leaving gaps in the history of ops. If the following shard lags
a lot it's possible that the global checkpoint and max seq number never get
in sync, preventing the following shard to be closed and a new PUT Follow
action to be issued on this shard (which is our recommended way to
resume/restart a CCR following).
This commit allows each Engine implementation to define the specific
verification it must perform before closing the index. In order to allow
following/frozen/closed shards to be closed whatever the max seq number
or global checkpoint are, the FollowingEngine and ReadOnlyEngine do
not perform any check before the index is closed.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR.
The implementation uses the parallel file writer functionality that is also used by peer recoveries.
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
This is related to #35975. We do not want a slow master to fail a
recovery from remote process due to a slow put mappings call. This
commit increases the master node timeout on this call to 30 mins.
Currently the snapshot/restore process manually sets the global
checkpoint to the max sequence number from the restored segements. This
does not work for Ccr as this will lead to documents that would be
recovered in the normal followering operation from being recovered.
This commit fixes this issue by setting the initial global checkpoint to
the existing local checkpoint.
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.
Relates #38410
Introduced FollowParameters class that put follow, resume follow,
put auto follow pattern requests and follow info response classes reuse.
The FollowParameters class had the fields, getters etc. for the common parameters
that all these APIs have. Also binary and xcontent serialization /
parsing is handled by this class.
The follow, resume follow, put auto follow pattern request classes originally
used optional non primitive fields, so FollowParameters has that too and the follow info api can handle that now too.
Also the followerIndex field can in production only be specified via
the url path. If it is also specified via the request body then
it must have the same value as is specified in the url path. This
option only existed to xcontent testing. However the AbstractSerializingTestCase
base class now also supports createXContextTestInstance() to provide
a different test instance when testing xcontent, so allowing followerIndex
to be specified via the request body is no longer needed.
By moving the followerIndex field from Body to ResumeFollowAction.Request
class and not allowing the followerIndex field to be specified via
the request body the Body class is redundant and can be removed. The
ResumeFollowAction.Request class can then directly use the
FollowParameters class.
For consistency I also removed the ability to specified followerIndex
in the put follow api and the name in put auto follow pattern api via
the request body.
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.
This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.
The API keys:-
- by default do not have an expiration but expiration can be
configured where the API keys need to be expired after a
certain amount of time.
- when generated will keep authentication information of the user that
generated them.
- can be defined with a role describing the privileges for accessing
Elasticsearch and will be limited by the role of the user that
generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
before being deleted. The expired API keys remover task handles this.
Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`
The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```
Closes#34383
There are two issues regarding the way that we sync mapping from leader
to follower when a ccr restore is completed:
1. The returned mapping from a cluster service might not be up to date
as the mapping of the restored index commit.
2. We should not compare the mapping version of the follower and the
leader. They are not related to one another.
Moreover, I think we should only ensure that once the restore is done,
the mapping on the follower should be at least the mapping of the copied
index commit. We don't have to sync the mapping which is updated after
we have opened a session.
Relates #36879Closes#37887
This is related to #35975. Currently when an index falls behind a leader
it encounters a fatal exception. This commit adds a test for that
scenario. Additionally, it tests that the user can stop following, close
the follower index, and put follow again. After the indexing is
re-bootstrapped, it will recover the documents it lost in normal
following operations.
Because concurrent sync requests from a primary to its replicas could be
in flight, it can be the case that an older retention leases collection
arrives and is processed on the replica after a newer retention leases
collection has arrived and been processed. Without a defense, in this
case the replica would overwrite the newer retention leases with the
older retention leases. This commit addresses this issue by introducing
a versioning scheme to retention leases. This versioning scheme is used
to resolve out-of-order processing on the replica. We persist this
version into Lucene and restore it on recovery. The encoding of
retention leases is starting to get a little ugly. We can consider
addressing this in a follow-up.
This suite still fails one per week sometimes with a worrying assertion.
Sadly we are still unable to find the actual source.
Expected: <SeqNoStats{maxSeqNo=229, localCheckpoint=86, globalCheckpoint=86}>
but: was <SeqNoStats{maxSeqNo=229, localCheckpoint=-1, globalCheckpoint=86}>
This change enables trace log in the suite so we will have a better
picture if this fails again.
Relates #3333
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
Unlike assertBusy, awaitBusy does not retry if the code-block throws an
AssertionError. A refresh in atLeastDocsIndexed can fail because we call
this method while we are closing some node in FollowerFailOverIT.
Currently we use the raw byte array length when calling the IndexInput
read call to determine how many bytes we want to read. However, due to
how BigArrays works, the array length might be longer than the reference
length. This commit fixes the issue and uses the BytesRef length when
calling read. Additionally, it expands the index follow test to index
many more documents. These documents should potentially lead to large
enough segment files to trigger scenarios where this fix matters.
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.
This is a continuation of #28667, #36137 and also fixes#37708.