The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
Relates to #33007
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.
The extra caused by level is not necessary here:
```
"fetch_exceptions": [
{
"from_seq_no": 1,
"retries": 106,
"exception": {
"type": "exception",
"reason": "[index1] IndexNotFoundException[no such index]",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index",
"index_uuid": "_na_",
"index": "index1"
}
}
}
],
```
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.
This commit fixes that, by allowing that a leader index does not yet have
a mapping.
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.
This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check
before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
and if so debug log it and otherwise rethrow.
This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.
The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.
Closes#31505
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.
* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
index shards are the same as the recorded history UUIDs.
If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
on each shard changes api call verify whether their current history uuid
is the same as the recorded history uuid.
Relates to #30086
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.
Relates to #30086
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured
Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.
Closes#33555
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
In the multi-cluster-with-non-compliant-license tests, we try to write
out a java.policy to a temporary directory. However, if this temporary
directory does not already exist then writing the java.policy file will
fail. This commit ensures that the temporary directory exists before we
attempt to write the java.policy file.
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.
Relates #32867
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.
This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.
This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:
```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
"leader_index_pattern": ["logs-*", ...],
"follow_index_pattern": "{{leader_index}}-copy",
"max_concurrent_read_batches": 2
... // other optional parameters
}
```
and delete auto follow pattern api:
```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```
The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.
Relates to #33007
Co-authored-by: Jason Tedor jason@tedor.me
With features like CCR building on the CCS infrastructure, the settings
prefix search.remote makes less sense as the namespace for these remote
cluster settings than does a more general namespace like
cluster.remote. This commit replaces these settings with cluster.remote
with a fallback to the deprecated settings search.remote.
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
These response classes did not add any value and in that case just AcknowledgedResponse should be used.
I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.
Relates #29530
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
* master:
Adjust BWC version on mapping version
Token API supports the client_credentials grant (#33106)
Build: forked compiler max memory matches jvmArgs (#33138)
Introduce mapping version to index metadata (#33147)
SQL: Enable aggregations to create a separate bucket for missing values (#32832)
Fix grammar in contributing docs
SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
APM server monitoring (#32515)
Support only string `format` in date, root object & date range (#28117)
[Rollup] Move toBuilders() methods out of rollup config objects (#32585)
Fix forbiddenapis on java 11 (#33116)
Apply publishing to genreate pom (#33094)
Have circuit breaker succeed on unknown mem usage
Do not lose default mapper on metadata updates (#33153)
Fix a mappings update test (#33146)
Reload Secure Settings REST specs & docs (#32990)
Refactor CachingUsernamePassword realm (#32646)
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
Welp, I broke this. I merged a change to auto-discover the CCR QA tests
by making :x-pack:plugin:ccr:check auto-discover the check tasks in the
qa sub-project. Yet, the check tasks for these sub-projects did not
depend on the necessary test tasks (as we were previously doing this
directly from the ccr build file. This commit fixes this!
Today we are by-hand maintaining a list of CCR QA sub-projects that the
check task depends on. This commit simplifies this by finding these
sub-projects automatically and adding their check task as dependencies
of the CCR check task.
This commit implements licensing for CCR. CCR will require a platinum
license, and administrative endpoints will be disabled when a license is
non-compliant.
If a leader index is deleted while there is an active follower, the
follower will send shard changes requests bound for the leader
index. Today this will result in a null pointer exception because there
will not be an index routing table for the index. A null pointer
exception looks like a bug to a user so this commit addresses this by
throwing an index not found exception instead.
For a new feature like CCR we will go without this extra layer of
indirection. This commit replaces all /_xpack/ccr/_(\S+) endpoints by
/_ccr/$1 endpoints.
Today ShardFollowNodeTask might fetch some operations more than once.
This happens because we ask the leading for up to max_batch_count
operations (instead of the left-over size) for the left-over request.
The leading then can freely respond up to the max_batch_count, and at
the same time, if one of the previous requests completed, we might issue
another read request whose range overlaps with the response of the
left-over request.
Closes#32453
Today we do not check if the `following_index` setting of the follower
is enabled or not when processing a follow-request. If that setting is
disabled, the follower will use the default engine, not the following
engine. This change checks and rejects such invalid follow requests.
Relates #30086
Today it's possible to encounter an Index operation in Lucene whose
_source is disabled, and _recovery_source was pruned by the MergePolicy.
If it's the case, we create a Translog#Index without source and let the
caller validate it later. However, this approach is challenging for the
caller.
Deletes and No-Ops don't allow invoking "source()" method. The caller
has to make sure to call "source()" only on index operations. The
current implementation in CCR does not follow this and fail to replica
deletes or no-ops. Moreover, it's easier to reason if a Translog#Index
always has the source.
Today we consider a read request is exhausted if from_seqno is equal to
or greater than the max_required_seqno. However, if we stop when
from_seqno equals to the max_required_seqno, we will miss an operation
whose seqno is max_required_seqno because we have not seen that
operation yet.
We modified the way we calculate to_seqno in #32121 but did not adjust
this test accordingly. If min_seqno equals to max_seqno, the size should be
one instead of zero.
Relates #32121
Normally translog operations will not be replayed on the primary.
Following engine is an exception where we replay translog on both
primary and replica as a non-primary strategy. Even though we won't use
the version_type in the following engine, we still need to pass a valid
value for the primary operation in order not to trip assertions in an
engine.
This commit passes version_type EXTERNAL for translog operation if its
origin is primary.
Relates #31945
The added tests are based on specific scenarios as described in the test plan.
Before this change the ShardFollowNodeTaskTests contained more random like tests,
but these have been removed and in a followup pr better random tests will
be added in a new test class as is described in the test plan.
* master:
Painless: Simplify Naming in Lookup Package (#32177)
Handle missing values in painless (#32207)
add support for write index resolution when creating/updating documents (#31520)
ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
Remove indication of future multi-homing support (#32187)
Rest test - allow for snapshots to take 0 milliseconds
Make x-pack-core generate a pom file
Rest HL client: Add put watch action (#32026)
Build: Remove pom generation for plugin zip files (#32180)
Fix comments causing errors with Java 11
Fix rollup on date fields that don't support epoch_millis (#31890)
Detect and prevent configuration that triggers a Gradle bug (#31912)
[test] port linux package packaging tests (#31943)
Revert "Introduce a Hashing Processor (#31087)" (#32178)
Remove empty @return from JavaDoc
Adjust SSLDriver behavior for JDK11 changes (#32145)
[test] use randomized runner in packaging tests (#32109)
Add support for field aliases. (#32172)
Painless: Fix caching bug and clean up addPainlessClass. (#32142)
Call setReferences() on custom referring tokenfilters in _analyze (#32157)
Fix BwC Tests looking for UUID Pre 6.4 (#32158)
Improve docs for search preferences (#32159)
use before instead of onOrBefore
Add more contexts to painless execute api (#30511)
Add EC2 credential test for repository-s3 (#31918)
A replica can be promoted and started in one cluster state update (#32042)
Fix Java 11 javadoc compile problem
Fix CP for namingConventions when gradle home has spaces (#31914)
Fix `range` queries on `_type` field for singe type indices (#31756)
[DOCS] Update TLS on Docker for 6.3 (#32114)
ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
Remove versionType from translog (#31945)
Switch distribution to new style Requests (#30595)
Build: Skip jar tests if jar disabled
Painless: Add PainlessClassBuilder (#32141)
Build: Make additional test deps of check (#32015)
Disable C2 from using AVX-512 on JDK 10 (#32138)
Build: Move shadow customizations into common code (#32014)
Painless: Fix Bug with Duplicate PainlessClasses (#32110)
Remove empty @param from Javadoc
Re-disable packaging tests on suse boxes
Docs: Fix missing example script quote (#32010)
[ML] Wait for aliases in multi-node tests (#32086)
[ML] Move analyzer dependencies out of categorization config (#32123)
Ensure to release translog snapshot in primary-replica resync (#32045)
Handle TokenizerFactory TODOs (#32063)
Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
Updates the build to gradle 4.9 (#32087)
Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
Check that client methods match API defined in the REST spec (#31825)
Enable testing in FIPS140 JVM (#31666)
Fix put mappings java API documentation (#31955)
Add exclusion option to `keep_types` token filter (#32012)
[Test] Modify assert statement for ssl handshake (#32072)
Tests shard follow task in the context of a leader and follower ReplicationGroup,
in order to test how the shard follow logic reacts to certain shard related
failure scenarios.
More tests will need to be added, but this indicates what changes need to be made
to have these tests.
Relates to #30102
The current shard follow mechanism is complex and does not give us easy ways the have visibility into the system (e.g. why we are falling behind).
The main reason why it is complex is because the current design is highly asynchronous. Also in the current model it is hard to apply backpressure
other than reducing the concurrent reads from the leader shard.
This PR has the following changes:
* Rewrote the shard follow task to coordinate the shard follow mechanism between a leader and follow shard in a single threaded manner.
This allows for better unit testing and makes it easier to add stats.
* All write operations read from the shard changes api should be added to a buffer instead of directly sending it to the bulk shard operations api.
This allows to apply backpressure. In this PR there is a limit that controls how many write ops are allowed in the buffer after which no new reads
will be performed until the number of ops is below that limit.
* The shard changes api includes the current global checkpoint on the leader shard copy. This allows reading to be a more self sufficient process;
instead of relying on a background thread to fetch the leader shard's global checkpoint.
* Reading write operations from the leader shard (via shard changes api) is a separate step then writing the write operations (via bulk shards operations api).
Whereas before a read would immediately result into a write.
* The bulk shard operations api returns the local checkpoint on the follow primary shard, to keep the shard follow task up to date with what has been written.
* Moved the shard follow logic that was previously in ShardFollowTasksExecutor to ShardFollowNodeTask.
* Moved over the changes from #31242 to make shard follow mechanism resilient from node and shard failures.
Relates to #30086
* elastic/master: (53 commits)
Painless: Restructure/Clean Up of Spec Documentation (#31013)
Update ignore_unmapped serialization after backport
Add back dropped substitution on merge
high level REST api: cancel task (#30745)
Enable engine factory to be pluggable (#31183)
Remove vestiges of animal sniffer (#31178)
Rename elasticsearch-nio to nio (#31186)
Rename elasticsearch-core to core (#31185)
Move cli sub-project out of server to libs (#31184)
[DOCS] Fixes broken link in auditing settings
QA: Better seed nodes for rolling restart
[DOCS] Moves ML content to stack-docs
[DOCS] Clarifies recommendation for audit index output type (#31146)
Add nio-transport as option for http smoke tests (#31162)
QA: Set better node names on rolling restart tests
Add support for ignore_unmapped to geo sort (#31153)
Share common parser in some AcknowledgedResponses (#31169)
Fix random failure on SearchQueryIT#testTermExpansionExceptionOnSpanFailure
Remove reference to multiple fields with one name (#31127)
Remove BlobContainer.move() method (#31100)
...
Today if a user omits the `_source` entirely or modifies the source
on indexing we have no chance to re-create the document after it has
been added. This is an issue for CCR and recovery based on soft deletes
which we are going to make the default. This change adds an additional
recovery source if the source is disabled or modified that is only kept
around until the document leaves the retention policy window.
This change adds a merge policy that efficiently removes this extra source
on merge for all document that are live and not in the retention policy window
anymore.
The primary shard copy on the following has authority of the replication
operations that occur on the following side in cross-cluster
replication. Yet today we are using the primary term directly from the
operations on the leader side. Instead we should be replacing the
primary term on the following side with the primary term of the primary
on the following side. This commit does this by copying the translog
operations with the corrected primary term. This ensures that we use
this primary term while applying the operations on the primary, and when
replicating them across to the replica (where the replica request was
carrying the primary term of the primary shard copy on the follower).
The old perform request methods on the REST client have been deprecated
in favor using request-flavored methods. This commit addresses the use
of these deprecated methods in the CCR test suite.
This commit clarifies the origin of the global checkpoint that the
following shard uses and replaces illegal_state_exc E by an assertion.
Relates #30980
Today before reading operations on the leading shard, we minimization
the requesting range with the global checkpoint. However, this might
make the request invalid if the following shard generates a requesting
range based on the global-checkpoint from a primary shard and sends
that request to a replica whose global checkpoint is lagged.
Another issue is that we are mutating the request when applying
minimization. If the request becomes invalid on a replica, we will
reroute the mutated request instead of the original one to the primary.
This commit removes the minimization and replaces it by a range check
with the local checkpoint.
The shard changes api returns the minimum IndexMetadata version the leader
index needs to have. If the leader side is behind on IndexMetadata version
then follow shard task waits with processing write operations until the
mapping has been fetched from leader index and applied in follower index
in the background.
The cluster state api is used to fetch the leader mapping and put mapping api
to apply the mapping in the follower index. This works because put mapping
api accepts fields that are already defined.
Relates to #30086
The TODOs in the rest actions was incorrect. The problem was that
these rest actions used `follow_index` as first named variable in the path
under which the rest actions were registered. Other candidate rest actions that
also have a named variable as first element in the path (but with a different
name) get resolved as rest parameters too and passed down to the rest
action that actually ends up getting executed.
In the case of the follow index api, a `index` parameter got passed down
to `RestFollowExistingAction`, but that param was never used. This caused the
follow index api call to fail, because of unused http parameters.
This change doesn't fixes that problem, but works around it by using
`index` as named variable for the follow index (instead of `follow_index`).
Relates to #30102
If security is enabled today with ccr then the follow index api will
fail with the fact that system user does not have privileges to use
the shard changes api. The reason that system user is used is because
the persistent tasks that keep the shards in sync runs in the background
and the user that invokes the follow index api only start those background
processes.
I think it is better that the system user isn't used by the persistent
tasks that keep shards in sync, but rather runs as the same user that
invoked the follow index api and use the permissions that that user has.
This is what this PR does, and this is done by keeping track of
security headers inside the persistent task (similar to how rollup does this).
This PR also adds a cluster ccr priviledge that allows a user to follow
or unfollow an index. Finally if a user that wants to follow an index,
it needs to have read and monitor privileges on the leader index and
monitor and write privileges on the follow index.
This commit adds an API to read translog snapshot from Lucene,
then cut-over from the existing translog to the new API in CCR.
Relates #30086
Relates #29530
[CCR] added rest specs and simple rest test for follow and unfollow apis, also
Added an acknowledge field in follow and unfollow api responses. Currently these api return an empty response and fixed bug in unfollow api that didn't cleanup node tasks properly.
This commit adds a tombstone document into Lucene for every No-op.
With this change, Lucene index is expected to have a complete history
of operations like Translog. In fact, this guarantee is subjected to the
soft-deletes retention merge policy.
Relates #29530
The follow index api completely reuses CCS infrastructure that was exposed via:
https://github.com/elastic/elasticsearch/pull/29495
This means that the leader index parameter support the same ccs index
to indicate that an index resides in a different cluster.
I also added a qa module that smoke tests the cross cluster nature of ccr.
The idea is that this test just verifies that ccr can read data from a
remote leader index and that is it, no crazy randomization or indirectly
testing other features.
keep track of shard follow stats inside shard follow stats' node task instead of persistent task status.
By maintaining the shard follow stats inside its node task the stats update is quicker as
no cluster state update is required. The stats are now transient; meaning if the task
is going to run a different node then the stats are gone too. Currently only the processed
global checkpoint is being tracked and this is being restored when a shard follow node task
starts via the indices stats api (the reason of the first change of this change). Other stats
that we may add in the future (like fetch_time, see: https://gist.github.com/s1monw/dba13daf8493bf48431b72365e110717)
it is ok if we start from zero in case a shard follow task moves to another node.
This limit is based on the number of estimate bytes in each translog
operation that fall between the minimum and maximum request sequence number.
If this limit is met then the shard follow task executor will make sure
that a subsequent shard changes request will be performed to fetch the
remaining translog operations.
This limit is needed in order to protect against returning too many
translog operations in a single shard changes response.
Relates to #2436
We check for the existence of both leader and follower index, then properly
report to the caller. However, we do not return after reporting failure. This
causes the caller receive exception twice: IllegalArgumentException then
NullPointerException. This commit makes sure to stop the action after reporting
failure.
This commit enables the run task for ccr by specifying that the ccr
project not be evaluated until after core is evaluated. This is
important since ccr is alphabetically before core and thus Gradle
evaluates it first.
Relates #3665