Commit Graph

144 Commits

Author SHA1 Message Date
Martijn van Groningen 013b64a07c
[CCR] Change FollowIndexAction.Request class to be more user friendly (#33810)
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
2018-09-19 07:18:24 +02:00
Martijn van Groningen 805a12361f
[CCR] Fail with a descriptive error if leader index does not exist (#33797)
Closes #33737
2018-09-18 21:47:02 +02:00
Martijn van Groningen 9fe5a273aa
[TEST] handle failed search requests differently 2018-09-18 15:55:27 +02:00
Martijn van Groningen 47b86d6e6a
[CCR] Changed AutoFollowCoordinator to keep track of certain statistics (#33684)
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
   (e.g. create_and_follow api call fails) or cluster alias
   (e.g. fetching remote cluster state fails).

Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.

Relates to #33007
2018-09-18 09:43:50 +02:00
Martijn van Groningen 15f30d689b
[CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and (#33777)
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.

The extra caused by level is not necessary here:

```
"fetch_exceptions": [
              {
                "from_seq_no": 1,
                "retries": 106,
                "exception": {
                  "type": "exception",
                  "reason": "[index1] IndexNotFoundException[no such index]",
                  "caused_by": {
                    "type": "index_not_found_exception",
                    "reason": "no such index",
                    "index_uuid": "_na_",
                    "index": "index1"
                  }
                }
              }
            ],
```
2018-09-17 22:33:37 +02:00
Martijn van Groningen d8dc042514
[CCR] Handle leader index with no mapping correctly (#33770)
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.

This commit fixes that, by allowing that a leader index does not yet have
a mapping.
2018-09-17 19:47:40 +02:00
Martijn van Groningen 7046cc467f
[CCR] Make index.xpack.ccr.following_index an internal setting (#33768) 2018-09-17 18:08:19 +02:00
Martijn van Groningen 5d2a01dcc3
[CCR] Fail with a good error if a follow index does not have ccr metadata (#33761)
instead of a NPE.
2018-09-17 18:00:16 +02:00
Jason Tedor 2d81fc3873
Keep CCR REST API specification with all of X-Pack (#33743)
This commit moves the CCR REST API specification out of the CCR
sub-project to locate them with the rest of the REST API specifications
for X-Pack.
2018-09-17 09:59:22 -04:00
Martijn van Groningen 481f8a9a07
[CCR] Make auto follow patterns work with security (#33501)
Relates to #33007
2018-09-17 07:29:00 +02:00
Jason Tedor 770ad53978
Introduce long polling for changes (#33683)
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
2018-09-16 10:35:23 -04:00
Jason Tedor 069605bd91
Do not count shard changes tasks against REST tests (#33738)
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
2018-09-16 07:32:12 -04:00
Jason Tedor 73417bf09a
Move CCR REST tests to a sub-project of ccr
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
2018-09-15 10:18:59 -04:00
Jason Tedor aa56892f2f
Move CCR REST tests to ccr sub-project (#33731)
This commit moves the CCR REST tests to the ccr sub-project as another
step towards running :x-pack:plugin:ccr:check giving us full coverage on
CCR.
2018-09-15 09:18:15 -04:00
Jason Tedor f037edb8e3
Move CCR monitoring tests to ccr sub-project (#33730)
This commit moves the CCR monitoring tests from the monitoring
sub-project to the ccr sub-project.
2018-09-15 09:16:33 -04:00
Martijn van Groningen 82a6ae1dae
[CCR] Move ccr tests in core module back to ccr module (#33711)
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.

This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check

before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
2018-09-14 17:18:00 +02:00
Jason Tedor 2282150f34
Expose retries for CCR fetch failures (#33694)
This commit exposes the number of times that a fetch has been tried to
the CCR stats endpoint, and to CCR monitoring.
2018-09-14 08:52:46 -04:00
Martijn van Groningen 222f42274e
[CCR] Check whether the rejected execution exception has the shutdown flag set (#33703)
and if so debug log it and otherwise rethrow.

This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
2018-09-14 13:28:11 +02:00
Martijn van Groningen 4bcad95fe7
[TEST] wait for no initializing shards 2018-09-14 09:59:24 +02:00
Martijn van Groningen 53ba253aa4
[CCR] Add validation for max_retry_delay (#33648) 2018-09-13 20:52:00 +02:00
Martijn van Groningen a69ae6b89f
[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367)
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.

The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.

Closes #31505
2018-09-13 11:36:52 +02:00
Jason Tedor eb715d5290
Add follower index to CCR monitoring and status (#33645)
This commit adds the follower index to CCR shard follow task status, and
to monitoring.
2018-09-12 17:35:06 -04:00
Martijn van Groningen b5d8495789
[CCR] Add auto follow pattern APIs to transport client. (#33629) 2018-09-12 21:50:22 +02:00
Martijn van Groningen 5fa81310cc
[CCR] Added history uuid validation (#33546)
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.

* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
  index shards are the same as the recorded history UUIDs.
  If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
  on each shard changes api call verify whether their current history uuid
  is the same as the recorded history uuid.

Relates to #30086 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-09-12 19:42:00 +02:00
Martijn van Groningen 901d8035d9
[CCR] Update es monitoring mapping and (#33635)
* [CCR] Update es monitoring mapping and
change qa tests to query based on leader index.


Co-authored-by: Jason Tedor <jason@tedor.me>
2018-09-12 19:36:17 +02:00
Tanguy Leroux bcac7f5e55 Fix checkstyle violation in ShardFollowNodeTask 2018-09-12 16:03:52 +02:00
Jason Tedor 23f12e42c1
Expose CCR stats to monitoring (#33617)
This commit exposes the CCR stats endpoint to monitoring collection.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2018-09-12 09:13:07 -04:00
Martijn van Groningen 96c49e5ed0
[CCR] Improve shard follow task's retryable error handling (#33371)
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.

Relates to #30086
2018-09-12 12:49:51 +02:00
Jason Tedor 20476b9e06
Disable CCR REST endpoints if CCR disabled (#33619)
This commit avoids enabling the CCR REST endpoints if CCR is disabled.
2018-09-12 01:54:34 -04:00
Jason Tedor eca37e6e0a
Expose CCR to the transport client (#33608)
This commit exposes CCR to the transport client.
2018-09-11 16:37:52 -04:00
Martijn van Groningen 74d41857c6
mute test on windows
Relates #33570
2018-09-10 16:49:17 +02:00
Martijn van Groningen 8eebca32d2
[CCR] Delay auto follow license check (#33557)
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured

Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
2018-09-10 13:23:02 +02:00
Martijn van Groningen c4adcee3ea
[CCR] Add create_follow_index privilege (#33559)
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.

Closes #33555
2018-09-10 13:08:20 +02:00
Jason Tedor d1b99877fa
Remove underscore from auto-follow API (#33550)
This commit removes the leading underscore from _auto_follow in the
auto-follow API endpoints.
2018-09-09 14:42:49 -04:00
Nhat Nguyen 902d20cbbe CCR: Use single global checkpoint to normalize range (#33545)
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
2018-09-09 13:18:30 -04:00
Jason Tedor 6eca627409
Reverse logic for CCR license checks (#33549)
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
2018-09-09 10:22:22 -04:00
Jason Tedor edc492419b
Add latch countdown on failure in CCR license tests (#33548)
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
2018-09-09 09:52:40 -04:00
Jason Tedor c67b0ba33e
Create temporary directory if needed in CCR test
In the multi-cluster-with-non-compliant-license tests, we try to write
out a java.policy to a temporary directory. However, if this temporary
directory does not already exist then writing the java.policy file will
fail. This commit ensures that the temporary directory exists before we
attempt to write the java.policy file.
2018-09-09 07:16:56 -04:00
Jason Tedor 5a38c930fc
Add license checks for auto-follow implementation (#33496)
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
2018-09-09 07:06:55 -04:00
Simon Willnauer c12d232215
Pass Directory instead of DirectoryService to Store (#33466)
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
2018-09-07 14:00:24 +02:00
Nhat Nguyen 8afe09a749
Pass TranslogRecoveryRunner to engine from outside (#33449)
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.

Relates #32867
2018-09-06 11:59:16 -04:00
Martijn van Groningen ef207edbf0
test: do not schedule when test has stopped 2018-09-06 14:14:24 +02:00
Martijn van Groningen cdd82bb203
test: fetch `SeqNoStats` inside try-catch block
Relates to #33457
2018-09-06 11:49:08 +02:00
Martijn van Groningen a721d09c81
[CCR] Added auto follow patterns feature (#33118)
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.

This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
 the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.

This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:

```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
   "leader_index_pattern": ["logs-*", ...],
   "follow_index_pattern": "{{leader_index}}-copy",
   "max_concurrent_read_batches": 2
   ... // other optional parameters
}
```

and delete auto follow pattern api:

```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```

The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.

Relates to #33007


Co-authored-by: Jason Tedor jason@tedor.me
2018-09-06 08:01:58 +02:00
Jason Tedor d71ced1b00
Generalize search.remote settings to cluster.remote (#33413)
With features like CCR building on the CCS infrastructure, the settings
prefix search.remote makes less sense as the namespace for these remote
cluster settings than does a more general namespace like
cluster.remote. This commit replaces these settings with cluster.remote
with a fallback to the deprecated settings search.remote.
2018-09-05 20:43:44 -04:00
Nhat Nguyen 16b53b5ab5 Mute testValidateFollowingIndexSettings
Tracked at #33379
2018-09-04 09:03:26 -04:00
Alpar Torok 7f7e8fd733
Disable assemble task instead of removing it (#33348) 2018-09-04 07:32:14 +03:00
Nhat Nguyen 3a1dad1050 Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-09-02 19:17:51 -04:00
Nhat Nguyen c6b011f8ea
TEST: Increase timeout testFollowIndexAndCloseNode (#33333)
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
2018-09-02 09:28:47 -04:00
Martijn van Groningen 66b164c2a6
[CCR] Removed custom follow and unfollow api's reponse classes with AcknowledgedResponse (#33260)
These response classes did not add any value and in that case just AcknowledgedResponse should be used.

I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
2018-08-31 21:16:06 +07:00