236 Commits

Author SHA1 Message Date
Jason Tedor
2d81fc3873
Keep CCR REST API specification with all of X-Pack (#33743)
This commit moves the CCR REST API specification out of the CCR
sub-project to locate them with the rest of the REST API specifications
for X-Pack.
2018-09-17 09:59:22 -04:00
Martijn van Groningen
481f8a9a07
[CCR] Make auto follow patterns work with security (#33501)
Relates to #33007
2018-09-17 07:29:00 +02:00
Jason Tedor
770ad53978
Introduce long polling for changes (#33683)
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
2018-09-16 10:35:23 -04:00
Jason Tedor
069605bd91
Do not count shard changes tasks against REST tests (#33738)
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
2018-09-16 07:32:12 -04:00
Jason Tedor
73417bf09a
Move CCR REST tests to a sub-project of ccr
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
2018-09-15 10:18:59 -04:00
Jason Tedor
aa56892f2f
Move CCR REST tests to ccr sub-project (#33731)
This commit moves the CCR REST tests to the ccr sub-project as another
step towards running :x-pack:plugin:ccr:check giving us full coverage on
CCR.
2018-09-15 09:18:15 -04:00
Jason Tedor
f037edb8e3
Move CCR monitoring tests to ccr sub-project (#33730)
This commit moves the CCR monitoring tests from the monitoring
sub-project to the ccr sub-project.
2018-09-15 09:16:33 -04:00
Martijn van Groningen
82a6ae1dae
[CCR] Move ccr tests in core module back to ccr module (#33711)
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.

This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check

before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
2018-09-14 17:18:00 +02:00
Jason Tedor
2282150f34
Expose retries for CCR fetch failures (#33694)
This commit exposes the number of times that a fetch has been tried to
the CCR stats endpoint, and to CCR monitoring.
2018-09-14 08:52:46 -04:00
Martijn van Groningen
222f42274e
[CCR] Check whether the rejected execution exception has the shutdown flag set (#33703)
and if so debug log it and otherwise rethrow.

This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
2018-09-14 13:28:11 +02:00
Martijn van Groningen
4bcad95fe7
[TEST] wait for no initializing shards 2018-09-14 09:59:24 +02:00
Martijn van Groningen
53ba253aa4
[CCR] Add validation for max_retry_delay (#33648) 2018-09-13 20:52:00 +02:00
Martijn van Groningen
a69ae6b89f
[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367)
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.

The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.

Closes #31505
2018-09-13 11:36:52 +02:00
Jason Tedor
eb715d5290
Add follower index to CCR monitoring and status (#33645)
This commit adds the follower index to CCR shard follow task status, and
to monitoring.
2018-09-12 17:35:06 -04:00
Martijn van Groningen
b5d8495789
[CCR] Add auto follow pattern APIs to transport client. (#33629) 2018-09-12 21:50:22 +02:00
Martijn van Groningen
5fa81310cc
[CCR] Added history uuid validation (#33546)
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.

* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
  index shards are the same as the recorded history UUIDs.
  If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
  on each shard changes api call verify whether their current history uuid
  is the same as the recorded history uuid.

Relates to #30086 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-09-12 19:42:00 +02:00
Martijn van Groningen
901d8035d9
[CCR] Update es monitoring mapping and (#33635)
* [CCR] Update es monitoring mapping and
change qa tests to query based on leader index.


Co-authored-by: Jason Tedor <jason@tedor.me>
2018-09-12 19:36:17 +02:00
Tanguy Leroux
bcac7f5e55 Fix checkstyle violation in ShardFollowNodeTask 2018-09-12 16:03:52 +02:00
Jason Tedor
23f12e42c1
Expose CCR stats to monitoring (#33617)
This commit exposes the CCR stats endpoint to monitoring collection.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2018-09-12 09:13:07 -04:00
Martijn van Groningen
96c49e5ed0
[CCR] Improve shard follow task's retryable error handling (#33371)
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.

Relates to #30086
2018-09-12 12:49:51 +02:00
Jason Tedor
20476b9e06
Disable CCR REST endpoints if CCR disabled (#33619)
This commit avoids enabling the CCR REST endpoints if CCR is disabled.
2018-09-12 01:54:34 -04:00
Jason Tedor
eca37e6e0a
Expose CCR to the transport client (#33608)
This commit exposes CCR to the transport client.
2018-09-11 16:37:52 -04:00
Martijn van Groningen
74d41857c6
mute test on windows
Relates #33570
2018-09-10 16:49:17 +02:00
Martijn van Groningen
8eebca32d2
[CCR] Delay auto follow license check (#33557)
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured

Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
2018-09-10 13:23:02 +02:00
Martijn van Groningen
c4adcee3ea
[CCR] Add create_follow_index privilege (#33559)
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.

Closes #33555
2018-09-10 13:08:20 +02:00
Jason Tedor
d1b99877fa
Remove underscore from auto-follow API (#33550)
This commit removes the leading underscore from _auto_follow in the
auto-follow API endpoints.
2018-09-09 14:42:49 -04:00
Nhat Nguyen
902d20cbbe CCR: Use single global checkpoint to normalize range (#33545)
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
2018-09-09 13:18:30 -04:00
Jason Tedor
6eca627409
Reverse logic for CCR license checks (#33549)
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
2018-09-09 10:22:22 -04:00
Jason Tedor
edc492419b
Add latch countdown on failure in CCR license tests (#33548)
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
2018-09-09 09:52:40 -04:00
Jason Tedor
c67b0ba33e
Create temporary directory if needed in CCR test
In the multi-cluster-with-non-compliant-license tests, we try to write
out a java.policy to a temporary directory. However, if this temporary
directory does not already exist then writing the java.policy file will
fail. This commit ensures that the temporary directory exists before we
attempt to write the java.policy file.
2018-09-09 07:16:56 -04:00
Jason Tedor
5a38c930fc
Add license checks for auto-follow implementation (#33496)
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
2018-09-09 07:06:55 -04:00
Simon Willnauer
c12d232215
Pass Directory instead of DirectoryService to Store (#33466)
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
2018-09-07 14:00:24 +02:00
Nhat Nguyen
8afe09a749
Pass TranslogRecoveryRunner to engine from outside (#33449)
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.

Relates #32867
2018-09-06 11:59:16 -04:00
Martijn van Groningen
ef207edbf0
test: do not schedule when test has stopped 2018-09-06 14:14:24 +02:00
Martijn van Groningen
cdd82bb203
test: fetch SeqNoStats inside try-catch block
Relates to #33457
2018-09-06 11:49:08 +02:00
Martijn van Groningen
a721d09c81
[CCR] Added auto follow patterns feature (#33118)
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.

This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
 the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.

This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:

```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
   "leader_index_pattern": ["logs-*", ...],
   "follow_index_pattern": "{{leader_index}}-copy",
   "max_concurrent_read_batches": 2
   ... // other optional parameters
}
```

and delete auto follow pattern api:

```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```

The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.

Relates to #33007


Co-authored-by: Jason Tedor jason@tedor.me
2018-09-06 08:01:58 +02:00
Jason Tedor
d71ced1b00
Generalize search.remote settings to cluster.remote (#33413)
With features like CCR building on the CCS infrastructure, the settings
prefix search.remote makes less sense as the namespace for these remote
cluster settings than does a more general namespace like
cluster.remote. This commit replaces these settings with cluster.remote
with a fallback to the deprecated settings search.remote.
2018-09-05 20:43:44 -04:00
Nhat Nguyen
16b53b5ab5 Mute testValidateFollowingIndexSettings
Tracked at #33379
2018-09-04 09:03:26 -04:00
Alpar Torok
7f7e8fd733
Disable assemble task instead of removing it (#33348) 2018-09-04 07:32:14 +03:00
Nhat Nguyen
3a1dad1050 Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-09-02 19:17:51 -04:00
Nhat Nguyen
c6b011f8ea
TEST: Increase timeout testFollowIndexAndCloseNode (#33333)
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
2018-09-02 09:28:47 -04:00
Martijn van Groningen
66b164c2a6
[CCR] Removed custom follow and unfollow api's reponse classes with AcknowledgedResponse (#33260)
These response classes did not add any value and in that case just AcknowledgedResponse should be used.

I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
2018-08-31 21:16:06 +07:00
Nhat Nguyen
d3f32273eb Merge branch 'master' into ccr 2018-08-30 23:22:58 -04:00
Martijn van Groningen
41c7fc8d37
[CCR] Introduce leader index name & last fetch time stats to stats api response (#33155) 2018-08-29 10:54:58 +07:00
Nhat Nguyen
e2b931e80b
Use Lucene history in primary-replica resync (#33178)
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.

Relates #29530
2018-08-28 10:44:15 -04:00
Jason Tedor
5954354e62
Fix ShardFollowNodeTask.Status equals and hash code (#33189)
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
2018-08-28 08:53:45 -04:00
Jason Tedor
cd91992c89
Only fetch mapping updates when necessary (#33182)
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
2018-08-28 06:06:22 -04:00
Jason Tedor
0e5d42ca38
Merge branch 'master' into ccr
* master:
  Adjust BWC version on mapping version
  Token API supports the client_credentials grant (#33106)
  Build: forked compiler max memory matches jvmArgs (#33138)
  Introduce mapping version to index metadata (#33147)
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  Fix grammar in contributing docs
  SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Fix forbiddenapis on java 11  (#33116)
  Apply publishing to genreate pom (#33094)
  Have circuit breaker succeed on unknown mem usage
  Do not lose default mapper on metadata updates (#33153)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
2018-08-27 13:49:59 -04:00
Martijn van Groningen
47e9e72df2
reduce maximum number of writes to speed up test 2018-08-27 12:14:46 +07:00
Jason Tedor
ef9607ea0c
Track fetch exceptions for shard follow tasks (#33047)
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
2018-08-24 14:21:23 -04:00