558 Commits

Author SHA1 Message Date
Martijn van Groningen
a69ae6b89f
[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367)
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.

The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.

Closes #31505
2018-09-13 11:36:52 +02:00
Jason Tedor
eb715d5290
Add follower index to CCR monitoring and status (#33645)
This commit adds the follower index to CCR shard follow task status, and
to monitoring.
2018-09-12 17:35:06 -04:00
Martijn van Groningen
b5d8495789
[CCR] Add auto follow pattern APIs to transport client. (#33629) 2018-09-12 21:50:22 +02:00
Martijn van Groningen
5fa81310cc
[CCR] Added history uuid validation (#33546)
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.

* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
  index shards are the same as the recorded history UUIDs.
  If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
  on each shard changes api call verify whether their current history uuid
  is the same as the recorded history uuid.

Relates to #30086 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-09-12 19:42:00 +02:00
Tanguy Leroux
bcac7f5e55 Fix checkstyle violation in ShardFollowNodeTask 2018-09-12 16:03:52 +02:00
Jason Tedor
23f12e42c1
Expose CCR stats to monitoring (#33617)
This commit exposes the CCR stats endpoint to monitoring collection.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2018-09-12 09:13:07 -04:00
Martijn van Groningen
96c49e5ed0
[CCR] Improve shard follow task's retryable error handling (#33371)
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.

Relates to #30086
2018-09-12 12:49:51 +02:00
Jason Tedor
20476b9e06
Disable CCR REST endpoints if CCR disabled (#33619)
This commit avoids enabling the CCR REST endpoints if CCR is disabled.
2018-09-12 01:54:34 -04:00
Jason Tedor
eca37e6e0a
Expose CCR to the transport client (#33608)
This commit exposes CCR to the transport client.
2018-09-11 16:37:52 -04:00
Martijn van Groningen
8eebca32d2
[CCR] Delay auto follow license check (#33557)
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured

Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
2018-09-10 13:23:02 +02:00
Martijn van Groningen
c4adcee3ea
[CCR] Add create_follow_index privilege (#33559)
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.

Closes #33555
2018-09-10 13:08:20 +02:00
Jason Tedor
d1b99877fa
Remove underscore from auto-follow API (#33550)
This commit removes the leading underscore from _auto_follow in the
auto-follow API endpoints.
2018-09-09 14:42:49 -04:00
Nhat Nguyen
902d20cbbe CCR: Use single global checkpoint to normalize range (#33545)
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
2018-09-09 13:18:30 -04:00
Jason Tedor
6eca627409
Reverse logic for CCR license checks (#33549)
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
2018-09-09 10:22:22 -04:00
Jason Tedor
edc492419b
Add latch countdown on failure in CCR license tests (#33548)
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
2018-09-09 09:52:40 -04:00
Jason Tedor
5a38c930fc
Add license checks for auto-follow implementation (#33496)
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
2018-09-09 07:06:55 -04:00
Simon Willnauer
c12d232215
Pass Directory instead of DirectoryService to Store (#33466)
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
2018-09-07 14:00:24 +02:00
Nhat Nguyen
8afe09a749
Pass TranslogRecoveryRunner to engine from outside (#33449)
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.

Relates #32867
2018-09-06 11:59:16 -04:00
Martijn van Groningen
ef207edbf0
test: do not schedule when test has stopped 2018-09-06 14:14:24 +02:00
Martijn van Groningen
cdd82bb203
test: fetch SeqNoStats inside try-catch block
Relates to #33457
2018-09-06 11:49:08 +02:00
Martijn van Groningen
a721d09c81
[CCR] Added auto follow patterns feature (#33118)
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.

This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
 the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.

This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:

```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
   "leader_index_pattern": ["logs-*", ...],
   "follow_index_pattern": "{{leader_index}}-copy",
   "max_concurrent_read_batches": 2
   ... // other optional parameters
}
```

and delete auto follow pattern api:

```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```

The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.

Relates to #33007


Co-authored-by: Jason Tedor jason@tedor.me
2018-09-06 08:01:58 +02:00
Nhat Nguyen
16b53b5ab5 Mute testValidateFollowingIndexSettings
Tracked at #33379
2018-09-04 09:03:26 -04:00
Nhat Nguyen
3a1dad1050 Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-09-02 19:17:51 -04:00
Nhat Nguyen
c6b011f8ea
TEST: Increase timeout testFollowIndexAndCloseNode (#33333)
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
2018-09-02 09:28:47 -04:00
Martijn van Groningen
66b164c2a6
[CCR] Removed custom follow and unfollow api's reponse classes with AcknowledgedResponse (#33260)
These response classes did not add any value and in that case just AcknowledgedResponse should be used.

I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
2018-08-31 21:16:06 +07:00
Nhat Nguyen
d3f32273eb Merge branch 'master' into ccr 2018-08-30 23:22:58 -04:00
Martijn van Groningen
41c7fc8d37
[CCR] Introduce leader index name & last fetch time stats to stats api response (#33155) 2018-08-29 10:54:58 +07:00
Nhat Nguyen
e2b931e80b
Use Lucene history in primary-replica resync (#33178)
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.

Relates #29530
2018-08-28 10:44:15 -04:00
Jason Tedor
5954354e62
Fix ShardFollowNodeTask.Status equals and hash code (#33189)
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
2018-08-28 08:53:45 -04:00
Jason Tedor
cd91992c89
Only fetch mapping updates when necessary (#33182)
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
2018-08-28 06:06:22 -04:00
Jason Tedor
0e5d42ca38
Merge branch 'master' into ccr
* master:
  Adjust BWC version on mapping version
  Token API supports the client_credentials grant (#33106)
  Build: forked compiler max memory matches jvmArgs (#33138)
  Introduce mapping version to index metadata (#33147)
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  Fix grammar in contributing docs
  SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Fix forbiddenapis on java 11  (#33116)
  Apply publishing to genreate pom (#33094)
  Have circuit breaker succeed on unknown mem usage
  Do not lose default mapper on metadata updates (#33153)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
2018-08-27 13:49:59 -04:00
Martijn van Groningen
47e9e72df2
reduce maximum number of writes to speed up test 2018-08-27 12:14:46 +07:00
Jason Tedor
ef9607ea0c
Track fetch exceptions for shard follow tasks (#33047)
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
2018-08-24 14:21:23 -04:00
Martijn van Groningen
b0f22d67c4
fixed not returning response instance 2018-08-24 16:56:29 +07:00
Martijn van Groningen
575f33941c
Required changes after merging in master branch. 2018-08-24 12:51:26 +07:00
Jason Tedor
b08d02e3b7
Implement CCR licensing (#33002)
This commit implements licensing for CCR. CCR will require a platinum
license, and administrative endpoints will be disabled when a license is
non-compliant.
2018-08-20 23:33:18 -04:00
Nhat Nguyen
919888eba7 TEST: Enable debug log testValidateFollowingIndexSettings 2018-08-06 14:55:56 -04:00
Nhat Nguyen
c394eb9ae9 CCR: Expose the operation primary term
Relates #32442
2018-08-06 10:55:37 -04:00
Jason Tedor
3b739b9fd5
Avoid NPE on shard changes action (#32630)
If a leader index is deleted while there is an active follower, the
follower will send shard changes requests bound for the leader
index. Today this will result in a null pointer exception because there
will not be an index routing table for the index. A null pointer
exception looks like a bug to a user so this commit addresses this by
throwing an index not found exception instead.
2018-08-06 08:01:47 -04:00
Jason Tedor
32c2759bb9
Remove extra blank line in CcrStatsAction.java
This commit removes an extra blank line that was accidentally committed
to CcrStatsAction.java.
2018-08-03 09:55:04 -04:00
Jason Tedor
d640c9ddf9
Introduce CCR stats endpoint (#32350)
This commit introduces the CCR stats endpoint which provides shard-level
stats on the status of CCR follower tasks.
2018-08-03 09:09:45 -04:00
Jason Tedor
2387616c80
Remove _xpack from CCR APIs (#32563)
For a new feature like CCR we will go without this extra layer of
indirection. This commit replaces all /_xpack/ccr/_(\S+) endpoints by
/_ccr/$1 endpoints.
2018-08-02 20:21:43 -04:00
Nhat Nguyen
8cfbb64d6e
ShardFollowNodeTask should fetch operation once (#32455)
Today ShardFollowNodeTask might fetch some operations more than once.
This happens because we ask the leading for up to max_batch_count
operations (instead of the left-over size) for the left-over request.
The leading then can freely respond up to the max_batch_count, and at
the same time, if one of the previous requests completed, we might issue
another read request whose range overlaps with the response of the
left-over request.

Closes #32453
2018-07-30 20:53:09 -04:00
Nhat Nguyen
aa3b6e098c
Reject follow request if following setting not enabled on follower (#32448)
Today we do not check if the `following_index` setting of the follower
is enabled or not when processing a follow-request. If that setting is
disabled, the follower will use the default engine, not the following
engine. This change checks and rejects such invalid follow requests.

Relates #30086
2018-07-29 21:57:45 -04:00
Nhat Nguyen
8474f8a01c
Validate source of an index in LuceneChangesSnapshot (#32288)
Today it's possible to encounter an Index operation in Lucene whose
_source is disabled, and _recovery_source was pruned by the MergePolicy.
If it's the case, we create a Translog#Index without source and let the
caller validate it later. However, this approach is challenging for the
caller.

Deletes and No-Ops don't allow invoking "source()" method. The caller
has to make sure to call "source()" only on index operations. The
current implementation in CCR does not follow this and fail to replica
deletes or no-ops. Moreover, it's easier to reason if a Translog#Index
always has the source.
2018-07-27 08:16:52 -04:00
Nhat Nguyen
88190299df
CCR: Fix incorrect read request completion condition (#32266)
Today we consider a read request is exhausted if from_seqno is equal to
or greater than the max_required_seqno. However, if we stop when
from_seqno equals to the max_required_seqno, we will miss an operation
whose seqno is max_required_seqno because we have not seen that 
operation yet.
2018-07-22 22:14:27 -04:00
Martijn van Groningen
b6b596e471
[CCR] Add random shard follow task test (#32188)
Added shard follow task unit tests that tests whether the shard follow task is able to process randomly generated shard changes api responses.
2018-07-21 12:38:05 +02:00
Nhat Nguyen
8e15504443 TEST: Fix range issue in ShardChangesActionTests
We modified the way we calculate to_seqno in #32121 but did not adjust
this test accordingly. If min_seqno equals to max_seqno, the size should be
one instead of zero.

Relates #32121
2018-07-20 17:20:41 -04:00
Nhat Nguyen
fe574f89f8 CCR: Translog op on primary should have versionType
Normally translog operations will not be replayed on the primary.
Following engine is an exception where we replay translog on both
primary and replica as a non-primary strategy.  Even though we won't use
the version_type in the following engine, we still need to pass a valid
value for the primary operation in order not to trip assertions in an
engine.

This commit passes version_type EXTERNAL for translog operation if its
origin is primary.

Relates #31945
2018-07-20 08:39:38 -04:00
Martijn van Groningen
a6b7497fdc
[CCR] Add more unit tests for shard follow task (#32121)
The added tests are based on specific scenarios as described in the test plan.
Before this change the ShardFollowNodeTaskTests contained more random like tests,
but these have been removed and in a followup pr better random tests will
be added in a new test class as is described in the test plan.
2018-07-20 14:12:05 +02:00