Commit Graph

210 Commits

Author SHA1 Message Date
Martijn van Groningen 74dc2da873
Change shard changes api's threadpool from get to search (#34421) 2018-10-15 08:09:00 +01:00
Nhat Nguyen 429c29e833 CCR/TEST: AwaitsFix testFailOverOnFollower
Tracked at #34412
2018-10-13 21:05:33 -04:00
Nhat Nguyen 7bc11a8099 Unmute testFollowIndexAndCloseNode
This issue was resolved by #34288.

Closes #33337
Relates #34288
2018-10-10 15:48:22 -04:00
Nhat Nguyen 33791ac27c
CCR: Following primary should process operations once (#34288)
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
2018-10-10 15:39:57 -04:00
Martijn van Groningen 268e134121
renamed test class 2018-10-08 15:05:50 +02:00
Martijn van Groningen c6c83d19f7
[CCR] Clear fetch exceptions if an empty but successful shard changes response returns (#34256)
Also fixed ShardFollowNodeTaskTests to not return ops when responseSize
is empty. Otherwise ops are returned when no ops are expected to be returned.

Co-authored-by: Jason Tedor <jason@tedor.me>
2018-10-06 07:53:37 -04:00
Jason Tedor 7478167d60
Rename CCR stats implementation (#34300)
In the CCR docs we want to refer to the endpoint that returns following
stats as the follow stats API. This commit renames the internal
implementation of this endpoint to reflect this usage.
2018-10-05 06:25:24 -04:00
Nhat Nguyen d7893fd1e4 TEST: Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-10-02 17:20:31 -04:00
Martijn van Groningen 7f5c2f1050
[CCR] Validate follower index historyUUIDs (#34078)
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid.

No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing.

Closes #33956
2018-10-02 18:01:06 +02:00
Martijn van Groningen d12a64eac2
[CCR] Only use primary shards and get expected count from leader index (#34186)
Closes #34173
2018-10-01 20:13:16 +02:00
Nhat Nguyen a02debadfe TEST: Unmute testFollowIndexAndCloseNode
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
2018-10-01 11:59:33 -04:00
Jason Tedor 80f7c1dcc9
Fix compilation in unfollow action tests
This arose when two commits were pushed at roughly the same time, both
of which compiled successfully against master, but not when taken
together. This commit fixes a reference in one of the commits that was
changed in the other commit.
2018-09-30 14:30:08 -04:00
Jason Tedor 1893765055
Change CCR stats endpoint to be index-centric (#34169)
This commit modifies the CCR stats endpoint for indices to be
/{index}/_ccr/stats. This makes this endpoint consistent with other
index-centric endpoints like indices stats.
2018-09-30 14:29:32 -04:00
Jason Tedor e2bd2028d8
Allow specifying shard changes batch sizes in bytes (#34168)
This commit changes the shard changes requests from using a raw byte
value to being able to be specified using bytes units (e.g., 4mb).
2018-09-30 14:22:22 -04:00
Martijn van Groningen 7c91c7a638
fixed test compile error 2018-09-30 19:31:30 +02:00
Martijn van Groningen b1a27b2e6b
[CCR] Add unfollow API (#34132)
The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients.

For the unfollow api to work the index follow needs to be stopped and the index needs to be closed.

Closes #33931
2018-09-30 19:19:34 +02:00
Nhat Nguyen ad61398879
CCR: Optimize indexing ops using seq_no on followers (#34099)
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

Relates #33656
2018-09-28 20:42:26 -04:00
Martijn van Groningen a984f8afb3
[CCR] Validate index privileges prior to following an index (#33758)
Prior to following an index in the follow API, check whether current
user has sufficient privileges in the leader cluster to read and
monitor the leader index.

Also check this in the create and follow API prior to creating the
follow index.

Also introduced READ_CCR cluster privilege that include the minimal
cluster level actions that are required for ccr in the leader cluster.
So a user can follow indices in a cluster, but not use the ccr admin APIs.

Closes #33553

Co-authored-by: Jason Tedor <jason@tedor.me>
2018-09-28 17:51:23 +02:00
Martijn van Groningen 3d7e3b2ab1
[TEST] changed naming of test methods to not refer to old api names. 2018-09-28 17:43:53 +02:00
Martijn van Groningen eb00348b57
[CCR] Adjust list retryable errors (#33985)
The following changes were made:
* Added ElasticsearchSecurityException. For in the case the current user has insufficient privileges while an index is being followed. Prior to following ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error.
* Added Index block exception. If the leader index gets closed, this exception is returned.
* Added ClusterBlockException service unavailable. In case for example the leader cluster is without elected master.
* Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error.

Closes #33954
2018-09-28 13:33:09 +02:00
Martijn van Groningen 506c1c2d47
Retry errors when fetching follower global checkpoint. (#34019)
Closes #34016
2018-09-28 10:34:08 +02:00
Martijn van Groningen 9129948f60
Rename CCR APIs (#34027)
* Renamed CCR APIs

Renamed:
* `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow`
* `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow`
* `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow`

Relates to #33931
2018-09-28 08:02:20 +02:00
Martijn van Groningen 17b3b97899
Fixed CCR stats api serialization issues and (#33983)
always use `IndicesOptions.strictExpand()` for indices options.

The follow index may be closed and we still want to get stats from
shard follow task and the whether the provided index name matches with
follow index name is checked when locating the task itself in the ccr
stats transport action.
2018-09-28 07:45:32 +02:00
Nhat Nguyen 48c169e065
CCR: replicates max seq_no of updates to follower (#34051)
This commit replicates the max_seq_no_of_updates on the leading index
to the primaries of the following index via ShardFollowNodeTask. The
max_seq_of_updates is then transmitted to the replicas of the follower
via replication requests (that's BulkShardOperationsRequest).

Relates #33656
2018-09-26 08:00:10 -04:00
Martijn van Groningen eae5487477
[CCR] set minimum version to 6.5.0 2018-09-26 09:31:36 +02:00
Martijn van Groningen 96b3417985
[CCR] Don't auto follow follow indices in the same cluster. (#33944) 2018-09-26 07:34:51 +02:00
Nhat Nguyen 5166dd0a4c
Replicate max seq_no of updates to replicas (#33967)
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication 
requests or the translog phase of peer-recovery.

With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.

Relates #33656
2018-09-25 08:07:57 -04:00
Martijn van Groningen 793b2a94b4
[CCR] Expose auto follow stats to monitoring (#33886) 2018-09-25 07:19:46 +02:00
Nhat Nguyen 6ec36b1273
CCR: Make AutoFollowMetadata immutable (#33977)
We should make AutoFollowMetadata immutable to avoid being inconsistent
when one thread modifies it while other reads it.
2018-09-24 17:47:10 -04:00
Martijn van Groningen 2795ef561f
[CCR] Add get auto follow pattern api (#33849)
Relates to #33007
2018-09-24 20:26:13 +02:00
Nhat Nguyen ddd5ce5740
TEST: Avoid invalid ranges in ShardChangesActionTests (#33976)
If numWrites is between 2 and 9, we will issue an invalid range because
the from_seq_no is negative. This commit makes sure that numWrites is at
least 10, and adds an explicit test to verify invalid request ranges.
2018-09-23 22:28:41 -04:00
Nhat Nguyen 7944a0cb25
Track max seq_no of updates or deletes on primary (#33842)
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).

The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.

Relates #33656
2018-09-22 08:02:57 -04:00
Martijn van Groningen e1e5f40727
[CCR] Move headers from auto follow pattern to auto follow metadata (#33846)
This ensures that we will not serialize the headers as part of the
auto follow pattern in the to be added get auto follow api.
2018-09-21 18:08:29 +02:00
Martijn van Groningen 384ce58535
removed unused fields 2018-09-20 08:56:23 +02:00
Martijn van Groningen 44c7c4b166
[CCR] Add auto follow stats api (#33801)
GET /_ccr/auto_follow/stats

Returns:

```
{
   "number_of_successful_follow_indices": ...
   "number_of_failed_follow_indices": ...
   "number_of_failed_remote_cluster_state_requests": ...
   "recent_auto_follow_errors": [
      ...
   ]
}
```

Relates to #33007
2018-09-20 07:16:20 +02:00
Martijn van Groningen d9947c631a
[CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (#33821) 2018-09-19 13:13:20 +02:00
Martijn van Groningen 013b64a07c
[CCR] Change FollowIndexAction.Request class to be more user friendly (#33810)
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
2018-09-19 07:18:24 +02:00
Martijn van Groningen 805a12361f
[CCR] Fail with a descriptive error if leader index does not exist (#33797)
Closes #33737
2018-09-18 21:47:02 +02:00
Martijn van Groningen 47b86d6e6a
[CCR] Changed AutoFollowCoordinator to keep track of certain statistics (#33684)
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
   (e.g. create_and_follow api call fails) or cluster alias
   (e.g. fetching remote cluster state fails).

Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.

Relates to #33007
2018-09-18 09:43:50 +02:00
Martijn van Groningen 15f30d689b
[CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and (#33777)
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.

The extra caused by level is not necessary here:

```
"fetch_exceptions": [
              {
                "from_seq_no": 1,
                "retries": 106,
                "exception": {
                  "type": "exception",
                  "reason": "[index1] IndexNotFoundException[no such index]",
                  "caused_by": {
                    "type": "index_not_found_exception",
                    "reason": "no such index",
                    "index_uuid": "_na_",
                    "index": "index1"
                  }
                }
              }
            ],
```
2018-09-17 22:33:37 +02:00
Martijn van Groningen d8dc042514
[CCR] Handle leader index with no mapping correctly (#33770)
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.

This commit fixes that, by allowing that a leader index does not yet have
a mapping.
2018-09-17 19:47:40 +02:00
Martijn van Groningen 7046cc467f
[CCR] Make index.xpack.ccr.following_index an internal setting (#33768) 2018-09-17 18:08:19 +02:00
Martijn van Groningen 5d2a01dcc3
[CCR] Fail with a good error if a follow index does not have ccr metadata (#33761)
instead of a NPE.
2018-09-17 18:00:16 +02:00
Martijn van Groningen 481f8a9a07
[CCR] Make auto follow patterns work with security (#33501)
Relates to #33007
2018-09-17 07:29:00 +02:00
Jason Tedor 770ad53978
Introduce long polling for changes (#33683)
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
2018-09-16 10:35:23 -04:00
Jason Tedor 73417bf09a
Move CCR REST tests to a sub-project of ccr
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
2018-09-15 10:18:59 -04:00
Jason Tedor aa56892f2f
Move CCR REST tests to ccr sub-project (#33731)
This commit moves the CCR REST tests to the ccr sub-project as another
step towards running :x-pack:plugin:ccr:check giving us full coverage on
CCR.
2018-09-15 09:18:15 -04:00
Jason Tedor f037edb8e3
Move CCR monitoring tests to ccr sub-project (#33730)
This commit moves the CCR monitoring tests from the monitoring
sub-project to the ccr sub-project.
2018-09-15 09:16:33 -04:00
Martijn van Groningen 82a6ae1dae
[CCR] Move ccr tests in core module back to ccr module (#33711)
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.

This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check

before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
2018-09-14 17:18:00 +02:00
Jason Tedor 2282150f34
Expose retries for CCR fetch failures (#33694)
This commit exposes the number of times that a fetch has been tried to
the CCR stats endpoint, and to CCR monitoring.
2018-09-14 08:52:46 -04:00
Martijn van Groningen 222f42274e
[CCR] Check whether the rejected execution exception has the shutdown flag set (#33703)
and if so debug log it and otherwise rethrow.

This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
2018-09-14 13:28:11 +02:00
Martijn van Groningen 53ba253aa4
[CCR] Add validation for max_retry_delay (#33648) 2018-09-13 20:52:00 +02:00
Martijn van Groningen a69ae6b89f
[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367)
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.

The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.

Closes #31505
2018-09-13 11:36:52 +02:00
Jason Tedor eb715d5290
Add follower index to CCR monitoring and status (#33645)
This commit adds the follower index to CCR shard follow task status, and
to monitoring.
2018-09-12 17:35:06 -04:00
Martijn van Groningen b5d8495789
[CCR] Add auto follow pattern APIs to transport client. (#33629) 2018-09-12 21:50:22 +02:00
Martijn van Groningen 5fa81310cc
[CCR] Added history uuid validation (#33546)
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.

* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
  index shards are the same as the recorded history UUIDs.
  If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
  on each shard changes api call verify whether their current history uuid
  is the same as the recorded history uuid.

Relates to #30086 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-09-12 19:42:00 +02:00
Tanguy Leroux bcac7f5e55 Fix checkstyle violation in ShardFollowNodeTask 2018-09-12 16:03:52 +02:00
Jason Tedor 23f12e42c1
Expose CCR stats to monitoring (#33617)
This commit exposes the CCR stats endpoint to monitoring collection.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2018-09-12 09:13:07 -04:00
Martijn van Groningen 96c49e5ed0
[CCR] Improve shard follow task's retryable error handling (#33371)
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.

Relates to #30086
2018-09-12 12:49:51 +02:00
Jason Tedor 20476b9e06
Disable CCR REST endpoints if CCR disabled (#33619)
This commit avoids enabling the CCR REST endpoints if CCR is disabled.
2018-09-12 01:54:34 -04:00
Jason Tedor eca37e6e0a
Expose CCR to the transport client (#33608)
This commit exposes CCR to the transport client.
2018-09-11 16:37:52 -04:00
Martijn van Groningen 8eebca32d2
[CCR] Delay auto follow license check (#33557)
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured

Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
2018-09-10 13:23:02 +02:00
Martijn van Groningen c4adcee3ea
[CCR] Add create_follow_index privilege (#33559)
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.

Closes #33555
2018-09-10 13:08:20 +02:00
Jason Tedor d1b99877fa
Remove underscore from auto-follow API (#33550)
This commit removes the leading underscore from _auto_follow in the
auto-follow API endpoints.
2018-09-09 14:42:49 -04:00
Nhat Nguyen 902d20cbbe CCR: Use single global checkpoint to normalize range (#33545)
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
2018-09-09 13:18:30 -04:00
Jason Tedor 6eca627409
Reverse logic for CCR license checks (#33549)
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
2018-09-09 10:22:22 -04:00
Jason Tedor edc492419b
Add latch countdown on failure in CCR license tests (#33548)
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
2018-09-09 09:52:40 -04:00
Jason Tedor 5a38c930fc
Add license checks for auto-follow implementation (#33496)
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
2018-09-09 07:06:55 -04:00
Simon Willnauer c12d232215
Pass Directory instead of DirectoryService to Store (#33466)
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
2018-09-07 14:00:24 +02:00
Nhat Nguyen 8afe09a749
Pass TranslogRecoveryRunner to engine from outside (#33449)
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.

Relates #32867
2018-09-06 11:59:16 -04:00
Martijn van Groningen ef207edbf0
test: do not schedule when test has stopped 2018-09-06 14:14:24 +02:00
Martijn van Groningen cdd82bb203
test: fetch `SeqNoStats` inside try-catch block
Relates to #33457
2018-09-06 11:49:08 +02:00
Martijn van Groningen a721d09c81
[CCR] Added auto follow patterns feature (#33118)
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.

This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
 the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.

This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:

```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
   "leader_index_pattern": ["logs-*", ...],
   "follow_index_pattern": "{{leader_index}}-copy",
   "max_concurrent_read_batches": 2
   ... // other optional parameters
}
```

and delete auto follow pattern api:

```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```

The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.

Relates to #33007


Co-authored-by: Jason Tedor jason@tedor.me
2018-09-06 08:01:58 +02:00
Nhat Nguyen 16b53b5ab5 Mute testValidateFollowingIndexSettings
Tracked at #33379
2018-09-04 09:03:26 -04:00
Nhat Nguyen 3a1dad1050 Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-09-02 19:17:51 -04:00
Nhat Nguyen c6b011f8ea
TEST: Increase timeout testFollowIndexAndCloseNode (#33333)
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
2018-09-02 09:28:47 -04:00
Martijn van Groningen 66b164c2a6
[CCR] Removed custom follow and unfollow api's reponse classes with AcknowledgedResponse (#33260)
These response classes did not add any value and in that case just AcknowledgedResponse should be used.

I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
2018-08-31 21:16:06 +07:00
Nhat Nguyen d3f32273eb Merge branch 'master' into ccr 2018-08-30 23:22:58 -04:00
Martijn van Groningen 41c7fc8d37
[CCR] Introduce leader index name & last fetch time stats to stats api response (#33155) 2018-08-29 10:54:58 +07:00
Nhat Nguyen e2b931e80b
Use Lucene history in primary-replica resync (#33178)
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.

Relates #29530
2018-08-28 10:44:15 -04:00
Jason Tedor 5954354e62
Fix ShardFollowNodeTask.Status equals and hash code (#33189)
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
2018-08-28 08:53:45 -04:00
Jason Tedor cd91992c89
Only fetch mapping updates when necessary (#33182)
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
2018-08-28 06:06:22 -04:00
Jason Tedor 0e5d42ca38
Merge branch 'master' into ccr
* master:
  Adjust BWC version on mapping version
  Token API supports the client_credentials grant (#33106)
  Build: forked compiler max memory matches jvmArgs (#33138)
  Introduce mapping version to index metadata (#33147)
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  Fix grammar in contributing docs
  SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Fix forbiddenapis on java 11  (#33116)
  Apply publishing to genreate pom (#33094)
  Have circuit breaker succeed on unknown mem usage
  Do not lose default mapper on metadata updates (#33153)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
2018-08-27 13:49:59 -04:00
Martijn van Groningen 47e9e72df2
reduce maximum number of writes to speed up test 2018-08-27 12:14:46 +07:00
Jason Tedor ef9607ea0c
Track fetch exceptions for shard follow tasks (#33047)
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
2018-08-24 14:21:23 -04:00
Martijn van Groningen b0f22d67c4
fixed not returning response instance 2018-08-24 16:56:29 +07:00
Martijn van Groningen 575f33941c
Required changes after merging in master branch. 2018-08-24 12:51:26 +07:00
Jason Tedor b08d02e3b7
Implement CCR licensing (#33002)
This commit implements licensing for CCR. CCR will require a platinum
license, and administrative endpoints will be disabled when a license is
non-compliant.
2018-08-20 23:33:18 -04:00
Nhat Nguyen 919888eba7 TEST: Enable debug log testValidateFollowingIndexSettings 2018-08-06 14:55:56 -04:00
Nhat Nguyen c394eb9ae9 CCR: Expose the operation primary term
Relates #32442
2018-08-06 10:55:37 -04:00
Jason Tedor 3b739b9fd5
Avoid NPE on shard changes action (#32630)
If a leader index is deleted while there is an active follower, the
follower will send shard changes requests bound for the leader
index. Today this will result in a null pointer exception because there
will not be an index routing table for the index. A null pointer
exception looks like a bug to a user so this commit addresses this by
throwing an index not found exception instead.
2018-08-06 08:01:47 -04:00
Jason Tedor 32c2759bb9
Remove extra blank line in CcrStatsAction.java
This commit removes an extra blank line that was accidentally committed
to CcrStatsAction.java.
2018-08-03 09:55:04 -04:00
Jason Tedor d640c9ddf9
Introduce CCR stats endpoint (#32350)
This commit introduces the CCR stats endpoint which provides shard-level
stats on the status of CCR follower tasks.
2018-08-03 09:09:45 -04:00
Jason Tedor 2387616c80
Remove _xpack from CCR APIs (#32563)
For a new feature like CCR we will go without this extra layer of
indirection. This commit replaces all /_xpack/ccr/_(\S+) endpoints by
/_ccr/$1 endpoints.
2018-08-02 20:21:43 -04:00
Nhat Nguyen 8cfbb64d6e
ShardFollowNodeTask should fetch operation once (#32455)
Today ShardFollowNodeTask might fetch some operations more than once.
This happens because we ask the leading for up to max_batch_count
operations (instead of the left-over size) for the left-over request.
The leading then can freely respond up to the max_batch_count, and at
the same time, if one of the previous requests completed, we might issue
another read request whose range overlaps with the response of the
left-over request.

Closes #32453
2018-07-30 20:53:09 -04:00
Nhat Nguyen aa3b6e098c
Reject follow request if following setting not enabled on follower (#32448)
Today we do not check if the `following_index` setting of the follower
is enabled or not when processing a follow-request. If that setting is
disabled, the follower will use the default engine, not the following
engine. This change checks and rejects such invalid follow requests.

Relates #30086
2018-07-29 21:57:45 -04:00
Nhat Nguyen 8474f8a01c
Validate source of an index in LuceneChangesSnapshot (#32288)
Today it's possible to encounter an Index operation in Lucene whose
_source is disabled, and _recovery_source was pruned by the MergePolicy.
If it's the case, we create a Translog#Index without source and let the
caller validate it later. However, this approach is challenging for the
caller.

Deletes and No-Ops don't allow invoking "source()" method. The caller
has to make sure to call "source()" only on index operations. The
current implementation in CCR does not follow this and fail to replica
deletes or no-ops. Moreover, it's easier to reason if a Translog#Index
always has the source.
2018-07-27 08:16:52 -04:00
Nhat Nguyen 88190299df
CCR: Fix incorrect read request completion condition (#32266)
Today we consider a read request is exhausted if from_seqno is equal to
or greater than the max_required_seqno. However, if we stop when
from_seqno equals to the max_required_seqno, we will miss an operation
whose seqno is max_required_seqno because we have not seen that 
operation yet.
2018-07-22 22:14:27 -04:00
Martijn van Groningen b6b596e471
[CCR] Add random shard follow task test (#32188)
Added shard follow task unit tests that tests whether the shard follow task is able to process randomly generated shard changes api responses.
2018-07-21 12:38:05 +02:00
Nhat Nguyen 8e15504443 TEST: Fix range issue in ShardChangesActionTests
We modified the way we calculate to_seqno in #32121 but did not adjust
this test accordingly. If min_seqno equals to max_seqno, the size should be
one instead of zero.

Relates #32121
2018-07-20 17:20:41 -04:00