461 Commits

Author SHA1 Message Date
Martijn van Groningen
eb00348b57
[CCR] Adjust list retryable errors (#33985)
The following changes were made:
* Added ElasticsearchSecurityException. For in the case the current user has insufficient privileges while an index is being followed. Prior to following ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error.
* Added Index block exception. If the leader index gets closed, this exception is returned.
* Added ClusterBlockException service unavailable. In case for example the leader cluster is without elected master.
* Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error.

Closes #33954
2018-09-28 13:33:09 +02:00
Martijn van Groningen
506c1c2d47
Retry errors when fetching follower global checkpoint. (#34019)
Closes #34016
2018-09-28 10:34:08 +02:00
Martijn van Groningen
9129948f60
Rename CCR APIs (#34027)
* Renamed CCR APIs

Renamed:
* `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow`
* `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow`
* `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow`

Relates to #33931
2018-09-28 08:02:20 +02:00
Martijn van Groningen
17b3b97899
Fixed CCR stats api serialization issues and (#33983)
always use `IndicesOptions.strictExpand()` for indices options.

The follow index may be closed and we still want to get stats from
shard follow task and the whether the provided index name matches with
follow index name is checked when locating the task itself in the ccr
stats transport action.
2018-09-28 07:45:32 +02:00
Nhat Nguyen
48c169e065
CCR: replicates max seq_no of updates to follower (#34051)
This commit replicates the max_seq_no_of_updates on the leading index
to the primaries of the following index via ShardFollowNodeTask. The
max_seq_of_updates is then transmitted to the replicas of the follower
via replication requests (that's BulkShardOperationsRequest).

Relates #33656
2018-09-26 08:00:10 -04:00
Martijn van Groningen
eae5487477
[CCR] set minimum version to 6.5.0 2018-09-26 09:31:36 +02:00
Martijn van Groningen
96b3417985
[CCR] Don't auto follow follow indices in the same cluster. (#33944) 2018-09-26 07:34:51 +02:00
Nhat Nguyen
5166dd0a4c
Replicate max seq_no of updates to replicas (#33967)
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication 
requests or the translog phase of peer-recovery.

With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.

Relates #33656
2018-09-25 08:07:57 -04:00
Martijn van Groningen
793b2a94b4
[CCR] Expose auto follow stats to monitoring (#33886) 2018-09-25 07:19:46 +02:00
Nhat Nguyen
6ec36b1273
CCR: Make AutoFollowMetadata immutable (#33977)
We should make AutoFollowMetadata immutable to avoid being inconsistent
when one thread modifies it while other reads it.
2018-09-24 17:47:10 -04:00
Martijn van Groningen
2795ef561f
[CCR] Add get auto follow pattern api (#33849)
Relates to #33007
2018-09-24 20:26:13 +02:00
Nhat Nguyen
ddd5ce5740
TEST: Avoid invalid ranges in ShardChangesActionTests (#33976)
If numWrites is between 2 and 9, we will issue an invalid range because
the from_seq_no is negative. This commit makes sure that numWrites is at
least 10, and adds an explicit test to verify invalid request ranges.
2018-09-23 22:28:41 -04:00
Nhat Nguyen
7944a0cb25
Track max seq_no of updates or deletes on primary (#33842)
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).

The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.

Relates #33656
2018-09-22 08:02:57 -04:00
Martijn van Groningen
e1e5f40727
[CCR] Move headers from auto follow pattern to auto follow metadata (#33846)
This ensures that we will not serialize the headers as part of the
auto follow pattern in the to be added get auto follow api.
2018-09-21 18:08:29 +02:00
Martijn van Groningen
384ce58535
removed unused fields 2018-09-20 08:56:23 +02:00
Martijn van Groningen
44c7c4b166
[CCR] Add auto follow stats api (#33801)
GET /_ccr/auto_follow/stats

Returns:

```
{
   "number_of_successful_follow_indices": ...
   "number_of_failed_follow_indices": ...
   "number_of_failed_remote_cluster_state_requests": ...
   "recent_auto_follow_errors": [
      ...
   ]
}
```

Relates to #33007
2018-09-20 07:16:20 +02:00
Martijn van Groningen
d9947c631a
[CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (#33821) 2018-09-19 13:13:20 +02:00
Martijn van Groningen
013b64a07c
[CCR] Change FollowIndexAction.Request class to be more user friendly (#33810)
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
2018-09-19 07:18:24 +02:00
Martijn van Groningen
805a12361f
[CCR] Fail with a descriptive error if leader index does not exist (#33797)
Closes #33737
2018-09-18 21:47:02 +02:00
Martijn van Groningen
9fe5a273aa
[TEST] handle failed search requests differently 2018-09-18 15:55:27 +02:00
Martijn van Groningen
47b86d6e6a
[CCR] Changed AutoFollowCoordinator to keep track of certain statistics (#33684)
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
   (e.g. create_and_follow api call fails) or cluster alias
   (e.g. fetching remote cluster state fails).

Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.

Relates to #33007
2018-09-18 09:43:50 +02:00
Martijn van Groningen
15f30d689b
[CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and (#33777)
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.

The extra caused by level is not necessary here:

```
"fetch_exceptions": [
              {
                "from_seq_no": 1,
                "retries": 106,
                "exception": {
                  "type": "exception",
                  "reason": "[index1] IndexNotFoundException[no such index]",
                  "caused_by": {
                    "type": "index_not_found_exception",
                    "reason": "no such index",
                    "index_uuid": "_na_",
                    "index": "index1"
                  }
                }
              }
            ],
```
2018-09-17 22:33:37 +02:00
Martijn van Groningen
d8dc042514
[CCR] Handle leader index with no mapping correctly (#33770)
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.

This commit fixes that, by allowing that a leader index does not yet have
a mapping.
2018-09-17 19:47:40 +02:00
Martijn van Groningen
7046cc467f
[CCR] Make index.xpack.ccr.following_index an internal setting (#33768) 2018-09-17 18:08:19 +02:00
Martijn van Groningen
5d2a01dcc3
[CCR] Fail with a good error if a follow index does not have ccr metadata (#33761)
instead of a NPE.
2018-09-17 18:00:16 +02:00
Jason Tedor
2d81fc3873
Keep CCR REST API specification with all of X-Pack (#33743)
This commit moves the CCR REST API specification out of the CCR
sub-project to locate them with the rest of the REST API specifications
for X-Pack.
2018-09-17 09:59:22 -04:00
Martijn van Groningen
481f8a9a07
[CCR] Make auto follow patterns work with security (#33501)
Relates to #33007
2018-09-17 07:29:00 +02:00
Jason Tedor
770ad53978
Introduce long polling for changes (#33683)
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
2018-09-16 10:35:23 -04:00
Jason Tedor
069605bd91
Do not count shard changes tasks against REST tests (#33738)
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
2018-09-16 07:32:12 -04:00
Jason Tedor
73417bf09a
Move CCR REST tests to a sub-project of ccr
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
2018-09-15 10:18:59 -04:00
Jason Tedor
aa56892f2f
Move CCR REST tests to ccr sub-project (#33731)
This commit moves the CCR REST tests to the ccr sub-project as another
step towards running :x-pack:plugin:ccr:check giving us full coverage on
CCR.
2018-09-15 09:18:15 -04:00
Jason Tedor
f037edb8e3
Move CCR monitoring tests to ccr sub-project (#33730)
This commit moves the CCR monitoring tests from the monitoring
sub-project to the ccr sub-project.
2018-09-15 09:16:33 -04:00
Martijn van Groningen
82a6ae1dae
[CCR] Move ccr tests in core module back to ccr module (#33711)
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.

This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check

before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
2018-09-14 17:18:00 +02:00
Jason Tedor
2282150f34
Expose retries for CCR fetch failures (#33694)
This commit exposes the number of times that a fetch has been tried to
the CCR stats endpoint, and to CCR monitoring.
2018-09-14 08:52:46 -04:00
Martijn van Groningen
222f42274e
[CCR] Check whether the rejected execution exception has the shutdown flag set (#33703)
and if so debug log it and otherwise rethrow.

This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
2018-09-14 13:28:11 +02:00
Martijn van Groningen
4bcad95fe7
[TEST] wait for no initializing shards 2018-09-14 09:59:24 +02:00
Martijn van Groningen
53ba253aa4
[CCR] Add validation for max_retry_delay (#33648) 2018-09-13 20:52:00 +02:00
Martijn van Groningen
a69ae6b89f
[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367)
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.

The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.

Closes #31505
2018-09-13 11:36:52 +02:00
Jason Tedor
eb715d5290
Add follower index to CCR monitoring and status (#33645)
This commit adds the follower index to CCR shard follow task status, and
to monitoring.
2018-09-12 17:35:06 -04:00
Martijn van Groningen
b5d8495789
[CCR] Add auto follow pattern APIs to transport client. (#33629) 2018-09-12 21:50:22 +02:00
Martijn van Groningen
5fa81310cc
[CCR] Added history uuid validation (#33546)
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.

* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
  index shards are the same as the recorded history UUIDs.
  If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
  on each shard changes api call verify whether their current history uuid
  is the same as the recorded history uuid.

Relates to #30086 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2018-09-12 19:42:00 +02:00
Martijn van Groningen
901d8035d9
[CCR] Update es monitoring mapping and (#33635)
* [CCR] Update es monitoring mapping and
change qa tests to query based on leader index.


Co-authored-by: Jason Tedor <jason@tedor.me>
2018-09-12 19:36:17 +02:00
Tanguy Leroux
bcac7f5e55 Fix checkstyle violation in ShardFollowNodeTask 2018-09-12 16:03:52 +02:00
Jason Tedor
23f12e42c1
Expose CCR stats to monitoring (#33617)
This commit exposes the CCR stats endpoint to monitoring collection.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2018-09-12 09:13:07 -04:00
Martijn van Groningen
96c49e5ed0
[CCR] Improve shard follow task's retryable error handling (#33371)
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.

Relates to #30086
2018-09-12 12:49:51 +02:00
Jason Tedor
20476b9e06
Disable CCR REST endpoints if CCR disabled (#33619)
This commit avoids enabling the CCR REST endpoints if CCR is disabled.
2018-09-12 01:54:34 -04:00
Jason Tedor
eca37e6e0a
Expose CCR to the transport client (#33608)
This commit exposes CCR to the transport client.
2018-09-11 16:37:52 -04:00
Martijn van Groningen
74d41857c6
mute test on windows
Relates #33570
2018-09-10 16:49:17 +02:00
Martijn van Groningen
8eebca32d2
[CCR] Delay auto follow license check (#33557)
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured

Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
2018-09-10 13:23:02 +02:00
Martijn van Groningen
c4adcee3ea
[CCR] Add create_follow_index privilege (#33559)
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.

Closes #33555
2018-09-10 13:08:20 +02:00