Commit Graph

335 Commits

Author SHA1 Message Date
Martijn van Groningen 88f4b0a326
Do not set fatal exception when shard follow task is stopped. (#37603)
When shard follow task is cancelled while fetching operations then
the fatal exception field should not be set.
2019-01-21 07:54:51 +01:00
Tim Brooks fe753ee1d2
Do not add index event listener if CCR disabled (#37432)
Currently we add the CcrRestoreSourceService as a index event
listener. However, if ccr is disabled, this service is null and we
attempt to add a null listener throwing an exception. This commit only
adds the listener if ccr is enabled.
2019-01-18 16:31:21 -07:00
Tim Brooks cd41289396
Add local session timeouts to leader node (#37438)
This is related to #35975. This commit adds timeout functionality to
the local session on a leader node. When a session is started, a timeout
is scheduled using a repeatable runnable. If the session is not accessed
in between two runs the session is closed. When the sssion is closed,
the repeating task is cancelled.

Additionally, this commit moves session uuid generation to the leader
cluster. And renames the PutCcrRestoreSessionRequest to
StartCcrRestoreSessionRequest to reflect that change.
2019-01-18 14:48:20 -07:00
Martijn van Groningen 6846666b6b
Add ccr follow info api (#37408)
* Add ccr follow info api

This api returns all follower indices and per follower index
the provided parameters at put follow / resume follow time and
whether index following is paused or active.

Closes #37127

* iter

* [DOCS] Edits the get follower info API

* [DOCS] Fixes link to remote cluster

* [DOCS] Clarifies descriptions for configured parameters
2019-01-18 16:37:21 +01:00
Tim Brooks 978c818d0f
Use RestoreSnapshotRequest in CcrRepositoryIT
Commit #37535 removed an internal restore request in favor of the
RestoreSnapshotRequest. Commit #37449 added a new test that used the
internal restore request. This commit modifies the new test to use the
RestoreSnapshotRequest.
2019-01-17 15:31:27 -07:00
Tim Brooks b6f06a48c0
Implement follower rate limiting for file restore (#37449)
This is related to #35975. This commit implements rate limiting on the
follower side using a new class `CombinedRateLimiter`.
2019-01-17 14:58:46 -07:00
Armin Braun 381d035cd6
Remove Redundant RestoreRequest Class (#37535)
* Same as #37464 but for the restore side
2019-01-17 22:23:23 +01:00
Martijn van Groningen b85bfd3e17
Added fatal_exception field for ccr stats in monitoring mapping. (#37563) 2019-01-17 14:04:41 +01:00
Martijn van Groningen 99b09845da
Moved ccr integration to the package with other ccr integration tests. 2019-01-17 13:57:56 +01:00
Przemyslaw Gomulka 5e94f384c4
Remove the use of AbstracLifecycleComponent constructor #37488 (#37488)
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560

relates #34488
2019-01-16 09:05:30 +01:00
Martijn van Groningen 9554b2fecb
When removing an AutoFollower also mark it as removed. (#37402)
Currently when there are no more auto follow patterns for a remote cluster then
the AutoFollower instance for this remote cluster will be removed. If
a new auto follow pattern for this remote cluster gets added quickly enough
after the last delete then there may be two AutoFollower instance running
for this remote cluster instead of one.

Each AutoFollower instance stops automatically after it sees in the
start() method that there are no more auto follow patterns for the
remote cluster it is tracking. However when an auto follow pattern
gets removed and then added back quickly enough then old AutoFollower
may never detect that at some point there were no auto follow patterns
for the remote cluster it is monitoring. The creation and removal of
an AutoFollower instance happens independently in the `updateAutoFollowers()`
as part of a cluster state update.

By adding the `removed` field, an AutoFollower instance will not miss the
fact there were no auto follow patterns at some point in time. The
`updateAutoFollowers()` method now marks an AutoFollower instance as
removed when it sees that there are no more patterns for a remote cluster.
The updateAutoFollowers() method can then safely start a new AutoFollower
instance.

Relates to #36761
2019-01-15 16:24:19 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Tim Brooks 5c68338a1c
Implement ccr file restore (#37130)
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
2019-01-14 13:07:55 -07:00
Martijn van Groningen de852765d6
unmuted test
Relates to #37014
2019-01-14 14:27:42 +01:00
Martijn van Groningen e4391afd98
Test fix, wait for auto follower to have stopped in the background
Relates to #36761
2019-01-11 17:26:17 +01:00
Martijn van Groningen 6d81e7c3e7
[CCR] FollowingEngine should fail with 403 if operation has no seqno assigned (#37213)
Fail with a 403 when indexing a document directly into a follower index.

In order to test this change, I had to move specific assertions into a dedicated class and
disable assertions for that class in the rest qa module. I think that is the right trade off.
2019-01-10 15:54:34 +01:00
Martijn van Groningen df488720e0
[CCR] Make shard follow tasks more resilient for restarts (#37239)
If a running shard follow task needs to be restarted and
the remote connection seeds have changed then
a shard follow task currently fails with a fatal error.

The change creates the remote client lazily and adjusts
the errors a shard follow task should retry.

This issue was found in test failures in the recently added
ccr rolling upgrade tests. The reason why this issue occurs
more frequently in the rolling upgrade test is because ccr
is setup in local mode (so remote connection seed will become stale) and
all nodes are restarted, which forces the shard follow tasks to get
restarted at some point during the test. Note that these tests
cannot be enabled yet, because this change will need to be backported
to 6.x first. (otherwise the issue still occurs on non upgraded nodes)

I also changed the RestartIndexFollowingIT to setup remote cluster
via persistent settings and to also restart the leader cluster. This
way what happens during the ccr rolling upgrade qa tests, also happens
in this test.

Relates to #37231
2019-01-10 15:02:30 +01:00
Martijn van Groningen 1a41d84536
[CCR] Resume follow Api should not require a request body (#37217)
Closes #37022
2019-01-10 09:48:26 +01:00
Martijn van Groningen 9122585359
[CCR] Added more logging. 2019-01-09 12:17:47 +01:00
Alpar Torok 6344e9a3ce
Testing conventions: add support for checking base classes (#36650) 2019-01-08 13:39:03 +02:00
Jason Tedor c8c596cead
Introduce retention lease expiration (#37195)
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
2019-01-07 22:03:52 -08:00
Jason Tedor c0f8c89172
Introduce shard history retention leases (#37167)
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.

Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.

Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.

Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.

This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
 - timestamps
 - expiration
 - persistence to and recovery from Lucene
 - handoff during primary relocation
 - sharing retention leases with replicas
 - exposing leases in shard-level statistics
 - integration with cross-cluster replication

These will occur individually in follow-up commits.
2019-01-07 07:43:57 -08:00
Jim Ferenczi e38cf1d0dc
Add the ability to set the number of hits to track accurately (#36357)
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.

Relates #33028
2019-01-04 20:36:49 +01:00
Luca Cavanna c1beb95aa1 Mute LocalIndexFollowingIT#testRemoveRemoteConnection
Relates to #37014
2018-12-28 16:39:36 +01:00
Nhat Nguyen 7580d9d925
Make SourceToParse immutable (#36971)
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.

Relates #36921
2018-12-24 14:06:50 -05:00
Martijn van Groningen 561b704129
[CCR] AutoFollowCoordinator and follower index already created (#36540)
The AutoFollowCoordinator should be resilient to the fact that the follower
index has already been created and in that case it should only update
the auto follow metadata with the fact that the follower index was created.

Relates to #33007
2018-12-24 10:16:38 +01:00
Martijn van Groningen 44fe265d82
[CCR] Added auto_follow_exception.timestamp field to auto follow stats (#36947)
Currently auto follow stats users are unable to see whether an auto follow
error was recent or old. The new timestamp field will help user distinguish
between old and new errors.
2018-12-24 07:53:51 +01:00
Martijn van Groningen 4fb62fcba6
Make CCR resilient against missing remote cluster connections (#36682)
Both index following and auto following should be resilient against missing remote connections.
This happens in the case that they get accidentally removed by a user. When this happens
auto following and index following will retry to continue instead of failing with unrecoverable exceptions.

Both the put follow and put auto follow APIs validate whether the
remote cluster connection. The logic added in this change only exists
in case during the lifetime of a follower index or auto follow pattern
the remote connection gets removed. This retry behavior similar how CCR
deals with authorization errors.

Closes #36667
Closes #36255
2018-12-24 07:28:34 +01:00
Martijn van Groningen 4ded4717fe
[CCR] Add `ccr.auto_follow_coordinator.wait_for_timeout` setting (#36714)
This setting controls the wait for timeout the autofollow coordinator
should use when setting cluster state requests to a remote cluster.
2018-12-21 09:36:40 +01:00
Tim Brooks d9b2ed6135
Send clear session as routable remote request (#36805)
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.

It implements the ccr clean_session action with this client.
2018-12-20 17:43:12 -07:00
Tim Brooks 4cd570593d
Update index mappings when ccr restore complete (#36879)
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
files restores is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
2018-12-20 13:53:04 -07:00
Martijn van Groningen b42074c1cc
[CCR] Report error if auto follower tries auto follow a leader index with soft deletes disabled (#36886)
Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored.
This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in auto follow stats. Marking the index as auto follow is important, because otherwise the auto follower will continuously try to auto follow and fail.

Relates to #33007
2018-12-20 15:21:52 +01:00
Martijn van Groningen 7b1dfeff2e
Renamed `WHITE_LISTED_SETTINGS` to `NON_REPLICATED_SETTINGS`
because the latter better describes the purpose of this field.
2018-12-20 15:08:04 +01:00
Martijn van Groningen 18691daebe
[TEST] Renamed ccr qa module. 2018-12-19 13:57:12 +01:00
Martijn van Groningen 3cc0cf03c6
[TEST] No need to specifically check licensesMetaData on master node. 2018-12-19 13:51:24 +01:00
Martijn van Groningen a6af33ef0b
[TEST] Wait for license metadata to be installed 2018-12-19 13:03:45 +01:00
Alpar Torok e9ef5bdce8
Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311)
- Create a separate unitTest task instead of Gradle's built in 
- convert all configuration to use the new task 
- the  built in task is now disabled
2018-12-19 08:25:20 +02:00
Tim Brooks 1fa105658e
Add CcrRestoreSourceService to track sessions (#36578)
This commit is related to #36127. It adds a CcrRestoreSourceService to
track Engine.IndexCommitRef need for in-process file restores. When a
follower starts restoring a shard through the CcrRepository it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.

Once, the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
2018-12-18 11:23:13 -07:00
Martijn van Groningen 1afcfc97bd
[TEST] Added more logging
Relates to #36761
2018-12-18 16:01:02 +01:00
Boaz Leskes 5f76f39386
Rename seq# powered optimistic concurrency control parameters to ifSeqNo/ifPrimaryTerm (#36757)
This PR renames the parameters previously introduce to the following:

### URL Parameters
```
PUT twitter/_doc/1?if_seq_no=501&if_primary_term=1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

DELETE twitter/_doc/1?if_seq_no=501&if_primary_term=1
```

### Bulk API
```
POST _bulk
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1", "if_seq_no": 501, "if_primary_term": 1 } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2", "if_seq_no": 501, "if_primary_term": 1 } }
```

### Java API
```
IndexRequest.ifSeqNo(long seqNo)
IndexRequest.ifPrimaryTerm(long primaryTerm)
DeleteRequest.ifSeqNo(long seqNo)
DeleteRequest.ifPrimaryTerm(long primaryTerm)
```

Relates #36148
Relates #10708
2018-12-18 14:35:18 +01:00
Martijn van Groningen 0ff1f1fa18
Muted tests.
Relates to #36764
2018-12-18 13:39:01 +01:00
Martijn van Groningen 57e1a4bc9f
[TEST] Ensure shard follow tasks have really stopped.
Relates to #36696
2018-12-18 10:43:27 +01:00
Tim Brooks 3dd5a5a3c5
Initialize startup `CcrRepositories` (#36730)
Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
2018-12-17 13:19:32 -07:00
Martijn van Groningen a181a25226
[CCR] Add time since last auto follow fetch to auto follow stats (#36542)
For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.

Relates to #33007
Originates from #35895
2018-12-17 14:14:56 +01:00
Martijn van Groningen f27d2c2927
[TEST] Pause index following at end of test,
so that no unexpected failures happen at test teardown.
2018-12-17 07:55:27 +01:00
Nhat Nguyen 2028c2af14 TEST: Do not assert max_seq_of_updates if promotion
If a primary promotion happens in the test testAddRemoveShardOnLeader, the
max_seq_no_of_updates_or_deletes on a  new primary might be higher than the
max_seq_no_of_updates_or_deletes on the replicas or copies of the follower.

Relates #36607
2018-12-16 16:48:04 -05:00
Martijn van Groningen 97107e99e8
Moved test to its rightful place. 2018-12-16 13:57:51 +01:00
Boaz Leskes 733a6d34c1
Add seq no powered optimistic locking support to the index and delete transport actions (#36619)
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) 
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.

Relates #36148 
Relates #10708
2018-12-15 17:59:57 +01:00
Albert Zaharovits a30e8c2fa3
HasPrivilegesResponse use TreeSet for fields (#36329)
For class fields of type collection whose order is not important 
and for which duplicates are not permitted we declare them as `Set`s.
Usually the definition is a `HashSet` but in this case `TreeSet` is used
instead to aid testing.
2018-12-15 08:34:54 +02:00
Martijn van Groningen 68a674ef1f
[CCR] Fix follow stats API's follower index filtering feature (#36647)
Currently always all follow stats for all follower indices are being
returned even if follow stats for only specific indices are requested.
2018-12-14 19:39:30 +01:00