44497 Commits

Author SHA1 Message Date
Jason Tedor 625d37a26a
Introduce retention lease background sync (#38262)
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should-periodically-flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
2019-02-04 10:35:29 -05:00
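The dispatch planned in the follow-up above (heavyweight sync if a lease has expired, lightweight background sync otherwise) can be sketched roughly as below. This is an illustrative Python model, not the Elasticsearch Java code; `Lease`, `choose_sync`, and the millisecond fields are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    lease_id: str
    retaining_seq_no: int
    timestamp_ms: int  # time of the last renewal (hypothetical field)


def choose_sync(leases, now_ms, lifetime_ms):
    """Return "write" (heavyweight) if any lease has expired, else "background"."""
    expired = any(now_ms - lease.timestamp_ms > lifetime_ms for lease in leases)
    return "write" if expired else "background"
```

With a lease renewed at time 0 and a lifetime of 100 ms, `choose_sync` picks the background sync at time 10 and the write action at time 200.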
Christoph Büscher 5ee7232379
Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334) 2019-02-04 16:10:06 +01:00
Christoph Büscher 715e581378
Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330) 2019-02-04 15:46:19 +01:00
Lee Hinman f19fdcd491
Re-enable accounting breaker check in InternalTestCluster (#38131)
Relates to #30290

The intent for this is to see whether this failure still happens, and if so, provide more up-to-date logs for analysis.
2019-02-04 07:40:59 -07:00
David Roberts fb6a176caf
[ML] Add explanation so far to file structure finder exceptions (#38191)
The explanation so far can be invaluable for troubleshooting
as incorrect decisions made early on in the structure analysis
can result in seemingly crazy decisions or timeouts later on.

Relates elastic/kibana#29821
2019-02-04 14:32:35 +00:00
Boaz Leskes e49b593c81
Move TokenService to seqno powered cas (#38311)
Relates #37872 
Relates #10708
2019-02-04 15:25:41 +01:00
Yannick Welsch ece8c659c5
Decrease leader and follower check timeout (#38298)
Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still
being a very long time for a node to be completely unresponsive.
2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka 9b64558efb
Migrating from joda to java.time. Watcher plugin (#35809)
Part of the Joda-Time migration work. Migrates the Watcher plugin to use the JDK's java.time.

refers #27330
2019-02-04 15:08:31 +01:00
Daniel Mitterdorfer d975f93967
Use stricter timer in DeadHostStateTests (#38301)
With this commit we add a monotonically strict timer to ensure time is
advancing even if the timer is called in a tight loop in tests. We also
relax a condition in a similar test so it only checks that time is not
moving backwards.

Closes #33747
2019-02-04 15:03:31 +01:00
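A strictly monotonic timer of the kind described above can be sketched as a wrapper that bumps any non-advancing reading by one tick. This is a minimal Python illustration, assuming a nanosecond clock; `StrictMonotonicTimer` and `nano_time` are hypothetical names, not the ones used in the test.

```python
import time


class StrictMonotonicTimer:
    """Wraps a nanosecond clock so every reading is strictly greater than the last."""

    def __init__(self, clock=time.monotonic_ns):
        self._clock = clock
        self._last = None

    def nano_time(self):
        now = self._clock()
        # If the underlying clock did not advance (tight loop, coarse
        # resolution), move forward by one nanosecond instead.
        if self._last is not None and now <= self._last:
            now = self._last + 1
        self._last = now
        return now
```

Even when the wrapped clock returns the same value on every call, successive readings still increase, which is the property a tight test loop needs.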
Przemyslaw Gomulka 85b4bfe3ff
Core: Migrating from joda to java.time. Monitoring plugin (#36297)
Migrates the Monitoring plugin from joda to java.time.

refers #27330
2019-02-04 14:47:08 +01:00
Christoph Büscher 7ed3e6e07e
Mute MlMigrationFullClusterRestartIT#testMigration (#38315) 2019-02-04 11:38:01 +01:00
Alexander Reelsen 87f3579125
Add nanosecond field mapper (#37755)
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.

Note that aggregations will still be in milliseconds.
However, docvalue fields will have full nanosecond resolution.

Relates #27330
2019-02-04 11:31:16 +01:00
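The upper bound of roughly 2262 follows from storing nanoseconds since the epoch in a signed 64-bit long. A quick check in Python (an illustration only; `max_nanos_instant` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

MAX_LONG = 2**63 - 1  # largest value of a signed 64-bit long
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def max_nanos_instant() -> datetime:
    """Latest instant representable as nanoseconds since the epoch in a long."""
    # datetime only has microsecond resolution, so truncate to microseconds.
    return EPOCH + timedelta(microseconds=MAX_LONG // 1_000)


limit = max_nanos_instant()
```

Here `limit` falls on 11 April 2262, which matches the range quoted in the commit message.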
Christoph Büscher 15510da2af
Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304) 2019-02-04 10:41:35 +01:00
Boaz Leskes ff13a43144
Move ML Optimistic Concurrency Control to Seq No (#38278)
This commit moves the usage of internal versioning for CAS operations to use sequence numbers and primary terms

Relates to #36148
Relates to #10708
2019-02-04 10:41:08 +01:00
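Compare-and-set on sequence number and primary term works along these lines: a write succeeds only if the document still carries the pair the writer last read. The sketch below is a toy in-memory model in Python, not Elasticsearch code; `Store` and `VersionConflict` are hypothetical, and the `if_seq_no` / `if_primary_term` parameter names mirror the Elasticsearch request parameters.

```python
class VersionConflict(Exception):
    pass


class Store:
    """Toy document store that assigns a sequence number to every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (source, seq_no, primary_term)
        self._next_seq_no = 0
        self.primary_term = 1

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self._docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: reject if the document changed since it was read.
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no, self.primary_term
```

A second writer that reuses a stale `(seq_no, primary_term)` pair gets a `VersionConflict` instead of silently overwriting the newer document.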
David Turner 1d82a6d9f9
Deprecate unused Zen1 settings (#38289)
Today the following settings in the `discovery.zen` namespace are still used:

- `discovery.zen.no_master_block`
- `discovery.zen.hosts_provider`
- `discovery.zen.ping.unicast.concurrent_connects`
- `discovery.zen.ping.unicast.hosts.resolve_timeout`
- `discovery.zen.ping.unicast.hosts`

This commit deprecates all other settings in this namespace so that they can be
removed in the next major version.
2019-02-04 08:52:08 +00:00
Armin Braun 4561f425db
Remove Redundant Loop in SnapshotShardsService (#38283)
* This was probably a merge mistake on my end: we only need to loop over the shards once, not twice, to find those that we missed in INIT state
2019-02-04 09:06:39 +01:00
Alpar Torok d58e899d45
Remove empty service files (#38192) 2019-02-04 10:05:04 +02:00
Tim Vernum 0164acb0a7
Cleanup construction of interceptors (#38294)
It would be beneficial to apply some of the request interceptors even
when features are disabled. This change reworks the way we build that
list so that the interceptors we always want to use are constructed
outside of the settings check.
2019-02-04 17:27:41 +11:00
Costin Leau 75f0750ff7
SQL: Remove exceptions from Analyzer (#38260)
Instead of throwing an exception, use an unresolved attribute to pass
the message to the Verifier.
Additionally improve the parser to save the extended source for the
Aggregate and OrderBy.

Close #38208
2019-02-03 22:32:16 +02:00
Costin Leau a088155f4d
SQL: Move metrics tracking inside PlanExecutor (#38259)
Move metrics tracking into one place, from the transport layer into the
PlanExecutor.
Remove unused class.

Close #38258
2019-02-03 22:31:35 +02:00
Christoph Büscher 820029522b
Mute DateProcessorTests#testJodaPatternLocale (#38265)
Only fails on FIPS 8, muting this selectively.
2019-02-03 19:52:53 +01:00
Jason Tedor d2cc1459a3
Fix ordering problem in add or renew lease test (#38280)
We have to set the primary term before we add a retention lease,
otherwise we can not assert the correct primary term.
2019-02-03 12:54:31 -05:00
Christoph Büscher 6ca7a913ea
Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275) 2019-02-03 12:54:13 +01:00
Albert Zaharovits 3c1544d259
Fix NPE in Logfile Audit Filter (#38120)
The culprit in #38097 is an `IndicesRequest` that has no indices,
but instead of `request.indices()` returning `null` or `String[0]`
it returned `String[] {null}` . This tripped the audit filter.

I have addressed this in two ways:
1. `request.indices()` returning `String[] {null}` is treated as `null`
    or `String[0]`, i.e. no indices
2. `null` values among the roles and indices lists, which are
    unexpected, will no longer trip up the audit filter; `null` values
    are treated as special values that will not match any policy,
    i.e. their events will always be printed.

Closes #38097
2019-02-03 10:34:17 +02:00
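The two rules above can be modelled in a few lines. This is a hedged Python sketch of the described behaviour, not the actual audit filter; `effective_indices` and `matches_ignore_policy` are hypothetical names, and Python's `None` stands in for Java's `null`.

```python
def effective_indices(indices):
    """Treat None, an empty list, and [None] alike: the request names no indices."""
    if indices is None or list(indices) == [None]:
        return []
    return list(indices)


def matches_ignore_policy(value, ignored):
    """A None value never matches a policy, so its event is still printed."""
    return value is not None and value in ignored
```

The first function covers rule 1; the second covers rule 2 by making `None` a value that no policy can match rather than one that raises.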
Armin Braun 89d7c57bd9
Fix Incorrect Transport Response Handler Type (#38264)
* Fix Incorrect Transport Response Handler Type
* The response type here is not empty and was always wrong, but this only became visible now that 0a604e3b24 was introduced
   * As a result of 0a604e3b24 we started actually handling the response of this request and logging/handling exceptions; before that we simply dropped the ClassCastException here quietly by using the empty response handler
* fix busy assert not handling `Exception`
* Closes #38226
* Closes #38256
2019-02-03 08:48:15 +01:00
Nhat Nguyen 0861dc3581
Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268)
Tracked at #38267
2019-02-02 20:02:21 -05:00
Andrei Stefan 6968f0925b
SQL: Generate relevant error message when grouping functions are not used in GROUP BY (#38017)
* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.
2019-02-02 22:05:47 +02:00
Nhat Nguyen 75abb5b8a6
Adapt LLRest warning exception in FullClusterRestartIT (#38253)
We now throw a WarningFailureException instead of ResponseException if
there's any warning in a response. This change has caused
testSnapshotRestore to fail in the BWC builds for the last two days.

Relates #37247
2019-02-02 12:09:14 -05:00
Christoph Büscher 50cdc61874
Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257) 2019-02-02 13:46:29 +01:00
David Turner c311062476
Add CoordinatorTests for empty unicast hosts list (#38209)
Today we have DiscoveryDisruptionIT tests for checking that discovery can still
work once the cluster has formed, even if the cluster is misconfigured and only
has a single master-eligible node in its unicast hosts list. In fact with Zen2
we can go one better: we do not need any nodes in the unicast hosts list,
because nodes also use the contents of the last-committed cluster state for
discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to
the overenthusiastic fault-detection timeouts.

This commit replaces these tests with deterministic `CoordinatorTests` that
verify the same behaviour. It also removes some duplication by extracting a
test method called `testFollowerCheckerAfterMasterReelection()`

Closes #37687
2019-02-02 07:54:56 +00:00
Nhat Nguyen 80d3092292
Fix primary term in testAddOrRenewRetentionLease (#38239)
We should increase primary term before renewing leases; otherwise, the
term of the latest RetentionLeases will be lower than the current term.

Relates #37951
2019-02-02 02:38:53 -05:00
Nhat Nguyen 1ec04dff43
Fix testReplicaIgnoresOlderRetentionLeasesVersion (#38246)
If the innerLength is 0, the version won't be increased; then there will
be two RetentionLeases with the same term and version, but their leases
are different.

Relates #37951
Closes #38245
2019-02-02 02:37:37 -05:00
Gordon Brown 475a045192
Mute tests in SSLConfigurationReloaderTests (#38248)
Specifically `testReloadingTrustStore` and `testReloadingPEMTrustConfig`
2019-02-01 21:00:58 -07:00
Gordon Brown 7a1e89c7ed
Ensure ILM policies run safely on leader indices (#38140)
Adds a Step to the Shrink and Delete actions which prevents those
actions from running on a leader index - all follower indices must first
unfollow the leader index before these actions can run. This prevents
the loss of history before follower indices are ready, which might
otherwise result in the loss of data.
2019-02-01 20:46:12 -07:00
Nhat Nguyen 8bee5b8e06
Mute testAddOrRenewRetentionLease (#38240)
Relates #38239
2019-02-01 21:27:10 -05:00
Boaz Leskes f6e06a2b19
Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231)
After backporting #37977, #37857 and #37872
2019-02-01 20:37:16 -05:00
Costin Leau 783c9ed372
SQL: Allow sorting of groups by aggregates (#38042)
Introduce client-side sorting of groups based on aggregate
functions. To allow this, the Analyzer has been extended to push the
aggregate function down to the underlying Aggregate, and the Querier has
been extended to identify this case, consume the results in order, and
sort them based on the given columns.
The underlying QueryContainer has been slightly modified to allow a view
of the underlying values being extracted as the columns used for sorting
might not be requested by the user.

The PR also adds minor tweaks, mainly related to tree output.

Close #35118
2019-02-02 01:38:25 +02:00
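Client-side sorting of groups, including the case where the sort column is not among the columns the user requested, can be illustrated as follows. This is a simplified Python sketch with hypothetical names (`sort_groups`, the row dictionaries), not the Querier implementation.

```python
def sort_groups(rows, sort_column, requested_columns, descending=False):
    """Sort grouped rows by an aggregate value, then keep only the requested columns."""
    ordered = sorted(rows, key=lambda row: row[sort_column], reverse=descending)
    # The sort column is extracted for ordering but dropped from the output
    # unless the user asked for it.
    return [{col: row[col] for col in requested_columns} for row in ordered]


groups = [
    {"dept": "ops", "max_salary": 70},
    {"dept": "dev", "max_salary": 90},
    {"dept": "qa", "max_salary": 60},
]
by_salary = sort_groups(groups, "max_salary", ["dept"], descending=True)
```

Here `by_salary` lists the departments in descending order of `max_salary` even though that column is not returned.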
Jack Conradson 630889baec
Remove extraneous test from Painless lambda tests (#38111)
This test has been awaiting a fix that isn't currently relevant because incoming
lambda parameters are read-only. If this ever changes a new set of tests can
be added that are up-to-date.
2019-02-01 15:10:59 -08:00
Jason Tedor f181e17038
Introduce retention leases versioning (#37951)
Because concurrent sync requests from a primary to its replicas could be
in flight, it can be the case that an older retention leases collection
arrives and is processed on the replica after a newer retention leases
collection has arrived and been processed. Without a defense, in this
case the replica would overwrite the newer retention leases with the
older retention leases. This commit addresses this issue by introducing
a versioning scheme to retention leases. This versioning scheme is used
to resolve out-of-order processing on the replica. We persist this
version into Lucene and restore it on recovery. The encoding of
retention leases is starting to get a little ugly. We can consider
addressing this in a follow-up.
2019-02-01 17:19:19 -05:00
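Resolving out-of-order syncs on the replica can be sketched as a guard that only accepts a strictly newer collection. The Python model below is an assumption-laden illustration: comparing by `(primary_term, version)` is inferred from the neighbouring commit messages, and `RetentionLeases`, `Replica`, and `update` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionLeases:
    primary_term: int
    version: int
    leases: tuple = ()


class Replica:
    def __init__(self):
        self.current = RetentionLeases(0, 0)

    def update(self, incoming):
        """Apply incoming leases only if they are newer than what is held."""
        held = (self.current.primary_term, self.current.version)
        if (incoming.primary_term, incoming.version) > held:
            self.current = incoming
            return True
        return False  # stale sync that arrived late; ignore it
```

If version 3 arrives before version 2, the replica keeps version 3 and drops the late arrival instead of overwriting the newer leases.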
Ioannis Kakavas 78a65c340d
Correctly disable tests for FIPS JVMs (#38214)
Replace assertFalse with assumeFalse

Resolves: #38212
2019-02-01 23:56:35 +02:00
Nhat Nguyen 9c39dea7ae
AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227)
Tracked at #38226
2019-02-01 16:24:02 -05:00
Tal Levy bae656dcea
Preserve ILM operation mode when creating new lifecycles (#38134)
There was a bug where creating a new policy would start
the ILM service, even if it was stopped. This change ensures
that there is no change to the existing operation mode
2019-02-01 13:16:34 -08:00
Nhat Nguyen 3ecdfe1060
Enable trace log in FollowerFailOverIT (#38148)
This suite still fails about once per week with a worrying assertion.
Sadly we are still unable to find the actual source.

Expected: <SeqNoStats{maxSeqNo=229, localCheckpoint=86, globalCheckpoint=86}>
but: was   <SeqNoStats{maxSeqNo=229, localCheckpoint=-1, globalCheckpoint=86}>

This change enables trace log in the suite so we will have a better
picture if this fails again.

Relates #3333
2019-02-01 15:44:39 -05:00
Armin Braun 03a1d21070
SnapshotShardsService Simplifications (#38025)
* Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot
* Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)
2019-02-01 20:46:14 +01:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Boaz Leskes 9350da98a7
Disable bwc in preparation for the backport of #37977, #37857 and #37872 (#38126) 2019-02-01 20:28:08 +01:00
Benjamin Trent a70f54fc77
Adding ml_settings entry to HLRC and Docs for deprecation_info (#38118) 2019-02-01 12:45:28 -06:00
Nhat Nguyen f64b20383e
Replace awaitBusy with assertBusy in atLeastDocsIndexed (#38190)
Unlike assertBusy, awaitBusy does not retry if the code-block throws an
AssertionError. A refresh in atLeastDocsIndexed can fail because we call
this method while we are closing some node in FollowerFailOverIT.
2019-02-01 13:31:17 -05:00
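The difference between the two helpers can be shown with a small Python analogue. These functions are a sketch of the semantics described above, not the Elasticsearch test utilities; `await_busy` and `assert_busy` and their parameters are hypothetical stand-ins.

```python
import time


def await_busy(predicate, timeout=1.0, interval=0.01):
    """Poll predicate until it returns True; an AssertionError propagates at once."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def assert_busy(block, timeout=1.0, interval=0.01):
    """Re-run block until it stops raising AssertionError or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            block()
            return
        except AssertionError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

A block that fails transiently (for example, a refresh that fails while a node is closing) is retried by `assert_busy`, whereas the same failure inside `await_busy` aborts the wait on the first attempt.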
Luca Cavanna ee57420de6
Adjust SearchRequest version checks (#38181)
The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.
2019-02-01 19:23:13 +01:00
Nhat Nguyen 70235838d1
AwaitsFix testClientSucceedsWithVerificationDisabled (#38213)
Tracked at #38212
2019-02-01 12:50:07 -05:00