Commit Graph

127 Commits

Author SHA1 Message Date
Julie Tibshirani 36e5670f76 Unmute FullClusterRestartIT#testSearch. 2020-05-28 10:55:17 -07:00
Gordon Brown 2f561084f0
Mute FullClusterRestartIT.testSearch (#57249)
This test fails reliably on upgrades from 6.0.0 and 6.0.1. See #57245 for details.
2020-05-27 13:15:05 -06:00
Nhat Nguyen 5b08eaf90c
Fix trimUnsafeCommits for indices created before 6.2 (#57187)
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem
is that with the new translog policy can trim translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.

Closes #57091
2020-05-27 15:08:49 -04:00
Rene Groeschke c29bc87040
Move bwcVersions extension property to BuildParams (back port) (#56381)
* Move bwcVersions extension property to BuildParams (#56206)
* Fix :qa Task Using Broken BwC Versions Resolution (#56332)

Co-authored-by: Armin Braun <me@obrown.io>
2020-05-11 09:39:13 +02:00
Mark Vieira dd73a14d11
Improve total build configuration time (#54611) (#54994)
This commit includes a number of changes to reduce overall build
configuration time. These optimizations include:

- Removing the usage of the 'nebula.info-scm' plugin. This plugin
   leverages jgit to load read various pieces of VCS information. This
   is mostly overkill and we have our own minimal implementation for
   determining the current commit id.
- Removing unnecessary build dependencies such as perforce and jgit
   now that we don't need them. This reduces our classpath considerably.
- Expanding the usage lazy task creation, particularly in our
   distribution projects. The archives and packages projects create
   lots of tasks with very complex configuration. Avoiding the creation
   of these tasks at configuration time gives us a nice boost.
2020-04-08 16:47:02 -07:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Mark Vieira 4b528d97ad
Consolidate duplication of BWC testing task setup in script plugin (#53079)
(cherry picked from commit 33fc8e7ebfac8d47a5f9f026b3836bb47bea141a)
2020-03-03 14:43:02 -08:00
Nhat Nguyen fa701e4c1f Fix testRetentionLeasesEstablishedWhenRelocatingPrimary (#52445)
Replace the current assertion with a more robust assertion.

Closes #52364
2020-02-26 22:06:01 -05:00
Mark Vieira 8d2370bf00
Always use bundled JDK for external cluster nodes when BWC testing (#51505) (#51701) 2020-01-30 14:35:43 -08:00
Nhat Nguyen 69ef9b05cd Increase shard inactive time to 1h in upgrade tests (#51651)
testRecovery relies on the fact that shards are not flushed on inactive. 
Our CI recently was too slow. It took more than 20 minutes to complete
the full cluster restart suite. This slowness caused some shards of
testRecovery were flushed on inactive.

This commit increases the inactive time to 1h to reduce this noise.

Closes #51640
2020-01-30 14:48:56 -05:00
Nhat Nguyen fb32a55dd5 Deprecate synced flush (#50835)
A normal flush has the same effect as a synced flush on Elasticsearch 
7.6 or later. It's deprecated in 7.6 and will be removed in 8.0.

Relates #50776
2020-01-13 19:54:38 -05:00
Nhat Nguyen 05f97d5e1b Revert "Deprecate synced flush (#50835)"
This reverts commit 1a32d7142a.
2020-01-13 11:41:03 -05:00
Nhat Nguyen 1a32d7142a
Deprecate synced flush (#50835)
A normal flush has the same effect as a synced flush on Elasticsearch 
7.6 or later. It's deprecated in 7.6 and will be removed in 8.0.

Relates #50776
2020-01-13 10:58:29 -05:00
Nhat Nguyen 33204c2055 Use peer recovery retention leases for indices without soft-deletes (#50351)
Today, the replica allocator uses peer recovery retention leases to
select the best-matched copies when allocating replicas of indices with
soft-deletes. We can employ this mechanism for indices without
soft-deletes because the retaining sequence number of a PRRL is the
persisted global checkpoint (plus one) of that copy. If the primary and
replica have the same retaining sequence number, then we should be able
to perform a noop recovery. The reason is that we must be retaining
translog up to the local checkpoint of the safe commit, which is at most
the global checkpoint of either copy). The only limitation is that we
might not cancel ongoing file-based recoveries with PRRLs for noop
recoveries. We can't make the translog retention policy comply with
PRRLs. We also have this problem with soft-deletes if a PRRL is about to
expire.

Relates #45136
Relates #46959
2019-12-23 22:04:07 -05:00
Tim Brooks cb73fb0f9b
Backport remote proxy mode stats and naming (#50402)
* Update remote cluster stats to support simple mode (#49961)

Remote cluster stats API currently only returns useful information if
the strategy in use is the SNIFF mode. This PR modifies the API to
provide relevant information if the user is in the SIMPLE mode. This
information is the configured addresses, max socket connections, and
open socket connections.

* Send hostname in SNI header in simple remote mode (#50247)

Currently an intermediate proxy must route conncctions to the
appropriate remote cluster when using simple mode. This commit offers
a additional mechanism for the proxy to route the connections by
including the hostname in the TLS SNI header.

* Rename the remote connection mode simple to proxy (#50291)

This commit renames the simple connection mode to the proxy connection
mode for remote cluster connections. In order to do this, the mode specific
settings which we namespaced by their mode (ex: sniff.seed and
proxy.addresses) have been reverted.

* Modify proxy mode to support a single address (#50391)

Currently, the remote proxy connection mode uses a list setting for the
proxy address. This commit modifies this so that the setting is
proxy_address and only supports a single remote proxy address.
2019-12-19 18:02:48 -07:00
Nhat Nguyen c732d9923d Fix doc type in FullClusterRestartIT
"_doc" is not accepted in 6.x version.
2019-12-15 21:54:57 -05:00
Nhat Nguyen 4d22e3cd15 Skip bwc versions without retention leases 2019-12-15 13:18:13 -05:00
Nhat Nguyen df46848fb0 Migrate peer recovery from translog to retention lease (#49448)
Since 7.4, we switch from translog to Lucene as the source of history
for peer recoveries. However, we reduce the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4 because existing copies do not have PPRL.

To remedy this issue, we fallback using translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease,
and the replication group hasn't fully migrated to PRRL.

Relates #45136
2019-12-15 10:24:39 -05:00
Tim Brooks e965a6f2df
Fix remote settings upgrade test (#49609)
This commit fixes #49587. Due to a settings change, the broken test was
asserting on the incorrect setting. This commit fixes that issue and
adds additional assertions to ensure that all settings are working
properly.
2019-11-26 16:37:27 -07:00
Hendrik Muhs 41daf284f5 mute FullClusterRestartSettingsUpgradeIT 2019-11-26 13:28:35 +01:00
Tim Brooks 416178c7c8
Enable simple remote connection strategy (#49561)
This commit back ports three commits related to enabling the simple
connection strategy.

Allow simple connection strategy to be configured (#49066)

Currently the simple connection strategy only exists in the code. It
cannot be configured. This commit moves in the direction of allowing it
to be configured. It introduces settings for the addresses and socket
count. Additionally it introduces new settings for the sniff strategy
so that the more generic number of connections and seed node settings
can be deprecated.

The simple settings are not yet registered as the registration is
dependent on follow-up work to validate the settings.

Ensure at least 1 seed configured in remote test (#49389)

This fixes #49384. Currently when we select a random subset of seed
nodes from a list, it is possible for 0 seeds to be selected. This test
depends on at least 1 seed being selected.

Add the simple strategy to cluster settings (#49414)

This is related to #49067. This commit adds the simple connection
strategy settings and strategy mode setting to the cluster settings
registry. With these changes, the simple connection mode can be used.
Additionally, it adds validation to ensure that settings cannot be
misconfigured.
2019-11-25 16:53:07 -07:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Mark Vieira 6ab4645f4e
[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818)
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.

Closes #42042
2019-11-01 11:33:11 -07:00
Alpar Torok 2b16d7bcf8
Backport testclusters all (#47565)
* Bwc testclusters all (#46265)

Convert all bwc projects to testclusters

* Fix bwc versions config

* WIP fix rolling upgrade

* Fix bwc tests on old versions

* Fix rolling upgrade
2019-10-04 16:12:53 +03:00
Alpar Torok 67bf3a4f51 Fix default distro bwc tests 2019-10-04 09:44:17 +03:00
Nhat Nguyen 44fdf2020a Always flush in FullClusterRestartIT#testRecovery (#47465)
The pattern in the latest failure is similar to the source fixed in #46956
but relates to synced-flush. If peer recovery happens after indexing,
and indexing flushes some shard at the end, then a synced flush in the
test will not roll or commit translog.

Closes #46712
2019-10-02 18:03:22 -04:00
Alpar Torok a032f9b2d5
Backport testclusters fix bwc (#47363)
* Add support for bwc for testclusters and convert full cluster restart (#45374)

* Testclusters fix bwc (#46740)

Additions to make testclsuters work with lather versions of ES

* Do common node config on bwc tests

Before this PR we always ever ran `ElasticsearchCluster.start` once, and
the common node config was never done.
This becomes apparent in upgrading from `6.x` to `7.x` as the new config
is missing preventing the cluster from starting.

* Do common node config on bwc tests

Before this PR we always ever ran `ElasticsearchCluster.start` once, and
the common node config was never done.
This becomes apparent in upgrading from `6.x` to `7.x` as the new config
is missing preventing the cluster from starting.

* Fix logic to pick up snapshot from 6.x

* Make sure ports are cleared

* Fix test

* Don't clear all the config as we rely on it

* Fix removal of keys
2019-10-02 14:37:00 +03:00
Nhat Nguyen e8515d1d13 Force flush in FullClusterRestartIT#testRecovery (#46956)
If peer recovery happens after indexing, and indexing flushes some shard
at the end, then the explicit flush in the test will be a noop. Then
replicas will have some uncommitted translog , which is transferred in
peer recovery, although all of these operations are in the commit
already. If that replica becomes primary (after we restarted the
cluster), it will have translog to replay and the test will fail.

Another issue in this test is that synced_flush is not a replication
action, then the global checkpoint on replicas might be not up to date.
We need to either wait for the global checkpoint to be synced or call a
replication action to sync it.

Closes #46712
2019-09-22 19:04:01 -04:00
David Turner 9ff320d967
Use index for peer recovery instead of translog (#45137)
Today we recover a replica by copying operations from the primary's translog.
However we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.

Reverts #38904 and #42211
Relates #41536
Backport of #45136 to 7.x.
2019-08-02 15:00:43 +01:00
Nhat Nguyen ab832c4f17 Use doc instead of _doc in FullClusterRestartIT
ES does not accept doc type starting with underscore until 6.2.0. We
have to use "doc" instead of "_doc" in FullClusterRestartIT if we are
upgrading from a 6.2.0- cluster.

Closes #42581
2019-05-27 21:35:56 -04:00
Ignacio Vera 5d3e381648
mute test testClosedIndices (#42582) 2019-05-27 12:05:17 +02:00
Nhat Nguyen 5d2fcc53e4 Unmute FullClusterRestartIT#testClosedIndices
Fixed in #39566
Closes #39576
2019-05-26 11:20:04 -04:00
Jason Tedor 7f3ab4524f
Bump 7.x branch to version 7.2.0
This commit adds the 7.2.0 version constant to the 7.x branch, and bumps
BWC logic accordingly.
2019-05-01 13:38:57 -04:00
Mark Vieira 1287c7d91f
[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) (#40993)
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)

This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.

(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)

* Fix forking JVM runner

* Don't bump shadow plugin version
2019-04-09 11:52:50 -07:00
Mark Vieira 2569fb60de Avoid sharing source directories as it breaks intellij (#40877)
* Avoid sharing source directories as it breaks intellij
* Subprojects share main project output classes directory
* Fix jar hell
* Fix sql security with ssl integ tests
* Relax dependency ordering rule so we don't explode on cycles
2019-04-08 17:26:46 +03:00
Alpar Torok 0f89427eb6
Back port build changes from same version bwc tests (#39744)
* Back port build changes from #39102

This back-ports how versions are determined and bwc test are set up from
 #39102 without enabling the bwc from current version tests so it's
 easier/possible to backmerge future buld changes.
It's expected that the tets are lacking many of the required fixes in
this version to enable them.
2019-03-07 17:25:09 +02:00
David Turner f5fb93afdf Mute FullClusterRestartIT#testClosedIndices (#39600)
Relates #39576
2019-03-02 12:26:45 +00:00
Tanguy Leroux e005eeb0b3
Backport support for replicating closed indices to 7.x (#39506)(#39499)
Backport support for replicating closed indices (#39499)
    
    Before this change, closed indexes were simply not replicated. It was therefore
    possible to close an index and then decommission a data node without knowing
    that this data node contained shards of the closed index, potentially leading to
    data loss. Shards of closed indices were not completely taken into account when
    balancing the shards within the cluster, or automatically replicated through shard
    copies, and they were not easily movable from node A to node B using APIs like
    Cluster Reroute without being fully reopened and closed again.
    
    This commit changes the logic executed when closing an index, so that its shards
    are not just removed and forgotten but are instead reinitialized and reallocated on
    data nodes using an engine implementation which does not allow searching or
     indexing, which has a low memory overhead (compared with searchable/indexable
    opened shards) and which allows shards to be recovered from peer or promoted
    as primaries when needed.
    
    This new closing logic is built on top of the new Close Index API introduced in
    6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before
    closing them, and closing an index on a 8.0 cluster will reinitialize the index shards
    and therefore impact the cluster health.
    
    Some APIs have been adapted to make them work with closed indices:
    - Cluster Health API
    - Cluster Reroute API
    - Cluster Allocation Explain API
    - Recovery API
    - Cat Indices
    - Cat Shards
    - Cat Health
    - Cat Recovery
    
    This commit contains all the following changes (most recent first):
    * c6c42a1 Adapt NoOpEngineTests after #39006
    * 3f9993d Wait for shards to be active after closing indices (#38854)
    * 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
    * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
    * 71f5c34 Recover closed indices after a full cluster restart (#39249)
    * 4db7fd9 Adapt the Recovery API for closed indices (#38421)
    * 4fd1bb2 Adapt more tests suites to closed indices (#39186)
    * 0519016 Add replica to primary promotion test for closed indices (#39110)
    * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
    * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
    * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
    * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
    * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
    * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
    * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
    * e53a9be Fix compilation error in IndexShardIT after merge with master
    * cae4155 Relax NoOpEngine constraints (#37413)
    * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
    * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)
    
    Relates to #33888
2019-03-01 14:48:26 +01:00
Christoph Büscher 9f6c77fad4
Fix FullClusterRestartIT#testSnapshotRestore (#38795)
This test failed on 7.1 when running full cluster restart tests against pre-7.0
clusters (e.g. 6.6 clusters). The fixes the expected type in the
templates after the cluster restart.
2019-02-15 20:12:26 +01:00
Julie Tibshirani e769cb4efd Perform precise check for types warnings in cluster restart tests. (#37944)
Instead of using `WarningsHandler.PERMISSIVE`, we only match warnings
that are due to types removal.

This PR also renames `allowTypeRemovalWarnings` to `allowTypesRemovalWarnings`.

Relates to #37920.
2019-02-13 11:28:58 -08:00
Alpar Torok bd4ca4c702 Rename integTest to bwcTestSample for bwc test projects (#38433)
* Rename integTest to bwcTestSample for bwc test projects

This change renames the `integTest` task to `bwcTestSample` for projects
testing bwc to make it possible to run all the bwc tests that check
would run without running on bwc tests.

This change makes it possible to add a new PR check on backports to make
sure these don't break BWC tests in master.

* Rename task as per PR
2019-02-11 15:05:16 +02:00
Martijn van Groningen b284fede0b
Make qa/full-cluster-restart tests pass. By fixing a helper method and (#38604)
muting a test.

Relates to #38603
2019-02-08 14:14:23 +01:00
Tim Vernum 1008f1c68e
Only "include_type_name" if running on >= 7 (#38594)
In cluster restart tests, we need to "include_type_name" if the
cluster includes a pre-7 version, but the test is running against
a 7+ version
2019-02-08 18:06:15 +11:00
Jason Tedor fdf6b3f23f
Add 7.1 version constant to 7.x branch (#38513)
This commit adds the 7.1 version constant to the 7.x branch.

Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
2019-02-07 16:32:27 -05:00
Mayya Sharipova e4fa32470b
Types removal fix FullClusterRestartIT warnings (#38445)
Backport PR #38389 for 6.7 produces warnings for rollover test.
This fixes FullClusterRestartIT warning expectations
for rollover request

Relates to #38389
2019-02-05 14:15:43 -05:00
markharwood 578fd14257
Types removal - fix FullClusterRestartIT warning expectations (#38310)
Relax test warning message checking to pre-empt PR 38022 landing in 6.7 with new warning messages.
The relaxed test now just assumes any warning message starting with “[types removal]” is tolerated rather than the precise phrasing used in the 6.7 branch.
2019-02-04 20:09:07 +00:00
Nhat Nguyen 75abb5b8a6
Adapt LLRest warning exception in FullClusterRestartIT (#38253)
We now throw a WarningFailureException instead of ResponseException if
there's any warning in a response. This change leads to the failures of
testSnapshotRestore in the BWC builds for the last two days.

Relates #37247
2019-02-02 12:09:14 -05:00
Tanguy Leroux 0e6a7c20a1
Fix FullClusterRestartIT.testHistoryUUIDIsAdded (#38098)
This test failed once because the index wasn't fully ready 
(ie, engine opened). This commit changes the test so that 
it waits for the index to be green before checking the history 
UUID.

Closes #34452
2019-02-01 11:16:54 +01:00
Adrien Grand c8af0f4bfa
Use mappings to format doc-value fields by default. (#30831)
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.

This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
2019-01-30 10:31:51 +01:00
markharwood b889221f75
Types removal - deprecate include_type_name with index templates (#37484)
Added deprecation warnings for use of include_type_name in put/get index templates.
HLRC changes:
GetIndexTemplateRequest has a new client-side class which is a copy of server's GetIndexTemplateResponse but modified to be typeless.
PutIndexTemplateRequest has a new client-side counterpart which doesn't use types in the mappings
Relates to #35190
2019-01-29 20:52:41 +00:00