Commit Graph

109 Commits

Author SHA1 Message Date
Tanguy Leroux f5c5411fe8
Differentiate base paths in repository integration tests (#47284) (#47300)
This commit change the repositories base paths used in Azure/S3/GCS
integration tests so that they don't conflict with each other when tests
 run in parallel on real storage services.

Closes #47202
2019-10-01 08:39:55 +02:00
Tanguy Leroux 6986d7f968 Add blob container retries tests for Google Cloud Storage (#46968)
Similarly to what has been done for S3 in #45383, this commit 
adds unit tests that verify the behavior of the SDK client and 
blob container implementation for Google Storage when the 
remote service returns errors.

The main purpose was to add an extra test to the specific retry 
logic for 410-Gone errors added in #45963.

Relates #45963
2019-09-24 08:58:24 +02:00
Tanguy Leroux add7148f3b GCS deleteBlobsIgnoringIfNotExists should catch StorageException (#46832)
GoogleCloudStorageBlobStore.deleteBlobsIgnoringIfNotExists() does
not correctly catch StorageException thrown by batch.submit().

In the case a snapshot is deleted through BlobStoreRepository.deleteSnapshot()
a storage exception is not caught (only IOException are) so the deletion is
interrupted and indices cannot be cleaned up. The storage exception bubbles
up to SnapshotService.deleteSnapshotFromRepository() but the listener that
 removes the deletion from the cluster state is not executed, leaving the
deletion in the cluster state.

This bug has been reported in #46772 where batch.submit() threw an
exception in the test testIndicesDeletedFromRepository and following
tests failed because a snapshot deletion was running.

Relates #46772
2019-09-20 10:02:23 +02:00
Tanguy Leroux 3ae51f25dd Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802)
This commit moves the common test testSnapshotWithLargeSegmentFiles 
to the ESMockAPIBasedRepositoryIntegTestCase base class.
2019-09-18 15:41:30 +02:00
Tanguy Leroux 4db37801d0 Add resumable uploads support to GCS repository integration tests (#46562)
This commit adds support for resumable uploads to the internal HTTP 
server used in GoogleCloudStorageBlobStoreRepositoryTests. This 
way we can also test the behavior of the Google's client when the 
service returns server errors in response to resumable upload requests.

The BlobStore implementation for GCS has the choice between 2 
methods to upload a blob: resumable and multipart. In the current 
implementation, the client executes a resumable upload if the blob 
size is larger than LARGE_BLOB_THRESHOLD_BYTE_SIZE, 
otherwise it executes a multipart upload. This commit makes this 
logic overridable in tests, allowing to randomize the decision of 
using one method or the other.

The commit add support for single request resumable uploads 
and chunked resumable uploads (the blob is uploaded into multiple 
2Mb chunks; each chunk being a resumable upload). For this last 
case, this PR also adds a test testSnapshotWithLargeSegmentFiles 
which makes it more probable that a chunked resumable upload is 
executed.
2019-09-18 09:33:05 +02:00
Armin Braun 371c355bca
Retry GCS Resumable Upload on Error 410 (#45963) (#46783)
A resumable upload session can fail on with a 410 error and should
be retried in that case. I added retrying twice using resetting of
the given `InputStream` as the retry mechanism since the same
approach is used by the AWS S3 SDK already as well and relied upon
by the S3 repository implementation.

Related GCS documentation:
https://cloud.google.com/storage/docs/json_api/v1/status-codes#410_Gone
2019-09-17 19:06:43 +02:00
Luca Cavanna e57756492a Update http-core and http-client dependencies (#46549)
Relates to #45808
Closes #45577
2019-09-12 09:45:29 +02:00
Tanguy Leroux 88bed09119 Mutualize code in cloud-based repository integration tests (#46483)
This commit factors out some common code between the cloud-based
repository integration tests that were recently improved.

Relates #46376
2019-09-09 16:02:14 +02:00
Tanguy Leroux 8e3dc68454 Inject random server errors in GoogleCloudStorageBlobStoreRepositoryTests (#46376)
This commit modifies the HTTP server used in 
GoogleCloudStorageBlobStoreRepositoryTests so that it randomly 
returns server errors. The test does not inject server errors for the 
following types of request: batch request, resumable upload request.
2019-09-09 09:59:59 +02:00
Tanguy Leroux 28974b5723 Replace mocked client in GCSBlobStoreRepositoryTests by HTTP server (#46255)
This commit removes the usage of MockGoogleCloudStoragePlugin in
GoogleCloudStorageBlobStoreRepositoryTests and replaces it by a
HttpServer that emulates the Storage service. This allows the repository
tests to use the real Google's client under the hood in tests and will allow
us to test the behavior of the snapshot/restore feature for GCS repositories
by simulating random server-side internal errors.

The HTTP server used to emulate the Storage service is intentionally simple
and minimal to keep things understandable and maintainable. Testing full
client options on the server side (like authentication, chunked encoding
etc) remains the responsibility of the GoogleCloudStorageFixture.
2019-09-05 10:37:37 +02:00
Tanguy Leroux 9e14ffa8be Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068) 2019-08-28 16:29:46 +02:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Armin Braun c8db0e9b7e
Remove blobExists Method from BlobContainer (#44472) (#44475)
* We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface
   * Keep it as an implementation detail in the Azure repository
2019-07-17 11:56:02 +02:00
Mark Vieira 7c2e4b2857
[Backport] Enable caching of rest tests which use integ-test distribution (#44181) 2019-07-10 15:42:28 -07:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Armin Braun 3317169c4f
Fix GCS Blob Repository 3rd Party Tests (#43030) (#43913)
* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029
2019-07-03 15:09:28 +02:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Armin Braun 21e74dd7d2
Upgrade GCS Repository Dependencies (#43142) (#43418)
* Upgrade to latest GCS SDK and transitive dependencies (I chose the later version here on conflict)
* Remove now unnecessary hack for custom endpoints (the linked bugs were both resolved in the SDK)
2019-06-20 16:35:54 +02:00
Yannick Welsch e5a4a2272b Wipe repositories more often (#42511)
Fixes an issue where repositories are unintentionally shared among tests (given that the repo contents is captured in a static variable on the test class, to allow "sharing" among nodes) and two tests randomly chose the same snapshot name, leading to a conflict.

Closes #42519
2019-06-12 11:58:38 +02:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Armin Braun 116b050cc6
Cleanup Bulk Delete Exception Logging (#41693) (#42606)
* Cleanup Bulk Delete Exception Logging

* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
2019-05-28 11:00:28 +02:00
Armin Braun 44bf784fe1
Add Infrastructure to Run 3rd Party Repository Tests (#42586) (#42604)
* Add Infrastructure to Run 3rd Party Repository Tests

* Add infrastructure to run third party repository tests using our standard JUnit infrastructure
* This is a prerequisite of #42189
2019-05-28 10:46:22 +02:00
Armin Braun c4f44024af
Remove Delete Method from BlobStore (#41619) (#42574)
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
  * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
2019-05-27 12:24:20 +02:00
Armin Braun 7cc4b9a8b3
Implement Bulk Deletes for GCS Repository (#41368) (#41681)
* Implement Bulk Deletes for GCS Repository (#41368)

* Just like #40322 for AWS
* We already had a bulk delete API but weren't using it from the blob container implementation, now we are using it
  * Made the bulk delete API also compliant with our interface that only suppresses errors about non existent blobs by stating failed deletes (I didn't use any bulk stat action here since having to stat here should be the exception anyway and it would make error handling a lot more complex)
* Fixed bulk delete API to limit its batch size to 100 in line with GCS recommendations

back port of #41368
2019-04-30 17:03:57 +02:00
Alpar Torok 335f2bf102 Testclsuters: convert plugins qa projects (#41496)
Add testclusters support for files in keystore and convert qa subprojects within plugins.
2019-04-26 08:57:52 -07:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Jay Modi f34663282c
Update apache httpclient to version 4.5.8 (#40875)
This change updates our version of httpclient to version 4.5.8, which
contains the fix for HTTPCLIENT-1968, which is a bug where the client
started re-writing paths that contained encoded reserved characters
with their unreserved form.
2019-04-05 13:48:10 -06:00
David Emanuel Buchmann b5ed039160
plugins/repository-gcs: Update google-cloud-storage/core to 1.59.0 (#39748)
* plugins/repository-gcs: Update google-cloud-storage /
google-cloud-core to 1.59.0

* plugins: Update sha1 for google-cloud-core & google-cloud-storage
2019-03-10 11:04:52 -04:00
Henning Andersen 00a26b9dd2 Blob store compression fix (#39073)
Blob store compression was not enabled for some of the files in
snapshots due to constructor accessing sub-class fields. Fixed to
instead accept compress field as constructor param. Also fixed chunk
size validation to work.

Deprecated repositories.fs.compress setting as well to be able to unify
in a future commit.
2019-02-20 09:24:41 +01:00
Jay Modi 54dbf9469c
Update httpclient for JDK 11 TLS engine (#37994)
The apache commons http client implementations recently released
versions that solve TLS compatibility issues with the new TLS engine
that supports TLSv1.3 with JDK 11. This change updates our code to
use these versions since JDK 11 is a supported JDK and we should
allow the use of TLSv1.3.
2019-01-30 14:24:29 -07:00
Alpar Torok a7c3d5842a
Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
Armin Braun 617e294133
SNAPSHOT: Make Atomic Blob Writes Mandatory (#37168)
* With #37066 introducing atomic writes to HDFS repository we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are ok because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content here either
* Relates #37011
2019-01-07 12:11:19 +01:00
Armin Braun 5df93218d5
SNAPSHOTS: Upgrade GCS Dependencies to 1.55.0 (#36634)
* Closes #35459
* Closes #35229
2018-12-14 13:24:29 +01:00
Tanguy Leroux 6186ccf83e
[Tests] Fix third party tests with Gradle 5.0 (#36302)
* [Tests] Fix third party tests with Gradle 5.0

* apply feedback
2018-12-06 16:05:05 +01:00
Gordon Brown b2057138a7
Remove AbstractComponent from AbstractLifecycleComponent (#35560)
AbstractLifecycleComponent now no longer extends AbstractComponent. In
order to accomplish this, many, many classes now instantiate their own
logger.
2018-11-19 09:51:32 -07:00
Jernej Klancic baf33b3162 Removes AbstractComponent from several classes (#35566)
Removes inhertiting from AbstractComponent for some classes (mostly
in the plugins module).

Relates to #34488
2018-11-16 20:50:18 +01:00
Christoph Büscher 09cac321e7
Upgrade to Joda 2.10.1 (#35410)
This version contains a bugfix that allows us to reenable one of our muted tests
in DateTimeUnitTests.

Closes #33749
2018-11-12 10:02:41 +01:00
Andy Bristol eec357ebde [test] quote base_path in repository tests 2018-11-01 13:01:53 -07:00
Nik Everett e28509fbfe
Core: Less settings to AbstractComponent (#35140)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.

I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler

These files don't *need* `settings` either, but this change is large
enough as is.

Relates to #34488
2018-10-31 21:23:20 -04:00
Nik Everett 086ada4c08
Core: Drop settings member from AbstractComponent (#35083)
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.

This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
2018-10-30 16:10:38 -04:00
Alpar Torok 59536966c2
Add a new "contains" feature (#34738)
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
2018-10-25 08:50:50 +03:00
Tanguy Leroux c5e5a97a34
Update Google Cloud Storage Library for Java (#32940)
This commit updated the google-cloud-storage library from version 1.28.0
 to version 1.40.0.
2018-08-24 10:55:23 +02:00
Alpar Torok 38e2e1d553
Detect and prevent configuration that triggers a Gradle bug (#31912)
* Detect and prevent configuration that triggers a Gradle bug

As we found in #31862, this can lead to a lot of wasted time as it's not
immediatly obvius what's going on.
Givent how many projects we have it's getting increasingly easier to run
into gradle/gradle#847.
2018-07-19 06:46:58 +00:00
Vladimir Dolzhenko b1bf643e41
lazy snapshot repository initialization (#31606)
lazy snapshot repository initialization
2018-07-13 20:05:49 +02:00
Yannick Welsch 2bb4f38371
Add write*Blob option to replace existing blob (#31729)
Adds a new parameter to the BlobContainer#write*Blob methods to specify whether the existing file
should be overridden or not. For some metadata files in the repository, we actually want to replace
the current file. This is currently implemented through an explicit blob delete and then a fresh write.
In case of using a cloud provider (S3, GCS, Azure), this results in 2 API requests instead of just 1.
This change will therefore allow us to achieve the same functionality using less API requests.
2018-07-03 09:13:50 +02:00
Tanguy Leroux d8b3f332ef
Remove extra check for object existence in repository-gcs read object (#31661) 2018-06-29 13:52:31 +02:00
Tanguy Leroux 0ef22db844
[Test] Clean up some repository-s3 tests (#31601)
This commit removes some tests in the repository-s3 plugin that 
have not been executed for 2+ years but have been maintained 
for nothing. Most of the tests in AbstractAwsTestCase were 
obsolete or superseded by fixture based integration tests.
2018-06-29 13:21:29 +02:00
Alpar Torok 08b8d11e30
Add support for switching distribution for all integration tests (#30874)
* remove left-over comment

* make sure of the property for plugins

* skip installing modules if these exist in the distribution

* Log the distrbution being ran

* Don't allow running with integ-tests-zip passed externally

* top level x-pack/qa can't run with oss distro

* Add support for matching objects in lists

Makes it possible to have a key that points to a list and assert that a
certain object is present in the list. All keys have to be present and
values have to match. The objects in the source list may have additional
fields.

example:
```
  match:  { 'nodes.$master.plugins': { name: ingest-attachment }  }
```

* Update plugin and module tests to work with other distributions

Some of the tests expected that the integration tests will always be ran
with  the `integ-test-zip` distribution so that there will be no other
plugins loaded.

With this change, we check for the presence of the plugin without
assuming exclusivity.

* Allow modules to run on other distros as well

To match the behavior of tets.distributions

* Add and use a new `contains` assertion

Replaces the  previus changes that caused `match` to do a partial match.

* Implement PR review comments
2018-06-26 06:49:03 -07:00
Albert Zaharovits 3378240b29
Reload secure settings for plugins (#31383)
Adds the ability to reread and decrypt the local node keystore.
Commonly, the contents of the keystore, backing the `SecureSettings`,
are not retrievable except during node initialization. This changes that
by adding a new API which broadcasts a password to every node. The
password is used to decrypt the local keystore and use it to populate
a `Settings` object that is passes to all the plugins implementing the
`ReloadablePlugin` interface. The plugin is then responsible to do
whatever "reload" means in his case. When the `reload`handler returns,
the keystore is closed and its contents are no longer retrievable.
Password is never stored persistently on any node.
Plugins that have been moded in this commit are: `repository-azure`,
`repository-s3`, `repository-gcs` and `discovery-ec2`.
2018-06-18 09:42:11 +03:00