Commit Graph

239 Commits

Author SHA1 Message Date
Tanguy Leroux 3ae51f25dd Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802)
This commit moves the common test testSnapshotWithLargeSegmentFiles 
to the ESMockAPIBasedRepositoryIntegTestCase base class.
2019-09-18 15:41:30 +02:00
Tanguy Leroux fd42358a6d Add support for Multipart upload to S3 repository integration tests (#46704)
This commit adds support for Multipart upload to the internal HTTP 
server used in S3 repository integration tests.
2019-09-18 09:40:25 +02:00
David Turner 65dc888623 Resume partial download from S3 on connection drop (#46589)
Today if the connection to S3 times out or drops after starting to download an
object then the SDK does not attempt to recover or resume the download, causing
the restore of the whole shard to fail and retry. This commit allows
Elasticsearch to detect such a mid-stream failure and to resume the download
from where it failed.
2019-09-17 13:11:36 +01:00
Tanguy Leroux 88bed09119 Mutualize code in cloud-based repository integration tests (#46483)
This commit factors out some common code between the cloud-based
repository integration tests that were recently improved.

Relates #46376
2019-09-09 16:02:14 +02:00
David Turner cc092b1be1 Add support for OneZoneInfrequentAccess storage (#46436)
The `repository-s3` plugin has supported a storage class of `onezone_ia` since
the SDK upgrade in #30723, but we do not test or document this fact. This
commit adds this storage class to the docs and adds a test to ensure that the
documented storage classes are all accepted by S3 too.

Fixes #30474
2019-09-09 07:54:44 +01:00
Tanguy Leroux 2290865559 Fix usage of randomIntBetween() in testWriteBlobWithRetries (#46380)
This commit fixes the usage of randomIntBetween() in the test 
testWriteBlobWithRetries, when the test generates a random array  
of a single byte.
2019-09-06 09:10:38 +02:00
Tanguy Leroux bd7a04cd55 Disable request throttling in S3BlobStoreRepositoryTests (#46226)
When some high values are randomly picked up - for example the number
of indices to snapshot or the number of snapshots to create - the tests
in S3BlobStoreRepositoryTests can generate a high number of requests to
the internal S3 server.

In order to test the retry logic of the S3 client, the internal server is
designed to randomly generate random server errors. When many
 requests are made, it is possible that the S3 client reaches its maximum
number of successive retries capacity. Then the S3 client will stop
retrying requests until enough retry attempts succeed, but it means
that any request could fail before reaching the max retries count and
make the test fail too.

Closes #46217
Closes #46218
Closes #46219
2019-09-02 16:44:43 +02:00
Henning Andersen d68e05aade Mute 2 tests in S3BlobStoreRepositoryTests (#46221)
Muted testSnapshotAndRestore and testMultipleSnapshotAndRollback

Relates #46218 and #46219
2019-09-02 10:38:03 +02:00
Tanguy Leroux 0c1b263e8d Inject random errors in S3BlobStoreRepositoryTests (#46125)
This commit modifies the HTTP server used in S3BlobStoreRepositoryTests 
so that it randomly returns server errors for any type of request executed by
 the SDK client. It is now possible to verify that the repository tests are s
uccessfully completed even if one or more errors were returned by the S3 
service in response of a blob upload, a blob deletion or a object listing request 
etc.

Because injecting errors forces the SDK client to retry requests, the test limits
 the maximum errors to send in response for each request at 3 retries.
2019-08-30 11:58:09 +02:00
Tanguy Leroux b526309fbd Replace MockAmazonS3 usage in S3BlobStoreRepositoryTests by a HTTP server (#46081)
This commit removes the usage of MockAmazonS3 in S3BlobStoreRepositoryTests 
and replaces it by a HttpServer that emulates the S3 service. This allows the 
repository tests to use the real Amazon's S3 client under the hood in tests and will 
allow to test the behavior of the snapshot/restore feature for S3 repositories by 
simulating random server-side internal errors.

The HTTP server used to emulate the S3 service is intentionally simple and minimal 
to keep things understandable and maintainable. Testing full client options on the 
server side (like authentication, chunked encoding etc) remains the responsibility 
of the AmazonS3Fixture.
2019-08-29 13:16:59 +02:00
Tanguy Leroux 9e14ffa8be Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068) 2019-08-28 16:29:46 +02:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Tanguy Leroux a3d918bddb Refactor RepositoryCredentialsTests (#45919)
This commit refactors the S3 credentials tests in
RepositoryCredentialsTests so that it now uses a single
node (ESSingleNodeTestCase) to test how secure/insecure
credentials are overriding each other. Using a single node
makes it much easier to understand what each test is actually
testing and IMO better reflect how things are initialized.

It also allows to fold into this class the test
testInsecureRepositoryCredentials which was wrongly located
in S3BlobStoreRepositoryTests. By moving this test away, the
S3BlobStoreRepositoryTests class does not need the
allow_insecure_settings option anymore and thus can be
executed as part of the usual gradle test task.
2019-08-26 15:14:43 +02:00
Tanguy Leroux aee92d573c Allow partial request body reads in AWS S3 retries tests (#45847)
This commit changes the tests added in #45383 so that the fixture that 
emulates the S3 service now sometimes consumes all the request body 
before sending an error, sometimes consumes only a part of the request 
body and sometimes consumes nothing. The idea here is to beef up a bit 
the tests that writes blob because the client's retry logic relies on 
marking and resetting the blob's input stream.

This pull request also changes the testWriteBlobWithRetries() so that it 
(rarely) tests with a large blob (up to 1mb), which is more than the client's 
default read limit on input streams (131Kb).

Finally, it optimizes the ZeroInputStream so that it is a bit more effective 
(now works using an internal buffer and System.arraycopy() primitives).
2019-08-23 13:43:31 +02:00
Tanguy Leroux 57a36eb373 Add tests to check that requests are retried when writing/reading blobs on S3 (#45383)
This commit adds tests to verify the behavior of the S3BlobContainer and 
its underlying AWS SDK client when the remote S3 service is responding 
errors or not responding at all. The expected behavior is that requests are 
retried multiple times before the client gives up and the S3BlobContainer 
bubbles up an exception.

The test verifies the behavior of BlobContainer.writeBlob() and 
BlobContainer.readBlob(). In the case of S3 writing a blob can be executed 
as a single upload or using multipart requests; the test checks both scenario 
by writing a small then a large blob.
2019-08-22 11:41:40 +02:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Armin Braun 5d7fafec14
Add Assertion to Ensure Retries in S3BlobContainer (#45224) (#45230)
* We need a `markSupported` input stream to retry uploads
* Relates #45153
2019-08-06 16:11:19 +02:00
Armin Braun 07cf2cb665
Add disable_chunked_encoding Setting to S3 Repo (#44052) (#44562)
* Add disable_chunked_encoding setting to S3 repo plugin to support S3 implementations that don't support chunked encoding
2019-07-18 16:57:56 +02:00
Armin Braun 65fcaecce1
Remove Minio Host Hack in S3 Repository Build (#44491) (#44497)
* Resolving the todo to clean this hackyness up
2019-07-17 19:59:00 +02:00
Armin Braun c8db0e9b7e
Remove blobExists Method from BlobContainer (#44472) (#44475)
* We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface
   * Keep it as an implementation detail in the Azure repository
2019-07-17 11:56:02 +02:00
Armin Braun 940aa71930
Cleanup S3 BlobContainer Listing Logic (#43088) (#44406)
* Cleanup duplication in creating and looping over IO Requests
2019-07-16 12:19:20 +02:00
Armin Braun af9b98e81c
Recursively Delete Unreferenced Index Directories (#42189) (#44051)
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after ever delete operation
* Relates  #13159 (fixing most of this issues caused by unreferenced indices, leaving some meta files to be cleaned up only)
2019-07-08 10:55:39 +02:00
Armin Braun 2176d09c37
Provide an Option to Use Path-Style-Access with S3 Repo (#41966) (#44046)
* Provide an Option to Use Path-Style-Access with S3 Repo

* As discussed, added the option to use path style access back again and
deprecated it.
* Defaulted to `false`
* Added warning to docs

* Closes #41816
2019-07-08 08:10:01 +02:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Yannick Welsch e5a4a2272b Wipe repositories more often (#42511)
Fixes an issue where repositories are unintentionally shared among tests (given that the repo contents is captured in a static variable on the test class, to allow "sharing" among nodes) and two tests randomly chose the same snapshot name, leading to a conflict.

Closes #42519
2019-06-12 11:58:38 +02:00
Armin Braun 116b050cc6
Cleanup Bulk Delete Exception Logging (#41693) (#42606)
* Cleanup Bulk Delete Exception Logging

* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
2019-05-28 11:00:28 +02:00
Armin Braun 44bf784fe1
Add Infrastructure to Run 3rd Party Repository Tests (#42586) (#42604)
* Add Infrastructure to Run 3rd Party Repository Tests

* Add infrastructure to run third party repository tests using our standard JUnit infrastructure
* This is a prerequisite of #42189
2019-05-28 10:46:22 +02:00
Armin Braun c4f44024af
Remove Delete Method from BlobStore (#41619) (#42574)
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
  * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
2019-05-27 12:24:20 +02:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Armin Braun 23b3741618
Remove Exists Check from S3 Repository Deletes (#40931) (#41534)
* The check doesn't add much if anything practically, since the S3 repository is eventually consistent and we only log the non-existence of a blob anyway
  * We don't do the check on writes for this very reason and documented it as such
  * Removing the check saves one API call per single delete speeding up the deletion process and lowering costs
2019-04-25 18:25:03 +02:00
Armin Braun c4e84e2b34
Add Bulk Delete Api to BlobStore (#40322) (#41253)
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes #40250
2019-04-16 17:19:05 +02:00
Armin Braun 65732d707f
Add Support for S3 Intelligent Tiering (#39376) (#39620)
* Add support for S3 intelligent tiering
* Closes #38836
2019-03-04 10:32:37 +01:00
Henning Andersen 00a26b9dd2 Blob store compression fix (#39073)
Blob store compression was not enabled for some of the files in
snapshots due to constructor accessing sub-class fields. Fixed to
instead accept compress field as constructor param. Also fixed chunk
size validation to work.

Deprecated repositories.fs.compress setting as well to be able to unify
in a future commit.
2019-02-20 09:24:41 +01:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Armin Braun 57823c484f
Streamline S3 Repository- and Client-Settings (#37393)
* Make repository settings override static settings
* Cache clients according to settings
   * Introduce custom implementations for the AWS credentials here to be able to use them as part of a hash key
2019-01-30 06:22:31 +01:00
Armin Braun 617e294133
SNAPSHOT: Make Atomic Blob Writes Mandatory (#37168)
* With #37066 introducing atomic writes to HDFS repository we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are ok because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content here either
* Relates #37011
2019-01-07 12:11:19 +01:00
Tanguy Leroux 6186ccf83e
[Tests] Fix third party tests with Gradle 5.0 (#36302)
* [Tests] Fix third party tests with Gradle 5.0

* apply feedback
2018-12-06 16:05:05 +01:00
Yannick Welsch 2970abfce9
Add read-only repository verification (#35731)
Adds a verification mode for read-only repositories. It also makes the extra bucket check on
repository creation obsolete, which fixes #35703.
2018-11-23 14:45:05 +01:00
Jernej Klancic baf33b3162 Removes AbstractComponent from several classes (#35566)
Removes inhertiting from AbstractComponent for some classes (mostly
in the plugins module).

Relates to #34488
2018-11-16 20:50:18 +01:00
Armin Braun 02b4e28534
#31608 Add S3 Setting to Force Path Type Access (#34721)
* SNAPSHOTS: Use Path Style Access in S3

* Use path style access pattern to fix #31608
* closes #31608
2018-11-09 05:07:26 +01:00
Andy Bristol eec357ebde [test] quote base_path in repository tests 2018-11-01 13:01:53 -07:00
Nik Everett e28509fbfe
Core: Less settings to AbstractComponent (#35140)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.

I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler

These files don't *need* `settings` either, but this change is large
enough as is.

Relates to #34488
2018-10-31 21:23:20 -04:00
Nik Everett 086ada4c08
Core: Drop settings member from AbstractComponent (#35083)
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.

This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
2018-10-30 16:10:38 -04:00
Nik Everett 9f87fdc7ab
Drop deprecationLogger from AbstractComponent (#34859)
Drops the `deprecationLogger` from `AbstractComponent`, moving it to
places where we need it. This saves us from building a bunch of
`DeprecationLogger`s that we don't need.

Relates to #34488
2018-10-26 15:40:16 -04:00
Alpar Torok 59536966c2
Add a new "contains" feature (#34738)
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
2018-10-25 08:50:50 +03:00
Nik Everett 6c07d105f3
Amazon: Wrap at 140 columns (#34495)
Applies our standard column wrapping to the `discovery-ec2` and
`repository-s3` plugins.
2018-10-18 09:09:09 -04:00
Jason Tedor 99681f91f8
Use more precise does S3 bucket exist method (#34123)
We are using a deprecated method for checking if an S3 bucket
exists. This deprecated method has a limitation that it can not
distinguish between invalid credentials and a lack of permissions. This
commit switches to using a method that correctly surfaces if invalid
credentials are supplied when checking for the existence of a bucket.
2018-09-28 10:05:04 -04:00
Jason Tedor 839a677557
Do not override named S3 client credentials (#33793)
In cases when mixed secure S3 client credentials and insecure S3 client
credentials were used (that is, those defined on the repository), we
were overriding the credentials from the repository using insecure
settings to all the repositories. This commit fixes this by not mixing
up repositories that use insecure settings with those that use secure
settings.
2018-09-19 16:18:54 -04:00