OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	74e3694234	Optimize GCS Repo Uploads (#51596) (#51618) For small uploads (which can still be up to 5MB!) we were needlessly reading the `InputStream` into a BAOS, which entailed allocating the `byte[]` for the stream contents twice (because `toByteArray` on the BAOS copies). Also, for resumable uploads we were needlessly wrapping the output channel and running each individual write in its own privileged context when we could just wrap the whole upload in a single privileged context. Relates #51593	2020-01-29 16:07:30 +01:00
Armin Braun	7914c1a734	Optimize GCS Mock (#51593) (#51594) This test was still very GC-heavy, in Java 8 runs in particular, which seems to slow down request processing to the point of timeouts in some runs. This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock HTTP handler, which cuts the allocation rate by about a factor of 5 in my local testing for the GC-heavy `testSnapshotWithLargeSegmentFiles` run. Closes #51446 Closes #50754	2020-01-29 11:06:05 +01:00
Armin Braun	4a7e09f624	Enforce Logging of Errors in GCS Rest RetriesTests (#50761) (#50783) It's impossible to tell why #50754 fails without this change. We're failing to close the `exchange` somewhere, and there is no write timeout in the GCS SDK (something to look into separately), only a read timeout on the socket. So if we fail on an assertion without reading the full request body (at least into the read buffer), we lock up waiting forever on `write0`. This change ensures the `exchange` is closed in the tests where we could lock up on a write, and logs the failure so we can find out what broke #50754.	2020-01-09 10:46:07 +01:00
Armin Braun	761d6e8e4b	Remove BlobContainer Tests against Mocks (#50194) (#50220) Removing all these weird mocks as asked for by #30424. All these tests are now part of real repository ITs and otherwise left unchanged if they had independent tests that didn't call the `createBlobStore` method previously. The HDFS tests also get added coverage as a side effect because they did not have an implementation of the abstract repository ITs. Closes #30424	2019-12-16 11:37:09 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024) (#50123) There are no more production uses of the non-bulk delete or the delete that throws on missing blobs, so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests, it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Armin Braun	d19c8db4e4	Fix GCS Mock Batch Delete Behavior (#50034) (#50084) Batch deletes get a response for every delete request, not just those that actually hit an existing blob. The fact that we only responded for existing blobs leads to a degenerate response that throws a parse exception if a batch delete only contains non-existent blobs.	2019-12-11 17:40:25 +01:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639) (#49711) This is a preliminary step toward #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is add all the infrastructure changes around passing the cluster service to the blob store, the associated test changes, and a best-effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency with which non-master nodes (or the master directly after a failover) will be able to determine the latest repository generation. To keep this change simple, however, it does not do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy. This change does not in any way alter the behavior of the blob store repository other than adding a better "guess" for the value of the latest repo generation, and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Armin Braun	3862400270	Remove Redundant EsBlobStoreTestCase (#49603) (#49605) All the implementations of `EsBlobStoreTestCase` use the exact same bootstrap code that is also used by their implementation of `EsBlobStoreContainerTestCase`. This means all tests might as well live under `EsBlobStoreContainerTestCase`, saving a lot of code duplication. Also, there was no HDFS implementation for `EsBlobStoreTestCase`, which is now automatically resolved by moving the tests over, since there is an HDFS implementation for the container tests.	2019-11-26 20:57:19 +01:00
Armin Braun	495b543e63	Improve Stability of GCS Mock API (#49592) (#49597) Pretty much the same as #49518, but for GCS. Fixing a few more spots where the input stream can get closed without being fully drained, and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there are a number of spots in production code where it's also useful (will reuse it there in a follow-up).	2019-11-26 16:53:51 +01:00
Tanguy Leroux	f753fa2265	HttpHandlers should return correct list of objects (#49283) This commit fixes the server-side logic of "List Objects" operations of the Azure and S3 fixtures. Until now, the fixtures were returning a "flat" view of stored objects and were not correctly handling the delimiter parameter. This caused some object listings to be misinterpreted by the snapshot deletion logic in Elasticsearch, which relies on the ability to list child containers of BlobContainer (#42653) to correctly delete stale indices. As a consequence, the blobs were not correctly deleted from the emulated storage service and stayed in heap until they got garbage collected, causing CI failures like #48978. This commit fixes the server-side logic of the Azure and S3 fixtures when listing objects so that they now return correct common blob prefixes as expected by the snapshot deletion process. It also adds an after-test check to ensure that tests leave the repository empty (besides the root index files). Closes #48978	2019-11-20 09:26:42 +01:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Tanguy Leroux	8a14ea5567	Add docker-compose based test fixture for GCS (#48902) Similar to what has been done for Azure in #48636, this commit adds a new :test:fixtures:gcs-fixture project which provides two docker-compose based fixtures that emulate a Google Cloud Storage service. Some code has been extracted from existing tests and placed into this new project so that it can be easily reused in other projects.	2019-11-07 13:27:22 -05:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818) This commit introduces a consistent and type-safe manner for handling global build parameters throughout our build logic. Primarily, this replaces the existing usages of extra properties with static accessors. It also introduces an explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Andrey Ershov	088988bb37	GCS snapshot cleanup tool backport to 7.x (#48750) This is the backport of #45076 with dependent changes.	2019-10-31 18:21:36 +03:00
Tanguy Leroux	e1dd0e753d	Differentiate service account tokens in GCS tests (#48382) This commit changes the test so that each node uses a specific service account and private key. It also changes how unique request IDs are generated for refresh token requests, using the token itself, so that the error count will be specific per node (each node should execute a single refresh token request, as tokens are valid for 1 hour).	2019-10-23 16:57:35 +02:00
Tanguy Leroux	f5c5411fe8	Differentiate base paths in repository integration tests (#47284) (#47300) This commit changes the repository base paths used in Azure/S3/GCS integration tests so that they don't conflict with each other when tests run in parallel on real storage services. Closes #47202	2019-10-01 08:39:55 +02:00
Tanguy Leroux	6986d7f968	Add blob container retries tests for Google Cloud Storage (#46968) Similar to what has been done for S3 in #45383, this commit adds unit tests that verify the behavior of the SDK client and blob container implementation for Google Storage when the remote service returns errors. The main purpose was to add an extra test for the specific retry logic for 410 Gone errors added in #45963. Relates #45963	2019-09-24 08:58:24 +02:00
Tanguy Leroux	add7148f3b	GCS deleteBlobsIgnoringIfNotExists should catch StorageException (#46832) GoogleCloudStorageBlobStore.deleteBlobsIgnoringIfNotExists() does not correctly catch StorageException thrown by batch.submit(). When a snapshot is deleted through BlobStoreRepository.deleteSnapshot(), a storage exception is not caught (only IOExceptions are), so the deletion is interrupted and indices cannot be cleaned up. The storage exception bubbles up to SnapshotService.deleteSnapshotFromRepository(), but the listener that removes the deletion from the cluster state is not executed, leaving the deletion in the cluster state. This bug was reported in #46772, where batch.submit() threw an exception in the test testIndicesDeletedFromRepository and the following tests failed because a snapshot deletion was running. Relates #46772	2019-09-20 10:02:23 +02:00
Tanguy Leroux	3ae51f25dd	Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802) This commit moves the common test testSnapshotWithLargeSegmentFiles to the ESMockAPIBasedRepositoryIntegTestCase base class.	2019-09-18 15:41:30 +02:00
Tanguy Leroux	4db37801d0	Add resumable uploads support to GCS repository integration tests (#46562) This commit adds support for resumable uploads to the internal HTTP server used in GoogleCloudStorageBlobStoreRepositoryTests. This way we can also test the behavior of Google's client when the service returns server errors in response to resumable upload requests. The BlobStore implementation for GCS has the choice between two methods to upload a blob: resumable and multipart. In the current implementation, the client executes a resumable upload if the blob size is larger than LARGE_BLOB_THRESHOLD_BYTE_SIZE; otherwise, it executes a multipart upload. This commit makes this logic overridable in tests, allowing the choice between the two methods to be randomized. The commit adds support for single-request resumable uploads and chunked resumable uploads (the blob is uploaded in multiple 2MB chunks; each chunk being a resumable upload). For this last case, this PR also adds a test, testSnapshotWithLargeSegmentFiles, which makes it more probable that a chunked resumable upload is executed.	2019-09-18 09:33:05 +02:00
Armin Braun	371c355bca	Retry GCS Resumable Upload on Error 410 (#45963) (#46783) A resumable upload session can fail with a 410 error and should be retried in that case. I added retrying twice, using a reset of the given `InputStream` as the retry mechanism, since the same approach is already used by the AWS S3 SDK and relied upon by the S3 repository implementation. Related GCS documentation: https://cloud.google.com/storage/docs/json_api/v1/status-codes#410_Gone	2019-09-17 19:06:43 +02:00
Luca Cavanna	e57756492a	Update http-core and http-client dependencies (#46549) Relates to #45808 Closes #45577	2019-09-12 09:45:29 +02:00
Tanguy Leroux	88bed09119	Mutualize code in cloud-based repository integration tests (#46483) This commit factors out some common code between the cloud-based repository integration tests that were recently improved. Relates #46376	2019-09-09 16:02:14 +02:00
Tanguy Leroux	8e3dc68454	Inject random server errors in GoogleCloudStorageBlobStoreRepositoryTests (#46376) This commit modifies the HTTP server used in GoogleCloudStorageBlobStoreRepositoryTests so that it randomly returns server errors. The test does not inject server errors for the following types of request: batch request, resumable upload request.	2019-09-09 09:59:59 +02:00
Tanguy Leroux	28974b5723	Replace mocked client in GCSBlobStoreRepositoryTests by HTTP server (#46255) This commit removes the usage of MockGoogleCloudStoragePlugin in GoogleCloudStorageBlobStoreRepositoryTests and replaces it with an HttpServer that emulates the Storage service. This allows the repository tests to use the real Google client under the hood and will allow us to test the behavior of the snapshot/restore feature for GCS repositories by simulating random server-side internal errors. The HTTP server used to emulate the Storage service is intentionally simple and minimal to keep things understandable and maintainable. Testing full client options on the server side (like authentication, chunked encoding, etc.) remains the responsibility of the GoogleCloudStorageFixture.	2019-09-05 10:37:37 +02:00
Tanguy Leroux	9e14ffa8be	Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068)	2019-08-28 16:29:46 +02:00
Jason Tedor	3d64605075	Remove node settings from blob store repositories (#45991) This commit starts from the simple premise that the use of node settings in blob store repositories is a mistake. Here we see that the node settings are used to get default settings for store and restore throttle rates. Yet, since there are not any node settings registered to this effect, there can never be a default setting to fall back to there, and so we always end up falling back to the default rate. Since this was the only use of node settings in the blob store repository, we remove them. From this, several places fall out where we were chaining settings through only to get them to the blob store repository, so we clean these up as well. That leaves us with the changeset in this commit.	2019-08-26 16:26:13 -04:00
Armin Braun	6aaee8aa0a	Repository Cleanup Endpoint (#43900) (#45780) * Snapshot cleanup functionality via transport/REST endpoint. * Added all the infrastructure for this with the HLRC and node client * Made use of it in tests and resolved relevant TODO * Added new `Custom` CS element that tracks the cleanup logic. Kept it similar to the delete and in-progress classes and gave it some (for now) redundant way of handling multiple cleanups, but only allow one * Use the exact same mechanism used by deletes to have the combination of CS entry and increment in repository state ID provide some concurrency safety (the initial approach of just an entry in the CS was not enough; we must increment the repository state ID to be safe against concurrent modifications, otherwise we run the risk of "cleaning up" blobs that just got created without noticing) * Isolated the logic to the transport action class as much as I could. It's not ideal, but we don't need to keep any state and do the same for other repository operations (like getting the detailed snapshot shard status)	2019-08-21 17:59:49 +02:00
Armin Braun	c8db0e9b7e	Remove blobExists Method from BlobContainer (#44472) (#44475) * We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface * Keep it as an implementation detail in the Azure repository	2019-07-17 11:56:02 +02:00
Mark Vieira	7c2e4b2857	[Backport] Enable caching of rest tests which use integ-test distribution (#44181)	2019-07-10 15:42:28 -07:00
Armin Braun	be20fb80e4	Recursive Delete on BlobContainer (#43281) (#43920) This is a prerequisite of #42189: * Add directory delete method to blob container specific to each implementation: * Some notes on the implementations: * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered, which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing), and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is that even for very large numbers of blobs, the memory requirements are now capped nicely since we go page by page when deleting. * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block. * HDFS and FS are trivial since we have directory delete methods available for them * Enhances third-party tests to ensure the new functionality works (I manually ran them for all cloud providers)	2019-07-03 17:14:57 +02:00
Armin Braun	3317169c4f	Fix GCS Blob Repository 3rd Party Tests (#43030) (#43913) * We have to strip the trailing slash from child names here like we do for AWS * closes #43029	2019-07-03 15:09:28 +02:00
Armin Braun	455b12a4fb	Add Ability to List Child Containers to BlobContainer (#42653) (#43903) * This is a prerequisite of #42189	2019-07-03 11:30:49 +02:00
Armin Braun	21e74dd7d2	Upgrade GCS Repository Dependencies (#43142) (#43418) * Upgrade to latest GCS SDK and transitive dependencies (I chose the later version here on conflict) * Remove now-unnecessary hack for custom endpoints (the linked bugs were both resolved in the SDK)	2019-06-20 16:35:54 +02:00
Yannick Welsch	e5a4a2272b	Wipe repositories more often (#42511) Fixes an issue where repositories are unintentionally shared among tests (given that the repo contents are captured in a static variable on the test class, to allow "sharing" among nodes) and two tests randomly chose the same snapshot name, leading to a conflict. Closes #42519	2019-06-12 11:58:38 +02:00
Jason Tedor	371cb9a8ce	Remove Log4j 1.2 API as a dependency (#42702) We had this as a dependency for legacy dependencies that still needed the Log4j 1.2 API. This appears to no longer be necessary, so this commit removes this artifact as a dependency. To remove this dependency, we had to fix a few places where we were accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since both APIs were on the compile-time classpath). Finally, we can remove our custom Netty logger factory. This was needed when we were on Log4j 1.2 and handled logging in our own unique way. When we migrated to Log4j 2 we could have dropped this dependency. However, even then Netty would still pick up Log4j 1.2 since it was on the classpath, hence the advantage of removing this as a dependency now.	2019-05-30 16:08:07 -04:00
Armin Braun	116b050cc6	Cleanup Bulk Delete Exception Logging (#41693) (#42606) * Follow-up to #41368 * Collect all failed blob deletes and add them to the exception message * Remove logging of blob name list from caller exception logging	2019-05-28 11:00:28 +02:00
Armin Braun	44bf784fe1	Add Infrastructure to Run 3rd Party Repository Tests (#42586) (#42604) * Add infrastructure to run third-party repository tests using our standard JUnit infrastructure * This is a prerequisite of #42189	2019-05-28 10:46:22 +02:00
Armin Braun	c4f44024af	Remove Delete Method from BlobStore (#41619) (#42574) * The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers * The recursive delete logic it provided (which did not behave the same way on all implementations) was not used and not properly tested either	2019-05-27 12:24:20 +02:00
Armin Braun	7cc4b9a8b3	Implement Bulk Deletes for GCS Repository (#41368) (#41681) * Just like #40322 for AWS * We already had a bulk delete API but weren't using it from the blob container implementation; now we are using it * Made the bulk delete API also compliant with our interface, which only suppresses errors about non-existent blobs, by stating failed deletes (I didn't use any bulk stat action here since having to stat here should be the exception anyway and it would make error handling a lot more complex) * Fixed bulk delete API to limit its batch size to 100 in line with GCS recommendations. Backport of #41368	2019-04-30 17:03:57 +02:00
Alpar Torok	335f2bf102	Testclusters: convert plugins QA projects (#41496) Add testclusters support for files in the keystore and convert QA subprojects within plugins.	2019-04-26 08:57:52 -07:00
Armin Braun	aad33121d8	Async Snapshot Repository Deletes (#40144) (#41571) Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these are likely a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106 * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since a delete isn't a single task anymore, parallel access to a blob store repository during deletes is now possible). * By adding a `ThreadPool` reference to the repository, this also lays the groundwork for parallelizing shard snapshot uploads to improve the situation reported in #39657	2019-04-26 15:36:09 +02:00
Jay Modi	f34663282c	Update apache httpclient to version 4.5.8 (#40875) This change updates our version of httpclient to version 4.5.8, which contains the fix for HTTPCLIENT-1968, a bug where the client started rewriting paths that contained encoded reserved characters into their unreserved form.	2019-04-05 13:48:10 -06:00
David Emanuel Buchmann	b5ed039160	plugins/repository-gcs: Update google-cloud-storage/core to 1.59.0 (#39748) * plugins/repository-gcs: Update google-cloud-storage / google-cloud-core to 1.59.0 * plugins: Update sha1 for google-cloud-core & google-cloud-storage	2019-03-10 11:04:52 -04:00
Henning Andersen	00a26b9dd2	Blob store compression fix (#39073) Blob store compression was not enabled for some of the files in snapshots due to the constructor accessing subclass fields. Fixed to instead accept the compress field as a constructor parameter. Also fixed chunk size validation. Deprecated the repositories.fs.compress setting as well, so it can be unified in a future commit.	2019-02-20 09:24:41 +01:00
Jay Modi	54dbf9469c	Update httpclient for JDK 11 TLS engine (#37994) The Apache HTTP client implementations recently released versions that solve TLS compatibility issues with the new TLS engine that supports TLSv1.3 with JDK 11. This change updates our code to use these versions, since JDK 11 is a supported JDK and we should allow the use of TLSv1.3.	2019-01-30 14:24:29 -07:00
Alpar Torok	a7c3d5842a	Split third party audit exclusions by type (#36763)	2019-01-07 17:24:19 +02:00
Armin Braun	617e294133	SNAPSHOT: Make Atomic Blob Writes Mandatory (#37168) * With #37066 introducing atomic writes to the HDFS repository, we can enforce atomic write capabilities on this interface * The overrides on the other three cloud implementations are OK because: * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket." * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content here either * Relates #37011	2019-01-07 12:11:19 +01:00
Armin Braun	5df93218d5	SNAPSHOTS: Upgrade GCS Dependencies to 1.55.0 (#36634) * Closes #35459 * Closes #35229	2018-12-14 13:24:29 +01:00
Tanguy Leroux	6186ccf83e	[Tests] Fix third party tests with Gradle 5.0 (#36302) * apply feedback	2018-12-06 16:05:05 +01:00

124 Commits