Upgrading the GCS SDK to the most recent version.
Adjusting (i.e. improving) the REST mock accordingly.
This should significantly boost performance by pulling in
https://github.com/googleapis/java-core/issues/86 in some cases.
We were not correctly respecting the download range which lead
to the GCS SDK client closing the connection at times.
Also, fixes another instance of failing to drain the request fully before sending the response headers.
Closes#51446
There is an open JDK bug that is causing an assertion in the JDK's
http server to trip if we don't drain the request body before sending response headers.
See https://bugs.openjdk.java.net/browse/JDK-8180754
Working around this issue here by always draining the request at the beginning of the handler.
Fixes#51446
This test was still very GC heavy in Java 8 runs in particular
which seems to slow down request processing to the point of timeouts
in some runs.
This PR completely removes the large number of O(MB) `byte[]` allocations
that were happening in the mock http handler which cuts the allocation rate
by about a factor of 5 in my local testing for the GC heavy `testSnapshotWithLargeSegmentFiles`
run.
Closes#51446Closes#50754
* Faster and Simpler GCS REST Mock
I reworked the GCS mock a little to use less copying+allocation,
log the full request body on failure to read a multi-part request
and generally be a little simpler and easy to follow to track down
the remaining issues that are causing almost daily failures from this
class's multi-part request parsing that can't be reproduced locally.
* Fix GCS Mock Broken Handling of some Blobs
We were incorrectly handling blobs starting in `\r\n` which broke
tests randomly when blobs started on these.
Relates #49429
* Better Logging GCS Blobstore Mock
Two things:
1. We should just throw a descriptive assertion error and figure out why we're not reading a multi-part instead of
returning a `400` and failing the tests that way here since we can't reproduce these 400s locally.
2. We were missing logging the exception on a cleanup delete failure that coincides with the `400` issue in tests.
Relates #49429
Batch deletes get a response for every delete request, not just those that actually hit an existing blob.
The fact that we only responded for existing blobs leads to a degenerate response that throws a parse exception if a batch delete only contains non-existant blobs.
Removing a lot of needless buffering and array creation
to reduce the significant memory usage of tests using this.
The incoming stream from the `exchange` is already buffered
so there is no point in adding a ton of additional buffers
everywhere.
Same as #49518 pretty much but for GCS.
Fixing a few more spots where input stream can get closed
without being fully drained and adding assertions to make sure
it's always drained.
Moved the no-close stream wrapper to production code utilities since
there's a number of spots in production code where it's also useful
(will reuse it there in a follow-up).
This commit ensures that even for requests that are known to be empty body
we at least attempt to read one bytes from the request body input stream.
This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent`
that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered.
This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks
weren't fully drained.
Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).
I would suggest merging this and closing related issue after verifying that this fixes things on CI.
The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing
of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently.
Relates #49401, #49429
This commit fixes the server side logic of "List Objects" operations
of Azure and S3 fixtures. Until today, the fixtures were returning a "
flat" view of stored objects and were not correctly handling the
delimiter parameter. This causes some objects listing to be wrongly
interpreted by the snapshot deletion logic in Elasticsearch which
relies on the ability to list child containers of BlobContainer (#42653)
to correctly delete stale indices.
As a consequence, the blobs were not correctly deleted from the
emulated storage service and stayed in heap until they got garbage
collected, causing CI failures like #48978.
This commit fixes the server side logic of Azure and S3 fixture when
listing objects so that it now return correct common blob prefixes as
expected by the snapshot deletion process. It also adds an after-test
check to ensure that tests leave the repository empty (besides the
root index files).
Closes#48978
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
Similarly to what has be done for Azure in #48636, this commit
adds a new :test:fixtures:gcs-fixture project which provides two
docker-compose based fixtures that emulate a Google Cloud
Storage service.
Some code has been extracted from existing tests and placed
into this new project so that it can be easily reused in other
projects.