Commit Graph

92 Commits

Author SHA1 Message Date
Himanshu Setia 6f893ed1cd
Merging javadoc feature branch changes to main (#715)
* Adds a gradle plugin to validate missing javadocs

Use `./gradlew missingJavadoc` to validate missing javadocs.
Currently this task fails because several modules are missing
appropriate javadocs. Once added, this should pass.
Also, precommit PomValidation check currently fails with missing Javadoc
plugin, that needs to be fixed -
https://github.com/opensearch-project/OpenSearch/issues/449
Thus keeping this in a separate feature branch.

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Fix Javadoc errors in module `client/rest` (#685)

* Fix Javadoc errors in client/rest module

Signed-off-by: Gregor Zurowski <gregor@zurowski.org>

* Add package info file in client/rest module

Signed-off-by: Gregor Zurowski <gregor@zurowski.org>

* Fix typos

Signed-off-by: Gregor Zurowski <gregor@zurowski.org>

* Add exception documentation to Javadoc

Signed-off-by: Gregor Zurowski <gregor@zurowski.org>

* Fixes precommit task configuration failures due to newly added missin… (#707)

* Fixes precommit task configuration failures due to newly added missingJavadoc task

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Fixes javadoc task errors due to PR#685

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Updated CONTRIBUTING.md for info on javadocs

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Correcting licenses and naming

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Correcting version info

Signed-off-by: Himanshu Setia <setiah@amazon.com>

Co-authored-by: Gregor Zurowski <gregor@zurowski.org>
2021-05-18 13:21:41 -07:00
Rabi Panda 943c778a7f
[CVE-2018-11765] Upgrade hadoop dependencies for hdfs plugin (#654)
Hadoop 2.8.5 has been reported to have CVEs (https://bugzilla.redhat.com/show_bug.cgi?id=1883549). We need to upgrade this to 2.10.1. This also updates the hadoop-minicluster version to 2.10.1 as well. This upgrade also brings in two additional dependencies, woodstox-core and stax2-api that are added along with the sha1s, licenses and notices.

Also upgrade guava to the latest as per the CVE https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-05-13 14:56:47 -07:00
Rabi Panda 67460c7804
Update hadoop-minicluster version for test fixture. (#645)
The `hadoop-minicluster:2.8.5` which is used for integration tests, has dependencies which have been flagged for having security vulnerabilities. Update this to the latest version `3.3.0`

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-05-04 10:48:43 -07:00
Nick Knize 4dde0f2a3b
[DOCKER] add apt update to test fixture krb5kdc (#565)
* [DOCKER] add apt update to test fixture krb5kdc

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-04-15 16:06:21 -05:00
Rabi Panda 2a3ce0bb75
Fix rename issues and failing repository-hdfs tests. (#518)
This commit fixes some partial rename issues and as a result fixes the failing secure repository-hdfs tests.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-04-09 17:51:27 -07:00
Nick Knize 9168f1fb43
[License] Add SPDX and OpenSearch Modification license header (#509)
This commit adds the SPDX Apache-2.0 license header along with an additional
copyright header for all modifications.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-04-09 14:28:18 -05:00
Rabi Panda 2dca3462f2
Fix stragglers from renaming to OpenSearch work. (#483)
This commit fixes more instances where we missed renaming to OpenSearch.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-04-05 11:51:20 -07:00
Himanshu Setia 28bbd94d6f
[Rename] Refactoring Elastic references in docker and kerberos builds (#428) (#438)
Signed-off-by: Himanshu Setia <setiah@amazon.com>
2021-03-22 09:20:14 -05:00
Himanshu Setia 9a51fb2f3b Reverts the rename for test/fixtures/old-elasticsearch done in #370 (#402)
Signed-off-by: Himanshu Setia <setiah@amazon.com>
2021-03-21 20:56:34 -05:00
Himanshu Setia b5e00e1e58 Refactors test/fixtures as per OpenSearch naming (#370)
Signed-off-by: Himanshu Setia <setiah@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda 972d8ea920 [Rename] refactor the libs/core module. (#350)
Refactor the code in the `libs/core` module and any references to those in the entire code base. The refactoring is done as part of the renaming to OpenSearch work.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize 946c7bb2dc [Rename] o.e.common subpackages round 1 (#332)
* [Rename] o.e.common subpackages round 1

This commit refactors the following subpackages of o.e.common:

* o.e.common.joda
* o.e.common.lease
* o.e.common.metrics
* o.e.common.network
* o.e.common.path
* o.e.common.recycling
* o.e.common.regex
* o.e.common.rounding
* o.e.common.text
* o.e.common.time
* o.e.common.transport

to the o.opensearch namespace. All references throughout the codebase have been
refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>

* fix imports 1

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize c4565adc9d [Rename] o.e.common.geo, hash, io (#317)
This commit refactors the following packages:

* o.e.common.geo
* o.e.common.hash
* o.e.common.io

into the o.opensearch.common namespace. All references throughout the codebase
have been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize 0f74cbed1c [Rename] o.e.common.blobstore,breaker,bytes (#307)
This commit refactors the following packages:

* o.e.common.blobstore
* o.e.common.breaker
* o.e.common.bytes

to the o.opensearch.common namespace. All references throughout the codebase
have been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize dafc0510ea [Rename] o.e.common classes (#305)
This commit refactors classes under o.e.common to o.opensearch.common. All
references throughout the codebase have also been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda 3eee5183d1 [Rename] server/rest (#229)
This commit refactors the `server/rest` package as part of the Elasticsearch to OpenSearch renaming.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
Ryan Ernst 6d3b691048
Add snapshot only test modules (#61954)
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.

Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
2020-09-04 16:35:18 -07:00
Jay Modi f0128ae074
Canonicalize client name in krb5kdc-fixture (#61119)
This commit changes the value for client name canonicalization to true
in the krb5.conf template file. This is done as a means to workaround
JDK-8246193 which has made it into some builds of JDK8.

Closes #61050
2020-08-13 14:58:08 -06:00
Armin Braun 3e2dfc6eac
Remove GCS Bucket Exists Check (#60899) (#60914)
Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS.
We don't need to do the bucket exists check before using the repo, that just needlessly
increases the necessary permissions for using the GCS repository.
2020-08-11 09:54:27 +02:00
Mark Vieira dc7d4c615c
Ensure fixture runtime dependencies are built before starting containers (#59474) 2020-07-13 15:58:01 -07:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Armin Braun be6fa72432
Fix GCS Mock Behavior for Missing Bucket (#57283) (#57310)
* Fix GCS Mock Behavior for Missing Bucket

We were throwing a 500 instead of a 404 for a missing bucket.
This would make yaml tests needlessly wait for multiple seconds, retrying
the 500 response with backoff, in the test checking behavior for missing buckets.
2020-05-29 10:01:20 +02:00
Armin Braun a4eb3edf46
Fix GCS Repository YAML Test Build (#57073) (#57101)
A few relatively obvious issues here:

* We cannot run the different IT runs (large blob setting one and normal integ run) concurrently
* We need to set the dependency tasks up correctly for the large blob run so that it works in isolation
* We can't use the `localAddress` for the location header of the resumable upload
(this breaks in YAML tests because GCS is using a loopback port forward for the initial request and the
local address will be chosen as the actual Docker container host)

Closes #57026
2020-05-25 11:10:39 +02:00
Armin Braun 0a879b95d1
Save Bounds Checks in BytesReference (#56577) (#56621)
Two spots that allow for some optimization:

* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
   * Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
2020-05-12 20:33:45 +02:00
Tanguy Leroux 35622747fd
Add Minio tests for searchable snapshots (#56112) (#56179)
This commit adds QA tests for searchable snapshot on MinIO,
similarly to what already exist for S3, GCS and Azure.
2020-05-05 11:40:06 +02:00
Yannick Welsch ba39c261e8 Use streaming reads for GCS (#55506)
To read from GCS repositories we're currently using Google SDK's official BlobReadChannel,
which issues a new request every 2MB (default chunk size for BlobReadChannel) using range
requests, and fully downloads the chunk before exposing it to the returned InputStream. This
means that the SDK issues an awfully high number of requests to download large blobs.
Increasing the chunk size is not an option, as that will mean that an awfully high amount of
heap memory will be consumed by the download process.

The Google SDK does not provide the right abstractions for a streaming download. This PR
uses the lower-level primitives of the SDK to implement a streaming download, similar to what
S3's SDK does.

Also closes #55505
2020-04-21 13:22:26 +02:00
Yannick Welsch b9da307cd1 Add GCS support for searchable snapshots (#55403)
Adds ranged read support for GCS repositories in order to enable searchable snapshot support
for GCS.

As part of this PR, I've extracted some of the test infrastructure to make sure that
GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering
similar test (as I saw those diverging in what they cover)
2020-04-20 13:02:59 +02:00
Rory Hunter a5b545b2a0
Use LTS version of Ubuntu in Dockerfiles (#55370)
We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS
version and has now appears to have been retired from the Ubuntu repositories.
Switch to 18.04, which is the current long-term support version. This
also requires a switch from OpenJDK 12 to 11.

Also change a usage of 16.04 to 18.04, for consistency.
2020-04-17 16:14:14 -04:00
Rory Hunter 49f8f66a41 Revert "Use LTS version of Ubuntu in Dockerfiles (#55327)"
This reverts commit dd76fbac60.
2020-04-16 20:05:22 +01:00
Rory Hunter dd76fbac60 Use LTS version of Ubuntu in Dockerfiles (#55327)
We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS
version and has now appears to have been retired from the Ubuntu repositories.
Switch to 18.04, which is the current long-term support version. Also change a
usage of 16.04 to 18.04, for consistency.
2020-04-16 19:47:18 +01:00
Ryan Ernst 29b70733ae
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:27:53 -07:00
Tanguy Leroux 4d36917e52
Merge feature/searchable-snapshots branch into 7.x (#54803) (#54825)
This is a backport of #54803 for 7.x.

This pull request cherry picks the squashed commit from #54803 with the additional commits:

    6f50c92 which adjusts master code to 7.x
    a114549 to mute a failing ILM test (#54818)
    48cbca1 and 50186b2 that cleans up and fixes the previous test
    aae12bb that adds a missing feature flag (#54861)
    6f330e3 that adds missing serialization bits (#54864)
    bf72c02 that adjust the version in YAML tests
    a51955f that adds some plumbing for the transport client used in integration tests

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-07 13:28:53 +02:00
James Baiera b84c74cf70
Update the HDFS version used by HDFS Repo (#53693) (#54125) 2020-03-25 14:01:29 -04:00
Armin Braun 4271963462
Revert "Use Azure Bulk Deletes in Azure Repository (#53919)" (#54089) (#54111)
This reverts commit 23cccf088810b8416ed278571352393cc2de9523.
Unfortunately SAS token auth still doesn't work with bulk deletes so we can't use them yet.

Closes #54080
2020-03-25 12:13:25 +01:00
Armin Braun b51ea25a00
Use Azure Bulk Deletes in Azure Repository (#53919) (#53967)
Now that we upgraded the Azure SDK to 8.6.2 in #53865 we can make use of
bulk deletes.
2020-03-23 13:35:05 +01:00
Armin Braun 204c366a4e
Upgrade GCS SDK to 1.104.0 (#52839) (#53152)
Upgrading the GCS SDK to the most recent version.
Adjusting (i.e. improving) the REST mock accordingly.
This should significantly boost performance by pulling in
https://github.com/googleapis/java-core/issues/86 in some cases.
2020-03-05 11:18:18 +01:00
Armin Braun f7d71b5930
Fix GCS Mock Range Downloads (#52804) (#52830)
We were not correctly respecting the download range which lead
to the GCS SDK client closing the connection at times.
Also, fixes another instance of failing to drain the request fully before sending the response headers.

Closes #51446
2020-02-26 18:38:02 +01:00
Armin Braun 3be70f64d8
Fix GCS Mock Http Handler JDK Bug (#51933) (#51941)
There is an open JDK bug that is causing an assertion in the JDK's
http server to trip if we don't drain the request body before sending response headers.
See https://bugs.openjdk.java.net/browse/JDK-8180754
Working around this issue here by always draining the request at the beginning of the handler.

Fixes #51446
2020-02-05 15:37:06 +01:00
Armin Braun 7914c1a734
Optimize GCS Mock (#51593) (#51594)
This test was still very GC heavy in Java 8 runs in particular
which seems to slow down request processing to the point of timeouts
in some runs.
This PR completely removes the large number of O(MB) `byte[]` allocations
that were happening in the mock http handler which cuts the allocation rate
by about a factor of 5 in my local testing for the GC heavy `testSnapshotWithLargeSegmentFiles`
run.

Closes #51446
Closes #50754
2020-01-29 11:06:05 +01:00
Armin Braun a725896c92
Fix and Reenable SnapshotTool Minio Tests (#50736) (#50745)
This solves half of the problem in #46813 by moving the S3
tests to using the shared minio fixture so we at least have
some non-3rd-party, constantly running coverage on these tests.
2020-01-08 16:33:36 +01:00
Armin Braun d0d48311f4
Faster and Simpler GCS REST Mock (#50706) (#50707)
* Faster and Simpler GCS REST Mock

I reworked the GCS mock a little to use less copying+allocation,
log the full request body on failure to read a multi-part request
and generally be a little simpler and easy to follow to track down
the remaining issues that are causing almost daily failures from this
class's multi-part request parsing that can't be reproduced locally.
2020-01-07 20:17:46 +01:00
Armin Braun 72a405fafb
Fix GCS Mock Broken Handling of some Blobs (#50666) (#50671)
* Fix GCS Mock Broken Handling of some Blobs

We were incorrectly handling blobs starting in `\r\n` which broke
tests randomly when blobs started on these.

Relates #49429
2020-01-06 19:27:57 +01:00
Armin Braun c73930988b
Remove Unused Delete Endpoint from GCS Mock (#50128) (#50134)
Follow up to #50024: we're not using the single-delete any more so no need to have a mock endpoint for it
2019-12-12 14:18:06 +01:00
Armin Braun 0fae4065ef
Better Logging GCS Blobstore Mock (#50102) (#50124)
* Better Logging GCS Blobstore Mock

Two things:
1. We should just throw a descriptive assertion error and figure out why we're not reading a multi-part instead of
returning a `400` and failing the tests that way here since we can't reproduce these 400s locally.
2. We were missing logging the exception on a cleanup delete failure that coincides with the `400` issue in tests.

Relates #49429
2019-12-12 11:17:22 +01:00
Armin Braun d19c8db4e4
Fix GCS Mock Batch Delete Behavior (#50034) (#50084)
Batch deletes get a response for every delete request, not just those that actually hit an existing blob.
The fact that we only responded for existing blobs leads to a degenerate response that throws a parse exception if a batch delete only contains non-existant blobs.
2019-12-11 17:40:25 +01:00
Armin Braun 90e9d61f2b
Optimize GoogleCloudStorageHttpHandler (#49677) (#49707)
Removing a lot of needless buffering and array creation
to reduce the significant memory usage of tests using this.
The incoming stream from the `exchange` is already buffered
so there is no point in adding a ton of additional buffers
everywhere.
2019-11-29 11:17:47 +01:00
Armin Braun 495b543e63
Improve Stability of GCS Mock API (#49592) (#49597)
Same as #49518 pretty much but for GCS.
Fixing a few more spots where input stream can get closed
without being fully drained and adding assertions to make sure
it's always drained.
Moved the no-close stream wrapper to production code utilities since
there's a number of spots in production code where it's also useful
(will reuse it there in a follow-up).
2019-11-26 16:53:51 +01:00
Armin Braun a5fa86ed97
Improve Stability of Mock APIs (#49518) (#49524)
This commit ensures that even for requests that are known to be empty body
we at least attempt to read one bytes from the request body input stream.
This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent`
that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`)
set on its input stream. As far as I can tell the only way to set this flag is to do a read when there's no more bytes buffered.
This fixes the numerous connection closing issues because the `ServerImpl` stops closing connections that it thinks
weren't fully drained.

Also, I removed a now redundant drain loop in the Azure handler as well as removed the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single digit values. This makes the retries happen quickly enough that they run into the async connecting closing
of allegedly non-eof connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently.

Relates #49401, #49429
2019-11-25 10:28:55 +01:00