Commit Graph

2390 Commits

Author SHA1 Message Date
David Turner 65dc888623 Resume partial download from S3 on connection drop (#46589)
Today if the connection to S3 times out or drops after starting to download an
object then the SDK does not attempt to recover or resume the download, causing
the restore of the whole shard to fail and retry. This commit allows
Elasticsearch to detect such a mid-stream failure and to resume the download
from where it failed.
2019-09-17 13:11:36 +01:00
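The resume behaviour described in the commit above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the plugin's actual code: `RangeOpener` and `ResumingInputStream` are hypothetical names, and `openFrom` stands in for whatever ranged request the S3 client would issue to continue from a byte offset.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical hook: opens the object's content starting at the given byte offset. */
interface RangeOpener {
    InputStream openFrom(long offset) throws IOException;
}

/** Sketch of a stream that reopens from the last good position after a mid-stream failure. */
final class ResumingInputStream extends InputStream {
    private final RangeOpener opener;
    private final int maxResumes;
    private InputStream current;
    private long position = 0; // number of bytes successfully handed to the caller
    private int resumes = 0;

    ResumingInputStream(RangeOpener opener, int maxResumes) throws IOException {
        this.opener = opener;
        this.maxResumes = maxResumes;
        this.current = opener.openFrom(0);
    }

    @Override
    public int read() throws IOException {
        while (true) {
            try {
                int b = current.read();
                if (b >= 0) {
                    position++;
                }
                return b;
            } catch (IOException e) {
                if (resumes >= maxResumes) {
                    throw e; // give up and let the caller fail the restore
                }
                resumes++;
                try {
                    current.close();
                } catch (IOException ignored) {
                    // the connection is already broken; nothing more to do
                }
                // Resume from where the failure happened instead of from byte 0.
                current = opener.openFrom(position);
            }
        }
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}
```

The key design point is that the stream tracks how many bytes it has already delivered, so a retry only needs to fetch the remainder rather than restart the whole shard restore.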
Luca Cavanna e57756492a Update http-core and http-client dependencies (#46549)
Relates to #45808
Closes #45577
2019-09-12 09:45:29 +02:00
Mark Vieira ccf656a9d0
Repository plugin test cacheability fixes (#46572)
2019-09-11 08:24:55 -07:00
Tanguy Leroux 88bed09119 Mutualize code in cloud-based repository integration tests (#46483)
This commit factors out some common code between the cloud-based
repository integration tests that were recently improved.

Relates #46376
2019-09-09 16:02:14 +02:00
Tanguy Leroux 023cf44025 Inject random server errors in AzureBlobStoreRepositoryTests (#46371)
This commit modifies the HTTP server used in 
AzureBlobStoreRepositoryTests so that it randomly returns 
server errors for any type of request executed by the Azure client.
2019-09-09 10:00:09 +02:00
Tanguy Leroux 8e3dc68454 Inject random server errors in GoogleCloudStorageBlobStoreRepositoryTests (#46376)
This commit modifies the HTTP server used in 
GoogleCloudStorageBlobStoreRepositoryTests so that it randomly 
returns server errors. The test does not inject server errors for the 
following types of request: batch request, resumable upload request.
2019-09-09 09:59:59 +02:00
David Turner cc092b1be1 Add support for OneZoneInfrequentAccess storage (#46436)
The `repository-s3` plugin has supported a storage class of `onezone_ia` since
the SDK upgrade in #30723, but we do not test or document this fact. This
commit adds this storage class to the docs and adds a test to ensure that the
documented storage classes are all accepted by S3 too.

Fixes #30474
2019-09-09 07:54:44 +01:00
Tanguy Leroux 2290865559 Fix usage of randomIntBetween() in testWriteBlobWithRetries (#46380)
This commit fixes the usage of randomIntBetween() in the test 
testWriteBlobWithRetries, when the test generates a random array  
of a single byte.
2019-09-06 09:10:38 +02:00
Tanguy Leroux 28974b5723 Replace mocked client in GCSBlobStoreRepositoryTests by HTTP server (#46255)
This commit removes the usage of MockGoogleCloudStoragePlugin in
GoogleCloudStorageBlobStoreRepositoryTests and replaces it with an
HttpServer that emulates the Storage service. This allows the repository
tests to use Google's real client under the hood and will allow
us to test the behavior of the snapshot/restore feature for GCS repositories
by simulating random server-side internal errors.

The HTTP server used to emulate the Storage service is intentionally simple
and minimal to keep things understandable and maintainable. Testing full
client options on the server side (like authentication, chunked encoding
etc) remains the responsibility of the GoogleCloudStorageFixture.
2019-09-05 10:37:37 +02:00
Tanguy Leroux 6d1a82134c Add repository integration tests for Azure (#46263)
Similarly to what had been done for S3 (#46081) and GCS (#46255) 
this commit adds repository integration tests for Azure, based on an 
internal HTTP server instead of mocks.
2019-09-05 09:26:42 +02:00
Tanguy Leroux bd7a04cd55 Disable request throttling in S3BlobStoreRepositoryTests (#46226)
When some high values are randomly picked - for example the number
of indices to snapshot or the number of snapshots to create - the tests
in S3BlobStoreRepositoryTests can generate a high number of requests to
the internal S3 server.

In order to test the retry logic of the S3 client, the internal server is
designed to randomly generate server errors. When many requests are
made, it is possible that the S3 client exhausts its capacity for
successive retries. The S3 client then stops retrying requests until
enough retry attempts succeed, which means that any request could fail
before reaching the max retries count and make the test fail too.

Closes #46217
Closes #46218
Closes #46219
2019-09-02 16:44:43 +02:00
Henning Andersen d68e05aade Mute 2 tests in S3BlobStoreRepositoryTests (#46221)
Muted testSnapshotAndRestore and testMultipleSnapshotAndRollback

Relates #46218 and #46219
2019-09-02 10:38:03 +02:00
Tanguy Leroux 0c1b263e8d Inject random errors in S3BlobStoreRepositoryTests (#46125)
This commit modifies the HTTP server used in S3BlobStoreRepositoryTests
so that it randomly returns server errors for any type of request executed by
the SDK client. It is now possible to verify that the repository tests
complete successfully even if one or more errors were returned by the S3
service in response to a blob upload, a blob deletion, an object listing request,
etc.

Because injecting errors forces the SDK client to retry requests, the test limits
the number of errors sent in response to each request to 3.
2019-08-30 11:58:09 +02:00
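The capped error injection described in the commit above could look something like the sketch below. It only models the decision logic, not an HTTP server, and it is not the test's actual code: `ErrorInjector` and the `requestId` key (assumed to stay the same across retries of one request) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Sketch: decides whether a fake server should answer a request attempt with an error. */
final class ErrorInjector {
    private final Map<String, Integer> errorsSent = new HashMap<>();
    private final Random random;
    private final int maxErrorsPerRequest;

    ErrorInjector(Random random, int maxErrorsPerRequest) {
        this.random = random;
        this.maxErrorsPerRequest = maxErrorsPerRequest;
    }

    /**
     * Returns true if this attempt should fail. Once a request has received
     * maxErrorsPerRequest errors, every later attempt is allowed to succeed,
     * so a client with enough retries always completes eventually.
     */
    synchronized boolean shouldFail(String requestId) {
        int sent = errorsSent.getOrDefault(requestId, 0);
        if (sent >= maxErrorsPerRequest || !random.nextBoolean()) {
            return false;
        }
        errorsSent.put(requestId, sent + 1);
        return true;
    }
}
```

Capping the errors per request keeps the test deterministic in outcome: the client is forced to exercise its retry path, but never runs out of retries.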
Tanguy Leroux b526309fbd Replace MockAmazonS3 usage in S3BlobStoreRepositoryTests by a HTTP server (#46081)
This commit removes the usage of MockAmazonS3 in S3BlobStoreRepositoryTests
and replaces it with an HttpServer that emulates the S3 service. This allows the
repository tests to use Amazon's real S3 client under the hood and will
allow us to test the behavior of the snapshot/restore feature for S3 repositories by
simulating random server-side internal errors.

The HTTP server used to emulate the S3 service is intentionally simple and minimal 
to keep things understandable and maintainable. Testing full client options on the 
server side (like authentication, chunked encoding etc) remains the responsibility 
of the AmazonS3Fixture.
2019-08-29 13:16:59 +02:00
Tanguy Leroux 9e14ffa8be Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068)
2019-08-28 16:29:46 +02:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Tanguy Leroux a3d918bddb Refactor RepositoryCredentialsTests (#45919)
This commit refactors the S3 credentials tests in
RepositoryCredentialsTests so that it now uses a single
node (ESSingleNodeTestCase) to test how secure/insecure
credentials are overriding each other. Using a single node
makes it much easier to understand what each test is actually
testing and IMO better reflects how things are initialized.

It also allows folding into this class the test
testInsecureRepositoryCredentials, which was wrongly located
in S3BlobStoreRepositoryTests. By moving this test away, the
S3BlobStoreRepositoryTests class does not need the
allow_insecure_settings option anymore and thus can be
executed as part of the usual gradle test task.
2019-08-26 15:14:43 +02:00
Tanguy Leroux aee92d573c Allow partial request body reads in AWS S3 retries tests (#45847)
This commit changes the tests added in #45383 so that the fixture that
emulates the S3 service now sometimes consumes the whole request body
before sending an error, sometimes consumes only part of the request
body and sometimes consumes nothing. The idea here is to strengthen
the tests that write blobs, because the client's retry logic relies on
marking and resetting the blob's input stream.

This pull request also changes testWriteBlobWithRetries() so that it
(rarely) tests with a large blob (up to 1mb), which is more than the client's
default read limit on input streams (131Kb).

Finally, it optimizes the ZeroInputStream so that it is a bit more efficient
(it now works using an internal buffer and System.arraycopy() primitives).
2019-08-23 13:43:31 +02:00
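To illustrate why retries depend on marking and resetting the input stream, here is a minimal sketch in plain Java. It is not the AWS SDK's retry code: `MarkResetRetry`, `Upload`, and `uploadWithRetries` are hypothetical names, and the read limit is passed in rather than taken from any client default.

```java
import java.io.IOException;
import java.io.InputStream;

final class MarkResetRetry {
    /** Hypothetical stand-in for one attempt at sending the request body. */
    interface Upload {
        void send(InputStream body) throws IOException;
    }

    /**
     * Sends the body, rewinding it with reset() before each retry.
     * Returns the number of attempts that were needed.
     */
    static int uploadWithRetries(InputStream body, int readLimit, int maxAttempts, Upload upload)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        if (!body.markSupported()) {
            throw new IllegalArgumentException("retries need a stream that supports mark/reset");
        }
        body.mark(readLimit); // remember the start of the blob
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                upload.send(body);
                return attempt;
            } catch (IOException e) {
                last = e;
                // The failed attempt may have consumed all, part, or none of the
                // body; reset() rewinds to the mark in every one of those cases.
                body.reset();
            }
        }
        throw last;
    }
}
```

A stream that does not support mark/reset cannot be rewound after a partially consumed attempt, which is why the sketch rejects it up front.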
Tanguy Leroux 57a36eb373 Add tests to check that requests are retried when writing/reading blobs on S3 (#45383)
This commit adds tests to verify the behavior of the S3BlobContainer and
its underlying AWS SDK client when the remote S3 service responds with
errors or does not respond at all. The expected behavior is that requests are
retried multiple times before the client gives up and the S3BlobContainer
bubbles up an exception.

The test verifies the behavior of BlobContainer.writeBlob() and
BlobContainer.readBlob(). In the case of S3, writing a blob can be executed
as a single upload or using multipart requests; the test checks both scenarios
by writing a small then a large blob.
2019-08-22 11:41:40 +02:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Jim Ferenczi fe2a7523ec Add support for inlined user dictionary in the Kuromoji plugin (#45489)
This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.

Closes #25343
2019-08-21 16:28:30 +02:00
Igor Motov 1818c5fa44 Ingest Attachment: Upgrade tika to v1.22 (#45575)
Upgrades:
Apache Tika: 1.19.1 -> 1.22.
pdfbox : 2.0.12 -> 2.0.16
poi : 4.0.0 -> 4.0.1
2019-08-19 18:17:16 -04:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API specification in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is, e.g., encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also, `documentation` becomes an object that supports a `url` and also a `description`, which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also, the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what is expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Yogesh Gaikwad 471d940c44
Refactor cluster privileges and cluster permission (#45265) (#45442)
The current implementations make it difficult to
add new privileges (example: a cluster privilege which is
more than cluster action-based and not exposed to the security
administrator). At a high level, we would like our cluster privilege
to be either:
- a named cluster privilege
  This corresponds to `cluster` field from the role descriptor
- or a configurable cluster privilege
  This corresponds to the `global` field from the role-descriptor and
allows a security administrator to configure them.

Some of the responsibilities, like the merging of action-based cluster privileges,
are now pushed to the cluster permission level. How to implement the predicate
(using Automaton) is now enforced by cluster permission.

`ClusterPermission` helps in enforcing cluster-level access by
performing checks against a cluster action and optionally against a request.
It is a collection of one or more permission checks where, if any of the checks
allows access, then the permission allows access to a cluster action.

Implementations of cluster privilege must be able to provide information
regarding the predicates to the cluster permission so that they can be enforced.
This is enforced by making implementations of cluster privilege aware of the
cluster permission builder and providing a way to specify how the permission is
to be built for a given privilege.

This commit renames `ConditionalClusterPrivilege` to `ConfigurableClusterPrivilege`.
`ConfigurableClusterPrivilege` is a renderable cluster privilege exposed
as a `global` field in role descriptor.

Other than this, there is a requirement where we would want to know if a cluster
permission is implied by another cluster permission (`has-privileges`).
This is helpful in addressing queries related to privileges for a user.
This is not simply a check of cluster permissions, since we do not
have access to runtime information (like the request object).
This refactoring does not try to address those scenarios.

Relates #44048
2019-08-13 09:06:18 +10:00
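The "any check allows access" rule and the builder-based construction described in the commit above can be illustrated with a small sketch. This is not the real `ClusterPermission` class: `ClusterPermissionSketch`, its `Builder`, and both `add` methods are hypothetical, and a simple prefix match stands in for the Automaton-based action matching mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Sketch: a permission is a collection of checks; access is granted if any check passes. */
final class ClusterPermissionSketch {
    private final List<BiPredicate<String, Object>> checks;

    private ClusterPermissionSketch(List<BiPredicate<String, Object>> checks) {
        this.checks = checks;
    }

    /** Checks the cluster action and, for checks that care about it, the request. */
    boolean check(String action, Object request) {
        for (BiPredicate<String, Object> c : checks) {
            if (c.test(action, request)) {
                return true;
            }
        }
        return false;
    }

    /** Each privilege contributes its own checks to the builder. */
    static final class Builder {
        private final List<BiPredicate<String, Object>> checks = new ArrayList<>();

        /** Action-only check; the prefix match is an assumption made for brevity. */
        Builder addActionPrefix(String prefix) {
            checks.add((action, request) -> action.startsWith(prefix));
            return this;
        }

        /** Check that looks at both the action and the request object. */
        Builder add(BiPredicate<String, Object> check) {
            checks.add(check);
            return this;
        }

        ClusterPermissionSketch build() {
            return new ClusterPermissionSketch(new ArrayList<>(checks));
        }
    }
}
```

In this shape, a named privilege would typically register only action checks, while a configurable privilege would register a check that also inspects the request.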
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Armin Braun a501d68f23
Upgrade to Netty 4.1.38 (#45132) (#45364)
* A number of fixes to buffer handling in the .37 and .38 -> we should stay up to date
2019-08-09 03:38:14 +02:00
Tim Brooks af908efa41
Disable netty direct buffer pooling by default (#44837)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
2019-08-08 15:10:31 -06:00
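The "single thread-local direct buffer" strategy from the commit above can be sketched as follows. This is an illustration under assumptions, not the actual networking code: `ThreadLocalDirectBuffer` is a hypothetical name and the 64 KB capacity is an arbitrary choice.

```java
import java.nio.ByteBuffer;

/** Sketch: one reusable direct buffer per thread for moving heap bytes to a socket. */
final class ThreadLocalDirectBuffer {
    private static final int CAPACITY = 64 * 1024; // hypothetical size

    private static final ThreadLocal<ByteBuffer> BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CAPACITY));

    /** Returns this thread's buffer, cleared and ready to be filled. */
    static ByteBuffer get() {
        ByteBuffer buffer = BUFFER.get();
        buffer.clear();
        return buffer;
    }

    /**
     * Copies up to one buffer's worth of heap bytes into the direct buffer and
     * flips it, so it can be handed to a channel write. Callers with more data
     * than fits would loop.
     */
    static ByteBuffer stage(byte[] heapBytes, int offset, int length) {
        ByteBuffer buffer = get();
        int n = Math.min(length, buffer.remaining());
        buffer.put(heapBytes, offset, n);
        buffer.flip();
        return buffer;
    }
}
```

Because each thread allocates its direct buffer once and reuses it, direct memory usage is bounded by the number of I/O threads instead of growing with load.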
Armin Braun 5d7fafec14
Add Assertion to Ensure Retries in S3BlobContainer (#45224) (#45230)
* We need a `markSupported` input stream to retry uploads
* Relates #45153
2019-08-06 16:11:19 +02:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Tim Brooks 984ba82251
Move nio channel initialization to event loop (#45155)
Currently in the transport-nio work we connect and bind channels on
a thread before the channel is registered with a selector. Additionally,
it is at this point that we set all the socket options. This commit
moves these operations onto the event-loop after the channel has been
registered with a selector. It attempts to set the socket options for a
non-server channel at registration time. If that fails, it will attempt
to set the options after the channel is connected. This should fix
#41071.
2019-08-02 17:31:31 -04:00
Armin Braun 9450505d5b
Stop Passing Around REST Request in Multiple Spots (#44949) (#45109)
* Stop Passing Around REST Request in Multiple Spots

* Motivated by #44564
  * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the request itself, which is needlessly hard on GC and heap.
  * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.
2019-08-02 07:31:38 +02:00
Tim Brooks aff66e3ac5
Add Cors integration tests (#44361)
This commit adds integration tests to ensure that the basic cors
functionality works for the netty and nio transports.
2019-07-31 14:24:23 -06:00
Armin Braun 548c767b6b
S3 3rd Party Test Goal (#44799) (#45004)
* Create S3 Third Party Test Task that Covers the S3 CLI Tool
* Adjust snapshot cli test tool tests to work with real S3
  * Build adjustment
  * Clean up repo path before testing
* Dedup the logic for asserting path contents by using the correct utility method here that somehow became unused
2019-07-30 17:16:41 +02:00
Armin Braun 4495140d1f
Release Pooled Buffers Earlier for HTTP Requests (#44952) (#44991)
* We should release the buffers right after copying and not only do so after we did all the request handling on the copy
* Relates #44564
2019-07-30 10:30:01 +02:00
Ignacio Vera 821f6f893b
Upgrade to Lucene 8.2.0 release (#44859) (#44892)
2019-07-26 08:14:59 +02:00
Ioannis Kakavas 3714cb63da Allow parsing the value of java.version sysprop (#44017)
We often start testing with early access versions of new Java
versions and this has caused minor issues in our tests
(e.g. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.

This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of the java.version
system property, which might include an additional alphanumeric
part after the version numbers
(see [JEP 223](https://openjdk.java.net/jeps/223)). In short, it
handles a version number part, like before, but additionally a
PRE part that matches ([a-zA-Z0-9]+).

It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We now parse java.version, which can be consistently
parsed.

Resolves #43141
2019-07-22 20:14:56 +03:00
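A rough sketch of the parsing described in the commit above, in plain Java. It is not Elasticsearch's actual version class: `JavaVersionSketch` is a hypothetical name, only the dotted number part and an optional `-PRE` suffix are handled, and the ordering of a pre-release before the plain version is assumed from JEP 223.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavaVersionSketch implements Comparable<JavaVersionSketch> {
    // Dotted numbers, optionally followed by "-" and an alphanumeric PRE part.
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+(?:\\.\\d+)*)(?:-([a-zA-Z0-9]+))?");

    final int[] numbers;
    final String pre; // null when there is no PRE part

    private JavaVersionSketch(int[] numbers, String pre) {
        this.numbers = numbers;
        this.pre = pre;
    }

    static JavaVersionSketch parse(String value) {
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("unparseable version: " + value);
        }
        String[] parts = m.group(1).split("\\.");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return new JavaVersionSketch(numbers, m.group(2));
    }

    @Override
    public int compareTo(JavaVersionSketch other) {
        int len = Math.max(numbers.length, other.numbers.length);
        for (int i = 0; i < len; i++) {
            int a = i < numbers.length ? numbers[i] : 0; // missing parts count as 0
            int b = i < other.numbers.length ? other.numbers[i] : 0;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        // Same numbers: a version with a PRE part sorts before one without.
        if (pre == null) {
            return other.pre == null ? 0 : 1;
        }
        if (other.pre == null) {
            return -1;
        }
        return pre.compareTo(other.pre);
    }
}
```

With this shape, a string such as `13-ea` parses instead of failing, and it compares as older than `13` but newer than `12.0.1`.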
Jason Tedor d82a570a2a
Remove debugging logging statements from Azure tests
This commit removes some unneeded debugging logging statements from the
Azure storage tests.

Relates #44672
2019-07-22 16:55:55 +09:00
Jason Tedor a493a34143
Use debug logging instead for Azure tests (#44672)
These Azure tests have hard println statements, which means we always see
these messages during configuration. Yet, they are unnecessary most of
the time. This commit changes them to use debug logging.
2019-07-22 16:46:13 +09:00
Armin Braun 07cf2cb665
Add disable_chunked_encoding Setting to S3 Repo (#44052) (#44562)
* Add disable_chunked_encoding setting to S3 repo plugin to support S3 implementations that don't support chunked encoding
2019-07-18 16:57:56 +02:00
Yogesh Gaikwad 4c95cc3223 skip repository-hdfs integTest in case of fips jvm (#44319)
The repository-hdfs runners need to be disabled in FIPS mode.

Testing done for all the tasks, dynamically created and static (integTest, integTestHa, integSecureTest, integSecureHaTest)
2019-07-18 21:10:53 +10:00
maarab7 1375cc93a8 Fix parameter value for calling data.advanceExact (#44205)
While the code works perfectly well for a single segment, it returns the wrong values for multiple segments. E.g., if we have 500 docs in one segment and we want to get doc id = 280, then data.advanceExact(topDocs.scoreDocs[i].doc) works fine. If we have two segments, say, with the first segment having docs 1-200 and the second segment having docs 201-500, then doc 280 is fetched from the second segment, but what is returned is actually doc 480. Subtracting the docBase (280-200) takes us to the correct document, which is doc 80 in the second segment and doc 280 overall.
2019-07-18 10:55:10 +02:00
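The docBase arithmetic from the commit above can be shown without Lucene at all. The sketch below is plain Java with hypothetical names (`SegmentDocs`, `segmentIndex`, `localDoc`); it uses zero-based doc ids, so the first segment holds docs 0-199 and the second segment's docBase is 200.

```java
/** Sketch: mapping an index-wide doc id to a segment and a segment-local doc id. */
final class SegmentDocs {
    /**
     * Returns the index of the segment that contains globalDoc.
     * docBases holds each segment's first global doc id, in ascending order.
     */
    static int segmentIndex(int[] docBases, int globalDoc) {
        int index = 0;
        for (int i = 0; i < docBases.length; i++) {
            if (docBases[i] <= globalDoc) {
                index = i;
            }
        }
        return index;
    }

    /** Subtracts the segment's docBase to get the id that per-segment data expects. */
    static int localDoc(int[] docBases, int globalDoc) {
        return globalDoc - docBases[segmentIndex(docBases, globalDoc)];
    }
}
```

For `docBases = {0, 200}`, global doc 280 lives in the second segment and `localDoc` yields 80. Passing 280 unchanged to that segment would instead address the document whose global id is 480, which is the bug the commit fixes.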
Jason Tedor 39c5f98de7
Introduce test issue logging (#44477)
Today we have an annotation for controlling logging levels in
tests. This annotation serves two purposes, one is to control the
logging level used in tests, when such control is needed to impact and
assert the behavior of loggers in tests. The other use is when a test is
failing and additional logging is needed. This commit separates these
two concerns into separate annotations.

The primary motivation for this is that we have a history of leaving
behind the annotation for the purpose of investigating test failures
long after the test failure is resolved. The accumulation of these stale
logging annotations has led to excessive disk consumption. Having
recently cleaned this up, we would like to avoid falling into this state
again. To do this, we are adding a link to the test failure under
investigation to the annotation when used for the purpose of
investigating test failures. We will add tooling to inspect these
annotations, in the same way that we have tooling on awaits fix
annotations. This will enable us to report on the use of these
annotations, and report when stale uses of the annotation exist.
2019-07-18 05:33:33 +09:00
Armin Braun 65fcaecce1
Remove Minio Host Hack in S3 Repository Build (#44491) (#44497)
* Resolving the todo to clean this hackyness up
2019-07-17 19:59:00 +02:00
Ignacio Vera eb348d2593
Upgrade to lucene-8.2.0-snapshot-6413aae226 (#44480)
2019-07-17 13:28:28 +02:00
Armin Braun c8db0e9b7e
Remove blobExists Method from BlobContainer (#44472) (#44475)
* We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface
   * Keep it as an implementation detail in the Azure repository
2019-07-17 11:56:02 +02:00
Tim Brooks 6b1a769638
Move CORS Config into :server package (#43779)
This commit moves the config that stores Cors options into the server
package. Currently both nio and netty modules must have a copy of this
config. Moving it into server allows one copy and the tests to be in a
common location.
2019-07-16 17:50:42 -06:00
Tim Brooks 0a352486e8
Isolate nio channel registered from channel active (#44388)
Registering a channel with a selector is a required operation for the
channel to be handled properly. Currently, we mix the registration with
other setup operations (ip filtering, SSL initiation, etc). However, a
failure to register is fatal. This PR modifies how registration occurs to
immediately close the channel if it fails.

There are still two clear loopholes for how a user can interact with a
channel even if registration fails. 1. through the exception handler.
2. through the channel accepted callback. These can perhaps be improved
in the future. For now, this PR prevents writes from proceeding if the
channel is not registered.
2019-07-16 17:18:57 -06:00
Armin Braun 940aa71930
Cleanup S3 BlobContainer Listing Logic (#43088) (#44406)
* Cleanup duplication in creating and looping over IO Requests
2019-07-16 12:19:20 +02:00
Ryan Ernst 7e06888bae
Convert testclusters to use distro download plugin (#44253) (#44362)
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
2019-07-15 17:53:05 -07:00
Yogesh Gaikwad b40b6dd542 Disable repository-hdfs tests in FIPS jvm (#44283)
Due to https://github.com/elastic/elasticsearch/issues/40079,
we need to disable repository-hdfs tests in FIPS jvm.
2019-07-13 20:11:32 +10:00