OpenSearch

Commit Graph

Author	SHA1	Message	Date
Tanguy Leroux	8a14ea5567	Add docker-compose based test fixture for GCS (#48902 ) Similarly to what has been done for Azure in #48636, this commit adds a new :test:fixtures:gcs-fixture project which provides two docker-compose based fixtures that emulate a Google Cloud Storage service. Some code has been extracted from existing tests and placed into this new project so that it can be easily reused in other projects.	2019-11-07 13:27:22 -05:00
Armin Braun	d83e374062	Bound Linearizability Check in CoordinatorTests (#48751 ) (#48853 ) Same as #44444 but for the coordinator tests. Closes #48742	2019-11-04 21:36:17 +01:00
Armin Braun	a22f6fbe3c	Cleanup Redundant Futures in Recovery Code (#48805 ) (#48832 ) Follow up to #48110 cleaning up the redundant future uses that were left over from that change.	2019-11-02 17:28:12 +01:00
Tanguy Leroux	989467ca1e	Add docker-compose based test fixture for Azure (#48736 ) This commit adds a new :test:fixtures:azure-fixture project which provides a docker-compose based container that runs an AzureHttpFixture Java class that emulates an Azure Storage service. The logic to emulate the service is extracted from existing tests and placed in AzureHttpHandler into the new project so that it can be easily reused. The :plugins:repository-azure project is an example of such utilization. The AzureHttpFixture fixture is just a wrapper around AzureHttpHandler and is now executed within the docker container. The :plugins:repository-azure:qa:microsoft-azure project uses the new test fixture and the existing AzureStorageFixture has been removed.	2019-10-31 10:43:43 +01:00
Armin Braun	52e5ceb321	Restore from Individual Shard Snapshot Files in Parallel (#48110 ) (#48686 ) Make restoring shard snapshots run in parallel on the `SNAPSHOT` thread-pool.	2019-10-30 14:36:30 +01:00
Tanguy Leroux	24f6985235	Reduce allocations when draining HTTP request bodies in repository tests (#48541 ) In repository integration tests, we drain the HTTP request body before returning a response. Before this change this operation was done using Streams.readFully(), which uses an 8kb buffer to read the input stream; it now uses a 1kb buffer for the same operation. This should reduce the allocations made during the tests and speed them up a bit on CI. Co-authored-by: Armin Braun <me@obrown.io>	2019-10-29 09:15:06 +01:00
Rory Hunter	30389c6660	Improve SAML tests resiliency to auto-formatting (#48517 ) Backport of #48452. The SAML tests have large XML documents within which various parameters are replaced. At present, if these tests are auto-formatted, the XML documents get strung out over many, many lines, and are basically illegible. Fix this by using named placeholders for variables, and indent the multiline XML documents. The tests in `SamlSpMetadataBuilderTests` deserve a special mention, because they include a number of certificates in Base64. I extracted these into variables, for additional legibility.	2019-10-27 16:06:23 +00:00
Tim Brooks	f5f1072824	Multiple remote connection strategy support (#48496 ) * Extract remote "sniffing" to connection strategy (#47253) Currently the connection strategy used by the remote cluster service is implemented as a multi-step sniffing process in the RemoteClusterConnection. We intend to introduce a new connection strategy that will operate in a different manner. This commit extracts the sniffing logic to a dedicated strategy class. Additionally, it implements dedicated tests for this class. Additionally, in previous commits we moved away from a world where the remote cluster connection was mutable. Instead, when setting updates are made, the connection is torn down and rebuilt. We still had methods and tests hanging around for the mutable behavior. This commit removes those. * Introduce simple remote connection strategy (#47480) This commit introduces a simple remote connection strategy which will open remote connections to a configurable list of user supplied addresses. These addresses can be remote Elasticsearch nodes or intermediate proxies. We will perform normal clustername and version validation, but otherwise rely on the remote cluster to route requests to the appropriate remote node. * Make remote setting updates support diff strategies (#47891) Currently the entire remote cluster settings infrastructure is designed around the sniff strategy. As we introduce an additional connection strategy this infrastructure needs to be modified to support it. This commit modifies the code so that the strategy implementations will tell the service if the connection needs to be torn down and rebuilt. As part of this commit, we will wait 10 seconds for new clusters to connect when they are added through the "update" settings infrastructure.	2019-10-25 09:29:41 -06:00
Tim Brooks	c0b545f325	Make BytesReference an interface (#48486 ) BytesReference is currently an abstract class which is extended by various implementations. This makes it very difficult to use the delegation pattern. The implication of this is that our releasable BytesReference is a PagedBytesReference type and cannot be used as a generic releasable bytes reference that delegates to any reference type. This commit makes BytesReference an interface and introduces an AbstractBytesReference for common functionality.	2019-10-24 15:39:30 -06:00
Igor Motov	bdbc353dea	Geo: improve handling of out of bounds points in linestrings (#47939 ) Brings handling of out of bounds points in linestrings in line with points. Now points with latitude above 90 and below -90 are handled the same way as for points by adjusting the longitude by moving it by 180 degrees. Relates to #43916	2019-10-23 14:17:44 -04:00
Armin Braun	7215201406	Track Shard-Snapshot Index Generation at Repository Root (#48371 ) This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be accessed by shard id to get the generation of a shard's shard folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` for the shard in question). This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost for updating an empty shard (see #45736) and removes one out of two remaining potential issues with eventually consistent blob stores (see #38941 ... now only the root `index-${N}` is determined by listing). Also and equally if not more important, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and moving from incremental naming of shard level index-N to uuid suffixes for these blobs. This change moves the deleting of the previous shard level `index-${uuid}` blob to the master node instead of the data node allowing for a safe and consistent update of the shard's generation in the `RepositoryData` by first updating `RepositoryData` and then deleting the now unreferenced `index-${newUUID}` blob. __No deletes are executed on the data nodes at all for any operation with this change.__ Note also: Previous issues with hanging data nodes interfering with master nodes are completely impossible, even on S3 (see next section for details). This change changes the naming of the shard level `index-${N}` blobs to a uuid suffix `index-${UUID}`. The reason for this is the fact that writing a new shard-level `index-` generation blob is not atomic anymore in its effect. Not only does the blob have to be written to have an effect, it must also be referenced by the root level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository. This leads to a problem if we were to use incrementing names like we did before. If a blob `index-${N+1}` is written but due to the node/network/cluster/... crashes the root level `RepositoryData` has not been updated then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Updates like that are problematic on S3 for consistency reasons, but also create numerous issues when thinking about stuck data nodes. Previously stuck data nodes that were tasked to write `index-${N+1}` but got stuck and tried to do so after some other node had already written `index-${N+1}` were prevented from doing so (except for on S3) by us not allowing overwrites for that blob and thus no corruption could occur. Were we to continue using incrementing names, we could not do this. The stuck node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless). With uuid naming and moving all deletes to `master` this becomes a non-issue. Data nodes write updated shard generation `index-${uuid}` and `master` makes those `index-${uuid}` part of the `RepositoryData` that it deems correct and cleans up all those `index-` that are unused. Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2019-10-23 10:58:26 +01:00
Tanguy Leroux	4790ee4c32	Reenable azure repository tests and remove some randomization in http servers (#48283 ) Relates #47948 Relates #47380	2019-10-23 09:06:50 +02:00
Armin Braun	8a02a5fc7d	Simplify Shard Snapshot Upload Code (#48155 ) (#48345 ) The code here was needlessly complicated when it enqueued all file uploads up-front. Instead, we can go with a cleaner worker + queue pattern here by taking the max-parallelism from the threadpool info. Also, I slightly simplified the rethrow and listener (step listener is pointless when you add the callback in the next line) handling it since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.	2019-10-22 17:17:09 +01:00
Armin Braun	dc08feadc6	Remove Redundant Version Param from Repository APIs (#48231 ) (#48298 ) This parameter isn't used by any implementation	2019-10-21 16:20:45 +02:00
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227 )	2019-10-21 08:21:09 +02:00
Alpar Torok	cc26e30281	Increase timeout for yml tests (#48237 ) Some of these are larger than what can complete in the regular timeout. Closes #48212	2019-10-18 11:14:15 -07:00
jimczi	b858e19bcc	Revert #46598 that breaks the cachability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Alpar Torok	fbbe04b801	Add a verifyVersions to the test FW (#47192 ) The test FW has a method to check that its implementation of getting index and wire compatible versions as well as reasoning about which version is released or not produces the same results as the similar implementation in the build. This PR adds the `verifyVersions` task to the test FW so we have one task to check everything related to versions.	2019-10-10 11:23:56 +03:00
Armin Braun	302e09decf	Simplify some Common ActionRunnable Uses (#47799 ) (#47828 ) Especially in the snapshot code there's a lot of logic chaining `ActionRunnables` in tricky ways now and the code is getting hard to follow. This change introduces two convenience methods that make it clear that a wrapped listener is invoked with certainty in some trickier spots and shortens the code a bit.	2019-10-09 23:29:50 +02:00
Hendrik Muhs	5e0e54f455	[Transform] move root endpoint to _transform with BWC layer (#47127 ) (#47682 ) move the main endpoint to /_transform/ from /_data_frame/transforms/ with providing backwards compatibility and deprecation warnings	2019-10-08 08:59:01 +02:00
Alpar Torok	2b16d7bcf8	Backport testclusters all (#47565 ) * Bwc testclusters all (#46265) Convert all bwc projects to testclusters * Fix bwc versions config * WIP fix rolling upgrade * Fix bwc tests on old versions * Fix rolling upgrade	2019-10-04 16:12:53 +03:00
Ryan Ernst	f32692208e	Add explanations to script score queries (#46693 ) (#47548 ) While function scores using scripts do allow explanations, they are only creatable with an expert plugin. This commit improves the situation for the newer script score query by adding the ability to set the explanation from the script itself. To set the explanation, a user would check for `explanation != null` to indicate an explanation is needed, and then call `explanation.set("some description")`.	2019-10-03 21:05:05 -07:00
Nhat Nguyen	5e4732f2bb	Limit number of retaining translog files for peer recovery (#47414 ) Today we control the extra translog (when soft-deletes is disabled) for peer recoveries by size and age. If users manually (force) flush many times within a short period, we can keep many small (or empty) translog files as neither the size nor age condition is reached. We can protect the cluster from running out of the file descriptors in such a situation by limiting the number of retaining translog files.	2019-10-03 20:45:29 -04:00
Yannick Welsch	99d2fe295d	Use optype CREATE for single auto-id index requests (#47353 ) Changes auto-id index requests to use optype CREATE, making it compliant with our docs. This will also make these auto-id index requests compatible with the new "create-doc" index privilege (which is based on the optype). The default optype is changed to create, just as it is already documented.	2019-10-02 14:16:52 +02:00
Henning Andersen	b5a2afccb2	MockSearchService concurrency fix (#47139 ) Fixed MockSearchService concurrency, assertNoInFlightContext could have false negative result (rarely). Split out from #46060 Closes #47048	2019-10-02 12:33:18 +02:00
Tanguy Leroux	f5c5411fe8	Differentiate base paths in repository integration tests (#47284 ) (#47300 ) This commit changes the repository base paths used in Azure/S3/GCS integration tests so that they don't conflict with each other when tests run in parallel on real storage services. Closes #47202	2019-10-01 08:39:55 +02:00
Armin Braun	3d23cb44a3	Speed up Snapshot Finalization (#47283 ) (#47309 ) As a result of #45689 snapshot finalization started to take significantly longer than before. This may be a little unfortunate since it increases the likelihood of failing to finalize after having written out all the segment blobs. This change parallelizes all the metadata writes that can safely run in parallel in the finalization step to speed the finalization step up again. Also, this will generally speed up the snapshot process overall in case of large number of indices. This is also a nice to have for #46250 since we add yet another step (deleting of old index- blobs in the shards) to the finalization.	2019-09-30 23:28:59 +02:00
Yannick Welsch	9dc90e41fc	Remove "force" version type (#47228 ) It's been deprecated long ago and can be removed. Relates to #20377 Closes #19769	2019-09-30 11:58:34 +02:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expected templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Tim Brooks	e11c56760d	Fix bind failure logging for mock transport (#47150 ) Currently the MockNioTransport uses a custom exception handler for server channel exceptions. This means that bind failures are logged at the warn level. This commit modifies the transport to use the common TcpTransport exception handler which will log exceptions at the correct level.	2019-09-27 13:53:48 -06:00
Nhat Nguyen	444b47ce88	Relax maxSeqNoOfUpdates assertion in FollowingEngine (#47188 ) We disable MSU optimization if the local checkpoint is smaller than max_seq_no_of_updates. Hence, we need to relax the MSU assertion in FollowingEngine for that scenario. Suppose the leader has three operations: index-0, delete-1, and index-2 for the same doc Id. MSU on the leader is 1 as index-2 is an append. If the follower applies index-0 then index-2, then the assertion is violated. Closes #47137	2019-09-27 14:00:20 -04:00
Tanguy Leroux	42ae76ab7c	Injected response errors in Azure repository tests should have a body (#47176 ) The Azure SDK client expects server errors to have a body, something that looks like: <?xml version="1.0" encoding="utf-8"?> <Error> <Code>string-value</Code> <Message>string-value</Message> </Error> I forgot to add such errors in Azure tests and that triggers some NPE in the client like the one reported in #47120. Closes #47120	2019-09-27 09:43:29 +02:00
Jim Ferenczi	73a09b34b8	Replace SearchContextException with SearchException (#47046 ) This commit removes the SearchContextException in favor of a simpler SearchException that doesn't leak the SearchContext. Relates #46523	2019-09-26 14:21:23 +02:00
Tanguy Leroux	95e2ca741e	Remove unused private methods and fields (#47154 ) This commit removes a bunch of unused private fields and unused private methods from the code base. Backport of (#47115)	2019-09-26 12:49:21 +02:00
David Turner	45c7783018	Warn on slow metadata persistence (#47130 ) Today if metadata persistence is excessively slow on a master-ineligible node then the `ClusterApplierService` emits a warning indicating that the `GatewayMetaState` applier was slow, but gives no further details. If it is excessively slow on a master-eligible node then we do not see any warning at all, although we might see other consequences such as a lagging node or a master failure. With this commit we emit a warning if metadata persistence takes longer than a configurable threshold, which defaults to `10s`. We also emit statistics that record how much index metadata was persisted and how much was skipped since this can help distinguish cases where IO was slow from cases where there are simply too many indices involved. Backport of #47005.	2019-09-26 07:40:54 +01:00
Tim Brooks	4f47e1f169	Extract proxy connection logic to specialized class (#47138 ) Currently the logic to check if a connection to a remote discovery node exists and otherwise create a proxy connection is mixed with the collect nodes, cluster connection lifecycle, and other RemoteClusterConnection logic. This commit introduces a specialized RemoteConnectionManager class which handles the open connections. Additionally, it reworks the "round-robin" proxy logic to create the list of potential connections at connection open/close time, as opposed to each time a connection is requested.	2019-09-25 15:58:18 -06:00
David Turner	ac920e8e64	Assert no exceptions during state application (#47090 ) Today we log and swallow exceptions during cluster state application, but such an exception should not occur. This commit adds assertions of this fact, and updates the Javadocs to explain it. Relates #47038	2019-09-25 12:32:51 +01:00
Tim Brooks	6720c56bdd	Set netty system properties in BuildPlugin (#45881 ) Currently in production instances of Elasticsearch we set a couple of system properties by default. We currently do not apply all of these system properties in tests. This commit applies these properties in the tests.	2019-09-24 10:49:36 -06:00
David Turner	6943a3101f	Cut PersistedState interface from GatewayMetaState (#46655 ) Today `GatewayMetaState` implements `PersistedState` but it's an error to use it as a `PersistedState` before it's been started, or if the node is master-ineligible. It also holds some fields that are meaningless on nodes that do not persist their states. Finally, it takes responsibility for both loading the original cluster state and some of the high-level logic for writing the cluster state back to disk. This commit addresses these concerns by introducing a more specific `PersistedState` implementation for use on master-eligible nodes which is only instantiated if and when it's appropriate. It also moves the fields and high-level persistence logic into a new `IncrementalClusterStateWriter` with a more appropriate lifecycle. Follow-up to #46326 and #46532 Relates #47001	2019-09-24 12:31:13 +01:00
Julie Tibshirani	9124c94a6c	Add support for aliases in queries on _index. (#46944 ) Previously, queries on the _index field were not able to specify index aliases. This was a regression in functionality compared to the 'indices' query that was deprecated and removed in 6.0. Now queries on _index can specify an alias, which is resolved to the concrete index names when we check whether an index matches. To match a remote shard target, the pattern needs to be of the form 'cluster:index' to match the fully-qualified index name. Index aliases can be specified in the following query types: term, terms, prefix, and wildcard.	2019-09-23 13:21:37 -07:00
Jim Ferenczi	08f28e642b	Replace SearchContext with QueryShardContext in query builder tests (#46978 ) This commit replaces the SearchContext used in AbstractQueryTestCase with a QueryShardContext in order to reduce the visibility of search contexts. Relates #46523	2019-09-23 20:24:02 +02:00
Luca Cavanna	d4d1182677	update _common.json format (#46872 ) API specs now use an object for the documentation field. _common was not updated yet. This commit updates _common.json and its corresponding parser. Closes #46744 Co-Authored-By: Tomas Della Vedova <delvedor@users.noreply.github.com>	2019-09-23 17:01:29 +02:00
Yannick Welsch	9638ca20b0	Allow dropping documents with auto-generated ID (#46773 ) When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678	2019-09-19 16:46:33 +02:00
Tanguy Leroux	3ae51f25dd	Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802 ) This commit moves the common test testSnapshotWithLargeSegmentFiles to the ESMockAPIBasedRepositoryIntegTestCase base class.	2019-09-18 15:41:30 +02:00
Armin Braun	f983b67fdc	Add Assertion About Leaking index-N to Repo Tests (#46774 ) (#46801 ) This adds an assert to make sure we're not leaking index-N blobs on the shard level to the repo consistency checks. It is ok to have a single redundant index-N blob in a failure scenario but additional index-N should always be cleaned up before adding more.	2019-09-18 13:15:56 +02:00
Tanguy Leroux	4db37801d0	Add resumable uploads support to GCS repository integration tests (#46562 ) This commit adds support for resumable uploads to the internal HTTP server used in GoogleCloudStorageBlobStoreRepositoryTests. This way we can also test the behavior of the Google's client when the service returns server errors in response to resumable upload requests. The BlobStore implementation for GCS has the choice between 2 methods to upload a blob: resumable and multipart. In the current implementation, the client executes a resumable upload if the blob size is larger than LARGE_BLOB_THRESHOLD_BYTE_SIZE, otherwise it executes a multipart upload. This commit makes this logic overridable in tests, allowing to randomize the decision of using one method or the other. The commit adds support for single request resumable uploads and chunked resumable uploads (the blob is uploaded into multiple 2Mb chunks; each chunk being a resumable upload). For this last case, this PR also adds a test testSnapshotWithLargeSegmentFiles which makes it more probable that a chunked resumable upload is executed.	2019-09-18 09:33:05 +02:00
Armin Braun	2c70d403fc	Reenable+Fix testMasterShutdownDuringFailedSnapshot (#46303 ) (#46747 ) Reenable this test since it was fixed by #45689 in production code (specifically, the fact that we write the `snap-` blobs without overwrite checks now). Only required adding the assumed blocking on index file writes to test code to properly work again. * Closes #25281	2019-09-17 18:09:48 +02:00
Armin Braun	b00de8edf3	Ensure SAS Tokens in Test Use Minimal Permissions (#46112 ) (#46628 ) There were some issues with the Azure implementation requiring permissions to list all containers due to a container exists check. This was caught in CI this time, but going forward we should ensure that CI is executed using a token that does not allow listing containers. Relates #43288	2019-09-17 15:40:11 +02:00
Armin Braun	b0f09b279f	Make Snapshot Logic Write Metadata after Segments (#45689 ) (#46764 ) * Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581 * Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons * Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization * Fixes #41581	2019-09-17 13:09:39 +02:00
Przemysław Witek	e49be611ad	[7.x] Add audit messages for Data Frame Analytics (#46521 ) (#46738 )	2019-09-16 21:21:38 +02:00
Nhat Nguyen	5465c8d095	Increase timeout for relocation tests (#46554 ) There's nothing wrong in the logs from these failures. I think 30 seconds might not be enough to relocate shards with many documents as CI is quite slow. This change increases the timeout to 60 seconds for these relocation tests. It also dumps the hot threads in case of a timeout. Closes #46526 Closes #46439	2019-09-12 16:34:01 -04:00
Jim Ferenczi	4407f3af1b	Delay the creation of SubSearchContext to the FetchSubPhase (#46598 ) This change delays the creation of the SubSearchContext for nested and parent/child inner_hits to the fetch sub phase in order to ensure that a SearchContext can be built entirely from a QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid. Relates #46523	2019-09-12 14:52:15 +02:00
Jim Ferenczi	23bf310c84	Replace the SearchContext with QueryShardContext when building aggregator factories (#46527 ) This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them. The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`. Relates #46523	2019-09-11 16:43:30 +02:00
Armin Braun	41633cb9b5	More Efficient Ordering of Shard Upload Execution (#42791 ) (#46588 ) * More Efficient Ordering of Shard Upload Execution (#42791) * Change the upload order of snapshots to work file by file in parallel on the snapshot pool instead of merely shard-by-shard * Inspired by #39657 * Cleanup BlobStoreRepository Abort and Failure Handling (#46208)	2019-09-11 13:59:20 +02:00
Jim Ferenczi	425b1a77e8	Add more context to QueryShardContext (#46584 ) This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext. It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the query shard context. Relates #46523	2019-09-11 12:24:51 +02:00
David Turner	6c67b53932	Load metadata at start time not construction time (#46326 ) Today we load the metadata from disk while constructing the node. However there is no real need to do so, and this commit moves that code to run later while the node is starting instead.	2019-09-10 11:15:10 +01:00
Tanguy Leroux	88bed09119	Mutualize code in cloud-based repository integration tests (#46483 ) This commit factors out some common code between the cloud-based repository integration tests that were recently improved. Relates #46376	2019-09-09 16:02:14 +02:00
Armin Braun	1bb1c77885	Increase REST-Test Client Timeout to 60s (#46455 ) (#46461 ) We are seeing requests take more than the default 30s which leads to requests being retried and returning unexpected failures like e.g. "index already exists" because the initial requests that timed out, worked out functionally anyway. => double the timeout to reduce the likelihood of the failures described in #46091 => As suggested in the issue, we should in a follow-up turn off retrying altogether probably	2019-09-07 07:40:16 +02:00
Tanguy Leroux	28974b5723	Replace mocked client in GCSBlobStoreRepositoryTests by HTTP server (#46255 ) This commit removes the usage of MockGoogleCloudStoragePlugin in GoogleCloudStorageBlobStoreRepositoryTests and replaces it by a HttpServer that emulates the Storage service. This allows the repository tests to use the real Google's client under the hood in tests and will allow us to test the behavior of the snapshot/restore feature for GCS repositories by simulating random server-side internal errors. The HTTP server used to emulate the Storage service is intentionally simple and minimal to keep things understandable and maintainable. Testing full client options on the server side (like authentication, chunked encoding etc) remains the responsibility of the GoogleCloudStorageFixture.	2019-09-05 10:37:37 +02:00
Alpar Torok	d709a5c193	Quote the task name in reproduction line printer (#46266 ) Some tasks have `#` in their name, for instance, which doesn't play well with some shells (e.g. zsh)	2019-09-04 12:22:58 +03:00
Lee Hinman	57f322f85e	Move MockRepository into test framework (#46298 ) This moves the `MockRepository` class into `test/framework/src/main` so it can be used across all modules and plugins in tests.	2019-09-03 16:21:10 -06:00
David Turner	d340530a47	Avoid overshooting watermarks during relocation (#46079 ) Today the `DiskThresholdDecider` attempts to account for already-relocating shards when deciding how to allocate or relocate a shard. Its goal is to stop relocating shards onto a node before that node exceeds the low watermark, and to stop relocating shards away from a node as soon as the node drops below the high watermark. The decider handles multiple data paths by only accounting for relocating shards that affect the appropriate data path. However, this mechanism does not correctly account for _new_ relocating shards, which are unwittingly ignored. This means that we may evict far too many shards from a node above the high watermark, and may relocate far too many shards onto a node causing it to blow right past the low watermark and potentially other watermarks too. There are in fact two distinct issues that this PR fixes. New incoming shards have an unknown data path until the `ClusterInfoService` refreshes its statistics. New outgoing shards have a known data path, but we fail to account for the change of the corresponding `ShardRouting` from `STARTED` to `RELOCATING`, meaning that we fail to find the correct data path and treat the path as unknown here too. This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths for all shards. With the changes here, the data paths are handled in tests as they are in production, except that their sizes are fake. Fixes #45177	2019-08-29 12:40:55 +01:00
Rory Hunter	3666bcfbd8	Handle multiple loopback addresses (#46061 ) AbstractSimpleTransportTestCase.testTransportProfilesWithPortAndHost expects a host to only have a single IPv4 loopback address, which isn't necessarily the case. Allow for >= 1 address. Backport of #45901.	2019-08-29 09:45:51 +01:00
Tanguy Leroux	9e14ffa8be	Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068 )	2019-08-28 16:29:46 +02:00
Luca Cavanna	267183998e	[TEST] wait for http channels to be closed in ESIntegTestCase (#45977 ) We recently added a check to `ESIntegTestCase` in order to verify that no http channels are being tracked when we close clusters and the REST client. Close listeners though are invoked asynchronously, hence this check may fail if we assert before the close listener that removes the channel from the map is invoked. With this commit we add an `assertBusy` so we try and wait for the map to be empty. Closes #45914 Closes #45955	2019-08-27 14:00:24 +02:00
Nhat Nguyen	f2e8b17696	Do not create engine under IndexShard#mutex (#45263 ) Today we create new engines under IndexShard#mutex. This is not ideal because it can block the cluster state updates which also execute under the same mutex. We can avoid this problem by creating new engines under a separate mutex. Closes #43699	2019-08-26 17:18:29 -04:00
Tanguy Leroux	8e66df9925	Move testRetentionLeasesClearedOnRestore (#45896 )	2019-08-23 13:43:40 +02:00
Jason Tedor	de6b6fd338	Add node.processors setting in favor of processors (#45885 ) This commit namespaces the existing processors setting under the "node" namespace. In doing so, we deprecate the existing processors setting in favor of node.processors.	2019-08-22 22:18:37 -04:00
Armin Braun	bfddaaa2ae	Acknowledge Indices Were Wiped Successfully in REST Tests (#45832 ) (#45842 ) In internal test clusters tests we check that wiping all indices was acknowledged but in REST tests we didn't. This aligns the behavior in both kinds of tests. Relates #45605 which might be caused by unacked deletes that were just slow.	2019-08-22 17:19:51 +02:00
Luca Cavanna	a47ade3e64	Cancel search task on connection close (#43332 ) This PR introduces a mechanism to cancel a search task when its corresponding connection gets closed. That would relieve users from having to manually deal with tasks and cancel them if needed. Especially the process of finding the task_id requires calling get tasks which needs to call every node in the cluster. The implementation is based on associating each http channel with its currently running search task, and cancelling the task when the previously registered close listener gets called.	2019-08-22 10:43:20 +02:00
Armin Braun	824f1090a9	Disable testTimeoutPerConnection on Windows (#45785 ) (#45818 ) * It appears this test that is specific to how the BSD network stack works does randomly fail on Windows => disabling it since it's not clear that it should work on Windows in a stable way * Fixes #45777	2019-08-22 06:06:09 +02:00
William Brafford	2b549e7342	CLI tools: write errors to stderr instead of stdout (#45586 ) Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools. This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr. Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo \| sort).	2019-08-21 14:46:07 -04:00
Armin Braun	6aaee8aa0a	Repository Cleanup Endpoint (#43900 ) (#45780 ) * Repository Cleanup Endpoint (#43900) * Snapshot cleanup functionality via transport/REST endpoint. * Added all the infrastructure for this with the HLRC and node client * Made use of it in tests and resolved relevant TODO * Added new `Custom` CS element that tracks the cleanup logic. Kept it similar to the delete and in progress classes and gave it some (for now) redundant way of handling multiple cleanups but only allow one * Use the exact same mechanism used by deletes to have the combination of CS entry and increment in repository state ID provide some concurrency safety (the initial approach of just an entry in the CS was not enough, we must increment the repository state ID to be safe against concurrent modifications, otherwise we run the risk of "cleaning up" blobs that just got created without noticing) * Isolated the logic to the transport action class as much as I could. It's not ideal, but we don't need to keep any state and do the same for other repository operations (like getting the detailed snapshot shard status)	2019-08-21 17:59:49 +02:00
Gordon Brown	ecb3ebd796	Clean SLM and ongoing snapshots in test framework (#45564 ) Adjusts the cluster cleanup routine in ESRestTestCase to clean up SLM test cases, and optionally wait for all snapshots to be deleted. Waiting for all snapshots to be deleted, rather than failing if any are in progress, is necessary for tests which use SLM policies because SLM policies may be in the process of executing when the test ends.	2019-08-16 14:17:34 -06:00
Igor Motov	98c850c08b	Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618 ) Changes the order of parameters in Geometries from lat, lon to lon, lat and moves all Geometry classes to the org.elasticsearch.geometry package. Backport of #45332 Closes #45048	2019-08-16 14:42:02 -04:00
Luca Cavanna	c31cddf27e	Update the schema for the REST API specification (#42346 ) * Update the REST API specification This patch updates the REST API specification in JSON files to better encode deprecated entities, to improve specification of URL paths, and to open up the schema for future extensions. Notably, it changes the `paths` from a list of strings to a list of objects, where each particular object encodes all the information for this particular path: the `parts` and the `methods`. Among the benefits of this approach is, e.g., encoding the difference between using the `PUT` and `POST` methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one. Also `documentation` becomes an object that supports a `url` and also a `description` which is a new field. * Adapt YAML runner to new REST API specification format The logic for choosing the path to use when running tests has been simplified, as a consequence of the path parts being listed under each path in the spec. The special case for create and index has been removed. Also the parsing code has been hardened so that errors are thrown earlier when the structure of the spec differs from what is expected, and their error messages should be more helpful.	2019-08-16 14:40:00 +02:00
Alpar Torok	4a67645e5d	Use dynamic ports for ESSingleNodeTestCase too Extends #45601 to cover all tests.	2019-08-16 09:17:19 +03:00
Armin Braun	73e266b2fd	Fix Failures when Closing Indices in EsBlobStoreRepositoryIntegTestCase (#45532 ) (#45614 ) * Same issue as in #44754 as far as I can see: in case of async translog persistence we randomly fail to close * Closes #45335 * Closes #45334	2019-08-15 19:45:17 +02:00
Alpar Torok	03a1645bc6	Use dynamic port ranges for ExternalTestCluster (#45601 ) Moves methods added in #44213 and uses them to configure the port range for `ExternalTestCluster` too. These were still using `9300-9400` (the default) and running into races.	2019-08-15 16:40:12 +03:00
Nick Knize	647a8308c3	[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363 ) * Introduce Spatial Plugin (#44389) Introduce a skeleton Spatial plugin that holds new licensed features coming to Geo/Spatial land! * [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923) Refactor DeprecatedParameters specific to legacy geo_shape out of AbstractGeometryFieldMapper.TypeParser#parse. * [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980) Add a new ShapeFieldMapper to the xpack spatial module for indexing arbitrary cartesian geometries using a new field type called shape. The indexing approach leverages lucene's new XYShape field type which is backed by BKD in the same manner as LatLonShape but without the WGS84 latitude longitude restrictions. The new field mapper builds on and extends the refactoring effort in AbstractGeometryFieldMapper and accepts shapes in either GeoJSON or WKT format (both of which support non geospatial geometries). Tests are provided in the ShapeFieldMapperTest class in the same manner as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests. Documentation for how to use the new field type and what parameters are accepted is included. The QueryBuilder for searching indexed shapes is provided in a separate commit. * [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108) Add a new ShapeQueryBuilder to the xpack spatial module for querying arbitrary Cartesian geometries indexed using the new shape field type. The query builder extends AbstractGeometryQueryBuilder and leverages the ShapeQueryProcessor added in the previous field mapper commit. Tests are provided in ShapeQueryTests in the same manner as GeoShapeQueryTests and docs are updated to explain how the query works.	2019-08-14 16:35:10 -05:00
Nhat Nguyen	4fcf7bbd07	Do not hold writeLock while verifying Lucene/translog We should not hold Engine#writeLock while executing assertConsistentHistoryBetweenTranslogAndLuceneIndex for this check might acquire Engine#readLock. Relates #45461	2019-08-13 16:16:06 -04:00
Nhat Nguyen	24514275c7	Get max_seq_no after snapshot translog and Lucene (#45461 ) We should capture max_seq_no after snapshotting translog and Lucene; otherwise, that max_seq_no can be smaller than that of some operation in translog or Lucene. With this change, we also hold the Engine#writeLock during this check so that no indexing can happen. Closes #45454	2019-08-13 16:16:06 -04:00
Nhat Nguyen	25c6102101	Trim local translog in peer recovery (#44756 ) Today, if an operation-based peer recovery occurs, we won't trim translog but leave it as is. Some unacknowledged operations existing in translog of that replica might suddenly reappear when it gets promoted. With this change, we ensure trimming translog above the starting sequence number of phase 2. This change can allow us to read translog forward.	2019-08-10 22:59:02 -04:00
Armin Braun	12ed6dc999	Only retain reasonable history for peer recoveries (#45208 ) (#45355 ) Today if a shard is not fully allocated we maintain a retention lease for a lost peer for up to 12 hours, retaining all operations that occur in that time period so that we can recover this replica using an operations-based recovery if it returns. However it is not always reasonable to perform an operations-based recovery on such a replica: if the replica is a very long way behind the rest of the replication group then it can be much quicker to perform a file-based recovery instead. This commit introduces a notion of "reasonable" recoveries. If an operations-based recovery would involve copying only a small number of operations, but the index is large, then an operations-based recovery is reasonable; on the other hand if there are many operations to copy across and the index itself is relatively small then it makes more sense to perform a file-based recovery. We measure the size of the index by computing its number of documents (including deleted documents) in all segments belonging to the current safe commit, and compare this to the number of operations a lease is retaining below the local checkpoint of the safe commit. We consider an operations-based recovery to be reasonable iff it would involve replaying at most 10% of the documents in the index. The mechanism for this feature is to expire peer-recovery retention leases early if they are retaining so much history that an operations-based recovery using that lease would be unreasonable. Relates #41536	2019-08-09 01:56:32 +02:00
Tim Brooks	af908efa41	Disable netty direct buffer pooling by default (#44837 ) Elasticsearch does not grant Netty reflection access to get Unsafe. The only mechanism that currently exists to free direct buffers in a timely manner is to use Unsafe. This leads to the occasional scenario, under heavy network load, that direct byte buffers can slowly build up without being freed. This commit disables Netty direct buffer pooling and moves to a strategy of using a single thread-local direct buffer for interfacing with sockets. This will reduce the memory usage from networking. Elasticsearch currently derives very little value from direct buffer usage (TLS, compression, Lucene, Elasticsearch handling, etc all use heap bytes). So this seems like the correct trade-off until that changes.	2019-08-08 15:10:31 -06:00
David Turner	355713b9ca	Improve slow logging in MasterService (#45241 ) Adds a tighter threshold for logging a warning about slowness in the `MasterService` instead of relying on the cluster service's 30-second warning threshold. This new threshold applies to the computation of the cluster state update in isolation, so we get a warning if computing a new cluster state update takes longer than 10 seconds even if it is subsequently applied quickly. It also applies independently to the length of time it takes to notify the cluster state tasks on completion of publication, in case any of these notifications holds up the master thread for too long. Relates #45007 Backport of #45086	2019-08-06 17:01:49 +01:00
Jason Tedor	5b1b146099	Normalize environment paths (#45179 ) This commit applies a normalization process to environment paths, both in how they are stored internally, also their settings values. This normalization is done via two means: - we make the paths absolute - we remove redundant name elements from the path (what Java calls "normalization") This change ensures that when we compare and refer to these paths within the system, we are using a common ground. For example, prior to the change if the data path was relative, we would not compare it correctly to paths from disk usage. This is because the paths in disk usage were being made absolute.	2019-08-06 06:04:30 -04:00
Yannick Welsch	7aeb2fe73c	Add per-socket keepalive options (#44055 ) Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings. By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like to explore whether we can enable them by default, in particular to force keepalive configurations that are better tuned for running ES.	2019-08-06 10:45:44 +02:00
Zachary Tong	3df1c76f9b	Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179 ) This adjusts the `buckets_path` parser so that pipeline aggs can select specific buckets (via their bucket keys) instead of fetching the entire set of buckets. This is useful for bucket_script in particular, which might want specific buckets for calculations. It's possible to work around this with `filter` aggs, but the workaround is hacky and probably less performant. - Adjusts documentation - Adds a barebones AggregatorTestCase for bucket_script - Tweaks AggTestCase to use getMockScriptService() for reductions and pipelines. Previously pipelines could just pass in a script service for testing, but this didn't work for regular aggs. The new getMockScriptService() method fixes that issue, but needs to be used for pipelines too. This had a knock-on effect of touching MovFn, AvgBucket and ScriptedMetric	2019-08-05 12:18:40 -04:00
David Turner	13a167051f	Remove fileBasedRecovery flag (#45146 ) Today `RecoveryTarget#prepareForTranslogOperations` takes a boolean flag indicating whether the recovery is file-based or not. This was used in 6.x to bootstrap some commit data that were missing in indices created in 5.x: `b506955f8d/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java (L298-L300)` This flag no longer has any effect, so this commit removes it. Backport of #45131 to 7.x.	2019-08-05 08:17:40 +01:00
Tim Brooks	984ba82251	Move nio channel initialization to event loop (#45155 ) Currently in the transport-nio work we connect and bind channels on a thread before the channel is registered with a selector. Additionally, it is at this point that we set all the socket options. This commit moves these operations onto the event-loop after the channel has been registered with a selector. It attempts to set the socket options for a non-server channel at registration time. If that fails, it will attempt to set the options after the channel is connected. This should fix #41071.	2019-08-02 17:31:31 -04:00
David Turner	9ff320d967	Use index for peer recovery instead of translog (#45137 ) Today we recover a replica by copying operations from the primary's translog. However we also retain some historical operations in the index itself, as long as soft-deletes are enabled. This commit adjusts peer recovery to use the operations in the index for recovery rather than those in the translog, and ensures that the replication group retains enough history for use in peer recovery by means of retention leases. Reverts #38904 and #42211 Relates #41536 Backport of #45136 to 7.x.	2019-08-02 15:00:43 +01:00
Armin Braun	9450505d5b	Stop Passing Around REST Request in Multiple Spots (#44949 ) (#45109 ) * Stop Passing Around REST Request in Multiple Spots * Motivated by #44564 * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap. * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.	2019-08-02 07:31:38 +02:00
David Turner	c088bafbbc	Wait for events in waitForRelocation (#45074 ) Adds a `waitForEvents(Priority.LANGUID)` to the cluster health request in `ESIntegTestCase#waitForRelocation()` to deal with the case that this health request returns successfully despite the fact that there is a pending reroute task which will relocate another shard. Relates #44433 Fixes #45003	2019-08-01 13:47:39 +01:00
Nhat Nguyen	979d0a71c7	Remove leniency during replay translog in peer recovery (#44989 ) This change removes leniency in InternalEngine during replaying translog in peer recovery.	2019-07-30 13:25:15 -04:00
Armin Braun	548c767b6b	S3 3rd Party Test Goal (#44799 ) (#45004 ) * Create S3 Third Party Test Task that Covers the S3 CLI Tool * Adjust snapshot cli test tool tests to work with real S3 * Build adjustment * Clean up repo path before testing * Dedup the logic for asserting path contents by using the correct utility method here that somehow became unused	2019-07-30 17:16:41 +02:00
David Turner	55f1dd8da6	Close nodes properly in Coordinator tests (#44967 ) Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses `onNode()` so has no effect if the node is not in the current list of nodes. It also discards the `Runnable` it creates without having run it, so has no effect anyway. This commit makes these tests much stricter about properly closing the nodes started during `Coordinator` tests, by tracking the persisted states that are opened, and adds an assertion to catch the trappy requirement that the closing node still belongs to the cluster.	2019-07-30 11:47:36 +01:00
Andrey Ershov	5a0bd696fc	Snapshot tool S3 cleanup 7.x backport (#44575 ) Backport of #44551	2019-07-30 11:02:08 +02:00
Nhat Nguyen	4813728783	Remove leniency in reset engine from translog (#44711 ) Replaying operations from the local translog must never fail as those operations were processed successfully on the primary before and the mapping is up to date already. This change removes leniency during resetting engine from translog in IndexShard and InternalEngine.	2019-07-29 16:31:45 -04:00
Yannick Welsch	8653c33838	Fix testBlockingIncomingRequests (#44939 ) Adapted test to take non-blocking nature into account.	2019-07-29 16:37:53 +02:00

2289 Commits