OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Turner	9ff320d967	Use index for peer recovery instead of translog (#45137 ) Today we recover a replica by copying operations from the primary's translog. However we also retain some historical operations in the index itself, as long as soft-deletes are enabled. This commit adjusts peer recovery to use the operations in the index for recovery rather than those in the translog, and ensures that the replication group retains enough history for use in peer recovery by means of retention leases. Reverts #38904 and #42211 Relates #41536 Backport of #45136 to 7.x.	2019-08-02 15:00:43 +01:00
Armin Braun	9450505d5b	Stop Passing Around REST Request in Multiple Spots (#44949 ) (#45109 ) * Stop Passing Around REST Request in Multiple Spots * Motivated by #44564 * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap. * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.	2019-08-02 07:31:38 +02:00
David Turner	c088bafbbc	Wait for events in waitForRelocation (#45074 ) Adds a `waitForEvents(Priority.LANGUID)` to the cluster health request in `ESIntegTestCase#waitForRelocation()` to deal with the case that this health request returns successfully despite the fact that there is a pending reroute task which will relocate another shard. Relates #44433 Fixes #45003	2019-08-01 13:47:39 +01:00
Nhat Nguyen	979d0a71c7	Remove leniency during replay translog in peer recovery (#44989 ) This change removes leniency in InternalEngine during replaying translog in peer recovery.	2019-07-30 13:25:15 -04:00
Armin Braun	548c767b6b	S3 3rd Party Test Goal (#44799 ) (#45004 ) * Create S3 Third Party Test Task that Covers the S3 CLI Tool * Adjust snapshot cli test tool tests to work with real S3 * Build adjustment * Clean up repo path before testing * Dedup the logic for asserting path contents by using the correct utility method here that somehow became unused	2019-07-30 17:16:41 +02:00
David Turner	55f1dd8da6	Close nodes properly in Coordinator tests (#44967 ) Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses `onNode()` so has no effect if the node is not in the current list of nodes. It also discards the `Runnable` it creates without having run it, so has no effect anyway. This commit makes these tests much stricter about properly closing the nodes started during `Coordinator` tests, by tracking the persisted states that are opened, and adds an assertion to catch the trappy requirement that the closing node still belongs to the cluster.	2019-07-30 11:47:36 +01:00
Andrey Ershov	5a0bd696fc	Snapshot tool S3 cleanup 7.x backport (#44575 ) Backport of #44551	2019-07-30 11:02:08 +02:00
Nhat Nguyen	4813728783	Remove leniency in reset engine from translog (#44711 ) Replaying operations from the local translog must never fail as those operations were processed successfully on the primary before and the mapping is up to date already. This change removes leniency during resetting engine from translog in IndexShard and InternalEngine.	2019-07-29 16:31:45 -04:00
Yannick Welsch	8653c33838	Fix testBlockingIncomingRequests (#44939 ) Adapted test to take non-blocking nature into account.	2019-07-29 16:37:53 +02:00
Yannick Welsch	24873dd3e3	Do not block transport thread on startup (#44939 ) We currently block the transport thread on startup, which has caused test failures. I think this is some kind of deadlock situation. I don't think we should even block a transport thread, and there's also no need to do so. We can just reject requests as long as we're not fully set up. Note that the HTTP layer is only started much later (after we've completed full start up of the transport layer), so that one should be completely unaffected by this. Closes #41745	2019-07-29 11:35:17 +02:00
Jason Tedor	6ea2b5dec0	Deprecate setting processors to more than available (#44889 ) Today the processors setting is permitted to be set to more than the number of processors available to the JVM. The processors setting directly sizes the number of threads in the various thread pools, with most of these sizes being a linear function in the number of processors. It doesn't make any sense to set processors very high as the overhead from context switching amongst all the threads will overwhelm, and changing the setting does not control how many physical CPU resources there are on which to schedule the additional threads. We have to draw a line somewhere and this commit deprecates setting processors to more than the number of available processors. This is the right place to draw the line given the linear growth as a function of processors in most of the thread pools, and that some are capped at the number of available processors already.	2019-07-26 17:06:44 +09:00
Yannick Welsch	0ce841915c	Add Clone Index API (#44267 ) Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the difference that the number of primary shards is kept the same. In the case where the filesystem provides hard-linking capabilities, this is a very cheap operation. Index cloning can be done by running `POST my_source_index/_clone/my_target_index` and it supports the same options as the split and shrink APIs. Closes #44128	2019-07-25 22:02:28 +02:00
Andrei Stefan	2633d11eb7	Switch from using docvalue_fields to extracting values from _source (#44062 ) (#44804 ) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. * This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias. (cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)	2019-07-25 10:02:41 +03:00
Igor Motov	f9943a3e53	Geo: deprecate ShapeBuilder in QueryBuilders (#44715 ) Removes now-unnecessary timeline decompositions from shape builders and deprecates ShapeBuilders in QueryBuilder in favor of libs/geo shapes. Relates to #40908	2019-07-24 14:27:58 -04:00
Armin Braun	d8be9244f9	Fix Repository Cleanup Test Correctness (#44738 ) (#44751 ) * The tests were creating the corruption and asserting its existence not on the repository base path but on a clean path. As a result the consistency assertion on the repository would never see the corruption and would pass even if the cleanup was broken for repositories that have a non-root base path	2019-07-24 16:03:37 +02:00
Armin Braun	818103ff1e	Fix testRetentionLeasesClearedOnRestore (#44754 ) (#44766 ) * Fix this test randomly failing when running into async translog persistence edge case and failing to successfully close index * Also, slightly improve debug logging on close failure * Closes #44681	2019-07-23 21:29:07 +02:00
Igor Motov	9338fc8536	GEO: Switch to using GeoTestUtil to generate random geo shapes (#44635 ) Switches to more robust way of generating random test geometries by reusing lucene's GeoTestUtil. Removes duplicate random geometry generators by moving them to the test framework. Closes #37278	2019-07-23 14:30:41 -04:00
Mayya Sharipova	972a49312c	Fix testQuotedQueryStringWithBoost test (#43385 ) Add more logging to indexRandom Seems that asynchronous indexing from indexRandom sometimes indexes the same document twice, which will mess up the expected score calculations. For example, indexing: { "index" : {"_id" : "1" } } {"important" :"phrase match", "less_important": "nothing important"} { "index" : {"_id" : "2" } } {"important" :"nothing important", "less_important" :"phrase match"} Produces the expected scores: 13.8 for doc1, and 1.38 for doc2 indexing: { "index" : {"_id" : "1" } } {"important" :"phrase match", "less_important": "nothing important"} { "index" : {"_id" : "2" } } {"important" :"nothing important", "less_important" :"phrase match"} { "index" : {"_id" : "3" } } {"important" :"phrase match", "less_important": "nothing important"} Produces scores: 9.4 for doc1, and 1.96 for doc2 which are found in the error logs. Relates to #43144	2019-07-22 08:44:31 -04:00
Ryan Ernst	4c05d25ec7	Convert Transport Request/Response to Writeable (#44636 ) (#44654 ) This commit converts all remaining TransportRequest and TransportResponse classes to implement Writeable, and disallows Streamable implementations. relates #34389	2019-07-20 11:25:58 -07:00
Ryan Ernst	f4ee2e9e91	Convert direct implementations of Streamable to Writeable (#44605 ) (#44646 ) This commit converts Streamable to Writeable for direct implementations. relates #34389	2019-07-20 08:32:29 -07:00
Ryan Ernst	f193d14764	Convert remaining Action Response/Request to writeable.reader (#44528 ) (#44607 ) This commit converts readFrom to ctor with StreamInput on the remaining ActionResponse and ActionRequest classes. relates #34389	2019-07-19 13:33:38 -07:00
Lee Hinman	fe2ef66e45	Expose index age in ILM explain output (#44457 ) * Expose index age in ILM explain output This adds the index's age to the ILM explain output, for example: ``` { "indices" : { "ilm-000001" : { "index" : "ilm-000001", "managed" : true, "policy" : "full-lifecycle", "lifecycle_date" : "2019-07-16T19:48:22.294Z", "lifecycle_date_millis" : 1563306502294, "age" : "1.34m", "phase" : "hot", "phase_time" : "2019-07-16T19:48:22.487Z", ... etc ... } } } ``` This age can be used to tell when ILM will transition the index to the next phase, based on that phase's `min_age`. Resolves #38988 * Expose age in getters and in HLRC	2019-07-18 15:33:45 -06:00
Andrey Ershov	ef6ddd15c6	Revert "Snapshot tool: S3 orphaned files cleanup (#44551)" This reverts commit `09edeeb3`	2019-07-18 17:21:45 +02:00
Andrey Ershov	6f5327ba45	Fix BlobStoreTestUtil	2019-07-18 17:00:23 +02:00
Andrey Ershov	09edeeb38e	Snapshot tool: S3 orphaned files cleanup (#44551 ) A tool to work with snapshots. Co-authored by @original-brownbear. This commit adds snapshot tool and the single command cleanup, that cleans up orphaned files for S3. Snapshot tool lives in x-pack/snapshot-tool. (cherry picked from commit fc4aed44dd975d83229561090f957a95cc76b287)	2019-07-18 16:38:00 +02:00
David Turner	452f7f67a0	Defer reroute when starting shards (#44539 ) Today we reroute the cluster as part of the process of starting a shard, which runs at `URGENT` priority. In large clusters, rerouting may take some time to complete, and this means that a mere trickle of shard-started events can cause starvation for other, lower-priority, tasks that are pending on the master. However, it isn't really necessary to perform a reroute when starting a shard, as long as one occurs eventually. This commit removes the inline reroute from the process of starting a shard and replaces it with a deferred one that runs at `NORMAL` priority, avoiding starvation of higher-priority tasks. Backport of #44433 and #44543.	2019-07-18 14:10:40 +01:00
Nhat Nguyen	51180af91d	Make peer recovery send file chunks async (#44468 ) Relates #44040 Relates #36195	2019-07-17 22:25:43 -04:00
Nhat Nguyen	458f24c46a	Reenable accounting circuit breaker (#44495 ) We have a new Lucene 8.2 snapshot on master and 7.x; hence we can re-enable the accounting on these branches. Relates #30290	2019-07-17 22:25:43 -04:00
Jason Tedor	39c5f98de7	Introduce test issue logging (#44477 ) Today we have an annotation for controlling logging levels in tests. This annotation serves two purposes, one is to control the logging level used in tests, when such control is needed to impact and assert the behavior of loggers in tests. The other use is when a test is failing and additional logging is needed. This commit separates these two concerns into separate annotations. The primary motivation for this is that we have a history of leaving behind the annotation for the purpose of investigating test failures long after the test failure is resolved. The accumulation of these stale logging annotations has led to excessive disk consumption. Having recently cleaned this up, we would like to avoid falling into this state again. To do this, we are adding a link to the test failure under investigation to the annotation when used for the purpose of investigating test failures. We will add tooling to inspect these annotations, in the same way that we have tooling on awaits fix annotations. This will enable us to report on the use of these annotations, and report when stale uses of the annotation exist.	2019-07-18 05:33:33 +09:00
Yannick Welsch	f78e64e3e2	Terminate linearizability check early on large histories (#44444 ) Large histories can be problematic and have the linearizability checker occasionally run OOM. As it's very difficult to bound the size of the histories just right, this PR will let it instead run for 10 seconds on large histories and then abort. Closes #44429	2019-07-17 18:51:25 +02:00
Armin Braun	c8db0e9b7e	Remove blobExists Method from BlobContainer (#44472 ) (#44475 ) * We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface * Keep it as an implementation detail in the Azure repository	2019-07-17 11:56:02 +02:00
Tim Brooks	0a352486e8	Isolate nio channel registered from channel active (#44388 ) Registering a channel with a selector is a required operation for the channel to be handled properly. Currently, we mix the registration with other setup operations (ip filtering, SSL initiation, etc). However, a failure to register is fatal. This PR modifies how registration occurs to immediately close the channel if it fails. There are still two clear loopholes for how a user can interact with a channel even if registration fails. 1. through the exception handler. 2. through the channel accepted callback. These can perhaps be improved in the future. For now, this PR prevents writes from proceeding if the channel is not registered.	2019-07-16 17:18:57 -06:00
Nhat Nguyen	301c8daf4c	Revert "Make peer recovery send file chunks async (#44040 )" This reverts commit `a2b4687d89`.	2019-07-16 14:18:35 -04:00
Nhat Nguyen	a2b4687d89	Make peer recovery send file chunks async (#44040 )	2019-07-16 10:43:46 -04:00
Lee Hinman	fb0461ac76	[7.x] Add Snapshot Lifecycle Management (#44382 ) * Add Snapshot Lifecycle Management (#43934) * Add SnapshotLifecycleService and related CRUD APIs This commit adds `SnapshotLifecycleService` as a new service under the ilm plugin. This service handles snapshot lifecycle policies by scheduling based on the policies defined schedule. This also includes the get, put, and delete APIs for these policies Relates to #38461 * Make scheduledJobIds return an immutable set * Use Object.equals for SnapshotLifecyclePolicy * Remove unneeded TODO * Implement ToXContentFragment on SnapshotLifecyclePolicyItem * Copy contents of the scheduledJobIds * Handle snapshot lifecycle policy updates and deletions (#40062) (Note this is a PR against the `snapshot-lifecycle-management` feature branch) This adds logic to `SnapshotLifecycleService` to handle updates and deletes for snapshot policies. Policies with incremented versions have the old policy cancelled and the new one scheduled. Deleted policies have their schedules cancelled when they are no longer present in the cluster state metadata. Relates to #38461 * Take a snapshot for the policy when the SLM policy is triggered (#40383) (This is a PR for the `snapshot-lifecycle-management` branch) This commit fills in `SnapshotLifecycleTask` to actually perform the snapshotting when the policy is triggered. Currently there is no handling of the results (other than logging) as that will be added in subsequent work. This also adds unit tests and an integration test that schedules a policy and ensures that a snapshot is correctly taken. Relates to #38461 * Record most recent snapshot policy success/failure (#40619) Keeping a record of the results of the successes and failures will aid troubleshooting of policies and make users more confident that their snapshots are being taken as expected. This is the first step toward writing history in a more permanent fashion. 
* Validate snapshot lifecycle policies (#40654) (This is a PR against the `snapshot-lifecycle-management` branch) With the commit, we now validate the content of snapshot lifecycle policies when the policy is being created or updated. This checks for the validity of the id, name, schedule, and repository. Additionally, cluster state is checked to ensure that the repository exists prior to the lifecycle being added to the cluster state. Part of #38461 * Hook SLM into ILM's start and stop APIs (#40871) (This pull request is for the `snapshot-lifecycle-management` branch) This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are cancelled. Relates to #38461 * Add tests for SnapshotLifecyclePolicyItem (#40912) Adds serialization tests for SnapshotLifecyclePolicyItem. * Fix improper import in build.gradle after master merge * Add human readable version of modified date for snapshot lifecycle policy (#41035) * Add human readable version of modified date for snapshot lifecycle policy This small change changes it from: ``` ... "modified_date": 1554843903242, ... ``` To ``` ... "modified_date" : "2019-04-09T21:05:03.242Z", "modified_date_millis" : 1554843903242, ... ``` Including the `"modified_date"` field when the `?human` field is used. Relates to #38461 * Fix test * Add API to execute SLM policy on demand (#41038) This commit adds the ability to perform a snapshot on demand for a policy. This can be useful to take a snapshot immediately prior to performing some sort of maintenance. ```json PUT /_ilm/snapshot/<policy>/_execute ``` And it returns the response with the generated snapshot name: ```json { "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug" } ``` Note that this does not allow waiting for the snapshot, and the snapshot could still fail. It does record this information into the cluster state similar to a regularly triggered SLM job. 
Relates to #38461 * Add next_execution to SLM policy metadata (#41221) * Add next_execution to SLM policy metadata This adds the next time a snapshot lifecycle policy will be executed when retrieving a policy's metadata, for example: ```json GET /_ilm/snapshot?human { "production" : { "version" : 1, "modified_date" : "2019-04-15T21:16:21.865Z", "modified_date_millis" : 1555362981865, "policy" : { "name" : "<production-snap-{now/d}>", "schedule" : "/30 * * * ?", "repository" : "repo", "config" : { "indices" : [ "foo-", "important" ], "ignore_unavailable" : true, "include_global_state" : false } }, "next_execution" : "2019-04-15T21:16:30.000Z", "next_execution_millis" : 1555362990000 }, "other" : { "version" : 1, "modified_date" : "2019-04-15T21:12:19.959Z", "modified_date_millis" : 1555362739959, "policy" : { "name" : "<other-snap-{now/d}>", "schedule" : "0 30 2 * ?", "repository" : "repo", "config" : { "indices" : [ "other" ], "ignore_unavailable" : false, "include_global_state" : true } }, "next_execution" : "2019-04-16T02:30:00.000Z", "next_execution_millis" : 1555381800000 } } ``` Relates to #38461 * Fix and enhance tests * Figured out how to Cron * Change SLM endpoint from /_ilm/* to /_slm/* (#41320) This commit changes the endpoint for snapshot lifecycle management from: ``` GET /_ilm/snapshot/<policy> ``` to: ``` GET /_slm/policy/<policy> ``` It mimics the ILM path only using `slm` instead of `ilm`. Relates to #38461 * Add initial documentation for SLM (#41510) * Add initial documentation for SLM This adds the initial documentation for snapshot lifecycle management. It also includes the REST spec API json files since they're sort of documentation. Relates to #38461 * Add `manage_slm` and `read_slm` roles (#41607) * Add `manage_slm` and `read_slm` roles This adds two more built in roles - `manage_slm` which has permission to perform any of the SLM actions, as well as stopping, starting, and retrieving the operation status of ILM. 
`read_slm` which has permission to retrieve snapshot lifecycle policies as well as retrieving the operation status of ILM. Relates to #38461 * Add execute to the test * Fix ilm -> slm typo in test * Record SLM history into an index (#41707) It is useful to have a record of the actions that Snapshot Lifecycle Management takes, especially for the purposes of alerting when a snapshot fails or has not been taken successfully for a certain amount of time. This adds the infrastructure to record SLM actions into an index that can be queried at leisure, along with a lifecycle policy so that this history does not grow without bound. Additionally, SLM automatically setting up an index + lifecycle policy leads to `index_lifecycle` custom metadata in the cluster state, which some of the ML tests don't know how to deal with due to setting up custom `NamedXContentRegistry`s. Watcher would cause the same problem, but it is already disabled (for the same reason). * High Level Rest Client support for SLM (#41767) * High Level Rest Client support for SLM This commit adds HLRC support for SLM. Relates to #38461 * Fill out documentation tests with tags * Add more callouts and asciidoc for HLRC * Update javadoc links to real locations * Add security test testing SLM cluster privileges (#42678) * Add security test testing SLM cluster privileges This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm` cluster privileges. Relates to #38461 * Don't redefine vars * Add Getting Started Guide for SLM (#42878) This commit adds a basic Getting Started Guide for SLM. * Include SLM policy name in Snapshot metadata (#43132) Keep track of which SLM policy in the metadata field of the Snapshots taken by SLM. This allows users to more easily understand where the snapshot came from, and will enable future SLM features such as retention policies. 
* Fix compilation after master merge * [TEST] Move exception wrapping for devious exception throwing Fixes an issue where an exception was created from one line and thrown in another. * Fix SLM for the change to AcknowledgedResponse * Add Snapshot Lifecycle Management Package Docs (#43535) * Fix compilation for transport actions now that task is required * Add a note mentioning the privileges needed for SLM (#43708) * Add a note mentioning the privileges needed for SLM This adds a note to the top of the "getting started with SLM" documentation mentioning that there are two built-in privileges to assist with creating roles for SLM users and administrators. Relates to #38461 * Mention that you can create snapshots for indices you can't read * Fix REST tests for new number of cluster privileges * Mute testThatNonExistingTemplatesAreAddedImmediately (#43951) * Fix SnapshotHistoryStoreTests after merge * Remove overridden newResponse functions that have been removed * Fix compilation for backport * Fix get snapshot output parsing in test * [DOCS] Add redirects for removed autogen anchors (#44380) * Switch <tt>...</tt> in javadocs for {@code ...}	2019-07-16 07:37:13 -06:00
Armin Braun	099d52f3b0	Prevent Confusing Blocked Thread Warnings in MockNioTransport (#44356 ) (#44376 ) * Prevent Confusing Blocked Thread Warnings in MockNioTransport * We can run into a race where the stacktrace collection and subsequent logging happens after the thread has already unblocked thus logging a confusing stacktrace of wherever the transport thread was after it became unblocked * Fixed this by comparing whether or not the recorded timestamp is still the same before and after the stacktrace was recorded and not logging if it already changed	2019-07-16 04:40:50 +02:00
David Turner	86ee8eab3f	Allow RerouteService to reroute at lower priority (#44338 ) Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH` priority, but in some cases a lower priority would be more appropriate. This commit adds the facility to submit delayed reroute tasks at different priorities, such that each submitted reroute task runs at a priority no lower than the one requested. It does not change the fact that all delayed reroute tasks are submitted at `HIGH` priority, but at least it makes this explicit.	2019-07-15 17:41:39 +01:00
Christoph Büscher	835b7a120d	Fix AnalyzeAction response serialization (#44284 ) Currently we lose information about whether a token list in an AnalyzeAction response is null or an empty list, because we write a 0 value to the stream in both cases and deserialize to a null value on the receiving side. This change fixes this so we write an additional flag indicating whether the value is null or not, followed by the size of the list and its content. Closes #44078	2019-07-14 10:35:11 +02:00
Przemyslaw Gomulka	e23ecc5838	JSON logging refactoring and X-Opaque-ID support backport (#41354 ) (#44178 ) This is a refactor to current JSON logging to make it more open for extensions and support for custom ES log messages used in DeprecationLogger, IndexingSlowLog, SearchSlowLog. We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMessage). These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter. A similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field. closes #41350 backport #41354	2019-07-12 16:53:27 +02:00
Yannick Welsch	068286ca4b	Remove RemoteClusterConnection.ConnectedNodes (#44235 ) This instead exposes the set of connected nodes on ConnectionManager.	2019-07-12 14:54:21 +02:00
Armin Braun	6c02cf0241	Fix InternalTestCluster StopRandomNode Assertion (#44258 ) (#44265 ) * The assertion added in #44214 is tripped by tests running dedicated test clusters per test needlessly. This breaks existing tests like the one in #44245. * Closes #44245	2019-07-12 13:18:55 +02:00
David Turner	735c897ec6	Avoid counting votes from master-ineligible nodes (#43688 ) Today if a master-eligible node is converted to a master-ineligible node it may remain in the voting configuration, meaning that the master node may count its publish responses as an indication that it has properly persisted the cluster state. However master-ineligible nodes do not properly persist the cluster state, so it is not safe to count these votes. This change adjusts `CoordinationState` to take account of this from a safety point of view, and also adjusts the `Coordinator` to prevent such nodes from joining the cluster. Instead, it triggers a reconfiguration to remove from the voting configuration a node that now appears to be master-ineligible before processing its join. Backport of #43688, see #44260.	2019-07-12 11:30:52 +01:00
Alpar Torok	8d35583c43	Fix port range allocation with large worker IDs (#44213 ) * Fix port range allocation with large worker IDs Relates to #43983 The IDs gradle uses are incremented for the lifetime of the daemon which can result in port ranges that are outside the valid range. This change implements a modulo based formula to wrap the port ranges when the IDs get too large. Addresses #44134 but #44157 is also required to be able to close it.	2019-07-12 11:04:57 +03:00
Yogesh Gaikwad	91c342a888	fix and enable repository-hdfs secure tests (#44044 ) (#44199 ) Due to recent changes made to convert `repository-hdfs` to test clusters (#41252), the `integTestSecure*` tasks did not depend on `secureHdfsFixture`, so they would fail when run because the fixture was not available. This commit adds the dependency of the fixture to the task. The `secureHdfsFixture` is an `AntFixture` which is spawned as a process. Internally it waits for 30 seconds for the resources to be made available. On my local machine it took almost 45 seconds to become available, so I have added the wait time as an input to the `AntFixture`, which defaults to 30 seconds, and set it to 60 seconds for the secure hdfs fixture. The integ test for secure hdfs was disabled for a long time and so the changes done in #42090 to fix the tests are also done in this commit.	2019-07-12 12:44:01 +10:00
Armin Braun	2768662822	Cleanup Stale Root Level Blobs in Sn. Repository (#43542 ) (#44226 ) * Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures * The scenario that gets us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot * Not deleting other dangling blobs since they don't follow the snap-, meta- or tempfile naming schemes, to not accidentally delete blobs not created by the snapshot logic * Follow up to #42189 * Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs were written	2019-07-11 19:35:15 +02:00
Armin Braun	5f22370b6b	Fix ShrinkIndexIT (#44214 ) (#44223 ) * Fix ShrinkIndexIT * Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164	2019-07-11 17:58:00 +02:00
Nick Knize	374030a53f	Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171 ) (#44184 ) Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378	2019-07-11 09:17:22 -05:00
Alpar Torok	7ba18732f7	Run some REST tests against a cluster running in docker containers (#39515 ) * Run REST tests against a cluster running on docker Closes #38053	2019-07-11 15:28:33 +03:00
Armin Braun	8ce8c627dd	Some Cleanup in o.e.i.shard (#44097 ) (#44208 ) * Some Cleanup in o.e.i.shard * Extract one duplicated method * Cleanup obviously unused code	2019-07-11 13:52:06 +02:00
Yannick Welsch	2ee07f1ff4	Simplify port usage in transport tests (#44157 ) Simplifies AbstractSimpleTransportTestCase to use JVM-local ports and also adds an assertion so that cases like #44134 can be more easily debugged. The likely reason for that one is that a test, which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle daemon) kept increasing Gradle worker IDs, causing an overflow at some point.	2019-07-11 13:35:37 +02:00
Armin Braun	c0ed64bb92	Improve Repository Consistency Check in Tests (#44204 ) * Improve Repository Consistency Check in Tests (#44099) * Check that index metadata as well as snapshot metadata always exists when referenced by other metadata * Fix SnapshotResiliencyTests on ExtraFS (#44113) * As a result of #44099 we're now checking more directories and have to ignore the `extraN` folders for those like we do for indices already * Closes #44112	2019-07-11 11:14:37 +02:00
Armin Braun	8a554f9737	Remove IncompatibleSnapshots Logic from Codebase (#44096 ) (#44183 ) * The incompatible snapshots logic was created to track 1.x snapshots that became incompatible with 2.x * It serves no purpose at this point * It adds an additional GET request to every loading of RepositoryData (from loading the incompatible snapshots blob)	2019-07-11 07:15:51 +02:00
Armin Braun	d6f09fdb97	Add WARN Logging if Mock Network Accepts Huge Number of Connections (#44169 ) (#44182 ) * Add WARN Logging if Mock Network Accepts Huge Number of Connections * As discussed, added warn logging to rule out endless accept loops for #43387 * Had to handle it by the relatively awkward override in the mock nio because we don't have logging in the NIO module where (`ServerChannelContext` lives)	2019-07-10 22:08:36 +02:00
Ryan Ernst	fb77d8f461	Removed writeTo from TransportResponse and ActionResponse (#44092 ) The base classes for transport requests and responses currently implement Streamable and Writeable. The writeTo method on these base classes is implemented with an empty implementation. Not only does this complicate subclasses to think they need to call super.writeTo, but it also can lead to not implementing writeTo when it should have been implemented, or extending one of these classes when not necessary, since there is nothing to actually implement. This commit removes the empty writeTo from these base classes, and fixes subclasses to not call super and in some cases implement an empty writeTo themselves. relates #34389	2019-07-10 12:42:04 -07:00
Zachary Tong	92ad588275	Remove generic on AggregatorFactory (#43664 ) (#44079 ) AggregatorFactory was generic over itself, but it doesn't appear we use this functionality anywhere (e.g. to allow the super class to declare arguments/return types generically for subclasses to override). Most places use a wildcard constraint, and even when a concrete type is specified it wasn't used. But since AggFactories are widely used, this led to the generic touching many pieces of code and making type signatures fairly complex	2019-07-10 13:20:28 -04:00
David Turner	aec44fecbc	Decouple DiskThresholdMonitor & ClusterInfoService (#44105 ) Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at construction time so that it can notify it when nodes report changes in their disk usage, but this is awkward to construct: the `DiskThresholdMonitor` requires a `RerouteService` which requires an `AllocationService` which comes from the `ClusterModule` which requires the `ClusterInfoService`. Today we break the cycle with a `LazilyInitializedRerouteService` which is itself a little ugly. This commit replaces this with a more traditional subject/observer relationship between the `ClusterInfoService` and the `DiskThresholdMonitor`.	2019-07-09 18:43:32 +01:00
David Turner	268971db03	Wait for blackholed connection before discovery (#44077 ) Since #42636 we no longer treat connections specially when simulating a blackholed connection. This means that at the end of the safety phase we may have just started a connection attempt which will time out, but the default timeout is 30 seconds, much longer than the 2 seconds we normally allow for post-safety-phase discovery. This commit adds time for such a connection attempt to time out. It also fixes some spurious logging of `this` that now refers to an object with an unhelpful `toString()` implementation introduced in #42636. Fixes #44073	2019-07-09 10:59:53 +01:00
Armin Braun	9eac5ceb1b	Dry up inputstream to bytesreference (#43675 ) (#44094 ) * Dry up Reading InputStream to BytesReference * Dry up spots where we use the same pattern to get from an InputStream to a BytesReferences	2019-07-09 09:18:25 +02:00
David Turner	6dce458ecc	Randomise retention lease expiry time (#44067 ) In today's test suite indices mostly use the default value of `12h` for the `index.soft_deletes.retention_lease.period` setting, which in the context of the test suite essentially means "never expires". In fact, the tests should all behave correctly even if the lease period is much shorter; tests that rely on leases not expiring should configure their indices appropriately. This commit randomises the lease expiry time for those indices created during tests which do not set a specific value for this setting.	2019-07-08 18:29:27 +02:00
Alpar Torok	0c8294e633	Make sure the clean task doesn't break test fixtures (#43641 ) Use a dedicated fixture dir.	2019-07-08 17:58:27 +03:00
Armin Braun	afe81fd625	Some Cleanup in Test Framework (#44039 ) (#44059 ) * Remove some obvious dead code * Move assert methods that were only used in a single test class to the child they belong to * Inline some redundant methods	2019-07-08 14:15:31 +02:00
Armin Braun	af9b98e81c	Recursively Delete Unreferenced Index Directories (#42189 ) (#44051 ) * Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes * Runs after every delete operation * Relates #13159 (fixing most of the issues caused by unreferenced indices, leaving some meta files to be cleaned up only)	2019-07-08 10:55:39 +02:00
Nhat Nguyen	9089820d8f	Enable indexing optimization using sequence numbers on replicas (#43616 ) This PR enables the indexing optimization using sequence numbers on replicas. With this optimization, indexing on replicas should be faster and use less memory as it can forgo the version lookup when possible. This change also deactivates the append-only optimization on replicas. Relates #34099	2019-07-05 22:12:08 -04:00
Yannick Welsch	504a43d43a	Move ConnectionManager to async APIs (#42636 ) This commit converts the ConnectionManager's openConnection and connectToNode methods to async-style. This will allow us to not block threads anymore when opening connections. This PR also adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove some hacks in the test infrastructure that had to account for the previous synchronous nature of the connection APIs.	2019-07-05 20:40:22 +02:00
Yannick Welsch	d090fa514f	Use unique ports per test worker (#43983 ) * Use unique ports per test worker * Add test for system property * check presence of tests.gradle * Revert "check presence of tests.gradle" This reverts commit 2fee7512a28f95c94c5bf7a3312e808f918a9510.	2019-07-05 11:02:28 +02:00
Jim Ferenczi	cdf55cb5c5	Refactor index engines to manage readers instead of searchers (#43860 ) This commit changes the way we manage refreshes in the index engines. Instead of relying on a SearcherManager, this change uses a ReaderManager that creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand (when acquireSearcher is called) from the current ElasticsearchDirectoryReader. It also slightly changes the Engine.Searcher to extend IndexSearcher in order to simplify the usage in the consumer.	2019-07-04 22:49:43 +02:00
Alan Woodward	4b99255fed	Add name() method to TokenizerFactory (#43909 ) This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory, and removes the need to pass around tokenizer names when building custom analyzers. As this means that TokenizerFactory is no longer a functional interface, the commit also adds a factory method to TokenizerFactory to make construction simpler.	2019-07-04 11:28:55 +01:00
Armin Braun	be20fb80e4	Recursive Delete on BlobContainer (#43281 ) (#43920 ) This is a prerequisite of #42189: * Add directory delete method to blob container specific to each implementation: * Some notes on the implementations: * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting. * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block. * HDFS and FS are trivial since we have directory delete methods available for them * Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)	2019-07-03 17:14:57 +02:00
Armin Braun	455b12a4fb	Add Ability to List Child Containers to BlobContainer (#42653 ) (#43903 ) * Add Ability to List Child Containers to BlobContainer (#42653) * Add Ability to List Child Containers to BlobContainer * This is a prerequisite of #42189	2019-07-03 11:30:49 +02:00
Armin Braun	826f38cd70	Enable Parallel Deletes in Azure Repository (#42783 ) (#43886 ) * Parallel deletes via private thread pool	2019-07-03 09:28:39 +02:00
Zachary Tong	ea1794832f	Add RareTerms aggregation (#35718 ) This adds a `rare_terms` aggregation. It is an aggregation designed to identify the long-tail of keywords, e.g. terms that are "rare" or have low doc counts. This aggregation is designed to be more memory efficient than the alternative, which is setting a terms aggregation to size: LONG_MAX (or worse, ordering a terms agg by count ascending, which has unbounded error). This aggregation works by maintaining a map of terms that have been seen. A counter associated with each value is incremented when we see the term again. If the counter surpasses a predefined threshold, the term is removed from the map and inserted into a cuckoo filter. If a future term is found in the cuckoo filter we assume it was previously removed from the map and is "common". The map keys are the "rare" terms after collection is done.	2019-07-01 10:30:02 -04:00
Nhat Nguyen	598e00a689	Make peer recovery send file info step async (#43792 ) Relates #36195	2019-07-01 08:40:45 -04:00
Ryan Ernst	3a2c698ce0	Rename Action to ActionType (#43778 ) Action is a class that encapsulates meta information about an action that allows it to be called remotely, specifically the action name and response type. With recent refactoring, the action class can now be constructed as a static constant, instead of needing to create a subclass. This makes the old pattern of creating a singleton INSTANCE both misnamed and lacking a common placement. This commit renames Action to ActionType, thus allowing the old INSTANCE naming pattern to be TYPE on the transport action itself. ActionType also conveys that this class is also not the action itself, although this change does not rename any concrete classes as those will be removed organically as they are converted to TYPE constants. relates #34389	2019-06-30 22:00:17 -07:00
David Turner	fca7a19713	Avoid parallel reroutes in DiskThresholdMonitor (#43381 ) Today the `DiskThresholdMonitor` limits the frequency with which it submits reroute tasks, but it might still submit these tasks faster than the master can process them if, for instance, each reroute takes over 60 seconds. This causes a problem since the reroute task runs with priority `IMMEDIATE` and is always scheduled when there is a node over the high watermark, so this can starve any other pending tasks on the master. This change avoids further updates from the monitor while its last task(s) are still in progress, and it measures the time of each update from the completion time of the reroute task rather than its start time, to allow a larger window for other tasks to run. It also now makes use of the `RoutingService` to submit the reroute task, in order to batch this task with any other pending reroutes. It enhances the `RoutingService` to notify its listeners on completion. Fixes #40174 Relates #42559	2019-06-30 16:54:16 +01:00
Nhat Nguyen	55b3ec8d7b	Make peer recovery clean files step async (#43787 ) Relates #36195	2019-06-29 18:30:51 -04:00
Albert Zaharovits	5e17bc5dcc	Consistent Secure Settings #40416 Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes a single public method, namely `allSecureSettingsConsistent`. The method returns `true` if the local node's secure settings (inside the keystore) are equal to the master's, and `false` otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for all `SecureSetting` instances that are flagged with the newly introduced `Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent` is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.	2019-06-29 23:26:17 +03:00
Jim Ferenczi	7ca69db83f	Refactor IndexSearcherWrapper to disallow the wrapping of IndexSearcher (#43645 ) This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection between the query and the live docs instead of checking the live docs on every document that matches the query.	2019-06-28 16:28:02 +02:00
Christoph Büscher	2cc7f5a744	Allow reloading of search time analyzers (#43313 ) Currently changing resources (like dictionaries, synonym files etc...) of search time analyzers is only possible by closing an index, changing the underlying resource (e.g. synonym files) and then re-opening the index for the change to take effect. This PR adds a new API endpoint that allows triggering reloading of certain analysis resources (currently token filters) that will then pick up changes in underlying file resources. To achieve this we introduce a new type of custom analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows swapping out analysis components. Custom analyzers that contain filters that are marked as "updateable" will automatically choose this implementation. This PR also adds this capability to `synonym` token filters for use in search time analyzers. Relates to #29051	2019-06-28 09:55:40 +02:00
Nhat Nguyen	ce8771feb7	Do not use MockInternalEngine in GatewayIndexStateIT (#43716 ) GatewayIndexStateIT#testRecoverBrokenIndexMetadata relies on flushing on shutdown. This behaviour, however, can be randomly disabled in MockInternalEngine. Closes #43034	2019-06-27 18:28:04 -04:00
Yannick Welsch	6744344ef2	Handle situation where only voting-only nodes are bootstrapped (#43628 ) Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will still try to become elected and bring full master nodes into the cluster.	2019-06-27 18:10:15 +02:00
David Roberts	c5beb05f77	[ML][DataFrame] Consider data frame templates internal in REST tests (#43692 ) The data frame index template pattern was not in the list considered as internal and therefore not needing cleanup after every test.	2019-06-27 14:40:30 +01:00
Christoph Büscher	36360358b2	Move query builder caching check to dedicated tests (#43238 ) Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable flag. This is a bit fragile due to the high randomization of query builders performed by this general test. Also we might only rarely check the "interesting" cases because they rarely get generated when fully randomizing the query builder. This change moves the general checks out of #testToQuery and instead adds dedicated cache tests for those query builders that exhibit something other than the default behaviour. Closes #43200	2019-06-27 14:56:29 +02:00
Yannick Welsch	2049f715b3	Add voting-only master node (#43410 ) A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two can still elect a master amongst themselves. This only requires one of the two remaining nodes to have the capability to act as master, but both need to have voting powers. This means that one of the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes. Closes #14340 Co-authored-by: David Turner <david.turner@elastic.co>	2019-06-26 08:07:56 +02:00
Zachary Tong	63fef5a31e	Add scripting support to AggregatorTestCase (#43494 ) This refactors AggregatorTestCase to allow testing mock scripts. The main change is to QueryShardContext. This was previously mocked, but to get the ScriptService you have to invoke a final method which can't be mocked. Instead, we just create a mostly-empty QueryShardContext and populate the fields that are needed for testing. It also introduces a few new helper methods that can be overridden to change the default behavior a bit. Most tests should be able to override getMockScriptService() to supply a ScriptService to the context, which is later used by the aggs. More complicated tests can override queryShardContextMock() as before. Adds a test to MaxAggregatorTests to test out the new functionality.	2019-06-25 11:52:12 -04:00
Yannick Welsch	d45f12799c	Sync global checkpoint on pending in-sync shards (#43526 ) At the end of a peer recovery the primary wants to mark the replica as in-sync. For that the persisted local checkpoint of the replica needs to have caught up with the global checkpoint on the primary. If translog durability is set to ASYNC, this means that information about the persisted local checkpoint can lag on the primary and might need to be explicitly fetched through a global checkpoint sync action. Unfortunately, that action will only be triggered after 30 seconds, and, even worse, will only run based on what the in-sync shard copies say (see IndexShard.maybeSyncGlobalCheckpoint). As the replica has not been marked as in-sync yet, it is not taken into consideration, and the primary might have its global checkpoint equal to the max seq no, so it thinks nothing needs to be done. Closes #43486	2019-06-24 18:35:57 +02:00
Armin Braun	1053a89b79	Log Blocked IO Thread State (#43424 ) (#43447 ) * Let's log the state of the thread to find out if it's dead-locked or just stuck after being suspended * Relates #43392	2019-06-20 22:31:36 +02:00
Yannick Welsch	29d76baf7d	Increase timeout for assertSeqNos Helps with tests that do async translog syncing	2019-06-20 19:06:49 +02:00
Zachary Tong	a8a81200d0	Better support for unmapped fields in AggregatorTestCase (#43405 ) AggregatorTestCase will NPE if only a single, null MappedFieldType is provided (which is required to simulate an unmapped field). While it's possible to test unmapped fields by supplying other, non-related field types... that's clunky and unnecessary. AggregatorTestCase just needs to filter out null field types when setting up.	2019-06-20 11:31:49 -04:00
Armin Braun	7d1983a7e3	Fix Operation Timestamps in Tests (#43155 ) (#43419 ) * For the issue in #43086 we were running into inactive shards because the random timestamps previously used would randomly make `org.elasticsearch.index.shard.IndexShard#checkIdle` see an incorrect+huge inactive time * Also fixed one other spot in tests that passed `ms` instead of `ns` for the same timestamp on an index op to correctly use relative `ns` * Closes #43086	2019-06-20 16:36:17 +02:00
David Turner	c8eb09f158	Fail connection attempts earlier in tests (#43320 ) Today the `DisruptibleMockTransport` always allows a connection to a node to be established, and then fails requests sent to that node such as the subsequent handshake. Since #42342, we log handshake failures on an open connection as a warning, and this makes the test logs rather noisy. This change fails the connection attempt first, avoiding these unrealistic warnings.	2019-06-20 14:45:24 +01:00
Armin Braun	5af9387fad	Fix Stuck IO Thread Logging Time Precision (#42882 ) * The precision of the timestamps we get from the cached time thread is only 200ms by default resulting in a number of needless ~200ms slow network thread execution logs * Fixed by making the warn threshold a function of the precision of the cached time thread found in the settings	2019-06-20 14:26:20 +02:00
Yannick Welsch	7f8e1454ab	Advance checkpoints only after persisting ops (#43205 ) Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard copies / follower clusters going out of sync. This commit required changing some core classes in the system: - The LocalCheckpointTracker keeps track now not only of the information whether an operation has been processed, but also whether that operation has been persisted to disk. - TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this. - ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all shard copies when in primary mode. The computed global checkpoint (which represents the minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored in the checkpoint entry for the local shard copy, has been moved to an extra field. - The periodic global checkpoint sync now also takes async durability into account, where the local checkpoints on shards only advance when the translog is asynchronously fsynced. This means that the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is not sufficient anymore. - The new index closing API does not work when combined with async durability. The shard verification step now requires an additional pre-flight step to fsync the translog, so that the main verify shard step has the most up-to-date global checkpoint at its disposal.	2019-06-20 11:12:38 +02:00
Mark Vieira	0867ea75b7	Properly format reproduction lines for test methods that contain periods (#43255 )	2019-06-18 09:17:47 -07:00
Nhat Nguyen	0c5086d2f3	Rebuild version map when opening internal engine (#43202 ) With this change, we will rebuild the live version map and local checkpoint using documents (including soft-deleted) from the safe commit when opening an internal engine. This allows us to safely prune away _id of all soft-deleted documents as the version map is always in-sync with the Lucene index. Relates #40741 Supersedes #42979	2019-06-17 18:08:09 -04:00
Martijn Laarman	8b1b9f8ab9	Introduce stability description to the REST API specification (#38413 ) (#43278 ) * introduce state to the REST API specification * change state over to stability * CCR is now GA updated to stable * SQL is now GA so marked as stable * Introduce `internal` as state for API's, marks stable in terms of lifetime but unstable in terms of guarantees on its output format since it exposes internal representations * make setting a wrong stability value, or not setting it at all an error that causes the YAML test suite to fail * update spec files to be explicit about their stability state * Document the fact that stability needs to be defined Otherwise the YAML test runner will fail (with a nice exception message) * address check style violations * update rest spec unit tests to include stability * found one more test spec file not declaring stability, made sure stability appears after documentation everywhere * cluster.state is stable, mark response in some way to denote it's a key value format that can be changed during minors * mark data frame API's as beta * remove internal and private as states for an API * removed the wrong enum values in the Stability Enum in the previous commit (cherry picked from commit 61c34bbd92f8f7e5f22fa411c6b682b0ebd8a99d)	2019-06-17 16:57:13 +02:00
Henning Andersen	ba15d08e14	Allow cluster access during node restart (#42946 ) (#43272 ) This commit modifies InternalTestCluster to allow using client() and other operations inside a RestartCallback (onStoppedNode typically). Restarting nodes are now removed from the map and thus all methods now return the state as if the restarting node does not exist. This avoids various exceptions stemming from accessing the stopped node(s).	2019-06-17 15:04:17 +02:00
Christoph Büscher	7af23324e3	SimpleQ.S.B and QueryStringQ.S.B tests should avoid `now` in query (#43199 ) Currently the randomization of the q.b. in these tests can create query strings that can cause caching to be disabled for this query if we query all fields and there is a date field present. This is pretty much an anomaly that we shouldn't generally test for in the "testToQuery" tests where cache policies are checked. This change makes sure we don't create offending query strings so the cache checks never hit these cases and adds a special test method to check this edge case. Closes #43112	2019-06-14 11:21:48 +02:00
Alpar Torok	7cc6dca697	Remove explicitly enabled build fixture task	2019-06-14 10:42:08 +03:00
Jason Tedor	5bc3b7f741	Enable node roles to be pluggable (#43175 ) This commit introduces the possibility for a plugin to introduce additional node roles.	2019-06-13 15:15:48 -04:00
Yogesh Gaikwad	4ae1e30a98	Enable krb5kdc-fixture, kerberos tests mount urandom for kdc container (#41710 ) (#43178 ) Infra has fixed #10462 by installing `haveged` on CI workers. This commit enables the disabled fixture and tests, and mounts `/dev/urandom` for the container so there is enough entropy required for kdc. Note: hdfs-repository tests have been disabled, will raise a separate issue for it. Closes #40624 Closes #40678	2019-06-13 13:02:16 +10:00
Alan Woodward	9de1c69c28	IndexAnalyzers doesn't need to extend AbstractIndexComponent (#43149 ) AIC doesn't add anything here, and it removes the need to pass index settings to the constructor.	2019-06-12 17:48:31 +01:00
Henning Andersen	6108899a2d	Fix unresponsive network simulation (#42579 ) Unresponsive network simulation would throw away requests. However, then we no longer have any guarantees that a transport action either succeeds or fails, which could lead to hangs (example: unclosed IndexShard permits). Closes #42244	2019-06-11 17:54:48 +02:00
Nhat Nguyen	f2e66e22eb	Increase waiting time when check retention locks (#42994 ) WriteActionsTests#testBulk and WriteActionsTests#testIndex sometimes fail with a pending retention lock. We might leak retention locks when switching to async recovery. However, it's more likely that ongoing recoveries prevent the retention lock from releasing. This change increases the waiting time when we check for no pending retention lock and also ensures no ongoing recovery in WriteActionsTests. Closes #41054	2019-06-10 17:58:37 -04:00
Jason Tedor	915d2f2daa	Refactor put mapping request validation for reuse (#43005 ) This commit refactors put mapping request validation for reuse. The concrete case that we are after here is the ability to apply effectively the same framework to indices aliases requests. This commit refactors the put mapping request validation framework to allow for that.	2019-06-09 10:19:04 -04:00
henryptung	61b62125b8	Wire query cache into sorting nested-filter computation (#42906 ) Don't use Lucene's default query cache when filtering in sort. Closes #42813	2019-06-06 21:16:58 +02:00
Gordon Brown	6eb4600e93	Add custom metadata to snapshots (#41281 ) Adds a metadata field to snapshots which can be used to store arbitrary key-value information. This may be useful for attaching a description of why a snapshot was taken, tagging snapshots to make categorization easier, or identifying the source of automatically-created snapshots.	2019-06-05 17:30:31 -06:00
Przemyslaw Gomulka	ab5bc83597	Deprecation info for joda-java migration on 7.x (#42659 ) Some clusters might have been already migrated to version 7 without being warned about the joda-java migration changes. Deprecation api on that version will give them guidance on what patterns need to be changed. relates. This change is using the same logic as in 6.8, that is: verifying the pattern is from the incompatible set ('y'-Y', 'C', 'Z' etc), not from predefined set, not prefixed with 8. AND was also created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings There is no pipeline check (present on 6.8) as it is impossible to verify when the pipeline was created, and therefore to make sure the format is deprecated or not #42010	2019-06-05 19:50:04 +02:00
Mark Vieira	e44b8b1e2e	[Backport] Remove dependency substitutions 7.x (#42866 ) * Remove unnecessary usage of Gradle dependency substitution rules (#42773) (cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)	2019-06-04 13:50:23 -07:00
Tim Vernum	8de3a88205	Log the status of security on license change (#42741 ) Whether security is enabled/disabled is dependent on the combination of the node settings and the cluster license. This commit adds a license state listener that logs when the license change causes security to switch state (or to be initialised). This is primarily useful for diagnosing cluster formation issues. Backport of: #42488	2019-06-04 14:25:43 +10:00
David Turner	df0f0b3d40	Rename autoMinMasterNodes to autoManageMasterNodes (#42789 ) Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its broader meaning since 7.0. Backport of the relevant part of #42700 to `7.x`.	2019-06-03 12:12:07 +01:00
Jason Tedor	371cb9a8ce	Remove Log4j 1.2 API as a dependency (#42702 ) We had this as a dependency for legacy dependencies that still needed the Log4j 1.2 API. This appears to no longer be necessary, so this commit removes this artifact as a dependency. To remove this dependency, we had to fix a few places where we were accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since both APIs were on the compile-time classpath). Finally, we can remove our custom Netty logger factory. This was needed when we were on Log4j 1.2 and handled logging in our own unique way. When we migrated to Log4j 2 we could have dropped this dependency. However, even then Netty would still pick up Log4j 1.2 since it was on the classpath, thus the advantage to removing this as a dependency now.	2019-05-30 16:08:07 -04:00
Marios Trivyzas	ce30afcd01	Deprecate CommonTermsQuery and cutoff_frequency (#42619 ) (#42691 ) Since the max_score optimization landed in Elasticsearch 7, the CommonTermsQuery is redundant and slower. Moreover the cutoff_frequency parameter for MatchQuery and MultiMatchQuery is redundant. Relates to #27096 (cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)	2019-05-30 18:04:47 +02:00
Armin Braun	0e92ef1843	Fix Incorrect Time Math in MockTransport (#42595 ) (#42617 ) * Fix Incorrect Time Math in MockTransport * The timeunit here must be nanos for the current time (we even convert it accordingly in the logging) * Also, changed the log message when dumping stack traces a little to make it easier to grep for (otherwise it's the same as the message on unregister)	2019-05-28 17:58:23 +02:00
David Turner	746a2f41fd	Remove PRE_60_NODE_CHECKPOINT (#42531 ) This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing with 5.x nodes' lack of sequence number support. Backport of #42527	2019-05-28 12:25:53 +01:00
Armin Braun	44bf784fe1	Add Infrastructure to Run 3rd Party Repository Tests (#42586 ) (#42604 ) * Add Infrastructure to Run 3rd Party Repository Tests * Add infrastructure to run third party repository tests using our standard JUnit infrastructure * This is a prerequisite of #42189	2019-05-28 10:46:22 +02:00
Armin Braun	c4f44024af	Remove Delete Method from BlobStore (#41619 ) (#42574 ) * Remove Delete Method from BlobStore (#41619) * The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either	2019-05-27 12:24:20 +02:00
Armin Braun	b68358945f	Dump Stacktrace on Slow IO-Thread Operations (#42000 ) (#42572 ) * Dump Stacktrace on Slow IO-Thread Operations * Follow up to #39729 extending the functionality to actually dump the stack when the thread is blocked not afterwards * Logging the stacktrace after the thread became unblocked is only of limited use because we don't know what happened in the slow callback from that (only whether we were blocked on a read,write,connect etc.) * Relates #41745	2019-05-27 11:44:36 +02:00
Armin Braun	489616da62	Fix testTracerLog Network Tests (#42286 ) (#42565 ) * Fix testTracerLog Network Tests * Start appender before using it like we do for e.g. the Netty leak detection appender to avoid interference from actions on the network threads that might still be dangling from previous tests in the same suite * Closes #41890	2019-05-27 11:39:59 +02:00
Nhat Nguyen	02739d038c	Mute accounting circuit breaker check after test (#42448 ) If we close an engine while a refresh is happening, then we might leak refCount of some SegmentReaders. We need to skip the ram accounting circuit breaker check until we have a new Lucene snapshot which includes the fix for LUCENE-8809. This also adds a test to the engine but left it muted so we won't forget to reenable this check. Closes #30290	2019-05-24 15:42:12 -04:00
Simon Willnauer	46ccfba808	Remove IndexStore and DirectoryService (#42446 ) Both of these classes are basically a bloated wrapper around a simple construct that can simply be a DirectoryFactory interface. This change removes both classes and replaces them with a simple stateless interface that creates a new `Directory` per shard. The concept of `index.store` is preserved since it makes sense from a configuration perspective.	2019-05-24 12:14:56 +02:00
David Turner	f864f6a740	Cluster state from API should always have a master (#42454 ) Today the `TransportClusterStateAction` ignores the state passed by the `TransportMasterNodeAction` and obtains its state from the cluster applier. This might be inconsistent, showing a different node as the master or maybe even having no master. This change adjusts the action to use the passed-in state directly, and adds tests showing that the state returned is consistent with our expectations even if there is a concurrent master failover. Fixes #38331 Relates #38432	2019-05-24 08:45:22 +01:00
Simon Willnauer	a79cd77e5c	Remove IndexShard dependency from Repository (#42213 ) * Remove IndexShard dependency from Repository In order to simplify repository testing especially for BlobStoreRepository it's important to remove the dependency on IndexShard and reduce it to Store and MapperService (in the snapshot case). This significantly reduces the dependency footprint for Repository and allows unittesting without starting nodes or instantiating entire shard instances. This change deprecates the old method signatures and adds a unittest for FileRepository to show the advantage of this change. In addition, the unittesting surfaced a bug where the internal file names that are private to the repository were used in the recovery stats instead of the target file names which makes it impossible to relate to the actual lucene files in the recovery stats. * don't delegate deprecated methods * apply comments * test	2019-05-22 14:27:11 +02:00
Yannick Welsch	770d8e9e39	Remove usage of max_local_storage_nodes in test infrastructure (#41652 ) Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a follow-up PR to deprecate this setting in 7.x and to remove it in 8.0. This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly done by passing the data path from the stopped node to the new node that is started.	2019-05-22 11:04:55 +02:00
Christoph Büscher	3e59c31a12	Change IndexAnalyzers default analyzer access (#42011 ) Currently IndexAnalyzers keeps the three default as separate class members although they should refer to the same analyzers held in the additional analyzers map under the default names. This assumption should be made more explicit by keeping all analyzers in the map. This change adapts the constructor to check all the default entries are there and the getters to reach into the map with the default names when needed.	2019-05-10 18:08:51 +02:00
David Turner	4c909e93bb	Reject port ranges in `discovery.seed_hosts` (#41905 ) Today Elasticsearch accepts, but silently ignores, port ranges in the `discovery.seed_hosts` setting: ``` discovery.seed_hosts: 10.1.2.3:9300-9400 ``` Silently ignoring part of a setting like this is trappy. With this change we reject seed host addresses of this form. Closes #40786 Backport of #41404	2019-05-08 08:34:32 +01:00
Alan Woodward	4cca1e8fff	Correct spelling of MockLogAppender.PatternSeenEventExpectation (#41893 ) The class was called PatternSeenEventExcpectation. This commit is a straight class rename to correct the spelling.	2019-05-07 17:28:51 +01:00
Henning Andersen	f068a22f5f	SeqNo CAS linearizability (#38561 ) Add a test that stresses concurrent writes using ifSeqno/ifPrimaryTerm to do CAS style updates. Use linearizability checker to verify linearizability. Linearizability of successful CAS'es is guaranteed. Changed linearizability checker to allow collecting history concurrently. Changed unresponsive network simulation to wake up immediately when network disruption is cleared to ensure tests proceed in a timely manner (and this also seems more likely to provoke issues).	2019-05-07 14:04:38 +02:00
Jim Ferenczi	70bf432fa8	Fix full text queries test that start with now (#41854 ) Full text queries that start with now are not cacheable if they target a date field. However we assume in the query builder tests that all queries are cacheable and this assumption fails when the random generated query string starts with "now". This fails twice in several years since the probability that a random string starts with "now" is low but this commit ensures that isCacheable is correctly checked for full text queries that fall into this edge case. Closes #41847	2019-05-06 19:08:30 +02:00
Tim Brooks	927013426a	Read multiple TLS packets in one read call (#41820 ) This is related to #27260. Currently we have a single read buffer that is no larger than a single TLS packet. This prevents us from reading multiple TLS packets in a single socket read call. This commit modifies our TLS work to support reading similar to the plaintext case. The data will be copied to a (potentially) recycled TLS packet-sized buffer for interaction with the SSLEngine.	2019-05-06 09:51:32 -06:00
Nhat Nguyen	c7924014fa	Verify consistency of version and source in disruption tests (#41614 ) (#41661 ) With this change, we will verify the consistency of version and source (besides id, seq_no, and term) of live documents between shard copies at the end of disruption tests.	2019-05-03 18:47:14 -04:00
Jay Modi	8421e38887	Do not print null method name in reproduce line (#41691 ) This commit updates the reproduce line that is printed out when a test fails so that it does not output `.null` as the method name when the failure is not a specific method but a class level issue such as threads being leaked from the SUITE. Previously, when this occurred the reproduce line would look like: `./gradlew :server:integTest --tests "org.elasticsearch.indices.memory.breaker.CircuitBreakerServiceIT.null"` and after this change, the line no longer contains the `.null` after the class name.	2019-05-02 12:20:07 -06:00
Sandmannn	728fe2d409	Small correction in comments (#41623 )	2019-05-02 15:30:18 +03:00
Nhat Nguyen	887f3f2c83	Simplify initialization of max_seq_no of updates (#41161 ) Today we choose to initialize max_seq_no_of_updates on primaries only so we can deal with a situation where a primary is on an old node (before 6.5) which does not have MSU while replicas are on new nodes (6.5+). However, this strategy is quite complex and can lead to bugs (for example #40249) since we have to assign a correct value (not too low) to MSU in all possible situations (before recovering from translog, restoring history on promotion, and handing off relocation). Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes in the cluster should have MSU. This change simplifies the initialization of MSU by always assigning it a correct value in the constructor of Engine regardless of whether it's a replica or primary. Relates #33842	2019-04-30 15:14:52 -04:00
Tim Brooks	df3ef66294	Remove dedicated SSL network write buffer (#41654 ) This is related to #27260. Currently for the SSLDriver we allocate a dedicated network write buffer and encrypt the data into that buffer one buffer at a time. This requires constantly switching between encrypting and flushing. This commit adds a dedicated outbound buffer for SSL operations that will internally allocate new packet sized buffers as they are needed (for writing encrypted data). This allows us to totally encrypt an operation before writing it to the network. Eventually it can be hooked up to buffer recycling. This commit also backports the following commit: Handle WRAP ops during SSL read It is possible that a WRAP operation can occur while decrypting handshake data in TLS 1.3. The SSLDriver does not currently handle this well as it does not have access to the outbound buffer during read call. This commit moves the buffer into the Driver to fix this issue. Data wrapped during a read call will be queued for writing after the read call is complete.	2019-04-29 17:59:13 -06:00
Nhat Nguyen	615a0211f0	Recovery should not indefinitely retry on mapping error (#41099 ) A stuck peer recovery in #40913 reveals that we indefinitely retry on new cluster states if indexing translog operations hits a mapper exception. We should not wait and retry if the mapping on the target is as recent as the mapping that the primary used to index the replaying operations. Relates #40913	2019-04-27 10:55:08 -04:00
Armin Braun	aad33121d8	Async Snapshot Repository Deletes (#40144 ) (#41571 ) Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106 * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657	2019-04-26 15:36:09 +02:00
Tim Brooks	1f8ff052a1	Revert "Remove dedicated SSL network write buffer (#41283 )" This reverts commit `f65a86c258`.	2019-04-25 18:39:25 -06:00
Tim Brooks	f65a86c258	Remove dedicated SSL network write buffer (#41283 ) This is related to #27260. Currently for the SSLDriver we allocate a dedicated network write buffer and encrypt the data into that buffer one buffer at a time. This requires constantly switching between encrypting and flushing. This commit adds a dedicated outbound buffer for SSL operations that will internally allocate new packet sized buffers as they are needed (for writing encrypted data). This allows us to totally encrypt an operation before writing it to the network. Eventually it can be hooked up to buffer recycling.	2019-04-25 14:30:54 -06:00
Armin Braun	40aef2b8aa	Introduce Delegating ActionListener Wrappers (#40129 ) (#41527 ) * Introduce Delegating ActionListener Wrappers * Dry up use cases of ActionListener that simply pass through the response or exception to another listener	2019-04-25 16:05:04 +02:00
Ryan Ernst	7e3875d781	Upgrade hamcrest to 2.1 (#41464 ) hamcrest has some improvements in newer versions, like FileMatchers that make assertions regarding file exists cleaner. This commit upgrades to the latest version of hamcrest so we can start using new and improved matchers.	2019-04-24 23:40:03 -07:00
Martijn Laarman	85b9dc18a7	fix #35262 define deprecations of API's as a whole and urls (#39063 ) * fix #35262 define deprecations of API's as a whole and urls * document hot threads deprecated paths * deprecate scroll_id as part of the URL, documented only as part of the body which is a safer behaviour as well * use version numbers up to patch version * rest spec parser picks up deprecated paths as paths too (cherry picked from commit 7e06023e7603b7584bfd9ee4e8a1ccd82c208ce7)	2019-04-23 14:28:36 +02:00
Jason Tedor	4a288af85f	Avoid concurrent modification in mock log appender (#41424 ) It can be the case that while we are setting up expectations that also a log message is appended. For example, if we are setting up these expectations after a cluster has formed and messages start being sent around the cluster. In this case, we would hit a concurrent modification exception while we are mutating the expectations, and also while the expectations are being iterated over as a message is appended. This commit avoids this by using a copy-on-write array list which is safe for concurrent modification and iteration. Note that another possible approach here is to use synchronized, but that seems unnecessary since we don't appear to rely on messages that are sent while we are setting up expectations. Rather, we are setting up some expectations and some situation that we think will cause those expectations to be met. Using copy-on-write array list here is nice since we avoid bottlenecking these tests on synchronizing these methods.	2019-04-22 21:47:16 -04:00
Adrien Grand	9fd5237fd4	Clean up Node#close. (#39317 ) (#41301 ) `Node#close` is pretty hard to rely on today: - it might swallow exceptions - it waits for 10 seconds for threads to terminate but doesn't signal anything if threads are still not terminated after 10 seconds This commit makes `IOException`s propagated and splits `Node#close` into `Node#close` and `Node#awaitClose` so that the decision what to do if a node takes too long to close can be done on top of `Node#close`. It also adds synchronization to lifecycle transitions to make them atomic. I don't think it is a source of problems today, but it makes things easier to reason about.	2019-04-17 16:10:53 +02:00
Armin Braun	c4e84e2b34	Add Bulk Delete Api to BlobStore (#40322 ) (#41253 ) * Adds Bulk delete API to blob container * Implement bulk delete API for S3 * Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs * Closes #40250	2019-04-16 17:19:05 +02:00
Tim Brooks	ad3b7abaa3	Deprecate old transport settings (#41229 ) This is related to #36652. We intend to remove a number of old transport settings in 8.0. This commit deprecates those settings for 7.x.	2019-04-15 21:43:09 -06:00
Nhat Nguyen	e9999dfa1d	Init global checkpoint after copy commit in peer recovery (#40823 ) Today a new replica of a closed index does not have a safe commit invariant when its engine is opened because we won't initialize the global checkpoint on a recovering replica until the finalize step. With this change, we can achieve that property by creating a new translog with the global checkpoint from the primary at the end of phase 1.	2019-04-11 22:18:31 -04:00
David Turner	b522de975d	Move primary term from replicas proxy to repl op (#41119 ) A small refactoring that removes the primaryTerm field from ReplicasProxy and instead passes it directly in to the methods that need it. Relates #40706.	2019-04-11 21:19:27 +01:00
Armin Braun	233df6b73b	Make Transport Shard Bulk Action Async (#39793 ) (#41112 ) This is a dependency of #39504 Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update will trigger a new task in the `write` queue instead. This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines. The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes is then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down. Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.	2019-04-11 16:01:52 +02:00
Mark Vieira	1287c7d91f	[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978 ) (#40993 ) * Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions. (cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6) * Fix forking JVM runner * Don't bump shadow plugin version	2019-04-09 11:52:50 -07:00
David Turner	2ff19bc1b7	Use Writeable for TransportReplAction derivatives (#40905 ) Relates #34389, backport of #40894.	2019-04-05 19:10:10 +01:00
Martijn van Groningen	809a5f13a4	Make -try xlint warning disabled by default. (#40833 ) Many gradle projects specifically use the -try exclude flag, because there are many cases where auto-closeable resource ignore is never referenced in body of corresponding try statement. Suppressing this warning specifically in each case that it happens using `@SuppressWarnings("try")` would be very verbose. This change removes `-try` from any gradle project and adds it to the build plugin. Also this change removes exclude flags from gradle projects that is already specified in build plugin (for example -deprecation). Relates to #40366	2019-04-05 08:02:26 +02:00
Jason Tedor	f377155f10	Use default memory lock setting in testing (#40730 ) Today we are running our internal tests with bootstrap.memory_lock enabled. This is not our default setting, and not the recommended value. This commit switches to use the default value, which is to not enable bootstrap.memory_lock.	2019-04-02 17:56:32 -04:00
Alpar Torok	293297ae3d	Fix repository-hdfs when no docker and unnecessary fixture The hdfs-fixture is actually executed in plugin/repository-hdfs as a dependency. The fixture is not needed and actually causes a failure because we have two copies now and both use the same ports.	2019-03-29 16:55:12 +02:00
Alpar Torok	2b91fb1cc0	Avoid building hdfs-fixture use an image that works instead Avoid the additional requirement for the debian package repos to be up, and depend on dockerhub only instead.	2019-03-29 16:55:11 +02:00
Alpar Torok	e8c0b53796	Add ability to mute and mute flaky fixture (#40630 )	2019-03-29 12:10:04 +02:00
Alpar Torok	d791e08932	Test fixtures krb5 (#40297 ) Replaces the vagrant based kerberos fixtures with docker based test fixtures plugin. The configuration is now entirely static on the docker side and no longer driven by Gradle, also two different services are being configured since there are two different consumers of the fixture that can run in parallel and require different configurations.	2019-03-28 17:26:58 +02:00
Yannick Welsch	fea91c6113	Mute testTracerLog Relates to #40586	2019-03-28 14:05:54 +01:00
David Turner	5a2ba34174	Get node ID from nodes info in REST tests (#40052 ) (#40532 ) We discussed recently that the cluster state API should be considered "internal" and therefore our usual cast-iron stability guarantees do not hold for this API. However, there are a good number of REST tests that try to identify the master node. Today they call `GET /_cluster/state` API and extract the master node ID from the response. In fact many of these tests just want an arbitrary node ID (or perhaps a data node ID) so an alternative is to call `GET _nodes` or `GET _nodes/data:true` and obtain a node ID from the keys of the `nodes` map in the response. This change adds the ability for YAML-based REST tests to extract an arbitrary key from a map so that they can obtain a node ID from the nodes info API instead of using the master node ID from the cluster state API. Relates #40047.	2019-03-27 23:08:10 +00:00
Tim Brooks	ab44f5fd5d	Add InboundHandler for inbound message handling (#40430 ) This commit adds an InboundHandler to handle inbound message processing. With this commit, this code is moved out of the TcpTransport. Additionally, finer grained unit tests are added to ensure that the inbound processing works as expected	2019-03-27 12:33:26 -06:00
Yannick Welsch	64b31f44af	No mapper service and index caches for replicated closed indices (#40423 ) Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with full indexing and search capabilities allocated. We can save on a lot of heap memory for those indices by not allocating a mapper service and caching infrastructure (which preallocates a constant amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed metricbeat indices (each index with one shard). After this change, the same instance can host 7300 replicated closed metricbeat instances (not that this would be a recommended configuration). Most of the remaining memory is in the cluster state and the IndexSettings object.	2019-03-27 19:04:24 +01:00
Yannick Welsch	8f7c5732f1	Use default discovery implementation for single-node discovery (#40036 ) Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with two small adaptions: - auto-bootstrapping, but requiring initial_master_nodes not to be set. - not actively pinging other nodes using the Peerfinder - not allowing other nodes to join its single-node cluster (if they have e.g. been set up using regular discovery and connect to the single-disco node).	2019-03-27 19:04:24 +01:00
Tim Brooks	3860ddd1a4	Move outbound message handling to OutboundHandler (#40336 ) Currently there are some components of message serializer and sending that still occur in TcpTransport. This commit makes it possible to send a message without the TcpTransport by moving all of the remaining application logic to the OutboundHandler. Additionally, it adds unit tests to ensure that this logic works as expected.	2019-03-27 11:47:36 -06:00
Tim Brooks	760cfffe4b	Move TransportMessageListener to TransportService (#40474 ) Currently the TransportMessageListener is applied and used in the Transport class. However, local requests and responses never make it to this class. This PR moves the listener add/remove methods to the TransportService. After this change the Transport can only have one listener set with it. This one listener is the TransportService, which will then propagate the events to the external listeners. Additionally this commit back ports #40237 Remove Tracer from MockTransportService Currently the TransportMessageListener is applied and used in the Transport class. However, local requests and responses never make it to this class. This PR moves the listener add/remove methods to the TransportService. After this change the Transport can only have one listener set with it. This one listener is the TransportService, which will then propagate the events to the external listeners.	2019-03-27 09:24:20 -06:00
Henning Andersen	bf444b9f02	Store Pending Deletions Fix (#40345 ) FilterDirectory.getPendingDeletions does not delegate, fixed temporarily by overriding in StoreDirectory. This in turn caused duplicate file name use after a trimUnsafeCommits had been done, since a new IndexWriter would not consider the pending deletes in IndexFileDeleter. This should only happen on windows (AFAIK). Reenabled doing index updates for all tests using IndexShardTests.indexOnReplicaWithGaps (which could fail due to above when using mocked WindowsFS). Added getPendingDeletions delegation to all elasticsearch FilterDirectory subclasses that were not trivial test-only overrides to minimize the risk of hitting this issue in another case.	2019-03-26 15:30:44 +01:00
Armin Braun	cafb83297c	Use Correct Enum in Wipe Snapshots Test Method (#40422 ) (#40438 ) * Mistake was made in #39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state	2019-03-26 09:04:15 +01:00
Nhat Nguyen	efaf95628b	Use separate translog dir in testDeleteWithFatalError This test currently opens a new engine but shares the same translog directory of the previous opening engine.	2019-03-20 10:22:27 -04:00
Mayya Sharipova	49a7c6e0e8	Expose proximity boosting (#39385 ) (#40251 ) Expose DistanceFeatureQuery for geo, date and date_nanos types Closes #33382	2019-03-20 09:24:41 -04:00
Henning Andersen	4c2a8638ca	Cascading primary failure lead to MSU too low (#40249 ) If a replica were first reset due to one primary failover and then promoted (before resync completes), its MSU would not include changes since global checkpoint, leading to errors during translog replay. Fixed by re-initializing MSU before restoring local history.	2019-03-20 14:00:43 +01:00
Nhat Nguyen	a13b4bc8c5	Always fail engine if delete operation fails (#40117 ) Unlike index operations which can fail at the document level due to analyzing errors, delete operations should never fail at the document level whether soft-deletes is enabled or not. With this change, we will always fail the engine if we fail to apply a delete operation to Lucene. Closes #33256	2019-03-19 13:09:23 -04:00
Nhat Nguyen	38e9522218	Remove wait for cluster state step in peer recovery (#40004 ) We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped using it since #25692 (6.0). This change removes that action and related code in 7.x and 8.0. Relates #19287 Relates #25692	2019-03-18 15:17:21 -04:00
Nhat Nguyen	9ba0bdf528	Dump cluster state if ensureGreen timed out in QA tests (#40133 ) When the method ensureGreen in QA tests is timed out, it does not provide enough info for us to investigate why the testing index is not green yet. With this change, we will dump the cluster state if ensureGreen timed out. Relates #32027	2019-03-18 15:17:21 -04:00
Nhat Nguyen	d720a64b9e	Ensure sendBatch not called recursively (#39988 ) This PR introduces AsyncRecoveryTarget which executes remote calls of peer recovery asynchronously. In this change, we also add a new assertion to ensure that method sendBatch, which sends a batch of history operations in phase2, is never called recursively on the same thread. This new assertion will also be used in method sendFileChunks.	2019-03-18 15:17:21 -04:00
Andrey Ershov	42602478b8	Unmute, fix, refactor and zen2ify NetworkDisruptionIT (#38351 ) This commit unmutes NetworkDisruptionIT. It makes changes necessary for Zen2 - avoids usage of autoMinMasterNodes and selects cluster size, such that there is no need to call AddVotingExclusion. This test also introduces refactors a single method prepareDistruptedCluster to be used by both test methods. Unfortunately, NetworkDisruption is broken and the testNetworkPartitionRemovalRestoresConnections "is fixed" by introducing assertBusy - #38348. Relates #36205 Relates #38348 (cherry picked from commit 97707c7f892636e5b75c3df546b067414acb27cd)	2019-03-18 16:39:43 +01:00
Jim Ferenczi	eb540125ea	Fix IndexSearcherWrapper visibility (#39071 ) (#40145 ) This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin. This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed. Closes #30758	2019-03-18 11:33:54 +01:00
Tim Brooks	0b50a670a4	Remove transport name from tcp channel (#40074 ) Currently, we maintain a transport name ("mock-nio", "nio", "netty") that is passed to a `TcpTransportChannel` when a request is received. The value of this name is to associate with the task when we register a task with the task manager. However, it is only possible to run ES with one transport, so having an implementation specific name is unnecessary. This commit removes the name and replaces it with the generic "transport".	2019-03-15 12:04:13 -06:00
David Turner	a323132503	Create retention leases file during recovery (#39359 ) Today we load the shard history retention leases from disk whenever opening the engine, and treat a missing file as an empty set of leases. However in some cases this is inappropriate: we might be restoring from a snapshot (if the target index already exists then there may be leases on disk) or force-allocating a stale primary, and in neither case does it make sense to restore the retention leases from disk. With this change we write an empty retention leases file during recovery, except for the following cases: - During peer recovery the on-disk leases may be accurate and could be needed if the recovery target is made into a primary. - During recovery from an existing store, as long as we are not force-allocating a stale primary. Relates #37165	2019-03-15 07:49:49 +00:00
Jason Tedor	9181668edf	Stop returning cluster state size by default (#40016 ) Computing the compressed size of the cluster state on every invocation of cluster:monitor/state action is expensive, and the value of this field is dubious anyway. Therefore we want to remove computing this field. As a first step, we stop computing and return this field by default. To avoid breaking users, we will give them a system property to use to tide them over until the next major release when we will actually remove this field. This comes with a deprecation warning too, and the backport to the appropriate minor will also include a note in the migration guide. There will be a follow-up to remove this field in the next major version.	2019-03-14 08:57:55 -04:00
David Turner	049970af3e	Only connect to new nodes on new cluster state (#39629 ) Today, when applying new cluster state we attempt to connect to all of its nodes as a blocking part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states which hinders the recovery of the cluster during certain kinds of partition. If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness reasonably gracefully in the meantime. This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application. Resolves #29025.	2019-03-12 19:26:13 +00:00
Tim Brooks	5612ed97ca	Add log warnings for long running event handling (#39729 ) Recently we have had a number of test issues related to blocking activity occurring on the io thread. This commit adds a log warning for when handling an event takes >150 milliseconds. This is implemented for the MockNioTransport which is the transport used in ESIntegTestCase.	2019-03-08 13:07:24 -07:00
Alpar Torok	0f89427eb6	Back port build changes from same version bwc tests (#39744 ) * Back port build changes from #39102 This back-ports how versions are determined and bwc tests are set up from #39102 without enabling the bwc from current version tests so it's easier/possible to backmerge future build changes. It's expected that the tests are lacking many of the required fixes in this version to enable them.	2019-03-07 17:25:09 +02:00
Alpar Torok	34ea84948c	Fix bwc tests failure to extract (#39619 ) * Set correct packaging for older versions Continue using zip packages for pre 7 Some other bwc fixes that hid behind this one. Closes #39441 #39751	2019-03-07 09:07:11 +02:00
Armin Braun	bb2f8485f1	Wipe Snapshots Before Indices in RestTests (#39662 ) (#39763 ) * Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots	2019-03-06 21:48:17 +01:00
Nhat Nguyen	1fe7cb594f	Don’t ack if unable to remove failing replica (#39584 ) Today when a replicated write operation fails to execute on a replica, the primary will reach out to the master to fail that replica (and mark it stale). We then won't ack that request until the master removes the failing replica; otherwise, we will lose the acked operation if the failed replica is still in the in-sync set. However, if a node with the primary is shutting down, we might ack such request even though we are unable to send a shard-failure request to the master. This happens because we ignore NodeClosedException which is triggered when the ClusterService is being closed. Closes #39467	2019-03-06 15:30:55 -05:00
Simon Willnauer	e620fb2e4a	Add option to force load term dict into memory (#39741 ) Lucene added an optimization to leave the term dictionary on disk for non-id like fields. This change happened very late in the release process, so it's better to have an escape hatch if certain use-cases are hurt by this optimization. This setting might be removed in the future if it turns out to be unnecessary.	2019-03-06 15:29:04 +01:00
David Turner	295e39a8c8	Drop node if asymmetrically partitioned from master (#39598 ) When a node is joining the cluster we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks, and receives acknowledgements to cluster state publications, so has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.	2019-03-06 09:41:57 +00:00
Armin Braun	aaecaf59a4	Optimize Bulk Message Parsing and Message Length Parsing (#39634 ) (#39730 ) * Optimize Bulk Message Parsing and Message Length Parsing * findNextMarker took almost 1ms per invocation during the PMC rally track * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search * It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes * Fixed by adding bulk `int` read to BytesReference	2019-03-06 08:13:15 +01:00
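The second optimization above (reading the `int` length without a throwaway stream wrapper) can be illustrated with a minimal sketch. The class and method names are hypothetical and are not the actual `BytesReference` API; big-endian byte order is assumed.

```java
final class IntReadSketch {
    // Hypothetical helper (not the real BytesReference method): reads a
    // big-endian 32-bit int starting at `offset` directly from the array,
    // so no input stream wrapper has to be allocated and discarded.
    public static int readIntBigEndian(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 24)
            | ((bytes[offset + 1] & 0xFF) << 16)
            | ((bytes[offset + 2] & 0xFF) << 8)
            | (bytes[offset + 3] & 0xFF);
    }
}
```

Masking each byte with `0xFF` avoids sign extension when a byte is negative.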
Yannick Welsch	936dbb00e3	Isolate Zen1 (#39470 ) Cherry-picks a few commits from #39466 to align 7.x with master branch.	2019-03-04 15:51:17 +01:00
Adrien Grand	934946a232	Don't swallow exception in ThreadPool.terminate. (#39038 ) (#39623 ) The use of `closeWhileHandlingException` means that any exception while trying to close the threadpool is going to be swallowed. Relates #39030	2019-03-04 10:58:29 +01:00
Tanguy Leroux	e005eeb0b3	Backport support for replicating closed indices to 7.x (#39506 )(#39499 ) Backport support for replicating closed indices (#39499) Before this change, closed indices were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on an 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. 
Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888	2019-03-01 14:48:26 +01:00
Lee Hinman	dae48ba262	Add details about what acquired the shard lock last (#38807 ) This adds a `details` parameter to shard locking in `NodeEnvironment`. This is intended to be used for diagnosing issues such as ``` 1> [2019-02-11T14:34:19,262][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] deleting index 1> [2019-02-11T14:34:19,279][WARN ][o.e.i.IndicesService ] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] failed to delete index 1> org.elasticsearch.env.ShardLockObtainFailedException: [.tasks][0]: obtaining shard lock timed out after 0ms 1> at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:736) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:655) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:601) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:554) ~[main/:?] ``` In the hope that we will be able to determine why the shard is still locked. Relates to #30290 as well as some other CI failures	2019-02-28 10:50:47 -07:00
Tal Levy	f538b30af9	ensure no initializing shards during cluster cleanup (#39283 ) (#39480 ) there are testing situations where newly created indices are being wiped before they are fully initialized. This results in an edge-case in the shard-locking strategy where an index cannot be deleted. This should fix that	2019-02-27 15:56:33 -08:00
Armin Braun	28b771f5db	Remove Dead Code Test Infrastructure (#39192 ) (#39436 ) * Just removing some obviously unused things	2019-02-27 09:38:47 +01:00
Tim Brooks	f24dae302d	Make security tests transport agnostic (#39411 ) Currently there are two security tests that specifically target the netty security transport. This PR moves the client authentication tests into `AbstractSimpleSecurityTransportTestCase` so that the nio transport will also be tested. Additionally the work to build transport configurations is moved out of the netty transport and tested independently.	2019-02-26 18:55:19 -07:00
Jason Tedor	a6c0166d68	Renew retention leases while following (#39335 ) This commit is the final piece of the integration of CCR with retention leases. Namely, we periodically renew retention leases and advance the retaining sequence number while following.	2019-02-25 17:14:19 -05:00
Nhat Nguyen	48219112e3	Do not wait for advancement of checkpoint in recovery (#39006 ) With this change, we won't wait for the local checkpoint to advance to the max_seq_no before starting phase2 of peer-recovery. We also remove the sequence number range check in peer-recovery. We can safely do these thanks to Yannick's finding. The replication group to be used is currently sampled after indexing into the primary (see `ReplicationOperation` class). This means that when initiating tracking of a new replica, we have to consider the following two cases: - There are operations for which the replication group has not been sampled yet. As we initiated the new replica as tracking, we know that those operations will be replicated to the new replica and follow the typical replication group semantics (e.g. marked as stale when unavailable). - There are operations for which the replication group has already been sampled. These operations will not be sent to the new replica. However, we know that those operations are already indexed into Lucene and the translog on the primary, as the sampling is happening after that. This means that by taking a snapshot of Lucene or the translog, we will be getting those ops as well. What we cannot guarantee anymore is that all ops up to `endingSeqNo` are available in the snapshot (i.e. also see comment in `RecoverySourceHandler` saying `We need to wait for all operations up to the current max to complete, otherwise we can not guarantee that all operations in the required range will be available for replaying from the translog of the source.`). This is not needed, though, as we can no longer guarantee that max seq no == local checkpoint. Relates #39000 Closes #38949 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2019-02-25 12:10:14 -05:00
Armin Braun	50d2736746	Fix Deadlock from Thread.suspend in Test (#39261 ) (#39341 ) * The lambda invoked by the `lockedExecutor` eventually gets JITed (which runs a static initializer that we will suspend in with a very tiny chance). * Fixed by creating the `Runnable` in the main test thread and using the same instance in all threads * Closes #35686	2019-02-25 09:15:19 +01:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Armin Braun	1a21cc0357	Simplify and Fix Synchronization in InternalTestCluster (#39168 ) (#39241 ) * Simplify and Fix Synchronization in InternalTestCluster (#39168) * Remove unnecessary `synchronized` statements * Make `Predicate`s constants where possible * Cleanup some stream usage * Make unsafe public methods `synchronized` * Closes #37965 * Closes #37275 * Closes #37345	2019-02-21 16:27:18 +01:00
Lee Hinman	d9de899316	Wrap accounting breaker check in assertBusy (#39211 ) There may be situations where indices have not yet been closed from a Lucene perspective, causing the breaker to not immediately be at 0 Relates to #30290	2019-02-21 08:00:31 -07:00
Marios Trivyzas	1316825f52	Replace superfluous usage of Counter with Supplier (#39048 ) (#39225 ) `Counter` was used as a means of a functional argument to pass the relative cached time before `Supplier` iface was introduced.	2019-02-21 12:42:54 +02:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Tal Levy	cb7e3708bc	Rollup jobs should be cleaned up before indices are deleted (#38930 ) (#39144 ) Rollup jobs should be stopped + deleted before the indices are removed. It's possible for an active rollup job to issue a bulk request, the test ends and the cleanup code deletes all indices. The in-flight bulk request will then stall + error because the index no longer exists... but this process might take longer than the StopRollup timeout. Which means the test fails, and often fails several other tests since the job is still active (e.g. other tests cannot create the same-named job, or fail to stop the job in their cleanup because it's still stalled). This tends to knock over several tests before the bulk finally times out and the job shuts down. Instead, we need to simply stop jobs first. Inflight bulks will resolve quickly, and we can carry on with deleting indices after the jobs are confirmed inactive. stop-job.asciidoc tended to trigger this issue because it executed an async stop API and then exited, which set up the above situation. It can and did happen with other tests though. As an extra precaution, the doc test was modified to substitute in wait_for_completion to help head off these issues too.	2019-02-20 11:12:01 -08:00
Adrien Grand	d8852b83d0	Don't swallow IOExceptions in InternalTestCluster. (#39068 ) Relates #39030	2019-02-19 15:03:47 +01:00
Ioannis Kakavas	59e9a0f4f4	Disable specific locales for tests in fips mode (#38938 ) * Disable specific locales for tests in fips mode The Bouncy Castle FIPS provider that we use for running our tests in fips mode has an issue with locale sensitive handling of Dates as described in https://github.com/bcgit/bc-java/issues/405 This causes certificate validation to fail if any given test that includes some form of certificate validation happens to run in one of the locales. This manifested earlier in #33081 which was handled insufficiently in #33299 This change ensures that the problematic 3 locales * th-TH * ja-JP-u-ca-japanese-x-lvariant-JP * th-TH-u-nu-thai-x-lvariant-TH will not be used when running our tests in a FIPS 140 JVM. It also reverts #33299	2019-02-19 08:46:08 +02:00
David Roberts	ae9243ad0a	Reduce single node test cleanup logging (#39060 ) As per https://github.com/elastic/elasticsearch/pull/39049#discussion_r257719530	2019-02-18 17:38:49 +00:00
Nhat Nguyen	2947ccf5c3	Add remote recovery to ShardFollowTaskReplicationTests (#39007 ) We simulate remote recovery in ShardFollowTaskReplicationTests by bootstrapping the follower with the safe commit of the leader. Relates #35975	2019-02-18 09:57:56 -05:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, then finally mark its seq_no as completed. If a flush occurs after step 1 but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Christoph Büscher	9f6c77fad4	Fix FullClusterRestartIT#testSnapshotRestore (#38795 ) This test failed on 7.1 when running full cluster restart tests against pre-7.0 clusters (e.g. 6.6 clusters). This fixes the expected type in the templates after the cluster restart.	2019-02-15 20:12:26 +01:00
Lee Hinman	7d449c5f65	Check that delete index request succeeded in test teardown (#38903 ) (#38913 ) Backport of #38903 When tearing down from `ESSingleNodeTestCase` we perform a delete on "*" indices. In some cases, however, those indices are not fully deleted. Rather than have a failure occur later down the chain (see: https://github.com/elastic/elasticsearch/issues/30290#issuecomment-463589008 ) the failure should occur immediately so it can be diagnosed more easily.	2019-02-14 13:46:17 -07:00
Julie Tibshirani	e769cb4efd	Perform precise check for types warnings in cluster restart tests. (#37944 ) Instead of using `WarningsHandler.PERMISSIVE`, we only match warnings that are due to types removal. This PR also renames `allowTypeRemovalWarnings` to `allowTypesRemovalWarnings`. Relates to #37920.	2019-02-13 11:28:58 -08:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when close engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release.	2019-02-12 11:39:35 -05:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Tim Vernum	84483b26cf	Fix version logic when bumping major version (#38593 ) When we are preparing to release a major version the rules around "unreleased" versions and branches get a bit more complex. This change implements the following rules: - If the tip version on the previous major is a .0 (e.g. 6.7.0) then the tip of the minor before that (e.g. 6.6.1) must be unreleased. (This is because 6.7.0 would be "staged" in preparation for release, but 6.6.1 would be open for bug fixes on the release 6.6.x line) (in VersionCollection & VersionUtils) - The "major.x" branch (if it exists) will always point to the latest minor in that series. Anything that is not the latest minor must therefore be on the "major.minor" branch. For example, if v7.1.0 exists then the "7.x" branch must be 7.1.0, and 7.0.0 must be on the "7.0" branch (in VersionCollection)	2019-02-08 18:00:03 +11:00
Jason Tedor	fdf6b3f23f	Add 7.1 version constant to 7.x branch (#38513 ) This commit adds the 7.1 version constant to the 7.x branch. Co-authored-by: Andy Bristol <andy.bristol@elastic.co> Co-authored-by: Tim Brooks <tim@uncontended.net> Co-authored-by: Christoph Büscher <cbuescher@posteo.de> Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com> Co-authored-by: markharwood <markharwood@gmail.com> Co-authored-by: Ioannis Kakavas <ioannis@elastic.co> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Alpar Torok <torokalpar@gmail.com> Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Tim Vernum <tim@adjective.org> Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>	2019-02-07 16:32:27 -05:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Luca Cavanna	a7046e001c	Remove support for maxRetryTimeout from low-level REST client (#38085 ) We have had various reports of problems caused by the maxRetryTimeout setting in the low-level REST client. Such setting was initially added in the attempts to not have requests go through retries if the request already took longer than the provided timeout. The implementation was problematic though as such timeout would also expire in the first request attempt (see #31834), would leave the request executing after expiration causing memory leaks (see #33342), and would not take into account the http client internal queuing (see #25951). Given all these issues, it seems that this custom timeout mechanism gives little benefits while causing a lot of harm. We should rather rely on connect and socket timeout exposed by the underlying http client and accept that a request can overall take longer than the configured timeout, which is the case even with a single retry anyways. This commit removes the `maxRetryTimeout` setting and all of its usages.	2019-02-06 08:43:47 +01:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR as it prevents documents that would be recovered in the normal following operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
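The retry behaviour described above can be sketched as follows. This is a simplified, hypothetical stand-in for `assertBusy()` (it counts attempts instead of waiting with a timeout), shown only to illustrate why an `AssertionError` triggers a retry while any other `RuntimeException` is immediately fatal.

```java
final class AssertBusySketch {
    // Hypothetical helper, not the real assertBusy(): retries `check` only
    // when it fails with an AssertionError. Any other exception propagates
    // to the caller on the first attempt. Returns the number of attempts used.
    public static int runWithRetries(Runnable check, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        AssertionError last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                check.run();
                return attempt; // condition satisfied
            } catch (AssertionError e) {
                last = e; // retryable: condition not yet satisfied
            }
        }
        throw last; // all attempts failed with AssertionError
    }
}
```

Under this model, a check that asserts "there is a master" is retried while the cluster stabilises, whereas a check that throws a plain `RuntimeException` for the same condition fails at once.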
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is short-lived (defaults to 20m) and the refresh token has a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using the `Authorization` header, where the auth scheme is `ApiKey` and the credentials are the base64 encoding of the API key Id and API key separated by a colon. 
Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
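As a minimal sketch of the header format described above, the value can be built with the JDK's `java.util.Base64`. The class and method names are hypothetical; the sample header in the commit message decodes to the placeholder credentials `api-key-id:api-key`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ApiKeyHeaderSketch {
    // Hypothetical helper: joins the API key id and the API key with a colon,
    // base64-encodes the result and prefixes the `ApiKey` auth scheme.
    public static String authorizationValue(String id, String key) {
        String credentials = id + ":" + key;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "ApiKey " + encoded;
    }
}
```

Calling `ApiKeyHeaderSketch.authorizationValue("api-key-id", "api-key")` yields `ApiKey YXBpLWtleS1pZDphcGkta2V5`, matching the curl example.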
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
markharwood	578fd14257	Types removal - fix FullClusterRestartIT warning expectations (#38310 ) Relax test warning message checking to pre-empt PR 38022 landing in 6.7 with new warning messages. The relaxed test now just assumes any warning message starting with “[types removal]” is tolerated rather than the precise phrasing used in the 6.7 branch.	2019-02-04 20:09:07 +00:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the should periodically flush logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are renewed retention leases since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Lee Hinman	f19fdcd491	Re-enable accounting breaker check in InternalTestCluster (#38131 ) Relates to #30290 The intent for this is to see whether this failure still happens, and if so, provide more up-to-date logs for analysis.	2019-02-04 07:40:59 -07:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
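The out-of-order defence described above can be sketched as a version check on the replica. This is an illustration only: the class is hypothetical, the leases are reduced to a single string, and it assumes that a strictly greater version wins (the real comparison may involve more than one field).

```java
final class VersionedLeasesSketch {
    private long version = -1;   // version of the collection currently held
    private String leases = "";  // stand-in for the retention leases collection

    // Applies an incoming collection only if it is newer than the one held,
    // so a delayed older sync request cannot overwrite newer leases.
    public synchronized boolean apply(long incomingVersion, String incomingLeases) {
        if (incomingVersion <= version) {
            return false; // stale or duplicate: ignore
        }
        version = incomingVersion;
        leases = incomingLeases;
        return true;
    }

    public synchronized long version() {
        return version;
    }

    public synchronized String leases() {
        return leases;
    }
}
```

If version 2 arrives before version 1, the second call is rejected and the replica keeps the version 2 leases.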
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings, mostly this is 10-30 lines of code that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex` which is suitable for all classes and copy-paste could be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110 ) When tests extending ESIntegTestCase are run on the same JVM, the static field in NodeAndClusterIdConverter will throw an AlreadySet exception. Overriding the configuration method Node.configureNodeAndClusterIdStateListener in MockNode prevents the listener registration from happening. Relates #32850	2019-01-31 18:59:40 +01:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that the caller handles exceptions by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolve ordering problems for retention leases. Namely, it can be that out-of-order retention lease sync requests arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be possible, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested from each cluster, and the reduce phase in each cluster is non-final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API now accepts a new parameter called ccs_minimize_roundtrips that allows opting out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Tim Vernum	cde126dbff	Enable SSL in reindex with security QA tests (#37600 ) Update the x-pack/qa/reindex-tests-with-security integration tests to run with TLS enabled on the Rest interface. Relates: #37527	2019-01-31 20:59:50 +11:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00
David Turner	81c443c9de	Deprecate minimum_master_nodes (#37868 ) Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in tests, but for 7.x nodes this setting is not required as it has no effect. This commit removes this setting so that nodes are started with more realistic configurations, and deprecates it.	2019-01-30 20:09:15 +00:00
Nik Everett	e97718245d	Test: Enable strict deprecation on all tests (#36558 ) This drops the option for tests to disable strict deprecation mode in the low level rest client in favor of configuring expected warnings on any calls that should expect warnings. This behavior is paranoid-by-default which is generally the right way to handle deprecations and tests in general.	2019-01-30 11:48:34 -05:00
Colin Goodheart-Smithe	21e392e95e	Removes typed calls from YAML REST tests (#37611 ) This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour. The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.	2019-01-30 16:32:58 +00:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840 ) This is related to #35975. It adds a action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Armin Braun	7f1784e9f9	Remove Dead MockTransport Code (#34044 ) * All these methods are unused	2019-01-29 15:08:11 +01:00
Luca Cavanna	2325fb9cb3	Remove test only SearchShardTarget constructor (#37912 ) Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.	2019-01-29 14:58:11 +01:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support to JSON logging (#36833 ) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one cluster state update. Once the update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. The following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace. See ESJsonLayout.java for more details and field descriptions. Docker log4j2 configuration is now almost the same as the one used for the ES binary. The only difference is that docker is using console appenders, whereas ES is using file appenders. Relates: #32850	2019-01-29 07:20:09 +01:00
Jason Tedor	5fddb631a2	Introduce retention lease syncing (#37398 ) This commit introduces retention lease syncing from the primary to its replicas when a new retention lease is added. A follow-up commit will add a background sync of the retention leases as well so that renewed retention leases are synced to replicas.	2019-01-27 07:49:56 -05:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents; requests that match more than this value will set the `"total.relation"` to `"gte"` (i.e. greater than or equal to) and the `"total.value"` to `10,000` in the search response. Scroll queries are not impacted; they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
Julie Tibshirani	e1d8df4ffa	Deprecate types in create index requests. (#37134 ) From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates: * Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests. * Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.	2019-01-24 13:17:47 -08:00
Andrey Ershov	4974684003	Add tool elasticsearch-node unsafe-bootstrap (#37696 ) elasticsearch-node tool helps to restore cluster if half or more of master eligible nodes are lost. Of course, all bets are off, regarding data consistency. There are two parts of the tool: unsafe-bootstrap to be used when there is still at least one master-eligible node alive and detach-cluster, when there are no master-eligible nodes left. This commit implements the first part. Docs for the tool will be added separately as a part of #37812.	2019-01-24 19:25:55 +01:00
Tal Levy	289106a578	Refactor GeoHashGrid to be abstract and re-usable (#37742 ) This change splits out all the specific GeoHash classes for the geohash_grid aggregation into abstract GeoGrid classes that can be re-used for specific hashing types, like `geohash`	2019-01-24 10:12:14 -08:00
Alpar Torok	37768b7eac	Testing conventions now checks for tests in main (#37321 ) * Testing conventions now checks for tests in main This is the last outstanding feature of the old NamingConventionsTask, so time to remove it. * PR review	2019-01-24 17:30:50 +02:00
Nhat Nguyen	a6abb28abf	Fix InternalEngineTests#assertOpsOnPrimary (#37746 ) The assertion `assertOpsOnPrimary` does not store seq_no and primary term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This leads to failures of the subsequent CAS deletes or indexes with seq_no and term. Moreover, this assertion trips a translog assertion because it bumps the primary term of some operations but not the primary term of the engine. Relates #36467 Closes #37684	2019-01-24 10:02:48 -05:00
Yannick Welsch	feab59df03	Bubble exceptions up in ClusterApplierService (#37729 ) Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and block the state from being applied instead of silently being ignored. In combination with the cluster state publishing lag detector, this will throw a node out of the cluster that can't properly apply cluster state updates.	2019-01-24 14:09:03 +01:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
Boaz Leskes	52ba407931	Expose sequence number and primary terms in search responses (#37639 ) Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.	2019-01-23 09:01:58 +01:00
Henning Andersen	228611843c	Fail start of non-data node if node has data (#37347 ) * Fail start of non-data node if node has data Check that nodes started with node.data=false cannot start if they have shard data to avoid (old) indexes being resurrected into the cluster in red status. Issue #27073	2019-01-22 13:27:12 +01:00
Tim Brooks	21838d73b5	Extract message serialization from `TcpTransport` (#37034 ) This commit introduces a NetworkMessage class. This class has two subclasses - InboundMessage and OutboundMessage. These messages can be serialized and deserialized independent of the transport. This allows more granular testing. Additionally, the serialization mechanism is now a simple Supplier. This builds the framework to eventually move the serialization of transport messages to the network thread. This is the one serialization component that is not currently performed on the network thread (transport deserialization and http serialization and deserialization are all on the network thread).	2019-01-21 14:14:18 -07:00
Tim Brooks	f516d68fb2	Share `NioGroup` between http and transport impls (#37396 ) Currently we create dedicated network threads for both the http and transport implementations. Since these threads should never perform blocking operations, they could be shared. This commit modifies the nio-transport to have 0 http workers by default. If the default configs are used, this will cause the http transport to be run on the transport worker threads. The http worker setting will still exist in case the user would like to configure dedicated workers. Additionally, this commit deletes dedicated acceptor threads. We have never had these for the netty transport and they can be added back if a need is determined in the future.	2019-01-21 13:50:56 -07:00
Julie Tibshirani	8da7a27f3b	Deprecate types in the put mapping API. (#37280 ) From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates: - Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`. - Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.	2019-01-18 12:28:31 -08:00
Yannick Welsch	377d96e376	Remove initial_master_nodes on node restart (#37580 ) Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped. The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.	2019-01-18 16:36:42 +01:00
Jason Tedor	687978b7d1	Reject all requests that have an unconsumed body (#37504 ) This commit removes some leniency from REST handling where we move to reject all requests that have a body where the body is not used during the course of handling the request. For example, DELETE /index { "query" : { "term" : { "field" : "value" } } } is now rejected.	2019-01-16 07:29:25 -05:00
Przemyslaw Gomulka	5e94f384c4	Remove the use of AbstractLifecycleComponent constructor #37488 (#37488 ) AbstractLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constructor of its superclass. It no longer extends AbstractComponent, so there is no need for this constructor. There is also no need for AbstractLifecycleComponent subclasses to have Settings in their constructors if they were only passing it on to the super constructor. This is part 1, which will be backported to 6.x with a migration guide/deprecation log. Part 2 will have this constructor removed in 7. Relates #35560 Relates #34488	2019-01-16 09:05:30 +01:00
Tanguy Leroux	23ae9808ba	Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (#37414 ) This commit fixes the IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) method where the startReplica parameter was not correctly propagated and the value true always used instead.	2019-01-15 12:48:21 +01:00
Marios Trivyzas	d6a104f52b	[TEST] Muted testDifferentRolesMaintainPathOnRestart Relates to #37462	2019-01-15 11:51:45 +02:00
Jason Tedor	e11a32eda8	Reformat some classes in the index universe This commit reformats some classes in the index universe with the purpose of breaking some long method definitions and invocations into a line per parameter. This has the advantage that for an upcoming change to these definitions and invocations, the diff for that change will be a single line per definition or invocation. That makes these sorts of changes easier to read.	2019-01-14 21:45:24 -05:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Nhat Nguyen	1e3702da0b	Relax assertSameDocIdsOnShards assertion If the checking node no longer holds the shard copy, the assertion assertSameDocIdsOnShards might fail. This is too harsh since the assertion is to ensure the consistency between active copies.	2019-01-14 15:28:48 -05:00
Nhat Nguyen	15aa3764a4	Reduce recovery time with compress or secure transport (#36981 ) Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encrypting/decrypting are CPU-bound rather than network-bound. With this commit, a source node can send multiple (default to 2) file-chunks without waiting for the acknowledgments from the target. Below are the benchmark results for PMC and NYC_taxis. - PMC (20.2 GB) \| Transport \| Baseline \| chunks=1 \| chunks=2 \| chunks=3 \| chunks=4 \| \| ----------\| ---------\| -------- \| -------- \| -------- \| -------- \| \| Plain \| 184s \| 137s \| 106s \| 105s \| 106s \| \| TLS \| 346s \| 294s \| 176s \| 153s \| 117s \| \| Compress \| 1556s \| 1407s \| 1193s \| 1183s \| 1211s \| - NYC_Taxis (38.6GB) \| Transport \| Baseline \| chunks=1 \| chunks=2 \| chunks=3 \| chunks=4 \| \| ----------\| ---------\| ---------\| ---------\| ---------\| -------- \| \| Plain \| 321s \| 249s \| 191s \| * \| * \| \| TLS \| 618s \| 539s \| 323s \| 290s \| 213s \| \| Compress \| 2622s \| 2421s \| 2018s \| 2029s \| n/a \| Relates #33844	2019-01-14 15:14:46 -05:00
Tim Brooks	5c68338a1c	Implement ccr file restore (#37130 ) This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.	2019-01-14 13:07:55 -07:00
Armin Braun	033e67fa59	Cleanup Deadcode in Rest Tests (#37418 ) * Either dead code outright or redundant overrides removed	2019-01-14 16:22:44 +01:00
Daniel Mitterdorfer	abe35fb99b	Remove unused index store in directory service With this commit we remove the unused field `indexStore` from all implementations of `FsDirectoryService`. Relates #37097	2019-01-14 13:44:32 +01:00
Jason Tedor	03be4dbaca	Introduce retention lease persistence (#37375 ) This commit introduces the persistence of retention leases by persisting them in index commits and recovering them when recovering a shard from store.	2019-01-12 14:43:19 -08:00
Nhat Nguyen	44a1071018	Make recovery source partially non-blocking (#37291 ) Today a peer-recovery may run into a deadlock if the value of node_concurrent_recoveries is too high. This happens because the peer-recovery is executed in a blocking fashion. This commit attempts to make the recovery source partially non-blocking. I will make three follow-ups to make it fully non-blocking: (1) send translog operations, (2) primary relocation, (3) send commit files. Relates #36195	2019-01-12 12:49:48 -05:00
Armin Braun	63fe3c6ed6	Fix PrimaryAllocationIT Race Condition (#37355 ) * Fix PrimaryAllocationIT Race Condition * Forcing a stale primary allocation on a green index was tripping the assertion that was removed * Added a test that this case still errors out correctly * Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test * Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty * Closes #37345	2019-01-11 23:26:04 +01:00
Yannick Welsch	f4abf9628a	Mock connections more accurately in DisruptableMockTransport (#37296 ) This commit moves DisruptableMockTransport to use a more accurate representation of connection management, which allows using the full connection manager and does not require mocking out any behavior. With this, we can implement restarting nodes in CoordinatorTests.	2019-01-11 16:06:48 +01:00
Jason Tedor	822626dadf	Make consistent empty retention lease supplier This commit makes the use of empty retention lease suppliers to always be an empty list as opposed to in some cases an empty set. This commit is solely for consistency reasons, there is no functional change here.	2019-01-10 18:34:55 -08:00
Yannick Welsch	d499233068	Zen2: Add join validation (#37203 ) Adds join validation to Zen2, which prevents a node from joining a cluster when the node does not have the right ES version or does not satisfy any other of the join validation constraints.	2019-01-10 12:57:50 +01:00
Alexander Reelsen	b2e8437424	Tests: Add ElasticsearchAssertions.awaitLatch method (#36777 ) * Tests: Add ElasticsearchAssertions.awaitLatch method Some tests are using assertTrue(latch.await(...)) in their code. This leads to an assertion error without any error message. This adds a method which has a nicer error message and can be used in tests. * fix forbidden apis * fix spaces	2019-01-10 09:25:36 +01:00
Armin Braun	eacc63b032	TESTS: Real Coordinator in SnapshotServiceTests (#37162 ) * TESTS: Real Coordinator in SnapshotServiceTests * Introduce real coordinator in SnapshotServiceTests to be able to test network disruptions realistically * Make adjustments to cluster applier service so that we can pass a mocked single threaded executor for tests	2019-01-09 16:53:49 +01:00
Tanguy Leroux	7f6fe14b66	Merge branch 'master' into close-index-api-refactoring	2019-01-09 09:26:05 +01:00
Mayya Sharipova	ec32e66088	Deprecate reference to _type in lookup queries (#37016 ) Relates to #35190	2019-01-08 18:46:41 -08:00
Tanguy Leroux	d70ebfd1d6	Merge branch 'master' into close-index-api-refactoring	2019-01-08 09:17:48 +01:00
Jason Tedor	c8c596cead	Introduce retention lease expiration (#37195 ) This commit implements a straightforward approach to retention lease expiration. Namely, we inspect which leases are expired when obtaining the current leases through the replication tracker. At that moment, we clean the map that persists the retention leases in memory.	2019-01-07 22:03:52 -08:00
Julie Tibshirani	c5aac4705d	Revert "Stop automatically nesting mappings in index creation requests. (#36924 )" This reverts commit `ac1c6940d2`.	2019-01-07 17:56:40 -08:00
Tanguy Leroux	97bf4d7176	Merge branch 'master' into close-index-api-refactoring	2019-01-07 18:38:27 +01:00
David Turner	9d0e0eb0f3	[Zen2] Remove initial master node count setting (#37150 ) The `cluster.unsafe_initial_master_node_count` setting was introduced as a temporary measure while the design of `cluster.initial_master_nodes` was being finalised. This commit removes this temporary setting, replacing it with usages of `cluster.initial_master_nodes` where appropriate.	2019-01-07 16:05:00 +00:00
Tanguy Leroux	e149b0852e	[Close Index API] Add unique UUID to ClusterBlock (#36775 ) This commit adds a unique id to cluster blocks, so that they can be uniquely identified if needed. This is important for the Close Index API where multiple concurrent closing requests can be executed at the same time. By adding a UUID to the cluster block, we can generate unique "closing block" that can later be verified on shards and then checked again from the cluster state before closing the index. When the verification on shard is done, the closing block is replaced by the regular INDEX_CLOSED_BLOCK instance. If something goes wrong, calling the Open Index API will remove the block. Related to #33888	2019-01-07 16:44:59 +01:00
Jason Tedor	c0f8c89172	Introduce shard history retention leases (#37167 ) This commit is the first in a series which will culminate with fully-functional shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication, and local shard recoveries. A future consumer will be the changes API. Further, index lifecycle management requires coordinating with some of these consumers otherwise it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem. Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases they have a limited lifespan that will expire if not renewed. The idea of these leases is that all operations above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to Lucene and restored during recovery, and broadcast to replicas under certain circumstances. This commit is merely putting the basics in place. This first commit only introduces the concept and integrates their use with the soft delete retention policy. We add some tests to demonstrate the basic management is correct, and that the soft delete policy is correctly influenced by the existence of any retention leases. We make no effort in this commit to implement any of the following: - timestamps - expiration - persistence to and recovery from Lucene - handoff during primary relocation - sharing retention leases with replicas - exposing leases in shard-level statistics - integration with cross-cluster replication These will occur individually in follow-up commits.	2019-01-07 07:43:57 -08:00
Alpar Torok	a7c3d5842a	Split third party audit exclusions by type (#36763 )	2019-01-07 17:24:19 +02:00
Simon Willnauer	ac2e09b25a	Fix suite scope random initialization (#37163 ) The initialization of a suite scope cluster had some side effects on subsequent runs which cause issues when tests must be reproduced. This moves the suite scope initialization to a private random context. Closes #36202	2019-01-07 14:20:17 +01:00
Tanguy Leroux	f5af79b9cd	Merge branch 'master' into close-index-api-refactoring	2019-01-07 12:43:03 +01:00
Armin Braun	31c33fdb9b	MINOR: Remove some Deadcode in Gradle (#37160 )	2019-01-07 09:21:25 +01:00
Jim Ferenczi	e38cf1d0dc	Add the ability to set the number of hits to track accurately (#36357 ) In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected. Relates #33028	2019-01-04 20:36:49 +01:00
Julie Tibshirani	ac1c6940d2	Stop automatically nesting mappings in index creation requests. (#36924 ) Now that we unwrap mappings in DocumentMapperParser#extractMappings, it is not necessary for the mapping definition to always be nested under the type. This leniency around the mapping format was added in `2341825358`.	2019-01-03 17:41:28 -08:00
David Findley	d4e7660248	Fix weighted_avg parser not found for RestHighLevelClient (#37027 ) Add integration test for weighted avg sub aggregation Add weighted avg parser to DefaultNamedXContents Fixes #36861	2019-01-02 15:53:21 -06:00
Josh Soref	1df66d21fe	Spelling: replace uknown with unknown (#37056 )	2019-01-02 17:33:02 +01:00
Josh Soref	d3e98278c3	Spelling: replace cachable with cacheable (#37047 )	2019-01-02 14:10:30 +01:00
Armin Braun	85be9d6a89	SNAPSHOT: Deterministic ClusterState Tests (#36644 ) * Use `DeterministicTaskQueue` infrastructure to test `SnapshotsService`	2018-12-31 11:17:21 +01:00
Luca Cavanna	51fe20e0c3	Add support for local cluster alias to SearchRequest (#36997 ) With the upcoming cross-cluster search alternate execution mode, the CCS node will be able to split a CCS request into multiple search requests, one per remote cluster involved. In order to do that, the CCS node has to be able to signal to each remote cluster that such sub-requests are part of a CCS request. Each cluster does not know about the other clusters involved, and does not know either what alias it is given in the CCS node, hence the CCS coordinating node needs to be able to provide the alias as part of the search request so that it is used as index prefix in the returned search hits. The cluster alias is a notion that's already supported in the search shards iterator and search shard target, but it is currently used in CCS as both index prefix and connection lookup key when fanning out to all the shards. With CCS alternate execution mode the provided cluster alias needs to be used only as index prefix, as shards are local to each cluster hence no cluster alias should be used for connection lookups. The local cluster alias can be set to the SearchRequest at the transport layer only, and its constructor/getter methods are package private. Relates to #32125	2018-12-28 12:43:25 +01:00
Andrey Ershov	a02cfdf6e4	Switch InternalTestClusterTests to zen2 (#36977 ) Today InternalTestClusterTests is still using zen1. This commit fixes it. Two types of changes were required: 1. Explicitly pass file discovery host provider setting. It's done in ESIntegTestCase as a part of the Zen2 feature and should be done here as well. 2. For the test, that uses autoManageMinMasterNodes = false perform cluster bootstrap.	2018-12-27 22:21:37 +01:00
Nhat Nguyen	7580d9d925	Make SourceToParse immutable (#36971 ) Today the routing of a SourceToParse is assigned in a separate step after the object is created. We can easily forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921	2018-12-24 14:06:50 -05:00
Tim Brooks	c8a8391dfa	Only compress responses if request was compressed (#36867 ) This is a follow-up to some discussions around #36399. Currently we have relatively confusing compression behavior where compression can be configured for requests based on transport.compress or a specific setting for a remote cluster. However, we can only compress responses based on transport.compress as we do not know where a request is coming from (currently). This commit modifies the behavior to NEVER compress responses based on settings. Instead, a response will only be compressed if the request was compressed. This commit also updates the documentation to more clearly describe transport level compression.	2018-12-21 10:14:00 -07:00
Tanguy Leroux	bd2af2c400	Merge branch 'master' into close-index-api-refactoring	2018-12-21 12:22:24 +01:00
Andrey Ershov	ca92d74e7e	[Zen2] Change unsafe bootstrap nodes count to nodes list in tests (#36559 ) This commit modifies ESSingleNodeTestCase and ESIntegTestCase and several concrete test classes to use node names when bootstrapping the cluster. Today ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING setting is used to bootstrap clusters in tests. Instead, we want to use ClusterBootstrapService.INITIAL_MASTER_NODES_SETTING and get rid of the former setting eventually. There were two main problems when refactoring InternalTestCluster: 1. Nodes are created one-by-one in buildNode method. And node.name is created in this method as well. It's not suitable for bootstrapping, because we need to have the names of all master eligible nodes in advance, before creating the node with bootstrapping configuration set. We address this issue by separating buildNode into two methods: getNodeSettings and buildNode. We first iterate over all nodes to get nodes settings, then change the setting for the bootstrapping node and then proceed with building the node. 2. If autoManageMinMasterNodes = false, there is no way for the test to set the list of bootstrapping nodes because node names are not known in advance. This problem is solved by adding updateNodesSettings method to NodeConfigurationSource and ESIntegTestCase (which could be overridden by concrete integration test class). Once we have the list of settings for all nodes, the integration test class is allowed to update it. In our case, we update the ClusterBootstrapService.INITIAL_MASTER_NODES_SETTING setting.	2018-12-20 15:20:33 +01:00
Tanguy Leroux	fb24469fe7	Merge branch 'master' into close-index-api-refactoring	2018-12-19 16:17:26 +01:00
Yannick Welsch	487a1c4f71	Fix cluster state persistence for single-node discovery (#36825 ) Single-node discovery is not persisting cluster states, which was caused by a recent 7.0-only refactoring. This commit ensures that the cluster state is properly persisted when using single-node discovery and adds a corresponding test.	2018-12-19 13:26:04 +01:00
Alan Woodward	344917efab	Add script filter to intervals (#36776 ) This commit adds the ability to filter out intervals based on their start and end position, and internal gaps: ``` POST _search { "query": { "intervals" : { "my_text" : { "match" : { "query" : "hot porridge", "filter" : { "script" : { "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0" } } } } } } } ```	2018-12-19 11:12:18 +00:00
Tanguy Leroux	c99fd6a53b	Merge branch 'master' into close-index-api-refactoring	2018-12-19 09:34:59 +01:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of Gradle's built in - convert all configuration to use the new task - the built in task is now disabled	2018-12-19 08:25:20 +02:00
Tim Brooks	aaf466ff5e	Revert transport.port change for tests (#36809 ) Commit #36786 updated docs and strings to reference transport.port instead of transport.tcp.port. However, this breaks backwards compatibility tests as the tests rely on string configurations and transport.port does not exist prior to 6.6. This commit reverts the places where we reference transport.tcp.port for tests. This work will need to be reintroduced in a backwards compatible way.	2018-12-18 19:01:13 -07:00
Tim Brooks	47a9a8de49	Update transport docs and settings for changes (#36786 ) This is related to #36652. In 7.0 we plan to deprecate a number of settings that make reference to the concept of a tcp transport. We mostly just have a single transport type now (based on tcp). Settings should only reference tcp if they are referring to socket options. This commit updates the settings in the docs. And removes string usages of the old settings. Additionally it adds a missing remote compress setting to the docs.	2018-12-18 13:09:58 -07:00
Ryan Ernst	8ec8342a52	Internal: Remove originalSettings from Node (#36569 ) This commit removes the originalSettings member from Node. It was only needed to allow test clusters to recreate the node in certain situations. Instead, the test cluster now keeps track of these settings.	2018-12-18 10:05:27 -08:00
Tanguy Leroux	0a0c969517	Merge branch 'master' into close-index-api-refactoring	2018-12-18 09:27:35 +01:00
Luca Cavanna	b57e12aa44	Add raw sort values to SearchSortValues transport serialization (#36617 ) In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one. This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.	2018-12-18 09:20:51 +01:00
Nicholas Knize	96d279ed83	Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320 )" This reverts commit `5bc7822562`.	2018-12-17 20:09:46 -06:00
Nick Knize	5bc7822562	[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320 ) This commit exposes lucene's LatLonShape field as the default type in GeoShapeFieldMapper. To use the new indexing approach, simply set "type" : "geo_shape" in the mappings without setting any of the strategy, precision, tree_levels, or distance_error_pct parameters. Note the following when using the new indexing approach: * geo_shape query does not support querying by MULTIPOINT. * LINESTRING and MULTILINESTRING queries do not yet support WITHIN relation. * CONTAINS relation is not yet supported. The tree, precision, tree_levels, distance_error_pct, and points_only parameters are deprecated.	2018-12-17 14:38:14 -06:00
Luca Cavanna	f1e1f93943	[TEST] fix float comparison in RandomObjects#getExpectedParsedValue This commit fixes a test bug introduced with #36597. This caused some test failure as stored field values comparisons would not work when CBOR xcontent type was used. Closes #29080	2018-12-17 21:19:59 +01:00
Tanguy Leroux	79999d37d4	Merge branch 'master' into close-index-api-refactoring	2018-12-17 10:14:38 +01:00
Boaz Leskes	733a6d34c1	Add seq no powered optimistic locking support to the index and delete transport actions (#36619 ) This commit adds support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) in the delete and index transport actions and requests. A follow up will come with adding sequence numbers to the update and get results. Relates #36148 Relates #10708	2018-12-15 17:59:57 +01:00
Tim Brooks	3065300434	Unify transport settings naming (#36623 ) This commit updates our transport settings for 7.0. It generally takes a few approaches. First, for normal transport settings, it uses transport. instead of transport.tcp. Second, it uses transport.tcp, http.tcp, or network.tcp for all settings that are proxies for OS level socket settings. Third, it marks the network.tcp.connect_timeout setting for removal. Network service level settings are only settings that apply to both the http and transport modules. There is no connect timeout in http. Fourth, it moves all the transport settings to a single class TransportSettings similar to the HttpTransportSettings class. This commit does not actually remove any settings. It just adds the new renamed settings and adds todos for settings that will be deprecated.	2018-12-14 14:41:04 -07:00
Tim Brooks	fbf88b2ab7	Remove the `MockTcpTransport` (#36628 ) This commit removes all remaining usages of the `MockTcpTransport`. Additionally it removes the `MockTcpTransport` and its test case.	2018-12-14 10:59:07 -07:00
Luca Cavanna	bb3ae18da5	Increase coverage in SearchSortValuesTests (#36597 ) SearchSortValuesTests extends now `AbstractSerializingTestCase` which removes some code duplication and standardizes the way we test `fromXContent`, serialization and equals/hashcode. Also, we were never creating `SearchSortValues` through their public constructor that accepts an array of `DocValueFormat` together with the array of raw sort values. That is covered now, which involved some conversion from `BytesRef` to String in the test. Also, the previous test was not doing any equality check against the original and parsed versions in `testFromXContent` due to values being parsed with different types in some cases, which is now covered by converting those values using a new method added to `RandomObjects`. The code was already there as part of `randomStoredFieldValues`, but it is now exposed to be used in other scenarios.	2018-12-14 18:57:37 +01:00
Luca Cavanna	7dc3d3b78b	Add sort and collapse info to SearchHits transport serialization (#36555 ) In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of the `SearchHits`, specifically: - lucene `SortField[]` which contains info about the fields that sorting was performed on and their type, which depends on mappings (that the CCS node does not know about) - collapse field (`String`) that field collapsing was executed on, if requested - collapse values (`Object[]`) that field collapsing was based on, if requested This info is needed to be able to reconstruct the `TopFieldDocs` or `CollapseFieldTopDocs` in the CCS coordinating node to feed the `mergeTopDocs` method and reduce multiple search responses received (one per cluster) into one. This commit adds such information to the `SearchHits` class. It's nullable info that is not serialized through the REST layer. `SearchPhaseController` sets such info at the end of the hits reduction phase.	2018-12-14 12:22:54 +01:00
Armin Braun	c5b3ac5578	SNAPSHOTS: Allow Parallel Restore Operations (#36397 ) * Enable parallel restore operations * Add uuid to restore in progress entries to uniquely identify them * Adjust restore in progress entries to be a map in cluster state * Added tests for: * Parallel restore from two different snapshots * Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator * Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers	2018-12-14 11:39:23 +01:00
Tanguy Leroux	8e5dd20efb	[Close Index API] Refactor MetaDataIndexStateService (#36354 ) The commit changes how indices are closed in the MetaDataIndexStateService. It now uses a 3 steps process where writes are blocked on indices to be closed, then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction added in #36249, and finally indices states are moved to CLOSE and their routing tables removed. The closing process also takes care of using the pre-7.0 way to close indices if the cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices. Related to #33888	2018-12-13 17:36:23 +01:00
Boaz Leskes	f6b5d7e013	Add sequence numbers based optimistic concurrency control support to Engine (#36467 ) This commit adds support to engine operations for resolving and verifying the sequence number and primary term of the last modification to a document before performing an operation. This is infrastructure to move our [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) API to use sequence numbers instead of internal versioning. Relates #36148 Relates #10708	2018-12-13 08:08:40 +01:00
Tal Levy	cd1bec3a06	[refactor] add Environment in BootstrapContext (#36573 ) There are certain BootstrapCheck checks that may need access to environment-specific values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment via a constructor to bypass the shortcoming in BootstrapContext. This commit pulls in the node's environment into BootstrapContext. Another case is found in #36519, where it is useful to check the state of the data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on the environment to retrieve references to things like node data paths. This means that the BootstrapContext will have the same Settings used in the Environment, which currently differs from the Node's settings.	2018-12-12 21:07:21 -08:00
Julie Tibshirani	71a39d10be	Make sure that BWC tests run successfully, even with types deprecation messages. (#36511 )	2018-12-12 12:57:32 -08:00
Tim Brooks	7f612d5dd8	Always compress based on the settings (#36522 ) Currently TransportRequestOptions allows specific requests to request compression. This commit removes this and always compresses based on the settings. Additionally, it removes TransportResponseOptions as they are unused. This closes #36399.	2018-12-12 09:39:15 -07:00
Simon Willnauer	ff5dd14753	Fix test failures related to file corruption (#36530 ) * Fix CorruptFileIT to also take last DV generation into account We currently only prune old .liv generations. With soft_deletes it's important to also prune DV generations. * Fix CorruptionUtils to skip the footer bytes after the checksum is read. Today we read a broken checksum since we also checksum the 8 footer bytes that include the checksum algorithm and the footer magic. Closes #36526	2018-12-12 16:21:02 +01:00
Tim Brooks	e63d52af63	Move page size constants to PageCacheRecycler (#36524 ) `PageCacheRecycler` is the class that creates and holds pages of arrays for various uses. `BigArrays` is just one user of these pages. This commit moves the constants that define the page sizes for the recycler to be on the recycler class.	2018-12-12 07:00:50 -07:00
Alpar Torok	c00d0fc814	Test fixtures improvements (#36037 ) * Upgrade plugin to latest and expose udp * Explicit check for windows * Rename the properties for the port numbers * Tasks for pre and post container actions	2018-12-12 12:00:47 +02:00
Nik Everett	03daad9812	Re-deprecate xpack rollup endpoints (#36451 ) Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`. When we cleanup the rollup in a cluster containing 6.x nodes we need to use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't know about `/_rollup`. In those cases we must ignore the deprecation warnings that the 7.0 node will return for the end point. Closes #36044	2018-12-11 19:43:17 -05:00
Tim Brooks	797f985067	Add version to handshake requests (#36171 ) Currently our handshake requests do not include a version. This is unfortunate as we cannot rely on the stream version since it is not the sending node's version. Instead it is the minimum compatibility version. The handshake request is currently empty and we do nothing with it. This should allow us to add data to the request without breaking backwards compatibility. This commit adds the version to the handshake request. Additionally, it allows "future data" to be added to the request. This allows nodes to craft a version compatible response. And will properly handle additional data in future handshake requests. The proper handling of "future data" is useful as this is the only request where we do not know the other node's version. Finally, it renames the TcpTransportHandshaker to TransportHandshaker.	2018-12-11 16:09:28 -07:00
Mayya Sharipova	2f18325384	Deprecate types in update_by_query and delete_by_query (#36365 ) Relates to #35190	2018-12-11 17:09:59 -05:00
Tim Brooks	790f8102e9	Modify `BigArrays` to take name of circuit breaker (#36461 ) This commit modifies BigArrays to take a circuit breaker name and the circuit breaking service. The default instance of BigArrays that is passed around everywhere always uses the request breaker. At the network level, we want to be using the inflight request breaker. So this change will allow that. Additionally, as this change moves away from a single instance of BigArrays, the class is modified to not be a Releasable anymore. Releasing big arrays was always dispatching to the PageCacheRecycler, so this change makes the PageCacheRecycler the class that needs to be managed and torn-down. Finally, this commit closes #31435 by making the serialization of transport messages use the inflight request breaker. With this change, we no longer push the global BigArrays instance to the network level.	2018-12-11 11:55:41 -07:00
markharwood	a9eccbcd02	Tests- added helper methods to ESRestTestCase for checking warnings (#36443 ) Added helper methods to ESRestTestCase for checking warnings in mixed and current-version-only clusters. This is supported by a new VersionSpecificWarningsHandler class with associated unit test. Closes #36251	2018-12-11 17:30:15 +00:00
Andrey Ershov	8b821706cc	Switch more tests to zen2 (#36367 ) 1. CCR tests work without any changes 2. `testDanglingIndices` requires changes to the source code (added TODO). 3. `testIndexDeletionWhenNodeRejoins` because it's using just two nodes, adding the node to exclusions is needed on restart. 4. `testCorruptTranslogTruncationOfReplica` starts dedicated master one, because otherwise, the cluster does not form, if nodes are stopped and one node is started back. 5. `testResolvePath` needs TEST cluster, because all nodes are stopped at the end of the test and it's not possible to perform checks needed by SUITE cluster. 6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2 retries snapshot creation as soon as network partition heals. This results in a race between creating snapshot and test cleanup logic (deleting index). Zen1 on the other hand, also schedules retry, but it takes some time after network partition heals, so cleanup logic executes later and test passes. The check that snapshot is eventually created is added to the end of the test.	2018-12-11 17:12:17 +01:00
Julie Tibshirani	87831051dc	Deprecate types in explain requests. (#35611 ) The following updates were made: - Add a new untyped endpoint `{index}/_explain/{id}`. - Add deprecation warnings to RestAction, plus tests in RestActionTests. - For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml). - Deprecate relevant methods on the Java HLRC requests/ responses. - Update documentation (for both the REST API and Java HLRC).	2018-12-10 19:45:13 -08:00
Jernej Klancic	d615add1b1	Add pipeline parent validation for auto date histogram (#35670 ) Allow `auto_date_histogram` as a valid parent agg for derivative, cumulative sum, moving average, moving function and serial differencing pipeline aggregations. Since all these aggs share the same requirement (sequentially ordered parent aggs), this commit also refactors to share the same validation code so that any newly added aggs won't be forgotten. Closes #35578	2018-12-10 16:02:49 -05:00
Jim Ferenczi	75392adf60	[TEST] Convert SearchHitsTests to AbstractStreamableXContentTestCase (#36313 ) This change adds a way to provide the content type of the rest serialization tests when creating random instances. This is used by SearchHitsTests to ensure that the internal members of the class are created with the same xContentType and that equals can be used to compare instances created from an XContent view.	2018-12-10 20:41:20 +01:00
Nik Everett	9626e700ce	LLRC: Make warning behavior pluggable per request (#36345 ) This allows you to plug the behavior that the LLRC uses to handle warnings on a per request basis. We entertained the idea of allowing you to set the warnings behavior to strict mode on a per request basis but that wouldn't allow the high level rest client to fail when it sees an unexpected warning. We also entertained the idea of adding a list of "required warnings" to the `RequestOptions` but that won't work well with failures that occur sometimes like those we see in mixed clusters. Adding a list of "allowed warnings" to the `RequestOptions` would work for mixed clusters but it'd leave many of the assertions in our tests weaker than we'd like. This behavior plugging implementation allows us to make a "required warnings" option when we need it and an "allowed warnings" behavior when we need it. I don't think this behavior is going to be commonly used by users outside of the Elasticsearch build, but I expect there'll be a few commendably paranoid folks who could use this behavior.	2018-12-10 08:32:00 -05:00
David Turner	9f86e996fe	[Zen2] Support rolling upgrades from Zen1 (#35737 ) We support rolling upgrades from Zen1 by keeping the master as a Zen1 node until there are no more Zen1 nodes in the cluster, using the following principles: - Zen1 nodes will never vote for Zen2 nodes - Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes - Zen2 nodes that were previously part of a mixed cluster will automatically (and unsafely) bootstrap themselves when the last Zen1 node leaves.	2018-12-08 07:33:35 +00:00
Tim Brooks	8a53f2b464	Implement basic `CcrRepository` restore (#36287 ) This is related to #35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.	2018-12-07 15:27:04 -07:00
Tim Brooks	5556204f81	Use MockNioTransport in MockTransportService (#36346 ) The default transport used in the MockTransportService is the MockTcpTransport. This commit changes that to be the MockNioTransport.	2018-12-07 11:17:11 -07:00
Nhat Nguyen	f2df0a5be4	Remove LocalCheckpointTracker#resetCheckpoint (#34667 ) In #34474, we added a new assertion to ensure that the LocalCheckpointTracker is always consistent with Lucene index. However, we reset LocalCheckpointTracker in testDedupByPrimaryTerm, causing this assertion to be violated. This commit removes resetCheckpoint from LocalCheckpointTracker and rewrites testDedupByPrimaryTerm without resetting the local checkpoint. Relates #34474	2018-12-07 12:22:20 -05:00
David Turner	ed1c5a0241	Introduce `zen2` discovery type (#36298 ) With this change it is now possible to start a node running Zen2.	2018-12-06 16:20:08 +00:00
David Turner	38ab15c6fb	Avoid shutting down the only master (#36272 ) Today the InternalTestClusterTests sometimes set up a cluster with a single master, start some other nodes, shut the original master down, and then reset the cluster. This doesn't really work, because the original master may be stale. This change avoids shutting down the only master in this situation.	2018-12-06 08:27:38 +01:00
Yannick Welsch	a0ae1cc987	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 23:13:12 +01:00
Yannick Welsch	03d0ea91ef	Zen2: Rename tombstones to exclusions (#36226 ) Renames the withdrawal / tombstones APIs to voting configuration exclusions.	2018-12-05 23:12:28 +01:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equal to `eq`) or a lower bound of the total (in which case it is equal to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Yannick Welsch	b20497560c	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 14:06:38 +01:00
Yannick Welsch	0b9efff5cb	Zen2: Persist cluster states the old way on non-master-eligible nodes (#36247 ) The shard deletion logic (triggered by IndicesStore), which also leads to index metadata deletion on non-master-eligible data nodes, currently races against the new cluster state persistence logic triggered by accepting cluster states. One thread is writing the index metadata while another one is deleting the index metadata, leading to exceptions and assertions tripping (see below). The solution proposed by this PR is to move the cluster state persistence of non-master-eligible nodes back to the cluster applier service, just as it used to be for Zen1. This ensures that the index metadata deletion logic, which is triggered by the shard deletion logic, runs on the same thread on which we persist the cluster state.	2018-12-05 14:04:45 +01:00
Alpar Torok	60e45cd81d	Testing conventions task part 2 (#36107 ) Closes #35435 - make it easier to add additional testing tasks with the proper configuration and add some where they were missing. - mute or fix failing tests - add a check as part of testing conventions to find classes not included in any testing task.	2018-12-05 14:20:01 +02:00
Yannick Welsch	70c361ea5a	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 21:26:11 +01:00
Adrien Grand	0df08dd458	Set Lucene version upon index creation. (#36038 ) It is important that all shards of a given index have the same `indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to be considered illegal. At the moment, we use the latest Lucene version when creating a shard, which could cause shards to have different created versions eg. in case of forced allocation. This commit makes sure to reuse the appropriate Lucene version in order to avoid such issues. Closes #33826	2018-12-04 17:53:20 +01:00
Nhat Nguyen	b59deb573e	Always set soft-deletes field of IndexWriterConfig (#36196 ) Today we configure the soft-deletes field iff soft-deletes enabled. Although this choice was correct, it prevents an engine with soft-deletes disabled from opening a Lucene index with soft-deletes. Moreover, this change should not have any side-effect if a Lucene index does not have any soft-deletes. Relates #36141	2018-12-04 11:15:34 -05:00
Andrey Ershov	35e3d77e2c	[Zen2] Implement state recovery (#36013 ) This commit implements proper metadata recovery for Zen2. GatewayService is responsible for the recovery. In Zen1 GatewayService creates an instance of Gateway, that is used to reach out to other cluster nodes, get their state and calculate the most up-to-date state based on versions. After that Gateway performs upgrade and archival of ClusterSettings and closes bad indices. Then recovered state is passed to GatewayService.GatewayRecoveryListener that mixes up current state and restored state, removes state not recovered block, creates the routing table and performs re-routing. In Zen2 we should perform this kind of logic on cluster startup, except mixing state (because there is nothing to mix) and opening routing table. This commit refactors out all `ClusterUpdate` functions in a separate class `ClusterStateUpdaters`, which is used by `Gateway` and `GatewayService` in case of Zen1, and by `GatewayMetaState` and `GatewayService` in case of Zen2. This commit also switches all integration tests that are already using Zen2 from InMemoryPersistedState to GatewayMetaState.	2018-12-04 14:45:45 +01:00
Yannick Welsch	80ee7943c9	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 09:37:09 +01:00
Alpar Torok	d036e0ca89	Testclusters: implement starting, waiting for and stopping single cluster nodes (#35599 )	2018-12-04 10:16:51 +02:00
David Turner	034c7655b7	[Zen2] Reduce cluster scope in NodeDisconnectIT (#36168 ) This test suite can stop all the shared master-eligible nodes, which breaks the cluster since any non-shared master-eligible nodes are stopped first in the reset process between tests. Since this test suite can leave the cluster in this somewhat broken state, it seems best that it uses a new cluster for each test.	2018-12-04 07:48:56 +00:00
Armin Braun	433a506d06	SNAPSHOT: Improve Resilience SnapshotShardService (#36113 ) * Resolve the index in the snapshotting thread * Added test for routing table - snapshot state mismatch	2018-12-03 16:39:29 +01:00
Jim Ferenczi	74aca756b8	Remove the distinction between query and filter context in QueryBuilders (#35354 ) When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that requires this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context. This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required. Closes #35293	2018-12-03 11:49:11 +01:00
David Turner	8011438ea8	Use correct source of randomness This fixes a failure of InternalTestClusterTests#testBeforeTest which checks that the cluster is set up the same when starting from the same seed. Trappily, using ESTestCase#randomIntBetween() is no good, we have to use InternalTestCluster#random via RandomNumbers#randomIntBetween() instead.	2018-12-02 09:39:43 +00:00
David Turner	8191348d6b	[Zen2] Only bootstrap a single node (#36119 ) Today, we allow all nodes in an integration test to bootstrap. However this seems to lead to test failures due to post-election instability. The change avoids this instability by only bootstrapping a single node in the cluster.	2018-12-01 06:43:11 +00:00
Luca Cavanna	0ebc17743a	Histogram aggs: add empty buckets only in the final reduce step (#35921 ) Empty buckets don't need to be added when performing an incremental reduction step, they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non final reduction.	2018-11-30 20:33:09 +01:00
Tim Brooks	ea7ea51050	Make `TcpTransport#openConnection` fully async (#36095 ) This is a follow-up to #35144. That commit made the underlying connection opening process in TcpTransport asynchronous. However the method still blocked on the process being complete before returning. This commit moves the blocking to the ConnectionManager level. This is another step towards the top-level TransportService api being async.	2018-11-30 11:30:42 -07:00
Tim Brooks	26dcbcc8cc	Remove `MockTcpTransport` for ESIntegTestCase (#36089 ) This commit removes the `MockTcpTransport` as a transport option for `ESIntegTestCase`. It is the first step in replacing the usages of `MockTcpTransport` with `MockNioTransport`.	2018-11-30 09:04:51 -07:00
Adrien Grand	fa3d365ee8	Fix CompositeBytesReference#slice to not throw AIOOBE with legal offsets. (#35955 ) CompositeBytesReference#slice has two bugs: - One that makes it fail if the reference is empty and an empty slice is created, this is #35950 and is fixed by special-casing empty-slices. - One performance bug that makes it always create a composite slice when creating a slice that ends on a boundary, this is fixed by computing `limit` as the index of the sub reference that holds the last element rather than the next element after the slice. Closes #35950	2018-11-30 10:32:46 +01:00
Zachary Tong	61c2db5ebb	Revert "Deprecate X-Pack centric rollup endpoints (#35962 )" This reverts commit `b84f1f6a3a`.	2018-11-29 12:58:23 -05:00
Zachary Tong	40c5445480	Revert "[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000 )" This reverts commit `85cdf4f913`.	2018-11-29 12:56:25 -05:00
Tim Brooks	c305f9dc03	Make keepalive pings bidirectional and optimizable (#35441 ) This is related to #34405 and a follow-up to #34753. It makes a number of changes to our current keepalive pings. The ping interval configuration is moved to the ConnectionProfile. The server channel now responds to pings. This makes the keepalive pings bidirectional. On the client-side, the pings can now be optimized away. What this means is that if the channel has received a message or sent a message since the last pinging round, the ping is not sent for this round.	2018-11-29 08:55:53 -07:00
Zachary Tong	85cdf4f913	[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000 ) When wiping rollup jobs, if we are in a mixed cluster with < v7.0 nodes we need to fall back to the deprecated endpoint because we may talk to a 6.x node.	2018-11-29 07:37:33 -05:00
David Turner	7f257187af	[Zen2] Update default for USE_ZEN2 to true (#35998 ) Today the default for USE_ZEN2 is false and it is overridden in many places. By defaulting it to true we can be sure that the only places in which Zen2 does not work are those in which it is explicitly set to false.	2018-11-29 12:18:35 +00:00
Jason Tedor	b84f1f6a3a	Deprecate X-Pack centric rollup endpoints (#35962 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.	2018-11-27 20:34:17 -05:00
Tim Brooks	cc1fa799c8	Remove `TcpChannel#setSoLinger` method (#35924 ) This commit removes the dedicated `setSoLinger` method. This simplifies the `TcpChannel` interface. This method has very little effect as the SO_LINGER is not set prior to the channels being closed in the abstract transport test case. We still will set SO_LINGER on the `MockNioTransport`. However we can do this manually.	2018-11-27 09:08:14 -07:00
Andrey Ershov	0e283f9670	[Zen2] PersistedState interface implementation (#35819 ) Today GatewayMetaState is capable of atomically storing MetaData to disk. We've also moved fields that need to be persisted in Zen2 from ClusterState to ClusterState.MetaData.CoordinationMetaData. This commit implements the PersistedState interface. version and currentTerm are persisted as a part of Manifest. GatewayMetaState now implements both ClusterStateApplier and PersistedState interfaces. We started with two descendants Zen1GatewayMetaState and Zen2GatewayMetaState, but it turned out not to be easy to glue them together. GatewayMetaState now constructs previousClusterState (including MetaData) and previousManifest inside the constructor so that all PersistedState methods are usable as soon as a GatewayMetaState instance is constructed. Also, loadMetaData is renamed to getMetaData, because it just returns previousClusterState.metaData(). Sadly, we don't have access to localNode (obtained from TransportService) in the constructor, so getLastAcceptedState should be called after the setLocalNode method is invoked. Currently, when deciding whether to write IndexMetaData to disk, we're comparing current IndexMetaData version and received IndexMetaData version. This is not safe in Zen2 if the term has changed. So updateClusterState now accepts incremental write method parameter. When it's set to false, we always write IndexMetaData to disk. Things that are not covered by GatewayMetaStateTests are covered by GatewayMetaStatePersistedStateTests. This commit also adds an option to use GatewayMetaState instead of InMemoryPersistedState in TestZenDiscovery. However, by default InMemoryPersistedState is used and only one test in PersistedStateIT used GatewayMetaState. In order to use it for other tests, proper state recovery should be implemented.	2018-11-27 15:04:52 +01:00
Christophe Bismuth	b95a4db6e6	Throw a parsing exception when boost is set in span_or query (#28390 ) (#34112 )	2018-11-26 12:15:59 -05:00
Jim Ferenczi	e37a0ef844	Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816 )	2018-11-22 15:42:59 +01:00
Andrey Ershov	a056bd8c1c	[Zen2] Move ClusterState fields to be persisted to ClusterState.MetaData (#35625 ) Today we have a way to atomically persist global MetaData and IndexMetaData to disk when new ClusterState is received. All other ClusterState fields are not persisted. However, there are other parts of ClusterState that should be persisted, namely: version, term, lastCommittedConfiguration, lastAcceptedConfiguration and votingTombstones. version is changed frequently, other fields are not. We decided to group term, lastCommittedConfiguration, lastAcceptedConfiguration and votingTombstones into CoordinationMetaData class and make CoordinationMetaData a field inside MetaData. MetaData.toXContent and MetaData.fromXContent should take care of CoordinationMetaData. version stays as a top level field in ClusterState and will be persisted as part of Manifest in a follow-up commit. Also MetaData.isGlobalStateEquals should be extended to include coordinationMetaData in comparison. This commit favors exposing getters, such as getTerm directly in ClusterState to avoid massive code changes. An example of CoordinationMetaState.toXContent: { "term": 1, "last_committed_config": [ "TiIuBcbBtpuXyDDVHXeD", "ZIAoVbkjjLPLUuYLaTkw" ], "last_accepted_config": [ "OwkXbXZNOZPJqccdFHdz", "LouzsGYwmQzpeQMrboZe", "fCKGRZdjLTqzXAqPUtGL", "pLoxshjpJXwDhbgjfYJy", "SjINLwFIlIEFZCbjrSFo", "MDkVncJEVyZLJktopWje" ] }	2018-11-21 17:03:26 +01:00
Andrey Ershov	6ac0cb1842	Merge branch master into zen2 2 types of conflicts during the merge: 1) Line length fix 2) Classes no longer extend AbstractComponent	2018-11-21 15:36:49 +01:00
Yannick Welsch	8939a7894f	Zen2: Move disruption tests to Zen2 (#35724 ) - Moves disruption tests to Zen2 - Registers a few missing settings - Removes .put(TestZenDiscovery.USE_ZEN2.getKey(), true) from tests where Zen2 is now enabled by default through the parent test class - Moves QuorumGatewayIT back to Zen1, as it is not stable with Zen2 as it currently relies on dangling indices due to the lack of proper CS persistence, which triggers secondary failures	2018-11-21 14:43:33 +01:00
Armin Braun	7a210342ab	TESTS: Remove Dead Code in Disruption Tests (#35768 ) * Neither this class nor the constructor are used anywhere	2018-11-21 10:33:50 +01:00
Christoph Büscher	5847f8379c	Move ScoreAccessor to test-framework (#35766 ) This class is only used by RandomScoreFunctionIT and the MockScriptEngine, so it shouldn't be part of the server codebase.	2018-11-21 10:28:31 +01:00
Armin Braun	33c713ba60	TESTS: More Logging in LongGcDisruptionTests (#35702 ) * The existing logging is not helpful enough to track down which threads hang, we need the hanging thread's stacktraces too * Relates #35686	2018-11-20 15:36:01 +01:00
Alpar Torok	8659af68e0	Auto skip license headers on no source (#35640 ) * Unmute BuildExamplePluginsIT * Skip licenseHeaders when there are no sources	2018-11-20 13:02:33 +02:00
Simon Willnauer	29ef442841	Add a `_freeze` / `_unfreeze` API (#35592 ) This commit adds a rest endpoint for freezing and unfreezing an index. Among other cleanups mainly fixing an issue accessing package private APIs from a plugin that got caught by integration tests this change also adds documentation for frozen indices. Note: frozen indices are marked as `beta` and available as a basic feature. Relates to #34352	2018-11-20 08:03:24 +01:00
Yannick Welsch	47ada69c46	Zen2: Move most integration tests to Zen2 (#35678 ) Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR are as follows: - ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings). - Some of the integration tests require persistent storage of the cluster state, which is not fully implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched over as soon as that is fully implemented. - Some very few integration tests are not running yet with Zen2 for other reasons, depending on some of the other open points in #32006.	2018-11-19 21:15:29 +01:00
Gordon Brown	b2057138a7	Remove AbstractComponent from AbstractLifecycleComponent (#35560 ) AbstractLifecycleComponent now no longer extends AbstractComponent. In order to accomplish this, many, many classes now instantiate their own logger.	2018-11-19 09:51:32 -07:00
Arthur Gavlyukovskiy	022726011c	Remove use of AbstractComponent in server (#35444 ) Removed extending of AbstractComponent and changed logger usage to explicit declaration. Abstract classes still have logger declaration using this.getClass() in order to show implementation class name in its logs. See #34488	2018-11-16 16:10:32 -05:00
Jernej Klancic	baf33b3162	Removes AbstractComponent from several classes (#35566 ) Removes inheriting from AbstractComponent for some classes (mostly in the plugins module). Relates to #34488	2018-11-16 20:50:18 +01:00
Lee Hinman	ce35d049e9	[TEST] Fix ClusterApplierServiceTests.testClusterStateUpdateLogging This changes the test to not use a `CountDownLatch`, instead adding an assertion for the final logging message and waiting until the `MockAppender` has seen it before proceeding. Resolves #23739	2018-11-15 14:15:23 -07:00
David Turner	86ef041539	[Zen2] Introduce ClusterBootstrapService (#35488 ) Today, the bootstrapping of a Zen2 cluster is driven externally, requiring something else to wait for discovery to converge and then to inject the initial configuration. This is hard to use in some situations, such as REST tests. This change introduces the `ClusterBootstrapService` which brings the bootstrap retry logic within each node and allows it to be controlled via an (unsafe) node setting.	2018-11-15 20:09:22 +00:00
Tanguy Leroux	c9b4ef0dfd	Use RunOnce when appropriate (#35553 ) This pull request replaces some blocks of code that must be run once and that are currently based on AtomicBoolean by the convenient RunOnce class added in #35489.	2018-11-15 09:24:40 +01:00
Andrey Ershov	045fdd0d3b	Merge master into zen2	2018-11-14 15:37:13 +03:00
Yannick Welsch	4cfdb0609e	Adapt InternalCluster#fullRestart to call onNodeStopped when all nodes are stopped (#35494 ) Refactors and simplifies the logic around stopping nodes, making sure that for a full cluster restart onNodeStopped is only called after the nodes are actually all stopped (and in particular not while starting up some nodes again). This change also ensures that a closed node client is not being used anymore (which required a small change to a test). Relates to #35049	2018-11-14 13:24:56 +01:00
Zachary Tong	c346a0f027	[Rollup] Add `wait_for_completion` option to StopRollupJob API (#34811 ) This adds a `wait_for_completion` flag which allows the user to block the Stop API until the task has actually moved to a stopped state, instead of returning immediately. If the flag is set, a `timeout` parameter can be specified to determine how long (at max) to block the API call. If unspecified, the timeout is 30s. If the timeout is exceeded before the job moves to STOPPED, a timeout exception is thrown. Note: this is just signifying that the API call itself timed out. The job will remain in STOPPING and eventually flip over to STOPPED in the background. If the user asks the API to block, we move over to the generic threadpool so that we don't hold up a networking thread.	2018-11-13 16:37:17 -05:00
Julie Tibshirani	bc799e4a6f	Ignore warnings related to types deprecation in REST tests. (#35395 )	2018-11-13 11:56:01 -08:00
David Turner	8e40a2bbe2	[Zen2] Introduce vote withdrawal (#35446 ) If shutting down half or more of the master-eligible nodes, their votes must first be explicitly withdrawn to ensure that the cluster doesn't lose its quorum. This works via _voting tombstones_, stored in the cluster state, which tell the reconfigurator to remove nodes from the voting configuration. This change introduces voting tombstones to the cluster state, together with transport APIs for adding and removing them, and makes use of these APIs in `InternalTestCluster` to support tests which remove at least half of the master-eligible nodes at once (e.g. shrinking from two master-eligible nodes to one).	2018-11-13 19:32:32 +00:00
David Turner	fbd3cab410	[Zen2] Remove AbstractComponent usage (#35483 ) AbstractComponent was deprecated in #35140 and is looking like it will be removed at some point by #34888. Today all it does is provide a logger. This change removes the usages of AbstractComponent that live solely in the zen2 feature branch to avoid some future merge pain, and replaces it where necessary with some directly-created loggers.	2018-11-13 15:20:49 +00:00
Yannick Welsch	fe29b18c26	Fix compilation	2018-11-12 11:05:11 +01:00
Yannick Welsch	4e6c58c942	Merge remote-tracking branch 'elastic/master' into zen2	2018-11-12 10:03:59 +01:00
Tim Brooks	ba478827ad	Improve MockTcpTransport memory usage (#35402 ) The MockTcpTransport is not friendly in regards to memory usage. It must allocate multiple byte arrays for every message. This improves the memory situation by failing fast if the message is improperly formatted. Additionally, it uses reusable big arrays for at least half of the allocated byte arrays.	2018-11-09 10:12:49 -07:00
Jim Ferenczi	7054e289fa	Add trace log of the request for the query and fetch phases (#34479 ) This change adds a logger for the query and fetch phases that prints all requests before their execution at the trace level. This will help debugging cases where an issue occurs during the execution since only completed queries are logged by the slow logs.	2018-11-09 09:41:51 +01:00
Tim Brooks	93c2c604e5	Move compression config to ConnectionProfile (#35357 ) This is related to #34483. It introduces a namespaced setting for compression that allows users to configure compression on a per remote cluster basis. The transport.tcp.compress remains as a fallback setting. If transport.tcp.compress is set to true, then all requests and responses are compressed. If it is set to false, only requests to clusters based on the cluster.remote.cluster_name.transport.compress setting are compressed. However, after this change regardless of any local settings, responses will be compressed if the request that is received was compressed.	2018-11-08 10:37:59 -07:00
Yannick Welsch	c315ead0ac	Zen2: Add diff-based publishing (#35290 ) Enables diff-based publishing, which is an optimization where only the changing parts of the cluster state are published to the nodes in the cluster, falling back to full cluster state publishing if the receiver does not have the previous cluster state.	2018-11-08 17:16:09 +01:00
David Turner	6885a7cb0f	Introduce transport API for cluster bootstrapping (#34961 ) - Introduces a transport API for bootstrapping a Zen2 cluster - Introduces a transport API for requesting the set of nodes that a master-eligible node has discovered and for waiting until this comprises the expected number of nodes. - Alters ESIntegTestCase to use these APIs when forming a cluster, rather than injecting the initial configuration directly.	2018-11-08 16:09:37 +00:00
Zachary Tong	54b445d74b	[Test] Remove obsolete job/cluster cleanup code Also makes sure the awaitBusy for job stoppage is checked, so that we can fail if we timed out waiting for a job to stop. Closes #35295	2018-11-08 10:23:23 -05:00
David Turner	77789a733d	Merge branch 'master' into 2018-11-08-merge-master	2018-11-08 13:38:18 +00:00
Simon Willnauer	0cc0fd2d15	Add a frozen engine implementation (#34357 ) This change adds a `frozen` engine that allows lazily opening a directory reader on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader that also allows releasing and resetting the underlying index readers after any and before secondary search phases. Relates to #34352	2018-11-07 20:23:35 +01:00
Alpar Torok	8a85b2eada	Remove build qualifier from server's Version (#35172 ) With this change, `Version` no longer carries information about the qualifier. We still need a way to show the "display version" that does have both qualifier and snapshot. This is now stored by the build and read from `META-INF`.	2018-11-07 14:01:05 +02:00
Tim Brooks	f395b1eace	Open node connections asynchronously (#35144 ) This is related to #29023. Additionally at other points we have discussed a preference for removing the need to unnecessarily block threads for opening new node connections. This commit lays the groundwork for this by opening connections asynchronously at the transport level. We still block, however, this work will make it possible to eventually remove all blocking on new connections out of the TransportService and Transport.	2018-11-06 17:58:20 -07:00
David Turner	7e356ac29b	[Zen2] Introduce auto_shrink_voting_configuration setting (#35217 ) Today we allow the user to set the minimum size of a voting configuration. On reflection we would rather this was simply '3' where possible, and we can use the retirement API to control the removal of nodes more explicitly. This change replaces the old reconfigurator setting with a new one, `cluster.auto_shrink_voting_configuration`, which determines whether Elasticsearch should automatically remove nodes from the voting configuration or not.	2018-11-06 18:10:29 +00:00
Nick Knize	a5e1f4d3a2	Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224 )	2018-11-06 11:55:23 +01:00
David Turner	2fb3d1a465	[Zen2] Fix some rarely-failing tests (#35198 ) Recent changes have left a few Zen2 tests occasionally failing. This commit fixes them.	2018-11-05 21:54:53 +00:00
Boaz Leskes	28078642b3	Engine.newChangesSnapshot may cause unneeded refreshes if called concurrently (#35169 ) When the engine is asked for historical operations, we check if some of the requested operations are not yet refreshed and if so we refresh before returning the operations. The refresh check is based on capturing the local checkpoint before each refresh and comparing that value to the one requested when `newChangesSnapshot` was called. If the requested range is above the captured local checkpoint we issue a refresh. This can currently cause unneeded extra refreshes if the method is called concurrently which may cause unwanted degradation in indexing performance. This is especially relevant for CCR where we always ask for a range below the global checkpoint. That range is guaranteed to be below the local checkpoint of the shard and one refresh is enough to serve multiple changes requests. This commit fixes this by introducing a dedicated mutex to make sure the test for whether a refresh is needed actually waits for concurrent refreshes that were caused by another changes request. Note that this is not a big change in semantics as refreshes are serialized by lucene anyway. I also opted not to keep the synchronization to the changes snapshot request only even if in theory we can apply it to all refreshes, no matter where they come from.	2018-11-04 13:43:33 +01:00
Nhat Nguyen	855ab3fa1e	Add equals/hashCode to SeqNoStats (#35223 ) This commit adds equals/hashCode to SeqNoStats so we can verify it wholly in tests.	2018-11-02 21:31:36 -04:00
Tim Brooks	0166388d74	Use single netty event loop group for transports (#35181 ) Currently we create a new netty event loop group for client connections and all server profiles. Each new group creates new threads for io processing. This means 2 * num of processors new threads for each group. A single group should be able to handle all io processing (for the transports). This also brings the netty module in line with what we do for nio. Additionally, this PR renames the worker threads to be the same for netty and nio.	2018-11-02 16:31:19 -06:00
Colin Goodheart-Smithe	fc6e1f7f3f	Merge branch 'master' into index-lifecycle	2018-11-02 10:56:35 +00:00
Alpar Torok	f22700812e	Introduce build qualifier parameter (#35155 ) * Introduce property to set version qualifier - VersionProperties.elasticsearch is now a string which can have qualifier and snapshot too - The Version class in the build no longer cares about snapshot and qualifier.	2018-11-02 05:27:40 +02:00
Julie Tibshirani	746d94e299	Unmute AbstractQueryTestCase#testToQuery. The RangeQueryBuilderTests#testToQuery failures were fixed in #34868 and #35145.	2018-11-01 12:06:36 -07:00
Tal Levy	c3cf7dd305	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-11-01 10:13:02 -07:00
Nik Everett	e28509fbfe	Core: Less settings to AbstractComponent (#35140 ) Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to stop passing around `Settings` in a ton of places. While this change touches many files, it touches them all in fairly small, mechanical ways, doing a few things per file: 1. Drop the `super(settings);` line on everything that extends `AbstractComponent`. 2. Drop the `settings` argument to the ctor if it is no longer used. 3. If the file doesn't use `logger` then drop `extends AbstractComponent` from it. 4. Clean up all compilation failures caused by the `settings` removal and drop any now unused `settings` instances and method arguments. I've intentionally not removed the `settings` argument from a few files: 1. TransportAction 2. AbstractLifecycleComponent 3. BaseRestHandler These files don't need `settings` either, but this change is large enough as is. Relates to #34488	2018-10-31 21:23:20 -04:00
Igor Motov	b5e5e93c46	Fixes randomDateTimeZone method (#35145 ) The randomDateTimeZone method shouldn't return deprecated timezones, as this causes some tests to fail with deprecation warnings.	2018-10-31 20:32:18 -04:00
Seong-hyun, Oh	9ef4788c13	Make XContentBuilder in AliasActions build `is_write_index` field (#35071 ) Make XContentBuilder in AliasesActions build `is_write_index` field	2018-10-31 14:15:46 -07:00
Armin Braun	e6f9f0666e	NETWORKING: MockTransportService Wait for Close (#35038 ) * NETWORKING: MockTransportService Wait for Close * Make `MockTransportService` wait `30s` for close listeners to run before failing the assertion * Closes #34990	2018-10-31 21:33:49 +01:00
David Turner	0072c90e2a	Pre-populate unicast hosts files (#35136 ) Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.	2018-10-31 19:21:24 +00:00
Tal Levy	d5d28420b6	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-31 10:47:07 -07:00
Luca Cavanna	674225aaa1	[TEST] Enforce skip headers when needed (#34735 ) The java yaml test runner supports sending request headers, yet not all clients support headers. This commit makes sure that we enforce adding a skip section with feature "headers" whenever headers are used in a do section as part of a test. That decreases the chance for new tests to break client builds due to the missing skip section. Closes #34650	2018-10-31 13:07:02 +01:00
Tal Levy	5141084048	rename CRUD api REST path prefix _ilm to _ilm/policy (#35056 ) This PR renames the CRUD APIs for ILM: GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy; PUT _ilm/<policy> -> _ilm/policy/<policy>; DELETE _ilm/<policy> -> _ilm/policy/<policy>. Closes #34929.	2018-10-30 16:19:05 -07:00
Nik Everett	086ada4c08	Core: Drop settings member from AbstractComponent (#35083 ) Drops the `Settings` member from `AbstractComponent`, moving it from the base class on to the classes that use it. For the most part this is a mechanical change that doesn't drop `Settings` accesses. The one exception to this is naming threads where it switches from an invocation that passes `Settings` and extracts the node name to one that explicitly passes the node name. This change doesn't drop the `Settings` argument from `AbstractComponent`'s ctor because this change is big enough as is. We'll do that in a follow up change.	2018-10-30 16:10:38 -04:00
Ryan Ernst	512319cef7	Test: Filter out deprecated joda tzs in tests (#34868 ) This commit filters out usage of deprecated tzs by tests. These are tested separately and should not require checking for warnings on any test using random timezones. closes #34188	2018-10-30 11:15:34 -07:00
Tal Levy	18c72e86c5	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-30 08:09:57 -07:00
Luca Cavanna	7ef65dedc3	[TEST] improve validation of yaml suites (#34957 ) Validation of test sections and suites consists of checking that the proper skip features sections are in place depending on the features used in tests. The validation logic was previously only performed on do sections included in each test section, and the skip needed to be present in the same test section. What happens often though is that the skip is added to the setup section, or the teardown section. This commit improves the validation of test suites by validating setup and teardown section first, then looking at each test section while still eventually reading the skip section from setup or teardown. We are also making SkipSection, SetupSection, TearDownSection, ClientYamlTestSection and ClientYamlTestSuite immutable. Previously it was possible to utilize constants like SetupSection.EMPTY, which were modifiable, and modifying them affected every future user. This has been corrected. Also, validation has been improved to accumulate errors so that all the errors from a suite will be listed at once. Relates to #34735	2018-10-30 16:06:31 +01:00
Andy Bristol	b8280ea7cc	median absolute deviation agg (#34482 ) This commit adds a new single value metric aggregation that calculates the statistic called median absolute deviation, which is a measure of variability that works on more types of data than standard deviation. Our calculation of MAD is approximated using t-digests. In the collect phase, we collect each value visited into a t-digest. In the reduce phase, we merge all value t-digests, then create a t-digest of deviations using the first t-digest's median and centroids.	2018-10-30 07:22:52 -07:00
Andrey Ershov	97f74c5a38	Merge branch 'master' into 'zen2' Conflicts during the merge: 1. >=140 chars line length fixed for a lot of project files and warnings for those files are no longer suppressed 2. Node name is removed from AbstractComponent, it’s no longer taken from settings, but is explicitly passed as constructor argument and there were quite a few new classes on zen2 branch that require this change 3. TransportResponseHandler interface changed (new method added) and Zen2 makes a lot of subclasses in tests 4. Deprecated way of obtaining logger was changed	2018-10-30 14:39:48 +03:00
Przemyslaw Gomulka	995bf0ee66	Bulk Api support for global parameters (#34528 ) Bulk Request in High level rest client should be consistent with what is possible in Rest API, therefore should support global parameters. Global parameters are passed in URL in Rest API. Some parameters are mandatory - index, type - and would fail validation if not provided before the bulk is executed. Optional parameters - routing, pipeline. The usage of these should be consistent across sync/async execution, bulk processor and BulkRequestBuilder closes #26026	2018-10-30 09:08:12 +01:00
Tal Levy	c9e4d26a53	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-29 14:03:55 -07:00
Pratik Sanglikar	f1135ef0ce	Core: Replace deprecated Loggers calls with LogManager. (#34691 ) Replace deprecated Loggers calls with LogManager. Relates to #32174	2018-10-29 15:52:30 -04:00
Nik Everett	b093116a1e	Logging: Drop another deprecated Loggers method (#34520 ) Drop a method from `Loggers` that we deprecated because it just delegated to `LogManager`.	2018-10-29 10:05:24 -04:00
Alpar Torok	baa144e844	Enforce a [skip] when using [contains] (#34840 ) Be friendly to other runners	2018-10-29 14:54:22 +02:00
Tal Levy	d8322ca069	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-26 12:46:21 -07:00
Nik Everett	10295b306d	Core: Drop nodeName from AbstractComponent (#34487 ) `AbstractComponent` is trouble because its name implies that everything should extend from it. It is useful, but maybe too broadly useful. The things it offers access to, the `Settings` instance for the entire server and a logger, are nice to have around, but not really needed everywhere. The `Settings` instance especially adds a fair bit of ceremony to testing without any value. This removes the `nodeName` method from `AbstractComponent` so it is more clear where we actually need the node name.	2018-10-26 15:26:14 -04:00
Tal Levy	e737ea7d4a	remove old doc placeholder and migrate ilm docs to top-level (#34615 ) we are restructuring the docs, this migrates ILM docs outside of the x-pack doc structure.	2018-10-26 12:19:52 -07:00
Igor Motov	02a342eb8c	Tests: remove possibly unnecessary rollup job logging (#34883 ) It seems that this statement is a debug leftover since it currently adds an error message `{"jobs":[]}` after each successful REST test.	2018-10-26 14:23:10 -04:00
Jay Modi	a0279bc069	Responses can use Writeable.Reader interface (#34655 ) In order to remove Streamable from the codebase, Response objects need to be read using the Writeable.Reader interface which this change enables. This change enables the use of Writeable.Reader by adding the `Action#getResponseReader` method. The default implementation simply uses the existing `newResponse` method and the readFrom method. As responses are migrated to the Writeable.Reader interface, Action classes can be updated to throw an UnsupportedOperationException when `newResponse` is called and override the `getResponseReader` method. Relates #34389	2018-10-26 09:21:54 -06:00
Tal Levy	810cd46a30	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-25 14:35:33 -07:00
Nik Everett	59df6e8689	Test: Lookup node versions on rest test start (#34657 ) This is a forward port of a change made to clean up backwards compatibility for the rollup cleanups. It makes the version of each node available very early on in test execution. The 6.x version of the change used those versions to control the cleanup backwards compatibility but that isn't needed in this branch. But having the versions around is useful. So this makes them available. Closes #34629	2018-10-25 16:43:33 -04:00
Tim Brooks	cf9aff954e	Reduce channels in AbstractSimpleTransportTestCase (#34863 ) This is related to #30876. The AbstractSimpleTransportTestCase initiates many tcp connections. There are normally over 1,000 connections in TIME_WAIT at the end of the test. This is because every test opens at least two different transports that connect to each other with 13 channel connection profiles. This commit modifies the default connection profile used by this test to 6. One connection for each type, except for REG which gets 2 connections.	2018-10-25 13:37:49 -06:00
Lee Hinman	3e7042832a	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-25 11:00:36 -06:00
lipsill	185c06bb7f	Logging: tests: clean up logging (#34606 ) Replace internal deprecated calls to `Loggers.getLogger(Class)` with direct calls to log4j `LogManager.getLogger(Class)`	2018-10-25 09:52:41 -04:00
Alpar Torok	59536966c2	Add a new "contains" feature (#34738 ) The contains syntax was added in #30874 but the skips were not properly put in place. The java runner has the feature so the tests will run as part of the build, but language clients will be able to support it at their own pace.	2018-10-25 08:50:50 +03:00
Ryan Ernst	687dc1eb11	Scripting: Remove SearchScript (#34730 ) This commit removes the last non context based script class.	2018-10-24 15:03:38 -07:00
Luca Cavanna	d51bc05dce	[TEST] Improve validation of do sections (#34734 ) We throw parsing exception when an unknown array is found, but we don't when an unknown top-level field is found. This commit makes sure that unsupported top-level fields are not ignored in a do section. Closes #34651	2018-10-24 21:27:07 +02:00
lipsill	d5ad3de42e	[test] Introduce strict deprecation mode for REST tests (#34338 ) #33708 introduced a strict deprecation mode that makes a REST request fail if there is a warning header in the response returned by Elasticsearch (usually a deprecation message signaling that a feature or a field has been deprecated). This change adds the strict deprecation mode into the REST integration tests, and makes the tests fail if a deprecated feature is used. Also any test using a deprecated feature has been modified to pass the build. The YAML integration tests already analyzed HTTP warnings so they do not use this mode, keeping their "expected vs actual" behavior.	2018-10-24 08:21:24 -04:00
Nhat Nguyen	52266d8b11	TEST: Clone replicas list when compute replication targets (#34728 ) In #34407, we were supposed to clone the list of replicas of ReplicationGroup when computing replication targets, but somehow we missed it. If we don't clone the list, a WriteReplicationAction may use an old ReplicationTargets which contains replicas that have been removed from the current list of replicas. Relates #34407 Closes #33457	2018-10-23 21:08:34 -04:00
Zachary Tong	299d044bfc	Collapse pipeline aggs into single package (#34658 ) - Restrict visibility of Aggregators and Factories - Move PipelineAggregatorBuilders up a level so it is consistent with AggregatorBuilders - Checkstyle line length fixes for a few classes - Minor odds/ends (swapping to method references, formatting, etc)	2018-10-23 16:01:01 -04:00
Tal Levy	62ac2fa5ec	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-23 09:43:46 -07:00
Zachary Tong	4dbf498721	[Rollup] Job deletion should be invoked on the allocated task (#34574 ) We should delete a job by directly talking to the allocated task and telling it to shutdown. Today we shut down a job via the persistent task framework. This is not ideal because, while the job has been removed from the persistent task CS, the allocated task continues to live until it gets the shutdown message. This means a user can delete a job, immediately delete the rollup index, and then see new documents appear in the just-deleted index. This happens because the indexer in the allocated task is still running and indexes a few more documents before getting the shutdown command. In this PR, the transport action is changed to a TransportTasksAction, and we invoke onCancelled() directly on the matching job. The race condition still exists after this PR (albeit less likely), but this was a precursor to fixing the issue and a self-contained chunk of code. A second PR will followup to fix the race itself.	2018-10-23 12:23:22 -04:00
Tal Levy	67bfdb16ad	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-22 13:09:37 -07:00
Yannick Welsch	6d6ac74a08	Zen2: Fail fast on disconnects (#34503 ) Integrates the failure detectors with the Connection lifecycle, to fail nodes as soon as: - a leader detects one of his followers disconnecting. - a follower detects its leader disconnecting.	2018-10-22 17:20:12 +02:00
Jason Tedor	243335e2ba	Allow set section in setup section of REST tests (#34678 ) This commit enables using a set section in the setup section of REST tests.	2018-10-22 11:14:27 -04:00
Jason Tedor	7af19b8f81	Migrate wait for pending tasks helper to server (#34675 ) In some of our X-Pack REST tests we have to wait for pending tasks to complete. We are now needing this functionality in ESRestTestCase for the docs tests where we run against X-Pack features. This commit moves the helper method that we have in X-Pack to ESRestTestCase, and removes duplicate logic from waiting for rollup tasks to complete.	2018-10-22 11:14:02 -04:00
Ryan Ernst	222652dfce	Scripting: Convert script fields to use script context (#34164 ) This commit removes the use of SearchScript for script fields and adds a new FieldScript.	2018-10-20 16:33:49 -07:00
David Turner	bfd24fc030	[Zen2] Reconfigure cluster as its membership changes (#34592 ) As master-eligible nodes join or leave the cluster we should give them votes or take them away, in order to maintain the optimal level of fault-tolerance in the system. #33924 introduced the `Reconfigurator` to calculate the optimal configuration of the cluster, and in this change we add the plumbing needed to actually perform the reconfigurations needed as the cluster grows or shrinks.	2018-10-19 19:24:54 +01:00
Nhat Nguyen	bd92a28cfc	CCR: Replicate existing ops with old term on follower (#34412 ) Since #34288, we might hit deadlock if the FollowTask has more fetchers than writers. This can happen in the following scenario: Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has two fetchers and one writer. 1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0, num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1 respectively. 2. The second request which fetches seq#1 completes before, and then it triggers a write request containing only seq#1. 3. The primary of a follower fails after it has replicated seq#1 to replicas. 4. Since the old primary did not respond, the FollowTask issues another write request containing seq#1 (resend the previous write request). 5. The new primary has seq#1 already; thus it won't replicate seq#1 to replicas but will wait for the global checkpoint to advance at least seq#1. The problem is that the FollowTask has only one writer and that writer is waiting for seq#0 which won't be delivered until the writer completes. This PR proposes to replicate existing operations with the old primary term (instead of the current term) on the follower. In particular, when the following primary detects that it has processed an operation already, it will look up the term of an existing operation with the same seq_no in the Lucene index, then rewrite that operation with the old term before replicating it to the following replicas. This approach is wait-free but requires soft-deletes on the follower. Relates #34288	2018-10-19 13:56:00 -04:00
David Turner	3de266e3cf	Merge branch 'master' into zen2	2018-10-19 14:30:07 +01:00
Colin Goodheart-Smithe	84ef91529c	Merge branch 'master' into index-lifecycle	2018-10-19 13:24:04 +01:00
Daniel Mitterdorfer	dbb6fe58fa	Remove hand-coded XContent duplicate checks With this commit we cleanup hand-coded duplicate checks in XContent parsing. They were necessary previously but since we reconfigured the underlying parser in #22073 and #22225, these checks are obsolete and were also ineffective unless an undocumented system property has been set. As we also remove this escape hatch, we can remove the additional checks as well. Closes #22253 Relates #34588	2018-10-19 10:13:13 +02:00
Tal Levy	09067c8942	Merge remote-tracking branch 'upstream/master' into index-lifecycle	2018-10-17 15:37:11 -07:00
Nhat Nguyen	eb36f10394	TEST: Capture replication targets when replication group ready (#34407 ) Today, WriteReplicationAction uses a set of replication targets directly from the primary shard of ReplicationGroup. It should be fine except when we add/remove or promote a shard while a write action is executing. We have encountered these two issues: 1. Replicas are not found in the replication targets. This happens because we remove replicas but the WriteReplicationAction still uses the old replication targets which include the removed replicas. 2. Access ReplicationGroup from a primary shard which hasn't activated the primary-mode yet. This is because we won't activate the primary-mode for a promoting shard after bumping the primary term which is executed asynchronously. This commit captures the replication targets when the replication group is ready and continue using those targets until we re-compute the new targets after the group is changed. Closes #33457	2018-10-17 17:37:52 -04:00
Armin Braun	08d4bf6e84	TESTS: Remove Dead Code in Test Infra. (#34548 ) * None of this infrastructure is used * Some redundant throws and resulting catch code removed	2018-10-17 20:08:39 +01:00
Colin Goodheart-Smithe	90f7cec7a5	Merge branch 'master' into index-lifecycle	2018-10-17 18:22:23 +01:00
Nik Everett	139bbc3f03	Rollup: Consolidate rollup cleanup for http tests (#34342 ) This moves the rollup cleanup code for http tests from the high level rest client into the test framework and then entirely removes the rollup cleanup code for http tests that lived in x-pack. This is nice because it consolidates the cleanup into one spot, automatically invokes the cleanup without the test having to know that it is "about rollup", and should allow us to run the rollup docs tests. Part of #34530	2018-10-17 09:32:16 -04:00
Andrey Ershov	93bb24e1f8	Merge branch 'master' into zen2	2018-10-17 14:37:53 +02:00
Armin Braun	3954d041a0	SCRIPTING: Move sort Context to its Own Class (#33717 ) * SCRIPTING: Move sort Context to its own Class	2018-10-17 10:02:44 +01:00
Tal Levy	fbe8dc014c	Merge branch 'master' into index-lifecycle	2018-10-16 13:58:53 -07:00
Armin Braun	ea576a8ca2	Disc: Move AbstractDisruptionTC to filebased D. (#34461 ) * Discovery: Move AbstractDisruptionTestCase to file-based discovery. * Relates #33675 * Simplify away ClusterDiscoveryConfiguration	2018-10-16 15:28:40 +01:00
David Turner	950ca3adda	Merge branch 'master' into zen2	2018-10-16 14:41:14 +01:00
Simon Willnauer	d43a1fac33	Lock down Engine.Searcher (#34363 ) `Engine.Searcher` is non-final today which makes it error prone in the case of wrapping the underlying reader or lucene `IndexSearcher` like we do in `IndexSearcherWrapper`. Yet, there is no subclass of it yet that would be dramatic to just drop on the floor. With the start of development of frozen indices this changed since in #34357 functionality was added to a subclass which would be dropped if a `IndexSearcherWrapper` is installed on an index. This change locks down the `Engine.Searcher` to prevent such a functionality trap.	2018-10-16 14:53:07 +02:00
Martijn van Groningen	a1ec91395c	Changed CCR internal integration tests to use a leader and follower cluster instead of a single cluster (#34344 ) The `AutoFollowTests` needs to restart the clusters between each test, because it is using auto follow stats in assertions. Auto follow stats are only reset by stopping the elected master node. Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api. * Renamed AutoFollowTests to AutoFollowIT, because it is an integration test. Renamed ShardChangesIT to IndexFollowingIT, because shard changes is the name of an internal api and isn't a good name for an integration test. * move creation of NodeConfigurationSource to a separate method * Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.	2018-10-16 14:45:46 +02:00
Jim Ferenczi	544de13d8e	Disallow negative query boost (#34486 ) This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts in 6x in order to ensure that users are aware when they'll upgrade to ES 7. Relates #33309	2018-10-16 11:31:53 +01:00
Armin Braun	ebca27371c	SCRIPTING: Move Aggregation Script Context to its own class (#33820 ) * SCRIPTING: Move Aggregation Script Context to its own class	2018-10-15 17:28:05 +01:00
Colin Goodheart-Smithe	0b42eda0e3	Merge branch 'master' into index-lifecycle	2018-10-15 16:03:37 +01:00
Andrey Ershov	e3a1981a57	Mute testToQuery test	2018-10-15 14:08:04 +02:00
Yannick Welsch	5fbead00a3	Zen2: Add infrastructure for integration tests (#34365 ) Adds the infrastructure to run integration tests against Zen2.	2018-10-14 20:55:04 +01:00
Nhat Nguyen	33791ac27c	CCR: Following primary should process operations once (#34288 ) Today we rewrite the operations from the leader with the term of the following primary because the follower should own its history. The problem is that a newly promoted primary may re-assign its term to operations which were replicated to replicas before by the previous primary. If this happens, some operations with the same seq_no may be assigned different terms. This is not good for the future optimistic locking using a combination of seqno and term. This change ensures that the primary of a follower only processes an operation if that operation was not processed before. The skipped operations are guaranteed to be delivered to replicas via either primary-replica resync or peer-recovery. However, the primary must not acknowledge until the global checkpoint is at least the highest seqno of all skipped ops (i.e., they all have been processed on every replica). Relates #31751 Relates #31113	2018-10-10 15:39:57 -04:00
Nik Everett	06993e0c35	Logging: Make ESLoggerFactory package private (#34199 ) Since all calls to `ESLoggerFactory` outside of the logging package were deprecated, it seemed like it'd simplify things to migrate all of the deprecated calls and declare `ESLoggerFactory` to be package private. This does that.	2018-10-06 09:54:08 -04:00
David Turner	c6b0f08472	Add safety phase to CoordinatorTests (#34241 ) Today's CoordinatorTests have a limited amount of randomisation in how things are scheduled. However, to be fully confident in Zen2's liveness we require the system to stabilise after any permitted sequence of events. We can achieve this by running the system in a much more random fashion for a while, with much larger variation in when things are scheduled (simulating GC pressure and network disruption) and then continuing to assert that the system stabilises as we expect. When running randomly, we do not expect to make significant progress and merely verify that no safety property is violated. This change introduces the runRandomly() test method which implements this idea. It also fixes a handful of liveness bugs that this first version of runRandomly() exposed.	2018-10-04 07:40:26 +01:00
Kazuhiro Sera	d45fe43a68	Fix a variety of typos and misspelled words (#32792 )	2018-10-03 18:11:38 +01:00
David Turner	a9eae1d068	Merge branch 'master' into zen2	2018-10-03 08:36:34 +01:00
Gordon Brown	fb907706ec	Merge branch 'master' into index-lifecycle	2018-10-02 13:43:46 -06:00
Nik Everett	f904c41506	HLRC: Add get rollup job (#33921 ) Adds support for the get rollup job to the High Level REST Client. I had to do three interesting and unexpected things: 1. I ported the rollup state wiping code into the high level client tests. I'll move this into the test framework in a followup and remove the x-pack version. 2. The `timeout` in the rollup config was serialized using the `toString` representation of `TimeValue` which produces fractional time values which are more human readable but aren't supported by parsing. So I switched it to `getStringRep`. 3. Refactor the xcontent round trip testing utilities so we can test parsing of classes that don't implement `ToXContent`.	2018-10-02 09:11:29 -04:00
David Turner	a127805b4a	[Zen2] Simulate scheduling delays (#34181 ) Today we schedule tasks (both immediate and future ones) exactly when requested. In fact it is more realistic to allow for a small amount of delay in the scheduling of tasks, and this helps to exercise more interleavings of actions and therefore to improve test coverage. This change adds to the DeterministicTaskQueue the ability to add a random delay to the scheduling of tasks. This change also provides more explicit timeouts for stabilisation in the CoordinatorTests. Using the randomised scheduling feature in the CoordinatorTests also found a situation in which we could become a leader, then a candidate, and then a leader again very quickly, causing a clash of the _BECOME_MASTER_ and _FINISH_ELECTION_ tasks. We change their behaviour to not consider these duplicates to be problematic.	2018-10-02 11:22:05 +01:00
Lee Hinman	2d9cb21490	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-01 14:10:09 -06:00
Nhat Nguyen	ad61398879	CCR: Optimize indexing ops using seq_no on followers (#34099 ) This change introduces the indexing optimization using sequence numbers in the FollowingEngine. This optimization uses the max_seq_no_updates which is tracked on the primary of the leader and replicated to replicas and followers. Relates #33656	2018-09-28 20:42:26 -04:00
Ryan Ernst	47cbae9b26	Scripting: Remove ExecutableScript (#34154 ) This commit removes the legacy ExecutableScript, which was no longer used except in tests. All uses have previously been converted to script contexts.	2018-09-28 17:13:08 -07:00
Lee Hinman	6ea396a476	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-28 15:40:12 -06:00
Hendrik Muhs	e2f310b56c	Fix AggregationFactories.Builder equality and hash regarding order (#34005 ) Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization and deserialization. This ensures storing configs with aggregation works properly. This also addresses a potential issue in caching when the same query contains aggregations but in different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in the cache.	2018-09-28 13:30:50 +02:00
Alan Woodward	f243d75f59	Remove special-casing of Synonym filters in AnalysisRegistry (#34034 ) The synonym filters no longer need access to the AnalysisRegistry in their constructors, so we can remove the special-case code and move them to the common analysis module. This commit means that synonyms are no longer available for `server` integration tests, so several of these are either rewritten or migrated to the common analysis module as rest-spec-api tests	2018-09-28 09:02:47 +01:00
Ryan Ernst	a2c941806b	Tests: Add support for custom contexts to mock scripts (#34100 ) This commit adds the ability to plug in compilation of custom contexts in mock script engine. This is needed for testing plugins which add custom contexts like watcher.	2018-09-27 12:23:59 -07:00
Lee Hinman	a26cc1a242	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-27 11:00:37 -06:00
Jim Ferenczi	269ae0bc15	Handle MatchNoDocsQuery in span query wrappers (#34106 ) * Handle MatchNoDocsQuery in span query wrappers This change adds a new SpanMatchNoDocsQuery query that replaces MatchNoDocsQuery in the span query wrappers. The `wildcard` query now returns MatchNoDocsQuery if the target field is not in the mapping (#34093) so we need the equivalent span query in order to be able to pass it to other span wrappers. Closes #34105	2018-09-27 14:19:08 +02:00
Simon Willnauer	bda7bc145b	Fold EngineSearcher into Engine.Searcher (#34082 ) EngineSearcher can be easily folded into Engine.Searcher which removes a level of inheritance that is necessary for most of its subclasses. This change folds it into Engine.Searcher and removes the dependency on ReferenceManager.	2018-09-27 09:06:04 +02:00
Yogesh Gaikwad	0301062c6e	Mute SpanMultiTermQueryBuilderTests#testToQuery	2018-09-27 15:26:06 +10:00
Nik Everett	ddce9704d4	Logging: Drop two deprecated methods (#34055 ) This drops two deprecated methods from `ESLoggerFactory`, switching all calls to those methods to calls to methods of the same name on `LogManager`.	2018-09-26 11:20:52 -04:00
Adrien Grand	3c2841d493	REST test for typeless APIs. (#33934 ) This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates #15613	2018-09-26 17:11:37 +02:00
Ryan Ernst	7800b4fa91	Core: Abstract DateMathParser in an interface (#33905 ) This commit creates a DateMathParser interface, which is already implemented for both joda and java time. While currently the java time DateMathParser is not used, this change will allow a followup which will create a DateMathParser from a DateFormatter, so the caller does not need to know the internals of the DateFormatter they have.	2018-09-26 07:56:25 -07:00
Zachary Tong	25d74bd0cb	Prefer mapped aggs to lead reductions (#33528 ) Previously, unmapped aggs try to delegate reduction to a sibling agg that is mapped. That delegated agg will run the reductions, and also reduce any pipeline aggs. But because delegation comes before running pipelines, the unmapped agg _also_ tries to run pipeline aggs. This causes the pipeline to run twice, and potentially double its output in buckets which can create invalid JSON (e.g. same key multiple times) and break when converting to maps. This fixes the problem by sorting the list of aggregations ahead of time so that mapped aggs appear first, meaning they preferentially lead the reduction. If all aggs are unmapped, the first unmapped agg simply creates a new unmapped object and returns that for the reduction. This means that unmapped aggs no longer defer and there is no chance for a secondary execution of pipelines (or other side effects caused by deferring execution). Closes #33514	2018-09-26 10:09:31 -04:00
Christoph Büscher	ba3ceeaccf	Clean up "unused variable" warnings (#31876 ) This change cleans up "unused variable" warnings. There are several cases where we most likely want to suppress the warnings (especially in the client documentation test where the snippets contain many unused variables). In a lot of cases the unused variables can just be deleted though.	2018-09-26 14:09:32 +02:00
David Turner	d995fc85c6	Integrate LeaderChecker with Coordinator (#34049 ) This change ensures that follower nodes periodically check that their leader is healthy, and that they elect a new leader if not.	2018-09-26 12:18:13 +01:00
Ryan Ernst	be8475955e	Scripting: Use ParameterMap for deprecated ctx var in update scripts (#34065 ) This commit removes the sysprop controlling whether ctx is in params for update scripts and replaces it with use of the new ParameterMap, which outputs a deprecation warning whenever params.ctx is used.	2018-09-25 22:08:02 -07:00
Nhat Nguyen	5166dd0a4c	Replicate max seq_no of updates to replicas (#33967 ) We start tracking max seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed is at least the max_seq_no_of_updates on the primary when that operation was executed. Relates #33656	2018-09-25 08:07:57 -04:00
David Turner	1d47c9582b	Fix CoordinatorTests (#34002 ) Today the CoordinatorTests are not very reliable if two elections are scheduled concurrently. Although we expect occasional failures due to this, in fact the failures are much more common than expected due to a handful of issues. This PR fixes these issues.	2018-09-25 08:43:47 +01:00
Lee Hinman	243e863f6e	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-24 10:33:51 -06:00
Tim Brooks	78e483e8d8	Introduce abstract security transport testcase (#33878 ) This commit introduces an AbstractSimpleSecurityTransportTestCase for security transports. This class provides transport tests that are specific for security transports. Additionally, it fixes the tests referenced in #33285.	2018-09-24 09:44:44 -06:00
Nhat Nguyen	7944a0cb25	Track max seq_no of updates or deletes on primary (#33842 ) This PR is the first step to use seq_no to optimize indexing operations. The idea is to track the max seq_no of either update or delete ops on a primary, and transfer this information to replicas, and replicas use it to optimize indexing plan for index operations (with assigned seq_no). The max_seq_no_of_updates on primary is initialized once when a primary finishes its local recovery or peer recovery in relocation or being promoted. After that, the max_seq_no_of_updates is only advanced internally inside an engine when processing update or delete operations. Relates #33656	2018-09-22 08:02:57 -04:00
Vladimir Dolzhenko	477391d751	Don't test corruption detection within CFS checksum (#33911 ) Closes #33881	2018-09-22 10:21:36 +02:00
Yannick Welsch	a612dd1272	Zen2: Add node id to log output of CoordinatorTests (#33929 ) With recent changes to the logging framework, the node name can no longer be injected into the logging output using the node.name setting, which means that for the CoordinatorTests (which are simulating a cluster in a fully deterministic fashion using a single thread), as all the different nodes are running under the same test thread, we are not able to distinguish which log lines are coming from which node. This commit readds logging for node ids in the CoordinatorTests, making two very small changes to DeterministicTaskQueue and TestThreadInfoPatternConverter.	2018-09-21 18:40:12 +02:00
lipsill	b48d5a8942	[TEST] ClientYamlSuiteRestApiParser to parse spec without path parts (#33720 ) Previously ClientYamlSuiteRestApiParser threw an exception when an api spec contained neither path parts nor url parameter sections. Closes #31649	2018-09-21 17:26:55 +02:00
Alexander Reelsen	1de2a925ce	Watcher: Ensure that execution triggers properly on initial setup (#33360 ) This commit reverts most of #33157 as it introduces another race condition and breaks a common case of watcher, when the first watch is added to the system and the index does not exist yet. This means that the index will be created, which triggers a reload, but during this time the put watch operation that triggered this is not yet indexed, so that both processes finish at roughly the same time and should not overwrite each other but act complementary. This commit reverts the logic of cleaning out the ticker engine watches on start up, as this is done already when the execution is paused - which also gets paused on the cluster state listener again, as we can be sure here that the watches index has not yet been created. This also adds a new test that starts a one node cluster and emulates the case of a non existing watches index and a watch being added, which should result in proper execution. Closes #33320	2018-09-21 14:22:34 +02:00
Armin Braun	3a5b8a71b4	NETWORKING: Fix Portability of SO_LINGER=0 in Tests (#33895 ) * Setting SO_LINGER for open but not connected non-blocking sockets throws on OSX * Fixed by only applying setting to connected sockets which will save the same number of FDs as doing it on open sockets anyway * closes #33879	2018-09-21 10:08:16 +02:00
Nhat Nguyen	5f7f793f43	Propagate max_auto_id_timestamp in peer recovery (#33693 ) Today we don't store the auto-generated timestamp of append-only operations in Lucene; and assign -1 to every index operation constructed from LuceneChangesSnapshot. This looks innocent but it generates duplicate documents on a replica if a retry append-only arrives first via peer-recovery; then an original append-only arrives via replication. Since the retry append-only (delivered via recovery) does not have a timestamp, the replica will happily optimize the original request while it should not. This change transmits the max auto-generated timestamp from the primary to replicas before translog phase in peer recovery. This timestamp will prevent replicas from optimizing append-only requests if retry counterparts have been processed. Relates #33656 Relates #33222	2018-09-20 19:53:30 -04:00
David Turner	187f787f52	[Zen2] Introduce LeaderChecker (#33024 ) It is important that follower nodes periodically check that their leader is still healthy and that they remain part of its cluster. If these checks fail repeatedly then followers should attempt to find and join a new leader, possibly electing one in the process. The LeaderChecker, introduced in this commit, performs these periodic checks and deals with retries.	2018-09-20 20:05:55 +01:00
Nhat Nguyen	76a1a863e3	TEST: stop assertSeqNos if shards movement (#33875 ) Currently, assertSeqNos assumes that the cluster is stable at the end of the test (i.e., no more shard movement). However, this assumption does not always hold. In these cases, we can stop the assertion instead of failing a test. Closes #33704	2018-09-20 13:44:26 -04:00
David Turner	0b4a6ae97c	Merge commit '3522b9084b611c89ec4f06c1863542883840ed0e' into zen2	2018-09-20 15:17:47 +01:00
Tim Vernum	ff934e3dcd	Mute broken test on MacOS Seems to be triggered by `0cf0d73` See: https://github.com/elastic/elasticsearch/issues/33879	2018-09-20 14:06:40 +10:00
Nik Everett	26c4f1fb6c	Core: Default node.name to the hostname (#33677 ) Changes the default of the `node.name` setting to the hostname of the machine on which Elasticsearch is running. Previously it was the first 8 characters of the node id. This had the advantage of producing a unique name even when the node name isn't configured but the disadvantage of being unrecognizable and not being available until fairly late in the startup process. Of particular interest is that it isn't available until after logging is configured. This forces us to use a volatile read whenever we add the node name to the log. Using the hostname is available immediately on startup and is generally recognizable but has the disadvantage of not being unique when run on machines that don't set their hostname or when multiple elasticsearch processes are run on the same host. I believe that, taken together, it is better to default to the hostname. 1. Running multiple copies of Elasticsearch on the same node is a fairly advanced feature. We do it all the time as part of the elasticsearch build for testing but we make sure to set the node name then. 2. That the node.name defaults to some flavor of "localhost" on an unconfigured box feels like it isn't going to come up too much in production. I expect most production deployments to at least set the hostname. As a bonus, production deployments need no longer set the node name in most cases. At least in my experience most folks set it to the hostname anyway.	2018-09-19 15:21:29 -04:00
Nik Everett	3ede13a454	Test framework fall cleaning (#33423 ) Wraps all lines in our test framework at 140 characters because that is our standard line length and removes all of the checkstyle suppressions for the test framework. Drops most of `ModuleTestCase` because it isn't used and we're moving away from using guice in the way that it wants to test anyway. Also switches a few classes that extend it but don't use it to extend `ESTestCase` instead.	2018-09-19 14:34:02 -04:00
Lee Hinman	81e9150c7a	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-19 09:43:26 -06:00
Yannick Welsch	10009434bf	Merge remote-tracking branch 'elastic/master' into zen2	2018-09-19 11:18:01 +02:00
Vladimir Dolzhenko	a3e8b831ee	add elasticsearch-shard tool (#32281 ) Relates #31389	2018-09-19 10:28:22 +02:00
Armin Braun	0cf0d73813	TESTS: Set SO_LINGER = 0 for MockNioTransport (#32560 ) * TESTS: Set SO_LINGER = 0 for MockNioTransport * Prevents lingering sockets in TIME_WAIT piling up during test runs and leading to port collisions that manifest as timeouts * Fixes #32552	2018-09-19 06:05:36 +02:00
Lee Hinman	c87cff22b4	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-18 13:57:41 -06:00
Yannick Welsch	758b2f9111	Zen2: Add DisruptableMockTransport (#33713 ) Adds a mock transport implementation that allows to simulate network disruptions.	2018-09-18 11:48:24 +02:00
Or Bin	a5bad4d92c	Docs: Fixed a grammatical mistake: 'a HTTP ...' -> 'an HTTP ...' (#33744 ) Fixed a grammatical mistake: 'a HTTP ...' -> 'an HTTP ...' Closes #33728	2018-09-17 15:35:54 -04:00
Lee Hinman	7ff11b4ae1	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-17 10:41:10 -06:00
Alpar Torok	5ca6f31205	Move precommit task implementation to java (#33407 ) Replace precommit tasks that execute with Java implementations	2018-09-17 14:09:28 +03:00
Lee Hinman	e6cbaa5a78	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-09-14 16:27:37 -06:00
David Turner	31e8781eaa	Merge branch 'master' into zen2	2018-09-14 14:28:28 +02:00
Armin Braun	0b4960ff6b	SCRIPTING: Move terms_set Context to its Own Class (#33602 ) * SCRIPTING: Move terms_set Context to its Own Class * Extracted TermsSetQueryScript * Kept mechanics close to what they were with SearchScript	2018-09-14 06:21:18 +02:00
Colin Goodheart-Smithe	8e59de3eb2	Merge branch 'master' into index-lifecycle	2018-09-13 09:46:14 +01:00
Jim Ferenczi	6ca36bba15	Fix field mapping updates with similarity (#33634 ) This change fixes a bug introduced in 6.3 that prevents fields with an explicit similarity from being updated. It also adds a test that checks this case for similarities but also for analyzers since they could suffer from the same problem. Closes #33611	2018-09-13 09:21:27 +02:00
David Turner	5a3fd8e4e7	Use file-based discovery not MockUncasedHostsProvider (#33554 ) Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`, in many integration tests, to deal with the dynamic nature of the allocation of ports to nodes. However #33241 allows us to use file-based discovery to achieve the same goal, so the special test-only `MockUncasedHostsProvider` is no longer required. This change removes `MockUncasedHostProvider` and replaces it with file-based discovery in tests based on `EsIntegTestCase`.	2018-09-13 07:37:15 +02:00
Martijn van Groningen	5fa81310cc	[CCR] Added history uuid validation (#33546 ) For correctness we need to verify that the history uuid of the leader index shards never changes while that index is being followed. * The history UUIDs are recorded as custom index metadata in the follow index. * The follow api validates whether the current history UUIDs of the leader index shards are the same as the recorded history UUIDs. If not the follow api fails. * While a follow index is following a leader index, shard follow tasks on each shard changes api call verify whether their current history uuid is the same as the recorded history uuid. Relates to #30086 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>	2018-09-12 19:42:00 +02:00
Simon Willnauer	c783488e97	Add `_source`-only snapshot repository (#32844 ) This change adds a `_source` only snapshot repository that allows wrapping any existing repository as a _backend_ to snapshot only the `_source` part including live docs markers. Snapshots taken with the `source` repository won't include any indices, doc-values or points. The snapshot will be reduced in size and functionality such that it requires full re-indexing after it's successfully restored. The restore process will copy the `_source` data locally and start a special shard and engine to allow `match_all` scrolls and searches. Any other query, or get call will fail with an unsupported operation exception. The restored index is also marked as read-only. This feature aims mainly for disaster recovery use-cases where snapshot size is a concern or where time to restore is less of an issue. NOTE: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.	2018-09-12 17:47:10 +02:00
Nhat Nguyen	743327efc2	Reset replica engine to global checkpoint on promotion (#33473 ) When a replica starts following a newly promoted primary, it may have some operations which don't exist on the new primary. Thus we need to discard those operations to align a replica with the new primary. This can be done by first resetting an engine from the safe commit, then replaying the local translog up to the global checkpoint. Relates #32867	2018-09-11 22:09:37 -04:00

2648 Commits