OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	50d2736746	Fix Deadlock from Thread.suspend in Test (#39261 ) (#39341 ) * The lambda invoked by the `lockedExecutor` eventually gets JITed, which runs a static initializer; with a very small probability, the test suspends a thread while it is inside that initializer. * Fixed by creating the `Runnable` in the main test thread and using the same instance in all threads * Closes #35686	2019-02-25 09:15:19 +01:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Armin Braun	1a21cc0357	Simplify and Fix Synchronization in InternalTestCluster (#39168 ) (#39241 ) * Simplify and Fix Synchronization in InternalTestCluster (#39168) * Remove unnecessary `synchronized` statements * Make `Predicate`s constants where possible * Cleanup some stream usage * Make unsafe public methods `synchronized` * Closes #37965 * Closes #37275 * Closes #37345	2019-02-21 16:27:18 +01:00
Lee Hinman	d9de899316	Wrap accounting breaker check in assertBusy (#39211 ) There may be situations where indices have not yet been closed from a Lucene perspective, causing the breaker to not immediately be at 0. Relates to #30290	2019-02-21 08:00:31 -07:00
Marios Trivyzas	1316825f52	Replace superfluous usage of Counter with Supplier (#39048 ) (#39225 ) `Counter` was used as a means of passing the relative cached time as a functional argument before the `Supplier` interface was introduced.	2019-02-21 12:42:54 +02:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in the presence of primary failover or out-of-order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Tal Levy	cb7e3708bc	Rollup jobs should be cleaned up before indices are deleted (#38930 ) (#39144 ) Rollup jobs should be stopped and deleted before the indices are removed. An active rollup job can issue a bulk request, then the test ends and the cleanup code deletes all indices. The in-flight bulk request will then stall and error because the index no longer exists, but this process might take longer than the StopRollup timeout. This means the test fails, and often fails several other tests since the job is still active (e.g. other tests cannot create the same-named job, or fail to stop the job in their cleanup because it's still stalled). This tends to knock over several tests before the bulk finally times out and the job shuts down. Instead, we need to simply stop jobs first. In-flight bulks will resolve quickly, and we can carry on with deleting indices after the jobs are confirmed inactive. stop-job.asciidoc tended to trigger this issue because it executed an async stop API and then exited, which set up the above situation. It can and did happen with other tests, though. As an extra precaution, the doc test was modified to substitute in wait_for_completion to help head off these issues too.	2019-02-20 11:12:01 -08:00
Adrien Grand	d8852b83d0	Don't swallow IOExceptions in InternalTestCluster. (#39068 ) Relates #39030	2019-02-19 15:03:47 +01:00
Ioannis Kakavas	59e9a0f4f4	Disable specific locales for tests in fips mode (#38938 ) * Disable specific locales for tests in fips mode The Bouncy Castle FIPS provider that we use for running our tests in fips mode has an issue with locale-sensitive handling of dates as described in https://github.com/bcgit/bc-java/issues/405 This causes certificate validation to fail if any given test that includes some form of certificate validation happens to run in one of these locales. This manifested earlier in #33081, which was handled insufficiently in #33299. This change ensures that the three problematic locales * th-TH * ja-JP-u-ca-japanese-x-lvariant-JP * th-TH-u-nu-thai-x-lvariant-TH will not be used when running our tests in a FIPS 140 JVM. It also reverts #33299	2019-02-19 08:46:08 +02:00
David Roberts	ae9243ad0a	Reduce single node test cleanup logging (#39060 ) As per https://github.com/elastic/elasticsearch/pull/39049#discussion_r257719530	2019-02-18 17:38:49 +00:00
Nhat Nguyen	2947ccf5c3	Add remote recovery to ShardFollowTaskReplicationTests (#39007 ) We simulate remote recovery in ShardFollowTaskReplicationTests by bootstrapping the follower with the safe commit of the leader. Relates #35975	2019-02-18 09:57:56 -05:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before adding operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first (1) add it to Lucene, then (2) add it to the translog, and finally (3) mark its seq_no as completed. If a flush occurs after step 1 but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Christoph Büscher	9f6c77fad4	Fix FullClusterRestartIT#testSnapshotRestore (#38795 ) This test failed on 7.1 when running full cluster restart tests against pre-7.0 clusters (e.g. 6.6 clusters). This fixes the expected type in the templates after the cluster restart.	2019-02-15 20:12:26 +01:00
Lee Hinman	7d449c5f65	Check that delete index request succeeded in test teardown (#38903 ) (#38913 ) Backport of #38903 When tearing down from `ESSingleNodeTestCase` we perform a delete on "*" indices; in some cases, however, those indices are not fully deleted. Rather than have a failure occur later down the line (see: https://github.com/elastic/elasticsearch/issues/30290#issuecomment-463589008 ) the failure should occur immediately so it can be diagnosed more easily.	2019-02-14 13:46:17 -07:00
Julie Tibshirani	e769cb4efd	Perform precise check for types warnings in cluster restart tests. (#37944 ) Instead of using `WarningsHandler.PERMISSIVE`, we only match warnings that are due to types removal. This PR also renames `allowTypeRemovalWarnings` to `allowTypesRemovalWarnings`. Relates to #37920.	2019-02-13 11:28:58 -08:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I have adjusted the logging and unmuted this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when closing engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release it.	2019-02-12 11:39:35 -05:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Tim Vernum	84483b26cf	Fix version logic when bumping major version (#38593 ) When we are preparing to release a major version the rules around "unreleased" versions and branches get a bit more complex. This change implements the following rules: - If the tip version on the previous major is a .0 (e.g. 6.7.0) then the tip of the minor before that (e.g. 6.6.1) must be unreleased. (This is because 6.7.0 would be "staged" in preparation for release, but 6.6.1 would be open for bug fixes on the 6.6.x release line) (in VersionCollection & VersionUtils) - The "major.x" branch (if it exists) will always point to the latest minor in that series. Anything that is not the latest minor must therefore be on the "major.minor" branch. For example, if v7.1.0 exists then the "7.x" branch must be 7.1.0, and 7.0.0 must be on the "7.0" branch (in VersionCollection)	2019-02-08 18:00:03 +11:00
Jason Tedor	fdf6b3f23f	Add 7.1 version constant to 7.x branch (#38513 ) This commit adds the 7.1 version constant to the 7.x branch. Co-authored-by: Andy Bristol <andy.bristol@elastic.co> Co-authored-by: Tim Brooks <tim@uncontended.net> Co-authored-by: Christoph Büscher <cbuescher@posteo.de> Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com> Co-authored-by: markharwood <markharwood@gmail.com> Co-authored-by: Ioannis Kakavas <ioannis@elastic.co> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Alpar Torok <torokalpar@gmail.com> Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Tim Vernum <tim@adjective.org> Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>	2019-02-07 16:32:27 -05:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Luca Cavanna	a7046e001c	Remove support for maxRetryTimeout from low-level REST client (#38085 ) We have had various reports of problems caused by the maxRetryTimeout setting in the low-level REST client. This setting was initially added in an attempt to avoid retrying a request that had already taken longer than the provided timeout. The implementation was problematic, though: the timeout would also expire during the first request attempt (see #31834), would leave the request executing after expiration, causing memory leaks (see #33342), and would not take into account the http client's internal queuing (see #25951). Given all these issues, this custom timeout mechanism seems to give little benefit while causing a lot of harm. We should instead rely on the connect and socket timeouts exposed by the underlying http client and accept that a request can overall take longer than the configured timeout, which is the case even with a single retry anyway. This commit removes the `maxRetryTimeout` setting and all of its usages.	2019-02-06 08:43:47 +01:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR, as it prevents documents that would be recovered during normal following operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports a built-in authentication service, `token-service`, that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on the OAuth2 spec. The access token is short-lived (20 minutes by default) and the refresh token has a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline and thereby fail to refresh the tokens. This commit introduces a built-in authentication service, `api-key-service`, that adds support for long-lived tokens, aka API keys, to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys: - by default do not have an expiration, but an expiration can be configured where the API keys need to expire after a certain amount of time. - when generated, keep the authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them. - can be invalidated via the invalidation API. - can have their information retrieved via a get API. - that have expired or been invalidated will be retained for 1 week before being deleted; the expired API keys remover task handles this. The API key management APIs are: 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) - `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using the `Authorization` header, where the auth scheme is `ApiKey` and the credentials are the base64 encoding of the API key ID and the API key, joined by a colon. Example: ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
markharwood	578fd14257	Types removal - fix FullClusterRestartIT warning expectations (#38310 ) Relax test warning message checking to pre-empt PR 38022 landing in 6.7 with new warning messages. The relaxed test now just assumes any warning message starting with “[types removal]” is tolerated rather than the precise phrasing used in the 6.7 branch.	2019-02-04 20:09:07 +00:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the "should periodically flush" logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are retention leases renewed since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Lee Hinman	f19fdcd491	Re-enable accounting breaker check in InternalTestCluster (#38131 ) Relates to #30290 The intent for this is to see whether this failure still happens, and if so, provide more up-to-date logs for analysis.	2019-02-04 07:40:59 -07:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings; mostly this is 10-30 lines of code that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`, which is suitable for all classes, so the copy-pasted code can be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors the previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as the `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110 ) When tests extending ESIntegTestCase are run in the same JVM, the static field in NodeAndClusterIdConverter will throw an AlreadySet exception. Overriding the configuration method Node.configureNodeAndClusterIdStateListener in MockNode prevents the listener registration from happening. relates #32850	2019-01-31 18:59:40 +01:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that the caller handles exceptions by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolve ordering problems for retention leases. Namely, out-of-order retention lease sync requests can arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be possible, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested from each cluster, and the reduce phase in each cluster is non-final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API now accepts a new parameter called ccs_minimize_roundtrips that allows opting out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Tim Vernum	cde126dbff	Enable SSL in reindex with security QA tests (#37600 ) Update the x-pack/qa/reindex-tests-with-security integration tests to run with TLS enabled on the Rest interface. Relates: #37527	2019-01-31 20:59:50 +11:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00
David Turner	81c443c9de	Deprecate minimum_master_nodes (#37868 ) Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in tests, but for 7.x nodes this setting is not required as it has no effect. This commit removes this setting so that nodes are started with more realistic configurations, and deprecates it.	2019-01-30 20:09:15 +00:00
Nik Everett	e97718245d	Test: Enable strict deprecation on all tests (#36558 ) This drops the option for tests to disable strict deprecation mode in the low level rest client in favor of configuring expected warnings on any calls that should expect warnings. This behavior is paranoid-by-default which is generally the right way to handle deprecations and tests in general.	2019-01-30 11:48:34 -05:00
Colin Goodheart-Smithe	21e392e95e	Removes typed calls from YAML REST tests (#37611 ) This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since these are specifically testing typed API behaviour. The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.	2019-01-30 16:32:58 +00:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840 ) This is related to #35975. It adds an action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Armin Braun	7f1784e9f9	Remove Dead MockTransport Code (#34044 ) * All these methods are unused	2019-01-29 15:08:11 +01:00
Luca Cavanna	2325fb9cb3	Remove test only SearchShardTarget constructor (#37912 ) Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.	2019-01-29 14:58:11 +01:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support for JSON logging (#36833 ) In order to support the JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure users' custom log patterns work fine. To populate the additional fields node.id and cluster.uuid, which are not available at start time, a cluster state update has to be received and the values passed to the log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one cluster state update. Once the update is received, the nodeId and clusterUUid are set in a static field in NodeAndClusterIdConverter. The following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace. See ESJsonLayout.java for more details and field descriptions. The Docker log4j2 configuration is now almost the same as the one used for the ES binary. The only difference is that Docker uses console appenders, whereas ES uses file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Jason Tedor	5fddb631a2	Introduce retention lease syncing (#37398 ) This commit introduces retention lease syncing from the primary to its replicas when a new retention lease is added. A follow-up commit will add a background sync of the retention leases as well so that renewed retention leases are synced to replicas.	2019-01-27 07:49:56 -05:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents; requests that match more than this value will set `"total.relation"` to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response. Scroll queries are not impacted; they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
Julie Tibshirani	e1d8df4ffa	Deprecate types in create index requests. (#37134 ) From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates: * Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests. * Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.	2019-01-24 13:17:47 -08:00
Andrey Ershov	4974684003	Add tool elasticsearch-node unsafe-bootstrap (#37696 ) The elasticsearch-node tool helps to restore a cluster if half or more of the master-eligible nodes are lost. Of course, all bets are off regarding data consistency. There are two parts of the tool: unsafe-bootstrap, to be used when there is still at least one master-eligible node alive, and detach-cluster, when there are no master-eligible nodes left. This commit implements the first part. Docs for the tool will be added separately as a part of #37812.	2019-01-24 19:25:55 +01:00
Tal Levy	289106a578	Refactor GeoHashGrid to be abstract and re-usable (#37742 ) This change splits out all the specific GeoHash classes for the geohash_grid aggregation into abstract GeoGrid classes that can be re-used for specific hashing types, like `geohash`	2019-01-24 10:12:14 -08:00
Alpar Torok	37768b7eac	Testing conventions now checks for tests in main (#37321 ) * Testing conventions now checks for tests in main This is the last outstanding feature of the old NamingConventionsTask, so time to remove it. * PR review	2019-01-24 17:30:50 +02:00
Nhat Nguyen	a6abb28abf	Fix InternalEngineTests#assertOpsOnPrimary (#37746 ) The assertion `assertOpsOnPrimary` does not store the seq_no and primary term of successful deletes to `lastOpSeqNo` and `lastOpTerm`. This leads to failures of subsequent CAS deletes or indexes with seq_no and term. Moreover, this assertion trips a translog assertion because it bumps the primary term of some operations but not the primary term of the engine. Relates #36467 Closes #37684	2019-01-24 10:02:48 -05:00
Yannick Welsch	feab59df03	Bubble exceptions up in ClusterApplierService (#37729 ) Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and block the state from being applied instead of silently being ignored. In combination with the cluster state publishing lag detector, this will throw a node out of the cluster that can't properly apply cluster state updates.	2019-01-24 14:09:03 +01:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
Boaz Leskes	52ba407931	Expose sequence number and primary terms in search responses (#37639 ) Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.	2019-01-23 09:01:58 +01:00
Henning Andersen	228611843c	Fail start of non-data node if node has data (#37347 ) * Fail start of non-data node if node has data Check that nodes started with node.data=false cannot start if they have shard data to avoid (old) indexes being resurrected into the cluster in red status. Issue #27073	2019-01-22 13:27:12 +01:00
Tim Brooks	21838d73b5	Extract message serialization from `TcpTransport` (#37034 ) This commit introduces a NetworkMessage class. This class has two subclasses - InboundMessage and OutboundMessage. These messages can be serialized and deserialized independent of the transport. This allows more granular testing. Additionally, the serialization mechanism is now a simple Supplier. This builds the framework to eventually move the serialization of transport messages to the network thread. This is the one serialization component that is not currently performed on the network thread (transport deserialization and http serialization and deserialization are all on the network thread).	2019-01-21 14:14:18 -07:00
Tim Brooks	f516d68fb2	Share `NioGroup` between http and transport impls (#37396 ) Currently we create dedicated network threads for both the http and transport implementations. Since these threads should never perform blocking operations, they could be shared. This commit modifies the nio-transport to have 0 http workers by default. If the default configs are used, this will cause the http transport to be run on the transport worker threads. The http worker setting will still exist in case the user would like to configure dedicated workers. Additionally, this commit deletes dedicated acceptor threads. We have never had these for the netty transport and they can be added back if a need is determined in the future.	2019-01-21 13:50:56 -07:00
Julie Tibshirani	8da7a27f3b	Deprecate types in the put mapping API. (#37280 ) From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates: - Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`. - Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.	2019-01-18 12:28:31 -08:00
Yannick Welsch	377d96e376	Remove initial_master_nodes on node restart (#37580 ) Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped. The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.	2019-01-18 16:36:42 +01:00
Jason Tedor	687978b7d1	Reject all requests that have an unconsumed body (#37504 ) This commit removes some leniency from REST handling: we now reject every request that has a body if that body is not used while handling the request. For example, DELETE /index { "query" : { "term" : { "field" : "value" } } } is now rejected.	2019-01-16 07:29:25 -05:00
Przemyslaw Gomulka	5e94f384c4	Remove the use of AbstractLifecycleComponent constructor (#37488 ) The AbstractLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constructor of its super class. It no longer extends AbstractComponent, so there is no need for this constructor. There is also no need for AbstractLifecycleComponent subclasses to have Settings in their constructors if they were only passing it to the super constructor. This is part 1, which will be backported to 6.x with a migration guide/deprecation log. Part 2 will remove this constructor in 7. Relates #35560, relates #34488	2019-01-16 09:05:30 +01:00
Tanguy Leroux	23ae9808ba	Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (#37414 ) This commit fixes the IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) method where the startReplica parameter was not correctly propagated and the value true always used instead.	2019-01-15 12:48:21 +01:00
Marios Trivyzas	d6a104f52b	[TEST] Muted testDifferentRolesMaintainPathOnRestart Relates to #37462	2019-01-15 11:51:45 +02:00
Jason Tedor	e11a32eda8	Reformat some classes in the index universe This commit reformats some classes in the index universe with the purpose of breaking some long method definitions and invocations into a line per parameter. This has the advantage that for an upcoming change to these definitions and invocations, the diff for that change will be a single line per definition or invocation. That makes these sorts of changes easier to read.	2019-01-14 21:45:24 -05:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Nhat Nguyen	1e3702da0b	Relax assertSameDocIdsOnShards assertion If the checking node no longer holds the shard copy, the assertion assertSameDocIdsOnShards might fail. This is too harsh since the assertion is to ensure the consistency between active copies.	2019-01-14 15:28:48 -05:00
Nhat Nguyen	15aa3764a4	Reduce recovery time with compress or secure transport (#36981 ) Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encrypting/decrypting is CPU-bound rather than network-bound. With this commit, a source node can send multiple (2 by default) file-chunks without waiting for the acknowledgments from the target. Below are the benchmark results for PMC and NYC_taxis.
- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| --------- | -------- | -------- | -------- | -------- | -------- |
| Plain     | 321s     | 249s     | 191s     | *        | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844	2019-01-14 15:14:46 -05:00
Tim Brooks	5c68338a1c	Implement ccr file restore (#37130 ) This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.	2019-01-14 13:07:55 -07:00
Armin Braun	033e67fa59	Cleanup Deadcode in Rest Tests (#37418 ) * Either dead code outright or redundant overrides removed	2019-01-14 16:22:44 +01:00
Daniel Mitterdorfer	abe35fb99b	Remove unused index store in directory service With this commit we remove the unused field `indexStore` from all implementations of `FsDirectoryService`. Relates #37097	2019-01-14 13:44:32 +01:00
Jason Tedor	03be4dbaca	Introduce retention lease persistence (#37375 ) This commit introduces the persistence of retention leases by persisting them in index commits and recovering them when recovering a shard from store.	2019-01-12 14:43:19 -08:00
Nhat Nguyen	44a1071018	Make recovery source partially non-blocking (#37291 ) Today a peer-recovery may run into a deadlock if the value of node_concurrent_recoveries is too high. This happens because the peer-recovery is executed in a blocking fashion. This commit attempts to make the recovery source partially non-blocking. I will make three follow-ups to make it fully non-blocking: (1) send translog operations, (2) primary relocation, (3) send commit files. Relates #36195	2019-01-12 12:49:48 -05:00
Armin Braun	63fe3c6ed6	Fix PrimaryAllocationIT Race Condition (#37355 ) * Fix PrimaryAllocationIT Race Condition * Forcing a stale primary allocation on a green index was tripping the assertion that was removed * Added a test that this case still errors out correctly * Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test * Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty * Closes #37345	2019-01-11 23:26:04 +01:00
Yannick Welsch	f4abf9628a	Mock connections more accurately in DisruptableMockTransport (#37296 ) This commit moves DisruptableMockTransport to a more accurate representation of connection management, which allows using the full connection manager and does not require mocking out any behavior. With this, we can implement restarting nodes in CoordinatorTests.	2019-01-11 16:06:48 +01:00
Jason Tedor	822626dadf	Make consistent empty retention lease supplier This commit makes empty retention lease suppliers always return an empty list rather than, in some cases, an empty set. This commit is solely for consistency; there is no functional change here.	2019-01-10 18:34:55 -08:00
Yannick Welsch	d499233068	Zen2: Add join validation (#37203 ) Adds join validation to Zen2, which prevents a node from joining a cluster when the node does not have the right ES version or does not satisfy any other of the join validation constraints.	2019-01-10 12:57:50 +01:00
Alexander Reelsen	b2e8437424	Tests: Add ElasticsearchAssertions.awaitLatch method (#36777 ) * Tests: Add ElasticsearchAssertions.awaitLatch method Some tests are using assertTrue(latch.await(...)) in their code. This leads to an assertion error without any error message. This adds a method which has a nicer error message and can be used in tests. * fix forbidden apis * fix spaces	2019-01-10 09:25:36 +01:00
Armin Braun	eacc63b032	TESTS: Real Coordinator in SnapshotServiceTests (#37162 ) * TESTS: Real Coordinator in SnapshotServiceTests * Introduce real coordinator in SnapshotServiceTests to be able to test network disruptions realistically * Make adjustments to cluster applier service so that we can pass a mocked single threaded executor for tests	2019-01-09 16:53:49 +01:00
Tanguy Leroux	7f6fe14b66	Merge branch 'master' into close-index-api-refactoring	2019-01-09 09:26:05 +01:00
Mayya Sharipova	ec32e66088	Deprecate reference to _type in lookup queries (#37016 ) Relates to #35190	2019-01-08 18:46:41 -08:00
Tanguy Leroux	d70ebfd1d6	Merge branch 'master' into close-index-api-refactoring	2019-01-08 09:17:48 +01:00
Jason Tedor	c8c596cead	Introduce retention lease expiration (#37195 ) This commit implements a straightforward approach to retention lease expiration. Namely, we inspect which leases are expired when obtaining the current leases through the replication tracker. At that moment, we clean the map that persists the retention leases in memory.	2019-01-07 22:03:52 -08:00
Julie Tibshirani	c5aac4705d	Revert "Stop automatically nesting mappings in index creation requests. (#36924 )" This reverts commit `ac1c6940d2`.	2019-01-07 17:56:40 -08:00
Tanguy Leroux	97bf4d7176	Merge branch 'master' into close-index-api-refactoring	2019-01-07 18:38:27 +01:00
David Turner	9d0e0eb0f3	[Zen2] Remove initial master node count setting (#37150 ) The `cluster.unsafe_initial_master_node_count` setting was introduced as a temporary measure while the design of `cluster.initial_master_nodes` was being finalised. This commit removes this temporary setting, replacing it with usages of `cluster.initial_master_nodes` where appropriate.	2019-01-07 16:05:00 +00:00
Tanguy Leroux	e149b0852e	[Close Index API] Add unique UUID to ClusterBlock (#36775 ) This commit adds a unique id to cluster blocks, so that they can be uniquely identified if needed. This is important for the Close Index API where multiple concurrent closing requests can be executed at the same time. By adding a UUID to the cluster block, we can generate a unique "closing block" that can later be verified on shards and then checked again from the cluster state before closing the index. When the verification on shards is done, the closing block is replaced by the regular INDEX_CLOSED_BLOCK instance. If something goes wrong, calling the Open Index API will remove the block. Related to #33888	2019-01-07 16:44:59 +01:00
Jason Tedor	c0f8c89172	Introduce shard history retention leases (#37167 ) This commit is the first in a series which will culminate with fully-functional shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication and local shard recoveries. A future consumer will be the changes API. Further, index lifecycle management requires coordinating with some of these consumers; otherwise, it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem. Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases, they have a limited lifespan and will expire if not renewed. The idea of these leases is that all operations above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to Lucene and restored during recovery, and broadcast to replicas under certain circumstances. This commit is merely putting the basics in place. This first commit only introduces the concept and integrates their use with the soft delete retention policy. We add some tests to demonstrate the basic management is correct, and that the soft delete policy is correctly influenced by the existence of any retention leases. 
We make no effort in this commit to implement any of the following: - timestamps - expiration - persistence to and recovery from Lucene - handoff during primary relocation - sharing retention leases with replicas - exposing leases in shard-level statistics - integration with cross-cluster replication These will occur individually in follow-up commits.	2019-01-07 07:43:57 -08:00
Alpar Torok	a7c3d5842a	Split third party audit exclusions by type (#36763 )	2019-01-07 17:24:19 +02:00
Simon Willnauer	ac2e09b25a	Fix suite scope random initialization (#37163 ) The initialization of a suite scope cluster had some side effects on subsequent runs, which caused issues when tests had to be reproduced. This moves the suite scope initialization to a private random context. Closes #36202	2019-01-07 14:20:17 +01:00
Tanguy Leroux	f5af79b9cd	Merge branch 'master' into close-index-api-refactoring	2019-01-07 12:43:03 +01:00
Armin Braun	31c33fdb9b	MINOR: Remove some Deadcode in Gradle (#37160 )	2019-01-07 09:21:25 +01:00

2003 Commits
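The retention lease commits above (#37167, #37195) describe each lease as an identifier, a retaining sequence number, a timestamp and a source. They also say expired leases are pruned whenever the current leases are read, and that merges retain operations above the minimum retaining sequence number. The following is a minimal sketch under those descriptions only. `RetentionLease`, `LeaseTracker`, `addOrRenew`, `minimumRetainingSequenceNumber` and the fixed time-to-live are hypothetical names and assumptions. None of this reproduces the actual `ReplicationTracker` code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical lease: identifier, retaining sequence number, acquire/renew timestamp, source.
record RetentionLease(String id, long retainingSequenceNumber, long timestamp, String source) {}

// Hypothetical tracker illustrating lazy expiration; not the Elasticsearch implementation.
class LeaseTracker {
    private final Map<String, RetentionLease> leases = new HashMap<>();
    private final LongSupplier clock;    // current time in millis, injected so tests can control it
    private final long leaseTtlMillis;   // assumption: one fixed time-to-live for every lease

    LeaseTracker(LongSupplier clock, long leaseTtlMillis) {
        this.clock = clock;
        this.leaseTtlMillis = leaseTtlMillis;
    }

    // Adds a lease, or renews it by replacing it with a fresh timestamp.
    synchronized void addOrRenew(String id, long retainingSeqNo, String source) {
        leases.put(id, new RetentionLease(id, retainingSeqNo, clock.getAsLong(), source));
    }

    // Expiration is lazy: leases older than the TTL are removed whenever current leases are read.
    synchronized Collection<RetentionLease> currentLeases() {
        long now = clock.getAsLong();
        leases.values().removeIf(lease -> now - lease.timestamp() > leaseTtlMillis);
        return List.copyOf(leases.values());
    }

    // Minimum retaining sequence number over unexpired leases; empty if there are no leases.
    OptionalLong minimumRetainingSequenceNumber() {
        return currentLeases().stream().mapToLong(RetentionLease::retainingSequenceNumber).min();
    }
}
```

Injecting the clock as a `LongSupplier` keeps the expiration logic deterministic in tests, in the spirit of the `Counter` to `Supplier` change listed in this log.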
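Commit #36981 lets a recovery source send several file-chunks (2 by default) without waiting for each acknowledgment. One way to picture such a bounded window is a semaphore: acquire a permit per chunk sent and release it when that chunk is acknowledged. This is a sketch of that idea only. The commit does not say it uses a semaphore, and `ChunkSender`, `sendAll` and the `transport` function are hypothetical names rather than the actual `RecoverySourceHandler` API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sender with a bounded window of unacknowledged chunks.
class ChunkSender {
    private final Semaphore inFlight;
    private final AtomicInteger current = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ChunkSender(int maxConcurrentChunks) {
        this.inFlight = new Semaphore(maxConcurrentChunks);
    }

    // transport sends one chunk and returns a future that completes when the target acknowledges it.
    void sendAll(List<byte[]> chunks, Function<byte[], CompletableFuture<Void>> transport) {
        CompletableFuture<?>[] acks = new CompletableFuture<?>[chunks.size()];
        for (int i = 0; i < chunks.size(); i++) {
            inFlight.acquireUninterruptibly();               // blocks only while the window is full
            peak.accumulateAndGet(current.incrementAndGet(), Math::max);
            acks[i] = transport.apply(chunks.get(i)).whenComplete((ignored, failure) -> {
                current.decrementAndGet();
                inFlight.release();                          // an ack frees a slot for the next chunk
            });
        }
        CompletableFuture.allOf(acks).join();                // wait for the remaining acks; rethrows failures
    }

    // Largest number of chunks observed in flight at once (never above maxConcurrentChunks).
    int peakInFlight() {
        return peak.get();
    }
}
```

With `maxConcurrentChunks = 1` this degenerates to the previous send-and-wait behaviour. Larger values let CPU-bound encryption or compression of one chunk overlap with transfer of another, which is the effect the benchmark tables in that commit measure.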
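Commit #36777 notes that `assertTrue(latch.await(...))` fails with an assertion error that carries no message. The helper below is a standalone sketch of the kind of method that commit adds. The class name `LatchAssertions` and the exact message wording are assumptions, not the code of `ElasticsearchAssertions.awaitLatch`.

```java
import java.util.Locale;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the real one lives in ElasticsearchAssertions and may differ.
final class LatchAssertions {
    private LatchAssertions() {}

    // Waits for the latch and, on timeout, fails with a message describing what was expected.
    static void awaitLatch(CountDownLatch latch, long timeout, TimeUnit unit) throws InterruptedException {
        if (latch.await(timeout, unit) == false) {
            throw new AssertionError("expected latch to be counted down after " + timeout + " "
                    + unit.name().toLowerCase(Locale.ROOT) + ", but it still has count " + latch.getCount());
        }
    }
}
```

A test would call `LatchAssertions.awaitLatch(latch, 10, TimeUnit.SECONDS)` where it previously asserted on the boolean returned by `await`. The `== false` comparison follows the code style referenced in #37285.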