OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	6e06f82106	Fix failing CCR retention lease test Finally! This commit should fix the issues with the CCR retention lease that has been plaguing build failures. The issue here is that we are trying to prevent the clear session requests from being executed until after we have been able to validate that retention leases are being renewed. However, we were only blocking the clear session requests but not blocking them when they are proxied through another node. This commit addresses that. Relates #39268	2019-02-22 20:43:39 -05:00
Jason Tedor	2d4c98a991	Change sort order of shard stats in CCR test This commit changes the sort order of shard stats that are collected in CCR retention lease integration tests. This change is done so that primaries appear first in sort order.	2019-02-22 18:17:28 -05:00
Jason Tedor	e569cf8324	Address failing CCR retention lease test This test fails rarely but it is flaky in its current form. The problem here is that we lack a guarantee on the retention leases having been synced to all shard copies. We need to sleep long enough to ensure that that occurs, and then we can sample the retention leases, possibly sleep again (we usually will not have to since the first sleep will have been long enough to allow a sync and a renewal to happen, if one was going to happen), and then sample the retention leases for comparison. Closes #39331	2019-02-22 18:15:10 -05:00
Jason Tedor	e4e96b8181	Fix shard logged in background lease renewal The shard logged here is the leader shard but it should be the follower shard since this background retention lease renewal is happening on the follower side. This commit fixes that.	2019-02-22 17:32:51 -05:00
Jason Tedor	feb25c71a0	Simplify mocking in CCR retention lease tests This commit simplifies the use of transport mocking in the CCR retention lease integration tests. Instead of adding a send rule between nodes, we add a default send rule. This greatly simplifies the code here, and speeds the test up a little bit too.	2019-02-22 17:24:12 -05:00
Tim Brooks	931953a3ee	Ensure index commit released when testing timeouts (#39273 ) This fixes #39245. Currently it is possible in this test that the clear session call times-out. This means that the index commit will not be released and there will be an assertion triggered in the test teardown. This commit ensures that we wipe the leader index in the test to avoid this assertion. It is okay if the clear session call times-out in normal usage. This scenario is unavoidable due to potential network issues. We have a local timeout on the leader to clean it up when this scenario happens.	2019-02-22 11:14:42 -07:00
Benjamin Trent	3262d6c917	[ML-DataFrame] Add _preview endpoint (#38924 ) (#39319 ) * [DATA-FRAME] add preview endpoint * adjusting preview tests and fixing parser * adjusting preview transport * remove unused import * adjusting test * Addressing PR comments * Fixing failing test and adjusting for pr comments * fixing integration test	2019-02-22 10:55:38 -06:00
David Roberts	4f2bd238d2	[ML] Increase datafeed integration test timeout for slow machines (#39311 ) The assertBusy() that waits the default 10 seconds for a datafeed to complete very occasionally times out on slow machines. This commit increases the timeout to 60 seconds. It will almost never actually take this long, but it's better to have a timeout that will prevent time being wasted looking at spurious test failures.	2019-02-22 15:35:32 +00:00
Gordon Brown	2ad1e6aedc	Fix testCannotShrinkLeaderIndex (#38529 ) This test should no longer pass when the functionality it is intended to test is broken, as it now indexes a number of documents and verifies that the index is staying on the same step until after indexing and replication of those documents is finished. This prevents the test from passing if the leader index progresses in its lifecycle during that time.	2019-02-22 08:03:36 -07:00
Dimitris Athanasiou	1c6818fe74	[ML] Improve DeleteExpiredDataIT failure message (#39298 ) (#39310 ) This test failed once in a very long time with the assertion that there is no document for the `non_existing_job` in the state index. I could not see how that is possible and I cannot reproduce it. With this commit the failure message will reveal some examples of the left-behind docs, which might shed light on what could go wrong.	2019-02-22 16:15:11 +02:00
Daniel Mitterdorfer	9fea21aca5	Remove ExceptionsHelper#detailedMessage in tests (#37921 ) (#39297 ) With this commit we remove all usages of the deprecated method `ExceptionsHelper#detailedMessage` in tests. We do not address production code here but rather in dedicated follow-up PRs to keep the individual changes manageable. Relates #19069	2019-02-22 14:03:29 +01:00
Ioannis Kakavas	401226fc90	Mute rolling upgrade watcher CRUD tests (#39293 ) This fails on old_cluster but mixed_cluster and upgraded_cluster depend on watches set in old_cluster so that can't be muted on its own Relates: https://github.com/elastic/elasticsearch/issues/33185	2019-02-22 13:27:45 +02:00
Lee Hinman	3401afdf35	Add ILM plugin for MonitoringIT tests (#39271 ) Without this, when creating the watch history indices they complain about there being no such setting as `index.lifecycle.name`. Relates to #38805	2019-02-21 21:45:43 -07:00
Julie Tibshirani	29243f7001	Avoid using TimeWarp in TransformIntegrationTests. (#39277 ) This commit makes `TransformIntegrationTests` into a standard integration test, as opposed to using `TimeWarp`, which registers the mock component `ScheduleEngineTriggerMock` to trigger watches. The simplification may help with flakiness we've observed with `TimeWarp`, as in #37882.	2019-02-21 18:02:44 -08:00
Jay Modi	697911c31d	Fixed missed stopping of SchedulerEngine (#39193 ) The SchedulerEngine is used in several places in our code and not all of these usages properly stopped the SchedulerEngine, which could lead to test failures due to leaked threads from the SchedulerEngine. This change adds stopping to these usages in order to avoid the thread leaks that cause CI failures and noise. Closes #38875	2019-02-21 14:31:33 -07:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Benjamin Trent	34d06471c3	[CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (#39270 )	2019-02-21 14:17:03 -06:00
Benjamin Trent	8072543428	Muting AutoFollowIT.testAutoFollowManyIndices (#39265 )	2019-02-21 13:43:09 -06:00
Jason Tedor	b9f8be6968	Clarify the use of sleep in CCR test Sleeps in tests smell funny, and we try to avoid them to the extent possible. We are using a small one in a CCR test. This commit clarifies the purpose of that sleep by adding a comment explaining it. We also removed a hard-coded value from the test, that if we ever modified the value higher up where it was set, we could end up forgetting to change the value here. Now we ensure that these would move in lock step if we ever maintain them later.	2019-02-21 14:05:48 -05:00
Jason Tedor	719c38a36d	Fix CCR tests that manipulate transport requests We have some CCR tests where we use mock transport send rules to control the behavior that we desire in these tests. Namely, we want to simulate an exception being thrown on the leader side, or a variety of other situations. These send rules were put in place between the data nodes on each side. However, it might not be the case that these requests are being sent between data nodes. For example, a request that is handled on a non-data master node would not be sent from a data node. And it might not be the case that the request is sent to a data node, as it could be proxied through a non-data coordinating node. This commit addresses this by putting these send rules in places between all nodes on each side. Closes #39011 Closes #39201	2019-02-21 12:26:09 -05:00
Tanguy Leroux	fc896e452c	ReadOnlyEngine should update translog recovery state information (#39238 ) (#39251 ) `ReadOnlyEngine` never recovers operations from translog and never updates translog information in the index shard's recovery state, even though the recovery state goes through the `TRANSLOG` stage during the recovery. It means that recovery information for frozen shards indicates an unknown number of recovered translog ops in the Recovery APIs (translog_ops: `-1` and translog_ops_percent: `-1.0%`) and this is confusing. This commit changes the `recoverFromTranslog()` method in `ReadOnlyEngine` so that it always recovers from an empty translog snapshot, allowing the recovery state translog information to be correctly updated. Related to #33888	2019-02-21 18:08:06 +01:00
Martijn van Groningen	f40139c403	Change ShardFollowTask to reuse common serialization logic (#39094 ) Initially in #38910, ShardFollowTask was reusing ImmutableFollowParameters' serialization logic. After merging, bwc tests failed sometimes and the binary serialization that ShardFollowTask was originally using was added back. ImmutableFollowParameters is using optional fields (optional vint) while ShardFollowTask was not (vint).	2019-02-21 09:32:33 +01:00
Nhat Nguyen	a96df5d209	Reduce refresh when lookup term in FollowingEngine (#39184 ) Today we always refresh when looking up the primary term in FollowingEngine. This is not necessary for we can simply return none for operations before the global checkpoint.	2019-02-20 19:21:00 -05:00
Nhat Nguyen	cdec11c4eb	Relax history check in ShardFollowTaskReplicationTests (#39162 ) The follower won't always have the same history as the leader for its soft-deletes retention can be different. However, if some operation exists on the history of the follower, then the same operation must exist on the leader. This change relaxes the history check in ShardFollowTaskReplicationTests. Closes #39093	2019-02-20 19:21:00 -05:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Mark Vieira	24ac9da276	Mute CCR retention test that is consistently failing locally and in CI	2019-02-20 11:57:46 -08:00
Jay Modi	af451459a5	Fix failures in SessionFactoryLoadBalancingTests (#39154 ) This change aims to fix failures in the session factory load balancing tests that mock failure scenarios. For these tests, we randomly shut down ldap servers and bind a client socket to the port they were listening on. Unfortunately, we would occasionally encounter failures in these tests where a socket was already in use and/or the port we expected to connect to was wrong and in fact was to one of the ldap instances that should have been shut down. The failures are caused by the behavior of certain operating systems when it comes to binding ports and wildcard addresses. It is possible for a separate application to be bound to a wildcard address and still allow our code to bind to that port on a specific address. So when we close the server socket and open the client socket, we are still able to establish a connection since the other application is already listening on that port on a wildcard address. Another variant is that the os will allow a wildcard bind of a server socket when there is already an application listening on that port for a specific address. In order to do our best to prevent failures in these scenarios, this change does the following: 1. Binds a client socket to all addresses in an awaitBusy 2. Adds assumption that we could bind all valid addresses 3. In the case that we still establish a connection to an address that we should not be able to, try to bind and expect a failure of not being connected Closes #32190	2019-02-20 11:38:26 -07:00
Jason Tedor	90b1b36f50	Add cleanup logic to CCR retention lease test This commit adds some logic to remove the mock transport rules at the end of a CCR retention lease test.	2019-02-20 13:20:07 -05:00
Jason Tedor	cfd7c77b64	Fix broken CCR retention lease unfollow test This commit fixes a broken CCR retention lease unfollow test. The problem with the test is that the random subset of shards that we picked to disrupt would not necessarily overlap with the actual shards in use. We could take a non-empty subset of [0, 3] (e.g., { 2 }) when the only shard IDs in use were [0, 1]. This commit fixes this by taking into account the number of shards in use in the test. With this change, we also take measures to ensure that a successful branch is tested more frequently than would otherwise be the case. On that branch, we want to sometimes pretend that the retention lease is already removed. The randomness here was also sometimes selecting a subset of shards that did not overlap with the shards actually in use during the test. While this does not break the test, it is confusing and reduces the amount of coverage of that branch. Relates #39185	2019-02-20 12:09:28 -05:00
Albert Zaharovits	af8ef1bb98	Do not create the missing index when invoking getRole (#39039 ) In most of the places we avoid creating the `.security` index (or updating the mapping) for read/search operations. This is more of a nit for the case of the getRole call, that fixes a possible mapping update during a get role, and removes a dead if branch about creating the `.security` index.	2019-02-20 17:33:10 +02:00
Jason Tedor	48984f647d	Mute failing CCR retention lease unfollow test This commit mutes a CCR retention lease unfollow test that is failing randomly, but frequently.	2019-02-20 09:47:17 -05:00
Jason Tedor	09ea3ccd16	Remove retention leases when unfollowing (#39088 ) This commit attempts to remove the retention leases on the leader shards when unfollowing an index. This is best effort, since the leader might not be available.	2019-02-20 07:06:49 -05:00
Andrei Stefan	c1018db404	SQL: enforce JDBC driver - ES server version parity (#38972 ) (cherry picked from commit 822a21f29491f295b22dacd04b747781a69ffa61)	2019-02-20 11:29:02 +02:00
Andrei Stefan	92206c8567	Added "validate.properties" property to JDBC's list of allowed properties. (#39050 ) This defaults to "true" (current behavior) and will throw an exception if there is a property that cannot be recognized. If "false", it will ignore anything unrecognizable. (cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)	2019-02-20 11:29:01 +02:00
Tim Vernum	4aa50ed348	Resolve concurrency with watcher trigger service (#39164 ) The watcher trigger service could attempt to modify the perWatchStats map simultaneously from multiple threads. This would cause the internal state to become inconsistent, in particular the count() method may return an incorrect value for the number of watches. This change replaces the implementation of the map with a ConcurrentHashMap so that its internal state remains consistent even when accessed from multiple threads. Backport of: #39092	2019-02-20 19:18:00 +11:00
Julie Tibshirani	f5b28ca69d	Enable test logging for TransformIntegrationTests#testSearchTransform. There is already fairly detailed debug logging in the watcher framework, which should hopefully help debug the failure. Relates to #37882.	2019-02-19 18:15:34 -08:00
Tal Levy	b5dbd1a027	AwaitsFix XPackUsageIT#testXPackCcrUsage. relates to #39126.	2019-02-19 13:28:46 -08:00
Benjamin Trent	109b6451fd	ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (#38822 ) (#39119 ) * ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs * Addressing pr feedback	2019-02-19 12:45:56 -06:00
Ioannis Kakavas	210f34f8e9	Remove BCryptTests (#39098 ) This test was added to verify that we fixed a specific behavior in Bcrypt and hasn't been running for almost 4 years now.	2019-02-19 18:12:18 +02:00
David Roberts	35e30b34f9	[ML] Stop the ML memory tracker before closing node (#39111 ) The ML memory tracker does searches against ML results and config indices. These searches can be asynchronous, and if they are running while the node is closing then they can cause problems for other components. This change adds a stop() method to the MlMemoryTracker that waits for in-flight searches to complete. Once stop() has returned the MlMemoryTracker will not kick off any new searches. The MlLifeCycleService now calls MlMemoryTracker.stop() before stopping the node. Fixes #37117	2019-02-19 15:12:40 +00:00
David Roberts	bbcdea43c5	[ML] Allow stop unassigned datafeed and relax unset upgrade mode wait (#39034 ) These two changes are interlinked. Before this change unsetting ML upgrade mode would wait for all datafeeds to be assigned and not waiting for their corresponding jobs to initialise. However, this could be inappropriate, if there was a reason other than upgrade mode why one job was unable to be assigned or slow to start up. Unsetting of upgrade mode would hang in this case. This change relaxes the condition for considering upgrade mode to be unset to simply that an assignment attempt has been made for each ML persistent task that did not fail because upgrade mode was enabled. Thus after unsetting upgrade mode there is no guarantee that every ML persistent task is assigned, just that each is not unassigned due to upgrade mode. In order to make setting upgrade mode work immediately after unsetting upgrade mode it was then also necessary to make it possible to stop a datafeed that was not assigned. There was no particularly good reason why this was not allowed in the past. It is trivial to stop an unassigned datafeed because it just involves removing the persistent task.	2019-02-19 14:07:10 +00:00
Martijn van Groningen	c8d59f6f0f	Fix shard follow task startup error handling (#39053 ) Prior to this commit, if during fetch leader / follower GCP a fatal error occurred, then the shard follow task was removed. This is unexpected, because if such an error occurs during the lifetime of shard follow task then replication is stopped and the fatal error flag is set. This allows the ccr stats api to report the fatal exception that has occurred (instead of the user grepping through the elasticsearch logs). This issue was found by a rare failure of the `FollowStatsIT#testFollowStatsApiIncludeShardFollowStatsWithRemovedFollowerIndex` test. Closes #38779	2019-02-19 08:54:02 +01:00
Ioannis Kakavas	59e9a0f4f4	Disable specific locales for tests in fips mode (#38938 ) * Disable specific locales for tests in fips mode The Bouncy Castle FIPS provider that we use for running our tests in fips mode has an issue with locale sensitive handling of Dates as described in https://github.com/bcgit/bc-java/issues/405 This causes certificate validation to fail if any given test that includes some form of certificate validation happens to run in one of the locales. This manifested earlier in #33081 which was handled insufficiently in #33299 This change ensures that the problematic 3 locales * th-TH * ja-JP-u-ca-japanese-x-lvariant-JP * th-TH-u-nu-thai-x-lvariant-TH will not be used when running our tests in a FIPS 140 JVM. It also reverts #33299	2019-02-19 08:46:08 +02:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Martijn van Groningen	ce412908ed	also check ccr stats api return empty response in ensureNoCcrTasks() If this fails then it returns more detailed information, for example fatal error.	2019-02-18 16:15:22 +01:00
Nhat Nguyen	2947ccf5c3	Add remote recovery to ShardFollowTaskReplicationTests (#39007 ) We simulate remote recovery in ShardFollowTaskReplicationTests by bootstrapping the follower with the safe commit of the leader. Relates #35975	2019-02-18 09:57:56 -05:00
Hendrik Muhs	1efb01661c	set minimum supported version (#39043 ) (#39051 ) change the minimum supported version of data frame transform	2019-02-18 15:41:25 +01:00
Martijn van Groningen	4fd1f8048d	Mute test #38949	2019-02-18 15:24:07 +01:00
David Roberts	b660d2cac6	[ML] More advanced post-test cleanup of ML indices (#39049 ) The .ml-annotations index is created asynchronously when some other ML index exists. This can interfere with the post-test index deletion, as the .ml-annotations index can be created after all other indices have been deleted. This change adds an ML specific post-test cleanup step that runs before the main cleanup and: 1. Checks if any ML indices exist 2. If so, waits for the .ml-annotations index to exist 3. Deletes the other ML indices found in step 1. 4. Calls the super class cleanup This means that by the time the main post-test index cleanup code runs: 1. The only ML index it has to delete will be the .ml-annotations index 2. No other ML indices will exist that could trigger recreation of the .ml-annotations index Fixes #38952	2019-02-18 14:16:03 +00:00
Martijn van Groningen	e8ea85d6e9	wait for shard to be allocated before executing a resume follow api	2019-02-18 14:50:40 +01:00

2698 Commits