OpenSearch

Commit Graph

Author	SHA1	Message	Date
Julie Tibshirani	29243f7001	Avoid using TimeWarp in TransformIntegrationTests. (#39277 ) This commit makes `TransformIntegrationTests` into a standard integration test, as opposed to using `TimeWarp`, which registers the mock component `ScheduleEngineTriggerMock` to trigger watches. The simplification may help with flakiness we've observed with `TimeWarp`, as in #37882.	2019-02-21 18:02:44 -08:00
Jay Modi	697911c31d	Fixed missed stopping of SchedulerEngine (#39193 ) The SchedulerEngine is used in several places in our code and not all of these usages properly stopped the SchedulerEngine, which could lead to test failures due to leaked threads from the SchedulerEngine. This change adds stopping to these usages in order to avoid the thread leaks that cause CI failures and noise. Closes #38875	2019-02-21 14:31:33 -07:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Benjamin Trent	34d06471c3	[CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (#39270 )	2019-02-21 14:17:03 -06:00
Benjamin Trent	8072543428	Muting AutoFollowIT.testAutoFollowManyIndices (#39265 )	2019-02-21 13:43:09 -06:00
Jason Tedor	b9f8be6968	Clarify the use of sleep in CCR test Sleeps in tests smell funny, and we try to avoid them to the extent possible. We are using a small one in a CCR test. This commit clarifies the purpose of that sleep by adding a comment explaining it. We also removed a hard-coded value from the test, that if we ever modified the value higher up where it was set, we could end up forgetting to change the value here. Now we ensure that these would move in lock step if we ever maintain them later.	2019-02-21 14:05:48 -05:00
Jason Tedor	719c38a36d	Fix CCR tests that manipulate transport requests We have some CCR tests where we use mock transport send rules to control the behavior that we desire in these tests. Namely, we want to simulate an exception being thrown on the leader side, or a variety of other situations. These send rules were put in place between the data nodes on each side. However, it might not be the case that these requests are being sent between data nodes. For example, a request that is handled on a non-data master node would not be sent from a data node. And it might not be the case that the request is sent to a data node, as it could be proxied through a non-data coordinating node. This commit addresses this by putting these send rules in places between all nodes on each side. Closes #39011 Closes #39201	2019-02-21 12:26:09 -05:00
Tanguy Leroux	fc896e452c	ReadOnlyEngine should update translog recovery state information (#39238 ) (#39251 ) `ReadOnlyEngine` never recovers operations from translog and never updates translog information in the index shard's recovery state, even though the recovery state goes through the `TRANSLOG` stage during the recovery. It means that recovery information for frozen shards indicates an unknown number of recovered translog ops in the Recovery APIs (translog_ops: `-1` and translog_ops_percent: `-1.0%`) and this is confusing. This commit changes the `recoverFromTranslog()` method in `ReadOnlyEngine` so that it always recovers from an empty translog snapshot, allowing the recovery state translog information to be correctly updated. Related to #33888	2019-02-21 18:08:06 +01:00
Martijn van Groningen	f40139c403	Change ShardFollowTask to reuse common serialization logic (#39094 ) Initially in #38910, ShardFollowTask was reusing ImmutableFollowParameters' serialization logic. After merging, bwc tests failed sometimes and the binary serialization that ShardFollowTask was originally using was added back. ImmutableFollowParameters is using optional fields (optional vint) while ShardFollowTask was not (vint).	2019-02-21 09:32:33 +01:00
Nhat Nguyen	a96df5d209	Reduce refresh when lookup term in FollowingEngine (#39184 ) Today we always refresh when looking up the primary term in FollowingEngine. This is not necessary for we can simply return none for operations before the global checkpoint.	2019-02-20 19:21:00 -05:00
Nhat Nguyen	cdec11c4eb	Relax history check in ShardFollowTaskReplicationTests (#39162 ) The follower won't always have the same history as the leader for its soft-deletes retention can be different. However, if some operation exists on the history of the follower, then the same operation must exist on the leader. This change relaxes the history check in ShardFollowTaskReplicationTests. Closes #39093	2019-02-20 19:21:00 -05:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Mark Vieira	24ac9da276	Mute CCR retention test that is consistently failing locally and in CI	2019-02-20 11:57:46 -08:00
Jay Modi	af451459a5	Fix failures in SessionFactoryLoadBalancingTests (#39154 ) This change aims to fix failures in the session factory load balancing tests that mock failure scenarios. For these tests, we randomly shut down ldap servers and bind a client socket to the port they were listening on. Unfortunately, we would occasionally encounter failures in these tests where a socket was already in use and/or the port we expected to connect to was wrong and in fact was to one of the ldap instances that should have been shut down. The failures are caused by the behavior of certain operating systems when it comes to binding ports and wildcard addresses. It is possible for a separate application to be bound to a wildcard address and still allow our code to bind to that port on a specific address. So when we close the server socket and open the client socket, we are still able to establish a connection since the other application is already listening on that port on a wildcard address. Another variant is that the os will allow a wildcard bind of a server socket when there is already an application listening on that port for a specific address. In order to do our best to prevent failures in these scenarios, this change does the following: 1. Binds a client socket to all addresses in an awaitBusy 2. Adds assumption that we could bind all valid addresses 3. In the case that we still establish a connection to an address that we should not be able to, try to bind and expect a failure of not being connected Closes #32190	2019-02-20 11:38:26 -07:00
Jason Tedor	90b1b36f50	Add cleanup logic to CCR retention lease test This commit adds some logic to remove the mock transport rules at the end of a CCR retention lease test.	2019-02-20 13:20:07 -05:00
Jason Tedor	cfd7c77b64	Fix broken CCR retention lease unfollow test This commit fixes a broken CCR retention lease unfollow test. The problem with the test is that the random subset of shards that we picked to disrupt would not necessarily overlap with the actual shards in use. We could take a non-empty subset of [0, 3] (e.g., { 2 }) when the only shard IDs in use were [0, 1]. This commit fixes this by taking into account the number of shards in use in the test. With this change, we also take measure to ensure that a successful branch is tested more frequently than would otherwise be the case. On that branch, we want to sometimes pretend that the retention lease is already removed. The randomness here was also sometimes selecting a subset of shards that did not overlap with the shards actually in use during the test. While this does not break the test, it is confusing and reduces the amount of coverage of that branch. Relates #39185	2019-02-20 12:09:28 -05:00
Albert Zaharovits	af8ef1bb98	Do not create the missing index when invoking getRole (#39039 ) In most of the places we avoid creating the `.security` index (or updating the mapping) for read/search operations. This is more of a nit for the case of the getRole call, that fixes a possible mapping update during a get role, and removes a dead if branch about creating the `.security` index.	2019-02-20 17:33:10 +02:00
Jason Tedor	48984f647d	Mute failing CCR retention lease unfollow test This commit mutes a CCR retention lease unfollow test that is failing randomly, but frequently.	2019-02-20 09:47:17 -05:00
Jason Tedor	09ea3ccd16	Remove retention leases when unfollowing (#39088 ) This commit attempts to remove the retention leases on the leader shards when unfollowing an index. This is best effort, since the leader might not be available.	2019-02-20 07:06:49 -05:00
Andrei Stefan	c1018db404	SQL: enforce JDBC driver - ES server version parity (#38972 ) (cherry picked from commit 822a21f29491f295b22dacd04b747781a69ffa61)	2019-02-20 11:29:02 +02:00
Andrei Stefan	92206c8567	Added "validate.properties" property to JDBC's list of allowed properties. (#39050 ) This defaults to "true" (current behavior) and will throw an exception if there is a property that cannot be recognized. If "false", it will ignore anything unrecognizable. (cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)	2019-02-20 11:29:01 +02:00
Tim Vernum	4aa50ed348	Resolve concurrency with watcher trigger service (#39164 ) The watcher trigger service could attempt to modify the perWatchStats map simultaneously from multiple threads. This would cause the internal state to become inconsistent, in particular the count() method may return an incorrect value for the number of watches. This change replaces the implementation of the map with a ConcurrentHashMap so that its internal state remains consistent even when accessed from multiple threads. Backport of: #39092	2019-02-20 19:18:00 +11:00
Julie Tibshirani	f5b28ca69d	Enable test logging for TransformIntegrationTests#testSearchTransform. There is already fairly detailed debug logging in the watcher framework, which should hopefully help debug the failure. Relates to #37882.	2019-02-19 18:15:34 -08:00
Tal Levy	b5dbd1a027	AwaitsFix XPackUsageIT#testXPackCcrUsage. relates to #39126.	2019-02-19 13:28:46 -08:00
Benjamin Trent	109b6451fd	ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (#38822 ) (#39119 ) * ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs * Addressing pr feedback	2019-02-19 12:45:56 -06:00
Ioannis Kakavas	210f34f8e9	Remove BCryptTests (#39098 ) This test was added to verify that we fixed a specific behavior in Bcrypt and hasn't been running for almost 4 years now.	2019-02-19 18:12:18 +02:00
David Roberts	35e30b34f9	[ML] Stop the ML memory tracker before closing node (#39111 ) The ML memory tracker does searches against ML results and config indices. These searches can be asynchronous, and if they are running while the node is closing then they can cause problems for other components. This change adds a stop() method to the MlMemoryTracker that waits for in-flight searches to complete. Once stop() has returned the MlMemoryTracker will not kick off any new searches. The MlLifeCycleService now calls MlMemoryTracker.stop() before stopping the node. Fixes #37117	2019-02-19 15:12:40 +00:00
David Roberts	bbcdea43c5	[ML] Allow stop unassigned datafeed and relax unset upgrade mode wait (#39034 ) These two changes are interlinked. Before this change unsetting ML upgrade mode would wait for all datafeeds to be assigned and not waiting for their corresponding jobs to initialise. However, this could be inappropriate, if there was a reason other than upgrade mode why one job was unable to be assigned or slow to start up. Unsetting of upgrade mode would hang in this case. This change relaxes the condition for considering upgrade mode to be unset to simply that an assignment attempt has been made for each ML persistent task that did not fail because upgrade mode was enabled. Thus after unsetting upgrade mode there is no guarantee that every ML persistent task is assigned, just that each is not unassigned due to upgrade mode. In order to make setting upgrade mode work immediately after unsetting upgrade mode it was then also necessary to make it possible to stop a datafeed that was not assigned. There was no particularly good reason why this was not allowed in the past. It is trivial to stop an unassigned datafeed because it just involves removing the persistent task.	2019-02-19 14:07:10 +00:00
Martijn van Groningen	c8d59f6f0f	Fix shard follow task startup error handling (#39053 ) Prior to this commit, if during fetch leader / follower GCP a fatal error occurred, then the shard follow task was removed. This is unexpected, because if such an error occurs during the lifetime of shard follow task then replication is stopped and the fatal error flag is set. This allows the ccr stats api to report the fatal exception that has occurred (instead of the user grepping through the elasticsearch logs). This issue was found by a rare failure of the `FollowStatsIT#testFollowStatsApiIncludeShardFollowStatsWithRemovedFollowerIndex` test. Closes #38779	2019-02-19 08:54:02 +01:00
Ioannis Kakavas	59e9a0f4f4	Disable specific locales for tests in fips mode (#38938 ) * Disable specific locales for tests in fips mode The Bouncy Castle FIPS provider that we use for running our tests in fips mode has an issue with locale sensitive handling of Dates as described in https://github.com/bcgit/bc-java/issues/405 This causes certificate validation to fail if any given test that includes some form of certificate validation happens to run in one of the locales. This manifested earlier in #33081 which was handled insufficiently in #33299 This change ensures that the problematic 3 locales * th-TH * ja-JP-u-ca-japanese-x-lvariant-JP * th-TH-u-nu-thai-x-lvariant-TH will not be used when running our tests in a FIPS 140 JVM. It also reverts #33299	2019-02-19 08:46:08 +02:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Martijn van Groningen	ce412908ed	also check ccr stats api return empty response in ensureNoCcrTasks() If this fails then it returns more detailed information, for example fatal error.	2019-02-18 16:15:22 +01:00
Nhat Nguyen	2947ccf5c3	Add remote recovery to ShardFollowTaskReplicationTests (#39007 ) We simulate remote recovery in ShardFollowTaskReplicationTests by bootstrapping the follower with the safe commit of the leader. Relates #35975	2019-02-18 09:57:56 -05:00
Hendrik Muhs	1efb01661c	set minimum supported version (#39043 ) (#39051 ) change the minimum supported version of data frame transform	2019-02-18 15:41:25 +01:00
Martijn van Groningen	4fd1f8048d	Mute test #38949	2019-02-18 15:24:07 +01:00
David Roberts	b660d2cac6	[ML] More advanced post-test cleanup of ML indices (#39049 ) The .ml-annotations index is created asynchronously when some other ML index exists. This can interfere with the post-test index deletion, as the .ml-annotations index can be created after all other indices have been deleted. This change adds an ML specific post-test cleanup step that runs before the main cleanup and: 1. Checks if any ML indices exist 2. If so, waits for the .ml-annotations index to exist 3. Deletes the other ML indices found in step 1. 4. Calls the super class cleanup This means that by the time the main post-test index cleanup code runs: 1. The only ML index it has to delete will be the .ml-annotations index 2. No other ML indices will exist that could trigger recreation of the .ml-annotations index Fixes #38952	2019-02-18 14:16:03 +00:00
Martijn Laarman	9b4d96534b	Fix #38623 remove xpack namespace REST API (#38625 ) (#39036 ) * Fix #38623 remove xpack namespace REST API Except for xpack.usage and xpack.info API's, this moves the last remaining API's out of the xpack namespace * rename xpack api's inside the files as well * updated yaml tests references to xpack namespaces api's * update callsApi calls in the IT subclasses * make sure docs testing does not use xpack namespaced api's * fix leftover xpack namespaced method names in docs/build.gradle * found another leftover reference (cherry picked from commit ccb5d934363c37506b76119ac050a254fa80b5e7)	2019-02-18 12:40:07 +01:00
Martijn van Groningen	9aa542fb1b	Mute test Relates to #38779	2019-02-18 12:02:52 +01:00
Hendrik Muhs	4f662bd289	Add data frame feature (#38934 ) (#39029 ) The data frame plugin allows users to create feature indexes by pivoting a source index. In a nutshell this can be understood as reindex supporting aggregations or similar to the so called entity centric indexing. Full history is provided in: feature/data-frame-transforms	2019-02-18 11:07:29 +01:00
Martijn van Groningen	ed08bc3537	Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709 ) * During fetching remote mapping if remote client is missing then `NoSuchRemoteClusterException` was not handled. * When adding remote connection, check that it is really connected before continue-ing to run the tests. Relates to #38695	2019-02-18 09:41:44 +01:00
Nhat Nguyen	204480d818	Mute testRetentionLeaseIsRenewedDuringRecovery Tracked at #39011	2019-02-17 15:34:51 -05:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Tim Brooks	b1c1daa63f	Add get file chunk timeouts with listener timeouts (#38758 ) This commit adds a `ListenerTimeouts` class that will wrap a `ActionListener` in a listener with a timeout scheduled on the generic thread pool. If the timeout expires before the listener is completed, `onFailure` will be called with an `ElasticsearchTimeoutException`. Timeouts for the get ccr file chunk action are implemented using this functionality. Additionally, this commit attempts to fix #38027 by also blocking proxied get ccr file chunk actions. This test being un-muted is useful to verify the timeout functionality.	2019-02-16 10:56:03 -07:00
Jason Tedor	d80325f288	Mark fail over on follower test as awaits fix This test is failing since the introduction of recovery from remote. This commit marks this test as awaits fix.	2019-02-16 12:28:16 -05:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, then finally mark its seq_no as completed. If a flush occurs after step 1, but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Nhat Nguyen	20755e666c	Reduce global checkpoint sync interval in disruption tests (#38931 ) We verify seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due to a lagged global checkpoint on replicas. The global checkpoint on replicas is lagged because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync interval to 1s in the disruption tests. Closes #38318 Closes #36789	2019-02-15 21:04:20 -05:00
Jason Tedor	58551198d5	Address some CCR REST test case flakiness (#38975 ) The CCR REST tests that rely on these assertions are flaky. They are flaky since the introduction of recovery from the remote. The underlying problem is this: these tests are making assertions about the number of operations read by the shard following task. However, with recovery from remote, we no longer have guarantees that the assumptions these tests were relying on hold. Namely, these tests were assuming that the only way that a document could land in the follower index is via the shard following task. With recovery from remote, there is another way, which is via the files that are copied over during the recovery phase. Most of the time this will not be a problem because with the small number of documents that we are indexing in these tests, it is usually not the case that a flush would occur and so there would not be any documents in the files copied over. However, a flush can occur any time at which point all of the indexed documents could end up in a safe commit and copied over during recovery from remote. This commit modifies these assertions to ones that are not prone to this issue, yet still validate the health of the follower shard.	2019-02-15 16:01:02 -05:00
Martijn van Groningen	03b67b3ee1	Introduced class reuses follow parameter code between ShardFollowTasks (#38910 ) and AutoFollowPattern classes. The ImmutableFollowParameters is like the already existing FollowParameters, but all of its fields are final.	2019-02-15 18:26:15 +01:00
iverase	b19b778cbb	[CI] Muting method testFollowIndex in IndexFollowingIT Relates to #38949	2019-02-15 16:07:45 +01:00
Yogesh Gaikwad	36c274867e	Fix intermittent failure in ApiKeyIntegTests (#38627 ) (#38935 ) Few tests failed intermittently and most of the times due to invalidated or expired keys that were deleted were still reported in search results. This commit removes the test and adds enhancements to other tests testing different scenario's. When ExpiredApiKeysRemover is triggered, the tests did not await its termination thereby sometimes the results would be wrong for a search operation. DELETE_INTERVAL setting has been further reduced to 100ms so we can trigger ExpiredApiKeysRemover faster. Closes #38408	2019-02-15 23:01:35 +11:00

2301 Commits