OpenSearch

Commit Graph

Author	SHA1	Message	Date
Martijn van Groningen	24e478c58e	Fix test, more than one node may be connected. Relates to #37681	2019-02-26 10:40:09 +01:00
David Kyle	f7cba82c77	[ML] Reenable ml rolling upgrade tests (#39290 )	2019-02-26 08:51:59 +00:00
Ioannis Kakavas	7f999c43b3	[BACKPORT-7.x] Fix TokenBackwardsCompatibility tests (#39294 ) This change is a backport of #39252 - Fixes TokenBackwardsCompatibilityIT: Existing tests seemed to make the assumption that in the oneThirdUpgraded stage the master node will be on the old version and in the twoThirdsUpgraded stage, the master node will be one of the upgraded ones. However, there is no guarantee that the master node in any of the states will or will not be one of the upgraded ones. This class now tests: - That we can generate and consume tokens before we start the rolling upgrade. - That we can consume tokens generated in the old cluster during all the stages of the rolling upgrade. - That while on a mixed cluster, when/if the master node is upgraded, we can generate, consume and refresh a token - That after the rolling upgrade, we can consume a token generated in an old cluster and can invalidate it so that it can't be used any more. - Ensures that during the rolling upgrade, the upgraded nodes have the same configuration as the old nodes. Specifically that the file realm we use is explicitly named `file1`. This is needed because while attempting to refresh a token in a mixed cluster we might create a token hitting an old node and attempt to refresh it hitting a new node. If the file realm name is not the same, the refresh will be seen as being made by a "different" client, and will, thus, fail. - Renames the Authentication variable we check while refreshing a token to be clientAuth in order to make the code more readable. Some of the above were possibly causing the flakiness of #37379	2019-02-26 10:42:36 +02:00
Martijn van Groningen	b159cc51c0	Ensure remote connection established and clean remote connection prior to leader cluster restart Relates to #37681	2019-02-26 09:06:30 +01:00
Nhat Nguyen	e9dda75834	Enable soft-deletes by default for 7.0+ indices (#38929 ) Today when users upgrade to 7.0, existing indices will automatically switch to soft-deletes without an opt-out option. With this change, we only enable soft-deletes by default for new indices. Relates #36141	2019-02-25 17:54:29 -05:00
Jason Tedor	a6c0166d68	Renew retention leases while following (#39335 ) This commit is the final piece of the integration of CCR with retention leases. Namely, we periodically renew retention leases and advance the retaining sequence number while following.	2019-02-25 17:14:19 -05:00
Lee Hinman	7b8178c839	Remove Hipchat support from Watcher (#39374 ) * Remove Hipchat support from Watcher (#39199) Hipchat has been shut down and has previously been deprecated in Watcher (#39160), therefore we should remove support for these actions. * Add migrate note	2019-02-25 15:08:46 -07:00
Benjamin Trent	926291aac8	[DATA-FRAME] Sort `GET` transforms and stats by ID (#39365 ) (#39369 ) * [Data-Frame] Sort `GET` transforms and stats by ID * removing unused import	2019-02-25 14:22:41 -06:00
Nhat Nguyen	0f29b89655	Unmute FollowerFailOverIT#testFailOverOnFollower Relates #38633	2019-02-25 14:44:44 -05:00
Hendrik Muhs	1897883adc	[ML-DataFrame] Dataframe access headers (#39289 ) (#39368 ) store user headers as part of the config and run transform as user	2019-02-25 19:08:26 +01:00
Benjamin Trent	3d49523726	[DATA-FRAME] adds specs and yml tests for existing endpoints (#39326 ) (#39363 ) * [DATA-FRAME] adds specs and yml tests for existing endpoints * removing bad URL, adding test for _all	2019-02-25 11:19:49 -06:00
Nhat Nguyen	48219112e3	Do not wait for advancement of checkpoint in recovery (#39006 ) With this change, we won't wait for the local checkpoint to advance to the max_seq_no before starting phase2 of peer-recovery. We also remove the sequence number range check in peer-recovery. We can safely do these thanks to Yannick's finding. The replication group to be used is currently sampled after indexing into the primary (see `ReplicationOperation` class). This means that when initiating tracking of a new replica, we have to consider the following two cases: - There are operations for which the replication group has not been sampled yet. As we initiated the new replica as tracking, we know that those operations will be replicated to the new replica and follow the typical replication group semantics (e.g. marked as stale when unavailable). - There are operations for which the replication group has already been sampled. These operations will not be sent to the new replica. However, we know that those operations are already indexed into Lucene and the translog on the primary, as the sampling is happening after that. This means that by taking a snapshot of Lucene or the translog, we will be getting those ops as well. What we cannot guarantee anymore is that all ops up to `endingSeqNo` are available in the snapshot (i.e. also see comment in `RecoverySourceHandler` saying `We need to wait for all operations up to the current max to complete, otherwise we can not guarantee that all operations in the required range will be available for replaying from the translog of the source.`). This is not needed, though, as we can no longer guarantee that max seq no == local checkpoint. Relates #39000 Closes #38949 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2019-02-25 12:10:14 -05:00
Martijn van Groningen	6f69ef165b	Protect against the leader index being removed (#39351 ) when dealing with TimeoutException The `IndexFollowingIT#testDeleteLeaderIndex()` test failed, because an NPE was captured as fatal error instead of an IndexNotFoundException. Closes #39308	2019-02-25 13:40:10 +01:00
Costin Leau	9d97f3289d	Mute CcrRollingUpgradeIT#testCannotFollowLeaderInUpgradedCluster See #39355	2019-02-25 14:06:27 +02:00
Martijn van Groningen	9bf0538878	Wait for index following is active for auto followed index (#39175 ) before executing pause follow api: https://github.com/elastic/elasticsearch/issues/39126#issuecomment-465512002 Closes #39126	2019-02-25 10:44:20 +01:00
Yogesh Gaikwad	7021e1bd3b	Add await busy loop for SimpleKdcLdapServer initialization (#39221 ) (#39342 ) There have been intermittent failures where either LDAP server could not be started or KDC server could not be started causing failures during test runs. `KdcNetwork` class from Apache kerby project does not set reuse address to `true` on the socket so if the port that we found to be free is in `TIME_WAIT` state it may fail to bind. As this is an internal class for kerby, I could not find a way to extend. This commit adds a retry loop for initialization. It will keep trying in an await busy loop and fail after 10 seconds if not initialized. Closes #35982	2019-02-25 20:35:08 +11:00
Jason Tedor	6e06f82106	Fix failing CCR retention lease test Finally! This commit should fix the issues with the CCR retention lease that has been plaguing build failures. The issue here is that we are trying to prevent the clear session requests from being executed until after we have been able to validate that retention leases are being renewed. However, we were only blocking the clear session requests but not blocking them when they are proxied through another node. This commit addresses that. Relates #39268	2019-02-22 20:43:39 -05:00
Jason Tedor	2d4c98a991	Change sort order of shard stats in CCR test This commit changes the sort order of shard stats that are collected in CCR retention lease integration tests. This change is done so that primaries appear first in sort order.	2019-02-22 18:17:28 -05:00
Jason Tedor	e569cf8324	Address failing CCR retention lease test This test fails rarely but it is flaky in its current form. The problem here is that we lack a guarantee on the retention leases having been synced to all shard copies. We need to sleep long enough to ensure that that occurs, and then we can sample the retention leases, possibly sleep again (we usually will not have to since the first sleep will have been long enough to allow a sync and a renewal to happen, if one was going to happen), and then sample the retention leases for comparison. Closes #39331	2019-02-22 18:15:10 -05:00
Jason Tedor	e4e96b8181	Fix shard logged in background lease renewal The shard logged here is the leader shard but it should be the follower shard since this background retention lease renewal is happening on the follower side. This commit fixes that.	2019-02-22 17:32:51 -05:00
Jason Tedor	feb25c71a0	Simplify mocking in CCR retention lease tests This commit simplifies the use of transport mocking in the CCR retention lease integration tests. Instead of adding a send rule between nodes, we add a default send rule. This greatly simplifies the code here, and speeds the test up a little bit too.	2019-02-22 17:24:12 -05:00
Tim Brooks	931953a3ee	Ensure index commit released when testing timeouts (#39273 ) This fixes #39245. Currently it is possible in this test that the clear session call times-out. This means that the index commit will not be released and there will be an assertion triggered in the test teardown. This commit ensures that we wipe the leader index in the test to avoid this assertion. It is okay if the clear session call times-out in normal usage. This scenario is unavoidable due to potential network issues. We have a local timeout on the leader to clean it up when this scenario happens.	2019-02-22 11:14:42 -07:00
Benjamin Trent	3262d6c917	[ML-DataFrame] Add _preview endpoint (#38924 ) (#39319 ) * [DATA-FRAME] add preview endpoint * adjusting preview tests and fixing parser * adjusting preview transport * remove unused import * adjusting test * Addressing PR comments * Fixing failing test and adjusting for pr comments * fixing integration test	2019-02-22 10:55:38 -06:00
David Roberts	4f2bd238d2	[ML] Increase datafeed integration test timeout for slow machines (#39311 ) The assertBusy() that waits the default 10 seconds for a datafeed to complete very occasionally times out on slow machines. This commit increases the timeout to 60 seconds. It will almost never actually take this long, but it's better to have a timeout that will prevent time being wasted looking at spurious test failures.	2019-02-22 15:35:32 +00:00
Gordon Brown	2ad1e6aedc	Fix testCannotShrinkLeaderIndex (#38529 ) This test should no longer pass when the functionality it is intended to test is broken, as it now indexes a number of documents and verifies that the index is staying on the same step until after indexing and replication of those documents is finished. This prevents the test from passing if the leader index progresses in its lifecycle during that time.	2019-02-22 08:03:36 -07:00
Dimitris Athanasiou	1c6818fe74	[ML] Improve DeleteExpiredDataIT failure message (#39298 ) (#39310 ) This test failed once in a very long time with the assertion that there is no document for the `non_existing_job` in the state index. I could not see how that is possible and I cannot reproduce. With this commit the failure message will reveal some examples of the left behind docs which might shed a light about what could go wrong.	2019-02-22 16:15:11 +02:00
Daniel Mitterdorfer	9fea21aca5	Remove ExceptionsHelper#detailedMessage in tests (#37921 ) (#39297 ) With this commit we remove all usages of the deprecated method `ExceptionsHelper#detailedMessage` in tests. We do not address production code here but rather in dedicated follow-up PRs to keep the individual changes manageable. Relates #19069	2019-02-22 14:03:29 +01:00
Ioannis Kakavas	401226fc90	Mute rolling upgrade watcher CRUD tests (#39293 ) This fails on old_cluster but mixed_cluster and upgraded_cluster depend on watches set in old_cluster so that can't be muted on its own Relates: https://github.com/elastic/elasticsearch/issues/33185	2019-02-22 13:27:45 +02:00
Lee Hinman	3401afdf35	Add ILM plugin for MonitoringIT tests (#39271 ) Without this, when creating the watch history indices they complain about there being no such setting as `index.lifecycle.name`. Relates to #38805	2019-02-21 21:45:43 -07:00
Julie Tibshirani	29243f7001	Avoid using TimeWarp in TransformIntegrationTests. (#39277 ) This commit makes `TransformIntegrationTests` into a standard integration test, as opposed to using `TimeWarp`, which registers the mock component `ScheduleEngineTriggerMock` to trigger watches. The simplification may help with flakiness we've observed with `TimeWarp`, as in #37882.	2019-02-21 18:02:44 -08:00
Jay Modi	697911c31d	Fixed missed stopping of SchedulerEngine (#39193 ) The SchedulerEngine is used in several places in our code and not all of these usages properly stopped the SchedulerEngine, which could lead to test failures due to leaked threads from the SchedulerEngine. This change adds stopping to these usages in order to avoid the thread leaks that cause CI failures and noise. Closes #38875	2019-02-21 14:31:33 -07:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Benjamin Trent	34d06471c3	[CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (#39270 )	2019-02-21 14:17:03 -06:00
Benjamin Trent	8072543428	Muting AutoFollowIT.testAutoFollowManyIndices (#39265 )	2019-02-21 13:43:09 -06:00
Jason Tedor	b9f8be6968	Clarify the use of sleep in CCR test Sleeps in tests smell funny, and we try to avoid them to the extent possible. We are using a small one in a CCR test. This commit clarifies the purpose of that sleep by adding a comment explaining it. We also removed a hard-coded value from the test, that if we ever modified the value higher up where it was set, we could end up forgetting to change the value here. Now we ensure that these would move in lock step if we ever maintain them later.	2019-02-21 14:05:48 -05:00
Jason Tedor	719c38a36d	Fix CCR tests that manipulate transport requests We have some CCR tests where we use mock transport send rules to control the behavior that we desire in these tests. Namely, we want to simulate an exception being thrown on the leader side, or a variety of other situations. These send rules were put in place between the data nodes on each side. However, it might not be the case that these requests are being sent between data nodes. For example, a request that is handled on a non-data master node would not be sent from a data node. And it might not be the case that the request is sent to a data node, as it could be proxied through a non-data coordinating node. This commit addresses this by putting these send rules in places between all nodes on each side. Closes #39011 Closes #39201	2019-02-21 12:26:09 -05:00
Tanguy Leroux	fc896e452c	ReadOnlyEngine should update translog recovery state information (#39238 ) (#39251 ) `ReadOnlyEngine` never recovers operations from translog and never updates translog information in the index shard's recovery state, even though the recovery state goes through the `TRANSLOG` stage during the recovery. It means that recovery information for frozen shards indicates an unknown number of recovered translog ops in the Recovery APIs (translog_ops: `-1` and translog_ops_percent: `-1.0%`) and this is confusing. This commit changes the `recoverFromTranslog()` method in `ReadOnlyEngine` so that it always recovers from an empty translog snapshot, allowing the recovery state translog information to be correctly updated. Related to #33888	2019-02-21 18:08:06 +01:00
Martijn van Groningen	f40139c403	Change ShardFollowTask to reuse common serialization logic (#39094 ) Initially in #38910, ShardFollowTask was reusing ImmutableFollowParameters' serialization logic. After merging, bwc tests failed sometimes and the binary serialization that ShardFollowTask was originally using was added back. ImmutableFollowParameters is using optional fields (optional vint) while ShardFollowTask was not (vint).	2019-02-21 09:32:33 +01:00
Nhat Nguyen	a96df5d209	Reduce refresh when lookup term in FollowingEngine (#39184 ) Today we always refresh when looking up the primary term in FollowingEngine. This is not necessary for we can simply return none for operations before the global checkpoint.	2019-02-20 19:21:00 -05:00
Nhat Nguyen	cdec11c4eb	Relax history check in ShardFollowTaskReplicationTests (#39162 ) The follower won't always have the same history as the leader for its soft-deletes retention can be different. However, if some operation exists on the history of the follower, then the same operation must exist on the leader. This change relaxes the history check in ShardFollowTaskReplicationTests. Closes #39093	2019-02-20 19:21:00 -05:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Mark Vieira	24ac9da276	Mute CCR retention test that is consistently failing locally and in CI	2019-02-20 11:57:46 -08:00
Jay Modi	af451459a5	Fix failures in SessionFactoryLoadBalancingTests (#39154 ) This change aims to fix failures in the session factory load balancing tests that mock failure scenarios. For these tests, we randomly shut down ldap servers and bind a client socket to the port they were listening on. Unfortunately, we would occasionally encounter failures in these tests where a socket was already in use and/or the port we expected to connect to was wrong and in fact was to one of the ldap instances that should have been shut down. The failures are caused by the behavior of certain operating systems when it comes to binding ports and wildcard addresses. It is possible for a separate application to be bound to a wildcard address and still allow our code to bind to that port on a specific address. So when we close the server socket and open the client socket, we are still able to establish a connection since the other application is already listening on that port on a wildcard address. Another variant is that the os will allow a wildcard bind of a server socket when there is already an application listening on that port for a specific address. In order to do our best to prevent failures in these scenarios, this change does the following: 1. Binds a client socket to all addresses in an awaitBusy 2. Adds assumption that we could bind all valid addresses 3. In the case that we still establish a connection to an address that we should not be able to, try to bind and expect a failure of not being connected Closes #32190	2019-02-20 11:38:26 -07:00
Jason Tedor	90b1b36f50	Add cleanup logic to CCR retention lease test This commit adds some logic to remove the mock transport rules at the end of a CCR retention lease test.	2019-02-20 13:20:07 -05:00
Jason Tedor	cfd7c77b64	Fix broken CCR retention lease unfollow test This commit fixes a broken CCR retention lease unfollow test. The problem with the test is that the random subset of shards that we picked to disrupt would not necessarily overlap with the actual shards in use. We could take a non-empty subset of [0, 3] (e.g., { 2 }) when the only shard IDs in use were [0, 1]. This commit fixes this by taking into account the number of shards in use in the test. With this change, we also take measure to ensure that a successful branch is tested more frequently than would otherwise be the case. On that branch, we want to sometimes pretend that the retention lease is already removed. The randomness here was also sometimes selecting a subset of shards that did not overlap with the shards actually in use during the test. While this does not break the test, it is confusing and reduces the amount of coverage of that branch. Relates #39185	2019-02-20 12:09:28 -05:00
Albert Zaharovits	af8ef1bb98	Do not create the missing index when invoking getRole (#39039 ) In most of the places we avoid creating the `.security` index (or updating the mapping) for read/search operations. This is more of a nit for the case of the getRole call, that fixes a possible mapping update during a get role, and removes a dead if branch about creating the `.security` index.	2019-02-20 17:33:10 +02:00
Jason Tedor	48984f647d	Mute failing CCR retention lease unfollow test This commit mutes a CCR retention lease unfollow test that is failing randomly, but frequently.	2019-02-20 09:47:17 -05:00
Jason Tedor	09ea3ccd16	Remove retention leases when unfollowing (#39088 ) This commit attempts to remove the retention leases on the leader shards when unfollowing an index. This is best effort, since the leader might not be available.	2019-02-20 07:06:49 -05:00
Andrei Stefan	c1018db404	SQL: enforce JDBC driver - ES server version parity (#38972 ) (cherry picked from commit 822a21f29491f295b22dacd04b747781a69ffa61)	2019-02-20 11:29:02 +02:00
Andrei Stefan	92206c8567	Added "validate.properties" property to JDBC's list of allowed properties. (#39050 ) This defaults to "true" (current behavior) and will throw an exception if there is a property that cannot be recognized. If "false", it will ignore anything unrecognizable. (cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)	2019-02-20 11:29:01 +02:00


2714 Commits