OpenSearch

Commit Graph

Author	SHA1	Message	Date
Igor Motov	3d93011e32	Fix median calculation in MedianAbsoluteDeviationAggregatorTests (#38979 ) Fixes an error in median calculation in MedianAbsoluteDeviationAggregatorTests for odd number of sample points, which causes some rare test failures. Fixes #38937	2019-02-20 13:24:30 -05:00
Ioannis Kakavas	c783069804	Fix NPE on Stale Index in IndicesService(#39173 ) This is a backport of #38891 which closes #38845	2019-02-20 15:35:35 +02:00
David Turner	efffb3d5b7	Simplify calculation in AwarenessAllocationDecider (#38091 ) Today's calculation of the maximum number of shards per attribute is rather convoluted. This commit clarifies that it returns ceil(shardCount/numberOfAttributes).	2019-02-20 08:54:57 +00:00
Henning Andersen	00a26b9dd2	Blob store compression fix (#39073 ) Blob store compression was not enabled for some of the files in snapshots due to constructor accessing sub-class fields. Fixed to instead accept compress field as constructor param. Also fixed chunk size validation to work. Deprecated repositories.fs.compress setting as well to be able to unify in a future commit.	2019-02-20 09:24:41 +01:00
Hendrik Muhs	50b3858f7c	add version 6.6.2	2019-02-19 20:28:06 +01:00
David Turner	0a9574c9d4	Add some missing toString() implementations (#39124 ) Sometimes we turn objects into strings for logging or debugging using `toString()`, but the default implementation is often unhelpful. This change improves on this in two places I ran into recently.	2019-02-19 17:52:41 +00:00
Jason Tedor	fef9bdb23f	Allow retention lease operations under blocks (#39089 ) This commit allows manipulating retention leases under blocks.	2019-02-19 10:26:49 -05:00
Jason Tedor	12f6963456	Fix retention leases sync on recovery test This test had a bug. We attempt to allow only the primary to be allocated, to force all replicas to recover from the primary after we had set the state of the retention leases on the primary. However, in building the index settings, we were overwriting the settings that exclude the replicas from being allocated. This means that some of the replicas would end up assigned and rather than receive retention leases during recovery, they would be part of the replication group receiving retention leases as they are manipulated. Since retention lease renewals are only synced periodically, this means that the replica could be lagging a little behind in some cases leading to an assertion tripping in the test. This commit addresses this by ensuring that the replicas are indeed not allocated until after the retention leases are done being manipulated on the replica. We did this by not overwriting the exclude settings. Closes #39105	2019-02-19 09:07:33 -05:00
Alexander Reelsen	7f8a640363	Fix DateFormatters.parseMillis when no timezone is given (#39100 ) The parseMillis method was able to work on formats without timezones by falling back to UTC. The Date Formatter interface did not support this, as the calling code was using the `Instant.from` java time API. This switches over to an internal method which adds UTC as a timezone. Closes #39067	2019-02-19 14:12:22 +01:00
Jim Ferenczi	199155f5fb	Enforce Completion Context Limit (#38675 ) (#39075 ) This change adds a limit to the number of completion contexts that a completion field can define. Closes #32741	2019-02-19 08:52:24 +01:00
Albert Zaharovits	6bc88b00ec	Mute GatewayMetaStateTests.testAtomicityWithFailures (#39079 ) Mute test GatewayMetaStateTests.testAtomicityWithFailures	2019-02-19 00:25:45 +02:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Jason Tedor	d43ac8fe11	Include in log retention leases that failed to sync When retention leases fail to sync after an expiration check, we emit a log message about this. This commit adds the retention leases that failed to sync.	2019-02-18 15:08:08 -05:00
Jason Tedor	bbb61002ba	Add some logging related to retention lease syncing (#39066 ) When the background retention lease sync fires, we check and see if any retention leases are expired. If any did expire, we execute a full retention lease sync (write action). Since this is happening on a background thread, we do not block that thread waiting for success (it will simply try again when the timer elapses). However, we were swallowing exceptions that indicate failure. This commit addresses that by logging the failures. Additionally, we add some trace logging to the execution of syncing retention leases.	2019-02-18 15:02:31 -05:00
Henning Andersen	99b2bc3461	Fix potential race during TcpTransport close (#39031 ) Fixed two potential causes for leaked threads during tests: 1. When adding a channel to serverChannels, we add it under a monitor that we do not use when reading from it. This is potentially unsafe if there is no other happens-before relationship ensuring the safety of this. 2. Long-shot but if the thread pool was shutdown before entering this code, we would silently forget about closing server channels so added assert. Strengthened the locking to ensure that once we stop the transport, no new server channels can be made. Relates to CI failure issue: #37543	2019-02-18 19:13:23 +01:00
Alan Woodward	ab4d5f404f	Add overlapping, before, after filters to intervals query (#38999 ) Lucene recently added `overlapping`, `before` and `after` filters to the intervals package. This commit exposes them in elasticsearch.	2019-02-18 15:06:24 +00:00
Adrien Grand	45b17e8645	Don't close caches while there might still be in-flight requests. (#38958 ) Many of our index components use ref-counting so that in the event that a shard is closed while there are still ongoing requests, then the index reader and the store only effectively get closed when ongoing requests have finished. However we don't apply the same principle to the request and query caches, which might get closed while there are still in-flight requests. This commit adds ref-counting to `IndicesService` so that the caches and other components it maintains only get closed when all shards are effectively closed. Closes #37117	2019-02-18 13:59:58 +01:00
Martijn van Groningen	ed08bc3537	Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709 ) * During fetching remote mapping if remote client is missing then `NoSuchRemoteClusterException` was not handled. * When adding remote connection, check that it is really connected before continuing to run the tests. Relates to #38695	2019-02-18 09:41:44 +01:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Tim Brooks	b1c1daa63f	Add get file chunk timeouts with listener timeouts (#38758 ) This commit adds a `ListenerTimeouts` class that will wrap a `ActionListener` in a listener with a timeout scheduled on the generic thread pool. If the timeout expires before the listener is completed, `onFailure` will be called with an `ElasticsearchTimeoutException`. Timeouts for the get ccr file chunk action are implemented using this functionality. Additionally, this commit attempts to fix #38027 by also blocking proxied get ccr file chunk actions. This test being un-muted is useful to verify the timeout functionality.	2019-02-16 10:56:03 -07:00
Luca Cavanna	a1a49f201d	Tie break search shard iterator comparisons on cluster alias (#38853 ) `SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor. This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.	2019-02-16 09:41:03 +01:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, then finally mark its seq_no as completed. If a flush occurs after step 1, but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Nhat Nguyen	20755e666c	Reduce global checkpoint sync interval in disruption tests (#38931 ) We verify seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due to a lagged global checkpoint on replicas. The global checkpoint on replicas is lagged because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync interval to 1s in the disruption tests. Closes #38318 Closes #36789	2019-02-15 21:04:20 -05:00
Nhat Nguyen	a67b9f6d1f	Relax testStressMaybeFlushOrRollTranslogGeneration (#38918 ) The predicate shouldPeriodicallyFlush is determined by the uncommitted translog size and the local checkpoint. The uncommitted translog size depends on the local checkpoint. The condition shouldPeriodicallyFlush can be true twice in the test in the following scenario: 1. Index doc-0 and advances the local checkpoint to 0, the condition shouldPeriodicallyFlush remains false. 2. Index doc-1 and add it to translog, but the local checkpoint is not advanced yet (still 0). The condition shouldPeriodicallyFlush becomes true because the uncommitted translog size is 216bytes (2ops + gen-1 + gen-2) > 180bytes and the translog generation of the new index commit would advance from 1 to 2. > [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=0, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=2, > max_seq_no=1}] 3. The shouldPeriodicallyFlush becomes true again after the local checkpoint is advanced to 1 because the uncommitted translog size is 216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation of the new index commit would advance from 2 to 4. > [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=1, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=4, > max_seq_no=1}] We need to relax the assertion in this test to cover this situation. Closes #31629	2019-02-15 21:04:12 -05:00
Armin Braun	238425e5e7	Fix Issue with Concurrent Snapshot Init + Delete (#38518 ) * Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread * Closes #38489	2019-02-15 16:50:47 -08:00
Alan Woodward	176013e23c	Avoid double term construction in DfsPhase (#38716 ) DfsPhase captures terms used for scoring a query in order to build global term statistics across multiple shards for more accurate scoring. It currently does this by building the query's `Weight` and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()` for each collected term. This duplicates work, however, as the various `Weight` implementations will already have collected these statistics at construction time. This commit replaces this round-about way of collecting stats, instead using a delegating IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()` is called from the Weight. It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with a new method on `RescoreContext` that returns any queries used by the rescore implementation. The delegating IndexSearcher then collects term contexts and statistics in the same way described above for each Query.	2019-02-15 16:00:38 +00:00
Daniel Mitterdorfer	fcc7f553f5	Also mmap cfs files for hybridfs (#38940 ) (#38947 ) With this commit we add the `.cfs` file extension to the list of file types that are memory-mapped by hybridfs. `.cfs` files combine all files of a Lucene segment into a single file in order to save file handles. As this strategy is only used for "small" segments (less than 10% of the shard size), it is beneficial to memory-map them instead of accessing them via NIO. Relates #36668	2019-02-15 15:34:40 +01:00
David Turner	578514e892	Recover peers from translog, ignoring soft deletes (#38904 ) Today if soft deletes are enabled then we read the operations needed for peer recovery from Lucene. However we do not currently make any attempt to retain history in Lucene specifically for peer recoveries so we may discard it and fall back to a more expensive file-based recovery. Yet we still retain sufficient history in the translog to perform an operations-based peer recovery. In the long run we would like to fix this by retaining more history in Lucene, possibly using shard history retention leases (#37165). For now, however, this commit reverts to performing peer recoveries using the history retained in the translog regardless of whether soft deletes are enabled or not.	2019-02-15 10:45:15 +01:00
Henning Andersen	a211e51343	ShardBulkAction ignore primary response on primary (#38901 ) Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to replica and back to client. This was made in the past as an attempt to fix issues with conflicts after relocations where a bulk request would experience a closed shard half way through and thus have to retry on the new primary. It could then fail on its own update. With sequence numbers, this leads to an issue, since if a primary is demoted (network partitions), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from old to new primary and ensure that no operations are active while doing this. Above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.	2019-02-15 10:13:11 +01:00
Jason Tedor	00cb8d0be8	Mark coordinator test as awaits fix This test is failing frequently so this commit mutes it. Relates #38867	2019-02-14 12:43:31 -05:00
Lee Hinman	0c733c04be	Remove immediate operation retry after mapping update (#38873 ) Prior to this commit, when an indexing operation resulted in an `Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction immediately retries the indexing operation to see if it succeeds. In the event that it succeeds the context does not wait until the mapping update has propagated through the cluster state before finishing the indexing. In some of our tests we rely on mappings being available as soon as they've been introduced in a document that indexed correctly. By removing the immediate retry we always wait for this to be the case. Resolves #38428 Supersedes #38579 Relates to #38711	2019-02-14 09:31:08 -07:00
Christoph Büscher	6c5cec4ff4	Enable silent FollowersCheckerTest (#38851 ) One of the test methods wasn't run because it was private. Making this method public and fixing some issues around mocking the threadpool that otherwise would lead to an NPE.	2019-02-14 16:16:48 +01:00
Albert Zaharovits	6243a9797f	_cat/indices with Security, hide names when wildcard (#38824 ) This changes the output of the `_cat/indices` API with `Security` enabled. It is possible to only display the index name (and possibly the index health, depending on the request options) but not its stats (doc count, merges, size, etc). This is the case for closed indices which have index metadata in the cluster state but no associated shards, hence no shard stats. However, when `Security` is enabled, and the request contains wildcards, open indices without stats are a common occurrence. This is because the index names in the response table are picked up directly from the cluster state which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the stats data which is populated by the indices stats API which does go through the index name resolver. This is a bug, because it is circumventing `Security`'s function to hide unauthorized indices. This has been fixed by displaying the index names as they are resolved by the indices stats API. The outputs of these two APIs are now very similar: same index names, similar data but different format. Closes #37190	2019-02-14 15:09:17 +02:00
David Roberts	6ea483a663	Mute DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex Due to https://github.com/elastic/elasticsearch/issues/38845	2019-02-14 11:46:22 +00:00
Luca Cavanna	7456117019	[TEST] address testCollectNodes rare failure (#38559 ) #37767 changed the expected exception for "no such cluster" error from `IllegalStateException` to a dedicated `NoSuchRemoteClusterException`. An assertion in `testCollectNodes` needs to be updated accordingly.	2019-02-14 10:57:14 +01:00
Nhat Nguyen	5d22e45990	Copy retention leases when trim unsafe commits (#37995 ) When a primary shard is recovered from its store, we trim the last commit (when it's unsafe). If that primary crashes before the recovery completes, we will lose the committed retention leases because they are baked in the last commit. With this change, we copy the retention leases from the last commit to the safe commit when trimming unsafe commits. Relates #37165	2019-02-13 17:27:48 -05:00
Jason Tedor	062eea8fcc	Fix excessive increments in soft delete policy (#38813 ) In this case, we were incrementing the policy too much. This means on every iteration we actually keep increasing the minimum retained sequence number, even with leases in place. It was a bug from when the soft deletes policy had retention leases incorporated into it. This commit fixes this bug by ensuring we only increment in the proper places, and adds careful tests for the various situations.	2019-02-13 14:04:45 -05:00
Jake Landis	46bb663a09	Make 7.x like 6.7 user agent ecs, but default to true (#38828 ) Forward port of https://github.com/elastic/elasticsearch/pull/38757 This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true. 6.7: `ecs` : default `false` 7.x: `ecs` : default `true` 8.0: no option, but behaves as `true` * Revert "Ingest node - user agent, move device to an object (#38115)" This reverts commit `5b008a34aa`. * Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)" This reverts commit `cac6b8e06f`. * cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 Add ECS schema for user-agent ingest processor (#37727) * cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 Ingest node - user agent, move device to an object (#38115) (#38121) * cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes) Dep. check for ECS changes to User Agent processor (#38362) * make true the default for the ecs option, and update 7.0 references and tests	2019-02-13 10:28:01 -06:00
Przemyslaw Gomulka	7404882105	Fix line separators in JSON logging tests backport#38771 #38834 The hardcoded '\n' in a string will not work on Windows, where the line separator is different. System.lineSeparator() should be used to make it work on all platforms. Closes #38705. Backport of #38771.	2019-02-13 13:34:33 +01:00
Zachary Tong	57f69082fd	Disable cache on QueryProfilerIT (#38748 ) - Disables the request cache on the test, to prevent cached values from potentially interfering with test results - Changes the test to execute a single query, in hopes of making failures more reproducible Backport of #38583	2019-02-12 13:11:52 -05:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	4a5070dcfb	Use current term in initial leases in engine test (#38285 ) We need to use the current primary term instead of 1L for the initial retention leases; otherwise, the primary term of the committed retention leases won't match the current primary term if the retention leases never get updated.	2019-02-12 11:40:04 -05:00
Nhat Nguyen	eca5404572	Fix synchronization in LocalCheckpointTracker#contains (#38755 ) We are accessing the `CountedBitSet` in `LocalCheckpointTracker#contains` without proper synchronization. Relates #33871	2019-02-12 11:39:50 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when close engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release.	2019-02-12 11:39:35 -05:00
Tanguy Leroux	51d6b9ab31	Fix CloseWhileRelocatingShardsIT (#38728 )	2019-02-12 14:04:44 +01:00
Jason Tedor	bbc9aa9979	Introduce retention lease actions (#38756 ) This commit introduces actions for some common retention lease operations that clients need to be able to perform remotely. These actions include add/renew/remove.	2019-02-12 07:38:03 -05:00
Przemyslaw Gomulka	7e178aa4a7	Enable IndexActionTests and WatcherIndexingListenerTests Backport #38738. Fix tests to use clock in milliseconds precision in watcher code. Make sure the date comparison in string format is using same formatters. Some of the code was modified in #38514, possibly because of merge conflicts. Closes #38581. Backport #38738	2019-02-12 13:05:44 +01:00
Luca Cavanna	90fff54954	Tie break on cluster alias when merging shard search failures (#38715 ) A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters. This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution. Also the corresponding test has been split in two: with and without search shard target set to the failure. Closes #38672	2019-02-12 11:25:44 +01:00
Jason Tedor	c7cdd6a46a	Add dedicated retention lease exceptions (#38754 ) When a retention lease already exists on an add retention lease invocation, or a retention lease is not found on a renew retention lease invocation today we throw an illegal argument exception. This puts a burden on the caller to catch that specific exception and parse the message. This commit relieves the burden from the caller by adding dedicated exception types for these situations.	2019-02-12 00:32:09 -05:00
Jason Tedor	b97c74bbab	Enable removal of retention leases (#38751 ) This commit introduces the ability to remove retention leases. Explicit removal will be needed to manage retention leases used to increase the likelihood of operation-based recoveries syncing, and for consumers such as ILM.	2019-02-11 21:19:11 -05:00
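
Commit efffb3d5b7 above says the maximum number of shards per attribute is ceil(shardCount/numberOfAttributes). The following is a minimal sketch of that arithmetic only, not the code from AwarenessAllocationDecider; the class and method names are hypothetical and chosen for illustration.

```java
final class CeilDivisionSketch {

    // Hypothetical helper: returns ceil(shardCount / numberOfAttributes)
    // using integer arithmetic only, so no floating-point rounding is involved.
    static int maxShardsPerAttribute(int shardCount, int numberOfAttributes) {
        if (shardCount < 0) {
            throw new IllegalArgumentException("shardCount must not be negative");
        }
        if (numberOfAttributes <= 0) {
            throw new IllegalArgumentException("numberOfAttributes must be positive");
        }
        // Adding (divisor - 1) before dividing rounds any remainder up.
        return (shardCount + numberOfAttributes - 1) / numberOfAttributes;
    }

    public static void main(String[] args) {
        // 5 shard copies spread over 2 attribute values: at most 3 per value.
        System.out.println(maxShardsPerAttribute(5, 2));
    }
}
```

For example, five shard copies across two attribute values give a limit of three per value, while four copies across two values give exactly two.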

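Commit 3d93011e32 above fixes a median calculation that went wrong for an odd number of sample points. As a hedged illustration of the distinction involved (this is not the test code from MedianAbsoluteDeviationAggregatorTests, and the names below are hypothetical), a median must return the middle element for odd-sized samples and the mean of the two middle elements for even-sized ones:

```java
import java.util.Arrays;

final class MedianSketch {

    // Median of a non-empty sample; the input array is not modified.
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            // Odd count: the single middle element is the median.
            return sorted[mid];
        }
        // Even count: average the two middle elements.
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Median absolute deviation: the median of |x - median(sample)|.
    static double medianAbsoluteDeviation(double[] values) {
        double center = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - center);
        }
        return median(deviations);
    }
}
```

With the sample {1, 2, 3, 4, 100} the median is 3, the absolute deviations are {2, 1, 0, 1, 97}, and the median absolute deviation is 1.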

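Commit 7f8a640363 above describes falling back to UTC when a date string carries no timezone, because `Instant.from` cannot convert such a value on its own. The sketch below shows one way to express that fallback with the JDK's java.time API alone. It is an assumption-laden illustration, not the internal Elasticsearch method: the class name, the method body, and the choice of `DateTimeFormatter.ISO_DATE_TIME` are all picked for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

final class UtcFallbackSketch {

    // Hypothetical stand-in for a parseMillis-style method.
    static long parseMillis(String input) {
        // ISO_DATE_TIME accepts a local date-time with an optional offset/zone.
        TemporalAccessor parsed = DateTimeFormatter.ISO_DATE_TIME.parse(input);
        // zone() yields the parsed zone or offset, or null if neither was present.
        ZoneId zone = parsed.query(TemporalQueries.zone());
        if (zone == null) {
            // No timezone in the input: interpret the local date-time as UTC.
            return LocalDateTime.from(parsed).toInstant(ZoneOffset.UTC).toEpochMilli();
        }
        return ZonedDateTime.from(parsed).toInstant().toEpochMilli();
    }
}
```

Under these assumptions, `parseMillis("2019-02-19T14:12:22")` and `parseMillis("2019-02-19T14:12:22Z")` return the same value, whereas calling `Instant.from` directly on the first parsed value would throw a `DateTimeException`.
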
2586 Commits