OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alan Woodward	df124f32db	Refactor control flow in TransportAnalyzeAction (#42801 ) The control flow in TransportAnalyzeAction is currently spread across two large methods, and is quite difficult to follow. This commit tidies things up a bit, to make it clearer when we use pre-defined analyzers and when we use custom built ones.	2019-06-04 14:52:46 +01:00
Yu	428beabc49	Remove "template" field in IndexTemplateMetaData (#42099 ) Remove "template" field from XContent parsing in IndexTemplateMetaData	2019-06-03 12:43:11 -05:00
Armin Braun	00db9c1a2f	Make Connection Future Err. Handling more Resilient (#42781 ) (#42804 ) * There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener * Relates #42350	2019-06-03 19:29:36 +02:00
David Turner	df0f0b3d40	Rename autoMinMasterNodes to autoManageMasterNodes (#42789 ) Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its broader meaning since 7.0. Backport of the relevant part of #42700 to `7.x`.	2019-06-03 12:12:07 +01:00
Alan Woodward	2129d06643	Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197 ) This commit clones the existing AnalyzeRequest/AnalyzeResponse classes to the high-level rest client, and adjusts request converters to use these new classes. This is a prerequisite to removing the Streamable interface from the internal server version of these classes.	2019-06-03 09:46:36 +01:00
Alan Woodward	d0da30e5f4	Return NO_INTERVALS rather than null from empty TokenStream (#42750 ) IntervalBuilder#analyzeText will currently return null if it is passed an empty TokenStream, which can lead to a confusing NullPointerException later on during querying. This commit changes the code to return NO_INTERVALS instead. Fixes #42587	2019-05-31 17:45:57 +01:00
Jason Tedor	61c6a26b31	Remove locale-dependent string checking We were checking if an exception was caused by a specific reason "Not a directory". Alas, this reason is locale-dependent and can fail on systems that are not set to en_US.UTF-8. This commit addresses this by deriving what the locale-dependent error message would be and using that for comparison with the actual exception thrown. Relates #41689	2019-05-31 12:08:38 -04:00
Jason Tedor	371cb9a8ce	Remove Log4j 1.2 API as a dependency (#42702 ) We had this as a dependency for legacy dependencies that still needed the Log4j 1.2 API. This appears to no longer be necessary, so this commit removes this artifact as a dependency. To remove this dependency, we had to fix a few places where we were accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since both APIs were on the compile-time classpath). Finally, we can remove our custom Netty logger factory. This was needed when we were on Log4j 1.2 and handled logging in our own unique way. When we migrated to Log4j 2 we could have dropped this dependency. However, even then Netty would still pick up Log4j 1.2 since it was on the classpath, thus the advantage to removing this as a dependency now.	2019-05-30 16:08:07 -04:00
David Turner	d14799f0a5	Prevent merging nodes' data paths (#42665 ) Today Elasticsearch does not prevent you from reconfiguring a node's `path.data` to point to data paths that previously belonged to more than one node. There's no good reason to be able to do this, and the consequences can be quietly disastrous. Furthermore, #42489 might result in a user trying to split up a previously-shared collection of data paths by hand and there's definitely scope for mixing the paths up across nodes when doing this. This change adds a check during startup to ensure that each data path belongs to the same node.	2019-05-30 18:08:55 +01:00
Marios Trivyzas	ce30afcd01	Deprecate CommonTermsQuery and cutoff_frequency (#42619 ) (#42691 ) Since the max_score optimization landed in Elasticsearch 7, the CommonTermsQuery is redundant and slower. Moreover the cutoff_frequency parameter for MatchQuery and MultiMatchQuery is redundant. Relates to #27096 (cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)	2019-05-30 18:04:47 +02:00
David Turner	86b1a07887	Log leader and handshake failures by default (#42342 ) Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log anything above `DEBUG` level. However there are some situations where it is appropriate for them to log at a higher level: - if the low-level handshake succeeds but the high-level one fails then this indicates a config error that the user should resolve, and the exception will help them to do so. - if leader checks fail repeatedly then we restart discovery, and the exception will help to determine what went wrong. Resolves #42153	2019-05-30 08:14:19 +01:00
Igor Motov	d2f9ccbe18	Geo: Refactor libs/geo parsers (#42549 ) Refactors the WKT and GeoJSON parsers from a utility class into instantiable objects. This is a preliminary step in preparation for moving coordinate validators out of Geometry constructors. This should allow us to make validators pluggable.	2019-05-29 20:07:27 -04:00
Henning Andersen	53f5d313cd	Use correct global checkpoint sync interval (#42642 ) Disruption test cases need to use a lower checkpoint sync interval, since they verify sequence numbers after the test and wait at most 10 seconds for them to stabilize. Closes #42637	2019-05-29 08:15:53 +02:00
Adrien Grand	38f9e24411	Add 7.1.2 version constant. (#42648 ) Relates to #42635	2019-05-28 23:14:10 +02:00
Jim Ferenczi	267e5a1110	fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219 )	2019-05-28 22:12:16 +02:00
Armin Braun	6166fed6f1	Fix BulkProcessorRetryIT (#41700 ) (#42618 ) * Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active * Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well	2019-05-28 17:58:00 +02:00
Vigya Sharma	130c832e10	Validate routing commands using updated routing state (#42066 ) When multiple commands are called in sequence, fetch shards from mutable, up-to-date routing nodes to ensure each command's changes are visible to subsequent commands. This addresses an issue uncovered during work on #41050.	2019-05-28 17:01:14 +02:00
David Turner	c21745c8ab	Avoid loading retention leases while writing them (#42620 ) Resolves #41430.	2019-05-28 15:27:06 +01:00
Yannick Welsch	1e0b0f640b	Fix compilation Follow-up to `5598647922`	2019-05-28 13:56:36 +02:00
Yannick Welsch	5598647922	Reset state recovery after successful recovery (#42576 ) The problem this commit addresses is that state recovery is not reset on a node that then becomes master with a cluster state that has a state not recovered flag in it. The situation that was observed in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains), and when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state (with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3 and, as it was previously leader in term 1 and successfully completed state recovery, never retries state recovery in term 3. Closes #39172	2019-05-28 13:46:10 +02:00
David Turner	746a2f41fd	Remove PRE_60_NODE_CHECKPOINT (#42531 ) This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing with 5.x nodes' lack of sequence number support. Backport of #42527	2019-05-28 12:25:53 +01:00
Armin Braun	00d665540a	Make unwrapCorrupt Check Suppressed Ex. (#41889 ) (#42605 ) * Make unwrapCorrupt Check Suppressed Ex. (#41889) * As discussed in #24800 we want to check for suppressed corruption indicating exceptions here as well to more reliably categorize corruption related exceptions * Closes #24800, #41201	2019-05-28 12:44:40 +02:00
Daniel Mitterdorfer	adb3574af8	Mute NodeTests (#42615 ) Relates #42577 Relates #42614	2019-05-28 12:25:18 +02:00
Armin Braun	116b050cc6	Cleanup Bulk Delete Exception Logging (#41693 ) (#42606 ) * Cleanup Bulk Delete Exception Logging * Follow up to #41368 * Collect all failed blob deletes and add them to the exception message * Remove logging of blob name list from caller exception logging	2019-05-28 11:00:28 +02:00
Nhat Nguyen	de6be819d6	Allocate to data-only nodes in ReopenWhileClosingIT (#42560 ) If all primary shards are allocated on the master node, then the verifying before close step will never interact with mock transport service. This change prefers to allocate shards on data-only nodes. Closes #39757	2019-05-27 17:32:06 -04:00
Armin Braun	a94d24ae5a	Fix RareClusterStateIT (#42430 ) (#42580 ) * It looks like we might be cancelling a previous publication instead of the one triggered by the given request with a very low likelihood. * Fixed by adding a wait for no in-progress publications * Also added debug logging that would've identified this problem * Closes #36813	2019-05-27 13:57:17 +02:00
Armin Braun	c4f44024af	Remove Delete Method from BlobStore (#41619 ) (#42574 ) * Remove Delete Method from BlobStore (#41619) * The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either	2019-05-27 12:24:20 +02:00
Armin Braun	bb7e8eb2fd	Introduce ShardState Enum + Slight Cleanup SnapshotsInProgress (#41940 ) (#42573 ) * Added separate enum for the state of each shard, it was really confusing that we used the same enum for the state of the snapshot overall and the state of each individual shard * relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150 * Shortened some obvious spots in equals method and saved a few lines via `computeIfAbsent` to make up for adding 50 new lines to this class	2019-05-27 12:08:45 +02:00
Armin Braun	7b4d1ac352	Remove Obsolete BwC Logic from BlobStoreRepository (#42193 ) (#42571 ) * Remove Obsolete BwC Logic from BlobStoreRepository * We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here * Some other minor+obvious cleanups	2019-05-27 11:47:04 +02:00
Armin Braun	c7448b12e1	Cleanup Redundant BlobStoreFormat Class (#42195 ) (#42570 ) * No need to have an abstract class here when there's only a single impl.	2019-05-27 11:28:50 +02:00
Armin Braun	49767fc1e9	Some Cleanup in o.e.gateway Package (#42108 ) (#42568 ) * Removing obvious dead code * Removing redundant listener interface	2019-05-27 11:28:12 +02:00
Armin Braun	a5ca20a250	Some Cleanup in o.e.i.engine (#42278 ) (#42566 ) * Some Cleanup in o.e.i.engine * Remove dead code and parameters * Reduce visibility in some obvious spots * Add missing `assert`s (not that important here since the methods themselves will probably be dead-code eliminated) but still	2019-05-27 11:04:54 +02:00
Martijn van Groningen	e591d30918	fixed test compile issue	2019-05-27 10:17:00 +02:00
Martijn van Groningen	48a71459c0	Improve how internal representation of pipelines are updated (#42257 ) If a single pipeline is updated then the internal representation of all pipelines was updated. With this change, only the internal representation of the pipelines that have been modified will be updated. Prior to this change the IngestMetadata of the previous and current cluster state was used to determine whether the internal representation of pipelines should be updated. If applying the previous cluster state change failed then subsequent cluster state changes that have no changes to IngestMetadata will not attempt to update the internal representation of the pipelines. This commit changes how the IngestService updates the internal representation: it keeps track of the underlying configuration and compares it against the new IngestMetadata to detect whether a pipeline configuration has changed; if so, the internal pipeline representation is updated.	2019-05-27 10:01:15 +02:00
Nhat Nguyen	85e60850af	Add debug log for retention leases (#42557 ) We need more information to understand why CcrRetentionLeaseIT is failing. This commit adds some debug log to retention leases and enables them in CcrRetentionLeaseIT.	2019-05-26 16:04:47 -04:00
Tanguy Leroux	6bec876682	Improve Close Index Response (#39687 ) This changes the `CloseIndexResponse` so that it reports closing result for each index. Shard failures or exception are also reported per index, and the global acknowledgment flag is computed from the index results only. The response looks like: ``` { "acknowledged" : true, "shards_acknowledged" : true, "indices" : { "docs" : { "closed" : true } } } ``` The response reports shard failures like: ``` { "acknowledged" : false, "shards_acknowledged" : false, "indices" : { "docs-1" : { "closed" : true }, "docs-2" : { "closed" : false, "shards" : { "1" : { "failures" : [ { "shard" : 1, "index" : "docs-2", "status" : "BAD_REQUEST", "reason" : { "type" : "index_closed_exception", "reason" : "closed", "index_uuid" : "JFmQwr_aSPiZbkAH_KEF7A", "index" : "docs-2" } } ] } } }, "docs-3" : { "closed" : true } } } ``` Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2019-05-24 21:57:55 -04:00
Julie Tibshirani	3a6c2525ca	Deprecate support for chained multi-fields. (#42330 ) This PR contains a straight backport of #41926, and also updates the migration documentation and deprecation info API for 7.x.	2019-05-24 15:55:06 -07:00
Jason Tedor	f2cfd09289	Remove renewal in retention lease recovery test (#42536 ) This commit removes the act of renewing some retention leases during a retention lease recovery test. Having renewal does not add anything extra to this test, but does allow for some situations where the test can fail spuriously (i.e., in a way that does not indicate that production code is broken).	2019-05-24 17:40:59 -05:00
Nhat Nguyen	74d771d8f6	Adjust load SplitIndexIT#testSplitIndexPrimaryTerm (#42477 ) SplitIndexIT#testSplitIndexPrimaryTerm sometimes times out due to relocating many shards. This change adjusts the load and increases the timeout.	2019-05-24 15:47:29 -04:00
Nhat Nguyen	02739d038c	Mute accounting circuit breaker check after test (#42448 ) If we close an engine while a refresh is happening, then we might leak refCount of some SegmentReaders. We need to skip the ram accounting circuit breaker check until we have a new Lucene snapshot which includes the fix for LUCENE-8809. This also adds a test to the engine but leaves it muted so we won't forget to reenable this check. Closes #30290	2019-05-24 15:42:12 -04:00
Nhat Nguyen	329d1307a5	Add test to verify force primary allocation on closed indices (#42458 ) This change adds a test verifying that we can force primary allocation on closed indices.	2019-05-24 17:23:58 +02:00
Henning Andersen	075fd2a0ac	Shard CLI tool always check shards (#41480 ) The shard CLI tool would not do anything if a corruption marker was not present. But a corruption marker is only added if a corruption is detected during indexing/writing, not if a search or other read fails. Changed the tool to always check shards regardless of corruption marker presence. Related to #41298	2019-05-24 16:49:37 +02:00
Marios Trivyzas	523b5bfdb5	Fix sorting on nested field with unmapped (#42451 ) Previously sorting on a missing nested field would fail with an Exception: `[nested_field] failed to find nested object under path [nested_path]` despite `unmapped_type` being set on the query. Fixes: #33644 (cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)	2019-05-24 15:47:41 +02:00
Christoph Büscher	12d5642e93	Small internal AnalysisRegistry changes (#42500 ) Some internal refactorings to the AnalysisRegistry, spin-off from #40782.	2019-05-24 15:27:35 +02:00
David Turner	a5b6ed8d1e	Remove AwaitsFix of #41967 following #42504	2019-05-24 14:26:49 +01:00
David Turner	4d02ca1633	Drain master task queue when stabilising (#42504 ) Today the default stabilisation time is calculated on the assumption that the elected master has no pending tasks to process when it is elected, but this is not a safe assumption to make. This can result in a cluster reaching the end of its stabilisation time without having stabilised. Furthermore in #36943 we increased the probability that each step in `runRandomly()` enqueues another task, vastly increasing the chance that we hit such a situation. This change extends the stabilisation process to allow time for all pending tasks, plus a task that might currently be in flight. Fixes #41967, in which the master entered the stabilisation phase with over 800 tasks to process.	2019-05-24 14:18:02 +01:00
weizijun	40348ab726	Use accurate total hits in IndexPrimaryRelocationIT By default, we track total hits up to 10k but we might index more than 10k documents in `testPrimaryRelocationWhileIndexing`. With this change, we always request the accurate total hits in the test. > java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.	2019-05-24 12:47:21 +02:00
Simon Willnauer	46ccfba808	Remove IndexStore and DirectoryService (#42446 ) Both of these classes are basically a bloated wrapper around a simple construct that can simply be a DirectoryFactory interface. This change removes both classes and replaces them with a simple stateless interface that creates a new `Directory` per shard. The concept of `index.store` is preserved since it makes sense from a configuration perspective.	2019-05-24 12:14:56 +02:00
David Turner	f864f6a740	Cluster state from API should always have a master (#42454 ) Today the `TransportClusterStateAction` ignores the state passed by the `TransportMasterNodeAction` and obtains its state from the cluster applier. This might be inconsistent, showing a different node as the master or maybe even having no master. This change adjusts the action to use the passed-in state directly, and adds tests showing that the state returned is consistent with our expectations even if there is a concurrent master failover. Fixes #38331 Relates #38432	2019-05-24 08:45:22 +01:00
David Turner	528f8cc073	Add stack traces to RetentionLeasesIT failures (#42425 ) Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing the stack trace that came with the exception. This commit adjusts this to re-throw the exception wrapped in an `AssertionError` so we can see more details about failures such as #41430.	2019-05-24 08:37:51 +01:00
David Turner	c0974a9813	Add more logging to MockDiskUsagesIT (#42424 ) This commit adds a log message containing the routing table, emitted on each iteration of the failing assertBusy() in #40174. It also modernizes the code a bit.	2019-05-24 08:28:10 +01:00
Jack Conradson	167f391cfd	Bug fix to allow access to top level params in reduce script (#42096 )	2019-05-23 16:00:39 -07:00
Ryan Ernst	a49bafc194	Split document and metadata fields in GetResult (#38373 ) (#42456 ) This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.	2019-05-23 14:01:07 -07:00
Jake Landis	2b22ceac04	Bulk processor concurrent requests (#41451 ) (#42438 ) `org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that allows for simple semantics to deal with sending bulk requests. Once a bulk reaches its pre-defined size, documents, or flush interval it will execute sending the bulk. One configurable option is the number of concurrent outstanding bulk requests. That concurrency is implemented in `org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However, the only code that currently calls into this code is blocked by `synchronized` methods. This results in the inability for the BulkProcessor to behave concurrently despite supporting configurable amounts of concurrent requests. This change removes the `synchronized` method in favor of an explicit lock around the non-thread safe parts of the method. The call into `org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle its own concurrency.	2019-05-23 14:22:16 -05:00
Simon Willnauer	5a884dac03	Unguice Snapshot / Restore services (#42357 ) This removes the @Inject annotations from the Snapshot/Restore infrastructure classes and registers them manually in Node.java	2019-05-23 17:09:26 +02:00
Jim Ferenczi	a497603219	Disable max score optimization for queries with unbounded max scores (#41361 ) Lucene 8 has the ability to skip blocks of non-competitive documents. However some queries don't track their maximum score (`script_score`, `span`, ...) so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down some boolean queries if other clauses have bounded max scores. This commit disables the max score optimization when we detect a mandatory scoring clause with unbounded max scores. Optional clauses are not checked since they can still skip documents when the unbounded clause is after the current document.	2019-05-23 16:53:57 +02:00
Yannick Welsch	f57fdc57e9	Deprecate max_local_storage_nodes (#42426 ) Allows this setting to be removed in 8.0, see #42428	2019-05-23 15:59:55 +02:00
Christoph Büscher	85ff9543b7	Prevent normalizer from not being closed on exception (#42375 ) Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and only later checks whether it should be added to the normalizer map passed in. In case we throw an exception it isn't closed. This can be prevented by moving the check that throws the exception earlier.	2019-05-23 15:53:55 +02:00
markharwood	c2c8d0e637	Test fix - results equality failed because of subtle scoring differences between replicas. (#42366 ) Diverging merge policies mean the segments and therefore scores are not the same. Fixed the test by ensuring there are zero replicas. Closes #32492	2019-05-23 12:00:57 +01:00
Jim Ferenczi	b88e80ab89	Upgrade to Lucene 8.1.0 (#42214 ) This commit upgrades to the GA release of Lucene 8.1.0	2019-05-23 11:46:45 +02:00
Jim Ferenczi	4ca5649a0d	Upgrade to lucene 8.1.0-snapshot-e460356abe (#40952 )	2019-05-23 11:45:33 +02:00
Marios Trivyzas	0777223bab	Allow `fields` to be set to `*` (#42301 ) Allow for SimpleQueryString, QueryString and MultiMatchQuery to set the `fields` parameter to the wildcard `*`. If so, set the leniency to `true`, to achieve the same behaviour as from the `"default_field" : "*"` setting. Furthermore, check if `*` is in the list of the `default_field` but not necessarily as the 1st element. Closes: #39577 (cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)	2019-05-23 10:10:48 +02:00
Yannick Welsch	a71d19e92a	Ensure testAckedIndexing uses disruption index settings AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by testAckedIndexing, which has led to spurious test failures Relates #41068, #38931	2019-05-22 19:13:14 +02:00
Jake Landis	496fee3333	bump to 7.3 (#42365 )	2019-05-22 11:57:07 -05:00
Luca Cavanna	c2af62455f	Cut over SearchResponse and SearchTemplateResponse to Writeable (#41855 ) Relates to #34389	2019-05-22 18:47:54 +02:00
Luca Cavanna	29c9bb9181	Clean up ShardId usage of Streamable (#41843 ) ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be easily replaced with direct usages of the constructor that takes a StreamInput as argument.	2019-05-22 18:47:54 +02:00
Luca Cavanna	96ba0b13e0	Cut over MultiSearchResponse to Writeable (#41844 ) Relates to #34389	2019-05-22 18:47:54 +02:00
Luca Cavanna	1ded45b0a2	Cut over SearchPhaseResult to Writeable (#41853 ) Relates to #34389	2019-05-22 18:47:54 +02:00
Luca Cavanna	c85f285298	Move InternalAggregations to Writeable (#41841 ) Relates to #34389	2019-05-22 18:47:54 +02:00
Luca Cavanna	39d4c7c26f	Skip explain fetch sub phase when request holds only suggestions (#41739 ) In case a search request holds only the suggest section, the query phase is skipped and only the suggest phase is executed instead. There will never be hits returned, and in case the explain flag is set to true, the explain sub phase throws a null pointer exception as the query is null. Usually a null query is replaced with a match all query as part of SearchContext#preProcess, which however is also skipped for suggest-only searches. To address this, we skip the explain sub fetch phase for search requests that only requested suggestions. Closes #31260	2019-05-22 18:47:54 +02:00
Luca Cavanna	3416cda8b1	Cut over ClusterSearchShardsGroup to Writeable (#41788 )	2019-05-22 18:47:54 +02:00
Guillaume Darmont	3e231bbad6	StackOverflowError when calling BulkRequest#add (#41672 ) Removing of payload in BulkRequest (#39843) had a side effect of making `BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus leading to StackOverflowError. This PR adds a small change in RequestConvertersTests to show the error and the corresponding fix in `BulkRequest`. Fixes #41668	2019-05-22 11:22:14 -05:00
mushao999	d4b5933225	Fix alpha version error message (#40406 )	2019-05-22 09:06:10 -07:00
Yannick Welsch	eae58c477c	Remove testNodeFailuresAreProcessedOnce This test was not checking the thing it was supposed to anyway.	2019-05-22 14:52:01 +02:00
Yannick Welsch	250973af1d	Fix testCannotJoinIfMasterLostDataFolder Relates to #41047	2019-05-22 14:37:31 +02:00
Simon Willnauer	a79cd77e5c	Remove IndexShard dependency from Repository (#42213 ) * Remove IndexShard dependency from Repository In order to simplify repository testing, especially for BlobStoreRepository, it's important to remove the dependency on IndexShard and reduce it to Store and MapperService (in the snapshot case). This significantly reduces the dependency footprint for Repository and allows unit testing without starting nodes or instantiating entire shard instances. This change deprecates the old method signatures and adds a unit test for FileRepository to show the advantage of this change. In addition, the unit testing surfaced a bug where the internal file names that are private to the repository were used in the recovery stats instead of the target file names, which makes it impossible to relate to the actual lucene files in the recovery stats. * don't delegate deprecated methods * apply comments * test	2019-05-22 14:27:11 +02:00
Ignacio Vera	3a20ff7e86	Fix TopHitsAggregationBuilder adding duplicate _score sort clauses (#42179 ) (#42343 ) When using High Level Rest Client Java API to produce search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC) caused query to contain duplicate sort clauses.	2019-05-22 14:02:52 +02:00
Yannick Welsch	f338005179	Revert "Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock()" This reverts commit 448fc8444559be3145e4a7f65dec794ebbff7b81.	2019-05-22 13:22:09 +02:00
Yannick Welsch	0c7322ebf2	Avoid bubbling up failures from a shard that is recovering (#42287 ) A shard that is undergoing peer recovery is subject to logging warnings of the form org.elasticsearch.action.FailedNodeException: Failed node [XYZ] ... Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ... These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e. there is an IndexShard instance, but no proper IndexCommit just yet). As these failures are currently bubbled up to the master, they cause unnecessary reroutes and confusion amongst users due to being logged as warnings. Closes #40107	2019-05-22 12:26:15 +02:00
Yannick Welsch	770d8e9e39	Remove usage of max_local_storage_nodes in test infrastructure (#41652 ) Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a follow-up PR to deprecate this setting in 7.x and to remove it in 8.0. This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly done by passing the data path from the stopped node to the new node that is started.	2019-05-22 11:04:55 +02:00
Yannick Welsch	c9dedf180b	Use comparator for Reconfigurator (#42283 ) Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for the priorities. Does not make changes to the behavior of the component.	2019-05-22 10:04:51 +02:00
Nhat Nguyen	bcbf1aff6b	Peer recovery should flush at the end (#41660 ) Flushing at the end of a peer recovery (if needed) can bring these benefits: 1. Closing an index won't end up in the red state, since a recovering replica should always be ready for closing whether it performs the verifying-before-close step or not. 2. Good opportunities to compact store (i.e., flushing and merging Lucene, and trimming translog) Closes #40024 Closes #39588	2019-05-21 22:45:17 -04:00
Nhat Nguyen	84df48ccb3	Recovery with syncId should verify seqno infos (#41265 ) This change verifies and aborts recovery if source and target have the same syncId but different sequenceId. This commit also adds an upgrade test to ensure that we always utilize syncId.	2019-05-21 22:44:17 -04:00
Nhat Nguyen	3573b1d0ce	Skip global checkpoint sync for closed indices (#41874 ) The verifying-before-close step ensures the global checkpoints on all shard copies are in sync; thus, we don't need to sync global checkpoints for closed indices. Relates #33888	2019-05-21 19:55:21 -04:00
Nhat Nguyen	4d55e9e070	Estimate num history ops should always use translog (#42211 ) Currently, we ignore soft-deletes in peer recovery, thus estimateNumberOfHistoryOperations should always use translog. Relates #38904	2019-05-21 19:53:31 -04:00
Jason Tedor	b510402b67	Fix off-by-one error in an index shard test There is an off-by-one error in this test. It leads to the recovery thread never being started, and that means joining on it will wait indefinitely. This commit addresses that by fixing the off-by-one error. Relates #42325	2019-05-21 19:20:29 -04:00
Nhat Nguyen	6808951e6f	Mute testDelayedOperationsBeforeAndAfterRelocated Tracked at #42325	2019-05-21 17:08:43 -04:00
Jason Tedor	dd7a65fdf2	Fix compilation in IndexShardTests I forgot to git add these before pushing, sorry. This commit fixes compilation in IndexShardTests, they are needed here and not in master due to differences in how Java infers types in generics between JDK 8 and JDK 11.	2019-05-21 16:12:27 -04:00
Jason Tedor	f7ff0aff79	Execute actions under permit in primary mode only (#42241 ) Today when executing an action on a primary shard under permit, we do not enforce that the shard is in primary mode before executing the action. This commit addresses this by wrapping actions to be executed under permit in a check that the shard is in primary mode before executing the action.	2019-05-21 15:54:31 -04:00
Jason Tedor	32b70ed34c	Avoid unnecessary persistence of retention leases (#42299 ) Today we are persisting the retention leases at least every thirty seconds by a scheduled background sync. This sync causes an fsync to disk and when there are a large number of shards allocated to slow disks, these fsyncs can pile up and can severely impact the system. This commit addresses this by only persisting and fsyncing the retention leases if they have changed since the last time that we persisted and fsynced the retention leases.	2019-05-21 14:00:48 -04:00
Armin Braun	ecd033bea6	Cleanup Various Uses of ActionListener (#40126 ) (#42274 ) * Cleanup Various Uses of ActionListener * Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class * Use ActionRunnable where functionally equivalent	2019-05-21 17:20:52 +02:00
Henning Andersen	75425ae167	Remove 7.0.2 (#42282 ) 7.0.2 removed, since it will never be, fixing branch consistency check.	2019-05-21 15:52:58 +02:00
David Turner	7abeaba8bb	Prevent in-place downgrades and invalid upgrades (#41731 ) Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. This change adds the current version of the node to the node metadata file, and checks the version found in this file against the current version at startup. If the node cannot be sure of its ability to read the on-disk data then it refuses to start, preserving any on-disk data in its upgraded state. This change also adds a command-line tool to overwrite the node metadata file without performing any version checks, to unsafely bypass these checks and recover the historical and lenient behaviour.	2019-05-21 08:04:30 +01:00
Jake Landis	b0a25c3170	add 7.1.1 and 6.8.1 versions (#42251 )	2019-05-20 17:58:24 -05:00
Ryan Ernst	be515d7ce0	Validate non-secure settings are not in keystore (#42209 ) Secure settings currently error if they exist inside elasticsearch.yml. This commit adds validation that non-secure settings do not exist inside the keystore. closes #41831	2019-05-20 11:35:53 -07:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to a confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (i.e. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and Java clients.	2019-05-20 12:07:29 -04:00
Alexander Reelsen	c72c76b5ea	Update to joda time 2.10.2 (#42199 )	2019-05-20 16:58:54 +02:00
Zachary Tong	072a9bdf55	Fix FiltersAggregation NPE when `filters` is empty (#41459 ) If `keyedFilters` is null it assumes there are unkeyed filters...which will NPE if the unkeyed filters were actually empty. This refactors to simplify the filter assignment a bit, adds an empty check and tidies up some formatting.	2019-05-20 10:04:21 -04:00
Jim Ferenczi	b7599472ac	Fix random failure in SearchRequestTests#testRandomVersionSerialization (#42069 ) This commit fixes a test bug that ends up comparing the result of two consecutive calls to System.currentTimeMillis that can be different on slow CIs. Closes #42064	2019-05-20 10:14:05 +02:00
Nhat Nguyen	0ec7986049	Enable debug log in testRetentionLeasesSyncOnRecovery Relates #39105	2019-05-19 22:07:25 -04:00
Nhat Nguyen	6ffc6ea42e	Don't verify evictions in testFilterCacheStats (#42091 ) If a background merge and refresh happens after a search but before a stats query, then evictions will be non-zero. Closes #32506	2019-05-15 18:17:53 -04:00
Nhat Nguyen	a75e916078	Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098 ) This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't time out on CI. Closes #28153	2019-05-15 18:17:46 -04:00
Igor Motov	70ea3cf847	SQL: Add initial geo support (#42031 ) (#42135 ) Adds an initial limited implementation of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort concentrates on the SQL option, AKA ISO 19125-2. Queries that are supported as a result of this initial implementation Metadata commands - `DESCRIBE table` - returns the correct column types `GEOMETRY` for geo shapes and geo points. - `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions - `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly. Returning geoshapes and geopoints from elasticsearch - `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console. - `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;`- returns the geoshapes and geopoints in their WKT representation; Using geopoints to elasticsearch - The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these functions are translated into script. These functions can be used in the SELECT clause for both geopoints and geoshapes. - `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText('POINT(1 2)'), point) < 10;` - returns all records for which `point` is located within 10m from the `POINT(1 2)`. In this case the WHERE clause is translated into a range query. Limitations: Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values. Relates to #29872 Backport of #42031	2019-05-14 18:57:12 -05:00
Jay Modi	327f44e051	Concurrent tests wait for threads to be ready (#42083 ) This change updates tests that use a CountDownLatch to synchronize the running of threads when testing concurrent operations so that we ensure the thread has been fully created and run by the scheduler. Previously, these tests used a latch with a value of 1 and the test thread counted down while the threads performing concurrent operations just waited. This change updates the value of the latch to be 1 + the number of threads. Each thread counts down and then waits. This means that each thread has been constructed and has started running. All threads will have a common start point now.	2019-05-14 16:29:52 -04:00
David Turner	367e027962	Log cluster UUID when committed (#42065 ) Today we do not expose the cluster UUID in any logs by default, but it would be useful to see it. For instance if a user starts multiple nodes as separate clusters then they will silently remain as separate clusters even if they are subsequently reconfigured to look like a single cluster. This change logs the committed cluster UUID the first time the node encounters it.	2019-05-14 05:35:14 -04:00
Yogesh Gaikwad	90dce0864a	Increase the sample space for random inner hits name generator (#42057 ) (#42072 ) This commit changes the minimum length for inner hits name to avoid name collision which sometimes failed the test.	2019-05-12 10:32:02 +10:00
Andrei Stefan	912c6bdbff	Prevent order being lost for _nodes API filters (#42045 ) (#42089 ) * Switch to using a list instead of a Set for the filters, so that the order of these filters is kept. (cherry picked from commit 74a743829799b64971e0ac5ae265f43f6c14e074)	2019-05-11 01:58:03 +03:00
Nhat Nguyen	c19ea0a6f1	Remove global checkpoint assertion in peer recovery (#41987 ) If remote recovery copies an index commit which has gaps in sequence numbers to a follower, then these assertions (introduced in #40823) don't hold for follower replicas. Closes #41037	2019-05-10 14:38:35 -04:00
Christoph Büscher	3e59c31a12	Change IndexAnalyzers default analyzer access (#42011 ) Currently IndexAnalyzers keeps the three default analyzers as separate class members although they should refer to the same analyzers held in the additional analyzers map under the default names. This assumption should be made more explicit by keeping all analyzers in the map. This change adapts the constructor to check all the default entries are there and the getters to reach into the map with the default names when needed.	2019-05-10 18:08:51 +02:00
Jay Modi	80432a3552	Remove close method in PageCacheRecycler/Recycler (#41917 ) The changes in #39317 brought to light some concurrency issues in the close method of Recyclers as we do not wait for threads running in the threadpool to be finished prior to the closing of the PageCacheRecycler and the Recyclers that are used internally. #41695 was opened to address the concurrent close issues but upon review, the closing of these classes is not really needed as the instances should become available for garbage collection once there is no longer a reference to the closed node. Closes #41683	2019-05-10 08:56:05 -06:00
Alan Woodward	44c3418531	Simplify handling of keyword field normalizers (#42002 ) We have a number of places in analysis-handling code where we check if a field type is a keyword field, and if so then extract the normalizer rather than pulling the index-time analyzer. However, a keyword normalizer is really just a special case of an analyzer, so we should be able to simplify this by setting the normalizer as the index-time analyzer at construction time.	2019-05-10 14:38:46 +01:00
Nhat Nguyen	809ed3b721	shouldRollGeneration should execute under read lock (#41696 ) Translog#shouldRollGeneration should execute under the read lock since it accesses the current writer.	2019-05-10 09:28:33 -04:00
David Turner	2a8a64d3f1	Remove extra `ms` from log message (#42068 ) This log message logs a `TimeValue` which includes units, but also logs an extra `ms`. This commit removes the extra `ms`.	2019-05-10 14:03:37 +01:00
Armin Braun	ea7db2bb6a	Fix testCloseOrDeleteIndexDuringSnapshot (#42007 ) * This test was resulting in a `PARTIAL` instead of a `SUCCESS` state for the case of closing an index during snapshotting on 7.x * The reason for this is the changed default behaviour regarding waiting for active shards between 8.0 and 7.x * Fixed by adjusting the waiting behaviour on the close index request in the test * Closes #39828	2019-05-10 11:59:20 +02:00
Armin Braun	dc444cef49	Fix Race in Closing IndicesService.CacheCleaner (#42016 ) (#42052 ) * When close becomes true while the management pool is shut down, we run into an unhandled `EsRejectedExecutionException` that fails tests * Found this while trying to reproduce #32506 * Running the IndexStatsIT in a loop is a way of reproducing this	2019-05-10 09:29:27 +02:00
Tal Levy	5640197632	Refactor TransportSingleShardAction to serialize Writeable responses (#41985 ) (#42040 ) Previously, TransportSingleShardAction required constructing a new empty response object. This response object's Streamable readFrom was used. As part of the migration to Writeable, the interface here was updated to leverage Writeable.Reader. relates to #34389.	2019-05-09 22:08:31 -07:00
Jay Modi	2998c107fb	Fix node close stopwatch usage (#41918 ) The close method in Node uses a StopWatch to time the closing of various services. However, the call to log the timing was made before any of the services had been closed and therefore no timing would be printed out. This change moves the timing log call to be a closeable that is the last item closed.	2019-05-09 09:41:42 -06:00
Jay Modi	f3bcc4fc22	Default seed address tests account for no IPv6 (#41971 ) This change makes the default seed address tests account for the lack of an IPv6 network. By default docker containers only run with IPv4 and these tests fail in a vanilla installation of elasticsearch-ci. To resolve this we only expect IPv6 seed addresses if IPv6 is available. Relates #41404	2019-05-09 08:19:46 -06:00
David Kyle	256588d773	Mute IndexStatsIT#testFilterCacheStats See https://github.com/elastic/elasticsearch/issues/32506	2019-05-09 13:49:47 +01:00
Jim Ferenczi	b7c7ca8f09	Fix IAE on cross_fields query introduced in 7.0.1 (#41938 ) If the max doc in the index is greater than the minimum total term frequency among the requested fields we need to adjust max doc to be equal to the min ttf. This was removed by mistake when fixing #41125. Closes #41934	2019-05-09 14:25:46 +02:00
Alan Woodward	309e4a11b5	Cut AnalyzeResponse over to Writeable (#41915 ) This commit makes AnalyzeResponse and its various helper classes implement Writeable. The classes are also now immutable. Relates to #34389	2019-05-09 13:09:23 +01:00
Jim Ferenczi	a329aaec90	Fix assertion error when caching the result of a search in a read-only index (#41900 ) The ReadOnlyEngine wraps its reader with a SoftDeletesDirectoryReaderWrapper if soft deletes are enabled. However the wrapping is done on top of the ElasticsearchDirectoryReader and that trips an assertion later on since the cache keys of these directories are different. This commit changes the order of the wrapping to put the ElasticsearchDirectoryReader first in order to ensure that it is always retrieved first when we unwrap the directory. Closes #41795	2019-05-09 08:59:52 +02:00
Benjamin Trent	edd6438e34	mute test related to #41967 (#41968 )	2019-05-08 15:03:28 -05:00
William Brafford	a2b7871f9f	Allow unknown task time in QueueResizingEsTPE (#41957 ) * Allow unknown task time in QueueResizingEsTPE The afterExecute method previously asserted that a TimedRunnable task must have a positive execution time. However, the code in TimedRunnable returns a value of -1 when a task time is unknown. Here, we expand the logic in the assertion to allow for that possibility, and we don't update our task time average if the value is negative. * Add a failure flag to TimedRunnable In order to be sure that a task has an execution time of -1 because of a failure, I'm adding a failure flag boolean to the TimedRunnable class. If execution time is negative for some other reason, an assertion will fail. Backport of #41810 Fixes #41448	2019-05-08 14:15:22 -04:00
David Roberts	452ee55cdb	Make ISO8601 date parser accept timezone when time does not have seconds (#41896 ) Prior to this change the ISO8601 date parser would only parse an optional timezone if seconds were specified. This change moves the timezone to the same level of optional components as hour, so that timestamps without minutes or seconds may optionally contain a timezone. It also adds a unit test to cover all the supported formats.	2019-05-08 13:50:53 +01:00
Yannick Welsch	957046dad0	Allow IDEA test runner to control number of test iterations (#41653 ) Allows configuring the number of test iterations via IntelliJ's config dialog, instead of having to add it manually via the tests.iters system property.	2019-05-08 13:57:29 +02:00
Armin Braun	5c824f3993	Reenable testCloseOrDeleteIndexDuringSnapshot (#41892 ) * Relates #39828	2019-05-08 13:10:19 +02:00
Jim Ferenczi	ca3d881716	Always set terminated_early if terminate_after is set in the search request (#40839 ) * terminated_early should always be set in the response with terminate_after Today we set `terminated_early` to true in the response if the query terminated early due to `terminate_after`. However if `terminate_after` is smaller than the number of documents in a shard we don't set the flag in the response indicating that the query was exhaustive. This change fixes this discrepancy by setting terminated_early to false in the response if the number of documents that match the query is smaller than the provided `terminate_after` value. Closes #33949	2019-05-08 12:26:38 +02:00
David Turner	4c909e93bb	Reject port ranges in `discovery.seed_hosts` (#41905 ) Today Elasticsearch accepts, but silently ignores, port ranges in the `discovery.seed_hosts` setting: ``` discovery.seed_hosts: 10.1.2.3:9300-9400 ``` Silently ignoring part of a setting like this is trappy. With this change we reject seed host addresses of this form. Closes #40786 Backport of #41404	2019-05-08 08:34:32 +01:00
David Turner	935f70c05e	Handle serialization exceptions during publication (#41781 ) Today if an exception is thrown when serializing a cluster state during publication then the master enters a poisoned state where it cannot publish any more cluster states, but nor does it stand down as master, yielding repeated exceptions of the following form: ``` failed to commit cluster state version [12345] org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1045) ~[elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0.jar:7.0.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: cannot start publishing next value before accepting previous one at org.elasticsearch.cluster.coordination.CoordinationState.handleClientValue(CoordinationState.java:280) ~[elasticsearch-7.0.0.jar:7.0.0] at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1030) ~[elasticsearch-7.0.0.jar:7.0.0] ... 11 more ``` This is because it already created the publication request using `CoordinationState#handleClientValue()` but then it fails before accepting it. This commit addresses this by performing the serialization before calling `handleClientValue()`. Relates #41090, which was the source of such a serialization exception.	2019-05-07 17:53:12 +01:00
Alan Woodward	4cca1e8fff	Correct spelling of MockLogAppender.PatternSeenEventExpectation (#41893 ) The class was called PatternSeenEventExcpectation. This commit is a straight class rename to correct the spelling.	2019-05-07 17:28:51 +01:00
Ryan Ernst	e9e4bae683	Fix fractional seconds for strict_date_optional_time (#41871 ) The fractional seconds portion of strict_date_optional_time was accidentally copied from the printer, which always prints at least 3 fractional digits. This commit fixes the formatter to allow 1 or 2 fractional seconds. closes #41633	2019-05-07 09:09:30 -07:00
Henning Andersen	f068a22f5f	SeqNo CAS linearizability (#38561 ) Add a test that stresses concurrent writes using ifSeqno/ifPrimaryTerm to do CAS style updates. Use linearizability checker to verify linearizability. Linearizability of successful CAS'es is guaranteed. Changed linearizability checker to allow collecting history concurrently. Changed unresponsive network simulation to wake up immediately when network disruption is cleared to ensure tests proceed in a timely manner (and this also seems more likely to provoke issues).	2019-05-07 14:04:38 +02:00
Jim Ferenczi	70bf432fa8	Fix full text queries test that start with now (#41854 ) Full text queries that start with now are not cacheable if they target a date field. However we assume in the query builder tests that all queries are cacheable and this assumption fails when the randomly generated query string starts with "now". This has failed only twice in several years since the probability that a random string starts with "now" is low but this commit ensures that isCacheable is correctly checked for full text queries that fall into this edge case. Closes #41847	2019-05-06 19:08:30 +02:00
Przemyslaw Gomulka	79b7ce8697	Fix javadoc in WrapperQueryBuilder backport(41641) #41849 missing brackets in javadoc backports #41641	2019-05-06 17:55:11 +02:00
Henning Andersen	227d5e15fb	ReadOnlyEngine assertion fix (#41842 ) Fixed the assertion that maxSeqNo == globalCheckpoint to actually check against the global checkpoint.	2019-05-06 16:11:38 +02:00
Hicham Mallah	4a88da70c5	Add index name to cluster block exception (#41489 ) Updates the error message to reveal the index name that is causing it. Closes #40870	2019-05-04 19:11:59 -04:00
Nhat Nguyen	c7924014fa	Verify consistency of version and source in disruption tests (#41614 ) (#41661 ) With this change, we will verify the consistency of version and source (besides id, seq_no, and term) of live documents between shard copies at the end of disruption tests.	2019-05-03 18:47:14 -04:00
Nhat Nguyen	e61469aae6	Noop peer recoveries on closed index (#41400 ) If users close an index to change some non-dynamic index settings, then the current implementation forces replicas of that closed index to copy over segment files from the primary. With this change, we make peer recoveries of closed index skip both phases. Relates #33888 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2019-05-03 12:07:37 -04:00
Issam EL-ATIF	23706d4cdf	Update error message for allowed characters in aggregation names (#41573 ) Exception message thrown when specifying illegal characters did not accurately describe the allowed characters. This updates the error message to reflect reality (any character except [, ] and >)	2019-05-03 11:55:09 -04:00
Jason Tedor	03c959f188	Upgrade keystore on package install (#41755 ) When Elasticsearch is run from a package installation, the running process does not have permissions to write to the keystore. This is because of the root:root ownership of /etc/elasticsearch. This is why we create the keystore if it does not exist during package installation. If the keystore needs to be upgraded, that is currently done by the running Elasticsearch process. Yet, as just mentioned, the Elasticsearch process would not have permissions to do that during runtime. Instead, this needs to be done during package upgrade. This commit adds an upgrade command to the keystore CLI for this purpose, and that is invoked during package upgrade if the keystore already exists. This ensures that we are always on the latest keystore format before the Elasticsearch process is invoked, and therefore no upgrade would be needed then. While this bug has always existed, we have not heard of reports of it in practice. Yet, this bug becomes a lot more likely with a recent change to the format of the keystore to remove the distinction between file and string entries.	2019-05-03 10:34:30 -04:00
David Turner	873d0020a5	Reject null customs at build time (#41782 ) Today you can add a null `Custom` to the cluster state or its metadata, but attempting to publish such a cluster state will fail. Unfortunately, the publication-time failure gives very little information about the source of the problem. This change causes the failure to manifest earlier and adds information about which `Custom` was null in order to simplify the investigation. Relates #41090.	2019-05-03 14:52:32 +02:00
Jack Conradson	025619bbf1	Improve error message for ln/log with negative results in function score This changes the error message for a negative result in a function score when using the ln modifier to suggest using ln1p or ln2p when a negative result occurs in a function score and for the log modifier to suggest using log1p or log2p. This relates to #41509	2019-05-02 16:31:25 -07:00
Jason Tedor	d0f071236a	Simplify filtering addresses on interfaces (#41758 ) This commit is a refactoring of how we filter addresses on interfaces. In particular, we refactor all of these methods into a common private method. We also change the order of logic to first check if an address matches our filter and then check if the interface is up. This is to possibly avoid problems we are seeing where devices are flapping up and down while we are checking for loopback addresses. We do not expect the loopback device to flap up and down so by reversing the logic here we avoid that problem on CI machines. Finally, we expand the error message when this does occur so that we know which device is flapping.	2019-05-02 16:36:27 -04:00
Colin Goodheart-Smithe	ab9154005b	Adds version 6.7.3	2019-05-02 17:36:23 +01:00
Tim Brooks	b4bcbf9f64	Support http read timeouts for transport-nio (#41466 ) This is related to #27260. Currently there is a setting http.read_timeout that allows users to define a read timeout for the http transport. This commit implements support for this functionality with the transport-nio plugin. The behavior here is that a repeating task will be scheduled for the interval defined. If there have been no requests received since the last run and there are no inflight requests, the channel will be closed.	2019-05-02 09:48:52 -06:00
David Turner	b189596631	Add details to BulkShardRequest#getDescription() (#41711 ) Today a bulk shard request appears as follows in the detailed task list: requests[42], index[my_index] This change adds the shard index and refresh policy too: requests[42], index[my_index][2], refresh[IMMEDIATE]	2019-05-02 08:29:25 +02:00
Andy Bristol	b9e44288d3	mute NodeTests#testCloseOnInterruptibleTask For #41448	2019-05-01 13:24:22 -07:00
Jason Tedor	39b0b5809d	Fix minimum compatible version after 6.8 This commit fixes the minimum compatible version after the introduction of 6.8.	2019-05-01 16:21:13 -04:00
Jay Modi	7f7eb7b679	Add version 7.0.2 to 7.x branch (#41715 )	2019-05-01 15:23:53 -04:00
Jason Tedor	f08ac103ee	Add 6.8 version constant This commit adds the 6.8 version constant to the 7.x branch.	2019-05-01 13:38:58 -04:00
Jason Tedor	7f3ab4524f	Bump 7.x branch to version 7.2.0 This commit adds the 7.2.0 version constant to the 7.x branch, and bumps BWC logic accordingly.	2019-05-01 13:38:57 -04:00
Henning Andersen	c6abe74dd6	Close and acquire commit during reset engine fix (#41584 ) (#41709 ) If closing a shard while resetting engine, IndexEventListener.afterIndexShardClosed would be called while there is still an active IndexWriter on the shard. For integration tests, this leads to an exception during check index called from MockFSIndexStore.Listener. Fixed. Relates to #38561	2019-05-01 15:22:24 +02:00
Jason Tedor	26c72c96bd	Fix imports in KeyStoreWrapperTests This commit addresses a checkstyle violation in KeyStoreWrapperTests, removing a leftover import.	2019-05-01 07:21:23 -04:00
Jason Tedor	0b46a62f6b	Drop distinction in entries for keystore (#41701 ) Today we allow adding entries from a file or from a string, yet we internally maintain this distinction such that if you try to add a value from a file for a setting that expects a string or add a value from a string for a setting that expects a file, you will have a bad time. This causes a pain for operators such that for each setting they need to know this difference. Yet, we do not need to maintain this distinction internally as they are bytes after all. This commit removes that distinction and includes logic to upgrade legacy keystores.	2019-05-01 07:02:04 -04:00
Nhat Nguyen	887f3f2c83	Simplify initialization of max_seq_no of updates (#41161 ) Today we choose to initialize max_seq_no_of_updates on primaries only so we can deal with a situation where a primary is on an old node (before 6.5) which does not have MSU while replicas are on new nodes (6.5+). However, this strategy is quite complex and can lead to bugs (for example #40249) since we have to assign a correct value (not too low) to MSU in all possible situations (before recovering from translog, restoring history on promotion, and handing off relocation). Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes in the cluster should have MSU. This change simplifies the initialization of MSU by always assigning it a correct value in the constructor of Engine regardless of whether it's a replica or primary. Relates #33842	2019-04-30 15:14:52 -04:00
Igor Motov	10ab838106	Geo: Add GeoJson parser to libs/geo classes (#41575 ) (#41657 ) Adds GeoJson parser for Geometry classes defined in libs/geo. Relates #40908 and #29872	2019-04-29 19:43:31 -04:00
Alan Woodward	a01f451ef7	Limit complexity of IntervalQueryBuilderTests#testRandomSource() (#41538 ) IntervalsSources can throw IllegalArgumentExceptions if they would produce too many disjunctions. To mitigate against this when building random sources, we limit the depth of the randomly generated source to four nested sources. Fixes #41402	2019-04-29 13:31:19 +01:00
Dan Hermann	b23709b178	Applies the same naming restrictions to repositories as to snapshots except that leading underscores and uppercase characters are permitted. (#41585 ) Fixes #40817.	2019-04-29 07:31:01 -05:00
Armin Braun	6e51b6f96d	Add Repository Consistency Assertion to SnapshotResiliencyTests (#41631 ) * Add Repository Consistency Assertion to SnapshotResiliencyTests (#40857) * Add Repository Consistency Assertion to SnapshotResiliencyTests * Add some quick validation on not leaving behind any dangling metadata or dangling indices to the snapshot resiliency tests * Added todo about expanding this assertion further * Fix SnapshotResiliencyTest Repo Consistency Check (#41332) * Fix SnapshotResiliencyTest Repo Consistency Check * Due to the random creation of an empty `extra0` file by the Lucene mockFS we see broken tests because we use the existence of an index folder in assertions and the index deletion doesn't go through if there are extra files in an index folder * Fixed by removing the `extra0` file and resulting empty directory trees before asserting repo consistency * Closes #41326 * Reenable SnapshotResiliency Test (#41437) This was fixed in https://github.com/elastic/elasticsearch/pull/41332 but I forgot to reenable the test. * fix compile on java8	2019-04-29 12:01:58 +02:00
Nhat Nguyen	615a0211f0	Recovery should not indefinitely retry on mapping error (#41099 ) A stuck peer recovery in #40913 reveals that we indefinitely retry on new cluster states if indexing translog operations hits a mapper exception. We should not wait and retry if the mapping on the target is as recent as the mapping that the primary used to index the replaying operations. Relates #40913	2019-04-27 10:55:08 -04:00
Michael Morello	75283294f5	Fix multi-node parsing in voting config exclusions REST API (#41588 ) Fixes an issue where multiple nodes were not properly parsed in the voting config exclusions REST API. Closes #41587	2019-04-27 12:20:03 +02:00
Nick Knize	113b24be4b	Refactor GeoHashUtils (#40869 ) This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.	2019-04-26 10:06:36 -05:00
Armin Braun	aad33121d8	Async Snapshot Repository Deletes (#40144 ) (#41571 ) Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106 * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork for parallelizing shard snapshot uploads to improve the situation reported in #39657	2019-04-26 15:36:09 +02:00
Armin Braun	7824f60a34	Simplify Snapshot Resiliency Test (#40930 ) (#41565 ) * Thanks to #39793 dynamic mapping updates don't contain blocking operations anymore so we don't have to manually put the mapping in this test and can keep it a little simpler	2019-04-26 10:59:09 +02:00
Christoph Büscher	078936b8f5	Remove search analyzers from DocumentFieldMappers (#41484 ) These references seem to be unused except for tests and should be removed to keep the places we store analyzers limited.	2019-04-26 09:48:48 +02:00
Armin Braun	6a24fd3f26	Add Restore Operation to SnapshotResiliencyTests (#40634 ) (#41546 ) * Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshot * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in #39504	2019-04-26 09:04:34 +02:00
Christoph Büscher	52495843cc	[Docs] Fix common word repetitions (#39703 )	2019-04-25 20:47:47 +02:00
Armin Braun	23b3741618	Remove Exists Check from S3 Repository Deletes (#40931 ) (#41534 ) * The check doesn't add much if anything practically, since the S3 repository is eventually consistent and we only log the non-existence of a blob anyway * We don't do the check on writes for this very reason and documented it as such * Removing the check saves one API call per single delete speeding up the deletion process and lowering costs	2019-04-25 18:25:03 +02:00
Jim Ferenczi	6184efaff6	Handle unmapped fields in _field_caps API (#34071 ) (#41426 ) Today the `_field_caps` API returns the list of indices where a field is present only if this field has different types within the requested indices. However if the request is an index pattern (or an alias, or both...) there is no way to infer the indices if the response contains only fields that have the same type in all indices. This commit changes the response to always return the list of indices in the response. It also adds a way to retrieve unmapped field in a specific section per field called `unmapped`. This section is created for each field that is present in some indices but not all if the parameter `include_unmapped` is set to true in the request (defaults to false).	2019-04-25 18:13:48 +02:00
Armin Braun	40aef2b8aa	Introduce Delegating ActionListener Wrappers (#40129 ) (#41527 ) * Introduce Delegating ActionListener Wrappers * Dry up use cases of ActionListener that simply pass through the response or exception to another listener	2019-04-25 16:05:04 +02:00
Ignacio Vera	d119abdf96	Improve accuracy for Geo Centroid Aggregation (#41514 ) keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.	2019-04-25 15:25:48 +02:00
Armin Braun	cd830b53e2	Name Snapshot Data Blobs by UUID (#40652 ) (#41523 ) * Name Snapshot Data Blobs by UUID * There is no functional reason why we need incremental naming for these files but * As explained in #38941 it is a possible source of corrupting the repository * It wastes API calls for the list operation * Is just needless complication * Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations * Even in the worst-case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountering a non-numeric value after the double underscore prefix	2019-04-25 13:18:03 +02:00
Luca Cavanna	8a0e5f7b87	Deprecate support for first line empty in msearch API (#41442 ) In order to support empty action metadata in the first msearch item, we need to remove support for prepending msearch request body with an empty line, which prevents us from parsing the empty line as action metadata for the first search item. Relates to #41011	2019-04-25 12:45:18 +02:00
Przemyslaw Gomulka	906f88029b	Remove the test which is testing java and joda api backport (#41493 ) #41518 The test is testing the java time API and fails in case it hits daylight saving time changes. Java time has the right implementation and we don't need to test this. More details on how the test was affected by the DST change on this comment. Closes #39617 backport (#41493)	2019-04-25 12:21:01 +02:00
Armin Braun	7c819fd2aa	Fix BulkRejectionIT (#41446 ) (#41500 ) * Due to #40866 one of the two parallel bulk requests can randomly be rejected outright when the write queue is full already, we can catch this situation and ignore it since we can still have the rejection for the dynamic mapping update for the other request and it's somewhat rare to run into this anyway * Closes #41363	2019-04-24 20:46:21 +02:00
Zachary Tong	ec5dd0594f	Disallow null/empty or duplicate composite sources (#41359 ) Adds some validation to prevent duplicate source names from being used in the composite agg. Also refactored to use a ConstructingObjectParser and removed the private ctor and setter for sources, making it mandatory.	2019-04-24 13:23:31 -04:00
Armin Braun	1db9166ea0	Fix Broken Index Shard Snapshot File Preventing Snapshot Creation (#41310 ) (#41473 ) * The problem here is that if we run into a corrupted index-N file, instead of generating a new index-(N+1) file, we instead set the newest index generation to -1 and thus tried to create `index-0` * If `index-0` is corrupt, this prevents us from ever creating a new snapshot using the broken shard, because we are unable to create `index-0` since it already exists * Fixed by still using the index generation for naming the next index file, even if it was a broken index file * Added test that makes sure restoring as well as snapshotting on top of the broken shard index file work as expected * closes #41304	2019-04-24 18:39:17 +02:00
Armin Braun	381b8e2ece	Fix BulkProcessor Retry ITs (#41338 ) (#41472 ) * The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually * Fixed by adding retry functionality to the top level request as well * Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did * closes #41324	2019-04-24 13:46:32 +02:00
Jason Tedor	65af47eb31	Introduce aliases version (#41397 ) This commit introduces aliases versions to index metadata. This will be useful in CCR when we replicate aliases.	2019-04-23 12:19:11 -04:00
David Roberts	7e2aec022d	[TEST] Mute BulkRejectionIT.testBulkRejectionAfterDynamicMappingUpdate Due to https://github.com/elastic/elasticsearch/issues/41363	2019-04-23 15:58:38 +01:00
David Roberts	d8a2970fa4	[TEST] Mute RemoteClusterServiceTests.testCollectNodes Due to https://github.com/elastic/elasticsearch/issues/41067	2019-04-23 15:13:01 +01:00
David Turner	0bb15d3dac	Allow ops to be blocked after primary promotion (#41360 ) Today we assert that there are no operations in flight in this test. However we will sometimes be in a situation where the operations are blocked, and we distinguish these cases since #41271 causing the assertion to fail. This commit addresses this by allowing operations to be blocked sometimes after a primary promotion. Fixes #41333.	2019-04-19 07:48:43 +01:00
Jim Ferenczi	8f73e1e883	Fix unmapped field handling in the composite aggregation (#41280 ) The `composite` aggregation maps unknown fields as numerics, this means that any `after` value that is set on a query with an unmapped field on some indices will fail if the provided value is not numeric. This commit changes the default value source to use keyword instead in order to be able to parse any type of after values.	2019-04-18 23:08:13 +02:00
Jim Ferenczi	754037b71e	Unified highlighter should ignore terms that target the _id field (#41275 ) The `_id` field uses a binary encoding to index terms that is not compatible with the utf8 automaton that the unified highlighter creates to reanalyze the input. For this reason this commit ignores terms that target the `_id` field when `require_field_match` is set to false. Closes #37525	2019-04-18 22:31:23 +02:00
Jim Ferenczi	068f8ba223	more_like_this query to throw an error if the like fields is not provided (#40632 ) With the removal of the `_all` field the `mlt` query cannot infer a field name to use to analyze the provided (un)like text if the `fields` parameter is not explicitly set in the query and the `index.query.default_field` is not changed in the index settings (by default it is set to ``). For this reason the like text is ignored and queries are only built from the provided document ids. This change fixes this bug by throwing an error if the fields option is not set and the `index.query.default_field` is equal to ``. The error is thrown only if like or unlike texts are provided in the query.	2019-04-18 22:30:22 +02:00
Simon Willnauer	11dc9fe249	Mark searcher as accessed in acquireSearcher (#41335 ) This fixes an issue where every N seconds a slow search request is triggered since the searcher access time is not set unless the shard is idle. This change moves to a more pro-active approach setting the searcher as accessed all the time.	2019-04-18 19:14:50 +02:00
Adrien Grand	a699cb76a5	Fix javadoc tag. (#41330 ) s/returns/return/	2019-04-18 14:41:09 +02:00
Armin Braun	389a13b68e	Mute BulkProcessorRetryIT#testBulkRejectionLoadWithBackoff (#41325 ) (#41331 ) * For #41324	2019-04-18 11:55:28 +02:00
Alpar Torok	a4a4259cac	Mute failing test Tracking #41326	2019-04-18 09:26:20 +03:00
Armin Braun	c77e10b16b	Handle Bulk Requests on Write Threadpool (#40866 ) (#41315 ) * Bulk requests can be thousands of items large and take more than O(10ms) time to handle => we should not handle them on the transport threadpool to not block select loops * relates #39128 * relates #39658	2019-04-18 07:10:23 +02:00
David Turner	946baf87d3	Assert TransportReplicationActions acquire permits (#41271 ) Today we do not distinguish "no operations in flight" from "operations are blocked", since both return `0` from `IndexShard#getActiveOperationsCount()`. We therefore cannot assert that every `TransportReplicationAction` performs its actions under permit(s). This commit fixes this by returning `IndexShard#OPERATIONS_BLOCKED` if operations are blocked, allowing these two cases to be distinguished.	2019-04-17 23:05:03 +02:00
Zachary Tong	7e62ff2823	[Rollup] Validate timezones based on rules not string comparison (#36237 ) The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stored the TZ as provided by the user. When checking the TZ for query validation we used a string comparison, which would fail due to the date_histo's upgrading behavior. Instead, we should convert both to a TimeZone object and check if their rules are compatible.	2019-04-17 13:46:44 -04:00
Christoph Büscher	4d964194db	Fix error applying `ignore_malformed` to boolean values (#41261 ) The `ignore_malformed` option currently works on numeric fields only when the bad value isn't a string value but not if it is a boolean. In this case we get a parsing error from the xContent parser which we need to catch in addition to the field mapper. Closes #11498	2019-04-17 18:44:57 +02:00
David Turner	2670ed2f8f	Assert the stability of custom search preferences (#41150 ) Today the `?preference=custom_string_value` search preference will only change its choice of a shard copy if something changes the `IndexShardRoutingTable` for that specific shard. Users can use this behaviour to route searches to a consistent set of shard copies, which means they can reliably hit copies with hot caches, and use the other copies only for redundancy in case of failure. However we do not assert this property anywhere, so we might break it in future. This commit adds a test that shows that searches are routed consistently even if other indices are created/rebalanced/deleted. Relates https://discuss.elastic.co/t/176598, #41115, #26791	2019-04-17 17:47:44 +02:00
Nhat Nguyen	2ee87c99d9	Fix bwc version of sanity check of read only engine Relates #41041	2019-04-17 10:25:47 -04:00
Nhat Nguyen	aa0c957a4a	Do not trim unsafe commits when open readonly engine (#41041 ) Today we always trim unsafe commits (whose max_seq_no >= global checkpoint) before starting a read-write or read-only engine. This is mandatory for read-write engines because they must start with the safe commit. This is also fine for read-only engines since in most cases we should have exactly one commit after closing an index (trimming is a noop). However, this is dangerous for following indices which might have more than one commit when they are being closed. With this change, we move the trimming logic to the ctor of InternalEngine so we won't trim anything if we are going to open a read-only engine.	2019-04-17 10:16:12 -04:00
Adrien Grand	f7e590ce0d	ProfileScorer should propagate `setMinCompetitiveScore`. (#40958 ) (#41302 ) Currently enabling profiling disables top-hits optimizations, which is unfortunate: it would be nice to be able to notice the difference in method counts and timings depending on whether total hit counts are requested.	2019-04-17 16:11:14 +02:00
Adrien Grand	9fd5237fd4	Clean up Node#close. (#39317 ) (#41301 ) `Node#close` is pretty hard to rely on today: - it might swallow exceptions - it waits for 10 seconds for threads to terminate but doesn't signal anything if threads are still not terminated after 10 seconds This commit makes `IOException`s propagated and splits `Node#close` into `Node#close` and `Node#awaitClose` so that the decision what to do if a node takes too long to close can be done on top of `Node#close`. It also adds synchronization to lifecycle transitions to make them atomic. I don't think it is a source of problems today, but it makes things easier to reason about.	2019-04-17 16:10:53 +02:00
Jason Tedor	6566979c18	Always check for archiving broken index settings (#41209 ) Today we check if an index has broken settings when checking if an index needs to be upgraded. However, it can be the case that an index setting became broken even if an index is already upgraded to the current version if the user removed a plugin (or downgraded from the default distribution to the non-default distribution) while on the same version of Elasticsearch. In this case, some registered settings would go missing and the index would now be broken. Yet, we miss this check and instead of archiving the settings, the index becomes unassigned due to the missing settings. This commit addresses this by checking for broken settings whether or not the index is upgraded.	2019-04-17 07:00:23 -04:00
Christoph Büscher	badb7a22e0	Some cleanups in NoisyChannelSpellChecker (#40949 ) One of the two #getCorrections methods is only used in tests, so we can move it and any of the required helper methods to that test. Also reducing the visibility of several methods to package private since the class isn't used elsewhere outside the package.	2019-04-17 10:22:12 +02:00
David Turner	bfa06d963e	Do not create missing directories in readonly repo (#41249 ) Today we erroneously look for a node setting called `readonly` when deciding whether or not to create a missing directory in a filesystem repository. This change fixes this by using the repository setting instead. Closes #41009 Relates #26909	2019-04-17 09:43:14 +02:00
Yogesh Gaikwad	6a552c05fe	Use alias name from rollover request to query indices stats (#40774 ) (#41284 ) In `TransportRolloverAction` before doing rollover we resolve source index name (write index) from the alias in the rollover request. Before evaluating the conditions and executing rollover action, we retrieve stats, but to do so we used the source index name resolved from the alias instead of alias from the index. This fails when the user is assigned a role with index privilege on the alias instead of the concrete index. This commit fixes this by using the alias from the request. After this change, verified that when we retrieve all the stats (including write + read indexes) we are considering only source index. Closes #40771	2019-04-17 14:15:05 +10:00
Jim Ferenczi	043c1f5d42	Unified highlighter should respect no_match_size with number_of_fragments set to 0 (#41069 ) The unified highlighter returns the first sentence of the text when number_of_fragments is set to 0 (full highlighting). This is a legacy of the removed postings highlighter that was based on sentence break only. This commit changes this behavior in order to respect the provided no_match_size value when number_of_fragments is set to 0. This means that the behavior will be consistent for any value of the number_of_fragments option. Closes #41066	2019-04-16 19:25:25 +02:00
Armin Braun	c4e84e2b34	Add Bulk Delete Api to BlobStore (#40322 ) (#41253 ) * Adds Bulk delete API to blob container * Implement bulk delete API for S3 * Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs * Closes #40250	2019-04-16 17:19:05 +02:00
Jim Ferenczi	c22a2cea12	BlendedTermQuery should ignore fields that don't exist in the index (#41125 ) Today the blended term query detects if a term exists in a field by looking at the term statistics in the index. However the value to indicate that a term has no occurrence in a field has changed in Lucene. A non-existing term now returns a doc and total term frequency of 0. Because of this discrepancy the blended term query picks 0 as the minimum frequency for a term even if other fields have documents for this term. This confuses the term queries that the blending creates since some of them contain a custom state that indicates a frequency of 0 even though the term has some occurrence in the field. For these terms an exception is thrown because the term query always checks that the term state's frequency is greater than 0 if there are documents associated with it. This change fixes this bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the requested fields. Closes #41118	2019-04-16 16:25:42 +02:00
David Turner	8577bbd73b	Inline TransportReplAct#createReplicatedOperation (#41197 ) `TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation` exists so it can be overridden in tests. This commit re-works these tests to use a real `ReplicationOperation` and inlines the now-unnecessary method. Relates #40706.	2019-04-16 13:36:29 +01:00
David Turner	10e58210a0	Validate cluster UUID when joining Zen1 cluster (#41063 ) Today we fail to join a Zen2 cluster if the cluster UUID does not match our own, but we do not perform the same validation when joining a Zen1 cluster. This means that a Zen2 node will pass join validation and be added to a Zen1 cluster but will reject all cluster states from the master. Relates #37775	2019-04-16 12:49:47 +01:00
Nhat Nguyen	8ee84f2268	Correct flush parameters in engine test Since #40213, we forbid a combination of flush parameters: force=true and wait_if_ongoing=false. Closes #41236	2019-04-16 05:04:31 -04:00
Christoph Büscher	f8161ffa88	Fix some `range` query edge cases (#41160 ) Currently we throw an error when a range query's minimum value exceeds the maximum value due to the fact that they are neighbouring values and both upper and lower value are excluded from the interval. Since this is a condition that the user usually doesn't specify consciously (at least in the case of float and double values it's difficult to see which values are adjacent) we should ignore those "wrong" intervals and create a MatchNoDocsQuery in those cases. We should still throw errors with an actionable message if the user specifies the query interval in a way that min value > max value. This PR adds those checks and tests for those cases. Closes #40937	2019-04-16 10:56:13 +02:00
Tim Brooks	ad3b7abaa3	Deprecate old transport settings (#41229 ) This is related to #36652. We intend to remove a number of old transport settings in 8.0. This commit deprecates those settings for 7.x.	2019-04-15 21:43:09 -06:00
Tim Brooks	56c00eecbc	Remove string usages of old transport settings (#41207 ) This is related to #36652. We intend to deprecate a number of transport settings in 7.x and remove them in 8.0. This commit removes the string usages of these settings.	2019-04-15 16:54:24 -06:00
Zachary Tong	f19b052e03	Better error messages when pipelines reference incompatible aggs (#40068 ) Pipelines require single-valued agg or a numeric to be returned. If they don't get that, they throw an exception. Unfortunately, this exception text is very confusing to users because it usually arises from pathing "through" multiple terms aggs. The final target is a numeric, but it's the intermediary aggs that cause the problem. This commit adds the current agg name to the exception message so the user knows which "level" is the issue.	2019-04-15 10:35:53 -04:00
Jim Ferenczi	d30fec4914	Full text queries should not always ignore unmapped fields (#41062 ) Full text queries ignore unmapped fields since https://github.com/elastic/elasticsearch/issues/41022 even if all fields in the query are unmapped. This change makes sure that we ignore unmapped fields only if they are mixed with mapped fields and returns a MatchNoDocsQuery otherwise. Closes #41022	2019-04-15 12:16:50 +02:00
Christoph Büscher	2980a6c70f	Clarify some ToXContent implementations behaviour (#41000 ) This change adds either ToXContentObject or ToXContentFragment to classes directly implementing ToXContent currently. This helps in reasoning about whether those implementations output full xcontent object or just fragments. Relates to #16347	2019-04-15 09:42:08 +02:00
Yogesh Gaikwad	e7375368d6	Remove nested loop in IndicesStatsResponse (#40988 ) (#41138 ) This commit removes nested loop in `getIndices`.	2019-04-13 04:36:29 +10:00
Ignacio Vera	8af930c468	Improve error message when polygons contain the same point twice in non-consecutive positions (#41051 ) (#41133 ) When a polygon contains a self-intersection due to having the same point twice in non-consecutive positions, the polygon builder tries to split the polygon. During the split one of the polygons becomes invalid as it is not closed and an error is thrown which is not related to the real issue. We detect this situation now and throw a more meaningful error.	2019-04-12 09:16:33 +02:00
Nhat Nguyen	e9999dfa1d	Init global checkpoint after copy commit in peer recovery (#40823 ) Today a new replica of a closed index does not have a safe commit invariant when its engine is opened because we won't initialize the global checkpoint on a recovering replica until the finalize step. With this change, we can achieve that property by creating a new translog with the global checkpoint from the primary at the end of phase 1.	2019-04-11 22:18:31 -04:00
Antonio Matarrese	79c7a57737	Use the breadth first collection mode for significant terms aggs. (#29042 ) This helps avoid memory issues when computing deep sub-aggregations. Because it should be rare to use sub-aggregations with significant terms, we opted to always choose breadth first as opposed to exposing a `collect_mode` option. Closes #28652.	2019-04-11 15:56:02 -07:00
Nhat Nguyen	0f496842fd	Fix msu assertion in restore shard history test Since #40249, we always reinitialize max_seq_no_of_updates to max_seq_no when a promoting primary restores history regardless of whether it did rollback previously or not. Closes #40929	2019-04-11 18:44:13 -04:00
Ryan Ernst	5cdd87deb7	Remove settings members from Node (#40811 ) This commit removes the settings member variable from Node. This member made it confusing which settings should actually be looked at. Now all settings are accessed through the final environment.	2019-04-11 13:59:54 -07:00
David Turner	b522de975d	Move primary term from replicas proxy to repl op (#41119 ) A small refactoring that removes the primaryTerm field from ReplicasProxy and instead passes it directly in to the methods that need it. Relates #40706.	2019-04-11 21:19:27 +01:00
Armin Braun	233df6b73b	Make Transport Shard Bulk Action Async (#39793 ) (#41112 ) This is a dependency of #39504 Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update will trigger a new task in the `write` queue instead. This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines. The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes is then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down. Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.	2019-04-11 16:01:52 +02:00
Jason Tedor	24446ceae0	Add packaging to cluster stats response (#41048 ) This commit adds a packaging_types field to the cluster stats response that outlines the build flavors and types present in a cluster.	2019-04-10 13:47:19 -04:00
Zachary Tong	e611334b2b	Add 7.0.1 version constant	2019-04-10 11:32:53 -04:00
Dimitrios Liappis	799541e068	Mute DateTimeUnitTests.testConversion (#40738 ) Due to #39617 Backport of #40086	2019-04-10 16:37:16 +03:00
Jim Ferenczi	4263a28039	Fix rewrite of inner queries in DisMaxQueryBuilder (#40956 ) This commit implements missing rewrite for the DisMaxQueryBuilder. Closes #40953	2019-04-10 11:38:16 +02:00
Jason Tedor	3aae98f922	Add debug logging for leases sync on recovery test This commit adds some debug logging for a retention leases sync on recovery test.	2019-04-09 22:59:22 -04:00
Julie Tibshirani	d38214060e	Mute ClusterDisruptionIT#testCannotJoinIfMasterLostDataFolder. Tracked in #41047.	2019-04-09 17:36:21 -07:00
Julie Tibshirani	a417905098	Mute RareClusterStateIT#testDelayedMappingPropagationOnPrimary as we await a fix. Tracked in #41030.	2019-04-09 13:41:53 -07:00
Julie Tibshirani	a0fc2461d7	Mute DedicatedClusterSnapshotRestoreIT#testSnapshotWithStuckNode as we await a fix.	2019-04-09 12:03:33 -07:00
Mark Vieira	1287c7d91f	[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978 ) (#40993 ) * Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions. (cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6) * Fix forking JVM runner * Don't bump shadow plugin version	2019-04-09 11:52:50 -07:00
Jason Tedor	321f93c4f9	Wait for all listeners in checkpoint listeners test It could be that we try to shutdown the executor pool before all the listeners have been invoked. It can happen that one was not invoked if it timed out and was in the process of being notified that it timed out on the executor. If we do this shutdown then, a listener will be met with rejected execution exception. To address this, we first wait until all listeners have been notified (or timed out) before proceeding with shutting down the executor. Relates #40970	2019-04-09 14:27:09 -04:00
Henning Andersen	c5a77e5d8c	Node repurpose tool docs (#40525 ) Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used. Co-Authored-By: David Turner <david.turner@elastic.co>	2019-04-09 15:07:37 +02:00
David Turner	08ecdfe20e	Short-circuit rebalancing when disabled (#40966 ) Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled, but we still execute `balanceByWeights()` and perform some rather expensive calculations before discovering that we cannot rebalance any shards. In a large cluster this can make cluster state updates occur rather slowly. With this change we check earlier whether rebalancing is globally disabled and, if so, avoid the rebalancing process entirely. Relates #40942 which was reverted because of egregiously faulty tests.	2019-04-09 07:59:52 +01:00
Nhat Nguyen	69421612e5	Mute testRecoverMissingAnalyzer Tracked at #40867	2019-04-08 22:47:20 -04:00
Nhat Nguyen	713e5c987b	Adjust init map size of user data of index commit (#40965 ) The number of user data attributes of an index commit has increased from 6 to 8, but we forgot to adjust. This change increases the initial size of that map to avoid resizing.	2019-04-08 22:47:20 -04:00
Christoph Büscher	335955b874	Some internal refactorings in AnalysisRegistry (#40609 ) Reducing some methods scope and marking them as static where possible. Removing "alias" support from AnalysisRegistry#produceAnalyze and changing that method to return a NamedAnalyzer instead of having a side effect on the analyzer map passed in. Also, CustomAnalyzerProvider doesn't seem to need the `environment` field.	2019-04-08 20:48:34 +02:00
David Turner	8eef92fafd	Revert "Short-circuit rebalancing when disabled (#40942 )" This reverts commit `f78e6ef73b`.	2019-04-08 15:58:56 +01:00
David Turner	f78e6ef73b	Short-circuit rebalancing when disabled (#40942 ) Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled, but we still execute `balanceByWeights()` and perform some rather expensive calculations before discovering that we cannot rebalance any shards. In a large cluster this can make cluster state updates occur rather slowly. With this change we check earlier whether rebalancing is globally disabled and, if so, avoid the rebalancing process entirely.	2019-04-08 14:57:29 +01:00
Jim Ferenczi	bc0fe7d64d	Handle min_doc_freq in phrase suggester (#40840 ) The phrase suggesters have an option to remove terms that have a frequency lower than a provided min_doc_freq. However this value is overwritten by the frequency of the original term in the popular mode. This change ensures that we keep the maximum value between the provided min_doc_value and the original term frequency as a threshold to select candidates. Fixes #16764	2019-04-08 12:23:54 +02:00
Jason Tedor	4163e59768	Mute failing IndexShard local history test This test fails reliably with, so this commit mutes that test until a fix is available.	2019-04-07 10:17:46 -04:00
Jason Tedor	6900399144	Be lenient when parsing build flavor and type on the wire (#40734 ) Today we are strict when parsing build flavor and types off the wire. This means that if a later version introduces a new build flavor or type, an older version would not be able to parse what that new version is sending. For a practical example of this, we recently added the build type "docker", and this means that in a rolling upgrade scenario older nodes would not be able to understand the build type that the newer node is sending. This breaks clusters and is bad. We do not normally think of adding a new enumeration value as being a serialization breaking change, it is just not a lesson that we have learned before. We should be lenient here though, so that we can add future changes without running the risk of breaking ourselves horribly. It is either that, or we have super-strict testing infrastructure here yet still I fear the possibility of mistakes. This commit changes the parsing of build flavor and build type so that we are still strict at startup, yet we are lenient with values coming across the wire. This will help avoid us breaking rolling upgrades, or clients that are on an older version.	2019-04-06 17:24:16 -04:00
Jason Tedor	e44e84ab42	Suppress lease background sync failures if stopping (#40902 ) If the transport service is stopped, likely because we are shutting down, and a retention lease background sync fires, the logs will display a warn message and stacktrace. Yet, this situation is harmless and can happen as a normal course of business when shutting down. This commit suppresses the log messages in this case.	2019-04-06 10:18:52 -04:00
David Turner	2ff19bc1b7	Use Writeable for TransportReplAction derivatives (#40905 ) Relates #34389, backport of #40894.	2019-04-05 19:10:10 +01:00
Colin Goodheart-Smithe	4452e8e10f	Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata	2019-04-05 10:53:52 -04:00
David Turner	922a70ce32	Remove unused import Relates #40863	2019-04-05 09:21:34 +01:00
David Turner	d8956d2601	Remove test-only customisation from TransReplAct (#40863 ) The `getIndexShard()` and `sendReplicaRequest()` methods in TransportReplicationAction are effectively only used to customise some behaviour in tests. However there are other ways to do this that do not cause such an obstacle to separating the TransportReplicationAction into its two halves (see #40706). This commit removes these customisation points and injects the test-only behaviour using other techniques.	2019-04-05 08:54:41 +01:00
Nhat Nguyen	5a2eb07c0e	Primary replica resync should not send ops without seqno (#40433 ) Primary-replica resync in a mixed-cluster between 6.x and 5.6 can send operations without sequence number to a replica which already processed operations with sequence number. This leads to the failure of that replica for we trip the sequence number assertion when writing resync operations without sequence number to translog.	2019-04-04 21:54:31 -04:00
Colin Goodheart-Smithe	402f312c5e	Adds version 6.7.2	2019-04-04 16:35:39 +01:00
Nhat Nguyen	2756a3936b	Reject illegal flush parameters (#40213 ) This change rejects an illegal combination of flush parameters where force is true, but wait_if_ongoing is false. This combination is trappy and should be forbidden. Closes #36342	2019-04-04 09:02:31 -04:00
Nhat Nguyen	c4960ad736	Ensure flush happens before closing an index (#40184 ) If there's an ongoing flush triggered by the translog flush threshold, we may fail to execute a flush because waitIfOngoing is false by default. Relates to #36342	2019-04-04 09:02:31 -04:00
Nhat Nguyen	e716b9ceee	Ensure no scheduled refresh in testPendingRefreshWithIntervalChange If a refresh, which is scheduled by the setting change, executes after the index-2 operation and wins the refresh race (i.e., maybeRefresh) with the scheduledRefresh that we are going to check, then the latter will return false. Closes #39565 Relates #39462 PR #40387	2019-04-04 09:02:31 -04:00
Adrien Grand	670e76669c	Fix alias resolution runtime complexity. (#40263 ) (#40788 ) A user reported that the same query that takes ~900ms when querying an index pattern only takes ~50ms when only querying indices that have matches. The query is a date range query and we confirmed that the `can_match` phase works as expected. I was able to reproduce this issue locally with a single node: with 900 1-shard indices, a query to an index pattern that matches all indices runs in ~90ms while a query to the only index that has matches runs in 0-1ms. This ended up not being related to the `can_match` phase but to the cost of resolving aliases when querying an index pattern that matches lots of indices. In that case, we first resolve the index pattern to a list of concrete indices and then for each concrete index, we check whether it was matched through an alias, meaning we might have to apply alias filters. Unfortunately this second per-index operation runs in linear time with the number of matched concrete indices, which means that alias resolution runs in O(num_indices^2) overall. So queries get quadratically slower as an index pattern matches more indices. I reorganized alias resolution into a one-step operation that runs in linear time with the number of matched indices, and then a per-index operation that runs in linear time with the number of aliases of this index. This makes alias resolution run in O(num_indices * num_aliases_per_index) overall instead. When testing the scenario described above, the `took` went down from ~90ms to ~10ms. It is still more than the 0-1ms latency that one gets when only querying the single index that has data, but still much better than what we had before. Closes #40248	2019-04-04 11:40:42 +02:00
Adrien Grand	f5f5c3e429	Add unit test for MetaDataMappingService with typeless put mapping. (#40578 ) (#40720 ) This is currently only tested via REST tests. Closes #37450	2019-04-04 10:07:55 +02:00
Ryan Ernst	a28d5f35d9	Fix geo points missing test (#40704 ) This commit initializes the geo points for the missing doc values test. fixes #40684	2019-04-03 16:48:09 -07:00
Mayya Sharipova	a94e9500ac	Correct bug in ScriptDocValues (#40488 ) If a field `field_name` was missing in a document, doc['field_name'].get(0) incorrectly retrieved a value of the previously accessed document. This happened because `get(int index)` function was just accessing `values[index]` without checking the number of values - `count`. This PR fixes this.	2019-04-03 16:47:59 -07:00
Yannick Welsch	6ae7d593ea	Avoid background sync on relocated primary (#40800 ) There were some test failures caused by the background retention lease sync running on a relocated primary. This commit fixes the situation that triggered the assertion and reactivates the failing test. Closes #40731	2019-04-03 20:28:48 +02:00
Christoph Büscher	89389197b3	Help Eclipse inferring lambda parameter types (#40747 ) The Eclipse compiler (4.10, Photon) cannot build this test because it cannot correctly infer the type arguments of the functions. Explicitly adding them helps in this case.	2019-04-03 17:51:22 +02:00
Christoph Büscher	09ba3ec677	Small refactorings to analysis components (#40745 ) This change adds the following internal refactorings: * wraps input analyzers into an unmodifiable map in IndexAnalyzers ctor * removes duplicated indexSetting in IndexAnalyzers * removes references to IndexAnalyzers from DocumentMapperParser and TypeParser.ParserContext. It can always be retrieved from MapperService directly in those cases	2019-04-03 14:22:16 +02:00
David Turner	1d2bc85586	Inline TransportReplAction#registerRequestHandlers (#40762 ) It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates #40706	2019-04-03 12:12:26 +01:00
Jason Tedor	df65e46d10	Deprecate versions of Java prior to Java 11 (#40756 ) This commit deprecates versions of Java prior to Java 11. This commit will cause a warning to be printed to standard error when any command line tool is invoked, or when Elasticsearch is started. Additionally, we log a deprecation message when Elasticsearch is started.	2019-04-03 06:39:40 -04:00
David Turner	e64524c46f	Remove some abstractions from `TransportReplicationAction` (#40706 ) `TransportReplicationAction` is a rather complex beast, and some of its concrete implementations do not need all of its features. More specifically, it (a) chases a primary around the cluster until it manages to pin it down and then (b) executes an action on that primary and all its replicas. There are some actions that are coordinated by the primary itself, meaning that there is no need for the chase-the-primary phases, and in the case of peer recovery retention leases and primary/replica resync it is important to bypass these first phases. This commit is a step towards separating the `TransportReplicationAction` into these two parts. It is a mostly mechanical sequence of steps to remove some abstractions that are no longer in use.	2019-04-03 09:08:29 +01:00
Simon Willnauer	dd624c31b0	Don't mark shard as refreshPending on stats fetching (#40458 ) Completion and DocStats are pulled from internal readers instead of external since #33835 and #33847 which doesn't require us to refresh after a stats call since refreshes will happen internally anyhow and that will cause updated stats on ongoing indexing.	2019-04-02 16:15:30 +02:00
David Turner	6f00952abd	Use TAR instead of DOCKER build type before 6.7.0 (#40723 ) In 6.7.0 (#39378) we added a build type of DOCKER for the docker images, but unfortunately earlier versions do not understand this and will reject any transport messages that mention this build type. This commit fixes this by reporting TAR instead of DOCKER when talking to older nodes. Relates (but does not fix) #40511 Relates #39378	2019-04-02 13:17:50 +01:00
Alexander Reelsen	c644fbfc6e	Allow single digit milliseconds in strict date parsing (#40676 ) In order to remain compatible with the existing joda based implementation the parsing of milliseconds should support parsing single digits instead of relying on three, even with strict formats. This adds a few tests to duel against the existing joda based implementation in order to ensure the parsing behaviour is the same. Closes #40403	2019-04-02 10:27:50 +02:00
Andrey Ershov	287e334ef3	Do not perform cleanup if Manifest write fails with dirty exception (#40519 ) Currently, if Manifest write is unsuccessful (i.e. WriteStateException is thrown) we perform cleanup of newly created metadata files. However, this is wrong. Consider the following sequence (caught by CI here https://github.com/elastic/elasticsearch/issues/39077): - cluster global data is written successfully - the associated manifest write fails (during the fsync, i.e. files have been written) - deleting (revert) the manifest files fails, metadata is therefore persisted - deleting (revert) the cluster global data is successful In this case, when trying to load metadata (after node restart because of dirty WriteStateException), the following exception will happen ``` java.io.IOException: failed to find global metadata [generation: 0] ``` because the manifest file is referencing a missing global metadata file. This commit checks whether the thrown WriteStateException is dirty and, if it is, we don't perform any cleanup, because a new Manifest file might have been created but its deletion has failed. In the future, we might add a more fine-grained check - perform the clean up if WriteStateException is dirty, but Manifest deletion is successful. Closes https://github.com/elastic/elasticsearch/issues/39077 (cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)	2019-04-01 12:52:32 +03:00
Jim Ferenczi	7cc79123df	Fix merging of text field mapper (#40627 ) On mapping updates the `text` field mapper does not update the field types for the underlying prefix and phrase fields. In practice this shouldn't be considered as a bug but we have an assert in the code that checks that field types in the mapper service are identical to the ones present in field mappers.	2019-04-01 08:41:42 +02:00
Jason Tedor	cebe509460	Fix bug in detecting use of bundled JDK on macOS This commit fixes a bug in detecting the use of the bundled JDK on macOS. This bug arose because the path of Java home is different on macOS.	2019-03-31 19:43:17 -04:00
Henning Andersen	92d07e9377	Geo Point parse error fix (#40447 ) When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to #17617	2019-03-29 17:39:12 +01:00
Luca Cavanna	a0b02ce6ef	Move top-level pipeline aggs out of QuerySearchResult (#40319 ) As part of #40177 we have added top-level pipeline aggs to `InternalAggregations`. Given that `QuerySearchResult` holds an `InternalAggregations` instance, there is no need to keep on setting top-level pipeline aggs separately. Top-level pipeline aggs can then always be transported through `InternalAggregations`. Such change is made in a backwards compatible manner.	2019-03-29 17:01:14 +01:00
Oghenovo Usiwoma	444b4c4136	Improve error message for absence of indices (#39789 ) "no indices exist" has been added to the error message for absence of indices	2019-03-29 17:01:14 +01:00
Luca Cavanna	48b0deef4f	Remove throws IOException from PipelineAggregationBuilder#create (#40222 ) IOException are never thrown in any of the existing pipeline aggregation builders. Removing the throws IOException from the create method allows to remove it also from a couple of other methods which ends up simplifying AggregationPhase (one less catch).	2019-03-29 17:01:14 +01:00
Jason Tedor	585f38787c	Add usage indicators for the bundled JDK (#40616 ) This commit adds indications whether or not a distribution is from the bundled JDK, and whether or not we are using the bundled JDK.	2019-03-29 08:25:32 -04:00
Martijn van Groningen	be31800154	Update ingest jdocs that a null return value will drop the current document. (#40359 )	2019-03-29 09:46:54 +01:00
Jason Tedor	7255562afd	Add start and stop time to cat recovery API (#40378 ) The cat recovery API is incredibly useful. Yet it is missing the start and stop time as an option from the output. This commit adds these as options to the cat recovery API. We elect to make these not visible by default to avoid breaking the output that users might rely on.	2019-03-28 16:23:37 -04:00
Mayya Sharipova	24755209b4	Add randomScore function in script_score query (#40186 ) To make script_score query to have the same features as function_score query, we need to add randomScore function. This function produces different random scores on different index shards. It is also able to produce random scores based on the internal Lucene Document Ids.	2019-03-28 13:23:47 -04:00
David Turner	1a3916a8de	Optimise rejection of out-of-range `long` values (#40325 ) Today if you try and insert a very large number like `1e9999999` into a long field we first construct this number as a `BigDecimal`, convert this to a `BigInteger` and then reject it because it is out of range. Unfortunately making such a large `BigInteger` is rather expensive. We can avoid this expense by performing a (weaker) range check on the `BigDecimal` representation of incoming `long`s too. Relates #26137 Closes #40323	2019-03-28 12:27:34 +00:00
David Turner	073b13f5b0	Add docs for cluster.remote..proxy setting (#40281 ) In #33062 we introduced the `cluster.remote..proxy` setting for proxied connections to remote clusters, but left it deliberately undocumented since it needed followup work so that it could work with SNI. However, since #32517 is now closed we can add this documentation and remove the comment about its lack of documentation.	2019-03-28 12:11:24 +00:00
jimczi	8775e37d03	Fix SearchResponseMerger#testMergeSearchHits This commit fixes an edge case in tests where search hits are empty after the merge but some shards returned hits. This can happen if the total number of merged hits is less than the provided `from`. Closes #40553	2019-03-28 09:57:21 +01:00
Adrien Grand	65a35c985c	Remove type from VersionConflictEngineException. (#37490 ) (#40514 ) It initially mentioned the type in the exception because the type used to be required to uniquely identify a document. This is not necessary anymore given that indices have at most one type.	2019-03-28 09:32:09 +01:00
Adrien Grand	2326a3dccb	Remove String interning from `o.e.index.Index`. (#40350 ) (#40517 ) `Index` interns its name and uuid. My guess is that the main goal is to avoid having duplicate strings in the representation of the cluster state. However I doubt it helps much given that we have many other objects in the cluster state that we don't try to reuse, and interning has some cost. When looking into #40263 my profiler pointed to string interning because of the `Index` object that is created in `QueryShardContext` as one of the bottlenecks of the `can_match` phase.	2019-03-28 09:31:42 +01:00
Andy Bristol	23395a9b9f	search as you type fieldmapper (#35600 ) Adds the search_as_you_type field type that acts like a text field optimized for as-you-type search completion. It creates a couple of subfields that analyze the indexed terms as shingles, against which full terms are queried, and a prefix subfield that analyzes terms as the largest shingle size used and edge-ngrams, against which partial terms are queried. Adds a match_bool_prefix query type that creates a boolean clause of a term query for each term except the last, for which a boolean clause with a prefix query is created. The match_bool_prefix query is the recommended way of querying a search as you type field, which will boil down to term queries for each shingle of the input text on the appropriate shingle field, and the final (possibly partial) term as a term query on the prefix field. This field type also supports phrase and phrase prefix queries however	2019-03-27 13:29:13 -07:00
Benjamin Trent	6563dc7ed9	Muting test for #40553 (#40555 )	2019-03-27 14:52:12 -05:00
Tim Brooks	ab44f5fd5d	Add InboundHandler for inbound message handling (#40430 ) This commit adds an InboundHandler to handle inbound message processing. With this commit, this code is moved out of the TcpTransport. Additionally, finer grained unit tests are added to ensure that the inbound processing works as expected	2019-03-27 12:33:26 -06:00
Yannick Welsch	64b31f44af	No mapper service and index caches for replicated closed indices (#40423 ) Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with full indexing and search capabilities allocated. We can save on a lot of heap memory for those indices by not allocating a mapper service and caching infrastructure (which preallocates a constant amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed metricbeat indices (each index with one shard). After this change, the same instance can host 7300 replicated closed metricbeat instances (not that this would be a recommended configuration). Most of the remaining memory is in the cluster state and the IndexSettings object.	2019-03-27 19:04:24 +01:00
Yannick Welsch	8f7c5732f1	Use default discovery implementation for single-node discovery (#40036 ) Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with two small adaptions: - auto-bootstrapping, but requiring initial_master_nodes not to be set. - not actively pinging other nodes using the Peerfinder - not allowing other nodes to join its single-node cluster (if they have e.g. been set up using regular discovery and connect to the single-disco node).	2019-03-27 19:04:24 +01:00
Tim Brooks	3860ddd1a4	Move outbound message handling to OutboundHandler (#40336 ) Currently there are some components of message serializer and sending that still occur in TcpTransport. This commit makes it possible to send a message without the TcpTransport by moving all of the remaining application logic to the OutboundHandler. Additionally, it adds unit tests to ensure that this logic works as expected.	2019-03-27 11:47:36 -06:00
David Turner	707d40ce06	Stabilise testStaleMasterNotHijackingMajority (#40253 ) This test inadvertently asserts that the election occurs after a master failure is clean. However, messy elections are a fact of life so we should not fail on a messy election. This change moves this test away from an `AbstractDisruptionTestCase` since it does not need the fault detector to be so enthusiastic, and weakens the assertions to merely say that we ignore states published by the old master without saying anything about the cleanliness of the election. Closes #36556	2019-03-27 16:00:14 +00:00
Tim Brooks	760cfffe4b	Move TransportMessageListener to TransportService (#40474 ) Currently the TransportMessageListener is applied and used in the Transport class. However, local requests and responses never make it to this class. This PR moves the listener add/remove methods to the TransportService. After this change the Transport can only have one listener set with it. This one listener is the TransportService, which will then propagate the events to the external listeners. Additionally this commit backports #40237 Remove Tracer from MockTransportService	2019-03-27 09:24:20 -06:00
Przemyslaw Gomulka	65f01277ed	Parse composite patterns using ClassicFormat.parseObject backport(#40100 ) (#40501 ) Java-time fails parsing composite patterns when the first pattern matches only the prefix of the input. It expects patterns in longest to shortest order. Because of this, constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance, parsing should not throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes #39916 backport #40100	2019-03-27 13:51:44 +01:00
Yannick Welsch	b4b17e16e0	Remove TransportSingleItemBulkWriteAction as replication action (#40424 ) The implementation of TransportIndexAction and TransportDeleteAction as TransportReplicationAction existed for interoperability with older 5.x nodes, as these older nodes coordinated single index / deletes as replication requests. This BWC layer is no longer needed in 7.x, where these single actions are now mapped to bulk requests. Completely removing the deprecated transport actions is not possible yet if we want to keep BWC with a 6.x transport client. The best way here is to wait for the transport client to go away and then just remove the actions.	2019-03-27 13:16:58 +01:00
Jim Ferenczi	fe05a4d511	Fix random failures in SearchResponseMerger#testMergeSearchHits (#40223 ) This commit fixes the expectation in the test when the search hits are empty. Closes #40214	2019-03-27 11:17:10 +01:00
alex101101	fb8ad0cf30	Add a soft limit to the field name length (#40309 ) Adds an optional limit to the length of field names, throws an IllegalArgumentException if the limit is breached. Closes #33651	2019-03-26 17:58:32 +01:00
Daniel Mitterdorfer	f2b5960f90	Add version 6.7.1	2019-03-26 17:38:14 +01:00
Yannick Welsch	bf7b167bba	Remove timeout task after completing cluster state publication (#40411 ) Each cluster state publication schedules a cancellation task with the provided publication timeout (30s by default). This scheduled cancellation keeps a reference to the publication, and therefore the full cluster state that was published. In case of frequently updating a large cluster state, this results in a large number of cancellation tasks keeping references to all previously published cluster states.	2019-03-26 17:13:57 +01:00
Henning Andersen	bf444b9f02	Store Pending Deletions Fix (#40345 ) FilterDirectory.getPendingDeletions does not delegate, fixed temporarily by overriding in StoreDirectory. This in turn caused duplicate file name use after a trimUnsafeCommits had been done, since a new IndexWriter would not consider the pending deletes in IndexFileDeleter. This should only happen on windows (AFAIK). Reenabled doing index updates for all tests using IndexShardTests.indexOnReplicaWithGaps (which could fail due to above when using mocked WindowsFS). Added getPendingDeletions delegation to all elasticsearch FilterDirectory subclasses that were not trivial test-only overrides to minimize the risk of hitting this issue in another case.	2019-03-26 15:30:44 +01:00
Alan Woodward	12634850d6	IntervalQueryBuilderTests#testNonIndexedFields test fix (#40418 ) This test checks that interval queries constructed against a field with no indexed positions will throw exceptions. It uses a randomly-built IntervalsSourceProvider against a fixed set of fields; however, the random source builder can occasionally provide a source with a fixed field, meaning that even if the top-level query asks for a set of intervals over a non-indexed field, the source will delegate to another field, and no exception will be thrown. This commit changes the test to always use a simple Match provider. Fixes #40436	2019-03-26 08:33:42 +00:00
Nhat Nguyen	495dc11c9c	Mute testPendingRefreshWithIntervalChange Tracked at #39565	2019-03-25 11:47:08 -04:00
Armin Braun	3968d46a17	Remove Redundant Request Wrappers from RepositoryService (#40192 ) (#40404 )	2019-03-25 16:36:02 +01:00
Armin Braun	dc5ff0fffc	Log Warning on Failed Blob Deletes in BlobStoreRepository (#40188 ) (#40340 ) * Log Warning on Failed Blob Deletes in BlobStoreRepository * We should not just debug log these spots, they all can and will lead to leaked files when snapshot deletion fails	2019-03-25 08:52:09 +01:00
Nhat Nguyen	b9f96a8e1f	Expose external refreshes through the stats API (#38643 ) Right now, the stats API only provides refresh metrics regarding internal refreshes. This isn't very useful and somewhat misleading for cluster administrators since the internal refreshes are not indicative of documents being available for search. In this PR I added a new metric for collecting external refreshes as they occur and exposing them through the stats API. Now, calling an endpoint for stats will yield external refresh metrics as well. Relates #36712	2019-03-24 22:21:00 -04:00
Armin Braun	13d76239a0	Use Netty ByteBuf Bulk Operations for Faster Deserialization (#40158 ) (#40339 ) * Use bulk methods to read numbers faster from byte buffers	2019-03-24 19:08:51 +01:00
Jason Tedor	10bbb082a4	Only run retention lease actions on active primary (#40386 ) In some cases, a request to perform a retention lease action can arrive on a primary shard before it is active. In this case, the primary shard would not yet be in primary mode, tripping an assertion in the replication tracker. Instead, we should not attempt to perform such actions on an initializing shard. This commit addresses this by not returning the primary shard in the single shard iterator if the primary shard is not yet active.	2019-03-23 09:39:39 -04:00
Zachary Tong	78f737dad3	Map value field to double in MovavgIT (#40230 ) We were accidentally not mapping the index, which meant dynamic mapping was choosing floats for the values. This led to enough loss of precision for the aggregated values to differ slightly from the test doubles, which accumulated into large differences in the holt output. This test fix adds an explicit mapping.	2019-03-21 14:03:14 -04:00
Jason Tedor	1e6941b138	Reduce retention lease sync intervals (#40302 ) This commit adjusts the frequency with which CCR renews retention leases and with which primaries sync retention leases to replicas. This helps Lucene reclaim soft-deleted documents more aggressively, which we have found in some use-cases can help improve performance, and either way will help keep disk space under more control.	2019-03-21 07:37:44 -04:00
Alan Woodward	83d2870308	Add `use_field` option to intervals query (#40157 ) This is the equivalent of the `field_masking_span` query, allowing users to merge intervals from multiple fields - for example, to search for stemmed tokens near unstemmed tokens.	2019-03-20 16:26:04 +00:00
Like	6f64267626	Make setting index.translog.sync_interval be dynamic (#37382 ) Currently, we cannot update the index setting index.translog.sync_interval while the index is open, because it is not dynamic and can only be updated on a closed index. Closes #32763	2019-03-20 17:12:45 +01:00
Yannick Welsch	a5fb7fb17c	Fix snapshot restore logging on fresh restore (#40252 ) A recent refactoring (#37130) where imports got mixed up (changing Lucene's IndexNotFoundException to Elasticsearch's IndexNotFoundException) led to many warnings being logged in case of restoring a fresh snapshot.	2019-03-20 16:51:44 +01:00
Jim Ferenczi	3400483af4	Add date and date_nanos conversion to the numeric_type sort option (#40199 ) (#40224 ) This change adds an option to convert a `date` field to nanoseconds resolution and a `date_nanos` field to millisecond resolution when sorting. The resolution of the sort can be set using the `numeric_type` option of the field sort builder. The conversion is done at the shard level and is restricted to dates from 1970 to 2262 for the nanoseconds resolution in order to avoid numeric overflow.	2019-03-20 16:50:28 +01:00
Nhat Nguyen	efaf95628b	Use separate translog dir in testDeleteWithFatalError This test currently opens a new engine but shares the same translog directory of the previous opening engine.	2019-03-20 10:22:27 -04:00
Mayya Sharipova	49a7c6e0e8	Expose proximity boosting (#39385 ) (#40251 ) Expose DistanceFeatureQuery for geo, date and date_nanos types Closes #33382	2019-03-20 09:24:41 -04:00
Henning Andersen	4c2a8638ca	Cascading primary failure lead to MSU too low (#40249 ) If a replica were first reset due to one primary failover and then promoted (before resync completes), its MSU would not include changes since global checkpoint, leading to errors during translog replay. Fixed by re-initializing MSU before restoring local history.	2019-03-20 14:00:43 +01:00
Simon Willnauer	235f57989f	Return cached segments stats if `include_unloaded_segments` is true (#39698 ) Today we don't return segments stats for closed indices which makes it hard to tell how much memory such an index would require. With this change we return the statistics if requested by setting `include_unloaded_segments` to true on the rest request. Relates to #39512	2019-03-20 12:08:41 +01:00
Jason Tedor	9ce740a2eb	Modify casing in JVM home log message This makes the log message consistent with the following line that shows the JVM arguments.	2019-03-20 00:06:16 -04:00
Zachary Tong	69f5869707	Mute SearchResponseMergerTests#testMergeSearchHits Tracking issue: https://github.com/elastic/elasticsearch/issues/40214	2019-03-19 13:40:38 -04:00
David Turner	33d8738c68	Fix RareClusterStateIT on MacOS (#40203 ) Today RareClusterStateIT#testAssignmentWithJustAddedNodes fails on my Mac because it waits for the default connection timeout of 30 seconds to connect to a fake node with IP address 0.0.0.0. This connection attempt fails much more quickly on Linux so the test passes. This commit fixes this by reducing the connection timeout for this test.	2019-03-19 17:33:21 +00:00
Nhat Nguyen	a13b4bc8c5	Always fail engine if delete operation fails (#40117 ) Unlike index operations which can fail at the document level due to analyzing errors, delete operations should never fail at the document level whether soft-deletes is enabled or not. With this change, we will always fail the engine if we fail to apply a delete operation to Lucene. Closes #33256	2019-03-19 13:09:23 -04:00
Luca Cavanna	d14e79e849	Serialize top-level pipeline aggs as part of InternalAggregations (#40177 ) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive at the coordinating node as part of QuerySearchResult objects from the shards and, although we may incrementally reduce aggs (hence we may have some non-final reduce and the final one later), all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With #40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes #40059	2019-03-19 14:43:39 +01:00
Luca Cavanna	803ec46331	Skip sibling pipeline aggregators reduction during non-final reduce (#40101 ) Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips as each cluster ends up reducing its own pipeline aggregators locally while that should only be done by the CCS coordinating node later. This causes issues as after their reduction, pipeline aggs cannot be further reduced, which is what happens with CCS causing errors like "java.lang.UnsupportedOperationException: Not supported" being returned. Each coordinating node should rather honour the reduce context flag that indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone. Note that this bug affects only pipeline aggs that don't have a parent in the aggs tree, while all the others work well. Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.	2019-03-19 14:43:39 +01:00
Luca Cavanna	83f12a3d9c	CCS: skip empty search hits when minimizing round-trips (#40098 ) When minimizing round-trips, each cluster returns its own independent search response. In case sort by field and/or field collapsing were requested, when one cluster has no results to return, the information about the field that sorting was based on (SortField array) as well as the field (and the values) that collapsing was performed on are missing in the search response. That causes problems as we can't build the proper `TopDocs` instance which would need to be either `TopFieldDocs` or `CollapseTopFieldDocs`. The merge routine expects that all the top docs are of the same exact type which can't be guaranteed. Given that the problematic results are empty, hence have no impact on the final results, we can simply skip them. Relates to #32125 Closes #40067	2019-03-19 14:43:39 +01:00
Luca Cavanna	9c38fa6468	[TEST] Update TransportSearchActionTests#testShouldMinimizeRoundtrips Relates to #40044 Closes #40051	2019-03-19 14:43:38 +01:00
Luca Cavanna	07bfb4c7f7	CCS: Disable minimizing round-trips when dfs is requested (#40044 ) When using DFS_QUERY_THEN_FETCH search type, the dfs phase is run and its results are used in the query phase to make scoring accurate. When using CCS, depending on whether the DFS phase runs in the CCS coordinating node (like if all shards were local) or in each remote cluster (when minimizing round-trips), scoring will differ. This commit disables minimizing round-trips whenever DFS is requested, as it is not currently possible to ensure that scoring is accurate in that case. Relates to #32125	2019-03-19 14:43:38 +01:00
Nhat Nguyen	8dc6862b17	Unmute and trace testPendingRefreshWithIntervalChange Tracked at #39565	2019-03-19 09:07:54 -04:00
Henning Andersen	dde41cc2dd	Node repurpose tool (#39403 ) When a node is repurposed to master/no-data or no-master/no-data, v7.x will not start (see #37748 and #37347). The `elasticsearch repurpose` tool can fix this by cleaning up the problematic data.	2019-03-19 11:52:02 +01:00
Dimitris Athanasiou	95f660d577	Mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock test (#39689 ) Relates #39688	2019-03-18 15:04:26 -06:00
Henning Andersen	0b214c1bfb	Linearizability checker memory reduction (#40149 ) The cache used in linearizability checker now uses approximately 6x less memory by changing the cache from a set of (bits, state) tuples into a map from bits -> { state }. Each combination of states is kept once only, building on the assumption that the number of state permutations is small compared to the number of bits permutations. For those histories that are difficult to check we will have many bits combinations that use the same state permutations. We end up now using approximately 15 bytes per entry compared to 101 bytes before, ie. a 6x improvement, allowing us to linearizability check significantly longer histories. Re-enabled linearizability checker in CoordinatorTests, hoping above ensures we no longer run out of memory. Resolves #39437	2019-03-18 21:16:59 +01:00
Nhat Nguyen	38e9522218	Remove wait for cluster state step in peer recovery (#40004 ) We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped using it since #25692 (6.0). This change removes that action and related code in 7.x and 8.0. Relates #19287 Relates #25692	2019-03-18 15:17:21 -04:00
Nhat Nguyen	d720a64b9e	Ensure sendBatch not called recursively (#39988 ) This PR introduces AsyncRecoveryTarget which executes remote calls of peer recovery asynchronously. In this change, we also add a new assertion to ensure that method sendBatch, which sends a batch of history operations in phase2, is never called recursively on the same thread. This new assertion will also be used in method sendFileChunks.	2019-03-18 15:17:21 -04:00
Jim Ferenczi	eb540125ea	Fix IndexSearcherWrapper visibility (#39071 ) (#40145 ) This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin. This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed. Closes #30758	2019-03-18 11:33:54 +01:00
Jim Ferenczi	5b73a1bc7d	Add an option to force the numeric type of a field sort (#38095 ) (#40084 ) This change adds an option to the `FieldSortBuilder` that allows to transform the type of a numeric field into another. Possible values for this option are `long` that transforms the source field into an integer and `double` that transforms the source field into a floating point. This new option is useful for cross-index search when the sort field is mapped differently on some indices. For instance if a field is mapped as a floating point in one index and as an integer in another it is possible to align the type for both indices using the `numeric_type` option: ``` { "sort": { "field": "my_field", "numeric_type": "double" <1> } } ``` <1> Ensure that values for this field are transformed to a floating point if needed.	2019-03-18 09:32:45 +01:00
Albert Zaharovits	1b75ee0bd7	AuditTrail correctly handle ReplicatedWriteRequest (#39925 ) This fix deduplicates index names in `BulkShardRequests` and only audits the specific resolved index for every comprising `BulkItemRequest`.	2019-03-17 13:05:26 +02:00
Jason Tedor	86d1d03c37	Remove cluster state size (#40109 ) This commit removes the cluster state size field from the cluster state response, and drops the backwards compatibility layer added in 6.7.0 to continue to support this field. As calculation of this field was expensive and had dubious value, we have elected to remove this field.	2019-03-15 17:16:25 -04:00
Tim Brooks	0b50a670a4	Remove transport name from tcp channel (#40074 ) Currently, we maintain a transport name ("mock-nio", "nio", "netty") that is passed to a `TcpTransportChannel` when a request is received. The value of this name is to associate with the task when we register a task with the task manager. However, it is only possible to run ES with one transport, so having an implementation specific name is unnecessary. This commit removes the name and replaces it with the generic "transport".	2019-03-15 12:04:13 -06:00
Zachary Tong	c72feedd74	Do not allow Sampler to allocate more than maxDoc size, better CB accounting (#39381 ) The `sampler` agg creates a BestDocsDeferringCollector, which internally initializes a priority queue of size `shardSize`. This queue is populated with empty `Object` sentinels, which is roughly 16b per object. Similarly, the Diversified samplers create a DiversifiedTopDocsCollectors which internally track PQ slots with ScoreDocKeys, weighing in around 28kb. If the user sets a very abusive `shard_size`, this could easily OOM a node or cluster since these PQ are allocated up-front without any checks. This commit makes sure that when we create the collector, it cannot be larger than the maxDoc so that we don't accidentally blow up the node. We ensure the size is not greater than the overall index maxDoc. A similar treatment is done for the `maxDocsPerValue` parameter of the diversified samplers. For good measure, this also adds in some CB accounting to try and track memory usage. Finally, a redundant array creation is removed to reduce a bit of temporary memory.	2019-03-15 13:19:55 -04:00
Yannick Welsch	c74111ff8e	Reduce logging noise when stepping down as master before state recovery (#39950 ) Reduces the logging noise from the state recovery component when there are duelling elections. Relates to #32006	2019-03-15 17:24:03 +01:00
David Turner	0d152a54f8	Await all pending activity in testConnectAndDisconnect (#40037 ) We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes #40030.	2019-03-15 08:08:57 +00:00
David Turner	a323132503	Create retention leases file during recovery (#39359 ) Today we load the shard history retention leases from disk whenever opening the engine, and treat a missing file as an empty set of leases. However in some cases this is inappropriate: we might be restoring from a snapshot (if the target index already exists then there may be leases on disk) or force-allocating a stale primary, and in neither case does it make sense to restore the retention leases from disk. With this change we write an empty retention leases file during recovery, except for the following cases: - During peer recovery the on-disk leases may be accurate and could be needed if the recovery target is made into a primary. - During recovery from an existing store, as long as we are not force-allocating a stale primary. Relates #37165	2019-03-15 07:49:49 +00:00
David Turner	8d2184b315	Fix up committed configuration on fake Zen1 nodes (#40065 ) Today we test Zen1/Zen2 compatibility by running 7.x nodes with a "fake" Zen1 implementation. However this is not a truly faithful test because these nodes do know how to properly deserialize a 7.x cluster state, voting configurations and all, whereas a real Zen1 node is in 6.7 and ignores the coordination metadata. We only ever apply a cluster state that's been committed, which in Zen2 involves setting the last-committed configuration to equal the last-accepted configuration. Zen1 knows nothing about this adjustment, so it is possible for these to differ. This breaks the assertion that the cluster states are equal on all nodes after integration tests. This commit fixes this by implementing this adjustment in Zen1 before applying a cluster state. Fixes #40055.	2019-03-15 07:44:31 +00:00
Ioannis Kakavas	35aaf04c8c	Handle empty input in AddStringKeyStoreCommand (#39490 ) This change ensures that we do not make assumptions about the length of the input that we can read from the stdin. It still consumes only one line, as the previous implementation	2019-03-15 09:38:22 +02:00
Tamara Braun	e2b60c7141	Fix not Recognizing Disabled Object Mapper (#39862 ) * Fixes not finding disabled object mapper when using dotted field name notation * Closes #39456	2019-03-14 10:57:00 -07:00
Ioannis Kakavas	8dc8fc507d	Handle UTF-8 values in the keystore (#39496 ) * Handle UTF8 values in the keystore Our current implementation uses CharBuffer#array to get the chars that were decoded from the UTF-8 bytes. The backing array of CharBuffer is created in CharsetDecoder#decode and gets an initial length that is the same as the length of the ByteBuffer it decodes, hence the number of UTF-8 bytes. This works fine for the first 128 characters where each one needs one byte, but for the next UTF-8 characters (other latin alphabets, Greek, Cyrillic etc.) where we need 2 to 4 bytes per character, this backing char array has a larger size than the number of the actual chars this CharBuffer contains. Calling `array()` on it will return a char array that can potentially have extra null chars so the SecureString we get from the KeystoreWrapper is not the same as the one we entered. This commit changes the behavior to use Arrays#copyOfRange to get the necessary chars from the CharBuffer and adds a test with random (maybe not printable) UTF-8 strings	2019-03-14 18:03:50 +02:00
Jason Tedor	9181668edf	Stop returning cluster state size by default (#40016 ) Computing the compressed size of the cluster state on every invocation of cluster:monitor/state action is expensive, and the value of this field is dubious anyway. Therefore we want to remove computing this field. As a first step, we stop computing and returning this field by default. To avoid breaking users, we will give them a system property to use to tide them over until the next major release when we will actually remove this field. This comes with a deprecation warning too, and the backport to the appropriate minor will also include a note in the migration guide. There will be a follow-up to remove this field in the next major version.	2019-03-14 08:57:55 -04:00
Yogesh Gaikwad	20e5994179	Mute failing tests in NodeConnectionsServiceTests (#40034 ) (#40035 )	2019-03-14 19:40:15 +11:00
Przemyslaw Gomulka	8a314a36db	Change zone formatting for all printers backport(#39568 ) #39952 After the joda-java time migration we were formatting zone ids with zoneOrOffsetId method. This meant that when a date was provided with a ZoneRegion, for instance America/Edmonton, it was appending this zone identifier instead of the zone formatted as +HH:MM. This fix changes the format of the zone suffix for all printers and also always wraps a Temporal into a ZonedDateTime when formatting. closes #38471 backport #39568	2019-03-13 18:27:37 +01:00
Tim Brooks	352f9f1f39	Remove sizing from `Recycler#obtain` (#39975 ) Currently there is a method `Recycler#obtain(size)` that allows a size parameter to be passed. However all implementations ignore this parameter and just allocate a page size based on other settings. This commit removes this method.	2019-03-13 09:32:31 -06:00
Andrey Ershov	9300826d8a	Do not log unsuccessful join attempt each time (#39756 ) When performing the test with 57 master-eligible nodes and one node crash, we saw messy elections, when multiple nodes were attempting to become master. JoinHelper has logged 105 long log messages with lengthy stack traces during one such election. To address this, we decided to log these messages every time only on debug level. We will log last unsuccessful join attempt (along with a timestamp) if any with WARN level if the cluster is failing to form. (cherry picked from commit 17a148cc27b5ac6c2e04ef5ae344da05a8a90902)	2019-03-13 13:30:31 +01:00
Christoph Büscher	b10dd3769c	Add analysis modes to restrict token filter use contexts (#36103 ) Currently token filter settings are treated as fixed once they are declared and used in an analyzer. This is done to prevent changes in analyzers that are already used actively to index documents, since changes to the analysis chain could corrupt the index. However, it would be safe to allow updates to token filters at search time ("search_analyzer"). This change introduces a new property of token filters that allows to mark them as only being usable at search or at index time. Any analyzer that uses these tokenfilters inherits that property and can be rejected if they are used in other contexts. This is a first step towards making specific token filters (e.g. synonym filter) updateable. Relates to #29051	2019-03-12 23:48:55 +01:00
Andy Bristol	e2b88bc706	add version 6.6.3	2019-03-12 13:21:36 -07:00
David Turner	049970af3e	Only connect to new nodes on new cluster state (#39629 ) Today, when applying new cluster state we attempt to connect to all of its nodes as a blocking part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states which hinders the recovery of the cluster during certain kinds of partition. If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness reasonably gracefully in the meantime. This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application. Resolves #29025.	2019-03-12 19:26:13 +00:00
Przemyslaw Gomulka	a29bba4ede	Migrate Streamable to writeable for index package backport(#37381 ) #39949 Migrate streamable classes from index package to Writeable and clean up access modifiers Related to #34389 backport#37381	2019-03-12 12:10:36 +01:00
lzh3636	ad55e5b80d	Log missing file exception when failing to read metadata snapshot (#32920 ) Adds the exception to the logged output, which contains info about the file that's missing.	2019-03-12 10:41:44 +01:00
Nhat Nguyen	ce5f09ab04	Enforce retention leases require soft deletes (#39922 ) If a primary on 6.7 and a replica on 5.6 are running more than 5 minutes (retention leases background sync interval), the retention leases background sync will be triggered, and it will trip 6.7 node due to the illegal checkpoint value. We can fix the problem by making the returned checkpoint depend on the node version. This PR, however, chooses to enforce that retention leases require soft deletes, and makes retention leases sync a noop if soft deletes is disabled instead. Closes #39914	2019-03-11 22:37:47 -04:00
Nhat Nguyen	bf814357ad	Enable soft deletes in RetentionLeaseIT Relates #39922	2019-03-11 22:37:42 -04:00
Armin Braun	9eb4614fa6	More Verbose Assertion in testSnapshotWithStuckNode (#39893 ) (#39928 ) * The test failure in #39852 is caused by a file in the initial repository when there should not be any * It seems that on a normal consistent file system no left-over file should exist ever here after the validation finishes and I can't reproduce or see any other path to a dangling file in the fresh repository => added a more verbose and strict assertion that will log what file is left over next time * Relates #39852	2019-03-11 19:27:08 +01:00
Jake Landis	b0b0f66669	Remove types from internal monitoring templates and bump to api 7 (#39888 ) (#39926 ) This commit removes the "doc" type from monitoring internal indexes. The template still carries the "_doc" type since that is needed for the internal representation. This change impacts the following templates: monitoring-alerts.json monitoring-beats.json monitoring-es.json monitoring-kibana.json monitoring-logstash.json As part of the required changes, the system_api_version has been bumped from "6" to "7" and support for version "2" has been dropped. A new empty pipeline is now introduced for the version "7", and the formerly empty "6" pipeline will now remove the type and re-direct the request to the "7" index. Additionally, due to a difference in the internal representation (which requires the inclusion of "_doc" type) and external representation (which requires the exclusion of any type) a helper method is introduced to help convert internal to external representation, and used by the monitoring HTTP template exporter. Relates #38637	2019-03-11 13:17:27 -05:00
Yannick Welsch	4f941c6963	Do not swallow exceptions in TimedRunnable (#39856 ) Executors of type fixed_auto_queue_size (i.e. search / search_throttled) wrap runnables into TimedRunnable, which is an AbstractRunnable. This is dangerous as it might silently swallow exceptions, and possibly miss calling a response listener. While this has not triggered any failures in the tests I have run so far, it might help uncover future problems. Follow-up to #36137	2019-03-11 19:03:12 +01:00
Yannick Welsch	292eb8b001	Fix CoordinatorTests.testIncompatibleDiffResendsFullState (#39345 ) This test started failing since decreasing the leader and follower check timeouts (#38298). The reason is that the test was relying on the default publication timeout to come into effect before leader / follower check timeouts, which is now not always true anymore. Closes #38867	2019-03-11 19:03:10 +01:00
Tim Brooks	dd77899278	Log send failure at debug level if channel closed (#39807 ) Currently we log exceptions due to channel close at the debug level in the normal exception handler. Currently we log all send failures due to channel close at the warn level. This commit changes that to only log at warn if the send failure is not due to channel closed. Additionally, it adds the ssl engine closed as a channel close exception.	2019-03-11 10:33:02 -06:00
Yannick Welsch	b7be724e50	Check term earlier in publication process (#39909 ) in order to avoid tripping assertPreviousStateConsistency. Closes #39314	2019-03-11 15:40:20 +01:00
David Turner	6e4f304f88	Synchronize pendingOutgoingJoins (#39900 ) Today we use a ConcurrentHashSet to track the in-flight outgoing joins in the `JoinHelper`. This is fine for adding and removing elements but not for the emptiness test in `isJoinPending()` which might return false if one join finishes just after another one starts, even though joins were pending throughout. As used today this is ok: it means the node was trying to join a master but this join attempt just finished unsuccessfully, and causes it to (rightfully) reject a `FollowerCheck` from the failed master. However this kind of API inconsistency is trappy and there is no need to be clever here, so this change replaces the set with a `synchronizedSet()`.	2019-03-11 12:13:21 +00:00
Ankit Jain	471aa6a16a	Fixing 503 Service Unavailable errors during fetch phase (#39086 ) When ESRejectedExecutionException gets thrown on the coordinating node while trying to fetch hits, the resulting exception will hold no shard failures, hence `503` is used as the response status code. In that case, `429` should be returned instead. Also, the status code should be taken from the cause if available whenever there are no shard failures instead of blindly returning `503` like we currently do. Closes #38586	2019-03-11 10:13:55 +01:00
Adrien Grand	b841de2e38	Don't emit deprecation warnings on calls to the monitoring bulk API. (#39805 ) (#39838 ) The monitoring bulk API accepts the same format as the bulk API, yet its concept of types is different from "mapping types" and the deprecation warning is only emitted as a side-effect of this API reusing the parsing logic of bulk requests. This commit extracts the parsing logic from `_bulk` into its own class with a new flag that allows to configure whether usage of `_type` should emit a warning or not. Support for payloads has been removed for simplicity since they were unused. @jakelandis has a separate change that removes this notion of type from the monitoring bulk API that we are considering bringing to 8.0.	2019-03-11 07:58:28 +01:00
Adrien Grand	2bbef67770	Propagate exceptions in o.e.common.io.Streams. (#39042 ) (#39848 ) This commit propagates some exceptions that were previously swallowed and also makes sure that exceptions closing streams are either propagated if the try block succeeded or added as suppressed exceptions otherwise.	2019-03-11 07:58:01 +01:00
Benjamin Trent	4da04616c9	[ML] refactoring lazy query and agg parsing (#39776 ) (#39881 ) * [ML] refactoring lazy query and agg parsing * Clean up and addressing PR comments * removing unnecessary try/catch block * removing bad call to logger * removing unused import * fixing bwc test failure due to serialization and config migrator test * fixing style issues * Adjusting DafafeedUpdate class serialization * Adding todo for refactor in v8 * Making query non-optional so it does not write a boolean byte	2019-03-10 14:54:02 -05:00
Julie Tibshirani	8454cfc1b2	Move validation from FieldTypeLookup to MapperMergeValidator. (#39814 ) This commit consolidates more mapping validation logic into the same class. `FieldTypeLookup` is now a bit simpler, and has the sole responsibility of quickly resolving field names to their types. I have a broader refactor planned around mapping merge validation, but this change should at least be a step in the right direction.	2019-03-08 18:05:21 -08:00
Nhat Nguyen	993182e426	Combine overriddenOps and skippedOps in translog (#39771 ) These two stats are not important enough to be distinguishable. This change combines them into a single stat. Closes #33317	2019-03-08 16:28:50 -05:00
Julie Tibshirani	be9c37fc76	Small simplifications to mapping validation. (#39777 ) These simplifications to `MapperMergeValidator` are possible now that there is always a single mapping definition. * Remove the type argument in `validateMapperStructure`. * Remove unnecessary checks against existing mappers.	2019-03-08 12:34:09 -08:00
Nhat Nguyen	a0a91f74ff	Treat TransportService stopped error as node is closing (#39800 ) If TransportService is stopped before a shard-failure request is sent but after the request is registered, TransportService will notify ReplicationOperation of a TransportException with an error message: "transport stop, action: internal:cluster/shard/failure". Relates #39584	2019-03-08 15:15:56 -05:00
Ryan Ernst	465343f12a	Bundle java in distributions (#38013 ) * Bundle java in distributions Setting up a jdk is currently a required external step when installing elasticsearch. This is particularly problematic for the rpm/deb packages as installing a jdk in the same package installation command does not guarantee any order, so must be done in separate steps. Additionally, JAVA_HOME must be set and often causes problems in selecting a correct jdk when, for example, the system java is an older unsupported version. This commit bundles platform specific openjdks into each distribution. In addition to eliminating the issues above, it also presents future possible improvements like using jlink to build jdk images only containing modules that elasticsearch uses. closes #31845	2019-03-08 11:04:18 -08:00
Gordon Brown	e6b9262a31	Mute testOpenCloseApiWildcards (#39578 ) (#39579 )	2019-03-08 15:18:16 +00:00
David Roberts	aec2db78ea	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica Due to https://github.com/elastic/elasticsearch/issues/36813	2019-03-08 13:28:27 +00:00
David Roberts	366eef99a1	Mute SharedClusterSnapshotRestoreIT.testCloseOrDeleteIndexDuringSnapshot Due to https://github.com/elastic/elasticsearch/issues/39828	2019-03-08 11:42:13 +00:00
David Turner	5d68143b18	Reformat elasticsearch-node messages (#39811 ) Flows the warning messages emitted by the `elasticsearch-node` tool to a width of 72 characters and tweaks the wording slightly.	2019-03-08 10:01:29 +00:00
Jake Landis	797d6b8a66	Execute ingest node pipeline before creating the index (#39607 ) (#39796 ) Prior to this commit (and after 6.5.0), if an ingest node changes the _index in a pipeline, the original target index would be created. For daily indexes this could create an extra, empty index per day. This commit changes the TransportBulkAction to execute the ingest node pipeline before attempting to create the index. This ensures that the only index created is the original or one set by the ingest node pipeline. This was the execution order prior to 6.5.0 (#32786). The execution order was changed in 6.5 to better support default pipelines. Specifically the execution order was changed to be able to read the settings from the index meta data. This commit also includes a change in logic such that if the target index does not exist when the ingest node pipeline runs, it will now pull the default pipeline (if one exists) from the settings of the best matching index template. Relates #32786 Relates #32758 Closes #36545	2019-03-07 13:31:41 -06:00
Jason Tedor	0250d554b6	Introduce forget follower API (#39718 ) This commit introduces the forget follower API. This API is needed in cases that unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and also prevent ILM from performing actions like shrink on the leader index. As such, we introduce an API to allow for manual removal of the shard history retention leases in this case.	2019-03-07 11:08:45 -05:00
Armin Braun	213cc6673c	Remove Dead Code in o.e.util package (#39717 ) (#39779 ) * None of this code is used so we should delete it, we can always bring it back if needed	2019-03-07 08:31:46 +01:00
Nhat Nguyen	b69affda6a	Use unwrapped cause to determine if node is closing (#39723 ) We need to unwrap and use the actual cause when determining if the node with primary shard is shutting down because TransportService will throw a TransportException wrapped in a SendRequestTransportException. Relates #39584	2019-03-06 15:30:55 -05:00
Nhat Nguyen	1fe7cb594f	Don’t ack if unable to remove failing replica (#39584 ) Today when a replicated write operation fails to execute on a replica, the primary will reach out to the master to fail that replica (and mark it stale). We then won't ack that request until the master removes the failing replica; otherwise, we will lose the acked operation if the failed replica is still in the in-sync set. However, if a node with the primary is shutting down, we might ack such request even though we are unable to send a shard-failure request to the master. This happens because we ignore NodeClosedException which is triggered when the ClusterService is being closed. Closes #39467	2019-03-06 15:30:55 -05:00
markharwood	1873de5240	Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749 ) Bug fix for AnnotatedTextHighlighter - port of 39525 Relates to #39395	2019-03-06 19:02:04 +00:00
Yannick Welsch	d094107592	Fix SharedClusterSnapshotRestoreIT Relates to #39644	2019-03-06 17:51:23 +01:00
Yannick Welsch	fef11f7efc	Allow snapshotting replicated closed indices (#39644 ) This adds the capability to snapshot replicated closed indices. It also changes snapshot requests in v8.0.0 to automatically expand wildcards to closed indices and hence start snapshotting closed indices by default. For v7.1.0 and above, wildcards are by default only expanded to open indices, which can be changed by explicitly setting the expand_wildcards option either to all or closed. Note that indices are always restored as open indices, even if they have been snapshotted as closed replicated indices. Relates to #33888	2019-03-06 16:08:20 +01:00
Simon Willnauer	e620fb2e4a	Add option to force load term dict into memory (#39741 ) Lucene added an optimization to leave the term dictionary on disk for non-id like fields. This change happened very late in the release processes such that it's better to have an escape hatch if certain use-cases are hurt by this optimization. This setting might be removed in the future if it turns out to be unnecessary.	2019-03-06 15:29:04 +01:00
Christoph Büscher	6c503824c8	Fix occasional SearchServiceTests failure (#39697 ) Currently SearchServiceTests.testCloseSearchContextOnRewriteException can fail if a refresh happens while we test for the SearchPhaseExecutionException that is thrown later in the test. The test takes the current Store#refCount and expects it to be the same after the exception is thrown. If a refresh happens in that interval however, the refCount will be different, causing the test to fail. This can be provoked e.g. by running this section in a tight loop. Switching off refresh for this test solves the issue.	2019-03-06 14:18:03 +01:00
Andrey Ershov	52fd102e23	Avoid serialising state if it was already serialised (#39179 ) When preparing the state to send to other nodes, we're serializing it for each node, despite using putIfAbsent. This commit checks if the state was already serialized for this node version before performing the potentially expensive computation. The map is not used by multiple threads, so computeIfAbsent is not needed (and could not be used here easily, because IOException could be thrown). (cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)	2019-03-06 11:54:13 +01:00
David Turner	295e39a8c8	Drop node if asymmetrically partitioned from master (#39598 ) When a node is joining the cluster we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks, and receives acknowledgements to cluster state publications, so has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.	2019-03-06 09:41:57 +00:00
David Turner	77dd711847	Tidy up GroupedActionListener (#39633 ) Today the `GroupedActionListener` accepts a `defaults` parameter but all callers pass an empty list. Also it is permitted to pass an empty group but this is trappy because the delegated listener is never called in that case. This commit removes the `defaults` parameter and forbids an empty group.	2019-03-06 09:25:10 +00:00
Armin Braun	aaecaf59a4	Optimize Bulk Message Parsing and Message Length Parsing (#39634 ) (#39730 ) * Optimize Bulk Message Parsing and Message Length Parsing * findNextMarker took almost 1ms per invocation during the PMC rally track * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search * It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes * Fixed by adding bulk `int` read to BytesReference	2019-03-06 08:13:15 +01:00
Jason Tedor	75a0d4f470	Rename retention lease setting (#39719 ) This commit renames the retention lease setting index.soft_deletes.retention.lease so that it is under the namespace index.soft_deletes.retention_lease. As such, we rename the setting to index.soft_deletes.retention_lease.period.	2019-03-05 22:04:45 -05:00
Jason Tedor	504c792861	Add Docker build type (#39378 ) This commit adds a new build type (together with deb/rpm/tar/zip) to represent the official Docker images. This build type will be displayed in APIs such as the main and nodes info APIs.	2019-03-05 22:03:15 -05:00
Luca Cavanna	9d0211485c	Tie-break completion suggestions with same score and surface form (#39564 ) In case multiple completion suggestion entries have the same score and surface form, the order in which such options will be returned is currently not deterministic. With this commit we introduce tie-breaking for such situations, based on shard id, index name, index uuid and doc id like we already do for ordinary search hits. With this change we also make shardIndex mandatory when sorting and comparing completion suggestion options, which was previously only needed later when fetching hits. Also, we need to make sure shardIndex is properly set when merging completion suggestions coming from multiple clusters in `SearchResponseMerger`.	2019-03-05 18:03:54 +01:00
Jim Ferenczi	160dc29f0e	Handle total hits equal to track_total_hits (#37907 ) This change ensures that a total hits equal to the value set for track_total_hits is not considered as a lower bound.	2019-03-05 16:28:48 +01:00
Armin Braun	750ec8ba53	Minor Cleanups in QueryPhase (#39680 ) (#39694 ) * Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`	2019-03-05 15:04:16 +01:00
Christoph Büscher	5cdea6ef17	Fix Fuzziness#asDistance(String) (#39643 ) Currently Fuzziness#asDistance(String) doesn't work for custom AUTO values. If the fuzziness is AUTO, the method returns the correct edit distance to use, depending on the input string, but for custom AUTO values it currently always returns an edit distance of 1. Correcting this and adding unit and integration tests to catch these cases. Closes #39614	2019-03-05 14:31:07 +01:00
Simon Willnauer	19f6a35358	Move BWC Version to 7.1.0 after backport Relates to #39512	2019-03-05 14:11:59 +01:00
Simon Willnauer	d112c89041	Allow inclusion of unloaded segments in stats (#39512 ) Today we have no way to fetch actual segment stats for segments that are currently unloaded. This is relevant in the case of frozen indices. This change makes it possible to monitor how much memory a frozen index would use if it were unfrozen.	2019-03-05 14:02:20 +01:00
Armin Braun	e8d9744340	Use Threadpool Time in ClusterApplierService (#39679 ) (#39685 ) * Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for #39504	2019-03-05 12:37:49 +01:00
Gordon Brown	380dc27d91	Mute testCloseWhileRelocatingShards (#39589 )	2019-03-05 13:34:43 +02:00
Alan Woodward	0b14782b23	Add stopword support to IntervalBuilder (#39637 ) The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.	2019-03-05 10:50:45 +00:00
Christoph Büscher	2fe1fa8972	Shortcut counts on exists queries (#39570 ) (#39660 ) `TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed. Closes #37475	2019-03-04 19:53:43 +01:00
Prabhakar S	98925e9a09	Fixing the custom object serialization bug in diffable utils. (#39544 ) While serializing custom objects, the length of the list is computed after filtering out the unsupported objects, but while writing objects the filter is not applied, thus resulting in writing unsupported objects which the receiver will fail to deserialize. Adding the condition to filter out unsupported custom objects.	2019-03-04 18:41:14 +01:00
Nhat Nguyen	801f13f201	Assert recovery done in testDoNotWaitForPendingSeqNo (#39595 ) Since #39006 we should be able to complete a peer-recovery without waiting for pending indexing operations. Thus, the assertion in testDoNotWaitForPendingSeqNo should be updated from false to true. Closes #39510	2019-03-04 10:21:23 -05:00
Yannick Welsch	936dbb00e3	Isolate Zen1 (#39470 ) Cherry-picks a few commits from #39466 to align 7.x with master branch.	2019-03-04 15:51:17 +01:00
Luca Cavanna	9ddaabba88	Remove private SearchHits.Total class (#39556 ) This is now possible as Lucene's `TotalHits` implements `equals`/`hashcode`, all the other methods can be in-lined in `SearchHits` instead, no need for a specific wrapper class.	2019-03-04 13:46:45 +01:00
Armin Braun	547af21a12	Introduce Mapping ActionListener (#39538 ) (#39636 ) * Introduce Safer Chaining of Listeners * The motivation here is to make reasoning about chains of `ActionListener` a little easier, by providing a safe method for nesting `ActionListener` that guarantees that a response is never dropped. Also, it dries up the code a little by removing the need to repeat `listener::onFailure` and `listener.onResponse` over and over. * Refactored a number of obvious/easy spots to use the new listener constructor	2019-03-04 12:56:46 +01:00
Daniel Mitterdorfer	fca6a2f006	Avoid deprecated API usage in TaskOperationFailure (#39303 ) (#39628 ) With this commit we remove usage of the deprecated method `ExceptionsHelper#detailedMessage` in the class `TaskOperationFailure`. Relates #19069	2019-03-04 11:37:59 +01:00
David Turner	dd68244841	Wait for state recovery in testFreshestMasterElectedAfterFullClusterRestart (#39602 ) Zen1IT#testFreshestMasterElectedAfterFullClusterRestart fails sometimes because we request the cluster state before state recovery has completed, and therefore obtain the default value for the setting we're relying on. Confusingly, we were starting out by setting this setting to its default value, so the test looked like it was failing because of a production bug. This commit avoids this confusion in future by setting it to a non-default value at the start of the test. Fixes #39586.	2019-03-04 10:26:07 +00:00
Adrien Grand	782f873165	Don't swallow exceptions in Store#close(). (#39035 ) (#39622 ) Store#close() swallows any `IOException`. Relates #39030	2019-03-04 10:58:43 +01:00
Adrien Grand	934946a232	Don't swallow exception in ThreadPool.terminate. (#39038 ) (#39623 ) The use of `closeWhileHandlingException` means that any exception while trying to close the threadpool is going to be swallowed. Relates #39030	2019-03-04 10:58:29 +01:00
Adrien Grand	21540a5ada	Enhancements to IndicesQueryCache. (#39099 ) (#39626 ) This commit adds the following: - more tests to IndicesServiceCloseTests, one of them found a bug in the order in which `IndicesQueryCache#onClose` and `IndicesService.indicesRefCount#decRef` are called. - made `IndicesQueryCache.stats2` a synchronized map. All writes to it are already protected by the lock of the Lucene cache, but the final read from an assertion in `IndicesQueryCache#close()` was not so this change should avoid any potential visibility issues. - human-readable `toString`s to make debugging easier. Relates #37117	2019-03-04 10:58:12 +01:00
Armin Braun	68bc178017	Disable Bwc Tests (#39551 ) * Disable Bwc Tests * For #39550	2019-03-04 10:41:52 +01:00
Yannick Welsch	0f65390c29	Do not mutate engine during planning step (#39571 ) This cleans up the Engine implementation by separating the sequence number generation from the planning step in the engine, so that the planning step has no side effects. This makes it easier to see that every sequence number is properly accounted for.	2019-03-04 10:11:39 +01:00
David Turner	9ec24bae80	Mute testDoNotWaitForPendingSeqNo Relates #39510, #39595.	2019-03-03 22:03:53 -05:00
Mayya Sharipova	d0e65a45a2	Add debug log for flush for IndicesRequestCacheIT (#39475 ) Add debug log when index is flushed to investigate a failure in IndicesRequestCacheIT. The "DEBUG" level is used because "TRACE" produces too much output irrelevant to this issue. Relates to #32827	2019-03-01 13:12:45 -05:00
Luca Cavanna	29e3c18713	Mute failing IndexShardIT#testPendingRefreshWithIntervalChange Relates to #39565	2019-03-01 14:55:19 +01:00
Tanguy Leroux	e005eeb0b3	Backport support for replicating closed indices to 7.x (#39506 )(#39499 ) Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. 
Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888	2019-03-01 14:48:26 +01:00
Yannick Welsch	1a50af7dd4	Do not close bad indices on startup (#39500 ) With #17187, we verified IndexService creation during initial state recovery on the master and if the recovery failed the index was imported as closed, not allocating any shards. This was mainly done to prevent endless allocation loops and full log files on data-nodes when the indexmetadata contained broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all nodes (not only the elected master), which can significantly slow down startup on data nodes. Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed will no longer prevent its shards from being allocated. Fortunately, the original issue for endless allocation loops is no longer a problem due to #18467, where we limit the retries of failed allocations. The solution here is therefore to just undo #17187, as it's no longer necessary, and covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.	2019-03-01 09:23:46 +01:00
Tal Levy	b9b46fdec6	fix UpdateSettingsRequestStreamableTests.mutateInstance (#39386 ) (#39477 ) Mutations of the timeout values were using string-representations. This resulted in very rare cases where the original timeout value was represented as something like "0ms" and the new random time-value generated was "0s". Although their string representations differ, their underlying TimeValue does not. This resulted in `-Dtests.seed=7F4C034C43C22B1B` to fail.	2019-02-28 21:02:32 -08:00
Mark Tozzi	609118c229	Override and mute InternalAutoDateHistogramTests#testReduceRandom() (#39536 ) pending resolution of #39497	2019-02-28 16:00:32 -05:00
Lee Hinman	dae48ba262	Add details about what acquired the shard lock last (#38807 ) This adds a `details` parameter to shard locking in `NodeEnvironment`. This is intended to be used for diagnosing issues such as ``` 1> [2019-02-11T14:34:19,262][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] deleting index 1> [2019-02-11T14:34:19,279][WARN ][o.e.i.IndicesService ] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] failed to delete index 1> org.elasticsearch.env.ShardLockObtainFailedException: [.tasks][0]: obtaining shard lock timed out after 0ms 1> at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:736) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:655) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:601) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:554) ~[main/:?] ``` In the hope that we will be able to determine why the shard is still locked. Relates to #30290 as well as some other CI failures	2019-02-28 10:50:47 -07:00
Armin Braun	e564c4d8ad	Add Package Level JavaDoc on Snapshots (#38108 ) (#39514 ) * Add Package Level JavaDoc on Snapshots	2019-02-28 18:23:01 +01:00
Simon Willnauer	5c96b90ed5	Never block on scheduled refresh if a refresh is running (#39462 ) Today we block on the ReferenceManager in the case of a scheduled refresh. Yet if there is a refresh happening concurrently we might block and create very smallish segments. Instead we should just move on to the next shard and free up the refresh thread instead.	2019-02-28 11:57:45 +01:00
Armin Braun	d3d7d9bb9d	Remove Dead Code + Duplication in o.e.c.routing (#36678 ) (#39493 ) * Removed obviously unused fields+methods * Inlined public methods that only had one caller * Simplified `Optional` chain * Simplified some obviously redundant conditions	2019-02-28 10:33:05 +01:00
Armin Braun	90ab4a6f6e	Stabilize RareClusterState (#38671 ) (#39468 ) * Use actual master node, not just a master eligible node when trying to cancel publication. This only works on the master and for unlucky seeds we never try the master within the 10s that the busy assert runs. * Closes #36813	2019-02-28 08:01:52 +01:00
Tanguy Leroux	4dd274b51d	Unmute CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState() (#39452 ) This commit unmutes the test and comments out the offending call to linearizabilityChecker.isLinearizable() as suggested in #39437	2019-02-27 20:38:54 +01:00
Tanguy Leroux	983b5d1c0e	Mute SpecificMasterNodesIT.testElectOnlyBetweenMasterNodes() Tracked in #38331	2019-02-27 18:00:02 +01:00
Daniel Mitterdorfer	2ccba18809	Correct name of basic_date_time_no_millis (#39367 ) (#39454 ) With this commit we correct the name of the Java time based formatter for `basic_date_time_no_millis`.	2019-02-27 17:03:50 +01:00
Alan Woodward	71b8494181	Upgrade to lucene 8.0.0-snapshot-ff9509a8df (#39444 ) Backport of #39350 Contains the following: * LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory` * LUCENE-8292: `TermsEnum` is fully abstract * LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge * LUCENE-8676: Nori tokenizer deals correctly with large buffers * LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps * LUCENE-8664: Add `equals` and `hashCode` to `TotalHits` * LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold * LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle * LUCENE-8645: `Intervals#fixField` can merge intervals from different fields * LUCENE-8585: Create jump-tables for DocValues at index time	2019-02-27 14:36:08 +00:00
Armin Braun	f675b33d50	Increase Timeout in UnicastZenPingTests (#38893 ) (#39449 ) * Just like #37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause * Closes #26701	2019-02-27 15:22:17 +01:00
Jason Tedor	55e98f08d8	Provide a clearer error message on keystore add (#39327 ) When trying to add a setting to the keystore with an upper case name, we reject with an unclear error message. This commit makes that error message much clearer.	2019-02-27 08:10:23 -05:00
Armin Braun	27485871b8	Don't Ping on Handshake Connection (#39076 ) (#39446 ) * Don't Ping on Handshake Connection * It does not make sense to run pings on the handshake connection * Set the ping interval to `-1` to deactivate pings on it	2019-02-27 13:39:25 +01:00
Tanguy Leroux	6912e27ee0	Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock() Tracked in #39172	2019-02-27 13:13:22 +01:00
David Turner	41668f7723	Move PeerFinder's logger to the expected package (#39412 ) Today the abstract `org.elasticsearch.discovery.PeerFinder` uses the logger of its implementation, which in production is in `o.e.cluster.coordination`. This turns out to be confusing and unhelpful, so with this change we move to using the logger that belongs to `PeerFinder`.	2019-02-27 08:44:05 +00:00
Armin Braun	28b771f5db	Remove Dead Code Test Infrastructure (#39192 ) (#39436 ) * Just removing some obviously unused things	2019-02-27 09:38:47 +01:00
Tim Brooks	f24dae302d	Make security tests transport agnostic (#39411 ) Currently there are two security tests that specifically target the netty security transport. This PR moves the client authentication tests into `AbstractSimpleSecurityTransportTestCase` so that the nio transport will also be tested. Additionally the work to build transport configurations is moved out of the netty transport and tested independently.	2019-02-26 18:55:19 -07:00
Nhat Nguyen	a9e86bc941	Adjust testWaitForPendingSeqNo (#39404 ) Since #39006, we should either remove `testWaitForPendingSeqNo` or adjust it not to wait for the pending operations. This change picks the latter. Relates #39006	2019-02-26 16:21:56 -05:00
Mayya Sharipova	4ca514f18c	Fix testCacheWithFilteredAlias failure (#39401 ) Move refresh after Forcemerge Relates to #32827	2019-02-26 14:11:35 -05:00
Luca Cavanna	2619f48e4d	Rename SearchRequest#withLocalReduction (#39108 ) `withLocalReduction` is confusing as `local` effectively means "local to the remote clusters" rather than "local to the coordinating node" where the method is executed. I propose we rename the method to `crossClusterSearch` which better resembles what the static method is used for.	2019-02-26 16:30:54 +01:00
Luca Cavanna	c09773a76e	Completion suggestions to be reduced once instead of twice (#39255 ) We have been calling `reduce` against completion suggestions twice, once in `SearchPhaseController#reducedQueryPhase` where all suggestions get reduced, and once more in `SearchPhaseController#sortDocs` where we add the top completion suggestions to the `TopDocs` so their docs can be fetched. There is no need to do reduction twice. All suggestions can be reduced in one call, then we can filter the result and pass only the already reduced completion suggestions over to `sortDocs`. The small important detail is that `shardIndex`, which is currently used only to fetch suggestions hits, needs to be set before the first reduction, hence outside of `sortDocs` where we have been doing it until now.	2019-02-26 11:42:02 +01:00
Yannick Welsch	d42f422258	Add linearizability checker for coordination layer (#36943 ) Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This commit adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all CoordinatorTests.	2019-02-26 08:26:55 +01:00
Nhat Nguyen	575eed8582	Bubble up exception when processing NoOp (#39338 ) Today we do not bubble up exceptions when processing NoOps but always treat them as document-level failures. This incorrect treatment causes the assert_no_failure being tripped in peer-recovery if IndexWriter was closed exceptionally before. Closes #38898	2019-02-25 17:54:45 -05:00
Nhat Nguyen	e9dda75834	Enable soft-deletes by default for 7.0+ indices (#38929 ) Today when users upgrade to 7.0, existing indices will automatically switch to soft-deletes without an opt-out option. With this change, we only enable soft-deletes by default for new indices. Relates #36141	2019-02-25 17:54:29 -05:00
Igor Motov	d5046b1c25	[CI] Fixes testQueryRandomGeoCollection failure again (#39275 ) Moves the check for tiny polygons earlier in the test. It turned out that polygons can be so tiny that we cannot even figure out their orientation. Relates to #37356	2019-02-25 16:35:17 -05:00
Evgenia Badyanova	1ed3407930	Reduce garbage from allocations in deprecation logger (#38780 ) (#39370 ) 1. Setting length for formatWarning String to avoid AbstractStringBuilder.ensureCapacityInternal calls 2. Adding extra check for parameter array length == 0 to avoid unnecessarily creating StringBuilder in LoggerMessageFormat.format Helps to narrow the performance gap in throughput for the geonames benchmark (#37411) by 3%. For more details: https://github.com/elastic/elasticsearch/issues/37530#issuecomment-462758384 Relates to #37530 Relates to #37411 Relates to #35754	2019-02-25 16:23:22 -05:00
Lee Hinman	5c7dd6f0ee	Set mappings when creating indices in SuggestSearchIT (#39323 ) * Set mappings when creating indices in SuggestSearchIT These tests don't test dynamic mapping, so they can use preset mappings. This removes the possibility they may fail due to the mapping not being available since mapping updates are asynchronous. Resolves #39315 * Wrap creates in assertAcked	2019-02-25 13:27:03 -07:00
Mayya Sharipova	bf058d6e4d	Fix analyze NullPointerException when AnalyzeTokenList tokens is null (#39332 ) (#39361 )	2019-02-25 12:49:18 -05:00
Nhat Nguyen	48219112e3	Do not wait for advancement of checkpoint in recovery (#39006 ) With this change, we won't wait for the local checkpoint to advance to the max_seq_no before starting phase2 of peer-recovery. We also remove the sequence number range check in peer-recovery. We can safely do these thanks to Yannick's finding. The replication group to be used is currently sampled after indexing into the primary (see `ReplicationOperation` class). This means that when initiating tracking of a new replica, we have to consider the following two cases: - There are operations for which the replication group has not been sampled yet. As we initiated the new replica as tracking, we know that those operations will be replicated to the new replica and follow the typical replication group semantics (e.g. marked as stale when unavailable). - There are operations for which the replication group has already been sampled. These operations will not be sent to the new replica. However, we know that those operations are already indexed into Lucene and the translog on the primary, as the sampling is happening after that. This means that by taking a snapshot of Lucene or the translog, we will be getting those ops as well. What we cannot guarantee anymore is that all ops up to `endingSeqNo` are available in the snapshot (i.e. also see comment in `RecoverySourceHandler` saying `We need to wait for all operations up to the current max to complete, otherwise we can not guarantee that all operations in the required range will be available for replaying from the translog of the source.`). This is not needed, though, as we can no longer guarantee that max seq no == local checkpoint. Relates #39000 Closes #38949 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2019-02-25 12:10:14 -05:00
David Turner	236db51d34	Fix testSnapshotFileFailureDuringSnapshot (#39362 ) Today this test catches an exception and asserts that its proximate cause has message `Random IOException` but occasionally this exception is wrapped two layers deep, causing the test to fail. This commit adjusts the test to look at the root cause of the exception instead. 1> [2019-02-25T12:31:50,837][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotFileFailureDuringSnapshot] --> caught a top level exception, asserting what's expected 1> org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] Snapshot could not be read 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:212) ~[main/:?] 1> at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135) ~[main/:?] 1> at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54) ~[main/:?] 1> at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127) ~[main/:?] 1> at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208) ~[main/:?] 1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[main/:?] 1> at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?] 
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202] 1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202] 1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202] 1> Caused by: org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots 1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:564) ~[main/:?] 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?] 1> ... 9 more 1> Caused by: java.io.IOException: Random IOException 1> at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.maybeIOExceptionOrBlock(MockRepository.java:275) ~[test/:?] 1> at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.readBlob(MockRepository.java:317) ~[test/:?] 1> at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101) ~[main/:?] 1> at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90) ~[main/:?] 1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:560) ~[main/:?] 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?] 1> ... 9 more FAILURE 0.59s J0 \| SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot <<< FAILURES! 
> Throwable #1: java.lang.AssertionError: > Expected: a string containing "Random IOException" > but: was "[test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots" > at __randomizedtesting.SeedInfo.seed([B73CA847D4B4F52D:884E042D2D899330]:0) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot(SharedClusterSnapshotRestoreIT.java:821) > at java.lang.Thread.run(Thread.java:748)	2019-02-25 16:43:55 +00:00
Marios Trivyzas	11fe8cd16f	[Tests] Fix flakiness by ensuring stable cluster (#39300 ) (#39356 ) In integration tests where `setBootstrapMasterNodeIndex()` is used in combination with `autoMinMasterNodes = false` the cluster can start bootstrapping once the number of nodes set with the `setBootstrapMasterNodeIndex` have been started but it's not ensured that all nodes have successfully joined to form the cluster. This behaviour was introduced with `5db7ed22a0` and in order to ensure that the cluster is properly formed before proceeding with the integration test, use `ensureStableCluster()` with the appropriate number of expected nodes. Fixes: #39220	2019-02-25 17:26:15 +01:00
David Turner	dc23be5a9d	Avoid creating a green index in RetentionLeaseIT (#39347 ) In #39224 we made shard history retention lease syncing ignore the `index.write.wait_for_active_shards` setting on the index, and added a test that showed that it was ignored. However the test as merged actually creates a green index, so the `wait_for_active_shards` setting has no effect. This change adjusts the test to create a yellow index to verify that `wait_for_active_shards` really is ignored.	2019-02-25 15:33:09 +00:00
Yannick Welsch	a2bc41621c	Clean GatewayAllocator when stepping down as master (#38885 ) This fixes an issue where a messy master election might prevent shard allocation to properly proceed. I've encountered this in failing CI tests when we were bootstrapping multiple nodes. Tests would sometimes time out on an `ensureGreen` after an unclean master election. The reason for this is how the async shard information fetching works and how the clean-up logic in GatewayAllocator is integrated with the rest of the system. When a node becomes master, it will, as part of the first cluster state update where it becomes master, already try allocating shards (see `JoinTaskExecutor`, in particular the call to `reroute`). This process, which runs on the MasterService thread, will trigger async shard fetching. If the node is still processing an earlier election failure in ClusterApplierService (e.g. due to a messy election), that will possibly trigger the clean-up logic in GatewayAllocator after the shard fetching has been initiated by MasterService, thereby cancelling the fetching, which means that no subsequent reroute (allocation) is triggered after the shard fetching results return. This means that no shard allocation will happen unless the user triggers an explicit reroute command. The bug imo is that GatewayAllocator is called from both MasterService and ClusterApplierService threads, with no clear happens-before relation. The fix here makes it so that the clean-up logic is also run on the MasterService thread instead of the ClusterApplierService thread, reestablishing a clear happens-before relation. Note that testing this is tricky. With the newly added test, I can quite often reproduce this by adding `Thread.sleep(10);` in ClusterApplierService (to make sure it does not go too quickly) and adding `Thread.sleep(50);` in `TransportNodesListGatewayStartedShards` to make sure that shard state fetching does not go too quickly either. 
Note that older versions of Zen discovery are affected by this as well, but did not exhibit this issue as often because master elections are much slower there.	2019-02-25 10:37:31 +01:00
David Turner	96c09b032d	Ignore waitForActiveShards when syncing leases (#39224 ) Adjust the retention lease sync actions so that they do not respect the `index.write.wait_for_active_shards` setting on an index, allowing them to sync retention leases even if insufficiently many shards are currently active to accept writes. Relates #39089	2019-02-25 08:53:43 +00:00
Nhat Nguyen	f17d408fbb	Add cause to assert_no_failure when replay translog (#39333 ) We tripped this assertion three times in the last two weeks. However, it only says "this IndexWriter is closed" without the actual cause. ``` [2019-02-14T11:46:31,144][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] fatal error in thread [elasticsearch[node-1][generic][T#2]], exiting java.lang.AssertionError: unexpected failure while replicating translog entry: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed ``` This change replaces an assert with an AssertionError so that we will have the actual cause in the next build failures. Relates #38898	2019-02-23 13:04:43 -05:00
Zachary Tong	c7516b03b6	Better HoltWinters parameter validation (#38747 ) We validate HW parameters (namely, window > 2 * period) when parsing the XContent... but that means transport clients can configure bad params. This change allows models to validate the window and throw an exception if they wish. It also makes some test changes: - removes testBadModelParams(), which was a junk test (didn't do anything), and bad param checking is done elsewhere in unit tests - Fixes one of the windows in testHoltWintersNotEnoughData() - Ensures the period in testHoltWintersNotEnoughData() is >> window - Removes `setTypes()` since that's deprecated	2019-02-22 15:25:26 -05:00
Daniel Mitterdorfer	9fea21aca5	Remove ExceptionsHelper#detailedMessage in tests (#37921 ) (#39297 ) With this commit we remove all usages of the deprecated method `ExceptionsHelper#detailedMessage` in tests. We do not address production code here but rather in dedicated follow-up PRs to keep the individual changes manageable. Relates #19069	2019-02-22 14:03:29 +01:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Tanguy Leroux	fc896e452c	ReadOnlyEngine should update translog recovery state information (#39238 ) (#39251 ) `ReadOnlyEngine` never recovers operations from translog and never updates translog information in the index shard's recovery state, even though the recovery state goes through the `TRANSLOG` stage during the recovery. It means that recovery information for frozen shards indicates an unknown number of recovered translog ops in the Recovery APIs (translog_ops: `-1` and translog_ops_percent: `-1.0%`) and this is confusing. This commit changes the `recoverFromTranslog()` method in `ReadOnlyEngine` so that it always recovers from an empty translog snapshot, allowing the recovery state translog information to be correctly updated. Related to #33888	2019-02-21 18:08:06 +01:00
Daniel Mitterdorfer	ef921fd157	Migrate Streamable to Writeable for cluster block package (#37391 ) (#39236 )	2019-02-21 15:21:44 +01:00
Marios Trivyzas	ecfd48b6d3	[Tests] Make testEngineGCDeletesSetting deterministic (#38942 ) (#39231 ) `InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool`, so it needs the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>	2019-02-21 14:30:59 +02:00
Marios Trivyzas	1316825f52	Replace superfluous usage of Counter with Supplier (#39048 ) (#39225 ) `Counter` was used as a means of a functional argument to pass the relative cached time before `Supplier` iface was introduced.	2019-02-21 12:42:54 +02:00
Ignacio Vera	be8a5315d7	Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (#39176 ) The type date_nanos does not direct doc-value iterators and it needs to extend `next_doc` in order to delegate the call to the wrapped iterator.	2019-02-21 11:10:51 +01:00
Tal Levy	8150ca40f2	mute test3MasterNodes2Failed	2019-02-20 17:35:37 -08:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Mirko Jotic	a6ae146ccc	Converting Derivative Pipeline Agg integration test into AggregatorTestCase. (#38679 ) Replicates the majority of existing Derivative pipeline integration tests into an AggregatorTestCase, with the goal of removing the integration tests in the near future.	2019-02-20 16:35:32 -05:00
Igor Motov	3d93011e32	Fix median calculation in MedianAbsoluteDeviationAggregatorTests (#38979 ) Fixes an error in median calculation in MedianAbsoluteDeviationAggregatorTests for odd number of sample points, which causes some rare test failures. Fixes #38937	2019-02-20 13:24:30 -05:00
Ioannis Kakavas	c783069804	Fix NPE on Stale Index in IndicesService(#39173 ) This is a backport of #38891 which closes #38845	2019-02-20 15:35:35 +02:00
David Turner	efffb3d5b7	Simplify calculation in AwarenessAllocationDecider (#38091 ) Today's calculation of the maximum number of shards per attribute is rather convoluted. This commit clarifies that it returns ceil(shardCount/numberOfAttributes).	2019-02-20 08:54:57 +00:00
Henning Andersen	00a26b9dd2	Blob store compression fix (#39073 ) Blob store compression was not enabled for some of the files in snapshots due to constructor accessing sub-class fields. Fixed to instead accept compress field as constructor param. Also fixed chunk size validation to work. Deprecated repositories.fs.compress setting as well to be able to unify in a future commit.	2019-02-20 09:24:41 +01:00
Hendrik Muhs	50b3858f7c	add version 6.6.2	2019-02-19 20:28:06 +01:00
David Turner	0a9574c9d4	Add some missing toString() implementations (#39124 ) Sometimes we turn objects into strings for logging or debugging using `toString()`, but the default implementation is often unhelpful. This change improves on this in two places I ran into recently.	2019-02-19 17:52:41 +00:00
Jason Tedor	fef9bdb23f	Allow retention lease operations under blocks (#39089 ) This commit allows manipulating retention leases under blocks.	2019-02-19 10:26:49 -05:00
Jason Tedor	12f6963456	Fix retention leases sync on recovery test This test had a bug. We attempt to allow only the primary to be allocated, to force all replicas to recover from the primary after we had set the state of the retention leases on the primary. However, in building the index settings, we were overwriting the settings that exclude the replicas from being allocated. This means that some of the replicas would end up assigned and rather than receive retention leases during recovery, they would be part of the replication group receiving retention leases as they are manipulated. Since retention lease renewals are only synced periodically, this means that the replica could be lagging a little behind in some cases leading to an assertion tripping in the test. This commit addresses this by ensuring that the replicas are indeed not allocated until after the retention leases are done being manipulated on the replica. We did this by not overwriting the exclude settings. Closes #39105	2019-02-19 09:07:33 -05:00
Alexander Reelsen	7f8a640363	Fix DateFormatters.parseMillis when no timezone is given (#39100 ) The parseMillis method was able to work on formats without timezones by falling back to UTC. The Date Formatter interface did not support this, as the calling code was using the `Instant.from` java time API. This switches over to an internal method which adds UTC as a timezone. Closes #39067	2019-02-19 14:12:22 +01:00
Jim Ferenczi	199155f5fb	Enforce Completion Context Limit (#38675 ) (#39075 ) This change adds a limit to the number of completion contexts that a completion field can define. Closes #32741	2019-02-19 08:52:24 +01:00
Albert Zaharovits	6bc88b00ec	Mute GatewayMetaStateTests.testAtomicityWithFailures (#39079 ) Mute test GatewayMetaStateTests.testAtomicityWithFailures	2019-02-19 00:25:45 +02:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Jason Tedor	d43ac8fe11	Include in log retention leases that failed to sync When retention leases fail to sync after an expiration check, we emit a log message about this. This commit adds the retention leases that failed to sync.	2019-02-18 15:08:08 -05:00
Jason Tedor	bbb61002ba	Add some logging related to retention lease syncing (#39066 ) When the background retention lease sync fires, we check and see if any retention leases are expired. If any did expire, we execute a full retention lease sync (write action). Since this is happening on a background thread, we do not block that thread waiting for success (it will simply try again when the timer elapses). However, we were swallowing exceptions that indicate failure. This commit addresses that by logging the failures. Additionally, we add some trace logging to the execution of syncing retention leases.	2019-02-18 15:02:31 -05:00
Henning Andersen	99b2bc3461	Fix potential race during TcpTransport close (#39031 ) Fixed two potential causes for leaked threads during tests: 1. When adding a channel to serverChannels, we add it under a monitor that we do not use when reading from it. This is potentially unsafe if there is no other happens-before relationship ensuring the safety of this. 2. Long-shot but if the thread pool was shutdown before entering this code, we would silently forget about closing server channels so added assert. Strengthened the locking to ensure that once we stop the transport, no new server channels can be made. Relates to CI failure issue: #37543	2019-02-18 19:13:23 +01:00
Alan Woodward	ab4d5f404f	Add overlapping, before, after filters to intervals query (#38999 ) Lucene recently added `overlapping`, `before` and `after` filters to the intervals package. This commit exposes them in elasticsearch.	2019-02-18 15:06:24 +00:00
Adrien Grand	45b17e8645	Don't close caches while there might still be in-flight requests. (#38958 ) Many of our index components use ref-counting so that in the event that a shard is closed while there are still ongoing requests, then the index reader and the store only effectively get closed when ongoing requests have finished. However we don't apply the same principle to the request and query caches, which might get closed while there are still in-flight requests. This commit adds ref-counting to `IndicesService` so that the caches and other components it maintains only get closed when all shards are effectively closed. Closes #37117	2019-02-18 13:59:58 +01:00
Martijn van Groningen	ed08bc3537	Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709 ) * During fetching remote mapping if remote client is missing then `NoSuchRemoteClusterException` was not handled. * When adding remote connection, check that it is really connected before continuing to run the tests. Relates to #38695	2019-02-18 09:41:44 +01:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Tim Brooks	b1c1daa63f	Add get file chunk timeouts with listener timeouts (#38758 ) This commit adds a `ListenerTimeouts` class that will wrap a `ActionListener` in a listener with a timeout scheduled on the generic thread pool. If the timeout expires before the listener is completed, `onFailure` will be called with an `ElasticsearchTimeoutException`. Timeouts for the get ccr file chunk action are implemented using this functionality. Additionally, this commit attempts to fix #38027 by also blocking proxied get ccr file chunk actions. This test being un-muted is useful to verify the timeout functionality.	2019-02-16 10:56:03 -07:00
Luca Cavanna	a1a49f201d	Tie break search shard iterator comparisons on cluster alias (#38853 ) `SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor. This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.	2019-02-16 09:41:03 +01:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, then finally mark its seq_no as completed. If a flush occurs after step 1 but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Nhat Nguyen	20755e666c	Reduce global checkpoint sync interval in disruption tests (#38931 ) We verify seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due to a lagged global checkpoint on replicas. The global checkpoint on replicas is lagged because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync interval to 1s in the disruption tests. Closes #38318 Closes #36789	2019-02-15 21:04:20 -05:00
Nhat Nguyen	a67b9f6d1f	Relax testStressMaybeFlushOrRollTranslogGeneration (#38918 ) The predicate shouldPeriodicallyFlush is determined by the uncommitted translog size and the local checkpoint. The uncommitted translog size depends on the local checkpoint. The condition shouldPeriodicallyFlush can be true twice in the test in the following scenario: 1. Index doc-0 and advance the local checkpoint to 0, the condition shouldPeriodicallyFlush remains false. 2. Index doc-1 and add it to translog, but the local checkpoint is not advanced yet (still 0). The condition shouldPeriodicallyFlush becomes true because the uncommitted translog size is 216bytes (2ops + gen-1 + gen-2) > 180bytes and the translog generation of the new index commit would advance from 1 to 2. > [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=0, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=2, > max_seq_no=1}] 3. The condition shouldPeriodicallyFlush becomes true again after the local checkpoint is advanced to 1 because the uncommitted translog size is 216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation of the new index commit would advance from 2 to 4. > [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=1, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=4, > max_seq_no=1}] We need to relax the assertion in this test to cover this situation. Closes #31629	2019-02-15 21:04:12 -05:00
Armin Braun	238425e5e7	Fix Issue with Concurrent Snapshot Init + Delete (#38518 ) * Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread * Closes #38489	2019-02-15 16:50:47 -08:00
Alan Woodward	176013e23c	Avoid double term construction in DfsPhase (#38716 ) DfsPhase captures terms used for scoring a query in order to build global term statistics across multiple shards for more accurate scoring. It currently does this by building the query's `Weight` and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()` for each collected term. This duplicates work, however, as the various `Weight` implementations will already have collected these statistics at construction time. This commit replaces this round-about way of collecting stats, instead using a delegating IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()` is called from the Weight. It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with a new method on `RescoreContext` that returns any queries used by the rescore implementation. The delegating IndexSearcher then collects term contexts and statistics in the same way described above for each Query.	2019-02-15 16:00:38 +00:00
Daniel Mitterdorfer	fcc7f553f5	Also mmap cfs files for hybridfs (#38940 ) (#38947 ) With this commit we add the `.cfs` file extension to the list of file types that are memory-mapped by hybridfs. `.cfs` files combine all files of a Lucene segment into a single file in order to save file handles. As this strategy is only used for "small" segments (less than 10% of the shard size), it is beneficial to memory-map them instead of accessing them via NIO. Relates #36668	2019-02-15 15:34:40 +01:00
David Turner	578514e892	Recover peers from translog, ignoring soft deletes (#38904 ) Today if soft deletes are enabled then we read the operations needed for peer recovery from Lucene. However we do not currently make any attempt to retain history in Lucene specifically for peer recoveries so we may discard it and fall back to a more expensive file-based recovery. Yet we still retain sufficient history in the translog to perform an operations-based peer recovery. In the long run we would like to fix this by retaining more history in Lucene, possibly using shard history retention leases (#37165). For now, however, this commit reverts to performing peer recoveries using the history retained in the translog regardless of whether soft deletes are enabled or not.	2019-02-15 10:45:15 +01:00
Henning Andersen	a211e51343	ShardBulkAction ignore primary response on primary (#38901 ) Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to replica and back to client. This was made in the past as an attempt to fix issues with conflicts after relocations where a bulk request would experience a closed shard half way through and thus have to retry on the new primary. It could then fail on its own update. With sequence numbers, this leads to an issue, since if a primary is demoted (network partitions), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from old to new primary and ensure that no operations are active while doing this. The above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.	2019-02-15 10:13:11 +01:00
Jason Tedor	00cb8d0be8	Mark coordinator test as awaits fix This test is failing frequently so this commit mutes it. Relates #38867	2019-02-14 12:43:31 -05:00
Lee Hinman	0c733c04be	Remove immediate operation retry after mapping update (#38873 ) Prior to this commit, when an indexing operation resulted in an `Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction immediately retries the indexing operation to see if it succeeds. In the event that it succeeds the context does not wait until the mapping update has propagated through the cluster state before finishing the indexing. In some of our tests we rely on mappings being available as soon as they've been introduced in a document that indexed correctly. By removing the immediate retry we always wait for this to be the case. Resolves #38428 Supersedes #38579 Relates to #38711	2019-02-14 09:31:08 -07:00
Christoph Büscher	6c5cec4ff4	Enable silent FollowersCheckerTest (#38851 ) One of the test methods wasn't run because it was private. Making this method public and fixing some issues around mocking the threadpool that otherwise would lead to an NPE.	2019-02-14 16:16:48 +01:00
Albert Zaharovits	6243a9797f	_cat/indices with Security, hide names when wildcard (#38824 ) This changes the output of the `_cat/indices` API with `Security` enabled. It is possible to only display the index name (and possibly the index health, depending on the request options) but not its stats (doc count, merges, size, etc). This is the case for closed indices which have index metadata in the cluster state but no associated shards, hence no shard stats. However, when `Security` is enabled, and the request contains wildcards, open indices without stats are a common occurrence. This is because the index names in the response table are picked up directly from the cluster state which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the stats data which is populated by the indices stats API which does go through the index name resolver. This is a bug, because it is circumventing `Security`'s function to hide unauthorized indices. This has been fixed by displaying the index names as they are resolved by the indices stats API. The outputs of these two APIs are now very similar: same index names, similar data but different format. Closes #37190	2019-02-14 15:09:17 +02:00
David Roberts	6ea483a663	Mute DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex Due to https://github.com/elastic/elasticsearch/issues/38845	2019-02-14 11:46:22 +00:00
Luca Cavanna	7456117019	[TEST] address testCollectNodes rare failure (#38559 ) #37767 changed the expected exception for "no such cluster" error from `IllegalStateException` to a dedicated `NoSuchRemoteClusterException`. An assertion in `testCollectNodes` needs to be updated accordingly.	2019-02-14 10:57:14 +01:00
Nhat Nguyen	5d22e45990	Copy retention leases when trim unsafe commits (#37995 ) When a primary shard is recovered from its store, we trim the last commit (when it's unsafe). If that primary crashes before the recovery completes, we will lose the committed retention leases because they are baked in the last commit. With this change, we copy the retention leases from the last commit to the safe commit when trimming unsafe commits. Relates #37165	2019-02-13 17:27:48 -05:00
Jason Tedor	062eea8fcc	Fix excessive increments in soft delete policy (#38813 ) In this case, we were incrementing the policy too much. This means on every iteration we actually keep increasing the minimum retained sequence number, even with leases in place. It was a bug from when the soft deletes policy had retention leases incorporated into it. This commit fixes this bug by ensuring we only increment in the proper places, and adds careful tests for the various situations.	2019-02-13 14:04:45 -05:00
Jake Landis	46bb663a09	Make 7.x like 6.7 user agent ecs, but default to true (#38828 ) Forward port of https://github.com/elastic/elasticsearch/pull/38757 This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true. 6.7: `ecs` : default `false` 7.x: `ecs` : default `true` 8.0: no option, but behaves as `true` * Revert "Ingest node - user agent, move device to an object (#38115)" This reverts commit `5b008a34aa`. * Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)" This reverts commit `cac6b8e06f`. * cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 Add ECS schema for user-agent ingest processor (#37727) * cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 Ingest node - user agent, move device to an object (#38115) (#38121) * cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes) Dep. check for ECS changes to User Agent processor (#38362) * make true the default for the ecs option, and update 7.0 references and tests	2019-02-13 10:28:01 -06:00
Przemyslaw Gomulka	7404882105	Fix line separators in JSON logging tests backport#38771 #38834 The hardcoded '\n' in string will not work in Windows where there is a different line separator. A System.lineSeparator should be used to make it work on all platforms closes #38705 backport #38771	2019-02-13 13:34:33 +01:00
Zachary Tong	57f69082fd	Disable cache on QueryProfilerIT (#38748 ) - Disables the request cache on the test, to prevent cached values from potentially interfering with test results - Changes the test to execute a single query, in hopes of making failures more reproducible Backport of #38583	2019-02-12 13:11:52 -05:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	4a5070dcfb	Use current term in initial leases in engine test (#38285 ) We need to use the current primary term instead of 1L for the initial retention leases; otherwise, the primary term of the committed retention leases won't match the current primary term if the retention leases never gets updated.	2019-02-12 11:40:04 -05:00
Nhat Nguyen	eca5404572	Fix synchronization in LocalCheckpointTracker#contains (#38755 ) We are accessing the `CountedBitSet` in `LocalCheckpointTracker#contains` without proper synchronization. Relates #33871	2019-02-12 11:39:50 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when close engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release.	2019-02-12 11:39:35 -05:00
Tanguy Leroux	51d6b9ab31	Fix CloseWhileRelocatingShardsIT (#38728 )	2019-02-12 14:04:44 +01:00
Jason Tedor	bbc9aa9979	Introduce retention lease actions (#38756 ) This commit introduces actions for some common retention lease operations that clients need to be able to perform remotely. These actions include add/renew/remove.	2019-02-12 07:38:03 -05:00
Przemyslaw Gomulka	7e178aa4a7	Enable IndexActionTests and WatcherIndexingListenerTests Backport #38738 fix tests to use clock in milliseconds precision in watcher code make sure the date comparison in string format is using same formatters some of the code was modified in #38514 possibly because of merge conflicts closes #38581 Backport #38738	2019-02-12 13:05:44 +01:00
Luca Cavanna	90fff54954	Tie break on cluster alias when merging shard search failures (#38715 ) A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters. This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution. Also the corresponding test has been split in two: with and without search shard target set to the failure. Closes #38672	2019-02-12 11:25:44 +01:00
Jason Tedor	c7cdd6a46a	Add dedicated retention lease exceptions (#38754 ) When a retention lease already exists on an add retention lease invocation, or a retention lease is not found on a renew retention lease invocation today we throw an illegal argument exception. This puts a burden on the caller to catch that specific exception and parse the message. This commit relieves the burden from the caller by adding dedicated exception types for these situations.	2019-02-12 00:32:09 -05:00
Jason Tedor	b97c74bbab	Enable removal of retention leases (#38751 ) This commit introduces the ability to remove retention leases. Explicit removal will be needed to manage retention leases used to increase the likelihood of operation-based recoveries syncing, and for consumers such as ILM.	2019-02-11 21:19:11 -05:00
Nick Knize	e2f432a413	Fix the version check for LegacyGeoShapeFieldMapper (#38547 ) Change version check from 7.0 to 6.6 in BaseGeoShapeFieldMapper to correctly use LegacyGeoShapeFieldMapper for indexes created prior to 6.6.	2019-02-11 16:27:47 -06:00
Nick Knize	078da6d9bd	Fix GeoHash PrefixTree BWC (#38584 ) geo_shape indexes created before 6.6 use geohash string encoding as default tree parameter and quadtree encoding for 6.6 and later. This commit fixes bwc to use geohash encoding in LegacyGeoshapeFieldMapper for indexes created before 6.6.	2019-02-11 11:59:51 -06:00
David Roberts	d1848b96fc	Fix possible assertion failure in IndicesQueryCache.close (#38731 ) The assertion that the stats2 map is empty in IndicesQueryCache.close has been observed to fail very occasionally in internal cluster tests. The likely cause is a cross-thread visibility problem for a count variable. This change makes that count volatile. Relates #37117 Backport of #38714	2019-02-11 17:33:20 +00:00
Tanguy Leroux	dc212de822	Specialize pre-closing checks for engine implementations (#38702 ) (#38722 ) The Close Index API has been refactored in 6.7.0 and it now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equal to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards. The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags a lot it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard from being closed and a new PUT Follow action from being issued on this shard (which is our recommended way to resume/restart a CCR following). This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed whatever the max seq number or global checkpoint are, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2019-02-11 17:34:17 +01:00
Luca Cavanna	6443b46184	Clean up ShardSearchLocalRequest (#38574 ) Added a constructor accepting `StreamInput` as argument, which allowed to make most of the instance members final as well as remove the default constructor. Removed a test only constructor in favour of invoking the existing constructor that takes a `SearchRequest` as first argument. Also removed profile members and related methods as they were all unused.	2019-02-11 15:55:46 +01:00
Alexander Reelsen	884b5063a4	Create ISO8601 joda compatible java time formatter (#38434 ) The existing formatter being used was not on par with the joda formatter as it was missing the ability to parse a comma as a separator between seconds and milliseconds. While a real iso8601 would be much more complex, this might be sufficient for some more use-cases. The ingest date formatter now also uses the iso8601 formatter by default. Closes #38345	2019-02-11 15:11:26 +01:00
Alexander Reelsen	e7868e92bd	Restore date aggregation performance in UTC case (#38221 ) (#38700 ) The benchmarks showed a sharp decrease in aggregation performance for the UTC case. This commit uses the same calculation as joda time, which requires no conversion into any java time object. Also, the check for a fixed offset has been put into the ctor to reduce the need for runtime calculations. The same goes for the amount of the used unit in milliseconds. Closes #37826	2019-02-11 16:30:48 +03:00
Luca Cavanna	fe8bd757b2	Look up connection using the right cluster alias when releasing contexts (#38570 ) Whenever phase failure is raised in AbstractSearchAsyncAction, we go and release search contexts of shards that successfully returned their results, prior to notifying the listener of the failure. In case we are executing a CCS request, it's important to look-up the connection to send the release context request to. This commit makes sure that the lookup takes the cluster alias into account. We used to use `null` at all times instead which is not correct and was not caught as any exception is caught without re-throwing it.	2019-02-11 13:40:42 +01:00
Przemyslaw Gomulka	ba9a4d13e1	mute Failing tests related to logging and joda-java migration backport(#38704 )(#38710 ) the tests awaits fix from #38693 and #38705 and #38581	2019-02-11 13:15:12 +01:00
Przemyslaw Gomulka	ab9e2f2e69	Move testToUtc test to DateFormattersTests #38698 Backport #38610 The test was relying on toString in ZonedDateTime which is different to what is formatted by strict_date_time when milliseconds are 0 The method is just delegating to dateFormatter, so that scenario should be covered there. closes #38359 Backport #38610	2019-02-11 11:34:25 +01:00
Like	b8be6cb5c7	Reject index.optimize_auto_generated_id setting (#28895 ) This commit rejects the index.optimize_auto_generated_id setting for indices created on or after 7.0.0. This setting was deprecated in 6.7.0.	2019-02-10 13:46:09 -05:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Christoph Büscher	e3c7b93917	Mute failure in InternalEngineTests (#38622 )	2019-02-08 16:29:54 +01:00
Dimitris Athanasiou	fe8182ece2	Mute RetentionLeaseIT.testRetentionLeasesSyncOnRecovery on 7x (#38597 )	2019-02-08 11:32:28 +02:00
Jason Tedor	fdf6b3f23f	Add 7.1 version constant to 7.x branch (#38513 ) This commit adds the 7.1 version constant to the 7.x branch. Co-authored-by: Andy Bristol <andy.bristol@elastic.co> Co-authored-by: Tim Brooks <tim@uncontended.net> Co-authored-by: Christoph Büscher <cbuescher@posteo.de> Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com> Co-authored-by: markharwood <markharwood@gmail.com> Co-authored-by: Ioannis Kakavas <ioannis@elastic.co> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Alpar Torok <torokalpar@gmail.com> Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Tim Vernum <tim@adjective.org> Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>	2019-02-07 16:32:27 -05:00
Jason Tedor	f8ed6c15c4	Enable BWC after backport recovering leases (#38485 ) This commit enables the BWC tests after backporting recovery of retention leases during peer recovery.	2019-02-06 08:03:19 -05:00
Jason Tedor	4b42281a4e	Collapse retention lease integration tests (#38483 ) This commit collapses the retention lease integration tests into a single suite.	2019-02-06 07:55:41 -05:00
Tanguy Leroux	510829f9f7	TransportVerifyShardBeforeCloseAction should force a flush (#38401 ) This commit changes the `TransportVerifyShardBeforeCloseAction` so that it always forces the flush of the shard. It seems that #37961 is not sufficient to ensure that the translog and the Lucene commit share the exact same max seq no and global checkpoint information in case one or more noop operations have been made. The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest` and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial situation and they both fail in 1 out of 10 executions. Relates to #33888	2019-02-06 13:22:54 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Ioannis Kakavas	e1d464b22c	Mute testRetentionLeasesSyncOnRecovery (#38488 ) Relates: #38487	2019-02-06 08:52:54 +02:00
Armin Braun	34f2cc78f6	Fix Master Failover and DataNode Leave Blocking Snapshot (#38460 ) * Closes #38447	2019-02-05 23:56:59 +01:00
Jason Tedor	79a45b47da	Recover retention leases during peer recovery (#38435 ) This commit integrates retention leases with recovery. With this change, we copy the current retention leases on primary to the replica during phase two of recovery. At this point, the replica will have been added to the replication group and so is already receiving retention lease sync requests from the primary. This means that if any retention lease syncs are triggered on the primary after we sample the retention leases here during phase two, that sync request will also arrive on the replica ensuring that the replica is from this point on up to date with the retention leases on the primary. We have to copy these during phase two since we will be applying indexing operations, potentially triggering merges, and therefore must ensure the correct retention leases are in place beforehand.	2019-02-05 17:43:41 -05:00
Henning Andersen	20c66c5a05	Bubble-up exceptions from scheduler (#38317 ) Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. This is a continuation of #38014 Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail. schedule could bubble rejected exception to uncaught exception handler when not using SAME executor if thread pool is terminated. Now ignore rejected exception silently if executor is shutdown.	2019-02-05 21:48:24 +01:00
Boaz Leskes	033ba725af	Remove support for internal versioning for concurrency control (#38254 ) Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometimes do the wrong thing. Here's the relevant excerpt from the resiliency status page: > When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document. We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach. Relates to #1078	2019-02-05 20:53:35 +01:00
Jason Tedor	b03d138122	Lift retention lease expiration to index shard (#38380 ) This commit lifts the control of when retention leases are expired to index shard. In this case, we move expiration to an explicit action rather than a side-effect of calling ReplicationTracker#getRetentionLeases. This explicit action is invoked on a timer. If any retention leases expire, then we hard sync the retention leases to the replicas. Otherwise, we proceed with a background sync.	2019-02-05 14:42:17 -05:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR, as it prevents documents that would be recovered in the normal following operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
Tal Levy	aef5775561	re-enables awaitsfixed datemath tests (#38376 ) Previously, date formats of `YYYY.MM.dd` would hit an issue where the year would jump towards the end of the calendar year. This was an issue that had since been resolved in tests by using `yyyy` to be the more accurate representation of the year. Closes #37037.	2019-02-05 11:20:40 -08:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: ``` PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } } ``` There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
Armin Braun	2f6afd290e	Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368 ) * The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress) * Fixed by: * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`. * Adding deduplication logic to `endSnapshot` * Also: * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale) * closes #38226	2019-02-05 16:44:18 +01:00
Lee Hinman	d862453d68	Support unknown fields in ingest pipeline map configuration (#38352 ) We already support unknown objects in the list of pipelines, this changes the `PipelineConfiguration` to support fields other than just `id` and `config`. Relates to #36938	2019-02-05 07:52:17 -07:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is a short-lived token (defaults to 20m) and refresh token with a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using `Authorization` header, where the auth scheme is `ApiKey` and the credentials, is the base64 encoding of API key Id and API key separated by a colon. 
Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Christoph Büscher	d255303584	Add typeless client side GetIndexRequest calls and response class (#37778 ) The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest` and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in the maps it returns still use a level including the type name. In order to change this without breaking existing users of the HLRC API, this PR introduces two new request and response objects in the `org.elasticsearch.client.indices` client package. These are used by the IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request and response objects are still kept for use in similarly named, but deprecated methods. The newly introduced client side classes are simplified versions of the server side request/response classes since they don't need to support wire serialization, and only the response needs fromXContent parsing (but no xContent-serialization, since this is the responsibility of the server-side class). Also changing the return type of `GetIndexResponse#getMapping` to `Map<String, MappingMetaData> getMappings()`, while it previously was returning another map keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the server side response objects return.	2019-02-05 03:41:05 +01:00
Gordon Brown	292e0f6fb7	Deprecate `_type` in simulate pipeline requests (#37949 ) As mapping types are being removed throughout Elasticsearch, the use of `_type` in pipeline simulation requests is deprecated. Additionally, the default `_type` used if one is not supplied has been changed to `_doc` for consistency with the rest of Elasticsearch.	2019-02-04 16:11:44 -07:00
Christoph Büscher	0ced775389	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357 )	2019-02-04 22:30:34 +01:00
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
Zachary Tong	ab1150378b	Add Composite to AggregationBuilders (#38207 )	2019-02-04 13:47:04 -05:00
David Turner	2c1eab2b8a	Clarify slow cluster-state log messages (#38302 ) The message `... took [31s] above the warn threshold of 30s` suggests incorrectly that the task took 61 seconds. This commit adds the clarifying words `which is`.	2019-02-04 17:44:00 +00:00
Andrey Ershov	7bc8bc9605	ensureGreen (#38324 )	2019-02-04 16:36:04 +01:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the should periodically flush logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are renewed retention leases since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Christoph Büscher	5ee7232379	Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334 )	2019-02-04 16:10:06 +01:00
Christoph Büscher	715e581378	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330 )	2019-02-04 15:46:19 +01:00
Boaz Leskes	e49b593c81	Move TokenService to seqno powered cas (#38311 ) Relates #37872 Relates #10708	2019-02-04 15:25:41 +01:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298 ) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka	9b64558efb	Migrating from joda to java.time. Watcher plugin (#35809 ) part of the migrating joda time work. Migrating watcher plugin to use JDK's java-time refers #27330	2019-02-04 15:08:31 +01:00
Alexander Reelsen	87f3579125	Add nanosecond field mapper (#37755 ) This adds a dedicated field mapper that supports nanosecond resolution - at the price of a reduced date range. When using the date field mapper, the time is stored as milliseconds since the epoch in a long in lucene. This field mapper stores the time in nanoseconds since the epoch - which means its range is much smaller, ranging roughly from 1970 to 2262. Note that aggregations will still be in milliseconds. However docvalue fields will have full nanosecond resolution Relates #27330	2019-02-04 11:31:16 +01:00
Christoph Büscher	15510da2af	Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304 )	2019-02-04 10:41:35 +01:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
Armin Braun	4561f425db	Remove Redundant Loop in SnapshotShardsService (#38283 ) * This was a merge mistake on my end I think, obviously we only need to loop over the shards once not twice here to find those that we missed in INIT state	2019-02-04 09:06:39 +01:00
Alpar Torok	d58e899d45	Remove empty service files (#38192 )	2019-02-04 10:05:04 +02:00
Jason Tedor	d2cc1459a3	Fix ordering problem in add or renew lease test (#38280 ) We have to set the primary term before we add a retention lease, otherwise we can not assert the correct primary term.	2019-02-03 12:54:31 -05:00
Christoph Büscher	6ca7a913ea	Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275 )	2019-02-03 12:54:13 +01:00
Armin Braun	89d7c57bd9	Fix Incorrect Transport Response Handler Type (#38264 ) * Fix Incorrect Transport Response Handler Type * The response type here is not empty and was always wrong but this only became visible now that `0a604e3b24` was introduced * As a result of `0a604e3b24` we started actually handling the response of this request and logging/handling exceptions before that we simply dropped the classcast exception here quietly using the empty response handler * fix busy assert not handling `Exception` * Closes #38226 * Closes #38256	2019-02-03 08:48:15 +01:00
Nhat Nguyen	0861dc3581	Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268 ) Tracked at #38267	2019-02-02 20:02:21 -05:00
Christoph Büscher	50cdc61874	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257 )	2019-02-02 13:46:29 +01:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Nhat Nguyen	80d3092292	Fix primary term in testAddOrRenewRetentionLease (#38239 ) We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951	2019-02-02 02:38:53 -05:00
Nhat Nguyen	1ec04dff43	Fix testReplicaIgnoresOlderRetentionLeasesVersion (#38246 ) If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates #37951 Closes #38245	2019-02-02 02:37:37 -05:00
Nhat Nguyen	8bee5b8e06	Mute testAddOrRenewRetentionLease (#38240 ) Relates #38239	2019-02-01 21:27:10 -05:00
Boaz Leskes	f6e06a2b19	Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231 ) After backporting #37977, #37857 and #37872	2019-02-01 20:37:16 -05:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	9c39dea7ae	AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227 ) Tracked at #38226	2019-02-01 16:24:02 -05:00
Armin Braun	03a1d21070	SnapshotShardsService Simplifications (#38025 ) * Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot * Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)	2019-02-01 20:46:14 +01:00
Luca Cavanna	ee57420de6	Adjust SearchRequest version checks (#38181 ) The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.	2019-02-01 19:23:13 +01:00
Andrey Ershov	04dc41b99e	Zen2ify RareClusterStateIT (#38184 ) In Zen 1 there are a commit timeout and a publish timeout, and these settings can be changed on-the-fly. In Zen 2, there is only a commit timeout and this setting is static. RareClusterStateIT actively uses these settings and relies on the fact that they are dynamic. This commit adds cancelCommitedPublication method to Coordinator to be used by tests. This method will cancel current committed publication if there is any. When there is BlockClusterStateProcessing on the non-master node, the publication will be accepted and committed, but not yet applied. So we can use the method above to cancel it. Also, this commit replaces callback + AtomicReference with ActionFuture, which makes test code easier to read.	2019-02-01 18:18:11 +01:00
Yannick Welsch	025bf28405	Fix _host based require filters (#38173 ) Using index.routing.allocation.require._host does not correctly work because the boolean logic in filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when opType ==OpType.AND	2019-02-01 16:02:37 +01:00
Tanguy Leroux	da6269b456	RestoreService should update primary terms when restoring shards of existing indices (#38177 ) When restoring shards of existing indices, the RestoreService also restores the values of primary terms stored in the snapshot index metadata. The primary terms are not updated and could potentially conflict with current index primary terms if the restored primary terms are lower than the existing ones. This situation is likely to happen with replicated closed indices (because primary terms are increased when the index is transitioning from open to closed state, and the snapshotted primary terms are the one at the time the index was opened) (see #38024) and maybe also with CCR. This commit changes the RestoreService so that it updates the primary terms using the maximum value between the snapshotted values and the existing values. Related to #33888	2019-02-01 15:59:11 +01:00
Desmond Vehar	c1c4abae10	Throw if two inner_hits have the same name (#37645 ) This change throws an error if two inner_hits have the same name Closes #37584	2019-02-01 15:53:50 +01:00
Alexander Reelsen	35ed137684	Ensure joda compatibility in custom date formats (#38171 ) If custom date formats are used, there may be combinations that the new performant DateFormatters.from() method has not covered yet. This adds a few such corner cases and ensures the tests are correctly commented out.	2019-02-01 15:42:56 +01:00
Jim Ferenczi	66e4fb4fb6	Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (#38169 ) In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment `dd6043c1c0 (r252984782)` added after the merge.	2019-02-01 15:32:19 +01:00
Nhat Nguyen	2e475d63f7	Do not set timeout for IndexRequests in GatewayIndexStateIT (#38147 ) CI might not be fast enough to publish a dynamic mapping update within 100ms.	2019-02-01 09:30:03 -05:00
Andrey Ershov	c1270e97b0	Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178 ) In Zen2 cluster bootstrap is required and some parameters are called differently in Zen2.	2019-02-01 15:24:08 +01:00
Andrey Ershov	bda591453c	Add elasticsearch-node detach-cluster command (#37979 ) This commit adds the second part of `elasticsearch-node` tool - `detach-cluster` command in addition to `unsafe-bootstrap` command. Also, this commit changes the semantics of `unsafe-bootstrap`, now `unsafe-bootstrap` changes clusterUUID. So the algorithm of running `elasticsearch-node` tool is the following: 1) Stop all nodes in the cluster. 2) Pick master-eligible node with the highest (term, version) pair and run the `unsafe-bootstrap` command on it. If there are no survived master-eligible nodes - skip this step. 3) Run `detach-cluster` command on the remaining survived nodes. Detach cluster makes the following changes to the node metadata: 1) Sets clusterUUID committed to false. 2) Sets currentTerm and term to 0. 3) Removes voting tombstones and sets voting configurations to special constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster bootstrap. `ElasticsearchNodeCommand` base abstract class is introduced, because `UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in common. Also, this commit adds "ordinal" parameter to both commands, because it's impossible to write IT otherwise. For MUST_JOIN_ELECTED_MASTER case special handling is introduced in `ClusterFormationFailureHelper`. Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed from `UnsafeBootstrapMasterIT`).	2019-02-01 14:53:55 +01:00
Alexander Reelsen	979e5576e5	Add tests for fractional epoch parsing (#38162 ) Fractional epoch parsing is supported, the tests we used were edge cases that did not make sense. This adds tests to properly check for this.	2019-02-01 14:48:37 +01:00
Tanguy Leroux	029e4b6278	Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159 ) The current CloseWhileRelocatingShardsIT test adds some "send behavior" rules to a target node's mocked transport service in order to detect when shard relocations are started. These rules are never cleared and prevent the test from completing normally after the rebalance is re-enabled. This commit changes the test so that rules are cleared and most verifications are done before the rebalance is re-enabled. Closes #38090	2019-02-01 12:58:46 +01:00
Yannick Welsch	ce469cfda5	Fix testCorruptedIndex (#38161 ) Folks at the Lucene project do not seem to be interested in classifying corruptions and distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525), so we'll just cop out as well. Closes #34322	2019-02-01 12:51:38 +01:00
Luca Cavanna	e18cac3659	Add finalReduce flag to SearchRequest (#38104 ) With #37000 we made sure that final reduction is automatically disabled whenever a localClusterAlias is provided with a SearchRequest. While working on #37838, we found a scenario where we do need to set a localClusterAlias yet we would like to perform a final reduction in the remote cluster: when searching on a single remote cluster. Relates to #32125 This commit adds support for a separate finalReduce flag to SearchRequest and makes use of it in TransportSearchAction in case we are searching against a single remote cluster. This also makes sure that num_reduce_phases is correct when searching against a single remote cluster: it makes little sense to return `num_reduce_phases` set to `2`, which looks especially weird in case the search was performed against a single remote shard. We should perform one reduction phase only in this case and `num_reduce_phases` should reflect that. * line length	2019-02-01 12:11:42 +01:00
Jim Ferenczi	6fa93ca493	Forbid negative field boosts in analyzed queries (#37930 ) This change forbids negative field boost in the `query_string`, `simple_query_string` and `multi_match` queries. Negative boosts are not allowed in Lucene 8 (scores must be positive). The backport of this change to 6x will turn the error into a deprecation warning in order to raise the awareness of this breaking change in 7.0. Closes #33309	2019-02-01 11:41:40 +01:00
Jim Ferenczi	57b1d245e8	Remove AtomicFieldData#getLegacyFieldValues (#38087 ) This function is unused now that we format the docvalue fields with the default formatter on the field (#30831)	2019-02-01 11:41:17 +01:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings; mostly this is 10-30 lines of code that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`, which is suitable for all classes, so the copy-pasted code can be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Jim Ferenczi	b7308aa03c	Don't load global ordinals with the `map` execution_hint (#37833 ) The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality. Closes #37705	2019-02-01 09:35:46 +01:00
David Turner	23f00e3676	Relax fault detector in some disruption tests (#38101 ) Today we use `AbstractDisruptionTestCase` to test the behaviour of things like master elections in the presence of cluster disruptions. These tests have rather enthusiastic fault detection settings, detecting a fault if a single ping fails, with a one-second timeout. Furthermore there are some tests that assert the identity of the master remains unchanged during some disruption, and these assertions fail rather often thanks to the overly sensitive fault detector. However in a number of these tests the fault detector need not be this sensitive. This commit moves some such tests into their own test suite and uses more sensible fault-detection settings to avoid the kind of master instability that is causing CI failures. Closes #37699	2019-02-01 08:10:49 +00:00
Alexander Reelsen	c02cd3e2fd	Fix java time epoch date formatters (#37829 ) The self-written epoch date formatters could not properly format an Instant to a string due to a misconfiguration. This fix also removes a runtime behaviour that existed until now under java 8 regarding the names of the aggregation buckets, which are now the same as before and as they have been under java 11.	2019-02-01 09:03:48 +01:00

3565 Commits