OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings, mostly this is 10-30 lines of codes that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex` which is suitable for all classes and copy-paste could be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110 ) When extending ESIntegTestCase are run on the same jvm, the static field in NodeAndClusterIdConverter will throw an AlreadySet exceptions. overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening relates #32850	2019-01-31 18:59:40 +01:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that caller handles exception by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolving ordering problems for retention leases. Namely, it can be that out-of-order retention lease sync requests arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Tim Vernum	cde126dbff	Enable SSL in reindex with security QA tests (#37600 ) Update the x-pack/qa/reindex-tests-with-security integration tests to run with TLS enabled on the Rest interface. Relates: #37527	2019-01-31 20:59:50 +11:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00
David Turner	81c443c9de	Deprecate minimum_master_nodes (#37868 ) Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in tests, but for 7.x nodes this setting is not required as it has no effect. This commit removes this setting so that nodes are started with more realistic configurations, and deprecates it.	2019-01-30 20:09:15 +00:00
Nik Everett	e97718245d	Test: Enable strict deprecation on all tests (#36558 ) This drops the option for tests to disable strict deprecation mode in the low level rest client in favor of configuring expected warnings on any calls that should expect warnings. This behavior is paranoid-by-default which is generally the right way to handle deprecations and tests in general.	2019-01-30 11:48:34 -05:00
Colin Goodheart-Smithe	21e392e95e	Removes typed calls from YAML REST tests (#37611 ) This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour. The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.	2019-01-30 16:32:58 +00:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840 ) This is related to #35975. It adds a action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Armin Braun	7f1784e9f9	Remove Dead MockTransport Code (#34044 ) * All these methods are unused	2019-01-29 15:08:11 +01:00
Luca Cavanna	2325fb9cb3	Remove test only SearchShardTarget constructor (#37912 ) Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.	2019-01-29 14:58:11 +01:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support to JSON logging (#36833 ) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace see ESJsonLayout.java for more details and field descriptions Docker log4j2 configuration is now almost the same as the one use for ES binary. The only difference is that docker is using console appenders, whereas ES is using file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Jason Tedor	5fddb631a2	Introduce retention lease syncing (#37398 ) This commit introduces retention lease syncing from the primary to its replicas when a new retention lease is added. A follow-up commit will add a background sync of the retention leases as well so that renewed retention leases are synced to replicas.	2019-01-27 07:49:56 -05:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents, requests that match more than this value will set the `"total.relation"` to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response. Scroll queries are not impacted, they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
Julie Tibshirani	e1d8df4ffa	Deprecate types in create index requests. (#37134 ) From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates: * Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests. * Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.	2019-01-24 13:17:47 -08:00
Andrey Ershov	4974684003	Add tool elasticsearch-node unsafe-bootstrap (#37696 ) elasticsearch-node tool helps to restore cluster if half or more of master eligible nodes are lost. Of course, all bets are off, regarding data consistency. There are two parts of the tool: unsafe-bootstrap to be used when there is still at least one master-eligible node alive and detach-cluster, when there are no master-eligible nodes left. This commit implements the first part. Docs for the tool will be added separately as a part of #37812.	2019-01-24 19:25:55 +01:00
Tal Levy	289106a578	Refactor GeoHashGrid to be abstract and re-usable (#37742 ) This change split out all the specific GeoHash classes for the geohash_grid aggregation into abstract GeoGrid classes that can be re-used for specific hashing types, like `geohash`	2019-01-24 10:12:14 -08:00
Alpar Torok	37768b7eac	Testing conventions now checks for tests in main (#37321 ) * Testing conventions now checks for tests in main This is the last outstanding feature of the old NamingConventionsTask, so time to remove it. * PR review	2019-01-24 17:30:50 +02:00
Nhat Nguyen	a6abb28abf	Fix InternalEngineTests#assertOpsOnPrimary (#37746 ) The assertion `assertOpsOnPrimary` does not store seq_no and primary term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This leads to failures of the subsequence CAS deletes or indexes with seq_no and term. Moreover, this assertion trips a translog assertion because it bumps the primary term of some operations but not the primary term of the engine. Relates #36467 Closes #37684	2019-01-24 10:02:48 -05:00
Yannick Welsch	feab59df03	Bubble exceptions up in ClusterApplierService (#37729 ) Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and block the state from being applied instead of silently being ignored. In combination with the cluster state publishing lag detector, this will throw a node out of the cluster that can't properly apply cluster state updates.	2019-01-24 14:09:03 +01:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
Boaz Leskes	52ba407931	Expose sequence number and primary terms in search responses (#37639 ) Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.	2019-01-23 09:01:58 +01:00
Henning Andersen	228611843c	Fail start of non-data node if node has data (#37347 ) * Fail start of non-data node if node has data Check that nodes started with node.data=false cannot start if they have shard data to avoid (old) indexes being resurrected into the cluster in red status. Issue #27073	2019-01-22 13:27:12 +01:00
Tim Brooks	21838d73b5	Extract message serialization from `TcpTransport` (#37034 ) This commit introduces a NetworkMessage class. This class has two subclasses - InboundMessage and OutboundMessage. These messages can be serialized and deserialized independent of the transport. This allows more granular testing. Additionally, the serialization mechanism is now a simple Supplier. This builds the framework to eventually move the serialization of transport messages to the network thread. This is the one serialization component that is not currently performed on the network thread (transport deserialization and http serialization and deserialization are all on the network thread).	2019-01-21 14:14:18 -07:00
Tim Brooks	f516d68fb2	Share `NioGroup` between http and transport impls (#37396 ) Currently we create dedicated network threads for both the http and transport implementations. Since these these threads should never perform blocking operations, these threads could be shared. This commit modifies the nio-transport to have 0 http workers be default. If the default configs are used, this will cause the http transport to be run on the transport worker threads. The http worker setting will still exist in case the user would like to configure dedicated workers. Additionally, this commmit deletes dedicated acceptor threads. We have never had these for the netty transport and they can be added back if a need is determined in the future.	2019-01-21 13:50:56 -07:00
Julie Tibshirani	8da7a27f3b	Deprecate types in the put mapping API. (#37280 ) From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates: - Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`. - Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.	2019-01-18 12:28:31 -08:00
Yannick Welsch	377d96e376	Remove initial_master_nodes on node restart (#37580 ) Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped. The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.	2019-01-18 16:36:42 +01:00
Jason Tedor	687978b7d1	Reject all requests that have an unconsumed body (#37504 ) This commit removes some leniency from REST handling where we move to reject all requests that have a body where the body is not used during the course of handling the request. For example, DELETE /index { "query" : { "term" : { "field" : "value" } } } is now rejected.	2019-01-16 07:29:25 -05:00
Przemyslaw Gomulka	5e94f384c4	Remove the use of AbstracLifecycleComponent constructor #37488 (#37488 ) The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class. It no longer extends the AbstractComponent so there is no need for this constructor There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor. This is part 1. which will be backported to 6.x with a migration guide/deprecation log. part 2 will have this constructor removed in 7 relates #35560 relates #34488	2019-01-16 09:05:30 +01:00
Tanguy Leroux	23ae9808ba	Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (#37414 ) This commit fixes the IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) method where the startReplica parameter was not correctly propagated and the value true always used instead.	2019-01-15 12:48:21 +01:00
Marios Trivyzas	d6a104f52b	[TEST] Muted testDifferentRolesMaintainPathOnRestart Relates to #37462	2019-01-15 11:51:45 +02:00
Jason Tedor	e11a32eda8	Reformat some classes in the index universe This commit reformats some classes in the index universe with the purpose of breaking some long method definitions and invocations into a line per parameter. This has the advantage that for an upcoming change to these definitions and invocations, the diff for that change will be a single line per definition or invocation. That makes these sorts of changes easier to read.	2019-01-14 21:45:24 -05:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same appraoch as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Nhat Nguyen	1e3702da0b	Relax assertSameDocIdsOnShards assertion If the checking node no longer holds the shard copy, the assertion assertSameDocIdsOnShards might fail. This is too harsh since the assertion is to ensure the consistency between active copies.	2019-01-14 15:28:48 -05:00
Nhat Nguyen	15aa3764a4	Reduce recovery time with compress or secure transport (#36981 ) Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network bound in most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encrypting/decrypting are cpu bound rather than network-bound. With this commit, a source node can send multiple (default to 2) file-chunks without waiting for the acknowledgments from the target. Below are the benchmark results for PMC and NYC_taxis. - PMC (20.2 GB) \| Transport \| Baseline \| chunks=1 \| chunks=2 \| chunks=3 \| chunks=4 \| \| ----------\| ---------\| -------- \| -------- \| -------- \| -------- \| \| Plain \| 184s \| 137s \| 106s \| 105s \| 106s \| \| TLS \| 346s \| 294s \| 176s \| 153s \| 117s \| \| Compress \| 1556s \| 1407s \| 1193s \| 1183s \| 1211s \| - NYC_Taxis (38.6GB) \| Transport \| Baseline \| chunks=1 \| chunks=2 \| chunks=3 \| chunks=4 \| \| ----------\| ---------\| ---------\| ---------\| ---------\| -------- \| \| Plain \| 321s \| 249s \| 191s \| * \| * \| \| TLS \| 618s \| 539s \| 323s \| 290s \| 213s \| \| Compress \| 2622s \| 2421s \| 2018s \| 2029s \| n/a \| Relates #33844	2019-01-14 15:14:46 -05:00
Tim Brooks	5c68338a1c	Implement ccr file restore (#37130 ) This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.	2019-01-14 13:07:55 -07:00
Armin Braun	033e67fa59	Cleanup Deadcode in Rest Tests (#37418 ) * Either dead code outright or redundant overrides removed	2019-01-14 16:22:44 +01:00
Daniel Mitterdorfer	abe35fb99b	Remove unused index store in directory service With this commit we remove the unused field `indexStore` from all implementations of `FsDirectoryService`. Relates #37097	2019-01-14 13:44:32 +01:00
Jason Tedor	03be4dbaca	Introduce retention lease persistence (#37375 ) This commit introduces the persistence of retention leases by persisting them in index commits and recovering them when recovering a shard from store.	2019-01-12 14:43:19 -08:00
Nhat Nguyen	44a1071018	Make recovery source partially non-blocking (#37291 ) Today a peer-recovery may run into a deadlock if the value of node_concurrent_recoveries is too high. This happens because the peer-recovery is executed in a blocking fashion. This commit attempts to make the recovery source partially non-blocking. I will make three follow-ups to make it fully non-blocking: (1) send translog operations, (2) primary relocation, (3) send commit files. Relates #36195	2019-01-12 12:49:48 -05:00
Armin Braun	63fe3c6ed6	Fix PrimaryAllocationIT Race Condition (#37355 ) * Fix PrimaryAllocationIT Race Condition * Forcing a stale primary allocation on a green index was tripping the assertion that was removed * Added a test that this case still errors out correctly * Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test * Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty * Closes #37345	2019-01-11 23:26:04 +01:00
Yannick Welsch	f4abf9628a	Mock connections more accurately in DisruptableMockTransport (#37296 ) This commit moves DisruptableMockTransport to use a more accurate representation of connection management, which allows to use the full connection manager and does not require mocking out any behavior. With this, we can implement restarting nodes in CoordinatorTests.	2019-01-11 16:06:48 +01:00

1 2 3 4 5 ...

1920 Commits