OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	9c39dea7ae	AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227 ) Tracked at #38226	2019-02-01 16:24:02 -05:00
Armin Braun	03a1d21070	SnapshotShardsService Simplifications (#38025 ) * Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot * Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)	2019-02-01 20:46:14 +01:00
Luca Cavanna	ee57420de6	Adjust SearchRequest version checks (#38181 ) The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.	2019-02-01 19:23:13 +01:00
Andrey Ershov	04dc41b99e	Zen2ify RareClusterStateIT (#38184) In Zen 1 there are a commit timeout and a publish timeout, and both settings can be changed on the fly. In Zen 2 there is only a commit timeout, and that setting is static. RareClusterStateIT actively uses these settings and relies on the fact that they are dynamic. This commit adds a cancelCommitedPublication method to Coordinator for use by tests. This method cancels the current committed publication, if there is one. When there is a BlockClusterStateProcessing on the non-master node, the publication will be accepted and committed, but not yet applied, so we can use the method above to cancel it. Also, this commit replaces callback + AtomicReference with ActionFuture, which makes the test code easier to read.	2019-02-01 18:18:11 +01:00
Yannick Welsch	025bf28405	Fix _host based require filters (#38173) Using index.routing.allocation.require._host does not work correctly because the boolean logic in filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when opType == OpType.AND	2019-02-01 16:02:37 +01:00
Tanguy Leroux	da6269b456	RestoreService should update primary terms when restoring shards of existing indices (#38177 ) When restoring shards of existing indices, the RestoreService also restores the values of primary terms stored in the snapshot index metadata. The primary terms are not updated and could potentially conflict with current index primary terms if the restored primary terms are lower than the existing ones. This situation is likely to happen with replicated closed indices (because primary terms are increased when the index is transitioning from open to closed state, and the snapshotted primary terms are the one at the time the index was opened) (see #38024) and maybe also with CCR. This commit changes the RestoreService so that it updates the primary terms using the maximum value between the snapshotted values and the existing values. Related to #33888	2019-02-01 15:59:11 +01:00
Desmond Vehar	c1c4abae10	Throw if two inner_hits have the same name (#37645 ) This change throws an error if two inner_hits have the same name Closes #37584	2019-02-01 15:53:50 +01:00
Alexander Reelsen	35ed137684	Ensure joda compatibility in custom date formats (#38171) If custom date formats are used, there may be combinations that the new performant DateFormatters.from() method has not covered yet. This adds a few such corner cases and ensures the tests are correctly commented out.	2019-02-01 15:42:56 +01:00
Jim Ferenczi	66e4fb4fb6	Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (#38169 ) In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment `dd6043c1c0 (r252984782)` added after the merge.	2019-02-01 15:32:19 +01:00
Nhat Nguyen	2e475d63f7	Do not set timeout for IndexRequests in GatewayIndexStateIT (#38147 ) CI might not be fast enough to publish a dynamic mapping update within 100ms.	2019-02-01 09:30:03 -05:00
Andrey Ershov	c1270e97b0	Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178 ) In Zen2 cluster bootstrap is required and some parameters are called differently in Zen2.	2019-02-01 15:24:08 +01:00
Andrey Ershov	bda591453c	Add elasticsearch-node detach-cluster command (#37979) This commit adds the second part of the `elasticsearch-node` tool, the `detach-cluster` command, in addition to the `unsafe-bootstrap` command. Also, this commit changes the semantics of `unsafe-bootstrap`: `unsafe-bootstrap` now changes clusterUUID. So the algorithm for running the `elasticsearch-node` tool is the following: 1) Stop all nodes in the cluster. 2) Pick the master-eligible node with the highest (term, version) pair and run the `unsafe-bootstrap` command on it. If there are no surviving master-eligible nodes, skip this step. 3) Run the `detach-cluster` command on the remaining surviving nodes. Detach cluster makes the following changes to the node metadata: 1) Sets clusterUUID committed to false. 2) Sets currentTerm and term to 0. 3) Removes voting tombstones and sets voting configurations to the special constant MUST_JOIN_ELECTED_MASTER, which prevents initial cluster bootstrap. The `ElasticsearchNodeCommand` abstract base class is introduced because `UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in common. Also, this commit adds an "ordinal" parameter to both commands, because it's impossible to write an IT otherwise. For the MUST_JOIN_ELECTED_MASTER case, special handling is introduced in `ClusterFormationFailureHelper`. Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed from `UnsafeBootstrapMasterIT`).	2019-02-01 14:53:55 +01:00
Alexander Reelsen	979e5576e5	Add tests for fractional epoch parsing (#38162 ) Fractional epoch parsing is supported, the tests we used were edge cases that did not make sense. This adds tests to properly check for this.	2019-02-01 14:48:37 +01:00
Tanguy Leroux	029e4b6278	Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159) The current CloseWhileRelocatingShardsIT test adds some "send behavior" rules to a target node's mocked transport service in order to detect when shard relocations are started. These rules are never cleared and prevent the test from completing normally after the rebalance is re-enabled. This commit changes the test so that rules are cleared and most verifications are done before the rebalance is re-enabled. Closes #38090	2019-02-01 12:58:46 +01:00
Yannick Welsch	ce469cfda5	Fix testCorruptedIndex (#38161 ) Folks at the Lucene project do not seem to be interested in classifying corruptions and distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525), so we'll just cop out as well. Closes #34322	2019-02-01 12:51:38 +01:00
Luca Cavanna	e18cac3659	Add finalReduce flag to SearchRequest (#38104) With #37000 we made sure that final reduction is automatically disabled whenever a localClusterAlias is provided with a SearchRequest. While working on #37838, we found a scenario where we do need to set a localClusterAlias yet we would like to perform a final reduction in the remote cluster: when searching on a single remote cluster. Relates to #32125 This commit adds support for a separate finalReduce flag to SearchRequest and makes use of it in TransportSearchAction in case we are searching against a single remote cluster. This also makes sure that num_reduce_phases is correct when searching against a single remote cluster: it makes little sense to return `num_reduce_phases` set to `2`, which looks especially weird in case the search was performed against a single remote shard. We should perform one reduction phase only in this case and `num_reduce_phases` should reflect that. * line length	2019-02-01 12:11:42 +01:00
Jim Ferenczi	6fa93ca493	Forbid negative field boosts in analyzed queries (#37930 ) This change forbids negative field boost in the `query_string`, `simple_query_string` and `multi_match` queries. Negative boosts are not allowed in Lucene 8 (scores must be positive). The backport of this change to 6x will turn the error into a deprecation warning in order to raise the awareness of this breaking change in 7.0. Closes #33309	2019-02-01 11:41:40 +01:00
Jim Ferenczi	57b1d245e8	Remove AtomicFieldData#getLegacyFieldValues (#38087) This function is unused now that we format the docvalue fields with the default formatter on the field (#30831)	2019-02-01 11:41:17 +01:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings. Mostly this is 10-30 lines of code that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`, which is suitable for all classes, so the copy-pasted code can be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Jim Ferenczi	b7308aa03c	Don't load global ordinals with the `map` execution_hint (#37833 ) The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality. Closes #37705	2019-02-01 09:35:46 +01:00
David Turner	23f00e3676	Relax fault detector in some disruption tests (#38101 ) Today we use `AbstractDisruptionTestCase` to test the behaviour of things like master elections in the presence of cluster disruptions. These tests have rather enthusiastic fault detection settings, detecting a fault if a single ping fails, with a one-second timeout. Furthermore there are some tests that assert the identity of the master remains unchanged during some disruption, and these assertions fail rather often thanks to the overly sensitive fault detector. However in a number of these tests the fault detector need not be this sensitive. This commit moves some such tests into their own test suite and uses more sensible fault-detection settings to avoid the kind of master instability that is causing CI failures. Closes #37699	2019-02-01 08:10:49 +00:00
Alexander Reelsen	c02cd3e2fd	Fix java time epoch date formatters (#37829) The self-written epoch date formatters were not able to properly format an Instant to a string due to a misconfiguration. This fix also removes a runtime behaviour that existed until now under java 8 regarding the names of the aggregation buckets, which are now the same as before and as they have been under java 11.	2019-02-01 09:03:48 +01:00
Yannick Welsch	859e2f5bc8	Adapt timeouts in UpdateMappingIntegrationIT Relates to #37263 and possibly #36916	2019-02-01 08:58:31 +01:00
Adrien Grand	d83c748417	Fix test bug in DynamicMappingsIT. (#37906 ) Closes #37898	2019-02-01 08:35:29 +01:00
Przemyslaw Gomulka	2758578570	Trim the JSON source in indexing slow logs (#38081) A '{' as the first character in a log line causes problems for beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting. Trimming the source part of a SlowLog solves that and keeps the logs readable. Closes #38080	2019-02-01 08:12:12 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Nhat Nguyen	b8b843476d	Disable dynamic mapping in testSimpleGetFieldMappingsWithDefaults (#38045 ) Since #31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates #31140 Closes #37928	2019-01-31 21:01:41 -05:00
Nhat Nguyen	a8ebe2a217	Fix random params in testSoftDeletesRetentionLock (#38114 ) Since #37992 the retainingSequenceNumber is initialized with 0 while the global checkpoint can be -1. Relates #37992	2019-01-31 20:50:41 -05:00
Lee Hinman	c67a9663af	Fix MasterServiceTests.testClusterStateUpdateLogging (#38116) This changes the test to not use a `CountDownLatch`, instead adding an assertion for the final logging message and waiting until the `MockAppender` has seen it before proceeding. Related to df2c06f6f30f7e23a6863a3f72fc3bdb7648885c Resolves #23739	2019-01-31 17:13:19 -07:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Pascal Christoph	a3d9ba3f4b	Log document id when MapperParsingException occurs (#37800 ) Closes #37658	2019-01-31 16:33:13 -05:00
Nhat Nguyen	237fcda2cc	Disable dynamic mapping update in testTransportBulkTasks (#38073) If a replica does not have the right mapping yet, we will retry the index request on that replica; the actual number of tasks is then higher than the expected number. Since #31140 this happens more frequently because we no longer require acking on the dynamic mapping of index requests. Relates #31140 Closes #37893	2019-01-31 13:16:52 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110) When tests extending ESIntegTestCase are run on the same JVM, the static field in NodeAndClusterIdConverter will throw an AlreadySet exception. Overriding the configuration method Node.configureNodeAndClusterIdStateListener in MockNode prevents the listener registration from happening. Relates #32850	2019-01-31 18:59:40 +01:00
Nhat Nguyen	8e95780f98	Soft-deletes policy should always fetch latest leases (#37940 ) If a new retention lease is added while a primary's soft-deletes policy is locked for peer-recovery, that lease won't be baked into the Lucene commit. Relates #37165 Relates #37375	2019-01-31 12:02:57 -05:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that caller handles exception by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
David Turner	7f738e8541	Minor logging improvements (#38084 ) Fixes some log messages that caused some minor confusion when digging through a log generated by a failing test.	2019-01-31 16:41:04 +00:00
Tal Levy	9923f0fe6a	fix a few versionAdded values in ElasticsearchExceptions (#37877 ) TooManyBucketsException was introduced in v6.2 and SnapshotInProgressException was introduced in v6.7	2019-01-31 08:28:20 -08:00
Tanguy Leroux	7a597cad0d	Reenable BWC tests after backport of #37899 (#38093 ) This commit adapts the version used in StartedShardEntry serialization after the backport of #37899 and reenables bwc tests. Related to #37899 Related to #38074	2019-01-31 16:53:28 +01:00
Henning Andersen	7487be3d3c	Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock	2019-01-31 15:31:01 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolving ordering problems for retention leases. Namely, it can be that out-of-order retention lease sync requests arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Armin Braun	ae9f4df361	Don't Assert Ack on when Publish Timeout is 0 in Test (#38077 ) * Publish timeout is set to `0` so out of order processing of states on the node can lead to a `false` ack response * See #30672 * Closes #36813	2019-01-31 14:35:11 +01:00
Alexander Reelsen	9f026bb8ad	Reduce object creation in Rounding class (#38061 ) This reduces objects creations in the rounding class (used by aggs) by properly creating the objects only once. Furthermore a few unneeded ZonedDateTime objects were created in order to create other objects out of them. This was changed as well. Running the benchmarks shows a much faster performance for all of the java time based Rounding classes.	2019-01-31 14:18:28 +01:00
Adrien Grand	a536fa7755	Treat put-mapping calls with `_doc` as a top-level key as typed calls. (#38032 ) Currently the put-mapping API assumes that because the type name is `_doc` then it is dealing with a typeless put-mapping call. Yet we still allow running the put-mapping API in a typed fashion with `_doc` as a type name. The current logic triggers surprising errors when doing a typed put-mapping call with `_doc` as a type name on an index that has a type already. This is a bit of a corner-case, but is more important on 6.x due to the fact that using the index API with `_doc` as a type name triggers typed calls to the put-mapping API with `_doc` as a type name.	2019-01-31 13:57:42 +01:00
David Turner	eadcb5f0f8	Fix size of rolling-upgrade bootstrap config (#38031 ) Zen2 nodes will bootstrap themselves once they believe there to be no remaining Zen1 master-eligible nodes in the cluster, as long as minimum_master_nodes is satisfied. Today the bootstrap configuration comprises just the ids of the known master-eligible nodes, and this might be too small to be safe. For instance, if there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum, and this does not intersect other quorums that might arise, leading to a split-brain. This commit fixes this by expanding the bootstrap configuration so that its quorums satisfy minimum_master_nodes, by adding some of the IDs of the other master-eligible nodes in the last-published cluster state.	2019-01-31 08:00:11 +00:00
Alexander Reelsen	b94acb608b	Speed up converting of temporal accessor to zoned date time (#37915) The existing implementation was slow due to exceptions being thrown if an accessor did not have a time zone. This implementation queries for a time zone, local time and local date and also checks for an instant, which avoids throwing an exception and thus speeds up the conversion. This removes the existing method and creates a new one named DateFormatters.from(TemporalAccessor accessor) to resemble the naming of the java time ones. Before this change an epoch millis parser using the toZonedDateTime method took approximately 50x longer. Relates #37826	2019-01-31 08:55:40 +01:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00
Nhat Nguyen	f5398d6511	Mute testRetentionLeasesSyncOnExpiration Tracked at #37963	2019-01-31 00:57:27 -05:00
Jason Tedor	a6a534f1f0	Reenable BWC testing after retention lease stats (#38062 ) This commit adjusts the BWC version on retention leases in stats, so with this we also reenable BWC testing.	2019-01-30 20:34:27 -05:00
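
Two of the commits listed above (f181e17038 and a9b12b38f0) deal with retention lease syncs arriving out of order on a replica, and a9b12b38f0 names a (primary term, version) tuple as the way to resolve the ordering. The following is a minimal sketch of that idea, written in Python rather than the project's Java. It assumes a replica simply ignores any collection whose (primary term, version) pair is not strictly newer than the one it already holds. `RetentionLeases`, `ReplicaTracker` and `apply_sync` are hypothetical names for illustration only and are not the Elasticsearch API.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class RetentionLeases:
    """Hypothetical stand-in for a versioned retention lease collection."""
    primary_term: int
    version: int
    leases: Tuple[str, ...] = field(default_factory=tuple)

    def ordering_key(self) -> Tuple[int, int]:
        # Tuples compare lexicographically: primary term first, then version.
        return (self.primary_term, self.version)


class ReplicaTracker:
    """Hypothetical replica-side holder of the latest lease collection."""

    def __init__(self) -> None:
        self.current = RetentionLeases(primary_term=0, version=0)

    def apply_sync(self, incoming: RetentionLeases) -> bool:
        """Accept the incoming collection only if it is strictly newer.

        Returns True when the collection was applied and False when it
        was ignored as stale.
        """
        if incoming.ordering_key() > self.current.ordering_key():
            self.current = incoming
            return True
        return False


if __name__ == "__main__":
    replica = ReplicaTracker()
    newer = RetentionLeases(1, 2, ("lease-a", "lease-b"))
    older = RetentionLeases(1, 1, ("lease-a",))
    replica.apply_sync(newer)  # arrives first and is applied
    replica.apply_sync(older)  # arrives late and is ignored
    print(replica.current)
```

With this check in place, a late-arriving older sync cannot overwrite a newer collection, which is the failure mode described in f181e17038.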


2473 Commits
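
The arithmetic behind commit eadcb5f0f8 (Fix size of rolling-upgrade bootstrap config) can be checked with a short sketch. It assumes a quorum of a bootstrap configuration is a simple majority of its nodes, which matches the "3 nodes, of which 2 form a quorum" example in that commit message. The function names are hypothetical, and the size search is only an illustration of the stated condition, not the rule the commit actually implements.

```python
from typing import Optional


def quorum_size(config_size: int) -> int:
    """Size of a simple-majority quorum for a configuration of this size."""
    return config_size // 2 + 1


def quorums_satisfy_minimum(config_size: int, minimum_master_nodes: int) -> bool:
    """True if every quorum of the configuration has at least
    minimum_master_nodes members (the condition named in the commit)."""
    return quorum_size(config_size) >= minimum_master_nodes


def smallest_safe_config_size(minimum_master_nodes: int,
                              known_master_eligible: int) -> Optional[int]:
    """Smallest configuration size, up to the number of known
    master-eligible nodes, whose quorums satisfy minimum_master_nodes.
    Returns None if no such size exists."""
    for size in range(1, known_master_eligible + 1):
        if quorums_satisfy_minimum(size, minimum_master_nodes):
            return size
    return None


if __name__ == "__main__":
    # Example from the commit message: 5 master-eligible nodes,
    # minimum_master_nodes = 3, bootstrap configuration of 3 nodes.
    print(quorum_size(3))                 # quorum of a 3-node configuration
    print(quorums_satisfy_minimum(3, 3))  # the 3-node configuration is too small
    print(quorums_satisfy_minimum(5, 3))  # all 5 nodes give a safe configuration
```

In the 5-node example, a 2-node quorum leaves 3 other master-eligible nodes, enough to satisfy minimum_master_nodes on their own, which is the split-brain risk the commit describes.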