OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Roberts	e943e27954	Spawn controller processes from a different directory on macOS (#47013 ) This is the Java side of https://github.com/elastic/ml-cpp/pull/593 with a fallback so that ml-cpp bundles with either the new or old directory structure work for the time being. A few days after merging the C++ changes a followup to this change will be made that removes the fallback.	2019-09-27 14:02:40 +01:00
Yannick Welsch	6fd3b4723f	Remove write lock for Translog.getGeneration (#47036 ) No need for the write lock, and currentFileGeneration is already protected by the read lock. Also removes the unused method "isCurrent".	2019-09-27 13:58:07 +02:00
Jim Ferenczi	73a09b34b8	Replace SearchContextException with SearchException (#47046 ) This commit removes the SearchContextException in favor of a simpler SearchException that doesn't leak the SearchContext. Relates #46523	2019-09-26 14:21:23 +02:00
Tanguy Leroux	95e2ca741e	Remove unused private methods and fields (#47154 ) This commit removes a bunch of unused private fields and unused private methods from the code base. Backport of (#47115)	2019-09-26 12:49:21 +02:00
jimczi	97d977f381	#47046 Fix serialization version check after backport	2019-09-26 09:56:24 +02:00
Jim Ferenczi	04972baffa	Merge ShardSearchTransportRequest and ShardSearchLocalRequest (#46996 ) (#47081 ) This change merges the `ShardSearchTransportRequest` and `ShardSearchLocalRequest` into a single `ShardSearchRequest` that can be used to create a SearchContext. Relates #46523	2019-09-26 09:20:53 +02:00
Martijn van Groningen	429f23ea2f	Allow ingest processors to execute in a non blocking manner. (#47122 ) Backport of #46241 This PR changes the ingest executing to be non blocking by adding an additional method to the Processor interface that accepts a BiConsumer as handler and changing IngestService#executeBulkRequest(...) to ingest document in a non blocking fashion iff a processor executes in a non blocking fashion. This is the second PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation. This change originates from the enrich branch and was introduced there in #43361.	2019-09-26 08:55:28 +02:00
David Turner	45c7783018	Warn on slow metadata persistence (#47130 ) Today if metadata persistence is excessively slow on a master-ineligible node then the `ClusterApplierService` emits a warning indicating that the `GatewayMetaState` applier was slow, but gives no further details. If it is excessively slow on a master-eligible node then we do not see any warning at all, although we might see other consequences such as a lagging node or a master failure. With this commit we emit a warning if metadata persistence takes longer than a configurable threshold, which defaults to `10s`. We also emit statistics that record how much index metadata was persisted and how much was skipped since this can help distinguish cases where IO was slow from cases where there are simply too many indices involved. Backport of #47005.	2019-09-26 07:40:54 +01:00
Tim Brooks	4f47e1f169	Extract proxy connection logic to specialized class (#47138 ) Currently the logic to check if a connection to a remote discovery node exists and otherwise create a proxy connection is mixed with the collect nodes, cluster connection lifecycle, and other RemoteClusterConnection logic. This commit introduces a specialized RemoteConnectionManager class which handles the open connections. Additionally, it reworks the "round-robin" proxy logic to create the list of potential connections at connection open/close time, opposed to each time a connection is requested.	2019-09-25 15:58:18 -06:00
Nhat Nguyen	7c5a088aa5	Increase ensureGreen timeout for testReplicaCorruption (#47136 ) We can have a large number of shard copies in this test. For example, the two recent failures have 24 and 27 copies respectively and all replicas have to copy segment files as their stores are corrupted. Our CI needs more than 30 seconds to start all these copies. Note that in two recent failures, the cluster was green just after the cluster health timed out. Closes #41899	2019-09-25 17:04:08 -04:00
Lee Hinman	a267df30fa	Wait for snapshot completion in SLM snapshot invocation (#47051 ) * Wait for snapshot completion in SLM snapshot invocation This changes the snapshots internally invoked by SLM to wait for completion. This allows us to capture more snapshotting failure scenarios. For example, previously a snapshot would be created and then registered as a "success", however, the snapshot may have been aborted, or it may have had a subset of its shards fail. These cases are now handled by inspecting the response to the `CreateSnapshotRequest` and ensuring that there are no failures. If any failures are present, the history store now stores the action as a failure instead of a success. Relates to #38461 and #43663	2019-09-25 14:25:22 -06:00
Armin Braun	93fcd23da8	Fail Snapshot on Corrupted Metadata Blob (#47009 ) (#47096 ) We should not be quietly ignoring a corrupted shard-level index-N blob. Simply creating a new empty shard-level index-N and moving on means that all snapshots of that shard show `SUCCESS` as their state at the repository root but are in fact broken. This change at least makes it visible to the user that they can't snapshot the given shard any more and forces the user to move on to a new repository since the current one is broken and will not allow snapshotting the inconsistent shard again. Also, this change stops the delete action for shards with broken index-N blobs instead of simply deleting all blobs in the path containing the broken index-N. This prevents a temporarily broken/missing index-N blob from corrupting all snapshots of that shard.	2019-09-25 15:55:33 +02:00
Nhat Nguyen	22575bd7e6	Remove isRecovering method from Engine (#47039 ) We already prevent flushing in Engine if it's recovering. Hence, we can remove the protection in IndexShard.	2019-09-25 08:58:08 -04:00
Armin Braun	c4a166fc9a	Simplify SnapshotResiliencyTests (#46961 ) (#47108 ) Simplify `SnapshotResiliencyTests` to more closely match the structure of `AbstractCoordinatorTestCase` and allow for future drying up between the two classes: * Make the test cluster nodes a nested-class in the test cluster itself * Remove the needless custom network disruption implementation and simply track disconnected node ids like `AbstractCoordinatorTestCase` does	2019-09-25 14:53:11 +02:00
Yannick Welsch	81cbd3fba4	Mute ClusterShardLimitIT.testIndexCreationOverLimitFromTemplate Relates #47107	2019-09-25 14:03:08 +02:00
David Turner	ac920e8e64	Assert no exceptions during state application (#47090 ) Today we log and swallow exceptions during cluster state application, but such an exception should not occur. This commit adds assertions of this fact, and updates the Javadocs to explain it. Relates #47038	2019-09-25 12:32:51 +01:00
Martijn van Groningen	eef1ba3fad	Make ingest pipeline resolution logic unit testable (#47026 ) Extracted ingest pipeline resolution logic into a static method and added unit tests for pipeline resolution logic. Followup from #46847	2019-09-25 11:35:00 +02:00
Daniel Mitterdorfer	48df560593	Emit log message when parent circuit breaker trips (#47000 ) (#47073 ) We emit a debug log message whenever a child circuit breaker trips (in `ChildMemoryCircuitBreaker#circuitBreak(String, long)`) but we never emit a log message when the parent circuit breaker trips. As this is more likely to happen with the real memory circuit breaker it is not possible to detect this in the logs. With this commit we add a log message on the same log level (debug) when the parent circuit breaker trips.	2019-09-25 10:22:46 +02:00
Julie Tibshirani	41ee8aa6fc	Reject regexp queries on the _index field. (#46945 ) We speculatively added support for `regexp` queries on the `_index` field in #34089 (this functionality was not actually requested by a user). Supporting regex logic adds complexity to the `_index` field for not much gain, so we would like to remove it. From an end-to-end test it turns out this functionality never even worked in the first place because of an error in how regex flags were interpreted! For this reason, we can remove support for `regexp` on `_index` without a deprecation period. Relates to #46640.	2019-09-24 12:17:00 -07:00
Tim Brooks	71ec0707cf	Remove locking around connection attempts (#46845 ) Currently in the ConnectionManager we lock around the node id. This is odd because we key connections by the ephemeral id. Upon further investigation it appears to me that we do not need the locking. Using the concurrent map, we can ensure that only one connection attempt completes. There is a very small chance that a new connection attempt will proceed right as another connection attempt is completing. However, since the whole process is asynchronous and event oriented (lightweight), that does not seem to be an issue.	2019-09-24 11:05:42 -06:00
Tim Brooks	f02582de4b	Reduce a bind failure to trace logging (#46891 ) Due to recent changes in the nio transport, a failure to bind the server channel has started to be logged at an error level. This exception leads to an automatic retry on a different port, so it should only be logged at a trace level.	2019-09-24 10:32:18 -06:00
David Turner	9135e2f9e3	Improve LeaderCheck rejection messages (#46998 ) Today the `LeaderChecker` rejects checks from nodes that are not in the current cluster with the exception message `leader check from unknown node` which offers no information about why the node is unknown. In fact the node must have been in the cluster in the recent past, so it might help guide the user to a more useful log message if we describe it as a `removed node` instead of an `unknown node`. This commit changes the exception message like this, and also tidies up a few other loose ends in the `LeaderChecker`.	2019-09-24 13:41:37 +01:00
David Turner	6943a3101f	Cut PersistedState interface from GatewayMetaState (#46655 ) Today `GatewayMetaState` implements `PersistedState` but it's an error to use it as a `PersistedState` before it's been started, or if the node is master-ineligible. It also holds some fields that are meaningless on nodes that do not persist their states. Finally, it takes responsibility for both loading the original cluster state and some of the high-level logic for writing the cluster state back to disk. This commit addresses these concerns by introducing a more specific `PersistedState` implementation for use on master-eligible nodes which is only instantiated if and when it's appropriate. It also moves the fields and high-level persistence logic into a new `IncrementalClusterStateWriter` with a more appropriate lifecycle. Follow-up to #46326 and #46532 Relates #47001	2019-09-24 12:31:13 +01:00
Julie Tibshirani	9124c94a6c	Add support for aliases in queries on _index. (#46944 ) Previously, queries on the _index field were not able to specify index aliases. This was a regression in functionality compared to the 'indices' query that was deprecated and removed in 6.0. Now queries on _index can specify an alias, which is resolved to the concrete index names when we check whether an index matches. To match a remote shard target, the pattern needs to be of the form 'cluster:index' to match the fully-qualified index name. Index aliases can be specified in the following query types: term, terms, prefix, and wildcard.	2019-09-23 13:21:37 -07:00
Jim Ferenczi	08f28e642b	Replace SearchContext with QueryShardContext in query builder tests (#46978 ) This commit replaces the SearchContext used in AbstractQueryTestCase with a QueryShardContext in order to reduce the visibility of search contexts. Relates #46523	2019-09-23 20:24:02 +02:00
Eray	199fff8a55	Allow max_children only in top level nested sort (#46731 ) This commit restricts the usage of max_children to the top level nested sort since it is ignored on the other levels.	2019-09-23 18:53:50 +02:00
Armin Braun	2da040601b	Fix Bug in Snapshot Status Response Timestamps (#46919 ) (#46970 ) Fixing a corner case where snapshot total time calculation was off when getting the `SnapshotStatus` of an in-progress snapshot. Closes #46913	2019-09-23 15:01:47 +02:00
David Turner	7bc86f23ec	Wait longer for leader failure in logs test (#46958 ) `testLogsWarningPeriodicallyIfClusterNotFormed` simulates a leader failure and waits for long enough that a failing leader check is scheduled. However it does not wait for the failing check to actually fail, which requires another two actions and therefore might take up to 200ms more. Unlucky timing would result in this test failing, for instance: ./gradle ':server:test' \ --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" \ -Dtests.jvm.argline="-Dhppc.bitmixer=DETERMINISTIC" \ -Dtests.seed=F18CDD0EBEB5653:E9BC1A8B062E697A This commit adds the extra delay needed for the leader failure to complete as expected. Fixes #46920	2019-09-23 10:52:13 +01:00
Armin Braun	ee4e6b1382	Add TestLogging for #46701 (#46939 ) (#46949 ) This fails at a very low rate and with the force merge in place before checking the cache size it's not clear why the cache is not of size `0` -> seems something else must be happening here that is unexpected. -> add debug logging to this test to find out Relates #46701	2019-09-21 15:24:58 +02:00
Armin Braun	938648fcff	Remove Duplicate Shard Snapshot State Updates (#46862 ) (#46906 ) We were repeatedly trying to send shard state updates for aborted snapshots on every cluster state update. This is simply dead-code since those updates are already safely sent in the callbacks passed to `SnapshotShardsService#snapshot`. On master failover, we ensure that the status update is resent via `SnapshotShardsService#syncShardStatsOnNewMaster`. => there is no need for trying to send updates here over and over and this logic can safely be removed	2019-09-20 14:30:03 +02:00
Jason Tedor	97acf353fa	Move pipelines resolved assertion (#46892 ) This assertion was added during the development of required pipelines. In the initial version of that work, the notion of whether or not a request was forwarded from the coordinating node to an ingest node was introduced. It was realized later that instead we needed to track whether or not the pipeline for the request was resolved. When that change was made, this assertion, while not incorrect, was left behind and only applied if the coordinating node was forwarding the request. Instead, the assertion applies whether or not the request is being forwarded. This commit addresses that by moving the assertion and renaming some variables.	2019-09-20 07:27:56 -04:00
Jason Tedor	2425fd1a50	Removed unused import from RequiredPipelineIT.java This commit removes an unused import that was left behind after cleaning up a backport. Sorry.	2019-09-19 16:46:27 -04:00
Jason Tedor	bd77626177	Add the ability to require an ingest pipeline (#46847 ) This commit adds the ability to require an ingest pipeline on an index. Today we can have a default pipeline, but that could be overridden by a request pipeline parameter. This commit introduces a new index setting index.required_pipeline that acts similarly to index.default_pipeline, except that it can not be overridden by a request pipeline parameter. Additionally, a default pipeline and a request pipeline can not both be set. The required pipeline can be set to _none to ensure that no pipeline ever runs for index requests on that index.	2019-09-19 16:37:45 -04:00
Armin Braun	c922743c5d	Remove Bogus Test: testDeleteOrphanSnapshot (#46835 ) (#46874 ) This test is broken with a very low failure rate after recent changes. Particularly after #45689 which does not check for duplicate snapshot uuids during snapshot finalization any more. The check for duplicate uuids during finalization was removed consciously since it led to problems during master failover. This test fails because it increments the repository state id in an unexpected manner now, starting from the impossible situation of having the same snapshot UUID for two different repository state ids. This situation can't normally be reached, but we manually crafted it here. This test didn't do anything before though, because the manually crafted cluster state would simply result in an error during finalization before and nothing but a normal snapshot delete would be tested. => removing this test here, it doesn't test anything. Closes #46843	2019-09-19 18:52:35 +02:00
Yannick Welsch	9638ca20b0	Allow dropping documents with auto-generated ID (#46773 ) When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678	2019-09-19 16:46:33 +02:00
Armin Braun	a087553009	Rearrange BlobStoreRepository to Prepare #46250 (#46824 ) (#46853 ) In #46250 we will have to move to two different delete paths for BwC. Both paths will share the initial reads from the repository though so I extracted/separated the delete logic away from the initial reads to significantly simplify the diff against #46250 here. Also, I added some JavaDoc from #46250 here as well which makes the code a little easier to follow even ignoring #46250 I think.	2019-09-19 13:07:00 +02:00
Tanguy Leroux	3ae51f25dd	Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802 ) This commit moves the common test testSnapshotWithLargeSegmentFiles to the ESMockAPIBasedRepositoryIntegTestCase base class.	2019-09-18 15:41:30 +02:00
Christos Soulios	0076083b35	Implement rounding optimization for fixed offset timezones (#46809 ) Fixes #45702 with date_histogram aggregation when using fixed_interval. Optimization has been implemented for both fixed and calendar intervals	2019-09-18 15:56:34 +03:00
Armin Braun	142b10604e	Fix testHistoryRetention (#46799 ) (#46805 ) Suppress the reasonable-history check in this test to guarantee we're always getting ops based recovery even after a background sync. Closes #45953 Co-Authored-By: David Turner <david.turner@elastic.co>	2019-09-18 13:22:55 +02:00
Martijn van Groningen	ac4e990924	Add ingest cluster state listeners (#46650 ) In the case that an ingest processor factory relies on other configuration in the cluster state in order to construct a processor instance then it is currently undetermined if a processor factory can be notified about a change if multiple cluster state updates are bundled together and if a processor implements the `ClusterStateApplier` interface. (IngestService implements this interface too) The idea with ingest cluster state listener is that it is guaranteed to update the processor factory first before the ingest service creates a pipeline with their respective processor instances. Currently this concept is used in the enrich branch: https://github.com/elastic/elasticsearch/blob/enrich/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichProcessorFactory.java#L21 In this case a processor factory is interested in enrich indices' _meta mapping fields. This is the third PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. Changes to the server module are merged separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation.	2019-09-18 09:13:16 +02:00
Armin Braun	2c70d403fc	Reenable+Fix testMasterShutdownDuringFailedSnapshot (#46303 ) (#46747 ) Reenable this test since it was fixed by #45689 in production code (specifically, the fact that we write the `snap-` blobs without overwrite checks now). Only required adding the assumed blocking on index file writes to test code to properly work again. * Closes #25281	2019-09-17 18:09:48 +02:00
Armin Braun	b0f09b279f	Make Snapshot Logic Write Metadata after Segments (#45689 ) (#46764 ) * Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581 * Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons * Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization * Fixes #41581	2019-09-17 13:09:39 +02:00
Armin Braun	c045bc7f54	Minor Rearrangements in Snapshot Code (#46652 ) (#46752 ) Inlining one trivial single-use method and extracting the stale shard path blob calculation to make the diff with #46250 more manageable.	2019-09-17 09:23:00 +02:00
Armin Braun	20cb95ca5e	Fix testSnapshotRelocatingPrimary to Actually Run Relocations (#46594 ) (#46620 ) Without replicas we won't actually get any relocations going when removing the node constraints in this test. Adjusted the code to force relocations by forbidding nodes that hold primaries instead. Also, fixed the timeouts and asserted that we actually get relocations. Fixes #46276	2019-09-16 15:15:33 +02:00
Andrei Dan	c57cca98b2	[ILM] Add date setting to calculate index age (#46561 ) (#46697 ) * [ILM] Add date setting to calculate index age Add the `index.lifecycle.origination_date` to allow users to configure a custom date that'll be used to calculate the index age for the phase transitions (as opposed to the default index creation date). This could be useful for users to create an index with an "older" origination date when indexing old data. Relates to #42449. * [ILM] Don't override creation date on policy init The initial approach we took was to override the lifecycle creation date if the `index.lifecycle.origination_date` setting was set. This had the disadvantage of the user not being able to update the `origination_date` anymore once set. This commit changes the way we make use of the `index.lifecycle.origination_date` setting by checking its value when we calculate the index age (ie. at "read time") and, in case it's not set, default to the index creation date. * Make origination date setting index scope dynamic * Document origination date setting in ilm settings (cherry picked from commit d5bd2bb77ee28c1978ab6679f941d7c02e389d32) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-09-16 08:50:28 +01:00
Armin Braun	2b85dcb201	Parallelize Repository Cleanup Actions (#46647 ) (#46714 ) * Parallelize Repository Cleanup Actions Deleting root blobs and unreferenced indices can safely happen in parallel, no need to have both operations run sequentially when they preclude all other repository operations.	2019-09-16 07:52:03 +02:00
David Turner	272b0ecbdd	Remove docs for proxy mode (#46677 ) We added docs for proxy mode in #40281 but on reflection we should not be documenting this setting since it does not play well with all proxies and we can't recommend its use. This commit removes those docs and expands its Javadoc instead.	2019-09-13 22:20:11 +01:00
Nhat Nguyen	cabff5a7cd	Handle lower retaining seqno retention lease error (#46420 ) We renew the CCR retention lease at a fixed interval, therefore it's possible to have more than one in-flight renewal requests at the same time. If requests arrive out of order, then the assertion is violated. Closes #46416 Closes #46013	2019-09-13 08:50:19 -04:00
Nhat Nguyen	e1a33c6283	Fix false positive out of sync warning in synced-flush (#46576 ) Synced-flush consists of three steps: (1) force-flush on every active copy; (2) check for ongoing indexing operations; (3) seal copies if there's no change since step 1. If some indexing operations are completed on the primary but not replicas, then Lucene commits from step 1 on replicas won't be the same as the primary's. And step 2 would pass if it's executed when all pending operations are done. Once step 2 passes, we will incorrectly emit the "out of sync" warning message although nothing is wrong here. Relates #28464 Relates #30244	2019-09-12 16:34:33 -04:00
Nhat Nguyen	5465c8d095	Increase timeout for relocation tests (#46554 ) There's nothing wrong in the logs from these failures. I think 30 seconds might not be enough to relocate shards with many documents as CI is quite slow. This change increases the timeout to 60 seconds for these relocation tests. It also dumps the hot threads in case of a timeout. Closes #46526 Closes #46439	2019-09-12 16:34:01 -04:00

3648 Commits