OpenSearch

Author	SHA1	Message	Date
Nhat Nguyen	495dc11c9c	Mute testPendingRefreshWithIntervalChange Tracked at #39565	2019-03-25 11:47:08 -04:00
Armin Braun	3968d46a17	Remove Redundant Request Wrappers from RepositoryService (#40192 ) (#40404 )	2019-03-25 16:36:02 +01:00
Armin Braun	dc5ff0fffc	Log Warning on Failed Blob Deletes in BlobStoreRepository (#40188 ) (#40340 ) * Log Warning on Failed Blob Deletes in BlobStoreRepository * We should not just debug log these spots, they all can and will lead to leaked files when snapshot deletion fails	2019-03-25 08:52:09 +01:00
Nhat Nguyen	b9f96a8e1f	Expose external refreshes through the stats API (#38643 ) Right now, the stats API only provides refresh metrics regarding internal refreshes. This isn't very useful and somewhat misleading for cluster administrators since the internal refreshes are not indicative of documents being available for search. In this PR I added a new metric for collecting external refreshes as they occur and exposing them through the stats API. Now, calling an endpoint for stats will yield external refresh metrics as well. Relates #36712	2019-03-24 22:21:00 -04:00
Armin Braun	13d76239a0	Use Netty ByteBuf Bulk Operations for Faster Deserialization (#40158 ) (#40339 ) * Use bulk methods to read numbers faster from byte buffers	2019-03-24 19:08:51 +01:00
Jason Tedor	10bbb082a4	Only run retention lease actions on active primary (#40386 ) In some cases, a request to perform a retention lease action can arrive on a primary shard before it is active. In this case, the primary shard would not yet be in primary mode, tripping an assertion in the replication tracker. Instead, we should not attempt to perform such actions on an initializing shard. This commit addresses this by not returning the primary shard in the single shard iterator if the primary shard is not yet active.	2019-03-23 09:39:39 -04:00
Zachary Tong	78f737dad3	Map value field to double in MovavgIT (#40230 ) We were accidentally not mapping the index, which meant dynamic mapping was choosing floats for the values. This led to enough loss of precision for the aggregated values to differ slightly from the test doubles, which accumulated into large differences in the holt output. This test fix adds an explicit mapping.	2019-03-21 14:03:14 -04:00
Jason Tedor	1e6941b138	Reduce retention lease sync intervals (#40302 ) This commit adjusts the frequency with which CCR renews retention leases and with which primaries sync retention leases to replicas. This helps Lucene reclaim soft-deleted documents more aggressively, which we have found in some use-cases can help improve performance, and either way will help keep disk space under more control.	2019-03-21 07:37:44 -04:00
Alan Woodward	83d2870308	Add `use_field` option to intervals query (#40157 ) This is the equivalent of the `field_masking_span` query, allowing users to merge intervals from multiple fields - for example, to search for stemmed tokens near unstemmed tokens.	2019-03-20 16:26:04 +00:00
Like	6f64267626	Make setting index.translog.sync_interval be dynamic (#37382) Currently, the index setting index.translog.sync_interval cannot be updated while an index is open, because it is not dynamic and can only be changed on a closed index. Closes #32763	2019-03-20 17:12:45 +01:00
Yannick Welsch	a5fb7fb17c	Fix snapshot restore logging on fresh restore (#40252 ) A recent refactoring (#37130) where imports got mixed up (changing Lucene's IndexNotFoundException to Elasticsearch's IndexNotFoundException) led to many warnings being logged in case of restoring a fresh snapshot.	2019-03-20 16:51:44 +01:00
Jim Ferenczi	3400483af4	Add date and date_nanos conversion to the numeric_type sort option (#40199 ) (#40224 ) This change adds an option to convert a `date` field to nanoseconds resolution and a `date_nanos` field to millisecond resolution when sorting. The resolution of the sort can be set using the `numeric_type` option of the field sort builder. The conversion is done at the shard level and is restricted to dates from 1970 to 2262 for the nanoseconds resolution in order to avoid numeric overflow.	2019-03-20 16:50:28 +01:00
Nhat Nguyen	efaf95628b	Use separate translog dir in testDeleteWithFatalError This test currently opens a new engine but shares the same translog directory of the previous opening engine.	2019-03-20 10:22:27 -04:00
Mayya Sharipova	49a7c6e0e8	Expose proximity boosting (#39385 ) (#40251 ) Expose DistanceFeatureQuery for geo, date and date_nanos types Closes #33382	2019-03-20 09:24:41 -04:00
Henning Andersen	4c2a8638ca	Cascading primary failure lead to MSU too low (#40249 ) If a replica were first reset due to one primary failover and then promoted (before resync completes), its MSU would not include changes since global checkpoint, leading to errors during translog replay. Fixed by re-initializing MSU before restoring local history.	2019-03-20 14:00:43 +01:00
Simon Willnauer	235f57989f	Return cached segments stats if `include_unloaded_segments` is true (#39698 ) Today we don't return segments stats for closed indices which makes it hard to tell how much memory such an index would require. With this change we return the statistics if requested by setting `include_unloaded_segments` to true on the rest request. Relates to #39512	2019-03-20 12:08:41 +01:00
Jason Tedor	9ce740a2eb	Modify casing in JVM home log message This makes the log message consistent with the following line that shows the JVM arguments.	2019-03-20 00:06:16 -04:00
Zachary Tong	69f5869707	Mute SearchResponseMergerTests#testMergeSearchHits Tracking issue: https://github.com/elastic/elasticsearch/issues/40214	2019-03-19 13:40:38 -04:00
David Turner	33d8738c68	Fix RareClusterStateIT on MacOS (#40203 ) Today RareClusterStateIT#testAssignmentWithJustAddedNodes fails on my Mac because it waits for the default connection timeout of 30 seconds to connect to a fake node with IP address 0.0.0.0. This connection attempt fails much more quickly on Linux so the test passes. This commit fixes this by reducing the connection timeout for this test.	2019-03-19 17:33:21 +00:00
Nhat Nguyen	a13b4bc8c5	Always fail engine if delete operation fails (#40117) Unlike index operations, which can fail at the document level due to analysis errors, delete operations should never fail at the document level, whether soft-deletes is enabled or not. With this change, we will always fail the engine if we fail to apply a delete operation to Lucene. Closes #33256	2019-03-19 13:09:23 -04:00
Luca Cavanna	d14e79e849	Serialize top-level pipeline aggs as part of InternalAggregations (#40177) We currently convert pipeline aggregators to their corresponding InternalAggregation instance as part of the final reduction phase. They arrive at the coordinating node as part of QuerySearchResult objects from the shards and, even though we may incrementally reduce aggs (hence we may have some non-final reduces and the final one later), all the reduction phases happen on the same node. With CCS minimizing roundtrips though, each cluster performs its own non-final reduction, and then serializes the results back to the CCS coordinating node which will perform the final coordination. This breaks the assumptions made up until now around reductions happening all on the same node. With #40101 we have made sure that top-level pipeline aggs are not reduced as part of the non-final reduction. The next step is to make sure that they don't get lost, meaning that each coordinating node needs to send them back to the CCS coordinating node as part of the top-level `InternalAggregations` object. Closes #40059	2019-03-19 14:43:39 +01:00
Luca Cavanna	803ec46331	Skip sibling pipeline aggregators reduction during non-final reduce (#40101) Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips as each cluster ends up reducing its own pipeline aggregators locally while that should only be done by the CCS coordinating node later. This causes issues as after their reduction, pipeline aggs cannot be further reduced, which is what happens with CCS causing errors like "java.lang.UnsupportedOperationException: Not supported" being returned. Each coordinating node should rather honour the reduce context flag that indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone. Note that this bug affects only pipeline aggs that don't have a parent in the aggs tree, while all the others work well. Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.	2019-03-19 14:43:39 +01:00
Luca Cavanna	83f12a3d9c	CCS: skip empty search hits when minimizing round-trips (#40098 ) When minimizing round-trips, each cluster returns its own independent search response. In case sort by field and/or field collapsing were requested, when one cluster has no results to return, the information about the field that sorting was based on (SortField array) as well as the field (and the values) that collapsing was performed on are missing in the search response. That causes problems as we can't build the proper `TopDocs` instance which would need to be either `TopFieldDocs` or `CollapseTopFieldDocs`. The merge routine expects that all the top docs are of the same exact type which can't be guaranteed. Given that the problematic results are empty, hence have no impact on the final results, we can simply skip them. Relates to #32125 Closes #40067	2019-03-19 14:43:39 +01:00
Luca Cavanna	9c38fa6468	[TEST] Update TransportSearchActionTests#testShouldMinimizeRoundtrips Relates to #40044 Closes #40051	2019-03-19 14:43:38 +01:00
Luca Cavanna	07bfb4c7f7	CCS: Disable minimizing round-trips when dfs is requested (#40044 ) When using DFS_QUERY_THEN_FETCH search type, the dfs phase is run and its results are used in the query phase to make scoring accurate. When using CCS, depending on whether the DFS phase runs in the CCS coordinating node (like if all shards were local) or in each remote cluster (when minimizing round-trips), scoring will differ. This commit disables minimizing round-trips whenever DFS is requested, as it is not currently possible to ensure that scoring is accurate in that case. Relates to #32125	2019-03-19 14:43:38 +01:00
Nhat Nguyen	8dc6862b17	Unmute and trace testPendingRefreshWithIntervalChange Tracked at #39565	2019-03-19 09:07:54 -04:00
Henning Andersen	dde41cc2dd	Node repurpose tool (#39403 ) When a node is repurposed to master/no-data or no-master/no-data, v7.x will not start (see #37748 and #37347). The `elasticsearch repurpose` tool can fix this by cleaning up the problematic data.	2019-03-19 11:52:02 +01:00
Dimitris Athanasiou	95f660d577	Mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock test (#39689 ) Relates #39688	2019-03-18 15:04:26 -06:00
Henning Andersen	0b214c1bfb	Linearizability checker memory reduction (#40149 ) The cache used in linearizability checker now uses approximately 6x less memory by changing the cache from a set of (bits, state) tuples into a map from bits -> { state }. Each combination of states is kept once only, building on the assumption that the number of state permutations is small compared to the number of bits permutations. For those histories that are difficult to check we will have many bits combinations that use the same state permutations. We end up now using approximately 15 bytes per entry compared to 101 bytes before, ie. a 6x improvement, allowing us to linearizability check significantly longer histories. Re-enabled linearizability checker in CoordinatorTests, hoping above ensures we no longer run out of memory. Resolves #39437	2019-03-18 21:16:59 +01:00
Nhat Nguyen	38e9522218	Remove wait for cluster state step in peer recovery (#40004 ) We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped using it since #25692 (6.0). This change removes that action and related code in 7.x and 8.0. Relates #19287 Relates #25692	2019-03-18 15:17:21 -04:00
Nhat Nguyen	d720a64b9e	Ensure sendBatch not called recursively (#39988 ) This PR introduces AsyncRecoveryTarget which executes remote calls of peer recovery asynchronously. In this change, we also add a new assertion to ensure that method sendBatch, which sends a batch of history operations in phase2, is never called recursively on the same thread. This new assertion will also be used in method sendFileChunks.	2019-03-18 15:17:21 -04:00
Jim Ferenczi	eb540125ea	Fix IndexSearcherWrapper visibility (#39071 ) (#40145 ) This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin. This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed. Closes #30758	2019-03-18 11:33:54 +01:00
Jim Ferenczi	5b73a1bc7d	Add an option to force the numeric type of a field sort (#38095 ) (#40084 ) This change adds an option to the `FieldSortBuilder` that allows to transform the type of a numeric field into another. Possible values for this option are `long` that transforms the source field into an integer and `double` that transforms the source field into a floating point. This new option is useful for cross-index search when the sort field is mapped differently on some indices. For instance if a field is mapped as a floating point in one index and as an integer in another it is possible to align the type for both indices using the `numeric_type` option: ``` { "sort": { "field": "my_field", "numeric_type": "double" <1> } } ``` <1> Ensure that values for this field are transformed to a floating point if needed.	2019-03-18 09:32:45 +01:00
Albert Zaharovits	1b75ee0bd7	AuditTrail correctly handle ReplicatedWriteRequest (#39925 ) This fix deduplicates index names in `BulkShardRequests` and only audits the specific resolved index for every comprising `BulkItemRequest`.	2019-03-17 13:05:26 +02:00
Jason Tedor	86d1d03c37	Remove cluster state size (#40109 ) This commit removes the cluster state size field from the cluster state response, and drops the backwards compatibility layer added in 6.7.0 to continue to support this field. As calculation of this field was expensive and had dubious value, we have elected to remove this field.	2019-03-15 17:16:25 -04:00
Tim Brooks	0b50a670a4	Remove transport name from tcp channel (#40074 ) Currently, we maintain a transport name ("mock-nio", "nio", "netty") that is passed to a `TcpTransportChannel` when a request is received. The value of this name is to associate with the task when we register a task with the task manager. However, it is only possible to run ES with one transport, so having an implementation specific name is unnecessary. This commit removes the name and replaces it with the generic "transport".	2019-03-15 12:04:13 -06:00
Zachary Tong	c72feedd74	Do not allow Sampler to allocate more than maxDoc size, better CB accounting (#39381) The `sampler` agg creates a BestDocsDeferringCollector, which internally initializes a priority queue of size `shardSize`. This queue is populated with empty `Object` sentinels, which is roughly 16b per object. Similarly, the Diversified samplers create a DiversifiedTopDocsCollectors which internally track PQ slots with ScoreDocKeys, weighing in around 28kb. If the user sets a very abusive `shard_size`, this could easily OOM a node or cluster since these PQ are allocated up-front without any checks. This commit makes sure that when we create the collector, it cannot be larger than the maxDoc so that we don't accidentally blow up the node. We ensure the size is not greater than the overall index maxDoc. A similar treatment is done for the `maxDocsPerValue` parameter of the diversified samplers. For good measure, this also adds in some CB accounting to try and track memory usage. Finally, a redundant array creation is removed to reduce a bit of temporary memory.	2019-03-15 13:19:55 -04:00
Yannick Welsch	c74111ff8e	Reduce logging noise when stepping down as master before state recovery (#39950 ) Reduces the logging noise from the state recovery component when there are duelling elections. Relates to #32006	2019-03-15 17:24:03 +01:00
David Turner	0d152a54f8	Await all pending activity in testConnectAndDisconnect (#40037 ) We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes #40030.	2019-03-15 08:08:57 +00:00
David Turner	a323132503	Create retention leases file during recovery (#39359 ) Today we load the shard history retention leases from disk whenever opening the engine, and treat a missing file as an empty set of leases. However in some cases this is inappropriate: we might be restoring from a snapshot (if the target index already exists then there may be leases on disk) or force-allocating a stale primary, and in neither case does it make sense to restore the retention leases from disk. With this change we write an empty retention leases file during recovery, except for the following cases: - During peer recovery the on-disk leases may be accurate and could be needed if the recovery target is made into a primary. - During recovery from an existing store, as long as we are not force-allocating a stale primary. Relates #37165	2019-03-15 07:49:49 +00:00
David Turner	8d2184b315	Fix up committed configuration on fake Zen1 nodes (#40065) Today we test Zen1/Zen2 compatibility by running 7.x nodes with a "fake" Zen1 implementation. However this is not a truly faithful test because these nodes do know how to properly deserialize a 7.x cluster state, voting configurations and all, whereas a real Zen1 node is in 6.7 and ignores the coordination metadata. We only ever apply a cluster state that's been committed, which in Zen2 involves setting the last-committed configuration to equal the last-accepted configuration. Zen1 knows nothing about this adjustment, so it is possible for these to differ. This breaks the assertion that the cluster states are equal on all nodes after integration tests. This commit fixes this by implementing this adjustment in Zen1 before applying a cluster state. Fixes #40055.	2019-03-15 07:44:31 +00:00
Ioannis Kakavas	35aaf04c8c	Handle empty input in AddStringKeyStoreCommand (#39490 ) This change ensures that we do not make assumptions about the length of the input that we can read from the stdin. It still consumes only one line, as the previous implementation	2019-03-15 09:38:22 +02:00
Tamara Braun	e2b60c7141	Fix not Recognizing Disabled Object Mapper (#39862 ) * Fixes not finding disabled object mapper when using dotted field name notation * Closes #39456	2019-03-14 10:57:00 -07:00
Ioannis Kakavas	8dc8fc507d	Handle UTF-8 values in the keystore (#39496) * Handle UTF8 values in the keystore Our current implementation uses CharBuffer#array to get the chars that were decoded from the UTF-8 bytes. The backing array of CharBuffer is created in CharsetDecoder#decode and gets an initial length that is the same as the length of the ByteBuffer it decodes, hence the number of UTF-8 bytes. This works fine for the first 128 characters, where each one needs one byte, but for the next UTF-8 characters (other Latin alphabets, Greek, Cyrillic etc.), where we need 2 to 4 bytes per character, this backing char array has a larger size than the number of the actual chars this CharBuffer contains. Calling `array()` on it will return a char array that can potentially have extra null chars, so the SecureString we get from the KeystoreWrapper is not the same as the one we entered. This commit changes the behavior to use Arrays#copyOfRange to get the necessary chars from the CharBuffer and adds a test with random (maybe not printable) UTF-8 strings	2019-03-14 18:03:50 +02:00
Jason Tedor	9181668edf	Stop returning cluster state size by default (#40016) Computing the compressed size of the cluster state on every invocation of cluster:monitor/state action is expensive, and the value of this field is dubious anyway. Therefore we want to remove computing this field. As a first step, we stop computing and returning this field by default. To avoid breaking users, we will give them a system property to use to tide them over until the next major release when we will actually remove this field. This comes with a deprecation warning too, and the backport to the appropriate minor will also include a note in the migration guide. There will be a follow-up to remove this field in the next major version.	2019-03-14 08:57:55 -04:00
Yogesh Gaikwad	20e5994179	Mute failing tests in NodeConnectionsServiceTests (#40034 ) (#40035 )	2019-03-14 19:40:15 +11:00
Przemyslaw Gomulka	8a314a36db	Change zone formatting for all printers backport (#39568) #39952 After the joda-java time migration we were formatting zone ids with the zoneOrOffsetId method. As a result, when a date was provided with a ZoneRegion, for instance America/Edmonton, the zone identifier was appended instead of the zone formatted as +HH:MM. This fix changes the format of the zone suffix for all printers and also always wraps a Temporal into a ZonedDateTime when formatting. closes #38471 backport #39568	2019-03-13 18:27:37 +01:00
Jim Ferenczi	7a7658707a	Upgrade to Lucene release 8.0.0 (#39998 ) This commit upgrades to the GA release of Lucene 8 Closes #39640	2019-03-13 18:11:50 +01:00
Tim Brooks	352f9f1f39	Remove sizing from `Recycler#obtain` (#39975 ) Currently there is a method `Recycler#obtain(size)` that allows a size parameter to be passed. However all implementations ignore this parameter and just allocate a page size based on other settings. This commit removes this method.	2019-03-13 09:32:31 -06:00
Andrey Ershov	9300826d8a	Do not log unsuccessful join attempt each time (#39756 ) When performing the test with 57 master-eligible nodes and one node crash, we saw messy elections, when multiple nodes were attempting to become master. JoinHelper has logged 105 long log messages with lengthy stack traces during one such election. To address this, we decided to log these messages every time only on debug level. We will log last unsuccessful join attempt (along with a timestamp) if any with WARN level if the cluster is failing to form. (cherry picked from commit 17a148cc27b5ac6c2e04ef5ae344da05a8a90902)	2019-03-13 13:30:31 +01:00
Christoph Büscher	b10dd3769c	Add analysis modes to restrict token filter use contexts (#36103 ) Currently token filter settings are treated as fixed once they are declared and used in an analyzer. This is done to prevent changes in analyzers that are already used actively to index documents, since changes to the analysis chain could corrupt the index. However, it would be safe to allow updates to token filters at search time ("search_analyzer"). This change introduces a new property of token filters that allows to mark them as only being usable at search or at index time. Any analyzer that uses these tokenfilters inherits that property and can be rejected if they are used in other contexts. This is a first step towards making specific token filters (e.g. synonym filter) updateable. Relates to #29051	2019-03-12 23:48:55 +01:00
Andy Bristol	e2b88bc706	add version 6.6.3	2019-03-12 13:21:36 -07:00
David Turner	049970af3e	Only connect to new nodes on new cluster state (#39629 ) Today, when applying new cluster state we attempt to connect to all of its nodes as a blocking part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states which hinders the recovery of the cluster during certain kinds of partition. If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness reasonably gracefully in the meantime. This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application. Resolves #29025.	2019-03-12 19:26:13 +00:00
Przemyslaw Gomulka	a29bba4ede	Migrate Streamable to writeable for index package backport(#37381 ) #39949 Migrate streamable classes from index package to Writeable and clean up access modifiers Related to #34389 backport#37381	2019-03-12 12:10:36 +01:00
lzh3636	ad55e5b80d	Log missing file exception when failing to read metadata snapshot (#32920 ) Adds the exception to the logged output, which contains info about the file that's missing.	2019-03-12 10:41:44 +01:00
Nhat Nguyen	ce5f09ab04	Enforce retention leases require soft deletes (#39922) If a primary on 6.7 and a replica on 5.6 run for more than 5 minutes (the retention leases background sync interval), the retention leases background sync will be triggered, and it will trip the 6.7 node due to the illegal checkpoint value. We can fix the problem by making the returned checkpoint depend on the node version. This PR, however, chooses to enforce that retention leases require soft deletes, and makes the retention leases sync a noop if soft deletes are disabled instead. Closes #39914	2019-03-11 22:37:47 -04:00
Nhat Nguyen	bf814357ad	Enable soft deletes in RetentionLeaseIT Relates #39922	2019-03-11 22:37:42 -04:00
Armin Braun	9eb4614fa6	More Verbose Assertion in testSnapshotWithStuckNode (#39893) (#39928) * The test failure in #39852 is caused by a file in the initial repository when there should not be any * It seems that on a normal consistent file system no left-over file should exist ever here after the validation finishes and I can't reproduce or see any other path to a dangling file in the fresh repository => added a more verbose and strict assertion that will log what file is left over next time * Relates #39852	2019-03-11 19:27:08 +01:00
Jake Landis	b0b0f66669	Remove types from internal monitoring templates and bump to api 7 (#39888) (#39926) This commit removes the "doc" type from monitoring internal indexes. The template still carries the "_doc" type since that is needed for the internal representation. This change impacts the following templates: monitoring-alerts.json monitoring-beats.json monitoring-es.json monitoring-kibana.json monitoring-logstash.json As part of the required changes, the system_api_version has been bumped from "6" to "7" and support for version "2" has been dropped. A new empty pipeline is now introduced for the version "7", and the formerly empty "6" pipeline will now remove the type and re-direct the request to the "7" index. Additionally, due to a difference in the internal representation (which requires the inclusion of "_doc" type) and external representation (which requires the exclusion of any type) a helper method is introduced to help convert internal to external representation, and used by the monitoring HTTP template exporter. Relates #38637	2019-03-11 13:17:27 -05:00
Yannick Welsch	4f941c6963	Do not swallow exceptions in TimedRunnable (#39856 ) Executors of type fixed_auto_queue_size (i.e. search / search_throttled) wrap runnables into TimedRunnable, which is an AbstractRunnable. This is dangerous as it might silently swallow exceptions, and possibly miss calling a response listener. While this has not triggered any failures in the tests I have run so far, it might help uncover future problems. Follow-up to #36137	2019-03-11 19:03:12 +01:00
Yannick Welsch	292eb8b001	Fix CoordinatorTests.testIncompatibleDiffResendsFullState (#39345 ) This test started failing since decreasing the leader and follower check timeouts (#38298). The reason is that the test was relying on the default publication timeout to come into effect before leader / follower check timeouts, which is now not always true anymore. Closes #38867	2019-03-11 19:03:10 +01:00
Tim Brooks	dd77899278	Log send failure at debug level if channel closed (#39807 ) Currently we log exceptions due to channel close at the debug level in the normal exception handler. Currently we log all send failures due to channel close at the warn level. This commit changes that to only log at warn if the send failure is not due to channel closed. Additionally, it adds the ssl engine closed as a channel close exception.	2019-03-11 10:33:02 -06:00
Yannick Welsch	b7be724e50	Check term earlier in publication process (#39909 ) in order to avoid tripping assertPreviousStateConsistency. Closes #39314	2019-03-11 15:40:20 +01:00
David Turner	6e4f304f88	Synchronize pendingOutgoingJoins (#39900 ) Today we use a ConcurrentHashSet to track the in-flight outgoing joins in the `JoinHelper`. This is fine for adding and removing elements but not for the emptiness test in `isJoinPending()` which might return false if one join finishes just after another one starts, even though joins were pending throughout. As used today this is ok: it means the node was trying to join a master but this join attempt just finished unsuccessfully, and causes it to (rightfully) reject a `FollowerCheck` from the failed master. However this kind of API inconsistency is trappy and there is no need to be clever here, so this change replaces the set with a `synchronizedSet()`.	2019-03-11 12:13:21 +00:00
Ankit Jain	471aa6a16a	Fixing 503 Service Unavailable errors during fetch phase (#39086 ) When ESRejectedExecutionException gets thrown on the coordinating node while trying to fetch hits, the resulting exception will hold no shard failures, hence `503` is used as the response status code. In that case, `429` should be returned instead. Also, the status code should be taken from the cause if available whenever there are no shard failures instead of blindly returning `503` like we currently do. Closes #38586	2019-03-11 10:13:55 +01:00
Adrien Grand	b841de2e38	Don't emit deprecation warnings on calls to the monitoring bulk API. (#39805 ) (#39838 ) The monitoring bulk API accepts the same format as the bulk API, yet its concept of types is different from "mapping types" and the deprecation warning is only emitted as a side-effect of this API reusing the parsing logic of bulk requests. This commit extracts the parsing logic from `_bulk` into its own class with a new flag that allows to configure whether usage of `_type` should emit a warning or not. Support for payloads has been removed for simplicity since they were unused. @jakelandis has a separate change that removes this notion of type from the monitoring bulk API that we are considering bringing to 8.0.	2019-03-11 07:58:28 +01:00
Adrien Grand	2bbef67770	Propagate exceptions in o.e.common.io.Streams. (#39042 ) (#39848 ) This commit propagates some exceptions that were previously swallowed and also makes sure that exceptions closing streams are either propagated if the try block succeeded or added as suppressed exceptions otherwise.	2019-03-11 07:58:01 +01:00
Benjamin Trent	4da04616c9	[ML] refactoring lazy query and agg parsing (#39776) (#39881) * [ML] refactoring lazy query and agg parsing * Clean up and addressing PR comments * removing unnecessary try/catch block * removing bad call to logger * removing unused import * fixing bwc test failure due to serialization and config migrator test * fixing style issues * Adjusting DatafeedUpdate class serialization * Adding todo for refactor in v8 * Making query non-optional so it does not write a boolean byte	2019-03-10 14:54:02 -05:00
Julie Tibshirani	8454cfc1b2	Move validation from FieldTypeLookup to MapperMergeValidator. (#39814 ) This commit consolidates more mapping validation logic into the same class. `FieldTypeLookup` is now a bit simpler, and has the sole responsibility of quickly resolving field names to their types. I have a broader refactor planned around mapping merge validation, but this change should at least be a step in the right direction.	2019-03-08 18:05:21 -08:00
Nhat Nguyen	993182e426	Combine overriddenOps and skippedOps in translog (#39771 ) These two stats are not important enough to be distinguishable. This change combines them into a single stat. Closes #33317	2019-03-08 16:28:50 -05:00
Julie Tibshirani	be9c37fc76	Small simplifications to mapping validation. (#39777 ) These simplifications to `MapperMergeValidator` are possible now that there is always a single mapping definition. * Remove the type argument in `validateMapperStructure`. * Remove unnecessary checks against existing mappers.	2019-03-08 12:34:09 -08:00
Nhat Nguyen	a0a91f74ff	Treat TransportService stopped error as node is closing (#39800 ) If TransportService is stopped before a shard-failure request is sent but after the request is registered, TransportService will notify ReplicationOperation a TransportException with an error message: "transport stop, action: internal:cluster/shard/failure". Relates #39584	2019-03-08 15:15:56 -05:00
Ryan Ernst	465343f12a	Bundle java in distributions (#38013 ) * Bundle java in distributions Setting up a jdk is currently a required external step when installing elasticsearch. This is particularly problematic for the rpm/deb packages as installing a jdk in the same package installation command does not guarantee any order, so must be done in separate steps. Additionally, JAVA_HOME must be set and often causes problems in selecting a correct jdk when, for example, the system java is an older unsupported version. This commit bundles platform specific openjdks into each distribution. In addition to eliminating the issues above, it also presents future possible improvements like using jlink to build jdk images only containing modules that elasticsearch uses. closes #31845	2019-03-08 11:04:18 -08:00
Gordon Brown	e6b9262a31	Mute testOpenCloseApiWildcards (#39578 ) (#39579 )	2019-03-08 15:18:16 +00:00
David Roberts	aec2db78ea	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica Due to https://github.com/elastic/elasticsearch/issues/36813	2019-03-08 13:28:27 +00:00
David Roberts	366eef99a1	Mute SharedClusterSnapshotRestoreIT.testCloseOrDeleteIndexDuringSnapshot Due to https://github.com/elastic/elasticsearch/issues/39828	2019-03-08 11:42:13 +00:00
David Turner	5d68143b18	Reformat elasticsearch-node messages (#39811 ) Flows the warning messages emitted by the `elasticsearch-node` tool to a width of 72 characters and tweaks the wording slightly.	2019-03-08 10:01:29 +00:00
Jake Landis	797d6b8a66	Execute ingest node pipeline before creating the index (#39607 ) (#39796 ) Prior to this commit (and after 6.5.0), if an ingest node changes the _index in a pipeline, the original target index would be created. For daily indexes this could create an extra, empty index per day. This commit changes the TransportBulkAction to execute the ingest node pipeline before attempting to create the index. This ensures that the only index created is the original or one set by the ingest node pipeline. This was the execution order prior to 6.5.0 (#32786). The execution order was changed in 6.5 to better support default pipelines. Specifically the execution order was changed to be able to read the settings from the index meta data. This commit also includes a change in logic such that if the target index does not exist when ingest node pipeline runs, it will now pull the default pipeline (if one exists) from the settings of the best matched of the index template. Relates #32786 Relates #32758 Closes #36545	2019-03-07 13:31:41 -06:00
Jason Tedor	0250d554b6	Introduce forget follower API (#39718 ) This commit introduces the forget follower API. This API is needed in cases that unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and also prevent ILM from performing actions like shrink on the leader index. As such, we introduce an API to allow for manual removal of the shard history retention leases in this case.	2019-03-07 11:08:45 -05:00
Armin Braun	213cc6673c	Remove Dead Code in o.e.util package (#39717 ) (#39779 ) * None of this code is used so we should delete it, we can always bring it back if needed	2019-03-07 08:31:46 +01:00
Nhat Nguyen	b69affda6a	Use unwrapped cause to determine if node is closing (#39723 ) We need to unwrap and use the actual cause when determining if the node with primary shard is shutting down because TransportService will throw a TransportException wrapped in a SendRequestTransportException. Relates #39584	2019-03-06 15:30:55 -05:00
Nhat Nguyen	1fe7cb594f	Don’t ack if unable to remove failing replica (#39584 ) Today when a replicated write operation fails to execute on a replica, the primary will reach out to the master to fail that replica (and mark it stale). We then won't ack that request until the master removes the failing replica; otherwise, we will lose the acked operation if the failed replica is still in the in-sync set. However, if a node with the primary is shutting down, we might ack such request even though we are unable to send a shard-failure request to the master. This happens because we ignore NodeClosedException which is triggered when the ClusterService is being closed. Closes #39467	2019-03-06 15:30:55 -05:00
markharwood	1873de5240	Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749 ) Bug fix for AnnotatedTextHighlighter - port of 39525 Relates to #39395	2019-03-06 19:02:04 +00:00
Yannick Welsch	d094107592	Fix SharedClusterSnapshotRestoreIT Relates to #39644	2019-03-06 17:51:23 +01:00
Yannick Welsch	fef11f7efc	Allow snapshotting replicated closed indices (#39644 ) This adds the capability to snapshot replicated closed indices. It also changes snapshot requests in v8.0.0 to automatically expand wildcards to closed indices and hence start snapshotting closed indices by default. For v7.1.0 and above, wildcards are by default only expanded to open indices, which can be changed by explicitly setting the expand_wildcards option either to all or closed. Note that indices are always restored as open indices, even if they have been snapshotted as closed replicated indices. Relates to #33888	2019-03-06 16:08:20 +01:00
Simon Willnauer	e620fb2e4a	Add option to force load term dict into memory (#39741 ) Lucene added an optimization to leave the term dictionary on disk for non-id like fields. This change happened very late in the release processes such that it's better to have an escape hatch if certain use-cases are hurt by this optimization. This setting might be removed in the future if it turns out to be unnecessary.	2019-03-06 15:29:04 +01:00
Christoph Büscher	6c503824c8	Fix occasional SearchServiceTests failure (#39697 ) Currently SearchServiceTests.testCloseSearchContextOnRewriteException can fail if a refresh happens while we test for the SearchPhaseExecutionException that is thrown later in the test. The test takes the current Store#refCount and expects it to be the same after the exception is thrown. If a refresh happens in that interval however, the refCount will be different, causing the test to fail. This can be provoked e.g. by running this section in a tight loop. Switching off refresh for this test solves the issue.	2019-03-06 14:18:03 +01:00
Andrey Ershov	52fd102e23	Avoid serialising state if it was already serialised (#39179 ) When preparing the state to send to other nodes, we're serializing it for each node, despite using putIfAbsent. This commit checks if the state was already serialized for this node version before performing the potentially expensive computation. The map is not used by multiple threads, so computeIfAbsent is not needed (and could not be used here easily, because IOException could be thrown). (cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)	2019-03-06 11:54:13 +01:00
David Turner	295e39a8c8	Drop node if asymmetrically partitioned from master (#39598 ) When a node is joining the cluster we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks, and receives acknowledgements to cluster state publications, so has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.	2019-03-06 09:41:57 +00:00
David Turner	77dd711847	Tidy up GroupedActionListener (#39633 ) Today the `GroupedActionListener` accepts a `defaults` parameter but all callers pass an empty list. Also it is permitted to pass an empty group but this is trappy because the delegated listener is never called in that case. This commit removes the `defaults` parameter and forbids an empty group.	2019-03-06 09:25:10 +00:00
Armin Braun	aaecaf59a4	Optimize Bulk Message Parsing and Message Length Parsing (#39634 ) (#39730 ) * Optimize Bulk Message Parsing and Message Length Parsing * findNextMarker took almost 1ms per invocation during the PMC rally track * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search * It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes * Fixed by adding bulk `int` read to BytesReference	2019-03-06 08:13:15 +01:00
Jason Tedor	75a0d4f470	Rename retention lease setting (#39719 ) This commit renames the retention lease setting index.soft_deletes.retention.lease so that it is under the namespace index.soft_deletes.retention_lease. As such, we rename the setting to index.soft_deletes.retention_lease.period.	2019-03-05 22:04:45 -05:00
Jason Tedor	504c792861	Add Docker build type (#39378 ) This commit adds a new build type (together with deb/rpm/tar/zip) to represent the official Docker images. This build type will be displayed in APIs such as the main and nodes info APIs.	2019-03-05 22:03:15 -05:00
Luca Cavanna	9d0211485c	Tie-break completion suggestions with same score and surface form (#39564 ) In case multiple completion suggestion entries have the same score and surface form, the order in which such options will be returned is currently not deterministic. With this commit we introduce tie-breaking for such situations, based on shard id, index name, index uuid and doc id like we already do for ordinary search hits. With this change we also make shardIndex mandatory when sorting and comparing completion suggestion options (which was previously only needed later when fetching hits). Also, we need to make sure shardIndex is properly set when merging completion suggestions coming from multiple clusters in `SearchResponseMerger`	2019-03-05 18:03:54 +01:00
Jim Ferenczi	160dc29f0e	Handle total hits equal to track_total_hits (#37907 ) This change ensures that a total hits equal to the value set for track_total_hits is not considered as a lower bound.	2019-03-05 16:28:48 +01:00
Armin Braun	750ec8ba53	Minor Cleanups in QueryPhase (#39680 ) (#39694 ) * Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`	2019-03-05 15:04:16 +01:00
Christoph Büscher	5cdea6ef17	Fix Fuzziness#asDistance(String) (#39643 ) Currently Fuzziness#asDistance(String) doesn't work for custom AUTO values. If the fuzziness is AUTO, the method returns the correct edit distance to use, depending on the input string, but for custom AUTO values it currently always returns an edit distance of 1. Correcting this and adding unit and integration tests to catch these cases. Closes #39614	2019-03-05 14:31:07 +01:00
Simon Willnauer	19f6a35358	Move BWC Version to 7.1.0 after backport Relates to #39512	2019-03-05 14:11:59 +01:00
Simon Willnauer	d112c89041	Allow inclusion of unloaded segments in stats (#39512 ) Today we have no chance to fetch actual segment stats for segments that are currently unloaded. This is relevant in the case of frozen indices. This allows to monitor how much memory a frozen index would use if it was unfrozen.	2019-03-05 14:02:20 +01:00
Armin Braun	e8d9744340	Use Threadpool Time in ClusterApplierService (#39679 ) (#39685 ) * Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for #39504	2019-03-05 12:37:49 +01:00

2803 Commits