Commit Graph

2708 Commits

Author SHA1 Message Date
Yogesh Gaikwad 20e5994179
Mute failing tests in NodeConnectionsServiceTests (#40034) (#40035) 2019-03-14 19:40:15 +11:00
Przemyslaw Gomulka 8a314a36db
Change zone formatting for all printers backport(#39568) #39952
After the joda-java time migration we were formatting zone ids with zoneOrOffsetId method. This when a date was provided with a ZoneRegion for instance America/Edmonton it was appending this zone identifier instead of zone formatted as +HH:MM.
This fix is changing the format of zone suffix for all printers and also always wrapping a Temporal into a ZonedDateTime when formatting.

closes #38471
backport #39568
2019-03-13 18:27:37 +01:00
Jim Ferenczi 7a7658707a
Upgrade to Lucene release 8.0.0 (#39998)
This commit upgrades to the GA release of Lucene 8

Closes #39640
2019-03-13 18:11:50 +01:00
Tim Brooks 352f9f1f39
Remove sizing from `Recycler#obtain` (#39975)
Currently there is a method `Recycler#obtain(size)` that allows a size
parameter to be passed. However all implementations ignore this
parameter and just allocate a page size based on other settings. This
commit removes this method.
2019-03-13 09:32:31 -06:00
Andrey Ershov 9300826d8a Do not log unsuccessful join attempt each time (#39756)
When performing the test with 57 master-eligible nodes and one node
crash, we saw messy elections, when multiple nodes were attempting to
become master.
JoinHelper has logged 105 long log messages with lengthy stack
traces during one such election.
To address this, we decided to log these messages every time only on
debug level.
We will log last unsuccessful join attempt (along with a timestamp)
if any with WARN level if the cluster is failing to form.

(cherry picked from commit 17a148cc27b5ac6c2e04ef5ae344da05a8a90902)
2019-03-13 13:30:31 +01:00
Christoph Büscher b10dd3769c Add analysis modes to restrict token filter use contexts (#36103)
Currently token filter settings are treated as fixed once they are declared and
used in an analyzer. This is done to prevent changes in analyzers that are already
used actively to index documents, since changes to the analysis chain could
corrupt the index. However, it would be safe to allow updates to token
filters at search time ("search_analyzer"). This change introduces a new
property of token filters that allows to mark them as only being usable at search
or at index time. Any analyzer that uses these tokenfilters inherits that property
and can be rejected if they are used in other contexts. This is a first step towards
making specific token filters (e.g. synonym filter) updateable.

Relates to #29051
2019-03-12 23:48:55 +01:00
Andy Bristol e2b88bc706 add version 6.6.3 2019-03-12 13:21:36 -07:00
David Turner 049970af3e Only connect to new nodes on new cluster state (#39629)
Today, when applying new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves #29025.
2019-03-12 19:26:13 +00:00
Przemyslaw Gomulka a29bba4ede
Migrate Streamable to writeable for index package backport(#37381) #39949
Migrate streamable classes from index package to Writeable and clean up access modifiers

Related to #34389
backport#37381
2019-03-12 12:10:36 +01:00
lzh3636 ad55e5b80d Log missing file exception when failing to read metadata snapshot (#32920)
Adds the exception to the logged output, which contains info about the file that's missing.
2019-03-12 10:41:44 +01:00
Nhat Nguyen ce5f09ab04 Enforce retention leases require soft deletes (#39922)
If a primary on 6.7 and a replica on 5.6 are running more than 5 minutes
(retention leases background sync interval), the retention leases
background sync will be triggered, and it will trip 6.7 node due to the
illegal checkpoint value. We can fix the problem by making the returned
checkpoint depends on the node version. This PR, however, chooses to
enforce retention leases require soft deletes, and make retention leases
sync noop if soft deletes is disabled instead.

Closes #39914
2019-03-11 22:37:47 -04:00
Nhat Nguyen bf814357ad Enable soft deletes in RetentionLeaseIT
Relates #39922
2019-03-11 22:37:42 -04:00
Armin Braun 9eb4614fa6
More Verbose Assertion in testSnapshotWithStuckNode (#39893) (#39928)
* The test failure in #39852 is caused by a file in the initial repository when there should not be any
  * It seems that on a normal consistent file system no left-over file should exist ever here after the validation finishes and I can't reproduce or see any other path to a dangling file in the fresh respository
=> added a more verbose and strict assertion that will log what file is left over next time
* Relates #39852
2019-03-11 19:27:08 +01:00
Jake Landis b0b0f66669
Remove types from internal monitoring templates and bump to api 7 (#39888) (#39926)
This commit removes the "doc" type from monitoring internal indexes.
The template still carries the "_doc" type since that is needed for
the internal representation.

This change impacts the following templates:
monitoring-alerts.json
monitoring-beats.json
monitoring-es.json
monitoring-kibana.json
monitoring-logstash.json

As part of the required changes, the system_api_version has been
bumped from "6" to "7" and support for version "2" has been dropped.

A new empty pipeline is now introduced for the version "7", and
the formerly empty "6" pipeline will now remove the type and re-direct
the request to the "7" index.

Additionally, to due to a difference in the internal representation
(which requires the inclusion of "_doc" type) and external representation
(which requires the exclusion of any type) a helper method is introduced
to help convert internal to external representation, and used by the
monitoring HTTP template exporter.

Relates #38637
2019-03-11 13:17:27 -05:00
Yannick Welsch 4f941c6963 Do not swallow exceptions in TimedRunnable (#39856)
Executors of type fixed_auto_queue_size (i.e. search / search_throttled) wrap runnables into
TimedRunnable, which is an AbstractRunnable. This is dangerous as it might silently swallow
exceptions, and possibly miss calling a response listener. While this has not triggered any failures in
the tests I have run so far, it might help uncover future problems.

Follow-up to #36137
2019-03-11 19:03:12 +01:00
Yannick Welsch 292eb8b001 Fix CoordinatorTests.testIncompatibleDiffResendsFullState (#39345)
This test started failing since decreasing the leader and follower check timeouts (#38298). The
reason is that the test was relying on the default publication timeout to come into effect before
leader / follower check timeouts, which is now not always true anymore.

Closes #38867
2019-03-11 19:03:10 +01:00
Tim Brooks dd77899278
Log send failure at debug level if channel closed (#39807)
Currently we log exceptions due to channel close at the debug level in
the normal exception handler. Currently we log all send failures due to
channel close at the warn level. This commit changes that to only log at
warn if the send failure is not due to channel closed. Additionally, it
adds the ssl engine closed as a channel close exception.
2019-03-11 10:33:02 -06:00
Yannick Welsch b7be724e50 Check term earlier in publication process (#39909)
in order to avoid tripping assertPreviousStateConsistency.

Closes #39314
2019-03-11 15:40:20 +01:00
David Turner 6e4f304f88 Synchronize pendingOutgoingJoins (#39900)
Today we use a ConcurrentHashSet to track the in-flight outgoing joins in the
`JoinHelper`. This is fine for adding and removing elements but not for the
emptiness test in `isJoinPending()` which might return false if one join
finishes just after another one starts, even though joins were pending
throughout.

As used today this is ok: it means the node was trying to join a master but
this join attempt just finished unsuccessfully, and causes it to (rightfully)
reject a `FollowerCheck` from the failed master. However this kind of API
inconsistency is trappy and there is no need to be clever here, so this change
replaces the set with a `synchronizedSet()`.
2019-03-11 12:13:21 +00:00
Ankit Jain 471aa6a16a Fixing 503 Service Unavailable errors during fetch phase (#39086)
When ESRejectedExecutionException gets thrown on the coordinating node while trying to fetch hits, the resulting exception will hold no shard failures, hence `503` is used as the response status code. In that case, `429` should be returned instead. Also, the status code should be taken from the cause if available whenever there are no shard failures instead of blindly returning `503` like we currently do.

Closes #38586
2019-03-11 10:13:55 +01:00
Adrien Grand b841de2e38
Don't emit deprecation warnings on calls to the monitoring bulk API. (#39805) (#39838)
The monitoring bulk API accepts the same format as the bulk API, yet its concept
of types is different from "mapping types" and the deprecation warning is only
emitted as a side-effect of this API reusing the parsing logic of bulk requests.

This commit extracts the parsing logic from `_bulk` into its own class with a
new flag that allows to configure whether usage of `_type` should emit a warning
or not. Support for payloads has been removed for simplicity since they were
unused.

@jakelandis has a separate change that removes this notion of type from the
monitoring bulk API that we are considering bringing to 8.0.
2019-03-11 07:58:28 +01:00
Adrien Grand 2bbef67770
Propagate exceptions in o.e.common.io.Streams. (#39042) (#39848)
This commit propagates some exceptions that were previously swallowed and also
makes sure that exceptions closing streams are either propagated if the try
block succeeded or added as suppressed exceptions otherwise.
2019-03-11 07:58:01 +01:00
Benjamin Trent 4da04616c9
[ML] refactoring lazy query and agg parsing (#39776) (#39881)
* [ML] refactoring lazy query and agg parsing

* Clean up and addressing PR comments

* removing unnecessary try/catch block

* removing bad call to logger

* removing unused import

* fixing bwc test failure due to serialization and config migrator test

* fixing style issues

* Adjusting DafafeedUpdate class serialization

* Adding todo for refactor in v8

* Making query non-optional so it does not write a boolean byte
2019-03-10 14:54:02 -05:00
Julie Tibshirani 8454cfc1b2 Move validation from FieldTypeLookup to MapperMergeValidator. (#39814)
This commit consolidates more mapping validation logic into the same class.
`FieldTypeLookup` is now a bit simpler, and has the sole responsibility of quickly
resolving field names to their types.

I have a broader refactor planned around mapping merge validation, but this
change should at least be a step in the right direction.
2019-03-08 18:05:21 -08:00
Nhat Nguyen 993182e426 Combine overriddenOps and skippedOps in translog (#39771)
These two stats are not important enough to be distinguishable.
This change combines them into a single stat.

Closes #33317
2019-03-08 16:28:50 -05:00
Julie Tibshirani be9c37fc76 Small simplifications to mapping validation. (#39777)
These simplifications to `MapperMergeValidator` are possible now that there is
always a single mapping definition.

* Remove the type argument in `validateMapperStructure`.
* Remove unnecessary checks against existing mappers.
2019-03-08 12:34:09 -08:00
Nhat Nguyen a0a91f74ff Treat TransportService stopped error as node is closing (#39800)
If TransportService is stopped before a shard-failure request is sent
but after the request is registered, TransportService will notify
ReplicationOperation a TransportException with an error message:
"transport stop, action: internal:cluster/shard/failure".

Relates #39584
2019-03-08 15:15:56 -05:00
Ryan Ernst 465343f12a
Bundle java in distributions (#38013)
* Bundle java in distributions

Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.

This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.

closes #31845
2019-03-08 11:04:18 -08:00
Gordon Brown e6b9262a31 Mute testOpenCloseApiWildcards (#39578) (#39579) 2019-03-08 15:18:16 +00:00
David Roberts aec2db78ea Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica
Due to https://github.com/elastic/elasticsearch/issues/36813
2019-03-08 13:28:27 +00:00
David Roberts 366eef99a1 Mute SharedClusterSnapshotRestoreIT.testCloseOrDeleteIndexDuringSnapshot
Due to https://github.com/elastic/elasticsearch/issues/39828
2019-03-08 11:42:13 +00:00
David Turner 5d68143b18 Reformat elasticsearch-node messages (#39811)
Flows the warning messages emitted by the `elasticsearch-node` tool to a width
of 72 characters and tweaks the wording slightly.
2019-03-08 10:01:29 +00:00
Jake Landis 797d6b8a66
Execute ingest node pipeline before creating the index (#39607) (#39796)
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates #32786
Relates #32758 
Closes #36545
2019-03-07 13:31:41 -06:00
Jason Tedor 0250d554b6
Introduce forget follower API (#39718)
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
2019-03-07 11:08:45 -05:00
Armin Braun 213cc6673c
Remove Dead Code in o.e.util package (#39717) (#39779)
* None of this code is used so we should delete it, we can always bring it back if needed
2019-03-07 08:31:46 +01:00
Nhat Nguyen b69affda6a Use unwrapped cause to determine if node is closing (#39723)
We need to unwrap and use the actual cause when determining if the node
with primary shard is shutting down because TransportService will throw
a TransportException wrapped in a SendRequestTransportException.

Relates #39584
2019-03-06 15:30:55 -05:00
Nhat Nguyen 1fe7cb594f Don’t ack if unable to remove failing replica (#39584)
Today when a replicated write operation fails to execute on a replica,
the primary will reach out to the master to fail that replica (and mark
it stale). We then won't ack that request until the master removes the
failing replica; otherwise, we will lose the acked operation if the
failed replica is still in the in-sync set. However, if a node with the
primary is shutting down, we might ack such request even though we are
unable to send a shard-failure request to the master. This happens
because we ignore NodeClosedException which is triggered when the
ClusterService is being closed.

Closes #39467
2019-03-06 15:30:55 -05:00
markharwood 1873de5240
Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749)
Bug fix for AnnotatedTextHighlighter - port of 39525

Relates to #39395
2019-03-06 19:02:04 +00:00
Yannick Welsch d094107592 Fix SharedClusterSnapshotRestoreIT
Relates to #39644
2019-03-06 17:51:23 +01:00
Yannick Welsch fef11f7efc Allow snapshotting replicated closed indices (#39644)
This adds the capability to snapshot replicated closed indices.

It also changes snapshot requests in v8.0.0 to automatically expand wildcards to closed indices and hence start snapshotting closed indices by default. For v7.1.0 and above, wildcards are by default only expanded to open indices, which can be changed by explicitly setting the expand_wildcards option either to all or closed.

Note that indices are always restored as open indices, even if they have been snapshotted as closed replicated indices.

Relates to #33888
2019-03-06 16:08:20 +01:00
Simon Willnauer e620fb2e4a Add option to force load term dict into memory (#39741)
Lucene added an optimization to leave the term dictionary on disk
for non-id like fields. This change happened very late in the release
processes such that it's better to have an escape hatch if certain
use-cases are hurt by this optimization. This setting might be
removed in the future if it turns out to be unnecessary.
2019-03-06 15:29:04 +01:00
Christoph Büscher 6c503824c8 Fix occasional SearchServiceTests failure (#39697)
Currently SearchServiceTests.testCloseSearchContextOnRewriteException can fail
if a refresh happens while we test for the SearchPhaseExecutionException that is
thrown later in the test. The test takes the current Store#refCount and expects
it to be the same after the exception is thrown. If a refresh happens in that
interval however, the refCound will be different, causing the test to fail. This
can be provoked e.g. by running this section in a tight loop.
Switching of refresh for this tests solves the issue.
2019-03-06 14:18:03 +01:00
Andrey Ershov 52fd102e23 Avoid serialising state if it was already serialised (#39179)
When preparing the state to send to other nodes, we're serializing it
for each node, despite using putIfAbsent.
This commit checks if the state was already serialized for this node
version before performing the potentially expensive computation.
The map is not used by multiple threads, so computeIfAbsent is not
needed (and could not be used here easily, because IOException could
be thrown).

(cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)
2019-03-06 11:54:13 +01:00
David Turner 295e39a8c8 Drop node if asymmetrically partitioned from master (#39598)
When a node is joining the cluster we ensure that it can send requests to the
master _at that time_. If it joins the cluster and _then_ loses the ability to
send requests to the master then it should be removed from the cluster. Today
this is not the case: the master can still receive responses to its follower
checks, and receives acknowledgements to cluster state publications, so has no
reason to remove the node.

This commit changes the handling of follower checks so that they fail if they
come from a master that the other node was following but which it now believes
to have failed.
2019-03-06 09:41:57 +00:00
David Turner 77dd711847 Tidy up GroupedActionListener (#39633)
Today the `GroupedActionListener` accepts a `defaults` parameter but all
callers pass an empty list. Also it is permitted to pass an empty group but
this is trappy because the delegated listener is never be called in that case.
This commit removes the `defaults` parameter and forbids an empty group.
2019-03-06 09:25:10 +00:00
Armin Braun aaecaf59a4
Optimize Bulk Message Parsing and Message Length Parsing (#39634) (#39730)
* Optimize Bulk Message Parsing and Message Length Parsing

* findNextMarker took almost 1ms per invocation during the PMC rally track
  * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search
* It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes
  * Fixed by adding bulk `int` read to BytesReference
2019-03-06 08:13:15 +01:00
Jason Tedor 75a0d4f470
Rename retention lease setting (#39719)
This commit renames the retention lease setting
index.soft_deletes.retention.lease so that it is under the namespace
index.soft_deletes.retention_lease. As such, we rename the setting to
index.soft_deletes.retention_lease.period.
2019-03-05 22:04:45 -05:00
Jason Tedor 504c792861
Add Docker build type (#39378)
This commit adds a new build type (together with deb/rpm/tar/zip) to
represent the official Docker images. This build type will be displayed
in APIs such as the main and nodes info APIs.
2019-03-05 22:03:15 -05:00
Luca Cavanna 9d0211485c Tie-break completion suggestions with same score and surface form (#39564)
In case multiple completion suggestion entries have the same score and
surface form, the order in which such options will be returned is
currently not deterministic.

With this commmit we introduce tie-breaking for such situations, based
on shard id, index name, index uuid and doc id like we already do for
 ordinary search hits. With this change we also make shardIndex
mandatory when sorting and comparing completion suggestion options,
which was previously only needed later when fetching hits).

Also, we need to make sure shardIndex is properly set when merging
completion suggestions coming from multiple clusters in
`SearchResponseMerger`
2019-03-05 18:03:54 +01:00
Jim Ferenczi 160dc29f0e Handle total hits equal to track_total_hits (#37907)
This change ensures that a total hits equal to the value set for
track_total_hits is not considered as a lower bound.
2019-03-05 16:28:48 +01:00