Commit Graph

2512 Commits

Author SHA1 Message Date
Andrey Ershov c1270e97b0
Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178)
In Zen2 cluster bootstrap is required and some parameters are 
called differently in Zen2.
2019-02-01 15:24:08 +01:00
Andrey Ershov bda591453c
Add elasticsearch-node detach-cluster command (#37979)
This commit adds the second part of `elasticsearch-node` tool -
`detach-cluster` command in addition to `unsafe-bootstrap` command.
Also, this commit changes the semantics of `unsafe-bootstrap`, now
`unsafe-bootstrap` changes clusterUUID.
So the algorithm of running `elasticsearch-node` tool is the following:
1) Stop all nodes in the cluster.
2) Pick master-eligible node with the highest (term, version) pair and
run the `unsafe-bootstrap` command on it. If there are no survived
master-eligible nodes - skip this step.
3) Run `detach-cluster` command on the remaining survived nodes.

Detach cluster makes the following changes to the node metadata:
1) Sets clusterUUID committed to false.
2) Sets currentTerm and term to 0. 
3) Removes voting tombstones and sets voting configurations to special
constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster
bootstrap.

`ElasticsearchNodeCommand` base abstract class is introduced, because
`UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in
common.
Also, this commit adds "ordinal" parameter to both commands, because it's 
impossible to write IT otherwise.
For MUST_JOIN_ELECTED_MASTER case special handling is introduced in
`ClusterFormationFailureHelper`.
Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed
from `UnsafeBootstrapMasterIT`).
2019-02-01 14:53:55 +01:00
Alexander Reelsen 979e5576e5
Add tests for fractional epoch parsing (#38162)
Fractional epoch parsing is supported, the tests we used were edge cases
that did not make sense. This adds tests to properly check for this.
2019-02-01 14:48:37 +01:00
Tanguy Leroux 029e4b6278
Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159)
The current CloseWhileRelocatingShardsIT test adds some "send behavior" 
rule to a target node's mocked transport service in order to detect when shard 
relocating are started. These rules are never cleared and prevent the test to 
complete normally after the rebalance is re-enabled again.

This commit changes the test so that rules are cleared and most verifications 
are done before the rebalance is reenabled again.

Closes #38090
2019-02-01 12:58:46 +01:00
Yannick Welsch ce469cfda5
Fix testCorruptedIndex (#38161)
Folks at the Lucene project do not seem to be interested in classifying corruptions and
distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525),
so we'll just cop out as well.

Closes #34322
2019-02-01 12:51:38 +01:00
Luca Cavanna e18cac3659
Add finalReduce flag to SearchRequest (#38104)
With #37000 we made sure that fnial reduction is automatically disabled
whenever a localClusterAlias is provided with a SearchRequest.

While working on #37838, we found a scenario where we do need to set a
localClusterAlias yet we would like to perform a final reduction in the
remote cluster: when searching on a single remote cluster.

Relates to #32125

This commit adds support for a separate finalReduce flag to
SearchRequest and makes use of it in TransportSearchAction in case we
are searching against a single remote cluster.

This also makes sure that num_reduce_phases is correct when searching
against a single remote cluster: it makes little sense to return
`num_reduce_phases` set to `2`, which looks especially weird in case
the search was performed against a single remote shard. We should
perform one reduction phase only in this case and `num_reduce_phases`
should reflect that.

* line length
2019-02-01 12:11:42 +01:00
Jim Ferenczi 6fa93ca493
Forbid negative field boosts in analyzed queries (#37930)
This change forbids negative field boost in the `query_string`, `simple_query_string`
and `multi_match` queries.
Negative boosts are not allowed in Lucene 8 (scores must be positive).
The backport of this change to 6x will turn the error into a deprecation warning
in order to raise the awareness of this breaking change in 7.0.

Closes #33309
2019-02-01 11:41:40 +01:00
Jim Ferenczi 57b1d245e8
Remove AtomiFieldData#getLegacyFieldValues (#38087)
This function is unused now that we format the docvalue fields with the default
formatter on the field (#30831)
2019-02-01 11:41:17 +01:00
Andrey Ershov bfd618cf83
Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038)
Currently, there are a few tests that use autoMinMasterNodes=false and
hence override addExtraClusterBootstrapSettings, mostly this is 10-30
lines of codes that are copy-pasted from class to class.

This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`
which is suitable for all classes and copy-paste could be removed.

Removing code is always a good thing!
2019-02-01 11:34:31 +01:00
Jim Ferenczi b7308aa03c
Don't load global ordinals with the `map` execution_hint (#37833)
The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality.

Closes #37705
2019-02-01 09:35:46 +01:00
David Turner 23f00e3676
Relax fault detector in some disruption tests (#38101)
Today we use `AbstractDisruptionTestCase` to test the behaviour of things like
master elections in the presence of cluster disruptions. These tests have
rather enthusiastic fault detection settings, detecting a fault if a single
ping fails, with a one-second timeout. Furthermore there are some tests that
assert the identity of the master remains unchanged during some disruption, and
these assertions fail rather often thanks to the overly sensitive fault
detector.

However in a number of these tests the fault detector need not be this
sensitive. This commit moves some such tests into their own test suite and uses
more sensible fault-detection settings to avoid the kind of master instability
that is causing CI failures.

Closes #37699
2019-02-01 08:10:49 +00:00
Alexander Reelsen c02cd3e2fd
Fix java time epoch date formatters (#37829)
The self written epoch date formatters were not properly able to format
an Instant to a string due to a misconfiguration.

This fix also removes a until now existing runtime behaviour under java
8 regarding the names of the aggregation buckets, which are now the same
as before and have been under java 11.
2019-02-01 09:03:48 +01:00
Yannick Welsch 859e2f5bc8 Adapt timeouts in UpdateMappingIntegrationIT
Relates to #37263 and possibly #36916
2019-02-01 08:58:31 +01:00
Adrien Grand d83c748417
Fix test bug in DynamicMappingsIT. (#37906)
Closes #37898
2019-02-01 08:35:29 +01:00
Przemyslaw Gomulka 2758578570
Trim the JSON source in indexing slow logs (#38081)
The '{' as a first character in log line is causing problems for beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting. 
Trimming the source part of a SlogLog solves that and keeps the logs readable.

closes #38080
2019-02-01 08:12:12 +01:00
Armin Braun 0a604e3b24
Fix Two Races that Lead to Stuck Snapshots (#37686)
* Fixes two broken spots:
    1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting
    2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state
* Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk
* Significantly extended test infrastructure to reproduce the above two issues
  * Two new test runs:
      1. Reproducing the effects of node disconnects/restarts in isolation
      2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes
* Relates #32265 
* Closes #32348
2019-02-01 05:45:40 +01:00
Nhat Nguyen b8b843476d
Disable dynamic mapping in testSimpleGetFieldMappingsWithDefaults (#38045)
Since #31140 we no longer require acking on the dynamic mapping of index
requests. Thus, a returned mapping from a get mapping request does not
necessarily contain the dynamic updates from the index request. This
commit replaces the dynamic mapping update with a manual put mapping.

Relates #31140
Closes #37928
2019-01-31 21:01:41 -05:00
Nhat Nguyen a8ebe2a217
Fix random params in testSoftDeletesRetentionLock (#38114)
Since #37992 the retainingSequenceNumber is initialized with 0 
while the global checkpoint can be -1.

Relates #37992
2019-01-31 20:50:41 -05:00
Lee Hinman c67a9663af
Fix MasterServiceTests.testClusterStateUpdateLogging (#38116)
This changes the test to not use a `CountDownlatch`, instead adding an assertion
for the final logging message and waiting until the `MockAppender` has seen it
before proceeding.

Related to df2c06f6f30f7e23a6863a3f72fc3bdb7648885c
Resolves #23739
2019-01-31 17:13:19 -07:00
Yuri Astrakhan f3cde06a1d
geotile_grid implementation (#37842)
Implements `geotile_grid` aggregation

This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240

This code uses the same base classes as `geohash_grid` agg, but uses a different hashing
algorithm to allow zoom consistency.  Each grid bucket is aligned to Web Mercator tiles.
2019-01-31 19:11:30 -05:00
Pascal Christoph a3d9ba3f4b Log document id when MapperParsingException occurs (#37800)
Closes #37658
2019-01-31 16:33:13 -05:00
Nhat Nguyen 237fcda2cc
Disable dynamic mapping update in testTransportBulkTasks (#38073)
If a replica does not have a right mapping yet, we will retry the index
request on that replica; then the actual tasks is higher than the
expected tasks. Since #31140 this happens more frequently for we no
longer require acking on the dynamic mapping of index requests.

Relates #31140
Closes #37893
2019-01-31 13:16:52 -05:00
Przemyslaw Gomulka 28b5c7ce78
Do not set up NodeAndClusterIdStateListener in test (#38110)
When extending ESIntegTestCase are run on the same jvm, the static field in
NodeAndClusterIdConverter will throw an AlreadySet exceptions.
overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening
relates #32850
2019-01-31 18:59:40 +01:00
Nhat Nguyen 8e95780f98
Soft-deletes policy should always fetch latest leases (#37940)
If a new retention lease is added while a primary's soft-deletes policy
is locked for peer-recovery, that lease won't be baked into the Lucene
commit.

Relates #37165
Relates #37375
2019-01-31 12:02:57 -05:00
Henning Andersen 68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
David Turner 7f738e8541
Minor logging improvements (#38084)
Fixes some log messages that caused some minor confusion when digging through a
log generated by a failing test.
2019-01-31 16:41:04 +00:00
Tal Levy 9923f0fe6a
fix a few versionAdded values in ElasticsearchExceptions (#37877)
TooManyBucketsException was introduced in v6.2
and SnapshotInProgressException was introduced in v6.7
2019-01-31 08:28:20 -08:00
Tanguy Leroux 7a597cad0d
Reenable BWC tests after backport of #37899 (#38093)
This commit adapts the version used in StartedShardEntry serialization
 after the backport of  #37899 and reenables bwc tests.

Related to #37899
Related to #38074
2019-01-31 16:53:28 +01:00
Henning Andersen 7487be3d3c Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock 2019-01-31 15:31:01 +01:00
Jason Tedor a9b12b38f0
Push primary term to replication tracker (#38044)
This commit pushes the primary term into the replication tracker. This
is a precursor to using the primary term to resolving ordering problems
for retention leases. Namely, it can be that out-of-order retention
lease sync requests arrive on a replica. To resolve this, we need a
tuple of (primary term, version). For this to be, the primary term needs
to be accessible in the replication tracker. As the primary term is part
of the replication group anyway, this change conceptually makes sense.
2019-01-31 09:19:49 -05:00
Luca Cavanna 622fb7883b
Introduce ability to minimize round-trips in CCS (#37828)
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.

This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour.

Relates to #32125
2019-01-31 15:12:14 +01:00
Armin Braun ae9f4df361
Don't Assert Ack on when Publish Timeout is 0 in Test (#38077)
* Publish timeout is set to `0` so out of order processing of states on the node can lead to a `false` ack response
  * See #30672
* Closes #36813
2019-01-31 14:35:11 +01:00
Alexander Reelsen 9f026bb8ad
Reduce object creation in Rounding class (#38061)
This reduces objects creations in the rounding class (used by aggs) by properly
creating the objects only once. Furthermore a few unneeded ZonedDateTime objects
were created in order to create other objects out of them. This was
changed as well.

Running the benchmarks shows a much faster performance for all of the
java time based Rounding classes.
2019-01-31 14:18:28 +01:00
Adrien Grand a536fa7755
Treat put-mapping calls with `_doc` as a top-level key as typed calls. (#38032)
Currently the put-mapping API assumes that because the type name is `_doc` then
it is dealing with a typeless put-mapping call. Yet we still allow running the
put-mapping API in a typed fashion with `_doc` as a type name. The current logic
triggers surprising errors when doing a typed put-mapping call with `_doc` as a
type name on an index that has a type already.

This is a bit of a corner-case, but is more important on 6.x due to the fact
that using the index API with `_doc` as a type name triggers typed calls to the
put-mapping API with `_doc` as a type name.
2019-01-31 13:57:42 +01:00
David Turner eadcb5f0f8
Fix size of rolling-upgrade bootstrap config (#38031)
Zen2 nodes will bootstrap themselves once they believe there to be no remaining
Zen1 master-eligible nodes in the cluster, as long as minimum_master_nodes is
satisfied.

Today the bootstrap configuration comprises just the ids of the known
master-eligible nodes, and this might be too small to be safe. For instance, if
there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the
bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum,
and this does not intersect other quorums that might arise, leading to a
split-brain.

This commit fixes this by expanding the bootstrap configuration so that its
quorums satisfy minimum_master_nodes, by adding some of the IDs of the other
master-eligible nodes in the last-published cluster state.
2019-01-31 08:00:11 +00:00
Alexander Reelsen b94acb608b
Speed up converting of temporal accessor to zoned date time (#37915)
The existing implementation was slow due to exceptions being thrown if
an accessor did not have a time zone. This implementation queries for
having a timezone, local time and local date and also checks for an
instant preventing to throw an exception and thus speeding up the conversion.

This removes the existing method and create a new one named
DateFormatters.from(TemporalAccessor accessor) to resemble the naming of
the java time ones.

Before this change an epoch millis parser using the toZonedDateTime
method took approximately 50x longer.

Relates #37826
2019-01-31 08:55:40 +01:00
Alexander Reelsen 160d1bd4dd
Work around JDK8 timezone bug in tests (#37968)
The timezone GMT0 cannot be properly parsed on java8.
The randomZone() method now excludes GMT0, if java8 is used.

Closes #37814
2019-01-31 08:52:35 +01:00
Nhat Nguyen f5398d6511 Mute testRetentionLeasesSyncOnExpiration
Tracked at #37963
2019-01-31 00:57:27 -05:00
Jason Tedor a6a534f1f0
Reenable BWC testing after retention lease stats (#38062)
This commit adjusts the BWC version on retention leases in stats, so
with this we also reenable BWC testing.
2019-01-30 20:34:27 -05:00
Tim Brooks b88bdfe958
Add dispatching to `HandledTransportAction` (#38050)
This commit allows implementors of the `HandledTransportAction` to
specify what thread the action should be executed on. The motivation for
this commit is that certain CCR requests should be performed on the
generic threadpool.
2019-01-30 15:40:49 -07:00
Michael Basnight 945ad05d54
Update verify repository to allow unknown fields (#37619)
The subparser in verify repository allows for unknown fields. This
commit sets the value to true for the parser and modifies the test such
that it accurately tests it.

Relates #36938
2019-01-30 14:31:16 -06:00
David Turner 81c443c9de
Deprecate minimum_master_nodes (#37868)
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
2019-01-30 20:09:15 +00:00
Armin Braun a070b8acc0
Extract TransportRequestDeduplication from ShardStateAction (#37870)
* Extracted the logic for master request duplication so it can be reused by the snapshotting logic
* Removed custom listener used by `ShardStateAction` to not leak these into future users of this class
* Changed semantics slightly to get rid of redundant instantiations of the composite listener
* Relates #37686
2019-01-30 19:21:09 +01:00
Jason Tedor 6500b0cbd7
Expose retention leases in shard stats (#37991)
This commit exposes retention leases via shard-level stats.
2019-01-30 13:20:40 -05:00
Jason Tedor c468b2f7ca
Make primary terms fields private in index shard (#38036)
This commit encapsulates the primary terms fields in index shard. This
is a precursor to pushing the operation primary term down to the
replication tracker.
2019-01-30 12:56:58 -05:00
Nhat Nguyen ed460c2815 Log flush_stats and commit_stats in testMaybeFlush
This test failed a few times over the last several months. It seems that
we triggered a flush, but CI was too slow to finish it in several
seconds. I added the flush stats and commit stats and unmuted this test.
We should have a good clue if this test fails again.

Relates #37896
2019-01-30 12:54:31 -05:00
Christoph Büscher ecbaa38864
Remove deprecated Plugin#onModule extension points (#37866)
Removes some guice index level extension point marked as @Deprecated since at
least 6.0. They served as a signpost for plugin authors upgrading from 2.x but
this shouldn't be relevant in 7.0 anymore.
2019-01-30 17:17:54 +01:00
Igor Motov 23805fa41a
Geo: Fix Empty Geometry Collection Handling (#37978)
Fixes handling empty geometry collection and re-enables
testParseGeometryCollection test.

Fixes #37894
2019-01-30 09:20:30 -05:00
Luca Cavanna b91d587275
Move SearchHit and SearchHits to Writeable (#37931)
This allowed to make SearchHits immutable, while quite a few fields in
SearchHit have to stay mutable unfortunately.

Relates to #34389
2019-01-30 12:05:54 +01:00
Jason Tedor ba285a56a7
Fix limit on retaining sequence number (#37992)
We only assign non-negative sequence numbers to operations, so the lower
limit on retaining sequence numbers should be that it is non-negative
only.
2019-01-30 05:25:17 -05:00
Alexander Reelsen 9ec4abc31e
Ensure date parsing BWC compatibility (#37929)
In order to retain BWC this changes the java date formatters to be able to
parse nanoseconds resolution, even if only milliseconds are supported.
This used to work on joda time as well so that a user could store a date
like `2018-10-03T14:42:44.613469+0000` and then just loose the precision
on anything lower than millisecond level.
2019-01-30 10:47:12 +01:00
Adrien Grand c8af0f4bfa
Use mappings to format doc-value fields by default. (#30831)
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.

This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
2019-01-30 10:31:51 +01:00
Adrien Grand b63b50b945
Give precedence to index creation when mixing typed templates with typeless index creation and vice-versa. (#37871)
Currently if you mix typed templates and typeless index creation or typeless
templates and typed index creation then you will end up with an error because
Elasticsearch tries to create an index that has multiple types: `_doc` and
the explicit type name that you used.

This commit proposes to give precedence to the index creation call so that
the type from the template will be ignored if the index creation call is
typeless while the template is typed, and the type from the index creation
call will be used if there is a typeless template.

This is consistent with the fact that index creation already "wins" if a field
is defined differently in the index creation call and in a template: the
definition from the index creation call is used in such cases.

Closes #37773
2019-01-30 10:28:24 +01:00
Jim Ferenczi 2732bb5cf3
Fix fetch source option in expand search phase (#37908)
This change fixes the copy of the fetch source option into the
expand search request that is used to retrieve the documents of each
collapsed group.

Closes #23829
2019-01-30 08:46:14 +01:00
Jim Ferenczi 5dcc805dc9
Restore a noop _all metadata field for 6x indices (#37808)
This commit restores a noop version of the AllFieldMapper that is instanciated only
for indices created in 6x. We need this metadata field mapper to be present in this version
in order to allow the upgrade of indices that explicitly disable _all (enabled: false).
The mapping of these indices contains a reference to the _all field that we cannot remove
in 7 so we'll need to keep this metadata mapper in 7x. Since indices created in 6x will not
be compatible with 8, we'll remove this noop mapper in the next major version.

Closes #37429
2019-01-30 08:45:50 +01:00
Marios Trivyzas f5b9b4d89c Add version 6.6.1 (#37975) 2019-01-30 15:33:01 +11:00
markharwood b889221f75
Types removal - deprecate include_type_name with index templates (#37484)
Added deprecation warnings for use of include_type_name in put/get index templates.
HLRC changes:
GetIndexTemplateRequest has a new client-side class which is a copy of server's GetIndexTemplateResponse but modified to be typeless.
PutIndexTemplateRequest has a new client-side counterpart which doesn't use types in the mappings
Relates to #35190
2019-01-29 20:52:41 +00:00
jimczi 193017672a Handle completion suggestion without contexts
This change fixes the handling of completion suggestion without contexts.

Relates #36996
2019-01-29 20:31:46 +01:00
Tim Brooks 00ace369af
Use `CcrRepository` to init follower index (#35719)
This commit modifies the put follow index action to use a
CcrRepository when creating a follower index. It routes 
the logic through the snapshot/restore process. A 
wait_for_active_shards parameter can be used to configure
how long to wait before returning the response.
2019-01-29 11:47:29 -07:00
Albert Zaharovits d05a4b9d14
Get Aliases with wildcard exclusion expression (#34230)
This commit adds the code in the HTTP layer that will parse exclusion wildcard
expressions.
The existing code issues 404s for wildcards as well as explicit indices.
But, in general, in an expression with exclude wildcards (-...*) following other
include wildcards, there is no way to tell if the include wildcard produced no
results or they were subsequently excluded.
Therefore, the proposed change is breaking the behavior of 404s for
wildcards. Specifically, no 404s will be returned for wildcards, even
if they are not followed by exclude wildcards or the exclude wildcards
could not possibly exclude what has previously been included.
Only explicitly requested aliases will be called out as missing.
2019-01-29 18:56:20 +02:00
Boaz Leskes 218df3009a
Move update and delete by query to use seq# for optimistic concurrency control (#37857)
The delete and update by query APIs both offer protection against overriding concurrent user changes to the documents they touch. They currently are using internal versioning. This PR changes that to rely on sequences numbers and primary terms.

Relates #37639 
Relates #36148 
Relates #10708
2019-01-29 10:23:05 -05:00
Yannick Welsch 3c9f7031b9
Enforce cluster UUIDs (#37775)
This commit adds join validation around cluster UUIDs, preventing a node to join a cluster if it was
previously part of another cluster. The commit introduces a new flag to the cluster state,
clusterUUIDCommitted, which denotes whether the node has locked into a cluster with the given
uuid. When a cluster is committed, this flag will turn to true, and subsequent cluster state updates
will keep the information about committal. Note that coordinating-only nodes are still free to switch
clusters at will (after restart), as they don't carry any persistent state.
2019-01-29 15:41:05 +01:00
Luca Cavanna 09a11a34ef
Remove clusterAlias instance member from QueryShardContext (#37923)
The clusterAlias member is only used in the copy constructor, to be able
to reconstruct the fully qualified index. It is also possible to remove
the instance member and add a private constructor that accepts the already built Index object which contains the cluster alias.
2019-01-29 15:31:49 +01:00
Boaz Leskes 65a9b61a91
Add Seq# based optimistic concurrency control to UpdateRequest (#37872)
The update request has a lesser known support for a one off update of a known document version. This PR adds an a seq# based alternative to power these operations.

Relates #36148 
Relates #10708
2019-01-29 09:18:05 -05:00
Tanguy Leroux 5d1964bcbf
Ignore shard started requests when primary term does not match (#37899)
This commit changes the StartedShardEntry so that it also contains the 
primary term of the shard to start. This way the master node can also 
checks that the primary term from the start request is equal to the current 
shard's primary term in the cluster state, and it can ignore any shard 
started request that would concerns a previous instance of the shard that 
would have been allocated to the same node.

Such situation are likely to happen with frozen (or restored) indices and 
the replication of closed indices, because with replicated closed indices 
the shards will be initialized again after the index is closed and can 
potentially be re initialized again if the index is reopened as a frozen 
index. In such cases the lifecycle of the shards would be something like:
* shard is STARTED
* index is closed
* shards is INITIALIZING (index state is CLOSED, primary term is X)
* index is reopened
* shards are INITIALIZING again (index state is OPENED, potentially frozen, 
primary term is X+1)

Adding the primary term to the shard started request will allow to discard 
potential StartedShardEntry requests received by the master node if the 
request concerns the shard with primary term X because it has been 
moved/reinitialized in the meanwhile under the primary term X+1.

Relates to #33888
2019-01-29 15:09:40 +01:00
Luca Cavanna 2325fb9cb3
Remove test only SearchShardTarget constructor (#37912)
Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.
2019-01-29 14:58:11 +01:00
Luca Cavanna 42eec55837
Replace failure.get().addSuppressed with failure.accumulateAndGet() (#37649)
Also add a test for concurrent incoming failures
2019-01-29 14:57:33 +01:00
Luca Cavanna a6d4838a67
Clean up allowPartialSearchResults serialization (#37911)
When serializing allowPartialSearchResults to the shards through ShardSearchTransportRequest, we use an optional boolean field, though
the corresponding instance member is declared `boolean` which can never
be null. We also have an assert to verify that the incoming search
request provides a non-null value for the flag, and a comment explaining
that null should be considered a bug.

This commit makes the allowPartialSearchResults method in
ShardSearchRequest return a `boolean` rather than a `Boolean` and
changes the serialization from optional to non optional, in a bw comp manner.
2019-01-29 14:56:22 +01:00
Tanguy Leroux 460f10ce60
Close Index API should force a flush if a sync is needed (#37961)
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a 
forced flush, forcing the translog and the Lucene commit to contain the same max seq 
number and global checkpoint in the case the Translog contains operations that were 
not written in the IndexWriter (like a Delete that touches a non existing doc). This way 
the assertion added in #37426 won't trip.

Related to #33888
2019-01-29 13:15:58 +01:00
Yannick Welsch 504a89feaf
Step down as master when configured out of voting configuration (#37802)
Abdicates to another master-eligible node once the active master is reconfigured out of the voting
configuration, for example through the use of voting configuration exclusions.

Follow-up to #37712
2019-01-29 12:43:04 +01:00
Yannick Welsch 827c4f6567 Make Version.java aware of 6.x Lucene upgrade
Relates to #37913
2019-01-29 10:44:01 +01:00
Przemyslaw Gomulka 891320f5ac
Elasticsearch support to JSON logging (#36833)
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.

To populate additional fields node.id and cluster.uuid which are not available at start time, 
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. 

Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions

Docker log4j2 configuration is now almost the same as the one use for ES binary. 
The only difference is that docker is using console appenders, whereas ES is using file appenders.

relates: #32850
2019-01-29 07:20:09 +01:00
Nhat Nguyen 9ceb218d85 Adjust bwc version for put mapping requests
Relates #37675
2019-01-28 10:57:11 -05:00
Armin Braun 0d109396fa
Increase Timeout in #testSnapshotCanceled (#37890)
* The test failure reported in the issue looks like a mere timeout. Logging suggestst hat the snapshot completes/aborts correctly but the busy
loop polling the snapshot state times out too early.
* Closes #37888
2019-01-28 14:13:02 +01:00
Luca Cavanna a9adc16922 Mute failing SearchQueryIT test
Relates to #37814
2019-01-28 13:41:13 +01:00
Alpar Torok 64b98db973
Add an alias for :server:integTest so it runs as part of internalClusterTest (#37910) 2019-01-28 14:26:22 +02:00
Jason Tedor 194cdfe208
Sync retention leases on expiration (#37902)
This commit introduces a sync of retention leases when a retention lease
expires. As expiration of retention leases is lazy, their expiration is
managed only when getting the current retention leases from the
replication tracker. At this point, we callback to our full retention
lease sync to sync and flush these on all shard copies. With this
change, replicas do not locally manage expiration of retention leases;
instead, that is done only on the primary.
2019-01-28 07:11:51 -05:00
Tanguy Leroux 758eb9d451 Track accurate total hits in CloseIndexIT
The test was not using the TRACK_TOTAL_HITS_ACCURATE and thus
encountered a different issue tracked in #37907. In the meanwhile
we can adapt the test to not fail anymore.

Closes #37897
2019-01-28 11:30:20 +01:00
Martijn van Groningen 4e1a779773
Prepare ShardFollowNodeTask to bootstrap when it fall behind leader shard (#37562)
* Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing.
* Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved.
* Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metdata header.

Relates to #35975
2019-01-28 09:30:04 +01:00
Jim Ferenczi a056804831 Track total hits in tests that index more than 10,000 docs
This change sets track_total_hits to true on a test that requires
to check the total hits of a query that can return more than 10,000 docs.

Closes #37895
2019-01-28 09:24:32 +01:00
Dimitrios Liappis 290c6637c2
Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709)
Replace `threadPool().schedule()` / catch
`EsRejectedExecutionException` pattern with direct calls to
`ThreadPool#scheduleUnlessShuttingDown()`.

Closes #36318
2019-01-28 10:01:26 +02:00
Julie Tibshirani b1735aa93b
Support both typed and typeless 'get mapping' requests in the HLRC. (#37796)
From previous PRs, we've already added support for include_type_name to
the get mapping API. We had also taken an approach to the HLRC where the
server-side `GetMappingResponse#fromXContent` could only handle typeless
input.

This PR updates the HLRC for 'get mapping' to be in line with our new approach:

* Add a typeless 'get mappings' method to the Java HLRC, that accepts new
client-side request and response objects. This new response only handles
typeless mapping definitions.
* Switch the old version of `GetMappingResponse` back to expecting typed
mappings, and deprecate the corresponding method on the HLRC.

Finally, the PR also does some small, related clean-up around 'get field mappings'.
2019-01-27 16:02:22 -08:00
Jason Tedor f24dce1122
Fix newlines in retention lease sync action tests
There is a method invocation here spanning multiple lines. This commit
breaks it up into a line per parameter as this is friendlier to future
changes and diffs.
2019-01-27 08:16:14 -05:00
Jason Tedor 3801925cf0
Copy retention leases under lock
When adding a retention lease, we make a reference copy of the retention
leases under lock and then make a copy of that collection outside of the
lock. However, since we merely copied a reference to the retention
leases, after leaving a lock the underlying collection could change on
us. Rather, we want to copy these under lock. This commit adds a
dedicated method for doing this, asserts that we hold a lock when we use
this method, and changes adding a retention lease to use this method.

This commit was intended to be included with #37398 but was pushed to
the wrong branch.
2019-01-27 08:13:47 -05:00
Jason Tedor 5fddb631a2
Introduce retention lease syncing (#37398)
This commit introduces retention lease syncing from the primary to its
replicas when a new retention lease is added. A follow-up commit will
add a background sync of the retention leases as well so that renewed
retention leases are synced to replicas.
2019-01-27 07:49:56 -05:00
Nhat Nguyen 780b4c72fe
Make ChannelActionListener a top-level class (#37797)
We start using this class more often. Let's make it a top-level class.
2019-01-26 22:01:30 -05:00
Julie Tibshirani afc60bb0e5 Mute DynamicMappingIT#testConflictingDynamicMappings
Tracked in #37898.
2019-01-25 18:09:34 -08:00
Tal Levy eb973a4744 fix GeoHashGridTests precision parsing error
Previously, a hardcoded precision value of 4 was
used by these tests resulting in no approximation
errors. Now that the precision is between 1-12,
precision values of 1 and 2 result in potential
bucketing errors.

This commit adjusts the range to be 4-12.

Fixes #37892.
2019-01-25 17:29:04 -08:00
Julie Tibshirani 58301ead6d Mute IndexShardIT#testMaybeFlush
Tracked in #37896.
2019-01-25 17:12:16 -08:00
Julie Tibshirani 23b0d9b3ed Mute RecoveryWhileUnderLoadIT#testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest
Tracked in #37895.
2019-01-25 16:50:39 -08:00
Julie Tibshirani e41ccdc1a0 Mute GeoWKTShapeParserTests#testParseGeometryCollection
Tracked in #37894.
2019-01-25 16:15:16 -08:00
Julie Tibshirani 827ed12146 Mute TasksIT#testTransportBulkTasks
Tracked in #37893.
2019-01-25 15:29:24 -08:00
Julie Tibshirani a4020f4587 Mute SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard
Tracked in #37888.
2019-01-25 13:40:29 -08:00
Like eb7bf16427 Migrate o.e.i.r.RecoveryState to Writeable (#37380)
Relates to #34389
2019-01-25 15:52:04 -05:00
Nhat Nguyen 5cd4dfb0e4
Relax cluster metadata version check (#37834)
If the in_sync_allocations of index-1 or index-2 is changed, the
metadata version will be increased. This leads to the failure in
the metadata version checks. We need to relax them.

Closes #37820
2019-01-25 14:54:13 -05:00
Yuri Astrakhan f1e71be8b2
Refactored GeoHashGrid unit tests (#37832)
* Refactored GeoHashGrid unit tests

This change allows other grid aggregations to reuse the same tests.

The change mostly just moves code to the base classes, trying to
keep changes to a bare minimum.

* rename createInternalGeoHashGridBucket to createInternalGeoGridBucket

* indentation
2019-01-25 13:37:24 -05:00
Zachary Tong afd4618851 Fixes for a few randomized agg tests that fail hasValue() checks
Closes #37743
Closes #37873
2019-01-25 12:39:42 -05:00
Igor Motov 68149b6058
Geo: replace intermediate geo objects with libs/geo (#37721)
Replaces intermediate geo objects built by ShapeBuilders with
objects from the libs/geo hierarchy. This should allow us to build
all geo functionality around a single hierarchy.

Follow up for #35320
2019-01-25 11:37:27 -05:00
Tanguy Leroux a644bc095c
Add unit tests for ShardStateAction's ShardStartedClusterStateTaskExecutor (#37756) 2019-01-25 16:51:53 +01:00
Vishnu Gt 27c3fb8e0d Do not allow negative variances (#37384)
Due to floating point error, it was possible for variances to become negative which should never happen.  This bugfix sets variance to zero if it becomes negative as a result of fp error.
2019-01-25 09:56:34 -05:00