Commit Graph

2551 Commits

Author SHA1 Message Date
Yannick Welsch 3c9f7031b9
Enforce cluster UUIDs (#37775)
This commit adds join validation around cluster UUIDs, preventing a node to join a cluster if it was
previously part of another cluster. The commit introduces a new flag to the cluster state,
clusterUUIDCommitted, which denotes whether the node has locked into a cluster with the given
uuid. When a cluster is committed, this flag will turn to true, and subsequent cluster state updates
will keep the information about committal. Note that coordinating-only nodes are still free to switch
clusters at will (after restart), as they don't carry any persistent state.
2019-01-29 15:41:05 +01:00
Luca Cavanna 09a11a34ef
Remove clusterAlias instance member from QueryShardContext (#37923)
The clusterAlias member is only used in the copy constructor, to be able
to reconstruct the fully qualified index. It is also possible to remove
the instance member and add a private constructor that accepts the already built Index object which contains the cluster alias.
2019-01-29 15:31:49 +01:00
Boaz Leskes 65a9b61a91
Add Seq# based optimistic concurrency control to UpdateRequest (#37872)
The update request has a lesser known support for a one off update of a known document version. This PR adds an a seq# based alternative to power these operations.

Relates #36148 
Relates #10708
2019-01-29 09:18:05 -05:00
Tanguy Leroux 5d1964bcbf
Ignore shard started requests when primary term does not match (#37899)
This commit changes the StartedShardEntry so that it also contains the 
primary term of the shard to start. This way the master node can also 
checks that the primary term from the start request is equal to the current 
shard's primary term in the cluster state, and it can ignore any shard 
started request that would concerns a previous instance of the shard that 
would have been allocated to the same node.

Such situation are likely to happen with frozen (or restored) indices and 
the replication of closed indices, because with replicated closed indices 
the shards will be initialized again after the index is closed and can 
potentially be re initialized again if the index is reopened as a frozen 
index. In such cases the lifecycle of the shards would be something like:
* shard is STARTED
* index is closed
* shards is INITIALIZING (index state is CLOSED, primary term is X)
* index is reopened
* shards are INITIALIZING again (index state is OPENED, potentially frozen, 
primary term is X+1)

Adding the primary term to the shard started request will allow to discard 
potential StartedShardEntry requests received by the master node if the 
request concerns the shard with primary term X because it has been 
moved/reinitialized in the meanwhile under the primary term X+1.

Relates to #33888
2019-01-29 15:09:40 +01:00
Luca Cavanna 2325fb9cb3
Remove test only SearchShardTarget constructor (#37912)
Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.
2019-01-29 14:58:11 +01:00
Luca Cavanna 42eec55837
Replace failure.get().addSuppressed with failure.accumulateAndGet() (#37649)
Also add a test for concurrent incoming failures
2019-01-29 14:57:33 +01:00
Luca Cavanna a6d4838a67
Clean up allowPartialSearchResults serialization (#37911)
When serializing allowPartialSearchResults to the shards through ShardSearchTransportRequest, we use an optional boolean field, though
the corresponding instance member is declared `boolean` which can never
be null. We also have an assert to verify that the incoming search
request provides a non-null value for the flag, and a comment explaining
that null should be considered a bug.

This commit makes the allowPartialSearchResults method in
ShardSearchRequest return a `boolean` rather than a `Boolean` and
changes the serialization from optional to non optional, in a bw comp manner.
2019-01-29 14:56:22 +01:00
Tanguy Leroux 460f10ce60
Close Index API should force a flush if a sync is needed (#37961)
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a 
forced flush, forcing the translog and the Lucene commit to contain the same max seq 
number and global checkpoint in the case the Translog contains operations that were 
not written in the IndexWriter (like a Delete that touches a non existing doc). This way 
the assertion added in #37426 won't trip.

Related to #33888
2019-01-29 13:15:58 +01:00
Yannick Welsch 504a89feaf
Step down as master when configured out of voting configuration (#37802)
Abdicates to another master-eligible node once the active master is reconfigured out of the voting
configuration, for example through the use of voting configuration exclusions.

Follow-up to #37712
2019-01-29 12:43:04 +01:00
Yannick Welsch 827c4f6567 Make Version.java aware of 6.x Lucene upgrade
Relates to #37913
2019-01-29 10:44:01 +01:00
Przemyslaw Gomulka 891320f5ac
Elasticsearch support to JSON logging (#36833)
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.

To populate additional fields node.id and cluster.uuid which are not available at start time, 
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. 

Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions

Docker log4j2 configuration is now almost the same as the one use for ES binary. 
The only difference is that docker is using console appenders, whereas ES is using file appenders.

relates: #32850
2019-01-29 07:20:09 +01:00
Nhat Nguyen 9ceb218d85 Adjust bwc version for put mapping requests
Relates #37675
2019-01-28 10:57:11 -05:00
Armin Braun 0d109396fa
Increase Timeout in #testSnapshotCanceled (#37890)
* The test failure reported in the issue looks like a mere timeout. Logging suggestst hat the snapshot completes/aborts correctly but the busy
loop polling the snapshot state times out too early.
* Closes #37888
2019-01-28 14:13:02 +01:00
Luca Cavanna a9adc16922 Mute failing SearchQueryIT test
Relates to #37814
2019-01-28 13:41:13 +01:00
Alpar Torok 64b98db973
Add an alias for :server:integTest so it runs as part of internalClusterTest (#37910) 2019-01-28 14:26:22 +02:00
Jason Tedor 194cdfe208
Sync retention leases on expiration (#37902)
This commit introduces a sync of retention leases when a retention lease
expires. As expiration of retention leases is lazy, their expiration is
managed only when getting the current retention leases from the
replication tracker. At this point, we callback to our full retention
lease sync to sync and flush these on all shard copies. With this
change, replicas do not locally manage expiration of retention leases;
instead, that is done only on the primary.
2019-01-28 07:11:51 -05:00
Tanguy Leroux 758eb9d451 Track accurate total hits in CloseIndexIT
The test was not using the TRACK_TOTAL_HITS_ACCURATE and thus
encountered a different issue tracked in #37907. In the meanwhile
we can adapt the test to not fail anymore.

Closes #37897
2019-01-28 11:30:20 +01:00
Martijn van Groningen 4e1a779773
Prepare ShardFollowNodeTask to bootstrap when it fall behind leader shard (#37562)
* Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing.
* Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved.
* Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metdata header.

Relates to #35975
2019-01-28 09:30:04 +01:00
Jim Ferenczi a056804831 Track total hits in tests that index more than 10,000 docs
This change sets track_total_hits to true on a test that requires
to check the total hits of a query that can return more than 10,000 docs.

Closes #37895
2019-01-28 09:24:32 +01:00
Dimitrios Liappis 290c6637c2
Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709)
Replace `threadPool().schedule()` / catch
`EsRejectedExecutionException` pattern with direct calls to
`ThreadPool#scheduleUnlessShuttingDown()`.

Closes #36318
2019-01-28 10:01:26 +02:00
Julie Tibshirani b1735aa93b
Support both typed and typeless 'get mapping' requests in the HLRC. (#37796)
From previous PRs, we've already added support for include_type_name to
the get mapping API. We had also taken an approach to the HLRC where the
server-side `GetMappingResponse#fromXContent` could only handle typeless
input.

This PR updates the HLRC for 'get mapping' to be in line with our new approach:

* Add a typeless 'get mappings' method to the Java HLRC, that accepts new
client-side request and response objects. This new response only handles
typeless mapping definitions.
* Switch the old version of `GetMappingResponse` back to expecting typed
mappings, and deprecate the corresponding method on the HLRC.

Finally, the PR also does some small, related clean-up around 'get field mappings'.
2019-01-27 16:02:22 -08:00
Jason Tedor f24dce1122
Fix newlines in retention lease sync action tests
There is a method invocation here spanning multiple lines. This commit
breaks it up into a line per parameter as this is friendlier to future
changes and diffs.
2019-01-27 08:16:14 -05:00
Jason Tedor 3801925cf0
Copy retention leases under lock
When adding a retention lease, we make a reference copy of the retention
leases under lock and then make a copy of that collection outside of the
lock. However, since we merely copied a reference to the retention
leases, after leaving a lock the underlying collection could change on
us. Rather, we want to copy these under lock. This commit adds a
dedicated method for doing this, asserts that we hold a lock when we use
this method, and changes adding a retention lease to use this method.

This commit was intended to be included with #37398 but was pushed to
the wrong branch.
2019-01-27 08:13:47 -05:00
Jason Tedor 5fddb631a2
Introduce retention lease syncing (#37398)
This commit introduces retention lease syncing from the primary to its
replicas when a new retention lease is added. A follow-up commit will
add a background sync of the retention leases as well so that renewed
retention leases are synced to replicas.
2019-01-27 07:49:56 -05:00
Nhat Nguyen 780b4c72fe
Make ChannelActionListener a top-level class (#37797)
We start using this class more often. Let's make it a top-level class.
2019-01-26 22:01:30 -05:00
Julie Tibshirani afc60bb0e5 Mute DynamicMappingIT#testConflictingDynamicMappings
Tracked in #37898.
2019-01-25 18:09:34 -08:00
Tal Levy eb973a4744 fix GeoHashGridTests precision parsing error
Previously, a hardcoded precision value of 4 was
used by these tests resulting in no approximation
errors. Now that the precision is between 1-12,
precision values of 1 and 2 result in potential
bucketing errors.

This commit adjusts the range to be 4-12.

Fixes #37892.
2019-01-25 17:29:04 -08:00
Julie Tibshirani 58301ead6d Mute IndexShardIT#testMaybeFlush
Tracked in #37896.
2019-01-25 17:12:16 -08:00
Julie Tibshirani 23b0d9b3ed Mute RecoveryWhileUnderLoadIT#testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest
Tracked in #37895.
2019-01-25 16:50:39 -08:00
Julie Tibshirani e41ccdc1a0 Mute GeoWKTShapeParserTests#testParseGeometryCollection
Tracked in #37894.
2019-01-25 16:15:16 -08:00
Julie Tibshirani 827ed12146 Mute TasksIT#testTransportBulkTasks
Tracked in #37893.
2019-01-25 15:29:24 -08:00
Julie Tibshirani a4020f4587 Mute SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard
Tracked in #37888.
2019-01-25 13:40:29 -08:00
Like eb7bf16427 Migrate o.e.i.r.RecoveryState to Writeable (#37380)
Relates to #34389
2019-01-25 15:52:04 -05:00
Nhat Nguyen 5cd4dfb0e4
Relax cluster metadata version check (#37834)
If the in_sync_allocations of index-1 or index-2 is changed, the
metadata version will be increased. This leads to the failure in
the metadata version checks. We need to relax them.

Closes #37820
2019-01-25 14:54:13 -05:00
Yuri Astrakhan f1e71be8b2
Refactored GeoHashGrid unit tests (#37832)
* Refactored GeoHashGrid unit tests

This change allows other grid aggregations to reuse the same tests.

The change mostly just moves code to the base classes, trying to
keep changes to a bare minimum.

* rename createInternalGeoHashGridBucket to createInternalGeoGridBucket

* indentation
2019-01-25 13:37:24 -05:00
Zachary Tong afd4618851 Fixes for a few randomized agg tests that fail hasValue() checks
Closes #37743
Closes #37873
2019-01-25 12:39:42 -05:00
Igor Motov 68149b6058
Geo: replace intermediate geo objects with libs/geo (#37721)
Replaces intermediate geo objects built by ShapeBuilders with
objects from the libs/geo hierarchy. This should allow us to build
all geo functionality around a single hierarchy.

Follow up for #35320
2019-01-25 11:37:27 -05:00
Tanguy Leroux a644bc095c
Add unit tests for ShardStateAction's ShardStartedClusterStateTaskExecutor (#37756) 2019-01-25 16:51:53 +01:00
Vishnu Gt 27c3fb8e0d Do not allow negative variances (#37384)
Due to floating point error, it was possible for variances to become negative which should never happen.  This bugfix sets variance to zero if it becomes negative as a result of fp error.
2019-01-25 09:56:34 -05:00
Tanguy Leroux ef8dd12c6d Limit number of documents indexed in CloseIndexIT test
This test indexes an unlimited number of documents, this commit
reduces this number to 25K and also tracks exact number of hits
when counting the docs.
2019-01-25 15:09:27 +01:00
Christoph Büscher b4b4cd6ebd
Clean codebase from empty statements (#37822)
* Remove empty statements

There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.

* Change test, slightly more verbose but less confusing
2019-01-25 14:23:02 +01:00
Henning Andersen 49073dd2f6
Fail start on invalid index metadata (#37748)
Node started with node.data=false and node.master=false can no longer
start if they have index metadata. This avoids resurrecting old indexes
into the cluster and ensures metadata is cleaned out before
re-purposing a node that was previously master or data node.

Issue #27073
2019-01-25 14:22:48 +01:00
Jim Ferenczi cb451edb01
Allow nested fields in the composite aggregation (#37178)
This changes adds the support to handle `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.

Closes #28611
2019-01-25 14:00:39 +01:00
Alexander Reelsen 9e350d027e
Add BWC compatible processing to ingest date processors (#37407)
The ingest date processor is currently only able to parse joda formats.
However it is not using the existing elasticsearch classes but access
joda directly. This means that our existing BWC layer does not notify
the user about deprecated formats. This commit switches to use the
exising Elasticsearch Joda methods to acquire a date format, that
includes the BWC check and the ability to parse java 8 dates.

The date parsing in ingest has also another extra feature, that the
fallback year, when a date format without a year is used, is the current
year, and not 1970 like usual. This is currently not properly supported
in the DateFormatter class. As this is the only case for this feature
and java time can take care of this using the toZonedDateTime() method,
a workaround just for the joda time parser has been created, that can be
removed soon again from 7.0.
2019-01-25 13:50:19 +01:00
Jim Ferenczi 787acb14b9
Track total hits up to 10,000 by default (#37466)
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.

Closes #33028
2019-01-25 13:45:39 +01:00
Mayya Sharipova 70af3c7983
Correct deprec log in RestGetFieldMappingAction (#37843)
* Correct deprec log in RestGetFieldMappingAction

Correct a class used for deprecation logging in
RestGetFieldMappingAction

* Correct deprec log in RestCreateIndexAction

Correct a class used for deprecation logging in
RestCreateIndexAction
2019-01-25 07:13:46 -05:00
Andrey Ershov 9e7fd8caed
Migrate ZenDiscoveryIT to Zen2 (#37465)
ZenDiscoveryIT contained 5 tests. 3 run without changes, testNodeRejectsClusterStateWithWrongMasterNode removed, testHandleNodeJoin_incompatibleClusterState changed.
2019-01-25 11:17:09 +01:00
Armin Braun 7692b607b9
Fix ClusterDisruptionIT#testAckedIndexing (#37853)
* Stop threads before logging the list of exceptions
* For the broken case of concurrent iteration in the finally block and the threads not having shut down,
use `CopyOnWriteArrayList` to have concurrency safe iteration
* Closes #37810
2019-01-25 09:38:29 +01:00
Martijn van Groningen 5a9dadb3ff
changed versionAdded now that #37767 is backedported 2019-01-25 09:18:42 +01:00
Martijn van Groningen 1151f3b3ff
Fail with a dedicated exception if remote connection is missing or (#37767)
or connectivity to the remote connection is failing.

Relates to #37681
2019-01-25 08:53:18 +01:00
Ricardo Ferreira df8fa9781e Remove Abstract Component (#35898)
TransportAction and BaseRestHandler now no longer extends AbstractComponent. The AbstractComponent no longer has usages so it was deleted.

Closes #34488
2019-01-25 08:35:19 +01:00
Yuri Astrakhan 6a13a252e9
Abstract GeoHashGridAggregatorFactory creation, renamed geohash -> hash (#37836)
* Delegate `new GeoHashGridAggregatorFactory(...)` inside the `GeoGridAggregationBuilder` to the child classes.
* Rename all `geohash...` to `hash...`
2019-01-24 23:45:18 -05:00
Nhat Nguyen 3ccd488755 Remove testMappingsPropagatedToMasterNodeImmediately
This test is obsolete since #31140 where an index request with dynamic
mapping update no longer requires acking.

Closes #37816
2019-01-24 21:48:50 -05:00
Julie Tibshirani e1d8df4ffa
Deprecate types in create index requests. (#37134)
From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates:
* Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests.
* Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.
2019-01-24 13:17:47 -08:00
Boaz Leskes af2f4c8f73 enable bwc tests and bump versions after backporting https://github.com/elastic/elasticsearch/pull/37639 2019-01-24 20:55:55 +01:00
Nhat Nguyen 864e465515 Adjust minRetainedSeqNo asssertion in CombinedDeletionPolicyTests
In these tests, we initialize the retained_seq_no with NO_OPS_PERFORMED,
thus we should verify that the min of the retained_seq_no is at least
NO_OPS_PERFORMED not 0.

Closes #35994
2019-01-24 13:43:51 -05:00
Andrey Ershov 4974684003
Add tool elasticsearch-node unsafe-bootstrap (#37696)
elasticsearch-node tool helps to restore cluster if half or more of
master eligible nodes are lost. Of course, all bets are off, regarding
data consistency.

There are two parts of the tool: unsafe-bootstrap to be used when there
is still at least one master-eligible node alive and detach-cluster,
when there are no master-eligible nodes left.
This commit implements the first part.

Docs for the tool will be added separately as a part of #37812.
2019-01-24 19:25:55 +01:00
Tal Levy 289106a578
Refactor GeoHashGrid to be abstract and re-usable (#37742)
This change split out all the specific GeoHash
classes for the geohash_grid aggregation into
abstract GeoGrid classes that can be re-used for
specific hashing types, like `geohash`
2019-01-24 10:12:14 -08:00
Nhat Nguyen 76fb573569
Do not allow put mapping on follower (#37675)
Today, the mapping on the follower is managed and replicated from its
leader index by the ShardFollowTask. Thus, we should prevent users
from modifying the mapping on the follower indices.

Relates #30086
2019-01-24 12:13:00 -05:00
David Turner 187b233571 Read m_m_n from cluster states from 6.7
This completes the BWC serialisation changes required for a 6.7 master to
inform other nodes of the node-level value of the `minimum_master_nodes`
setting.

Relates #37701, #37811
2019-01-24 17:05:49 +00:00
David Roberts 0e36adc35f Mute SimpleClusterStateIT testMetadataVersion
Due to https://github.com/elastic/elasticsearch/issues/37820
2019-01-24 16:50:55 +00:00
David Roberts bd02ca4b7b Mute NoMasterNodeIT testNoMasterActionsWriteMasterBlock
Due to https://github.com/elastic/elasticsearch/issues/37823
2019-01-24 15:17:13 +00:00
Nhat Nguyen a6abb28abf
Fix InternalEngineTests#assertOpsOnPrimary (#37746)
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates #36467
Closes #37684
2019-01-24 10:02:48 -05:00
David Roberts a81931bb2a Mute DynamicMappingIT testMappingsPropagatedToMasterNodeImmediately
Due to https://github.com/elastic/elasticsearch/issues/37816
2019-01-24 14:32:44 +00:00
Jason Tedor 7517e3a7bd
Optimize warning header de-duplication (#37725)
Now that warning headers no longer contain a timestamp of when the
warning was generated, we no longer need to extract the warning value
from the warning to determine whether or not the warning value is
duplicated. Instead, we can compare strings directly.

Further, when de-duplicating warning headers, are constantly rebuilding
sets. Instead of doing that, we can carry about the set with us and
rebuild it if we find a new warning value.

This commit applies both of these optimizations.
2019-01-24 08:39:24 -05:00
Yannick Welsch feab59df03
Bubble exceptions up in ClusterApplierService (#37729)
Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and
block the state from being applied instead of silently being ignored. In combination with the cluster
state publishing lag detector, this will throw a node out of the cluster that can't properly apply
cluster state updates.
2019-01-24 14:09:03 +01:00
Simon Willnauer c7b16162ae
Remove unused ThreadBarrier class (#37666)
This class is pretty complex and only used in a test where we can simply
fail the test with an assertion error.
2019-01-24 13:52:22 +01:00
Yannick Welsch 2bf269e628 Fix docs for MappingUpdatedAction
Follow-up to #31140
2019-01-24 12:44:36 +01:00
David Roberts bcf5a4ca47 Mute ClusterDisruptionIT testAckedIndexing
Due to https://github.com/elastic/elasticsearch/issues/37810
2019-01-24 10:58:02 +00:00
Yannick Welsch 64adb5ad5b
Set acking timeout to 0 on dynamic mapping update (#31140)
As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not
be required for acking to succeed in order for index requests with dynamic mapping updates to
successfully complete.

Relates to #30672 and Closes #30844
2019-01-24 11:39:46 +01:00
Armin Braun 36889e8a2f
Remove Custom Listeners from SnapshotsService (#37629)
* Remove Custom Listeners from SnapshotsService

Motivations:
    * Shorten the code some more
    * Use ActionListener#wrap to get easy to reason about behavior in failure scenarios
    * Remove duplication in the logic of handling snapshot completion listeners (listeners removing themselves and comparing snapshots to their targets)
        * Also here, move all listener handling into `SnapshotsService` and remove custom listener class by putting listeners in a map
2019-01-24 10:11:18 +01:00
David Turner bdef2ab8c0
Use m_m_nodes from Zen1 master for Zen2 bootstrap (#37701)
Today we support a smooth rolling upgrade from Zen1 to Zen2 by automatically
bootstrapping the cluster once all the Zen1 nodes have left, as long as the
`minimum_master_nodes` count is satisfied. However this means that Zen2 nodes
also require the `minimum_master_nodes` setting for this one specific and
transient situation.

Since nodes only perform this automatic bootstrapping if they previously
belonged to a Zen1 cluster, they can keep track of the `minimum_master_nodes`
setting from the previous master instead of requiring it to be set on the Zen2
node.
2019-01-24 08:57:40 +00:00
Mayya Sharipova fdb66039d4
Change `rational` to `saturation` in script_score (#37766)
This change of the function name is necessary for conformity
with feature queries.

Closes #37714
2019-01-23 14:28:20 -05:00
Mayya Sharipova c8565fe692
Deprecate types in get field mapping API (#37667)
- Add deprecation warning to RestGetFieldMappingAction
- Add two new java HRLC classes GetFieldMappingsRequest and
GetFieldMappingsResponse. These classes use new typeless forms
of a request and response, and differ in that from the server
versions.

Relates to #35190
2019-01-23 14:24:35 -05:00
Tim Brooks f45b5fedb5
Add ability to listen to group of affix settings (#37679)
Currently we have the ability to listen for setting changes to two group
affix settings. However, it is possible that we might have the need to
listen to more than two. This commit adds a method that allows consumer
to listen to a list of affix settings for changes.
2019-01-23 12:05:39 -07:00
Jason Tedor 169cb38778
Liberalize StreamOutput#writeStringList (#37768)
In some cases we only have a string collection instead of a string list
that we want to serialize out. We have a convenience method for writing
a list of strings, but no such method for writing a collection of
strings. Yet, a list of strings is a collection of strings, so we can
simply liberalize StreamOutput#writeStringList to be more generous in
the collections that it accepts and write out collections of strings
too. On the other side, we do not have a convenience method for reading
a list of strings. This commit addresses both of these issues.
2019-01-23 12:52:17 -05:00
Benjamin Trent 1c2ae9185c
Add PersistentTasksClusterService::unassignPersistentTask method (#37576)
* Add PersistentTasksClusterService::unassignPersistentTask method

* adding cancellation test

* Adding integration test for unallocating tasks from a node

* Addressing review comments

* adressing minor PR comments
2019-01-23 11:48:32 -06:00
Igor Motov e3672aa551
Tests: disable testRandomGeoCollectionQuery on tiny polygons (#37579)
Due to https://issues.apache.org/jira/browse/LUCENE-8634 this test
may fail if a really tiny polygon is generated. This commit checks for
tiny polygons and skips the final check, which is expected to fail
until the lucene bug is fixed and new version of lucene is released.
2019-01-23 12:25:54 -05:00
Julie Tibshirani f0fc6e8003
Make sure PutMappingRequest accepts content types other than JSON. (#37720) 2019-01-23 08:51:05 -08:00
David Kyle d193ca8aae
Use disassociate in preference to deassociate (#37704) 2019-01-23 16:06:25 +00:00
Armin Braun 2439f68745
Delete Redundant RoutingServiceTests (#37750)
* This test compleletly overrode the `reroute` method and hence did nothing put test the override itself
   * Removed the test since it tests nothing and simplified `reroute` accordingly
2019-01-23 16:39:02 +01:00
Nhat Nguyen 6a9838359c
Always return metadata version if metadata is requested (#37674)
If the indices of a ClusterStateRequest are specified, we fail to
include the cluster state metadata version in the response.

Relates  #37633
2019-01-23 10:24:51 -05:00
Luca Cavanna 12f5b02fd0
Streamline skip_unavailable handling (#37672)
This commit moves the collectSearchShards method out of RemoteClusterService into TransportSearchAction that currently calls it. RemoteClusterService used to be used only for cross-cluster search but is now also used in cross-cluster replication where different API are called through the RemoteClusterAwareClient. There is no reason for the collectSearchShards and fetchShards methods to be respectively in RemoteClusterService and RemoteClusterConnection. The search shards API can be called through the RemoteClusterAwareClient too, the only missing bit is a way to handle failures based on the skip_unavailable setting for each cluster (currently only supported in RemoteClusterConnection#fetchShards) which is achieved by adding a isSkipUnavailable(String clusterAlias) method to RemoteClusterService.
This change is useful for #32125 as we will very soon need to also call the search API against remote clusters, which will be done through RemoteClusterAwareClient. In that case we will also need to support skip_unavailable when calling the search API so we need some way to handle the skip_unavailable setting like we currently do for the search_shards call.

Relates to #32125
2019-01-23 13:53:37 +01:00
Yannick Welsch d5139e0590
Only bootstrap and elect node in current voting configuration (#37712)
Adapts bootstrapping and leader election to only trigger on nodes that are actually part of the voting
configuration.
2019-01-23 13:10:11 +01:00
Simon Willnauer 4ec3a6d922
Ensure either success or failure path for SearchOperationListener is called (#37467)
Today we have several implementations of executing SearchOperationListener
in SearchService. While all of them seem to be safe at least on, the one that
executes scroll searches can cause illegal execution of SearchOperationListener
that can then in-turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with blocks to ensure
listeners are called in a safe way.

Relates to #37185
2019-01-23 12:38:44 +01:00
Tanguy Leroux 6130d15172
Adapt SyncedFlushService (#37691) 2019-01-23 11:08:54 +01:00
Alexander Reelsen 701d89caa2 Mute FilterAggregatorTests#testRandom
Relates #37743
2019-01-23 11:00:37 +01:00
Alexander Reelsen daa2ec8a60
Switch mapping/aggregations over to java time (#36363)
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.

The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.

Relates #27330
2019-01-23 10:40:05 +01:00
Boaz Leskes 52ba407931
Expose sequence number and primary terms in search responses (#37639)
Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. 

This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.
2019-01-23 09:01:58 +01:00
Andrey Ershov 7c6566e14c
Migrate SpecificMasterNodesIT to Zen2 (#37532)
1. testSimpleOnlyMasterNodeElection - requires cluster bootstrap when
the first master node is started.
2. testElectOnlyBetweenMasterNodes - requires cluster bootstrap when
the first master node is started and requires adding voting exclusion
before shutting down the first master node.
3. testAliasFilterValidation - requires cluster bootstrap when the
first master node is started.
2019-01-23 07:22:41 +01:00
Andrey Ershov e2e00cd245
Fix MetaStateFormat tests
It's not safe to continue writing state using MetaDataStateFormat
after dirty WriteStateException occurred if it's not recovered by
successful subsequent state write.

We've encountered test failure of testFailRandomlyAndReadAnyState.
The test breaks in the following way. There are 3 state paths. And what
happens next

Successful write at the beginning of the test yields 0 0 0 state
files in the directories.
1st write in the loop is unsuccessful, but not dirty - 0 0 0.
2nd write in the loop is not successful and dirty (failure during
fsync), however before removing new files we have 1 1 1. But now during
deletion, the first deletion fails and we get - 1 0 0.
3rd write in the loop is unsuccessful, but not dirty - so we want to
keep old generation, which happens to be the 1st generation, so now we
have 1 x x in state folders. Now we assert that we either load 0 or 1
state from the state folders and select only 2rd and 3th folder to
emulate disk failures - this results in NPE because there is nothing in
these folders.
Fortunately, this won’t be a problem in real life, because if there is
a dirty exception, we shut down the node and make sure we perform a
successful write on the node startup.
2019-01-23 07:21:26 +01:00
Zachary Tong 2ba9e361ab
Add helper classes to determine if aggs have a value (#36020)
This adds a set of helper classes to determine if an agg "has a value". 
This is needed because InternalAggs represent "empty" in different 
manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc.

A user can pass the Internal agg type to one of these helper methods
and it will report if the agg contains a value or not, which allows the
user to differentiate "empty" from a real `NaN`.

These helpers are best-effort in some cases.  For example, several
pipeline aggs share a single return class but use different conventions
to mark "empty", so the helper uses the loosest definition that applies
to all the aggs that use the class.

Sums in particular are unreliable.  The InternalSum simply returns 0.0
if the agg is empty (which is correct, no values == sum of zero).  But this
also means the helper cannot differentiate from "empty" and `+1 + -1`.
2019-01-22 12:38:55 -05:00
Jason Tedor 715719ee3b
Remove warn-date from warning headers (#37622)
This commit removes the warn-date from warning headers. Previously we
were stamping every warning header with when the request
occurred. However, this has a severe performance penalty when
deprecation logging is called frequently, as obtaining the current time
and formatting it properly is expensive. A previous change moved to
using the startup time as the time to stamp on every warning header, but
this was only to prove that the timestamping was expensive. Since the
warn-date is optional, we elect to remove it from the warning
header. Prior to this commit, we worked in Kibana to make the warn-date
treated as optional there so that we can follow-up in Elasticsearch and
remove the warn-date. This commit does that.
2019-01-22 12:29:24 -05:00
Yannick Welsch 23ba900840
Publish to masters first (#37673)
Prefer publishing to master-eligible nodes first, so that cluster state updates are committed more
quickly, and master-eligible nodes also turned more quickly into followers after a leader election.
2019-01-22 13:53:10 +01:00
David Kyle 3fad1eeaed
Un-assign persistent tasks as nodes exit the cluster (#37656)
PersistentTasksClusterService decides if a task should be reassigned by 
checking there is a node in the cluster with the same Id. If a node is 
restarted PersistentTasksClusterService may not observe the change and 
decide the task still has a valid assignment because the node's 
ephemeral Id is not used in that decision. This change un-assigns tasks
as the nodes in the cluster change.
2019-01-22 12:44:45 +00:00
Henning Andersen 228611843c
Fail start of non-data node if node has data (#37347)
* Fail start of non-data node if node has data

Check that nodes started with node.data=false cannot start if they have
shard data to avoid (old) indexes being resurrected into the cluster in red status.

Issue #27073
2019-01-22 13:27:12 +01:00
Yannick Welsch 2a7b7ccf1c
Use cancel instead of timeout for aborting publications (#37670)
When publications were cancelled because a node turned to follower or candidate, it would still
show as time out, which can be confusing in the logs. This change adapts the improper call of
onTimeout by generalizing it to a cancel method.
2019-01-22 12:51:03 +01:00
Christoph Büscher 0a93a0358b
Remove deprecated FieldNamesFieldMapper.Builder#index (#37305)
The method calls "enabled" in addition to what the super.index() does, but this
seems to be done explicitely now in the TypeParsers `parse` method. The removed
method has been deprecated since at least 6.0. Also making some of the Builders
methods and ctos private since they are only used internally in this class.
2019-01-22 12:12:21 +01:00
David Turner 5db7ed22a0
Bootstrap a Zen2 cluster once quorum is discovered (#37463)
Today when bootstrapping a Zen2 cluster we wait for every node in the
`initial_master_nodes` setting to be discovered, so that we can map the
node names or addresses in the `initial_master_nodes` list to their IDs for
inclusion in the initial voting configuration. This means that if any of
the expected master-eligible nodes fails to start then bootstrapping will
not occur and the cluster will not form. This is not ideal, and we would
prefer the cluster to bootstrap even if some of the master-eligible nodes
do not start.

Safe bootstrapping requires that all pairs of quorums of all initial
configurations overlap, and this is particularly troublesome to ensure
given that nodes may be concurrently and independently attempting to
bootstrap the cluster. The solution is to bootstrap using an initial
configuration whose size matches the size of the expected set of
master-eligible nodes, but with the unknown IDs replaced by "placeholder"
IDs that can never belong to any node.  Any quorum of received votes in any
of these placeholder-laden initial configurations is also a quorum of the
"true" initial set of master-eligible nodes, giving the guarantee that it
intersects all other quorums as required.

Note that this change means that the initial configuration is not
necessarily robust to any node failures. Normally the cluster will form and
then auto-reconfigure to a more robust configuration in which the
placeholder IDs are replaced by the IDs of genuine nodes as they join the
cluster; however if a node fails between bootstrapping and this
auto-reconfiguration then the cluster may become unavailable. This we feel
to be less likely than a node failing to start at all.

This commit also enormously simplifies the cluster bootstrapping process.
Today, the cluster bootstrapping process involves two (local) transport actions
in order to support a flexible bootstrapping API and to make it easily
accessible to plugins. However this flexibility is not required for the current
design so it is adding a good deal of unnecessary complexity. Here we remove
this complexity in favour of a much simpler ClusterBootstrapService
implementation that does all the work itself.
2019-01-22 11:03:51 +00:00
Adrien Grand e9fcb25a28
Upgrade to lucene-8.0.0-snapshot-83f9835. (#37668)
This snapshot uses a new file format for doc-values which is expected to make
advance/advanceExact perform faster on sparse fields:
https://issues.apache.org/jira/browse/LUCENE-8585
2019-01-22 11:44:29 +01:00
Alpar Torok 74d1cfbf7e Mute failing test
Tracking ##37687
2019-01-22 10:50:27 +02:00
Alexander Reelsen 4fb68ea195
Fix java time formatters that round up (#37604)
In order to be able to parse epoch seconds and epoch milli seconds own
java time fields had been introduced. These fields are however not
compatible with the way that java time allows one to configure default
fields (when a part of a timestamp cannot be read then a default value
is added), which is used for the formatters that are rounding up to the
next value.

This commit allows java date formatters to configure its round up parsing 
by setting default values via a consumer. By default all formats are setting 
JavaDateFormatter.ROUND_UP_BASE_FIELDS for rounding up. The epoch
however parsers both need to set different fields. The merged date
formatters do not set any fields, they just append all the round up formatters.

Also the formatter now properly copies the locale and the timezone, 
fractional parsing has been set to nano seconds with proper width.
2019-01-22 09:42:17 +01:00
Alpar Torok 17d704347e Mute failing test
Tracking #37685
2019-01-22 10:31:23 +02:00
Tanguy Leroux 0290547ad7
Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines (#37426)
Since version 6.7.0 the Close Index API guarantees that all translog 
operations have been correctly flushed before the index is closed. If 
the index is reopened as a Frozen index (which uses a ReadOnlyEngine) 
we can verify that the maximum sequence number from the last Lucene 
commit is indeed equal to the last known global checkpoint and refuses 
to open the read only engine if it's not the case. In this PR the check is 
only done for indices created on or after 6.7.0 as they are guaranteed 
to be closed using the new Close Index API.

Related #33888
2019-01-22 09:22:33 +01:00
Alpar Torok a713183cab Mute failing discovery disruption tests
Tracking #37539
2019-01-22 10:16:04 +02:00
Nhat Nguyen 7394892b4c
Make prepare engine step of recovery source non-blocking (#37573)
Relates #37174
2019-01-21 21:35:10 -05:00
Tim Brooks 21838d73b5
Extract message serialization from `TcpTransport` (#37034)
This commit introduces a NetworkMessage class. This class has two
subclasses - InboundMessage and OutboundMessage. These messages can
be serialized and deserialized independent of the transport. This allows
more granular testing. Additionally, the serialization mechanism is now
a simple Supplier. This builds the framework to eventually move the
serialization of transport messages to the network thread. This is the
one serialization component that is not currently performed on the
network thread (transport deserialization and http serialization and
deserialization are all on the network thread).
2019-01-21 14:14:18 -07:00
Tim Brooks f516d68fb2
Share `NioGroup` between http and transport impls (#37396)
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
2019-01-21 13:50:56 -07:00
Armin Braun 3a3f5b39c3
Fix Race in Concurrent Snapshot Delete and Create (#37612)
* The repo id was determined wrong when the delete picked up on an in progress snapshot
  * NOTE: This solution is still a best-effort fix and there's a slight chance of running into concurrency issues here
when multiple create and delete requests for the same snapshot name are happening concurrently, but these require a sequence
of multiple cluster state updates between the changed method reading the genId and submitting its cluster state update task
* Added test reproduced the issue reliably in about 50% of runs
* Closes #37581
2019-01-21 13:10:33 +01:00
Luca Cavanna 09a6ba50ef
Add support for merging multiple search responses into one (#37566)
This will be used in cross-cluster search when reduction will be
performed locally on each cluster. The CCS coordinating node will send
one search request per remote cluster involved and will get one search
response back from each one of them. Such responses contain all the info
to be able to perform an additional reduction and return results back
to the user.

Relates to #32125
2019-01-21 11:51:47 +01:00
Jason Tedor adae233f77
Add some deprecation optimizations (#37597)
This commit optimizes some of the performance issues from using
deprecation logging:
 - we optimize encoding the deprecation value
 - we optimize formatting the deprecation string
 - we optimize away getting the current time (by using cached startup
   time)
2019-01-18 16:42:25 -05:00
Tal Levy 106f900dfb
refactor inner geogrid classes to own class files (#37596)
To make further refactoring of GeoGrid aggregations
easier (related: #30320), splitting out these inner
class dependencies into their own files makes it
easier to map the relationship between classes
2019-01-18 13:40:00 -08:00
Julie Tibshirani 8da7a27f3b
Deprecate types in the put mapping API. (#37280)
From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates:
- Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`.
- Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.
2019-01-18 12:28:31 -08:00
Jack Conradson de55b4dfd1
Add types deprecation to script contexts (#37554)
This adds deprecation to _type in the script contexts for ingest and update. 
This adds a DeprecationMap that wraps the ctx Map containing _type for these 
specific contexts.
2019-01-18 09:13:49 -08:00
Yannick Welsch 377d96e376
Remove initial_master_nodes on node restart (#37580)
Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since
being switched to Zen2 because the bootstrap setting was left around when nodes got restarted
with data folders wiped.

The test in question here was starting one node (which autobootstrapped to that single node), then
another node. The first node was then shut down (after excluding it from the voting configuration),
its data folder wiped, and restarted. After restart, the node had an empty data folder yet
initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of
its own, and not rejoin the existing cluster with the other node.
2019-01-18 16:36:42 +01:00
Jason Tedor ed297b7369
Only update response headers if we have a new one (#37590)
Currently when adding a response header, we do some de-duplication, and
maybe drop the header on the floor if we have reached capacity. Yet, we
still update the thread local tracking the response headers. This is
really expensive because under the hood there is a shared reference that
we synchronize on. In the case of a request processed across many shards
in a tight loop, this contention can be detrimental to performance. We
can avoid updating the thread local in these cases though, when the
response header is duplicate of one that we have already seen, or when
it's dropped on the floor. This commit addresses these performance
issues by avoiding the unnecessary set.
2019-01-18 08:20:05 -05:00
Tanguy Leroux 29d3a708da Fix BulkWithUpdatesIT and CloseIndexIT
As of today the Close Index API does its best to close indices,
but closing an index with ongoing recoveries might or might not
be acknowledged depending of the values of the max seq number
and global checkpoint at the time the
TransportVerifyShardBeforeClose action is executed.

These tests failed because they always expect that the index is
correctly closed on the first try, which is not always the case.
Instead we need to retry the closing until it succeed.

Closes #37571
2019-01-18 10:54:35 +01:00
David Turner 65e76b3f6f
Migrate RecoveryFromGatewayIT to Zen2 (#37520)
* Fixes `testTwoNodeFirstNodeCleared` by manipulating voting config exclusions.

* Removes `testRecoveryDifferentNodeOrderStartup` since state recovery is now
  handled entirely on the elected master, so the order in which the data nodes
  start is irrelevant.
2019-01-18 09:15:51 +00:00
David Turner 699d881739
Migrate IndicesExistsIT to Zen2 (#37526)
This test was actually passing, for the wrong reason: it asserts a
`MasterNotDiscoveredException` is thrown, expecting this to be due to a failure
to perform state recovery, but in fact it's thrown because the node is not
correctly bootstrapped.
2019-01-18 09:15:30 +00:00
Christoph Büscher 2f0e0b2426
Allow indices.get_mapping response parsing without types (#37492)
This change adds deprecation warning to the indices.get_mapping API in case the
"inlcude_type_name" parameter is set to "true" and changes the parsing code in
GetMappingsResponse to parse the type-less response instead of the one
containing types. As a consequence the HLRC client doesn't need to force
"include_type_name=true" any more and the GetMappingsResponseTests can be
adapted to the new format as well. Also removing some "include_type_name"
parameters in yaml test and docs where not necessary.
2019-01-18 09:33:36 +01:00
Armin Braun 62ddc8c776 Reenable UnicastZenPingTests#testSimplePings
* This was muted needlessly, the problem in #26701 only applies to `6.x`
* Relates #26701
2019-01-18 08:36:22 +01:00
Tim Brooks b6f06a48c0
Implement follower rate limiting for file restore (#37449)
This is related to #35975. This commit implements rate limiting on the
follower side using a new class `CombinedRateLimiter`.
2019-01-17 14:58:46 -07:00
Armin Braun 381d035cd6
Remove Redundant RestoreRequest Class (#37535)
* Same as #37464 but for the restore side
2019-01-17 22:23:23 +01:00
Tal Levy a0c504e4a3
Create specific exception for when snapshots are in progress (#37550)
delete and close index actions threw IllegalArgumentExceptions
when attempting to run against an index that has a snapshot
in progress.

This change introduces a dedicated SnapshotInProgressException
for these scenarios. This is done to explicitly signal to clients that
this is the reason the action failed, and it is a retryable error.

relates to #37541.
2019-01-17 13:21:12 -08:00
James Baiera 5782a5bbbc Mute UnicastZenPingTests#testSimplePings
relates #26701
2019-01-17 15:13:09 -05:00
Yannick Welsch 68de2edb14
Fix assertion at end of forceRefreshes (#37559)
This commit ensures that we only change refreshListeners to a list if we're actually adding
something to the list.
2019-01-17 19:18:47 +01:00
Yannick Welsch 6d64a2a901
Propagate Errors in executors to uncaught exception handler (#36137)
This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the
uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This
commit also makes it the property of the executor, not the runnable, to ensure this property. A big
part of this commit also consists of vastly improving the test coverage in this area.
2019-01-17 17:46:35 +01:00
Nhat Nguyen 20ed3dd1a8
Make recovery source send operations non-blocking (#37503)
Relates #37458
2019-01-17 09:59:05 -05:00
Jim Ferenczi 4351a5e537
Allow field types to optimize phrase prefix queries (#37436)
This change adds a way to customize how phrase prefix queries should be created
on field types. The match phrase prefix query is exposed in field types in order
to allow optimizations based on the options set on the field.
For instance the text field uses the configured prefix field (if available) to
build a span near that mixes the original field and the prefix field on the last
position.
This change also contains a small refactoring of the match/multi_match query that
simplifies the interactions between the builders.

Closes #31921
2019-01-17 15:10:28 +01:00
Yannick Welsch d9fa4e4ada
Fix testRelocateWhileContinuouslyIndexingAndWaitingForRefresh (#37560)
This test failed because the refresh at the end of the test is not guaranteed to run before the
indexing is completed, and therefore there's no guarantee that the refresh will free all operations.
This triggers an assertion failure in the test clean-up, which asserts that there are no more pending
operations.
2019-01-17 13:59:09 +01:00
Yannick Welsch 6fe2d6da03 Mute TransportClientNodesServiceTests#testListenerFailures
Relates to #37567
2019-01-17 13:54:48 +01:00
Martijn van Groningen da799306a8
Decreased time out in test
Relates to #37378
2019-01-17 11:51:17 +01:00
Torgeir Thoresen 676e1b1a13 Fix erroneous docstrings for abstract bulk by scroll request (#37517) 2019-01-17 10:22:49 +01:00
Przemyslaw Gomulka b6e5ccaf8a
Remove the AbstracLifecycleComponent constructor with Settings (#37523)
Adding the migration guide and removing the deprecated in 6.x
constructor

relates #35560
relates #34488
2019-01-17 09:10:09 +01:00
Jason Tedor 18a3e48a4a
Change file descriptor limit to 65535 (#37537)
Some systems default to a nofile ulimit of 65535. To reduce the pain of
deploying Elasticsearch to such systems, this commit lowers the required
limit from 65536 to 65535.
2019-01-16 17:19:12 -05:00
Nhat Nguyen 655103de58 Increase timeout for testAddNewReplicas
We flush quite often in testAddNewReplicas to create the safe index
commit with gaps in sequence numbers. This test is failing recently
because CI is too slow to complete 5 small flushes in 10 seconds.

This commit increases timeout for this test and also ensures to always
terminate the background indexing. The latter is to eliminate unrelated
failures if this test fails again.

Closes #37183
2019-01-16 13:17:10 -05:00
Andrey Ershov 4e72f3c5c6
DedicatedClusterSnapshotRestoreIT to Zen2 (#37489)
All tests except testRestorePersistentSettings (renamed to
testExceptionWhenRestoringPersistentSettings) worked fine.
testExceptionWhenRestoringPersistentSettings re-written to use a custom
setting, because "minimum master node" setting is no longer available
in Zen2. It turns out there is no good replacement for "minimum master
node" setting for this test, that's why the custom setting is
introduced.

Unfortunately, there is #37485 bug and currently
RestoreService does not perform setting validation. That's why the
test is annotated with @AwaitsFix, the idea is to merge this commit and
then fix the issue and enable the test. (The test passes with a simple
fix, that adds a single line to RestoreService).
2019-01-16 11:14:16 -05:00
Jack Conradson 3d8c04659c
Deprecate _type from LeafDocLookup (#37491)
* Deprecate _type from LeafDocLookup

* Response to PR comments.

* Response to PR comments.
2019-01-16 07:05:09 -08:00
Tim Brooks 0b5af276a8
Allow system privilege to execute proxied actions (#37508)
Currently all proxied actions are denied for the `SystemPrivilege`.
Unfortunately, there are use cases (CCR) where we would like to proxy
actions to a remote node that are normally performed by the
system context. This commit allows the system context to perform
proxy actions if they are actions that the system context is normally
allowed to execute.
2019-01-16 07:52:38 -07:00
Nhat Nguyen 0160ba2539 AwaitsFix testAddNewReplicas
Tracked at #37183
2019-01-16 09:48:35 -05:00
Adrien Grand 9d8afe68a5
IndexMetaData#mappingOrDefault doesn't need to take a type argument. (#37480)
Currently it takes a type, but this isn't really needed now that indices can
have at most one type. The only downside is that we might return a different
error when trying to index into a type that doesnt't exist yet.
2019-01-16 14:01:09 +01:00
Armin Braun 21a88d5505
Simplify + Cleanup Dead Code in Settings (#37341)
* Remove dead code
* Simplify some overly complex code, this class is long enough already
2019-01-16 13:57:16 +01:00
Jason Tedor 687978b7d1
Reject all requests that have an unconsumed body (#37504)
This commit removes some leniency from REST handling where we move to
reject all requests that have a body where the body is not used during
the course of handling the request. For example,

DELETE /index
{
  "query" : {
    "term" :  {
      "field" : "value"
    }
  }
}

is now rejected.
2019-01-16 07:29:25 -05:00
Dimitrios Liappis 347cbaf0ed
Fix line length for aliases and remove suppression (#37455)
Relates #34884
2019-01-16 13:06:29 +02:00
Armin Braun 5a5e44d1de
Simplify Snapshot Create Request Handling (#37464)
* The internal create request is absolutely redundant, the only difference to the transport request is that we resolved the snapshot
name when moving from the transport to the internal version
  * Removed it and passed the transport request into the snapshot service instead
* nicer way of resolve snapshot name in callback
2019-01-16 11:08:48 +01:00
Przemyslaw Gomulka 5e94f384c4
Remove the use of AbstracLifecycleComponent constructor #37488 (#37488)
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560

relates #34488
2019-01-16 09:05:30 +01:00
Julie Tibshirani 0a3bff2ca9
Only log one types warning per bulk search request. (#37446) 2019-01-15 12:38:32 -08:00
Andrey Ershov 42fd68ed38
Use GatewayMetaState in CoordinatorTests rarely (#36897)
This commit adds one more underlying implementation of MockPersistedState.
Previously only InMemoryPersistentState was used, not GatewayMetaState
is used rarely.
When adding GatewayMetaState support the main question was: do we want to
emulate exceptions as we do today in MockPersistedState before
delegating to GatewayMetaState or do we want these exceptions to
propagate from the lower level, i.e. file system exceptions?
On the one hand, lower level exception propagation is already tested in
GatewayMetaStateTests, so this won't improve the coverage.
On the other hand, the benefit of low-level exceptions is to see how all these
components work in conjunction. Finally, we abandoned the idea of low-level
exceptions because we don't have a way to deal with IOError today in
CoordinatorTests, but hacking GatewayMetaState not to throw
IOError seems unnatural.
So MockPersistedState rarely throws an exception before delegating to
GatewayMetaState, which is not supposed to throw the exception.

This commit required two changes:

Move GatewayMetaStateUT to upper-level from
GatewayMetaStatePersistedStateTests, because otherwise, it's not easy
to construct GatewayMetaState instance in CoordinatorTests.
Move addition of STATE_NOT_RECOVERED_BLOCK from GatewayMetaState
constructor to GatewayMetaState.applyClusterUpdaters, because
CoordinatorTests class assumes that there is no such block and most of
them fail.
2019-01-15 13:33:25 -05:00
Jim Ferenczi f8d80dff7c
Fix duplicate removal when merging completion suggestions (#36996)
The completion suggester ignores the original weight of the suggestion when duplicates are removed. This change fixes this bug and keeps the best weighted suggestion among the duplicates. It also removes the custom implementation of the top docs suggest collector now that https://issues.apache.org/jira/browse/LUCENE-8529 is committed in Lucene.

Closes #35836
2019-01-15 19:27:31 +01:00
Nhat Nguyen 6647122f1c
Prepare to make send translog of recovery non-blocking (#37458)
This commit prepares the required infra to make send a translog snapshot
of the recovery source non-blocking. I'll make a follow-up to make the send
snapshot method non-blocking.

Relates #37291
2019-01-15 13:17:25 -05:00
Andrey Ershov 02d4d8b409
MinimumMasterNodesIT changed for Zen2 (#37428)
There were 5 tests in MinimumMasterNodesIT. 2 of them removed, 3 of
them changed and renamed.
1) testSimpleMinimumMasterNodes -> testTwoNodesNoMasterBlock. The
flow of this test is left intact but in order to make it work on
Zen2, additional work for the cluster bootstrapping and voting
exclusions is needed. 
2) testDynamicUpdateMinimumMasterNodes -> removed, there is nothing
that corresponds to the dynamic change of the minimum master nodes
setting.
3) testCanNotBringClusterDown -> removed, it also plays with changing
minimum master nodes dynamically.
4) testMultipleNodesShutdownNonMasterNodes ->
testThreeNodesNoMasterBlock. Previously this test was checking that
there would be no master block, if min_master_nodes=3 and 4 nodes are
started, then 2 nodes are brought down. Zen2 dynamically accommodates
to the number of nodes in the cluster, so it's possible that there
still will be a master in 2 nodes cluster. For Zen2, we start up 3
nodes. And shut down 2 of them (w/o voting exclusions), which results
in no master block.
5) testCanNotPublishWithoutMinMastNodes ->
testCanNotCommitStateThreeNodes. Test flow is not changed. But
previously there was no check that nodes in the bigger part of
network partition will elect the master, before healing the network
partition. For Zen2 it does not work, because persistent setting
addition is accepted on the old master and if it's elected new master
again, this setting will appear in the cluster state.

Also, I have a feeling that we need to remove this class, but could not
come up with a good name.
2019-01-15 13:09:48 -05:00
Dimitrios Liappis 63793499bd
Fix line length for `node` and remove suppresion (#37454)
Relates #34884
2019-01-15 19:57:24 +02:00
David Turner a2a40c50a0
Report terms and version if cluster does not form (#37473)
Adds the node's current term and the term and version of the the last-accepted
cluster state to the message reported by the `ClusterFormationFailureHelper`,
since these values may be of importance when tracking down a cluster formation
failure.
2019-01-15 17:32:08 +00:00
Nhat Nguyen 68e2d36fa3 Adjust bwc version for max_concurrent_file_chunks
Relates #36981
2019-01-15 11:18:55 -05:00
Luca Cavanna 0b396a0c5e Restore assertion on discount overlaps in SimilarityTests
This assertion was commented out as the getDiscountOverlaps getter was
missing from LegacyBm25Similarity. That has been fixed in lucene.
2019-01-15 16:43:09 +01:00
Julie Tibshirani 1a1dbf705f
Make sure to use the resolved type in DocumentMapperService#extractMappings. (#37451)
* Pull out a shared method MapperService#resolveDocumentType.
* Make sure to resolve the type when extracting the mappings.

Addresses #36811.
2019-01-15 07:32:47 -08:00
Fabricio Archanjo Fonseca 3cc8f39532 New mapping signature and mapping string source fixed. (#37401)
* New mapping signature and mapping string source fixed.

* Keep compatibility with CreateIndexRequest class.
2019-01-15 08:06:32 -07:00
David Roberts 7cdf7f882b
[ML] Fix ML datafeed CCS with wildcarded cluster name (#37470)
The test that remote clusters used by ML datafeeds have
a license that allows ML was not accounting for the
possibility that the remote cluster name could be
wildcarded.  This change fixes that omission.

Fixes #36228
2019-01-15 14:19:05 +00:00
Dimitrios Liappis 19fc59f089
Fix line length for monitor and remove suppressions (#37456)
Relates #34884
2019-01-15 14:18:15 +02:00
Simon Willnauer 147c5e65d3
Remove dead code from ShardSearchStats (#37421)
The clear methodsa are unused and unsafe at this point. This commit
removes the dead code.
2019-01-15 09:39:53 +01:00
Nhat Nguyen bf49f54456
Simplify testSendSnapshotSendsOps (#37445)
The test testSendSnapshotSendsOps is currently using a mock instance of
RecoveryTargetHandler which will be hard to modify when we make the
RecoveryTargetHandler non-blocking. This commit prepares for the
incoming changes by replacing the mock instance with a stub.
2019-01-15 03:07:56 -05:00
Tim Vernum b97245cfcd
Restore lost @Inject annotation (#37452)
The Inject Annotation was removed from IndicesClusterStateService as
part of reformatting in e11a32e, but this causes CreationException on
cluster startup.
2019-01-15 18:20:22 +11:00
Jason Tedor 43bfdd32ee
Add run under primary permit method (#37440)
This commit adds a simple method for executing a runnable against a
shard under a primary permit. Today there is only a single caller for
this method, but this there are two upcoming use-cases for which having
this method will help keep the code simpler.
2019-01-14 21:54:42 -05:00
Jason Tedor e11a32eda8
Reformat some classes in the index universe
This commit reformats some classes in the index universe with the
purpose of breaking some long method definitions and invocations into a
line per parameter. This has the advantage that for an upcoming change
to these definitions and invocations, the diff for that change will be a
single line per definition or invocation. That makes these sorts of
changes easier to read.
2019-01-14 21:45:24 -05:00
Jason Tedor 3bc0711b90
Add simple method to write collection of writeables (#37448)
This commit adds a simple convenience method for writing a collection of
writeables, and replaces existing call sites with the new method.
2019-01-14 21:28:28 -05:00
Jason Tedor eb86b9f284
Fix retention lease commit test
This commit fixes an issue with testing committed retention leases when
they are not any retention leases (a deliberate edge case).

Closes #37420
2019-01-14 21:16:49 -05:00
Jason Tedor 74640d0ba7
Introduce retention lease serialization (#37447)
This commit is a simple introduction of the serialization of retention
leases, which will be needed when they are sent across the wire while
synchronizing retention leases to replicas.
2019-01-14 21:06:44 -05:00
Nhat Nguyen 397f315f56
Make finalize step of recovery source non-blocking (#37388)
Relates #37291
2019-01-14 18:20:54 -05:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Nhat Nguyen 15aa3764a4
Reduce recovery time with compress or secure transport (#36981)
Today file-chunks are sent sequentially one by one in peer-recovery. This is a
correct choice since the implementation is straightforward and recovery is
network bound in most of the time. However, if the connection is encrypted, we
might not be able to saturate the network pipe because encrypting/decrypting
are cpu bound rather than network-bound.

With this commit, a source node can send multiple (default to 2) file-chunks
without waiting for the acknowledgments from the target.

Below are the benchmark results for PMC and NYC_taxis.

- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| -------- | -------- | -------- | -------- |
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| ---------| ---------| ---------| -------- |
| Plain     | 321s     | 249s     | 191s     |  *       | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844
2019-01-14 15:14:46 -05:00
Tim Brooks 5c68338a1c
Implement ccr file restore (#37130)
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
2019-01-14 13:07:55 -07:00
Christoph Büscher c801b89072
Fix Eclipse specific compilation issue (#37419)
Without pulling out the supplier function to the enclosing class, Eclipse 4.8
complains with the following error "No enclosing instance of type
CoordinatorTests.Cluster is available due to some intermediate constructor
invocation"
2019-01-14 20:39:04 +01:00
markharwood 92c6c98e8d
Performance fix. Reduce deprecation calls for the same bulk request (#37415)
DeprecationLogger has warning de-duplication logic but it is expensive to run as it involves parsing existing warning headers. This PR changes the upstream bulk indexing code to do its own "event thinning" rather than relying on DeprecationLogger's trimming.
Closes #37411
2019-01-14 17:51:49 +00:00
David Kyle 1abe5df09c Mute IndexShardRetentionLeaseTests.testCommit #37420 2019-01-14 14:17:11 +00:00
Daniel Mitterdorfer abe35fb99b
Remove unused index store in directory service
With this commit we remove the unused field `indexStore` from all
implementations of `FsDirectoryService`.

Relates #37097
2019-01-14 13:44:32 +01:00
Tanguy Leroux 07dc8c7eee
Improve CloseWhileRelocatingShardsIT (#37348) 2019-01-14 13:14:36 +01:00
Tanguy Leroux 6ca076bf74
Fix ClusterBlock serialization and Close Index API logic after backport to 6.x (#37360)
This commit changes the versions in the serialization logic of ClusterBlock 
after the backport to 6.x of the Close Index API refactoring (#37359).
2019-01-14 13:13:15 +01:00
Christoph Büscher 89b45f1fc6
Remove deprecated pipeline request contructors (#37366)
The constructors in PutPipelineRequest and SimulatePipelineRequest that guess
the xContent type from the provided source are deprecated since 6.0 and each
have a counterpart that takes the xContent type as an explicit argument.
Removing these ctors together with the builders and methods in
ClusterAdminClient that don't have the xContent type as argument.
2019-01-14 11:14:38 +01:00
Nhat Nguyen d44a6f9fbc
Simplify SyncedFlushService flow with StepListener (#37383)
Today the SyncedFlushService flow is written with multiple nested 
callbacks which are hard to read. This commit replaces them with 
sequential step listeners.
2019-01-14 03:54:34 -05:00
Luca Cavanna d54f88f62c
Remove unused empty constructors from suggestions classes (#37295)
We recently migrated suggestions to `Writeable`. That allows us to also
clean up empty constructors and methods that called them as they are no
longer needed. They are replaced by constructors that accept a
`StreamInput` instance.
2019-01-14 08:32:45 +01:00
Jason Tedor 03be4dbaca
Introduce retention lease persistence (#37375)
This commit introduces the persistence of retention leases by persisting
them in index commits and recovering them when recovering a shard from
store.
2019-01-12 14:43:19 -08:00
Nhat Nguyen 44a1071018
Make recovery source partially non-blocking (#37291)
Today a peer-recovery may run into a deadlock if the value of
node_concurrent_recoveries is too high. This happens because the
peer-recovery is executed in a blocking fashion. This commit attempts
to make the recovery source partially non-blocking. I will make three
follow-ups to make it fully non-blocking: (1) send translog operations,
(2) primary relocation, (3) send commit files.

Relates #36195
2019-01-12 12:49:48 -05:00
Armin Braun 63fe3c6ed6
Fix PrimaryAllocationIT Race Condition (#37355)
* Fix PrimaryAllocationIT Race Condition

* Forcing a stale primary allocation on a green index was tripping the assertion that was removed
   * Added a test that this case still errors out correctly
* Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test
   * Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty
* Closes #37345
2019-01-11 23:26:04 +01:00
Nhat Nguyen 70cee18e56
Introduce StepListener (#37327)
This commit introduces StepListener which provides a simple way to write
a flow consisting of multiple asynchronous steps without having nested
callbacks.

Relates #37291
2019-01-11 13:06:17 -05:00
Christoph Büscher bb6d8784e7
Switch indices.get rest after backport of `include_type_name` (#37351)
With the `include_type_name` available now for indices.get on 6.x after the
backport, the corresponsing yaml test can include anything from 6.7 on.
Also changing the RestGetIndicesActionTests base test class.
2019-01-11 17:24:12 +01:00
Armin Braun 1eba1d1df9
Fix SnapshotDisruptionIT Race Condition (#37358)
* Due to a race between retrying the snapshot creation and the failed snapshot create trying to delete the snapshot there is no guarantee that the snapshot is eventually created by retries
   * Adjusted the assertion accordingly
* Closes #36779
2019-01-11 16:09:26 +01:00
Yannick Welsch f4abf9628a
Mock connections more accurately in DisruptableMockTransport (#37296)
This commit moves DisruptableMockTransport to use a more accurate representation of connection
management, which allows to use the full connection manager and does not require mocking out
any behavior. With this, we can implement restarting nodes in CoordinatorTests.
2019-01-11 16:06:48 +01:00
Boaz Leskes d21df2a17a
Use Sequence number powered OCC for processing updates (#37308)
Updates perform realtime get, perform the requested update and then index the document again
using optimistic concurrency control. This PR changes the logic to use sequence numbers instead
of versioning. 

Note that the current versioning logic isn't suffering from the same problem as external OCC
requests because the get and indexing is always done on the same primary.

Relates #36148 
Relates #10708
2019-01-11 06:23:55 -08:00
Yannick Welsch 4d3928d444 Increase timeouts in UnicastZenPingTests
Relates to #37268
2019-01-11 11:33:55 +01:00
Yannick Welsch 3a929c7aea Increase assertBusy timeouts for RefreshListenersTests 2019-01-11 10:15:39 +01:00
Alpar Torok 82ca2d62de Mute CloseWhileRelocatingShardsIT.testCloseWhileRelocatingShards
Tracked by #37274
2019-01-11 10:54:53 +02:00
Alpar Torok 3e73911cbe Mute PrimaryAllocationIT.testForceStaleReplicaToBePromotedToPrimaryOnWrongNode
Tracking issue: #37345
2019-01-11 10:49:25 +02:00
Ignacio Vera 0a50821bb2
Geo: Do not normalize the longitude with value -180 for Lucene shapes (#37299)
Lucene based shapes should not normalize the longitude value -180 to 180.
2019-01-11 09:37:18 +01:00
Nhat Nguyen 360c430ad7
Add runAfter and notifyOnce wrapper to ActionListener (#37331)
Relates #37291
2019-01-11 03:33:06 -05:00
Alexander Reelsen 9f3da013d8
Date/Time parsing: Use java time API instead of exception handling (#37222)
* Add benchmark

* Use java time API instead of exception handling

when several formatters are used, the existing way of parsing those is
to throw an exception catch it, and try the next one. This is is
considerably slower than the approach taken in joda time, so that
indexing is reduced when a date format like `x||y` is used and y is the
date format being used.

This commit now uses the java API to parse the date by appending the
date time formatters to each other and does not rely on exception
handling.

* fix benchmark

* fix tests by changing formatter, also expose printer

* restore optional printing logic to fix tests

* fix tests

* incorporate review comments
2019-01-11 09:25:05 +01:00
Jason Tedor 822626dadf
Make consistent empty retention lease supplier
This commit makes the use of empty retention lease suppliers to always
be an empty list as opposed to in some cases an empty set. This commit
is solely for consistency reasons, there is no functional change here.
2019-01-10 18:34:55 -08:00
Jason Tedor edc95c8a8e
Add validation for retention lease construction (#37312)
This commit adds some simple validation that the values input to the
retention lease constructor our valid values. We will later rely on
these values being within the validated range.
2019-01-10 18:13:05 -08:00
Jack Conradson b5b93a2746
Rename ParameterMap to DeprecationMap (#37317)
Mechanical change to rename ParameterMap to DeprecationMap as this 
seems more appropriate for an extended Map to issue deprecation warnings.
2019-01-10 13:58:00 -08:00
markharwood 434430506b
Type removal - added deprecation warnings to _bulk apis (#36549)
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
2019-01-10 21:35:19 +00:00
Julie Tibshirani a433c4012c
Support include_type_name in the field mapping and index template APIs. (#37210)
* Add include_type_name to the get field mappings API.
* Make sure the API specification lists include_type_name as a boolean.
* Add include_type_name to the get index templates API.
* Add include_type_name to the put index templates API.
2019-01-10 09:24:08 -08:00