Commit Graph

2016 Commits

Author SHA1 Message Date
Armin Braun 0a604e3b24
Fix Two Races that Lead to Stuck Snapshots (#37686)
* Fixes two broken spots:
    1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting
    2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state
* Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk
* Significantly extended test infrastructure to reproduce the above two issues
  * Two new test runs:
      1. Reproducing the effects of node disconnects/restarts in isolation
      2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes
* Relates #32265 
* Closes #32348
2019-02-01 05:45:40 +01:00
Yuri Astrakhan f3cde06a1d
geotile_grid implementation (#37842)
Implements `geotile_grid` aggregation

This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240

This code uses the same base classes as `geohash_grid` agg, but uses a different hashing
algorithm to allow zoom consistency.  Each grid bucket is aligned to Web Mercator tiles.
2019-01-31 19:11:30 -05:00
Przemyslaw Gomulka 28b5c7ce78
Do not set up NodeAndClusterIdStateListener in test (#38110)
When extending ESIntegTestCase are run on the same jvm, the static field in
NodeAndClusterIdConverter will throw an AlreadySet exceptions.
overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening
relates #32850
2019-01-31 18:59:40 +01:00
Henning Andersen 68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
Jason Tedor a9b12b38f0
Push primary term to replication tracker (#38044)
This commit pushes the primary term into the replication tracker. This
is a precursor to using the primary term to resolving ordering problems
for retention leases. Namely, it can be that out-of-order retention
lease sync requests arrive on a replica. To resolve this, we need a
tuple of (primary term, version). For this to be, the primary term needs
to be accessible in the replication tracker. As the primary term is part
of the replication group anyway, this change conceptually makes sense.
2019-01-31 09:19:49 -05:00
Luca Cavanna 622fb7883b
Introduce ability to minimize round-trips in CCS (#37828)
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.

This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour.

Relates to #32125
2019-01-31 15:12:14 +01:00
Tim Vernum cde126dbff
Enable SSL in reindex with security QA tests (#37600)
Update the x-pack/qa/reindex-tests-with-security integration tests to
run with TLS enabled on the Rest interface.

Relates: #37527
2019-01-31 20:59:50 +11:00
Alexander Reelsen 160d1bd4dd
Work around JDK8 timezone bug in tests (#37968)
The timezone GMT0 cannot be properly parsed on java8.
The randomZone() method now excludes GMT0, if java8 is used.

Closes #37814
2019-01-31 08:52:35 +01:00
David Turner 81c443c9de
Deprecate minimum_master_nodes (#37868)
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
2019-01-30 20:09:15 +00:00
Nik Everett e97718245d
Test: Enable strict deprecation on all tests (#36558)
This drops the option for tests to disable strict deprecation mode in
the low level rest client in favor of configuring expected warnings on
any calls that should expect warnings. This behavior is
paranoid-by-default which is generally the right way to handle
deprecations and tests in general.
2019-01-30 11:48:34 -05:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Tim Brooks f3f9cabd67
Add timeout for ccr recovery action (#37840)
This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
2019-01-29 12:29:06 -07:00
Armin Braun 7f1784e9f9
Remove Dead MockTransport Code (#34044)
* All these methods are unused
2019-01-29 15:08:11 +01:00
Luca Cavanna 2325fb9cb3
Remove test only SearchShardTarget constructor (#37912)
Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.
2019-01-29 14:58:11 +01:00
Przemyslaw Gomulka 891320f5ac
Elasticsearch support to JSON logging (#36833)
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.

To populate additional fields node.id and cluster.uuid which are not available at start time, 
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. 

Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions

Docker log4j2 configuration is now almost the same as the one use for ES binary. 
The only difference is that docker is using console appenders, whereas ES is using file appenders.

relates: #32850
2019-01-29 07:20:09 +01:00
Jason Tedor 5fddb631a2
Introduce retention lease syncing (#37398)
This commit introduces retention lease syncing from the primary to its
replicas when a new retention lease is added. A follow-up commit will
add a background sync of the retention leases as well so that renewed
retention leases are synced to replicas.
2019-01-27 07:49:56 -05:00
Christoph Büscher b4b4cd6ebd
Clean codebase from empty statements (#37822)
* Remove empty statements

There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.

* Change test, slightly more verbose but less confusing
2019-01-25 14:23:02 +01:00
Jim Ferenczi 787acb14b9
Track total hits up to 10,000 by default (#37466)
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.

Closes #33028
2019-01-25 13:45:39 +01:00
Julie Tibshirani e1d8df4ffa
Deprecate types in create index requests. (#37134)
From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates:
* Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests.
* Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.
2019-01-24 13:17:47 -08:00
Andrey Ershov 4974684003
Add tool elasticsearch-node unsafe-bootstrap (#37696)
elasticsearch-node tool helps to restore cluster if half or more of
master eligible nodes are lost. Of course, all bets are off, regarding
data consistency.

There are two parts of the tool: unsafe-bootstrap to be used when there
is still at least one master-eligible node alive and detach-cluster,
when there are no master-eligible nodes left.
This commit implements the first part.

Docs for the tool will be added separately as a part of #37812.
2019-01-24 19:25:55 +01:00
Tal Levy 289106a578
Refactor GeoHashGrid to be abstract and re-usable (#37742)
This change split out all the specific GeoHash
classes for the geohash_grid aggregation into
abstract GeoGrid classes that can be re-used for
specific hashing types, like `geohash`
2019-01-24 10:12:14 -08:00
Alpar Torok 37768b7eac
Testing conventions now checks for tests in main (#37321)
* Testing conventions now checks for tests in main

This is the last outstanding feature of the old NamingConventionsTask,
so time to remove it.

* PR review
2019-01-24 17:30:50 +02:00
Nhat Nguyen a6abb28abf
Fix InternalEngineTests#assertOpsOnPrimary (#37746)
The assertion `assertOpsOnPrimary` does not store seq_no and primary
term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This
leads to failures of the subsequence CAS deletes or indexes with seq_no
and term. Moreover, this assertion trips a translog assertion because it
bumps the primary term of some operations but not the primary term of
the engine.

Relates #36467
Closes #37684
2019-01-24 10:02:48 -05:00
Yannick Welsch feab59df03
Bubble exceptions up in ClusterApplierService (#37729)
Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and
block the state from being applied instead of silently being ignored. In combination with the cluster
state publishing lag detector, this will throw a node out of the cluster that can't properly apply
cluster state updates.
2019-01-24 14:09:03 +01:00
Alexander Reelsen daa2ec8a60
Switch mapping/aggregations over to java time (#36363)
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.

The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.

Relates #27330
2019-01-23 10:40:05 +01:00
Boaz Leskes 52ba407931
Expose sequence number and primary terms in search responses (#37639)
Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. 

This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.
2019-01-23 09:01:58 +01:00
Henning Andersen 228611843c
Fail start of non-data node if node has data (#37347)
* Fail start of non-data node if node has data

Check that nodes started with node.data=false cannot start if they have
shard data to avoid (old) indexes being resurrected into the cluster in red status.

Issue #27073
2019-01-22 13:27:12 +01:00
Tim Brooks 21838d73b5
Extract message serialization from `TcpTransport` (#37034)
This commit introduces a NetworkMessage class. This class has two
subclasses - InboundMessage and OutboundMessage. These messages can
be serialized and deserialized independent of the transport. This allows
more granular testing. Additionally, the serialization mechanism is now
a simple Supplier. This builds the framework to eventually move the
serialization of transport messages to the network thread. This is the
one serialization component that is not currently performed on the
network thread (transport deserialization and http serialization and
deserialization are all on the network thread).
2019-01-21 14:14:18 -07:00
Tim Brooks f516d68fb2
Share `NioGroup` between http and transport impls (#37396)
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
2019-01-21 13:50:56 -07:00
Julie Tibshirani 8da7a27f3b
Deprecate types in the put mapping API. (#37280)
From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates:
- Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`.
- Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.
2019-01-18 12:28:31 -08:00
Yannick Welsch 377d96e376
Remove initial_master_nodes on node restart (#37580)
Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since
being switched to Zen2 because the bootstrap setting was left around when nodes got restarted
with data folders wiped.

The test in question here was starting one node (which autobootstrapped to that single node), then
another node. The first node was then shut down (after excluding it from the voting configuration),
its data folder wiped, and restarted. After restart, the node had an empty data folder yet
initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of
its own, and not rejoin the existing cluster with the other node.
2019-01-18 16:36:42 +01:00
Jason Tedor 687978b7d1
Reject all requests that have an unconsumed body (#37504)
This commit removes some leniency from REST handling where we move to
reject all requests that have a body where the body is not used during
the course of handling the request. For example,

DELETE /index
{
  "query" : {
    "term" :  {
      "field" : "value"
    }
  }
}

is now rejected.
2019-01-16 07:29:25 -05:00
Przemyslaw Gomulka 5e94f384c4
Remove the use of AbstracLifecycleComponent constructor #37488 (#37488)
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560

relates #34488
2019-01-16 09:05:30 +01:00
Tanguy Leroux 23ae9808ba
Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (#37414)
This commit fixes the IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) 
method where the startReplica parameter was not correctly propagated and the value
 true always used instead.
2019-01-15 12:48:21 +01:00
Marios Trivyzas d6a104f52b
[TEST] Muted testDifferentRolesMaintainPathOnRestart
Relates to #37462
2019-01-15 11:51:45 +02:00
Jason Tedor e11a32eda8
Reformat some classes in the index universe
This commit reformats some classes in the index universe with the
purpose of breaking some long method definitions and invocations into a
line per parameter. This has the advantage that for an upcoming change
to these definitions and invocations, the diff for that change will be a
single line per definition or invocation. That makes these sorts of
changes easier to read.
2019-01-14 21:45:24 -05:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Nhat Nguyen 1e3702da0b Relax assertSameDocIdsOnShards assertion
If the checking node no longer holds the shard copy, the assertion
assertSameDocIdsOnShards might fail. This is too harsh since the
assertion is to ensure the consistency between active copies.
2019-01-14 15:28:48 -05:00
Nhat Nguyen 15aa3764a4
Reduce recovery time with compress or secure transport (#36981)
Today file-chunks are sent sequentially one by one in peer-recovery. This is a
correct choice since the implementation is straightforward and recovery is
network bound in most of the time. However, if the connection is encrypted, we
might not be able to saturate the network pipe because encrypting/decrypting
are cpu bound rather than network-bound.

With this commit, a source node can send multiple (default to 2) file-chunks
without waiting for the acknowledgments from the target.

Below are the benchmark results for PMC and NYC_taxis.

- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| -------- | -------- | -------- | -------- |
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| ---------| ---------| ---------| -------- |
| Plain     | 321s     | 249s     | 191s     |  *       | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844
2019-01-14 15:14:46 -05:00
Tim Brooks 5c68338a1c
Implement ccr file restore (#37130)
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
2019-01-14 13:07:55 -07:00
Armin Braun 033e67fa59
Cleanup Deadcode in Rest Tests (#37418)
* Either dead code outright or redundant overrides removed
2019-01-14 16:22:44 +01:00
Daniel Mitterdorfer abe35fb99b
Remove unused index store in directory service
With this commit we remove the unused field `indexStore` from all
implementations of `FsDirectoryService`.

Relates #37097
2019-01-14 13:44:32 +01:00
Jason Tedor 03be4dbaca
Introduce retention lease persistence (#37375)
This commit introduces the persistence of retention leases by persisting
them in index commits and recovering them when recovering a shard from
store.
2019-01-12 14:43:19 -08:00
Nhat Nguyen 44a1071018
Make recovery source partially non-blocking (#37291)
Today a peer-recovery may run into a deadlock if the value of
node_concurrent_recoveries is too high. This happens because the
peer-recovery is executed in a blocking fashion. This commit attempts
to make the recovery source partially non-blocking. I will make three
follow-ups to make it fully non-blocking: (1) send translog operations,
(2) primary relocation, (3) send commit files.

Relates #36195
2019-01-12 12:49:48 -05:00
Armin Braun 63fe3c6ed6
Fix PrimaryAllocationIT Race Condition (#37355)
* Fix PrimaryAllocationIT Race Condition

* Forcing a stale primary allocation on a green index was tripping the assertion that was removed
   * Added a test that this case still errors out correctly
* Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test
   * Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty
* Closes #37345
2019-01-11 23:26:04 +01:00
Yannick Welsch f4abf9628a
Mock connections more accurately in DisruptableMockTransport (#37296)
This commit moves DisruptableMockTransport to use a more accurate representation of connection
management, which allows to use the full connection manager and does not require mocking out
any behavior. With this, we can implement restarting nodes in CoordinatorTests.
2019-01-11 16:06:48 +01:00
Jason Tedor 822626dadf
Make consistent empty retention lease supplier
This commit makes the use of empty retention lease suppliers to always
be an empty list as opposed to in some cases an empty set. This commit
is solely for consistency reasons, there is no functional change here.
2019-01-10 18:34:55 -08:00
Yannick Welsch d499233068
Zen2: Add join validation (#37203)
Adds join validation to Zen2, which prevents a node from joining a cluster when the node does not
have the right ES version or does not satisfy any other of the join validation constraints.
2019-01-10 12:57:50 +01:00
Alexander Reelsen b2e8437424
Tests: Add ElasticsearchAssertions.awaitLatch method (#36777)
* Tests: Add ElasticsearchAssertions.awaitLatch method

Some tests are using assertTrue(latch.await(...)) in their code. This
leads to an assertion error without any error message. This adds a
method which has a nicer error message and can be used in tests.

* fix forbidden apis

* fix spaces
2019-01-10 09:25:36 +01:00
Armin Braun eacc63b032
TESTS: Real Coordinator in SnapshotServiceTests (#37162)
* TESTS: Real Coordinator in SnapshotServiceTests

* Introduce real coordinator in SnapshotServiceTests to be able to test network disruptions realistically
  * Make adjustments to cluster applier service so that we can pass a mocked single threaded executor for tests
2019-01-09 16:53:49 +01:00
Tanguy Leroux 7f6fe14b66 Merge branch 'master' into close-index-api-refactoring 2019-01-09 09:26:05 +01:00
Mayya Sharipova ec32e66088 Deprecate reference to _type in lookup queries (#37016)
Relates to #35190
2019-01-08 18:46:41 -08:00
Tanguy Leroux d70ebfd1d6 Merge branch 'master' into close-index-api-refactoring 2019-01-08 09:17:48 +01:00
Jason Tedor c8c596cead
Introduce retention lease expiration (#37195)
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
2019-01-07 22:03:52 -08:00
Julie Tibshirani c5aac4705d Revert "Stop automatically nesting mappings in index creation requests. (#36924)"
This reverts commit ac1c6940d2.
2019-01-07 17:56:40 -08:00
Tanguy Leroux 97bf4d7176 Merge branch 'master' into close-index-api-refactoring 2019-01-07 18:38:27 +01:00
David Turner 9d0e0eb0f3
[Zen2] Remove initial master node count setting (#37150)
The `cluster.unsafe_initial_master_node_count` setting was introduced as a
temporary measure while the design of `cluster.initial_master_nodes` was being
finalised. This commit removes this temporary setting, replacing it with usages
of `cluster.initial_master_nodes` where appropriate.
2019-01-07 16:05:00 +00:00
Tanguy Leroux e149b0852e
[Close Index API] Add unique UUID to ClusterBlock (#36775)
This commit adds a unique id to cluster blocks, so that they can be uniquely 
identified if needed. This is important for the Close Index API where multiple 
concurrent closing requests can be executed at the same time. By adding a 
UUID to the cluster block, we can generate unique "closing block" that can 
later be verified on shards and then checked again from the cluster state 
before closing the index. When the verification on shard is done, the closing 
block is replaced by the regular INDEX_CLOSED_BLOCK instance.

If something goes wrong, calling the Open Index API will remove the block.

Related to #33888
2019-01-07 16:44:59 +01:00
Jason Tedor c0f8c89172
Introduce shard history retention leases (#37167)
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.

Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.

Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.

Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.

This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
 - timestamps
 - expiration
 - persistence to and recovery from Lucene
 - handoff during primary relocation
 - sharing retention leases with replicas
 - exposing leases in shard-level statistics
 - integration with cross-cluster replication

These will occur individually in follow-up commits.
2019-01-07 07:43:57 -08:00
Alpar Torok a7c3d5842a
Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
Simon Willnauer ac2e09b25a
Fix suite scope random initializaation (#37163)
The initialization of a suite scope cluster had some sideffects on
subsequent runs which causes issues when tests must be reproduced.
This moves the suite scope initialization to a privte random context.

Closes #36202
2019-01-07 14:20:17 +01:00
Tanguy Leroux f5af79b9cd Merge branch 'master' into close-index-api-refactoring 2019-01-07 12:43:03 +01:00
Armin Braun 31c33fdb9b
MINOR: Remove some Deadcode in Gradle (#37160) 2019-01-07 09:21:25 +01:00
Jim Ferenczi e38cf1d0dc
Add the ability to set the number of hits to track accurately (#36357)
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.

Relates #33028
2019-01-04 20:36:49 +01:00
Julie Tibshirani ac1c6940d2
Stop automatically nesting mappings in index creation requests. (#36924)
Now that we unwrap mappings in DocumentMapperParser#extractMappings, it is not
necessary for the mapping definition to always be nested under the type. This
leniency around the mapping format was added in 2341825358.
2019-01-03 17:41:28 -08:00
David Findley d4e7660248 Fix weighted_avg parser not found for RestHighLevelClient (#37027)
Add integration test for weighted avg sub aggregation
Add weighted avg parser to DefaultNamedXContents

Fixes #36861
2019-01-02 15:53:21 -06:00
Josh Soref 1df66d21fe Spelling: replace uknown with unknown (#37056) 2019-01-02 17:33:02 +01:00
Josh Soref d3e98278c3 Spelling: replace cachable with cacheable (#37047) 2019-01-02 14:10:30 +01:00
Armin Braun 85be9d6a89
SNAPSHOT: Deterministic ClusterState Tests (#36644)
* Use `DeterministicTaskQueue` infrastructure to test `SnapshotsService`
2018-12-31 11:17:21 +01:00
Luca Cavanna 51fe20e0c3
Add support for local cluster alias to SearchRequest (#36997)
With the upcoming cross-cluster search alternate execution mode, the CCS
 node will be able to split a CCS request into multiple search requests,
 one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.

The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
 index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups.

The local cluster alias can be set to the SearchRequest at the transport layer
only, and its constructor/getter methods are package private.

Relates to #32125
2018-12-28 12:43:25 +01:00
Andrey Ershov a02cfdf6e4
Switch InternalTestClusterTests to zen2 (#36977)
Today InternalTestClusterTests is still using zen1.
This commit fixes it.
Two types of changes were required:

1. Explicitly pass file discovery host provider setting. It's done in
ESIntegTestCase as a part of the Zen2 feature and should be done here
as well.
2. For the test, that uses autoManageMinMasterNodes = false perform
cluster bootstrap.
2018-12-27 22:21:37 +01:00
Nhat Nguyen 7580d9d925
Make SourceToParse immutable (#36971)
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.

Relates #36921
2018-12-24 14:06:50 -05:00
Tim Brooks c8a8391dfa
Only compress responses if request was compressed (#36867)
This is a follow-up to some discussions around #36399. Currently we have
relatively confusing compression behavior where compression can be
configured for requests based on transport.compress or a specific
setting for a remote cluster. However, we can only compress responses
based on transport.compress as we do not know where a request is
coming from (currently).

This commit modifies the behavior to NEVER compress responses based on
settings. Instead, a response will only be compressed if the request was
compressed. This commit also updates the documentation to more clearly
described transport level compression.
2018-12-21 10:14:00 -07:00
Tanguy Leroux bd2af2c400 Merge branch 'master' into close-index-api-refactoring 2018-12-21 12:22:24 +01:00
Andrey Ershov ca92d74e7e
[Zen2] Change unsafe bootstrap nodes count to nodes list in tests (#36559)
This commit modifies ESSingleNodeTestCase and ESIntegTestCase and
several concrete test classes to use node names when bootstrapping the
cluster.

Today ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING
setting is used to bootstrap clusters in tests. Instead, we want to use
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING and get rid of
the former setting eventually.

There were two main problems when refactoring InternalTestCluster:

1. Nodes are created one-by-one in buildNode method. And node.name
is created in this method as well. It's not suitable for bootstrapping,
because we need to have the names of all master eligible nodes in
advance, before creating the node with bootstrapping configuration set.
We address this issue by separating buildNode into two methods:
getNodeSettings and buildNode. We first iterate over all nodes to
get nodes settings, then change the setting for the bootstrapping node
and then proceed with building the node.
2. If autoManageMinMasterNodes = false, there is no way for the test to
set the list of bootstrapping nodes because node names are not known in
advance. This problem is solved by adding updateNodesSettings method
to NodeConfigurationSource and ESIntegTestCase (which could be
overridden by concrete integration test class). Once we have the list
of settings for all nodes, the integration test class is allowed to
update it. In our case, we update the
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING setting.
2018-12-20 15:20:33 +01:00
Tanguy Leroux fb24469fe7 Merge branch 'master' into close-index-api-refactoring 2018-12-19 16:17:26 +01:00
Yannick Welsch 487a1c4f71
Fix cluster state persistence for single-node discovery (#36825)
Single-node discovery is not persisting cluster states, which was caused by a recent 7.0-only
refactoring. This commit ensures that the cluster state is properly persisted when using single-node
discovery and adds a corresponding test.
2018-12-19 13:26:04 +01:00
Alan Woodward 344917efab
Add script filter to intervals (#36776)
This commit adds the ability to filter out intervals based on their start and end position, and internal
gaps:
```
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "match" : {
          "query" : "hot porridge",
          "filter" : {
            "script" : {
              "source" : "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
```
2018-12-19 11:12:18 +00:00
Tanguy Leroux c99fd6a53b Merge branch 'master' into close-index-api-refactoring 2018-12-19 09:34:59 +01:00
Alpar Torok e9ef5bdce8
Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311)
- Create a separate unitTest task instead of Gradle's built in 
- convert all configuration to use the new task 
- the  built in task is now disabled
2018-12-19 08:25:20 +02:00
Tim Brooks aaf466ff5e
Revert transport.port change for tests (#36809)
Commit #36786 updated docs and strings to reference transport.port instead of
transport.tcp.port. However, this breaks backwards compatibility tests
as the tests rely on string configurations and transport.port does not
exist prior to 6.6. This commit reverts the places were we reference
transport.tcp.port for tests. This work will need to be reintroduced in
a backwards compatible way.
2018-12-18 19:01:13 -07:00
Tim Brooks 47a9a8de49
Update transport docs and settings for changes (#36786)
This is related to #36652. In 7.0 we plan to deprecate a number of
settings that make reference to the concept of a tcp transport. We
mostly just have a single transport type now (based on tcp). Settings
should only reference tcp if they are referring to socket options. This
commit updates the settings in the docs. And removes string usages of
the old settings. Additionally it adds a missing remote compress setting
to the docs.
2018-12-18 13:09:58 -07:00
Ryan Ernst 8ec8342a52
Internal: Remove originalSettings from Node (#36569)
This commit removes the originalSettings member from Node. It was only
needed to allows test clusters to recreate the node in certain
situations. Instead, the test cluster now keeps track of these settings.
2018-12-18 10:05:27 -08:00
Tanguy Leroux 0a0c969517 Merge branch 'master' into close-index-api-refactoring 2018-12-18 09:27:35 +01:00
Luca Cavanna b57e12aa44
Add raw sort values to SearchSortValues transport serialization (#36617)
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.

This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
2018-12-18 09:20:51 +01:00
Nicholas Knize 96d279ed83 Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"
This reverts commit 5bc7822562.
2018-12-17 20:09:46 -06:00
Nick Knize 5bc7822562
[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)
This commit  exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new 
indexing approach, simply set "type" : "geo_shape" in 
the mappings without setting any of the strategy, precision, 
tree_levels, or distance_error_pct parameters. Note the 
following when using the new indexing approach:

* geo_shape query does not support querying by 
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not 
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct, 
and points_only parameters are deprecated.
2018-12-17 14:38:14 -06:00
Luca Cavanna f1e1f93943 [TEST] fix float comparison in RandomObjects#getExpectedParsedValue
This commit fixes a test bug introduced with #36597. This caused some
test failure as stored field values comparisons would not work when CBOR
xcontent type was used.

Closes #29080
2018-12-17 21:19:59 +01:00
Tanguy Leroux 79999d37d4 Merge branch 'master' into close-index-api-refactoring 2018-12-17 10:14:38 +01:00
Boaz Leskes 733a6d34c1
Add seq no powered optimistic locking support to the index and delete transport actions (#36619)
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) 
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.

Relates #36148 
Relates #10708
2018-12-15 17:59:57 +01:00
Tim Brooks 3065300434
Unify transport settings naming (#36623)
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.

This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.
2018-12-14 14:41:04 -07:00
Tim Brooks fbf88b2ab7
Remove the `MockTcpTransport` (#36628)
This commit removes all remaining usages of the `MockTcpTransport`.
Additionally it removes the `MockTcpTransport` and its test case.
2018-12-14 10:59:07 -07:00
Luca Cavanna bb3ae18da5
Increase coverage in SearchSortValuesTests (#36597)
SearchSortValuesTests extends now `AbstractSerializingTestCase` which removes some code duplication and standardizes the way we test `fromXContent`, serialization and equals/hashcode.

Also, we were never creating `SearchSortValues` through their public constructor that accept an array of `DocValueFormat` together with the array of raw sort values. That is covered now, which involved some conversion from `BytesRef` to String in the test.

Also, the previous test was not using doing any equality check against the original and parsed versions in `testFromXContent` due to values being parsed with different types in some cases, which is now covered by converting those values using a new method added to `RandomObjects`. The code was already there as part of `randomStoredFieldValues`, but it is now exposed to be used in other scenarios.
2018-12-14 18:57:37 +01:00
Luca Cavanna 7dc3d3b78b
Add sort and collapse info to SearchHits transport serialization (#36555)
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of the `SearchHits`, specifically:

- lucene `SortField[]` which contains info about the fields that sorting was performed on and their type, which depends on mappings (that the CCS node does not know about)
- collapse field (`String`) that field collapsing was executed on, if requested
- collapse values (`Object[]`) that field collapsing was based on, if requested

This info is needed to be able to reconstruct the `TopFieldDocs` or `CollapseFieldTopDocs` in the CCS coordinating node to feed the `mergeTopDocs` method and reduce multiple search responses received (one per cluster) into one.

This commit adds such information to the `SearchHits` class. It's nullable info that is not serialized through the REST layer. `SearchPhaseController` sets such info at the end of the hits reduction phase.
2018-12-14 12:22:54 +01:00
Armin Braun c5b3ac5578
SNAPSHOTS: Allow Parallel Restore Operations (#36397)
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
   * Parallel restore from two different snapshots
   * Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
   * Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
2018-12-14 11:39:23 +01:00
Tanguy Leroux 8e5dd20efb
[Close Index API] Refactor MetaDataIndexStateService (#36354)
The commit changes how indices are closed in the MetaDataIndexStateService. 
It now uses a 3 steps process where writes are blocked on indices to be closed, 
then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction 
added in #36249, and finally indices states are moved to CLOSE and their routing 
tables removed.

The closing process also takes care of using the pre-7.0 way to close indices if the 
cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices.

Related to #33888
2018-12-13 17:36:23 +01:00
Boaz Leskes f6b5d7e013
Add sequence numbers based optimistic concurrency control support to Engine (#36467)
This commit add support to engine operations for resolving and verifying the sequence number and
primary term of the last modification to a document before performing an operation. This is
infrastructure to move our (optimistic concurrency control)[http://en.wikipedia.org/wiki/Optimistic_concurrency_control] API to use sequence numbers instead of internal versioning.

Relates #36148 
Relates #10708
2018-12-13 08:08:40 +01:00
Tal Levy cd1bec3a06
[refactor] add Environment in BootstrapContext (#36573)
There are certain BootstrapCheck checks that may need access environment-specific
values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment
via a constructor to bypass the shortcoming in BootstrapContext. This commit
pulls in the node's environment into BootstrapContext.

Another case is found in #36519, where it is useful to check the state of the
data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on
the environment to retrieve references to things like node data paths.

This means that the BootstrapContext will have the same Settings used in the
Environment, which currently differs from the Node's settings.
2018-12-12 21:07:21 -08:00
Julie Tibshirani 71a39d10be
Make sure that BWC tests run successfully, even with types deprecation messages. (#36511) 2018-12-12 12:57:32 -08:00
Tim Brooks 7f612d5dd8
Always compress based on the settings (#36522)
Currently TransportRequestOptions allows specific requests to request
compression. This commit removes this and always compresses based on the
settings. Additionally, it removes TransportResponseOptions as they
are unused.

This closes #36399.
2018-12-12 09:39:15 -07:00
Simon Willnauer ff5dd14753
Fix test failures related to file corruption (#36530)
* Fix CorruptFileIT to also take last DV generation into account

We currently only prune old .liv generations. With soft_deletes it's important
to also prune DV generations.

* Fix CorruptionUtils to skip the footer bytes after the checksum is read.

Today we read a broken checksum since we also checksum the 8 footer bytes that include
the checksum algorithm and the footer magic.

Closes #36526
2018-12-12 16:21:02 +01:00
Tim Brooks e63d52af63
Move page size constants to PageCacheRecycler (#36524)
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
2018-12-12 07:00:50 -07:00
Alpar Torok c00d0fc814
Test fixtures improovements (#36037)
* Upgrae plugin to latest and expose udp
* Explicit check for windows
* Rename the properties for the port numbers
* Tasks for pre and pos container actions
2018-12-12 12:00:47 +02:00
Nik Everett 03daad9812
Re-deprecate xpack rollup endpoints (#36451)
Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`.

When we cleanup the rollup in a cluster containing 6.x nodes we need to
use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't
know about `/_rollup`. In those cases we must ignore the deprecation
warnings that the 7.0 node will return for the end point.

Closes #36044
2018-12-11 19:43:17 -05:00
Tim Brooks 797f985067
Add version to handshake requests (#36171)
Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.

This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.

Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.
2018-12-11 16:09:28 -07:00
Mayya Sharipova 2f18325384
Deprecate types in update_by_query and delete_by_query (#36365)
Relates to #35190
2018-12-11 17:09:59 -05:00
Tim Brooks 790f8102e9
Modify `BigArrays` to take name of circuit breaker (#36461)
This commit modifies BigArrays to take a circuit breaker name and
the circuit breaking service. The default instance of BigArrays that
is passed around everywhere always uses the request breaker. At the
network level, we want to be using the inflight request breaker. So this
change will allow that.

Additionally, as this change moves away from a single instance of
BigArrays, the class is modified to not be a Releasable anymore.
Releasing big arrays was always dispatching to the PageCacheRecycler,
so this change makes the PageCacheRecycler the class that needs to be
managed and torn-down.

Finally, this commit closes #31435 be making the serialization of
transport messages use the inflight request breaker. With this change,
we no longer push the global BigArrays instnace to the network level.
2018-12-11 11:55:41 -07:00
markharwood a9eccbcd02
Tests- added helper methods to ESRestTestCase for checking warnings (#36443)
Added helper methods to ESRestTestCase for checking warnings in mixed and current-version-only clusters.
This is supported by a new VersionSpecificWarningsHandler class with associated unit test.
Closes #36251
2018-12-11 17:30:15 +00:00
Andrey Ershov 8b821706cc
Switch more tests to zen2 (#36367)
1. CCR tests work without any changes
2. `testDanglingIndices` require changes the source code (added TODO).
3. `testIndexDeletionWhenNodeRejoins` because it's using just two
nodes, adding the node to exclusions is needed on restart.
4. `testCorruptTranslogTruncationOfReplica` starts dedicated master
one, because otherwise, the cluster does not form, if nodes are stopped
and one node is started back.
5. `testResolvePath` needs TEST cluster, because all nodes are stopped
at the end of the test and it's not possible to perform checks needed
by SUITE cluster.
6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2
retries snapshot creation as soon as network partition heals. This
results into the race between creating snapshot and test cleanup logic
(deleting index). Zen1 on the
other hand, also schedules retry, but it takes some time after network
partition heals, so cleanup logic executes latter and test passes. The
check that snapshot is eventually created is added to
the end of the test.
2018-12-11 17:12:17 +01:00
Julie Tibshirani 87831051dc
Deprecate types in explain requests. (#35611)
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
2018-12-10 19:45:13 -08:00
Jernej Klancic d615add1b1 Add pipeline parent validation for auto date histogram (#35670)
Allow `auto_date_histogram` as a valid parent agg for derivative, 
cumulative sum, moving average, moving function and serial 
differencing pipeline aggregations.

Since all these aggs share the same requirement (sequentially
ordered parent aggs), this commit also refactors to share
the same validation code so that any newly added aggs won't
be forgotten.

Closes #35578
2018-12-10 16:02:49 -05:00
Jim Ferenczi 75392adf60
[TEST] Convert SearchHitsTests to AbstractStreamableXContentTestCase (#36313)
This change adds a way to provide the content type of the rest serialization
tests when creating random instances. This is used by SearchHitsTests to ensure
that the internal members of the class are created with the same xContentType
and that equals can be used to compare an instances created from an XContent
view.
2018-12-10 20:41:20 +01:00
Nik Everett 9626e700ce
LLRC: Make warning behavior pluggable per request (#36345)
This allows you to plug the behavior that the LLRC uses to handle
warnings on a per request basis.

We entertained the idea of allowing you to set the warnings behavior to
strict mode on a per request basis but that wouldn't allow the high
level rest client to fail when it sees an unexpected warning.

We also entertained the idea of adding a list of "required warnings" to
the `RequestOptions` but that won't work well with failures that occur
*sometimes* like those we see in mixed clusters.

Adding a list of "allowed warnings" to the `RequestOptions` would work
for mixed clusters but it'd leave many of the assertions in our tests
weaker than we'd like.

This behavior plugging implementation allows us to make a "required
warnings" option when we need it and an "allowed warnings" behavior when
we need it.

I don't think this behavior is going to be commonly used by used outside
of the Elasticsearch build, but I expect they'll be a few commendably
paranoid folks who could use this behavior.
2018-12-10 08:32:00 -05:00
David Turner 9f86e996fe
[Zen2] Support rolling upgrades from Zen1 (#35737)
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:

- Zen1 nodes will never vote for Zen2 nodes
- Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes
- Zen2 nodes that were previously part of a mixed cluster will automatically
  (and unsafely) bootstrap themselves when the last Zen1 node leaves.
2018-12-08 07:33:35 +00:00
Tim Brooks 8a53f2b464
Implement basic `CcrRepository` restore (#36287)
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.

When the restore shard method is called, an empty shard is initialized.
2018-12-07 15:27:04 -07:00
Tim Brooks 5556204f81
Use MockNioTransport in MockTransportService (#36346)
The default transport used in the MockTransportService is the
MockTcpTransport. This commit changes that to be the
MockNioTransport.
2018-12-07 11:17:11 -07:00
Nhat Nguyen f2df0a5be4
Remove LocalCheckpointTracker#resetCheckpoint (#34667)
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with Lucene index. However,
we reset LocalCheckpoinTracker in testDedupByPrimaryTerm cause this
assertion to be violated.

This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.

Relates #34474
2018-12-07 12:22:20 -05:00
David Turner ed1c5a0241
Introduce `zen2` discovery type (#36298)
With this change it is now possible to start a node running Zen2.
2018-12-06 16:20:08 +00:00
David Turner 38ab15c6fb Avoid shutting down the only master (#36272)
Today the InternalTestClusterTests sometimes set up a cluster with a single
master, start some other ndoes, shut the original master down, and then reset
the cluster. This doesn't really work, because the original master may be
stale. This change avoids shutting down the only master in this situation.
2018-12-06 08:27:38 +01:00
Yannick Welsch a0ae1cc987 Merge remote-tracking branch 'elastic/master' into zen2 2018-12-05 23:13:12 +01:00
Yannick Welsch 03d0ea91ef
Zen2: Rename tombstones to exclusions (#36226)
Renames the withdrawal / tombstones APIs to voting configuration exclusions.
2018-12-05 23:12:28 +01:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Yannick Welsch b20497560c Merge remote-tracking branch 'elastic/master' into zen2 2018-12-05 14:06:38 +01:00
Yannick Welsch 0b9efff5cb
Zen2: Persist cluster states the old way on non-master-eligible nodes (#36247)
The shard deletion logic (triggered by IndicesStore), which also leads to index metadata deletion on
non-master-eligible data nodes, currently races against the new cluster state persistence logic
triggered by accepting cluster states. One thread is writing the index metadata while another one is
deleting the index metadata, leading to exceptions and assertions tripping (see below). The solution
proposed by this PR is to move the cluster state persistence of non-master-eligible nodes back to
the cluster applier service, just as it used to be for Zen1. This ensures that the index metadata
deletion logic, which is triggered by the shard deletion logic, runs on the same thread on which we
persist the cluster state.
2018-12-05 14:04:45 +01:00
Alpar Torok 60e45cd81d
Testing conventions task part 2 (#36107)
Closes #35435

- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
2018-12-05 14:20:01 +02:00
Yannick Welsch 70c361ea5a Merge remote-tracking branch 'elastic/master' into zen2 2018-12-04 21:26:11 +01:00
Adrien Grand 0df08dd458
Set Lucene version upon index creation. (#36038)
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions
eg. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.

Closes #33826
2018-12-04 17:53:20 +01:00
Nhat Nguyen b59deb573e
Always set soft-deletes field of IndexWriterConfig (#36196)
Today we configure the soft-deletes field iff soft-deletes enabled.
Although this choice was correct, it prevents an engine with
soft-deletes disabled from opening a Lucene index with soft-deletes.
Moreover, this change should not have any side-effect if a Lucene index
does not have any soft-deletes.

Relates #36141
2018-12-04 11:15:34 -05:00
Andrey Ershov 35e3d77e2c
[Zen2] Implement state recovery (#36013)
This commit implements proper metadata recovery for Zen2.

GatewayService is responsible for the recovery. In Zen1 GatewayService
creates an instance of Gateway, that is used to reach out to other cluster
nodes, get their state and calculate the most up-to-date state based on
versions. After that Gateway performs upgrade and archival of
 ClusterSettings and closes bad indices. Then recovered state is passed to GatewayService.GatewayRecoveryListener that mixes up current state
and restored state, removes state not recovered block, creates the
routing table and performs re-routing.

In Zen2 we should perform this kind of logic on cluster startup, except
mixing state (because there is nothing to mix) and opening routing table.

This commit refactors out all `ClusterUpdate` functions in a separate class
`ClusterStateUpdaters`, which is used by `Gateway` and `GatewayService`
in case of Zen1, and by `GatewayMetaState` and `GatewayService` in case of
Zen2.

This commit also switches all integration tests that are already using Zen2 from
InMemoryPersistedState to GatewayMetaState.
2018-12-04 14:45:45 +01:00
Yannick Welsch 80ee7943c9 Merge remote-tracking branch 'elastic/master' into zen2 2018-12-04 09:37:09 +01:00
Alpar Torok d036e0ca89
Testclusters: implement starting, waiting for and stopping single cluster nodes (#35599) 2018-12-04 10:16:51 +02:00
David Turner 034c7655b7
[Zen2] Reduce cluster scope in NodeDisconnectIT (#36168)
This test suite can stop all the shared master-eligible nodes, which breaks the
cluster since any non-shared master-eligible nodes are stopped first in the
reset process between tests.

Since this test suite can leave the cluster in this somewhat broken state, it
seems best that it uses a new cluster for each test.
2018-12-04 07:48:56 +00:00
Armin Braun 433a506d06
SNAPSHOT: Improve Resilience SnapshotShardService (#36113)
* Resolve the index in the snapshotting thread
* Added test for routing table - snapshot state mismatch
2018-12-03 16:39:29 +01:00
Jim Ferenczi 74aca756b8
Remove the distinction between query and filter context in QueryBuilders (#35354)
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.

Closes #35293
2018-12-03 11:49:11 +01:00
David Turner 8011438ea8 Use correct source of randomness
This fixes a failure of InternalTestClusterTests#testBeforeTest which checks
that the cluster is set up the same when starting from the same seed. Trappily,
using ESTestCase#randomIntBetween() is no good, we have to use
InternalTestCluster#random via RandomNumbers#randomIntBetween() instead.
2018-12-02 09:39:43 +00:00
David Turner 8191348d6b
[Zen2] Only bootstrap a single node (#36119)
Today, we allow all nodes in an integration test to bootstrap. However this
seems to lead to test failures due to post-election instability. The change
avoids this instability by only bootstrapping a single node in the cluster.
2018-12-01 06:43:11 +00:00
Luca Cavanna 0ebc17743a
Histogram aggs: add empty buckets only in the final reduce step (#35921)
Empty buckets don't need to be added when performing an incremental reduction step, they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non final reduction.
2018-11-30 20:33:09 +01:00
Tim Brooks ea7ea51050
Make `TcpTransport#openConnection` fully async (#36095)
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
2018-11-30 11:30:42 -07:00
Tim Brooks 26dcbcc8cc
Remove `MockTcpTransport` for ESIntegTestCase (#36089)
This commit removes the `MockTcpTransport` as a transport option for
`ESIntegTestCase`. It is the first step in replacing the usages of
`MockTcpTransport` with `MockNioTransport`.
2018-11-30 09:04:51 -07:00
Adrien Grand fa3d365ee8
Fix CompositeBytesReference#slice to not throw AIOOBE with legal offsets. (#35955)
CompositeBytesReference#slice has two bugs:
 - One that makes it fail if the reference is empty and an empty slice is
   created, this is #35950 and is fixed by special-casing empty-slices.
 - One performance bug that makes it always create a composite slice when
   creating a slice that ends on a boundary, this is fixed by computing `limit`
   as the index of the sub reference that holds the last element rather than
   the next element after the slice.

Closes #35950
2018-11-30 10:32:46 +01:00
Zachary Tong 61c2db5ebb Revert "Deprecate X-Pack centric rollup endpoints (#35962)"
This reverts commit b84f1f6a3a.
2018-11-29 12:58:23 -05:00
Zachary Tong 40c5445480 Revert "[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000)"
This reverts commit 85cdf4f913.
2018-11-29 12:56:25 -05:00
Tim Brooks c305f9dc03
Make keepalive pings bidirectional and optimizable (#35441)
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.

The ping interval configuration is moved to the ConnectionProfile.

The server channel now responds to pings. This makes the keepalive
pings bidirectional.

On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
2018-11-29 08:55:53 -07:00
Zachary Tong 85cdf4f913
[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000)
When wiping rollup jobs, if we are in a mixed cluster with < v7.0 nodes
we need to fall back to the deprecated endpoint because we may talk
to a 6.x node.
2018-11-29 07:37:33 -05:00
David Turner 7f257187af
[Zen2] Update default for USE_ZEN2 to true (#35998)
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
2018-11-29 12:18:35 +00:00
Jason Tedor b84f1f6a3a
Deprecate X-Pack centric rollup endpoints (#35962)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-27 20:34:17 -05:00
Tim Brooks cc1fa799c8
Remove `TcpChannel#setSoLinger` method (#35924)
This commit removes the dedicated `setSoLinger` method. This simplifies
the `TcpChannel` interface. This method has very little effect as the
SO_LINGER is not set prior to the channels being closed in the abstract
transport test case. We still will set SO_LINGER on the
`MockNioTransport`. However we can do this manually.
2018-11-27 09:08:14 -07:00
Andrey Ershov 0e283f9670
[Zen2] PersistedState interface implementation (#35819)
Today GatewayMetaState is capable of atomically storing MetaData to
disk. We've also moved fields that are needed to be persisted in Zen2
from ClusterState to ClusterState.MetaData.CoordinationMetaData.

This commit implements PersistedState interface.

version and currentTerm are persisted as a part of Manifest.
GatewayMetaState now implements both ClusterStateApplier and
PersistedState interfaces. We started with two descendants
Zen1GatewayMetaState and Zen2GatewayMetaState, but it turned
out to be not easy to glue it.
GatewayMetaState now constructs previousClusterState (including
MetaData) and previousManifest inside the constructor so that all
PersistedState methods are usable as soon as GatewayMetaState
instance is constructed. Also, loadMetaData is renamed to
getMetaData, because it just returns
previousClusterState.metaData().
Sadly, we don't have access to localNode (obtained from 
TransportService in the constructor, so getLastAcceptedState
should be called, after setLocalNode method is invoked.
Currently, when deciding whether to write IndexMetaData to disk,
we're comparing current IndexMetaData version and received
IndexMetaData version. This is not safe in Zen2 if the term has changed.
So updateClusterState now accepts incremental write
method parameter. When it's set to false, we always write
IndexMetaData to disk.
Things that are not covered by GatewayMetaStateTests are covered
by GatewayMetaStatePersistedStateTests.
This commit also adds an option to use GatewayMetaState instead of
InMemoryPersistedState in TestZenDiscovery. However, by default
InMemoryPersistedState is used and only one test in PersistedStateIT
used GatewayMetaState. In order to use it for other tests, proper
state recovery should be implemented.
2018-11-27 15:04:52 +01:00
Christophe Bismuth b95a4db6e6 Throw a parsing exception when boost is set in span_or query (#28390) (#34112) 2018-11-26 12:15:59 -05:00
Jim Ferenczi e37a0ef844
Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816) 2018-11-22 15:42:59 +01:00