Commit Graph

2578 Commits

Author SHA1 Message Date
Przemyslaw Gomulka ba9a4d13e1
mute Failing tests related to logging and joda-java migration backport(#38704)(#38710)
the tests awaits fix from #38693 and #38705 and #38581
2019-02-11 13:15:12 +01:00
Przemyslaw Gomulka ab9e2f2e69
Move testToUtc test to DateFormattersTests #38698 Backport #38610
The test was relying on toString in ZonedDateTime which is different to
what is formatted by strict_date_time when milliseconds are 0
The method is just delegating to dateFormatter, so that scenario should
be covered there.

closes #38359
Backport #38610
2019-02-11 11:34:25 +01:00
Like b8be6cb5c7
Reject index.optimize_auto_generated_id setting (#28895)
This commit rejects the index.optmize_auto_generated_id setting for
indices created on or after 7.0.0. This setting was deprecated in 6.7.0.
2019-02-10 13:46:09 -05:00
Tim Brooks 023e3c207a
Concurrent file chunk fetching for CCR restore (#38656)
Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR.

The implementation uses the parallel file writer functionality that is also used by peer recoveries.
2019-02-09 21:19:57 -07:00
Christoph Büscher e3c7b93917 Mute failure in InternalEngineTests (#38622) 2019-02-08 16:29:54 +01:00
Dimitris Athanasiou fe8182ece2
Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 7x (#38597) 2019-02-08 11:32:28 +02:00
Jason Tedor fdf6b3f23f
Add 7.1 version constant to 7.x branch (#38513)
This commit adds the 7.1 version constant to the 7.x branch.

Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
2019-02-07 16:32:27 -05:00
Jason Tedor f8ed6c15c4
Enable BWC after backport recovering leases (#38485)
This commit enables the BWC tests after backporting recovery of retention
leases during peer recovery.
2019-02-06 08:03:19 -05:00
Jason Tedor 4b42281a4e
Collapse retention lease integration tests (#38483)
This commit collapses the retention lease integration tests into a
single suite.
2019-02-06 07:55:41 -05:00
Tanguy Leroux 510829f9f7
TransportVerifyShardBeforeCloseAction should force a flush (#38401)
This commit changes the `TransportVerifyShardBeforeCloseAction` so that it 
always forces the flush of the shard. It seems that #37961 is not sufficient to 
ensure that the translog and the Lucene commit share the exact same max 
seq no and global checkpoint information in case of one or more noop 
operations have been made.

The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest` 
and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial 
situation and they both fail 1 on 10 executions.

Relates to #33888
2019-02-06 13:22:54 +01:00
David Turner 5a3c452480
Align docs etc with new discovery setting names (#38492)
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
2019-02-06 11:34:38 +00:00
Ioannis Kakavas e1d464b22c
Mute testRetentionLeasesSyncOnRecovery (#38488)
Relates: #38487
2019-02-06 08:52:54 +02:00
Armin Braun 34f2cc78f6
Fix Master Failover and DataNode Leave Blocking Snapshot (#38460)
* Closes #38447
2019-02-05 23:56:59 +01:00
Jason Tedor 79a45b47da
Recover retention leases during peer recovery (#38435)
This commit integrates retention leases with recovery. With this change,
we copy the current retention leases on primary to the replica during
phase two of recovery. At this point, the replica will have been added
to the replication group and so is already receiving retention lease
sync requests from the primary. This means that if any retention lease
syncs are triggered on the primary after we sample the retention leases
here during phase two, that sync request will also arrive on the replica
ensuring that the replica is from this point on up to date with the
retention leases on the primary. We have to copy these during phase two
since we will be applying indexing operations, potentially triggering
merges, and therefore must ensure the correct retention leases are in
place beforehand.
2019-02-05 17:43:41 -05:00
Henning Andersen 20c66c5a05
Bubble-up exceptions from scheduler (#38317)
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of #38014

Fixed NPE that caused CCR tests (IndexFollowingIT and likely others)
to fail.

schedule could bubble rejected exception to uncaught exception
handler when not using SAME executor if thread pool is terminated.
Now ignore rejected exception silently if executor is shutdown.
2019-02-05 21:48:24 +01:00
Boaz Leskes 033ba725af
Remove support for internal versioning for concurrency control (#38254)
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:

> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document 

We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. 

This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.

Relates to #1078
2019-02-05 20:53:35 +01:00
Jason Tedor b03d138122
Lift retention lease expiration to index shard (#38380)
This commit lifts the control of when retention leases are expired to
index shard. In this case, we move expiration to an explicit action
rather than a side-effect of calling
ReplicationTracker#getRetentionLeases. This explicit action is invoked
on a timer. If any retention leases expire, then we hard sync the
retention leases to the replicas. Otherwise, we proceed with a
background sync.
2019-02-05 14:42:17 -05:00
Tim Brooks c2a8fe1f91
Prevent CCR recovery from missing documents (#38237)
Currently the snapshot/restore process manually sets the global
checkpoint to the max sequence number from the restored segements. This
does not work for Ccr as this will lead to documents that would be
recovered in the normal followering operation from being recovered.

This commit fixes this issue by setting the initial global checkpoint to
the existing local checkpoint.
2019-02-05 13:32:41 -06:00
Tal Levy aef5775561
re-enables awaitsfixed datemath tests (#38376)
Previously, date formats of `YYYY.MM.dd` would hit an issue
where the year would jump towards the end of the calendar year.
This was an issue that had since been resolved in tests by using
`yyyy` to be the more accurate representation of the year.

Closes #37037.
2019-02-05 11:20:40 -08:00
Julie Tibshirani 3ce7d2c9b6
Make sure to reject mappings with type _doc when include_type_name is false. (#38270)
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).

This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:

```
PUT index?include_type_name=false
{
  "mappings": {
    "_doc": {
      "properties": { ... }
    }
  }
}
```

There is a similar issue for both 'put templates' and 'put mappings' requests
as well.

This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
2019-02-05 10:52:32 -08:00
David Turner f2dd5dd6eb
Remove DiscoveryPlugin#getDiscoveryTypes (#38414)
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.

Relates #38410
2019-02-05 17:42:24 +00:00
David Turner b7ab521eb1
Throw AssertionError when no master (#38432)
Today we throw a fatal `RuntimeException` if an exception occurs in
`getMasterName()`, and this includes the case where there is currently no
master. However, sometimes we call this method inside an `assertBusy()` in
order to allow for a cluster that is in the process of stabilising and electing
a master. The trouble is that `assertBusy()` only retries on an
`AssertionError` and not on a general `RuntimeException`, so the lack of a
master is immediately fatal.

This commit fixes the issue by asserting there is a master, triggering a retry
if there is not.

Fixes #38331
2019-02-05 17:11:20 +00:00
Armin Braun 2f6afd290e
Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368)
* The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress)
   * Fixed by:
      * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`.
      * Adding deduplication logic to `endSnapshot`
* Also:
  * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale)
* closes #38226
2019-02-05 16:44:18 +01:00
Lee Hinman d862453d68
Support unknown fields in ingest pipeline map configuration (#38352)
We already support unknown objects in the list of pipelines, this changes the
`PipelineConfiguration` to support fields other than just `id` and `config`.

Relates to #36938
2019-02-05 07:52:17 -07:00
David Turner 3b2a0d7959
Rename no-master-block setting (#38350)
Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any
value set for the old setting is now ignored.
2019-02-05 08:47:56 +00:00
David Turner 2d114a02ff
Rename static Zen1 settings (#38333)
Renames the following settings to remove the mention of `zen` in their names:

- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
2019-02-05 08:46:52 +00:00
Yogesh Gaikwad fe36861ada
Add support for API keys to access Elasticsearch (#38291)
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to 
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.

This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.

The API keys:-
- by default do not have an expiration but expiration can be
  configured where the API keys need to be expired after a
  certain amount of time.
- when generated will keep authentication information of the user that
   generated them.
- can be defined with a role describing the privileges for accessing
   Elasticsearch and will be limited by the role of the user that
   generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
  before being deleted. The expired API keys remover task handles this.

Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`

The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the 
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```

Closes #34383
2019-02-05 14:21:57 +11:00
Christoph Büscher d255303584
Add typless client side GetIndexRequest calls and response class (#37778)
The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest`
and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and
response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in
the maps it returns still use a level including the type name.
In order to change this without breaking existing users of the HLRC API, this PR introduces two new request
and response objects in the `org.elasticsearch.client.indices` client package. These are used by the
IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request
and response objects are still kept for use in similarly named, but deprecated methods.

The newly introduced client side classes are simplified versions of the server side request/response classes since
they don't need to support wire serialization, and only the response needs fromXContent parsing (but no
xContent-serialization, since this is the responsibility of the server-side class).
Also changing the return type of `GetIndexResponse#getMapping` to
`Map<String, MappingMetaData> getMappings()`, while it previously was returning another map
keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the 
server side response objects return.
2019-02-05 03:41:05 +01:00
Gordon Brown 292e0f6fb7
Deprecate `_type` in simulate pipeline requests (#37949)
As mapping types are being removed throughout Elasticsearch, the use of
`_type` in pipeline simulation requests is deprecated. Additionally, the
default `_type` used if one is not supplied has been changed to `_doc` for
consistency with the rest of Elasticsearch.
2019-02-04 16:11:44 -07:00
Christoph Büscher 0ced775389
Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357) 2019-02-04 22:30:34 +01:00
Mayya Sharipova 641704464d
Deprecate types in rollover index API (#38039)
Relates to #35190
2019-02-04 16:07:45 -05:00
Zachary Tong ab1150378b
Add Composite to AggregationBuilders (#38207) 2019-02-04 13:47:04 -05:00
David Turner 2c1eab2b8a
Clarify slow cluster-state log messages (#38302)
The message `... took [31s] above the warn threshold of 30s` suggests
incorrectly that the task took 61 seconds. This commit adds the clarifying
words `which is`.
2019-02-04 17:44:00 +00:00
Andrey Ershov 7bc8bc9605
ensureGreen (#38324) 2019-02-04 16:36:04 +01:00
Jason Tedor 625d37a26a
Introduce retention lease background sync (#38262)
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should periodically flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
2019-02-04 10:35:29 -05:00
Christoph Büscher 5ee7232379
Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334) 2019-02-04 16:10:06 +01:00
Christoph Büscher 715e581378
Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330) 2019-02-04 15:46:19 +01:00
Boaz Leskes e49b593c81
Move TokenService to seqno powered cas (#38311)
Relates #37872 
Relates #10708
2019-02-04 15:25:41 +01:00
Yannick Welsch ece8c659c5
Decrease leader and follower check timeout (#38298)
Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still
being a very long time for a node to be completely unresponsive.
2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka 9b64558efb
Migrating from joda to java.time. Watcher plugin (#35809)
part of the migrating joda time work. Migrating watcher plugin to use JDK's java-time

refers #27330
2019-02-04 15:08:31 +01:00
Alexander Reelsen 87f3579125
Add nanosecond field mapper (#37755)
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.

Note that aggregations will still be in milliseconds.
However docvalue fields will have full nanosecond resolution

Relates #27330
2019-02-04 11:31:16 +01:00
Christoph Büscher 15510da2af
Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304) 2019-02-04 10:41:35 +01:00
David Turner 1d82a6d9f9
Deprecate unused Zen1 settings (#38289)
Today the following settings in the `discovery.zen` namespace are still used:

- `discovery.zen.no_master_block`
- `discovery.zen.hosts_provider`
- `discovery.zen.ping.unicast.concurrent_connects`
- `discovery.zen.ping.unicast.hosts.resolve_timeout`
- `discovery.zen.ping.unicast.hosts`

This commit deprecates all other settings in this namespace so that they can be
removed in the next major version.
2019-02-04 08:52:08 +00:00
Armin Braun 4561f425db
Remove Redundandant Loop in SnapshotShardsService (#38283)
* This was a merge mistake on my end I think, obviously we only need to loop over the shards once not twice here to find those that we missed in INIT state
2019-02-04 09:06:39 +01:00
Alpar Torok d58e899d45
Remove empty service files (#38192) 2019-02-04 10:05:04 +02:00
Jason Tedor d2cc1459a3
Fix ordering problem in add or renew lease test (#38280)
We have to set the primary term before we add a retention lease,
otherwise we can not assert the correct primary term.
2019-02-03 12:54:31 -05:00
Christoph Büscher 6ca7a913ea
Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275) 2019-02-03 12:54:13 +01:00
Armin Braun 89d7c57bd9
Fix Incorrect Transport Response Handler Type (#38264)
* Fix Incorrect Transport Response Handler Type
* The response type here is not empty and was always wrong but this only became visible now that 0a604e3b24 was introduced
   * As a result of 0a604e3b24 we started actually handling the response
of this request and logging/handling exceptions before that we simply dropped the classcast exception here quietly using the empty response handler
* fix busy assert not handling `Exception`
* Closes #38226
* Closes #38256
2019-02-03 08:48:15 +01:00
Nhat Nguyen 0861dc3581
Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268)
Tracked at #38267
2019-02-02 20:02:21 -05:00
Christoph Büscher 50cdc61874
Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257) 2019-02-02 13:46:29 +01:00
David Turner c311062476
Add CoordinatorTests for empty unicast hosts list (#38209)
Today we have DiscoveryDisruptionIT tests for checking that discovery can still
work once the cluster has formed, even if the cluster is misconfigured and only
has a single master-eligible node in its unicast hosts list. In fact with Zen2
we can go one better: we do not need any nodes in the unicast hosts list,
because nodes also use the contents of the last-committed cluster state for
discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to
the overenthusiastic fault-detection timeouts.

This commit replaces these tests with deterministic `CoordinatorTests` that
verify the same behaviour. It also removes some duplication by extracting a
test method called `testFollowerCheckerAfterMasterReelection()`

Closes #37687
2019-02-02 07:54:56 +00:00
Nhat Nguyen 80d3092292
Fix primary term in testAddOrRenewRetentionLease (#38239)
We should increase primary term before renewing leases; otherwise, the
term of the latest RetentionLeases will be lower than the current term.

Relates #37951
2019-02-02 02:38:53 -05:00
Nhat Nguyen 1ec04dff43
FIx testReplicaIgnoresOlderRetentionLeasesVersion (#38246)
If the innerLength is 0, the version won't be increased; then there will
be two RetentionLeases with the same term and version, but their leases
are different.

Relates #37951
Closes #38245
2019-02-02 02:37:37 -05:00
Nhat Nguyen 8bee5b8e06
Mute testAddOrRenewRetentionLease (#38240)
Relates #38239
2019-02-01 21:27:10 -05:00
Boaz Leskes f6e06a2b19 Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231)
After backporting #37977, #37857 and #37872
2019-02-01 20:37:16 -05:00
Jason Tedor f181e17038
Introduce retention leases versioning (#37951)
Because concurrent sync requests from a primary to its replicas could be
in flight, it can be the case that an older retention leases collection
arrives and is processed on the replica after a newer retention leases
collection has arrived and been processed. Without a defense, in this
case the replica would overwrite the newer retention leases with the
older retention leases. This commit addresses this issue by introducing
a versioning scheme to retention leases. This versioning scheme is used
to resolve out-of-order processing on the replica. We persist this
version into Lucene and restore it on recovery. The encoding of
retention leases is starting to get a little ugly. We can consider
addressing this in a follow-up.
2019-02-01 17:19:19 -05:00
Nhat Nguyen 9c39dea7ae
AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227)
Tracked at #38226
2019-02-01 16:24:02 -05:00
Armin Braun 03a1d21070
SnapshotShardsService Simplifications (#38025)
* Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot
* Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)
2019-02-01 20:46:14 +01:00
Luca Cavanna ee57420de6
Adjust SearchRequest version checks (#38181)
The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.
2019-02-01 19:23:13 +01:00
Andrey Ershov 04dc41b99e
Zen2ify RareClusterStateIT (#38184)
In Zen 1 there are commit timeout and publish timeout and these
settings could be changed on-the-fly.

In Zen 2, there is only commit timeout and this setting is static.
RareClusterStateIT is actively using these settings and the fact, they
are dynamic.

This commit adds cancelCommitedPublication method to Coordinator to
be used by tests. This method will cancel current committed publication
if there is any.
When there is BlockClusterStateProcessing on the non-master node, the
publication will be accepted and committed, but not yet applied. So we
can use the method above to cancel it.

Also, this commit replaces callback + AtomicReference with ActionFuture,
which makes test code easier to read.
2019-02-01 18:18:11 +01:00
Yannick Welsch 025bf28405
Fix _host based require filters (#38173)
Using index.routing.allocation.require._host does not correctly work because the boolean logic in
filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when
opType ==OpType.AND
2019-02-01 16:02:37 +01:00
Tanguy Leroux da6269b456
RestoreService should update primary terms when restoring shards of existing indices (#38177)
When restoring shards of existing indices, the RestoreService also 
restores the values of primary terms stored in the snapshot index 
metadata. The primary terms are not updated and could potentially 
conflict with current index primary terms if the restored primary terms 
are lower than the existing ones.

This situation is likely to happen with replicated closed indices 
(because primary terms are increased when the index is transitioning 
from open to closed state, and the snapshotted primary terms are the
 one at the time the index was opened) (see #38024) and maybe also 
with CCR.

This commit changes the RestoreService so that it updates the primary 
terms using the maximum value between the snapshotted values and 
the existing values.

Related to #33888
2019-02-01 15:59:11 +01:00
Desmond Vehar c1c4abae10 Throw if two inner_hits have the same name (#37645)
This change throws an error if two inner_hits have the same name

Closes #37584
2019-02-01 15:53:50 +01:00
Alexander Reelsen 35ed137684
Ensure joda compatibility in custom date formats (#38171)
If custom date formats are used, there may be combinations that the new
performat DateFormatters.from() method has not covered yet. This adds a
few such corner cases and ensures the tests are correctly commented
out.
2019-02-01 15:42:56 +01:00
Jim Ferenczi 66e4fb4fb6
Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (#38169)
In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment
dd6043c1c0 (r252984782) added after the merge.
2019-02-01 15:32:19 +01:00
Nhat Nguyen 2e475d63f7
Do not set timeout for IndexRequests in GatewayIndexStateIT (#38147)
CI might not be fast enough to publish a dynamic mapping update within 100ms.
2019-02-01 09:30:03 -05:00
Andrey Ershov c1270e97b0
Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178)
In Zen2 cluster bootstrap is required and some parameters are 
called differently in Zen2.
2019-02-01 15:24:08 +01:00
Andrey Ershov bda591453c
Add elasticsearch-node detach-cluster command (#37979)
This commit adds the second part of `elasticsearch-node` tool -
`detach-cluster` command in addition to `unsafe-bootstrap` command.
Also, this commit changes the semantics of `unsafe-bootstrap`, now
`unsafe-bootstrap` changes clusterUUID.
So the algorithm of running `elasticsearch-node` tool is the following:
1) Stop all nodes in the cluster.
2) Pick master-eligible node with the highest (term, version) pair and
run the `unsafe-bootstrap` command on it. If there are no survived
master-eligible nodes - skip this step.
3) Run `detach-cluster` command on the remaining survived nodes.

Detach cluster makes the following changes to the node metadata:
1) Sets clusterUUID committed to false.
2) Sets currentTerm and term to 0. 
3) Removes voting tombstones and sets voting configurations to special
constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster
bootstrap.

`ElasticsearchNodeCommand` base abstract class is introduced, because
`UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in
common.
Also, this commit adds "ordinal" parameter to both commands, because it's 
impossible to write IT otherwise.
For MUST_JOIN_ELECTED_MASTER case special handling is introduced in
`ClusterFormationFailureHelper`.
Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed
from `UnsafeBootstrapMasterIT`).
2019-02-01 14:53:55 +01:00
Alexander Reelsen 979e5576e5
Add tests for fractional epoch parsing (#38162)
Fractional epoch parsing is supported, the tests we used were edge cases
that did not make sense. This adds tests to properly check for this.
2019-02-01 14:48:37 +01:00
Tanguy Leroux 029e4b6278
Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159)
The current CloseWhileRelocatingShardsIT test adds some "send behavior" 
rule to a target node's mocked transport service in order to detect when shard 
relocating are started. These rules are never cleared and prevent the test to 
complete normally after the rebalance is re-enabled again.

This commit changes the test so that rules are cleared and most verifications 
are done before the rebalance is reenabled again.

Closes #38090
2019-02-01 12:58:46 +01:00
Yannick Welsch ce469cfda5
Fix testCorruptedIndex (#38161)
Folks at the Lucene project do not seem to be interested in classifying corruptions and
distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525),
so we'll just cop out as well.

Closes #34322
2019-02-01 12:51:38 +01:00
Luca Cavanna e18cac3659
Add finalReduce flag to SearchRequest (#38104)
With #37000 we made sure that fnial reduction is automatically disabled
whenever a localClusterAlias is provided with a SearchRequest.

While working on #37838, we found a scenario where we do need to set a
localClusterAlias yet we would like to perform a final reduction in the
remote cluster: when searching on a single remote cluster.

Relates to #32125

This commit adds support for a separate finalReduce flag to
SearchRequest and makes use of it in TransportSearchAction in case we
are searching against a single remote cluster.

This also makes sure that num_reduce_phases is correct when searching
against a single remote cluster: it makes little sense to return
`num_reduce_phases` set to `2`, which looks especially weird in case
the search was performed against a single remote shard. We should
perform one reduction phase only in this case and `num_reduce_phases`
should reflect that.

* line length
2019-02-01 12:11:42 +01:00
Jim Ferenczi 6fa93ca493
Forbid negative field boosts in analyzed queries (#37930)
This change forbids negative field boost in the `query_string`, `simple_query_string`
and `multi_match` queries.
Negative boosts are not allowed in Lucene 8 (scores must be positive).
The backport of this change to 6x will turn the error into a deprecation warning
in order to raise the awareness of this breaking change in 7.0.

Closes #33309
2019-02-01 11:41:40 +01:00
Jim Ferenczi 57b1d245e8
Remove AtomiFieldData#getLegacyFieldValues (#38087)
This function is unused now that we format the docvalue fields with the default
formatter on the field (#30831)
2019-02-01 11:41:17 +01:00
Andrey Ershov bfd618cf83
Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038)
Currently, there are a few tests that use autoMinMasterNodes=false and
hence override addExtraClusterBootstrapSettings, mostly this is 10-30
lines of codes that are copy-pasted from class to class.

This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`
which is suitable for all classes and copy-paste could be removed.

Removing code is always a good thing!
2019-02-01 11:34:31 +01:00
Jim Ferenczi b7308aa03c
Don't load global ordinals with the `map` execution_hint (#37833)
The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality.

Closes #37705
2019-02-01 09:35:46 +01:00
David Turner 23f00e3676
Relax fault detector in some disruption tests (#38101)
Today we use `AbstractDisruptionTestCase` to test the behaviour of things like
master elections in the presence of cluster disruptions. These tests have
rather enthusiastic fault detection settings, detecting a fault if a single
ping fails, with a one-second timeout. Furthermore there are some tests that
assert the identity of the master remains unchanged during some disruption, and
these assertions fail rather often thanks to the overly sensitive fault
detector.

However in a number of these tests the fault detector need not be this
sensitive. This commit moves some such tests into their own test suite and uses
more sensible fault-detection settings to avoid the kind of master instability
that is causing CI failures.

Closes #37699
2019-02-01 08:10:49 +00:00
Alexander Reelsen c02cd3e2fd
Fix java time epoch date formatters (#37829)
The self written epoch date formatters were not properly able to format
an Instant to a string due to a misconfiguration.

This fix also removes a until now existing runtime behaviour under java
8 regarding the names of the aggregation buckets, which are now the same
as before and have been under java 11.
2019-02-01 09:03:48 +01:00
Yannick Welsch 859e2f5bc8 Adapt timeouts in UpdateMappingIntegrationIT
Relates to #37263 and possibly #36916
2019-02-01 08:58:31 +01:00
Adrien Grand d83c748417
Fix test bug in DynamicMappingsIT. (#37906)
Closes #37898
2019-02-01 08:35:29 +01:00
Przemyslaw Gomulka 2758578570
Trim the JSON source in indexing slow logs (#38081)
The '{' as a first character in log line is causing problems for beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting. 
Trimming the source part of a SlogLog solves that and keeps the logs readable.

closes #38080
2019-02-01 08:12:12 +01:00
Armin Braun 0a604e3b24
Fix Two Races that Lead to Stuck Snapshots (#37686)
* Fixes two broken spots:
    1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting
    2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state
* Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk
* Significantly extended test infrastructure to reproduce the above two issues
  * Two new test runs:
      1. Reproducing the effects of node disconnects/restarts in isolation
      2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes
* Relates #32265 
* Closes #32348
2019-02-01 05:45:40 +01:00
Nhat Nguyen b8b843476d
Disable dynamic mapping in testSimpleGetFieldMappingsWithDefaults (#38045)
Since #31140 we no longer require acking on the dynamic mapping of index
requests. Thus, a returned mapping from a get mapping request does not
necessarily contain the dynamic updates from the index request. This
commit replaces the dynamic mapping update with a manual put mapping.

Relates #31140
Closes #37928
2019-01-31 21:01:41 -05:00
Nhat Nguyen a8ebe2a217
Fix random params in testSoftDeletesRetentionLock (#38114)
Since #37992 the retainingSequenceNumber is initialized with 0 
while the global checkpoint can be -1.

Relates #37992
2019-01-31 20:50:41 -05:00
Lee Hinman c67a9663af
Fix MasterServiceTests.testClusterStateUpdateLogging (#38116)
This changes the test to not use a `CountDownlatch`, instead adding an assertion
for the final logging message and waiting until the `MockAppender` has seen it
before proceeding.

Related to df2c06f6f30f7e23a6863a3f72fc3bdb7648885c
Resolves #23739
2019-01-31 17:13:19 -07:00
Yuri Astrakhan f3cde06a1d
geotile_grid implementation (#37842)
Implements `geotile_grid` aggregation

This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240

This code uses the same base classes as `geohash_grid` agg, but uses a different hashing
algorithm to allow zoom consistency.  Each grid bucket is aligned to Web Mercator tiles.
2019-01-31 19:11:30 -05:00
Pascal Christoph a3d9ba3f4b Log document id when MapperParsingException occurs (#37800)
Closes #37658
2019-01-31 16:33:13 -05:00
Nhat Nguyen 237fcda2cc
Disable dynamic mapping update in testTransportBulkTasks (#38073)
If a replica does not have a right mapping yet, we will retry the index
request on that replica; then the actual tasks is higher than the
expected tasks. Since #31140 this happens more frequently for we no
longer require acking on the dynamic mapping of index requests.

Relates #31140
Closes #37893
2019-01-31 13:16:52 -05:00
Przemyslaw Gomulka 28b5c7ce78
Do not set up NodeAndClusterIdStateListener in test (#38110)
When extending ESIntegTestCase are run on the same jvm, the static field in
NodeAndClusterIdConverter will throw an AlreadySet exceptions.
overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening
relates #32850
2019-01-31 18:59:40 +01:00
Nhat Nguyen 8e95780f98
Soft-deletes policy should always fetch latest leases (#37940)
If a new retention lease is added while a primary's soft-deletes policy
is locked for peer-recovery, that lease won't be baked into the Lucene
commit.

Relates #37165
Relates #37375
2019-01-31 12:02:57 -05:00
Henning Andersen 68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
David Turner 7f738e8541
Minor logging improvements (#38084)
Fixes some log messages that caused some minor confusion when digging through a
log generated by a failing test.
2019-01-31 16:41:04 +00:00
Tal Levy 9923f0fe6a
fix a few versionAdded values in ElasticsearchExceptions (#37877)
TooManyBucketsException was introduced in v6.2
and SnapshotInProgressException was introduced in v6.7
2019-01-31 08:28:20 -08:00
Tanguy Leroux 7a597cad0d
Reenable BWC tests after backport of #37899 (#38093)
This commit adapts the version used in StartedShardEntry serialization
 after the backport of  #37899 and reenables bwc tests.

Related to #37899
Related to #38074
2019-01-31 16:53:28 +01:00
Henning Andersen 7487be3d3c Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock 2019-01-31 15:31:01 +01:00
Jason Tedor a9b12b38f0
Push primary term to replication tracker (#38044)
This commit pushes the primary term into the replication tracker. This
is a precursor to using the primary term to resolving ordering problems
for retention leases. Namely, it can be that out-of-order retention
lease sync requests arrive on a replica. To resolve this, we need a
tuple of (primary term, version). For this to be, the primary term needs
to be accessible in the replication tracker. As the primary term is part
of the replication group anyway, this change conceptually makes sense.
2019-01-31 09:19:49 -05:00
Luca Cavanna 622fb7883b
Introduce ability to minimize round-trips in CCS (#37828)
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.

This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour.

Relates to #32125
2019-01-31 15:12:14 +01:00
Armin Braun ae9f4df361
Don't Assert Ack on when Publish Timeout is 0 in Test (#38077)
* Publish timeout is set to `0` so out of order processing of states on the node can lead to a `false` ack response
  * See #30672
* Closes #36813
2019-01-31 14:35:11 +01:00
Alexander Reelsen 9f026bb8ad
Reduce object creation in Rounding class (#38061)
This reduces objects creations in the rounding class (used by aggs) by properly
creating the objects only once. Furthermore a few unneeded ZonedDateTime objects
were created in order to create other objects out of them. This was
changed as well.

Running the benchmarks shows a much faster performance for all of the
java time based Rounding classes.
2019-01-31 14:18:28 +01:00
Adrien Grand a536fa7755
Treat put-mapping calls with `_doc` as a top-level key as typed calls. (#38032)
Currently the put-mapping API assumes that because the type name is `_doc` then
it is dealing with a typeless put-mapping call. Yet we still allow running the
put-mapping API in a typed fashion with `_doc` as a type name. The current logic
triggers surprising errors when doing a typed put-mapping call with `_doc` as a
type name on an index that has a type already.

This is a bit of a corner-case, but is more important on 6.x due to the fact
that using the index API with `_doc` as a type name triggers typed calls to the
put-mapping API with `_doc` as a type name.
2019-01-31 13:57:42 +01:00