41433 Commits

Author SHA1 Message Date
David Turner
8b9fa55c93
Add storage-layer disruptions to CoordinatorTests (#34347)
Today we assume the storage layer operates perfectly in CoordinatorTests, which
means we are not testing that the system's invariants are preserved if the
storage layer fails for some reason. This change injects (rare) storage-layer
failures during the safety phase to cover these cases.
2018-10-13 14:24:15 +01:00
David Turner
d98199df14
Extend duration of fixLag() (#34364)
Today, fixLag() waits for a new cluster state to be committed. However, it does
not account for the fact that a term bump may occur, requiring a new election
to take place after the cluster state is committed. This change fixes this.
2018-10-11 23:24:08 +01:00
David Turner
a32e303b0c
Account for election duration (#34362)
Today we may schedule two elections very close together, which can cause the
first election to fail even if there are no other nodes. This change adds a
delay in between subsequent elections on the same node, effectively allowing
time for each election to complete before scheduling the next one.
2018-10-11 15:31:08 +01:00
David Turner
52a3a19551
Add low-level bootstrap implementation (#34345)
Today we inject the initial configuration of the cluster (i.e. the set of
voting nodes) at startup. In reality we must support injecting the initial
configuration after startup too. This commit adds low-level support for doing
so as safely as possible.
2018-10-08 15:56:48 +01:00
David Turner
ac99d1d66d
Fix bugs in fixLag() (#34346)
The hack to work around lag detection had some issues:
- it always called runFor(), even if no lag was detected
- it looked at the last-accepted state not the last-applied state, so missed
  some lag situations.

This fixes these issues.
2018-10-08 11:33:25 +01:00
David Turner
03da4f6c51
Gather votes from all nodes (#34335)
Today we accept that some nodes may vote for the wrong master in an election.
This is mostly fine because they do end up joining the correct master in the
end, but the lack of a vote from every follower may prevent a future desirable
reconfiguration from taking place.

The solution is to hold another election in a yet-higher term in order to
collect a complete set of votes. Elections are somewhat disruptive so we should
think carefully about when this election should take place. One option is to
wait as late as possible (on the grounds that it might not ever be necessary).
This unfortunately makes it harder to predict how an
apparently-smoothly-running cluster will react to nodes leaving and joining.
Instead we prefer to perform the election as soon as possible in the leader's
term, adding "votes from all followers" to the invariants that we expect to
hold in a stable cluster. The start of a leader's term is already a somewhat
disrupted time for the cluster, so performing another election at this point
does not materially change the cluster's behaviour.

This change implements the logic needed to trigger a new election in order to
satisfy this extra stabilisation condition.
2018-10-06 07:22:04 +01:00
David Turner
29d7d1d503
Minor housekeeping of tests (#34315)
From experience with #34257, here are a few things that help with analysing
logs from test runs. Also we prevent trying to stabilise a cluster with raised
delay variability, because lowering the delay variability requires time to
allow all the extra-varied-scheduled tasks to work their way out of the system.
2018-10-05 07:57:03 +01:00
Yannick Welsch
b32abcbd00
Zen2: Add Cluster State Applier (#34257)
Adds the cluster state applier to Coordinator, and adds tests for cluster state acking.
2018-10-04 20:33:28 +02:00
David Turner
c6b0f08472
Add safety phase to CoordinatorTests (#34241)
Today's CoordinatorTests have a limited amount of randomisation in how things
are scheduled. However, to be fully confident in Zen2's liveness we require the
system to stabilise after any permitted sequence of events. We can achieve
this by running the system in a much more random fashion for a while, with much
larger variation in when things are scheduled (simulating GC pressure and
network disruption) and then continuing to assert that the system stabilises as
we expect. When running randomly, we do not expect to make significant progress
and merely verify that no safety property is violated.

This change introduces the runRandomly() test method which implements this
idea. It also fixes a handful of liveness bugs that this first version of
runRandomly() exposed.
2018-10-04 07:40:26 +01:00
David Turner
cbe1cf98c6 Merge branch 'master' into zen2 2018-10-03 22:12:56 +01:00
Jim Ferenczi
ee21067a41
Add early termination support for min/max aggregations (#33375)
This commit adds the support to early terminate the collection of a leaf
in the min/max aggregator. If the query matches all documents the min and max value
for a numeric field can be retrieved efficiently in the points reader.
This change applies this optimization when possible.
2018-10-03 18:33:39 +02:00
Mayya Sharipova
8f10c771e6 Add migration info for missing values in script
Relates to #30975
2018-10-03 11:56:18 -04:00
Jay Modi
3c1fdc9fc0
Security: reduce memory usage of DnRoleMapper (#34250)
The `DnRoleMapper` class is used to map distinguished names of groups
and users to role names. This mapper builds an internal map that
maps from a `com.unboundid.ldap.sdk.DN` to a `Set<String>`. In cases
where a lot of distinct DNs are mapped to roles, this can consume quite
a bit of memory. The majority of the memory is consumed by the DN
object. For example, a 94 character DN that has 9 relative DNs (RDN)
will retain 4KB of memory, whereas the String itself consumes less than
250 bytes.

In order to reduce memory usage, we can map from a normalized DN string
to a List of roles. The normalized string is actually how the DN class
determines equality with another DN and we can drop the overhead of
needing to keep all of the other objects in memory. Additionally the
use of a List provides memory savings as each HashSet is backed by a
HashMap, which consumes a great deal more memory than an appropriately
sized ArrayList. The uniqueness we get from a Set is maintained by
first building a set when parsing the file and then converting to a
list upon completion.
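
The approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `DnRoleMapper` code: the class name `DnRoleIndex`, the `build` method, and the `normalize` helper (a plain trim and lower-case standing in for real DN normalization) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and normalization rule are assumptions.
final class DnRoleIndex {

    // Stand-in for real DN normalization (assumption): trim and lower-case.
    static String normalize(String dn) {
        return dn.trim().toLowerCase(Locale.ROOT);
    }

    // Input: role name -> DNs mapped to that role (as parsed from a mapping file).
    // Output: normalized DN string -> list of distinct role names.
    static Map<String, List<String>> build(Map<String, List<String>> roleToDns) {
        // While parsing, use a Set per DN so duplicate roles are dropped.
        Map<String, Set<String>> parsing = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : roleToDns.entrySet()) {
            for (String dn : entry.getValue()) {
                parsing.computeIfAbsent(normalize(dn), k -> new LinkedHashSet<>())
                       .add(entry.getKey());
            }
        }
        // On completion, convert each Set to an exactly-sized ArrayList,
        // which carries less overhead than a HashMap-backed HashSet.
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : parsing.entrySet()) {
            result.put(entry.getKey(),
                       Collections.unmodifiableList(new ArrayList<>(entry.getValue())));
        }
        return result;
    }
}
```

Keying the map by the normalized string keeps lookups equivalent to DN equality only as far as the normalization rule matches the one the DN class uses, which is why the helper here is labeled a stand-in.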

Closes #34237
2018-10-03 09:30:57 -06:00
albendz
f09190c14d Require combine and reduce scripts in scripted metrics aggregation (#33452)
* Make text message not required in constructor for slack

* Remove unnecessary comments in test file

* Throw exception when reduce or combine is not provided; update tests

* Update integration tests for scripted metrics to always include reduce and combine

* Remove some old changes from previous branches

* Rearrange script presence checks to be earlier in build

* Change null check order in script builder for aggregated metrics; correct test scripts in IT

* Add breaking change details to PR
2018-10-03 15:22:01 +01:00
Jason Tedor
9d36cbaf16
Set BWC builds for 6.x to use JDK 11
The BWC builds for the 6.x branch should be using JDK 11. This commit
fixes the BWC builds to specify that they use JDK 11 instead of JDK 10
which is now incompatible with the 6.x build.
2018-10-03 09:52:21 -04:00
Jason Tedor
86642d29e5
Require JDK 11 for compilation (#34103)
Now that JDK 11 is GA, we should switch our 6.x and master branches to
the JDK 11 compiler. This commit makes this change, as well as removes
JDK 10 from the CI configuration.
2018-10-03 08:59:00 -04:00
Jim Ferenczi
41528c0813 Adapt bwc version after backport (bis)
Relates #34225
2018-10-03 14:24:01 +02:00
Jim Ferenczi
1aa8e72be7 Adapt bwc version after backport
Relates #34225
2018-10-03 12:24:07 +02:00
Jim Ferenczi
5a3e031831
Preserve the order of nested documents in the Lucene index (#34225)
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parent documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed, so this change ensures that parent
documents appear after their children without modifying the initial order
in each nested level. This allows matching children in the order of their
appearance in the source document, which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backward compatibility.
2018-10-03 11:55:30 +02:00
Vladimir Dolzhenko
a7f62ee902
[GCE Discovery] Automatically set project-id and zone (#33721)
Fetch default values for project-id and zone from metadata server

Closes #13618
2018-10-03 11:37:36 +02:00
Julie Tibshirani
c6fcb60071
Add support for 'ack watch' to the HLRC. (#33962) 2018-10-03 02:03:03 -07:00
Colin Goodheart-Smithe
2d64e3db9a
Adds trace logging to IndicesRequestCache (#34180)
* Adds trace logging to IndicesRequestCache

This change adds trace level logging to `IndicesRequestCache` with the
primary aim of helping to identify the cause of the failures in
https://github.com/elastic/elasticsearch/issues/32827. The cache will
log at trace level when a cache hit or miss occurs including the reader
version and the cache key. Note that this change adds a
`cacheKeyRenderer` which supplies a human readable String of the cache
key since the actual cache key itself is a `BytesReference` containing
the wire protocol serialised form of the request.

Logging is also added for the case where a search timeout occurs and for
that reason the cache entry is invalidated.

* Adds comment to remind us to remove cacheKeyRenderer
2018-10-03 08:58:33 +01:00
David Turner
a9eae1d068 Merge branch 'master' into zen2 2018-10-03 08:36:34 +01:00
Nhat Nguyen
d7893fd1e4 TEST: Mute testFollowIndexAndCloseNode
Tracked at #33337
2018-10-02 17:20:31 -04:00
Christoph Büscher
6fb9c63ed6 [Docs] Fix broken link for HLRC rethrottle 2018-10-02 23:14:37 +02:00
Wilder Pereira
5af6ae564d Change "REST Verb" to "HTTP Verb" (#34195) 2018-10-02 17:09:54 -04:00
Christoph Büscher
a1c441f78a
HLRC: Add throttling for update & delete-by-query (#33951)
This change adds throttling to the update-by-query and delete-by-query cases
similar to throttling for reindex. This mostly means additional methods on the
client class itself, since the request hits the same RestHandler, just with
slightly different endpoints, and also the return values are similar.
2018-10-02 21:44:15 +02:00
Gordon Brown
dd3fe92673
[DOCS] Note that User Cluster Metadata is not private (#34156)
As user-defined cluster metadata is accessible to anyone with access to
get the cluster settings, stored in the logs, and likely to be tracked
by monitoring solutions, it is useful to clarify in the documentation
that it should not be used to store secret information.
2018-10-02 13:36:13 -06:00
Dimitrios Liappis
f12e0a8398
Add ES version 6.4.3 (#34239)
Version bump
2018-10-02 21:15:58 +03:00
David Turner
a7ce4b31ed
Fix logging of cluster state update descriptions (#34182)
In #28941 we changed the computation of cluster state task descriptions but
this introduced a bug in which we only log the empty descriptions (rather than
the non-empty ones). This change fixes that.
2018-10-02 19:08:19 +01:00
Nik Everett
d3a4fe9a8e
Docs: Wrap expert script example to fit in docs (#34201)
This slightly reworks the expert script plugin example so it fits on the
page when the docs are rendered. The box in which it is rendered is not
very wide so it took a bit of twisting to make it readable.
2018-10-02 12:25:58 -04:00
Jason Tedor
5140f992b4
Fix use of hostname in Windows service (#34193)
To pass the HOSTNAME environment variable to the Windows service, we
have to add some command line flags to the service invocation. Namely,
we have to specify that we are passing the HOSTNAME variable, and we will
pass for it the value of %%COMPUTERNAME%%. This ensures that if the
hostname is changed, we pick this up the next time that the service is
started. This change is needed for the service now that we use the
HOSTNAME as the default node name.
2018-10-02 12:10:43 -04:00
Jay Modi
2e5945a5e9
HLRC: PutUserRequest should not be closeable (#34196)
The PutUserRequest implemented closeable as it assumed ownership of the
password provided to the class. This change removes the ownership of
the password, documents it in the javadoc, and removes the closeable
implementation.

Additionally, the intermediate bytes used for writing the password to
XContent are now cleared. This makes the PutUserRequest consistent with
the behavior discussed in #33509.
2018-10-02 10:10:32 -06:00
jaymode
306e178d83
Test: remove awaitsfix incorrectly added in #34148 2018-10-02 10:02:20 -06:00
Martijn van Groningen
7f5c2f1050
[CCR] Validate follower index historyUUIDs (#34078)
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid.

No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from the follower index's custom metadata. The resume follow api will continue to fail if leader index shard history UUIDs are missing.

Closes #33956
2018-10-02 18:01:06 +02:00
Jay Modi
8539fb68d9
Test: Revert pinning MockWebServer to TLSv1.2 (#34148)
Revert "[TESTS] Pin MockWebServer to TLS1.2 (#33127)" (commit
214652d4af8188d4ba872626eeea3bcdff7096f0) and "Pin TLS1.2 in
SSLConfigurationReloaderTests" (commit
d9f5e4fd2e06c9b69f3b4744e49e747e1ff708b4), which pinned the
MockWebServer used in the SSLConfigurationReloaderTests to TLSv1.2 in
order to prevent failures with JDK 11 related to ssl session
invalidation. We no longer need this pinning as the problematic code
was fixed in #34130.
2018-10-02 09:54:21 -06:00
Andriy
6b714c9e1e [Docs] Updated link to kafka-elasticsearch-consumer project (#34234) 2018-10-02 17:46:38 +02:00
Lisa Cawley
a4cf4ca585
[DOCS] Clarifies examples in reindex and task APIs (#33143) 2018-10-02 08:37:45 -07:00
Serge Populov
13af5d5d7f Docs: Fix typo in field name in aggregations (#34223) 2018-10-02 10:54:29 -04:00
Christoph Büscher
5183ea3d68
Use OptionalInt instead of Optional<Integer> (#34220)
Optionals containing boxed primitive types are prohibitively costly because they
have two levels of boxing. For Optional<Integer> the analogous OptionalInt can be
used to avoid the boxing of the contained int value.
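
As a small illustration of the difference, the sketch below returns an `OptionalInt` where an `Optional<Integer>` might otherwise have been used. The class `FirstNegative` and its method are hypothetical and are not taken from the change itself.

```java
import java.util.OptionalInt;

// Hypothetical example; not code from the commit.
final class FirstNegative {

    // Returns the index of the first negative value, or an empty OptionalInt.
    // The int is stored directly in the OptionalInt, so no Integer is allocated.
    static OptionalInt indexOfFirstNegative(int[] values) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }
}
```

Callers read the value with `getAsInt()` or `orElse(int)` rather than `get()`, so the result stays a primitive end to end.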
2018-10-02 15:58:07 +02:00
Marios Trivyzas
2ba18f50a8
SQL: Remove more ANTLR4 grammar ambiguities (#34074)
The `-` and `+` as a number literal prefix are already
parsed by the rule in `valueExpression`. To accommodate
this, there are some code changes that enable the
`ExpressionBuilder` to parse Literal integers and decimals
together with the `-/+` prefix sign (if it exists) and validate
them (wrong format, large numbers, etc.).

Follows: #33854
2018-10-02 15:26:04 +02:00
Benjamin Trent
10201e06cb
Allowing {index}/_xpack/rollup/data to accept comma delimited list (#34115)
* Allowing `{index}/_xpack/rollup/data` to accept comma delimited list

* Address PR comments
2018-10-02 06:21:46 -07:00
Nik Everett
f904c41506
HLRC: Add get rollup job (#33921)
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implement `ToXContent`.
2018-10-02 09:11:29 -04:00
Nhat Nguyen
7dbc403226
TEST: Index diff num docs in rolling upgrade tests (#34191)
Today we index the same number of documents (50 documents) in each round
of the rolling upgrade tests. If the actual count does not match, we can
not guess the problematic round.

Relates #27650
2018-10-02 09:04:03 -04:00
Jim Ferenczi
ead6ffce54
Fix cross fields mode of the query_string query (#34216)
This change fixes a bug in the cross fields mode of the `query_string`
query. The multi fields query builder must be reset before parsing
in order to clear the list of expanded fields coming from the previous text block.

Closes #34215
2018-10-02 14:53:26 +02:00
Przemyslaw Gomulka
3f8cc89c9f
Completion types with multi-fields support (#34081)
Mappings with completion type and multi-fields were not able to index array or
object format on completion fields. Only string format was supported.
This is fixed by providing the multiField parser with an externalValueContext containing the already parsed object.

closes #15115
2018-10-02 14:32:56 +02:00
Alexander Reelsen
b1b0f3276b
Core: Add methods to get locale/timezone in DateFormatter (#34113)
This adds some methods to the `DateFormatter` interface, namely

* `withLocale()` to change the locale of a date formatter
* `getLocale()`
* `getZone()`
* `hashCode()`
* `equals()`

These methods will be needed for aggregations and mapping changes, where
zones and locales can be specified in the mapping or in search/aggs
parts of a search request.
2018-10-02 14:13:30 +02:00
Ioannis Kakavas
1d049cadbe Fix HLRC docs 2018-10-02 13:23:44 +03:00
David Turner
a127805b4a
[Zen2] Simulate scheduling delays (#34181)
Today we schedule tasks (both immediate and future ones) exactly when
requested. In fact it is more realistic to allow for a small amount of delay in
the scheduling of tasks, and this helps to exercise more interleavings of
actions and therefore to improve test coverage.

This change adds to the DeterministicTaskQueue the ability to add a random
delay to the scheduling of tasks.
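
The idea can be sketched as below. This is a simplified, hypothetical stand-in and not the real `DeterministicTaskQueue`: the class `DelayedTaskQueueSketch`, its constructor parameters, and its method names are assumptions. A seeded `Random` keeps each run reproducible while still varying when tasks execute.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical sketch; not the actual DeterministicTaskQueue implementation.
final class DelayedTaskQueueSketch {

    private static final class Scheduled {
        final long time;     // simulated execution time in milliseconds
        final long sequence; // tie-breaker so ordering is deterministic
        final Runnable task;

        Scheduled(long time, long sequence, Runnable task) {
            this.time = time;
            this.sequence = sequence;
            this.task = task;
        }
    }

    private final PriorityQueue<Scheduled> queue = new PriorityQueue<>(
        Comparator.comparingLong((Scheduled s) -> s.time).thenComparingLong(s -> s.sequence));
    private final Random random;
    private final int maxExtraDelayMillis;
    private long now;
    private long nextSequence;

    DelayedTaskQueueSketch(long seed, int maxExtraDelayMillis) {
        this.random = new Random(seed); // same seed, same schedule
        this.maxExtraDelayMillis = maxExtraDelayMillis;
    }

    // Schedules a task at the requested time plus a random delay in [0, max].
    void scheduleAt(long requestedTimeMillis, Runnable task) {
        long extra = random.nextInt(maxExtraDelayMillis + 1);
        long time = Math.max(now, requestedTimeMillis) + extra;
        queue.add(new Scheduled(time, nextSequence++, task));
    }

    // "Immediate" tasks are also subject to the random delay.
    void scheduleNow(Runnable task) {
        scheduleAt(now, task);
    }

    // Runs tasks in simulated-time order, advancing the clock as it goes.
    void runAll() {
        while (!queue.isEmpty()) {
            Scheduled next = queue.poll();
            now = next.time;
            next.task.run();
        }
    }

    long currentTimeMillis() {
        return now;
    }
}
```

With a maximum extra delay of zero the queue degenerates to running tasks exactly when requested, which corresponds to the behaviour described before this change.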

This change also provides more explicit timeouts for stabilisation in the
CoordinatorTests.

Using the randomised scheduling feature in the CoordinatorTests also found a
situation in which we could become a leader, then a candidate, and then a
leader again very quickly, causing a clash of the _BECOME_MASTER_ and
_FINISH_ELECTION_ tasks. We change their behaviour to not consider these
duplicates to be problematic.
2018-10-02 11:22:05 +01:00
Shaunak Kashyap
3eed873dde
Updating test assertion (#34040) 2018-10-02 03:19:12 -07:00