1661 Commits

Author SHA1 Message Date
David Turner
bfd24fc030
[Zen2] Reconfigure cluster as its membership changes (#34592)
As master-eligible nodes join or leave the cluster we should give them votes or
take them away, in order to maintain the optimal level of fault-tolerance in
the system. #33924 introduced the `Reconfigurator` to calculate the optimal
configuration of the cluster, and in this change we add the plumbing needed to
actually perform the reconfigurations needed as the cluster grows or shrinks.
2018-10-19 19:24:54 +01:00
David Turner
3de266e3cf Merge branch 'master' into zen2 2018-10-19 14:30:07 +01:00
Daniel Mitterdorfer
dbb6fe58fa
Remove hand-coded XContent duplicate checks
With this commit we cleanup hand-coded duplicate checks in XContent
parsing. They were necessary previously but since we reconfigured the
underlying parser in #22073 and #22225, these checks are obsolete and
were also ineffective unless an undocumented system property has been
set. As we also remove this escape hatch, we can remove the additional
checks as well.

Closes #22253
Relates #34588
2018-10-19 10:13:13 +02:00
Nhat Nguyen
eb36f10394
TEST: Capture replication targets when replication group ready (#34407)
Today, WriteReplicationAction uses a set of replication targets directly
from the primary shard of ReplicationGroup. It should be fine except
when we add/remove or promote a shard while a write action is executing.
We have encountered these two issues:

1. Replicas are not found in the replication targets. This happens
because we remove replicas but the WriteReplicationAction still uses the
old replication targets which include the removed replicas.

2. Access ReplicationGroup from a primary shard which hasn't activated
the primary-mode yet. This is because we won't activate the primary-mode
for a promoting shard after bumping the primary term which is executed
asynchronously.

This commit captures the replication targets when the replication group
is ready and continue using those targets until we re-compute the new
targets after the group is changed.

Closes #33457
2018-10-17 17:37:52 -04:00
Armin Braun
08d4bf6e84
TESTS: Remove Dead Code in Test Infra. (#34548)
* None of this infrastructure is used
* Some redundant throws and resulting catch code removed
2018-10-17 20:08:39 +01:00
Nik Everett
139bbc3f03
Rollup: Consolidate rollup cleanup for http tests (#34342)
This moves the rollup cleanup code for http tests from the high level rest
client into the test framework and then entirely removes the rollup cleanup
code for http tests that lived in x-pack. This is nice because it
consolidates the cleanup into one spot, automatically invokes the cleanup
without the test having to know that it is "about rollup", and should allow
us to run the rollup docs tests.

Part of #34530
2018-10-17 09:32:16 -04:00
Andrey Ershov
93bb24e1f8 Merge branch 'master' into zen2 2018-10-17 14:37:53 +02:00
Armin Braun
3954d041a0
SCRIPTING: Move sort Context to its Own Class (#33717)
* SCRIPTING: Move sort Context to its own Class
2018-10-17 10:02:44 +01:00
Armin Braun
ea576a8ca2
Disc: Move AbstractDisruptionTC to filebased D. (#34461)
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
2018-10-16 15:28:40 +01:00
David Turner
950ca3adda Merge branch 'master' into zen2 2018-10-16 14:41:14 +01:00
Simon Willnauer
d43a1fac33
Lock down Engine.Searcher (#34363)
`Engine.Searcher` is non-final today which makes it error prone
in the case of wrapping the underlying reader or lucene `IndexSearcher`
like we do in `IndexSearcherWrapper`. Yet, there is no subclass of it yet
that would be dramatic to just drop on the floor. With the start of development
of frozen indices this changed since in #34357 functionality was added to
a subclass which would be dropped if a `IndexSearcherWrapper` is installed on an index.
This change locks down the `Engine.Searcher` to prevent such a functionality trap.
2018-10-16 14:53:07 +02:00
Martijn van Groningen
a1ec91395c
Changed CCR internal integration tests to use a leader and follower cluster instead of a single cluster (#34344)
The `AutoFollowTests` needs to restart the clusters between each tests, because
it is using auto follow stats in assertions. Auto follow stats are only reset
by stopping the elected master node.

Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api.

* Renamed AutoFollowTests to AutoFollowIT, because it is an integration test.
Renamed ShardChangesIT to IndexFollowingIT, because shard changes it the name
of an internal api and isn't a good name for an integration test.

* move creation of NodeConfigurationSource to a seperate method

* Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.
2018-10-16 14:45:46 +02:00
Jim Ferenczi
544de13d8e
Disallow negative query boost (#34486)
This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so
it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts
in 6x in order to ensure that users are aware when they'll upgrade to ES 7.

Relates #33309
2018-10-16 11:31:53 +01:00
Armin Braun
ebca27371c
SCRIPTING: Move Aggregation Script Context to its own class (#33820)
* SCRIPTING: Move Aggregation Script Context to its own class
2018-10-15 17:28:05 +01:00
Andrey Ershov
e3a1981a57 Mute testToQuery test 2018-10-15 14:08:04 +02:00
Yannick Welsch
5fbead00a3
Zen2: Add infrastructure for integration tests (#34365)
Adds the infrastructure to run integration tests against Zen2.
2018-10-14 20:55:04 +01:00
Nhat Nguyen
33791ac27c
CCR: Following primary should process operations once (#34288)
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
2018-10-10 15:39:57 -04:00
Nik Everett
06993e0c35
Logging: Make ESLoggerFactory package private (#34199)
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
2018-10-06 09:54:08 -04:00
David Turner
c6b0f08472
Add safety phase to CoordinatorTests (#34241)
Today's CoordinatorTests have a limited amount of randomisation in how things
are scheduled. However, to be fully confident in Zen2's liveness we require the
system to stabilise after any permitted sequence of events. We can achieve
this by running the system in a much more random fashion for a while, with much
larger variation in when things are scheduled (simulating GC pressure and
network disruption) and then continuing to assert that the system stabilises as
we expect. When running randomly, we do not expect to make significant progress
and merely verify that no safety property is violated.

This change introduces the runRandomly() test method which implements this
idea. It also fixes a handful of liveness bugs that this first version of
runRandomly() exposed.
2018-10-04 07:40:26 +01:00
Kazuhiro Sera
d45fe43a68 Fix a variety of typos and misspelled words (#32792) 2018-10-03 18:11:38 +01:00
David Turner
a9eae1d068 Merge branch 'master' into zen2 2018-10-03 08:36:34 +01:00
Nik Everett
f904c41506
HLRC: Add get rollup job (#33921)
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
2018-10-02 09:11:29 -04:00
David Turner
a127805b4a
[Zen2] Simulate scheduling delays (#34181)
Today we schedule tasks (both immediate and future ones) exactly when
requested. In fact it is more realistic to allow for a small amount of delay in
the scheduling of tasks, and this helps to exercise more interleavings of
actions and therefore to improve test coverage.

This change adds to the DeterministicTaskQueue the ability to add a random
delay to the scheduling of tasks.

This change also provides more explicit timeouts for stabilisation in the
CoordinatorTests.

Using the randomised scheduling feature in the CoordinatorTests also found a
situation in which we could become a leader, then a candidate, and then a
leader again very quickly, causing a clash of the _BECOME_MASTER_ and
_FINISH_ELECTION_ tasks. We change their behaviour to not consider these
duplicates to be problematic.
2018-10-02 11:22:05 +01:00
Nhat Nguyen
ad61398879
CCR: Optimize indexing ops using seq_no on followers (#34099)
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.

Relates #33656
2018-09-28 20:42:26 -04:00
Ryan Ernst
47cbae9b26
Scripting: Remove ExecutableScript (#34154)
This commit removes the legacy ExecutableScript, which was no longer
used except in tests. All uses have previously been converted to script
contexts.
2018-09-28 17:13:08 -07:00
Hendrik Muhs
e2f310b56c
Fix AggregationFactories.Builder equality and hash regarding order (#34005)
Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.

This also addresses a potential issue in caching when the same query contains aggregations but in 
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in 
the cache.
2018-09-28 13:30:50 +02:00
Alan Woodward
f243d75f59
Remove special-casing of Synonym filters in AnalysisRegistry (#34034)
The synonym filters no longer need access to the AnalysisRegistry in their
constructors, so we can remove the special-case code and move them to the
common analysis module.

This commit means that synonyms are no longer available for `server` integration tests,
so several of these are either rewritten or migrated to the common analysis module
as rest-spec-api tests
2018-09-28 09:02:47 +01:00
Ryan Ernst
a2c941806b
Tests: Add support for custom contexts to mock scripts (#34100)
This commit adds the ability to plug in compilation of custom contexts
in mock script engine. This is needed for testing plugins which add
custom contexts like watcher.
2018-09-27 12:23:59 -07:00
Jim Ferenczi
269ae0bc15
Handle MatchNoDocsQuery in span query wrappers (#34106)
* Handle MatchNoDocsQuery in span query wrappers

This change adds a new SpanMatchNoDocsQuery query that replaces
MatchNoDocsQuery in the span query wrappers.
The `wildcard` query now returns MatchNoDocsQuery if the target field is not
in the mapping (#34093) so we need the equivalent span query in order to
be able to pass it to other span wrappers.

Closes #34105
2018-09-27 14:19:08 +02:00
Simon Willnauer
bda7bc145b
Fold EngineSearcher into Engine.Searcher (#34082)
EngineSearcher can be easily folded into Engine.Searcher which removes
a level of inheritance that is necessary for most of it's subclasses.
This change folds it into Engine.Searcher and removes the dependency on
ReferenceManager.
2018-09-27 09:06:04 +02:00
Yogesh Gaikwad
0301062c6e Mute SpanMultiTermQueryBuilderTests#testToQuery 2018-09-27 15:26:06 +10:00
Nik Everett
ddce9704d4
Logging: Drop two deprecated methods (#34055)
This drops two deprecated methods from `ESLoggerFactory`, switching all
calls to those methods to calls to methods of the same name on
`LogManager`.
2018-09-26 11:20:52 -04:00
Adrien Grand
3c2841d493
REST test for typeless APIs. (#33934)
This commit duplicates REST tests for the
 - `indices.create`
 - `indices.put_mapping`
 - `indices.get_mapping`
 - `index`
 - `get`
 - `delete`
 - `update`
 - `bulk`
APIs, so that we both test them when used without types (include_type_name=false)
and with types, mostly for mixed-version cluster tests.

Given a suite called `X_test_name.yml`, I first copied it to
`(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set
`include_type_name=false` on every API that supports it.

Relates #15613
2018-09-26 17:11:37 +02:00
Ryan Ernst
7800b4fa91
Core: Abstract DateMathParser in an interface (#33905)
This commits creates a DateMathParser interface, which is already
implemented for both joda and java time. While currently the java time
DateMathParser is not used, this change will allow a followup which will
create a DateMathParser from a DateFormatter, so the caller does not
need to know the internals of the DateFormatter they have.
2018-09-26 07:56:25 -07:00
Zachary Tong
25d74bd0cb
Prefer mapped aggs to lead reductions (#33528)
Previously, unmapped aggs try to delegate reduction to a sibling agg that is
mapped.  That delegated agg will run the reductions, and also
reduce any pipeline aggs.  But because delegation comes before running
pipelines, the unmapped agg _also_ tries to run pipeline aggs.

This causes the pipeline to run twice, and potentially double it's output
in buckets which can create invalid JSON (e.g. same key multiple times)
and break when converting to maps.

This fixes by sorting the list of aggregations ahead of time so that mapped
aggs appear first, meaning they preferentially lead the reduction.  If all aggs
are unmapped, the first unmapped agg simply creates a new unmapped object
and returns that for the reduction.

This means that unmapped aggs no longer defer and there is no chance for 
a secondary execution of pipelines (or other side effects caused by deferring
execution).

Closes #33514
2018-09-26 10:09:31 -04:00
Christoph Büscher
ba3ceeaccf
Clean up "unused variable" warnings (#31876)
This change cleans up "unused variable" warnings. There are several cases were we 
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
2018-09-26 14:09:32 +02:00
David Turner
d995fc85c6
Integrate LeaderChecker with Coordinator (#34049)
This change ensures that follower nodes periodically check that their leader is
healthy, and that they elect a new leader if not.
2018-09-26 12:18:13 +01:00
Ryan Ernst
be8475955e
Scripting: Use ParameterMap for deprecated ctx var in update scripts (#34065)
This commit removes the sysprop controlling whether ctx is in params for
update scripts and replaces it with use of the new ParameterMap, which
outputs a deprecation warning whenever params.ctx is used.
2018-09-25 22:08:02 -07:00
Nhat Nguyen
5166dd0a4c
Replicate max seq_no of updates to replicas (#33967)
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication 
requests or the translog phase of peer-recovery.

With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.

Relates #33656
2018-09-25 08:07:57 -04:00
David Turner
1d47c9582b
Fix CoordinatorTests (#34002)
Today the CoordinatorTests are not very reliable if two elections are scheduled
concurrently. Although we expect occasional failures due to this, in fact the
failures are much more common than expected due to a handful of issues. This PR
fixes these issues.
2018-09-25 08:43:47 +01:00
Tim Brooks
78e483e8d8
Introduce abstract security transport testcase (#33878)
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
2018-09-24 09:44:44 -06:00
Nhat Nguyen
7944a0cb25
Track max seq_no of updates or deletes on primary (#33842)
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).

The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.

Relates #33656
2018-09-22 08:02:57 -04:00
Vladimir Dolzhenko
477391d751
Don't test corruption detection within CFS checksum (#33911)
Closes #33881
2018-09-22 10:21:36 +02:00
Yannick Welsch
a612dd1272
Zen2: Add node id to log output of CoordinatorTests (#33929)
With recent changes to the logging framework, the node name can no longer be injected into the logging output using the node.name setting, which means that for the CoordinatorTests (which are simulating a cluster in a fully deterministic fashion using a single thread), as all the different nodes are running under the same test thread, we are not able to distinguish which log lines are coming from which node. This commit readds logging for node ids in the CoordinatorTests, making two very small changes to DeterministicTaskQueue and TestThreadInfoPatternConverter.
2018-09-21 18:40:12 +02:00
lipsill
b48d5a8942 [TEST] ClientYamlSuiteRestApiParser to parse spec without path parts (#33720)
Previously ClientYamlSuiteRestApiParser threw an exception when an api
spec contained neither path parts nor url parameter sections.

Closes #31649
2018-09-21 17:26:55 +02:00
Alexander Reelsen
1de2a925ce
Watcher: Ensure that execution triggers properly on initial setup (#33360)
This commit reverts most of #33157 as it introduces another race
condition and breaks a common case of watcher, when the first watch is
added to the system and the index does not exist yet.

This means, that the index will be created, which triggers a reload, but
during this time the put watch operation that triggered this is not yet
indexed, so that both processes finish roughly add the same time and
should not overwrite each other but act complementary.

This commit reverts the logic of cleaning out the ticker engine watches
on start up, as this is done already when the execution is paused -
which also gets paused on the cluster state listener again, as we can be
sure here, that the watches index has not yet been created.

This also adds a new test, that starts a one node cluster and emulates
the case of a non existing watches index and a watch being added, which
should result in proper execution.

Closes #33320
2018-09-21 14:22:34 +02:00
Armin Braun
3a5b8a71b4
NETWORKING: Fix Portability of SO_LINGER=0 in Tests (#33895)
* Setting SO_LINGER for open but not connected non-blocking sockets
throws on OSX
  * Fixed by only applying setting to connected sockets which will save
the same number of FDs as doing it on open sockets anyway
* closes #33879
2018-09-21 10:08:16 +02:00
Nhat Nguyen
5f7f793f43
Propagate max_auto_id_timestamp in peer recovery (#33693)
Today we don't store the auto-generated timestamp of append-only
operations in Lucene; and assign -1 to every index operations
constructed from LuceneChangesSnapshot. This looks innocent but it
generates duplicate documents on a replica if a retry append-only
arrives first via peer-recovery; then an original append-only arrives
via replication. Since the retry append-only (delivered via recovery)
does not have timestamp, the replica will happily optimizes the original
request while it should not.

This change transmits the max auto-generated timestamp from the primary
to replicas before translog phase in peer recovery. This timestamp will
prevent replicas from optimizing append-only requests if retry
counterparts have been processed.

Relates #33656 
Relates #33222
2018-09-20 19:53:30 -04:00
David Turner
187f787f52
[Zen2] Introduce LeaderChecker (#33024)
It is important that follower nodes periodically check that their leader is
still healthy and that they remain part of its cluster. If these checks fail
repeatedly then followers should attempt to find and join a new leader,
possibly electing one in the process. The LeaderChecker, introduced in this
commit, performs these periodic checks and deals with retries.
2018-09-20 20:05:55 +01:00
Nhat Nguyen
76a1a863e3
TEST: stop assertSeqNos if shards movement (#33875)
Currently, assertSeqNos assumes that the cluster is stable at the end of
the test (i.e., no more shard movement). However, this assumption does
not always hold. In these cases, we can stop the assertion instead of
failing a test.

Closes #33704
2018-09-20 13:44:26 -04:00