1731 Commits

Author SHA1 Message Date
Jason Tedor
9be87adb95
Increment settings version when upgrading index (#34566)
When we upgrade an index, we set the settings version upgraded
setting. This should be considered a settings change, and therefore we
need to increment the settings version. This commit addresses that.
2018-10-17 18:00:17 -04:00
Nik Everett
b6aa42777a
Search: Wrap lucene classes at 140 columns (#34491)
Applies our line length guidance for all classes in the server in `lucene`
directories *except* `XMoreLikeThis`. The only long line in
`XMoreLikeThis` says "remove this when we upgrade to Lucene 5. Given
that we're on Lucene 8, this is a little terrifying and deserves another
look.
2018-10-17 15:54:35 -04:00
Armin Braun
08d4bf6e84
TESTS: Remove Dead Code in Test Infra. (#34548)
* None of this infrastructure is used
* Some redundant throws and resulting catch code removed
2018-10-17 20:08:39 +01:00
Colin Goodheart-Smithe
90f7cec7a5
Merge branch 'master' into index-lifecycle 2018-10-17 18:22:23 +01:00
Simon Willnauer
b0e98cbce2
Pass the host name on as server_name if proxy mode is on (#34559)
In remote cluster setup if we see a configured proxy we should set
the seed nodes host name as the `server_name` to trigger SNI based
routing even for seed nodes. Since remote cluster connections are
plain TCP connections we have to set the host manually since the other
side can't take it from the request URL like in the HTTP case.
This also adds some more informative logging to remote cluster connection.
2018-10-17 19:11:50 +02:00
Andrey Ershov
51f38ddc0c
Switch MetaDataStateFormat to Lucene directory abstraction (#33989)
Switch MetaDataStateFormat to Lucene directory abstraction

This commit switches MetaDataStateFormat class to Lucene directory abstraction to make it easier to test MetaDataStateFormat for different IO failures.
This commits also adds different IO failures tests to MetaDataStateFormatTests.
2018-10-17 18:17:17 +02:00
Andrey Ershov
93bb24e1f8 Merge branch 'master' into zen2 2018-10-17 14:37:53 +02:00
Armin Braun
3954d041a0
SCRIPTING: Move sort Context to its Own Class (#33717)
* SCRIPTING: Move sort Context to its own Class
2018-10-17 10:02:44 +01:00
Tal Levy
fbe8dc014c Merge branch 'master' into index-lifecycle 2018-10-16 13:58:53 -07:00
Simon Willnauer
a93aefb4a4
Assume that rollover datemath tests run on the same day. (#34527)
in #28741 RolloverIT fails because we are cutting over to the
next day while the test executes. We assume that this doesn't happen
based on the assertions in the test. This adds a assumeTrue to ensure
we are at least 5 min away form a date-flip.

Closes #28741
2018-10-16 20:22:32 +02:00
David Turner
303575f742 Fix up merge of master 2018-10-16 15:29:47 +01:00
Armin Braun
ea576a8ca2
Disc: Move AbstractDisruptionTC to filebased D. (#34461)
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
2018-10-16 15:28:40 +01:00
David Turner
950ca3adda Merge branch 'master' into zen2 2018-10-16 14:41:14 +01:00
Simon Willnauer
d43a1fac33
Lock down Engine.Searcher (#34363)
`Engine.Searcher` is non-final today which makes it error prone
in the case of wrapping the underlying reader or lucene `IndexSearcher`
like we do in `IndexSearcherWrapper`. Yet, there is no subclass of it yet
that would be dramatic to just drop on the floor. With the start of development
of frozen indices this changed since in #34357 functionality was added to
a subclass which would be dropped if a `IndexSearcherWrapper` is installed on an index.
This change locks down the `Engine.Searcher` to prevent such a functionality trap.
2018-10-16 14:53:07 +02:00
Martijn van Groningen
a1ec91395c
Changed CCR internal integration tests to use a leader and follower cluster instead of a single cluster (#34344)
The `AutoFollowTests` needs to restart the clusters between each tests, because
it is using auto follow stats in assertions. Auto follow stats are only reset
by stopping the elected master node.

Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api.

* Renamed AutoFollowTests to AutoFollowIT, because it is an integration test.
Renamed ShardChangesIT to IndexFollowingIT, because shard changes it the name
of an internal api and isn't a good name for an integration test.

* move creation of NodeConfigurationSource to a seperate method

* Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.
2018-10-16 14:45:46 +02:00
Jason Tedor
05911fb499
Adjust settings version BWC version after backport
This commit adjusts the settings version BWC version after backporting
the change to the 6.x branch which currently is versioned as 6.5.0.
2018-10-16 06:38:38 -04:00
Jim Ferenczi
544de13d8e
Disallow negative query boost (#34486)
This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so
it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts
in 6x in order to ensure that users are aware when they'll upgrade to ES 7.

Relates #33309
2018-10-16 11:31:53 +01:00
Jason Tedor
4b2052c683
Introduce index settings version (#34429)
This commit introduces settings version to index metadata. This value is
monotonically increasing and is updated on settings updates. This will
be useful in cross-cluster replication so that we can request settings
updates from the leader only when there is a settings update.
2018-10-16 06:22:20 -04:00
Daniel Mitterdorfer
92b2e1a209
Remove lenient boolean handling
With this commit we remove some leftovers from #26389 which cleaned up
lenient boolean handling.

Relates #26389
Relates #22298
Relates #34467
2018-10-16 06:30:00 +02:00
Jason Tedor
55dee53046
Do not update number of replicas on no indices (#34481)
Today when submitting an update settings request to update the number of
replicas with a wildcard that does not match any indices and allow no
indices is set to true, the request ends up being interpreted as
updating the number of replicas for all indices. That is, consider the
following sequence:

PUT /test-index
{
  "settings": {
    "index.number_of_replicas": 0
  }
}

PUT /non-existent-*/_settings?expand_wildcards=open&allow_no_indices=true
{
  "settings": {
    "index.number_of_replicas": 1
  }
}

GET /test-index/_settings

The latter will show that the number of replicas on test-index is now
one. This is surprising, and should be considered a bug.

The underlying problem here is treating no indices in the underlying
methods used to update the routing table and the metadata as meaning all
indices. This commit takes away this assumption. Tests that relied on
this behavior have been changed to no longer rely on this.

A test for this situation is added in UpdateNumberOfReplicasIT.
2018-10-15 19:49:58 -04:00
Nik Everett
23ece922c9
Core: Remove two methods from AbstractComponent (#34336)
This removes another two methods from `AbstractComponent`. One isn't
used at all and another is only used in a single class in watcher. I've
moved the method that watcher uses into the single class that uses it.
2018-10-15 16:05:14 -04:00
Nik Everett
a6d1cc6ca9 Revert "Search: Fix spelling mistake in Javadoc (#34480)"
This reverts commit 4e1d7baed0a1e2fa1fa17fe4479045d811f1e02e.
2018-10-15 15:42:11 -04:00
fonxian
4e1d7baed0 Search: Fix spelling mistake in Javadoc (#34480)
"iff" -> "if".
2018-10-15 15:38:37 -04:00
Ryan Ernst
26f1d7fc94
Tests: Handle epoch date formatters edge cases (#34437)
This commit handles cases testing withLocale and withZone when the zone
and locale in question is the same as the special base case. This can
happen sometimes since the locale and zoneids are randomized.
2018-10-15 12:18:18 -07:00
Jim Ferenczi
67577fca56
Fix handling of empty keyword in terms aggregation (#34457)
Empty values on keyword fields are filtered by the `map` execution mode
of the `terms` aggregation. This commit restores them as valid buckets.

Closes #34434
2018-10-15 19:33:52 +01:00
Armin Braun
ebca27371c
SCRIPTING: Move Aggregation Script Context to its own class (#33820)
* SCRIPTING: Move Aggregation Script Context to its own class
2018-10-15 17:28:05 +01:00
Colin Goodheart-Smithe
0b42eda0e3
Merge branch 'master' into index-lifecycle 2018-10-15 16:03:37 +01:00
David Turner
9bb620eece Mute PartitionedRoutingIT#testShrinking on Windows 2018-10-15 13:18:00 +01:00
Ryan Ernst
72d818c304
Tests: Fix DateFormatter equals tests with locale (#34435)
This commit removes randomization of locale for DateFormatter equals
tests, instead using explicit locales. The test framework already
randomizes locales, so the random choice of the second locale can
sometimes be equal to the already chosen locale. Randomization also does
not provide any extra protection, as the equality of DateFormatter does
not implement equality of the locales itself.

closes #34337
2018-10-14 23:54:49 +01:00
Yannick Welsch
5fbead00a3
Zen2: Add infrastructure for integration tests (#34365)
Adds the infrastructure to run integration tests against Zen2.
2018-10-14 20:55:04 +01:00
David Turner
8b9fa55c93
Add storage-layer disruptions to CoordinatorTests (#34347)
Today we assume the storage layer operates perfectly in CoordinatorTests, which
means we are not testing that the system's invariants are preserved if the
storage layer fails for some reason. This change injects (rare) storage-layer
failures during the safety phase to cover these cases.
2018-10-13 14:24:15 +01:00
David Turner
d98199df14
Extend duration of fixLag() (#34364)
Today, fixLag() waits for a new cluster state to be committed. However, it does
not account for the fact that a term bump may occur, requiring a new election
to take place after the cluster state is committed. This change fixes this.
2018-10-11 23:24:08 +01:00
David Turner
a32e303b0c
Account for election duration (#34362)
Today we may schedule two elections very close together, which can cause the
first election to fail even if there are no other nodes. This change adds a
delay in between subsequent elections on the same node, effectively allowing
time for each election to complete before scheduling the next one.
2018-10-11 15:31:08 +01:00
Jay Modi
6d99d7dafc
ListenableFuture should preserve ThreadContext (#34394)
ListenableFuture may run a listener on the same thread that called the
addListener method or it may execute on another thread after the future
has completed. Whenever the ListenableFuture stores the listener for
execution later, it should preserve the thread context which is what
this change does.
2018-10-11 15:24:38 +01:00
Nhat Nguyen
33791ac27c
CCR: Following primary should process operations once (#34288)
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.

This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).

Relates #31751
Relates #31113
2018-10-10 15:39:57 -04:00
Simon Willnauer
34b935ae57
Improve getRestHandlerWrapper JavaDocs (#34376)
Questions on how to work with `ActionPlugin#getRestHandlerWrapper()`
come up in discuss forums all the time. This change adds an example
to the javadoc how this method should/could be used.
2018-10-10 17:28:07 +01:00
David Turner
52a3a19551
Add low-level bootstrap implementation (#34345)
Today we inject the initial configuration of the cluster (i.e. the set of
voting nodes) at startup. In reality we must support injecting the initial
configuration after startup too. This commit adds low-level support for doing
so as safely as possible.
2018-10-08 15:56:48 +01:00
Yannick Welsch
49cbcaff4f
Allow excluding folder names when scanning for dangling indices (#34349)
ES is scanning for dangling indices on every cluster state update. For this, it lists the subfolders of
the indices directory to determine which extra index directories exist on the node where there's no
corresponding index in the cluster state. These are potential targets for dangling index import. On
certain machine types, and with large number of indices, this subfolder listing can be horribly slow.
This means that every cluster state update will be slowed down by potentially hundreds of
milliseconds. One of the reasons for this poor performance is that Files.isDirectory() is a relatively
expensive call on some OS and JDK versions. There is no need though to do all these isDirectory
calls for folders which we know we are going to discard anyhow in the next step of the dangling
indices logic. This commit allows adding an exclusion predicate to the availableIndexFolders
methods which can dramatically speed up this method when scanning for dangling indices.
2018-10-08 15:35:50 +02:00
David Turner
ac99d1d66d
Fix bugs in fixLag() (#34346)
The hack to work around lag detection had some issues:
- it always called runFor(), even if no lag was detected
- it looked at the last-accepted state not the last-applied state, so missed
  some lag situations.

This fixes these issues.
2018-10-08 11:33:25 +01:00
Nik Everett
06993e0c35
Logging: Make ESLoggerFactory package private (#34199)
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
2018-10-06 09:54:08 -04:00
David Turner
03da4f6c51
Gather votes from all nodes (#34335)
Today we accept that some nodes may vote for the wrong master in an election.
This is mostly fine because they do end up joining the correct master in the
end, but the lack of a vote from every follower may prevent a future desirable
reconfiguration from taking place.

The solution is to hold another election in a yet-higher term in order to
collect a complete set of votes. Elections are somewhat disruptive so we should
think carefully about when this election should take place. One option is to
wait as late as possible (on the grounds that it might not ever be necessary).
This unfortunately makes it harder to predict how an
apparently-smoothly-running cluster will react to nodes leaving and joining.
Instead we prefer to perform the election as soon as possible in the leader's
term, adding "votes from all followers" to the invariants that we expect to
hold in a stable cluster. The start of a leader's term is already a somewhat
disrupted time for the cluster, so performing another election at this point
does not materially change the cluster's behaviour.

This change implements the logic needed to trigger a new election in order to
satisfy this extra stabilisation condition.
2018-10-06 07:22:04 +01:00
Daniel Mitterdorfer
7d826916b9
Adjust size of BigArrays in circuit breaker test
With this commit we restore the previous behavior in
`BigArraysTests#testMaxSizeExceededOnResize` but lower the sizes that
are tested to the range between 256 bytes to 16 kB so the test does not
produce a whole lot of garbage.

The previous attempt to reduce the amount of garbage produced by that
test was to properly size the array initially but it failed to account
for object alignment which lead to test failures in some cases. While it
would be possible to account for object alignment, we would need to open
up BigArrays or directly use the underlying Lucene API which would
require us to allocate an array upfront only to find its size (incl.
object alignment).

Instead we have fixed this issue by conservatively sizing the array
initially (so the initial allocation will never trip the circuit
breaker) and reduce garbage by reducing the circuit breaker's upper
bound as described previously.

Closes #33750
Relates #34325
2018-10-05 15:39:08 +02:00
Jim Ferenczi
5c7b52e930 Adapt bwc version after backport
Relates #33587
2018-10-05 13:07:39 +02:00
eray
daf88335d7 Add max_children limit to nested sort (#33587)
Add an option to `nested` sort to limit the number of children to visit when picking the sort value
of the root document. 

Closes #33592
2018-10-05 12:02:47 +02:00
David Turner
29d7d1d503
Minor housekeeping of tests (#34315)
From experience with #34257, here are a few things that help with analysing
logs from test runs. Also we prevent trying to stabilise a cluster with raised
delay variability, because lowering the delay variability requires time to
allow all the extra-varied-scheduled tasks to work their way out of the system.
2018-10-05 07:57:03 +01:00
Dimitris Athanasiou
4dacfa95d2
[ML] Allow asynchronous job deletion (#34058)
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.

This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.

Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.

Closes #32836
2018-10-05 02:41:28 +03:00
Nik Everett
09aaed4fe4
Tasks: Document that status is not semvered (#34270)
The `status` part of the tasks API reflects the internal status of a
running task. In general, we do not make backwards breaking changes to
the `status` but because it is internal we reserve the right to do so. I
suspect we will very rarely excercise that right but it is important
that we have it so we're not boxed into any particular implementation
for a request.

In some sense this is policy making by documentation change. In another
it is clarification of the way we've always thought of this field.

I also reflect the documentation change into the Javadoc in a few
places. There I acknowledge Kibana's "special relationship" with
Elasticsearch. Kibana parses `_reindex`'s `status` field and, because
we're friends with those folks, we should talk to them before we make
backwards breaking changes to it. We *want* to be friends with everyone
but there is only so much time in the day and we don't *want* to make
backwards breaking fields to `status` at all anyway. So we hope that
breaking changes documentation should be enough for other folks.

Relates to #34245.
2018-10-04 14:42:37 -04:00
Yannick Welsch
b32abcbd00
Zen2: Add Cluster State Applier (#34257)
Adds the cluster state applier to Coordinator, and adds tests for cluster state acking.
2018-10-04 20:33:28 +02:00
Vladimir Dolzhenko
dcfe64e0e4
[CI] Fix bogus ScheduleWithFixedDelayTests.testRunnableRunsAtMostOnceAfterCancellation
Closes #34004
2018-10-04 16:31:56 +02:00
Armin Braun
3ccfc3de58
SCRIPTING: Terms set query expression (#33856)
* SCRIPTING: Add Expr. Compile for TermSetQuery Ctx.

* Follow up to #33602 adding the ability to compile TermsSetQuery
scripts with the expressions engine in the same way we support
SearchScript in Expressions
   * Duplicated the code here for now to make the change less complex,
 the only difference to SearchScript is that `_score` and `_value` are not handled for TermsSetQuery
* remove redundant check
2018-10-04 16:03:57 +02:00