7052 Commits

Author SHA1 Message Date
Simon Willnauer
1d8c8529ed Remove IndexTemplateAlreadyExistsException and IndexShardAlreadyExistsException (#21539)
Both exceptions can be replaced with Java built-in exceptions, IAE and ISE respectively.
This should be partially backported to 5.x, where the transport layer code should be preserved.
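
A minimal sketch of the replacement, assuming hypothetical `TemplateRegistry` and `ShardHolder` classes (neither is Elasticsearch code): a duplicate template name is reported with IllegalArgumentException, and a duplicate shard with IllegalStateException.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry, for illustration only; not Elasticsearch code.
final class TemplateRegistry {
    private final Set<String> templates = new HashSet<>();

    // A duplicate name is a bad argument from the caller, so the built-in
    // IllegalArgumentException is enough to signal it.
    void addTemplate(String name) {
        if (!templates.add(name)) {
            throw new IllegalArgumentException("index_template [" + name + "] already exists");
        }
    }
}

// Hypothetical per-node shard holder, for illustration only.
final class ShardHolder {
    private final Set<Integer> shards = new HashSet<>();

    // Creating a shard that is already present means the holder is in the
    // wrong state for this call, which maps to IllegalStateException.
    void createShard(int shardId) {
        if (!shards.add(shardId)) {
            throw new IllegalStateException("shard [" + shardId + "] already exists");
        }
    }
}
```

Callers that previously caught the dedicated exception types would catch the built-in ones instead.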

Relates to #21494
2016-11-14 17:09:57 +01:00
Simon Willnauer
26375256ff Enable 5.x to 6.x BWC tests (#21537)
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
2016-11-14 17:03:57 +01:00
Yannick Welsch
d3e97ce6cd Fix line length in TCPTransportTests
Makes checkstyle happy
2016-11-14 16:55:14 +01:00
Yannick Welsch
d42f7eec61 Check valid cluster service state transitions (#21538)
This commit adds assertions to check whether the cluster service state transitions in a way that we expect it to.

Relates to #21379.
2016-11-14 16:49:25 +01:00
Simon Willnauer
26a8a94e56 [TEST] Add test to ensure transport.tcp.compress works
This adds a basic unit test to ensure `transport.tcp.compress` has an effect
on all basic TcpTransport implementations.

Relates to #21526
2016-11-14 16:13:44 +01:00
Simon Willnauer
7d4bde8e00 remove forbidden API 2016-11-14 15:30:07 +01:00
Yannick Welsch
8655cd7182 Add assertion that checks that the same shard with same id is not added to same node (#21498)
Adds an assertion that checks that the same shard with same id is not added to same node. Previously we would just silently ignore the second shard being added.
2016-11-14 15:14:14 +01:00
Simon Willnauer
bdc942fa72 Enable 5.x to 6.x BWC tests
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
2016-11-14 14:26:49 +01:00
Adrien Grand
1fd5c47e7f Upgrade to lucene-6.3.0. (#21464) 2016-11-14 09:36:45 +01:00
Jason Tedor
c7a1b3eb50 Merge branch 'master' into feature/seq_no
* master:
  Hack around cluster service and logging race
  Do not prematurely shutdown Log4j
  Support decimal constants with trailing [dD] in painless (#21412)
  In painless suggest a long constant if int won't do (#21415)
  Account for different paths for sysctl utilities
  [TEST] testRebalancePossible() may not have an assigned node id
  Tests: Disable merge in SearchCancellationTests
  Tests: clean search scroll at the end of SearchCancellationIT
2016-11-13 20:01:44 -05:00
Jason Tedor
19decd7552 Hack around cluster service and logging race
When a cluster update task executes, there can be log messages after the
update task has finished processing and the new cluster state becomes
visible. The visibility of the cluster state allows the test thread in
UpdateSettingsIT#testUpdateAutoThrottleSettings and
UpdateSettingsIT#testUpdateMergeMaxThreadCount to proceed. The test
thread will remove and stop a mock appender setup at the beginning of
the test. The log messages in the cluster state update task that occur
after processing has finished can race with the removal of the
appender. Log4j will grab a reference to the appenders when processing
these log messages, and this races with the removal and stopping of the
appenders. If Log4j grabs a reference to the appenders before the mock
appender has been removed, and the test thread subsequently removes and
stops the appender before Log4j has appended the log message, Log4j will
get angry that we are appending to a stopped appender, causing the test
to fail. This commit addresses this race by waiting for the cluster
state update task to have finished processing before freeing the test
thread to make its assertions and finally remove and stop the
appender. Yes, this is a hack.

Relates #21518
2016-11-13 18:06:12 -05:00
Jason Tedor
d273419d00 Do not prematurely shutdown Log4j
When a node closes, we shut down logging as the last statement. This
statement must be last, lest any subsequent attempts to log blow up
by running into security permissions. Yet, in the case of a tribe node
this isn't enough. The first internal tribe node to close will shut down
logging, and subsequent node closes will blow up with the aforementioned
problem. This commit migrates the Log4j shutdown to occur as part of the
shutdown hook that closes the node, after all nodes have
closed. Consequently, we can remove a hack in the test infrastructure to
prevent Log4j shutdowns when internal test nodes close and instead just
register a single shutdown hook that runs when the test JVM exits.

Relates #21519
2016-11-13 17:27:30 -05:00
Boaz Leskes
fac6cf0d4e testUpgradeOldIndex should properly set index settings. They are needed for assertions 2016-11-12 11:42:02 +01:00
Ali Beyad
38023fb58d [TEST] testRebalancePossible() may not have an assigned node id 2016-11-11 23:10:34 -05:00
Igor Motov
ca639e8c86 Tests: Disable merge in SearchCancellationTests
We have to have at least 2 segments for the test to work and sometimes random merge policy merges them into one.
2016-11-11 18:22:28 -05:00
Igor Motov
058b6e019c Tests: clean search scroll at the end of SearchCancellationIT
Under some rare conditions search cancellation response might not fully clean scroll context. For now this commit adds the cleaning operation to the test, and we will address the root cause in https://github.com/elastic/elasticsearch/issues/21511
2016-11-11 18:22:15 -05:00
Jason Tedor
1ea69b1a80 Merge branch 'master' into feature/seq_no
* master:
  Set vm.max_map_count on systemd package install
  [TEST] reduce the number of snapshotted shards to 1 in testSnapshotSucceedsAfterSnapshotFailure() so that we are more likely to trigger I/O exceptions on writing the control files during the finalize phase of snapshotting (with the aim of triggering an I/O failure when writing pending-index-*).
  Add documentation for Logger with Transport Client
  Enable appender exceptions in UpdateSettingsIT
  [TEST] remove AwaitsFix from testSnapshotSucceedsAfterSnapshotFailure, turns out the issue is specific to Java 9 v143
  Cleanup formatting in UpdateSettingsIT.java
  [TEST] mute the testSnapshotSucceedsAfterSnapshotFailure() test until it's clear what is going wrong.
  Mark SearchQueryIT test as awaits fix
  Makes snapshot throttling test go much faster (#21485)
  Breaking changes docs for template index_patterns
  [TEST] adds randomness between atomic and non-atomic move operations in MockRepository
  Cache successful shard deletion checks (#21438)
  Task cancellation command should wait for all child nodes to receive cancellation request before returning
2016-11-11 17:03:01 -05:00
Jason Tedor
d06f43c706 Tighten sequence number assertion
We have an assertion in the engine regarding the initial state of a
sequence number before an indexing operation. This assertion is too
loose: it catches operations during recovery from old indices where
sequence numbers do not even exist. This commit tightens these
assertions to not catch such operations and enables us to reenable some
tests.

Relates #21509
2016-11-11 16:49:13 -05:00
Ali Beyad
5f1d108704 [TEST] reduce the number of snapshotted shards to 1 in testSnapshotSucceedsAfterSnapshotFailure()
so that we are more likely to trigger I/O exceptions on writing the control files during the
finalize phase of snapshotting (with the aim of triggering an I/O failure when writing pending-index-*).
2016-11-11 16:22:11 -05:00
Jason Tedor
8d1260a58a Convert nocommit to TODO in SeqNoFieldMapper
This commit converts a nocommit to a TODO in SeqNoFieldMapper that will
be dealt with in a follow-up.
2016-11-11 16:11:41 -05:00
Jason Tedor
c77d285699 Remove nocommit in TransportShardBulkAction
This commit removes a nocommit in TransportShardBulkAction that deserves
a larger issue.
2016-11-11 16:10:22 -05:00
Jason Tedor
33f7cd5a16 Remove shard ID from doc write response
This commit removes the shard ID from doc write response; this was
useful for debugging but its time has passed.

Relates #21508
2016-11-11 15:18:25 -05:00
Jason Tedor
9352d16602 Enable appender exceptions in UpdateSettingsIT
This commit sets the mock appender in UpdateSettingsIT to not ignore
exceptions. This means that when an exception is hit, we will see an
actual stack trace that could be useful in debugging a non-reproducible
test failure.

Relates #21461
2016-11-11 12:41:20 -05:00
Ali Beyad
c9c3992f94 [TEST] remove AwaitsFix from testSnapshotSucceedsAfterSnapshotFailure,
turns out the issue is specific to Java 9 v143
2016-11-11 12:37:04 -05:00
Jason Tedor
79076334ae Cleanup formatting in UpdateSettingsIT.java
This commit cleans up some code formatting in UpdateSettingsIT.java and
removes this file from the checkstyle line-length suppressions.
2016-11-11 12:10:32 -05:00
Ali Beyad
8f85e388da [TEST] mute the testSnapshotSucceedsAfterSnapshotFailure() test
until it's clear what is going wrong.

Relates #21496
2016-11-11 11:50:23 -05:00
Jason Tedor
372480a16a Mark SearchQueryIT test as awaits fix
This commit marks the test SearchQueryIT#testRangeQueryWithTimeZone as
awaits fix.

Relates #21501
2016-11-11 11:33:17 -05:00
Yannick Welsch
9cbb23f3d7 Test distinctNodes 2016-11-11 17:29:51 +01:00
Jason Tedor
1e7c424479 Merge branch 'master' into feature/seq_no
* master:
  ShardActiveResponseHandler shouldn't hold to an entire cluster state
  Ensures cleanup of temporary index-* generational blobs during snapshotting (#21469)
  Remove (again) test uses of onModule (#21414)
  [TEST] Add assertBusy when checking for pending operation counter after tests
  Revert "Add trace logging when acquiring and releasing operation locks for replication requests"
  Allows multiple patterns to be specified for index templates (#21009)
  [TEST] fixes rebalance single shard check as it isn't guaranteed that a rebalance makes sense and the method only tests if rebalance is allowed
  Document _reindex with random_score
2016-11-11 11:25:27 -05:00
Ali Beyad
a5ccd02e76 Makes snapshot throttling test go much faster (#21485)
[TEST] Makes the snapshot throttling test go much faster.  Before, 
the snapshot throttling test would throttle at a rate of 0.5 kb per
second, even though it would snapshot/restore about 25 kb of data.
This commit increases the throttling rate to 10 kb per second, so
we still test the throttling mechanism while speeding up the test from
taking 30 plus seconds down to 2 seconds or less.
2016-11-11 10:52:26 -05:00
Yannick Welsch
d195ef258b test fix 2016-11-11 16:09:34 +01:00
Yannick Welsch
1635baf876 fix tests that add duplicate shards 2016-11-11 15:28:40 +01:00
Yannick Welsch
7099f10909 Add assertion that checks that the same shard with same id is not added to same node 2016-11-11 15:28:40 +01:00
Ali Beyad
adb7aaded4 [TEST] adds randomness between atomic and non-atomic move
operations in MockRepository
2016-11-11 09:07:28 -05:00
Yannick Welsch
2d3a52c0f2 Cache successful shard deletion checks (#21438)
Each node checks on every cluster state update if there are shards that it can possibly delete from its disk. It decides this by doing a file-system lookup for each shard id that is fully allocated in the cluster. With lots of shards, this amounts to lots of Files.exists() checks, considerably slowing down cluster state updates. This commit adds a caching layer so that the Files.exists() checks can be skipped if not needed.
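
A rough sketch of such a caching layer, assuming a hypothetical `ShardFolderCheckCache` helper that remembers shard paths already found to be absent (the real implementation may key and invalidate its cache differently):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not the Elasticsearch implementation. It assumes
// cluster state updates are applied by a single thread.
final class ShardFolderCheckCache {
    private final Set<Path> knownAbsent = new HashSet<>();
    private int fileSystemLookups = 0; // kept only to make the effect visible

    // Returns whether the shard folder may still be on disk.
    boolean mayExist(Path shardPath) {
        if (knownAbsent.contains(shardPath)) {
            return false; // cached answer, no Files.exists() call
        }
        fileSystemLookups++;
        boolean exists = Files.exists(shardPath);
        if (!exists) {
            knownAbsent.add(shardPath);
        }
        return exists;
    }

    // Must be called when the shard is allocated to this node again,
    // otherwise the cached "absent" answer would go stale.
    void invalidate(Path shardPath) {
        knownAbsent.remove(shardPath);
    }

    int fileSystemLookups() {
        return fileSystemLookups;
    }
}
```

Only negative results are cached here, since a folder that exists still has to be checked again until it is deleted.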
2016-11-11 10:06:15 +01:00
Jason Tedor
d3417fb022 Merge branch 'master' into feature/seq_no
* master: (516 commits)
  Avoid angering Log4j in TransportNodesActionTests
  Add trace logging when acquiring and releasing operation locks for replication requests
  Fix handler name on message not fully read
  Remove accidental import.
  Improve log message in TransportNodesAction
  Clean up of Script.
  Update Joda Time to version 2.9.5 (#21468)
  Remove unused ClusterService dependency from SearchPhaseController (#21421)
  Remove max_local_storage_nodes from elasticsearch.yml (#21467)
  Wait for all reindex subtasks before rethrottling
  Correcting a typo-Maan to Man-in README.textile (#21466)
  Fix InternalSearchHit#hasSource to return the proper boolean value (#21441)
  Replace all index date-math examples with the URI encoded form
  Fix typos (#21456)
  Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
  Add null check in InternalSearchHit#sourceRef to prevent NPE (#21431)
  Add VirtualBox version check (#21370)
  Export ES_JVM_OPTIONS for SysV init
  Skip reindex rethrottle tests with workers
  Make forbidden APIs be quieter about classpath warnings (#21443)
  ...
2016-11-10 23:40:33 -05:00
Igor Motov
df965fc9b3 Task cancellation command should wait for all child nodes to receive cancellation request before returning
Currently, the task cancellation command returns as soon as the top-level parent task is marked as cancelled. This creates race conditions in tests, where child tasks on other nodes may continue to run for some time after the main task is cancelled. This commit fixes this by making the task cancellation command wait until the cancellation has propagated to all nodes that have child tasks.
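
The waiting behaviour can be sketched with plain `CompletableFuture`s, assuming each node with child tasks acknowledges the cancellation through its own future (the `CancellationFanOut` class is hypothetical and not part of the task management code):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: one acknowledgement future per node that runs
// child tasks of the cancelled parent.
final class CancellationFanOut {
    // The returned future completes only after every node has acknowledged
    // the cancellation, rather than as soon as the parent is marked cancelled.
    static CompletableFuture<Void> cancelOnAllNodes(List<CompletableFuture<Void>> nodeAcks) {
        return CompletableFuture.allOf(nodeAcks.toArray(new CompletableFuture[0]));
    }
}
```

A test that blocks on the returned future can then assume that no node is still unaware of the cancellation.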

Closes #21126
2016-11-10 22:43:43 -05:00
Igor Motov
06a50fa31e ShardActiveResponseHandler shouldn't hold to an entire cluster state
ShardActiveResponseHandler doesn't need to hold on to an entire cluster state since it only needs to know the cluster state version. It seems that on overloaded systems where nodes are unresponsive, holding on to a lot of different cluster states can make the situation worse.
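
As an illustration only, with hypothetical `ClusterStateSnapshot` and `ShardActiveHandler` stand-ins: the handler copies the version out of the state in its constructor and keeps no reference to the state itself, so the larger object can be garbage collected while the handler is still waiting. How the version is used afterwards (here, a staleness check) is an assumption.

```java
// Hypothetical stand-in for a cluster state: a version plus a large payload.
final class ClusterStateSnapshot {
    final long version;
    final byte[] payload;

    ClusterStateSnapshot(long version, byte[] payload) {
        this.version = version;
        this.payload = payload;
    }
}

// Hypothetical handler that retains only the version, not the whole state.
final class ShardActiveHandler {
    private final long clusterStateVersion;

    ShardActiveHandler(ClusterStateSnapshot state) {
        // Copy the one value we need; the state object is not stored.
        this.clusterStateVersion = state.version;
    }

    long clusterStateVersion() {
        return clusterStateVersion;
    }

    // Assumed use of the version: detect that the cluster state has moved on.
    boolean isStale(long currentVersion) {
        return currentVersion != clusterStateVersion;
    }
}
```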

Closes #21394
2016-11-10 22:28:49 -05:00
Ali Beyad
3001b636db Ensures cleanup of temporary index-* generational blobs during snapshotting (#21469)
Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case it's not a file-system level move (e.g.
S3 repositories).  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob lying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first adding a UUID to the
`pending-index-*` blobs, and second strengthening the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.
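
A file-system sketch of the write-then-move pattern with both fixes applied. The `IndexBlobWriter` class and the exact pending blob name format are assumptions; only the `pending-index-*` and `index-*` prefixes come from the description above. A plain `Files.move` is used to keep the sketch portable, whereas a real repository would want an atomic move where available.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch, not the repository implementation.
final class IndexBlobWriter {
    static void writeIndexGen(Path repo, long gen, byte[] bytes) throws IOException {
        // The UUID makes the pending name unique, so a leftover from an
        // earlier failed attempt can never collide with this one.
        Path pending = repo.resolve("pending-index-" + gen + "-" + UUID.randomUUID());
        Path target = repo.resolve("index-" + gen);
        try {
            Files.write(pending, bytes);
            // Without REPLACE_EXISTING this throws if index-<gen> already exists.
            Files.move(pending, target);
        } finally {
            // No-op after a successful move; removes the pending blob if
            // the write or the move failed.
            Files.deleteIfExists(pending);
        }
    }
}
```

With this shape, a failed attempt leaves no `pending-index-*` blob behind for the next snapshot to trip over.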

Closes #21462
2016-11-10 21:45:02 -05:00
Ryan Ernst
48bfb142b9 Remove (again) test uses of onModule (#21414)
This change was reverted after it caused random test failures. This was
due to a copy/paste error in the original PR which caused the mock
version of ClusterInfoService to be used whenever the mock *ZenPing* was
used, and the real ClusterInfoService to be used when MockZenPing was
not used.
2016-11-10 16:06:14 -08:00
Areek Zillur
7ed195fe93 [TEST] Add assertBusy when checking for pending operation counter after tests
Currently, pending operations can complete after tests with a disruption scheme
complete. This commit waits for the pending operations to complete
after the tests are run.
2016-11-10 18:35:52 -05:00
Areek Zillur
5b4c3fb1ac Revert "Add trace logging when acquiring and releasing operation locks for replication requests"
This reverts commit 4e996ca9f5ae6d47f6039b1d75fd3d10a6286d64.
2016-11-10 18:35:25 -05:00
Alexander Lin
0219a211d3 Allows multiple patterns to be specified for index templates (#21009)
* Allows for an array of index template patterns to be provided to an
index template, and renames the field from 'template' to 'index_patterns'.

Closes #20690
2016-11-10 18:00:30 -05:00
Ali Beyad
5c4392e58a [TEST] fixes rebalance single shard check as it isn't guaranteed that a
rebalance makes sense and the method only tests if rebalance is allowed
2016-11-10 17:13:39 -05:00
Jason Tedor
179dd885e2 Avoid angering Log4j in TransportNodesActionTests
When logging a mock exception, Log4j attempts to render the stack
trace. On a mock exception, this will be null and Log4j will hit a
NullPointerException. This NullPointerException will get recorded in the
status logger buffer that we use to ensure that we do not have any
misuses of Log4j in production code. This commit replaces the use of a
mock exception with an actual exception to avoid angering the Log4j
assertions in ESTestCase.
2016-11-10 16:08:08 -05:00
Areek Zillur
4e996ca9f5 Add trace logging when acquiring and releasing operation locks for replication requests 2016-11-10 15:13:42 -05:00
Jason Tedor
0a06a0c2b3 Fix handler name on message not fully read
Today when a message is not fully read on a response, we log (among
other details) the handler name. Unfortunately, if the handler is a
wrapper, all that we see is

o.e.t.TransportService$ContextRestoreResponseHandler@7446ba18

completely losing the offending handler. This commit adds an override
for TransportService$ContextRestoreResponseHandler#toString so that the
underlying offender can be discovered.
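
The idea can be sketched with hypothetical stand-ins for the transport classes (`ResponseHandler` and `ContextRestoreHandler` below are illustrative names, and the exact output format is an assumption):

```java
// Hypothetical stand-in for a transport response handler.
interface ResponseHandler {
    void handleResponse(String response);
}

// Hypothetical wrapper that delegates to the real handler.
final class ContextRestoreHandler implements ResponseHandler {
    private final ResponseHandler delegate;

    ContextRestoreHandler(ResponseHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void handleResponse(String response) {
        delegate.handleResponse(response);
    }

    // Without this override a log line shows only the wrapper's class name
    // and identity hash; including the delegate surfaces the real handler.
    @Override
    public String toString() {
        return getClass().getName() + "/" + delegate;
    }
}
```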

Relates #21478
2016-11-10 14:56:48 -05:00
Jack Conradson
834976823a Remove accidental import. 2016-11-10 11:46:14 -08:00
Jason Tedor
fdbe336104 Improve log message in TransportNodesAction
Today when handling responses from nodes in TransportNodesAction, if a
node times out or some other failure occurs and the action is not
accumulating exceptions, we log a confusing message:

org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction]
ignoring unexpected response [null] of type [null], expected
[ClusterStatsNodeResponse] or [FailedNodeException]

Moreover, the original exception is completely lost. Since this log
message is confusing and unhelpful, we can drop it. Instead, we hold
onto the exception and log it at the warn level before dropping it from
the response.

Relates #21476
2016-11-10 14:32:14 -05:00
Jack Conradson
aeb97ff412 Clean up of Script.
Closes #21321
2016-11-10 09:59:13 -08:00