6974 Commits

Author SHA1 Message Date
Boaz Leskes
523f7ea71e Fix a racing condition in MockTransportService#addUnresponsiveRule where a request can be delayed even if the rule was removed.
Relates to #21129

Also properly reset DiscoveryWithServiceDisruptionsIT#disableBeforeIndexDeletion
2016-11-01 14:08:18 +01:00
Jay Modi
6e7e89159b ensure the XContentBuilder is always closed in RestBuilderListener
There may be cases where the XContentBuilder is not used and therefore it never gets
closed, which can cause a leak of bytes. This change moves the creation of the builder
into a try with resources block and adds an assertion to verify that we always consume
the bytes in our code; the try-with resources provides protections against memory leaks
caused by plugins, which do not test this.
2016-11-01 09:02:05 -04:00
Areek Zillur
02ecff13e4 incorporate feedback 2016-10-31 23:50:09 -04:00
Jack Conradson
185dff7346 Cleanup ScriptType (#21179)
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
2016-10-31 13:48:51 -07:00
Boaz Leskes
c10a6ddec1 IndexService#maybeRefresh should catch IndexShardClosedException (#21205)
We throw this exception in some cases that the shard is closed, so we have to be consistent here. Otherwise we get logs like:

```
 1> [2016-10-30T21:06:22,529][WARN ][o.e.i.IndexService       ] [node_s_0] [test] failed to run task refresh - suppressing re-occurring exceptions unless the exception changes
 1> org.elasticsearch.index.shard.IndexShardClosedException: CurrentState[CLOSED] operation only allowed when not closed
 1> 	at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1147) ~[main/:?]
 1> 	at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1141) ~[main/:?]
```
2016-10-31 20:04:33 +01:00
Areek Zillur
eafd3dfc55 Merge branch 'master' into enhancement/replicate_primary_write_failures 2016-10-31 13:06:21 -04:00
Yannick Welsch
d7d5909e69 Disconnect from newly added nodes if cluster state publishing fails (#21197)
Before publishing a cluster state the master connects to the nodes that are added in the cluster state. When publishing fails, however, it does not disconnect from these nodes, leaving NodeConnectionsService out of sync with the currently applied cluster state.
2016-10-31 15:09:43 +01:00
Yannick Welsch
37228f924a [TEST] Use assertBusy to check assertMaster property in presence of a low publish timeout
The assertion assertMaster checks if all nodes have each other in the cluster state and the correct master set.
It is usually called after a disruption has been healed and ensureStableCluster been called. In presence of a low
publish timeout of 1s in this test class, publishing might not be fully done even after ensureStableCluster returns.
This commit adds an assertBusy to assertMaster so that the node has a bit more time to apply the cluster state from
the master, even if it's a bit slow.
2016-10-31 14:04:18 +01:00
Boaz Leskes
e7cfe101e4 Retrying replication requests on replica doesn't call onRetry (#21189)
Replication request may arrive at a replica before the replica's node has processed a required mapping update. In these cases the TransportReplicationAction will retry the request once a new cluster state arrives. Sadly that retry logic failed to call `ReplicationRequest#onRetry`, causing duplicates in the append only use case.

This commit fixes this and also the test which missed the check. I also added an assertion which would have helped finding the source of the duplicates.

This was discovered by https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/174/

Relates #20211
2016-10-31 13:43:55 +01:00
Igor Motov
d731a330aa Tests: Add addtional logging to SearchCancellationIT tests 2016-10-28 11:29:49 -10:00
Boaz Leskes
b9691d15ae IndexWithShadowReplicasIT.testReplicaToPrimaryPromotion should wait for node leave to be processed 2016-10-28 20:22:24 +02:00
Adrien Grand
b3cc54cf0d Upgrade to lucene-6.3.0-snapshot-ed102d6 (#21150)
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
2016-10-28 14:47:15 +02:00
Adrien Grand
9cbbddb6dc Add support for quote_field_suffix to simple_query_string. (#21060)
Closes #18641
2016-10-28 09:11:57 +02:00
Areek Zillur
2f883fcb85 Rethrow original exception when it fails the engine during write operations 2016-10-27 16:38:15 -04:00
Simon Willnauer
97cc426a89 Fix bwc cluster formation in order to run BWC tests against a mixed version cluster (#21145)
This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework `indices.rollover` tests are currently
disabled for the BWC case since they select the current master as the merge node which
happens to be a BWC node and we can't relocate all shards to it since the primaries are on
a higher version node. This will be fixed in a followup.

Closes #21142

Note: This has been cherry-picked from 5.0 and fixes several rest tests
as well as a BWC break in `OsStats.java`
2016-10-27 17:03:53 +02:00
markharwood
9944a594b1 Aggregations fix: scripted heuristics for scoring significant_terms aggs were not thread safe when running local to the coordinating node. New code spawns an object for each shard search execution rather than sharing a common instance which is not thread safe.
Closes #18120
2016-10-27 13:56:48 +01:00
Yannick Welsch
f3e578f942 Stop delaying existing requests after network delay rule is cleared (#21129)
The network disruption type "network delay" continues delaying existing requests even after the disruption has been cleared. This commit ensures that the requests get to execute right after the delay rule is cleared.
2016-10-27 13:48:17 +02:00
Yannick Welsch
952097b1c0 [TEST] Fix testDelayShards to wait for master to remove stopped node
This test failed when the node that was shutting down was not yet removed from the cluster state on the master.
The cluster allocation explain API will not see any unassigned shards until the node shutting down is removed from the
cluster state.
2016-10-27 12:02:00 +02:00
Yannick Welsch
118913b553 [TEST] Fix testRolloverConditionsNotMet to expect correct rollover index name
PR #21138 changed the target index name even if _rollover conditions are not met but missed to adapt this test.
2016-10-27 11:00:44 +02:00
Jun Ohtani
a66c76eb44 Merge pull request #20704 from johtani/remove_request_params_in_analyze_api
Removing request parameters in _analyze API
2016-10-27 17:43:18 +09:00
Simon Willnauer
e745015325 Return target index name even if _rollover conditions are not met (#21138)
Today we return the old index name as the target / new index name.
This change passes the correct rollover index name to the response.
2016-10-27 09:20:46 +02:00
Areek Zillur
947a17ee37 cleanup operation listener handling of failure in results 2016-10-27 00:26:01 -04:00
Areek Zillur
7fb44a3ab6 add tests 2016-10-27 00:26:01 -04:00
Areek Zillur
fa3ee6b996 Incorporate feedback 2016-10-27 00:25:55 -04:00
Jack Conradson
512a77a633 Refactor ScriptType to be a top-level class. 2016-10-26 10:21:22 -07:00
Areek Zillur
a3fcfe8196 add constructor overloads for primary result 2016-10-26 12:07:32 -04:00
Areek Zillur
65832b987f Revert "cleanup indexing operation listener"
This reverts commit bb785483ae41e30a756a17f3fb6cdb5d06a1ad6c.
2016-10-26 11:23:09 -04:00
Ali Beyad
c88452dc80 Abort snapshots on a node that leaves the cluster (#21084)
Previously, if a node left the cluster (for example, due to a long GC),
during a snapshot, the master node would mark the snapshot as failed, but
the node itself could continue snapshotting the data on its shards to the
repository. If the node rejoins the cluster, the master may assign it to
hold the replica shard (where it held the primary before getting kicked off
the cluster). The initialization of the replica shard would repeatedly fail
with a ShardLockObtainFailedException until the snapshot thread finally
finishes and relinquishes the lock on the Store.

This commit resolves the situation by ensuring that when a shard is removed
from a node (such as when a node rejoins the cluster and realizes it no longer
holds the active shard copy), any snapshotting of the removed shards is aborted.
In the scenario above, when the node rejoins the cluster, it will see in the cluster 
state that the node no longer holds the primary shard, so IndicesClusterStateService
will remove the shard, thereby causing any snapshots of that shard to be aborted.

Closes #20876
2016-10-26 10:04:50 -04:00
Yannick Welsch
e82a1f5cca Only allow the master to update the list of nodes in the cluster state (#21092)
The cluster state on a node is updated either
- by incoming cluster states that are received from the active master or
- by the node itself when it notices that the master has gone.

In the second case, the node adds the NO_MASTER_BLOCK and removes the current master as active master from its cluster state. In one particular case, it would also update the list of nodes, removing the master node that just failed. In the future, we want a clear separation between actions that can be executed by a master publishing a cluster state and a node locally updating its cluster state when no active master is around.
2016-10-26 09:24:03 +02:00
Igor Motov
e6dda02c66 Tests: silence cancelling scroll search tests
Investigating it locally
2016-10-25 20:06:05 -10:00
Igor Motov
6fe3bd817b Tests: make sure that 2 segments are created in SearchCancellationTests
Otherwise, the test fails if forced merge kicks in.
2016-10-25 20:05:49 -10:00
Jason Tedor
9c3e4d6e22 Add correct Content-Length on HEAD requests
This commit fixes responses to HEAD requests so that the value of the
Content-Length is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length if the request were not a
HEAD request.

This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.

Relates #21123
2016-10-25 23:08:19 -04:00
Igor Motov
17ad88d539 Makes search action cancelable by task management API
Long running searches now can be cancelled using standard task cancellation mechanism.
2016-10-25 12:27:34 -10:00
Britta Weber
7945894ede Remove unused interface InitialStateDiscoveryListener (#21115) 2016-10-25 18:29:23 +02:00
Areek Zillur
c237263ad1 fix computing took for write operation result 2016-10-25 10:42:09 -04:00
Areek Zillur
7a6f56a692 fix tests 2016-10-25 10:22:32 -04:00
Areek Zillur
1ad1e2730d fix wildcard import 2016-10-25 10:00:44 -04:00
Areek Zillur
64a897e5f2 add setters for translog location and took in engine operation result 2016-10-25 09:58:14 -04:00
Areek Zillur
bb785483ae cleanup indexing operation listener 2016-10-25 09:33:04 -04:00
Areek Zillur
168946ad5a Improve documentation for handling write operation failure 2016-10-25 09:22:49 -04:00
Areek Zillur
1aee578aa1 add operation result as a parameter to postIndex/delete in indexing operation listener 2016-10-25 09:12:39 -04:00
Areek Zillur
1587a77ffd Revert "Generify index shard method to execute engine write operation"
This reverts commit 1bdeada8aa9e699d5c864d7257fea926426ce474.
2016-10-25 09:11:16 -04:00
Jason Tedor
1bc08ff1e5 Fix empty <p> tag warning in o/e/m/o/OsProbe.java
This commit fixes an empty <p> tag warning in o/e/m/o/OsProbe.java.
2016-10-25 08:33:15 -04:00
Jason Tedor
b89c5aff51 Add preformatted tags to Javadoc in OsProbe
This commit adds preformatted tags to the Javadoc for
OsProbe#readSysFsCgroupCpuAcctCpuStat to render the form of the cpu.stat
file in a fixed-width font.
2016-10-25 08:19:10 -04:00
Jason Tedor
9a6c81c9f1 Mock areCgroupStatsAvailable in OsProbeTests
When acquiring cgroup stats, we check if such stats are available by
invoking a method areCgroupStatsAvailable. This method checks
availability by looking for existence of some virtual files in
/proc/self/cgroup and /sys/fs/cgroups. If these stats are not available,
the getCgroup method returns null. The OsProbeTests#testCgroupProbe did
not account for this. On some systems where tests run, the cgroup stats
might not be available yet this test method was expecting them to be (we
mock the relevant virtual file reads). This commit handles the execution
of this test on such systems by overriding the behavior of
OsProbe#areCgroupStatsAvailable. We test both the possibility of this
method returning true as well as false.
2016-10-24 19:59:48 -04:00
Jason Tedor
de241f441d Remove unused import from o/e/m/o/OsProbe.java
This commit removes an unused import from o/e/m/o/OsProbe.java.
2016-10-24 16:40:41 -04:00
Jason Tedor
900ee0536e Strengthen handling of unavailable cgroup stats
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.

Relates #21094
2016-10-24 16:36:51 -04:00
Christoph Büscher
e8a3225719 Tests: Fix compile issue with type inference on java 9 build 2016-10-24 19:59:08 +02:00
Christoph Büscher
a43f70522c Tests: fix issue with SliceBuilderTests creation of mutated test objects 2016-10-24 18:48:18 +02:00
Li Weinan
d4e42b77a5 .es_temp_file remains after system crash, causing it not to start again #21007
When system starts, it creates a temporary file named .es_temp_file to ensure the data directories are writable.

If system crashes after creating the .es_temp_file but before deleting this file, next time the system will not be able to start because the Files.createFile(resolve) will throw an exception if the file already exists.
2016-10-24 16:41:36 +02:00