Commit Graph

6587 Commits

Author SHA1 Message Date
Boaz Leskes c10a6ddec1 IndexService#maybeRefresh should catch `IndexShardClosedException` (#21205)
We throw this exception in some cases that the shard is closed, so we have to be consistent here. Otherwise we get logs like:

```
 1> [2016-10-30T21:06:22,529][WARN ][o.e.i.IndexService       ] [node_s_0] [test] failed to run task refresh - suppressing re-occurring exceptions unless the exception changes
 1> org.elasticsearch.index.shard.IndexShardClosedException: CurrentState[CLOSED] operation only allowed when not closed
 1> 	at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1147) ~[main/:?]
 1> 	at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1141) ~[main/:?]
```
2016-10-31 20:04:33 +01:00
Yannick Welsch d7d5909e69 Disconnect from newly added nodes if cluster state publishing fails (#21197)
Before publishing a cluster state the master connects to the nodes that are added in the cluster state. When publishing fails, however, it does not disconnect from these nodes, leaving NodeConnectionsService out of sync with the currently applied cluster state.
2016-10-31 15:09:43 +01:00
Yannick Welsch 37228f924a [TEST] Use assertBusy to check assertMaster property in presence of a low publish timeout
The assertion assertMaster checks if all nodes have each other in the cluster state and the correct master set.
It is usually called after a disruption has been healed and ensureStableCluster been called. In presence of a low
publish timeout of 1s in this test class, publishing might not be fully done even after ensureStableCluster returns.
This commit adds an assertBusy to assertMaster so that the node has a bit more time to apply the cluster state from
the master, even if it's a bit slow.
2016-10-31 14:04:18 +01:00
Boaz Leskes e7cfe101e4 Retrying replication requests on replica doesn't call `onRetry` (#21189)
Replication request may arrive at a replica before the replica's node has processed a required mapping update. In these cases the TransportReplicationAction will retry the request once a new cluster state arrives. Sadly that retry logic failed to call `ReplicationRequest#onRetry`, causing duplicates in the append only use case.

This commit fixes this and also the test which missed the check. I also added an assertion which would have helped finding the source of the duplicates.

This was discovered by https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/174/

Relates #20211
2016-10-31 13:43:55 +01:00
Igor Motov d731a330aa Tests: Add addtional logging to SearchCancellationIT tests 2016-10-28 11:29:49 -10:00
Boaz Leskes b9691d15ae IndexWithShadowReplicasIT.testReplicaToPrimaryPromotion should wait for node leave to be processed 2016-10-28 20:22:24 +02:00
Adrien Grand b3cc54cf0d Upgrade to lucene-6.3.0-snapshot-ed102d6 (#21150)
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
2016-10-28 14:47:15 +02:00
Adrien Grand 9cbbddb6dc Add support for `quote_field_suffix` to `simple_query_string`. (#21060)
Closes #18641
2016-10-28 09:11:57 +02:00
Simon Willnauer 97cc426a89 Fix bwc cluster formation in order to run BWC tests against a mixed version cluster (#21145)
This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework `indices.rollover` tests are currently
disabled for the BWC case since they select the current master as the merge node which
happens to be a BWC node and we can't relocate all shards to it since the primaries are on
a higher version node. This will be fixed in a followup.

Closes #21142

Note: This has been cherry-picked from 5.0 and fixes several rest tests
as well as a BWC break in `OsStats.java`
2016-10-27 17:03:53 +02:00
markharwood 9944a594b1 Aggregations fix: scripted heuristics for scoring significant_terms aggs were not thread safe when running local to the coordinating node. New code spawns an object for each shard search execution rather than sharing a common instance which is not thread safe.
Closes #18120
2016-10-27 13:56:48 +01:00
Yannick Welsch f3e578f942 Stop delaying existing requests after network delay rule is cleared (#21129)
The network disruption type "network delay" continues delaying existing requests even after the disruption has been cleared. This commit ensures that the requests get to execute right after the delay rule is cleared.
2016-10-27 13:48:17 +02:00
Yannick Welsch 952097b1c0 [TEST] Fix testDelayShards to wait for master to remove stopped node
This test failed when the node that was shutting down was not yet removed from the cluster state on the master.
The cluster allocation explain API will not see any unassigned shards until the node shutting down is removed from the
cluster state.
2016-10-27 12:02:00 +02:00
Yannick Welsch 118913b553 [TEST] Fix testRolloverConditionsNotMet to expect correct rollover index name
PR #21138 changed the target index name even if _rollover conditions are not met but missed to adapt this test.
2016-10-27 11:00:44 +02:00
Jun Ohtani a66c76eb44 Merge pull request #20704 from johtani/remove_request_params_in_analyze_api
Removing request parameters in _analyze API
2016-10-27 17:43:18 +09:00
Simon Willnauer e745015325 Return target index name even if _rollover conditions are not met (#21138)
Today we return the old index name as the target / new index name.
This change passes the correct rollover index name to the response.
2016-10-27 09:20:46 +02:00
Jack Conradson 512a77a633 Refactor ScriptType to be a top-level class. 2016-10-26 10:21:22 -07:00
Ali Beyad c88452dc80 Abort snapshots on a node that leaves the cluster (#21084)
Previously, if a node left the cluster (for example, due to a long GC),
during a snapshot, the master node would mark the snapshot as failed, but
the node itself could continue snapshotting the data on its shards to the
repository. If the node rejoins the cluster, the master may assign it to
hold the replica shard (where it held the primary before getting kicked off
the cluster). The initialization of the replica shard would repeatedly fail
with a ShardLockObtainFailedException until the snapshot thread finally
finishes and relinquishes the lock on the Store.

This commit resolves the situation by ensuring that when a shard is removed
from a node (such as when a node rejoins the cluster and realizes it no longer
holds the active shard copy), any snapshotting of the removed shards is aborted.
In the scenario above, when the node rejoins the cluster, it will see in the cluster 
state that the node no longer holds the primary shard, so IndicesClusterStateService
will remove the shard, thereby causing any snapshots of that shard to be aborted.

Closes #20876
2016-10-26 10:04:50 -04:00
Yannick Welsch e82a1f5cca Only allow the master to update the list of nodes in the cluster state (#21092)
The cluster state on a node is updated either
- by incoming cluster states that are received from the active master or
- by the node itself when it notices that the master has gone.

In the second case, the node adds the NO_MASTER_BLOCK and removes the current master as active master from its cluster state. In one particular case, it would also update the list of nodes, removing the master node that just failed. In the future, we want a clear separation between actions that can be executed by a master publishing a cluster state and a node locally updating its cluster state when no active master is around.
2016-10-26 09:24:03 +02:00
Igor Motov e6dda02c66 Tests: silence cancelling scroll search tests
Investigating it locally
2016-10-25 20:06:05 -10:00
Igor Motov 6fe3bd817b Tests: make sure that 2 segments are created in SearchCancellationTests
Otherwise, the test fails if forced merge kicks in.
2016-10-25 20:05:49 -10:00
Jason Tedor 9c3e4d6e22 Add correct Content-Length on HEAD requests
This commit fixes responses to HEAD requests so that the value of the
Content-Length is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length if the request were not a
HEAD request.

This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.

Relates #21123
2016-10-25 23:08:19 -04:00
Igor Motov 17ad88d539 Makes search action cancelable by task management API
Long running searches now can be cancelled using standard task cancellation mechanism.
2016-10-25 12:27:34 -10:00
Britta Weber 7945894ede Remove unused interface InitialStateDiscoveryListener (#21115) 2016-10-25 18:29:23 +02:00
Jason Tedor 1bc08ff1e5 Fix empty <p> tag warning in o/e/m/o/OsProbe.java
This commit fixes an empty <p> tag warning in o/e/m/o/OsProbe.java.
2016-10-25 08:33:15 -04:00
Jason Tedor b89c5aff51 Add preformatted tags to Javadoc in OsProbe
This commit adds preformatted tags to the Javadoc for
OsProbe#readSysFsCgroupCpuAcctCpuStat to render the form of the cpu.stat
file in a fixed-width font.
2016-10-25 08:19:10 -04:00
Jason Tedor 9a6c81c9f1 Mock areCgroupStatsAvailable in OsProbeTests
When acquiring cgroup stats, we check if such stats are available by
invoking a method areCgroupStatsAvailable. This method checks
availability by looking for existence of some virtual files in
/proc/self/cgroup and /sys/fs/cgroups. If these stats are not available,
the getCgroup method returns null. The OsProbeTests#testCgroupProbe did
not account for this. On some systems where tests run, the cgroup stats
might not be available yet this test method was expecting them to be (we
mock the relevant virtual file reads). This commit handles the execution
of this test on such systems by overriding the behavior of
OsProbe#areCgroupStatsAvailable. We test both the possibility of this
method returning true as well as false.
2016-10-24 19:59:48 -04:00
Jason Tedor de241f441d Remove unused import from o/e/m/o/OsProbe.java
This commit removes an unused import from o/e/m/o/OsProbe.java.
2016-10-24 16:40:41 -04:00
Jason Tedor 900ee0536e Strengthen handling of unavailable cgroup stats
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.

Relates #21094
2016-10-24 16:36:51 -04:00
Christoph Büscher e8a3225719 Tests: Fix compile issue with type inference on java 9 build 2016-10-24 19:59:08 +02:00
Christoph Büscher a43f70522c Tests: fix issue with SliceBuilderTests creation of mutated test objects 2016-10-24 18:48:18 +02:00
Li Weinan d4e42b77a5 .es_temp_file remains after system crash, causing it not to start again #21007
When system starts, it creates a temporary file named .es_temp_file to ensure the data directories are writable.

If system crashes after creating the .es_temp_file but before deleting this file, next time the system will not be able to start because the Files.createFile(resolve) will throw an exception if the file already exists.
2016-10-24 16:41:36 +02:00
Christoph Büscher f6f129b21f Consolidate code for equals/hashCode testing in central utility class
Currently test that check that equals() and hashCode() are working as expected
for classes implementing them are quiet similar. This change moves common
assertions in this method to a common utility class. In addition, another common
utility function in most of these test classes that creates copies of input
object by running them through a StreamOutput and reading them back in, is moved
to ESTestCase so it can be shared across all these classes.

Closes #20629
2016-10-24 15:50:40 +02:00
Jason Tedor 3d642ab0eb Add basic cgroup CPU metrics
This commit adds basic cgroup CPU metrics to the node stats API.

Relates #21029
2016-10-24 08:26:56 -04:00
Simon Willnauer 0a410d3916 Pass executor name to request interceptor to support async intercept calls (#21089)
Today the request interceptor can't support async calls since the response
of the async call would execute on a different thread ie. a client or listener
thread. This means in-turn that the intercepted handler is not executed with the
thread it was supposed to run and therefor can, if it's executing blocking
operations, potentially deadlock an entire server.
2016-10-24 13:57:07 +02:00
Tanguy Leroux 127b4a8efc Change permissions on config files (#20966)
This commit changes some default file permissions on configuration files.
2016-10-24 09:42:03 +02:00
Igor Motov 04c7665432 Fix NPE in SearchContext.toString()
Fixes NPE in SearchContext.toString() for user requests that contain scroll id but not scroll timeout.
2016-10-21 12:49:46 -10:00
Nik Everett 8cc22eb960 Make sure HEAD / has 0 Content-Length (#21077)
Before this commit `curl -XHEAD localhost:9200?pretty` would return
`Content-Length: 1` and a body which is fairly upsetting to standards
compliant tools. Now it'll return `Content-Length: 0` with an empty
body like every other `HEAD` request.

Relates to #21075
2016-10-21 16:44:50 -04:00
Ali Beyad 3d2e885157 Separates decision making from decision application in BalancedShardsAllocator (#20634)
Refactors the BalancedShardsAllocator to create a method that
provides an allocation decision for allocating a single
unassigned shard or a single started shard that can no longer
remain on its current node.  Having a separate method that
provides a detailed decision on the allocation of a single shard
will enable the cluster allocation explain API to directly
invoke these methods to provide allocation explanations.
2016-10-21 15:33:27 -04:00
Christoph Büscher 8329bf145a Tests: Add test for parsing InnerHits with highlight query
This adds a test from #21065 that checks correct highlighting of inner hits of a
has-child query when using a nested highlight query.
2016-10-21 20:44:24 +02:00
Adrien Grand d88239ba63 `ip_range` aggregation should accept null bounds. (#21043)
* `ip_range` aggregation should accept null bounds.

Closes #21006

* test

* iter
2016-10-21 14:39:00 +02:00
Jason Tedor 3b2eff665e Fix typo in exception message in RestGetAction
This commit fixes a duplicated word in an exception message in
RestGetAction.
2016-10-21 07:45:33 -04:00
Jim Ferenczi 05915357c9 Set subSearchContext.topDocs after the rescoring in TopDocsAggs
This change fixes a bug introduced in https://github.com/elastic/elasticsearch/pull/20978
The top docs should be set in the subSearchContext after the rescoring
2016-10-21 11:01:17 +02:00
Igor Motov 441320b734 Remove cluster.routing.allocation.snapshot.relocation_enabled setting
This experimental setting enables relocation of shards that are being snapshotted, which can cause the shard allocation failures. This setting is undocumented and there is no good reason to set it in production.
2016-10-20 14:19:12 -10:00
Jason Tedor 3c7c8723ff Cleanup load average handling
This commit cleans up the code handling load averages in OsProbe:
 - remove support for BSD; we do not support this OS
 - add Javadocs
 - strengthen assertions and testing
 - add debug logging for exceptional situation

Relates #21037
2016-10-20 15:39:46 -04:00
Ryan Ernst 60353a245a Plugins: Make UnicastHostsProvider extension pull based (#21036)
This change moves providing UnicastHostsProvider for zen discovery to be
pull based, adding a getter in DiscoveryPlugin. A new setting is added,
discovery.zen.hosts_provider, to separate the discovery type from the
hosts provider for zen when it is selected. Unfortunately existing
plugins added ZenDiscovery with their own name in order to just provide
a hosts provider, so there are already many users setting the hosts
provider through discovery.type. This change also includes backcompat,
falling back to discovery.type when discovery.zen.hosts_provider is not
set.
2016-10-20 09:13:59 -07:00
markharwood 4a815bf665 Test fix - configure script object fully before making available. Hopefully a fix for issue 18120 but have been unable to reproduce so cannot confirm. 2016-10-20 14:27:52 +01:00
Jim Ferenczi e04ee40f2c Add specialization of TermsQuery for _type disjunctions 2016-10-20 15:10:45 +02:00
Jim Ferenczi 1b822cc7ef Rescorer should be applied in the TopHits aggregation (#20978)
When using a top hits aggregation the rescorer are ignored.
This change applies the rescorer to the top hits of each bucket.

Fixes #19317
2016-10-20 12:50:49 +02:00
Jim Ferenczi adb30ac091 Max score should be updated when a rescorer is used (#20977)
The max score returned in the response of a query does not take rescorer into account.
This change updates the max_score when a rescorer is used in a query.
Fixes #20651
2016-10-20 12:38:28 +02:00
Jim Ferenczi d0bbe89c16 Optimize query with types filter in the URL (t/t/_search) (#20979)
This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).
2016-10-20 12:33:32 +02:00