Commit Graph

30142 Commits

Author SHA1 Message Date
Islam Heggo f562c7f15a Correct the explanation of load time percentiles (#28510)
* Correct the explanation of load time percentiles

* Adjusting the percentile clarification

Eliminating the false sentence about majority of load time
2018-02-08 16:29:43 -08:00
Nhat Nguyen dbf9fb31e4
Do not ignore shard not-available exceptions in replication (#28571)
The shard not-available exceptions are currently ignored in the
replication as the best effort avoids failing not-yet-ready shards.
However these exceptions can also happen from fully active shards. If
this is the case, we may have skipped important failures from replicas.
Since #28049, only fully initialized shards are received write requests.
This restriction allows us to handle all exceptions in the replication.

There is a side-effect with this change. If a replica retries its peer
recovery second time after being tracked in the replication group, it
can receive replication requests even though it's not-yet-ready. That
shard may be failed and allocated to another node even though it has a
good lucene index on that node.

This PR does not change the way we report replication errors to users,
hence the shard not-available exceptions won't be reported as before.

Relates #28049
Relates #28534
2018-02-08 18:05:27 -05:00
Boaz Leskes ba59cf1262
Capture stack traces while issuing IndexShard operations permits to easy debugging (#28567)
Today we acquire a permit from the shard to coordinate between indexing operations, recoveries and other state transitions. When we leak an  permit it's practically impossible to find who the culprit is. This PR add stack traces capturing for each permit so we can identify which part of the code is responsible for acquiring the unreleased permit. This code is only active when assertions are active. 

The output is something like:
```
java.lang.AssertionError: shard [test][1] on node [node_s0] has pending operations:
--> java.lang.RuntimeException: something helpful 2
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:322)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

--> java.lang.RuntimeException: something helpful
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:311)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

```
2018-02-08 22:59:02 +01:00
Nhat Nguyen 5b8870f193
Only log warning when actually failing shards (#28558)
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
2018-02-08 15:37:52 -05:00
Ryan Ernst 578773f11b Re-enable bwc tests after backport of #28556 2018-02-08 11:53:48 -08:00
Jason Tedor 5badacf391
Fix race condition in queue size test
The queue size test has a race condition. Namely the offering thread can
run so quickly completing all of its offering iterations before the
queue size thread ever has a chance to run a single size poll
iteration. This means that the size will never actually be polled and
the test can spuriously fail. What we really want to do here, since this
test is checking for a race condition between polling the size of the
queue and offers to the queue, we want to execute each iteration in
lockstep giving the threads multiple changes for the race between
polling the size and offers to occur. This commit addresses this by
running the two threads in lockstep for multiple iterations so that they
have multiple chances to race.

Relates #28584
2018-02-08 14:23:24 -05:00
Igor Motov 80c8c3b114 Add 6.2.2 version constant 2018-02-08 13:43:34 -05:00
Lee Hinman 2e4c834a13
Switch to non-deprecated ParseField.match method for o.e.search (#28526)
* Switch to non-deprecated ParseField.match method for o.e.search

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler. It encapsulates all of the instances in the
`org.elastsicsearch.search` package.

Relates to #28504

* Address Nik's comments
2018-02-08 11:17:57 -07:00
Nhat Nguyen 0edde25f09 TEST: use new nodes assumption in testUpdateSnapshotStatus
Using an assertion is not correct here as the list of new nodes can be
empty.
2018-02-08 13:11:21 -05:00
Lee Hinman b64bf51000
Replace more deprecated ParseField.match calls with non-deprecated call (#28525)
* Replace more deprecated ParseField.match calls with non-deprecated call

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler.

Relates to #28504

* Address Nik's comments
2018-02-08 09:45:28 -07:00
Christoph Büscher 01791277cb
Test that rank_eval request parsing is not lenient (#28516)
Parsing of a ranking evaluation request and its subcomponents should throw parsing
errors on unknown fields. This change adds tests for this and changes the parser 
behaviour in cases where it is needed.
2018-02-08 17:38:45 +01:00
Ryan Ernst a55eda626f
Plugins: Store elasticsearch and java versions in PluginInfo (#28556)
Plugin descriptors currently contain an elasticsearch version,
which the plugin was built against, and a java version, which the plugin
was built with. These versions are read and validated, but not stored.
This commit keeps them in PluginInfo so they can be used later.
While seeing the elasticsearch version is less interesting (since it is
enforced to match that of the running elasticsearc node), the java
version is interesting since we only validate the format, not the actual
version. This also makes PluginInfo have full parity with the plugin
properties file.
2018-02-08 08:31:39 -08:00
Luca Cavanna cb7159d4c9
REST high-level Client: add missing final modifiers (#28572)
A couple of methods in RestHighLevelClient were supposed to be final but the modifier was forgotten.
2018-02-08 17:14:20 +01:00
Jason Tedor 86fd48e5f5
Fix size blocking queue to not lie about its weight
Today when offering an item to a size blocking queue that is at
capacity, we first increment the size of the queue and then check if the
capacity is exceeded or not. If the capacity is indeed exceeded, we do
not add the item to the queue and immediately decrement the size of the
queue. However, this incremented size is exposed externally even though
the offered item was never added to the queue (this is effectively a
race on the size of the queue). This can lead to misleading statistics
such as the size of a queue backing a thread pool. This commit fixes
this issue so that such a size is never exposed. To do this, we replace
the hidden CAS loop that increments the size of the queue with a CAS
loop that only increments the size of the queue if we are going to be
successful in adding the item to the queue.

Relates #28557
2018-02-08 06:54:39 -05:00
Tim Brooks 16f7e00514
Improve testTransportStatsWithException test (#28554)
This commit modifies the transport stats with exception test to remove
the requirement that we calculate the published address size when
comparing bytes received. This is tricky and is currently broken as we
also place the address string in the transport exception, however we do
not adjust the bytes for that.

The solution in this commit is to just serialize the transport exception
in the test and use that for the calculation.
2018-02-07 14:31:42 -07:00
Jason Tedor 666c4f9414
Index shard should roll generation via the engine
Today when a replica shard detects a new primary shard (via a primary
term transition), we roll the translog generation. However, the
mechanism that we are using here is by reaching through the engine to
the translog directly. By poking all the way through rather than asking
the engine to manage the roll for us we miss:
 - taking a read lock in the engine while the roll is occurring
 - trimming unreferenced readers

This commit addresses this by asking the engine to roll the translog
generation for us.

Relates #28537
2018-02-07 14:57:50 -05:00
Martijn van Groningen 2023c98bea
Added more parameter to PersistentTaskPlugin#getPersistentTasksExecutor(...) 2018-02-07 17:43:35 +01:00
Christoph Büscher c0886cf7c6
[Tests] Relax assertion in SuggestStatsIT (#28544)
The test expects suggest times in milliseconds that are strictly
positive. Internally they are measured in nanos, it is possible that on
really fast execution this is rounded to 0L, so this should also be an
accepted value.

Closes #28543
2018-02-07 17:26:08 +01:00
Christoph Büscher 305b87b4b7
Make internal Rounding fields final (#28532)
The fields in the internal rounding classes can be made final with very minor
adjustments to how they are read from a StreamInput.
2018-02-07 09:33:21 +01:00
Jason Tedor c2fcf15d9d
Fix the ability to remove old plugin
We now read the plugin descriptor when removing an old plugin. This is
to check if we are removing a plugin that is extended by another
plugin. However, when reading the descriptor we enforce that it is of
the same version that we are. This is not the case when a user has
upgraded Elasticsearch and is now trying to remove an old plugin. This
commit fixes this by skipping the version enforcement when reading the
plugin descriptor only when removing a plugin.

Relates #28540
2018-02-06 17:38:26 -05:00
Lee Hinman 64adaffe11 [TEST] Expand failure message for wildfly integration tests 2018-02-06 15:34:41 -07:00
Lee Hinman 6b4ea4e6fb Add 6.2.1 version constant 2018-02-06 12:13:24 -07:00
Yannick Welsch e6f873c620
Remove feature parsing for GetIndicesAction (#28535)
Removes dead code. Follow-up of #24723
2018-02-06 18:00:14 +01:00
Yannick Welsch c8df446000
No refresh on shard activation needed (#28013)
A shard is fully baked when it moves to POST_RECOVERY. There is no need to do an extra refresh on shard activation again as the shard has already been refreshed when it moved to POST_RECOVERY.
2018-02-06 17:29:22 +01:00
Yannick Welsch d43f0b5f26
Improve failure message when restoring an index that already exists in the cluster (#28498)
Makes the message more actionable and removes the focus on the fact that the index is open.
2018-02-06 14:24:52 +01:00
Martijn van Groningen 2a35b4ee2b
Use right skip versions.
Closes #27570
2018-02-06 12:22:42 +01:00
Ivan Brusic 38c5f4efee [Docs] Fix incomplete URLs (#28528) 2018-02-06 09:25:28 +01:00
Lee Hinman eebff4d2b3
Use non deprecated xcontenthelper (#28503)
* Move to non-deprecated XContentHelper.createParser(...)

This moves away from one of the now-deprecated XContentHelper.createParser
methods in favor of specifying the deprecation logger at parser creation time.

Relates to #28449

Note that this doesn't move all the `createParser` calls because some of them
use the already-deprecated method that doesn't specify the XContentType.

* Remove the deprecated (and now non-needed) createParser method
2018-02-05 16:18:18 -07:00
Jack Conradson 5c1d3aa2f0
Painless: Fixes a null pointer exception in certain cases of for loop usage (#28506)
The initializer and afterthought were not having their types
appropriately cast which is necessary with expressions which in turn
caused values to be popped off the stack that were null.
2018-02-05 11:57:21 -08:00
Nik Everett 5003ef18ac
Scripts: Fix security for deprecation warning (#28485)
If you call `getDates()` on a long or date type field add a deprecation
warning to the response and log something to the deprecation logger.
This *mostly* worked just fine but if the deprecation logger happens to
roll then the roll will be performed with the script's permissions
rather than the permissions of the server. And scripts don't have
permissions to, say, open files. So the rolling failed. This fixes that
by wrapping the call the deprecation logger in `doPriviledged`.

This is a strange `doPrivileged` call because it doens't check
Elasticsearch's `SpecialPermission`. `SpecialPermission` is a permission
that no-script code has and that scripts never have. Usually all
`doPrivileged` calls check `SpecialPermission` to make sure that they
are not accidentally acting on behalf of a script. But in this case we
are *intentionally* acting on behalf of a script.

Closes #28408
2018-02-03 14:56:08 -05:00
Nhat Nguyen de6d31ebc2 Backport fail shard w/o marking as stale PR to v6.3
Relates #28054
2018-02-03 12:07:39 -05:00
Nhat Nguyen 965efa51cc
Allows failing shards without marking as stale (#28054)
Currently when failing a shard we also mark it as stale (eg. remove its
allocationId from from the InSync set). However in some cases, we need 
to be able to fail shards but keep them InSync set. This commit adds
such capacity. This is a preparatory change to make the primary-replica
resync less lenient.

Relates #24841
2018-02-03 09:41:53 -05:00
Andy Bristol 13083e27da
[TEST] packaging tests: clean up Vagrantfile (#28173)
* Consolidates provision steps so it's more clear which steps are
applied to all boxes
* Removes duplicate configuration that was being stomped
* Ensure rsync, a dependency for platform steps, is installed on linux
* Ruby style changes

For #26741
2018-02-02 12:04:45 -08:00
Nhat Nguyen 875bbfe699 Backported the harden synced-flush PR to v6.3.0
Relates #28464
2018-02-02 14:31:37 -05:00
Deb Adair 459233d550 [DOCS] Fixed list formatting. 2018-02-02 11:01:40 -08:00
Ryan Ernst a139c42964
Build: Fix test task to explicitly depend on testClasses task (#28490)
Gradle 4.5 now hides immutable task dependencies. We previously copied
the existing dependencies from the builtin test task to the
randomizedtesting task. This commit adds testClasses as an extra
dependency of the randomizedtesting task, to ensure the classes are
built.
2018-02-02 10:57:01 -08:00
Jack Conradson 90c74a7e09
Remove RuntimeClass from Painless Definition in favor of just Painless Struct. (#28486) 2018-02-02 10:26:02 -08:00
David Turner ab8f5ea54c
Forbid trappy methods from java.time (#28476)
ava.time has the functionality needed to deal with timezones with varying 
offsets correctly, but it also has a bunch of methods that silently let you
forget about the hard cases, which raises the risk that we'll quietly do the
wrong thing at some point in the future.

This change adds the trappy methods to the list of forbidden methods to try and
help stop this from happening.

It also fixes the only use of these methods in the codebase so far:
IngestDocument#deepCopy() used ZonedDateTime.of() which may alter the offset of
the given time in cases where the offset is ambiguous.
2018-02-02 18:24:02 +00:00
Lee Hinman 3ddea8d8d2
Start switching to non-deprecated ParseField.match method (#28488)
This commit switches all the modules and server test code to use the
non-deprecated `ParseField.match` method, passing in the parser's deprecation
handler or the logging deprecation handler when a parser is not available (like
in tests).

Relates to #28449
2018-02-02 10:10:13 -07:00
Nhat Nguyen 5f2121960e
Synced-flush should not seal index of out of sync replicas (#28464)
Today the correctness of synced-flush is guaranteed by ensuring that 
there is no ongoing indexing operations on the primary. Unfortunately, a
replica might fall out of sync with the primary even the condition is
met. Moreover, if synced-flush mistakenly issues a sync_id for an out of
sync replica, then that replica would not be able to recover from the 
primary. ES prevents that peer-recovery because it detects that both
indexes from primary and replica were sealed with the same sync_id but
have a different content. This commit modifies the synced-flush to not
issue sync_id for out of sync replicas. This change will report the
divergence issue earlier to users and also prevent replicas from getting
into the "unrecoverable" state.

Relates #10032
2018-02-02 11:20:38 -05:00
Luca Cavanna 075fdc579f
[DOCS] Update APIs grouping and ordering in REST high-level Client docs (#28497)
With this new structure supported APIs are ordered and grouped in the same way as they are in the Elasticsearch reference docs
2018-02-02 17:19:50 +01:00
Jim Ferenczi 7c2bcf3953
Mark synonym_graph as beta in the docs (#28496)
We do want to keep this functionality in the future and we provide support for it.
This change is a first step towards replacing the `synonym` token filter with `synonym_graph`.
2018-02-02 16:33:48 +01:00
Jim Ferenczi 88f4c1c03a
Add a test for sub-aggregations rewrite (#28491)
This commit adds a test to check that the rewrite of a sub-aggregation triggers a copy of the parent aggregation.

Relates #28430
Closes #27782
2018-02-02 16:04:33 +01:00
javanna b5986c8dce fix checkstyle error in SearchDocumentationIT 2018-02-02 14:05:30 +01:00
Clinton Gormley 45c1e37740 Add defined ID to terms agg size header 2018-02-02 13:43:20 +01:00
javanna 9d11cfd652 [DOCS] Remove rawtypes suppressions and fix violations in REST high-level client docs tests 2018-02-02 11:57:55 +01:00
javanna 174e243f83 [DOCS] Adapt indices exists docs after recent listener changes 2018-02-02 11:46:05 +01:00
Haris Osmanagić 897ef458f3 Add support for indices exists to REST high level client (#27384)
Relates to #27205
2018-02-02 11:25:36 +01:00
Yannick Welsch 031415a5f6
Replicate writes only to fully initialized shards (#28049)
The primary currently replicates writes to all other shard copies as soon as they're added to the routing table. Initially those shards are not even ready yet to receive these replication requests, for example when undergoing a file-based peer recovery. Based on the specific stage that the shard copies are in, they will throw different kinds of exceptions when they receive the replication requests. The primary then ignores responses from shards that match certain exception types. With this mechanism it's not possible for a primary to distinguish between a situation where a replication target shard is not allocated and ready yet to receive requests and a situation where the shard was successfully allocated and active but subsequently failed.
This commit changes replication so that only initializing shards that have successfully opened their engine are used as replication targets. This removes the need to replicate requests to initializing shards that are not even ready yet to receive those requests. This saves on network bandwidth and enables features that rely on the distinction between a "not-yet-ready" shard and a failed shard.
2018-02-02 11:13:07 +01:00
Christoph Büscher bc10334f7a
[Docs] Move callouts in range.asciidoc (#28264)
Currently the callouts for this section are below all the examples, making it
harder to relate them to the snippets. Instead they should be moved closer 
to the examples.
2018-02-02 11:00:07 +01:00