Commit Graph

242 Commits

Author SHA1 Message Date
Robin Neatherway 282974215c MetaDataIndexAliasesService wrong get type (#28614)
A get of the wrong type would always have returned null so these
indices would have been inserted into the map repeatedly.
2018-02-12 15:55:17 -08:00
Robin Neatherway 3174b2cbfa NXYSignificanceHeuristic equality update (#28616)
* NXYSignificanceHeuristic.java: implementation of equality would have
  failed with a ClassCastException when comparing to another type.
  Replaced with the Eclipse generated form.
2018-02-12 15:51:57 -08:00
Robin Neatherway 68b7a5c281 Fix DeadlockAnalyzer printer (#28615)
Remove `if` block that was always true.
2018-02-12 15:28:56 -08:00
Nhat Nguyen 9eb9ce3843
Require translogUUID when reading global checkpoint (#28587)
Today we use the persisted global checkpoint to calculate the starting 
seqno in peer-recovery. However we do not check whether the translog 
actually belongs to the existing Lucene index when reading the global
checkpoint. In some rare cases if the translog does not match the Lucene
index, that recovering replica won't be able to complete its recovery.
This can happen as follows.

1. Replica executes a file-based recovery
2. Index files are copied to replica but crashed before finishing the recovery
3. Replica starts recovery again with seq-based as the copied commit is safe
4. Replica fails to open engine because translog and Lucene index are not matched
5. Replica won't be able to recover from primary

This commit enforces the translogUUID requirement when reading the 
global checkpoint directly from the checkpoint file.

Relates #28435
2018-02-12 13:23:32 -05:00
Lee Hinman 6538542603
Switch to hardcoding Smile as the state format (#28610)
This commit changes the state format that was previously passed in to
`MetaDataStateFormat` to always use Smile. This doesn't actually change the
format, since we have used Smile for writing the format since at least 5.0. This
removes the automatic detection of the state format when reading state, since
any state that could be processed in 6.x and 7.x would already have been written
in Smile format.

This is work towards removing the deprecated methods in the XContent code where
we do automatic content-type detection.

Relates to #28504
2018-02-12 08:07:01 -07:00
Jim Ferenczi e6a8528554
Force depth_first mode execution for terms aggregation under a nested context (#28421)
This commit forces the depth_first mode for `terms` aggregation that contain a sub-aggregation that need to access the score of the document
in a nested context (the `terms` aggregation is a child of a `nested` aggregation). The score of children documents is not accessible in
breadth_first mode because the `terms` aggregation cannot access the nested context.

Close #28394
2018-02-12 13:38:11 +01:00
Jim Ferenczi 7dc00ef1f5
Search option terminate_after does not handle post_filters and aggregations correctly (#28459)
* Search option terminate_after does not handle post_filters and aggregations correctly

This change fixes the handling of the `terminate_after` option when post_filters (or min_score) are used.
`post_filter` should be applied before `terminate_after` in order to terminate the query when enough document are accepted
by the post_filters.
This commit also changes the type of exception thrown by `terminate_after` in order to ensure that multi collectors (aggregations)
do not try to continue the collection when enough documents have been collected.

Closes #28411
2018-02-12 13:36:33 +01:00
Ke Li 55448b2630 [Tests] Remove unnecessary condition check (#28559)
The condition value in question is true, regardless of the randomBoolean() value.
This change simplifies this removing the condition blocks.
2018-02-12 11:33:19 +01:00
Boaz Leskes 4aece92b2c
IndexShardOperationPermits: shouldn't use new Throwable to capture stack traces (#28598)
The is a follow up to #28567 changing the method used to capture stack traces, as requested
during the review. Instead of creating a throwable, we explicitly capture the stack trace of the
current thread. This should Make Jason Happy Again ™️ .
2018-02-12 10:33:13 +01:00
Michael Basnight e0bea70070
Generalize BWC logic (#28505)
Generalizing BWC building so that there is less code to modify for a release. This ensures we do not
need to think about what major or minor version is in the gradle code. It follows the general rules of the
elastic release structure. For more information on the rules, see the VersionCollection's javadoc.

This also removes the additional bwc snapshots that will never be released, such as 6.0.2, which were
being built and tested against every time we ran bwc tests.

Additionally, it creates 4 new projects that correspond to the different types of snapshots that may exist
for a given version. Its possible to now run those individual tasks to work out bwc logic whereas
previously it was impossible and the entire suite of bwc tests had to be run to work out any logic
changes in the build tools' bwc project. Please note that if the project does not make sense for the 
version that is current, that an error will be thrown from that individual project if an attempt is made to 
run it.

This should allow for automating the version bumps as well, since it removes all the hardcoded version
logic from the configs.
2018-02-09 14:55:10 -06:00
Lee Hinman 5263b8cc7e
Remove all instances of the deprecated `ParseField.match` method (#28586)
This removes all the server references to the deprecated `ParseField.match`
method in favor of the method that passes in the deprecation logger.

Relates to #28504
2018-02-09 09:19:24 -07:00
Martijn van Groningen 766b9d600e
Fixed a bug that prevents pipelines to load that use stored scripts after a restart.
The bug was caused because the ScriptService had no reference to a ClusterState instance,
because it received the ClusterState after the PipelineStore. This only is the case
after a restart.

A bad side effect is that during a restart, any pipeline to be loaded after the pipeline that uses a stored script,
was never loaded, which caused many pipeline to be missing in bulk / index request api calls.
2018-02-09 17:14:00 +01:00
Yannick Welsch 5735e088f9
Fsync directory after cleanup (#28604)
After copying over the Lucene segments during peer recovery, we call cleanupAndVerify which removes all other files in the directory and which then calls getMetadata to check if the resulting files are a proper index. There are two issues with this:

- the directory is not fsynced after the deletions, so that the call to getMetadata, which lists files in the directory, can get a stale view, possibly seeing a deleted corruption marker (which leads to the exception seen in #28435)
- failing to delete a corruption marker should result in a hard failure, as the shard is otherwise unusable.
2018-02-09 17:06:36 +01:00
Igor Motov da1a10fa92
Add generic array support to AbstractObjectParser (#28552)
Adds a generic declareFieldArray that can process arrays of arbitrary elements.
2018-02-08 19:46:12 -05:00
Nhat Nguyen dbf9fb31e4
Do not ignore shard not-available exceptions in replication (#28571)
The shard not-available exceptions are currently ignored in the
replication as the best effort avoids failing not-yet-ready shards.
However these exceptions can also happen from fully active shards. If
this is the case, we may have skipped important failures from replicas.
Since #28049, only fully initialized shards are received write requests.
This restriction allows us to handle all exceptions in the replication.

There is a side-effect with this change. If a replica retries its peer
recovery second time after being tracked in the replication group, it
can receive replication requests even though it's not-yet-ready. That
shard may be failed and allocated to another node even though it has a
good lucene index on that node.

This PR does not change the way we report replication errors to users,
hence the shard not-available exceptions won't be reported as before.

Relates #28049
Relates #28534
2018-02-08 18:05:27 -05:00
Boaz Leskes ba59cf1262
Capture stack traces while issuing IndexShard operations permits to easy debugging (#28567)
Today we acquire a permit from the shard to coordinate between indexing operations, recoveries and other state transitions. When we leak an  permit it's practically impossible to find who the culprit is. This PR add stack traces capturing for each permit so we can identify which part of the code is responsible for acquiring the unreleased permit. This code is only active when assertions are active. 

The output is something like:
```
java.lang.AssertionError: shard [test][1] on node [node_s0] has pending operations:
--> java.lang.RuntimeException: something helpful 2
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:322)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

--> java.lang.RuntimeException: something helpful
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:311)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

```
2018-02-08 22:59:02 +01:00
Nhat Nguyen 5b8870f193
Only log warning when actually failing shards (#28558)
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
2018-02-08 15:37:52 -05:00
Jason Tedor 5badacf391
Fix race condition in queue size test
The queue size test has a race condition. Namely the offering thread can
run so quickly completing all of its offering iterations before the
queue size thread ever has a chance to run a single size poll
iteration. This means that the size will never actually be polled and
the test can spuriously fail. What we really want to do here, since this
test is checking for a race condition between polling the size of the
queue and offers to the queue, we want to execute each iteration in
lockstep giving the threads multiple changes for the race between
polling the size and offers to occur. This commit addresses this by
running the two threads in lockstep for multiple iterations so that they
have multiple chances to race.

Relates #28584
2018-02-08 14:23:24 -05:00
Igor Motov 80c8c3b114 Add 6.2.2 version constant 2018-02-08 13:43:34 -05:00
Lee Hinman 2e4c834a13
Switch to non-deprecated ParseField.match method for o.e.search (#28526)
* Switch to non-deprecated ParseField.match method for o.e.search

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler. It encapsulates all of the instances in the
`org.elastsicsearch.search` package.

Relates to #28504

* Address Nik's comments
2018-02-08 11:17:57 -07:00
Lee Hinman b64bf51000
Replace more deprecated ParseField.match calls with non-deprecated call (#28525)
* Replace more deprecated ParseField.match calls with non-deprecated call

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler.

Relates to #28504

* Address Nik's comments
2018-02-08 09:45:28 -07:00
Ryan Ernst a55eda626f
Plugins: Store elasticsearch and java versions in PluginInfo (#28556)
Plugin descriptors currently contain an elasticsearch version,
which the plugin was built against, and a java version, which the plugin
was built with. These versions are read and validated, but not stored.
This commit keeps them in PluginInfo so they can be used later.
While seeing the elasticsearch version is less interesting (since it is
enforced to match that of the running elasticsearc node), the java
version is interesting since we only validate the format, not the actual
version. This also makes PluginInfo have full parity with the plugin
properties file.
2018-02-08 08:31:39 -08:00
Jason Tedor 86fd48e5f5
Fix size blocking queue to not lie about its weight
Today when offering an item to a size blocking queue that is at
capacity, we first increment the size of the queue and then check if the
capacity is exceeded or not. If the capacity is indeed exceeded, we do
not add the item to the queue and immediately decrement the size of the
queue. However, this incremented size is exposed externally even though
the offered item was never added to the queue (this is effectively a
race on the size of the queue). This can lead to misleading statistics
such as the size of a queue backing a thread pool. This commit fixes
this issue so that such a size is never exposed. To do this, we replace
the hidden CAS loop that increments the size of the queue with a CAS
loop that only increments the size of the queue if we are going to be
successful in adding the item to the queue.

Relates #28557
2018-02-08 06:54:39 -05:00
Jason Tedor 666c4f9414
Index shard should roll generation via the engine
Today when a replica shard detects a new primary shard (via a primary
term transition), we roll the translog generation. However, the
mechanism that we are using here is by reaching through the engine to
the translog directly. By poking all the way through rather than asking
the engine to manage the roll for us we miss:
 - taking a read lock in the engine while the roll is occurring
 - trimming unreferenced readers

This commit addresses this by asking the engine to roll the translog
generation for us.

Relates #28537
2018-02-07 14:57:50 -05:00
Martijn van Groningen 2023c98bea
Added more parameter to PersistentTaskPlugin#getPersistentTasksExecutor(...) 2018-02-07 17:43:35 +01:00
Christoph Büscher c0886cf7c6
[Tests] Relax assertion in SuggestStatsIT (#28544)
The test expects suggest times in milliseconds that are strictly
positive. Internally they are measured in nanos, it is possible that on
really fast execution this is rounded to 0L, so this should also be an
accepted value.

Closes #28543
2018-02-07 17:26:08 +01:00
Christoph Büscher 305b87b4b7
Make internal Rounding fields final (#28532)
The fields in the internal rounding classes can be made final with very minor
adjustments to how they are read from a StreamInput.
2018-02-07 09:33:21 +01:00
Jason Tedor c2fcf15d9d
Fix the ability to remove old plugin
We now read the plugin descriptor when removing an old plugin. This is
to check if we are removing a plugin that is extended by another
plugin. However, when reading the descriptor we enforce that it is of
the same version that we are. This is not the case when a user has
upgraded Elasticsearch and is now trying to remove an old plugin. This
commit fixes this by skipping the version enforcement when reading the
plugin descriptor only when removing a plugin.

Relates #28540
2018-02-06 17:38:26 -05:00
Lee Hinman 6b4ea4e6fb Add 6.2.1 version constant 2018-02-06 12:13:24 -07:00
Yannick Welsch e6f873c620
Remove feature parsing for GetIndicesAction (#28535)
Removes dead code. Follow-up of #24723
2018-02-06 18:00:14 +01:00
Yannick Welsch c8df446000
No refresh on shard activation needed (#28013)
A shard is fully baked when it moves to POST_RECOVERY. There is no need to do an extra refresh on shard activation again as the shard has already been refreshed when it moved to POST_RECOVERY.
2018-02-06 17:29:22 +01:00
Yannick Welsch d43f0b5f26
Improve failure message when restoring an index that already exists in the cluster (#28498)
Makes the message more actionable and removes the focus on the fact that the index is open.
2018-02-06 14:24:52 +01:00
Lee Hinman eebff4d2b3
Use non deprecated xcontenthelper (#28503)
* Move to non-deprecated XContentHelper.createParser(...)

This moves away from one of the now-deprecated XContentHelper.createParser
methods in favor of specifying the deprecation logger at parser creation time.

Relates to #28449

Note that this doesn't move all the `createParser` calls because some of them
use the already-deprecated method that doesn't specify the XContentType.

* Remove the deprecated (and now non-needed) createParser method
2018-02-05 16:18:18 -07:00
Nik Everett 5003ef18ac
Scripts: Fix security for deprecation warning (#28485)
If you call `getDates()` on a long or date type field add a deprecation
warning to the response and log something to the deprecation logger.
This *mostly* worked just fine but if the deprecation logger happens to
roll then the roll will be performed with the script's permissions
rather than the permissions of the server. And scripts don't have
permissions to, say, open files. So the rolling failed. This fixes that
by wrapping the call the deprecation logger in `doPriviledged`.

This is a strange `doPrivileged` call because it doens't check
Elasticsearch's `SpecialPermission`. `SpecialPermission` is a permission
that no-script code has and that scripts never have. Usually all
`doPrivileged` calls check `SpecialPermission` to make sure that they
are not accidentally acting on behalf of a script. But in this case we
are *intentionally* acting on behalf of a script.

Closes #28408
2018-02-03 14:56:08 -05:00
Nhat Nguyen de6d31ebc2 Backport fail shard w/o marking as stale PR to v6.3
Relates #28054
2018-02-03 12:07:39 -05:00
Nhat Nguyen 965efa51cc
Allows failing shards without marking as stale (#28054)
Currently when failing a shard we also mark it as stale (eg. remove its
allocationId from from the InSync set). However in some cases, we need 
to be able to fail shards but keep them InSync set. This commit adds
such capacity. This is a preparatory change to make the primary-replica
resync less lenient.

Relates #24841
2018-02-03 09:41:53 -05:00
Nhat Nguyen 875bbfe699 Backported the harden synced-flush PR to v6.3.0
Relates #28464
2018-02-02 14:31:37 -05:00
David Turner ab8f5ea54c
Forbid trappy methods from java.time (#28476)
ava.time has the functionality needed to deal with timezones with varying 
offsets correctly, but it also has a bunch of methods that silently let you
forget about the hard cases, which raises the risk that we'll quietly do the
wrong thing at some point in the future.

This change adds the trappy methods to the list of forbidden methods to try and
help stop this from happening.

It also fixes the only use of these methods in the codebase so far:
IngestDocument#deepCopy() used ZonedDateTime.of() which may alter the offset of
the given time in cases where the offset is ambiguous.
2018-02-02 18:24:02 +00:00
Lee Hinman 3ddea8d8d2
Start switching to non-deprecated ParseField.match method (#28488)
This commit switches all the modules and server test code to use the
non-deprecated `ParseField.match` method, passing in the parser's deprecation
handler or the logging deprecation handler when a parser is not available (like
in tests).

Relates to #28449
2018-02-02 10:10:13 -07:00
Nhat Nguyen 5f2121960e
Synced-flush should not seal index of out of sync replicas (#28464)
Today the correctness of synced-flush is guaranteed by ensuring that 
there is no ongoing indexing operations on the primary. Unfortunately, a
replica might fall out of sync with the primary even the condition is
met. Moreover, if synced-flush mistakenly issues a sync_id for an out of
sync replica, then that replica would not be able to recover from the 
primary. ES prevents that peer-recovery because it detects that both
indexes from primary and replica were sealed with the same sync_id but
have a different content. This commit modifies the synced-flush to not
issue sync_id for out of sync replicas. This change will report the
divergence issue earlier to users and also prevent replicas from getting
into the "unrecoverable" state.

Relates #10032
2018-02-02 11:20:38 -05:00
Jim Ferenczi 88f4c1c03a
Add a test for sub-aggregations rewrite (#28491)
This commit adds a test to check that the rewrite of a sub-aggregation triggers a copy of the parent aggregation.

Relates #28430
Closes #27782
2018-02-02 16:04:33 +01:00
Haris Osmanagić 897ef458f3 Add support for indices exists to REST high level client (#27384)
Relates to #27205
2018-02-02 11:25:36 +01:00
Yannick Welsch 031415a5f6
Replicate writes only to fully initialized shards (#28049)
The primary currently replicates writes to all other shard copies as soon as they're added to the routing table. Initially those shards are not even ready yet to receive these replication requests, for example when undergoing a file-based peer recovery. Based on the specific stage that the shard copies are in, they will throw different kinds of exceptions when they receive the replication requests. The primary then ignores responses from shards that match certain exception types. With this mechanism it's not possible for a primary to distinguish between a situation where a replication target shard is not allocated and ready yet to receive requests and a situation where the shard was successfully allocated and active but subsequently failed.
This commit changes replication so that only initializing shards that have successfully opened their engine are used as replication targets. This removes the need to replicate requests to initializing shards that are not even ready yet to receive those requests. This saves on network bandwidth and enables features that rely on the distinction between a "not-yet-ready" shard and a failed shard.
2018-02-02 11:13:07 +01:00
Nhat Nguyen 5be478f938 Remove uncommitted ops assertion in shouldFlush
This assertion does not hold if engine is flushed between the invocation
of translog.uncommittedSizeInBytes and translog.uncommittedOperations.
These two values can be calculated from different commits.
2018-02-01 18:20:56 -05:00
markharwood 998461c737
Test fix - reenable BWC tests and lower version checks now that PR 28440 for allowPartialSearchResults flag backported to 6.x (#28482)
Support for allowPartialSearchResults is now in 6.3 so changing master BWC checks accordingly
2018-02-01 19:20:21 +00:00
Nhat Nguyen 1970e01782
Add lower bound for translog flush threshold (#28382)
If the translog flush threshold is too small (eg. smaller than the
translog header), we may repeatedly flush even there is no uncommitted
operation because the shouldFlush condition can still be true after
flushing. This is currently avoided by adding an extra guard against the
uncommitted operations. However, this extra guard makes the shouldFlush
complicated. This commit replaces that extra guard by a lower bound for
translog flush threshold. We keep the lower bound small for convenience
in testing.

Relates #28350
Relates #23606
2018-02-01 13:51:53 -05:00
Luca Cavanna d860971572
REST high-level client: add support for split and shrink index API (#28425)
Relates to #27205
2018-02-01 16:37:01 +01:00
Martijn van Groningen 61806802fb
Add persistent tasks
Persistent tasks are build on top of node tasks and provide functionality to restart a task to run on a different coordination node in case the coordinating node is no longer available.
It is up to a persistent task implementation to keep track of status, so that in case the task is restarted, the task can continue were it left off before it was restarted.
2018-02-01 15:26:17 +01:00
Alan Woodward 9500513ba1
Move leftover aliases test from core/ to server/ (#28463) 2018-02-01 11:40:29 +00:00
Colin Goodheart-Smithe 65157e9428
[TEST] Replaces flaky breaker IT test with unit test (#28418)
This change remove the `CircuitBreakerIT. testParentChecking` test method which fails intermittently in unexpected ways with a `MemoryCircuitBreakerTests. testBorrowingSiblingBreakerMemory` unit test method which can test the borrowing functionality more directly

Closes #28223
2018-02-01 08:32:46 +00:00
Jim Ferenczi dd40b984c4
Add a shallow copy method to aggregation builders (#28430)
This change adds a shallow copy method for aggregation builders. This method returns a copy of the builder replacing the factoriesBuilder and metaDada
This method is used when the builder is rewritten (AggregationBuilder#rewrite) in order to make sure that we create a new instance of the parent builder when sub aggregations are rewritten.

Relates #27782
2018-02-01 09:22:32 +01:00
Jim Ferenczi c7d5a54b42
Fix AIOOB on indexed geo_shape query (#28458)
This change fixes a possible AIOOB during the parsing of the document that contains the indexed shape.
This change ensures that the parsing does not continue when the field that contains the shape has been found.

Closes #28456
2018-02-01 09:01:48 +01:00
Jason Tedor 1b3d529bef Introduce secure security manager to project
This commit migrates SecureSM, our secure security manager
implementation, from its own repository to being a sub-project of
Elasticsearch.
2018-01-31 18:23:28 -05:00
Nhat Nguyen 5e0be61774
Add logging to index commit deletion policy (#28448)
This would help us to figure out which index commit that an engine 
started with or used in peer-recovery.

Relates #28405
2018-01-31 11:09:49 -05:00
markharwood 77d2dd203e
Search - add allow_partial_search_results flag with default setting false (#28440)
Adds allow_partial_search_results flag to search requests with default setting = true.
When false, will error if search either timeouts, has partial errors or has missing shards rather
than returning partial search results. A cluster-level setting provides a default for search requests with no flag.

Closes #27435
2018-01-31 15:51:29 +00:00
David Turner 4c154b70d3
Fix rounding of time values near to overlapping days (#28151)
Sometimes, in some places, the clocks are set back across midnight, leading to
overlapping days. This was not handled as expected, and this change fixes this.

Additionally, in this situation it is not true that rounding a time down to the
nearest day is a monotonic operation, as asserted in these tests. This change 
suppresses those assertions in those rare cases.

Fixes #27966.
2018-01-31 15:10:47 +00:00
kel 5819e57baa Replace Bits with new abstract class to respresent documents that have a value (#24088) (#28334) 2018-01-31 15:42:11 +01:00
Martijn van Groningen 592eedbf49
Make persistent tasks work.
Made persistent tasks executors pluggable.
2018-01-31 12:28:06 +01:00
Martijn van Groningen 07e727c769
Removed ClientHelper dependency from PersistentTasksService. 2018-01-31 12:28:06 +01:00
David Roberts cc16f9d9c9
Added AllocatedPersistentTask#waitForPersistentTaskStatus(...) that delegates to PersistentTasksService#waitForPersistentTaskStatus(...)
This allows persistent tasks executor implementations to not have an instance of PersistentTasksService.
2018-01-31 12:28:06 +01:00
Igor Motov 41071e4711
Add adding ability to associate an ID with tasks.
Persistent tasks portion of elastic/elasticsearch#23250
2018-01-31 12:28:06 +01:00
Jay Modi 8521b2d11e
Remove InternalClient and InternalSecurityClient (#3054)
This change removes the InternalClient and the InternalSecurityClient. These are replaced with
usage of the ThreadContext and a transient value, `action.origin`, to indicate which component the
request came from. The security code has been updated to look for this value and ensure the
request is executed as the proper user. This work comes from #2808 where @s1monw suggested
that we do this.

While working on this, I came across index template registries and rather than updating them to use
the new method, I replaced the ML one with the template upgrade framework so that we could
remove this template registry. The watcher template registry is still needed as the template must be
updated for rolling upgrades to work (see #2950).
2018-01-31 12:28:05 +01:00
Martijn van Groningen 4dd69951f3
Make the persistent task status available to PersistentTasksExecutor.nodeOperation(...) method 2018-01-31 12:28:05 +01:00
Colin Goodheart-Smithe 1c489ee867
Refactor/to x content fragments2 (#2329)
* Moves more classes over to ToXContentObject/Fragment

* Removes ToXContentToBytes

* Removes ToXContent from Enums

* review comment fix

* slight change to use XContantHelper
2018-01-31 12:28:05 +01:00
David Roberts 7313ad5b29
Make AllocatedPersistentTask members volatile (#2297)
These members are default initialized on contruction and then set by the
init() method.  It's possible that another thread accessing the object
after init() is called could still see the null/0 values, depending on how
the compiler optimizes the code.
2018-01-31 12:28:05 +01:00
Colin Goodheart-Smithe b0de3c38d6
Moves more classes over to ToXContentObject/Fragment (#2283) 2018-01-31 12:28:04 +01:00
Luca Cavanna 65ce2276eb
Adapt to upstream changes made to AbstractStreamableXContentTestCase (#2117) 2018-01-31 12:28:04 +01:00
Yannick Welsch b5f281386a
Move tribe to a module (#2088)
Companion PR to elastic/elasticsearch#25778
2018-01-31 12:28:04 +01:00
Igor Motov ffdb05e48e
Persistent Tasks: remove unused isCurrentStatus method (#2076)
Removes a method that is no longer used in production code.

Relates to #957
2018-01-31 12:28:04 +01:00
David Kyle 0d50f9c6a9
Call initialising constructor of BaseTasksRequest (#1771) 2018-01-31 12:28:04 +01:00
Chris Earle 1cef531165
Always Accumulate Transport Exceptions (#1619)
This is the x-pack side of the removal of `accumulateExceptions()` for both `TransportNodesAction` and `TransportTasksAction`.

There are occasional, random failures that occur during API calls that are silently ignored from the caller's perspective, which also leads to weird API responses that have no response and also no errors, which is obviously untrue.
2018-01-31 12:28:03 +01:00
Hendrik Muhs 614aef2527
Pass down the provided timeout. 2018-01-31 12:28:03 +01:00
Simon Willnauer 292e383d2c
Fix static / version based BWC tests (#1456)
With the leniency in Version.java we missed to really setup BWC
testing for static indices. This change brings back the testing and adds
missing bwc indices.

Relates to elastic/elasticsearch#24732
2018-01-31 12:28:03 +01:00
Yannick Welsch e69317b24b
Don't call ClusterService.state() in a ClusterStateUpdateTask
The current state is readily available as a parameter
2018-01-31 12:28:02 +01:00
Yannick Welsch 44ea5d6b3e
Separate publishing from applying cluster states
Companion commit to elastic/elasticsearch#24236
2018-01-31 12:28:02 +01:00
Igor Motov a08e2d9e5e
Persistent tasks: require allocation id on task completion (#1107)
Persistent tasks should verify that completion notification is done for correct version of the task, otherwise a delayed notification from an old node can accidentally close a newly reassigned task.
2018-01-31 12:28:01 +01:00
Colin Goodheart-Smithe 76cd7b1eb2
Fixes compile errors in Eclipse due to generics
PersistentTasksCustomMetadata was using a generic param named `Params`. This conflicted with the imported interface `ToXContent.Params`. The java compiler was preferring the generic param over the interface so everything was fine but Eclipse apparently prefers the interface int his case which was screwing up the Hierarchy and causing compile errors in Eclipse. This changes fixes it by renaming the Generic param to `P`
2018-01-31 12:27:34 +01:00
Igor Motov fc524bc9b5
Persistent Tasks: force writeable name of params and status to be the same as their task (#1072)
Changes persistent task serialization and forces params and status to have the same writeable name as the task itself.
2018-01-31 12:27:34 +01:00
Martijn van Groningen 4771965931
Use task builder instead of creating persistent tasks directly. 2018-01-31 12:27:34 +01:00
Igor Motov abd9ae399c
Persistent Tasks: PersistentTaskRequest -> PersistTaskParams (#1057)
Removes the last pieces of ActionRequest from PersistentTaskRequest and renames it into PersistTaskParams, which is now just an interface that extends NamedWriteable and ToXContent.
2018-01-31 12:27:33 +01:00
Igor Motov 6bfea09dd6
Persistent Tasks: switch from long task ids to string task ids (#1035)
This commit switches from long persistent task ids to caller-supplied string persistent task ids.
2018-01-31 12:27:33 +01:00
Hendrik Muhs 0a1f25588b
Added PersistentTasksService#waitForPersistentTasksStatus(...) method to allow callers to wait when an executor node has updated its task status. 2018-01-31 12:27:33 +01:00
Igor Motov 0a1abd430d
Persistent Tasks: remove listener from PersistentTasksExecutor#nodeOperation (#1032)
Instead of having a separate listener for indicating that the current task is finished, this commit is switching to use allocated object itself.
2018-01-31 12:27:32 +01:00
Igor Motov 95c6005f6f
Persistent Tasks: remove retries on notification failures (#977)
Retries should be already handled by TransportMasterNodeAction, there is no need to introduce another retry layer in Persistent Tasks code.
2018-01-31 12:27:32 +01:00
Martijn van Groningen fab0dc449a
Remove PersistentTask#isCurrentStatus() usages 2018-01-31 12:27:32 +01:00
Igor Motov 5a8512bf4e
Persistent Tasks: refactor PersistentTasksService to use ActionListener (#937)
PersistentTasksService methods are not using ActionListener<PersistentTask<?>> instead of PersistentTaskOperationListener.
2018-01-31 12:27:29 +01:00
Jason Tedor 97822dbea3
Respond to rename random ASCII helper methods
This commit is response to the renaming of the random ASCII helper
methods in ESTestCase. The name of this method was changed because these
methods only produce random strings generated from [a-zA-Z], not from
all ASCII characters.
2018-01-31 12:00:10 +01:00
Igor Motov 5b45b167bd
Persistent Tasks: check the current state in waitForPersistentTaskStatus (#935)
Add a check for the current state waitForPersistentTaskStatus before waiting for the next one. This fixes sporadic failure in testPersistentActionStatusUpdate test.

Fixes #928
2018-01-31 12:00:09 +01:00
Martijn van Groningen a5acb556b0
Use PersistentTasksService#waitForPersistentTaskStatus(...) to wait for job and datafeed status and use PersistentTasksService#removeTask(...) to force close job and force stop datafeed. 2018-01-31 12:00:09 +01:00
Igor Motov 1b0f5b9572
Persistent Tasks: require correct allocation id for status updates (#923)
In order to prevent tasks state updates by stale executors, this commit adds a check for correct allocation id during status update operation.
2018-01-31 12:00:09 +01:00
Igor Motov 6ca044736e
Persistent Tasks: Add waitForPersistentTaskStatus method (#901)
This method allows to wait for tasks to change their status to match the supplied predicate.
2018-01-31 12:00:09 +01:00
Martijn van Groningen 78b844e79b
Check allocationIdOnLastStatusUpdate when trying to detect whether a task is stale. 2018-01-31 11:59:02 +01:00
Igor Motov b142d7e29c
Persistent Tasks: Remove unused stopped and removeOnCompletion flags (#853)
The stopped and removeOnCompletion flags are not currently used, this commit removes them for now to simplify things.
2018-01-31 11:59:01 +01:00
Igor Motov 37fad04879
Persistent Tasks: Merge NodePersistentTask and RunningPersistentTask (#842)
Refactors NodePersistentTask and RunningPersistentTask into a single AllocatedPersistentTask. Makes it possible to update Persistent Task Status via AllocatedPersistentTask.
2018-01-31 11:59:01 +01:00
Igor Motov 19f39fd392
Persistent Tasks: remove task restart on failure (#815)
If a persistent task throws an exception, the persistent tasks framework will no longer try to restart the task. This is a temporary measure to prevent threshing the cluster with endless restart attempt. We will revisit this in the future version to make the restart process more robust. Please note, however, that if node executing the task goes down, the task will still be restarted on another node.
2018-01-31 11:59:01 +01:00
Igor Motov 9bd24418d5
Make PersistentAction independent from TransportActions (#742)
Removes the transport layer dependency from PersistentActions, makes PersistentActionRegistry immutable and rename actions into tasks in class and variable names.
2018-01-31 11:59:01 +01:00
Igor Motov 810d9335c0
Simplify names of PersistentTasks-related classes
PersistentTask -> NodePersistentTask
PersistentTasksInProgress -> PersistentTasks
PersistentTaskInProgress -> PersistentTask
2018-01-31 11:59:00 +01:00
Igor Motov b33fc05492
Request and Status in Persistent Tasks should be serialized using their writable names
Refactors xcontent serialization of Request and Status to use their writable names instead of action name. That simplifies the parsing logic, allows reuse of the same status object for multiple actions and is consistent with how named objects in xcontent are used.
2018-01-31 11:59:00 +01:00
Igor Motov 5eeb480d97
Add persistent task assignment explanations.
This commit allows persistent actions to indicate why a task was or wasn't assigned to a certain node.
2018-01-31 11:59:00 +01:00
Martijn van Groningen 479429c6ef
In order to keep track of restarted tasks, `allocationIdOnLastStatusUpdate` field was added to `PersistentTaskInProgress` class.
This will allow persistent task implementors to detect whether the executor node has changed or has been unset since the last status update has occured.
2018-01-31 11:58:07 +01:00