Commit Graph

7139 Commits

Author SHA1 Message Date
Ali Beyad 49298c16a9 [TEST] fix explain API awaiting info explanation check 2017-01-02 18:18:09 -05:00
Ali Beyad 47907b7093 [TEST] fix explain API test to allow for either awaiting info state or
no valid shard copy
2017-01-02 15:24:01 -05:00
Ali Beyad 20ab4be59f Cluster Explain API uses the allocation process to explain shard allocation decisions (#22182)
This PR completes the refactoring of the cluster allocation explain API and improves it in the following two high-level ways:

 1. The explain API now uses the same allocators that the AllocationService uses to make shard allocation decisions. Prior to this PR, the explain API ran the deciders against each node for the shard in question, but this did not go through the same code path as the allocators, so many shard allocation scenarios were not captured.

 2. The APIs have changed, both on the Java and JSON level, to accurately capture the decisions made by the system. The APIs also now report on shard moving and rebalancing decisions, whereas the previous API did not report decisions for moving shards which cannot remain on their current node or rebalancing shards to form a more balanced cluster.

Note: this change affects plugin developers who may have a custom implementation of the ShardsAllocator interface. The method weighShards has been removed and no longer has any utility. In order to support the new explain API, however, a custom implementation of ShardsAllocator must now implement ShardAllocationDecision decideShardAllocation(ShardRouting shard, RoutingAllocation allocation), which provides a decision and explanation for allocating a single shard. Implementations that do not support explaining a single shard allocation via the cluster allocation explain API can simply throw an UnsupportedOperationException from this method (see the sketch below).
2017-01-02 12:28:32 -06:00
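As a rough illustration of the ShardsAllocator note in the commit above, here is a minimal sketch of a custom allocator that opts out of single-shard explanations. Only the decideShardAllocation signature comes from the commit message; the allocate method and the package imports are assumptions and may not match the actual source.

```
// Sketch only: a custom ShardsAllocator that does not support the explain API.
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.cluster.routing.allocation.ShardAllocationDecision;
import org.elasticsearch.cluster.routing.allocation.allocator.ShardsAllocator;

public class CustomShardsAllocator implements ShardsAllocator {

    @Override
    public void allocate(RoutingAllocation allocation) {
        // custom balancing/allocation logic would go here
    }

    @Override
    public ShardAllocationDecision decideShardAllocation(ShardRouting shard, RoutingAllocation allocation) {
        // This allocator cannot explain a single shard allocation via the
        // cluster allocation explain API, so it throws as described above.
        throw new UnsupportedOperationException("explaining a single shard allocation is not supported");
    }
}
```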
javanna a3918ad094 Remove unused ParseFieldMatcher#match method 2016-12-31 09:24:44 +01:00
javanna cd6b569286 Remove some usages of ParseFieldMatcher in favour of using ParseField directly
Relates to #19552
Relates to #22130
2016-12-31 09:24:44 +01:00
Igor Motov f985638bba Add a generic way of checking version before serializing custom cluster object
In #22313 we added a check that prevents the SnapshotDeletionsInProgress custom cluster state objects from being sent to older Elasticsearch nodes. This commit makes the check generic and available to other cluster state custom objects if needed.
2016-12-30 14:27:09 -05:00
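A self-contained sketch of the kind of version gate described in the commit above; the VersionedCustom interface and its method names are invented for illustration and are not the actual Elasticsearch API.

```
import java.util.ArrayList;
import java.util.List;

final class VersionGateSketch {

    // Stand-in for a custom cluster state object that knows the oldest node
    // version able to deserialize it.
    interface VersionedCustom {
        int minimalSupportedVersionId();
    }

    // Only keep customs that the receiving node's version can understand.
    static List<VersionedCustom> customsToSend(List<VersionedCustom> customs, int receiverVersionId) {
        List<VersionedCustom> result = new ArrayList<>();
        for (VersionedCustom custom : customs) {
            if (receiverVersionId >= custom.minimalSupportedVersionId()) {
                result.add(custom);
            }
        }
        return result;
    }
}
```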
javanna 74acffaae9 fix compiler warning on access to static field using `this` 2016-12-30 18:57:47 +01:00
javanna df2acb3d9d Remove some more usages of ParseFieldMatcher in favour of using ParseField directly
Relates to #19552
Relates to #22130
2016-12-30 18:57:47 +01:00
javanna 6c54cbade4 Remove some more usages of ParseFieldMatcher in favour of using ParseField directly
Relates to #19552
Relates to #22130
2016-12-30 18:57:47 +01:00
javanna 45d010e874 Remove some usages of ParseFieldMatcher in favour of using ParseField directly
Relates to #19552
Relates to #22130
2016-12-30 18:57:47 +01:00
Adrien Grand 00de5b83bd The percentage of deleted docs needs to be strictly over 10% for deleted docs to be expunged. 2016-12-30 11:18:02 +01:00
Adrien Grand f1d7721932 Fix TermsAggregatorTests to not use LuceneTestCase.newSearcher since it needs a DirectoryReader. 2016-12-30 10:12:24 +01:00
Adrien Grand f89bb18a5d Dynamic `date` fields should use the `format` that was used to detect it is a date. (#22174)
Unless the dynamic templates define an explicit format in the mapping
definition; in that case the explicit format takes precedence.

Closes #9410
2016-12-30 09:48:24 +01:00
Adrien Grand 3f805d68cb Add the ability to set an analyzer on keyword fields. (#21919)
This adds a new `normalizer` property to `keyword` fields that pre-processes the
field value prior to indexing, but without altering the `_source`. Note that
only the normalization components that work on a per-character basis are
applied, so for instance stemming filters will be ignored while lowercasing or
ascii folding will be applied.

Closes #18064
2016-12-30 09:36:10 +01:00
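For illustration of the normalizer feature in the commit above, a hedged sketch (held in a Java string) of an index creation body that defines a custom normalizer and attaches it to a `keyword` field; the normalizer name, filter list, and type name are made up for this example.

```
public final class NormalizerMappingExample {
    // Hypothetical index creation body: "my_lowercase" applies only
    // character-level components (lowercase, asciifolding), as described above,
    // while `_source` keeps the original value.
    public static final String CREATE_INDEX_BODY =
        "{\n" +
        "  \"settings\": {\n" +
        "    \"analysis\": {\n" +
        "      \"normalizer\": {\n" +
        "        \"my_lowercase\": { \"type\": \"custom\", \"filter\": [\"lowercase\", \"asciifolding\"] }\n" +
        "      }\n" +
        "    }\n" +
        "  },\n" +
        "  \"mappings\": {\n" +
        "    \"my_type\": {\n" +
        "      \"properties\": {\n" +
        "        \"code\": { \"type\": \"keyword\", \"normalizer\": \"my_lowercase\" }\n" +
        "      }\n" +
        "    }\n" +
        "  }\n" +
        "}";
}
```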
Yannick Welsch 816e1c6cc4 Free shard resources when recovery reset is cancelled
Resetting a recovery consists of resetting the old recovery target and replacing it by a new recovery target object. This is done on the Cancellable threads of
the new recovery target. If the new recovery target is already cancelled before or while this happens, for example due to shard closing or recovery source
changing, we have to make sure that the old recovery target object frees all shard resources.

Relates to #22325
2016-12-29 17:05:47 +01:00
Yannick Welsch 6e6d9eb255 Use a fresh recovery id when retrying recoveries (#22325)
Recoveries are tracked on the target node using RecoveryTarget objects that are kept in a RecoveriesCollection. Each recovery has a unique id that is communicated from the recovery target to the source so that it can call back to the target and execute actions using the right recovery context. In case of a network disconnect, recoveries are retried. At the moment, the same recovery id is reused for the restarted recovery. This can lead to confusion though if the disconnect is unilateral and the recovery source continues with the recovery process. If the target reuses the same recovery id while doing a second attempt, there might be two concurrent recoveries running on the source for the same target.

This commit changes the recovery retry process to use a fresh recovery id. It also waits for the first recovery attempt to be fully finished (all resources locally freed) to further prevent concurrent access to the shard. Finally, in case of primary relocation, it also fails a second recovery attempt if the first attempt moved past the finalization step, as the relocation source can then be moved to RELOCATED state and start indexing as primary into the target shard (see TransportReplicationAction). Resetting the target shard in this state could mean that indexing is halted until the recovery retry attempt is completed and could also destroy existing documents indexed and acknowledged before the reset.

Relates to #22043
2016-12-29 10:58:15 +01:00
Igor Motov ca90d9ea82 Remove PROTO-based custom cluster state components
Switches custom cluster state components from PROTO-based deserialization to named-object-based deserialization
2016-12-28 13:32:35 -05:00
Jim Ferenczi e7444f7d77 Fix scaled_float numeric type in aggregations (#22351)
`scaled_float` values should be treated as DOUBLE in aggregations, but currently they are treated as LONG.
This change fixes the issue and adds a simple integration test for it.

Fixes #22350
2016-12-27 09:23:22 +01:00
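A tiny sketch of why the distinction in the commit above matters (illustrative, not the actual ScaledFloatFieldMapper code): `scaled_float` stores the value multiplied by the scaling factor and rounded to a long, so aggregations need to divide by the scaling factor and operate on doubles rather than on the raw stored longs.

```
final class ScaledFloatSketch {

    // e.g. 12.34 with scaling_factor 100 is stored as the long 1234
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // aggregations should see 12.34 again (DOUBLE), not the raw 1234 (LONG)
    static double decodeForAggregation(long stored, double scalingFactor) {
        return stored / scalingFactor;
    }
}
```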
Adrien Grand 3cb164b22e Fix IndexShardTests.testDocStats. 2016-12-26 20:19:11 +01:00
Adrien Grand 2127db27a3 Add trace logging to CircuitBreakerServiceIT.testParentChecking. 2016-12-26 16:05:27 +01:00
Adrien Grand 2d81750a13 Make ESTestCase resilient to initialization errors. 2016-12-26 14:55:22 +01:00
Adrien Grand f80165c374 Fix LineLength issues. 2016-12-26 11:22:09 +01:00
Adrien Grand d89757b848 Fix mutate function to always actually modify the failure object. 2016-12-26 10:34:50 +01:00
Ali Beyad 1cb5dc42ff Updates SnapshotDeletionsInProgress version number introduced to 5.2.0 2016-12-25 19:28:01 -05:00
Ali Beyad 8261bd358a Synchronize snapshot deletions on the cluster state (#22313)
Before, snapshot/restore would synchronize all operations on the cluster
state except for deleting snapshots.  This meant that only one
snapshot/restore operation would be allowed in the cluster at any given
time, except for deletions - there could be two or more snapshot
deletions running at the same time, or a deletion could be running,
unbeknownst to the rest of the cluster, and thus a snapshot or restore
would be allowed at the same time as the snapshot deletion was still in
progress.  This could cause any number of synchronization issues,
including the situation where a snapshot that was deleted could reappear
in the index-N file, even though its data was no longer present in the
repository.

This commit introduces a new custom type to the cluster state to
represent deletions in progress.  Now, another deletion cannot start if
a deletion is currently in progress.  Similarly, a snapshot or restore
cannot be started if a deletion is currently in progress.  In each case,
if attempting to run another snapshot/restore operation while a deletion
is in progress, a ConcurrentSnapshotExecutionException will be thrown.
This is the same exception thrown if trying to snapshot while another
snapshot is in progress, or restore while a snapshot is in progress.

Closes #19957
2016-12-25 19:00:20 -05:00
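A simplified, self-contained sketch of the guard described in the commit above; the class and method names are illustrative rather than the actual SnapshotsService/SnapshotDeletionsInProgress API, and the atomicity the real implementation gets from serialized cluster state updates on the master is glossed over here.

```
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SnapshotCoordinatorSketch {
    // stands in for the SnapshotDeletionsInProgress custom kept in the cluster state
    private final Set<String> deletionsInProgress = ConcurrentHashMap.newKeySet();

    void startSnapshotOrRestore(String name) {
        if (!deletionsInProgress.isEmpty()) {
            // mirrors the ConcurrentSnapshotExecutionException described above
            throw new IllegalStateException("cannot start [" + name + "]: a snapshot deletion is in progress");
        }
        // ... submit the snapshot/restore cluster state update ...
    }

    void startDeletion(String snapshotName) {
        if (!deletionsInProgress.isEmpty()) {
            throw new IllegalStateException("another snapshot deletion is already in progress");
        }
        deletionsInProgress.add(snapshotName);
        // ... submit the deletion cluster state update ...
    }
}
```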
Jason Tedor d5c18bf5c9 Fix doc stats test when deleting all docs
This commit fixes an issue with IndexShardTests#testDocStats when the
number of deleted docs is equal to the number of docs. In this case,
Lucene will remove the underlying segment, tripping an assertion on the
number of deleted docs.
2016-12-23 15:20:42 -05:00
Jason Tedor 6deb5283db Fix delete op serialization format constant
The delete op serialization format constant for 5.x was off by one. This
commit fixes this, and cleans up the handling of these formats.
2016-12-23 11:50:43 -05:00
Jason Tedor 2713549533 Use reader for doc stats
Today we try to pull stats from the index writer, but we do not get a
consistent view of the stats. Under heavy indexing, this inconsistency can
become quite pronounced. In particular, it can lead to the number of
deleted docs being reported as negative, and this leads to serialization
issues. Instead, we should provide a consistent view of the stats by
using an index reader.

Relates #22317
2016-12-23 09:44:56 -05:00
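A minimal sketch of the idea in the commit above (assumed Lucene usage, not the actual IndexShard code): open a point-in-time reader and take both counters from that single snapshot, so the deleted-docs count can never come out negative.

```
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

final class DocStatsSketch {
    final long numDocs;
    final long numDeletedDocs;

    private DocStatsSketch(long numDocs, long numDeletedDocs) {
        this.numDocs = numDocs;
        this.numDeletedDocs = numDeletedDocs;
    }

    static DocStatsSketch fromReader(IndexWriter writer) throws IOException {
        // both values come from the same point-in-time reader, so they are consistent
        try (DirectoryReader reader = DirectoryReader.open(writer)) {
            return new DocStatsSketch(reader.numDocs(), reader.numDeletedDocs());
        }
    }
}
```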
Boaz Leskes c2baa5f213 TransportService should capture listener before spawning background notification task
Not doing this made it difficult to establish a happens-before relationship between connecting to a node and adding a listener, causing test code like this to fail sporadically:

```
        // connection to reuse
        handleA.transportService.connectToNode(handleB.node);

        // install a listener to check that no new connections are made
        handleA.transportService.addConnectionListener(new TransportConnectionListener() {
            @Override
            public void onConnectionOpened(DiscoveryNode node) {
                fail("should not open any connections. got [" + node + "]");
            }
        });

```

relates to #22277
2016-12-23 13:55:12 +01:00
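A hedged, generic sketch of the fix described in the commit above (not the actual TransportService code): capturing the listener list before forking the notification task means a listener added after the notification call can never observe that event, which is the happens-before guarantee the test relies on.

```
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

final class NotifierSketch<T> {
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService executor;

    NotifierSketch(ExecutorService executor) {
        this.executor = executor;
    }

    void addListener(Consumer<T> listener) {
        listeners.add(listener);
    }

    void fireEvent(T event) {
        // capture the current listeners *before* spawning the background task;
        // reading `listeners` inside the task would race with addListener
        final List<Consumer<T>> captured = new ArrayList<>(listeners);
        executor.execute(() -> captured.forEach(listener -> listener.accept(event)));
    }
}
```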
Yannick Welsch baea17b53f Separate cluster update tasks that are published from those that are not (#21912)
This commit factors out the cluster state update tasks that are published (ClusterStateUpdateTask) from those that are not (LocalClusterUpdateTask), serving as a basis for future refactorings to separate the publishing mechanism out of ClusterService.
2016-12-23 12:23:52 +01:00
Boaz Leskes eb7450bcdc UnicastZenPing add trace logging on connection opening 2016-12-23 09:15:11 +01:00
Jason Tedor faaa671fb6 Enable assertions in integration tests
When starting a standalone cluster, we do not enable assertions. This is
problematic because it means that we miss opportunities to catch
bugs. This commit enables assertions for standalone integration tests,
and fixes a couple of bugs that were uncovered by enabling them.

Relates #22334
2016-12-22 20:08:02 -05:00
Nik Everett 55099df1cb Support negative numbers in writeVLong (#22314)
We don't *want* to use negative numbers with `writeVLong`, so throw an
exception when we try. On the other hand, unforeseen bugs might cause us to
write negative numbers (some versions of Elasticsearch have only an assertion
rather than the exception), so this also fixes `readVLong` so that instead of
reading a wrong value and corrupting the stream it reads the negative value.
2016-12-22 13:02:39 -05:00
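A generic variable-length encoding sketch (not Elasticsearch's StreamOutput/StreamInput code) showing the behaviour described in the commit above: the writer rejects negative values, while the reader still decodes a negative value correctly so that a stream produced by an older, buggy writer is consumed without corrupting the stream position.

```
import java.io.ByteArrayOutputStream;

final class VLongSketch {

    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            // negative values are rejected up front in this sketch
            throw new IllegalStateException("negative values are not supported: " + value);
        }
        writeUnsignedVLong(out, value);
    }

    // 7 bits per byte, low bits first, high bit used as a continuation flag;
    // a negative long (as older writers could produce) takes the full 10 bytes
    static void writeUnsignedVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((byte) value);
    }

    // reads every continuation byte, so a negative value round-trips instead of
    // leaving trailing bytes behind and corrupting the rest of the stream
    static long readVLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        int i = 0;
        byte b;
        do {
            b = bytes[i++];
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}
```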
Boaz Leskes 13c5881f3e UnicastZenPing's PingingRound should prevent opening connections after being closed
This may cause connections to leak. Provision for this was made in #22277, but a crucial ensureOpen call was missed.
2016-12-22 18:45:44 +01:00
Boaz Leskes 7d0dbd2082 add trace logging to UnicastZenPingTests.testResolveReuseExistingNodeConnections 2016-12-22 18:10:15 +01:00
Tal Levy 6d7261c4d3 Adds ingest processor headers to exception for unknown processor. (#22315)
Optimistically check for the `tag` of an unknown processor to better track which
processor declaration is to blame in an invalid configuration.

Closes #21429.
2016-12-22 08:24:00 -08:00
Nik Everett f5f2149ff2 Remove much ceremony from parsing client yaml test suites (#22311)
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.

I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
2016-12-22 11:00:34 -05:00
Stéphane Campinas e1b8528ab8 Support numeric bounds with decimal parts for long/integer/short/byte datatypes (#21972)
Close #21600
2016-12-22 15:20:15 +01:00
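A small sketch of how decimal bounds can be folded into integral bounds for a long/integer/short/byte field, as in the commit above (an illustration of the idea, not the actual NumberFieldMapper code; overflow at the extremes is ignored here).

```
final class DecimalBoundsSketch {

    // "gte 2.5" and "gt 2.5" both become ">= 3"; "gt 2" becomes ">= 3"
    static long effectiveLowerBound(double lower, boolean includeLower) {
        return (long) (includeLower ? Math.ceil(lower) : Math.floor(lower) + 1);
    }

    // "lte 2.5" and "lt 2.5" both become "<= 2"; "lt 3" becomes "<= 2"
    static long effectiveUpperBound(double upper, boolean includeUpper) {
        return (long) (includeUpper ? Math.floor(upper) : Math.ceil(upper) - 1);
    }
}
```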
Martijn van Groningen b9a90eca20 inner hits: Don't inline inner hits if the query they are inlined into can't resolve mappings and ignore_unmapped is set to true
Closes #21620
2016-12-22 14:51:33 +01:00
Colin Goodheart-Smithe 9a73a2efb3 Fix stackoverflow error on InternalNumericMetricAggregation 2016-12-22 13:50:54 +00:00
Adrien Grand 9b3b693d15 Date detection should not rely on a hardcoded set of characters. (#22171)
Currently we only apply date detection on strings that contain either `:`, `-`
or `/`. This commit inverts the heuristic in order to only apply date detection
on strings that are not parseable as a number, so that more date formats can be
used as dynamic date formats.

Closes #1694
2016-12-22 14:35:59 +01:00
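A simplified sketch of the inverted heuristic from the commit above (illustration only; the real mapper code uses its own date parsing rather than java.time): instead of requiring `:`, `-` or `/` to be present, date detection is skipped only for strings that parse as plain numbers.

```
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class DateDetectionSketch {

    static boolean looksLikeDate(String value, DateTimeFormatter[] dynamicDateFormats) {
        try {
            Double.parseDouble(value);
            return false; // numbers are never treated as dates
        } catch (NumberFormatException e) {
            // not a number: fall through and try the configured date formats
        }
        for (DateTimeFormatter format : dynamicDateFormats) {
            try {
                format.parse(value);
                return true;
            } catch (DateTimeParseException e) {
                // try the next format
            }
        }
        return false;
    }
}
```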
Adrien Grand e39942fc02 `value_type` is useful regardless of scripting. (#22160)
Today we only expose `value_type` in scriptable aggregations, however it is
also useful with unmapped fields. I suspect we never noticed because
`value_type` was not documented (fixed) and most aggregations are scriptable.

Closes #20163
2016-12-22 14:35:12 +01:00
Adrien Grand fd6e1a30de Improve concurrency of ShardCoreKeyMap. (#22316)
`ShardCoreKeyMap.add` is called on each segment for all search requests, which
means it might become a bottleneck under a concurrent load of cheap search
requests since this method acquires a mutex. This change proposes to use a
`ConcurrentHashMap`, which allows the mutex to be taken only when the
`LeafReader` has never been seen before.
2016-12-22 14:34:08 +01:00
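A hedged sketch of the locking pattern described in the commit above (not the actual ShardCoreKeyMap): the ConcurrentHashMap serves the common already-registered case without any lock, and the mutex is taken only the first time a key is seen.

```
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class CoreKeyRegistrySketch<K, V> {
    private final ConcurrentHashMap<K, V> registered = new ConcurrentHashMap<>();
    private final Object mutex = new Object();

    V register(K key, Function<K, V> firstTimeRegistration) {
        V existing = registered.get(key);
        if (existing != null) {
            return existing; // fast path: no lock for keys that were seen before
        }
        synchronized (mutex) {
            // re-check under the mutex, then do the rare first-time work
            // (the real class also performs extra bookkeeping here)
            V value = registered.get(key);
            if (value == null) {
                value = firstTimeRegistration.apply(key);
                registered.put(key, value);
            }
            return value;
        }
    }
}
```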
Martijn van Groningen acd64c6ee1 fixed jdocs and removed already fixed norelease 2016-12-22 14:18:30 +01:00
Colin Goodheart-Smithe 06576ed13b Adds abstract test classes for serialisation (#22281)
This adds test classes that can be used to test the wire serialisation and (optionally) the XContent serialisation of objects that implement Streamable/Writeable and ToXContent.

These test classes will enable classes such as InternalAggregation (or at least its implementations) to be tested in a consistent way when it comes to testing serialisation.
2016-12-22 10:49:18 +00:00
Jason Tedor 7946396fe6 Introduce translog no-op
As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.

Relates #22291
2016-12-21 23:08:16 -05:00
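For illustration, a minimal sketch of what a translog no-op entry could carry based on the description in the commit above; the field names are assumptions, not the actual Translog.NoOp implementation.

```
final class NoOpSketch {
    final long seqNo;        // sequence number that was assigned but produced no index change
    final long primaryTerm;  // term of the primary that assigned the sequence number
    final String reason;     // why the operation did not mutate the index (e.g. it failed)

    NoOpSketch(long seqNo, long primaryTerm, String reason) {
        this.seqNo = seqNo;
        this.primaryTerm = primaryTerm;
        this.reason = reason;
    }
}
```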
Jason Tedor 91cb563247 Provide helpful error message if an older plugin exists
Today if an older version of a plugin exists, we fail to notify the user
with a helpful error message. This happens because during plugin
verification, we attempt to read the plugin descriptors for all existing
plugins. When an older version of a plugin is sitting on disk, we will
attempt to read this old plugin descriptor and fail due to a version
mismatch. This leads to an unhelpful error message. Instead, we should
check for existence of the plugin as part of the verification phase, but
before attempting to read plugin descriptors for existing plugins. This
enables us to provide a helpful error message to the user.

Relates #22305
2016-12-21 22:37:07 -05:00
Nik Everett 8aca504c86 Clear static variable after suite
This was causing test failures:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/1101/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+dockeralpine-periodic/513/consoleFull
2016-12-21 13:39:28 -05:00
Luca Cavanna 5c8232c03b Restore deprecation warning for invalid match_mapping_type values (#22304)
The deprecation warning now gives the same message as 5.x. It was previously removed, but given that we are still lenient with old indices we should still output the warning.
2016-12-21 16:56:55 +01:00
Adrien Grand 84edf36f11 Make `-0` compare less than `+0` consistently. (#22173)
Our `float`/`double` fields generally assume that `-0` compares less than `+0`,
except when bounds are exclusive: an exclusive lower bound on `-0` excludes
`+0` and an exclusive upper bound on `+0` excludes `-0`.

Closes #22167
2016-12-21 16:51:45 +01:00
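A small illustration of the ordering discussed in the commit above (not the field mapper code itself): under IEEE-754 equality `-0.0 == +0.0`, but the total ordering used for sortable encodings puts `-0.0` strictly before `+0.0`. The `sortableBits` helper below follows the spirit of Lucene's `NumericUtils.doubleToSortableLong`.

```
final class SignedZeroOrdering {

    // flip the non-sign bits of negative values so that the long ordering
    // matches the double ordering, with -0.0 sorting before +0.0
    static long sortableBits(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7FFFFFFFFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(-0.0 == 0.0);                            // true: IEEE-754 equality
        System.out.println(Double.compare(-0.0, 0.0));              // -1: total ordering, -0.0 < +0.0
        System.out.println(sortableBits(-0.0) < sortableBits(0.0)); // true: the encoding preserves it
    }
}
```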