Commit Graph

30122 Commits

Author SHA1 Message Date
Robin Neatherway 282974215c MetaDataIndexAliasesService wrong get type (#28614)
A get of the wrong type would always have returned null so these
indices would have been inserted into the map repeatedly.
2018-02-12 15:55:17 -08:00
Robin Neatherway 3174b2cbfa NXYSignificanceHeuristic equality update (#28616)
* NXYSignificanceHeuristic.java: implementation of equality would have
  failed with a ClassCastException when comparing to another type.
  Replaced with the Eclipse generated form.
2018-02-12 15:51:57 -08:00
Robin Neatherway 68b7a5c281 Fix DeadlockAnalyzer printer (#28615)
Remove `if` block that was always true.
2018-02-12 15:28:56 -08:00
Rachel Johnson 617044e5fe Update search.asciidoc (#28646)
[DOCS] Corrected typo - singe to single.
2018-02-12 15:02:13 -08:00
Ryan Ernst 65f1dd424e
Plugins: Remove intermediate "elasticsearch" directory within plugin zips (#28589)
This commit removes the extra layer of all plugin files existing under
"elasticsearch" within plugin zips. This simplifies building plugin zips
and removes the need for special logic of modules vs plugins.
2018-02-12 14:27:30 -08:00
Michael Basnight 4e0c1463d5
Fix build.snapshot bug in version collection (#28641)
The build.snapshot was mistakenly passed in to every snapshot version,
so when release tests were run, these versions were mistaken as released
entities and could not be found in maven, because they do not
exist. This fix removes that bug in logic, and always makes them proper
snapshots. This has a benefit of cleaning up the VersionUtilsTests
because they no longer rely on different sets of versions to check
against, which was also a bug.
2018-02-12 14:56:07 -06:00
Nhat Nguyen 9eb9ce3843
Require translogUUID when reading global checkpoint (#28587)
Today we use the persisted global checkpoint to calculate the starting 
seqno in peer-recovery. However we do not check whether the translog 
actually belongs to the existing Lucene index when reading the global
checkpoint. In some rare cases if the translog does not match the Lucene
index, that recovering replica won't be able to complete its recovery.
This can happen as follows.

1. Replica executes a file-based recovery
2. Index files are copied to replica but crashed before finishing the recovery
3. Replica starts recovery again with seq-based as the copied commit is safe
4. Replica fails to open engine because translog and Lucene index are not matched
5. Replica won't be able to recover from primary

This commit enforces the translogUUID requirement when reading the 
global checkpoint directly from the checkpoint file.

Relates #28435
2018-02-12 13:23:32 -05:00
David Turner c15e5d480c
Improve logging in ClusterFormationTasks#configureWaitTask (#28637)
We suspect a build failure might be due to a startup timeout, but there is 
insufficient information in the logs. This change enhances the information
available so we will be better-informed next time.
2018-02-12 18:46:25 +01:00
Lee Hinman 6538542603
Switch to hardcoding Smile as the state format (#28610)
This commit changes the state format that was previously passed in to
`MetaDataStateFormat` to always use Smile. This doesn't actually change the
format, since we have used Smile for writing the format since at least 5.0. This
removes the automatic detection of the state format when reading state, since
any state that could be processed in 6.x and 7.x would already have been written
in Smile format.

This is work towards removing the deprecated methods in the XContent code where
we do automatic content-type detection.

Relates to #28504
2018-02-12 08:07:01 -07:00
Martijn van Groningen c19d84012e
iter 2018-02-12 13:50:48 +01:00
Martijn van Groningen 21e5ee6551
[TEST] Changed how stash dumps are logged in yaml tests in case of failures
Currently if a yaml test has a teardown and a test is failing then
a stash dump of a request in the teardown is logged instead of
a stash dump of a request in the test itself.

By handling the logging of stash dumps separately for setup, tests and
teardown yaml sections we shouldn't miss the stash dump of request/response
that is actually causing the yaml test to fail.
2018-02-12 13:50:48 +01:00
Jim Ferenczi e6a8528554
Force depth_first mode execution for terms aggregation under a nested context (#28421)
This commit forces the depth_first mode for `terms` aggregation that contain a sub-aggregation that need to access the score of the document
in a nested context (the `terms` aggregation is a child of a `nested` aggregation). The score of children documents is not accessible in
breadth_first mode because the `terms` aggregation cannot access the nested context.

Close #28394
2018-02-12 13:38:11 +01:00
Jim Ferenczi 7dc00ef1f5
Search option terminate_after does not handle post_filters and aggregations correctly (#28459)
* Search option terminate_after does not handle post_filters and aggregations correctly

This change fixes the handling of the `terminate_after` option when post_filters (or min_score) are used.
`post_filter` should be applied before `terminate_after` in order to terminate the query when enough document are accepted
by the post_filters.
This commit also changes the type of exception thrown by `terminate_after` in order to ensure that multi collectors (aggregations)
do not try to continue the collection when enough documents have been collected.

Closes #28411
2018-02-12 13:36:33 +01:00
Ke Li 55448b2630 [Tests] Remove unnecessary condition check (#28559)
The condition value in question is true, regardless of the randomBoolean() value.
This change simplifies this removing the condition blocks.
2018-02-12 11:33:19 +01:00
Boaz Leskes 4aece92b2c
IndexShardOperationPermits: shouldn't use new Throwable to capture stack traces (#28598)
The is a follow up to #28567 changing the method used to capture stack traces, as requested
during the review. Instead of creating a throwable, we explicitly capture the stack trace of the
current thread. This should Make Jason Happy Again ™️ .
2018-02-12 10:33:13 +01:00
Jim Ferenczi 37e938f9de Fix indices.sort rest test
Set the number of replicas to 0 in order to avoid race condition during the test
Fixes #24416
2018-02-12 09:03:55 +01:00
Jason Tedor 69313ffef3
Disable console logging in the Windows service
When Elasticsearch is run as a service we should not use the console
logger otherwise we end up duplicating logging (to the Elasticsearch
logs and whereever standard output is captured). Previously we disabled
the console logger when started as a service using systemd (otherwise
the console logs are duplicated to the journal). This commit does the
same for the Windows service, starting Elasticsearch with the --quiet
flag to avoid standard output being written to the service stdout logs.

Relates #28618
2018-02-11 11:10:40 -05:00
Ivan Brusic 5f6ce1ae75 Highlight the "hidden" REST API test suite (#28381) 2018-02-09 15:05:19 -08:00
Michael Basnight e0bea70070
Generalize BWC logic (#28505)
Generalizing BWC building so that there is less code to modify for a release. This ensures we do not
need to think about what major or minor version is in the gradle code. It follows the general rules of the
elastic release structure. For more information on the rules, see the VersionCollection's javadoc.

This also removes the additional bwc snapshots that will never be released, such as 6.0.2, which were
being built and tested against every time we ran bwc tests.

Additionally, it creates 4 new projects that correspond to the different types of snapshots that may exist
for a given version. Its possible to now run those individual tasks to work out bwc logic whereas
previously it was impossible and the entire suite of bwc tests had to be run to work out any logic
changes in the build tools' bwc project. Please note that if the project does not make sense for the 
version that is current, that an error will be thrown from that individual project if an attempt is made to 
run it.

This should allow for automating the version bumps as well, since it removes all the hardcoded version
logic from the configs.
2018-02-09 14:55:10 -06:00
Ryan Ernst 20c37efea2
Build: Replace provided configuration with compileOnly (#28564)
When elasticsearch was originally moved to gradle, the "provided" equivalent in maven had to be done through a plugin. Since then, gradle added the "compileOnly" configuration. This commit removes the provided plugin and replaces all uses with compileOnly.
2018-02-09 11:30:24 -08:00
Ryan Ernst b637a5b028
Build: Improve test status logging (#28563)
Tests use the (internal) gradle progress logger. Since aroudn gradle
4.0, loggers can have children loggers which each get their own line of
updating output. This commit improves the test status output to give
each jvm its own status line, as well as lines for suite and test
counts. It also fixes an issue where the progress logger was not marked
as completed, which caused its last output to never be cleared.
2018-02-09 11:29:07 -08:00
Ryan Ernst 7e65cd8ca8
Build: Add warning when bwc tests are disabled (#28585)
The bwc tests can be disabled in order to facilitate commits necessary
to older branches to maintain backcompat. A check already exists to
ensure this flag is not left disabled too long (once a day by the
branchConsistency check). However, it can be surprising if you try
running bwc tests explicitly and they look like nothing is happening.
This commit adds a warning during configuration to ensure it is clear
the bwc tests are disabled and enforces a link to a PR which is in the
process of being backported.
2018-02-09 11:26:22 -08:00
David Pilato 8bfaf7d390
Add information about the snapshot repository (#27719)
From https://twitter.com/ddanieltwitt/status/938754761968451585, I think
we should add in our documentation, where users can find the SNAPSHOT
versions we are publishing.
2018-02-09 17:59:04 +01:00
Lee Hinman 5263b8cc7e
Remove all instances of the deprecated `ParseField.match` method (#28586)
This removes all the server references to the deprecated `ParseField.match`
method in favor of the method that passes in the deprecation logger.

Relates to #28504
2018-02-09 09:19:24 -07:00
Martijn van Groningen 766b9d600e
Fixed a bug that prevents pipelines to load that use stored scripts after a restart.
The bug was caused because the ScriptService had no reference to a ClusterState instance,
because it received the ClusterState after the PipelineStore. This only is the case
after a restart.

A bad side effect is that during a restart, any pipeline to be loaded after the pipeline that uses a stored script,
was never loaded, which caused many pipeline to be missing in bulk / index request api calls.
2018-02-09 17:14:00 +01:00
Yannick Welsch 5735e088f9
Fsync directory after cleanup (#28604)
After copying over the Lucene segments during peer recovery, we call cleanupAndVerify which removes all other files in the directory and which then calls getMetadata to check if the resulting files are a proper index. There are two issues with this:

- the directory is not fsynced after the deletions, so that the call to getMetadata, which lists files in the directory, can get a stale view, possibly seeing a deleted corruption marker (which leads to the exception seen in #28435)
- failing to delete a corruption marker should result in a hard failure, as the shard is otherwise unusable.
2018-02-09 17:06:36 +01:00
Christoph Büscher 231fd3c9be [Docs] Remove misleading comment
The TikaImpl#parse method comment sounds like this method is only used
in the same package for testing, but AttachmentProcessor uses it outside
of testing, so we should remove this comment.
2018-02-09 15:47:38 +01:00
Jason Tedor 641a6c9e62
Guard accessDeclaredMembers for Tika on JDK 10
Tika parsers need accessDeclaredMembers because ZipFile needs
accessDeclaredMembers on JDK 10. This commit guards adding this
permission to parsers so that the permission is only granted on JDK
10. Additionally, we add an assertion that forces us to check if the
permission is still needed in JDK 11.

Relates #28603
2018-02-09 09:08:07 -05:00
Christoph Büscher cc9cb5356a
Add missing runtime permission to TikaImpl (#28602)
Tests on jdk10 were failing because of a change in its ZipFile implementation 
that now needs `accessDeclaredMembers` permissions. This change adds 
the missing permission to the plugins security policy and TikaImpl.

Closes #28568
2018-02-09 14:41:24 +01:00
Igor Motov da1a10fa92
Add generic array support to AbstractObjectParser (#28552)
Adds a generic declareFieldArray that can process arrays of arbitrary elements.
2018-02-08 19:46:12 -05:00
Islam Heggo f562c7f15a Correct the explanation of load time percentiles (#28510)
* Correct the explanation of load time percentiles

* Adjusting the percentile clarification

Eliminating the false sentence about majority of load time
2018-02-08 16:29:43 -08:00
Nhat Nguyen dbf9fb31e4
Do not ignore shard not-available exceptions in replication (#28571)
The shard not-available exceptions are currently ignored in the
replication as the best effort avoids failing not-yet-ready shards.
However these exceptions can also happen from fully active shards. If
this is the case, we may have skipped important failures from replicas.
Since #28049, only fully initialized shards are received write requests.
This restriction allows us to handle all exceptions in the replication.

There is a side-effect with this change. If a replica retries its peer
recovery second time after being tracked in the replication group, it
can receive replication requests even though it's not-yet-ready. That
shard may be failed and allocated to another node even though it has a
good lucene index on that node.

This PR does not change the way we report replication errors to users,
hence the shard not-available exceptions won't be reported as before.

Relates #28049
Relates #28534
2018-02-08 18:05:27 -05:00
Boaz Leskes ba59cf1262
Capture stack traces while issuing IndexShard operations permits to easy debugging (#28567)
Today we acquire a permit from the shard to coordinate between indexing operations, recoveries and other state transitions. When we leak an  permit it's practically impossible to find who the culprit is. This PR add stack traces capturing for each permit so we can identify which part of the code is responsible for acquiring the unreleased permit. This code is only active when assertions are active. 

The output is something like:
```
java.lang.AssertionError: shard [test][1] on node [node_s0] has pending operations:
--> java.lang.RuntimeException: something helpful 2
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:322)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

--> java.lang.RuntimeException: something helpful
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:223)
	at org.elasticsearch.index.shard.IndexShard.<init>(IndexShard.java:311)
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:382)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:514)
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231)
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)

```
2018-02-08 22:59:02 +01:00
Nhat Nguyen 5b8870f193
Only log warning when actually failing shards (#28558)
Currently the master node logs a warning message whenever it receives a 
failed shard request. However, this can be noisy because

- Multiple failed shard requests can be issued for a single shard
- Failed shard requests can be still issued for an already failed shard

This commit moves the log-warn to AllocationService in which the failing 
shard action actually happens. This is another prerequisite step in 
order to not ignore the shard not-available exceptions in the
replication.

Relates #28534
2018-02-08 15:37:52 -05:00
Ryan Ernst 578773f11b Re-enable bwc tests after backport of #28556 2018-02-08 11:53:48 -08:00
Jason Tedor 5badacf391
Fix race condition in queue size test
The queue size test has a race condition. Namely the offering thread can
run so quickly completing all of its offering iterations before the
queue size thread ever has a chance to run a single size poll
iteration. This means that the size will never actually be polled and
the test can spuriously fail. What we really want to do here, since this
test is checking for a race condition between polling the size of the
queue and offers to the queue, we want to execute each iteration in
lockstep giving the threads multiple changes for the race between
polling the size and offers to occur. This commit addresses this by
running the two threads in lockstep for multiple iterations so that they
have multiple chances to race.

Relates #28584
2018-02-08 14:23:24 -05:00
Igor Motov 80c8c3b114 Add 6.2.2 version constant 2018-02-08 13:43:34 -05:00
Lee Hinman 2e4c834a13
Switch to non-deprecated ParseField.match method for o.e.search (#28526)
* Switch to non-deprecated ParseField.match method for o.e.search

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler. It encapsulates all of the instances in the
`org.elastsicsearch.search` package.

Relates to #28504

* Address Nik's comments
2018-02-08 11:17:57 -07:00
Nhat Nguyen 0edde25f09 TEST: use new nodes assumption in testUpdateSnapshotStatus
Using an assertion is not correct here as the list of new nodes can be
empty.
2018-02-08 13:11:21 -05:00
Lee Hinman b64bf51000
Replace more deprecated ParseField.match calls with non-deprecated call (#28525)
* Replace more deprecated ParseField.match calls with non-deprecated call

This replaces more of the `ParseField.match` calls with the same call using a
deprecation handler.

Relates to #28504

* Address Nik's comments
2018-02-08 09:45:28 -07:00
Christoph Büscher 01791277cb
Test that rank_eval request parsing is not lenient (#28516)
Parsing of a ranking evaluation request and its subcomponents should throw parsing
errors on unknown fields. This change adds tests for this and changes the parser 
behaviour in cases where it is needed.
2018-02-08 17:38:45 +01:00
Ryan Ernst a55eda626f
Plugins: Store elasticsearch and java versions in PluginInfo (#28556)
Plugin descriptors currently contain an elasticsearch version,
which the plugin was built against, and a java version, which the plugin
was built with. These versions are read and validated, but not stored.
This commit keeps them in PluginInfo so they can be used later.
While seeing the elasticsearch version is less interesting (since it is
enforced to match that of the running elasticsearc node), the java
version is interesting since we only validate the format, not the actual
version. This also makes PluginInfo have full parity with the plugin
properties file.
2018-02-08 08:31:39 -08:00
Luca Cavanna cb7159d4c9
REST high-level Client: add missing final modifiers (#28572)
A couple of methods in RestHighLevelClient were supposed to be final but the modifier was forgotten.
2018-02-08 17:14:20 +01:00
Jason Tedor 86fd48e5f5
Fix size blocking queue to not lie about its weight
Today when offering an item to a size blocking queue that is at
capacity, we first increment the size of the queue and then check if the
capacity is exceeded or not. If the capacity is indeed exceeded, we do
not add the item to the queue and immediately decrement the size of the
queue. However, this incremented size is exposed externally even though
the offered item was never added to the queue (this is effectively a
race on the size of the queue). This can lead to misleading statistics
such as the size of a queue backing a thread pool. This commit fixes
this issue so that such a size is never exposed. To do this, we replace
the hidden CAS loop that increments the size of the queue with a CAS
loop that only increments the size of the queue if we are going to be
successful in adding the item to the queue.

Relates #28557
2018-02-08 06:54:39 -05:00
Tim Brooks 16f7e00514
Improve testTransportStatsWithException test (#28554)
This commit modifies the transport stats with exception test to remove
the requirement that we calculate the published address size when
comparing bytes received. This is tricky and is currently broken as we
also place the address string in the transport exception, however we do
not adjust the bytes for that.

The solution in this commit is to just serialize the transport exception
in the test and use that for the calculation.
2018-02-07 14:31:42 -07:00
Jason Tedor 666c4f9414
Index shard should roll generation via the engine
Today when a replica shard detects a new primary shard (via a primary
term transition), we roll the translog generation. However, the
mechanism that we are using here is by reaching through the engine to
the translog directly. By poking all the way through rather than asking
the engine to manage the roll for us we miss:
 - taking a read lock in the engine while the roll is occurring
 - trimming unreferenced readers

This commit addresses this by asking the engine to roll the translog
generation for us.

Relates #28537
2018-02-07 14:57:50 -05:00
Martijn van Groningen 2023c98bea
Added more parameter to PersistentTaskPlugin#getPersistentTasksExecutor(...) 2018-02-07 17:43:35 +01:00
Christoph Büscher c0886cf7c6
[Tests] Relax assertion in SuggestStatsIT (#28544)
The test expects suggest times in milliseconds that are strictly
positive. Internally they are measured in nanos, it is possible that on
really fast execution this is rounded to 0L, so this should also be an
accepted value.

Closes #28543
2018-02-07 17:26:08 +01:00
Christoph Büscher 305b87b4b7
Make internal Rounding fields final (#28532)
The fields in the internal rounding classes can be made final with very minor
adjustments to how they are read from a StreamInput.
2018-02-07 09:33:21 +01:00
Jason Tedor c2fcf15d9d
Fix the ability to remove old plugin
We now read the plugin descriptor when removing an old plugin. This is
to check if we are removing a plugin that is extended by another
plugin. However, when reading the descriptor we enforce that it is of
the same version that we are. This is not the case when a user has
upgraded Elasticsearch and is now trying to remove an old plugin. This
commit fixes this by skipping the version enforcement when reading the
plugin descriptor only when removing a plugin.

Relates #28540
2018-02-06 17:38:26 -05:00