Commit Graph

48802 Commits

Author SHA1 Message Date
Jake Landis c320b499a0
Prevent deadlock by using separate schedulers (#48697) (#48964)
Currently the BulkProcessor class uses a single scheduler to schedule
flushes and retries. Functionally these are very different concerns but
can result in a dead lock. Specifically, the single shared scheduler
can kick off a flush task, which only finishes it's task when the bulk
that is being flushed finishes. If (for what ever reason), any items in
that bulk fails it will (by default) schedule a retry. However, that retry
will never run it's task, since the flush task is consuming the 1 and
only thread available from the shared scheduler.

Since the BulkProcessor is mostly client based code, the client can
provide their own scheduler. As-is the scheduler would require
at minimum 2 worker threads to avoid the potential deadlock. Since the
number of threads is a configuration option in the scheduler, the code
can not enforce this 2 worker rule until runtime. For this reason this
commit splits the single task scheduler into 2 schedulers. This eliminates
the potential for the flush task to block the retry task and removes this
deadlock scenario.

This commit also deprecates the Java APIs that presume a single scheduler,
and updates any internal code to no longer use those APIs.

Fixes #47599

Note - #41451 fixed the general case where a bulk fails and is retried
that can result in a deadlock. This fix should address that case as well as
the case when a bulk failure *from the flush* needs to be retried.
2019-11-11 16:31:21 -06:00
Jason Tedor acae07113f
Fix names of UBI-based Docker build contexts
This commit fixes the names of the UBI-based Docker build contexts to
lift the ubi component of the name into the archive base name, instead
of the classifier.
2019-11-11 15:43:53 -05:00
Benjamin Trent 46ab1db54f
[7.x] [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050) (#48958)
* [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

Related PR: https://github.com/elastic/ml-cpp/pull/809

* adjusting bwc version
2019-11-11 15:43:03 -05:00
Mark Vieira 8acbd0aa2a
Ensure client jar projects generate correct POM artifacts (#48961) 2019-11-11 12:25:14 -08:00
Mark Tozzi d9e569278f
Refactor and DRY up Kahan Sum algorithm (#48558) (#48959) 2019-11-11 15:09:19 -05:00
Armin Braun c45470f84f
Fix ShardGenerations in RepositoryData in BwC Case (#48920) (#48947)
We were tripping the assertion that the makes sure we only have empty `ShardGenerations` in `RepositoryData` in the BwC case because shard generations were passed to the `Repository` in the BwC case. Fixed by only generating empty shard gen for BwC snapshots in `SnapshotsService`.
2019-11-11 18:02:53 +01:00
Jake Landis 909fbd0015
[7.x] Mute FullClusterRestartTest#testWatcher and 30s timeout… (#48850)
The timeout was increased to 60s to allow this test more time to reach a
yellow state. However, the test will still on occasion fail even with the
60s timeout.

Related: #48381
Related: #48434
Related: #47950
Related: #40178
2019-11-11 09:38:14 -06:00
Christoph Büscher 6119f0aaa2 Fix Eclipse compilation in DataFrameDataExtractorTests (#48942) 2019-11-11 16:17:55 +01:00
Martijn van Groningen a1dd830cb5
Re-enabled test with longer timeout waiting for monitoring.
See #48258
2019-11-11 16:07:50 +01:00
István Zoltán Szabó c2f52015d3 [DOCS] Removes best practice about fields that are highly correlated to the dependent variable. (#48935) 2019-11-11 16:01:21 +01:00
István Zoltán Szabó 91888959e8 [DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307) 2019-11-11 15:55:12 +01:00
Patrick Maynard 4b85498617 [DOCS] Fix typo in search type docs (#48868) 2019-11-11 09:38:48 -05:00
Rory Hunter 014e1b1090
Improve resiliency to auto-formatting in server (#48940)
Backport of #48450.

Make a number of changes so that code in the `server` directory is more
resilient to automatic formatting. This covers:

* Reformatting multiline JSON to embed whitespace in the strings
* Move some comments around to they aren't auto-formatted to a strange
  place. This also required moving some `&&` and `||` operators from the
  end-of-line to start-of-line`.
* Add helper method `reformatJson()`, to strip whitespace from a JSON
  document using XContent methods. This is sometimes necessary where
  a test is comparing some machine-generated JSON with an expected
  value.

Also, `HyperLogLogPlusPlus.java` is now excluded from formatting because it
contains large data tables that don't reformat well with the current settings,
and changing the settings would be worse for the rest of the codebase.
2019-11-11 14:33:04 +00:00
James Rodewig dd92830801 [DOCS] Reformat condition token filter (#48775) 2019-11-11 08:49:44 -05:00
Rafael Acevedo eb0d8f3383 update gradle to 5.6.4 (#48872) 2019-11-11 15:30:31 +02:00
Rory Hunter 35e21f85f3
Reenable Docker tests again (#48936)
Backport of #48898.

We no longer configure distributions for prior versions for Docker. This
is because doing so prompts Gradle to try and resolve the Docker
dependencies, which doesn't work as they can't be downloaded via Ivy
(configured in DistributionDownloadPlugin). Since we need these for the
BATS upgrade tests, and those tests only cover .rpm and .deb, it's OK to
omit creating such distributions in the first place. We may need to
revisit this in the future, to allow upgrade testing using Docker
containers.
2019-11-11 11:43:32 +00:00
Alpar Torok e33a1b7942 Add links to infra-stats for scans generated in CI (#48732)
* Add links to infra-stats for scans generated in CI

It turns out we already gather system logs in infra-stats, and we have
system metrics too there.

This PR adds a links to the logs we gather for the host the build is
runnig on.
And a link to the host overview in the infrastructure app tuned to 5
minutes from before gradle started to 5 minutes after the scan was
generated.

* add buildFinished
2019-11-11 11:24:09 +02:00
Arne Welzel f642baa9fb [DOCS] Remove extra "when" (#48926) 2019-11-11 10:11:02 +01:00
Yannick Welsch 87862868c6 Allow realtime get to read from translog (#48843)
The realtime GET API currently has erratic performance in case where a document is accessed
that has just been indexed but not refreshed yet, as the implementation will currently force an
internal refresh in that case. Refreshing can be an expensive operation, and also will block the
thread that executes the GET operation, blocking other GETs to be processed. In case of
frequent access of recently indexed documents, this can lead to a refresh storm and terrible
GET performance.

While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted
to read from the translog in case of realtime GET API or update API, this was removed in 5.0
(#20102) to avoid inconsistencies between values that were returned from the translog and
those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and
upsert to read from the translog again as it was easier to guarantee consistency for these, and
also brought back more predictable performance characteristics of this API. Calls to the realtime
GET API, however, would still always do a refresh if necessary to return consistent results. This
means that users that were calling realtime GET APIs to coordinate updates on client side
(realtime GET + CAS for conditional index of updated doc) would still see very erratic
performance.

This PR (together with #48707) resolves the inconsistencies between reading from translog and
index. In particular it fixes the inconsistencies that happen when requesting stored fields, which
were not available when reading from translog. In case where stored fields are requested, this
PR will reparse the _source from the translog and derive the stored fields to be returned. With
this, it changes the realtime GET API to allow reading from the translog again, avoid refresh
storms and blocking the GET threadpool, and provide overall much better and predictable
performance for this API.
2019-11-09 17:47:50 +01:00
Nhat Nguyen ff6c121eb9 Closed shard should never open new engine (#47186)
We should not open new engines if a shard is closed. We break this
assumption in #45263 where we stop verifying the shard state before
creating an engine but only before swapping the engine reference.
We can fail to snapshot the store metadata or checkIndex a closed shard
if there's some IndexWriter holding the index lock.

Closes #47060
2019-11-08 23:40:34 -05:00
Nhat Nguyen 9a42e71dd9 Do not cancel recovery for copy on broken node (#48265)
This change fixes a poisonous situation where an ongoing recovery was
canceled because a better copy was found on a node that the cluster had
previously tried allocating the shard to but failed. The solution is to
keep track of the set of nodes that an allocation was failed on so that
we can avoid canceling the current recovery for a copy on failed nodes.

Closes #47974
2019-11-08 23:10:47 -05:00
Julian Simioni 5e4501eb3f [Docs] Consolidate single example into a single line (#48904)
The first example of splitting rules for the `word_delimiter` token filter was spread across two bullet points. This makes it look like they are two separate splitting rules.
2019-11-08 15:12:45 -05:00
Yannick Welsch af887be3e5 Hide orphaned tasks from follower stats (#48901)
CCR follower stats can return information for persistent tasks that are in the process of being cleaned up. This is problematic for tests where CCR follower indices have been deleted, but their persistent follower task is only cleaned up asynchronously afterwards. If one of the following tests then accesses the follower stats, it might still get the stats for that follower task.

In addition, some tests were not cleaning up their auto-follow patterns, leaving orphaned patterns behind. Other tests cleaned up their auto-follow patterns. As always the same name was used, it just depended on the test execution order whether this led to a failure or not. This commit fixes the offensive tests, and will also automatically remove auto-follow-patterns at the end of tests, like we do for many other features.

Closes #48700
2019-11-08 13:56:53 +01:00
Henning Andersen 8835142ac9 Grok processor ignore case test (#48909)
Added test demonstrating that grok using ignore case works, since this
does a minimal test that the `joni` and `jcodings` libraries are
compatible.

Forward-port of test from #43334
2019-11-08 00:04:29 +01:00
bellengao bdc7057d58 [DOCS] Correct typo in split index API docs (#48894) 2019-11-07 15:27:27 -05:00
Tanguy Leroux 8a14ea5567
Add docker-composed based test fixture for GCS (#48902)
Similarly to what has be done for Azure in #48636, this commit
adds a new :test:fixtures:gcs-fixture project which provides two
docker-compose based fixtures that emulate a Google Cloud
Storage service.

Some code has been extracted from existing tests and placed
into this new project so that it can be easily reused in other
projects.
2019-11-07 13:27:22 -05:00
bellengao 293902c6a5 [DOCS] Fix shard type in CCR overview doc (#48882)
Closes #48875
2019-11-07 10:09:45 -05:00
Rory Hunter df16ff777e
Disable docker packaging tests again (#48896)
Backport of #48883.

Per elastic/infra#15864, the Elasticsearch CI images are failing due to
a packer_cache failure. This is because Gradle is trying to resolve
a `.docker` file through the Ivy repository, which doesn't work. Disable
the Docker tests again until we figure out the way forward.
2019-11-07 14:28:33 +00:00
Dan Hermann 5805560a2a
Validate index name time format setting at parse time (#47911) (#48881) 2019-11-07 05:24:49 -06:00
Dimitris Athanasiou dfc6a13b44
[7.x][ML] Handle nested arrays in source fields (#48885) (#48889)
Backport of #48885
2019-11-07 07:30:50 +02:00
Adrien Grand 3b9ce0a4f3
Elasticsearch 7.5 is on Lucene 8.3. (#48831) 2019-11-06 10:13:09 -05:00
Tanguy Leroux 552381d7f9 Add mention to Pause Auto-Follower API in Upgrade Clusters docs (#48764)
Relates #46665
2019-11-06 09:48:44 -05:00
István Zoltán Szabó 3c9bd13dca [DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241) 2019-11-06 07:41:38 -05:00
István Zoltán Szabó 70765dfb05 [DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657) 2019-11-06 07:38:33 -05:00
James Rodewig f1396b6322 [DOCS] Add Java to list of HTTP client libraries for basic authentication (#48647) 2019-11-05 17:09:10 -05:00
David Turner bd5c6c4779
Add preflight check to dynamic mapping updates (#48867)
Today if the primary discovers that an indexing request needs a mapping update
then it will send it to the master for validation and processing. If, however,
the put-mapping request is invalid then the master still processes it as a
(no-op) cluster state update. When there are a large number of indexing
operations that result in invalid mapping updates this can overwhelm the
master.

However, the primary already has a reasonably up-to-date mapping against which
it can check the (approximate) validity of the put-mapping request before
sending it to the master. For instance it is not possible to remove fields in a
mapping update, so if the primary detects that a mapping update will exceed the
fields limit then it can reject it itself and avoid bothering the master.

This commit adds a pre-flight check to the mapping update path so that the
primary can discard obviously-invalid put-mapping requests itself.

Fixes #35564
Backport of #48817
2019-11-05 18:08:22 +01:00
Rory Hunter 24f7d4e83b
Add Docker packaging tests on 7.x (#48857)
Backport of #46599 and #47640. Add packaging tests for Docker.

* Introduce packaging tests for Docker (#46599)

Closes #37617. Add packaging tests for our Docker images, similar to what
we have for RPMs or Debian packages. This works by running a container and
probing it e.g. via `docker exec`. Test can also be run in Vagrant, by
exporting the Docker images to disk and loading them again in VMs. Docker
is installed via `Vagrantfile` in a selection of boxes.

* Only define Docker pkg tests if Docker is available (#47640)

Closes #47639, and unmutes tests that were muted in b958467.

The Docker packaging tests were being defined irrespective of whether
Docker was actually available in the current environment. Instead,
implement exclude lists so that in environments where Docker is not
available, no Docker packaging tests are defined. For CI hosts, the build
checks `.ci/dockerOnLinuxExclusions`. The Vagrant VMs can defined the
extension property `shouldTestDocker` property to opt-in to packaging
tests.

As part of this, define a seperate utility class for checking Docker,
and call that instead of defining checks in-line in BuildPlugin.groovy
2019-11-05 15:17:59 +00:00
glerb baabc21a04 [DOCS] Correct typo in Discovery docs (#48494) 2019-11-05 08:48:43 -05:00
David Roberts c03f7ba74c [TEST] Mute TimeoutCheckerTests.testWatchdog
Due to https://github.com/elastic/elasticsearch/issues/48861
2019-11-05 11:49:46 +00:00
Armin Braun d83e374062
Bound Linearizability Check in CoordinatorTests (#48751) (#48853)
Same as #44444 but for the coordinator tests.
Closes #48742
2019-11-04 21:36:17 +01:00
Dan Hermann c85cf7a6de
Validate proxy base path at parse time (#47912) (#48825) 2019-11-04 09:51:13 -06:00
Nhat Nguyen 020ff0fef9 Do not intercept renew requests from other tests (#48833)
We might have some outstanding renew retention lease requests after a 
shard has unfollowed. If testRetentionLeaseIsAddedIfItDisappearsWhileFollowing
intercepts a renew request from other tests then we will never unlatch 
and the test will time out.

Closes #45192
2019-11-02 21:15:05 -04:00
Nhat Nguyen 0887cbc964 Fix testForceMergeWithSoftDeletesRetentionAndRecoverySource (#48766)
This test failure manifests the limitation of the recovery source merge
policy explained in #41628. If we already merge down to a single segment
then subsequent force merges will be noop although they can prune
recovery source. We need to adjust this test until we have a fix for the
merge policy.

Relates #41628
Closes #48735
2019-11-02 21:14:12 -04:00
Armin Braun 3c20541823
Cleanup Concurrent RepositoryData Loading (#48329) (#48834)
The loading of `RepositoryData` is not an atomic operation.
It uses a list + get combination of calls.
This lead to accidentally returning an empty repository data
for generations >=0 which can never not exist unless the repository
is corrupted.
In the test #48122 (and other SLM tests) there was a low chance of
running into this concurrent modification scenario and the repository
actually moving two index generations between listing out the
index-N and loading the latest version of it. Since we only keep
two index-N around at a time this lead to unexpectedly absent
snapshots in status APIs.
Fixing the behavior to be more resilient is non-trivial but in the works.
For now I think we should simply throw in this scenario. This will also
help prevent corruption in the unlikely event but possible of running into this
issue in a snapshot create or delete operation on master failover on a
repository like S3 which doesn't have the "no overwrites" protection on
writing a new index-N.

Fixes #48122
2019-11-02 20:42:29 +01:00
Armin Braun a22f6fbe3c
Cleanup Redundant Futures in Recovery Code (#48805) (#48832)
Follow up to #48110 cleaning up the redundant future
uses that were left over from that change.
2019-11-02 17:28:12 +01:00
Nhat Nguyen 4c70770877 Add debug log for CcrRetentionLeaseIT (#48820)
testRetentionLeaseIsAddedIfItDisappearsWhileFollowing is still failing 
although we already have several fixes. I think other tests interfere
and cause this test to fail. We can use the test scope to isolate them.
However, I prefer to add debug logs so we can find the source.

Relates #45192
2019-11-01 22:07:35 -04:00
Jason Tedor c24595e2ec
Fix names of UBI-based Docker image build contexts
This commit fixes the name of the UBI-based Docker image build contexts
to include "7" (to set us up for the future where we are likely to have
a ubi8-based image).
2019-11-01 17:29:15 -04:00
Armin Braun e26d01e71f
Make CcrRepository#restore non-Blocking (#48814) (#48823)
With the changes in #48110 there is no more need
to block a generic thread when waiting for the multi file transfer
in `CcrRepository`.
2019-11-01 21:02:47 +01:00
Lee Hinman 6c290ecaf7 Fix ilm/20_move_to_step basic moving to step (#48821)
Previously this step moved to the forcemerge step, however, if the
machine running the test was fast enough, it would execute the
forcemerge and move to the next step (`segment-count`) so the comparison
would fail. This commit changes the step to be a step that will never go
anywhere else, the terminal step.

Resolves #48761
2019-11-01 13:58:24 -06:00
Jason Tedor c82ecb664c
Do not wrap ingest processor exception with IAE (#48816)
The problem with wrapping here is that it converts any exception into an
IAE, which we treat as a client error (400 status) whereas the exception
being wrapped here could be a server error (e.g., NPE). This commit
stops wrapping all ingest processor exceptions as IAEs.
2019-11-01 15:11:35 -04:00