Commit Graph

51094 Commits

Author SHA1 Message Date
Armin Braun d8b43c6283
Make Snapshot Deletes Less Racy (#54765) (#55226)
Snapshot deletes should first check the cluster state for an in-progress snapshot
and try to abort it before checking the repository contents. This allows for atomically
checking and aborting a snapshot in the same cluster state update, removing all possible
races where a snapshot that is in-progress could not be found if it finishes between
checking the repository contents and the cluster state.
Also removes confusing races, where checking the cluster state off of the cluster state thread
finds an in-progress snapshot that is then not found in the cluster state update to abort it.
Finally, the logic to use the repository generation of the in-progress snapshot + 1 was error
prone because it would always fail the delete when the repository had a pending generation different from its safe generation when a snapshot started (leading to the snapshot finalizing at a
higher generation).

These issues (particularly that last point) can easily be reproduced by running `SLMSnapshotBlockingIntegTests` in a loop with current `master` (see #54766).

The snapshot resiliency test for concurrent snapshot creation and deletion was made to more
aggressively start the delete operation so that the above races would become visible.
Previously, the fact that deletes would never coincide with initializing snapshots resulted
in a number of the above races not reproducing.

This PR is the most consistent I could get snapshot deletes without changes to the state machine. The fact that aborted deletes will not put the delete operation in the cluster state before waiting for the snapshot to abort still allows for some possible (though practically very unlikely) races. These will be fixed by a state-machine change in upcoming work in #54705 (which will have a much simpler and clearer diff after this change).

Closes #54766
2020-04-15 14:47:16 +02:00
Rory Hunter 70616cd76a
Define aarch64 packaging test tasks (#55228)
Backport of #55073.

We added tasks to build an ARM distribution and Docker image, but didn't
provide any way to run packaging tests against them. Add extra loops on
the possible Architecture values, and skip tasks that can't be run on
the current Architecture.
2020-04-15 13:39:18 +01:00
Dan Hermann 30638a0b41
[7.x] Wipe data streams in each REST test (#55009) 2020-04-15 07:27:39 -05:00
Hendrik Muhs 9ec9866acb [Transform] simplify TransformConfigUpdate (#55224)
removes the unnecessary ToXContent method in TransformConfigUpdate
2020-04-15 13:22:50 +02:00
Armin Braun e164c9aaee
Remove Redundant Cluster State during Snapshot INIT + Master Failover (#54420) (#55208)
* Remove Redundant Cluster State during Snapshot INIT + Master Failover (#54420)

Similar to #54395 we know that a snapshot in INIT state has not
written anything to the repository yet. If we see one from a master
failover, there is no point in moving it to ABORTED before removing it
from the cluster state in a subsequent CS update.
Instead, we can simply remove its job from the CS the first time
we see it on master failover and be done with it.
2020-04-15 12:27:52 +02:00
Yannick Welsch d1123281b1 Use unlimited cache size by default (#55218)
Sets the default cache size for searchable snapshots to unlimited, which, for testing purposes,
is a better default than the 1GB that we currently have.
2020-04-15 12:20:51 +02:00
David Kyle bdf0eab78d
[7.x] Fix non-deterministic behaviour in ModelLoadingServiceTests (#55008) (#55213) 2020-04-15 11:09:12 +01:00
Ioannis Kakavas 0f51934bcf
[7.x] Add support for more named curves (#55179) (#55211)
We implicitly only supported the prime256v1 ( aka secp256r1 )
curve for the EC keys we read as PEM files to be used in any
SSL Context. We would not fail when trying to read a key
pair using a different curve but we would silently assume
that it was using `secp256r1` which would lead to strange
TLS handshake issues if the curve was actually another one.

This commit fixes that behavior in that it
supports parsing EC keys that use any of the named curves
defined in rfc5915 and rfc5480 making no assumptions about
whether the security provider in use supports them (JDK8 and
higher support all the curves defined in rfc5480).
2020-04-15 12:33:40 +03:00
Tomas Della Vedova 9872deace7
Yaml test: Fixed bad indentation (#55170) (#55206) 2020-04-15 11:13:52 +02:00
David Roberts 8c33cad2b2 [TEST] Lower ML model memory limit in HLRC datafeed tests (#55210)
The MachineLearningIT.testStopDatafeed test was creating 3
jobs each with the default model memory limit of 1GB.  This
meant that the test would not run on a machine with less than
10GB of RAM (due to the default ML memory percentage of 30%).

This change reduces the model memory limit for these jobs to
0.5GB, which means the test will run on a machine with only
5GB of RAM.

Relates to https://discuss.elastic.co/t/failed-ml-tests-when-running-the-gradle-check-task-against-unchanged-repo-code/227829
2020-04-15 09:44:07 +01:00
Armin Braun 48048646e7
Move Snapshot Status Related Method to Appropriate Places (#54558) (#55209)
* Move Snapshot Status Related Method to Appropriate Places

Lots of things living in `SnapshotsService` for no reason other than
that `SnapshotsService` provides the `RepositoriesService`.
Cleaning this up to directly use `RepositoriesService` in the relevant
transport actions and by that shortening the already very complex `SnapshotsService`.
2020-04-15 10:25:52 +02:00
Dimitris Athanasiou 4000138105
[7.x][ML] Add debug logging for outlier detection stop and restart integ test (#55169) (#55202)
To understand the failures in #55068

Backport of #55169
2020-04-15 10:40:38 +03:00
Lisa Cawley 2910d01179
[DOCS] Removes unshared sections from ml-shared.asciidoc (#55192) 2020-04-14 18:47:09 -07:00
Yang Wang f49354b7d7
Add migration notes for deprecating local parameter of get field mapping API (#55194)
This is a follow-up for #55099 to add migration notes about the deprecation of local parameter for get field mappings API.
2020-04-15 11:38:05 +10:00
Ryan Ernst 5243295788
Update gradle in vagrant to use jdk14 (#55130)
This commit updates the Java version used within vagrant tests by gradle
inside the VM to use JDK14.

closes #55044
2020-04-14 17:00:12 -07:00
Lee Hinman 36f6e542a2 Ignore ILM indices in the TerminalPolicyStep (#55184)
Prior to the change in #51631 indices were moved to the `TerminalPolicyStep` when their ILM actions
had completed. Once we switched ILM to stop in the last policy configured, these steps because
inaccessible from the policy's perspective. This meant that indices upgraded from ES prior to 7.7.0
could see the following error spammed in their logs every 10 minutes (by default) for every index in
this state:

```
[2020-04-14T15:52:23,764][ERROR][o.e.x.i.IndexLifecycleRunner] [midgar] current step [{"phase":"completed","action":"completed","name":"completed"}] for index [foo] with policy [full] is not recognized
```

This changes the runner to ignore these steps, which is what is desired anyway since the index is
already in the terminal phase.
2020-04-14 16:45:03 -06:00
Rory Hunter 18dc2f7330 Rename some Docker projects for consistency (#55150)
Apply the :distribution:archives naming convention to some of the Docker
sub-projects, so that we have a more consistent naming scheme.

Also, we've seen some examples of Docker packaging tests failing sporadically
when they try to clean up the temp directory, citing a not-empty
directory. Ensure that any running container is removed before cleaning
up the temp dir, in an effort to avoid this problem.
2020-04-14 22:09:05 +01:00
Igor Motov 1754e50cbd
[7.x] Add analytics plugin usage stats to _xpack/usage (#54911) (#55162)
Adds analytics plugin usage stats to _xpack/usage.

Closes #54847
2020-04-14 17:03:14 -04:00
Mark Vieira ce85063653
[7.x] Re-add origin url information to publish POM files (#55173) 2020-04-14 13:24:15 -07:00
Armin Braun f7467a7fe8
Fix Cluster Stabilization in SnapshotResiliencyTests (#55159) (#55168)
Just like in `AbstractCoordinatorTestCase` we can't just assume the cluster
is stable once all the cluster states align since stray follower/leader check
tasks could still hit us after a disconnect, causing future test operations to fail.
=> fixed by running all tasks in the possible time span of running into these
checks before validating that cluster states align on all nodes to prevent this
like we do in the coordinator tests.

Closes #55103
2020-04-14 19:22:26 +02:00
Albert Zaharovits 7f35b927d1
Preserve parent task id for ml transform (#55124)
This change ensures that internal client requests spawned by the
transform persistent task executor and that use the end user security
credentials, have the parent task id assigned. The objective here is
to permit auditing (as well as tracking for debugging purposes) of all
the end-user requests executed on its behalf by persistent tasks.
Because transform tasks already implements graceful shutdown of the
child tasks, this change does not interfere with that by opting out of
the persistent task cancellation of child tasks.

Relates #55046 #54943 #52314
Closes #54957
2020-04-14 18:43:47 +03:00
James Rodewig 12130843ca
[DOCS] Add maintenance releases to upgrade table (#55012)
Updates the supported upgrade path table in [Upgrade Elasticsearch][0]
to include a new row for maintenance releases. For example, this row
covers upgrading from 7.6.0 to 7.6.2.

The new table row only displays for releases greater than n.x.0. For
example, the new row will display for the 7.7.1 release but not the
7.7.0 release.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/setup-upgrade.html
2020-04-14 11:28:55 -04:00
Andrei Dan d918ef0da9
[Tests] Enable searchable_snapshots for non-snapshot builds (#55151) (#55157)
Fixes https://github.com/elastic/elasticsearch/issues/55050

(cherry picked from commit 13391ceff1cbf6db69706c5f46127b6ff8850a1f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-14 16:13:39 +01:00
James Rodewig 3fbd8b371f [DOCS] Use consistent line breaks in EQL function docs 2020-04-14 10:17:45 -04:00
Albert Zaharovits 5998486ce8
Refactor AuditTrail for TransportRequests instead of TransportMessage (#55141)
This commit refactors the `AuditTrail` to use the `TransportRequest` as a parameter
for all its audit methods, instead of the current `TransportMessage` super class.

The goal is to gain access to the `TransportRequest#parentTaskId` member,
so that it can be audited. The `parentTaskId` is used internally when spawning tasks
that handle transport requests; in this way tasks across nodes are related by the
same parent task.

Relates #52314
2020-04-14 16:53:59 +03:00
Yannick Welsch a610513ec7 Provide repository-level stats for searchable snapshots (#55051)
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
2020-04-14 14:34:08 +02:00
Przemysław Witek d5bb574e1e
[7.x] Unassign DFA tasks in SetUpgradeModeAction (#54523) (#55143) 2020-04-14 14:09:02 +02:00
Igor Motov 8a669dc9b7
EQL: Add cascading search cancellation (#54843)
EQL search cancellation now propagates cancellation to underlying search
operations.

Relates to #49638
2020-04-14 08:06:02 -04:00
Alan Woodward 16ebbff3b6 Mute CancellableTasksIT (#55152)
Test failures are tracked in #55106
2020-04-14 12:55:20 +01:00
Yannick Welsch 73c522320a Fix DanglingIndicesIT wait conditions (#55105)
Closes #55105
2020-04-14 13:54:33 +02:00
David Turner 87e8367ece Fix testCreateAndRestoreSearchableSnapshot (#55147)
Fixes a couple of related failures in SearchableSnapshotsIntegTests.

Firstly, we were not correctly accounting for the case where the cache was so
small that some/all files were read directly; fixed this by only asserting that
the cache is definitely used if the corresponding node has a cache that's large
enough to hold the whole index.

Secondly, we were not permitting shards to be completely empty, which might be
the case (rarely) if there were not many documents indexed and the distribution
of IDs was a bit unlucky; fixed this by asserting that we get stats for at
least one file for the whole index, rather than for each shard separately.

Closes #55126
2020-04-14 11:54:46 +01:00
Ioannis Kakavas 70cc1d57fb Mute failing test (#54734) 2020-04-14 10:18:33 +01:00
Mark Vieira cb58725164 Mute InferenceIngestIT.testPipelineIngest 2020-04-14 09:27:56 +01:00
debadair e8fa539bea
[DOCS] Removed obsolete warning about no way to securely store passwords (#55133) (#55140)
* [DOCS] Removed obsolete warning about no way to securely store passwords.

* Update x-pack/docs/en/watcher/actions/email.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2020-04-13 21:38:32 -07:00
William Brafford 52bebec51f
NodeInfo response should use a collection rather than fields (#54460) (#55132)
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate nodes stats for the cluster stats endpoint.

In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."

I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
2020-04-13 17:18:39 -04:00
Ryan Ernst ae14d1661e
Replace license check isAuthAllowed with isSecurityEnabled (#54547) (#55082)
The isAuthAllowed() method for license checking is used by code that
wants to ensure security is both enabled and available. The enabled
state is dynamic and provided by isSecurityEnabled(). But since security
is available with all license types, an check on the license level is
not necessary. Thus, this change replaces isAuthAllowed() with calling
isSecurityEnabled().
2020-04-13 12:26:39 -07:00
Benjamin Trent d32f6fed1d
[ML] inference only persist if there are stats (#54752) (#55121)
We needlessly send documents to be persisted. If there are no stats added, then we should not attempt to persist them.

Also, this PR fixes the race condition that caused issue:  https://github.com/elastic/elasticsearch/issues/54786
2020-04-13 14:03:05 -04:00
lcawl fcd96db006 [DOCS] Edits create data frame analytics job API (#54751) 2020-04-13 10:43:52 -07:00
Nhat Nguyen 96bb1164f0 Support hierarchical task cancellation (#54757)
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.

Closes #50990
2020-04-13 12:35:21 -04:00
Igor Motov 51c6f69e02
[7.x] Add support for filters to T-Test aggregation (#54980) (#55066)
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
2020-04-13 12:28:58 -04:00
Jake Landis a2fafa6af4
[7.x] Lazy test cluster module and plugins (#54852) (#55087)
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
2020-04-13 10:53:35 -05:00
James Rodewig 57d6493e29 [DOCS] EQL: Document `string` function (#55086) 2020-04-13 11:23:45 -04:00
Peter Dyson f0b6cf4c11 [DOCS] Note where ILM policies are stored and backup caveats (#54859) 2020-04-13 09:11:16 -06:00
Igor Motov 6861295706
Further improve InternalTTestTests (#55081)
A small follow-up to #54910. Now that we can generated consistent set of
internal aggs to reduce, we no longer need to keep agg parameters as class
variables.

Related to #54910
2020-04-13 10:26:23 -04:00
Vishal Patel 16921ebbd8 [DOCS] Collapse nested objects in Explore API docs (#55067)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-04-13 09:27:03 -04:00
Benjamin Trent c5c7ee9d73
[7.x] [ML] Start gathering and storing inference stats (#53429) (#54738)
* [ML] Start gathering and storing inference stats (#53429)

This PR enables stats on inference to be gathered and stored in the `.ml-stats-*` indices.

Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user.

`.ml-stats-*` is ILM managed (when possible). So, at any point the underlying index could change. This means that a stats document that is read in and then later updated will actually be a new doc in a new index. This complicates matters as this means that having a running knowledge of seq_no and primary_term is complicated and almost impossible. This is because we don't know the latest index name.

We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).
2020-04-13 08:15:46 -04:00
Ioannis Kakavas 7a8a66d9ae
[7.x] Fix ReloadSecureSettings API to consume password (#54771) (#55059)
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:

- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
2020-04-13 09:50:55 +03:00
Yang Wang 862799956c
Deprecate local parameter for get field mapping request (#55014) (#55099)
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.

This PR deprecates the parameter from rest layer. It will be removed in next major version.
2020-04-12 13:48:47 +10:00
Andrei Dan c0406f78b7
ILM add cluster update timeout on step retry (#54878) (#55022)
This commits adds a timeout when moving ILM back on to a failed step. In
case the master is struggling with processing the cluster update requests
these ones will expire (as we'll send them again anyway on the next ILM
loop run)

ILM more descriptive source messages for cluster updates

Use the configured ILM step master timeout setting

(cherry picked from commit ff6c5ed16616eadfcddd9c95317d370f0d126583)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-11 10:13:31 +01:00
Andrei Dan b8df265b42
[7.x] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) (#55018)
* ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909)

This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to
temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL`
priority can see the "stop ILM update" not make it up the tasks queue.

On the same note, we're keeping the `start ILM` cluster update priority
to `NORMAL` on purpose such that we only start `ILM` if the cluster can
handle it.

(cherry picked from commit d67df3a7cd2a8619c2c9efac4dde3ba83271f2fa)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-11 10:12:35 +01:00