Commit Graph

4290 Commits

Author SHA1 Message Date
Benjamin Trent 19602fd573
[ML][Inference] changing setting to be memorySizeSettting (#49259) (#49302) 2019-11-19 07:56:40 -05:00
Przemysław Witek 38aec2e298
Relax assertions related to datafeed timing stats in .yml test (#49285) (#49291) 2019-11-19 12:50:14 +01:00
David Roberts a5204c1c80
[ML] Fixes for stop datafeed edge cases (#49284)
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
2019-11-19 10:51:46 +00:00
Lisa Cawley abd4a70b10 [DOCS] Merges duplicate pages for Kerberos realms (#49207) 2019-11-18 15:23:06 -08:00
Lisa Cawley b4f82c9cdb [DOCS] Merges duplicate pages for LDAP realms (#49203) 2019-11-18 14:09:24 -08:00
Julie Tibshirani a0ee6c8f7e
Add telemetry for flattened fields. (#48972) (#49125)
Currently we just record the number of flattened fields defined in the mappings.
2019-11-18 12:29:42 -08:00
Lisa Cawley b0054eecd6 [DOCS] Merges duplicate pages for file realms (#49200) 2019-11-18 12:02:18 -08:00
Benjamin Trent eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Lisa Cawley 48f53efd9a [DOCS] Merges duplicate pages for SAML realms (#49209) 2019-11-18 10:09:29 -08:00
Armin Braun 25cc8e3663
Fix RepoCleanup not Removed on Master-Failover (#49217) (#49239)
The logic for `cleanupInProgress()` was backwards everywhere (method itself and
all but one user). Also, we weren't checking it when removing a repository.

This lead to a bug (in the one spot that didn't use the method backwards) that prevented
the cleanup cluster state entry from ever being removed from the cluster state if master
failed over during the cleanup process.

This change corrects the backwards logic, adds a test that makes sure the cleanup
is always removed and adds a check that prevents repository removal during cleanup
to the repositories service.

Also, the failure handling logic in the cleanup action was broken. Repeated invocation would lead to the cleanup being removed from the cluster state even if it was in progress. Fixed by adding a flag that indicates whether or not any removal of the cleanup task from the cluster state must be executed. Sorry for mixing this in here, but I had to fix it in the same PR, as the first test (for master-failover) otherwise would often just delete the blocked cleanup action as a result of a transport master action retry.
2019-11-18 16:44:09 +01:00
Przemysław Witek 5f9965e4b8
Lower minimum model memory limit value from 1MB to 1kB. (#49227) (#49242) 2019-11-18 14:58:20 +01:00
Hendrik Muhs ca912624ec [Transform] improve error handling of script errors (#48887)
improve error handling for script errors, treating it as irrecoverable errors which puts the task
immediately into failed state, also improves the error extraction to properly report the script 
error.

fixes #48467
2019-11-18 10:24:39 +01:00
Tanguy Leroux fcac3fbfd9 AutoFollowIT should not rely on assertBusy but should use latches instead (#49141)
AutoFollowIT relies on assertBusy() calls to wait for a given number of 
leader indices to be created but this is prone to failures on CI. Instead, 
we should use latches to indicate when auto-follow patterns must be 
paused and resumed.
2019-11-18 09:40:56 +01:00
Dimitris Athanasiou 805c31e19e
[7.x][ML] Avoid NPE when node load is calculated on job assignment (#49186) (#49214)
This commit fixes a NPE problem as reported in #49150.
But this problem uncovered that we never added proper handling
of state for data frame analytics tasks.

In this commit we improve the `MlTasks.getDataFrameAnalyticsState`
method to handle null tasks and state tasks properly.

Closes #49150

Backport of #49186
2019-11-18 10:33:07 +02:00
Przemysław Witek 150db2b544
Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143) (#49164) 2019-11-18 07:52:57 +01:00
Jason Tedor 60d1d67aac
CCR should auto-retry rejected execution exceptions (#49213)
If CCR encounters a rejected execution exception, today we treat this as
fatal. This is not though, as the stuffed queue could drain. Requiring
an administrator to manually restart the follow tasks that faced such an
exception is a burden. This commit addresses this by making CCR
auto-retry on rejected execution exceptions.
2019-11-17 12:48:46 -05:00
Lisa Cawley 09a9ec4d23 [DOCS] Merges duplicate pages for native realms (#49198) 2019-11-15 15:35:53 -08:00
Mayya Sharipova 0e933a093d Add index name to search requests (#49175)
We can't guarantee expected request failures if search request is across
many indexes, as if expected shards fail, some indexes may return 200.

closes #47743
2019-11-15 16:39:18 -05:00
Jay Modi 57f57227ac
Clean up static web server in sql-client tests (#49187) (#49197)
The JdbcHttpClientRequestTests and HttpClientRequestTests classes both
hold a static reference to a mock web server that internally uses the
JDKs built-in HttpServer, which resides in a sun package that the
RamUsageEstimator does not have access to. This causes builds that use
a runtime of Java 8 to fail since the StaticFieldsInvariantRule is run
when Java 8 is used.

Relates #41526
Relates #49105
2019-11-15 13:02:21 -07:00
Lisa Cawley bc6a9de2dd [DOCS] Edits the get tokens API (#45312) 2019-11-15 10:54:07 -08:00
Lee Hinman 680436dd0d
[7.x] Don't halt policy execution on policy trigger exception… (#49171)
When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
`maybeRunAsyncAction`, `runPolicyAfterStateChange`, or
`runPeriodicStep` throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.
2019-11-15 09:22:37 -07:00
Albert Zaharovits 89b3c32b40
Audit log filter and marker (#49145)
This adds a log marker and a marker filter for the audit log.

Closes #47251
2019-11-15 08:44:09 -05:00
Christos Soulios d9f0245b10
[7.x] Implement stats aggregation for string terms (#49097)
Backport of #47468 to 7.x

This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following:

min_length: The length of the shortest term
max_length: The length of the longest term
avg_length: The average length of all terms
distribution: The probability distribution of all characters appearing in all terms
entropy: The total Shannon entropy value calculated for all terms

This aggregation has been implemented as an analytics plugin.
2019-11-15 14:36:21 +02:00
Andrei Dan 085d08cfd1
ILM Remove obsolete testRolloverAlreadyExists (#49104) (#49144)
The rollover action is now a retryable step (see #48256)
so ILM will keep retrying until it succeeds as opposed to stopping and
moving the execution in the ERROR step.

Fixes #49073

(cherry picked from commit 3ae90898121b43032ec8f3b50514d93a86e14d0f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

# Conflicts:
#	x-pack/plugin/ilm/qa/multi-node/src/test/java/org/elasticsearch/xpack/ilm/TimeSeriesLifecycleActionsIT.java
2019-11-15 12:06:22 +00:00
Ioannis Kakavas f5f0e1366a
Handle unexpected/unchecked exceptions correctly (#49080) (#49137)
Ensures that methods that are called from different threads ( i.e.
from the callbacks of org.apache.http.concurrent.FutureCallback )
catch `Exception` instead of only the expected checked exceptions.

This resolves a bug where OpenIdConnectAuthenticator#mergeObjects
would throw an IllegalStateException that was never caught causing
the thread to hang and the listener to never be called. This would
in turn cause Kibana requests to authenticate with OpenID Connect
to timeout and fail without even logging anything relevant.

This also guards against unexpected Exceptions that might be thrown
by invoked library methods while performing the necessary operations
in these callbacks.
2019-11-15 11:54:08 +02:00
James Baiera 6bb6adb8d3
Reuse collected cluster state in EnrichPolicyRunner (#48488) (#49100)
The cluster state is obtained twice in the EnrichPolicyRunner when updating 
the final alias. There is a possibility for the state to be slightly different 
between those two calls. This PR just has the function get the cluster state 
once and reuse it for the life of the function call.
2019-11-14 14:14:39 -05:00
Dan Hermann cac9fe4d86
[7.x] Validate monitoring password at parse time (#49083) 2019-11-14 09:39:28 -06:00
Dimitris Athanasiou be5894ed9c
[7.x][SQL] Mute JdbcConfigurationTests.testDriverConfigurationWithSSLInURL (#49085) (#49086)
Relates #41557
2019-11-14 15:15:55 +02:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Marios Trivyzas 7c3198ba44
SQL: [Tests] Mute testReplaceChildren for Pivot (#49045)
Temporarily "mute" the testReplaceChildren for Pivot since it leads to
failing tests for some seeds, since the new child doesn't respond to a
valid data type.

Relates to #48900

(cherry picked from commit 6200a2207b9a4264d2f3fc976577323c7e084317)
2019-11-14 11:30:33 +01:00
Armin Braun 25e05b0013
Fix X-Pack SchedulerEngine Shutdown (#48951) (#49054)
We can have a race here where `scheduleNextRun` executes concurrently to `stop`
and so we run into a `RejectedExecutionException` that we don't catch and thus it
fails tests.
=> Fixed by ignoring these so long as they coincide with a scheduler shutdown
2019-11-13 22:06:55 +01:00
Przemysław Witek e6ad3c29fd
Do not throw exceptions resulting from persisting datafeed timing stats. (#49044) (#49050) 2019-11-13 20:23:13 +01:00
Henning Andersen 66f0c8900f
Fix Transport Stopped Exception (#48930) (#49035)
When a node shuts down, `TransportService` moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped `NodeClosedException` exception instead, which is correctly
handled in replication action. Fixed other usages too.

Relates #42612
2019-11-13 18:48:05 +01:00
Tanguy Leroux e86b598813 Fix AutoFollowIT (#49025)
This commit fixes an off-by-one bug in the AutoFollowIT test that causes
failures because the leaderIndices counter is incremented during the evaluation
of the leaderIndices.incrementAndGet() < 20 condition but the 20th index is
not created, making the final assertion not verified.

It also gives a bit more time for cluster state updates to be processed on the
follower cluster.

Closes #48982
2019-11-13 13:20:57 +01:00
Ioannis Kakavas 4405042900
Remove unnecessary details logged for OIDC (#48746) (#49031)
This commit removes unnecessary details logged for
OIDC.

Co-Authored-By: Ioannis Kakavas <ikakavas@protonmail.com>
2019-11-13 13:43:56 +02:00
Yannick Welsch 2dfa0133d5 Always use primary term from primary to index docs on replica (#47583)
Ensures that we always use the primary term established by the primary to index docs on the
replica. Makes the logic around replication less brittle by always using the operation primary
term on the replica that is coming from the primary.
2019-11-13 12:13:45 +01:00
Ioannis Kakavas e0331e2a0f
Remove limitation for SAML encryption in FIPS mode (#48948) (#49019)
Our documentation regarding FIPS 140 claimed that when using SAML
in a JVM that is configured in FIPS approved only mode, one could
not use encrypted assertions. This stemmed from a wrong
understanding regarding the compliance of RSA-OAEP which is used
as the key wrapping algorithm for encrypting the key with which the
SAML Assertion is encrypted.

However, as stated for instance in
https://downloads.bouncycastle.org/fips-java/BC-FJA-SecurityPolicy-1.0.0.pdf
RSA-OAEP is approved for key transport, so this limitation is not
effective.

This change removes the limitation from our FIPS 140 related
documentation.
2019-11-13 12:10:01 +02:00
Julie Tibshirani 37fa3fb4ff
Ensure parameters are updated when merging flattened mappings. (#48971) (#49014)
This PR makes the following two fixes around updating flattened fields:

* Make sure that the new value for ignore_above is immediately taken into
  affect. Previously we recorded the new value but did not use it when parsing
  documents.
* Allow depth_limit to be updated dynamically. It seems plausible that a user
  might want to tweak this setting as they encounter more data.
2019-11-12 21:50:39 -05:00
Lee Hinman 5eb37c29fe
[7.x] Re-read policy phase JSON when using ILM's move-to-step… (#49011)
When using the move-to-step API, we should reread the phase JSON from
the latest version of the ILM policy. This allows a user to move to the
same step while re-reading the policy's latest version. For example,
when changing rollover criteria.

While manually messing around with some other things I discovered that
we only reread the policy when using the retry API, not the move-to-step
API. This commit changes the move-to-step API to always read the latest
version of the policy.
2019-11-12 19:41:06 -07:00
Martijn van Groningen 18d5d73305
Enable spotless for enrich gradle project in 7 dot x branch. (#48976)
Backport of #48908

The enrich project doesn't have much history as all the other gradle projects,
so it makes sense to enable spotless for this gradle project.
2019-11-12 13:22:34 +01:00
Armin Braun ea9f094e75
Significantly Lower Monitoring HttpExport Memory Footprint (#48854) (#48966)
The `HttpExportBulk` exporter is using a lot more memory than it needs to
by allocating buffers for serialization and IO:

* Remove copying of all bytes when flushing, instead use the stream wrapper
* Remove copying step turning the BAOS into a `byte[]`
  * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput`
* Don't allocate a new BAOS for every document, just keep appending to a single BAOS
2019-11-12 08:49:40 +01:00
Jake Landis c320b499a0
Prevent deadlock by using separate schedulers (#48697) (#48964)
Currently the BulkProcessor class uses a single scheduler to schedule
flushes and retries. Functionally these are very different concerns but
can result in a dead lock. Specifically, the single shared scheduler
can kick off a flush task, which only finishes it's task when the bulk
that is being flushed finishes. If (for what ever reason), any items in
that bulk fails it will (by default) schedule a retry. However, that retry
will never run it's task, since the flush task is consuming the 1 and
only thread available from the shared scheduler.

Since the BulkProcessor is mostly client based code, the client can
provide their own scheduler. As-is the scheduler would require
at minimum 2 worker threads to avoid the potential deadlock. Since the
number of threads is a configuration option in the scheduler, the code
can not enforce this 2 worker rule until runtime. For this reason this
commit splits the single task scheduler into 2 schedulers. This eliminates
the potential for the flush task to block the retry task and removes this
deadlock scenario.

This commit also deprecates the Java APIs that presume a single scheduler,
and updates any internal code to no longer use those APIs.

Fixes #47599

Note - #41451 fixed the general case where a bulk fails and is retried
that can result in a deadlock. This fix should address that case as well as
the case when a bulk failure *from the flush* needs to be retried.
2019-11-11 16:31:21 -06:00
Benjamin Trent 46ab1db54f
[7.x] [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050) (#48958)
* [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

Related PR: https://github.com/elastic/ml-cpp/pull/809

* adjusting bwc version
2019-11-11 15:43:03 -05:00
Jake Landis 909fbd0015
[7.x] Mute FullClusterRestartTest#testWatcher and 30s timeout… (#48850)
The timeout was increased to 60s to allow this test more time to reach a
yellow state. However, the test will still on occasion fail even with the
60s timeout.

Related: #48381
Related: #48434
Related: #47950
Related: #40178
2019-11-11 09:38:14 -06:00
Christoph Büscher 6119f0aaa2 Fix Eclipse compilation in DataFrameDataExtractorTests (#48942) 2019-11-11 16:17:55 +01:00
Martijn van Groningen a1dd830cb5
Re-enabled test with longer timeout waiting for monitoring.
See #48258
2019-11-11 16:07:50 +01:00
Yannick Welsch af887be3e5 Hide orphaned tasks from follower stats (#48901)
CCR follower stats can return information for persistent tasks that are in the process of being cleaned up. This is problematic for tests where CCR follower indices have been deleted, but their persistent follower task is only cleaned up asynchronously afterwards. If one of the following tests then accesses the follower stats, it might still get the stats for that follower task.

In addition, some tests were not cleaning up their auto-follow patterns, leaving orphaned patterns behind. Other tests cleaned up their auto-follow patterns. As always the same name was used, it just depended on the test execution order whether this led to a failure or not. This commit fixes the offensive tests, and will also automatically remove auto-follow-patterns at the end of tests, like we do for many other features.

Closes #48700
2019-11-08 13:56:53 +01:00
Dan Hermann 5805560a2a
Validate index name time format setting at parse time (#47911) (#48881) 2019-11-07 05:24:49 -06:00
Dimitris Athanasiou dfc6a13b44
[7.x][ML] Handle nested arrays in source fields (#48885) (#48889)
Backport of #48885
2019-11-07 07:30:50 +02:00
James Rodewig f1396b6322 [DOCS] Add Java to list of HTTP client libraries for basic authentication (#48647) 2019-11-05 17:09:10 -05:00
David Roberts c03f7ba74c [TEST] Mute TimeoutCheckerTests.testWatchdog
Due to https://github.com/elastic/elasticsearch/issues/48861
2019-11-05 11:49:46 +00:00
Dan Hermann c85cf7a6de
Validate proxy base path at parse time (#47912) (#48825) 2019-11-04 09:51:13 -06:00
Nhat Nguyen 020ff0fef9 Do not intercept renew requests from other tests (#48833)
We might have some outstanding renew retention lease requests after a 
shard has unfollowed. If testRetentionLeaseIsAddedIfItDisappearsWhileFollowing
intercepts a renew request from other tests then we will never unlatch 
and the test will time out.

Closes #45192
2019-11-02 21:15:05 -04:00
Armin Braun 3c20541823
Cleanup Concurrent RepositoryData Loading (#48329) (#48834)
The loading of `RepositoryData` is not an atomic operation.
It uses a list + get combination of calls.
This lead to accidentally returning an empty repository data
for generations >=0 which can never not exist unless the repository
is corrupted.
In the test #48122 (and other SLM tests) there was a low chance of
running into this concurrent modification scenario and the repository
actually moving two index generations between listing out the
index-N and loading the latest version of it. Since we only keep
two index-N around at a time this lead to unexpectedly absent
snapshots in status APIs.
Fixing the behavior to be more resilient is non-trivial but in the works.
For now I think we should simply throw in this scenario. This will also
help prevent corruption in the unlikely event but possible of running into this
issue in a snapshot create or delete operation on master failover on a
repository like S3 which doesn't have the "no overwrites" protection on
writing a new index-N.

Fixes #48122
2019-11-02 20:42:29 +01:00
Armin Braun a22f6fbe3c
Cleanup Redundant Futures in Recovery Code (#48805) (#48832)
Follow up to #48110 cleaning up the redundant future
uses that were left over from that change.
2019-11-02 17:28:12 +01:00
Nhat Nguyen 4c70770877 Add debug log for CcrRetentionLeaseIT (#48820)
testRetentionLeaseIsAddedIfItDisappearsWhileFollowing is still failing 
although we already have several fixes. I think other tests interfere
and cause this test to fail. We can use the test scope to isolate them.
However, I prefer to add debug logs so we can find the source.

Relates #45192
2019-11-01 22:07:35 -04:00
Armin Braun e26d01e71f
Make CcrRepository#restore non-Blocking (#48814) (#48823)
With the changes in #48110 there is no more need
to block a generic thread when waiting for the multi file transfer
in `CcrRepository`.
2019-11-01 21:02:47 +01:00
Lee Hinman 6c290ecaf7 Fix ilm/20_move_to_step basic moving to step (#48821)
Previously this step moved to the forcemerge step, however, if the
machine running the test was fast enough, it would execute the
forcemerge and move to the next step (`segment-count`) so the comparison
would fail. This commit changes the step to be a step that will never go
anywhere else, the terminal step.

Resolves #48761
2019-11-01 13:58:24 -06:00
Hendrik Muhs 5ecde37a68
[7.x][Transform] decouple task and indexer (#48812)
decouple TransformTask and ClientTransformIndexer. Interaction between the 2 classes are
now moved into a context class which holds shared information.

relates #45369
2019-11-01 19:39:35 +01:00
Mark Vieira 6ab4645f4e
[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818)
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.

Closes #42042
2019-11-01 11:33:11 -07:00
Dimitris Athanasiou f2d4c94a9c
[7.x][ML] Deduplicate multi-fields for data frame analytics (#48799) (#48806)
In the case multi-fields exist in the source index, we pick
all variants of them in our extracted fields detection for
data frame analytics. This means we may have multiple instances
of the same feature. The worse consequence of this is when the
dependent variable (for regression or classification) is also
duplicated which means we train a model on the dependent variable
itself.

Now that #48770 is merged, this commit is adding logic to
only select one variant of multi-fields.

Closes #48756

Backport of #48799
2019-11-01 16:53:05 +02:00
Tim Vernum fd4ae697b8 Fix indentation of "except" in role mapping doc
"except" is a type of rule, and should be indented accordingly.
2019-11-01 10:46:15 -04:00
Dan Hermann 3604add5c9
[7.x] Validate monitoring username at parse time (#48774) 2019-11-01 09:02:37 -05:00
Andrei Dan 98a9227588
Fix TimeSeriesLifecycleActionsIT.testRolloverAlreadyExists (#48747) (#48795)
* ILM Test asserts on the same ilm/_explain output

With the introduction of retryable steps subsequent ilm/_explain calls
can see the state of an ilm cycle move out of the error step. This test
made several assertions assuming that the cycle remains in the error
step so this commit changes the test to make one _explain call and have
all the asserts work on the same ilm state (so subsequent assumptions to
the cycle being in the error step are valid).

* Drop unused field in test.

(cherry picked from commit 44c74bb487151c886a08b27f32b13f7a72056997)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-11-01 12:34:33 +00:00
Dimitris Athanasiou 1f662e0b12
[7.x][ML] Prevent fetching multi-field from source (#48770) (#48797)
Aggregatable mutli-fields are at the moment wrongly mapped
as normal doc_value fields and thus they support fetching from
source. However, they do not exist in the source. This results
to failure to extract such fields.

This commit fixes this bug. While a fix could be worked out
on top of the existing code, it is evident the extraction logic
has become difficult to understand and maintain. As we also
want to deduplicate multi-fields for data frame analytics,
it seemed appropriate to refactor the code to simplify and
better handle the extraction of multi-fields.

Relates #48756

Backport of #48770
2019-11-01 14:18:03 +02:00
Andrei Stefan e1e9b23db8 Cleanup static instance in @AfterClass 2019-10-31 23:24:40 -04:00
Andrei Stefan 2c73c7dfe3 SQL: binary communication implementation for drivers and the CLI (#48261)
* Introduce binary_format request parameter (binary.format for JDBC) to disable binary
communication between clients (jdbc/odbc) and server.
* for CLI - "binary" command line parameter (or -b) is introduced. Default value is "true".
* binary communication (cbor) is enabled by default
* disabling request parameter introduced for debugging purposes only

(cherry picked from commit f96a5ca61cb9fad9ed59357320af20e669348ce7)
2019-10-31 20:39:41 -04:00
Tal Levy 4be54402de
[7.x] Add ingest info to Cluster Stats (#48485) (#48661)
* Add ingest info to Cluster Stats (#48485)

This commit enhances the ClusterStatsNodes response to include global
processor usage stats on a per-processor basis.

example output:

```
...
    "processor_stats": {
      "gsub": {
        "count": 0,
        "failed": 0
        "current": 0
        "time_in_millis": 0
      },
      "script": {
        "count": 0,
        "failed": 0
        "current": 0,
        "time_in_millis": 0
      }
    }
...
```

The purpose for this enhancement is to make it easier to collect stats on how specific processors are being used across the cluster beyond the current per-node usage statistics that currently exist in node stats.

Closes #46146.

* fix BWC of ingest stats

The introduction of processor types into IngestStats had a bug.
It was set to `null` and set as the key to the map. This would
throw a NPE. This commit resolves this by setting all the processor
types from previous versions that are not serializing it out to
`_NOT_AVAILABLE`.
2019-10-31 14:36:54 -07:00
Lee Hinman d0ead688c3
[7.x] Fix TimeSeriesLifecycleActionsIT.testExplainFilters (#48… (#48776)
This test used an index without an alias to simulate a failure in the
`check-rollover-ready` step. However, with #48256 that step
automatically retries, meaning that the index may not always be in
the ERROR step.

This commit changes the test to use a shrink action with an invalid
number of shards so that it stays in the ERROR step.

Resolves #48767
2019-10-31 15:25:12 -06:00
Ioannis Kakavas 99aedc844d
Copy http headers to ThreadContext strictly (#45945) (#48675)
Previous behavior while copying HTTP headers to the ThreadContext,
would allow multiple HTTP headers with the same name, handling only
the first occurrence and disregarding the rest of the values. This
can be confusing when dealing with multiple Headers as it is not
obvious which value is read and which ones are silently dropped.

According to RFC-7230, a client must not send multiple header fields
with the same field name in a HTTP message, unless the entire field
value for this header is defined as a comma separated list or this
specific header is a well-known exception.

This commits changes the behavior in order to be more compliant to
the aforementioned RFC by requiring the classes that implement
ActionPlugin to declare if a header can be multi-valued or not when
registering this header to be copied over to the ThreadContext in
ActionPlugin#getRestHeaders.
If the header is allowed to be multivalued, then all such headers
are read from the HTTP request and their values get concatenated in
a comma-separated string.
If the header is not allowed to be multivalued, and the HTTP
request contains multiple such Headers with different values, the
request is rejected with a 400 status.
2019-10-31 23:05:12 +02:00
Andrey Ershov 088988bb37
GCS snapshot cleanup tool backport to 7.x (#48750)
This is the backport of #45076 with dependent changes.
2019-10-31 18:21:36 +03:00
Alexander Reelsen 4ecf234617 Upgrade to joda 2.10.4 (#47805) 2019-10-31 14:49:50 +01:00
emasab 185e067442 SQL: Failing Group By queries due to different ExpressionIds (#43072)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map
that helps the QueryTranslator to identify the grouping columns. The issue is
that the same expression in different parts of the query (SELECT clause and GROUP BY clause)
ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds
use the hashCode() of NamedExpression.

Fixes: #41159
Fixes: #40001
Fixes: #40240
Fixes: #33361
Fixes: #46316
Fixes: #36074
Fixes: #34543
Fixes: #37044

Fixes: #42041
(cherry picked from commit 3c38ea555984fcd2c6bf9e39d0f47a01b09e7c48)
2019-10-31 14:49:16 +01:00
Martijn van Groningen c358ecb5fb
Don't preserve indices between enrich qa tests.
This was added because it was suspected to cause the monitoring
enrich verification to fail, but that is not the case.

See #48258
2019-10-31 14:23:56 +01:00
Andrei Dan ffe5d5417f
ILM Make the `check-rollover-ready` step retryable (#48256) (#48740)
This adds the infrastructure to be able to retry the execution of retryable
steps and makes the `check-rollover-ready` retryable as an initial step to
make the rollover action more resilient to transient errors.

(cherry picked from commit 454020ac8acb147eae97acb4ccd6fb470d1e5f48)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-10-31 11:28:55 +00:00
Albert Zaharovits 00d3151eea Document allow_restricted_indices for indices privileges (#47514)
Document the allow_restricted_indices role descriptor field.
2019-10-31 11:45:11 +02:00
David Roberts c3063c4e1f [ML] Make the URL of the ML C++ Ivy repo configurable (#48702)
At present the ML C++ artifact is always downloaded from
S3.  This change adds an option to configure the location.

(The intention is to use a file:/// URL to pick up the
artifact built in a Docker container in ml-cpp PR builds
so that C++ changes that will break Java integration tests
can be detected before the ml-cpp PRs are merged.)

Relates elastic/ml-cpp#766
2019-10-31 09:21:44 +00:00
Dimitris Athanasiou 919596b2e8
[7.x][ML] Move field extraction logic to its own package (#48709) (#48712)
Moves common field extraction logic to its own package so that it can
be used both for anomaly detection and data frame analytics.

In preparation for refactoring extraction fields to be simpler and to
support multi-fields properly.

Backport of #48709
2019-10-31 02:41:00 +02:00
Yogesh Gaikwad c7342dde29
Fix to release system resource after reading JKWSet file (#48666) (#48677)
When we load a JSON Web Key (JWKSet) from the specified
file using JWKSet.load it internally uses IOUtils.readFileToString
but the opened FileInputStream is never closed after usage.
https://bitbucket.org/connect2id/nimbus-jose-jwt/issues/342

This commit reads the file and parses the JWKSet from the string.

This also fixes an issue wherein if the underlying file changed,
for every change event it would add another file watcher. The
change is to only add the file watcher at the start.

Closes #44942
2019-10-31 10:16:33 +11:00
Lee Hinman 2d5291cf3b Un-AwaitsFix and enhance logging for testPolicyCRUD (#48719)
* Un-AwaitsFix and enhance logging for testPolicyCRUD

This removes the `AwaitsFix` and increases the test logging for
`SnapshotLifecycleServiceTests.testPolicyCRUD` in an effort to track
down the cause of #44997.

* Remove unused import
2019-10-30 17:02:57 -06:00
Julie Tibshirani ae1ef5fd92 Refactor unit tests for vector functions. (#48662)
This PR performs the following changes:
* Split `ScoreScriptUtilsTests` into `DenseVectorFunctionTests` and
`SparseVectorFunctionTests`. This will make it easier to delete all sparse
vector function tests once we remove support on 8.x.
* As much as possible, break up the large test methods into individual tests
for each vector function (`cosineSimilarity`, `l2norm`, etc.).
2019-10-30 15:36:06 -07:00
Lee Hinman ed2bb73de2 Fix SnapshotLifecycleService logger (#48711)
The logger was erroneously using the `SnapshotLifecycleMetadata` class
for its initialization, making it hard to target packages for logging
levels since `SnapshotLifecycleMetadata` is in a different package.
2019-10-30 13:13:50 -06:00
Benjamin Trent c9ead80c31
[7.x] [ML][Inference] separating definition and config object storage (#48651) (#48695)
* [ML][Inference] separating definition and config object storage (#48651)

This separates out the `definition` object from being stored within the configuration object in the index. 

This allows us to gather the config object without decompressing a potentially large definition.

Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
2019-10-30 13:27:29 -04:00
Lee Hinman 72a601c47f
[7.x] Don't schedule SLM jobs when services have been stopped… (#48692)
This adds a guard for the SLM lifecycle and retention service that
prevents new jobs from being scheduled once the service has been
stopped. Previous if the node were shut down the service would be
stopped, but a cluster state or local master election would cause a job
to attempt to be scheduled. This could lead to an uncaught
`RejectedExecutionException`.

Resolves #47749
2019-10-30 09:46:35 -06:00
Armin Braun 52e5ceb321
Restore from Individual Shard Snapshot Files in Parallel (#48110) (#48686)
Make restoring shard snapshots run in parallel on the `SNAPSHOT` thread-pool.
2019-10-30 14:36:30 +01:00
Yogesh Gaikwad 9ed7352a12
Add Sysprop to Adjust IO Buffer Size (#48267) (#48667)
The 1MB IO-buffer size per transport thread is causing trouble in
some tests, albeit at a low rate. Reducing the number of transport
threads was not enough to fully fix this situation.
Allowing to configure the size of the buffer and reducing it by
more than an order of magnitude should fix these tests.

Closes #46803
2019-10-30 14:19:54 +11:00
Yogesh Gaikwad 1b64c1992a Add owner flag parameter to the rest spec (#48500)
This commit adds missing info about newly added
`owner` flag to the rest spec, also adds a rest test
for the same.

Closes#48499
2019-10-30 13:07:01 +11:00
Julie Tibshirani 89c65752dc
Update the signature of vector script functions. (#48653)
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

```
Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')
```

This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of `doc['field']` is unclear.

The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecates the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. It seemed simplest to understand (for both
us and users) to keep everything symmetric between dense and sparse vectors.
2019-10-29 15:46:05 -07:00
Gordon Brown 25724c5c46
Adjust date parsing in ILM integration tests (#48648)
The format returned by the API is not always parsable with
`Instant.parse()`, so this commit adjusts to parsing those dates as
`ISO_ZONED_DATE_TIME` instead, which appears to always parse the
returned value correctly.
2019-10-29 15:44:04 -07:00
Gordon Brown 50d7424e7d
Unmute and increase logging on flaky SLM tests (#48612)
The failures in these tests have been remarkably difficult to track
down, in part because they will not reproduce locally. This commit
unmutes the flaky tests and increases logging, as well as introducing
some additional logging, to attempt to pin down the failures.
2019-10-29 13:39:19 -07:00
Andrei Dan 8b22e297ed
ILM open/close steps are noop if idx is open/close (#48614) (#48640)
The open and close follower steps didn't check if the index is open,
closed respectively, before executing the open/close request.
This changes the steps to check the index state and only perform the
open/close operation if the index is not already open/closed.
2019-10-29 17:43:56 +00:00
Lisa Cawley be9df101bf [DOCS] Adds missing references to oidc realms (#48224) 2019-10-29 09:41:34 -07:00
Gordon Brown cf235796c0
Use more reliable "never run" cron pattern in tests (#48608)
The cron schedule "1 2 3 4 5 ?" will run every May 4 at 03:02:01, which
may result in unnecessary test failures once a year. This commit
switches out uses of that schedule in tests for one which will never
execute (because it specifies a day which doesn't exist, Feb. 31). Also
factors the schedule out to a constant to make the intent clearer.
2019-10-29 09:33:14 -07:00
Przemysław Witek 7c944d26c5
[7.x] Assert that the results of classification analysis can be evaluated using _evaluate API. (#48626) (#48634) 2019-10-29 16:20:56 +01:00
Ioannis Kakavas a0362153e2
Update oauth2-oidc-sdk and nimbus-jose-jwt (#48537) (#48628)
Update two dependencies for our OpenID Connect realm implementation
to their latest versions
2019-10-29 14:18:59 +02:00
Yannick Welsch 790cfc8ad2 Fix upgraded_scroll test (#48525)
I think the problem is that the master is trying to relocate the "upgraded_scroll" shard back to
the node on which it was previously allocated, but to which it can't be allocated now due to the
shard lock being held because of an in-progress scroll. As the master keeps on retrying and
retrying (and indefinitely tries so because max_retries does not apply to relocations, it blocks
any other lower-prioritized task from completing, which leads to the rolling upgrade tests failing
(see #48395). 

Closes #48395
2019-10-29 08:10:40 +01:00
Cris da Rocha 947f89a3a1 Update troubleshooting.asciidoc (#48516) 2019-10-28 18:44:24 -07:00
Mark Vieira e5c6440a4f
Simplify usage of Gradle Shadow plugin (#48478) (#48597)
This commit simplifies and standardizes our usage of the Gradle Shadow
plugin to conform more to plugin conventions. The custom "bundle" plugin
has been removed as it's not necessary and performs the same function
as the Shadow plugin's default behavior with existing configurations.

Additionally, this removes unnecessary creation of a "nodeps" artifact,
which is unnecessary because by default project dependencies will in
fact use the non-shadowed JAR unless explicitly depending on the
"shadow" configuration.

Finally, we've cleaned up the logic used for unit testing, so we are
now correctly testing against the shadow JAR when the plugin is applied.
This better represents a real-world scenario for consumers and provides
better test coverage for incorrectly declared dependencies.

(cherry picked from commit 3698131109c7e78bdd3a3340707e1c7b4740d310)
2019-10-28 12:11:55 -07:00
Benjamin Trent 6ea59dd428
[ML][Transforms] add wait_for_checkpoint flag to stop (#47935) (#48591)
Adds `wait_for_checkpoint` for `_stop` API.
2019-10-28 13:02:57 -04:00
Gordon Brown 5021410165
Retry on RepositoryException in SLM tests (#48548)
Due to a bug, GETing a snapshot can cause a RespositoryException to be
thrown. This error is transient and should be retried, rather than
causing the test to fail. This commit converts those
RepositoryExceptions into AssertionErrors so that they will be retried
in code wrapped in assertBusy.
2019-10-28 09:24:38 -07:00