1032 Commits

Author SHA1 Message Date
Dimitris Athanasiou
e09074d382
[7.x][ML] Fix online updates with custom rules referencing filters (#63057) (#63064)
When an opened anomaly detection job is updated with a detection
rule that references a filter, apart from updating the c++ process
with the rule, we also need to update it with the referenced filter.

This commit fixes a bug which led to the job not applying such updates
on-the-fly.

Fixes #62948

Backport of #63057
2020-09-30 16:01:06 +03:00
Przemysław Witek
4366d58564
[7.x] [ML] Implement AucRoc metric for classification (#60502) (#63051) 2020-09-30 12:55:52 +02:00
Dimitris Athanasiou
179fe9cc0e
[7.x][ML] Delete dest index and reindex if incompatible (#62960) (#63050)
Data frame analytics results format changed in version `7.10.0`.
If existing jobs that were not completed are restarted, it is
possible the destination index had already been created. That index's
mappings are not suitable for the new results format.

This commit checks the version of the destination index and deletes
it when the version is outdated. The job will then continue by
recreating the destination index and reindexing.

Backport of #62960
2020-09-30 12:57:48 +03:00
David Roberts
05427c2bb2
[ML] Add timeouts to named pipe connections (#63022)
This PR adds timeouts to the named pipe connections of the
autodetect, normalize and data_frame_analyzer processes.
This argument requires the changes of elastic/ml-cpp#1514 in
order to work, so that PR will be merged before this one.
(The controller process already had a different mechanism,
tied to the ES JVM lifetime.)

Backport of #62993
2020-09-29 18:04:02 +01:00
Benjamin Trent
2b9032a07d
[7.x] [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976) (#62999)
* [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976)

This fixes the two test failures.

The shard failure seems to be due to the .ml-stats index being in the middle of being created.
2020-09-29 08:12:20 -04:00
Dimitris Athanasiou
7f6c1ff5b4
[7.x][ML] Remove top level importance from classification inference results (#62486) (#62964)
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.

Backport of #62486
2020-09-29 10:58:48 +03:00
Benjamin Trent
a054e62bc4
[ML] allow datafeeds to run if there are any concrete indices (#62827) (#62965)
This commit allows a datafeed to be assigned to a node if only one index pattern has concrete indices.
2020-09-28 12:58:07 -04:00
Benjamin Trent
c56424f740
[ML] write deprecation warning when include_model_definition parameter is used (#62834) (#62885)
for get trained models include_model_definition is now deprecated.

This commit writes a deprecation warning if that parameter is used and suggests the caller to utilize the replacement
2020-09-24 11:38:54 -04:00
Daniel Mitterdorfer
d2166030d1
Mute failing test case in DeleteExpiredDataIT (#62870) (#62871)
Relates #62699
2020-09-24 15:42:52 +02:00
Dimitris Athanasiou
7de5201291
[7.x][ML] Handle data frame analytics state spreading over multiple docs (#62564) (#62824)
When state persistence was first implemented for data frame analytics
we had the assumption that state would always fit in a single document.
However this is not the case any more.

This commit adds handling of state that spreads over multiple documents.

Backport of #62564
2020-09-23 16:16:34 +03:00
Dimitris Athanasiou
69e72656fa
[7.x][ML] Reset reindexing progress when DFA job resumes with incomplete reindexing (#62772) (#62816)
This fixes reindexing progress in the scenario when a DFA job that had not finished
reindexing is resumed (either because the user called stop and start or because the
job was reassigned in the middle of reindexing). Before the fix reindexing progress
stays to the value it had reached before until it surpasses that value.

When we resume a data frame analytics job we want to preserve reindexing progress
and reset all other phases. Except for when reindexing was not completed.
In that case we are deleting the destination index and starting reindexing
from scratch. Thus we need to reset reindexing progress too.

Backport of #62772
2020-09-23 14:09:04 +03:00
Benjamin Trent
77bfb32635
[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) (#62784)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)

* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
 global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the interal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.

This commit addresses this issue for the ML plugin

* unmuting test
2020-09-22 15:07:08 -04:00
Nik Everett
39a617773d
Raname grok's built-in patterns (backport of #62735) (#62765)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 13:06:43 -04:00
Andrei Dan
0be89bcd7f
Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#62763) 2020-09-22 13:43:15 +01:00
Benjamin Trent
0f142c6afc
[ML] all multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563) (#62629)
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.

- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
2020-09-18 11:06:07 -04:00
Benjamin Trent
e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Ignacio Vera
6a3d731be1
Only call reduce on a single InternalAggregation when needed (#62525) (#62594)
Adds a new abstract method in InternalAggregation that flags the framework if it needs to reduce on a single InternalAggregation.
2020-09-18 08:43:58 +02:00
Jake Landis
5b7246157f
[7.x] Fix projects that failed to build within Intellij (#62258) (#62408)
This commit address some build failures from the perspective of Intellij.
These changes include:
* changing an order of a dependency definition that seems to can cause Intellij build to fail.
* introduction of an abstract class out of the test source set (seems to be an issue sharing 
  classes cross projects with non-standard source sets. 
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
2020-09-17 17:45:12 -05:00
Dimitris Athanasiou
7118ff7976
[7.x][ML] Remove model snapshot legacy doc ids (#62434) (#62569)
Removes methods that were no longer used regarding version 5.4 doc ids of ModelState.

Also adds clean up of 5.4 model state and quantile docs in the daily maintenance.

Backport of #62434
2020-09-17 23:43:28 +03:00
Dimitris Athanasiou
f5c28e2054
[7.x][ML] Do not start data frame analytics when too many docs are analyzed (#62547) (#62558)
The data frame structure in c++ has a limit on 2^32 documents. This commit
adds a check that the number of documents involved in the analysis are
less than that and fails to start otherwise. That saves the cost of
reindexing when it is unnecessary.

Backport of #62547
2020-09-17 19:06:38 +03:00
David Kyle
417ce9396d
[ML] Add datafeed run time fields integration test (#62535) (#62538) 2020-09-17 13:41:07 +01:00
Benjamin Trent
341eeae6e7
[ML] fixes testWatchdog test verifying matcher is interrupted on timeout (#62391) (#62447)
Constructing the timout checker FIRST and THEN registering the watcher allows the test to have a race condition.

The timeout value could be reached BEFORE the matcher is added. To prevent the matcher never being interrupted, a new timedOut value is added to the watcher thread entry. Then when a new matcher is registered, if the thread was previously timedout, we interrupt the matcher immediately.

closes #48861
2020-09-16 09:13:22 -04:00
Benjamin Trent
8d89a28126
[ML] unmuting test for testTooManyPartitions memory check on windows (#62393) (#62405)
This commit unmutes the windows check for testTooManyPartitions test.

The assertion has since changed to include a soft_limit check.

This coupled with changes over the past years means the test should be enabled again.

related to: #32033
2020-09-16 07:03:10 -04:00
David Roberts
e4275f3749 [ML] Use utility thread pool for memory estimation (#62314)
The job comms thread pool is intended for the long-running job
processes that do anomaly detection or data frame analytics and
count towards job count and memory limits.

This commit moves the short-lived memory estimation processes
to the ML utility thread pool.

Although this doesn't matter in most cases, at the limits of
scale it could mean that memory estimations would get in the way
of starting jobs, or would queue up for an excessive period of
time while waiting for jobs to finish.
2020-09-14 16:47:12 +01:00
David Roberts
d8288526d9 [ML] Add null checks for C++ log handler (#62238)
It has been observed that if the normalizer process fails
to connect to the JVM then this causes a null pointer
exception as the JVM tries to close the native process
object.  The accessors and close methods of the native
process class that access the C++ log handler should not
assume that it connected correctly.
2020-09-14 11:28:26 +01:00
David Roberts
969a1c558b [ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.

This is a small building block towards fixing #55616
2020-09-10 09:33:42 +01:00
Jake Landis
d8dad9ab2c
[7.x] Remove integTest task from PluginBuildPlugin (#61879) (#62135)
This commit removes `integTest` task from all es-plugins.  
Most relevant projects have been converted to use yamlRestTest, javaRestTest, 
or internalClusterTest in prior PRs. 

A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest 
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)

related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
2020-09-09 14:25:41 -05:00
Benjamin Trent
e181e24d48
[ML] only persist progress if it has changed (#62123) (#62180)
* [ML] only persist progress if it has changed

We already search for the previously stored progress document.

For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and current progress
If the progress has changed, persistence continues as normal.
2020-09-09 12:04:09 -04:00
Benjamin Trent
057bf3f7d5
[ML] setting require_alias to previous value on bulk index retry (#62103) (#62108)
Previous work has been done to prevent automatically creating a concrete index when an alias is desired.

This commit addresses a path where this check was not being done.

relates: #62064
2020-09-08 11:38:32 -04:00
David Roberts
b2636678b2 [ML] Add support for date_nanos fields in find_file_structure (#62048)
Now that #61324 is merged it is possible for the find_file_structure
endpoint to suggest using date_nanos fields for timestamps where
the timestamp format provides greater than millisecond accuracy.
2020-09-08 13:05:09 +01:00
David Kyle
a5b24bf44c
Mute ClassificationIT (#62063)
testWithOnlyTrainingRowsAndTrainingPercentIsFifty_DependentVariableIsBoolean
For #60759
2020-09-07 16:10:48 +01:00
Dimitris Athanasiou
d37f197efd
[7.x][ML] Allow training_percent to be any positive double up to hundred (#61977) (#61990)
This changes the valid range of `training_percent` for regression and
classification from [1, 100] to (0, 100].

Backport of #61977
2020-09-04 17:34:14 +03:00
Benjamin Trent
cec102a391
[7.x] [ML] adds new n_gram_encoding custom processor (#61578) (#61935)
* [ML] adds new n_gram_encoding custom processor (#61578)

This adds a new `n_gram_encoding` feature processor for analytics and inference.

The focus of this processor is simple ngram encodings that allow:
 - multiple ngrams [1..5]
 - Prefix, infix, suffix
2020-09-04 08:36:50 -04:00
Dimitris Athanasiou
bdccab7c7a
[7.x][ML] Add incremental id during data frame analytics reindexing (#61943) (#61971)
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.

This commit changes the approach in order to result to a destination
index that allows nested fields.

Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.

The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.

Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.

With these changes data frame analytics jobs do not use scroll at any part.

Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.

Backport of #61943
2020-09-04 13:24:42 +03:00
Tanguy Leroux
c90ee32cdc
Mute ClassificationIT.testTooLowConfiguredMemoryStillStarts (#61915)
Relates #61913
2020-09-03 15:52:01 +02:00
Dimitris Athanasiou
ec405978fc
[7.x][ML] Update reindexing task progress before persisting job progress (#61868) (#61875)
This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes #61852

Backport of #61868
2020-09-02 21:44:18 +03:00
Benjamin Trent
c22415c241
[7.x] [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846) (#61869)
* [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846)

Native PR addresses this test failure: https://github.com/elastic/ml-cpp/pull/1465


closes https://github.com/elastic/elasticsearch/issues/61704

closes https://github.com/elastic/elasticsearch/issues/61561
2020-09-02 13:23:23 -04:00
Jake Landis
794aac717d
[7.x] Convert first 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#60630) (#61855)
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.

This includes the following projects:
async-search, autoscaling, ccr, enrich, eql, frozen-indicies,
data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml

A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.

A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is).

related: #61802
related: #56841
related: #59939
related: #55896
2020-09-02 11:19:24 -05:00
Dimitris Athanasiou
07ab0beea0
[7.x][ML] Improve handling of exception while starting DFA process (#61838) (#61847)
While starting the data frame analytics process it is possible
to get an exception before the process crash handler is in place.
In addition, right after starting the process, we check the process
is alive to ensure we capture a failed process. However, those exceptions
are unhandled.

This commit catches any exception thrown while starting the process
and sets the task to failed with the root cause error message.

I have also taken the chance to remove some unused parameters
in `NativeAnalyticsProcessFactory`.

Relates #61704

Backport of #61838
2020-09-02 16:32:45 +03:00
David Kyle
d268540f20
[ML] Check and install the latest template in the DFA executor (#61589) (#61842)
During a rolling upgrade it is possible that a worker node will be upgraded before
the master in which case the DFA templates will not have been installed.
Before a DFA task starts check that the latest template is installed and install it if necessary.
2020-09-02 12:16:29 +01:00
Nik Everett
f8158bdb2d Skip failing test
Tracked by https://github.com/elastic/elasticsearch/issues/61561
2020-09-01 13:44:31 -04:00
Dimitris Athanasiou
2547cfbe54
[7.x][ML] Persist progress when setting DFA task to failed (#61782) (#61792)
When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.

Backport of #61782
2020-09-01 18:33:07 +03:00
Benjamin Trent
7dabaad7d9
[ML] refactor ml job node selection into its own class (#61521) (#61747)
This is a minor refactor where the job node load logic (node availability, etc.) is refactored into its own class.

This will allow future things (i.e. autoscaling decisions) to use the same node load detection class.
backport of #61521
2020-08-31 14:00:23 -04:00
Henning Andersen
4c9fe31da8 Mute testTooLowConfiguredMemoryStillStarts (#61705)
Related to #61704
2020-08-31 11:19:53 +02:00
David Kyle
49a5afc6c1
[ML] Increase wait for templates timeout in tests (#61623) (#61628) 2020-08-27 12:57:12 +01:00
David Kyle
25e811ced7
Rewrite Inference yml tests for better clean up (#61180) (#61555)
Inference processors asynchronously usage write stats to the .ml-stats index after they used. 
In tests the write can leak into the next test causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test
2020-08-27 11:16:26 +01:00
Dimitris Athanasiou
3ed65eb418
[7.x][ML] Recover data frame extraction search from latest sort key (#61544) (#61572)
If a search failure occurs during data frame extraction we catch
the error and retry once. However, we retry another search that is
identical to the first one. This means we will re-fetch any docs
that were already processed. This may result either to training
a model using duplicate data or in the case of outlier detection to
an error message that the process received more records than it
expected.

This commit fixes this issue by tracking the latest doc's sort key
and then using that in a range query in case we restart the search
due to a failure.

Backport of #61544

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-26 17:54:00 +03:00
Benjamin Trent
a6e7a3d65f
[7.x] [ML] write warning if configured memory limit is too low for analytics job (#61505) (#61528)
Backports the following commits to 7.x:

[ML] write warning if configured memory limit is too low for analytics job (#61505)

Having `_start` fail when the configured memory limit is too low can be frustrating. 

We should instead warn the user that their job might not run properly if their configured limit is too low. 

It might be that our estimate is too high, and their configured limit works just fine.
2020-08-26 10:35:38 -04:00
Przemyslaw Gomulka
9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Przemysław Witek
11c2710e7f
[7.x] [ML] Do not mark the DFA job as FAILED when a failure occurs after the node is shutdown (#61331) (#61526) 2020-08-26 09:53:13 +02:00