Commit Graph

1062 Commits

Author SHA1 Message Date
Dimitris Athanasiou 197de8fe66
[7.10][ML] Increase timeout waiting for DFA jobs to finish in integ tests (#65126) (#65131)
It appears that occasionally 30 seconds are not enough for CI workers
to complete DFA jobs. In order to eliminate such failures we increase
the time we wait for DFA jobs to complete in integration tests to
60 seconds.

Fixes #64926

Backport of #65126
2020-11-17 16:46:17 +02:00
Przemysław Witek de668ab84b
[7.10] [ML] Extract dependent variable's mapping correctly in case of a multi-field (#63813) (#64287) 2020-11-16 10:34:58 +01:00
Benjamin Trent b888f36388
[ML] fix custom feature processor extraction bugs around boolean fields and custom one_hot feature output order (#64937) (#65009)
This commit fixes two problems:

- When extracting a doc value, we allow boolean scalars to be used as input
- The output order of processed feature names is deterministic. Previous custom one hot fields used to be non-deterministic and thus could cause weird bugs.
2020-11-12 11:15:57 -05:00
Dimitris Athanasiou b5efaf6e3b
[7.10][ML] Protect against stack overflow while loading DFA data (#64947) (#64956)
If we encounter an exception during extracting data in a data
frame analytics job, we retry once. However, we were not catching
exceptions thrown from processing the search response. This may
result in an infinite loop that causes a stack overflow.

This commit fixes this problem.

Backport of #64947
2020-11-12 11:08:40 +02:00
Daniel Mitterdorfer b8c9780c23
Mute multiple tests in ClassificationIT (#64930)
Relates #64926
2020-11-11 15:30:19 +01:00
Benjamin Trent f0ff673f82
[ML] Fix bug with data frame analytics classification test data sampling when using custom feature processors (#64727) (#64864)
When using custom processors, the field names extracted from the documents are not the
same as the feature names used for training.

Consequently, it is possible for the stratified sampler to have an incorrect view of the feature rows.
This can lead to the wrong column being read for the class label, and thus throw errors on training
row extraction.

This commit changes the training row feature names used by the stratified sampler so that it matches
the names (and their order) that are sent to the analytics process.
2020-11-10 08:47:07 -05:00
Benjamin Trent dafafd7ec6
[ML] fix edge case for data frame analytics where a field mapped as a keyword actually has boolean and string values in the _source (#64826) (#64862)
It is possible that a value mapped as a `keyword` has any scalar value type. This includes any numerical value, String, or boolean.

This commit allows `boolean` types to be considered as a part of the categorical feature collection when this is the case.
2020-11-10 08:46:52 -05:00
Dimitris Athanasiou e3fee3e6df
[ML] Ignore _doc_count field in data frame analytics (#64541)
This field is added from version 7.11 onwards. We are
adding it to the list of ignored fields for data frame analytics
in 7.10 to avoid failing to start an outlier detection job
in a mixed cluster environment.

Relates #64503
2020-11-03 19:53:17 +02:00
David Roberts e0d0ac86dd [ML] Increase forecast test timeout (#64471)
ForecastIT.testOverflowToDisk has been observed to fail a few
times in FIPS JVMs because it takes longer than the permitted
30 seconds.  This PR bumps the timeout up to 60 seconds.

Fixes #63793
2020-11-02 14:58:18 +00:00
David Roberts adc5509eda
[ML] Support the unsigned_long type in data frame analytics (#64072)
Adds support for the unsigned_long type to data frame analytics.

This type is handled in the same way as the long type.  Values
sent to the ML native processes are converted to floats and
hence will lose accuracy when outside the range where a float
can uniquely represent long values.

Backport of #64066
2020-10-26 09:05:49 +00:00
David Roberts cb0c538b35 [ML] Fix rare ML daily maintenance test race condition (#64043)
Depending on thread scheduling the ML daily maintenance
tests could do one more iteration than expected, causing
rare failures.

Fixes #64036
2020-10-22 13:03:02 +01:00
Benjamin Trent eff7f06ca6
[ML] fix inference binary classification predication label and feature importance (#63688) (#63930)
When calculating feature importance, the leaf values directly correlate the value of the importance.

Consequently, positive leaf values -> positive feature importance

negative leaf values -> negative feature importance.

It follows that for binary classification, this is done such that the importance relates to the leaf values, which relate directly to the "probability of class 1".

So, the feature importance calculated is always for the importance as it relates to class 1.

The inverse is the importance as it relates to class 0.
2020-10-20 08:50:15 -04:00
Przemysław Witek acbd48f834
[ML] Allow setting num_top_classes to a special value -1 (#63587) (#63602) 2020-10-13 13:57:50 +02:00
David Roberts 3f210e2620 [ML] Load data streams plugin for ML internal cluster tests (#63560)
Now that deprecation logs get indexed to a data stream, if we
do not load the data stream plugin in our tests and any test
generates a deprecation log message then millions of exceptions
get logged, slowing down the tests to the extent that they can
fail.

This change loads the data streams plugin during the ML internal
cluster tests.  (It should already be present in external cluster
tests.)

Fixes #63548
2020-10-12 17:46:50 +01:00
Dimitris Athanasiou e1c418aac7
[7.10][ML] Validate dest pipeline exists on transform update (#63494) (#63549)
Adds validation that the dest pipeline exists when a transform
is updated. Refactors the pipeline check into the `SourceDestValidator`.

Fixes #59587

Backport of #63494
2020-10-12 15:41:35 +03:00
Benjamin Trent a9be4181c6
[ML] fix grabbing the doc value limit setting in _explain (#63402) (#63471)
Getting the doc value settings shouldn't use the API callers headers. We only use this value internally.
2020-10-08 08:53:26 -04:00
David Roberts a9d541561f [ML] Unmute DeleteExpiredDataIT.testDeleteExpiredDataNoThrottle (#63408)
This test appears to work again following the Lucene bug fix
that was integrated in #63395
2020-10-08 09:11:29 +01:00
Przemysław Witek bd761cce1d
[ML] Validate that AucRoc has the data necessary to be calculated (#63302) (#63454) 2020-10-08 09:52:15 +02:00
Luca Cavanna 659988a77f
Remove runtime fields (#63418)
We are not going to release runtime fields with 7.10, hence we are removing them from the 7.10 branch.
2020-10-07 20:39:41 +02:00
Tim Vernum c30c5555c5
Mute DeleteExpiredDataIT deleteExpired NoThrottle (#63381)
Mutes test method DeleteExpiredDataIT.testDeleteExpiredDataNoThrottle

Relates: #63379
Backport of: #63380
2020-10-07 17:43:52 +11:00
Gordon Brown 5c8b0662df
Deprecate REST access to System Indices (#63274) (Original #60945)
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.

Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail a few specific APIs which will continue to allow access to system indices by default:

- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`

Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
2020-10-06 13:41:40 -06:00
Igor Motov 2405162c39
Mute RegressionIT.testAliasFields test (#63339)
It fails quite frequently in 7.x.

Relates to #63268
2020-10-06 12:18:12 -04:00
David Kyle ea32b4ab82
[ML] Audit message when nightly maintenance times out (#63252) (#63330)
During deletion of old ml data set the delete by query timeout to 8 hours and
audit a job message when the nightly maintenance task times out.
2020-10-06 16:19:37 +01:00
Benjamin Trent a72d7cc76a
[ML] prefer secondary auth headers on data frame analytics _explain (#63281) (#63323)
We should prefer secondary auth headers when calling _explain
2020-10-06 09:15:29 -04:00
David Kyle 8f4ef40f78
[ML] Auditor ensures template is installed before writes (#63286)
The ML auditors should not write if the latest template is not present. 
Instead a PUT template request is made and the writes queued up
2020-10-06 11:20:37 +01:00
Benjamin Trent 1e63313c19
[ML] adds feature_importance_baseline object to model metadata (#63172) (#63237)
this adds the new field `feature_importance_baseline` and allows it to be optionally be included in the model's metadata.

Related to: https://github.com/elastic/ml-cpp/pull/1522
2020-10-05 09:33:38 -04:00
Benjamin Trent 752ee0288e
[7.x] [ML] optimize delete expired snapshots (#63134) (#63200)
* [ML] optimize delete expired snapshots (#63134)

When deleting expired snapshots, we do an individual delete action per snapshot per job.

We should instead gather the expired snapshots and delete them in a single call.

This commit achieves this and a side-effect is there is less audit log spam on nightly cleanup

closes https://github.com/elastic/elasticsearch/issues/62875
2020-10-02 13:24:36 -04:00
Przemysław Witek 5370f270d7
[7.x] [ML] Ensure data frame analytics jobs don't run on a node that's too new (#62749) (#63175) 2020-10-02 17:19:58 +02:00
Benjamin Trent cfcf973259
[7.x] [ML] renames */inference* apis to */trained_models* (#63097) (#63136)
* [ML] renames */inference* apis to */trained_models* (#63097)

This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-02 07:34:28 -04:00
David Kyle 279f951700
[ML] Set parent task Id on ml expired data removers (#62854) (#62966)
Setting the parent task Id (of the delete expired data action) on the ML
expired data removers makes it easier to track and cancel long running
tasks
2020-10-02 10:14:10 +01:00
Dimitris Athanasiou e09074d382
[7.x][ML] Fix online updates with custom rules referencing filters (#63057) (#63064)
When an opened anomaly detection job is updated with a detection
rule that references a filter, apart from updating the c++ process
with the rule, we also need to update it with the referenced filter.

This commit fixes a bug which led to the job not applying such updates
on-the-fly.

Fixes #62948

Backport of #63057
2020-09-30 16:01:06 +03:00
Przemysław Witek 4366d58564
[7.x] [ML] Implement AucRoc metric for classification (#60502) (#63051) 2020-09-30 12:55:52 +02:00
Dimitris Athanasiou 179fe9cc0e
[7.x][ML] Delete dest index and reindex if incompatible (#62960) (#63050)
Data frame analytics results format changed in version `7.10.0`.
If existing jobs that were not completed are restarted, it is
possible the destination index had already been created. That index's
mappings are not suitable for the new results format.

This commit checks the version of the destination index and deletes
it when the version is outdated. The job will then continue by
recreating the destination index and reindexing.

Backport of #62960
2020-09-30 12:57:48 +03:00
David Roberts 05427c2bb2
[ML] Add timeouts to named pipe connections (#63022)
This PR adds timeouts to the named pipe connections of the
autodetect, normalize and data_frame_analyzer processes.
This argument requires the changes of elastic/ml-cpp#1514 in
order to work, so that PR will be merged before this one.
(The controller process already had a different mechanism,
tied to the ES JVM lifetime.)

Backport of #62993
2020-09-29 18:04:02 +01:00
Benjamin Trent 2b9032a07d
[7.x] [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976) (#62999)
* [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976)

This fixes the two test failures.

The shard failure seems to be due to the .ml-stats index being in the middle of being created.
2020-09-29 08:12:20 -04:00
Dimitris Athanasiou 7f6c1ff5b4
[7.x][ML] Remove top level importance from classification inference results (#62486) (#62964)
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.

Backport of #62486
2020-09-29 10:58:48 +03:00
Benjamin Trent a054e62bc4
[ML] allow datafeeds to run if there are any concrete indices (#62827) (#62965)
This commit allows a datafeed to be assigned to a node if only one index pattern has concrete indices.
2020-09-28 12:58:07 -04:00
Benjamin Trent c56424f740
[ML] write deprecation warning when include_model_definition parameter is used (#62834) (#62885)
for get trained models include_model_definition is now deprecated.

This commit writes a deprecation warning if that parameter is used and suggests the caller to utilize the replacement
2020-09-24 11:38:54 -04:00
Daniel Mitterdorfer d2166030d1
Mute failing test case in DeleteExpiredDataIT (#62870) (#62871)
Relates #62699
2020-09-24 15:42:52 +02:00
Dimitris Athanasiou 7de5201291
[7.x][ML] Handle data frame analytics state spreading over multiple docs (#62564) (#62824)
When state persistence was first implemented for data frame analytics
we had the assumption that state would always fit in a single document.
However this is not the case any more.

This commit adds handling of state that spreads over multiple documents.

Backport of #62564
2020-09-23 16:16:34 +03:00
Dimitris Athanasiou 69e72656fa
[7.x][ML] Reset reindexing progress when DFA job resumes with incomplete reindexing (#62772) (#62816)
This fixes reindexing progress in the scenario when a DFA job that had not finished
reindexing is resumed (either because the user called stop and start or because the
job was reassigned in the middle of reindexing). Before the fix reindexing progress
stays to the value it had reached before until it surpasses that value.

When we resume a data frame analytics job we want to preserve reindexing progress
and reset all other phases. Except for when reindexing was not completed.
In that case we are deleting the destination index and starting reindexing
from scratch. Thus we need to reset reindexing progress too.

Backport of #62772
2020-09-23 14:09:04 +03:00
Benjamin Trent 77bfb32635
[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) (#62784)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)

* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
 global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the interal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.

This commit addresses this issue for the ML plugin

* unmuting test
2020-09-22 15:07:08 -04:00
Nik Everett 39a617773d
Raname grok's built-in patterns (backport of #62735) (#62765)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 13:06:43 -04:00
Andrei Dan 0be89bcd7f
Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#62763) 2020-09-22 13:43:15 +01:00
Benjamin Trent 0f142c6afc
[ML] all multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563) (#62629)
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.

- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
2020-09-18 11:06:07 -04:00
Benjamin Trent e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Ignacio Vera 6a3d731be1
Only call reduce on a single InternalAggregation when needed (#62525) (#62594)
Adds a new abstract method in InternalAggregation that flags the framework if it needs to reduce on a single InternalAggregation.
2020-09-18 08:43:58 +02:00
Jake Landis 5b7246157f
[7.x] Fix projects that failed to build within Intellij (#62258) (#62408)
This commit address some build failures from the perspective of Intellij.
These changes include:
* changing an order of a dependency definition that seems to can cause Intellij build to fail.
* introduction of an abstract class out of the test source set (seems to be an issue sharing 
  classes cross projects with non-standard source sets. 
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
2020-09-17 17:45:12 -05:00
Dimitris Athanasiou 7118ff7976
[7.x][ML] Remove model snapshot legacy doc ids (#62434) (#62569)
Removes methods that were no longer used regarding version 5.4 doc ids of ModelState.

Also adds clean up of 5.4 model state and quantile docs in the daily maintenance.

Backport of #62434
2020-09-17 23:43:28 +03:00
Dimitris Athanasiou f5c28e2054
[7.x][ML] Do not start data frame analytics when too many docs are analyzed (#62547) (#62558)
The data frame structure in c++ has a limit on 2^32 documents. This commit
adds a check that the number of documents involved in the analysis are
less than that and fails to start otherwise. That saves the cost of
reindexing when it is unnecessary.

Backport of #62547
2020-09-17 19:06:38 +03:00