130 Commits

Author SHA1 Message Date
Benjamin Trent
eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Rory Hunter
c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Dimitris Athanasiou
dfc6a13b44
[7.x][ML] Handle nested arrays in source fields (#48885) (#48889)
Backport of #48885
2019-11-07 07:30:50 +02:00
Dimitris Athanasiou
1f662e0b12
[7.x][ML] Prevent fetching multi-field from source (#48770) (#48797)
Aggregatable mutli-fields are at the moment wrongly mapped
as normal doc_value fields and thus they support fetching from
source. However, they do not exist in the source. This results
to failure to extract such fields.

This commit fixes this bug. While a fix could be worked out
on top of the existing code, it is evident the extraction logic
has become difficult to understand and maintain. As we also
want to deduplicate multi-fields for data frame analytics,
it seemed appropriate to refactor the code to simplify and
better handle the extraction of multi-fields.

Relates #48756

Backport of #48770
2019-11-01 14:18:03 +02:00
Przemysław Witek
7c944d26c5
[7.x] Assert that the results of classification analysis can be evaluated using _evaluate API. (#48626) (#48634) 2019-10-29 16:20:56 +01:00
Przemysław Witek
7e30277a37
Mute RegressionIT.testStopAndRestart (#48575) (#48576) 2019-10-28 13:08:11 +01:00
Przemysław Witek
149537a165
Assert that inference model has been persisted (#48332) (#48453) 2019-10-24 14:18:43 +02:00
Przemysław Witek
60d8ecb2b7
Mute ClassificationIT tests (#48338) (#48339) 2019-10-22 12:45:50 +02:00
Przemysław Witek
2db2b945ec
[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174) (#48294) 2019-10-21 22:07:19 +02:00
Benjamin Trent
abd1b5118f
[ML] fixing tests (#48084) (#48253)
* [ML] fixing tests

* unmuting tests

* reverting outlier detection job changes
2019-10-21 09:21:06 -04:00
Przemysław Witek
28f68fa221
Make num_top_classes parameter's default value equal to 2 (#48119) (#48201) 2019-10-17 18:43:15 +02:00
Dimitris Athanasiou
e0489fc328
[7.x][ML] Always refresh dest index before starting analytics process (#48090) (#48196)
If a job stops right after reindexing is finished but before
we refresh the destination index, we don't refresh at all.
If the job is started again right after, it jumps into the analyzing state.
However, the data is still not searchable.
This is why we were seeing test failures that we start the process
expecting X rows (where X is lower than the expected number of docs)
and we end up getting X+.

We fix this by moving the refresh of the dest index right before
we start the process so it always ensures the data is searchable.

Closes #47612

Backport of #48090
2019-10-17 17:20:19 +01:00
Benjamin Trent
ee110c2d42
[ML] Muting tests due to #48085 (#48086) (#48154) 2019-10-16 15:46:50 -04:00
Przemysław Witek
8f815240b3
[7.x] Allow integer types for classification's dependent variable (#47902) (#48080) 2019-10-16 11:09:56 +02:00
David Roberts
d9c7e3847e [TEST] Don't assert order of data frame analytics audit messages (#48065)
Audit messages are stored with millisecond timestamps. If two
messages have the same millisecond timestamp then asserting on
their order is impossible given the information available.

This PR changes the assertion on audit messages in the native
data frame analytics tests to assert that the expected audit
messages exist in any order.

Fixes #48035
2019-10-15 19:59:52 +01:00
Przemysław Witek
eaa56344b5
Verify that the failure reason of analytics process is empty (#48042) (#48071) 2019-10-15 18:33:20 +02:00
Przemysław Witek
620bd9d224
Enable test testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows_TopClassesRequested now that top classes are correctly reported by C++. (#48043) (#48053) 2019-10-15 14:49:16 +02:00
David Roberts
984323783e
[ML][7.x] Add lazy assignment job config option (#47993)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-15 06:55:11 +01:00
David Roberts
1ca25bed38
[ML][7.x] Add option to stop datafeed that finds no data (#47995)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
2019-10-14 17:19:13 +01:00
Przemysław Witek
c62fe8c344
Require that the dependent variable column has at most 2 distinct values in classfication analysis. (#47858) (#47906) 2019-10-11 14:57:08 +02:00
Igor Motov
b5afa95fd8 Fix Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart
Tracked by #47612
2019-10-10 18:17:01 +04:00
Igor Motov
17433e79d8 Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart
Tracked by #47612
2019-10-10 17:56:23 +04:00
Dimitris Athanasiou
7667ea5f6f
[7.x][ML] Additional outlier detection parameters (#47600) (#47669)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
2019-10-07 18:21:33 +03:00
Dimitris Athanasiou
ffacfc642c
[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624) (#47625)
Relates #47612
2019-10-05 23:58:32 +03:00
Przemysław Witek
ee952da2e2
[7.x] Implement evaluation API for multiclass classification problem (#47126) (#47343) 2019-10-04 17:54:51 +02:00
Przemysław Witek
ec9b77deaa
[7.x] Implement new analysis type: classification (#46537) (#47559) 2019-10-04 13:47:19 +02:00
Dimitris Athanasiou
36884a3c32
[7.x][ML] Restore analytics state if available (#47128) (#47393)
This commit restores the model state if available in data
frame analytics jobs.

In addition, this changes the start API so that a stopped job
can be restarted. As we now store the progress in the state index
when the task is stopped, we can use it to determine what state
the job was in when it got stopped.

Note that in order to be able to distinguish between a job
that runs for the first time and another that is restarting,
we ensure reindexing progress is reported to be at least 1
for a running task.
2019-10-02 10:24:05 +03:00
Rory Hunter
53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
David Roberts
77cc6d5bad [TEST] Work around _cat/indices bug with security enabled (#47160)
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
2019-09-26 13:29:40 +01:00
Yannick Welsch
eb86d71edd Mute MlJobIT.testDeleteJob
Relates #45652
2019-09-25 12:53:09 +02:00
Yannick Welsch
7a5b5af171 Mute MlJobIT.testDeleteJobAsync
Relates #45652
2019-09-25 12:53:05 +02:00
Dimitris Athanasiou
02a5e153dc
[7.x][ML] Parse and index data frame analytics state (#46804) (#46820)
This commit reuses the same state processor that is used for autodetect
to parse state output from data frame analytics jobs. We then index the
state document into the state index.

Backport of #46804
2019-09-18 20:37:40 +03:00
Dimitris Athanasiou
cebe8da617
[7.x][ML] MlMemoryTracker should ignore analytics tasks without config (#46789) (#46811)
It is possible for a running analytics job that its config is removed
from the '.ml-config' index (perhaps the user deleted the entire index,
etc.). In that case the task remains without a matching config. I have
raised #46781 to discuss how to deal with this issue.

This commit focuses on `MlMemoryTracker` and changes it so that when
we get the configs for the running tasks we leniently ignore missing ones.
This at least means memory tracking will keep working for other jobs
if one or more are missing.

In addition, this commit makes the cleanup code for native analytics
tests more robust by explicitly stopping all jobs and force-stopping
if an error occurs. This helps so that a single failing test does
not cause other tests fail due to pending tasks.

Backport of #46789
2019-09-18 16:35:25 +03:00
Przemysław Witek
e49be611ad
[7.x] Add audit messages for Data Frame Analytics (#46521) (#46738) 2019-09-16 21:21:38 +02:00
David Roberts
461de5b58e [TEST] Remove incorrect data frame analytics state assertion (#46597)
After starting the analytics job and checking its state
the state can be any of "started", "reindexing" or
"analyzing" depending on how quickly the work is done.
2019-09-11 16:33:14 +01:00
Przemysław Witek
e38e631dac
[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967) (#46519) 2019-09-11 12:17:26 +02:00
Dimitris Athanasiou
a6834068e3
[7.x][ML] Extract DataFrameAnalyticsTask into its own class (#46402) (#46426)
This refactors `DataFrameAnalyticsTask` into its own class.
The task has quite a lot of functionality now and I believe it would
make code more readable to have it live as its own class rather than
an inner class of the start action class.

Backport of #46402
2019-09-06 14:13:46 +03:00
Dimitris Athanasiou
8fca5b5204
[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271) (#46282)
The test seems to have been failing due to a race condition between
stopping the task and refreshing the destination index. In particular,
we were going forward with refreshing the destination index even
though the task stopped in the meantime. This was fixed in
request.

Closes #43960

Backport of #46271
2019-09-04 10:57:01 +03:00
Dimitris Athanasiou
873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Dimitris Athanasiou
be554fe5f0
[7.x][ML] Improve progress reportings for DF analytics (#45856) (#45910)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 23:04:39 +03:00
Przemysław Witek
85d55e30d0
Add test that proves _timing_stats document is deleted when the job is deleted (#45840) (#45854) 2019-08-23 07:03:09 +02:00
Przemysław Witek
bf701b83d2
Shorten field names in EstimateMemoryUsageResponse (#45719) (#45772) 2019-08-21 12:45:09 +02:00
Przemysław Witek
c6709f0979
Mute tests affected by renaming fields in Estimate memory usage response (#45743) (#45766) 2019-08-21 09:57:23 +02:00
Dimitris Athanasiou
d5c3d9b50f
[7.x][ML] Do not skip rows with missing values for regression (#45751) (#45754)
Regression analysis support missing fields. Even more, it is expected
that the dependent variable has missing fields to the part of the
data frame that is not for training.

This commit allows to declare that an analysis supports missing values.
For such analysis, rows with missing values are not skipped. Instead,
they are written as normal with empty strings used for the missing values.

This also contains a fix to the integration test.

Closes #45425
2019-08-21 08:15:38 +03:00
Przemysław Witek
7bc8400222
Call the new _estimate_memory_usage API endpoint on df analytics _start (#45536) (#45701) 2019-08-19 21:37:55 +02:00
Przemysław Witek
df574e5168
[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) (#45510) 2019-08-14 08:26:03 +02:00
Armin Braun
90803a5caf
Reenable Integ Tests in native-multi-node-tests (#45482) (#45496)
* Reenable Integ Tests in native-multi-node-tests

* The tests broken here were likely fixed by #45463 => let's reenable them and see if things run fine again
* Relates #45405, #45455
2019-08-13 15:55:54 +02:00
Mark Vieira
7e3379444b
Fix build failure due to unknown task and disable test conventions
(cherry picked from commit 8ed84bc5cef9bcfae6c817059f764d97e4451a4a)
2019-08-12 09:18:39 -07:00
Przemyslaw Gomulka
421e9b8e8b
Mute integ tests in native-multi-node-tests (#45457)
Tracked at #45405
2019-08-12 17:42:24 +02:00
Przemyslaw Gomulka
d11ae08467
Muting ForecastIT.testOverflowToDisk (#45435) (#45438)
awaits #45405
2019-08-12 11:01:32 +02:00