Commit Graph

548 Commits

Author SHA1 Message Date
Dimitris Athanasiou c23a2187da
[7.x][ML] Only report complete writing_results progress after completion (#49551) (#49577)
We depend on the number of data frame rows in order to report progress
for the writing of results, the last phase of a job run. However, results
include other objects than just the data frame rows (e.g, progress, inference model, etc.).

The problem this commit fixes is that if we receive the last data frame row results
we'll report that progress is complete even though we still have more results to process
potentially. If the job gets stopped for any reason at this point, we will not be able
to restart the job properly as we'll think that the job was completed.

This commit addresses this by limiting the max progress we can report for the
writing_results phase before the results processor completes to 98.
At the end, when the process is done we set the progress to 100.

The commit also improves failure capturing and reporting in the results processor.

Backport of #49551
2019-11-26 12:20:37 +02:00
Benjamin Trent 688c78c589
[ML] Stop timing stats failure propagation (#49495) (#49501) 2019-11-25 10:09:30 -05:00
David Roberts 62811c2272 [ML] Add default categorization analyzer definition to ML info (#49545)
The categorization job wizard in the ML UI will use this
information when showing the effect of the chosen categorization
analyzer on a sample of input.
2019-11-25 13:39:16 +00:00
Dimitris Athanasiou aca38f6882
[7.x][ML] DFA jobs should accept excluding an unsupported field (#49535) (#49544)
Before this change excluding an unsupported field resulted in
an error message that explained the excluded field could not be
detected as if it doesn't exist. This error message is confusing.

This commit commit changes this so that there is no error in this
scenario. When excluding a field that does exist but has been
automatically been excluded from the analysis there is no harm
(unlike excluding a missing field which could be a typo).

Backport of #49535
2019-11-25 15:13:00 +02:00
Dimitris Athanasiou c149c64dc4
[7.x][ML] Apply source query on data frame analytics memory estimation (#49517) (#49532)
Closes #49454

Backport of #49517
2019-11-25 12:51:57 +02:00
Dimitris Athanasiou 8eaee7cbdc
[7.x][ML] Explain data frame analytics API (#49455) (#49504)
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.

Backport of #49455
2019-11-22 22:06:10 +02:00
Benjamin Trent a7477ad7c3
[7.x] [ML][Inference] compressing model definition and lazy parsing (#49269) (#49446)
* [ML][Inference] compressing model definition and lazy parsing (#49269)

* [ML][Inference] compressing model definition and lazy parsing

* addressing PR comments

* adding commons io

* implementing simplified bounded stream

* adjusting for type inclusion
2019-11-21 15:32:32 -05:00
Benjamin Trent d41b2e3f38
[ML][Inference] allowing per-model licensing (#49398) (#49435)
* [ML][Inference] allowing per-model licensing

* changing to internal action + removing pre-mature opt
2019-11-21 09:46:34 -05:00
Przemysław Witek c7ac2011eb
[7.x] Implement accuracy metric for multiclass classification (#47772) (#49430) 2019-11-21 15:01:18 +01:00
David Roberts 20558cf61c [ML] Fix simultaneous stop and force stop datafeed (#49367)
If a datafeed is stopped normally and force stopped at the same
time then it is possible that the force stop removes the
persistent task while the normal stop is performing actions.
Currently this causes the normal stop to error, but since
stopping a stopped datafeed is not an error this doesn't make
sense. Instead the force stop should just take precedence.

This is a followup to #49191 and should really have been
included in the changes in that PR.
2019-11-20 12:52:47 +00:00
Przemysław Witek 9c0ec7ce23
[7.x] Make AnalyticsProcessManager class more robust (#49282) (#49356) 2019-11-20 10:08:16 +01:00
Dimitris Athanasiou 4d6e037e90
[7.x][ML] Extract creation of DFA field extractor into a factory (#49315) (#49329)
This commit moves the async calls required to retrieve the components
that make up `ExtractedFieldsExtractor` out of `DataFrameDataExtractorFactory`
and into a dedicated `ExtractorFieldsExtractorFactory` class.

A few more refactorings are performed:

  - The detector no longer needs the results field. Instead, it knows
  whether to use it or not based on whether the task is restarting.
  - We pass more accurately whether the task is restarting or not.
  - The validation of whether fields that have a cardinality limit
  are valid is now performed in the detector after retrieving the
  respective cardinalities.

Backport of #49315
2019-11-20 10:02:42 +02:00
Przemysław Witek 42bb8ae525
[7.x] Extract indexData method out of RegressionIT tests (#49306) (#49313) 2019-11-19 22:47:12 +01:00
Benjamin Trent 19602fd573
[ML][Inference] changing setting to be memorySizeSettting (#49259) (#49302) 2019-11-19 07:56:40 -05:00
David Roberts a5204c1c80
[ML] Fixes for stop datafeed edge cases (#49284)
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
2019-11-19 10:51:46 +00:00
Benjamin Trent eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Przemysław Witek 5f9965e4b8
Lower minimum model memory limit value from 1MB to 1kB. (#49227) (#49242) 2019-11-18 14:58:20 +01:00
Dimitris Athanasiou 805c31e19e
[7.x][ML] Avoid NPE when node load is calculated on job assignment (#49186) (#49214)
This commit fixes a NPE problem as reported in #49150.
But this problem uncovered that we never added proper handling
of state for data frame analytics tasks.

In this commit we improve the `MlTasks.getDataFrameAnalyticsState`
method to handle null tasks and state tasks properly.

Closes #49150

Backport of #49186
2019-11-18 10:33:07 +02:00
Przemysław Witek 150db2b544
Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143) (#49164) 2019-11-18 07:52:57 +01:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Przemysław Witek e6ad3c29fd
Do not throw exceptions resulting from persisting datafeed timing stats. (#49044) (#49050) 2019-11-13 20:23:13 +01:00
Christoph Büscher 6119f0aaa2 Fix Eclipse compilation in DataFrameDataExtractorTests (#48942) 2019-11-11 16:17:55 +01:00
Dimitris Athanasiou dfc6a13b44
[7.x][ML] Handle nested arrays in source fields (#48885) (#48889)
Backport of #48885
2019-11-07 07:30:50 +02:00
David Roberts c03f7ba74c [TEST] Mute TimeoutCheckerTests.testWatchdog
Due to https://github.com/elastic/elasticsearch/issues/48861
2019-11-05 11:49:46 +00:00
Dimitris Athanasiou f2d4c94a9c
[7.x][ML] Deduplicate multi-fields for data frame analytics (#48799) (#48806)
In the case multi-fields exist in the source index, we pick
all variants of them in our extracted fields detection for
data frame analytics. This means we may have multiple instances
of the same feature. The worse consequence of this is when the
dependent variable (for regression or classification) is also
duplicated which means we train a model on the dependent variable
itself.

Now that #48770 is merged, this commit is adding logic to
only select one variant of multi-fields.

Closes #48756

Backport of #48799
2019-11-01 16:53:05 +02:00
Dimitris Athanasiou 1f662e0b12
[7.x][ML] Prevent fetching multi-field from source (#48770) (#48797)
Aggregatable mutli-fields are at the moment wrongly mapped
as normal doc_value fields and thus they support fetching from
source. However, they do not exist in the source. This results
to failure to extract such fields.

This commit fixes this bug. While a fix could be worked out
on top of the existing code, it is evident the extraction logic
has become difficult to understand and maintain. As we also
want to deduplicate multi-fields for data frame analytics,
it seemed appropriate to refactor the code to simplify and
better handle the extraction of multi-fields.

Relates #48756

Backport of #48770
2019-11-01 14:18:03 +02:00
David Roberts c3063c4e1f [ML] Make the URL of the ML C++ Ivy repo configurable (#48702)
At present the ML C++ artifact is always downloaded from
S3.  This change adds an option to configure the location.

(The intention is to use a file:/// URL to pick up the
artifact built in a Docker container in ml-cpp PR builds
so that C++ changes that will break Java integration tests
can be detected before the ml-cpp PRs are merged.)

Relates elastic/ml-cpp#766
2019-10-31 09:21:44 +00:00
Dimitris Athanasiou 919596b2e8
[7.x][ML] Move field extraction logic to its own package (#48709) (#48712)
Moves common field extraction logic to its own package so that it can
be used both for anomaly detection and data frame analytics.

In preparation for refactoring extraction fields to be simpler and to
support multi-fields properly.

Backport of #48709
2019-10-31 02:41:00 +02:00
Benjamin Trent c9ead80c31
[7.x] [ML][Inference] separating definition and config object storage (#48651) (#48695)
* [ML][Inference] separating definition and config object storage (#48651)

This separates out the `definition` object from being stored within the configuration object in the index. 

This allows us to gather the config object without decompressing a potentially large definition.

Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
2019-10-30 13:27:29 -04:00
Przemysław Witek 7c944d26c5
[7.x] Assert that the results of classification analysis can be evaluated using _evaluate API. (#48626) (#48634) 2019-10-29 16:20:56 +01:00
Przemysław Witek 7e30277a37
Mute RegressionIT.testStopAndRestart (#48575) (#48576) 2019-10-28 13:08:11 +01:00
Martijn van Groningen b034153df7
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:34:01 +02:00
Przemysław Witek 149537a165
Assert that inference model has been persisted (#48332) (#48453) 2019-10-24 14:18:43 +02:00
Przemyslaw Gomulka aaa6209be6
[7.x] [Java.time] Calculate week of a year with ISO rules BACKPORT(#48209) (#48349)
Reverting the change introducing IsoLocal.ROOT and introducing IsoCalendarDataProvider that defaults start of the week to Monday and requires minimum 4 days in first week of a year. This extension is using java SPI mechanism and defaults for Locale.ROOT only.
It require jvm property java.locale.providers to be set with SPI,COMPAT

closes #41670
backport #48209
2019-10-23 17:39:38 +02:00
Przemysław Witek 60d8ecb2b7
Mute ClassificationIT tests (#48338) (#48339) 2019-10-22 12:45:50 +02:00
Przemysław Witek 2db2b945ec
[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174) (#48294) 2019-10-21 22:07:19 +02:00
Benjamin Trent abd1b5118f
[ML] fixing tests (#48084) (#48253)
* [ML] fixing tests

* unmuting tests

* reverting outlier detection job changes
2019-10-21 09:21:06 -04:00
rsarawgi 5e4dd0fd2e [ML] Removing usages of ToXContentParams.INCLUDE_TYPE (#48165)
Removing the option of ToXContentParams.INCLUDE_TYPE and replacing them with ToXContentParams.FOR_INTERNAL_STORAGE
Closes #48057
2019-10-18 14:49:26 +01:00
Przemysław Witek 28f68fa221
Make num_top_classes parameter's default value equal to 2 (#48119) (#48201) 2019-10-17 18:43:15 +02:00
Dimitris Athanasiou e0489fc328
[7.x][ML] Always refresh dest index before starting analytics process (#48090) (#48196)
If a job stops right after reindexing is finished but before
we refresh the destination index, we don't refresh at all.
If the job is started again right after, it jumps into the analyzing state.
However, the data is still not searchable.
This is why we were seeing test failures that we start the process
expecting X rows (where X is lower than the expected number of docs)
and we end up getting X+.

We fix this by moving the refresh of the dest index right before
we start the process so it always ensures the data is searchable.

Closes #47612

Backport of #48090
2019-10-17 17:20:19 +01:00
Benjamin Trent ee110c2d42
[ML] Muting tests due to #48085 (#48086) (#48154) 2019-10-16 15:46:50 -04:00
Benjamin Trent 0dddbb5b42
[ML] Parse and index inference model (#48016) (#48152)
This adds parsing an inference model as a possible
result of the analytics process. When we do parse such a model
we persist a `TrainedModelConfig` into the inference index
that contains additional metadata derived from the running job.
2019-10-16 15:46:20 -04:00
Przemysław Witek 8f815240b3
[7.x] Allow integer types for classification's dependent variable (#47902) (#48080) 2019-10-16 11:09:56 +02:00
David Roberts d9c7e3847e [TEST] Don't assert order of data frame analytics audit messages (#48065)
Audit messages are stored with millisecond timestamps. If two
messages have the same millisecond timestamp then asserting on
their order is impossible given the information available.

This PR changes the assertion on audit messages in the native
data frame analytics tests to assert that the expected audit
messages exist in any order.

Fixes #48035
2019-10-15 19:59:52 +01:00
Przemysław Witek eaa56344b5
Verify that the failure reason of analytics process is empty (#48042) (#48071) 2019-10-15 18:33:20 +02:00
Przemysław Witek 620bd9d224
Enable test testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows_TopClassesRequested now that top classes are correctly reported by C++. (#48043) (#48053) 2019-10-15 14:49:16 +02:00
David Roberts 984323783e
[ML][7.x] Add lazy assignment job config option (#47993)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-15 06:55:11 +01:00
David Roberts 1ca25bed38
[ML][7.x] Add option to stop datafeed that finds no data (#47995)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
2019-10-14 17:19:13 +01:00
David Roberts 46ae86ac31 [ML] Fix detection of syslog-like timestamp in find_file_structure (#47970)
Usually syslog timestamps have two spaces before a single
digit day-of-month. However, in some non-syslog cases
where syslog-like timestamps are used there is only one
space. The grok pattern supports this, so the timestamp
parser should too. This change makes the
find_file_structure endpoint do this.

Also fixes another problem that the same test case
exposed in the find_file_structure endpoint, which was
that the exclude_lines_pattern for delimited files was
always created on the assumption the delimiter was a
comma. Now it is based on the actual delimiter.
2019-10-13 20:07:54 +01:00
Benjamin Trent 627faf1850
[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) (#47914)
* [ML][Analytics] fix bug where regression deleted early does not delete state (#47885)

* [ML][Analytics] fix bug where regression deleted early does not delete state

* Fixing ml with security test failure

* fixing for older java
2019-10-11 15:11:16 -04:00