369 Commits

Author SHA1 Message Date
Rene Groeschke
d952b101e6
Replace compile configuration usage with api (7.x backport) ()
* Replace compile configuration usage with api ()

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required because java-library will not build the jar file by default
  - the jar file is now an explicit input of the task and Gradle will ensure it's properly built

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek
9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis () () 2020-06-30 14:09:11 +02:00
Przemysław Witek
3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API () () 2020-06-29 10:56:11 +02:00
Benjamin Trent
7a202b149e
Muting analytics tests () () 2020-06-26 16:50:59 -04:00
Benjamin Trent
add8ff1ad3
[ML] assume data streams are enabled in data stream tests () () 2020-06-24 14:14:48 -04:00
Przemysław Witek
551b8bcd73
[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names () () 2020-06-24 15:52:45 +02:00
Luca Cavanna
dbbf2772d8 Mute newly added ml data streams tests ()
Relates to 
2020-06-24 15:11:40 +02:00
Benjamin Trent
a9b868b7a9
[7.x] [ML] allow data streams to be expanded for analytics and transforms () ()
This commit allows data streams to be a valid source for analytics and transforms.

Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.

For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
2020-06-23 14:40:35 -04:00
David Roberts
0d6bfd0ac3
[7.x][ML] Fix wire serialization for flush acknowledgements ()
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null.  This works because the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant.  But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.

This change corrects the discrepancies.  Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.

Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.
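
The null/zero translation described above can be sketched as follows. This is a minimal illustration, not the actual Elasticsearch class: the name `FlushTimeWireFormat` and its methods are hypothetical, and the use of epoch milliseconds as the wire unit is an assumption.

```java
import java.time.Instant;

// Hypothetical helper illustrating the translation; not the real class.
final class FlushTimeWireFormat {
    private FlushTimeWireFormat() {}

    // Internally, null means "not known or not relevant".
    // On the wire (C++ process, old nodes) that state is sent as 0.
    static long toWire(Instant lastFinalizedBucketEnd) {
        return lastFinalizedBucketEnd == null ? 0L : lastFinalizedBucketEnd.toEpochMilli();
    }

    // A zero read from the wire is mapped back to null (assumed unit: epoch millis).
    static Instant fromWire(long epochMillis) {
        return epochMillis == 0L ? null : Instant.ofEpochMilli(epochMillis);
    }
}
```

With this shape, code that prints to XContent only has to check for null and never sees the zero sentinel.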

Backport of 
2020-06-23 16:42:06 +01:00
Benjamin Trent
bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats () ()
When a local model is constructed, the cache miss count is incremented.

When a user calls _stats, we will include the sum of the cache miss count across ALL nodes. This statistic is important when compared against the inference_count: if the cache miss count is near the inference_count, it indicates that the cache is overburdened or inappropriately configured.
2020-06-19 09:46:51 -04:00
Przemysław Witek
9dd3d5aa48
[7.x] Delete auto-generated annotations when model snapshot is reverted () () 2020-06-18 17:59:52 +02:00
Jason Tedor
b78b3edeea
Upgrade to JNA 5.5.0 ()
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
2020-06-17 07:35:08 -04:00
Przemysław Witek
b22e91cefc
[7.x] Delete auto-generated annotations when job is deleted. () () 2020-06-17 09:17:20 +02:00
Rene Groeschke
01e9126588
Remove deprecated usage of testCompile configuration () ()
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Valeriy Khakhutskyy
c0f368bbf3
[7.x][ML] Adjust assertion for job case memory usage estimates ()
Since we changed the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-than-or-equal assertion.

Backport of 
2020-06-10 15:17:16 +02:00
Benjamin Trent
9666a895f7
[ML] inference performance optimizations and refactor () ()
This is a major refactor of the underlying inference logic.

The main change is that we now separate the model configuration from
the inference interfaces.

This has the following benefits:
 - we can store extra things with the model that are not
   necessary for inference (e.g. treenode split information gain)
 - we can optimize inference separately from model serialization and storage.
 - The user is oblivious to the optimizations (other than seeing the benefits).

A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.

This new class satisfies a new interface that is ONLY for inference.

The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
  (improves inference + feature importance throughput)
- Only storing what we need for inference + feature importance on heap
2020-06-05 14:20:58 -04:00
Przemysław Witek
6b5f49d097
[7.x] Introduce ModelPlotConfig. annotations_enabled setting () () 2020-06-04 15:15:35 +02:00
Benjamin Trent
34f1e0b6bb
[7.x] [ML] mark forecasts for force closed/failed jobs as failed () ()
* [ML] mark forecasts for force closed/failed jobs as failed ()

Forecasts that are still running should be marked as failed/finished in the following scenarios:

- Job is force closed
- Job is re-assigned to another node.

Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.

Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.

relates to https://github.com/elastic/elasticsearch/issues/56419
2020-05-29 14:48:10 -04:00
Benjamin Trent
35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API () ()
* [ML] adds new for_export flag to GET _ml/inference API ()

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
Benjamin Trent
c8374dc9f3
[ML] add max_model_memory parameter to forecast request () ()
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% of the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
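
The limit rules above can be sketched roughly as follows. This is an illustrative reading, not the real implementation: the class and method names are hypothetical, rejecting a request at or above 500mb is an assumption about how the hard limit behaves, and "strictly lower" is modelled as one byte below 40% of the job's limit.

```java
// Hypothetical sketch of the forecast memory rules; names are illustrative only.
final class ForecastMemoryLimit {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BYTES = 20 * MB;     // documented default
    static final long HARD_LIMIT_BYTES = 500 * MB; // documented hard limit

    private ForecastMemoryLimit() {}

    // requestedBytes may be null when the request does not set max_model_memory.
    // jobModelLimitBytes is assumed to be positive.
    static long effectiveLimit(Long requestedBytes, long jobModelLimitBytes) {
        long requested = requestedBytes == null ? DEFAULT_BYTES : requestedBytes;
        if (requested >= HARD_LIMIT_BYTES) {
            // Assumption: the hard limit itself is rejected, not clamped.
            throw new IllegalArgumentException("max_model_memory must be below 500mb");
        }
        long fortyPercentOfJob = jobModelLimitBytes * 2 / 5;
        if (requested > fortyPercentOfJob) {
            // Reduced to be strictly lower than 40% of the job's limit.
            return fortyPercentOfJob - 1;
        }
        return requested;
    }
}
```

For example, under these assumptions a request for 50mb against a job limited to 100mb would be reduced to just under 40mb.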

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
Przemysław Witek
ea2012778e
Mute failing test () () 2020-05-25 14:06:29 +02:00
Benjamin Trent
297f864884
[ML] relax throttling on expired data cleanup () ()
Throttling nightly cleanup as much as we do has been overcautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
Dimitris Athanasiou
011e995165
[7.x][ML] Unmute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange () ()
Closes 
2020-05-06 18:20:46 +03:00
Julie Tibshirani
49de092b38 Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet. 2020-05-05 16:25:36 -07:00
Julie Tibshirani
63062ec7bd Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange. 2020-05-05 13:48:35 -07:00
Benjamin Trent
e1c5ca421e
[7.x] [ML] lay ground work for handling >1 result indices () ()
* [ML] lay ground work for handling >1 result indices ()

This commit removes all but one reference to `getInitialResultsIndexName`. 
This is to support more than one result index for a single job.
2020-05-05 15:54:08 -04:00
David Roberts
7aa0daaabd
[7.x][ML] More advanced model snapshot retention options ()
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely
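
The retention rules in the list above can be sketched as a single pass over the snapshots. This is a simplified illustration, not the actual cleanup code: the `SnapshotRetention` and `Snapshot` names are hypothetical, ages are measured in whole days from a caller-supplied `today`, the boundary comparisons are assumptions, and retained snapshots are simply skipped.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the retention rules; not the real cleanup code.
final class SnapshotRetention {
    static final class Snapshot {
        final String id;
        final LocalDate day;
        final boolean retain;

        Snapshot(String id, LocalDate day, boolean retain) {
            this.id = id;
            this.day = day;
            this.retain = retain;
        }
    }

    private SnapshotRetention() {}

    // Snapshots must be ordered oldest first so "first of the day" is well defined.
    static List<String> idsToDelete(List<Snapshot> snapshots, LocalDate today,
                                    long retentionDays, long dailyRetentionAfterDays) {
        List<String> deletions = new ArrayList<>();
        Set<LocalDate> daysWithKeptSnapshot = new HashSet<>();
        for (Snapshot s : snapshots) {
            if (s.retain) {
                continue; // `retain` is respected: never deleted
            }
            long ageDays = ChronoUnit.DAYS.between(s.day, today);
            if (ageDays > retentionDays) {
                deletions.add(s.id); // older than model_snapshot_retention_days
            } else if (ageDays > dailyRetentionAfterDays) {
                // In the in-between window, keep only the first snapshot per day.
                if (!daysWithKeptSnapshot.add(s.day)) {
                    deletions.add(s.id);
                }
            }
        }
        return deletions;
    }
}
```

With the new defaults (10 and 1), everything from the most recent day is kept, one snapshot per day is kept for the following days, and anything older than ten days is removed unless it is marked `retain`.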

Backport of 
2020-05-05 14:31:58 +01:00
Dimitris Athanasiou
75dadb7a6d
[7.x][ML] Add loss_function to regression () ()
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of 
2020-05-05 14:59:51 +03:00
Martijn van Groningen
6d03081560
Add auto create action ()
Backport of  to 7.x branch.

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However, if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can lead to the
TransportBulkAction creating an index instead of a data stream. Therefore the coordination of
creating an index or data stream should occur on the master node. See 

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.

This change adds the following new transport actions:

AutoCreateAction: the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later, via , we can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to 
2020-05-04 19:10:09 +02:00
Dimitris Athanasiou
17b904def5
[7.x][ML] Decouple DFA progress testing from analyses phases () ()
This refactors native integ tests to assert progress without
expecting explicit phases for analyses. We can test those with
yaml tests in a single place.

Backport of 
2020-04-30 17:05:47 +03:00
Dimitris Athanasiou
d9685a0f19
[7.x][ML] Validate at least one feature is available for DF analytics () ()
We were previously checking at least one supported field existed
when the _explain API was called. However, in the case of analyses
with required fields (e.g. regression) we were not accounting for the fact that
the dependent variable is not a feature and thus if the source index
only contains the dependent variable field there are no features to
train a model on.

This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
`ExtractedFieldsDetector` and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.

For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.

Closes 

Backport of 
2020-04-29 11:39:58 +03:00
Przemysław Witek
c89917c799
Register DFA jobs on putAnalytics rather than via a separate method () () 2020-04-24 10:59:32 +02:00
Dimitris Athanasiou
b8379872a7
[7.x][ML] Logs error when DFA task is set to failed () ()
Also unmutes the integ test that stops and restarts
an outlier detection job in the hope of learning more
about the failure in .

Backport of 

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-24 11:06:07 +03:00
David Roberts
da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response ()
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.
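
A client-side check along these lines might look like the sketch below. It assumes the response body has already been parsed into a `Map`; the class and method names are hypothetical and not part of any client library.

```java
import java.util.Map;

// Hypothetical client-side helper; assumes the JSON response was parsed into a Map.
final class StartResponseCheck {
    private StartResponseCheck() {}

    // Returns true only when the "node" field is present and empty, which the
    // commit describes as the signal for a lazy open/start. A missing field
    // (for example, a response from an older node) yields false.
    static boolean wasAssignedLazily(Map<String, Object> response) {
        return "".equals(response.get("node"));
    }
}
```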

Backport of 
2020-04-22 12:06:53 +01:00
Stuart Tettemer
93a2e9b0f9
Test: MockScoreScript can be cacheable. ()
Backport: 0ed1eb5
2020-04-20 17:09:58 -06:00
Benjamin Trent
cabff65aec
[ML] Fixing inference stats race condition () ()
`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.

To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allow fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.

When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.

This should prevent race conditions while allowing high (ish) throughput in the high-contention paths in inference.

I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).
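
The locking scheme described above can be sketched like this. It is a minimal illustration rather than the committed code: the class name and the two counters are hypothetical. Note the inversion: many writers share the read lock because `LongAdder` increments are already thread safe, while the reset path takes the exclusive write lock.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the pattern; counter names are illustrative only.
final class InferenceCounters {
    private final LongAdder inferenceCount = new LongAdder();
    private final LongAdder cacheMissCount = new LongAdder();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Hot path: many threads may hold the read lock at once, so increments
    // do not block each other; they only wait while a reset is in progress.
    void recordInference(boolean cacheMiss) {
        lock.readLock().lock();
        try {
            inferenceCount.increment();
            if (cacheMiss) {
                cacheMissCount.increment();
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // Persist path: the write lock excludes all increments, so the sum and the
    // reset of both adders are seen as one consistent step.
    long[] snapshotAndReset() {
        lock.writeLock().lock();
        try {
            return new long[] { inferenceCount.sumThenReset(), cacheMissCount.sumThenReset() };
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Each call to `snapshotAndReset()` returns the counts accumulated since the previous call, so no increment is counted twice or lost between persists.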

closes  https://github.com/elastic/elasticsearch/issues/54786
2020-04-20 16:21:18 -04:00
Benjamin Trent
fa0373a19f
[7.x] [ML] Fix log spam and disable ILM/SLM history for native ML tests ()
* [ML] fix native ML test log spam ()

This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.

* removing unnecessary changes

* removing unused imports

* removing unnecessary java setting
2020-04-20 15:41:30 -04:00
William Brafford
7817948926 Disable monitoring in ML multinode tests ()
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
2020-04-20 10:51:16 -04:00
Przemysław Witek
7d5f74e964
Fix and unmute testSetUpgradeMode_ExistingTaskGetsUnassigned () () 2020-04-20 13:29:29 +02:00
William Brafford
49e30b15a2
Deprecate disabling basic-license features () ()
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.

This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
2020-04-17 15:04:17 -04:00
Benjamin Trent
8c581c3388
[ML] fixing and unmuting testHRDSplit test () ()
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). 

The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. 

closes https://github.com/elastic/elasticsearch/issues/32966
2020-04-17 09:55:52 -04:00
Benjamin Trent
2b68aa3471
muting test for issue 55068 () 2020-04-16 10:32:12 -04:00
David Roberts
8489f8c121
[ML] Add test to prove categorization state written after lookback ()
When a datafeed transitions from lookback to real-time we request
that state is persisted from the autodetect process in the
background.

This PR adds a test to prove that for a categorization job the
state that is persisted includes the categorization state.
Without the fix from elastic/ml-cpp#1137 this test fails.  After
that C++ fix is merged this test should pass.

Backport of 
2020-04-16 11:55:18 +01:00
David Roberts
5de6ddfef2 Mute ClassificationIT.testSetUpgradeMode_ExistingTaskGetsUnassigned
Due to https://github.com/elastic/elasticsearch/issues/55221
2020-04-16 09:03:46 +01:00
William Brafford
2ba3be9db6
Remove deprecated third-party methods from tests () ()
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
2020-04-15 17:54:47 -04:00
Dimitris Athanasiou
4000138105
[7.x][ML] Add debug logging for outlier detection stop and restart integ test () ()
To understand the failures in 

Backport of 
2020-04-15 10:40:38 +03:00
Przemysław Witek
d5bb574e1e
[7.x] Unassign DFA tasks in SetUpgradeModeAction () () 2020-04-14 14:09:02 +02:00
Mark Vieira
cb58725164 Mute InferenceIngestIT.testPipelineIngest 2020-04-14 09:27:56 +01:00
Benjamin Trent
d32f6fed1d
[ML] inference only persist if there are stats () ()
We needlessly send documents to be persisted. If there are no stats added, then we should not attempt to persist them.

Also, this PR fixes the race condition that caused issue:  https://github.com/elastic/elasticsearch/issues/54786
2020-04-13 14:03:05 -04:00
Benjamin Trent
c5c7ee9d73
[7.x] [ML] Start gathering and storing inference stats () ()
* [ML] Start gathering and storing inference stats ()

This PR enables stats on inference to be gathered and stored in the `.ml-stats-*` indices.

Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user.

`.ml-stats-*` is ILM managed (when possible). So, at any point the underlying index could change. This means that a stats document that is read in and then later updated will actually be a new doc in a new index. This complicates matters as this means that having a running knowledge of seq_no and primary_term is complicated and almost impossible. This is because we don't know the latest index name.

We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).
2020-04-13 08:15:46 -04:00