Commit Graph

791 Commits

Author SHA1 Message Date
Benjamin Trent d41b2e3f38
[ML][Inference] allowing per-model licensing (#49398) (#49435)
* [ML][Inference] allowing per-model licensing

* changing to internal action + removing pre-mature opt
2019-11-21 09:46:34 -05:00
Przemysław Witek c7ac2011eb
[7.x] Implement accuracy metric for multiclass classification (#47772) (#49430) 2019-11-21 15:01:18 +01:00
David Roberts 20558cf61c [ML] Fix simultaneous stop and force stop datafeed (#49367)
If a datafeed is stopped normally and force stopped at the same
time then it is possible that the force stop removes the
persistent task while the normal stop is performing actions.
Currently this causes the normal stop to error, but since
stopping a stopped datafeed is not an error this doesn't make
sense. Instead the force stop should just take precedence.

This is a followup to #49191 and should really have been
included in the changes in that PR.
2019-11-20 12:52:47 +00:00
Przemysław Witek 9c0ec7ce23
[7.x] Make AnalyticsProcessManager class more robust (#49282) (#49356) 2019-11-20 10:08:16 +01:00
Dimitris Athanasiou 4d6e037e90
[7.x][ML] Extract creation of DFA field extractor into a factory (#49315) (#49329)
This commit moves the async calls required to retrieve the components
that make up `ExtractedFieldsExtractor` out of `DataFrameDataExtractorFactory`
and into a dedicated `ExtractorFieldsExtractorFactory` class.

A few more refactorings are performed:

  - The detector no longer needs the results field. Instead, it knows
  whether to use it or not based on whether the task is restarting.
  - We pass more accurately whether the task is restarting or not.
  - The validation of whether fields that have a cardinality limit
  are valid is now performed in the detector after retrieving the
  respective cardinalities.

Backport of #49315
2019-11-20 10:02:42 +02:00
Przemysław Witek 42bb8ae525
[7.x] Extract indexData method out of RegressionIT tests (#49306) (#49313) 2019-11-19 22:47:12 +01:00
Benjamin Trent 19602fd573
[ML][Inference] changing setting to be memorySizeSettting (#49259) (#49302) 2019-11-19 07:56:40 -05:00
David Roberts a5204c1c80
[ML] Fixes for stop datafeed edge cases (#49284)
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
2019-11-19 10:51:46 +00:00
Benjamin Trent eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Przemysław Witek 5f9965e4b8
Lower minimum model memory limit value from 1MB to 1kB. (#49227) (#49242) 2019-11-18 14:58:20 +01:00
Dimitris Athanasiou 805c31e19e
[7.x][ML] Avoid NPE when node load is calculated on job assignment (#49186) (#49214)
This commit fixes a NPE problem as reported in #49150.
But this problem uncovered that we never added proper handling
of state for data frame analytics tasks.

In this commit we improve the `MlTasks.getDataFrameAnalyticsState`
method to handle null tasks and state tasks properly.

Closes #49150

Backport of #49186
2019-11-18 10:33:07 +02:00
Przemysław Witek 150db2b544
Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143) (#49164) 2019-11-18 07:52:57 +01:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Przemysław Witek e6ad3c29fd
Do not throw exceptions resulting from persisting datafeed timing stats. (#49044) (#49050) 2019-11-13 20:23:13 +01:00
Christoph Büscher 6119f0aaa2 Fix Eclipse compilation in DataFrameDataExtractorTests (#48942) 2019-11-11 16:17:55 +01:00
Dimitris Athanasiou dfc6a13b44
[7.x][ML] Handle nested arrays in source fields (#48885) (#48889)
Backport of #48885
2019-11-07 07:30:50 +02:00
David Roberts c03f7ba74c [TEST] Mute TimeoutCheckerTests.testWatchdog
Due to https://github.com/elastic/elasticsearch/issues/48861
2019-11-05 11:49:46 +00:00
Dimitris Athanasiou f2d4c94a9c
[7.x][ML] Deduplicate multi-fields for data frame analytics (#48799) (#48806)
In the case multi-fields exist in the source index, we pick
all variants of them in our extracted fields detection for
data frame analytics. This means we may have multiple instances
of the same feature. The worse consequence of this is when the
dependent variable (for regression or classification) is also
duplicated which means we train a model on the dependent variable
itself.

Now that #48770 is merged, this commit is adding logic to
only select one variant of multi-fields.

Closes #48756

Backport of #48799
2019-11-01 16:53:05 +02:00
Dimitris Athanasiou 1f662e0b12
[7.x][ML] Prevent fetching multi-field from source (#48770) (#48797)
Aggregatable mutli-fields are at the moment wrongly mapped
as normal doc_value fields and thus they support fetching from
source. However, they do not exist in the source. This results
to failure to extract such fields.

This commit fixes this bug. While a fix could be worked out
on top of the existing code, it is evident the extraction logic
has become difficult to understand and maintain. As we also
want to deduplicate multi-fields for data frame analytics,
it seemed appropriate to refactor the code to simplify and
better handle the extraction of multi-fields.

Relates #48756

Backport of #48770
2019-11-01 14:18:03 +02:00
David Roberts c3063c4e1f [ML] Make the URL of the ML C++ Ivy repo configurable (#48702)
At present the ML C++ artifact is always downloaded from
S3.  This change adds an option to configure the location.

(The intention is to use a file:/// URL to pick up the
artifact built in a Docker container in ml-cpp PR builds
so that C++ changes that will break Java integration tests
can be detected before the ml-cpp PRs are merged.)

Relates elastic/ml-cpp#766
2019-10-31 09:21:44 +00:00
Dimitris Athanasiou 919596b2e8
[7.x][ML] Move field extraction logic to its own package (#48709) (#48712)
Moves common field extraction logic to its own package so that it can
be used both for anomaly detection and data frame analytics.

In preparation for refactoring extraction fields to be simpler and to
support multi-fields properly.

Backport of #48709
2019-10-31 02:41:00 +02:00
Benjamin Trent c9ead80c31
[7.x] [ML][Inference] separating definition and config object storage (#48651) (#48695)
* [ML][Inference] separating definition and config object storage (#48651)

This separates out the `definition` object from being stored within the configuration object in the index. 

This allows us to gather the config object without decompressing a potentially large definition.

Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
2019-10-30 13:27:29 -04:00
Przemysław Witek 7c944d26c5
[7.x] Assert that the results of classification analysis can be evaluated using _evaluate API. (#48626) (#48634) 2019-10-29 16:20:56 +01:00
Przemysław Witek 7e30277a37
Mute RegressionIT.testStopAndRestart (#48575) (#48576) 2019-10-28 13:08:11 +01:00
Martijn van Groningen b034153df7
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:34:01 +02:00
Przemysław Witek 149537a165
Assert that inference model has been persisted (#48332) (#48453) 2019-10-24 14:18:43 +02:00
Przemyslaw Gomulka aaa6209be6
[7.x] [Java.time] Calculate week of a year with ISO rules BACKPORT(#48209) (#48349)
Reverting the change introducing IsoLocal.ROOT and introducing IsoCalendarDataProvider that defaults start of the week to Monday and requires minimum 4 days in first week of a year. This extension is using java SPI mechanism and defaults for Locale.ROOT only.
It require jvm property java.locale.providers to be set with SPI,COMPAT

closes #41670
backport #48209
2019-10-23 17:39:38 +02:00
Przemysław Witek 60d8ecb2b7
Mute ClassificationIT tests (#48338) (#48339) 2019-10-22 12:45:50 +02:00
Przemysław Witek 2db2b945ec
[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174) (#48294) 2019-10-21 22:07:19 +02:00
Benjamin Trent abd1b5118f
[ML] fixing tests (#48084) (#48253)
* [ML] fixing tests

* unmuting tests

* reverting outlier detection job changes
2019-10-21 09:21:06 -04:00
rsarawgi 5e4dd0fd2e [ML] Removing usages of ToXContentParams.INCLUDE_TYPE (#48165)
Removing the option of ToXContentParams.INCLUDE_TYPE and replacing them with ToXContentParams.FOR_INTERNAL_STORAGE
Closes #48057
2019-10-18 14:49:26 +01:00
Przemysław Witek 28f68fa221
Make num_top_classes parameter's default value equal to 2 (#48119) (#48201) 2019-10-17 18:43:15 +02:00
Dimitris Athanasiou e0489fc328
[7.x][ML] Always refresh dest index before starting analytics process (#48090) (#48196)
If a job stops right after reindexing is finished but before
we refresh the destination index, we don't refresh at all.
If the job is started again right after, it jumps into the analyzing state.
However, the data is still not searchable.
This is why we were seeing test failures that we start the process
expecting X rows (where X is lower than the expected number of docs)
and we end up getting X+.

We fix this by moving the refresh of the dest index right before
we start the process so it always ensures the data is searchable.

Closes #47612

Backport of #48090
2019-10-17 17:20:19 +01:00
Benjamin Trent ee110c2d42
[ML] Muting tests due to #48085 (#48086) (#48154) 2019-10-16 15:46:50 -04:00
Benjamin Trent 0dddbb5b42
[ML] Parse and index inference model (#48016) (#48152)
This adds parsing an inference model as a possible
result of the analytics process. When we do parse such a model
we persist a `TrainedModelConfig` into the inference index
that contains additional metadata derived from the running job.
2019-10-16 15:46:20 -04:00
Przemysław Witek 8f815240b3
[7.x] Allow integer types for classification's dependent variable (#47902) (#48080) 2019-10-16 11:09:56 +02:00
David Roberts d9c7e3847e [TEST] Don't assert order of data frame analytics audit messages (#48065)
Audit messages are stored with millisecond timestamps. If two
messages have the same millisecond timestamp then asserting on
their order is impossible given the information available.

This PR changes the assertion on audit messages in the native
data frame analytics tests to assert that the expected audit
messages exist in any order.

Fixes #48035
2019-10-15 19:59:52 +01:00
Przemysław Witek eaa56344b5
Verify that the failure reason of analytics process is empty (#48042) (#48071) 2019-10-15 18:33:20 +02:00
Przemysław Witek 620bd9d224
Enable test testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows_TopClassesRequested now that top classes are correctly reported by C++. (#48043) (#48053) 2019-10-15 14:49:16 +02:00
David Roberts 984323783e
[ML][7.x] Add lazy assignment job config option (#47993)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-15 06:55:11 +01:00
David Roberts 1ca25bed38
[ML][7.x] Add option to stop datafeed that finds no data (#47995)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
2019-10-14 17:19:13 +01:00
David Roberts 46ae86ac31 [ML] Fix detection of syslog-like timestamp in find_file_structure (#47970)
Usually syslog timestamps have two spaces before a single
digit day-of-month. However, in some non-syslog cases
where syslog-like timestamps are used there is only one
space. The grok pattern supports this, so the timestamp
parser should too. This change makes the
find_file_structure endpoint do this.

Also fixes another problem that the same test case
exposed in the find_file_structure endpoint, which was
that the exclude_lines_pattern for delimited files was
always created on the assumption the delimiter was a
comma. Now it is based on the actual delimiter.
2019-10-13 20:07:54 +01:00
Benjamin Trent 627faf1850
[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) (#47914)
* [ML][Analytics] fix bug where regression deleted early does not delete state (#47885)

* [ML][Analytics] fix bug where regression deleted early does not delete state

* Fixing ml with security test failure

* fixing for older java
2019-10-11 15:11:16 -04:00
Przemysław Witek c62fe8c344
Require that the dependent variable column has at most 2 distinct values in classfication analysis. (#47858) (#47906) 2019-10-11 14:57:08 +02:00
Igor Motov b5afa95fd8 Fix Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart
Tracked by #47612
2019-10-10 18:17:01 +04:00
Igor Motov 17433e79d8 Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart
Tracked by #47612
2019-10-10 17:56:23 +04:00
Dimitris Athanasiou c1b0bfd74a
[7.x][ML] Unwrap exception causes before calling instanceof (#47676) (#47724)
When exceptions could be returned from another node, the exception
might be wrapped in a `RemoteTransportException`. In places where
we handled specific exceptions using `instanceof` we ought to unwrap
the cause first.

This commit attempts to fix this issue after searching code in the ML
plugin.

Backport of #47676
2019-10-08 16:02:47 +03:00
Dimitris Athanasiou 7667ea5f6f
[7.x][ML] Additional outlier detection parameters (#47600) (#47669)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
2019-10-07 18:21:33 +03:00
Dimitris Athanasiou ffacfc642c
[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624) (#47625)
Relates #47612
2019-10-05 23:58:32 +03:00
Przemysław Witek ee952da2e2
[7.x] Implement evaluation API for multiclass classification problem (#47126) (#47343) 2019-10-04 17:54:51 +02:00
Przemysław Witek ec9b77deaa
[7.x] Implement new analysis type: classification (#46537) (#47559) 2019-10-04 13:47:19 +02:00
David Roberts 31a5e1c7ee [ML] More accurate job memory overhead (#47516)
When an ML job runs the memory required can be
broken down into:

1. Memory required to load the executable code
2. Instrumented model memory
3. Other memory used by the job's main process or
   ancilliary processes that is not instrumented

Previously we added a simple fixed overhead to
account for 1 and 3. This was 100MB for anomaly
detection jobs (large because of the completely
uninstrumented categorization function and
normalize process), and 20MB for data frame
analytics jobs.

However, this was an oversimplification because
the executable code only needs to be loaded once
per machine.  Also the 100MB overhead for anomaly
detection jobs was probably too high in most cases
because categorization and normalization don't use
_that_ much memory.

This PR therefore changes the calculation of memory
requirements as follows:

1. A per-node overhead of 30MB for _only_ the first
   job of any type to be run on a given node - this
   is to account for loading the executable code
2. The established model memory (if applicable) or
   model memory limit of the job
3. A per-job overhead of 10MB for anomaly detection
   jobs and 5MB for data frame analytics jobs, to
   account for the uninstrumented memory usage

This change will enable more jobs to be run on the
same node.  It will be particularly beneficial when
there are a large number of small jobs.  It will
have less of an effect when there are a small number
of large jobs.
2019-10-04 09:57:31 +01:00
Dimitris Athanasiou b9541eb3af
[7.x][ML] Make PUT data frame analytics action a master node action (… (#47433)
While it seemed like the PUT data frame analytics action did not
have to be a master node action as the config is stored in an index
rather than the cluster state, there are other subtle nuances which
make it worthwhile to convert it. In particular, it helps maintain
order of execution for put actions which are anyhow user driven and
are expected to have low volume.

This commit converts `TransportPutDataFrameAnalyticsAction` from
a handled transport action to a master node action.

Note this means that the action might fail in a mixed cluster
but as the API is still experimental and not widely used there will
be few moments more suitable to make this change than now.
2019-10-02 16:24:21 +03:00
David Roberts 4379a3c52b [ML] Throttle the delete-by-query of expired results (#47177)
Due to #47003 many clusters will have built up a
large backlog of expired results. On upgrading to
a version where that bug is fixed users could find
that the first ML daily maintenance task deletes
a very large amount of documents.

This change introduces throttling to the
delete-by-query that the ML daily maintenance uses
to delete expired results to limit it to deleting an
average 200 documents per second. (There is no
throttling for state/forecast documents as these
are expected to be lower volume.)

Additionally a rough time limit of 8 hours is applied
to the whole delete expired data action. (This is only
rough as it won't stop part way through a single
operation - it only checks the timeout between
operations.)

Relates #47103
2019-10-02 11:16:34 +01:00
Dimitris Athanasiou 36884a3c32
[7.x][ML] Restore analytics state if available (#47128) (#47393)
This commit restores the model state if available in data
frame analytics jobs.

In addition, this changes the start API so that a stopped job
can be restarted. As we now store the progress in the state index
when the task is stopped, we can use it to determine what state
the job was in when it got stopped.

Note that in order to be able to distinguish between a job
that runs for the first time and another that is restarting,
we ensure reindexing progress is reported to be at least 1
for a running task.
2019-10-02 10:24:05 +03:00
Benjamin Trent f5fe5e7cd6
[7.x] [ML][Inference] Adding preprocessors to definition object (#47320) (#47370)
* [ML][Inference] Adding preprocessors to definition object (#47320)

* [ML][Inference] Adding preprocessors to definition object

* Update TrainedModelConfig.java

* adjusting for backport
2019-10-01 13:31:25 -04:00
Benjamin Trent 4335e07716
[7.x] [ML][Inference] adding .ml-inference* index and storage (#47267) (#47310)
* [ML][Inference] adding .ml-inference* index and storage (#47267)

* [ML][Inference] adding .ml-inference* index and storage

* Addressing PR comments

* Allowing null definition, adding validation tests for model config

* fixing line length

* adjusting for backport
2019-10-01 08:20:33 -04:00
David Roberts 0807d409bf [ML] Reinstate ML daily maintenance actions (#47103)
A refactoring in 6.6 meant that the ML daily
maintenance actions have not been run at all
since then. This change installs the local
master listener that schedules the ML daily
maintenance, and also defends against some
subtle race conditions that could occur in the
future if a node flipped very quickly between
master and non-master.

Fixes #47003
2019-09-30 13:12:32 +01:00
Rory Hunter 53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
Przemysław Witek 3fbd58d156
[7.x] Allow evaluation to consist of multiple steps. (#46653) (#47194) 2019-09-27 13:01:51 +02:00
David Roberts 77cc6d5bad [TEST] Work around _cat/indices bug with security enabled (#47160)
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
2019-09-26 13:29:40 +01:00
Dimitris Athanasiou 0765bd4bf7
[7.x][ML] Ensure data frame analytics task is only marked completed once (#47119) (#47157)
Closes #46907
2019-09-26 15:26:06 +03:00
Tanguy Leroux 95e2ca741e
Remove unused private methods and fields (#47154)
This commit removes a bunch of unused private fields and unused
private methods from the code base.

Backport of (#47115)
2019-09-26 12:49:21 +02:00
Benjamin Trent 05fb7be571
[7.x] [ML][Inference] Feature pre-processing objects and functions (#46777) (#47040)
* [ML][Inference] Feature pre-processing objects and functions (#46777)

To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future. 

This PR lays down the ground work for 3 basic encodings:

* HotOne
* Target Mean
* Frequency

More feature encodings or pre-processings could be added in the future:

* Handling missing columns
* Standardization
* Label encoding
* etc....

* fixing compilation for namedxcontent tests
2019-09-25 08:16:24 -04:00
Yannick Welsch eb86d71edd Mute MlJobIT.testDeleteJob
Relates #45652
2019-09-25 12:53:09 +02:00
Yannick Welsch 7a5b5af171 Mute MlJobIT.testDeleteJobAsync
Relates #45652
2019-09-25 12:53:05 +02:00
Benjamin Trent 00c1c0132b
[ML] fix two datafeed flush lockup bugs (#46982) (#47024)
* [ML] fix two flush lockup bugs

* Addressing PR comments

* moving debug logging line so it is only written on success
2019-09-24 13:03:20 -04:00
Yannick Welsch 9638ca20b0 Allow dropping documents with auto-generated ID (#46773)
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
2019-09-19 16:46:33 +02:00
Dimitris Athanasiou 02a5e153dc
[7.x][ML] Parse and index data frame analytics state (#46804) (#46820)
This commit reuses the same state processor that is used for autodetect
to parse state output from data frame analytics jobs. We then index the
state document into the state index.

Backport of #46804
2019-09-18 20:37:40 +03:00
Dimitris Athanasiou cebe8da617
[7.x][ML] MlMemoryTracker should ignore analytics tasks without config (#46789) (#46811)
It is possible for a running analytics job that its config is removed
from the '.ml-config' index (perhaps the user deleted the entire index,
etc.). In that case the task remains without a matching config. I have
raised #46781 to discuss how to deal with this issue.

This commit focuses on `MlMemoryTracker` and changes it so that when
we get the configs for the running tasks we leniently ignore missing ones.
This at least means memory tracking will keep working for other jobs
if one or more are missing.

In addition, this commit makes the cleanup code for native analytics
tests more robust by explicitly stopping all jobs and force-stopping
if an error occurs. This helps so that a single failing test does
not cause other tests fail due to pending tasks.

Backport of #46789
2019-09-18 16:35:25 +03:00
Przemysław Witek e49be611ad
[7.x] Add audit messages for Data Frame Analytics (#46521) (#46738) 2019-09-16 21:21:38 +02:00
Dimitris Athanasiou 63eb0d9081
[7.x][ML] Avoid marking data frame analytics task completed twice (#46721) (#46724)
When the stop API is called while the task is running there is
a chance the task gets marked completed twice. This may cause
undesired side effects, like indexing the progress document a second
time after the stop API has returned (the cause for #46705).

This commit adds a check that the task has not been completed before
proceeding to mark it so. In addition, when we update the task's state
we could get some warnings that the task was missing if the stop API
has been called in the meantime. We now check the errors are
`ResourceNotFoundException` and ignore them if so.

Closes #46705

Backports #46721
2019-09-15 17:25:26 +03:00
Dimitris Athanasiou 0bc8acaf5b
[7.x][ML] Create state index and alias before starting an analytics job (#46602) (#46648)
This is fixing a bug where if an analytics job is started before any
anomaly detection job is opened, we create an index after the state
write alias.

Instead, we should create the state index and alias before starting
an analytics job and this commit makes sure this is the case.

Backport of #46602
2019-09-13 10:34:12 +03:00
David Roberts 461de5b58e [TEST] Remove incorrect data frame analytics state assertion (#46597)
After starting the analytics job and checking its state
the state can be any of "started", "reindexing" or
"analyzing" depending on how quickly the work is done.
2019-09-11 16:33:14 +01:00
Dimitris Athanasiou 579af626f5
[7.x][ML] No error when datafeed stops during updating to started (#46495) (#46542)
Investigating the test failure reported in #45518 it appears that
the datafeed task was not found during a tast state update. There
are only two places where such an update is performed: when we set
the state to `started` and when we set it to `stopping`. We handle
`ResourceNotFoundException` in the latter but not in the former.

Thus the test reveals a rare race condition where the datafeed gets
requested to stop before we managed to update its state to `started`.
I could not reproduce this scenario but it would be my best guess.

This commit catches `ResourceNotFoundException` while updating the
state to `started` and lets the task terminate smoothly.

Closes #45518

Backport of #46495
2019-09-11 13:18:42 +03:00
Przemysław Witek e38e631dac
[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967) (#46519) 2019-09-11 12:17:26 +02:00
Przemysław Witek e21deae535
Disallow persisting any documents when datafeed is isolated (#46485) (#46490) 2019-09-09 21:01:27 +02:00
David Roberts 7c7fb7e32d [ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432)
ML users who upgrade from versions prior to 7.4 to 7.4 or later
will have ML results indices that do not have mappings for the
total_search_time_ms field.  Therefore, when searching these
indices we must tolerate this field not having a mapping.

Fixes #46437
2019-09-06 14:31:15 +01:00
Dimitris Athanasiou a6834068e3
[7.x][ML] Extract DataFrameAnalyticsTask into its own class (#46402) (#46426)
This refactors `DataFrameAnalyticsTask` into its own class.
The task has quite a lot of functionality now and I believe it would
make code more readable to have it live as its own class rather than
an inner class of the start action class.

Backport of #46402
2019-09-06 14:13:46 +03:00
Benjamin Trent 457ff3e2fb
7.x/ml fix instance serialization bwc (#46404)
* [ML] Fixing instance serialization version for bwc

* fixing CppLogMessage
2019-09-05 13:23:26 -05:00
Benjamin Trent 5201386232
[ML] testFullClusterRestart waiting for stable cluster (#46280) (#46335)
* [ML] waiting for ml indices before waiting task assignment testFullClusterRestart

* waiting for a stable cluster after fullrestart

* removing unused imports
2019-09-05 06:57:33 -05:00
Dimitris Athanasiou 8fca5b5204
[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271) (#46282)
The test seems to have been failing due to a race condition between
stopping the task and refreshing the destination index. In particular,
we were going forward with refreshing the destination index even
though the task stopped in the meantime. This was fixed in
request.

Closes #43960

Backport of #46271
2019-09-04 10:57:01 +03:00
Benjamin Trent d0c5573a51
[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044) (#46096)
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.

This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
2019-08-30 09:27:07 -05:00
Dimitris Athanasiou 5921ae53d8
[7.x][ML] Regression dependent variable must be numeric (#46072) (#46136)
* [ML] Regression dependent variable must be numeric

This adds a validation that the dependent variable of a regression
analysis must be numeric.

* Address review comments and fix some problems

In addition to addressing the review comments, this
commit fixes a few issues I found during testing.

In particular:

- if there were mappings for required fields but they were
not included we were not reporting the error
- if explicitly included fields had unsupported types we were
not reporting the error

Unfortunately, I couldn't get those fixed without refactoring
the code in `ExtractedFieldsDetector`.
2019-08-30 09:57:43 +03:00
Przemysław Witek b8a0379057
Refactor auditor-related classes (#45893) (#46120) 2019-08-29 14:21:03 +02:00
Przemysław Witek fbe9e8a530
Do not throw an exception if the process finished quickly but without any error. (#46073) (#46113) 2019-08-29 10:47:17 +02:00
Dimitris Athanasiou 25d64508f6
[7.x][ML] Support boolean fields for DF analytics (#46037) (#46054)
This commit adds support for `boolean` fields in data frame
analytics (and currently both outlier detection and regression).
The analytics process expects `boolean` fields to be encoded as
integers with 0 or 1 value.
2019-08-28 12:02:29 +03:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Benjamin Trent d64018f8e1
[ML] add supported types to no fields error message (#45926) (#45987)
* [ML] add supported types to no fields error message

* adding supported types to logger debug
2019-08-26 14:18:00 -05:00
Dimitris Athanasiou be554fe5f0
[7.x][ML] Improve progress reportings for DF analytics (#45856) (#45910)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 23:04:39 +03:00
Przemysław Witek 85d55e30d0
Add test that proves _timing_stats document is deleted when the job is deleted (#45840) (#45854) 2019-08-23 07:03:09 +02:00
Przemysław Witek 2ed19b2c81
Put error message from inside the process into the exception that is thrown when the process doesn't start correctly. (#45846) (#45875) 2019-08-23 07:02:50 +02:00
Benjamin Trent 8e3c54fff7
[7.x] [ML] Adding data frame analytics stats to _usage API (#45820) (#45872)
* [ML] Adding data frame analytics stats to _usage API (#45820)

* [ML] Adding data frame analytics stats to _usage API

* making the size of analytics stats 10k

* adjusting backport
2019-08-22 15:15:41 -05:00
Przemysław Witek 7512337922
[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775) (#45825) 2019-08-22 11:14:26 +02:00
Przemysław Witek bf701b83d2
Shorten field names in EstimateMemoryUsageResponse (#45719) (#45772) 2019-08-21 12:45:09 +02:00
Przemysław Witek c6709f0979
Mute tests affected by renaming fields in Estimate memory usage response (#45743) (#45766) 2019-08-21 09:57:23 +02:00
Dimitris Athanasiou d5c3d9b50f
[7.x][ML] Do not skip rows with missing values for regression (#45751) (#45754)
Regression analysis support missing fields. Even more, it is expected
that the dependent variable has missing fields to the part of the
data frame that is not for training.

This commit allows to declare that an analysis supports missing values.
For such analysis, rows with missing values are not skipped. Instead,
they are written as normal with empty strings used for the missing values.

This also contains a fix to the integration test.

Closes #45425
2019-08-21 08:15:38 +03:00
Benjamin Trent ba7b677618
[ML] better handle empty results when evaluating regression (#45745) (#45759)
* [ML] better handle empty results when evaluating regression

* adding new failure test to ml_security black list

* fixing equality check for regression results
2019-08-20 17:37:04 -05:00
Dimitris Athanasiou 49edf9e5b5
[7.x][ML] Remove timeout on waiting for DF analytics result processor to complete (#45724) (#45733)
We cannot know how long the analysis will take to complete thus we should not have
a timeout. Note that if the process crashes, the result processor will pick the
exception due to the stream closing.

Closes #45723
2019-08-20 17:21:40 +03:00
Przemysław Witek b37ebd1adf
Prepare the codebase for new Auditor subclasses (#45716) (#45731) 2019-08-20 16:03:50 +02:00
Przemysław Witek 80dd0a0948
Get rid of EstimateMemoryUsageRequest and EstimateMemoryUsageAction.Request. (#45718) (#45725) 2019-08-20 15:49:17 +02:00
Przemysław Witek 7bc8400222
Call the new _estimate_memory_usage API endpoint on df analytics _start (#45536) (#45701) 2019-08-19 21:37:55 +02:00
Igor Motov 98c850c08b
Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618)
Changes the order of parameters in Geometries from lat, lon to lon, lat
and moves all Geometry classes are moved to the
org.elasticsearch.geomtery package.

Backport of #45332

Closes #45048
2019-08-16 14:42:02 -04:00
Przemysław Witek df574e5168
[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) (#45510) 2019-08-14 08:26:03 +02:00
Armin Braun 90803a5caf
Reenable Integ Tests in native-multi-node-tests (#45482) (#45496)
* Reenable Integ Tests in native-multi-node-tests

* The tests broken here were likely fixed by #45463 => let's reenable them and see if things run fine again
* Relates #45405, #45455
2019-08-13 15:55:54 +02:00
Przemysław Witek 1aed388a24
Add view_index_metadata to roles.yml and remove as many df analytics test cases from build.gradle blacklist as possible. (#45451) (#45465) 2019-08-13 08:31:58 +02:00
Mark Vieira 7e3379444b
Fix build failure due to unknown task and disable test conventions
(cherry picked from commit 8ed84bc5cef9bcfae6c817059f764d97e4451a4a)
2019-08-12 09:18:39 -07:00
Przemyslaw Gomulka 421e9b8e8b
Mute integ tests in native-multi-node-tests (#45457)
Tracked at #45405
2019-08-12 17:42:24 +02:00
Przemyslaw Gomulka d11ae08467
Muting ForecastIT.testOverflowToDisk (#45435) (#45438)
awaits #45405
2019-08-12 11:01:32 +02:00
Dimitris Athanasiou d02d6e40c2 [ML] Mute regression integ test
Relates #45425
2019-08-12 10:59:24 +03:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Dimitris Athanasiou 27497ff75f
[7.x][ML] Add regression analysis to DF analytics (#45292) (#45388)
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.

This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
2019-08-09 19:31:13 +03:00
Jason Tedor 9a142ff25c
Introduce formal node ML role (#45174)
This commit builds on the ability for plugins to introduce new roles to
add a formal node ML role.
2019-08-06 13:00:05 -04:00
David Roberts a1f0285f0e [TEST] Only test US locale in day/month order test in FIPS JVM (#45141)
In the FIPS JVM the JVM default locale seems to leak into places
where it should be overridden. This change skips assertions
in TimestampFormatFinderTests.testGuessIsDayFirstFromLocale
that may be impacted.

Fixes #45140
2019-08-02 15:04:47 +01:00
David Roberts f617585dbd [ML] Improve CSV header row detection in find_file_structure (#45099)
When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.

Fixes #45047
2019-08-02 09:08:21 +01:00
Dimitris Athanasiou 8a6675b994
[7.x][ML] Check dest index is empty when starting DF analytics (#45094) (#45112)
If one tries to start a DF analytics job that has already run,
the result will be that the task will fail after reindexing the
dest index from the source index. The results of the prior run
will be gone and the task state is not properly set to failed
with the failure reason.

This commit improves the behavior in this scenario. First, we
set the task state to `failed` in a set of failures that were
missed. Second, a validation is added that if the destination
index exists, it must be empty.
2019-08-02 00:19:48 +03:00
Przemysław Witek 6c87845fc1
Persist DatafeedTimingStats with RefreshPolicy.NONE by default (#44940) (#45079) 2019-08-01 14:36:59 +02:00
Dimitris Athanasiou aef419c0b0
[7.x][ML] Catch any error thrown while closing data frame analytics process (#44958) (#44968)
In case closing the process throws an exception we should be catching
it no matter its type. The process may have terminated because of a
fatal error in which case closing the process will throw a server
error, not an `IOException`. If this happens we fail to mark the
persistent task as failed and the task gets in limbo.
2019-07-29 21:59:10 +03:00
Benjamin Trent 3b514f0dae
[ML] update Instant serialization (#44765) (#44954)
* [ML] update Instant serialization

* addressing PR comments

* removing unused import
2019-07-29 13:06:56 -05:00
Dimitris Athanasiou 9dd527328a
[ML] Outlier detection should only fetch docs that have the analyzed … (#44944) (#44959)
As data frame rows with missing values for analyzed fields are skipped,
we can be more efficient by including a query that only picks documents
that have values for all analyzed fields. Besides improving the number
of documents we go through, we also provide a more accurate measurement
of how many rows we need which reduces the memory requirements.

This also adds an integration test that runs outlier detection on data
with missing fields.
2019-07-29 18:23:56 +03:00
Luca Cavanna a3cc32da64 TaskListener#onFailure to accept Exception instead of Throwable (#44946)
TaskListener accepts today Throwable in its onFailure method. Though
looking at where it is called (TransportAction), it can never be
notified of a Throwable.

This commit changes the signature of TaskListener#onFailure so that it
accepts an `Exception` rather than a `Throwable` as second argument.
2019-07-29 16:47:19 +02:00
David Kyle d05f12dadb [ML] Close any opened pipes if there is an error connecting to the process (#44869) 2019-07-29 10:48:31 +01:00
Przemysław Witek 79121ea127
[7.x] Implement exponential average search time per hour statistics. (#44683) (#44897) 2019-07-26 15:56:34 +02:00
Przemysław Witek 8bb8543fdf
Treat PostDataActionResponse.DataCounts.bucketCount as incremental rather than absolute (total). (#44803) (#44856) 2019-07-25 20:46:56 +02:00
Przemysław Witek 53f409e5ae
Add result_type field to TimingStats and DatafeedTimingStats documents (#44812) (#44841) 2019-07-25 10:11:55 +02:00
Andrei Stefan 2633d11eb7
Switch from using docvalue_fields to extracting values from _source (#44062) (#44804)
* Switch from using docvalue_fields to extracting values from _source
where applicable. Doing this means parsing the _source and handling the
numbers parsing just like Elasticsearch is doing it when it's indexing
a document.
* This also introduces a minor limitation: aliases type of fields that
are NOT part of a tree of sub-fields will not be able to be retrieved
anymore. field_caps API doesn't shed any light into a field being an
alias or not and at _source parsing time there is no way to know if a
root field is an alias or not. Fields of the type "a.b.c.alias" can be
extracted from docvalue_fields, only if the field they point to can be
extracted from docvalue_fields. Also, not all fields in a hierarchy of
fields can be evaluated to being an alias.

(cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)
2019-07-25 10:02:41 +03:00
Alpar Torok b34ac66d96
Mute multiple tests on Windows (7.x) (#44676)
* Mute failing test

tracked in #44552

* mute EvilSecurityTests

tracking in #44558

* Fix line endings in ESJsonLayoutTests

* Mute failing ForecastIT  test on windows

Tracking in #44609

* mute BasicRenormalizationIT.testDefaultRenormalization

tracked in #44613

* fix mute testDefaultRenormalization

* Increase busyWait timeout windows is slow

* Mute failure unconfigured node name

* mute x-pack internal cluster test windows

tracking #44610

* Mute JvmErgonomicsTests on windows

Tracking #44669

* mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot

Tracking #44671

* Mute NodeTests on Windows

Tracking #44256
2019-07-22 11:32:29 +03:00
David Kyle 0fc091f166
Enable XLint warnings for ML (#44346)
Removes the warning suppression -Xlint:-deprecation,-rawtypes,-serial,-try,-unchecked.
Many warnings were unchecked warnings in the test code often because of the use of mocks.
These are suppressed with @SuppressWarning
2019-07-18 09:33:37 +01:00
Tal Levy a5ad59451c
migrate more ML actions off of using Request suppliers (#44462) (#44529)
many classes still use the Streamable constructors of HandledTransportAction,
this commit moves more of those classes to the new Writeable constructors.

relates #34389.
2019-07-17 20:28:29 -07:00
Ryan Ernst 0755a13c9f
Convert AcknowledgedRequest to Writeable.Reader (#44412) (#44454)
This commit adds constructors to AcknolwedgedRequest subclasses to
implement Writeable.Reader, and ensures all future subclasses implement
the same.

relates #34389
2019-07-17 11:17:36 -07:00
Tal Levy 901310a826
[7.x] Migrate ML Actions to use writeable ActionType (#44302) (#44391)
* Migrate ML Actions to use writeable ActionType (#44302)

This commit converts all the StreamableResponseActionType
actions in the ML core module to be ActionType and leverage
the Writeable infrastructure.
2019-07-16 12:41:10 -07:00
Przemysław Witek 34bf6bcec0
Treat big changes in searchCount as significant and persist the document after such changes (#44413) (#44424) 2019-07-16 16:15:32 +02:00
Lee Hinman fb0461ac76
[7.x] Add Snapshot Lifecycle Management (#44382)
* Add Snapshot Lifecycle Management (#43934)

* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies defined schedule.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change changes it from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

Including the `"modified_date"` field when the `?human` field is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly trigged SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retriving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit add HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

*  Add Getting Started Guide for SLM  (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of which SLM policy in the metadata field of the Snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed

* Fix compilation for backport

* Fix get snapshot output parsing in test

* [DOCS] Add redirects for removed autogen anchors (#44380)

* Switch <tt>...</tt> in javadocs for {@code ...}
2019-07-16 07:37:13 -06:00
Przemysław Witek 3f3a3d3f2b
[7.x] Add DatafeedTimingStats.average_search_time_per_bucket_ms and TimingStats.total_bucket_processing_time_ms stats (#44125) (#44404) 2019-07-16 12:51:29 +02:00
Ryan Ernst 7e06888bae
Convert testclusters to use distro download plugin (#44253) (#44362)
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
2019-07-15 17:53:05 -07:00
Ryan Ernst 59658daef9
Separate streamable based master node actions (#44313)
This commit creates new base classes for master node actions whose
response types still implement Streamable. This simplifies both finding
remaining classes to convert, as well as creating new master node
actions that use Writeable for their responses.

relates #34389
2019-07-15 09:20:20 -07:00
Armin Braun d73e2f9c56
HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164) (#44324)
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes #33077
2019-07-15 10:21:54 +02:00
Ryan Ernst 1dcf53465c Reorder HandledTransportAction ctor args (#44291)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.

relates #34389
2019-07-12 13:45:09 -07:00
Przemysław Witek dd5f4ae00e
Update .ml-config mappings before indexing job, datafeed or df analytics config (#44216) (#44273) 2019-07-12 16:49:48 +02:00
Przemysław Witek 40d3c60d7a
Make testDatafeedTimingStats_DatafeedJobIdUpdated test easier to debug (#44206) (#44268) 2019-07-12 13:52:26 +02:00
Benjamin Trent c82d9c5b50
[ML] Adds support for regression.mean_squared_error to eval API (#44140) (#44218)
* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests
2019-07-11 09:22:52 -05:00
David Roberts 5886aefeed [ML] Wait for .ml-config primary before assigning persistent tasks (#44170)
Now that ML job configs are stored in an index rather than
cluster state, availability of the .ml-config index is very
important to the operation of ML.  When a cluster starts up
the ML persistent tasks will be considered for node
assignment very early on.  It is best in this case if
assignment is deferred until after the .ml-config index is
available.

The introduction of data frame analytics jobs has made this
problem worse, because anomaly detection jobs already waited
for the primary shards of the .ml-state, .ml-anomalies-shared
and .ml-meta indices to be available before doing node
assignment, and by coincidence this would probably lead to
the primary shards of .ml-config also being searchable.  But
data frame analytics jobs had no other index checks prior to
this change.

This fixes problem 2 of #44156
2019-07-11 11:43:39 +01:00
Igor Motov df2e1fb43e Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
2019-07-10 16:55:03 -04:00
Ryan Ernst c6efb9be2a Convert ReplicationResponse to Writeable (#43953)
This commit convers ReplicationResponse and all its subclasses to
support Writeable.Reader as a constructor.

relates #34389
2019-07-10 12:45:10 -07:00
David Roberts 07f53e39b3 [ML] Fix ML memory tracker lockup when inner step fails (#44158)
When the ML memory tracker is refreshed and a refresh is
already in progress the idea is that the second and
subsequent refresh requests receive the same response as
the currently in progress refresh.

There was a bug that if a refresh failed then the ML
memory tracker's view of whether a refresh was in progress
was not reset, leading to every subsequent request being
registered to receive a response that would never come.

This change makes the ML memory tracker pass on failures
as well as successes to all interested parties and reset
the list of interested parties so that further refresh
attempts are possible after either a success or failure.

This fixes problem 1 of #44156
2019-07-10 15:46:46 +01:00
Przemysław Witek 44781e415e
[7.x] [ML] Add DatafeedTimingStats to datafeed GetDatafeedStatsAction.Response (#43045) (#44118) 2019-07-10 11:51:44 +02:00
David Roberts 853ddb5a07 [ML] Fix custom timestamp override with dot-separated fractional seconds (#44127)
Custom timestamp overrides provided to the find_file_structure
endpoint produced an invalid Grok pattern if the fractional
seconds separator was a dot rather than a comma or colon.
This commit fixes that problem and adds tests for this sort
of timestamp override.

Fixes #44110
2019-07-10 10:27:08 +01:00
Armin Braun 9eac5ceb1b
Dry up inputstream to bytesreference (#43675) (#44094)
* Dry up Reading InputStream to BytesReference
* Dry up spots where we use the same pattern to get from an InputStream to a BytesReferences
2019-07-09 09:18:25 +02:00
Dimitris Athanasiou 30b20920b9
[7.x][ML] Report correct count for df-analytics get-stats API (#43969) (#43981)
The count should match the number of all df-analytics that
matched the id in the request. However, we set the count
to the number of df-analytics returned which was bound to the
`size` parameter. This commit fixes this by setting the count
to the count of the `get` response.
2019-07-05 10:28:57 +03:00
Benjamin Trent 36f7259737 [ML] Fix datafeed checks when a concrete remote index is present (#43923)
A bug was introduced in 6.6.0 when we added support for
rollup indices. Rollup caps does NOT support looking at
remote indices, consequently, since we always look up rollup
caps, the datafeed fails with an error if its config
includes a concrete remote index.  (When all remote indices
in a datafeed config are wildcards the problem did not
occur.)

The rollups feature does not support remote indices, so if
there is any remote index in a datafeed config (wildcarded
or not), we can skip the rollup cap checks.  This PR
implements that change.
2019-07-04 13:31:45 +01:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Alpar Torok 1b6109517a Mute failing test
Tracking in #43960
2019-07-04 12:13:02 +03:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00
Christoph Büscher fe3f9f0c6b Yet another `the the` cleanup (#43815) 2019-07-01 20:22:19 +02:00
Dimitris Athanasiou 8f49d01113
[7.x][ML] Rename df-analytics `_id_copy` to `ml__id_copy` (#43754) (#43783)
Renames `_id_copy` to `ml__id_copy` as field names starting with
underscore are deprecated. The new field name `ml__id_copy` was
chosen as an obscure enough field that users won't have in their data.
Otherwise, this field is only intented to be used by df-analytics.
2019-06-30 19:37:00 +03:00
David Roberts b599c68d23 [ML] Assert that a no-op job creates no results nor state (#43681)
If a job is opened and then closed and does nothing in
between then it should not persist any results or state
documents. This change adapts the no-op job test to
assert no results in addition to no state, and to log
any documents that cause this assertion to fail.

Relates elastic/ml-cpp#512
Relates #43680
2019-06-29 14:57:49 +01:00
Ryan Ernst 28ab77a023
Add StreamableResponseAction to aid in deprecation of Streamable (#43770)
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.

relates #34389
2019-06-28 21:40:00 -07:00
David Roberts 7951c63b91 [ML] Mark ml-cpp dependency as regularly changing (#43760)
Since #41817 was merged the ml-cpp zip file for any
given version has been cached indefinitely by Gradle.
This is problematic, particularly in the case of the
master branch where the version 8.0.0-SNAPSHOT will
be in use for more than a year.

This change tells Gradle that the ml-cpp zip file is
a "changing" dependency, and to check whether it has
changed every two hours. Two hours is a compromise
between checking on every build and annoying developers
with slow internet connections and checking rarely
causing bug fixes in the ml-cpp code to take a long
time to propagate through to elasticsearch PRs that
rely on them.
2019-06-28 21:21:18 +01:00
Dimitris Athanasiou cab879118d
[7.x][ML] Support multiple source indices for df-analytics (#43702) (#43731)
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exists an error will be
returned.

To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
2019-06-28 13:28:03 +03:00
Przemysław Witek 94f18da5df
Add version and create_time to data frame analytics config (#43683) (#43712) 2019-06-28 07:37:21 +02:00
Igor Motov 3607876a71 Geo: Makes coordinate validator in libs/geo plugable (#43657)
Moves coordinate validation from Geometry constructors into
parser.

Relates #43644
2019-06-27 19:53:41 -04:00
Przemysław Witek 68dbbd8793
Deduplicate two similar TimeUtils classes. (#43697)
* Deduplicate org.elasticsearch.xpack.core.dataframe.utils.TimeUtils and org.elasticsearch.xpack.core.ml.utils.time.TimeUtils into a common class: org.elasticsearch.xpack.core.common.time.TimeUtils.
* Add unit tests for parseTimeField and parseTimeFieldToInstant methods
2019-06-27 18:51:48 +02:00
David Roberts f39619d182 [ML] Don't write timing stats on no-op (#43680)
Similar to elastic/ml-cpp#512, if a job opens and closes and
does nothing in between we shouldn't write timing stats to
the results index.
2019-06-27 16:37:54 +01:00
Przemysław Witek ba518722a2
[7.x] [ML] Tag destination index with data frame metadata (#43567) (#43660) 2019-06-27 08:08:39 +02:00
David Roberts 31dc5b7d3a [TEST] Wait for replicas before stopping nodes in ML distributed test (#43622)
If we stop a node before replicas exist then
the test can fail because we lose a whole
index if we stop the node with the primary on.
2019-06-26 11:52:53 +01:00
David Roberts 558e323c89 [ML] Introduce a setting for the process connect timeout (#43234)
This change introduces a new setting,
xpack.ml.process_connect_timeout, to enable
the timeout for one of the external ML processes
to connect to the ES JVM to be increased.

The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
2019-06-26 09:22:04 +01:00
Dimitris Athanasiou 126c2fd2d5
[7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
Przemysław Witek b15e40ffad
Extract TimingStats-related functionality into TimingStatsReporter (#43371) (#43557) 2019-06-25 15:48:39 +02:00
David Roberts 9c285ddbab [ML] Improve message when native controller cannot connect (#43565)
The error message if the native controller failed to run
(for example due to running Elasticsearch on an unsupported
platform) was not easy to understand.  This change removes
pointless detail from the message and adds some hints about
likely causes.

Fixes #42341
2019-06-25 12:06:54 +01:00
Martijn van Groningen 101cf384ba
Replace Streamable w/ Writable in AcknowledgedResponse and subclasses (backport 7.x) (#43525)
This commit replaces usages of Streamable with Writeable for the
AcknowledgedResponse and its subclasses, plus associated actions.

Note that where possible response fields were made final and default
constructors were removed.

This is a large PR, but the change is mostly mechanical.

Relates to #34389
Backport of #43414
2019-06-24 13:47:37 +02:00
David Kyle 73221d2265 [ML] Resolve NetworkDisruptionIT (#43441)
After the network disruption a partition is created,
one side of which can form a cluster the other can't.
Ensure requests are sent to a node on the correct side
of the cluster
2019-06-21 10:24:02 +01:00
Jason Tedor 1f1a035def
Remove stale test logging annotations (#43403)
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
2019-06-19 22:58:22 -04:00
Lee Hinman d81ce9a647 Return 0 for negative "free" and "total" memory reported by the OS (#42725)
* Return 0 for negative "free" and "total" memory reported by the OS

We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.

In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.

Resolves #42157

* Fix test passing in invalid memory value

* Fix another test passing in invalid memory value

* Also change mem check in MachineLearning.machineMemoryFromStats

* Add background documentation for why we prevent negative return values

* Clarify comment a bit more
2019-06-19 10:35:48 -06:00
Przemysław Witek 86b58d9ff3
Rename AutoDetectResultProcessor* to AutodetectResultProcessor* for consistency with other classes where the spelling is "Autodetect" (#43359) (#43366) 2019-06-19 15:31:26 +02:00
Jason Tedor fa09113080 Remove trace logging for ML native multi-node tests
This trace logging looks like it was copy/pasted from another test,
where the logging in that test was only added to investigate a test
failure. This commit removes the trace logging.
2019-06-18 22:28:27 -04:00
Igor Motov 9f7d1ff2de Geo: Add coerce support to libs/geo WKT parser (#43273)
Adds support for coercing not closed polygons and ignoring Z value
to libs/geo WKT parser.

Closes #43173
2019-06-18 14:41:01 -04:00
Alpar Torok 94930d0e84 Testclusters: convert ml qa tests (#43229)
* Testclusters: convert ml qa tests

This PR converts the ML tests to use testclusters.
2019-06-18 11:55:11 +03:00
David Roberts da97325790 [ML] Speed up persistent task rechecks in ML failover tests (#43291)
The ML failover tests sometimes need to wait for jobs to be
assigned to new nodes following a node failure.  They wait
10 seconds for this to happen.  However, if the node that
failed was the master node and a new master was elected then
this 10 seconds might not be long enough as a refresh of the
memory stats will delay job assignment.  Once the memory
refresh completes the persistent task will be assigned when
the next cluster state update occurs or after the periodic
recheck interval, which defaults to 30 seconds.  Rather than
increase the length of the wait for assignment to 31 seconds,
this change decreases the periodic recheck interval to 1
second.

Fixes #43289
2019-06-18 09:19:20 +01:00
David Roberts 3effe264da [ML] Fix problem with lost shards in distributed failure test (#43153)
We were stopping a node in the cluster at a time when
the replica shards of the .ml-state index might not
have been created.  This change moves the wait for
green status to a point where the .ml-state index
exists.

Fixes #40546
Fixes #41742

Forward port of #43111
2019-06-17 09:28:56 +01:00
Przemysław Witek b2613a123d
[7.x] Report exponential_avg_bucket_processing_time which gives more weight to recent buckets (#43189) (#43263) 2019-06-17 08:58:26 +02:00
David Roberts 3928c624a3 [ML] Close sample stream in post_data endpoint (#43235)
A static code analysis revealed that we are not closing
the input stream in the post_data endpoint.  This
actually makes no difference in practice, as the
particular InputStream implementation in this case is
org.elasticsearch.common.bytes.BytesReferenceStreamInput
and its close() method is a no-op. However, it is good
practice to close the stream anyway.
2019-06-14 17:54:54 +01:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Jason Tedor 5bc3b7f741
Enable node roles to be pluggable (#43175)
This commit introduces the possibility for a plugin to introduce
additional node roles.
2019-06-13 15:15:48 -04:00
Ryan Ernst c3ce3f6891 Add native code info to ML info api (#43172)
The machine learning feature of xpack has native binaries with a
different commit id than the rest of code. It is currently exposed in
the xpack info api. This commit adds that commit information to the ML
info api, so that it may be removed from the info api.
2019-06-13 11:38:58 -07:00
David Roberts 43665183c2 [ML] Restrict detection of epoch timestamps in find_file_structure (#43188)
Previously 10 digit numbers were considered candidates to be
timestamps recorded as seconds since the epoch and 13 digit
numbers as timestamps recorded as milliseconds since the epoch.

However, this meant that we could detect these formats for
numbers that would represent times far in the future.  As an
example ISBN numbers starting with 9 were detected as milliseconds
since the epoch since they had 13 digits.

This change tweaks the logic for detecting such timestamps to
require that they begin with 1 or 2.  This means that numbers
that would represent times beyond about 2065 are no longer
detected as epoch timestamps.  (We can add 3 to the definition
as we get closer to the cutoff date.)
2019-06-13 13:15:41 +01:00
Dimitris Athanasiou b28e006f7c
[ML] Lock down extraction method when possible (#43104) (#43140) 2019-06-12 14:07:17 +03:00
Ryan Ernst 172cd4dbfa Remove description from xpack feature sets (#43065)
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
2019-06-11 09:22:58 -07:00
David Roberts d3136f99e6 [ML] Fix race condition when closing time checker (#43098)
The tests for the ML TimeoutChecker rely on threads
not being interrupted after the TimeoutChecker is
closed.  This change ensures this by making the
close() and setTimeoutExceeded() methods synchronized
so that the code inside them cannot execute
simultaneously.

Fixes #43097
2019-06-11 16:39:17 +01:00
Benjamin Trent 79052050bf
[ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds (#42969) (#43069)
* [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds

* only supporting doc_values for geo_point fields

* moving validation into GeoPointField ctor
2019-06-10 21:52:53 -05:00
Benjamin Trent eadfe05587
[ML] Changes slice specification to auto. See #42996 (#43039) (#43070) 2019-06-10 21:52:22 -05:00
Dimitris Athanasiou 76a92b49a8
[ML] Get resources action should be lenient when sort field is unmapped (#42991) (#43046)
Get resources action sorts on the resource id. When there are no resources at
all, then it is possible the index does not contain a mapping for the resource
id field. In that case, the search api fails by default.

This commit adjusts the search request to ignore unmapped fields.

Closes elastic/kibana#37870
2019-06-10 19:50:19 +03:00
Alan Woodward 8e23e4518a Move construction of custom analyzers into AnalysisRegistry (#42940)
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.

This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
2019-06-10 14:33:25 +01:00
David Turner 68339f90e9 Mute AutodetectMemoryLimitIT#testTooManyPartitions
Relates #43013
2019-06-10 09:20:36 +01:00
Henning Andersen dea935ac31
Reindex max_docs parameter name (#42942)
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.

The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.

Finally, all 3 endpoints now support max_docs in both body and URL.

Relates #24344
2019-06-07 12:16:36 +02:00
David Roberts 40c827a3b8 [ML] Close sample stream in find_file_structure endpoint (#42896)
A static code analysis revealed that we are not closing
the input stream in the find_file_structure endpoint.
This actually makes no difference in practice, as the
particular InputStream implementation in this case is
org.elasticsearch.common.bytes.BytesReferenceStreamInput
and its close() method is a no-op.  However, it is good
practice to close the stream anyway.
2019-06-06 11:03:45 +01:00
David Roberts b202a59f88 [ML] Add earliest and latest timestamps to field stats (#42890)
This change adds the earliest and latest timestamps into
the field stats for fields of type "date" in the output of
the ML find_file_structure endpoint.  This will enable the
cards for date fields in the file data visualizer in the UI
to be made to look more similar to the cards for date
fields in the index data visualizer in the UI.
2019-06-06 08:58:35 +01:00
David Roberts d5baedb789 [ML] Change dots in CSV column names to underscores (#42839)
Dots in the column names cause an error in the ingest
pipeline, as dots are special characters in ingest pipeline.
This PR changes dots into underscores in CSV field names
suggested by the ML find_file_structure endpoint _unless_
the field names are specifically overridden.  The reason for
allowing them in overrides is that fields that are not
mentioned in the ingest pipeline can contain dots.  But it's
more consistent that the default behaviour is to replace
them all.

Fixes elastic/kibana#26800
2019-06-05 11:28:33 +01:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
David Roberts b61202b0a8 [ML] Add a limit on line merging in find_file_structure (#42501)
When analysing a semi-structured text file the
find_file_structure endpoint merges lines to form
multi-line messages using the assumption that the
first line in each message contains the timestamp.
However, if the timestamp is misdetected then this
can lead to excessive numbers of lines being merged
to form massive messages.

This commit adds a line_merge_size_limit setting
(default 10000 characters) that halts the analysis
if a message bigger than this is created.  This
prevents significant CPU time being spent subsequently
trying to determine the internal structure of the
huge bogus messages.
2019-06-03 13:45:51 +01:00
David Roberts 10aca87389 [ML] Better detection of binary input in find_file_structure (#42707)
This change helps to prevent the situation where a binary
file uploaded to the find_file_structure endpoint is
detected as being text in the UTF-16 character set, and
then causes a large amount of CPU to be spent analysing
the bogus text structure.

The approach is to check the distribution of zero bytes
between odd and even file positions, on the grounds that
UTF-16BE or UTF16-LE would have a very skewed distribution.
2019-06-03 12:47:22 +01:00
David Roberts 48dc0dca57 [ML] Use map and filter instead of flatMap in find_file_structure (#42534)
Using map and filter avoids the garbage from all the
Stream.of calls that flatMap necessitated. Performance
is better when there are masses of fields.
2019-05-24 20:12:06 +01:00
David Roberts 34de68b007 [ML] Fix possible race condition when closing an opening job (#42506)
This change fixes a race condition that would result in an
in-memory data structure becoming out-of-sync with persistent
tasks in cluster state.

If repeated often enough this could result in it being
impossible to open any ML jobs on the affected node, as the
master node would think the node had capacity to open another
job but the chosen node would error during the open sequence
due to its in-memory data structure being full.

The race could be triggered by opening a job and then closing
it a tiny fraction of a second later.  It is unlikely a user
of the UI could open and close the job that fast, but a script
or program calling the REST API could.

The nasty thing is, from the externally observable states and
stats everything would appear to be fine - the fast open then
close sequence would appear to leave the job in the closed
state.  It's only later that the leftovers in the in-memory
data structure might build up and cause a problem.
2019-05-24 20:11:58 +01:00
David Roberts f472186b9f [ML] Improve file structure finder timestamp format determination (#41948)
This change contains a major refactoring of the timestamp
format determination code used by the ML find file structure
endpoint.

Previously timestamp format determination was done separately
for each piece of text supplied to the timestamp format finder.
This had the drawback that it was not possible to distinguish
dd/MM and MM/dd in the case where both numbers were 12 or less.
In order to do this sensibly it is best to look across all the
available timestamps and see if one of the numbers is greater
than 12 in any of them.  This necessitates making the timestamp
format finder an instantiable class that can accumulate evidence
over time.

Another problem with the previous approach was that it was only
possible to override the timestamp format to one of a limited
set of timestamp formats.  There was no way out if a file to be
analysed had a timestamp that was sane yet not in the supported
set.  This is now changed to allow any timestamp format that can
be parsed by a combination of these Java date/time formats:
yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss,
a, XX, XXX, zzz
Additionally S letter groups (fractional seconds) are supported
providing they occur after ss and separated from the ss by a dot,
comma or colon.  Spacing and punctuation is also permitted with
the exception of the question mark, newline and carriage return
characters, together with literal text enclosed in single quotes.

The full list of changes/improvements in this refactor is:

- Make TimestampFormatFinder an instantiable class
- Overrides must be specified in Java date/time format - Joda
  format is no longer accepted
- Joda timestamp formats in outputs are now derived from the
  determined or overridden Java timestamp formats, not stored
  separately
- Functionality for determining the "best" timestamp format in
  a set of lines has been moved from TextLogFileStructureFinder
  to TimestampFormatFinder, taking advantage of the fact that
  TimestampFormatFinder is now an instantiable class with state
- The functionality to quickly rule out some possible Grok
  patterns when looking for timestamp formats has been changed
  from using simple regular expressions to the much faster
  approach of using the Shift-And method of sub-string search,
  but using an "alphabet" consisting of just 1 (representing any
  digit) and 0 (representing non-digits)
- Timestamp format overrides are now much more flexible
- Timestamp format overrides that do not correspond to a built-in
  Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern
  whose definition is included within the date processor in the
  ingest pipeline
- Grok patterns that correspond to multiple Java date/time
  patterns are now handled better - the Grok pattern is accepted
  as matching broadly, and the required set of Java date/time
  patterns is built up considering all observed samples
- As a result of the more flexible acceptance of Grok patterns,
  when looking for the "best" timestamp in a set of lines
  timestamps are considered different if they are preceded by
  a different sequence of punctuation characters (to prevent
  timestamps far into some lines being considered similar to
  timestamps near the beginning of other lines)
- Out-of-the-box Grok patterns that are considered now include
  %{DATE} and %{DATESTAMP}, which have indeterminate day/month
  ordering
- The order of day/month in formats with indeterminate day/month
  order is determined by considering all observed samples (plus
  the server locale if the observed samples still do not suggest
  an ordering)

Relates #38086
Closes #35137
Closes #35132
2019-05-24 09:10:08 +01:00
Dimitris Athanasiou a6eb20ad35
[ML] Include node name when native controller cannot start process (#42225) (#42338)
This adds the node name where we fail to start a process via the native
controller to facilitate debugging as otherwise it might not be known
to which node the job was allocated.
2019-05-22 12:42:04 +03:00
Yannick Welsch 770d8e9e39 Remove usage of max_local_storage_nodes in test infrastructure (#41652)
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.

This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
2019-05-22 11:04:55 +02:00
Ed Savage d97f4d5e28 [ML][TEST] Fix limits in AutodetectMemoryLimitIT (#42279)
Re-enable muted tests and accommodate recent backend changes
that result in higher memory usage being reported for a job
at the start of its life-cycle
2019-05-21 18:44:47 +01:00
Dimitris Athanasiou a4e6fb4dd2
[ML] Fix logger declaration in ML plugins (#42222) (#42238)
This corrects what appears to have been a copy-paste error
where the logger for `MachineLearning` and `DataFrame` was wrongly
set to be that of `XPackPlugin`.
2019-05-21 18:03:24 +03:00
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
2019-05-20 12:07:29 -04:00
Ed Savage 840af87a74 [ML] Temporarily muting failing tests
Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to
settle before easing in required backend changes.

relates elastic/ml-cpp#486
relates #42086
2019-05-19 08:29:50 -04:00
Ed Savage a68b04e47b [ML] Improve hard_limit audit message (#42086)
Improve the hard_limit memory audit message by reporting how many bytes
over the configured memory limit the job was at the point of the last
allocation failure.

Previously the model memory usage was reported, however this was
inaccurate and hence of limited use -  primarily because the total
memory used by the model can decrease significantly after the models
status is changed to hard_limit but before the model size stats are
reported from autodetect to ES.

While this PR contains the changes to the format of the hard_limit audit
message it is dependent on modifications to the ml-cpp backend to
send additional data fields in the model size stats message. These
changes will follow in a subsequent PR. It is worth noting that this PR
must be merged prior to the ml-cpp one, to keep CI tests happy.
2019-05-17 17:40:08 -04:00
David Roberts 226df35d96 [ML] Improve message misformation error in file structure finder (#42175)
This change replaces the extremely unfriendly message
"Number of messages analyzed must be positive" in the
case where the sample lines were incorrectly grouped
into just one message to an error that more helpfully
explains the likely root cause of the problem.
2019-05-16 18:29:38 +01:00
Benjamin Trent bf5a40c754
[ML] relax set upgrade mode test to match what is guaranteed (#41958) (#41979)
* [ML] relax set upgrade mode test to match what is guaranteed

* removing unused import
2019-05-09 14:28:50 -05:00
Jason Tedor d7fd51a84e
Provide names for all artifact repositories (#41857)
This commit adds a name for each Maven and Ivy repository used in the
build.
2019-05-07 06:35:28 -04:00
Ryan Ernst 6fd8924c5a Switch run task to use real distro (#41590)
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
2019-05-06 12:34:07 -07:00
Jason Tedor f4da98ca3d
Use a proper repository for ml-cpp artifacts (#41817)
This switches the strategy used to download machine learning artifacts
from a manual download through S3 to using an Ivy repository on top of
S3. This gives us all the benefits of Gradle dependency resolution
including local caching.
2019-05-04 12:44:19 -04:00
Jason Tedor 241c4ef97a
Use https for artifact locations
This commit switches to using https for some artifact locations.
2019-05-03 16:15:48 -04:00
Benjamin Trent a92c06ae09
[ML] Refactor NativeStorageProvider to enable reuse (#41414) (#41746)
* [ML] Refactor NativeStorageProvider to enable reuse

Moves `NativeStorageProvider` as a machine learning component
so that it can be reused for other job types. Also, we now
pass the persistent task description as unique identifier which
avoids conflicts between jobs of different type but with same ids.

* Adding nativeStorageProvider as component

Since `TransportForecastJobAction` is expected to get injected a `NativeStorageProvider` class, we need to make sure that it is a constructed component, as it does not have a zero parametered, public ctor.
2019-05-02 09:46:22 -05:00
Jason Tedor 7f3ab4524f
Bump 7.x branch to version 7.2.0
This commit adds the 7.2.0 version constant to the 7.x branch, and bumps
BWC logic accordingly.
2019-05-01 13:38:57 -04:00
Tom Veasey b3f4533e1c [ML] Update for model selection change and disable temporarily (#41482) (#41682) 2019-04-30 15:47:54 -05:00
Yogesh Gaikwad c0d40ae4ca
Remove deprecated stashWithOrigin calls and use the alternative (#40847) (#41562)
This commit removes the deprecated `stashWithOrigin` and
modifies its usage to use the alternative.
2019-04-28 21:25:42 +10:00
Christoph Büscher 52495843cc [Docs] Fix common word repetitions (#39703) 2019-04-25 20:47:47 +02:00
Benjamin Trent 07d36fdb23
[ML] refactoring the ML plugin to use the common auditor code (#41419) (#41485) 2019-04-24 09:56:59 -05:00
Dimitris Athanasiou eb2295ac81
[7.1][ML] Refactor autodetect service into its own class (#41378) (#41409)
This also improves aims to improve the corresponding unit tests
with regard to readability and maintainability.
2019-04-22 17:42:13 +03:00
Zachary Tong 7e62ff2823 [Rollup] Validate timezones based on rules not string comparision (#36237)
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.

When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.

Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
2019-04-17 13:46:44 -04:00
Iana Bondarska e090176f17 [ML] Exclude analysis fields with core field names from anomaly results (#41093)
Added "_index", "_type", "_id" to list of reserved fields.

Closes #39406
2019-04-17 16:08:03 +01:00
David Kyle 116167df55 [ML] Write header to autodetect before it is visible to other calls (#41085) 2019-04-16 13:51:29 +01:00
David Roberts 3f00c29adb [ML] Allow xpack.ml.max_machine_memory_percent higher than 100% (#41193)
Values higher than 100% are now allowed to accommodate use
cases where swapping has been determined to be acceptable.
Anomaly detector jobs only use their full model memory
during background persistence, and this is deliberately
staggered, so with large numbers of jobs few will generally
be persisting state at the same time.  Settings higher than
available memory are only recommended for OEM type
situations where a wrapper tightly controls the types of
jobs that can be created, and each job alone is considerably
smaller than what each node can handle.
2019-04-15 14:37:46 +01:00
Benjamin Trent 05cf53934a
[ML] checking if p-tasks metadata is null before updating state (#41091) (#41123)
* [ML] checking if p-tasks metadata is null before updating state

* Adding test that validates fix

* removing debug println
2019-04-11 13:54:41 -05:00
Przemysław Witek f5014ace64
[ML] Add validation that rejects duplicate detectors in PutJobAction (#40967) (#41072)
* [ML] Add validation that rejects duplicate detectors in PutJobAction

Closes #39704

* Add YML integration test for duplicate detectors fix.

* Use "== false" comparison rather than "!" operator.

* Refine error message to sound more natural.

* Put job description in square brackets in the error message.

* Use the new validation in ValidateJobConfigAction.

* Exclude YML tests for new validation from permission tests.
2019-04-10 15:43:35 +02:00
Mark Vieira 1287c7d91f
[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) (#40993)
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)

This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.

(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)

* Fix forking JVM runner

* Don't bump shadow plugin version
2019-04-09 11:52:50 -07:00
Jason Tedor ebba9393c1
Fix unsafe publication of invalid license enforcer (#40985)
The invalid license enforced is exposed to the cluster state update
thread (via the license state listener) before the constructor has
finished. This violates the JLS for safe publication of an object, and
means there is a concurrency bug lurking here. This commit addresses
this by avoiding publication of the invalid license enforcer before the
constructor has returned.
2019-04-09 13:51:37 -04:00
David Roberts d16f86f7ab [ML] Add created_by info to usage stats (#40518)
This change adds information about which UI path
(if any) created ML anomaly detector jobs to the
stats returned by the _xpack/usage endpoint.

Counts for the following possibilities are expected:

* ml_module_apache_access
* ml_module_apm_transaction
* ml_module_auditbeat_process_docker
* ml_module_auditbeat_process_hosts
* ml_module_nginx_access
* ml_module_sample
* multi_metric_wizard
* population_wizard
* single_metric_wizard
* unknown

The "unknown" count is for jobs that do not have a
created_by setting in their custom_settings.

Closes #38403
2019-04-04 10:55:20 +01:00
Dimitris Athanasiou 65cca2ee6f
[7.x][ML] Scrolling datafeed should clear scroll contexts on error (#40773) (#40794)
Closes #40772
2019-04-04 12:28:06 +03:00
David Kyle 1354696db9
[ML] Data Frame HLRC Get Stats API (#40443) 2019-03-26 11:17:13 +00:00
Ed Savage c20ea9a2dd [ML][TEST] Fix failing test testPersistJobOnGracefulShutdown_givenTimeAdvancedAfterNoNewData (#40363)
Ensure that there is at least a 1s delay between the time that state
is persisted by each of the two jobs in the test.

Model snapshot IDs use the current time in epoch seconds to
distinguish themselves, hence snapshots will be overwritten
by another if it occurs in the same 1s window.

Closes #40347
2019-03-25 17:55:10 +00:00
David Turner 1265a15b75 Mute testPersistJobOnGracefulShutdown_givenTimeAdvancedAfterNoNewData 2019-03-22 08:46:51 +00:00
Ed Savage 23d5f7babf
[ML] Add integration tests to check persistence (#40272) (#40315)
Additional checks to exercise the behaviour of
persistence on graceful close of an anomaly job.

Related to elastic/ml-cpp#393
Backports #40272
2019-03-21 17:01:10 +00:00
David Roberts 64028f3d8f Mute JobResultsProviderIT.testMultipleSimultaneousJobCreations
Due to https://github.com/elastic/elasticsearch/issues/40134
2019-03-17 07:50:08 +00:00
David Roberts 8d01b11918 [ML] Fix race condition when creating multiple jobs (#40049)
If multiple jobs are created together and the anomaly
results index does not exist then some of the jobs could
fail to update the mappings of the results index. This
lead them to fail to write their results correctly later.

Although this scenario sounds rare, it is exactly what
happens if the user creates their first jobs using the
Nginx module in the ML UI.

This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.

Fixes #38785
2019-03-15 10:18:03 +00:00
David Kyle 78a9754318 Mute test NetworkDisruptionIT.testJobRelocation
Relates to #39858
2019-03-15 10:06:31 +00:00
Benjamin Trent 2016e23285
[ML] Refactor common utils out of ML plugin to XPack.Core (#39976) (#40009)
* [ML] Refactor common utils out of ML plugin to XPack.Core

* implementing GET filters with abstract transport

* removing added rest param

* adjusting how defaults can be supplied
2019-03-13 17:08:43 -05:00
Dimitris Athanasiou 79e414df86
[ML] Fix datafeed skipping first bucket after lookback when aggs are … (#39859) (#39958)
The problem here was that `DatafeedJob` was updating the last end time searched
based on the `now` even though when there are aggregations, the extactor will
only search up to the floor of `now` against the histogram interval.
This commit fixes the issue by using the end time as calculated by the extractor.

It also adds an integration test that uses aggregations. This test would fail
before this fix. Unfortunately the test is slow as we need to wait for the
datafeed to work in real time.

Closes #39842
2019-03-13 09:09:07 +02:00
David Kyle 48788269b0
[ML] Correct small inconsistencies in ml APIs spec and docs (#39907) 2019-03-11 14:02:50 +00:00
Benjamin Trent 4da04616c9
[ML] refactoring lazy query and agg parsing (#39776) (#39881)
* [ML] refactoring lazy query and agg parsing

* Clean up and addressing PR comments

* removing unnecessary try/catch block

* removing bad call to logger

* removing unused import

* fixing bwc test failure due to serialization and config migrator test

* fixing style issues

* Adjusting DafafeedUpdate class serialization

* Adding todo for refactor in v8

* Making query non-optional so it does not write a boolean byte
2019-03-10 14:54:02 -05:00
David Roberts 5f8f91c03b
[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736)
This change does the following:

1. Makes the per-node setting xpack.ml.max_open_jobs
   into a cluster-wide dynamic setting
2. Changes the job node selection to continue to use the
   per-node attributes storing the maximum number of open
   jobs if any node in the cluster is older than 7.1, and
   use the dynamic cluster-wide setting if all nodes are on
   7.1 or later
3. Changes the docs to reflect this
4. Changes the thread pools for native process communication
   from fixed size to scaling, to support the dynamic nature
   of xpack.ml.max_open_jobs
5. Renames the autodetect thread pool to the job comms
   thread pool to make clear that it will be used for other
   types of ML jobs (data frame analytics in particular)

Backport of #39320
2019-03-06 12:29:34 +00:00
Dimitris Athanasiou 5c023770d2 [ML] Disable security audit trail in native integ tests suite (#39683)
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.

Relates #39658
Closes #39575
2019-03-05 12:43:15 +02:00
David Kyle a58145f9e6
[ML] Transition to typeless (mapping) APIs (#39573)
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles 
case where the mapping type is either doc or _doc. This change removes
deprecated typed usages.
2019-03-04 13:52:05 +00:00
David Roberts 085ff38122 Mute DeleteExpiredDataIT.testDeleteExpiredData
Due to https://github.com/elastic/elasticsearch/issues/39575
2019-03-03 18:34:30 +00:00
Dimitris Athanasiou 8843832039 [ML] Shave off DeleteExpiredDataIT runtime (#39557)
This commit parallelizes some parts of the test
and its remove an unnecessary refresh call.
On my local machine it shaves off about 15 seconds
for a test execution time of ~64s (down from ~80s).
This test is still slow but progress over perfection.

Relates #37339
2019-03-01 19:10:00 +02:00
Dimitris Athanasiou 8122650a55 [ML] Add integration test for interim results after advancing bucket (#39447)
This is an integration test that captures the issue described in
elastic/ml-cpp#324
2019-02-28 11:12:08 +02:00
Mehran Koushkebaghi 1d0097b5e8 [ML] Refactoring scheduled event to store instant instead of zoned time zone (#39380)
The ScheduledEvent class has never preserved the time
zone so it makes more sense for it to store the start and
end time using Instant rather than ZonedDateTime.

Closes #38620
2019-02-27 09:27:04 +00:00
David Roberts 4f2bd238d2 [ML] Increase datafeed integration test timeout for slow machines (#39311)
The assertBusy() that waits the default 10 seconds for a
datafeed to complete very occasionally times out on slow
machines.  This commit increases the timeout to 60 seconds.
It will almost never actually take this long, but it's
better to have a timeout that will prevent time being
wasted looking at spurious test failures.
2019-02-22 15:35:32 +00:00
Dimitris Athanasiou 1c6818fe74
[ML] Improve DeleteExpiredDataIT failure message (#39298) (#39310)
This test failed once in a very long time with the assertion
that there is no document for the `non_existing_job` in the
state index. I could not see how that is possible and I cannot
reproduce. With this commit the failure message will reveal
some examples of the left behind docs which might shed a light
about what could go wrong.
2019-02-22 16:15:11 +02:00
Benjamin Trent 109b6451fd
ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (#38822) (#39119)
* ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs

* Addressing pr feedback
2019-02-19 12:45:56 -06:00
David Roberts 35e30b34f9 [ML] Stop the ML memory tracker before closing node (#39111)
The ML memory tracker does searches against ML results
and config indices.  These searches can be asynchronous,
and if they are running while the node is closing then
they can cause problems for other components.

This change adds a stop() method to the MlMemoryTracker
that waits for in-flight searches to complete.  Once
stop() has returned the MlMemoryTracker will not kick
off any new searches.

The MlLifeCycleService now calls MlMemoryTracker.stop()
before stopping stopping the node.

Fixes #37117
2019-02-19 15:12:40 +00:00
David Roberts bbcdea43c5 [ML] Allow stop unassigned datafeed and relax unset upgrade mode wait (#39034)
These two changes are interlinked.

Before this change unsetting ML upgrade mode would wait for all
datafeeds to be assigned and not waiting for their corresponding
jobs to initialise.  However, this could be inappropriate, if
there was a reason other that upgrade mode why one job was unable
to be assigned or slow to start up.  Unsetting of upgrade mode
would hang in this case.

This change relaxes the condition for considering upgrade mode to
be unset to simply that an assignment attempt has been made for
each ML persistent task that did not fail because upgrade mode
was enabled.  Thus after unsetting upgrade mode there is no
guarantee that every ML persistent task is assigned, just that
each is not unassigned due to upgrade mode.

In order to make setting upgrade mode work immediately after
unsetting upgrade mode it was then also necessary to make it
possible to stop a datafeed that was not assigned.  There was
no particularly good reason why this was not allowed in the past.
It is trivial to stop an unassigned datafeed because it just
involves removing the persistent task.
2019-02-19 14:07:10 +00:00
David Roberts b660d2cac6 [ML] More advanced post-test cleanup of ML indices (#39049)
The .ml-annotations index is created asynchronously when
some other ML index exists.  This can interfere with the
post-test index deletion, as the .ml-annotations index
can be created after all other indices have been deleted.

This change adds an ML specific post-test cleanup step
that runs before the main cleanup and:

1. Checks if any ML indices exist
2. If so, waits for the .ml-annotations index to exist
3. Deletes the other ML indices found in step 1.
4. Calls the super class cleanup

This means that by the time the main post-test index
cleanup code runs:

1. The only ML index it has to delete will be the
   .ml-annotations index
2. No other ML indices will exist that could trigger
   recreation of the .ml-annotations index

Fixes #38952
2019-02-18 14:16:03 +00:00
Martijn Laarman 9b4d96534b
Fix #38623 remove xpack namespace REST API (#38625) (#39036)
* Fix #38623 remove xpack namespace REST API

Except for xpack.usage and xpack.info API's, this moves the last remaining API's out of the xpack namespace

* rename xpack api's inside inside the files as well

* updated yaml tests references to xpack namespaces api's

* update callsApi calls in the IT subclasses

* make sure docs testing does not use xpack namespaced api's

* fix leftover xpack namespaced method names in docs/build.gradle

* found another leftover reference

(cherry picked from commit ccb5d934363c37506b76119ac050a254fa80b5e7)
2019-02-18 12:40:07 +01:00
Dimitris Athanasiou 21f76aba28
[ML] Extract base class for integ tests with native processes (#38850) (#38860) 2019-02-14 12:15:00 +02:00
Benjamin Trent d2ac05e249
ML allow aliased .ml-anomalies* index on PUT Job (#38821) (#38847) 2019-02-13 10:58:55 -06:00
Benjamin Trent 24a8ea06f5
ML: update set_upgrade_mode, add logging (#38372) (#38538)
* ML: update set_upgrade_mode, add logging

* Attempt to fix datafeed isolation

Also renamed a few methods/variables for clarity and added
some comments
2019-02-08 12:56:04 -06:00
Boaz Leskes 033ba725af
Remove support for internal versioning for concurrency control (#38254)
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:

> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document 

We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. 

This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.

Relates to #1078
2019-02-05 20:53:35 +01:00
David Turner f2dd5dd6eb
Remove DiscoveryPlugin#getDiscoveryTypes (#38414)
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.

Relates #38410
2019-02-05 17:42:24 +00:00
David Roberts 92bc681705
[ML] Report index unavailable instead of waiting for lazy node (#38423)
If a job cannot be assigned to a node because an index it
requires is unavailable and there are lazy ML nodes then
index unavailable should be reported as the assignment
explanation rather than waiting for a lazy ML node.
2019-02-05 16:10:00 +00:00
Yogesh Gaikwad fe36861ada
Add support for API keys to access Elasticsearch (#38291)
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to 
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.

This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.

The API keys:-
- by default do not have an expiration but expiration can be
  configured where the API keys need to be expired after a
  certain amount of time.
- when generated will keep authentication information of the user that
   generated them.
- can be defined with a role describing the privileges for accessing
   Elasticsearch and will be limited by the role of the user that
   generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
  before being deleted. The expired API keys remover task handles this.

Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`

The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the 
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```

Closes #34383
2019-02-05 14:21:57 +11:00
David Roberts fb6a176caf
[ML] Add explanation so far to file structure finder exceptions (#38191)
The explanation so far can be invaluable for troubleshooting
as incorrect decisions made early on in the structure analysis
can result in seemingly crazy decisions or timeouts later on.

Relates elastic/kibana#29821
2019-02-04 14:32:35 +00:00
Boaz Leskes ff13a43144
Move ML Optimistic Concurrency Control to Seq No (#38278)
This commit moves the usage of internal versioning for CAS operations to use sequence numbers and primary terms

Relates to #36148
Relates to #10708
2019-02-04 10:41:08 +01:00
David Turner 1d82a6d9f9
Deprecate unused Zen1 settings (#38289)
Today the following settings in the `discovery.zen` namespace are still used:

- `discovery.zen.no_master_block`
- `discovery.zen.hosts_provider`
- `discovery.zen.ping.unicast.concurrent_connects`
- `discovery.zen.ping.unicast.hosts.resolve_timeout`
- `discovery.zen.ping.unicast.hosts`

This commit deprecates all other settings in this namespace so that they can be
removed in the next major version.
2019-02-04 08:52:08 +00:00
Benjamin Trent 5db305023d
ML: Fix error race condition on stop _all datafeeds and close _all jobs (#38113)
* ML: Ignore when task is not found for _all

* Addressing PR comments

* Update TransportStopDatafeedAction.java
2019-02-01 11:16:35 -06:00
David Roberts 1fa413a16d
[ML] Remove "8" prefixes from file structure finder timestamp formats (#38016)
In 7.x Java timestamp formats are the default timestamp format and
there is no need to prefix them with "8".  (The "8" prefix was used
in 6.7 to distinguish Java timestamp formats from Joda timestamp
formats.)

This change removes the "8" prefixes from timestamp formats in the
output of the ML file structure finder.
2019-02-01 15:36:04 +00:00
Benjamin Trent be381b4525
ML: better handle task state race condition (#38040) 2019-01-31 11:07:54 -06:00
Henning Andersen 68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
Benjamin Trent 9782aaa1b8
ML: Add reason field in JobTaskState (#38029)
* ML: adding reason to job failure status

* marking reason as nullable

* Update AutodetectProcessManager.java
2019-01-30 11:56:24 -06:00
Benjamin Trent 8280a20664
ML: Add upgrade mode docs, hlrc, and fix bug (#37942)
* ML: Add upgrade mode docs, hlrc, and fix bug

* [DOCS] Fixes build error and edits text

* adjusting docs

* Update docs/reference/ml/apis/set-upgrade-mode.asciidoc

Co-Authored-By: benwtrent <ben.w.trent@gmail.com>

* Update set-upgrade-mode.asciidoc

* Update set-upgrade-mode.asciidoc
2019-01-30 06:51:11 -06:00
Adrien Grand c8af0f4bfa
Use mappings to format doc-value fields by default. (#30831)
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.

This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
2019-01-30 10:31:51 +01:00
Benjamin Trent 34d61d3231
ML: ignore unknown fields for JobTaskState (#37982) 2019-01-29 12:51:34 -06:00
David Kyle 6d1693ff49 [ML] Prevent submit after autodetect worker is stopped (#37700)
Runnables can be submitted to
AutodetectProcessManager.AutodetectWorkerExecutorService
without error after it has been shutdown which can lead
to requests timing out as their handlers are never called
by the terminated executor.

This change throws an EsRejectedExecutionException if a
runnable is submitted after after the shutdown and calls
AbstractRunnable.onRejection on any tasks not run.

Closes #37108
2019-01-29 15:09:40 +00:00
Henrique Gonçalves eceb3185c7 [ML] Make GetJobStats work with arbitrary wildcards and groups (#36683)
The /_ml/anomaly_detectors/{job}/_stats endpoint now
works correctly when {job} is a wildcard or job group.

Closes #34745
2019-01-29 09:06:50 +00:00
Dimitris Athanasiou ebe9c95230
[ML] Audit all errors during job deletion (#37933)
This commit moves the auditing of job deletion related errors
to the final listener in the job delete action. This ensures
any error that occurs during job deletion is audited.
2019-01-29 10:23:50 +02:00
Benjamin Trent 7e4c0e6991
ML: Adds set_upgrade_mode API endpoint (#37837)
* ML: Add MlMetadata.upgrade_mode and API

* Adding tests

* Adding wait conditionals for the upgrade_mode call to return

* Adding tests

* adjusting format and tests

* Adjusting wait conditions for api return and msgs

* adjusting doc tests

* adding upgrade mode tests to black list
2019-01-28 09:07:30 -06:00
David Kyle c0409fb9f0
[ML] Marginal gains in slow multi node QA tests (#37825)
Move 2 tests that are simple rest tests and out of the QA suite and cut the number
of post data calls in ForecastIT
2019-01-28 10:00:59 +00:00
David Roberts 57d321ed5f
[ML] Tighten up use of aliases rather than concrete indices (#37874)
We have read and write aliases for the ML results indices.  However,
the job still had methods that purported to reliably return the name
of the concrete results index being used by the job.  After reindexing
prior to upgrade to 7.x this will be wrong, so the method has been
renamed and the comments made more explicit to say the returned index
name may not be the actual concrete index name for the lifetime of the
job.  Additionally, the selection of indices when deleting the job
has been changed so that it works regardless of concrete index names.

All these changes are nice-to-have for 6.7 and 7.0, but will become
critical if we add rolling results indices in the 7.x release stream
as 6.7 and 7.0 nodes may have to operate in a mixed version cluster
that includes a version that can roll results indices.
2019-01-28 09:38:46 +00:00
David Roberts f2c0c26d15
[ML] Adjust structure finder for Joda to Java time migration (#37306)
The ML file structure finder has always reported both Joda
and Java time format strings.  This change makes the Java time
format strings the ones that are incorporated into mappings
and ingest pipeline definitions.

The BWC syntax of prepending "8" to these formats is used.
This will need to be removed once Java time format strings
become the default in Elasticsearch.

This commit also removes direct imports of Joda classes in the
structure finder unit tests.  Instead the core Joda BWC class
is used.
2019-01-26 20:19:57 +00:00
Benjamin Trent 9e932f4869
ML: removing unnecessary upgrade code (#37879) 2019-01-25 13:57:41 -06:00
Christoph Büscher b4b4cd6ebd
Clean codebase from empty statements (#37822)
* Remove empty statements

There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.

* Change test, slightly more verbose but less confusing
2019-01-25 14:23:02 +01:00
David Roberts deafce1acd
[ML] No need to add state doc mapping on job open in 7.x (#37759)
When upgrading from 5.4 to 5.5 to 6.7 (inclusive) it was
necessary to ensure there was a mapping for type "doc" on
the ML state index before opening a job.  This was because
5.4 created a multi-type ML state index.

In version 7.x we can be sure that any such 5.4 index is no
longer in use.  It would have had to be reindexed into the
6.x index format prior to the upgrade to version 7.x.
2019-01-25 13:15:35 +00:00
Jim Ferenczi 787acb14b9
Track total hits up to 10,000 by default (#37466)
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.

Closes #33028
2019-01-25 13:45:39 +01:00
David Kyle e1226f69b7
[ML] Increase close job timeout and lower the max number (#37770) 2019-01-24 09:18:48 +00:00
Lee Hinman 427bc7f940
Use ILM for Watcher history deletion (#37443)
* Use ILM for Watcher history deletion

This commit adds an index lifecycle policy for the `.watch-history-*` indices.
This policy is automatically used for all new watch history indices.

This does not yet remove the automatic cleanup that the monitoring plugin does
for the .watch-history indices, and it does not touch the
`xpack.watcher.history.cleaner_service.enabled` setting.

Relates to #32041
2019-01-23 10:18:08 -07:00
Alexander Reelsen daa2ec8a60
Switch mapping/aggregations over to java time (#36363)
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.

The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.

Relates #27330
2019-01-23 10:40:05 +01:00
David Roberts 7b3dd3022d
[ML] Update ML results mappings on process start (#37706)
This change moves the update to the results index mappings
from the open job action to the code that starts the
autodetect process.

When a rolling upgrade is performed we need to update the
mappings for already-open jobs that are reassigned from an
old version node to a new version node, but the open job
action is not called in this case.

Closes #37607
2019-01-23 09:37:37 +00:00
Ryan Ernst fc99eb3e65
Add cache cleaning task for ML snapshot (#37505)
The ML subproject of xpack has a cache for the cpp artifact snapshots
which is checked on each build. The cache is outside of the build dir so
that it is not wiped on a typical clean, as the artifacts can be large
and do not change often. This commit adds a cleanCache task which will
wipe the cache dir, as over time the size of the directory can become
bloated.
2019-01-19 16:16:58 -08:00
Benjamin Trent 12cdf1cba4
ML: Add support for single bucket aggs in Datafeeds (#37544)
Single bucket aggs are now supported in datafeed aggregation configurations.
2019-01-18 15:08:53 -06:00
Benjamin Trent 5384162a42
ML: creating ML State write alias and pointing writes there (#37483)
* ML: creating ML State write alias and pointing writes there

* Moving alias check to openJob method

* adjusting concrete index lookup for ml-state
2019-01-18 14:32:34 -06:00
Yannick Welsch 6d64a2a901
Propagate Errors in executors to uncaught exception handler (#36137)
This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the
uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This
commit also makes it the property of the executor, not the runnable, to ensure this property. A big
part of this commit also consists of vastly improving the test coverage in this area.
2019-01-17 17:46:35 +01:00
David Kyle 75410dc632 [Ml] Prevent config snapshot failure blocking migration (#37493) 2019-01-16 11:51:15 +00:00
Hendrik Muhs 15d1b904a1
[ML] log minimum diskspace setting if forecast fails due to insufficient d… (#37486)
log minimum disk space setting if forecast fails due to insufficient disk space
2019-01-16 08:10:13 +01:00
David Kyle bea46f7b52
[ML] Migrate unallocated jobs and datafeeds (#37430)
Migrate ml job and datafeed config of open jobs and update
the parameters of the persistent tasks as they become unallocated
during a rolling upgrade. Block allocation of ml persistent tasks
until the configs are migrated.
2019-01-15 18:21:39 +00:00
David Kyle 7c11b05c28
[ML] Remove unused code from the JIndex project (#37477) 2019-01-15 17:19:58 +00:00