Commit Graph

21 Commits

Author SHA1 Message Date
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Benjamin Trent ba7b677618
[ML] better handle empty results when evaluating regression (#45745) (#45759)
* [ML] better handle empty results when evaluating regression

* adding new failure test to ml_security black list

* fixing equality check for regression results
2019-08-20 17:37:04 -05:00
Przemysław Witek 1aed388a24
Add view_index_metadata to roles.yml and remove as many df analytics test cases from build.gradle blacklist as possible. (#45451) (#45465) 2019-08-13 08:31:58 +02:00
Dimitris Athanasiou 27497ff75f
[7.x][ML] Add regression analysis to DF analytics (#45292) (#45388)
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.

This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
2019-08-09 19:31:13 +03:00
Dimitris Athanasiou 8a6675b994
[7.x][ML] Check dest index is empty when starting DF analytics (#45094) (#45112)
If one tries to start a DF analytics job that has already run,
the result will be that the task will fail after reindexing the
dest index from the source index. The results of the prior run
will be gone and the task state is not properly set to failed
with the failure reason.

This commit improves the behavior in this scenario. First, we
set the task state to `failed` in a set of failures that were
missed. Second, a validation is added that if the destination
index exists, it must be empty.
2019-08-02 00:19:48 +03:00
Ryan Ernst 7e06888bae
Convert testclusters to use distro download plugin (#44253) (#44362)
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
2019-07-15 17:53:05 -07:00
Benjamin Trent c82d9c5b50
[ML] Adds support for regression.mean_squared_error to eval API (#44140) (#44218)
* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests
2019-07-11 09:22:52 -05:00
Dimitris Athanasiou cab879118d
[7.x][ML] Support multiple source indices for df-analytics (#43702) (#43731)
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exists an error will be
returned.

To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
2019-06-28 13:28:03 +03:00
Przemysław Witek 94f18da5df
Add version and create_time to data frame analytics config (#43683) (#43712) 2019-06-28 07:37:21 +02:00
Dimitris Athanasiou 126c2fd2d5
[7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
Alpar Torok 94930d0e84 Testclusters: convert ml qa tests (#43229)
* Testclusters: convert ml qa tests

This PR converts the ML tests to use testclusters.
2019-06-18 11:55:11 +03:00
Dimitris Athanasiou 76a92b49a8
[ML] Get resources action should be lenient when sort field is unmapped (#42991) (#43046)
Get resources action sorts on the resource id. When there are no resources at
all, then it is possible the index does not contain a mapping for the resource
id field. In that case, the search api fails by default.

This commit adjusts the search request to ignore unmapped fields.

Closes elastic/kibana#37870
2019-06-10 19:50:19 +03:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
Przemysław Witek f5014ace64
[ML] Add validation that rejects duplicate detectors in PutJobAction (#40967) (#41072)
* [ML] Add validation that rejects duplicate detectors in PutJobAction

Closes #39704

* Add YML integration test for duplicate detectors fix.

* Use "== false" comparison rather than "!" operator.

* Refine error message to sound more natural.

* Put job description in square brackets in the error message.

* Use the new validation in ValidateJobConfigAction.

* Exclude YML tests for new validation from permission tests.
2019-04-10 15:43:35 +02:00
Benjamin Trent 7e4c0e6991
ML: Adds set_upgrade_mode API endpoint (#37837)
* ML: Add MlMetadata.upgrade_mode and API

* Adding tests

* Adding wait conditionals for the upgrade_mode call to return

* Adding tests

* adjusting format and tests

* Adjusting wait conditions for api return and msgs

* adjusting doc tests

* adding upgrade mode tests to black list
2019-01-28 09:07:30 -06:00
Benjamin Trent 9e932f4869
ML: removing unnecessary upgrade code (#37879) 2019-01-25 13:57:41 -06:00
Benjamin Trent df3b58cb04
ML: add migrate anomalies assistant (#36643)
* ML: add migrate anomalies assistant

* adjusting failure handling for reindex

* Fixing request and tests

* Adding tests to blacklist

* adjusting test

* test fix: posting data directly to the job instead of relying on datafeed

* adjusting API usage

* adding Todos and adjusting endpoint

* Adding types to reindexRequest

* removing unreliable "live" data test

* adding index refresh to test

* adding index refresh to test

* adding index refresh to yaml test

* fixing bad exists call

* removing todo

* Addressing remove comments

* Adjusting rest endpoint name

* making service have its own logger

* adjusting validity check for newindex names

* fixing typos

* fixing renaming
2019-01-09 14:25:35 -06:00
Benjamin Trent 767d8e0801
[ML] Delete forecast API (#31134) (#33218)
* Delete forecast API (#31134)
2018-09-03 19:06:18 -05:00
Nik Everett 2c81d7f77e
Build: Rework shadow plugin configuration (#32409)
This reworks how we configure the `shadow` plugin in the build. The major
change is that we no longer bundle dependencies in the `compile` configuration,
instead we bundle dependencies in the new `bundle` configuration. This feels
more right because it is a little more "opt in" rather than "opt out" and the
name of the `bundle` configuration is a little more obvious.

As an neat side effect of this, the `runtimeElements` configuration used when
one project depends on another now contains exactly the dependencies needed
to run the project so you no longer need to reference projects that use the
shadow plugin like this:

```
testCompile project(path: ':client:rest-high-level', configuration: 'shadow')
```

You can instead use the much more normal:

```
testCompile "org.elasticsearch.client:elasticsearch-rest-high-level-client:${version}"
```
2018-08-21 20:03:28 -04:00
Jason Tedor 28d12b05b7
Move ML tests to be sub-projects of ML (#33026)
This commit moves the ML QA tests to be a sub-project of ML. The purpose
of this refactoring is to enable ML developers to run
:x-pack:plugin:ml:check and run the vast majority of a ML tests with a
single command (this still does not contain the ML REST tests, nor the
upgrade tests). This simplifies local development for faster iteration.
2018-08-21 12:23:21 -04:00