1849 Commits

Author SHA1 Message Date
Tom Veasey
690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification ()
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes .
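
As a rough illustration of how the two objectives can disagree on imbalanced data, here is a minimal Python sketch for the binary case. It is not the ml-cpp implementation: the function names, the brute-force threshold search, the objective strings, and the toy data are all assumptions made for this example.

```python
# Hypothetical sketch: pick a decision threshold under two different objectives.
# Predict class 1 when score >= threshold, otherwise class 0.

def recalls(scores, labels, threshold):
    """Return (recall of class 1, recall of class 0) at the given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives


def accuracy(scores, labels, threshold):
    """Fraction of examples labelled correctly at the given threshold."""
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return correct / len(labels)


def pick_threshold(scores, labels, objective):
    """Brute-force search over candidate thresholds (assumed strategy)."""
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    if objective == "maximize_accuracy":
        return max(candidates, key=lambda t: accuracy(scores, labels, t))
    if objective == "maximize_minimum_recall":
        return max(candidates, key=lambda t: min(recalls(scores, labels, t)))
    raise ValueError(f"unknown objective: {objective}")


# Toy, imbalanced data: eight examples of class 0, two of class 1.
SCORES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.45, 0.50, 0.40, 0.70]
LABELS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

On this toy data, maximising accuracy picks a high threshold that misses one of the two minority-class examples, while maximising the minimum recall picks a lower threshold that trades some accuracy for balanced per-class recall.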
2020-03-13 17:35:51 +00:00
Tim Vernum
a8677499d7
[Backport] Add support for secondary authentication ()
This change makes it possible to send secondary authentication
credentials to select endpoints that need to perform a single action
in the context of two users.

Typically this need arises when a server process needs to call an
endpoint that users should not (or might not) have direct access to,
but some part of that action must be performed using the logged-in
user's identity.

Backport of: 
2020-03-13 16:30:20 +11:00
Jay Modi
af36665b08
Deprecate the logstash enabled setting ()
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.

Backport of 
2020-03-12 10:18:39 -06:00
Yannick Welsch
48124807d5
Fix SourceOnlySnapshotIT ()
The tests in this class had been failing for a while, but went unnoticed because CI did not run them (see ).

The reason the tests fail is that the can-match phase is smarter now, and filters out access to a non-existing field.

Closes 
2020-03-12 14:15:03 +01:00
Benjamin Trent
89668c5ea0
[ML][Inference] adds new default_field_map field to trained models () ()
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
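
A minimal Python sketch of the idea, assuming the map goes from the document field name to the field name the model was trained on. The helper name and the direction of the mapping are assumptions for illustration, not the actual inference code.

```python
# Hypothetical sketch: make a document usable by a model trained on `foo.keyword`
# when the document's `_source` only contains `foo`.

def apply_field_map(source, field_map):
    """Return a copy of `source` with mapped fields added under the model's names."""
    features = dict(source)
    for doc_field, model_field in field_map.items():
        if doc_field in source:
            features[model_field] = source[doc_field]
    return features


# Assumed shape of a default field map: document field -> model field.
default_field_map = {"foo": "foo.keyword"}
doc = {"foo": "bar", "count": 3}
model_input = apply_field_map(doc, default_field_map)
```

Here `model_input` carries the value of `foo` under `foo.keyword` as well, so the model finds the feature it was trained on.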
2020-03-11 13:49:39 -04:00
Przemysław Witek
8c4c19d310
Perform evaluation in multiple steps when necessary () () 2020-03-11 15:36:38 +01:00
Dimitris Athanasiou
cc7751eb16
[7.x][ML] Add ILM policy to ml stats indices () ()
Adds a size based ILM policy to automatically
rollover ml stats indices.

Backport of 
2020-03-11 13:01:34 +02:00
Dimitris Athanasiou
0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees () ()
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of 
2020-03-11 12:45:27 +02:00
David Roberts
532a720e1b
[ML] Skeleton estimate_model_memory endpoint for anomaly detection ()
This is a partial implementation of an endpoint for anomaly
detector model memory estimation.

It is not complete, lacking docs, HLRC and sensible numbers
for many anomaly detector configurations.  These will be
added in a followup PR in time for 7.7 feature freeze.

A skeleton endpoint is useful now because it allows work on
the UI side of the change to commence.  The skeleton endpoint
handles the same cases that the old UI code used to handle,
and produces very similar estimates for these cases.

Backport of 
2020-03-11 10:20:00 +00:00
Jake Landis
2ab502afc4
[7.x] Remove dead 'beats' code () () 2020-03-10 20:57:29 -05:00
Przemko Robakowski
847ac9c7d7
Fix null config in SnapshotLifecyclePolicy.toRequest () ()
This avoids NPE when executing SLM policy when no config was provided.

Related to 

Closes 

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-03-10 20:44:30 +01:00
Przemysław Witek
d54d7f2be0
[7.x] Implement ILM policy for .ml-state* indices () () 2020-03-10 14:24:18 +01:00
Hendrik Muhs
696aa4ddaf
[7.x][Transform] add support for script in group_by () ()
add the possibility to base the group_by on the output of a script.

closes 
backport 
2020-03-10 11:12:58 +01:00
Cauê Marcondes
b68d7b1c33
giving kibana user privileges to create custom link index () () 2020-03-10 09:50:38 +01:00
Henning Andersen
a4d481f2bb
ILM Freeze step retry when not acknowledged ()
A freeze operation can partially fail in multiple places, including the
close verification step. This left the index in an unfrozen but
partially closed state. Now throw an exception to retry the freeze step
instead.
2020-03-10 08:03:39 +01:00
Jay Modi
a81460dbf5
Make watch history indices hidden ()
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Relates 
Backport of 
2020-03-06 09:47:03 -07:00
Dimitris Athanasiou
9abf537527
[7.x][ML] Improve DF analytics audits and logging () ()
Adds audits for when the job starts reindexing, loading data,
analyzing, writing results. Also adds some info logging.

Backport of 
2020-03-06 13:47:27 +02:00
Nik Everett
609c61f75c
Formalize usage stats for analytics (backport of ) ()
This moves the usage statistics gathering from the `AnalyticsPlugin`
into an `AnalyticsUsage`, removing the static state. It also checks the
license level when parsing all analytics aggregations. This is how we
were checking them before but we did it in an easy to forget way. This
way is slightly simpler, I think.
2020-03-04 10:29:11 -05:00
Adrien Grand
cb868d2f5e
Introduce a constant_keyword field. () ()
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
 1. accepting values in `_source` when they are equal to the value configured
    in mappings, but rejecting mapping updates
 2. rejecting values in `_source` but then allowing updates to the value that
    is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.
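
The behaviour of option 1 can be sketched in Python as follows. The class and method names are hypothetical and only model the rules described above; they are not the mapper implementation.

```python
# Hypothetical model of a constant_keyword field under option 1:
# values in _source are accepted only if they equal the configured value,
# and the configured value cannot be changed by a mapping update.

class ConstantKeywordField:
    def __init__(self, value):
        self.value = value

    def check_source_value(self, value):
        """Accept a value from _source only if it equals the configured value."""
        if value != self.value:
            raise ValueError(f"expected [{self.value}] but got [{value}]")

    def update_mapping(self, new_value):
        """Reject mapping updates that would change the configured value."""
        if new_value != self.value:
            raise ValueError("cannot change the value of a constant_keyword field")

    def rewrite_term_query(self, term):
        """Decide at rewrite time whether all or none of the documents match."""
        return "match_all" if term == self.value else "match_none"


field = ConstantKeywordField("production")
field.check_source_value("production")  # accepted: same as the mapping
```

Because every document shares one value, a term query never needs to inspect documents: it rewrites to either "match all" or "match none".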

Backport of 
2020-03-03 16:01:47 +01:00
Yang Wang
70814daa86
Allow _rollup_search with read privilege () ()
Currently _rollup_search requires manage privilege to access. It should really be
a read only operation. This PR changes the requirement to be read indices privilege.

Resolves: 
2020-03-03 22:29:54 +11:00
Hendrik Muhs
a328a8eaf1
[7.x][Transform] implement node.transform to control where to… ()
implement transform node attributes to disable transform on certain nodes and
test which nodes are allowed to do remote connections

closes 
closes 
closes 

backport 
2020-03-02 16:10:57 +01:00
Martijn van Groningen
d102158e6f
Improve closing mock webserver when failed to start ()
Fix NPE when closing a webserver that hasn't started correctly.

This can happen when ssl context isn't initialized. The server instance is then never set,
which causes an NPE that masks the actual failure.

Example stacktrace that would mask an actual failure:

```
java.lang.NullPointerException
	at org.elasticsearch.test.http.MockWebServer.close(MockWebServer.java:271)
	at org.elasticsearch.xpack.watcher.test.integration.HttpSecretsIntegrationTests.cleanup(HttpSecretsIntegrationTests.java:70)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
```
2020-03-02 07:19:08 +01:00
Dimitris Athanasiou
85b4e45093
[7.x][ML] Parse and report memory usage for DF Analytics () ()
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

Backport of  and 
2020-02-29 13:03:40 +02:00
Yang Wang
82553524af
Respect runas realm for ApiKey security operations () ()
When user A runs as user B and performs any API key related operations,
user B's realm should always be used to associate with the API key.
Currently user A's realm is used when getting or invalidating API keys
and owner=true. The PR is to fix this bug.

resolves: 
2020-02-28 10:53:52 +11:00
Benjamin Trent
19a6c5d980
[7.x] [ML][Inference] Add support for multi-value leaves to the tree model () ()
* [ML][Inference] Add support for multi-value leaves to the tree model ()

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
2020-02-27 14:05:28 -05:00
Benjamin Trent
eac38e9847
[ML] Add indices_options to datafeed config and update () ()
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.

closes https://github.com/elastic/elasticsearch/issues/48056
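
A sketch of what a datafeed configuration body with `indices_options` might look like, built as a Python dict. The job ID and index patterns are made up, and the option keys are the usual Elasticsearch indices options as assumed here; check the datafeed API documentation for the authoritative list.

```python
import json

# Hypothetical datafeed configuration body (job ID and index patterns are invented).
datafeed_config = {
    "job_id": "my-job",
    "indices": ["logs-*", "metrics-*"],
    "indices_options": {
        "expand_wildcards": ["open"],
        # Tolerate index patterns that do not match anything yet.
        "ignore_unavailable": True,
        "allow_no_indices": True,
        # Assumed to be required for reading from frozen indices.
        "ignore_throttled": False,
    },
}

body = json.dumps(datafeed_config, indent=2)
```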
2020-02-27 13:43:25 -05:00
Yang Wang
14c21aedd2
Simplify ml license checking with XpackLicenseState internals () ()
This change replaces the TrainedModelConfig#isAvailableWithLicense method with calls to
XPackLicenseState#isAllowedByLicense.

Please note there are subtle changes to the code logic. But they are the right changes:
* Instead of Platinum license, Enterprise license now guarantees availability.
* No explicit check when the license requirement is basic. Since basic license is always available, this check is unnecessary.
* Trial license is always allowed.
2020-02-27 14:14:16 +11:00
Yang Wang
f5c4e92558
Refactor license checking () ()
Improve code reuse and readability. Add convenience checking method which
covers most use cases without having to pass many boolean arguments.
2020-02-27 13:04:19 +11:00
Adrien Grand
1807f86751
Generalize how queries on _index are handled at rewrite time ()
Generalize how queries on `_index` are handled at rewrite time ()

Since this change refactors rewrites, I also took it as an opportunity to address : instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes 
2020-02-26 15:37:43 +01:00
Tim Brooks
6669e53f08
Do not lock on reads of XPackLicenseState ()
XPackLicenseState reads are necessary to validate a number of cluster
operations. These reads occasionally occur on transport threads, which
should not be blocked. Currently we synchronize when reading. However,
this is unnecessary as only a single piece of state is updateable. This
commit makes this state volatile and removes the locking.
2020-02-25 15:38:35 -07:00
David Kyle
044a4e127a
[ML] Add reason to DataFrameAnalyticsTask setFailed log message () () 2020-02-24 15:21:51 +00:00
Yang Wang
7cefba78c5
License removal leads back to a basic license () ()
A new basic license will be generated when an existing license is deleted.
In addition, deleting an existing basic license is a no-op.

Resolves: 
2020-02-24 11:02:40 +11:00
Jason Tedor
1685cbe504
Add messages for CCR on license state changes ()
When a license expires, or license state changes, functionality might be
disabled. This commit adds messages for CCR to inform users that CCR
functionality will be disabled when a license expires, or when license
state changes to a license level lower than trial/platinum/enterprise.
2020-02-22 09:09:42 -05:00
Benjamin Trent
afd90647c9
[ML] Adds feature importance option to inference processor () ()
This adds machine learning model feature importance calculations to the inference processor.

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : {
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.

Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 18:42:31 -05:00
Jay Modi
f3f6ff97ee
Single instance of the IndexNameExpressionResolver ()
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.

In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.

Backport of 
2020-02-21 07:50:02 -07:00
Przemko Robakowski
aff693bc9f
Make FreezeStep retryable () ()
* Make FreezeStep retryable

This change marks `FreezeStep` as retryable and adds a test to make sure we can really run it again.

* refactor tests

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-21 10:11:35 +01:00
Armin Braun
4bb780bc37
Refactor Inflexible Snapshot Repository BwC () ()
* Refactor Inflexible Snapshot Repository BwC ()

Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.
2020-02-21 09:14:34 +01:00
Przemysław Witek
b84e8db7b5
[7.x] Rename .ml-state index to .ml-state-000001 to support rollover () () 2020-02-21 08:55:59 +01:00
Yang Wang
4bc7545e43
Add enterprise mode and refactor license check () ()
Add enterprise operation mode to properly map enterprise license.

Also refactor the XPackLicenseState class to consolidate license status and mode checks.
This class has many synchronized methods to check basically three things:
* Minimum operation mode required
* Whether security is enabled
* Whether current license needs to be active

Depending on the actual feature, either one, two, or all of the above checks are performed.
These are now consolidated into 3 helper methods (2 of them are new).
The synchronization is pushed down to the helper methods so actual checking
methods no longer need to worry about it.
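
The consolidation can be pictured with a small Python sketch of one helper that combines the three checks. Everything here (names, the ordering of modes, the omission of trial licenses) is an assumption for illustration and does not mirror the actual Java class.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    # Assumed ordering of operation modes, lowest to highest.
    BASIC = 1
    STANDARD = 2
    GOLD = 3
    PLATINUM = 4
    ENTERPRISE = 5


@dataclass(frozen=True)
class LicenseState:
    mode: Mode
    active: bool
    security_enabled: bool


def is_allowed(state, minimum_mode, need_security=False, need_active=True):
    """Hypothetical helper combining the three checks in one place."""
    if need_security and not state.security_enabled:
        return False
    if need_active and not state.active:
        return False
    return state.mode >= minimum_mode


gold = LicenseState(mode=Mode.GOLD, active=True, security_enabled=False)
```

Each feature check then becomes a one-line call that states only its requirements, for example `is_allowed(gold, Mode.PLATINUM)` for a platinum-only feature.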

resolves: 
2020-02-21 14:18:18 +11:00
Benjamin Trent
2a5c181dda
[ML][Inference] don't return inflated definition when storing trained models () ()
When `PUT` is called to store a trained model, it is useful to return the newly created model config. But, it is NOT useful to return the inflated definition.

These definitions can be large and returning the inflated definition causes undue work on the server and client side.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-20 19:47:29 -05:00
Przemko Robakowski
88bb06f055
Make DeleteStep retryable () ()
* Make DeleteStep retryable

This change marks `DeleteStep` as retryable and adds a test to make sure we really can invoke it again.

* Fix unused import

* revert unneeded changes

* test reworked
2020-02-19 21:16:59 +01:00
David Kyle
7bbe5c8464
[ML] Validate tree feature index is within range ()
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is a valid tree used to
centre the data in the ensemble, but the validation code would reject it
because feature_names is empty. This meant a broken workflow, as you could not GET
the model and PUT it back.
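
A minimal Python sketch of the validation rule described above. The node representation (dicts with an optional `split_feature` key) and the function name are assumptions for illustration, not the actual validation code.

```python
# Hypothetical sketch: every split node must reference a feature index that
# exists in feature_names; leaf nodes reference no feature and are always fine.

def validate_feature_indices(nodes, feature_names):
    for position, node in enumerate(nodes):
        index = node.get("split_feature")
        if index is None:
            continue  # leaf node: nothing to check
        if not 0 <= index < len(feature_names):
            raise ValueError(
                f"node {position} uses feature index {index}, but only "
                f"{len(feature_names)} feature names are defined"
            )


# A single leaf node with no feature names is valid under this rule.
validate_feature_indices([{"leaf_value": 0.5}], [])
```

Checking indices per node, rather than rejecting an empty `feature_names` list outright, is what lets the single-node tree pass.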
2020-02-19 14:41:43 +00:00
Przemysław Witek
7cd997df84
[ML] Make ml internal indices hidden () () 2020-02-19 14:02:32 +01:00
Przemysław Witek
5acee761eb
Implement unit tests for AnomalyDetectorsIndex class () () 2020-02-19 12:24:59 +01:00
Ioannis Kakavas
09773efb41
[7.x] Return realm name in SAML Authenticate API () ()
This is useful in cases where the caller of the API needs to know
the name of the realm that consumed the SAML Response and
authenticated the user and this is not self evident (i.e. because
there are many saml realms defined in ES).
Currently, the way to learn the realm name would be to make a
subsequent request to the `_authenticate` API.
2020-02-18 17:16:24 +02:00
Jason Tedor
c9f72a0116
Fix shard follow task cleaner under security ()
The shard follow task cleaner executes on behalf of the user to clean up
a shard follow task after the follower index has been
deleted. Otherwise, these persistent tasks are left lying around, and
they fail to execute because the follower index has been deleted. In the
face of security, attempts to complete these persistent tasks would
fail.  This is because these cleanups are executed under the system
context (this makes sense, they are happening on behalf of the user
after the user has executed an action) but the system role was never
granted the permission for persistent task completion. This commit
addresses this by adding this cluster privilege to the system role.
2020-02-16 17:26:14 -05:00
Andrei Dan
bd3a70db4e
ILM fix the init step to actually be retryable () ()
We marked the `init` ILM step as retryable but our test used `waitUntil`
without an assert so we didn’t catch the fact that we were not actually
able to retry this step as our ILM state didn’t contain any information
about the policy execution (as we were in the process of initialising
it).

This commit manually sets the current step to `init` when we’re moving
the ilm policy into the ERROR step (this enables us to successfully
move to the error step and later retry the step)

* ShrunkenIndexCheckStep: Use correct logger

(cherry picked from commit f78d4b3d91345a2a8fc0f48b90dd66c9959bd7ff)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-02-15 18:42:05 +00:00
Andrei Dan
da2d441d50
ILM make the set-single-node-allocation retryable () ()
(cherry picked from commit 0e473115958f691fc8dc87293642aea6a07fe3da)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-02-14 17:31:24 +00:00
Nik Everett
146def8caa
Implement top_metrics agg () ()
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
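
What the current, limited form computes can be sketched in a few lines of Python: sort on one numeric field and return one numeric metric from the single top document. The function name, argument names, and result shape are assumptions for illustration; the real aggregation works on doc values rather than on Python dicts.

```python
# Hypothetical sketch of a single-sort, single-metric top_metrics computation.

def top_metric(docs, sort_field, metric_field, order="desc"):
    """Return the sort value and metric of the top document, or None if no doc qualifies."""
    candidates = [d for d in docs if sort_field in d and metric_field in d]
    if not candidates:
        return None
    pick = max if order == "desc" else min
    top = pick(candidates, key=lambda d: d[sort_field])
    return {"sort": top[sort_field], "metric": top[metric_field]}


docs = [
    {"timestamp": 10, "temperature": 20.5},
    {"timestamp": 30, "temperature": 18.0},
    {"timestamp": 20, "temperature": 25.1},
]
latest = top_metric(docs, sort_field="timestamp", metric_field="temperature")
```

With these documents, `latest` holds the temperature of the document with the largest timestamp.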

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 11:19:11 -05:00
Dimitris Athanasiou
ad56802ac6
[7.x][ML] Refactor ML mappings and templates into JSON resources (#51… ()
ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clearer to maintain those in json files.
In addition, we are going to add ILM policies to these indices
and the component for a plugin to register ILM policies is
`IndexTemplateRegistry`. It expects the templates to be in resource
json files.

For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
`MlIndexTemplateRegistry`.

Backport of 
2020-02-14 17:16:06 +02:00