591 Commits

Author SHA1 Message Date
Armin Braun
cec02da0ac
Fix Source Only Snapshot REST Test Failure (#50456) (#50459)
We are matching on the exact number of shards in this test, but may run into
snapshotting more than the single index created in it due to auto-created indices like
`.watcher`.
Fixed by making the test only take a snapshot of the single index used by this test.

Closes #50450
2019-12-23 12:24:08 +01:00
Benjamin Trent
71ff330c4e
[ML][Inference] updates specs with new params + docs (#50373) (#50441) 2019-12-20 12:13:45 -05:00
Przemysław Witek
3e3a93002f
[7.x] Fix accuracy metric (#50310) (#50433) 2019-12-20 15:34:38 +01:00
Hendrik Muhs
de14092ad2 [Transform] refactor source and dest validation to support CCS (#50018)
refactors source and dest validation, adds support for CCS, makes resolve work like reindex/search, allow aliased dest index with a single write index.

fixes #49988
fixes #49851
relates #43201
2019-12-20 10:49:53 +01:00
Przemysław Witek
cc4bc797f9
[7.x] Implement precision and recall metrics for classification evaluation (#49671) (#50378) 2019-12-19 18:55:05 +01:00
Tim Vernum
47e5e34f42
Support "enterprise" license types (#49474)
This adds "enterprise" as an acceptable type for a license loaded
through the PUT _license API.

Internally an enterprise license is treated as having a "platinum"
operating mode.

The handling of License types was refactored to have a new explicit
"LicenseType" enum in addition to the existing "OperatingMode" enum.

By default (in 7.x) the GET license API will return "platinum" when an
enterprise license is active in order to be compatible with existing
consumers of that API.
A new "accept_enterprise" flag has been introduced to allow clients to
opt-in to receive the correct "enterprise" type.

Backport of: #49223
2019-12-12 14:37:44 +11:00
Julie Tibshirani
277880bb4f
In sparse vector REST tests, specify the index name in searches. (#50061)
The `sparse_vector` REST tests occasionally fail on 7.x because we don't receive the expected response headers with deprecation warnings.

One theory as to what is happening is that there is an extra empty index present in addition to the test index. Since the search doesn't specify an index name, it hits both the test index and this extra empty index and shard responses from the extra index don't produce deprecation warnings. If not all shard responses contain the warning headers, then certain deprecation warnings can be lost (due to the bug described in #33936).

This PR tries to harden the `sparse_vector` tests by always specifying the index name during a search. This doesn't fix the root causes of the issue, but is good practice and can help avoid intermittent failures.

Addresses #49383.
2019-12-11 10:33:47 -08:00
Stuart Cam
44cd2f444c Add the REST API specifications for SLM Status / Start / Stop endpoints. (#49759)
Was originally missed in PR #47710

(cherry picked from commit 133b34c8355639ae0f699a86ffd9f37d19f73bca)
2019-12-11 13:34:13 +11:00
Dimitris Athanasiou
8891f4db88
[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990) (#50023)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.

Backport of #49990
2019-12-10 15:29:19 +02:00
Dimitris Athanasiou
4edb2e7bb6
[7.x][ML] Add optional source filtering during data frame reindexing (#49690) (#49718)
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531

Backport of #49690
2019-11-29 16:10:44 +02:00
Jim Ferenczi
496bb9e2ee
Add a listener to track the progress of a search request locally (#49471) (#49691)
This commit adds a function in NodeClient that allows to track the progress
of a search request locally. Progress is tracked through a SearchProgressListener
that exposes query and fetch responses as well as partial and final reduces.
This new method can be used by modules/plugins inside a node in order to track the
progress of a local search request.

Relates #49091
2019-11-28 18:23:09 +01:00
David Roberts
62811c2272 [ML] Add default categorization analyzer definition to ML info (#49545)
The categorization job wizard in the ML UI will use this
information when showing the effect of the chosen categorization
analyzer on a sample of input.
2019-11-25 13:39:16 +00:00
Dimitris Athanasiou
8eaee7cbdc
[7.x][ML] Explain data frame analytics API (#49455) (#49504)
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.

Backport of #49455
2019-11-22 22:06:10 +02:00
Przemysław Witek
c7ac2011eb
[7.x] Implement accuracy metric for multiclass classification (#47772) (#49430) 2019-11-21 15:01:18 +01:00
Przemysław Witek
38aec2e298
Relax assertions related to datafeed timing stats in .yml test (#49285) (#49291) 2019-11-19 12:50:14 +01:00
Julie Tibshirani
a0ee6c8f7e
Add telemetry for flattened fields. (#48972) (#49125)
Currently we just record the number of flattened fields defined in the mappings.
2019-11-18 12:29:42 -08:00
Benjamin Trent
eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Przemysław Witek
150db2b544
Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143) (#49164) 2019-11-18 07:52:57 +01:00
Mayya Sharipova
0e933a093d Add index name to search requests (#49175)
We can't guarantee expected request failures if search request is across
many indexes, as if expected shards fail, some indexes may return 200.

closes #47743
2019-11-15 16:39:18 -05:00
Yogesh Gaikwad
1b64c1992a Add owner flag parameter to the rest spec (#48500)
This commit adds missing info about newly added
`owner` flag to the rest spec, also adds a rest test
for the same.

Closes#48499
2019-10-30 13:07:01 +11:00
Julie Tibshirani
89c65752dc
Update the signature of vector script functions. (#48653)
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

```
Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')
```

This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of `doc['field']` is unclear.

The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecates the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. It seemed simplest to understand (for both
us and users) to keep everything symmetric between dense and sparse vectors.
2019-10-29 15:46:05 -07:00
Benjamin Trent
6ea59dd428
[ML][Transforms] add wait_for_checkpoint flag to stop (#47935) (#48591)
Adds `wait_for_checkpoint` for `_stop` API.
2019-10-28 13:02:57 -04:00
Russ Cam
b24bbd4296 Change policy_id to list type in slm.get_lifecycle (#47766)
This commit changes the REST API spec slm.get_lifecycle's policy_id url part to be of type "list", in line with other REST API specs that accept a comma-separated list of values.

Closes #47765
2019-10-25 09:04:25 +10:00
Julie Tibshirani
2664cbd20b
Deprecate the sparse_vector field type. (#48368)
We have not seen much adoption of this experimental field type, and don't see a
clear use case as it's currently designed. This PR deprecates the field type in
7.x. It will be removed from 8.0 in a follow-up PR.
2019-10-23 16:35:03 -07:00
Igor Motov
8163e0a9e5 Mute XPackRestIT security/authz/14_cat_indices
Mutes "Test empty request while single authorized closed index"

Tracked by #47875
2019-10-23 14:17:44 -04:00
Przemysław Witek
2db2b945ec
[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174) (#48294) 2019-10-21 22:07:19 +02:00
Przemysław Witek
1a42e37070
[7.x] Default "prediction_field_name" to (dependent_variable + "_prediction") (#48232) (#48279) 2019-10-21 13:18:08 +02:00
Albert Zaharovits
69fc715bc3
Fix security origin for TokenService#findActiveTokensFor... (#47418) (#48280)
All internal searches (triggered by APIs) across the .security index
must be performed while "under the security origin". Otherwise,
the search is performed in the context of the caller which most
likely does not have privileges to search .security (hopefully).
This commit fixes this in the case of two methods in the
TokenService and corrects an overly done such context switch
in the ApiKeyService.

In addition, this makes all tests from the client/rest-high-level
module execute as an all mighty administrator,
but not a literal superuser.

Closes #47151
2019-10-21 13:15:05 +03:00
Przemysław Witek
28f68fa221
Make num_top_classes parameter's default value equal to 2 (#48119) (#48201) 2019-10-17 18:43:15 +02:00
Martijn van Groningen
a5fe69c344
Include enrich into the info api as feature (#48157)
This commit also fixes a bug, the enrich enabled setting
was not included in the list of settings.

Backport of #48109
2019-10-17 09:51:32 +02:00
Martijn van Groningen
aff0c9babc
This commits merges (#48040) the enrich-7.x feature branch,
which is backport merge and adds a new ingest processor, named enrich processor,
that allows document being ingested to be enriched with data from other indices.

Besides a new enrich processor, this PR adds several APIs to manage an enrich policy.
An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner.

Related to #32789
2019-10-15 17:31:45 +02:00
David Roberts
984323783e
[ML][7.x] Add lazy assignment job config option (#47993)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-15 06:55:11 +01:00
Martijn van Groningen
cc4b6c43b3
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-15 07:23:47 +02:00
James Baiera
18d7e32b7d Add wait for completion for Enrich policy execution (#47886)
This PR adds the ability to run the enrich policy execution task in the background,
returning a task id instead of waiting for the completed operation.
2019-10-14 16:05:28 -04:00
David Roberts
1ca25bed38
[ML][7.x] Add option to stop datafeed that finds no data (#47995)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
2019-10-14 17:19:13 +01:00
Martijn van Groningen
d4901a71d7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-14 10:27:17 +02:00
Tanguy Leroux
742fa818b8
Add Pause/Resume Auto Follower APIs (#47510) (#47904)
This commit adds two APIs that allow to pause and resume
CCR auto-follower patterns:

// pause auto-follower
POST /_ccr/auto_follow/my_pattern/pause

// resume auto-follower
POST /_ccr/auto_follow/my_pattern/resume

The ability to pause and resume auto-follow patterns can be
useful in some situations, including the rolling upgrades of
cluster using a bi-directional cross-cluster replication scheme
(see #46665).

This commit adds a new active flag to the AutoFollowPattern
and adapts the AutoCoordinator and AutoFollower classes so
that it stops to fetch remote's cluster state when all auto-follow
patterns associate to the remote cluster are paused.

When an auto-follower is paused, remote indices that match the
pattern are just ignored: they are not added to the pattern's
followed indices uids list that is maintained in the local cluster
state. This way, when the auto-follow pattern is resumed the
indices created in the remote cluster in the meantime will be
picked up again and added as new following indices. Indices
created and then deleted in the remote cluster will be ignored
as they won't be seen at all by the auto-follower pattern at
resume time.

Backport of #47510 for 7.x
2019-10-13 09:22:51 +02:00
Benjamin Trent
627faf1850
[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) (#47914)
* [ML][Analytics] fix bug where regression deleted early does not delete state (#47885)

* [ML][Analytics] fix bug where regression deleted early does not delete state

* Fixing ml with security test failure

* fixing for older java
2019-10-11 15:11:16 -04:00
Przemysław Witek
c62fe8c344
Require that the dependent variable column has at most 2 distinct values in classfication analysis. (#47858) (#47906) 2019-10-11 14:57:08 +02:00
Hendrik Muhs
fd1c4c198a [Transform] fixes tests which might fail due to auto-stop (#47867)
Batch transforms automatically stop after all data has processed, therefore tests can not reliable test the state. This change rewrites tests to remove the unreliable tests or use continuous transforms instead as they do not auto-stop.

fixes #47441
2019-10-11 11:10:38 +02:00
Martijn van Groningen
102016d571
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-10 14:44:05 +02:00
Hendrik Muhs
0e7869128a
[7.5][Transform] introduce new roles and deprecate old ones (#47780) (#47819)
deprecate data_frame_transforms_{user,admin} roles and introduce transform_{user,admin} roles as replacement
2019-10-10 10:31:24 +02:00
Martijn van Groningen
da1e2ea461
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-09 09:06:13 +02:00
Hendrik Muhs
5e0e54f455
[Transform] move root endpoint to _transform with BWC layer (#47127) (#47682)
move the main endpoint to /_transform/ from /_data_frame/transforms/ with providing backwards compatibility and deprecation warnings
2019-10-08 08:59:01 +02:00
Dimitris Athanasiou
7667ea5f6f
[7.x][ML] Additional outlier detection parameters (#47600) (#47669)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
2019-10-07 18:21:33 +03:00
Yogesh Gaikwad
b6d1d2e6ec
Add 'create_doc' index privilege (#45806) (#47645)
Use case:
User with `create_doc` index privilege will be allowed to only index new documents
either via Index API or Bulk API.

There are two cases that we need to think:
- **User indexing a new document without specifying an Id.**
   For this ES auto generates an Id and now ES version 7.5.0 onwards defaults to `op_type` `create` we just need to authorize on the `op_type`.
- **User indexing a new document with an Id.**
   This is problematic as we do not know whether a document with Id exists or not.
   If the `op_type` is `create` then we can assume the user is trying to add a document, if it exists it is going to throw an error from the index engine.

Given these both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents.

In the `AuthorizationService` when authorizing a bulk request, we check the implied action.
This code changes that to append the `:op_type/index` or `:op_type/create`
to indicate the implied index action.
2019-10-07 23:58:44 +11:00
Martijn van Groningen
f2f2304c75
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-07 10:07:56 +02:00
Przemysław Witek
ee952da2e2
[7.x] Implement evaluation API for multiclass classification problem (#47126) (#47343) 2019-10-04 17:54:51 +02:00
Przemysław Witek
ec9b77deaa
[7.x] Implement new analysis type: classification (#46537) (#47559) 2019-10-04 13:47:19 +02:00
Lee Hinman
2e3eb4b24e
Add API to execute SLM retention on-demand (#47405) (#47463)
* Add API to execute SLM retention on-demand (#47405)

This is a backport of #47405

This commit adds the `/_slm/_execute_retention` API endpoint. This
endpoint kicks off SLM retention and then returns immediately.

This in particular allows us to run retention without scheduling it
(for entirely manual invocation) or perform a one-off cleanup.

This commit also includes HLRC for the new API, and fixes an issue
in SLMSnapshotBlockingIntegTests where retention invoked prior to the
test completing could resurrect an index the internal test cluster
cleanup had already deleted.

Resolves #46508
Relates to #43663
2019-10-02 12:29:04 -06:00