Commit Graph

1378 Commits

Author SHA1 Message Date
Benjamin Trent eefe7688ce
[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257)
* [ML] ML Model Inference Ingest Processor (#49052)

* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)

* [ML][Inference] adds logistic_regression output aggregator (#48075)

* [ML][Inference] Adding read/del trained models (#47882)

* [ML][Inference] Adding inference ingest processor (#47859)

* [ML][Inference] fixing classification inference for ensemble (#48463)

* [ML][Inference] Adding model memory estimations (#48323)

* [ML][Inference] adding more options to inference processor (#48545)

* [ML][Inference] handle string values better in feature extraction (#48584)

* [ML][Inference] Adding _stats endpoint for inference (#48492)

* [ML][Inference] add inference processors and trained models to usage (#47869)

* [ML][Inference] add new flag for optionally including model definition (#48718)

* [ML][Inference] adding license checks (#49056)

* [ML][Inference] Adding memory and compute estimates to inference (#48955)

* fixing version of indexed docs for model inference
2019-11-18 13:19:17 -05:00
Przemysław Witek 5f9965e4b8
Lower minimum model memory limit value from 1MB to 1kB. (#49227) (#49242) 2019-11-18 14:58:20 +01:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Alan Woodward f0e386d60d Fix ResizeResponseTests randomization 2019-11-13 11:50:04 +00:00
Alan Woodward 999d66fc87 Add client-side ResizeRequest and ResizeResponse classes (#48937)
Closes #48468
2019-11-13 10:34:11 +00:00
Michael Basnight bc23bc5146 Add delete alias to the HLRC (#48819)
The delete alias call is a rest only API call, but should still be added
to the rest client. This commit adds it as well as relevant tests.

Ref #47678
2019-11-12 11:02:53 -06:00
Benjamin Trent 46ab1db54f
[7.x] [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050) (#48958)
* [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

Related PR: https://github.com/elastic/ml-cpp/pull/809

* adjusting bwc version
2019-11-11 15:43:03 -05:00
Dan Hermann c85cf7a6de
Validate proxy base path at parse time (#47912) (#48825) 2019-11-04 09:51:13 -06:00
Michael Basnight ede1681c5a Remove old HLRC base test case (#48714)
This commit finishes cleaning up the AbstractHlrcXContentTestCase work
and removes this class. All classes that were using this are now using
the updated base class.

Ref #39745
2019-10-30 14:42:52 -05:00
Michael Basnight eee4cfaa8b Refactor HLRC transform stats test (#48708)
This test uses a deprecated base class, and this commit moves it over
to the new class.

Ref #39745
2019-10-30 14:42:37 -05:00
Michael Basnight d63e5772c0 Cleanup HLRC graph tests to use new test style (#48644)
The old graph tests were duplicated a lot and used a deprecated parent
class. This commit cleans that up and removes one of the duplicated
tests.

Ref #39745
2019-10-30 14:42:22 -05:00
Benjamin Trent c9ead80c31
[7.x] [ML][Inference] separating definition and config object storage (#48651) (#48695)
* [ML][Inference] separating definition and config object storage (#48651)

This separates out the `definition` object from being stored within the configuration object in the index. 

This allows us to gather the config object without decompressing a potentially large definition.

Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
2019-10-30 13:27:29 -04:00
Gordon Brown 4bd514715d
Increase timeout in ILM doc test slightly (#48606)
This assertBusy can occasionally time out on systems under heavy load,
such as CI, so this commit increases the timeout.
2019-10-29 11:12:41 -07:00
Christoph Büscher 09d68e7548
Support `search_type` in Rank Evaluation API (#48542) (#48631)
Adding support for the `search_type` request parameter to the Ranking Evaluation
API since this parameter can impact the ranking and the metric score and should
be choosen in the same way when evaluating the search as later in the real
search.

Closes #48503
2019-10-29 14:54:33 +01:00
Rory Hunter da4654527b
Improve resiliency to auto-formatting in client (#48617)
Backport of #48447. Make a number of changes so that code in the
`client` directory is more resilient to automatic formatting. This
covers:

* Literal JSON handling:
   * Reformatting multiline JSON to embed whitespace in the strings
   * Remove string concatenation where JSON fits on a single line
   * Use `String.format` for large documents with variable content
* Remove some erroneous doc refs in `QueryDSLDocumentationTests`
* Move some comments around to they aren't auto-formatted to a strange
  place
2019-10-29 10:40:54 +00:00
Mark Vieira e5c6440a4f
Simplify usage of Gradle Shadow plugin (#48478) (#48597)
This commit simplifies and standardizes our usage of the Gradle Shadow
plugin to conform more to plugin conventions. The custom "bundle" plugin
has been removed as it's not necessary and performs the same function
as the Shadow plugin's default behavior with existing configurations.

Additionally, this removes unnecessary creation of a "nodeps" artifact,
which is unnecessary because by default project dependencies will in
fact use the non-shadowed JAR unless explicitly depending on the
"shadow" configuration.

Finally, we've cleaned up the logic used for unit testing, so we are
now correctly testing against the shadow JAR when the plugin is applied.
This better represents a real-world scenario for consumers and provides
better test coverage for incorrectly declared dependencies.

(cherry picked from commit 3698131109c7e78bdd3a3340707e1c7b4740d310)
2019-10-28 12:11:55 -07:00
Benjamin Trent 6ea59dd428
[ML][Transforms] add wait_for_checkpoint flag to stop (#47935) (#48591)
Adds `wait_for_checkpoint` for `_stop` API.
2019-10-28 13:02:57 -04:00
Gordon Brown 5021410165
Retry on RepositoryException in SLM tests (#48548)
Due to a bug, GETing a snapshot can cause a RespositoryException to be
thrown. This error is transient and should be retried, rather than
causing the test to fail. This commit converts those
RepositoryExceptions into AssertionErrors so that they will be retried
in code wrapped in assertBusy.
2019-10-28 09:24:38 -07:00
Michael Basnight 1ba57dbe08 [Docs] add missing snapshot restore reference (#45256) 2019-10-28 09:55:10 -05:00
David Turner e821a22580 Mute SecurityDocumentationIT#testGetUsers - see #48440 2019-10-28 14:03:01 +01:00
Michael Basnight 5228956ecc Add slices to delete and update by query in HLRC (#48420)
The slices param was missing from both delete by query and update by
query in the HLRC request converters. This commit fixes the omission.
2019-10-25 15:23:17 -05:00
Tanguy Leroux 2861088a59 Mute RestClientMultipleHostsIntegTests.testCancelAsyncRequests (#48535)
Relates https://github.com/elastic/elasticsearch/issues/45577
2019-10-25 17:42:24 +02:00
Tim Brooks c0b545f325
Make BytesReference an interface (#48486)
BytesReference is currently an abstract class which is extended by
various implementations. This makes it very difficult to use the
delegation pattern. The implication of this is that our releasable
BytesReference is a PagedBytesReference type and cannot be used as a
generic releasable bytes reference that delegates to any reference type.
This commit makes BytesReference an interface and introduces an
AbstractBytesReference for common functionality.
2019-10-24 15:39:30 -06:00
Michael Basnight d49958cef3 Remove deprecated test from the HLRC tests (#48424)
The AbstractHlrcWriteableXContentTestCase was replaced by a better test
case a while ago, and this is the last two instances using it. They have
been converted and the test is now deleted.

Ref #39745
2019-10-24 14:02:04 -05:00
Michael Basnight c19379ef31 Remove random when using HLRC sync and async calls (#48211)
This commit removes the randomization used by every execute call in the
high level rest tests. Previously every execute call, which can be many
calls per single test, would rely on a random boolean to determine if
they should use the sync or async methods provided to the execute
method. This commit runs the tests twice, using two different clusters,
both of them providing the value one time via a sysprop. This ensures
that the whole suite of tests is run using the sync and async code
paths.

Closes #39667
2019-10-24 09:06:17 -05:00
Przemysław Witek aa29567e11
[7.x] Fix assignment
Backport of https://github.com/elastic/elasticsearch/pull/48216
2019-10-22 11:34:09 +02:00
Martijn van Groningen c09b62d5bf
Backport: also validate source index at put enrich policy time (#48311)
Backport of: #48254

This changes tests to create a valid
source index prior to creating the enrich policy.
2019-10-22 07:38:16 +02:00
Przemysław Witek 2db2b945ec
[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174) (#48294) 2019-10-21 22:07:19 +02:00
Lee Hinman cc0c876a8d
fix incorrect comparison (#48208) (#48303)
* remove comparison of identical values

the comparison `tookInMillis == tookInMillis` is always true.

* add comparison between tookInMillis
2019-10-21 09:14:44 -06:00
Przemysław Witek 1a42e37070
[7.x] Default "prediction_field_name" to (dependent_variable + "_prediction") (#48232) (#48279) 2019-10-21 13:18:08 +02:00
Albert Zaharovits 69fc715bc3
Fix security origin for TokenService#findActiveTokensFor... (#47418) (#48280)
All internal searches (triggered by APIs) across the .security index
must be performed while "under the security origin". Otherwise,
the search is performed in the context of the caller which most
likely does not have privileges to search .security (hopefully).
This commit fixes this in the case of two methods in the
TokenService and corrects an overly done such context switch
in the ApiKeyService.

In addition, this makes all tests from the client/rest-high-level
module execute as an all mighty administrator,
but not a literal superuser.

Closes #47151
2019-10-21 13:15:05 +03:00
Martijn van Groningen 5cb44a414c
Fixed links in java docs for EnrichClient (#48233) 2019-10-18 16:26:49 +02:00
Benjamin Trent 876f4aafac
[ML] Add logistic_regression output aggregator (#48238) (#48244) 2019-10-18 10:08:17 -04:00
Przemysław Witek 28f68fa221
Make num_top_classes parameter's default value equal to 2 (#48119) (#48201) 2019-10-17 18:43:15 +02:00
Gordon Brown eb7969e8cc
Fix ILM HLRC Javadoc->Documentation links (#48083)
Several links from the ILM HLRC Javadoc to the online documentation were
not updated when the ILM HLRC documentation was written. This commit
fixes those links.
2019-10-17 09:59:40 -06:00
Stuart Tettemer 356eef00c8
Scripting: get context names REST API (#48026) (#48168)
Adds `GET /_script_context`, returning a `contexts` object with each
available context as a key whose value is an empty object. eg.
```
{
  "contexts": {
    "aggregation_selector": {},
    "aggs": {},
    "aggs_combine": {},
...
  }
}
```

refs: #47411
2019-10-17 09:08:55 -06:00
Benjamin Trent 0dddbb5b42
[ML] Parse and index inference model (#48016) (#48152)
This adds parsing an inference model as a possible
result of the analytics process. When we do parse such a model
we persist a `TrainedModelConfig` into the inference index
that contains additional metadata derived from the running job.
2019-10-16 15:46:20 -04:00
Martijn van Groningen aff0c9babc
This commits merges (#48040) the enrich-7.x feature branch,
which is backport merge and adds a new ingest processor, named enrich processor,
that allows document being ingested to be enriched with data from other indices.

Besides a new enrich processor, this PR adds several APIs to manage an enrich policy.
An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner.

Related to #32789
2019-10-15 17:31:45 +02:00
David Roberts 984323783e
[ML][7.x] Add lazy assignment job config option (#47993)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-15 06:55:11 +01:00
Martijn van Groningen cc4b6c43b3
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-15 07:23:47 +02:00
Gordon Brown 300ddfa3c1
SLM Start/Stop HLRC and docs (#47966)
This commit adds HLRC support and documentation for the SLM Start and
Stop APIs, as well as updating existing documentation where appropriate.

This commit also ensures that the SLM APIs are properly included in the
HLRC documentation.
2019-10-14 16:56:31 -06:00
Martijn van Groningen 7cc73f6193
Add HLRC support for enrich execute policy API (#47991)
This PR also includes HLRC docs for the enrich stats api.

Relates to #32789
2019-10-14 19:55:48 +02:00
Michael Basnight f6f5efe141 Add cloudId builder to the HLRC (#47868)
Elastic cloud has a concept of a cloud Id. This Id is a base64 encoded
url, split up into a few parts. This commit allows the user to pass in a
cloud id now, which is translated to a HttpHost that is defined by the
encoded parts therein.
2019-10-14 12:47:06 -05:00
Tanguy Leroux e4ea8b46b6
Add Pause/Resume Auto-Follower APIs to High Level REST Client (#48004)
This commit adds support for Pause/Resume Auto-Follower APIs 
to the HLRC, with the documentation.

Relates #47510
2019-10-14 18:25:53 +02:00
David Roberts 1ca25bed38
[ML][7.x] Add option to stop datafeed that finds no data (#47995)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
2019-10-14 17:19:13 +01:00
Martijn van Groningen d4901a71d7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-14 10:27:17 +02:00
Tanguy Leroux 742fa818b8
Add Pause/Resume Auto Follower APIs (#47510) (#47904)
This commit adds two APIs that allow to pause and resume
CCR auto-follower patterns:

// pause auto-follower
POST /_ccr/auto_follow/my_pattern/pause

// resume auto-follower
POST /_ccr/auto_follow/my_pattern/resume

The ability to pause and resume auto-follow patterns can be
useful in some situations, including the rolling upgrades of
cluster using a bi-directional cross-cluster replication scheme
(see #46665).

This commit adds a new active flag to the AutoFollowPattern
and adapts the AutoCoordinator and AutoFollower classes so
that it stops to fetch remote's cluster state when all auto-follow
patterns associate to the remote cluster are paused.

When an auto-follower is paused, remote indices that match the
pattern are just ignored: they are not added to the pattern's
followed indices uids list that is maintained in the local cluster
state. This way, when the auto-follow pattern is resumed the
indices created in the remote cluster in the meantime will be
picked up again and added as new following indices. Indices
created and then deleted in the remote cluster will be ignored
as they won't be seen at all by the auto-follower pattern at
resume time.

Backport of #47510 for 7.x
2019-10-13 09:22:51 +02:00
Yogesh Gaikwad ac209c142c
Remove uniqueness constraint for API key name and make it optional (#47549) (#47959)
Since we cannot guarantee the uniqueness of the API key `name` this commit removes the constraint and makes this field optional.

Closes #46646
2019-10-12 22:22:16 +11:00
Przemysław Witek d210bfa888
[7.x] Add MlClientDocumentationIT tests for classification. (#47569) (#47896) 2019-10-11 10:19:55 +02:00
Martijn van Groningen 102016d571
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-10 14:44:05 +02:00
Hendrik Muhs 0e7869128a
[7.5][Transform] introduce new roles and deprecate old ones (#47780) (#47819)
deprecate data_frame_transforms_{user,admin} roles and introduce transform_{user,admin} roles as replacement
2019-10-10 10:31:24 +02:00
Martijn van Groningen aace42d38d
Add HLRC support for enrich stats API (#47306)
This PR also includes HLRC docs for the enrich stats api.

Relates to #32789
2019-10-10 09:08:29 +02:00
Tim Brooks 02622c1ef9
Fix issues with serializing BulkByScrollResponse (#45357)
Currently there are two issues with serializing BulkByScrollResponse.
First, when deserializing from XContent, indexing exceptions and search
exceptions are switched. Additionally, search exceptions do no retain
the appropriate RestStatus code, so you must evaluate the status code
from the exception. However, the exception class is not always correctly
retained when serialized.

This commit adds tests in the failure case. Additionally, fixes the
swapping of failure types and adds the rest status code to the search
failure.
2019-10-09 10:12:14 -06:00
Martijn van Groningen da1e2ea461
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-09 09:06:13 +02:00
Benjamin Trent d33dbf82d4
[7.x] [ML][Inference] adjusting definition object schema and validation (#47447) (#47673)
* [ML][Inference] adjusting definition object schema and validation (#47447)

* [ML][Inference] adjusting definition object schema and validation

* finalizing schema and fixing inference npe

* addressing PR comments

* fixing for backport
2019-10-08 07:11:05 -04:00
Hendrik Muhs 5e0e54f455
[Transform] move root endpoint to _transform with BWC layer (#47127) (#47682)
move the main endpoint to /_transform/ from /_data_frame/transforms/ with providing backwards compatibility and deprecation warnings
2019-10-08 08:59:01 +02:00
Dimitris Athanasiou 7667ea5f6f
[7.x][ML] Additional outlier detection parameters (#47600) (#47669)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
2019-10-07 18:21:33 +03:00
Yogesh Gaikwad b6d1d2e6ec
Add 'create_doc' index privilege (#45806) (#47645)
Use case:
User with `create_doc` index privilege will be allowed to only index new documents
either via Index API or Bulk API.

There are two cases that we need to think:
- **User indexing a new document without specifying an Id.**
   For this ES auto generates an Id and now ES version 7.5.0 onwards defaults to `op_type` `create` we just need to authorize on the `op_type`.
- **User indexing a new document with an Id.**
   This is problematic as we do not know whether a document with Id exists or not.
   If the `op_type` is `create` then we can assume the user is trying to add a document, if it exists it is going to throw an error from the index engine.

Given these both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents.

In the `AuthorizationService` when authorizing a bulk request, we check the implied action.
This code changes that to append the `:op_type/index` or `:op_type/create`
to indicate the implied index action.
2019-10-07 23:58:44 +11:00
Yogesh Gaikwad 7c862fe71f
Add support to retrieve all API keys if user has privilege (#47274) (#47641)
This commit adds support to retrieve all API keys if the authenticated
user is authorized to do so.
This removes the restriction of specifying one of the
parameters (like id, name, username and/or realm name)
when the `owner` is set to `false`.

Closes #46887
2019-10-07 23:58:21 +11:00
Martijn van Groningen f2f2304c75
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-07 10:07:56 +02:00
Przemysław Witek ee952da2e2
[7.x] Implement evaluation API for multiclass classification problem (#47126) (#47343) 2019-10-04 17:54:51 +02:00
Przemysław Witek ec9b77deaa
[7.x] Implement new analysis type: classification (#46537) (#47559) 2019-10-04 13:47:19 +02:00
Alpar Torok 0a14bb174f Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 11:55:00 +03:00
Lee Hinman 2e3eb4b24e
Add API to execute SLM retention on-demand (#47405) (#47463)
* Add API to execute SLM retention on-demand (#47405)

This is a backport of #47405

This commit adds the `/_slm/_execute_retention` API endpoint. This
endpoint kicks off SLM retention and then returns immediately.

This in particular allows us to run retention without scheduling it
(for entirely manual invocation) or perform a one-off cleanup.

This commit also includes HLRC for the new API, and fixes an issue
in SLMSnapshotBlockingIntegTests where retention invoked prior to the
test completing could resurrect an index the internal test cluster
cleanup had already deleted.

Resolves #46508
Relates to #43663
2019-10-02 12:29:04 -06:00
Benjamin Trent 2228a7dd8d
[ML][Inference] adding ensemble model objects (#47241) (#47438)
* [ML][Inference] adding ensemble model objects

* addressing PR comments

* Update TreeTests.java

* addressing PR comments

* fixing test
2019-10-02 09:49:46 -04:00
Benjamin Trent f5fe5e7cd6
[7.x] [ML][Inference] Adding preprocessors to definition object (#47320) (#47370)
* [ML][Inference] Adding preprocessors to definition object (#47320)

* [ML][Inference] Adding preprocessors to definition object

* Update TrainedModelConfig.java

* adjusting for backport
2019-10-01 13:31:25 -04:00
Benjamin Trent 4335e07716
[7.x] [ML][Inference] adding .ml-inference* index and storage (#47267) (#47310)
* [ML][Inference] adding .ml-inference* index and storage (#47267)

* [ML][Inference] adding .ml-inference* index and storage

* Addressing PR comments

* Allowing null definition, adding validation tests for model config

* fixing line length

* adjusting for backport
2019-10-01 08:20:33 -04:00
Martijn van Groningen fe937ea4b8
Add config namespace in get policy api response (#47162)
Currently the policy config is placed directly in the json object
of the toplevel `policies` array field. For example:

```
{
    "policies": [
        {
            "match": {
                "name" : "my-policy",
                "indices" : ["users"],
                "match_field" : "email",
                "enrich_fields" : [
                    "first_name",
                    "last_name",
                    "city",
                    "zip",
                    "state"
                ]
            }
        }
    ]
}
```

This change adds a `config` field in each policy json object:

```
{
    "policies": [
        {
            "config": {
                "match": {
                    "name" : "my-policy",
                    "indices" : ["users"],
                    "match_field" : "email",
                    "enrich_fields" : [
                        "first_name",
                        "last_name",
                        "city",
                        "zip",
                        "state"
                    ]
                }
            }
        }
    ]
}
```

This allows us in the future to add other information about policies
in the get policy api response.

The UI will consume this API to build an overview of all policies.
The UI may in the future include additional information about a policy
and the plan is to include that in the get policy api, so that this
information can be gathered in a single api call.

An example of the information that is likely to be added is:
* Last policy execution time
* The status of a policy (executing, executed, unexecuted)
* Information about the last failure if exists
2019-09-30 14:37:23 +02:00
Yannick Welsch 9dc90e41fc Remove "force" version type (#47228)
It's been deprecated long ago and can be removed.

Relates to #20377

Closes #19769
2019-09-30 11:58:34 +02:00
Martijn van Groningen 66f72bcdbc
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-30 08:12:28 +02:00
Rory Hunter 53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
Martijn van Groningen 7ffe2e7e63
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-27 14:42:11 +02:00
Tanguy Leroux 95e2ca741e
Remove unused private methods and fields (#47154)
This commit removes a bunch of unused private fields and unused
private methods from the code base.

Backport of (#47115)
2019-09-26 12:49:21 +02:00
Yogesh Gaikwad 9a64b7a888
[Backport] Validate `query` field when creating roles (#46275) (#47094)
In the current implementation, the validation of the role query
occurs at runtime when the query is being executed.

This commit adds validation for the role query when creating a role
but not for the template query as we do not have the runtime
information required for evaluating the template query (eg. authenticated user's
information). This is similar to the scripts that we
store but do not evaluate or parse if they are valid queries or not.

For validation, the query is evaluated (if not a template), parsed to build the
QueryBuilder and verify if the query type is allowed.

Closes #34252
2019-09-26 17:57:36 +10:00
Benjamin Trent fcddaa90de
[7.x] [ML][Inference] adding tree model (#47044) (#47141)
* [ML][Inference] adding tree model (#47044)

* [ML][Inference] adding tree model

* renaming features for updated schema

* fixing 7.x compilation
2019-09-25 19:11:15 -04:00
Gordon Brown 7ac647c365
Add support for POST requests to SLM Execute API (#47061)
This commit adds support for POST requests to the SLM `_execute` API,
because POST is a more appropriate HTTP verb for this action as it is
not idempotent. The docs are also changed to favor POST over PUT,
although PUT is not removed or officially deprecated.
2019-09-25 16:15:10 -06:00
Gordon Brown a46eef9634
Change SLM stats format (#46991)
Using arrays of objects with embedded IDs is preferred for new APIs over
using entity IDs as JSON keys.  This commit changes the SLM stats API to
use the preferred format.
2019-09-25 11:32:08 -06:00
Benjamin Trent 05fb7be571
[7.x] [ML][Inference] Feature pre-processing objects and functions (#46777) (#47040)
* [ML][Inference] Feature pre-processing objects and functions (#46777)

To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future. 

This PR lays down the ground work for 3 basic encodings:

* HotOne
* Target Mean
* Frequency

More feature encodings or pre-processings could be added in the future:

* Handling missing columns
* Standardization
* Label encoding
* etc....

* fixing compilation for namedxcontent tests
2019-09-25 08:16:24 -04:00
Hendrik Muhs e974f178b5 [Transform] rename data frame transform to transform for hlrc client (#46933)
rename data frame transform to transform for hlrc
2019-09-25 08:31:43 +02:00
maidoo 618efcfcf9
Add submitDeleteByQueryTask method to RestHighLevelClient (#46833)
The HLRC has a method for reindex, that allows to trigger an async reindex by running RestHighLevelClient.submitReindexTask and RestHighLevelClient.reindex. The delete by query however only has an RestHighLevelClient.deleteByQuery method (and its async counterpart), but no RestHighLevelClient.submitDeleteByQueryTask. So add RestHighLevelClient.submitDeleteByQueryTask

Closes #46395
2019-09-24 10:10:55 +02:00
Martijn van Groningen 084bda6d32
fixed hlrc method signatures 2019-09-23 11:14:07 +02:00
Martijn van Groningen 0cfddca61d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-23 09:46:05 +02:00
Martijn van Groningen 363beefbcd
Change HLRC count request to accept a QueryBuilder (#46904)
Currently the CountRequest accepts a search source builder, while the
RestCountAction only accepts a top level query object. This can lead
to confusion if another element (e.g. aggregations) is specified,
because that will be ignored on the server side in RestCountAction.

By deprecating the current setter & constructor that accept a
SearchSourceBuilder and adding replacement that accepts a QueryBuilder
it is clear what the count api can handle from HLRC side.

Follow up from #46829
2019-09-23 08:43:07 +02:00
Lisa Cawley 875d864be6
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00
Andrei Dan 402fb5e882
Fix flaky test addSnapshotLifecyclePolicy (#46881) (#46912)
* addSnapshotLifecyclePolicy drop version assertion

This drops the assertion on the policy version (which was pinned to 1L)
as we want to execute both put policy apis (sync and async) for
documentation purposes. This will sometimes (depending on the async
call) yield a version of 2L. Waiting for the async call to always
complete could be an option but the test is already rather slow and it's
a bit of an overkill as we're already verifying the policy was created.

(cherry picked from commit af4864c39129bcdbf98d00223f445346a62075e4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-09-20 15:07:54 +01:00
Martijn van Groningen d82712417f
[HLRC] Send min_score as query string parameter to the count API (#46829)
Prior to this commit min_score was sent as request body parameter
(via SearchSourceBuilder), which is not possible in the count api.

Similar to #46474
2019-09-20 10:23:45 +02:00
James Rodewig b73a9604c1
[DOCS] Separate and reformat synced flush API docs (#46634) (#46839) 2019-09-19 08:22:37 -04:00
Turaç Kangal d1f47cf00e
[HLRC] Send terminate_after as query string parameter to the count API (#46474)
Prior to this commit terminate_after was sent as request body parameter
(via SearchSourceBuilder), which is not possible in the count api.

Closes #46446
2019-09-18 15:54:43 +02:00
Przemysław Witek e49be611ad
[7.x] Add audit messages for Data Frame Analytics (#46521) (#46738) 2019-09-16 21:21:38 +02:00
Hendrik Muhs c8f52ec4ff
[Transform] Rename data frame plugin to transform: classes in xpack.core (#46644) (#46734)
rename classes in xpack.core of transform plugin from "data frame transform" to "transform"
2019-09-16 13:39:22 +02:00
Henning Andersen c99adfb99a Relax ReindexIT test timeouts (#46312)
A couple of tests used 2 second timeout, which was prone to failure,
increased to 10 seconds.

Closes #46301
2019-09-16 09:30:10 +02:00
Luca Cavanna e57756492a Update http-core and http-client dependencies (#46549)
Relates to #45808
Closes #45577
2019-09-12 09:45:29 +02:00
Jilles van Gurp 60f40d7638 Expose the ability to cancel async requests in REST high-level client (#45688)
This commits makes all the async methods in the high level client return the `Cancellable` object that the low level client now exposes.

Relates to #45379 
Closes #44802
2019-09-12 09:45:29 +02:00
Luca Cavanna cfb186afaf Add support for cancelling async requests in low-level REST client (#45379)
The low-level REST client exposes a `performRequestAsync` method that
allows to send async requests, but today it does not expose the ability
to cancel such requests. That is something that the underlying apache
async http client supports, and it makes sense for us to expose.

This commit adds a return value to the `performRequestAsync` method,
which is backwards compatible. A `Cancellable` object gets returned,
which exposes a `cancel` public method. When calling `cancel`, the
on-going request associated with the returned `Cancellable` instance
will be cancelled by calling its `abort` method. This works throughout
multiple retries, though some special care was needed for the case where
`cancel` is called between different attempts (when one attempt has
failed and the consecutive one has not been sent yet).

Note that cancelling a request on the client side does not automatically 
translate to cancelling the server side execution of it. That needs to be 
specifically implemented, which is on the work for the search API (see #43332).

Relates to #44802
2019-09-12 09:45:28 +02:00
Hendrik Muhs efea581dcc
[7.x][Transform]Rename data frame plugin to transform: plugin and package names (#46583)
rename data frame transform plugin to transform:

 - rename plugin data-frame to transform
 - change all package names from o.e.*.dataframe.* to o.e.*.transform.*
 - necessary changes to fix loading/testing
2019-09-11 14:50:08 +02:00
Martijn van Groningen 60ad099178
Add HLRC support for enrich get policy API. (#45970)
Changed the signature of AbstractResponseTestCase#createServerTestInstance(...)
to include the randomly selected xcontent type. This is needed for the
creating a server response instance with a query which is represented as BytesReference.
Maybe this should go into a different change?

This PR also includes HLRC docs for the get policy api.

Relates to #32789
2019-09-11 14:42:50 +02:00
Lee Hinman cdc3a260af
Add retention to Snapshot Lifecycle Management (backport of #4… (#46506)
* Add retention to Snapshot Lifecycle Management (#46407)

This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown

* Update serialization versions

* Fix type inference

* Use non-Cancellable HLRC return value

* Fix Client mocking in test

* Fix SLMSnapshotBlockingIntegTests for 7.x branch

* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval

* Add serialization guards for SnapshotLifecyclePolicy
2019-09-10 09:08:09 -06:00
Henning Andersen 7125c101e6 HLRC multisearchTemplate forgot params (#46492)
Since 7.3, the request converter for multiSearchTemplate would silently
not set the two request parameters `typed_keys` and
`max_concurrent_searches`.

Closes #46488
2019-09-10 08:47:08 +02:00
Martijn van Groningen ded98e50b7
Change exact match processor to match processor. (#46041)
Besides a rename, this changes allows to processor to attach multiple
enrich docs to the document being ingested.

Also in order to control the maximum number of enrich docs to be
included in the document being ingested, the `max_matches` setting
is added to the enrich processor.

Relates #32789
2019-09-04 18:05:12 +02:00
Martijn van Groningen 555b630160
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-02 09:16:55 +02:00