Commit Graph

576 Commits

Author SHA1 Message Date
Martijn van Groningen a4b0f66919
Add enrich stats api (#46462)
The enrich api returns enrich coordinator stats and
information about currently executing enrich policies.

The coordinator stats include per ingest node:
* The current number of search requests in the queue.
* The total number of outstanding remote requests that
  have been executed since node startup. Each remote
  request is likely to include multiple search requests.
  This depends on how much search requests are in the
  queue at the time when the remote request is performed.
* The number of current outstanding remote requests.
* The total number of search requests that `enrich`
  processors have executed since node startup.

The current execution policies stats include:
* The name of policy that is executing
* A full blow task info object that is executing the policy.

Relates to #32789
2019-09-11 13:40:24 +02:00
Przemysław Witek e38e631dac
[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967) (#46519) 2019-09-11 12:17:26 +02:00
Lee Hinman cdc3a260af
Add retention to Snapshot Lifecycle Management (backport of #4… (#46506)
* Add retention to Snapshot Lifecycle Management (#46407)

This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown

* Update serialization versions

* Fix type inference

* Use non-Cancellable HLRC return value

* Fix Client mocking in test

* Fix SLMSnapshotBlockingIntegTests for 7.x branch

* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval

* Add serialization guards for SnapshotLifecyclePolicy
2019-09-10 09:08:09 -06:00
Martijn van Groningen c057fce978
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-09 08:40:54 +02:00
David Roberts 7c7fb7e32d [ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432)
ML users who upgrade from versions prior to 7.4 to 7.4 or later
will have ML results indices that do not have mappings for the
total_search_time_ms field.  Therefore, when searching these
indices we must tolerate this field not having a mapping.

Fixes #46437
2019-09-06 14:31:15 +01:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors and while computing the vector function. (#46103) 
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Martijn van Groningen 555b630160
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-02 09:16:55 +02:00
Zachary Tong cf8a4171e1
Rename `data-science` plugin to `analytics` (#46133)
Rename `data-science` plugin to `analytics`. 
Also removes enabled flag. Backport of #46092
2019-08-29 12:45:39 -04:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
2019-08-28 11:03:14 -07:00
Martijn van Groningen 1157224a6b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-28 10:14:07 +02:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Yogesh Gaikwad 7b6246ec67
Add `manage_own_api_key` cluster privilege (#45897) (#46023)
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. are too permissive and
we would want finer-grained control over the cluster privileges
for API keys. Previously APIs created would also need these
privileges to get its own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.

The changes cover HLRC changes, documentation for the API changes.

Closes #40031
2019-08-28 00:44:23 +10:00
Dimitris Athanasiou dd6c13fdf9
[ML] Add description to DF analytics (#45774) (#46019) 2019-08-27 15:48:59 +03:00
Albert Zaharovits 1ebee5bf9b
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
2019-08-27 14:42:46 +03:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 16:19:55 -04:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Martijn van Groningen 837cfa2640
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-23 11:22:27 +02:00
Nhat Nguyen 3393f9599e
Ignore translog retention policy if soft-deletes enabled (#45473)
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.

Backport of #45473
Relates #45136
2019-08-22 16:40:06 -04:00
Przemysław Witek 7512337922
[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775) (#45825) 2019-08-22 11:14:26 +02:00
Martijn van Groningen 7f2ba91360
adjusted enrich rest specs to new format 2019-08-21 14:42:10 +02:00
Martijn van Groningen 2677ac14d2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-21 14:28:17 +02:00
Przemysław Witek bf701b83d2
Shorten field names in EstimateMemoryUsageResponse (#45719) (#45772) 2019-08-21 12:45:09 +02:00
Przemysław Witek c6709f0979
Mute tests affected by renaming fields in Estimate memory usage response (#45743) (#45766) 2019-08-21 09:57:23 +02:00
Benjamin Trent ba7b677618
[ML] better handle empty results when evaluating regression (#45745) (#45759)
* [ML] better handle empty results when evaluating regression

* adding new failure test to ml_security black list

* fixing equality check for regression results
2019-08-20 17:37:04 -05:00
Michael Basnight e3373d349b Consolidate enrich list all and get by name APIs (#45705)
The get and list APIs are a single API in this commit. Whether
requesting one named policy or all policies, a list of policies is
returened. The list API code has all been removed and the GET api is
what remains, which contains much of the list response code.
2019-08-20 10:29:59 -05:00
Przemysław Witek 80dd0a0948
Get rid of EstimateMemoryUsageRequest and EstimateMemoryUsageAction.Request. (#45718) (#45725) 2019-08-20 15:49:17 +02:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Martijn van Groningen 5ea0985711
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-16 09:47:11 +02:00
Benjamin Trent 0c343d8443
[7.x] [ML][Transforms] adjusting stats.progress for cont. transforms (#45361) (#45551)
* [ML][Transforms] adjusting stats.progress for cont. transforms (#45361)

* [ML][Transforms] adjusting stats.progress for cont. transforms

* addressing PR comments

* rename fix

* Adjusting bwc serialization versions
2019-08-14 13:08:27 -05:00
Przemysław Witek df574e5168
[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) (#45510) 2019-08-14 08:26:03 +02:00
Martijn van Groningen 1951cdf1cb
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-13 09:12:31 +02:00
Dimitris Athanasiou 27497ff75f
[7.x][ML] Add regression analysis to DF analytics (#45292) (#45388)
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.

This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
2019-08-09 19:31:13 +03:00
David Roberts 14545f8958
[ML-DataFrame] Combine task_state and indexer_state in _stats (#45324)
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:

- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state

Backport of #45276
2019-08-08 16:24:26 +01:00
Martijn van Groningen 708f856940
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-08 16:52:45 +02:00
Benjamin Trent 5db9982f71
[7.x] [ML][Data Frame] Add update transform api endpoint (#45154) (#45279)
* [ML][Data Frame] Add update transform api endpoint (#45154)

This adds the ability to `_update` stored data frame transforms. All mutable fields are applied when the next checkpoint starts. The exception being `description`.

This PR contains all that is necessary for this addition:
* HLRC
* Docs
* Server side
2019-08-07 10:37:35 -05:00
Zachary Tong 422aca9a5d Fix Rollup job creation to work with templates (#43943)
The PutJob API accidentally used an "expert" API of CreateIndexRequest.
That API is semi-lenient to syntax; a type could be omitted and the
request would work as expected.  But if a type was omitted it would
not merge with templates correctly, leading to index creation that
only has the template and not the requested mappings in the request.

This commit refactors the PutJob API to:

- Include the type name
- Use a less "expert" API in an attempt to future proof against errors
- Uses an XContentBuilder instead of string replacing, removes json template
2019-08-06 10:53:44 -04:00
Tomas Della Vedova 6b71621afc
Updated slm API spec parameters and URL (#44797) (#45102) 2019-08-02 11:39:52 +02:00
Dimitris Athanasiou 8a6675b994
[7.x][ML] Check dest index is empty when starting DF analytics (#45094) (#45112)
If one tries to start a DF analytics job that has already run,
the result will be that the task will fail after reindexing the
dest index from the source index. The results of the prior run
will be gone and the task state is not properly set to failed
with the failure reason.

This commit improves the behavior in this scenario. First, we
set the task state to `failed` in a set of failures that were
missed. Second, a validation is added that if the destination
index exists, it must be empty.
2019-08-02 00:19:48 +03:00
Martijn van Groningen aae2f0cff2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-01 13:38:03 +07:00
Mayya Sharipova 0c68765088
Adds usage stats for vectors (#45023)
Example of usage:

_xpack/usage

"vectors": {
    "available": true,
    "enabled": true,
    "dense_vector_fields_count" : 1,
    "sparse_vector_fields_count" : 1,
    "dense_vector_dims_avg_count" : 100
}
Backport for #44512
2019-07-31 12:32:41 -04:00
Benjamin Trent 3f48720d41
[ML][Data Frames] unify validation exceptions between PUT/_preview (#44983) (#45012)
* [ML][Data Frames] unify validation exceptions between PUT/_preview

* addressing PR comments
2019-07-30 13:05:07 -05:00
Martijn van Groningen db49cb505e
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-29 14:45:10 +07:00
Gordon Brown d4b2d21339
Add option to filter ILM explain response (#44777)
In order to make it easier to interpret the output of the ILM Explain
API, this commit adds two request parameters to that API:

- `only_managed`, which causes the response to only contain indices
  which have `index.lifecycle.name` set
- `only_errors`, which causes the response to contain only indices in an
  ILM error state

"Error state" is defined as either being in the `ERROR` step or having
`index.lifecycle.name` set to a policy that does not exist.
2019-07-26 11:57:38 -04:00
James Baiera c5528a25e6 Merge branch '7.x' into enrich-7.x 2019-07-25 13:12:56 -04:00
Andrei Stefan 2633d11eb7
Switch from using docvalue_fields to extracting values from _source (#44062) (#44804)
* Switch from using docvalue_fields to extracting values from _source
where applicable. Doing this means parsing the _source and handling the
numbers parsing just like Elasticsearch is doing it when it's indexing
a document.
* This also introduces a minor limitation: aliases type of fields that
are NOT part of a tree of sub-fields will not be able to be retrieved
anymore. field_caps API doesn't shed any light into a field being an
alias or not and at _source parsing time there is no way to know if a
root field is an alias or not. Fields of the type "a.b.c.alias" can be
extracted from docvalue_fields, only if the field they point to can be
extracted from docvalue_fields. Also, not all fields in a hierarchy of
fields can be evaluated to being an alias.

(cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)
2019-07-25 10:02:41 +03:00
Przemysław Witek 26da573e94
[ML] [7.x] Only emit deprecation warning if there was actual change of a datafeed's job_id. (#44755)
* Only emit deprecation warning if there was actual change of a datafeed's job_id.
* Add @Deprecated annotation to DatafeedUpdate.Builder#setJobId method
2019-07-24 10:03:25 +02:00
David Roberts caf9411a72
[ML] Improve response format of data frame stats endpoint (#44743)
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.

This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.

Backport of #44350
2019-07-23 18:00:50 +01:00
Przemysław Witek 16c8e18013
Deprecate the ability to update datafeed's job_id. (#44691) (#44742) 2019-07-23 14:48:56 +02:00
Benjamin Trent 4456850a8e
[7.x] [ML][Data Frame] Add optional defer_validation param to PUT (#44455) (#44697)
* [ML][Data Frame] Add optional defer_validation param to PUT (#44455)

* [ML][Data Frame] Add optional defer_validation param to PUT

* addressing PR comments

* reverting bad replace

* addressing pr comments

* Update put-transform.asciidoc

* Update put-transform.asciidoc

* Update put-transform.asciidoc

* adjusting for backport

* fixing imports

* [DOCS] Fixes formatting in  create data frame transform API
2019-07-22 15:12:55 -05:00
Benjamin Trent 06e21f7902
[7.x] [ML][Data Frame] adding force delete (#44590) (#44696)
* [ML][Data Frame] adding force delete (#44590)

* [ML][Data Frame] adding force delete

* Update delete-transform.asciidoc

* adjusting for backport
2019-07-22 13:13:25 -05:00
Yannick Welsch d98b3e4760
Move frozen indices to x-pack module (#44490)
Backport of #44408 and #44286.
2019-07-17 16:53:10 +02:00
Benjamin Trent 2c7ff812da
[ML] Add r_squared eval metric to regression (#44248) (#44378)
* [ML] Add r_squared eval metric to regression

* fixing tests and binarysoftclassification class

* Update RSquared.java

* Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

Co-Authored-By: David Kyle <david.kyle@elastic.co>

* removing unnecessary debug test
2019-07-16 11:11:31 -05:00
Lee Hinman fb0461ac76
[7.x] Add Snapshot Lifecycle Management (#44382)
* Add Snapshot Lifecycle Management (#43934)

* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies defined schedule.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change changes it from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

Including the `"modified_date"` field when the `?human` field is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly trigged SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retriving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit add HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

*  Add Getting Started Guide for SLM  (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of which SLM policy in the metadata field of the Snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed

* Fix compilation for backport

* Fix get snapshot output parsing in test

* [DOCS] Add redirects for removed autogen anchors (#44380)

* Switch <tt>...</tt> in javadocs for {@code ...}
2019-07-16 07:37:13 -06:00
Przemysław Witek 3f3a3d3f2b
[7.x] Add DatafeedTimingStats.average_search_time_per_bucket_ms and TimingStats.total_bucket_processing_time_ms stats (#44125) (#44404) 2019-07-16 12:51:29 +02:00
Yannick Welsch a848fc9bf4 Revert "Add usage stats for frozen indices (#44286)"
This reverts commit 5e73c49ec8.
2019-07-15 21:41:25 +02:00
Yannick Welsch 5e73c49ec8 Add usage stats for frozen indices (#44286)
Adds usage stats for frozen indices of the form:

  "frozen_indices" : {
    "available" : true,
    "enabled" : true,
    "indices_count" : 0
  }
2019-07-15 17:34:46 +02:00
Benjamin Trent 79c62fd724
[ML][Data Frame] Fixing default delay set in timesync (#44281) (#44293)
* [ML][Data Frame] Fixing default delay set in timesync

* disallowing explicit null, don't do duration check on write
2019-07-12 15:21:47 -05:00
Mayya Sharipova 32cb47b91c Add l1norm and l2norm distances for vectors (#44116)
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947
2019-07-11 14:30:02 -04:00
Benjamin Trent c82d9c5b50
[ML] Adds support for regression.mean_squared_error to eval API (#44140) (#44218)
* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests
2019-07-11 09:22:52 -05:00
Przemysław Witek 44781e415e
[7.x] [ML] Add DatafeedTimingStats to datafeed GetDatafeedStatsAction.Response (#43045) (#44118) 2019-07-10 11:51:44 +02:00
David Roberts cb62d4acdf [ML-DataFrame] Add a frequency option to transform config, default 1m (#44120)
Previously a data frame transform would check whether the
source index was changed every 10 seconds. Sometimes it
may be desirable for the check to be done less frequently.
This commit increases the default to 60 seconds but also
allows the frequency to be overridden by a setting in the
data frame transform config.
2019-07-10 09:59:00 +01:00
Dimitris Athanasiou d3ddedf9fc
[7.x][ML] Add missing doc links to df-analytics rest spec and HLRC javadocs (#44025) (#44033) 2019-07-06 02:03:29 +03:00
Mayya Sharipova 37e1ad7062 Forbid empty doc values on vector functions (#43944)
Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
2019-07-05 18:09:06 -04:00
Dimitris Athanasiou 30b20920b9
[7.x][ML] Report correct count for df-analytics get-stats API (#43969) (#43981)
The count should match the number of all df-analytics that
matched the id in the request. However, we set the count
to the number of df-analytics returned which was bound to the
`size` parameter. This commit fixes this by setting the count
to the count of the `get` response.
2019-07-05 10:28:57 +03:00
Martijn van Groningen 1dd3d14f09
take into account `manage_enrich` builtin role 2019-07-04 16:51:48 +02:00
Martijn van Groningen 653f1436a0
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-04 13:05:10 +02:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00
Alexander Reelsen 9077c4402f Watcher: Allow to execute actions for each element in array (#41997)
This adds the ability to execute an action for each element that occurs
in an array, for example you could sent a dedicated slack action for
each search hit returned from a search.

There is also a limit for the number of actions executed, which is
hardcoded to 100 right now, to prevent having watches run forever.

The watch history logs each action result and the total number of actions
the were executed.

Relates #34546
2019-07-03 11:28:50 +02:00
Tim Vernum 2a8f30eb9a
Support builtin privileges in get privileges API (#43901)
Adds a new "/_security/privilege/_builtin" endpoint so that builtin
index and cluster privileges can be retrieved via the Rest API

Backport of: #42134
2019-07-03 19:08:28 +10:00
Mayya Sharipova 756c42f99f
Add dims parameter to dense_vector mapping (#43444) (#43895)
Typically, dense vectors of both documents and queries must have the same
number of dimensions. Different number of dimensions among documents
or query vector indicate an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions.
2019-07-02 21:14:16 -04:00
Benjamin Trent fb825a6470
[7.x] [ML][Data Frame] add node attr to GET _stats (#43842) (#43894)
* [ML][Data Frame] add node attr to GET _stats (#43842)

* [ML][Data Frame] add node attr to GET _stats

* addressing testing issues with node.attributes

* adjusting for backport
2019-07-02 19:35:37 -05:00
Tim Vernum 8d099dad38
Add "manage_api_key" cluster privilege (#43865)
This adds a new cluster privilege for manage_api_key. Users with this
privilege are able to create new API keys (as a child of their own
user identity) and may also get and invalidate any/all API keys
(including those owned by other users).

Backport of: #43728
2019-07-02 21:57:42 +10:00
Benjamin Trent 82c1ddc117
[7.x] [ML][Data Frame] Add deduced mappings to _preview response payload (#43742) (#43849)
* [ML][Data Frame] Add deduced mappings to _preview response payload (#43742)

* [ML][Data Frame] Add deduced mappings to _preview response payload

* updating preview docs

* fixing code for backport
2019-07-02 06:52:14 -05:00
Tanguy Leroux b977f019b8
Expose translog stats in ReadOnlyEngine (#43752) (#43823)
Backport of #43752 for 7.x.
2019-07-02 13:39:00 +02:00
Tomas Della Vedova 4cdb24bceb
Use explicit string keys in data_frame test (#43854) 2019-07-02 11:06:29 +02:00
Julie Tibshirani ffa5919d7c
Add support for 'flattened object' fields. (#43762)
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
2019-07-01 12:08:50 +03:00
Hendrik Muhs a58d231f4d relax trigger count for transform stats test (#43753)
relax trigger count test as we can not guarantee it due to async behaviour
2019-07-01 10:30:40 +02:00
Martijn van Groningen eb8e03bc8b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-30 21:32:51 +02:00
Dimitris Athanasiou 86c853a7c2
[7.x][ML] Rename outlier score setting to feature_influence_threshold (#43705) (#43734)
Renames outlier score setting `minimum_score_to_write_feature_influence`
to `feature_influence_threshold`.
2019-06-28 13:28:25 +03:00
Dimitris Athanasiou cab879118d
[7.x][ML] Support multiple source indices for df-analytics (#43702) (#43731)
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exists an error will be
returned.

To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
2019-06-28 13:28:03 +03:00
Christoph Büscher 2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Przemysław Witek 94f18da5df
Add version and create_time to data frame analytics config (#43683) (#43712) 2019-06-28 07:37:21 +02:00
David Roberts f39619d182 [ML] Don't write timing stats on no-op (#43680)
Similar to elastic/ml-cpp#512, if a job opens and closes and
does nothing in between we shouldn't write timing stats to
the results index.
2019-06-27 16:37:54 +01:00
Martijn van Groningen 683e116601
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-27 08:35:37 +02:00
Benjamin Trent d05593c3ad
[ML][Data Frame] adds tests for continuous DF (#43601) (#43654) 2019-06-26 14:59:19 -05:00
Benjamin Trent 52e26bbc42
[ML][Data Frame] improve pivot nested field validations (#43548) (#43636)
* [ML][Data Frame] improve pivot nested field validations

* addressing pr comments
2019-06-26 13:35:51 -05:00
Benjamin Trent c121b00c98
[7.x] [ML][Data Frame] Add support for allow_no_match for endpoints (#43490) (#43637)
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)

* [ML][Data Frame] Add support for allow_no_match parameter in endpoints

Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms

* Update DataFrameTransformDocumentationIT.java
2019-06-26 10:09:56 -05:00
Yannick Welsch 2049f715b3 Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
2019-06-26 08:07:56 +02:00
Tanguy Leroux 0dc1c12f13
Fix indices shown in _cat/indices (#43286)
After two recent changes (#38824 and #33888), the _cat/indices API
no longer report information for active recovering indices and
non-replicated closed indices. It also misreport replicated closed
indices that are potentially not authorized for the user.

This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
 information about the indices.

Closes #39933
2019-06-25 20:02:34 +02:00
Dimitris Athanasiou 126c2fd2d5
[7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
Martijn van Groningen df9f06213d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-21 19:58:04 +02:00
Benjamin Trent f4b75d6d14
[7.x] [ML][Data Frame] Add version and create_time to transform config (#43384) (#43480)
* [ML][Data Frame] Add version and create_time to transform config (#43384)

* [ML][Data Frame] Add version and create_time to transform config

* s/transform_version/version s/Date/Instant

* fixing getter/setter for version

* adjusting for backport
2019-06-21 09:11:44 -05:00
Martijn van Groningen 9de4e878f7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-20 09:44:31 +02:00
Benjamin Trent 77ce3260dd
[ML][Data Frame] make response.count be total count of hits (#43241) (#43389)
* [ML][Data Frame] make response.count be total count of hits

* addressing line length check

* changing response count for filters

* adjusting serialization, variable name, and total count logic

* making count mandatory for creation
2019-06-19 16:19:06 -05:00
Benjamin Trent b333ced5a7
[7.x] [ML][Data Frame] adds new pipeline field to dest config (#43124) (#43388)
* [ML][Data Frame] adds new pipeline field to dest config (#43124)

* [ML][Data Frame] adds new pipeline field to dest config

* Adding pipeline support to _preview

* removing unused import

* moving towards extracting _source from pipeline simulation

* fixing permission requirement, adding _index entry to doc

* adjusting for java 8 compatibility

* adjusting bwc serialization version to 7.3.0
2019-06-19 16:18:27 -05:00
Mayya Sharipova aa6248d4d7
Move dense_vector and sparse_vector to module (#43280) (#43333) 2019-06-18 11:56:04 -04:00
Martijn Laarman 8b1b9f8ab9
Introduce stability description to the REST API specification (#38413) (#43278)
* introduce state to the REST API specification

* change state over to stability

* CCR is no GA updated to stable

* SQL is now GA so marked as stable

* Introduce `internal` as state for API's, marks stable in terms of lifetime but unstable in terms of guarantees on its output format since it exposes internal representations

* make setting a wrong stability value, or not setting it at all an error that causes the YAML test suite to fail

* update spec files to be explicit about their stability state

* Document the fact that stability needs to be defined

Otherwise the YAML test runner will fail (with a nice exception message)

* address check style violations

* update rest spec unit tests to include stability

* found one more test spec file not declaring stability, made sure stability appears after documentation everywhere

* cluster.state is stable, mark response in some way to denote its a key value format that can be changed during minors

* mark data frame API's as beta

* remove internal and private as states for an API

* removed the wrong enum values in the Stability Enum in the previous commit

(cherry picked from commit 61c34bbd92f8f7e5f22fa411c6b682b0ebd8a99d)
2019-06-17 16:57:13 +02:00
Przemysław Witek b2613a123d
[7.x] Report exponential_avg_bucket_processing_time which gives more weight to recent buckets (#43189) (#43263) 2019-06-17 08:58:26 +02:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Martijn van Groningen c8e6474eef
Changes required for merging in 7.x branch. 2019-06-13 16:58:27 +02:00