Commit Graph

733 Commits

Author SHA1 Message Date
Hendrik Muhs e974f178b5 [Transform] rename data frame transform to transform for hlrc client (#46933)
rename data frame transform to transform for hlrc
2019-09-25 08:31:43 +02:00
Martijn van Groningen 0cfddca61d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-23 09:46:05 +02:00
Lisa Cawley 875d864be6
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00
Hendrik Muhs abe889af75
[7.5][Transform] rename classes in transform plugin (#46867)
rename classes and settings in transform plugin, provide BWC for old settings
2019-09-20 10:43:00 +02:00
Benjamin Trent 9cf9c64ec2
[7.x] [ML][Transforms] remove `force` flag from _start (#46414) (#46748)
* [ML][Transforms] remove `force` flag from _start (#46414)

* [ML][Transforms] remove `force` flag from _start

* fixing expected error message

* adjusting bwc version
2019-09-18 10:06:05 -04:00
Tomas Della Vedova e1cf103980
Fixes for API specification (#46522) (#46736)
Follow-up of #42346
2019-09-17 11:49:24 +02:00
Benjamin Trent 92acc732de
[ML][Transform] Use field caps for mapping deduction (#46703) (#46742) 2019-09-16 10:05:55 -04:00
Martijn van Groningen a4b0f66919
Add enrich stats api (#46462)
The enrich api returns enrich coordinator stats and
information about currently executing enrich policies.

The coordinator stats include per ingest node:
* The current number of search requests in the queue.
* The total number of outstanding remote requests that
  have been executed since node startup. Each remote
  request is likely to include multiple search requests.
  This depends on how many search requests are in the
  queue at the time the remote request is performed.
* The number of current outstanding remote requests.
* The total number of search requests that `enrich`
  processors have executed since node startup.

The currently executing policy stats include:
* The name of the policy that is executing
* A full-blown task info object for the task that is executing the policy.

Relates to #32789
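
For illustration, a sketch of what calling the stats API might look like, assuming the `GET /_enrich/_stats` endpoint shape; the field names and values shown are illustrative:

```
GET /_enrich/_stats

{
  "executing_policies": [],
  "coordinator_stats": [
    {
      "node_id": "node-1",
      "queue_size": 0,
      "remote_requests_current": 0,
      "remote_requests_total": 12,
      "executed_searches_total": 124
    }
  ]
}
```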
2019-09-11 13:40:24 +02:00
Przemysław Witek e38e631dac
[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967) (#46519) 2019-09-11 12:17:26 +02:00
Lee Hinman cdc3a260af
Add retention to Snapshot Lifecycle Management (backport of #4… (#46506)
* Add retention to Snapshot Lifecycle Management (#46407)

This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.
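
For example, a minimal sketch of adjusting these dynamic settings through the cluster settings API (the values shown are illustrative):

```
PUT /_cluster/settings
{
  "persistent": {
    "slm.retention_schedule": "0 30 1 * * ?",
    "slm.retention_duration": "1h"
  }
}
```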

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That endpoint returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deleting snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and persisted in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete; however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as an ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown

* Update serialization versions

* Fix type inference

* Use non-Cancellable HLRC return value

* Fix Client mocking in test

* Fix SLMSnapshotBlockingIntegTests for 7.x branch

* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval

* Add serialization guards for SnapshotLifecyclePolicy
2019-09-10 09:08:09 -06:00
Martijn van Groningen c057fce978
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-09 08:40:54 +02:00
David Roberts 7c7fb7e32d [ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432)
ML users who upgrade from versions prior to 7.4 to 7.4 or later
will have ML results indices that do not have mappings for the
total_search_time_ms field.  Therefore, when searching these
indices we must tolerate this field not having a mapping.

Fixes #46437
2019-09-06 14:31:15 +01:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors while computing the vector function. (#46103)
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Martijn van Groningen 555b630160
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-02 09:16:55 +02:00
Zachary Tong cf8a4171e1
Rename `data-science` plugin to `analytics` (#46133)
Rename `data-science` plugin to `analytics`. 
Also removes enabled flag. Backport of #46092
2019-08-29 12:45:39 -04:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
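
As a sketch of where the query vector is supplied, assuming the 7.x `script_score` vector function syntax; the index and field names are hypothetical:

```
GET /my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_vector']) + 1.0",
        "params": {
          "query_vector": [0.5, 10.0, 6.0]
        }
      }
    }
  }
}
```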
2019-08-28 11:03:14 -07:00
Martijn van Groningen 1157224a6b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-28 10:14:07 +02:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
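
A minimal sketch of a regression analytics config using the new parameter; the index and field names are hypothetical:

```
PUT _ml/data_frame/analytics/house-prices
{
  "source": { "index": "houses" },
  "dest": { "index": "houses-predictions" },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "training_percent": 80
    }
  }
}
```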
2019-08-27 17:53:11 +03:00
Yogesh Gaikwad 7b6246ec67
Add `manage_own_api_key` cluster privilege (#45897) (#46023)
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. is too permissive and
we want finer-grained control over the cluster privileges
for API keys. Previously, API keys that were created would also need these
privileges to get their own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.

The changes cover HLRC changes, documentation for the API changes.

Closes #40031
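
For illustration, a sketch of a role granting only this privilege and a self-scoped API key lookup; the role name is hypothetical:

```
POST /_security/role/my_api_key_role
{
  "cluster": [ "manage_own_api_key" ]
}

GET /_security/api_key?owner=true
```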
2019-08-28 00:44:23 +10:00
Dimitris Athanasiou dd6c13fdf9
[ML] Add description to DF analytics (#45774) (#46019) 2019-08-27 15:48:59 +03:00
Albert Zaharovits 1ebee5bf9b
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
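
A sketch of the delegation call, assuming the `_security/delegate_pki` endpoint shape; the certificate value is a placeholder:

```
POST /_security/delegate_pki
{
  "x509_certificate_chain": [ "<base64-encoded DER client certificate>" ]
}
```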
2019-08-27 14:42:46 +03:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
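
For illustration, a sketch of the new pipeline aggregation wrapped around a per-day cardinality; the index and field names are hypothetical:

```
GET /website-logs/_search
{
  "size": 0,
  "aggs": {
    "daily": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "distinct_users": { "cardinality": { "field": "user_id" } },
        "total_new_users": {
          "cumulative_cardinality": { "buckets_path": "distinct_users" }
        }
      }
    }
  }
}
```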
2019-08-26 16:19:55 -04:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Martijn van Groningen 837cfa2640
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-23 11:22:27 +02:00
Nhat Nguyen 3393f9599e
Ignore translog retention policy if soft-deletes enabled (#45473)
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.

Backport of #45473
Relates #45136
2019-08-22 16:40:06 -04:00
Przemysław Witek 7512337922
[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775) (#45825) 2019-08-22 11:14:26 +02:00
Martijn van Groningen 7f2ba91360
adjusted enrich rest specs to new format 2019-08-21 14:42:10 +02:00
Martijn van Groningen 2677ac14d2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-21 14:28:17 +02:00
Przemysław Witek bf701b83d2
Shorten field names in EstimateMemoryUsageResponse (#45719) (#45772) 2019-08-21 12:45:09 +02:00
Przemysław Witek c6709f0979
Mute tests affected by renaming fields in Estimate memory usage response (#45743) (#45766) 2019-08-21 09:57:23 +02:00
Benjamin Trent ba7b677618
[ML] better handle empty results when evaluating regression (#45745) (#45759)
* [ML] better handle empty results when evaluating regression

* adding new failure test to ml_security black list

* fixing equality check for regression results
2019-08-20 17:37:04 -05:00
Michael Basnight e3373d349b Consolidate enrich list all and get by name APIs (#45705)
The get and list APIs are a single API in this commit. Whether
requesting one named policy or all policies, a list of policies is
returened. The list API code has all been removed and the GET api is
what remains, which contains much of the list response code.
2019-08-20 10:29:59 -05:00
Przemysław Witek 80dd0a0948
Get rid of EstimateMemoryUsageRequest and EstimateMemoryUsageAction.Request. (#45718) (#45725) 2019-08-20 15:49:17 +02:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API specification in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is, e.g., encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Martijn van Groningen 5ea0985711
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-16 09:47:11 +02:00
Benjamin Trent 0c343d8443
[7.x] [ML][Transforms] adjusting stats.progress for cont. transforms (#45361) (#45551)
* [ML][Transforms] adjusting stats.progress for cont. transforms (#45361)

* [ML][Transforms] adjusting stats.progress for cont. transforms

* addressing PR comments

* rename fix

* Adjusting bwc serialization versions
2019-08-14 13:08:27 -05:00
Przemysław Witek df574e5168
[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) (#45510) 2019-08-14 08:26:03 +02:00
Martijn van Groningen 1951cdf1cb
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-13 09:12:31 +02:00
Dimitris Athanasiou 27497ff75f
[7.x][ML] Add regression analysis to DF analytics (#45292) (#45388)
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.

This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
2019-08-09 19:31:13 +03:00
David Roberts 14545f8958
[ML-DataFrame] Combine task_state and indexer_state in _stats (#45324)
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:

- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state

Backport of #45276
2019-08-08 16:24:26 +01:00
Martijn van Groningen 708f856940
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-08 16:52:45 +02:00
Benjamin Trent 5db9982f71
[7.x] [ML][Data Frame] Add update transform api endpoint (#45154) (#45279)
* [ML][Data Frame] Add update transform api endpoint (#45154)

This adds the ability to `_update` stored data frame transforms. All mutable fields are applied when the next checkpoint starts, with the exception of `description`. A sketch of an update request follows the list below.

This PR contains all that is necessary for this addition:
* HLRC
* Docs
* Server side
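
A minimal sketch of an update call against the `_data_frame` namespace used at the time; the transform id and field value are hypothetical:

```
POST _data_frame/transforms/my-transform/_update
{
  "description": "Hourly rollup of page views"
}
```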
2019-08-07 10:37:35 -05:00
Zachary Tong 422aca9a5d Fix Rollup job creation to work with templates (#43943)
The PutJob API accidentally used an "expert" API of CreateIndexRequest.
That API is semi-lenient to syntax; a type could be omitted and the
request would work as expected.  But if a type was omitted it would
not merge with templates correctly, leading to index creation that
only has the template and not the requested mappings in the request.

This commit refactors the PutJob API to:

- Include the type name
- Use a less "expert" API in an attempt to future proof against errors
- Uses an XContentBuilder instead of string replacing, removes json template
2019-08-06 10:53:44 -04:00
Tomas Della Vedova 6b71621afc
Updated slm API spec parameters and URL (#44797) (#45102) 2019-08-02 11:39:52 +02:00
Dimitris Athanasiou 8a6675b994
[7.x][ML] Check dest index is empty when starting DF analytics (#45094) (#45112)
If one tries to start a DF analytics job that has already run,
the result will be that the task will fail after reindexing the
dest index from the source index. The results of the prior run
will be gone and the task state is not properly set to failed
with the failure reason.

This commit improves the behavior in this scenario. First, we
set the task state to `failed` in a set of failures that were
missed. Second, a validation is added that if the destination
index exists, it must be empty.
2019-08-02 00:19:48 +03:00
Martijn van Groningen aae2f0cff2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-01 13:38:03 +07:00
Mayya Sharipova 0c68765088
Adds usage stats for vectors (#45023)
Example of usage:

_xpack/usage

"vectors": {
    "available": true,
    "enabled": true,
    "dense_vector_fields_count" : 1,
    "sparse_vector_fields_count" : 1,
    "dense_vector_dims_avg_count" : 100
}
Backport for #44512
2019-07-31 12:32:41 -04:00
Benjamin Trent 3f48720d41
[ML][Data Frames] unify validation exceptions between PUT/_preview (#44983) (#45012)
* [ML][Data Frames] unify validation exceptions between PUT/_preview

* addressing PR comments
2019-07-30 13:05:07 -05:00
Martijn van Groningen db49cb505e
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-29 14:45:10 +07:00
Gordon Brown d4b2d21339
Add option to filter ILM explain response (#44777)
In order to make it easier to interpret the output of the ILM Explain
API, this commit adds two request parameters to that API:

- `only_managed`, which causes the response to only contain indices
  which have `index.lifecycle.name` set
- `only_errors`, which causes the response to contain only indices in an
  ILM error state

"Error state" is defined as either being in the `ERROR` step or having
`index.lifecycle.name` set to a policy that does not exist.
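
For example, a sketch of filtering the explain output with the new parameters; the index pattern is hypothetical:

```
GET /logs-*/_ilm/explain?only_managed=true
GET /logs-*/_ilm/explain?only_errors=true
```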
2019-07-26 11:57:38 -04:00
James Baiera c5528a25e6 Merge branch '7.x' into enrich-7.x 2019-07-25 13:12:56 -04:00
Andrei Stefan 2633d11eb7
Switch from using docvalue_fields to extracting values from _source (#44062) (#44804)
* Switch from using docvalue_fields to extracting values from _source
where applicable. Doing this means parsing the _source and handling the
numbers parsing just like Elasticsearch is doing it when it's indexing
a document.
* This also introduces a minor limitation: alias-type fields that
are NOT part of a tree of sub-fields will not be able to be retrieved
anymore. The field_caps API doesn't shed any light on whether a field is an
alias or not, and at _source parsing time there is no way to know if a
root field is an alias or not. Fields of the type "a.b.c.alias" can be
extracted from docvalue_fields, only if the field they point to can be
extracted from docvalue_fields. Also, not all fields in a hierarchy of
fields can be evaluated as being an alias.

(cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)
2019-07-25 10:02:41 +03:00
Przemysław Witek 26da573e94
[ML] [7.x] Only emit deprecation warning if there was actual change of a datafeed's job_id. (#44755)
* Only emit deprecation warning if there was actual change of a datafeed's job_id.
* Add @Deprecated annotation to DatafeedUpdate.Builder#setJobId method
2019-07-24 10:03:25 +02:00
David Roberts caf9411a72
[ML] Improve response format of data frame stats endpoint (#44743)
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.

This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.

Backport of #44350
2019-07-23 18:00:50 +01:00
Przemysław Witek 16c8e18013
Deprecate the ability to update datafeed's job_id. (#44691) (#44742) 2019-07-23 14:48:56 +02:00
Benjamin Trent 4456850a8e
[7.x] [ML][Data Frame] Add optional defer_validation param to PUT (#44455) (#44697)
* [ML][Data Frame] Add optional defer_validation param to PUT (#44455)

* [ML][Data Frame] Add optional defer_validation param to PUT

* addressing PR comments

* reverting bad replace

* addressing pr comments

* Update put-transform.asciidoc

* Update put-transform.asciidoc

* Update put-transform.asciidoc

* adjusting for backport

* fixing imports

* [DOCS] Fixes formatting in  create data frame transform API
2019-07-22 15:12:55 -05:00
Benjamin Trent 06e21f7902
[7.x] [ML][Data Frame] adding force delete (#44590) (#44696)
* [ML][Data Frame] adding force delete (#44590)

* [ML][Data Frame] adding force delete

* Update delete-transform.asciidoc

* adjusting for backport
2019-07-22 13:13:25 -05:00
Yannick Welsch d98b3e4760
Move frozen indices to x-pack module (#44490)
Backport of #44408 and #44286.
2019-07-17 16:53:10 +02:00
Benjamin Trent 2c7ff812da
[ML] Add r_squared eval metric to regression (#44248) (#44378)
* [ML] Add r_squared eval metric to regression

* fixing tests and binarysoftclassification class

* Update RSquared.java

* Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

Co-Authored-By: David Kyle <david.kyle@elastic.co>

* removing unnecessary debug test
2019-07-16 11:11:31 -05:00
Lee Hinman fb0461ac76
[7.x] Add Snapshot Lifecycle Management (#44382)
* Add Snapshot Lifecycle Management (#43934)

* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies' defined schedules.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change alters the output from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

Including the `"modified_date"` field when the `?human` flag is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly triggered SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retrieving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit adds HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

*  Add Getting Started Guide for SLM  (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of which SLM policy created a snapshot in the metadata field of the snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed

* Fix compilation for backport

* Fix get snapshot output parsing in test

* [DOCS] Add redirects for removed autogen anchors (#44380)

* Switch <tt>...</tt> in javadocs for {@code ...}
2019-07-16 07:37:13 -06:00
Przemysław Witek 3f3a3d3f2b
[7.x] Add DatafeedTimingStats.average_search_time_per_bucket_ms and TimingStats.total_bucket_processing_time_ms stats (#44125) (#44404) 2019-07-16 12:51:29 +02:00
Yannick Welsch a848fc9bf4 Revert "Add usage stats for frozen indices (#44286)"
This reverts commit 5e73c49ec8.
2019-07-15 21:41:25 +02:00
Yannick Welsch 5e73c49ec8 Add usage stats for frozen indices (#44286)
Adds usage stats for frozen indices of the form:

  "frozen_indices" : {
    "available" : true,
    "enabled" : true,
    "indices_count" : 0
  }
2019-07-15 17:34:46 +02:00
Benjamin Trent 79c62fd724
[ML][Data Frame] Fixing default delay set in timesync (#44281) (#44293)
* [ML][Data Frame] Fixing default delay set in timesync

* disallowing explicit null, don't do duration check on write
2019-07-12 15:21:47 -05:00
Mayya Sharipova 32cb47b91c Add l1norm and l2norm distances for vectors (#44116)
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947
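
A sketch of using one of the new distance functions in a `script_score` query; the index and field names are hypothetical:

```
GET /my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "1 / (1 + l2norm(params.query_vector, doc['my_vector']))",
        "params": { "query_vector": [0.5, 10.0, 6.0] }
      }
    }
  }
}
```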
2019-07-11 14:30:02 -04:00
Benjamin Trent c82d9c5b50
[ML] Adds support for regression.mean_squared_error to eval API (#44140) (#44218)
* [ML] Adds support for regression.mean_squared_error to eval API

* addressing PR comments

* fixing tests
2019-07-11 09:22:52 -05:00
Przemysław Witek 44781e415e
[7.x] [ML] Add DatafeedTimingStats to datafeed GetDatafeedStatsAction.Response (#43045) (#44118) 2019-07-10 11:51:44 +02:00
David Roberts cb62d4acdf [ML-DataFrame] Add a frequency option to transform config, default 1m (#44120)
Previously a data frame transform would check whether the
source index was changed every 10 seconds. Sometimes it
may be desirable for the check to be done less frequently.
This commit increases the default to 60 seconds but also
allows the frequency to be overridden by a setting in the
data frame transform config.
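
A minimal sketch of a transform config with an explicit `frequency`, shown in the `_data_frame` namespace used at the time; the id, index, and field names are hypothetical:

```
PUT _data_frame/transforms/my-transform
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "frequency": "5m",
  "pivot": {
    "group_by": { "user": { "terms": { "field": "user_id" } } },
    "aggregations": { "avg_price": { "avg": { "field": "price" } } }
  }
}
```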
2019-07-10 09:59:00 +01:00
Dimitris Athanasiou d3ddedf9fc
[7.x][ML] Add missing doc links to df-analytics rest spec and HLRC javadocs (#44025) (#44033) 2019-07-06 02:03:29 +03:00
Mayya Sharipova 37e1ad7062 Forbid empty doc values on vector functions (#43944)
Currently, when a document is missing a vector value, the vector function
returns 0 as the score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
2019-07-05 18:09:06 -04:00
Dimitris Athanasiou 30b20920b9
[7.x][ML] Report correct count for df-analytics get-stats API (#43969) (#43981)
The count should match the number of all df-analytics that
matched the id in the request. However, we set the count
to the number of df-analytics returned which was bound to the
`size` parameter. This commit fixes this by setting the count
to the count of the `get` response.
2019-07-05 10:28:57 +03:00
Martijn van Groningen 1dd3d14f09
take into account `manage_enrich` builtin role 2019-07-04 16:51:48 +02:00
Martijn van Groningen 653f1436a0
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-04 13:05:10 +02:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00
Alexander Reelsen 9077c4402f Watcher: Allow to execute actions for each element in array (#41997)
This adds the ability to execute an action for each element that occurs
in an array; for example, you could send a dedicated Slack action for
each search hit returned from a search.

There is also a limit for the number of actions executed, which is
hardcoded to 100 right now, to prevent having watches run forever.

The watch history logs each action result and the total number of actions
that were executed.

Relates #34546
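
A sketch of a watch whose action iterates over search hits, assuming an action-level `foreach` field that takes a path into the payload; the watch id, index, and logging text are illustrative:

```
PUT _watcher/watch/log_each_hit
{
  "trigger": { "schedule": { "interval": "1h" } },
  "input": {
    "search": {
      "request": {
        "indices": [ "logs" ],
        "body": { "query": { "match": { "level": "error" } } }
      }
    }
  },
  "actions": {
    "log_hit": {
      "foreach": "ctx.payload.hits.hits",
      "logging": { "text": "error in doc {{ctx.payload._id}}" }
    }
  }
}
```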
2019-07-03 11:28:50 +02:00
Tim Vernum 2a8f30eb9a
Support builtin privileges in get privileges API (#43901)
Adds a new "/_security/privilege/_builtin" endpoint so that builtin
index and cluster privileges can be retrieved via the REST API

Backport of: #42134
2019-07-03 19:08:28 +10:00
Mayya Sharipova 756c42f99f
Add dims parameter to dense_vector mapping (#43444) (#43895)
Typically, dense vectors of both documents and queries must have the same
number of dimensions. A different number of dimensions among documents
or in the query vector indicates an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions.
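
A sketch of a mapping that declares the dimension count up front; the index and field names are hypothetical:

```
PUT /my-index
{
  "mappings": {
    "properties": {
      "my_vector": { "type": "dense_vector", "dims": 3 }
    }
  }
}
```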
2019-07-02 21:14:16 -04:00
Benjamin Trent fb825a6470
[7.x] [ML][Data Frame] add node attr to GET _stats (#43842) (#43894)
* [ML][Data Frame] add node attr to GET _stats (#43842)

* [ML][Data Frame] add node attr to GET _stats

* addressing testing issues with node.attributes

* adjusting for backport
2019-07-02 19:35:37 -05:00
Tim Vernum 8d099dad38
Add "manage_api_key" cluster privilege (#43865)
This adds a new cluster privilege for manage_api_key. Users with this
privilege are able to create new API keys (as a child of their own
user identity) and may also get and invalidate any/all API keys
(including those owned by other users).

Backport of: #43728
2019-07-02 21:57:42 +10:00
Benjamin Trent 82c1ddc117
[7.x] [ML][Data Frame] Add deduced mappings to _preview response payload (#43742) (#43849)
* [ML][Data Frame] Add deduced mappings to _preview response payload (#43742)

* [ML][Data Frame] Add deduced mappings to _preview response payload

* updating preview docs

* fixing code for backport
2019-07-02 06:52:14 -05:00
Tanguy Leroux b977f019b8
Expose translog stats in ReadOnlyEngine (#43752) (#43823)
Backport of #43752 for 7.x.
2019-07-02 13:39:00 +02:00
Tomas Della Vedova 4cdb24bceb
Use explicit string keys in data_frame test (#43854) 2019-07-02 11:06:29 +02:00
Julie Tibshirani ffa5919d7c
Add support for 'flattened object' fields. (#43762)
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
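
For illustration, a minimal mapping and document using the new field type; the index and field names are hypothetical:

```
PUT /bug-reports
{
  "mappings": {
    "properties": {
      "labels": { "type": "flattened" }
    }
  }
}

POST /bug-reports/_doc/1
{
  "labels": { "priority": "urgent", "release": [ "v1.2.5", "v1.3.0" ] }
}
```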
2019-07-01 12:08:50 +03:00
Hendrik Muhs a58d231f4d relax trigger count for transform stats test (#43753)
relax trigger count test as we can not guarantee it due to async behaviour
2019-07-01 10:30:40 +02:00
Martijn van Groningen eb8e03bc8b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-30 21:32:51 +02:00
Dimitris Athanasiou 86c853a7c2
[7.x][ML] Rename outlier score setting to feature_influence_threshold (#43705) (#43734)
Renames outlier score setting `minimum_score_to_write_feature_influence`
to `feature_influence_threshold`.
2019-06-28 13:28:25 +03:00
Dimitris Athanasiou cab879118d
[7.x][ML] Support multiple source indices for df-analytics (#43702) (#43731)
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exist, an error will be
returned.

To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
2019-06-28 13:28:03 +03:00
Christoph Büscher 2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
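
A sketch of triggering the reload for an index after updating, say, a synonym file on disk; the index name is hypothetical:

```
POST /my-index/_reload_search_analyzers
```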
2019-06-28 09:55:40 +02:00
Przemysław Witek 94f18da5df
Add version and create_time to data frame analytics config (#43683) (#43712) 2019-06-28 07:37:21 +02:00
David Roberts f39619d182 [ML] Don't write timing stats on no-op (#43680)
Similar to elastic/ml-cpp#512, if a job opens and closes and
does nothing in between we shouldn't write timing stats to
the results index.
2019-06-27 16:37:54 +01:00
Martijn van Groningen 683e116601
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-27 08:35:37 +02:00
Benjamin Trent d05593c3ad
[ML][Data Frame] adds tests for continuous DF (#43601) (#43654) 2019-06-26 14:59:19 -05:00
Benjamin Trent 52e26bbc42
[ML][Data Frame] improve pivot nested field validations (#43548) (#43636)
* [ML][Data Frame] improve pivot nested field validations

* addressing pr comments
2019-06-26 13:35:51 -05:00
Benjamin Trent c121b00c98
[7.x] [ML][Data Frame] Add support for allow_no_match for endpoints (#43490) (#43637)
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)

* [ML][Data Frame] Add support for allow_no_match parameter in endpoints

Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms

* Update DataFrameTransformDocumentationIT.java
2019-06-26 10:09:56 -05:00
Yannick Welsch 2049f715b3 Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst themselves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
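
A sketch of an elasticsearch.yml snippet for a dedicated voting-only master-eligible node, assuming the legacy 7.x node role settings:

```
node.master: true
node.voting_only: true
node.data: false
node.ingest: false
```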
2019-06-26 08:07:56 +02:00
Tanguy Leroux 0dc1c12f13
Fix indices shown in _cat/indices (#43286)
After two recent changes (#38824 and #33888), the _cat/indices API
no longer reports information for actively recovering indices and
non-replicated closed indices. It also misreports replicated closed
indices that are potentially not authorized for the user.

This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
 information about the indices.

Closes #39933
2019-06-25 20:02:34 +02:00
Dimitris Athanasiou 126c2fd2d5
[7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
Martijn van Groningen df9f06213d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-21 19:58:04 +02:00
Benjamin Trent f4b75d6d14
[7.x] [ML][Data Frame] Add version and create_time to transform config (#43384) (#43480)
* [ML][Data Frame] Add version and create_time to transform config (#43384)

* [ML][Data Frame] Add version and create_time to transform config

* s/transform_version/version s/Date/Instant

* fixing getter/setter for version

* adjusting for backport
2019-06-21 09:11:44 -05:00
Martijn van Groningen 9de4e878f7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-20 09:44:31 +02:00
Benjamin Trent 77ce3260dd
[ML][Data Frame] make response.count be total count of hits (#43241) (#43389)
* [ML][Data Frame] make response.count be total count of hits

* addressing line length check

* changing response count for filters

* adjusting serialization, variable name, and total count logic

* making count mandatory for creation
2019-06-19 16:19:06 -05:00
Benjamin Trent b333ced5a7
[7.x] [ML][Data Frame] adds new pipeline field to dest config (#43124) (#43388)
* [ML][Data Frame] adds new pipeline field to dest config (#43124)

* [ML][Data Frame] adds new pipeline field to dest config

* Adding pipeline support to _preview

* removing unused import

* moving towards extracting _source from pipeline simulation

* fixing permission requirement, adding _index entry to doc

* adjusting for java 8 compatibility

* adjusting bwc serialization version to 7.3.0
2019-06-19 16:18:27 -05:00
Mayya Sharipova aa6248d4d7
Move dense_vector and sparse_vector to module (#43280) (#43333) 2019-06-18 11:56:04 -04:00
Martijn Laarman 8b1b9f8ab9
Introduce stability description to the REST API specification (#38413) (#43278)
* introduce state to the REST API specification

* change state over to stability

* CCR is now GA, updated to stable

* SQL is now GA so marked as stable

* Introduce `internal` as state for API's, marks stable in terms of lifetime but unstable in terms of guarantees on its output format since it exposes internal representations

* make setting a wrong stability value, or not setting it at all an error that causes the YAML test suite to fail

* update spec files to be explicit about their stability state

* Document the fact that stability needs to be defined

Otherwise the YAML test runner will fail (with a nice exception message)

* address check style violations

* update rest spec unit tests to include stability

* found one more test spec file not declaring stability, made sure stability appears after documentation everywhere

* cluster.state is stable, mark response in some way to denote its a key value format that can be changed during minors

* mark data frame API's as beta

* remove internal and private as states for an API

* removed the wrong enum values in the Stability Enum in the previous commit

(cherry picked from commit 61c34bbd92f8f7e5f22fa411c6b682b0ebd8a99d)
2019-06-17 16:57:13 +02:00
Przemysław Witek b2613a123d
[7.x] Report exponential_avg_bucket_processing_time which gives more weight to recent buckets (#43189) (#43263) 2019-06-17 08:58:26 +02:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Martijn van Groningen c8e6474eef
Changes required for merging in 7.x branch. 2019-06-13 16:58:27 +02:00
Martijn van Groningen 1f3db7eb3e
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-13 16:49:38 +02:00
Luca Cavanna afeda1a7b9 Split search in two when made against throttled and non throttled searches (#42510)
When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.

This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflect the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.

Closes #40900
2019-06-12 11:25:03 +02:00
Benjamin Trent 7ff3d86cf0
[ML][Data Frame] adding dest.index and id validations (#43053) (#43109)
* [ML][Data Frame] adding dest.index and id validations

* adjusting message format

* Adjusting id validity pattern

* Update DataFrameStrings.java
2019-06-11 15:55:18 -05:00
Benjamin Trent e384bf0276
[ML-DataFrame] stop task at completion of data frame function (#42955) (#43114)
* stop data frame task after it finishes

* test auto stop

* adapt tests

* persist the state correctly and move stop into listener

* Calling `onStop` even if persistence fails, changing `stop` to rely on doSaveState
2019-06-11 15:55:02 -05:00
Ryan Ernst 172cd4dbfa Remove description from xpack feature sets (#43065)
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions hardly describe anything meaningful). This commit removes the
description field from feature sets.
2019-06-11 09:22:58 -07:00
Martijn Laarman cb7ce865b7
remove path from rest-api-spec (#41452) (#43084)
(cherry picked from commit f5fde1d0843d2f0f53d3b9a15b9cfc8b94471ab7)
2019-06-11 12:52:36 +02:00
Dimitris Athanasiou 76a92b49a8
[ML] Get resources action should be lenient when sort field is unmapped (#42991) (#43046)
Get resources action sorts on the resource id. When there are no resources at
all, then it is possible the index does not contain a mapping for the resource
id field. In that case, the search api fails by default.

This commit adjusts the search request to ignore unmapped fields.

Closes elastic/kibana#37870
2019-06-10 19:50:19 +03:00
David Roberts b202a59f88 [ML] Add earliest and latest timestamps to field stats (#42890)
This change adds the earliest and latest timestamps into
the field stats for fields of type "date" in the output of
the ML find_file_structure endpoint.  This will enable the
cards for date fields in the file data visualizer in the UI
to be made to look more similar to the cards for date
fields in the index data visualizer in the UI.
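
As a loose sketch, a date field's entry in the field stats could then carry the new values alongside the existing ones (the field name and all values below are made up):

```
"field_stats": {
  "timestamp": {
    "count": 1000,
    "cardinality": 972,
    "earliest": "2019-06-01T00:01:12.000Z",
    "latest": "2019-06-05T23:59:03.000Z"
  }
}
```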
2019-06-06 08:58:35 +01:00
Benjamin Trent 293f306b9a
[ML][Data Frame] forcing that no ptask => STOPPED state (#42800) (#42860)
* [ML][Data Frame] forcing that no ptask => STOPPED state

* Addressing side-effect, early exit for stop when stopped
2019-06-05 07:09:34 -05:00
David Roberts b61202b0a8 [ML] Add a limit on line merging in find_file_structure (#42501)
When analysing a semi-structured text file the
find_file_structure endpoint merges lines to form
multi-line messages using the assumption that the
first line in each message contains the timestamp.
However, if the timestamp is misdetected then this
can lead to excessive numbers of lines being merged
to form massive messages.

This commit adds a line_merge_size_limit setting
(default 10000 characters) that halts the analysis
if a message bigger than this is created.  This
prevents significant CPU time being spent subsequently
trying to determine the internal structure of the
huge bogus messages.
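
A call that tightens the limit might look roughly like this; `line_merge_size_limit` is a query parameter and the raw file goes in the request body (the values are illustrative):

```
POST _ml/find_file_structure?lines_to_sample=500&line_merge_size_limit=5000
<first few hundred lines of the log file to analyse>
```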
2019-06-03 13:45:51 +01:00
Benjamin Trent 0253927ec4
[ML Data Frame] Refactor stop logic (#42644) (#42763)
* Revert "invalid test"

This reverts commit 9dd8b52c13c716918ff97e6527aaf43aefc4695d.

* Testing

* mend

* Revert "[ML Data Frame] Mute Data Frame tests"

This reverts commit 5d837fa312b0e41a77a65462667a2d92d1114567.

* Call onStop and onAbort outside atomic update

* Don’t update CS

* Tidying up

* Remove invalid test that asserted logic that has been removed

* Add stopped event

* Revert "Add stopped event"

This reverts commit 02ba992f4818bebd838e1c7678bd2e1cc090bfab.

* Adding check for STOPPED in saveState
2019-06-03 06:53:44 -05:00
Przemysław Witek f6779de2b7
Increase maximum forecast interval to 10 years. (#41082) (#42710)
Increase the maximum duration to ~10 years (3650 days).
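
For context, the duration is supplied when requesting a forecast for an anomaly detection job, so a request like the following (the job name is a placeholder) becomes valid:

```
POST _ml/anomaly_detectors/my_job/_forecast
{
  "duration": "3650d"
}
```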
2019-05-31 06:19:47 +02:00
James Baiera 215170b6c3 Merge branch '7.x' into enrich-7.x 2019-05-30 16:13:06 -04:00
David Kyle c5a410f68b [ML Data Frame] Set DF task state when stopping (#42516)
Set the state to stopped prior to persisting
2019-05-29 16:39:44 +01:00
Hendrik Muhs 345ff21ae5 [ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589)
rewrite start and stop to answer with acknowledged

fixes #42450
2019-05-29 11:14:32 +02:00
Michael Basnight 77eed9e6a0 Add enrich policy GET API (#41384)
This commit wires up the Rest calls and Transport calls for GET enrich
policy, as well as tests and rest spec additions.
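
A retrieval call would look something like this (the policy name is a placeholder and the exact path may have evolved since this commit):

```
GET /_enrich/policy/my-policy
```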
2019-05-28 23:19:23 -05:00
Michael Basnight be60125a4e Merge branch '7.x' into enrich-7.x 2019-05-28 18:32:18 -05:00
David Kyle aea600fe7d [Ml Data Frame] Return bad_request on preview when config is invalid (#42447) 2019-05-28 15:36:50 +01:00
Martijn van Groningen a91cec4c46
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-05-27 10:18:02 +02:00
Hendrik Muhs 6d47ee9268 [ML-DataFrame] add support for fixed_interval, calendar_interval, remove interval (#42427)
* add support for fixed_interval, calendar_interval, remove interval

* adapt HLRC

* checkstyle

* add a hlrc to server test

* adapt yml test

* improve naming and doc

* improve interface and add test code for hlrc to server

* address review comments

* repair merge conflict

* fix date patterns

* address review comments

* remove assert for warning

* improve exception message

* use constants
2019-05-24 20:30:17 +02:00
Michael Basnight 2325ffb757 Add enrich policy execute API (#41762)
This commit wires up the Rest calls and Transport calls for execute
enrich policy, as well as tests and rest spec additions.
2019-05-24 09:39:29 -05:00
Martijn van Groningen 79fa7d8098
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-05-24 11:44:35 +02:00
David Kyle a23257ce06 [ML Data Frame] Account for completed data frames in test (#42351)
When asserting on the checkpoint value: if the DF has completed, the checkpoint will be 1, otherwise 0.
Similarly, the state may be started or indexing. Closes #42309
2019-05-23 14:05:09 +01:00
Michael Basnight 323251c3d1 Merge branch '7.x' into enrich-7.x 2019-05-21 16:51:42 -05:00
David Kyle 7e4d3c695b [ML Data Frame] Persist and restore checkpoint and position (#41942)
Persist and restore Data frame's current checkpoint and position
2019-05-21 18:57:13 +01:00
David Kyle 0fd42ce1f5
[ML Data Frame] Start data frame directly rather than via the scheduler (#42224)
Trigger indexer start directly to put the indexer in INDEXING state immediately
2019-05-21 15:48:45 +01:00
David Kyle 24144aead2
[ML] Complete the Data Frame task on stop (#41752) (#42063)
Wait for indexer to stop then complete the persistent task on stop.
If the wait_for_completion is true the request will not return until stopped.
2019-05-21 10:24:20 +01:00
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to a confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
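
A minimal sketch of the two explicit forms (index and field names are placeholders):

```
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "per_day_calendar": {
      "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" }
    },
    "per_day_fixed": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "24h" }
    }
  }
}
```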
2019-05-20 12:07:29 -04:00
Martijn van Groningen 855f5cc6a5
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-05-20 12:16:57 +02:00
Benjamin Trent febee07dcc
[ML] adding pivot.max_search_page_size option for setting paging size (#41920) (#42079)
* [ML] adding pivot.size option for setting paging size

* Changing field name to address PR comments

* fixing ctor usage

* adjust hlrc for field name change
2019-05-10 13:22:31 -05:00
Benjamin Trent 0931815355
[ML] properly nesting objects in document source (#41901) (#42077)
* [ML] properly nesting objects in document source

* Throw exception on agg extraction failure, cause it to fail df

* throwing error to stop df if unsupported agg is found
2019-05-10 13:22:12 -05:00
Benjamin Trent b23b06dded
[ML] verify that there are no duplicate leaf fields in aggs (#41895) (#42025)
* [ML] verify that there are no duplicate leaf fields in aggs

* addressing pr comments

* addressing PR comments

* optimizing duplication check
2019-05-09 14:29:10 -05:00
Michael Basnight 202a840da9 Merge remote-tracking branch 'upstream/7.x' into enrich-7.x 2019-05-08 13:59:01 -05:00
Zachary Tong f410f91f13 Cleanup RollupSearch exceptions, disallow partial results (#41272)
- msearch exceptions should be thrown directly instead of wrapping
in a RuntimeException
- Do not allow partial results (where some indices are missing), 
instead throw an exception if any index is missing
2019-05-08 12:38:42 -04:00
Benjamin Trent 50fc27e9a0
[ML] addresses preview bug, and adds check to PUT (#41803) (#41850) 2019-05-06 10:56:26 -05:00
Michael Basnight 5d53706310 Add enrich policy DELETE API (#41495)
This commit wires up the Rest calls and Transport calls for DELETE enrich
policy, as well as tests and rest spec additions.
2019-05-02 11:02:49 -05:00
Michael Basnight 2978ac3061 Add enrich policy list API (#41553)
This commit wires up the Rest calls and Transport calls for listing all
enrich policies, as well  as tests and rest spec additions.
2019-05-02 11:01:26 -05:00
Martijn van Groningen e429cd7f28
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-05-02 09:40:25 +02:00
Jason Tedor 7f3ab4524f
Bump 7.x branch to version 7.2.0
This commit adds the 7.2.0 version constant to the 7.x branch, and bumps
BWC logic accordingly.
2019-05-01 13:38:57 -04:00
Albert Zaharovits 990be1f806
Security Tokens moved to a new separate index (#40742)
This commit introduces the `.security-tokens` and `.security-tokens-7`
alias-index pair. Because index snapshotting is at the index level granularity
(i.e. you cannot snapshot a subset of an index), snapshotting `.security` had
the undesirable effect of storing ephemeral security tokens. The changes
herein address this issue by moving tokens "seamlessly" (without user
intervention) to another index, so that a "Security Backup" (i.e. a snapshot of
`.security`) would not be bloated by ephemeral data.
2019-05-01 14:53:56 +03:00
Martijn van Groningen eb9618f1b7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-04-29 09:21:04 +02:00
Benjamin Trent a0990ca239
[ML] cleanup + adding description field to transforms (#41554) (#41605)
* [ML] cleanup + adding description field to transforms

* making description length have a max of 1k
2019-04-26 16:50:59 -05:00
Martijn van Groningen 6af17e4bdf
Add enrich qa module for rest tests and (#41568)
move put policy api yaml test to this rest module.

The main benefit is that all tests will then be run when running:
`./gradlew -p x-pack/plugin/enrich check`

The rest qa module starts a node with default distribution and basic
license.

This qa module will also be used for adding different rest tests (not yaml),
for example rest tests needed for #41532

Also when we are going to work on security integration then we can
add a security qa module under the qa folder. Also at some point
we should add a multi node qa module.
2019-04-26 20:20:02 +02:00
Michael Basnight fad45ea6bd Add enrich policy PUT API (#41383)
This commit wires up the Rest calls and Transport calls for PUT enrich
policy, as well as tests and rest spec additions.
2019-04-25 15:15:25 -05:00
Benjamin Trent 08843ba62b
[ML] Adds progress reporting for transforms (#41278) (#41529)
* [ML] Adds progress reporting for transforms

* fixing after master merge

* Addressing PR comments

* removing unused imports

* Adjusting afterKey handling and percentage to be 100*

* Making sure it is a linked hashmap for serialization

* removing unused import

* addressing PR comments

* removing unused import

* simplifying code, only storing total docs and decrementing

* adjusting for rewrite

* removing initial progress gathering from executor
2019-04-25 11:23:12 -05:00
Michael Basnight 38e6dcd388 Merge remote-tracking branch 'upstream/7.x' into enrich-7.x 2019-04-23 20:38:28 -05:00
Benjamin Trent e2f8ffdde8
[ML][Data Frame] Moving destination creation to _start (#41416) (#41433)
* [ML][Data Frame] Moving destination creation to _start

* slight refactor of DataFrameAuditor constructor
2019-04-23 09:32:57 -05:00
Martijn Laarman 85b9dc18a7 fix #35262 define deprecations of API's as a whole and urls (#39063)
* fix #35262 define deprecations of API's as a whole and urls

* document hot threads deprecated paths

* deprecate scroll_id as part of the URL, documented only as part of the body which is a safer behaviour as well

* use version numbers up to patch version

* rest spec parser picks up deprecated paths as paths too

(cherry picked from commit 7e06023e7603b7584bfd9ee4e8a1ccd82c208ce7)
2019-04-23 14:28:36 +02:00
Michael Basnight 860e783f14 Merge remote-tracking branch 'upstream/7.x' into enrich-7.x 2019-04-22 09:39:28 -05:00
Zachary Tong 7e62ff2823 [Rollup] Validate timezones based on rules not string comparison (#36237)
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.

When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.

Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
2019-04-17 13:46:44 -04:00
Yogesh Gaikwad 6a552c05fe
Use alias name from rollover request to query indices stats (#40774) (#41284)
In `TransportRolloverAction` before doing rollover we resolve
source index name (write index) from the alias in the rollover request.
Before evaluating the conditions and executing rollover action, we
retrieve stats, but to do so we used the source index name
resolved from the alias instead of the alias itself.
This fails when the user is assigned a role with index privilege on the
alias instead of the concrete index. This commit fixes this by using
the alias from the request.
After this change, we verified that when we retrieve all the stats (including write + read indexes)
we consider only the source index.

Closes #40771
2019-04-17 14:15:05 +10:00
David Kyle 2b539f8347 [ML DataFrame] Data Frame stop all (#41156)
Wildcard support for the data frame stop API
2019-04-15 15:04:28 +01:00
Nik Everett c379206c1e
Fix some documentation urls in rest-api-spec (#40618) (#41145)
Fixes some documentation urls in the rest-api-spec. Some of these URLs
pointed to 404s and a few others pointed to deprecated documentation
when we have better documentation now. I'm not consistent about `master`
vs `current` because we're not consistent in other places and I think we
should solve all of those at once with something a little more
automatic.
2019-04-12 10:11:14 -04:00
Martijn van Groningen b66ad34565
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-04-12 08:24:59 +02:00
Przemysław Witek f5014ace64
[ML] Add validation that rejects duplicate detectors in PutJobAction (#40967) (#41072)
* [ML] Add validation that rejects duplicate detectors in PutJobAction

Closes #39704

* Add YML integration test for duplicate detectors fix.

* Use "== false" comparison rather than "!" operator.

* Refine error message to sound more natural.

* Put job description in square brackets in the error message.

* Use the new validation in ValidateJobConfigAction.

* Exclude YML tests for new validation from permission tests.
2019-04-10 15:43:35 +02:00
Hendrik Muhs f9018ab11b [ML-DataFrame] create checkpoints on every new run (#40725)
Use the checkpoint service to create a checkpoint on every new run. Expose checkpoints stats on _stats endpoint.
2019-04-10 09:14:11 +02:00
Martijn van Groningen 5a1d5cca4f
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-04-09 09:59:24 +02:00
Benjamin Trent 665f0d81aa
[ML] refactoring start task a bit, removing unused code (#40798) (#40845) 2019-04-05 09:01:01 -05:00
Martijn van Groningen e6bdfea474
first commit 2019-04-05 09:32:40 +02:00
Benjamin Trent 945e7ca01e
[ML] Periodically persist data-frame running statistics to internal index (#40650) (#40729)
* [ML] Add mappings, serialization, and hooks to persist stats

* Adding tests for transforms without tasks having stats persisted

* intermittent commit

* Adjusting usage stats to account for stored stats docs

* Adding tests for id expander

* Addressing PR comments

* removing unused import

* adding shard failures to the task response
2019-04-02 14:16:55 -05:00
Tim Vernum 2c770ba3cb
Support mustache templates in role mappings (#40571)
This adds a new `role_templates` field to role mappings that is an
alternative to the existing roles field.

These templates are evaluated at runtime to determine which roles should be
granted to a user.
For example, it is possible to specify:

    "role_templates": [
      { "template":{ "source": "_user_{{username}}" } }
    ]

which would mean that every user is assigned to their own role based on
their username.

You may not specify both roles and role_templates in the same role
mapping.

This commit adds support for templates to the role mapping API, the role
mapping engine, the Java high level rest client, and Elasticsearch
documentation.
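
As a sketch of how this can be combined with the role mapping API (the mapping name and rule are illustrative):

```
PUT /_security/role_mapping/users_by_name
{
  "enabled": true,
  "rules": { "field": { "realm.name": "ldap1" } },
  "role_templates": [
    { "template": { "source": "_user_{{username}}" } }
  ]
}
```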

Due to the lack of caching in our role mapping store, it is currently
inefficient to use a large number of templated role mappings. This will be
addressed in a future change.

Backport of: #39984, #40504
2019-04-02 20:55:10 +11:00
Tim Vernum 7bdd41399d
Support roles with application privileges against wildcard applications (#40675)
This commit introduces 2 changes to application privileges:

- The validation rules now accept a wildcard in the "suffix" of an application name.
  Wildcards were always accepted in the application name, but the "valid filename" check
  for the suffix incorrectly prevented the use of wildcards there.

- A role may now be defined against a wildcard application (e.g. kibana-*) and this will
  be correctly treated as granting the named privileges against all named applications.
  This does not allow wildcard application names in the body of a "has-privileges" check, but the
  "has-privileges" check can test concrete application names against roles with wildcards.

Backport of: #40398
2019-04-02 14:48:39 +11:00
Benjamin Trent 12943c5d2c
[ML] Add data frame task state object and field (#40169) (#40490)
* [ML] Add data frame task state object and field

* A new state item is added so that the overall task state can be
accounted for
* A new FAILED state and reason have been added as well so that failures
can be shown to the user for optional correction

* Addressing PR comments

* adjusting after master merge

* addressing pr comment

* Adjusting auditor usage with failure state

* Refactor, renamed state items to task_state and indexer_state

* Adding todo and removing redundant auditor call

* Address HLRC changes and PR comment

* adjusting hlrc IT test
2019-03-27 06:53:58 -05:00
Benjamin Trent 7b4f964708
[ML] make source and dest objects in the transform config (#40337) (#40396)
* [ML] make source and dest objects in the transform config

* addressing PR comments

* Fixing compilation post merge

* adding comment for Arrays.hashCode

* addressing changes for moving dest to object

* fixing data_frame yml tests

* fixing API test
2019-03-25 07:16:41 -05:00
Benjamin Trent 2dd879abac
[ML] adds support for non-numeric mapped types (#40220) (#40380)
* [ML] adds support for non-numeric mapped types and mapping overrides

* correcting hlrc compilation issues after merge

* removing mapping_override option

* clearing up unnecessary changes
2019-03-23 14:04:14 -05:00
Lisa Cawley e6799849d1 [DOCS] Adds placeholder for start and stop data frame transform APIs (#40278) 2019-03-21 09:39:10 -07:00
Lisa Cawley caa0129d44 [DOCS] Adds placeholder for create and delete data frame transform APIs (#40233) 2019-03-21 09:13:50 -07:00
lcawl 0e712d476e Adds URL for preview data frame transforms 2019-03-21 08:28:23 -07:00
Lisa Cawley ff2bcc9d11 [DOCS] Adds placeholder for get data frame transform APIs (#40283) 2019-03-21 07:57:01 -07:00
Yogesh Gaikwad 5d30df5a60
Fix so non super users can also create API keys (#40028) (#40286)
When creating API keys we check whether an API key with
the same name already exists and fail the request if it does.
The check should have been performed with XPackSecurityUser
instead of the authenticated user. This caused the request to fail
when a non-superuser tried to create an API key.
This commit fixes by executing search action with SECURITY_ORIGIN
so it can be executed with XPackSecurityUser.
Also fixed the Rest test to avoid using a user with `super_user` role.

Closes #40029
2019-03-21 15:53:25 +11:00
Benjamin Trent 5ae43855fc
[ML] Refactor GET Transforms API (#40015) (#40269)
* [Data Frame] Refactor GET Transforms API:

* Add pagination
* comma-delimited list expression support for GET transforms
* Flag troublesome internal code for future refactor

* Removing `allow_no_transforms` param, ratcheting down pageparam option

* Changing  DataFrameFeatureSet#usage to not get all configs

* Intermediate commit

* Writing test for batch data gatherer

* Removing unused import

* removing bad println used for debugging

* Updating BatchedDataIterator comments and query

* addressing pr comments

* disallow null scrollId to cause stackoverflow
2019-03-20 19:14:50 -05:00
David Kyle 387648065d
[ML] Data Frame HLRC start & stop APIs (#40197) 2019-03-19 13:30:01 +00:00
Gordon Brown c8a4a7fc9d
Remove Migration Upgrade and Assistance APIs (#40075)
The Migration Assistance API has been functionally replaced by the
Deprecation Info API, and the Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
2019-03-18 13:46:56 -06:00
Benjamin Trent 2016e23285
[ML] Refactor common utils out of ML plugin to XPack.Core (#39976) (#40009)
* [ML] Refactor common utils out of ML plugin to XPack.Core

* implementing GET filters with abstract transport

* removing added rest param

* adjusting how defaults can be supplied
2019-03-13 17:08:43 -05:00
Benjamin Trent 8c6ff5de31
[Data Frame] Refactor PUT transform to not create a task (#39934) (#40010)
* [Data Frame] Refactor PUT transform such that:

 * POST _start creates the task and starts it
 * GET transforms queries docs instead of tasks
 * POST _stop verifies the stored config exists before trying to stop
the task

* Addressing PR comments

* Refactoring DataFrameFeatureSet#usage, decreasing size returned getTransformConfigurations

* fixing failing usage test
2019-03-13 17:08:15 -05:00
David Kyle 48788269b0
[ML] Correct small inconsistencies in ml APIs spec and docs (#39907) 2019-03-11 14:02:50 +00:00
Benjamin Trent 6c6549fc51
[Data-Frame] make the config be strictly parsed on _preview (#39713) (#39873)
* [Data-Frame] make the config be strictly parsed on _preview

* adding test to verify strictly parsing

* adjusting test after master merge
2019-03-09 14:03:57 -06:00
Jason Tedor 0250d554b6
Introduce forget follower API (#39718)
This commit introduces the forget follower API. This API is needed in cases where
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
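
A request shaped roughly like the following would then remove the leases held for a given follower (all names and the UUID are placeholders):

```
POST /leader-index/_ccr/forget_follower
{
  "follower_cluster": "follower-cluster",
  "follower_index": "follower-index",
  "follower_index_uuid": "vYpnaWPRQB6mNspmoCeYyA",
  "leader_remote_cluster": "leader-cluster"
}
```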
2019-03-07 11:08:45 -05:00
Yogesh Gaikwad c91dcbd5ee
Types removal security index template (#39705) (#39728)
As we are moving to single type indices,
we need to address this change in security-related indexes.
To address this, we are
- updating index templates to use preferred type name `_doc`
- updating the API calls to use preferred type name `_doc`

Upgrade impact:-
In case of an upgrade from 6.x, the security index has type
`doc` and this will keep working as there is a single type and `_doc`
works as an alias to an existing type. The change is handled in the
`SecurityIndexManager` when we load mappings and settings from
the template. Previously, we used to do a `PutIndexTemplateRequest`
with the mapping source JSON with the type name. This has been
modified to remove the type name from the source.
So in the case of an upgrade, the `doc` type is updated
whereas for fresh installs `_doc` is updated. This happens as
backend handles `_doc` as an alias to the existing type name.

An optional step is to `reindex` security index and update the
type to `_doc`.

Since we do not support the security audit log index,
that template has been deleted.

Relates: #38637
2019-03-06 18:53:59 +11:00
Tomas Della Vedova fad52acf5a Removed incorrect ML YAML tests (#39400)
A client cannot know that a job_id is already taken, so
this test should not have been specified as a client test
2019-03-05 17:13:10 +00:00
Martijn Laarman 52ecf18dc4
Index on rollup.rollup_search.json is a list (#39097) (#39653)
And not a string, since it accepts a comma-separated list of indices.

(cherry picked from commit cf34d50b3a983b5fc0c9c7aa279cecd4aa10e28b)
2019-03-04 15:23:18 +01:00
Martijn Laarman c2a94aabbc
ilm.explain_lifecycle documents human again (#39113) (#39648)
This is already exposed as a `_common.json` global parameter.

(cherry picked from commit e84050c0307bb5d5cea8eacc6b63b34248a41a01)
2019-03-04 15:23:01 +01:00
Martijn Laarman 9788036857
metric on watcher stats is a list not an enum (#39114) (#39645)
`enum` is a single option from a known list of `options`
`list` is an array of unknown values
`flags` are multiple options from a list of known `options`.

We don't support the `flags` type but a `list` with `options` acts as one. This is already the case for other APIs taking a metric, such as `node.stats.json`.

watcher.stats behaves the same as other APIs taking `metrics` and as such accepts the following: `GET _xpack/watcher/stats/queued_watches,current_watches`

(cherry picked from commit 4c00a025b8ac9b397b27c4ae2f799553d6499412)
2019-03-04 15:22:44 +01:00
Martijn Laarman 7c69fd9e44
parts documented as optional are actually required (#39122) (#39641)
(cherry picked from commit e0f728b44ad49e28477767b3ee783a07ddf4bb0d)
2019-03-04 15:22:26 +01:00
David Kyle a58145f9e6
[ML] Transition to typeless (mapping) APIs (#39573)
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles the
case where the mapping type is either doc or _doc. This change removes
deprecated typed usages.
2019-03-04 13:52:05 +00:00
Ioannis Kakavas 2ce9457c8f Mute Bulk indexing of monitoring data (#39448)
Relates: #30101
2019-02-28 07:40:36 +02:00
Tim Vernum 30687cbe7f
Switch internal security index to ".security-7" (#39422)
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.

In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.

Backport of: #39337
2019-02-27 12:49:44 +11:00
Yogesh Gaikwad 0c7310936b
Fixed required fields and paths list (#39358) (#39428)
Some small fix for the `x-pack` rest api spec.

* In both `security.enable_user.json` and `security.disable_user.json`
   the `username` parameter was `false` instead of `true`
   (the documentation is already correct).
* In `security.get_privileges.json` there were missing all the
   possible paths since the path parameters are not required.
   This fix aligns the document with the rest of the spec,
   where all the possible combinations are listed.
2019-02-27 12:40:15 +11:00
Benjamin Trent 926291aac8
[DATA-FRAME] Sort `GET` transforms and stats by ID (#39365) (#39369)
* [Data-Frame] Sort `GET` transforms and stats by ID

* removing unused import
2019-02-25 14:22:41 -06:00
Benjamin Trent 3d49523726
[DATA-FRAME] adds specs and yml tests for existing endpoints (#39326) (#39363)
* [DATA-FRAME] adds specs and yml tests for existing endpoints

* removing bad URL, adding test for _all
2019-02-25 11:19:49 -06:00
Benjamin Trent 109b6451fd
ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (#38822) (#39119)
* ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs

* Addressing pr feedback
2019-02-19 12:45:56 -06:00
David Roberts bbcdea43c5 [ML] Allow stop unassigned datafeed and relax unset upgrade mode wait (#39034)
These two changes are interlinked.

Before this change unsetting ML upgrade mode would wait for all
datafeeds to be assigned and not waiting for their corresponding
jobs to initialise.  However, this could be inappropriate, if
there was a reason other than upgrade mode why one job was unable
to be assigned or slow to start up.  Unsetting of upgrade mode
would hang in this case.

This change relaxes the condition for considering upgrade mode to
be unset to simply that an assignment attempt has been made for
each ML persistent task that did not fail because upgrade mode
was enabled.  Thus after unsetting upgrade mode there is no
guarantee that every ML persistent task is assigned, just that
each is not unassigned due to upgrade mode.

In order to make setting upgrade mode work immediately after
unsetting upgrade mode it was then also necessary to make it
possible to stop a datafeed that was not assigned.  There was
no particularly good reason why this was not allowed in the past.
It is trivial to stop an unassigned datafeed because it just
involves removing the persistent task.
2019-02-19 14:07:10 +00:00
Martijn Laarman 9b4d96534b
Fix #38623 remove xpack namespace REST API (#38625) (#39036)
* Fix #38623 remove xpack namespace REST API

Except for the xpack.usage and xpack.info APIs, this moves the last remaining APIs out of the xpack namespace

* rename xpack APIs inside the files as well

* updated yaml tests references to xpack namespaces api's

* update callsApi calls in the IT subclasses

* make sure docs testing does not use xpack namespaced api's

* fix leftover xpack namespaced method names in docs/build.gradle

* found another leftover reference

(cherry picked from commit ccb5d934363c37506b76119ac050a254fa80b5e7)
2019-02-18 12:40:07 +01:00
Albert Zaharovits 6243a9797f _cat/indices with Security, hide names when wildcard (#38824)
This changes the output of the `_cat/indices` API with `Security` enabled.

It is possible to only display the index name (and possibly the index
health, depending on the request options) but not its stats (doc count, merges,
size, etc). This is the case for closed indices which have index metadata in the
cluster state but no associated shards, hence no shard stats.
However, when `Security` is enabled, and the request contains wildcards,
**open** indices without stats are a common occurrence. This is because the
index names in the response table are picked up directly from the cluster state
which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the
stats data which is populated by the indices stats API which does go through the
index name resolver.
This is a bug, because it is circumventing `Security`'s function to hide
unauthorized indices.

This has been fixed by displaying the index names as they are resolved by the indices
stats API. The outputs of these two APIs are now very similar: same index names,
similar data but different format.

Closes #37190
2019-02-14 15:09:17 +02:00
Benjamin Trent 24a8ea06f5
ML: update set_upgrade_mode, add logging (#38372) (#38538)
* ML: update set_upgrade_mode, add logging

* Attempt to fix datafeed isolation

Also renamed a few methods/variables for clarity and added
some comments
2019-02-08 12:56:04 -06:00
Boaz Leskes 033ba725af
Remove support for internal versioning for concurrency control (#38254)
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a. optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometimes do the wrong thing. Here's the relevant excerpt from the resiliency status page:

> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document 

We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. 

This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.
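
For comparison, the sequence-number-based compare-and-set looks roughly like this (index name and values are illustrative):

```
PUT /my-index/_doc/1?if_seq_no=10&if_primary_term=2
{
  "user": "kimchy"
}
```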

Relates to #1078
2019-02-05 20:53:35 +01:00
Julie Tibshirani 3ce7d2c9b6
Make sure to reject mappings with type _doc when include_type_name is false. (#38270)
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).

This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:

```
PUT index?include_type_name=false
{
  "mappings": {
    "_doc": {
      "properties": { ... }
    }
  }
}
```

There is a similar issue for both 'put templates' and 'put mappings' requests
as well.

This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
2019-02-05 10:52:32 -08:00
Yogesh Gaikwad fe36861ada
Add support for API keys to access Elasticsearch (#38291)
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to 
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.

This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.

The API keys:-
- by default do not have an expiration but expiration can be
  configured where the API keys need to be expired after a
  certain amount of time.
- when generated will keep authentication information of the user that
   generated them.
- can be defined with a role describing the privileges for accessing
   Elasticsearch and will be limited by the role of the user that
   generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
  before being deleted. The expired API keys remover task handles this.

Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`

The API keys can be used to access Elasticsearch using the `Authorization`
header, where the auth scheme is `ApiKey` and the credentials are the
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```
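
Creating a key might look roughly like this (the key name, expiration and role descriptor are illustrative):

```
POST /_security/api_key
{
  "name": "my-api-key",
  "expiration": "1d",
  "role_descriptors": {
    "role-a": {
      "cluster": ["monitor"],
      "index": [
        { "names": ["index-a*"], "privileges": ["read"] }
      ]
    }
  }
}
```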

Closes #34383
2019-02-05 14:21:57 +11:00
Boaz Leskes f6e06a2b19 Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231)
After backporting #37977, #37857 and #37872
2019-02-01 20:37:16 -05:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Boaz Leskes b11732104f
Move watcher to use seq# and primary term for concurrency control (#37977)
* move watcher to seq# occ

* top level set

* fix parsing and missing setters

* share toXContent for PutResponse and rest end point

* fix redacted password

* fix username reference

* fix deactivate-watch.asciidoc have seq no references

* add seq# + term to activate-watch.asciidoc

* more doc fixes
2019-01-30 20:14:59 -05:00
David Roberts be788160ef
[ML] Datafeed deprecation checks (#38026)
Deprecation checks for the ML datafeed
query and aggregations.
2019-01-30 20:12:20 +00:00
Benjamin Trent 8280a20664
ML: Add upgrade mode docs, hlrc, and fix bug (#37942)
* ML: Add upgrade mode docs, hlrc, and fix bug

* [DOCS] Fixes build error and edits text

* adjusting docs

* Update docs/reference/ml/apis/set-upgrade-mode.asciidoc

Co-Authored-By: benwtrent <ben.w.trent@gmail.com>

* Update set-upgrade-mode.asciidoc

* Update set-upgrade-mode.asciidoc
2019-01-30 06:51:11 -06:00
Adrien Grand c8af0f4bfa
Use mappings to format doc-value fields by default. (#30831)
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.

This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
2019-01-30 10:31:51 +01:00
Tim Brooks 00ace369af
Use `CcrRepository` to init follower index (#35719)
This commit modifies the put follow index action to use a
CcrRepository when creating a follower index. It routes 
the logic through the snapshot/restore process. A 
wait_for_active_shards parameter can be used to configure
how long to wait before returning the response.
2019-01-29 11:47:29 -07:00
Jake Landis 99b75a9bdf
deprecate types for watcher (#37594)
This commit adds deprecation warnings for index actions
and search actions when executed via watcher. Unit and 
integration tests updated accordingly. 

relates #35190
2019-01-28 13:46:43 -06:00
Benjamin Trent 7e4c0e6991
ML: Adds set_upgrade_mode API endpoint (#37837)
* ML: Add MlMetadata.upgrade_mode and API

* Adding tests

* Adding wait conditionals for the upgrade_mode call to return

* Adding tests

* adjusting format and tests

* Adjusting wait conditions for api return and msgs

* adjusting doc tests

* adding upgrade mode tests to black list
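
The endpoint introduced here is toggled with a single query parameter, roughly:

```
POST _ml/set_upgrade_mode?enabled=true
```

and switched back with `?enabled=false`.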
2019-01-28 09:07:30 -06:00
David Kyle c0409fb9f0
[ML] Marginal gains in slow multi node QA tests (#37825)
Move 2 tests that are simple rest tests and out of the QA suite and cut the number
of post data calls in ForecastIT
2019-01-28 10:00:59 +00:00
Benjamin Trent 9e932f4869
ML: removing unnecessary upgrade code (#37879) 2019-01-25 13:57:41 -06:00
Benjamin Trent 5384162a42
ML: creating ML State write alias and pointing writes there (#37483)
* ML: creating ML State write alias and pointing writes there

* Moving alias check to openJob method

* adjusting concrete index lookup for ml-state
2019-01-18 14:32:34 -06:00
Martijn van Groningen 6846666b6b
Add ccr follow info api (#37408)
* Add ccr follow info api

This API returns all follower indices and, per follower index,
the parameters provided at put follow / resume follow time and
whether index following is paused or active.
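
Retrieving this information looks roughly like the following (the follower index name is a placeholder):

```
GET /follower-index/_ccr/info
```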

Closes #37127

* iter

* [DOCS] Edits the get follower info API

* [DOCS] Fixes link to remote cluster

* [DOCS] Clarifies descriptions for configured parameters
2019-01-18 16:37:21 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same approach
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Martijn van Groningen 1a41d84536
[CCR] Resume follow Api should not require a request body (#37217)
Closes #37022
2019-01-10 09:48:26 +01:00
Benjamin Trent df3b58cb04
ML: add migrate anomalies assistant (#36643)
* ML: add migrate anomalies assistant

* adjusting failure handling for reindex

* Fixing request and tests

* Adding tests to blacklist

* adjusting test

* test fix: posting data directly to the job instead of relying on datafeed

* adjusting API usage

* adding Todos and adjusting endpoint

* Adding types to reindexRequest

* removing unreliable "live" data test

* adding index refresh to test

* adding index refresh to test

* adding index refresh to yaml test

* fixing bad exists call

* removing todo

* Addressing remove comments

* Adjusting rest endpoint name

* making service have its own logger

* adjusting validity check for new index names

* fixing typos

* fixing renaming
2019-01-09 14:25:35 -06:00
Jim Ferenczi e38cf1d0dc
Add the ability to set the number of hits to track accurately (#36357)
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade-off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows a lower bound of the total hits to be computed while preserving the ability to skip non-competitive hits when enough matches have been collected.
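
A request that caps accurate hit counting at 1000 hits might look like this (the index and query are illustrative):

```
GET /my-index/_search
{
  "track_total_hits": 1000,
  "query": { "match": { "message": "error" } }
}
```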

Relates #33028
2019-01-04 20:36:49 +01:00
Dimitris Athanasiou 586453fef1
[ML] Remove types from datafeed (#36538)
Closes #34265
2019-01-04 09:43:44 +02:00
Dimitris Athanasiou b04b3173db
[ML][TEST] Clean up max_model_memory_limit cluster setting (#37101)
Removes the `xpack.ml.max_model_memory_limit` cluster setting
at the teardown of the `ml_info.yml` tests to ensure the setting
does not trip other tests.
2019-01-03 15:15:31 +02:00
David Kyle 42bb2bae21
[ML] Order GET job stats response by job id (#36841) 2019-01-02 16:52:20 +00:00
Ioannis Kakavas 0cae979dfe
Remove bwc logic for token invalidation (#36893)
- Removes bwc invalidation logic from the TokenService
- Removes bwc serialization for InvalidateTokenResponse objects as
    old nodes in supported mixed clusters during upgrade will be 6.7 and
    thus will know of the new format
- Removes the created field from the TokensInvalidationResult and the
    InvalidateTokenResponse as it is no longer useful in > 7.0
2018-12-28 13:09:42 +02:00
David Roberts 0f2f00a20a
[ML] Resolve 7.0.0 TODOs in ML code (#36842)
This change cleans up a number of ugly BWC
workarounds in the ML code.

7.0 cannot run in a mixed version cluster with
versions prior to 6.7, so code that deals with
these old versions is no longer required.

Closes #29963
2018-12-20 12:49:57 +00:00
David Kyle e294056bbf
[ML] Merge the Jindex master feature branch (#36702)
* [ML] Job and datafeed mappings with index template (#32719)

Index mappings for the configuration documents

* [ML] Job config document CRUD operations (#32738)

* [ML] Datafeed config CRUD operations (#32854)

* [ML] Change JobManager to work with Job config in index  (#33064)

* [ML] Change Datafeed actions to read config from the config index (#33273)

* [ML] Allocate jobs based on JobParams rather than cluster state config (#33994)

* [ML] Return missing job error when .ml-config does not exist (#34177)

* [ML] Close job in index (#34217)

* [ML] Adjust finalize job action to work with documents (#34226)

* [ML] Job in index: Datafeed node selector (#34218)

* [ML] Job in Index: Stop and preview datafeed (#34605)

* [ML] Delete job document (#34595)

* [ML] Convert job data remover to work with index configs (#34532)

* [ML] Job in index: Get datafeed and job stats from index (#34645)

* [ML] Job in Index: Convert get calendar events to index docs (#34710)

* [ML] Job in index: delete filter action (#34642)

This changes the delete filter action to search
for jobs using the filter to be deleted in the index
rather than the cluster state.

* [ML] Job in Index: Enable integ tests (#34851)

Enables the ml integration tests excluding the rolling upgrade tests and a lot of fixes to
make the tests pass again.

* [ML] Reimplement established model memory (#35500)

This is the 7.0 implementation of a master node service to
keep track of the native process memory requirement of each ML
job with an associated native process.

The new ML memory tracker service works when the whole cluster
is upgraded to at least version 6.6. For mixed version clusters
the old mechanism of established model memory stored on the job
in cluster state was used. This means that the old (and complex)
code to keep established model memory up to date on the job object
has been removed in 7.0.

Forward port of #35263

* [ML] Need to wait for shards to replicate in distributed test (#35541)

Because the cluster was expanded from 1 node to 3, indices would
initially start off with 0 replicas.  If the original node was
killed before auto-expansion to 1 replica was complete then
the test would fail because the indices would be unavailable.

* [ML] DelayedDataCheckConfig index mappings (#35646)

* [ML] JIndex: Restore finalize job action (#35939)

* [ML] Replace Version.CURRENT in streaming functions (#36118)

* [ML] Use 'anomaly-detector' in job config doc name (#36254)

* [ML] Job In Index: Migrate config from the clusterstate (#35834)

Migrate ML configuration from clusterstate to index for closed jobs
only once all nodes are v6.6.0 or higher

* [ML] Check groups against job Ids on update (#36317)

* [ML] Adapt to periodic persistent task refresh (#36633)

* [ML] Adapt to periodic persistent task refresh

If https://github.com/elastic/elasticsearch/pull/36069/files is
merged then the approach for reallocating ML persistent tasks
after refreshing job memory requirements can be simplified.
This change begins the simplification process.

* Remove AwaitsFix and implement TODO

* [ML] Default search size for configs

* Fix TooManyJobsIT.testMultipleNodes

Two problems:

1. Stack overflow during async iteration when lots of
   jobs on same machine
2. Not effectively setting search size in all cases

* Use execute() instead of submit() in MlMemoryTracker

We don't need a Future to wait for completion

* [ML][TEST] Fix NPE in JobManagerTests

* [ML] JIindex: Limit the size of bulk migrations (#36481)

* [ML] Prevent updates and upgrade tests (#36649)

* [FEATURE][ML] Add cluster setting that enables/disables config  migration (#36700)

This commit adds a cluster settings called `xpack.ml.enable_config_migration`.
The setting is `true` by default. When set to `false`, no config migration will
be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able
to be updated normally.

Relates #32905

* [ML] Snapshot ml configs before migrating (#36645)

* [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716)

Relates #32905

* SQL: Fix translation of LIKE/RLIKE keywords (#36672)

* SQL: Fix translation of LIKE/RLIKE keywords

Refactor Like/RLike functions to simplify internals and improve query
 translation when chained or within a script context.

Fix #36039
Fix #36584

* Fixing line length for EnvironmentTests and RecoveryTests (#36657)

Relates #34884

* Add back one line removed by mistake regarding java version check and
COMPAT jvm parameter existence

* Do not resolve addresses in remote connection info (#36671)

The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically changing
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.

* [Painless] Add boxed type to boxed type casts for method/return (#36571)

This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist.

* SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718)

* Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+

* ingest: fix on_failure with Drop processor (#36686)

This commit allows a document to be dropped when a Drop processor
is used in the on_failure fork of the processor chain.

Fixes #36151

* Initialize startup `CcrRepositories` (#36730)

Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.

* [TEST] fix float comparison in RandomObjects#getExpectedParsedValue

This commit fixes a test bug introduced with #36597. This caused some
test failure as stored field values comparisons would not work when CBOR
xcontent type was used.

Closes #29080

* [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)

This commit  exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new 
indexing approach, simply set "type" : "geo_shape" in 
the mappings without setting any of the strategy, precision, 
tree_levels, or distance_error_pct parameters. Note the 
following when using the new indexing approach:

* geo_shape query does not support querying by 
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not 
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct, 
and points_only parameters are deprecated.

* TESTS:Debug Log. IndexStatsIT#testFilterCacheStats

* ingest: support default pipelines + bulk upserts (#36618)

This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.

However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.

This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.

fixes #36219

* Fix duplicate phrase in shrink/split error message (#36734)

This commit removes a duplicate "must be a" from the shrink/split error
messages.

* Deprecate types in get_source and exist_source (#36426)

This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existence. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
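
The untyped endpoint can be exercised roughly like this (index name and document id are placeholders):

```
GET /my-index/_source/1
HEAD /my-index/_source/1
```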

Relates to #35190

* Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"

This reverts commit 5bc7822562.

* Enhance Invalidate Token API (#35388)

This change:

- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access) tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the 
   number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation

After back-porting to 6.x, the `created` field will be removed from the response in master.

Resolves: #35115
Relates: #34556
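
A hedged sketch of the realm-wide invalidation described above (realm name and counts are illustrative, shown with the un-prefixed `_security` path introduced elsewhere in this history):

```
DELETE /_security/oauth2/token
{
  "realm_name": "saml1"
}
```

The response then reports how many tokens were invalidated and any errors encountered, roughly:

```
{
  "invalidated_tokens": 9,
  "previously_invalidated_tokens": 0,
  "error_count": 0
}
```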

* Add raw sort values to SearchSortValues transport serialization (#36617)

In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat`. The CCS node needs to be able to reconstruct the Lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.

This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.

* Watcher: Ensure all internal search requests count hits (#36697)

In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.

Closes #36177

* [TEST] Ensure shard follow tasks have really stopped.

Relates to #36696

* Ensure MapperService#getAllMetaFields elements order is deterministic (#36739)

MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such a set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.

With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyway, this seems like a good change, as returning an array implies
that it is deterministically sorted.

* Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721)

Relates #36148 
Relates #10708
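
As an illustrative sketch of what this exposes (index, id, and values are made up), a write can now be made conditional on the sequence number and primary term returned by an earlier operation:

```
PUT /my_index/_doc/1?if_seq_no=5&if_primary_term=1
{
  "message": "only applied if the document is still at seq_no 5, primary term 1"
}
```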

* [ML] Mute MlDistributedFailureIT
2018-12-18 17:45:31 +00:00
Alexander Reelsen 00521a5b36
Watcher: Ensure all internal search requests count hits (#36697)
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.

Closes #36177
2018-12-18 10:11:39 +01:00
Ioannis Kakavas 7b9ca62174
Enhance Invalidate Token API (#35388)
This change:

- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access) tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
   number of invalidated tokens and possible errors that were encountered.
- Updates the API Documentation

After back-porting to 6.x, the `created` field will be removed from the response in master.

Resolves: #35115
Relates: #34556
2018-12-18 10:05:50 +02:00
Nik Everett 03daad9812
Re-deprecate xpack rollup endpoints (#36451)
Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`.

When we cleanup the rollup in a cluster containing 6.x nodes we need to
use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't
know about `/_rollup`. In those cases we must ignore the deprecation
warnings that the 7.0 node will return for the endpoint.

Closes #36044
2018-12-11 19:43:17 -05:00
Ioannis Kakavas d7c5d8049a
Deprecate /_xpack/security/* in favor of /_security/* (#36293)
* This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.

- REST API docs
- HLRC docs and doc tests
- Handle REST actions with deprecation warnings
- Changed endpoints in rest-api-spec and relevant file names
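
For example (user management is just an illustrative endpoint), the same call is now reachable without the `_xpack` prefix, while the old route remains but emits a deprecation warning:

```
# deprecated
GET /_xpack/security/user

# preferred
GET /_security/user
```
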
2018-12-11 11:13:10 +02:00
Jason Tedor 0909a631ba
Add non-X-Pack centric rollup endpoints (#36383)
* Add non-X-Pack centric rollup endpoints

This commit adds new endpoints for rollup that do not have _xpack in
their path. The purpose for this change is to take these endpoints into
6.x as well so that they can be available in mixed cluster tests too. A
follow-up change will deprecate the use of _xpack in the rollup
endpoints. And finally, in the future, we would remove the _xpack
endpoints.

* Remove import

* Fix typo
2018-12-10 14:50:30 -05:00
David Turner ca3f5c1e2e
Cancel GetDiscoveredNodesAction when bootstrapped (#36423)
Today the `GetDiscoveredNodesAction` waits, possibly indefinitely, to discover
enough nodes to bootstrap the cluster. However it is possible that the cluster
forms before a node has discovered the expected collection of nodes, in which
case the action will wait indefinitely despite the fact that it is no longer
required.

This commit changes the behaviour so that the action fails once a node receives
a cluster state with a nonempty configuration, indicating that the cluster has
been successfully bootstrapped and therefore the `GetDiscoveredNodesAction`
need wait no longer.

Relates #36380 and #36381; reverts 558f4ec278.
2018-12-10 17:23:03 +00:00
David Roberts 558f4ec278
Ignore zen2 discovery task in waitForPendingTasks (#36381)
Fixes #36380
2018-12-10 11:07:02 +00:00
Michael Basnight b5b6e37a60
Deprecate X-Pack centric watcher endpoints (#36218)
This commit is part of our plan to deprecate and ultimately remove the use of
_xpack in the REST APIs.

Relates #35958
2018-12-08 12:57:16 -06:00
David Roberts 9e8cfbb40d
[ML] Deprecate X-Pack centric ML endpoints (#36315)
This commit is part of our plan to deprecate and
ultimately remove the use of _xpack in the REST APIs.

Relates #35958
2018-12-07 20:34:11 +00:00
Nik Everett ead2b9e08b
HLRC: Add rollup search (#36334)
Relates to #29827
2018-12-07 14:39:58 -05:00
Benjamin Trent adc8355c5d
Adding additional tests for agg parsing in datafeedconfig (#36261)
* Adding additional tests for agg parsing in datafeedconfig

* Fixing bug, adding yml test
2018-12-06 11:19:34 -06:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equal to `eq`)
or a lower bound of the total (in which case it is equal to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
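
A sketch of the new response fragment (numbers are illustrative):

```
"hits" : {
  "total" : {
    "value" : 42,
    "relation" : "eq"
  }
}
```

Passing `?rest_total_hits_as_int=true` on the search request restores the old numeric form of `hits.total`.
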
2018-12-05 19:49:06 +01:00
Tal Levy fdec331b13
[ILM] fix ilm.remove_policy rest-spec (#36165)
The rest interface for the remove-policy-from-index API does not support
a bare `_ilm/remove`; it requires that an `{index}` pattern be defined in
the URL path. This fixes the rest-api-spec to reflect the implementation.
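
Concretely, the spec now describes calls of this shape (index name is illustrative):

```
POST /my_index/_ilm/remove
```
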
2018-12-03 10:55:31 -08:00
Jake Landis f8f521bad4
Deprecate /_xpack/monitoring/* in favor of /_monitoring/* (#36130)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.

* Add deprecation for /_xpack/monitoring/_bulk in favor of /_monitoring/bulk
* Removed xpack from the rest-api-spec and tests
* Removed xpack from the Action name
* Removed MonitoringRestHandler as an unnecessary abstraction
* Minor corrections to comments

Relates #35958
2018-12-03 10:26:08 -06:00
Tal Levy 78e07d467c
[ILM] rest-spec for remove-policy had wrong link (#36083) 2018-11-29 14:11:51 -08:00
Zachary Tong 61c2db5ebb Revert "Deprecate X-Pack centric rollup endpoints (#35962)"
This reverts commit b84f1f6a3a.
2018-11-29 12:58:23 -05:00
Jim Ferenczi 8a7f3f75f3
Add support for rest_total_hits_as_int (#36051)
The support for rest_total_hits_as_int has already been merged to 6.x
in #35848, so this change adds this new option to master. The plan was
to add this new option as part of #35848, but we've decided to wait a few
days before merging this breaking change, so this commit just handles
the new option as a noop exactly like 6.x for now. This will allow
users to migrate to this parameter before #35848 is merged.

Relates #33028
2018-11-29 18:36:16 +01:00
Jason Tedor 9fa9e1419f
Remove X-Pack centric graph endpoints (#36010)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-29 07:09:37 -05:00
Gordon Brown c26af3b0a2
Deprecate X-Pack centric Migration endpoints (#35976)
This commit is part of our plan to deprecate and remove the use of
_xpack in the REST API routes.
2018-11-28 13:19:33 -07:00
Jason Tedor a3186e4a32
Deprecate X-Pack centric license endpoints (#35959)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-28 08:24:35 -05:00
Jason Tedor c42d9d91c9
Deprecate X-Pack centric SQL endpoints (#35964)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-27 22:16:21 -05:00
Jason Tedor b84f1f6a3a
Deprecate X-Pack centric rollup endpoints (#35962)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-27 20:34:17 -05:00