1978 Commits

Author SHA1 Message Date
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Benjamin Trent 8f302282f4
[ML] adds new feature_processors field for data frame analytics (#60528) (#61148)
feature_processors allow users to create custom features from
individual document fields.

These `feature_processors` are the same object as the trained model's pre_processors.

They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.

closes https://github.com/elastic/elasticsearch/issues/59327
2020-08-14 10:32:20 -04:00
David Roberts d1b60269f4
[ML] Ensure annotations index mappings are up to date (#61142)
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind.  Now the ML backend also creates
annotations, and those mappings can change between
versions.

In this change:

1. The code that runs on the master node to create the
   annotations index (if it doesn't exist but another ML
   index does) now also ensures the mappings are up-to-date.
   This is good enough for the ML UI's use of the
   annotations index, because the upgrade order rules say
   that the whole Elasticsearch cluster must be upgraded
   prior to Kibana, so the master node should be on the
   newer version before Kibana tries to write an
   annotation with the new fields.
2. We now also check whether the annotations index exists
   with the correct mappings before starting an autodetect
   process on a node.  This is necessary because ML nodes
   can be upgraded before the master node, so could write
   an annotation with the new fields before the master node
   knows about the new fields.

Backport of #61107
2020-08-14 13:51:04 +01:00
Benjamin Trent 7c3bfb9437
[ML] updating feature_importance results mapping (#61104) (#61144)
This updates the feature_importance mapping change from elastic/ml-cpp#1387
2020-08-14 08:43:10 -04:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role act as though they have all of these roles configured (such a node is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`
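
As a rough illustration of how such tier filters could be evaluated, the following Python sketch derives a node's tiers from its roles and checks them against `require`, `include`, and `exclude` values. The function names and the matching semantics (all / at least one / none) are assumptions made for illustration; they are not taken from the `AllocationDecider` added by this commit.

```python
# Hypothetical model of tier-based allocation filtering; names are illustrative only.
ALL_TIERS = {"data_hot", "data_warm", "data_cold", "data_frozen"}


def node_tiers(roles):
    """Return the tiers a node belongs to; the generic `data` role counts as every tier."""
    roles = set(roles)
    if "data" in roles:
        return set(ALL_TIERS)
    return roles & ALL_TIERS


def can_allocate(roles, require=(), include=(), exclude=()):
    """Assumed semantics: require = all listed tiers, include = at least one, exclude = none."""
    tiers = node_tiers(roles)
    if any(t not in tiers for t in require):
        return False
    if include and not any(t in tiers for t in include):
        return False
    if any(t in tiers for t in exclude):
        return False
    return True
```

Under these assumptions, `can_allocate(["data_warm"], require=["data_hot"])` is rejected, while a node with the plain `data` role satisfies any `require` or `include` filter.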

Relates to #60848
2020-08-12 11:06:23 -06:00
Andrei Dan 32173a82c8
ILM: add frozen phase (#60983) (#61035)
This adds a frozen phase to ILM that will allow the execution of the
set_priority, unfollow, allocate, freeze and searchable_snapshot actions.

The frozen phase will be executed after the cold and before the delete phase.
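
The ordering and the allowed actions described above can be sketched in a few lines of Python. This is a simplified model, not the ILM implementation: the helper names are hypothetical, and the `hot` and `warm` entries are included only to show where `frozen` sits in the sequence.

```python
# Phase order as described above: frozen sits between cold and delete.
PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]

# Actions the commit message lists for the frozen phase.
FROZEN_ACTIONS = {"set_priority", "unfollow", "allocate", "freeze", "searchable_snapshot"}


def next_phase(current, defined_phases):
    """Return the next phase defined in the policy after `current`, or None."""
    start = PHASE_ORDER.index(current) + 1
    for phase in PHASE_ORDER[start:]:
        if phase in defined_phases:
            return phase
    return None


def invalid_frozen_actions(actions):
    """Return the actions that are not allowed in the frozen phase, sorted."""
    return sorted(set(actions) - FROZEN_ACTIONS)
```

A policy that defines `cold`, `frozen`, and `delete` would move from `cold` to `frozen`; one without a frozen phase would go from `cold` straight to `delete`.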

(cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-12 16:36:27 +01:00
Benjamin Trent 4275a715c9
[ML] adjusting inference processor to support foreach usage (#60915) (#61022)
`foreach` processors store information within the `_ingest` metadata object.

This commit adds the contents of the `_ingest` metadata (if it is not empty).

It also appends new inference results if the result field already exists.

This allows a `foreach` to execute and write multiple inference results to the same result field.
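
The append behaviour can be pictured with a small Python sketch. The function name and the way results are stored (a single value that becomes a list on the second write) are assumptions for illustration only; the commit message does not spell out the stored shape.

```python
def write_inference_result(document, result_field, result):
    """Store `result` under `result_field`, appending when a value already exists."""
    if result_field not in document:
        document[result_field] = result
    elif isinstance(document[result_field], list):
        document[result_field].append(result)
    else:
        # Second write: turn the existing single value into a list of results.
        document[result_field] = [document[result_field], result]
    return document
```

Calling this once per `foreach` iteration with the same `result_field` accumulates one entry per iteration instead of overwriting the previous result.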

closes https://github.com/elastic/elasticsearch/issues/60867
2020-08-12 08:34:18 -04:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Dimitris Athanasiou 2e18c0f2ac
[7.x][ML] Audit force stopping data frame analytics (#60973) (#61004)
Audits a message when a data frame analytics job is force stopped.

Backport of #60973
2020-08-12 07:45:26 +03:00
Benjamin Trent 66b3e89482
[ML] enable logging for test failures (#60902) (#60910) 2020-08-10 12:36:30 -04:00
Andrei Dan 235e5ed3ea
[7.x] ILM: add force-merge step to searchable snapshots action (#60819) (#60882)
This adds a force-merge step to the searchable snapshot action, enabled by default,
but parameterizable using the `force_merge_index` optional boolean.

e.g.
```
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "backing_repo",
            "force_merge_index": true
          }
        }
      }
    }
  }
}
```

(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-10 13:45:11 +01:00
Hendrik Muhs b210aaf666
[Transform] remove wrong test (#60807)
remove test, scripts are excluded in the change collector, the test is a leftover from a previous
solution of #57332, which has been discarded

relates #60724
fixes #60794
2020-08-06 11:56:19 +02:00
Ryan Ernst d88098c1d5
Mute flaky transform pivot test
see https://github.com/elastic/elasticsearch/issues/60794
2020-08-05 14:53:25 -07:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This new `RecoveryState` instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

- The Directory implementation that's provided by SearchableSnapshots marks the
  snapshot files as reused during recovery. In order to keep track of the
  recovery process as the cache is pre-warmed, those files shouldn't be marked
  as reused.
- Once the shard is created, the cache starts its pre-warming phase, meaning that
  we should keep track of those downloads during that process and tie the recovery
  to this pre-warming phase. The shard is considered recovered once this pre-warming
  phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Hendrik Muhs 08f94c914b
[Transform] disable optimizations when using scripts in group_by (#60724)
Disable optimizations when using scripts in group_by. When using scripts we cannot predict
the outcome and we have no query counterpart. Optimizations for other group_by's are not
affected.

fixes #57332
2020-08-05 17:27:19 +02:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Yannick Welsch 9f6f66f156
Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Adrien Grand 67f6f34c23
Remove dataset.* fields. (#60720)
These are being replaced by the `data_stream.*` fields.
2020-08-05 11:35:05 +02:00
Adrien Grand 602d269059
Rename `datastream` to `data_stream`. (#60714)
Since the feature name contains a space ("data stream"), the key should
have an underscore.
2020-08-05 09:55:02 +02:00
Adrien Grand 20ae1b75bd
Rename dataset to datastream (#60638)
Co-authored-by: ruflin <spam@ruflin.com>
2020-08-04 09:58:54 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Yannick Welsch b0d601fa63
Adjust searchable snapshot license (#60578)
No longer needs Platinum license for testing on staging.
2020-08-03 13:19:53 +02:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Hendrik Muhs a721d6d19b
[Transform] use correct version in BWC serialization test (#60500)
use correct version in BWC serialization test

fixes #60464
2020-07-31 11:23:05 +02:00
Przemysław Witek 9e27f7474c
Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121) (#60439) 2020-07-30 09:53:11 +02:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Benjamin Trent 76359aaa53
[ML] always write prediction_[score|probability] for classification inference (#60335) (#60397)
In order to unify model inference and analytics results we
need to write the same fields.

prediction_probability and prediction_score are now written
for inference calls against classification models.
2020-07-29 10:58:14 -04:00
Jim Ferenczi 578749a5e8
Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337)
This change ensures that the expiration time that is set in the test
is long enough to not be triggered by a slow execution.

Closes #60255
2020-07-29 09:47:47 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Benjamin Trent 54c8936508
[ML] do not summarize importance for custom features (#60198) (#60333)
If a feature is created via a custom pre-processor,
we should return the importance for that feature.

This means we will not return the importance for the
original document field for custom processed features.

closes https://github.com/elastic/elasticsearch/issues/59330
2020-07-28 15:58:20 -04:00
Dimitris Athanasiou ed7dcff7c4
[7.x][ML] Audit updates on data frame analytics jobs (#60126) (#60287)
Closes #59652

Backport of #60126
2020-07-28 16:33:35 +03:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementation detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Dan Hermann ca25f6ae6f
Include the resolve index action in the view_index_metadata privilege (#59785) (#60112) 2020-07-23 08:13:56 -05:00
Larry Gregory a686ccc9b2
[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854) (#60047) 2020-07-22 11:06:10 -04:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true`, prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.
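
The decision described above can be modelled with a short Python sketch. It is a simplification under stated assumptions: `action.auto_create_index` is reduced to a single boolean, and the function name, return values, and error type are hypothetical.

```python
def resolve_write_target(name, aliases, indices, require_alias=False, auto_create_index=True):
    """Return what an indexing request to `name` would do in this simplified model."""
    if name in aliases:
        return "write_to_alias"
    if require_alias:
        # With the flag set, the destination must be an alias; nothing is auto-created.
        raise ValueError(f"[{name}] is not an alias and require_alias is set")
    if name in indices:
        return "write_to_index"
    if auto_create_index:
        return "auto_create_index"
    raise ValueError(f"[{name}] does not exist and auto-creation is disabled")
```

With `require_alias=True`, a request to a name that is not an alias fails instead of silently creating a concrete index of that name.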

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false`, the current behavior is unchanged (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longer deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Tanguy Leroux 604f22db79
Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590)
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.

Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situations arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.

This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.

This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complete integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
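
The benefit of separating the two kinds of work can be illustrated with a small Python sketch. It only mimics the situation described above: the pool sizes, task functions, and names are assumptions, and the real code is Java.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def write_cache_range(ready):
    """Stand-in for an asynchronous cache write; marks the cache file as readable."""
    ready.set()
    return "written"


def prewarm(part, ready):
    """Stand-in for a prewarming task that blocks until the cache file is accessible."""
    if not ready.wait(timeout=5):
        raise TimeoutError("cache file never became ready")
    return f"prewarmed part {part}"


def run_demo():
    ready = threading.Event()
    # Separate pools: blocked prewarming tasks cannot starve cache writes.
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="prewarming") as prewarm_pool, \
            ThreadPoolExecutor(max_workers=2, thread_name_prefix="cache_fetch") as fetch_pool:
        # More prewarming tasks than workers, so the prewarming pool is saturated.
        prewarm_futures = [prewarm_pool.submit(prewarm, i, ready) for i in range(4)]
        write_future = fetch_pool.submit(write_cache_range, ready)
        results = [f.result(timeout=10) for f in prewarm_futures]
        return write_future.result(timeout=10), results
```

If both kinds of task shared one two-worker pool, the two blocked prewarming tasks would occupy every worker and the write that unblocks them could not start until they timed out.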
2020-07-15 11:45:52 +02:00
Tal Levy 4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Ryan Ernst 3b688bfee5
Add license feature usage api (#59342) (#59571)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
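
A minimal Python sketch of this kind of tracking is shown below. The class and method names are hypothetical, and recording usage only when the license check succeeds is an assumption; the commit message says only that tracking happens whenever a feature is checked against the current license.

```python
import time


class FeatureUsageTracker:
    """Records the last time each licensed feature was checked (illustrative model)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_used = {}  # (feature name, license level) -> timestamp

    def check_feature(self, name, license_level, current_license_allows):
        """Check a feature against the license and record the usage as a side effect."""
        if current_license_allows:
            self._last_used[(name, license_level)] = self._clock()
        return current_license_allows

    def usage(self):
        """Return one entry per used feature: name, license level, last used time."""
        return [
            {"name": name, "license_level": level, "last_used": ts}
            for (name, level), ts in sorted(self._last_used.items())
        ]
```

A freshly created tracker reports an empty list, which corresponds to the rest test's expectation that a default configured node registers no features as used.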

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 14:34:59 -07:00
Albert Zaharovits 4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
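
The deduplication idea can be sketched in Python as a store that writes a metadata blob only when the same content has not been written before. This is a toy model: the class is hypothetical, and a SHA-256 content hash stands in for the identifier that the PR tracks in `RepositoryData` via setting generations and uuids.

```python
import hashlib
import uuid


class MetadataBlobStore:
    """Toy repository that writes an index metadata blob only when its content is new."""

    def __init__(self):
        self.blobs = {}          # blob name -> serialized metadata
        self._by_identity = {}   # content identity -> blob name
        self.writes = 0

    def snapshot_index_metadata(self, serialized_metadata: bytes) -> str:
        """Return the blob name for this metadata, reusing an existing blob if possible."""
        # Assumption: a content hash stands in for the identifier tracked in RepositoryData.
        identity = hashlib.sha256(serialized_metadata).hexdigest()
        existing = self._by_identity.get(identity)
        if existing is not None:
            return existing
        blob_name = f"meta-{uuid.uuid4().hex}.dat"
        self.blobs[blob_name] = serialized_metadata
        self._by_identity[identity] = blob_name
        self.writes += 1
        return blob_name
```

Snapshotting an index whose metadata has not changed returns the existing blob name and performs no write, which is the saving described above.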

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes
2020-07-14 22:18:42 +02:00
Nhat Nguyen 4d7c59bedb
Assign follower primary to nodes with remote cluster client role (#59375)
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-07-14 11:23:55 -04:00
Andrei Stefan cf752992d6
Add telemetry metrics (#59526) 2020-07-14 16:25:24 +03:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00