Commit Graph

2034 Commits

Author SHA1 Message Date
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Igor Motov f70a59971a
[7.x] Add rate aggregation (#61369) (#61554)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 17:39:00 -04:00
Przemyslaw Gomulka f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
David Kyle 539cf914bc
[ML] handle new model metadata stream from native process (#59725) (#61251)
This adds the serialization handling for the new model_metadata object from the native process.

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-08-24 15:52:13 -04:00
Yang Wang cd52233b94
Include authentication type for the authenticate response (#61247) (#61411)
Add a new "authentication_type" field to the response of "GET _security/_authenticate".
2020-08-21 22:59:43 +10:00
Lloyd cb83e7011c
[Backport][API keys] Add full_name and email to API key doc and use them to populate authing User (#61354) (#61403)
The API key document currently doesn't include the user's full_name or email attributes,
and as a result, when those attributes return `null` when hitting `GET`ing  `/_security/_authenticate`,
and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046).

This changeset adds those fields to the document and extracts them to fill in the User when
authenticating. They're effectively going to be a snapshot of the User from when the key was
created, but this is in line with roles and metadata as well.

Signed-off-by: lloydmeta <lloydmeta@gmail.com>
2020-08-21 18:32:19 +09:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Benjamin Trent 8f302282f4
[ML] adds new feature_processors field for data frame analytics (#60528) (#61148)
feature_processors allow users to create custom features from
individual document fields.

These `feature_processors` are the same object as the trained model's pre_processors.

They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.

closes https://github.com/elastic/elasticsearch/issues/59327
2020-08-14 10:32:20 -04:00
David Roberts d1b60269f4
[ML] Ensure annotations index mappings are up to date (#61142)
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind.  Now the ML backend also creates
annotations, and those mappings can change between
versions.

In this change:

1. The code that runs on the master node to create the
   annotations index if it doesn't exist but another ML
   index does also now ensures the mappings are up-to-date.
   This is good enough for the ML UI's use of the
   annotations index, because the upgrade order rules say
   that the whole Elasticsearch cluster must be upgraded
   prior to Kibana, so the master node should be on the
   newer version before Kibana tries to write an
   annotation with the new fields.
2. We now also check whether the annotations index exists
   with the correct mappings before starting an autodetect
   process on a node.  This is necessary because ML nodes
   can be upgraded before the master node, so could write
   an annotation with the new fields before the master node
   knows about the new fields.

Backport of #61107
2020-08-14 13:51:04 +01:00
Benjamin Trent 7c3bfb9437
[ML] updating feature_importance results mapping (#61104) (#61144)
This updates the feature_importance mapping change from elastic/ml-cpp#1387
2020-08-14 08:43:10 -04:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to #60848
2020-08-12 11:06:23 -06:00
Andrei Dan 32173a82c8
ILM: add frozen phase (#60983) (#61035)
This adds a frozen phase to ILM that will allow the execution of the
set_priority, unfollow, allocate, freeze and searchable_snapshot actions.

The frozen phase will be executed after the cold and before the delete phase.

(cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-12 16:36:27 +01:00
Benjamin Trent 4275a715c9
[ML] adjusting inference processor to support foreach usage (#60915) (#61022)
`foreach` processors store information within the `_ingest` metadata object.

This commit adds the contents of the `_ingest` metadata (if it is not empty).

And will append new inference results if the result field already exists.

This allows a `foreach` to execute and multiple inference results being written to the same result field.

closes https://github.com/elastic/elasticsearch/issues/60867
2020-08-12 08:34:18 -04:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Dimitris Athanasiou 2e18c0f2ac
[7.x][ML] Audit force stopping data frame analytics (#60973) (#61004)
Audits a message when a data frame analytics job is force stopped.

Backport of #60973
2020-08-12 07:45:26 +03:00
Benjamin Trent 66b3e89482
[ML] enable logging for test failures (#60902) (#60910) 2020-08-10 12:36:30 -04:00
Andrei Dan 235e5ed3ea
[7.x] ILM: add force-merge step to searchable snapshots action (#60819) (#60882)
This adds a force-merge step to the searchable snapshot action, enabled by default,
but parameterizable using the `force_merge-index" optional boolean.

eg.
```
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "backing_repo",
            "force_merge_index": true
          }
        }
      }
    }
  }
}
```

(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-10 13:45:11 +01:00
Hendrik Muhs b210aaf666 [Transform] remove wrong test (#60807)
remove test, scripts are excluded in the change collector, the test is a leftover from a previous
solution of #57332, which has been discarded

relates #60724
fixes #60794
2020-08-06 11:56:19 +02:00
Ryan Ernst d88098c1d5
Mute flaky transform pivot test
see https://github.com/elastic/elasticsearch/issues/60794
2020-08-05 14:53:25 -07:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

-  The Directory implementation that's provided by SearchableSnapshots mark the
    snapshot files as reused during recovery. In order to keep track of the
    recovery process as the cache is pre-warmed, those files shouldn't be marked
    as reused.
 - Once the shard is created, the cache starts its pre-warming phase, meaning that
    we should keep track of those downloads during that process and tie the recovery
    to this pre-warming phase. The shard is considered recovered once this pre-warming
    phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Hendrik Muhs 08f94c914b [Transform] disable optimizations when using scripts in group_by (#60724)
disable optimizations when using scripts in group_by, when scripts using scripts we can not predict
the outcome and we have no query counterpart. Other optimizations for other group_by's are not
affected.

fixes #57332
2020-08-05 17:27:19 +02:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Adrien Grand 67f6f34c23
Remove dataset.* fields. (#60720)
These are being replaced by the `data_stream.*` fields.
2020-08-05 11:35:05 +02:00
Adrien Grand 602d269059
Rename `datastream` to `data_stream`. (#60714)
The name of the feature having a space: "data stream", the key should
have an underscore.
2020-08-05 09:55:02 +02:00
Adrien Grand 20ae1b75bd
Rename dataset to datastream (#60638)
Co-authored-by: ruflin <spam@ruflin.com>
2020-08-04 09:58:54 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Yannick Welsch b0d601fa63 Adjust searchable snapshot license (#60578)
No longer needs Platinum license for testing on staging.
2020-08-03 13:19:53 +02:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Hendrik Muhs a721d6d19b [Transform] use correct version in BWC serialization test (#60500)
use correct version in BWC serialization test

fixes #60464
2020-07-31 11:23:05 +02:00
Przemysław Witek 9e27f7474c
Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121) (#60439) 2020-07-30 09:53:11 +02:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Benjamin Trent 76359aaa53
[ML] always write prediction_[score|probability] for classification inference (#60335) (#60397)
In order to unify model inference and analytics results we
need to write the same fields.

prediction_probability and prediction_score are now written
for inference calls against classification models.
2020-07-29 10:58:14 -04:00
Jim Ferenczi 578749a5e8 Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337)
This change ensures that the expiration time that is set in the test
is long enough to not be triggered by a slow execution.

Closes #60255
2020-07-29 09:47:47 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Benjamin Trent 54c8936508
[ML] do not summerize importance for custom features (#60198) (#60333)
If a feature is created via a custom pre-processor,
we should return the importance for that feature.

This means we will not return the importance for the
original document field for custom processed features.

closes https://github.com/elastic/elasticsearch/issues/59330
2020-07-28 15:58:20 -04:00
Dimitris Athanasiou ed7dcff7c4
[7.x][ML] Audit updates on data frame analytics jobs (#60126) (#60287)
Closes #59652

Backport of #60126
2020-07-28 16:33:35 +03:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementaton detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Dan Hermann ca25f6ae6f
Include the resolve index action in the view_index_metadata privilege (#59785) (#60112) 2020-07-23 08:13:56 -05:00
Larry Gregory a686ccc9b2
[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854) (#60047) 2020-07-22 11:06:10 -04:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Tanguy Leroux 604f22db79
Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590)
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.

Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
 been completed because of the prewarming tasks making no progress and
filling up the thread pool.

This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.

This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
2020-07-15 11:45:52 +02:00
Tal Levy 4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Ryan Ernst 3b688bfee5
Add license feature usage api (#59342) (#59571)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 14:34:59 -07:00
Albert Zaharovits 4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00
Nhat Nguyen 4d7c59bedb
Assign follower primary to nodes with remote cluster client role (#59375)
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-07-14 11:23:55 -04:00
Andrei Stefan cf752992d6
Add telemetry metrics (#59526) 2020-07-14 16:25:24 +03:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Hendrik Muhs c8290167a0
[7.x][Transform] separate pivot and extract function interface (#59505)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.

piggy backed fixes:
 - when running geo tile group_by it could fail due to query clause limit (unreleased)
 - new style page size using settings was not validating limit of 10k (7.8)
2020-07-14 11:27:16 +02:00
Lee Hinman bf1a60130d
[7.x] Add telemetery for data streams (#59433) (#59454)
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:

```
  ...
  "data_streams" : {
    "available" : true,
    "enabled" : true,
    "data_streams" : 3,
    "indices_count" : 17
  }
  ...
```
2020-07-13 14:30:11 -06:00
David Roberts b5e8250a4e
[ML] Drive categorization warning notifications from annotations (#59393)
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work.  However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to.  Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes.  (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list.  This means it's best to have both.)

Backport of #59377
2020-07-13 15:28:57 +01:00
Yang Wang a84469742c
Improve role cache efficiency for API key roles (#58156) (#59397)
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
2020-07-13 22:58:11 +10:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Dimitris Athanasiou d07b11b86b
[7.x][ML] Perform test inference on java (#58877) (#59298)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.

Backport of #58877

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-07-09 16:30:49 +03:00
David Kyle 86555ec163
Remove unused function InferenceIndexConstants.mapping() (#59146) (#59158)
InferenceIndexConstants.mapping() is broken and unused.
2020-07-09 14:28:53 +01:00
David Kyle dbb9c802b1
Better error message when the model cannot be parsed due to its size (#59166) (#59209)
The actual cause can be lost in a long list of parse exceptions
this surfaces the cause when the problem is size.
2020-07-09 13:43:46 +01:00
Albert Zaharovits 2b7456db7f
Improve auditing of API key authentication #58928
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
2020-07-09 13:26:18 +03:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
David Turner 6ffdb19a2a Clean searchable snapshots cache on startup (#59009)
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.

Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
2020-07-08 15:17:52 +01:00
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Yannick Welsch 0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
Albert Zaharovits d4a0f80c32
Ensure authz role for API key is named after owner role (#59041)
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
2020-07-07 23:26:57 +03:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
David Roberts e217f9a1e8
[ML] Wait for shards to initialize after creating ML internal indices (#59087)
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices.  Currently this can result in attempts
to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.

Backport of #59027
2020-07-07 10:52:10 +01:00
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Luca Cavanna e3fc1638d8 Improve error handling in async search code (#57925)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.

Closes #58995
2020-07-03 16:07:26 +02:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
David Kyle f6a0c2c59d
[7.x] Pipeline Inference Aggregation (#58965)
Adds a pipeline aggregation that loads a model and performs inference on the
input aggregation results.
2020-07-03 09:29:04 +01:00
Dan Hermann c988afdc15
Data stream support for migrations deprecations info API 2020-07-02 11:16:22 -05:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.

Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Benjamin Trent c64e283dbf
[7.x] [ML] handles compressed model stream from native process (#58009) (#58836)
* [ML] handles compressed model stream from native process (#58009)

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

1. ModelSizeInfo which contains model size information 
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.


Native side change: https://github.com/elastic/ml-cpp/pull/1349
2020-07-01 15:14:31 -04:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
Dario Gieselaar 417f7062c5
[7.x] Add read privileges for annotations for apm_user (#58530) (#58781)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-01 09:04:57 +02:00
Yang Wang 3d49e62960
Support handling LogoutResponse from SAML idP (#56316) (#58792)
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.

The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.

This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
2020-07-01 16:47:27 +10:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
David Roberts d9e0e0bf95
[ML] Pass through the stop-on-warn setting for categorization jobs (#58738)
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.

Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356

Backport of #58632
2020-06-30 15:17:04 +01:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Tim Vernum dcc5a06dec
Display enterprise license as platinum in /_xpack (#58217)
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.

However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.

This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.

This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
2020-06-30 16:42:28 +10:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00