2098 Commits

Author SHA1 Message Date
Lee Hinman b2bfa9ed4f
[7.10] Fix ignoring existed cluster settings in DataTierAllocationDecider (#67137) (4295d489) (#67180)
Relates to #67133.
Similar to #65037.

The main changes of this PR are:

- Modify the constructor of DataTierAllocationDecider to take a settings parameter, like FilterAllocationDecider.
- Create the DataTierAllocationDecider in the main method of DataTierMigrationRoutedStep and SetSingleNodeAllocateStep, constructing it from the cluster settings in the cluster metadata so that the cluster-level _tier filters are visible when the steps execute.
- Add some tests for the change.

Co-authored-by: bellengao <gbl_long@163.com>
2021-01-07 11:27:22 -07:00
Lee Hinman e9b798bdb1
[7.10] Make FilterAllocationDecider totally ignore tier-based allocation settings (#67019) (#67034)
Previously we treated attribute filtering for _tier-prefixed attributes as a pass-through, meaning
that they were essentially always treated as matching in DiscoveryNodeFilters.match. For
exclude settings, however, this meant that a node was considered to match whenever a _tier* filter
was specified.

This commit prunes these attributes from the DiscoveryNodeFilters when considering the filters for
FilterAllocationDecider so that they are only considered in DataTierAllocationDecider.
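
A minimal sketch of the pruning idea, assuming the filters are held as a plain map from attribute name to accepted values. The class and method names (`TierFilterPruning`, `pruneTierFilters`) are hypothetical and are not the actual Elasticsearch API; only the `_tier` prefix comes from the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the real DiscoveryNodeFilters code.
final class TierFilterPruning {
    static final String TIER_PREFIX = "_tier";

    // Returns a copy of the filters without any _tier-prefixed attribute,
    // so that a generic attribute decider never sees tier filters.
    static Map<String, String[]> pruneTierFilters(Map<String, String[]> filters) {
        Map<String, String[]> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : filters.entrySet()) {
            if (!entry.getKey().startsWith(TIER_PREFIX)) {
                pruned.put(entry.getKey(), entry.getValue());
            }
        }
        return pruned;
    }
}
```

Copying rather than mutating leaves the original filters available to whichever component still needs the tier attributes.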

Resolves #66679
2021-01-05 12:42:34 -07:00
Andrei Dan 2620725297
Fix MANAGE_IDX_TEMPLATE privilege to allow `component_template/*` (#66514) (#66581)
(cherry picked from commit bcc28e0ab8e6883e14b23f93f428dee03b377a1d)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-12-18 11:17:16 +00:00
Albert Zaharovits 480561dbc3
Store and use only internal security headers (#66365)
For async searches (EQL included) the client's request headers were
erroneously stored in the .tasks index. This might expose the requesting
client's HTTP Authorization header. This PR fixes that by employing the
usual approach to store only the security-internal headers, which carry
the authentication result, instead of the original Authorization header,
which is commonly utilized to redo authentication for scheduled tasks.
2020-12-17 23:40:55 +02:00
Lee Hinman 8cbb9612d0
[7.10] Create AllocationDeciders in the main method of the ILM step (#65037) (8ac30f9a) (#66070)
Backports the following commits to 7.x:

    Create AllocationDeciders in the main method of the ILM step (#65037) (8ac30f9)
2020-12-08 16:56:25 -07:00
Jim Ferenczi 1c34507e66
Create async search index if necessary on updates and deletes (#64606)
This change ensures that we create the async search index with the right mappings and settings when updating or deleting a document. Users can delete the async search index at any time so we have to re-create it internally if necessary before applying any new operation.
2020-12-02 09:04:28 +01:00
Ioannis Kakavas f6921af885
Revert "Gracefully handle exceptions from Security Providers (#65464) (#65554)"
This reverts commit 12ba9e3e16. This
commit was mechanically backported to 7.10 while it shouldn't have
been.
2020-11-26 17:11:34 +02:00
Ioannis Kakavas 12ba9e3e16
Gracefully handle exceptions from Security Providers (#65464) (#65554)
In certain situations, such as when configured in FIPS 140 mode,
the Java security provider in use might throw a subclass of
java.lang.Error. We currently do not catch these and as a result
the JVM exits, shutting down elasticsearch.

This commit attempts to address this by catching subclasses of Error
that might be thrown for instance when a PBKDF2 implementation
is used from a Security Provider in FIPS 140 mode, with the password
input being less than 14 bytes (112 bits).

- In our PBKDF2 family of hashers, we catch the Error and
throw an ElasticsearchException while creating or verifying the
hash. We throw on verification instead of simply returning false
on purpose so that the message bubbles up and the cause becomes
obvious (otherwise it would be indistinguishable from a wrong
password).
- In KeyStoreWrapper, we catch the Error in order to wrap and re-throw 
a GeneralSecurityException with a helpful message. This can happen when 
using any of the keystore CLI commands, when the node starts or when we 
attempt to reload secure settings.
- In the `elasticsearch-users` tool, we catch the ElasticsearchException that
the Hasher class re-throws and throw an appropriate UserException.
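
The catch-and-wrap pattern described in the list above can be sketched as follows. This is an illustration under assumptions: `SafeHasher` and `HashingFailedException` are hypothetical names standing in for the hasher and for ElasticsearchException, and the provider call is modeled as a plain `Supplier`.

```java
import java.util.function.Supplier;

// Hypothetical illustration; not the actual Hasher or KeyStoreWrapper code.
final class SafeHasher {
    // Stand-in for the exception type the real code throws.
    static final class HashingFailedException extends RuntimeException {
        HashingFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Runs a security provider call and converts any java.lang.Error into a
    // regular exception, so the failure is reported instead of exiting the JVM.
    static byte[] hash(Supplier<byte[]> providerCall) {
        try {
            return providerCall.get();
        } catch (Error e) {
            throw new HashingFailedException("security provider failed while hashing", e);
        }
    }
}
```

Keeping the original `Error` as the cause preserves the provider's message, which matches the stated goal of making the cause obvious to the caller.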

Tests are missing because it's not trivial to set CI in fips approved mode
right now, and thus any tests would need to be muted. There is a parallel
effort in #64024 to enable that and tests will be added in a followup.
2020-11-26 17:04:34 +02:00
Martijn van Groningen 387af748a5
Add support for data stream APIs in transport client. (#65484)
Backporting #65433 to the 7.10 branch.
2020-11-25 10:23:02 +01:00
Przemysław Witek de668ab84b
[7.10] [ML] Extract dependent variable's mapping correctly in case of a multi-field (#63813) (#64287)
2020-11-16 10:34:58 +01:00
Benjamin Trent b888f36388
[ML] fix custom feature processor extraction bugs around boolean fields and custom one_hot feature output order (#64937) (#65009)
This commit fixes two problems:

- When extracting a doc value, we allow boolean scalars to be used as input
- The output order of processed feature names is deterministic. Previously, the output order of custom one-hot fields was non-deterministic and could therefore cause hard-to-diagnose bugs.
2020-11-12 11:15:57 -05:00
Lee Hinman 6dbfafcff2
[7.10] Fix SetSingleNodeAllocateStep for data tier deployments (#64679) (#64730)
Backports the following commits to 7.10:

    Fix SetSingleNodeAllocateStep for data tier deployments (#64679)
2020-11-06 10:12:16 -07:00
Andrei Dan a3d9408fda
Fix DataTiersUsageTransportActionTests testCalculateMAD (#64596) (#64628)
Randomize the compression factor starting from 1 (to eliminate values near 0,
which would use only one centroid and yield 0 for the MAD, as the approximate median
is the same as the single centroid's mean value)

(cherry picked from commit 940e0f1fde0f40f99af117dd03ab0891c9eedae6)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-11-05 10:58:40 +00:00
Jay Modi 4c3300bf57
Fix job scheduling for same scheduled time (#64598)
The SchedulerEngine used by SLM uses a custom runnable that will
schedule itself for its next execution if there is one to run. For the
majority of jobs, this scheduling could be many hours or days away. Due
to the scheduling so far in advance, there is a chance that time drifts
on the machine or even that time varies core to core so there is no
guarantee that the job actually runs on or after the scheduled time.
This can cause some jobs to reschedule themselves for the same
scheduled time even if they ran only a millisecond prior to the
scheduled time, which causes unexpected actions to be taken such as
what appears as duplicated snapshots.

This change resolves this by checking the triggered time against the
scheduled time and using the appropriate value to ensure that we do
not have unexpected job runs.
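
One way to read the fix described above is to base the next-run computation on the later of the triggered and scheduled times. The sketch below assumes a simple fixed-interval schedule; `NextRunSketch` and its methods are hypothetical and do not reproduce the actual SchedulerEngine code.

```java
// Hypothetical fixed-interval schedule; the names here are illustrative only.
final class NextRunSketch {
    // First fire time strictly after `time`, for fire times at
    // startMillis + k * intervalMillis (k >= 1). Assumes time >= startMillis.
    static long nextFireTimeAfter(long startMillis, long intervalMillis, long time) {
        long completedIntervals = (time - startMillis) / intervalMillis;
        return startMillis + (completedIntervals + 1) * intervalMillis;
    }

    // Uses the later of the triggered and scheduled times as the reference,
    // so a run that fires slightly early cannot pick the same slot again.
    static long nextRun(long startMillis, long intervalMillis,
                        long triggeredTime, long scheduledTime) {
        long reference = Math.max(triggeredTime, scheduledTime);
        return nextFireTimeAfter(startMillis, intervalMillis, reference);
    }
}
```

With a 1000 ms interval and a run scheduled for t=5000 that fires at t=4999, computing from the triggered time alone selects 5000 again, while computing from the later of the two selects 6000.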

Relates #63754
Backport of #64501
2020-11-04 10:15:46 -07:00
Jason Tedor 1126ba4df8
Serialize can contain data with roles (#64324)
This commit internalizes whether or not a role represents the ability to
contain data. In the future, this will let us remove the compatibility
role notion.
2020-10-29 20:44:39 -04:00
Jason Tedor b46b6d5977
Fix compilation in DataTierTests.java
This commit fixes a compilation issue in DataTierTests.java that was
introduced due to language-level differences between 7.10/7.x and
master.
2020-10-27 13:04:55 -04:00
Jason Tedor 04a9845a49
Adjust defaults for tiered data roles (#64015)
This commit adjusts the defaults for the tiered data roles so that they
are enabled by default, or if the node has the legacy data role. This
ensures that the default experience is that the tiered data roles are
enabled.

To fully specify the behavior for the tiered data roles:
 - starting a new node with the defaults: enabled
 - starting a new node with node.roles configured: enabled if and only
   if the tiered data roles are explicitly configured, independently
   of the node having the data role
 - starting a new node with node.data enabled: enabled unless the
   tiered data roles are explicitly disabled
 - starting a new node with node.data disabled: disabled unless the
   tiered data roles are explicitly enabled
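
The four cases above can be written as a single decision function. This is a sketch under assumptions: how "explicitly enabled" or "explicitly disabled" is expressed in the node configuration is not stated in the message, so it is modeled here as a nullable `Boolean`, and all names are hypothetical.

```java
// Hypothetical model of the defaults listed above; not the actual role code.
final class TieredRoleDefaults {
    // rolesConfigured: whether node.roles is set at all
    // tierInRoles:     whether the tiered data role is listed in node.roles
    // legacyNodeData:  value of node.data, or null when unset
    // explicitTier:    explicit enable/disable of the tiered role, or null when unset
    static boolean tierRoleEnabled(boolean rolesConfigured, boolean tierInRoles,
                                   Boolean legacyNodeData, Boolean explicitTier) {
        if (rolesConfigured) {
            // node.roles wins: enabled if and only if explicitly listed.
            return tierInRoles;
        }
        if (explicitTier != null) {
            // An explicit choice overrides whatever node.data implies.
            return explicitTier;
        }
        if (legacyNodeData != null) {
            // Otherwise follow the legacy data role.
            return legacyNodeData;
        }
        // Nothing configured: enabled by default.
        return true;
    }
}
```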
2020-10-27 12:48:31 -04:00
Henning Andersen 0cba23e08f
XPack Usage should run on MANAGEMENT threads (#64160)
XPack usage starts out on management threads, but depending on the
implementation of the usage plugin, they could end up running on
transport threads instead. Fixed to always reschedule on a management
thread.
2020-10-27 16:03:26 +01:00
Nhat Nguyen 566d1fd459
Return the same point in time in search response (#64188)
With this change, we will always return the same point in time in a
search response as its input until we implement the retry mechanism
for points in time.
2020-10-27 10:17:44 -04:00
David Roberts adc5509eda
[ML] Support the unsigned_long type in data frame analytics (#64072)
Adds support for the unsigned_long type to data frame analytics.

This type is handled in the same way as the long type.  Values
sent to the ML native processes are converted to floats and
hence will lose accuracy when outside the range where a float
can uniquely represent long values.
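
The accuracy loss mentioned above can be illustrated with a round-trip check. The message does not say whether 32-bit or 64-bit floating point is meant, so this sketch shows both; the class and method names are hypothetical.

```java
// Illustrative only: checks whether a long survives conversion to a
// floating-point type and back. Not reliable near Long.MAX_VALUE, where
// the narrowing cast back to long saturates.
final class LongToFloatPrecision {
    // A 32-bit float has a 24-bit significand.
    static boolean survivesFloat(long value) {
        return (long) (float) value == value;
    }

    // A 64-bit double has a 53-bit significand.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }
}
```

For example, 2^24 + 1 does not survive the round trip through `float`, and 2^53 + 1 does not survive the round trip through `double`, while both 2^24 and 2^53 do.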

Backport of #64066
2020-10-26 09:05:49 +00:00
Benjamin Trent eff7f06ca6
[ML] fix inference binary classification prediction label and feature importance (#63688) (#63930)
When calculating feature importance, the leaf values directly correlate with the value of the importance.

Consequently, positive leaf values -> positive feature importance

negative leaf values -> negative feature importance.

It follows that for binary classification, this is done such that the importance relates to the leaf values, which relate directly to the "probability of class 1".

So, the feature importance calculated is always for the importance as it relates to class 1.

The inverse is the importance as it relates to class 0.
2020-10-20 08:50:15 -04:00
Ioannis Kakavas 364511395d
[7.10] Move RestRequestFilter to core (#63507)
Move RestRequestFilter to core so that Rest requests outside xpack can use 
it to filter fields and expand its usage.

Backport of #63507
2020-10-16 13:57:52 +03:00
Jim Ferenczi 1d78bd0f72
Async search should retry updates on version conflict (#63652)
The _async_search APIs can throw version conflict exception when the internal response
is updated concurrently. That can happen if the final response is written while the user
extends the expiration time. That scenario should be rare but it happened in Kibana for
several users so this change ensures that updates are retried at least 5 times. That
should resolve the transient errors for Kibana. This change also preserves the version
conflict exception in case the retry didn't work instead of returning a confusing 404.
This commit also ensures that we don't delete the response if the search was cancelled
internally and not deleted explicitly by the user.
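
A minimal sketch of retrying an update on version conflict while preserving the conflict if all retries fail. The names are hypothetical, the loop is synchronous, and no backoff is applied; the actual change may differ on all of these points.

```java
import java.util.function.Supplier;

// Hypothetical illustration; not the actual async search code.
final class RetryOnConflict {
    // Stand-in for the real version conflict exception.
    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) {
            super(message);
        }
    }

    // Runs the update and retries when it fails with a version conflict.
    // Once maxRetries retries are used up, the last conflict is rethrown
    // unchanged rather than being replaced by a different error.
    static <T> T withRetries(Supplier<T> update, int maxRetries) {
        int retriesUsed = 0;
        while (true) {
            try {
                return update.get();
            } catch (VersionConflictException e) {
                if (retriesUsed >= maxRetries) {
                    throw e;
                }
                retriesUsed++;
            }
        }
    }
}
```

Rethrowing the original exception is what lets the caller see the conflict itself instead of a misleading not-found response.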

Closes #63213
2020-10-16 08:49:02 +02:00
Albert Zaharovits f4e1e6893d
Add view_index_metadata over metricbeat-* for monitoring agent (#63750)
The `remote_monitoring_agent` reserved role is extended to grant more privileges
over the metricbeat-* index pattern.
In addition to the index and create_index index privileges that it granted already,
it now also grants the view_index_metadata privilege.

Closes #63203
2020-10-16 02:13:55 +03:00
Jay Modi ebdaeb2f9a
Ensure cancelled jobs do not continue to run (#63771)
This commit ensures that jobs within the SchedulerEngine do not
continue to run after they are cancelled. There was no synchronization
between the cancel method of an ActiveSchedule and the run method, so
an actively running schedule would go ahead and reschedule itself even
if the cancel method had been called.

This commit adds synchronization between cancelling and the scheduling
of the next run to ensure that the job is cancelled. In real life
scenarios this could manifest as a job running multiple times for
SLM. This could happen if a job had been triggered and was cancelled
prior to completing its run such as if the node was no longer the
master node or if SLM was stopping/stopped.
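
The synchronization described above can be sketched with a shared lock around the cancel flag and the rescheduling decision. `CancellableSchedule` is a hypothetical simplification of an active schedule, not the actual SchedulerEngine code, and "scheduling" is reduced to incrementing a counter.

```java
// Hypothetical illustration of cancel/reschedule synchronization.
final class CancellableSchedule {
    private final Object lock = new Object();
    private boolean cancelled = false;
    private int scheduledRuns = 0;

    // Marks the schedule as cancelled; no run is scheduled after this returns.
    void cancel() {
        synchronized (lock) {
            cancelled = true;
        }
    }

    // Called at the end of a run. The cancelled check and the scheduling of
    // the next run happen under the same lock, so cancel() cannot slip in
    // between them. Returns whether a next run was scheduled.
    boolean scheduleNextRun() {
        synchronized (lock) {
            if (cancelled) {
                return false;
            }
            scheduledRuns++;
            return true;
        }
    }

    int scheduledRuns() {
        synchronized (lock) {
            return scheduledRuns;
        }
    }
}
```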

Closes #63754
Backport of #63762
2020-10-15 14:01:14 -06:00
Albert Zaharovits 2b7fbe9957
Add the missing apikey.* fields to the logfile audit layout for docker builds (#63609)
The layout pattern for the security audit for docker builds was missing the apikey.* fields.
2020-10-14 13:58:41 +03:00
Lee Hinman 7371e51583
[7.10] Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581) (#63613)
Backports the following commits to 7.10:

    Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581)
2020-10-13 09:17:15 -06:00
Przemysław Witek acbd48f834
[ML] Allow setting num_top_classes to a special value -1 (#63587) (#63602)
2020-10-13 13:57:50 +02:00
Dimitris Athanasiou e1c418aac7
[7.10][ML] Validate dest pipeline exists on transform update (#63494) (#63549)
Adds validation that the dest pipeline exists when a transform
is updated. Refactors the pipeline check into the `SourceDestValidator`.

Fixes #59587

Backport of #63494
2020-10-12 15:41:35 +03:00
Przemysław Witek bd761cce1d
[ML] Validate that AucRoc has the data necessary to be calculated (#63302) (#63454)
2020-10-08 09:52:15 +02:00
Alan Woodward 88b45dfa61
Convert TextFieldMapper to parametrized form (#63269) (#63392)
As a result of this, we can remove a chunk of code from TypeParsers as well. Tests
for search/index mode analyzers have moved into their own file. This commit also
rationalises the serialization checks for parameters into a single SerializerCheck
interface that takes the values includeDefaults, isConfigured and the value
itself.

Relates to #62988
2020-10-07 13:26:25 +01:00
Gordon Brown 15edc39d9b
Update logstash_admin role for system indices (#63368)
This PR updates the `logstash_admin` role to include the recently-added Logstash Pipeline Management APIs, as well as access to the `.logstash*` index pattern.

Co-authored-by: William Brafford <williamrandolphbrafford@gmail.com>
2020-10-06 20:43:36 -06:00
Gordon Brown 5c8b0662df
Deprecate REST access to System Indices (#63274) (Original #60945)
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.

Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail and a few specific APIs which will continue to allow access to system indices by default:

- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`

Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
2020-10-06 13:41:40 -06:00
Tanguy Leroux 87076c32e2
Determine shard size before allocating shards recovering from snapshots (#61906) (#63337)
Determines the shard size of shards before allocating shards that are
recovering from snapshots. It ensures during shard allocation that the
target node that is selected as recovery target will have enough free
disk space for the recovery event. This applies to regular restores,
CCR bootstrap from remote, as well as mounting searchable snapshots.

The InternalSnapshotInfoService is responsible for fetching snapshot
shard sizes from repositories. It provides a getShardSize() method
to other components of the system that can be used to retrieve the
latest known shard size. If the latest snapshot shard size retrieval
failed, the getShardSize() returns
ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE. While
we'd like a better way to handle such failures, returning this value
allows us to keep the existing behavior for now.

Note that this PR does not address an issue (which we already have today)
where a replica is being allocated without knowing how much disk
space is being used by the primary.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-10-06 18:37:05 +02:00
David Kyle ea32b4ab82
[ML] Audit message when nightly maintenance times out (#63252) (#63330)
During deletion of old ML data, set the delete by query timeout to 8 hours and
audit a job message when the nightly maintenance task times out.
2020-10-06 16:19:37 +01:00
Hendrik Muhs 058c55da6a
[Transform] disallow field and script being empty for group sources (#63313)
Fail validation earlier when field and script are both missing in a group source.
2020-10-06 16:59:02 +02:00
Yang Wang abf9b885b4
Bulk invalidate API keys using a list of IDs (#63224) (#63320)
Add a new ids field to the API of invalidating API keys so that it supports bulk
invalidation with a list of IDs.
Note the existing id field is kept as is and it is an error if both id and ids are specified.
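
A sketch of the mutual-exclusion check described above, assuming the request carries an optional single `id` and an optional list of `ids`. The class and method names are hypothetical and the error message is illustrative, not the one Elasticsearch returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of validating id versus ids.
final class InvalidateApiKeyIds {
    // Normalizes the two fields into one list and rejects requests that
    // specify both the single id and the list of ids.
    static List<String> resolveIds(String id, List<String> ids) {
        if (id != null && ids != null) {
            throw new IllegalArgumentException("specify either [id] or [ids], not both");
        }
        if (id != null) {
            return Collections.singletonList(id);
        }
        if (ids == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(new ArrayList<>(ids));
    }
}
```

Normalizing to a list means the invalidation logic only needs to handle the bulk case, while the existing single `id` field keeps working.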
2020-10-07 00:49:21 +11:00
Yang Wang bbfa2f1303
Fix test failure due to missing client action
2020-10-07 00:45:30 +11:00
Yang Wang 7969fbb4ab
Cache API key doc to reduce traffic to the security index (#59376) (#63319)
Getting the API key document from the security index is the most time-consuming part
of the API Key authentication flow (>60% if index is local and >90% if index is remote).
This traffic is now avoided by caching added with this PR.

Additionally, we add a cache invalidator registry so that clearing of different caches will
be managed in a single place (requires follow-up PRs).
2020-10-06 23:49:23 +11:00
David Kyle 8f4ef40f78
[ML] Auditor ensures template is installed before writes (#63286)
The ML auditors should not write if the latest template is not present. 
Instead, a PUT template request is made and the writes are queued up.
2020-10-06 11:20:37 +01:00
Armin Braun cf75abb021
Optimize XContentParserUtils.ensureExpectedToken (#62691) (#63253)
We only ever use this with `XContentParser`, so there is no need to make it inline
worse by forcing the lambda and hence a dynamic callsite here.
=> Extracted the exception formatting code path, which is likely very cold,
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
2020-10-05 19:08:32 +02:00
Benjamin Trent 1e63313c19
[ML] adds feature_importance_baseline object to model metadata (#63172) (#63237)
This adds the new field `feature_importance_baseline` and allows it to be optionally included in the model's metadata.

Related to: https://github.com/elastic/ml-cpp/pull/1522
2020-10-05 09:33:38 -04:00
Benjamin Trent 752ee0288e
[7.x] [ML] optimize delete expired snapshots (#63134) (#63200)
When deleting expired snapshots, we do an individual delete action per snapshot per job.

We should instead gather the expired snapshots and delete them in a single call.

This commit achieves this; a side effect is that there is less audit log spam on nightly cleanup.

closes https://github.com/elastic/elasticsearch/issues/62875
2020-10-02 13:24:36 -04:00
Przemysław Witek 5370f270d7
[7.x] [ML] Ensure data frame analytics jobs don't run on a node that's too new (#62749) (#63175)
2020-10-02 17:19:58 +02:00
Joe Gallo d172a18c95
Tidy up some ILM and SLM packages (#63146)
Very minor refactoring, just moving some ILM and SLM classes around to decrease
the total number of packages.
2020-10-02 09:30:24 -04:00
Benjamin Trent 535f8a434b
Revert "[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)" (#63144)
This reverts commit 95242eccee.
2020-10-02 07:03:15 -04:00
Ioannis Kakavas e91f66e22f
Ensure domain_name setting for AD realm is present (#61983) (#63159)
We would only check for a null value and not for an empty string so
that meant that we were not actually enforcing this mandatory
setting. This commit ensures we check for both and fail
accordingly, if necessary, on startup.
2020-10-02 12:16:08 +03:00
Lee Hinman f0f0da2188
[7.x] Add telemetry for data tiers (#63031) (#63140)
Backports the following commits to 7.x:

    Add telemetry for data tiers (#63031)
2020-10-01 12:37:32 -06:00
Benjamin Trent 95242eccee
[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)
This adds a new `baseline` field to the feature importance values. 

This field contains the baseline importance for a given feature and class.
2020-10-01 09:48:07 -04:00
Yang Wang e31bef4032
Fix API key role descriptors rewrite bug for upgraded clusters (#62917) (#63042)
This PR ensures that API key role descriptors are always rewritten to a target node
compatible format before a request is sent.
2020-09-30 22:16:39 +10:00