Commit Graph

1915 Commits

Author SHA1 Message Date
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Yannick Welsch 0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
Albert Zaharovits d4a0f80c32
Ensure authz role for API key is named after owner role (#59041)
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
2020-07-07 23:26:57 +03:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
David Roberts e217f9a1e8
[ML] Wait for shards to initialize after creating ML internal indices (#59087)
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices.  Currently this can result in attempts
to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.

Backport of #59027
2020-07-07 10:52:10 +01:00
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Luca Cavanna e3fc1638d8 Improve error handling in async search code (#57925)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.

Closes #58995
2020-07-03 16:07:26 +02:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
David Kyle f6a0c2c59d
[7.x] Pipeline Inference Aggregation (#58965)
Adds a pipeline aggregation that loads a model and performs inference on the
input aggregation results.
2020-07-03 09:29:04 +01:00
Dan Hermann c988afdc15
Data stream support for migrations deprecations info API 2020-07-02 11:16:22 -05:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.

Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Benjamin Trent c64e283dbf
[7.x] [ML] handles compressed model stream from native process (#58009) (#58836)
* [ML] handles compressed model stream from native process (#58009)

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

1. ModelSizeInfo which contains model size information 
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.


Native side change: https://github.com/elastic/ml-cpp/pull/1349
2020-07-01 15:14:31 -04:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
Dario Gieselaar 417f7062c5
[7.x] Add read privileges for annotations for apm_user (#58530) (#58781)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-01 09:04:57 +02:00
Yang Wang 3d49e62960
Support handling LogoutResponse from SAML idP (#56316) (#58792)
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.

The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.

This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
2020-07-01 16:47:27 +10:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
David Roberts d9e0e0bf95
[ML] Pass through the stop-on-warn setting for categorization jobs (#58738)
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.

Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356

Backport of #58632
2020-06-30 15:17:04 +01:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Tim Vernum dcc5a06dec
Display enterprise license as platinum in /_xpack (#58217)
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.

However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.

This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.

This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
2020-06-30 16:42:28 +10:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
Lee Hinman f732003370
[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563) (#58569)
In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can
cause:

```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
	at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
	at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```

When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.

Resolves #58515
2020-06-25 14:16:34 -06:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig james.rodewig@elastic.co
2020-06-25 14:11:57 -04:00
Benjamin Trent c7ba79bc19
[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537) (#58553)
* [ML] make waiting for renormalization optional for internally flushing job (#58537)

When flushing, datafeeds only need the guaruntee that the latest bucket has been handled.

But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.

This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.

closes #58395
2020-06-25 12:26:52 -04:00
Nik Everett 71adade73a
Return clear error message if aggregation type is invalid (#58255) (#58365)
The main changes are:

1. Catch the `NamedObjectNotFoundException` when parsing aggregation
   type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().

Closes #58146.

Co-authored-by: bellengao <gbl_long@163.com>
2020-06-25 11:08:25 -04:00
Dimitris Athanasiou c3dfafe0b4
[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541) (#58550)
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.

Backport of #58541
2020-06-25 18:07:29 +03:00
Dimitris Athanasiou 5af7071db0
[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546)
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.

Backport of #58538
2020-06-25 18:03:43 +03:00
Chris Roberson d5899d1765
[Monitoring] APM mapping update (#46244) (#58498)
* Add acm mapping to APM for beats

* Add root mapping for APM

* Add sourcemap mapping to APM

* Fix missing properties

* Fix a second missing properties

* Add request property to acm

* Remove root and sourcemap per review

Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-24 13:26:30 -04:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
2020-06-24 17:24:44 +02:00
Przemysław Witek 551b8bcd73
[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484) (#58490) 2020-06-24 15:52:45 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
Jim Ferenczi fcd8a432d9 Submit _async search task should cancel children on cancellation (#58332)
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
2020-06-24 09:10:26 +02:00
Przemysław Witek 4e4ca6ac25
Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447) (#58459) 2020-06-23 22:18:39 +02:00
Benjamin Trent a9b868b7a9
[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280) (#58455)
This commits allows data streams to be a valid source for analytics and transforms.

Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.

For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
2020-06-23 14:40:35 -04:00
David Roberts 0d6bfd0ac3
[7.x][ML] Fix wire serialization for flush acknowledgements (#58443)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null.  This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant.  But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.

This change corrects the discrepancies.  Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.

Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.

Backport of #58413
2020-06-23 16:42:06 +01:00
David Roberts f97b37190b [ML] Add a new annotation type for categorization status changes (#58394)
Adds a new value to the "event" enum of ML annotations, namely
"categorization_status_change".

This will allow users to see when categorization was found to
be performing poorly.  Once per-partition categorization is
available, it will allow users to see when categorization is
performing poorly for a specific partition.

It does not make sense to reuse the "model_change" event that
annotations already have, because categorizer state is separate
to model state ("model" state is really anomaly detector state),
and is not reverted by the revert model snapshot API.
Therefore annotations related to categorization need to be
treated differently to annotations related to anomaly detection.
2020-06-23 09:16:27 +01:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00