OpenSearch

Commit Graph

Author	SHA1	Message	Date
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning headers and logging throttling were needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
David Kyle	539cf914bc	[ML] handle new model metadata stream from native process (#59725 ) (#61251 ) This adds the serialization handling for the new model_metadata object from the native process. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-08-24 15:52:13 -04:00
Yang Wang	cd52233b94	Include authentication type for the authenticate response (#61247 ) (#61411 ) Add a new "authentication_type" field to the response of "GET _security/_authenticate".	2020-08-21 22:59:43 +10:00
Lloyd	cb83e7011c	[Backport][API keys] Add full_name and email to API key doc and use them to populate authing User (#61354 ) (#61403 ) The API key document currently doesn't include the user's full_name or email attributes, and as a result those attributes return `null` when `GET`ing `/_security/_authenticate`, and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046). This changeset adds those fields to the document and extracts them to fill in the User when authenticating. They're effectively going to be a snapshot of the User from when the key was created, but this is in line with roles and metadata as well. Signed-off-by: lloydmeta <lloydmeta@gmail.com>	2020-08-21 18:32:19 +09:00
Mark Tozzi	db1df6cc30	[7.x] Remove a bunch of type boilerplate from Aggs (#60852 ) (#61031 )	2020-08-17 12:13:05 -04:00
Benjamin Trent	8f302282f4	[ML] adds new feature_processors field for data frame analytics (#60528 ) (#61148 ) feature_processors allow users to create custom features from individual document fields. These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process and the native process then appends them to the pre_processor array in the inference model. closes https://github.com/elastic/elasticsearch/issues/59327	2020-08-14 10:32:20 -04:00
David Roberts	d1b60269f4	[ML] Ensure annotations index mappings are up to date (#61142 ) When the ML annotations index was first added, only the ML UI wrote to it, so the code to create it was designed with this in mind. Now the ML backend also creates annotations, and those mappings can change between versions. In this change: 1. The code that runs on the master node to create the annotations index if it doesn't exist but another ML index does also now ensures the mappings are up-to-date. This is good enough for the ML UI's use of the annotations index, because the upgrade order rules say that the whole Elasticsearch cluster must be upgraded prior to Kibana, so the master node should be on the newer version before Kibana tries to write an annotation with the new fields. 2. We now also check whether the annotations index exists with the correct mappings before starting an autodetect process on a node. This is necessary because ML nodes can be upgraded before the master node, so could write an annotation with the new fields before the master node knows about the new fields. Backport of #61107	2020-08-14 13:51:04 +01:00
Benjamin Trent	7c3bfb9437	[ML] updating feature_importance results mapping (#61104 ) (#61144 ) This updates the feature_importance mapping change from elastic/ml-cpp#1387	2020-08-14 08:43:10 -04:00
Lee Hinman	e3df64a429	[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994 ) (#61045 ) This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch. These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (it is a hot, warm, cold, and frozen node). This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level: - `cluster.routing.allocation.require._tier` - `cluster.routing.allocation.include._tier` - `cluster.routing.allocation.exclude._tier` And in index settings: - `index.routing.allocation.require._tier` - `index.routing.allocation.include._tier` - `index.routing.allocation.exclude._tier` Relates to #60848	2020-08-12 11:06:23 -06:00
Andrei Dan	32173a82c8	ILM: add frozen phase (#60983 ) (#61035 ) This adds a frozen phase to ILM that will allow the execution of the set_priority, unfollow, allocate, freeze and searchable_snapshot actions. The frozen phase will be executed after the cold and before the delete phase. (cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-12 16:36:27 +01:00
Benjamin Trent	4275a715c9	[ML] adjusting inference processor to support foreach usage (#60915 ) (#61022 ) `foreach` processors store information within the `_ingest` metadata object. This commit adds the contents of the `_ingest` metadata (if it is not empty), and will append new inference results if the result field already exists. This allows a `foreach` to execute and multiple inference results to be written to the same result field. closes https://github.com/elastic/elasticsearch/issues/60867	2020-08-12 08:34:18 -04:00
Armin Braun	32423a486d	Simplify and Speed up some Compression Usage (#60953 ) (#61008 ) Use thread-local buffers and deflater and inflater instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe since their off-heap memory will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc. Relates #57284 which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these code paths.	2020-08-12 11:06:23 +02:00
Dimitris Athanasiou	2e18c0f2ac	[7.x][ML] Audit force stopping data frame analytics (#60973 ) (#61004 ) Audits a message when a data frame analytics job is force stopped. Backport of #60973	2020-08-12 07:45:26 +03:00
Benjamin Trent	66b3e89482	[ML] enable logging for test failures (#60902 ) (#60910 )	2020-08-10 12:36:30 -04:00
Andrei Dan	235e5ed3ea	[7.x] ILM: add force-merge step to searchable snapshots action (#60819 ) (#60882 ) This adds a force-merge step to the searchable snapshot action, enabled by default, but parameterizable using the `force_merge_index` optional boolean. eg. ``` PUT _ilm/policy/my_policy { "policy": { "phases": { "cold": { "actions": { "searchable_snapshot" : { "snapshot_repository" : "backing_repo", "force_merge_index": true } } } } } } ``` (cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-10 13:45:11 +01:00
Hendrik Muhs	b210aaf666	[Transform] remove wrong test (#60807 ) remove test, scripts are excluded in the change collector, the test is a leftover from a previous solution of #57332, which has been discarded relates #60724 fixes #60794	2020-08-06 11:56:19 +02:00
Ryan Ernst	d88098c1d5	Mute flaky transform pivot test see https://github.com/elastic/elasticsearch/issues/60794	2020-08-05 14:53:25 -07:00
Francisco Fernández Castaño	b4044004aa	Add recovery state tracking for Searchable Snapshots (#60751 ) This pull request adds recovery state tracking for Searchable Snapshots. In order to track recoveries for searchable snapshot backed indices, this pull request adds a new type of RecoveryState. This new RecoveryState instance is able to deal with the small differences that arise during Searchable snapshots recoveries. Those differences can be summarized as follows: - The Directory implementation that's provided by SearchableSnapshots marks the snapshot files as reused during recovery. In order to keep track of the recovery process as the cache is pre-warmed, those files shouldn't be marked as reused. - Once the shard is created, the cache starts its pre-warming phase, meaning that we should keep track of those downloads during that process and tie the recovery to this pre-warming phase. The shard is considered recovered once this pre-warming phase has finished. Backport of #60505	2020-08-05 17:41:49 +02:00
Hendrik Muhs	08f94c914b	[Transform] disable optimizations when using scripts in group_by (#60724 ) disable optimizations when using scripts in group_by, when using scripts we cannot predict the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. fixes #57332	2020-08-05 17:27:19 +02:00
Przemysław Witek	0afa1bd972	Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601 ) (#60727 )	2020-08-05 13:39:40 +02:00
Yannick Welsch	9f6f66f156	Fail searchable snapshot shards on invalid license (#60722 ) Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.	2020-08-05 13:14:15 +02:00
Adrien Grand	67f6f34c23	Remove dataset.* fields. (#60720 ) These are being replaced by the `data_stream.*` fields.	2020-08-05 11:35:05 +02:00
Adrien Grand	602d269059	Rename `datastream` to `data_stream`. (#60714 ) The name of the feature having a space: "data stream", the key should have an underscore.	2020-08-05 09:55:02 +02:00
Adrien Grand	20ae1b75bd	Rename dataset to datastream (#60638 ) Co-authored-by: ruflin <spam@ruflin.com>	2020-08-04 09:58:54 +02:00
Armin Braun	7ae9dc2092	Unify Stream Copy Buffer Usage (#56078 ) (#60608 ) We have various ways of copying between two streams and handling thread-local buffers throughout the codebase. This commit unifies a number of them and removes buffer allocations in many spots.	2020-08-04 09:54:52 +02:00
Yang Wang	54aaadade7	API key name should always be required for creation (#59836 ) (#60636 ) The name is now required when creating or granting API keys.	2020-08-04 13:28:47 +10:00
Yannick Welsch	b0d601fa63	Adjust searchable snapshot license (#60578 ) No longer needs Platinum license for testing on staging.	2020-08-03 13:19:53 +02:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Hendrik Muhs	a721d6d19b	[Transform] use correct version in BWC serialization test (#60500 ) use correct version in BWC serialization test fixes #60464	2020-07-31 11:23:05 +02:00
Przemysław Witek	9e27f7474c	Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121 ) (#60439 )	2020-07-30 09:53:11 +02:00
Hendrik Muhs	aaed6b59d6	[7.x][Transform] add support for missing bucket (#59591 ) (#60390 ) add support for "missing_bucket" in group_by fixes #42941 fixes #55102 backport #59591	2020-07-30 08:26:51 +02:00
Benjamin Trent	76359aaa53	[ML] always write prediction_[score\|probability] for classification inference (#60335 ) (#60397 ) In order to unify model inference and analytics results we need to write the same fields. prediction_probability and prediction_score are now written for inference calls against classification models.	2020-07-29 10:58:14 -04:00
Jim Ferenczi	578749a5e8	Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337 ) This change ensures that the expiration time that is set in the test is long enough to not be triggered by a slow execution. Closes #60255	2020-07-29 09:47:47 +02:00
Armin Braun	753fd4f6bc	Cleanup and optimize More Serialization Spots (#59959 ) (#60331 ) Same as #59626 for a few more spots.	2020-07-29 07:20:44 +02:00
Benjamin Trent	54c8936508	[ML] do not summarize importance for custom features (#60198 ) (#60333 ) If a feature is created via a custom pre-processor, we should return the importance for that feature. This means we will not return the importance for the original document field for custom processed features. closes https://github.com/elastic/elasticsearch/issues/59330	2020-07-28 15:58:20 -04:00
Dimitris Athanasiou	ed7dcff7c4	[7.x][ML] Audit updates on data frame analytics jobs (#60126 ) (#60287 ) Closes #59652 Backport of #60126	2020-07-28 16:33:35 +03:00
David Roberts	89466eefa5	Don't require separate privilege for internal detail of put pipeline (#60190 ) Putting an ingest pipeline used to require that the user calling it had permission to get nodes info as well as permission to manage ingest. This was due to an internal implementation detail that was not visible to the end user. This change alters the behaviour so that a user with the manage_pipeline cluster privilege can put an ingest pipeline regardless of whether they have the separate privilege to get nodes info. The internal implementation detail now runs as the internal _xpack user when security is enabled. Backport of #60106	2020-07-27 10:44:48 +01:00
Dan Hermann	ca25f6ae6f	Include the resolve index action in the view_index_metadata privilege (#59785 ) (#60112 )	2020-07-23 08:13:56 -05:00
Larry Gregory	a686ccc9b2	[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854 ) (#60047 )	2020-07-22 11:06:10 -04:00
James Baiera	b3363cf8f9	[7.x] Remove unneeded rest params from Data Stream Stats (#59575 ) (#59661 ) This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data Streams Stats REST endpoint. These options are required for broadcast requests, but are not needed for anything in terms of resolving data streams. Instead, we just set a default set of IndicesOptions on the transport request.	2020-07-21 15:59:16 -04:00
Przemysław Witek	283a1f605c	Rename binary_soft_classification evaluation to outlier_detection (#59951 ) (#59970 )	2020-07-21 15:15:04 +02:00
Benjamin Trent	b7f30fc929	[7.x] Adding new `require_alias` option to indexing requests (#58917 ) (#59769 ) * Adding new `require_alias` option to indexing requests (#58917) This commit adds the `require_alias` flag to requests that create new documents. This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias. When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings. This is useful when an alias is required instead of a concrete index. closes https://github.com/elastic/elasticsearch/issues/55267	2020-07-17 10:24:58 -04:00
Benjamin Trent	a28547c4b4	[7.x] [ML] add new `custom` field to trained model processors (#59542 ) (#59700 ) * [ML] add new `custom` field to trained model processors (#59542) This commit adds the new configurable field `custom`. `custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job. Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature). This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors in the analytics job configuration, we need to know the input and output field names.	2020-07-16 10:57:38 -04:00
Przemysław Witek	df4fea79cb	Add a "verbose" option to the data frame analytics stats endpoint (#59589 ) (#59621 )	2020-07-16 09:51:31 +02:00
Martijn van Groningen	2a89e13e43	Move data stream transport and rest action to xpack (#59593 ) Backport of #59525 to 7.x branch. * Actions are moved to xpack core. * Transport and rest actions are moved to the data-streams module. * Removed data streams methods from Client interface. * Adjusted tests to use client.execute(...) instead of data stream specific methods. * only attempt to delete all data streams if xpack is installed in rest tests * Now that ds apis are in xpack and ESIntegTestCase no longer deletes all ds, do that in the MlNativeIntegTestCase class for ml tests.	2020-07-15 16:50:44 +02:00
Tanguy Leroux	604f22db79	Use a dedicated thread pool for searchable snapshot cache prewarming (#59313 ) (#59590 ) Since #58728 writing operations on searchable snapshot directory cache files are executed in an asynchronous manner using a dedicated thread pool. The thread pool used is searchable_snapshots which has been created to execute prewarming tasks. Reusing the same thread pool wasn't a good idea as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. But such an eviction can only be processed when all read/write operations on the cache file are completed and in this case the deadlock occurred because the cache file was actively being read by a concurrent search which also won the privilege to write the range of bytes in cache... and this writing operation could never have been completed because of the prewarming tasks making no progress and filling up the thread pool. This commit renames the searchable_snapshots thread pool to searchable_snapshots_cache_fetch_async. Assertions are added to assert that cache writes are executed using this thread pool and to assert that reads on cached index inputs are executed using a different thread pool to avoid potential deadlock situations. This commit also adds a searchable_snapshots_cache_prewarming thread pool that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.	2020-07-15 11:45:52 +02:00
Tal Levy	4bb91b61e8	Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349 ) (#59577 ) Closes #44505.	2020-07-14 22:37:48 -07:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Ryan Ernst	3b688bfee5	Add license feature usage api (#59342 ) (#59571 ) This commit adds a new api to track when gold+ features are used within x-pack. The tracking is done internally whenever a feature is checked against the current license. The output of the api is a list of each used feature, which includes the name, license level, and last time it was used. In addition to a unit test for the tracking, a rest test is added which ensures starting up a default configured node does not result in any features registering as used. There are a couple features which currently do not work well with the tracking, as they are checked in a manner that makes them look always used. Those features will be fixed in followups, and in this PR they are omitted from the feature usage output.	2020-07-14 14:34:59 -07:00
Albert Zaharovits	4eb310c777	Disallow mapping updates for doc ingestion privileges (#58784 ) The `create_doc`, `create`, `write` and `index` privileges do not grant the PutMapping action anymore. Apart from the `write` privilege, the other three privileges also do NOT grant (auto) updating the mapping when ingesting a document with unmapped fields, according to the templates. In order to maintain the BWC in the 7.x releases, the above privileges will still grant the Put and AutoPutMapping actions, but only when the "index" entity is an alias or a concrete index, but not a data stream or a backing index of a data stream.	2020-07-14 23:39:41 +03:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278 ) (#59514 ) This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot delete	2020-07-14 22:18:42 +02:00
Nhat Nguyen	4d7c59bedb	Assign follower primary to nodes with remote cluster client role (#59375 ) The primary shards of follower indices during the bootstrap need to be on nodes with the remote cluster client role as those nodes reach out to the corresponding leader shards on the remote cluster to copy Lucene segment files and renew the retention leases. This commit introduces a new allocation decider that ensures bootstrapping follower primaries are allocated to nodes with the remote cluster client role. Co-authored-by: Jason Tedor <jason@tedor.me>	2020-07-14 11:23:55 -04:00
Andrei Stefan	cf752992d6	Add telemetry metrics (#59526 )	2020-07-14 16:25:24 +03:00
Dan Hermann	59f639a279	Add auto_configure privilege	2020-07-14 08:23:49 -05:00
Andrei Dan	7dcdaeae49	Default to @timestamp in composable template datastream definition (#59317 ) (#59516 ) This makes the data_stream timestamp field specification optional when defining a composable template. When there isn't one specified it will default to `@timestamp`. (cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 12:36:54 +01:00
Hendrik Muhs	c8290167a0	[7.x][Transform] separate pivot and extract function interface (#59505 ) separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function. Foundation to add more functions to transform. piggy backed fixes: - when running geo tile group_by it could fail due to query clause limit (unreleased) - new style page size using settings was not validating limit of 10k (7.8)	2020-07-14 11:27:16 +02:00
Lee Hinman	bf1a60130d	[7.x] Add telemetry for data streams (#59433 ) (#59454 ) This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is pretty minimal, returning only the number of data streams and the number of indices currently abstracted by a data stream: ``` ... "data_streams" : { "available" : true, "enabled" : true, "data_streams" : 3, "indices_count" : 17 } ... ```	2020-07-13 14:30:11 -06:00
David Roberts	b5e8250a4e	[ML] Drive categorization warning notifications from annotations (#59393 ) With the introduction of per-partition categorization the old logic for creating a job notification for categorization status "warn" does not work. However, the C++ code is already writing annotations for categorization status "warn" that take into account whether per-partition categorization is being used and which partition(s) the warnings relate to. Therefore, this change alters the Java results processor to create notifications based on the annotations the C++ writes. (It is arguable that we don't need both annotations and notifications, but they show up in different ways in the UI: only annotations are visible in results and only notifications set the warning symbol in the jobs list. This means it's best to have both.) Backport of #59377	2020-07-13 15:28:57 +01:00
Yang Wang	a84469742c	Improve role cache efficiency for API key roles (#58156 ) (#59397 ) This PR ensures that the same roles are cached only once even when they are from different API keys. API key role descriptors and limited role descriptors are now saved in Authentication#metadata as raw bytes instead of deserialised Map<String, Object>. Hashes of these bytes are used as keys for API key roles. Only when the required role is not found in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly from raw bytes to RoleDescriptors without going through the current detour of "bytes -> Map -> bytes -> RoleDescriptors".	2020-07-13 22:58:11 +10:00
Dan Hermann	e01d73c737	[7.x] Data stream admin actions are now index-level actions	2020-07-10 14:36:18 -05:00
Dimitris Athanasiou	b2243337d8	[7.x][ML] Data frame analytics max_num_threads setting (#59254 ) (#59308 ) This adds a setting to data frame analytics jobs called `max_num_threads`. The setting expects a positive integer. When used the user specifies the max number of threads that may be used by the analysis. Note that the actual number of threads used is limited by the number of processors on the node where the job is assigned. Also, the process may use a couple more threads for operational functionality that is not the analysis itself. This setting may also be updated for a stopped job. More threads may reduce the time it takes to complete the job at the cost of using more CPU. Backport of #59254 and #57274	2020-07-09 19:15:46 +03:00
Dimitris Athanasiou	d07b11b86b	[7.x][ML] Perform test inference on java (#58877 ) (#59298 ) Since we are able to load the inference model and perform inference in java, we no longer need to rely on the analytics process to be performing test inference on the docs that were not used for training. The benefit is that we do not need to send test docs and fit them in memory of the c++ process. Backport of #58877 Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co> Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-07-09 16:30:49 +03:00
David Kyle	86555ec163	Remove unused function InferenceIndexConstants.mapping() (#59146 ) (#59158 ) InferenceIndexConstants.mapping() is broken and unused.	2020-07-09 14:28:53 +01:00
David Kyle	dbb9c802b1	Better error message when the model cannot be parsed due to its size (#59166 ) (#59209 ) The actual cause can be lost in a long list of parse exceptions; this surfaces the cause when the problem is size.	2020-07-09 13:43:46 +01:00
Albert Zaharovits	2b7456db7f	Improve auditing of API key authentication #58928 1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields to the `access_granted`, `access_denied`, `authentication_success`, and (some) `tampered_request` audit events. The `apikey.id` and `apikey.name` are present only when authn using an API Key. 2. When authn with an API Key, the `user.realm` field now contains the effective realm name of the user that created the key, instead of the synthetic value of `_es_api_key`.	2020-07-09 13:26:18 +03:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210 ) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and instead only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that causes timestamp field validation to be performed for each template instead of the final mappings that are created. * only apply _timestamp meta field if index is created as part of a data stream or data stream rollover, this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
David Turner	6ffdb19a2a	Clean searchable snapshots cache on startup (#59009 ) Today we empty the searchable snapshots cache when cleanly closing a shard, but leak cache files in some cases involving an unclean shutdown. Such leaks are not permanent, they are cleaned up on shard relocation or deletion, but they still might last for arbitrarily long until that happens. This commit introduces a cleanup process that runs during node startup to catch such leaks sooner. Also, today we permit searchable snapshots to be held on custom data paths, and store the corresponding cache files within the custom location. Supporting this feature would make the cleanup process significantly more complicated since it would require each node to parse the index metadata for the shards it held before shutdown. Yet, this feature is undocumented and offers minimal benefits to searchable snapshots. Therefore with this commit we forbid custom data paths for searchable snapshot shards.	2020-07-08 15:17:52 +01:00
Dan Hermann	90c8d3fc9d	IndexNameExpressionResolver::dataStreamNames should support exclusions	2020-07-08 07:35:52 -05:00
Yannick Welsch	0b9eb210b8	Add basic searchable snapshots usage information (#58828 ) (#59160 ) Adds super basic usage information for searchable snapshots, to be extended later. Backport of #58828	2020-07-08 13:09:29 +02:00
Albert Zaharovits	d4a0f80c32	Ensure authz role for API key is named after owner role (#59041 ) The composite role that is used for authz, following the authn with an API key, is an intersection of the privileges from the owner role and the key privileges defined when the key has been created. This change ensures that the `#names` property of such a role equals the `#names` property of the key owner role, thereby rectifying the value for the `user.roles` audit event field.	2020-07-07 23:26:57 +03:00
Rene Groeschke	e8181fc627	Fix implicit duplicate duplicatesStrategy in processResources (#58929 ) (#59127 ) * Fix implicit duplicate duplicatesStrategy in processResources * Fix duplicates strategy in docker distribution setup	2020-07-07 13:45:36 +02:00
David Roberts	e217f9a1e8	[ML] Wait for shards to initialize after creating ML internal indices (#59087 ) There have been a few test failures that are likely caused by tests performing actions that use ML indices immediately after the actions that create those ML indices. Currently this can result in attempts to search the newly created index before its shards have initialized. This change makes the method that creates the internal ML indices that have been affected by this problem (state and stats) wait for the shards to be initialized before returning. Backport of #59027	2020-07-07 10:52:10 +01:00
Przemysław Witek	4a791e835b	Simplify parser declarations when specialist types are stored in strings (#58996 ) (#59056 )	2020-07-06 13:05:03 +02:00
Przemysław Witek	f35ad0d4e1	Report peak model memory in ModelSizeStats (#59017 ) (#59055 )	2020-07-06 12:55:12 +02:00
David Kyle	c651135562	[ML] Make Inference processor field_map and inference_config optional (#59010 ) Relaxes the requirement that the inference ingest processor must have a field_map and inference_config defined even if they are empty.	2020-07-06 11:35:30 +01:00
Benjamin Trent	b9d9964d10	[ML] add exponent output aggregator to inference (#58933 ) (#59016 ) * [ML] add exponent output aggregator to inference * fixing docs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-03 14:51:00 -04:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Luca Cavanna	e3fc1638d8	Improve error handling in async search code (#57925 ) - The exception that we caught when failing to schedule a thread was incorrect. - We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager - when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed. Closes #58995	2020-07-03 16:07:26 +02:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991 )	2020-07-03 06:33:38 -05:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Dan Hermann	c988afdc15	Data stream support for migrations deprecations info API	2020-07-02 11:16:22 -05:00
Przemysław Witek	751e84e4c8	Rename regression evaluation metrics to make the names consistent with loss functions (#58887 ) (#58927 )	2020-07-02 17:35:55 +02:00
Przemysław Witek	8e074c4495	Rename "error" field to "value" for consistency between metrics (#58726 ) (#58870 )	2020-07-02 09:08:56 +02:00
Yang Wang	a5a8b4ae1d	Add cache for application privileges (#55836 ) (#58798 ) Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors. Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).	2020-07-02 11:50:03 +10:00
Benjamin Trent	c64e283dbf	[7.x] [ML] handles compressed model stream from native process (#58009 ) (#58836 ) * [ML] handles compressed model stream from native process (#58009) This moves model storage from handling the fully parsed JSON string to handling two separate types of documents. 1. ModelSizeInfo which contains model size information 2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string. `model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition. Native side change: https://github.com/elastic/ml-cpp/pull/1349	2020-07-01 15:14:31 -04:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
Przemysław Witek	909649dd15	[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 ) (#58825 )	2020-07-01 14:52:06 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
Dario Gieselaar	417f7062c5	[7.x] Add read privileges for annotations for apm_user (#58530 ) (#58781 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-01 09:04:57 +02:00
Yang Wang	3d49e62960	Support handling LogoutResponse from SAML idP (#56316 ) (#58792 ) SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective. The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error. This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.	2020-07-01 16:47:27 +10:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
David Roberts	d9e0e0bf95	[ML] Pass through the stop-on-warn setting for categorization jobs (#58738 ) When per_partition_categorization.stop_on_warn is set for an analysis config it is now passed through to the autodetect C++ process. Also adds some end-to-end tests that exercise the functionality added in elastic/ml-cpp#1356 Backport of #58632	2020-06-30 15:17:04 +01:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure it is properly built * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Przemysław Witek	9ea9b7bd3b	[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 ) (#58731 )	2020-06-30 14:09:11 +02:00
Tim Vernum	dcc5a06dec	Display enterprise license as platinum in /_xpack (#58217 ) The GET /_license endpoint displays "enterprise" licenses as "platinum" by default so that old clients (including beats, kibana and logstash) know to interpret this new license type as if it were a platinum license. However, this compatibility layer was not applied to the GET /_xpack/ endpoint which also displays a license type & mode. This commit causes the _xpack API to mimic the _license API and treat enterprise as platinum by default, with a new accept_enterprise parameter that will cause the API to return the correct "enterprise" value. This BWC layer exists only for the 7.x branch. This is a breaking change because, since 7.6, the _xpack API has returned "enterprise" for enterprise licenses, but this has been found to break old versions of beats and logstash so needs to be corrected.	2020-06-30 16:42:28 +10:00
Przemysław Witek	3f7c45472e	[7.x] Introduce DataFrameAnalyticsConfig update API (#58302 ) (#58648 )	2020-06-29 10:56:11 +02:00
Yang Wang	61fa7f4d22	Change privilege of enrich stats API to monitor (#52027 ) (#52196 ) The remote_monitoring_user user needs to access the enrich stats API. But the request is denied because the API is categorized under admin. The correct privilege should be monitor.	2020-06-29 10:25:33 +10:00
Dimitris Athanasiou	1817b896c9	[7.x][ML] Add status and increased estimate to memory usage (#58588 ) (#58606 ) Adds parsing of `status` and `memory_reestimate_bytes` to data frame analytics `memory_usage`. When the training surpasses the model memory limit, the status will be set to `hard_limit` and `memory_reestimate_bytes` can be used to update the job's limit in order to restart the job. Backport of #58588	2020-06-28 16:27:26 +03:00
Lee Hinman	f732003370	[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563 ) (#58569 ) In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove the oldest snapshots first. To do this, we limit one of the streams. In rare cases, when certain criteria are met, this can cause: ``` [mynode] error during snapshot retention task java.lang.IllegalArgumentException: -5 at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?] at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?] ``` This commit fixes the negative limiting with `Math.max(0, ...)` and adds a unit test for the behavior. Resolves #58515	2020-06-25 14:16:34 -06:00
Henning Andersen	38be2812b1	Enhance extensible plugin (#58542 ) Rather than let ExtensiblePlugins know extending plugins' classloaders, we now pass along an explicit ExtensionLoader that loads the extensions asked for. Extensions constructed that way can optionally receive their own Plugin instance in the constructor.	2020-06-25 20:37:56 +02:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node, which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles with which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Igor Motov	20af856abd	[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192 ) Adds async support to EQL searches Closes #49638 Co-authored-by: James Rodewig james.rodewig@elastic.co	2020-06-25 14:11:57 -04:00
Benjamin Trent	c7ba79bc19	[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537 ) (#58553 ) * [ML] make waiting for renormalization optional for internally flushing job (#58537) When flushing, datafeeds only need the guarantee that the latest bucket has been handled. But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data. This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization. closes #58395	2020-06-25 12:26:52 -04:00
Nik Everett	71adade73a	Return clear error message if aggregation type is invalid (#58255 ) (#58365 ) The main changes are: 1. Catch the `NamedObjectNotFoundException` when parsing aggregation type, and then throw a `ParsingException` with clear error message with hint. 2. Add a unit test method: AggregatorFactoriesTests#testInvalidType(). Closes #58146. Co-authored-by: bellengao <gbl_long@163.com>	2020-06-25 11:08:25 -04:00
Dimitris Athanasiou	c3dfafe0b4	[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541 ) (#58550 ) It is possible for the source document to have an empty string value for a field that is mapped as numeric. We should treat those as missing values and avoid throwing an assertion error. Backport of #58541	2020-06-25 18:07:29 +03:00
Dimitris Athanasiou	5af7071db0	[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546 ) This changes the default value for the results field of inference applied on models that are trained via a data frame analytics job. Previously, the results field default was `predicted_value`. This commit makes it the same as in the training job itself. The new default field is `<dependent_variable>_prediction`. Apart from making inference consistent with the training job the model came from, it is helpful to preserve the dependent variable name by default as it provides some context to the user that may avoid confusion as to which model results came from. Backport of #58538	2020-06-25 18:03:43 +03:00
Chris Roberson	d5899d1765	[Monitoring] APM mapping update (#46244 ) (#58498 ) * Add acm mapping to APM for beats * Add root mapping for APM * Add sourcemap mapping to APM * Fix missing properties * Fix a second missing properties * Add request property to acm * Remove root and sourcemap per review Co-authored-by: Mike Place <mike.place@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-24 13:26:30 -04:00
Armin Braun	9e4c5d1dde	Cleaner Handling of Snapshot Related null Custom Values in CS (#58382 ) (#58501 ) Add the ability to get a custom value while specifying a default and use it throughout the codebase to get rid of the `null` edge case and shorten the code a little.	2020-06-24 17:24:44 +02:00
Przemysław Witek	551b8bcd73	[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484 ) (#58490 )	2020-06-24 15:52:45 +02:00
Benjamin Trent	fa88e71532	[ML] unify usages of _all and wildcard <*> (#58460 ) (#58494 )	2020-06-24 09:47:57 -04:00
Jim Ferenczi	fcd8a432d9	Submit _async search task should cancel children on cancellation (#58332 ) This change allows the submit async search task to cancel children and removes the manual indirection that cancels the search task when the submit task is cancelled. This is now handled by the task cancellation, which can cancel grand-children since #54757.	2020-06-24 09:10:26 +02:00
Przemysław Witek	4e4ca6ac25	Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447 ) (#58459 )	2020-06-23 22:18:39 +02:00
Benjamin Trent	a9b868b7a9	[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280 ) (#58455 ) This commit allows data streams to be a valid source for analytics and transforms. Data streams are fairly transparent and our `_search` and `_reindex` actions work without error. For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.	2020-06-23 14:40:35 -04:00
David Roberts	0d6bfd0ac3	[7.x][ML] Fix wire serialization for flush acknowledgements (#58443 ) There was a discrepancy in the implementation of flush acknowledgements: most of the class was designed on the basis that the "last finalized bucket time" could be null but the wire serialization assumed that it was never null. This works because the C++ sends zero "last finalized bucket time" when it is not known or not relevant. But then the Java code will print that to XContent as it is assuming null represents not known or not relevant. This change corrects the discrepancies. Internally within the class null represents not known or not relevant, but this is translated from/to 0 for communications from the C++ and old nodes that have the bug. Additionally I switched from Date to Instant for this class and made the member variables final to modernise it a bit. Backport of #58413	2020-06-23 16:42:06 +01:00
David Roberts	f97b37190b	[ML] Add a new annotation type for categorization status changes (#58394 ) Adds a new value to the "event" enum of ML annotations, namely "categorization_status_change". This will allow users to see when categorization was found to be performing poorly. Once per-partition categorization is available, it will allow users to see when categorization is performing poorly for a specific partition. It does not make sense to reuse the "model_change" event that annotations already have, because categorizer state is separate to model state ("model" state is really anomaly detector state), and is not reverted by the revert model snapshot API. Therefore annotations related to categorization need to be treated differently to annotations related to anomaly detection.	2020-06-23 09:16:27 +01:00
Martijn van Groningen	7dda9934f9	Keep track of timestamp_field mapping as part of a data stream (#58400 ) Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source directly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into a mapping. If the timestamp field path consisted of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.	2020-06-22 17:46:38 +02:00
Przemko Robakowski	a44dad9fbb	[7.x] Add support for snapshot and restore to data streams (#57675 ) (#58371 ) * Add support for snapshot and restore to data streams (#57675) This change adds support for including data streams in snapshots. Names are provided in indices field (the same way as in other APIs), wildcards are supported. If rename pattern is specified it renames both data streams and backing indices. It also adds test to make sure SLM works correctly. Closes #57127 Relates to #53100 * version fix * compilation fix * compilation fix * remove unused changes * compilation fix * test fix	2020-06-19 22:41:51 +02:00
Benjamin Trent	bf8641aa15	[7.x] [ML] calculate cache misses for inference and return in stats (#58252 ) (#58363 ) When a local model is constructed, the cache hit miss count is incremented. When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important for comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.	2020-06-19 09:46:51 -04:00
Jason Tedor	be08268562	Allow follower indices to override leader settings (#58103 ) Today when creating a follower index via the put follow API, or via an auto-follow pattern, it is not possible to specify settings overrides for the follower index. Instead, we copy all of the leader index settings to the follower. Yet, there are cases where a user would want some different settings on the follower index such as the number of replicas, or allocation settings. This commit addresses this by allowing the user to specify settings overrides when creating a follower index via manual put follower calls, or via auto-follow patterns. Note that not all settings can be overridden (e.g., index.number_of_shards) so we also have detection that prevents attempting to override settings that must be equal between the leader and follower index. Note that we do not even allow specifying such settings in the overrides, even if they are specified to be equal between the leader and the follower index. Instead, they must be implicitly copied from the leader index, not explicitly set by the user.	2020-06-18 11:56:06 -04:00
Andrei Dan	caa5d3abe0	ILM actions check the managed index is not a DS write index (#58239 ) (#58295 ) This changes the actions that would attempt to make the managed index read only to check if the managed index is the write index of a data stream before proceeding. The updated actions are shrink, readonly, freeze and forcemerge. (cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-18 07:45:11 +01:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116 ) (#58274 ) - Remove duplicate dependency configuration - Use task avoidance api across the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
David Roberts	3f8d16304c	Add ML admin permissions to the kibana_system role (#58172 ) As part of the "ML in Spaces" project, access to the ML UI in Kibana is migrating to being controlled by Kibana privileges. The ML UI will check whether the logged-in user has permission to do something ML-related using Kibana privileges, and if they do will call the relevant ML Elasticsearch API using the Kibana system user. In order for this to work the kibana_system role needs to have administrative access to ML. Backport of #58061	2020-06-17 17:03:32 +01:00
Andrei Dan	e17c51151b	[7.x] ILM: don't take snapshot of a data stream's write index (#58159 ) (#58222 ) We don't allow converting a data stream's writeable index into a searchable snapshot. We are currently preventing swapping a data stream's write index with the restored index. This adds another step that will not proceed with the searchable snapshot action until the managed index is not the write index of a data stream anymore. (cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-17 09:45:16 +01:00
Benjamin Trent	3309817d18	[ML] fixing tree inference ctor to allow target_type to be optional (#58132 ) (#58165 ) The tree trained model object will set its target_type to be regression by default. This updates the inference object to behave the same way.	2020-06-16 13:29:11 -04:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Alejandro Fernández Haro	3d0c8da66d	Add monitor and view_index_metadata to the built-in `kibana_system` role (#57755 ) Allows the kibana user to collect data telemetry in a background task by giving the kibana_system built-in role the view_index_metadata and monitor privileges over all indices (*).	2020-06-15 14:40:27 +03:00
Shaunak Kashyap	5e2faad783	Add ILM policy PUT and GET for remote_monitoring_agent built-in role (#57963 ) Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Due to this error Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>	2020-06-15 14:35:30 +03:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Benjamin Trent	79c784932f	[ML] allow feature_names to be optional in ensemble inference model (#58059 ) (#58067 ) This has `EnsembleInferenceModel` not parse feature_names from the XContent. Instead, it will rely on `rewriteFeatureIndices` to be called ahead of time. Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.	2020-06-12 16:33:54 -04:00
David Roberts	93b693527a	[7.x][ML] Add categorizer stats ML result type (#58001 ) This type of result will store stats about how well categorization is performing. When per-partition categorization is in use, separate documents will be written for every partition so that it is possible to see if categorization is working well for some partitions but not others. This PR is a minimal implementation to allow the C++ side changes to be made. More Java side changes related to per-partition categorization will be in followup PRs. However, even in the long term I do not see a major benefit in introducing dedicated APIs for querying categorizer stats. Like forecast request stats the categorizer stats can be read directly from the job's results alias. Backport of #57978	2020-06-12 12:08:07 +01:00
David Kyle	39020f3900	HLRC for delete expired data by job Id (#57722 ) (#57975 ) High level rest client changes for #57337	2020-06-12 09:44:17 +01:00
Benjamin Trent	2881995a45	[ML] adding new inference model size estimate handling from native process (#57930 ) (#57999 ) Adds support for reading in `model_size_info` objects. These objects contain numeric values indicating the model definition size and complexity. Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-11 15:59:23 -04:00
Andrei Dan	9f280621ba	[7.x] ILM add data stream support to searchable snapshot action (#57873 ) (#57916 ) (cherry picked from commit 34856a90532c6c62a53817bb395399c8a8c17c0f) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-10 10:16:57 +01:00
Yang Wang	72a6441a88	Revert "Resolve anonymous roles and deduplicate roles during authentication (#53453 ) (#55995 )" (#57858 ) This reverts commit `84a2f1adf2`.	2020-06-10 10:42:52 +10:00
Andrei Dan	3945712c72	[7.x] ILM add data stream support to the Shrink action (#57616 ) (#57884 ) The shrink action creates a shrunken index with the target number of shards. This makes the shrink action data stream aware. If the ILM managed index is part of a data stream the shrink action will make sure to swap the original managed index with the shrunken one as part of the data stream's backing indices and then delete the original index. (cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-09 19:45:22 +01:00
Nik Everett	44a79d1739	Deprecate Rounding#round (#57845 ) (#57893 ) This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in favor of calling ``` Rounding.Prepared prepared = rounding.prepare(min, max); ... prepared.round(val) ``` because it is always going to be faster to prepare once. There are going to be some cases where we won't know what to prepare for and in those cases you can call `prepareForUnknown` and still be faster than calling the deprecated method over and over and over again. Ultimately, this is important because it doesn't look like there is an easy way to cache `Rounding.Prepared` or any of its precursors like `LocalTimeOffset.Lookup`. Instead, we can just build it at most once per request. Relates to #56124	2020-06-09 14:30:56 -04:00
Dan Hermann	b501b282f8	Change default backing index naming scheme	2020-06-09 09:31:34 -05:00
Przemysław Witek	7a1300a09e	[7.x] Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808 ) (#57815 )	2020-06-08 17:41:12 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378 ) (#57771 ) Previously, to determine whether a field is a meta-field, the static MapperService method isMetadataField was used. This method relied on an outdated static list of meta-fields. This PR instead changes it to an instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Andrei Dan	1b84e93d83	[7.x] DataStream creation validation allows for prefixed indices (#57750 ) (#57799 ) We want to validate the DataStreams on creation to make sure the future backing indices would not clash with existing indices in the system (so we can always rollover the data stream). This changes the validation logic to allow for a DataStream to be created with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the former backing index (`foo-000001`) exists in the system. The new validation logic will look for potential index conflicts with indices in the system that have the counter in the name greater than the data stream's generation. This ensures that the `DataStream`'s future rollovers are safe because for a `DataStream` `foo` of generation 4, we will look for standalone indices in the form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if `foo-000006` exists in the system), but will also allow replacing a backing index with an index named by prefixing the backing index it replaces. (cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-08 13:31:52 +01:00
David Kyle	08d1286de7	[7.x] Delete expired data by job (#57337 ) (#57796 ) Deleting expired data can take a long time leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.	2020-06-08 13:00:23 +01:00
Luca Cavanna	7a06a13d99	Add description to submit and get async search, as well as cancel tasks (#57745 ) This makes it easier to debug where such tasks come from in case they are returned from the get tasks API. Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.	2020-06-08 11:17:29 +02:00
David Roberts	1d64d55a86	[7.x][ML] Add per-partition categorization option (#57723 ) This PR adds the initial Java side changes to enable use of the per-partition categorization functionality added in elastic/ml-cpp#1293. There will be a followup change to complete the work, as there cannot be any end-to-end integration tests until elastic/ml-cpp#1293 is merged, and also elastic/ml-cpp#1293 does not implement some of the more peripheral functionality, like stop_on_warn and per-partition stats documents. The changes so far cover REST APIs, results object formats, HLRC and docs. Backport of #57683	2020-06-06 08:15:17 +01:00
Benjamin Trent	9666a895f7	[ML] inference performance optimizations and refactor (#57674 ) (#57753 ) This is a major refactor of the underlying inference logic. The main change is that we now separate the model configuration from the inference interfaces. This has the following benefits: - we can store extra things with the model that are not necessary for inference (i.e. treenode split information gain) - we can optimize inference separate from model serialization and storage. - The user is oblivious to the optimizations (other than seeing the benefits). A major part of this commit is removing all inference related methods from the trained model configurations (ensemble, tree, etc.) and moving them to a new class. This new class satisfies a new interface that is ONLY for inference. The optimizations applied currently are: - feature maps are flattened once - feature extraction only happens once at the highest level (improves inference + feature importance throughput) - Only storing what we need for inference + feature importance on heap	2020-06-05 14:20:58 -04:00
Dimitris Athanasiou	f49a14ce6f	[7.x][ML] Fix race condition when force stopping DF analytics job (#57680 ) (#57717 ) When we force delete a DF analytics job, we currently first force stop it and then we proceed with deleting the job config. This may result in logging errors if the job config is deleted before it is retrieved while the job is starting. Instead of force stopping the job, it would make more sense to try to stop the job gracefully first. So we now try that out first. If normal stop fails, then we resort to force stopping the job to ensure we can go through with the delete. In addition, this commit introduces `timeout` for the delete action and makes use of it in the child requests. Backport of #57680	2020-06-05 17:50:01 +03:00
Hendrik Muhs	e91b975878	[Transform] mark old data frame transform roles deprecated (#57655 ) mark old data frame transform roles deprecated fixes #50087	2020-06-05 09:20:35 +02:00
Hendrik Muhs	c1c8817eae	[7.x][Transform] improve update API (#57685 ) rewrite config on update if either version is outdated, credentials change, the update changes the config or deprecated settings are found. Deprecated settings get migrated to the new format. The upgrade can be easily extended to do any necessary re-writes. fixes #56499 backport #57648	2020-06-05 08:48:47 +02:00
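
The data stream creation check described in commit 1b84e93d83 above (look for standalone indices named `<data stream>-%06d` whose counter is greater than the data stream's generation, while tolerating prefixed names such as `shrink-foo-000001`) can be illustrated with a small sketch. This is not the Elasticsearch implementation: the class `DataStreamNameCheck` and the method `hasConflict` are hypothetical names invented for illustration, and the sketch assumes index names are available as a plain list of strings.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Elasticsearch validation code.
public class DataStreamNameCheck {

    // Returns true if any existing index looks like a future backing index of
    // the data stream, i.e. it is named "<dataStream>-<counter>" with a
    // zero-padded counter (at least six digits) greater than the generation.
    public static boolean hasConflict(String dataStream, long generation, List<String> existingIndices) {
        Pattern backing = Pattern.compile(Pattern.quote(dataStream) + "-(\\d{6,})");
        for (String index : existingIndices) {
            // matches() requires the whole name to fit the pattern, so a
            // prefixed name such as "shrink-foo-000001" never matches.
            Matcher m = backing.matcher(index);
            if (m.matches() && Long.parseLong(m.group(1)) > generation) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Data stream "foo" at generation 4, as in the commit message.
        System.out.println(hasConflict("foo", 4, List.of("foo-000006")));
        System.out.println(hasConflict("foo", 4, List.of("foo-000001", "shrink-foo-000001")));
    }
}
```

With the commit message's own example, `foo-000006` is reported as a conflict for generation 4, while `foo-000001` and `shrink-foo-000001` are not, since neither is a standalone index with a counter above the generation.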


2084 Commits