OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-09 22:45:04 +00:00

Author	SHA1	Message	Date
Martijn van Groningen	2a89e13e43	Move data stream transport and rest action to xpack (#59593) Backport of #59525 to 7.x branch. * Actions are moved to xpack core. * Transport and rest actions are moved to the data-streams module. * Removed data streams methods from Client interface. * Adjusted tests to use client.execute(...) instead of data stream specific methods. * Only attempt to delete all data streams if xpack is installed in rest tests. * Now that ds apis are in xpack and ESIntegTestCase no longer deletes all ds, do that in the MlNativeIntegTestCase class for ml tests.	2020-07-15 16:50:44 +02:00
Tanguy Leroux	604f22db79	Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590) Since #58728 writing operations on searchable snapshot directory cache files are executed in an asynchronous manner using a dedicated thread pool. The thread pool used is searchable_snapshots which has been created to execute prewarming tasks. Reusing the same thread pool wasn't a good idea as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. But such an eviction can only be processed when all read/write operations on the cache file are completed and in this case the deadlock occurred because the cache file was actively being read by a concurrent search which also won the privilege to write the range of bytes in cache... and this writing operation could never have been completed because of the prewarming tasks making no progress and filling up the thread pool. This commit renames the searchable_snapshots thread pool to searchable_snapshots_cache_fetch_async. Assertions are added to assert that cache writes are executed using this thread pool and to assert that reads on cached index inputs are executed using a different thread pool to avoid potential deadlock situations. This commit also adds a searchable_snapshots_cache_prewarming thread pool that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.	2020-07-15 11:45:52 +02:00
Tal Levy	4bb91b61e8	Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577) Closes #44505.	2020-07-14 22:37:48 -07:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834) (#59574) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Ryan Ernst	3b688bfee5	Add license feature usage api (#59342 ) (#59571 ) This commit adds a new api to track when gold+ features are used within x-pack. The tracking is done internally whenever a feature is checked against the current license. The output of the api is a list of each used feature, which includes the name, license level, and last time it was used. In addition to a unit test for the tracking, a rest test is added which ensures starting up a default configured node does not result in any features registering as used. There are a couple features which currently do not work well with the tracking, as they are checked in a manner that makes them look always used. Those features will be fixed in followups, and in this PR they are omitted from the feature usage output.	2020-07-14 14:34:59 -07:00
Albert Zaharovits	4eb310c777	Disallow mapping updates for doc ingestion privileges (#58784 ) The `create_doc`, `create`, `write` and `index` privileges do not grant the PutMapping action anymore. Apart from the `write` privilege, the other three privileges also do NOT grant (auto) updating the mapping when ingesting a document with unmapped fields, according to the templates. In order to maintain the BWC in the 7.x releases, the above privileges will still grant the Put and AutoPutMapping actions, but only when the "index" entity is an alias or a concrete index, but not a data stream or a backing index of a data stream.	2020-07-14 23:39:41 +03:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278) (#59514) This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices. Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes.	2020-07-14 22:18:42 +02:00
Nhat Nguyen	4d7c59bedb	Assign follower primary to nodes with remote cluster client role (#59375 ) The primary shards of follower indices during the bootstrap need to be on nodes with the remote cluster client role as those nodes reach out to the corresponding leader shards on the remote cluster to copy Lucene segment files and renew the retention leases. This commit introduces a new allocation decider that ensures bootstrapping follower primaries are allocated to nodes with the remote cluster client role. Co-authored-by: Jason Tedor <jason@tedor.me>	2020-07-14 11:23:55 -04:00
Andrei Stefan	cf752992d6	Add telemetry metrics (#59526)	2020-07-14 16:25:24 +03:00
Dan Hermann	59f639a279	Add auto_configure privilege	2020-07-14 08:23:49 -05:00
Andrei Dan	7dcdaeae49	Default to @timestamp in composable template datastream definition (#59317 ) (#59516 ) This makes the data_stream timestamp field specification optional when defining a composable template. When there isn't one specified it will default to `@timestamp`. (cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 12:36:54 +01:00
Hendrik Muhs	c8290167a0	[7.x][Transform] separate pivot and extract function interface (#59505 ) separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function. Foundation to add more functions to transform. piggy backed fixes: - when running geo tile group_by it could fail due to query clause limit (unreleased) - new style page size using settings was not validating limit of 10k (7.8)	2020-07-14 11:27:16 +02:00
Lee Hinman	bf1a60130d	[7.x] Add telemetry for data streams (#59433) (#59454) This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is pretty minimal, returning only the number of data streams and the number of indices currently abstracted by a data stream: ``` ... "data_streams" : { "available" : true, "enabled" : true, "data_streams" : 3, "indices_count" : 17 } ... ```	2020-07-13 14:30:11 -06:00
David Roberts	b5e8250a4e	[ML] Drive categorization warning notifications from annotations (#59393 ) With the introduction of per-partition categorization the old logic for creating a job notification for categorization status "warn" does not work. However, the C++ code is already writing annotations for categorization status "warn" that take into account whether per-partition categorization is being used and which partition(s) the warnings relate to. Therefore, this change alters the Java results processor to create notifications based on the annotations the C++ writes. (It is arguable that we don't need both annotations and notifications, but they show up in different ways in the UI: only annotations are visible in results and only notifications set the warning symbol in the jobs list. This means it's best to have both.) Backport of #59377	2020-07-13 15:28:57 +01:00
Yang Wang	a84469742c	Improve role cache efficiency for API key roles (#58156) (#59397) This PR ensures that the same roles are cached only once even when they are from different API keys. API key role descriptors and limited role descriptors are now saved in Authentication#metadata as raw bytes instead of deserialised Map<String, Object>. Hashes of these bytes are used as keys for API key roles. Only when the required role is not found in the cache are the bytes deserialised to build the RoleDescriptors. The deserialisation is directly from raw bytes to RoleDescriptors without going through the current detour of "bytes -> Map -> bytes -> RoleDescriptors".	2020-07-13 22:58:11 +10:00
Dan Hermann	e01d73c737	[7.x] Data stream admin actions are now index-level actions	2020-07-10 14:36:18 -05:00
Dimitris Athanasiou	b2243337d8	[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308) This adds a setting to data frame analytics jobs called `max_num_threads`. The setting expects a positive integer. When used, it specifies the max number of threads that may be used by the analysis. Note that the actual number of threads used is limited by the number of processors on the node where the job is assigned. Also, the process may use a couple more threads for operational functionality that is not the analysis itself. This setting may also be updated for a stopped job. More threads may reduce the time it takes to complete the job at the cost of using more CPU. Backport of #59254 and #57274	2020-07-09 19:15:46 +03:00
Dimitris Athanasiou	d07b11b86b	[7.x][ML] Perform test inference on java (#58877 ) (#59298 ) Since we are able to load the inference model and perform inference in java, we no longer need to rely on the analytics process to be performing test inference on the docs that were not used for training. The benefit is that we do not need to send test docs and fit them in memory of the c++ process. Backport of #58877 Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co> Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-07-09 16:30:49 +03:00
David Kyle	86555ec163	Remove unused function InferenceIndexConstants.mapping() (#59146 ) (#59158 ) InferenceIndexConstants.mapping() is broken and unused.	2020-07-09 14:28:53 +01:00
David Kyle	dbb9c802b1	Better error message when the model cannot be parsed due to its size (#59166) (#59209) The actual cause can be lost in a long list of parse exceptions; this surfaces the cause when the problem is size.	2020-07-09 13:43:46 +01:00
Albert Zaharovits	2b7456db7f	Improve auditing of API key authentication #58928 1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields to the `access_granted`, `access_denied`, `authentication_success`, and (some) `tampered_request` audit events. The `apikey.id` and `apikey.name` are present only when authn using an API Key. 2. When authn with an API Key, the `user.realm` field now contains the effective realm name of the user that created the key, instead of the synthetic value of `_es_api_key`.	2020-07-09 13:26:18 +03:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and instead only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that causes timestamp field validation to be performed for each template instead of for the final mappings that are created. * Only apply _timestamp meta field if index is created as part of a data stream or data stream rollover; this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
David Turner	6ffdb19a2a	Clean searchable snapshots cache on startup (#59009 ) Today we empty the searchable snapshots cache when cleanly closing a shard, but leak cache files in some cases involving an unclean shutdown. Such leaks are not permanent, they are cleaned up on shard relocation or deletion, but they still might last for arbitrarily long until that happens. This commit introduces a cleanup process that runs during node startup to catch such leaks sooner. Also, today we permit searchable snapshots to be held on custom data paths, and store the corresponding cache files within the custom location. Supporting this feature would make the cleanup process significantly more complicated since it would require each node to parse the index metadata for the shards it held before shutdown. Yet, this feature is undocumented and offers minimal benefits to searchable snapshots. Therefore with this commit we forbid custom data paths for searchable snapshot shards.	2020-07-08 15:17:52 +01:00
Dan Hermann	90c8d3fc9d	IndexNameExpressionResolver::dataStreamNames should support exclusions	2020-07-08 07:35:52 -05:00
Yannick Welsch	0b9eb210b8	Add basic searchable snapshots usage information (#58828 ) (#59160 ) Adds super basic usage information for searchable snapshots, to be extended later. Backport of #58828	2020-07-08 13:09:29 +02:00
Albert Zaharovits	d4a0f80c32	Ensure authz role for API key is named after owner role (#59041 ) The composite role that is used for authz, following the authn with an API key, is an intersection of the privileges from the owner role and the key privileges defined when the key has been created. This change ensures that the `#names` property of such a role equals the `#names` property of the key owner role, thereby rectifying the value for the `user.roles` audit event field.	2020-07-07 23:26:57 +03:00
Rene Groeschke	e8181fc627	Fix implicit duplicate duplicatesStrategy in processResources (#58929 ) (#59127 ) * Fix implicit duplicate duplicatesStrategy in processResources * Fix duplicates strategy in docker distribution setup	2020-07-07 13:45:36 +02:00
David Roberts	e217f9a1e8	[ML] Wait for shards to initialize after creating ML internal indices (#59087 ) There have been a few test failures that are likely caused by tests performing actions that use ML indices immediately after the actions that create those ML indices. Currently this can result in attempts to search the newly created index before its shards have initialized. This change makes the method that creates the internal ML indices that have been affected by this problem (state and stats) wait for the shards to be initialized before returning. Backport of #59027	2020-07-07 10:52:10 +01:00
Przemysław Witek	4a791e835b	Simplify parser declarations when specialist types are stored in strings (#58996) (#59056)	2020-07-06 13:05:03 +02:00
Przemysław Witek	f35ad0d4e1	Report peak model memory in ModelSizeStats (#59017) (#59055)	2020-07-06 12:55:12 +02:00
David Kyle	c651135562	[ML] Make Inference processor field_map and inference_config optional (#59010) Relaxes the requirement that the inference ingest processor must have a field_map and inference_config defined even if they are empty.	2020-07-06 11:35:30 +01:00
Benjamin Trent	b9d9964d10	[ML] add exponent output aggregator to inference (#58933 ) (#59016 ) * [ML] add exponent output aggregator to inference * fixing docs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-03 14:51:00 -04:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008)	2020-07-03 12:58:39 -05:00
Luca Cavanna	e3fc1638d8	Improve error handling in async search code (#57925 ) - The exception that we caught when failing to schedule a thread was incorrect. - We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager - when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed. Closes #58995	2020-07-03 16:07:26 +02:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991)	2020-07-03 06:33:38 -05:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Dan Hermann	c988afdc15	Data stream support for migrations deprecations info API	2020-07-02 11:16:22 -05:00
Przemysław Witek	751e84e4c8	Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927)	2020-07-02 17:35:55 +02:00
Przemysław Witek	8e074c4495	Rename "error" field to "value" for consistency between metrics (#58726) (#58870)	2020-07-02 09:08:56 +02:00
Yang Wang	a5a8b4ae1d	Add cache for application privileges (#55836 ) (#58798 ) Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors. Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).	2020-07-02 11:50:03 +10:00
Benjamin Trent	c64e283dbf	[7.x] [ML] handles compressed model stream from native process (#58009 ) (#58836 ) * [ML] handles compressed model stream from native process (#58009) This moves model storage from handling the fully parsed JSON string to handling two separate types of documents. 1. ModelSizeInfo which contains model size information 2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string. `model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition. Native side change: https://github.com/elastic/ml-cpp/pull/1349	2020-07-01 15:14:31 -04:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
Przemysław Witek	909649dd15	[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825)	2020-07-01 14:52:06 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658) (#58811) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
Dario Gieselaar	417f7062c5	[7.x] Add read privileges for annotations for apm_user (#58530 ) (#58781 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-01 09:04:57 +02:00
Yang Wang	3d49e62960	Support handling LogoutResponse from SAML idP (#56316 ) (#58792 ) SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective. The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error. This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.	2020-07-01 16:47:27 +10:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
David Roberts	d9e0e0bf95	[ML] Pass through the stop-on-warn setting for categorization jobs (#58738 ) When per_partition_categorization.stop_on_warn is set for an analysis config it is now passed through to the autodetect C++ process. Also adds some end-to-end tests that exercise the functionality added in elastic/ml-cpp#1356 Backport of #58632	2020-06-30 15:17:04 +01:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure it is properly built * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
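
Commits c23613e05a and 3b688bfee5 above describe splitting license checks into `isAllowed` (a pure query, suitable for telemetry) and `checkFeature` (a query that also records the last usage time), with a usage API that lists each used feature by name, license level, and last-used time. The following is a minimal Python sketch of that idea only, not the actual x-pack code. The class name, the `register` helper, the level ordering, the feature names, and the choice to record usage only when the check succeeds are all assumptions made for illustration.

```python
import time

# Hypothetical license levels, lowest to highest; not taken from the source.
LEVELS = ["basic", "gold", "platinum"]


class LicenseState:
    """Illustrative stand-in for a license state with usage tracking."""

    def __init__(self, current_level, clock=time.time):
        self.current_level = current_level
        self._clock = clock      # injectable so tests can use a fixed time
        self._required = {}      # feature name -> minimum license level
        self._last_used = {}     # feature name -> last usage timestamp

    def register(self, feature, required_level):
        # Hypothetical helper: declare which level a feature needs.
        self._required[feature] = required_level

    def is_allowed(self, feature):
        # Pure query: never counts as a usage (the telemetry case).
        required = self._required[feature]
        return LEVELS.index(self.current_level) >= LEVELS.index(required)

    def check_feature(self, feature):
        # Query plus tracking: the caller intends to use the feature.
        # Assumption: usage is recorded only when the check succeeds.
        allowed = self.is_allowed(feature)
        if allowed:
            self._last_used[feature] = self._clock()
        return allowed

    def usage(self):
        # One entry per used feature: name, license level, last usage time.
        return [
            {"name": name, "license_level": self._required[name], "last_used": ts}
            for name, ts in sorted(self._last_used.items())
        ]
```

With this shape, a freshly constructed state reports no used features until some caller goes through `check_feature`, which mirrors the rest test described in 3b688bfee5 (a default node should not register any feature as used).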

1938 Commits