OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	b46b6d5977	Fix compilation in DataTierTests.java This commit fixes a compilation issue in DataTierTests.java that was introduced due to language-level differences between 7.10/7.x and master.	2020-10-27 13:04:55 -04:00
Jason Tedor	04a9845a49	Adjust defaults for tiered data roles (#64015 ) This commit adjusts the defaults for the tiered data roles so that they are enabled by default, or if the node has the legacy data role. This ensures that the default experience is that the tiered data roles are enabled. To fully specify the behavior for the tiered data roles: - starting a new node with the defaults: enabled - starting a new node with node.roles configured: enabled if and only if the tiered data roles are explicitly configured, independently of the node having the data role - starting a new node with node.data enabled: enabled unless the tiered data roles are explicitly disabled - starting a new node with node.data disabled: disabled unless the tiered data roles are explicitly enabled	2020-10-27 12:48:31 -04:00
Henning Andersen	0cba23e08f	XPack Usage should run on MANAGEMENT threads (#64160 ) XPack usage starts out on management threads, but depending on the implementation of the usage plugin, they could end up running on transport threads instead. Fixed to always reschedule on a management thread.	2020-10-27 16:03:26 +01:00
Nhat Nguyen	566d1fd459	Return the same point in time in search response (#64188 ) With this change, we will always return the same point in time in a search response as its input until we implement the retry mechanism for the point in times.	2020-10-27 10:17:44 -04:00
David Roberts	adc5509eda	[ML] Support the unsigned_long type in data frame analytics (#64072 ) Adds support for the unsigned_long type to data frame analytics. This type is handled in the same way as the long type. Values sent to the ML native processes are converted to floats and hence will lose accuracy when outside the range where a float can uniquely represent long values. Backport of #64066	2020-10-26 09:05:49 +00:00
Benjamin Trent	eff7f06ca6	[ML] fix inference binary classification prediction label and feature importance (#63688 ) (#63930 ) When calculating feature importance, the leaf values directly correlate with the value of the importance. Consequently, positive leaf values -> positive feature importance, negative leaf values -> negative feature importance. It follows that for binary classification, this is done such that the importance relates to the leaf values, which relate directly to the "probability of class 1". So, the feature importance calculated is always the importance as it relates to class 1. The inverse is the importance as it relates to class 0.	2020-10-20 08:50:15 -04:00
Ioannis Kakavas	364511395d	[7.10] Move RestRequestFilter to core (#63507 ) Move RestRequestFilter to core so that Rest requests outside xpack can use it to filter fields and expand its usage. Backport of #63507	2020-10-16 13:57:52 +03:00
Jim Ferenczi	1d78bd0f72	Async search should retry updates on version conflict (#63652 ) * Async search should retry updates on version conflict The _async_search APIs can throw version conflict exception when the internal response is updated concurrently. That can happen if the final response is written while the user extends the expiration time. That scenario should be rare but it happened in Kibana for several users so this change ensures that updates are retried at least 5 times. That should resolve the transient errors for Kibana. This change also preserves the version conflict exception in case the retry didn't work instead of returning a confusing 404. This commit also ensures that we don't delete the response if the search was cancelled internally and not deleted explicitly by the user. Closes #63213	2020-10-16 08:49:02 +02:00
Albert Zaharovits	f4e1e6893d	Add view_index_metadata over metricbeat-* for monitoring agent (#63750 ) The `remote_monitoring_agent` reserved role is extended to grant more privileges over the metricbeat-* index pattern. In addition to the index and create_index index privileges that it granted already, it now also grants the view_index_metadata privilege. Closes #63203	2020-10-16 02:13:55 +03:00
Jay Modi	ebdaeb2f9a	Ensure cancelled jobs do not continue to run (#63771 ) This commit ensures that jobs within the SchedulerEngine do not continue to run after they are cancelled. There was no synchronization between the cancel method of an ActiveSchedule and the run method, so an actively running schedule would go ahead and reschedule itself even if the cancel method had been called. This commit adds synchronization between cancelling and the scheduling of the next run to ensure that the job is cancelled. In real life scenarios this could manifest as a job running multiple times for SLM. This could happen if a job had been triggered and was cancelled prior to completing its run such as if the node was no longer the master node or if SLM was stopping/stopped. Closes #63754 Backport of #63762	2020-10-15 14:01:14 -06:00
Albert Zaharovits	2b7fbe9957	Add the missing apikey.* fields to the logfile audit layout for docker builds (#63609 ) The layout pattern for the security audit for docker builds was missing the apiKey.* fields.	2020-10-14 13:58:41 +03:00
Lee Hinman	7371e51583	[7.10] Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581 ) (#63613 ) Backports the following commits to 7.10: Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581)	2020-10-13 09:17:15 -06:00
Przemysław Witek	acbd48f834	[ML] Allow setting num_top_classes to a special value -1 (#63587 ) (#63602 )	2020-10-13 13:57:50 +02:00
Dimitris Athanasiou	e1c418aac7	[7.10][ML] Validate dest pipeline exists on transform update (#63494 ) (#63549 ) Adds validation that the dest pipeline exists when a transform is updated. Refactors the pipeline check into the `SourceDestValidator`. Fixes #59587 Backport of #63494	2020-10-12 15:41:35 +03:00
Przemysław Witek	bd761cce1d	[ML] Validate that AucRoc has the data necessary to be calculated (#63302 ) (#63454 )	2020-10-08 09:52:15 +02:00
Alan Woodward	88b45dfa61	Convert TextFieldMapper to parametrized form (#63269 ) (#63392 ) As a result of this, we can remove a chunk of code from TypeParsers as well. Tests for search/index mode analyzers have moved into their own file. This commit also rationalises the serialization checks for parameters into a single SerializerCheck interface that takes the values includeDefaults, isConfigured and the value itself. Relates to #62988	2020-10-07 13:26:25 +01:00
Gordon Brown	15edc39d9b	Update logstash_admin role for system indices (#63368 ) This PR updates the `logstash_admin` role to include the recently-added Logstash Pipeline Management APIs, as well as access to the `.logstash*` index pattern. Co-authored-by: William Brafford <williamrandolphbrafford@gmail.com>	2020-10-06 20:43:36 -06:00
Gordon Brown	5c8b0662df	Deprecate REST access to System Indices (#63274 ) (Original #60945 ) This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns. Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail, and a few specific APIs which will continue to allow access to system indices by default: - `GET _cluster/health` - `GET {index}/_recovery` - `GET _cluster/allocation/explain` - `GET _cluster/state` - `POST _cluster/reroute` - `GET {index}/_stats` - `GET {index}/_segments` - `GET {index}/_shard_stores` - `GET _cat/[indices,aliases,health,recovery,shards,segments]` Deprecation warnings for accessing system indices take the form: ``` this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default ```	2020-10-06 13:41:40 -06:00
Tanguy Leroux	87076c32e2	Determine shard size before allocating shards recovering from snapshots (#61906 ) (#63337 ) Determines the shard size of shards before allocating shards that are recovering from snapshots. It ensures during shard allocation that the target node that is selected as recovery target will have enough free disk space for the recovery event. This applies to regular restores, CCR bootstrap from remote, as well as mounting searchable snapshots. The InternalSnapshotInfoService is responsible for fetching snapshot shard sizes from repositories. It provides a getShardSize() method to other components of the system that can be used to retrieve the latest known shard size. If the latest snapshot shard size retrieval failed, getShardSize() returns ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE. While we'd like a better way to handle such failures, returning this value allows us to keep the existing behavior for now. Note that this PR does not address an issue (which we already have today) where a replica is being allocated without knowing how much disk space is being used by the primary. Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-10-06 18:37:05 +02:00
David Kyle	ea32b4ab82	[ML] Audit message when nightly maintenance times out (#63252 ) (#63330 ) During deletion of old ML data, set the delete by query timeout to 8 hours and audit a job message when the nightly maintenance task times out.	2020-10-06 16:19:37 +01:00
Hendrik Muhs	058c55da6a	[Transform] disallow field and script being empty for group sources (#63313 ) fail validation earlier when field and script are both missing in a group source	2020-10-06 16:59:02 +02:00
Yang Wang	abf9b885b4	Bulk invalidate API keys using a list of IDs (#63224 ) (#63320 ) Add a new ids field to the API of invalidating API keys so that it supports bulk invalidation with a list of IDs. Note the existing id field is kept as is and it is an error if both id and ids are specified.	2020-10-07 00:49:21 +11:00
Yang Wang	bbfa2f1303	Fix test failure due to missing client action	2020-10-07 00:45:30 +11:00
Yang Wang	7969fbb4ab	Cache API key doc to reduce traffic to the security index (#59376 ) (#63319 ) Getting the API key document from the security index is the most time-consuming part of the API key authentication flow (>60% if the index is local and >90% if the index is remote). This traffic is now avoided by the caching added with this PR. Additionally, we add a cache invalidator registry so that clearing of different caches will be managed in a single place (requires follow-up PRs).	2020-10-06 23:49:23 +11:00
David Kyle	8f4ef40f78	[ML] Auditor ensures template is installed before writes (#63286 ) The ML auditors should not write if the latest template is not present. Instead a PUT template request is made and the writes queued up	2020-10-06 11:20:37 +01:00
Armin Braun	cf75abb021	Optimize XContentParserUtils.ensureExpectedToken (#62691 ) (#63253 ) We only ever use this with `XContentParser`, so there is no need to make it inline worse by forcing the lambda and hence a dynamic callsite here. => Extracted the exception formatting code path that is likely very cold to a separate method and removed the lambda usage in hot loops by simplifying the signature here.	2020-10-05 19:08:32 +02:00
Benjamin Trent	1e63313c19	[ML] adds feature_importance_baseline object to model metadata (#63172 ) (#63237 ) This adds the new field `feature_importance_baseline` and allows it to be optionally included in the model's metadata. Related to: https://github.com/elastic/ml-cpp/pull/1522	2020-10-05 09:33:38 -04:00
Benjamin Trent	752ee0288e	[7.x] [ML] optimize delete expired snapshots (#63134 ) (#63200 ) * [ML] optimize delete expired snapshots (#63134) When deleting expired snapshots, we do an individual delete action per snapshot per job. We should instead gather the expired snapshots and delete them in a single call. This commit achieves this and a side-effect is there is less audit log spam on nightly cleanup closes https://github.com/elastic/elasticsearch/issues/62875	2020-10-02 13:24:36 -04:00
Przemysław Witek	5370f270d7	[7.x] [ML] Ensure data frame analytics jobs don't run on a node that's too new (#62749 ) (#63175 )	2020-10-02 17:19:58 +02:00
Joe Gallo	d172a18c95	Tidy up some ILM and SLM packages (#63146 ) Very minor refactoring, just moving some ILM and SLM classes around to decrease the total number of packages.	2020-10-02 09:30:24 -04:00
Benjamin Trent	535f8a434b	Revert "[ML] adding `baseline` field to total_feature_importance objects (#63098 ) (#63125 )" (#63144 ) This reverts commit `95242eccee`.	2020-10-02 07:03:15 -04:00
Ioannis Kakavas	e91f66e22f	Ensure domain_name setting for AD realm is present (#61983 ) (#63159 ) We would only check for a null value and not for an empty string, which meant that we were not actually enforcing this mandatory setting. This commit ensures we check for both and fail accordingly, if necessary, on startup.	2020-10-02 12:16:08 +03:00
Lee Hinman	f0f0da2188	[7.x] Add telemetry for data tiers (#63031 ) (#63140 ) Backports the following commits to 7.x: Add telemetry for data tiers (#63031)	2020-10-01 12:37:32 -06:00
Benjamin Trent	95242eccee	[ML] adding `baseline` field to total_feature_importance objects (#63098 ) (#63125 ) This adds a new `baseline` field to the feature importance values. This field contains the baseline importance for a given feature and class.	2020-10-01 09:48:07 -04:00
Yang Wang	e31bef4032	Fix API key role descriptors rewrite bug for upgraded clusters (#62917 ) (#63042 ) This PR ensures that API key role descriptors are always rewritten to a target node compatible format before a request is sent.	2020-09-30 22:16:39 +10:00
Benjamin Trent	0860746bf2	[ML] changing ngram loop order for minor performance improvement (#63033 ) (#63059 ) This is a very minor optimization but trivial to implement, so might as well. ``` Benchmark (nGramStrs) Mode Cnt Score Error Units NGramProcessorBenchmark.ngramInnerLoop 1,2,3 avgt 20 4415092.443 ± 31302.115 ns/op NGramProcessorBenchmark.ngramOuterLoop 1,2,3 avgt 20 4235550.340 ± 103393.465 ns/op ``` This measurement is in nanoseconds; consequently, the overall performance of inference is dominated by other factors (i.e. map#put). But this optimization adds up over time and is simple.	2020-09-30 07:51:31 -04:00
Przemysław Witek	4366d58564	[7.x] [ML] Implement AucRoc metric for classification (#60502 ) (#63051 )	2020-09-30 12:55:52 +02:00
Benjamin Trent	0b3af242d4	[ML] fixing classification feature importance parsing (#63003 ) (#63015 ) Classification feature importance supports various types in the class name: - string - boolean - numerical The xcontent parsing on the server side and the HLRC side should support and test these types.	2020-09-29 10:54:35 -04:00
Yang Wang	068f605040	Use compilation as validation for painless role template (#62845 ) (#63010 ) * Use compilation as validation for painless role template (#62845) Role template validation now performs only compilation if the script is painless. It no longer attempts to execute the script with empty input, which is problematic. The compilation process will catch things like invalid syntax and undefined variables, which still provides a certain level of protection against ill-defined role templates. Behaviour for Mustache scripts is unchanged. * Checkstyle	2020-09-30 00:37:41 +10:00
Dimitris Athanasiou	facf9ede0a	[ML] Fix binary classification importance in LegacyFeatureImportanceTests (#63000 ) Fixes #62991	2020-09-29 15:53:34 +03:00
Dimitris Athanasiou	7f6c1ff5b4	[7.x][ML] Remove top level importance from classification inference results (#62486 ) (#62964 ) As we have decided top level importance for classification is not useful, it has been removed from the results from the training job. This commit also removes them from inference. Backport of #62486	2020-09-29 10:58:48 +03:00
Hendrik Muhs	b1a8437d0b	[7.x][Transform] Improve robustness when saving state (#62927 ) refactor how state is persisted, call doSaveState only from the indexer thread, unless there is none. fixes #60781 fixes #52931 fixes #51629 fixes #52035	2020-09-28 10:12:51 +02:00
Andrei Stefan	a43f29cfc9	EQL: data streams tests for PIT and EQL sequences (#62850 ) (#62889 ) * PIT should run well with data streams (cherry picked from commit 0a89a7db848b015b797c7678874b5c9e33bbd650)	2020-09-24 23:37:46 +03:00
Andrei Dan	e323c5245b	[7.x] ILM: migrate action configures the _tier_preference setting (#62829 ) (#62860 ) The `migrate` action will now configure the `index.routing.allocation.include._tier_preference` setting to the corresponding tiers. For the HOT phase it will configure `data_hot`, for the WARM phase it will configure `data_warm,data_hot` and for the COLD phase `data_cold,data_warm,data_hot`. (cherry picked from commit 9dbf0e6f0c267e40c5bcfb568bb2254da103ae40) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-09-24 13:37:09 +01:00
Hendrik Muhs	a70389015d	[Transform] Return parsed count for get transform stats (#62809 ) In case of more than 500 transforms, get and stats return paged results which can be requested using page parameters. For >500 transforms count wasn't parsed out of the server response but taken from size of the list of transforms. The change also adds client/server hlrc tests and fixes a wrong type for count in get. fixes #56245	2020-09-24 08:38:07 +02:00
Dimitris Athanasiou	7de5201291	[7.x][ML] Handle data frame analytics state spreading over multiple docs (#62564 ) (#62824 ) When state persistence was first implemented for data frame analytics we had the assumption that state would always fit in a single document. However this is not the case any more. This commit adds handling of state that spreads over multiple documents. Backport of #62564	2020-09-23 16:16:34 +03:00
Nhat Nguyen	663b85b98f	Make keep alive optional in PointInTimeBuilder (#62720 ) Remove the keepAlive parameter from the constructor of PointInTimeBuilder as it's optional.	2020-09-22 18:52:54 -04:00
Benjamin Trent	77bfb32635	[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694 ) (#62784 ) * [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) * [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls global parameters, outside of the global index, are ignored for internal callers in certain cases. If the internal caller is adding requests via the following methods: ``` - BulkRequest#add(IndexRequest) - BulkRequest#add(UpdateRequest) - BulkRequest#add(DocWriteRequest) - BulkRequest#add(DocWriteRequest[]) ``` It is better to specifically set the desired parameters on the requests before they are added to the bulk request object. This commit addresses this issue for the ML plugin * unmuting test	2020-09-22 15:07:08 -04:00
Andrei Dan	79d0c4ed18	ILM: allow check-migration step to continue if tier setting unset (#62636 ) (#62724 ) This allows the `check-migration` step to move past the allocation check if the tier routing settings are manually unset. This helps a user unblock ILM in case a tier is removed (ie. if the warm tier is decommissioned this will allow users to resume the ILM policies stuck in `check-migration` waiting for the warm nodes to become available and the managed index to allocate. this allows the index to allocate on the other available tiers) (cherry picked from commit d7a1eaa7f51d0972d10c0df1d3cd77d6b755dd41) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-09-21 20:40:01 +01:00
Lee Hinman	4a08928c47	[7.x] Add index.routing.allocation.include._tier_preference setting (#62589 ) (#62667 ) This commit adds the `index.routing.allocation.include._tier_preference` setting to the `DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a preference-based list of tiers for an index to be assigned to. For example, if the setting were set to: ``` "index.routing.allocation.include._tier_preference": "data_hot,data_warm,data_content" ``` If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes. This allows us to specify an index's preference for tier(s) without causing the index to be unassigned if no nodes of a preferred tier are available. Subsequent work will change the ILM migration to make additional use of this setting. Relates to #60848	2020-09-18 15:41:36 -06:00


2083 Commits