OpenSearch

Commit Graph

Author	SHA1	Message	Date
Yang Wang	edf27cd765	Adjust BWC versions for API key auth test. API key realm name is not available in authentication metadata prior to v7.5. The issue is tracked at #59425	2020-07-14 00:38:42 +10:00
David Roberts	b5e8250a4e	[ML] Drive categorization warning notifications from annotations (#59393 ) With the introduction of per-partition categorization the old logic for creating a job notification for categorization status "warn" does not work. However, the C++ code is already writing annotations for categorization status "warn" that take into account whether per-partition categorization is being used and which partition(s) the warnings relate to. Therefore, this change alters the Java results processor to create notifications based on the annotations the C++ writes. (It is arguable that we don't need both annotations and notifications, but they show up in different ways in the UI: only annotations are visible in results and only notifications set the warning symbol in the jobs list. This means it's best to have both.) Backport of #59377	2020-07-13 15:28:57 +01:00
David Kyle	054d5236d4	Mute RegressionIT failure (#59414 ) For #59413	2020-07-13 14:12:19 +01:00
Yang Wang	a84469742c	Improve role cache efficiency for API key roles (#58156 ) (#59397 ) This PR ensures that the same roles are cached only once, even when they come from different API keys. API key role descriptors and limited role descriptors are now saved in Authentication#metadata as raw bytes instead of deserialised Map<String, Object>. Hashes of these bytes are used as keys for API key roles. Only when the required role is not found in the cache are the bytes deserialised to build the RoleDescriptors. The deserialisation goes directly from raw bytes to RoleDescriptors, without the current detour of "bytes -> Map -> bytes -> RoleDescriptors".	2020-07-13 22:58:11 +10:00
Dan Hermann	e01d73c737	[7.x] Data stream admin actions are now index-level actions	2020-07-10 14:36:18 -05:00
Dan Hermann	7fa9cf601b	Data stream support for rollup search	2020-07-10 11:13:34 -05:00
Alan Woodward	4b9cbfca64	Remove test backported in error	2020-07-09 21:45:41 +01:00
Alan Woodward	f4caadd239	MappedFieldType no longer requires equals/hashCode/clone (#59212 ) With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer have any cause to compare MappedFieldType instances. This means that we can remove all equals and hashCode implementations, and in addition we no longer need the clone implementations which were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes, which will be particularly useful for the runtime fields project.	2020-07-09 21:05:10 +01:00
Lisa Cawley	54483394ae	[DOCS] Clarify subscription requirements (#58958 ) (#59307 )	2020-07-09 12:24:45 -07:00
Dan Hermann	c7e977701a	Data stream support for async search	2020-07-09 13:12:04 -05:00
Dan Hermann	b9fb12924b	Data stream support for EQL search	2020-07-09 13:10:44 -05:00
Dimitris Athanasiou	b2243337d8	[7.x][ML] Data frame analytics max_num_threads setting (#59254 ) (#59308 ) This adds a setting to data frame analytics jobs called `max_num_threads`. The setting expects a positive integer. When set, it specifies the maximum number of threads that may be used by the analysis. Note that the actual number of threads used is limited by the number of processors on the node where the job is assigned. Also, the process may use a couple more threads for operational functionality that is not the analysis itself. This setting may also be updated for a stopped job. More threads may reduce the time it takes to complete the job at the cost of using more CPU. Backport of #59254 and #57274	2020-07-09 19:15:46 +03:00
Costin Leau	d9c1e531db	EQL: Introduce until functionality (#59292 ) Sequences now support an until condition, which prevents a match from occurring if the until condition matches a document while doing look-ups. Thus a sequence must complete before the until condition matches: if any document within the sequence occurs at, or after, the until hit, the sequence is discarded. (cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)	2020-07-09 17:12:01 +03:00
Dimitris Athanasiou	d07b11b86b	[7.x][ML] Perform test inference on java (#58877 ) (#59298 ) Since we are able to load the inference model and perform inference in Java, we no longer need to rely on the analytics process to perform test inference on the docs that were not used for training. The benefit is that we do not need to send test docs to the C++ process and fit them in its memory. Backport of #58877 Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co> Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-07-09 16:30:49 +03:00
David Kyle	86555ec163	Remove unused function InferenceIndexConstants.mapping() (#59146 ) (#59158 ) InferenceIndexConstants.mapping() is broken and unused.	2020-07-09 14:28:53 +01:00
Andrei Stefan	d187b531ed	EQL: Give a name to all toml tests and enforce the naming of new tests (#59283 ) (#59295 ) (cherry picked from commit c8ffe3c9237d3cdd90331795b8e37517155b7e91)	2020-07-09 16:20:29 +03:00
David Kyle	dbb9c802b1	Better error message when the model cannot be parsed due to its size (#59166 ) (#59209 ) The actual cause can be lost in a long list of parse exceptions; this surfaces the cause when the problem is size.	2020-07-09 13:43:46 +01:00
David Kyle	c5443f78ce	Add Inference Pipeline aggregation to HLRC (#59086 ) (#59250 ) Adds InferencePipelineAggregationBuilder to the HLRC duplicating the server side classes	2020-07-09 13:38:45 +01:00
Daniel Mitterdorfer	10ef4d2140	Mute testMaxRestoreBytesPerSecIsUsed (#59289 ) Relates #59287	2020-07-09 12:52:17 +02:00
Alan Woodward	67a27e2b9d	Add declarative parameters to FieldMappers (#58663 ) The FieldMapper infrastructure currently has a bunch of shared parameters, many of which are only applicable to a subset of the 41 mapper implementations we ship with. Merging, parsing and serialization of these parameters are spread around the class hierarchy, with much repetitive boilerplate code required. It would be much easier to reason about these things if we could declare the parameter set of each FieldMapper directly in the implementing class, and share the parsing, merging and serialization logic instead. This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it. Parameters are declared on Builder classes, with the declaration including the parameter name, whether or not it is updateable, a default value, how to parse it from mappings, and how to extract it from another mapper at merge time. Builders have a getParameters method, which returns a list of the declared parameters; this is then used for parsing, merging and serialization. Merging is achieved by constructing a new Builder from the existing Mapper, and merging in values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new, merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final. Other mappers can be gradually migrated to this new style, and once they have all been refactored we can merge ParametrizedFieldMapper and FieldMapper entirely.	2020-07-09 11:43:21 +01:00
Daniel Mitterdorfer	daa48329ec	[TEST] Mute FollowerFailOverIT.testFailOverOnFollower (#58659 ) (#59286 ) Relates #58534 Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>	2020-07-09 12:38:36 +02:00
Albert Zaharovits	2b7456db7f	Improve auditing of API key authentication #58928 1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields to the `access_granted`, `access_denied`, `authentication_success`, and (some) `tampered_request` audit events. The `apikey.id` and `apikey.name` are present only when authn using an API Key. 2. When authn with an API Key, the `user.realm` field now contains the effective realm name of the user that created the key, instead of the synthetic value of `_es_api_key`.	2020-07-09 13:26:18 +03:00
Dimitris Athanasiou	d323f8d698	[ML] Add REST spec for the update data frame analytics endpoint (#59253 ) (#59281 ) Closes #59148 Backport of #59253	2020-07-09 13:12:21 +03:00
Ignacio Vera	1ad00d1ceb	Add Support in geo_match enrichment policy for any type of geometry (#59276 ) geo_match enrichment works currently only with points. This change adds the ability to use any type of geometry.	2020-07-09 11:41:41 +02:00
Andrei Stefan	c0e0bca84c	Remove search_after and implicit_join_key_field (#59232 ) (#59280 ) (cherry picked from commit 6ede6c59eff321b9fedad30e19508b9e4f788b54)	2020-07-09 12:34:01 +03:00
Bogdan Pintea	acfff7b896	Add sample versions of standard deviation and variance funcs (#59093 ) (#59274 ) * Add sample versions of standard deviation and variance functions (#59093) * Add STDDEV_SAMP, VAR_SAMP This commit adds the sampling variations of the standard deviation and variance agg functions. (cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de) * Fix: workaround for lack of Map#of() in Java8 Replace Map#of() with a HashMap static init.	2020-07-09 10:17:13 +02:00
Ignacio Vera	14ab35e323	Fix numerical error in CentroidCalculatorTests#testPolygonAsPoint (#59012 ) (#59272 )	2020-07-09 08:42:07 +02:00
Lee Hinman	bb1c53a0f5	Allow warnings about 'global' template in upgrade tests (#59242 ) These tests sometimes install a template so they can be compatible with older versions, but they run afoul of the occasionally installed "global" template, which changes the default number of shards. This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they are not (since the global template is only randomly installed). Resolves #58807 Resolves #58258	2020-07-08 13:40:55 -06:00
Armin Braun	cc3c8be0f1	Fix SLMSnapshotBlockingIntegTests.testSnapshotInProgress (#59218 ) (#59239 ) Waiting for `INIT` here is dead code in newer versions, which no longer use `INIT`. In older versions, it leads to nothing being written to the repository if the snapshot is cancelled at the `INIT` step, which then breaks repo consistency checks. Since we have other tests ensuring that snapshot abort works properly, we can just remove the wait for `INIT` here and backport this down to 7.8 to fix tests. relates #59140	2020-07-08 19:13:01 +02:00
James Rodewig	838f717e5f	[DOCS] Add data streams to security docs (#59084 ) (#59237 )	2020-07-08 12:53:56 -04:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210 ) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation; instead, reuse the validation from `TimestampFieldMapper` and only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that causes timestamp field validation to be performed for each template instead of for the final mappings that are created. * Only apply _timestamp meta field if index is created as part of a data stream or data stream rollover; this fixes a docs test where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
David Turner	6ffdb19a2a	Clean searchable snapshots cache on startup (#59009 ) Today we empty the searchable snapshots cache when cleanly closing a shard, but leak cache files in some cases involving an unclean shutdown. Such leaks are not permanent, they are cleaned up on shard relocation or deletion, but they still might last for arbitrarily long until that happens. This commit introduces a cleanup process that runs during node startup to catch such leaks sooner. Also, today we permit searchable snapshots to be held on custom data paths, and store the corresponding cache files within the custom location. Supporting this feature would make the cleanup process significantly more complicated since it would require each node to parse the index metadata for the shards it held before shutdown. Yet, this feature is undocumented and offers minimal benefits to searchable snapshots. Therefore with this commit we forbid custom data paths for searchable snapshot shards.	2020-07-08 15:17:52 +01:00
Nik Everett	a29d3515a2	Improve cardinality measure used to build aggs (#56533 ) (#59107 ) This makes a `parentCardinality` available to every `Aggregator`'s ctor so it can make intelligent choices about how it collects bucket values. This replaces `collectsFromSingleBucket` and is similar to it but: 1. It supports `NONE`, `ONE`, and `MANY` values and is generally extensible if we decide we can use more precise counts. 2. It is more accurate. `collectsFromSingleBucket` assumed that all sub-aggregations live under multi-bucket aggregations. This is normally true, but `parentCardinality` is properly carried forward for single bucket aggregations like `filter` and for multi-bucket aggregations configured in single-bucket form, like `range` with a single range. While I was touching every aggregation I renamed `doCreateInternal` to `createMapped` because that seemed like a much better name and it was right there, next to the change I was already making. Relates to #56487 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-08 08:42:23 -04:00
Dan Hermann	90c8d3fc9d	IndexNameExpressionResolver::dataStreamNames should support exclusions	2020-07-08 07:35:52 -05:00
Armin Braun	9268b25789	Add Check for Metadata Existence in BlobStoreRepository (#59141 ) (#59216 ) In order to ensure that we do not write a broken piece of `RepositoryData` because the physical repository generation was moved ahead more than one step by erroneous concurrent writing to a repository, we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911	2020-07-08 14:25:01 +02:00
Costin Leau	3e32d060bf	EQL: Fix bug in skipping window (#59196 ) Corrected condition that caused a sequence window to be skipped when a query returns no results by checking not just the current stage but also following ones as they can match with in-flight sequences. Improve logging Fix NPE when emptying a SequenceGroup Increase randomization in testing Make maxspan inclusive (up to and equal to value vs just up to) (cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)	2020-07-08 14:36:39 +03:00
Yannick Welsch	0b9eb210b8	Add basic searchable snapshots usage information (#58828 ) (#59160 ) Adds super basic usage information for searchable snapshots, to be extended later. Backport of #58828	2020-07-08 13:09:29 +02:00
Yang Wang	a6109063a2	Even more robust test for API key auth 429 response (#59159 ) (#59208 ) Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.	2020-07-08 16:43:07 +10:00
Nhat Nguyen	ef5c397c0f	Sending operations concurrently in peer recovery (#58018 ) Today, we send operations in phase2 of peer recoveries batch by batch sequentially. Normally that's okay as we should have a fairly small number of operations in phase 2 due to the file-based threshold. However, if phase1 takes a lot of time and we are actively indexing, then phase2 can have a lot of operations to replay. With this change, we will send multiple batches concurrently (defaults to 1) to reduce the recovery time. Backport of #58018	2020-07-07 22:03:31 -04:00
Albert Zaharovits	d4a0f80c32	Ensure authz role for API key is named after owner role (#59041 ) The composite role that is used for authz, following the authn with an API key, is an intersection of the privileges from the owner role and the key privileges defined when the key has been created. This change ensures that the `#names` property of such a role equals the `#names` property of the key owner role, thereby rectifying the value for the `user.roles` audit event field.	2020-07-07 23:26:57 +03:00
Benjamin Trent	e343e066fc	[7.x] [ML] prefer secondary auth headers on evaluate (#59167 ) (#59183 ) * [ML] prefer secondary auth headers on evaluate (#59167) We should prefer the secondary auth headers when evaluating a data frame	2020-07-07 15:34:47 -04:00
Andrei Dan	24c6a30e2b	[7.9] GET data stream API returns additional information (#59128 ) (#59177 ) * GET data stream API returns additional information (#59128) This adds the data stream's index template, the configured ILM policy (if any) and the health status of the data stream to the GET _data_stream response. Restoring a data stream from a snapshot could install a data stream that doesn't match any composable templates. This also makes the `template` field in the `GET _data_stream` response optional. (cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-07 20:30:09 +01:00
Nik Everett	93ff5bf9c8	Remove blocking from inference pipeline builder (#59096 ) (#59162 ) This removes the blocking model lookup from the `inference` aggregator's builder by integrating it into the request rewrite process that loads stuff asynchronously. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-07 12:31:17 -04:00
Armin Braun	6dec2cf722	Fix SLM Tests Leaking Snapshot Operation (#59150 ) (#59155 ) Fixed an issue #59082 introduced. We have to wait for no more operations in all tests here not just the one we were waiting in already so that the cleanup operation from the parent class can run without failure.	2020-07-07 17:19:06 +02:00
Rene Groeschke	a896df53ac	Remove misc dependency related deprecation warnings (7.x backport) (#59122 ) * Fix dependency related deprecations (#58892) * Fix classpath setup for forbiddenapi usage	2020-07-07 17:10:31 +02:00
Christoph Büscher	7c64a1bd7b	Muting failing ApiKeyIntegTests	2020-07-07 16:02:59 +02:00
Yang Wang	f84b76661d	Make test more robust for API key auth 429 (#59077 ) (#59136 ) Adds error handling when filling up the queue of the crypto thread pool. Also reduces the queue size of the crypto thread pool to 10 so that the queue can be cleared out in time. Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failures on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, there may be lingering requests not fully cleared out from the queue. When that happens, we will not be able to push all 1000 tasks into the queue. But since all we need is queue saturation, as long as we can be sure that the queue is fully filled, it is safe to ignore rejection errors and just move on. 1000 tasks also take some time to clear out, which could cause the test suite to time out. This PR changes the queue size to 10 so the tests have a better chance to complete in time.	2020-07-07 22:27:10 +10:00
Rene Groeschke	e8181fc627	Fix implicit duplicate duplicatesStrategy in processResources (#58929 ) (#59127 ) * Fix implicit duplicate duplicatesStrategy in processResources * Fix duplicates strategy in docker distribution setup	2020-07-07 13:45:36 +02:00
Ignacio Vera	5cc6457ed8	upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091 ) (#59120 )	2020-07-07 12:07:41 +02:00
Armin Braun	d6d6df16bb	Share IT Infrastructure between Core Snapshot and SLM ITs (#59082 ) (#59119 ) For #58994 it would be useful to be able to share test infrastructure. This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests accordingly and adds a shared and efficient (compared to the previous implementations) way of waiting for no running snapshot operations to the test infrastructure to dry things up further.	2020-07-07 12:04:41 +02:00
David Roberts	e217f9a1e8	[ML] Wait for shards to initialize after creating ML internal indices (#59087 ) There have been a few test failures that are likely caused by tests performing actions that use ML indices immediately after the actions that create those ML indices. Currently this can result in attempts to search the newly created index before its shards have initialized. This change makes the method that creates the internal ML indices that have been affected by this problem (state and stats) wait for the shards to be initialized before returning. Backport of #59027	2020-07-07 10:52:10 +01:00
Francisco Fernández Castaño	0752a86fe5	Enforce higher priority for RepositoriesService ClusterStateApplier (#59040 ) * Enforce higher priority for RepositoriesService ClusterStateApplier This avoids shards allocation failures when the repository instance comes in the same ClusterState update as the shard allocation. Backport of #58808	2020-07-07 09:51:08 +02:00
Jake Landis	604c6dd528	7.x - Create plugin for yamlTest task (#56841 ) (#59090 ) This commit creates a new Gradle plugin to provide a separate task name and source set for running YAML based REST tests. The only project converted to use the new plugin in this PR is distribution/archives/integ-test-zip, whose testing has been moved to :rest-api-spec since it makes the most sense and it avoids a small but awkward change to the distribution plugin. The remaining cases in modules, plugins, and x-pack will be handled in followups. This plugin is distinctly different from the plugin introduced in #55896 since the YAML REST tests are intended to be black box tests over HTTP. As such, they should not (by default) have access to the classpath of what they are testing. The YAML based REST tests will be moved to separate source sets (yamlRestTest). Which source set is the target for the test resources depends on whether this new plugin is applied. If it is not applied, it will default to the test source set. Further, this introduces a breaking change for plugin developers that use the YAML testing framework. They will now need to either use the new source set and matching task, or configure the rest resources to use the old "test" source set that matches the old integTest task. (The former should be preferred.) As part of this change (which is also breaking for plugin developers) the rest resources plugin has been removed from the build plugin and now requires either explicit application or application via the new YAML REST test plugin. Plugin developers should be able to fix the breaking changes to the YAML tests by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests under a yamlRestTest folder (instead of test)	2020-07-06 14:16:26 -05:00
Costin Leau	f9c15d0fec	EQL: Introduce sequencing fetch size (#59063 ) The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory. This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired. As such, testing now uses a minimum fetch size, which exposed a number of bugs that have since been addressed: jumping across data across queries, causing valid data to be seen as a gap; and incorrectly resuming searching across pages (again causing data to be discarded). (cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)	2020-07-06 19:14:26 +03:00
Costin Leau	b2e9c6f640	Update UnresolvedRelationTests UnresolvedRelation does not care about its source during equality hence ignore it when doing randomized mutations. Relates #59014 (cherry picked from commit b21222e714fbf85aad0916e4d4b6a933d2b6958a)	2020-07-06 19:14:25 +03:00
Costin Leau	fe775a315f	EQL: Obey size request parameter (#59014 ) While at it, change the default size to 10 (to align it with the search API defaults). (cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)	2020-07-06 19:14:25 +03:00
Yang Wang	2a1635ad69	Create API key with TransportBulkAction directly (#59046 ) (#59060 ) Use TransportBulkAction directly to create API keys instead of going through the proxy from IndexAction to BulkAction.	2020-07-06 23:32:07 +10:00
Yang Wang	66c0231895	Improve threadpool usage and error handling for API key validation (#58090 ) (#59047 ) The PR introduces the following two changes: Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and a queue size of 1000. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it would be a rushed optimization since I am not clear about the potential impact on the token service. On threadpool saturation, it now fails with EsRejectedExecutionException, which in turn gives back a 429 instead of a 401 status code to users.	2020-07-06 21:21:07 +10:00
Przemysław Witek	4a791e835b	Simplify parser declarations when specialist types are stored in strings (#58996 ) (#59056 )	2020-07-06 13:05:03 +02:00
Przemysław Witek	f35ad0d4e1	Report peak model memory in ModelSizeStats (#59017 ) (#59055 )	2020-07-06 12:55:12 +02:00
David Kyle	c651135562	[ML] Make Inference processor field_map and inference_config optional (#59010 ) Relaxes the requirement that the inference ingest processor must have a field_map and inference_config defined, even if they are empty.	2020-07-06 11:35:30 +01:00
David Kyle	0fc12194bf	[ML] Increase timeout in MlDistributedFailureIT (#58997 ) (#59013 ) Doubles the timeout on the ensureStableClusterOnAllNodes method to 60s to account for very slow CI	2020-07-06 11:30:41 +01:00
Martijn van Groningen	f0dd9b4ace	Add data stream timestamp validation via metadata field mapper (#59002 ) Backport of #58582 to 7.x branch. This commit adds a new metadata field mapper that validates that a document has exactly a single timestamp value in the data stream timestamp field and that the timestamp field mapping only has `type`, `meta` or `format` attributes configured. Other attributes can affect the guarantee that an index with this meta field mapper has a usable timestamp field. The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever a new backing index of a data stream is created. Relates to #53100	2020-07-06 11:32:33 +02:00
Armin Braun	49857cc35d	Dry up Master Disconnect Disruption Tests (#58953 ) (#59050 ) Dry up tests that use a disruption that isolates the master from all other nodes. Also, turn disruption types that have neither parameters nor state into constants to make things a little clearer.	2020-07-06 11:04:24 +02:00
Yang Wang	a9151db735	Map only specific type of OIDC Claims (#58524 ) (#59043 ) This commit changes our behavior in 2 ways: - When mapping claims to user properties (principal, email, groups, name), we only handle string and array of string type. Previously we would fail to recognize an array of other types and that would cause failures when trying to cast to String. - When adding unmapped claims to the user metadata, we only handle string, number, boolean and arrays of these. Previously, we would fail to recognize an array of other types and that would cause failures when attempting to process role mappings. For user properties that are inherently single valued, like principal (username), we continue to support arrays of strings, where we select the first one, in case this is being depended on by users, but we plan on removing this leniency in the next major release. Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>	2020-07-06 11:36:41 +10:00
Tanguy Leroux	49f4227837	Check acknowledged responses in FsSearchableSnapshotsIT (#59021 ) Despite all my attempts I did not manage to reproduce issues like the ones described in #58961. My guess is that the _mount request got retried at some point but I wasn't able to validate this assumption. Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small random chunk size and a large number of documents are picked in the tests. The parent class also does not verify the acknowledged status of some requests. This commit lowers the chunk size and number of docs in tests (this is extensively tested in unit tests) and also adds assertions on acknowledged responses. Relates #58961	2020-07-05 10:50:31 +02:00
Armin Braun	071d8b2c1c	Deduplicate Empty InternalAggregations (#58386 ) (#59032 ) Working through a heap dump for an unrelated issue I found that we can easily rack up tens of MBs of duplicate empty instances in some cases. I moved to a static constructor to guard against that in all cases.	2020-07-04 14:02:16 +02:00
Bogdan Pintea	e88d71b187	[7.x] SQL: Redact credentials in connection exceptions (#58650 ) (#59025 ) * SQL: Redact credentials in connection exceptions (#58650) This commit adds the functionality to redact the credentials from the exceptions generated when a connection attempt fails, preventing them from leaking into logs, console history etc. There are a few causes that can lead to failed connections. The most challenging to deal with is a malformed connection string. The redaction tries to get around it by modifying the URI to a parsable state, so that the redaction can be applied reliably. If there's no reliability guarantee, the redaction will bluntly replace the entire connection string and the user is informed about the option to modify it so that the redaction won't apply. (This is done by using a capitalized scheme, which is legal, but otherwise never used in practice.) The commit fixes a couple of other issues with the URI parser: - it allows an empty hostname, or even an entire empty connection string (as per the existing documentation); - it reduces the editing of the connection string in the exception messages (so that users can more easily recognize their input); - it uses the default URI as source for the scheme and hostname. (cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd) * Implement String#repeat(), unavailable in Java8 Implement a client.StringUtils#repeatString() as a replacement for String#repeat(), unavailable in Java8.	2020-07-04 11:29:06 +02:00
Benjamin Trent	b9d9964d10	[ML] add exponent output aggregator to inference (#58933 ) (#59016 ) * [ML] add exponent output aggregator to inference * fixing docs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-03 14:51:00 -04:00
Lisa Cawley	5c19464a2f	[DOCS] Clarifies number of file and native realms (#58949 )	2020-07-03 11:00:28 -07:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Bogdan Pintea	3d96d91efb	[7.x] SQL: fix handling of escaped chars in JDBC connection string (#58429 ) (#58977 ) SQL: fix handling of escaped chars in JDBC connection string (#58429) This commit fixes an issue emerging when the connection string URI contains escaped characters. The original URI is pre-parsed in order to re-assemble a new URI having the optional elements filled in with defaults. The new URI, however, has been using the unescaped query and fragment parts. So if these contained any escaped `&` or `=` (such as in the password option value), the unescaping would reveal them and make them later interfere with the options parsing. The commit changes that, so that the new URI is built from the "raw" (still escaped) parts of the original URI. (cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)	2020-07-03 17:03:00 +02:00
Luca Cavanna	e3fc1638d8	Improve error handling in async search code (#57925 ) - The exception that we caught when failing to schedule a thread was incorrect. - We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager - when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed. Closes #58995	2020-07-03 16:07:26 +02:00
Hendrik Muhs	ca3da7af85	[ML] handle broken setup with state alias being an index (#58999 ) .ml-state-write is supposed to be an index alias; however, by accident it can become an index. If .ml-state-write is a concrete index instead of an alias, ML stops working. This change improves error handling by setting the job to failed and properly logging and auditing the problem. The user still has to fix the problem manually, but this change should lead to a quicker resolution. fixes #58482	2020-07-03 15:26:59 +02:00
Ignacio Vera	2c2486d3d4	Fix GeoHash grid aggregation circuit breaker tests (#58218 ) (#59001 )	2020-07-03 13:46:35 +02:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991 )	2020-07-03 06:33:38 -05:00
Luca Cavanna	4f86f6fb38	Submit async search to not require read privilege (#58942 ) When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data. The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.	2020-07-03 12:18:07 +02:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Tim Vernum	1133c29ce9	Treat roles as a SortedSet (#58988 ) The Saml SP document stored the role mapping in a Set, but this made the order in XContent inconsistent. This switched it to use a TreeSet. Resolves: #54733 Backport of: #55201	2020-07-03 13:40:58 +10:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Benjamin Trent	bd9b3b6116	[ML] fix inference ml-stats-write alias creation (#58947 ) (#58959 ) The check for potentially creating the .ml-stats-write alias should verify that the indices actually exist. closes #58662	2020-07-02 16:16:42 -04:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957 ) Currently we do not track the memory consumed by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
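A minimal sketch of one way in-flight write bytes could be tracked: add on start, subtract once on release. `WriteBytesTracker` and `ReleasableHandle` are hypothetical names, and the caller is assumed to mark each operation exactly once (the double-accounting concern addressed in #58984 above).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Like AutoCloseable, but close() throws no checked exception. */
interface ReleasableHandle extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical tracker for the bytes held by queued and in-flight write operations. */
final class WriteBytesTracker {
    private final AtomicLong writeBytes = new AtomicLong();

    /** Accounts for an operation's bytes; closing the returned handle releases them once. */
    ReleasableHandle markWriteStarted(long bytes) {
        writeBytes.addAndGet(bytes);
        AtomicBoolean released = new AtomicBoolean();
        return () -> {
            // compareAndSet makes a second close() a no-op instead of subtracting twice.
            if (released.compareAndSet(false, true)) {
                writeBytes.addAndGet(-bytes);
            }
        };
    }

    long currentWriteBytes() {
        return writeBytes.get();
    }
}
```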
Tal Levy	d516959774	Re-enable support for array-valued geo_shape fields. (#58786 ) (#58943 ) A regression in the mapping code led to geo_shape no longer supporting array-valued fields. This commit fixes this support and adds an integration test to make sure this problem does not return!	2020-07-02 11:21:55 -07:00
David Roberts	2c04685b81	[ML] Ensure config index mappings are up-to-date before updating configs (#58938 ) We already had code to ensure the config index mappings were up-to-date before creating a new config. However, it's also possible that an update to a config could add the latest settings that require the latest mappings to index correctly. This change checks that the latest config index mappings are in place in the 3 update actions in the same way as the checks are done in the 3 put actions. Backport of #58916	2020-07-02 18:55:19 +01:00
Robin Clarke	567720d970	[DOCS] Added caveat about the number of file realms (#58369 )	2020-07-02 10:27:36 -07:00
Dan Hermann	c988afdc15	Data stream support for migrations deprecations info API	2020-07-02 11:16:22 -05:00
Przemysław Witek	751e84e4c8	Rename regression evaluation metrics to make the names consistent with loss functions (#58887 ) (#58927 )	2020-07-02 17:35:55 +02:00
Tanguy Leroux	6aa669c8bb	Fix SearchableSnapshotDirectoryStatsTests (#58912 ) Similar to #58847 but in a different test. The failure never reproduced locally but occurs from time to time on CI.	2020-07-02 16:39:26 +02:00
Dan Hermann	b78bfa01f6	[7.x] Data stream support for graph explore API	2020-07-02 08:19:03 -05:00
David Kyle	d6643bfc7f	Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902 )" The test was fixed in #58847 This reverts commit `bb96c910a5`.	2020-07-02 13:21:05 +01:00
David Kyle	bb96c910a5	Mute FsSearchableSnapshotsIT testClearCache (#58902 ) For #58901	2020-07-02 12:58:28 +01:00
Costin Leau	965f77fa44	EQL: Introduce sequence internal paging (#58859 ) Refactor sequence matching classes in order to decouple querying from results consumption (and matching). Rename some classes to better convey their intent. Introduce internal pagination of the sequence algorithm, that is, getting the data in slices and, if needed, moving forward in order to find more matches until either the dataset is consumed or the desired number of results is found. (cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)	2020-07-02 13:44:21 +03:00
Przemysław Witek	8e074c4495	Rename "error" field to "value" for consistency between metrics (#58726 ) (#58870 )	2020-07-02 09:08:56 +02:00
Yang Wang	a5a8b4ae1d	Add cache for application privileges (#55836 ) (#58798 ) Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors. Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).	2020-07-02 11:50:03 +10:00
Benjamin Trent	c64e283dbf	[7.x] [ML] handles compressed model stream from native process (#58009 ) (#58836 ) * [ML] handles compressed model stream from native process (#58009) This moves model storage from handling the fully parsed JSON string to handling two separate types of documents. 1. ModelSizeInfo which contains model size information 2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string. `model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition. Native side change: https://github.com/elastic/ml-cpp/pull/1349	2020-07-01 15:14:31 -04:00
Mark Vieira	1fcaec7dfc	Ignore test seed used in test system properties (#58789 )	2020-07-01 11:52:22 -07:00
James Rodewig	a966513eae	[DOCS] Remove problematic terms (#58832 ) (#58851 )	2020-07-01 13:47:14 -04:00
Nhat Nguyen	f63cbad629	Ensure CCR partial reads never overuse buffer (#58620 ) When the documents are large, a follower can receive a partial response because the requested range of operations is capped by max_read_request_size instead of max_read_request_operation_count. In this case, the follower will continue reading the subsequent ranges without checking the remaining size of the buffer. The buffer can then use more memory than max_write_buffer_size and even cause an OOM. Backport of #58620	2020-07-01 13:23:28 -04:00
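The essence of the fix is that the buffer check has to run before every read, including the follow-up read that continues a partial response. A minimal sketch of that bookkeeping; `FollowerBufferSketch` and its methods are hypothetical, not the CCR shard follow task's actual code.

```java
/** Hypothetical follower-side bookkeeping for the CCR write buffer. */
final class FollowerBufferSketch {
    private final long maxWriteBufferSizeBytes;
    private long bufferedBytes;

    FollowerBufferSketch(long maxWriteBufferSizeBytes) {
        this.maxWriteBufferSizeBytes = maxWriteBufferSizeBytes;
    }

    /** A (possibly partial) read response was added to the write buffer. */
    void onRead(long bytes) {
        bufferedBytes += bytes;
    }

    /** Buffered operations were indexed on the follower and released from the buffer. */
    void onWritten(long bytes) {
        bufferedBytes -= bytes;
    }

    /**
     * Consulted before every read, including the one that continues a partial
     * response; skipping it there lets the buffer grow past max_write_buffer_size.
     */
    boolean canRead() {
        return bufferedBytes < maxWriteBufferSizeBytes;
    }
}
```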
Tanguy Leroux	ec4843f4df	Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847 ) Since #58728 part of searchable snapshot shard files are written in cache in an asynchronous manner in a dedicated thread pool. It means that even if a search query is successful and returns, there are still more bytes to write in the cached files on disk. On CI this can be slow; if we want to check that the cached_bytes_written has changed we need to check multiple times to give some time for the cached data to be effectively written.	2020-07-01 18:01:00 +02:00
Benjamin Trent	c768467155	Muting flakey test (#58855 ) (#58856 )	2020-07-01 11:54:43 -04:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
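The split can be pictured as two methods over the same allowed set, where only one of them records a usage timestamp. A minimal sketch of the *eventual* behavior the message describes; `LicenseStateSketch`, the `Feature` values and `lastUsed` are hypothetical, not the real license state class.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical license state separating "use the feature" from "report whether it is allowed". */
final class LicenseStateSketch {
    enum Feature { SECURITY, MACHINE_LEARNING, CCR }

    private final Set<Feature> allowedFeatures;
    private final LongSupplier clockMillis;
    private final Map<Feature, Long> lastUsedMillis = new ConcurrentHashMap<>();

    LicenseStateSketch(Set<Feature> allowedFeatures, LongSupplier clockMillis) {
        this.allowedFeatures = new HashSet<>(allowedFeatures);
        this.clockMillis = clockMillis;
    }

    /** For telemetry: answers the question without recording a usage. */
    boolean isAllowed(Feature feature) {
        return allowedFeatures.contains(feature);
    }

    /** For code about to use the feature: checks it and records when it was last used. */
    boolean checkFeature(Feature feature) {
        boolean allowed = isAllowed(feature);
        if (allowed) {
            lastUsedMillis.put(feature, clockMillis.getAsLong());
        }
        return allowed;
    }

    /** Last recorded usage, or null if checkFeature never succeeded for this feature. */
    Long lastUsed(Feature feature) {
        return lastUsedMillis.get(feature);
    }
}
```

Two distinctly named methods, rather than a boolean flag, keep the intent visible at every call site, which is the argument the commit message makes.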
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Tanguy Leroux	d35e8f45da	Allow read operations to be executed without waiting for full range to be written in cache (#58728 ) (#58829 ) This commit changes CacheFile and CachedBlobContainerIndexInput so that the read operations made by these classes are now progressively executed and do not wait for full range to be written in cache. It relies on the change introduced in #58477 and it is the last change extracted from #58164. Relates #58164	2020-07-01 15:38:17 +02:00
Przemysław Witek	909649dd15	[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 ) (#58825 )	2020-07-01 14:52:06 +02:00
Andrei Stefan	b904a60275	EQL: Add case handling to stringContains (#58762 ) (#58813 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit 1a58776d3aa563beb364b067a1db46497122306f)	2020-07-01 13:51:45 +03:00
Andrei Stefan	470bcee5bf	EQL: Integrate TOML tests for function folding (#58748 ) (#58812 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit e9b1fa58cf8d510a4b4afb14f66b0d5f9c603ebb)	2020-07-01 13:50:54 +03:00
Przemysław Witek	2638809cba	Mute failing test DataFrameAnalyticsConfigProviderIT.testUpdate_UpdateCannotBeAppliedWhenTaskIsRunning (#58821 )	2020-07-01 12:28:23 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb` (which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
David Kyle	27d52d4d23	Remove the Model interface (#58754 ) (#58803 ) The Model interface was implemented by just one class and did not contribute to making the code more understandable	2020-07-01 09:57:02 +01:00
Dario Gieselaar	417f7062c5	[7.x] Add read privileges for annotations for apm_user (#58530 ) (#58781 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-01 09:04:57 +02:00
Yang Wang	3d49e62960	Support handling LogoutResponse from SAML idP (#56316 ) (#58792 ) SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective. The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error. This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.	2020-07-01 16:47:27 +10:00
Tim Vernum	9e49af03b7	Reenable test after backport (#58717 ) This commit re-enables CCR rolling upgrade tests following the backport of #58217 to 7.8 branch (7.8.1)	2020-07-01 11:50:30 +10:00
Lee Hinman	74a78b3a7b	Mute AzureSearchableSnapshotsIT (#58775 ) Relates to #58260	2020-06-30 13:30:51 -06:00
Dan Hermann	22806c943d	Data stream support for ILM remove policy API (#58595 ) (#58770 )	2020-06-30 14:03:19 -05:00
Benjamin Trent	a2331bc9d4	[Transform] fix bug in supporting boolean values in pivot (#58741 ) (#58760 ) Since the underlying composite aggs support boolean mapped values for terms, transforms should also support them closes #58697	2020-06-30 13:47:58 -04:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
Julie Tibshirani	ab65a57d70	Merge mappings for composable index templates (#58709 ) This PR implements recursive mapping merging for composable index templates. When creating an index, we perform the following: * Add each component template mapping in order, merging each one in after the last. * Merge in the index template mappings (if present). * Merge in the mappings on the index request itself (if present). Some principles: * All 'structural' changes are disallowed (but everything else is fine). An object mapper can never be changed between `type: object` and `type: nested`. A field mapper can never be changed to an object mapper, and vice versa. * Generally, each section is merged recursively. This includes `object` mappings, as well as root options like `dynamic_templates` and `meta`. Once we reach 'leaf components' like field definitions, they always overwrite an existing one instead of being merged. Relates to #53101.	2020-06-30 08:01:37 -07:00
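The merge rules above (objects merge recursively, leaf fields overwrite, structural changes are rejected) can be sketched over plain nested maps. A minimal sketch under stated assumptions: mappings are `Map<String, Object>` trees, an entry with no `type` counts as an object mapper, and `MappingMergeSketch`, `mergeAll` and `map` are hypothetical helpers, not the real `MapperService` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of recursive mapping merging over nested maps. */
final class MappingMergeSketch {
    private MappingMergeSketch() {}

    /** Merges in order: component templates, then index template, then request mappings. */
    @SafeVarargs
    static Map<String, Object> mergeAll(Map<String, Object>... mappings) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> mapping : mappings) {
            result = mergeObject(result, mapping);
        }
        return result;
    }

    static Map<String, Object> mergeObject(Map<String, Object> base, Map<String, Object> update) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object existing = merged.get(e.getKey());
            Object incoming = e.getValue();
            if ("properties".equals(e.getKey()) && existing instanceof Map && incoming instanceof Map) {
                Map<String, Object> props = new LinkedHashMap<>(asMap(existing));
                for (Map.Entry<String, Object> f : asMap(incoming).entrySet()) {
                    props.put(f.getKey(), mergeField(f.getKey(), props.get(f.getKey()), asMap(f.getValue())));
                }
                merged.put(e.getKey(), props);
            } else if (existing instanceof Map && incoming instanceof Map) {
                // Other sections such as _meta are merged recursively too.
                merged.put(e.getKey(), mergeObject(asMap(existing), asMap(incoming)));
            } else {
                merged.put(e.getKey(), incoming);
            }
        }
        return merged;
    }

    private static Object mergeField(String name, Object existing, Map<String, Object> incoming) {
        if (existing == null) {
            return incoming;
        }
        Map<String, Object> current = asMap(existing);
        String before = kind(current);
        String after = kind(incoming);
        if (!before.equals(after)) {
            throw new IllegalArgumentException("cannot change [" + name + "] from " + before + " to " + after);
        }
        // Leaf fields are replaced wholesale; object and nested mappers merge recursively.
        return "field".equals(after) ? incoming : mergeObject(current, incoming);
    }

    private static String kind(Map<String, Object> definition) {
        Object type = definition.get("type");
        if ("nested".equals(type)) {
            return "nested";
        }
        return type == null || "object".equals(type) ? "object" : "field";
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object o) {
        return (Map<String, Object>) o;
    }

    /** Convenience for building nested mappings: map("k1", v1, "k2", v2, ...). */
    static Map<String, Object> map(Object... keysAndValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            m.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return m;
    }
}
```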
David Roberts	d9e0e0bf95	[ML] Pass through the stop-on-warn setting for categorization jobs (#58738 ) When per_partition_categorization.stop_on_warn is set for an analysis config it is now passed through to the autodetect C++ process. Also adds some end-to-end tests that exercise the functionality added in elastic/ml-cpp#1356 Backport of #58632	2020-06-30 15:17:04 +01:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have built the jar file - the jar file is now an explicit input of the task and gradle will ensure it is properly built * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Przemysław Witek	9ea9b7bd3b	[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 ) (#58731 )	2020-06-30 14:09:11 +02:00
Benjamin Trent	def5550df3	[ML] fix ml inference stats tests (#58690 ) (#58729 )	2020-06-30 07:53:33 -04:00
Przemyslaw Gomulka	3923a10165	Exclude SystemV timezones from randomZone method (#58549 ) (#58655 ) RandomZone test method returns a ZoneId from the set of ids supported by java. The only difference between joda and java supported timezones are SystemV* timezones. These should be excluded from randomZone method as they would break testing. They also do not bring much confidence when used in testing as I suspect they are rarely used. That exclude should be removed for simplification once joda support is removed.	2020-06-30 12:45:53 +02:00
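The exclusion amounts to filtering `ZoneId.getAvailableZoneIds()` by prefix. A minimal sketch; `RandomZoneSketch` is a hypothetical name, not the test framework's actual `randomZone` helper. On JDKs whose bundled tz data no longer ships SystemV ids, the filter is simply a no-op.

```java
import java.time.ZoneId;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

/** Hypothetical sketch of a test helper that never returns SystemV* zones. */
final class RandomZoneSketch {
    // getAvailableZoneIds() returns an unordered Set, so sort it to keep a seeded Random reproducible.
    private static final List<String> ZONE_IDS = ZoneId.getAvailableZoneIds().stream()
        .filter(id -> !id.startsWith("SystemV")) // the only ids Joda lacks, per the commit message
        .sorted()
        .collect(Collectors.toList());

    private RandomZoneSketch() {}

    static ZoneId randomZone(Random random) {
        return ZoneId.of(ZONE_IDS.get(random.nextInt(ZONE_IDS.size())));
    }
}
```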
Andrei Stefan	7b80ea7218	Fix release tests (#58713 ) (#58725 ) (cherry picked from commit 7816c100612168bf46595c4813fe374bca2e7259)	2020-06-30 13:42:32 +03:00
Tanguy Leroux	4e03633a66	Differentiate base paths for searchable snapshots QA tests (#58664 ) (#58714 ) This commit adds the BuildParams.testSeed to the repository base paths used in searchable snapshots QA tests. For S3 and GCS the test seed is added for coherency sake with other integration tests while it's required for Azure as Azure 3rd party tests are executed on CI simultaneously for regular and SAS token accounts. Closes #58260	2020-06-30 10:18:33 +02:00
Tim Vernum	dcc5a06dec	Display enterprise license as platinum in /_xpack (#58217 ) The GET /_license endpoint displays "enterprise" licenses as "platinum" by default so that old clients (including beats, kibana and logstash) know to interpret this new license type as if it were a platinum license. However, this compatibility layer was not applied to the GET /_xpack/ endpoint which also displays a license type & mode. This commit causes the _xpack API to mimic the _license API and treat enterprise as platinum by default, with a new accept_enterprise parameter that will cause the API to return the correct "enterprise" value. This BWC layer exists only for the 7.x branch. This is a breaking change because, since 7.6, the _xpack API has returned "enterprise" for enterprise licenses, but this has been found to break old versions of beats and logstash so needs to be corrected.	2020-06-30 16:42:28 +10:00
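The compatibility rule described here fits in one small function. A minimal sketch; `XPackInfoLicenseSketch.reportedType` is hypothetical, and only the `enterprise` to `platinum` mapping and the `accept_enterprise` opt-in come from the commit message.

```java
/** Hypothetical sketch of the license type reported by GET /_xpack. */
final class XPackInfoLicenseSketch {
    private XPackInfoLicenseSketch() {}

    /**
     * Old clients do not know "enterprise", so it is reported as "platinum"
     * unless the caller opts in with accept_enterprise=true.
     */
    static String reportedType(String actualType, boolean acceptEnterprise) {
        if ("enterprise".equals(actualType) && !acceptEnterprise) {
            return "platinum";
        }
        return actualType;
    }
}
```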
Costin Leau	3a546f1f51	EQL: Introduce support for sequence maxspan (#58635 ) EQL sequences can specify now a maximum time allowed for their span (computed between the first and the last matching event). (cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)	2020-06-29 21:31:00 +03:00
Igor Motov	773f3574a9	Removes debug logging from RestEqlCancellationIT (#58676 ) The test didn't fail since the fix in #58493. So, it's time to remove debug logging and close the issue. Closes #58270	2020-06-29 13:15:01 -04:00
Andrei Stefan	3cb8f54f28	EQL: case sensitivity aware integration testing (#58624 ) (#58672 ) * EQL: case sensitivity aware integration testing (#58624) * Add DataLoader * Rewrite case sensitivity settings: NULL -> run both case sensitive and insensitive tests TRUE -> run case sensitive test only FALSE -> run case insensitive test only * Rename test_queries_supported * Add more toml tests from the Python client Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)	2020-06-29 18:40:07 +03:00
Tanguy Leroux	73adcf4d44	SparseFileTracker.Gap should keep a reference to the corresponding Range (#58587 ) (#58665 ) SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill, it does not need to resolve the range each time onSuccess/onProgress/onFailure are called. Relates #58477	2020-06-29 15:24:19 +02:00
Przemysław Witek	3f7c45472e	[7.x] Introduce DataFrameAnalyticsConfig update API (#58302 ) (#58648 )	2020-06-29 10:56:11 +02:00
Yang Wang	61fa7f4d22	Change privilege of enrich stats API to monitor (#52027 ) (#52196 ) The remote_monitoring_user user needs to access the enrich stats API. But the request is denied because the API is categorized under admin. The correct privilege should be monitor.	2020-06-29 10:25:33 +10:00
Dimitris Athanasiou	1817b896c9	[7.x][ML] Add status and increased estimate to memory usage (#58588 ) (#58606 ) Adds parsing of `status` and `memory_reestimate_bytes` to data frame analytics `memory_usage`. When the training surpasses the model memory limit, the status will be set to `hard_limit` and `memory_reestimate_bytes` can be used to update the job's limit in order to restart the job. Backport of #58588	2020-06-28 16:27:26 +03:00
Costin Leau	3c81b91474	EQL: Add Head/Tail pipe support (#58536 ) Introduce pipe support, in particular head and tail (which can also be chained). (cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea) (cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)	2020-06-27 09:49:14 +03:00
Benjamin Trent	7a202b149e	Muting analytics tests (#58617 ) (#58618 )	2020-06-26 16:50:59 -04:00
Tanguy Leroux	775fb5d4cf	Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477 ) (#58584 ) Today SparseFileTracker allows waiting for a range to become available before executing a given listener. In the case of searchable snapshots, we'd like to be able to wait for a large range to be filled (i.e., downloaded and written to disk) while being able to execute the listener as soon as a smaller range is available. This pull request is an extract from #58164 which introduces a ProgressListenableActionFuture that is used internally by SparseFileTracker. The progressive listenable future allows registering listeners attached to SparseFileTracker.Gap so that they are executed once the Gap is completed (with success or failure) or as soon as the Gap progress reaches a given progress value. This progress value is defined when the tracker.waitForRange() method is called; this method has been modified to accept a range and another listener's range to operate on. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-26 18:26:20 +02:00
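The idea of a future whose listeners fire on partial progress can be sketched as a list of (threshold, listener) pairs checked on every progress update. This is a minimal sketch only: `ProgressFutureSketch` is hypothetical, omits the failure path, and is not the actual `ProgressListenableActionFuture`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical sketch of a future whose listeners can fire on partial progress. */
final class ProgressFutureSketch {
    private static final class Waiter {
        final long threshold;
        final LongConsumer listener;

        Waiter(long threshold, LongConsumer listener) {
            this.threshold = threshold;
            this.listener = listener;
        }
    }

    private final List<Waiter> waiters = new ArrayList<>();
    private long progress;

    ProgressFutureSketch(long start) {
        this.progress = start;
    }

    /** Runs the listener once progress reaches threshold (immediately if it already has). */
    void addListener(long threshold, LongConsumer listener) {
        long current;
        synchronized (this) {
            current = progress;
            if (threshold > current) {
                waiters.add(new Waiter(threshold, listener));
                return;
            }
        }
        listener.accept(current); // invoked outside the lock
    }

    /** Records new progress (e.g. bytes written to the cache) and notifies satisfied listeners. */
    void onProgress(long newProgress) {
        List<LongConsumer> ready = new ArrayList<>();
        synchronized (this) {
            if (newProgress < progress) {
                throw new IllegalArgumentException("progress cannot go backwards");
            }
            progress = newProgress;
            for (Iterator<Waiter> it = waiters.iterator(); it.hasNext(); ) {
                Waiter waiter = it.next();
                if (waiter.threshold <= newProgress) {
                    ready.add(waiter.listener);
                    it.remove();
                }
            }
        }
        ready.forEach(listener -> listener.accept(newProgress));
    }
}
```

Completing the whole gap then corresponds to calling `onProgress` with the gap's end offset, which releases every remaining listener.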
James Baiera	89243857ce	Update precommit to filter out project dependencies (#58189 ) (#58572 ) If a project is pulling in an external org.elasticsearch dependency, the dependency report generation would require a license file for the dependency to be present. This would break precommit because a license was present that it did not feel was warranted. This un-reverts the update to the dependenciesInfo task, as well as the JNA license addition.	2020-06-25 16:33:25 -04:00
Lee Hinman	f732003370	[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563 ) (#58569 ) In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can cause: ``` [mynode] error during snapshot retention task java.lang.IllegalArgumentException: -5 at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?] at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?] ``` When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and adds a unit test for the behavior. Resolves #58515	2020-06-25 14:16:34 -06:00
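`Stream#limit` throws `IllegalArgumentException` for a negative size, which is the failure in the stack trace above. A minimal, simplified sketch of the clamp; `RetentionSketch.deletable` is hypothetical and ignores the other SLM retention criteria.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified sketch of "delete oldest first, but keep at least minCount". */
final class RetentionSketch {
    private RetentionSketch() {}

    /** oldestFirst must be sorted by start time, oldest snapshot first. */
    static List<String> deletable(List<String> oldestFirst, int minCount) {
        // Without the clamp, fewer snapshots than minCount gives a negative limit,
        // and Stream#limit throws IllegalArgumentException.
        long excess = Math.max(0, oldestFirst.size() - minCount);
        return oldestFirst.stream().limit(excess).collect(Collectors.toList());
    }
}
```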
Henning Andersen	38be2812b1	Enhance extensible plugin (#58542 ) Rather than let ExtensiblePlugins know extending plugins' classloaders, we now pass along an explicit ExtensionLoader that loads the extensions asked for. Extensions constructed that way can optionally receive their own Plugin instance in the constructor.	2020-06-25 20:37:56 +02:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Igor Motov	20af856abd	[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192 ) Adds async support to EQL searches Closes #49638 Co-authored-by: James Rodewig james.rodewig@elastic.co	2020-06-25 14:11:57 -04:00
Benjamin Trent	c7ba79bc19	[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537 ) (#58553 ) * [ML] make waiting for renormalization optional for internally flushing job (#58537) When flushing, datafeeds only need the guarantee that the latest bucket has been handled. But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data. This commit adds a new internal-only flag that allows datafeeds (and forecasting) to skip waiting on renormalization. closes #58395	2020-06-25 12:26:52 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
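The reduce phase described in the variable-width histogram entry can be sketched as: sort buckets by centroid, repeatedly merge the closest adjacent pair (doc-count-weighted centroid) until the target count is reached, then apply the `min[large] = (min[large] + max[small]) / 2` key adjustment to overlapping neighbours. A minimal sketch only; `VariableWidthReduceSketch` and `Bucket` are hypothetical, every bucket is assumed to hold at least one document, and this is not the aggregator's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the reduce-phase bucket merging (not the actual aggregator). */
final class VariableWidthReduceSketch {
    static final class Bucket {
        final double min;
        final double max;
        final double centroid;
        final long docCount;

        Bucket(double min, double max, double centroid, long docCount) {
            this.min = min;
            this.max = max;
            this.centroid = centroid;
            this.docCount = docCount;
        }
    }

    /** Merges buckets from all shards, closest centroids first, until targetBuckets remain. */
    static List<Bucket> reduce(List<Bucket> shardBuckets, int targetBuckets) {
        List<Bucket> buckets = new ArrayList<>(shardBuckets);
        buckets.sort(Comparator.comparingDouble(x -> x.centroid));
        while (buckets.size() > Math.max(1, targetBuckets)) {
            // On a line, the closest pair of centroids is always adjacent once sorted.
            int best = 0;
            for (int i = 1; i + 1 < buckets.size(); i++) {
                double gap = buckets.get(i + 1).centroid - buckets.get(i).centroid;
                if (gap < buckets.get(best + 1).centroid - buckets.get(best).centroid) {
                    best = i;
                }
            }
            Bucket a = buckets.get(best);
            Bucket b = buckets.get(best + 1);
            long count = a.docCount + b.docCount; // assumes every bucket holds at least one doc
            double centroid = (a.centroid * a.docCount + b.centroid * b.docCount) / count;
            buckets.set(best, new Bucket(Math.min(a.min, b.min), Math.max(a.max, b.max), centroid, count));
            buckets.remove(best + 1);
        }
        // Keys are bucket minimums; on overlap, move the larger-centroid bucket's key to the midpoint.
        for (int i = 1; i < buckets.size(); i++) {
            Bucket small = buckets.get(i - 1);
            Bucket large = buckets.get(i);
            if (large.min < small.max) {
                buckets.set(i, new Bucket((large.min + small.max) / 2, large.max, large.centroid, large.docCount));
            }
        }
        return buckets;
    }
}
```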
Nik Everett	71adade73a	Return clear error message if aggregation type is invalid (#58255 ) (#58365 ) The main changes are: 1. Catch the `NamedObjectNotFoundException` when parsing aggregation type, and then throw a `ParsingException` with clear error message with hint. 2. Add a unit test method: AggregatorFactoriesTests#testInvalidType(). Closes #58146. Co-authored-by: bellengao <gbl_long@163.com>	2020-06-25 11:08:25 -04:00
Dimitris Athanasiou	c3dfafe0b4	[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541 ) (#58550 ) It is possible for the source document to have an empty string value for a field that is mapped as numeric. We should treat those as missing values and avoid throwing an assertion error. Backport of #58541	2020-06-25 18:07:29 +03:00
Dimitris Athanasiou	5af7071db0	[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546 ) This changes the default value for the results field of inference applied on models that are trained via a data frame analytics job. Previously, the results field default was `predicted_value`. This commit makes it the same as in the training job itself. The new default field is `<dependent_variable>_prediction`. Apart from making inference consistent with the training job the model came from, it is helpful to preserve the dependent variable name by default as it provides some context to the user that may avoid confusion as to which model results came from. Backport of #58538	2020-06-25 18:03:43 +03:00
Benjamin Trent	add8ff1ad3	[ML] assume data streams are enabled in data stream tests (#58502 ) (#58508 )	2020-06-24 14:14:48 -04:00
Chris Roberson	d5899d1765	[Monitoring] APM mapping update (#46244 ) (#58498 ) * Add acm mapping to APM for beats * Add root mapping for APM * Add sourcemap mapping to APM * Fix missing properties * Fix a second missing properties * Add request property to acm * Remove root and sourcemap per review Co-authored-by: Mike Place <mike.place@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-24 13:26:30 -04:00
Armin Braun	9e4c5d1dde	Cleaner Handling of Snapshot Related null Custom Values in CS (#58382 ) (#58501 ) Add the ability to get a custom value while specifying a default and use it throughout the codebase to get rid of the `null` edge case and shorten the code a little.	2020-06-24 17:24:44 +02:00

5927 Commits