OpenSearch

Commit Graph

Author	SHA1	Message	Date
Costin Leau	965f77fa44	EQL: Introduce sequence internal paging (#58859 ) Refactor sequence matching classes in order to decouple querying from results consumption (and matching). Rename some classes to better convey their intent. Introduce internal pagination of the sequence algorithm, that is, getting the data in slices and, if needed, moving forward in order to find more matches until either the dataset is consumed or the number of results desired is found. (cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)	2020-07-02 13:44:21 +03:00
Przemysław Witek	8e074c4495	Rename "error" field to "value" for consistency between metrics (#58726 ) (#58870 )	2020-07-02 09:08:56 +02:00
Yang Wang	a5a8b4ae1d	Add cache for application privileges (#55836 ) (#58798 ) Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors. Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).	2020-07-02 11:50:03 +10:00
Benjamin Trent	c64e283dbf	[7.x] [ML] handles compressed model stream from native process (#58009 ) (#58836 ) * [ML] handles compressed model stream from native process (#58009) This moves model storage from handling the fully parsed JSON string to handling two separate types of documents. 1. ModelSizeInfo which contains model size information 2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string. `model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition. Native side change: https://github.com/elastic/ml-cpp/pull/1349	2020-07-01 15:14:31 -04:00
Mark Vieira	1fcaec7dfc	Ignore test seed used in test system properties (#58789 )	2020-07-01 11:52:22 -07:00
James Rodewig	a966513eae	[DOCS] Remove problematic terms (#58832 ) (#58851 )	2020-07-01 13:47:14 -04:00
Nhat Nguyen	f63cbad629	Ensure CCR partial reads never overuse buffer (#58620 ) When the documents are large, a follower can receive a partial response because the requesting range of operations is capped by max_read_request_size instead of max_read_request_operation_count. In this case, the follower will continue reading the subsequent ranges without checking the remaining size of the buffer. The buffer then can use more memory than max_write_buffer_size and even causes OOM. Backport of #58620	2020-07-01 13:23:28 -04:00
Tanguy Leroux	ec4843f4df	Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847 ) Since #58728 part of searchable snapshot shard files are written in cache in an asynchronous manner in a dedicated thread pool. It means that even if a search query is successful and returns, there are still more bytes to write in the cached files on disk. On CI this can be slow; if we want to check that the cached_bytes_written has changed we need to check multiple times to give some time for the cached data to be effectively written.	2020-07-01 18:01:00 +02:00
Benjamin Trent	c768467155	Muting flakey test (#58855 ) (#58856 )	2020-07-01 11:54:43 -04:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Tanguy Leroux	d35e8f45da	Allow read operations to be executed without waiting for full range to be written in cache (#58728 ) (#58829 ) This commit changes CacheFile and CachedBlobContainerIndexInput so that the read operations made by these classes are now progressively executed and do not wait for full range to be written in cache. It relies on the change introduced in #58477 and it is the last change extracted from #58164. Relates #58164	2020-07-01 15:38:17 +02:00
Przemysław Witek	909649dd15	[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 ) (#58825 )	2020-07-01 14:52:06 +02:00
Andrei Stefan	b904a60275	EQL: Add case handling to stringContains (#58762 ) (#58813 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit 1a58776d3aa563beb364b067a1db46497122306f)	2020-07-01 13:51:45 +03:00
Andrei Stefan	470bcee5bf	EQL: Integrate TOML tests for function folding (#58748 ) (#58812 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit e9b1fa58cf8d510a4b4afb14f66b0d5f9c603ebb)	2020-07-01 13:50:54 +03:00
Przemysław Witek	2638809cba	Mute failing test DataFrameAnalyticsConfigProviderIT.testUpdate_UpdateCannotBeAppliedWhenTaskIsRunning (#58821 )	2020-07-01 12:28:23 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
David Kyle	27d52d4d23	Remove the Model interface (#58754 ) (#58803 ) The Model interface was implemented by just one class and did not contribute to making the code more understandable	2020-07-01 09:57:02 +01:00
Dario Gieselaar	417f7062c5	[7.x] Add read privileges for annotations for apm_user (#58530 ) (#58781 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-01 09:04:57 +02:00
Yang Wang	3d49e62960	Support handling LogoutResponse from SAML idP (#56316 ) (#58792 ) SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective. The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error. This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.	2020-07-01 16:47:27 +10:00
Tim Vernum	9e49af03b7	Reenable test after backport (#58717 ) This commit re-enables CCR rolling upgrade tests following the backport of #58217 to 7.8 branch (7.8.1)	2020-07-01 11:50:30 +10:00
Lee Hinman	74a78b3a7b	Mute AzureSearchableSnapshotsIT (#58775 ) Relates to #58260	2020-06-30 13:30:51 -06:00
Dan Hermann	22806c943d	Data stream support for ILM remove policy API (#58595 ) (#58770 )	2020-06-30 14:03:19 -05:00
Benjamin Trent	a2331bc9d4	[Transform] fix bug in supporting boolean values in pivot (#58741 ) (#58760 ) Since the underlying composite aggs support boolean mapped values for terms, transforms should also support them closes #58697	2020-06-30 13:47:58 -04:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
Julie Tibshirani	ab65a57d70	Merge mappings for composable index templates (#58709 ) This PR implements recursive mapping merging for composable index templates. When creating an index, we perform the following: * Add each component template mapping in order, merging each one in after the last. * Merge in the index template mappings (if present). * Merge in the mappings on the index request itself (if present). Some principles: * All 'structural' changes are disallowed (but everything else is fine). An object mapper can never be changed between `type: object` and `type: nested`. A field mapper can never be changed to an object mapper, and vice versa. * Generally, each section is merged recursively. This includes `object` mappings, as well as root options like `dynamic_templates` and `meta`. Once we reach 'leaf components' like field definitions, they always overwrite an existing one instead of being merged. Relates to #53101.	2020-06-30 08:01:37 -07:00
David Roberts	d9e0e0bf95	[ML] Pass through the stop-on-warn setting for categorization jobs (#58738 ) When per_partition_categorization.stop_on_warn is set for an analysis config it is now passed through to the autodetect C++ process. Also adds some end-to-end tests that exercise the functionality added in elastic/ml-cpp#1356 Backport of #58632	2020-06-30 15:17:04 +01:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure it is properly built * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Przemysław Witek	9ea9b7bd3b	[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 ) (#58731 )	2020-06-30 14:09:11 +02:00
Benjamin Trent	def5550df3	[ML] fix ml inference stats tests (#58690 ) (#58729 )	2020-06-30 07:53:33 -04:00
Przemyslaw Gomulka	3923a10165	Exclude SystemV timezones from randomZone method (#58549 ) (#58655 ) RandomZone test method returns a ZoneId from the set of ids supported by java. The only difference between joda and java supported timezones are SystemV* timezones. These should be excluded from randomZone method as they would break testing. They also do not bring much confidence when used in testing as I suspect they are rarely used. That exclude should be removed for simplification once joda support is removed.	2020-06-30 12:45:53 +02:00
Andrei Stefan	7b80ea7218	Fix release tests (#58713 ) (#58725 ) (cherry picked from commit 7816c100612168bf46595c4813fe374bca2e7259)	2020-06-30 13:42:32 +03:00
Tanguy Leroux	4e03633a66	Differentiate base paths for searchable snapshots QA tests (#58664 ) (#58714 ) This commit adds the BuildParams.testSeed to the repository base paths used in searchable snapshots QA tests. For S3 and GCS the test seed is added for coherency sake with other integration tests while it's required for Azure as Azure 3rd party tests are executed on CI simultaneously for regular and SAS token accounts. Closes #58260	2020-06-30 10:18:33 +02:00
Tim Vernum	dcc5a06dec	Display enterprise license as platinum in /_xpack (#58217 ) The GET /_license endpoint displays "enterprise" licenses as "platinum" by default so that old clients (including beats, kibana and logstash) know to interpret this new license type as if it were a platinum license. However, this compatibility layer was not applied to the GET /_xpack/ endpoint which also displays a license type & mode. This commit causes the _xpack API to mimic the _license API and treat enterprise as platinum by default, with a new accept_enterprise parameter that will cause the API to return the correct "enterprise" value. This BWC layer exists only for the 7.x branch. This is a breaking change because, since 7.6, the _xpack API has returned "enterprise" for enterprise licenses, but this has been found to break old versions of beats and logstash so needs to be corrected.	2020-06-30 16:42:28 +10:00
Costin Leau	3a546f1f51	EQL: Introduce support for sequence maxspan (#58635 ) EQL sequences can now specify a maximum time allowed for their span (computed between the first and the last matching event). (cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)	2020-06-29 21:31:00 +03:00
Igor Motov	773f3574a9	Removes debug logging from RestEqlCancellationIT (#58676 ) The test didn't fail since the fix in #58493. So, it's time to remove debug logging and close the issue. Closes #58270	2020-06-29 13:15:01 -04:00
Andrei Stefan	3cb8f54f28	EQL: case sensitivity aware integration testing (#58624 ) (#58672 ) * EQL: case sensitivity aware integration testing (#58624) * Add DataLoader * Rewrite case sensitivity settings: NULL -> run both case sensitive and insensitive tests TRUE -> run case sensitive test only FALSE -> run case insensitive test only * Rename test_queries_supported * Add more toml tests from the Python client Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)	2020-06-29 18:40:07 +03:00
Tanguy Leroux	73adcf4d44	SparseFileTracker.Gap should keep a reference to the corresponding Range (#58587 ) (#58665 ) SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill, it does not need to resolve the range each time onSuccess/onProgress/onFailure are called. Relates #58477	2020-06-29 15:24:19 +02:00
Przemysław Witek	3f7c45472e	[7.x] Introduce DataFrameAnalyticsConfig update API (#58302 ) (#58648 )	2020-06-29 10:56:11 +02:00
Yang Wang	61fa7f4d22	Change privilege of enrich stats API to monitor (#52027 ) (#52196 ) The remote_monitoring_user user needs to access the enrich stats API. But the request is denied because the API is categorized under admin. The correct privilege should be monitor.	2020-06-29 10:25:33 +10:00
Dimitris Athanasiou	1817b896c9	[7.x][ML] Add status and increased estimate to memory usage (#58588 ) (#58606 ) Adds parsing of `status` and `memory_reestimate_bytes` to data frame analytics `memory_usage`. When the training surpasses the model memory limit, the status will be set to `hard_limit` and `memory_reestimate_bytes` can be used to update the job's limit in order to restart the job. Backport of #58588	2020-06-28 16:27:26 +03:00
Costin Leau	3c81b91474	EQL: Add Head/Tail pipe support (#58536 ) Introduce pipe support, in particular head and tail (which can also be chained). (cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea) (cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)	2020-06-27 09:49:14 +03:00
Benjamin Trent	7a202b149e	Muting analytics tests (#58617 ) (#58618 )	2020-06-26 16:50:59 -04:00
Tanguy Leroux	775fb5d4cf	Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477 ) (#58584 ) Today SparseFileTracker allows to wait for a range to become available before executing a given listener. In the case of searchable snapshot, we'd like to be able to wait for a large range to be filled (ie, downloaded and written to disk) while being able to execute the listener as soon as a smaller range is available. This pull request is an extract from #58164 which introduces a ProgressListenableActionFuture that is used internally by SparseFileTracker. The progressive listenable future allows to register listeners attached to SparseFileTracker.Gap so that they are executed once the Gap is completed (with success or failure) or as soon as the Gap progress reaches a given progress value. This progress value is defined when the tracker.waitForRange() method is called; this method has been modified to accept a range and another listener's range to operate on. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-26 18:26:20 +02:00
James Baiera	89243857ce	Update precommit to filter out project dependencies (#58189 ) (#58572 ) If a project is pulling in an external org.elasticsearch dependency, the dependency report generation would require a license file for the dependency to be present. This would break precommit because a license was present that it did not feel was warranted. This un-reverts the update to the dependenciesInfo task, as well as the JNA license addition.	2020-06-25 16:33:25 -04:00
Lee Hinman	f732003370	[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563 ) (#58569 ) In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can cause: ``` [mynode] error during snapshot retention task java.lang.IllegalArgumentException: -5 at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?] at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?] at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?] ``` When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and adds a unit test for the behavior. Resolves #58515	2020-06-25 14:16:34 -06:00
Henning Andersen	38be2812b1	Enhance extensible plugin (#58542 ) Rather than let ExtensiblePlugins know extending plugins' classloaders, we now pass along an explicit ExtensionLoader that loads the extensions asked for. Extensions constructed that way can optionally receive their own Plugin instance in the constructor.	2020-06-25 20:37:56 +02:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Igor Motov	20af856abd	[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192 ) Adds async support to EQL searches Closes #49638 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-06-25 14:11:57 -04:00
Benjamin Trent	c7ba79bc19	[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537 ) (#58553 ) * [ML] make waiting for renormalization optional for internally flushing job (#58537) When flushing, datafeeds only need the guarantee that the latest bucket has been handled. But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data. This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization. closes #58395	2020-06-25 12:26:52 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Nik Everett	71adade73a	Return clear error message if aggregation type is invalid (#58255 ) (#58365 ) The main changes are: 1. Catch the `NamedObjectNotFoundException` when parsing aggregation type, and then throw a `ParsingException` with clear error message with hint. 2. Add a unit test method: AggregatorFactoriesTests#testInvalidType(). Closes #58146. Co-authored-by: bellengao <gbl_long@163.com>	2020-06-25 11:08:25 -04:00
Dimitris Athanasiou	c3dfafe0b4	[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541 ) (#58550 ) It is possible for the source document to have an empty string value for a field that is mapped as numeric. We should treat those as missing values and avoid throwing an assertion error. Backport of #58541	2020-06-25 18:07:29 +03:00
Dimitris Athanasiou	5af7071db0	[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546 ) This changes the default value for the results field of inference applied on models that are trained via a data frame analytics job. Previously, the results field default was `predicted_value`. This commit makes it the same as in the training job itself. The new default field is `<dependent_variable>_prediction`. Apart from making inference consistent with the training job the model came from, it is helpful to preserve the dependent variable name by default as it provides some context to the user that may avoid confusion as to which model results came from. Backport of #58538	2020-06-25 18:03:43 +03:00
Benjamin Trent	add8ff1ad3	[ML] assume data streams are enabled in data stream tests (#58502 ) (#58508 )	2020-06-24 14:14:48 -04:00
Chris Roberson	d5899d1765	[Monitoring] APM mapping update (#46244 ) (#58498 ) * Add acm mapping to APM for beats * Add root mapping for APM * Add sourcemap mapping to APM * Fix missing properties * Fix a second missing properties * Add request property to acm * Remove root and sourcemap per review Co-authored-by: Mike Place <mike.place@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-24 13:26:30 -04:00
Armin Braun	9e4c5d1dde	Cleaner Handling of Snapshot Related null Custom Values in CS (#58382 ) (#58501 ) Add the ability to get a custom value while specifying a default and use it throughout the codebase to get rid of the `null` edge case and shorten the code a little.	2020-06-24 17:24:44 +02:00
Martijn van Groningen	f4fad9c65a	Re-enable data streams yaml tests in bwc mode (#58500 ) Backport of #58403 to 7.x branch.	2020-06-24 16:59:51 +02:00
Hendrik Muhs	c1bbfeddc9	Improve rolling upgrade test setup assertions (#58313 ) wrap test setup and add proper assert messages relates #58282	2020-06-24 16:54:48 +02:00
Andrei Stefan	69f73d948b	EQL: code cleanup and further tests (#58458 ) (#58497 ) Add FunctionPipe tests to all functions. Cleanup functions code. (cherry picked from commit 0f83d5799841fe99d8aeaf46e50dd11aa6bf8a57)	2020-06-24 17:38:56 +03:00
Przemysław Witek	551b8bcd73	[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484 ) (#58490 )	2020-06-24 15:52:45 +02:00
Benjamin Trent	fa88e71532	[ML] unify usages of _all and wildcard <*> (#58460 ) (#58494 )	2020-06-24 09:47:57 -04:00
Luca Cavanna	dbbf2772d8	Mute newly added ml data streams tests (#58492 ) Relates to #58491	2020-06-24 15:11:40 +02:00
markharwood	d5ac3bb87f	Field capabilities - make `keyword` a family of field types (#58315 ) (#58483 ) Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type. Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities. Relates to #53175	2020-06-24 12:32:14 +01:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Jim Ferenczi	fcd8a432d9	Submit _async search task should cancel children on cancellation (#58332 ) This change allows the submit async search task to cancel children and removes the manual indirection that cancels the search task when the submit task is cancelled. This is now handled by the task cancellation, which can cancel grand-children since #54757.	2020-06-24 09:10:26 +02:00
Larry Gregory	2ca09cddaf	[DOCS] Rename kibana user to kibana_system (#58423 )	2020-06-23 14:25:09 -07:00
Przemysław Witek	4e4ca6ac25	Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447 ) (#58459 )	2020-06-23 22:18:39 +02:00
Benjamin Trent	a9b868b7a9	[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280 ) (#58455 ) This commit allows data streams to be a valid source for analytics and transforms. Data streams are fairly transparent and our `_search` and `_reindex` actions work without error. For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.	2020-06-23 14:40:35 -04:00
Benjamin Trent	0cc84d3caf	[ML] wait for yellow state for stats index in tests (#58436 ) (#58456 ) GET inference stats now reads from the .ml-stats index. Our tests should wait for yellow state before attempting to query the index for stat information.	2020-06-23 13:32:24 -04:00
Dimitris Athanasiou	f67fee387b	[7.x][ML] Make regression training set predictable in size (#58331 ) (#58453 ) Unlike `classification`, which is using a cross validation splitter that produces training sets whose size is predictable and equal to `training_percent * class_cardinality`, for regression we have been using a random splitter that takes an independent decision for each document. This means we cannot predict the exact size of the training set. This poses a problem as we move towards performing test inference on the java side as we need to be able to provide an accurate upper bound of the training set size to the c++ process. This commit replaces the random splitter we use for regression with the same streaming-reservoir approach we do for `classification`. Backport of #58331	2020-06-23 19:49:03 +03:00
Marios Trivyzas	e7c40d973e	SQL: Relax parsing of date/time escaped literals (#58336 ) (#58450 ) Improve the usability of the MS-SQL server/ODBC escaped date/time/timestamp literals, by allowing timezone/offset ids in the parsed string, e.g.: ``` {ts '2000-01-01T11:11:11Z'} ``` Closes: #58262 (cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)	2020-06-23 18:05:54 +02:00
David Roberts	0d6bfd0ac3	[7.x][ML] Fix wire serialization for flush acknowledgements (#58443 ) There was a discrepancy in the implementation of flush acknowledgements: most of the class was designed on the basis that the "last finalized bucket time" could be null but the wire serialization assumed that it was never null. This works because, the C++ sends zero "last finalized bucket time" when it is not known or not relevant. But then the Java code will print that to XContent as it is assuming null represents not known or not relevant. This change corrects the discrepancies. Internally within the class null represents not known or not relevant, but this is translated from/to 0 for communications from the C++ and old nodes that have the bug. Additionally I switched from Date to Instant for this class and made the member variables final to modernise it a bit. Backport of #58413	2020-06-23 16:42:06 +01:00
Mark Tozzi	52806a8f89	Small VS config cleanup (#58294 ) (#58442 )	2020-06-23 10:53:06 -04:00
Benjamin Trent	61142a3005	[ML] only log if forecasts are set to failed (#58421 ) (#58437 ) This adjusts the logging level for setting forecasts to failed to WARN. And it will only log if 1 or more forecasts were adjusted to failed.	2020-06-23 10:24:03 -04:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Alan Woodward	519d1278e2	Make FieldTypeLookup immutable (#58162 ) (#58411 ) FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to the presence of multiple mapping types within a single index, this had to be updated in-place because a mapping update might only affect one type. However, now that we only have a single type per index, we can completely rebuild the FieldTypeLookup on each update, removing lots of concurrency worries.	2020-06-23 10:51:32 +01:00
David Roberts	f97b37190b	[ML] Add a new annotation type for categorization status changes (#58394 ) Adds a new value to the "event" enum of ML annotations, namely "categorization_status_change". This will allow users to see when categorization was found to be performing poorly. Once per-partition categorization is available, it will allow users to see when categorization is performing poorly for a specific partition. It does not make sense to reuse the "model_change" event that annotations already have, because categorizer state is separate to model state ("model" state is really anomaly detector state), and is not reverted by the revert model snapshot API. Therefore annotations related to categorization need to be treated differently to annotations related to anomaly detection.	2020-06-23 09:16:27 +01:00
Rene Groeschke	bd2dd81bc6	Fix deprecated property usage in archive tasks (#58269 ) (#58308 )	2020-06-23 09:11:46 +02:00
Martijn van Groningen	7dda9934f9	Keep track of timestamp_field mapping as part of a data stream (#58400 ) Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source directly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into a mapping. If the timestamp field path consisted of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.	2020-06-22 17:46:38 +02:00
Costin Leau	765f1b5775	SQL: Fix bug in resolving aliases against filters (#58399 ) When doing aliasing with the same name over non existing fields, the analyzer gets stuck in a loop trying to resolve the alias over and over leading to SO. This PR breaks the cycle by checking the relationship between the alias and the child it tries to replace as an alias should never replace its child. Fix #57270 Close #57417 Co-authored-by: Hailei <zhh5919@163.com> (cherry picked from commit 46786ff2e1ed5951006ff4bdd2b6ac6a1ebcf17b)	2020-06-22 16:05:42 +03:00
Przemko Robakowski	a44dad9fbb	[7.x] Add support for snapshot and restore to data streams (#57675 ) (#58371 ) * Add support for snapshot and restore to data streams (#57675) This change adds support for including data streams in snapshots. Names are provided in indices field (the same way as in other APIs), wildcards are supported. If rename pattern is specified it renames both data streams and backing indices. It also adds test to make sure SLM works correctly. Closes #57127 Relates to #53100 * version fix * compilation fix * compilation fix * remove unused changes * compilation fix * test fix	2020-06-19 22:41:51 +02:00
Benjamin Trent	bf8641aa15	[7.x] [ML] calculate cache misses for inference and return in stats (#58252 ) (#58363 ) When a local model is constructed, the cache hit miss count is incremented. When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.	2020-06-19 09:46:51 -04:00
Stuart Tettemer	20abba8433	Scripting: Deprecate general cache settings (#55753 ) (#58283 ) Backport: ef543b0	2020-06-18 11:54:23 -06:00
Jim Ferenczi	1c1a6d4ec8	Handle failures with no explicit cause in async search (#58319 ) This commit fixes an AOOBE in the handling of fatal failures in _async_search. If the underlying cause is not found, this change uses the root failure. Closes #58311	2020-06-18 18:57:58 +02:00
Przemysław Witek	9dd3d5aa48	[7.x] Delete auto-generated annotations when model snapshot is reverted (#58240 ) (#58335 )	2020-06-18 17:59:52 +02:00
Jason Tedor	be08268562	Allow follower indices to override leader settings (#58103 ) Today when creating a follower index via the put follow API, or via an auto-follow pattern, it is not possible to specify settings overrides for the follower index. Instead, we copy all of the leader index settings to the follower. Yet, there are cases where a user would want some different settings on the follower index such as the number of replicas, or allocation settings. This commit addresses this by allowing the user to specify settings overrides when creating follower index via manual put follower calls, or via auto-follow patterns. Note that not all settings can be overridden (e.g., index.number_of_shards) so we also have detection that prevents attempting to override settings that must be equal between the leader and follower index. Note that we do not even allow specifying such settings in the overrides, even if they are specified to be equal between the leader and the follower index. Instead, they must be implicitly copied from the leader index, not explicitly set by the user.	2020-06-18 11:56:06 -04:00
Alan Woodward	4b8cf2af6a	Add serialization test for FieldMappers when include_defaults=true (#58235 ) (#58328 ) Fixes a bug in TextFieldMapper serialization when index is false, and adds a base-class test to ensure that all field mappers are tested against all variations with defaults both included and excluded. Fixes #58188	2020-06-18 15:46:04 +01:00
Marios Trivyzas	50b391e91b	SQL: [Docs] Fix TIME_PARSE documentation (#58182 ) (#58317 ) TIME_PARSE works correctly if both date and time parts are specified, and a TIME object (that contains only time is returned). Adjust docs and add a unit test that validates the behavior. Follows: #55223 (cherry picked from commit 9d6b679a5da88f3c131b9bdba49aa92c6c272abe)	2020-06-18 16:09:13 +02:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Tanguy Leroux	f3b6e41f02	Do not wrap CacheFile reentrant r/w locks with ReleasableLock (#58244 ) Today the read/write locks used internally by CacheFile object are wrapped into a ReleasableLock. This is not strictly required and also prevents usage of the tryLock() methods which we would like to use for early releasing of read operations (#58164).	2020-06-18 11:01:53 +02:00
Andrei Dan	caa5d3abe0	ILM actions check the managed index is not a DS write index (#58239 ) (#58295 ) This changes the actions that would attempt to make the managed index read only to check if the managed index is the write index of a data stream before proceeding. The updated actions are shrink, readonly, freeze and forcemerge. (cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-18 07:45:11 +01:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116 ) (#58274 ) - Remove duplicate dependency configuration - Use task avoidance api across the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
Lee Hinman	d272646a55	Fix name of template in allowed warning for DS YML test (#58273 ) The warning was present, but had the incorrect template name, leading to a test failure.	2020-06-17 11:23:04 -06:00
David Roberts	3f8d16304c	Add ML admin permissions to the kibana_system role (#58172 ) As part of the "ML in Spaces" project, access to the ML UI in Kibana is migrating to being controlled by Kibana privileges. The ML UI will check whether the logged-in user has permission to do something ML-related using Kibana privileges, and if they do will call the relevant ML Elasticsearch API using the Kibana system user. In order for this to work the kibana_system role needs to have administrative access to ML. Backport of #58061	2020-06-17 17:03:32 +01:00
Benjamin Trent	2de242f80e	[ML] rename EnsembleSizeInfo#inputFieldNameLengths to this.featureNameLengths (#58241 ) (#58253 )	2020-06-17 10:08:55 -04:00
Benjamin Trent	69338b03d7	[ML] expand data_streams when assigning datafeed to node (#58175 ) (#58242 )	2020-06-17 08:34:34 -04:00
Ignacio Vera	2d3d7ab387	mute CentroidCalculatorTests#testPolygonAsPoint (#58249 ) (#58250 )	2020-06-17 14:32:13 +02:00
Jason Tedor	b78b3edeea	Upgrade to JNA 5.5.0 (#58183 ) This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are now on the latest maintained line, and pick up a large collection of bug fixes that have accumulated.	2020-06-17 07:35:08 -04:00
Dimitris Athanasiou	36dbf08d47	[7.x][ML] Improve stability of stratified splitter tests (#58180 ) (#58224 ) The main improvement here is that the total expected count of training rows in the test is calculated as the sum of the training fraction times the cardinality of each class (instead of the training fraction times the total doc count). Also relaxes slightly the error bound on the uniformity test from 0.12 to 0.13. Closes #54122 Backport of #58180	2020-06-17 12:40:21 +03:00
Andrei Dan	e17c51151b	[7.x] ILM: don't take snapshot of a data stream's write index (#58159 ) (#58222 ) We don't allow converting a data stream's writeable index into a searchable snapshot. We are currently preventing swapping a data stream's write index with the restored index. This adds another step that will not proceed with the searchable snapshot action until the managed index is not the write index of a data stream anymore. (cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-17 09:45:16 +01:00
Ignacio Vera	7080ba5b05	Check for degenerated lines when calculating the centroid (#58216 )	2020-06-17 09:34:49 +02:00
Przemysław Witek	b22e91cefc	[7.x] Delete auto-generated annotations when job is deleted. (#58169 ) (#58219 )	2020-06-17 09:17:20 +02:00
Lisa Cawley	46d797b1d9	[DOCS] Fixes license management links (#58213 )	2020-06-16 16:49:48 -07:00
Stuart Tettemer	01795d1925	Revert "Scripting: Deprecate general cache settings (#55753 )" (#58201 ) This reverts commit `88e8b34fc2`.	2020-06-16 14:58:18 -06:00
Stuart Tettemer	88e8b34fc2	Scripting: Deprecate general cache settings (#55753 ) Backport: ef543b0	2020-06-16 13:06:59 -06:00
Benjamin Trent	081da09c72	Allow GET <pattern>/_rollup/data to expand data streams (#58173 ) (#58177 )	2020-06-16 14:01:54 -04:00
Benjamin Trent	3309817d18	[ML] fixing tree inference ctor to allow target_type to be optional (#58132 ) (#58165 ) The tree trained model object will set its target_type to be regression by default. This updates the inference object to behave the same way.	2020-06-16 13:29:11 -04:00
Benjamin Trent	6c03d97419	Mute TimeSeriesDataStreamsIT.testSearchableSnapshotAction (#58127 ) (#58181 ) Co-authored-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-16 12:40:38 -04:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Dan Hermann	7079a3b09f	[7.x] Prohibit freezing the write index of a data stream (#58168 )	2020-06-16 09:37:32 -05:00
Yannick Welsch	1e235a7f55	Fix off-by-one on CCR lease (#58158 ) The leases issued by CCR keep one extra operation around on the leader shards. This is not harmful to the leader cluster, but means that there's potentially one delete that can't be cleaned up.	2020-06-16 14:04:58 +02:00
David Turner	423697f414	Default to zero replicas for searchable snapshots (#57802 ) Today a mounted searchable snapshot defaults to having the same replica configuration as the index that was snapshotted. This commit changes this behaviour so that we default to zero replicas on these indices, but allow the user to override this in the mount request. Relates #50999	2020-06-16 10:12:23 +01:00
Tal Levy	69d5e044af	Add optional description parameter to ingest processors. (#57906 ) (#58152 ) This commit adds an optional field, `description`, to all ingest processors so that users can explain the purpose of the specific processor instance. Closes #56000.	2020-06-15 19:27:57 -07:00
Lisa Cawley	554e60860f	[DOCS] Add token and HTTPS requirements for Kerberos (#57180 ) Co-authored-by: Tim Vernum <tim@adjective.org>	2020-06-15 14:30:13 -07:00
Lee Hinman	d56d2dfb09	[7.x] Scope index templates put during cluster upgrade tests (#58065 ) (#58122 ) This template was added for 7.0 for what I am guessing is a BWC issue related to deprecation warnings. It unfortunately seems to cause failures because templates for these tests are not cleared after the test (because these are upgrade tests). Resolves #56363	2020-06-15 10:47:36 -06:00
markharwood	03dd73dc0d	Fix for wildcard fields that returned ByteRefs not Strings to scripts. (#58060 ) (#58109 ) This needs some reorg of BinaryDV field data classes to allow specialisation of scripted doc values. Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts. Closes #58044	2020-06-15 14:52:56 +01:00
Alejandro Fernández Haro	3d0c8da66d	Add monitor and view_index_metadata to the built-in `kibana_system` role (#57755 ) Allows the kibana user to collect data telemetry in a background task by giving the kibana_system built-in role the view_index_metadata and monitoring privileges over all indices (*).	2020-06-15 14:40:27 +03:00
Shaunak Kashyap	5e2faad783	Add ILM policy PUT and GET for remote_monitoring_agent built-in role (#57963 ) Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Due to this error Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>	2020-06-15 14:35:30 +03:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Jason Tedor	dcf4131f00	Revert "Add JNA license to SQL CLI dependency licenses" This reverts commit `076b32d4f3`.	2020-06-12 17:04:39 -04:00
Dan Hermann	17f3318732	[7.x] Resolve index API (#58037 )	2020-06-12 15:41:32 -05:00
Jason Tedor	076b32d4f3	Add JNA license to SQL CLI dependency licenses Previously we excluded requiring licenses for dependencies with the group name org.elasticsearch under the assumption that these use the top-level Elasticsearch license. This is not always correct, for example, for the org.elasticsearch:jna dependency as this is merely a wrapper around the upstream JNA project, and that is the license that we should be including. A recent change modified this check from using the group name to checking only if the dependency is a project dependency. This exposed the use of JNA in SQL CLI to this check, but the license for it was not added. This commit addresses this by adding the license. Relates #58015	2020-06-12 16:38:23 -04:00
Benjamin Trent	79c784932f	[ML] allow feature_names to be optional in ensemble inference model (#58059 ) (#58067 ) This has `EnsembleInferenceModel` not parse feature_names from the XContent. Instead, it will rely on `rewriteFeatureIndices` to be called ahead of time. Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.	2020-06-12 16:33:54 -04:00
Mark Vieira	0ce102a5f4	Fix issue with bwc tests running wrong cluster versions (#58063 ) We were previously configuring BWC testing tasks by matching on task name prefix. This naive approach breaks down when you have versions like 1.0.1 and 1.0.10 since they both share a common prefix. This commit makes the pattern matching more specific so we won't inadvertently spin up the wrong cluster version.	2020-06-12 12:34:15 -07:00
Ignacio Vera	c518670f83	Fix Geo grid aggregation circuit breaker tests (#58028 ) (#58042 ) This commit makes sure we create index with only one shard.	2020-06-12 15:39:27 +02:00
Martijn van Groningen	01d8bb8cfa	Enforce valid field mapping exists for timestamp_field in templates. (#58036 ) Backport of #57741 to 7.x branch. Relates to #53100	2020-06-12 15:24:42 +02:00
David Roberts	93b693527a	[7.x][ML] Add categorizer stats ML result type (#58001 ) This type of result will store stats about how well categorization is performing. When per-partition categorization is in use, separate documents will be written for every partition so that it is possible to see if categorization is working well for some partitions but not others. This PR is a minimal implementation to allow the C++ side changes to be made. More Java side changes related to per-partition categorization will be in followup PRs. However, even in the long term I do not see a major benefit in introducing dedicated APIs for querying categorizer stats. Like forecast request stats the categorizer stats can be read directly from the job's results alias. Backport of #57978	2020-06-12 12:08:07 +01:00
markharwood	2da8e57f59	Search - add range query support to wildcard field (#57881 ) (#57988 ) Backport to add range query support to wildcard field Closes #57816	2020-06-12 11:30:54 +01:00
David Kyle	39020f3900	HLRC for delete expired data by job Id (#57722 ) (#57975 ) High level rest client changes for #57337	2020-06-12 09:44:17 +01:00
Mark Tozzi	36f551bdb4	Make ValuesSourceConfig behave like a config object (#57762 ) (#58012 )	2020-06-11 17:23:55 -04:00
Benjamin Trent	2881995a45	[ML] adding new inference model size estimate handling from native process (#57930 ) (#57999 ) Adds support for reading in `model_size_info` objects. These objects contain numeric values indicating the model definition size and complexity. Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-11 15:59:23 -04:00
Alan Woodward	16e230dcb8	Update to lucene snapshot e7c625430ed (#57981 ) Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.	2020-06-11 14:51:53 +01:00
David Roberts	54d4f2a623	[ML] Refresh annotations index on job flush and close (#57979 ) Now that annotations are part of the anomaly detection job results the annotations index should be refreshed on flushing and closing the job so that flush and close continue to fulfil their contracts that immediately after returning all results the job generated up to that point are searchable.	2020-06-11 12:29:04 +01:00
David Kyle	b87b147704	Add models for search to ModelLoadingService (#57592 ) (#57919 ) ModelLoadingService only caches models if they are referenced by an ingest pipeline. For models used in search we want to always cache the models and rely on TTL to evict them. Additionally when an ingest pipeline is deleted the model it references should not be evicted if it is used in search.	2020-06-11 10:48:37 +01:00
David Kyle	2905a2f623	Use Search After job iterators (#57875 ) (#57923 ) Search after is a better choice for the delete expired data iterators where processing takes a long time as unlike scroll a context does not have to be kept alive. Also changes the delete expired data endpoint to 404 if the job is unknown	2020-06-11 10:06:18 +01:00
Costin Leau	ff0ea62cb8	EQL: Fix casing for tiebreaker field (#57943 ) Use tiebreaker instead of tieBreaker (cherry picked from commit 3c774948a5d5e10fac267cb9a54f5d0559a00c1d)	2020-06-11 00:10:19 +03:00
Albert Zaharovits	c57ccd99f7	Just log 401 stacktraces (#55774 ) Ensure stacktraces of 401 errors for unauthenticated users are logged but not returned in the response body.	2020-06-10 20:39:32 +03:00
Valeriy Khakhutskyy	c0f368bbf3	[7.x][ML] Adjust assertion for job case memory usage estimates (#57929 ) Since we change the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-or-equal-than assertion. Backport of #57882	2020-06-10 15:17:16 +02:00
Aleksandr Maus	ec60335496	EQL: implement case sensitivity for indexOf and endsWith string functions (#57707 ) (#57908 ) * EQL: implement case sensitivity for indexOf and endsWith string functions	2020-06-10 08:55:49 -04:00
Andrei Dan	9f280621ba	[7.x] ILM add data stream support to searchable snapshot action (#57873 ) (#57916 ) (cherry picked from commit 34856a90532c6c62a53817bb395399c8a8c17c0f) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-10 10:16:57 +01:00
Yannick Welsch	80f221e920	Use clean thread context for transport and applier service (#57792 ) (#57914 ) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows to remove a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-10 10:30:28 +02:00
Hendrik Muhs	95bd7b63b0	[Transform] fix page size return in cat transform, add dps (#57871 ) fixes the page size reported after moving page size to settings(#56007) and adds documents per second(throttling) to the output. fixes #56498	2020-06-10 08:10:25 +02:00
Yang Wang	72a6441a88	Revert "Resolve anonymous roles and deduplicate roles during authentication (#53453 ) (#55995 )" (#57858 ) This reverts commit `84a2f1adf2`.	2020-06-10 10:42:52 +10:00
Jake Landis	a370d5eead	[7.x] Ensure Joni warnings are logged at debug (#57302 ) (#57897 ) When Joni, the regex engine that powers grok, emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution against that pattern. For ingest node that means for every document that is run through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warnings indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency than every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.	2020-06-09 17:06:29 -05:00
Yannick Welsch	9eec819c5b	Revert "Use clean thread context for transport and applier service (#57792 )" This reverts commit `259be236cf`.	2020-06-09 22:24:54 +02:00
Costin Leau	439205d1ea	EQL: Introduce tie breaker support (#57787 ) Allow a field inside the data to be used as a tie breaker for events that have the same timestamp. The field is optional by default. If used, the tie-breaker always requires a non-null value since it is used inside `search_after` which requires a non-null value. Fix #56824 (cherry picked from commit e5719ecb474b32730d93afdbb6834a32b0b2df8b)	2020-06-09 22:50:19 +03:00
Andrei Dan	3945712c72	[7.x] ILM add data stream support to the Shrink action (#57616 ) (#57884 ) The shrink action creates a shrunken index with the target number of shards. This makes the shrink action data stream aware. If the ILM managed index is part of a data stream the shrink action will make sure to swap the original managed index with the shrunken one as part of the data stream's backing indices and then delete the original index. (cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-09 19:45:22 +01:00
Nik Everett	44a79d1739	Deprecate Rounding#round (#57845 ) (#57893 ) This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in favor of calling ``` Rounding.Prepared prepared = rounding.prepare(min, max); ... prepared.round(val) ``` because it is always going to be faster to prepare once. There are going to be some cases where we won't know what to prepare for and in those cases you can call `prepareForUnknown` and still be faster than calling the deprecated method over and over and over again. Ultimately, this is important because it doesn't look like there is an easy way to cache `Rounding.Prepared` or any of its precursors like `LocalTimeOffset.Lookup`. Instead, we can just build it at most once per request. Relates to #56124	2020-06-09 14:30:56 -04:00
Dan Hermann	b501b282f8	Change default backing index naming scheme	2020-06-09 09:31:34 -05:00
Hossein Dehghan	2c6bd978d8	[Docs] Fix missing closing bracket for watcher webhook.asciidoc (#57803 )	2020-06-09 13:59:51 +02:00
Yannick Welsch	259be236cf	Use clean thread context for transport and applier service (#57792 ) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows to remove a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-09 12:32:28 +02:00
Andrei Stefan	3cc8166946	SQL: handle MIN and MAX functions on dates in Painless scripts (#57605 ) (#57863 ) * Convert to date/datetime the result of numeric aggregations (min, max) in Painless scripts (cherry picked from commit f1de99e2a6fbf3806c4f2b6b809738aa8faa2d75)	2020-06-09 10:09:01 +03:00
Benjamin Trent	d5522c2747	[ML] add new circuit breaker for inference model caching (#57731 ) (#57830 ) This adds new plugin level circuit breaker for the ML plugin. `model_inference` is the circuit breaker qualified name. Right now it simply adds to the breaker when the model is loaded (and possibly breaking) and removing from the breaker when the model is unloaded.	2020-06-08 16:02:48 -04:00
Armin Braun	0987c0a5f3	Fix Broken Numeric Shard Generations in RepositoryData (#57813 ) (#57821 ) Fix broken numeric shard generations when reading them from the wire or physically from the physical repository. This should be the cheapest way to clean up broken shard generations in a BwC and safe-to-backport manner for now. We can potentially further optimize this by also not doing the checks on the generations based on the versions we see in the `RepositoryData` but I don't think it matters much since we will read `RepositoryData` from cache in almost all cases. Closes #57798	2020-06-08 18:36:56 +02:00
Przemysław Witek	7a1300a09e	[7.x] Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808 ) (#57815 )	2020-06-08 17:41:12 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378 ) (#57771 ) Previously, to determine if a field is a meta-field, a static method of MapperService, isMetadataField, was used. This method was using an outdated static list of meta-fields. This PR instead changes this method to an instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Andrei Dan	1b84e93d83	[7.x] DataStream creation validation allows for prefixed indices (#57750 ) (#57799 ) We want to validate the DataStreams on creation to make sure the future backing indices would not clash with existing indices in the system (so we can always rollover the data stream). This changes the validation logic to allow for a DataStream to be created with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the former backing index (`foo-000001`) exists in the system. The new validation logic will look for potential index conflicts with indices in the system that have the counter in the name greater than the data stream's generation. This ensures that the `DataStream`'s future rollovers are safe because for a `DataStream` `foo` of generation 4, we will look for standalone indices in the form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if `foo-000006` exists in the system), but will also allow replacing a backing index with an index named by prefixing the backing index it replaces. (cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-08 13:31:52 +01:00
David Kyle	08d1286de7	[7.x] Delete expired data by job (#57337 ) (#57796 ) Deleting expired data can take a long time leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.	2020-06-08 13:00:23 +01:00
Luca Cavanna	7a06a13d99	Add description to submit and get async search, as well as cancel tasks (#57745 ) This makes it easier to debug where such tasks come from in case they are returned from the get tasks API. Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.	2020-06-08 11:17:29 +02:00
Luca Cavanna	06ef3042c1	Specify reason whenever async search gets cancelled (#57761 ) This makes it possible to trace where the cancel tasks request came from, given that it may be triggered for multiple reasons.	2020-06-08 10:25:31 +02:00
David Roberts	1d64d55a86	[7.x][ML] Add per-partition categorization option (#57723 ) This PR adds the initial Java side changes to enable use of the per-partition categorization functionality added in elastic/ml-cpp#1293. There will be a followup change to complete the work, as there cannot be any end-to-end integration tests until elastic/ml-cpp#1293 is merged, and also elastic/ml-cpp#1293 does not implement some of the more peripheral functionality, like stop_on_warn and per-partition stats documents. The changes so far cover REST APIs, results object formats, HLRC and docs. Backport of #57683	2020-06-06 08:15:17 +01:00
Benjamin Trent	9666a895f7	[ML] inference performance optimizations and refactor (#57674 ) (#57753 ) This is a major refactor of the underlying inference logic. The main change is that we now separate the model configuration from the inference interfaces. This has the following benefits: - we can store extra things with the model that are not necessary for inference (i.e. treenode split information gain) - we can optimize inference separate from model serialization and storage. - The user is oblivious to the optimizations (other than seeing the benefits). A major part of this commit is removing all inference related methods from the trained model configurations (ensemble, tree, etc.) and moving them to a new class. This new class satisfies a new interface that is ONLY for inference. The optimizations applied currently are: - feature maps are flattened once - feature extraction only happens once at the highest level (improves inference + feature importance throughput) - Only storing what we need for inference + feature importance on heap	2020-06-05 14:20:58 -04:00
Jake Landis	459ab9a0b2	[7.x] Ensure type exists for all monitoring configuration (#57399 ) (#57704 ) #47711 and #47246 helped to validate that monitoring settings are rejected at time of setting the monitoring settings. Otherwise an invalid monitoring setting can find its way into the cluster state and result in an exception thrown [1] on the cluster state application (thereby causing significant issues). Some additional monitoring settings have been identified that can result in invalid cluster state that also result in exceptions thrown on cluster state application. All settings require a type of either http or local to be applicable. When a setting is changed, the exporters are automatically updated with the new settings. However, if the old or new settings lack a type setting, an exception will be thrown (since exporters are always of type 'http' or 'local'). Arguably we shouldn't blindly create and destroy new exporters on each monitoring setting update, but the lifecycle of the exporters is a bit outside the scope of what this PR is trying to address. This commit introduces a similar methodology to check for validity as #47711 and #47246 but this time for ALL (including non-http) settings. Monitoring settings are not useful unless there is an exporter with a type defined. The type is used as a dependent setting, such that it must exist to set the value. This ensures that when any monitoring settings change, they can only get added to cluster state if the type exists. If the type exists (and the other validations pass) then the exporters will get re-built and the cluster state remains valid. Tests have been included to ensure that all dynamic monitoring settings have the type as dependent settings. [1] org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]	2020-06-05 10:47:11 -05:00
Dimitris Athanasiou	f49a14ce6f	[7.x][ML] Fix race condition when force stopping DF analytics job (#57680 ) (#57717 ) When we force delete a DF analytics job, we currently first force stop it and then we proceed with deleting the job config. This may result in logging errors if the job config is deleted before it is retrieved while the job is starting. Instead of force stopping the job, it would make more sense to try to stop the job gracefully first. So we now try that out first. If normal stop fails, then we resort to force stopping the job to ensure we can go through with the delete. In addition, this commit introduces `timeout` for the delete action and makes use of it in the child requests. Backport of #57680	2020-06-05 17:50:01 +03:00
Tanguy Leroux	0e57528d5d	Remove more //NORELEASE (#57517 ) We agreed on removing the following //NORELEASE tags.	2020-06-05 15:34:06 +02:00
Hendrik Muhs	61c496d320	[Transform] use old roles only together with old endpoints (#57710 ) avoids a CI failure if new endpoints used together with old roles and warnings are asserted.	2020-06-05 10:08:05 +02:00
Hendrik Muhs	e91b975878	[Transform] mark old data frame transform roles deprecated (#57655 ) mark old data frame transform roles deprecated fixes #50087	2020-06-05 09:20:35 +02:00
Hendrik Muhs	c1c8817eae	[7.x][Transform] improve update API (#57685 ) rewrite config on update if either version is outdated, credentials change, the update changes the config or deprecated settings are found. Deprecated settings get migrated to the new format. The upgrade can be easily extended to do any necessary re-writes. fixes #56499 backport #57648	2020-06-05 08:48:47 +02:00
Jake Landis	f4a3d969ad	[7.x] Ensure default watches are updated for rolling upgrades. (#57185 ) (#57563 ) For a rolling/mixed cluster upgrade (add new version to existing cluster then shutdown old instances), the watches that ship by default with monitoring may not get properly updated to the new version. Monitoring watches can only get published if the internal state is marked as dirty. If a node is not master, it will also get marked as clean (e.g. not dirty). For a mixed cluster upgrade, it is possible for the new node to be added, not as master, and the internal state gets marked as clean so that no more attempts can be made to publish the watches. This happens on all new nodes. Once the old nodes are de-commissioned one of the new version nodes in the cluster gets promoted to master. However, that new master node (without intervention like restarting the node or removing/adding exporters) will never attempt to re-publish since the internal state was already marked as clean. This commit adds a cluster state listener to mark the resource dirty when a node is promoted to master. This will allow the new resource to be published without any intervention.	2020-06-04 16:44:36 -05:00
William Brafford	dfb6def3da	Revert "Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383 )" This reverts commit `7a67fb2d04`.	2020-06-04 16:25:05 -04:00
Ioannis Kakavas	8afd55ebe6	Disable testing conventions for idp in fips (#57663 ) (#57676 ) Since we disable both integTest and test tasks. This should have been part of #57048 but we missed it.	2020-06-04 20:51:38 +03:00
Ioannis Kakavas	af9f9d7f03	[7.x] Add http proxy support for OIDC realm (#57039 ) (#57584 ) This change introduces support for using an http proxy for egress communication of the OpenID Connect realm.	2020-06-04 20:51:00 +03:00
William Brafford	7a67fb2d04	Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383 ) In #55592 and #55416, we deprecated the settings for enabling and disabling basic license features and turned those settings into no-ops. Since doing so, we've had feedback that this change may not give users enough time to cleanly switch from non-ILM index management tools to ILM. If two index managers operate simultaneously, results could be strange and difficult to reconstruct. We don't know of any cases where SLM will cause a problem, but we are restoring that setting as well, to be on the safe side. This PR is not a strict commit reversion. First, we are keeping the new xpack.watcher.use_ilm_index_management setting, introduced when xpack.ilm.enabled was made a no-op, so that users can begin migrating to using it. Second, the SLM setting was modified in the same commit as a group of other settings, so I have taken just the changes relating to SLM.	2020-06-04 13:38:22 -04:00
Mark Vieira	9b0f5a1589	Include vendored code notices in distribution notice files (#57017 ) (#57569 ) (cherry picked from commit 627ef279fd29f8af63303bcaafd641aef0ffc586)	2020-06-04 10:34:24 -07:00
Przemysław Witek	6b5f49d097	[7.x] Introduce ModelPlotConfig. annotations_enabled setting (#57539 ) (#57641 )	2020-06-04 15:15:35 +02:00
Benjamin Trent	ea9b8b9d41	[ML] fix setting forecasts to failed method (#57654 ) (#57656 )	2020-06-04 08:54:46 -04:00
Rene Groeschke	751f16858b	Remove duplicate ssl setup in sql/qa projects (#57319 ) (#57643 ) * Remove duplicate ssl setup in sql/qa projects * Fix enforcement of task instances * Use static data for cert generation * Move ssl testing logic into a plugin * Document test cert creation	2020-06-04 14:53:23 +02:00
Marios Trivyzas	5f8442d1f4	SQL: Improve performance of LTRIM/RTRIM (#57603 ) Change the custom implementation for stripping leading and trailing whitespace to substantially improve performance: ``` Benchmark Mode Cnt Score Error Units StringTrim.testWithStringBuilder avgt 25 82547.575 ± 66.244 ns/op (existing impl) StringTrim.testWithSubstring avgt 25 1398.762 ± 101.152 ns/op (new impl) StringTrim.testWithJavaStrip avgt 25 1186.120 ± 10.374 ns/op (for reference) ``` Java's string stripLeading()/stripTrailing() are not available on all supported JDKs. Enhanced LENGTH unit tests and combined a couple of LTRIM/RTRIM integ tests. Relates to: #57594 (partially cherry picked from commit ee7868d68733f195dc46926a7eab3d9dd7033ef4) Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>	2020-06-04 13:43:49 +02:00
Igor Motov	8d7f389f3a	Increase search.max_buckets to 65,535 (#57042 ) Increases the default search.max_buckets limit to 65,535, and only counts buckets during reduce phase. Closes #51731	2020-06-03 15:35:41 -04:00
Julie Tibshirani	e0a15e8dc4	Remove the 'array value parser' marker interface. (#57571 ) (#57622 ) This PR replaces the marker interface with the method FieldMapper#parsesArrayValue. I find this cleaner and it will help with the fields retrieval work (#55363). The refactor also ensures that only field mappers can declare they parse array values. Previously other types like ObjectMapper could implement the marker interface and be passed array values, which doesn't make sense.	2020-06-03 11:30:14 -07:00
Marios Trivyzas	a674844893	SQL: Implement TRIM function (#57518 ) (#57593 ) Add `TRIM` function which combines the functionality of both `LTRIM` and `RTRIM` by stripping both leading and trailing whitespaces. Refers to #41195 (cherry picked from commit 6c86c919e12f0c4cb5e39d129aa65ab3e274268f)	2020-06-03 15:19:48 +02:00
Ioannis Kakavas	64583f7ec4	Mute EmailSslTests test case in fips (#57576 ) (#57577 ) We test expected TLS failures by catching SSLException, but other security providers ( i.e. BCFIPS ) might throw a different one. In this case, BCFIPS throws org.bouncycastle.tls.TlsFatalAlert	2020-06-03 11:23:31 +03:00
Marios Trivyzas	634936e3be	SQL: [Tests] Enable tests which have been fixed (#57526 ) (#57538 ) Enable integration tests for issues that have been fixed over time. (cherry picked from commit 117759ee152bcfb0043e5af3a784302ca31f6b8c)	2020-06-02 23:38:33 +02:00
Nik Everett	2a27c411fb	Save memory when geo aggregations are not on top (#57483 ) (#57551 ) Saves memory when the `geotile_grid` and `geohash_grid` are not on the top level by using the `LongKeyedBucketOrds` we built in #55873.	2020-06-02 16:21:50 -04:00
Dan Hermann	97a51272b0	Fix incorrect log warning when exporting monitoring via HTTP without authentication (#57552 )	2020-06-02 15:03:55 -05:00
Mark Tozzi	e50f514092	IndexFieldData should hold the ValuesSourceType (#57373 ) (#57532 )	2020-06-02 12:16:53 -04:00
Armin Braun	ba2d70d8eb	Serialize Outbound Messages on IO Threads (#56961 ) (#57080 ) Almost every outbound message is serialized to buffers of 16k pagesize. We were serializing these messages off the IO loop (and retaining the concrete message instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the channel is ready. 1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average. 2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically. With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable. Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC. This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain). I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once. Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.	2020-06-02 16:15:18 +02:00
Rene Groeschke	8584da40af	Move classes from build scripts to buildSrc (#57197 ) (#57512 ) * Move classes from build scripts to buildSrc - move Run task - move duplicate SanEvaluator * Remove :run workaround * Some little cleanup on build scripts on the way	2020-06-02 15:33:53 +02:00
Andrei Dan	bd188f4a21	[7.x] ILM: add support for rolling over data streams (#57295 ) (#57515 ) As the datastream information is stored in the `ClusterState.Metadata` we exposed the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for the steps to be able to identify when a managed index is part of a DataStream. If a managed index is part of a DataStream the rollover target is the DataStream name and the highest generation index is the write index (ie. the rolled index). (cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-06-02 11:55:23 +01:00
Przemysław Witek	ea6cfb7c3d	[7.x] Make Annotation a result type (#56342 ) (#57508 )	2020-06-02 11:56:41 +02:00
Tanguy Leroux	b4a2cd810a	Use 3rd party task to run integration tests on external service (#56588 ) Backport of #56587 for 7.x	2020-06-02 11:26:58 +02:00
Marios Trivyzas	52c555e286	SQL: Make CASTing string to DATETIME more lenient (#57451 ) (#57509 ) Some BI tools (i.e. Tableau) would try to cast strings where the time part is separated from the date part with a whitespace instead of `T`. Adjust type conversion used by CAST to support this. (cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)	2020-06-02 10:54:03 +02:00
Marios Trivyzas	b8a13de20f	SQL: Implement TOP as an alternative to LIMIT (#57428 ) (#57507 ) Add basic support for `TOP X` as a synonym to LIMIT X which is used by [MS-SQL server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15), e.g.: ``` SELECT TOP 5 a, b, c FROM test ``` TOP in SQL server also supports the `PERCENTAGE` and `WITH TIES` keywords which this implementation doesn't. Don't allow usage of both TOP and LIMIT in the same query. Refers to #41195 (cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)	2020-06-02 10:53:42 +02:00
Przemysław Witek	ceb4b29b98	Introduce Annotation.event field (#57144 ) (#57453 )	2020-06-01 20:42:25 +02:00
Mark Tozzi	1f500583b1	Clean up Aggregator Supplier Boiler Plate (#57442 ) (#57452 )	2020-06-01 14:21:07 -04:00
Zachary Tong	daaf5a3dcc	Fix assertion catching in aggregation supported type test (#56466 ) (#57382 ) At some point, we changed the supported-type test to also catch assertion errors. This has the side effect of also catching the `fail()` call inside the try-catch, which silently smothered some failures. This modifies the test to throw at the end of the try-catch block to prevent from accidentally catching itself. Catching the AssertionError is convenient because there are other locations that do throw an assertion in tests (due to hitting an assertion before the exception is thrown) so I think we should keep it around. Also includes a variety of fixes to other tests which were failing but being silently smothered.	2020-06-01 12:10:05 -04:00
David Kyle	064093c4d4	Fix compilation after backport of #57278	2020-06-01 12:03:13 +01:00
Przemysław Witek	72ad9a4548	[7.x] Make AnnotationPersister use bulk requests instead of indexing individual documents (#57278 ) (#57354 )	2020-06-01 12:05:09 +02:00
David Roberts	9fdf1722e6	[TEST] Fix more allowed warnings for composable template rename (#57398 ) Should have been done in #57232	2020-05-31 18:14:48 +01:00
Benjamin Trent	34f1e0b6bb	[7.x] [ML] mark forecasts for force closed/failed jobs as failed (#57143 ) (#57374 ) * [ML] mark forecasts for force closed/failed jobs as failed (#57143) forecasts that are still running should be marked as failed/finished in the following scenarios: - Job is force closed - Job is re-assigned to another node. Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted. Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index. relates to https://github.com/elastic/elasticsearch/issues/56419	2020-05-29 14:48:10 -04:00
Benjamin Trent	35d5126cea	[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351 ) (#57368 ) * [ML] adds new for_export flag to GET _ml/inference API (#57351) Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API. This flag is useful for moving models between clusters.	2020-05-29 14:01:08 -04:00
Benjamin Trent	15aba60c02	[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695 ) (#57359 ) * Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) This commit lays the ground work for plugins supplying their own circuit breakers. It adds a new interface: `CircuitBreakerPlugin`. This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers. With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.	2020-05-29 12:13:46 -04:00
Benjamin Trent	c8374dc9f3	[ML] add max_model_memory parameter to forecast request (#57254 ) (#57355 ) This adds a max_model_memory setting to forecast requests. This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb"). The default value is `20mb`. There is a HARD limit at `500mb` which will throw an error if used. If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited. related native change: https://github.com/elastic/ml-cpp/pull/1238 closes: https://github.com/elastic/elasticsearch/issues/56420	2020-05-29 11:16:08 -04:00
Marios Trivyzas	b2651323fd	SQL: Implement TIME_PARSE function for parsing strings into TIME values (#55223 ) (#57342 ) Implement TIME_PARSE(<time_str>, <pattern_str>) function which allows to parse a time string according to the specified pattern into a time object. The patterns allowed are those of java.time.format.DateTimeFormatter. Closes #54963 Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com> (cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)	2020-05-29 15:48:37 +02:00
Dan Hermann	6b0d707671	[7.x] Do not report negative values for swap sizes (#57353 )	2020-05-29 08:11:47 -05:00
Martijn van Groningen	04ef39da77	Change cluster info actions to be able to resolve data streams. (#57343 ) Backport of #56878 to 7.x branch. With this change the following APIs will be able to resolve data streams: get index, get mappings and ilm explain APIs. Relates to #53100	2020-05-29 12:17:53 +02:00
Dimitris Athanasiou	322f953060	[7.x][ML] Anomaly detection jobs should allow missing values for geo fields (#57300 ) (#57338 ) Allows geo fields (`geo_point`, `geo_shape`) to have missing values. Fixes a bug where such missing values would result in an error. Closes #57299 Backport of #57300	2020-05-29 13:06:16 +03:00
Benjamin Trent	24d605e41e	[ML] fixing GET _ml/inference so size param is respected (#57303 ) (#57308 ) `size` was previously ignored when grabbing full trained model configs. closes https://github.com/elastic/elasticsearch/issues/57298	2020-05-28 15:45:26 -04:00
Martijn van Groningen	225ccd1cfa	Ensure template exists when creating data stream (#57275 ) Backporting #56888 to 7.x branch. Limit the creation of data streams only for namespaces that have a composable template with a data stream definition. This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover. Also remove `timestamp_field` parameter from create data stream request and let the create data stream api resolve the timestamp field from the data stream definition snippet inside a composable template. Relates to #53100	2020-05-28 15:08:25 +02:00
Marios Trivyzas	fdac9e99fa	SQL: Fix unnecessary evaluation for CASE/IIF (#57159 ) (#57262 ) Previously, when `CASE` and `IIF` were translated to painless scripts (used in GROUP BY, HAVING, WHERE), a custom `caseFunction` registered in the `InternalSqlScriptUtils` was used. This function received an array of arbitrary length: ```[condition1, result1, condition2, result2, ... elseResult]``` Painless doesn't know of the context and therefore evaluates all conditions and results before invoking the `caseFunction` on them. As a consequence, erroneous result expressions (e.g. division by 0) were always evaluated regardless of the guarding condition. Replace the `caseFunction` with painless `<cond> ? <res1> : <res2>` expressions to properly guard the result expressions and only evaluate the one for which its guarding condition evaluates to true (or of course the elseResult). As a bonus, this approach includes performance benefits since we avoid unnecessary evaluations of both conditions and result expressions. Fixes: #49672 (cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)	2020-05-28 11:30:14 +02:00
Tim Vernum	408250dcc4	Fix smtp.ssl.trust setting for watcher email (#57268 ) The ssl.trust setting for Watcher provides a list of hostnames that should be automatically trusted for SSL hostname verification. It was accidentally broken when we added the full ssl.* settings for email notifications (see #45272) This commit corrects this, so the setting is once again respected, as long as none of the other ssl settings are configured for email notifications. Resolves: #52153 Backport of: #56090	2020-05-28 17:34:13 +10:00
Ryan Ernst	fdb8573413	Convert remaining compilerJavaHome reference	2020-05-27 17:04:04 -07:00
Ryan Ernst	beb1d0c338	Remove compiler java version flag (#57237 ) This commit removes the compiler.java setting from the build. It was originally added when Gradle was far behind support for the latest jdk, but is no longer applicable as we don't have any need to update the supported compile version before gradle supports the newer version. Note that the runtime version changing support still exists here, this only ensures we use the same jdk to compile as we use to run gradle.	2020-05-27 16:33:38 -07:00
David Roberts	d139a79ef6	[7.x][ML] Fix monitoring if orphaned anomaly detector persistent tasks exist (#57240 ) Since #51888 the ML job stats endpoint has returned entries for jobs that have a persistent task but not job config. Such orphaned tasks caused monitoring to fail. This change ignores any such corrupt jobs for monitoring purposes. Backport of #57235	2020-05-27 22:59:11 +01:00
James Baiera	3b73ce3112	Fix enrich coordinator to reject documents instead of deadlocking (#56247 ) (#57179 ) This PR removes the blocking call to insert ingest documents into a queue in the coordinator. It replaces it with an offer call which will throw a rejection exception in the event that the queue is full. This prevents deadlocks of the write threads when the queue fills to capacity and there is more than one enrich processor in a pipeline.	2020-05-27 15:32:13 -04:00
Lee Hinman	c0f732b9f6	[7.x] Rename template V2 classes to ComposableTemplate (#57183 ) (#57232 ) Backports the following commits to 7.x: Rename template V2 classes to ComposableTemplate (#57183)	2020-05-27 11:01:59 -06:00
AndyHunt66	6760c69783	[DOCS] Fix formatting of create API key API docs (#57138 )	2020-05-27 08:34:51 -04:00
Tal Levy	81060820e9	Fix NormalizerAgg test searcher wrapping (#57171 ) The searcher was randomly wrapping its reader as slow, parallel, or filtered. This was causing casting issues in the normalizer tests. By removing the wrapping, the problem goes away. Closes #57164	2020-05-26 13:25:19 -07:00
Benjamin Trent	decc6277f9	[ML] allow unran/incomplete forecasts to be deleted for stopped/failed jobs (#57152 ) (#57172 ) If a job is NOT opened, forecasts should be able to be deleted, no matter their state. This also fixes a bug with expanding forecast IDs. We should check for wildcard `*` and `_all` when expanding the ids closes https://github.com/elastic/elasticsearch/issues/56419	2020-05-26 15:44:22 -04:00
Bogdan Pintea	74b2c8a770	Change error message for comp against fields (#57126 ) Change the error message wording for comparisons against fields in filtering (s/variables/fields). (cherry picked from commit d9a1cb50940d0a98fd75b9c0123ca6e1d862f65d)	2020-05-26 17:57:51 +02:00
Bogdan Pintea	0c379e334a	SQL: update the JLine dependency to 3.14.1 (#57111 ) * Update the JLine dependency to 3.14.1 Update the JLine dependency from 3.10.0 to 3.14.1. (cherry picked from commit c2d9b74046fa5ddb54604da3afa7887cc38548a1)	2020-05-26 17:56:34 +02:00
markharwood	b2bc6071fd	Add regex query support to wildcard field (approach 2) (#55548 ) (#57141 ) Backport of #55548 Adds equivalence for keyword field to the wildcard field. Regex, fuzzy, wildcard and prefix queries are all supported. All queries use an approximation query backed by an automaton-based verification query. Closes #54275	2020-05-26 16:55:59 +01:00
markharwood	1d74549d7f	Wildcard field - add support for null field with test (#57047 ) (#57139 ) Backport of #57047	2020-05-26 16:07:49 +01:00
David Kyle	571477d0ad	[7.x] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041 ) (#57136 ) Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.	2020-05-26 10:56:42 +01:00
Ioannis Kakavas	6984b3ef6f	Adjust reload keystore test to pass in FIPS (#57050 ) (#57133 ) In KeystoreWrapper class we determine if the error to decrypt a given keystore is caused by a wrong password based on the exception that the SunJCE implementation of AES is throwing (AEADBadTagException). Other implementations from other Security Providers might cause decryption to fail in a different way and cause us to throw a generic error message. We handle this in this test by matching both possible exception messages. Relates: #56889	2020-05-26 11:21:50 +03:00
Ioannis Kakavas	1e03de4999	Fix key usage in SamlAuthenticatorTests (#57124 ) (#57129 ) In #51089 where SamlAuthenticatorTests were refactored, we failed to update one test case, which meant that a single key would be used both for signing and encryption in the same run. As explained in #51089, and due to FIPS 140 requirements, BouncyCastle FIPS provider will block RSA keys that have been used for signing from being used for encryption and vice versa. This commit changes testNoAttributesReturnedWhenTheyCannotBeDecrypted to always use the specific keys we have added for encryption.	2020-05-26 10:51:47 +03:00
Jim Ferenczi	52443d41cf	Stop async search maintenance service on restart (#56982 ) This change ensures that we stop the maintenance service on all nodes when a data node is restarted. This ensures that we don't send update_by_query requests on the node that is restarted. This commit also raises the log level to trace for some packages in order to investigate the failures to acquire a shard lock after a restart. Relates #56765	2020-05-26 09:30:33 +02:00
Przemysław Witek	ea2012778e	Mute failing test (#57112 ) (#57113 )	2020-05-25 14:06:29 +02:00
Ioannis Kakavas	174af2bb1a	[7.x] Refactor SamlAuthenticatorTests (#51089 ) (#57105 ) - Use opensaml to sign and encrypt responses/assertions/attributes instead of doing this manually - Use opensaml to build response and assertion objects instead of parsing xml strings - Always use different keys for signing and encryption. Due to FIPS 140 requirements, BouncyCastle FIPS provider will block RSA keys that have been used for signing from being used for encryption and vice versa. This change adds new encryption specific keys to be used throughout the tests.	2020-05-25 14:09:42 +03:00
Ioannis Kakavas	6c832fe4e3	Don't run IDP tests in FIPS 140 mode (#57048 ) (#57098 ) We don't support this for now so there is no need to handle all the test logic/exceptions to run this in FIPS 140 mode.	2020-05-25 14:08:48 +03:00
Armin Braun	9fa60f7367	Add History UUID Index Setting (#56930 ) (#57104 ) Prerequisite for #50278 to be able to uniquely identify index metadata by its version fields and UUIDs when restoring into closed indices.	2020-05-25 11:26:03 +02:00
Rene Groeschke	28920a45f1	Improve usage of gradle task avoidance api (#56627 ) (#56981 ) Use gradle task avoidance api wherever it is possible as a drop in replacement in the es build	2020-05-25 09:37:33 +02:00
Marios Trivyzas	b91bae30b1	SQL: [Tests] Move JDBC integration tests to new module (#56872 ) (#57072 ) Move the JDBC functionality integration tests from `:sql:qa` to a separate module `:sql:qa:jdbc`. This way the tests are isolated from the rest of the integration tests and they only depend on the `:sql:jdbc` module, thus removing the danger of accidentally pulling in some dependency that may hide bugs. Moreover this is a preparation for #56722, so that we can run those tests between different JDBC and ES node versions and ensure forward compatibility. Move the rest of existing tests inside a new `:sql:qa:server` project, so that the `:sql:qa` becomes the parent project for both and one can run all the integration tests by using this parent project. (cherry picked from commit c09f4a04484b8a43934fe58fbc41bd90b7dbcc76)	2020-05-22 17:49:36 +02:00
Ioannis Kakavas	6c90727166	Fix custom policy in plugins in FIPS 140 (#52046 ) (#57049 ) Our FIPS 140 testing depends on setting the appropriate java policy in order to configure the JVM in FIPS mode. Some tests ( discovery-ec2 and ccr qa ) also needed to set a custom policy file to grant a specific permission, which overwrote the FIPS related policy and tests would fail. This change ensures that when a custom policy needs to be set in these tests, the permissions that are necessary for FIPS are also set. Resolves: #51685, #52034	2020-05-21 19:26:56 +03:00
Benjamin Trent	f00dfb2d5f	[ML] adds WKT support in filestructurefinder (#57014 ) (#57032 ) Field mapping detection is done via grok patterns. This commit adds well-known text (WKT) formatted geometry detection. If everything is a `POINT`, then a `geo_point` mapping is preferred. Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred. This does NOT detect other types of formatted geometries (geohash, comma delimited points, etc.) closes https://github.com/elastic/elasticsearch/issues/56967	2020-05-21 08:22:51 -04:00
markharwood	eb8cb31d46	Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024 ) Same version as master	2020-05-21 11:28:16 +01:00
James Rodewig	37e2bb7057	[DOCS] Add watcher multi-doc index ex (#52040 ) (#57011 ) Adds an example snippet for creating a `_doc` payload field with the Watcher `index` action. Co-authored-by: Luiz Guilherme Pais dos Santos <luiz.santos@elastic.co>	2020-05-20 16:57:45 -04:00
Brandon Morelli	ec41d36c62	docs: update links to beats security docs (#56875 ) (#56953 )	2020-05-20 11:28:39 -07:00
Bogdan Pintea	ec4a6aa1c6	SQL: JDBC: fix temporary directory locked test errors in Windows (#56917 ) * Fix temp dir locked errors The tests involving a temporary directory (containing the JDBC JAR) fail on Windows because they can't be deleted, due to still being in use. This commit forces a premature closing of the JAR file, which mitigates the failure by giving the JVM more time to collect any open FDs. (Calling the System.gc() in the tests is another working alternative fix.) The stream-based JAR access is taken care by disabling the cache usage (cherry picked from commit 04f97333a015404a68e8f19223f33aadeb396687)	2020-05-20 19:46:57 +02:00
Florian Kelbert	edada6bc39	[Docs] Insert missing colon (#56980 )	2020-05-20 15:49:17 +02:00
Benjamin Trent	ee4ce8ecec	Fix geotile_grid group_by field mapping (#56939 ) (#56990 ) The original implementation utilized `bbox` as the index mapping type. This would not work as it would have to be `envelope`. But, given that `envelope` and `polygon` are tessellated in the same way, we choose to use `polygon` as the geo_shape type. This is for easier support other places in the stack (a la kibana maps)	2020-05-20 08:22:13 -04:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Marios Trivyzas	644ae49817	SQL: Fix behaviour of COUNT(DISTINCT <literal>) (#56869 ) (#56932 ) Previously `COUNT(DISTINCT <literal>)` was returning the same result as `COUNT(<literal>)` which is not correct as it should always return 1 if there is at least one matching row (bucket if there is a GROUP BY), or 0 otherwise. (cherry picked from commit 7f7d7562d43034907f432d39d0d66f490d78f4a8)	2020-05-19 11:19:06 +02:00
Yannick Welsch	f296c08021	Increase timeout for assertLongBusy in AutoFollowIT (#56910 ) Closes #56891	2020-05-18 16:20:46 +02:00
Benjamin Trent	297f864884	[ML] relax throttling on expired data cleanup (#56711 ) (#56895 ) Throttling nightly cleanup as much as we do has been over cautious. Night cleanup should be more lenient in its throttling. We still keep the same batch size, but now the requests per second scale with the number of data nodes. If we have more than 5 data nodes, we don't throttle at all. Additionally, the API now has `requests_per_second` and `timeout` set. So users calling the API directly can set the throttling. This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`. This will allow users to adjust throttling of the nightly maintenance.	2020-05-18 08:46:42 -04:00
David Kyle	0fac152188	Mute AsyncSearchActionIT (#56897 ) For #56765	2020-05-18 13:36:33 +01:00
Ioannis Kakavas	bb852ab2e7	Cause is tracked in #49094 (#56887 )	2020-05-18 15:03:38 +03:00

5936 Commits