OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	9a8225bbc1	Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450 ) (#62476 ) This new snapshot contains the following JIRAs that we're interested in: - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525) Better handling of small documents. This should improve retrieval times when documents are less than ~1kB. - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510) Faster flushes when index sorting is enabled by not compressing the temporary files that store stored fields and term vectors.	2020-09-17 10:28:20 +02:00
Luca Cavanna	26388fe22e	Runtime fields: rename fielddata and mapped field type classes (#62483 ) With this commit we rename all of the fielddata, doc_values and mapped field type classes for runtime fields to not start with the Script prefix but rather their runtime type (e.g. Boolean) and only then Script	2020-09-17 09:14:30 +02:00
Ignacio Vera	2d3ca9c155	Introduce a sparse HyperLogLogPlusPlus class for cloning and serializing low cardinality buckets (#62480 ) (#62520 ) Reduces the memory footprint of an HLL++ structure that uses Linear counting when cloning or deserialising the data structure.	2020-09-17 08:54:50 +02:00
Costin Leau	ceaf96061c	EQL: Fetch sequence documents using Point-In-Time (#62469 ) To preserve the PIT semantics, the retrieval of results has moved from using multi-get to using an idsQuery. (cherry picked from commit 1c2362fcf2be62ce568b3772924abce7331ef23c)	2020-09-17 00:12:19 +03:00
Luca Cavanna	1e352fdb7f	Runtime fields: rename script classes (#62448) With this commit we rename the script classes used for each mapped field type used for runtime fields. The new naming is a shorter version of the previous one: from e.g. BooleanScriptFieldScript to BooleanScript. We also move such classes to the existing mapper package.	2020-09-16 18:00:06 +02:00
Christoph Büscher	f8634e5bea	Muting SimpleSecurityNetty4ServerTransportTests	2020-09-16 15:14:08 +02:00
Benjamin Trent	341eeae6e7	[ML] fixes testWatchdog test verifying matcher is interrupted on timeout (#62391) (#62447) Constructing the timeout checker FIRST and THEN registering the watcher allows the test to have a race condition. The timeout value could be reached BEFORE the matcher is added. To prevent the matcher never being interrupted, a new timedOut value is added to the watcher thread entry. Then when a new matcher is registered, if the thread was previously timed out, we interrupt the matcher immediately. closes #48861	2020-09-16 09:13:22 -04:00
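The race described in this commit can be illustrated with a small sketch. This is not the actual ML watchdog code: `Matcher`, `WatchdogEntry`, and their method names are hypothetical stand-ins, and the only idea taken from the commit message is the `timedOut` flag that is checked when a matcher is registered.

```python
import threading


class Matcher:
    """Hypothetical stand-in for an interruptible matcher."""

    def __init__(self):
        self.interrupted = False

    def interrupt(self):
        self.interrupted = True


class WatchdogEntry:
    """Per-thread entry that remembers whether the timeout already fired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.timed_out = False
        self.matchers = []

    def register(self, matcher):
        with self._lock:
            if self.timed_out:
                # The timeout fired before this matcher was added,
                # so interrupt it immediately instead of waiting forever.
                matcher.interrupt()
            else:
                self.matchers.append(matcher)

    def timeout(self):
        with self._lock:
            # Record the timeout so later registrations can see it.
            self.timed_out = True
            for matcher in self.matchers:
                matcher.interrupt()
```

A matcher registered after `timeout()` has run is interrupted on registration, which is the case the original test could miss.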
Lyudmila Fokina	167172a057	Update authc failure headers on license change (#61734 ) (#62442 ) Backport of #61734	2020-09-16 14:37:03 +02:00
Benjamin Trent	8d89a28126	[ML] unmuting test for testTooManyPartitions memory check on windows (#62393 ) (#62405 ) This commit unmutes the windows check for testTooManyPartitions test. The assertion has since changed to include a soft_limit check. This coupled with changes over the past years means the test should be enabled again. related to: #32033	2020-09-16 07:03:10 -04:00
Christoph Büscher	6a016fb755	Muting LogstashSystemIndexIT.testPipelineCRUD	2020-09-16 11:04:41 +02:00
Hendrik Muhs	8566e9e3e7	[Transform] Make pivot validation sub-agg aware (#62381 ) With the addition of sub aggregations like filter, the validation could fail if 2 sub aggs use the same output name. This change makes validation sub-agg aware. fixes #57814	2020-09-16 07:55:58 +02:00
Yang Wang	a11dfbe031	Oidc additional client auth types (#58708) (#62289) The OpenID Connect specification defines a number of ways for a client (RP) to authenticate itself to the OP when accessing the Token Endpoint. We currently only support `client_secret_basic`. This change introduces support for 2 additional authentication methods, namely `client_secret_post` (where the client credentials are passed in the body of the POST request to the OP) and `client_secret_jwt` where the client constructs a JWT and signs it using the client secret as a key. Support for the above, and especially `client_secret_jwt` in our integration tests meant that the OP we use (Connect2id server) should be able to validate the JWT that we send it from the RP. Since we run the OP in docker and it listens on an ephemeral port we would have no way of knowing the port so that we can configure the ES running via the testcluster to know the "correct" Token Endpoint, and even if we did, this would not be the Token Endpoint URL that the OP would think it listens on. To alleviate this, we run an ES single node cluster in docker, alongside the OP so that we can configure it with the correct hostname and port within the docker network. Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>	2020-09-16 14:29:09 +10:00
Nik Everett	24a24d050a	Implement fields fetch for runtime fields (backport of #61995 ) (#62416 ) This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332	2020-09-15 20:24:10 -04:00
Nik Everett	e5ad3a41f1	Check for runtime field loops in queries (backport of #61927 ) (#62421 ) We were checking for loops in queries before, but we had an "off by one" error where we wouldn't notice the "top level" runtime field when detecting a loop. So the error message would be wrong. I also caught a few bugs with query generation caused by missing `@Override` annotations and fixed a few of them. There is a bug with `regexp` queries with match options that I'm not fixing in this PR but will get to later. Relates to #59332	2020-09-15 17:24:19 -04:00
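The "off by one" described in this commit, where the top-level runtime field was left out of loop detection, can be sketched as follows. This is an illustrative sketch only: `check_for_loop`, the `references` mapping, and the error text are hypothetical and not taken from the Elasticsearch code base.

```python
def check_for_loop(field, references, path=()):
    """Raise ValueError if resolving `field` leads back to a field already on the path.

    `references` maps a field name to the field names its script reads.
    The top-level field is put on the path first, so a loop that returns
    to it is reported with the complete chain.
    """
    path = path + (field,)
    for ref in references.get(field, ()):
        if ref in path:
            loop = path[path.index(ref):] + (ref,)
            raise ValueError("Cyclic dependency detected: " + " -> ".join(loop))
        check_for_loop(ref, references, path)
```

With `references = {"a": ["b"], "b": ["a"]}`, calling `check_for_loop("a", references)` reports the chain starting at `a`, the top-level field.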
Costin Leau	b2e85d5639	SQL: Do not resolve self-referencing aliases (#62382) Prevent the analyzer from trying to resolve aliases on expressions that reference themselves (or fields within themselves) as that causes infinite recursion. Fix #62296 (cherry picked from commit 021d27815b03e92e02859bc9c0c8eec78f30c72e)	2020-09-15 20:53:28 +03:00
Armin Braun	9ac4ee9c44	Increase Flaky Timeout in testIlmHistoryIndexCanRollover (#62353 ) (#62402 ) This busy assert easily takes about 5s on a very fast work station so the default of 10s is not sufficient here at all.	2020-09-15 19:50:45 +02:00
Nik Everett	771a8893a6	Add more debugging information for cardinality agg (#62317 ) (#62397 ) This adds two extra bits of info to the profiler: 1. Count of the number of different types of collectors. This lets us figure out if we're using the optimization for segment ordinals. It adds a few more similar counters just for good measure. 2. Profiles the `getLeafCollector` and `postCollection` methods. These are non-trivial for some aggregations, like cardinality.	2020-09-15 13:21:11 -04:00
William Brafford	af64e46065	Add logstash system index APIs (#53350 ) (#62347 ) We want Logstash indices to be system indices, but the logstash service will still need to be able to manage its indices. This PR adds special system index APIs to the logstash plugin so that logstash can manage its pipelines without direct access to the underlying indices. * Add logstash module with dedicated logstash APIs * merge with x-pack plugin * add system index access allowance * Break out serialization tests into distinct classes * Log failures for partial multiget failure * Move LogstashSystemIndexIT to javaRestTest task Co-authored-by: William Brafford <william.brafford@elastic.co> Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>	2020-09-15 12:42:14 -04:00
Adrien Grand	6db8afefc2	Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376 ) Upgrade to a new Lucene snapshot that (at least partially) addresses the indexing rate regression when index sorting is enabled. Backport of #62334.	2020-09-15 17:48:07 +02:00
Fernando Briano	7dd073c243	Wraps timestamp values in quotes in runtime fields YAML tests. (#62155 )	2020-09-15 15:24:57 +01:00
Albert Zaharovits	aeed1c05b0	Ensure authz operation overrides transient authz headers (#61621 ) AuthorizationService#authorize uses the thread context to carry the result of the authorisation as transient headers. The listener argument to the `authorize` method must necessarily observe the header values. This PR makes it so that the authorisation transient headers (`_indices_permissions` and `_authz_info`, but NOT `_originating_action_name`) of the child action override the ones of the parent action. Co-authored-by: Tim Vernum tim@adjective.org	2020-09-15 16:37:38 +03:00
Armin Braun	76f56c1264	Add Missing NamedWritable Registration for ExecuteEnrichPolicyStatus (#62364) (#62374) This was missing and caused nodes to drop out of the cluster on serialization failures whenever one tried to get an enrich policy task by name. The test in here is a little dirty but I figured it would be nice to have an actual reproducer for the issue and I couldn't find any infrastructure to nicely time the tasks so I put this on top of existing test infra.	2020-09-15 15:24:15 +02:00
Costin Leau	03d2395183	EQL: Use Point In Time inside sequences (#62276 ) Use the newly introduced PIT API to have a consistent view of the data while doing sequence matching, which involves multiple calls, aka repeatable reads and thus avoid race conditions or any in-flight updates on the data. (cherry picked from commit daa72fc3c71fd36afb55278021ff6bbc591ef148)	2020-09-15 15:40:03 +03:00
Lee Hinman	6b2af30a62	[7.x] Add "synthetics--" templates for synthetics fleet data (#62193 ) (#62346 ) * Add "synthetics--" templates for synthetics fleet data For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong with those and thus we should have a place for it to live. This would be data reported from heartbeat and under the 'monitoring' category. This commit adds a composable index template for `synthetics--` indices similar to the work in #56709 and #57629. Resolves #61665	2020-09-14 17:14:34 -06:00
Julie Tibshirani	4a19bdb2ea	Support the 'fields' option in inner_hits and top_hits. (#62337 ) This PR adds support for the 'fields' option in the following places: * Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing * The `top_hits` aggregation Addresses #61949.	2020-09-14 11:51:45 -07:00
David Roberts	e4275f3749	[ML] Use utility thread pool for memory estimation (#62314 ) The job comms thread pool is intended for the long-running job processes that do anomaly detection or data frame analytics and count towards job count and memory limits. This commit moves the short-lived memory estimation processes to the ML utility thread pool. Although this doesn't matter in most cases, at the limits of scale it could mean that memory estimations would get in the way of starting jobs, or would queue up for an excessive period of time while waiting for jobs to finish.	2020-09-14 16:47:12 +01:00
Lee Hinman	bf9651c635	[7.x] Add "content" tier as new "data_content" role (#62247 ) (#62322 ) Similar to the work in #60994 where we introduced the `data_hot`, `data_warm`, etc node roles. This introduces a new `data_content` node role to be used for the Content tier. Currently this tier is not used anywhere, but subsequent work will use this tier. Relates to #60848	2020-09-14 09:42:57 -06:00
Benjamin Trent	13c193a9fc	[Enrich] add logging for when there are search/bulk failures on _execute (#62313 ) (#62320 ) When calling `_execute` there is a chance that there will be bulk indexing failures or search failures. These will result in the call failing overall. But, no information is provided for troubleshooting the failure. This commit adds logging to indicate the number of failures, and new debug level logging so that failure details can be determined if necessary. closes https://github.com/elastic/elasticsearch/issues/60491	2020-09-14 11:20:13 -04:00
Armin Braun	95766da345	Save Some Allocations when Working with ClusterState (#62060 ) (#62303 ) Just a number of obvious spots where we were allocating duplicate empty structures or otherwise inefficient that I found while investigating snapshot cluster state update performance.	2020-09-14 15:09:54 +02:00
Tanguy Leroux	9e38dd0254	Deprecate Repository Stats API (#62297 ) (#62308 ) This commit deprecates the Repository Stats API added in 7.8.0 as an experimental API behind a feature flag. The goal is to deprecate this API in 7.10.0 and remove it in a follow up PR in 8.0.0. This API is now superseded by the Repositories Metering API.	2020-09-14 14:57:38 +02:00
David Roberts	d8288526d9	[ML] Add null checks for C++ log handler (#62238 ) It has been observed that if the normalizer process fails to connect to the JVM then this causes a null pointer exception as the JVM tries to close the native process object. The accessors and close methods of the native process class that access the C++ log handler should not assume that it connected correctly.	2020-09-14 11:28:26 +01:00
Martijn van Groningen	c88f4174ec	Fix resolve index data streams yaml test. (#62221 ) Closes #62190	2020-09-14 08:43:58 +02:00
Nhat Nguyen	7779c1f703	Ensure to release async search iterator in tests We need to close an async search response iterator to release the related point in time if the test uses pit.	2020-09-12 12:04:10 -04:00
Martijn van Groningen	1bb094a27b	Return 404 when deleting a non existing data stream (#62224 ) Backport of #62059 to 7.x branch. Return a 404 http status code when attempting to delete a non existing data stream. However only return a 404 when targeting a data stream without any wildcards. Closes #62022	2020-09-11 15:36:05 +02:00
Nhat Nguyen	b118697368	Adjust BWC rest version for point in time (#62264 ) Relates #61872	2020-09-11 08:54:11 -04:00
Luca Cavanna	b5e1e652c1	Remove unused import	2020-09-11 10:19:01 +02:00
Luca Cavanna	3d3a1b4bc2	Tweak OpenPointInTimeRequest createTask This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before point in time APIs were backported.	2020-09-11 10:06:35 +02:00
Nhat Nguyen	aafb2cb812	Support point in time cross cluster search (#61827 ) This commit integrates point in time into cross cluster search. Relates #61062 Closes #61790	2020-09-10 19:25:48 -04:00
Nhat Nguyen	808c8689ac	Always include the matching node when resolving point in time (#61658 ) If shards are relocated to new nodes, then searches with a point in time will fail, although a pit keeps search contexts open. This commit solves this problem by reducing info used by SearchShardIterator and always including the matching nodes when resolving a point in time. Closes #61627	2020-09-10 19:25:48 -04:00
Nhat Nguyen	035f0638f4	Support point in time in async_search (#61560 ) This commit integrates point in time into async search and ensures that it works correctly with security enabled. Relates #61062	2020-09-10 19:25:48 -04:00
Nhat Nguyen	2eb1e8bc84	Make keep alive of point in time optional in search (#62184 ) A search request should not be required to extend the keep_alive of a point in time. This change makes that parameter optional.	2020-09-10 19:25:48 -04:00
Jim Ferenczi	4d528e91a1	Ensure validation of the reader context is executed first (#61831) This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext`) before any other operation and that it is correctly recycled or removed at the end of the operation. This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once. Relates #61446 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>	2020-09-10 19:25:48 -04:00
Nhat Nguyen	3d69b5c41e	Introduce point in time APIs in x-pack basic (#61062 ) This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time. A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around. ``` POST /my_index/_pit?keep_alive=1m ``` The response from the above request includes a `id`, which should be passed to the `id` of the `pit` parameter of search requests. ``` POST /_search { "query": { "match" : { "title" : "elasticsearch" } }, "pit": { "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "keep_alive": "1m" } } ``` Point-in-times are automatically closed when the `keep_alive` is elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests. ``` DELETE /_pit { "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA=" } ``` #### Notable works in this change: - Move the search state to the coordinating node: #52741 - Allow searches with a specific reader context: #53989 - Add the ability to acquire readers in IndexShard: #54966 Relates #46523 Relates #26472 Co-authored-by: Jim Ferenczi <jimczi@apache.org>	2020-09-10 19:25:47 -04:00
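The open, search, and close steps in this commit message can be sketched as plain request builders. This is a minimal sketch and not an official client: the helper names are hypothetical, nothing is sent over the network, and the paths and body fields are the ones quoted in the commit message. Treating `keep_alive` as optional in the search body follows the later commit #62184 listed in this log.

```python
def open_pit_request(index, keep_alive="1m"):
    """Build the request that opens a point in time on `index`."""
    return ("POST", f"/{index}/_pit?keep_alive={keep_alive}")


def search_with_pit_request(pit_id, query, keep_alive=None):
    """Build a search request that runs against an open point in time."""
    pit = {"id": pit_id}
    if keep_alive is not None:
        # Optionally extend the lifetime of the point in time.
        pit["keep_alive"] = keep_alive
    return ("POST", "/_search", {"query": query, "pit": pit})


def close_pit_request(pit_id):
    """Build the request that closes a point in time once it is no longer needed."""
    return ("DELETE", "/_pit", {"id": pit_id})
```

The `id` returned by the open request is the value to pass as `pit_id` to the other two builders.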
Nhat Nguyen	87c889f9c9	CCR should retry on CircuitBreakingException (#62013 ) CCR shard follow task can hit CircuitBreakingException on the leader cluster (read changes requests) or the follower cluster (bulk requests). CCR should retry on CircuitBreakingException as it's a transient error.	2020-09-10 17:23:47 -04:00
Nik Everett	ac23380560	Fix some query methods in runtime fields We were missing a few `@Override` annotations in runtime fields which let us drift from the methods we were supposed to override. Oops. This adds them and links the methods.	2020-09-10 17:06:05 -04:00
Luca Cavanna	39e59d6edf	Share more query execution code for runtime fields (#62229) For runtime fields we have written quite a few Lucene queries that work against runtime values that are the result of the execution of the different script contexts that runtime fields support. They all (but one) share the same main logic: use a two phase iterator, iterate over all documents, and decide whether the current doc matches or not based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.	2020-09-10 20:27:49 +02:00
Luca Cavanna	cd9774d8cb	Runtime fields: rename emitValue function to emit (#62191 ) We decided to shorten the emitValue function to emit, given that emit is self-explanatory. Relates to #59332	2020-09-10 20:27:49 +02:00
Tanguy Leroux	42f5d38d9b	Remove REST APIs documentation for experimental Searchable Snapshot APIs (#62217 ) (#62231 ) This commit removes the documentation for some specific Searchable Snapshot REST APIs: - clear cache - searchable snapshot stats - repository stats These APIs are low-level and are useful to investigate the behavior of snapshot backed indices but we expect them to be removed in the future or to appear in a different form.	2020-09-10 16:51:28 +02:00
Ignacio Vera	c8981ea93d	upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213 ) (#62222 )	2020-09-10 16:23:18 +02:00
Martijn van Groningen	f87fc67592	Mute resolve index data stream tests. (#62211 ) Relates to #62190 and #62210	2020-09-10 13:13:50 +02:00
Andrei Stefan	cce6da7d52	EQL: add the wildcard field type to the IT tests (#62166 ) (#62200 ) * Add wildcard field type as an option for randomized testing of IT queries (cherry picked from commit 87b14c409c180c4d53c3c61a30bd69f1b81a2823)	2020-09-10 12:36:36 +03:00
Yannick Welsch	e3feafc1e9	Enable searchable snapshots in release builds (#62201 ) Enables searchable snapshot functionality not only in snapshot, but also release builds.	2020-09-10 11:20:12 +02:00
David Roberts	969a1c558b	[ML] Include the "properties" layer in find_file_structure mappings (#62158 ) Previously the "mappings" field of the response from the find_file_structure endpoint was not a drop-in for the mappings format of the create index endpoint - the "properties" layer was missing. The reason for omitting it initially was that the assumption was that the find_file_structure endpoint would only ever return very simple mappings without any nested objects. However, this will not be true in the future, as we will improve mappings detection for complex JSON objects. As a first step it makes sense to move the returned mappings closer to the standard format. This is a small building block towards fixing #55616	2020-09-10 09:33:42 +01:00
Jake Landis	c6c6596623	[7.x] Configure internalClusterTest for snapshot_feature_enabled flag (#62185 ) (#62189 ) closes #62039	2020-09-09 15:41:43 -05:00
Jake Landis	d8dad9ab2c	[7.x] Remove integTest task from PluginBuildPlugin (#61879 ) (#62135 ) This commit removes `integTest` task from all es-plugins. Most relevant projects have been converted to use yamlRestTest, javaRestTest, or internalClusterTest in prior PRs. A few projects needed to be adjusted to allow complete removal of this task * x-pack/plugin - converted to use yamlRestTest and javaRestTest * plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task * qa/die-with-dignity - convert to javaRestTest * x-pack/qa/security-example-spi-extension - convert to javaRestTest * multiple projects - remove the integTest.enabled = false (yay!) related: #61802 related: #60630 related: #59444 related: #59089 related: #56841 related: #59939 related: #55896	2020-09-09 14:25:41 -05:00
Nik Everett	6d2cab9437	Stop runtime script from emitting too many values (#61938) (#62186) This prevents `keyword` valued runtime scripts from emitting too many values or values that take up too much space. Without this you can allocate a ton of memory with the script by sticking it into a tight loop. Painless has some protections against this but: 1. I don't want to rely on them out of sheer paranoia 2. They don't really kick in when the script uses callbacks like we do anyway. Relates to #59332	2020-09-09 14:47:24 -04:00
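A guard of the kind this commit describes might look like the sketch below. The class name, the method name, and both limits are assumptions chosen for illustration; the commit message does not state the actual limits used.

```python
class KeywordEmitter:
    """Collects emitted keyword values and rejects runaway scripts.

    The default limits here are hypothetical, not the real ones.
    """

    def __init__(self, max_values=100, max_chars=1024):
        self.max_values = max_values
        self.max_chars = max_chars
        self.values = []
        self.chars = 0

    def emit(self, value):
        # Cap the number of values a single script run may emit.
        if len(self.values) >= self.max_values:
            raise ValueError(f"too many values emitted, limit is {self.max_values}")
        # Cap the total size of everything emitted so far.
        self.chars += len(value)
        if self.chars > self.max_chars:
            raise ValueError(f"emitted values are too large, limit is {self.max_chars} chars")
        self.values.append(value)
```

Because the check runs inside `emit`, a script calling it in a tight loop fails quickly instead of allocating without bound.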
Benjamin Trent	e181e24d48	[ML] only persist progress if it has changed (#62123) (#62180) * [ML] only persist progress if it has changed We already search for the previously stored progress document. For optimization purposes, and to prevent restoring the same progress after a failed analytics job is stopped, this commit does an equality check between the previously stored progress and current progress. If the progress has changed, persistence continues as normal.	2020-09-09 12:04:09 -04:00
Benjamin Trent	1b9dc0172a	[ML] adding feature_name and node size validation for tree models (#62096) (#62161) When a tree model is provided, it is possible that it is a stump. Meaning, it only has one node with no splits. This implies that the tree has no features. In this case, having zero feature_names is appropriate. In any other case, this should be considered a validation failure. This commit adds the validation if there is more than 1 node, that the feature_names in the model are non-empty. closes #60759	2020-09-09 08:50:25 -04:00
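The rule stated in this commit message (a single-node stump may have no feature names, any larger tree must have some) can be written as a short check. The function name and argument shapes are hypothetical; only the rule itself comes from the message.

```python
def validate_tree(nodes, feature_names):
    """Reject a multi-node tree that declares no feature names.

    A stump (exactly one node, no splits) uses no features,
    so an empty `feature_names` list is acceptable for it.
    """
    if len(nodes) > 1 and not feature_names:
        raise ValueError("feature_names must not be empty for a tree with more than one node")
```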
Luca Cavanna	b680d3fb29	Async search: don't track fetch failures (#62111) Fetch failures are currently tracked by AsyncSearchTask like ordinary shard failures. Though they should be treated differently or they end up causing weird scenarios like total=num_shards and successful=num_shards as the query phase ran fine yet the failed count would reflect the number of shards where fetch failed. Given that partial results only include aggs for now and are complete even if fetch fails, we can ignore fetch failures in async search, as they will be included in the response anyway. They are in fact either received as a failure when all shards fail during fetch, or as part of the final response when only some shards fail during fetch.	2020-09-09 13:44:07 +02:00
Luca Cavanna	ad83261348	Print out search request as part of async search task description (#62057 ) Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should. With this commit we address that while also making sure that the description highlights that the task is originated from an async search. Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.	2020-09-09 13:44:07 +02:00
Rory Hunter	b7fd7cf154	Write deprecation logs to a data stream (#61966 ) Backport of #58924. Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream as well as to disk.	2020-09-09 12:16:28 +01:00
markharwood	5a48895065	Wildcard field bug fix for prefix + term queries. Wildcard syntax should not be supported. (#62085 ) (#62154 ) Wildcard field bug fix for term and prefix queries. We now escape any * or ? characters in the search string before delegating to the main wildcardQuery() method. Closes #62081	2020-09-09 10:46:52 +01:00
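The escaping step mentioned in this commit can be sketched as below. This is an illustration rather than the mapper's actual code: the function names are hypothetical, and escaping the backslash itself is an assumption added so the escape character stays unambiguous.

```python
def escape_wildcards(text):
    """Escape wildcard syntax so the text is matched literally."""
    out = []
    for ch in text:
        # '*' and '?' come from the commit message; '\\' is an assumption.
        if ch in ("*", "?", "\\"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def term_pattern(value):
    """Wildcard pattern for a term query: the escaped value, matched exactly."""
    return escape_wildcards(value)


def prefix_pattern(value):
    """Wildcard pattern for a prefix query: escaped value plus one real '*'."""
    return escape_wildcards(value) + "*"
```

With this, a prefix query for `ab*` becomes the pattern `ab\**`: a literal asterisk followed by the one trailing wildcard.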
Dimitris Athanasiou	6c1700c343	[7.x][ML] Outlier detection mapping for nested feature influence (#62068 ) (#62150 ) Adds mappings for outlier detection results. Backport of #62068	2020-09-09 12:41:26 +03:00
Yang Wang	146b2e6b1a	[Test] Fix data-stream rest test failure (#62137 ) (#62144 ) By enabling searchable snapshots for release builds.	2020-09-09 17:12:53 +10:00
Armin Braun	ed4984a32e	Remove Redundant Stream Wrapping from Compression (#62017) (#62132) In many cases we don't need a `StreamInput` or `StreamOutput` wrapper around these streams so this commit adjusts the API to just normal streams and adds the wrapping where necessary.	2020-09-09 03:27:38 +02:00
Costin Leau	0f9532689f	EQL: Propagate key constraints through the query (#62073) Since join keys are common across all queries in a Join/Sequence, any constraint applied on one query needs to be obeyed by all the other queries. This PR enhances the optimizer to propagate such constraints across all queries so they get pushed down to the actual generated ES queries. Fix #58937 (cherry picked from commit 4afa5debc199c132c07015bfae17952c40a21e5d)	2020-09-08 18:40:47 +03:00
Benjamin Trent	057bf3f7d5	[ML] setting require_alias to previous value on bulk index retry (#62103 ) (#62108 ) Previous work has been done to prevent automatically creating a concrete index when an alias is desired. This commit addresses a path where this check was not being done. relates: #62064	2020-09-08 11:38:32 -04:00
Dimitris Athanasiou	41507cff48	[7.x][ML] Update mappings of ml stats index (#61980 ) (#62091 ) - Adds missing mappings for `alpha`, `gamma`, and `lambda`. - Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`. - Removes unused `regularization_depth_penalty_multiplier`, `regularization_leaf_weight_penalty_multiplier` and `regularization_tree_size_penalty_multiplier`. Backport of #61980	2020-09-08 16:41:57 +03:00
David Roberts	b2636678b2	[ML] Add support for date_nanos fields in find_file_structure (#62048 ) Now that #61324 is merged it is possible for the find_file_structure endpoint to suggest using date_nanos fields for timestamps where the timestamp format provides greater than millisecond accuracy.	2020-09-08 13:05:09 +01:00
Francisco Fernández Castaño	2bb5716b3d	Add repositories metering API (#62088 ) This pull request adds a new set of APIs that allows tracking the number of requests performed by the different registered repositories. In order to avoid losing data, the repository statistics are archived after the repository is closed for a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the statistics for the active repositories as well as the modified/closed repositories. Backport of #60371	2020-09-08 14:01:04 +02:00
David Kyle	fb6ee5b36d	[7.x] [ML] Assert mappings match templates in Upgrade tests (#61905 ) At the end of the rolling upgrade tests check the mappings of the concrete .ml and .transform-internal indices match the mappings in the templates. When the templates change, the tests should prove that the mappings have been updated in the new cluster.	2020-09-08 12:21:19 +01:00
Przemko Robakowski	bb357f6aae	[7.x] Move internal index templates to composable templates (#61457 ) (#61661 ) This change moves watcher, ILM history and SLM history templates to composable templates. Versions are updated to reflect the switch. Only change to the templates themselves is added `_meta` to mark them as managed	2020-09-08 11:26:06 +02:00
Andrei Stefan	7d5791b6bd	EQL: create the search request with a list of indices (#62005 ) (#62076 ) * The query client uses an array of indices instead of the comma separated version of the indices names (cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)	2020-09-08 10:26:59 +03:00
David Kyle	a5b24bf44c	Mute ClassificationIT (#62063 ) testWithOnlyTrainingRowsAndTrainingPercentIsFifty_DependentVariableIsBoolean For #60759	2020-09-07 16:10:48 +01:00
Luca Cavanna	168b448a0f	Rename runtime_script field type to runtime (#62034 ) We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime fields implementation (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts. This translates also to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper . Relates to #59332	2020-09-07 15:07:23 +02:00
Jim Ferenczi	fa8e76abb1	Improve reduction of terms aggregations (#61779 ) (#62028 ) Today, the terms aggregation reduces multiple aggregations at once using a map to group same buckets together. This operation can be costly since it requires to lookup every bucket in a global map with no particular order. This commit changes how term buckets are sorted by shards and partial reduces in order to be able to reduce results using a merge-sort strategy. For bwc, results are merged with the legacy code if any of the aggregations use a different sort (if it was returned by a node in prior versions). Relates #51857	2020-09-07 13:13:20 +02:00
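The merge-sort strategy mentioned in this commit can be illustrated with a k-way merge over per-shard bucket lists that are already sorted by key. This is a sketch of the general technique, not the Elasticsearch reducer: the function name and the `(key, doc_count)` tuple shape are assumptions, and re-sorting the reduced buckets into the requested final order is left out.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_sorted_buckets(shard_results):
    """Merge per-shard lists of (key, doc_count) that are each sorted by key.

    Equal keys arrive next to each other in the merged stream, so they
    can be summed in one pass without a global map lookup per bucket.
    """
    merged = heapq.merge(*shard_results, key=itemgetter(0))
    return [
        (key, sum(count for _, count in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]
```

For example, merging `[("a", 1), ("c", 2)]` with `[("a", 3), ("b", 1)]` yields `[("a", 4), ("b", 1), ("c", 2)]`.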
Martijn van Groningen	7e566ddd06	Move data stream yaml tests to xpack plugin module. (#62032) Backport of #61998 to 7.x branch. Moving the data stream yaml tests to xpack plugin module has the following benefits: * The tests are run both with security enabled (as part of xpack/plugin integTest) and disabled (as part of xpack/plugin/data-stream/qa/rest integTest). * The tests also run in the mixed cluster qa environment.	2020-09-07 11:03:32 +02:00
Tanguy Leroux	ebbf4df9fd	Adapt SearchableSnapshotsBlobStoreCacheIntegTests to Lucene 8.7.0 (#61989 ) (#62030 ) Elasticsearch now uses #61957 which includes https://issues.apache.org/jira/browse/LUCENE-9456. We can remove the corresponding //TODO in SearchableSnapshotsBlobStoreCacheIntegTests.	2020-09-07 10:25:44 +02:00
Luca Cavanna	0c8b438577	Add support for runtime fields (#61776 ) This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch. We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script. The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields. Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-09-07 09:14:53 +02:00
Dimitris Athanasiou	d37f197efd	[7.x][ML] Allow training_percent to be any positive double up to hundred (#61977 ) (#61990 ) This changes the valid range of `training_percent` for regression and classification from [1, 100] to (0, 100]. Backport of #61977	2020-09-04 17:34:14 +03:00
Yannick Welsch	6d08b55d4e	Simplify searchable snapshot shard allocation (#61911 ) Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those snapshot-backed shards (instead of "recover from local or from empty store"). Also lets the balancer pick the node to allocate the snapshot-backed shard to (which takes the number of shards on each node into account, unlike the current implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).	2020-09-04 15:45:00 +02:00
Tanguy Leroux	289b1f4ae7	Reduce locking in prewarming (#61837 ) (#61967 ) During prewarming of a Lucene file a CacheFile is acquired and then locked for the duration of the prewarming, i.e. locked until all parts of the file have been downloaded and written to cache on disk. The locking (executed with CacheFile#fileLock()) is here to prevent the cache file from being evicted while it is prewarming. But holding the lock may take a while for large files, especially since restoring snapshot files now respects the indices.recovery.max_bytes_per_sec setting of 40mb (#58658), and this can have bad consequences like preventing the CacheFile from being evicted, opened or closed. In manual tests this bug slows down various requests like mounting a new searchable snapshot index or deleting an existing one that is still prewarming. This commit reduces the time the lock is held during prewarming so that the read lock is only required when actively writing to the CacheFile.	2020-09-04 15:06:50 +02:00
Martijn van Groningen	84af9abd76	Fix skip versions fix xpack data stream yaml tests. (#61981 ) Backport of #61926 to 7.x branch. Relates to #61904	2020-09-04 14:53:38 +02:00
Benjamin Trent	cec102a391	[7.x] [ML] adds new n_gram_encoding custom processor (#61578 ) (#61935 ) * [ML] adds new n_gram_encoding custom processor (#61578) This adds a new `n_gram_encoding` feature processor for analytics and inference. The focus of this processor is simple ngram encodings that allow: - multiple ngrams [1..5] - Prefix, infix, suffix	2020-09-04 08:36:50 -04:00
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Dimitris Athanasiou	bdccab7c7a	[7.x][ML] Add incremental id during data frame analytics reindexing (#61943 ) (#61971 ) Previously, we added a copy of the `_id` during reindexing and sorted the destination index on that. This allowed us to traverse the docs in the destination index in a stable order multiple times and with efficiency. However, the destination index being sorted means we cannot have `nested` typed fields. This is a problem as it does not allow us to provide a good experience with our evaluate API when it comes to computing metrics for specific classes, features, etc. This commit changes the approach in order to produce a destination index that allows nested fields. Instead of adding a copy of the `_id` field, we now add an incremental id that we can use to traverse the docs in a stable order. We also ensure we always assign the same incremental id to the same doc from the source indices by sorting on `_seq_no` during reindexing. That in combination with the reindexing API using scroll gives us a stable order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties. The extractor now does not need to scroll. Instead we sort on the incremental id and we do ranged searches to avoid the sort-all-docs overhead. Finally, the `TestDocsIterator` is simply changed to search_after the incremental id. With these changes data frame analytics jobs do not use scroll at any part. Having all these in place, the commit adds the `nested` types to the necessary fields of `classification` and `regression` analyses results. Backport of #61943	2020-09-04 13:24:42 +03:00
Tanguy Leroux	10d14ce101	Enable searchable snapshot feature for all test clusters (#61888 ) (#61965 ) This commit reenables the searchable snapshot feature for integration tests after #61802 which changed some build plugins.	2020-09-04 11:20:24 +02:00
Tim Vernum	cdfb163c7c	Add explicit test for DLS with OIDC metadata (#61955 ) When a user authenticates via OpenID Connect we copy information from the OIDC claims into the user's metadata in a particular format. This commit adds a test that metadata in that format can be used in a mustache template for Document Level Security. Backport of: #60030	2020-09-04 16:21:20 +10:00
Tim Vernum	57efda2865	Add DEBUG logging for undefined role mapping field (#61887 ) A role mapping with the following content: "rules": { "field": { "userid" : "admin" } } will never match because `userid` is not a valid field. The correct field is `username`. This change adds DEBUG logging when an undefined field is referenced. The choice to use DEBUG rather than INFO/WARN is that the set of fields is partially dynamic (e.g. the `metadata.*` fields), so it may be perfectly reasonable to check a field that is not defined for that user. For example this rule: "rules": { "field": { "metadata.ranking" : "A" } } would generate a log message for an unranked user, which would erroneously suggest that such a rule is an error. This DEBUG logging will assist in diagnosing problems, without introducing that confusion. Backport of: #61246 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-09-04 14:19:05 +10:00
Ryan Ernst	d6e17170c3	Simplify adding plugins and modules to testclusters (#61886 ) There are currently half a dozen ways to add plugins and modules for test clusters to use. All of them require the calling project to peek into the plugin or module they want to use to grab its bundlePlugin task, and then both depend on that task, as well as extract the archive path the task will produce. This creates cross project dependencies that are difficult to detect, and if the dependent plugin/module has not yet been configured, the build will fail because the task does not yet exist. This commit makes the plugin and module methods for testclusters symmetric: both simply take a file provider directly, or a project path that will produce the plugin/module zip. Internally this new variant uses normal configuration/dependencies across projects to get the zip artifact. It also has the added benefit of no longer needing the caller to add to the test task a dependsOn for bundlePlugin task.	2020-09-03 19:37:46 -07:00
Jake Landis	ea1e8ad6ea	[7.x] Fix passing params to template or script failed in watcher (#58559 ) (#61885 ) The main changes are: * Fix custom params are missing when using template or script in watcher's logging action or jira action. * Add yaml tests to test passing params to template or script successfully. Relates to #57625 Co-authored-by: bellengao <gbl_long@163.com>	2020-09-03 15:47:51 -05:00
Costin Leau	99ee87e332	EQL: Revert filter pipe (#61907 ) The current implementation of the filter pipe is incomplete hence why it got reverted. Note this is not a complete revert as some of the improvements of said commit (such as the PostAnalyzer) are useful in general. Relates #61805 (cherry picked from commit 7a7eb66f7d39586c3a3bc00dce49e6c47a23b46a)	2020-09-03 22:31:08 +03:00
Martijn van Groningen	3d9c12e2d3	Fix data stream wildcard resolution bug in eql search api.(#61910 ) Backport of #61904 to 7.x branch. The eql search api redirects to the search api. For this reason the eql search api could work with concrete data stream names. However if security is enabled and a data stream name snippet with a wildcard was used then it could not resolve these expressions. This is because the EqlSearchRequest class didn't override the `includeDataStreams()` method. This pr fixes this, so that the security layer can properly expand data stream name wildcard expressions for the eql search api. This commit also moves the eql data stream test to xpack rest tests, so that the test runs with security enabled. This is required to reproduce the bug. Closes #60828	2020-09-03 16:03:57 +02:00
Tanguy Leroux	c90ee32cdc	Mute ClassificationIT.testTooLowConfiguredMemoryStillStarts (#61915 ) Relates #61913	2020-09-03 15:52:01 +02:00
Jake Landis	dbb78e1c45	[7.x] Correct the query dsl for watching elasticsearch version (#58321 ) (#61882 ) The term query should be looking at the cluster_uuid field in elasticsearch_version_mismatch.json. Co-authored-by: bellengao <gbl_long@163.com>	2020-09-02 16:58:21 -05:00
Nik Everett	c19f67ce30	Support longs in BitArray (backport of #61867 ) (#61871 ) We frequently use `long`s with `BitArray` in aggs and right now we have to assert that the `long` fits in an `int`. This adds support for `long` to `BitArray` so we don't need those assertions.	2020-09-02 17:24:31 -04:00
Dimitris Athanasiou	ec405978fc	[7.x][ML] Update reindexing task progress before persisting job progress (#61868 ) (#61875 ) This fixes a bug introduced by #61782. In that PR I thought I could simplify the persistence of progress by using the progress straight from the stats holder in the task instead of calling the get stats action. However, I overlooked that it is then possible to have stale progress for the reindexing task as that is only updated when the get stats API is called. In this commit this is fixed by updating reindexing task progress before persisting the job progress. This seems to be much more lightweight than calling the get stats request. Closes #61852 Backport of #61868	2020-09-02 21:44:18 +03:00
Benjamin Trent	c22415c241	[7.x] [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846 ) (#61869 ) * [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846) Native PR addresses this test failure: https://github.com/elastic/ml-cpp/pull/1465 closes https://github.com/elastic/elasticsearch/issues/61704 closes https://github.com/elastic/elasticsearch/issues/61561	2020-09-02 13:23:23 -04:00
Jake Landis	f6b3148e5e	[7.x] Convert second 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#61802 ) (#61856 ) For 1/2 the plugins in x-pack, the integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This includes the following projects: security, spatial, stack, transform, vectors, voting-only-node, and watcher. A few of the more specialized qa projects within these plugins have not been changed with this PR due to additional complexity which should be addressed separately. related: #60630 related: #56841 related: #59939 related: #55896	2020-09-02 11:20:55 -05:00
Jake Landis	794aac717d	[7.x] Convert first 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#60630 ) (#61855 ) For 1/2 the plugins in x-pack, the integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This includes the following projects: async-search, autoscaling, ccr, enrich, eql, frozen-indices, data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml. A few of the more specialized qa projects within these plugins have not been changed with this PR due to additional complexity which should be addressed separately. A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is). related: #61802 related: #56841 related: #59939 related: #55896	2020-09-02 11:19:24 -05:00
Dimitris Athanasiou	07ab0beea0	[7.x][ML] Improve handling of exception while starting DFA process (#61838 ) (#61847 ) While starting the data frame analytics process it is possible to get an exception before the process crash handler is in place. In addition, right after starting the process, we check the process is alive to ensure we capture a failed process. However, those exceptions are unhandled. This commit catches any exception thrown while starting the process and sets the task to failed with the root cause error message. I have also taken the chance to remove some unused parameters in `NativeAnalyticsProcessFactory`. Relates #61704 Backport of #61838	2020-09-02 16:32:45 +03:00
Costin Leau	e6dc8054a5	EQL: Introduce filter pipe (#61805 ) Allow filtering through a pipe, across events and sequences. Filter pipes are pushed down to base queries. For now filtering after limit (head/tail) is forbidden as the semantics are still up for debate. Fix #59763 (cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)	2020-09-02 15:48:51 +03:00
David Kyle	d268540f20	[ML] Check and install the latest template in the DFA executor (#61589 ) (#61842 ) During a rolling upgrade it is possible that a worker node will be upgraded before the master in which case the DFA templates will not have been installed. Before a DFA task starts check that the latest template is installed and install it if necessary.	2020-09-02 12:16:29 +01:00
Nik Everett	f8158bdb2d	Skip failing test Tracked by https://github.com/elastic/elasticsearch/issues/61561	2020-09-01 13:44:31 -04:00
Dimitris Athanasiou	2547cfbe54	[7.x][ML] Persist progress when setting DFA task to failed (#61782 ) (#61792 ) When an error occurs and we set the task to failed via the `DataFrameAnalyticsTask.setFailed` method we do not persist progress. If the job is later restarted, this means we do not correctly restore from where we can but instead we start the job from scratch and have to redo the reindexing phase. This commit solves this bug by persisting the progress before setting the task to failed. Backport of #61782	2020-09-01 18:33:07 +03:00
Tanguy Leroux	d94d6b5b70	Also account for state not recovered in BlobStoreCacheService Following #61726 after a test failure	2020-09-01 12:10:04 +02:00
Ioannis Kakavas	ced2c140fe	Unmute TokenAuthIntegTests test (#61715 ) @ywangd made an awesome analysis on why this test is failing, over at https://github.com/elastic/elasticsearch/issues/55816#issuecomment-620913282 This change makes it so that we use the same client to perform a refresh of a token, as we use to subsequently attempt to authenticate with the refreshed token. This ensures the tests are failing and is a good approximation of how we expect the same client doing the refresh, to also perform the subsequent authentication in real life uses. The errors we were seeing from users have disappeared after #55114 so we deem our behavior safe.	2020-09-01 13:06:11 +03:00
Tanguy Leroux	787dfda4c1	Prevent snapshots to be mounted as system indices (#61517 ) (#61727 ) System indices can be snapshotted and are therefore potential candidates to be mounted as searchable snapshot indices. As of today nothing prevents a snapshot from being mounted under an index name starting with . and this can lead to conflicting situations because searchable snapshot indices are read-only and Elasticsearch expects some system indices to be writable; because searchable snapshot indices will soon use an internal system index (#60522) to speed up recoveries and we should prevent the system index from itself being a searchable snapshot index (leading to some deadlock situation for recovery). This commit introduces a change to prevent snapshots from being mounted as a system index.	2020-09-01 11:13:28 +02:00
Tanguy Leroux	92eb6e7844	Remove cluster state listener in BlobStoreCacheService (#61726 ) (#61769 ) BlobStoreCacheService implements ClusterStateListener in order to maintain a ready flag that can be used to know when the snapshot blob cache should be queried or not. Now the getAsync() method correctly handles the various exceptions that can be thrown when the .snapshot-blob-cache index is not available (in isExpectedCacheGetException()) and logs them at DEBUG, so we can safely remove the ready flag.	2020-09-01 11:12:52 +02:00
Benjamin Trent	7dabaad7d9	[ML] refactor ml job node selection into its own class (#61521 ) (#61747 ) This is a minor refactor where the job node load logic (node availability, etc.) is refactored into its own class. This will allow future things (i.e. autoscaling decisions) to use the same node load detection class. backport of #61521	2020-08-31 14:00:23 -04:00
Benjamin Trent	8b33d8813a	[ML] binary classification per-class feature importance for model inference (#61597 ) (#61746 ) This commit addresses two issues: - per class feature importance is now written out for binary classification (logistic regression) - The `class_name` in per class feature importance now matches what is written in the `top_classes` array. backport of https://github.com/elastic/elasticsearch/pull/61597	2020-08-31 13:57:00 -04:00
Mayya Sharipova	fe9c66096c	Small refactoring of AsyncExecutionId (#61640 ) - don't do encoding of asyncExecutionId if it is already provided in the encoded form - create a new instance of AsyncExecutionId after checks for correctness are done	2020-08-31 10:24:36 -04:00
Nhat Nguyen	e37ce561c7	Set timeout of auto put-follow request to unbounded (#61679 ) If the master node of the follower cluster is busy, then the auto-follower will fail to initialize the following process. This also occurs when an auto-follow pattern matches multiple indices. We should set the timeout of put-follow requests issued by the auto-follower to unbounded to avoid this problem. Closes #56891	2020-08-31 09:58:19 -04:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Rory Hunter	ff6c071275	Implement deprecation logging using log4j (#61629 ) Backport of #61474. Part of #46106. Simplify the implementation of deprecation logging by relying on log4j more completely, and implementing additional behaviour through custom appenders and filters.	2020-08-31 12:42:04 +01:00
Henning Andersen	4c9fe31da8	Mute testTooLowConfiguredMemoryStillStarts (#61705 ) Related to #61704	2020-08-31 11:19:53 +02:00
Ioannis Kakavas	c621d291d2	Call ActionListener.onResponse exactly once (#61584 ) (#61682 ) Under specific circumstances we would call onResponse twice, which led to unexpected behavior.	2020-08-30 16:47:09 +03:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. 
This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Albert Zaharovits	1cb97a2c4f	Relax the index access control check for scroll searches (#61446 ) The check introduced by #60640 for scroll searches, in which we log if the index access control before the query and fetch phases differs from when the scroll context is created, is too strict, leading to spurious warning log messages. The check verifies instance equality but this assumes that the fetch phase is executed in the same thread context as the scroll context validation. However, this is not true if the scroll search is executed cross-cluster, and even for local scroll searches it is an unfounded assumption. The check is hence reduced to a null check for the index access. The fact that the access control is suitable given the indices that are actually accessed (by the scroll) will be done in a follow-up, after we better regulate the creation of index access controls in general.	2020-08-27 21:16:01 +03:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Nik Everett	5a83e89a2b	Migrate histogram field test (#61602 ) (#61632 ) Replaces the superclass of the test for `HistogramFieldMapperTests` with one that doesn't extend `ESSingleNodeTestCase` so we don't depend on the entire world to test the field mapper. Continues #61301.	2020-08-27 11:08:19 -04:00
David Turner	c89fb8b9fa	Avoid listener call under SparseFileTracker#mutex (#61626 ) Today we sometimes notify a listener of completion while holding `SparseFileTracker#mutex`. This commit moves all such calls out from under the mutex and adds assertions that the mutex is not held in the listener. Closes #61520	2020-08-27 15:39:38 +01:00
David Kyle	49a5afc6c1	[ML] Increase wait for templates timeout in tests (#61623 ) (#61628 )	2020-08-27 12:57:12 +01:00
David Kyle	25e811ced7	Rewrite Inference yml tests for better clean up (#61180 ) (#61555 ) Inference processors asynchronously write usage stats to the .ml-stats index after they are used. In tests the write can leak into the next test, causing failures depending on which test follows. This change waits for the usage stats docs to be written at the end of the test.	2020-08-27 11:16:26 +01:00
David Turner	f6055dc9b2	Suppress noisy SSL exceptions (#61359 ) If a TLS-protected connection closes unexpectedly then today we often emit a `WARN` log, typically one of the following: io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16) io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake We typically only report unexpectedly-closed connections at `DEBUG` level, but these two messages don't follow that rule and generate a lot of noise as a result. This commit adjusts the logging to report these two exceptions at `DEBUG` level only.	2020-08-27 10:59:39 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618 ) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: for (int i = 0; i < fileInfo.numberOfParts(); i++) { This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
Ioannis Kakavas	3640ff1ff2	Add SAML AuthN request signing tests (#61582 ) - Add a unit test for our signing code - Change SAML IT to use signed authentication requests for Shibboleth to consume Backport of #48444	2020-08-27 10:41:56 +03:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
David Turner	e14d9c9514	Introduce cache index for searchable snapshots (#61595 ) If a searchable snapshot shard fails (e.g. its node leaves the cluster) we want to be able to start it up again on a different node as quickly as possible to avoid unnecessarily blocking or failing searches. It isn't feasible to fully restore such shards in an acceptably short time. In particular we would like to be able to deal with the `can_match` phase of a search ASAP so that we can skip unnecessary waiting on shards that may still be warming up but which are not required for the search. This commit solves this problem by introducing a system index that holds much of the data required to start a shard. Today(*) this means it holds the contents of every file with size <8kB, and the first 4kB of every other file in the shard. This system index acts as a second-level cache, behind the first-level node-local disk cache but in front of the blob store itself. Reading chunks from the index is slower than reading them directly from disk, but faster than reading them from the blob store, and is also replicated and accessible to all nodes in the cluster. (*) the exact heuristics for what we should put into the system index are still under investigation and may change in future. This second-level cache is populated when we attempt to read a chunk which is missing from both levels of cache and must therefore be read from the blob store. We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which verify that we do not hit the blob store more than necessary when starting up a shard that we've seen before, whether due to a node restart or because a snapshot was mounted multiple times. Backport of #60522 Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2020-08-27 06:38:32 +01:00
Dimitris Athanasiou	3ed65eb418	[7.x][ML] Recover data frame extraction search from latest sort key (#61544 ) (#61572 ) If a search failure occurs during data frame extraction we catch the error and retry once. However, we retry another search that is identical to the first one. This means we will re-fetch any docs that were already processed. This may result either in training a model using duplicate data or, in the case of outlier detection, in an error message that the process received more records than it expected. This commit fixes this issue by tracking the latest doc's sort key and then using that in a range query in case we restart the search due to a failure. Backport of #61544 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-26 17:54:00 +03:00
Benjamin Trent	a6e7a3d65f	[7.x] [ML] write warning if configured memory limit is too low for analytics job (#61505 ) (#61528 ) Backports the following commits to 7.x: [ML] write warning if configured memory limit is too low for analytics job (#61505) Having `_start` fail when the configured memory limit is too low can be frustrating. We should instead warn the user that their job might not run properly if their configured limit is too low. It might be that our estimate is too high, and their configured limit works just fine.	2020-08-26 10:35:38 -04:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Przemysław Witek	11c2710e7f	[7.x] [ML] Do not mark the DFA job as FAILED when a failure occurs after the node is shutdown (#61331 ) (#61526 )	2020-08-26 09:53:13 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Costin Leau	bff3c7470e	EQL: Replace SearchHit in response with Event (#61428 ) (#61522 ) The building block of the eql response is currently the SearchHit. This is a problem since it is tied to an actual search, and thus has scoring, highlighting, shard information and a lot of other things that are not relevant for EQL. This becomes a problem when doing sequence queries since the response is not generated from one search query and thus there are no SearchHits to speak of. Emulating one is not just conceptually incorrect but also problematic since most of the data is missed or made-up. As such this PR introduces a simple class, Event, that maps nicely to the terminology while hiding the ES internals (the use of SearchHit or GetResult/GetResponse depending on the API used). Fix #59764 Fix #59779 Co-authored-by: Igor Motov <igor@motovs.org> (cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)	2020-08-25 17:32:42 +03:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Armin Braun	806dfcfcf7	Speed up Compression Logic by Pooling Resources (#61358 ) (#61495 ) This is mostly motivated by the performance issues we are seeing around the GET mappings REST API which (in case of a large number of indices) will create decompressing streams in a hot loop which takes a significant amount of time for the system calls involved in instantiating deflaters and inflaters. Also, this fixes a leaked deflater when deserializing cached repository data.	2020-08-25 04:01:55 +02:00
David Kyle	539cf914bc	[ML] handle new model metadata stream from native process (#59725 ) (#61251 ) This adds the serialization handling for the new model_metadata object from the native process. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-08-24 15:52:13 -04:00
Dimitris Athanasiou	618dd65d5f	[7.x][ML] Add debug logging for field caps request during DF Analytics (#61459 ) (#61478 ) Adds debug logging for the request and the response that is getting field capabilities during a data frame analytics job. Backport of #61459	2020-08-24 18:01:30 +03:00
Dimitris Athanasiou	18ca8a6be3	[7.x][ML] Remove redundant logging for creation of annotations index (#61461 ) (#61475 ) This commit removes the log info message "Created ML annotations index and aliases". The message comes in addition to elasticsearch's index creation logging and it does not add to it. In addition, since #61107 that message may be logged multiple times. Backport of #61461	2020-08-24 17:46:29 +03:00
Yang Wang	f0615113b6	Report anonymous roles in authenticate response (#61355 ) (#61454 ) Report anonymous roles in response to "GET _security/_authenticate" API call when: * Anonymous role is enabled * User is not the anonymous user * Credentials is not an API Key	2020-08-24 14:51:44 +10:00
Yang Wang	0509465a9e	Warn about unlicensed realms if no auth token can be extracted (#61402 ) (#61419 ) There are warnings about unlicensed realms when user lookup fails. This PR adds similar warnings for when no authentication token can be extracted from the request.	2020-08-22 00:04:45 +10:00
Yang Wang	cd52233b94	Include authentication type for the authenticate response (#61247 ) (#61411 ) Add a new "authentication_type" field to the response of "GET _security/_authenticate".	2020-08-21 22:59:43 +10:00
Lloyd	cb83e7011c	[Backport][API keys] Add full_name and email to API key doc and use them to populate authing User (#61354 ) (#61403 ) The API key document currently doesn't include the user's full_name or email attributes, and as a result those attributes return `null` when `GET`ing `/_security/_authenticate`, and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046). This changeset adds those fields to the document and extracts them to fill in the User when authenticating. They're effectively going to be a snapshot of the User from when the key was created, but this is in line with roles and metadata as well. Signed-off-by: lloydmeta <lloydmeta@gmail.com>	2020-08-21 18:32:19 +09:00
Julie Tibshirani	997c73ec17	Correct how field retrieval handles multifields and copy_to. (#61391 ) Before when a value was copied to a field through a parent field or `copy_to`, we parsed it using the `FieldMapper` from the source field. Instead we should parse it using the target `FieldMapper`. This ensures that we apply the appropriate mapping type and options to the copied value. To implement the fix cleanly, this PR refactors the value parsing strategy. Now instead of looking up values directly, field mappers produce a helper object `ValueFetcher`. The value fetchers are responsible for almost all aspects of fetching, including looking up the right paths in the _source. The PR is fairly big but each commit can be reviewed individually. Fixes #61033.	2020-08-20 15:53:35 -07:00
Alan Woodward	a3a0c63ccf	Convert NumberFieldMapper to parametrized form (#61092 ) (#61376 ) In addition, this commit converts ScaledFloatFieldMapper as it was relying on a number of static values taken from NumberFieldMapper that had changed or been removed.	2020-08-20 16:43:26 +01:00
Nik Everett	9789e6d154	Migrate some field mapper tests to ESTestCase (#61301 ) (#61346 ) This switches a few tests for field mappers from `ESSingleNodeTestCase` to `ESTestCase` because, in general, we prefer to avoid `ESSingleNodeTestCase` when we can because it is slow and "big". "Big" here means that it pulls in an entire node, making it difficult to reason about what you are testing.	2020-08-19 15:43:49 -04:00
Francisco Fernández Castaño	89a7f32100	Fix SearchableSnapshotDirectoryTests#testRecoveryStateIsKeptOpenAfterPreWarmFailure (#61343 ) The test didn't take into account the case where 0 documents are indexed into the shard, meaning that files aren't loaded during the pre-warm phase. The test injects FileSystem failures, if the snapshot doesn't contain any files, pre-warm doesn't read any files and the recovery completes normally. Closes #61295 Backport of #61317	2020-08-19 19:28:47 +02:00
Andrei Stefan	a214d7902a	EQL: make endsWith function use a wildcard ES query wherever possible (#61160 ) (#61320 ) (cherry picked from commit 55fdb7e2c74d4fae86ec40686091ecba831caeaf)	2020-08-19 14:17:55 +03:00
Andrei Stefan	a6c0670a14	EQL: make stringContains function use a wildcard ES query (#61189 ) (#61313 ) (cherry picked from commit 039a7d1c68f6f1ed0e7e6cfb86be6b04eec8051c)	2020-08-19 12:40:48 +03:00
Martijn van Groningen	d4a8172f8e	Disable ilm history in data streams rest qa module. (#61312 ) Backport of #61291 to 7.x branch. Closes #61273	2020-08-19 10:34:26 +02:00
Andrei Stefan	93abbb9057	Add data streams wildcard pattern yml test (#61269 ) (#61280 ) (cherry picked from commit e13a365eeb6d8c6a7c9a91f94f0e8e78e3fe4773)	2020-08-18 19:38:07 +03:00
Andrei Stefan	5de0f19cc3	EQL: Return sequence join keys in the original type (#61268 ) (#61282 ) (cherry picked from commit d54957d61faa0d502387656e3cace594017b6ea0)	2020-08-18 19:37:15 +03:00
Martijn van Groningen	cbf60f6c5e	Add tests that simulate new indexing strategy upgrade procedure. (#61263 ) Backport of #61082 to 7.x branch. Closes #58251	2020-08-18 17:02:29 +02:00
Andrei Stefan	ad627c7eab	Introduce ordering in the constant_keyword test for better predictability. (#61248 ) (#61252 ) (cherry picked from commit 69193f9de8178dbaa1d8467f1686b100dd2b161c)	2020-08-18 12:17:15 +03:00
Mark Tozzi	db1df6cc30	[7.x] Remove a bunch of type boilerplate from Aggs (#60852 ) (#61031 )	2020-08-17 12:13:05 -04:00
Andrei Stefan	db8788e5a2	QL: wildcard field type support (#58062 ) (#61205 ) (cherry picked from commit c874e6cdd3e051ce599b50c18642de038b84105f)	2020-08-17 18:24:32 +03:00
Andrei Stefan	90e116738e	QL: add filtering query dsl support to IndexResolver (#60514 ) (#61200 ) (cherry picked from commit 7b3635d796be26af9f87d19963a8ed4ab4bbf13f)	2020-08-17 17:59:58 +03:00
Nik Everett	1b7bbafd81	Add method to make random DateFormatter pattern (backport of #60613 ) (#61213 ) Adds a method to make a random date `DateFormatter` pattern. We expect this'll be useful for runtime fields to compare their formatting with the standard date field.	2020-08-17 10:57:52 -04:00
David Kyle	ba89af544f	[7.x] Respect ML upgrade mode in TrainedModelStatsService (#61143 ) (#61187 ) When in upgrade mode the ml stats service should not write to the stats index.	2020-08-17 11:09:25 +01:00
Benjamin Trent	43fc6c34bc	Muting analytics integration tests for change new native output model_metadata (#61158 ) relates to elastic/ml-cpp#1456	2020-08-14 11:45:35 -04:00
Benjamin Trent	8f302282f4	[ML] adds new feature_processors field for data frame analytics (#60528 ) (#61148 ) feature_processors allow users to create custom features from individual document fields. These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process and the native process then appends them to the pre_processor array in the inference model. closes https://github.com/elastic/elasticsearch/issues/59327	2020-08-14 10:32:20 -04:00
David Roberts	d1b60269f4	[ML] Ensure annotations index mappings are up to date (#61142 ) When the ML annotations index was first added, only the ML UI wrote to it, so the code to create it was designed with this in mind. Now the ML backend also creates annotations, and those mappings can change between versions. In this change: 1. The code that runs on the master node to create the annotations index if it doesn't exist but another ML index does also now ensures the mappings are up-to-date. This is good enough for the ML UI's use of the annotations index, because the upgrade order rules say that the whole Elasticsearch cluster must be upgraded prior to Kibana, so the master node should be on the newer version before Kibana tries to write an annotation with the new fields. 2. We now also check whether the annotations index exists with the correct mappings before starting an autodetect process on a node. This is necessary because ML nodes can be upgraded before the master node, so could write an annotation with the new fields before the master node knows about the new fields. Backport of #61107	2020-08-14 13:51:04 +01:00
Benjamin Trent	7c3bfb9437	[ML] updating feature_importance results mapping (#61104 ) (#61144 ) This updates the feature_importance mapping change from elastic/ml-cpp#1387	2020-08-14 08:43:10 -04:00
Nhat Nguyen	328c86a4ec	Increase timeout in PrimaryFollowerAllocationIT A slow CI can take more than 10 seconds to relocate shards on the follower.	2020-08-13 14:41:32 -04:00
Benjamin Trent	a497263c47	[ML] ensure config index is updated before clearing finished_time (#61064 ) (#61085 ) When a user upgrades between versions, they may stop their ML jobs. Then when the upgrade is complete, they will want to open the jobs again. But, when opening a job, we attempt to clear out the jobs finished_time. If the job configuration has adjusted between the versions (i.e. added a new field), it will dynamically update the .ml-config index. We should instead manually change the mapping to be the updated version.	2020-08-13 08:12:10 -04:00
David Turner	dd7410d8c2	Disable rebalancing in searchable snapshots tests (#61068 ) Fixes a test failure in which we allocated some shards and then relocated them elsewhere, invalidating an assertion about the recovery statistics which assumed that the shards stayed where they were originally allocated. Closes #61067.	2020-08-13 09:08:27 +01:00
Lee Hinman	e3df64a429	[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994 ) (#61045 ) This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch. These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (each is a hot, warm, cold, and frozen node). This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level: - `cluster.routing.allocation.require._tier` - `cluster.routing.allocation.include._tier` - `cluster.routing.allocation.exclude._tier` And in index settings: - `index.routing.allocation.require._tier` - `index.routing.allocation.include._tier` - `index.routing.allocation.exclude._tier` Relates to #60848	2020-08-12 11:06:23 -06:00
Andrei Dan	32173a82c8	ILM: add frozen phase (#60983 ) (#61035 ) This adds a frozen phase to ILM that will allow the execution of the set_priority, unfollow, allocate, freeze and searchable_snapshot actions. The frozen phase will be executed after the cold and before the delete phase. (cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-12 16:36:27 +01:00
Yannick Welsch	6644f2283d	Do not access snapshot repo on dedicated voting-only master node (#61016 ) Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and will never be the elected master so we should not require it to have write access to the repository. Closes #59649	2020-08-12 16:56:45 +02:00
Benjamin Trent	4275a715c9	[ML] adjusting inference processor to support foreach usage (#60915 ) (#61022 ) `foreach` processors store information within the `_ingest` metadata object. This commit adds the contents of the `_ingest` metadata (if it is not empty), and will append new inference results if the result field already exists. This allows a `foreach` to execute and multiple inference results to be written to the same result field. closes https://github.com/elastic/elasticsearch/issues/60867	2020-08-12 08:34:18 -04:00
markharwood	66098e0bf4	Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959 ) (#61010 ) The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type. The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields. Closes #60957	2020-08-12 10:44:52 +01:00
Armin Braun	32423a486d	Simplify and Speed up some Compression Usage (#60953 ) (#61008 ) Use thread-local buffers and deflater and inflater instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe since their off-heap memory will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc. Relates #57284 which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these code paths.	2020-08-12 11:06:23 +02:00
Andrei Dan	35423a75af	Tests: don't fail if ILM executed the action already (#60916 ) (#60982 ) (cherry picked from commit 8c970ad20f4f55a9c0d6a256aa643ea037281e75) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-12 09:04:04 +01:00
Dimitris Athanasiou	2e18c0f2ac	[7.x][ML] Audit force stopping data frame analytics (#60973 ) (#61004 ) Audits a message when a data frame analytics job is force stopped. Backport of #60973	2020-08-12 07:45:26 +03:00
Nhat Nguyen	ceaa28e97b	Increase timeout in testFollowIndexWithConcurrentMappingChanges (#60534 ) The test failed because the leader was taking a lot of CPUs to process many mapping updates. This commit reduces the mapping updates, increases timeout, and adds more debug info. Closes #59832	2020-08-11 17:03:22 -04:00
Nhat Nguyen	bf7eecf1dc	Fix synchronization in ShardFollowNodeTask (#60490 ) The leader mapping, settings, and aliases versions in a shard follow-task are updated without proper synchronization and can go backward.	2020-08-11 14:52:52 -04:00
Francisco Fernández Castaño	d544528c7b	Increase information on assertRecoveryStats assertion (#60960 ) Backport of #60952	2020-08-11 15:30:59 +02:00
Dimitris Athanasiou	6062672148	[7.x][ML] Monitor reindex response in DF analytics (#60911 ) (#60958 ) Examines the reindex response in order to report potential problems that occurred during the reindexing phase of data frame analytics jobs. Backport of #60911	2020-08-11 16:17:37 +03:00
Mark Tozzi	ab8518fb5b	[7.x] Extensibility for Composite Agg #59648 (#60842 )	2020-08-11 09:14:33 -04:00
Dan Hermann	839c6cdfc0	Un-mute data stream REST test (#60120 ) (#60939 )	2020-08-11 08:10:04 -05:00
David Kyle	18a65c5b9a	DFA Get Stats can return multiple responses if more than one error occurs (#60950 ) If the search for get stats with multiple job Ids fails the listener is called for each failure. This change waits for all responses then returns the first error if there was one.	2020-08-11 10:28:05 +01:00
Alan Woodward	54279212cf	Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847 ) (#60924 ) This commit cuts over all metadata field mappers to parametrized format.	2020-08-11 09:02:28 +01:00
Benjamin Trent	66b3e89482	[ML] enable logging for test failures (#60902 ) (#60910 )	2020-08-10 12:36:30 -04:00
Francisco Fernández Castaño	2a4fd8329b	Avoid a race condition while waiting for pre warm to finish on SearchableSnapshotDirectoryTests (#60906 ) Backport of #60885. Closes #60813	2020-08-10 17:29:16 +02:00
Jim Ferenczi	f30f1f04e2	Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816 ) This commit removes the ability to test the top level result of an aggregator before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test the final output (the one sent to the end user) rather than an intermediary result that could be different. This change also removes spurious commits triggered on top of a random index writer. These commits slow down the tests and are redundant with the commits that the random index writer performs.	2020-08-10 17:23:00 +02:00
David Roberts	dd02e9f31a	[TEST] Mute SearchableSnapshotActionIT testSearchableSnapshotForceMergesIndexToOneSegment (#60904 ) Due to https://github.com/elastic/elasticsearch/issues/60901	2020-08-10 15:25:39 +01:00
Henning Andersen	a155315ceb	Autoscaling decider and decision service (#59005 ) (#60884 ) Split the autoscaling decider into a service and configuration in order to enable having additional context information available in the service. Added AutoscalingDeciderContext holding generic information all deciders are expected to need. Implemented GET _autoscaling/decision	2020-08-10 15:28:52 +02:00
Andrei Dan	235e5ed3ea	[7.x] ILM: add force-merge step to searchable snapshots action (#60819 ) (#60882 ) This adds a force-merge step to the searchable snapshot action, enabled by default, but parameterizable using the `force_merge_index` optional boolean. e.g. ``` PUT _ilm/policy/my_policy { "policy": { "phases": { "cold": { "actions": { "searchable_snapshot" : { "snapshot_repository" : "backing_repo", "force_merge_index": true } } } } } } ``` (cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-10 13:45:11 +01:00
Martijn van Groningen	64bb082f9b	Improve error message for non append-only writes that target data stream (#60874 ) Backport of #60809 to 7.x branch. Closes #60581	2020-08-10 13:18:59 +02:00
David Kyle	6b2ddf4453	Fix typo in DataHistogramGroupByIT name (#60880 ) (#60883 )	2020-08-10 11:55:01 +01:00
David Turner	f168bdac7d	Change transitive -> transient in ILM log message (#60871 ) "Transitive" is technically ok here but it's an overloaded word and it's not immediately clear which meaning is intended so this log message always makes me do a double-take. I think both "transient" and "transitory" are clearer, with "transient" being the usual choice.	2020-08-10 11:37:49 +01:00
David Turner	a2d5bfca2f	Even longer timeout for XPackRestIT (#60812 ) This suite is still occasionally failing with a timeout on macOS. Suggest further increasing this timeout until this suite is broken up. Relates #58071	2020-08-10 10:26:21 +01:00
Benjamin Trent	bc17afc535	[7.x] [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776 ) (#60784 ) * [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776) When deleting an analytics configuration, the request MIGHT fail if the .ml-stats index does not exist or is in strange state (shards unallocated). Instead of making the request fail, we should log that we were unable to delete the stats docs and then have them cleaned up in the 'delete_expire_data' janitorial process	2020-08-06 08:55:35 -04:00
David Turner	05b2a2db8b	AwaitsFix for #60781	2020-08-06 12:28:53 +01:00
David Turner	f24a3a4e81	AwaitsFix for 60781	2020-08-06 11:35:44 +01:00
Hendrik Muhs	b210aaf666	[Transform] remove wrong test (#60807 ) remove test, scripts are excluded in the change collector, the test is a leftover from a previous solution of #57332, which has been discarded relates #60724 fixes #60794	2020-08-06 11:56:19 +02:00
Dimitris Athanasiou	cedbe6968b	[7.x][ML] Include cause in logging during test inference (#60749 ) (#60805 ) When an exception is thrown during test inference we are not including the cause message in our logging. This commit addresses this issue. Backport of #60749	2020-08-06 11:45:59 +03:00
Ryan Ernst	d88098c1d5	Mute flaky transform pivot test see https://github.com/elastic/elasticsearch/issues/60794	2020-08-05 14:53:25 -07:00
Francisco Fernández Castaño	b4044004aa	Add recovery state tracking for Searchable Snapshots (#60751 ) This pull request adds recovery state tracking for Searchable Snapshots. In order to track recoveries for searchable snapshot backed indices, this pull request adds a new type of RecoveryState. This new RecoveryState instance is able to deal with the small differences that arise during searchable snapshot recoveries. Those differences can be summarized as follows: - The Directory implementation that's provided by SearchableSnapshots marks the snapshot files as reused during recovery. In order to keep track of the recovery process as the cache is pre-warmed, those files shouldn't be marked as reused. - Once the shard is created, the cache starts its pre-warming phase, meaning that we should keep track of those downloads during that process and tie the recovery to this pre-warming phase. The shard is considered recovered once this pre-warming phase has finished. Backport of #60505	2020-08-05 17:41:49 +02:00
Hendrik Muhs	08f94c914b	[Transform] disable optimizations when using scripts in group_by (#60724 ) disable optimizations when using scripts in group_by: when using scripts we cannot predict the outcome and we have no query counterpart. Other optimizations for other group_by's are not affected. fixes #57332	2020-08-05 17:27:19 +02:00
Hendrik Muhs	2b6891b584	[7.x][Transform] implement test suite to test continuous transforms (#60725 ) implements a test suite for testing continuous transform with randomization in terms of mappings, index settings, transform configuration. Add a test case for terms and date histogram. The test covers: - continuous mode with several checkpoints created - correctness of results - optimizations (minimal necessary writes) - permutations of features (index settings, aggs, data types, index or data stream)	2020-08-05 16:56:01 +02:00
Albert Zaharovits	e5dce5e805	Use the Index Access Control from the scroll search context (#60640 ) When the RBACEngine authorizes scroll searches it sets the index access control to the very limiting IndicesAccessControl.ALLOW_NO_INDICES value. This change will set it to the value for the index access control that was produced during the authorization of the initial search that created the scroll, which is now stored in the scroll context.	2020-08-05 15:37:37 +03:00
Przemysław Witek	0afa1bd972	Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601 ) (#60727 )	2020-08-05 13:39:40 +02:00
Yannick Welsch	9f6f66f156	Fail searchable snapshot shards on invalid license (#60722 ) Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.	2020-08-05 13:14:15 +02:00
Adrien Grand	67f6f34c23	Remove dataset.* fields. (#60720 ) These are being replaced by the `data_stream.*` fields.	2020-08-05 11:35:05 +02:00
Rory Hunter	43762f69d1	Move deprecation HTTP tests to deprecation plugin (#60523 ) Backport of #60298. This PR moves the deprecation HTTP tests under the deprecation plugin, as a precursor to adding further tests as part of #58924.	2020-08-05 09:54:34 +01:00
Adrien Grand	602d269059	Rename `datastream` to `data_stream`. (#60714 ) The name of the feature having a space: "data stream", the key should have an underscore.	2020-08-05 09:55:02 +02:00
Russ Cam	e9c0bf1566	Remove body from indices.create_data_stream REST spec (#60705 ) This commit removes the body property from the indices.create_data_stream.json REST API spec as the API does not support sending a body. Update the description of the API to remove that a data stream can be updated with the API - data streams can only be created with this API and attempting to update yields a `resource_already_exists_exception`. Closes #60704 (cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)	2020-08-05 17:01:28 +10:00
Igor Motov	959690a64a	Refactor extendedBounds to use DoubleBounds (#60556 ) (#60681 ) Refactors extendedBounds to use DoubleBounds instead of 2 variables. This is a follow up for #59175	2020-08-04 16:45:47 -04:00
Francisco Fernández Castaño	b500b3d55a	Decrease restore rate limit value to enforce its usage on SearchableSnapshotsIntegTests#testMaxRestoreBytesPerSecIsUsed (#60650 ) Fixes #59287. Backport of #59592	2020-08-04 17:44:47 +02:00
Alan Woodward	b3ae5d26bd	Move mapper validation to the mappers themselves (#60072 ) (#60649 ) Currently, validation of mappers (checking that cross-references are correct, limits on field name lengths and object depths, multiple definitions, etc) is performed by the MapperService. This means that any mapper-specific validation, for example that done on the CompletionFieldMapper, needs to be called specifically from core server code, and so we can't add validation to mappers that live in plugins. This commit reworks the validation framework so that mapper-specific validation is done on the Mapper itself. Mapper gets a new `validate(MappingLookup)` method (already present on `MetadataFieldMapper` and now pulled up to the parent interface), which is called from a new `DocumentMapper.validate()` method. All the validation code currently living on `MapperService` moves either to individual mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into `MappingLookup`, an altered `DocumentFieldMappers` which now knows about object fields and can check for duplicate definitions, or into DocumentMapper which handles soft limit checks.	2020-08-04 14:39:20 +01:00
Rene Groeschke	bdd7347bbf	Merge test runner task into RestIntegTest (7.x backport) (#60600 ) * Merge test runner task into RestIntegTest (#60261) * Merge test runner task into RestIntegTest * Reorganizing Standalone runner and RestIntegTest task * Rework general test task configuration and extension * Fix merge issues * use former 7.x common test configuration	2020-08-04 14:46:32 +02:00
Adrien Grand	20ae1b75bd	Rename dataset to datastream (#60638 ) Co-authored-by: ruflin <spam@ruflin.com>	2020-08-04 09:58:54 +02:00
Armin Braun	7ae9dc2092	Unify Stream Copy Buffer Usage (#56078 ) (#60608 ) We have various ways of copying between two streams and handling thread-local buffers throughout the codebase. This commit unifies a number of them and removes buffer allocations in many spots.	2020-08-04 09:54:52 +02:00
Yang Wang	54aaadade7	API key name should always be required for creation (#59836 ) (#60636 ) The name is now required when creating or granting API keys.	2020-08-04 13:28:47 +10:00
Tim Vernum	c58e32bb27	Improve assertion failure when error is not empty (#60572 ) This commit changes TokenAuthIntegTests so all occurrences of assertThat(x.size(), equalTo(0)); become assertThat(x, empty()); This means that the assertion failure message will include the contents of the list (`x`) instead of just its size, which facilitates easier failure diagnosis. Relates: #56903 Backport of: #60496	2020-08-04 11:05:18 +10:00
Jake Landis	bcb9d06bb6	[7.x] Cleanup xpack build.gradle (#60554 ) (#60603 ) This commit does three things: * Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license) * Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack * Removes a place holder test in favor of disabling the test task (in the async plugin)	2020-08-03 13:11:43 -05:00
Hendrik Muhs	1e01832b0c	fix possible NPE introduced in #60591	2020-08-03 16:40:38 +02:00
Hendrik Muhs	cd6492fc11	[Transform] fix regression of date histogram optimization (#60591 ) fixes mix up of input and output field name for date histogram optimization. minimal fix, more tests to be added with #60469 fixes #60590	2020-08-03 15:52:08 +02:00
Yannick Welsch	b0d601fa63	Adjust searchable snapshot license (#60578 ) No longer needs Platinum license for testing on staging.	2020-08-03 13:19:53 +02:00
Yannick Welsch	9e24a54382	Clean existing index folder when loading searchable snapshot (#60122 ) Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index folders of those preexisting shards. This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will make reuse of the existing index files to populate the cache.	2020-08-03 13:19:11 +02:00
Yang Wang	a76fc324d4	Fix get-license test failure by ensuring cluster is ready (#60498 ) (#60569 ) When a new cluster starts, the HTTP layer becomes ready to accept incoming requests while the basic license is still being populated in the background. When a get license request comes in before the license is ready, it can get a 404 error. This PR fixes it by either wrapping the license check in assertBusy or ensuring the license is ready before performing the check. This is a backport for both #60498 and #60573	2020-08-03 19:40:03 +10:00
Tim Vernum	1a373b0c21	Only call listener once (SP template registration) (#60567 ) This fixes a bug in the IdP's template registration that would sometimes call the listener twice. Resolves: #54285 Resolves: #54423 Backport of: #60497	2020-08-03 13:45:16 +10:00
Andrei Dan	ac258f10d6	Data streams: throw ResourceAlreadyExists exception (#60518 ) (#60536 ) For consistency reasons (and reducing the overload of IllegalArgumentException) this changes the exception thrown when trying to create a data stream that already exists. (cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-01 16:31:09 +01:00
Julie Tibshirani	f1d4fd8c3e	Correct name of IndexFieldData#loadGlobalDirect. (#60492 ) It seems 'localGlobalDirect' was just a typo.	2020-07-31 10:53:21 -07:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Hendrik Muhs	a721d6d19b	[Transform] use correct version in BWC serialization test (#60500 ) use correct version in BWC serialization test fixes #60464	2020-07-31 11:23:05 +02:00
Julie Tibshirani	8ac81a3447	Remove IndexFieldData#clear since it is unused. (#60475 ) This method was never called. It also seemed tricky that calling a method on `IndexFieldData` could clear the contents of a shared cache.	2020-07-30 14:07:55 -07:00
Mark Tozzi	970a0c8957	[7.x] Aggregation tests for Wildcard Field (#58507 ) (#60423 )	2020-07-30 08:56:21 -04:00
Przemysław Witek	9e27f7474c	Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121 ) (#60439 )	2020-07-30 09:53:11 +02:00
Hendrik Muhs	aaed6b59d6	[7.x][Transform] add support for missing bucket (#59591 ) (#60390 ) add support for "missing_bucket" in group_by fixes #42941 fixes #55102 backport #59591	2020-07-30 08:26:51 +02:00
Bogdan Pintea	8c22adc447	SQL: Add option to provide the delimiter for the CSV format (#59907 ) (#60420 ) * SQL: Add option to provide the delimiter for the CSV format (#59907) * Add option to provide the delimiter to the CSV fmt This adds the option to provide the desired character as the separator for the CSV format (the default remains comma). A set of characters are excluded though - like CR, LF, `"` - to avoid slipping onto the CSV-dialects slope. The tab is also forbidden, the user needs to choose the "tsv" format explicitly. Update the doc to make it clear that the textual CSV, TSV and TXT formats pass the cursor back to the user through the Cursor HTTP header. (cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc) * Java8 fixes - replace Set#of(); - URLDecoder#decode() requires a string (vs a charset) as 2nd arg.	2020-07-29 21:40:11 +02:00
Bogdan Pintea	30610d962a	Fix SYS COLUMNS schema in ODBC mode (#59513 ) (#60418 ) * Fix SYS COLUMNS schema in ODBC mode (#59513) * Fix SYS COLUMNS schema in ODBC mode This fixes a regression when certain ODBC-specific columns that need to be of the short type were returned as the integer type. This also fixes the stubbing for the -indices SYS COLUMN commands. (cherry picked from commit 96d89dc9b1fd731e736ef804a16bd05496c1dea6) Java8 fix: avoid diamond notation in test. Qualify anonymous class in test.	2020-07-29 21:19:32 +02:00
Bogdan Pintea	4c771485f6	SQL: fix NPE on ambiguous GROUP BY (#59370 ) (#60416 ) * fix npe on ambiguous group by * add tests for aggregates and group by, add quotes to error message * add more cases for Group By ambiguity test * change error messages for field ambiguity * change collection aliases approach * add locations of attributes for ambiguous grouping error * Address review comments - remove Comparable implementations from Attribute and Location; - add ad-hoc comparator for sorting locations in ambiguity message; - remove added AttributeAlias class with Touple; - add code comment to explain issue with Location overwriting. * Fix c&p error in location ref generation comparator Fix copy&paste error in dedicated comparator used for sorting ambiguity location references. Slightly increase its readability. Co-authored-by: Nikita Verkhovin <verkhovin13@gmail.com> (cherry picked from commit 9ba70a3483f0f4987229bec231cdc004f51b88a5)	2020-07-29 20:44:28 +02:00
Bogdan Pintea	79ef263fc2	Add test with alias reuse and grouping (#60396 ) (#60421 ) Add test with alias reuse and grouping. (cherry picked from commit 37ee819eb98fd10c1b16a61e4e1d446d0ee859de)	2020-07-29 20:43:04 +02:00
Mark Vieira	39fa1c4df0	Add compatibility testing for JDBC driver (#60409 ) This commit adds compatibility testing of our JDBC driver against different Elasticsearch versions. Although we are really testing the forwards compatibility nature of the JDBC driver we model the testing the same as we do existing BWC tests, that is, with the current branch fetching the earlier versions of the artifact that is to be tested. In this case, that's the JDBC driver itself. Because the tests include the JDBC driver jar on its classpath we had to change the packaging of the driver jar in order to avoid jarhell and other conflicting dependency issues when using an old JDBC driver with later branches. For this we simply relocate all driver dependencies in the shadow jar under a "shadowed" package. This allows the JDBC driver to use the correct version of Elasticsearch libs classes, while the tests themselves use their versions. Since this required a change to the driver jar compatibility testing can only go back as far as that version which at the time of this commit is 7.8.1.	2020-07-29 10:45:11 -07:00
David Roberts	2a0116f51b	[ML] Take more care that memory estimation uses unique named pipes (#60405 ) Prior to this change ML memory estimation processes for a given job would always use the same named pipe names. This would often cause one of the processes to fail. This change avoids this risk by adding an incrementing counter value into the named pipe names used for memory estimation processes. Backport of #60395	2020-07-29 17:29:55 +01:00
Armin Braun	bfee7b91ff	Increase Timeouts in SLMBlockingIntegTests (#60356 ) (#60403 ) The retention run goes through a number of steps and can randomly take more than 10s. => increased timeout to 30s like we did in other spots in this test Also, noticed that we had a hard wait of 10s in this test, removed it and adjusted following busy assert in a way that can deal with a missing snapshot (from when the assert runs before the snapshot was put into the CS). Closes #60336	2020-07-29 17:34:49 +02:00
Benjamin Trent	76359aaa53	[ML] always write prediction_[score\|probability] for classification inference (#60335 ) (#60397 ) In order to unify model inference and analytics results we need to write the same fields. prediction_probability and prediction_score are now written for inference calls against classification models.	2020-07-29 10:58:14 -04:00
Nhat Nguyen	9d4a64e749	Allow CCR on nodes with legacy roles only (#60093 ) CCR will stop functioning if the master node is on 7.8, but data nodes are before that version because the master node considers that all data nodes do not have the remote cluster client role. This commit allows CCR to work on data nodes with legacy roles only. Relates #54146 Relates #59375	2020-07-29 10:57:31 -04:00
Benjamin Trent	a6da1fd73e	[ML] require alias when indexing to an alias that should be created (#60315 ) (#60394 ) This sets up all indexing to one of our write aliases to require it actually be an alias. This allows failure scenarios to be captured quickly, loudly, and then potentially recovered.	2020-07-29 10:52:36 -04:00
Jim Ferenczi	578749a5e8	Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337 ) This change ensures that the expiration time that is set in the test is long enough to not be triggered by a slow execution. Closes #60255	2020-07-29 09:47:47 +02:00
Hendrik Muhs	5eb04fb413	[Transform] fix performance regression introduced in #60196 (#60276 ) re-work #60196, to not skip building change collectors as otherwise date histogram only pivots would run slow relates #60125	2020-07-29 09:44:03 +02:00
Armin Braun	753fd4f6bc	Cleanup and optimize More Serialization Spots (#59959 ) (#60331 ) Same as #59626 for a few more spots.	2020-07-29 07:20:44 +02:00
Benjamin Trent	54c8936508	[ML] do not summarize importance for custom features (#60198 ) (#60333 ) If a feature is created via a custom pre-processor, we should return the importance for that feature. This means we will not return the importance for the original document field for custom processed features. closes https://github.com/elastic/elasticsearch/issues/59330	2020-07-28 15:58:20 -04:00
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
Nhat Nguyen	416e51980c	Relax ShardFollowTasksExecutor validation (#60054 ) If a primary shard of a follower index is being relocated, then we will fail to create a follow-task. This validation is too restricted. We should ensure that all primaries of the follower index are active instead. Closes #59625	2020-07-28 13:46:49 -04:00
Nhat Nguyen	6ece629ec3	Set timeout of master requests on follower to unbounded (#60070 ) Today, a follow task will fail if the master node of the follower cluster is temporarily overloaded and unable to process master node requests (such as update mapping, setting, or alias) from a follow-task within the default timeout. This error is transient, and follow-tasks should not abort. We can avoid this problem by setting the timeout of master node requests on the follower cluster to unbounded. Closes #56891	2020-07-28 13:46:49 -04:00
Zachary Tong	9f8ec3e3fb	Mute SSLDriverTests#testCloseDuringHandshakePreJDK11 Tracking issue: https://github.com/elastic/elasticsearch/issues/59992	2020-07-28 13:20:53 -04:00
markharwood	e0286e9bd3	Search - remove allow-expensive-query checks from wildcard field. (#60273 ) (#60308 ) Removing allow-expensive-query checks because we think this field type is fast enough. Closes #60139	2020-07-28 17:12:33 +01:00
Dimitris Athanasiou	ed7dcff7c4	[7.x][ML] Audit updates on data frame analytics jobs (#60126 ) (#60287 ) Closes #59652 Backport of #60126	2020-07-28 16:33:35 +03:00
Dimitris Athanasiou	16ffcfb9f6	[7.x][ML] Ensure bulk requests are not over memory limit (#60219 ) (#60283 ) Data frame analytics jobs that work with very large datasets may produce bulk requests that are over the memory limit for indexing. This commit adds a helper class that bundles index requests in bulk requests that steer away from the memory limit. We then use this class both from the results joiner and the inference runner ensuring data frame analytics jobs do not generate bulk requests that are too large. Note the limit was implemented in #58885. Backport of #60219	2020-07-28 16:04:03 +03:00
Dimitris Athanasiou	981e436d6c	[7.x][ML] Improve assertion on regression alias field test (#60221 ) (#60264 ) Previously the test was asserting the prediction on each document was within 10.0 of the expected. It turned out that was not enough as we occasionally saw the test failing by a small margin. Instead of relaxing that assertion, this commit changes it to assert the mean prediction error is less than 10.0. This should reduce the chances of the test failing significantly. Fixes #60212 Backport of #60221	2020-07-28 11:48:00 +03:00
Dan Hermann	b98caf58ee	Mark data stream APIs as stable (#59860 ) (#60206 )	2020-07-27 10:37:52 -05:00
Benjamin Trent	ea3c49979e	Test mute for issue 60212 (#60214 )	2020-07-27 10:10:40 -04:00
Hendrik Muhs	95c99ca887	[Transform] Fix Regression: continuous transform can fail for (date) histogram group_by(#60196 ) do not create change collector if group_by configuration does not support change detection fixes #60125	2020-07-27 14:50:03 +02:00
Dimitris Athanasiou	439b7f7e59	[7.x][ML] DFA result processor should only skip rows and model chunks on cancel (#60113 ) (#60193 ) When the job is force-closed or shutting down due to a fatal error we clean up all cancellable job operations. This includes cancelling the results processor. However, this means that we might not persist objects that are written from the process like stats, memory usage, etc. In hindsight, we do not gain from cancelling the results processor in its entirety. It makes more sense to skip row results and model chunks but keep stats and instrumentation about the job as the latter may contain useful information to understand what happened to the job. Backport of #60113	2020-07-27 13:42:46 +03:00
David Roberts	89466eefa5	Don't require separate privilege for internal detail of put pipeline (#60190 ) Putting an ingest pipeline used to require that the user calling it had permission to get nodes info as well as permission to manage ingest. This was due to an internal implementation detail that was not visible to the end user. This change alters the behaviour so that a user with the manage_pipeline cluster privilege can put an ingest pipeline regardless of whether they have the separate privilege to get nodes info. The internal implementation detail now runs as the internal _xpack user when security is enabled. Backport of #60106	2020-07-27 10:44:48 +01:00
Nhat Nguyen	bc65b3a590	Increase timeout in AutoFollowIT (#60004 ) It can take more than 10 seconds to auto-follow and create a follow-task on a slow CI. This commit increases timeout in AutoFollowIT by replacing assertBusy with assertLongBusy. Closes #59952	2020-07-23 16:36:53 -04:00
Nhat Nguyen	0fe4d5df67	Increase timeout testFollowIndexWithConcurrentMappingChanges Fixes #59273	2020-07-23 16:22:58 -04:00
Dimitris Athanasiou	6b9a362ec2	[7.x][ML] Skip test inference if DFA task has been stopped (#60116 ) (#60127 ) If the job is stopped before starting inference on test data, we should skip inference entirely. Backport of #60116	2020-07-23 18:34:09 +03:00
Dan Hermann	ca25f6ae6f	Include the resolve index action in the view_index_metadata privilege (#59785 ) (#60112 )	2020-07-23 08:13:56 -05:00
Dan Hermann	fe12217c7f	[7.x] Move REST specs for data streams (#60111 )	2020-07-23 08:10:54 -05:00
Armin Braun	ebb6677815	Formalize and Streamline Buffer Sizes used by Repositories (#59771 ) (#60051 ) Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient. By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well. We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`. This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs. This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.	2020-07-22 21:06:31 +02:00
Larry Gregory	a686ccc9b2	[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854 ) (#60047 )	2020-07-22 11:06:10 -04:00
Jay Modi	c8ef2e18f7	Thread safe clean up of LocalNodeModeListeners (#60007 ) This commit continues on the work in #59801 and makes other implementors of the LocalNodeMasterListener interface thread safe in that they will no longer allow the callbacks to run on different threads and possibly race each other. This also helps address other issues where these events could be queued to wait for execution while the service keeps moving forward thinking it is the master even when that is not the case. In order to accomplish this, the LocalNodeMasterListener no longer has the executorName() method to prevent future uses that could encounter this surprising behavior. Each use was inspected and if the class was also a ClusterStateListener, the implementation of LocalNodeMasterListener was removed in favor of a single listener that combined the logic. A single listener is used and there is currently no guarantee on execution order between ClusterStateListeners and LocalNodeMasterListeners, so a future change there could cause undesired consequences. For other classes, the implementations of the callbacks were inspected and if the operations were lightweight, the overridden executorName method was removed to use the default, which runs on the same thread. Backport of #59932	2020-07-22 08:02:18 -06:00
Dimitris Athanasiou	7e652ca873	[7.x][ML] Include same fields during test inference as in training (#… (#60034 ) In #58877, when we switched test inference on java, we just use the doc's `_source` as features. However, this could be missing out on features that were used during training, e.g. alias fields, etc. This commit addresses this by extracting fields to use as features during inference the same way they are extracted in `DataFrameDataExtractor` when they are used for training. Backport of #59963	2020-07-22 12:54:13 +03:00
David Roberts	7358f9fb05	[ML] Mute ForecastIT.testOverflowToDisk in EAR builds (#60040 ) Due to https://github.com/elastic/elasticsearch/issues/58806	2020-07-22 10:17:37 +01:00
James Baiera	1c1a4297e0	Track backing indices in data streams stats from cluster state (#59817 ) (#60015 ) If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate counts of the number of backing indices, despite this data being accurate and available in the cluster state.	2020-07-21 23:21:33 -04:00
James Baiera	b3363cf8f9	[7.x] Remove unneeded rest params from Data Stream Stats (#59575 ) (#59661 ) This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data Streams Stats REST endpoint. These options are required for broadcast requests, but are not needed for anything in terms of resolving data streams. Instead, we just set a default set of IndicesOptions on the transport request.	2020-07-21 15:59:16 -04:00
Armin Braun	5613e4b00b	Increase Timeout in testSLMRetentionAfterRestore (#59979 ) (#59991 ) This test failed by hitting the 10s default busy assert timeout. Given how involved the retention run is (multiple disk reads, CS updates etc.) we should have a higher timeout here. Also, removed the pointless delete call for the snapshot that we just asserted is gone, at the end of the test. Closes #59956	2020-07-21 18:19:18 +02:00
Nik Everett	6f6076e208	Drop some params from IndexFieldData.Builder (backport of #59934 ) (#59972 ) We never used the `IndexSettings` parameter and we only used the `MappedFieldType` parameter to get the name of the field which we already know everywhere where we build the `IFD.Builder`. This allows us to drop a fair bit of ceremony from a couple of tests.	2020-07-21 10:28:59 -04:00
Przemysław Witek	283a1f605c	Rename binary_soft_classification evaluation to outlier_detection (#59951 ) (#59970 )	2020-07-21 15:15:04 +02:00
Yannick Welsch	07784a0b16	CCR recoveries using wrong setting for chunk sizes (#59597 ) The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.	2020-07-21 13:56:06 +02:00
Tal Levy	c9ac4bf7c8	Reduce memory usage of GeoGridTiler tests (#59921 ) This PR further reduces the memory footprint of the testGeoHashGridCircuitBreaker test such that only 0.26% of the randomized runs result in memory usage of between 500kb-1mb, where most of those in that range produce ~650kb of usage. Before, 3% of the runs would use > 50mb of memory, resulting in OOMs in CI. Closes #59853.	2020-07-20 15:45:39 -07:00
Jay Modi	515b53d297	Fix race in SLM master/cluster state listeners (#59896 ) This change fixes two possible race conditions in SLM related to how local master changes and cluster state events are observed. When implementing the `LocalNodeMasterListener` interface, it is only recommended to execute on a separate threadpool if the operations are heavy and would block the cluster state thread. SLM specified that the listeners should run in the Snapshot thread pool, but the operations in the listener were lightweight. This had the side effect of causing master changes to be delayed if the Snapshot threads were all busy and could also potentially cause the `onMaster` and `offMaster` calls to race if both were queued and then executed concurrently. Additionally, the `SnapshotLifecycleService` is also a `ClusterStateListener` and there is currently no order of operations guarantee between `LocalNodeMasterListeners` and `ClusterStateListeners` so this could lead to incorrect behavior. The resolution for these two issues is that the SnapshotRetentionService now specifies the `SAME` executor for its implementation of the `LocalNodeMasterListener` interface. The `SnapshotLifecycleService` is no longer a `LocalNodeMasterListener` and instead tracks local master changes in its `ClusterStateListener`. Backport of #59801	2020-07-20 09:59:46 -06:00
Nik Everett	fcd8b5fe6e	Fix top_metrics when metric is missing (backport of #59471 ) (#59881 ) This fixes a null pointer exception when the metric is missing for the latest document returned by `top_metrics`. Closes #58926	2020-07-20 10:42:58 -04:00
Albert Zaharovits	3ffb20bdfc	Fix DLS/FLS permission for the submit async search action (#59693 ) The submit async search action should not populate the thread context DLS/FLS permission set, because it is not currently authorised as an "indices request" and hence the permission set that it builds is incomplete and it overrides the DLS/FLS permission set of the actual spawned search request (which is built correctly).	2020-07-20 09:37:26 +03:00
Costin Leau	9cc80621c3	EQL: Fix matching of tail/desc queries (#59827 ) When dealing with tail queries, data is returned descending for the base criterion yet the rest of the queries are ascending. This caused a problem during insertion since while in a page, the data is ASC, between pages the blocks of data are DESC. This caused incorrect sorting inside a SequenceGroup which led to incorrect results. Furthermore, in case of limit, since the data in a page is ASC, early return is not possible and neither is desc matching. Thus the page needs to be consumed first before finding the final results. A future improvement could be to keep only the top N results dropping the rest during insertion time. (cherry picked from commit 77c88da054a1ce662a264f72cde5986d4ce37e3a)	2020-07-19 00:49:16 +03:00
Lee Hinman	8c7d414a3b	[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806 ) (#59810 ) Backports the following commits to 7.x: Fix retrieving data stream stats for a DS with multiple backing indices (#59806)	2020-07-17 16:56:07 -06:00
Nik Everett	95e6e4a452	Small cleanup for IndexFieldData (#59724 ) (#59800 ) This drops `IndexComponent` from `IndexFieldData` because it wasn't doing anything other than forcing us to perform a bunch of ceremony to build them.	2020-07-17 13:38:15 -04:00
Tal Levy	c9ab7bb651	Fix bug in circuit-breaker check for geoshape grid aggregations (#57962 ) (#59741 ) There was a bug in the geoshape circuit-breaker check where the hash values array was being allocated before its new size was accounted for by the circuit breaker. Fixes #57847.	2020-07-17 09:26:00 -07:00
Benjamin Trent	b7f30fc929	[7.x] Adding new `require_alias` option to indexing requests (#58917 ) (#59769 ) * Adding new `require_alias` option to indexing requests (#58917) This commit adds the `require_alias` flag to requests that create new documents. This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias. When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings. This is useful when an alias is required instead of a concrete index. closes https://github.com/elastic/elasticsearch/issues/55267	2020-07-17 10:24:58 -04:00
Alan Woodward	b29d368b52	Convert DateFieldMapper to parametrized format (#59429 ) (#59759 ) This commit makes DateFieldMapper extend ParametrizedFieldMapper, declaring its parameters explicitly. As well as changes to DateFieldMapper itself, there are some changes to dynamic mapping code to ensure that dynamically detected date formats are passed through to new date mapper builders.	2020-07-17 12:46:18 +01:00
Andrei Dan	301d61a98e	Tests: fix TimeSeriesDataStreamsIT.testShrinkActionInPolicyWithoutHotPhase (#59603 ) (#59689 ) The ILM policy for the source and shrunk indices run separately (ie. they are two separate managed indices). This fixes the test which exhibited some flakiness by allowing some time for the ILM policy for the source index to finish executing. (cherry picked from commit c78d5e8499fc5ca2ca1314f97bcc6b55ba06e2e7) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-17 11:26:06 +01:00
Andrei Stefan	d513e1090f	Do not create the index, if it's already there (#59745 ) (#59747 ) (cherry picked from commit d097447d257efdf0a36b1157e1f177aed86ecca1)	2020-07-17 11:38:30 +03:00
Tanguy Leroux	4827fec1cf	Revert "Mute AzureSearchableSnapshotsIT (#58775 )" (#59749 ) This reverts commit `74a78b3a7b`.	2020-07-17 10:02:46 +02:00
Martijn van Groningen	0096238df1	Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727 ) Backport #59503 to 7.x and adjusted exception messages. Relates to #59076	2020-07-16 22:29:40 +02:00
Igor Motov	2408803fad	Adds hard_bounds to histogram aggregations (#59175 ) (#59656 ) Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets.	2020-07-16 15:31:53 -04:00
Marios Trivyzas	c7efbc1b83	SQL: Implement DATE_PARSE function for parsing strings into DATE values (#57391 ) (#59699 ) Implement DATE_PARSE(<date_str>, <pattern_str>) function which allows to parse a date string according to the specified pattern into a date object. The patterns allowed are those of java.time.format.DateTimeFormatter. Closes #54962 Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com> Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com> (cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)	2020-07-16 17:24:30 +02:00
Benjamin Trent	a28547c4b4	[7.x] [ML] add new `custom` field to trained model processors (#59542 ) (#59700 ) * [ML] add new `custom` field to trained model processors (#59542) This commit adds the new configurable field `custom`. `custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job. Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature). This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors in the analytics job configuration, we need to know the input and output field names.	2020-07-16 10:57:38 -04:00
Nik Everett	343053c0a7	Fix compilation in Eclipse (backport #59675 ) Eclipse was confused by #59583. It can't see the public inner interface within the superclass. This time. Usually that is fine, but the Eclipse gods don't like this particular code, I guess.	2020-07-16 08:25:12 -04:00
David Kyle	c349fdcb89	Mute RegressionIT testWithDataStream (#59687 ) For #59664	2020-07-16 09:45:29 +01:00
Przemysław Witek	df4fea79cb	Add a "verbose" option to the data frame analytics stats endpoint (#59589 ) (#59621 )	2020-07-16 09:51:31 +02:00
Nhat Nguyen	b599f7a9c0	Fix estimate size of translog operations (#59206 ) Make sure that the estimateSize method includes all fields of translog operations.	2020-07-16 00:19:30 -04:00
Costin Leau	5f2285a8b3	EQL: Fix bug in returning results (#59673 ) Using serialization/deserialization when dealing with non-trivial documents causes the process to get stuck not to mention it is expensive. Use a much more simple approach at the expense of losing information (we're just interested in the source after all). (cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)	2020-07-16 01:01:13 +03:00
Julie Tibshirani	2b70758a05	Correct type parametrization in geo mappers. (#59583 ) Previously the concrete type parameters for the MappedFieldType didn't always match those for the FieldMapper. This PR updates the mappers so that the type parameters always match, which makes the design easier to follow.	2020-07-15 14:10:47 -07:00
Martijn van Groningen	f1028fbbcc	Only install stack templates via elected master node (#59624 ) (#59657 ) to avoid many error stacktraces in logs during a rolling upgrade. Stack templates use the composable index template and component APIs; these APIs aren't supported in 7.7 and earlier and in mixed cluster environments this can cause a lot of ActionNotFoundTransportException errors in the logs during rolling upgrades. If these templates are only installed via elected master node then the APIs are always there and the ActionNotFoundTransportException errors are then prevented.	2020-07-15 22:22:01 +02:00
David Kyle	df7fc8f967	Accounting for model size when models are not cached (#59607 ) When an inference model is loaded it is accounted for in circuit breaker and should not be released until there are no users of the model. Adds a reference count to the model to track usage.	2020-07-15 18:06:15 +01:00
David Turner	691759fb1f	Validate snapshot UUID during restore (#59601 ) Today when mounting a searchable snapshot we obtain the snapshot/index UUIDs and then assume that these are the UUIDs used during the subsequent restore. If you concurrently delete the snapshot and replace it with one with the same name then this assumption is violated, with chaotic consequences. This commit introduces a check that ensures that the snapshot UUID does not change during the mount process. If the snapshot remains in place then the index UUID necessarily does not change either. Relates #50999	2020-07-15 16:23:20 +01:00
Costin Leau	6b75525efb	EQL: Improve testing spec (#59615 ) Case sensitivity is incorporated as a test dimension - instead of running the same test twice, two different tests are created. Clean-up the test invocation by removing unused parameters. Fix #59294 (cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)	2020-07-15 18:07:24 +03:00
Igor Motov	b5ab447b3e	EQL: Fix async EQL Rest test (#59556 ) (#59620 ) Unfortunately, we cannot guarantee that the execution will be truly async even with 0ms timeout since we cannot block the execution. So, we need to modify the test to work in both async and non-async mode. Closes #59416	2020-07-15 11:02:33 -04:00
Martijn van Groningen	2a89e13e43	Move data stream transport and rest action to xpack (#59593 ) Backport of #59525 to 7.x branch. * Actions are moved to xpack core. * Transport and rest actions are moved to the data-streams module. * Removed data streams methods from Client interface. * Adjusted tests to use client.execute(...) instead of data stream specific methods. * only attempt to delete all data streams if xpack is installed in rest tests * Now that ds apis are in xpack and ESIntegTestCase no longer deletes all ds, do that in the MlNativeIntegTestCase class for ml tests.	2020-07-15 16:50:44 +02:00
Ignacio Vera	f8037abf47	upgrade to lucene-8.6.0 release (#59596 ) (#59599 )	2020-07-15 12:40:57 +02:00
Tanguy Leroux	604f22db79	Use a dedicated thread pool for searchable snapshot cache prewarming (#59313 ) (#59590 ) Since #58728 writing operations on searchable snapshot directory cache files are executed in an asynchronous manner using a dedicated thread pool. The thread pool used is searchable_snapshots which has been created to execute prewarming tasks. Reusing the same thread pool wasn't a good idea as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. But such an eviction can only be processed when all read/write operations on the cache file are completed and in this case the deadlock occurred because the cache file was actively being read by a concurrent search which also won the privilege to write the range of bytes in cache... and this writing operation could never have been completed because of the prewarming tasks making no progress and filling up the thread pool. This commit renames the searchable_snapshots thread pool to searchable_snapshots_cache_fetch_async. Assertions are added to assert that cache writes are executed using this thread pool and to assert that read on cached index inputs are executed using a different thread pool to avoid potential deadlock situations. This commit also adds a searchable_snapshots_cache_prewarming that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.	2020-07-15 11:45:52 +02:00
Francisco Fernández Castaño	66ef1cdad7	Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124 ) Add a custom factory for recovery state into IndexStorePlugin that allows different implementors to provide its own RecoveryState implementation. Backport of #59038	2020-07-15 11:11:07 +02:00
Yannick Welsch	bc11503dc3	Wait for active license in CcrRestIT (#59543 ) Relates #53966 Closes #59486	2020-07-15 09:38:08 +02:00
Tal Levy	4bb91b61e8	Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349 ) (#59577 ) Closes #44505.	2020-07-14 22:37:48 -07:00
Armin Braun	2dd086445c	Enable Fully Concurrent Snapshot Operations (#56911 ) (#59578 ) Enables fully concurrent snapshot operations: * Snapshot create- and delete operations can be started in any order * Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long-running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one)) * Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations See updated JavaDoc and added test cases for more details and illustration on the functionality. Some notes: The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up. This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first	2020-07-15 03:42:31 +02:00
Armin Braun	06d94cbb2a	Fix TODO about Spurious FAILED Snapshots (#58994 ) (#59576 ) There is no point in writing out snapshots that contain no data that can be restored whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot step that wrote data to the repository that would've otherwise become unreferenced, but in the current day state machine without the `INIT` step there is no point in doing so.	2020-07-15 00:54:30 +02:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Martijn van Groningen	35ae3d19db	Remove data stream feature flag (#59572 ) so that it can be used in the next minor release (7.9.0). Backport of #59504 to 7.x branch. Closes #53100	2020-07-14 23:50:41 +02:00
Ryan Ernst	3b688bfee5	Add license feature usage api (#59342 ) (#59571 ) This commit adds a new api to track when gold+ features are used within x-pack. The tracking is done internally whenever a feature is checked against the current license. The output of the api is a list of each used feature, which includes the name, license level, and last time it was used. In addition to a unit test for the tracking, a rest test is added which ensures starting up a default configured node does not result in any features registering as used. There are a couple features which currently do not work well with the tracking, as they are checked in a manner that makes them look always used. Those features will be fixed in followups, and in this PR they are omitted from the feature usage output.	2020-07-14 14:34:59 -07:00
James Baiera	5f7e7e9410	[7.x] Data Stream Stats API (#58707 ) (#59566 ) This API reports on statistics important for data streams, including the number of data streams, the number of backing indices for those streams, the disk usage for each data stream, and the maximum timestamp for each data stream	2020-07-14 16:57:46 -04:00
Costin Leau	679619c798	EQL: Improve retrieval of results (#59552 ) Instead of retrieving an entire SearchHit, get just a reference and postpone the document retrieval when assembling the final results. Remove sort information from results to make them consistent. Move TumblingWindow under the sequence package. Co-authored-by: James Rodewig <james.rodewig@elastic.co> (cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)	2020-07-14 23:53:57 +03:00
Albert Zaharovits	6d6d565eeb	Fix auditing of nameless API Keys (#59531 ) API keys can be created nameless using the grant endpoint (it is a bug, see #59484). This change ensures auditing doesn't throw when such an API Key is used for authn.	2020-07-14 23:46:25 +03:00
Albert Zaharovits	4eb310c777	Disallow mapping updates for doc ingestion privileges (#58784 ) The `create_doc`, `create`, `write` and `index` privileges do not grant the PutMapping action anymore. Apart from the `write` privilege, the other three privileges also do NOT grant (auto) updating the mapping when ingesting a document with unmapped fields, according to the templates. In order to maintain the BWC in the 7.x releases, the above privileges will still grant the Put and AutoPutMapping actions, but only when the "index" entity is an alias or a concrete index, but not a data stream or a backing index of a data stream.	2020-07-14 23:39:41 +03:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278 ) (#59514 ) This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes	2020-07-14 22:18:42 +02:00
David Kyle	0d2ea1b881	Check for ml privilege when using the Inference Aggregation (#59530 ) (#59562 ) The inference pipeline aggregation requires the user has permission to access the ml get trained models endpoint (_ml/inference/)	2020-07-14 20:53:40 +01:00
Tim Brooks	408a07f96a	Separate coordinating and primary bytes in stats (#59487 ) Currently we combine coordinating and primary bytes into a single bucket for indexing pressure stats. This makes sense for rejection logic. However, for metrics it would be useful to separate them.	2020-07-14 12:37:06 -06:00
Dan Hermann	70fe553ce0	[7.x] Reenable BWC tests for data streams (#59538 )	2020-07-14 13:35:52 -05:00
Albert Zaharovits	b1e4233806	Fix auditing of API Key authn without the owner realm name (#59470 ) The `Authentication` object that gets built following an API Key authentication contains the realm name of the owner user that created the key (which is audited), but the specific field used for storing it changed in #51305 . This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't throw an NPE, because the owner realm name is not found in the expected field. Closes #59425	2020-07-14 21:35:29 +03:00
Dimitris Athanasiou	ee4610c0ca	[7.x][ML] Rename cross validation splitter package (#59529 ) (#59544 ) Renames and moves the cross validation splitter package. First, the package and classes are renamed from using "cross validation splitter" to "train test splitter". Cross validation as a term is overloaded and encompasses more concepts than what we are trying to do here. Second, the package used to be under `process` but it does not make sense to be there, it can be a top level package under `dataframe`. Backport of #59529	2020-07-14 18:54:46 +03:00
Dimitris Athanasiou	37406487b9	[7.x][ML] Improve error for non-included field with unsupported type (#59424 ) (#59541 ) When a field is not included yet its type is unsupported, we currently state that the reason the field is excluded is that it is not in the includes list. However, this implies the user could include it but if the user tried to do so, they would get a failure as they would be including a field with unsupported type. This commit improves this by stating the reason a not included field with unsupported type is excluded is because of its type. Backport of #59424	2020-07-14 18:54:34 +03:00
Andrei Stefan	1fd16ffb70	Add license header to EqlStatsIT.java (#59537 )	2020-07-14 18:45:13 +03:00
Dan Hermann	e54b4a729f	[7.x] Adds write_index_only option to put mapping API (#59539 )	2020-07-14 10:34:08 -05:00
Nhat Nguyen	4d7c59bedb	Assign follower primary to nodes with remote cluster client role (#59375 ) The primary shards of follower indices during the bootstrap need to be on nodes with the remote cluster client role as those nodes reach out to the corresponding leader shards on the remote cluster to copy Lucene segment files and renew the retention leases. This commit introduces a new allocation decider that ensures bootstrapping follower primaries are allocated to nodes with the remote cluster client role. Co-authored-by: Jason Tedor <jason@tedor.me>	2020-07-14 11:23:55 -04:00
Dimitris Athanasiou	e302c66847	[7.x][ML] Fix NPE when starting classification with missing dependent_variable (#59524 ) (#59540 ) Since we have added checking the cardinality of the dependent_variable for classification, we have introduced a bug where an NPE is thrown if the dependent_variable is a missing field. This commit is fixing this issue. Backport of #59524	2020-07-14 17:56:55 +03:00
Andrei Stefan	cf752992d6	Add telemetry metrics (#59526 )	2020-07-14 16:25:24 +03:00
Dan Hermann	59f639a279	Add auto_configure privilege	2020-07-14 08:23:49 -05:00
David Kyle	d86435938b	[7.x] Add ml licence check to the pipeline inference agg. (#59213 ) (#59412 ) Ensures the licence is sufficient for the model used in inference	2020-07-14 14:03:10 +01:00
Yang Wang	f651487d74	Support prefix search for API key names (#59113 ) (#59520 ) This PR adds minimum support for prefix search of API Key name. It only touches API key name and leaves all other query parameters, e.g. realm name and username, unchanged.	2020-07-14 22:06:20 +10:00
Andrei Dan	7dcdaeae49	Default to @timestamp in composable template datastream definition (#59317 ) (#59516 ) This makes the data_stream timestamp field specification optional when defining a composable template. When there isn't one specified it will default to `@timestamp`. (cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 12:36:54 +01:00
Yang Wang	2e71d0aa91	Allow mixed usage of boolean and string when merging OIDC claims (#59112 ) (#59512 ) Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization. This PR introduces a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In other words, it will merge true with "true", also will merge false with "false", but nothing else.	2020-07-14 20:41:16 +10:00
Andrei Dan	4180333bbc	[7.x] Composable templates: add a default mapping for @timestamp (#59244 ) (#59510 ) This adds a low precedence mapping for the `@timestamp` field with type `date`. This will aid with the bootstrapping of data streams as a timestamp mapping can be omitted when nanos precision is not needed. (cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-14 11:29:33 +01:00
Costin Leau	5580eb61ed	EQL: Improve sequence limiting (#59439 ) Improve the way limit (in particular offset) is being applied to handle the case where the matches are less than the offset and absolute limit. Combine Matcher and SequenceStateMachine into one class since the two have evolved beyond their original name and structure. (cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)	2020-07-14 13:19:09 +03:00
Hendrik Muhs	c8290167a0	[7.x][Transform] separate pivot and extract function interface (#59505 ) separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function. Foundation to add more functions to transform. piggy backed fixes: - when running geo tile group_by it could fail due to query clause limit (unreleased) - new style page size using settings was not validating limit of 10k (7.8)	2020-07-14 11:27:16 +02:00
Martijn van Groningen	5f24be1bc1	Also set system property when running test task. (#59499 ) Closes #59488	2020-07-14 10:34:52 +02:00
Rene Groeschke	d5c11479da	Remove remaining deprecated api usages (#59231 ) (#59498 ) - Fix duplicate path deprecation by removing duplicate test resources - fix deprecated non annotated input property in LazyPropertyList - fix deprecated usage of AbstractArchiveTask.version - Resolve correct test resources	2020-07-14 10:25:00 +02:00
David Roberts	529aa345df	[ML] Account for per-partition categorization in model memory estimate (#59458 ) Now that we have per-partition categorization, the estimate for the model memory limit required for a particular analysis config needs to take into account whether categorization is operating for the job as a whole or per-partition.	2020-07-14 09:16:28 +01:00
Yang Wang	4350add12c	Allow null name when deserialising API key document (#59485 ) (#59496 ) API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accommodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.	2020-07-14 16:08:32 +10:00
Tim Brooks	623df95a32	Adding indexing pressure stats to node stats API (#59467 ) We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.	2020-07-13 17:23:42 -06:00
Lee Hinman	81bdb20b8a	Fix license header for DataStreamRestIT	2020-07-13 14:41:29 -06:00
Lee Hinman	bf1a60130d	[7.x] Add telemetry for data streams (#59433 ) (#59454 ) This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is pretty minimal, returning only the number of data streams and the number of indices currently abstracted by a data stream: ``` ... "data_streams" : { "available" : true, "enabled" : true, "data_streams" : 3, "indices_count" : 17 } ... ```	2020-07-13 14:30:11 -06:00
Christos Soulios	3868bcc7b8	[7.x] Histogram integration on Histogram field type (#59431 ) Backports #58930 to 7.x Implements histogram aggregation over histogram fields as requested in #53285.	2020-07-13 19:36:33 +03:00
Dimitris Athanasiou	a7895ff458	[7.x][ML] Remove unused member var from ExtractedFieldsDetector (#59395 ) (#59406 ) Removes member variable `index` from `ExtractedFieldsDetector` as it is not used. Backport of #59395 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-13 19:10:43 +03:00
Igor Motov	1acb4aeba9	EQL: Prepare for release (#59331 ) (#59426 ) Enables eql setting in release builds. Relates #51613	2020-07-13 11:54:32 -04:00

5803 Commits