OpenSearch

Commit Graph

Author	SHA1	Message	Date
Dimitris Athanasiou	ec405978fc	[7.x][ML] Update reindexing task progress before persisting job progress (#61868 ) (#61875 ) This fixes a bug introduced by #61782. In that PR I thought I could simplify the persistence of progress by using the progress straight from the stats holder in the task instead of calling the get stats action. However, I overlooked that it is then possible to have stale progress for the reindexing task as that is only updated when the get stats API is called. In this commit this is fixed by updating reindexing task progress before persisting the job progress. This seems to be much more lightweight than calling the get stats request. Closes #61852 Backport of #61868	2020-09-02 21:44:18 +03:00
Benjamin Trent	c22415c241	[7.x] [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846 ) (#61869 ) * [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846) Native PR addresses this test failure: https://github.com/elastic/ml-cpp/pull/1465 closes https://github.com/elastic/elasticsearch/issues/61704 closes https://github.com/elastic/elasticsearch/issues/61561	2020-09-02 13:23:23 -04:00
Jake Landis	f6b3148e5e	[7.x] Convert second 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#61802 ) (#61856 ) For 1/2 the plugins in x-pack, the integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This includes the following projects: security, spatial, stack, transform, vectors, voting-only-node, and watcher. A few of the more specialized qa projects within these plugins have not been changed with this PR due to additional complexity which should be addressed separately. related: #60630 related: #56841 related: #59939 related: #55896	2020-09-02 11:20:55 -05:00
Jake Landis	794aac717d	[7.x] Convert first 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#60630 ) (#61855 ) For 1/2 the plugins in x-pack, the integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This includes the following projects: async-search, autoscaling, ccr, enrich, eql, frozen-indices, data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml. A few of the more specialized qa projects within these plugins have not been changed with this PR due to additional complexity which should be addressed separately. A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is). related: #61802 related: #56841 related: #59939 related: #55896	2020-09-02 11:19:24 -05:00
Dimitris Athanasiou	07ab0beea0	[7.x][ML] Improve handling of exception while starting DFA process (#61838 ) (#61847 ) While starting the data frame analytics process it is possible to get an exception before the process crash handler is in place. In addition, right after starting the process, we check the process is alive to ensure we capture a failed process. However, those exceptions are unhandled. This commit catches any exception thrown while starting the process and sets the task to failed with the root cause error message. I have also taken the chance to remove some unused parameters in `NativeAnalyticsProcessFactory`. Relates #61704 Backport of #61838	2020-09-02 16:32:45 +03:00
Costin Leau	e6dc8054a5	EQL: Introduce filter pipe (#61805 ) Allow filtering through a pipe, across events and sequences. Filter pipes are pushed down to base queries. For now filtering after limit (head/tail) is forbidden as the semantics are still up for debate. Fix #59763 (cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)	2020-09-02 15:48:51 +03:00
David Roberts	89599ba0a3	[ML] Update ML mappings upgrade test and extend to config index (#61830 ) The ML mappings upgrade test had become useless as it was checking a field that has been the same since 6.5. This commit switches to a field that was changed in 7.9. Additionally, the test only used to check the results index mappings. This commit also adds checking for the config index. Backport of #61340	2020-09-02 12:23:59 +01:00
David Kyle	d268540f20	[ML] Check and install the latest template in the DFA executor (#61589 ) (#61842 ) During a rolling upgrade it is possible that a worker node will be upgraded before the master in which case the DFA templates will not have been installed. Before a DFA task starts check that the latest template is installed and install it if necessary.	2020-09-02 12:16:29 +01:00
Nik Everett	f8158bdb2d	Skip failing test Tracked by https://github.com/elastic/elasticsearch/issues/61561	2020-09-01 13:44:31 -04:00
Dimitris Athanasiou	2547cfbe54	[7.x][ML] Persist progress when setting DFA task to failed (#61782 ) (#61792 ) When an error occurs and we set the task to failed via the `DataFrameAnalyticsTask.setFailed` method we do not persist progress. If the job is later restarted, this means we do not correctly restore from where we can but instead we start the job from scratch and have to redo the reindexing phase. This commit solves this bug by persisting the progress before setting the task to failed. Backport of #61782	2020-09-01 18:33:07 +03:00
Tanguy Leroux	d94d6b5b70	Also account for state not recovered in BlobStoreCacheService Following #61726 after a test failure	2020-09-01 12:10:04 +02:00
Ioannis Kakavas	ced2c140fe	Unmute TokenAuthIntegTests test (#61715 ) @ywangd made an awesome analysis on why this test is failing, over at https://github.com/elastic/elasticsearch/issues/55816#issuecomment-620913282 This change makes it so that we use the same client to perform a refresh of a token, as we use to subsequently attempt to authenticate with the refreshed token. This ensures the tests are failing and is a good approximation of how we expect the same client doing the refresh, to also perform the subsequent authentication in real life uses. The errors we were seeing from users have disappeared after #55114 so we deem our behavior safe.	2020-09-01 13:06:11 +03:00
Tanguy Leroux	787dfda4c1	Prevent snapshots to be mounted as system indices (#61517 ) (#61727 ) System indices can be snapshotted and are therefore potential candidates to be mounted as searchable snapshot indices. As of today nothing prevents a snapshot to be mounted under an index name starting with . and this can lead to conflicting situations because searchable snapshot indices are read-only and Elasticsearch expects some system indices to be writable; because searchable snapshot indices will soon use an internal system index (#60522) to speed up recoveries and we should prevent the system index to be itself a searchable snapshot index (leading to some deadlock situation for recovery). This commit introduces a change to prevent snapshots to be mounted as a system index.	2020-09-01 11:13:28 +02:00
Tanguy Leroux	92eb6e7844	Remove cluster state listener in BlobStoreCacheService (#61726 ) (#61769 ) BlobStoreCacheService implements ClusterStateListener in order to maintain a ready flag that can be used to know when the snapshot blob cache should be queried or not. Now that the getAsync() method correctly handles the various exceptions that can be thrown when the .snapshot-blob-cache index is not available (in isExpectedCacheGetException()) and logs them as DEBUG, we can safely remove the ready flag.	2020-09-01 11:12:52 +02:00
Benjamin Trent	7dabaad7d9	[ML] refactor ml job node selection into its own class (#61521 ) (#61747 ) This is a minor refactor where the job node load logic (node availability, etc.) is refactored into its own class. This will allow future things (i.e. autoscaling decisions) to use the same node load detection class. backport of #61521	2020-08-31 14:00:23 -04:00
Benjamin Trent	8b33d8813a	[ML] binary classification per-class feature importance for model inference (#61597 ) (#61746 ) This commit addresses two issues: - per class feature importance is now written out for binary classification (logistic regression) - The `class_name` in per class feature importance now matches what is written in the `top_classes` array. backport of https://github.com/elastic/elasticsearch/pull/61597	2020-08-31 13:57:00 -04:00
Mayya Sharipova	fe9c66096c	Small refactoring of AsyncExecutionId (#61640 ) - don't do encoding of asynchExecutionId if it is already provided in the encoded form - create a new instance of AsyncExecutionId after checks for correctness are done	2020-08-31 10:24:36 -04:00
Nhat Nguyen	e37ce561c7	Set timeout of auto put-follow request to unbounded (#61679 ) If the master node of the follower cluster is busy, then the auto-follower will fail to initialize the following process. This also occurs when an auto-follow pattern matches multiple indices. We should set the timeout of put-follow requests issued by the auto-follower to unbounded to avoid this problem. Closes #56891	2020-08-31 09:58:19 -04:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Rory Hunter	ff6c071275	Implement deprecation logging using log4j (#61629 ) Backport of #61474. Part of #46106. Simplify the implementation of deprecation logging by relying on log4j more completely, and implementing additional behaviour through custom appenders and filters.	2020-08-31 12:42:04 +01:00
Henning Andersen	4c9fe31da8	Mute testTooLowConfiguredMemoryStillStarts (#61705 ) Related to #61704	2020-08-31 11:19:53 +02:00
Ioannis Kakavas	c621d291d2	Call ActionListener.onResponse exactly once (#61584 ) (#61682 ) Under specific circumstances we would call onResponse twice, which led to unexpected behavior.	2020-08-30 16:47:09 +03:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. 
This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Albert Zaharovits	1cb97a2c4f	Relax the index access control check for scroll searches (#61446 ) The check introduced by #60640 for scroll searches, in which we log if the index access control before the query and fetch phases differs from when the scroll context is created, is too strict, leading to spurious warning log messages. The check verifies instance equality but this assumes that the fetch phase is executed in the same thread context as the scroll context validation. However, this is not true if the scroll search is executed cross-cluster, and even for local scroll searches it is an unfounded assumption. The check is hence reduced to a null check for the index access. The fact that the access control is suitable given the indices that are actually accessed (by the scroll) will be done in a follow-up, after we better regulate the creation of index access controls in general.	2020-08-27 21:16:01 +03:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Nik Everett	5a83e89a2b	Migrate histogram field test (#61602 ) (#61632 ) Replaces the superclass of the test for `HistogramFieldMapperTests` with one that doesn't extend `ESSingleNodeTestCase` so we don't depend on the entire world to test the field mapper. Continues #61301.	2020-08-27 11:08:19 -04:00
David Turner	c89fb8b9fa	Avoid listener call under SparseFileTracker#mutex (#61626 ) Today we sometimes notify a listener of completion while holding `SparseFileTracker#mutex`. This commit moves all such calls out from under the mutex and adds assertions that the mutex is not held in the listener. Closes #61520	2020-08-27 15:39:38 +01:00
David Kyle	49a5afc6c1	[ML] Increase wait for templates timeout in tests (#61623 ) (#61628 )	2020-08-27 12:57:12 +01:00
David Kyle	25e811ced7	Rewrite Inference yml tests for better clean up (#61180 ) (#61555 ) Inference processors asynchronously write usage stats to the .ml-stats index after they are used. In tests the write can leak into the next test, causing failures depending on which test follows. This change waits for the usage stats docs to be written at the end of the test.	2020-08-27 11:16:26 +01:00
David Turner	f6055dc9b2	Suppress noisy SSL exceptions (#61359 ) If a TLS-protected connection closes unexpectedly then today we often emit a `WARN` log, typically one of the following: io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16) io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake We typically only report unexpectedly-closed connections at `DEBUG` level, but these two messages don't follow that rule and generate a lot of noise as a result. This commit adjusts the logging to report these two exceptions at `DEBUG` level only.	2020-08-27 10:59:39 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618 ) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: for (int i = 0; i < fileInfo.numberOfParts(); i++) { This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
Ioannis Kakavas	aac9eb6b64	Kerberos doc kibana link (#61466 ) (#61619 ) Add a note in Kerberos documentation that Kibana requires a configuration change too, and link to that documentation page.	2020-08-27 12:42:52 +03:00
Ioannis Kakavas	3640ff1ff2	Add SAML AuthN request signing tests (#61582 ) - Add a unit test for our signing code - Change SAML IT to use signed authentication requests for Shibboleth to consume Backport of #48444	2020-08-27 10:41:56 +03:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
David Turner	e14d9c9514	Introduce cache index for searchable snapshots (#61595 ) If a searchable snapshot shard fails (e.g. its node leaves the cluster) we want to be able to start it up again on a different node as quickly as possible to avoid unnecessarily blocking or failing searches. It isn't feasible to fully restore such shards in an acceptably short time. In particular we would like to be able to deal with the `can_match` phase of a search ASAP so that we can skip unnecessary waiting on shards that may still be warming up but which are not required for the search. This commit solves this problem by introducing a system index that holds much of the data required to start a shard. Today(*) this means it holds the contents of every file with size <8kB, and the first 4kB of every other file in the shard. This system index acts as a second-level cache, behind the first-level node-local disk cache but in front of the blob store itself. Reading chunks from the index is slower than reading them directly from disk, but faster than reading them from the blob store, and is also replicated and accessible to all nodes in the cluster. (*) the exact heuristics for what we should put into the system index are still under investigation and may change in future. This second-level cache is populated when we attempt to read a chunk which is missing from both levels of cache and must therefore be read from the blob store. We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which verify that we do not hit the blob store more than necessary when starting up a shard that we've seen before, whether due to a node restart or because a snapshot was mounted multiple times. Backport of #60522 Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2020-08-27 06:38:32 +01:00
Dimitris Athanasiou	3ed65eb418	[7.x][ML] Recover data frame extraction search from latest sort key (#61544 ) (#61572 ) If a search failure occurs during data frame extraction we catch the error and retry once. However, we retry another search that is identical to the first one. This means we will re-fetch any docs that were already processed. This may result either in training a model using duplicate data or, in the case of outlier detection, in an error message that the process received more records than it expected. This commit fixes this issue by tracking the latest doc's sort key and then using that in a range query in case we restart the search due to a failure. Backport of #61544 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-26 17:54:00 +03:00
Benjamin Trent	a6e7a3d65f	[7.x] [ML] write warning if configured memory limit is too low for analytics job (#61505 ) (#61528 ) Backports the following commits to 7.x: [ML] write warning if configured memory limit is too low for analytics job (#61505) Having `_start` fail when the configured memory limit is too low can be frustrating. We should instead warn the user that their job might not run properly if their configured limit is too low. It might be that our estimate is too high, and their configured limit works just fine.	2020-08-26 10:35:38 -04:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Ioannis Kakavas	283eaabc71	[7.x] Refactor SamlAuthenticationIT (#57162 ) (#61568 ) Refactor the tests to not require a mock HTTP Server. This has been the cause of flakiness and removing it doesn't affect the logical coverage of this suite. The "fake UI" is now simulated by an http client that makes the necessary requests to Elasticsearch APIs.	2020-08-26 15:34:56 +03:00
Przemysław Witek	11c2710e7f	[7.x] [ML] Do not mark the DFA job as FAILED when a failure occurs after the node is shutdown (#61331 ) (#61526 )	2020-08-26 09:53:13 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Costin Leau	bff3c7470e	EQL: Replace SearchHit in response with Event (#61428 ) (#61522 ) The building block of the eql response is currently the SearchHit. This is a problem since it is tied to an actual search, and thus has scoring, highlighting, shard information and a lot of other things that are not relevant for EQL. This becomes a problem when doing sequence queries since the response is not generated from one search query and thus there are no SearchHits to speak of. Emulating one is not just conceptually incorrect but also problematic since most of the data is missed or made-up. As such this PR introduces a simple class, Event, that maps nicely to the terminology while hiding the ES internals (the use of SearchHit or GetResult/GetResponse depending on the API used). Fix #59764 Fix #59779 Co-authored-by: Igor Motov <igor@motovs.org> (cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)	2020-08-25 17:32:42 +03:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Armin Braun	806dfcfcf7	Speed up Compression Logic by Pooling Resources (#61358 ) (#61495 ) This is mostly motivated by the performance issues we are seeing around the GET mappings REST API which (in case of a large number of indices) will create decompressing streams in a hot loop which takes a significant amount of time for the system calls involved in instantiating deflaters and inflaters. Also, this fixes a leaked deflater when deserializing cached repository data.	2020-08-25 04:01:55 +02:00
David Kyle	539cf914bc	[ML] handle new model metadata stream from native process (#59725 ) (#61251 ) This adds the serialization handling for the new model_metadata object from the native process. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-08-24 15:52:13 -04:00
James Rodewig	2b852388c5	[DOCS] Fix hyphenation for "time series" (#61472 ) (#61481 )	2020-08-24 11:18:07 -04:00
Dimitris Athanasiou	618dd65d5f	[7.x][ML] Add debug logging for field caps request during DF Analytics (#61459 ) (#61478 ) Adds debug logging for the request and the response that is getting field capabilities during a data frame analytics job. Backport of #61459	2020-08-24 18:01:30 +03:00
Dimitris Athanasiou	18ca8a6be3	[7.x][ML] Remove redundant logging for creation of annotations index (#61461 ) (#61475 ) This commit removes the log info message "Created ML annotations index and aliases". The message comes in addition to elasticsearch's index creation logging and it does not add to it. In addition, since #61107 that message may be logged multiple times. Backport of #61461	2020-08-24 17:46:29 +03:00


6116 Commits