OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Kyle	49a5afc6c1	[ML] Increase wait for templates timeout in tests (#61623 ) (#61628 )	2020-08-27 12:57:12 +01:00
David Kyle	25e811ced7	Rewrite Inference yml tests for better clean up (#61180) (#61555) Inference processors asynchronously write usage stats to the .ml-stats index after they are used. In tests the write can leak into the next test, causing failures depending on which test follows. This change waits for the usage stats docs to be written at the end of the test.	2020-08-27 11:16:26 +01:00
David Turner	f6055dc9b2	Suppress noisy SSL exceptions (#61359 ) If a TLS-protected connection closes unexpectedly then today we often emit a `WARN` log, typically one of the following: io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16) io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake We typically only report unexpectedly-closed connections at `DEBUG` level, but these two messages don't follow that rule and generate a lot of noise as a result. This commit adjusts the logging to report these two exceptions at `DEBUG` level only.	2020-08-27 10:59:39 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: `for (int i = 0; i < fileInfo.numberOfParts(); i++) {` This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
Ioannis Kakavas	3640ff1ff2	Add SAML AuthN request signing tests (#61582 ) - Add a unit test for our signing code - Change SAML IT to use signed authentication requests for Shibboleth to consume Backport of #48444	2020-08-27 10:41:56 +03:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
David Turner	e14d9c9514	Introduce cache index for searchable snapshots (#61595) If a searchable snapshot shard fails (e.g. its node leaves the cluster) we want to be able to start it up again on a different node as quickly as possible to avoid unnecessarily blocking or failing searches. It isn't feasible to fully restore such shards in an acceptably short time. In particular we would like to be able to deal with the `can_match` phase of a search ASAP so that we can skip unnecessary waiting on shards that may still be warming up but which are not required for the search. This commit solves this problem by introducing a system index that holds much of the data required to start a shard. Today(*) this means it holds the contents of every file with size <8kB, and the first 4kB of every other file in the shard. This system index acts as a second-level cache, behind the first-level node-local disk cache but in front of the blob store itself. Reading chunks from the index is slower than reading them directly from disk, but faster than reading them from the blob store, and is also replicated and accessible to all nodes in the cluster. (*) the exact heuristics for what we should put into the system index are still under investigation and may change in future. This second-level cache is populated when we attempt to read a chunk which is missing from both levels of cache and must therefore be read from the blob store. We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which verify that we do not hit the blob store more than necessary when starting up a shard that we've seen before, whether due to a node restart or because a snapshot was mounted multiple times. Backport of #60522 Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2020-08-27 06:38:32 +01:00
Dimitris Athanasiou	3ed65eb418	[7.x][ML] Recover data frame extraction search from latest sort key (#61544) (#61572) If a search failure occurs during data frame extraction we catch the error and retry once. However, we retry another search that is identical to the first one. This means we will re-fetch any docs that were already processed. This may result either in training a model using duplicate data or, in the case of outlier detection, in an error message that the process received more records than it expected. This commit fixes this issue by tracking the latest doc's sort key and then using that in a range query in case we restart the search due to a failure. Backport of #61544 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-26 17:54:00 +03:00
Benjamin Trent	a6e7a3d65f	[7.x] [ML] write warning if configured memory limit is too low for analytics job (#61505 ) (#61528 ) Backports the following commits to 7.x: [ML] write warning if configured memory limit is too low for analytics job (#61505) Having `_start` fail when the configured memory limit is too low can be frustrating. We should instead warn the user that their job might not run properly if their configured limit is too low. It might be that our estimate is too high, and their configured limit works just fine.	2020-08-26 10:35:38 -04:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Przemysław Witek	11c2710e7f	[7.x] [ML] Do not mark the DFA job as FAILED when a failure occurs after the node is shutdown (#61331 ) (#61526 )	2020-08-26 09:53:13 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941) (#61515) Splitting DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger, which is a base for other common logging usages where both a response warning header and logging throttling are needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Costin Leau	bff3c7470e	EQL: Replace SearchHit in response with Event (#61428 ) (#61522 ) The building block of the eql response is currently the SearchHit. This is a problem since it is tied to an actual search, and thus has scoring, highlighting, shard information and a lot of other things that are not relevant for EQL. This becomes a problem when doing sequence queries since the response is not generated from one search query and thus there are no SearchHits to speak of. Emulating one is not just conceptually incorrect but also problematic since most of the data is missed or made-up. As such this PR introduces a simple class, Event, that maps nicely to the terminology while hiding the ES internals (the use of SearchHit or GetResult/GetResponse depending on the API used). Fix #59764 Fix #59779 Co-authored-by: Igor Motov <igor@motovs.org> (cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)	2020-08-25 17:32:42 +03:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Armin Braun	806dfcfcf7	Speed up Compression Logic by Pooling Resources (#61358 ) (#61495 ) This is mostly motivated by the performance issues we are seeing around the GET mappings REST API which (in case of a large number of indices) will create decompressing streams in a hot loop which takes a significant amount of time for the system calls involved in instantiating deflaters and inflaters. Also, this fixes a leaked deflater when deserializing cached repository data.	2020-08-25 04:01:55 +02:00
David Kyle	539cf914bc	[ML] handle new model metadata stream from native process (#59725 ) (#61251 ) This adds the serialization handling for the new model_metadata object from the native process. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-08-24 15:52:13 -04:00
Dimitris Athanasiou	618dd65d5f	[7.x][ML] Add debug logging for field caps request during DF Analytics (#61459 ) (#61478 ) Adds debug logging for the request and the response that is getting field capabilities during a data frame analytics job. Backport of #61459	2020-08-24 18:01:30 +03:00
Dimitris Athanasiou	18ca8a6be3	[7.x][ML] Remove redundant logging for creation of annotations index (#61461 ) (#61475 ) This commit removes the log info message "Created ML annotations index and aliases". The message comes in addition to elasticsearch's index creation logging and it does not add to it. In addition, since #61107 that message may be logged multiple times. Backport of #61461	2020-08-24 17:46:29 +03:00
Yang Wang	f0615113b6	Report anonymous roles in authenticate response (#61355 ) (#61454 ) Report anonymous roles in response to "GET _security/_authenticate" API call when: * Anonymous role is enabled * User is not the anonymous user * Credentials is not an API Key	2020-08-24 14:51:44 +10:00
Yang Wang	0509465a9e	Warn about unlicensed realms if no auth token can be extracted (#61402) (#61419) There are warnings about unlicensed realms when user lookup fails. This PR adds similar warnings for when no authentication token can be extracted from the request.	2020-08-22 00:04:45 +10:00
Yang Wang	cd52233b94	Include authentication type for the authenticate response (#61247 ) (#61411 ) Add a new "authentication_type" field to the response of "GET _security/_authenticate".	2020-08-21 22:59:43 +10:00
Lloyd	cb83e7011c	[Backport][API keys] Add full_name and email to API key doc and use them to populate authing User (#61354) (#61403) The API key document currently doesn't include the user's full_name or email attributes, and as a result, those attributes return `null` when `GET`ing `/_security/_authenticate`, and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046). This changeset adds those fields to the document and extracts them to fill in the User when authenticating. They're effectively going to be a snapshot of the User from when the key was created, but this is in line with roles and metadata as well. Signed-off-by: lloydmeta <lloydmeta@gmail.com>	2020-08-21 18:32:19 +09:00
Julie Tibshirani	997c73ec17	Correct how field retrieval handles multifields and copy_to. (#61391 ) Before when a value was copied to a field through a parent field or `copy_to`, we parsed it using the `FieldMapper` from the source field. Instead we should parse it using the target `FieldMapper`. This ensures that we apply the appropriate mapping type and options to the copied value. To implement the fix cleanly, this PR refactors the value parsing strategy. Now instead of looking up values directly, field mappers produce a helper object `ValueFetcher`. The value fetchers are responsible for almost all aspects of fetching, including looking up the right paths in the _source. The PR is fairly big but each commit can be reviewed individually. Fixes #61033.	2020-08-20 15:53:35 -07:00
Alan Woodward	a3a0c63ccf	Convert NumberFieldMapper to parametrized form (#61092 ) (#61376 ) In addition, this commit converts ScaledFloatFieldMapper as it was relying on a number of static values taken from NumberFieldMapper that had changed or been removed.	2020-08-20 16:43:26 +01:00
Nik Everett	9789e6d154	Migrate some field mapper tests to ESTestCase (#61301 ) (#61346 ) This switches a few tests for field mappers from `ESSingleNodeTestCase` to `ESTestCase` because, in general, we prefer to avoid `ESSingleNodeTestCase` when we can because it is slow and "big". "Big" here means that it pulls in an entire node, making it difficult to reason about what you are testing.	2020-08-19 15:43:49 -04:00
Francisco Fernández Castaño	89a7f32100	Fix SearchableSnapshotDirectoryTests#testRecoveryStateIsKeptOpenAfterPreWarmFailure (#61343 ) The test didn't take into account the case where 0 documents are indexed into the shard, meaning that files aren't loaded during the pre-warm phase. The test injects FileSystem failures, if the snapshot doesn't contain any files, pre-warm doesn't read any files and the recovery completes normally. Closes #61295 Backport of #61317	2020-08-19 19:28:47 +02:00
Andrei Stefan	a214d7902a	EQL: make endsWith function use a wildcard ES query wherever possible (#61160 ) (#61320 ) (cherry picked from commit 55fdb7e2c74d4fae86ec40686091ecba831caeaf)	2020-08-19 14:17:55 +03:00
Andrei Stefan	a6c0670a14	EQL: make stringContains function use a wildcard ES query (#61189 ) (#61313 ) (cherry picked from commit 039a7d1c68f6f1ed0e7e6cfb86be6b04eec8051c)	2020-08-19 12:40:48 +03:00
Martijn van Groningen	d4a8172f8e	Disable ilm history in data streams rest qa module. (#61312 ) Backport of #61291 to 7.x branch. Closes #61273	2020-08-19 10:34:26 +02:00
Andrei Stefan	93abbb9057	Add data streams wildcard pattern yml test (#61269 ) (#61280 ) (cherry picked from commit e13a365eeb6d8c6a7c9a91f94f0e8e78e3fe4773)	2020-08-18 19:38:07 +03:00
Andrei Stefan	5de0f19cc3	EQL: Return sequence join keys in the original type (#61268 ) (#61282 ) (cherry picked from commit d54957d61faa0d502387656e3cace594017b6ea0)	2020-08-18 19:37:15 +03:00
Martijn van Groningen	cbf60f6c5e	Add tests that simulate new indexing strategy upgrade procedure. (#61263 ) Backport of #61082 to 7.x branch. Closes #58251	2020-08-18 17:02:29 +02:00
Andrei Stefan	ad627c7eab	Introduce ordering in the constant_keyword test for better predictability. (#61248) (#61252) (cherry picked from commit 69193f9de8178dbaa1d8467f1686b100dd2b161c)	2020-08-18 12:17:15 +03:00
Mark Tozzi	db1df6cc30	[7.x] Remove a bunch of type boilerplate from Aggs (#60852 ) (#61031 )	2020-08-17 12:13:05 -04:00
Andrei Stefan	db8788e5a2	QL: wildcard field type support (#58062 ) (#61205 ) (cherry picked from commit c874e6cdd3e051ce599b50c18642de038b84105f)	2020-08-17 18:24:32 +03:00
Andrei Stefan	90e116738e	QL: add filtering query dsl support to IndexResolver (#60514 ) (#61200 ) (cherry picked from commit 7b3635d796be26af9f87d19963a8ed4ab4bbf13f)	2020-08-17 17:59:58 +03:00
Nik Everett	1b7bbafd81	Add method to make random DateFormatter pattern (backport of #60613) (#61213) Adds a method to make a random date `DateFormatter` pattern. We expect this'll be useful for runtime fields to compare their formatting with the standard date field.	2020-08-17 10:57:52 -04:00
David Kyle	ba89af544f	[7.x] Respect ML upgrade mode in TrainedModelStatsService (#61143 ) (#61187 ) When in upgrade mode the ml stats service should not write to the stats index.	2020-08-17 11:09:25 +01:00
Benjamin Trent	43fc6c34bc	Muting analytics integration tests for change new native output model_metadata (#61158 ) relates to elastic/ml-cpp#1456	2020-08-14 11:45:35 -04:00
Benjamin Trent	8f302282f4	[ML] adds new feature_processors field for data frame analytics (#60528 ) (#61148 ) feature_processors allow users to create custom features from individual document fields. These `feature_processors` are the same object as the trained model's pre_processors. They are passed to the native process and the native process then appends them to the pre_processor array in the inference model. closes https://github.com/elastic/elasticsearch/issues/59327	2020-08-14 10:32:20 -04:00
David Roberts	d1b60269f4	[ML] Ensure annotations index mappings are up to date (#61142 ) When the ML annotations index was first added, only the ML UI wrote to it, so the code to create it was designed with this in mind. Now the ML backend also creates annotations, and those mappings can change between versions. In this change: 1. The code that runs on the master node to create the annotations index if it doesn't exist but another ML index does also now ensures the mappings are up-to-date. This is good enough for the ML UI's use of the annotations index, because the upgrade order rules say that the whole Elasticsearch cluster must be upgraded prior to Kibana, so the master node should be on the newer version before Kibana tries to write an annotation with the new fields. 2. We now also check whether the annotations index exists with the correct mappings before starting an autodetect process on a node. This is necessary because ML nodes can be upgraded before the master node, so could write an annotation with the new fields before the master node knows about the new fields. Backport of #61107	2020-08-14 13:51:04 +01:00
Benjamin Trent	7c3bfb9437	[ML] updating feature_importance results mapping (#61104 ) (#61144 ) This updates the feature_importance mapping change from elastic/ml-cpp#1387	2020-08-14 08:43:10 -04:00
Nhat Nguyen	328c86a4ec	Increase timeout in PrimaryFollowerAllocationIT A slow CI can take more than 10 seconds to relocate shards on the follower.	2020-08-13 14:41:32 -04:00
Benjamin Trent	a497263c47	[ML] ensure config index is updated before clearing finished_time (#61064 ) (#61085 ) When a user upgrades between versions, they may stop their ML jobs. Then when the upgrade is complete, they will want to open the jobs again. But, when opening a job, we attempt to clear out the jobs finished_time. If the job configuration has adjusted between the versions (i.e. added a new field), it will dynamically update the .ml-config index. We should instead manually change the mapping to be the updated version.	2020-08-13 08:12:10 -04:00
David Turner	dd7410d8c2	Disable rebalancing in searchable snapshots tests (#61068 ) Fixes a test failure in which we allocated some shards and then relocated them elsewhere, invalidating an assertion about the recovery statistics which assumed that the shards stayed where they were originally allocated. Closes #61067.	2020-08-13 09:08:27 +01:00
Lee Hinman	e3df64a429	[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045) This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch. These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (it is a hot, warm, cold, and frozen node). This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level: - `cluster.routing.allocation.require._tier` - `cluster.routing.allocation.include._tier` - `cluster.routing.allocation.exclude._tier` And in index settings: - `index.routing.allocation.require._tier` - `index.routing.allocation.include._tier` - `index.routing.allocation.exclude._tier` Relates to #60848	2020-08-12 11:06:23 -06:00
Andrei Dan	32173a82c8	ILM: add frozen phase (#60983 ) (#61035 ) This adds a frozen phase to ILM that will allow the execution of the set_priority, unfollow, allocate, freeze and searchable_snapshot actions. The frozen phase will be executed after the cold and before the delete phase. (cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-12 16:36:27 +01:00
Yannick Welsch	6644f2283d	Do not access snapshot repo on dedicated voting-only master node (#61016 ) Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and will never be the elected master so we should not require it to have write access to the repository. Closes #59649	2020-08-12 16:56:45 +02:00
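
Commit 5df74cc888 in the listing above describes replacing bare `Math.toIntExact` calls with a dedicated conversion method that also asserts the value never overflows. The idea could be sketched roughly as follows. The class name `ByteConversions` is hypothetical and the body is an assumption for illustration; it is not the code from the commit, whose actual location and signature may differ.

```java
// Hypothetical helper illustrating the intent described in commit 5df74cc888.
// The class name and exact shape are assumptions, not the repository's code.
final class ByteConversions {

    private ByteConversions() {}

    /**
     * Converts a byte count held in a long to an int.
     * Callers are expected to pass only values already known to fit in an int.
     */
    static int toIntBytes(long size) {
        // States the caller's intent; only checked when assertions are enabled (-ea).
        assert size <= Integer.MAX_VALUE : "size must fit in an int but was " + size;
        // Still throws ArithmeticException on overflow when assertions are disabled.
        return Math.toIntExact(size);
    }
}
```

A named method like this makes the reason for each narrowing conversion visible at the call site, while the assertion surfaces unexpected overflows in test runs.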


5381 Commits