This commit adjusts the following APIs so that they not only support the `_all` case, but also accept wildcard-patterned IDs (see the examples after the list).
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
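For example, with a hypothetical calendar id pattern, both of the following are now accepted:
```
GET _ml/calendars/_all
GET _ml/calendars/planned-outages-*
```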
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds a new flag, `include`, to the get trained models API.
The flag initially has two valid values: `definition` and `total_feature_importance`.
Consequently, the old `include_model_definition` flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
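A sketch of the new flag, assuming a hypothetical model id `my_model`:
```
GET _ml/inference/my_model?include=total_feature_importance
```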
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
Removes methods that were no longer used regarding version 5.4 doc ids of ModelState.
Also adds clean up of 5.4 model state and quantile docs in the daily maintenance.
Backport of #62434
Faster sequential access for stored fields
Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access,
so we make no effort to keep state that would avoid decompressing the same data
multiple times when two documents fall in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be in the same block.
However, we have some use cases where the documents to retrieve are
completely sequential:
- Scrolls or normal searches sorted by document id.
- Queries on runtime fields that extract from `_source`.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows us to leverage the merge instances of stored fields readers,
which are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.
The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:
| Metric | Task | Baseline | Contender | Diff | Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
| Total Young Gen GC | | 0.199 | 0.231 | 0.032 | s |
| Total Old Gen GC | | 0 | 0 | 0 | s |
| Store size | | 17.9704 | 17.9704 | 0 | GB |
| Translog size | | 2.04891e-06 | 2.04891e-06 | 0 | GB |
| Heap used for segments | | 0.820332 | 0.820332 | 0 | MB |
| Heap used for doc values | | 0.113979 | 0.113979 | 0 | MB |
| Heap used for terms | | 0.37973 | 0.37973 | 0 | MB |
| Heap used for norms | | 0.03302 | 0.03302 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.293602 | 0.293602 | 0 | MB |
| Segment count | | 541 | 541 | 0 | |
| Min Throughput | scroll | 12.7872 | 12.8747 | 0.08758 | pages/s |
| Median Throughput | scroll | 12.9679 | 13.0556 | 0.08776 | pages/s |
| Max Throughput | scroll | 13.4001 | 13.5705 | 0.17046 | pages/s |
| 50th percentile latency | scroll | 524.966 | 251.396 | -273.57 | ms |
| 90th percentile latency | scroll | 577.593 | 271.066 | -306.527 | ms |
| 100th percentile latency | scroll | 664.73 | 272.734 | -391.997 | ms |
| 50th percentile service time | scroll | 522.387 | 248.776 | -273.612 | ms |
| 90th percentile service time | scroll | 573.118 | 267.79 | -305.328 | ms |
| 100th percentile service time | scroll | 660.642 | 268.963 | -391.678 | ms |
| error rate | scroll | 0 | 0 | 0 | % |
Closes #62024
The data frame structure in C++ has a limit of 2^32 documents. This commit
adds a check that the number of documents involved in the analysis is
less than that, and fails to start otherwise. That saves the cost of
reindexing when it is unnecessary.
Backport of #62547
This adds ILM support for automatically migrating the managed
indices between data tiers.
This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless of whether it's
enabled or disabled).
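A minimal sketch of explicitly defining the action (policy name hypothetical); setting `"enabled": false` opts the phase out of automatic migration:
```
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          }
        }
      }
    }
  }
}
```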
(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
ReplaceDataStreamBackingIndexStep#performAction seems to perform an equality
check between the original Index and the write index's name, but because this
compares an Index instance to a String, the condition can never be met. This PR
fixes this comparison.
Since #61857 we test using BCJSSE (Bouncy Castle SSL) when running on
Zulu8 because Azul have backported SSL changes from Java11 into their
Java8 JRE which prevents us from using Sun JSSE in FIPS mode.
BCJSSE uses different exception messages than Sun JSSE, so we needed
to update
RestrictedTrustManagerTests.testThatDelegateTrustManagerIsRespected
to reflect the fact that we might sometimes receive BCJSSE error
messages on a Java8 JVM.
Resolves: #62281
Current implementations of the indexer are using aggregations.
Thus each search step executes a search action. However,
we can generalize that to allow for any action that returns a `SearchResponse`.
This commit abstracts the search phase from the search action.
Backport of #61739
With the addition of sub aggregations like filter, the validation could fail if 2 sub aggs use the
same output name. This change makes validation sub-agg aware.
fixes #57814
The OpenID Connect specification defines a number of ways for a
client (RP) to authenticate itself to the OP when accessing the
Token Endpoint. We currently only support `client_secret_basic`.
This change introduces support for 2 additional authentication
methods, namely `client_secret_post` (where the client credentials
are passed in the body of the POST request to the OP) and
`client_secret_jwt` where the client constructs a JWT and signs
it using the client secret as a key.
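The method is selected per realm; a sketch of the relevant realm setting, with a hypothetical realm name:
```
xpack.security.authc.realms.oidc.oidc1.rp.client_auth_method: client_secret_post
```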
Support for the above, and especially `client_secret_jwt`, in our
integration tests meant that the OP we use (Connect2id server)
should be able to validate the JWT that we send it from the RP.
Since we run the OP in docker and it listens on an ephemeral port,
we would have no way of knowing the port so that we could configure
the ES running via the testcluster to know the "correct" Token
Endpoint, and even if we did, this would not be the Token Endpoint
URL that the OP would think it listens on. To alleviate this, we
run an ES single node cluster in docker, alongside the OP, so that
we can configure it with the correct hostname and port within
the docker network.
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.
Co-authored-by: Tim Vernum <tim@adjective.org>
This was missing and caused nodes to drop out of the cluster on serialization failures
whenever one tried to get an enrich policy task by name.
The test in here is a little dirty but I figured it would be nice to have an actual reproducer
for the issue and I couldn't find any infrastructure to nicely time the tasks so I put this on
top of existing test infra.
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves #61665
Similar to the work in #60994, where we introduced the `data_hot`, `data_warm`, etc. node roles, this
introduces a new `data_content` node role to be used for the Content tier.
Currently this tier is not used anywhere, but subsequent work will use it.
Relates to #60848
Backport of #62059 to 7.x branch.
Return a 404 http status code when attempting to delete a non-existent data stream.
However, only return a 404 when targeting a data stream without any wildcards.
Closes #62022
This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before the point-in-time APIs were backported.
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.
Closes #61627
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes an `id`, which should be
passed as the `id` of the `pit` parameter in search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` has
elapsed. However, keeping point-in-times open has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable work in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
When a tree model is provided, it is possible that it is a stump,
meaning it only has one node with no splits.
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.
This commit adds validation that, if there is more than one node,
the feature_names in the model are non-empty.
closes #60759
- Adds missing mappings for `alpha`, `gamma`, and `lambda`.
- Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`.
- Removes unused `regularization_depth_penalty_multiplier`,
`regularization_leaf_weight_penalty_multiplier` and
`regularization_tree_size_penalty_multiplier`.
Backport of #61980
At the end of the rolling upgrade tests check the mappings of the concrete
.ml and .transform-internal indices match the mappings in the templates.
When the templates change, the tests should prove that the mappings have
been updated in the new cluster.
This change moves the watcher, ILM history and SLM history templates to composable templates.
Versions are updated to reflect the switch. The only change to the templates themselves is the addition of `_meta` to mark them as managed.
* [ML] adds new n_gram_encoding custom processor (#61578)
This adds a new `n_gram_encoding` feature processor for analytics and inference.
The focus of this processor is simple ngram encodings that allow (see the sketch after this list):
- multiple ngrams [1..5]
- prefix, infix, suffix
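A configuration sketch, with hypothetical field and prefix names:
```
{
  "n_gram_encoding": {
    "field": "address",
    "feature_prefix": "addr",
    "n_grams": [1, 2],
    "start": 0,
    "length": 10
  }
}
```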
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.
This commit changes the approach in order to produce a destination
index that allows nested fields.
Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.
The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.
Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.
With these changes data frame analytics jobs do not use scroll at any part.
Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.
Backport of #61943
When a user authenticates via OpenID Connect we copy information from
the OIDC claims into the user's metadata in a particular format.
This commit adds a test that metadata in that format can be used in a
mustache template for Document Level Security.
Backport of: #60030
A role mapping with the following content:
"rules": { "field": { "userid" : "admin" } }
will never match because `userid` is not a valid field. The correct
field is `username`.
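The corrected rule would therefore be:
"rules": { "field": { "username" : "admin" } }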
This change adds DEBUG logging when an undefined field is referenced.
The choice to use DEBUG rather than INFO/WARN is that the set of
fields is partially dynamic (e.g. the `metadata.*` fields), so
it may be perfectly reasonable to check a field that is not defined
for that user. For example this rule:
"rules": { "field": { "metadata.ranking" : "A" } }
would generate a log message for an unranked user, which would
erroneously suggest that such a rule is an error.
This DEBUG logging will assist in diagnosing problems, without
introducing that confusion.
Backport of: #61246
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
During a rolling upgrade it is possible that a worker node will be upgraded before
the master, in which case the DFA templates will not have been installed.
Before a DFA task starts check that the latest template is installed and install it if necessary.
System indices can be snapshotted and are therefore potential candidates
to be mounted as searchable snapshot indices. As of today nothing
prevents a snapshot from being mounted under an index name starting with a dot,
and this can lead to conflicting situations because searchable snapshot
indices are read-only while Elasticsearch expects some system indices
to be writable; moreover, because searchable snapshot indices will soon use an
internal system index (#60522) to speed up recoveries, we should
prevent the system index from being itself a searchable snapshot index
(which would lead to a deadlock situation during recovery).
This commit introduces a change to prevent snapshots from being mounted
as a system index.
This commit addresses two issues:
- per class feature importance is now written out for binary classification (logistic regression)
- The `class_name` in per class feature importance now matches what is written in the `top_classes` array.
backport of https://github.com/elastic/elasticsearch/pull/61597
- don't encode the async execution id if it is already provided in
the encoded form
- create a new instance of AsyncExecutionId after the checks for
correctness are done
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.
This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.
This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:
```
// Create an index
PUT /eggplant
// Get an index
GET /eggplant?flat_settings
```
Returns the default settings now of:
```json
{
"eggplant" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index.creation_date" : "1597855465598",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "1",
"index.provided_name" : "eggplant",
"index.routing.allocation.include._tier" : "data_hot",
"index.uuid" : "6ySG78s9RWGystRipoBFCA",
"index.version.created" : "8000099"
}
}
}
```
Once set at index creation, this setting can be treated like any other index-level setting.
This new setting is *not* set on a new index if any of the following is true:
- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value (see the example after this list)
- The index was created from an existing source metadata (shrink, clone, split, etc)
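A sketch of the last-but-one bullet, assuming an explicit null in the create-index settings is accepted as described above (index name hypothetical):
```
PUT /aubergine
{
  "settings": {
    "index.routing.allocation.include._tier": null
  }
}
```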
Relates to #60848
If a TLS-protected connection closes unexpectedly then today we often
emit a `WARN` log, typically one of the following:
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake
We typically only report unexpectedly-closed connections at `DEBUG`
level, but these two messages don't follow that rule and generate a lot
of noise as a result. This commit adjusts the logging to report these
two exceptions at `DEBUG` level only.
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.
This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today(*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.
(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.
This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.
We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.
Backport of #60522
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
Backports the following commits to 7.x:
[ML] write warning if configured memory limit is too low for analytics job (#61505)
Having `_start` fail when the configured memory limit is too low can be frustrating.
We should instead warn the user that their job might not run properly if their configured limit is too low.
It might be that our estimate is too high, and their configured limit works just fine.
DeprecationLogger's constructor should not create two loggers. It was
taking a parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time the parent logger was not needed, and it was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
Splitting DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing a ThrottlingAndHeaderWarningLogger, which is a base for other common logging usages where both a response warning header and log throttling are needed.
relates #55699
relates #52369
backports #55941
The API key document currently doesn't include the user's full_name or email attributes,
and as a result those attributes are returned as `null` when `GET`ing `/_security/_authenticate`
with an API key, and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046).
This changeset adds those fields to the document and extracts them to fill in the User when
authenticating. They're effectively going to be a snapshot of the User from when the key was
created, but this is in line with roles and metadata as well.
Signed-off-by: lloydmeta <lloydmeta@gmail.com>
feature_processors allow users to create custom features from
individual document fields.
These `feature_processors` are the same object as the trained model's pre_processors.
They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.
closes https://github.com/elastic/elasticsearch/issues/59327
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind. Now the ML backend also creates
annotations, and those mappings can change between
versions.
In this change:
1. The code that runs on the master node to create the
annotations index (if it doesn't exist but another ML
index does) now also ensures the mappings are up-to-date.
This is good enough for the ML UI's use of the
annotations index, because the upgrade order rules say
that the whole Elasticsearch cluster must be upgraded
prior to Kibana, so the master node should be on the
newer version before Kibana tries to write an
annotation with the new fields.
2. We now also check whether the annotations index exists
with the correct mappings before starting an autodetect
process on a node. This is necessary because ML nodes
can be upgraded before the master node, so could write
an annotation with the new fields before the master node
knows about the new fields.
Backport of #61107
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.
These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role act as though they have all of the roles configured (i.e. they are hot, warm, cold, and
frozen nodes).
This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level (a usage sketch follows the lists below):
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`
And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`
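A sketch of the cluster-level variant (the tier value is illustrative):
```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._tier": "data_cold"
  }
}
```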
Relates to #60848
This adds a frozen phase to ILM that will allow the execution of the
set_priority, unfollow, allocate, freeze and searchable_snapshot actions.
The frozen phase will be executed after the cold and before the delete phase.
(cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
`foreach` processors store information within the `_ingest` metadata object.
This commit adds the contents of the `_ingest` metadata (if it is not empty),
and appends new inference results if the result field already exists.
This allows a `foreach` to execute with multiple inference results being written to the same result field.
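A minimal sketch of the pattern, assuming hypothetical pipeline, model, and field names, and assuming the inference processor's `field_map` is used to map the `foreach` value onto the model's input field:
```
PUT _ingest/pipeline/my_pipeline
{
  "processors": [
    {
      "foreach": {
        "field": "sentences",
        "processor": {
          "inference": {
            "model_id": "my_model",
            "field_map": {
              "_ingest._value": "text"
            }
          }
        }
      }
    }
  ]
}
```
Each value in `sentences` is run through the model, and the results accumulate in the same result field.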
closes https://github.com/elastic/elasticsearch/issues/60867
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.
Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
This adds a force-merge step to the searchable snapshot action, enabled by default
but parameterizable using the optional `force_merge_index` boolean,
e.g.
```
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"cold": {
"actions": {
"searchable_snapshot" : {
"snapshot_repository" : "backing_repo",
"force_merge_index": true
}
}
}
}
}
}
```
(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Remove a test: scripts are excluded in the change collector, and the test is a leftover from a previous
solution of #57332, which has been discarded.
relates #60724, fixes #60794
This pull request adds recovery state tracking for Searchable Snapshots.
In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This new RecoveryState instance is able to deal with the
small differences that arise during searchable snapshot recoveries.
Those differences can be summarized as follows:
- The Directory implementation that's provided by SearchableSnapshots marks the
snapshot files as reused during recovery. In order to keep track of the
recovery process as the cache is pre-warmed, those files shouldn't be marked
as reused.
- Once the shard is created, the cache starts its pre-warming phase, meaning that
we should keep track of those downloads during that process and tie the recovery
to this pre-warming phase. The shard is considered recovered once this pre-warming
phase has finished.
Backport of #60505
Disable optimizations when using scripts in group_by: when scripts are used we can not predict
the outcome and we have no query counterpart. Optimizations for other group_bys are not
affected.
fixes #57332
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
In order to unify model inference and analytics results we
need to write the same fields.
prediction_probability and prediction_score are now written
for inference calls against classification models.
If a feature is created via a custom pre-processor,
we should return the importance for that feature.
This means we will not return the importance for the
original document field for custom processed features.
closes https://github.com/elastic/elasticsearch/issues/59330
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest. This was due to an internal implementaton detail
that was not visible to the end user.
This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info. The internal implementation detail now runs as
the internal _xpack user when security is enabled.
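For illustration, a role carrying just that privilege (role name hypothetical):
```
POST /_security/role/pipeline_manager
{
  "cluster": [ "manage_pipeline" ]
}
```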
Backport of #60106
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data
Streams Stats REST endpoint. These options are required for broadcast requests, but are not
needed for anything in terms of resolving data streams. Instead, we just set a default set of
IndicesOptions on the transport request.
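The endpoint therefore needs nothing beyond the data stream name (name hypothetical):
```
GET /_data_stream/my-data-stream/_stats
```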
* Adding new `require_alias` option to indexing requests (#58917)
This commit adds the `require_alias` flag to requests that create new documents.
This flag, when `true`, prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.
When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.
This is useful when an alias is required instead of a concrete index.
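A sketch, assuming a hypothetical alias name; this request is rejected unless `my-alias` already exists as an alias:
```
PUT /my-alias/_doc/1?require_alias=true
{
  "message": "some document"
}
```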
closes https://github.com/elastic/elasticsearch/issues/55267
* [ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.
`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.
Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).
This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that the data stream APIs are in xpack and ESIntegTestCase
no longer deletes all data streams, do that in the MlNativeIntegTestCase
class for ml tests.
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots, which was created to execute
prewarming tasks.
Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situations arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.
This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.
This commit also adds a searchable_snapshots_cache_prewarming thread pool that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complete integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error-prone to build
`SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
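A sketch of querying the new api:
```
GET /_license/feature_usage
```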
There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.