OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-20 11:54:52 +00:00

Author	SHA1	Message	Date
Tanguy Leroux	49f4227837	Check acknowledged responses in FsSearchableSnapshotsIT (#59021 ) Despite all my attempts I did not manage to reproduce issues like the ones described in #58961. My guess is that the _mount request got retried at some point but I wasn't able to validate this assumption. Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small random chunk size and a large number of documents are picked in the tests. The parent class also does not verify the acknowledged status of some requests. This commit lowers the chunk size and number of docs in tests (this is extensively tested in unit tests) and also adds assertions on acknowledged responses. Relates #58961	2020-07-05 10:50:31 +02:00
Armin Braun	071d8b2c1c	Deduplicate Empty InternalAggregations (#58386 ) (#59032 ) Working through a heap dump for an unrelated issue I found that we can easily rack up tens of MBs of duplicate empty instances in some cases. I moved to a static constructor to guard against that in all cases.	2020-07-04 14:02:16 +02:00
Bogdan Pintea	e88d71b187	[7.x] SQL: Redact credentials in connection exceptions (#58650 ) (#59025 ) * SQL: Redact credentials in connection exceptions (#58650) This commit adds the functionality to redact the credentials from the exceptions generated when a connection attempt fails, preventing them from leaking into logs, console history etc. There are a few causes that can lead to failed connections. The most challenging to deal with is a malformed connection string. The redaction tries to get around it by modifying the URI to a parsable state, so that the redaction can be applied reliably. If there's no reliability guarantee, the redaction will bluntly replace the entire connection string and the user is informed about the option to modify it so that the redaction won't apply. (This is done by using a capitalized scheme, which is legal, but otherwise never used in practice.) The commit fixes a couple of other issues with the URI parser: - it allows an empty hostname, or even entire connection string (as per the existing documentation); - it reduces the editing of the connection string in the exception messages (so that the user can more easily recognize their input); - it uses the default URI as source for the scheme and hostname. (cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd) * Implement String#repeat(), unavailable in Java8 Implement a client.StringUtils#repeatString() as a replacement for String#repeat(), unavailable in Java8.	2020-07-04 11:29:06 +02:00
Benjamin Trent	b9d9964d10	[ML] add exponent output aggregator to inference (#58933 ) (#59016 ) * [ML] add exponent output aggregator to inference * fixing docs Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-03 14:51:00 -04:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Bogdan Pintea	3d96d91efb	[7.x] SQL: fix handling of escaped chars in JDBC connection string (#58429 ) (#58977 ) SQL: fix handling of escaped chars in JDBC connection string (#58429) This commit fixes an issue emerging when the connection string URI contains escaped characters. The original URI is pre-parsed in order to re-assemble a new URI having the optional elements filled in with defaults. The new URI has been using however the unescaped query and fragment parts. So if these contained any escaped `&` or `=` (such as in the password option value), the unescaping would reveal them and make them later interfere with the options parsing. The commit changes that, so that the new URI be built from the unescaped "raw" parts of the original URI. (cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)	2020-07-03 17:03:00 +02:00
Luca Cavanna	e3fc1638d8	Improve error handling in async search code (#57925 ) - The exception that we caught when failing to schedule a thread was incorrect. - We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager - when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed. Closes #58995	2020-07-03 16:07:26 +02:00
Hendrik Muhs	ca3da7af85	[ML] handle broken setup with state alias being an index (#58999 ) .ml-state-write is supposed to be an index alias, however by accident it can become an index. If .ml-state-write is a concrete index instead of an alias ML stops working. This change improves error handling by setting the job to failed and properly log and audit the problem. The user still has to manually fix the problem. This change should lead to a quicker resolution of the problem. fixes #58482	2020-07-03 15:26:59 +02:00
Ignacio Vera	2c2486d3d4	Fix GeoHash grid aggregation circuit breaker tests (#58218 ) (#59001 )	2020-07-03 13:46:35 +02:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991 )	2020-07-03 06:33:38 -05:00
Luca Cavanna	4f86f6fb38	Submit async search to not require read privilege (#58942 ) When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data. The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.	2020-07-03 12:18:07 +02:00
David Kyle	f6a0c2c59d	[7.x] Pipeline Inference Aggregation (#58965 ) Adds a pipeline aggregation that loads a model and performs inference on the input aggregation results.	2020-07-03 09:29:04 +01:00
Tim Vernum	1133c29ce9	Treat roles as a SortedSet (#58988 ) The Saml SP document stored the role mapping in a Set, but this made the order in XContent inconsistent. This switched it to use a TreeSet. Resolves: #54733 Backport of: #55201	2020-07-03 13:40:58 +10:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Benjamin Trent	bd9b3b6116	[ML] fix inference ml-stats-write alias creation (#58947 ) (#58959 ) The check for potentially creating the .ml-stats-write alias should verify that the indices actually exist. closes #58662	2020-07-02 16:16:42 -04:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957 ) Currently we do not track the memory consumed by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
Tal Levy	d516959774	Re-enable support for array-valued geo_shape fields. (#58786 ) (#58943 ) A regression in the mapping code led to geo_shape no longer supporting array-valued fields. This commit fixes this support and adds an integration test to make sure this problem does not return!	2020-07-02 11:21:55 -07:00
David Roberts	2c04685b81	[ML] Ensure config index mappings are up-to-date before updating configs (#58938 ) We already had code to ensure the config index mappings were up-to-date before creating a new config. However, it's also possible that an update to a config could add the latest settings that require the latest mappings to index correctly. This change checks that the latest config index mappings are in place in the 3 update actions in the same way as the checks are done in the 3 put actions. Backport of #58916	2020-07-02 18:55:19 +01:00
Dan Hermann	c988afdc15	Data stream support for migrations deprecations info API	2020-07-02 11:16:22 -05:00
Przemysław Witek	751e84e4c8	Rename regression evaluation metrics to make the names consistent with loss functions (#58887 ) (#58927 )	2020-07-02 17:35:55 +02:00
Tanguy Leroux	6aa669c8bb	Fix SearchableSnapshotDirectoryStatsTests (#58912 ) Similar to #58847 but in a different test. The failure never reproduced locally but occurs from time to time on CI.	2020-07-02 16:39:26 +02:00
Dan Hermann	b78bfa01f6	[7.x] Data stream support for graph explore API	2020-07-02 08:19:03 -05:00
David Kyle	d6643bfc7f	Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902 )" The test was fixed in #58847 This reverts commit bb96c910a58df6b7b1e0e90d49dce4167d3cc79a.	2020-07-02 13:21:05 +01:00
David Kyle	bb96c910a5	Mute FsSearchableSnapshotsIT testClearCache (#58902 ) For #58901	2020-07-02 12:58:28 +01:00
Costin Leau	965f77fa44	EQL: Introduce sequence internal paging (#58859 ) Refactor sequence matching classes in order to decouple querying from results consumption (and matching). Rename some classes to better convey their intent. Introduce internal pagination of sequence algorithm, that is getting the data in slices and, if needed, moving forward in order to find more matches until either the dataset is consumed or the number of results desired is found. (cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)	2020-07-02 13:44:21 +03:00
Przemysław Witek	8e074c4495	Rename "error" field to "value" for consistency between metrics (#58726 ) (#58870 )	2020-07-02 09:08:56 +02:00
Yang Wang	a5a8b4ae1d	Add cache for application privileges (#55836 ) (#58798 ) Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors. Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).	2020-07-02 11:50:03 +10:00
Benjamin Trent	c64e283dbf	[7.x] [ML] handles compressed model stream from native process (#58009 ) (#58836 ) * [ML] handles compressed model stream from native process (#58009) This moves model storage from handling the fully parsed JSON string to handling two separate types of documents. 1. ModelSizeInfo which contains model size information 2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string. `model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition. Native side change: https://github.com/elastic/ml-cpp/pull/1349	2020-07-01 15:14:31 -04:00
Mark Vieira	1fcaec7dfc	Ignore test seed used in test system properties (#58789 )	2020-07-01 11:52:22 -07:00
Nhat Nguyen	f63cbad629	Ensure CCR partial reads never overuse buffer (#58620 ) When the documents are large, a follower can receive a partial response because the requesting range of operations is capped by max_read_request_size instead of max_read_request_operation_count. In this case, the follower will continue reading the subsequent ranges without checking the remaining size of the buffer. The buffer then can use more memory than max_write_buffer_size and even cause an OOM. Backport of #58620	2020-07-01 13:23:28 -04:00
Tanguy Leroux	ec4843f4df	Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847 ) Since #58728 part of searchable snapshot shard files are written in cache in an asynchronous manner in a dedicated thread pool. It means that even if a search query is successful and returns, there are still more bytes to write in the cached files on disk. On CI this can be slow; if we want to check that the cached_bytes_written has changed we need to check multiple times to give some time for the cached data to be effectively written.	2020-07-01 18:01:00 +02:00
Benjamin Trent	c768467155	Muting flakey test (#58855 ) (#58856 )	2020-07-01 11:54:43 -04:00
Lee Hinman	d3d03fc1c6	[7.x] Add default composable templates for new indexing strategy (#57629 ) (#58757 ) Backports the following commits to 7.x: Add default composable templates for new indexing strategy (#57629)	2020-07-01 09:32:32 -06:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Tanguy Leroux	d35e8f45da	Allow read operations to be executed without waiting for full range to be written in cache (#58728 ) (#58829 ) This commit changes CacheFile and CachedBlobContainerIndexInput so that the read operations made by these classes are now progressively executed and do not wait for full range to be written in cache. It relies on the change introduced in #58477 and it is the last change extracted from #58164. Relates #58164	2020-07-01 15:38:17 +02:00
Przemysław Witek	909649dd15	[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 ) (#58825 )	2020-07-01 14:52:06 +02:00
Andrei Stefan	b904a60275	EQL: Add case handling to stringContains (#58762 ) (#58813 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit 1a58776d3aa563beb364b067a1db46497122306f)	2020-07-01 13:51:45 +03:00
Andrei Stefan	470bcee5bf	EQL: Integrate TOML tests for function folding (#58748 ) (#58812 ) Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com> (cherry picked from commit e9b1fa58cf8d510a4b4afb14f66b0d5f9c603ebb)	2020-07-01 13:50:54 +03:00
Przemysław Witek	2638809cba	Mute failing test DataFrameAnalyticsConfigProviderIT.testUpdate_UpdateCannotBeAppliedWhenTaskIsRunning (#58821 )	2020-07-01 12:28:23 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
David Kyle	27d52d4d23	Remove the Model interface (#58754 ) (#58803 ) The Model interface was implemented by just one class and did not contribute to making the code more understandable.	2020-07-01 09:57:02 +01:00
Dario Gieselaar	417f7062c5	[7.x] Add read privileges for annotations for apm_user (#58530 ) (#58781 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-01 09:04:57 +02:00
Yang Wang	3d49e62960	Support handling LogoutResponse from SAML idP (#56316 ) (#58792 ) SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective. The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error. This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.	2020-07-01 16:47:27 +10:00
Lee Hinman	74a78b3a7b	Mute AzureSearchableSnapshotsIT (#58775 ) Relates to #58260	2020-06-30 13:30:51 -06:00
Dan Hermann	22806c943d	Data stream support for ILM remove policy API (#58595 ) (#58770 )	2020-06-30 14:03:19 -05:00
Benjamin Trent	a2331bc9d4	[Transform] fix bug in supporting boolean values in pivot (#58741 ) (#58760 ) Since the underlying composite aggs support boolean mapped values for terms, transforms should also support them closes #58697	2020-06-30 13:47:58 -04:00
Martijn van Groningen	adcef93a6c	Introduce new put mapping action for dynamic mapping updates. (#58746 ) Backport of #58419 Mapping updates that originate from indexing a document with unmapped fields will use this new action instead of the current put mapping action. This way on the security side, authorization logic can easily determine whether a mapping update is automatically generated or a mapping update originates from the put mapping api. The new auto put mapping action is only used if all nodes are on the version that supports it.	2020-06-30 18:02:31 +02:00
Julie Tibshirani	ab65a57d70	Merge mappings for composable index templates (#58709 ) This PR implements recursive mapping merging for composable index templates. When creating an index, we perform the following: * Add each component template mapping in order, merging each one in after the last. * Merge in the index template mappings (if present). * Merge in the mappings on the index request itself (if present). Some principles: * All 'structural' changes are disallowed (but everything else is fine). An object mapper can never be changed between `type: object` and `type: nested`. A field mapper can never be changed to an object mapper, and vice versa. * Generally, each section is merged recursively. This includes `object` mappings, as well as root options like `dynamic_templates` and `meta`. Once we reach 'leaf components' like field definitions, they always overwrite an existing one instead of being merged. Relates to #53101.	2020-06-30 08:01:37 -07:00

5291 Commits