OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-09 06:25:07 +00:00

Author	SHA1	Message	Date
Tanguy Leroux	8669766a81	Reduce contention in CacheFile.fileLock() method (#55662 ) The CacheFile.fileLock() method is used to acquire a lock on a cache file so that the file can't be deleted (or its file handle closed) during the execution of a read or a write operation. Today this lock is obtained by first acquiring the eviction lock (the write lock of the readwrite lock), then by checking if the cache file is evicted and the file channel still open, and finally by obtaining the file lock (the read lock of the readwrite lock). Acquiring the read lock while the eviction lock is held ensures that the cache file eviction cannot start in the meantime. But the start (and the termination) of an eviction also acquires the eviction lock, and this lock cannot be obtained while a read lock is held (the write lock of a readwrite lock is exclusive). If we instead acquire the read lock first and check the eviction flag and file channel existence while holding it, we know that no eviction can start or finish until the read lock is released.	2020-04-23 14:40:27 +02:00
Rory Hunter	d66af46724	Always use deprecateAndMaybeLog for deprecation warnings (#55319 ) Backport of #55115. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages can potentially be deduplicated.	2020-04-23 09:20:54 +01:00
David Roberts	87f4751eca	[ML] Make find_file_structure recognize Kibana CSV report timestamps (#55609 ) The Kibana CSV export feature uses a non-standard timestamp format. This change adds it to the formats the find_file_structure endpoint recognizes out-of-the-box, to make round-tripping data from Kibana back to Kibana via CSV files easier. Fixes #55586	2020-04-23 08:39:07 +01:00
Jake Landis	25ea6a74f0	[7.x] Validate REST specs against schema (#55117 ) (#55563 ) A JSON schema was recently introduced for the REST API specification. #54252 This PR introduces a 3rd party validation tool to ensure that the REST specification conforms to the schema. The task is applied to the 3 projects that contain REST API specifications. The plugin wires this task into the precommit commit task, and should be considered as part of the public API for the build tools for any plugin developer to contribute their plugin's specification. An ignore parameter has been introduced for the task to allow specific files to be ignored from the validation. The ignored files in this PR will soon get issues logged and a link so they can be fixed. Closes #54314	2020-04-22 14:14:03 -05:00
Albert Zaharovits	82ed0ab420	Update the audit logfile list of system users (#55578 ) Out of the box "access granted" audit events are not logged for system users. The list of system users was stale and included only the _system and _xpack users. This commit expands this list with _xpack_security and _async_search, effectively reducing the auditing noise by not logging the audit events of these system users out of the box. Closes #37924	2020-04-22 21:59:31 +03:00
Tal Levy	c370b83bd7	Fix locale lowercase test issue in GenerateSnapshotNameStepTests (#55597 ) (#55605 ) The testPerformAction test has been failing periodically due to how Hamcrest's containsStringIgnoringCase does not lowercase using the same Locale set in the test infrastructure. This commit falls back to explicitly lowercasing using the root locale	2020-04-22 11:29:57 -07:00
Tal Levy	f27ce69f0c	[backport] Add geo_bounds aggregation support for geo_shape (#55328 ) (#55600 ) This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields	2020-04-22 11:29:35 -07:00
Tal Levy	0844455505	Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037 ) (#55500 ) After #53562, the `geo_shape` field mapper is registered within a module. This opens the door for introducing a new `geo_shape` field mapper into the Spatial Plugin that has doc-values support. This is very much an extension of server's GeoShapeFieldMapper, but with the addition of the doc values implementation.	2020-04-22 08:12:54 -07:00
Dimitris Athanasiou	50a5afed15	[7.x][ML] Prepare parsing phase_progress from DFA process (#55580 ) (#55587 ) Data frame analytics process currently reports progress as an integer `progress_percent`. We parse that and report it from the _stats API as the progress of the `analyzing` phase. However, we want to allow the DFA process to report progress for more than one phase. This commit prepares for this by parsing `phase_progress` from the process, an object that contains the `phase` name plus the `progress_percent` for that phase. Backport of #55580	2020-04-22 16:38:32 +03:00
Benjamin Trent	7c81cd7833	[ML] explicitly disallow partial results in datafeed extractors (#55537 ) (#55585 ) Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`. - Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set. - Code error handling is simplified closes https://github.com/elastic/elasticsearch/issues/40793	2020-04-22 09:07:44 -04:00
David Roberts	810caf5ffe	[ML] Test that audit message is written when closing unassigned job (#55582 ) Issue #55521 suggested that audit messages were not written when closing an unassigned job. This is not the case, but we didn't have a test to prove it. Backport of #55571	2020-04-22 13:23:43 +01:00
David Roberts	2dc5586afe	[ML] Add effective max model memory limit to ML info (#55581 ) The ML info endpoint returns the max_model_memory_limit setting if one is configured. However, it is still possible to create a job that cannot run anywhere in the current cluster because no node in the cluster has enough memory to accommodate it. This change adds an extra piece of information, limits.effective_max_model_memory_limit, to the ML info response that returns the biggest model memory limit that could be run in the current cluster assuming no other jobs were running. The idea is that the ML UI will be able to warn users who try to create jobs with higher model memory limits that their jobs will not be able to start unless they add a bigger ML node to their cluster. Backport of #55529	2020-04-22 12:28:50 +01:00
David Roberts	da5aeb8be7	[ML] Return assigned node in start/open job/datafeed response (#55570 ) Adds a "node" field to the response from the following endpoints: 1. Open anomaly detection job 2. Start datafeed 3. Start data frame analytics job If the job or datafeed is assigned to a node immediately then this field will return the ID of that node. In the case where a job or datafeed is opened or started lazily the node field will contain an empty string. Clients that want to test whether a job or datafeed was opened or started lazily can therefore check for this. Backport of #55473	2020-04-22 12:06:53 +01:00
David Kyle	e99ef3542c	Mute ModelLoadingServiceTests::testMaxCachedLimitReached	2020-04-22 11:53:07 +01:00
Tim Vernum	8b566aea47	Fix use of password protected PKCS#8 keys for SSL (#55567 ) PEMUtils would incorrectly fill the encryption password with zeros (the '\0' character) after decrypting a PKCS#8 key. Since PEMUtils did not take ownership of this password it should not zero it out because it does not know whether the caller will use that password array again. This is actually what PEMKeyConfig does - it uses the key encryption password as the password for the ephemeral keystore that it creates in order to build a KeyManager. Backport of: #55457	2020-04-22 16:38:51 +10:00
Yang Wang	32e46bf552	Fix certutil http for empty password with JDK 11 and lower (#55437 ) (#55565 ) Fix elasticsearch-certutil http command so that it correctly accepts an empty keystore password with JDK version 11 and lower.	2020-04-22 15:03:10 +10:00
David Kyle	8e8c6b4aee	Fix accounting in ModelLoadingServiceTests (#55307 ) (#55547 ) In the test, after the first load event it is not known which models are cached, as loading a later one will evict an earlier one and the order is not known. The models could have been loaded 1 or 2 times, not exactly twice.	2020-04-21 19:25:06 +01:00
Armin Braun	db7eb8e8ff	Remove Redundant CS Update on Snapshot Finalization (#55276 ) (#55528 ) This change folds the removal of the in-progress snapshot entry into setting the safe repository generation. Outside of removing an unnecessary cluster state update, this also has the advantage of removing a somewhat inconsistent cluster state where the safe repository generation points at `RepositoryData` that contains a finished snapshot while it is still in-progress in the cluster state, making it easier to reason about the state machine of upcoming concurrent snapshot operations.	2020-04-21 15:33:17 +02:00
David Turner	be60d50452	Allow searching of snapshot taken while indexing (#55511 ) Today a read-only engine requires a complete history of operations, in the sense that its local checkpoint must equal its maximum sequence number. This is a valid check for read-only engines that were obtained by closing an index since closing an index waits for all in-flight operations to complete. However a snapshot may not have this property if it was taken while indexing was ongoing, but that's ok. This commit weakens the check for a complete history to exclude the case of a searchable snapshot. Relates #50999	2020-04-21 13:21:38 +01:00
Ignacio Vera	e4c65b4388	mute test SSLReloadDuringStartupIntegTests.testReloadDuringStartup (#55525 )	2020-04-21 14:13:13 +02:00
Jim Ferenczi	0b3bdfcc3e	Fix expiration time in async search response (#55435 ) This change ensures that we return the latest expiration time when retrieving the response from the index. This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.	2020-04-21 14:04:29 +02:00
Przemysław Witek	59d377462f	Apply default timeout in StopDataFrameAnalyticsAction.Request (#55512 ) (#55517 )	2020-04-21 13:05:48 +02:00
Nhat Nguyen	3cc4e0dd09	Retry follow task when remote connection queue full (#55314 ) If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on ESRejectedExecutionException, but not on RejectedExecutionException.	2020-04-20 22:43:05 -04:00
Stuart Tettemer	93a2e9b0f9	Test: MockScoreScript can be cacheable. (#55499 ) Backport: 0ed1eb5	2020-04-20 17:09:58 -06:00
Benjamin Trent	cabff65aec	[ML] Fixing inference stats race condition (#55163 ) (#55486 ) `updateAndGet` could actually call the internal method more than once on contention. If I read the JavaDocs, it says: ```* @param updateFunction a side-effect-free function``` So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted. To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allow fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`. When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`. This should prevent race conditions while allowing high (ish) throughput in the high-contention paths in inference. I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO). closes https://github.com/elastic/elasticsearch/issues/54786	2020-04-20 16:21:18 -04:00
Benjamin Trent	24d41eb695	[ML] partitions model definitions into chunks (#55260 ) (#55484 ) This paves the data layer way so that exceptionally large models are partitioned across multiple documents. This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0. I chose the definition document limit to be 100. This SHOULD be plenty for any large model. One of the largest models that I have created so far had the following stats: ~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap. With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents. Supporting models 20 times this size (compressed) seems adequate for now.	2020-04-20 16:08:54 -04:00
Benjamin Trent	fa0373a19f	[7.x] [ML] Fix log spam and disable ILM/SLM history for native ML tests (#55475 ) * [ML] fix native ML test log spam (#55459) This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors. * removing unnecessary changes * removing unused imports * removing unnecessary java setting	2020-04-20 15:41:30 -04:00
Lee Hinman	9eddd2bcc9	[7.x] Add prefer_v2_templates flag and index setting (#55411 ) (#55476 ) This commit adds a new querystring parameter on the following APIs: - Index - Update - Bulk - Create Index - Rollover These APIs now support a `?prefer_v2_templates=true\|false` flag. This flag changes the preference creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be changed to `true` for 8.0+ in subsequent work. Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting. This setting is used so that actions that automatically create a new index (things like rollover initiated by ILM) will inherit the preference from the original index. This setting is dynamic so that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias performing periodic rollover. This also adds support for sending this parameter to the High Level Rest Client. Relates to #53101	2020-04-20 12:05:42 -06:00
Armin Braun	a0763d958d	Make RepositoryData Less Memory Heavy (#55293 ) (#55468 ) We don't really need `LinkedHashSet` here. We can assume that all the entries are unique and just use a list and use the list utilities to create the cheapest possible version of the list. Also, this fixes a bug in `addSnapshot` which would mutate the existing linked hash set on the current instance (fortunately this never caused a real world bug) and brings the collection in line with the java docs on its getter that claim immutability.	2020-04-20 18:28:06 +02:00
William Brafford	7817948926	Disable monitoring in ML multinode tests (#55461 ) Removing the deprecated "xpack.monitoring.enabled" setting introduced log spam and potentially some failures in ML tests. It's possible to use a different, non-deprecated setting to disable monitoring, so we do that here.	2020-04-20 10:51:16 -04:00
David Turner	0df329dde7	Use soft deletes for searchable snapshots tests (#55453 ) This allows us to perform some dummy indexing including updates/deletes.	2020-04-20 14:37:51 +01:00
Przemysław Witek	7d5f74e964	Fix and unmute testSetUpgradeMode_ExistingTaskGetsUnassigned (#55368 ) (#55452 )	2020-04-20 13:29:29 +02:00
Yannick Welsch	b9da307cd1	Add GCS support for searchable snapshots (#55403 ) Adds ranged read support for GCS repositories in order to enable searchable snapshot support for GCS. As part of this PR, I've extracted some of the test infrastructure to make sure that GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests cover similar tests (as I saw those diverging in what they cover).	2020-04-20 13:02:59 +02:00
Jason Tedor	9ecb222bfa	Remove unneeded validation in feature set usage This validation is not needed, as we have discovered the source of the serialization error that was leading to some usage instances appearing to not have a name.	2020-04-18 14:29:59 -04:00
Jason Tedor	23049391be	Upgrade feature aware check usage of ASM to 7.3.1 (#54577 ) This commit upgrades the ASM dependency used in the feature aware check to 7.3.1. This gives support for JDK 14. Additionally, now that Gradle understands JDK 13, it means we can remove a restriction on running the feature aware check to JDK 12 and lower.	2020-04-18 10:49:57 -04:00
Jay Modi	405ff0ce27	Handle TLS file updates during startup (#55330 ) This change reworks the loading and monitoring of files that are used for the construction of SSLContexts so that updates to these files are not lost if the updates occur during startup. Previously, the SSLService would parse the settings, build the SSLConfiguration objects, and construct the SSLContexts prior to the SSLConfigurationReloader starting to monitor these files for changes. This allowed for a small window where updates to these files may never be observed until the node restarted. To remove the potential miss of a change to these files, the code now parses the settings and builds SSLConfiguration instances prior to the construction of the SSLService. The files backing the SSLConfiguration instances are then registered for monitoring and finally the SSLService is constructed from the previously parsed SSLConfiguration instances. As the SSLService is not constructed when the code starts monitoring the files for changes, a CompletableFuture is used to obtain a reference to the SSLService; this allows for construction of the SSLService to complete and ensures that we do not miss any file updates during the construction of the SSLService. While working on this change, the SSLConfigurationReloader was also refactored to reflect how it is currently used. When the SSLConfigurationReloader was originally written the files that it monitored could change during runtime. This is no longer the case as we stopped the monitoring of files that back dynamic SSLContext instances. In order to support the ability for items to change during runtime, the class made use of concurrent data structures. The use of these concurrent data structures has been removed. Closes #54867 Backport of #54999	2020-04-17 20:10:33 -06:00
Zachary Tong	f46b567563	Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250 ) Some aggregations, such as the Terms* family, will use an alternate class to represent unmapped shard results (while the rest of the aggs use the same object but with some form of "empty" or "nullish" values to represent unmapped). This was problematic with AbstractWireSerializingTestCase because it expects the instanceReader to always match the original class. Instead, we need to use the NamedWriteable version so that the registry can be consulted for the proper deserialization reader.	2020-04-17 16:39:38 -04:00
Ryan Ernst	66071b2f6e	Remove combo security and license helper from license state (#55366 ) (#55417 ) Security features in the license state currently do a dynamic check on whether security is enabled. This is because the license level can change the default security enabled state. This commit splits out the check on security being enabled, so that the combo method of security enabled plus license allowed is no longer necessary.	2020-04-17 13:07:02 -07:00
William Brafford	49e30b15a2	Deprecate disabling basic-license features (#54816 ) (#55405 ) We believe there's no longer a need to be able to disable basic-license features completely using the "xpack..enabled" settings. If users don't want to use those features, they simply don't need to use them. Having such features always available lets us build more complex features that assume basic-license features are present. This commit deprecates settings of the form "xpack..enabled" for basic-license features, excluding "security", which is a special case. It also removes deprecated settings from integration tests and unit tests where they're not directly relevant; e.g. monitoring and ILM are no longer disabled in many integration tests.	2020-04-17 15:04:17 -04:00
Benjamin Trent	4be3663968	[7.x] [ML] fix bugs with prediction field value settings (#55333 ) (#55394 ) * [ML] fix bugs with prediction field value settings (#55333) This fixes two unreleased bugs: 1. Prediction value type of `number` might show unexpected classes Analytics created models may have class labels like `1, 5, 10` (or some collection of discrete, whole numbers). These labels are passed to the inference model config in the `classification_labels` field. When the predicted value format is `numeric` it should attempt to see if the classification labels are provided and are numeric. If so, use those. If not, use the underlying value. 2. When supplying an update overwrite, inference was losing the default prediction field value. This is because it was not copied over in the copy ctor in the ClassificationConfig.Builder class. closes #55332	2020-04-17 14:45:02 -04:00
Jake Landis	eb30cf5c89	[7.x] Move Watcher config out of RestResourcesPlugin (#55136 ) (#55336 )	2020-04-17 12:38:01 -05:00
Benjamin Trent	8c581c3388	[ML] fixing and unmuting testHRDSplit test (#55349 ) (#55393 ) This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. closes https://github.com/elastic/elasticsearch/issues/32966	2020-04-17 09:55:52 -04:00
Tanguy Leroux	eb52df6652	Mute GraphTests.testTimedoutQueryCrawl (#55397 ) Relates #55396 Relates #53913	2020-04-17 15:31:48 +02:00
Benjamin Trent	65e0084120	[ML] do not start stopping tasks on reassignment (#55315 ) (#55388 ) When anomaly jobs, datafeeds, and analytics tasks are stopped, they enter an ephemeral state called `STOPPING`. If the node executing the task fails while this is occurring, they could be stuck in the limbo state of `STOPPING`. It is best to mark the tasks as completed if they get reassigned to a node.	2020-04-17 08:57:12 -04:00
Tanguy Leroux	290361c63b	Mute MlConfigIndexMappingsFullClusterRestartIT.testMlConfigIndexMappingsAfterMigration (#55389 ) Relates #54415	2020-04-17 14:54:17 +02:00
Costin Leau	fc6261967b	SQL: Streamline declaration of LeafAggs (#55380 ) Avoid repetition of the aggregation builder setup Relates #55241 (cherry picked from commit 6cfe130e5da4aac11bad64f187fecc411139f5e2)	2020-04-17 15:04:54 +03:00
markharwood	7761b01a33	Remove normalizer support from wildcard field while we decide on approach for handling case insensitivity (#55294 ) (#55375 ) Closes #55288	2020-04-17 11:43:26 +01:00
Marios Trivyzas	f958e9abdc	SQL: Implement scripting inside aggs (#55241 ) (#55371 ) Implement the use of scalar functions inside aggregate functions. This allows for complex expressions inside aggregations, with or without GROUP BY as well as with or without a HAVING clause. e.g.: ``` SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b FROM test GROUP BY b HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5 ``` Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as this is currently not implemented on the Elasticsearch side. Fixes: #29980 Fixes: #36865 Fixes: #37271 (cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)	2020-04-17 12:41:22 +02:00
Tanguy Leroux	71855fbfe0	Mute testSupportedFieldTypes in HDRPreAggregatedPercentile tests (#55369 ) Relates #55360	2020-04-17 10:49:43 +02:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337 ) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an API for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this PR, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an API resolves to all backing indices of a data stream or to the latest index of a data stream (write index) depends on IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index API); otherwise a data stream resolves to all of its backing indices (for example: search API). Relates to #53100	2020-04-17 08:33:37 +02:00
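
The locking scheme described in the inference stats fix (cabff65aec) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the class name `InferenceStatsSketch`, its two counters, and both method names are hypothetical. Only `LongAdder` and `ReadWriteLock` come from the commit message. Writers share the read lock because `LongAdder` is already safe for concurrent increments; the write lock is taken only to read and reset the counters as one step.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stats holder; names are illustrative and not taken from the commit.
class InferenceStatsSketch {
    private final LongAdder inferenceCount = new LongAdder();
    private final LongAdder failureCount = new LongAdder();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Hot path: many threads may hold the read lock at once,
    // so increments do not block each other.
    void record(boolean failed) {
        lock.readLock().lock();
        try {
            inferenceCount.increment();
            if (failed) {
                failureCount.increment();
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // Persist path: the write lock excludes all increments, so the
    // sum-then-reset pair cannot lose or double count an update.
    long[] snapshotAndReset() {
        lock.writeLock().lock();
        try {
            long[] snapshot = { inferenceCount.sum(), failureCount.sum() };
            inferenceCount.reset();
            failureCount.reset();
            return snapshot;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The read and write roles are deliberately inverted relative to their names here: the "read" lock guards the frequent additions, and the "write" lock guards the rare reset.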

5337 Commits