OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-18 19:05:06 +00:00

Author	SHA1	Message	Date
Jake Landis	43dc72f1a5	Fix cluster alert for watcher/monitoring IndexOutOfBoundsExcep… (#47756 ) If a cluster sending monitoring data is unhealthy and triggers an alert, then stops sending data the following exception [1] can occur. This exception stops the current Watch and the behavior is actually correct in part due to the exception. Simply fixing the exception introduces some incorrect behavior. Now that the Watch does not error in this case, it will result in an incorrectly "resolved" alert. The fix here has two parts: a) fix the exception b) fix the following incorrect behavior. a) fixing the exception is as easy as checking the size of the array before accessing it. b) fixing the following incorrect behavior is a bit more intrusive - Note - the UI depends on the success/met state for each condition to determine an "OK" or "FIRING" In this scenario, where an unhealthy cluster triggers an alert and then goes silent, it should keep "FIRING" until it hears back that the cluster is green. To keep the Watch "FIRING" either the index action or the email action needs to fire. Since the Watch is neither a "new" alert nor a "resolved" alert, we do not want to keep sending an email (that would be non-passive too). Without completely changing the logic of how an alert is resolved, allowing the index action to take place would result in the alert being resolved. Since we can not keep "FIRING" either the email or index action (since we don't want to resolve the alert nor re-write the logic for alert resolution), we will introduce a 3rd action. A logging action that WILL fire when the cluster is unhealthy. Specifically, it will fire when there is an unresolved alert and it can not find the cluster state. This logging action is logged at debug, so it should not be noticed much. This logging action serves as an 'anchor' for the UI to keep the state in a "FIRING" status until the alert is resolved. This presents a possible scenario where a cluster starts firing, then goes completely silent forever, and the Watch will be "FIRING" forever. This is an edge case that already exists in some scenarios and requires manual intervention to remove that Watch. This change also uses a template-like method to populate the version_created for the default monitoring watches. The version is set to 7.5 since that is where this is first introduced. Fixes #43184	2019-10-09 10:47:21 -05:00
Martijn van Groningen	f8ebb75fcf	Reuse OperationRouting#searchShards(...) to select local enrich shard (#47359 ) The current shard selection logic selects a random shard copy instead of selecting the local shard copy and, if a local copy is not available, then selecting a random shard copy. The latter is the desired behaviour for enrich. By reusing `OperationRouting#searchShards(...)` we get the desired behaviour and reuse the same logic that the search api is using.	2019-10-09 17:31:43 +02:00
Yogesh Gaikwad	1139cce9a3	[DOCS] Add docs for `create_doc` index privilege (#47584 ) (#47778 ) This commit adds documentation for new index privilege create_doc which only allows indexing of new documents but no updates to existing documents via Index or Bulk APIs. Relates: #45806	2019-10-09 21:22:36 +11:00
Andrei Stefan	75a7daae73	SQL: use calendar interval of 1y instead of fixed interval for grouping by YEAR and HISTOGRAMs (#47558 ) (cherry picked from commit 55f5463eee4ecea3537df4b34645f1d87472a802)	2019-10-09 11:51:35 +03:00
Martijn van Groningen	be0e17770c	required change after merging in 7 dot x branch	2019-10-09 09:16:23 +02:00
Martijn van Groningen	da1e2ea461	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-09 09:06:13 +02:00
Lee Hinman	fb7abe9fa4	Separate SLM stop/start/status API from ILM (#47710 ) * Separate SLM stop/start/status API from ILM This separates a start/stop/status API for SLM from being tied to ILM's operation mode. These APIs look like: ``` POST /_slm/stop POST /_slm/start GET /_slm/status ``` This allows administrators to have fine-grained control over preventing periodic snapshots and deletions while performing cluster maintenance. Relates to #43663 * Allow going from RUNNING to STOPPED * Align with the OperationMode rules * Fix slmStopping method * Make OperationModeUpdateTask constructor private * Wipe snapshots better in test	2019-10-08 17:21:38 -06:00
Gordon Brown	a492864a9d	Manage retention of failed snapshots in SLM (#47617 ) Failed snapshots will eventually build up unless they are deleted. While failures may not take up much space, they add noise to the list of snapshots and it's desirable to remove them when they are no longer useful. With this change, failed snapshots are deleted using the following strategy: `FAILED` snapshots will be kept until the configured `expire_after` period has passed, if present, and then be deleted. If there is no configured `expire_after` in the retention policy, then they will be deleted if there is at least one more recent successful snapshot from this policy (as they may otherwise be useful for troubleshooting purposes). Failed snapshots are not counted towards either `min_count` or `max_count`.	2019-10-08 17:07:08 -06:00
James Baiera	b9fb354618	Add retry to force merge operation in EnrichPolicyRunner (#47178 ) Adds a check when running an Enrich policy to make sure that an Enrich index is force merged down to one segment, and if it was not fully merged, attempts the merge again, up to a configurable number of times.	2019-10-08 11:23:02 -04:00
Martijn van Groningen	8b7100eb1f	Don't remove indices to avoid monitoring from intermittently failing to index monitoring docs.	2019-10-08 17:10:42 +02:00
Jake Landis	b578059c90	Re-enable Watcher rest test (#47699 ) (#47705 ) This test is believed to be fixed by #43939 closes #43988	2019-10-08 09:45:27 -05:00
Dimitris Athanasiou	c1b0bfd74a	[7.x][ML] Unwrap exception causes before calling instanceof (#47676 ) (#47724 ) When exceptions could be returned from another node, the exception might be wrapped in a `RemoteTransportException`. In places where we handled specific exceptions using `instanceof` we ought to unwrap the cause first. This commit attempts to fix this issue after searching code in the ML plugin. Backport of #47676	2019-10-08 16:02:47 +03:00
Alpar Torok	36d018c909	Convert RunTask to use testclusters, remove ClusterFormationTasks (#47572 ) * Convert RunTask to use testclusters, remove ClusterFormationTasks This PR adds a new RunTask and a way for it to start a testclusters cluster out of band and block on it to replace the old RunTask that used ClusterFormationTasks. With this we can now remove ClusterFormationTasks.	2019-10-08 14:43:29 +03:00
Benjamin Trent	d33dbf82d4	[7.x] [ML][Inference] adjusting definition object schema and validation (#47447 ) (#47673 ) * [ML][Inference] adjusting definition object schema and validation (#47447) * [ML][Inference] adjusting definition object schema and validation * finalizing schema and fixing inference npe * addressing PR comments * fixing for backport	2019-10-08 07:11:05 -04:00
Hendrik Muhs	5e0e54f455	[Transform] move root endpoint to _transform with BWC layer (#47127 ) (#47682 ) move the main endpoint to /_transform/ from /_data_frame/transforms/ with providing backwards compatibility and deprecation warnings	2019-10-08 08:59:01 +02:00
Lee Hinman	91988c7c26	Throw error retrieving non-existent SLM policy (#47679 ) Previously when retrieving an SLM policy it would always return a 200 with `{}` in the body, even if the policy did not exist. This changes that behavior to throw an error (similar to our other APIs) if a policy doesn't exist. This also adds a basic CRUD yml test for the behavior. Resolves #47664	2019-10-07 19:54:04 -06:00
Lee Hinman	906be45209	Add a test for SLM retention with security enabled (#47608 ) This enhances the existing SLM test using users/roles/etc to also test that SLM retention works when security is enabled. Relates to #43663	2019-10-07 19:52:09 -06:00
Lisa Cawley	39ef795085	[DOCS] Cleans up links to security content (#47610 ) (#47703 )	2019-10-07 15:23:19 -07:00
Tal Levy	a17f394e27	Geo-Match Enrich Processor (#47243 ) (#47701 ) this commit introduces a geo-match enrich processor that looks up a specific `geo_point` field in the enrich-index for all entries that have a geo_shape match field that meets some specific relation criteria with the input field. For example, the enrich index may contain documents with zipcodes and their respective geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode they associate according to which shape they are contained within. this commit also refactors some of the MatchProcessor by moving a lot of the shared code to AbstractEnrichProcessor. Closes #42639.	2019-10-07 15:03:46 -07:00
Jake Landis	74876811c2	Watcher - catch uncaught exception. (#47680 ) (#47695 ) If a thread pool rejection exception happens, an alternative code path is chosen to write history and delete the trigger. If an exception happens during deletion of the trigger an exception may be thrown and not caught. This commit catches the exception and provides a meaningful error message. fixes #47008	2019-10-07 15:45:45 -05:00
Jake Landis	a49a1b6994	Watcher remove assertion that is susceptible to a race conditi… (#47667 ) When deactivating a watch, there is a chance that it is fully deactivated and reporting as not running but the history is not fully written yet. There is not a tight coupling between the associated watcher history index and the deactivation. This test assumes that once a watch is deactivated that all history is fully written in a very short time period. If the Watch is deactivated, but the history is slow to write it can result in a failing test. This change removes an assertion that assumes that the deactivation of a watch ensured that all of the watch history was written. There is still a minor race condition with respect to the remaining history assertions. However, if the history is slow to be written, it will allow the test to still pass. fixes #47503	2019-10-07 12:07:10 -05:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
Marios Trivyzas	e698e68f06	SQL: Allow whitespaces in escape patterns (#47577 ) Previously, we supported only the format `{fn <FUNCTION_NAME>()}` but other DBs like MSSQL, DB2, MariaDB/MySQL also allow whitespaces between `{` and `fn`. Furthermore, some applications - like PowerBI - also generate escape sequences with spaces: `select { fn name(params) } etc.` Add support for white spaces between `{` and the escape pattern definition like `fn`, `ts`, `d`, `guid` etc. Closes: #47401 (cherry picked from commit 08a22d0b393f4a76c52dabc5e7b9cafcc19c30ca)	2019-10-07 15:05:02 +02:00
Yogesh Gaikwad	b6d1d2e6ec	Add 'create_doc' index privilege (#45806 ) (#47645 ) Use case: User with `create_doc` index privilege will be allowed to only index new documents either via Index API or Bulk API. There are two cases that we need to think about: - User indexing a new document without specifying an Id. For this, ES auto-generates an Id, and since ES version 7.5.0 onwards defaults to `op_type` `create`, we just need to authorize on the `op_type`. - User indexing a new document with an Id. This is problematic as we do not know whether a document with that Id exists or not. If the `op_type` is `create` then we can assume the user is trying to add a document; if it exists, it is going to throw an error from the index engine. Given both of these cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents. In the `AuthorizationService` when authorizing a bulk request, we check the implied action. This code changes that to append the `:op_type/index` or `:op_type/create` to indicate the implied index action.	2019-10-07 23:58:44 +11:00
Yogesh Gaikwad	7c862fe71f	Add support to retrieve all API keys if user has privilege (#47274 ) (#47641 ) This commit adds support to retrieve all API keys if the authenticated user is authorized to do so. This removes the restriction of specifying one of the parameters (like id, name, username and/or realm name) when the `owner` is set to `false`. Closes #46887	2019-10-07 23:58:21 +11:00
Tanguy Leroux	b5ac0204d2	Fail earlier Put Follow requests for closed leader indices (#47637 ) Backport of (#47582) Today when following a new leader index, we fetch the remote cluster state, check the remote cluster license, check the user privileges, retrieve the index shard stats before initiating a CCR restore session. But if the leader index to follow is closed, we're executing a bunch of operations that would inevitably fail at some point (on retrieving the index shard stats, because this type of request forbids closed indices when resolving indices). We could fail a Put Follow request at the first step by checking the leader index state directly from the remote cluster state. This also helps the Resume Follow API to fail a bit earlier.	2019-10-07 13:59:04 +02:00
Alpar Torok	bc85b22c1f	Complete testclusters backport (#47623 ) * Use versions specific distribution folders so we don't need to clean up (#46539) * Retry deleting distro dir on windows When restarting the cluster we clean up old distribution files that might still be in use by the OS. Windows closes resources of dead processes async, so we do a couple of retries to get around it. Closes #46014 * Avoid having to delete the distro folder. * Remove the use of ClusterFormationTasks from RestTestTask (#47022) This PR removes a use-case of the ClusterFormationTasks and converts a project that flew under the radar so far. There's probably more clean-up possible here, but for now the goal is to be able to remove that code after `RunTask` is also updated. * Migrate some 7.x only projects	2019-10-07 11:43:57 +03:00
Martijn van Groningen	f2f2304c75	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-07 10:07:56 +02:00
Andrei Dan	4506b37ed5	ILM: Skip rolling indexes that are already rolled (#47324 ) (#47592 ) An index with an ILM policy that has a rollover action in one of the phases was rolled over when the ILM conditions dictated regardless if it was already rolled over (eg. manually after modifying an index template in order to force the creation of a new index that uses the new mappings). This changes this behaviour and has ILM check if the index it's about to roll has not been rolled over in the meantime. (cherry picked from commit 37d6106feeb9f9369519117c88a9e7e30f3ac797) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-10-07 07:47:47 +01:00
Ioannis Kakavas	36cabbae80	NameID mapping and Single Logout (#47288 ) (#47561 ) Clarify in the documentation that for SAML Single Logout to be functional, the Identity Provider needs to release a NameID.	2019-10-07 09:19:32 +03:00
Dimitris Athanasiou	ffacfc642c	[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624 ) (#47625 ) Relates #47612	2019-10-05 23:58:32 +03:00
Jason Tedor	35ca3d68d7	Validating monitoring hosts setting while parsing (#47571 ) This commit lifts the validation of the monitoring hosts setting into the setting itself, rather than when the setting is used. This prevents a scenario where an invalid value for the setting is accepted, but then later fails while applying a cluster state with the invalid setting.	2019-10-04 17:32:49 -04:00
Lee Hinman	79376b7219	Set default SLM retention invocation time (#47604 ) This adds a default for the `slm.retention_schedule` setting, setting it to `0 30 1 * * ?` which is 1:30am every day. Having retention unset meant that it would never be invoked and clean up snapshots. We determined it would be better to have a default than never to be run. When coming to a decision, we weighed the option of an absolute time (such as 1:30am) versus a periodic invocation (like every 12 hours). In the end we decided on the absolute time because it has better predictability and consistency than a periodic invocation, which would rely on when the master node were elected or restarted. Relates to #43663	2019-10-04 15:00:20 -06:00
Lisa Cawley	f35fcf7204	[DOCS] Adds security content in the Elasticsearch Reference (#47596 )	2019-10-04 13:11:05 -07:00
James Baiera	a66c0dcd95	Add pipeline to ensure unique Enrich index documents (#46348 ) Adds a pipeline that removes ids and routing from documents before indexing them into enrich indices. Enrich documents may come from multiple indices, and thus have id collisions on them. This pipeline ensures that documents with colliding id fields do not clobber one another during the reindex operation while executing an enrich policy.	2019-10-04 12:20:52 -04:00
Przemysław Witek	ee952da2e2	[7.x] Implement evaluation API for multiclass classification problem (#47126 ) (#47343 )	2019-10-04 17:54:51 +02:00
Lisa Cawley	9b3e5409c1	[7.x][DOCS] Copies security source files from stack-docs (#47534 )	2019-10-04 08:19:10 -07:00
Andrei Stefan	a46f312ded	SQL: fix multi full-text functions usage with aggregate functions (#47444 ) * Skip functions involving full-text predicates when replacing multiple aggregate functions with "stats" or "matrix_stats" aggregations. (cherry picked from commit bb14ba83128dfb7a70f825ea08b1524072fb9ad0)	2019-10-04 16:27:22 +03:00
Alpar Torok	2b16d7bcf8	Backport testclusters all (#47565 ) * Bwc testclusters all (#46265) Convert all bwc projects to testclusters * Fix bwc versions config * WIP fix rolling upgrade * Fix bwc tests on old versions * Fix rolling upgrade	2019-10-04 16:12:53 +03:00
Przemysław Witek	8c180a77f0	[7.x] Fix serialization of evaluation response. (#47557 ) (#47566 )	2019-10-04 15:12:18 +02:00
Przemysław Witek	ec9b77deaa	[7.x] Implement new analysis type: classification (#46537 ) (#47559 )	2019-10-04 13:47:19 +02:00
David Roberts	31a5e1c7ee	[ML] More accurate job memory overhead (#47516 ) When an ML job runs the memory required can be broken down into: 1. Memory required to load the executable code 2. Instrumented model memory 3. Other memory used by the job's main process or ancillary processes that is not instrumented Previously we added a simple fixed overhead to account for 1 and 3. This was 100MB for anomaly detection jobs (large because of the completely uninstrumented categorization function and normalize process), and 20MB for data frame analytics jobs. However, this was an oversimplification because the executable code only needs to be loaded once per machine. Also the 100MB overhead for anomaly detection jobs was probably too high in most cases because categorization and normalization don't use _that_ much memory. This PR therefore changes the calculation of memory requirements as follows: 1. A per-node overhead of 30MB for _only_ the first job of any type to be run on a given node - this is to account for loading the executable code 2. The established model memory (if applicable) or model memory limit of the job 3. A per-job overhead of 10MB for anomaly detection jobs and 5MB for data frame analytics jobs, to account for the uninstrumented memory usage This change will enable more jobs to be run on the same node. It will be particularly beneficial when there are a large number of small jobs. It will have less of an effect when there are a small number of large jobs.	2019-10-04 09:57:31 +01:00
Yogesh Gaikwad	d371f9d44d	Fix for ApiKeyIntegTests related to Expired API keys remover (#43477 ) (#47546 ) When an API key is invalidated we do two things: first, we try to trigger the `ExpiredApiKeysRemover` task, and second, we index the invalidation for the API key. The index invalidation may happen before the `ExpiredApiKeysRemover` task is run and in that case, the invalidated API key will also get deleted. If the `ExpiredApiKeysRemover` runs before the API key invalidation is indexed then the API key is not deleted and will be deleted in a future run. This behavior was not captured in the tests related to `ExpiredApiKeysRemover`, causing intermittent failures. This commit fixes those tests by checking if the invalidated API key is reported back when we get API keys after invalidation and performing the checks based on that. Closes #41747	2019-10-04 13:17:52 +10:00
Lisa Cawley	9c7b58900c	[DOCS] Fixes missing link title (#47481 )	2019-10-03 08:06:31 -07:00
Ioannis Kakavas	fd6a585009	Fix ADRealmTests in FIPS 140 JVMs (#47437 ) (#47506 ) The changes introduced in #47179 made it so that we could try to build an SSLContext with verification mode set to None, which is not allowed in FIPS 140 JVMs. This commit addresses that.	2019-10-03 17:14:26 +03:00
Alpar Torok	0a14bb174f	Remove eclipse conditionals (#44075 ) * Remove eclipse conditionals We used to have some meta projects with a `-test` prefix because historically eclipse could not distinguish between test and main source-sets and could only use a single classpath. This is no longer the case for the past few Eclipse versions. This PR adds the necessary configuration to correctly categorize source folders and libraries. With this change eclipse can import projects, and the visibility rules are correct, e.g. autocomplete doesn't offer classes from test code or `testCompile` dependencies when editing classes in `main`. Unfortunately the cyclic dependency detection in Eclipse doesn't seem to take the difference between test and non test source sets into account, but since we are checking this in Gradle anyhow, it's safe to set to `warning` in the settings. Unfortunately there is no setting to ignore it. This might cause problems when building since Eclipse will probably not know the right order to build things in, so more work might be necessary.	2019-10-03 11:55:00 +03:00
Lee Hinman	2e3eb4b24e	Add API to execute SLM retention on-demand (#47405 ) (#47463 ) * Add API to execute SLM retention on-demand (#47405) This is a backport of #47405 This commit adds the `/_slm/_execute_retention` API endpoint. This endpoint kicks off SLM retention and then returns immediately. This in particular allows us to run retention without scheduling it (for entirely manual invocation) or perform a one-off cleanup. This commit also includes HLRC for the new API, and fixes an issue in SLMSnapshotBlockingIntegTests where retention invoked prior to the test completing could resurrect an index the internal test cluster cleanup had already deleted. Resolves #46508 Relates to #43663	2019-10-02 12:29:04 -06:00
Lee Hinman	013d87d716	Fix AllocationRoutedStepTests.testConditionMetOnlyOneCopyAlloc… (#47313 ) * Fix AllocationRoutedStepTests.testConditionMetOnlyOneCopyAllocated These tests were using randomly generated includes/excludes/requires for routing, however, it was possible to generate mutually exclusive allocation settings (about 1 out of 50,000 times for my runs). This splits the test into three different tests, and removes the randomization (it doesn't add anything to the testing here) to fix the issue. Resolves #47142	2019-10-02 10:01:23 -06:00
Ioannis Kakavas	4f722f0f53	Fix Active Directory tests (#47358 ) (#47440 ) Fixes multiple Active Directory related tests that run against the samba fixture. Some were failing since we changed the realm settings format in 7.0 and a few were slightly broken in other ways. We can move to cleanup the tests in a follow up but this work fits better to be done with or after we move the tests from a Samba based fixture to a real(-ish) Microsoft Active Directory based fixture. Resolves: #33425, #35738	2019-10-02 17:18:12 +03:00
Benjamin Trent	2228a7dd8d	[ML][Inference] adding ensemble model objects (#47241 ) (#47438 ) * [ML][Inference] adding ensemble model objects * addressing PR comments * Update TreeTests.java * addressing PR comments * fixing test	2019-10-02 09:49:46 -04:00

4211 Commits