OpenSearch

Commit Graph

Author	SHA1	Message	Date
Przemysław Witek	8f815240b3	[7.x] Allow integer types for classification's dependent variable (#47902 ) (#48080 )	2019-10-16 11:09:56 +02:00
David Roberts	d9c7e3847e	[TEST] Don't assert order of data frame analytics audit messages (#48065 ) Audit messages are stored with millisecond timestamps. If two messages have the same millisecond timestamp then asserting on their order is impossible given the information available. This PR changes the assertion on audit messages in the native data frame analytics tests to assert that the expected audit messages exist in any order. Fixes #48035	2019-10-15 19:59:52 +01:00
Przemysław Witek	eaa56344b5	Verify that the failure reason of analytics process is empty (#48042 ) (#48071 )	2019-10-15 18:33:20 +02:00
Przemysław Witek	620bd9d224	Enable test testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows_TopClassesRequested now that top classes are correctly reported by C++. (#48043 ) (#48053 )	2019-10-15 14:49:16 +02:00
David Roberts	984323783e	[ML][7.x] Add lazy assignment job config option (#47993 ) This change adds: - A new option, allow_lazy_open, to anomaly detection jobs - A new option, allow_lazy_start, to data frame analytics jobs Both work in the same way: they allow a job to be opened/started even if no ML node exists that can accommodate the job immediately. In this situation the job waits in the opening/starting state until ML node capacity is available. (The starting state for data frame analytics jobs is new in this change.) Additionally, the ML nightly maintenance task now creates audit warnings for ML jobs that are unassigned. This means that jobs that cannot be assigned to an ML node for a very long time will show a yellow warning triangle in the UI. A final change is that it is now possible to close a job that is not assigned to a node without using force. This is because previously jobs that were open but not assigned to a node were an aberration, whereas after this change they'll be relatively common.	2019-10-15 06:55:11 +01:00
David Roberts	1ca25bed38	[ML][7.x] Add option to stop datafeed that finds no data (#47995 ) Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data. Backport of #47922	2019-10-14 17:19:13 +01:00
David Roberts	46ae86ac31	[ML] Fix detection of syslog-like timestamp in find_file_structure (#47970 ) Usually syslog timestamps have two spaces before a single digit day-of-month. However, in some non-syslog cases where syslog-like timestamps are used there is only one space. The grok pattern supports this, so the timestamp parser should too. This change makes the find_file_structure endpoint do this. Also fixes another problem that the same test case exposed in the find_file_structure endpoint, which was that the exclude_lines_pattern for delimited files was always created on the assumption the delimiter was a comma. Now it is based on the actual delimiter.	2019-10-13 20:07:54 +01:00
Benjamin Trent	627faf1850	[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885 ) (#47914 ) * [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) * [ML][Analytics] fix bug where regression deleted early does not delete state * Fixing ml with security test failure * fixing for older java	2019-10-11 15:11:16 -04:00
Przemysław Witek	c62fe8c344	Require that the dependent variable column has at most 2 distinct values in classification analysis. (#47858 ) (#47906 )	2019-10-11 14:57:08 +02:00
Igor Motov	b5afa95fd8	Fix Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Tracked by #47612	2019-10-10 18:17:01 +04:00
Igor Motov	17433e79d8	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Tracked by #47612	2019-10-10 17:56:23 +04:00
Dimitris Athanasiou	c1b0bfd74a	[7.x][ML] Unwrap exception causes before calling instanceof (#47676 ) (#47724 ) When exceptions could be returned from another node, the exception might be wrapped in a `RemoteTransportException`. In places where we handled specific exceptions using `instanceof` we ought to unwrap the cause first. This commit attempts to fix this issue after searching code in the ML plugin. Backport of #47676	2019-10-08 16:02:47 +03:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
Dimitris Athanasiou	ffacfc642c	[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624 ) (#47625 ) Relates #47612	2019-10-05 23:58:32 +03:00
Przemysław Witek	ee952da2e2	[7.x] Implement evaluation API for multiclass classification problem (#47126 ) (#47343 )	2019-10-04 17:54:51 +02:00
Przemysław Witek	ec9b77deaa	[7.x] Implement new analysis type: classification (#46537 ) (#47559 )	2019-10-04 13:47:19 +02:00
David Roberts	31a5e1c7ee	[ML] More accurate job memory overhead (#47516 ) When an ML job runs the memory required can be broken down into: 1. Memory required to load the executable code 2. Instrumented model memory 3. Other memory used by the job's main process or ancillary processes that is not instrumented Previously we added a simple fixed overhead to account for 1 and 3. This was 100MB for anomaly detection jobs (large because of the completely uninstrumented categorization function and normalize process), and 20MB for data frame analytics jobs. However, this was an oversimplification because the executable code only needs to be loaded once per machine. Also the 100MB overhead for anomaly detection jobs was probably too high in most cases because categorization and normalization don't use _that_ much memory. This PR therefore changes the calculation of memory requirements as follows: 1. A per-node overhead of 30MB for _only_ the first job of any type to be run on a given node - this is to account for loading the executable code 2. The established model memory (if applicable) or model memory limit of the job 3. A per-job overhead of 10MB for anomaly detection jobs and 5MB for data frame analytics jobs, to account for the uninstrumented memory usage This change will enable more jobs to be run on the same node. It will be particularly beneficial when there are a large number of small jobs. It will have less of an effect when there are a small number of large jobs.	2019-10-04 09:57:31 +01:00
Dimitris Athanasiou	b9541eb3af	[7.x][ML] Make PUT data frame analytics action a master node action (… (#47433 ) While it seemed like the PUT data frame analytics action did not have to be a master node action as the config is stored in an index rather than the cluster state, there are other subtle nuances which make it worthwhile to convert it. In particular, it helps maintain order of execution for put actions which are anyhow user driven and are expected to have low volume. This commit converts `TransportPutDataFrameAnalyticsAction` from a handled transport action to a master node action. Note this means that the action might fail in a mixed cluster but as the API is still experimental and not widely used there will be few moments more suitable to make this change than now.	2019-10-02 16:24:21 +03:00
David Roberts	4379a3c52b	[ML] Throttle the delete-by-query of expired results (#47177 ) Due to #47003 many clusters will have built up a large backlog of expired results. On upgrading to a version where that bug is fixed users could find that the first ML daily maintenance task deletes a very large amount of documents. This change introduces throttling to the delete-by-query that the ML daily maintenance uses to delete expired results to limit it to deleting an average 200 documents per second. (There is no throttling for state/forecast documents as these are expected to be lower volume.) Additionally a rough time limit of 8 hours is applied to the whole delete expired data action. (This is only rough as it won't stop part way through a single operation - it only checks the timeout between operations.) Relates #47103	2019-10-02 11:16:34 +01:00
Dimitris Athanasiou	36884a3c32	[7.x][ML] Restore analytics state if available (#47128 ) (#47393 ) This commit restores the model state if available in data frame analytics jobs. In addition, this changes the start API so that a stopped job can be restarted. As we now store the progress in the state index when the task is stopped, we can use it to determine what state the job was in when it got stopped. Note that in order to be able to distinguish between a job that runs for the first time and another that is restarting, we ensure reindexing progress is reported to be at least 1 for a running task.	2019-10-02 10:24:05 +03:00
Benjamin Trent	f5fe5e7cd6	[7.x] [ML][Inference] Adding preprocessors to definition object (#47320 ) (#47370 ) * [ML][Inference] Adding preprocessors to definition object (#47320) * [ML][Inference] Adding preprocessors to definition object * Update TrainedModelConfig.java * adjusting for backport	2019-10-01 13:31:25 -04:00
Benjamin Trent	4335e07716	[7.x] [ML][Inference] adding .ml-inference* index and storage (#47267 ) (#47310 ) * [ML][Inference] adding .ml-inference* index and storage (#47267) * [ML][Inference] adding .ml-inference* index and storage * Addressing PR comments * Allowing null definition, adding validation tests for model config * fixing line length * adjusting for backport	2019-10-01 08:20:33 -04:00
David Roberts	0807d409bf	[ML] Reinstate ML daily maintenance actions (#47103 ) A refactoring in 6.6 meant that the ML daily maintenance actions have not been run at all since then. This change installs the local master listener that schedules the ML daily maintenance, and also defends against some subtle race conditions that could occur in the future if a node flipped very quickly between master and non-master. Fixes #47003	2019-09-30 13:12:32 +01:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expect templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Przemysław Witek	3fbd58d156	[7.x] Allow evaluation to consist of multiple steps. (#46653 ) (#47194 )	2019-09-27 13:01:51 +02:00
David Roberts	77cc6d5bad	[TEST] Work around _cat/indices bug with security enabled (#47160 ) When the ML native multi-node tests use _cat/indices/_all and the request goes to a non-master node, _all is translated to a list of concrete indices by the authz layer on the coordinating node before the request is forwarded to the master node. Then it is possible for the master node to return an index_not_found_exception if one of the concrete indices that was expanded on the coordinating node has been deleted in the meantime. (#47159 has been opened to track the underlying problem.) It has been observed that the index that gets deleted when the problem affects the ML native multi-node tests is always the ML notifications index. The tests that fail are only interested in the presence or absence of ML results indices. Therefore the workaround is to only _cat indices that match the ML results index pattern. Fixes #45652	2019-09-26 13:29:40 +01:00
Dimitris Athanasiou	0765bd4bf7	[7.x][ML] Ensure data frame analytics task is only marked completed once (#47119 ) (#47157 ) Closes #46907	2019-09-26 15:26:06 +03:00
Tanguy Leroux	95e2ca741e	Remove unused private methods and fields (#47154 ) This commit removes a bunch of unused private fields and unused private methods from the code base. Backport of (#47115)	2019-09-26 12:49:21 +02:00
Benjamin Trent	05fb7be571	[7.x] [ML][Inference] Feature pre-processing objects and functions (#46777 ) (#47040 ) * [ML][Inference] Feature pre-processing objects and functions (#46777) To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future. This PR lays down the ground work for 3 basic encodings: * HotOne * Target Mean * Frequency More feature encodings or pre-processings could be added in the future: * Handling missing columns * Standardization * Label encoding * etc.... * fixing compilation for namedxcontent tests	2019-09-25 08:16:24 -04:00
Yannick Welsch	eb86d71edd	Mute MlJobIT.testDeleteJob Relates #45652	2019-09-25 12:53:09 +02:00
Yannick Welsch	7a5b5af171	Mute MlJobIT.testDeleteJobAsync Relates #45652	2019-09-25 12:53:05 +02:00
Benjamin Trent	00c1c0132b	[ML] fix two datafeed flush lockup bugs (#46982 ) (#47024 ) * [ML] fix two flush lockup bugs * Addressing PR comments * moving debug logging line so it is only written on success	2019-09-24 13:03:20 -04:00
Yannick Welsch	9638ca20b0	Allow dropping documents with auto-generated ID (#46773 ) When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678	2019-09-19 16:46:33 +02:00
Dimitris Athanasiou	02a5e153dc	[7.x][ML] Parse and index data frame analytics state (#46804 ) (#46820 ) This commit reuses the same state processor that is used for autodetect to parse state output from data frame analytics jobs. We then index the state document into the state index. Backport of #46804	2019-09-18 20:37:40 +03:00
Dimitris Athanasiou	cebe8da617	[7.x][ML] MlMemoryTracker should ignore analytics tasks without config (#46789 ) (#46811 ) It is possible for the config of a running analytics job to be removed from the '.ml-config' index (perhaps the user deleted the entire index, etc.). In that case the task remains without a matching config. I have raised #46781 to discuss how to deal with this issue. This commit focuses on `MlMemoryTracker` and changes it so that when we get the configs for the running tasks we leniently ignore missing ones. This at least means memory tracking will keep working for other jobs if one or more are missing. In addition, this commit makes the cleanup code for native analytics tests more robust by explicitly stopping all jobs and force-stopping if an error occurs. This helps so that a single failing test does not cause other tests to fail due to pending tasks. Backport of #46789	2019-09-18 16:35:25 +03:00
Przemysław Witek	e49be611ad	[7.x] Add audit messages for Data Frame Analytics (#46521 ) (#46738 )	2019-09-16 21:21:38 +02:00
Dimitris Athanasiou	63eb0d9081	[7.x][ML] Avoid marking data frame analytics task completed twice (#46721 ) (#46724 ) When the stop API is called while the task is running there is a chance the task gets marked completed twice. This may cause undesired side effects, like indexing the progress document a second time after the stop API has returned (the cause for #46705). This commit adds a check that the task has not been completed before proceeding to mark it so. In addition, when we update the task's state we could get some warnings that the task was missing if the stop API has been called in the meantime. We now check the errors are `ResourceNotFoundException` and ignore them if so. Closes #46705 Backports #46721	2019-09-15 17:25:26 +03:00
Dimitris Athanasiou	0bc8acaf5b	[7.x][ML] Create state index and alias before starting an analytics job (#46602 ) (#46648 ) This is fixing a bug where if an analytics job is started before any anomaly detection job is opened, we create an index after the state write alias. Instead, we should create the state index and alias before starting an analytics job and this commit makes sure this is the case. Backport of #46602	2019-09-13 10:34:12 +03:00
David Roberts	461de5b58e	[TEST] Remove incorrect data frame analytics state assertion (#46597 ) After starting the analytics job and checking its state the state can be any of "started", "reindexing" or "analyzing" depending on how quickly the work is done.	2019-09-11 16:33:14 +01:00
Dimitris Athanasiou	579af626f5	[7.x][ML] No error when datafeed stops during updating to started (#46495 ) (#46542 ) Investigating the test failure reported in #45518 it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed gets requested to stop before we managed to update its state to `started`. I could not reproduce this scenario but it would be my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes #45518 Backport of #46495	2019-09-11 13:18:42 +03:00
Przemysław Witek	e38e631dac	[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967 ) (#46519 )	2019-09-11 12:17:26 +02:00
Przemysław Witek	e21deae535	Disallow persisting any documents when datafeed is isolated (#46485 ) (#46490 )	2019-09-09 21:01:27 +02:00
David Roberts	7c7fb7e32d	[ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432 ) ML users who upgrade from versions prior to 7.4 to 7.4 or later will have ML results indices that do not have mappings for the total_search_time_ms field. Therefore, when searching these indices we must tolerate this field not having a mapping. Fixes #46437	2019-09-06 14:31:15 +01:00
Dimitris Athanasiou	a6834068e3	[7.x][ML] Extract DataFrameAnalyticsTask into its own class (#46402 ) (#46426 ) This refactors `DataFrameAnalyticsTask` into its own class. The task has quite a lot of functionality now and I believe it would make code more readable to have it live as its own class rather than an inner class of the start action class. Backport of #46402	2019-09-06 14:13:46 +03:00
Benjamin Trent	457ff3e2fb	7.x/ml fix instance serialization bwc (#46404 ) * [ML] Fixing instance serialization version for bwc * fixing CppLogMessage	2019-09-05 13:23:26 -05:00
Benjamin Trent	5201386232	[ML] testFullClusterRestart waiting for stable cluster (#46280 ) (#46335 ) * [ML] waiting for ml indices before waiting task assignment testFullClusterRestart * waiting for a stable cluster after fullrestart * removing unused imports	2019-09-05 06:57:33 -05:00
Dimitris Athanasiou	8fca5b5204	[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271 ) (#46282 ) The test seems to have been failing due to a race condition between stopping the task and refreshing the destination index. In particular, we were going forward with refreshing the destination index even though the task stopped in the meantime. This was fixed in request. Closes #43960 Backport of #46271	2019-09-04 10:57:01 +03:00
Benjamin Trent	d0c5573a51	[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044 ) (#46096 ) Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors that are difficult to troubleshoot. This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.	2019-08-30 09:27:07 -05:00
Dimitris Athanasiou	5921ae53d8	[7.x][ML] Regression dependent variable must be numeric (#46072 ) (#46136 ) * [ML] Regression dependent variable must be numeric This adds a validation that the dependent variable of a regression analysis must be numeric. * Address review comments and fix some problems In addition to addressing the review comments, this commit fixes a few issues I found during testing. In particular: - if there were mappings for required fields but they were not included we were not reporting the error - if explicitly included fields had unsupported types we were not reporting the error Unfortunately, I couldn't get those fixed without refactoring the code in `ExtractedFieldsDetector`.	2019-08-30 09:57:43 +03:00
Przemysław Witek	b8a0379057	Refactor auditor-related classes (#45893 ) (#46120 )	2019-08-29 14:21:03 +02:00
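
The memory accounting described in commit 31a5e1c7ee (#47516) can be illustrated with a small worked example. This is only a sketch of the arithmetic stated in that commit message, not the actual implementation: the names `Job`, `job_memory_requirement`, and `node_memory_requirement` are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024  # assumption: binary megabytes

# Figures taken from the commit message for #47516.
PER_NODE_OVERHEAD = 30 * MB  # charged once per node, for loading the executable code
PER_JOB_OVERHEAD = {
    "anomaly_detection": 10 * MB,
    "data_frame_analytics": 5 * MB,
}


@dataclass
class Job:
    """Hypothetical stand-in for an ML job; not a real class from the code base."""
    kind: str                                    # key into PER_JOB_OVERHEAD
    model_memory_limit: int                      # bytes
    established_model_memory: Optional[int] = None  # bytes, if known


def job_memory_requirement(job: Job) -> int:
    """Model memory (established if available, else the limit) plus per-job overhead."""
    if job.established_model_memory is not None:
        model_memory = job.established_model_memory
    else:
        model_memory = job.model_memory_limit
    return model_memory + PER_JOB_OVERHEAD[job.kind]


def node_memory_requirement(jobs: List[Job]) -> int:
    """Total for one node: the per-node overhead is added once, only if any job runs."""
    if not jobs:
        return 0
    return PER_NODE_OVERHEAD + sum(job_memory_requirement(j) for j in jobs)


jobs = [
    Job("anomaly_detection", model_memory_limit=100 * MB, established_model_memory=40 * MB),
    Job("data_frame_analytics", model_memory_limit=50 * MB),
]
print(node_memory_requirement(jobs) // MB)  # 30 + (40 + 10) + (50 + 5)
```

Under the earlier fixed-overhead scheme described in the same commit message (100MB per anomaly detection job, 20MB per data frame analytics job), the same two jobs would have been counted as 40 + 100 + 50 + 20 = 210MB, which is why the change lets more small jobs fit on a node.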

506 Commits